RFR: 8350621: Code cache stops scheduling GC
Thomas Schatzl
tschatzl at openjdk.org
Tue Jun 24 08:57:31 UTC 2025
On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob <duke at openjdk.org> wrote:
> The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`.
>
> This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when the GC is triggered, and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish`, which in turn calls `CodeCache::update_cold_gc_count`, which resets the `_unloading_threshold_gc_requested` flag and allows further GC scheduling.
>
> Unfortunately this can't work properly under certain circumstances.
> For example, if using G1GC, calling `G1CollectedHeap::collect` does not guarantee that the GC will actually run, as it can already be running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)).
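To make the failure mode concrete, here is a rough standalone model of that handshake (all names are made up for illustration; this is not the actual HotSpot source): if the collector declines the request, nothing ever resets the flag, so every later request is suppressed.

```
#include <atomic>
#include <cstdio>

// Toy model of the one-shot request flag; stands in for
// CodeCache::_unloading_threshold_gc_requested.
static std::atomic<bool> gc_requested{false};

// Stands in for Universe::heap()->collect(...): the collector may silently
// decline the request, e.g. because a concurrent cycle is already running.
static bool heap_collect(bool cycle_already_running) {
  return !cycle_already_running;
}

// Stands in for the threshold check done on code cache allocation.
static void request_unloading_gc(bool cycle_already_running) {
  bool expected = false;
  if (gc_requested.compare_exchange_strong(expected, true)) {
    std::printf("Triggering aggressive GC\n");
    if (!heap_collect(cycle_already_running)) {
      // The GC never runs, so the marking-cycle-finish callback that would
      // reset gc_requested never fires: the flag is stuck at true.
      std::printf("collect() declined; flag stays set\n");
    }
  } else {
    std::printf("Request suppressed: a GC is supposedly already requested\n");
  }
}

int main() {
  request_unloading_gc(/*cycle_already_running=*/true);   // request is dropped
  request_unloading_gc(/*cycle_already_running=*/false);  // suppressed forever after
  return 0;
}
```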
>
> I have observed this behavior on JVMs running version 21 that were recently migrated from Java 17.
> Those JVMs have some pressure on the code cache and quite a large heap compared to their allocation rate, which means that objects are mostly collected by young collections and full GCs take a long time to happen.
>
> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GCs can be impacted as well.
>
> In order to reproduce this issue, I found a very simple and convenient way:
>
>
> public class CodeCacheMain {
>     public static void main(String[] args) throws InterruptedException {
>         while (true) {
>             Thread.sleep(100);
>         }
>     }
> }
>
>
> Run this simple app with the following JVM flags:
>
>
> -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15
>
>
> - 512m for the heap, just to make clear that we don't want to be bothered by a full GC
> - low `ReservedCodeCacheSize` to put pressure on the code cache quickly
> - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction
>
> By itself, the program will hardly put any pressure on the code cache, but the good news is that it is sufficient to attach a jconsole to it, which will:
> - allow us to monitor the code cache
> - indirectly generate activity in the code cache, which is just what we need to reproduce the bug
>
> Some code cache related logs will show up at some point along with GC activity:
>
>
> [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory
>
>
> And then it will stop and we'll end up with the following message:
>
>
> [672.714s][info][codecache ] Code cache is full - disabling compilation
>
>
> L...
> I have a question regarding the existing code/logic.
>
> ```
> // In case the GC is concurrent, we make sure only one thread requests the GC.
> if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
>   log_info(codecache)("Triggering aggressive GC due to having only %.3f%% free memory", free_ratio * 100.0);
>   Universe::heap()->collect(GCCause::_codecache_GC_aggressive);
> }
> ```
>
> Why make sure that only one thread calls `collect(...)`? I believe this API can be invoked concurrently.
>
> Would removing `_unloading_threshold_gc_requested` resolve this problem?
It does, at the cost of many log messages:
[0.047s][info][gc ] GC(0) Pause Young (Concurrent Start) (CodeCache GC Threshold) 2M->1M(512M) 4.087ms
[0.047s][info][gc,cpu ] GC(0) User=0.01s Sys=0.00s Real=0.00s
[0.047s][info][gc ] GC(1) Concurrent Mark Cycle
[0.047s][info][gc,marking ] GC(1) Concurrent Scan Root Regions
[0.048s][info][codecache ] Triggering threshold (7.654%) GC due to allocating 48.973% since last unloading (0.000% used -> 48.973% used)
[0.048s][info][gc,marking ] GC(1) Concurrent Scan Root Regions 0.147ms
[0.048s][info][gc,marking ] GC(1) Concurrent Mark
[0.048s][info][gc,marking ] GC(1) Concurrent Mark From Roots
[0.048s][info][codecache ] Triggering threshold (7.646%) GC due to allocating 49.028% since last unloading (0.000% used -> 49.028% used)
[0.048s][info][codecache ] Triggering threshold (7.646%) GC due to allocating 49.028% since last unloading (0.000% used -> 49.028% used)
[0.048s][info][codecache ] Triggering threshold (7.633%) GC due to allocating 49.114% since last unloading (0.000% used -> 49.114% used)
[0.049s][info][gc,task ] GC(1) Using 6 workers of 6 for marking
[0.049s][info][codecache ] Triggering threshold (7.625%) GC due to allocating 49.169% since last unloading (0.000% used -> 49.169% used)
[0.049s][info][codecache ] Triggering threshold (7.616%) GC due to allocating 49.224% since last unloading (0.000% used -> 49.224% used)
[...repeated 15 times...]
[0.063s][info][codecache ] Triggering threshold (7.527%) GC due to allocating 49.820% since last unloading (0.000% used -> 49.820% used)
[0.065s][info][codecache ] Triggering threshold (7.519%) GC due to allocating 49.875% since last unloading (0.000% used -> 49.875% used)
[0.067s][info][codecache ] Triggering threshold (7.511%) GC due to allocating 49.930% since last unloading (0.000% used -> 49.930% used)
[0.068s][info][gc,marking ] GC(1) Concurrent Mark From Roots 20.256ms
[0.068s][info][gc,marking ] GC(1) Concurrent Preclean
[0.068s][info][gc,marking ] GC(1) Concurrent Preclean 0.016ms
[0.068s][info][gc,start ] GC(1) Pause Remark
As you can see this is very annoying, particularly if the marking takes seconds while compilation is still in progress.
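Roughly why, as a standalone model with made-up names (not the HotSpot sources): without the one-shot flag, every code cache allocation above the threshold re-enters the trigger path and logs again, because the code cache occupancy only drops once the marking cycle finishes.

```
#include <cstdio>

// Toy model of the trigger path with the one-shot flag removed: while a long
// concurrent mark runs, the code cache occupancy does not drop, so every
// allocation that re-checks the threshold logs and re-requests a GC.
static double used_ratio = 0.49;        // only drops when the marking cycle finishes
static const double threshold = 0.45;

static void gc_on_allocation_without_flag() {
  if (used_ratio > threshold) {
    std::printf("Triggering threshold GC (%.3f%% used)\n", used_ratio * 100.0);
    // collect() just notes that a cycle is already in progress and returns.
  }
}

int main() {
  // One log line per code cache allocation for the whole duration of the mark.
  for (int alloc = 0; alloc < 5; alloc++) {
    gc_on_allocation_without_flag();
  }
  return 0;
}
```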
>
> > I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
>
> For ParallelGC, `ParallelScavengeHeap::collect` contains the following check, so that only the `System.gc` GC cause and similar explicit ones are guaranteed a full GC.
>
> ```
> if (!GCCause::is_explicit_full_gc(cause)) {
>   return;
> }
> ```
>
> However, the current logic whereby a young GC can cancel a full GC (`_codecache_GC_aggressive` in this case) also seems surprising.
That's a different issue.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2999414442