RFR: 8350621: Code cache stops scheduling GC
Albert Mingkun Yang
ayang at openjdk.org
Fri May 2 18:41:53 UTC 2025
On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob <duke at openjdk.org> wrote:
> The purpose of this PR is to fix a bug where we can end up in a situation where `CodeCache` no longer schedules the GC.
>
> This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC. We expect the GC to call `CodeCache::on_gc_marking_cycle_finish`, which in turn calls `CodeCache::update_cold_gc_count`, which resets `_unloading_threshold_gc_requested` and allows further GC scheduling.
>
> Unfortunately, this cannot work properly under certain circumstances.
> For example, with G1GC, calling `G1CollectedHeap::collect` does not guarantee that the GC will actually run, because a collection may already be in progress (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)).
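>
> To make the failure sequence concrete, here is a rough timeline (a simplified sketch, assuming no other marking cycle finishes in the meantime; the real code is in codeCache.cpp):
>
> // T1: Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true)
> //     -> succeeds, the flag is now true
> // T1: Universe::heap()->collect(GCCause::_codecache_GC_aggressive)
> //     -> returns without actually running the requested GC
> //        (e.g. a collection is already in progress)
> // => CodeCache::on_gc_marking_cycle_finish() -> update_cold_gc_count()
> //    is never reached for this request, the flag stays true,
> //    and no further GC is ever requested by CodeCache.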
>
> I have observed this behavior on version 21 JVMs that were recently migrated from Java 17.
> Those JVMs have some pressure on the code cache and quite a large heap compared to the allocation rate, which means that objects are mostly collected by young GCs and full GCs take a long time to happen.
>
> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GCs can be impacted as well.
>
> I found a very simple and convenient way to reproduce this issue:
>
>
> public class CodeCacheMain {
>     public static void main(String[] args) throws InterruptedException {
>         while (true) {
>             Thread.sleep(100);
>         }
>     }
> }
>
>
> Run this simple app with the following JVM flags:
>
>
> -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15
>
>
> - 512m for the heap, to make the intent clear that we do not want to be bothered by a full GC
> - a low `ReservedCodeCacheSize` to put pressure on the code cache quickly
> - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction
>
> By itself, the program will hardly put pressure on the code cache, but the good news is that it is sufficient to attach a jconsole to it, which will:
> - allow us to monitor the code cache
> - indirectly generate activity on the code cache, which is just what we need to reproduce the bug
>
> At some point, code-cache-related logs will show up along with GC activity:
>
>
> [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory
>
>
> Then it will stop, and we will end up with the following message:
>
>
> [672.714s][info][codecache ] Code cache is full - disabling compilation
>
>
> L...
I have a question regarding the existing code/logic.
// In case the GC is concurrent, we make sure only one thread requests the GC.
if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
  log_info(codecache)("Triggering aggressive GC due to having only %.3f%% free memory", free_ratio * 100.0);
  Universe::heap()->collect(GCCause::_codecache_GC_aggressive);
}
Why make sure only one thread calls `collect(...)`? I believe this API can be invoked concurrently.
Would removing `_unloading_threshold_gc_requested` resolve this problem?
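For illustration, here is a minimal standalone model of this one-shot pattern (hypothetical names; std::atomic stands in for HotSpot's Atomic):

#include <atomic>
#include <cstdio>

static std::atomic<bool> gc_requested{false};

// Stand-in for Universe::heap()->collect(); pretend the request is
// dropped because a collection is already in progress.
static void collect_maybe_dropped() { /* no-op: request dropped */ }

static void on_threshold_crossed() {
  bool expected = false;
  // Equivalent of Atomic::cmpxchg(&flag, false, true) == false.
  if (gc_requested.compare_exchange_strong(expected, true)) {
    collect_maybe_dropped();
    // Nothing resets gc_requested here; in HotSpot the reset happens in
    // on_gc_marking_cycle_finish(), which is never reached if the
    // collection never runs.
  } else {
    std::puts("request suppressed: flag still set");
  }
}

int main() {
  on_threshold_crossed();  // first request: flag set, collect dropped
  on_threshold_crossed();  // all later requests are suppressed forever
}

Without the flag, every threshold crossing would issue its own collect() call, at the cost of possibly redundant requests.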
> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
For ParallelGC, `ParallelScavengeHeap::collect` contains the following check to ensure that the `System.gc` GC cause and similar ones guarantee a full GC.
if (!GCCause::is_explicit_full_gc(cause)) {
  return;
}
However, the current logic, whereby a young GC can cancel a full GC (`_codecache_GC_aggressive` in this case), also seems surprising.
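As a toy model of that cancellation (invented names; not the actual ParallelScavengeHeap code):

#include <cstdio>

enum class Cause { system_gc, codecache_gc_aggressive };

static bool is_explicit_full_gc(Cause c) { return c == Cause::system_gc; }

static void collect(Cause cause) {
  std::puts("young collection runs and may satisfy the request");
  if (!is_explicit_full_gc(cause)) {
    return;  // _codecache_GC_aggressive stops here: no full gc guaranteed
  }
  std::puts("explicit full-gc cause: retry until a full collection really ran");
}

int main() {
  collect(Cause::codecache_gc_aggressive);
}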
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2847860414