RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2]

Fri Feb 14 23:47:12 UTC 2025

On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Here is an attempt to simplify GCLocker implementation for Serial and Parallel.
>> 
>> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker.
>> 
>> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization.
>> 
>> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress.
>> 
>> Test: tier1-8
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into gclocker
>  - review
>  - Merge branch 'master' into gclocker
>  - gclocker

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 385:

> 383: 
> 384: HeapWord* ParallelScavengeHeap::mem_allocate_old_gen(size_t size) {
> 385:   if (!should_alloc_in_eden(size) || GCLocker::is_active()) {

I don't understand why we are checking is_active() here.  The value is not reliable if we aren't at a safepoint, and iterating over all threads seems expensive.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1956881801