RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild

Y. Srinivas Ramakrishna ysr at openjdk.org
Fri Oct 31 05:09:05 UTC 2025


On Fri, 31 Oct 2025 00:44:11 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

>> This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition.
>> 
>> This addresses a problem that results if available memory is probed while we are rebuilding the freeset.
>> 
>> Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced.  This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory.  It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock.
>
> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 240:
> 
>> 238:   // Return available_in assuming caller does not hold the heap lock.  In production builds, available is
>> 239:   // returned without acquiring the lock.  In debug builds, the global heap lock is acquired in order to
>> 240:   // enforce a consistency assert.
> 
> Can the comment be simplified to:
> 
> 
> // Return bytes `available` in the given `partition`
> // while holding the `rebuild_lock`.
> 
> 
> Don't say anything about the heap lock in the API comment. Rather, in the part that is `ifdef ASSERT` where you take the heap lock (line ~244), say:
> 
> //  Acquire the heap lock to get a consistent
> // snapshot to check assert.

As I write this, I realize that in the most general case where two threads may call these API's independently in a fastdebug build, you could theoretically get into a deadlock because they attempted to acquire locks in different orders (this possibility exists -- statically -- only in the fastdebug builds).

The general MuteLocker machinery has ranked mutexes to avoid such situations through static ranking and checks while acquiring locks (in debug builds as a way of potentially catching such situations and flagging them).

With such ranking though this code would assert because the locks are acquired in different order between here and elsewhere.

In product builds you are fine because the rebuild lock acts as a "leaf lock" (in hotspot parlance). But there seems to be a definite possibility of deadlock in debug builds if/when the rebuild is attempted by one thread while another checks available and attempts to acquire the heap lock to check the assertion. You could solve it by acquiring the heap lock before calling the work method where the assertion check is done.

However, I'd be much more comfortable if we used some form of lock rank framework, unless it was utterly impossible to do so for some reason. (Here it was easy to spot the lock order inversion because it was in the code. Of course, if a debug build deadlocked you would also figure out the same, but having lock ordering gives you a quick and easy way to verify if there's potential for trouble.)

Not sure of the history of ShenandoahLock or why the parallel infra to MutexLocker was introduced (perhaps for allowing some performance/tunability), but might be worthwhile to see if we want to build lock rank checks in for robustness/maintainability.

> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1158:
> 
>> 1156:     size_t young_cset_regions, old_cset_regions, first_old, last_old, num_old;
>> 1157:     ShenandoahFreeSet* free_set = heap->free_set();
>> 1158:     ShenandoahRebuildLocker rebuild_locker(free_set->lock());
> 
> Should you not create a scope around lines 1158 to line 1167, since you don't want to hold the rebuild lock as soon as the rebuild is done (i.e. immediately following `finish_rebuild()`)?

May be it doesn't matter, since no one else is running during a full gc who needs to query `available()`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2479898169
PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2479867780


More information about the hotspot-gc-dev mailing list