RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild

Tue Nov 11 22:07:05 UTC 2025

On Fri, 31 Oct 2025 01:08:25 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 240:
>> 
>>> 238:   // Return available_in assuming caller does not hold the heap lock.  In production builds, available is
>>> 239:   // returned without acquiring the lock.  In debug builds, the global heap lock is acquired in order to
>>> 240:   // enforce a consistency assert.
>> 
>> Can the comment be simplified to:
>> 
>> 
>> // Return bytes `available` in the given `partition`
>> // while holding the `rebuild_lock`.
>> 
>> 
>> Don't say anything about the heap lock in the API comment. Rather, in the part that is `ifdef ASSERT` where you take the heap lock (line ~244), say:
>> 
>> //  Acquire the heap lock to get a consistent
>> // snapshot to check assert.
>
> As I write this, I realize that in the most general case where two threads may call these API's independently in a fastdebug build, you could theoretically get into a deadlock because they attempted to acquire locks in different orders (this possibility exists -- statically -- only in the fastdebug builds).
> 
> The general MuteLocker machinery has ranked mutexes to avoid such situations through static ranking and checks while acquiring locks (in debug builds as a way of potentially catching such situations and flagging them).
> 
> With such ranking though this code would assert because the locks are acquired in different order between here and elsewhere.
> 
> In product builds you are fine because the rebuild lock acts as a "leaf lock" (in hotspot parlance). But there seems to be a definite possibility of deadlock in debug builds if/when the rebuild is attempted by one thread while another checks available and attempts to acquire the heap lock to check the assertion. You could solve it by acquiring the heap lock before calling the work method where the assertion check is done.
> 
> However, I'd be much more comfortable if we used some form of lock rank framework, unless it was utterly impossible to do so for some reason. (Here it was easy to spot the lock order inversion because it was in the code. Of course, if a debug build deadlocked you would also figure out the same, but having lock ordering gives you a quick and easy way to verify if there's potential for trouble.)
> 
> Not sure of the history of ShenandoahLock or why the parallel infra to MutexLocker was introduced (perhaps for allowing some performance/tunability), but might be worthwhile to see if we want to build lock rank checks in for robustness/maintainability.

I'm coming back to this PR after working on others.  Thanks for your comments.

This is a good catch.  I know better than to do that!  Sorry.

My intention was to rank-order the locks.  Whenever multiple locks are held, it should be in this order:
  first acquire the global heap lock
  In nested context, acquire the rebuild_lock

Any thread that only acquires the global heap lock or only acquires the rebuild_lock will not deadlock.

Multiple threads that acquire both locks will not deadlock because they acquire in the same order.

The code you identified was definitely a problem because we were acquiring the two lock in the wrong order.  I'm going to remove that assert and the lock associated with it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2515961240