RFR: 8291555: Implement alternative fast-locking scheme [v5]

Mon Jan 30 11:10:27 UTC 2023

On Fri, 27 Jan 2023 19:33:16 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 86 commits:
>> 
>>  - Merge branch 'master' into JDK-8291556-v2
>>  - Revert UseFastLocking to default to off
>>  - Change log message inflate(locked) -> inflate(has_locker)
>>  - Properly set ZF on anon-check path; avoid some conditional branches
>>  - Use testb when testing for anon owner in fast-path
>>  - Merge branch 'master' into JDK-8291555-v2
>>  - In x86_32 C2 fast_lock(), CAS thread directly, instead of CASing stack-pointer and then storing thread
>>  - x86 part of optimization to check for anon owner
>>  - Optimize the check-for-anon-owner fast-path
>>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>>  - ... and 76 more: https://git.openjdk.org/jdk/compare/da80e7a4...784b361c
>
> src/hotspot/share/runtime/synchronizer.cpp line 1301:
> 
>> 1299:     // We could always eliminate polling by parking the thread on some auxiliary list.
>> 1300:     // NOTE: We need to check UseFastLocking here, because with fast-locking, the header
>> 1301:     // may legitimately be zero: cleared lock-bits and all upper header bits zero.
> 
> The `markWord::INFLATING()` value was picked to be zero because the header could never
> be zero except for the inflating case. With fast-locking, in the locked case with no hashcode,
> the header can be all zeros. Hmmm... gotta mull on that one...
> 
> fast-locking does not use the `markWord::INFLATING()`, but that protocol exists with the
> stack-locking implementation to prevent races. Here's the gory comment from the
> "Case stack-locked" portion of `ObjectSynchronizer::inflate()`:
> 
> 
>       // We've successfully installed INFLATING (0) into the mark-word.
>       // This is the only case where 0 will appear in a mark-word.
>       // Only the singular thread that successfully swings the mark-word
>       // to 0 can perform (or more precisely, complete) inflation.
>       //
>       // Why do we CAS a 0 into the mark-word instead of just CASing the
>       // mark-word from the stack-locked value directly to the new inflated state?
>       // Consider what happens when a thread unlocks a stack-locked object.
>       // It attempts to use CAS to swing the displaced header value from the
>       // on-stack BasicLock back into the object header.  Recall also that the
>       // header value (hash code, etc) can reside in (a) the object header, or
>       // (b) a displaced header associated with the stack-lock, or (c) a displaced
>       // header in an ObjectMonitor.  The inflate() routine must copy the header
>       // value from the BasicLock on the owner's stack to the ObjectMonitor, all
>       // the while preserving the hashCode stability invariants.  If the owner
>       // decides to release the lock while the value is 0, the unlock will fail
>       // and control will eventually pass from slow_exit() to inflate.  The owner
>       // will then spin, waiting for the 0 value to disappear.   Put another way,
>       // the 0 causes the owner to stall if the owner happens to try to
>       // drop the lock (restoring the header from the BasicLock to the object)
>       // while inflation is in-progress.  This protocol avoids races that might
>       // would otherwise permit hashCode values to change or "flicker" for an object.
>       // Critically, while object->mark is 0 mark.displaced_mark_helper() is stable.
>       // 0 serves as a "BUSY" inflate-in-progress indicator.
> 
> 
> So how does fast-locking avoid hashCode value flickering for an object
> when there are races between exiting an monitor and inflation? With
> fast-locking the owner of the monitor does not stall due to an INFLATING
> value being present in the header so the owner thread will race with the
> inflating thread. However, unlike with stack-locking, the owner thread in
> fast-locking is not restoring a saved header value from the BasicLock to
> the object's header; it is simply changing the `locked_value` to the
> `unlocked_value` in the lower two bits. If there's a hashCode in the header,
> then that hashCode remains untouched.
> 
> The other callers of `read_stable_mark()`:
> 
> `ObjectSynchronizer::FastHashCode()` - Does not need to stall in `read_stable_mark()`
> due to a racing inflation because the hashCode value will either be found in the header
> or in the ObjectMonitor and it will be the same value in both locations.
> 
> `ObjectSynchronizer::current_thread_holds_lock()` - Does not need to stall in
> `read_stable_mark()` due to a racing inflation because the current thread
> either owns the lock or it does not and that state cannot change while the
> current thread is executing this function.
> 
> `ObjectSynchronizer::get_lock_owner()` - I think this function does need to stall
> in `read_stable_mark()` due to a racing inflation. If the calling thread in this
> case gets a header value where `mark.is_fast_locked() == true`, then it will
> search the ThreadsList for the thread that has the object on its lock stack.
> If thread that's inflating the lock is the owner of the lock, then it will remove
> the object from its lock stack after it has changed the owner in the ObjectMonitor
> to itself. If that happens before the `get_lock_owner()` calling thread reaches
> the owning thread in its ThreadsList search, then the `get_lock_owner()` calling
> thread won't find the owner of the lock.
> 
> If the `get_lock_owner()` calling thread happens to "know" that the lock is
> supposed to be owned by _someone_, then the inconsistency could be detected.
> It might even be possible to write a test to detect this. I'll mull on that a bit...
> 
> `ObjectSynchronizer::inflate()` - Only calls read_stable_mark() when stack-locking
> is in use and the `INFLATING()` value is seen.
> 
> Otherwise, `object->mark_acquire()` is used for the read of the header at the top
> of the `inflate()` loop and all the code that updates the object's mark use
> `cas_set_mark()` to only change the object's mark when the current value matches
> the expected current value. Otherwise, we loop around and try again.

I don't think hashCode could flicker: the hashcode is store either in the header OR in the OM, but never in both. Except during inflation, in which case the CAS will ensure that the hash-code is stable (if it changes from 0 to N, then the CAS will fail and inflation will try again.

An aside: With fast-locking we will be able to install and access the i-hash without inflation. However, in this case we *do* need to ensure that the i-hash will not flicker: In the case of an inflated monitor, we would have to install the i-hash into the OM displaced header, but this creates a race: meanwhile the OM could be deflated, and another thread could observe a 0 hash, and install a new i-hash into the real header. An easy way out would be to generate the i-hash from the object address, which doesn't change between safepoints.

Regarding `get_lock_owner()`, there is a deeper problem IMO: The only caller of this method that is not guaranteed to be at a safepoint is `jmm_GetThreadInfo()` in `management.cpp`. However, this is not really safe to begin with: threads would always race to enter and exit monitors (no matter if they are stack-locked, fast-locked or monitor-locked). Therefore the lock-owner reported there cannot be more than informational. If we want this to be reliable, we should query the lock ownerships at safepoint only, in which case the problem that you describes dispappears. For the time being, I think we may report null lock owner there, pretty much like we could currently do with stack-locked or monitor-locked objects. Please correct me if I am wrong.

-------------

PR: https://git.openjdk.org/jdk/pull/10907