RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6]

Thu Nov 23 07:35:08 UTC 2023

On Wed, 22 Nov 2023 21:17:55 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797
>>  - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797
>>  - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797
>>  - Fix nit
>>  - Fix comment typos
>>  - 8319797: Recursive lightweight locking: Runtime implementation
>
> src/hotspot/share/runtime/synchronizer.cpp line 1320:
> 
>> 1318:         inf->set_owner_from_anonymous(current);
>> 1319:         size_t removed = JavaThread::cast(current)->lock_stack().remove(object);
>> 1320:         inf->set_recursions(removed - 1);
> 
> Hmmmm... so now I'm wondering how non-lightweight locking gets the
> recursions count correct? IIRC, with LM_LEGACY we count the BasicLocks
> on the stack in order to get a proper recursion count, but I could be wrong...
> 
> I don't see anything in the inflation code that's updating the recursion count
> based on BasicLocks on the stack. Of course that would be impossible for a
> Thread-A to do safely why inflating an ObjectMonitor for a lock held by a
> Thread-B.

In LM_LEGACY the recursion is implicit through the virtual stack created by the BasicObjectLocks in thread frames + the `ObjectMonitors::_recursions` value (in case inflation occurred during the critical section). In LM_LEGACY we do not need to count, because when unwinding stack frames or when monitorexit / synchronised return occurs we always first check if the associated BasicObjectLocker contains `0` in its displaced mark word. In this case we know it was a recursive `LEGACY` monitorenter / synchronised method entry and we can simply do nothing by clear the BasicObjectLocker (regardless if the monitor is inflated).

On the last exit the displace mark word will not be `0` so if it was inflated we will to an `ObjectMonitor::exit` the owner might be a pointer to the  BasicObjectLocker (or at least into the current threads stack in the case of OSR) or the current thread, regardless the recursion will be zero and we unlock.

There are two places where we need to patch up this to make it work, one is OSR where if the OSR contains the initial monitor enter it inflates the monitor so we no longer have a stale stack pointer in the mark word. 
Or if we need to re-lock elided locks, and here we deal with the fact that we now can have that we are locking in an earlier frame than the current initial enter. So we find that BasicObjectLock (trivial as the mark word points to it). Write 0 to its displaced mark word (transforming it from an initial enter to a recursive one) and then update the mark word to point at the new BasicObjectLock.

So the recursions are handled implicitly in the algorithm. (Except for that inappropriate use of jni monitorexit can break the VMs assumptions and we can end up in very strange states, I have ideas on how to restrict this so at least it cannot invalidate our assumptions, as well as have a much stronger guarantee when it comes to the CheckJNICalls than we do today).

One interesting point here is that we could do something similar for LM_LIGHTWEIGHT. We still need the lock stack to map Objects to Threads (in liliput we do not have  space to write some thread identifying information in the mark word (in a safe, and non restrictive way)). But we could handle any type of recursion, (that is interleaved enters, not only consecutive like the current implementation). To avoid a linear scan of the lock stack in the fast lock we would only do the fast recursive path for consecutive enters. Then have a medium path where we linear scan for interleaved recursive enters. This is very similar to what legacy does that it has a fast path if the initial enters frame is within  one page size of the recursive enters frame, and otherwise does a medium path which checks if it is the same thread stack more exactly. 

I believe this would be worth while investigating. However this can be done as an enhancement to this implementation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402993813