RFR: 8291555: Replace stack-locking with fast-locking

Fri Nov 11 14:37:38 UTC 2022

On Fri, 28 Oct 2022 01:47:23 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> ----- Original Message -----
>>> From: "John R Rose" <jrose at openjdk.org>
>>> To: hotspot-dev at openjdk.org, serviceability-dev at openjdk.org, shenandoah-dev at openjdk.org
>>> Sent: Thursday, October 27, 2022 10:41:44 PM
>>> Subject: Re: RFR: 8291555: Replace stack-locking with fast-locking [v7]
>> 
>>> On Mon, 24 Oct 2022 11:01:01 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>>> 
>>>> Secondly, a question/suggestion: Many recursive cases do not interleave locks,
>>>> meaning the recursive enter will happen with the lock/oop already on top of
>>>> the lock stack. Why not peek at the top lock/oop in the lock-stack, and if it
>>>> is the current one, just push it again and the locking is done? (instead of
>>>> inflating) (exit would need to check if this is the last one and then do a
>>>> proper exit)
>>> 
>>> The CJM paper (Dice/Kogan 2021) mentions a "nesting" counter for this purpose.
>>> I suspect that a real counter is overkill, and the "unary" representation
>>> Robbin mentions would be fine, especially if there were a point (when the
>>> per-thread stack gets too big) at which we go and inflate anyway.
>>> 
>>> The CJM paper suggests a full search of the per-thread array to detect the
>>> recursive condition, but again I like Robbin's idea of checking only the most
>>> recent lock record.
>>> 
>>> So the data structure for lock records (per thread) could consist of a series of
>>> distinct values [ A B C ] and each of the values could be repeated, but only
>>> adjacently:  [  A A A B C C ] for example.  And there could be a depth limit as
>>> well.  Any sequence of held locks not expressible within those limitations
>>> could go to inflation as a backup.
>> 
>> Hi John,
>> a certainly stupid question: I have some trouble seeing how it can be implemented, given that because of lock coarsening (+ maybe OSR), the number of times a lock is held is different between the interpreted code and the compiled code.
>> 
>> Rémi
>
>> So the data structure for lock records (per thread) could consist of a series of distinct values [ A B C ] and each of the values could be repeated, but only adjacently: [ A A A B C C ] for example.
> @rose00 why only adjacently? Nested locking can be interleaved on different monitors.

@dholmes-ora and all: I have prepared an alternative PR #10907 that implements fast-locking behind a new experimental flag and preserves the current stack-locking behavior as the default setting. It is currently implemented and tested on x86* and aarch64 arches. It is also less invasive because it keeps everything structurally the same (i.e. no method signature changes, no stack layout changes, etc). On the downside, it also means we cannot have any of the associated cleanups and optimizations yet, but those are minor anyway. Also, there is still the risk that I make a mistake with the necessary factoring-out of the current implementation. If we agree that this should be the way to go, then I would close this PR and continue work on #10907.
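
For reference, the "repeat only adjacently" lock stack quoted above could be modeled roughly as follows. This is only a standalone sketch, not code from this PR or from #10907: the names `LockStack`, `Enter`, `Exit` and the capacity of 8 are hypothetical, and the linear scan in `contains()` merely stands in for whatever ownership check the VM would really use, so that the model can tell an interleaved re-entry from a first acquisition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a per-thread lock stack; not HotSpot code.
enum class Enter { Acquired, Recursive, Inflate };
enum class Exit { Released, StillHeld, Inflate };

class LockStack {
 public:
  static constexpr std::size_t kCapacity = 8;  // assumed depth limit

  // Record that the owning thread enters the monitor of `obj`.
  Enter enter(const void* obj) {
    if (top_ == kCapacity) return Enter::Inflate;  // depth limit reached
    const bool on_top = top_ > 0 && slots_[top_ - 1] == obj;
    // Already held but not on top: [ A B ] + A is not expressible.
    if (!on_top && contains(obj)) return Enter::Inflate;
    slots_[top_++] = obj;  // "unary" recursion: push the same oop again
    return on_top ? Enter::Recursive : Enter::Acquired;
  }

  // Record that the owning thread exits the monitor of `obj`.
  Exit exit(const void* obj) {
    // Only the top entry can be popped; anything else takes the slow path.
    if (top_ == 0 || slots_[top_ - 1] != obj) return Exit::Inflate;
    --top_;
    const bool still_held = top_ > 0 && slots_[top_ - 1] == obj;
    return still_held ? Exit::StillHeld : Exit::Released;
  }

  std::size_t depth() const { return top_; }

 private:
  // Model-only helper: is `obj` anywhere on the stack?
  bool contains(const void* obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) return true;
    }
    return false;
  }

  std::array<const void*, kCapacity> slots_{};
  std::size_t top_ = 0;
};
```

With objects A and B, entering A, A, B yields the stack [ A A B ]; a further enter of A reports `Enter::Inflate` because the repeat would not be adjacent, and exits then unwind B, A, A with only the last exit of A reporting `Exit::Released`.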

-------------

PR: https://git.openjdk.org/jdk/pull/10590