[openjdk/lilliput] Rendezvous GC threads under STS for monitor deflation (PR #27)

Mon Nov 8 23:31:20 UTC 2021

I agree about the need for some kind of framework into which we could quickly plug new implementations and explore more readily. I have that for Linux kernel locks at compile time and for pthread_mutexes at runtime via LD_PRELOAD interposition, and both are extremely useful for exploration. It’s more difficult in HotSpot, though, as so many other components like to reach into objectMonitors, and of course the coupling between the synchronization subsystem and other areas is rather high. That kind of flexibility is more common in JVMs designed as research vehicles.

Regarding recursive self-locking: each thread has a stack of lock records that describe all the locks currently held by that thread. Threads typically hold very few locks concurrently, so it’s cheap to look forward through the stack. Yes, it’s an O(n) walk, but absent contrived microbenchmarks, n will be very small. According to the race-detection literature, “lockset” sizes (the number of distinct locks a thread holds simultaneously) can be expected to be very small; a lifetime maximum of 2 or 3 is a good rule of thumb. So if we see that an object is locked, we check our own stack first, and simply advance a recursion count if we find a match. (Hopefully the JIT will help here in cases where it can statically detect such recursive locking.) I don’t think this quite plays into Panama confinement, however.
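A minimal model of the lookup described above, assuming a hypothetical header that records only *that* an object is locked, not *by whom*, so ownership has to be recovered by scanning the current thread's own lock-record stack. The names `LockRecord`, `ThreadLockStack`, `monitor_enter` and `monitor_exit` are illustrative and are not HotSpot code. The model is single-threaded bookkeeping only: it omits the atomic header update and the contended slow path a real implementation needs.

```python
class LockRecord:
    """One entry in a thread's lock-record stack (hypothetical model)."""
    def __init__(self, obj):
        self.obj = obj
        self.recursions = 0  # extra entries beyond the first acquisition


class ThreadLockStack:
    """Per-thread stack of lock records describing every lock held."""
    def __init__(self):
        self.records = []

    def find(self, obj):
        # O(n) walk, newest record first; n is the lockset size,
        # which in practice is expected to be 2 or 3 at most.
        for rec in reversed(self.records):
            if rec.obj is obj:
                return rec
        return None


def monitor_enter(stack, obj, header_locked):
    """header_locked stands in for a per-object 'locked' header bit."""
    if id(obj) in header_locked:
        # Locked by someone: check our own stack first.
        rec = stack.find(obj)
        if rec is not None:
            rec.recursions += 1  # recursive self-lock: just bump a count
            return "recursive"
        return "contended"  # another thread owns it; slow path not modeled
    header_locked.add(id(obj))
    stack.records.append(LockRecord(obj))
    return "acquired"


def monitor_exit(stack, obj, header_locked):
    rec = stack.find(obj)
    if rec is None:
        raise RuntimeError("exit by a thread that does not own the lock")
    if rec.recursions > 0:
        rec.recursions -= 1  # still held; only unwind one level
        return
    stack.records.remove(rec)
    header_locked.discard(id(obj))  # fully released: clear the header bit
```

With one shared `header_locked` set and two `ThreadLockStack` instances, a second `monitor_enter` on the same object from the first stack returns `"recursive"`, while the same call from the second stack returns `"contended"`. Only the final matching `monitor_exit` clears the header bit.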

Are there any working documents for the Panama confinement ideas?

Regards
Dave

> On 2021-11-8, at 5:50 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
> This is great stuff.  I am very pleased to think that we can probably simplify the use of the object header for monitors in the directions of CJM and these newer designs.
> 
> I enjoyed reading about the two-bit design, which avoids touching the global side table on the fast path. I’m rooting for that. But even more, I’m rooting for a well-curated set of choices (like these) that can be configured into Lilliput builds, so that we can test experimentally which work best for varying workloads.
> 
> The detection of the condition “recursively locked to self” seems inherently kind of slow when you don’t use the whole header for a pointer.  Am I wrong?
> 
> The reason I’m asking about tests for “recursively locked to self” is they are equivalent in power to confinement tests such as we use to enforce temporal bounds in Panama data structures.  The key operations for confinement are (a) acquire confined state or create object initially in such state, (b) verify confined state before access, (c) release confined state.  After that last state, it is often desirable to call libc.free on some memory resource in the object, which in state (b) was being used to contain working data.  Fast (loop-hoistable) temporal confinement checks are the low price we pay for being able to free quickly after last use.
> 
> Another reason I like the two-bit design is that it points toward high-throughput protection of very small critical sections, with no recursive locking at all. My favorite case of that would be atomic access to multi-word structs in Valhalla, when those structs are contained, as mutable variables, in object fields. Using the object header for some kind of STM seems like a natural move there. (Not all Valhalla structs will require atomicity, but some will.)
> 
> It would be useful, I think, to evolve these CJM-like structures to support not just classic Java 1.0 synchronization, but also functionally similar use cases, such as enforcement of confinement for Panama capabilities, and enforcement of atomicity for Valhalla structs.
>
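The three confinement operations John Rose lists, (a) acquire confined state or create the object in it, (b) verify confined state before access, and (c) release confined state and then free the backing resource, can be sketched as follows. This is a hypothetical model, not the Panama API: the owner is assumed to be the creating thread's `threading.get_ident()`, a `bytearray` stands in for native memory, and dropping it stands in for `libc.free`. The `fill` method shows the "loop-hoistable" point: one check guards the whole loop rather than one check per element.

```python
import threading


class ConfinedBuffer:
    """Hypothetical confined resource; illustrates (a), (b), (c) only."""

    def __init__(self, size):
        # (a) create the object already confined to the creating thread
        self._owner = threading.get_ident()
        self._data = bytearray(size)  # stands in for native memory

    def _check(self):
        # (b) confinement check: cheap, and invariant across a loop body
        if self._data is None:
            raise RuntimeError("accessed after release")
        if self._owner != threading.get_ident():
            raise RuntimeError("accessed outside confining thread")

    def fill(self, value):
        self._check()  # hoisted: checked once for the whole loop
        data = self._data
        for i in range(len(data)):
            data[i] = value

    def get(self, i):
        self._check()
        return self._data[i]

    def close(self):
        # (c) release confined state, then free right after last use
        self._check()
        self._data = None  # analogue of calling libc free()
        self._owner = None
```

Because only the owning thread can pass `_check`, `close` can free the buffer immediately without waiting for other threads to drain, which is the payoff the quoted paragraph describes.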