Exploring Time-Space trade-offs for "synchronized" in Lilliput

Mon Nov 8 21:16:41 UTC 2021

Roman, FWIW, in Zing (now called Prime) we use a similar optional hashcode word, but we place the hashcode in a pre-header word rather than in an appended word. It fits in basically the same place (in a gap between objects, and it can occupy an alignment gap if one exists), but it is easier and faster to address. Putting it before the header eliminates the need to know the object size to get to the hashcode.
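
To make the addressing difference concrete, here is a minimal sketch in Java. It is a toy model, not Zing or HotSpot code: the array-backed heap, the word indices, and all names (HashWordPlacement, appendedHashIndex, preHeaderHashIndex) are assumptions made up for illustration.

```java
public final class HashWordPlacement {
    // Toy heap: one array slot stands in for one 64-bit heap word
    // (an assumption of this sketch, not a real JVM layout).
    static final long[] HEAP = new long[16];

    // Appended placement: the hash word sits just past the object's last word,
    // so locating it requires the object's size.
    static int appendedHashIndex(int headerIndex, int sizeInWords) {
        return headerIndex + sizeInWords;
    }

    // Pre-header placement: the hash word sits immediately before the header,
    // so its offset is constant and no size lookup is needed.
    static int preHeaderHashIndex(int headerIndex) {
        return headerIndex - 1;
    }

    public static void main(String[] args) {
        int sizeInWords = 3;   // header plus two field words (hypothetical object)
        int hash = 0x5EED;     // arbitrary example hash value

        int a = 4;             // header index of an object using the appended layout
        HEAP[appendedHashIndex(a, sizeInWords)] = hash;

        int b = 10;            // header index of an object using the pre-header layout
        HEAP[preHeaderHashIndex(b)] = hash;

        System.out.println("appended hash stored at word " + appendedHashIndex(a, sizeInWords));
        System.out.println("pre-header hash stored at word " + preHeaderHashIndex(b));
    }
}
```

In a real VM the size needed by the appended variant would typically come from class metadata (or an array length), which is the extra lookup the pre-header placement avoids.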

> On Nov 8, 2021, at 11:41 AM, Roman Kennke <rkennke at redhat.com> wrote:
> 
> Hi Dave,
> 
> I read the paper once in full, but I still need to digest it a little more. Just some comments from my perspective:
> 
> 1. I have implemented a prototype for a compact hashcode which requires only 2 bits in the object header. The idea is to recompute the hashcode as long as an object doesn't move; as soon as a hashed object is moved (by the GC), the hashcode is 'appended' to the copied object (which sometimes makes the object larger, though in many cases the hashcode fits into an alignment gap).
> 
> 2. The main trouble that I have with the current locking scheme is caused by stack-locks ('thin locks'). It is inherently racy for a 'foreign' thread to get hold of a mark word that has been displaced by a stack-lock. Fully inflated object monitors are less problematic: once inflated, it is safe (with caveats, see https://github.com/openjdk/lilliput/pull/27) to access the displaced mark word concurrently. That is why I am currently working on a way to disable stack-locking altogether, so that I need only deal with full monitors. It looks to me that in this regard, your proposed CJM is similar to current object monitors. Is that correct?
> 
> I will study CJM in more detail and probably will have more questions.
> 
> Thanks for your input!!
> 
> Cheers,
> Roman
> 
>> Abstract: In the context of project Lilliput, which attempts to reduce the size of the object header in the HotSpot Java Virtual Machine (JVM), we explore a curated set of synchronization algorithms. Each of the algorithms could serve as a potential replacement implementation for the “synchronized” construct in HotSpot. Collectively, the algorithms illuminate trade-offs in space-time properties.
>> The key design decisions are where to locate synchronization metadata (monitor fields), how to map from an object to those fields, and the lifecycle of the monitor information.
>> The reader is assumed to be familiar with the current HotSpot implementation of “synchronized” as well as the Compact Java Monitors (CJM) design (https://arxiv.org/abs/2102.04188).
>> Dave
>> p.s.,
>> Don’t expect any surprises regarding performance: common sense and intuition prevail. As you’d expect, the more involved “synchronized” is with the header word, the more performance we can squeeze out, albeit at a cost in complexity.
>> Note that I narrowed the space of solutions to cover only those that avoid the need for deferred deflation, as monitor accretion and deflation policies remain a continuing problem in HotSpot.
>> Finally, for reference, performance data is included for pthread_mutex operations and ReentrantLock.
>