[jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS

Wed Jul 20 00:31:57 UTC 2016

On Jul 19, 2016, at 5:14 PM, Martin Buchholz <martinrb at google.com> wrote:
> 
> 
> On Fri, Jul 15, 2016 at 5:50 PM, John Rose <john.r.rose at oracle.com <mailto:john.r.rose at oracle.com>> wrote:
> On Jul 15, 2016, at 2:39 PM, Martin Buchholz <martinrb at google.com <mailto:martinrb at google.com>> wrote:
>> 
>> On Thu, Jul 14, 2016 at 7:09 PM, John Rose <john.r.rose at oracle.com <mailto:john.r.rose at oracle.com>> wrote:
>> 
>> The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened.  An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable.
>> 
>> Most synchronizers have more complex state than "locked or unlocked".  StampedLock is a read-write lock, so you can only acquire the write lock if not currently read-locked.  (Did I miss something?)
> 
> The bitwise stuff allows you to acquire or release a single independent bit
> in a lock word (or maybe more than one bit).  That bit doesn't have to encode
> the whole state of the lock; in fact if it did we'd use getAndSet of a boolean.
> The point is you can build lock state management on top of getAndBitwise*
> in useful ways, when if the first interaction with the lock is to assert a setting
> 
> I'm still thinking about where in j.u.c. we would use getAndBitwise*.
> 
> ... StampedLock ... 
> 
> we have to distinguish readers and writers, so both readers and writers acquire the micro-lock before proceeding on success to do another write to indicate the actual current lock state.  We'd better not lose our time slice in between!  If an acquirer fails to acquire the micro-lock in an indeterminate state, they probably spin waiting for the micro-lock owner, but for how long?

Yes, more work is needed to make that operate correctly.  I suppose we can reuse an idea from HotSpot and have compact and inflated states for such locks.  In a nutshell, it works like this:  The compact state needs at a minimum just enough bits to encode semantic lock state, plus distinguish compact from inflation states.  The lock would try to stay in the compact state, but inflate if waiter lists need to be dealt with.  The inflated state would have an out-of-line control block with waiter queues and every creature comfort.  It might be hard to do this on top of the JVM, which likes to use safepoints to pull tricks like deflating cold locks.

> ReentrantLock seems more promising.  The micro-lock bit unambiguously indicates "exclusively held"; other bits are reentrant hold count bits.  On reentrant acquire, have to check thread field: 
>   lock.thread == Thread.currentThread().
> If we don't acquire reentrantly, then a single getAndSetMicroLock is sufficient to unambiguously acquire the lock.
> 
> 
>> ReentrantLock is reentrant (!) so needs to store the lock hold count.  Perhaps ReentrantLock could benefit if you optimize for non-reentrant acquires, at the cost of doing an extra update for reentrant acquires.
> 
> 
> It seems to me that any multi-field concurrent structure (like a StampedLock)
> could be protected by a single-bit micro-lock built on top of a reserved bit taken
> from one of the structure's fields.  There are often reasons not to do such
> things, but when the technique is appropriate, the bitwise operators let you
> lay down the bit inside the same cache line as the rest of the structure.
> That seems like a win to me.
> 
> Some day we can persuade the JVM to loosen its grip on the slack bits
> in pointers, allowing types like AtomicMarkableReference to be implemented
> in one word.  In that case, AMR.attemptMark might use BTS/BTR.
> 
> But ...  AtomicMarkableReference probably needs to be implemented in the VM, not in pure Java code that uses VarHandles, since pointer bit stealing depends on things like compressed oops?

What I mean by "loosen its grip" is share enough layout information about pointers that Java code can find and use a slack bit in the pointer format.  (And if there isn't such a bit, then Java code would have to go away and do something else.)  Also, for pointers which are treated this way, the GC would have to mask off the shared bits.

— John