Atomic operations: your thoughts are welcome

Thu Feb 11 13:13:39 UTC 2021

On 2/11/21 4:59 AM, Kim Barrett wrote:
>> On Feb 8, 2021, at 1:14 PM, Andrew Haley <aph at redhat.com> wrote:
>>
>> I've been looking at the hottest Atomic operations in HotSpot, with a view to
>> finding out if the default memory_order_conservative (which is very expensive
>> on some architectures) can be weakened to something less. It's impossible to
>> fix all of them, but perhaps we can fix some of the most frequent.
> 
> Is there any information about the possible performance improvement from
> such changes?  1.5-3M occurrences doesn't mean much without context.

I am in the process of relaxing some of the memory orderings in the Shenandoah code, and 
AArch64 benefits greatly from it (two-way barriers are expensive in hot code).

There are obvious things like relaxing counter updates:
  JDK-8261503: Shenandoah: reconsider verifier memory ordering
  JDK-8261501: Shenandoah: reconsider heap statistics memory ordering
  JDK-8261500: Shenandoah: reconsider region live data memory ordering
  JDK-8261496: Shenandoah: reconsider pacing updates memory ordering
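
To illustrate the counter case, here is a minimal sketch in portable C++ (std::atomic, not 
HotSpot's Atomic API). The LiveDataCounter type and its members are hypothetical names made up 
for this example and are not taken from the Shenandoah sources. The assumption is that the 
counter is pure bookkeeping: nothing is published through it, so only the atomicity of the 
increment matters.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for a GC statistics counter (not Shenandoah code).
struct LiveDataCounter {
  std::atomic<std::size_t> words{0};

  // The increment must be atomic, but no other memory access depends on it,
  // so no ordering is requested and no barrier has to surround the update.
  void add(std::size_t n) {
    words.fetch_add(n, std::memory_order_relaxed);
  }

  // A relaxed read returns some value the counter actually held; an exact
  // total is only guaranteed once the updating threads have been joined or
  // otherwise synchronized with the reader.
  std::size_t get() const {
    return words.load(std::memory_order_relaxed);
  }
};
```

If a reader ever used the counter value to decide that other data is ready, relaxed would no 
longer be sufficient for that use.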

There are more interesting things, like relaxing accesses to the marking bitmap (which is a large 
counter array in disguise). Each bitmap update is effectively a CAS (and thus two 
FULL_MEM_BARRIER-s on AArch64) per marked object:
  JDK-8261493: Shenandoah: reconsider bitmap access memory ordering
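
A sketch of what a relaxed bitmap update could look like, again in portable C++ rather than 
HotSpot code. The MarkBitmap class, its layout (one bit per slot, 64-bit words) and the method 
names are assumptions made for illustration. It also assumes that no other data is published 
through the mark bit itself, which is exactly the property the issue above has to establish 
for the real code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marking bitmap: one bit per object slot, packed into 64-bit words.
class MarkBitmap {
  std::vector<std::atomic<std::uint64_t>> words_;
  std::size_t bits_;

public:
  explicit MarkBitmap(std::size_t bits) : words_((bits + 63) / 64), bits_(bits) {
    for (auto& w : words_) w.store(0, std::memory_order_relaxed);
  }

  // Sets the bit for `index`. Returns true if this call set it, false if the
  // bit was already set (possibly by a racing thread).
  bool mark(std::size_t index) {
    assert(index < bits_);
    std::atomic<std::uint64_t>& word = words_[index / 64];
    const std::uint64_t mask = std::uint64_t{1} << (index % 64);

    std::uint64_t old = word.load(std::memory_order_relaxed);
    while ((old & mask) == 0) {
      // Relaxed on both success and failure: the CAS stays atomic, but it
      // does not order any surrounding loads or stores.
      if (word.compare_exchange_weak(old, old | mask,
                                     std::memory_order_relaxed,
                                     std::memory_order_relaxed)) {
        return true;
      }
      // On failure `old` now holds the current word; the loop re-checks the bit.
    }
    return false;
  }

  bool is_marked(std::size_t index) const {
    assert(index < bits_);
    const std::uint64_t mask = std::uint64_t{1} << (index % 64);
    return (words_[index / 64].load(std::memory_order_relaxed) & mask) != 0;
  }
};
```

Exactly one caller of mark() wins for any given bit, which is all the marking loop needs in 
order to decide who pushes the object onto the mark queue.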

These five relaxations cut marking phase time on AArch64 by about 10..15%.

And there is more advanced stuff where relaxed is not enough, but conservative is too conservative. 
There, acq/rel should be enough -- but we cannot test it yet, because AArch64 cmpxchg does not 
implement anything except relaxed/conservative (JDK-8261579):
  JDK-8261492: Shenandoah: reconsider forwardee accesses memory ordering
  JDK-8261495: Shenandoah: reconsider update references memory ordering
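
For the forwardee case, the general shape of an acq/rel CAS can be sketched as follows. This 
is portable C++ with std::atomic, not the HotSpot implementation: the Obj layout, try_forward() 
and resolve() are hypothetical names for this example, and the real forwarding pointer 
encoding is different. The idea is that the winning thread publishes the contents of its copy 
with release semantics, while losers and later readers pick them up with acquire semantics.

```cpp
#include <atomic>

// Hypothetical object layout, for illustration only.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  int payload = 0;
};

// Tries to install `copy` as the forwarding pointer of `from`.
// Returns whichever copy ended up installed: ours or a racing thread's.
Obj* try_forward(Obj* from, Obj* copy) {
  Obj* expected = nullptr;
  if (from->forwardee.compare_exchange_strong(
          expected, copy,
          std::memory_order_acq_rel,    // success: publish the copy's contents
          std::memory_order_acquire)) { // failure: observe the winner's contents
    return copy;
  }
  return expected;  // updated by the failed CAS to the winner's pointer
}

// Returns the forwarded copy if there is one, otherwise the object itself.
Obj* resolve(Obj* obj) {
  Obj* fwd = obj->forwardee.load(std::memory_order_acquire);
  return fwd != nullptr ? fwd : obj;
}
```

Neither side needs a full two-way barrier here: the release half orders the copying stores 
before the pointer becomes visible, and the acquire half orders reads of the copy after the 
pointer is observed.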

These two (along with the experimental 8261579 fix) cut evacuation and update-references phase 
times by about 25..30% and 10..15%, respectively.

All in all, this cuts Shenandoah GC cycle times on AArch64 by about 15..20%! So, I believe 
this shows enough benefit to justify investing our time. Heavy-duty GC code is where I expect the 
most benefit.

-- 
Thanks,
-Aleksey