RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v11]
Martin Doerr
mdoerr at openjdk.java.net
Tue Mar 8 17:29:07 UTC 2022
On Thu, 2 Sep 2021 16:06:45 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>>
>> For the forwardee update side, HotSpot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes, at least on AArch64 and PPC64. This seems excessive for Shenandoah forwardee updates; "release" is enough.
>>
>> The reader side is much more interesting, because we generally want "consume", but it is not available. We can do "acquire", but it regresses performance too much. Close inspection of the code reveals we need "acquire" on many paths, but not on the most critical one: heap updates. This must explain why the current, weaker reader side was never seen to fail, and it also opens a way to get `acquire`-in-lieu-of-`consume` without an observable performance penalty.
>>
>> The relaxation in forwardee installation improves concurrent evacuation quite visibly. See for example GC cycle times with SPECjvm2008, Compiler.sunflow on AArch64:
>>
>> Before:
>>
>> [info][gc,stats] Concurrent Evacuation = 3.421 s (a = 21247 us) (n = 161)
>> [info][gc,stats] Concurrent Evacuation = 3.584 s (a = 21080 us) (n = 170)
>> [info][gc,stats] Concurrent Evacuation = 3.226 s (a = 21088 us) (n = 153)
>> [info][gc,stats] Concurrent Evacuation = 3.270 s (a = 20827 us) (n = 157)
>> [info][gc,stats] Concurrent Evacuation = 3.339 s (a = 20742 us) (n = 161)
>>
>> After:
>>
>> [info][gc,stats] Concurrent Evacuation = 3.109 s (a = 18617 us) (n = 167)
>> [info][gc,stats] Concurrent Evacuation = 3.027 s (a = 18918 us) (n = 160)
>> [info][gc,stats] Concurrent Evacuation = 2.862 s (a = 17669 us) (n = 162)
>> [info][gc,stats] Concurrent Evacuation = 2.858 s (a = 17425 us) (n = 164)
>> [info][gc,stats] Concurrent Evacuation = 2.883 s (a = 17685 us) (n = 163)
>>
>> Additional testing:
>> - [x] Linux x86_64 `hotspot_gc_shenandoah`
>> - [x] Linux AArch64 `hotspot_gc_shenandoah`
>> - [x] Linux x86_64 `tier1` with Shenandoah
>> - [x] Linux AArch64 `tier1` with Shenandoah
>
> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision:
>
> - More natural order of arguments
> - Move the fwdptr-related updaters to ShenandoahForwarding
> - Avoid acq_rel that is promoted to seq_cst on ARM <8.3
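For readers less familiar with the pairing described in the quoted PR text, here is a minimal sketch of a release-CAS forwardee install paired with an acquire forwardee read. It is written against portable C++ `std::atomic`, not HotSpot's `Atomic::` API, and it is not Shenandoah's code. `Obj`, its separate `fwd` field (Shenandoah keeps the forwardee in the mark word), `try_install_forwardee`, `resolve_forwardee` and `demo_publish_and_read` are all hypothetical names used for illustration. The sketch does not model the PR's observation that heap-update paths can skip "acquire".

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a heap object. In Shenandoah the forwardee lives in
// the mark word; here a separate atomic pointer plays that role (assumption).
struct Obj {
  std::atomic<Obj*> fwd{nullptr};  // nullptr means "not forwarded yet"
  int payload = 0;                 // stands in for the object's fields
};

// Update side: the evacuating thread has already filled in *copy. The release
// CAS orders those plain stores before the forwardee becomes visible, so no
// two-way fence is needed. Returns whichever copy won the race.
Obj* try_install_forwardee(Obj* from, Obj* copy) {
  assert(copy != nullptr);
  Obj* expected = nullptr;
  if (from->fwd.compare_exchange_strong(expected, copy,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
    return copy;
  }
  // Lost the race: the relaxed failed CAS read the winner's pointer, and this
  // acquire fence pairs with the winner's release CAS before we use its copy.
  std::atomic_thread_fence(std::memory_order_acquire);
  return expected;
}

// Reader side: acquire load of the forwardee (used here in lieu of "consume"),
// pairing with the release CAS so the copy's contents are visible.
Obj* resolve_forwardee(Obj* obj) {
  Obj* f = obj->fwd.load(std::memory_order_acquire);
  return f != nullptr ? f : obj;
}

// Usage: one thread "evacuates", another spins until it sees the forwardee and
// reads through it. Returns the payload observed by the reader.
int demo_publish_and_read() {
  Obj from;
  Obj to;
  std::thread evacuator([&] {
    to.payload = 42;                    // plain store: "copy" the object
    try_install_forwardee(&from, &to);  // release: publish the copy
  });
  Obj* seen;
  while ((seen = resolve_forwardee(&from)) == &from) {
    // spin until the forwardee is installed
  }
  int value = seen->payload;  // sees 42 thanks to the release/acquire pairing
  evacuator.join();
  return value;
}
```

The failure path uses a relaxed failure order plus an explicit acquire fence rather than an `acq_rel` CAS. This mirrors, in spirit only, the "avoid acq_rel" commit above, since a stronger combined ordering is what the change is trying not to pay for on every install.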
Is this PR still planned?
Should we test it on PPC64?
-------------
PR: https://git.openjdk.java.net/jdk/pull/2496
More information about the shenandoah-dev mailing list