RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v8]
Xiaowei Lu
github.com+39413832+weixlu at openjdk.java.net
Tue Sep 14 03:42:11 UTC 2021
On Fri, 10 Sep 2021 07:41:32 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>>> @shipilev Hi, I have tested this pull request as well as this pull request + `OrderAccess::release();` on specjbb 2015 on AArch64 (Kunpeng 920). Maybe there is a slight improvement on critical-jOPS? Here is the result.
>>
>> Thanks for testing. So explicit barrier does seem to result in a slight bump in critical-jOPS.
>>
>> I assume "base" results are this PR? If so, do you have performance results for the current master? In other words, it would be interesting to see three results: baseline (current master), this PR, and this PR + `OrderAccess::release()`.
>
>> @shipilev Yes, "base" means this PR in my previous comment. Here is the result for the current master (i.e. with all commits in this PR reverted). It seems master performs better, so the cost of "acquire" may indeed be high, as you said.
>
> (sighs) Thanks for testing. Do you have spare cycles to verify that "acquire" is indeed the culprit for this? It would be simple to check: replace all `mark_acquire()` with just `mark()` in this PR. I am somewhat sure that would not break things very much for the test runs.
@shipilev This is quite confusing. I replaced `mark_acquire()` with `mark()` in `get_forwardee_raw()` and `get_forwardee_mutator()` and ran SPECjbb 2015, only to see a slight decrease in critical-jOPS compared with master. However, the implementation of LSE instructions is not very efficient on the current server (Kunpeng 920), which may penalize CAS instructions that carry memory ordering. So I ran the tests again on an Ampere processor. As before, critical-jOPS decreases by about 3% even with the acquire in the forwardee access replaced.
Anyway, compared with the current PR, relaxing `mark_acquire()` to `mark()` gives us a performance boost. But I'm confused about why it is still below master, since we adopt `release` or even `relaxed` in self-healing and forwardee updates.
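For readers following the thread, here is a minimal sketch of the two load flavors being compared, written with `std::atomic` rather than HotSpot's own `Atomic`/`OrderAccess` primitives. Everything in it is an assumption made for illustration: the `Obj` type, the `kForwardedBits` tag, and the function names are hypothetical stand-ins, not the code in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an object with a mark word; not HotSpot's oopDesc.
struct Obj {
    std::atomic<std::uintptr_t> mark{0};
    int payload = 0;
};

// Assumed tag in the low bits of the mark word meaning "this object is forwarded".
constexpr std::uintptr_t kForwardedBits = 0x3;

// Decode a mark word: the forwardee if the tag is set, otherwise the object itself.
inline Obj* decode(Obj* self, std::uintptr_t m) {
    if ((m & kForwardedBits) == kForwardedBits) {
        return reinterpret_cast<Obj*>(m & ~kForwardedBits);
    }
    return self;
}

// Analogue of reading the forwardee with an acquire load: later reads of the
// copy's fields are ordered after the load of the mark word.
inline Obj* forwardee_acquire(Obj* o) {
    return decode(o, o->mark.load(std::memory_order_acquire));
}

// Analogue of the relaxed variant tested in this message: no ordering is
// requested beyond the atomicity of the load itself.
inline Obj* forwardee_relaxed(Obj* o) {
    return decode(o, o->mark.load(std::memory_order_relaxed));
}

// Publish `copy` as the forwardee of `from` with a release CAS, so the copy's
// contents are written before the pointer becomes visible. Returns whichever
// forwardee ends up installed.
inline Obj* try_forward(Obj* from, Obj* copy) {
    std::uintptr_t desired = reinterpret_cast<std::uintptr_t>(copy);
    assert((desired & kForwardedBits) == 0 && "copy must leave the tag bits free");
    desired |= kForwardedBits;

    std::uintptr_t expected = from->mark.load(std::memory_order_relaxed);
    if ((expected & kForwardedBits) == kForwardedBits) {
        return decode(from, expected);  // already forwarded by someone else
    }
    if (from->mark.compare_exchange_strong(expected, desired,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        return copy;  // we won the race
    }
    // On failure `expected` holds the current mark word; a relaxed failure
    // ordering is a simplification chosen for this sketch.
    return decode(from, expected);
}
```

On AArch64 the acquire load typically compiles to `ldar` and the relaxed one to a plain `ldr`, which is the difference this experiment tries to isolate. Whether the relaxed load is actually safe in the collector depends on dependency ordering and on what the caller does with the result, which this sketch does not model.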
On Kunpeng 920:
relax_acquire_1:RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 34282, max-jOPS = 30017, critical-jOPS = 22581
relax_acquire_2:RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 34282, max-jOPS = 30017, critical-jOPS = 22581
relax_acquire_3:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 29825, critical-jOPS = 21492
On Ampere:
master_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107127, max-jOPS = 101742, critical-jOPS = 37649
master_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107689, max-jOPS = 100516, critical-jOPS = 38331
master_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 103341, max-jOPS = 99291, critical-jOPS = 37898
relax_acquire_1:RUN RESULT: hbIR (max attempted) = 108894, hbIR (settled) = 104937, max-jOPS = 99094, critical-jOPS = 34048
relax_acquire_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106745, max-jOPS = 101742, critical-jOPS = 38273
relax_acquire_3:RUN RESULT: hbIR (max attempted) = 108894, hbIR (settled) = 104937, max-jOPS = 101271, critical-jOPS = 37701
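To make the "about 3%" figure easy to reproduce, the following sketch computes the mean and median critical-jOPS for the Ampere runs listed above and the relative change against master. The helper names (`mean`, `median`, `percent_change`) and the array names are made up for this example; only the six numbers come from the results.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>

using Runs = std::array<double, 3>;

// Arithmetic mean of the three runs.
inline double mean(const Runs& r) {
    return std::accumulate(r.begin(), r.end(), 0.0) / r.size();
}

// Median of three runs: sort a copy and take the middle element.
inline double median(Runs r) {
    std::sort(r.begin(), r.end());
    return r[1];
}

// Relative change of `candidate` against `baseline`, in percent.
inline double percent_change(double baseline, double candidate) {
    assert(baseline != 0.0);
    return 100.0 * (candidate - baseline) / baseline;
}

// critical-jOPS on Ampere, copied from the run results above.
const Runs kAmpereMaster  = {37649.0, 38331.0, 37898.0};
const Runs kAmpereRelaxed = {34048.0, 38273.0, 37701.0};
```

Calling `percent_change(mean(kAmpereMaster), mean(kAmpereRelaxed))` gives roughly -3.4%, while the same comparison on medians gives roughly -0.5%. The gap between the two suggests that the single low run (34048) accounts for much of the mean difference, so with only three runs per configuration the result may be within noise.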
-------------
PR: https://git.openjdk.java.net/jdk/pull/2496