RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Aug 11 02:21:33 UTC 2015
My bad, I forgot that CompareAndSwapP assembler code produces Boolean
value in register. I mistook it for StorePConditional which produces flag.
But I think you can get better code since you want to generate test and
main point of having specialized CompareAndSwapP is to avoid test
instruction.
If we use StorePConditional instead of CompareAndSwapP we may remove
second branch:
> │││ 0x00007fa06809cdd1: test %r11d,%r11d
> ; CAS fail, jump to respin
> │ ╰ 0x00007fe618af412b: jeq 0x00007fe618af4082
> ; CAS success, jump to store barrier
> <UNCONDITIONAL STORE BARRIER HERE>
But C2 changes will be much larger. We would need new Ideal::if_then()
which take in result of StorePConditional and set load_store on both
paths to 0/1.
We may need to play with probability of if_then() to get barrier in
follow code.
Thanks,
Vladimir
On 8/10/15 2:13 AM, Aleksey Shipilev wrote:
> Hi Vladimir!
>
> On 07/31/2015 06:03 AM, Vladimir Kozlov wrote:
>> I think the test is wrong. It should be:
>>
>> if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);
>
> Um, no? I remember eyeballing the assembly to confirm this.
>
> For LS_cmpxchg, we are inlining "*boolean* cas(...)", so the load_store
> seems to have a boolean value, but "oldval" is oop. In other words,
> "load_store != 0" tests "(boolean)load_store != false".
>
> Current VM produces:
>
> 13.46% 45.85% │││ 0x00007fa06809cdaf: lock cmpxchg %r8d,(%rdi)
> 14.91% 4.41% │││ 0x00007fa06809cdb4: sete %r11b
> 0.07% │││ 0x00007fa06809cdb8: movzbl %r11b,%r11d
> <UNCONDITIONAL STORE BARRIER HERE>
> 0.06% │││ 0x00007fa06809cdd1: test %r11d,%r11d
>
> ; CAS fail, jump to respin
> │╰│ 0x00007fa06809cdd4: je 0x00007fa06809ccf0
>
>
> Patched VM piggybacks on the same result:
>
> 1.97% 0.05% │││ 0x00007fe618af4115: lock cmpxchg %ebx,(%r9)
> 50.59% 90.01% │││ 0x00007fe618af411a: sete %r11b
> 0.05% 0.01% │││ 0x00007fe618af411e: movzbl %r11b,%r11d
> 3.02% 1.80% │││ 0x00007fe618af4122: test %r11d,%r11d
>
> ; CAS success, jump to store barrier
> │╰│ 0x00007fe618af4125: jne 0x00007fe618af4070
>
> ; CAS fail, jump to respin
> │ ╰ 0x00007fe618af412b: jmpq 0x00007fe618af4082
>
>
> Your suggestion seems to ignore the test completely (GVN helped?), and
> while it's still technically correct with emitting the barrier always,
> it defeats the purpose of the change:
>
> 2.35% 4.97% │ ││ 0x00007f7790aefd26: lock cmpxchg %r10d,(%rbx)
> 48.54% 86.32% │ ││ 0x00007f7790aefd2b: mov $0x1,%eax
> 0.03% │╭││ 0x00007f7790aefd30: je 0x00007f7790aefd3b
> ││││ 0x00007f7790aefd36: mov $0x0,%eax
> <UNCONDITIONAL STORE BARRIER HERE>
> 2.16% 0.01% │ ││ 0x00007f7790aefd53: cmp $0x0,%eax
>
> ; CAS fail, jump back to respin
> │ ╰│ 0x00007f7790aefd56: je 0x00007f7790aefd10
>
> ; CAS success, follow to exit
>
> Also, AFAIU, performance results would look different if we screwed the
> success check. But they seem to be coherent with our expectations: when
> CAS fails, either the conditional card marking or this change helps, and
> the change does not help when CAS succeeds.
>
> Thanks,
> -Aleksey
>
>> Thanks,
>> Vladimir
>>
>> On 7/29/15 2:57 AM, Aleksey Shipilev wrote:
>>> On 07/29/2015 12:24 PM, Andrew Dinn wrote:
>>>> On 29/07/15 09:58, Aleksey Shipilev wrote:
>>>>> I would like to suggest a fix for:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8019968
>>>>
>>>>> In short, current reference CAS intrinsic blindly emits
>>>>> post_barrier, ignoring the CAS result. In some cases, notably
>>>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>>>>> post_barrier excessively. Instead, we can conditionalize on the
>>>>> result of the store itself, and put the post_barrier only on
>>>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>>>
>>>>> More performance results here:
>>>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt
>>>>
>>>> Nice! The code looks fine and your test results are very convincing.
>>>> I'll be interested to see how this looks on AArch64.
>>>
>>> Thanks Andrew!
>>>
>>> The change passes JPRT, so AArch64 build is available. The benchmark JAR
>>> mentioned in the issue comments would run without intervention, taking
>>> around 40 minutes. You are very welcome to try, while Reviewers are
>>> taking a look. I can do that only next week.
>>>
>>>> That said, I am afraid you still need a Reviewer!
>>>
>>> That reminds me I haven't spelled out what testing was done:
>>>
>>> * JPRT on all open platforms
>>> * Targeted benchmarks
>>> * Eyeballing the generated x86 assembly
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>
>
>
More information about the hotspot-compiler-dev
mailing list