Possible problem with optimization of Memory Barriers in C2 compiler
Doerr, Martin
martin.doerr at sap.com
Mon Feb 11 02:03:35 PST 2013
Hi Vladimir,
thanks for filing the bug.
Following the memory edges in addition to the control edges will indeed fix the problem.
However, we could end up emitting the MemBarVolatile twice:
332 would become a necessary MemBar because of Proj 334 and LoadI 339,
which fixes the problem. But MemBarVolatile 335 would still be there,
and the current implementation would not recognize it as unnecessary.
Thanks,
Martin
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Saturday, 9 February 2013 04:19
To: Doerr, Martin
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: Possible problem with optimization of Memory Barriers in C2 compiler
Never mind my previous answer (end of week :) ).
You are right, and I think the problem is the following wrong assumption
in post_store_load_barrier():
  if (x->is_MemBar()) {
    // We must retain this membar if there is an upcoming volatile
    // load, which will be preceded by acquire membar.
    if (xop == Op_MemBarAcquire)
      return false;
A MemBarAcquire follows a volatile load rather than preceding it. So we need
to fix post_store_load_barrier() to follow not only control edges but
also memory edges to look for volatile loads.
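In source terms the required placement for this pattern is roughly the
following (a conceptual sketch with an illustrative class, not actual C2
output):

  class Example {              // illustrative only
    volatile int a, b;

    int storeThenLoad() {
      a = 1;                   // volatile store
                               // StoreLoad barrier (MemBarVolatile) must stay
                               // here, between the volatile store and the
                               // next volatile load
      int r = b;               // volatile load
                               // MemBarAcquire goes after this load,
                               // not before it
      return r;
    }
  }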
I filed bug:
8007898: Incorrect optimization of Memory Barriers in
Matcher::post_store_load_barrier()
Thanks,
Vladimir
On 2/8/13 6:42 PM, Vladimir Kozlov wrote:
> Martin,
>
> Matcher::post_store_load_barrier() does not remove mach nodes from the
> graph. The Matcher should still map your ideal membar nodes to mach nodes,
> something like the following:
>
> n0 storeI
> n1 unnecessary_membar_volatile
> n2 unnecessary_membar_volatile
> n3 membar_volatile
> n4 loadI
>
> with all corresponding edges. It is only during assembler code emission
> that no code is generated for unnecessary_membar_volatile. But until
> then they are in the graph during gcm, lcm, ra and the scheduler.
>
> There is a comment about that:
>
> // We retain the Node to act as a compiler ordering barrier.
> bool Matcher::post_store_load_barrier(const Node *vmb) {
>
> Regards,
> Vladimir
>
>
> On 2/8/13 5:14 AM, Doerr, Martin wrote:
>> Hi,
>>
>> we contributed a change fixing memory ordering in taskqueue (8006971).
>> In the course of the discussion Alexey Shipilev pointed us to the
>> concurrency torture tests.
>>
>> While running these tests we found one test for which, in our eyes,
>> Matcher::post_store_load_barrier() generates wrong IR. Nevertheless,
>> the test passes with OpenJDK, because lcm establishes a proper ordering.
>> We would like to know whether this happens by accident, or whether
>> this is designed into lcm (and OptoScheduling).
>>
>> I have written a self-contained test program similar to "DekkerTest",
>> which tests volatile read after write.
>>
>> class TestData {
>>   volatile int a;
>>   volatile int b;
>> }
>>
>> int actor1(TestData t) {
>>   t.a = 1;
>>   return t.b;
>> }
>>
>> (Whole program under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
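>>
>> For illustration, a driver in the spirit of the torture tests could look
>> roughly like the following sketch (simplified, with a hypothetical class
>> name and an assumed symmetric actor2; see the link above for the actual
>> program). The outcome that both actors read 0 is forbidden for volatile
>> fields:
>>
>> class RawSketch {                            // hypothetical driver, not TestRAW itself
>>   static int actor1(TestData t) { t.a = 1; return t.b; }
>>   static int actor2(TestData t) { t.b = 1; return t.a; }  // assumed symmetric counterpart
>>
>>   public static void main(String[] args) throws InterruptedException {
>>     for (int i = 0; i < 1000000; i++) {
>>       final TestData t = new TestData();
>>       final int[] r = new int[2];
>>       Thread t1 = new Thread() { public void run() { r[0] = actor1(t); } };
>>       Thread t2 = new Thread() { public void run() { r[1] = actor2(t); } };
>>       t1.start(); t2.start();
>>       t1.join();  t2.join();
>>       if (r[0] == 0 && r[1] == 0)            // forbidden: requires store-load reordering
>>         throw new AssertionError("reordering observed in iteration " + i);
>>     }
>>   }
>> }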
>>
>> Below you can see a part of the Ideal graph generated for
>> TestRAW$runner1::run with inlined actor1.
>>
>> Membars 329 and 332 order Store 328 and Load 339.
>> Matcher::post_store_load_barrier() turns 329 and 332 into
>> unnecessary_MemBar_volatile, so that the Store and the Load are no longer
>> ordered by them.
>>
>> Lcm places Load 339 behind Membar 335, which results in correct code.
>> Unfortunately, there is no edge that forces Load 339 behind Membar 335.
>> In other words, the ideal graph allows the schedule
>> 328 StoreI - 339 LoadI - 335 MemBarVolatile, and the outcome of lcm
>> seems to be accidental.
>>
>> 328 StoreI === 324 325 327 58 [[ 329 329 332 335 ]]
>> @TestRAW$TestData+12 *, name=a, idx=10; Volatile! Memory:
>> @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @
>> bci:2 TestRAW$runner1::run @ bci:40
>>
>> 329 MemBarVolatile === 324 1 328 1 1 328 [[ 330 331 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 330 Proj === 329 [[ 332 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>> TestRAW$runner1::run @ bci:40
>>
>> 332 MemBarVolatile === 330 1 325 1 1 328 [[ 333 334 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 333 Proj === 332 [[ 335 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>> TestRAW$runner1::run @ bci:40
>>
>> 334 Proj === 332 [[ 326 335 339 ]] #2 Memory: @BotPTR
>> *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @
>> bci:40
>>
>> 335 MemBarVolatile === 333 1 334 1 1 328 [[ 336 337 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 339 LoadI === 315 334 338 [[ 351 340 358 ]]
>> @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms:
>> TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
>>
>> (Whole ideal graph under
>> http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>>
>> The question we now have is the following: is there a mechanism
>> (e.g. in local code motion) which guarantees that the MemBarVolatile
>> gets scheduled before the LoadI?
>>
>> If we change the local code motion, we can get the schedule
>> StoreI - LoadI - MemBarVolatile, causing this test to fail.
>> Such lcm changes may make sense for architectures with many registers,
>> which may benefit from longer live ranges.
>>
>> Kind regards,
>>
>> Martin and Götz
>>