Possible problem with optimization of Memory Barriers in C2 compiler

Fri Feb 8 19:19:26 PST 2013

Never-mind my previous answer (end of week :) ).

You are right, and I think the problem is in the next wrong assumption 
in post_store_load_barrier():

     if (x->is_MemBar()) {
       // We must retain this membar if there is an upcoming volatile
       // load, which will be preceded by acquire membar.
       if (xop == Op_MemBarAcquire)
         return false;

MemBarAcquire is following a volatile load and not preceding. So we need 
to fix post_store_load_barrier() to follow not only control edges but 
also memory to look for volatile loads.

I filed bug:

8007898: Incorrect optimization of Memory Barriers in 
Matcher::post_store_load_barrier()

Thanks,
Vladimir

On 2/8/13 6:42 PM, Vladimir Kozlov wrote:
> Martin,
>
> Matcher::post_store_load_barrier() does not remove mach nodes from
> graph. Matcher still should map your ideal membar nodes to mach nodes,
> something like next:
>
> n0  storeI
> n1  unnecessary_membar_volatile
> n2  unnecessary_membar_volatile
> n3  membar_volatile
> n4  loadI
>
> with all corresponding edges. It is only during assembler code emission
> assembler is not generated for unnecessary_membar_volatile. But until
> then they are in graph during gcm, lcm, ra and scheduler.
>
> There is comment about that:
>
> // We retain the Node to act as a compiler ordering barrier.
> bool Matcher::post_store_load_barrier(const Node *vmb) {
>
> Regards,
> Vladimir
>
>
> On 2/8/13 5:14 AM, Doerr, Martin wrote:
>> Hi,
>>
>> we contributed a change fixing memory ordering in taskqueue (8006971).
>>
>> In the course of the discussion Alexey Shipilev pointed us to the
>>
>> concurrency torture tests.
>>
>> While running these tests we found one test, for which, in our eyes,
>>
>> Matcher::post_store_load_barrier() generates wrong IR.  Nevertheless
>>
>> the test passes with openJDK, because lcm establishes a proper ordering.
>>
>> We would like to know whether this happens by accident, or whether
>>
>> this is designed into lcm (and OptoScheduling).
>>
>> I have written a self-contained test program similar to "DekkerTest"
>>
>> which tests volatile read after write.
>>
>>     class TestData {
>>
>>                     volatile int a;
>>
>>                     volatile int b;
>>
>>     }
>>
>>     int actor1(TestData t) {
>>
>>                     t.a = 1;
>>
>>                     return t.b;
>>
>>     }
>>
>> (Whole program under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>>
>> Below you can see a part of the Ideal graph generated for
>>
>> TestRAW$runner1::run with inlined actor1.
>>
>> Membar 329 and 332 order Store 328 and Load 339.
>>
>> Matcher::post_store_load_barrier() turns 329 and 332 into
>>
>> unnecessary_MemBar_volatile so that the Store and Load are no more
>>
>> ordered by these ones.
>>
>> Lcm places Load 339 behind Membar 335 which results in correct code.
>>
>> Unfortunately there is no edge that forces Load 339 behind Membar 335.
>>
>> In other words, the ideal graph allows the schedule:
>>
>> 328 StoreI - 339 LoadI - 335 MemBarVolatile and the outcome of lcm
>>
>> seems to be accidential.
>>
>> 328    StoreI  ===  324  325  327  58  [[ 329  329  332  335 ]]
>> @TestRAW$TestData+12 *, name=a, idx=10; Volatile!  Memory:
>> @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @
>> bci:2 TestRAW$runner1::run @ bci:40
>>
>> 329    MemBarVolatile  ===  324  1  328  1  1  328  [[ 330  331 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 330    Proj    ===  329  [[ 332 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>> TestRAW$runner1::run @ bci:40
>>
>> 332    MemBarVolatile  ===  330  1  325  1  1  328  [[ 333  334 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 333    Proj    ===  332  [[ 335 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>> TestRAW$runner1::run @ bci:40
>>
>>   334    Proj    ===  332  [[ 326  335  339 ]] #2  Memory: @BotPTR
>> *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @
>> bci:40
>>
>> 335    MemBarVolatile  ===  333  1  334  1  1  328  [[ 336  337 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>>   339    LoadI   ===  315  334  338  [[ 351  340  358 ]]
>> @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms:
>> TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
>>
>> (Whole ideal graph under
>> http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>>
>> The question we now have is the following. Is there a mechanism
>>
>> (e.g. in local code motion) which guarantees that the MemBarVolatile
>>
>> gets scheduled before the LoadI?
>>
>> If we change the local code motion, we can get the schedule
>>
>> StoreI-LoadI-MemBarVolatile causing this test to fail.
>>
>> lcm changes may make sense for architectures with many registers
>>
>> which may benefit from longer life ranges.
>>
>> Kind regards,
>>
>> Martin and Götz
>>