Possible problem with optimization of Memory Barriers in C2 compiler

Fri Feb 8 18:42:41 PST 2013

Martin,

Matcher::post_store_load_barrier() does not remove mach nodes from 
graph. Matcher still should map your ideal membar nodes to mach nodes, 
something like next:

n0  storeI
n1  unnecessary_membar_volatile
n2  unnecessary_membar_volatile
n3  membar_volatile
n4  loadI

with all corresponding edges. It is only during assembler code emission 
assembler is not generated for unnecessary_membar_volatile. But until 
then they are in graph during gcm, lcm, ra and scheduler.

There is comment about that:

// We retain the Node to act as a compiler ordering barrier.
bool Matcher::post_store_load_barrier(const Node *vmb) {

Regards,
Vladimir

On 2/8/13 5:14 AM, Doerr, Martin wrote:
> Hi,
>
> we contributed a change fixing memory ordering in taskqueue (8006971).
>
> In the course of the discussion Alexey Shipilev pointed us to the
>
> concurrency torture tests.
>
> While running these tests we found one test, for which, in our eyes,
>
> Matcher::post_store_load_barrier() generates wrong IR.  Nevertheless
>
> the test passes with openJDK, because lcm establishes a proper ordering.
>
> We would like to know whether this happens by accident, or whether
>
> this is designed into lcm (and OptoScheduling).
>
> I have written a self-contained test program similar to "DekkerTest"
>
> which tests volatile read after write.
>
>     class TestData {
>
>                     volatile int a;
>
>                     volatile int b;
>
>     }
>
>     int actor1(TestData t) {
>
>                     t.a = 1;
>
>                     return t.b;
>
>     }
>
> (Whole program under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>
> Below you can see a part of the Ideal graph generated for
>
> TestRAW$runner1::run with inlined actor1.
>
> Membar 329 and 332 order Store 328 and Load 339.
>
> Matcher::post_store_load_barrier() turns 329 and 332 into
>
> unnecessary_MemBar_volatile so that the Store and Load are no more
>
> ordered by these ones.
>
> Lcm places Load 339 behind Membar 335 which results in correct code.
>
> Unfortunately there is no edge that forces Load 339 behind Membar 335.
>
> In other words, the ideal graph allows the schedule:
>
> 328 StoreI - 339 LoadI - 335 MemBarVolatile and the outcome of lcm
>
> seems to be accidential.
>
> 328    StoreI  ===  324  325  327  58  [[ 329  329  332  335 ]]
> @TestRAW$TestData+12 *, name=a, idx=10; Volatile!  Memory:
> @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @
> bci:2 TestRAW$runner1::run @ bci:40
>
> 329    MemBarVolatile  ===  324  1  328  1  1  328  [[ 330  331 ]]
> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
> 330    Proj    ===  329  [[ 332 ]] #0 !jvms: TestRAW::actor1 @ bci:2
> TestRAW$runner1::run @ bci:40
>
> 332    MemBarVolatile  ===  330  1  325  1  1  328  [[ 333  334 ]]
> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
> 333    Proj    ===  332  [[ 335 ]] #0 !jvms: TestRAW::actor1 @ bci:2
> TestRAW$runner1::run @ bci:40
>
>   334    Proj    ===  332  [[ 326  335  339 ]] #2  Memory: @BotPTR
> *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
> 335    MemBarVolatile  ===  333  1  334  1  1  328  [[ 336  337 ]]
> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
>   339    LoadI   ===  315  334  338  [[ 351  340  358 ]]
> @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms:
> TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
>
> (Whole ideal graph under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>
> The question we now have is the following. Is there a mechanism
>
> (e.g. in local code motion) which guarantees that the MemBarVolatile
>
> gets scheduled before the LoadI?
>
> If we change the local code motion, we can get the schedule
>
> StoreI-LoadI-MemBarVolatile causing this test to fail.
>
> lcm changes may make sense for architectures with many registers
>
> which may benefit from longer life ranges.
>
> Kind regards,
>
> Martin and Götz
>