Possible problem with optimization of Memory Barriers in C2 compiler
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Feb 8 19:19:26 PST 2013
Never-mind my previous answer (end of week :) ).
You are right, and I think the problem is in the next wrong assumption
in post_store_load_barrier():
if (x->is_MemBar()) {
// We must retain this membar if there is an upcoming volatile
// load, which will be preceded by acquire membar.
if (xop == Op_MemBarAcquire)
return false;
MemBarAcquire is following a volatile load and not preceding. So we need
to fix post_store_load_barrier() to follow not only control edges but
also memory to look for volatile loads.
I filed bug:
8007898: Incorrect optimization of Memory Barriers in
Matcher::post_store_load_barrier()
Thanks,
Vladimir
On 2/8/13 6:42 PM, Vladimir Kozlov wrote:
> Martin,
>
> Matcher::post_store_load_barrier() does not remove mach nodes from
> graph. Matcher still should map your ideal membar nodes to mach nodes,
> something like next:
>
> n0 storeI
> n1 unnecessary_membar_volatile
> n2 unnecessary_membar_volatile
> n3 membar_volatile
> n4 loadI
>
> with all corresponding edges. It is only during assembler code emission
> assembler is not generated for unnecessary_membar_volatile. But until
> then they are in graph during gcm, lcm, ra and scheduler.
>
> There is comment about that:
>
> // We retain the Node to act as a compiler ordering barrier.
> bool Matcher::post_store_load_barrier(const Node *vmb) {
>
> Regards,
> Vladimir
>
>
> On 2/8/13 5:14 AM, Doerr, Martin wrote:
>> Hi,
>>
>> we contributed a change fixing memory ordering in taskqueue (8006971).
>>
>> In the course of the discussion Alexey Shipilev pointed us to the
>>
>> concurrency torture tests.
>>
>> While running these tests we found one test, for which, in our eyes,
>>
>> Matcher::post_store_load_barrier() generates wrong IR. Nevertheless
>>
>> the test passes with openJDK, because lcm establishes a proper ordering.
>>
>> We would like to know whether this happens by accident, or whether
>>
>> this is designed into lcm (and OptoScheduling).
>>
>> I have written a self-contained test program similar to "DekkerTest"
>>
>> which tests volatile read after write.
>>
>> class TestData {
>>
>> volatile int a;
>>
>> volatile int b;
>>
>> }
>>
>> int actor1(TestData t) {
>>
>> t.a = 1;
>>
>> return t.b;
>>
>> }
>>
>> (Whole program under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>>
>> Below you can see a part of the Ideal graph generated for
>>
>> TestRAW$runner1::run with inlined actor1.
>>
>> Membar 329 and 332 order Store 328 and Load 339.
>>
>> Matcher::post_store_load_barrier() turns 329 and 332 into
>>
>> unnecessary_MemBar_volatile so that the Store and Load are no more
>>
>> ordered by these ones.
>>
>> Lcm places Load 339 behind Membar 335 which results in correct code.
>>
>> Unfortunately there is no edge that forces Load 339 behind Membar 335.
>>
>> In other words, the ideal graph allows the schedule:
>>
>> 328 StoreI - 339 LoadI - 335 MemBarVolatile and the outcome of lcm
>>
>> seems to be accidential.
>>
>> 328 StoreI === 324 325 327 58 [[ 329 329 332 335 ]]
>> @TestRAW$TestData+12 *, name=a, idx=10; Volatile! Memory:
>> @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @
>> bci:2 TestRAW$runner1::run @ bci:40
>>
>> 329 MemBarVolatile === 324 1 328 1 1 328 [[ 330 331 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 330 Proj === 329 [[ 332 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>> TestRAW$runner1::run @ bci:40
>>
>> 332 MemBarVolatile === 330 1 325 1 1 328 [[ 333 334 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 333 Proj === 332 [[ 335 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>> TestRAW$runner1::run @ bci:40
>>
>> 334 Proj === 332 [[ 326 335 339 ]] #2 Memory: @BotPTR
>> *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @
>> bci:40
>>
>> 335 MemBarVolatile === 333 1 334 1 1 328 [[ 336 337 ]]
>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>
>> 339 LoadI === 315 334 338 [[ 351 340 358 ]]
>> @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms:
>> TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
>>
>> (Whole ideal graph under
>> http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>>
>> The question we now have is the following. Is there a mechanism
>>
>> (e.g. in local code motion) which guarantees that the MemBarVolatile
>>
>> gets scheduled before the LoadI?
>>
>> If we change the local code motion, we can get the schedule
>>
>> StoreI-LoadI-MemBarVolatile causing this test to fail.
>>
>> lcm changes may make sense for architectures with many registers
>>
>> which may benefit from longer life ranges.
>>
>> Kind regards,
>>
>> Martin and Götz
>>
More information about the hotspot-compiler-dev
mailing list