Possible problem with optimization of Memory Barriers in C2 compiler
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Feb 11 11:05:29 PST 2013
Martin,
Another issue here is the aliasing of membars. The reason we have 3 membars
in this part of the code is that they are aliased to different memory slices
(see the memory Proj nodes below), but post_store_load_barrier() treats them
as wide (C2 bottom type) barriers, which is correct since in hardware they
are wide. The membars are generated in the Parse::do_put_xxx() method, and I
need to ask around why we don't use one fat membar there.
Thanks,
Vladimir
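
For reference, a hedged reconstruction of the Java fields behind the alias
indices in the dump below; the field names and offsets come from the Proj
output, while the class layout and the start_trigger modifier are
assumptions:

  class TestRAW {
      int start_trigger;     // @TestRAW+16, slice idx=5 (modifier unknown)
      static class TestData {
          volatile int a;    // @TestRAW$TestData+12, slice idx=10
          volatile int b;    // @TestRAW$TestData+16, slice idx=11
      }
  }

The memory Projs of the three MemBarVolatile nodes below sit on the slices
idx=10, Bot and idx=5 respectively, rather than on one common slice.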
328 StoreI === 324 325 327 58 [[ 329 329 332 335 ]]
@TestRAW$TestData+12 *, name=a, idx=10; Volatile! Memory:
@TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @
bci:2 TestRAW$runner1::run @ bci:40
329 MemBarVolatile === 324 1 328 1 1 328 [[ 330 331 ]] !jvms:
TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
331 Proj === 329 [[ 326 ]] #2 Memory: @TestRAW$TestData+12 *,
name=a, idx=10; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
332 MemBarVolatile === 330 1 325 1 1 328 [[ 333 334 ]] !jvms:
TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
334 Proj === 332 [[ 326 335 339 ]] #2 Memory: @BotPTR *+bot,
idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
335 MemBarVolatile === 333 1 334 1 1 328 [[ 336 337 ]] !jvms:
TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
337 Proj === 335 [[ 326 ]] #2 Memory: @TestRAW+16 *,
name=start_trigger, idx=5; !jvms: TestRAW::actor1 @ bci:2
TestRAW$runner1::run @ bci:40
339 LoadI === 315 334 338 [[ 351 340 358 ]]
@TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms:
TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
On 2/11/13 2:03 AM, Doerr, Martin wrote:
> Hi Vladimir,
>
> thanks for filing the bug.
> Following the memory edges as well will indeed fix the problem.
>
> However, we could end up emitting the MemBarVolatile twice.
> 332 would become a necessary MemBar because of the Proj (334) and LoadI (339),
> which fixes the problem. But the MemBarVolatile 335 is still there, and
> the current implementation would not recognize it as unnecessary.
>
> Thanks,
> Martin
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Saturday, February 9, 2013 04:19
> To: Doerr, Martin
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: Possible problem with optimization of Memory Barriers in C2 compiler
>
> Never mind my previous answer (end of week :) ).
>
> You are right, and I think the problem is the following wrong assumption
> in post_store_load_barrier():
>
>   if (x->is_MemBar()) {
>     // We must retain this membar if there is an upcoming volatile
>     // load, which will be preceded by acquire membar.
>     if (xop == Op_MemBarAcquire)
>       return false;
>
> A MemBarAcquire follows a volatile load; it does not precede it. So we
> need to fix post_store_load_barrier() to follow not only control edges
> but also memory edges when looking for volatile loads.
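>
> To illustrate (a minimal sketch; VolatileAccess is a made-up class, and
> the comments mark where the parser places the barriers for volatile
> accesses, as far as I can tell):
>
>   class VolatileAccess {
>     volatile int v;
>     int read() {
>       int x = v;    // LoadI
>                     // MemBarAcquire is emitted *after* the load
>       return x;
>     }
>     void write(int y) {
>                     // MemBarRelease is emitted before the store
>       v = y;        // StoreI
>                     // MemBarVolatile (StoreLoad) is emitted after it
>     }
>   }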
>
> I filed bug:
>
> 8007898: Incorrect optimization of Memory Barriers in
> Matcher::post_store_load_barrier()
>
> Thanks,
> Vladimir
>
> On 2/8/13 6:42 PM, Vladimir Kozlov wrote:
>> Martin,
>>
>> Matcher::post_store_load_barrier() does not remove mach nodes from the
>> graph. The Matcher should still map your ideal membar nodes to mach
>> nodes, something like the following:
>>
>> n0 storeI
>> n1 unnecessary_membar_volatile
>> n2 unnecessary_membar_volatile
>> n3 membar_volatile
>> n4 loadI
>>
>> with all corresponding edges. It is only during code emission that no
>> assembler is generated for an unnecessary_membar_volatile. But until
>> then, the nodes stay in the graph through gcm, lcm, ra and the scheduler.
>>
>> There is a comment about that:
>>
>> // We retain the Node to act as a compiler ordering barrier.
>> bool Matcher::post_store_load_barrier(const Node *vmb) {
>>
>> Regards,
>> Vladimir
>>
>>
>> On 2/8/13 5:14 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> we contributed a change fixing memory ordering in taskqueue (8006971).
>>>
>>> In the course of the discussion Alexey Shipilev pointed us to the
>>> concurrency torture tests.
>>>
>>> While running these tests we found one test for which, in our eyes,
>>> Matcher::post_store_load_barrier() generates wrong IR. Nevertheless,
>>> the test passes with OpenJDK because lcm establishes a proper ordering.
>>> We would like to know whether this happens by accident, or whether
>>> this is designed into lcm (and OptoScheduling).
>>>
>>> I have written a self-contained test program similar to "DekkerTest"
>>> which tests volatile read after write.
>>>
>>> class TestData {
>>>     volatile int a;
>>>     volatile int b;
>>> }
>>>
>>> int actor1(TestData t) {
>>>     t.a = 1;
>>>     return t.b;
>>> }
>>>
>>> (Whole program under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>>>
>>> Below you can see a part of the Ideal graph generated for
>>> TestRAW$runner1::run with inlined actor1.
>>>
>>> Membars 329 and 332 order Store 328 and Load 339.
>>> Matcher::post_store_load_barrier() turns 329 and 332 into
>>> unnecessary_MemBar_volatile, so that the Store and the Load are no
>>> longer ordered by them.
>>>
>>> Lcm places Load 339 behind Membar 335, which results in correct code.
>>> Unfortunately, there is no edge that forces Load 339 behind Membar 335.
>>> In other words, the ideal graph allows the schedule
>>> 328 StoreI - 339 LoadI - 335 MemBarVolatile, and the outcome of lcm
>>> seems to be accidental.
>>>
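>>> For comparison, a hedged sketch of the mirrored actor that a
>>> Dekker-style read-after-write test presumably uses (actor2 is not
>>> shown in this mail, so its exact code is an assumption):
>>>
>>> int actor2(TestData t) {
>>>     t.b = 1;
>>>     return t.a;
>>> }
>>>
>>> If the schedule 328 StoreI - 339 LoadI - 335 MemBarVolatile is ever
>>> taken, both actors can return 0, an outcome volatile semantics forbid.
>>>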
>>> 328 StoreI === 324 325 327 58 [[ 329 329 332 335 ]]
>>> @TestRAW$TestData+12 *, name=a, idx=10; Volatile! Memory:
>>> @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @
>>> bci:2 TestRAW$runner1::run @ bci:40
>>>
>>> 329 MemBarVolatile === 324 1 328 1 1 328 [[ 330 331 ]]
>>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>>
>>> 330 Proj === 329 [[ 332 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>>> TestRAW$runner1::run @ bci:40
>>>
>>> 332 MemBarVolatile === 330 1 325 1 1 328 [[ 333 334 ]]
>>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>>
>>> 333 Proj === 332 [[ 335 ]] #0 !jvms: TestRAW::actor1 @ bci:2
>>> TestRAW$runner1::run @ bci:40
>>>
>>> 334 Proj === 332 [[ 326 335 339 ]] #2 Memory: @BotPTR
>>> *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @
>>> bci:40
>>>
>>> 335 MemBarVolatile === 333 1 334 1 1 328 [[ 336 337 ]]
>>> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>>>
>>> 339 LoadI === 315 334 338 [[ 351 340 358 ]]
>>> @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms:
>>> TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
>>>
>>> (Whole ideal graph under
>>> http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>>>
>>> The question we now have is the following: is there a mechanism
>>> (e.g. in local code motion) which guarantees that the MemBarVolatile
>>> gets scheduled before the LoadI?
>>>
>>> If we change the local code motion, we can get the schedule
>>> StoreI-LoadI-MemBarVolatile, causing this test to fail.
>>>
>>> Lcm changes may make sense for architectures with many registers,
>>> which may benefit from longer live ranges.
>>>
>>> Kind regards,
>>>
>>> Martin and Götz
>>>