Possible problem with optimization of Memory Barriers in C2 compiler
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Feb 8 18:42:41 PST 2013
Martin,
Matcher::post_store_load_barrier() does not remove mach nodes from
graph. Matcher still should map your ideal membar nodes to mach nodes,
something like next:
n0 storeI
n1 unnecessary_membar_volatile
n2 unnecessary_membar_volatile
n3 membar_volatile
n4 loadI
with all corresponding edges. It is only during assembler code emission
assembler is not generated for unnecessary_membar_volatile. But until
then they are in graph during gcm, lcm, ra and scheduler.
There is comment about that:
// We retain the Node to act as a compiler ordering barrier.
bool Matcher::post_store_load_barrier(const Node *vmb) {
Regards,
Vladimir
On 2/8/13 5:14 AM, Doerr, Martin wrote:
> Hi,
>
> we contributed a change fixing memory ordering in taskqueue (8006971).
>
> In the course of the discussion Alexey Shipilev pointed us to the
>
> concurrency torture tests.
>
> While running these tests we found one test, for which, in our eyes,
>
> Matcher::post_store_load_barrier() generates wrong IR. Nevertheless
>
> the test passes with openJDK, because lcm establishes a proper ordering.
>
> We would like to know whether this happens by accident, or whether
>
> this is designed into lcm (and OptoScheduling).
>
> I have written a self-contained test program similar to "DekkerTest"
>
> which tests volatile read after write.
>
> class TestData {
>
> volatile int a;
>
> volatile int b;
>
> }
>
> int actor1(TestData t) {
>
> t.a = 1;
>
> return t.b;
>
> }
>
> (Whole program under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>
> Below you can see a part of the Ideal graph generated for
>
> TestRAW$runner1::run with inlined actor1.
>
> Membar 329 and 332 order Store 328 and Load 339.
>
> Matcher::post_store_load_barrier() turns 329 and 332 into
>
> unnecessary_MemBar_volatile so that the Store and Load are no more
>
> ordered by these ones.
>
> Lcm places Load 339 behind Membar 335 which results in correct code.
>
> Unfortunately there is no edge that forces Load 339 behind Membar 335.
>
> In other words, the ideal graph allows the schedule:
>
> 328 StoreI - 339 LoadI - 335 MemBarVolatile and the outcome of lcm
>
> seems to be accidential.
>
> 328 StoreI === 324 325 327 58 [[ 329 329 332 335 ]]
> @TestRAW$TestData+12 *, name=a, idx=10; Volatile! Memory:
> @TestRAW$TestData:NotNull+12 *, name=a, idx=10; !jvms: TestRAW::actor1 @
> bci:2 TestRAW$runner1::run @ bci:40
>
> 329 MemBarVolatile === 324 1 328 1 1 328 [[ 330 331 ]]
> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
> 330 Proj === 329 [[ 332 ]] #0 !jvms: TestRAW::actor1 @ bci:2
> TestRAW$runner1::run @ bci:40
>
> 332 MemBarVolatile === 330 1 325 1 1 328 [[ 333 334 ]]
> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
> 333 Proj === 332 [[ 335 ]] #0 !jvms: TestRAW::actor1 @ bci:2
> TestRAW$runner1::run @ bci:40
>
> 334 Proj === 332 [[ 326 335 339 ]] #2 Memory: @BotPTR
> *+bot, idx=Bot; !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
> 335 MemBarVolatile === 333 1 334 1 1 328 [[ 336 337 ]]
> !jvms: TestRAW::actor1 @ bci:2 TestRAW$runner1::run @ bci:40
>
> 339 LoadI === 315 334 338 [[ 351 340 358 ]]
> @TestRAW$TestData+16 *, name=b, idx=11; Volatile! #int !jvms:
> TestRAW::actor1 @ bci:6 TestRAW$runner1::run @ bci:40
>
> (Whole ideal graph under http://cr.openjdk.java.net/~goetz/c2_membar_issue/)
>
> The question we now have is the following. Is there a mechanism
>
> (e.g. in local code motion) which guarantees that the MemBarVolatile
>
> gets scheduled before the LoadI?
>
> If we change the local code motion, we can get the schedule
>
> StoreI-LoadI-MemBarVolatile causing this test to fail.
>
> lcm changes may make sense for architectures with many registers
>
> which may benefit from longer life ranges.
>
> Kind regards,
>
> Martin and Götz
>
More information about the hotspot-compiler-dev
mailing list