[aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
Doerr, Martin
martin.doerr at sap.com
Thu Dec 17 13:54:20 UTC 2015
Hi Hui Shi,
My concern is not limited to 8144993; it also applies to 8136596, which has already been pushed.
I have written the following small Java example:
public class TestAllocMemBar {
    static final int loop_cnt = 20000;

    void dont_inline_me() {}

    public class A {
        public B b;
    }

    public class B {
        public B(A a) { a.b = B.this; }
    }

    public void TestMethod() {
        A a = new A();
        dont_inline_me();
        //System.gc();
        B b = new B(a);
    }

    public static void main(String args[]) {
        TestAllocMemBar xyz = new TestAllocMemBar();
        long duration = System.nanoTime();
        for (int x = 0; x < loop_cnt; x++) { xyz.TestMethod(); }
        duration = System.nanoTime() - duration;
        System.out.println("duration: " + duration/1000/loop_cnt + " us per iteration");
    }
}
Execution shows (tested on PPC64):
openjdk_9/bin/java -XX:+UseConcMarkSweepGC -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:CompileCommand="exclude TestAllocMemBar::dont_inline_me" -XX:+PrintInlining -XX:+PrintEscapeAnalysis -XX:-EliminateAllocations TestAllocMemBar
…
======== Connection graph for TestAllocMemBar::TestMethod
JavaObject NoEscape(NoEscape) [ 59F 179F [ 37 42 ]] 25 Allocate === 5 6 7 8 1 ( 23 21 22 1 10 1 1 ) [[ 26 27 28 35 36 37 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:0 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 25P [ 42 59b ]] 37 Proj === 25 [[ 38 42 59 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 37 25P [ 179b ]] 42 CheckCastPP === 39 37 [[ 179 183 179 119 98 93 ]] #TestAllocMemBar$A:NotNull:exact * Oop:TestAllocMemBar$A:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:0
JavaObject NoEscape(NoEscape) NSR [ 153F [ 131 136 180 179 ]] 119 Allocate === 105 100 101 8 1 ( 54 117 22 1 10 42 1 ) [[ 120 121 122 129 130 131 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:13 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 119P [ 136 153b ]] 131 Proj === 119 [[ 132 136 153 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 131 119P [ 180 ]] 136 CheckCastPP === 133 131 [[ 180 193 ]] #TestAllocMemBar$B:NotNull:exact * Oop:TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 136 119P [ 179 ]] 180 EncodeP === _ 136 [[ 181 ]] #narrowoop: TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar$B::<init> @ bci:11 TestAllocMemBar::TestMethod @ bci:19
@ 5 TestAllocMemBar$A::<init> (10 bytes) inline (hot)
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
@ 10 TestAllocMemBar::dont_inline_me (1 bytes) not compilable (disabled)
@ 19 TestAllocMemBar$B::<init> (15 bytes) inline (hot)
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
duration: 3 us per iteration
So you can see that both allocations have the state NoEscape, but there is a safepoint (the non-inlined call) between them. A concurrent GC could access the object header and read stale data (and possibly crash). OptoAssembly shows that the MemBar was optimized out (probably due to 8136596).
However, we may be lucky: perhaps no concurrent GC accesses the headers of newly created objects. I don't know whether this is true, which is why I posted this question originally. Keep in mind that objects can be allocated in the old generation.
I can still imagine that these two optimizations may be dangerous.
Best regards,
Martin
From: Hui Shi [mailto:hui.shi at linaro.org]
Sent: Wednesday, 16 December 2015 13:27
To: Andrew Haley <aph at redhat.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Vitaly Davidovich <vitalyd at gmail.com>; Doerr, Martin <martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) <mikael.gerdin at oracle.com>
Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode
Thanks Andrew, Goetz and all!
The major concern is whether removing the storestore barrier could cause other threads to read stale data for a newly allocated object. "Other threads" here means Java threads or concurrent GC threads. The following analysis suggests that it is safe.
1. If the BCEA result says that "this" (b) escapes in its initializer, the change does not remove the storestore barrier.
2. If the BCEA result says that "this" (b) does not escape in its initializer, it is safe to remove the storestore barrier.
2.1 If there is a safepoint between the storestore and the release, b is visible to the GC inside the initializer, but there should be a memory barrier at the safepoint.
2.2 If there is no safepoint between the storestore and the release, b becomes visible to other threads only after the release memory barrier.
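The case analysis above can be written down as a small decision table. This is only a sketch of the reasoning, not the C2 change itself: the class, enum, and method names are hypothetical, and the two boolean inputs stand in for the BCEA result and for whether the inlined initializer contains a safepoint. Case 2.1 encodes the assumption stated above that a safepoint comes with a memory barrier.

```java
// Sketch of the case analysis only; none of these names exist in HotSpot.
class BarrierElisionSketch {
    // What orders the header/zeroing stores before b can be observed.
    enum OrderedBy { STORESTORE_KEPT, SAFEPOINT_BARRIER, RELEASE_BARRIER }

    // Case 1: "this" escapes in the initializer, so the storestore stays.
    // Case 2: "this" does not escape, so the storestore may be removed.
    static boolean canElideStoreStore(boolean thisEscapesInInit) {
        return !thisEscapesInInit;
    }

    static OrderedBy orderedBy(boolean thisEscapesInInit, boolean safepointInInit) {
        if (thisEscapesInInit) {
            return OrderedBy.STORESTORE_KEPT;   // case 1
        }
        if (safepointInInit) {
            return OrderedBy.SAFEPOINT_BARRIER; // case 2.1 (assumes a barrier at the safepoint)
        }
        return OrderedBy.RELEASE_BARRIER;       // case 2.2
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        for (boolean escapes : flags) {
            for (boolean safepoint : flags) {
                System.out.println("escapes=" + escapes + " safepoint=" + safepoint
                    + " elide=" + canElideStoreStore(escapes)
                    + " orderedBy=" + orderedBy(escapes, safepoint));
            }
        }
    }
}
```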
Case #1
A a = new A();
safepoint // a can be reached from GC
new B(a)
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
a.x = this; // b might be visible to other threads here
....
release
-------- init end
The BCEA result indicates that "this" (b) is not local and not arg_stack, so b is treated as escaping in its initializer and the change does not remove the storestore barrier.
[EA] estimated escape information for B::<init>
non-escaping args: {}
stack-allocatable args: {1}
return non-local value
modified args: 0x6 0x6
flags:
b = "this" is not local and not arg_stack.
a being arg_stack means that it is passed in and is not assigned to another object in the initializer.
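For illustration, here are the two constructor shapes that the analysis distinguishes. The first mirrors the example earlier in this thread, where the constructor stores "this" into an object the caller can already reach; the second only writes to its own field. The class names are made up, and the BCEA flags HotSpot would actually report for them are not shown here.

```java
// Hypothetical classes illustrating the two constructor shapes.
class EscapeShapes {
    static class A {
        B b;
    }

    // "this" escapes: it is stored into a, which is reachable outside the constructor.
    static class B {
        B(A a) {
            a.b = this;
        }
    }

    // "this" does not escape: the constructor only initializes its own field.
    static class C {
        final int value;

        C(int value) {
            this.value = value;
        }
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B(a);
        // a.b refers to b as soon as the store inside the constructor has run.
        System.out.println("a.b == b: " + (a.b == b));
        System.out.println("c.value: " + new C(42).value);
    }
}
```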
Case #2.1
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
safepoint // "this" is in oop map and might be visible to GC thread here
....
release
-------- init end
Case #2.2
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
release
-------- init end
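A rough Java-level analogue of the diagrams above, assuming Java 9 or later: VarHandle.storeStoreFence() and VarHandle.releaseFence() stand in for the storestore and release barriers that C2 emits around the initializer. This is only an analogy. In real code the JIT inserts these barriers, and the header and zeroing stores are not visible at the Java level. The class, field, and method names are hypothetical.

```java
import java.lang.invoke.VarHandle;

// Analogy only: hand-written fences standing in for compiler-generated barriers.
class FenceAnalogy {
    static class Obj {
        int f1;
        int f2;
    }

    // Publication point that other threads could read.
    static Obj shared;

    static Obj allocateAndPublish(int v1, int v2) {
        Obj b = new Obj();           // "allocation": header written, fields zeroed
        VarHandle.storeStoreFence(); // "storestore": the barrier under discussion
        b.f1 = v1;                   // "init start": initializer body
        b.f2 = v2;
        VarHandle.releaseFence();    // "release": end of initializer
        shared = b;                  // b is stored to a shared location only here
        return b;
    }

    public static void main(String[] args) {
        Obj b = allocateAndPublish(1, 2);
        System.out.println("published: " + (shared == b) + ", f1=" + b.f1 + ", f2=" + b.f2);
    }
}
```

Dropping the storeStoreFence() call in this sketch corresponds to case 2.2: nothing between allocation and the release makes b reachable, so the release fence alone orders all earlier stores before the publishing store.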
Regards
Hui
On 16 December 2015 at 00:15, Andrew Haley <aph at redhat.com> wrote:
On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote:
> Further, if the object is NoEscape it might not be scalar
> replaced. If I remember correctly, there are various conditions,
> e.g., too big, allocated in loop.
Well, that's the killer. The definition of "escape" we need to use
here is the really, truly, honest-to-goodness one: that this object
never becomes visible to any other thread by any means. Unless that
is so, all bets are off. In this case, what is intended is "appears
in an OOP map".
Andrew.