[aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode

Doerr, Martin martin.doerr at sap.com
Thu Dec 17 13:54:20 UTC 2015


Hi Hui Shi,

My concern is not limited to 8144993; it also applies to 8136596, which has already been pushed.


I have written the following small Java example:

public class TestAllocMemBar{

  static final int loop_cnt = 20000;

  void dont_inline_me() {}


  public class A{
    public B b;
  }

  public class B{
    public B(A a) { a.b = B.this; }
  }

  public void TestMethod() {
    A a = new A();
    dont_inline_me();
    //System.gc();
    B b = new B(a);
  }


  public static void main(String args[]){
    TestAllocMemBar xyz = new TestAllocMemBar();
    long duration = System.nanoTime();
    for (int x = 0; x < loop_cnt; x++) { xyz.TestMethod(); }
    duration = System.nanoTime() - duration;
    System.out.println("duration: " + duration/1000/loop_cnt + " us per iteration");
  }

}


Execution shows (tested on PPC64):
openjdk_9/bin/java -XX:+UseConcMarkSweepGC -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:CompileCommand="exclude TestAllocMemBar::dont_inline_me" -XX:+PrintInlining -XX:+PrintEscapeAnalysis -XX:-EliminateAllocations TestAllocMemBar
…
======== Connection graph for  TestAllocMemBar::TestMethod
JavaObject NoEscape(NoEscape) [ 59F 179F [ 37 42 ]]   25        Allocate        ===  5  6  7  8  1 ( 23  21  22  1  10  1  1 ) [[ 26  27  28  35  36  37 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:0 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 25P [ 42 59b ]]   37 Proj    ===  25  [[ 38  42  59 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 37 25P [ 179b ]]   42        CheckCastPP     ===  39  37  [[ 179  183  179  119  98  93 ]]  #TestAllocMemBar$A:NotNull:exact *  Oop:TestAllocMemBar$A:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:0

JavaObject NoEscape(NoEscape) NSR [ 153F [ 131 136 180 179 ]]   119     Allocate        ===  105  100  101  8  1 ( 54  117  22  1  10  42  1 ) [[ 120  121  122  129  130  131 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:13 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 119P [ 136 153b ]]   131     Proj    ===  119  [[ 132  136  153 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 131 119P [ 180 ]]   136      CheckCastPP     ===  133  131  [[ 180  193 ]]  #TestAllocMemBar$B:NotNull:exact *  Oop:TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 136 119P [ 179 ]]   180      EncodeP === _  136  [[ 181 ]]  #narrowoop: TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar$B::<init> @ bci:11 TestAllocMemBar::TestMethod @ bci:19

                            @ 5   TestAllocMemBar$A::<init> (10 bytes)   inline (hot)
                              @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
                            @ 10   TestAllocMemBar::dont_inline_me (1 bytes)   not compilable (disabled)
                            @ 19   TestAllocMemBar$B::<init> (15 bytes)   inline (hot)
                              @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
                            @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
                            @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
duration: 3 us per iteration


So you can see that both allocations have the state NoEscape, but there is a safepoint (the non-inlined call) between them. A concurrent GC could access the object header and read stale data (and possibly crash). The OptoAssembly output shows that the MemBar was optimized out (probably due to 8136596).

However, we may be lucky: maybe no concurrent GC accesses the header of newly created objects. I don't know whether this is true, which is why I posted this question originally. Keep in mind that objects can be allocated in the old gen.

I can still imagine that these two optimizations may be dangerous.

Best regards,
  Martin


From: Hui Shi [mailto:hui.shi at linaro.org]
Sent: Wednesday, 16 December 2015 13:27
To: Andrew Haley <aph at redhat.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Vitaly Davidovich <vitalyd at gmail.com>; Doerr, Martin <martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin <mikael.gerdin at oracle.com>
Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode

Thanks Andrew, Goetz and all!

The major concern is whether removing the storestore barrier could cause other threads to read stale data for a newly allocated object. Other threads include Java threads and concurrent GC threads. The following analysis suggests that it is safe.

1. If the BCEA result says "this" (b) escapes in its initializer, the change will not optimize away the storestore barrier.
2. If the BCEA result says "this" (b) does not escape in its initializer, it is safe to remove the storestore.
   2.1 If there is a safepoint between storestore and release, b is visible to GC in the initializer, but there should be a memory barrier at the safepoint.
   2.2 If there is no safepoint between storestore and release, b becomes visible to other threads only after the release memory barrier.
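
The case analysis above can be read as a small decision rule. The following Java sketch only illustrates that reading and is not HotSpot code; the class, enum, and method names are hypothetical and were chosen for this example.

```java
public class StoreStoreElisionSketch {

    /** Hypothetical summary of what BCEA reports for "this" in an initializer. */
    enum ThisEscape { ESCAPES_IN_INIT, DOES_NOT_ESCAPE_IN_INIT }

    /** Outcome for the storestore barrier emitted after the allocation. */
    enum Decision { KEEP_STORESTORE, ELIDE_STORESTORE }

    /** Case 1 keeps the barrier; cases 2.1 and 2.2 elide it. */
    static Decision decide(ThisEscape thisEscape) {
        return thisEscape == ThisEscape.ESCAPES_IN_INIT
                ? Decision.KEEP_STORESTORE
                : Decision.ELIDE_STORESTORE;
    }

    /** Names the barrier the analysis relies on once the storestore is elided. */
    static String orderingAfterElision(boolean safepointBeforeRelease) {
        return safepointBeforeRelease
                ? "memory barrier at the safepoint (case 2.1)"
                : "release barrier at the end of the initializer (case 2.2)";
    }

    public static void main(String[] args) {
        System.out.println(decide(ThisEscape.ESCAPES_IN_INIT));
        System.out.println(decide(ThisEscape.DOES_NOT_ESCAPE_IN_INIT));
        System.out.println(orderingAfterElision(true));
        System.out.println(orderingAfterElision(false));
    }
}
```

Note that the rule takes only the initializer's view of "this" as input; whether that is sufficient is the question under discussion in this thread.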

Case #1
A a = new A();
safepoint // a can be reached from GC
new B(a)

allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
a.x = this;  // b might be visible to other threads here
....
release
-------- init end

The BCEA result indicates that "this" (b) is not local and not arg_stack. So "b" is treated as escaped in its initializer, and the change will not optimize away the storestore barrier.
[EA] estimated escape information for B::<init>
     non-escaping args:      {}
     stack-allocatable args: {1}
     return non-local value
     modified args:     0x6    0x6
     flags:
b = "this" is not local and not arg_stack.
a is arg_stack, which means it is passed in and not assigned to another object in the initializer.
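
To make this reading of the dump concrete, here is a hedged Java sketch. It assumes that argument 0 of the initializer is "this" and that the two printed sets ("non-escaping args" and "stack-allocatable args") correspond to "local" and "arg_stack"; the class and method names are hypothetical and are not part of HotSpot.

```java
import java.util.BitSet;

public class BceaDumpSketch {

    /** Assumption: argument index 0 of an initializer is "this". */
    static final int THIS_ARG = 0;

    /** Builds a BitSet from the indices printed in the dump, e.g. {1}. */
    static BitSet setOf(int... indices) {
        BitSet bits = new BitSet();
        for (int i : indices) {
            bits.set(i);
        }
        return bits;
    }

    /** "this" is treated as escaping when it appears in neither set. */
    static boolean thisEscapesInInit(BitSet nonEscapingArgs, BitSet stackAllocatableArgs) {
        return !nonEscapingArgs.get(THIS_ARG) && !stackAllocatableArgs.get(THIS_ARG);
    }

    public static void main(String[] args) {
        BitSet nonEscaping = setOf();        // non-escaping args:      {}
        BitSet stackAllocatable = setOf(1);  // stack-allocatable args: {1}
        System.out.println("this escapes in <init>: "
                + thisEscapesInInit(nonEscaping, stackAllocatable));
    }
}
```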

Case #2.1
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
safepoint  // "this" is in the oop map and might be visible to the GC thread here
....
release
-------- init end

Case #2.2
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
release
-------- init end
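
As a rough Java-level analogy of the barrier placement in these diagrams, the sketch below uses the static fence methods of java.lang.invoke.VarHandle (Java 9 and later). It is only an illustration: the JIT emits its own MemBar nodes, and these explicit fences are an assumption made to show the ordering, not what C2 generates. The class and method names are hypothetical.

```java
import java.lang.invoke.VarHandle;

public class FencePlacementSketch {

    static final class B {
        int f1;   // plain field, zeroed by the allocation
    }

    /** Mirrors the diagram: allocation, storestore, initializer body, release. */
    static B allocateAndInit(int value) {
        B b = new B();                  // allocation: header and zeroing stores
        VarHandle.storeStoreFence();    // the "storestore" line in the diagram
        b.f1 = value;                   // initializer body
        VarHandle.releaseFence();       // the "release" line at init end
        return b;                       // b may be published after this point
    }

    public static void main(String[] args) {
        System.out.println(allocateAndInit(42).f1);
    }
}
```

In case 2.2 the proposal amounts to dropping the first fence and relying on the second one alone.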

Regards
Hui

On 16 December 2015 at 00:15, Andrew Haley <aph at redhat.com> wrote:
On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote:

> Further, if the object is NoEscape it might not be scalar
> replaced. If I remember correctly, there are various conditions,
> e.g., too big, allocated in loop.

Well, that's the killer.  The definition of "escape" we need to use
here is the really, truly, honest-to-goodness one: that this object
never becomes visible to any other thread by any means.  Unless that
is so, all bets are off.  In this case, what is intended is "appears
in an OOP map".
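
As an illustration only, the stricter notion of "escape" can be sketched as a predicate that combines the escape-analysis state with whether the object appears in an OOP map. The names below are hypothetical, and the boolean input is assumed to be known to the caller; this is not how HotSpot represents it.

```java
public class StrictEscapeSketch {

    /** Escape states, named after those printed by the escape analysis. */
    enum EaState { NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE }

    /**
     * An object counts as invisible to other threads only if it is NoEscape
     * and never appears in an OOP map (for example, at a safepoint).
     */
    static boolean invisibleToOtherThreads(EaState state, boolean appearsInOopMap) {
        return state == EaState.NO_ESCAPE && !appearsInOopMap;
    }

    public static void main(String[] args) {
        // NoEscape, but live across a safepoint: still reachable by the GC.
        System.out.println(invisibleToOtherThreads(EaState.NO_ESCAPE, true));
        // NoEscape and never recorded in an OOP map.
        System.out.println(invisibleToOtherThreads(EaState.NO_ESCAPE, false));
    }
}
```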

Andrew.


