From edward.nevill at gmail.com  Wed Dec  2 14:24:26 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 02 Dec 2015 14:24:26 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
Message-ID: <1449066266.25167.8.camel@mylittlepony.linaroharston>

Hi

I have been trying to debug a problem with large code caches on JDK 9 over the past week and could do with some help/advice on how to proceed.

I have filed a JIRA issue

https://bugs.openjdk.java.net/browse/JDK-8144498

Here is my analysis of the problem so far. Apologies if this is a bit stream of consciousness.


Running jtreg/langtools with -XX:ReservedCodeCacheSize=512m generates a number of failures due to SEGVs, whereas running without this option passes all tests.

The set of tests which fails each time is different. For example on two back to back runs I get

FAILED: tools/javac/classfiles/attributes/annotations/RuntimeAnnotationsForInnerAnnotationTest.java
FAILED: tools/javac/T6410706.java
FAILED: tools/jdeps/DotFileTest.java
ed at arm64:~/jtreg/jtreg$ fgrep FAILED log_512m_2
FAILED: com/sun/javadoc/testSimpleTag/TestSimpleTag.java
FAILED: com/sun/javadoc/testWindowTitle/TestWindowTitle.java
FAILED: jdk/jshell/CompletionSuggestionTest.java

The command used to invoke jtreg was

/home/ed/images/jdk9-orig/bin/java -jar lib/jtreg.jar -vmoption:-XX:ReservedCodeCacheSize=512m -nr -conc:48 -timeout:99 -othervm -jdk:/home/ed/images/jdk9-orig -v1 -a -ignore:quiet /home/ed/new_jdk9/hs-comp/langtools/test

The problem can also be replicated with EEMBC GrinderBench, although it may require many hundreds of runs to trigger. The command I used to invoke GrinderBench is

/home/ed/images/jdk9-orig/bin/java -XX:ReservedCodeCacheSize=512m -classpath dist/fullset/bench1.jar org.eembc.grinderbench.CmdlineWrapper -r 1 -m 1 -t 4

For the purposes of the following I have chosen to investigate the GrinderBench failure because it is easier to debug than random failures in jtreg.

The SEGV occurs in a method which is called from SharedRuntime::resolve_opt_virtual_call_C. The call backtrace is about 20 frames long. The following are the oldest few frames.

....
#17 0x000003ff99717a44 in SharedRuntime::resolve_helper (thread=thread at entry=0x3ff94010000,
    is_virtual=is_virtual at entry=true, is_optimized=is_optimized at entry=true,
    __the_thread__=__the_thread__ at entry=0x3ff94010000)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1186
#18 0x000003ff99718988 in SharedRuntime::resolve_opt_virtual_call_C (thread=0x3ff94010000)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1441
#19 0x000003ff70ab23a8 in ?? ()
#20 0x000003fdd59228f0 in ?? ()

Looking at frame #19

(gdb) x/10i $pc-20
   0x3ff70ab2394:       mov     x0, x28
   0x3ff70ab2398:       mov     x8, #0x8950                     // #35152
   0x3ff70ab239c:       movk    x8, #0x9971, lsl #16
   0x3ff70ab23a0:       movk    x8, #0x3ff, lsl #32
   0x3ff70ab23a4:       blr     x8
=> 0x3ff70ab23a8:       isb
   0x3ff70ab23ac:       str     xzr, [x28,#440]
   0x3ff70ab23b0:       str     xzr, [x28,#448]
   0x3ff70ab23b4:       ldr     x8, [x28,#8]
   0x3ff70ab23b8:       cbnz    x8, 0x3ff70ab2454

This is a stub for resolve_opt_virtual_call. So here it calls 0x3ff99718950 and disassembling that

(gdb) x/i 0x3ff99718950
   0x3ff99718950 <SharedRuntime::resolve_opt_virtual_call_C(JavaThread*)>:
    stp x29, x30, [sp,#-80]!
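
For reference, the target of the blr can be reconstructed by hand from the three move instructions in the stub. The following is a minimal sketch, not HotSpot code; the helper name movz_movk_movk is made up for illustration, and the immediates are the ones shown in the disassembly above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of HotSpot): combine the 16-bit immediates of a
// movz / movk lsl #16 / movk lsl #32 sequence into the address they build.
constexpr uint64_t movz_movk_movk(uint64_t imm0, uint64_t imm16, uint64_t imm32) {
  return (imm0 & 0xffff) | ((imm16 & 0xffff) << 16) | ((imm32 & 0xffff) << 32);
}

// Immediates taken from the stub at 0x3ff70ab2398 .. 0x3ff70ab23a0.
static_assert(movz_movk_movk(0x8950, 0x9971, 0x3ff) == 0x3ff99718950ULL,
              "the stub loads the address that gdb resolves above");
```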

So it is calling SharedRuntime::resolve_opt_virtual_call_C which is correct according to the above stack trace. However, looking at the previous frame

(gdb) x/2g $fp
0x3ff98dede60:  0x0000000000000138      0x000003ff7122469c
(gdb) x/12i 0x000003ff7122469c-40
   0x3ff71224674:       ret
   0x3ff71224678:       mov     x8, #0x28f0                     // #10480
   0x3ff7122467c:       movk    x8, #0xd592, lsl #16
   0x3ff71224680:       movk    x8, #0x3fd, lsl #32
   0x3ff71224684:       str     x8, [sp,#8]
   0x3ff71224688:       mov     x8, #0xffffffffffffffff         // #-1
   0x3ff7122468c:       str     x8, [sp]
   0x3ff71224690:       adrp    x8, 0x3ff70ab2000   <<< HERE
   0x3ff71224694:       add     x8, x8, #0x300      <<<
   0x3ff71224698:       blr     x8                  <<<
   0x3ff7122469c:       b       0x3ff712242f8       <<<
   0x3ff712246a0:       adrp    x8, 0x3ff70adf000

The code marked HERE is an out-of-line stub which is calling the resolve_opt_virtual_call stub. So far so good.

*** But this is not the correct code to call resolve_opt_virtual_call ****

This is in fact the code generated by the following from c1_CodeStubs_aarch64.cpp

void CounterOverflowStub::emit_code(LIR_Assembler* ce) {
  __ bind(_entry);
  Metadata *m = _method->as_constant_ptr()->as_metadata();
  __ mov_metadata(rscratch1, m);
  ce->store_parameter(rscratch1, 1);
  ce->store_parameter(_bci, 0);
  __ far_call(RuntimeAddress(Runtime1::entry_for(Runtime1::counter_overflow_id)));
  ce->add_call_info_here(_info);
  ce->verify_oop_map(_info);
  __ b(_continuation);
}

So this code is supposed to be calling Runtime1::counter_overflow. The -1 for the BCI is the InvocationEntryBci because this is an invocation entry counter overflow and it is this -1 which eventually causes the SEGV because it is being used as a genuine index into the bytecode to get a constant pool index for the invoke.

But it shouldn't be calling SharedRuntime::resolve_opt_virtual_call_C, it should be calling Runtime1::counter_overflow.

Tracing back where this out of line stub is called from

(gdb) x/10i 0x3ff712242f8-36
   0x3ff712242d4:       mov     x0, #0xc250                     // #49744
   0x3ff712242d8:       movk    x0, #0xd592, lsl #16
   0x3ff712242dc:       movk    x0, #0x3fd, lsl #32
   0x3ff712242e0:       ldr     w6, [x0,#220]
   0x3ff712242e4:       add     w6, w6, #0x8
   0x3ff712242e8:       str     w6, [x0,#220]
   0x3ff712242ec:       and     w6, w6, #0x1ff8
   0x3ff712242f0:       cmp     w6, #0x0
   0x3ff712242f4:       b.eq    0x3ff71224678   <<<< HERE is the b to the out of line stub
   0x3ff712242f8:       str     w5, [sp,#52]
(gdb)

So the above confirms that it is really doing a counter overflow but calling resolve_opt_virtual_call.
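
The inline sequence above amounts to the check sketched below. This is only a model read off the disassembly: the function names are invented here, and the increment (0x8) and mask (0x1ff8) are simply the immediates of the add and and instructions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the check at 0x3ff712242e0 .. 0x3ff712242f4.
// Returns true when the b.eq to the out-of-line stub would be taken.
inline bool bump_and_check(uint32_t& counter) {
  counter += 0x8;                   // add w6, w6, #0x8 ; str w6, [x0,#220]
  return (counter & 0x1ff8) == 0;   // and w6, w6, #0x1ff8 ; cmp w6, #0x0 ; b.eq
}

// Count how many bumps, starting from a zero counter, reach the stub.
inline int bumps_until_overflow() {
  uint32_t counter = 0;
  int n = 1;
  while (!bump_and_check(counter)) {
    ++n;
  }
  assert(counter == 0x2000);        // first value whose masked bits are all zero
  return n;
}
```

In this model, with these two constants, the branch to the stub is taken once every 1024 increments.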

So I tried changing the 'far_call' method in macroAssembler_aarch64.cpp to use movz/movk/movk instead of adrp/add.

IE
    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    adrp(tmp, entry, offset);
    add(tmp, tmp, offset);

becomes

    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    movptr(tmp, (uintptr_t)entry.target());
    //adrp(tmp, entry, offset);
    //add(tmp, tmp, offset);

This caused GrinderBench to start working (at least, no failures after about 5000 runs).

So I changed this to read

    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    movptr(tmp, (uintptr_t)entry.target());
    adrp(tmp, entry, offset);
    add(tmp, tmp, offset);

IE. So it generates both the movz/movk/movk version and the adrp/add version, but uses the adrp version, discarding the result of the movz/movk/movk version.

Now when I list the out of line stub in gdb I get

(gdb) x/10i 0x000003ff5521d5dc-32
   0x3ff5521d5bc:       mov     x8, #0xffffffffffffffff         // #-1
   0x3ff5521d5c0:       str     x8, [sp]
   0x3ff5521d5c4:       mov     x8, #0x9780            <<< movz/movk/movk -> 0x3ff54c89780
   0x3ff5521d5c8:       movk    x8, #0x54c8, lsl #16
   0x3ff5521d5cc:       movk    x8, #0x3ff, lsl #32
   0x3ff5521d5d0:       adrp    x8, 0x3ff54ab2000      <<< adrp/add -> 0x3ff54ab2300
   0x3ff5521d5d4:       add     x8, x8, #0x300
   0x3ff5521d5d8:       blr     x8
   0x3ff5521d5dc:       b       0x3ff5521d308
   0x3ff5521d5e0:       mov     x8, #0xfc80                     // #64640

So the adrp/add and movz/movk/movk address different runtime stubs. Disassembling both shows that the adrp is addressing the resolve_opt_virtual_call stub and the movz/movk/movk is addressing the Runtime1::counter_overflow stub.

So it looks like the adrp is either not being relocated, or is being relocated incorrectly.

Any suggestions as to why it might be doing this??? I have had a long look at the pd_patch_* code and it seems correct to me.

Unfortunately it is difficult to debug because I cannot walk through it in gdb because of the infrequency (once in every few 100 runs), so I can only debug as above by looking at the core files generated.

Thanks for your help,
Ed.


From edward.nevill at gmail.com  Thu Dec  3 07:41:44 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 03 Dec 2015 07:41:44 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <1449066266.25167.8.camel@mylittlepony.linaroharston>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston>
Message-ID: <1449128504.15424.11.camel@mint>

On Wed, 2015-12-02 at 14:24 +0000, Edward Nevill wrote:
> So it looks like the adrp is either not being relocated, or is being relocated incorrectly.
> 
> Any suggestions as to why it might be doing this??? I have had a long look at the pd_patch_* code and it seems correct to me.

I think I have it on the run!

I believe the following code is getting a false positive

inline bool is_NativeCallTrampolineStub_at(address addr) {
  // Ensure that the stub is exactly
  //      ldr   xscratch1, L
  //      br    xscratch1
  // L:
  uint32_t *i = (uint32_t *)addr;
  return i[0] == 0x58000048 && i[1] == 0xd61f0100;
}


when called from the following in get_trampoline()

  address bl_destination
    = MacroAssembler::pd_call_destination(call_addr);
  if (code->content_contains(bl_destination) &&
      is_NativeCallTrampolineStub_at(bl_destination))
    return bl_destination;

which in turn is called from the following in pd_call_destination

  if (is_call()) {  
    address trampoline = nativeCall_at(addr())->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }

so the call destination for overflow_counter is matched as a false positive and the destination of a trampoline is returned instead, so the adrp is relocated to this.
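
To make the false positive concrete, here is a minimal sketch that applies the same two-word pattern test to a plain byte buffer, so it can be run outside the VM. The function names and the literal value are made up for illustration. The two constants are the ones in is_NativeCallTrampolineStub_at above; they decode to ldr x8, .+8 and br x8.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same pattern test as above, but on raw bytes (hypothetical helper).
inline bool looks_like_trampoline(const unsigned char* p) {
  uint32_t insn[2];
  std::memcpy(insn, p, sizeof insn);
  return insn[0] == 0x58000048 &&   // ldr x8, .+8
         insn[1] == 0xd61f0100;     // br  x8
}

// Read the 64-bit literal that follows the two instructions.
inline uint64_t trampoline_literal(const unsigned char* p) {
  uint64_t dest;
  std::memcpy(&dest, p + 8, sizeof dest);
  return dest;
}

// Any 16 bytes of this shape pass the test, whether or not the call being
// inspected was ever meant to go through them.
inline uint64_t demo_false_positive() {
  const uint32_t insn[2] = {0x58000048, 0xd61f0100};
  const uint64_t literal = 0x1122334455667788ULL;  // arbitrary, for illustration
  unsigned char buf[16];
  std::memcpy(buf, insn, sizeof insn);
  std::memcpy(buf + 8, &literal, sizeof literal);
  assert(looks_like_trampoline(buf));
  return trampoline_literal(buf);
}
```

The pattern says nothing about which call site owns the stub, so a stale pointer that happens to land on one is indistinguishable from a genuine trampoline call.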

Should the following line

    address trampoline = nativeCall_at(addr())->get_trampoline();

be

    address trampoline = nativeCall_at(orig_addr)->get_trampoline();

IE the address before relocation. Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.

Regards,
Ed


From aph at redhat.com  Thu Dec  3 09:36:35 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 3 Dec 2015 09:36:35 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <1449128504.15424.11.camel@mint>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston>
	<1449128504.15424.11.camel@mint>
Message-ID: <56600D23.1020208@redhat.com>

On 03/12/15 07:41, Edward Nevill wrote:
> Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.

If you can catch adrp being used where it randomly points somewhere
in a code buffer, then that undoubtedly would be a bug.

But pd_call_destination is surely not supposed to be used on a
branch whose destination has not been set: in that case it'll
return garbage, and it doesn't matter what kind of garbage.

The code in pd_set_call_destination certainly does look wrong,
however.  There is no guarantee at all that it points anywhere,
so dereferencing the adrp might be wrong.  It might be that the
logic here needs redesigning.

Andrew.


From edward.nevill at gmail.com  Thu Dec  3 10:15:32 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 03 Dec 2015 10:15:32 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <56600D23.1020208@redhat.com>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston>
	<1449128504.15424.11.camel@mint> <56600D23.1020208@redhat.com>
Message-ID: <1449137732.6644.11.camel@mylittlepony.linaroharston>

On Thu, 2015-12-03 at 09:36 +0000, Andrew Haley wrote:
> On 03/12/15 07:41, Edward Nevill wrote:
> > Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.
> 
> If you can catch adrp being used where it randomly points somewhere
> in a code buffer, then that undoubtedly would be a bug.

I assert it is

I have trapped it in gdb at the point where it is making the incorrect relocation.

Winding back the call trace to CallRelocation::fix_relocation_after_move

void CallRelocation::fix_relocation_after_move(const CodeBuffer* src, CodeBuffer* dest) {
  // Usually a self-relative reference to an external routine.
  // On some platforms, the reference is absolute (not self-relative).
  // The enhanced use of pd_call_destination sorts this all out.
  address orig_addr = old_addr_for(addr(), src, dest);
  address callee    = pd_call_destination(orig_addr);
  // Reassert the callee address, this time in the new copy of the code.
  pd_set_call_destination(callee);
}

(gdb) p/x callee
$9 = 0x3ff68ab2300
(gdb) p/x orig_addr	;; Relocating from 0x3ff68d16f28 -> 0x3ff691f02e8
$10 = 0x3ff68d16f28
(gdb) p/x addr()
$11 = 0x3ff691f02e8
(gdb) x/2i orig_addr
   0x3ff68d16f28:	adrp	x8, 0x3ff68d16000  ;; Original call destination
   0x3ff68d16f2c:	add	x8, x8, #0x500     ;; == 0x3ff68d16500
(gdb) x/2i addr()
   0x3ff691f02e8:	adrp	x8, 0x3ff691f0000  ;; Copied but not relocated
   0x3ff691f02ec:	add	x8, x8, #0x500     ;; dest == 0x3ff691f0500
(gdb) x/2i 0x3ff68d16500
   0x3ff68d16500:	stp	x29, x30, [sp,#-16]! ;; overflow_counter stub
   0x3ff68d16504:	mov	x29, sp            ;; pointed to by original call dest above
(gdb) x/2i 0x3ff691f0500
   0x3ff691f0500:	ldr	x8, 0x3ff691f0508  ;; copied but not relocated dest points here
   0x3ff691f0504:	br	x8                 ;; to a trampoline stub, but only by accident
                                                   ;; essentially pointing to a random place in
                                                   ;; the codebuf
(gdb) x/g 0x3ff691f0508 
0x3ff691f0508:	0x000003ff68ab2300                 ;; so since it thinks it is a trampoline stub
                                                   ;; it picks up this address as the final adr
                                                   ;; which we see in callee above

This is because it is using addr() in pd_call_destination, rather than orig_addr. IE. it is using the copied, but not relocated version, therefore the adrp is transiently pointing into garbage. Using the orig_addr should be correct.
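
The arithmetic behind the gdb session can be sketched as follows. This is a simplified model of an adrp/add pair, not HotSpot code: the struct and function names are hypothetical, and the addresses in the checks are the ones from the session above. ADRP encodes a page delta relative to the page of the instruction itself, so a byte-for-byte copy at a new address decodes to a different target until it is re-encoded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an adrp/add pair.
struct AdrpAdd {
  int64_t  page_delta;  // signed delta in 4 KiB pages, as held in the ADRP immediate
  uint64_t low12;       // ADD immediate (low 12 bits of the target)

  // Address the pair yields when it sits at address pc.
  constexpr uint64_t target_at(uint64_t pc) const {
    return (pc & ~uint64_t{0xfff}) + (static_cast<uint64_t>(page_delta) << 12) + low12;
  }
};

// Encode a pair located at pc so that it reaches target.
constexpr AdrpAdd encode(uint64_t pc, uint64_t target) {
  return AdrpAdd{static_cast<int64_t>(target >> 12) - static_cast<int64_t>(pc >> 12),
                 target & 0xfff};
}

// At its original address the pair reaches the intended stub ...
static_assert(encode(0x3ff68d16f28, 0x3ff68d16500).target_at(0x3ff68d16f28) == 0x3ff68d16500,
              "original call destination");
// ... but the same bits copied to the new address decode to a different place.
static_assert(encode(0x3ff68d16f28, 0x3ff68d16500).target_at(0x3ff691f02e8) == 0x3ff691f0500,
              "copied but not relocated destination");
```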


From adinn at redhat.com  Thu Dec  3 11:02:46 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 3 Dec 2015 11:02:46 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <1449137732.6644.11.camel@mylittlepony.linaroharston>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston>
	<1449128504.15424.11.camel@mint> <56600D23.1020208@redhat.com>
	<1449137732.6644.11.camel@mylittlepony.linaroharston>
Message-ID: <56602156.9030103@redhat.com>

On 03/12/15 10:15, Edward Nevill wrote:
> On Thu, 2015-12-03 at 09:36 +0000, Andrew Haley wrote:
>> On 03/12/15 07:41, Edward Nevill wrote:
>>> Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.
>>
>> If you can catch adrp being used where it randomly points somewhere
>> in a code buffer, then that undoubtedly would be a bug.
> 
> I assert it is
> 
> I have trapped it in gdb at the point where it is making the incorrect relocation.

<snip>

Hmm, that looks to me like it is the cause of the problem.

Interestingly, I just glanced at what the ppc code does and I am not
clear why it is not subject to the same problem -- admittedly only on a
half-arsed understanding of what it is doing. It might be worth you
looking at it to see if there is something I have missed which sheds
light on the AArch64 case.

regards,


Andrew Dinn
-----------


From edward.nevill at gmail.com  Thu Dec  3 12:32:23 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 03 Dec 2015 12:32:23 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <56600D23.1020208@redhat.com>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston>
	<1449128504.15424.11.camel@mint> <56600D23.1020208@redhat.com>
Message-ID: <1449145943.6644.22.camel@mylittlepony.linaroharston>

On Thu, 2015-12-03 at 09:36 +0000, Andrew Haley wrote:
> On 03/12/15 07:41, Edward Nevill wrote:

> The code in pd_set_call_destination certainly does look wrong,
> however.  There is no guarantee at all that it points anywhere,
> so dereferencing the adrp might be wrong.  It might be that the
> logic here needs redesigning.

I believe the code in pd_set_call_destination is correct although it is fragile.

Again it is looking at the copied but not relocated code as in pd_call_destination.

However, the NativeCall::get_trampoline() method called by pd_set_call_destination checks that the destination is within the code blob before examining it.

From NativeCall::get_trampoline()

  if (code->content_contains(bl_destination) &&
      is_NativeCallTrampolineStub_at(bl_destination))
    return bl_destination;

so code->content_contains(bl_destination) checks that the destination is within the code blob.

We know that if a trampoline exists it must be in the same code blob (that is the whole purpose of the trampoline).

Regards,
Ed.


From fei.yang0953 at yahoo.com  Thu Dec  3 14:22:26 2015
From: fei.yang0953 at yahoo.com (felix yang)
Date: Thu, 3 Dec 2015 14:22:26 +0000 (UTC)
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized MLA/MLS
	instructions
References: <537574996.11929209.1449152546033.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com>

Hi,
  Can someone help review and sponsor this code generation improvement for the aarch64 port?
  Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
  Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.00/

  The hotspot/test/compiler/loopopts/superword/SumRed_Int.java can serve as a test case. With this patch, the following code snippet generated by C2:
    0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s
    0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s
    0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s
    0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s
    0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s
  will be further optimized into:
    0x0000007f9cdb86dc: mul      v19.4s, v16.4s, v17.4s
    0x0000007f9cdb86e0: mla      v19.4s, v16.4s, v18.4s
    0x0000007f9cdb86e4: mla      v19.4s, v17.4s, v18.4s

  About 13% performance gain achieved for the test case on my aarch64 server.
  Tested with jtreg hotspot & langtools. Results are the same before and after.
  Is it OK to push?

Felix,
Thanks for your help.


From aph at redhat.com  Thu Dec  3 14:40:07 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 3 Dec 2015 14:40:07 +0000
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized
	MLA/MLS instructions
In-Reply-To: <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com>
References: <537574996.11929209.1449152546033.JavaMail.yahoo.ref@mail.yahoo.com>
	<537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <56605447.9070103@redhat.com>

It would help everybody if you did "hg commit" with an appropriate
changeset comment before generating the webrev.

Andrew.

From edward.nevill at gmail.com  Fri Dec  4 09:59:46 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Fri, 04 Dec 2015 09:59:46 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
	generates SEGV
Message-ID: <1449223186.15424.42.camel@mint>

Hi,

Please review the following webrev

http://cr.openjdk.java.net/~enevill/8144498/webrev.0/

JIRA issue: https://bugs.openjdk.java.net/browse/JDK-8144498

This fixes an issue where random SEGVs were generated with -XX:ReservedCodeCacheSize > 128m

The problem was that pd_call_destination was using addr() rather than orig_addr. IE. It was using the address in the copied, but not relocated code.

It was then following a call destination to determine whether or not this was a call to a trampoline (in order that it could substitute the final trampoline address).

Usually this worked OK because it ended up just referencing a random address in the code buffer. However, very occasionally it would point to a trampoline somewhere in the code buffer and get a false positive.

In this case it would substitute the final address of that trampoline.

The result was that it would very occasionally relocate the address of some call to a random trampoline stub.

I have tested this with jtreg hotspot and langtools with -XX:ReservedCodeCacheSize=256m and without specifying any ReservedCodeCacheSize (so it defaults to 128m).

With ReservedCodeCacheSize == default

Hotspot (original): Test results: passed: 935; failed: 22; error: 12
Hotspot (patched): Test results: passed: 942; failed: 15; error: 12
Langtools (original): Test results: passed: 3,313; failed: 33
Langtools (patched): Test results: passed: 3,316; failed: 33

With -XX:ReservedCodeCacheSize=256m

Hotspot (original): Test results: passed: 865; failed: 19; error: 85
Hotspot (patched): Test results: passed: 946; failed: 10; error: 13
Langtools (original): Test results: passed: 3,049; failed: 77; error: 223
Langtools (patched): Test results: passed: 3,314; failed: 33

So in all cases it generates results as good, or better than the original. In the case of langtools with a 256m buffer it goes from 300 failures+errors to just 33.

I have also tested this with EEMBC GrinderBench which also showed the problem every few 100 runs. I have run this over 5000 times with no occurrence of the problem.

Thanks for your review,
Ed.


From adinn at redhat.com  Fri Dec  4 10:11:27 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 4 Dec 2015 10:11:27 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <1449223186.15424.42.camel@mint>
References: <1449223186.15424.42.camel@mint>
Message-ID: <566166CF.5000006@redhat.com>

On 04/12/15 09:59, Edward Nevill wrote:
> Hi,
> 
> Please review the following webrev . . .

Reviewed by me as an AArch64-only patch.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors:Michael Cunningham (US), Michael O'Neill(Ireland), Paul Argiry
(US)

From aph at redhat.com  Fri Dec  4 16:14:03 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 4 Dec 2015 16:14:03 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <1449223186.15424.42.camel@mint>
References: <1449223186.15424.42.camel@mint>
Message-ID: <5661BBCB.5000307@redhat.com>

Your fix looks OK.

However, there is one other fix which would be nice.

We use call relocs for things other than bl instructions.  This
is because some things (e.g. MachUEPNode::emit) do this:

  __ far_jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));

Only bl immediate instructions are ever used to jump to trampolines.
This is essential because they must be patchable.

Because of this, in here:

  if (is_call()) {
    address trampoline = nativeCall_at(orig_addr)->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }

the is_call() could be replaced by NativeCall::is_call_at().
Otherwise we're pointlessly decoding instructions and chasing
nonexistent trampolines.  Could you try that?

Thanks,

Andrew.


From aph at redhat.com  Fri Dec  4 17:38:19 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 4 Dec 2015 17:38:19 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <5661BBCB.5000307@redhat.com>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
Message-ID: <5661CF8B.6040405@redhat.com>

On 12/04/2015 04:14 PM, Andrew Haley wrote:
> Your fix looks OK.

Scratch that, I'm seeing NetBeans failures with your patch.  I think
it's because you're missing a trampoline destination when the initial
relocation is being done.  This is because get_trampoline() looks for
a trampoline_stub reloc based on orig_addr, and this can never work.

(When a trampoline call is first created it is a call to self; the
reloc is the only way to find the trampoline.  For this reason, you
must use nativeCall_at(addr())->get_trampoline().)

I'm going to suggest this as a simpler fix:

address Relocation::pd_call_destination(address orig_addr) {
  assert(is_call(), "should be a call here");
  if (NativeCall::is_call_at(addr())) {  // is a BL instruction
    address trampoline = nativeCall_at(addr())->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }
  if (orig_addr != NULL) {
    return MacroAssembler::pd_call_destination(orig_addr);
  }
  return MacroAssembler::pd_call_destination(addr());
}

I think it's right because this way we only follow real BL
instructions, and if these point to trampolines they must be within
the blob which is being relocated.  I think this will fix your problem
because such BL instructions cannot point to anywhere wild.
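
To illustrate what "is a BL instruction" means at the encoding level, here is a minimal sketch. It is a stand-in written for this note, not the implementation of NativeCall::is_call_at, and both function names are hypothetical. BL immediate has 100101 in bits 31..26 and a signed 26-bit word offset in the low bits; blr x8, as emitted by far_call, has a different opcode and so is never followed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical test for a BL immediate: top six bits are 100101 (0x25).
constexpr bool is_bl(uint32_t insn) {
  return (insn >> 26) == 0x25;
}

// Target of a BL located at pc: sign-extend imm26, scale by 4, add to pc.
constexpr uint64_t bl_target(uint64_t pc, uint32_t insn) {
  return pc + (static_cast<uint64_t>(
                   static_cast<int64_t>((insn & 0x03ffffff) ^ 0x02000000) - 0x02000000)
               << 2);
}

static_assert(is_bl(0x94000000), "bl with zero offset");
static_assert(!is_bl(0xd63f0100), "blr x8 is not a BL immediate");
static_assert(bl_target(0x1000, 0x94000000) == 0x1000, "a call to self");
static_assert(bl_target(0x1000, 0x97ffffff) == 0x0ffc, "offset of -1 word");
```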

Thanks,

Andrew.


From edward.nevill at gmail.com  Fri Dec  4 17:43:37 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Fri, 04 Dec 2015 17:43:37 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <5661BBCB.5000307@redhat.com>
References: <1449223186.15424.42.camel@mint>  <5661BBCB.5000307@redhat.com>
Message-ID: <1449251017.4670.3.camel@mint>

On Fri, 2015-12-04 at 16:14 +0000, Andrew Haley wrote: 
> Your fix looks OK.
> 
> However, there is one other fix which would be nice.

> if (is_call()) {
>     address trampoline = nativeCall_at(orig_addr)->get_trampoline();
>     if (trampoline) {
>       return nativeCallTrampolineStub_at(trampoline)->destination();
>     }
>   }
> 
> the is_call() could be replaced by NativeCall::is_call_at().
> Otherwise we're pointlessly decoding instructions and chasing
> nonexistent trampolines.  Could you try that?

Done. New webrev at

http://cr.openjdk.java.net/~enevill/8144498/webrev.1

jtreg results with ReservedCodeCacheSize=256m

Hotspot (original): Test results: passed: 865; failed: 19; error: 85
Hotspot (patched): Test results: passed: 947; failed: 10; error: 12
Langtools (original): Test results: passed: 3,049; failed: 77; error: 223
Langtools (patched): Test results: passed: 3,316; failed: 33

Many thanks,
Ed.


From fei.yang0953 at yahoo.com  Sun Dec  6 14:33:46 2015
From: fei.yang0953 at yahoo.com (felix yang)
Date: Sun, 6 Dec 2015 14:33:46 +0000 (UTC)
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized
	MLA/MLS instructions
In-Reply-To: <56605447.9070103@redhat.com>
References: <56605447.9070103@redhat.com>
Message-ID: <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com>

Done. Currently, I have two webrevs which are under review. I have recreated both of them:
Bug: https://bugs.openjdk.java.net/browse/JDK-8144201
Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.01
Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.01
Is that OK?  Thanks.
 

    On Thursday, December 3, 2015 10:40 PM, Andrew Haley <aph at redhat.com> wrote:
 

 It would help everybody if you did "hg commit" with an appropriate
changeset comment before generating the webrev.

Andrew.


From aph at redhat.com  Mon Dec  7 09:48:36 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 7 Dec 2015 09:48:36 +0000
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized
	MLA/MLS instructions
In-Reply-To: <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com>
References: <56605447.9070103@redhat.com>
	<792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <566555F4.6090202@redhat.com>

On 06/12/15 14:33, felix yang wrote:
> Done. Currently, I have two webrevs which are under review. I have recreated both of them:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8144201
> Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.01
> Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
> Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.01
> Is that OK?

No, the comment is not complete.

Please make sure that you have Jcheck installed in your Mercurial.

http://openjdk.java.net/projects/code-tools/jcheck/

Andrew.


From edward.nevill at gmail.com  Mon Dec  7 12:22:14 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 07 Dec 2015 12:22:14 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <5661CF8B.6040405@redhat.com>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
	<5661CF8B.6040405@redhat.com>
Message-ID: <1449490934.12382.49.camel@mint>

On Fri, 2015-12-04 at 17:38 +0000, Andrew Haley wrote:
> On 12/04/2015 04:14 PM, Andrew Haley wrote:
> I'm going to suggest this as a simpler fix:
> 
> address Relocation::pd_call_destination(address orig_addr) {
>   assert(is_call(), "should be a call here");
>   if (NativeCall::is_call_at(addr())) {  // is a BL instruction
>     address trampoline = nativeCall_at(addr())->get_trampoline();
>     if (trampoline) {
>       return nativeCallTrampolineStub_at(trampoline)->destination();
>     }
>   }
>   if (orig_addr != NULL) {
>     return MacroAssembler::pd_call_destination(orig_addr);
>   }
>   return MacroAssembler::pd_call_destination(addr());
> }
> 
> I think it's right because this way we only follow real BL
> instructions, and if these point to trampolines they must be within
> the blob which is being relocated.  I think this will fix your problem
> because such BL instructions cannot point to anywhere wild.

I am not sure this works.

Firstly, in the case that far_branches are not enabled (IE the code cache is <= 128m), then there could be BL instructions to other addresses outside the current code blob. These are generated by far_call as follows.

  if (far_branches()) {
    unsigned long offset;
    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    adrp(tmp, entry, offset);
    add(tmp, tmp, offset);
    if (cbuf) cbuf->set_insts_mark();
    blr(tmp);
  } else {
    if (cbuf) cbuf->set_insts_mark();
    bl(entry);
  }

I cannot see what prevents one of these BLs from being followed and since they may have been copied but not relocated then they may end up pointing somewhere random in the code buffer which just happens to look like a trampoline. Admittedly, the probability of failure is vastly reduced because there are no genuine trampolines for it to latch on to.

This case can be avoided by adding a far_branches() predicate to pd_call_destination as follows.

  if (far_branches() && NativeCall::is_call_at(addr())) {  // is a BL instruction

Second, I am not sure that your assertion

> (When a trampoline call is first created it is a call to self; the
> reloc is the only way to find the trampoline.  For this reason, you
> must use nativeCall_at(addr())->get_trampoline().)

is correct. In MacroAssembler::trampoline_call I see

  if (Assembler::reachable_from_branch_at(pc(), entry.target())) {
    bl(entry.target());
  } else {
    bl(pc());
  }

so it only creates a call to self if the branch does not reach and as before you could have a dangling BL when this is copied.

I believe it would be possible to replace the above code section with simply

  bl(pc());

since it will always be relocated and therefore you can always generate the call to self.

All of this seems very fragile and I am wondering about the value of trampolines. The alternative to using trampolines would be to always generate

  adrp Xn, target & ~0xfff
  add  Xn, Xn, target & 0xfff
  blr  Xn

On most modern, out of order, dual issue implementations the ADRP and ADD will be folded into a single micro-op which will then be dual issued with the BLR so it doesn't end up costing us anything.

I did some experiments on 2 different implementations comparing the following 3 code fragments (where 'tramp_dest' is the final destination to be called).

1) Straight BL

tramp_test:
        mov     x2, x30
tramp1: 
        bl      tramp_dest
        subs    x0, x0, #1
        bne     tramp1
        ret     x2

2) Straight ADRP/ADD

tramp_test:
        mov     x2, x30
tramp1: 
        adr     x3, tramp_dest
        add     x3, x3, #0x0
        blr     x3
        subs    x0, x0, #1
        bne     tramp1
        ret     x2

3) Trampoline

tramp_test:
        mov     x2, x30
tramp1: 
        bl      tramp
        subs    x0, x0, #1
        bne     tramp1
        ret     x2

tramp:  
        ldr     x1, tramp_adcon
        br      x1
tramp_adcon:
        .dword  tramp_dest

I ran the above tests on 2 different implementations for 1E9 iterations. The results were

Imp 1: Straight BL = 4.50157 sec, ADRP/ADD = 4.50157 sec, trampoline = 6.00209 sec
Imp 2: Straight BL = 3.00107 sec, ADRP/ADD = 3.00106 sec, trampoline = 4.16815 sec
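
To put those timings in relative terms, here is a small sketch of the arithmetic. The helper names slowdown and ns_per_iteration are made up for illustration; the inputs are the figures quoted above.

```cpp
#include <cassert>

// Relative cost of one code sequence against a baseline, given wall-clock
// times measured over the same number of iterations.
inline double slowdown(double baseline_sec, double variant_sec) {
  assert(baseline_sec > 0.0);
  return variant_sec / baseline_sec;
}

// Average time per loop iteration in nanoseconds.
inline double ns_per_iteration(double total_sec, double iterations) {
  assert(iterations > 0.0);
  return total_sec * 1e9 / iterations;
}
```

Applied to the numbers above, slowdown(4.50157, 6.00209) is about 1.33 on Imp 1 and slowdown(3.00107, 4.16815) is about 1.39 on Imp 2. The per-iteration figures include the loop overhead (subs/bne), so they are an upper bound on the cost of the call sequence itself.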

Maybe we could just get rid of trampolines?

All the best,
Ed.


From edward.nevill at gmail.com  Mon Dec  7 13:45:19 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 07 Dec 2015 13:45:19 +0000
Subject: [aarch64-port-dev ] aarch32 project
Message-ID: <1449495919.12382.59.camel@mint>

Hi,

This is not really applicable to aarch64 but there is probably a large overlap of interest so I am posting this announcement here.

The aarch32 project has now been created and there is now an aarch32 specific mailing list aarch32-port-dev at openjdk.java.net

Please go to http://mail.openjdk.java.net/mailman/listinfo/aarch32-port-dev to sign up.

There will be no further announcements on this list.

Thanks for your time,
Ed Nevill


From aph at redhat.com  Mon Dec  7 14:20:37 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 7 Dec 2015 14:20:37 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <1449490934.12382.49.camel@mint>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
	<5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint>
Message-ID: <566595B5.9060400@redhat.com>

On 12/07/2015 12:22 PM, Edward Nevill wrote:

> I cannot see what prevents one of these BLs from being followed and
> since they may have been copied but not relocated then they may end
> up pointing somewhere random in the code buffer which just happens
> to look like a trampoline. Admittedly, the probability of failure is
> vastly reduced because there are no genuine trampolines for it to
> latch on to.

You must look inside get_trampoline().  It checks for this.

> Second, I am not sure that your assertion
> 
>> (When a trampoline call is first created it is a call to self; the
>> reloc is the only way to find the trampoline.  For this reason, you
>> must use nativeCall_at(addr())->get_trampoline().)
> 
> is correct. In MacroAssembler::trampoline_call I see
> 
>   if (Assembler::reachable_from_branch_at(pc(), entry.target())) {
>     bl(entry.target());
>   } else {
>     bl(pc());
>   }
> 
> so it only creates a call to self if the branch does not reach and
> as before you could have a dangling BL when this is copied.

It doesn't matter, because get_trampoline() checks for BLs outside
the  current method.

> I believe it would be possible to replace the above code section
> with simply
>
>   bl(pc());

> since it will always be relocated and therefore you can always
> generate the call to self.

True.  There are some other tidy-ups which could also be made in this
area, but none of it is terribly important as far as I can see.

> Maybe we could just get rid of trampolines?

There's no need.  In the commonest case we BL directly to the
destination, which is optimal.  Your ADRP/ADD examples aren't
patchable; if you are going to compare trampolines with something
else, whatever else you choose must be patchable, and it will be
slower and/or larger than BL.

Andrew.

From felix.yang at linaro.org  Mon Dec  7 15:17:30 2015
From: felix.yang at linaro.org (Felix Yang)
Date: Mon, 7 Dec 2015 23:17:30 +0800
Subject: [aarch64-port-dev ] RFR: 8144201: aarch64:
 jdk/test/com/sun/net/httpserver/Test6a.java fails with
 --enable-unlimited-crypto
Message-ID: <CACc5Y6TTZFEBb6Xtnv0i8+w=D9-XpTyA46Aiip9GQDEkzJxWGw@mail.gmail.com>

Hi,

    I have corrected the webrev issues in my previous mail. Thanks Edward for
providing the help.
    Now I am resending this mail:

    Could someone help review and sponsor this runtime fix for aarch64?
    Bug: https://bugs.openjdk.java.net/browse/JDK-8144201
    Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.02

    The test fails on aarch64 platform using openjdk8/9 configured with
--enable-unlimited-crypto.
    Reported error message: Execution failed: `main' threw exception:
java.io.IOException: Error writing request body to server.
    And the test passes with -XX:TieredStopAtLevel=3 or
-XX:-UseAESIntrinsics option.

    After narrowing down, I find the bug is caused by the
_cipherBlockChaining_decryptAESCrypt StubRoutine.
    The proposed patch fixes an obvious typo in this StubRoutine.  Passed
JTreg regression test (using openjdk8 built with --enable-unlimited-crypto).
    Is it OK to push?

Felix,
Thanks for your help.

From felix.yang at linaro.org  Mon Dec  7 15:26:06 2015
From: felix.yang at linaro.org (Felix Yang)
Date: Mon, 7 Dec 2015 23:26:06 +0800
Subject: [aarch64-port-dev ] RFR: 8144587: aarch64: generate vectorized
	MLA/MLS instructions
Message-ID: <CACc5Y6RASChRyqyJXTM1HoPkmNg5SQwrEtpoTGRGZBEr64Qtfg@mail.gmail.com>

Hi,

  I have corrected the webrev issues in my previous mail. Thanks Edward for
providing the help.
  Now I am resending this mail:

  Can someone help review and sponsor this code generation improvement for
aarch64 port?
  Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
  Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02

  The hotspot/test/compiler/loopopts/superword/SumRed_Int.java can serve
as a test case.
  With this patch, the following code snippet by C2:
    0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s
    0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s
    0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s
    0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s
    0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s
  will be further optimized into:
    0x0000007f9cdb86dc: mul      v19.4s, v16.4s, v17.4s
    0x0000007f9cdb86e0: mla      v19.4s, v16.4s, v18.4s
    0x0000007f9cdb86e4: mla      v19.4s, v17.4s, v18.4s
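
  A scalar model of why the two sequences are equivalent, as a sketch
  only: Vec4 stands in for one .4s register and the helper names are
  hypothetical, not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Models one .4s register; unsigned lanes so wrap-around is well defined.
using Vec4 = std::array<uint32_t, 4>;

inline Vec4 mul(const Vec4& a, const Vec4& b) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i) r[i] = a[i] * b[i];
  return r;
}

inline Vec4 add(const Vec4& a, const Vec4& b) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i) r[i] = a[i] + b[i];
  return r;
}

// MLA: acc += a * b, lane by lane.
inline void mla(Vec4& acc, const Vec4& a, const Vec4& b) {
  for (int i = 0; i < 4; ++i) acc[i] += a[i] * b[i];
}

// Five-instruction form: three MULs followed by two ADDs.
inline Vec4 before(const Vec4& v16, const Vec4& v17, const Vec4& v18) {
  const Vec4 t19 = mul(v16, v17);
  const Vec4 t16 = mul(v16, v18);
  const Vec4 t17 = mul(v18, v17);
  return add(add(t19, t16), t17);
}

// Three-instruction form: one MUL followed by two MLAs.
inline Vec4 after(const Vec4& v16, const Vec4& v17, const Vec4& v18) {
  Vec4 v19 = mul(v16, v17);
  mla(v19, v16, v18);
  mla(v19, v17, v18);
  return v19;
}

// Both forms compute v16*v17 + v16*v18 + v17*v18 in every lane.
inline void check_equivalent(const Vec4& a, const Vec4& b, const Vec4& c) {
  assert(before(a, b, c) == after(a, b, c));
}
```

  The only difference is the destination register: the original leaves
  the sum in v16, the optimized form in v19.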

  About 13% performance gain achieved for the test case on my aarch64
server.
  Tested with jtreg hotspot & langtools.  Results are the same before and
after.
  Is it OK to push?

Felix,
Thanks for your help.

From edward.nevill at gmail.com  Mon Dec  7 16:19:07 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 07 Dec 2015 16:19:07 +0000
Subject: [aarch64-port-dev ] RFR: 8144201: aarch64:
 jdk/test/com/sun/net/httpserver/Test6a.java fails with
 --enable-unlimited-crypto
In-Reply-To: <CACc5Y6TTZFEBb6Xtnv0i8+w=D9-XpTyA46Aiip9GQDEkzJxWGw@mail.gmail.com>
References: <CACc5Y6TTZFEBb6Xtnv0i8+w=D9-XpTyA46Aiip9GQDEkzJxWGw@mail.gmail.com>
Message-ID: <1449505147.12382.67.camel@mint>

Hi Felix,

Thanks for finding this.

The fix looks good to me.

Could we have an official reviewer, please?

Regards,
Ed.

On Mon, 2015-12-07 at 23:17 +0800, Felix Yang wrote:
> Hi,
> 
>     I have corrected the webrev issues in my previous mail. Thanks Edward for
> providing the help.
>     Now I am resending this mail:
> 
>     Could someone help review and sponsor this runtime fix for aarch64?
>     Bug: https://bugs.openjdk.java.net/browse/JDK-8144201
>     Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.02
> 
>     The test fails on aarch64 platform using openjdk8/9 configured with
> --enable-unlimited-crypto.
>     Reported error message: Execution failed: `main' threw exception:
> java.io.IOException: Error writing request body to server.
>     And the test passes with -XX:TieredStopAtLevel=3 or
> -XX:-UseAESIntrinsics option.
> 
>     After narrowing down, I find the bug is caused by the
> _cipherBlockChaining_decryptAESCrypt StubRoutine.
>     The proposed patch fixes an obvious typo in this StubRoutine.  Passed
> JTreg regression test (using openjdk8 built with --enable-unlimited-crypto).
>     Is it OK to push?
> 
> Felix,
> Thanks for your help.


From edward.nevill at gmail.com  Mon Dec  7 16:21:17 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 07 Dec 2015 16:21:17 +0000
Subject: [aarch64-port-dev ] RFR: 8144587: aarch64: generate vectorized
 MLA/MLS instructions
In-Reply-To: <CACc5Y6RASChRyqyJXTM1HoPkmNg5SQwrEtpoTGRGZBEr64Qtfg@mail.gmail.com>
References: <CACc5Y6RASChRyqyJXTM1HoPkmNg5SQwrEtpoTGRGZBEr64Qtfg@mail.gmail.com>
Message-ID: <1449505277.12382.69.camel@mint>

Hi Felix,

Thanks for this.

This optimisation looks good to me.

Could we have an official reviewer, please?

Thanks,
Ed.

On Mon, 2015-12-07 at 23:26 +0800, Felix Yang wrote:
> Hi,
> 
>   I have corrected the webrev issues in my previous mail. Thanks Edward for
> providing the help.
>   Now I am resending this mail:
> 
>   Can someone help review and sponsor this code generation improvement for
> aarch64 port?
>   Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
>   Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02
> 
>   The hotspot/test/compiler/loopopts/superword/SumRed_Int.java can serve
> as a test case.
>   With this patch, the following code snippet by C2:
>     0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s
>     0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s
>     0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s
>     0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s
>     0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s
>   will be further optimized into:
>     0x0000007f9cdb86dc: mul      v19.4s, v16.4s, v17.4s
>     0x0000007f9cdb86e0: mla      v19.4s, v16.4s, v18.4s
>     0x0000007f9cdb86e4: mla      v19.4s, v17.4s, v18.4s
> 
>   About 13% performance gain achieved for the test case on my aarch64
> server.
>   Tested with jtreg hotspot & langtools.  Results are the same before and
> after.
>   Is it OK to push?
> 
> Felix,
> Thanks for your help.


From roland.westrelin at oracle.com  Mon Dec  7 16:26:57 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 7 Dec 2015 17:26:57 +0100
Subject: [aarch64-port-dev ] RFR: 8144201: aarch64:
	jdk/test/com/sun/net/httpserver/Test6a.java fails with
	--enable-unlimited-crypto
In-Reply-To: <CACc5Y6TTZFEBb6Xtnv0i8+w=D9-XpTyA46Aiip9GQDEkzJxWGw@mail.gmail.com>
References: <CACc5Y6TTZFEBb6Xtnv0i8+w=D9-XpTyA46Aiip9GQDEkzJxWGw@mail.gmail.com>
Message-ID: <60CFC191-622E-4243-A9C5-E2D4B7F2F024@oracle.com>

>    Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.02

That looks good to me.

Roland.

From roland.westrelin at oracle.com  Mon Dec  7 16:31:06 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 7 Dec 2015 17:31:06 +0100
Subject: [aarch64-port-dev ] RFR: 8144587: aarch64: generate vectorized
	MLA/MLS instructions
In-Reply-To: <CACc5Y6RASChRyqyJXTM1HoPkmNg5SQwrEtpoTGRGZBEr64Qtfg@mail.gmail.com>
References: <CACc5Y6RASChRyqyJXTM1HoPkmNg5SQwrEtpoTGRGZBEr64Qtfg@mail.gmail.com>
Message-ID: <F8724AB5-8BE0-4524-9615-219EFC1B6F5B@oracle.com>

>   Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02

That looks good to me.

Roland.

From edward.nevill at gmail.com  Tue Dec  8 15:32:30 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 08 Dec 2015 15:32:30 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <566595B5.9060400@redhat.com>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
	<5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint>
	<566595B5.9060400@redhat.com>
Message-ID: <1449588750.5880.28.camel@mylittlepony.linaroharston>

On Mon, 2015-12-07 at 14:20 +0000, Andrew Haley wrote:
> On 12/07/2015 12:22 PM, Edward Nevill wrote:
> 
> > I cannot see what prevents one of these BLs from being followed and
> > since they may have been copied but not relocated then they may end
> > up pointing somewhere random in the code buffer which just happens
> > to look like a trampoline. Admittedly, the probability of failure is
> > vastly reduced because there are no genuine trampolines for it to
> > latch on to.
> 
> You must look inside get_trampoline().  It checks for this.

OK. Thanks, I have satisfied myself that this is correct.

New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2

I was having difficulty understanding why the check inside get_trampoline() did not exclude the adrp/add relocation. However, when I trap it doing the relocation in gdb I see

Original:
   0x3ff54170b50:       adrp    x8, 0x3ff54170000  <<< Not in code blob
   0x3ff54170b54:       add     x8, x8, #0x400
   0x3ff54170b58:       blr     x8

Copied but not relocated.
   0x3ff5481d250:       adrp    x8, 0x3ff5481d000  <<< Within code blob
   0x3ff5481d254:       add     x8, x8, #0x400
   0x3ff5481d258:       blr     x8

So the destination offset in the original is 0x3ff54170400 - 0x3ff54170b50 = 0xfffffffffffff8b0, whereas in the copied but not relocated version it is 0x3ff5481d400 - 0x3ff5481d250 = 0x1b0 which is within the current code blob.

This happens because of the half PC relative, half absolute nature of the adrp/add relocation in that the bottom 12 bits are always absolute whereas the adrp instruction is PC relative.
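
The effect can be reproduced with a small model of the instruction pair. This is a sketch only: AdrpAdd, encode and decode are hypothetical names, and the model ignores the real bit-level encoding and keeps just the page delta and the low 12 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not HotSpot code.  ADRP holds a page delta relative
// to the 4KB page of the instruction itself; the following ADD supplies
// the low 12 bits of the target as an absolute immediate.
struct AdrpAdd {
  int64_t  page_delta;  // (target page - pc page), in units of 4KB pages
  uint32_t lo12;        // target & 0xfff
};

inline AdrpAdd encode(uint64_t pc, uint64_t target) {
  return { static_cast<int64_t>(target >> 12) - static_cast<int64_t>(pc >> 12),
           static_cast<uint32_t>(target & 0xfff) };
}

// Address the pair computes when it executes at pc.
inline uint64_t decode(uint64_t pc, const AdrpAdd& insn) {
  assert(insn.lo12 < 0x1000);
  const uint64_t page = ((pc >> 12) + static_cast<uint64_t>(insn.page_delta)) << 12;
  return page + insn.lo12;
}
```

Encoding at 0x3ff54170b50 for target 0x3ff54170400 gives page_delta 0 and lo12 0x400. Decoding the same pair at 0x3ff5481d250 without relocating it yields 0x3ff5481d400, which matches the gdb output above.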

I have retested this with JTreg hotspot & langtools with ReservedCodeCacheSize=256m

Hotspot original: Test results: passed: 865; failed: 19; error: 85
Hotspot revised: Test results: passed: 953; failed: 9; error: 12

Langtools original: Test results: passed: 3,049; failed: 77; error: 223
Langtools revised: Test results: passed: 3,316; failed: 33

Thanks for the review,
Ed.


From aph at redhat.com  Tue Dec  8 15:49:40 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 8 Dec 2015 15:49:40 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <1449588750.5880.28.camel@mylittlepony.linaroharston>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
	<5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint>
	<566595B5.9060400@redhat.com>
	<1449588750.5880.28.camel@mylittlepony.linaroharston>
Message-ID: <5666FC14.6020001@redhat.com>

On 12/08/2015 03:32 PM, Edward Nevill wrote:
> OK. Thanks, I have satisfied myself that this is correct.
> 
> New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2

That looks good to me.

Thanks,

Andrew.


From edward.nevill at gmail.com  Tue Dec  8 18:22:32 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 08 Dec 2015 18:22:32 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64
 bit-test instructions in C2
Message-ID: <1449598952.3988.7.camel@mint>

Hi,

Since "8144028: Use AArch64 bit-test instructions in C2" I have been seeing occasional guarantee failures of the form.

#  Internal Error (assembler_aarch64.hpp:223), pid=4241, tid=4595
#  guarantee(chk == -1 || chk == 0) failed: Field too big for insn

These are being generated by the following call from pd_patch_instruction_size in macroAssembler_aarch64.cpp

    // Test & branch (immediate)
    Instruction_aarch64::spatch(branch, 18, 5, offset);

The problem is that test and branch instructions only have a 14 bit offset giving a range of +/- 32KB which is not sufficient for large C2 methods.
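
As a sketch of the limit being hit (fits_test_branch is a hypothetical helper, not the check in assembler_aarch64.hpp): the field holds a signed 14-bit count of 4-byte instructions, so the byte offset must lie in [-32768, 32768).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: true if byte_offset fits the
// branch field of TBZ/TBNZ, a signed 14-bit count of 4-byte instructions.
inline bool fits_test_branch(int64_t byte_offset) {
  assert((byte_offset & 3) == 0);         // targets are instruction aligned
  const int64_t words = byte_offset / 4;  // the field stores offset / 4
  return words >= -(INT64_C(1) << 13) && words < (INT64_C(1) << 13);
}
```

Any branch in a method whose target is 32KB or more away fails this test, which is the condition the guarantee reports.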

What can we do about this? It seems a shame to back out this optimization but I cannot see any easy way around it.

All the best,
Ed.


From aph at redhat.com  Tue Dec  8 18:22:39 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 8 Dec 2015 18:22:39 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use
 AArch64 bit-test instructions in C2
In-Reply-To: <1449598952.3988.7.camel@mint>
References: <1449598952.3988.7.camel@mint>
Message-ID: <56671FEF.6020404@redhat.com>

On 12/08/2015 06:22 PM, Edward Nevill wrote:

> The problem is that test and branch instructions only have a 14 bit
> offset giving a range of +/- 32Kb which is not sufficient for large
> C2 methods.
> 
> What can we do about this? It seems a shame to backout this
> optimization but I cannot see any easy way around it.

C2 does support branch length relaxation: we already know it makes a
couple of passes generating code.  We've never used it, and I don't
quite know how to use it, but I think some other ports do.
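
Whatever the C2 mechanism turns out to be, the underlying choice is
simple.  A minimal sketch, not C2 code: emit_test_bit_branch is a
hypothetical function that returns assembly text, and x0, target and
skip are placeholder names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch, not C2 code: choose between a short test-bit branch
// and a "far" form once the byte offset to the target is known.
inline std::vector<std::string> emit_test_bit_branch(bool branch_if_set, int bit,
                                                     int64_t byte_offset) {
  assert(bit >= 0 && bit < 64);
  const bool is_short = byte_offset >= -32768 && byte_offset < 32768;
  const std::string bitno = std::to_string(bit);
  if (is_short) {
    // The target is within reach of TBZ/TBNZ: one instruction.
    return { std::string(branch_if_set ? "tbnz" : "tbz") + " x0, #" + bitno + ", target" };
  }
  // Far form: invert the test so that it skips an unconditional B, which
  // has a much larger range than TBZ/TBNZ.
  return { std::string(branch_if_set ? "tbz" : "tbnz") + " x0, #" + bitno + ", skip",
           "b target",
           "skip:" };
}
```

The far form costs one extra instruction, so it is only worth using
when the short form cannot reach.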

Since this is my mess, I guess I should clean it up, and I'm
interested to try this.  But feel free if you like...

Andrew.

From edward.nevill at gmail.com  Tue Dec  8 18:31:54 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 08 Dec 2015 18:31:54 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use
 AArch64 bit-test instructions in C2
In-Reply-To: <56671FEF.6020404@redhat.com>
References: <1449598952.3988.7.camel@mint>  <56671FEF.6020404@redhat.com>
Message-ID: <1449599514.3988.9.camel@mint>

On Tue, 2015-12-08 at 18:22 +0000, Andrew Haley wrote:
> On 12/08/2015 06:22 PM, Edward Nevill wrote:

> C2 does support branch length relaxation: we already know it makes a
> couple of passes generating code.  We've never used it, and I don't
> quite know how to use it, but I think some other ports do.
> 
> Since this is my mess, I guess I should clean it up, and I'm
> interested to try this.  But feel free if you like...

No. It's OK, thanks for the offer:-)
Ed.


From edward.nevill at gmail.com  Wed Dec  9 14:10:42 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 09 Dec 2015 14:10:42 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support
Message-ID: <1449670242.21212.24.camel@mylittlepony.linaroharston>

Hi,

The following webrev

http://cr.openjdk.java.net/~enevill/jdk8_largecode/webrev

backports large code cache support from JDK 9 to JDK 8

This incorporates the fix to pd_call_destination

http://cr.openjdk.java.net/~enevill/8144498/webrev.2

I have also updated the jdk8 patch to reflect the setting of CODE_CACHE_SIZE_LIMIT in jdk9, so that jdk8 and jdk9 are the same.

One change I am not so sure about is the following in jdk9

@@ -868,7 +867,7 @@ 
   //   blrt rscratch1
   CodeBlob *cb = CodeCache::find_blob(_entry_point); 
   if (cb) {
-    return NativeInstruction::instruction_size; 
+    return MacroAssembler::far_branch_size();
   } else { 
     return 6 * NativeInstruction::instruction_size;

whereas in jdk8 we have

-    return 4;
+    return MacroAssembler::far_branch_size();
   } else {
     // A 48-bit address.  See movptr().
-    return 16;
+    // then a blrt
+    // return 16;
+    return 4 * NativeInstruction::instruction_size;


I.e. 4 * instruction_size instead of 6 * instruction_size

This is because in jdk9, aarch64_enc_java_to_runtime does

      __ adr(rscratch2, retaddr);
      __ lea(rscratch1, RuntimeAddress(entry));
      // Leave a breadcrumb for JavaThread::pd_last_frame().
      __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
      __ blrt(rscratch1, gpcnt, fpcnt, rtype);
      __ bind(retaddr);
      __ add(sp, sp, 2 * wordSize);

whereas in jdk8 it just does

      __ lea(rscratch1, RuntimeAddress(entry));
      __ blrt(rscratch1, gpcnt, fpcnt, rtype);

For the moment I have left this unchanged.

Is this necessary and should I include it in the backport?

I have tested the large code support in jdk8 with jtreg hotspot and langtools with the following results.

Hotspot (original - 128M code cache): Test results: passed: 674; failed: 17; error: 3
Hotspot (patched - 128M code cache): Test results: passed: 674; failed: 17; error: 3
Hotspot (patched - 256M code cache): Test results: passed: 674; failed: 17; error: 3

Langtools (original - 128M code cache): Test results: passed: 3,091
Langtools (patched - 128M code cache): Test results: passed: 3,090
Langtools (patched - 256M code cache): Test results: passed: 3,091

OK to push?

Ed.


From aph at redhat.com  Wed Dec  9 14:40:23 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 9 Dec 2015 14:40:23 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support
In-Reply-To: <1449670242.21212.24.camel@mylittlepony.linaroharston>
References: <1449670242.21212.24.camel@mylittlepony.linaroharston>
Message-ID: <56683D57.4090005@redhat.com>

On 12/09/2015 02:10 PM, Edward Nevill wrote:
> For the moment I have left this unchanged.
> 
> Is this necessary and should I include it in the backport?

This is fixed in http://hg.openjdk.java.net/aarch64-port/jdk8u

changeset:   8597:bea52c7ebf71
user:        aph
date:        Tue Sep 15 16:14:32 2015 +0000
summary:     Remove AArch64-specific code in generateOptoStub.cpp.

It's worth importing that patch.

Andrew.


From edward.nevill at gmail.com  Wed Dec  9 15:30:18 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 09 Dec 2015 15:30:18 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support
In-Reply-To: <56683D57.4090005@redhat.com>
References: <1449670242.21212.24.camel@mylittlepony.linaroharston>
	<56683D57.4090005@redhat.com>
Message-ID: <1449675018.21212.33.camel@mylittlepony.linaroharston>

On Wed, 2015-12-09 at 14:40 +0000, Andrew Haley wrote:
> On 12/09/2015 02:10 PM, Edward Nevill wrote:
> > For the moment I have left this unchanged.
> > 
> > Is this necessary and should I include it in the backport?
> 
> This is fixed in http://hg.openjdk.java.net/aarch64-port/jdk8u
> 
> changeset:   8597:bea52c7ebf71
> user:        aph
> date:        Tue Sep 15 16:14:32 2015 +0000
> summary:     Remove AArch64-specific code in generateOptoStub.cpp.
> 
> It's worth importing that patch.

OK. Thanks. With that patch imported does the large code cache support patch look ok to push to jdk8?

Regards,
Ed.


From aph at redhat.com  Wed Dec  9 15:36:58 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 9 Dec 2015 15:36:58 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support
In-Reply-To: <1449675018.21212.33.camel@mylittlepony.linaroharston>
References: <1449670242.21212.24.camel@mylittlepony.linaroharston>
	<56683D57.4090005@redhat.com>
	<1449675018.21212.33.camel@mylittlepony.linaroharston>
Message-ID: <56684A9A.9070705@redhat.com>

On 12/09/2015 03:30 PM, Edward Nevill wrote:
> OK. Thanks. With that patch imported does the large code cache support patch look ok to push to jdk8?

I think so.

Andrew.


From aph at redhat.com  Wed Dec  9 19:00:00 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 9 Dec 2015 19:00:00 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use
 AArch64 bit-test instructions in C2
In-Reply-To: <1449598952.3988.7.camel@mint>
References: <1449598952.3988.7.camel@mint>
Message-ID: <56687A30.2020203@redhat.com>

On 12/08/2015 06:22 PM, Edward Nevill wrote:
> Hi,
> 
> Since "8144028: Use AArch64 bit-test instructions in C2" I have been seeing occasional guarantee failures of the form.
> 
> #  Internal Error (assembler_aarch64.hpp:223), pid=4241, tid=4595
> #  guarantee(chk == -1 || chk == 0) failed: Field too big for insn
> 
> These are being generated by the following call from pd_patch_instruction_size in macroAssembler_aarch64.cpp
> 
>     // Test & branch (immediate)
>     Instruction_aarch64::spatch(branch, 18, 5, offset);
> 
> The problem is that test and branch instructions only have a 14 bit offset giving a range of +/- 32Kb which is not sufficient for large C2 methods.
> 
> What can we do about this? It seems a shame to backout this optimization but I cannot see any easy way around it.

Please try this patch.

Andrew.


-------------- next part --------------
diff --git a/src/cpu/aarch64/vm/aarch64.ad b/src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad
+++ b/src/cpu/aarch64/vm/aarch64.ad
@@ -3484,10 +3484,17 @@
   return 0;
 }
 
-bool Matcher::is_short_branch_offset(int rule, int br_size, int offset)
-{
-  Unimplemented();
-  return false;
+// Is this branch offset short enough that a short branch can be used?
+//
+// NOTE: If the platform does not provide any short branch variants, then
+//       this method should return false for offset 0.
+bool Matcher::is_short_branch_offset(int rule, int br_size, int offset) {
+  // The passed offset is relative to address of the branch.  On
+  // AArch64 a branch displacement is calculated relative to address
+  // of the next instruction.
+  offset -= br_size;
+
+  return (-32768 <= offset && offset < 32768);
 }
 
 const bool Matcher::isSimpleConstant64(jlong value) {
@@ -13845,7 +13852,8 @@
 
 // Test bit and Branch
 
-instruct cmpL_branch_sign(cmpOp cmp, iRegL op1, immL0 op2, label labl, rFlagsReg cr) %{
+// Patterns for short (< 32KiB) variants
+instruct cmpL_branch_sign(cmpOp cmp, iRegL op1, immL0 op2, label labl) %{
   match(If cmp (CmpL op1 op2));
   predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt
             || n->in(1)->as_Bool()->_test._test == BoolTest::ge);
@@ -13855,16 +13863,15 @@
   format %{ "cb$cmp   $op1, $labl # long" %}
   ins_encode %{
     Label* L = $labl$$label;
-    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
-    if (cond == Assembler::LT)
-      __ tbnz($op1$$Register, 63, *L);
-    else
-      __ tbz($op1$$Register, 63, *L);
+    Assembler::Condition cond =
+      ((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? Assembler::NE : Assembler::EQ;
+    __ tbr($op1$$Register, cond, 63, *L);
   %}
   ins_pipe(pipe_cmp_branch);
-%}
-
-instruct cmpI_branch_sign(cmpOp cmp, iRegIorL2I op1, immI0 op2, label labl, rFlagsReg cr) %{
+  ins_short_branch(1);
+%}
+
+instruct cmpI_branch_sign(cmpOp cmp, iRegIorL2I op1, immI0 op2, label labl) %{
   match(If cmp (CmpI op1 op2));
   predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt
             || n->in(1)->as_Bool()->_test._test == BoolTest::ge);
@@ -13874,16 +13881,15 @@
   format %{ "cb$cmp   $op1, $labl # int" %}
   ins_encode %{
     Label* L = $labl$$label;
-    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
-    if (cond == Assembler::LT)
-      __ tbnz($op1$$Register, 31, *L);
-    else
-      __ tbz($op1$$Register, 31, *L);
+    Assembler::Condition cond =
+      ((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? Assembler::NE : Assembler::EQ;
+    __ tbr($op1$$Register, cond, 31, *L);
   %}
   ins_pipe(pipe_cmp_branch);
-%}
-
-instruct cmpL_branch_bit(cmpOp cmp, iRegL op1, immL op2, immL0 op3, label labl, rFlagsReg cr) %{
+  ins_short_branch(1);
+%}
+
+instruct cmpL_branch_bit(cmpOp cmp, iRegL op1, immL op2, immL0 op3, label labl) %{
   match(If cmp (CmpL (AndL op1 op2) op3));
   predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne
             || n->in(1)->as_Bool()->_test._test == BoolTest::eq)
@@ -13896,15 +13902,13 @@
     Label* L = $labl$$label;
     Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
     int bit = exact_log2($op2$$constant);
-    if (cond == Assembler::EQ)
-      __ tbz($op1$$Register, bit, *L);
-    else
-      __ tbnz($op1$$Register, bit, *L);
+    __ tbr($op1$$Register, cond, bit, *L);
   %}
   ins_pipe(pipe_cmp_branch);
-%}
-
-instruct cmpI_branch_bit(cmpOp cmp, iRegIorL2I op1, immI op2, immI0 op3, label labl, rFlagsReg cr) %{
+  ins_short_branch(1);
+%}
+
+instruct cmpI_branch_bit(cmpOp cmp, iRegIorL2I op1, immI op2, immI0 op3, label labl) %{
   match(If cmp (CmpI (AndI op1 op2) op3));
   predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne
             || n->in(1)->as_Bool()->_test._test == BoolTest::eq)
@@ -13917,10 +13921,79 @@
     Label* L = $labl$$label;
     Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
     int bit = exact_log2($op2$$constant);
-    if (cond == Assembler::EQ)
-      __ tbz($op1$$Register, bit, *L);
-    else
-      __ tbnz($op1$$Register, bit, *L);
+    __ tbr($op1$$Register, cond, bit, *L);
+  %}
+  ins_pipe(pipe_cmp_branch);
+  ins_short_branch(1);
+%}
+
+// And far variants
+instruct far_cmpL_branch_sign(cmpOp cmp, iRegL op1, immL0 op2, label labl) %{
+  match(If cmp (CmpL op1 op2));
+  predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt
+            || n->in(1)->as_Bool()->_test._test == BoolTest::ge);
+  effect(USE labl);
+
+  ins_cost(BRANCH_COST);
+  format %{ "cb$cmp   $op1, $labl # long" %}
+  ins_encode %{
+    Label* L = $labl$$label;
+    Assembler::Condition cond =
+      ((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? Assembler::NE : Assembler::EQ;
+    __ tbr($op1$$Register, cond, 63, *L, /*far*/true);
+  %}
+  ins_pipe(pipe_cmp_branch);
+%}
+
+instruct far_cmpI_branch_sign(cmpOp cmp, iRegIorL2I op1, immI0 op2, label labl) %{
+  match(If cmp (CmpI op1 op2));
+  predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt
+            || n->in(1)->as_Bool()->_test._test == BoolTest::ge);
+  effect(USE labl);
+
+  ins_cost(BRANCH_COST);
+  format %{ "cb$cmp   $op1, $labl # int" %}
+  ins_encode %{
+    Label* L = $labl$$label;
+    Assembler::Condition cond =
+      ((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? Assembler::NE : Assembler::EQ;
+    __ tbr($op1$$Register, cond, 31, *L, /*far*/true);
+  %}
+  ins_pipe(pipe_cmp_branch);
+%}
+
+instruct far_cmpL_branch_bit(cmpOp cmp, iRegL op1, immL op2, immL0 op3, label labl) %{
+  match(If cmp (CmpL (AndL op1 op2) op3));
+  predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne
+            || n->in(1)->as_Bool()->_test._test == BoolTest::eq)
+            && is_power_of_2(n->in(2)->in(1)->in(2)->get_long()));
+  effect(USE labl);
+
+  ins_cost(BRANCH_COST);
+  format %{ "tb$cmp   $op1, $op2, $labl" %}
+  ins_encode %{
+    Label* L = $labl$$label;
+    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
+    int bit = exact_log2($op2$$constant);
+    __ tbr($op1$$Register, cond, bit, *L, /*far*/true);
+  %}
+  ins_pipe(pipe_cmp_branch);
+%}
+
+instruct far_cmpI_branch_bit(cmpOp cmp, iRegIorL2I op1, immI op2, immI0 op3, label labl) %{
+  match(If cmp (CmpI (AndI op1 op2) op3));
+  predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne
+            || n->in(1)->as_Bool()->_test._test == BoolTest::eq)
+            && is_power_of_2(n->in(2)->in(1)->in(2)->get_int()));
+  effect(USE labl);
+
+  ins_cost(BRANCH_COST);
+  format %{ "tb$cmp   $op1, $op2, $labl" %}
+  ins_encode %{
+    Label* L = $labl$$label;
+    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
+    int bit = exact_log2($op2$$constant);
+    __ tbr($op1$$Register, cond, bit, *L, /*far*/true);
   %}
   ins_pipe(pipe_cmp_branch);
 %}
diff --git a/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp b/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp
+++ b/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp
@@ -27,6 +27,7 @@
 #define CPU_AARCH64_VM_C1_MACROASSEMBLER_AARCH64_HPP
 
 using MacroAssembler::build_frame;
+using MacroAssembler::null_check;
 
 // C1_MacroAssembler contains high-level macros for C1
 
diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
@@ -487,6 +487,32 @@
     orr(Vd, T, Vn, Vn);
   }
 
+public:
+
+  // Generalized Test Bit And Branch, including a "far" variety which
+  // spans more than 32KiB.
+  void tbr(Register Rt, Condition cond, int bitpos, Label &dest, bool far = false) {
+    assert(cond == Assembler::EQ || cond == Assembler::NE, "must be");
+
+    if (far)
+      cond = ~cond;
+
+    void (Assembler::* branch)(Register Rt, int bitpos, Label &L);
+    if (cond == Assembler::EQ)
+      branch = &Assembler::tbz;
+    else
+      branch = &Assembler::tbnz;
+
+    if (far) {
+      Label L;
+      (this->*branch)(Rt, bitpos, L);
+      b(dest);
+      bind(L);
+    } else {
+      (this->*branch)(Rt, bitpos, dest);
+    }
+  }
+
   // macro instructions for accessing and updating floating point
   // status register
   //
diff --git a/src/share/vm/adlc/formssel.cpp b/src/share/vm/adlc/formssel.cpp
--- a/src/share/vm/adlc/formssel.cpp
+++ b/src/share/vm/adlc/formssel.cpp
@@ -1246,7 +1246,8 @@
       !is_short_branch() &&     // Don't match another short branch variant
       reduce_result() != NULL &&
       strcmp(reduce_result(), short_branch->reduce_result()) == 0 &&
-      _matrule->equivalent(AD.globalNames(), short_branch->_matrule)) {
+      _matrule->equivalent(AD.globalNames(), short_branch->_matrule) &&
+      equivalent_predicates(this, short_branch)) {
     // The instructions are equivalent.
 
     // Now verify that both instructions have the same parameters and

From edward.nevill at gmail.com  Thu Dec 10 11:16:07 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 10 Dec 2015 11:16:07 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use
 AArch64 bit-test instructions in C2
In-Reply-To: <56687A30.2020203@redhat.com>
References: <1449598952.3988.7.camel@mint> <56687A30.2020203@redhat.com>
Message-ID: <1449746167.24789.16.camel@mylittlepony.linaroharston>

On Wed, 2015-12-09 at 19:00 +0000, Andrew Haley wrote:
> On 12/08/2015 06:22 PM, Edward Nevill wrote:
> > Hi,
> > 
> Please try this patch.

Hi,

It fixed some of the problems I see, but not all. The test I am running is jtreg/langtools. With 8144028 I see 33 failures. With this patch that reduces to 30 failures. With 8144028 backed out there are no failures.

The command I am using to run jtreg is

/home/ed/images/jdk9-backout/bin/java -jar lib/jtreg.jar -nr -conc:16 -timeout:3 -othervm -jdk:/home/ed/images/jdk9-backout -v1 -a -ignore:quiet /home/ed/new_jdk9/hs-comp/langtools/test

I am continuing to look at the other failures. Let me know if you want any logs etc.

All the best,
Ed.


From hui.shi at linaro.org  Thu Dec 10 14:48:05 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Thu, 10 Dec 2015 22:48:05 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier
	after AllocationNode
Message-ID: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>

Hi All,


Could someone help review this change?


Bug:  https://bugs.openjdk.java.net/browse/JDK-8144993

webrev:  http://cr.openjdk.java.net/~hshi/8144993/webrev/


This patch aims to remove the redundant memory barrier after an allocation
node; on AArch64 it removes a redundant dmb when creating an object. The
motivation is that the dmb instruction after commonly used object allocations,
for example String and boxing objects, is redundant with the dmb inserted for
the final field write. Consider the following small case:


String foo(String s)
{
   String copy = new String(s);
   return copy;
}


There are two dmb instructions in the generated code. The first is a
membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common.
The second is a membar_release, inserted at the exit of the initializer
method because final fields are written. The allocated String doesn't escape
in the String initializer method, and membar_release includes
membar_storestore semantics, so the first one can be removed safely.


  0x0000007f85bbfa8c: prfm      pstl1keep, [x11,#256]
  0x0000007f85bbfa90: str       xzr, [x0,#16]
  0x0000007f85bbfa94: dmb       ishst        // first dmb to remove
  ....

  0x0000007fa01d83c0: ldrsb     w10, [x20,#20]
  0x0000007fa01d83c4: ldr       w12, [x20,#16]
  0x0000007fa01d83c8: ldr       x11, [sp,#8]
  0x0000007fa01d83cc: strb      w10, [x11,#20]
  0x0000007fa01d83d0: str       w12, [x11,#16]
  0x0000007fa01d83d4: dmb       ish        // second dmb


The patch targets this pattern and removes the redundant memory barrier for
the allocation node.

1. When inserting the memory barrier for a final field write, if the
allocation node of the object that owns the final fields is available, invoke
AllocationNode::compute_MemBar_redundancy(initializer method).

2. In AllocationNode:

    2.1 Add a new field, _is_allocation_MemBar_redundant, a flag indicating
whether the memory barrier after the allocation node is redundant.

    2.2 Add a method compute_MemBar_redundancy, which sets
_is_allocation_MemBar_redundant to true if the first parameter "this" does
not escape in the initializer method according to BCEscapeAnalyzer.

3. Skip inserting the memory barrier in
PhaseMacroExpand::expand_allocate_common when the AllocationNode's
_is_allocation_MemBar_redundant flag is true.
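
The three steps above can be modelled roughly as follows. This is only a
standalone toy sketch, not the code in the webrev: AllocationModel,
expand_allocation and the string "instructions" are hypothetical names made
up for illustration, and the escape result is passed in as a plain bool
instead of being computed by BCEscapeAnalyzer.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the allocation node (hypothetical, not HotSpot code).
struct AllocationModel {
  // Step 2.1: flag saying the barrier after the allocation is redundant.
  bool is_allocation_membar_redundant = false;

  // Step 2.2: the barrier is redundant only if "this" does not escape in
  // the initializer. The escape result is an input here (assumption).
  void compute_membar_redundancy(bool this_escapes_in_initializer) {
    is_allocation_membar_redundant = !this_escapes_in_initializer;
  }
};

// Step 3: macro expansion skips the StoreStore barrier when the flag is set.
std::vector<std::string> expand_allocation(const AllocationModel& alloc) {
  std::vector<std::string> code;
  code.push_back("allocate");
  code.push_back("init header");
  if (!alloc.is_allocation_membar_redundant) {
    code.push_back("membar_storestore");
  }
  return code;
}
```

With the flag left at its default the barrier is still emitted, so the
sketch only drops it when the analysis has positively said "does not escape".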


Regards

Hui

From edward.nevill at gmail.com  Thu Dec 10 17:05:26 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 10 Dec 2015 17:05:26 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use
 AArch64 bit-test instructions in C2
In-Reply-To: <56687A30.2020203@redhat.com>
References: <1449598952.3988.7.camel@mint> <56687A30.2020203@redhat.com>
Message-ID: <1449767126.8845.3.camel@mylittlepony.linaroharston>

On Wed, 2015-12-09 at 19:00 +0000, Andrew Haley wrote:
> On 12/08/2015 06:22 PM, Edward Nevill wrote:
> > Hi,
> > 
> > Since "8144028: Use AArch64 bit-test instructions in C2" I have been seeing occasional guarantee failures of the form.
> > 
> > #  Internal Error (assembler_aarch64.hpp:223), pid=4241, tid=4595
> > #  guarantee(chk == -1 || chk == 0) failed: Field too big for insn
> > 
> > These are being generated by the following call from pd_patch_instruction_size in macroAssembler_aarch64.cpp
> > 
> >     // Test & branch (immediate)
> >     Instruction_aarch64::spatch(branch, 18, 5, offset);
> > 
> > The problem is that test and branch instructions only have a 14 bit offset giving a range of +/- 32KB which is not sufficient for large C2 methods.
> > 
> > What can we do about this? It seems a shame to back out this optimization but I cannot see any easy way around it.
> 
> Please try this patch.
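
For reference, here is a minimal sketch of the range limit described above
and of the usual way around it (invert the test and hop over an
unconditional branch). It is a simplification under stated assumptions:
fits_tbz_range and test_bit_branch are hypothetical helpers, the output is
just mnemonic strings, and the near/far choice is made from a known offset,
which is not necessarily how the real patch decides it.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// TBZ/TBNZ carry a signed 14-bit offset counted in 4-byte instructions,
// so the reachable byte offsets are [-32768, 32764] in steps of 4.
constexpr bool fits_tbz_range(int64_t byte_offset) {
  return (byte_offset & 3) == 0 &&
         byte_offset >= -(int64_t(1) << 15) &&
         byte_offset <   (int64_t(1) << 15);
}

// Hypothetical emitter: returns the instruction sequence for
// "branch to target if the tested bit is set (or clear)".
std::vector<std::string> test_bit_branch(bool branch_if_set,
                                         int64_t byte_offset) {
  std::vector<std::string> seq;
  if (fits_tbz_range(byte_offset)) {
    // Near case: a single test-and-branch reaches the target.
    seq.push_back(std::string(branch_if_set ? "tbnz" : "tbz") + " target");
  } else {
    // Far case: test the opposite condition, skip over an unconditional b.
    seq.push_back(std::string(branch_if_set ? "tbz" : "tbnz") + " skip");
    seq.push_back("b target");
    seq.push_back("skip:");
  }
  return seq;
}
```

The far form costs one extra instruction but the unconditional b has a much
larger reach, which is the trade-off being made.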

I think the following patch is needed in addition.

diff -r af66c2e5a0f6 src/cpu/aarch64/vm/interp_masm_aarch64.cpp
--- a/src/cpu/aarch64/vm/interp_masm_aarch64.cpp	Thu Dec 10 15:58:02 2015 +0000
+++ b/src/cpu/aarch64/vm/interp_masm_aarch64.cpp	Thu Dec 10 17:02:12 2015 +0000
@@ -1355,8 +1355,9 @@
   if (JvmtiExport::can_post_interpreter_events()) {
     Label L;
     ldr(r3, Address(rthread, JavaThread::interp_only_mode_offset()));
-    tst(r3, ~0);
-    br(Assembler::EQ, L);
+//    tst(r3, ~0);
+//    br(Assembler::EQ, L);
+    cbz(r3, L);
     call_VM(noreg, CAST_FROM_FN_PTR(address,
                                     InterpreterRuntime::post_method_entry));
     bind(L);

Regards,
Ed.


From aph at redhat.com  Thu Dec 10 17:22:59 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 10 Dec 2015 17:22:59 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use
 AArch64 bit-test instructions in C2
In-Reply-To: <1449767126.8845.3.camel@mylittlepony.linaroharston>
References: <1449598952.3988.7.camel@mint> <56687A30.2020203@redhat.com>
	<1449767126.8845.3.camel@mylittlepony.linaroharston>
Message-ID: <5669B4F3.2060800@redhat.com>

On 12/10/2015 05:05 PM, Edward Nevill wrote:
> I think the following patch is needed in addition.

Good catch!

Thanks,

Andrew.


From aph at redhat.com  Mon Dec 14 15:59:23 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 14 Dec 2015 15:59:23 +0000
Subject: [aarch64-port-dev ] RFR: 8145320: Create unsafe_arraycopy and
 generic_arraycopy for AArch64
Message-ID: <566EE75B.70107@redhat.com>

http://cr.openjdk.java.net/~aph/8145320-1/

Andrew.

From edward.nevill at gmail.com  Mon Dec 14 17:52:39 2015
From: edward.nevill at gmail.com (edward.nevill at gmail.com)
Date: Mon, 14 Dec 2015 17:52:39 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: 2 new changesets
Message-ID: <201512141752.tBEHqd99023228@aojmv0008.oracle.com>

Changeset: b2df86902f5e
Author:    enevill
Date:      2015-12-09 13:08 +0000
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/b2df86902f5e

Add support for large code cache

! src/cpu/aarch64/vm/aarch64.ad
! src/cpu/aarch64/vm/assembler_aarch64.cpp
! src/cpu/aarch64/vm/assembler_aarch64.hpp
! src/cpu/aarch64/vm/c1_CodeStubs_aarch64.cpp
! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp
! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.hpp
! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp
! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp
! src/cpu/aarch64/vm/compiledIC_aarch64.cpp
! src/cpu/aarch64/vm/globalDefinitions_aarch64.hpp
! src/cpu/aarch64/vm/globals_aarch64.hpp
! src/cpu/aarch64/vm/icBuffer_aarch64.cpp
! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
! src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
! src/cpu/aarch64/vm/methodHandles_aarch64.cpp
! src/cpu/aarch64/vm/nativeInst_aarch64.cpp
! src/cpu/aarch64/vm/nativeInst_aarch64.hpp
! src/cpu/aarch64/vm/relocInfo_aarch64.cpp
! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp
! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp
! src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp
! src/cpu/aarch64/vm/vtableStubs_aarch64.cpp
! src/os_cpu/linux_aarch64/vm/os_linux_aarch64.cpp
! src/share/vm/runtime/arguments.cpp
! src/share/vm/utilities/globalDefinitions.hpp

Changeset: 0096f1ef564e
Author:    aph
Date:      2015-09-15 16:14 +0000
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/0096f1ef564e

Remove AArch64-specific code in generateOptoStub.cpp.
In aarch64_enc_java_to_runtime leave a breadcrumb for
JavaThread::pd_last_frame().

! src/cpu/aarch64/vm/aarch64.ad
! src/share/vm/opto/generateOptoStub.cpp


From edward.nevill at gmail.com  Mon Dec 14 19:46:41 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 14 Dec 2015 19:46:41 +0000
Subject: [aarch64-port-dev ] Ping: RFR: aarch64: backports to JDK 7
In-Reply-To: <1448466290.16878.5.camel@mylittlepony.linaroharston>
References: <1448466290.16878.5.camel@mylittlepony.linaroharston>
Message-ID: <1450122401.708.1.camel@mint>

Hi,

OK to backport these to JDK7?

Thanks,
Ed.

On Wed, 2015-11-25 at 15:44 +0000, Edward Nevill wrote:
> Hi,
> 
> Please review the following backports to JDK 7
> 
> http://cr.openjdk.java.net/~enevill/jdk7_backports_1511/
> 
> Tested with jtreg hotspot & langtools. Results the same before and after.
> 
> Hotspot: Test results: passed: 297; failed: 12; error: 2
> Langtools: Test results: passed: 1,971; failed: 1; error: 2
> 
> Summary of the changesets below.
> 
> Thanks,
> Ed.
> 
> ---
> enevill at arm64:~/icedtea7-forest/hotspot$ hg outgoing
> comparing with ssh://enevill at icedtea.classpath.org/hg/icedtea7-forest/hotspot
> running ssh enevill at icedtea.classpath.org 'hg -R hg/icedtea7-forest/hotspot serve --stdio'
> searching for changes
> changeset:   6380:5b6efbae9fea
> user:        aph
> date:        Wed Nov 04 13:38:38 2015 +0100
> files:       src/share/vm/gc_implementation/parallelScavenge/psParallelCompact.hpp
> description:
> 8138966: Intermittent SEGV running ParallelGC
> Summary: Add necessary memory fences so that the parallel threads are unable to observe partially filled block tables.
> Reviewed-by: tschatzl
> 
> 
> changeset:   6381:c7679d143590
> user:        enevill
> date:        Thu Nov 19 15:15:20 2015 +0000
> files:       src/cpu/aarch64/vm/assembler_aarch64.cpp
> description:
> 8143067: aarch64: guarantee failure in javac
> Summary: Fix adrp going out of range during code relocation
> Reviewed-by: aph, kvn
> 
> 
> changeset:   6382:eeb4a3ec4563
> tag:         tip
> user:        hshi
> date:        Tue Nov 24 09:02:26 2015 +0000
> files:       src/cpu/aarch64/vm/interp_masm_aarch64.cpp
> description:
> 8143285: aarch64: Missing load acquire when checking if ConstantPoolCacheEntry is resolved
> Reviewed-by: roland, aph
> ---
> 
> 


From aph at redhat.com  Mon Dec 14 20:42:46 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 14 Dec 2015 20:42:46 +0000
Subject: [aarch64-port-dev ] Ping: RFR: aarch64: backports to JDK 7
In-Reply-To: <1450122401.708.1.camel@mint>
References: <1448466290.16878.5.camel@mylittlepony.linaroharston>
	<1450122401.708.1.camel@mint>
Message-ID: <566F29C6.7090802@redhat.com>

On 12/14/2015 07:46 PM, Edward Nevill wrote:
> OK to backport these to JDK7?

Looks good.

Andrew.


From vladimir.kozlov at oracle.com  Mon Dec 14 23:07:03 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 14 Dec 2015 15:07:03 -0800
Subject: [aarch64-port-dev ] RFR: 8145320: Create unsafe_arraycopy and
 generic_arraycopy for AArch64
In-Reply-To: <566EE75B.70107@redhat.com>
References: <566EE75B.70107@redhat.com>
Message-ID: <566F4B97.5050605@oracle.com>

Looks fine to me.

Thanks,
Vladimir

On 12/14/15 7:59 AM, Andrew Haley wrote:
> http://cr.openjdk.java.net/~aph/8145320-1/
>
> Andrew.
>

From vladimir.kozlov at oracle.com  Tue Dec 15 02:40:02 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 14 Dec 2015 18:40:02 -0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
Message-ID: <566F7D82.6030806@oracle.com>

Very interesting!

Please, add short statement to the comment in /macro.cpp for your case.

Changes look fine to me. One nit could be to delay bytecode analysis 
until macro expansion - it may reduce compilation time. Bytecode 
analysis of each constructor could be expensive.

Thanks,
Vladimir


From aleksey.shipilev at oracle.com  Tue Dec 15 09:05:44 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 15 Dec 2015 12:05:44 +0300
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <566F7D82.6030806@oracle.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com>
Message-ID: <566FD7E8.7000105@oracle.com>

Also, I think this is a duplicate of:
 https://bugs.openjdk.java.net/browse/JDK-8032481

-Aleksey

On 12/15/2015 05:40 AM, Vladimir Kozlov wrote:
> Very interesting!
> 
> Please, add short statement to the comment in /macro.cpp for your case.
> 
> Changes looks fine to me. One nit could be to delay bytecode analysis
> until macro expansion - it may reduce compilation time. Bytecode
> analysis of each constructor could be expensive.
> 
> Thanks,
> Vladimir
> 
> On 12/10/15 6:48 AM, Hui Shi wrote:
>> Hi All,
>>
>>
>> Could some one help comments this change?
>>
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144993
>>
>> webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/
>>
>>
>> This patch aims to remove redundant memory barrier after allocation
>> node, on AArch64 it removes redundant dmb when creating object. The
>> motivation is dmb instructions after commonly used object allocation,
>> for example string and boxing objects is redundant with dmb inserted for
>> final field write. In following small case:____
>>
>> __ __
>>
>> String foo(String s)____
>>
>> {____
>>
>>     String copy = new String(s);____
>>
>>     return copy;____
>>
>> }____
>>
>> __ __
>>
>> There are two dmb instructions in generated code. First one is
>> membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common.
>> Second one is membar_release, inserted at exit of initializer method as
>> final fields write happens. Allocated String doesn't escape in String
>> initializer method, membar_release includes membar_storestore semantic.
>> So first one can be removed safely.____
>>
>> __ __
>>
>>    0x0000007f85bbfa8c: prfm      pstl1keep, [x11,#256]____
>>
>>    0x0000007f85bbfa90: str       xzr, [x0,#16]____
>>
>>    0x0000007f85bbfa94: dmb       ishst        // first dmb to remove____
>>
>>    ....____
>>
>> ____
>>
>>    0x0000007fa01d83c0: ldrsb     w10, [x20,#20]____
>>
>>    0x0000007fa01d83c4: ldr       w12, [x20,#16]____
>>
>>    0x0000007fa01d83c8: ldr       x11, [sp,#8]____
>>
>>    0x0000007fa01d83cc: strb      w10, [x11,#20]____
>>
>>    0x0000007fa01d83d0: str       w12, [x11,#16]____
>>
>>    0x0000007fa01d83d4: dmb       ish        // second dmb____
>>
>> __ __
>>
>>
>> Patch targets this pattern and remove redundant memory barrier for
>> allocation node.____
>>
>> 1. When inserting memory barrier for final field write. If final fields'
>> object allocation node is available, invoke
>> AllocationNode::compute_MemBar_redundancy(initializer method).____
>>
>> 2. In AllocationNode:____
>>
>>      2.1 Add a new field _is_allocation_MemBar_redundant flag indicate
>> if memory barrier after allocation node is redundant.____
>>
>>     2.2 Add method compute_MemBar_redundancy, set
>>   _is_allocation_MemBar_redundant true if first parameter "this" does
>> not escape in initializer method according to BCEscapeAnalyzer.____
>>
>> 3. skip inserting memory barrier in
>> PhaseMacroExpand::expand_allocate_common, when AllocationNode's
>> _is_allocation_MemBar_redundant flagis true.
>>
>>
>> Regards
>>
>> Hui
>>


From martin.doerr at sap.com  Tue Dec 15 10:27:14 2015
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 15 Dec 2015 10:27:14 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <566FD7E8.7000105@oracle.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>

Hi,

I think this change is good with respect to concurrent java threads.
However, I'm not sure if concurrent GC may have a problem when we optimize out the memory barrier (with or without this change).

Is it guaranteed that no concurrent GC will ever read an object header of such a newly allocated object?
A reference to this object may get written somewhere where GC can find it. If the GC reads the header, it may read stale data.

Best regards,
  Martin


-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Aleksey Shipilev
Sent: Dienstag, 15. Dezember 2015 10:06
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Hui Shi <hui.shi at linaro.org>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev <aarch64-port-dev at openjdk.java.net>
Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode

Also, I think this is a duplicate of:
 https://bugs.openjdk.java.net/browse/JDK-8032481

-Aleksey

On 12/15/2015 05:40 AM, Vladimir Kozlov wrote:
> Very interesting!
> 
> Please, add short statement to the comment in /macro.cpp for your case.
> 
> Changes looks fine to me. One nit could be to delay bytecode analysis
> until macro expansion - it may reduce compilation time. Bytecode
> analysis of each constructor could be expensive.
> 
> Thanks,
> Vladimir
> 
> On 12/10/15 6:48 AM, Hui Shi wrote:
>> Hi All,
>>
>>
>> Could some one help comments this change?
>>
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144993
>>
>> webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/
>>
>>
>> This patch aims to remove redundant memory barrier after allocation
>> node, on AArch64 it removes redundant dmb when creating object. The
>> motivation is dmb instructions after commonly used object allocation,
>> for example string and boxing objects is redundant with dmb inserted for
>> final field write. In following small case:____
>>
>> __ __
>>
>> String foo(String s)____
>>
>> {____
>>
>>     String copy = new String(s);____
>>
>>     return copy;____
>>
>> }____
>>
>> __ __
>>
>> There are two dmb instructions in generated code. First one is
>> membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common.
>> Second one is membar_release, inserted at exit of initializer method as
>> final fields write happens. Allocated String doesn't escape in String
>> initializer method, membar_release includes membar_storestore semantic.
>> So first one can be removed safely.____
>>
>> __ __
>>
>>    0x0000007f85bbfa8c: prfm      pstl1keep, [x11,#256]____
>>
>>    0x0000007f85bbfa90: str       xzr, [x0,#16]____
>>
>>    0x0000007f85bbfa94: dmb       ishst        // first dmb to remove____
>>
>>    ....____
>>
>> ____
>>
>>    0x0000007fa01d83c0: ldrsb     w10, [x20,#20]____
>>
>>    0x0000007fa01d83c4: ldr       w12, [x20,#16]____
>>
>>    0x0000007fa01d83c8: ldr       x11, [sp,#8]____
>>
>>    0x0000007fa01d83cc: strb      w10, [x11,#20]____
>>
>>    0x0000007fa01d83d0: str       w12, [x11,#16]____
>>
>>    0x0000007fa01d83d4: dmb       ish        // second dmb____
>>
>> __ __
>>
>>
>> Patch targets this pattern and remove redundant memory barrier for
>> allocation node.____
>>
>> 1. When inserting memory barrier for final field write. If final fields'
>> object allocation node is available, invoke
>> AllocationNode::compute_MemBar_redundancy(initializer method).____
>>
>> 2. In AllocationNode:____
>>
>>      2.1 Add a new field _is_allocation_MemBar_redundant flag indicate
>> if memory barrier after allocation node is redundant.____
>>
>>     2.2 Add method compute_MemBar_redundancy, set
>>   _is_allocation_MemBar_redundant true if first parameter "this" does
>> not escape in initializer method according to BCEscapeAnalyzer.____
>>
>> 3. skip inserting memory barrier in
>> PhaseMacroExpand::expand_allocate_common, when AllocationNode's
>> _is_allocation_MemBar_redundant flagis true.
>>
>>
>> Regards
>>
>> Hui
>>



From aph at redhat.com  Tue Dec 15 10:42:17 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 10:42:17 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
Message-ID: <566FEE89.5020300@redhat.com>

On 15/12/15 10:27, Doerr, Martin wrote:

> I think this change is good with respect to concurrent java threads.
> However, I'm not sure if concurrent GC may have a problem when we
> optimize out the memory barrier (with or without this change).
> 
> Is it guaranteed that no concurrent GC will ever read an object
> header of such a newly allocated object?
> A reference to this object may get written somewhere where GC can
> find it. If the GC reads the header, it may read stale data.

We know that the reference to the newly-created object does not
escape, so it is not reachable from any reference.  The only other way
a GC might find it is at a safepoint.  But even if that happens, a
safepoint is a memory barrier.  So I think we're OK.

Andrew.

From goetz.lindenmaier at sap.com  Tue Dec 15 13:09:58 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 13:09:58 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier	after AllocationNode
In-Reply-To: <566FEE89.5020300@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>

Hi Andrew,

What if it's assigned to an object that's already 
completely alive, but does not escape itself?

Best regards,
  Goetz.


> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-
> bounces at openjdk.java.net] On Behalf Of Andrew Haley
> Sent: Dienstag, 15. Dezember 2015 11:42
> To: Doerr, Martin <martin.doerr at sap.com>; Aleksey Shipilev
> <aleksey.shipilev at oracle.com>; Vladimir Kozlov
> <vladimir.kozlov at oracle.com>; Hui Shi <hui.shi at linaro.org>; hotspot
> compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev
> <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin
> <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com)
> <mikael.gerdin at oracle.com>
> Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
> barrier after AllocationNode
> 
> On 15/12/15 10:27, Doerr, Martin wrote:
> 
> > I think this change is good with respect to concurrent java threads.
> > However, I'm not sure if concurrent GC may have a problem when we
> > optimize out the memory barrier (with or without this change).
> >
> > Is it guaranteed that no concurrent GC will ever read an object
> > header of such a newly allocated object?
> > A reference to this object may get written somewhere where GC can
> > find it. If the GC reads the header, it may read stale data.
> 
> We know that the reference to the newly-created object does not
> escape, so it is not reachable from any reference.  The only other way
> a GC might find it is at a safepoint.  But even if that happens, a
> safepoint is a memory barrier.  So I think we're OK.
> 
> Andrew.

From aph at redhat.com  Tue Dec 15 13:46:13 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 13:46:13 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
Message-ID: <567019A5.1000202@redhat.com>

Hi,

On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote:

> What if it's assigned to an object that's already completely alive,
> but does not escape itself?

It's not clear to me exactly what this means.  However, if neither
object escapes then they are both reachable to GC only via scanning
the stack, and this can happen only at safepoints.

Andrew.


From goetz.lindenmaier at sap.com  Tue Dec 15 13:53:39 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 13:53:39 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <567019A5.1000202@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>

Hi Andrew,

Here is an example:

A a = new A();        // a does not escape
Safepoint();          // a is known to GC
                      // Concurrent GC is running.
B b = new B(a);

    where
    B(A a) {
        <Initialize>
        StoreStore barrier   // This is removed by the optimization.
        a.x = this;          // Then this is not initialized, but visible to GC
        final field store
        Membar_release
    }
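
The ordering concern in the example above can be sketched in plain C++
atomics. This is an analogy only, not HotSpot code: B, a_x,
construct_and_publish and observe are hypothetical names, a release fence
is used as a stand-in (it is stronger than a pure StoreStore barrier), and
observe() plays the role of a concurrent reader such as a GC thread.

```cpp
#include <atomic>

struct B { int field = 0; };

// Stands in for the field a.x of an object the reader can already see.
std::atomic<B*> a_x{nullptr};

void construct_and_publish(B* b) {
  b->field = 42;                                        // <Initialize>
  // Without this fence the initializing store above may become visible
  // after the publishing store below on a weakly ordered machine.
  std::atomic_thread_fence(std::memory_order_release);  // barrier
  a_x.store(b, std::memory_order_relaxed);              // a.x = this
}

// Reader side: waits until the object is published, then reads a field.
int observe() {
  B* p;
  while ((p = a_x.load(std::memory_order_acquire)) == nullptr) {
    // spin until published
  }
  return p->field;
}
```

If the fence is dropped, a reader on another thread is no longer guaranteed
to see field == 42, which is the "not initialized, but visible" case.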
  
Best regards,
  Martin and Goetz.


> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Dienstag, 15. Dezember 2015 14:46
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Doerr, Martin
> <martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>;
> Vladimir Kozlov <vladimir.kozlov at oracle.com>; Hui Shi <hui.shi at linaro.org>;
> hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-
> dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin
> <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com)
> <mikael.gerdin at oracle.com>
> Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
> barrier after AllocationNode
> 
> Hi,
> 
> On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote:
> 
> > What if it's assigned to an object that's already completely alive,
> > but does not escape itself?
> 
> It's not clear to me exactly what this means.  However, if neither
> object escapes then they are both reachable to GC only via scanning
> the stack, and this can happen only at safepoints.
> 
> Andrew.


From aph at redhat.com  Tue Dec 15 14:05:34 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 14:05:34 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
Message-ID: <56701E2E.5000901@redhat.com>

Hi,

On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote:

> here an example:
> 
> A a  = new A ();      // a does not escape
> Safepoint();             // a is known to GC
>                                      // Concurrent GC is running.
> B b = new B(a);
> 
>     where 
>     B(A a) {
>          <Initialize>
>          StoreStore barrier  // This is removed by the optimization.
>         a.x = this;                    // Then this is not initialized, but visible to GC
>         final field store
>         Membar_release
>     }

Hmm, interesting.  Here we're presented with two objects which
escape analysis reveals as not escaping but both are allocated
anyway and are included in the OOP map.

I'd argue that once you've put an object into an OOP map to be scanned
it has escaped, but that may well not be how C2 handles it.  For this
reachability analysis to be correct, if you put a reference to an
object into any object which is reachable as a GC root then that object
surely does escape.

Andrew.

From vitalyd at gmail.com  Tue Dec 15 14:28:35 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 15 Dec 2015 09:28:35 -0500
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <56701E2E.5000901@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
Message-ID: <CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>

I'm curious why you guys think `a` and/or `b` would be in the oopmap if
compiler proves they don't escape.  AFAIK, both `a` and `b` will be
component-wise scalar replaced.  Once that's done, there's a ref from
scalar replaced a.x to `b`, but `b` itself is scalar replaced.  In either
case, I don't see why either of these need to be known to GC at all (which
would somewhat defeat the purpose of EA to begin with).

On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley <aph at redhat.com> wrote:

> Hi,
>
> On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote:
>
> > here an example:
> >
> > A a  = new A ();      // a does not escape
> > Safepoint();             // a is known to GC
> >                                      // Concurrent GC is running.
> > B b = new B(a);
> >
> >     where
> >     B(A a) {
> >          <Initialize>
> >          StoreStore barrier  // This is removed by the optimization.
> >         a.x = this;                    // Then this is not initialized,
> but visible to GC
> >         final field store
> >         Membar_release
> >     }
>
> Hmm, interesting.  Here we're presented with two objects which
> escape analysis reveals as not escaping but both are allocated
> anyway and are included in the OOP map.
>
> I'd argue that once you've put an object into an OOP map to be scanned
> it has escaped, but that may well not be how C2 handles it.  For this
> reachability analysis to be correct, if you put a reference to an
> object into any object which is reachable as a GC root then that object
> surely does escape.
>
> Andrew.
>

From aph at redhat.com  Tue Dec 15 14:33:04 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 14:33:04 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
Message-ID: <567024A0.40409@redhat.com>

On 12/15/2015 02:28 PM, Vitaly Davidovich wrote:
> I'm curious why you guys think `a` and/or `b` would be in the oopmap if
> compiler proves they don't escape.  AFAIK, both `a` and `b` will be
> component-wise scalar replaced.  Once that's done, there's a ref from
> scalar replaced a.x to `b`, but `b` itself is scalar replaced.  In either
> case, I don't see why either of these need to be known to GC at all (which
> would somewhat defeat the purpose of EA to begin with).

Are you saying that if escape analysis determined that an object does
not escape then you know *for sure* that it will always be scalar-
replaced?

Andrew.


From goetz.lindenmaier at sap.com  Tue Dec 15 14:37:51 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 14:37:51 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com>	<566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>

If an object is arg_escape, locking, barriers etc. can be relaxed, but scalar replacement is not possible.
Oop maps are needed, else these don't survive the gc.

Goetz.

From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Dienstag, 15. Dezember 2015 15:29
To: Andrew Haley <aph at redhat.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Doerr, Martin <martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; Hui Shi <hui.shi at linaro.org>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) <mikael.gerdin at oracle.com>
Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode

I'm curious why you guys think `a` and/or `b` would be in the oopmap if compiler proves they don't escape.  AFAIK, both `a` and `b` will be component-wise scalar replaced.  Once that's done, there's a ref from scalar replaced a.x to `b`, but `b` itself is scalar replaced.  In either case, I don't see why either of these need to be known to GC at all (which would somewhat defeat the purpose of EA to begin with).

On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley <aph at redhat.com<mailto:aph at redhat.com>> wrote:
Hi,

On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote:

> here an example:
>
> A a  = new A ();      // a does not escape
> Safepoint();             // a is known to GC
>                                      // Concurrent GC is running.
> B b = new B(a);
>
>     where
>     B(A a) {
>          <Initialize>
>          StoreStore barrier  // This is removed by the optimization.
>         a.x = this;                    // Then this is not initialized, but visible to GC
>         final field store
>         Membar_release
>     }

Hmm, interesting.  Here we're presented with two objects which
escape analysis reveals as not escaping but both are allocated
anyway and are included in the OOP map.

I'd argue that once you've put an object into an OOP map to be scanned
it has escaped, but that may well not be how C2 handles it.  For this
reachability analysis to be correct, if you put a reference to an
object into any object which is reachable as a GC root then that object
surely does escape.

Andrew.


From aph at redhat.com  Tue Dec 15 14:42:31 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 14:42:31 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
Message-ID: <567026D7.6080908@redhat.com>

On 12/15/2015 02:37 PM, Lindenmaier, Goetz wrote:
> If object arg_escape, locking, barriers etc can be relaxed, but scalar replacement is not possible.
> Oop maps are needed, else these don't survive the gc.

I don't know what this means.

Andrew.


From hui.shi at linaro.org  Tue Dec 15 14:50:38 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Tue, 15 Dec 2015 22:50:38 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
Message-ID: <CAF1YaiBOm2rmogb5oEOHqqG86H7BmJbN2rVmjFXR-wqsQhHH_Q@mail.gmail.com>

Thanks All!

In Goetz's example, suppose the outer method is named foo and objects a and b
do not escape in foo. b does not escape in foo because a does not escape in foo.

But b escapes in its initializer according to BCEscapeAnalysis. In b's
initializer method, "this" should be marked as escaped because it is assigned
to a field of another parameter (the store to a.x). As b escapes in its
initializer, the StoreStore barrier will not be removed in this case, so it's safe.
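
In Java source terms the example looks roughly like the sketch below. The
field names x and value, the class Example and the method foo are assumptions
made for illustration. The barriers appear only as comments because the JIT
emits them; they cannot be written in Java source.

```java
// Sketch of the example under discussion. Field names, Example and foo()
// are assumed for illustration only.
final class A {
    B x;                 // a.x in the example
}

final class B {
    final int value;     // target of the "final field store"

    B(A a) {
        // <Initialize>: header and zeroed fields are set up by the allocation.
        // A StoreStore barrier would normally follow the initialization here.
        a.x = this;      // "this" is stored into a field of another parameter,
                         // which is what marks it as escaping in the initializer
        this.value = 42; // final field store
        // MemBarRelease at the end of a constructor that writes a final field.
    }
}

final class Example {
    static int foo() {
        A a = new A();   // a does not escape foo
        B b = new B(a);  // b is reachable only through a and this local
        return a.x == b ? b.value : -1;
    }
}
```

Calling Example.foo() only exercises the object graph; it does not show which
barriers a given build actually emits.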

Regards
Hui

On 15 December 2015 at 21:53, Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
wrote:

> Hi Andrew,
>
> here an example:
>
> A a  = new A ();      // a does not escape
> Safepoint();             // a is known to GC
>                                      // Concurrent GC is running.
> B b = new B(a);
>
>     where
>     B(A a) {
>          <Initialize>
>          StoreStore barrier  // This is removed by the optimization.
>         a.x = this;                    // Then this is not initialized,
> but visible to GC
>         final field store
>         Membar_release
>     }
>
> Best regards,
>   Martin and Goetz.
>
>
> > -----Original Message-----
> > From: Andrew Haley [mailto:aph at redhat.com]
> > Sent: Dienstag, 15. Dezember 2015 14:46
> > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Doerr, Martin
> > <martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>;
> > Vladimir Kozlov <vladimir.kozlov at oracle.com>; Hui Shi <
> hui.shi at linaro.org>;
> > hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-
> > dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin
> > <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com)
> > <mikael.gerdin at oracle.com>
> > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
> > barrier after AllocationNode
> >
> > Hi,
> >
> > On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote:
> >
> > > What if it's assigned to an object that's already completely alive,
> > > but does not escape itself?
> >
> > It's not clear to me exactly what this means.  However, if neither
> > object escapes then they are both reachable to GC only via scanning
> > the stack, and this can happen only at safepoints.
> >
> > Andrew.
>
>
>
>

From vitalyd at gmail.com  Tue Dec 15 14:51:40 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 15 Dec 2015 09:51:40 -0500
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <567024A0.40409@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
Message-ID: <CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>

Hotspot implements only the scalar replacement form of EA.

On Tue, Dec 15, 2015 at 9:33 AM, Andrew Haley <aph at redhat.com> wrote:

> On 12/15/2015 02:28 PM, Vitaly Davidovich wrote:
> > I'm curious why you guys think `a` and/or `b` would be in the oopmap if
> > compiler proves they don't escape.  AFAIK, both `a` and `b` will be
> > component-wise scalar replaced.  Once that's done, there's a ref from
> > scalar replaced a.x to `b`, but `b` itself is scalar replaced.  In either
> > case, I don't see why either of these need to be known to GC at all
> (which
> > would somewhat defeat the purpose of EA to begin with).
>
> Are you saying that if escape analysis determined that an object does
> not escape then you know *for sure* that it will always be scalar-
> replaced?
>
> Andrew.
>
>

From goetz.lindenmaier at sap.com  Tue Dec 15 14:54:23 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 14:54:23 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <567026D7.6080908@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
	<567026D7.6080908@redhat.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap>

Hi,

It's explained in escape.hpp.  The proper name is 'ArgEscape'.

  typedef enum {
    UnknownEscape = 0,
    NoEscape      = 1, // An object does not escape method or thread and it is
                       // not passed to call. It could be replaced with scalar.
    ArgEscape     = 2, // An object does not escape method or thread but it is
                       // passed as argument to call or referenced by argument
                       // and it does not escape during call.
    GlobalEscape  = 3  // An object escapes the method or thread.
  } EscapeState;

I.e., an object passed to a callee that is a pure function
can not be scalar replaced, as you have to keep the object 
layout to pass it down.
But the callee does not publish the reference to any other
thread, so we don't need to execute locks. Also, we
can remove barriers.
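
As a rough sketch of the three states in Java source terms (the names Point,
EscapeStates, sum and sink are made up for illustration, and the state C2
actually assigns depends on inlining and on what the bytecode escape analysis
can prove about the callee):

```java
// Hypothetical illustration of the EscapeState categories quoted above.
// All names are invented for this sketch. The comments describe the
// classification one would expect, not a guarantee for any given build.
final class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

final class EscapeStates {
    static Point sink;   // a static field is reachable from other threads

    // Candidate for NoEscape (assuming the constructor is inlined): p is
    // never stored anywhere or passed on, so its fields could become scalars.
    static int noEscape(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // Reads its argument but does not publish it.
    static int sum(Point p) {
        return p.x + p.y;
    }

    // Candidate for ArgEscape if sum() is not inlined: p is passed to a
    // call, so the object layout must exist, but it never leaves the thread.
    static int argEscape(int a, int b) {
        Point p = new Point(a, b);
        return sum(p);
    }

    // GlobalEscape: p is stored into a static field.
    static int globalEscape(int a, int b) {
        Point p = new Point(a, b);
        sink = p;
        return p.x + p.y;
    }
}
```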

Actually, we see a whole bunch of errors on ppc recently.
I thought it's all related to CompressedStrings, but not all
are investigated yet.  So it could also stem from "8136596: Remove aarch64: 
MemBarRelease when final field's allocation is NoEscape or ArgEscape"
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6cc606e29b74
We'll investigate ...

Best regards,
  Goetz.


> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Dienstag, 15. Dezember 2015 15:43
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Vitaly Davidovich
> <vitalyd at gmail.com>
> Cc: Doerr, Martin <martin.doerr at sap.com>; Aleksey Shipilev
> <aleksey.shipilev at oracle.com>; Vladimir Kozlov
> <vladimir.kozlov at oracle.com>; Hui Shi <hui.shi at linaro.org>; hotspot
> compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev
> <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin
> <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com)
> <mikael.gerdin at oracle.com>
> Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
> barrier after AllocationNode
> 
> On 12/15/2015 02:37 PM, Lindenmaier, Goetz wrote:
> > If object arg_escape, locking, barriers etc can be relaxed, but scalar
> replacement is not possible.
> > Oop maps are needed, else these don't survive the gc.
> 
> I don't know what this means.
> 
> Andrew.


From aph at redhat.com  Tue Dec 15 14:55:38 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 14:55:38 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
Message-ID: <567029EA.5030607@redhat.com>

On 12/15/2015 02:51 PM, Vitaly Davidovich wrote:
> Hotspot implements only the scalar replacement form of EA.

Scalar replacement is not a form of escape analysis.  This does
not answer my question, which was:

> Are you saying that if escape analysis determined that an object does
> not escape then you know *for sure* that it will always be scalar-
> replaced?

Andrew.


From aph at redhat.com  Tue Dec 15 14:57:59 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 14:57:59 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
	<567026D7.6080908@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap>
Message-ID: <56702A77.7040407@redhat.com>

On 12/15/2015 02:54 PM, Lindenmaier, Goetz wrote:
> I.e., an object passed to a callee that is a pure function
> can not be scalar replaced, as you have to keep the object 
> layout to pass it down.
> But the callee does not publish the reference to any other
> thread, so we don't need to execute locks. Also, we
> can remove barriers.

So the answer is obvious, surely?  We can elide the locks only if
NoEscape.

Andrew.


From vitalyd at gmail.com  Tue Dec 15 15:00:42 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 15 Dec 2015 10:00:42 -0500
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
Message-ID: <CAHjP37H1x9COPpLpymr_xg70mj_EtoObLchhYFrntpLTu2yGHg@mail.gmail.com>

Well, scratch what I said; I see Goetz is referring to ArgEscape form, but
I was thinking we're talking about the NoEscape version given the example
is quite simple.

On Tue, Dec 15, 2015 at 9:51 AM, Vitaly Davidovich <vitalyd at gmail.com>
wrote:

> Hotspot implements only the scalar replacement form of EA.
>
> On Tue, Dec 15, 2015 at 9:33 AM, Andrew Haley <aph at redhat.com> wrote:
>
>> On 12/15/2015 02:28 PM, Vitaly Davidovich wrote:
>> > I'm curious why you guys think `a` and/or `b` would be in the oopmap if
>> > compiler proves they don't escape.  AFAIK, both `a` and `b` will be
>> > component-wise scalar replaced.  Once that's done, there's a ref from
>> > scalar replaced a.x to `b`, but `b` itself is scalar replaced.  In
>> either
>> > case, I don't see why either of these need to be known to GC at all
>> (which
>> > would somewhat defeat the purpose of EA to begin with).
>>
>> Are you saying that if escape analysis determined that an object does
>> not escape then you know *for sure* that it will always be scalar-
>> replaced?
>>
>> Andrew.
>>
>>
>

From vitalyd at gmail.com  Tue Dec 15 15:02:23 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 15 Dec 2015 10:02:23 -0500
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
Message-ID: <CAHjP37FK6L+nUnYyKRy_CJXQNHWdnyXONZjKfOtJPE4w1NM1-w@mail.gmail.com>

Ok, as I just replied to Andrew, I hadn't considered the ArgEscape
scenario.  Does an oop that's ArgEscape still get allocated on heap then?

On Tue, Dec 15, 2015 at 9:37 AM, Lindenmaier, Goetz <
goetz.lindenmaier at sap.com> wrote:

> If object arg_escape, locking, barriers etc can be relaxed, but scalar
> replacement is not possible.
>
> Oop maps are needed, else these don't survive the gc.
>
>
>
> Goetz.
>
>
>
> *From:* Vitaly Davidovich [mailto:vitalyd at gmail.com]
> *Sent:* Dienstag, 15. Dezember 2015 15:29
> *To:* Andrew Haley <aph at redhat.com>
> *Cc:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Doerr, Martin <
> martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>;
> Vladimir Kozlov <vladimir.kozlov at oracle.com>; Hui Shi <hui.shi at linaro.org>;
> hotspot compiler <hotspot-compiler-dev at openjdk.java.net>;
> aarch64-port-dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin <
> mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) <
> mikael.gerdin at oracle.com>
> *Subject:* Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
> barrier after AllocationNode
>
>
>
> I'm curious why you guys think `a` and/or `b` would be in the oopmap if
> compiler proves they don't escape.  AFAIK, both `a` and `b` will be
> component-wise scalar replaced.  Once that's done, there's a ref from
> scalar replaced a.x to `b`, but `b` itself is scalar replaced.  In either
> case, I don't see why either of these need to be known to GC at all (which
> would somewhat defeat the purpose of EA to begin with).
>
>
>
> On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley <aph at redhat.com> wrote:
>
> Hi,
>
> On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote:
>
> > here an example:
> >
> > A a  = new A ();      // a does not escape
> > Safepoint();             // a is known to GC
> >                                      // Concurrent GC is running.
> > B b = new B(a);
> >
> >     where
> >     B(A a) {
> >          <Initialize>
> >          StoreStore barrier  // This is removed by the optimization.
> >         a.x = this;                    // Then this is not initialized,
> but visible to GC
> >         final field store
> >         Membar_release
> >     }
>
> Hmm, interesting.  Here we're presented with two objects which
> escape analysis reveals as not escaping but both are allocated
> anyway and are included in the OOP map.
>
> I'd argue that once you've put an object into an OOP map to be scanned
> it has escaped, but that may well not be how C2 handles it.  For this
> reachability analysis to be correct, if you put a reference to an
> object into any object which is reachable as a GC root then that object
> surely does escape.
>
> Andrew.
>
>
>

From vitalyd at gmail.com  Tue Dec 15 15:11:00 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 15 Dec 2015 10:11:00 -0500
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <567029EA.5030607@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
Message-ID: <CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>

Yes that was my fault; I had forgotten about the ArgEscape analysis result.

To answer your question somewhat, if an object is NoEscape then it's scalar
replaced in the end.  I don't think there's any other end result in hotspot
(e.g there's no stack allocation).

On Tuesday, December 15, 2015, Andrew Haley <aph at redhat.com> wrote:

> On 12/15/2015 02:51 PM, Vitaly Davidovich wrote:
> > Hotspot implements only the scalar replacement form of EA.
>
> Scalar replacement is not a form of escape analysis.  This does
> not answer my question, which was:
>
> > Are you saying that if escape analysis determined that an object does
> > not escape then you know *for sure* that it will always be scalar-
> > replaced?
>
> Andrew.
>
>


From goetz.lindenmaier at sap.com  Tue Dec 15 15:14:02 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 15:14:02 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAF1YaiBOm2rmogb5oEOHqqG86H7BmJbN2rVmjFXR-wqsQhHH_Q@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com>	<566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<CAF1YaiBOm2rmogb5oEOHqqG86H7BmJbN2rVmjFXR-wqsQhHH_Q@mail.gmail.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEF66@DEWDFEMB12A.global.corp.sap>

Hi Hui

That depends on how BCEscapeAnalysis is implemented.  I don't know this in detail.
But in theory, after analyzing a callee, you represent it by some function
describing its semantics. From this you would derive that both are ArgEscape in the end.

Best regards,
  Goetz.


From: Hui Shi [mailto:hui.shi at linaro.org]
Sent: Dienstag, 15. Dezember 2015 15:51
To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
Cc: Andrew Haley <aph at redhat.com>; Doerr, Martin <martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) <mikael.gerdin at oracle.com>
Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode

Thanks All!

In Goetz's example, suppose the outer method is named foo and objects a and b do not escape in foo. b does not escape in foo because a does not escape in foo.

But b escapes in its initializer according to BCEscapeAnalysis. In b's initializer method, "this" should be marked as escaped because it is assigned to a field of another parameter (the store to a.x). As b escapes in its initializer, the StoreStore barrier will not be removed in this case, so it's safe.

Regards
Hui

On 15 December 2015 at 21:53, Lindenmaier, Goetz <goetz.lindenmaier at sap.com<mailto:goetz.lindenmaier at sap.com>> wrote:
Hi Andrew,

here an example:

A a  = new A ();      // a does not escape
Safepoint();             // a is known to GC
                                     // Concurrent GC is running.
B b = new B(a);

    where
    B(A a) {
         <Initialize>
         StoreStore barrier  // This is removed by the optimization.
        a.x = this;                    // Then this is not initialized, but visible to GC
        final field store
        Membar_release
    }

Best regards,
  Martin and Goetz.


> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com<mailto:aph at redhat.com>]
> Sent: Dienstag, 15. Dezember 2015 14:46
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com<mailto:goetz.lindenmaier at sap.com>>; Doerr, Martin
> <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>; Aleksey Shipilev <aleksey.shipilev at oracle.com<mailto:aleksey.shipilev at oracle.com>>;
> Vladimir Kozlov <vladimir.kozlov at oracle.com<mailto:vladimir.kozlov at oracle.com>>; Hui Shi <hui.shi at linaro.org<mailto:hui.shi at linaro.org>>;
> hotspot compiler <hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>>; aarch64-port-
> dev <aarch64-port-dev at openjdk.java.net<mailto:aarch64-port-dev at openjdk.java.net>>; Mikael Gerdin
> <mikael.gerdin at oracle.com<mailto:mikael.gerdin at oracle.com>> (mikael.gerdin at oracle.com<mailto:mikael.gerdin at oracle.com>)
> <mikael.gerdin at oracle.com<mailto:mikael.gerdin at oracle.com>>
> Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
> barrier after AllocationNode
>
> Hi,
>
> On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote:
>
> > What if it's assigned to an object that's already completely alive,
> > but does not escape itself?
>
> It's not clear to me exactly what this means.  However, if neither
> object escapes then they are both reachable to GC only via scanning
> the stack, and this can happen only at safepoints.
>
> Andrew.


From goetz.lindenmaier at sap.com  Tue Dec 15 16:01:40 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 16:01:40 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com>	<566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>

Yes, there is no stack allocation.
But locks are removed, see escape.cpp:1844, which is executed under the
condition not_global_escape(). See callnode:1770 as well.

Also, does_not_escape_thread() used here checks for <= ArgEscape.

Further, if the object is NoEscape it might not be scalar replaced. If I remember
correctly, there are various conditions, e.g., too big, allocated in loop.

And, the constructor could be inlined (or does this happen after expand_allocate_common()?)
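
To make the "NoEscape but still allocated" point concrete, here is a minimal Java sketch. All class and method names are made up for illustration, and whether C2 actually scalar-replaces any of these allocations depends on the JVM version, the flags and the inlining decisions; the comments describe what may happen, not what is guaranteed.

```java
// Illustrative only: none of these objects is reachable from outside its
// method, yet the JIT is not obliged to scalar-replace all of them.
final class NoEscapeButAllocated {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Small object, single allocation site: a typical scalar-replacement candidate.
    static int smallLocal(int i) {
        Point p = new Point(i, i + 1);
        return p.x + p.y;
    }

    // Never leaves the method, but a large array may be judged too big to
    // scalarize and can still end up as a real heap allocation.
    static int largeLocalArray(int i) {
        int[] buf = new int[1024];
        buf[i & 1023] = i;
        return buf[i & 1023];
    }

    // Two allocation sites merge into one variable; such merges are another
    // situation in which the allocation may be kept.
    static int mergedLocal(int i) {
        Point p = (i & 1) == 0 ? new Point(i, 0) : new Point(0, i);
        return p.x + p.y;
    }

    public static void main(String[] args) {
        int sum = 0;
        for (int i = 0; i < 20_000; i++) {
            sum += smallLocal(i) + largeLocalArray(i) + mergedLocal(i);
        }
        System.out.println("sum = " + sum);
    }
}
```

Running this with -XX:+PrintEscapeAnalysis on a debug build would be one way to compare the reported escape state with what is actually allocated.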

Best regards,
  Goetz.


From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Tuesday, 15 December 2015 16:11
To: Andrew Haley <aph at redhat.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Doerr, Martin <martin.doerr at sap.com>; Aleksey Shipilev <aleksey.shipilev at oracle.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; Hui Shi <hui.shi at linaro.org>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev <aarch64-port-dev at openjdk.java.net>; Mikael Gerdin <mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) <mikael.gerdin at oracle.com>
Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode

Yes that was my fault; I had forgotten about the ArgEscape analysis result.

To answer your question somewhat, if an object is NoEscape then it's scalar replaced in the end.  I don't think there's any other end result in hotspot (e.g. there's no stack allocation).

On Tuesday, December 15, 2015, Andrew Haley <aph at redhat.com> wrote:
On 12/15/2015 02:51 PM, Vitaly Davidovich wrote:
> Hotspot implements only the scalar replacement form of EA.

Scalar replacement is not a form of escape analysis.  This does
not answer my question, which was:

> Are you saying that if escape analysis determined that an object does
> not escape then you know *for sure* that it will always be scalar-
> replaced?

Andrew.



From aph at redhat.com  Tue Dec 15 16:15:08 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 16:15:08 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
Message-ID: <56703C8C.4000801@redhat.com>

On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote:

> Further, if the object is NoEscape it might not be scalar
> replaced. If I remember correctly, there are various conditions,
> e.g., too big, allocated in loop.

Well, that's the killer.  The definition of "escape" we need to use
here is the really, truly, honest-to-goodness one: that this object
never becomes visible to any other thread by any means.  Unless that
is so, all bets are off.  In this case, what is intended is "appears
in an OOP map".

Andrew.

From aph at redhat.com  Tue Dec 15 18:00:57 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 18:00:57 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache
 generates SEGV
In-Reply-To: <1449588750.5880.28.camel@mylittlepony.linaroharston>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
	<5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint>
	<566595B5.9060400@redhat.com>
	<1449588750.5880.28.camel@mylittlepony.linaroharston>
Message-ID: <56705559.8020900@redhat.com>

On 12/08/2015 03:32 PM, Edward Nevill wrote:
> OK. Thanks, I have satisfied myself that this is correct.
> 
> New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2

By the powers newly vested in me I hereby approve this patch.

Andrew.


From hui.shi at linaro.org  Wed Dec 16 12:27:00 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Wed, 16 Dec 2015 20:27:00 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <56703C8C.4000801@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
Message-ID: <CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>

Thanks Andrew, Goetz and all!

The major concern is whether removing the storestore barrier could let other
threads read stale data for a newly allocated object. Other threads include
Java threads and concurrent GC threads. The following analysis suggests it is
safe.

1. If the BCEA result says "this" (b) escapes in its initializer, the change
   will not optimize away the storestore barrier.
2. If the BCEA result says "this" (b) does not escape in its initializer, it
   is safe to remove the storestore barrier.
   2.1 If there is a safepoint between storestore and release, b is visible
       to GC in the initializer, but there should be a memory barrier at the
       safepoint.
   2.2 If there is no safepoint between storestore and release, b becomes
       visible to other threads only after the release memory barrier.

Case #1
A a = new A();
safepoint // a can be reached from GC
new B(a)

allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
a.x = this;  // b might be visible to other threads here
....
release
-------- init end

The BCEA result indicates that "this" (b) is neither local nor arg_stack, so
"b" is treated as escaped in its initializer and the change will not optimize
away the storestore barrier.
[EA] estimated escape information for B::<init>
     non-escaping args:      {}
     stack-allocatable args: {1}
     return non-local value
     modified args:     0x6    0x6
     flags:
b="this"  is not local and not arg_stack
a         is arg_stack, meaning it is passed in and not assigned to another
object in the initializer.

Case #2.1
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
safepoint  // "this" is in oop map and might be visible to GC thread here
....
release
-------- init end

Case #2.2
allocation
-------
b.klass =...
b.markword =...
b.f1 = 0
..
b.fn = 0
storestore
-------- init start
....
release
-------- init end
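
The case analysis above can be restated as a small decision function. This is only a sketch of the argument as written in this mail: the class, the enum and the two boolean inputs are hypothetical stand-ins (they are not HotSpot APIs), and the sketch does not show that eliding the barrier is actually safe.

```java
// Restates Case #1, #2.1 and #2.2 from the mail as code. It encodes the
// argument only; it is not a proof and not HotSpot code.
final class StoreStoreCases {
    enum Verdict { KEEP_STORESTORE, ELIDE_RELY_ON_SAFEPOINT, ELIDE_RELY_ON_RELEASE }

    // thisEscapesInInit: hypothetical stand-in for the BCEA result on "this".
    // safepointInInit:   whether a safepoint sits between storestore and release.
    static Verdict classify(boolean thisEscapesInInit, boolean safepointInInit) {
        if (thisEscapesInInit) {
            return Verdict.KEEP_STORESTORE;          // Case #1
        }
        if (safepointInInit) {
            return Verdict.ELIDE_RELY_ON_SAFEPOINT;  // Case #2.1
        }
        return Verdict.ELIDE_RELY_ON_RELEASE;        // Case #2.2
    }

    public static void main(String[] args) {
        boolean[] values = { true, false };
        for (boolean escapes : values) {
            for (boolean safepoint : values) {
                System.out.println("escapes=" + escapes + " safepoint=" + safepoint
                        + " -> " + classify(escapes, safepoint));
            }
        }
    }
}
```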

Regards
Hui

On 16 December 2015 at 00:15, Andrew Haley <aph at redhat.com> wrote:

> On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote:
>
> > Further, if the object is NoEscape it might not be scalar
> > replaced. If I remember correctly, there are various conditions,
> > e.g., too big, allocated in loop.
>
> Well, that's the killer.  The definition of "escape" we need to use
> here is the really, truly, honest-to-goodness one: that this object
> never becomes visible to any other thread by any means.  Unless that
> is so, all bets are off.  In this case, what is intended is "appears
> in an OOP map".
>
> Andrew.
>

From martin.doerr at sap.com  Thu Dec 17 13:54:20 2015
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 17 Dec 2015 13:54:20 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com>	<566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>

Hi Hui Shi,

my concern was not limited to 8144993, but also with respect to 8136596 which is already pushed.


I have written the following small Java example:

public class TestAllocMemBar{

  static final int loop_cnt = 20000;

  void dont_inline_me() {}


  public class A{
    public B b;
  }

  public class B{
    public B(A a) { a.b = B.this; }
  }

  public void TestMethod() {
    A a = new A();
    dont_inline_me();
    //System.gc();
    B b = new B(a);
  }


  public static void main(String args[]){
    TestAllocMemBar xyz = new TestAllocMemBar();
    long duration = System.nanoTime();
    for (int x = 0; x < loop_cnt; x++) { xyz.TestMethod(); }
    duration = System.nanoTime() - duration;
    System.out.println("duration: " + duration/1000/loop_cnt + " us per iteration");
  }

}


Execution shows (tested on PPC64):
openjdk_9/bin/java -XX:+UseConcMarkSweepGC -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:CompileCommand="exclude TestAllocMemBar::dont_inline_me" -XX:+PrintInlining -XX:+PrintEscapeAnalysis -XX:-EliminateAllocations TestAllocMemBar
...
======== Connection graph for  TestAllocMemBar::TestMethod
JavaObject NoEscape(NoEscape) [ 59F 179F [ 37 42 ]]   25        Allocate        ===  5  6  7  8  1 ( 23  21  22  1  10  1  1 ) [[ 26  27  28  35  36  37 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:0 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 25P [ 42 59b ]]   37 Proj    ===  25  [[ 38  42  59 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:0
LocalVar [ 37 25P [ 179b ]]   42        CheckCastPP     ===  39  37  [[ 179  183  179  119  98  93 ]]  #TestAllocMemBar$A:NotNull:exact *  Oop:TestAllocMemBar$A:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:0

JavaObject NoEscape(NoEscape) NSR [ 153F [ 131 136 180 179 ]]   119     Allocate        ===  105  100  101  8  1 ( 54  117  22  1  10  42  1 ) [[ 120  121  122  129  130  131 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:13 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 119P [ 136 153b ]]   131     Proj    ===  119  [[ 132  136  153 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 131 119P [ 180 ]]   136      CheckCastPP     ===  133  131  [[ 180  193 ]]  #TestAllocMemBar$B:NotNull:exact *  Oop:TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:13
LocalVar [ 136 119P [ 179 ]]   180      EncodeP === _  136  [[ 181 ]]  #narrowoop: TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar$B::<init> @ bci:11 TestAllocMemBar::TestMethod @ bci:19

                            @ 5   TestAllocMemBar$A::<init> (10 bytes)   inline (hot)
                              @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
                            @ 10   TestAllocMemBar::dont_inline_me (1 bytes)   not compilable (disabled)
                            @ 19   TestAllocMemBar$B::<init> (15 bytes)   inline (hot)
                              @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
                            @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
                            @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
duration: 3 us per iteration


So you can see that both Allocations have the state NoEscape, but there's a safepoint (the non-inlined call) between them. Concurrent GC could access the obj header and read stale data (and possibly crash). OptoAssembly shows that the MemBar was optimized out (probably due to 8136596).

However, we may be lucky. Maybe no concurrent GC accesses the header of newly created objects. But I don't know whether this is true, which is why I posted this question originally. Keep in mind that objects can get allocated in old gen.

I can still imagine that these two optimizations may be dangerous.
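
For readers who want to see the ordering in question at the Java level, here is a rough sketch using VarHandle.storeStoreFence() (Java 9 or later). It is only an analogy: the class and field names are invented, and the barrier discussed in this thread also orders the header stores (klass and mark word), which Java code cannot express.

```java
import java.lang.invoke.VarHandle;

// Java-level analogy for "initialize, storestore, publish". The real barrier
// is emitted by the JIT after the allocation; this only models the ordering.
final class StoreStoreSketch {
    static final class Box {
        int value;
    }

    // Plain (non-volatile) field standing in for "a.b" in the test case.
    static Box shared;

    static Box publish(int v) {
        Box b = new Box();
        b.value = v;                  // initializing store
        VarHandle.storeStoreFence();  // earlier stores are ordered before later stores
        shared = b;                   // publishing store
        return b;
    }

    public static void main(String[] args) {
        Box b = publish(42);
        System.out.println("published value: " + shared.value + ", same object: " + (shared == b));
    }
}
```

Without the fence, nothing in this sketch would stop the publishing store from becoming visible before the initializing store on a weakly ordered machine, which is the kind of reordering the concern above is about.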

Best regards,
  Martin



From aph at redhat.com  Thu Dec 17 13:59:47 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 17 Dec 2015 13:59:47 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
Message-ID: <5672BFD3.7040307@redhat.com>

On 12/17/2015 01:54 PM, Doerr, Martin wrote:

> So you can see that both Allocations have the state NoEscape, but
> there's a safepoint (the non-inlined call) between them. Concurrent
> GC could access the obj header and read stale data (and possibly
> crash). OptoAssembly shows that the MemBar was optimized out
> (probably due to 8136596).
> 
> However, we may have luck. Maybe no concurrent GC accesses the
> header of newly created objects. But I don't know if this is true
> which is the reason why I posted this question originally. Keep in
> mind that objects can get allocated in old gen.

So, they are both NoEscape.  So do the objects actually get allocated?
Or are they scalar-replaced?

Andrew.

From hui.shi at linaro.org  Thu Dec 17 15:28:35 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Thu, 17 Dec 2015 23:28:35 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <5672BFD3.7040307@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
	<5672BFD3.7040307@redhat.com>
Message-ID: <CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>

Thanks Martin!

Could we limit the discussion in this thread to 8144993? As stated in my
earlier mail, it looks safe in all 3 cases for references from either a GC
thread or another Java thread.

8136596 extends the original optimization from NoEscape to both NoEscape and
ArgEscape. As your new example shows, both allocations are NoEscape, so it is
not directly related to 8136596.  How about starting a new thread to discuss
whether there is a possible danger in the original storestore barrier
optimization?

Regards
Hui

On 17 December 2015 at 21:59, Andrew Haley <aph at redhat.com> wrote:

> On 12/17/2015 01:54 PM, Doerr, Martin wrote:
>
> > So you can see that both Allocations have the state NoEscape, but
> > there's a safepoint (the non-inlined call) between them. Concurrent
> > GC could access the obj header and read stale data (and possibly
> > crash). OptoAssembly shows that the MemBar was optimized out
> > (probably due to 8136596).
> >
> > However, we may have luck. Maybe no concurrent GC accesses the
> > header of newly created objects. But I don't know if this is true
> > which is the reason why I posted this question originally. Keep in
> > mind that objects can get allocated in old gen.
>
> So, they are both NoEscape.  So do the objects actually get allocated?
> Or are they scalar-replaced?
>
> Andrew.
>

From aph at redhat.com  Thu Dec 17 15:34:54 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 17 Dec 2015 15:34:54 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
	<5672BFD3.7040307@redhat.com>
	<CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>
Message-ID: <5672D61E.3020805@redhat.com>

On 12/17/2015 03:28 PM, Hui Shi wrote:
> Could we limit the discussion in this thread to 8144993? As stated in my
> earlier mail, it looks safe in all 3 cases for references from either a GC
> thread or another Java thread.
> 
> 8136596 extends the original optimization from NoEscape to both NoEscape and
> ArgEscape. As your new example shows, both allocations are NoEscape, so it is
> not directly related to 8136596.  How about starting a new thread to discuss
> whether there is a possible danger in the original storestore barrier
> optimization?

I say we should not do that.  Martin's concern is real, and you have
shown no reason to suppose that removing the memory barriers will not
result in a concurrent GC seeing stale object headers.  As it stands,
unless someone can come up with something convincing, we're going to
have to restore those memory barriers.  8144993 should not be committed
until this issue is resolved.

Andrew.

From aph at redhat.com  Thu Dec 17 15:43:38 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 17 Dec 2015 15:43:38 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <5672D61E.3020805@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
	<5672BFD3.7040307@redhat.com>
	<CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>
	<5672D61E.3020805@redhat.com>
Message-ID: <5672D82A.309@redhat.com>

The potential problem only arises if "this" is published unsafely and
the object to which it is published doesn't escape.

Can't we detect unsafe publication?  It ought to be easier than escape
analysis: it's a matter of detecting that "this" escapes from the
constructor.
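
As a rough illustration of what "this escapes from the constructor" looks like in source form, here is a small Java sketch. The class names are invented for this example, and it says nothing about how such a check would be implemented inside C2.

```java
import java.util.ArrayList;
import java.util.List;

// Three constructor shapes: two publish "this" before construction has
// finished, one publishes only after the constructor has returned.
final class ThisEscapeSketch {
    static final List<Object> REGISTRY = new ArrayList<>();

    static final class Holder {
        Object ref;
    }

    // Unsafe publication: "this" is stored into an argument's field.
    static final class LeaksViaArgument {
        final int value;
        LeaksViaArgument(Holder h) {
            h.ref = this;   // visible through h before value is set
            value = 42;
        }
    }

    // Unsafe publication: "this" is stored into a static collection.
    static final class LeaksViaStatic {
        final int value;
        LeaksViaStatic() {
            REGISTRY.add(this);
            value = 42;
        }
    }

    // "this" stays inside the constructor; a factory publishes afterwards.
    static final class PublishedAfterInit {
        final int value;
        private PublishedAfterInit() {
            value = 42;
        }
        static PublishedAfterInit create(Holder h) {
            PublishedAfterInit p = new PublishedAfterInit();
            h.ref = p;      // publication happens after construction
            return p;
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        PublishedAfterInit p = PublishedAfterInit.create(h);
        System.out.println("published after init: " + (h.ref == p) + ", value=" + p.value);
    }
}
```

Only the first two shapes store "this" somewhere reachable before the constructor returns, which is the pattern the question above is about.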

Andrew.

From edward.nevill at gmail.com  Thu Dec 17 16:07:34 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 17 Dec 2015 16:07:34 +0000
Subject: [aarch64-port-dev ] RFR: JDK 7: Add Support for Large Code Cache on
	aarch64
Message-ID: <1450368454.21162.22.camel@mylittlepony.linaroharston>

Hi,

The following webrev adds support for large code caches to JDK 7 for aarch64

http://cr.openjdk.java.net/~enevill/jdk7_largecode/webrev/

Tested with jtreg hotspot/langtools.

hotspot (original): Test results: passed: 297; failed: 12; error: 2
hotspot (patched): Test results: passed: 297; failed: 12; error: 2
hotspot (256m cache): Test results: passed: 298; failed: 11; error: 2

langtools (original): Test results: passed: 1,973; failed: 1
langtools (patched): Test results: passed: 1,973; failed: 1
langtools (256m cache): Test results: passed: 1,973; failed: 1

Only aarch64 files are touched in this patch.

OK to push?
Ed.


From martin.doerr at sap.com  Thu Dec 17 17:58:22 2015
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 17 Dec 2015 17:58:22 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <5672D82A.309@redhat.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
	<5672BFD3.7040307@redhat.com>
	<CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>
	<5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com>
Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>

Hi Andrew,

thanks for your emails.

Many memory barriers are only there for concurrent Java threads and are not relevant for GC. They are opportunities for EscapeAnalysis-based optimizations.

The MemBarStoreStore after the Allocation actually has this purpose plus the additional purpose to satisfy GC requirements. EscapeAnalysis was not designed to analyze "escape to concurrent GC". I guess it is difficult to analyze this in general.

So maybe it would be better to change the condition for the MemBarStoreStore barrier insertion to something like
"gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." with the first function containing the knowledge about all GCs.
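
The suggested condition can be modelled in a few lines of Java to show its truth table. Everything here is hypothetical: the enum, the method names and especially the classification of which collectors need initialized headers are placeholders for illustration, not statements about real GCs or about the HotSpot sources.

```java
// Models "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape"
// as a plain predicate. The GC classification below is a placeholder.
final class BarrierPolicySketch {
    enum Gc { STOP_THE_WORLD_ONLY, CONCURRENT }

    // Placeholder assumption: only a collector that runs concurrently with
    // Java threads might look at headers of freshly allocated objects.
    static boolean gcRequiresInitializedNewObjHeaders(Gc gc) {
        return gc == Gc.CONCURRENT;
    }

    // The barrier is kept if either the GC needs it or the object may escape.
    static boolean needsStoreStore(Gc gc, boolean allocDoesNotEscape) {
        return gcRequiresInitializedNewObjHeaders(gc) || !allocDoesNotEscape;
    }

    public static void main(String[] args) {
        for (Gc gc : Gc.values()) {
            for (boolean noEscape : new boolean[] { true, false }) {
                System.out.println(gc + ", doesNotEscape=" + noEscape
                        + " -> barrier=" + needsStoreStore(gc, noEscape));
            }
        }
    }
}
```

With this shape, escape analysis alone can remove the barrier only when the collector in use does not need initialized headers.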

You also had asked if the objects in my example were scalar replaced. By default, they do get scalar-replaced, but I had prevented this by -XX:-EliminateAllocations which does not influence the escape state and the membar optimizations.

Best regards,
 Martin


From vitalyd at gmail.com  Thu Dec 17 18:10:44 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 17 Dec 2015 13:10:44 -0500
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
	<5672BFD3.7040307@redhat.com>
	<CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>
	<5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>
Message-ID: <CAHjP37FE2vFEEQtQn0v0sY8MkK0Qe8ykQfjS0yQsVSOfyW3D4w@mail.gmail.com>

>
> You also had asked if the objects in my example were scalar replaced. By
> default, they do get scalar-replaced, but I had prevented this by
> -XX:-EliminateAllocations which does not influence the escape state and the
> membar optimizations.


I'd say that's a big problem, no? The membar elimination is only safe if
the allocation is actually removed.  If the analysis says it's NoEscape but
compiler still allocates it for whatever reason (Goetz mentioned a couple
earlier in this thread), then it seems insufficient to rely on just the
analysis result.

On Thu, Dec 17, 2015 at 12:58 PM, Doerr, Martin <martin.doerr at sap.com>
wrote:

> Hi Andrew,
>
> thanks for your emails.
>
> Many memory barriers are only there for concurrent java threads and are
> not relevant for GC. They are opportunities for EscapeAnalysis-based
> optimizations.
>
> The MemBarStoreStore after the Allocation actually has this purpose plus
> the additional purpose to satisfy GC requirements. EscapeAnalysis was not
> designed to analyze "escape to concurrent GC". I guess it is difficult to
> analyze this in general.
>
> So maybe it would be better to change the condition for the
> MemBarStoreStore barrier insertion to something like
> "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..."
> with the first function containing the knowledge about all GCs.
>
> You also had asked if the objects in my example were scalar replaced. By
> default, they do get scalar-replaced, but I had prevented this by
> -XX:-EliminateAllocations which does not influence the escape state and the
> membar optimizations.
>
> Best regards,
>  Martin
>
> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Donnerstag, 17. Dezember 2015 16:44
> To: Hui Shi <hui.shi at linaro.org>
> Cc: Doerr, Martin <martin.doerr at sap.com>; Lindenmaier, Goetz <
> goetz.lindenmaier at sap.com>; Vitaly Davidovich <vitalyd at gmail.com>;
> Aleksey Shipilev <aleksey.shipilev at oracle.com>; Vladimir Kozlov <
> vladimir.kozlov at oracle.com>; hotspot compiler <
> hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev <
> aarch64-port-dev at openjdk.java.net>; Mikael Gerdin <
> mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) <
> mikael.gerdin at oracle.com>
> Subject: Re: RFR: 8144993: Elide redundant memory barrier after
> AllocationNode
>
> The potential problem only arises if "this" is published unsafely and
> the object to which it is published doesn't escape.
>
> Can't we detect unsafe publication?  It ought to be easier than escape
> analysis: it's a matter of detecting that "this" escapes from the
> constructor.
>
> Andrew.
>

From goetz.lindenmaier at sap.com  Fri Dec 18 10:43:44 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 18 Dec 2015 10:43:44 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com>	<566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap>

Hi Hui,

> Subject: Re: RFR: 8144993: Elide redundant memory barrier after
> AllocationNode
> 
> Thanks Andrew, Goetz and all!
> 
> Major concern is will removing storestore barrier cause other threads read
> stale data for newly allocated object. Other threads include java thread or
> concurrent GC thread. It should be safe with following analysis.
> 
> 1. If BCEA result "this"(b) escapes in its initializer, change will not optimize
> storestore barrier.
> 2. If BCEA result "this"(b) does not escape in its initializer, it's safe to remove
> storestore.
>    2.1 If there is a safe point between storestore and release, b is visible to GC
> in initializer, but at safe point, it should have a memory barrier.
>    2.2 If there is no safe point between storestore and release. b will be visible
> to other thread after release memory barrier.
I think this describes the situation correctly wrt. my counterexample. I'm
not sure whether there are other possibilities.

Is the test for 1.) already implemented?
How do you do this?  Is inlining of the constructor delayed when you do
your optimization, so you can find the call to it?  Or do you find the BCEA information
via the class that is reachable over the type information?  How do you know then
which constructor was called if there are several?

Best regards,
  Goetz.


> 
> Case #1
> A a = new A();
> safepoint // a can be reached from GC
> new B(a)
> 
> allocation
> -------
> b.klass =...
> b.markword =...
> b.f1 = 0
> ..
> b.fn = 0
> storestore
> -------- init start
> ....
> a.x = this;  // b might visible to other threads here
> ....
> release
> -------- init end
> 
> BCEA result indicate "this"(b) is not local and not arg_stack. So "b" will be
> treated as escaped in its initialzer, so change will not optimize storestore
> barrier.
> [EA] estimated escape information for B::<init>
>      non-escaping args:      {}
>      stack-allocatable args: {1}
>      return non-local value
>      modified args:     0x6    0x6
>      flags:
> b="this"  is not local and not arg_stack
> a        is arg_stack means it is passed in and not assigned to other object in
> initializer.
> 
> Case #2.1
> allocation
> -------
> b.klass =...
> b.markword =...
> b.f1 = 0
> ..
> b.fn = 0
> storestore
> -------- init start
> ....
> safepoint  // "this" is in oop map and might visible to GC thread here
> ....
> release
> -------- init end
> 
> Case #2.2
> allocation
> -------
> b.klass =...
> b.markword =...
> b.f1 = 0
> ..
> b.fn = 0
> storestore
> -------- init start
> ....
> release
> -------- init end
> 
> Regards
> Hui
> 
> On 16 December 2015 at 00:15, Andrew Haley <aph at redhat.com
> <mailto:aph at redhat.com> > wrote:
> 
> 
> 	On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote:
> 
> 	> Further, if the object is NoEscape it might not be scalar
> 	> replaced. If I remember correctly, there are various conditions,
> 	> e.g., too big, allocated in loop.
> 
> 	Well, that's the killer.  The definition of "escape" we need to use
> 	here is the really, truly, honest-to-goodness one: that this object
> 	never becomes visible to any other thread by any means.  Unless
> that
> 	is so, all bets are off.  In this case, what is intended is "appears
> 	in an OOP map".
> 
> 	Andrew.
> 
> 


From goetz.lindenmaier at sap.com  Fri Dec 18 11:09:41 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 18 Dec 2015 11:09:41 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
 barrier after AllocationNode
In-Reply-To: <CAHjP37FE2vFEEQtQn0v0sY8MkK0Qe8ykQfjS0yQsVSOfyW3D4w@mail.gmail.com>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
	<5672BFD3.7040307@redhat.com>
	<CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>
	<5672D61E.3020805@redhat.com>	<5672D82A.309@redhat.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>
	<CAHjP37FE2vFEEQtQn0v0sY8MkK0Qe8ykQfjS0yQsVSOfyW3D4w@mail.gmail.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap>

Hi

> >	You also had asked if the objects in my example were scalar replaced.
> > By default, they do get scalar-replaced, but I had prevented this by -XX:-
> > EliminateAllocations which does not influence the escape state and the
> > membar optimizations.
> 
> I'd say that's a big problem, no? The membar elimination is only safe if the
> allocation is actually removed.  If the analysis says it's NoEscape but compiler
> still allocates it for whatever reason (Goetz mentioned a couple earlier in this
> thread), then it seems insufficient to rely on just the analysis result.
Well, if it's NoEscape it's safe to remove the barriers wrt. Java semantics,
no matter what other optimizations (here: scalar replacement) do.  But here
we look at the importance of the barrier to the runtime system, which is
VM implementation specific.
In particular, the new optimization also addresses objects that escape, as long
as they don't escape before the barrier at the end of the constructor.
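
To make the distinction concrete, here is a minimal single-threaded Java
sketch. The class and field names are made up for illustration and are not
taken from the patch. It models Case #1, where the constructor stores "this"
into a.x before the final field is written, next to the variant where the
caller publishes the object only after the constructor has returned. It only
shows what a reader going through a.x could observe at that point; it does
not exercise real barriers or concurrency.

```java
public final class PublicationSketch {
    /** Stand-in for the object that "this" is published to (a.x in Case #1). */
    static final class A {
        B x;
    }

    static final class B {
        final int f1;
        /** Value a reader going through a.x saw mid-constructor; -1 if b was not reachable. */
        final int seenViaA;

        B(A a, boolean publishEarly) {
            if (publishEarly) {
                a.x = this;          // "this" escapes before the constructor ends
            }
            // Read through a.x, the way another thread would have to.
            seenViaA = (a.x == null) ? -1 : a.x.f1;
            f1 = 42;                 // final field write; the release barrier comes after it
        }
    }

    public static void main(String[] args) {
        A leaky = new A();
        B early = new B(leaky, true);
        System.out.println("early publish saw f1 = " + early.seenViaA);

        A safe = new A();
        B late = new B(safe, false);
        safe.x = late;               // published only after construction
        System.out.println("late publish, f1 via a.x = " + safe.x.f1);
    }
}
```

With early publication the reader sees the default value 0 rather than 42,
which is the window the barrier at the end of the constructor cannot close.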

Best regards,
  Goetz.




From hui.shi at linaro.org  Fri Dec 18 12:45:43 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Fri, 18 Dec 2015 20:45:43 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
	<7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap>
Message-ID: <CAF1YaiAR8qQp+iGj_23g64NA1K659QFJO=fL=pH08pc6EtUjpQ@mail.gmail.com>

Thanks Goetz!

Case 1) is handled by the current patch. The BCEA information is taken from
the owning method when the release memory barrier for a final field write is
inserted. A final field is initialized in the constructor method of its
owning allocation node.

The following code is in Parse::do_exits. alloc->compute_MemBar_redundancy gets
the constructor method's BCEA information and checks whether the allocation
escapes in the constructor method.

   if (method()->is_initializer() &&
         (wrote_final() ||
            PPC64_ONLY(wrote_volatile() ||)
            (AlwaysSafeConstructors && wrote_fields()))) {
     _exits.insert_mem_bar(Op_MemBarRelease, alloc_with_final());
+
+    // If Memory barrier is created for final fields write
+    // and allocation node does not escape the initialize method,
+    // then barrier introduced by allocation node can be removed.
+    if (DoEscapeAnalysis && alloc_with_final()) {
+      AllocateNode *alloc = AllocateNode::Ideal_allocation(alloc_with_final(), &_gvn);
+      alloc->compute_MemBar_redundancy(method());
+    }


Regards
Hui



From hui.shi at linaro.org  Fri Dec 18 13:10:06 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Fri, 18 Dec 2015 21:10:06 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
	barrier after AllocationNode
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>
References: <CAF1YaiDHohqfZmq=z9FPtOeiQgkfOS6xk6A2K7Hk=y66e2zuiA@mail.gmail.com>
	<566FEE89.5020300@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>
	<567019A5.1000202@redhat.com>
	<4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>
	<56701E2E.5000901@redhat.com>
	<CAHjP37HQKgcsw8-Ua4ej3CUjfCDvL9PahLbsrzO9xP-_1cOPxQ@mail.gmail.com>
	<567024A0.40409@redhat.com>
	<CAHjP37HGnsevbOzku7cZrsVF+V1VGF-zoCijCH=_+Q+_eYgYDw@mail.gmail.com>
	<567029EA.5030607@redhat.com>
	<CAHjP37FrofiZcaa0-3NXuoW-4nnDe6byrkevBm3WBzFA2-gRvA@mail.gmail.com>
	<4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap>
	<56703C8C.4000801@redhat.com>
	<CAF1YaiB8-thrBkP-Q58TDL026Fpd2HTrerOEFzxEy3uDWFL9qg@mail.gmail.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
	<5672BFD3.7040307@redhat.com>
	<CAF1YaiDOqXdp8KFpX-kU=1Lab2RWPEvwvsf=Oyp3obV5zviOAw@mail.gmail.com>
	<5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com>
	<7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>
Message-ID: <CAF1YaiCmoy+-xs5uMteU_m300Dx61hVXUnv3di_PKr5jsUqQCg@mail.gmail.com>

Thanks Andrew and Martin!

Agreed, it's better to fix the original storestore barrier optimization using
escape information.

When entering PhaseMacroExpand::expand_allocate_common, the object must be
allocated on the heap and can't be scalar replaced, correct? This issue can't be
solved by detecting unsafe publication in the constructor alone: in the following
example, b is published outside the constructor and the storestore barrier still
can't be removed.
  public void TestMethod() {
    A a = new A();
    dont_inline_me();
    //System.gc();
    B b = new B(); // empty constructor
    // no safepoint
    a.b = b;
  }

Martin's proposed fix looks reasonable: disable the original storestore barrier
optimization if GC threads might reference the allocated object.
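
As a rough sketch of the condition Martin suggested, written in Java purely
for illustration: the real check would live in C2, and the method and
parameter names below are hypothetical stand-ins for
gc_requires_initialized_new_obj_headers() and alloc->does_not_escape. Only
the two visible terms of his condition are modelled.

```java
public final class StoreStorePolicySketch {
    /**
     * Returns true when the MemBarStoreStore after the allocation must be kept.
     * Both parameters are assumed to be supplied by the GC and by escape
     * analysis respectively; nothing here is a HotSpot API.
     */
    static boolean keepStoreStore(boolean gcRequiresInitializedNewObjHeaders,
                                  boolean allocDoesNotEscape) {
        return gcRequiresInitializedNewObjHeaders || !allocDoesNotEscape;
    }

    public static void main(String[] args) {
        // Non-escaping allocation under a GC that may look at new objects concurrently.
        System.out.println("concurrent GC, NoEscape: keep = " + keepStoreStore(true, true));
        // Same allocation under a GC that never needs initialized headers early.
        System.out.println("other GC, NoEscape:      keep = " + keepStoreStore(false, true));
    }
}
```

The point of putting the GC term first is that the escape state alone is
never enough to drop the barrier when a GC thread might reach the object.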

Regards
Hui


From bob.vandette at oracle.com  Wed Dec 23 14:55:35 2015
From: bob.vandette at oracle.com (Bob Vandette)
Date: Wed, 23 Dec 2015 09:55:35 -0500
Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms
Message-ID: <C1C54234-9AC2-4649-B9BD-738049F6E541@oracle.com>

In my push to the mobile/dev forest, I changed VAR_CPU_ARCH on arm platforms to always use arm for both
32 and 64 bit arm builds to be consistent with the setting for x86/x86_64.

http://cr.openjdk.java.net/~bobv/8145936/webrev.00/ <http://cr.openjdk.java.net/~bobv/8145936/webrev.00/>

My assumption, which is confirmed by most of the usage in the makefiles, is that VAR_CPU_ARCH should be
set to the generic ARCH family (x86, arm) for both 32 and 64 bit builds.

My motivation for doing this was initially for the selection of the Socket and UnixConstant template files used
in cross compilation since these files contain the same content for arm and aarch64.

This seems to be causing at least one problem in the hotspot build where in JDK 9, ARCH is being set to
VAR_CPU_ARCH (via OPENJDK_TARGET_CPU_ARCH).  For aarch64 builds, ARCH gets set to arm.

In JDK8, ARCH is set to VAR_CPU and not VAR_CPU_ARCH.  Was there a reason for this change?
Can we go back to the way it was in JDK8????

There are a lot of hacks in both open and closed makefiles to set various variables based on ARCH in
order to end up with the correct variables.

In hotspot/make/defs.make, we undo the VAR_CPU_ARCH setting of x86 for x86_64 builds by
checking for LP64!    This is not done for arm.

 BUILDARCH ?= $(SRCARCH)
 ifeq ($(BUILDARCH), x86)
   ifdef LP64
     BUILDARCH = amd64
   else
     BUILDARCH = i486
   endif
 endif
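
The effect of this fragment can be sketched in Java as follows. This is a
hypothetical helper for illustration only, not part of the build, and it
assumes BUILDARCH has not already been set elsewhere. Only x86 is refined by
the data model, so an SRCARCH of arm passes through unchanged even for a
64-bit build.

```java
public final class BuildArchSketch {
    /**
     * Mirrors the defs.make fragment: BUILDARCH defaults to SRCARCH, and
     * only the x86 family is split into amd64 / i486 depending on LP64.
     */
    static String buildArch(String srcArch, boolean lp64) {
        if (srcArch.equals("x86")) {
            return lp64 ? "amd64" : "i486";
        }
        return srcArch;   // no LP64 refinement for any other family
    }

    public static void main(String[] args) {
        System.out.println("x86, LP64     -> " + buildArch("x86", true));
        System.out.println("x86, 32-bit   -> " + buildArch("x86", false));
        System.out.println("arm, LP64     -> " + buildArch("arm", true));
    }
}
```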


In hotspot/make/closed/defs.make, we don't fix this issue either.

ifeq ($(ARCH), arm)
 SRCARCH = arm
 LIBARCH = arm

 ARCH_DATA_MODEL  = 32
 PLATFORM         = linux-arm
 VM_PLATFORM = linux_arm
 HS_ARCH          = arm
endif

ifeq ($(ARCH), aarch64)
 BUILDARCH = aarch64
 SRCARCH   = arm
 LIBARCH   = aarch64
 HS_ARCH   = arm
 SAARCH    = arm64
endif

From aph at redhat.com  Wed Dec 23 16:55:56 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 23 Dec 2015 16:55:56 +0000
Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms
In-Reply-To: <C1C54234-9AC2-4649-B9BD-738049F6E541@oracle.com>
References: <C1C54234-9AC2-4649-B9BD-738049F6E541@oracle.com>
Message-ID: <567AD21C.5040907@redhat.com>

On 23/12/15 14:55, Bob Vandette wrote:

> In my push to the mobile/dev forest, I changed VAR_CPU_ARCH on arm
> platforms to always use arm for both 32 and 64 bit arm builds to be
> consistent with the setting for x86/x86_64.

This isn't a similar situation, IMO.

> http://cr.openjdk.java.net/~bobv/8145936/webrev.00/ <http://cr.openjdk.java.net/~bobv/8145936/webrev.00/>
> 
> My assumption which is confirmed by most of the usage in the
> makefiles is that VAR_CPU_ARCH should be set to the generic ARCH
> family (x86, arm) for both 32 and 64 bit builds.
> 
> My motivation for doing this was initially for the selection of the
> Socket and UnixConstant template files used in cross compilation
> since these files contain the same content for arm and aarch64.

I'm not convinced this makes any sense.  The only thing the ARM
architectures have in common is that they come from the same company.
This is not true of x86_64, which is a rather elaborate 64-bit
extension of x86.  For examples of how the ARM/AArch64 split is
handled elsewhere, note that the Linux kernel, GCC, and GNU binutils
arches are all separate.

> There are a lot of hacks in both open and closed makefiles to set
> various variable based on ARCH in order to end up with the correct
> variables.
> 
> In hotspot/make/defs.make, we undo the VAR_CPU_ARCH setting of x86
> for x86_64 builds by checking for LP64!  This is not done for arm.

It really should not need to be.  AArch64 is not ARM.

Andrew.

From andrey.petushkov at gmail.com  Wed Dec 23 17:12:26 2015
From: andrey.petushkov at gmail.com (Andrey Petushkov)
Date: Wed, 23 Dec 2015 17:12:26 +0000
Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms
In-Reply-To: <567AD21C.5040907@redhat.com>
References: <C1C54234-9AC2-4649-B9BD-738049F6E541@oracle.com>
	<567AD21C.5040907@redhat.com>
Message-ID: <CADV+1JrRC8VV4vb08t5=Y3ptZbFBQk9HFmqKGd2Hz8F9hanvGg@mail.gmail.com>

Hi guys,

And indeed, please don't forget about the AArch32 port. It's like ARM but it's
quite different, you know. And it currently uses the value aarch32 for both
VAR_CPU and VAR_CPU_ARCH.

Thanks,
Andrey


From bob.vandette at oracle.com  Wed Dec 23 20:36:35 2015
From: bob.vandette at oracle.com (Bob Vandette)
Date: Wed, 23 Dec 2015 15:36:35 -0500
Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms
In-Reply-To: <567AD21C.5040907@redhat.com>
References: <C1C54234-9AC2-4649-B9BD-738049F6E541@oracle.com>
	<567AD21C.5040907@redhat.com>
Message-ID: <C9202924-E3CC-4A10-BAC1-7C6C14493614@oracle.com>


> On Dec 23, 2015, at 11:55 AM, Andrew Haley <aph at redhat.com> wrote:
> 
> On 23/12/15 14:55, Bob Vandette wrote:
> 
>> In my push to the mobile/dev forest, I changed VAR_CPU_ARCH on arm
>> platforms to always use arm for both 32 and 64 bit arm builds to be
>> consistent with the setting for x86/x86_64.
> 
> This isn't a similar situation, IMO.

There appears to be a need for a variable that is used to indicate an x86
or ARM specific path independent of the specific type of ARM or x86 processor.

Why don't you think this is a similar situation?  x86_64 is a 64-bit
Intel architecture that also has the ability to run its legacy 32-bit binaries.

aarch64 is a 64-bit ARM architecture that also has the ability to run its legacy 
armv7 (aarch32) 32-bit binaries.  

aarch32 may be slightly different in that it has the ability to use some newer armv8
instructions but it is compatible with armv7 with very few exceptions like the old mcr instructions.

> 
>> http://cr.openjdk.java.net/~bobv/8145936/webrev.00/ <http://cr.openjdk.java.net/~bobv/8145936/webrev.00/>
>> 
>> My assumption which is confirmed by most of the usage in the
>> makefiles is that VAR_CPU_ARCH should be set to the generic ARCH
>> family (x86, arm) for both 32 and 64 bit builds.
>> 
>> My motivation for doing this was initially for the selection of the
>> Socket and UnixConstant template files used in cross compilation
>> since these files contain the same content for arm and aarch64.
> 
> I'm not convinced this makes any sense.  The only thing the ARM
> architectures have in common is that they come from the same company.

> This is not true of x86_64, which is a rather elaborate 64-bit
> extension of x86.
One could say the same thing about armv8 versus armv7.

>  For examples of how the ARM/AArch64 split is
> handled elsewhere, note that the Linux kernel, GCC, and GNU binutils
> arches are all separate.
> 
>> There are a lot of hacks in both open and closed makefiles to set
>> various variable based on ARCH in order to end up with the correct
>> variables.
>> 
>> In hotspot/make/defs.make, we undo the VAR_CPU_ARCH setting of x86
>> for x86_64 builds by checking for LP64!  This is not done for arm.
> 
> It really should not need to be.  AArch64 is not ARM.
That really depends on your criteria for comparison.  I still believe we need a broad
variable that identifies ARM varieties.  Without this, when the aarch32 port is attempted
there's going to be a lot of extraneous checks required in the makefile for "if ARCH == aarch32 || ARCH == arm" in
places that would not need to be changed simply because we didn't use the existing variable
for the purpose that I believe it was originally intended.
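
A minimal sketch of the two-level naming argued for here, in Java for
illustration only. The table is hypothetical: it assumes VAR_CPU keeps the
specific name while VAR_CPU_ARCH collapses to a family, and placing aarch64
under arm is exactly the point being debated in this thread.

```java
public final class CpuFamilySketch {
    /** Maps a specific CPU name (VAR_CPU-like) to a broad family (VAR_CPU_ARCH-like). */
    static String cpuArch(String cpu) {
        switch (cpu) {
            case "x86":
            case "x86_64":
                return "x86";
            case "arm":
            case "aarch32":
            case "aarch64":   // assumption under discussion, not settled
                return "arm";
            default:
                return cpu;   // unknown names are their own family
        }
    }

    /** One family check instead of "ARCH == aarch32 || ARCH == arm" at every use site. */
    static boolean isArmFamily(String cpu) {
        return cpuArch(cpu).equals("arm");
    }

    public static void main(String[] args) {
        for (String cpu : new String[] {"x86_64", "arm", "aarch32", "aarch64", "ppc64"}) {
            System.out.println(cpu + " -> " + cpuArch(cpu));
        }
    }
}
```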

Bob.

> 
> Andrew.


From aph at redhat.com  Wed Dec 23 23:43:54 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 23 Dec 2015 23:43:54 +0000
Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms
In-Reply-To: <C9202924-E3CC-4A10-BAC1-7C6C14493614@oracle.com>
References: <C1C54234-9AC2-4649-B9BD-738049F6E541@oracle.com>
	<567AD21C.5040907@redhat.com>
	<C9202924-E3CC-4A10-BAC1-7C6C14493614@oracle.com>
Message-ID: <567B31BA.3060001@redhat.com>

On 23/12/15 20:36, Bob Vandette wrote:
>> > This is not true of x86_64, which is a rather elaborate 64-bit
>> > extension of x86.
> One could say the same thing about armv8 versus armv7.

I don't think one could.  I suspect this exact architecture could have
been designed by some other company, and no-one would have suggested
it was related.  Maybe someone might have said "Ooh, it's very
ARM-ish," but that's all.  It's a clean sheet design, it's not just
wider with more registers.  (The floating-point units are very
similar, I'll grant you.)  In contrast, x86_64 is pretty much a
superset with even the same binary encodings for many instructions.

[ NB: ARMv8 identifies both the AArch32 and AArch64 instruction set
architectures.  AArch32 is a slightly extended ARM; AArch64 is all-
new. ]

> That really depends on your criteria for comparison.  I still
> believe we need a broad variable that identifies ARM varieties.

Maybe so.  I guess this would capture what they have in common with
each other that is different from other architectures.  But there
isn't much of that.

> Without this, when the aarch32 port is attempted there's going to be
> a lot of extraneous checks required in the makefile for "if ARCH ==
> aarch32" || ARCH == arm in places that would not need to be changed
> simply because we didn't use the existing variable for the purpose
> that I believe it was originally intended.

I totally agree about AArch32 and ARM.  It's the same thing: the
AArch32 project is just about creating ARM-open.  There definitely
should be a variable to cover those.

Andrew.

From edward.nevill at gmail.com  Thu Dec 24 12:27:54 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 24 Dec 2015 12:27:54 +0000
Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms
In-Reply-To: <567B31BA.3060001@redhat.com>
References: <C1C54234-9AC2-4649-B9BD-738049F6E541@oracle.com>
	<567AD21C.5040907@redhat.com>
	<C9202924-E3CC-4A10-BAC1-7C6C14493614@oracle.com>
	<567B31BA.3060001@redhat.com>
Message-ID: <1450960074.31650.8.camel@mint>

On Wed, 2015-12-23 at 23:43 +0000, Andrew Haley wrote:
> On 23/12/15 20:36, Bob Vandette wrote:
> >> > This is not true of x86_64, which is a rather elaborate 64-bit
> >> > extension of x86.
...
> > One could say the same thing about armv8 versus armv7.
> [ NB: ARMv8 identifies both the AArch32 and AArch64 instruction set
> architectures.  AArch32 is a slightly extended ARM; AArch64 is all-
> new. ]
> 
> > That really depends on your criteria for comparison.  I still
> > believe we need a broad variable that identifies ARM varieties.
....
> > Without this, when the aarch32 port is attempted there's going to be
> > a lot of extraneous checks required in the makefile for "if ARCH ==
> > aarch32" || ARCH == arm in places that would not need to be changed
> > simply because we didn't use the existing variable for the purpose
> > that I believe it was originally intended.
> 
> I totally agree about AArch32 and ARM.  It's the same thing: the
> AArch32 project is just about creating ARM-open.  There definitely
> should be a variable to cover those.

FWIW the aarch32 port does

-DAARCH32 -DARM

in its sysdefs. The rationale for adding -DAARCH32 is to avoid conflicts
with the proprietary port.

My 2c worth is that aarch64 should be considered a completely separate
port. It is not like x86/x86_64. You cannot access the 32 bit
instructions from aarch64. In fact some implementations do not even have
the 32 bit instructions, ie they are pure aarch64.

wrt the differences between armv7 and aarch32 they are not worth
considering as separate, they are only a few mcr instructions to do with
cache flushing/barriers and are deprecated in armv7 in any case. The
correct fix is to use the non deprecated instructions in armv7 which
will then also work on aarch32.

All the best,
Ed.


From edward.nevill at gmail.com  Thu Dec 24 15:06:56 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 24 Dec 2015 15:06:56 +0000
Subject: [aarch64-port-dev ] guarantee failures with large code cache sizes
 on jtreg test java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java
Message-ID: <1450969616.31650.50.camel@mint>

Hi,

I am seeing intermittent guarantee failures on jdk jtreg test java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.jtr.

The failure is

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (assembler_aarch64.hpp:218), pid=43991, tid=44418
#  guarantee(chk == -1 || chk == 0) failed: Field too big for insn
#

The test is being run with -XX:ReservedCodeCacheSize=256m, the following is the full command line

/home/ed/images/jdk9-orig/bin/java -XX:-TieredCompilation -jar lib/jtreg.jar -vmoption:-XX:ReservedCodeCacheSize=256m -retain -nr -conc:8 -timeout:99 -othervm -jdk:/home/ed/images/jdk9-orig -v1 -a -ignore:quiet /home/ed/new_jdk9/dev/jdk_test/test/java/lang/invoke

I have trapped the failure in gdb, it is occurring in pd_patch_instruction_size when trying to patch a BL instruction.

#8  0x000003ff7a7a360c in MacroAssembler::pd_patch_instruction_size (
    branch=0x3ff691cf2d8 "\223\323\343\227\277:\003\325\213c\313\071\313\b", 
    target=0x3ff60a108a4 "\375{\277\251\375\003")
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:74
74          Instruction_aarch64::spatch(branch, 25, 0, offset);
(gdb) p offset
$17 = -35584653
(gdb) 
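
For reference, the arithmetic behind this guarantee can be reproduced outside the VM. The following is a minimal sketch, not the HotSpot code: it assumes the value gdb prints for 'offset' is the byte distance divided by 4 (BL encodes a word offset), and that the field being patched is the 26-bit signed immediate implied by spatch(branch, 25, 0, offset). The helper names are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: word offset a BL at `branch` would need to reach `target`.
constexpr int64_t bl_word_offset(uint64_t branch, uint64_t target) {
    return (static_cast<int64_t>(target) - static_cast<int64_t>(branch)) / 4;
}

// Hypothetical helper: does `value` fit in a signed field that is `bits` wide?
constexpr bool fits_signed(int64_t value, unsigned bits) {
    return value >= -(int64_t{1} << (bits - 1)) &&
           value <   (int64_t{1} << (bits - 1));
}

// Addresses taken from frame #8 of the gdb session.
constexpr uint64_t kBranch = 0x3ff691cf2d8;
constexpr uint64_t kTarget = 0x3ff60a108a4;

static_assert(bl_word_offset(kBranch, kTarget) == -35584653,
              "same value as gdb's 'p offset'");
static_assert(!fits_signed(bl_word_offset(kBranch, kTarget), 26),
              "does not fit in bits 25..0 of the BL");
```

The required offset is roughly -142 MB, just beyond the roughly 128 MB a 26-bit word offset can express, which is consistent with "Field too big for insn".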

Here is the backtrace from gdb

#8  0x000003ff7a7a360c in MacroAssembler::pd_patch_instruction_size (
    branch=0x3ff691cf2d8 "\223\323\343\227\277:\003\325\213c\313\071\313\b", 
    target=0x3ff60a108a4 "\375{\277\251\375\003")
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:74
#9  0x000003ff7a33451c in MacroAssembler::pd_patch_instruction (
    branch=0x3ff691cf2d8 "\223\323\343\227\277:\003\325\213c\313\071\313\b", 
    target=0x3ff60a108a4 "\375{\277\251\375\003")
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp:565
#10 0x000003ff7a8ca9bc in Relocation::pd_set_call_destination (this=0x3fdc5fab3e8, 
    x=0x3ff60a108a4 "\375{\277\251\375\003")
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/relocInfo_aarch64.cpp:85
#11 0x000003ff7a8c8650 in CallRelocation::fix_relocation_after_move (
    this=0x3fdc5fab3e8, src=0x3fdc5fae0b0, dest=0x3fdc5fab490)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/code/relocInfo.cpp:549
#12 0x000003ff7a4736bc in CodeBuffer::relocate_code_to (this=0x3fdc5fae0b0, 
    dest=0x3fdc5fab490)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/codeBuffer.cpp:812
#13 0x000003ff7a473be8 in CodeBuffer::expand (this=0x3fdc5fae0b0, 
    which_cs=0x3fdc5fae158, amount=64)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/codeBuffer.cpp:942
#14 0x000003ff7a334404 in CodeSection::maybe_expand_to_ensure_remaining (
    this=0x3fdc5fae158, amount=64)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/codeBuffer.hpp:661
#15 0x000003ff7a33379c in AbstractAssembler::start_a_stub (this=0x3fdc5fab838, 
    required_space=64)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/assembler.cpp:65
#16 0x000003ff7a7a54a0 in MacroAssembler::emit_trampoline_stub (this=0x3fdc5fab838, 
    insts_call_instruction_offset=976, dest=0x3ff609cf080 "\375{\277\251H\001")
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:704
#17 0x000003ff7a7a53c0 in MacroAssembler::trampoline_call (this=0x3fdc5fab838, 
    entry=..., cbuf=0x3fdc5fae0b0)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:673
#18 0x000003ff7a2a1bd0 in CallStaticJavaDirectNode::emit (this=0x3fdac0024b0, 
    cbuf=..., ra_=0x3fdc5fabd30)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/aarch64.ad:4673
#19 0x000003ff7a85edc0 in Compile::fill_buffer (this=0x3fdc5fad870, 
    cb=0x3fdc5fae0b0, blk_starts=0x3fd40042520)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/output.cpp:1380
#20 0x000003ff7a85b960 in Compile::Output (this=0x3fdc5fad870)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/output.cpp:154
#21 0x000003ff7a4a6c88 in Compile::Code_Gen (this=0x3fdc5fad870)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/compile.cpp:2407
#22 0x000003ff7a4a1fa8 in Compile::Compile (this=0x3fdc5fad870, 
    ci_env=0x3fdc5fae390, compiler=0x3ff746bc7d0, target=0x3fd981ff670, osr_bci=-1, 
    subsume_loads=true, do_escape_analysis=true, eliminate_boxing=true, 
    directive=0x3ff74680570)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/compile.cpp:899
#23 0x000003ff7a3e7684 in C2Compiler::compile_method (this=0x3ff746bc7d0, 
    env=0x3fdc5fae390, target=0x3fd981ff670, entry_bci=-1, directive=0x3ff74680570)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/c2compiler.cpp:106
#24 0x000003ff7a4b2ea8 in CompileBroker::invoke_compiler_on_method (
    task=0x3fd640bbfd0)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/compiler/compileBroker.cpp:1814
#25 0x000003ff7a4b25d4 in CompileBroker::compiler_thread_loop ()
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/compiler/compileBroker.cpp:1564
#26 0x000003ff7a96a9b4 in compiler_thread_entry (thread=0x3ff746bf000, 
    __the_thread__=0x3ff746bf000)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/thread.cpp:3238
#27 0x000003ff7a9678f4 in JavaThread::thread_main_inner (this=0x3ff746bf000)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/thread.cpp:1723
#28 0x000003ff7a967830 in JavaThread::run (this=0x3ff746bf000)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/thread.cpp:1703
#29 0x000003ff7a849614 in java_start (thread=0x3ff746bf000)
    at /home/ed/new_jdk9/hs-comp/hotspot/src/os/linux/vm/os_linux.cpp:683
#30 0x000003ff7af07e2c in start_thread (arg=0x3fdc5faf1f0) at pthread_create.c:314
#31 0x000003ff7ae18c40 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96


Looking at frame #11 above we see

(gdb) list
544       // On some platforms, the reference is absolute (not self-relative).
545       // The enhanced use of pd_call_destination sorts this all out.
546       address orig_addr = old_addr_for(addr(), src, dest);
547       address callee    = pd_call_destination(orig_addr);
548       // Reassert the callee address, this time in the new copy of the code.
549       pd_set_call_destination(callee);
550     }
551
552
553     //// pack/unpack methods
(gdb) p/x addr()
$18 = 0x3ff691cf2d8
(gdb) p/x orig_addr
$19 = 0x3ff6111ba58
(gdb) p/x callee
$20 = 0x3ff60a108a4

Looking at a section of code at both orig_addr and addr() and at the destination of the BL in each case we have

(gdb) x/10i orig_addr-20
   0x3ff6111ba44:       add     x10, x14, w15, sxtw
   0x3ff6111ba48:       sxtw    x2, w17
   0x3ff6111ba4c:       add     x0, x10, #0x10
   0x3ff6111ba50:       cmp     w17, w13
   0x3ff6111ba54:       b.lt    0x3ff6111bb24
   0x3ff6111ba58:       bl      0x3ff60a108a4
   0x3ff6111ba5c:       dmb     ishst
   0x3ff6111ba60:       ldrsb   w11, [x28,#728]
   0x3ff6111ba64:       cbnz    w11, 0x3ff6111bb7c
   0x3ff6111ba68:       mov     x10, x19
(gdb) x/10i 0x3ff60a108a4
   0x3ff60a108a4:       stp     x29, x30, [sp,#-16]!
   0x3ff60a108a8:       mov     x29, sp
   0x3ff60a108ac:       cmp     x1, x0
   0x3ff60a108b0:       b.ls    0x3ff60a10808
   0x3ff60a108b4:       add     x0, x0, x2, uxtx
   0x3ff60a108b8:       add     x1, x1, x2, uxtx
   0x3ff60a108bc:       cmp     x2, #0x10
   0x3ff60a108c0:       b.cc    0x3ff60a10914
   0x3ff60a108c4:       and     x9, x0, #0xf
   0x3ff60a108c8:       cbz     x9, 0x3ff60a1090c
(gdb) x/10i addr()-20
   0x3ff691cf2c4:       add     x10, x14, w15, sxtw
   0x3ff691cf2c8:       sxtw    x2, w17
   0x3ff691cf2cc:       add     x0, x10, #0x10
   0x3ff691cf2d0:       cmp     w17, w13
   0x3ff691cf2d4:       b.lt    0x3ff691cf3a4
   0x3ff691cf2d8:       bl      0x3ff68ac4124
   0x3ff691cf2dc:       dmb     ishst
   0x3ff691cf2e0:       ldrsb   w11, [x28,#728]
   0x3ff691cf2e4:       cbnz    w11, 0x3ff691cf3fc
   0x3ff691cf2e8:       mov     x10, x19
(gdb) x/10i 0x3ff68ac4124
   0x3ff68ac4124:       .inst   0x00000000 ; undefined
   0x3ff68ac4128:       .inst   0x00000000 ; undefined
   0x3ff68ac412c:       .inst   0x00000000 ; undefined
   0x3ff68ac4130:       .inst   0x00000000 ; undefined
   0x3ff68ac4134:       .inst   0x00000000 ; undefined
   0x3ff68ac4138:       .inst   0x00000000 ; undefined
   0x3ff68ac413c:       .inst   0x00000000 ; undefined
   0x3ff68ac4140:       .inst   0x00000000 ; undefined
   0x3ff68ac4144:       .inst   0x00000000 ; undefined
   0x3ff68ac4148:       .inst   0x00000000 ; undefined

What appears to be the case here is that we have a BL to another method, therefore outside the scope of the current codeblob. However, this codeblob is now being moved and will now require a trampoline instead of a straight BL.
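
The stale destination 0x3ff68ac4124 in the moved copy is what one would expect if the BL was copied verbatim: BL is pc-relative, so the same encoded displacement applied at the new address lands somewhere else. A minimal sketch of that arithmetic, using the addresses from the gdb session above (the helper name is illustrative only, it is not a HotSpot function):

```cpp
#include <cstdint>

// Where a pc-relative branch lands after being copied byte-for-byte from
// old_pc to new_pc: the encoded displacement (old_dest - old_pc) is unchanged.
// Unsigned wrap-around makes the subtraction behave like a signed displacement.
constexpr uint64_t dest_after_copy(uint64_t old_pc, uint64_t old_dest,
                                   uint64_t new_pc) {
    return new_pc + (old_dest - old_pc);
}

constexpr uint64_t kOrigAddr = 0x3ff6111ba58;  // orig_addr ($19)
constexpr uint64_t kCallee   = 0x3ff60a108a4;  // callee    ($20)
constexpr uint64_t kNewAddr  = 0x3ff691cf2d8;  // addr()    ($18)

static_assert(dest_after_copy(kOrigAddr, kCallee, kNewAddr) == 0x3ff68ac4124,
              "matches the 'bl 0x3ff68ac4124' seen in the moved copy");
```

So the block of .inst 0x00000000 words is simply whatever happens to sit at the old displacement from the new location.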

However the BL is not recognised as requiring a trampoline. Looking at frame #10

(gdb) down
#10 0x000003ff7a8ca9bc in Relocation::pd_set_call_destination (this=0x3fdc5fab3e8, 
    x=0x3ff60a108a4 "\375{\277\251\375\003")
    at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/relocInfo_aarch64.cpp:85
85        MacroAssembler::pd_patch_instruction(addr(), x);
(gdb) list 81
76        assert(is_call(), "should be a call here");
77        if (NativeCall::is_call_at(addr())) {
78          address trampoline = nativeCall_at(addr())->get_trampoline();
79          if (trampoline) {
80            nativeCall_at(addr())->set_destination_mt_safe(x, /* assert_lock */false);
81            return;
82          }
83        }
84        assert(addr() != x, "call instruction in an infinite loop");
85        MacroAssembler::pd_patch_instruction(addr(), x);

'trampoline' is NULL here

(gdb) p NativeCall::is_call_at(addr())
$21 = true
(gdb) p nativeCall_at(addr())->get_trampoline()
$22 = (u_char *) 0x0
(gdb) 

Looking at the source for get_trampoline()


  CodeBlob *code = CodeCache::find_blob(call_addr);
  assert(code != NULL, "Could not find the containing code blob");

  address bl_destination
    = MacroAssembler::pd_call_destination(call_addr);
  if (code->content_contains(bl_destination) &&
      is_NativeCallTrampolineStub_at(bl_destination))
    return bl_destination;

This only tests for a trampoline if the BL destination is within the current code blob, and as seen previously with the problems with adrp, it must not test for a trampoline outside the current code blob because that could be pointing somewhere completely random. In this case it happens to be pointing to a block of .inst 0x00000000 words.

The problem arises from the implementation of MacroAssembler::trampoline_call where it does

  if (Assembler::reachable_from_branch_at(pc(), entry.target())) {
    bl(entry.target());
  } else {
    bl(pc());
  }

Here, if the call reaches, it plants a direct BL. However, when the call subsequently fails to reach because the codeblob has been moved out of range of a BL, it has no way of finding the trampoline, because it will not look outside the current code blob.
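
To make the failure mode concrete, here is a toy model of the reachability decision before and after the move. It is a sketch under the assumption that "reachable" means within about 128 MB either side of the branch; toy_reachable is a stand-in and not Assembler::reachable_from_branch_at itself.

```cpp
#include <cstdint>

// Assumed reach of a BL in bytes (2^25 words of 4 bytes each).
constexpr int64_t kBranchRange = int64_t{128} * 1024 * 1024;

constexpr int64_t byte_distance(uint64_t from, uint64_t to) {
    return static_cast<int64_t>(to) - static_cast<int64_t>(from);
}

// Stand-in for the reachability test made when the call is emitted.
constexpr bool toy_reachable(uint64_t branch, uint64_t target) {
    return byte_distance(branch, target) >= -kBranchRange &&
           byte_distance(branch, target) <   kBranchRange;
}

// At emit time (orig_addr) the callee is about 7 MB away, so a direct BL is planted.
static_assert(toy_reachable(0x3ff6111ba58, 0x3ff60a108a4),
              "direct BL is fine in the original buffer");
// After the buffer moves (addr()) the same callee is about 142 MB away.
static_assert(!toy_reachable(0x3ff691cf2d8, 0x3ff60a108a4),
              "direct BL no longer reaches from the new buffer");
```

The decision is made once, at emit time, from the position the code had then; nothing records that a trampoline will be needed if the buffer later moves.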

The only possibility might be to always write it as bl(pc()) and rely on the final reloc to fix it up to either point to the trampoline or call direct. However, I think there may be a problem with this if the codeblob is moved more than once: the first move would relocate it using a direct BL, and the second could then move it out of range and fail to find the trampoline as above.

Anyone got any ideas on how to fix this?

All the best, and Happy Christmas,
Ed.


From aph at redhat.com  Thu Dec 24 17:29:38 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 24 Dec 2015 17:29:38 +0000
Subject: [aarch64-port-dev ] guarantee failures with large code cache
 sizes on jtreg test
 java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java
In-Reply-To: <1450969616.31650.50.camel@mint>
References: <1450969616.31650.50.camel@mint>
Message-ID: <567C2B82.6080908@redhat.com>

On 24/12/15 15:06, Edward Nevill wrote:

> This only tests for a trampoline if the BL destination is within the
> current code blob, and as seen previously with the problems with
> adrp, it must not test for a trampoline outside the current code
> blob because that could be pointing somewhere completely random. In
> this case it happens to be pointing to a block of .inst 0x00000000
> words.

Indeed.  But the subsequent code should find the trampoline:

  return trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code);

The question is why it doesn't.

Andrew.

From edward.nevill at gmail.com  Tue Dec 29 17:17:32 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 29 Dec 2015 17:17:32 +0000
Subject: [aarch64-port-dev ] RFR: 8146286: aarch64: guarantee failures with
 large code cache sizes on jtreg test
 java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java
Message-ID: <1451409452.30784.72.camel@mint>

Hi,

The following webrev

http://cr.openjdk.java.net/~enevill/8146286/webrev.0/

JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8146286

The problem is that during code buffer expansion the code buffer can be moved so that a BL is no longer in range. Normally this would resolve to the target's trampoline.

However this is inhibited during code buffer expansion because of the following in get_trampoline_for()

address trampoline_stub_Relocation::get_trampoline_for(address call, nmethod* code) {
  // There are no relocations available when the code gets relocated
  // because of CodeBuffer expansion.
  if (code->relocation_size() == 0)
    return NULL;

The problem is that the relocs have not been created yet, so get_trampoline_for cannot resolve to the trampoline.

The solution I have adopted is to always generate a BL to self in MacroAssembler::trampoline_call.

In Relocation::pd_call_destination when it detects a call to self it does not attempt to do the relocation but just leaves it as a call to self (there is no point in trying to relocate the call to self to point to the original destination since that is in the old copy of the code buffer and could be out of range).

During final relocation the call to self is then relocated to the correct value.
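
The reason a BL to self survives any number of buffer moves can be sketched as follows. This is a toy model of the idea only, not the code in the webrev; the function names are assumptions made for illustration.

```cpp
#include <cstdint>

// Toy relocation rule: a call that targets its own address in the old copy
// stays a call to self in the new copy; any other call keeps its callee.
constexpr uint64_t toy_destination_after_move(uint64_t old_pc, uint64_t old_dest,
                                              uint64_t new_pc) {
    return old_dest == old_pc ? new_pc : old_dest;
}

// Byte displacement the BL at `pc` has to encode to reach `dest`.
constexpr int64_t displacement(uint64_t pc, uint64_t dest) {
    return static_cast<int64_t>(dest) - static_cast<int64_t>(pc);
}

constexpr uint64_t kOldPc = 0x3ff6111ba58;  // call site before expansion
constexpr uint64_t kNewPc = 0x3ff691cf2d8;  // call site after expansion

// A self-call encodes displacement 0 wherever the buffer ends up, so the
// patch during expansion can never be out of range.
static_assert(displacement(kNewPc,
                  toy_destination_after_move(kOldPc, kOldPc, kNewPc)) == 0,
              "self-call stays a self-call after the move");
```

Only the final relocation, when the relocs exist and the trampoline can be found, has to decide between a direct BL and the trampoline.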

Repeated testing with the above test shows that the problem has been resolved.

I have also tested with jtreg hotspot/langtools and jdk, before and after patching and with and without -XX:ReservedCodeCacheSize=256m with no additional failures.

OK to push?
Ed.


From aph at redhat.com  Tue Dec 29 22:16:52 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 29 Dec 2015 22:16:52 +0000
Subject: [aarch64-port-dev ] RFR: 8146286: aarch64: guarantee failures
 with large code cache sizes on jtreg test
 java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java
In-Reply-To: <1451409452.30784.72.camel@mint>
References: <1451409452.30784.72.camel@mint>
Message-ID: <56830654.5010603@redhat.com>

On 29/12/15 17:17, Edward Nevill wrote:
> I have also tested with jtreg hotspot/langtools and jdk, before and after patching and with and without -XX:ReservedCodeCacheSize=256m with no additional failures.
> 
> OK to push?

Eww.

This does make sense, but it looks very odd indeed.  OK.

Andrew.