From edward.nevill at gmail.com Wed Dec 2 14:24:26 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 02 Dec 2015 14:24:26 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
Message-ID: <1449066266.25167.8.camel@mylittlepony.linaroharston>

Hi

I have been trying to debug a problem with large code caches on JDK 9 over the past week and could do with some help/advice on how to proceed.

I have filed a JIRA issue

https://bugs.openjdk.java.net/browse/JDK-8144498

Here is my analysis of the problem so far. Apologies if this is a bit stream of consciousness.

Running jtreg/langtools with -XX:ReservedCodeCacheSize=512m generates a number of failures due to SEGVs whereas running without this option passes all tests.

The set of tests which fails is different each time. For example, on two back-to-back runs I get

FAILED: tools/javac/classfiles/attributes/annotations/RuntimeAnnotationsForInnerAnnotationTest.java
FAILED: tools/javac/T6410706.java
FAILED: tools/jdeps/DotFileTest.java

ed@arm64:~/jtreg/jtreg$ fgrep FAILED log_512m_2
FAILED: com/sun/javadoc/testSimpleTag/TestSimpleTag.java
FAILED: com/sun/javadoc/testWindowTitle/TestWindowTitle.java
FAILED: jdk/jshell/CompletionSuggestionTest.java

The command used to invoke jtreg was

/home/ed/images/jdk9-orig/bin/java -jar lib/jtreg.jar -vmoption:-XX:ReservedCodeCacheSize=512m -nr -conc:48 -timeout:99 -othervm -jdk:/home/ed/images/jdk9-orig -v1 -a -ignore:quiet /home/ed/new_jdk9/hs-comp/langtools/test

The problem can also be replicated with EEMBC GrinderBench although it may require many 100s of runs to trigger.
The command I used to invoke GrinderBench is

/home/ed/images/jdk9-orig/bin/java -XX:ReservedCodeCacheSize=512m -classpath dist/fullset/bench1.jar org.eembc.grinderbench.CmdlineWrapper -r 1 -m 1 -t 4

For the purposes of the following I have chosen to investigate the GrinderBench failure because it is easier to debug than random failures in jtreg.

The SEGV occurs in a method which is called from SharedRuntime::resolve_opt_virtual_call_C. The call backtrace is about 20 frames long. The following are the oldest few frames.

....
#17 0x000003ff99717a44 in SharedRuntime::resolve_helper (thread=thread@entry=0x3ff94010000, is_virtual=is_virtual@entry=true, is_optimized=is_optimized@entry=true, __the_thread__=__the_thread__@entry=0x3ff94010000) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1186
#18 0x000003ff99718988 in SharedRuntime::resolve_opt_virtual_call_C (thread=0x3ff94010000) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/sharedRuntime.cpp:1441
#19 0x000003ff70ab23a8 in ?? ()
#20 0x000003fdd59228f0 in ?? ()

Looking at frame #19

(gdb) x/10i $pc-20
   0x3ff70ab2394:  mov   x0, x28
   0x3ff70ab2398:  mov   x8, #0x8950              // #35152
   0x3ff70ab239c:  movk  x8, #0x9971, lsl #16
   0x3ff70ab23a0:  movk  x8, #0x3ff, lsl #32
   0x3ff70ab23a4:  blr   x8
=> 0x3ff70ab23a8:  isb
   0x3ff70ab23ac:  str   xzr, [x28,#440]
   0x3ff70ab23b0:  str   xzr, [x28,#448]
   0x3ff70ab23b4:  ldr   x8, [x28,#8]
   0x3ff70ab23b8:  cbnz  x8, 0x3ff70ab2454

This is a stub for resolve_opt_virtual_call. So here it calls 0x3ff99718950 and disassembling that

(gdb) x/i 0x3ff99718950
   0x3ff99718950:  stp   x29, x30, [sp,#-80]!

So it is calling SharedRuntime::resolve_opt_virtual_call_C which is correct according to the above stack trace.
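[Editorial sketch, not part of the original mail.] The address 0x3ff99718950 above is simply the three 16-bit immediates of the mov/movk/movk sequence placed at bit positions 0, 16 and 32. A minimal illustration, assuming a hypothetical helper named compose_movz_movk (it is not a HotSpot function):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only: rebuild the 48-bit address
// materialised by
//   mov  x8, #lo16
//   movk x8, #mid16, lsl #16
//   movk x8, #hi16,  lsl #32
constexpr uint64_t compose_movz_movk(uint16_t lo16, uint16_t mid16, uint16_t hi16) {
    return static_cast<uint64_t>(lo16)
         | (static_cast<uint64_t>(mid16) << 16)
         | (static_cast<uint64_t>(hi16) << 32);
}

// Immediates taken from the frame #19 disassembly in the mail.
static_assert(compose_movz_movk(0x8950, 0x9971, 0x3ff) == 0x3ff99718950ULL,
              "matches the call target disassembled with x/i");
```

Because the full address is spelled out in the immediates, this form does not depend on where the instructions themselves are located, unlike a pc-relative adrp.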
However, looking at the previous frame

(gdb) x/2g $fp
0x3ff98dede60:  0x0000000000000138  0x000003ff7122469c
(gdb) x/12i 0x000003ff7122469c-40
   0x3ff71224674:  ret
   0x3ff71224678:  mov   x8, #0x28f0              // #10480
   0x3ff7122467c:  movk  x8, #0xd592, lsl #16
   0x3ff71224680:  movk  x8, #0x3fd, lsl #32
   0x3ff71224684:  str   x8, [sp,#8]
   0x3ff71224688:  mov   x8, #0xffffffffffffffff  // #-1
   0x3ff7122468c:  str   x8, [sp]
   0x3ff71224690:  adrp  x8, 0x3ff70ab2000        <<< HERE
   0x3ff71224694:  add   x8, x8, #0x300           <<<
   0x3ff71224698:  blr   x8                       <<<
   0x3ff7122469c:  b     0x3ff712242f8            <<<
   0x3ff712246a0:  adrp  x8, 0x3ff70adf000

The code marked HERE is an out-of-line stub which is calling the resolve_opt_virtual_call stub. So far so good.

*** But this is not the correct code to call resolve_opt_virtual_call ****

This is in fact the code generated by the following from c1_CodeStubs_aarch64.cpp

void CounterOverflowStub::emit_code(LIR_Assembler* ce) {
  __ bind(_entry);
  Metadata *m = _method->as_constant_ptr()->as_metadata();
  __ mov_metadata(rscratch1, m);
  ce->store_parameter(rscratch1, 1);
  ce->store_parameter(_bci, 0);
  __ far_call(RuntimeAddress(Runtime1::entry_for(Runtime1::counter_overflow_id)));
  ce->add_call_info_here(_info);
  ce->verify_oop_map(_info);
  __ b(_continuation);
}

So this code is supposed to be calling Runtime1::counter_overflow. The -1 for the BCI is the InvocationEntryBci because this is an invocation entry counter overflow and it is this -1 which eventually causes the SEGV because it is being used as a genuine index into the bytecode to get a constant pool index for the invoke.

But it shouldn't be calling SharedRuntime::resolve_opt_virtual_call_C, it should be calling Runtime1::counter_overflow.
Tracing back where this out-of-line stub is called from

(gdb) x/10i 0x3ff712242f8-36
   0x3ff712242d4:  mov   x0, #0xc250              // #49744
   0x3ff712242d8:  movk  x0, #0xd592, lsl #16
   0x3ff712242dc:  movk  x0, #0x3fd, lsl #32
   0x3ff712242e0:  ldr   w6, [x0,#220]
   0x3ff712242e4:  add   w6, w6, #0x8
   0x3ff712242e8:  str   w6, [x0,#220]
   0x3ff712242ec:  and   w6, w6, #0x1ff8
   0x3ff712242f0:  cmp   w6, #0x0
   0x3ff712242f4:  b.eq  0x3ff71224678            <<<< HERE is the b to the out of line stub
   0x3ff712242f8:  str   w5, [sp,#52]
(gdb)

So the above confirms that it is really doing a counter overflow but calling resolve_opt_virtual_call.

So I tried changing the 'far_call' method in macroAssembler_aarch64.cpp to use movz/movk/movk instead of adrp/add. IE

    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    adrp(tmp, entry, offset);
    add(tmp, tmp, offset);

becomes

    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    movptr(tmp, (uintptr_t)entry.target());
    //adrp(tmp, entry, offset);
    //add(tmp, tmp, offset);

This caused GrinderBench to start working (at least, no failures after about 5000 runs).

So I changed this to read

    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    movptr(tmp, (uintptr_t)entry.target());
    adrp(tmp, entry, offset);
    add(tmp, tmp, offset);

IE, it generates both the movz/movk/movk version and the adrp/add version, but uses the adrp version, discarding the result of the movz/movk/movk version.
Now when I list the out-of-line stub in gdb I get

(gdb) x/10i 0x000003ff5521d5dc-32
   0x3ff5521d5bc:  mov   x8, #0xffffffffffffffff  // #-1
   0x3ff5521d5c0:  str   x8, [sp]
   0x3ff5521d5c4:  mov   x8, #0x9780              <<< movz/movk/movk -> 0x3ff54c89780
   0x3ff5521d5c8:  movk  x8, #0x54c8, lsl #16
   0x3ff5521d5cc:  movk  x8, #0x3ff, lsl #32
   0x3ff5521d5d0:  adrp  x8, 0x3ff54ab2000        <<< adrp/add -> 0x3ff54ab2300
   0x3ff5521d5d4:  add   x8, x8, #0x300
   0x3ff5521d5d8:  blr   x8
   0x3ff5521d5dc:  b     0x3ff5521d308
   0x3ff5521d5e0:  mov   x8, #0xfc80              // #64640

So the adrp/add and movz/movk/movk address different runtime stubs. Disassembling both shows that the adrp is addressing the resolve_opt_virtual_call stub and the movz/movk/movk is addressing the Runtime1::counter_overflow stub.

So it looks like the adrp is either not being relocated, or is being relocated incorrectly.

Any suggestions as to why it might be doing this??? I have had a long look at the pd_patch_* code and it seems correct to me.

Unfortunately it is difficult to debug because I cannot walk through it in gdb because of the infrequency (once in every few 100 runs) so I can only debug as above by looking at the core files generated.

Thanks for your help,
Ed.


From edward.nevill at gmail.com Thu Dec 3 07:41:44 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 03 Dec 2015 07:41:44 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <1449066266.25167.8.camel@mylittlepony.linaroharston>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston>
Message-ID: <1449128504.15424.11.camel@mint>

On Wed, 2015-12-02 at 14:24 +0000, Edward Nevill wrote:
> So it looks like the adrp is either not being relocated, or is being relocated incorrectly.
>
> Any suggestions as to why it might be doing this??? I have had a long look at the pd_patch_* code and it seems correct to me.

I think I have it on the run!
I believe the following code is getting a false positive

inline bool is_NativeCallTrampolineStub_at(address addr) {
  // Ensure that the stub is exactly
  //      ldr   xscratch1, L
  //      br    xscratch1
  // L:
  uint32_t *i = (uint32_t *)addr;
  return i[0] == 0x58000048 && i[1] == 0xd61f0100;
}

when called from the following in get_trampoline()

  address bl_destination
    = MacroAssembler::pd_call_destination(call_addr);
  if (code->content_contains(bl_destination) &&
      is_NativeCallTrampolineStub_at(bl_destination))
    return bl_destination;

which in turn is called from the following in pd_call_destination

  if (is_call()) {
    address trampoline = nativeCall_at(addr())->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }

so the call destination for counter_overflow is matched as a false positive and the destination of a trampoline is returned instead, so the adrp is relocated to this.

Should the following line

    address trampoline = nativeCall_at(addr())->get_trampoline();

be

    address trampoline = nativeCall_at(orig_addr)->get_trampoline();

IE the address before relocation.

Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.

Regards,
Ed


From aph at redhat.com Thu Dec 3 09:36:35 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 3 Dec 2015 09:36:35 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <1449128504.15424.11.camel@mint>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston> <1449128504.15424.11.camel@mint>
Message-ID: <56600D23.1020208@redhat.com>

On 03/12/15 07:41, Edward Nevill wrote:
> Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.

If you can catch adrp being used where it randomly points somewhere
in a code buffer, then that undoubtedly would be a bug. But
pd_call_destination is surely not supposed to be used on a branch
whose destination has not been set: in that case it'll return
garbage, and it doesn't matter what kind of garbage.

The code in pd_set_call_destination certainly does look wrong,
however. There is no guarantee at all that it points anywhere,
so dereferencing the adrp might be wrong. It might be that the
logic here needs redesigning.

Andrew.


From edward.nevill at gmail.com Thu Dec 3 10:15:32 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 03 Dec 2015 10:15:32 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <56600D23.1020208@redhat.com>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston> <1449128504.15424.11.camel@mint> <56600D23.1020208@redhat.com>
Message-ID: <1449137732.6644.11.camel@mylittlepony.linaroharston>

On Thu, 2015-12-03 at 09:36 +0000, Andrew Haley wrote:
> On 03/12/15 07:41, Edward Nevill wrote:
> > Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.
>
> If you can catch adrp being used where it randomly points somewhere
> in a code buffer, then that undoubtedly would be a bug.

I assert it is

I have trapped it in gdb at the point where it is making the incorrect relocation. Winding back the call trace to CallRelocation::fix_relocation_after_move

void CallRelocation::fix_relocation_after_move(const CodeBuffer* src, CodeBuffer* dest) {
  // Usually a self-relative reference to an external routine.
  // On some platforms, the reference is absolute (not self-relative).
  // The enhanced use of pd_call_destination sorts this all out.
  address orig_addr = old_addr_for(addr(), src, dest);
  address callee    = pd_call_destination(orig_addr);
  // Reassert the callee address, this time in the new copy of the code.
  pd_set_call_destination(callee);
}

(gdb) p/x callee
$9 = 0x3ff68ab2300
(gdb) p/x orig_addr                  ;; Relocating from 0x3ff68d16f28 -> 0x3ff691f02e8
$10 = 0x3ff68d16f28
(gdb) p/x addr()
$11 = 0x3ff691f02e8
(gdb) x/2i orig_addr
   0x3ff68d16f28:  adrp  x8, 0x3ff68d16000        ;; Original call destination
   0x3ff68d16f2c:  add   x8, x8, #0x500           ;; == 0x3ff68d16500
(gdb) x/2i addr()
   0x3ff691f02e8:  adrp  x8, 0x3ff691f0000        ;; Copied but not relocated
   0x3ff691f02ec:  add   x8, x8, #0x500           ;; dest == 0x3ff691f0500
(gdb) x/2i 0x3ff68d16500
   0x3ff68d16500:  stp   x29, x30, [sp,#-16]!     ;; counter_overflow stub
   0x3ff68d16504:  mov   x29, sp                  ;; pointed to by original call dest above
(gdb) x/2i 0x3ff691f0500
   0x3ff691f0500:  ldr   x8, 0x3ff691f0508        ;; copied but not relocated dest points here
   0x3ff691f0504:  br    x8                       ;; to a trampoline stub, but only by accident
                                                  ;; essentially pointing to a random place in
                                                  ;; the codebuf
(gdb) x/g 0x3ff691f0508
0x3ff691f0508:  0x000003ff68ab2300                ;; so since it thinks it is a trampoline stub
                                                  ;; it picks up this address as the final adr
                                                  ;; which we see in callee above

This is because it is using addr() in pd_call_destination, rather than orig_addr. IE, it is using the copied but not relocated version, therefore the adrp is transiently pointing into garbage.

Using the orig_addr should be correct.
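
[Editorial sketch, not part of the original mail.] The gdb session above can be reproduced arithmetically. An adrp/add pair stores a page delta relative to its own pc, so the same instruction bits decode to a different address once the code has been copied. The sketch below models that, plus the two-word signature test quoted earlier in the thread. The names AdrpAdd, encode_adrp_add, decode_adrp_add and looks_like_trampoline are hypothetical and are not HotSpot APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an adrp/add pair: a signed page delta relative to
// the page holding the instruction, plus the 12-bit immediate of the add.
struct AdrpAdd {
    int64_t  page_delta;  // target page number minus pc page number
    uint32_t low12;       // offset of the target within its 4 KB page
};

// Build the pair that, located at pc, addresses target.
inline AdrpAdd encode_adrp_add(uint64_t pc, uint64_t target) {
    AdrpAdd insn;
    insn.page_delta = static_cast<int64_t>(target >> 12) - static_cast<int64_t>(pc >> 12);
    insn.low12 = static_cast<uint32_t>(target & 0xfff);
    return insn;
}

// Address produced when the same pair executes (or is decoded) at pc.
inline uint64_t decode_adrp_add(uint64_t pc, const AdrpAdd& insn) {
    const uint64_t page =
        static_cast<uint64_t>(static_cast<int64_t>(pc >> 12) + insn.page_delta);
    return (page << 12) + insn.low12;
}

// Same two-word comparison as is_NativeCallTrampolineStub_at in the thread:
// "ldr x8, <pc+8>" followed by "br x8". It inspects memory only; it cannot
// tell whether any call was ever meant to target this location.
inline bool looks_like_trampoline(const uint32_t* insns) {
    return insns[0] == 0x58000048u && insns[1] == 0xd61f0100u;
}
```

With the values from the session, a pair encoded at orig_addr 0x3ff68d16f28 for target 0x3ff68d16500 decodes to 0x3ff691f0500 when read at addr() 0x3ff691f02e8. If the words at that location happen to pass looks_like_trampoline, the false positive described above follows.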


From adinn at redhat.com Thu Dec 3 11:02:46 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 3 Dec 2015 11:02:46 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <1449137732.6644.11.camel@mylittlepony.linaroharston>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston> <1449128504.15424.11.camel@mint> <56600D23.1020208@redhat.com> <1449137732.6644.11.camel@mylittlepony.linaroharston>
Message-ID: <56602156.9030103@redhat.com>

On 03/12/15 10:15, Edward Nevill wrote:
> On Thu, 2015-12-03 at 09:36 +0000, Andrew Haley wrote:
>> On 03/12/15 07:41, Edward Nevill wrote:
>>> Because if the code has not been relocated yet, then the adrp could be pointing somewhere randomly within the code buffer, and it just happens sometimes to point to a valid trampoline stub.
>>
>> If you can catch adrp being used where it randomly points somewhere
>> in a code buffer, then that undoubtedly would be a bug.
>
> I assert it is
>
> I have trapped it in gdb at the point where it is making the incorrect relocation.

Hmm, that looks to me like it is the cause of the problem.

Interestingly, I just glanced at what the ppc code does and I am not
clear why it is not subject to the same problem -- admittedly only on a
half-arsed understanding of what it is doing. It might be worth you
looking at it to see if there is something I have missed which sheds
light on the AArch64 case.

regards,


Andrew Dinn
-----------


From edward.nevill at gmail.com Thu Dec 3 12:32:23 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 03 Dec 2015 12:32:23 +0000
Subject: [aarch64-port-dev ] Help debugging problem with large code cache
In-Reply-To: <56600D23.1020208@redhat.com>
References: <1449066266.25167.8.camel@mylittlepony.linaroharston> <1449128504.15424.11.camel@mint> <56600D23.1020208@redhat.com>
Message-ID: <1449145943.6644.22.camel@mylittlepony.linaroharston>

On Thu, 2015-12-03 at 09:36 +0000, Andrew Haley wrote:
> On 03/12/15 07:41, Edward Nevill wrote:
> The code in pd_set_call_destination certainly does look wrong,
> however. There is no guarantee at all that it points anywhere,
> so dereferencing the adrp might be wrong. It might be that the
> logic here needs redesigning.

I believe the code in pd_set_call_destination is correct although it is fragile. Again it is looking at the copied but not relocated code as in pd_call_destination.

However, the NativeCall::get_trampoline() method called by pd_set_call_destination checks that the destination is within the code blob before examining it.

From NativeCall::get_trampoline()

  if (code->content_contains(bl_destination) &&
      is_NativeCallTrampolineStub_at(bl_destination))
    return bl_destination;

so code->content_contains(bl_destination) checks that the destination is within the code blob.

We know that if a trampoline exists it must be in the same code blob (that is the whole purpose of the trampoline).

Regards,
Ed.


From fei.yang0953 at yahoo.com Thu Dec 3 14:22:26 2015
From: fei.yang0953 at yahoo.com (felix yang)
Date: Thu, 3 Dec 2015 14:22:26 +0000 (UTC)
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized MLA/MLS instructions
References: <537574996.11929209.1449152546033.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com>

Hi,

Can someone help review and sponsor this code generation improvement for the aarch64 port?

Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.00/

The hotspot/test/compiler/loopopts/superword/SumRed_Int.java test can serve as a test case.

With this patch, the following code snippet generated by C2:

    0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s
    0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s
    0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s
    0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s
    0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s

will be further optimized into:

    0x0000007f9cdb86dc: mul v19.4s, v16.4s, v17.4s
    0x0000007f9cdb86e0: mla v19.4s, v16.4s, v18.4s
    0x0000007f9cdb86e4: mla v19.4s, v17.4s, v18.4s

About 13% performance gain achieved for the test case on my aarch64 server.

Tested with jtreg hotspot & langtools. Results are the same before and after.

Is it OK to push?

Felix,
Thanks for your help.


From aph at redhat.com Thu Dec 3 14:40:07 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 3 Dec 2015 14:40:07 +0000
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized MLA/MLS instructions
In-Reply-To: <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com>
References: <537574996.11929209.1449152546033.JavaMail.yahoo.ref@mail.yahoo.com> <537574996.11929209.1449152546033.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <56605447.9070103@redhat.com>

It would help everybody if you did "hg commit" with an appropriate
changeset comment before generating the webrev.

Andrew.


From edward.nevill at gmail.com Fri Dec 4 09:59:46 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Fri, 04 Dec 2015 09:59:46 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
Message-ID: <1449223186.15424.42.camel@mint>

Hi,

Please review the following webrev

http://cr.openjdk.java.net/~enevill/8144498/webrev.0/

JIRA issue: https://bugs.openjdk.java.net/browse/JDK-8144498

This fixes an issue where random SEGVs were generated with -XX:ReservedCodeCacheSize > 128m

The problem was that pd_call_destination was using addr() rather than orig_addr. IE, it was using the address in the copied but not relocated code.

It was then following a call destination to determine whether or not this was a call to a trampoline (in order that it could substitute the final trampoline address).

Usually this worked OK because it ended up just referencing a random address in the code buffer. However, very occasionally it would point to a trampoline somewhere in the code buffer and get a false positive. In this case it would substitute the final address of that trampoline.

The result was that it would very occasionally relocate the address of some call to a random trampoline stub.

I have tested this with jtreg hotspot and langtools with -XX:ReservedCodeCacheSize=256m and without specifying any ReservedCodeCacheSize (so it defaults to 128m).

With ReservedCodeCacheSize == default

Hotspot (original):   Test results: passed: 935; failed: 22; error: 12
Hotspot (patched):    Test results: passed: 942; failed: 15; error: 12
Langtools (original): Test results: passed: 3,313; failed: 33
Langtools (patched):  Test results: passed: 3,316; failed: 33

With -XX:ReservedCodeCacheSize=256m

Hotspot (original):   Test results: passed: 865; failed: 19; error: 85
Hotspot (patched):    Test results: passed: 946; failed: 10; error: 13
Langtools (original): Test results: passed: 3,049; failed: 77; error: 223
Langtools (patched):  Test results: passed: 3,314; failed: 33

So in all cases it generates results as good as or better than the original. In the case of langtools with a 256m buffer it goes from 300 failures+errors to just 33.

I have also tested this with EEMBC GrinderBench which also showed the problem every few 100 runs. I have run this over 5000 times with no occurrence of the problem.

Thanks for your review,
Ed.


From adinn at redhat.com Fri Dec 4 10:11:27 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 4 Dec 2015 10:11:27 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <1449223186.15424.42.camel@mint>
References: <1449223186.15424.42.camel@mint>
Message-ID: <566166CF.5000006@redhat.com>

On 04/12/15 09:59, Edward Nevill wrote:
> Hi,
>
> Please review the following webrev
. . .

Reviewed by me as an AArch64-only patch.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Paul
Argiry (US)


From aph at redhat.com Fri Dec 4 16:14:03 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 4 Dec 2015 16:14:03 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <1449223186.15424.42.camel@mint>
References: <1449223186.15424.42.camel@mint>
Message-ID: <5661BBCB.5000307@redhat.com>

Your fix looks OK.

However, there is one other fix which would be nice.

We use call relocs for things other than bl instructions. This is
because some things (e.g. MachUEPNode::emit) do this:

  __ far_jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));

Only bl immediate instructions are ever used to jump to trampolines.
This is essential because they must be patchable.

Because of this, in here:

  if (is_call()) {
    address trampoline = nativeCall_at(orig_addr)->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }

the is_call() could be replaced by NativeCall::is_call_at().
Otherwise we're pointlessly decoding instructions and chasing
nonexistent trampolines. Could you try that?

Thanks,
Andrew.


From aph at redhat.com Fri Dec 4 17:38:19 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 4 Dec 2015 17:38:19 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <5661BBCB.5000307@redhat.com>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
Message-ID: <5661CF8B.6040405@redhat.com>

On 12/04/2015 04:14 PM, Andrew Haley wrote:
> Your fix looks OK.

Scratch that, I'm seeing NetBeans failures with your patch.

I think it's because you're missing a trampoline destination when the
initial relocation is being done. This is because get_trampoline()
looks for a trampoline_stub reloc based on orig_addr, and this can
never work.

(When a trampoline call is first created it is a call to self; the
reloc is the only way to find the trampoline.
For this reason, you
must use nativeCall_at(addr())->get_trampoline().)

I'm going to suggest this as a simpler fix:

address Relocation::pd_call_destination(address orig_addr) {
  assert(is_call(), "should be a call here");
  if (NativeCall::is_call_at(addr())) { // is a BL instruction
    address trampoline = nativeCall_at(addr())->get_trampoline();
    if (trampoline) {
      return nativeCallTrampolineStub_at(trampoline)->destination();
    }
  }
  if (orig_addr != NULL) {
    return MacroAssembler::pd_call_destination(orig_addr);
  }
  return MacroAssembler::pd_call_destination(addr());
}

I think it's right because this way we only follow real BL
instructions, and if these point to trampolines they must be within
the blob which is being relocated. I think this will fix your problem
because such BL instructions cannot point to anywhere wild.

Thanks,
Andrew.


From edward.nevill at gmail.com Fri Dec 4 17:43:37 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Fri, 04 Dec 2015 17:43:37 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <5661BBCB.5000307@redhat.com>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com>
Message-ID: <1449251017.4670.3.camel@mint>

On Fri, 2015-12-04 at 16:14 +0000, Andrew Haley wrote:
> Your fix looks OK.
>
> However, there is one other fix which would be nice.
>   if (is_call()) {
>     address trampoline = nativeCall_at(orig_addr)->get_trampoline();
>     if (trampoline) {
>       return nativeCallTrampolineStub_at(trampoline)->destination();
>     }
>   }
>
> the is_call() could be replaced by NativeCall::is_call_at().
> Otherwise we're pointlessly decoding instructions and chasing
> nonexistent trampolines. Could you try that?

Done.
New webrev at

http://cr.openjdk.java.net/~enevill/8144498/webrev.1

jtreg results with ReservedCodeCacheSize=256m

Hotspot (original):   Test results: passed: 865; failed: 19; error: 85
Hotspot (patched):    Test results: passed: 947; failed: 10; error: 12
Langtools (original): Test results: passed: 3,049; failed: 77; error: 223
Langtools (patched):  Test results: passed: 3,316; failed: 33

Many thanks,
Ed.


From fei.yang0953 at yahoo.com Sun Dec 6 14:33:46 2015
From: fei.yang0953 at yahoo.com (felix yang)
Date: Sun, 6 Dec 2015 14:33:46 +0000 (UTC)
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized MLA/MLS instructions
In-Reply-To: <56605447.9070103@redhat.com>
References: <56605447.9070103@redhat.com>
Message-ID: <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com>

Done.
Currently, I have two webrevs which are under review. I have recreated both of them:

Bug: https://bugs.openjdk.java.net/browse/JDK-8144201
Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.01

Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.01

Is that OK?
Thanks.

On Thursday, December 3, 2015 10:40 PM, Andrew Haley wrote:

It would help everybody if you did "hg commit" with an appropriate
changeset comment before generating the webrev.

Andrew.


From aph at redhat.com Mon Dec 7 09:48:36 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 7 Dec 2015 09:48:36 +0000
Subject: [aarch64-port-dev ] [RFR] aarch64: C2 generate vectorized MLA/MLS instructions
In-Reply-To: <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com>
References: <56605447.9070103@redhat.com> <792430786.13018984.1449412426251.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <566555F4.6090202@redhat.com>

On 06/12/15 14:33, felix yang wrote:
> Done. Currently, I have two webrevs which are under review. I have recreated both of them:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8144201
> Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.01
> Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
> Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.01
> Is that OK?

No, the comment is not complete. Please make sure that you have
Jcheck installed in your Mercurial.

http://openjdk.java.net/projects/code-tools/jcheck/

Andrew.


From edward.nevill at gmail.com Mon Dec 7 12:22:14 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 07 Dec 2015 12:22:14 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <5661CF8B.6040405@redhat.com>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com>
Message-ID: <1449490934.12382.49.camel@mint>

On Fri, 2015-12-04 at 17:38 +0000, Andrew Haley wrote:
> On 12/04/2015 04:14 PM, Andrew Haley wrote:
> I'm going to suggest this as a simpler fix:
>
> address Relocation::pd_call_destination(address orig_addr) {
>   assert(is_call(), "should be a call here");
>   if (NativeCall::is_call_at(addr())) { // is a BL instruction
>     address trampoline = nativeCall_at(addr())->get_trampoline();
>     if (trampoline) {
>       return nativeCallTrampolineStub_at(trampoline)->destination();
>     }
>   }
>   if (orig_addr != NULL) {
>     return MacroAssembler::pd_call_destination(orig_addr);
>   }
>   return MacroAssembler::pd_call_destination(addr());
> }
>
> I think it's right because this way we only follow real BL
> instructions, and if these point to trampolines they must be within
> the blob which is being relocated. I think this will fix your problem
> because such BL instructions cannot point to anywhere wild.

I am not sure this works.

Firstly, in the case that far_branches are not enabled (IE the code cache is <= 128m), then there could be BL instructions to other addresses outside the current code blob. These are generated by far_call as follows.

  if (far_branches()) {
    unsigned long offset;
    // We can use ADRP here because we know that the total size of
    // the code cache cannot exceed 2Gb.
    adrp(tmp, entry, offset);
    add(tmp, tmp, offset);
    if (cbuf) cbuf->set_insts_mark();
    blr(tmp);
  } else {
    if (cbuf) cbuf->set_insts_mark();
    bl(entry);
  }

I cannot see what prevents one of these BLs from being followed and since they may have been copied but not relocated then they may end up pointing somewhere random in the code buffer which just happens to look like a trampoline. Admittedly, the probability of failure is vastly reduced because there are no genuine trampolines for it to latch on to.

This case can be avoided by adding a far_branches() predicate to pd_call_destination as follows.

  if (far_branches() && NativeCall::is_call_at(addr())) { // is a BL instruction

Second, I am not sure that your assertion

> (When a trampoline call is first created it is a call to self; the
> reloc is the only way to find the trampoline. For this reason, you
> must use nativeCall_at(addr())->get_trampoline().)

is correct. In MacroAssembler::trampoline_call I see

  if (Assembler::reachable_from_branch_at(pc(), entry.target())) {
    bl(entry.target());
  } else {
    bl(pc());
  }

so it only creates a call to self if the branch does not reach and as before you could have a dangling BL when this is copied.
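
[Editorial sketch, not part of the original mail.] The 128m threshold mentioned above matches the reach of a BL instruction, which encodes a signed 26-bit word offset and so covers roughly +/-128 MB from its own pc. The following is an illustrative reachability test under that assumption. bl_reaches is a hypothetical name and is not claimed to be how Assembler::reachable_from_branch_at is implemented.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a BL reachability check (not HotSpot's code).
// BL holds a signed 26-bit immediate scaled by 4, so the byte offset must be
// a multiple of 4 in the range [-128 MB, +128 MB - 4].
inline bool bl_reaches(uint64_t pc, uint64_t target) {
    const int64_t range  = static_cast<int64_t>(128) * 1024 * 1024;
    const int64_t offset = static_cast<int64_t>(target) - static_cast<int64_t>(pc);
    return offset >= -range && offset < range && (offset & 3) == 0;
}
```

Using addresses from earlier in the thread: the stub at 0x3ff70ab2300 is within reach of the call site at 0x3ff71224698, while 0x3ff99718950 (the address of SharedRuntime::resolve_opt_virtual_call_C seen in the first mail) is several hundred MB away from the compiled code at 0x3ff7122469c and so could not be reached by a single BL.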

I believe it would be possible to replace the above code section with simply

  bl(pc());

since it will always be relocated and therefore you can always generate the call to self.

All of this seems very fragile and I am wondering about the value of trampolines. The alternative to using trampolines would be to always generate

  adrp Xn, target & ~0xfff
  add  Xn, Xn, target & 0xfff
  blr  Xn

On most modern, out-of-order, dual-issue implementations the ADRP and ADD will be folded into a single micro-op which will then be dual issued with the BLR so it doesn't end up costing us anything.

I did some experiments on 2 different implementations comparing the following 3 code fragments (where 'tramp_dest' is the final destination to be called).

1) Straight BL

tramp_test:
        mov     x2, x30
tramp1:
        bl      tramp_dest
        subs    x0, x0, #1
        bne     tramp1
        ret     x2

2) Straight ADRP/ADD

tramp_test:
        mov     x2, x30
tramp1:
        adr     x3, tramp_dest
        add     x3, x3, #0x0
        blr     x3
        subs    x0, x0, #1
        bne     tramp1
        ret     x2

3) Trampoline

tramp_test:
        mov     x2, x30
tramp1:
        bl      tramp
        subs    x0, x0, #1
        bne     tramp1
        ret     x2
tramp:
        ldr     x1, tramp_adcon
        br      x1
tramp_adcon:
        .dword  tramp_dest

I ran the above tests on 2 different implementations for 1E9 iterations. The results were

Imp 1: Straight BL = 4.50157 sec, ADRP/ADD = 4.50157 sec, trampoline = 6.00209 sec
Imp 2: Straight BL = 3.00107 sec, ADRP/ADD = 3.00106 sec, trampoline = 4.16815 sec

Maybe we could just get rid of trampolines?

All the best,
Ed.


From edward.nevill at gmail.com Mon Dec 7 13:45:19 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 07 Dec 2015 13:45:19 +0000
Subject: [aarch64-port-dev ] aarch32 project
Message-ID: <1449495919.12382.59.camel@mint>

Hi,

This is not really applicable to aarch64 but there is probably a large overlap of interest so I am posting this announcement here.
The aarch32 project has now been created and there is now an aarch32 specific mailing list aarch32-port-dev at openjdk.java.net Please go to http://mail.openjdk.java.net/mailman/listinfo/aarch32-port-dev to sign up. There will be no further announcements on this list. Thanks for your time, Ed Nevill From aph at redhat.com Mon Dec 7 14:20:37 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 7 Dec 2015 14:20:37 +0000 Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV In-Reply-To: <1449490934.12382.49.camel@mint> References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> Message-ID: <566595B5.9060400@redhat.com> On 12/07/2015 12:22 PM, Edward Nevill wrote: > I cannot see what prevents one of these BLs from being followed and > since they may have been copied but not relocated then they may end > up pointing somewhere random in the code buffer which just happens > to look like a trampoline. Admittedly, the probability of failure is > vastly reduced because there are no genuine trampolines for it to > latch on to. You must look inside get_trampoline(). It checks for this. > Second, I am not such that your assertion > >> (When a trampoline call is first created it is a call to self; the >> reloc is the only way to find the trampoline. For this reason, you >> must use nativeCall_at(addr())->get_trampoline().) > > is correct. In MacroAssembler::trampoline_call I see > > if (Assembler::reachable_from_branch_at(pc(), entry.target())) { > bl(entry.target()); > } else { > bl(pc()); > } > > so it only creates a call to self if the branch does not reach and > as before you could have a dangling BL when this is copied. It doesn't matter, because get_trampoline() checks for BLs outside the current method. 
> I believe it would be possible to replace the above code section > with simply > > bl(pc()); > since it will always be relocated and therefore you can always > generate the call to self. True. There are some other tidy-ups which could also be made in this area, but none of it is terribly important as far as I can see. > Maybe we could just get rid of trampolines? There's no need. In the commonest case we BL directly to the destination, which is optimal. Your ADRP/ADD examples aren't patchable; if you are going to compare trampolines with something else, whatever else you choose must be patchable, and it will be slower and/or larger than BL. Andrew. From felix.yang at linaro.org Mon Dec 7 15:17:30 2015 From: felix.yang at linaro.org (Felix Yang) Date: Mon, 7 Dec 2015 23:17:30 +0800 Subject: [aarch64-port-dev ] RFR: 8144201: aarch64: jdk/test/com/sun/net/httpserver/Test6a.java fails with --enable-unlimited-crypto Message-ID: Hi, I have corrected the webrev issues in my previous. Thanks Edward for providing the help. Now I am resending this mail: Could someone help review and sponsor this runtime fix for aarch64? Bug: https://bugs.openjdk.java.net/browse/JDK-8144201 Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.02 The test fails on aarch64 platform using openjdk8/9 configured with --enable-unlimited-crypto. Reported error message: Execution failed: `main' threw exception: java.io.IOException: Error writing request body to server. And the test passes with -XX:TieredStopAtLevel=3 or -XX:-UseAESIntrinsics option. After narrowing down, I find the bug is caused by the _cipherBlockChaining_decryptAESCrypt StubRoutine. The proposed patch fixes an obvious typo in this StubRoutine. Passed JTreg regression test(using openjdk8 built with --enable-unlimited-crypto). Is it OK to push? Felix, Thanks for your help. 
From felix.yang at linaro.org Mon Dec 7 15:26:06 2015
From: felix.yang at linaro.org (Felix Yang)
Date: Mon, 7 Dec 2015 23:26:06 +0800
Subject: [aarch64-port-dev ] RFR: 8144587: aarch64: generate vectorized MLA/MLS instructions
Message-ID:

Hi,

I have corrected the webrev issues in my previous mail. Thanks Edward for providing the help.
Now I am resending this mail:

Can someone help review and sponsor this code generation improvement for the aarch64 port?
Bug: https://bugs.openjdk.java.net/browse/JDK-8144587
Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02

The hotspot/test/compiler/loopopts/superword/SumRed_Int.java can serve as a test case.
With this patch, the following code snippet generated by C2:
  0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s
  0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s
  0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s
  0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s
  0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s
will be further optimized into:
  0x0000007f9cdb86dc: mul v19.4s, v16.4s, v17.4s
  0x0000007f9cdb86e0: mla v19.4s, v16.4s, v18.4s
  0x0000007f9cdb86e4: mla v19.4s, v17.4s, v18.4s

A performance gain of about 13% was achieved for the test case on my aarch64 server.
Tested with jtreg hotspot & langtools. Results are the same before and after.
Is it OK to push?

Felix, Thanks for your help.

From edward.nevill at gmail.com Mon Dec 7 16:19:07 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Mon, 07 Dec 2015 16:19:07 +0000
Subject: [aarch64-port-dev ] RFR: 8144201: aarch64: jdk/test/com/sun/net/httpserver/Test6a.java fails with --enable-unlimited-crypto
In-Reply-To:
References:
Message-ID: <1449505147.12382.67.camel@mint>

Hi Felix,

Thanks for finding this. The fix looks good to me.

Could we have an official reviewer please?

Regards,
Ed.

On Mon, 2015-12-07 at 23:17 +0800, Felix Yang wrote:
> Hi,
>
> I have corrected the webrev issues in my previous. Thanks Edward for
> providing the help.
> Now I am resending this mail: > > Could someone help review and sponsor this runtime fix for aarch64? > Bug: https://bugs.openjdk.java.net/browse/JDK-8144201 > Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.02 > > The test fails on aarch64 platform using openjdk8/9 configured with > --enable-unlimited-crypto. > Reported error message: Execution failed: `main' threw exception: > java.io.IOException: Error writing request body to server. > And the test passes with -XX:TieredStopAtLevel=3 or > -XX:-UseAESIntrinsics option. > > After narrowing down, I find the bug is caused by the > _cipherBlockChaining_decryptAESCrypt StubRoutine. > The proposed patch fixes an obvious typo in this StubRoutine. Passed > JTreg regression test(using openjdk8 built with --enable-unlimited-crypto). > Is it OK to push? > > Felix, > Thanks for your help. From edward.nevill at gmail.com Mon Dec 7 16:21:17 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 07 Dec 2015 16:21:17 +0000 Subject: [aarch64-port-dev ] RFR: 8144587: aarch64: generate vectorized MLA/MLS instructions In-Reply-To: References: Message-ID: <1449505277.12382.69.camel@mint> Hi Felix, Thanks for this. This optimisation looks good to me. Could we have an official reviewer please. Thanks, Ed. On Mon, 2015-12-07 at 23:26 +0800, Felix Yang wrote: > Hi, > > I have corrected the webrev issues in my previous mail. Thanks Edward for > providing the help. > Now I am resending this mail: > > Can someone help review and sponsor this code generation improvement for > aarch64 port? > Bug: https://bugs.openjdk.java.net/browse/JDK-8144587 > Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02 > > The hotspot/test/compiler/loopopts/superword/SumRed_Int.java can server > as a test case. 
> With this patch, the following code snippet by C2: > 0x0000007f6cec12cc: mul v19.4s, v16.4s, v17.4s > 0x0000007f6cec12d0: mul v16.4s, v16.4s, v18.4s > 0x0000007f6cec12d4: mul v17.4s, v18.4s, v17.4s > 0x0000007f6cec12d8: add v16.4s, v19.4s, v16.4s > 0x0000007f6cec12dc: add v16.4s, v16.4s, v17.4s > will be further optimized into: > 0x0000007f9cdb86dc: mul v19.4s, v16.4s, v17.4s > 0x0000007f9cdb86e0: mla v19.4s, v16.4s, v18.4s > 0x0000007f9cdb86e4: mla v19.4s, v17.4s, v18.4s > > About 13% performance gain achieved for the test case on my aarch64 > server. > Tested with jtreg hotspot & langtools. Results are the same before and > after. > Is it OK to push? > > Felix, > Thanks for your help. From roland.westrelin at oracle.com Mon Dec 7 16:26:57 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 7 Dec 2015 17:26:57 +0100 Subject: [aarch64-port-dev ] RFR: 8144201: aarch64: jdk/test/com/sun/net/httpserver/Test6a.java fails with --enable-unlimited-crypto In-Reply-To: References: Message-ID: <60CFC191-622E-4243-A9C5-E2D4B7F2F024@oracle.com> > Webrev: http://cr.openjdk.java.net/~fyang/8144201/webrev.02 That looks good to me. Roland. From roland.westrelin at oracle.com Mon Dec 7 16:31:06 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 7 Dec 2015 17:31:06 +0100 Subject: [aarch64-port-dev ] RFR: 8144587: aarch64: generate vectorized MLA/MLS instructions In-Reply-To: References: Message-ID: > Webrev: http://cr.openjdk.java.net/~fyang/8144587/webrev.02 That looks good to me. Roland. 
From edward.nevill at gmail.com Tue Dec 8 15:32:30 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 08 Dec 2015 15:32:30 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <566595B5.9060400@redhat.com>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> <566595B5.9060400@redhat.com>
Message-ID: <1449588750.5880.28.camel@mylittlepony.linaroharston>

On Mon, 2015-12-07 at 14:20 +0000, Andrew Haley wrote:
> On 12/07/2015 12:22 PM, Edward Nevill wrote:
>
> > I cannot see what prevents one of these BLs from being followed and
> > since they may have been copied but not relocated then they may end
> > up pointing somewhere random in the code buffer which just happens
> > to look like a trampoline. Admittedly, the probability of failure is
> > vastly reduced because there are no genuine trampolines for it to
> > latch on to.
>
> You must look inside get_trampoline(). It checks for this.

OK. Thanks, I have satisfied myself that this is correct.

New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2

I was having difficulty understanding why the check inside get_trampoline() did not exclude the adrp/add relocation. However, when I trap it doing the relocation in gdb I see

Original:

  0x3ff54170b50: adrp x8, 0x3ff54170000 <<< Not in code blob
  0x3ff54170b54: add x8, x8, #0x400
  0x3ff54170b58: blr x8

Copied but not relocated:

  0x3ff5481d250: adrp x8, 0x3ff5481d000 <<< Within code blob
  0x3ff5481d254: add x8, x8, #0x400
  0x3ff5481d258: blr x8

So the destination offset in the original is 0x3ff54170400 - 0x3ff54170b50 = 0xfffffffffffff8b0, whereas in the copied but not relocated version it is 0x3ff5481d400 - 0x3ff5481d250 = 0x1b0, which is within the current code blob.
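The arithmetic in the gdb session above can be reproduced with a toy model of an ADRP/ADD pair. This is an illustrative sketch only: the AdrpAdd struct and the encode/resolve helpers are hypothetical names, and the model keeps just the two immediates (a PC-relative page count for ADRP and the absolute low 12 bits for ADD).

```cpp
#include <cstdint>

// Toy model of an ADRP/ADD pair, for illustration only.
struct AdrpAdd {
  int64_t page_delta;  // ADRP immediate: signed count of 4 KiB pages, PC-relative
  uint64_t low12;      // ADD immediate: low 12 bits of the target, absolute
};

// Build the pair as it would be emitted at `pc` in order to reach `target`.
inline AdrpAdd encode(uint64_t pc, uint64_t target) {
  const int64_t pages =
      static_cast<int64_t>(target >> 12) - static_cast<int64_t>(pc >> 12);
  return {pages, target & 0xfff};
}

// Address the pair computes when it is executed at `pc`.
inline uint64_t resolve(const AdrpAdd& insn, uint64_t pc) {
  const uint64_t page =
      (pc & ~uint64_t{0xfff}) + (static_cast<uint64_t>(insn.page_delta) << 12);
  return page + insn.low12;
}
```

Encoding the pair at 0x3ff54170b50 for target 0x3ff54170400 gives a page delta of 0 and a low part of 0x400. Resolving the same immediates at the copy address 0x3ff5481d250 yields 0x3ff5481d400, which is 0x1b0 past the copied instruction, matching the numbers above.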
This happens because the adrp/add relocation is half PC-relative and half absolute: the adrp instruction is PC-relative, whereas the bottom 12 bits (the add immediate) are always absolute.

I have retested this with JTreg hotspot & langtools with ReservedCodeCacheSize=256m

Hotspot original:   Test results: passed: 865; failed: 19; error: 85
Hotspot revised:    Test results: passed: 953; failed: 9; error: 12
Langtools original: Test results: passed: 3,049; failed: 77; error: 223
Langtools revised:  Test results: passed: 3,316; failed: 33

Thanks for the review,
Ed.

From aph at redhat.com Tue Dec 8 15:49:40 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 8 Dec 2015 15:49:40 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <1449588750.5880.28.camel@mylittlepony.linaroharston>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> <566595B5.9060400@redhat.com> <1449588750.5880.28.camel@mylittlepony.linaroharston>
Message-ID: <5666FC14.6020001@redhat.com>

On 12/08/2015 03:32 PM, Edward Nevill wrote:
> OK. Thanks, I have satisfied myself that this is correct.
>
> New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2

That looks good to me.

Thanks,
Andrew.

From edward.nevill at gmail.com Tue Dec 8 18:22:32 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 08 Dec 2015 18:22:32 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2
Message-ID: <1449598952.3988.7.camel@mint>

Hi,

Since "8144028: Use AArch64 bit-test instructions in C2" I have been seeing occasional guarantee failures of the form:
# Internal Error (assembler_aarch64.hpp:223), pid=4241, tid=4595
# guarantee(chk == -1 || chk == 0) failed: Field too big for insn

These are being generated by the following call from pd_patch_instruction_size in macroAssembler_aarch64.cpp

  // Test & branch (immediate)
  Instruction_aarch64::spatch(branch, 18, 5, offset);

The problem is that test and branch instructions only have a 14-bit offset, giving a range of +/- 32KB, which is not sufficient for large C2 methods.

What can we do about this? It seems a shame to back out this optimization but I cannot see any easy way around it.

All the best,
Ed.

From aph at redhat.com Tue Dec 8 18:22:39 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 8 Dec 2015 18:22:39 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2
In-Reply-To: <1449598952.3988.7.camel@mint>
References: <1449598952.3988.7.camel@mint>
Message-ID: <56671FEF.6020404@redhat.com>

On 12/08/2015 06:22 PM, Edward Nevill wrote:
> The problem is that test and branch instructions only have a 14 bit
> offset giving a range of +/- 32Kb which is not sufficient for large
> C2 methods.
>
> What can we do about this? It seems a shame to backout this
> optimization but I cannot see any easy way around it.

C2 does support branch length relaxation: we already know it makes a couple of passes generating code. We've never used it, and I don't quite know how to use it, but I think some other ports do.

Since this is my mess, I guess I should clean it up, and I'm interested to try this. But feel free if you like...

Andrew.
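The range limit discussed above can be written as a small check. This is a hedged sketch rather than HotSpot code: fits_test_and_branch is a hypothetical name, and the only fact assumed is that TBZ/TBNZ encode a signed 14-bit offset counted in 4-byte instructions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical range check (not the HotSpot implementation).
// TBZ/TBNZ hold a signed 14-bit offset in units of 4-byte instructions,
// so a branch can only span [-32 KiB, +32 KiB) from its own address.
inline bool fits_test_and_branch(int64_t byte_offset) {
  // Branch targets are always word aligned.
  assert((byte_offset & 3) == 0);
  const int64_t words = byte_offset / 4;  // exact, because of the alignment
  return words >= -(1 << 13) && words < (1 << 13);
}
```

Any offset for which this returns false needs a different sequence, for example the inverse test skipping over an unconditional branch, which has a much larger range.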
From edward.nevill at gmail.com Tue Dec 8 18:31:54 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 08 Dec 2015 18:31:54 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2
In-Reply-To: <56671FEF.6020404@redhat.com>
References: <1449598952.3988.7.camel@mint> <56671FEF.6020404@redhat.com>
Message-ID: <1449599514.3988.9.camel@mint>

On Tue, 2015-12-08 at 18:22 +0000, Andrew Haley wrote:
> On 12/08/2015 06:22 PM, Edward Nevill wrote:
> C2 does support branch length relaxation: we already know it makes a
> couple of passes generating code. We've never used it, and I don't
> quite know how to use it, but I think some other ports do.
>
> Since this is my mess, I guess I should clean it up, and I'm
> interested to try this. But feel free if you like...

No. It's OK, thanks for the offer :-)

Ed.

From edward.nevill at gmail.com Wed Dec 9 14:10:42 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 09 Dec 2015 14:10:42 +0000
Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support
Message-ID: <1449670242.21212.24.camel@mylittlepony.linaroharston>

Hi,

The following webrev

http://cr.openjdk.java.net/~enevill/jdk8_largecode/webrev

backports large code cache support from JDK 9 to JDK 8.

This incorporates the fix to pd_call_destination

http://cr.openjdk.java.net/~enevill/8144498/webrev.2

I have also updated the jdk8 patch to reflect the setting of CODE_CACHE_SIZE_LIMIT in jdk9, so that jdk8 and jdk9 are the same.

One change I am not so sure about is the following in jdk9

@@ -868,7 +867,7 @@
   // blrt rscratch1
   CodeBlob *cb = CodeCache::find_blob(_entry_point);
   if (cb) {
-    return NativeInstruction::instruction_size;
+    return MacroAssembler::far_branch_size();
   } else {
     return 6 * NativeInstruction::instruction_size;

whereas in jdk8 we have

-    return 4;
+    return MacroAssembler::far_branch_size();
   } else {
     // A 48-bit address. See movptr().
- return 16; + // then a blrt + // return 16; + return 4 * NativeInstruction::instruction_size; IE. 4 * instruction_size instead of 6 * instruction_size This is because in jdk9, aarch64_enc_java_to_runtime does __ adr(rscratch2, retaddr); __ lea(rscratch1, RuntimeAddress(entry)); // Leave a breadcrumb for JavaThread::pd_last_frame(). __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); __ blrt(rscratch1, gpcnt, fpcnt, rtype); __ bind(retaddr); __ add(sp, sp, 2 * wordSize); whereas in jdk8 it just does __ lea(rscratch1, RuntimeAddress(entry)); __ blrt(rscratch1, gpcnt, fpcnt, rtype); For the moment I have left this unchanged. Is this necessary and should I include it in the backport? I have tested the large code support in jdk8 with jtreg hotspot and langtools with the following results. Hotspot (original - 128M code cache): Test results: passed: 674; failed: 17; error: 3 Hotspot (patched - 128M code cache): Test results: passed: 674; failed: 17; error: 3 Hotspot (patched- 256M code cache): Test results: passed: 674; failed: 17; error: 3 Langtools (original - 128M code cache): Test results: passed: 3,091 Langtools (patched - 128M code cache): Test results: passed: 3,090 Langtools (patched - 256M code cache): Test results: passed: 3,091 OK to push? Ed. From aph at redhat.com Wed Dec 9 14:40:23 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 9 Dec 2015 14:40:23 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support In-Reply-To: <1449670242.21212.24.camel@mylittlepony.linaroharston> References: <1449670242.21212.24.camel@mylittlepony.linaroharston> Message-ID: <56683D57.4090005@redhat.com> On 12/09/2015 02:10 PM, Edward Nevill wrote: > For the moment I have left this unchanged. > > Is this necessary and should I include it in the backport? 
This is fixed in http://hg.openjdk.java.net/aarch64-port/jdk8u changeset: 8597:bea52c7ebf71 user: aph date: Tue Sep 15 16:14:32 2015 +0000 summary: Remove AArch64-specific code in generateOptoStub.cpp. It's worth importing that patch. Andrew. From edward.nevill at gmail.com Wed Dec 9 15:30:18 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 09 Dec 2015 15:30:18 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support In-Reply-To: <56683D57.4090005@redhat.com> References: <1449670242.21212.24.camel@mylittlepony.linaroharston> <56683D57.4090005@redhat.com> Message-ID: <1449675018.21212.33.camel@mylittlepony.linaroharston> On Wed, 2015-12-09 at 14:40 +0000, Andrew Haley wrote: > On 12/09/2015 02:10 PM, Edward Nevill wrote: > > For the moment I have left this unchanged. > > > > Is this necessary and should I include it in the backport? > > This is fixed in http://hg.openjdk.java.net/aarch64-port/jdk8u > > changeset: 8597:bea52c7ebf71 > user: aph > date: Tue Sep 15 16:14:32 2015 +0000 > summary: Remove AArch64-specific code in generateOptoStub.cpp. > > It's worth importing that patch. OK. Thanks. With that patch imported does the large code cache support patch look ok to push to jdk8? Regards, Ed. From aph at redhat.com Wed Dec 9 15:36:58 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 9 Dec 2015 15:36:58 +0000 Subject: [aarch64-port-dev ] RFR: aarch64: jdk8: large code cache support In-Reply-To: <1449675018.21212.33.camel@mylittlepony.linaroharston> References: <1449670242.21212.24.camel@mylittlepony.linaroharston> <56683D57.4090005@redhat.com> <1449675018.21212.33.camel@mylittlepony.linaroharston> Message-ID: <56684A9A.9070705@redhat.com> On 12/09/2015 03:30 PM, Edward Nevill wrote: > OK. Thanks. With that patch imported does the large code cache support patch look ok to push to jdk8? I think so. Andrew. 
From aph at redhat.com Wed Dec 9 19:00:00 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 9 Dec 2015 19:00:00 +0000 Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <1449598952.3988.7.camel@mint> References: <1449598952.3988.7.camel@mint> Message-ID: <56687A30.2020203@redhat.com> On 12/08/2015 06:22 PM, Edward Nevill wrote: > Hi, > > Since "8144028: Use AArch64 bit-test instructions in C2" I have been seeing occasional guarantee failures of the form. > > # Internal Error (assembler_aarch64.hpp:223), pid=4241, tid=4595 > # guarantee(chk == -1 || chk == 0) failed: Field too big for insn > > These are being generated by the following call from pd_patch_instruction_size in macroAssembler_aarch64.cpp > > // Test & branch (immediate) > Instruction_aarch64::spatch(branch, 18, 5, offset); > > The problem is that test and branch instructions only have a 14 bit offset giving a range of +/- 32Kb which is not sufficient for large C2 methods. > > What can we do about this? It seems a shame to backout this optimization but I cannot see any easy way around it. Please try this patch. Andrew. -------------- next part -------------- diff --git a/src/cpu/aarch64/vm/aarch64.ad b/src/cpu/aarch64/vm/aarch64.ad --- a/src/cpu/aarch64/vm/aarch64.ad +++ b/src/cpu/aarch64/vm/aarch64.ad @@ -3484,10 +3484,17 @@ return 0; } -bool Matcher::is_short_branch_offset(int rule, int br_size, int offset) -{ - Unimplemented(); - return false; +// Is this branch offset short enough that a short branch can be used? +// +// NOTE: If the platform does not provide any short branch variants, then +// this method should return false for offset 0. +bool Matcher::is_short_branch_offset(int rule, int br_size, int offset) { + // The passed offset is relative to address of the branch. On + // AArch64 a branch displacement is calculated relative to address + // of the next instruction. 
+ offset -= br_size; + + return (-32768 <= offset && offset < 32768); } const bool Matcher::isSimpleConstant64(jlong value) { @@ -13845,7 +13852,8 @@ // Test bit and Branch -instruct cmpL_branch_sign(cmpOp cmp, iRegL op1, immL0 op2, label labl, rFlagsReg cr) %{ +// Patterns for short (< 32KiB) variants +instruct cmpL_branch_sign(cmpOp cmp, iRegL op1, immL0 op2, label labl) %{ match(If cmp (CmpL op1 op2)); predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt || n->in(1)->as_Bool()->_test._test == BoolTest::ge); @@ -13855,16 +13863,15 @@ format %{ "cb$cmp $op1, $labl # long" %} ins_encode %{ Label* L = $labl$$label; - Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; - if (cond == Assembler::LT) - __ tbnz($op1$$Register, 63, *L); - else - __ tbz($op1$$Register, 63, *L); + Assembler::Condition cond = + ((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? Assembler::NE : Assembler::EQ; + __ tbr($op1$$Register, cond, 63, *L); %} ins_pipe(pipe_cmp_branch); -%} - -instruct cmpI_branch_sign(cmpOp cmp, iRegIorL2I op1, immI0 op2, label labl, rFlagsReg cr) %{ + ins_short_branch(1); +%} + +instruct cmpI_branch_sign(cmpOp cmp, iRegIorL2I op1, immI0 op2, label labl) %{ match(If cmp (CmpI op1 op2)); predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt || n->in(1)->as_Bool()->_test._test == BoolTest::ge); @@ -13874,16 +13881,15 @@ format %{ "cb$cmp $op1, $labl # int" %} ins_encode %{ Label* L = $labl$$label; - Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; - if (cond == Assembler::LT) - __ tbnz($op1$$Register, 31, *L); - else - __ tbz($op1$$Register, 31, *L); + Assembler::Condition cond = + ((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? 
Assembler::NE : Assembler::EQ; + __ tbr($op1$$Register, cond, 31, *L); %} ins_pipe(pipe_cmp_branch); -%} - -instruct cmpL_branch_bit(cmpOp cmp, iRegL op1, immL op2, immL0 op3, label labl, rFlagsReg cr) %{ + ins_short_branch(1); +%} + +instruct cmpL_branch_bit(cmpOp cmp, iRegL op1, immL op2, immL0 op3, label labl) %{ match(If cmp (CmpL (AndL op1 op2) op3)); predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne || n->in(1)->as_Bool()->_test._test == BoolTest::eq) @@ -13896,15 +13902,13 @@ Label* L = $labl$$label; Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; int bit = exact_log2($op2$$constant); - if (cond == Assembler::EQ) - __ tbz($op1$$Register, bit, *L); - else - __ tbnz($op1$$Register, bit, *L); + __ tbr($op1$$Register, cond, bit, *L); %} ins_pipe(pipe_cmp_branch); -%} - -instruct cmpI_branch_bit(cmpOp cmp, iRegIorL2I op1, immI op2, immI0 op3, label labl, rFlagsReg cr) %{ + ins_short_branch(1); +%} + +instruct cmpI_branch_bit(cmpOp cmp, iRegIorL2I op1, immI op2, immI0 op3, label labl) %{ match(If cmp (CmpI (AndI op1 op2) op3)); predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne || n->in(1)->as_Bool()->_test._test == BoolTest::eq) @@ -13917,10 +13921,79 @@ Label* L = $labl$$label; Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; int bit = exact_log2($op2$$constant); - if (cond == Assembler::EQ) - __ tbz($op1$$Register, bit, *L); - else - __ tbnz($op1$$Register, bit, *L); + __ tbr($op1$$Register, cond, bit, *L); + %} + ins_pipe(pipe_cmp_branch); + ins_short_branch(1); +%} + +// And far variants +instruct far_cmpL_branch_sign(cmpOp cmp, iRegL op1, immL0 op2, label labl) %{ + match(If cmp (CmpL op1 op2)); + predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt + || n->in(1)->as_Bool()->_test._test == BoolTest::ge); + effect(USE labl); + + ins_cost(BRANCH_COST); + format %{ "cb$cmp $op1, $labl # long" %} + ins_encode %{ + Label* L = $labl$$label; + Assembler::Condition cond = + 
((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? Assembler::NE : Assembler::EQ; + __ tbr($op1$$Register, cond, 63, *L, /*far*/true); + %} + ins_pipe(pipe_cmp_branch); +%} + +instruct far_cmpI_branch_sign(cmpOp cmp, iRegIorL2I op1, immI0 op2, label labl) %{ + match(If cmp (CmpI op1 op2)); + predicate(n->in(1)->as_Bool()->_test._test == BoolTest::lt + || n->in(1)->as_Bool()->_test._test == BoolTest::ge); + effect(USE labl); + + ins_cost(BRANCH_COST); + format %{ "cb$cmp $op1, $labl # int" %} + ins_encode %{ + Label* L = $labl$$label; + Assembler::Condition cond = + ((Assembler::Condition)$cmp$$cmpcode == Assembler::LT) ? Assembler::NE : Assembler::EQ; + __ tbr($op1$$Register, cond, 31, *L, /*far*/true); + %} + ins_pipe(pipe_cmp_branch); +%} + +instruct far_cmpL_branch_bit(cmpOp cmp, iRegL op1, immL op2, immL0 op3, label labl) %{ + match(If cmp (CmpL (AndL op1 op2) op3)); + predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne + || n->in(1)->as_Bool()->_test._test == BoolTest::eq) + && is_power_of_2(n->in(2)->in(1)->in(2)->get_long())); + effect(USE labl); + + ins_cost(BRANCH_COST); + format %{ "tb$cmp $op1, $op2, $labl" %} + ins_encode %{ + Label* L = $labl$$label; + Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; + int bit = exact_log2($op2$$constant); + __ tbr($op1$$Register, cond, bit, *L, /*far*/true); + %} + ins_pipe(pipe_cmp_branch); +%} + +instruct far_cmpI_branch_bit(cmpOp cmp, iRegIorL2I op1, immI op2, immI0 op3, label labl) %{ + match(If cmp (CmpI (AndI op1 op2) op3)); + predicate((n->in(1)->as_Bool()->_test._test == BoolTest::ne + || n->in(1)->as_Bool()->_test._test == BoolTest::eq) + && is_power_of_2(n->in(2)->in(1)->in(2)->get_int())); + effect(USE labl); + + ins_cost(BRANCH_COST); + format %{ "tb$cmp $op1, $op2, $labl" %} + ins_encode %{ + Label* L = $labl$$label; + Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; + int bit = exact_log2($op2$$constant); + __ tbr($op1$$Register, cond, bit, *L, /*far*/true); 
%} ins_pipe(pipe_cmp_branch); %} diff --git a/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp b/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp --- a/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp +++ b/src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp @@ -27,6 +27,7 @@ #define CPU_AARCH64_VM_C1_MACROASSEMBLER_AARCH64_HPP using MacroAssembler::build_frame; +using MacroAssembler::null_check; // C1_MacroAssembler contains high-level macros for C1 diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp @@ -487,6 +487,32 @@ orr(Vd, T, Vn, Vn); } +public: + + // Generalized Test Bit And Branch, including a "far" variety which + // spans more than 32KiB. + void tbr(Register Rt, Condition cond, int bitpos, Label &dest, bool far = false) { + assert(cond == Assembler::EQ || cond == Assembler::NE, "must be"); + + if (far) + cond = ~cond; + + void (Assembler::* branch)(Register Rt, int bitpos, Label &L); + if (cond == Assembler::EQ) + branch = &Assembler::tbz; + else + branch = &Assembler::tbnz; + + if (far) { + Label L; + (this->*branch)(Rt, bitpos, L); + b(dest); + bind(L); + } else { + (this->*branch)(Rt, bitpos, dest); + } + } + // macro instructions for accessing and updating floating point // status register // diff --git a/src/share/vm/adlc/formssel.cpp b/src/share/vm/adlc/formssel.cpp --- a/src/share/vm/adlc/formssel.cpp +++ b/src/share/vm/adlc/formssel.cpp @@ -1246,7 +1246,8 @@ !is_short_branch() && // Don't match another short branch variant reduce_result() != NULL && strcmp(reduce_result(), short_branch->reduce_result()) == 0 && - _matrule->equivalent(AD.globalNames(), short_branch->_matrule)) { + _matrule->equivalent(AD.globalNames(), short_branch->_matrule) && + equivalent_predicates(this, short_branch)) { // The instructions are equivalent. 
// Now verify that both instructions have the same parameters and

From edward.nevill at gmail.com Thu Dec 10 11:16:07 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 10 Dec 2015 11:16:07 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2
In-Reply-To: <56687A30.2020203@redhat.com>
References: <1449598952.3988.7.camel@mint> <56687A30.2020203@redhat.com>
Message-ID: <1449746167.24789.16.camel@mylittlepony.linaroharston>

On Wed, 2015-12-09 at 19:00 +0000, Andrew Haley wrote:
> On 12/08/2015 06:22 PM, Edward Nevill wrote:
> > Hi,
> >
> Please try this patch.

Hi,

It fixed some of the problems I see, but not all.

The test I am running is jtreg/langtools. With 8144028 I see 33 failures. With this patch that reduces to 30 failures. With 8144028 backed out there are no failures.

The command I am using to run jtreg is

/home/ed/images/jdk9-backout/bin/java -jar lib/jtreg.jar -nr -conc:16 -timeout:3 -othervm -jdk:/home/ed/images/jdk9-backout -v1 -a -ignore:quiet /home/ed/new_jdk9/hs-comp/langtools/test

I am continuing to look at the other failures. Let me know if you want any logs etc.

All the best,
Ed.

From hui.shi at linaro.org Thu Dec 10 14:48:05 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Thu, 10 Dec 2015 22:48:05 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
Message-ID:

Hi All,

Could someone help comment on this change?

Bug: https://bugs.openjdk.java.net/browse/JDK-8144993
webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/

This patch aims to remove the redundant memory barrier after an allocation node; on AArch64 it removes a redundant dmb when creating an object. The motivation is that the dmb instruction emitted after commonly used object allocations, for example String and boxing objects, is redundant with the dmb inserted for the final field write.
In the following small case:

String foo(String s) {
  String copy = new String(s);
  return copy;
}

There are two dmb instructions in the generated code. The first one is membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common. The second one is membar_release, inserted at the exit of the initializer method because final fields are written. The allocated String doesn't escape in the String initializer method, and membar_release includes membar_storestore semantics, so the first one can be removed safely.

  0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256]
  0x0000007f85bbfa90: str xzr, [x0,#16]
  0x0000007f85bbfa94: dmb ishst // first dmb to remove
  ....
  0x0000007fa01d83c0: ldrsb w10, [x20,#20]
  0x0000007fa01d83c4: ldr w12, [x20,#16]
  0x0000007fa01d83c8: ldr x11, [sp,#8]
  0x0000007fa01d83cc: strb w10, [x11,#20]
  0x0000007fa01d83d0: str w12, [x11,#16]
  0x0000007fa01d83d4: dmb ish // second dmb

The patch targets this pattern and removes the redundant memory barrier for the allocation node.

1. When inserting the memory barrier for a final field write, if the allocation node of the final field's object is available, invoke AllocationNode::compute_MemBar_redundancy(initializer method).
2. In AllocationNode:
   2.1 Add a new field _is_allocation_MemBar_redundant, a flag indicating whether the memory barrier after the allocation node is redundant.
   2.2 Add the method compute_MemBar_redundancy, which sets _is_allocation_MemBar_redundant to true if the first parameter "this" does not escape in the initializer method according to BCEscapeAnalyzer.
3. Skip inserting the memory barrier in PhaseMacroExpand::expand_allocate_common when the AllocationNode's _is_allocation_MemBar_redundant flag is true.
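The three steps above can be modelled with a few lines of C++. This is a toy model and not C2 code: the AllocationModel struct, its boolean inputs, and both helper functions are hypothetical stand-ins for the compiler's real node types and for the answer an escape analysis such as BCEscapeAnalyzer would give.

```cpp
// Toy model of the decision described in steps 1-3; not C2 code.
struct AllocationModel {
  // True when the initializer writes a final field, so a membar_release is
  // emitted at its exit (the situation step 1 reacts to).
  bool initializer_writes_final_field;
  // Assumed answer from an escape analysis of the initializer method.
  bool this_escapes_in_initializer;
  // Models the _is_allocation_MemBar_redundant flag from step 2.1.
  bool is_allocation_membar_redundant;
};

// Step 2.2: mark the allocation barrier redundant only when a release barrier
// will follow and "this" stays inside the initializer.
inline void compute_membar_redundancy(AllocationModel& alloc) {
  alloc.is_allocation_membar_redundant =
      alloc.initializer_writes_final_field && !alloc.this_escapes_in_initializer;
}

// Step 3: macro expansion emits the storestore barrier unless it was marked
// redundant.
inline bool needs_storestore_after_allocation(const AllocationModel& alloc) {
  return !alloc.is_allocation_membar_redundant;
}
```

In this model the barrier is kept by default and dropped only when both conditions are known to hold, which mirrors the conservative direction of the proposal.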
Regards
Hui

From edward.nevill at gmail.com Thu Dec 10 17:05:26 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 10 Dec 2015 17:05:26 +0000
Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2
In-Reply-To: <56687A30.2020203@redhat.com>
References: <1449598952.3988.7.camel@mint> <56687A30.2020203@redhat.com>
Message-ID: <1449767126.8845.3.camel@mylittlepony.linaroharston>

On Wed, 2015-12-09 at 19:00 +0000, Andrew Haley wrote:
> On 12/08/2015 06:22 PM, Edward Nevill wrote:
> > Hi,
> >
> > Since "8144028: Use AArch64 bit-test instructions in C2" I have been seeing occasional guarantee failures of the form.
> >
> > # Internal Error (assembler_aarch64.hpp:223), pid=4241, tid=4595
> > # guarantee(chk == -1 || chk == 0) failed: Field too big for insn
> >
> > These are being generated by the following call from pd_patch_instruction_size in macroAssembler_aarch64.cpp
> >
> > // Test & branch (immediate)
> > Instruction_aarch64::spatch(branch, 18, 5, offset);
> >
> > The problem is that test and branch instructions only have a 14 bit offset giving a range of +/- 32Kb which is not sufficient for large C2 methods.
> >
> > What can we do about this? It seems a shame to backout this optimization but I cannot see any easy way around it.
>
> Please try this patch.

I think the following patch is needed in addition.

diff -r af66c2e5a0f6 src/cpu/aarch64/vm/interp_masm_aarch64.cpp
--- a/src/cpu/aarch64/vm/interp_masm_aarch64.cpp Thu Dec 10 15:58:02 2015 +0000
+++ b/src/cpu/aarch64/vm/interp_masm_aarch64.cpp Thu Dec 10 17:02:12 2015 +0000
@@ -1355,8 +1355,9 @@
   if (JvmtiExport::can_post_interpreter_events()) {
     Label L;
     ldr(r3, Address(rthread, JavaThread::interp_only_mode_offset()));
-    tst(r3, ~0);
-    br(Assembler::EQ, L);
+//    tst(r3, ~0);
+//    br(Assembler::EQ, L);
+    cbz(r3, L);
     call_VM(noreg,
             CAST_FROM_FN_PTR(address, InterpreterRuntime::post_method_entry));
     bind(L);

Regards,
Ed.
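
As a rough illustration of the range limit quoted above, the check behind the "Field too big for insn" guarantee can be sketched in Java. This is not HotSpot code: the class and method names are hypothetical, and the sketch assumes that the value being patched is a word offset (byte offset divided by 4) tested against a signed 14-bit field, which is how the quoted guarantee and the 14 bit / 32Kb figures read.

```java
// Hypothetical helper; a sketch of the range check, not the HotSpot source.
final class BranchRange {
    // Same shape as the quoted guarantee: after an arithmetic shift by
    // (nbits - 1), a value that fits a signed nbits-wide field is 0 or -1.
    static boolean fitsSigned(long value, int nbits) {
        long chk = value >> (nbits - 1);
        return chk == -1 || chk == 0;
    }

    // Assumption: test-and-branch targets are 4-byte aligned and the
    // 14-bit field holds the offset in instruction words.
    static boolean testBranchReachable(long byteOffset) {
        return (byteOffset & 3) == 0 && fitsSigned(byteOffset >> 2, 14);
    }

    public static void main(String[] args) {
        System.out.println(testBranchReachable(32 * 1024 - 4)); // true
        System.out.println(testBranchReachable(32 * 1024));     // false
        System.out.println(testBranchReachable(-32 * 1024));    // true
    }
}
```

A branch that must span more than this range cannot be patched in place, which is consistent with the failures only showing up in large C2 methods.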
From aph at redhat.com Thu Dec 10 17:22:59 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 10 Dec 2015 17:22:59 +0000 Subject: [aarch64-port-dev ] Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2 In-Reply-To: <1449767126.8845.3.camel@mylittlepony.linaroharston> References: <1449598952.3988.7.camel@mint> <56687A30.2020203@redhat.com> <1449767126.8845.3.camel@mylittlepony.linaroharston> Message-ID: <5669B4F3.2060800@redhat.com> On 12/10/2015 05:05 PM, Edward Nevill wrote: > I think the following patch is needed in addition. Good catch! Thanks, Andrew. From aph at redhat.com Mon Dec 14 15:59:23 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 14 Dec 2015 15:59:23 +0000 Subject: [aarch64-port-dev ] RFR: 8145320: Create unsafe_arraycopy and generic_arraycopy for AArch64 Message-ID: <566EE75B.70107@redhat.com> http://cr.openjdk.java.net/~aph/8145320-1/ Andrew. From edward.nevill at gmail.com Mon Dec 14 17:52:39 2015 From: edward.nevill at gmail.com (edward.nevill at gmail.com) Date: Mon, 14 Dec 2015 17:52:39 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: 2 new changesets Message-ID: <201512141752.tBEHqd99023228@aojmv0008.oracle.com> Changeset: b2df86902f5e Author: enevill Date: 2015-12-09 13:08 +0000 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/b2df86902f5e Add support for large code cache ! src/cpu/aarch64/vm/aarch64.ad ! src/cpu/aarch64/vm/assembler_aarch64.cpp ! src/cpu/aarch64/vm/assembler_aarch64.hpp ! src/cpu/aarch64/vm/c1_CodeStubs_aarch64.cpp ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.hpp ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp ! src/cpu/aarch64/vm/compiledIC_aarch64.cpp ! src/cpu/aarch64/vm/globalDefinitions_aarch64.hpp ! src/cpu/aarch64/vm/globals_aarch64.hpp ! src/cpu/aarch64/vm/icBuffer_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! 
src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/methodHandles_aarch64.cpp ! src/cpu/aarch64/vm/nativeInst_aarch64.cpp ! src/cpu/aarch64/vm/nativeInst_aarch64.hpp ! src/cpu/aarch64/vm/relocInfo_aarch64.cpp ! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp ! src/cpu/aarch64/vm/vtableStubs_aarch64.cpp ! src/os_cpu/linux_aarch64/vm/os_linux_aarch64.cpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/utilities/globalDefinitions.hpp Changeset: 0096f1ef564e Author: aph Date: 2015-09-15 16:14 +0000 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/0096f1ef564e Remove AArch64-specific code in generateOptoStub.cpp. In aarch64_enc_java_to_runtime leave a breadcrumb for JavaThread::pd_last_frame(). ! src/cpu/aarch64/vm/aarch64.ad ! src/share/vm/opto/generateOptoStub.cpp From edward.nevill at gmail.com Mon Dec 14 19:46:41 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 14 Dec 2015 19:46:41 +0000 Subject: [aarch64-port-dev ] Ping: RFR: aarch64: backports to JDK 7 In-Reply-To: <1448466290.16878.5.camel@mylittlepony.linaroharston> References: <1448466290.16878.5.camel@mylittlepony.linaroharston> Message-ID: <1450122401.708.1.camel@mint> Hi, OK to backport these to JDK7? Thanks, Ed. On Wed, 2015-11-25 at 15:44 +0000, Edward Nevill wrote: > Hi, > > Please review the following backports to JDK 7 > > http://cr.openjdk.java.net/~enevill/jdk7_backports_1511/ > > Tested with jtreg hotspot & langtools. Results the same before and after. > > Hotspot: Test results: passed: 297; failed: 12; error: 2 > Langtools: Test results: passed: 1,971; failed: 1; error: 2 > > Summary of the changesets below. > > Thanks, > Ed. 
> > --- > enevill at arm64:~/icedtea7-forest/hotspot$ hg outgoing > comparing with ssh://enevill at icedtea.classpath.org/hg/icedtea7-forest/hotspot > running ssh enevill at icedtea.classpath.org 'hg -R hg/icedtea7-forest/hotspot serve --stdio' > searching for changes > changeset: 6380:5b6efbae9fea > user: aph > date: Wed Nov 04 13:38:38 2015 +0100 > files: src/share/vm/gc_implementation/parallelScavenge/psParallelCompact.hpp > description: > 8138966: Intermittent SEGV running ParallelGC > Summary: Add necessary memory fences so that the parallel threads are unable to observe partially filled block tables. > Reviewed-by: tschatzl > > > changeset: 6381:c7679d143590 > user: enevill > date: Thu Nov 19 15:15:20 2015 +0000 > files: src/cpu/aarch64/vm/assembler_aarch64.cpp > description: > 8143067: aarch64: guarantee failure in javac > Summary: Fix adrp going out of range during code relocation > Reviewed-by: aph, kvn > > > changeset: 6382:eeb4a3ec4563 > tag: tip > user: hshi > date: Tue Nov 24 09:02:26 2015 +0000 > files: src/cpu/aarch64/vm/interp_masm_aarch64.cpp > description: > 8143285: aarch64: Missing load acquire when checking if ConstantPoolCacheEntry is resolved > Reviewed-by: roland, aph > --- > > From aph at redhat.com Mon Dec 14 20:42:46 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 14 Dec 2015 20:42:46 +0000 Subject: [aarch64-port-dev ] Ping: RFR: aarch64: backports to JDK 7 In-Reply-To: <1450122401.708.1.camel@mint> References: <1448466290.16878.5.camel@mylittlepony.linaroharston> <1450122401.708.1.camel@mint> Message-ID: <566F29C6.7090802@redhat.com> On 12/14/2015 07:46 PM, Edward Nevill wrote: > OK to backport these to JDK7? Looks good. Andrew. 
From vladimir.kozlov at oracle.com Mon Dec 14 23:07:03 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 14 Dec 2015 15:07:03 -0800
Subject: [aarch64-port-dev ] RFR: 8145320: Create unsafe_arraycopy and generic_arraycopy for AArch64
In-Reply-To: <566EE75B.70107@redhat.com>
References: <566EE75B.70107@redhat.com>
Message-ID: <566F4B97.5050605@oracle.com>

Looks fine to me.

Thanks,
Vladimir

On 12/14/15 7:59 AM, Andrew Haley wrote:
> http://cr.openjdk.java.net/~aph/8145320-1/
>
> Andrew.
>

From vladimir.kozlov at oracle.com Tue Dec 15 02:40:02 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 14 Dec 2015 18:40:02 -0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To:
References:
Message-ID: <566F7D82.6030806@oracle.com>

Very interesting!

Please, add short statement to the comment in /macro.cpp for your case.

Changes looks fine to me. One nit could be to delay bytecode analysis until macro expansion - it may reduce compilation time. Bytecode analysis of each constructor could be expensive.

Thanks,
Vladimir

On 12/10/15 6:48 AM, Hui Shi wrote:
> Hi All,
>
> Could some one help comments this change?
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8144993
>
> webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/
>
> This patch aims to remove redundant memory barrier after allocation
> node, on AArch64 it removes redundant dmb when creating object. The
> motivation is dmb instructions after commonly used object allocation,
> for example string and boxing objects is redundant with dmb inserted for
> final field write. In following small case:
>
> String foo(String s)
> {
>   String copy = new String(s);
>   return copy;
> }
>
> There are two dmb instructions in generated code. First one is
> membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common.
> Second one is membar_release, inserted at exit of initializer method as
> final fields write happens. Allocated String doesn't escape in String
> initializer method, membar_release includes membar_storestore semantic.
> So first one can be removed safely.
>
> 0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256]
> 0x0000007f85bbfa90: str xzr, [x0,#16]
> 0x0000007f85bbfa94: dmb ishst // first dmb to remove
> ....
>
> 0x0000007fa01d83c0: ldrsb w10, [x20,#20]
> 0x0000007fa01d83c4: ldr w12, [x20,#16]
> 0x0000007fa01d83c8: ldr x11, [sp,#8]
> 0x0000007fa01d83cc: strb w10, [x11,#20]
> 0x0000007fa01d83d0: str w12, [x11,#16]
> 0x0000007fa01d83d4: dmb ish // second dmb
>
> Patch targets this pattern and remove redundant memory barrier for
> allocation node.
>
> 1. When inserting memory barrier for final field write. If final fields'
> object allocation node is available, invoke
> AllocationNode::compute_MemBar_redundancy(initializer method).
>
> 2. In AllocationNode:
>
> 2.1 Add a new field _is_allocation_MemBar_redundant flag indicate
> if memory barrier after allocation node is redundant.
>
> 2.2 Add method compute_MemBar_redundancy, set
> _is_allocation_MemBar_redundant true if first parameter "this" does
> not escape in initializer method according to BCEscapeAnalyzer.
>
> 3. skip inserting memory barrier in
> PhaseMacroExpand::expand_allocate_common, when AllocationNode's
> _is_allocation_MemBar_redundant flag is true.
>
> Regards
>
> Hui
>

From aleksey.shipilev at oracle.com Tue Dec 15 09:05:44 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 15 Dec 2015 12:05:44 +0300
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <566F7D82.6030806@oracle.com>
References: <566F7D82.6030806@oracle.com>
Message-ID: <566FD7E8.7000105@oracle.com>

Also, I think this is a duplicate of:
https://bugs.openjdk.java.net/browse/JDK-8032481

-Aleksey

On 12/15/2015 05:40 AM, Vladimir Kozlov wrote:
> Very interesting!
>
> Please, add short statement to the comment in /macro.cpp for your case.
>
> Changes looks fine to me. One nit could be to delay bytecode analysis
> until macro expansion - it may reduce compilation time. Bytecode
> analysis of each constructor could be expensive.
>
> Thanks,
> Vladimir
>
> On 12/10/15 6:48 AM, Hui Shi wrote:
>> Hi All,
>>
>> Could some one help comments this change?
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144993
>>
>> webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/
>>
>> This patch aims to remove redundant memory barrier after allocation
>> node, on AArch64 it removes redundant dmb when creating object. The
>> motivation is dmb instructions after commonly used object allocation,
>> for example string and boxing objects is redundant with dmb inserted for
>> final field write. In following small case:
>>
>> String foo(String s)
>> {
>>   String copy = new String(s);
>>   return copy;
>> }
>>
>> There are two dmb instructions in generated code. First one is
>> membar_storestore, inserted in PhaseMacroExpand::expand_allocate_common.
>> Second one is membar_release, inserted at exit of initializer method as
>> final fields write happens. Allocated String doesn't escape in String
>> initializer method, membar_release includes membar_storestore semantic.
>> So first one can be removed safely.
>>
>> 0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256]
>> 0x0000007f85bbfa90: str xzr, [x0,#16]
>> 0x0000007f85bbfa94: dmb ishst // first dmb to remove
>> ....
>>
>> 0x0000007fa01d83c0: ldrsb w10, [x20,#20]
>> 0x0000007fa01d83c4: ldr w12, [x20,#16]
>> 0x0000007fa01d83c8: ldr x11, [sp,#8]
>> 0x0000007fa01d83cc: strb w10, [x11,#20]
>> 0x0000007fa01d83d0: str w12, [x11,#16]
>> 0x0000007fa01d83d4: dmb ish // second dmb
>>
>> Patch targets this pattern and remove redundant memory barrier for
>> allocation node.
>>
>> 1. When inserting memory barrier for final field write. If final fields'
>> object allocation node is available, invoke
>> AllocationNode::compute_MemBar_redundancy(initializer method).
>>
>> 2. In AllocationNode:
>>
>> 2.1 Add a new field _is_allocation_MemBar_redundant flag indicate
>> if memory barrier after allocation node is redundant.
>>
>> 2.2 Add method compute_MemBar_redundancy, set
>> _is_allocation_MemBar_redundant true if first parameter "this" does
>> not escape in initializer method according to BCEscapeAnalyzer.
>>
>> 3. skip inserting memory barrier in
>> PhaseMacroExpand::expand_allocate_common, when AllocationNode's
>> _is_allocation_MemBar_redundant flag is true.
>>
>> Regards
>>
>> Hui
>>

From martin.doerr at sap.com Tue Dec 15 10:27:14 2015
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 15 Dec 2015 10:27:14 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <566FD7E8.7000105@oracle.com>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com>
Message-ID: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>

Hi,

I think this change is good with respect to concurrent java threads.
However, I'm not sure if concurrent GC may have a problem when we optimize out the memory barrier (with or without this change).

Is it guaranteed that no concurrent GC will ever read an object header of such a newly allocated object?
A reference to this object may get written somewhere where GC can find it. If the GC reads the header, it may read stale data.

Best regards,
Martin

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Aleksey Shipilev
Sent: Dienstag, 15. Dezember 2015 10:06
To: Vladimir Kozlov; Hui Shi; hotspot compiler; aarch64-port-dev
Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode

Also, I think this is a duplicate of:
https://bugs.openjdk.java.net/browse/JDK-8032481

-Aleksey

On 12/15/2015 05:40 AM, Vladimir Kozlov wrote:
> Very interesting!
>
> Please, add short statement to the comment in /macro.cpp for your case.
>
> Changes looks fine to me. One nit could be to delay bytecode analysis
> until macro expansion - it may reduce compilation time. Bytecode
> analysis of each constructor could be expensive.
>
> Thanks,
> Vladimir
>
> On 12/10/15 6:48 AM, Hui Shi wrote:
>> Hi All,
>>
>> Could some one help comments this change?
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144993
>>
>> webrev: http://cr.openjdk.java.net/~hshi/8144993/webrev/
>>
>> This patch aims to remove redundant memory barrier after allocation
>> node, on AArch64 it removes redundant dmb when creating object. The
>> motivation is dmb instructions after commonly used object allocation,
>> for example string and boxing objects is redundant with dmb inserted for
>> final field write. In following small case:
>>
>> String foo(String s)
>> {
>>   String copy = new String(s);
>>   return copy;
>> }
>>
>> There are two dmb instructions in generated code.
>> First one is membar_storestore, inserted in
>> PhaseMacroExpand::expand_allocate_common.
>> Second one is membar_release, inserted at exit of initializer method as
>> final fields write happens. Allocated String doesn't escape in String
>> initializer method, membar_release includes membar_storestore semantic.
>> So first one can be removed safely.
>>
>> 0x0000007f85bbfa8c: prfm pstl1keep, [x11,#256]
>> 0x0000007f85bbfa90: str xzr, [x0,#16]
>> 0x0000007f85bbfa94: dmb ishst // first dmb to remove
>> ....
>>
>> 0x0000007fa01d83c0: ldrsb w10, [x20,#20]
>> 0x0000007fa01d83c4: ldr w12, [x20,#16]
>> 0x0000007fa01d83c8: ldr x11, [sp,#8]
>> 0x0000007fa01d83cc: strb w10, [x11,#20]
>> 0x0000007fa01d83d0: str w12, [x11,#16]
>> 0x0000007fa01d83d4: dmb ish // second dmb
>>
>> Patch targets this pattern and remove redundant memory barrier for
>> allocation node.
>>
>> 1. When inserting memory barrier for final field write. If final fields'
>> object allocation node is available, invoke
>> AllocationNode::compute_MemBar_redundancy(initializer method).
>>
>> 2. In AllocationNode:
>>
>> 2.1 Add a new field _is_allocation_MemBar_redundant flag indicate
>> if memory barrier after allocation node is redundant.
>>
>> 2.2 Add method compute_MemBar_redundancy, set
>> _is_allocation_MemBar_redundant true if first parameter "this" does
>> not escape in initializer method according to BCEscapeAnalyzer.
>>
>> 3. skip inserting memory barrier in
>> PhaseMacroExpand::expand_allocate_common, when AllocationNode's
>> _is_allocation_MemBar_redundant flag is true.
>>
>> Regards
>>
>> Hui
>>

From aph at redhat.com Tue Dec 15 10:42:17 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 10:42:17 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap>
Message-ID: <566FEE89.5020300@redhat.com>

On 15/12/15 10:27, Doerr, Martin wrote:

> I think this change is good with respect to concurrent java threads.
> However, I'm not sure if concurrent GC may have a problem when we
> optimize out the memory barrier (with or without this change).
>
> Is it guaranteed that no concurrent GC will ever read an object
> header of such a newly allocated object?
> A reference to this object may get written somewhere where GC can
> find it. If the GC reads the header, it may read stale data.

We know that the reference to the newly-created object does not escape, so it is not reachable from any reference. The only other way a GC might find it is at a safepoint. But even if that happens, a safepoint is a memory barrier. So I think we're OK.

Andrew.

From goetz.lindenmaier at sap.com Tue Dec 15 13:09:58 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 13:09:58 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <566FEE89.5020300@redhat.com>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap>

Hi Andrew,

What if it's assigned to an object that's already completely alive, but does not escape itself?

Best regards,
Goetz.
> -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Andrew Haley > Sent: Dienstag, 15. Dezember 2015 11:42 > To: Doerr, Martin ; Aleksey Shipilev > ; Vladimir Kozlov > ; Hui Shi ; hotspot > compiler ; aarch64-port-dev > ; Mikael Gerdin > (mikael.gerdin at oracle.com) > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > On 15/12/15 10:27, Doerr, Martin wrote: > > > I think this change is good with respect to concurrent java threads. > > However, I'm not sure if concurrent GC may have a problem when we > > optimize out the memory barrier (with or without this change). > > > > Is it guaranteed that no concurrent GC will ever read an object > > header of such a newly allocated object? > > A reference to this object may get written somewhere where GC can > > find it. If the GC reads the header, it may read stale data. > > We know that the reference to the newly-created object does not > escape, so it is not reachable from any reference. The only other way > a GC might find it is at a safepoint. But even if that happens, a > safepoint is a memory barrier. So I think we're OK. > > Andrew. From aph at redhat.com Tue Dec 15 13:46:13 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 13:46:13 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> Message-ID: <567019A5.1000202@redhat.com> Hi, On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote: > What if it's assigned to an object that's already completely alive, > but does not escape itself? 
It's not clear to me exactly what this means. However, if neither object escapes then they are both reachable to GC only via scanning the stack, and this can happen only at safepoints.

Andrew.

From goetz.lindenmaier at sap.com Tue Dec 15 13:53:39 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 13:53:39 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <567019A5.1000202@redhat.com>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap>

Hi Andrew,

here an example:

A a = new A ();   // a does not escape
Safepoint();      // a is known to GC
// Concurrent GC is running.
B b = new B(a);

where
B(A a) {

  StoreStore barrier   // This is removed by the optimization.
  a.x = this;          // Then this is not initialized, but visible to GC
  final field store
  Membar_release
}

Best regards,
Martin and Goetz.

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Dienstag, 15. Dezember 2015 14:46
> To: Lindenmaier, Goetz; Doerr, Martin; Aleksey Shipilev;
> Vladimir Kozlov; Hui Shi; hotspot compiler; aarch64-port-dev;
> Mikael Gerdin (mikael.gerdin at oracle.com)
> Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory
> barrier after AllocationNode
>
> Hi,
>
> On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote:
>
> > What if it's assigned to an object that's already completely alive,
> > but does not escape itself?
>
> It's not clear to me exactly what this means. However, if neither
> object escapes then they are both reachable to GC only via scanning
> the stack, and this can happen only at safepoints.
>
> Andrew.
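
The pseudocode in the example above can be written out as compilable Java to make the shape of the constructor concrete. This is only a sketch: the classes A and B and the field x follow the example, while the final field value, the class EscapeExample, and the method run are assumptions added for illustration. The barriers themselves are inserted by the compiler and do not appear in the source.

```java
// Hypothetical rendering of the A/B example; names other than A, B and x
// are assumptions made for illustration.
final class EscapeExample {
    static int run() {
        A a = new A();        // neither a nor b leaves run()
        B b = new B(a);       // the constructor stores b into a.x
        return (a.x == b) ? b.value : -1;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 42
    }
}

final class A {
    B x;                      // plain reference field written by B's constructor
}

final class B {
    final int value;          // final field, the "final field store" of the example

    B(A a) {
        a.x = this;           // `this` is stored into the argument first
        this.value = 42;      // the final field is written afterwards
    }
}
```

The point of interest is the order inside the constructor: the reference to the new object is stored into another object before the final field is written.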
From aph at redhat.com Tue Dec 15 14:05:34 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:05:34 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> Message-ID: <56701E2E.5000901@redhat.com> Hi, On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote: > here an example: > > A a = new A (); // a does not escape > Safepoint(); // a is known to GC > // Concurrent GC is running. > B b = new B(a); > > where > B(A a) { > > StoreStore barrier // This is removed by the optimization. > a.x = this; // Then this is not initialized, but visible to GC > final field store > Membar_release > } Hmm, interesting. Here we're presented with two objects which escape analysis reveals as not escaping but both are allocated anyway and are included in the OOP map. I'd argue that once you've put an object into an OOP map to be scanned it has escaped, but that may well not be how C2 handles it. For this reachability analysis to be correct, if you put a reference to an object into any object which is reachable as a GC root then that object surely does escape. Andrew. 
From vitalyd at gmail.com Tue Dec 15 14:28:35 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 09:28:35 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <56701E2E.5000901@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> Message-ID: I'm curious why you guys think `a` and/or `b` would be in the oopmap if compiler proves they don't escape. AFAIK, both `a` and `b` will be component-wise scalar replaced. Once that's done, there's a ref from scalar replaced a.x to `b`, but `b` itself is scalar replaced. In either case, I don't see why either of these need to be known to GC at all (which would somewhat defeat the purpose of EA to begin with). On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley wrote: > Hi, > > On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote: > > > here an example: > > > > A a = new A (); // a does not escape > > Safepoint(); // a is known to GC > > // Concurrent GC is running. > > B b = new B(a); > > > > where > > B(A a) { > > > > StoreStore barrier // This is removed by the optimization. > > a.x = this; // Then this is not initialized, > but visible to GC > > final field store > > Membar_release > > } > > Hmm, interesting. Here we're presented with two objects which > escape analysis reveals as not escaping but both are allocated > anyway and are included in the OOP map. > > I'd argue that once you've put an object into an OOP map to be scanned > it has escaped, but that may well not be how C2 handles it. 
For this > reachability analysis to be correct, if you put a reference to an > object into any object which is reachable as a GC root then that object > surely does escape. > > Andrew. > From aph at redhat.com Tue Dec 15 14:33:04 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:33:04 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> Message-ID: <567024A0.40409@redhat.com> On 12/15/2015 02:28 PM, Vitaly Davidovich wrote: > I'm curious why you guys think `a` and/or `b` would be in the oopmap if > compiler proves they don't escape. AFAIK, both `a` and `b` will be > component-wise scalar replaced. Once that's done, there's a ref from > scalar replaced a.x to `b`, but `b` itself is scalar replaced. In either > case, I don't see why either of these need to be known to GC at all (which > would somewhat defeat the purpose of EA to begin with). Are you saying that if escape analysis determined that an object does not escape then you know *for sure* that it will always be scalar- replaced? Andrew. 
From goetz.lindenmaier at sap.com Tue Dec 15 14:37:51 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 15 Dec 2015 14:37:51 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To:
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>

If object arg_escape, locking, barriers etc can be relaxed, but scalar replacement is not possible.
Oop maps are needed, else these don't survive the gc.

Goetz.

From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Dienstag, 15. Dezember 2015 15:29
To: Andrew Haley
Cc: Lindenmaier, Goetz; Doerr, Martin; Aleksey Shipilev; Vladimir Kozlov; Hui Shi; hotspot compiler; aarch64-port-dev; Mikael Gerdin (mikael.gerdin at oracle.com)
Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode

I'm curious why you guys think `a` and/or `b` would be in the oopmap if compiler proves they don't escape. AFAIK, both `a` and `b` will be component-wise scalar replaced. Once that's done, there's a ref from scalar replaced a.x to `b`, but `b` itself is scalar replaced. In either case, I don't see why either of these need to be known to GC at all (which would somewhat defeat the purpose of EA to begin with).

On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley wrote:
Hi,

On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote:

> here an example:
>
> A a = new A (); // a does not escape
> Safepoint(); // a is known to GC
> // Concurrent GC is running.
> B b = new B(a);
>
> where
> B(A a) {
>
> StoreStore barrier // This is removed by the optimization.
> a.x = this; // Then this is not initialized, but visible to GC
> final field store
> Membar_release
> }

Hmm, interesting. Here we're presented with two objects which escape analysis reveals as not escaping but both are allocated anyway and are included in the OOP map.

I'd argue that once you've put an object into an OOP map to be scanned it has escaped, but that may well not be how C2 handles it. For this reachability analysis to be correct, if you put a reference to an object into any object which is reachable as a GC root then that object surely does escape.

Andrew.

From aph at redhat.com Tue Dec 15 14:42:31 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 14:42:31 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap>
Message-ID: <567026D7.6080908@redhat.com>

On 12/15/2015 02:37 PM, Lindenmaier, Goetz wrote:
> If object arg_escape, locking, barriers etc can be relaxed, but scalar replacement is not possible.
> Oop maps are needed, else these don't survive the gc.

I don't know what this means.

Andrew.
From hui.shi at linaro.org Tue Dec 15 14:50:38 2015 From: hui.shi at linaro.org (Hui Shi) Date: Tue, 15 Dec 2015 22:50:38 +0800 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> Message-ID: Thanks All! In Goetz example, suppose the outer method is named foo and object a, b is not escaped in foo. b is not escaped in foo as a is not escaped in foo. But b is escaped in its initializer in BCEscapeAnalysis. In b's initializer method, "this" should be marked escaped as it is assigned to another parameter "assign to a.x". As b is escaped in its initializer, storestore barrier will not be removed in this case, so it's safe. Regards Hui On 15 December 2015 at 21:53, Lindenmaier, Goetz wrote: > Hi Andrew, > > here an example: > > A a = new A (); // a does not escape > Safepoint(); // a is known to GC > // Concurrent GC is running. > B b = new B(a); > > where > B(A a) { > > StoreStore barrier // This is removed by the optimization. > a.x = this; // Then this is not initialized, > but visible to GC > final field store > Membar_release > } > > Best regards, > Martin and Goetz. > > > > -----Original Message----- > > From: Andrew Haley [mailto:aph at redhat.com] > > Sent: Dienstag, 15. 
Dezember 2015 14:46 > > To: Lindenmaier, Goetz ; Doerr, Martin > > ; Aleksey Shipilev ; > > Vladimir Kozlov ; Hui Shi < > hui.shi at linaro.org>; > > hotspot compiler ; aarch64-port- > > dev ; Mikael Gerdin > > (mikael.gerdin at oracle.com) > > > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > > barrier after AllocationNode > > > > Hi, > > > > On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote: > > > > > What if it's assigned to an object that's already completely alive, > > > but does not escape itself? > > > > It's not clear to me exactly what this means. However, if neither > > object escapes then they are both reachable to GC only via scanning > > the stack, and this can happen only at safepoints. > > > > Andrew. > > > > From vitalyd at gmail.com Tue Dec 15 14:51:40 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 09:51:40 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567024A0.40409@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> Message-ID: Hotspot implements only the scalar replacement form of EA. On Tue, Dec 15, 2015 at 9:33 AM, Andrew Haley wrote: > On 12/15/2015 02:28 PM, Vitaly Davidovich wrote: > > I'm curious why you guys think `a` and/or `b` would be in the oopmap if > > compiler proves they don't escape. AFAIK, both `a` and `b` will be > > component-wise scalar replaced. Once that's done, there's a ref from > > scalar replaced a.x to `b`, but `b` itself is scalar replaced. 
In either > > case, I don't see why either of these need to be known to GC at all > (which > > would somewhat defeat the purpose of EA to begin with). > > Are you saying that if escape analysis determined that an object does > not escape then you know *for sure* that it will always be scalar- > replaced? > > Andrew. > > From goetz.lindenmaier at sap.com Tue Dec 15 14:54:23 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 14:54:23 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567026D7.6080908@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> <567026D7.6080908@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap> Hi, It's explained in escape.hpp. The proper name is 'ArgEscape'. typedef enum { UnknownEscape = 0, NoEscape = 1, // An object does not escape method or thread and it is // not passed to call. It could be replaced with scalar. ArgEscape = 2, // An object does not escape method or thread but it is // passed as argument to call or referenced by argument // and it does not escape during call. GlobalEscape = 3 // An object escapes the method or thread. } EscapeState; I.e., an object passed to a callee that is a pure function can not be scalar replaced, as you have to keep the object layout to pass it down. But the callee does not publish the reference to any other thread, so we don't need to execute locks. Also, we can remove barriers. Actually, we see a whole bunch of errors on ppc recently. 
I thought they were all related to CompressedStrings, but not all have been investigated yet. So it could also stem from
"8136596: Remove aarch64: MemBarRelease when final field's allocation is NoEscape or ArgEscape"
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6cc606e29b74

We'll investigate ...

Best regards,
Goetz.

From aph at redhat.com Tue Dec 15 14:55:38 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 14:55:38 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To:
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com>
Message-ID: <567029EA.5030607@redhat.com>

On 12/15/2015 02:51 PM, Vitaly Davidovich wrote:
> Hotspot implements only the scalar replacement form of EA.

Scalar replacement is not a form of escape analysis.
This does not answer my question, which was: > Are you saying that if escape analysis determined that an object does > not escape then you know *for sure* that it will always be scalar- > replaced? Andrew. From aph at redhat.com Tue Dec 15 14:57:59 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 14:57:59 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> <567026D7.6080908@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEF22@DEWDFEMB12A.global.corp.sap> Message-ID: <56702A77.7040407@redhat.com> On 12/15/2015 02:54 PM, Lindenmaier, Goetz wrote: > I.e., an object passed to a callee that is a pure function > can not be scalar replaced, as you have to keep the object > layout to pass it down. > But the callee does not publish the reference to any other > thread, so we don't need to execute locks. Also, we > can remove barriers. So the answer is obvious, surely? We can elide the locks only if NoEscape. Andrew. 
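
As a reading aid (not part of the original thread), the three EscapeState values Goetz quotes from escape.hpp can be illustrated with a minimal Java sketch. The class, the method names and the state named in each comment are assumptions made for illustration: the state C2 really assigns depends on what gets inlined and on the shape of the compiled graph.

```java
// Hypothetical illustration of the EscapeState values from escape.hpp.
// Each comment names the state escape analysis could plausibly report;
// the real result depends on what C2 inlines.
final class EscapeStatesSketch {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // A static field is reachable from other threads.
    static Point published;

    // NoEscape candidate: p is only used locally (assuming the
    // constructor is inlined), so it could be scalar replaced.
    static int noEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // Callee that reads its argument but never stores it anywhere.
    static int sum(Point p) {
        return p.x + p.y;
    }

    // ArgEscape candidate if sum() is not inlined: p is passed to a call,
    // so its object layout must exist, but no other thread can see it.
    static int argEscape(int x, int y) {
        Point p = new Point(x, y);
        return sum(p);
    }

    // GlobalEscape: p is stored into a static field.
    static int globalEscape(int x, int y) {
        Point p = new Point(x, y);
        published = p;
        return p.x + p.y;
    }
}
```

Only the ArgEscape case is in dispute in this thread: the object must be materialized on the heap, so it can appear in an oop map even though no Java thread other than its creator can reach it.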
From vitalyd at gmail.com Tue Dec 15 15:00:42 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 10:00:42 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> Message-ID: Well, scratch what I said; I see Goetz is referring to ArgEscape form, but I was thinking we're talking about the NoEscape version given the example is quite simple. On Tue, Dec 15, 2015 at 9:51 AM, Vitaly Davidovich wrote: > Hotspot implements only the scalar replacement form of EA. > > On Tue, Dec 15, 2015 at 9:33 AM, Andrew Haley wrote: > >> On 12/15/2015 02:28 PM, Vitaly Davidovich wrote: >> > I'm curious why you guys think `a` and/or `b` would be in the oopmap if >> > compiler proves they don't escape. AFAIK, both `a` and `b` will be >> > component-wise scalar replaced. Once that's done, there's a ref from >> > scalar replaced a.x to `b`, but `b` itself is scalar replaced. In >> either >> > case, I don't see why either of these need to be known to GC at all >> (which >> > would somewhat defeat the purpose of EA to begin with). >> >> Are you saying that if escape analysis determined that an object does >> not escape then you know *for sure* that it will always be scalar- >> replaced? >> >> Andrew. 
>> >> > From vitalyd at gmail.com Tue Dec 15 15:02:23 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 10:02:23 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEF5@DEWDFEMB12A.global.corp.sap> Message-ID: Ok, as I just replied to Andrew, I hadn't considered the ArgEscape scenario. Does an oop that's ArgEscape still get allocated on heap then? On Tue, Dec 15, 2015 at 9:37 AM, Lindenmaier, Goetz < goetz.lindenmaier at sap.com> wrote: > If object arg_escape, locking, barriers etc can be relaxed, but scalar > replacement is not possible. > > Oop maps are needed, else these don?t survive the gc. > > > > Goetz. > > > > *From:* Vitaly Davidovich [mailto:vitalyd at gmail.com] > *Sent:* Dienstag, 15. Dezember 2015 15:29 > *To:* Andrew Haley > *Cc:* Lindenmaier, Goetz ; Doerr, Martin < > martin.doerr at sap.com>; Aleksey Shipilev ; > Vladimir Kozlov ; Hui Shi ; > hotspot compiler ; > aarch64-port-dev ; Mikael Gerdin < > mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) < > mikael.gerdin at oracle.com> > *Subject:* Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > > > I'm curious why you guys think `a` and/or `b` would be in the oopmap if > compiler proves they don't escape. AFAIK, both `a` and `b` will be > component-wise scalar replaced. Once that's done, there's a ref from > scalar replaced a.x to `b`, but `b` itself is scalar replaced. 
In either > case, I don't see why either of these need to be known to GC at all (which > would somewhat defeat the purpose of EA to begin with). > > > > On Tue, Dec 15, 2015 at 9:05 AM, Andrew Haley wrote: > > Hi, > > On 12/15/2015 01:53 PM, Lindenmaier, Goetz wrote: > > > here an example: > > > > A a = new A (); // a does not escape > > Safepoint(); // a is known to GC > > // Concurrent GC is running. > > B b = new B(a); > > > > where > > B(A a) { > > > > StoreStore barrier // This is removed by the optimization. > > a.x = this; // Then this is not initialized, > but visible to GC > > final field store > > Membar_release > > } > > Hmm, interesting. Here we're presented with two objects which > escape analysis reveals as not escaping but both are allocated > anyway and are included in the OOP map. > > I'd argue that once you've put an object into an OOP map to be scanned > it has escaped, but that may well not be how C2 handles it. For this > reachability analysis to be correct, if you put a reference to an > object into any object which is reachable as a GC root then that object > surely does escape. > > Andrew. > > > From vitalyd at gmail.com Tue Dec 15 15:11:00 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 15 Dec 2015 10:11:00 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <567029EA.5030607@redhat.com> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> Message-ID: Yes that was my fault; I had forgotten about the ArgEscape analysis result. 
To answer your question somewhat, if an object is NoEscape then it's scalar replaced in the end. I don't think there's any other end result in hotspot (e.g there's no stack allocation). On Tuesday, December 15, 2015, Andrew Haley wrote: > On 12/15/2015 02:51 PM, Vitaly Davidovich wrote: > > Hotspot implements only the scalar replacement form of EA. > > Scalar replacement is not a form of escape analysis. This does > not answer my question, which was: > > > Are you saying that if escape analysis determined that an object does > > not escape then you know *for sure* that it will always be scalar- > > replaced? > > Andrew. > > -- Sent from my phone From goetz.lindenmaier at sap.com Tue Dec 15 15:14:02 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 15:14:02 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEF66@DEWDFEMB12A.global.corp.sap> Hi Hui That depends how BCEscapeAnalysis is implemented. I don?t know this in detail. But in theory, after analyzing a callee, you represent it by some function describing it?s semantics. From this you would derive that both are ArgEscape in the end. Best regards, Goetz. From: Hui Shi [mailto:hui.shi at linaro.org] Sent: Dienstag, 15. Dezember 2015 15:51 To: Lindenmaier, Goetz Cc: Andrew Haley ; Doerr, Martin ; Aleksey Shipilev ; Vladimir Kozlov ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode Thanks All! 
In Goetz example, suppose the outer method is named foo and object a, b is not escaped in foo. b is not escaped in foo as a is not escaped in foo. But b is escaped in its initializer in BCEscapeAnalysis. In b's initializer method, "this" should be marked escaped as it is assigned to another parameter "assign to a.x". As b is escaped in its initializer, storestore barrier will not be removed in this case, so it's safe. Regards Hui On 15 December 2015 at 21:53, Lindenmaier, Goetz > wrote: Hi Andrew, here an example: A a = new A (); // a does not escape Safepoint(); // a is known to GC // Concurrent GC is running. B b = new B(a); where B(A a) { StoreStore barrier // This is removed by the optimization. a.x = this; // Then this is not initialized, but visible to GC final field store Membar_release } Best regards, Martin and Goetz. > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Dienstag, 15. Dezember 2015 14:46 > To: Lindenmaier, Goetz >; Doerr, Martin > >; Aleksey Shipilev >; > Vladimir Kozlov >; Hui Shi >; > hotspot compiler >; aarch64-port- > dev >; Mikael Gerdin > > (mikael.gerdin at oracle.com) > > > Subject: Re: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory > barrier after AllocationNode > > Hi, > > On 12/15/2015 01:09 PM, Lindenmaier, Goetz wrote: > > > What if it's assigned to an object that's already completely alive, > > but does not escape itself? > > It's not clear to me exactly what this means. However, if neither > object escapes then they are both reachable to GC only via scanning > the stack, and this can happen only at safepoints. > > Andrew. 
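
The pseudocode example from Martin and Goetz can be written as a small Java program. This is only a sketch: the class names A and B and the outer method name foo follow the thread, but the final field `f`, its value and the comments marking where the barriers would conceptually sit are assumptions. Java source cannot show the barriers that C2 emits.

```java
// Sketch of the A/B example from the thread. Comments mark where the
// barriers under discussion would conceptually sit in compiled code.
final class BarrierExampleSketch {
    static final class A {
        B x; // set by B's constructor
    }

    static final class B {
        final int f; // hypothetical final field ("final field store")

        B(A a) {
            // Allocation has written the header and zeroed the fields.
            // [StoreStore barrier would sit here]
            a.x = this;  // 'this' is now reachable through a
            this.f = 42; // final field store
            // [MemBarRelease would sit here]
        }
    }

    static int foo() {
        A a = new A(); // a does not escape foo
        // A safepoint may occur here; a is then described by an oop map.
        B b = new B(a);
        return (a.x == b) ? b.f : -1;
    }
}
```

The question in the thread is whether anything that can reach `a` at the safepoint (for example a concurrent GC thread) could follow `a.x` to `b` before b's header stores are ordered by a barrier.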
From goetz.lindenmaier at sap.com Tue Dec 15 16:01:40 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 15 Dec 2015 16:01:40 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> Yes, there is no stack allocation. But locks are removed, see escape.cpp:1844, which is executed under condition not_global_escape(). As well look at callnode:1770. Also, does_not_escape_thread() used here checks for <= ArgEscape. Further, if the object is NoEscape it might not be scalar replaced. If I remember correctly, there are various conditions, e.g., too big, allocated in loop. And, the constructor could be inlined (or does this happen after expand_allocate_common()?) Best regards, Goetz. From: Vitaly Davidovich [mailto:vitalyd at gmail.com] Sent: Dienstag, 15. Dezember 2015 16:11 To: Andrew Haley Cc: Lindenmaier, Goetz ; Doerr, Martin ; Aleksey Shipilev ; Vladimir Kozlov ; Hui Shi ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode Yes that was my fault; I had forgotten about the ArgEscape analysis result. To answer your question somewhat, if an object is NoEscape then it's scalar replaced in the end. I don't think there's any other end result in hotspot (e.g there's no stack allocation). 
On Tuesday, December 15, 2015, Andrew Haley > wrote: On 12/15/2015 02:51 PM, Vitaly Davidovich wrote: > Hotspot implements only the scalar replacement form of EA. Scalar replacement is not a form of escape analysis. This does not answer my question, which was: > Are you saying that if escape analysis determined that an object does > not escape then you know *for sure* that it will always be scalar- > replaced? Andrew. -- Sent from my phone From aph at redhat.com Tue Dec 15 16:15:08 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 15 Dec 2015 16:15:08 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> Message-ID: <56703C8C.4000801@redhat.com> On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote: > Further, if the object is NoEscape it might not be scalar > replaced. If I remember correctly, there are various conditions, > e.g., too big, allocated in loop. Well, that's the killer. The definition of "escape" we need to use here is the really, truly, honest-to-goodness one: that this object never becomes visible to any other thread by any means. Unless that is so, all bets are off. In this case, what is intended is "appears in an OOP map". Andrew. 
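
Goetz's remark that a NoEscape object is not necessarily scalar replaced can be made concrete with a sketch (not part of the original thread). Both allocations below stay inside a single method. Whether C2 actually scalar-replaces either of them is an assumption that depends on the JVM version and flags, and all names are hypothetical.

```java
// Two allocations that never leave their method but may still end up
// on the heap, following the conditions Goetz mentions from memory
// ("too big", "allocated in loop").
final class NotAlwaysScalarReplacedSketch {
    static final class Node {
        final int value;
        Node(int value) { this.value = value; }
    }

    // The reference 'cur' is merged across loop iterations. It never
    // leaves this method, yet such a merge can keep the allocation
    // from being scalar replaced.
    static int loopCarried(int n) {
        Node cur = new Node(0);
        for (int i = 1; i <= n; i++) {
            cur = new Node(cur.value + i);
        }
        return cur.value;
    }

    // A local array that never leaves the method, but may be too large
    // to be broken up into scalars.
    static int largeLocal() {
        int[] buf = new int[1024];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = i;
        }
        int sum = 0;
        for (int v : buf) {
            sum += v;
        }
        return sum;
    }
}
```

If such an object is heap allocated, it is live across safepoints and recorded in oop maps, which is exactly the situation Andrew's stricter definition of "escape" is concerned with.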
From aph at redhat.com Tue Dec 15 18:00:57 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Dec 2015 18:00:57 +0000
Subject: [aarch64-port-dev ] RFR: 8144498: aarch64: large code cache generates SEGV
In-Reply-To: <1449588750.5880.28.camel@mylittlepony.linaroharston>
References: <1449223186.15424.42.camel@mint> <5661BBCB.5000307@redhat.com> <5661CF8B.6040405@redhat.com> <1449490934.12382.49.camel@mint> <566595B5.9060400@redhat.com> <1449588750.5880.28.camel@mylittlepony.linaroharston>
Message-ID: <56705559.8020900@redhat.com>

On 12/08/2015 03:32 PM, Edward Nevill wrote:
> OK. Thanks, I have satisfied myself that this is correct.
>
> New webrev @ http://cr.openjdk.java.net/~enevill/8144498/webrev.2

By the powers newly vested in me I hereby approve this patch.

Andrew.

From hui.shi at linaro.org Wed Dec 16 12:27:00 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Wed, 16 Dec 2015 20:27:00 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <56703C8C.4000801@redhat.com>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com>
Message-ID:

Thanks Andrew, Goetz and all!

The major concern is whether removing the storestore barrier could cause other threads to read stale data for a newly allocated object. Other threads include Java threads and concurrent GC threads. It should be safe according to the following analysis.

1. If the BCEA result says "this" (b) escapes in its initializer, the change will not optimize away the storestore barrier.
2. If the BCEA result says "this" (b) does not escape in its initializer, it's safe to remove the storestore.
   2.1 If there is a safepoint between storestore and release, b is visible to GC in the initializer, but at the safepoint there should be a memory barrier.
   2.2 If there is no safepoint between storestore and release, b will be visible to other threads only after the release memory barrier.

Case #1

A a = new A();
safepoint            // a can be reached from GC
new B(a)
  allocation -------
    b.klass =...
    b.markword =...
    b.f1 = 0
    ..
    b.fn = 0
  storestore --------
  init start
    ....
    a.x = this;      // b might be visible to other threads here
    ....
  release --------
  init end

The BCEA result indicates that "this" (b) is not local and not arg_stack, so "b" will be treated as escaped in its initializer, and the change will not optimize away the storestore barrier.

[EA] estimated escape information for B::
    non-escaping args: {}
    stack-allocatable args: {1}
    return non-local value
    modified args: 0x6 0x6
    flags:

b="this" is not local and not arg_stack.
a is arg_stack, which means it is passed in and not assigned to another object in the initializer.

Case #2.1

  allocation -------
    b.klass =...
    b.markword =...
    b.f1 = 0
    ..
    b.fn = 0
  storestore --------
  init start
    ....
    safepoint        // "this" is in the oop map and might be visible to the GC thread here
    ....
  release --------
  init end

Case #2.2

  allocation -------
    b.klass =...
    b.markword =...
    b.f1 = 0
    ..
    b.fn = 0
  storestore --------
  init start
    ....
  release --------
  init end

Regards
Hui

On 16 December 2015 at 00:15, Andrew Haley wrote:

> On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote:
>
> > Further, if the object is NoEscape it might not be scalar
> > replaced. If I remember correctly, there are various conditions,
> > e.g., too big, allocated in loop.
>
> Well, that's the killer. The definition of "escape" we need to use
> here is the really, truly, honest-to-goodness one: that this object
> never becomes visible to any other thread by any means. Unless that
> is so, all bets are off.
In this case, what is intended is "appears
> in an OOP map".
>
> Andrew.
>

From martin.doerr at sap.com Thu Dec 17 13:54:20 2015
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 17 Dec 2015 13:54:20 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To:
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com>
Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>

Hi Hui Shi,

my concern was not limited to 8144993, but also with respect to 8136596 which is already pushed.

I have written the following small Java example:

public class TestAllocMemBar{
  static final int loop_cnt = 20000;

  void dont_inline_me() {}

  public class A{
    public B b;
  }

  public class B{
    public B(A a) { a.b = B.this; }
  }

  public void TestMethod() {
    A a = new A();
    dont_inline_me();
    //System.gc();
    B b = new B(a);
  }

  public static void main(String args[]){
    TestAllocMemBar xyz = new TestAllocMemBar();
    long duration = System.nanoTime();
    for (int x = 0; x < loop_cnt; x++) {
      xyz.TestMethod();
    }
    duration = System.nanoTime() - duration;
    System.out.println("duration: " + duration/1000/loop_cnt + " us per iteration");
  }
}

Execution shows (tested on PPC64):

openjdk_9/bin/java -XX:+UseConcMarkSweepGC -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:CompileCommand="exclude TestAllocMemBar::dont_inline_me" -XX:+PrintInlining -XX:+PrintEscapeAnalysis -XX:-EliminateAllocations TestAllocMemBar
...
======== Connection graph for TestAllocMemBar::TestMethod JavaObject NoEscape(NoEscape) [ 59F 179F [ 37 42 ]] 25 Allocate === 5 6 7 8 1 ( 23 21 22 1 10 1 1 ) [[ 26 27 28 35 36 37 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:0 !jvms: TestAllocMemBar::TestMethod @ bci:0 LocalVar [ 25P [ 42 59b ]] 37 Proj === 25 [[ 38 42 59 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:0 LocalVar [ 37 25P [ 179b ]] 42 CheckCastPP === 39 37 [[ 179 183 179 119 98 93 ]] #TestAllocMemBar$A:NotNull:exact * Oop:TestAllocMemBar$A:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:0 JavaObject NoEscape(NoEscape) NSR [ 153F [ 131 136 180 179 ]] 119 Allocate === 105 100 101 8 1 ( 54 117 22 1 10 42 1 ) [[ 120 121 122 129 130 131 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) TestAllocMemBar::TestMethod @ bci:13 !jvms: TestAllocMemBar::TestMethod @ bci:13 LocalVar [ 119P [ 136 153b ]] 131 Proj === 119 [[ 132 136 153 ]] #5 !jvms: TestAllocMemBar::TestMethod @ bci:13 LocalVar [ 131 119P [ 180 ]] 136 CheckCastPP === 133 131 [[ 180 193 ]] #TestAllocMemBar$B:NotNull:exact * Oop:TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar::TestMethod @ bci:13 LocalVar [ 136 119P [ 179 ]] 180 EncodeP === _ 136 [[ 181 ]] #narrowoop: TestAllocMemBar$B:NotNull:exact * !jvms: TestAllocMemBar$B:: @ bci:11 TestAllocMemBar::TestMethod @ bci:19 @ 5 TestAllocMemBar$A:: (10 bytes) inline (hot) @ 6 java.lang.Object:: (1 bytes) inline (hot) @ 10 TestAllocMemBar::dont_inline_me (1 bytes) not compilable (disabled) @ 19 TestAllocMemBar$B:: (15 bytes) inline (hot) @ 6 java.lang.Object:: (1 bytes) inline (hot) @ 6 java.lang.Object:: (1 bytes) inline (hot) @ 6 java.lang.Object:: (1 bytes) inline (hot) duration: 3 us per iteration So you can see that both Allocations have the state NoEscape, but there?s a safepoint (the non-inlined call) between them. Concurrent GC could access the obj header and read stale data (and possibly crash). 
OptoAssembly shows that the MemBar was optimized out (probably due to 8136596).

However, we may have luck. Maybe no concurrent GC accesses the header of newly created objects. But I don't know if this is true, which is the reason why I posted this question originally. Keep in mind that objects can get allocated in old gen.

I could still imagine that these two optimizations may be dangerous.

Best regards,
Martin

From aph at redhat.com Thu Dec 17 13:59:47 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 17 Dec 2015 13:59:47 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap>
Message-ID: <5672BFD3.7040307@redhat.com>

On 12/17/2015 01:54 PM, Doerr, Martin wrote:

> So you can see that both Allocations have the state NoEscape, but
> there's a safepoint (the non-inlined call) between them. Concurrent
> GC could access the obj header and read stale data (and possibly
> crash). OptoAssembly shows that the MemBar was optimized out
> (probably due to 8136596).
>
> However, we may have luck. Maybe no concurrent GC accesses the
> header of newly created objects. But I don't know if this is true
> which is the reason why I posted this question originally. Keep in
> mind that objects can get allocated in old gen.

So, they are both NoEscape. So do the objects actually get allocated?
Or are they scalar-replaced?

Andrew.
From hui.shi at linaro.org Thu Dec 17 15:28:35 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Thu, 17 Dec 2015 23:28:35 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <5672BFD3.7040307@redhat.com>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com>
Message-ID:

Thanks Martin!

Could we limit the discussion in this thread to 8144993? As stated in my earlier mail, it looks safe in all 3 cases for references from both GC threads and other Java threads.

8136596 extends the original optimization from NoEscape to both NoEscape and ArgEscape. As said in your new example, both allocations are NoEscape, so it's not directly related to 8136596. How about starting a new thread to discuss whether there is a possible danger in the original storestore barrier optimization?

Regards
Hui

On 17 December 2015 at 21:59, Andrew Haley wrote:

> On 12/17/2015 01:54 PM, Doerr, Martin wrote:
>
> > So you can see that both Allocations have the state NoEscape, but
> > there's a safepoint (the non-inlined call) between them. Concurrent
> > GC could access the obj header and read stale data (and possibly
> > crash). OptoAssembly shows that the MemBar was optimized out
> > (probably due to 8136596).
> >
> > However, we may have luck. Maybe no concurrent GC accesses the
> > header of newly created objects.
> > But I don't know if this is true
> > which is the reason why I posted this question originally. Keep in
> > mind that objects can get allocated in old gen.
>
> So, they are both NoEscape. So do the objects actually get allocated?
> Or are they scalar-replaced?
>
> Andrew.
>

From aph at redhat.com Thu Dec 17 15:34:54 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 17 Dec 2015 15:34:54 +0000
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To:
References: <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com>
Message-ID: <5672D61E.3020805@redhat.com>

On 12/17/2015 03:28 PM, Hui Shi wrote:

> Could discussion limit to 8144993 in this thread. Stated in early mail, it
> looks safe in 3 cases for references from both GC thread or other java
> thread.
>
> 8136596 enhances original optimization from noEcape to both noescape and
> argescape. As said in your new example, both allocations are noescape, so
> it's not directly related with 8136596. How about starting a new thread
> discussing if there is possible danger in original storestore barrier
> optimization?

I say we should not do that. Martin's concern is real, and you have shown no reason to suppose that removing the memory barriers will not result in a concurrent GC seeing stale object headers. As it stands, unless someone can come up with something convincing, we're going to have to restore those memory barriers.
8144993 should not be committed until this issue is resolved. Andrew. From aph at redhat.com Thu Dec 17 15:43:38 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 17 Dec 2015 15:43:38 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5672D61E.3020805@redhat.com> References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> Message-ID: <5672D82A.309@redhat.com> The potential problem only arises if "this" is published unsafely and the object to which it is published doesn't escape. Can't we detect unsafe publication? It ought to be easier than escape analysis: it's a matter of detecting that "this" escapes from the constructor. Andrew. From edward.nevill at gmail.com Thu Dec 17 16:07:34 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 17 Dec 2015 16:07:34 +0000 Subject: [aarch64-port-dev ] RFR: JDK 7: Add Support for Large Code Cache on aarch64 Message-ID: <1450368454.21162.22.camel@mylittlepony.linaroharston> Hi, The following webrev adds support for large code caches to JDK 7 for aarch64 http://cr.openjdk.java.net/~enevill/jdk7_largecode/webrev/ Tested with jtreg hotspot/langtools. 
hotspot (original): Test results: passed: 297; failed: 12; error: 2 hotspot (patched): Test results: passed: 297; failed: 12; error: 2 hotspot (256m cache): Test results: passed: 298; failed: 11; error: 2 langtools (original): Test results: passed: 1,973; failed: 1 langtools (patched): Test results: passed: 1,973; failed: 1 langtools (256m cache): Test results: passed: 1,973; failed: 1 Only aarch64 files are touched in this patch. OK to push? Ed. From martin.doerr at sap.com Thu Dec 17 17:58:22 2015 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 17 Dec 2015 17:58:22 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <5672D82A.309@redhat.com> References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> Hi Andrew, thanks for your emails. Many memory barriers are only there for concurrent java threads and are not relevant for GC. They are opportunities for EscapeAnalysis-based optimizations. The MemBarStoreStore after the Allocation actually has this purpose plus the additional purpose to satisfy GC requirements. EscapeAnalysis was not designed to analyze "escape to concurrent GC". I guess it is difficult to analyze this in general. So maybe it would be better to change the condition for the MemBarStoreStore barrier insertion to something like "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." 
with the first function containing the knowledge about all GCs. You also had asked if the objects in my example were scalar replaced. By default, they do get scalar-replaced, but I had prevented this by -XX:-EliminateAllocations which does not influence the escape state and the membar optimizations. Best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Donnerstag, 17. Dezember 2015 16:44 To: Hui Shi Cc: Doerr, Martin ; Lindenmaier, Goetz ; Vitaly Davidovich ; Aleksey Shipilev ; Vladimir Kozlov ; hotspot compiler ; aarch64-port-dev ; Mikael Gerdin (mikael.gerdin at oracle.com) Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode The potential problem only arises if "this" is published unsafely and the object to which it is published doesn't escape. Can't we detect unsafe publication? It ought to be easier than escape analysis: it's a matter of detecting that "this" escapes from the constructor. Andrew. From vitalyd at gmail.com Thu Dec 17 18:10:44 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 17 Dec 2015 13:10:44 -0500 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> Message-ID: > > You also had asked if the objects in my example 
were scalar replaced. By > default, they do get scalar-replaced, but I had prevented this by > -XX:-EliminateAllocations which does not influence the escape state and the > membar optimizations. I'd say that's a big problem, no? The membar elimination is only safe if the allocation is actually removed. If the analysis says it's NoEscape but compiler still allocates it for whatever reason (Goetz mentioned a couple earlier in this thread), then it seems insufficient to rely on just the analysis result. On Thu, Dec 17, 2015 at 12:58 PM, Doerr, Martin wrote: > Hi Andrew, > > thanks for your emails. > > Many memory barriers are only there for concurrent java threads and are > not relevant for GC. They are opportunities for EscapeAnalysis-based > optimizations. > > The MemBarStoreStore after the Allocation actually has this purpose plus > the additional purpose to satisfy GC requirements. EscapeAnalysis was not > designed to analyze "escape to concurrent GC". I guess it is difficult to > analyze this in general. > > So maybe it would be better to change the condition for the > MemBarStoreStore barrier insertion to something like > "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." > with the first function containing the knowledge about all GCs. > > You also had asked if the objects in my example were scalar replaced. By > default, they do get scalar-replaced, but I had prevented this by > -XX:-EliminateAllocations which does not influence the escape state and the > membar optimizations. > > Best regards, > Martin > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Donnerstag, 17. 
Dezember 2015 16:44 > To: Hui Shi > Cc: Doerr, Martin ; Lindenmaier, Goetz < > goetz.lindenmaier at sap.com>; Vitaly Davidovich ; > Aleksey Shipilev ; Vladimir Kozlov < > vladimir.kozlov at oracle.com>; hotspot compiler < > hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev < > aarch64-port-dev at openjdk.java.net>; Mikael Gerdin < > mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) < > mikael.gerdin at oracle.com> > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > AllocationNode > > The potential problem only arises if "this" is published unsafely and > the object to which it is published doesn't escape. > > Can't we detect unsafe publication? It ought to be easier than escape > analysis: it's a matter of detecting that "this" escapes from the > constructor. > > Andrew. > From goetz.lindenmaier at sap.com Fri Dec 18 10:43:44 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 18 Dec 2015 10:43:44 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap> Hi Hui, > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > AllocationNode > > Thanks Andrew, Goetz and all! > > Major concern is will removing storestore barrier cause other threads read > stale data for newly allocated object. 
> Other threads include java thread or
> concurrent GC thread. It should be safe with following analysis.
>
> 1. If BCEA result "this"(b) escapes in its initializer, change will not optimize
> storestore barrier.
> 2. If BCEA result "this"(b) does not escape in its initializer, it's safe to remove
> storestore.
> 2.1 If there is a safe point between storestore and release, b is visible to GC
> in initializer, but at safe point, it should have a memory barrier.
> 2.2 If there is no safe point between storestore and release. b will be visible
> to other thread after release memory barrier.

I think this describes the situation correctly wrt. my counterexample. I'm not sure whether there are other possibilities.

Is the test for 1.) already implemented? How do you do this? Is inlining of the constructor delayed when you do your optimization, so you can find the call to it? Or do you find the BCEA information via the class that is reachable over the type information? How do you know then which constructor was called if there are several?

Best regards,
Goetz.

>
> Case #1
> A a = new A();
> safepoint // a can be reached from GC
> new B(a)
>
> allocation
> -------
> b.klass =...
> b.markword =...
> b.f1 = 0
> ..
> b.fn = 0
> storestore
> -------- init start
> ....
> a.x = this; // b might visible to other threads here
> ....
> release
> -------- init end
>
> BCEA result indicate "this"(b) is not local and not arg_stack. So "b" will be
> treated as escaped in its initialzer, so change will not optimize storestore
> barrier.
> [EA] estimated escape information for B::
> non-escaping args: {}
> stack-allocatable args: {1}
> return non-local value
> modified args: 0x6 0x6
> flags:
> b="this" is not local and not arg_stack
> a is arg_stack means it is passed in and not assigned to other object in
> initializer.
>
> Case #2.1
> allocation
> -------
> b.klass =...
> b.markword =...
> b.f1 = 0
> ..
> b.fn = 0
> storestore
> -------- init start
> ....
> safepoint // "this" is in oop map and might visible to GC thread here > .... > release > -------- init end > > Case #2.2 > allocation > ------- > b.klass =... > b.markword =... > b.f1 = 0 > .. > b.fn = 0 > storestore > -------- init start > .... > release > -------- init end > > Regards > Hui > > On 16 December 2015 at 00:15, Andrew Haley > wrote: > > > On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote: > > > Further, if the object is NoEscape it might not be scalar > > replaced. If I remember correctly, there are various conditions, > > e.g., too big, allocated in loop. > > Well, that's the killer. The definition of "escape" we need to use > here is the really, truly, honest-to-goodness one: that this object > never becomes visible to any other thread by any means. Unless > that > is so, all bets are off. In this case, what is intended is "appears > in an OOP map". > > Andrew. > > From goetz.lindenmaier at sap.com Fri Dec 18 11:09:41 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 18 Dec 2015 11:09:41 +0000 Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode In-Reply-To: References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41EE3661@DEWDFEMB12A.global.corp.sap> Hi > > You also had asked if the objects in my example were scalar replaced. 
> > By default, they do get scalar-replaced, but I had prevented this by -XX:- > > EliminateAllocations which does not influence the escape state and the > > membar optimizations. > > I'd say that's a big problem, no? The membar elimination is only safe if the > allocation is actually removed. If the analysis says it's NoEscape but compiler > still allocates it for whatever reason (Goetz mentioned a couple earlier in this > thread), then it seems insufficient to rely on just the analysis result. Well, if it's NoEscape it's safe to remove the barriers wrt. to Java semantics, no matter what other optimizations (here: scalar replacement) do. But here we look at the importance of the barrier to the runtime system, which is VM implementation specific. In particular, the new optimization addresses also objects that escape, as long as they don't escape before the barrier at the end of the constructor. Best regards, Goetz. > On Thu, Dec 17, 2015 at 12:58 PM, Doerr, Martin > wrote: > > > Hi Andrew, > > thanks for your emails. > > Many memory barriers are only there for concurrent java threads > and are not relevant for GC. They are opportunities for EscapeAnalysis-based > optimizations. > > The MemBarStoreStore after the Allocation actually has this purpose > plus the additional purpose to satisfy GC requirements. EscapeAnalysis was > not designed to analyze "escape to concurrent GC". I guess it is difficult to > analyze this in general. > > So maybe it would be better to change the condition for the > MemBarStoreStore barrier insertion to something like > "gc_requires_initialized_new_obj_headers() || !alloc- > >does_not_escape..." with the first function containing the knowledge > about all GCs. > > You also had asked if the objects in my example were scalar replaced. > By default, they do get scalar-replaced, but I had prevented this by -XX:- > EliminateAllocations which does not influence the escape state and the > membar optimizations. 
>
> Best regards,
> Martin
>
> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Thursday, 17 December 2015 16:44
> To: Hui Shi
> Cc: Doerr, Martin; Lindenmaier, Goetz; Vitaly Davidovich; Aleksey Shipilev; Vladimir Kozlov; hotspot compiler; aarch64-port-dev; Mikael Gerdin (mikael.gerdin at oracle.com)
> Subject: Re: RFR: 8144993: Elide redundant memory barrier after AllocationNode
>
> The potential problem only arises if "this" is published unsafely and
> the object to which it is published doesn't escape.
>
> Can't we detect unsafe publication? It ought to be easier than escape
> analysis: it's a matter of detecting that "this" escapes from the
> constructor.
>
> Andrew.
>

From hui.shi at linaro.org Fri Dec 18 12:45:43 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Fri, 18 Dec 2015 20:45:43 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap>
References: <566F7D82.6030806@oracle.com> <566FD7E8.7000105@oracle.com> <7C9B87B351A4BA4AA9EC95BB4181165672288D60@DEWDFEMB19C.global.corp.sap> <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <4295855A5C1DE049A61835A1887419CC41EE35EF@DEWDFEMB12A.global.corp.sap>
Message-ID:

Thanks Goetz!

Case 1) can be handled with the current patch. BCEA information is obtained from the owning method when inserting the release memory barrier for a final field write. A final field is initialized in the constructor method of its owning allocation node.
The following code is in Parse::do_exits; alloc->compute_MemBar_redundancy gets the constructor method's BCEA information and checks whether the allocation escapes in the constructor method.

   if (method()->is_initializer() &&
       (wrote_final() ||
        PPC64_ONLY(wrote_volatile() ||)
        (AlwaysSafeConstructors && wrote_fields()))) {
     _exits.insert_mem_bar(Op_MemBarRelease, alloc_with_final());
+
+    // If Memory barrier is created for final fields write
+    // and allocation node does not escape the initialize method,
+    // then barrier introduced by allocation node can be removed.
+    if (DoEscapeAnalysis && alloc_with_final()) {
+      AllocateNode *alloc = AllocateNode::Ideal_allocation(alloc_with_final(), &_gvn);
+      alloc->compute_MemBar_redundancy(method());
+    }

Regards
Hui

On 18 December 2015 at 18:43, Lindenmaier, Goetz wrote:

> Hi Hui,
>
> > Subject: Re: RFR: 8144993: Elide redundant memory barrier after
> > AllocationNode
> >
> > Thanks Andrew, Goetz and all!
> >
> > Major concern is will removing storestore barrier cause other threads read
> > stale data for newly allocated object. Other threads include java thread or
> > concurrent GC thread. It should be safe with following analysis.
> >
> > 1. If BCEA result "this"(b) escapes in its initializer, change will not optimize
> > storestore barrier.
> > 2. If BCEA result "this"(b) does not escape in its initializer, it's safe to remove
> > storestore.
> > 2.1 If there is a safe point between storestore and release, b is visible to GC
> > in initializer, but at safe point, it should have a memory barrier.
> > 2.2 If there is no safe point between storestore and release. b will be visible
> > to other thread after release memory barrier.
> I think this describes the situation correctly wrt. to my counterexample. I'm
> not sure whether there are other possibilities.
>
> Is the test for 1.) already implemented?
> How do you do this? Is inlining of the constructor delayed when you do
> your optimization, so you can find the call to it?
Or do you find the > BCEA information > via the class that is reachable over the type information? How do you > known then > which constructor was called if there are several ones? > > Best regards, > Goetz. > > > > > > > > > Case #1 > > A a = new A(); > > safepoint // a can be reached from GC > > new B(a) > > > > allocation > > ------- > > b.klass =... > > b.markword =... > > b.f1 = 0 > > .. > > b.fn = 0 > > storestore > > -------- init start > > .... > > a.x = this; // b might visible to other threads here > > .... > > release > > -------- init end > > > > BCEA result indicate "this"(b) is not local and not arg_stack. So "b" > will be > > treated as escaped in its initialzer, so change will not optimize > storestore > > barrier. > > [EA] estimated escape information for B:: > > non-escaping args: {} > > stack-allocatable args: {1} > > return non-local value > > modified args: 0x6 0x6 > > flags: > > b="this" is not local and not arg_stack > > a is arg_stack means it is passed in and not assigned to other > object in > > initializer. > > > > Case #2.1 > > allocation > > ------- > > b.klass =... > > b.markword =... > > b.f1 = 0 > > .. > > b.fn = 0 > > storestore > > -------- init start > > .... > > safepoint // "this" is in oop map and might visible to GC thread here > > .... > > release > > -------- init end > > > > Case #2.2 > > allocation > > ------- > > b.klass =... > > b.markword =... > > b.f1 = 0 > > .. > > b.fn = 0 > > storestore > > -------- init start > > .... > > release > > -------- init end > > > > Regards > > Hui > > > > On 16 December 2015 at 00:15, Andrew Haley > > wrote: > > > > > > On 12/15/2015 04:01 PM, Lindenmaier, Goetz wrote: > > > > > Further, if the object is NoEscape it might not be scalar > > > replaced. If I remember correctly, there are various conditions, > > > e.g., too big, allocated in loop. > > > > Well, that's the killer. 
> > The definition of "escape" we need to use
> > here is the really, truly, honest-to-goodness one: that this object
> > never becomes visible to any other thread by any means. Unless that
> > is so, all bets are off. In this case, what is intended is "appears
> > in an OOP map".
> >
> > Andrew.
> >

From hui.shi at linaro.org Fri Dec 18 13:10:06 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Fri, 18 Dec 2015 21:10:06 +0800
Subject: [aarch64-port-dev ] RFR: 8144993: Elide redundant memory barrier after AllocationNode
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>
References: <566FEE89.5020300@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEE45@DEWDFEMB12A.global.corp.sap> <567019A5.1000202@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEEB4@DEWDFEMB12A.global.corp.sap> <56701E2E.5000901@redhat.com> <567024A0.40409@redhat.com> <567029EA.5030607@redhat.com> <4295855A5C1DE049A61835A1887419CC41EDEFEA@DEWDFEMB12A.global.corp.sap> <56703C8C.4000801@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228945F@DEWDFEMB19C.global.corp.sap> <5672BFD3.7040307@redhat.com> <5672D61E.3020805@redhat.com> <5672D82A.309@redhat.com> <7C9B87B351A4BA4AA9EC95BB418116567228950C@DEWDFEMB19C.global.corp.sap>
Message-ID:

Thanks Andrew and Martin!

Agreed, it's better to fix the original storestore barrier optimization with escape information. When entering PhaseMacroExpand::expand_allocate_common, the object must be allocated on the heap and can't be scalar replaced?

This issue can't be solved by detecting unsafe publication in the constructor only. In the following example, b is published outside the constructor and the storestore barrier still can't be removed.

public void TestMethod() {
    A a = new A();
    dont_inline_me(); //System.gc();
    B b = new B(); // empty constructor
    // no safepoint
    a.b = b;
}

Martin's proposed fix looks reasonable: disable the original storestore barrier optimization if GC threads might reference the allocated object.
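The TestMethod example can be filled out into a self-contained class roughly as follows. This is a sketch under assumptions rather than the test that was actually run: the wrapper class StoreStoreExample, the calls counter, and the boolean return value are hypothetical additions so that the snippet compiles and has something observable, while A, B, and dont_inline_me follow the names in the example.

```java
// Hypothetical wrapper around the TestMethod example from the thread.
final class StoreStoreExample {
    static final class B { }        // empty constructor, as in the example
    static final class A { B b; }

    static int calls;               // gives the call an observable side effect

    // Stands in for the non-inlined call; the name comes from the example.
    static void dont_inline_me() {
        calls++;
    }

    // Neither a nor b is returned or stored anywhere reachable from outside,
    // matching the NoEscape state discussed for both allocations.
    static boolean testMethod() {
        A a = new A();
        dont_inline_me();           // a is live across this call
        B b = new B();
        a.b = b;                    // b is published via a, outside B's constructor
        return a.b == b;
    }
}
```

To look at the barrier in compiled code, the allocations would have to survive as in Martin's run with -XX:-EliminateAllocations; keeping the call out of line with something like -XX:CompileCommand=dontinline,StoreStoreExample::dont_inline_me is an assumption made here, not a setting given in the thread.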
Regards Hui On 18 December 2015 at 01:58, Doerr, Martin wrote: > Hi Andrew, > > thanks for your emails. > > Many memory barriers are only there for concurrent java threads and are > not relevant for GC. They are opportunities for EscapeAnalysis-based > optimizations. > > The MemBarStoreStore after the Allocation actually has this purpose plus > the additional purpose to satisfy GC requirements. EscapeAnalysis was not > designed to analyze "escape to concurrent GC". I guess it is difficult to > analyze this in general. > > So maybe it would be better to change the condition for the > MemBarStoreStore barrier insertion to something like > "gc_requires_initialized_new_obj_headers() || !alloc->does_not_escape..." > with the first function containing the knowledge about all GCs. > > You also had asked if the objects in my example were scalar replaced. By > default, they do get scalar-replaced, but I had prevented this by > -XX:-EliminateAllocations which does not influence the escape state and the > membar optimizations. > > Best regards, > Martin > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Donnerstag, 17. Dezember 2015 16:44 > To: Hui Shi > Cc: Doerr, Martin ; Lindenmaier, Goetz < > goetz.lindenmaier at sap.com>; Vitaly Davidovich ; > Aleksey Shipilev ; Vladimir Kozlov < > vladimir.kozlov at oracle.com>; hotspot compiler < > hotspot-compiler-dev at openjdk.java.net>; aarch64-port-dev < > aarch64-port-dev at openjdk.java.net>; Mikael Gerdin < > mikael.gerdin at oracle.com> (mikael.gerdin at oracle.com) < > mikael.gerdin at oracle.com> > Subject: Re: RFR: 8144993: Elide redundant memory barrier after > AllocationNode > > The potential problem only arises if "this" is published unsafely and > the object to which it is published doesn't escape. > > Can't we detect unsafe publication? It ought to be easier than escape > analysis: it's a matter of detecting that "this" escapes from the > constructor. > > Andrew. 
>

From bob.vandette at oracle.com Wed Dec 23 14:55:35 2015
From: bob.vandette at oracle.com (Bob Vandette)
Date: Wed, 23 Dec 2015 09:55:35 -0500
Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms
Message-ID:

In my push to the mobile/dev forest, I changed VAR_CPU_ARCH on arm platforms to always use arm for both 32 and 64 bit arm builds to be consistent with the setting for x86/x86_64.

http://cr.openjdk.java.net/~bobv/8145936/webrev.00/

My assumption, which is confirmed by most of the usage in the makefiles, is that VAR_CPU_ARCH should be set to the generic ARCH family (x86, arm) for both 32 and 64 bit builds.

My motivation for doing this was initially for the selection of the Socket and UnixConstant template files used in cross compilation, since these files contain the same content for arm and aarch64.

This seems to be causing at least one problem in the hotspot build where in JDK 9, ARCH is being set to VAR_CPU_ARCH (via OPENJDK_TARGET_CPU_ARCH). For aarch64 builds, ARCH gets set to arm. In JDK8, ARCH is set to VAR_CPU and not VAR_CPU_ARCH. Was there a reason for this change? Can we go back to the way it was in JDK8?

There are a lot of hacks in both open and closed makefiles to set various variables based on ARCH in order to end up with the correct variables.

In hotspot/make/defs.make, we undo the VAR_CPU_ARCH setting of x86 for x86_64 builds by checking for LP64! This is not done for arm.

BUILDARCH ?= $(SRCARCH)
ifeq ($(BUILDARCH), x86)
  ifdef LP64
    BUILDARCH = amd64
  else
    BUILDARCH = i486
  endif
endif

In hotspot/make/closed/defs.make, we don't fix this issue either.
ifeq ($(ARCH), arm) SRCARCH = arm LIBARCH = arm ARCH_DATA_MODEL = 32 PLATFORM = linux-arm VM_PLATFORM = linux_arm HS_ARCH = arm endif ifeq ($(ARCH), aarch64) BUILDARCH = aarch64 SRCARCH = arm LIBARCH = aarch64 HS_ARCH = arm SAARCH = arm64 endif From aph at redhat.com Wed Dec 23 16:55:56 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 23 Dec 2015 16:55:56 +0000 Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms In-Reply-To: References: Message-ID: <567AD21C.5040907@redhat.com> On 23/12/15 14:55, Bob Vandette wrote: > In my push to the mobile/dev forest, I changed VAR_CPU_ARCH on arm > platforms to always use arm for both 32 and 64 bit arm builds to be > consistent with the setting for x86/x86_64. This isn't a similar situation, IMO. > http://cr.openjdk.java.net/~bobv/8145936/webrev.00/ > > My assumption which is confirmed by most of the usage in the > makefiles is that VAR_CPU_ARCH should be set to the generic ARCH > family (x86, arm) for both 32 and 64 bit builds. > > My motivation for doing this was initially for the selection of the > Socket and UnixConstant template files used in cross compilation > since these files contain the same content for arm and aarch64. I'm not convinced this makes any sense. The only thing the ARM architectures have in common is that they come from the same company. This is not true of x86_64, which is a rather elaborate 64-bit extension of x86. For examples of how the ARM/AArch64 split is handled elsewhere, note that the Linux kernel, GCC, and GNU binutils arches are all separate. > There are a lot of hacks in both open and closed makefiles to set > various variable based on ARCH in order to end up with the correct > variables. > > In hotspot/make/defs.make, we undo the VAR_CPU_ARCH setting of x86 > for x86_64 builds by checking for LP64! This is not done for arm. It really should not need to be. AArch64 is not ARM. Andrew. 
From andrey.petushkov at gmail.com Wed Dec 23 17:12:26 2015 From: andrey.petushkov at gmail.com (Andrey Petushkov) Date: Wed, 23 Dec 2015 17:12:26 +0000 Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms In-Reply-To: <567AD21C.5040907@redhat.com> References: <567AD21C.5040907@redhat.com> Message-ID: Hi guys, And indeed, please don't forget about AArch32 port. It's like ARM but it's quite different, you know. And it is currently using aarch32 value as VAR_CPU and VAR_CPU_ARCH Thanks, Andrey On Wed, Dec 23, 2015 at 7:56 PM Andrew Haley wrote: > On 23/12/15 14:55, Bob Vandette wrote: > > > In my push to the mobile/dev forest, I changed VAR_CPU_ARCH on arm > > platforms to always use arm for both 32 and 64 bit arm builds to be > > consistent with the setting for x86/x86_64. > > This isn't a similar situation, IMO. > > > http://cr.openjdk.java.net/~bobv/8145936/webrev.00/ < > http://cr.openjdk.java.net/~bobv/8145936/webrev.00/> > > > > My assumption which is confirmed by most of the usage in the > > makefiles is that VAR_CPU_ARCH should be set to the generic ARCH > > family (x86, arm) for both 32 and 64 bit builds. > > > > My motivation for doing this was initially for the selection of the > > Socket and UnixConstant template files used in cross compilation > > since these files contain the same content for arm and aarch64. > > I'm not convinced this makes any sense. The only thing the ARM > architectures have in common is that they come from the same company. > This is not true of x86_64, which is a rather elaborate 64-bit > extension of x86. For examples of how the ARM/AArch64 split is > handled elsewhere, note that the Linux kernel, GCC, and GNU binutils > arches are all separate. > > > There are a lot of hacks in both open and closed makefiles to set > > various variable based on ARCH in order to end up with the correct > > variables. > > > > In hotspot/make/defs.make, we undo the VAR_CPU_ARCH setting of x86 > > for x86_64 builds by checking for LP64! 
This is not done for arm. > > It really should not need to be. AArch64 is not ARM. > > Andrew. > From bob.vandette at oracle.com Wed Dec 23 20:36:35 2015 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 23 Dec 2015 15:36:35 -0500 Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms In-Reply-To: <567AD21C.5040907@redhat.com> References: <567AD21C.5040907@redhat.com> Message-ID: > On Dec 23, 2015, at 11:55 AM, Andrew Haley wrote: > > On 23/12/15 14:55, Bob Vandette wrote: > >> In my push to the mobile/dev forest, I changed VAR_CPU_ARCH on arm >> platforms to always use arm for both 32 and 64 bit arm builds to be >> consistent with the setting for x86/x86_64. > > This isn't a similar situation, IMO. There appears to be a need for a variable that is used to indicate an x86 or ARM specific path independent of the specific type of ARM or x86 processor. Why don't you think this is a similar situation? x86_64 is a 64-bit Intel architecture that also has the ability to run its legacy 32-bit binaries. aarch64 is a 64-bit ARM architecture that also has the ability to run its legacy armv7 (aarch32) 32-bit binaries. aarch32 may be slightly different in that it has the ability to use some newer armv8 instructions but it is compatible with armv7 with very few exceptions like the old mcr instructions. > >> http://cr.openjdk.java.net/~bobv/8145936/webrev.00/ >> >> My assumption which is confirmed by most of the usage in the >> makefiles is that VAR_CPU_ARCH should be set to the generic ARCH >> family (x86, arm) for both 32 and 64 bit builds. >> >> My motivation for doing this was initially for the selection of the >> Socket and UnixConstant template files used in cross compilation >> since these files contain the same content for arm and aarch64. > > I'm not convinced this makes any sense. The only thing the ARM > architectures have in common is that they come from the same company.
> This is not true of x86_64, which is a rather elaborate 64-bit > extension of x86. One could say the same thing about armv8 versus armv7. > For examples of how the ARM/AArch64 split is > handled elsewhere, note that the Linux kernel, GCC, and GNU binutils > arches are all separate. > >> There are a lot of hacks in both open and closed makefiles to set >> various variables based on ARCH in order to end up with the correct >> variables. >> >> In hotspot/make/defs.make, we undo the VAR_CPU_ARCH setting of x86 >> for x86_64 builds by checking for LP64! This is not done for arm. > > It really should not need to be. AArch64 is not ARM. That really depends on your criteria for comparison. I still believe we need a broad variable that identifies ARM varieties. Without this, when the aarch32 port is attempted there's going to be a lot of extraneous checks required in the makefile for "if ARCH == aarch32" || ARCH == arm in places that would not need to be changed simply because we didn't use the existing variable for the purpose that I believe it was originally intended. Bob. > > Andrew. From aph at redhat.com Wed Dec 23 23:43:54 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 23 Dec 2015 23:43:54 +0000 Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms In-Reply-To: References: <567AD21C.5040907@redhat.com> Message-ID: <567B31BA.3060001@redhat.com> On 23/12/15 20:36, Bob Vandette wrote: >> > This is not true of x86_64, which is a rather elaborate 64-bit >> > extension of x86. > One could say the same thing about armv8 versus armv7. I don't think one could. I suspect this exact architecture could have been designed by some other company, and no-one would have suggested it was related. Maybe someone might have said "Ooh, it's very ARM-ish," but that's all. It's a clean sheet design, it's not just wider with more registers. (The floating-point units are very similar, I'll grant you.)
In contrast, x86_64 is pretty much a superset with even the same binary encodings for many instructions. [ NB: ARMv8 identifies both the AArch32 and AArch64 instruction set architectures. AArch32 is a slightly extended ARM; AArch64 is all-new. ] > That really depends on your criteria for comparison. I still > believe we need a broad variable that identifies ARM varieties. Maybe so. I guess this would capture what they have in common with each other that is different from other architectures. But there isn't much of that. > Without this, when the aarch32 port is attempted there's going to be > a lot of extraneous checks required in the makefile for "if ARCH == > aarch32" || ARCH == arm in places that would not need to be changed > simply because we didn't use the existing variable for the purpose > that I believe it was originally intended. I totally agree about AArch32 and ARM. It's the same thing: the AArch32 project is just about creating ARM-open. There definitely should be a variable to cover those. Andrew. From edward.nevill at gmail.com Thu Dec 24 12:27:54 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 24 Dec 2015 12:27:54 +0000 Subject: [aarch64-port-dev ] VAR_CPU_ARCH for ARM platforms In-Reply-To: <567B31BA.3060001@redhat.com> References: <567AD21C.5040907@redhat.com> <567B31BA.3060001@redhat.com> Message-ID: <1450960074.31650.8.camel@mint> On Wed, 2015-12-23 at 23:43 +0000, Andrew Haley wrote: > On 23/12/15 20:36, Bob Vandette wrote: > >> > This is not true of x86_64, which is a rather elaborate 64-bit > >> > extension of x86. ... > > One could say the same thing about armv8 versus armv7. > [ NB: ARMv8 identifies both the AArch32 and AArch64 instruction set > architectures. AArch32 is a slightly extended ARM; AArch64 is > all-new. ] > > > That really depends on your criteria for comparison. I still > > believe we need a broad variable that identifies ARM varieties. ....
> > Without this, when the aarch32 port is attempted there's going to be > > a lot of extraneous checks required in the makefile for "if ARCH == > > aarch32" || ARCH == arm in places that would not need to be changed > > simply because we didn't use the existing variable for the purpose > > that I believe it was originally intended. > > I totally agree about AArch32 and ARM. It's the same thing: the > AArch32 project is just about creating ARM-open. There definitely > should be a variable to cover those. FWIW the aarch32 port does -DAARCH32 -DARM in its sysdefs. The rationale for adding -DAARCH32 is to avoid conflicts with the proprietary port. My 2c worth is that aarch64 should be considered a completely separate port. It is not like x86/x86_64. You cannot access the 32 bit instructions from aarch64. In fact some implementations do not even have the 32 bit instructions, i.e. they are pure aarch64. With respect to the differences between armv7 and aarch32, they are not worth considering as separate: they amount to only a few mcr instructions to do with cache flushing/barriers, and those are deprecated in armv7 in any case. The correct fix is to use the non-deprecated instructions in armv7, which will then also work on aarch32. All the best, Ed. From edward.nevill at gmail.com Thu Dec 24 15:06:56 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 24 Dec 2015 15:06:56 +0000 Subject: [aarch64-port-dev ] guarantee failures with large code cache sizes on jtreg test java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java Message-ID: <1450969616.31650.50.camel@mint> Hi, I am seeing intermittent guarantee failures on jdk jtreg test java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.jtr.
The failure is # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (assembler_aarch64.hpp:218), pid=43991, tid=44418 # guarantee(chk == -1 || chk == 0) failed: Field too big for insn # The test is being run with -XX:ReservedCodeCacheSize=256m, the following is the full command line /home/ed/images/jdk9-orig/bin/java -XX:-TieredCompilation -jar lib/jtreg.jar -vmoption:-XX:ReservedCodeCacheSize=256m -retain -nr -conc:8 -timeout:99 -othervm -jdk:/home/ed/images/jdk9-orig -v1 -a -ignore:quiet /home/ed/new_jdk9/dev/jdk_test/test/java/lang/invoke I have trapped the failure in gdb, it is occurring in pd_patch_instruction_size when trying to patch a BL instruction. #8 0x000003ff7a7a360c in MacroAssembler::pd_patch_instruction_size ( branch=0x3ff691cf2d8 "\223\323\343\227\277:\003\325\213c\313\071\313\b", target=0x3ff60a108a4 "\375{\277\251\375\003") at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:74 74 Instruction_aarch64::spatch(branch, 25, 0, offset); (gdb) p offset $17 = -35584653 (gdb) Here is the backtrace from gdb #8 0x000003ff7a7a360c in MacroAssembler::pd_patch_instruction_size ( branch=0x3ff691cf2d8 "\223\323\343\227\277:\003\325\213c\313\071\313\b", target=0x3ff60a108a4 "\375{\277\251\375\003") at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:74 #9 0x000003ff7a33451c in MacroAssembler::pd_patch_instruction ( branch=0x3ff691cf2d8 "\223\323\343\227\277:\003\325\213c\313\071\313\b", target=0x3ff60a108a4 "\375{\277\251\375\003") at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp:565 #10 0x000003ff7a8ca9bc in Relocation::pd_set_call_destination (this=0x3fdc5fab3e8, x=0x3ff60a108a4 "\375{\277\251\375\003") at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/relocInfo_aarch64.cpp:85 #11 0x000003ff7a8c8650 in CallRelocation::fix_relocation_after_move ( this=0x3fdc5fab3e8, src=0x3fdc5fae0b0, dest=0x3fdc5fab490) at 
/home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/code/relocInfo.cpp:549 #12 0x000003ff7a4736bc in CodeBuffer::relocate_code_to (this=0x3fdc5fae0b0, dest=0x3fdc5fab490) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/codeBuffer.cpp:812 #13 0x000003ff7a473be8 in CodeBuffer::expand (this=0x3fdc5fae0b0, which_cs=0x3fdc5fae158, amount=64) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/codeBuffer.cpp:942 #14 0x000003ff7a334404 in CodeSection::maybe_expand_to_ensure_remaining ( this=0x3fdc5fae158, amount=64) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/codeBuffer.hpp:661 #15 0x000003ff7a33379c in AbstractAssembler::start_a_stub (this=0x3fdc5fab838, required_space=64) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/asm/assembler.cpp:65 #16 0x000003ff7a7a54a0 in MacroAssembler::emit_trampoline_stub (this=0x3fdc5fab838, insts_call_instruction_offset=976, dest=0x3ff609cf080 "\375{\277\251H\001") at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:704 #17 0x000003ff7a7a53c0 in MacroAssembler::trampoline_call (this=0x3fdc5fab838, entry=..., cbuf=0x3fdc5fae0b0) at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:673 #18 0x000003ff7a2a1bd0 in CallStaticJavaDirectNode::emit (this=0x3fdac0024b0, cbuf=..., ra_=0x3fdc5fabd30) at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/aarch64.ad:4673 #19 0x000003ff7a85edc0 in Compile::fill_buffer (this=0x3fdc5fad870, cb=0x3fdc5fae0b0, blk_starts=0x3fd40042520) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/output.cpp:1380 #20 0x000003ff7a85b960 in Compile::Output (this=0x3fdc5fad870) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/output.cpp:154 #21 0x000003ff7a4a6c88 in Compile::Code_Gen (this=0x3fdc5fad870) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/compile.cpp:2407 #22 0x000003ff7a4a1fa8 in Compile::Compile (this=0x3fdc5fad870, ci_env=0x3fdc5fae390, compiler=0x3ff746bc7d0, target=0x3fd981ff670, osr_bci=-1, subsume_loads=true, 
do_escape_analysis=true, eliminate_boxing=true, directive=0x3ff74680570) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/compile.cpp:899 #23 0x000003ff7a3e7684 in C2Compiler::compile_method (this=0x3ff746bc7d0, env=0x3fdc5fae390, target=0x3fd981ff670, entry_bci=-1, directive=0x3ff74680570) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/opto/c2compiler.cpp:106 #24 0x000003ff7a4b2ea8 in CompileBroker::invoke_compiler_on_method ( task=0x3fd640bbfd0) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/compiler/compileBroker.cpp:1814 #25 0x000003ff7a4b25d4 in CompileBroker::compiler_thread_loop () at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/compiler/compileBroker.cpp:1564 #26 0x000003ff7a96a9b4 in compiler_thread_entry (thread=0x3ff746bf000, __the_thread__=0x3ff746bf000) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/thread.cpp:3238 #27 0x000003ff7a9678f4 in JavaThread::thread_main_inner (this=0x3ff746bf000) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/thread.cpp:1723 #28 0x000003ff7a967830 in JavaThread::run (this=0x3ff746bf000) at /home/ed/new_jdk9/hs-comp/hotspot/src/share/vm/runtime/thread.cpp:1703 #29 0x000003ff7a849614 in java_start (thread=0x3ff746bf000) at /home/ed/new_jdk9/hs-comp/hotspot/src/os/linux/vm/os_linux.cpp:683 #30 0x000003ff7af07e2c in start_thread (arg=0x3fdc5faf1f0) at pthread_create.c:314 #31 0x000003ff7ae18c40 in clone () at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96 Looking at frame #11 above we see (gdb) list 544 // On some platforms, the reference is absolute (not self-relative). 545 // The enhanced use of pd_call_destination sorts this all out. 546 address orig_addr = old_addr_for(addr(), src, dest); 547 address callee = pd_call_destination(orig_addr); 548 // Reassert the callee address, this time in the new copy of the code. 
549 pd_set_call_destination(callee); 550 } 551 552 553 //// pack/unpack methods (gdb) p/x addr() $18 = 0x3ff691cf2d8 (gdb) p/x orig_addr $19 = 0x3ff6111ba58 (gdb) p/x callee $20 = 0x3ff60a108a4 Looking at a section of code at both orig_addr and addr() and at the destination of the BL in each case we have (gdb) x/10i orig_addr-20 0x3ff6111ba44: add x10, x14, w15, sxtw 0x3ff6111ba48: sxtw x2, w17 0x3ff6111ba4c: add x0, x10, #0x10 0x3ff6111ba50: cmp w17, w13 0x3ff6111ba54: b.lt 0x3ff6111bb24 0x3ff6111ba58: bl 0x3ff60a108a4 0x3ff6111ba5c: dmb ishst 0x3ff6111ba60: ldrsb w11, [x28,#728] 0x3ff6111ba64: cbnz w11, 0x3ff6111bb7c 0x3ff6111ba68: mov x10, x19 (gdb) x/10i 0x3ff60a108a4 0x3ff60a108a4: stp x29, x30, [sp,#-16]! 0x3ff60a108a8: mov x29, sp 0x3ff60a108ac: cmp x1, x0 0x3ff60a108b0: b.ls 0x3ff60a10808 0x3ff60a108b4: add x0, x0, x2, uxtx 0x3ff60a108b8: add x1, x1, x2, uxtx 0x3ff60a108bc: cmp x2, #0x10 0x3ff60a108c0: b.cc 0x3ff60a10914 0x3ff60a108c4: and x9, x0, #0xf 0x3ff60a108c8: cbz x9, 0x3ff60a1090c (gdb) x/10i addr()-20 0x3ff691cf2c4: add x10, x14, w15, sxtw 0x3ff691cf2c8: sxtw x2, w17 0x3ff691cf2cc: add x0, x10, #0x10 0x3ff691cf2d0: cmp w17, w13 0x3ff691cf2d4: b.lt 0x3ff691cf3a4 0x3ff691cf2d8: bl 0x3ff68ac4124 0x3ff691cf2dc: dmb ishst 0x3ff691cf2e0: ldrsb w11, [x28,#728] 0x3ff691cf2e4: cbnz w11, 0x3ff691cf3fc 0x3ff691cf2e8: mov x10, x19 (gdb) x/10i 0x3ff68ac4124 0x3ff68ac4124: .inst 0x00000000 ; undefined 0x3ff68ac4128: .inst 0x00000000 ; undefined 0x3ff68ac412c: .inst 0x00000000 ; undefined 0x3ff68ac4130: .inst 0x00000000 ; undefined 0x3ff68ac4134: .inst 0x00000000 ; undefined 0x3ff68ac4138: .inst 0x00000000 ; undefined 0x3ff68ac413c: .inst 0x00000000 ; undefined 0x3ff68ac4140: .inst 0x00000000 ; undefined 0x3ff68ac4144: .inst 0x00000000 ; undefined 0x3ff68ac4148: .inst 0x00000000 ; undefined What appears to be the case here is that we have a BL to another method, therefore outside the scope of the current codeblob. 
However, this codeblob is now being moved and will now require a trampoline instead of a straight BL. However the BL is not recognised as requiring a trampoline. Looking at frame #10 (gdb) down #10 0x000003ff7a8ca9bc in Relocation::pd_set_call_destination (this=0x3fdc5fab3e8, x=0x3ff60a108a4 "\375{\277\251\375\003") at /home/ed/new_jdk9/hs-comp/hotspot/src/cpu/aarch64/vm/relocInfo_aarch64.cpp:85 85 MacroAssembler::pd_patch_instruction(addr(), x); (gdb) list 81 76 assert(is_call(), "should be a call here"); 77 if (NativeCall::is_call_at(addr())) { 78 address trampoline = nativeCall_at(addr())->get_trampoline(); 79 if (trampoline) { 80 nativeCall_at(addr())->set_destination_mt_safe(x, /* assert_lock */false); 81 return; 82 } 83 } 84 assert(addr() != x, "call instruction in an infinite loop"); 85 MacroAssembler::pd_patch_instruction(addr(), x); 'trampoline' is set to false here (gdb) p NativeCall::is_call_at(addr()) $21 = true (gdb) p nativeCall_at(addr())->get_trampoline() $22 = (u_char *) 0x0 (gdb) Looking at the source for get_trampoline() CodeBlob *code = CodeCache::find_blob(call_addr); assert(code != NULL, "Could not find the containing code blob"); address bl_destination = MacroAssembler::pd_call_destination(call_addr); if (code->content_contains(bl_destination) && is_NativeCallTrampolineStub_at(bl_destination)) return bl_destination; This only tests for a trampoline if the BL destination is within the current code blob, and as seen previously with the problems with adrp, it must not test for a trampoline outside the current code blob because that could be pointing somewhere completely random. In this case it happens to be pointing to a block of .inst 0x00000000 words. 
The problem arises from the implementation of MacroAssembler::trampoline_call where it does if (Assembler::reachable_from_branch_at(pc(), entry.target())) { bl(entry.target()); } else { bl(pc()); } Here, if the call reaches, it plants a direct BL. However, when the call subsequently fails to reach because the codeblob is moved out of range of a BL, it has no way of finding the trampoline, because it will not look outside the current code blob. The only possibility might be to always write it as bl(pc()) and rely on the final reloc to fix it up to either point to the trampoline, or call direct. However, I think there may be a problem with this if the codeblob is moved more than once: in this case the first move would relocate it using a direct BL and then the second could move it out of range and fail to find the trampoline as above. Anyone got any ideas on how to fix this? All the best, and Happy Christmas, Ed. From aph at redhat.com Thu Dec 24 17:29:38 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 24 Dec 2015 17:29:38 +0000 Subject: [aarch64-port-dev ] guarantee failures with large code cache sizes on jtreg test java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java In-Reply-To: <1450969616.31650.50.camel@mint> References: <1450969616.31650.50.camel@mint> Message-ID: <567C2B82.6080908@redhat.com> On 24/12/15 15:06, Edward Nevill wrote: > This only tests for a trampoline if the BL destination is within the > current code blob, and as seen previously with the problems with > adrp, it must not test for a trampoline outside the current code > blob because that could be pointing somewhere completely random. In > this case it happens to be pointing to a block of .inst 0x00000000 > words. Indeed. But the subsequent code should find the trampoline: return trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code); The question is why it doesn't. Andrew.
From edward.nevill at gmail.com Tue Dec 29 17:17:32 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 29 Dec 2015 17:17:32 +0000 Subject: [aarch64-port-dev ] RFR: 8146286: aarch64: guarantee failures with large code cache sizes on jtreg test java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java Message-ID: <1451409452.30784.72.camel@mint> Hi, The following webrev http://cr.openjdk.java.net/~enevill/8146286/webrev.0/ JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8146286 The problem is that during code buffer expansion the code buffer can be moved so that a BL is no longer in range. Normally this would resolve to the target's trampoline. However, this is inhibited during code buffer expansion because of the following in get_trampoline_for() address trampoline_stub_Relocation::get_trampoline_for(address call, nmethod* code) { // There are no relocations available when the code gets relocated // because of CodeBuffer expansion. if (code->relocation_size() == 0) return NULL; The problem is that the relocs have not been created yet, so get_trampoline_for cannot resolve to the trampoline. The solution I have adopted is to always generate a BL to self in MacroAssembler::trampoline_call. In Relocation::pd_call_destination, when it detects a call to self it does not attempt to do the relocation but just leaves it as a call to self (there is no point in trying to relocate the call to self to point to the original destination since that is in the old copy of the code buffer and could be out of range). During final relocation the call to self is then relocated to the correct value. Repeated testing with the above test shows that the problem has been resolved. I have also tested with jtreg hotspot/langtools and jdk, before and after patching and with and without -XX:ReservedCodeCacheSize=256m with no additional failures. OK to push? Ed.
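The approach described in this RFR (plant a BL to self, keep it a BL to self across buffer moves, pick the real destination only at final relocation) can be illustrated with a toy model. This is a sketch under stated assumptions and is not the webrev: CallSite, emit_call, move_buffer, final_relocate and bl_reaches are hypothetical names, the trampoline address is invented for illustration, and the choice between direct call and trampoline at the end follows the earlier description of the final reloc as fixing the call up "to either point to the trampoline, or call direct".

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: none of these names exist in HotSpot.
struct CallSite {
    uint64_t pc;          // address of the BL in the current copy of the buffer
    uint64_t bl_target;   // destination the BL currently encodes
    uint64_t callee;      // real destination, as recorded for the relocation
    uint64_t trampoline;  // this call's trampoline stub, in the same buffer
};

// BL reaches +/-2^25 words (4 bytes each) from the branch.
inline bool bl_reaches(uint64_t from, uint64_t to) {
    const int64_t words =
        (static_cast<int64_t>(to) - static_cast<int64_t>(from)) / 4;
    const int64_t limit = int64_t{1} << 25;
    return words >= -limit && words < limit;
}

// Emission: always plant a placeholder BL to self, never a direct BL.
inline CallSite emit_call(uint64_t pc, uint64_t callee, uint64_t trampoline) {
    return CallSite{pc, pc, callee, trampoline};
}

// Buffer expansion: the code (and its stub) moves by 'delta' bytes.
// A call to self simply stays a call to self, so no range check is needed.
inline void move_buffer(CallSite& s, int64_t delta) {
    assert(s.bl_target == s.pc);  // only placeholders exist at this stage
    s.pc += static_cast<uint64_t>(delta);
    s.trampoline += static_cast<uint64_t>(delta);
    s.bl_target = s.pc;
}

// Final relocation: call direct if in range, otherwise via the trampoline.
inline void final_relocate(CallSite& s) {
    s.bl_target = bl_reaches(s.pc, s.callee) ? s.callee : s.trampoline;
}
```

Because the placeholder never encodes an out-of-buffer destination, moving the buffer any number of times cannot produce an unpatchable BL in this model; the only range decision is taken once, at the end.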
From aph at redhat.com Tue Dec 29 22:16:52 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 29 Dec 2015 22:16:52 +0000 Subject: [aarch64-port-dev ] RFR: 8146286: aarch64: guarantee failures with large code cache sizes on jtreg test java/lang/invoke/LFCaching10/LFMultiThreadCachingTest.java In-Reply-To: <1451409452.30784.72.camel@mint> References: <1451409452.30784.72.camel@mint> Message-ID: <56830654.5010603@redhat.com> On 29/12/15 17:17, Edward Nevill wrote: > I have also tested with jtreg hotspot/langtools and jdk, before and after patching and with and without -XX:ReservedCodeCacheSize=256m with no additional failures. > > OK to push? Eww. This does make sense, but it looks very odd indeed. OK. Andrew.