From adinn at redhat.com Mon Jun 1 11:00:51 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 01 Jun 2015 12:00:51 +0100
Subject: [aarch64-port-dev ] Fwd: /hg/icedtea7-forest/hotspot: 11 new changesets
In-Reply-To:
References:
Message-ID: <556C3B63.10703@redhat.com>

I have backported several outstanding AArch64 JDK8 and JDK9 hotspot
fixes to the icedtea7-forest repo as per the commit message below.

I (successfully) smoke-tested the resulting build on both AArch64 and
x86 (the latter because the changes included a small number of minor
changes to shared code) by running netbeans and specjvm.

regards,

Andrew Dinn
-----------

-------- Forwarded Message --------
Subject: /hg/icedtea7-forest/hotspot: 11 new changesets
Date: Mon, 01 Jun 2015 10:53:29 +0000
From: adinn at icedtea.classpath.org
To: distro-pkg-dev at openjdk.java.net

changeset 8e04e38c3fa8 in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=8e04e38c3fa8
author: aph
date: Thu May 28 10:16:54 2015 -0400

	8069593: Changes to JavaThread::_thread_state must use acquire and release

changeset 548020488783 in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=548020488783
author: aph
date: Tue Mar 03 17:56:33 2015 +0000

	8074349: AARCH64: C2 generates poor code for some byte and character stores
	Summary: Use iRegIorL2I as src input for char and byte stores.
	Reviewed-by: kvn

changeset c8f1b01693ba in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=c8f1b01693ba
author: aph
date: Thu May 28 10:25:15 2015 -0400

	8075045: AARCH64: Stack banging should use store rather than load
	Summary: Change stack bangs to use a store rather than a load

changeset 0bea9494c9cb in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=0bea9494c9cb
author: enevill
date: Wed May 27 15:03:26 2015 +0100

	Add copyright to aarch64_ad.m4

changeset 63723278c978 in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=63723278c978
author: aph
date: Fri May 29 09:31:52 2015 -0400

	8075443: AARCH64: Missed L2I optimizations in C2
	Summary: Use iRegIOrL2I for input operands whenever it makes sense.

changeset 84fa299120ce in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=84fa299120ce
author: aph
date: Fri May 29 09:45:44 2015 -0400

	8075930: AARCH64: Use FP Register in C2
	Summary: modify to allow C2 to allocate FP (R29) as a general register

changeset 137f1ed67e92 in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=137f1ed67e92
author: aph
date: Fri May 29 10:38:35 2015 -0400

	8076467: AARCH64: assertion fail with -XX:+UseG1GC
	Summary: Don't call encoding unless bool is true.

changeset 3f4d11cdefe1 in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=3f4d11cdefe1
author: enevill
date: Fri May 29 11:03:49 2015 -0400

	8079203: AARCH64: Need to cater for different partner implementations
	Summary: Parse /proc/cpuinfo to derive implementation specific info

changeset a74b6b4d0bde in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=a74b6b4d0bde
author: enevill
date: Wed May 27 15:40:40 2015 +0100

	8080586: aarch64: hotspot test compiler/codegen/7184394/TestAESMain.java fails
	Summary: Return correct length in generate_cipherBlockChaining_encryptAESCrypt

changeset 1795197a987f in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=1795197a987f
author: adinn
date: Fri May 29 11:20:12 2015 -0400

	8075324: Costs of memory operands in aarch64.ad are inconsistent
	Summary: Made cost of 'indOffI' consistent to the other memory operands.

changeset c96991560be1 in /hg/icedtea7-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea7-forest/hotspot?cmd=changeset;node=c96991560be1
author: thartmann
date: Mon Mar 23 10:15:53 2015 +0100

	8075136: Unnecessary sign extension for byte array access
	Summary: Added C2 matching rules to remove unnecessary sign extension for byte array access.
Reviewed-by: roland, kvn, aph, adinn diffstat: src/cpu/aarch64/vm/aarch64.ad | 251 ++++++++++++-------- src/cpu/aarch64/vm/aarch64_ad.m4 | 51 +++- src/cpu/aarch64/vm/assembler_aarch64.hpp | 2 +- src/cpu/aarch64/vm/frame_aarch64.inline.hpp | 12 - src/cpu/aarch64/vm/interp_masm_aarch64.hpp | 2 + src/cpu/aarch64/vm/register_aarch64.hpp | 5 +- src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp | 9 +- src/cpu/aarch64/vm/stubGenerator_aarch64.cpp | 4 +- src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp | 11 +- src/cpu/aarch64/vm/vm_version_aarch64.cpp | 40 +++- src/cpu/aarch64/vm/vm_version_aarch64.hpp | 34 ++ src/cpu/x86/vm/x86_64.ad | 61 ++++- src/share/vm/runtime/thread.hpp | 2 +- 13 files changed, 341 insertions(+), 143 deletions(-) diffs (truncated from 1325 to 500 lines): diff -r c0ca0821c737 -r c96991560be1 src/cpu/aarch64/vm/aarch64.ad --- a/src/cpu/aarch64/vm/aarch64.ad Wed Apr 29 12:23:48 2015 -0700 +++ b/src/cpu/aarch64/vm/aarch64.ad Mon Mar 23 10:15:53 2015 +0100 @@ -447,7 +447,7 @@ R26 /* R27, */ // heapbase /* R28, */ // thread - /* R29, */ // fp + R29, // fp /* R30, */ // lr /* R31 */ // sp ); @@ -481,7 +481,7 @@ R26, R26_H, /* R27, R27_H, */ // heapbase /* R28, R28_H, */ // thread - /* R29, R29_H, */ // fp + R29, R29_H, // fp /* R30, R30_H, */ // lr /* R31, R31_H */ // sp ); @@ -1728,7 +1728,7 @@ } const RegMask Matcher::method_handle_invoke_SP_save_mask() { - return RegMask(); + return FP_REG_mask(); } // helper for encoding java_to_runtime calls on sim @@ -1811,6 +1811,8 @@ case INDINDEXSCALEDI2L: case INDINDEXSCALEDOFFSETI2LN: case INDINDEXSCALEDI2LN: + case INDINDEXOFFSETI2L: + case INDINDEXOFFSETI2LN: scale = Address::sxtw(size); break; default: @@ -2126,16 +2128,22 @@ enc_class aarch64_enc_stlrb(iRegI src, memory mem) %{ MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, stlrb); + if (VM_Version::cpu_cpuFeatures() & VM_Version::CPU_DMB_ATOMICS) + __ dmb(__ ISH); %} enc_class aarch64_enc_stlrh(iRegI src, 
memory mem) %{ MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, stlrh); + if (VM_Version::cpu_cpuFeatures() & VM_Version::CPU_DMB_ATOMICS) + __ dmb(__ ISH); %} enc_class aarch64_enc_stlrw(iRegI src, memory mem) %{ MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, stlrw); + if (VM_Version::cpu_cpuFeatures() & VM_Version::CPU_DMB_ATOMICS) + __ dmb(__ ISH); %} @@ -2226,6 +2234,8 @@ } MOV_VOLATILE(src_reg, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, stlr); + if (VM_Version::cpu_cpuFeatures() & VM_Version::CPU_DMB_ATOMICS) + __ dmb(__ ISH); %} enc_class aarch64_enc_fstlrs(vRegF src, memory mem) %{ @@ -2236,6 +2246,8 @@ } MOV_VOLATILE(rscratch2, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, stlrw); + if (VM_Version::cpu_cpuFeatures() & VM_Version::CPU_DMB_ATOMICS) + __ dmb(__ ISH); %} enc_class aarch64_enc_fstlrd(vRegD src, memory mem) %{ @@ -2246,6 +2258,8 @@ } MOV_VOLATILE(rscratch2, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, stlr); + if (VM_Version::cpu_cpuFeatures() & VM_Version::CPU_DMB_ATOMICS) + __ dmb(__ ISH); %} // synchronized read/update encodings @@ -4285,6 +4299,20 @@ %} %} +operand indIndexOffsetI2L(iRegP reg, iRegI ireg, immLU12 off) +%{ + constraint(ALLOC_IN_RC(ptr_reg)); + match(AddP (AddP reg (ConvI2L ireg)) off); + op_cost(INSN_COST); + format %{ "$reg, $ireg, $off I2L" %} + interface(MEMORY_INTER) %{ + base($reg); + index($ireg); + scale(0x0); + disp($off); + %} +%} + operand indIndexScaledOffsetI2L(iRegP reg, iRegI ireg, immIScale scale, immLU12 off) %{ constraint(ALLOC_IN_RC(ptr_reg)); @@ -4345,7 +4373,7 @@ %{ constraint(ALLOC_IN_RC(ptr_reg)); match(AddP reg off); - op_cost(INSN_COST); + op_cost(0); format %{ "[$reg, $off]" %} interface(MEMORY_INTER) %{ base($reg); @@ -4415,6 +4443,21 @@ %} %} +operand indIndexOffsetI2LN(iRegN reg, iRegI ireg, immLU12 off) +%{ + predicate(Universe::narrow_oop_shift() == 
0); + constraint(ALLOC_IN_RC(ptr_reg)); + match(AddP (AddP (DecodeN reg) (ConvI2L ireg)) off); + op_cost(INSN_COST); + format %{ "$reg, $ireg, $off I2L\t# narrow" %} + interface(MEMORY_INTER) %{ + base($reg); + index($ireg); + scale(0x0); + disp($off); + %} +%} + operand indIndexScaledOffsetI2LN(iRegN reg, iRegI ireg, immIScale scale, immLU12 off) %{ predicate(Universe::narrow_oop_shift() == 0); @@ -4673,8 +4716,8 @@ // memory is used to define read/write location for load/store // instruction defs. we can turn a memory op into an Address -opclass memory(indirect, indIndexScaledOffsetI, indIndexScaledOffsetL, indIndexScaledOffsetI2L, indIndexScaled, indIndexScaledI2L, indIndex, indOffI, indOffL, - indirectN, indIndexScaledOffsetIN, indIndexScaledOffsetLN, indIndexScaledOffsetI2LN, indIndexScaledN, indIndexScaledI2LN, indIndexN, indOffIN, indOffLN); +opclass memory(indirect, indIndexScaledOffsetI, indIndexScaledOffsetL, indIndexOffsetI2L, indIndexScaledOffsetI2L, indIndexScaled, indIndexScaledI2L, indIndex, indOffI, indOffL, + indirectN, indIndexScaledOffsetIN, indIndexScaledOffsetLN, indIndexOffsetI2LN, indIndexScaledOffsetI2LN, indIndexScaledN, indIndexScaledI2LN, indIndexN, indOffIN, indOffLN); // iRegIorL2I is used for src inputs in rules for 32 bit int (I) // operations. 
it allows the src to be either an iRegI or a (ConvL2I @@ -5616,7 +5659,7 @@ %} // Store Byte -instruct storeB(iRegI src, memory mem) +instruct storeB(iRegIorL2I src, memory mem) %{ match(Set mem (StoreB mem src)); @@ -5642,7 +5685,7 @@ %} // Store Char/Short -instruct storeC(iRegI src, memory mem) +instruct storeC(iRegIorL2I src, memory mem) %{ match(Set mem (StoreC mem src)); @@ -5943,7 +5986,7 @@ // ============================================================================ // Zero Count Instructions -instruct countLeadingZerosI(iRegINoSp dst, iRegI src) %{ +instruct countLeadingZerosI(iRegINoSp dst, iRegIorL2I src) %{ match(Set dst (CountLeadingZerosI src)); ins_cost(INSN_COST); @@ -5967,7 +6010,7 @@ ins_pipe( ialu_reg ); %} -instruct countTrailingZerosI(iRegINoSp dst, iRegI src) %{ +instruct countTrailingZerosI(iRegINoSp dst, iRegIorL2I src) %{ match(Set dst (CountTrailingZerosI src)); ins_cost(INSN_COST * 2); @@ -6539,7 +6582,7 @@ // which throws a ShouldNotHappen. So, we have to provide two flavours // of each rule, one for a cmpOp and a second for a cmpOpU (sigh). -instruct cmovI_reg_reg(cmpOp cmp, rFlagsReg cr, iRegINoSp dst, iRegI src1, iRegI src2) %{ +instruct cmovI_reg_reg(cmpOp cmp, rFlagsReg cr, iRegINoSp dst, iRegIorL2I src1, iRegIorL2I src2) %{ match(Set dst (CMoveI (Binary cmp cr) (Binary src1 src2))); ins_cost(INSN_COST * 2); @@ -6555,7 +6598,7 @@ ins_pipe(icond_reg_reg); %} -instruct cmovUI_reg_reg(cmpOpU cmp, rFlagsRegU cr, iRegINoSp dst, iRegI src1, iRegI src2) %{ +instruct cmovUI_reg_reg(cmpOpU cmp, rFlagsRegU cr, iRegINoSp dst, iRegIorL2I src1, iRegIorL2I src2) %{ match(Set dst (CMoveI (Binary cmp cr) (Binary src1 src2))); ins_cost(INSN_COST * 2); @@ -6580,7 +6623,7 @@ // we ought only to be able to cull one of these variants as the ideal // transforms ought always to order the zero consistently (to left/right?) 
-instruct cmovI_zero_reg(cmpOp cmp, rFlagsReg cr, iRegINoSp dst, immI0 zero, iRegI src2) %{ +instruct cmovI_zero_reg(cmpOp cmp, rFlagsReg cr, iRegINoSp dst, immI0 zero, iRegIorL2I src2) %{ match(Set dst (CMoveI (Binary cmp cr) (Binary zero src2))); ins_cost(INSN_COST * 2); @@ -6596,7 +6639,7 @@ ins_pipe(icond_reg); %} -instruct cmovUI_zero_reg(cmpOpU cmp, rFlagsRegU cr, iRegINoSp dst, immI0 zero, iRegI src2) %{ +instruct cmovUI_zero_reg(cmpOpU cmp, rFlagsRegU cr, iRegINoSp dst, immI0 zero, iRegIorL2I src2) %{ match(Set dst (CMoveI (Binary cmp cr) (Binary zero src2))); ins_cost(INSN_COST * 2); @@ -6612,7 +6655,7 @@ ins_pipe(icond_reg); %} -instruct cmovI_reg_zero(cmpOp cmp, rFlagsReg cr, iRegINoSp dst, iRegI src1, immI0 zero) %{ +instruct cmovI_reg_zero(cmpOp cmp, rFlagsReg cr, iRegINoSp dst, iRegIorL2I src1, immI0 zero) %{ match(Set dst (CMoveI (Binary cmp cr) (Binary src1 zero))); ins_cost(INSN_COST * 2); @@ -6628,7 +6671,7 @@ ins_pipe(icond_reg); %} -instruct cmovUI_reg_zero(cmpOpU cmp, rFlagsRegU cr, iRegINoSp dst, iRegI src1, immI0 zero) %{ +instruct cmovUI_reg_zero(cmpOpU cmp, rFlagsRegU cr, iRegINoSp dst, iRegIorL2I src1, immI0 zero) %{ match(Set dst (CMoveI (Binary cmp cr) (Binary src1 zero))); ins_cost(INSN_COST * 2); @@ -7080,7 +7123,7 @@ ins_pipe(ialu_reg_reg); %} -instruct addI_reg_imm(iRegINoSp dst, iRegI src1, immIAddSub src2) %{ +instruct addI_reg_imm(iRegINoSp dst, iRegIorL2I src1, immIAddSub src2) %{ match(Set dst (AddI src1 src2)); ins_cost(INSN_COST); @@ -7127,7 +7170,7 @@ instruct addP_reg_reg_ext(iRegPNoSp dst, iRegP src1, iRegIorL2I src2) %{ match(Set dst (AddP src1 (ConvI2L src2))); - ins_cost(INSN_COST); + ins_cost(1.9 * INSN_COST); format %{ "add $dst, $src1, $src2, sxtw\t# ptr" %} ins_encode %{ @@ -7473,7 +7516,7 @@ ins_pipe(idiv_reg_reg); %} -instruct signExtract(iRegINoSp dst, iRegI src, immI_31 div1, immI_31 div2) %{ +instruct signExtract(iRegINoSp dst, iRegIorL2I src, immI_31 div1, immI_31 div2) %{ match(Set dst (URShiftI (RShiftI src 
div1) div2)); ins_cost(INSN_COST); format %{ "lsrw $dst, $src, $div1" %} @@ -7483,7 +7526,7 @@ ins_pipe(ialu_reg_shift); %} -instruct div2Round(iRegINoSp dst, iRegI src, immI_31 div1, immI_31 div2) %{ +instruct div2Round(iRegINoSp dst, iRegIorL2I src, immI_31 div1, immI_31 div2) %{ match(Set dst (AddI src (URShiftI (RShiftI src div1) div2))); ins_cost(INSN_COST); format %{ "addw $dst, $src, LSR $div1" %} @@ -7793,7 +7836,7 @@ ins_pipe(ialu_reg); %} instruct regI_not_reg(iRegINoSp dst, - iRegI src1, immI_M1 m1, + iRegIorL2I src1, immI_M1 m1, rFlagsReg cr) %{ match(Set dst (XorI src1 m1)); ins_cost(INSN_COST); @@ -7810,10 +7853,27 @@ %} instruct AndI_reg_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, immI_M1 m1, + iRegIorL2I src1, iRegIorL2I src2, immI_M1 m1, rFlagsReg cr) %{ match(Set dst (AndI src1 (XorI src2 m1))); ins_cost(INSN_COST); + format %{ "bicw $dst, $src1, $src2" %} + + ins_encode %{ + __ bicw(as_Register($dst$$reg), + as_Register($src1$$reg), + as_Register($src2$$reg), + Assembler::LSL, 0); + %} + + ins_pipe(ialu_reg_reg); +%} + +instruct AndL_reg_not_reg(iRegLNoSp dst, + iRegL src1, iRegL src2, immL_M1 m1, + rFlagsReg cr) %{ + match(Set dst (AndL src1 (XorL src2 m1))); + ins_cost(INSN_COST); format %{ "bic $dst, $src1, $src2" %} ins_encode %{ @@ -7826,15 +7886,15 @@ ins_pipe(ialu_reg_reg); %} -instruct AndL_reg_not_reg(iRegLNoSp dst, - iRegL src1, iRegL src2, immL_M1 m1, +instruct OrI_reg_not_reg(iRegINoSp dst, + iRegIorL2I src1, iRegIorL2I src2, immI_M1 m1, rFlagsReg cr) %{ - match(Set dst (AndL src1 (XorL src2 m1))); - ins_cost(INSN_COST); - format %{ "bic $dst, $src1, $src2" %} - - ins_encode %{ - __ bic(as_Register($dst$$reg), + match(Set dst (OrI src1 (XorI src2 m1))); + ins_cost(INSN_COST); + format %{ "ornw $dst, $src1, $src2" %} + + ins_encode %{ + __ ornw(as_Register($dst$$reg), as_Register($src1$$reg), as_Register($src2$$reg), Assembler::LSL, 0); @@ -7843,10 +7903,10 @@ ins_pipe(ialu_reg_reg); %} -instruct OrI_reg_not_reg(iRegINoSp dst, - 
iRegI src1, iRegI src2, immI_M1 m1, +instruct OrL_reg_not_reg(iRegLNoSp dst, + iRegL src1, iRegL src2, immL_M1 m1, rFlagsReg cr) %{ - match(Set dst (OrI src1 (XorI src2 m1))); + match(Set dst (OrL src1 (XorL src2 m1))); ins_cost(INSN_COST); format %{ "orn $dst, $src1, $src2" %} @@ -7860,15 +7920,15 @@ ins_pipe(ialu_reg_reg); %} -instruct OrL_reg_not_reg(iRegLNoSp dst, - iRegL src1, iRegL src2, immL_M1 m1, +instruct XorI_reg_not_reg(iRegINoSp dst, + iRegIorL2I src1, iRegIorL2I src2, immI_M1 m1, rFlagsReg cr) %{ - match(Set dst (OrL src1 (XorL src2 m1))); - ins_cost(INSN_COST); - format %{ "orn $dst, $src1, $src2" %} - - ins_encode %{ - __ orn(as_Register($dst$$reg), + match(Set dst (XorI m1 (XorI src2 src1))); + ins_cost(INSN_COST); + format %{ "eonw $dst, $src1, $src2" %} + + ins_encode %{ + __ eonw(as_Register($dst$$reg), as_Register($src1$$reg), as_Register($src2$$reg), Assembler::LSL, 0); @@ -7877,10 +7937,10 @@ ins_pipe(ialu_reg_reg); %} -instruct XorI_reg_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, immI_M1 m1, +instruct XorL_reg_not_reg(iRegLNoSp dst, + iRegL src1, iRegL src2, immL_M1 m1, rFlagsReg cr) %{ - match(Set dst (XorI m1 (XorI src2 src1))); + match(Set dst (XorL m1 (XorL src2 src1))); ins_cost(INSN_COST); format %{ "eon $dst, $src1, $src2" %} @@ -7894,25 +7954,8 @@ ins_pipe(ialu_reg_reg); %} -instruct XorL_reg_not_reg(iRegLNoSp dst, - iRegL src1, iRegL src2, immL_M1 m1, - rFlagsReg cr) %{ - match(Set dst (XorL m1 (XorL src2 src1))); - ins_cost(INSN_COST); - format %{ "eon $dst, $src1, $src2" %} - - ins_encode %{ - __ eon(as_Register($dst$$reg), - as_Register($src1$$reg), - as_Register($src2$$reg), - Assembler::LSL, 0); - %} - - ins_pipe(ialu_reg_reg); -%} - instruct AndI_reg_URShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (AndI src1 (XorI(URShiftI src2 src3) src4))); ins_cost(1.9 * INSN_COST); @@ -7948,7 +7991,7 @@ %} instruct 
AndI_reg_RShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (AndI src1 (XorI(RShiftI src2 src3) src4))); ins_cost(1.9 * INSN_COST); @@ -7984,7 +8027,7 @@ %} instruct AndI_reg_LShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (AndI src1 (XorI(LShiftI src2 src3) src4))); ins_cost(1.9 * INSN_COST); @@ -8020,7 +8063,7 @@ %} instruct XorI_reg_URShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (XorI src4 (XorI(URShiftI src2 src3) src1))); ins_cost(1.9 * INSN_COST); @@ -8056,7 +8099,7 @@ %} instruct XorI_reg_RShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (XorI src4 (XorI(RShiftI src2 src3) src1))); ins_cost(1.9 * INSN_COST); @@ -8092,7 +8135,7 @@ %} instruct XorI_reg_LShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (XorI src4 (XorI(LShiftI src2 src3) src1))); ins_cost(1.9 * INSN_COST); @@ -8128,7 +8171,7 @@ %} instruct OrI_reg_URShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (OrI src1 (XorI(URShiftI src2 src3) src4))); ins_cost(1.9 * INSN_COST); @@ -8164,7 +8207,7 @@ %} instruct OrI_reg_RShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst (OrI src1 (XorI(RShiftI src2 src3) src4))); ins_cost(1.9 * INSN_COST); @@ -8200,7 +8243,7 @@ %} instruct OrI_reg_LShift_not_reg(iRegINoSp dst, - iRegI src1, iRegI src2, + iRegIorL2I src1, iRegIorL2I src2, immI src3, immI_M1 src4, rFlagsReg cr) %{ match(Set dst 
 (OrI src1 (XorI(LShiftI src2 src3) src4)));
   ins_cost(1.9 * INSN_COST);
@@ -8236,7 +8279,7 @@
 %}

 instruct AndI_reg_URShift_reg(iRegINoSp dst,
-                         iRegI src1, iRegI src2,
+                         iRegIorL2I src1, iRegIorL2I src2,
                          immI src3, rFlagsReg cr) %{
   match(Set dst (AndI src1 (URShiftI src2 src3)));

From Alexander.Alexeev at caviumnetworks.com Mon Jun 1 14:23:24 2015
From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander)
Date: Mon, 1 Jun 2015 14:23:24 +0000
Subject: [aarch64-port-dev ] UseSHA flag is a bit inconsistent on AArch64
Message-ID:

Hello

I noticed a couple of inconsistencies related to the UseSHA flag on AArch64:

1. The comment for this flag in globals.hpp says
   "Control whether SHA instructions can be used on SPARC".
2. Two rules for the flag that are defined in the test suite are broken
   (although the rules were defined for SPARC):
   a. The UseSHA option should be disabled when all UseSHA*Intrinsics are disabled.
   b. The UseSHA option should be disabled when all UseSHA*Intrinsics are disabled,
      even if the UseSHA flag is passed to the JVM.

Proposed fixes for both issues are below.

--- CUT HERE ---
--- old/src/cpu/aarch64/vm/vm_version_aarch64.cpp 2015-06-01 14:19:20.854027000 +0000
+++ new/src/cpu/aarch64/vm/vm_version_aarch64.cpp 2015-06-01 14:19:20.664027000 +0000
@@ -228,6 +228,9 @@
       warning("SHA512 instruction (for SHA-384 and SHA-512) is not available on this CPU.");
       FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
     }
+    if (!(UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics)) {
+      FLAG_SET_DEFAULT(UseSHA, false);
+    }
   }

   // This machine allows unaligned memory accesses
--- old/src/share/vm/runtime/globals.hpp 2015-06-01 14:19:21.594027000 +0000
+++ new/src/share/vm/runtime/globals.hpp 2015-06-01 14:19:21.374027000 +0000
@@ -639,7 +639,8 @@
           "Control whether AES instructions can be used on x86/x64") \
 \
   product(bool, UseSHA, false, \
-          "Control whether SHA instructions can be used on SPARC") \
+          "Control whether SHA instructions can be used " \
+          "on SPARC and AArch64") \
 \
   product(size_t, LargePageSizeInBytes, 0, \
           "Large page size (0 to let VM choose the page size)") \
--- CUT HERE ---

Wbr,
Alexander

From Alexander.Alexeev at caviumnetworks.com Mon Jun 1 18:38:53 2015
From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander)
Date: Mon, 1 Jun 2015 18:38:53 +0000
Subject: [aarch64-port-dev ] RFR: AARCH64: Fix for failed sha tests (compiler/intrinsics/sha)
Message-ID:

Hello

This patch [1] resolves issues with the SHA-related tests
(hotspot/test/compiler/intrinsics/sha) that fail on aarch64. The names of
the tests are listed below.

The problem is that the tests have no cases for aarch64, so the
architecture falls into the "OtherCPU" category, which is assumed not to
support SHA intrinsics. Adding test cases for AArch64 solves that.

For the "supported" test cases and the "specific unsupported" test cases,
the implementation is just a combination with the existing SPARC versions,
done by renaming classes and adding predicates for aarch64. For the
"unsupported" version, a dedicated aarch64 test case is created, since the
SPARC version doesn't check the warnings for the -XX:+SHA... options and
aarch64 can do that and does.

"supported" - means the CPU supports SHA instructions and intrinsics are available
"unsupported" - means the CPU doesn't have SHA support

Failed tests:
compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnUnsupportedCPU.java
compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java
compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java
compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java
compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java

After applying the proposed patch, a new test starts failing until the
inconsistency in UseSHA flag behavior is resolved. Details, with a fix,
were sent earlier to the aarch64-port mailing list in a separate message.
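For reference, the UseSHA rule that this test exercises can be modelled in a few lines. This is only a sketch of the rule as stated in the earlier message: the real check is C++ code in the VM, and the class and method names below are made up for illustration.

```java
// Simplified model of the flag rule, not HotSpot code.
// ShaFlagModel and effectiveUseSHA are hypothetical names.
final class ShaFlagModel {

    // Returns the value UseSHA should end up with after the VM has
    // settled the three UseSHA*Intrinsics flags: UseSHA stays on only
    // if the user (or the default) asked for it AND at least one SHA
    // intrinsic is still enabled.
    static boolean effectiveUseSHA(boolean useSHA,
                                   boolean sha1Intrinsics,
                                   boolean sha256Intrinsics,
                                   boolean sha512Intrinsics) {
        boolean anyIntrinsic = sha1Intrinsics || sha256Intrinsics || sha512Intrinsics;
        return useSHA && anyIntrinsic;
    }

    public static void main(String[] args) {
        // -XX:+UseSHA with every intrinsic disabled: the flag is forced off.
        System.out.println(effectiveUseSHA(true, false, false, false)); // prints false
        // -XX:+UseSHA with SHA-1 intrinsics available: the flag stays on.
        System.out.println(effectiveUseSHA(true, true, false, false));  // prints true
    }
}
```

The second argument set corresponds to a CPU that supports only some of the SHA instructions; the rule does not require all three intrinsics, only one.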
Failing test:
compiler/intrinsics/sha/cli/TestUseSHAOptionOnSupportedCPU.java

Jtreg hotspot tests
Before: Test results: passed: 818; failed: 34; error: 3
After: Test results: passed: 824; failed: 28; error: 3

The tests weren't executed on SPARC since that architecture is unavailable.

Please review and sponsor if approved.

[1] www.googledrive.com/host/0B5VQvD5uJjDQfmt3OXBJWHFTc2RQN0RsNUpNZ1ZDUFdPSno3VW12eUJnR0Y0TWszNHpaSEU

From edward.nevill at gmail.com Mon Jun 1 19:06:03 2015
From: edward.nevill at gmail.com (edward.nevill at gmail.com)
Date: Mon, 01 Jun 2015 19:06:03 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: 2 new changesets
Message-ID: <201506011906.t51J63KH029334@aojmv0008.oracle.com>

Changeset: 685e10e5d557
Author: thartmann
Date: 2015-03-23 10:13 +0100
URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/685e10e5d557

8075324: Costs of memory operands in aarch64.ad are inconsistent
Summary: Made cost of 'indOffI' consistent to the other memory operands.
Reviewed-by: roland, aph, adinn

! src/cpu/aarch64/vm/aarch64.ad

Changeset: 471988878307
Author: thartmann
Date: 2015-03-23 10:15 +0100
URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/471988878307

8075136: Unnecessary sign extension for byte array access
Summary: Added C2 matching rules to remove unnecessary sign extension for byte array access.
Reviewed-by: roland, kvn, aph, adinn

! src/cpu/aarch64/vm/aarch64.ad
! src/cpu/x86/vm/x86_64.ad

From edward.nevill at linaro.org Tue Jun 2 09:26:35 2015
From: edward.nevill at linaro.org (Edward Nevill)
Date: Tue, 02 Jun 2015 10:26:35 +0100
Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing
Message-ID: <1433237195.16770.13.camel@mylittlepony.linaroharston>

Hi,

The following webrev

http://cr.openjdk.java.net/~enevill/8081669/webrev.00/

fixes a number of TestStable tests.

This patch was contributed by alexander.alexeev at caviumnetworks.com

The following are the test failures that are fixed by this patch:

compiler/stable/TestStableByte.java
compiler/stable/TestStableBoolean.java
compiler/stable/TestStableChar.java
compiler/stable/TestStableFloat.java
compiler/stable/TestStableObject.java
compiler/stable/TestStableDouble.java
compiler/stable/TestStableInt.java
compiler/stable/TestStableLong.java
compiler/stable/TestStableShort.java

The problem is that the method 'get' in StableConfiguration is supposed to return true if the method is server compiled, false otherwise.

On aarch64 it always returns true, even when the method is client compiled. The reason for this is that aarch64 differs from other ports in that it always deopts on patching.

This means that the method 'get' deopts immediately when compiled with -Xcomp, because it hits an unresolved method call, and so the method is now executing in the interpreter.

When the method 'get' is executing in the interpreter, it uses the value of java.vm.name to determine whether the method would be server compiled if it were to be compiled. This ends up returning true on aarch64, because it is a server compiler.

However, in the case where we force it not to server compile by using -XX:+TieredCompilation and -XX:TieredStopAtLevel=1 (as in the TestStable tests), this will be incorrect.

The solution is to introduce a simple (null) method 'get1', which will never be deopted (because there is never anything to patch), and to use this as the basis for deciding whether we are server compiling or not.

This is more fully explained in the inline comment in the patch.

Please review and if appropriate I will push.

All the best,
Ed.
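
A minimal sketch of the decision described in the message above, for readers without the webrev to hand. It is not the code from the patch: the class name, the constants, and passing the compilation level in as a plain int are all assumptions made so that the logic runs on any JVM. In the real test the level would be queried from the VM for the probe method.

```java
// Simplified model of "decide from a probe method that never deopts".
// ServerCompileCheck and isServerCompiled are hypothetical names.
final class ServerCompileCheck {
    static final int LEVEL_NONE = 0; // method is running in the interpreter
    static final int LEVEL_C2 = 4;   // tiered level 4: server (C2) compiled

    // Trivial probe: it references nothing unresolved, so its compiled
    // code never needs patching and is not thrown away for that reason.
    static void get1() { }

    // probeLevel: compilation level observed for get1 (assumed input).
    // vmName: value of the java.vm.name system property.
    static boolean isServerCompiled(int probeLevel, String vmName) {
        if (probeLevel > LEVEL_NONE) {
            // The probe is compiled, so its level is authoritative.
            return probeLevel == LEVEL_C2;
        }
        // Interpreted: fall back to the VM name. This is only a guess and
        // is wrong on a server VM run with -XX:TieredStopAtLevel=1.
        return vmName.contains("Server");
    }

    public static void main(String[] args) {
        get1();
        String vm = System.getProperty("java.vm.name", "");
        // A probe stopped at level 1 is client compiled whatever the VM name says.
        System.out.println(isServerCompiled(1, vm)); // prints false
        // With no compiled probe the answer depends on the VM name alone.
        System.out.println(isServerCompiled(LEVEL_NONE, vm));
    }
}
```

The point of the probe is that the first branch is taken on aarch64 as well, so the java.vm.name fallback no longer decides the result under -XX:TieredStopAtLevel=1.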

From aph at redhat.com Tue Jun 2 10:45:02 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 02 Jun 2015 11:45:02 +0100
Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing
In-Reply-To: <1433237195.16770.13.camel@mylittlepony.linaroharston>
References: <1433237195.16770.13.camel@mylittlepony.linaroharston>
Message-ID: <556D892E.50805@redhat.com>

On 06/02/2015 10:26 AM, Edward Nevill wrote:
> Please review and if appropriate I will push.

That sounds correct to me, but I need a JDK9 reviewer.

Andrew.

From aph at redhat.com Tue Jun 2 12:56:41 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 02 Jun 2015 13:56:41 +0100
Subject: [aarch64-port-dev ] RFR: 8079565: aarch64: Add vectorization support for aarch64
In-Reply-To: <1432658017.17486.32.camel@mylittlepony.linaroharston>
References: <1432658017.17486.32.camel@mylittlepony.linaroharston>
Message-ID: <556DA809.9080305@redhat.com>

On 05/26/2015 05:33 PM, Edward Nevill wrote:
> The following webrev
>
> http://cr.openjdk.java.net/~enevill/8079565/webrev.00/

Looks like a great start, thanks. Can we have a JDK9 reviewer, please?

Thanks,
Andrew.

From vladimir.x.ivanov at oracle.com Tue Jun 2 14:17:17 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 02 Jun 2015 17:17:17 +0300
Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing
In-Reply-To: <1433237195.16770.13.camel@mylittlepony.linaroharston>
References: <1433237195.16770.13.camel@mylittlepony.linaroharston>
Message-ID: <556DBAED.5030701@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 6/2/15 12:26 PM, Edward Nevill wrote:
> Hi,
>
> The following webrev
>
> http://cr.openjdk.java.net/~enevill/8081669/webrev.00/
>
> fixes a number of TestStable tests.
> > From vladimir.x.ivanov at oracle.com Tue Jun 2 16:50:55 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 02 Jun 2015 19:50:55 +0300 Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing In-Reply-To: <556DBAED.5030701@oracle.com> References: <1433237195.16770.13.camel@mylittlepony.linaroharston> <556DBAED.5030701@oracle.com> Message-ID: <556DDEEF.4060703@oracle.com> The only concern I have is that I don't see Alexander on OCA list [1]. In order to proceed with the fix, he should sign OCA first. Best regards, Vladimir Ivanov [1] http://www.oracle.com/technetwork/community/oca-486395.html On 6/2/15 5:17 PM, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 6/2/15 12:26 PM, Edward Nevill wrote: >> Hi, >> >> The following webrev >> >> http://cr.openjdk.java.net/~enevill/8081669/webrev.00/ >> >> fixes a number of TestStable tests. >> >> This patch was contributed by alexander.alexeev at caviumnetworks.com >> >> The following are the test failures that are fixed by this patch >> >> compiler/stable/TestStableByte.java >> compiler/stable/TestStableBoolean.java >> compiler/stable/TestStableChar.java >> compiler/stable/TestStableFloat.java >> compiler/stable/TestStableObject.java >> compiler/stable/TestStableDouble.java >> compiler/stable/TestStableInt.java >> compiler/stable/TestStableLong.java >> compiler/stable/TestStableShort.java >> >> The problem is that the method 'get' in StableConfiguration is >> supposed to return true if the method is server compiled, false >> otherwise. >> >> On aarch64 it is always returning true, even when the method is client >> compiled. The reason for this is that aarch64 differs from other ports >> in that it always deopts on patching. >> >> This means that the method 'get' deopts immediately when compiled with >> -Xcomp because it hits an unresolved method call. This means that the >> method is now executing in the interpreter. 
>>
>> When the method 'get' is executing in the interpreter, it uses the
>> value of java.vm.name to determine whether the method would be server
>> compiled if it were to be compiled. This ends up returning true on
>> aarch64, because it is a server compiler.
>>
>> However in the case where we force it not to server compile by using
>> -XX:+TieredCompilation and -XX:TieredStopAtLevel=1 (as in the
>> TestStable tests) this will be incorrect.
>>
>> The solution is to introduce a simple (null) method 'get1' which will
>> never be deopted (because there is never anything to patch) and uses
>> this as the basis for deciding whether we are server compiling or not.
>>
>> This is more fully explained in the inline comment in the patch.
>>
>> Please review and if appropriate I will push.
>>
>> All the best,
>> Ed.
>>
>>
From Alexander.Alexeev at caviumnetworks.com Tue Jun 2 16:55:29 2015
From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander)
Date: Tue, 2 Jun 2015 16:55:29 +0000
Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing
In-Reply-To: <556DDEEF.4060703@oracle.com>
References: <1433237195.16770.13.camel@mylittlepony.linaroharston> <556DBAED.5030701@oracle.com> <556DDEEF.4060703@oracle.com>
Message-ID:

Vladimir,

I am contributing on behalf of Cavium Inc. It actually exists in the list. Is it enough?

Regards,
Alexander

-----Original Message-----
From: aarch64-port-dev [mailto:aarch64-port-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov
Sent: Tuesday, June 2, 2015 7:51 PM
To: edward.nevill at linaro.org; aarch64-port-dev at openjdk.java.net; hotspot-dev Source Developers
Subject: Re: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing

The only concern I have is that I don't see Alexander on OCA list [1].
In order to proceed with the fix, he should sign OCA first.
Best regards,
Vladimir Ivanov

[1] http://www.oracle.com/technetwork/community/oca-486395.html

On 6/2/15 5:17 PM, Vladimir Ivanov wrote:
> Looks good.
>
> Best regards,
> Vladimir Ivanov
>
> On 6/2/15 12:26 PM, Edward Nevill wrote:
>> Hi,
>>
>> The following webrev
>>
>> http://cr.openjdk.java.net/~enevill/8081669/webrev.00/
>>
>> fixes a number of TestStable tests.
>>
>> This patch was contributed by alexander.alexeev at caviumnetworks.com
>>
>> The following are the test failures that are fixed by this patch
>>
>> compiler/stable/TestStableByte.java
>> compiler/stable/TestStableBoolean.java
>> compiler/stable/TestStableChar.java
>> compiler/stable/TestStableFloat.java
>> compiler/stable/TestStableObject.java
>> compiler/stable/TestStableDouble.java
>> compiler/stable/TestStableInt.java
>> compiler/stable/TestStableLong.java
>> compiler/stable/TestStableShort.java
>>
>> The problem is that the method 'get' in StableConfiguration is
>> supposed to return true if the method is server compiled, false
>> otherwise.
>>
>> On aarch64 it is always returning true, even when the method is
>> client compiled. The reason for this is that aarch64 differs from
>> other ports in that it always deopts on patching.
>>
>> This means that the method 'get' deopts immediately when compiled
>> with -Xcomp because it hits an unresolved method call. This means
>> that the method is now executing in the interpreter.
>>
>> When the method 'get' is executing in the interpreter, it uses the
>> value of java.vm.name to determine whether the method would be server
>> compiled if it were to be compiled. This ends up returning true on
>> aarch64, because it is a server compiler.
>>
>> However in the case where we force it not to server compile by using
>> -XX:+TieredCompilation and -XX:TieredStopAtLevel=1 (as in the
>> TestStable tests) this will be incorrect.
>>
>> The solution is to introduce a simple (null) method 'get1' which will
>> never be deopted (because there is never anything to patch) and uses
>> this as the basis for deciding whether we are server compiling or not.
>>
>> This is more fully explained in the inline comment in the patch.
>>
>> Please review and if appropriate I will push.
>>
>> All the best,
>> Ed.
>>
>>
From edward.nevill at gmail.com Tue Jun 2 17:11:42 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 02 Jun 2015 18:11:42 +0100
Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing
In-Reply-To: <556DDEEF.4060703@oracle.com>
References: <1433237195.16770.13.camel@mylittlepony.linaroharston> <556DBAED.5030701@oracle.com> <556DDEEF.4060703@oracle.com>
Message-ID: <1433265102.1852.5.camel@mint>

Hi Vladimir,

- Alexander is employed as a contractor by Cavium
- Cavium have signed the OCA
- Alexander is using his Cavium email address

I have checked this with Dalibor Topic (on cc) and my understanding was
that this was sufficient to allow Alexander to contribute.

All the best,
Ed.

On Tue, 2015-06-02 at 19:50 +0300, Vladimir Ivanov wrote:
> The only concern I have is that I don't see Alexander on OCA list [1].
> In order to proceed with the fix, he should sign OCA first.
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://www.oracle.com/technetwork/community/oca-486395.html
>
> On 6/2/15 5:17 PM, Vladimir Ivanov wrote:
> > Looks good.
> >
> > Best regards,
> > Vladimir Ivanov
> >
> > On 6/2/15 12:26 PM, Edward Nevill wrote:
> >> Hi,
> >>
> >> The following webrev
> >>
> >> http://cr.openjdk.java.net/~enevill/8081669/webrev.00/
> >>
> >> fixes a number of TestStable tests.
> >>
> >>
From vladimir.x.ivanov at oracle.com Tue Jun 2 17:16:55 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 02 Jun 2015 20:16:55 +0300
Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing
In-Reply-To: <1433265102.1852.5.camel@mint>
References: <1433237195.16770.13.camel@mylittlepony.linaroharston> <556DBAED.5030701@oracle.com> <556DDEEF.4060703@oracle.com> <1433265102.1852.5.camel@mint>
Message-ID: <556DE507.2060403@oracle.com>

Edward, Alexander, thanks for the clarifications!

My bad, missed Cavium in the OCA list.

Best regards,
Vladimir Ivanov

On 6/2/15 8:11 PM, Edward Nevill wrote:
> Hi Vladimir,
>
> - Alexander is employed as a contractor by Cavium
> - Cavium have signed the OCA
> - Alexander is using his Cavium email address
>
> I have checked this with Dalibor Topic (on cc) and my understanding was
> that this was sufficient to allow Alexander to contribute.
>
> All the best,
> Ed.
>
> On Tue, 2015-06-02 at 19:50 +0300, Vladimir Ivanov wrote:
>> The only concern I have is that I don't see Alexander on OCA list [1].
>> In order to proceed with the fix, he should sign OCA first.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://www.oracle.com/technetwork/community/oca-486395.html
>>
>> On 6/2/15 5:17 PM, Vladimir Ivanov wrote:
>>> Looks good.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 6/2/15 12:26 PM, Edward Nevill wrote:
>>>> Hi,
>>>>
>>>> The following webrev
>>>>
>>>> http://cr.openjdk.java.net/~enevill/8081669/webrev.00/
>>>>
>>>> fixes a number of TestStable tests.
>>>>
>>>>
>
>
From edward.nevill at linaro.org Wed Jun 3 08:51:47 2015
From: edward.nevill at linaro.org (Edward Nevill)
Date: Wed, 03 Jun 2015 09:51:47 +0100
Subject: [aarch64-port-dev ] RFR: 8081790: SHA tests fail
Message-ID: <1433321507.32688.13.camel@mylittlepony.linaroharston>

Hi,

The following webrev

http://cr.openjdk.java.net/~enevill/8081790/webrev.00/

fixes a number of SHA test failures on aarch64.

This patch was contributed by alexander.alexeev at caviumnetworks.com

Currently the following JTReg/hotspot SHA tests fail on aarch64

FAILED: compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnUnsupportedCPU.java
FAILED: compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java
FAILED: compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java (ie
FAILED: compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
FAILED: compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java
FAILED: compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java
FAILED: compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java

The reason for the test failures is that the test suite is configured on the assumption that Sparc is the only arch which supports SHA in hw (and therefore supports the -XX:+UseSHA options). The webrev adds tests for aarch64.

The following files have also been renamed as they were inappropriately named.
R test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedSparcCPU.java
R test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedSparcCPU.java
R test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedSparcCPU.java
R test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedSparcCPU.java

These now become

A test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedCPU.java
A test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedCPU.java
A test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedCPU.java
A test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedCPU.java

(ie. the 'Sparc' has been dropped from the filename as Sparc is no longer the only arch which supports SHA).

Tested with JTReg/hotspot

Before: Test results: passed: 840; failed: 10; error: 5
After:  Test results: passed: 847; failed: 3; error: 5

Please review,
Thanks,
Ed.

From edward.nevill at gmail.com Wed Jun 3 14:32:19 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 03 Jun 2015 15:32:19 +0100
Subject: [aarch64-port-dev ] RFR: 8081669: aarch64: JTreg TestStable tests failing
In-Reply-To: <556DE507.2060403@oracle.com>
References: <1433237195.16770.13.camel@mylittlepony.linaroharston> <556DBAED.5030701@oracle.com> <556DDEEF.4060703@oracle.com> <1433265102.1852.5.camel@mint> <556DE507.2060403@oracle.com>
Message-ID: <1433341939.2009.1.camel@mylittlepony.linaroharston>

On Tue, 2015-06-02 at 20:16 +0300, Vladimir Ivanov wrote:
> Edward, Alexander, thanks for the clarifications!
>
> My bad, missed Cavium in the OCA list.
>
> Best regards,
> Vladimir Ivanov
>

NP, Thanks for the review. I have pushed the change,

All the best,
Ed.
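
[Editorial sketch for the 8081669 thread that closes above.] The description in that thread says the interpreted fallback in 'get' decides "server compiled" from java.vm.name alone. The following is a minimal sketch of that kind of name-based check, not the actual StableConfiguration code or the pushed patch; the class name VmNameCheck and the method nameSaysServer are hypothetical, chosen only for illustration.

```java
public class VmNameCheck {
    // Hypothetical helper: guess "server compiled" from the VM name alone,
    // in the spirit of the fallback the thread describes.
    static boolean nameSaysServer(String vmName) {
        return vmName.contains("Server");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name", "");
        // The VM name does not change with command-line flags, so this prints
        // the same answer with or without -XX:TieredStopAtLevel=1. That is why
        // a name-based answer can be wrong when C2 is disabled by flags.
        System.out.println(vmName + " -> server by name: " + nameSaysServer(vmName));
    }
}
```

Running the class once normally and once with -XX:+TieredCompilation -XX:TieredStopAtLevel=1 should print the same line both times, which is the mismatch the thread attributes the test failures to.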
From edward.nevill at linaro.org Thu Jun 4 10:37:55 2015
From: edward.nevill at linaro.org (Edward Nevill)
Date: Thu, 4 Jun 2015 11:37:55 +0100
Subject: [aarch64-port-dev ] RFR: 8081289: aarch64: add support for RewriteFrequentPairs in interpreter
In-Reply-To: <1432732880.11287.10.camel@mylittlepony.linaroharston>
References: <1432732880.11287.10.camel@mylittlepony.linaroharston>
Message-ID:

Hi,

Just a polite ping. I submitted this patch for review by a JDK9 reviewer over a week ago and there has been no response.

This patch was contributed by Alexander Alexeev who is a new contributor to OpenJDK. The patch affects only _aarch64 files and both Alexander and I have verified it by running JTreg hotspot with -Xint.

Thanks for your help,
Ed,

On 27 May 2015 at 14:21, Edward Nevill wrote:

> Hi,
>
> The following webrev adds support for RewriteFrequentPairs to the template
> interpreter for aarch64.
>
> http://cr.openjdk.java.net/~enevill/8081289/webrev.00
>
> This was contributed by Alexander Alexeev (
> alexander.alexeev at caviumnetworks.com)
>
> This gives a small improvement to the interpreter on aarch64, and brings
> it in line with all the other ports (x86, sparc, ppc, zero) which all
> support RewriteFrequentPairs.
>
> I have done some performance measurement using -Xint with some micro
> benchmarks and I see a small improvement on each.
>
> java dhrystone: +9%
> embedded caffeinemark: +4%
> grinderbench: +1%
> dacapo (avrora): +1%
>
> Tested with hotspot jtreg:-
>
> Original: Test results: passed: 787; failed: 24; error: 44
> With patch: Test results: passed: 785; failed: 24; error: 46
>
> The difference in the # of errors is due to timeouts because we are
> running -Xint.
>
> Please review and if OK I will push.
>
> All the best,
> Ed.
>
>
>
From roland.westrelin at oracle.com Thu Jun 4 10:41:12 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 4 Jun 2015 12:41:12 +0200
Subject: [aarch64-port-dev ] RFR: 8081289: aarch64: add support for RewriteFrequentPairs in interpreter
In-Reply-To: <1432732880.11287.10.camel@mylittlepony.linaroharston>
References: <1432732880.11287.10.camel@mylittlepony.linaroharston>
Message-ID: <73430178-4F90-4AC4-B2A9-7591909750ED@oracle.com>

> http://cr.openjdk.java.net/~enevill/8081289/webrev.00

That looks good to me.

Roland.

From roland.westrelin at oracle.com Thu Jun 4 11:29:27 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 4 Jun 2015 13:29:27 +0200
Subject: [aarch64-port-dev ] RFR: 8079565: aarch64: Add vectorization support for aarch64
In-Reply-To: <1432658017.17486.32.camel@mylittlepony.linaroharston>
References: <1432658017.17486.32.camel@mylittlepony.linaroharston>
Message-ID: <1EE4C04D-7286-4ABF-B3FA-3343DC006BEF@oracle.com>

> http://cr.openjdk.java.net/~enevill/8079565/webrev.00/

The platform specific changes look good.
I don't have an opinion on these:

- if (TraceSuperWord && Verbose) {
+ if (TraceSuperWord) {

I never used TraceSuperWord. If someone has opinion, it's time to speak up I guess.

Roland.

From edward.nevill at linaro.org Thu Jun 4 12:28:51 2015
From: edward.nevill at linaro.org (Edward Nevill)
Date: Thu, 4 Jun 2015 13:28:51 +0100
Subject: [aarch64-port-dev ] RFR: 8079565: aarch64: Add vectorization support for aarch64
In-Reply-To: <1EE4C04D-7286-4ABF-B3FA-3343DC006BEF@oracle.com>
References: <1432658017.17486.32.camel@mylittlepony.linaroharston> <1EE4C04D-7286-4ABF-B3FA-3343DC006BEF@oracle.com>
Message-ID:

On 4 June 2015 at 12:29, Roland Westrelin wrote:

> > http://cr.openjdk.java.net/~enevill/8079565/webrev.00/
>
> The platform specific changes look good.
> I don't have an opinion on these:
>
> - if (TraceSuperWord && Verbose) {
> + if (TraceSuperWord) {
>
> I never used TraceSuperWord. If someone has opinion, it's time to speak up
> I guess.

Hi Roland,

Thanks for spotting this. This was an accidental change. I made it non verbose for my own debugging purposes but forgot to revert it.

New webrev

http://cr.openjdk.java.net/~enevill/8079565/webrev.01

Only difference is reverting the changes to src/share/vm/opto/superword.cpp

There are now no changes to shared code, only _aarch64 files.

OK to push?

Ed.

From roland.westrelin at oracle.com Thu Jun 4 13:43:54 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 4 Jun 2015 15:43:54 +0200
Subject: [aarch64-port-dev ] RFR: 8079565: aarch64: Add vectorization support for aarch64
In-Reply-To:
References: <1432658017.17486.32.camel@mylittlepony.linaroharston> <1EE4C04D-7286-4ABF-B3FA-3343DC006BEF@oracle.com>
Message-ID: <0D7936CD-B5BD-4C8C-92BF-0108D3BD7CF0@oracle.com>

> OK to push?

Yes.

Roland.

From edward.nevill at linaro.org Tue Jun 9 17:10:55 2015
From: edward.nevill at linaro.org (Edward Nevill)
Date: Tue, 09 Jun 2015 18:10:55 +0100
Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors
Message-ID: <1433869855.11860.20.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8086087/webrev/

This adds support for 64 bit vectors on aarch64. Previously the vector code only supported 128 bit vectors. 32 bit vectors are not supported directly as aarch64 has no support for 32 bit vectors, however the above webrev will permit 32 bit vectors but just place them in a 64 bit vector.

I have tested this with JTreg hotspot and get the same results before and after the change, viz,

Test results: passed: 845; failed: 12; error: 6

I have also benchmarked the Test*Vect tests from 6340864 in the hotspot test suite. The following are the average results I get on one of our partners HW (lower number is better).

TestByteVect:   128-bit (11.77), 64-bit (4.36)
TestShortVect:  128-bit (5.02),  64-bit (5.22)
TestIntVect:    128-bit (7.81),  64-bit (7.70)
TestLongVect:   128-bit (11.67), 64-bit (11.71)
TestFloatVect:  128-bit (16.75), 64-bit (17.29)
TestDoubleVect: 128-bit (32.37), 64-bit (32.43)

So the only test which shows an improvement is TestByteVect which shows a 2.7x speedup. The other tests are the same within the bounds of experimental error.

The reason TestByteVect shows such an improvement is that with 128 bit vectors it is not being vectorized at all because the loop is not unrolled sufficiently to allow it to be vectorized, whereas with 64 bit vectors it is.

Please review and let me know if this is OK to push?

Ed.

From edward.nevill at linaro.org Wed Jun 10 12:58:51 2015
From: edward.nevill at linaro.org (Edward Nevill)
Date: Wed, 10 Jun 2015 13:58:51 +0100
Subject: [aarch64-port-dev ] RFR: 8085805: aarch64: AdvancedThresholdPolicy lacks tuning of InlineSmallCode size
Message-ID: <1433941131.11860.74.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8085805/webrev/

adds tuning of InlineSmallCode for aarch64.

src/share/vm/runtime/advancedThresholdPolicy.cpp contains the following code which tunes the value of InlineSmallCode for X86 and SPARC.

  // Some inlining tuning
#ifdef X86
  if (FLAG_IS_DEFAULT(InlineSmallCode)) {
    FLAG_SET_DEFAULT(InlineSmallCode, 2000);
  }
#endif

#ifdef SPARC
  if (FLAG_IS_DEFAULT(InlineSmallCode)) {
    FLAG_SET_DEFAULT(InlineSmallCode, 2500);
  }
#endif

  set_increase_threshold_at_ratio();
  set_start_time(os::javaTimeMillis());
}

This webrev proposes changing this so that InlineSmallCode is increased to 2500 on aarch64 rather than the default of 1000. The change is simply to add AARCH64 to the conditional. IE.

#if defined SPARC || defined AARCH64
  if (FLAG_IS_DEFAULT(InlineSmallCode)) {
    FLAG_SET_DEFAULT(InlineSmallCode, 2500);
  }
#endif

This change request was triggered by one of our partners reporting a 6x improvement in one benchmark when the size of InlineSmallCode is increased.

I have done some testing to find the optimal size of InlineSmallCode for aarch64. The following shows the performance of various benchmarks against different sizes of InlineSmallCode.

InlineSmallCode     100    1000    1500    2000    2500    3000    5000

Grinderbench     440543  589792  595603  659213  665973  664663  667865
Stringtest        65182   65304   65211  339946  329314  326831  296886
SpecJVM2008        76.4    90.8    90.9    91.9    89.6    89.2    88.3

The optimal value seems to be about 2000/2500. I have elected for the slightly higher value.

Tested with JTreg/hotspot. In both cases, before and after the patch

Test results: passed: 845; failed: 12; error: 6

Please review,
Thanks,
Ed.

From Alexander.Alexeev at caviumnetworks.com Wed Jun 10 13:14:20 2015
From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander)
Date: Wed, 10 Jun 2015 13:14:20 +0000
Subject: [aarch64-port-dev ] mov between vector and GP register
Message-ID:

Hello

I would like to clarify why two moves below are declared as private in macroAssembler_aarch64.hpp?
What would be correct approach to use them in ins_encode definition in aarch64.ad?

assembler_aarch64.hpp:
2062 // Move from general purpose register
2063 // mov Vd.T[index], Rn
2064 void mov(FloatRegister Vd, SIMD_Arrangement T, int index, Register Xn) {
...
2070 // Move to general purpose register
2071 // mov Rd, Vn.T[index]
2072 void mov(Register Xd, FloatRegister Vn, SIMD_Arrangement T, int index) {

Thanks,
Alexander

From Alexander.Alexeev at caviumnetworks.com Wed Jun 10 14:06:03 2015
From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander)
Date: Wed, 10 Jun 2015 14:06:03 +0000
Subject: [aarch64-port-dev ] population count intrinsic performance
Message-ID:

Hello

I've implemented preliminary version of popCountI (intrinsic for java.lang.Integer.bitCount).
For some reasons performance become worse than it was before with Hacker's Delight version of algorithm.
Pure benchmarking of assembly code showed that new version in contrast should be more efficient (2 cycles shorter).

SIMD - 13 cycles
HD (baseline) - 15 cycles

For evaluation in Java I used JMH

Benchmark                 Mode  Cnt   Score   Error  Units
SIMD
BitCount.bitCountInteger  avgt    5  16.008 ± 0.016  ns/op
Baseline
BitCount.bitCountInteger  avgt    5  11.131 ± 0.069  ns/op

So I wonder what could cause such reverse. Could the reason be in JVM infrastructure and how intrinsics are inlined versus JITed code?
Any ideas are appreciated?

instruct popCountI(iRegINoSp dst, iRegIorL2I src) %{
  match(Set dst (PopCountI src));
  ins_cost(INSN_COST * 13);

  format %{ "popCountI TODO\n\t" %}
  ins_encode %{
    __ mov(vscratch1, __ T1D, 0, as_Register($src$$reg));
    __ cnt(vscratch2, __ T8B, vscratch1);
    __ addv(vscratch1, __ T8B, vscratch2);
    __ mov(as_Register($dst$$reg), vscratch1, __ T1D, 0);
  %}

  ins_pipe(ialu_reg);
%}

Benchmark JMH (just one routine, the rest is as usual)

  @Benchmark
  public void bitCountInteger(final Blackhole bh) {
    bh.consume(java.lang.Integer.bitCount(0));
  }

Thanks,
Alexander

From edward.nevill at gmail.com Wed Jun 10 14:24:28 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 10 Jun 2015 15:24:28 +0100
Subject: [aarch64-port-dev ] population count intrinsic performance
In-Reply-To:
References:
Message-ID: <1433946268.11860.93.camel@mylittlepony.linaroharston>

On Wed, 2015-06-10 at 14:06 +0000, Alexeev, Alexander wrote:
>
>
> instruct popCountI(iRegINoSp dst, iRegIorL2I src) %{
>   match(Set dst (PopCountI src));
>   ins_cost(INSN_COST * 13);
>
>   format %{ "popCountI TODO\n\t" %}
>   ins_encode %{
>     __ mov(vscratch1, __ T1D, 0, as_Register($src$$reg));
>     __ cnt(vscratch2, __ T8B, vscratch1);
>     __ addv(vscratch1, __ T8B, vscratch2);
>     __ mov(as_Register($dst$$reg), vscratch1, __ T1D, 0);
>   %}
>
>   ins_pipe(ialu_reg);
> %}

What are 'vscratch1' & 'vscratch2'. Could you send the complete patch so I can try this out,

Thanks,
Ed.

From vladimir.kozlov at oracle.com Wed Jun 10 18:01:25 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 10 Jun 2015 11:01:25 -0700
Subject: [aarch64-port-dev ] RFR: 8085805: aarch64: AdvancedThresholdPolicy lacks tuning of InlineSmallCode size
In-Reply-To: <1433941131.11860.74.camel@mylittlepony.linaroharston>
References: <1433941131.11860.74.camel@mylittlepony.linaroharston>
Message-ID: <55787B75.8010009@oracle.com>

Looks good to me.

Thanks,
Vladimir

On 6/10/15 5:58 AM, Edward Nevill wrote:
> Hi,
>
> http://cr.openjdk.java.net/~enevill/8085805/webrev/
>
> adds tuning of InlineSmallCode for aarch64.
>
> src/share/vm/runtime/advancedThresholdPolicy.cpp contains the following code which tunes the value of InlineSmallCode for X86 and SPARC.
>
>   // Some inlining tuning
> #ifdef X86
>   if (FLAG_IS_DEFAULT(InlineSmallCode)) {
>     FLAG_SET_DEFAULT(InlineSmallCode, 2000);
>   }
> #endif
>
> #ifdef SPARC
>   if (FLAG_IS_DEFAULT(InlineSmallCode)) {
>     FLAG_SET_DEFAULT(InlineSmallCode, 2500);
>   }
> #endif
>
>   set_increase_threshold_at_ratio();
>   set_start_time(os::javaTimeMillis());
> }
>
> This webrev proposes changing this so that InlineSmallCode is increased to 2500 on aarch64 rather than the default of 1000. The change is simply to add AARCH64 to the conditional. IE.
>
> #if defined SPARC || defined AARCH64
>   if (FLAG_IS_DEFAULT(InlineSmallCode)) {
>     FLAG_SET_DEFAULT(InlineSmallCode, 2500);
>   }
> #endif
>
> This change request was triggered by one of our partners reporting a 6x improvement in one benchmark when the size of InlineSmallCode is increased.
>
> I have done some testing to find the optimal size of InlineSmallCode for aarch64. The following shows the performance of various benchmarks against different sizes of InlineSmallCode.
>
> InlineSmallCode     100    1000    1500    2000    2500    3000    5000
>
> Grinderbench     440543  589792  595603  659213  665973  664663  667865
> Stringtest        65182   65304   65211  339946  329314  326831  296886
> SpecJVM2008        76.4    90.8    90.9    91.9    89.6    89.2    88.3
>
> The optimal value seems to be about 2000/2500. I have elected for the slightly higher value.
>
> Tested with JTreg/hotspot. In both cases, before and after the patch
>
> Test results: passed: 845; failed: 12; error: 6
>
> Please review,
> Thanks,
> Ed.
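
[Editorial sketch.] To make the table quoted above easier to read, the following small Java program computes each benchmark's score at a given InlineSmallCode value relative to the default of 1000. The numbers are copied from the table; the class and method names are hypothetical, and treating a higher score as better for all three rows is an assumption (the message does not state the units).

```java
public class InlineSmallCodeRatios {
    // Column headings and rows copied from the table in the quoted message.
    static final int[] SIZES = {100, 1000, 1500, 2000, 2500, 3000, 5000};
    static final double[] GRINDERBENCH = {440543, 589792, 595603, 659213, 665973, 664663, 667865};
    static final double[] STRINGTEST = {65182, 65304, 65211, 339946, 329314, 326831, 296886};
    static final double[] SPECJVM2008 = {76.4, 90.8, 90.9, 91.9, 89.6, 89.2, 88.3};

    // Find the column for a given InlineSmallCode value.
    static int indexOf(int size) {
        for (int i = 0; i < SIZES.length; i++) {
            if (SIZES[i] == size) return i;
        }
        throw new IllegalArgumentException("no column for size " + size);
    }

    // Score at 'size' divided by score at 'baseSize' for one benchmark row.
    static double ratio(double[] scores, int size, int baseSize) {
        return scores[indexOf(size)] / scores[indexOf(baseSize)];
    }

    public static void main(String[] args) {
        int base = 1000; // the default named in the message
        for (int size : new int[] {2000, 2500}) {
            System.out.printf("InlineSmallCode=%d vs %d: Grinderbench %.3f, Stringtest %.3f, SpecJVM2008 %.3f%n",
                    size, base,
                    ratio(GRINDERBENCH, size, base),
                    ratio(STRINGTEST, size, base),
                    ratio(SPECJVM2008, size, base));
        }
    }
}
```

On these figures Stringtest gains roughly 5x at both 2000 and 2500, Grinderbench gains a little over 10%, and SpecJVM2008 moves by about one percent in either direction, which matches the message's reading that the optimum sits around 2000/2500.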
>
>
>
From Alexander.Alexeev at caviumnetworks.com Wed Jun 10 18:41:01 2015
From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander)
Date: Wed, 10 Jun 2015 18:41:01 +0000
Subject: [aarch64-port-dev ] population count intrinsic performance
In-Reply-To: <1433946268.11860.93.camel@mylittlepony.linaroharston>
References: <1433946268.11860.93.camel@mylittlepony.linaroharston>
Message-ID:

Ed, I removed those 'vscratch1' & 'vscratch2' as redundant.
Patch is below.

Regards,
Alexander

--- CUT HERE ---
diff -r 11af3990d56c src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad Thu Jun 04 18:50:05 2015 -0700
+++ b/src/cpu/aarch64/vm/aarch64.ad Wed Jun 10 18:12:27 2015 +0000
@@ -7402,6 +7402,40 @@
   ins_pipe(ialu_reg);
 %}
 
+//---------- Population Count Instructions -------------------------------------
+//
+
+instruct popCountI(iRegINoSp dst, iRegIorL2I src) %{
+  match(Set dst (PopCountI src));
+  ins_cost(INSN_COST * 13);
+
+  format %{ "TODO popCountI\n\t" %}
+  ins_encode %{
+    __ mov(v0, __ T1D, 0, as_Register($src$$reg));
+    __ cnt(v1, __ T8B, v0);
+    __ addv(v0, __ T8B, v1);
+    __ mov(as_Register($dst$$reg), v0, __ T1D, 0);
+  %}
+
+  ins_pipe(ialu_reg);
+%}
+
+// Note: Long.bitCount(long) returns an int.
+instruct popCountL(iRegINoSp dst, iRegL src) %{
+  match(Set dst (PopCountL src));
+  ins_cost(INSN_COST * 13);
+
+  format %{ "TODO popCountL\n\t" %}
+  ins_encode %{
+    __ mov(v0, __ T1D, 0, as_Register($src$$reg));
+    __ cnt(v1, __ T8B, v0);
+    __ addv(v0, __ T8B, v1);
+    __ mov(as_Register($dst$$reg), v0, __ T1D, 0);
+  %}
+
+  ins_pipe(ialu_reg);
+%}
+
 // ============================================================================
 // MemBar Instruction

diff -r 11af3990d56c src/cpu/aarch64/vm/assembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/assembler_aarch64.hpp Thu Jun 04 18:50:05 2015 -0700
+++ b/src/cpu/aarch64/vm/assembler_aarch64.hpp Wed Jun 10 18:12:27 2015 +0000
@@ -2050,6 +2050,9 @@
   INSN(negr, 1, 0b100000101110);
   INSN(notr, 1, 0b100000010110);
   INSN(addv, 0, 0b110001101110);
+  INSN(cls, 0, 0b100000010010);
+  INSN(clz, 1, 0b100000010010);
+  INSN(cnt, 0, 0b100000010110);

 #undef INSN

diff -r 11af3990d56c src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Thu Jun 04 18:50:05 2015 -0700
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Wed Jun 10 18:12:27 2015 +0000
@@ -36,6 +36,7 @@
 class MacroAssembler: public Assembler {
   friend class LIR_Assembler;

+ public:
   using Assembler::mov;
   using Assembler::movi;

--- CUT HERE ---

> -----Original Message-----
> From: Edward Nevill [mailto:edward.nevill at gmail.com]
> Sent: Wednesday, June 10, 2015 5:24 PM
> To: Alexeev, Alexander
> Cc: aarch64-port-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] population count intrinsic performance
>
> On Wed, 2015-06-10 at 14:06 +0000, Alexeev, Alexander wrote:
>
> >
> >
> > instruct popCountI(iRegINoSp dst, iRegIorL2I src) %{
> >   match(Set dst (PopCountI src));
> >   ins_cost(INSN_COST * 13);
> >
> >   format %{ "popCountI TODO\n\t" %}
> >   ins_encode %{
> >     __ mov(vscratch1, __ T1D, 0, as_Register($src$$reg));
> >     __ cnt(vscratch2, __ T8B, vscratch1);
> >     __ addv(vscratch1, __ T8B, vscratch2);
> >     __ mov(as_Register($dst$$reg), vscratch1, __ T1D, 0);
> >   %}
> >
> >   ins_pipe(ialu_reg);
> > %}
>
> What are 'vscratch1' & 'vscratch2'. Could you send the complete patch so I
> can try this out,
>
> Thanks,
> Ed.
>

From edward.nevill at gmail.com Wed Jun 10 20:34:48 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 10 Jun 2015 21:34:48 +0100
Subject: [aarch64-port-dev ] population count intrinsic performance
In-Reply-To:
References: <1433946268.11860.93.camel@mylittlepony.linaroharston>
Message-ID: <1433968488.1036.39.camel@mint>

On Wed, 2015-06-10 at 18:41 +0000, Alexeev, Alexander wrote:
> Ed, I removed those 'vscratch1' & 'vscratch2' as redundant.
> Patch is below.

Ah, I see. vscratch1 & vscratch2 are just two v registers you carved out as scratch registers from the vector register set. But you need to let the register allocator know!

>
> +//---------- Population Count Instructions -------------------------------------
> +//
> +
> +instruct popCountI(iRegINoSp dst, iRegIorL2I src) %{
> +  match(Set dst (PopCountI src));
> +  ins_cost(INSN_COST * 13);
> +
> +  format %{ "TODO popCountI\n\t" %}
> +  ins_encode %{
> +    __ mov(v0, __ T1D, 0, as_Register($src$$reg));
> +    __ cnt(v1, __ T8B, v0);
> +    __ addv(v0, __ T8B, v1);
> +    __ mov(as_Register($dst$$reg), v0, __ T1D, 0);
> +  %}

So here, registers v0 and v1 might already be allocated so you cannot just use them. Also, I don't understand why you need v0 and v1. I think what you want is something like

instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegD tmp) %{
  match(Set dst (PopCountI src));
  ins_cost(INSN_COST * 13);
  effect(TEMP tmp);

  format ...
  ins_encode %{
    __ mov(tmp, __ T1D, 0, as_Register($src$$reg));
    __ cnt(tmp, __ T8B, tmp);
    __ addv(tmp, __ T8B, tmp);
    __ mov(as_Register($dst$$reg), tmp, __ T1D, 0);
  %}

(I haven't tried this, just typed it into the email, so there may be typos).
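
[Editorial sketch.] The four-instruction sequence discussed above moves the value into a 64-bit vector lane, counts the set bits in each of the eight bytes (cnt on T8B), then sums the eight byte counts (addv). The following Java code is a plain-software emulation of those two steps, intended only to show why the sequence yields a population count; it is not the HotSpot implementation. The class and method names are hypothetical. The zero-extension in popCountInt is an assumption about what a 32-bit variant needs, since moving a full 64-bit register would also count any bits set in its upper half.

```java
public class ByteLanePopCount {
    // Emulates "cnt Vd.8B, Vn.8B": each byte of the result holds the number
    // of set bits in the corresponding byte of the input.
    static long cntPerByte(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);                          // 2-bit sums
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);  // 4-bit sums
        x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;                          // per-byte sums
        return x;
    }

    // Emulates "addv Bd, Vn.8B": add the eight byte lanes together.
    static int addAcrossBytes(long lanes) {
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += (int) ((lanes >>> (8 * i)) & 0xFF);
        }
        return sum;
    }

    // 64-bit population count built from the two emulated steps.
    static int popCountLong(long x) {
        return addAcrossBytes(cntPerByte(x));
    }

    // 32-bit variant: zero-extend first so only the low 32 bits are counted.
    static int popCountInt(int x) {
        return popCountLong(x & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 0x8000000000000001L, 0x0123456789ABCDEFL};
        for (long v : samples) {
            System.out.println(Long.toHexString(v) + ": " + popCountLong(v)
                    + " (Long.bitCount says " + Long.bitCount(v) + ")");
        }
    }
}
```

The results can be compared against Long.bitCount and Integer.bitCount, which is what the intrinsic replaces.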
> +
> +  ins_pipe(ialu_reg);
> +%}

I think this should be ins_pipe(pipe_class_default) for consistency with all
the other SIMD instructions for which we haven't implemented pipeline
scheduling.

> +
>  // ============================================================================
>  // MemBar Instruction
>
> diff -r 11af3990d56c src/cpu/aarch64/vm/assembler_aarch64.hpp
> --- a/src/cpu/aarch64/vm/assembler_aarch64.hpp Thu Jun 04 18:50:05 2015 -0700
> +++ b/src/cpu/aarch64/vm/assembler_aarch64.hpp Wed Jun 10 18:12:27 2015 +0000
> @@ -2050,6 +2050,9 @@
>    INSN(negr, 1, 0b100000101110);
>    INSN(notr, 1, 0b100000010110);
>    INSN(addv, 0, 0b110001101110);
> +  INSN(cls, 0, 0b100000010010);
> +  INSN(clz, 1, 0b100000010010);
> +  INSN(cnt, 0, 0b100000010110);
>
>  #undef INSN
>
> diff -r 11af3990d56c src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
> --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Thu Jun 04 18:50:05 2015 -0700
> +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Wed Jun 10 18:12:27 2015 +0000
> @@ -36,6 +36,7 @@
>  class MacroAssembler: public Assembler {
>    friend class LIR_Assembler;
>
> + public:
>    using Assembler::mov;
>    using Assembler::movi;

Looks fine, I think these should be public.

All the best,
Ed.

From Alexander.Alexeev at caviumnetworks.com Thu Jun 11 08:10:30 2015
From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander)
Date: Thu, 11 Jun 2015 08:10:30 +0000
Subject: [aarch64-port-dev ] population count intrinsic performance
In-Reply-To: <1433968488.1036.39.camel@mint>
References: <1433946268.11860.93.camel@mylittlepony.linaroharston> <1433968488.1036.39.camel@mint>
Message-ID:

> But you need to let the register allocator know!
This is the main reason why I called this patch preliminary and it was a
mistake to neglect that. Now it is clear.

After applying the recommended changes, results for both versions are the same.

Baseline:
Benchmark                  Mode  Cnt   Score    Error  Units
BitCount.bitCountInteger   avgt    5  11.004 ±  0.000  ns/op
BitCount.bitCountLong      avgt    5  11.005 ±  0.000  ns/op

SIMD version:
Benchmark                  Mode  Cnt   Score    Error  Units
BitCount.bitCountInteger   avgt    5  11.004 ±  0.001  ns/op
BitCount.bitCountLong      avgt    5  11.004 ±  0.000  ns/op

Updated patch is below.

--- CUT HERE ---
diff -r 93cc4d7535ce src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad Wed Jun 10 12:29:07 2015 +0000
+++ b/src/cpu/aarch64/vm/aarch64.ad Thu Jun 11 07:28:28 2015 +0000
@@ -7402,6 +7402,42 @@
   ins_pipe(ialu_reg);
 %}

+//---------- Population Count Instructions -------------------------------------
+//
+
+instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegD tmp) %{
+  match(Set dst (PopCountI src));
+  effect(TEMP tmp);
+  ins_cost(INSN_COST * 13);
+
+  format %{ "TODO popCountI\n\t" %}
+  ins_encode %{
+    __ mov($tmp$$FloatRegister, __ T1D, 0, as_Register($src$$reg));
+    __ cnt($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
+    __ addv($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
+    __ mov(as_Register($dst$$reg), $tmp$$FloatRegister, __ T1D, 0);
+  %}
+
+  ins_pipe(pipe_class_default);
+%}
+
+// Note: Long.bitCount(long) returns an int.
+instruct popCountL(iRegINoSp dst, iRegL src, vRegD tmp) %{
+  match(Set dst (PopCountL src));
+  effect(TEMP tmp);
+  ins_cost(INSN_COST * 13);
+
+  format %{ "TODO popCountL\n\t" %}
+  ins_encode %{
+    __ mov($tmp$$FloatRegister, __ T1D, 0, as_Register($src$$reg));
+    __ cnt($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
+    __ addv($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
+    __ mov(as_Register($dst$$reg), $tmp$$FloatRegister, __ T1D, 0);
+  %}
+
+  ins_pipe(pipe_class_default);
+%}
+
 // ============================================================================
 // MemBar Instruction

diff -r 93cc4d7535ce src/cpu/aarch64/vm/assembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/assembler_aarch64.hpp Wed Jun 10 12:29:07 2015 +0000
+++ b/src/cpu/aarch64/vm/assembler_aarch64.hpp Thu Jun 11 07:28:28 2015 +0000
@@ -2050,6 +2050,9 @@
   INSN(negr, 1, 0b100000101110);
   INSN(notr, 1, 0b100000010110);
   INSN(addv, 0, 0b110001101110);
+  INSN(cls, 0, 0b100000010010);
+  INSN(clz, 1, 0b100000010010);
+  INSN(cnt, 0, 0b100000010110);

 #undef INSN

diff -r 93cc4d7535ce src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Wed Jun 10 12:29:07 2015 +0000
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Thu Jun 11 07:28:28 2015 +0000
@@ -36,6 +36,7 @@
 class MacroAssembler: public Assembler {
   friend class LIR_Assembler;

+ public:
   using Assembler::mov;
   using Assembler::movi;
--- CUT HERE ---

Regards,
Alexander

From edward.nevill at gmail.com Thu Jun 11 16:20:24 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 11 Jun 2015 17:20:24 +0100
Subject: [aarch64-port-dev ] population count intrinsic performance
In-Reply-To:
References: <1433946268.11860.93.camel@mylittlepony.linaroharston> <1433968488.1036.39.camel@mint>
Message-ID: <1434039624.7052.50.camel@mint>

On Thu, 2015-06-11 at 08:10 +0000, Alexeev, Alexander wrote:
> +
> +instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegD tmp) %{
> +  match(Set dst (PopCountI src));
> +  effect(TEMP tmp);
> +  ins_cost(INSN_COST * 13);
> +
> +  format %{ "TODO popCountI\n\t" %}
> +  ins_encode %{
> +    __ mov($tmp$$FloatRegister, __ T1D, 0, as_Register($src$$reg));
> +    __ cnt($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
> +    __ addv($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
> +    __ mov(as_Register($dst$$reg), $tmp$$FloatRegister, __ T1D, 0);
> +  %}

I think there may be a problem with the way 'src' is used here. You are
assuming that the top 32 bits of src are 0. However, this may not be the case
if, for example, src is the result of an elided L2I conversion. See the
following comment in aarch64.ad

// iRegIorL2I is used for src inputs in rules for 32 bit int (I)
// operations. it allows the src to be either an iRegI or a (ConvL2I
// iRegL). in the latter case the l2i normally planted for a ConvL2I
// can be elided because the 32-bit instruction will just employ the
// lower 32 bits anyway.

Now, what I am not clear on, is whether if you just use iRegI here
rather than iRegIorL2I you are guaranteed that the top 32 bits are 0.

All the best,
Ed.

From aph at redhat.com Thu Jun 11 16:24:16 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 11 Jun 2015 17:24:16 +0100
Subject: [aarch64-port-dev ] population count intrinsic performance
In-Reply-To: <1434039624.7052.50.camel@mint>
References: <1433946268.11860.93.camel@mylittlepony.linaroharston> <1433968488.1036.39.camel@mint> <1434039624.7052.50.camel@mint>
Message-ID: <5579B630.4060406@redhat.com>

On 06/11/2015 05:20 PM, Edward Nevill wrote:
> Now, what I am not clear on, is whether if you just use iRegI here
> rather than iRegIorL2I you are guaranteed that the top 32 bits are 0.

If you can't use movw then src should be an iRegI.

Also, this:

  __ mov($tmp$$FloatRegister, __ T1D, 0, as_Register($src$$reg));

could be

  __ mov($tmp$$FloatRegister, __ T1D, 0, $src$$Register);

Andrew.
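
[Editorial sketch, not part of the thread.] The concern Ed raises can be shown
with a small scalar C++ model: if the full 64-bit register is moved into the
vector register and the upper 32 bits are not zero, the int popcount comes out
too large. The function names are hypothetical, and the "low word" variant only
assumes that a 32-bit register move zero-extends, which is the usual AArch64
behaviour for writes to W registers. Whether iRegI itself guarantees clean
upper bits is the open question in the messages above and is not answered here.

```cpp
#include <cassert>
#include <cstdint>

// Portable 64-bit population count, standing in for the cnt/addv pair.
static uint32_t popcount64(uint64_t v) {
    uint32_t n = 0;
    while (v != 0) {
        v &= v - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Models moving the whole X register into the vector register.
// Stale upper bits (e.g. after an elided L2I) are counted as well.
uint32_t popcount_int_full_register(uint64_t reg) {
    return popcount64(reg);
}

// Models zero-extending the low word first, so only bits 0..31 contribute.
uint32_t popcount_int_low_word(uint64_t reg) {
    uint64_t low = reg & 0xFFFFFFFFULL;
    uint32_t n = popcount64(low);
    assert(n <= 32);  // an int can have at most 32 set bits
    return n;
}
```

With a register holding 0xFFFFFFFF00000001, the first function reports 33 and
the second reports 1; only the second matches what Integer.bitCount(1) should
return.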
From Alexander.Alexeev at caviumnetworks.com Thu Jun 11 16:54:31 2015 From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander) Date: Thu, 11 Jun 2015 16:54:31 +0000 Subject: [aarch64-port-dev ] population count intrinsic performance In-Reply-To: <5579B630.4060406@redhat.com> References: <1433946268.11860.93.camel@mylittlepony.linaroharston> <1433968488.1036.39.camel@mint> <1434039624.7052.50.camel@mint> <5579B630.4060406@redhat.com> Message-ID: > > Now, what I am not clear on, is whether if you just use iRegI here > > rather than iRregIorL2I you are guaranteed that the top 32 bits are 0. > > If you can't use movw then src should be an iRegI. Agreed. >__ mov($tmp$$FloatRegister, __ T1D, 0, $src$$Register); I've fixed that. What would be the next step? From aph at redhat.com Thu Jun 11 17:00:09 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 11 Jun 2015 18:00:09 +0100 Subject: [aarch64-port-dev ] population count intrinsic performance In-Reply-To: References: <1433946268.11860.93.camel@mylittlepony.linaroharston> <1433968488.1036.39.camel@mint> <1434039624.7052.50.camel@mint> <5579B630.4060406@redhat.com> Message-ID: <5579BE99.1090600@redhat.com> On 06/11/2015 05:54 PM, Alexeev, Alexander wrote: >>> Now, what I am not clear on, is whether if you just use iRegI here >>> rather than iRregIorL2I you are guaranteed that the top 32 bits are 0. >> >> If you can't use movw then src should be an iRegI. > Agreed. > >> __ mov($tmp$$FloatRegister, __ T1D, 0, $src$$Register); > I've fixed that. > > What would be the next step? Post it as an RFR to hotspot-dev Andrew. From edward.nevill at linaro.org Fri Jun 12 08:49:03 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Fri, 12 Jun 2015 09:49:03 +0100 Subject: [aarch64-port-dev ] RFR: 8081790: SHA tests fail In-Reply-To: <1433321507.32688.13.camel@mylittlepony.linaroharston> References: <1433321507.32688.13.camel@mylittlepony.linaroharston> Message-ID: Hi, Sorry to bother. 
The following was posted for review 9 days ago but there has been no response. This is an aarch64 only change to resolve 7 jtreg/hotspot failures. Could a JDK9 reviewer please take a look at this, Thanks, Ed. On 3 June 2015 at 09:51, Edward Nevill wrote: > Hi, > > The following webrev > > http://cr.openjdk.java.net/~enevill/8081790/webrev.00/ > > fixes a number of SHA test failures on aarch64. > > This patch was contributed by alexander.alexeev at caviumnetworks.com > > Currently the following JTReg/hotspot SHA tests fail on aarch64 > > FAILED: > compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnUnsupportedCPU.java > FAILED: > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java > FAILED: compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java > (ie > FAILED: compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > FAILED: compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java > FAILED: compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java > FAILED: compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > > The reason for the test failures is that the test suite is configured on > the assumption that Sparc is the only arch which support SHA in hw (and > therefore supports the -XX:+UseSHA options). > > The webrev adds tests for aarch64. > > The following files have also been renamed as they were inappropriately > named. 
> > R > test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedSparcCPU.java > R > test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedSparcCPU.java > R > test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedSparcCPU.java > R > test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedSparcCPU.java > > These now become > > A > test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedCPU.java > A > test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedCPU.java > A > test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedCPU.java > A > test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedCPU.java > > (ie. the 'Sparc' has been dropped from the filename as Sparc is no longer > the only arch which supports SHA). > > Tested with JTReg/hotspot > > Before: Test results: passed: 840; failed: 10; error: 5 > After: Test results: passed: 847; failed: 3; error: 5 > > Please review, > > Thanks, > Ed. > > > From david.holmes at oracle.com Fri Jun 12 08:59:51 2015 From: david.holmes at oracle.com (David Holmes) Date: Fri, 12 Jun 2015 18:59:51 +1000 Subject: [aarch64-port-dev ] RFR: 8081790: SHA tests fail In-Reply-To: References: <1433321507.32688.13.camel@mylittlepony.linaroharston> Message-ID: <557A9F87.3010902@oracle.com> Hi Ed, On 12/06/2015 6:49 PM, Edward Nevill wrote: > Hi, > > Sorry to bother. The following was posted for review 9 days ago but there > has been no response. > > This is an aarch64 only change to resolve 7 jtreg/hotspot failures. > > Could a JDK9 reviewer please take a look at this, The test changes are shared code so this needs someone from the compiler team to review and sponsor. Thanks, David > Thanks, > Ed. 
> > On 3 June 2015 at 09:51, Edward Nevill wrote: > >> Hi, >> >> The following webrev >> >> http://cr.openjdk.java.net/~enevill/8081790/webrev.00/ >> >> fixes a number of SHA test failures on aarch64. >> >> This patch was contributed by alexander.alexeev at caviumnetworks.com >> >> Currently the following JTReg/hotspot SHA tests fail on aarch64 >> >> FAILED: >> compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnUnsupportedCPU.java >> FAILED: >> compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java >> FAILED: compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java >> (ie >> FAILED: compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >> FAILED: compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java >> FAILED: compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java >> FAILED: compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >> >> The reason for the test failures is that the test suite is configured on >> the assumption that Sparc is the only arch which support SHA in hw (and >> therefore supports the -XX:+UseSHA options). >> >> The webrev adds tests for aarch64. >> >> The following files have also been renamed as they were inappropriately >> named. 
>> >> R >> test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedSparcCPU.java >> R >> test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedSparcCPU.java >> R >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedSparcCPU.java >> R >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedSparcCPU.java >> >> These now become >> >> A >> test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedCPU.java >> A >> test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedCPU.java >> A >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedCPU.java >> A >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedCPU.java >> >> (ie. the 'Sparc' has been dropped from the filename as Sparc is no longer >> the only arch which supports SHA). >> >> Tested with JTReg/hotspot >> >> Before: Test results: passed: 840; failed: 10; error: 5 >> After: Test results: passed: 847; failed: 3; error: 5 >> >> Please review, >> >> Thanks, >> Ed. >> >> >> From edward.nevill at gmail.com Fri Jun 12 09:42:45 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 12 Jun 2015 10:42:45 +0100 Subject: [aarch64-port-dev ] RFR: 8081790: SHA tests fail In-Reply-To: <557A9F87.3010902@oracle.com> References: <1433321507.32688.13.camel@mylittlepony.linaroharston> <557A9F87.3010902@oracle.com> Message-ID: <1434102165.7052.61.camel@mint> On Fri, 2015-06-12 at 18:59 +1000, David Holmes wrote: > Hi Ed, > > On 12/06/2015 6:49 PM, Edward Nevill wrote: > > Hi, > > > > Sorry to bother. The following was posted for review 9 days ago but there > > has been no response. > > > > This is an aarch64 only change to resolve 7 jtreg/hotspot failures. > > > > Could a JDK9 reviewer please take a look at this, > > The test changes are shared code so this needs someone from the compiler > team to review and sponsor. 
Yes, thanks for pointing this out. Could someone from the compiler team please review and sponsor, Ed. > > > > On 3 June 2015 at 09:51, Edward Nevill wrote: > > > >> Hi, > >> > >> The following webrev > >> > >> http://cr.openjdk.java.net/~enevill/8081790/webrev.00/ > >> > >> fixes a number of SHA test failures on aarch64. > >> > >> This patch was contributed by alexander.alexeev at caviumnetworks.com > >> > >> Currently the following JTReg/hotspot SHA tests fail on aarch64 > >> > >> FAILED: > >> compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnUnsupportedCPU.java > >> FAILED: > >> compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java > >> FAILED: compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java > >> (ie > >> FAILED: compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > >> FAILED: compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java > >> FAILED: compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java > >> FAILED: compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > >> > >> The reason for the test failures is that the test suite is configured on > >> the assumption that Sparc is the only arch which support SHA in hw (and > >> therefore supports the -XX:+UseSHA options). > >> > >> The webrev adds tests for aarch64. > >> > >> The following files have also been renamed as they were inappropriately > >> named. 
> >> > >> R > >> test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedSparcCPU.java > >> R > >> test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedSparcCPU.java > >> R > >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedSparcCPU.java > >> R > >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedSparcCPU.java > >> > >> These now become > >> > >> A > >> test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedCPU.java > >> A > >> test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedCPU.java > >> A > >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedCPU.java > >> A > >> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedCPU.java > >> > >> (ie. the 'Sparc' has been dropped from the filename as Sparc is no longer > >> the only arch which supports SHA). > >> > >> Tested with JTReg/hotspot > >> > >> Before: Test results: passed: 840; failed: 10; error: 5 > >> After: Test results: passed: 847; failed: 3; error: 5 > >> > >> Please review, > >> > >> Thanks, > >> Ed. > >> > >> > >> From vladimir.kozlov at oracle.com Fri Jun 12 17:12:47 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 12 Jun 2015 10:12:47 -0700 Subject: [aarch64-port-dev ] RFR: 8081790: SHA tests fail In-Reply-To: <557A9F87.3010902@oracle.com> References: <1433321507.32688.13.camel@mylittlepony.linaroharston> <557A9F87.3010902@oracle.com> Message-ID: <557B130F.1030209@oracle.com> Changes looks fine to me I will sponsor it. Thanks, Vladimir On 6/12/15 1:59 AM, David Holmes wrote: > Hi Ed, > > On 12/06/2015 6:49 PM, Edward Nevill wrote: >> Hi, >> >> Sorry to bother. The following was posted for review 9 days ago but there >> has been no response. >> >> This is an aarch64 only change to resolve 7 jtreg/hotspot failures. 
>> >> Could a JDK9 reviewer please take a look at this, > > The test changes are shared code so this needs someone from the compiler > team to review and sponsor. > > Thanks, > David > >> Thanks, >> Ed. >> >> On 3 June 2015 at 09:51, Edward Nevill wrote: >> >>> Hi, >>> >>> The following webrev >>> >>> http://cr.openjdk.java.net/~enevill/8081790/webrev.00/ >>> >>> fixes a number of SHA test failures on aarch64. >>> >>> This patch was contributed by alexander.alexeev at caviumnetworks.com >>> >>> Currently the following JTReg/hotspot SHA tests fail on aarch64 >>> >>> FAILED: >>> compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnUnsupportedCPU.java >>> >>> FAILED: >>> compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java >>> >>> FAILED: >>> compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java >>> (ie >>> FAILED: compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>> FAILED: compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java >>> FAILED: >>> compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java >>> FAILED: compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>> >>> The reason for the test failures is that the test suite is configured on >>> the assumption that Sparc is the only arch which support SHA in hw (and >>> therefore supports the -XX:+UseSHA options). >>> >>> The webrev adds tests for aarch64. >>> >>> The following files have also been renamed as they were inappropriately >>> named. 
>>> >>> R >>> test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedSparcCPU.java >>> >>> R >>> test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedSparcCPU.java >>> >>> R >>> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedSparcCPU.java >>> >>> R >>> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedSparcCPU.java >>> >>> >>> These now become >>> >>> A >>> test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedCPU.java >>> >>> A >>> test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedCPU.java >>> >>> A >>> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedCPU.java >>> >>> A >>> test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedCPU.java >>> >>> >>> (ie. the 'Sparc' has been dropped from the filename as Sparc is no >>> longer >>> the only arch which supports SHA). >>> >>> Tested with JTReg/hotspot >>> >>> Before: Test results: passed: 840; failed: 10; error: 5 >>> After: Test results: passed: 847; failed: 3; error: 5 >>> >>> Please review, >>> >>> Thanks, >>> Ed. >>> >>> >>> From goetz.lindenmaier at sap.com Mon Jun 15 15:30:16 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 15 Jun 2015 15:30:16 +0000 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long Message-ID: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> Hi, Could someone please have a look at this change? I had a look whether I can push the functionality down to make_runtime_call(). This would simplify matters a lot. But as the TypeFunc is hashed, I can not change it any more in make_runtime_call(). @aarch-people: I saw you have CCallingConventionRequiresIntsAsLongs set. Could you please check whether this breaks your intinsics, e.g., multiplyToLen? 
(We assure in sharedRuntime_ppc.cpp, c_calling_convention() that no INT types end up there.) Thanks, Goetz -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Lindenmaier, Goetz Sent: Dienstag, 9. Juni 2015 17:18 To: HotSpot Developers Subject: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long Hi, we are working on porting the recently* added intrinsics to PPC. As these use runtime calls, the calls must obey to the platform ABI, which requires that ints are passed as longs. We made a similar change in "8024342: PPC64 (part 111): Support for C calling conventions that require 64-bit ints." It adapts the calls if CCallingConventionRequiresIntsAsLongs is set. This change adapts the calls to multiplyToLen, CRC32, AES, SHA accordingly. Please review this change. I please need a sponsor. http://cr.openjdk.java.net/~goetz/webrevs/8086069-call_conv/webrev.01/ Best regards, Goetz * i.e., added after making our initial port From vladimir.kozlov at oracle.com Mon Jun 15 16:16:35 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Jun 2015 09:16:35 -0700 Subject: [aarch64-port-dev ] RFR: aarch64: Bit count intrinsic implementation for aarch64 In-Reply-To: References: <557ECAA9.1030706@redhat.com> Message-ID: <557EFA63.50203@oracle.com> Andrew, please, help Alexeev to file JBS bug and publish webrev on cr.openjdk. We can't review or accept patches which are not on cr. server. There are several hotspot tests which check bitCount intrinsics (for example, compiler//codegen/6378821/Test6378821.java) but not full range of values. Note, jdk tests should be run with -Xcomp to make sure methods are compiled and intrinsics are used. Thanks, Vladimir On 6/15/15 6:51 AM, Alexeev, Alexander wrote: > >> Do any of those tests actually test popcount? 
>
> Relevant JDK tests also all pass
> jdk/test/java/lang/Long/*
> jdk/test/java/lang/Integer/*
>
> Regards,
> Alexander
>

From aph at redhat.com Mon Jun 15 16:21:10 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 15 Jun 2015 17:21:10 +0100
Subject: [aarch64-port-dev ] AArch64: vectorization fails RSA crypto tests
Message-ID: <557EFB76.5050308@redhat.com>

java.math.BigInteger::add([I[I)[I gets miscompiled.

There is a

  ldr q16, [x17,x10,lsl #4]

which should be a

  ldr q16, [x17,x10]

Andrew.


diff -r 6217fd2c767b src/cpu/aarch64/vm/assembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/assembler_aarch64.hpp Fri Jun 12 16:09:45 2015 +0100
+++ b/src/cpu/aarch64/vm/assembler_aarch64.hpp Mon Jun 15 17:16:58 2015 +0100
@@ -491,6 +491,11 @@
       i->rf(_index, 16);
       i->f(_ext.option(), 15, 13);
       unsigned size = i->get(31, 30);
+      if (i->get(26, 26) && i->get(23, 23)) {
+        // SIMD Q Type - Size = 128 bits
+        assert(size == 0, "bad size");
+        size = 0b100;
+      }
       if (size == 0) // It's a byte
         i->f(_ext.shift() >= 0, 12);
       else {

From aph at redhat.com Mon Jun 15 16:22:50 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 15 Jun 2015 17:22:50 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: Bit count intrinsic implementation for aarch64
In-Reply-To: <557EFA63.50203@oracle.com>
References: <557ECAA9.1030706@redhat.com> <557EFA63.50203@oracle.com>
Message-ID: <557EFBDA.5050404@redhat.com>

On 06/15/2015 05:16 PM, Vladimir Kozlov wrote:
> Andrew, please, help Alexeev to file JBS bug and publish webrev on cr.openjdk. We can't review or accept patches which
> are not on cr. server.

Sure, I'll do that. Given that he now has done the paperwork, is
there anything to prevent him from having an account on cr.openjdk?

Andrew.
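
[Editorial sketch, not part of the thread.] For the `ldr q16` miscompile
reported in the vectorization message above, the difference between the two
encodings can be seen with a small C++ model of AArch64 register-offset
addressing. The function name and parameters are made up for this
illustration; it models only the address calculation, not HotSpot's encoder.

```cpp
#include <cassert>
#include <cstdint>

// Effective address of an AArch64 register-offset load
//   ldr <Rt>, [Xn, Xm{, lsl #amount}]
// where the S bit selects amount = log2(access size in bytes), else 0.
uint64_t reg_offset_address(uint64_t base, uint64_t index,
                            bool s_bit, unsigned log2_size) {
    assert(log2_size <= 4);  // 1, 2, 4, 8 or 16 byte accesses
    unsigned shift = s_bit ? log2_size : 0;
    return base + (index << shift);
}
```

For a 128-bit (Q register) access, `log2_size` is 4. With a base of 0x1000 and
a byte offset of 0x20 in the index register, the intended unscaled form reads
from 0x1020, while the same instruction with the S bit set reads from 0x1200,
which is the kind of wrong address the `lsl #4` form would produce.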
From edward.nevill at linaro.org Mon Jun 15 20:24:59 2015
From: edward.nevill at linaro.org (Edward Nevill)
Date: Mon, 15 Jun 2015 21:24:59 +0100
Subject: [aarch64-port-dev ] AArch64: vectorization fails RSA crypto tests
In-Reply-To: <557EFB76.5050308@redhat.com>
References: <557EFB76.5050308@redhat.com>
Message-ID:

On 15 June 2015 at 17:21, Andrew Haley wrote:
>
>
> diff -r 6217fd2c767b src/cpu/aarch64/vm/assembler_aarch64.hpp
> --- a/src/cpu/aarch64/vm/assembler_aarch64.hpp Fri Jun 12 16:09:45 2015 +0100
> +++ b/src/cpu/aarch64/vm/assembler_aarch64.hpp Mon Jun 15 17:16:58 2015 +0100
> @@ -491,6 +491,11 @@
>        i->rf(_index, 16);
>        i->f(_ext.option(), 15, 13);
>        unsigned size = i->get(31, 30);
> +      if (i->get(26, 26) && i->get(23, 23)) {
> +        // SIMD Q Type - Size = 128 bits
> +        assert(size == 0, "bad size");
> +        size = 0b100;
> +      }
>        if (size == 0) // It's a byte
>          i->f(_ext.shift() >= 0, 12);
>        else {
>

Oops, sorry about that.

The following fixes the assertion failure in
java/math/BigInteger/BigIntegerTest.java

diff -r 6217fd2c767b src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Fri Jun 12 16:09:45 2015 +0100
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Mon Jun 15 20:20:12 2015 +0000
@@ -477,8 +477,11 @@
   // T1D/T2D: invalid
   void mov(FloatRegister Vd, SIMD_Arrangement T, u_int32_t imm32) {
     assert(T != T1D && T != T2D, "invalid arrangement");
+    if (T == T8B || T == T16B) {
+      movi(Vd, T, imm32 & 0xff, 0);
+      return;
+    }
     u_int32_t nimm32 = ~imm32;
-    if (T == T8B || T == T16B) { imm32 &= 0xff; nimm32 &= 0xff; }
     if (T == T4H || T == T8H) { imm32 &= 0xffff; nimm32 &= 0xffff; }
     u_int32_t x = imm32;
     int movi_cnt = 0;

Would you like me to merge these two as a single patch, file a JBS
"regression" bug report and push a cr?

All the best,
Ed.
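
[Editorial sketch, not part of the thread.] The early return in Ed's patch
relies on the byte form of `movi` replicating an 8-bit immediate into every
lane, so one instruction is enough for the T8B and T16B arrangements. A scalar
C++ model of the 64-bit (8B) case is below; `movi_8b` is a name invented for
this illustration, and the model does not explain why the previous code path
hit the assertion.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of "movi Vd.8B, #imm8": the 8-bit immediate is replicated
// into every byte lane of the 64-bit result.
uint64_t movi_8b(uint32_t imm32) {
    uint64_t byte = imm32 & 0xff;  // same masking as the patch: imm32 & 0xff
    uint64_t splat = byte * 0x0101010101010101ULL;
    assert((splat & 0xff) == byte);
    return splat;
}
```

Under this model `movi_8b(0x1234AB)` yields 0xABABABABABABABAB: only the low
byte of the immediate matters, and every lane receives it.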
From dean.long at oracle.com Tue Jun 16 00:07:34 2015 From: dean.long at oracle.com (Dean Long) Date: Mon, 15 Jun 2015 17:07:34 -0700 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long In-Reply-To: <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> Message-ID: <557F68C6.4050805@oracle.com> This may be a dumb (but hopefully related) question, but why do we need to add top() for _LP64: 4364 #ifdef _LP64 4365 #define XTOP ,top() /*additional argument*/ 4366 #else //_LP64 4367 #define XTOP /*no additional argument*/ 4368 #endif //_LP64 4396 make_runtime_call(RC_LEAF|RC_NO_FP, 4397 OptoRuntime::fast_arraycopy_Type(), 4398 StubRoutines::unsafe_arraycopy(), 4399 "unsafe_arraycopy", 4400 TypeRawPtr::BOTTOM, 4401 src, dst, size XTOP); And why only for"size", but not "src" and "dst"? dl On 6/15/2015 1:47 PM, John Rose wrote: > This change surprises me. Sometimes our machine-independent IR needs #ifdefs, Matcher queries, or flag tests to deal with platform stuff we haven't factorized properly. In this case a flag test is "apologizing" for oddly-sized argument registers at the IR level. > > But TypeFuncs and the rest of the IR do not talk about such details of calling conventions. A C2 type is only about the information content of arguments and return values, not their register bindings. The lower level function SharedRuntime::c_calling_convention determines exact bindings of values to argument and return value registers, using the type VMRegPair. It is likely that there is some awkwardness in assigning a *pair* of regs (representing a single 64-bit register) to carry a 32-bit value, but surely this is less complex, and more to the point, than hacking conversions from 32- to 64-bit values at the IR level. 
> > I would expect that, if your approach is to work, there should be an assert in SharedRuntime::c_calling_convention saying that the 32-bit types T_INT, etc., are *never* presented to the SR::ccc/VMRegPair layer of the code. But, as it seems to me, it would be less disruptive to the overall design if SR::ccc can be presented with T_INT types, and be free to return an indication of which 64-bit register will carry that value. The low-level move instructions which push data into those argument registers can be specialized to those target registers (in the AD file) if there is a need for filling up the 32 extra bits (sign or zero). > > HTH > ? John > > On Jun 15, 2015, at 8:30 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> Could someone please have a look at this change? >> >> I had a look whether I can push the functionality down to make_runtime_call(). >> This would simplify matters a lot. But as the TypeFunc is hashed, I can not >> change it any more in make_runtime_call(). >> >> @aarch-people: I saw you have CCallingConventionRequiresIntsAsLongs set. >> Could you please check whether this breaks your intinsics, e.g., multiplyToLen? >> (We assure in sharedRuntime_ppc.cpp, c_calling_convention() that no INT types >> end up there.) >> >> Thanks, >> Goetz >> >> -----Original Message----- >> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Lindenmaier, Goetz >> Sent: Dienstag, 9. Juni 2015 17:18 >> To: HotSpot Developers >> Subject: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long >> >> Hi, >> >> we are working on porting the recently* added intrinsics to PPC. As these use >> runtime calls, the calls must obey to the platform ABI, which requires that ints >> are passed as longs. >> >> We made a similar change in "8024342: PPC64 (part 111): Support for C calling >> conventions that require 64-bit ints." It adapts the calls if >> CCallingConventionRequiresIntsAsLongs is set. 
>>
>> This change adapts the calls to multiplyToLen, CRC32, AES, SHA accordingly.
>>
>> Please review this change. I please need a sponsor.
>> http://cr.openjdk.java.net/~goetz/webrevs/8086069-call_conv/webrev.01/
>>
>> Best regards,
>> Goetz
>>
>>
>> * i.e., added after making our initial port

From dean.long at oracle.com Tue Jun 16 03:57:05 2015
From: dean.long at oracle.com (Dean Long)
Date: Mon, 15 Jun 2015 20:57:05 -0700
Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long
In-Reply-To:
References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> <557F68C6.4050805@oracle.com>
Message-ID: <557F9E91.7020603@oracle.com>

On 6/15/2015 5:26 PM, John Rose wrote:
> On Jun 15, 2015, at 5:07 PM, Dean Long
> wrote:
>>
>> This may be a dumb (but hopefully related) question, but why do we
>> need to add top() for _LP64:
>>
>> 4364 #ifdef _LP64
>> 4365 #define XTOP ,top() /*additional argument*/
>> 4366 #else //_LP64
>> 4367 #define XTOP /*no additional argument*/
>> 4368 #endif //_LP64
>>
>> 4396 make_runtime_call(RC_LEAF|RC_NO_FP,
>> 4397                   OptoRuntime::fast_arraycopy_Type(),
>> 4398                   StubRoutines::unsafe_arraycopy(),
>> 4399                   "unsafe_arraycopy",
>> 4400                   TypeRawPtr::BOTTOM,
>> 4401                   src, dst, size XTOP);
>>
>> And why only for "size", but not "src" and "dst"?
>
> That is one of the awkward places we jam in LP64-specific code.
> Java has no size_t type; the closest it gets is "long".
> But the compiler uses Java types to set up runtime stub call signatures.
> So if we want the compiler to pass a size_t value to a stub, it has to
> pass a long on LP64 and int on ILP32.
> (There's no need for an int-in-a-long here, since size_t is never 32
> bits on an int-in-a-long machine.)
> To complete the embarrassment, the compiler has an internal convention
> of always representing the twin word for longs and doubles (see JVMS).
> Net result is that if you want to ask for a size_t in the JIT on LP64, > you have to #ifdef in a jlong, and pass a "top" twin word. > ? John Thanks for the explanation. It sounds like we are modeling the abstract Java stack representation of long and double, and this wouldn't be easy to change, because I see things like "TypeFunc::Parms + 1" and "argument(2)" that would need to change before this could go away. dl From goetz.lindenmaier at sap.com Tue Jun 16 07:23:01 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 16 Jun 2015 07:23:01 +0000 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long In-Reply-To: <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CFF9F65@DEWDFEMB12A.global.corp.sap> Hi John, thanks for looking at this change! The PPC ABI says that int arguments must properly be extended to 64-bit: http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html#PARAM-PASS "Simple integer types (char, short, int, long, enum) are mapped to a single doubleword. Values shorter than a doubleword are sign or zero extended as necessary." We achieve this by adding an I2L node for arguments < 64-bit. To assure proper typing of the IR, we have to adapt the function type and parameter list accordingly. Obviously, we have to deal with the fact that longs occupy 2 slots. That's not nice, but currently necessary. The assertion you mention is in sharedRuntime_ppc.cpp:738. The approach works, it's just not implemented for the new intrinsics. Also, I was looking for a generic solution, where I don't have to adapt each runtime call added to the parser. Sure we could issue sign extending instructions along with the call node during emitting to the code buffer. 
But if we add this early during parsing, the nodes are subject to optimization. Best regards, Goetz. -----Original Message----- From: John Rose [mailto:john.r.rose at oracle.com] Sent: Montag, 15. Juni 2015 22:47 To: Lindenmaier, Goetz Cc: HotSpot Developers; aarch64-port-dev at openjdk.java.net Subject: Re: Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long This change surprises me. Sometimes our machine-independent IR needs #ifdefs, Matcher queries, or flag tests to deal with platform stuff we haven't factorized properly. In this case a flag test is "apologizing" for oddly-sized argument registers at the IR level. But TypeFuncs and the rest of the IR do not talk about such details of calling conventions. A C2 type is only about the information content of arguments and return values, not their register bindings. The lower level function SharedRuntime::c_calling_convention determines exact bindings of values to argument and return value registers, using the type VMRegPair. It is likely that there is some awkwardness in assigning a *pair* of regs (representing a single 64-bit register) to carry a 32-bit value, but surely this is less complex, and more to the point, than hacking conversions from 32- to 64-bit values at the IR level. I would expect that, if your approach is to work, there should be an assert in SharedRuntime::c_calling_convention saying that the 32-bit types T_INT, etc., are *never* presented to the SR::ccc/VMRegPair layer of the code. But, as it seems to me, it would be less disruptive to the overall design if SR::ccc can be presented with T_INT types, and be free to return an indication of which 64-bit register will carry that value. The low-level move instructions which push data into those argument registers can be specialized to those target registers (in the AD file) if there is a need for filling up the 32 extra bits (sign or zero). HTH ? 
John On Jun 15, 2015, at 8:30 AM, Lindenmaier, Goetz wrote: > > Hi, > > Could someone please have a look at this change? > > I had a look whether I can push the functionality down to make_runtime_call(). > This would simplify matters a lot. But as the TypeFunc is hashed, I can not > change it any more in make_runtime_call(). > > @aarch-people: I saw you have CCallingConventionRequiresIntsAsLongs set. > Could you please check whether this breaks your intinsics, e.g., multiplyToLen? > (We assure in sharedRuntime_ppc.cpp, c_calling_convention() that no INT types > end up there.) > > Thanks, > Goetz > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Lindenmaier, Goetz > Sent: Dienstag, 9. Juni 2015 17:18 > To: HotSpot Developers > Subject: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long > > Hi, > > we are working on porting the recently* added intrinsics to PPC. As these use > runtime calls, the calls must obey to the platform ABI, which requires that ints > are passed as longs. > > We made a similar change in "8024342: PPC64 (part 111): Support for C calling > conventions that require 64-bit ints." It adapts the calls if > CCallingConventionRequiresIntsAsLongs is set. > > This change adapts the calls to multiplyToLen, CRC32, AES, SHA accordingly. > > Please review this change. I please need a sponsor. 
> http://cr.openjdk.java.net/~goetz/webrevs/8086069-call_conv/webrev.01/ > > Best regards, > Goetz > > > * i.e., added after making our initial port From john.r.rose at oracle.com Mon Jun 15 20:47:17 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 15 Jun 2015 13:47:17 -0700 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> Message-ID: <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> This change surprises me. Sometimes our machine-independent IR needs #ifdefs, Matcher queries, or flag tests to deal with platform stuff we haven't factorized properly. In this case a flag test is "apologizing" for oddly-sized argument registers at the IR level. But TypeFuncs and the rest of the IR do not talk about such details of calling conventions. A C2 type is only about the information content of arguments and return values, not their register bindings. The lower level function SharedRuntime::c_calling_convention determines exact bindings of values to argument and return value registers, using the type VMRegPair. It is likely that there is some awkwardness in assigning a *pair* of regs (representing a single 64-bit register) to carry a 32-bit value, but surely this is less complex, and more to the point, than hacking conversions from 32- to 64-bit values at the IR level. I would expect that, if your approach is to work, there should be an assert in SharedRuntime::c_calling_convention saying that the 32-bit types T_INT, etc., are *never* presented to the SR::ccc/VMRegPair layer of the code. But, as it seems to me, it would be less disruptive to the overall design if SR::ccc can be presented with T_INT types, and be free to return an indication of which 64-bit register will carry that value. 
The low-level move instructions which push data into those argument registers can be specialized to those target registers (in the AD file) if there is a need for filling up the 32 extra bits (sign or zero). HTH -- John On Jun 15, 2015, at 8:30 AM, Lindenmaier, Goetz wrote: > > Hi, > > Could someone please have a look at this change? > > I looked at whether I can push the functionality down to make_runtime_call(). > This would simplify matters a lot. But as the TypeFunc is hashed, I can not > change it any more in make_runtime_call(). > > @aarch-people: I saw you have CCallingConventionRequiresIntsAsLongs set. > Could you please check whether this breaks your intrinsics, e.g., multiplyToLen? > (We ensure in sharedRuntime_ppc.cpp, c_calling_convention() that no INT types > end up there.) > > Thanks, > Goetz > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Lindenmaier, Goetz > Sent: Tuesday, 9 June 2015 17:18 > To: HotSpot Developers > Subject: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long > > Hi, > > we are working on porting the recently* added intrinsics to PPC. As these use > runtime calls, the calls must obey the platform ABI, which requires that ints > are passed as longs. > > We made a similar change in "8024342: PPC64 (part 111): Support for C calling > conventions that require 64-bit ints." It adapts the calls if > CCallingConventionRequiresIntsAsLongs is set. > > This change adapts the calls to multiplyToLen, CRC32, AES, SHA accordingly. > > Please review this change. I also need a sponsor.
> http://cr.openjdk.java.net/~goetz/webrevs/8086069-call_conv/webrev.01/ > > Best regards, > Goetz > > > * i.e., added after making our initial port
From john.r.rose at oracle.com Tue Jun 16 00:26:36 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 15 Jun 2015 17:26:36 -0700 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long In-Reply-To: <557F68C6.4050805@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> <557F68C6.4050805@oracle.com> Message-ID: On Jun 15, 2015, at 5:07 PM, Dean Long wrote: > > This may be a dumb (but hopefully related) question, but why do we need to add top() for _LP64: > > #ifdef _LP64 > #define XTOP ,top() /*additional argument*/ > #else //_LP64 > #define XTOP /*no additional argument*/ > #endif //_LP64 > > make_runtime_call(RC_LEAF|RC_NO_FP, > OptoRuntime::fast_arraycopy_Type(), > StubRoutines::unsafe_arraycopy(), > "unsafe_arraycopy", > TypeRawPtr::BOTTOM, > src, dst, size XTOP); > > And why only for "size", but not "src" and "dst"? That is one of the awkward places we jam in LP64-specific code. Java has no size_t type; the closest it gets is "long". But the compiler uses Java types to set up runtime stub call signatures. So if we want the compiler to pass a size_t value to a stub, it has to pass a long on LP64 and int on ILP32. (There's no need for an int-in-a-long here, since size_t is never 32 bits on an int-in-a-long machine.) To complete the embarrassment, the compiler has an internal convention of always representing the twin word for longs and doubles (see JVMS). Net result is that if you want to ask for a size_t in the JIT on LP64, you have to #ifdef in a jlong, and pass a "top" twin word. --
John From john.r.rose at oracle.com Tue Jun 16 04:13:02 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 15 Jun 2015 21:13:02 -0700 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long In-Reply-To: <557F9E91.7020603@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> <557F68C6.4050805@oracle.com> <557F9E91.7020603@oracle.com> Message-ID: On Jun 15, 2015, at 8:57 PM, Dean Long wrote: > > Thanks for the explanation. It sounds like we are modeling the abstract Java stack representation of long and double, and this wouldn't be > easy to change, because I see things like "TypeFunc::Parms + 1" and "argument(2)" that would need to change before this could go away. Indeed. Slot pairs are a mess, an optimization (or concession) for platforms that no longer matter. (Primitives might look like that in a few years.) Some messes in HotSpot stem (IMO) from excessive attention to the bytecode syntax, designing a managed execution engine around the oddities of a code format. In an ideal world, I would like to isolate, deprecate, and eventually remove the "evil twin" slots, since they no longer have any meaning (except maybe on some 32-bit systems). Doing it at all levels will be hard, except in the context of some other breaking change. But it could be done locally in the JVM, removing the notion of twin slots from modules that don't have an absolute need to work with them. JITs shouldn't have to know about them, IMO; maybe not even the interpreter (though that would involve a renumbering prepass). When we get value types, we may be able to make such a change, even to the bytecode syntax itself. Or perhaps we will perpetuate the "evil twin" convention, but apply it to all value types (plus long and double). 
Or perhaps (happy thought) we can make every value/ref/prim occupy one stack slot, in some bytecode of the future. -- John
From aph at redhat.com Tue Jun 16 07:58:37 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 16 Jun 2015 08:58:37 +0100 Subject: [aarch64-port-dev ] AArch64: vectorization fails RSA crypto tests In-Reply-To: References: <557EFB76.5050308@redhat.com> Message-ID: <557FD72D.9060904@redhat.com> On 15/06/15 21:24, Edward Nevill wrote: > Would you like me to merge these two as a single patch, file a JBS > "regression" bug report and push a cr. Yes please, Andrew.
From aph at redhat.com Tue Jun 16 15:44:06 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 16 Jun 2015 16:44:06 +0100 Subject: [aarch64-port-dev ] AArch64: vectorization fails RSA crypto tests In-Reply-To: References: <557EFB76.5050308@redhat.com> Message-ID: <55804446.7010100@redhat.com> On 06/15/2015 09:24 PM, Edward Nevill wrote: > Would you like me to merge these two as a single patch, file a JBS > "regression" bug report and push a cr. While you're at it: mov(FloatRegister, SIMD_Arrangement, u_int32_t) is a bit too large to be in a header file. Thx, Andrew.
From tangwei6 at huawei.com Thu Jun 18 07:08:29 2015 From: tangwei6 at huawei.com (Tangwei (Euler)) Date: Thu, 18 Jun 2015 07:08:29 +0000 Subject: [aarch64-port-dev ] failed to build JDK9 Message-ID: Hi All, I cloned the latest openJDK9 for aarch64 on Ubuntu and configure failed with the following error message. Does anyone know how to solve this issue? The message suggests installing libfreetype6-dev, but the library has already been installed. configure: Could not compile and link with freetype. This might be a 32/64-bit mismatch. configure: Using FREETYPE_CFLAGS=-I/usr/include/freetype2 and FREETYPE_LIBS=-lfreetype configure: error: Can not continue without freetype. You might be able to fix this by running 'sudo apt-get install libfreetype6-dev'. configure exiting with result code 1 Regards!
wei
From Alexander.Alexeev at caviumnetworks.com Thu Jun 18 07:40:47 2015 From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander) Date: Thu, 18 Jun 2015 07:40:47 +0000 Subject: [aarch64-port-dev ] failed to build JDK9 In-Reply-To: References: Message-ID: Hello Wei, Did you try the --debug-configure flag? It might provide some information on the source of the problem. Check that the lib and includes are available on the specified path. Regards, Alexander > -----Original Message----- > From: aarch64-port-dev [mailto:aarch64-port-dev- > bounces at openjdk.java.net] On Behalf Of Tangwei (Euler) > Sent: Thursday, June 18, 2015 10:08 AM > To: aarch64-port-dev at openjdk.java.net > Subject: [aarch64-port-dev ] failed to build JDK9 > > Hi All, > I cloned the latest openJDK9 for aarch64 on Ubuntu and failed to configure > with following error message. > Anyone knows how to solve this issue? From the message, it suggested to > install libfreetype6-dev, but the library has already been installed. > > configure: Could not compile and link with freetype. This might be a 32/64-bit > mismatch. > configure: Using FREETYPE_CFLAGS=-I/usr/include/freetype2 and > FREETYPE_LIBS=-lfreetype > configure: error: Can not continue without freetype. You might be able to fix > this by running 'sudo apt-get install libfreetype6-dev'. > configure exiting with result code 1 > > > Regards! > wei
From edward.nevill at linaro.org Thu Jun 18 09:35:37 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Thu, 18 Jun 2015 10:35:37 +0100 Subject: [aarch64-port-dev ] failed to build JDK9 In-Reply-To: References: Message-ID: Hi, I can successfully build jdk9 on Ubuntu 14.04. You may like to try building openjdk-7 first, as follows, to ensure all the dependencies are correct for openjdk-7.
apt-get source openjdk-7-jdk
cd openjdk-7-7u51-2.4.6 (name may vary slightly)
dpkg-buildpackage 2>&1 | tee ../log
If any dependencies are missing this will tell you exactly what packages to install.
Once the dependencies are correct for openjdk-7 then you can retry jdk9. Note that in order to build jdk9 you will need to have jdk8 installed. You can download a pre-built binary from http://openjdk.linaro.org (follow the releases tab). All the best, Edward Nevill On 18 June 2015 at 08:08, Tangwei (Euler) wrote: > Hi All, > I cloned the latest openJDK9 for aarch64 on Ubuntu and failed to > configure with following error message. > Anyone knows how to solve this issue? From the message, it suggested to > install libfreetype6-dev, > but the library has already been installed. > > configure: Could not compile and link with freetype. This might be a > 32/64-bit mismatch. > configure: Using FREETYPE_CFLAGS=-I/usr/include/freetype2 and > FREETYPE_LIBS=-lfreetype > configure: error: Can not continue without freetype. You might be able to > fix this by running 'sudo apt-get install libfreetype6-dev'. > configure exiting with result code 1 > > > Regards! > wei >
From tangwei6 at huawei.com Thu Jun 18 14:36:17 2015 From: tangwei6 at huawei.com (Tangwei (Euler)) Date: Thu, 18 Jun 2015 14:36:17 +0000 Subject: [aarch64-port-dev ] failed to build JDK9 In-Reply-To: References: Message-ID: I forgot to mention that I am cross-compiling for aarch64 on an x64 platform. My configure command line is below; the same command line worked for openJDK8. According to the configure log, the directories containing the freetype2 headers and libfreetype.so for aarch64 need to be specified.
./configure --enable-option-checking=fatal --openjdk-target=aarch64-linux-gnu --enable-unlimited-crypto --with-zlib=system --with-stdc++lib=dynamic CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++
My issue was solved by adding the two options below to the configure command:
--with-freetype-include=aarch64-toolchain/sysroot/usr/include/freetype2/ --with-freetype-lib=aarch64-toolchain/sysroot/usr/lib/
Thanks a lot for everyone's kind help! Regards!
wei From: Edward Nevill [mailto:edward.nevill at linaro.org] Sent: Thursday, June 18, 2015 5:36 PM To: Tangwei (Euler) Cc: aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] failed to build JDK9 Hi, I can successfully build jdk9 on ubuntu 14.04. You may like to try initially building openjdk-7 as follows to ensure all the dependencies are correct for openjdk-7. apt-get source openjdk-7-jdk cd openjdk-7-7u51-2.4.6 (name may vary slightly) dpkg-buildpackage 2>&1 | tee ../log If any dependencies are missing this will tell you exactly what packages to install. Once the dependencies are correct for openjdk-7 then you can retry jdk9. Note that in order to build jdk9 you will need to have jdk8 installed. You can download a pre-built binary from http://openjdk.linaro.org (follow the releases tab). All the best, Edward Nevill On 18 June 2015 at 08:08, Tangwei (Euler) wrote: Hi All, I cloned the latest openJDK9 for aarch64 on Ubuntu and failed to configure with following error message. Anyone knows how to solve this issue? From the message, it suggested to install libfreetype6-dev, but the library has already been installed. configure: Could not compile and link with freetype. This might be a 32/64-bit mismatch. configure: Using FREETYPE_CFLAGS=-I/usr/include/freetype2 and FREETYPE_LIBS=-lfreetype configure: error: Can not continue without freetype. You might be able to fix this by running 'sudo apt-get install libfreetype6-dev'. configure exiting with result code 1 Regards! wei
From tangwei6 at huawei.com Thu Jun 18 15:00:23 2015 From: tangwei6 at huawei.com (Tangwei (Euler)) Date: Thu, 18 Jun 2015 15:00:23 +0000 Subject: [aarch64-port-dev ] abort when running jdk9 on ARM64 In-Reply-To: References: Message-ID: Hi All, I can build openjdk9 successfully, but the program aborts when I just run 'java' without any options. The stack follows; has anyone met the same issue before?
Stack: [0x000003ffa5340000,0x000003ffa5540000], sp=0x000003ffa553e1c0, free space=2040k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x8fb4e4] VMError::report_and_die()+0x130 V [libjvm.so+0x3df4a0] report_vm_error(char const*, int, char const*, char const*)+0x68 V [libjvm.so+0x73066c] Monitor::wait(bool, long, bool)+0x22c V [libjvm.so+0x773d10] os::create_thread(Thread*, os::ThreadType, unsigned long)+0x1a8 V [libjvm.so+0x491efc] GCTaskThread::GCTaskThread(GCTaskManager*, unsigned int, unsigned int)+0x60 V [libjvm.so+0x491320] GCTaskManager::initialize()+0x338 V [libjvm.so+0x7905cc] ParallelScavengeHeap::initialize()+0x334 V [libjvm.so+0x8c2a94] Universe::initialize_heap()+0x11c V [libjvm.so+0x8c2c60] universe_init()+0x34 V [libjvm.so+0x4f8348] init_globals()+0x54 V [libjvm.so+0x8a6d98] Threads::create_vm(JavaVMInitArgs*, bool*)+0x2ac V [libjvm.so+0x56d688] JNI_CreateJavaVM+0x78 C [libjli.so+0x2a64] JavaMain+0x8c C [libpthread.so.0+0x7c50] start_thread+0xb0 C [libc.so.6+0xdac60] thread_start+0x30 Regards! wei From: Edward Nevill [mailto:edward.nevill at linaro.org] Sent: Thursday, June 18, 2015 5:36 PM To: Tangwei (Euler) Cc: aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] failed to build JDK9 Hi, I can successfully build jdk9 on ubuntu 14.04. You may like to try initially building openjdk-7 as follows to ensure all the dependancies are correct for openjdk-7. apt-get source openjdk-7-jdk cd openjdk-7-7u51-2.4.6 (name may vary slightly) dpkg-buildpackage 2>&1 | tee ../log If any dependancies are missing this will tell you exactly what packages to install. Once the dependancies are correct for openjdk-7 then you can retry jdk9. Note that in order to build jdk9 you will need to have jdk8 installed. You can download a pre-built binary from http://openjdk.linaro.org (follow the releases tab). 
All the best, Edward Nevill On 18 June 2015 at 08:08, Tangwei (Euler) > wrote: Hi All, I cloned the latest openJDK9 for aarch64 on Ubuntu and failed to configure with following error message. Anyone knows how to solve this issue? From the message, it suggested to install libfreetype6-dev, but the library has already been installed. configure: Could not compile and link with freetype. This might be a 32/64-bit mismatch. configure: Using FREETYPE_CFLAGS=-I/usr/include/freetype2 and FREETYPE_LIBS=-lfreetype configure: error: Can not continue without freetype. You might be able to fix this by running 'sudo apt-get install libfreetype6-dev'. configure exiting with result code 1 Regards! wei From edward.nevill at gmail.com Thu Jun 18 15:19:25 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 18 Jun 2015 16:19:25 +0100 Subject: [aarch64-port-dev ] abort when running jdk9 on ARM64 In-Reply-To: References: Message-ID: <1434640765.8420.4.camel@mint> Hi wei, You could try downloading the latest jdk9 prebuilt binary from http://openjdk.linaro.org. Here is a link http://openjdk.linaro.org/releases/jdk9-server-release-1505.tar.xz If this works then it is likely to be a problem with your build. All the best, Ed. On Thu, 2015-06-18 at 15:00 +0000, Tangwei (Euler) wrote: > Hi All, > I can build out openjdk9 successfully, but the program will abort when just running ?java? without any options: > Following is stack, anyone has met the same issue before? 
> > Stack: [0x000003ffa5340000,0x000003ffa5540000], sp=0x000003ffa553e1c0, free space=2040k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x8fb4e4] VMError::report_and_die()+0x130 > V [libjvm.so+0x3df4a0] report_vm_error(char const*, int, char const*, char const*)+0x68 > V [libjvm.so+0x73066c] Monitor::wait(bool, long, bool)+0x22c > V [libjvm.so+0x773d10] os::create_thread(Thread*, os::ThreadType, unsigned long)+0x1a8 > V [libjvm.so+0x491efc] GCTaskThread::GCTaskThread(GCTaskManager*, unsigned int, unsigned int)+0x60 > V [libjvm.so+0x491320] GCTaskManager::initialize()+0x338 > V [libjvm.so+0x7905cc] ParallelScavengeHeap::initialize()+0x334 > V [libjvm.so+0x8c2a94] Universe::initialize_heap()+0x11c > V [libjvm.so+0x8c2c60] universe_init()+0x34 > V [libjvm.so+0x4f8348] init_globals()+0x54 > V [libjvm.so+0x8a6d98] Threads::create_vm(JavaVMInitArgs*, bool*)+0x2ac > V [libjvm.so+0x56d688] JNI_CreateJavaVM+0x78 > C [libjli.so+0x2a64] JavaMain+0x8c > C [libpthread.so.0+0x7c50] start_thread+0xb0 > C [libc.so.6+0xdac60] thread_start+0x30 > > > Regards! > wei From christian.thalinger at oracle.com Thu Jun 18 16:02:20 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 18 Jun 2015 09:02:20 -0700 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long In-Reply-To: References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> <557F68C6.4050805@oracle.com> <557F9E91.7020603@oracle.com> Message-ID: <754964BF-CC0D-4324-8AAC-11799847BFEE@oracle.com> > On Jun 15, 2015, at 9:13 PM, John Rose wrote: > > On Jun 15, 2015, at 8:57 PM, Dean Long wrote: >> >> Thanks for the explanation. 
It sounds like we are modeling the abstract Java stack representation of long and double, and this wouldn't be >> easy to change, because I see things like "TypeFunc::Parms + 1" and "argument(2)" that would need to change before this could go away. > > Indeed. Slot pairs are a mess, an optimization (or concession) for platforms that no longer matter. (Primitives might look like that in a few years.) Some messes in HotSpot stem (IMO) from excessive attention to the bytecode syntax, designing a managed execution engine around the oddities of a code format. > > In an ideal world, I would like to isolate, deprecate, and eventually remove the "evil twin" slots, since they no longer have any meaning (except maybe on some 32-bit systems). Doing it at all levels will be hard, except in the context of some other breaking change. But it could be done locally in the JVM, removing the notion of twin slots from modules that don't have an absolute need to work with them. JITs shouldn't have to know about them, IMO; maybe not even the interpreter (though that would involve a renumbering prepass). > > When we get value types, we may be able to make such a change, even to the bytecode syntax itself. Or perhaps we will perpetuate the "evil twin" convention, but apply it to all value types (plus long and double). Or perhaps (happy thought) we can make every value/ref/prim occupy one stack slot, in some bytecode of the future. I'm all for happy thoughts :-) > > --
John From goetz.lindenmaier at sap.com Mon Jun 22 12:54:30 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 22 Jun 2015 12:54:30 +0000 Subject: [aarch64-port-dev ] Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long In-Reply-To: <754964BF-CC0D-4324-8AAC-11799847BFEE@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2CFF9D17@DEWDFEMB12A.global.corp.sap> <57EF8CC4-3CB7-488E-89D4-5AE5EA3C99AA@oracle.com> <557F68C6.4050805@oracle.com> <557F9E91.7020603@oracle.com> <754964BF-CC0D-4324-8AAC-11799847BFEE@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2CFFC4DE@DEWDFEMB12A.global.corp.sap> Hi, I would like to ping again for my change. I want to recall that this only extends the existing mechanism guarded by CCallingConventionRequiresIntsAsLongs to multiplyToLen, CRC32, AES and SHA. http://cr.openjdk.java.net/~goetz/webrevs/8086069-call_conv/webrev.01/ Could somebody please review this change? I please need a sponsor. Thanks and best regards, Goetz. -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Christian Thalinger Sent: Donnerstag, 18. Juni 2015 18:02 To: John Rose Cc: HotSpot Developers; aarch64-port-dev at openjdk.java.net Subject: Re: Ping: RFR(M): 8086069: Adapt runtime calls to recent intrinsics to pass ints as long > On Jun 15, 2015, at 9:13 PM, John Rose wrote: > > On Jun 15, 2015, at 8:57 PM, Dean Long wrote: >> >> Thanks for the explanation. It sounds like we are modeling the abstract Java stack representation of long and double, and this wouldn't be >> easy to change, because I see things like "TypeFunc::Parms + 1" and "argument(2)" that would need to change before this could go away. > > Indeed. Slot pairs are a mess, an optimization (or concession) for platforms that no longer matter. (Primitives might look like that in a few years.) 
Some messes in HotSpot stem (IMO) from excessive attention to the bytecode syntax, designing a managed execution engine around the oddities of a code format. > > In an ideal world, I would like to isolate, deprecate, and eventually remove the "evil twin" slots, since they no longer have any meaning (except maybe on some 32-bit systems). Doing it at all levels will be hard, except in the context of some other breaking change. But it could be done locally in the JVM, removing the notion of twin slots from modules that don't have an absolute need to work with them. JITs shouldn't have to know about them, IMO; maybe not even the interpreter (though that would involve a renumbering prepass). > > When we get value types, we may be able to make such a change, even to the bytecode syntax itself. Or perhaps we will perpetuate the "evil twin" convention, but apply it to all value types (plus long and double). Or perhaps (happy thought) we can make every value/ref/prim occupy one stack slot, in some bytecode of the future. I'm all for happy thoughts :-) > > -- John
From edward.nevill at gmail.com Mon Jun 22 13:23:21 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 22 Jun 2015 14:23:21 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 Message-ID: <1434979401.21282.31.camel@mint> Hi, Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad. The following webrev adds support for these using the SIMD instructions 'cnt' and 'addv': http://cr.openjdk.java.net/~enevill/8129426/webrev.01/ This patch was contributed by alexander.alexeev at caviumnetworks.com. The patch only modifies aarch64 specific files. I have merged the patch in and tested it with JTreg / hotspot with the following results.
Original: Test results: passed: 847; failed: 13; error: 6 Revised: Test results: passed: 848; failed: 12; error: 6 The single additional failure in the original is the test FAILED: compiler/intrinsics/squaretolen/TestSquareToLen.java which is an intermittent failure in the original. I have benchmarked the patch on four different partner platforms. The average improvement was 2.6X for PopCountI and 2.5X for PopCountL. Please review and if OK I will push, Thanks, Ed. From aph at redhat.com Mon Jun 22 14:04:10 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 22 Jun 2015 15:04:10 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <1434979401.21282.31.camel@mint> References: <1434979401.21282.31.camel@mint> Message-ID: <558815DA.8020500@redhat.com> On 06/22/2015 02:23 PM, Edward Nevill wrote: > Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad > > Please review and if OK I will push, Shouldn't mov in the IregI case be movw? And iRegI be iRegIorL2I? I'm guessing that C2 won't do the MOVs itself if you specify the instruction as vRegD src. Andrew. From adinn at redhat.com Mon Jun 22 14:34:05 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 22 Jun 2015 15:34:05 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <558815DA.8020500@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> Message-ID: <55881CDD.2080009@redhat.com> On 22/06/15 15:04, Andrew Haley wrote: > On 06/22/2015 02:23 PM, Edward Nevill wrote: >> Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad > >> >> Please review and if OK I will push, > > Shouldn't mov in the IregI case be movw? And iRegI be iRegIorL2I? Agreed on both counts. Strictly, for the PopCountI encoding /both/ mov operations -- i.e. to and fro -- should be a movw (I assume that means passing enum tag T1F in place of T1D?). 
However, using mov for the restore to $dst is safe as we know the top 32 bits will be zero. > I'm guessing that C2 won't do the MOVs itself if you specify the > instruction as vRegD src. I believe you guess right. regards, Andrew Dinn ----------- From edward.nevill at gmail.com Mon Jun 22 14:59:42 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 22 Jun 2015 15:59:42 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <558815DA.8020500@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> Message-ID: <1434985182.21282.34.camel@mint> On Mon, 2015-06-22 at 15:04 +0100, Andrew Haley wrote: > On 06/22/2015 02:23 PM, Edward Nevill wrote: > > Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad > > > > > Please review and if OK I will push, > > Shouldn't mov in the IregI case be movw? And iRegI be iRegIorL2I? No. It needs 0s in the top 32 bits. The reason is that the following CNT instruction is only available in 8B or 16B forms. It was iRegIorL2I before, I changed it to IregI because of this problem. Regards, Ed. From aph at redhat.com Mon Jun 22 15:04:29 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 22 Jun 2015 16:04:29 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <1434985182.21282.34.camel@mint> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> Message-ID: <558823FD.5080800@redhat.com> On 06/22/2015 03:59 PM, Edward Nevill wrote: > On Mon, 2015-06-22 at 15:04 +0100, Andrew Haley wrote: >> On 06/22/2015 02:23 PM, Edward Nevill wrote: >>> Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad >> >>> >>> Please review and if OK I will push, >> >> Shouldn't mov in the IregI case be movw? And iRegI be iRegIorL2I? > > No. It needs 0s in the top 32 bits. 
> > The reason is that the following CNT instruction is only available in 8B or 16B forms. > > It was iRegIorL2I before, I changed it to IregI because of this problem. Well, you're asking for trouble. We've tried to make sure that the top half of an int register is always zero, but it's hard absolutely to guarantee it in all cases. Does movw to a vector register really not clear the top 32 bits of the dest? Andrew. From adinn at redhat.com Mon Jun 22 15:29:22 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 22 Jun 2015 16:29:22 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <558823FD.5080800@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> Message-ID: <558829D2.4000503@redhat.com> On 22/06/15 16:04, Andrew Haley wrote: > On 06/22/2015 03:59 PM, Edward Nevill wrote: >> On Mon, 2015-06-22 at 15:04 +0100, Andrew Haley wrote: >>> On 06/22/2015 02:23 PM, Edward Nevill wrote: >>>> Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad >>> >>>> >>>> Please review and if OK I will push, >>> >>> Shouldn't mov in the IregI case be movw? And iRegI be iRegIorL2I? >> >> No. It needs 0s in the top 32 bits. >> >> The reason is that the following CNT instruction is only available in 8B or 16B forms. >> >> It was iRegIorL2I before, I changed it to IregI because of this problem. > > Well, you're asking for trouble. We've tried to make sure that the > top half of an int register is always zero, but it's hard absolutely > to guarantee it in all cases. Does movw to a vector register really > not clear the top 32 bits of the dest? Aargh, after checking the manual (as a last resort) it appears that the 32 bit move carefully moves a 32 bit word into a 32 bit slot leaving the rest of the slots unchanged -- MOV is documented as an alias for INS which pretty much explains the semantics.
It seems that the scalar fmovw instruction does the same (it is documented as overlapping the behaviour of the vector move instruction). So, it seems Ed is right to use iRegI and rely on an l2i conversion to zero the top word if the incoming value is long. regards, Andrew Dinn ----------- From aph at redhat.com Mon Jun 22 15:41:08 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 22 Jun 2015 16:41:08 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <558829D2.4000503@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> Message-ID: <55882C94.7030505@redhat.com> On 06/22/2015 04:29 PM, Andrew Dinn wrote: > So, it seems Ed is right to use iRegI and rely on an l2i conversion to > zero the top word if the incoming value is long. I don't think that's safe. I certainly don't think it's a good tradeoff. I think it'd be the only place in our entire code base where we assume that the high bits of a jint are zero. If it really wants zeros in the top bits we'd better put them there. Andrew. From adinn at redhat.com Mon Jun 22 15:50:36 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 22 Jun 2015 16:50:36 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <55882C94.7030505@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> Message-ID: <55882ECC.8030602@redhat.com> On 22/06/15 16:41, Andrew Haley wrote: > On 06/22/2015 04:29 PM, Andrew Dinn wrote: >> So, it seems Ed is right to use iRegI and rely on an l2i conversion to >> zero the top word if the incoming value is long. > > I don't think that's safe. I certainly don't think it's a good > tradeoff. 
I think it'd be the only place in our entire code base > where we assume that the high bits of a jint are zero. If it really > wants zeros in the top bits we'd better put them there. Well, yes, but it really /ought/ to be safe. Whenever we generate an iRegI dst output we should be using a foow instruction and end up with the top 32 bits zero. So, wherever we consume an iRegI src input we ought to be able to rely on it having top bits zero. Either it was generated directly as an iRegI output or it was generated as an iRegL output and passed in via an l2i conversion. If that assumption fails anywhere then it will only fail because we used a foo insn where we really needed a foow. I think we would be better to let any such errors fail as quickly as possible, find the error and fix the offending code to use foow. Your mileage may vary. regards, Andrew Dinn ----------- From aph at redhat.com Mon Jun 22 16:01:00 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 22 Jun 2015 17:01:00 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <55882ECC.8030602@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> Message-ID: <5588313C.1070409@redhat.com> On 06/22/2015 04:50 PM, Andrew Dinn wrote: > On 22/06/15 16:41, Andrew Haley wrote: >> On 06/22/2015 04:29 PM, Andrew Dinn wrote: >>> So, it seems Ed is right to use iRegI and rely on an l2i conversion to >>> zero the top word if the incoming value is long. >> >> I don't think that's safe. I certainly don't think it's a good >> tradeoff. I think it'd be the only place in our entire code base >> where we assume that the high bits of a jint are zero. If it really >> wants zeros in the top bits we'd better put them there. > > Well, yes, but it really /ought/ to be safe. 
> > Whenever we generate an iRegI dst output we should be using a foow > instruction and end up with the top 32 bits zero. So, wherever we > consume an iRegI src input we ought to be able to rely on it having top > bits zero. Either it was generated directly as an iRegI output or it was > generated as an iRegL output and passed in via an l2i conversion. > > If that assumption fails anywhere then it will only fail because we used > a foo insn where we really needed a foow. I think we would be better to > let any such errors fail as quickly as possible, find the error and fix > the offending code to use foow. And how would we even notice it, let alone find the error? > Your mileage may vary. Hmm. So far we've been very conservative, making sure that we always use the correct mode for inputs and the correct mode for outputs. If we're going to start making assumptions that top bits of int ops are always zero we could always elide l2i to a no-op. So far we have resisted that, and with good reason IMO. I wrote the deoptimization code and was pretty careful to do the right thing, but also very reassured that it probably didn't matter. I don't think we can guarantee that nowhere do we have a sign extension where there should be a zero extension. Andrew.
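[Editorial sketch, not part of the original thread.] The hazard under discussion can be illustrated in plain C++, outside HotSpot. Since CNT has no 32-bit form, the count runs over the whole 64-bit lane, so a jint that arrives sign-extended rather than zero-extended produces the wrong answer. All function names below are hypothetical and exist only for this illustration; the bit-clearing loop is a portable stand-in for the cnt/addv pair, not the code in the webrev.

```cpp
#include <cassert>
#include <cstdint>

// Bit count over all 64 bits of a register image; a portable stand-in for
// the cnt (8B) + addv sequence, which has no 32-bit form.
static int popcount64(uint64_t reg) {
    int n = 0;
    while (reg != 0) {
        reg &= reg - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Hypothetical variant that trusts the upper 32 bits of the register to be zero.
static int popcount_int_trusting(uint64_t reg) {
    return popcount64(reg);
}

// Hypothetical variant that zeroes the upper half before counting.
static int popcount_int_zeroing(uint64_t reg) {
    return popcount64(reg & 0xffffffffULL);
}

// Two possible 64-bit register images for the same jint value.
static uint64_t zero_extended(int32_t v) {
    return static_cast<uint32_t>(v);
}
static uint64_t sign_extended(int32_t v) {
    return static_cast<uint64_t>(static_cast<int64_t>(v));
}

// Exercises both variants on the jint value -1 (32 set bits).
static void check_examples() {
    assert(popcount_int_trusting(zero_extended(-1)) == 32);  // correct
    assert(popcount_int_trusting(sign_extended(-1)) == 64);  // wrong for a jint
    assert(popcount_int_zeroing(sign_extended(-1)) == 32);   // correct again
}
```

The trusting variant is only as good as the guarantee that every producer of a jint zero-extends it; the zeroing variant costs one extra operation but does not depend on that audit.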
From adinn at redhat.com Mon Jun 22 16:14:18 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 22 Jun 2015 17:14:18 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <5588313C.1070409@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> Message-ID: <5588345A.4060708@redhat.com> On 22/06/15 17:01, Andrew Haley wrote: > On 06/22/2015 04:50 PM, Andrew Dinn wrote: >> If that assumption fails anywhere then it will only fail because we used >> a foo insn where we really needed a foow. I think we would be better to >> let any such errors fail as quickly as possible, find the error and fix >> the offending code to use foow. > > And how would we even notice it, let alone find the error? I agree it will not necessarily be easy to spot. But we know exactly where to look (see below). >> Your mileage may vary. > > Hmm. So far we've been very conservative, making sure that we always > use the correct mode for inputs and the correct mode for outputs. If > we're going to start making assumptions that top bits of int ops are > always zero we could always elide l2i to a no-op. So far we have > resisted that, and with good reason IMO. No, that last statement is not at all correct. l2i is explicitly inserted into the ideal graph when the compiler knows that a value generated as long is being consumed as an int and so needs to be truncated. We have explicitly avoided performing any truncation to effect that l2i in every rule where we accept an input of iRegIorL2I. In all such cases we have ensured that the instruction which consumes the input is a foow not a foo. That's quite checkable by eyeball.
For this one case, we also need to be sure that every instruction which generates an iRegI output uses a foow instruction which, (according to the manual) zeroes the top bits. That's also checkable by eyeball. We also need to be sure that anything spilled as a 32 bit int is restored as a 32 bit int with the top bits correspondingly zeroed. > I wrote the deoptimization code and was pretty careful to do the right > thing, but also very reassured that it probably didn't matter. I > don't think we can guarantee that nowhere do we have a sign extension > where there should be a zero extension. Well, you might not want to take this risk and instead add an explicit zero of the upper half. But I think we need to be clear what risk we are taking. regards, Andrew Dinn ----------- From goetz.lindenmaier at sap.com Tue Jun 23 07:00:26 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 23 Jun 2015 07:00:26 +0000 Subject: [aarch64-port-dev ] Fix on aarch required after 8073165: Contended Locking fast exit bucket. Message-ID: <4295855A5C1DE049A61835A1887419CC2CFFC7DF@DEWDFEMB12A.global.corp.sap> Hi, I think you need the fix below after the change in 8073165. http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/2abcd8a4896c If you verify this, I will submit it along with the ppc fix. Best regards, Goetz. 
diff -r 5b5db3d68ab9 src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp --- a/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Mon Jun 22 17:15:45 2015 +0200 +++ b/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Tue Jun 23 08:54:15 2015 +0200 @@ -2120,6 +2120,7 @@ save_native_result(masm, ret_type, stack_slots); } + __ mov(c_rarg2, rthread); __ lea(c_rarg1, Address(sp, lock_slot_offset * VMRegImpl::stack_slot_size)); __ mov(c_rarg0, obj_reg); @@ -2128,7 +2129,7 @@ __ ldr(r19, Address(rthread, in_bytes(Thread::pending_exception_offset()))); __ str(zr, Address(rthread, in_bytes(Thread::pending_exception_offset()))); - rt_call(masm, CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_unlocking_C), 2, 0, 1); + rt_call(masm, CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_unlocking_C), 3, 0, 1); #ifdef ASSERT { From edward.nevill at linaro.org Tue Jun 23 08:55:48 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Tue, 23 Jun 2015 09:55:48 +0100 Subject: [aarch64-port-dev ] RFR: 8081294: aarch64: fails to build on ubuntu wily Message-ID: <1435049748.20837.6.camel@mylittlepony.linaroharston> Hi, One of our partners has reported that jdk9 fails to build for aarch64 on ubuntu 'wily' The failing buildlog is here https://launchpad.net/ubuntu/+source/openjdk-9/9~b64-1ubuntu1/+build/7441971 The following webrev fixes this http://cr.openjdk.java.net/~enevill/8081294/webrev/ I have verified that this fixes the problem by cross compiling against a sysroot. Our partner has also verified that this patch fixes the build. As this is a change to shared code I will need someone to sponsor this and push it through JPRT. 
Thanks for your help, Ed From aph at redhat.com Tue Jun 23 10:10:20 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 23 Jun 2015 11:10:20 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <5588345A.4060708@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> Message-ID: <5589308C.6000309@redhat.com> On 22/06/15 17:14, Andrew Dinn wrote: > On 22/06/15 17:01, Andrew Haley wrote: >> On 06/22/2015 04:50 PM, Andrew Dinn wrote: >>> If that assumption fails anywhere then it will only fail because we used >>> a foo insn where we really needed a foow. I think we would be better to >>> let any such errors fail as quickly as possible, find the error and fix >>> the offending code to use foow. >> >> And how would we even notice it, let alone find the error? > > I agree it will not necessarily be easy to spot. But we know exactly > where to look (see below). > >>> Your mileage may vary. >> >> Hmm. So far we've been very conservative, making sure that we always >> use the correct mode for inputs and the correct mode for outputs. If >> we're going to start making assumptions that top bits of int ops are >> always zero we could always elide l2i to a no-op. So far we have >> resisted that, and with good reason IMO. > > No, that last statement is not at all correct. l2i is explicitly > inserted into the ideal graph when the compiler knows that a value > generated as long is being consumed as an int and so needs to be > truncated. I'm sure that's true, but it's not really relevant to what I said. > We also need to be sure that anything spilled as a 32 bit int is > restored as a 32 bit int with the top bits correspondingly zeroed. That too.
But C2 has to interact with C1, the interpreter, and all the stubs, and all the intrinsics. >> I wrote the deoptimization code and was pretty careful to do the right >> thing, but also very reassured that it probably didn't matter. I >> don't think we can guarantee that nowhere do we have a sign extension >> where there should be a zero extension. > > Well, you might not want to take this risk and instead add an explicit > zero of the upper half. But I think we need to be clear what risk we are > taking. It's this: if we don't explicitly zero the upper half we'll have to audit all the code which might present a sign-extended value (instead of a zero-extended one) in a register that's supposed to contain a jint. Andrew. From edward.nevill at linaro.org Tue Jun 23 10:12:12 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Tue, 23 Jun 2015 11:12:12 +0100 Subject: [aarch64-port-dev ] RFR: 8129551: some regressions introduced by addition of vectorisation code Message-ID: <1435054332.5083.15.camel@mylittlepony.linaroharston> Hi, The following webrev http://cr.openjdk.java.net/~enevill/8129551/webrev fixes a number of regressions introduced in the addition of vectorisation for aarch64 as follows:- java/math/BigInteger/BigIntegerTest.java fails with an assertion failure when run with fastdebug or slowdebug builds # Internal Error (/home/alexander/build-open-jdk/dev/jdk9/baseline/dev/hotspot/src/cpu/aarch64/vm/assembler_aarch64.hpp:2078), pid=8124, tid=0x0000007ec61eb1f0 # assert(op == 0 && 0 == 0) failed: must be MOVI and also in test java/math/BigInteger/ModPow.java java.math.BigInteger::add gets miscompiled. There is a ldr q16, [x17,x10,lsl #4] which should be a ldr q16, [x17,x10] I have also moved void MacroAssembler::mov(FloatRegister Vd, SIMD_Arrangement T, u_int32_t imm32) to macroAssembler_aarch64.cpp from macroAssembler_aarch64.hpp as it was getting much too large for a .hpp. 
I have tested the original and webrev version with JTreg (hotspot, langtools & jdk) with the following results:- Original:- hotspot: Test results: passed: 849; failed: 11; error: 6 langtools: Test results: passed: 3,240; error: 2 jdk: Test results: passed: 6,103; failed: 568; error: 26 Revised:- hotspot: Test results: passed: 849; failed: 11; error: 6 langtools: Test results: passed: 3,240; error: 2 jdk: Test results: passed: 6,108; failed: 567; error: 22 This changeset only affects aarch64 files. Please review and if OK I will push, Thanks, Ed. From aph at redhat.com Tue Jun 23 10:18:09 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 23 Jun 2015 11:18:09 +0100 Subject: [aarch64-port-dev ] RFR: 8129551: some regressions introduced by addition of vectorisation code In-Reply-To: <1435054332.5083.15.camel@mylittlepony.linaroharston> References: <1435054332.5083.15.camel@mylittlepony.linaroharston> Message-ID: <55893261.7030501@redhat.com> On 23/06/15 11:12, Edward Nevill wrote: > Hi, > > The following webrev > > http://cr.openjdk.java.net/~enevill/8129551/webrev > > fixes a number of regressions introduced in the addition of vectorisation for aarch64 as follows:- > > java/math/BigInteger/BigIntegerTest.java > > fails with an assertion failure when run with fastdebug or slowdebug builds > > # Internal Error (/home/alexander/build-open-jdk/dev/jdk9/baseline/dev/hotspot/src/cpu/aarch64/vm/assembler_aarch64.hpp:2078), pid=8124, tid=0x0000007ec61eb1f0 > # assert(op == 0 && 0 == 0) failed: must be MOVI > > and also in test > > java/math/BigInteger/ModPow.java > > java.math.BigInteger::add gets miscompiled. There is a > > ldr q16, [x17,x10,lsl #4] > > which should be a > > ldr q16, [x17,x10] > > I have also moved > > void MacroAssembler::mov(FloatRegister Vd, SIMD_Arrangement T, u_int32_t imm32) > > to macroAssembler_aarch64.cpp from macroAssembler_aarch64.hpp as it was getting much too large for a .hpp. 
> > I have tested the original and webrev version with JTreg (hotspot, langtools & jdk) with the following results:- > > Original:- > > hotspot: Test results: passed: 849; failed: 11; error: 6 > langtools: Test results: passed: 3,240; error: 2 > jdk: Test results: passed: 6,103; failed: 568; error: 26 > > Revised:- > > hotspot: Test results: passed: 849; failed: 11; error: 6 > langtools: Test results: passed: 3,240; error: 2 > jdk: Test results: passed: 6,108; failed: 567; error: 22 > > This changeset only affects aarch64 files. > > Please review and if OK I will push, Won't this fail to detect an overflow? 1418 if (T == T4H || T == T8H) { imm32 &= 0xffff; nimm32 &= 0xffff; } Otherwise this looks OK to me. Andrew. From aph at redhat.com Tue Jun 23 10:22:35 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 23 Jun 2015 11:22:35 +0100 Subject: [aarch64-port-dev ] RFR [M] : 8087333, Optionally Pre-Generate the HotSpot Template Interpreter In-Reply-To: <5589239E.9050600@oracle.com> References: <557B1743.9040004@oracle.com> <55824D8D.2000003@oracle.com> <5582F686.1090507@oracle.com> <5583C704.3090900@oracle.com> <55845790.7040507@oracle.com> <5588F41F.70506@oracle.com> <5589239E.9050600@oracle.com> Message-ID: <5589336B.90706@redhat.com> On 23/06/15 10:15, Bertrand Delsart wrote: > While investigating why I had added the .S files for this CR, I noticed > that aarch64 open port has some .S files checked-in but no rules to > compile them. To avoid triggering the compilation and linking of these > (obsolete?) files in the open aarch64 port, I had to modify the change > in vm.make. Both of these .S files are used by simulator builds. We probably don't need them any more. Andrew. 
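[Editorial sketch, not part of the original thread.] The review question about the quoted masking line can be illustrated generically: once an immediate has been masked to 16 bits, a later range check has nothing left to reject, so an out-of-range value is silently truncated. The two helpers below are hypothetical and are not the MacroAssembler code under review; they assume, for illustration only, that a valid halfword immediate is one whose upper 16 bits are zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: mask first, then range-check. The check can never fail,
// because the mask has already discarded the offending bits.
static bool accepts_mask_first(uint32_t imm32) {
    imm32 &= 0xffff;
    return imm32 <= 0xffff;  // always true after masking
}

// Hypothetical: range-check the original value before any masking.
// Assumption: only zero-extended halfword immediates are considered valid.
static bool accepts_check_first(uint32_t imm32) {
    return (imm32 >> 16) == 0;
}

// 0x12345 does not fit in a halfword; 0xffff does.
static void check_examples() {
    assert(accepts_mask_first(0x12345));    // overflow goes unnoticed
    assert(!accepts_check_first(0x12345));  // overflow is detected
    assert(accepts_check_first(0xffff));    // largest valid halfword
}
```

Whether the real code needs such a check depends on how its callers produce imm32, which the thread does not show.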
From adinn at redhat.com Tue Jun 23 10:32:50 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 23 Jun 2015 11:32:50 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <5589308C.6000309@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> Message-ID: <558935D2.1020103@redhat.com> On 23/06/15 11:10, Andrew Haley wrote: > On 22/06/15 17:14, Andrew Dinn wrote: >> . . . >> Well, you might not want to take this risk and instead add an explicit >> zero of the upper half. But I think we need to be clear what risk we are >> taking. > > It's this: if we don't explicitly zero the upper half we'll have to > audit all the code which might present a sign-extended value (instead > of a zero-extended one) in a register that's supposed to contain a > jint. Ok, let's play safe. If Ed tweaks the patch to zero the upper word we can always revise that later if/when we decide we are feeling lucky. 
regards, Andrew Dinn ----------- From bertrand.delsart at oracle.com Tue Jun 23 12:49:52 2015 From: bertrand.delsart at oracle.com (Bertrand Delsart) Date: Tue, 23 Jun 2015 14:49:52 +0200 Subject: [aarch64-port-dev ] RFR [M] : 8087333, Optionally Pre-Generate the HotSpot Template Interpreter In-Reply-To: <5589336B.90706@redhat.com> References: <557B1743.9040004@oracle.com> <55824D8D.2000003@oracle.com> <5582F686.1090507@oracle.com> <5583C704.3090900@oracle.com> <55845790.7040507@oracle.com> <5588F41F.70506@oracle.com> <5589239E.9050600@oracle.com> <5589336B.90706@redhat.com> Message-ID: <558955F0.3070203@oracle.com> On 23/06/2015 12:22, Andrew Haley wrote: > On 23/06/15 10:15, Bertrand Delsart wrote: >> While investigating why I had added the .S files for this CR, I noticed >> that aarch64 open port has some .S files checked-in but no rules to >> compile them. To avoid triggering the compilation and linking of these >> (obsolete?) files in the open aarch64 port, I had to modify the change >> in vm.make. > > Both of these .S files are used by simulator builds. We probably don't > need them any more. Thanks Andrew, Let me know whether you prefer my new extensible Src_Files_BASE based findsrc rule in hotspot/make/linux/makefiles/vm.make or whether I can hard code the fact that .S files, if present, should be compiled. In the latter case, I can either delete your .S files or add them in Src_Files_EXCLUDE See http://cr.openjdk.java.net/~bdelsart/8087333/webrev.00-02/webrev/make/linux/makefiles/vm.make.udiff.html (The addition of .S files for our extensions was moved to make/closed, using a simple/extensible "Src_Files_BASE += \*.S") Regards, Bertrand > > Andrew. > -- Bertrand Delsart, Grenoble Engineering Center Oracle, 180 av.
de l'Europe, ZIRST de Montbonnot 38330 Montbonnot Saint Martin, FRANCE bertrand.delsart at oracle.com Phone : +33 4 76 18 81 23 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From adinn at redhat.com Tue Jun 23 13:14:45 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 23 Jun 2015 14:14:45 +0100 Subject: [aarch64-port-dev ] Fix for 8122937: [JEP 245] Validate JVM Command-Line Flag Arguments breaks AArch64 Message-ID: <55895BC5.7010607@redhat.com> I am trying to build the current hs-rt tree on AArch64 in order to test against Andrew Haley's UseCondCardMark patch (JDK-8079315). That tree now also includes the following fix for JDK-8122937: http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/5bbf25472731 The patch includes changes made to all the current cpus /except/ aarch64 which means that the aarch64 build now falls over with /home/adinn/openjdk/hs-rt/hotspot/src/share/vm/runtime/globals_extension.hpp:242:28: error: macro "ARCH_FLAGS" passed 7 arguments, but takes just 5 IGNORE_CONSTRAINT) I believe the only modification required is to add the extra 2 arguments range and constraint to macro ARCH_FLAGS (as per most of the other cpu-specific globals files). I am testing this now and will raise a JIRA and submit a webrev if that is indeed all that is needed. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From edward.nevill at gmail.com Tue Jun 23 13:51:00 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 23 Jun 2015 14:51:00 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <558935D2.1020103@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <558935D2.1020103@redhat.com> Message-ID: <1435067460.5083.21.camel@mylittlepony.linaroharston> On Tue, 2015-06-23 at 11:32 +0100, Andrew Dinn wrote: > On 23/06/15 11:10, Andrew Haley wrote: > > On 22/06/15 17:14, Andrew Dinn wrote: > >> . . . > >> Well, you might not want to take this risk and instead add an explicit > >> zero of the upper half. But I think we need to be clear what risk we are > >> taking. > > > > It's this: if we don't explicitly zero the upper half we'll have to > > audit all the code which might present a sign-extended value (instead > > of a zero-extended one) in a register that's supposed to contain a > > jint. > > Ok, let's play safe. If Ed tweaks the patch to zero the upper word we > can always revise that later if/when we decide we are feeling lucky. > OK. New webrev at http://cr.openjdk.java.net/~enevill/8129426/webrev.02/ All the best, Ed. 
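[Editorial sketch, not part of the original thread.] For readers unfamiliar with the instruction pair named in the RFR, the following plain C++ models one possible reading of the sequence: 'cnt' on an 8B arrangement yields a bit count per byte, and 'addv' sums those eight counts. The int case masks the upper word first, which is the "play safe" option agreed above. The function names are hypothetical, and this is a behavioural model only; it makes no claim about what webrev.02 actually emits.

```cpp
#include <cassert>
#include <cstdint>

// Per-byte bit counts over a 64-bit lane, modelling cnt on an 8B arrangement.
static void cnt_8b(uint64_t lane, uint8_t counts[8]) {
    for (int i = 0; i < 8; ++i) {
        uint8_t byte = static_cast<uint8_t>(lane >> (8 * i));
        uint8_t n = 0;
        while (byte != 0) {
            byte = static_cast<uint8_t>(byte & (byte - 1));  // clear lowest set bit
            ++n;
        }
        counts[i] = n;
    }
}

// Horizontal sum of the eight byte counts, modelling addv.
static int addv_8b(const uint8_t counts[8]) {
    int sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += counts[i];
    }
    return sum;
}

// Model of PopCountL: count all 64 bits.
static int popcount_long(uint64_t src) {
    uint8_t counts[8];
    cnt_8b(src, counts);
    return addv_8b(counts);
}

// Model of PopCountI with the upper word explicitly zeroed, so the result
// does not depend on what the producer left in bits 32..63.
static int popcount_int(uint64_t src_reg) {
    return popcount_long(src_reg & 0xffffffffULL);
}
```

With this model, popcount_int gives the same answer for a zero-extended and a sign-extended register image of the same jint, which is the property the reviewers asked for.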
From aph at redhat.com Tue Jun 23 14:30:07 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 23 Jun 2015 15:30:07 +0100 Subject: [aarch64-port-dev ] RFR [M] : 8087333, Optionally Pre-Generate the HotSpot Template Interpreter In-Reply-To: <558955F0.3070203@oracle.com> References: <557B1743.9040004@oracle.com> <55824D8D.2000003@oracle.com> <5582F686.1090507@oracle.com> <5583C704.3090900@oracle.com> <55845790.7040507@oracle.com> <5588F41F.70506@oracle.com> <5589239E.9050600@oracle.com> <5589336B.90706@redhat.com> <558955F0.3070203@oracle.com> Message-ID: <55896D6F.3010404@redhat.com> On 06/23/2015 01:49 PM, Bertrand Delsart wrote: > On 23/06/2015 12:22, Andrew Haley wrote: >> On 23/06/15 10:15, Bertrand Delsart wrote: >>> While investigating why I had added the .S files for this CR, I noticed >>> that aarch64 open port has some .S files checked-in but no rules to >>> compile them. To avoid triggering the compilation and linking of these >>> (obsolete?) files in the open aarch64 port, I had to modify the change >>> in vm.make. >> >> Both of these .S files are used by simulator builds. We probably don't >> need them any more. > > Thanks Andrew, > > Let me know whether you prefer my new extensible Src_Files_BASE based > findsrc rule in hotspot/make/linux/makefiles/vm.make or whether I can > hard code the fact that .S files, if present, should be compiled. > > In the latter case, I can either delete your .S files or add them in > Src_Files_EXCLUDE > > See > http://cr.openjdk.java.net/~bdelsart/8087333/webrev.00-02/webrev/make/linux/makefiles/vm.make.udiff.html > > (The addition of .S files for our extensions was moved to make/closed, > using a simple/extensible "Src_Files_BASE += \*.S") Hmmm. Ed, Andrew Dinn, do you understand the issue here? Andrew.
From adinn at redhat.com Tue Jun 23 14:41:22 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 23 Jun 2015 15:41:22 +0100 Subject: [aarch64-port-dev ] RFR: 8129584: Fix required for aarch64 after 8122937 Message-ID: <55897012.6040109@redhat.com> The following webrev against jdk9/hs-rt fixes AArch64 after it was broken by JDK-8122937: http://cr.openjdk.java.net/~adinn/8129584/webrev/ It is an AArch64-only patch. Could AArch64 reviewers please check it is ok? I'll also need someone to push it as I am not a committer. Thanks. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From bertrand.delsart at oracle.com Tue Jun 23 15:05:18 2015 From: bertrand.delsart at oracle.com (Bertrand Delsart) Date: Tue, 23 Jun 2015 17:05:18 +0200 Subject: [aarch64-port-dev ] RFR [M] : 8087333, Optionally Pre-Generate the HotSpot Template Interpreter In-Reply-To: <55896D6F.3010404@redhat.com> References: <557B1743.9040004@oracle.com> <55824D8D.2000003@oracle.com> <5582F686.1090507@oracle.com> <5583C704.3090900@oracle.com> <55845790.7040507@oracle.com> <5588F41F.70506@oracle.com> <5589239E.9050600@oracle.com> <5589336B.90706@redhat.com> <558955F0.3070203@oracle.com> <55896D6F.3010404@redhat.com> Message-ID: <558975AE.2060509@oracle.com> On 23/06/2015 16:30, Andrew Haley wrote: > On 06/23/2015 01:49 PM, Bertrand Delsart wrote: >> On 23/06/2015 12:22, Andrew Haley wrote: >>> On 23/06/15 10:15, Bertrand Delsart wrote: >>>> While investigating why I had added the .S files for this CR, I noticed >>>> that aarch64 open port has some .S files checked-in but no rules to >>>> compile them. To avoid triggering the compilation and linking of these >>>> (obsolete?) files in the open aarch64 port, I had to modify the change >>>> in vm.make. 
>>> >>> Both of these .S files are used by simulator builds. We probably don't >>> need them any more. >> >> Thanks Andrew, >> >> Let me know whether you prefer my new extensible Src_Files_BASE based >> findsrc rule in hotspot/make/linux/makefiles/vm.make or whether I can >> hard code the fact that .S files, if present, should be compiled. >> >> In the latter case, I can either delete your .S files or add them in >> Src_Files_EXCLUDE >> >> See >> http://cr.openjdk.java.net/~bdelsart/8087333/webrev.00-02/webrev/make/linux/makefiles/vm.make.udiff.html >> >> (The addition of .S files for our extensions was moved to make/closed, >> using a simple/extensible "Src_Files_BASE += \*.S") > > Hmmm. Ed, Andrew Dinn, do you understand the issue here? Stated differently, I need to add support for finding and compiling .S files for this extension. In Hotspot, this is done thanks to: - a rule to compile them - 'findsrc', which builds the list of files to compile and link During reviews, I realized there are .S files in your code. These files are currently ignored by openjdk builds. However, my initial modification was going to cause these files to be spotted by findsrc (and hence compiled and linked with your JVM). To avoid compiling and linking them, I need either: - to remove them - or to ensure findsrc does not look for them (thanks to my proposed Src_Files_BASE mechanism) - or to ensure findsrc excludes them (through Src_Files_EXCLUDE) Hope this helps, Bertrand. > > Andrew. > > -- Bertrand Delsart, Grenoble Engineering Center Oracle, 180 av. de l'Europe, ZIRST de Montbonnot 38330 Montbonnot Saint Martin, FRANCE bertrand.delsart at oracle.com Phone : +33 4 76 18 81 23
From volker.simonis at gmail.com Tue Jun 23 15:22:10 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 23 Jun 2015 17:22:10 +0200 Subject: [aarch64-port-dev ] RFR: 8129584: Fix required for aarch64 after 8122937 In-Reply-To: <55897012.6040109@redhat.com> References: <55897012.6040109@redhat.com> Message-ID: Hi Andrew, the change looks good! I can push it for you. Regards, Volker On Tue, Jun 23, 2015 at 4:41 PM, Andrew Dinn wrote: > The following webrev against jdk9/hs-rt fixes AArch64 after it was > broken by JDK-8122937: > > http://cr.openjdk.java.net/~adinn/8129584/webrev/ > > It is an AArch64-only patch. Could AArch64 reviewers please check it is > ok? I'll also need someone to push it as I am not a committer. Thanks. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) From adinn at redhat.com Tue Jun 23 15:24:48 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 23 Jun 2015 16:24:48 +0100 Subject: [aarch64-port-dev ] RFR: 8129584: Fix required for aarch64 after 8122937 In-Reply-To: References: <55897012.6040109@redhat.com> Message-ID: <55897A40.4070908@redhat.com> On 23/06/15 16:22, Volker Simonis wrote: > the change looks good! > > I can push it for you. Thanks, Volker, that would be great. I think I still need one more reviewer though. Can anyone else help here? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No.
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From edward.nevill at linaro.org Tue Jun 23 15:31:14 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Tue, 23 Jun 2015 16:31:14 +0100 Subject: [aarch64-port-dev ] Fix on aarch required after 8073165: Contended Locking fast exit bucket. In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CFFC7DF@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2CFFC7DF@DEWDFEMB12A.global.corp.sap> Message-ID: <1435073474.5083.28.camel@mylittlepony.linaroharston> On Tue, 2015-06-23 at 07:00 +0000, Lindenmaier, Goetz wrote: > Hi, > > I think you need the fix below after the change in 8073165. > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/2abcd8a4896c > If you verify this, I will submit it along with the ppc fix. > > Best regards, > Goetz. Hi, This change looks fine. I have merged in the patch, built it on aarch64 and run jtreg/hotspot on the result. The result before and after was Test results: passed: 856; failed: 4; error: 6 Thanks for doing this, Ed. 
> > diff -r 5b5db3d68ab9 src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp > --- a/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Mon Jun 22 17:15:45 2015 +0200 > +++ b/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Tue Jun 23 08:54:15 2015 +0200 > @@ -2120,6 +2120,7 @@ > save_native_result(masm, ret_type, stack_slots); > } > > + __ mov(c_rarg2, rthread); > __ lea(c_rarg1, Address(sp, lock_slot_offset * VMRegImpl::stack_slot_size)); > __ mov(c_rarg0, obj_reg); > > @@ -2128,7 +2129,7 @@ > __ ldr(r19, Address(rthread, in_bytes(Thread::pending_exception_offset()))); > __ str(zr, Address(rthread, in_bytes(Thread::pending_exception_offset()))); > > - rt_call(masm, CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_unlocking_C), 2, 0, 1); > + rt_call(masm, CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_unlocking_C), 3, 0, 1); > > #ifdef ASSERT > { From volker.simonis at gmail.com Tue Jun 23 15:31:21 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 23 Jun 2015 17:31:21 +0200 Subject: [aarch64-port-dev ] RFR: 8129584: Fix required for aarch64 after 8122937 In-Reply-To: <55897A40.4070908@redhat.com> References: <55897012.6040109@redhat.com> <55897A40.4070908@redhat.com> Message-ID: You're welcome. I've updated the copyright and once a second review appears I'll push it. Regards, Volker On Tue, Jun 23, 2015 at 5:24 PM, Andrew Dinn wrote: > On 23/06/15 16:22, Volker Simonis wrote: >> the change looks good! >> >> I can push it for you. > > Thanks, Volker, that would be great. I think I still need one more > reviewer though. Can anyone else help here? > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 
3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) From vladimir.kozlov at oracle.com Tue Jun 23 15:32:51 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 23 Jun 2015 08:32:51 -0700 Subject: [aarch64-port-dev ] RFR: 8129584: Fix required for aarch64 after 8122937 In-Reply-To: References: <55897012.6040109@redhat.com> Message-ID: <55897C23.9090601@oracle.com> +1. Reviewed. Thanks, Vladimir On 6/23/15 8:22 AM, Volker Simonis wrote: > Hi Andrew, > > the change looks good! > > I can push it for you. > > Regards, > Volker > > > On Tue, Jun 23, 2015 at 4:41 PM, Andrew Dinn wrote: >> The following webrev against jdk9/hs-rt fixes AArch64 after it was >> broken by JDK-8122937: >> >> http://cr.openjdk.java.net/~adinn/8129584/webrev/ >> >> It is an AArch64-only patch. Could AArch64 reviewers please check it is >> ok? I'll also need someone to push it as I am not a committer. Thanks. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Senior Principal Software Engineer >> Red Hat UK Ltd >> Registered in UK and Wales under Company Registration No. 3798903 >> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters >> (USA), Michael O'Neill (Ireland) From adinn at redhat.com Tue Jun 23 17:00:35 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 23 Jun 2015 18:00:35 +0100 Subject: [aarch64-port-dev ] RFR: 8129584: Fix required for aarch64 after 8122937 In-Reply-To: <55897C23.9090601@oracle.com> References: <55897012.6040109@redhat.com> <55897C23.9090601@oracle.com> Message-ID: <558990B3.6050905@redhat.com> On 23/06/15 16:32, Vladimir Kozlov wrote: > +1. Reviewed. Thanks, Vladimir. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From edward.nevill at linaro.org Tue Jun 23 17:22:17 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Tue, 23 Jun 2015 18:22:17 +0100 Subject: [aarch64-port-dev ] RFR: 8129551: some regressions introduced by addition of vectorisation code In-Reply-To: <55893261.7030501@redhat.com> References: <1435054332.5083.15.camel@mylittlepony.linaroharston> <55893261.7030501@redhat.com> Message-ID: On 23 June 2015 at 11:18, Andrew Haley wrote: > On 23/06/15 11:12, Edward Nevill wrote: > > This changeset only affects aarch64 files. > > > > Please review and if OK I will push, > > Won't this fail to detect an overflow? > > 1418 if (T == T4H || T == T8H) { imm32 &= 0xffff; nimm32 &= 0xffff; } > > Otherwise this looks OK to me. > > Andrew. > > Updated webrev with asserts http://cr.openjdk.java.net/~enevill/8129551/webrev.01 Could a JDK9 Reviewer please review this, Thanks, Ed. From vladimir.kozlov at oracle.com Tue Jun 23 17:55:20 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 23 Jun 2015 10:55:20 -0700 Subject: [aarch64-port-dev ] RFR: 8129551: some regressions introduced by addition of vectorisation code In-Reply-To: References: <1435054332.5083.15.camel@mylittlepony.linaroharston> <55893261.7030501@redhat.com> Message-ID: <55899D88.8090904@oracle.com> asserts works only in debug VM so I would leave original imm32 &= 0xff and imm32 &= 0xffff. I think you should also move comments with table to macroAssembler_aarch64.cpp. Thanks, Vladimir On 6/23/15 10:22 AM, Edward Nevill wrote: > On 23 June 2015 at 11:18, Andrew Haley wrote: > >> On 23/06/15 11:12, Edward Nevill wrote: >>> This changeset only affects aarch64 files. >>> >>> Please review and if OK I will push, >> >> Won't this fail to detect an overflow? >> >> 1418 if (T == T4H || T == T8H) { imm32 &= 0xffff; nimm32 &= 0xffff; } >> >> Otherwise this looks OK to me. >> >> Andrew. 
>> >> > Updated webrev with asserts > > http://cr.openjdk.java.net/~enevill/8129551/webrev.01 > > Could a JDK9 Reviewer please review this, > > Thanks, > Ed. > From edward.nevill at linaro.org Tue Jun 23 19:01:44 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Tue, 23 Jun 2015 20:01:44 +0100 Subject: [aarch64-port-dev ] RFR: 8129551: some regressions introduced by addition of vectorisation code In-Reply-To: <55899D88.8090904@oracle.com> References: <1435054332.5083.15.camel@mylittlepony.linaroharston> <55893261.7030501@redhat.com> <55899D88.8090904@oracle.com> Message-ID: On 23 June 2015 at 18:55, Vladimir Kozlov wrote: > asserts works only in debug VM so I would leave original imm32 &= 0xff and > imm32 &= 0xffff. > I think you should also move comments with table to > macroAssembler_aarch64.cpp. Done. New webrev at http://cr.openjdk.java.net/~enevill/8129551/webrev.02 Does this look OK now? Thanks, Ed. From vladimir.kozlov at oracle.com Tue Jun 23 19:42:20 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 23 Jun 2015 12:42:20 -0700 Subject: [aarch64-port-dev ] RFR: 8129551: some regressions introduced by addition of vectorisation code In-Reply-To: References: <1435054332.5083.15.camel@mylittlepony.linaroharston> <55893261.7030501@redhat.com> <55899D88.8090904@oracle.com> Message-ID: <5589B69C.5050100@oracle.com> Yes, looks good. Thanks, Vladimir On 6/23/15 12:01 PM, Edward Nevill wrote: > > > On 23 June 2015 at 18:55, Vladimir Kozlov > wrote: > > asserts works only in debug VM so I would leave original imm32 &= > 0xff and imm32 &= 0xffff. > I think you should also move comments with table to > macroAssembler_aarch64.cpp. > > > Done. New webrev at > > http://cr.openjdk.java.net/~enevill/8129551/webrev.02 > > Does this look OK now? > > Thanks, > Ed. 
> > From goetz.lindenmaier at sap.com Wed Jun 24 06:48:22 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 24 Jun 2015 06:48:22 +0000 Subject: [aarch64-port-dev ] Fix on aarch required after 8073165: Contended Locking fast exit bucket. In-Reply-To: <1435073474.5083.28.camel@mylittlepony.linaroharston> References: <4295855A5C1DE049A61835A1887419CC2CFFC7DF@DEWDFEMB12A.global.corp.sap> <1435073474.5083.28.camel@mylittlepony.linaroharston> Message-ID: <4295855A5C1DE049A61835A1887419CC2CFFECF1@DEWDFEMB12A.global.corp.sap> Hi Ed, thanks for testing this. I'll prepare an official RFR including these changes. Best regards, Goetz. -----Original Message----- From: Edward Nevill [mailto:edward.nevill at linaro.org] Sent: Dienstag, 23. Juni 2015 17:31 To: Lindenmaier, Goetz Cc: aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] Fix on aarch required after 8073165: Contended Locking fast exit bucket. On Tue, 2015-06-23 at 07:00 +0000, Lindenmaier, Goetz wrote: > Hi, > > I think you need the fix below after the change in 8073165. > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/2abcd8a4896c > If you verify this, I will submit it along with the ppc fix. > > Best regards, > Goetz. Hi, This change looks fine. I have merged in the patch, built it on aarch64 and run jtreg/hotspot on the result. The result before and after was Test results: passed: 856; failed: 4; error: 6 Thanks for doing this, Ed. 
> > diff -r 5b5db3d68ab9 src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp > --- a/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Mon Jun 22 17:15:45 2015 +0200 > +++ b/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Tue Jun 23 08:54:15 2015 +0200 > @@ -2120,6 +2120,7 @@ > save_native_result(masm, ret_type, stack_slots); > } > > + __ mov(c_rarg2, rthread); > __ lea(c_rarg1, Address(sp, lock_slot_offset * VMRegImpl::stack_slot_size)); > __ mov(c_rarg0, obj_reg); > > @@ -2128,7 +2129,7 @@ > __ ldr(r19, Address(rthread, in_bytes(Thread::pending_exception_offset()))); > __ str(zr, Address(rthread, in_bytes(Thread::pending_exception_offset()))); > > - rt_call(masm, CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_unlocking_C), 2, 0, 1); > + rt_call(masm, CAST_FROM_FN_PTR(address, SharedRuntime::complete_monitor_unlocking_C), 3, 0, 1); > > #ifdef ASSERT > { From david.holmes at oracle.com Wed Jun 24 09:48:39 2015 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jun 2015 19:48:39 +1000 Subject: [aarch64-port-dev ] RFR: 8081294: aarch64: fails to build on ubuntu wily In-Reply-To: <1435049748.20837.6.camel@mylittlepony.linaroharston> References: <1435049748.20837.6.camel@mylittlepony.linaroharston> Message-ID: <558A7CF7.7080101@oracle.com> Hi Ed, On 23/06/2015 6:55 PM, Edward Nevill wrote: > Hi, > > One of our partners has reported that jdk9 fails to build for aarch64 on ubuntu 'wily' > > The failing buildlog is here > > https://launchpad.net/ubuntu/+source/openjdk-9/9~b64-1ubuntu1/+build/7441971 > > The following webrev fixes this > > http://cr.openjdk.java.net/~enevill/8081294/webrev/ > > I have verified that this fixes the problem by cross compiling against a sysroot. > > Our partner has also verified that this patch fixes the build. > > As this is a change to shared code I will need someone to sponsor this and push it through JPRT. I'll handle this for you. As it is trivial and only affects aarch64 my Review is sufficient. 
Thanks, David > Thanks for your help, > Ed > > From edward.nevill at gmail.com Wed Jun 24 13:36:51 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 24 Jun 2015 14:36:51 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <5589CB73.7020509@oracle.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <558935D2.1020103@redhat.com> <1435067460.5083.21.camel@mylittlepony.linaroharston> <5589CB73.7020509@oracle.com> Message-ID: <1435153011.13459.2.camel@mint> On Tue, 2015-06-23 at 15:11 -0600, Alejandro E Murillo wrote: > On 6/23/2015 7:51 AM, Edward Nevill wrote: > > On Tue, 2015-06-23 at 11:32 +0100, Andrew Dinn wrote: > >> On 23/06/15 11:10, Andrew Haley wrote: > >>> On 22/06/15 17:14, Andrew Dinn wrote: > >>>> . . . > >>>> Well, you might not want to take this risk and instead add an explicit > >>>> zero of the upper half. But I think we need to be clear what risk we are > >>>> taking. > >>> It's this: if we don't explicitly zero the upper half we'll have to > >>> audit all the code which might present a sign-extended value (instead > >>> of a zero-extended one) in a register that's supposed to contain a > >>> jint. > >> Ok, let's play safe. If Ed tweaks the patch to zero the upper word we > >> can always revise that later if/when we decide we are feeling lucky. > >> > > OK. New webrev at > > > > http://cr.openjdk.java.net/~enevill/8129426/webrev.02/ > > > > All the best, > > Ed. > > > > > Hi, > to be consistent with similar integrations and to avoid potential > merging problems, > going forward please work with the hs-rt repo for this kind of changes, > as Volker has been doing. Hi Alejandro, I have rebased the patch on hs-rt. 
New webrev http://cr.openjdk.java.net/~enevill/8129426/webrev.03/ Does it look OK to push? Thanks, Ed.

From edward.nevill at gmail.com Wed Jun 24 15:27:32 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 24 Jun 2015 16:27:32 +0100 Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors Message-ID: <1435159652.13459.6.camel@mint>

Hi,

The following webrev, based on the hs-rt repo,

http://cr.openjdk.java.net/~enevill/8086087/webrev.01

adds support for 64 bit vectors on aarch64. Previously the vector code only supported 128 bit vectors.

32 bit vectors are not supported directly, as aarch64 has no support for 32 bit vectors; however, the above webrev will permit 32 bit vectors and simply place them in a 64 bit vector.

I have tested this with JTreg hotspot and get the same results before and after the change, viz,

Test results: passed: 845; failed: 12; error: 6

I have also benchmarked the Test*Vect tests from 6340864 in the hotspot test suite. The following are the average results I get on one of our partners' HW (lower number is better).

TestByteVect:   128-bit (11.77), 64-bit (4.36)
TestShortVect:  128-bit (5.02),  64-bit (5.22)
TestIntVect:    128-bit (7.81),  64-bit (7.70)
TestLongVect:   128-bit (11.67), 64-bit (11.71)
TestFloatVect:  128-bit (16.75), 64-bit (17.29)
TestDoubleVect: 128-bit (32.37), 64-bit (32.43)

So the only test which shows an improvement is TestByteVect, which shows a 2.7x speedup. The other tests are the same within the bounds of experimental error.

The reason TestByteVect shows such an improvement is that with 128 bit vectors it is not being vectorized at all, because the loop is not unrolled sufficiently to allow it to be vectorized, whereas with 64 bit vectors it is.

Please review and let me know if this is OK to push.

Ed.
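As a rough check on the 2.7x figure, the ratios can be recomputed from the averages quoted above. This is only a sketch: the class and method names are made up for illustration, and the numbers are copied from the message rather than re-measured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VectSpeedup {
    // Lower times are better, so a ratio above 1.0 means the 64-bit run was faster.
    static double speedup(double time128, double time64) {
        return time128 / time64;
    }

    public static void main(String[] args) {
        // {128-bit average, 64-bit average}, as quoted in the message.
        Map<String, double[]> results = new LinkedHashMap<>();
        results.put("TestByteVect",   new double[] {11.77, 4.36});
        results.put("TestShortVect",  new double[] {5.02, 5.22});
        results.put("TestIntVect",    new double[] {7.81, 7.70});
        results.put("TestLongVect",   new double[] {11.67, 11.71});
        results.put("TestFloatVect",  new double[] {16.75, 17.29});
        results.put("TestDoubleVect", new double[] {32.37, 32.43});

        for (Map.Entry<String, double[]> e : results.entrySet()) {
            double[] t = e.getValue();
            System.out.printf("%-15s %.2fx%n", e.getKey(), speedup(t[0], t[1]));
        }
    }
}
```

Only TestByteVect moves far from 1.0; the remaining ratios all sit within a few percent of it.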
From vladimir.kozlov at oracle.com Wed Jun 24 16:57:19 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 24 Jun 2015 09:57:19 -0700 Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors In-Reply-To: <1435159652.13459.6.camel@mint> References: <1435159652.13459.6.camel@mint> Message-ID: <558AE16F.9080704@oracle.com> Hi, Ed I am worried about 32 bit vectors. There could be conflict somewhere in RA since min_vector_size will not match minimum vector register VecD size. Can you split these changes to have separate changesets? One is support VecD (64 bit) and an other 32bit vectors. If some testing will show problems we can check which changes caused it more precisely. And this should be reviewed on compiler mailing list instead of runtime. Thanks, Vladimir On 6/24/15 8:27 AM, Edward Nevill wrote: > Hi, > > The following webrev based on the hs-rt repo > > http://cr.openjdk.java.net/~enevill/8086087/webrev.01 > > Adds support for 64 bit vectors on aarch64. Previously the vector code only supported 128 bit vectors. > > 32 bit vectors are not supported directly as aarch64 has no support for 32 bit vectors, however the above webrev will permit 32 bit vectors but just place them in a 64 bit vector. > > I have tested this with JTreg hotspot and get the same results before and after the change, viz, > > Test results: passed: 845; failed: 12; error: 6 > > I have also benchmarked the Test*Vect tests from 6340864 in the hotspot test suite. The following are the average results I get on one of our partners HW (lower number is better). > > TestByteVect: 128-bit (11.77), 64-bit (4.36) > TestShortVect: 128-bit (5.02), 64-bit (5.22) > TestIntVect: 128-bit (7.81), 64-bit (7.70) > TestLongVect: 128-bit (11.67), 64-bit (11.71) > TestFloatVect: 128-bit (16.75), 64-bit (17.29) > TestDoubleVect:128-bit (32.37), 64-bit (32.43) > > So the only test which shows an improvement is TestByteVect which shows a 2.7x speedup. 
The other tests are the same within the bounds of experimental error. > > The reason TestByteVect shows such an improvement is that with 128 bit vectors it is not being vectorized at all because the loop is not unrolled sufficiently to allow it to be vectorized, wheras with 64 bit vectors it is. > > Please review and let me know if this is OK to push? > > Ed. > > From edward.nevill at gmail.com Wed Jun 24 18:53:12 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 24 Jun 2015 19:53:12 +0100 Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors In-Reply-To: <558AE16F.9080704@oracle.com> References: <1435159652.13459.6.camel@mint> <558AE16F.9080704@oracle.com> Message-ID: <1435171992.13459.22.camel@mint> On Wed, 2015-06-24 at 09:57 -0700, Vladimir Kozlov wrote: > Hi, Ed > > I am worried about 32 bit vectors. There could be conflict somewhere in RA since min_vector_size will not match minimum > vector register VecD size. > > Can you split these changes to have separate changesets? One is support VecD (64 bit) and an other 32bit vectors. > If some testing will show problems we can check which changes caused it more precisely. Hi Vladimir, Thanks for the review. I am generally happy that putting 32 bit values in 64 bit registers is OK. I initially did the 64 bit registers by putting them in 128 bit registers. That worked OK, but there were 2 problems. First when a register was spilled I had to spill 128 bits since I did not know the size at the point of the spill. The second problem was with scalar reduction when doing an add across the vector, rather than a parallel vector operation. In this case it would get the wrong result if the top 64 bits were non zero. This is why I generated a separate 64 bit vectorisation. With 32 bit, spilling 64 bits instead of 32 bits does not matter, and scalar reduction operations do not exist for 32 bit (the minimum is 2I). I will do as you suggest, and split it into two webrevs. 
> > And this should be reviewed on compiler mailing list instead of runtime. And should the changeset then be based on hs-comp and pushed to hs-comp? All the best, Ed. From vladimir.kozlov at oracle.com Wed Jun 24 18:58:04 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 24 Jun 2015 11:58:04 -0700 Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors In-Reply-To: <1435171992.13459.22.camel@mint> References: <1435159652.13459.6.camel@mint> <558AE16F.9080704@oracle.com> <1435171992.13459.22.camel@mint> Message-ID: <558AFDBC.1080705@oracle.com> > And should the changeset then be based on hs-comp and pushed to hs-comp? Yes On 6/24/15 11:53 AM, Edward Nevill wrote: > On Wed, 2015-06-24 at 09:57 -0700, Vladimir Kozlov wrote: >> Hi, Ed >> >> I am worried about 32 bit vectors. There could be conflict somewhere in RA since min_vector_size will not match minimum >> vector register VecD size. >> >> Can you split these changes to have separate changesets? One is support VecD (64 bit) and an other 32bit vectors. >> If some testing will show problems we can check which changes caused it more precisely. > > Hi Vladimir, > > Thanks for the review. I am generally happy that putting 32 bit values in 64 bit registers is OK. I initially did the 64 bit registers by putting them in 128 bit registers. > > That worked OK, but there were 2 problems. > > First when a register was spilled I had to spill 128 bits since I did not know the size at the point of the spill. > > The second problem was with scalar reduction when doing an add across the vector, rather than a parallel vector operation. In this case it would get the wrong result if the top 64 bits were non zero. > > This is why I generated a separate 64 bit vectorisation. > > With 32 bit, spilling 64 bits instead of 32 bits does not matter, and scalar reduction operations do not exist for 32 bit (the minimum is 2I). > > I will do as you suggest, and split it into two webrevs. 
> >> >> And this should be reviewed on compiler mailing list instead of runtime. > > And should the changeset then be based on hs-comp and pushed to hs-comp? > > All the best, > Ed. > > From edward.nevill at gmail.com Wed Jun 24 19:44:09 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 24 Jun 2015 20:44:09 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <558AFCD9.1090501@oracle.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <558935D2.1020103@redhat.com> <1435067460.5083.21.camel@mylittlepony.linaroharston> <5589CB73.7020509@oracle.com> <1435153011.13459.2.camel@mint> <558AFCD9.1090501@oracle.com> Message-ID: <1435175049.13459.26.camel@mint> On Wed, 2015-06-24 at 12:54 -0600, Alejandro E Murillo wrote: > > On 6/24/2015 7:36 AM, Edward Nevill wrote: > > On Tue, 2015-06-23 at 15:11 -0600, Alejandro E Murillo wrote: > >> On 6/23/2015 7:51 AM, Edward Nevill wrote: > >>> On Tue, 2015-06-23 at 11:32 +0100, Andrew Dinn wrote: > >>>> On 23/06/15 11:10, Andrew Haley wrote: > >>>>> On 22/06/15 17:14, Andrew Dinn wrote: > >>>>>> . . . > >>>>>> Well, you might not want to take this risk and instead add an explicit > >>>>>> zero of the upper half. But I think we need to be clear what risk we are > >>>>>> taking. > >>>>> It's this: if we don't explicitly zero the upper half we'll have to > >>>>> audit all the code which might present a sign-extended value (instead > >>>>> of a zero-extended one) in a register that's supposed to contain a > >>>>> jint. > >>>> Ok, let's play safe. If Ed tweaks the patch to zero the upper word we > >>>> can always revise that later if/when we decide we are feeling lucky. > >>>> > >>> OK. 
New webrev at > >>> > >>> http://cr.openjdk.java.net/~enevill/8129426/webrev.02/ > >>> > >>> All the best, > >>> Ed. > >>> > >>> > >> Hi, > >> to be consistent with similar integrations and to avoid potential > >> merging problems, > >> going forward please work with the hs-rt repo for this kind of changes, > >> as Volker has been doing. > > Hi Alejandro, > > > > I have rebased the patch on hs-rt. New webrev > > > > http://cr.openjdk.java.net/~enevill/8129426/webrev.03/ > > > > Does it look OK to push? > > > > Thanks, > > Ed. > > > Apologies, I thought you had already pushed that to jdk9/dev, > but it turns out you had pushed 8129551 , no this one. > > If this is a follow up to the previous push into jdk9/dev (8129551) > or somewhat related, then it's probably better if you pushed > this one to jdk9/dev as well, as to avoid any possible conflict > when we merge jdk9/dev with jdk9/hs. If they are completely > independent then go ahead and push it to hs-rt, after review of course. OK, I was confused when you suggested it should be pushed to hs-rt, since the change is adding PopCount to C2. Should I base it on hs-comp and move the review over to hotspot-compiler-dev? All the best, Ed. 
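The concern about the upper half of the register can be modelled in plain Java. The sketch below is not the C2 code from the webrev; it only assumes that a population count is taken over a full 64-bit register that is supposed to hold a jint, and the helper names are hypothetical.

```java
public class PopCountUpperHalf {
    // Models a 64-bit register holding a jint that arrived sign-extended.
    static long signExtended(int x) {
        return (long) x;
    }

    // Models the same register with the upper 32 bits explicitly zeroed.
    static long zeroExtended(int x) {
        return x & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int x = -1; // all 32 bits set

        // The answer Integer.bitCount is specified to give.
        System.out.println(Integer.bitCount(x));            // 32

        // Counting over 64 bits of a sign-extended value also counts the copies of the sign bit.
        System.out.println(Long.bitCount(signExtended(x))); // 64

        // Zeroing the upper half first restores the expected answer.
        System.out.println(Long.bitCount(zeroExtended(x))); // 32
    }
}
```

For non-negative inputs the two extensions agree, which is why the discrepancy would only show up on values with the top bit set.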
From alejandro.murillo at oracle.com Wed Jun 24 18:54:17 2015 From: alejandro.murillo at oracle.com (Alejandro E Murillo) Date: Wed, 24 Jun 2015 12:54:17 -0600 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <1435153011.13459.2.camel@mint> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <558935D2.1020103@redhat.com> <1435067460.5083.21.camel@mylittlepony.linaroharston> <5589CB73.7020509@oracle.com> <1435153011.13459.2.camel@mint> Message-ID: <558AFCD9.1090501@oracle.com> On 6/24/2015 7:36 AM, Edward Nevill wrote: > On Tue, 2015-06-23 at 15:11 -0600, Alejandro E Murillo wrote: >> On 6/23/2015 7:51 AM, Edward Nevill wrote: >>> On Tue, 2015-06-23 at 11:32 +0100, Andrew Dinn wrote: >>>> On 23/06/15 11:10, Andrew Haley wrote: >>>>> On 22/06/15 17:14, Andrew Dinn wrote: >>>>>> . . . >>>>>> Well, you might not want to take this risk and instead add an explicit >>>>>> zero of the upper half. But I think we need to be clear what risk we are >>>>>> taking. >>>>> It's this: if we don't explicitly zero the upper half we'll have to >>>>> audit all the code which might present a sign-extended value (instead >>>>> of a zero-extended one) in a register that's supposed to contain a >>>>> jint. >>>> Ok, let's play safe. If Ed tweaks the patch to zero the upper word we >>>> can always revise that later if/when we decide we are feeling lucky. >>>> >>> OK. New webrev at >>> >>> http://cr.openjdk.java.net/~enevill/8129426/webrev.02/ >>> >>> All the best, >>> Ed. >>> >>> >> Hi, >> to be consistent with similar integrations and to avoid potential >> merging problems, >> going forward please work with the hs-rt repo for this kind of changes, >> as Volker has been doing. 
> Hi Alejandro, > > I have rebased the patch on hs-rt. New webrev > > http://cr.openjdk.java.net/~enevill/8129426/webrev.03/ > > Does it look OK to push? > > Thanks, > Ed. > Apologies, I thought you had already pushed that to jdk9/dev, but it turns out you had pushed 8129551 , no this one. If this is a follow up to the previous push into jdk9/dev (8129551) or somewhat related, then it's probably better if you pushed this one to jdk9/dev as well, as to avoid any possible conflict when we merge jdk9/dev with jdk9/hs. If they are completely independent then go ahead and push it to hs-rt, after review of course. Thanks -- Alejandro From alejandro.murillo at oracle.com Wed Jun 24 20:51:30 2015 From: alejandro.murillo at oracle.com (Alejandro E Murillo) Date: Wed, 24 Jun 2015 14:51:30 -0600 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <1435175049.13459.26.camel@mint> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <558935D2.1020103@redhat.com> <1435067460.5083.21.camel@mylittlepony.linaroharston> <5589CB73.7020509@oracle.com> <1435153011.13459.2.camel@mint> <558AFCD9.1090501@oracle.com> <1435175049.13459.26.camel@mint> Message-ID: <558B1852.2030800@oracle.com> On 6/24/2015 1:44 PM, Edward Nevill wrote: > On Wed, 2015-06-24 at 12:54 -0600, Alejandro E Murillo wrote: >> On 6/24/2015 7:36 AM, Edward Nevill wrote: >>> On Tue, 2015-06-23 at 15:11 -0600, Alejandro E Murillo wrote: >>>> On 6/23/2015 7:51 AM, Edward Nevill wrote: >>>>> On Tue, 2015-06-23 at 11:32 +0100, Andrew Dinn wrote: >>>>>> On 23/06/15 11:10, Andrew Haley wrote: >>>>>>> On 22/06/15 17:14, Andrew Dinn wrote: >>>>>>>> . . . 
>>>>>>>> Well, you might not want to take this risk and instead add an explicit >>>>>>>> zero of the upper half. But I think we need to be clear what risk we are >>>>>>>> taking. >>>>>>> It's this: if we don't explicitly zero the upper half we'll have to >>>>>>> audit all the code which might present a sign-extended value (instead >>>>>>> of a zero-extended one) in a register that's supposed to contain a >>>>>>> jint. >>>>>> Ok, let's play safe. If Ed tweaks the patch to zero the upper word we >>>>>> can always revise that later if/when we decide we are feeling lucky. >>>>>> >>>>> OK. New webrev at >>>>> >>>>> http://cr.openjdk.java.net/~enevill/8129426/webrev.02/ >>>>> >>>>> All the best, >>>>> Ed. >>>>> >>>>> >>>> Hi, >>>> to be consistent with similar integrations and to avoid potential >>>> merging problems, >>>> going forward please work with the hs-rt repo for this kind of changes, >>>> as Volker has been doing. >>> Hi Alejandro, >>> >>> I have rebased the patch on hs-rt. New webrev >>> >>> http://cr.openjdk.java.net/~enevill/8129426/webrev.03/ >>> >>> Does it look OK to push? >>> >>> Thanks, >>> Ed. >>> >> Apologies, I thought you had already pushed that to jdk9/dev, >> but it turns out you had pushed 8129551 , no this one. >> >> If this is a follow up to the previous push into jdk9/dev (8129551) >> or somewhat related, then it's probably better if you pushed >> this one to jdk9/dev as well, as to avoid any possible conflict >> when we merge jdk9/dev with jdk9/hs. If they are completely >> independent then go ahead and push it to hs-rt, after review of course. > OK, I was confused when you suggested it should be pushed to hs-rt, since the change is adding PopCount to C2. > > Should I base it on hs-comp and move the review over to hotspot-compiler-dev? > > All the best, > Ed. > > As I said, I didn't double checked and thought that was for the push you had done into jdk9/dev yes, this one looks more appropriate for hs-comp. 
Depending on the change, going forward, push to the appropriate
hotspot group repo (hs-rt or hs-comp), not to jdk9/dev.

cheers

--
Alejandro

From aph at redhat.com Thu Jun 25 09:24:49 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 25 Jun 2015 10:24:49 +0100
Subject: [aarch64-port-dev ] Scalar reduction
Message-ID: <558BC8E1.9050207@redhat.com>

Do you know if int scalar reduction is supposed to work yet?

This doesn't seem to be vectorized:

  int sum(int[] a) {
    int val = 0;
    for (int elem : a)
      val += elem;
    return val;
  }

but this is:

  int[] sum(int[] a, int[] b, int[] result) {
    for (int i = 0; i < a.length; i++)
      result[i] = a[i] + b[i];
    return result;
  }

Hmmmm, doesn't seem to work on x86 either. Baffled.

Andrew.

From edward.nevill at gmail.com Thu Jun 25 10:20:49 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 25 Jun 2015 11:20:49 +0100
Subject: [aarch64-port-dev ] Scalar reduction
In-Reply-To: <558BC8E1.9050207@redhat.com>
References: <558BC8E1.9050207@redhat.com>
Message-ID: <1435227649.11204.11.camel@mylittlepony.linaroharston>

On Thu, 2015-06-25 at 10:24 +0100, Andrew Haley wrote:
> Do you know if int scalar reduction is supposed to work yet?

Yes, the following shows an example

--- cut here ---
public class Sum {

  public static void main(String[] args) {
    int[] a = new int[256*1024];
    int[] b = new int[256*1024];
    init(a, b);
    int total = 0;
    for (int j = 0; j < 2000; j++) {
      total = sum(a, b);
    }
    System.out.println("total = " + total);
  }

  public static void init(int[] a, int[] b) {
    for (int j = 0; j < 1; j++) {
      for (int i = 0; i < a.length; i++) {
        a[i] = i * 1 + j;
        b[i] = i * 1 - j;
      }
    }
  }

  public static int sum(int[] a, int[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] + b[i];
    }
    return total;
  }
}
--- cut here ---

This generates

  0x000003ff850eaa00: sbfiz x11, x16, #2, #32   ;*iaload
                                                ; - Sum::sum at 13 (line 35)
  0x000003ff850eaa04: add   x12, x2, x11
  0x000003ff850eaa08: add   x11, x18, x11
  0x000003ff850eaa0c: ldr   q17, [x11,#16]
  0x000003ff850eaa10: ldr   q16, [x12,#16]
  0x000003ff850eaa14: sbfiz x11, x16, #2, #32
  0x000003ff850eaa18: add   x12, x2, x11
  0x000003ff850eaa1c: add   x11, x18, x11
  0x000003ff850eaa20: ldr   q19, [x11,#32]
  0x000003ff850eaa24: ldr   q18, [x12,#32]
  0x000003ff850eaa28: add   v16.4s, v16.4s, v17.4s
  0x000003ff850eaa2c: add   v17.4s, v18.4s, v19.4s
  0x000003ff850eaa30: addv  s18, v16.4s         <<<<< SCALAR REDUCTION
  0x000003ff850eaa34: mov   w12, v18.s[0]
  0x000003ff850eaa38: add   w11, w12, w0
  0x000003ff850eaa3c: add   w16, w16, #0x8      ;*iinc
                                                ; - Sum::sum at 20 (line 33)
  0x000003ff850eaa40: addv  s16, v17.4s
  0x000003ff850eaa44: mov   w13, v16.s[0]
  0x000003ff850eaa48: add   w0, w13, w11        ;*iadd
                                                ; - Sum::sum at 18 (line 35)
  0x000003ff850eaa4c: cmp   w16, w10
  0x000003ff850eaa50: b.lt  0x000003ff850eaa00  ;*if_icmpge

>
> This doesn't seem to be vectorized:
>
>   int sum(int[] a) {
>     int val = 0;
>     for(int elem: a)
>       val += elem;
>     return val;
>   }

But yes, it seems rather bad that it doesn't get this.

I'll take a closer look,
Ed.

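[Editorial sketch, not part of the original message.] The listing in the message above adds four int lanes at a time ('add v16.4s, ...'), collapses them with 'addv', and then adds the result to the running scalar total. The Java below is only a hand-written model of that arithmetic, assuming a four-lane step with a scalar tail; the class and method names (ReductionSketch, sumScalar, sumLanes) are made up for illustration and say nothing about how C2 actually performs the transformation. Because int addition wraps and is associative modulo 2^32, both forms give the same result.

```java
class ReductionSketch {
    // Plain scalar reduction, the same shape as Sum.sum in the message.
    static int sumScalar(int[] a, int[] b) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i] + b[i];
        }
        return total;
    }

    // Hypothetical model of the vectorized loop: four int lanes per step.
    static int sumLanes(int[] a, int[] b) {
        int total = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            // models the lane-wise "add v16.4s, v16.4s, v17.4s"
            int l0 = a[i] + b[i];
            int l1 = a[i + 1] + b[i + 1];
            int l2 = a[i + 2] + b[i + 2];
            int l3 = a[i + 3] + b[i + 3];
            // models "addv" (sum across lanes) followed by the scalar add
            total += l0 + l1 + l2 + l3;
        }
        for (; i < a.length; i++) { // scalar tail for leftover elements
            total += a[i] + b[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] a = new int[1003];
        int[] b = new int[1003];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * 7919;
            b[i] = 5 - i * 31;
        }
        System.out.println(sumScalar(a, b) == sumLanes(a, b));
    }
}
```

The array length 1003 is deliberately not a multiple of four so that the scalar tail is exercised as well.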
From aph at redhat.com Thu Jun 25 10:23:29 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 25 Jun 2015 11:23:29 +0100
Subject: [aarch64-port-dev ] Scalar reduction
In-Reply-To: <1435227649.11204.11.camel@mylittlepony.linaroharston>
References: <558BC8E1.9050207@redhat.com>
 <1435227649.11204.11.camel@mylittlepony.linaroharston>
Message-ID: <558BD6A1.9030502@redhat.com>

On 06/25/2015 11:20 AM, Edward Nevill wrote:
> But yes, it seems rather bad that it doesn't get this.

Ah, OK. x86 doesn't work either. I guess better to ask hs-comp.

Andrew.

From edward.nevill at gmail.com Thu Jun 25 10:40:49 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 25 Jun 2015 11:40:49 +0100
Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors
Message-ID: <1435228849.11204.17.camel@mylittlepony.linaroharston>

Hi,

The following webrev adds support for 64 bit vectors (only) on aarch64

http://cr.openjdk.java.net/~enevill/8086087/webrev.02

Previously the vector code only supported 128 bit vectors.

32 bit vectors are not supported in this changeset but will be supported in a future changeset.

I have tested this with JTreg hotspot with the following results

Original: Test results: passed: 858; failed: 4; error: 6
Revised:  Test results: passed: 857; failed: 5; error: 6

The additional test failure is compiler/intrinsics/muladd/TestMulAdd.java which fails intermittently with both original and revised versions (I'll take a look at that next:-).

I have also benchmarked the Test*Vect tests from 6340864 in the hotspot test suite. The following are the average results I get on one of our partners' HW (lower number is better).

TestByteVect:   128-bit (11.77), 64-bit (4.36)
TestShortVect:  128-bit (5.02),  64-bit (5.22)
TestIntVect:    128-bit (7.81),  64-bit (7.70)
TestLongVect:   128-bit (11.67), 64-bit (11.71)
TestFloatVect:  128-bit (16.75), 64-bit (17.29)
TestDoubleVect: 128-bit (32.37), 64-bit (32.43)

So the only test which shows an improvement is TestByteVect, which shows a 2.7x speedup. The other tests are the same within the bounds of experimental error.

The reason TestByteVect shows such an improvement is that with 128 bit vectors it is not being vectorized at all because the loop is not unrolled sufficiently to allow it to be vectorized, whereas with 64 bit vectors it is.

Please review and let me know if this is OK to push?

Ed.

PS: For pushing an aarch64 specific change to hs-comp do I need 1 or 2 reviewers?

From vladimir.kozlov at oracle.com Thu Jun 25 14:20:41 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 25 Jun 2015 07:20:41 -0700
Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors
In-Reply-To: <1435228849.11204.17.camel@mylittlepony.linaroharston>
References: <1435228849.11204.17.camel@mylittlepony.linaroharston>
Message-ID: <558C0E39.2060000@oracle.com>

This looks good. Thank you, Ed.

Since the changes are big you need 2 reviewers. One official reviewer (me in this case) and one who is familiar with this code
and is at least a committer (Andrew, for example).

Thanks,
Vladimir

On 6/25/15 3:40 AM, Edward Nevill wrote:
> Hi,
>
> The following webrev adds support for 64 bit vectors (only) on aarch64
>
> http://cr.openjdk.java.net/~enevill/8086087/webrev.02
>
> Previously the vector code only supported 128 bit vectors.
>
> 32 bit vectors are not supported in this changeset but will be supported in a future changeset.
> > I have tested this with JTreg hotspot with the following results > > Original: Test results: passed: 858; failed: 4; error: 6 > Revised: Test results: passed: 857; failed: 5; error: 6 > > The additional test failure is compiler/intrinsics/muladd/TestMulAdd.java which fails intermittently with both original and revised versions (I'll take a look at that next:-). > > I have also benchmarked the Test*Vect tests from 6340864 in the hotspot test suite. The following are the average results I get on one of our partners HW (lower number is better). > > TestByteVect: 128-bit (11.77), 64-bit (4.36) > TestShortVect: 128-bit (5.02), 64-bit (5.22) > TestIntVect: 128-bit (7.81), 64-bit (7.70) > TestLongVect: 128-bit (11.67), 64-bit (11.71) > TestFloatVect: 128-bit (16.75), 64-bit (17.29) > TestDoubleVect:128-bit (32.37), 64-bit (32.43) > > So the only test which shows an improvement is TestByteVect which shows a 2.7x speedup. The other tests are the same within the bounds of experimental error. > > The reason TestByteVect shows such an improvement is that with 128 bit vectors it is not being vectorized at all because the loop is not unrolled sufficiently to allow it to be vectorized, wheras with 64 bit vectors it is. > > Please review and let me know if this is OK to push? > > Ed. > > PS: For pushing an aarch64 specific change to hs-comp do I need 1 or 2 reviewers? 
> > From edward.nevill at linaro.org Thu Jun 25 14:37:38 2015 From: edward.nevill at linaro.org (Edward Nevill) Date: Thu, 25 Jun 2015 15:37:38 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 Message-ID: <1435243058.29000.4.camel@mylittlepony.linaroharston> Hi, Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad The following webrev adds support for these using the SIMD instructions 'cnt' and 'addv' http://cr.openjdk.java.net/~enevill/8129426/webrev.04 This patch was contributed by alexander.alexeev at caviumnetworks.com The patch only modifies aarch64 specific files. I have merged the patch in and tested it with JTreg / hotspot with the following results for both original and revised Test results: passed: 858; failed: 4; error: 6 I have benchmarked the patch on four different partner platforms. The average improvement was 2.6X for PopCountI and 2.5X for PopCountL. Please review, Thanks, Ed. From edward.nevill at gmail.com Thu Jun 25 14:44:04 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 25 Jun 2015 15:44:04 +0100 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 Message-ID: <1435243444.29000.6.camel@mylittlepony.linaroharston> Hi, Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad The following webrev adds support for these using the SIMD instructions 'cnt' and 'addv' http://cr.openjdk.java.net/~enevill/8129426/webrev.04 This patch was contributed by alexander.alexeev at caviumnetworks.com The patch only modifies aarch64 specific files. I have merged the patch in and tested it with JTreg / hotspot with the following results for both original and revised Test results: passed: 858; failed: 4; error: 6 I have benchmarked the patch on four different partner platforms. The average improvement was 2.6X for PopCountI and 2.5X for PopCountL. Please review, Thanks, Ed. 
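[Editorial sketch, not part of the original messages.] The RFR above says PopCountI and PopCountL are implemented with the SIMD instructions 'cnt' and 'addv'. The Java below is a rough model of that idea only: 'cnt' is modelled as a per-byte bit count over a 64-bit value, and 'addv' as a sum across the eight byte lanes. The class and method names (PopCountSketch, cntPerByte, addvBytes) are invented for illustration, and the actual instruction sequence in the webrev is not reproduced here. The int variant zero-extends its input first, matching the "zero the upper word" decision taken earlier in the thread.

```java
class PopCountSketch {
    // Models 'cnt' on eight byte lanes: afterwards each byte of the result
    // holds the number of set bits (0..8) of the corresponding input byte.
    static long cntPerByte(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
        x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;
        return x;
    }

    // Models 'addv' over byte lanes: add the eight per-byte counts together.
    static int addvBytes(long lanes) {
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += (int) ((lanes >>> (8 * i)) & 0xFF);
        }
        return sum;
    }

    static int popCountL(long x) {
        return addvBytes(cntPerByte(x));
    }

    // The upper 32 bits are cleared before counting, so a negative int
    // does not pick up 32 extra sign bits.
    static int popCountI(int x) {
        return popCountL(x & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long v = 0xF0F0123456789ABCL;
        System.out.println(popCountL(v) == Long.bitCount(v));
        System.out.println(popCountI(-1) == Integer.bitCount(-1));
    }
}
```

Long.bitCount and Integer.bitCount are used here only as reference results to compare the model against.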
From vladimir.kozlov at oracle.com Thu Jun 25 19:29:17 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 25 Jun 2015 12:29:17 -0700 Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2 In-Reply-To: <1435243444.29000.6.camel@mylittlepony.linaroharston> References: <1435243444.29000.6.camel@mylittlepony.linaroharston> Message-ID: <558C568D.5080506@oracle.com> Looks good. Thanks, Vladimir On 6/25/15 7:44 AM, Edward Nevill wrote: > Hi, > > Aarch64 currently does not support the PopCountI and PopCountL nodes in aarch64.ad > > The following webrev adds support for these using the SIMD instructions 'cnt' and 'addv' > > http://cr.openjdk.java.net/~enevill/8129426/webrev.04 > > This patch was contributed by alexander.alexeev at caviumnetworks.com > > The patch only modifies aarch64 specific files. > > I have merged the patch in and tested it with JTreg / hotspot with the following results for both original and revised > > Test results: passed: 858; failed: 4; error: 6 > > I have benchmarked the patch on four different partner platforms. The average improvement was 2.6X for PopCountI and 2.5X for PopCountL. > > Please review, > > Thanks, > Ed. > > From aph at redhat.com Mon Jun 29 09:07:10 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 29 Jun 2015 10:07:10 +0100 Subject: [aarch64-port-dev ] RFR: 8086087: aarch64: add support for 64 bit vectors In-Reply-To: <558C0E39.2060000@oracle.com> References: <1435228849.11204.17.camel@mylittlepony.linaroharston> <558C0E39.2060000@oracle.com> Message-ID: <55910ABE.8050809@redhat.com> On 25/06/15 15:20, Vladimir Kozlov wrote: > Since changes a big you need 2 reviewers. One official reviewer (me in this case) and one who is familiar with this code > and at least committer (Andrew, for example). Looks good. Thanks, Andrew. 
From aph at redhat.com Mon Jun 29 09:08:18 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 29 Jun 2015 10:08:18 +0100
Subject: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2
In-Reply-To: <1435243444.29000.6.camel@mylittlepony.linaroharston>
References: <1435243444.29000.6.camel@mylittlepony.linaroharston>
Message-ID: <55910B02.50604@redhat.com>

On 25/06/15 15:44, Edward Nevill wrote:
> Please review,

This is fine. Thanks,

Andrew.

From aph at redhat.com Mon Jun 29 12:15:21 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 29 Jun 2015 13:15:21 +0100
Subject: [aarch64-port-dev ] Sign-extending 32-bit operands in adapters [Was: RFR: 8129426: aarch64: add support for PopCount in C2]
In-Reply-To: <5589308C.6000309@redhat.com>
References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com>
 <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com>
 <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com>
 <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com>
 <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com>
Message-ID: <559136D9.5000801@redhat.com>

Here's a snippet from gen_i2c_adapter where we sign extend:

      if (!r_2->is_valid()) {
        // sign extend???
        __ ldrsw(rscratch2, Address(esp, ld_off));
        __ str(rscratch2, Address(sp, st_off));

but in another place we don't sign extend:

        // sign extend and use a full word?
        __ ldrw(r, Address(esp, ld_off));
      }

So, we sign extend when our argument is passed to compiled code in
memory, but zero extend when it is passed in a register. The
confusion (and those comments) about what should happen seems to come
from the x86 code.

I think we've agreed that we should zero extend, but I'm still far
from convinced that we should ever use an input operand in any mode
other than its natural size.

Andrew.

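[Editorial sketch, not part of the original message.] The message above contrasts 'ldrsw' (load a 32-bit word and sign extend it to 64 bits) with 'ldrw' (load a 32-bit word and zero extend it). The Java below models only the resulting 64-bit values, using a cast for sign extension and a mask for zero extension; the names ExtendSketch, signExtend and zeroExtend are invented for illustration. It shows why the choice is invisible to code that reads the operand at its natural 32-bit size, yet changes the answer for anything that consumes all 64 bits, such as a 64-bit population count.

```java
class ExtendSketch {
    // Models the value left in a 64-bit register by ldrsw.
    static long signExtend(int x) {
        return (long) x;
    }

    // Models the value left in a 64-bit register by ldrw.
    static long zeroExtend(int x) {
        return x & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int[] samples = {1, 0x7FFFFFFF, -1, 0x80000000};
        for (int x : samples) {
            long s = signExtend(x);
            long z = zeroExtend(x);
            // The low 32 bits always agree; the upper 32 bits differ
            // whenever x is negative.
            System.out.printf("%08x sign=%016x zero=%016x low32Equal=%b bits=%d/%d%n",
                    x, s, z, (int) s == (int) z,
                    Long.bitCount(s), Long.bitCount(z));
        }
    }
}
```

For non-negative inputs the two extensions coincide, so a problem of this kind would only show up with negative jint values.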
From goetz.lindenmaier at sap.com Mon Jun 29 12:46:42 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 29 Jun 2015 12:46:42 +0000 Subject: [aarch64-port-dev ] Sign-extending 32-bit operands in adapters [Was: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2] In-Reply-To: <559136D9.5000801@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <559136D9.5000801@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D000CA6@DEWDFEMB12A.global.corp.sap> Hi Andrew, I have an off-topic question that touches this issue: You have CCallingConventionRequiresIntsAsLongs set to true. We once introduced this, because we have to sign-extend all ints and place them in long slots for PPC C calling conventions. If this is set, we do the cast in the frontend, and the i2l nodes can be optimized, and we don't need to do it in the native wrapper. Unfortunately, there are more and more intrinsics with explicitly constructed calls. We have to adapt these in the frontend, which causes not that nice shared changes. I think about doing the i2l cast right in the native wrapper. There, the cast will always be necessary, i.e., it's not optimized, but that's not really performance relevant. So basically I could remove the code guarded by CCallingConventionRequiresIntsAsLongs, except for that you use it ... But as I read the aarch code, it's not really necessary. You pass ints in small slots, anyways. So do you rely on that code? Best regards, Goetz. -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley Sent: Montag, 29. 
Juni 2015 14:15 To: Andrew Dinn; edward.nevill at gmail.com Cc: hotspot-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Sign-extending 32-bit operands in adapters [Was: [aarch64-port-dev ] RFR: 8129426: aarch64: add support for PopCount in C2] Here's a snippet from gen_i2c_adapter where we sign extend: if (!r_2->is_valid()) { // sign extend??? __ ldrsw(rscratch2, Address(esp, ld_off)); __ str(rscratch2, Address(sp, st_off)); but in another place we don't sign extend: // sign extend and use a full word? __ ldrw(r, Address(esp, ld_off)); } So, we sign extend when our argument is passed to compiled code in memory, but zero extend when it is passed in a register. The confusion (and those comments) about what should happen seems to come from the x86 code. I think we've agreed that we should zero extend, but I'm still far from convinced that we should ever use an input operand in any mode other than its natural size. Andrew. From aph at redhat.com Mon Jun 29 12:54:38 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 29 Jun 2015 13:54:38 +0100 Subject: [aarch64-port-dev ] Sign-extending 32-bit operands in adapters [Was: RFR: 8129426: aarch64: add support for PopCount in C2] In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D000CA6@DEWDFEMB12A.global.corp.sap> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <559136D9.5000801@redhat.com> <4295855A5C1DE049A61835A1887419CC2D000CA6@DEWDFEMB12A.global.corp.sap> Message-ID: <5591400E.9090702@redhat.com> On 06/29/2015 01:46 PM, Lindenmaier, Goetz wrote: > So basically I could remove the code guarded by CCallingConventionRequiresIntsAsLongs, > except for that you use it ... > But as I read the aarch code, it's not really necessary. 
You pass ints in small slots, anyways. > So do you rely on that code? Could you please point me to exactly the code you are talking about which we use? Andrew. From aph at redhat.com Mon Jun 29 13:00:29 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 29 Jun 2015 14:00:29 +0100 Subject: [aarch64-port-dev ] Sign-extending 32-bit operands in adapters [Was: RFR: 8129426: aarch64: add support for PopCount in C2] In-Reply-To: <5591400E.9090702@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <559136D9.5000801@redhat.com> <4295855A5C1DE049A61835A1887419CC2D000CA6@DEWDFEMB12A.global.corp.sap> <5591400E.9090702@redhat.com> Message-ID: <5591416D.7000804@redhat.com> On 06/29/2015 01:54 PM, Andrew Haley wrote: > On 06/29/2015 01:46 PM, Lindenmaier, Goetz wrote: >> So basically I could remove the code guarded by CCallingConventionRequiresIntsAsLongs, >> except for that you use it ... >> But as I read the aarch code, it's not really necessary. You pass ints in small slots, anyways. >> So do you rely on that code? > > Could you please point me to exactly the code you are talking about > which we use? Or is it simply that we set CCallingConventionRequiresIntsAsLongs = true? We don't need to do that. Andrew. 
From goetz.lindenmaier at sap.com Mon Jun 29 13:04:33 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 29 Jun 2015 13:04:33 +0000 Subject: [aarch64-port-dev ] Sign-extending 32-bit operands in adapters [Was: RFR: 8129426: aarch64: add support for PopCount in C2] In-Reply-To: <5591416D.7000804@redhat.com> References: <1434979401.21282.31.camel@mint> <558815DA.8020500@redhat.com> <1434985182.21282.34.camel@mint> <558823FD.5080800@redhat.com> <558829D2.4000503@redhat.com> <55882C94.7030505@redhat.com> <55882ECC.8030602@redhat.com> <5588313C.1070409@redhat.com> <5588345A.4060708@redhat.com> <5589308C.6000309@redhat.com> <559136D9.5000801@redhat.com> <4295855A5C1DE049A61835A1887419CC2D000CA6@DEWDFEMB12A.global.corp.sap> <5591400E.9090702@redhat.com> <5591416D.7000804@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D000CEC@DEWDFEMB12A.global.corp.sap> Yes, right, that's what I mean. So I'll remove it. Best regards, Goetz. -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Montag, 29. Juni 2015 15:00 To: Lindenmaier, Goetz; Andrew Dinn; edward.nevill at gmail.com Cc: hotspot-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] Sign-extending 32-bit operands in adapters [Was: RFR: 8129426: aarch64: add support for PopCount in C2] On 06/29/2015 01:54 PM, Andrew Haley wrote: > On 06/29/2015 01:46 PM, Lindenmaier, Goetz wrote: >> So basically I could remove the code guarded by CCallingConventionRequiresIntsAsLongs, >> except for that you use it ... >> But as I read the aarch code, it's not really necessary. You pass ints in small slots, anyways. >> So do you rely on that code? > > Could you please point me to exactly the code you are talking about > which we use? Or is it simply that we set CCallingConventionRequiresIntsAsLongs = true? We don't need to do that. Andrew. 
From tangwei6 at huawei.com Tue Jun 30 14:34:12 2015
From: tangwei6 at huawei.com (Tangwei (Euler))
Date: Tue, 30 Jun 2015 14:34:12 +0000
Subject: [aarch64-port-dev ] barrier issue in cmpxchgptr implementation
Message-ID:

Hi All,

I checked the MacroAssembler::cmpxchgptr implementation in the JVM, and found that a load-acquire/store-release pair followed by a full memory barrier is used to simulate the behavior of X86 cmpxchg. The following is a code snippet from the function for your reference. In my opinion, the cmpxchg must provide full bi-directional fence semantics. Can anyone help to explain why one membar at the end is enough for cmpxchg? Or does the memory barrier that is commented out at the beginning need to be added?

    // membar(AnyAny) // is this needed?
    retry_load:
      ldaxr(tmp, addr);
      cmp(tmp, oldv);
      br(Assembler::NE, nope);
      stlxr(tmp, newv, addr);
      cbzw(tmp, succeed);
      b(retry_load);
    nope:
      membar(AnyAny);
      mov(oldv, tmp);

I checked the source of __cmpxchg_mb; it seems two memory barriers are added around cmpxchg.

      smp_mb();
      ret = __cmpxchg(ptr, old, new, size);
      smp_mb();

But after checking the GCC intrinsic with the following simple case, I found there is no barrier around the load-acquire/store-release. I am a little confused now. Which instruction sequence should be chosen to simulate X86 cmpxchg?

long foo (long *ptr, long old, long new)
{
  return __sync_val_compare_and_swap (ptr, old, new);
}

Assembly:

      ldaxr   x3, [x0]        // 22   aarch64_load_exclusivedi    [length = 4]
      cmp     x3, x1          // 23   *cmpdi/1                    [length = 4]
      bne     .L3             // 24   *condjump                   [length = 4]
      stlxr   w4, x2, [x0]    // 25   aarch64_store_exclusivedi   [length = 4]
      cbnz    w4, .L2         // 26   *cbnesi1                    [length = 4]

Regards!
wei

From aph at redhat.com Tue Jun 30 18:10:47 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 30 Jun 2015 19:10:47 +0100
Subject: [aarch64-port-dev ] barrier issue in cmpxchgptr implementation
In-Reply-To:
References:
Message-ID: <5592DBA7.6010501@redhat.com>

Hi,

On 06/30/2015 03:34 PM, Tangwei (Euler) wrote:

> I checked the MacroAssembler::cmpxchgptr implementation in JVM,
> and found load-acquire/store-release followed by a full memory
> barrier used to simulate the behavior of X86 cmpxchg.

Well, it's not necessarily doing that. It is a cmpxchg which needs to
be strong enough for HotSpot's usage: it does not necessarily have to
be as strong as x86. But let's move on...

> Following is the code snapshot from the function for your
> reference. In my opinion, the cmpxchg must provide full
> bi-directional fence semantics. Is there anyone can help to explain
> why one membar in the end is enough for cmpxchg? Or one more memory
> barrier commented out is needed to add at the beginning?

>   // membar(AnyAny) // is this needed?
>   retry_load:
                                  LoadLoad|LoadStore
>     ldaxr(tmp, addr);
>     cmp(tmp, oldv);
>     br(Assembler::NE, nope);
                                  StoreStore|LoadStore
>     stlxr(tmp, newv, addr);
>     cbzw(tmp, succeed);
>     b(retry_load);
>   nope:
>     membar(AnyAny);
                                  AnyAny. Nothing can pass here.

Please explain what problem you see with this sequence. We need an
example of incorrect operation.

> But after checked intrinsic in GCC with following simple case, I
> found there is no barrier around load-acquire/store-release. I am a
> little confused now. Which instruction sequence should be chosen to
> simulate X86 cmpxchg?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

As I said, that depends if you really need to simulate X86 cmpxchg.

Andrew.
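
[Editorial sketch, not part of the original message.] The sequence discussed above is a retry loop: load the current value, bail out if it differs from the expected one, otherwise attempt the store and retry if the store did not succeed. The Java below mirrors only that control flow, using java.util.concurrent.atomic.AtomicLong; the class and method names (CasSketch, valCompareAndSwap) are made up for illustration. It returns the observed value in the style of __sync_val_compare_and_swap. It makes no attempt to model the barrier question itself, since AtomicLong operations already carry Java's volatile ordering guarantees.

```java
import java.util.concurrent.atomic.AtomicLong;

class CasSketch {
    // Returns the value seen in 'cell'. The update was stored if and only
    // if the returned value equals 'expected'.
    static long valCompareAndSwap(AtomicLong cell, long expected, long update) {
        while (true) {
            long seen = cell.get();               // plays the role of ldaxr
            if (seen != expected) {
                return seen;                      // the "nope" path
            }
            if (cell.compareAndSet(expected, update)) {
                return seen;                      // store succeeded
            }
            // Another thread changed the cell in between: retry_load.
        }
    }

    public static void main(String[] args) {
        AtomicLong cell = new AtomicLong(5);
        System.out.println(valCompareAndSwap(cell, 5, 9) + " -> " + cell.get());
        System.out.println(valCompareAndSwap(cell, 5, 1) + " -> " + cell.get());
    }
}
```

In the first call the expected value matches, so the cell is updated; in the second it does not, so the cell is left alone and the current value is returned.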