From dean.long at oracle.com Wed May 1 01:06:52 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Tue, 30 Apr 2019 18:06:52 -0700
Subject: [13] RFR (S): 8223171: Redundant nmethod dependencies for effectively final methods
In-Reply-To:
References:
Message-ID:

Does this allow us to assert !uniqm->can_be_statically_bound() in
Dependencies::assert_unique_concrete_method?

dl

On 4/30/19 12:59 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8223171/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8223171
>
> Both C1 & C2 may register redundant nmethod dependencies (which
> always hold). For example, for instance methods on final classes.
>
> Moreover, C2 does add dependencies for private methods.
>
> The patch enhances the checks and unifies them between C1 & C2.
>
> Testing: tier1-4
>
> Best regards,
> Vladimir Ivanov

From sandhya.viswanathan at intel.com Wed May 1 01:14:53 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Wed, 1 May 2019 01:14:53 +0000
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>

Hi VladimirK,

JBS: https://bugs.openjdk.java.net/browse/JDK-8222074

Please find updated webrev at:
http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/

With this webrev the ad file has only about 60 lines effectively added.
Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%.
I have used multiple match rules in one instruct for same-size shift related rules and also for the new Abs/Neg rules.
What I noticed is that the adlc still duplicates a lot of code, and there is potential to further improve code size for the multiple match rule case by improving the adlc itself.
The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE.

In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by the jcheck tool.
I have also taken into account the suggestion to reduce the temporaries by using TEMP dst for the multiply rules.

The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL.

Your review and feedback are welcome.

Best Regards,
Sandhya


-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
Sent: Wednesday, April 10, 2019 10:22 AM
To: Vladimir Kozlov ; B. Blaser
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86

Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used:

    __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len);
    __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, vector_len);

Here dst can be the same as tmp2 or tmp1 in vpackuswb(), so the TEMP dst effect is not required.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Wednesday, April 10, 2019 9:59 AM
To: Viswanathan, Sandhya ; B. Blaser
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86

On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote:
> Hi Bernard,
>
> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src.

Yes, that is what we do. Or, in the mul4B_reg() case, we can use $dst instead of $tmp2, which avoids overwriting $src2 before its value is read when $dst = $src2.

On the other hand, mul32B_reg_avx() and others have a 'TEMP dst' effect although $dst is used only for the final result.

It is a bit of a mess and may lead to inefficient use of registers in compiled code.

Thanks,
Vladimir

>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: B. Blaser [mailto:bsrbnd at gmail.com]
> Sent: Wednesday, April 10, 2019 4:10 AM
> To: Viswanathan, Sandhya
> Cc: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>
> Hi Sandhya and Vladimir K.,
>
> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote:
>>
>> Hi Vladimir,
>>
>> Yes, I missed the question below:
>>>> There are cases where we can use fewer `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there?
>>
>> No it is not intentional, we can use the dst register in those cases and reduce the tmps.
>
> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg():
>
>   format %{"pmovsxbw $tmp,$src1\n\t"
>            "pmovsxbw $tmp2,$src2\n\t"
>
> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think?
>
> Thanks,
> Bernard
>

From rahul.v.raghavan at oracle.com Wed May 1 16:39:46 2019
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Wed, 1 May 2019 22:09:46 +0530
Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change
In-Reply-To:
References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com>
 <392c665f-869c-29af-4fc5-e6f844820846@oracle.com>
 <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com>
 <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com>

Message-ID: <5fb42e98-f923-b3d8-ee04-6d899fb2ac2d@oracle.com>

Thank you Vladimir.

On 30/04/19 10:45 PM, Vladimir Ivanov wrote:
> Looks good!
>
> Best regards,
> Vladimir Ivanov
>
> On 30/04/2019 00:04, Rahul Raghavan wrote:
>> Thank you Vladimir Ivanov for suggestions.
>>
>> Please note following latest changes tried.
>> - http://cr.openjdk.java.net/~rraghavan/8202414/webrev.04/
>>
>> Hope did not miss any points.
>> Confirmed no failures with the reported test cases.
>> Also hs-tier1 to tier4, hs-precheckin-comp testing in progress.
>>
>> Thanks,
>> Rahul

From vladimir.x.ivanov at oracle.com Wed May 1 16:40:49 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 09:40:49 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
Message-ID:

So far, testing spotted a couple of minor issues:

windows build broken:

jib > t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1572): error C2220: warning treated as error - no 'object' file generated
jib > t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1572): warning C4101: 'inst': unreferenced local variable
jib > t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1600): warning C4101: 'inst': unreferenced local variable
jib > t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1616): warning C4101: 'inst': unreferenced local variable
jib > t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1632): warning C4101: 'inst': unreferenced local variable

compiler/graalunit/HotspotTest.java:

org.graalvm.compiler.hotspot.test.CRC32SubstitutionsTest finished 1685.0 ms
org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics started (7 of 44)
  test: FAILED
test(org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics)
java.lang.AssertionError: missing Graal intrinsics for:
  java/lang/Math.abs(I)I
  java/lang/Math.abs(J)J
  at org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics.test(CheckGraalIntrinsics.java:646)

I'll respond on the patch itself separately.

Best regards,
Vladimir Ivanov

On 30/04/2019 18:14, Viswanathan, Sandhya wrote:
> Hi VladimirK,
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8222074
>
> Please find updated webrev at:
> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/
>
> With this webrev the ad file has only about 60 lines effectively added.
> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%.
> I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules.
> What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself.
> The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE.
>
> In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool.
> Also taken into account reducing the temporary by using TEMP dst for multiply rules.
>
> The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL.
>
> Your review and feedback is welcome.

From vladimir.x.ivanov at oracle.com Wed May 1 16:52:16 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 09:52:16 -0700
Subject: [13] RFR (S): 8223171: Redundant nmethod dependencies for effectively final methods
In-Reply-To:
References:

Message-ID: <5cdb781f-1922-9a5d-9f52-f6874fd6a259@oracle.com>

> Does this allow us to assert !uniqm->can_be_statically_bound() in
> Dependencies::assert_unique_concrete_method?

In general, no. It doesn't hold for final methods: dependency is still
needed when context is broad enough, since an overriding method can be
loaded in a different part of the hierarchy (under the same context
class).

In case of the adjusted checks it's safe, since context == method
holder when actual_receiver->is_final() == true.

   if (!callee->is_final_method() && !callee->is_private() &&
       !actual_receiver->is_final()) {
     dependencies()->assert_unique_concrete_method(actual_receiver,
                                                   cha_monomorphic_target);
   }

I refactored the patch a bit:
   http://cr.openjdk.java.net/~vlivanov/8223171/webrev.01/

>> Moreover, C2 does add dependencies for private methods.

I take it back. Earlier checks handle private methods. Only methods on
final classes get redundant dependencies.

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com Wed May 1 16:52:56 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 09:52:56 -0700
Subject: [13] RFR (S): 8219902: C2: MemNode::can_see_stored_value() ignores casts which carry control dependency
In-Reply-To: <147e1906-381d-4a3d-0a24-23e9c149581d@oracle.com>
References: <8ab7b14b-d42d-37ea-e6d7-151d068c57f0@oracle.com>
 <147e1906-381d-4a3d-0a24-23e9c149581d@oracle.com>
Message-ID: <4c67bc3e-deac-d6ad-0a63-2e01193100cb@oracle.com>

Thanks, Vladimir.

Best regards,
Vladimir Ivanov

On 30/04/2019 13:02, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir K
>
> On 4/30/19 12:47 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8219902/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8219902
>>
>> JDK-8161334 [1] enhanced MemNode::can_see_stored_value to ignore casts
>> when access base addresses are compared. It turned out to be too
>> aggressive since casts may carry control dependency.
>>
>> Proposed fix is to keep casts with control dependency.
>>
>> Testing: failing test case, tier1-3
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8161334

From john.r.rose at oracle.com Wed May 1 18:20:03 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 1 May 2019 11:20:03 -0700
Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change
In-Reply-To:
References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com>
 <392c665f-869c-29af-4fc5-e6f844820846@oracle.com>
 <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com>
 <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com>

<5cdb781f-1922-9a5d-9f52-f6874fd6a259@oracle.com>
Message-ID:

Can you also add check_unique_method(ctxk, uniqm) to the version of
assert_unique_concrete_method that takes a Method*? Otherwise, the
changes look good to me.

dl

On 5/1/19 9:52 AM, Vladimir Ivanov wrote:
>
>> Does this allow us to assert !uniqm->can_be_statically_bound() in
>> Dependencies::assert_unique_concrete_method?
>
> In general, no. It doesn't hold for final methods: dependency is still
> needed when context is broad enough, since an overriding method can be
> loaded in a different part of the hierarchy (under the same context
> class).
>
> In case of the adjusted checks it's safe, since context == method
> holder when actual_receiver->is_final() == true.
>
>    if (!callee->is_final_method() && !callee->is_private() &&
>        !actual_receiver->is_final()) {
>      dependencies()->assert_unique_concrete_method(actual_receiver,
>                                                    cha_monomorphic_target);
>    }
>
> I refactored the patch a bit:
>    http://cr.openjdk.java.net/~vlivanov/8223171/webrev.01/
>
>>> Moreover, C2 does add dependencies for private methods.
>
> I take it back. Earlier checks handle private methods. Only methods on
> final classes get redundant dependencies.
>
> Best regards,
> Vladimir Ivanov

From vladimir.x.ivanov at oracle.com Wed May 1 21:58:01 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 14:58:01 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
Message-ID: <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com>

> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/

Nice job, Sandhya! Glad to hear the approach pays off!

Unfortunately, I must note that AD file becomes much more obscure.
Especially with those function pointers.

  void emit_vshift16B_code(MacroAssembler& _masm, int opcode, XMMRegister dst,
                           XMMRegister src, XMMRegister shift,
                           XMMRegister tmp1, XMMRegister tmp2, Register scratch) {
    XX_Inst extendinst = get_extend_inst(opcode == Op_URShiftVB ? false : true);
    XX_Inst shiftinst = get_xx_inst(opcode);

    (_masm.*extendinst)(tmp1, src);
    (_masm.*shiftinst)(tmp1, shift);
    __ pshufd(tmp2, src, 0xE);
    (_masm.*extendinst)(tmp2, tmp2);
    (_masm.*shiftinst)(tmp2, shift);
    __ movdqu(dst, ExternalAddress(vector_short_to_byte_mask()), scratch);
    __ pand(tmp2, dst);
    __ pand(dst, tmp1);
    __ packuswb(dst, tmp2);
  }

Have you tried to encapsulate that into x86-specific MacroAssembler?

  instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
    predicate(UseSSE > 3 && UseAVX <= 1 && n->as_Vector()->length() == 16);
    match(Set dst (LShiftVB src shift));
    match(Set dst (RShiftVB src shift));
    match(Set dst (URShiftVB src shift));
    effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
    format %{"pmovxbw $tmp1,$src\n\t"
             "shiftop $tmp1,$shift\n\t"
             "pshufd $tmp2,$src\n\t"
             "pmovxbw $tmp2,$tmp2\n\t"
             "shiftop $tmp2,$shift\n\t"
             "movdqu $dst,[0x00ff00ff0x00ff00ff]\n\t"
             "pand $tmp2,$dst\n\t"
             "pand $dst,$tmp1\n\t"
             "packuswb $dst,$tmp2\n\t! packed16B shift" %}
    ins_encode %{
      emit_vshift16B_code(_masm, this->as_Mach()->ideal_Opcode(),
                          $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister,
                          $tmp1$$XMMRegister, $tmp2$$XMMRegister, $scratch$$Register);
    %}
    ins_pipe( pipe_slow );
  %}

can be turned into something like:

  instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
    predicate(n->as_Vector()->length() == 16);
    match(Set dst (LShiftVB src shift));
    match(Set dst (RShiftVB src shift));
    match(Set dst (URShiftVB src shift));
    effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
    format %{"packed16B shift" %}
    ins_encode %{
      int vlen = 0; // 128-bit
      BasicType elem_type = T_BYTE;
      int shift_mode = ...; // L/R/UR or S/U + L/R
      __ vshift(vlen, elem_type, shift_mode,
                $dst$$..., $src$$..., $shift$$...,
                $tmp1$$..., $tmp2$$..., $scratch$$...);
    %}

Then MA::vshift can dispatch between different implementations depending
on SSE/AVX level available.

Do you see any problems with that from footprint perspective?

Ideally, I'd prefer to see a library of operations on vectors
encapsulated in MacroAssembler (or a subclass) and used in x86.ad. That
will accommodate further reductions in AD instructions needed.

Best regards,
Vladimir Ivanov

> With this webrev the ad file has only about 60 lines effectively added.

From sandhya.viswanathan at intel.com Wed May 1 22:09:49 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Wed, 1 May 2019 22:09:49 +0000
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
 <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB569D@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

I agree, I wanted to show both the approaches in this patch to get your feedback:
1) with emit as a function
2) with emit part in the instruct body itself

With emit as a function it becomes hard to read and I personally prefer it in the instruct itself as is done for vabsneg2D etc. That is what you are recommending as well so I feel good.

Once the adlc enhancement is done both the approaches should give similar binary size. Till then there will be small overhead with approach 2) as emit is duplicated per match rule.

I will send an updated patch fixing the two issues you mentioned in your previous email plus this change of using approach 2).

Please do let me know if you want to see any other change in this patch.
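
To make the trade-off discussed in this thread concrete, here is a minimal standalone sketch. It is not HotSpot code: MockAssembler, ShiftMode, vshift_bytes_as_words and emit_with_pointers are hypothetical names invented for illustration, registers are plain ints, and the "assembler" only records mnemonics. It contrasts selecting member-function pointers in the caller (as in emit_vshift16B_code) with a single assembler entry point that hides the instruction selection (in the spirit of the suggested vshift), and covers only the extend-and-shift step, not the full 16-byte sequence.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shift kinds, standing in for LShiftVB / RShiftVB / URShiftVB.
enum ShiftMode { LShift, RShift, URShift };

// Hypothetical stand-in for an assembler: records mnemonics instead of
// emitting machine code.
struct MockAssembler {
  std::vector<std::string> out;

  void record(const char* name, int dst, int src) {
    out.push_back(std::string(name) + " r" + std::to_string(dst) +
                  ",r" + std::to_string(src));
  }

  // Byte-to-word extensions and word shifts (names mirror x86 mnemonics).
  void pmovsxbw(int dst, int src) { record("pmovsxbw", dst, src); }
  void pmovzxbw(int dst, int src) { record("pmovzxbw", dst, src); }
  void psllw(int dst, int shift)  { record("psllw", dst, shift); }
  void psraw(int dst, int shift)  { record("psraw", dst, shift); }
  void psrlw(int dst, int shift)  { record("psrlw", dst, shift); }

  // Approach B: one entry point; instruction selection stays inside
  // the assembler, so callers never touch function pointers.
  void vshift_bytes_as_words(ShiftMode mode, int dst, int src, int shift) {
    switch (mode) {
      case LShift:  pmovsxbw(dst, src); psllw(dst, shift); break;
      case RShift:  pmovsxbw(dst, src); psraw(dst, shift); break;
      case URShift: pmovzxbw(dst, src); psrlw(dst, shift); break;
    }
  }
};

// Approach A: the caller picks member-function pointers and invokes them.
using XXInst = void (MockAssembler::*)(int, int);

XXInst extend_inst(ShiftMode mode) {
  // Zero-extend only for the unsigned right shift, sign-extend otherwise.
  return mode == URShift ? &MockAssembler::pmovzxbw : &MockAssembler::pmovsxbw;
}

XXInst shift_inst(ShiftMode mode) {
  switch (mode) {
    case LShift: return &MockAssembler::psllw;
    case RShift: return &MockAssembler::psraw;
    default:     return &MockAssembler::psrlw;
  }
}

void emit_with_pointers(MockAssembler& masm, ShiftMode mode,
                        int dst, int src, int shift) {
  XXInst extend = extend_inst(mode);
  XXInst shifter = shift_inst(mode);
  (masm.*extend)(dst, src);
  (masm.*shifter)(dst, shift);
}

// Returns true when both approaches record the same instruction sequence.
bool same_sequence(ShiftMode mode) {
  MockAssembler a, b;
  emit_with_pointers(a, mode, /*dst=*/1, /*src=*/2, /*shift=*/3);
  b.vshift_bytes_as_words(mode, /*dst=*/1, /*src=*/2, /*shift=*/3);
  return a.out == b.out;
}
```

Calling same_sequence() for each mode shows that both styles record identical sequences in this toy model; the difference is only where the selection logic lives and how readable the call site is.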

Best Regards,
Sandhya

From vladimir.x.ivanov at oracle.com Wed May 1 22:15:05 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 15:15:05 -0700
Subject: [13] RFR (S): 8223171: Redundant nmethod dependencies for effectively final methods
In-Reply-To:
References:

<5cdb781f-1922-9a5d-9f52-f6874fd6a259@oracle.com>
Message-ID: <50f8065b-445f-ae1d-00c8-743fd870404a@oracle.com>

> Can you also add check_unique_method(ctxk, uniqm) to the version of
> assert_unique_concrete_method that takes a Method*?

Like this?
  http://cr.openjdk.java.net/~vlivanov/8223171/webrev.02/

Best regards,
Vladimir Ivanov

> On 5/1/19 9:52 AM, Vladimir Ivanov wrote:
>>
>>> Does this allow us to assert !uniqm->can_be_statically_bound() in
>>> Dependencies::assert_unique_concrete_method?
>>
>> In general, no. It doesn't hold for final methods: dependency is still
>> needed when context is broad enough, since an overriding method can be
>> loaded in a different part of the hierarchy (under the same context
>> class).
>>
>> In case of the adjusted checks it's safe, since context == method
>> holder when actual_receiver->is_final() == true.
>>
>>    if (!callee->is_final_method() && !callee->is_private() &&
>>        !actual_receiver->is_final()) {
>>      dependencies()->assert_unique_concrete_method(actual_receiver,
>>                                                    cha_monomorphic_target);
>>    }
>>
>> I refactored the patch a bit:
>>   http://cr.openjdk.java.net/~vlivanov/8223171/webrev.01/
>>
>>>> Moreover, C2 does add dependencies for private methods.
>>
>> I take it back. Earlier checks handle private methods. Only methods on
>> final classes get redundant dependencies.
>> >> Best regards, >> Vladimir Ivanov > From sandhya.viswanathan at intel.com Wed May 1 22:16:44 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 1 May 2019 22:16:44 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com> <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com> <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com> I should add here that your suggestion of adding generic shift instruction etc to the macroAssembler is also wonderful instead of function pointer. I will look into making that change as well. Best Regards, Sandhya -----Original Message----- From: Viswanathan, Sandhya Sent: Wednesday, May 01, 2019 3:10 PM To: 'Vladimir Ivanov' ; Vladimir Kozlov Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 Hi Vladimir, I agree, I wanted to show both the approaches in this patch to get your feedback: 1) with emit as a function 2) with emit part in the instruct body itself With emit as a function it becomes hard to read and I personally prefer it in the instruct itself as is done for vabsneg2D etc. That is what you are recommending as well so I feel good. Once the adlc enhancement is done both the approaches should give similar binary size. 
Till then there will be small overhead with approach 2) as emit is duplicated per match rule. I will send an updated patch fixing the two issues you mentioned in your previous email plus this change of using approach 2). Please do let me know if you want to see any other change in this patch. Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] Sent: Wednesday, May 01, 2019 2:58 PM To: Viswanathan, Sandhya ; Vladimir Kozlov Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/ Nice job, Sandhya! Glad to hear the approach pays off! Unfortunately, I must note that AD file becomes much more obscure. Especially with those function pointers. 1528 void emit_vshift16B_code(MacroAssembler& _masm, int opcode, XMMRegister dst, 1529 XMMRegister src, XMMRegister shift, 1530 XMMRegister tmp1, XMMRegister tmp2, Register scratch) { 1531 XX_Inst extendinst = get_extend_inst(opcode == Op_URShiftVB ? false : true); 1532 XX_Inst shiftinst = get_xx_inst(opcode); 1533 1534 (_masm.*extendinst)(tmp1, src); 1535 (_masm.*shiftinst)(tmp1, shift); 1536 __ pshufd(tmp2, src, 0xE); 1537 (_masm.*extendinst)(tmp2, tmp2); 1538 (_masm.*shiftinst)(tmp2, shift); 1539 __ movdqu(dst, ExternalAddress(vector_short_to_byte_mask()), scratch); 1540 __ pand(tmp2, dst); 1541 __ pand(dst, tmp1); 1542 __ packuswb(dst, tmp2); 1543 } Have you tried to encapsulate that into x86-specific MacroAssembler? 
8682 instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{ 8683 predicate(UseSSE > 3 && UseAVX <= 1 && n->as_Vector()->length() == 16); 8684 match(Set dst (LShiftVB src shift)); 8685 match(Set dst (RShiftVB src shift)); 8686 match(Set dst (URShiftVB src shift)); 8687 effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch); 8688 format %{"pmovxbw $tmp1,$src\n\t" 8689 "shiftop $tmp1,$shift\n\t" 8690 "pshufd $tmp2,$src\n\t" 8691 "pmovxbw $tmp2,$tmp2\n\t" 8692 "shiftop $tmp2,$shift\n\t" 8693 "movdqu $dst,[0x00ff00ff0x00ff00ff]\n\t" 8694 "pand $tmp2,$dst\n\t" 8695 "pand $dst,$tmp1\n\t" 8696 "packuswb $dst,$tmp2\n\t! packed16B shift" %} 8697 ins_encode %{ 8698 emit_vshift16B_code(_masm, this->as_Mach()->ideal_Opcode() , $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister, $scratch$$Register); 8699 %} 8700 ins_pipe( pipe_slow ); 8701 %} can be turned into something like: instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{ predicate(n->as_Vector()->length() == 16); match(Set dst (LShiftVB src shift)); match(Set dst (RShiftVB src shift)); match(Set dst (URShiftVB src shift)); effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch); format %{"packed16B shift" %} ins_encode %{ int vlen = 0; // 128-bit BasicType elem_type = T_BYTE; int shift_mode = ...; // L/R/UR or S/U + L/R __ vshift(vlen, elem_type, shift_mode, $dst$$..., $src$$..., $shift$$..., $tmp1$$..., $tmp2$$..., $scratch$$...); %} Then MA::vshift can dispatch between different implementations depending on SSE/AVX level available. Do you see any problems with that from footprint perspective? Ideally, I'd prefer to see a library of operations on vectors encapsulated in MacroAssembler (or a subclass) and used in x86.ad. That will accommodate further reductions in AD instructions needed. Best regards, Vladimir Ivanov > With this webrev the ad file has only about 60 lines effectively added. 
> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%. > I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules. > What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself. > The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE. > > In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool. > Also taken into account reducing the temporary by using TEMP dst for multiply rules. > > The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL. > > Your review and feedback is welcome. > > Best Regards, > Sandhya > > > -----Original Message----- > From: hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of > Viswanathan, Sandhya > Sent: Wednesday, April 10, 2019 10:22 AM > To: Vladimir Kozlov ; B. Blaser > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 > > Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used: > > __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len); > __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, > vector_len); > > Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 10, 2019 9:59 AM > To: Viswanathan, Sandhya ; B. 
Blaser > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote: >> Hi Bernard, >> >> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src. > > Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead > $tmp2 to avoid overwriting > $src2 before we get value from it if $dst = $src2. > > On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result. > > It is a little mess which may cause ineffective use of registers in compiled code. > > Thanks, > Vladimir > >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: B. Blaser [mailto:bsrbnd at gmail.com] >> Sent: Wednesday, April 10, 2019 4:10 AM >> To: Viswanathan, Sandhya >> Cc: Vladimir Kozlov ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> Hi Sandhya and Vladimir K., >> >> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: >>> >>> Hi Vladimir, >>> >>> Yes, I missed the question below: >>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? >>> >>> No it is not intentional, we can use the dst register in those cases and reduced the tmps. >> >> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg(): >> >> 7349 format %{"pmovsxbw $tmp,$src1\n\t" >> 7350 "pmovsxbw $tmp2,$src2\n\t" >> >> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? 
>>
>> Thanks,
>> Bernard
>>

From vladimir.x.ivanov at oracle.com Wed May 1 23:17:17 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 16:17:17 -0700
Subject: [13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
Message-ID: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8223213

(It's a followup RFR on an earlier RFC [1].)

Recent changes severely affected how static initializers are executed,
and for long-running initializers it manifested as a severe slowdown.
As an example, it led to a 3x slowdown on some Clojure applications
(JDK-8219233 [2]). The root cause is that until a class is fully
initialized, every invocation of a static method on it goes through
method resolution.

Proposed fix introduces fast class initialization barriers for C1, C2,
and template interpreter on x86-64. I did some experiments with
cross-platform approaches, but haven't got satisfactory results.

On other platforms, behavior stays (mostly) intact. (I had to revert
some changes introduced by JDK-8219492 [3], since the assumptions they
rely on about accesses inside a class don't hold in all cases.)

The barrier is as simple as:
   if (holder->is_not_initialized() &&
       !holder->is_reentrant_initialization(current_thread)) {
     // trigger call site re-resolution and block there
   }

There are 3 places where barriers are added:
  * in template interpreter for invokestatic bytecode;
  * at nmethod verified entry point (for normal compilations);
  * c2i adapters;

For template interpreter, there's an additional check added into
TemplateTable::resolve_cache_and_index which calls into
InterpreterRuntime::resolve_from_cache when fast path checks fail.

In case of nmethods, the barrier is put before frame construction, so
existing compiler runtime routines can be reused
(SharedRuntime::get_handle_wrong_method_stub()).
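
A minimal sketch of the barrier condition described above, written as
standalone C++. Holder, InitState, ThreadId and needs_slow_path are
hypothetical stand-ins invented for illustration; they are not HotSpot's
types and not the code in the webrev. Only the shape of the check is
taken from the text.

```cpp
#include <cstdint>

// Hypothetical model of a class's initialization state (not HotSpot's enum).
enum class InitState { Loaded, BeingInitialized, FullyInitialized };

// Hypothetical thread identity; a real VM would use a thread pointer.
using ThreadId = std::uint64_t;

struct Holder {
  InitState state;
  ThreadId  init_thread;  // meaningful only while state == BeingInitialized

  bool is_not_initialized() const {
    return state != InitState::FullyInitialized;
  }

  // True when `current` is the thread that is running the initializer,
  // i.e. the call is a re-entrant one made from inside <clinit>.
  bool is_reentrant_initialization(ThreadId current) const {
    return state == InitState::BeingInitialized && init_thread == current;
  }
};

// Mirrors the condition quoted in the message: the caller has to leave the
// fast path (re-resolve the call site and block) only when the class is not
// fully initialized AND the caller is not the initializing thread.
bool needs_slow_path(const Holder& holder, ThreadId current_thread) {
  return holder.is_not_initialized() &&
         !holder.is_reentrant_initialization(current_thread);
}
```

Under this model a fully initialized holder, and the initializing thread
itself, stay on the fast path; any other thread takes the slow path until
initialization completes.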
Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers
nmethod recompilation once the class is fully initialized.

OSR compilations don't need a barrier.

Correspondence between barriers and transitions they cover:
  (1) from interpreter (barrier on caller side)
       * all transitions: interpreter, compiled (i2c), native, aot, ...

  (2) from compiled (barrier on callee side)
       to compiled, to native (barrier in native wrapper on entry)

  (3) c2i bypasses both barriers (interpreter and compiled) and
      requires a dedicated barrier in c2i

  (4) to Graal/AOT code:
        from interpreter: covered by interpreter barrier
        from compiled: call site patching is disabled, leading to
        repeated call site resolution until method holder is fully
        initialized (original behavior).

Performance experiments with clojure [2] demonstrated that the fix
almost completely recuperates the regression:

  (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
  (2) C1/C2 barriers only:                ~3,8s (~3x)
  (3) int/C1/C2 barriers:                 ~3,2s (-20%)
--------
  (4) barriers disabled for invokestatic  ~3,2s

I deliberately tried to keep the patch backport-friendly for
8u/11u/12u and refrained from using newer features like nmethod
barriers introduced recently. The fix can be refactored later
specifically for 13 as a followup change.

Testing: clojure startup, tier1-5

Thanks!

Best regards,
Vladimir Ivanov

[1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
[2] https://bugs.openjdk.java.net/browse/JDK-8219233
[3] https://bugs.openjdk.java.net/browse/JDK-8219492

From vladimir.x.ivanov at oracle.com Wed May 1 23:37:22 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 16:37:22 -0700
Subject: [13] RFR (M): 8223216: C2: Unify class initialization checks between new, getstatic, and putstatic
Message-ID:

http://cr.openjdk.java.net/~vlivanov/8223216/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8223216

(The patch has minor dependencies on 8223213 [1] I sent out for review earlier.)
C2 implements class initialization checks for new and
getstatic/putstatic differently: while "new" supports fast class
initialization checks, static field accesses rely on uncommon traps
which may lead to deoptimization/recompilation storms during
long-running class initialisation.

Proposed patch unifies implementation between them and uses the
following barrier:
   if (holder->is_initialized()) {
     uncommon_trap(initialized, reinterpret);
   }
   if (!holder->is_reentrant_initialization(current_thread)) {
     uncommon_trap(uninitialized, none);
   }

It also enhances checks for not-yet-initialized classes
(Compile::needs_clinit_barrier) and unifies the implementation between
new, invokestatic, and getfield/putfield.

Testing: tier1-5, targeted microbenchmarks, new test from 8223213

Thanks!

Best regards,
Vladimir Ivanov

[1] http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
    https://bugs.openjdk.java.net/browse/JDK-8223213

From vladimir.x.ivanov at oracle.com Thu May 2 00:09:20 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 17:09:20 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
 <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com>
Message-ID:

Sounds good, thanks!
Best regards, Vladimir Ivanov On 01/05/2019 15:16, Viswanathan, Sandhya wrote: > I should add here that your suggestion of adding generic shift instruction etc to the macroAssembler is also wonderful instead of function pointer. I will look into making that change as well. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Viswanathan, Sandhya > Sent: Wednesday, May 01, 2019 3:10 PM > To: 'Vladimir Ivanov' ; Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 > > Hi Vladimir, > > I agree, I wanted to show both the approaches in this patch to get your feedback: > 1) with emit as a function > 2) with emit part in the instruct body itself > > With emit as a function it becomes hard to read and I personally prefer it in the instruct itself as is done for vabsneg2D etc. That is what you are recommending as well so I feel good. > > Once the adlc enhancement is done both the approaches should give similar binary size. Till then there will be small overhead with approach 2) as emit is duplicated per match rule. > > I will send an updated patch fixing the two issues you mentioned in your previous email plus this change of using approach 2). > > Please do let me know if you want to see any other change in this patch. > > Best Regards, > Sandhya > > > > -----Original Message----- > From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] > Sent: Wednesday, May 01, 2019 2:58 PM > To: Viswanathan, Sandhya ; Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > >> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/ > > Nice job, Sandhya! Glad to hear the approach pays off! > > Unfortunately, I must note that AD file becomes much more obscure. > Especially with those function pointers. 
> > 1528 void emit_vshift16B_code(MacroAssembler& _masm, int opcode, XMMRegister dst, > 1529 XMMRegister src, XMMRegister shift, > 1530 XMMRegister tmp1, XMMRegister tmp2, > Register scratch) { > 1531 XX_Inst extendinst = get_extend_inst(opcode == Op_URShiftVB ? > false : true); > 1532 XX_Inst shiftinst = get_xx_inst(opcode); > 1533 > 1534 (_masm.*extendinst)(tmp1, src); > 1535 (_masm.*shiftinst)(tmp1, shift); > 1536 __ pshufd(tmp2, src, 0xE); > 1537 (_masm.*extendinst)(tmp2, tmp2); > 1538 (_masm.*shiftinst)(tmp2, shift); > 1539 __ movdqu(dst, ExternalAddress(vector_short_to_byte_mask()), > scratch); > 1540 __ pand(tmp2, dst); > 1541 __ pand(dst, tmp1); > 1542 __ packuswb(dst, tmp2); > 1543 } > > Have you tried to encapsulate that into x86-specific MacroAssembler? > > 8682 instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{ > 8683 predicate(UseSSE > 3 && UseAVX <= 1 && n->as_Vector()->length() > == 16); > 8684 match(Set dst (LShiftVB src shift)); > 8685 match(Set dst (RShiftVB src shift)); > 8686 match(Set dst (URShiftVB src shift)); > 8687 effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch); > 8688 format %{"pmovxbw $tmp1,$src\n\t" > 8689 "shiftop $tmp1,$shift\n\t" > 8690 "pshufd $tmp2,$src\n\t" > 8691 "pmovxbw $tmp2,$tmp2\n\t" > 8692 "shiftop $tmp2,$shift\n\t" > 8693 "movdqu $dst,[0x00ff00ff0x00ff00ff]\n\t" > 8694 "pand $tmp2,$dst\n\t" > 8695 "pand $dst,$tmp1\n\t" > 8696 "packuswb $dst,$tmp2\n\t! 
packed16B shift" %} > 8697 ins_encode %{ > 8698 emit_vshift16B_code(_masm, this->as_Mach()->ideal_Opcode() , > $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister, $scratch$$Register); > 8699 %} > 8700 ins_pipe( pipe_slow ); > 8701 %} > > can be turned into something like: > > instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{ > predicate(n->as_Vector()->length() == 16); > match(Set dst (LShiftVB src shift)); > match(Set dst (RShiftVB src shift)); > match(Set dst (URShiftVB src shift)); > effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch); > format %{"packed16B shift" %} > ins_encode %{ > int vlen = 0; // 128-bit > BasicType elem_type = T_BYTE; > int shift_mode = ...; // L/R/UR or S/U + L/R > __ vshift(vlen, elem_type, shift_mode, > $dst$$..., $src$$..., $shift$$..., > $tmp1$$..., $tmp2$$..., $scratch$$...); > %} > > Then MA::vshift can dispatch between different implementations depending on SSE/AVX level available. Do you see any problems with that from footprint perspective? > > Ideally, I'd prefer to see a library of operations on vectors encapsulated in MacroAssembler (or a subclass) and used in x86.ad. That will accommodate further reductions in AD instructions needed. > > Best regards, > Vladimir Ivanov > >> With this webrev the ad file has only about 60 lines effectively added. >> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%. >> I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules. >> What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself. >> The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE. 
>> >> In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool. >> Also taken into account reducing the temporary by using TEMP dst for multiply rules. >> >> The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL. >> >> Your review and feedback is welcome. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: hotspot-compiler-dev >> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >> Viswanathan, Sandhya >> Sent: Wednesday, April 10, 2019 10:22 AM >> To: Vladimir Kozlov ; B. Blaser >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used: >> >> __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len); >> __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, >> vector_len); >> >> Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, April 10, 2019 9:59 AM >> To: Viswanathan, Sandhya ; B. Blaser >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote: >>> Hi Bernard, >>> >>> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src. >> >> Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead >> $tmp2 to avoid overwriting >> $src2 before we get value from it if $dst = $src2. >> >> On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result. >> >> It is a little mess which may cause ineffective use of registers in compiled code. 
>> >> Thanks, >> Vladimir >> >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: B. Blaser [mailto:bsrbnd at gmail.com] >>> Sent: Wednesday, April 10, 2019 4:10 AM >>> To: Viswanathan, Sandhya >>> Cc: Vladimir Kozlov ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> Hi Sandhya and Vladimir K., >>> >>> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: >>>> >>>> Hi Vladimir, >>>> >>>> Yes, I missed the question below: >>>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? >>>> >>>> No it is not intentional, we can use the dst register in those cases and reduced the tmps. >>> >>> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg(): >>> >>> 7349 format %{"pmovsxbw $tmp,$src1\n\t" >>> 7350 "pmovsxbw $tmp2,$src2\n\t" >>> >>> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? >>> >>> Thanks, >>> Bernard >>> From vladimir.kozlov at oracle.com Thu May 2 00:40:05 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 1 May 2019 17:40:05 -0700 Subject: [13] RFR (M): 8223213: Implement fast class initialization checks on x86-64 In-Reply-To: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com> References: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com> Message-ID: <9e0616f5-d79b-e439-26dd-a8e3334c10ed@oracle.com> Why you skip patching code compiled by Graal and AOT? The flag UseFastClassInitChecks could be diagnostic or even product. The feature is not for debugging. 
Thanks,
Vladimir K

On 5/1/19 4:17 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8223213
>
> (It's a followup RFR on an earlier RFC [1].)
>
> Recent changes severely affected how static initializers are executed and for long-running initializers it manifested as
> a severe slowdown.
> As an example, it led to a 3x slowdown on some Clojure applications
> (JDK-8219233 [2]). The root cause is that until a class is fully initialized, every invocation of static method on it
> goes through method resolution.
>
> Proposed fix introduces fast class initialization barriers for C1, C2, and template interpreter on x86-64. I did some
> experiments with cross-platform approaches, but haven't got satisfactory results.
>
> On other platforms, behavior stays (mostly) intact. (I had to revert some changes introduced by JDK-8219492 [3], since
> the assumptions they rely on about accesses inside a class don't hold in all cases.)
>
> The barrier is as simple as:
>    if (holder->is_not_initialized() &&
>        !holder->is_reentrant_initialization(current_thread)) {
>      // trigger call site re-resolution and block there
>    }
>
> There are 3 places where barriers are added:
>   * in template interpreter for invokestatic bytecode;
>   * at nmethod verified entry point (for normal compilations);
>   * c2i adapters;
>
> For template interpreter, there's additional check added into TemplateTable::resolve_cache_and_index which calls into
> InterpreterRuntime::resolve_from_cache when fast path checks fail.
>
> In case of nmethods, the barrier is put before frame construction, so existing compiler runtime routines can be reused
> (SharedRuntime::get_handle_wrong_method_stub()).
>
> Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers nmethod recompilation once the class is fully
> initialized.
>
> OSR compilations don't need a barrier.
>
> Correspondence between barriers and transitions they cover:
>   (1) from interpreter (barrier on caller side)
>        * all transitions: interpreter, compiled (i2c), native, aot, ...
>
>   (2) from compiled (barrier on callee side)
>        to compiled, to native (barrier in native wrapper on entry)
>
>   (3) c2i bypasses both barriers (interpreter and compiled) and requires a dedicated barrier in c2i
>
>   (4) to Graal/AOT code:
>         from interpreter: covered by interpreter barrier
>         from compiled: call site patching is disabled, leading to repeated call site resolution until method holder is
> fully initialized (original behavior).
>
> Performance experiments with clojure [2] demonstrated that the fix almost completely recuperates the regression:
>
>   (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
>   (2) C1/C2 barriers only:                ~3,8s (~3x)
>   (3) int/C1/C2 barriers:                 ~3,2s (-20%)
> --------
>   (4) barriers disabled for invokestatic  ~3,2s
>
> I deliberately tried to keep the patch backport-friendly for 8u/11u/12u and refrained from using newer features like
> nmethod barriers introduced recently. The fix can be refactored later specifically for 13 as a followup change.
>
> Testing: clojure startup, tier1-5
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
> [2] https://bugs.openjdk.java.net/browse/JDK-8219233
> [3] https://bugs.openjdk.java.net/browse/JDK-8219492

From vladimir.kozlov at oracle.com Thu May 2 00:42:19 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 1 May 2019 17:42:19 -0700
Subject: [13] RFR (M): 8223216: C2: Unify class initialization checks between new, getstatic, and putstatic
In-Reply-To:
References:
Message-ID: <79c9e6ca-bde7-db7d-4c74-51ee9ddac4f6@oracle.com>

Looks good.

Thanks,
Vladimir

On 5/1/19 4:37 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8223216/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8223216
>
> (The patch has minor dependencies on 8223213 [1] I sent out for review earlier.)
>
> C2 implements class initialization checks for new and getstatic/putstatic differently: while "new" supports fast class
> initialization checks, static field accesses rely on uncommon traps which may lead to deoptimization/recompilation
> storms during long-running class initialisation.
>
> Proposed patch unifies implementation between them and uses the following barrier:
>    if (holder->is_initialized()) {
>      uncommon_trap(initialized, reinterpret);
>    }
>    if (!holder->is_reentrant_initialization(current_thread)) {
>      uncommon_trap(uninitialized, none);
>    }
>
> It also enhances checks for not-yet-initialized classes (Compile::needs_clinit_barrier) and unifies the implementation
> between new, invokestatic, and getfield/putfield.
>
> Testing: tier1-5, targeted microbenchmarks, new test from 8223213
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>     https://bugs.openjdk.java.net/browse/JDK-8223213
>

From tom.rodriguez at oracle.com Thu May 2 00:44:38 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 1 May 2019 17:44:38 -0700
Subject: RFR(S) 8218700: infinite loop in HotSpotJVMCIMetaAccessContext.fromClass after OutOfMemoryError
In-Reply-To: <53bcf718-e543-d40c-5486-58b98f66bcee@oracle.com>
References: <53bcf718-e543-d40c-5486-58b98f66bcee@oracle.com>
Message-ID:

You'll need to update your webrev after Vladimir's push. This code has
moved into HotSpotJVMCIRuntime.java. Maybe WeakReferenceHolder instead
of WeakTypeRef? It needs a comment explaining that we're intentionally
avoiding the use of ClassValue.remove as well. Shouldn't the ref field
be volatile?
ClassValue includes some barrier semantics and the new code needs
similar guarantees.

tom

dean.long at oracle.com wrote on 4/26/19 12:09 PM:
> https://bugs.openjdk.java.net/browse/JDK-8218700
> http://cr.openjdk.java.net/~dlong/8218700/webrev.2/
>
> If we throw an OutOfMemoryError in the right place (see JDK-8222941),
> HotSpotJVMCIMetaAccessContext.fromClass can go into an infinite loop
> calling ClassValue.remove. To work around the problem, reset the value
> in a mutable cell instead of calling remove.
>
> dl

From vladimir.x.ivanov at oracle.com Thu May 2 02:13:37 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 May 2019 19:13:37 -0700
Subject: [13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
In-Reply-To: <9e0616f5-d79b-e439-26dd-a8e3334c10ed@oracle.com>
References: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com>
 <9e0616f5-d79b-e439-26dd-a8e3334c10ed@oracle.com>
Message-ID: <0ceb99f0-2c37-bb27-9ca4-18e1f145dbbe@oracle.com>

Thanks for the feedback, Vladimir!

> Why you skip patching code compiled by Graal and AOT?

It happens only for classes being initialized and effectively preserves
current behavior (re-resolution until class is fully initialized). The
motivation is the following:

  * Graal needs to put class init barriers in nmethods at verified
    entry point in the same way C1/C2 do with this patch;

  * regarding AOTed code (I haven't done extensive exploration, but
    based on private discussions), I believe it needs additional
    barriers at method entry as well.

Once proper support lands in Graal or AOT, the patching can be re-enabled.

> The flag UseFastClassInitChecks could be diagnostic or even product. The
> feature is not for debugging.

The flag is used to signal that platform-specific support is available.
Unless there's a use case which benefits from the ability to turn it off (disable the new barriers and fall back to re-resolution) from the command line, I don't see much value in turning the flag into a diagnostic/product one.

Best regards,
Vladimir Ivanov

> On 5/1/19 4:17 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8223213
>>
>> (It's a followup RFR on an earlier RFC [1].)
>>
>> Recent changes severely affected how static initializers are executed
>> and for long-running initializers it manifested as a severe slowdown.
>> As an example, it led to a 3x slowdown on some Clojure applications
>> (JDK-8219233 [2]). The root cause is that until a class is fully
>> initialized, every invocation of static method on it goes through
>> method resolution.
>>
>> Proposed fix introduces fast class initialization barriers for C1, C2,
>> and template interpreter on x86-64. I did some experiments with
>> cross-platform approaches, but haven't got satisfactory results.
>>
>> On other platforms, behavior stays (mostly) intact. (I had to revert
>> some changes introduced by JDK-8219492 [3], since the assumptions they
>> rely on about accesses inside a class don't hold in all cases.)
>>
>> The barrier is as simple as:
>>     if (holder->is_not_initialized() &&
>>         !holder->is_reentrant_initialization(current_thread)) {
>>       // trigger call site re-resolution and block there
>>     }
>>
>> There are 3 places where barriers are added:
>>    * in template interpreter for invokestatic bytecode;
>>    * at nmethod verified entry point (for normal compilations);
>>    * c2i adapters;
>>
>> For template interpreter, there's additional check added into
>> TemplateTable::resolve_cache_and_index which calls into
>> InterpreterRuntime::resolve_from_cache when fast path checks fail.
>>
>> In case of nmethods, the barrier is put before frame construction, so
>> existing compiler runtime routines can be reused
>> (SharedRuntime::get_handle_wrong_method_stub()).
>>
>> Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers
>> nmethod recompilation once the class is fully initialized.
>>
>> OSR compilations don't need a barrier.
>>
>> Correspondence between barriers and transitions they cover:
>>    (1) from interpreter (barrier on caller side)
>>         * all transitions: interpreter, compiled (i2c), native, aot, ...
>>
>>    (2) from compiled (barrier on callee side)
>>         to compiled, to native (barrier in native wrapper on entry)
>>
>>    (3) c2i bypasses both barriers (interpreter and compiled) and
>> requires a dedicated barrier in c2i
>>
>>    (4) to Graal/AOT code:
>>          from interpreter: covered by interpreter barrier
>>          from compiled: call site patching is disabled, leading to
>> repeated call site resolution until method holder is fully initialized
>> (original behavior).
>>
>> Performance experiments with clojure [2] demonstrated that the fix
>> almost completely recuperates the regression:
>>
>>    (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
>>    (2) C1/C2 barriers only:                ~3,8s (~3x)
>>    (3) int/C1/C2 barriers:                 ~3,2s (-20%)
>> --------
>>    (4) barriers disabled for invokestatic  ~3,2s
>>
>> I deliberately tried to keep the patch backport-friendly for
>> 8u/11u/12u and refrained from using newer features like nmethod
>> barriers introduced recently. The fix can be refactored later
>> specifically for 13 as a followup change.
>>
>> Testing: clojure startup, tier1-5
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1]
>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
>>
>> [2] https://bugs.openjdk.java.net/browse/JDK-8219233
>> [3] https://bugs.openjdk.java.net/browse/JDK-8219492

From dean.long at oracle.com Thu May 2 02:30:16 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 1 May 2019 19:30:16 -0700
Subject: [13] RFR (S): 8223171: Redundant nmethod dependencies for effectively final methods
In-Reply-To: <50f8065b-445f-ae1d-00c8-743fd870404a@oracle.com>
References:

 <5cdb781f-1922-9a5d-9f52-f6874fd6a259@oracle.com> <50f8065b-445f-ae1d-00c8-743fd870404a@oracle.com>
Message-ID: <8788c5d0-48f7-6cfd-f733-19ec5bee84b0@oracle.com>

Yes, that's exactly what I had in mind :-)

dl

On 5/1/19 3:15 PM, Vladimir Ivanov wrote:
>
>> Can you also add check_unique_method(ctxk, uniqm) to the version of
>> assert_unique_concrete_method that takes a Method*?
>
> Like this?
>   http://cr.openjdk.java.net/~vlivanov/8223171/webrev.02/
>
> Best regards,
> Vladimir Ivanov
>
>> On 5/1/19 9:52 AM, Vladimir Ivanov wrote:
>>>
>>>> Does this allow us to assert !uniqm->can_be_statically_bound() in
>>>> Dependencies::assert_unique_concrete_method?
>>>
>>> In general, no. It doesn't hold for final methods: dependency is
>>> still needed when context is broad enough, since an overriding
>>> method can be loaded in a different part of the hierarchy (under the
>>> same context class).
>>>
>>> In case of the adjusted checks it's safe, since context == method
>>> holder when actual_receiver->is_final() == true.
>>>
>>>    if (!callee->is_final_method() && !callee->is_private() &&
>>>        !actual_receiver->is_final()) {
>>>      dependencies()->assert_unique_concrete_method(actual_receiver,
>>>                                                    cha_monomorphic_target);
>>>    }
>>>
>>> I refactored the patch a bit:
>>>   http://cr.openjdk.java.net/~vlivanov/8223171/webrev.01/
>>>
>>>>> Moreover, C2 does add dependencies for private methods.
>>>
>>> I take it back. Earlier checks handle private methods. Only methods
>>> on final classes get redundant dependencies.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>

From vladimir.kozlov at oracle.com Thu May 2 02:34:39 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 1 May 2019 19:34:39 -0700
Subject: [13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
In-Reply-To: <0ceb99f0-2c37-bb27-9ca4-18e1f145dbbe@oracle.com>
References: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com> <9e0616f5-d79b-e439-26dd-a8e3334c10ed@oracle.com> <0ceb99f0-2c37-bb27-9ca4-18e1f145dbbe@oracle.com>
Message-ID:

On 5/1/19 7:13 PM, Vladimir Ivanov wrote:
> Thanks for the feedback, Vladimir!
>
>> Why do you skip patching code compiled by Graal and AOT?
>
> It happens only for classes being initialized and effectively preserves current behavior (re-resolution until class is
> fully initialized).
>
> The motivation is the following:
>
>   * Graal needs to put class init barriers in nmethods at verified entry
> point in the same way C1/C2 does with this patch;
>
>   * regarding AOTed code (I haven't done extensive exploration, but based on private discussions), I believe it needs
> additional barriers at method entry as well.

When Graal adds barriers, AOT code will get them automatically.

>
> Once proper support lands in Graal or AOT, the patching can be re-enabled.

Got it.

>
>> The flag UseFastClassInitChecks could be diagnostic or even product. The feature is not for debugging.
> The flag is used to signal that platform-specific support is available. Unless there's a use case which benefits from
> the ability to turn it off (disable the new barriers and fall back to re-resolution) from the command line, I don't see much value
> in turning the flag into a diagnostic/product one.

Okay.

Thanks,
Vladimir

>
> Best regards,
> Vladimir Ivanov
>
>> On 5/1/19 4:17 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8223213
>>>
>>> (It's a followup RFR on an earlier RFC [1].)
>>>
>>> Recent changes severely affected how static initializers are executed and for long-running initializers it manifested
>>> as a severe slowdown.
>>> As an example, it led to a 3x slowdown on some Clojure applications
>>> (JDK-8219233 [2]). The root cause is that until a class is fully initialized, every invocation of static method on it
>>> goes through method resolution.
>>>
>>> Proposed fix introduces fast class initialization barriers for C1, C2, and template interpreter on x86-64. I did some
>>> experiments with cross-platform approaches, but haven't got satisfactory results.
>>>
>>> On other platforms, behavior stays (mostly) intact. (I had to revert some changes introduced by JDK-8219492 [3],
>>> since the assumptions they rely on about accesses inside a class don't hold in all cases.)
>>>
>>> The barrier is as simple as:
>>>     if (holder->is_not_initialized() &&
>>>         !holder->is_reentrant_initialization(current_thread)) {
>>>       // trigger call site re-resolution and block there
>>>     }
>>>
>>> There are 3 places where barriers are added:
>>>    * in template interpreter for invokestatic bytecode;
>>>    * at nmethod verified entry point (for normal compilations);
>>>    * c2i adapters;
>>>
>>> For template interpreter, there's additional check added into TemplateTable::resolve_cache_and_index which calls into
>>> InterpreterRuntime::resolve_from_cache when fast path checks fail.
>>>
>>> In case of nmethods, the barrier is put before frame construction, so existing compiler runtime routines can be
>>> reused (SharedRuntime::get_handle_wrong_method_stub()).
>>>
>>> Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers nmethod recompilation once the class is fully
>>> initialized.
>>>
>>> OSR compilations don't need a barrier.
>>>
>>> Correspondence between barriers and transitions they cover:
>>>    (1) from interpreter (barrier on caller side)
>>>         * all transitions: interpreter, compiled (i2c), native, aot, ...
>>>
>>>    (2) from compiled (barrier on callee side)
>>>         to compiled, to native (barrier in native wrapper on entry)
>>>
>>>    (3) c2i bypasses both barriers (interpreter and compiled) and requires a dedicated barrier in c2i
>>>
>>>    (4) to Graal/AOT code:
>>>          from interpreter: covered by interpreter barrier
>>>          from compiled: call site patching is disabled, leading to repeated call site resolution until method holder
>>> is fully initialized (original behavior).
>>>
>>> Performance experiments with clojure [2] demonstrated that the fix almost completely recuperates the regression:
>>>
>>>    (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
>>>    (2) C1/C2 barriers only:                ~3,8s (~3x)
>>>    (3) int/C1/C2 barriers:                 ~3,2s (-20%)
>>> --------
>>>    (4) barriers disabled for invokestatic  ~3,2s
>>>
>>> I deliberately tried to keep the patch backport-friendly for 8u/11u/12u and refrained from using newer features like
>>> nmethod barriers introduced recently. The fix can be refactored later specifically for 13 as a followup change.
>>>
>>> Testing: clojure startup, tier1-5
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8219233
>>> [3] https://bugs.openjdk.java.net/browse/JDK-8219492

From rahul.v.raghavan at oracle.com Thu May 2 06:45:36 2019
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Thu, 2 May 2019 12:15:36 +0530
Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change
In-Reply-To: <0E11910B-A4B0-4F64-9B87-5A4BF065B9D2@oracle.com>
References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com>

<6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>

 <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn> <8bc507fe-b6db-d697-8821-0547860de232@oracle.com> <1a398a1f-ed52-2197-5886-d9d5fd872974@loongson.cn> <5607f7ca-57b9-b409-3bce-efc1688f0678@loongson.cn>
Message-ID:

Hi Jie,

this looks good to me too but please add brackets to the checks in InlineTree::is_not_reached.

I've submitted some extended testing and will let you know once it has passed.

Someone from the runtime team should also have a look at this because your changes affect the
interpreter. CC'ing runtime-dev.

Thanks,
Tobias

On 29.04.19 15:43, Jie Fu wrote:
> Hi all,
>
> May I have another review for this change [1] to finalize the fix?
> Thanks a lot.
>
> Best regards,
> Jie
>
> [1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.02/
>
>
> On 2019-04-20 11:35, Jie Fu wrote:
>> Ah, I got it.
>> I like your patch and benefit a lot from you.
>> Thank you so much, Vladimir.
>>
>> Any comments from other reviewers?
>> Thanks.
>>
>> Best regards,
>> Jie
>>
>> On 2019/4/20 11:18, Vladimir Ivanov wrote:
>>>
>>>>> After some explorations I decided to keep original behavior for immature profiles
>>>>> (profile.count == -1).
>>>>
>>>> I agree.
>>>>
>>>> I have two questions here.
>>>>
>>>> 1. What's the difference of the following two if statements?
>>>> -------------------------------------------------
>>>> +  if (!callee_method->was_executed_more_than(0))  return true; // callee was never executed
>>>> +
>>>> +  if (caller_method->is_not_reached(caller_bci))  return true; // call site not resolved
>>>> -------------------------------------------------
>>>> I think only one of them is needed.
>>>
>>> The checks are complementary: one inspects callee and the other looks at call site.
>>>
>>> "!callee_method->was_executed_more_than(0)" ensures that callee was executed at least once.
>>>
>>> "caller_method->is_not_reached(caller_bci)" inspects the state of the call site. If corresponding
>>> CP entry is not resolved, then the call site isn't reached. If is_not_reached() returns false,
>>> it's not a definitive answer: there's still a chance the site is not reached - consider the case
>>> of virtual calls where callee_method may differ for the same resolved method.
>>>
>>>> 2. Does the assert in InlineTree::is_not_reached(...) make sense?
>>>> Since we have
>>>> -------------------------------------------------
>>>> if (profile.count() > 0)   return false; // reachable according to profile
>>>> -------------------------------------------------
>>>> and
>>>> -------------------------------------------------
>>>> if (profile.count() == -1) {...}
>>>> -------------------------------------------------
>>>> before
>>>> -------------------------------------------------
>>>> assert(profile.count() == 0, "sanity");
>>>> -------------------------------------------------
>>>> is the assert redundant?
>>>
>>> Asserts are intended to be redundant :-) But still catch bugs from time to time.
>>>
>>> This one, in particular, checks invariant on profile.count() >= -1 (which is not very useful by
>>> itself), but also stresses that "profile.count() == 0" case is being processed.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>
>
>

From claes.redestad at oracle.com Thu May 2 11:03:18 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 2 May 2019 13:03:18 +0200
Subject: [13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
In-Reply-To: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com>
References: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com>
Message-ID: <37ee4fe8-d962-6e05-82f0-7258f5083459@oracle.com>

Hi Vladimir,

On 2019-05-02 01:17, Vladimir Ivanov wrote:
> Performance experiments with clojure [2] demonstrated that the fix
> almost completely recuperates the regression:
>
>   (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
>   (2) C1/C2 barriers only:                ~3,8s (~3x)
>   (3) int/C1/C2 barriers:                 ~3,2s (-20%)
> --------
>   (4) barriers disabled for invokestatic  ~3,2s

good stuff!

Just to add a few data points I turned some of my earlier experiments to try
and isolate some of these issues into a little stress test:

BadStress[1]:
11.0.1: 136ms
11.0.2: 13500ms
jdk/jdk baseline: 126ms
jdk/jdk patched: 123ms

GoodStress[2] (baseline):
11.0.1: 56ms
11.0.2: 54ms
jdk/jdk baseline: 48ms
jdk/jdk patched: 47ms

Observations:

- On latest jdk/jdk, we've already recuperated most of the cost exposed
  in these synthetic tests due to related fixes (mainly
  https://bugs.openjdk.java.net/browse/JDK-8188133 and
  https://bugs.openjdk.java.net/browse/JDK-8219974 ), but the patch
  helps a bit here too and we're net faster than 11.0.1 (also when
  taking into account how startup in general has improved since)

- The small 1ms startup improvement with the patch on the baseline test
  is sustained and significant, indicating we have some internal JDK
  classes exercised during bootstrap which benefit directly from your
  fixes. I've verified this improvement translates to all our other
  small-app startup tests.

- My tests were too naïve to capture all the overheads seen with clj

- Likely still good performance advice to avoid heavy lifting in static
  initializers.

All in all I think this is a great improvement and hope the added complexity is
deemed acceptable.

Thanks!

/Claes

[1]
public class BadStress {
  static void foo() {}
  static void bar() {}
  public static class Helper {
    static void foo() { BadStress.foo(); }
  }
  static {
    long start = System.nanoTime();
    for (int i = 0; i < 10_000_000; i++) {
      Helper.foo();
    }
    for (int i = 0; i < 10_000_000; i++) {
      bar();
    }
    long end = System.nanoTime();
    System.out.println("Elapsed: " + (end - start) + " ns");
  }
  public static void main(String... args) {}
}

[2]
public class GoodStress {
  public static class Helper {
    static void foo() {}
    static void bar() {}
  }
  static {
    long start = System.nanoTime();
    for (int i = 0; i < 10_000_000; i++) {
      Helper.foo();
    }
    for (int i = 0; i < 10_000_000; i++) {
      Helper.bar();
    }
    long end = System.nanoTime();
    System.out.println("Elapsed: " + (end - start) + " ns");
  }
  public static void main(String... args) {}
}

From tobias.hartmann at oracle.com Thu May 2 12:02:15 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 2 May 2019 14:02:15 +0200
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References:

 <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn> <8bc507fe-b6db-d697-8821-0547860de232@oracle.com> <1a398a1f-ed52-2197-5886-d9d5fd872974@loongson.cn> <5607f7ca-57b9-b409-3bce-efc1688f0678@loongson.cn>
Message-ID: <9834558d-20f7-5bc6-4058-7cd007b0ad5f@oracle.com>

On 02.05.19 11:23, Tobias Hartmann wrote:
> I've submitted some extended testing and will let you know once it has passed.

Testing passed.

Best regards,
Tobias

From nils.eliasson at oracle.com Thu May 2 12:31:29 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 2 May 2019 14:31:29 +0200
Subject: RFR(M): 8216137: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit
In-Reply-To: <0553d83e-9d77-0295-acff-9fd3e8a44043@oracle.com>
References: <0553d83e-9d77-0295-acff-9fd3e8a44043@oracle.com>
Message-ID: <48a14e3b-cd31-b906-7fb7-41be0e845b83@oracle.com>

Looks good!

Regards,
Nils

On 2019-04-30 16:28, Patric Hedlin wrote:
> Dear all,
>
> I would like to ask for help to review the following change/update:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8216137
> Webrev: http://cr.openjdk.java.net/~phedlin/tr8216137/
>
> 8216137: assert(Compile::current()->live_nodes() <
> Compile::current()->max_node_limit()) failed:
>          Live Node limit exceeded limit
>
> Also addressed:
>
> 8219520: assert(Compile::current()->live_nodes() <
> Compile::current()->max_node_limit()) failed:
>          Live Node limit exceeded limit
>
> Approach:
>
>     Adding a simplistic (ad-hoc) node budget mechanism, applied during
> loop transforms.
>
>
> Testing: hs-tier1..4, hs-precheckin-comp, Kitchensink24h
>
>
> Caveat:  Testing and benchmarking needs to be rerun but is currently
> experiencing issues.
>
>
> Best regards,
> Patric
>

From nils.eliasson at oracle.com Thu May 2 12:41:19 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Thu, 2 May 2019 14:41:19 +0200
Subject: RFR(S): 8223140: Clean-up in 'ok_to_convert()'.
In-Reply-To:
References:

Message-ID:

+1

Regards,
Nils

On 2019-04-30 18:54, Vladimir Ivanov wrote:
> Looks good.
>
> I like precond/postcond macros.
>
> +static bool is_cloop_increment(Node* inc) {
> +  precond(inc->Opcode() == Op_AddI || inc->Opcode() == Op_AddL);
>
> Best regards,
> Vladimir Ivanov
>
> On 30/04/2019 07:11, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8223140
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223140/
>>
>> 8223140: Clean-up in 'ok_to_convert()'
>>
>>      Simplify logic in 'ok_to_convert()'.
>>      Rename 'is_loop_iv()' to 'is_cloop_ind_var()'.
>>      Adding precond/postcond macros.
>>
>>
>> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp,
>> Kitchensink24h)
>>
>>
>> Best regards,
>> Patric
>>

From patric.hedlin at oracle.com Thu May 2 13:32:58 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Thu, 2 May 2019 15:32:58 +0200
Subject: RFR(S): 8223140: Clean-up in 'ok_to_convert()'.
In-Reply-To:
References:

Message-ID:

Thanks Nils.

/Patric

On 02/05/2019 14:41, Nils Eliasson wrote:
> +1
>
> Regards,
>
> Nils
>
>
> On 2019-04-30 18:54, Vladimir Ivanov wrote:
>> Looks good.
>>
>> I like precond/postcond macros.
>>
>> +static bool is_cloop_increment(Node* inc) {
>> +  precond(inc->Opcode() == Op_AddI || inc->Opcode() == Op_AddL);
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 30/04/2019 07:11, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8223140
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223140/
>>>
>>> 8223140: Clean-up in 'ok_to_convert()'
>>>
>>>      Simplify logic in 'ok_to_convert()'.
>>>      Rename 'is_loop_iv()' to 'is_cloop_ind_var()'.
>>>      Adding precond/postcond macros.
>>>
>>>
>>> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp,
>>> Kitchensink24h)
>>>
>>>
>>> Best regards,
>>> Patric
>>>

From patric.hedlin at oracle.com Thu May 2 13:33:39 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Thu, 2 May 2019 15:33:39 +0200
Subject: RFR(M): 8216137: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit
In-Reply-To: <48a14e3b-cd31-b906-7fb7-41be0e845b83@oracle.com>
References: <0553d83e-9d77-0295-acff-9fd3e8a44043@oracle.com> <48a14e3b-cd31-b906-7fb7-41be0e845b83@oracle.com>
Message-ID: <34aeb06a-d0f4-87db-0c0a-3cb179ddb9e6@oracle.com>

Thanks Nils.

/Patric

On 02/05/2019 14:31, Nils Eliasson wrote:
> Looks good!
>
> Regards,
>
> Nils
>
>
> On 2019-04-30 16:28, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8216137
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8216137/
>>
>> 8216137: assert(Compile::current()->live_nodes() <
>> Compile::current()->max_node_limit()) failed:
>>          Live Node limit exceeded limit
>>
>> Also addressed:
>>
>> 8219520: assert(Compile::current()->live_nodes() <
>> Compile::current()->max_node_limit()) failed:
>>          Live Node limit exceeded limit
>>
>> Approach:
>>
>>     Adding a simplistic (ad-hoc) node budget mechanism, applied
>> during loop transforms.
>>
>>
>> Testing: hs-tier1..4, hs-precheckin-comp, Kitchensink24h
>>
>>
>> Caveat:  Testing and benchmarking needs to be rerun but is currently
>> experiencing issues.
>>
>>
>> Best regards,
>> Patric
>>

From fujie at loongson.cn Thu May 2 15:18:43 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 2 May 2019 23:18:43 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References:

 <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn> <8bc507fe-b6db-d697-8821-0547860de232@oracle.com> <1a398a1f-ed52-2197-5886-d9d5fd872974@loongson.cn> <5607f7ca-57b9-b409-3bce-efc1688f0678@loongson.cn>
Message-ID: <3910e97e-009e-598a-f91a-8872ecd7ec18@loongson.cn>

Hi Tobias,

Thank you for your review.
I will add the brackets as soon as I come back to my office this Sunday.

I sincerely hope that someone from the runtime-dev can also help to review this patch[1].
Thanks a lot.

Best regards,
Jie

[1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.02/

On 2019-05-02 17:23, Tobias Hartmann wrote:
> Hi Jie,
>
> this looks good to me too but please add brackets to the checks in InlineTree::is_not_reached.
>
> I've submitted some extended testing and will let you know once it has passed.
>
> Someone from the runtime team should also have a look at this because your changes affect the
> interpreter. CC'ing runtime-dev.
>
> Thanks,
> Tobias
>
> On 29.04.19 15:43, Jie Fu wrote:
>> Hi all,
>>
>> May I have another review for this change [1] to finalize the fix?
>> Thanks a lot.
>>
>> Best regards,
>> Jie
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.02/
>>
>>
>> On 2019-04-20 11:35, Jie Fu wrote:
>>> Ah, I got it.
>>> I like your patch and benefit a lot from you.
>>> Thank you so much, Vladimir.
>>>
>>> Any comments from other reviewers?
>>> Thanks.
>>>
>>> Best regards,
>>> Jie
>>>
>>> On 2019/4/20 11:18, Vladimir Ivanov wrote:
>>>>>> After some explorations I decided to keep original behavior for immature profiles
>>>>>> (profile.count == -1).
>>>>> I agree.
>>>>>
>>>>> I have two questions here.
>>>>>
>>>>> 1. What's the difference of the following two if statements?
>>>>> -------------------------------------------------
>>>>> +  if (!callee_method->was_executed_more_than(0))  return true; // callee was never executed
>>>>> +
>>>>> +  if (caller_method->is_not_reached(caller_bci))  return true; // call site not resolved
>>>>> -------------------------------------------------
>>>>> I think only one of them is needed.
>>>> The checks are complementary: one inspects callee and the other looks at call site.
>>>>
>>>> "!callee_method->was_executed_more_than(0)" ensures that callee was executed at least once.
>>>>
>>>> "caller_method->is_not_reached(caller_bci)" inspects the state of the call site. If corresponding
>>>> CP entry is not resolved, then the call site isn't reached. If is_not_reached() returns false,
>>>> it's not a definitive answer: there's still a chance the site is not reached - consider the case
>>>> of virtual calls where callee_method may differ for the same resolved method.
>>>>
>>>>> 2. Does the assert in InlineTree::is_not_reached(...) make sense?
>>>>> Since we have
>>>>> -------------------------------------------------
>>>>> if (profile.count() > 0)   return false; // reachable according to profile
>>>>> -------------------------------------------------
>>>>> and
>>>>> -------------------------------------------------
>>>>> if (profile.count() == -1) {...}
>>>>> -------------------------------------------------
>>>>> before
>>>>> -------------------------------------------------
>>>>> assert(profile.count() == 0, "sanity");
>>>>> -------------------------------------------------
>>>>> is the assert redundant?
>>>> Asserts are intended to be redundant :-) But still catch bugs from time to time.
>>>>
>>>> This one, in particular, checks invariant on profile.count() >= -1 (which is not very useful by
>>>> itself), but also stresses that "profile.count() == 0" case is being processed.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>

From derekw at marvell.com Thu May 2 15:19:10 2019
From: derekw at marvell.com (Derek White)
Date: Thu, 2 May 2019 15:19:10 +0000
Subject: [EXT] Re: [13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
In-Reply-To: <37ee4fe8-d962-6e05-82f0-7258f5083459@oracle.com>
References: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com> <37ee4fe8-d962-6e05-82f0-7258f5083459@oracle.com>
Message-ID:

Hi Vladimir,

I want to be clear on the relationship between bugs and patches:

8223213 and patchset is intended to *replace* 8219233 and its patchset, or be applied on top of it?
https://bugs.openjdk.java.net/browse/JDK-8223213,
https://bugs.openjdk.java.net/browse/JDK-8219233

Thanks!
- Derek

> -----Original Message-----
> From: hotspot-dev On Behalf Of
> Claes Redestad
> Sent: Thursday, May 02, 2019 7:03 AM
> To: Vladimir Ivanov ; hotspot compiler
> ; hotspot-runtime-dev runtime-dev at openjdk.java.net>; hotspot-dev developers dev at openjdk.java.net>
> Subject: [EXT] Re: [13] RFR (M): 8223213: Implement fast class initialization
> checks on x86-64
>
> External Email
>
> ----------------------------------------------------------------------
> Hi Vladimir,
>
> On 2019-05-02 01:17, Vladimir Ivanov wrote:
> > Performance experiments with clojure [2] demonstrated that the fix
> > almost completely recuperates the regression:
> >
> >   (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
> >   (2) C1/C2 barriers only:                ~3,8s (~3x)
> >   (3) int/C1/C2 barriers:                 ~3,2s (-20%)
> > --------
> >   (4) barriers disabled for invokestatic  ~3,2s
>
> good stuff!
>
> Just to add a few data points I turned some of my earlier experiments to try
> and isolate some of these issues into a little stress test:
>
> BadStress[1]:
> 11.0.1: 136ms
> 11.0.2: 13500ms
> jdk/jdk baseline: 126ms
> jdk/jdk patched: 123ms
>
> GoodStress[2] (baseline):
> 11.0.1: 56ms
> 11.0.2: 54ms
> jdk/jdk baseline: 48ms
> jdk/jdk patched: 47ms
>
> Observations:
>
> - On latest jdk/jdk, we've already recuperated most of the cost exposed
>   in these synthetic tests due to related fixes (mainly
>   https://bugs.openjdk.java.net/browse/JDK-8188133 and
>   https://bugs.openjdk.java.net/browse/JDK-8219974 ), but the patch
>   helps a bit here too and we're net faster than 11.0.1 (also when
>   taking into account how startup in general has improved since)
>
> - The small 1ms startup improvement with the patch on the baseline test
>   is sustained and significant, indicating we have some internal JDK
>   classes exercised during bootstrap which benefit directly from your
>   fixes. I've verified this improvement translates to all our other
>   small-app startup tests.
>
> - My tests were too naïve to capture all the overheads seen with clj
>
> - Likely still good performance advice to avoid heavy lifting in static
>   initializers.
>
> All in all I think this is a great improvement and hope the added complexity is
> deemed acceptable.
>
> Thanks!
>
> /Claes
>
> [1]
> public class BadStress {
>   static void foo() {}
>   static void bar() {}
>   public static class Helper {
>     static void foo() { BadStress.foo(); }
>   }
>   static {
>     long start = System.nanoTime();
>     for (int i = 0; i < 10_000_000; i++) {
>       Helper.foo();
>     }
>     for (int i = 0; i < 10_000_000; i++) {
>       bar();
>     }
>     long end = System.nanoTime();
>     System.out.println("Elapsed: " + (end - start) + " ns");
>   }
>   public static void main(String... args) {} }
>
> [2]
> public class GoodStress {
>   public static class Helper {
>     static void foo() {}
>     static void bar() {}
>   }
>   static {
>     long start = System.nanoTime();
>     for (int i = 0; i < 10_000_000; i++) {
>       Helper.foo();
>     }
>     for (int i = 0; i < 10_000_000; i++) {
>       Helper.bar();
>     }
>     long end = System.nanoTime();
>     System.out.println("Elapsed: " + (end - start) + " ns");
>   }
>   public static void main(String... args) {} }

From vladimir.x.ivanov at oracle.com Thu May 2 16:26:20 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 2 May 2019 09:26:20 -0700
Subject: [EXT] Re: [13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
In-Reply-To:
References: <85a4a478-9200-87f2-c966-49af21f687c2@oracle.com> <37ee4fe8-d962-6e05-82f0-7258f5083459@oracle.com>
Message-ID: <0201a50a-0006-1b28-9b7b-4ce5ed534795@oracle.com>

Derek,

> I want to be clear on the relationship between bugs and patches:
>
> 8223213 and patchset is intended to *replace* 8219233 and its patchset, or be applied on top of it?
> https://bugs.openjdk.java.net/browse/JDK-8223213,
> https://bugs.openjdk.java.net/browse/JDK-8219233

The former: 8223213 supersedes 8219233 patchset.

The plan is to use 8219233 as an umbrella for tracking the progress of relevant fixes and backports.

Best regards,
Vladimir Ivanov

>> -----Original Message-----
>> From: hotspot-dev On Behalf Of
>> Claes Redestad
>> Sent: Thursday, May 02, 2019 7:03 AM
>> To: Vladimir Ivanov ; hotspot compiler
>> ; hotspot-runtime-dev
> runtime-dev at openjdk.java.net>; hotspot-dev developers
> dev at openjdk.java.net>
>> Subject: [EXT] Re: [13] RFR (M): 8223213: Implement fast class initialization
>> checks on x86-64
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> Hi Vladimir,
>>
>> On 2019-05-02 01:17, Vladimir Ivanov wrote:
>>> Performance experiments with clojure [2] demonstrated that the fix
>>> almost completely recuperates the regression:
>>>
>>>   (1) always reresolve (w/o the fix):     ~12,0s ( 1x)
>>>   (2) C1/C2 barriers only:                 ~3,8s (~3x)
>>>   (3) int/C1/C2 barriers:                  ~3,2s (-20%)
>>>   --------
>>>   (4) barriers disabled for invokestatic   ~3,2s
>>
>> good stuff!
>>
>> Just to add a few data points I turned some of my earlier experiments to try
>> and isolate some of these issues into a little stress test:
>>
>> BadStress[1]:
>> 11.0.1: 136ms
>> 11.0.2: 13500ms
>> jdk/jdk baseline: 126ms
>> jdk/jdk patched: 123ms
>>
>> GoodStress[2] (baseline):
>> 11.0.1: 56ms
>> 11.0.2: 54ms
>> jdk/jdk baseline: 48ms
>> jdk/jdk patched: 47ms
>>
>> Observations:
>>
>> - On latest jdk/jdk, we've already recuperated most of the cost exposed
>> in these synthetic tests due to related fixes (mainly
>> https://bugs.openjdk.java.net/browse/JDK-8188133 and
>> https://bugs.openjdk.java.net/browse/JDK-8219974 ), but the patch
>> helps a bit here too and we're net faster than 11.0.1 (also when
>> taking into account how startup in general has improved since)
>>
>> - The small 1ms startup improvement with the patch on the baseline test
>> is sustained and significant, indicating we have some internal JDK
>> classes exercised during bootstrap which benefit directly from your
>> fixes. I've verified this improvement translates to all our other
>> small-app startup tests.
>>
>> - My tests were too naïve to capture all the overheads seen with clj
>>
>> - Likely still good performance advice to avoid heavy lifting in static
>> initializers.
>>
>> All in all I think this is a great improvement and hope the added complexity is
>> deemed acceptable.
>>
>> Thanks!
>>
>> /Claes
>>
>> [1]
>> public class BadStress {
>>   static void foo() {}
>>   static void bar() {}
>>   public static class Helper {
>>     static void foo() { BadStress.foo(); }
>>   }
>>   static {
>>     long start = System.nanoTime();
>>     for (int i = 0; i < 10_000_000; i++) {
>>       Helper.foo();
>>     }
>>     for (int i = 0; i < 10_000_000; i++) {
>>       bar();
>>     }
>>     long end = System.nanoTime();
>>     System.out.println("Elapsed: " + (end - start) + " ns");
>>   }
>>   public static void main(String... args) {}
>> }
>>
>> [2]
>> public class GoodStress {
>>   public static class Helper {
>>     static void foo() {}
>>     static void bar() {}
>>   }
>>   static {
>>     long start = System.nanoTime();
>>     for (int i = 0; i < 10_000_000; i++) {
>>       Helper.foo();
>>     }
>>     for (int i = 0; i < 10_000_000; i++) {
>>       Helper.bar();
>>     }
>>     long end = System.nanoTime();
>>     System.out.println("Elapsed: " + (end - start) + " ns");
>>   }
>>   public static void main(String... args) {}
>> }

From vladimir.x.ivanov at oracle.com Thu May 2 16:28:07 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 2 May 2019 09:28:07 -0700
Subject: [13] RFR (S): 8223171: Redundant nmethod dependencies for effectively final methods
In-Reply-To: <8788c5d0-48f7-6cfd-f733-19ec5bee84b0@oracle.com>
References:

<5cdb781f-1922-9a5d-9f52-f6874fd6a259@oracle.com> <50f8065b-445f-ae1d-00c8-743fd870404a@oracle.com> <8788c5d0-48f7-6cfd-f733-19ec5bee84b0@oracle.com>
Message-ID: <7add187c-56d8-243b-3f1e-f2bfc933a7f7@oracle.com>

Thanks, Dean.

Best regards,
Vladimir Ivanov

On 01/05/2019 19:30, dean.long at oracle.com wrote:
> Yes, that's exactly what I had in mind :-)
>
> dl
>
> On 5/1/19 3:15 PM, Vladimir Ivanov wrote:
>>
>>> Can you also add check_unique_method(ctxk, uniqm) to the version of
>>> assert_unique_concrete_method that takes a Method*?
>>
>> Like this?
>>   http://cr.openjdk.java.net/~vlivanov/8223171/webrev.02/
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>> On 5/1/19 9:52 AM, Vladimir Ivanov wrote:
>>>>
>>>>> Does this allow us to assert !uniqm->can_be_statically_bound() in
>>>>> Dependencies::assert_unique_concrete_method?
>>>>
>>>> In general, no. It doesn't hold for final methods: dependency is
>>>> still needed when context is broad enough, since an overriding
>>>> method can be loaded in a different part of the hierarchy (under the
>>>> same context class).
>>>>
>>>> In case of the adjusted checks it's safe, since context == method
>>>> holder when actual_receiver->is_final() == true.
>>>>
>>>>   if (!callee->is_final_method() && !callee->is_private() &&
>>>>       !actual_receiver->is_final()) {
>>>>     dependencies()->assert_unique_concrete_method(actual_receiver,
>>>>                                                    cha_monomorphic_target);
>>>>   }
>>>>
>>>> I refactored the patch a bit:
>>>>   http://cr.openjdk.java.net/~vlivanov/8223171/webrev.01/
>>>>
>>>>>> Moreover, C2 does add dependencies for private methods.
>>>>
>>>> I take it back. Earlier checks handle private methods. Only methods
>>>> on final classes get redundant dependencies.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>
>

From john.r.rose at oracle.com Thu May 2 18:31:50 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 2 May 2019 11:31:50 -0700
Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change
In-Reply-To:
References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com>

<959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> <73f7c647-3194-2a65-6cc6-a15cbf6c82be@oracle.com> <37837126-c9d5-1bb1-fc9a-6fb9b848efbe@oracle.com> <28955bc6-020a-29e1-953c-e9f48932cd56@oracle.com> <0E11910B-A4B0-4F64-9B87-5A4BF065B9D2@oracle.com>
Message-ID: <2BD07B45-71CE-41FD-B83B-6E043FB63F09@oracle.com>

On May 1, 2019, at 11:45 PM, Rahul Raghavan wrote:
>
> Thank you, understood your point.
> But please note here new code cannot be inserted at the other deleted code location as such, due to the dependency on 'offset'!

Oops, my bad! Carry on. :-)

From vladimir.x.ivanov at oracle.com Thu May 2 21:55:48 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 2 May 2019 14:55:48 -0700
Subject: [13] RFR (M): 8223216: C2: Unify class initialization checks between new, getstatic, and putstatic
In-Reply-To: <79c9e6ca-bde7-db7d-4c74-51ee9ddac4f6@oracle.com>
References: <79c9e6ca-bde7-db7d-4c74-51ee9ddac4f6@oracle.com>
Message-ID:

Thanks, Vladimir.

Best regards,
Vladimir Ivanov

On 01/05/2019 17:42, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 5/1/19 4:37 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8223216/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8223216
>>
>> (The patch has minor dependencies on 8223213 [1] I sent out for review
>> earlier.)
>>
>> C2 implements class initialization checks for new and
>> getstatic/putstatic differently: while "new" supports fast class
>> initialization checks, static field accesses rely on uncommon traps
>> which may lead to deoptimization/recompilation storms during
>> long-running class initialisation.
>>
>> Proposed patch unifies implementation between them and uses the
>> following barrier:
>>     if (holder->is_initialized()) {
>>       uncommon_trap(initialized, reinterpret);
>>     }
>>     if (!holder->is_reentrant_initialization(current_thread)) {
>>       uncommon_trap(uninitialized, none);
>>     }
>>
>> It also enhances checks for not-yet-initialized classes
>> (Compile::needs_clinit_barrier) and unifies the implementation between
>> new, invokestatic, and getfield/putfield.
>>
>> Testing: tier1-5, targeted microbenchmarks, new test from 8223213
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>>     https://bugs.openjdk.java.net/browse/JDK-8223213
>>

From xxinliu at amazon.com Fri May 3 00:21:07 2019
From: xxinliu at amazon.com (Liu, Xin)
Date: Fri, 3 May 2019 00:21:07 +0000
Subject: 8222670 patch review: prevent downgraded tasks from recompiling
In-Reply-To: <0fca1798-5851-3e5d-e603-54282dc3be81@oracle.com>
References: <99aae03d0315482c723abda2f2cb530b4b52f82d.camel@redhat.com> <427BC0A9-DAB2-43A3-AF93-F96414EC1E7E@amazon.com> <0fca1798-5851-3e5d-e603-54282dc3be81@oracle.com>
Message-ID:

Hi, Tobias,

Thanks for the review. I fixed copyrights and the typo of clearMethodState0.
Here is the new revision.
https://cr.openjdk.java.net/~xliu/8222670/webrev.04/

On 5/2/19, 2:05 AM, "Tobias Hartmann" wrote:

Hi,

in the bug description you state:
> CompileBroker::compile_method fails to detect the pre-existing nmethod because comp_level doesn't match

But why is that? If a downgraded compilation succeeded at level 2, shouldn't a re-compilation at the same level be detected by CompileBroker::compilation_is_complete() in CompileBroker::compile_method()?

That's the very root cause of level2 recompilation.
In CompileBroker::compile_method(), its input argument is comp_level = 3.
CompileBroker::compilation_is_complete returns false because codecache only has level=2 nmethod.
I don't know why, but hotspot is also very stubborn. It will request level = 3 again and again. All of them are downgraded to level=2 when they dequeue.

Level2RecompilationTest simulates this process. I didn't make it up.
I observe the symptom in some real services as follows.
https://bugs.openjdk.java.net/secure/attachment/82079/lvl2_recomp_spring.log.zip

thanks,
--lx

You need to update the copyright date in Level2RecompilationTest.java (should be 2019 only).

Thanks,
Tobias

On 26.04.19 09:36, Liu, Xin wrote:
> Gently ping.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8222670
> I got the new revision.
> https://cr.openjdk.java.net/~xliu/8222670/webrev.03/
>
> I finish up test Level2RecompilationTest.java. if you want to start a OSR compilation, you have to specify bci which points to the begin of a BB.
> Give them bci = 0 is good enough for general cases.
>
> Thanks,
> --lx
>
>
>
>
> On 4/19/19, 11:19 PM, "Liu, Xin" wrote:
>
> hi, Severin,
>
> Thanks for reviewing. Yes, it's irrelevant. I revert it. please check it out.
> https://cr.openjdk.java.net/~xliu/8222670/webrev.02/
>
> Please note that I added an assertion InstanceKlass::add_osr_nmethod(nmethod* n) in this webrev.
> In my understanding, it is a potential memleak of codecache. If there's no higher level of osr compilation, those dups will stay in codecache forever.
>
> Further, it doesn't make sense to recompile with the same level and same bci. With this assertion, the following tests in tier1-test failed.
> test/hotspot/jtreg/compiler/intrinsics/unsafe/DirectByteBufferTest.java
> test/hotspot/jtreg/compiler/intrinsics/unsafe/HeapByteBufferTest.java
> test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java
> test/jdk/tools/pack200/Pack200Test.java
> test/jdk/java/util/Arrays/SortingNearlySortedPrimitive.java
>
> All crashes happen as I described in JDK-8222670. Eg. duplicated OSR compilations occur for level2.
>
> Program received signal SIGSEGV, Segmentation fault.
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc: SuppressErrorAt=/instanceKlass.cpp:2972
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (/src/src/hotspot/share/oops/instanceKlass.cpp:2972), pid=8347, tid=8361
> # assert(prev == __null || !prev->is_in_use()) failed: redundunt OSR recompilation detected. memory leak in CodeCache!
> #
> # JRE version: OpenJDK Runtime Environment (13.0) (slowdebug build 13-internal+0-adhoc..src)
> # Java VM: OpenJDK 64-Bit Server VM (slowdebug 13-internal+0-adhoc..src, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
> # Problematic frame:
> # V [libjvm.so+0xb3dbb4] InstanceKlass::add_osr_nmethod(nmethod*)+0xc4
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /build/JTwork/scratch/hs_err_pid8347.log
>
> Program received signal SIGSEGV, Segmentation fault.
> Compiled method (c1) 19032 752 % 2 ByteBufferTest::stepUsingAccessors @ 382 (633 bytes)
>  total in heap  [0x00007fffd8f9ff90,0x00007fffd8fac628] = 50840
>  relocation     [0x00007fffd8fa0110,0x00007fffd8fa1388] = 4728
>  main code      [0x00007fffd8fa13a0,0x00007fffd8fa7f80] = 27616
>  stub code      [0x00007fffd8fa7f80,0x00007fffd8fa86c0] = 1856
>  oops           [0x00007fffd8fa86c0,0x00007fffd8fa86c8] = 8
>  metadata       [0x00007fffd8fa86c8,0x00007fffd8fa8800] = 312
>  scopes data    [0x00007fffd8fa8800,0x00007fffd8fa9ff8] = 6136
>  scopes pcs     [0x00007fffd8fa9ff8,0x00007fffd8fac408] = 9232
>  dependencies   [0x00007fffd8fac408,0x00007fffd8fac418] = 16
>  nul chk table  [0x00007fffd8fac418,0x00007fffd8fac628] = 528
> Compiled method (c1) 19032 752 % 2 ByteBufferTest::stepUsingAccessors @ 382 (633 bytes)
>  total in heap  [0x00007fffd8f9ff90,0x00007fffd8fac628] = 50840
>  relocation     [0x00007fffd8fa0110,0x00007fffd8fa1388] = 4728
>  main code      [0x00007fffd8fa13a0,0x00007fffd8fa7f80] = 27616
>  stub code      [0x00007fffd8fa7f80,0x00007fffd8fa86c0] = 1856
>  oops           [0x00007fffd8fa86c0,0x00007fffd8fa86c8] = 8
>  metadata       [0x00007fffd8fa86c8,0x00007fffd8fa8800] = 312
>  scopes data    [0x00007fffd8fa8800,0x00007fffd8fa9ff8] = 6136
>  scopes pcs     [0x00007fffd8fa9ff8,0x00007fffd8fac408] = 9232
>  dependencies   [0x00007fffd8fac408,0x00007fffd8fac418] = 16
>  nul chk table  [0x00007fffd8fac418,0x00007fffd8fac628] = 528
>
>
> Thanks,
> --lx
>
> On 4/19/19, 9:31 AM, "Severin Gehwolf" wrote:
>
> On Thu, 2019-04-18 at 19:46 +0000, Liu, Xin wrote:
> > Hi, hotspot-compiler group,
> >
> > Could you review this webrev for JDK-8222670?
> > https://cr.openjdk.java.net/~xliu/8222670/webrev.01/
>
> +++ new/test/hotspot/jtreg/compiler/tiered/TieredLevelsTest.java 2019-04-18 12:18:38.000000000 -0700
> @@ -89,7 +89,7 @@
>                  && actual == COMP_LEVEL_LIMITED_PROFILE) {
>              // for simple method full_profile may be replaced by limited_profile
>              if (IS_VERBOSE) {
> -                System.out.printf("Level check: full profiling was replaced "
> +                System.out.println("Level check: full profiling was replaced "
>                          + "by limited profiling. Expected: %d, actual:%d",
>                          expected, actual);
>
> This seems an unintended change, is it?
>
> Thanks,
> Severin
>
>
>
>

From dean.long at oracle.com Fri May 3 06:47:29 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 2 May 2019 23:47:29 -0700
Subject: RFR(S) 8218700: infinite loop in HotSpotJVMCIMetaAccessContext.fromClass after OutOfMemoryError
In-Reply-To:
References: <53bcf718-e543-d40c-5486-58b98f66bcee@oracle.com>
Message-ID: <1e91a8e6-16bc-2ae0-8aaf-830e1c6b450a@oracle.com>

On 5/1/19 5:44 PM, Tom Rodriguez wrote:
> You'll need to update your webrev after Vladimir's push.  This code
> has moved into HotSpotJVMCIRuntime.java.
>

Here's the updated version:

http://cr.openjdk.java.net/~dlong/8218700/webrev.3/

> Maybe WeakReferenceHolder instead of WeakTypeRef?  It needs a comment
> explaining that we're intentionally avoiding the use of
> ClassValue.remove as well.  Shouldn't the ref field be volatile?
> ClassValue includes some barrier semantics and the new code needs
> similar guarantees.
>

I went ahead and made it volatile, but I don't understand what guarantee was missing, and what problem we want to eliminate, unless it is to reduce the possibility of duplicates.  But the fix for JDK-8201248 assumes that duplicates are possible, so I wasn't worried about that.
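
For readers following the discussion, here is a minimal sketch of the "mutable cell instead of ClassValue.remove" idea with a volatile reference field. It is not the code from the webrev: the class name WeakCellDemo, the method describe, and the String payload are hypothetical, and WeakReferenceHolder is only borrowed from the naming suggestion quoted above. As noted in the thread, this shape still allows duplicates when two threads race or when GC clears the reference.

```java
import java.lang.ref.WeakReference;

// Hypothetical illustration only; not the JVMCI implementation.
final class WeakCellDemo {
    // A mutable cell that is stored once per Class and then reset in place.
    static final class WeakReferenceHolder<T> {
        // volatile: a reader that observes the new reference also observes
        // the fully constructed WeakReference behind it.
        private volatile WeakReference<T> ref;

        T get() {
            WeakReference<T> r = ref;
            return r == null ? null : r.get();
        }

        void set(T value) {
            ref = new WeakReference<>(value);
        }
    }

    // The ClassValue entry is never removed; only the cell content changes.
    private static final ClassValue<WeakReferenceHolder<String>> CELLS =
        new ClassValue<WeakReferenceHolder<String>>() {
            @Override
            protected WeakReferenceHolder<String> computeValue(Class<?> type) {
                return new WeakReferenceHolder<>();
            }
        };

    // Returns the cached value, recreating it if it was never set or was
    // cleared by GC. Racing callers may each create their own value.
    static String describe(Class<?> c) {
        WeakReferenceHolder<String> cell = CELLS.get(c);
        String value = cell.get();
        if (value == null) {
            value = "type:" + c.getName();
            cell.set(value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
    }
}
```

Because the lookup never calls ClassValue.remove, there is no retry loop around removal; preventing every duplicate would presumably need additional synchronization on top of the volatile field.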

dl

> tom
>
> dean.long at oracle.com wrote on 4/26/19 12:09 PM:
>> https://bugs.openjdk.java.net/browse/JDK-8218700
>> http://cr.openjdk.java.net/~dlong/8218700/webrev.2/
>>
>> If we throw an OutOfMemoryError in the right place (see JDK-8222941),
>> HotSpotJVMCIMetaAccessContext.fromClass can go into an infinite loop
>> calling ClassValue.remove.  To work around the problem, reset the
>> value in a mutable cell instead of calling remove.
>>
>> dl

From nils.eliasson at oracle.com Fri May 3 08:40:37 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 3 May 2019 10:40:37 +0200
Subject: RFR(S): 8223138: Small clean-up in loop-tree support.
In-Reply-To: <757ff96c-ac8b-7d25-9222-dcd0830b1fed@oracle.com>
References: <4883fb85-23ab-acf0-4687-3da50b070a4d@oracle.com> <757ff96c-ac8b-7d25-9222-dcd0830b1fed@oracle.com>
Message-ID: <7ae9b35b-d679-d705-41d0-4a6646a2908b@oracle.com>

Looks good!

Regards,

Nils

On 2019-04-30 19:25, Patric Hedlin wrote:
> Thanks Vladimir.
>
> On 2019-04-30 18:45, Vladimir Ivanov wrote:
>> Looks good.
>>
>> Small nit: I find original version of IdealLoopTree::tail() easier to
>> read. What do you think about the following?
>>
>> inline Node* IdealLoopTree::tail() {
>>   // Handle lazy update of _tail field
>>   if (_tail->in(0) == NULL) {
>>     _tail = _phase->get_ctrl(n);
>>   }
>>   return _tail;
>> }
>>
> Sure, I'm fine with a revised "old" version as well.
>
> /Patric
>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 30/04/2019 07:09, Patric Hedlin wrote:
>>> Dear all,
>>>
>>> I would like to ask for help to review the following change/update:
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8223138
>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223138/
>>>
>>> 8223138: Small clean-up in loop-tree support.
>>>
>>>      Rename predicate 'is_inner()' to 'is_innermost()' to be accurate.
>>>      Add 'is_root()' predicate for root parent test in loop-tree.
>>>      Change definition of 'is_loop()' to always lazy-read the tail,
>>>      since it should never be NULL. Clean-up of 'tail()' definition.
>>>
>>>
>>> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp,
>>> Kitchensink24h)
>>>
>>>
>>> Best regards,
>>> Patric
>>>

From patric.hedlin at oracle.com Fri May 3 08:42:19 2019
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Fri, 3 May 2019 10:42:19 +0200
Subject: RFR(S): 8223138: Small clean-up in loop-tree support.
In-Reply-To: <7ae9b35b-d679-d705-41d0-4a6646a2908b@oracle.com>
References: <4883fb85-23ab-acf0-4687-3da50b070a4d@oracle.com> <757ff96c-ac8b-7d25-9222-dcd0830b1fed@oracle.com> <7ae9b35b-d679-d705-41d0-4a6646a2908b@oracle.com>
Message-ID:

Thanks Nils.

/Patric

On 03/05/2019 10:40, Nils Eliasson wrote:
> Looks good!
>
> Regards,
>
> Nils
>
> On 2019-04-30 19:25, Patric Hedlin wrote:
>> Thanks Vladimir.
>>
>> On 2019-04-30 18:45, Vladimir Ivanov wrote:
>>> Looks good.
>>>
>>> Small nit: I find original version of IdealLoopTree::tail() easier
>>> to read. What do you think about the following?
>>>
>>> inline Node* IdealLoopTree::tail() {
>>>   // Handle lazy update of _tail field
>>>   if (_tail->in(0) == NULL) {
>>>     _tail = _phase->get_ctrl(n);
>>>   }
>>>   return _tail;
>>> }
>>>
>> Sure, I'm fine with a revised "old" version as well.
>>
>> /Patric
>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 30/04/2019 07:09, Patric Hedlin wrote:
>>>> Dear all,
>>>>
>>>> I would like to ask for help to review the following change/update:
>>>>
>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8223138
>>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223138/
>>>>
>>>> 8223138: Small clean-up in loop-tree support.
>>>>
>>>>      Rename predicate 'is_inner()' to 'is_innermost()' to be accurate.
>>>>      Add 'is_root()' predicate for root parent test in loop-tree.
>>>>      Change definition of 'is_loop()' to always lazy-read the tail,
>>>>      since it should never be NULL. Clean-up of 'tail()' definition.
>>>>
>>>>
>>>> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp,
>>>> Kitchensink24h)
>>>>
>>>>
>>>> Best regards,
>>>> Patric
>>>>

From robbin.ehn at oracle.com Fri May 3 10:31:25 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Fri, 3 May 2019 12:31:25 +0200
Subject: RFR(m): 8221734: Deoptimize with handshakes
In-Reply-To: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>
References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>
Message-ID: <64a8afca-9dc8-b119-0a12-dd05799bdd22@oracle.com>

Hi, please see this update:

Inc:
http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/index.html
Full:
http://cr.openjdk.java.net/~rehn/8221734/v2/webrev/

# Note
http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html
line 630
This is a revert to the original. I had accidentally left in a temporary test change, as you can see here in the full diff:
http://cr.openjdk.java.net/~rehn/8221734/v2/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html

I think I managed to address all review comments.

Dean can you please cast an extra eye on:
http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/oops/method.cpp.sdiff.html
This OR should be correct.

Dan please do the same on the biased locking changes.

I left out the merge with MutexLocker changes, since it was not interesting.
There were some conflicts with JVMCI changes, so incremental contains some parts of that merge.

Passes t1-5 and local testing. I'll continue with some additional testing.

Thanks, Robbin

On 4/25/19 2:05 PM, Robbin Ehn wrote:
> Hi all, please review.
>
> Let's deopt with handshakes.
> Removed VM op Deoptimize, instead we handshake.
> Locks need to be inflated since we are not in a safepoint.
>
> Goes on top of:
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html
>
> Code:
> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8221734
>
> Passes t1-7 and multiple t1-5 runs.
>
> A few startup benchmarks see a small speedup.
>
> Thanks, Robbin

From tobias.hartmann at oracle.com Fri May 3 12:29:01 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 3 May 2019 14:29:01 +0200
Subject: 8222670 patch review: prevent downgraded tasks from recompiling
In-Reply-To:
References: <99aae03d0315482c723abda2f2cb530b4b52f82d.camel@redhat.com> <427BC0A9-DAB2-43A3-AF93-F96414EC1E7E@amazon.com> <0fca1798-5851-3e5d-e603-54282dc3be81@oracle.com>
Message-ID: <88c6a4e1-b98b-b0a9-2f76-3f2595be7374@oracle.com>

On 03.05.19 02:21, Liu, Xin wrote:
> Thanks for the review. I fixed copyrights and the typo of clearMethodState0.
> Here is the new revision.
> https://cr.openjdk.java.net/~xliu/8222670/webrev.04/

Looks good to me but I think you should also add:

if (PrintTieredEvents) {
  print_event(REMOVE_FROM_QUEUE, method, method, task->osr_bci(), (CompLevel) task->comp_level());
}

> But why is that? If a downgraded compilation succeeded at level 2, shouldn't a re-compilation at the
> same level be detected by CompileBroker::compilation_is_complete() in CompileBroker::compile_method()?
>
> That's the very root cause of level2 recompilation.
> In CompileBroker::compile_method(), its input argument is comp_level = 3.
> CompileBroker::compilation_is_complete returns false because codecache only has level=2 nmethod.
> I don't know why, but hotspot is also very stubborn. It will request level = 3 again and again. All of them are downgraded to level=2 when they dequeue.
>
> Level2RecompilationTest simulates this process. I didn't make it up. I observe the symptom in some real services as follows.
> https://bugs.openjdk.java.net/secure/attachment/82079/lvl2_recomp_spring.log.zip

Okay, got it.

Thanks,
Tobias

From rwestrel at redhat.com Fri May 3 12:59:47 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 03 May 2019 14:59:47 +0200
Subject: RFR(S): 8222738: Shenandoah: assert(is_Proj()) failed when running cometd benchmarks
In-Reply-To: <0f1a9600-2f2d-f360-9bc5-aa44f49d8990@redhat.com>
References: <87zhonnwoq.fsf@redhat.com> <0f1a9600-2f2d-f360-9bc5-aa44f49d8990@redhat.com>
Message-ID: <87lfznofr0.fsf@redhat.com>

Thanks for the review.

Actually, I think it's safer to also make the change below because we want to clone everything that's between the call and the fallthrough/exception paths, that is everything with a control of: the call itself or its control projection.

Roland.

diff -r f0739ec84bb4 -r 9968255985be src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp
--- a/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp Thu Apr 11 12:00:33 2019 +0200
+++ b/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp Thu May 02 20:47:23 2019 +0200
@@ -1362,7 +1362,7 @@
       if (idx < n->outcnt()) {
         Node* u = n->raw_out(idx);
         Node* c = phase->ctrl_or_self(u);
-        if (c == ctrl) {
+        if (phase->is_dominator(call, c) && phase->is_dominator(c, projs.fallthrough_proj)) {
           stack.set_index(idx+1);
           assert(!u->is_CFG(), "");
           stack.push(u, 0);

From vladimir.kozlov at oracle.com Fri May 3 16:03:31 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 3 May 2019 09:03:31 -0700
Subject: [13] RFR(S) 8223262: [AOT] jaotc crashes with assert(!(((ThreadShadow*)__the_thread__)->has_pending_exception())) failed: Should not allocate with exception pending
Message-ID: <8603b6d7-8323-7078-aafa-c65437b06718@oracle.com>

http://cr.openjdk.java.net/~kvn/8223262/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8223262

Added missing checks for pending exception.
Fix was reviewed by Tom R. and Gilles D. and pushed into jvmci-8 by Doug S.

I tested it with hs-tier4 and hs-tier6-graal where we had the problem.
AOT compilation does not crash now but there are still test failures caused by JDK-8220623 which will be fixed later.

--
Thanks,
Vladimir

From rkennke at redhat.com Fri May 3 16:59:52 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 3 May 2019 18:59:52 +0200
Subject: RFR: JDK-8222079: Don't use memset to initialize fields decode_env constructor in disassembler.cpp
In-Reply-To:
References:
Message-ID:

Ping?

Thanks,
Roman

> Recent gcc (I use version 9) complains about using memset to initialize
> fields of decode_env. Let's use proper field initializers instead.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8222079
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8222079/webrev.01/
>
> Can I please get a review?
>
> Thanks, Roman

From tom.rodriguez at oracle.com Fri May 3 17:45:02 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 3 May 2019 10:45:02 -0700
Subject: RFR(S) 8218700: infinite loop in HotSpotJVMCIMetaAccessContext.fromClass after OutOfMemoryError
In-Reply-To: <1e91a8e6-16bc-2ae0-8aaf-830e1c6b450a@oracle.com>
References: <53bcf718-e543-d40c-5486-58b98f66bcee@oracle.com> <1e91a8e6-16bc-2ae0-8aaf-830e1c6b450a@oracle.com>
Message-ID: <3d15e9f0-8717-ac82-678d-2139dcfec7f8@oracle.com>

dean.long at oracle.com wrote on 5/2/19 11:47 PM:
> On 5/1/19 5:44 PM, Tom Rodriguez wrote:
>> You'll need to update your webrev after Vladimir's push.  This code
>> has moved into HotSpotJVMCIRuntime.java.
>>
>
> Here's the updated version:
>
> http://cr.openjdk.java.net/~dlong/8218700/webrev.3/

Looks good to me.

>
>> Maybe WeakReferenceHolder instead of WeakTypeRef?  It needs a comment
>> explaining that we're intentionally avoiding the use of
>> ClassValue.remove as well.  Shouldn't the ref field be volatile?
>> ClassValue includes some barrier semantics and the new code needs
>> similar guarantees.
>>
>
> I went ahead and made it volatile, but I don't understand what guarantee
> was missing, and what problem we want to eliminate, unless it is to
> reduce the possibility of duplicates.  But the fix for JDK-8201248
> assumes that duplicates are possible, so I wasn't worried about that.

We're publishing a mutable locally created object to other threads so it seems like we need some sort of ordering barrier when we do so. Presumably the ClassValue would normally provide some ordering though it's a little unclear from the javadoc if it makes any such guarantees. Is the extra volatile unneeded?

tom

>
> dl
>
>> tom
>>
>> dean.long at oracle.com wrote on 4/26/19 12:09 PM:
>>> https://bugs.openjdk.java.net/browse/JDK-8218700
>>> http://cr.openjdk.java.net/~dlong/8218700/webrev.2/
>>>
>>> If we throw an OutOfMemoryError in the right place (see JDK-8222941),
>>> HotSpotJVMCIMetaAccessContext.fromClass can go into an infinite loop
>>> calling ClassValue.remove.  To work around the problem, reset the
>>> value in a mutable cell instead of calling remove.
>>>
>>> dl
>

From dean.long at oracle.com Fri May 3 17:54:09 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 3 May 2019 10:54:09 -0700
Subject: [13] RFR (M): 8223216: C2: Unify class initialization checks between new, getstatic, and putstatic
In-Reply-To:
References:
Message-ID: <5e67b2d3-9856-069e-4886-8366c89bc3f8@oracle.com>

I like the refactoring. Do you want to have a Runtime reviewer take a look at the new logic?

Can you explain why Parse::clinit_deopt() changed from testing for InstanceKlass::fully_initialized to testing for InstanceKlass::being_initialized instead?  How do we know it is the initializing thread?

dl

On 5/1/19 4:37 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8223216/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8223216
>
> (The patch has minor dependencies on 8223213 [1] I sent out for review
> earlier.)
>
> C2 implements class initialization checks for new and
> getstatic/putstatic differently: while "new" supports fast class
> initialization checks, static field accesses rely on uncommon traps
> which may lead to deoptimization/recompilation storms during
> long-running class initialisation.
>
> Proposed patch unifies implementation between them and uses the
> following barrier:
>    if (holder->is_initialized()) {
>      uncommon_trap(initialized, reinterpret);
>    }
>    if (!holder->is_reentrant_initialization(current_thread)) {
>      uncommon_trap(uninitialized, none);
>    }
>
> It also enhances checks for not-yet-initialized classes
> (Compile::needs_clinit_barrier) and unifies the implementation between
> new, invokestatic, and getfield/putfield.
>
> Testing: tier1-5, targeted microbenchmarks, new test from 8223213
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>     https://bugs.openjdk.java.net/browse/JDK-8223213
>

From dean.long at oracle.com Fri May 3 18:55:18 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 3 May 2019 11:55:18 -0700
Subject: RFR(S) 8218700: infinite loop in HotSpotJVMCIMetaAccessContext.fromClass after OutOfMemoryError
In-Reply-To: <3d15e9f0-8717-ac82-678d-2139dcfec7f8@oracle.com>
References: <53bcf718-e543-d40c-5486-58b98f66bcee@oracle.com> <1e91a8e6-16bc-2ae0-8aaf-830e1c6b450a@oracle.com> <3d15e9f0-8717-ac82-678d-2139dcfec7f8@oracle.com>
Message-ID: <415627e2-165c-14a0-a069-2e01de5574d4@oracle.com>

On 5/3/19 10:45 AM, Tom Rodriguez wrote:
>
>
> dean.long at oracle.com wrote on 5/2/19 11:47 PM:
>> On 5/1/19 5:44 PM, Tom Rodriguez wrote:
>>> You'll need to update your webrev after Vladimir's push.  This code
>>> has moved into HotSpotJVMCIRuntime.java.
>>>
>>
>> Here's the updated version:
>>
>> http://cr.openjdk.java.net/~dlong/8218700/webrev.3/
>
> Looks good to me.

Thanks for the review.

>
>>
>>> Maybe WeakReferenceHolder instead of WeakTypeRef?  It needs a
>>> comment explaining that we're intentionally avoiding the use of
>>> ClassValue.remove as well. Shouldn't the ref field be volatile?
>>> ClassValue includes some barrier semantics and the new code needs
>>> similar guarantees.
>>>
>>
>> I went ahead and made it volatile, but I don't understand what
>> guarantee was missing, and what problem we want to eliminate, unless
>> it is to reduce the possibility of duplicates.  But the fix for
>> JDK-8201248 assumes that duplicates are possible, so I wasn't worried
>> about that.
>
> We're publishing a mutable locally created object to other threads so
> it seems like we need some sort of ordering barrier when we do so.
> Presumably the ClassValue would normally provide some ordering though
> it's a little unclear from the javadoc if it makes any such
> guarantees. Is the extra volatile unneeded?
>

ClassValue uses volatile internally so that an unsynchronized read sees the latest version.  Using a volatile here should help in a similar way, but I believe there is still a race that allows duplicates if the weak reference gets cleared by GC.  To prevent all duplicates I think we would need both volatile and more synchronization.

dl

> tom
>
>>
>> dl
>>
>>> tom
>>>
>>> dean.long at oracle.com wrote on 4/26/19 12:09 PM:
>>>> https://bugs.openjdk.java.net/browse/JDK-8218700
>>>> http://cr.openjdk.java.net/~dlong/8218700/webrev.2/
>>>>
>>>> If we throw an OutOfMemoryError in the right place (see
>>>> JDK-8222941), HotSpotJVMCIMetaAccessContext.fromClass can go into
>>>> an infinite loop calling ClassValue.remove.  To work around the
>>>> problem, reset the value in a mutable cell instead of calling remove.
>>>>
>>>> dl
>>

From dean.long at oracle.com Fri May 3 20:31:55 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 3 May 2019 13:31:55 -0700
Subject: RFR: JDK-8222079: Don't use memset to initialize fields decode_env constructor in disassembler.cpp
In-Reply-To:
References:

Message-ID: <2593bcd8-d4c6-03d2-9d70-d90c94dbb828@oracle.com>

Looks good.

dl

On 5/3/19 9:59 AM, Roman Kennke wrote:
> Ping?
>
> Thanks,
> Roman
>
>
>> Recent gcc (I use version 9) complains about using memset to
>> initialize fields of decode_env. Let's use proper field initializers
>> instead.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8222079
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8222079/webrev.01/
>>
>> Can I please get a review?
>>
>> Thanks, Roman

From dean.long at oracle.com Fri May 3 20:43:56 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 3 May 2019 13:43:56 -0700
Subject: [13] RFR(S) 8223262: [AOT] jaotc crashes with assert(!(((ThreadShadow*)__the_thread__)->has_pending_exception())) failed: Should not allocate with exception pending
In-Reply-To: <8603b6d7-8323-7078-aafa-c65437b06718@oracle.com>
References: <8603b6d7-8323-7078-aafa-c65437b06718@oracle.com>
Message-ID:

Looks good.

dl

On 5/3/19 9:03 AM, Vladimir Kozlov wrote:
> http://cr.openjdk.java.net/~kvn/8223262/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8223262
>
> Added missing checks for pending exception.
> Fix was reviewed by Tom R. and Gilles D. and pushed into jvmci-8 by
> Doug S.
>
> I tested it with hs-tier4 and hs-tier6-graal where we had the problem.
> AOT compilation does not crash now but there are still test failures
> caused by JDK-8220623 which will be fixed later.
>

From daniel.daugherty at oracle.com Fri May 3 21:13:14 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 3 May 2019 17:13:14 -0400
Subject: RFR(m): 8221734: Deoptimize with handshakes
In-Reply-To: <64a8afca-9dc8-b119-0a12-dd05799bdd22@oracle.com>
References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>
 <64a8afca-9dc8-b119-0a12-dd05799bdd22@oracle.com>
Message-ID:

On 5/3/19 6:31 AM, Robbin Ehn wrote:
> Hi, please see this update:
>
> Inc:
> http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/index.html
> Full:
> http://cr.openjdk.java.net/~rehn/8221734/v2/webrev/

src/hotspot/share/aot/aotCodeHeap.cpp
    No comments.

src/hotspot/share/aot/aotCompiledMethod.cpp
    No comments.

src/hotspot/share/code/codeCache.cpp
    No comments.

src/hotspot/share/code/nmethod.cpp
    No comments.

src/hotspot/share/code/nmethod.hpp
    No comments.

src/hotspot/share/gc/z/zBarrierSetNMethod.cpp
    No comments.

src/hotspot/share/gc/z/zNMethod.cpp
    No comments.

src/hotspot/share/jvmci/jvmciEnv.cpp
    No comments.

src/hotspot/share/oops/markOop.hpp
    No changes to this file.

src/hotspot/share/oops/method.cpp
    No comments.

src/hotspot/share/oops/method.hpp
    No comments.

src/hotspot/share/prims/jvmtiEventController.cpp
    No comments.

src/hotspot/share/prims/methodHandles.cpp
    No comments.

src/hotspot/share/prims/whitebox.cpp
    No comments.

src/hotspot/share/runtime/biasedLocking.cpp
    nit - Please update copyright year for this file.

    Nice refactoring into more readable chunks! I'm assuming that
    Patricio is also reviewing these changes...

src/hotspot/share/runtime/biasedLocking.hpp
    No comments.

src/hotspot/share/runtime/deoptimization.cpp
    L778:  bool _in_handshake;
        nit - needs one more space of indent.

    Nice refactoring while adding in the handshake support.

src/hotspot/share/runtime/deoptimization.hpp
    L147:  public:
    L148:
    L149:   // Deoptimizes a frame lazily. nmethod gets patched deopt happens on return to the frame
    L163:   static void fix_monitors(JavaThread* thread, frame fr, RegisterMap* map)
        Style nit: I would put the blank line on L148 above L147.

    L164:     { inflate_monitors(thread, fr, map); }
        Style nit: Should be:
            static void fix_monitors(JavaThread* thread, frame fr, RegisterMap* map) {
              inflate_monitors(thread, fr, map);
            }

src/hotspot/share/runtime/mutex.hpp
    No comments.

src/hotspot/share/runtime/mutexLocker.cpp
    No comments. (So OsrList_lock is now 'special-1' instead of 'leaf'.
    I presume the Compiler team is okay with that...)

src/hotspot/share/runtime/mutexLocker.hpp
    No comments.

src/hotspot/share/runtime/synchronizer.cpp
    No comments.

src/hotspot/share/runtime/thread.cpp
    No comments.

src/hotspot/share/runtime/thread.hpp
    No comments.

src/hotspot/share/runtime/vmOperations.cpp
    No comments.

src/hotspot/share/runtime/vmOperations.hpp
    No comments.

src/hotspot/share/services/dtraceAttacher.cpp
    No comments.

> Dan please do the same on the biased locking changes.

    I did so and they look fine.

Thumbs up! I don't need to see a webrev if you fix the nits...

Dan

>
> # Note
> http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html
> line 630
> This is a revert to the original; I accidentally left in a temporary
> test change, as you can see here in the full diff:
> http://cr.openjdk.java.net/~rehn/8221734/v2/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html
>
>
> I think I managed to address all review comments.
>
> Dean can you please cast an extra eye on:
> http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/oops/method.cpp.sdiff.html
>
> This OR should be correct.
>
> Dan please do the same on the biased locking changes.
>
> I left out the merge with MutexLocker changes, since it was not
> interesting.
> There were some conflicts with JVMCI changes, so incremental contains
> some parts of that merge.
>
> Passes t1-5 and local testing.
> I'll continue with some additional testing.
>
> Thanks, Robbin
>
> On 4/25/19 2:05 PM, Robbin Ehn wrote:
>> Hi all, please review.
>>
>> Let's deopt with handshakes.
>> Removed VM op Deoptimize, instead we handshake.
>> Locks need to be inflated since we are not in a safepoint.
>>
>> Goes on top of:
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html
>>
>>
>> Code:
>> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8221734
>>
>> Passes t1-7 and multiple t1-5 runs.
>>
>> A few startup benchmarks see a small speedup.
>>
>> Thanks, Robbin

From vladimir.kozlov at oracle.com Fri May 3 21:38:15 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 3 May 2019 14:38:15 -0700
Subject: [13] RFR(S) 8223262: [AOT] jaotc crashes with assert(!(((ThreadShadow*)__the_thread__)->has_pending_exception())) failed: Should not allocate with exception pending
In-Reply-To:
References: <8603b6d7-8323-7078-aafa-c65437b06718@oracle.com>
Message-ID: <48aab9ef-97da-0e3e-2fa3-ceeaac8a9d5d@oracle.com>

Thank you, Dean

Vladimir

On 5/3/19 1:43 PM, dean.long at oracle.com wrote:
> Looks good.
>
> dl
>
> On 5/3/19 9:03 AM, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/8223262/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8223262
>>
>> Added missing checks for pending exception.
>> Fix was reviewed by Tom R. and Gilles D. and pushed into jvmci-8 by Doug S.
>>
>> I tested it with hs-tier4 and hs-tier6-graal where we had the problem. AOT compilation does not crash now but there
>> are still test failures caused by JDK-8220623 which will be fixed later.
>>
>

From vladimir.x.ivanov at oracle.com Fri May 3 22:49:45 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 3 May 2019 15:49:45 -0700
Subject: [13] RFR (M): 8223216: C2: Unify class initialization checks between new, getstatic, and putstatic
In-Reply-To: <5e67b2d3-9856-069e-4886-8366c89bc3f8@oracle.com>
References: <5e67b2d3-9856-069e-4886-8366c89bc3f8@oracle.com>
Message-ID:

Thanks for the feedback, Dean.

> Do you want to have a Runtime reviewer take a look at the new logic?

I'm definitely looking for feedback on 8223213 from the Runtime team. But
8223216 is C2-specific and incrementally builds on top of it, so I don't
think there's anything new for the Runtime team to look at.

> Can you explain why Parse::clinit_deopt() changed from testing for
>
> InstanceKlass::fully_initialized
>
> to testing for
>
> InstanceKlass::being_initialized
>
> instead? How do we know it is the initializing thread?

The initializing thread is irrelevant here. The check is solely about the
current state of the holder class.

Parse::clinit_deopt() is not mandatory (the nmethod clinit barrier on
entry covers all important cases), but an optimization. It is added by
8223213 specifically for C2 to trigger recompilation once the holder
class is fully initialized. The motivation is to get better code when a
class is fully initialized.

The change in 8223216 is intended as a refactoring: since there are only
2 states allowed here (being_initialized and fully_initialized), it
doesn't matter what state is checked (== being_initialized vs
!= fully_initialized).

Best regards,
Vladimir Ivanov

> On 5/1/19 4:37 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8223216/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8223216
>>
>> (The patch has minor dependencies on 8223213 [1] I sent out for review
>> earlier.)
>>
>> C2 implements class initialization checks for new and
>> getstatic/putstatic differently: while "new" supports fast class
>> initialization checks, static field accesses rely on uncommon traps
>> which may lead to deoptimization/recompilation storms during
>> long-running class initialisation.
>>
>> Proposed patch unifies implementation between them and uses the
>> following barrier:
>>   if (holder->is_initialized()) {
>>     uncommon_trap(initialized, reinterpret);
>>   }
>>   if (!holder->is_reentrant_initialization(current_thread)) {
>>     uncommon_trap(uninitialized, none);
>>   }
>>
>> It also enhances checks for not-yet-initialized classes
>> (Compile::needs_clinit_barrier) and unifies the implementation between
>> new, invokestatic, and getfield/putfield.
>>
>> Testing: tier1-5, targeted microbenchmarks, new test from 8223213
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8223213
>>
>

From sandhya.viswanathan at intel.com Fri May 3 23:02:25 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 3 May 2019 23:02:25 +0000
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To:
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
 <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB7472@FMSMSX126.amr.corp.intel.com>

Hi Vladimir,

Please find below the updated webrev which implements all your inputs:
http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.02/

Looking forward to your feedback.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
Sent: Wednesday, May 01, 2019 5:09 PM
To: Viswanathan, Sandhya ; Vladimir Kozlov
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86

Sounds good, thanks!

Best regards,
Vladimir Ivanov

On 01/05/2019 15:16, Viswanathan, Sandhya wrote:
> I should add here that your suggestion of adding generic shift instruction etc to the macroAssembler is also wonderful instead of function pointer. I will look into making that change as well.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Viswanathan, Sandhya
> Sent: Wednesday, May 01, 2019 3:10 PM
> To: 'Vladimir Ivanov' ; Vladimir Kozlov
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86
>
> Hi Vladimir,
>
> I agree, I wanted to show both the approaches in this patch to get your feedback:
> 1) with emit as a function
> 2) with emit part in the instruct body itself
>
> With emit as a function it becomes hard to read and I personally prefer it in the instruct itself as is done for vabsneg2D etc. That is what you are recommending as well so I feel good.
>
> Once the adlc enhancement is done both the approaches should give similar binary size. Till then there will be small overhead with approach 2) as emit is duplicated per match rule.
>
> I will send an updated patch fixing the two issues you mentioned in your previous email plus this change of using approach 2).
>
> Please do let me know if you want to see any other change in this patch.
>
> Best Regards,
> Sandhya
>
>
>
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Wednesday, May 01, 2019 2:58 PM
> To: Viswanathan, Sandhya ; Vladimir Kozlov
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>
>
>> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/
>
> Nice job, Sandhya! Glad to hear the approach pays off!
>
> Unfortunately, I must note that AD file becomes much more obscure.
> Especially with those function pointers.
>
> void emit_vshift16B_code(MacroAssembler& _masm, int opcode, XMMRegister dst,
>                          XMMRegister src, XMMRegister shift,
>                          XMMRegister tmp1, XMMRegister tmp2, Register scratch) {
>   XX_Inst extendinst = get_extend_inst(opcode == Op_URShiftVB ? false : true);
>   XX_Inst shiftinst = get_xx_inst(opcode);
>
>   (_masm.*extendinst)(tmp1, src);
>   (_masm.*shiftinst)(tmp1, shift);
>   __ pshufd(tmp2, src, 0xE);
>   (_masm.*extendinst)(tmp2, tmp2);
>   (_masm.*shiftinst)(tmp2, shift);
>   __ movdqu(dst, ExternalAddress(vector_short_to_byte_mask()), scratch);
>   __ pand(tmp2, dst);
>   __ pand(dst, tmp1);
>   __ packuswb(dst, tmp2);
> }
>
> Have you tried to encapsulate that into x86-specific MacroAssembler?
>
> instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
>   predicate(UseSSE > 3 && UseAVX <= 1 && n->as_Vector()->length() == 16);
>   match(Set dst (LShiftVB src shift));
>   match(Set dst (RShiftVB src shift));
>   match(Set dst (URShiftVB src shift));
>   effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
>   format %{"pmovxbw $tmp1,$src\n\t"
>            "shiftop $tmp1,$shift\n\t"
>            "pshufd $tmp2,$src\n\t"
>            "pmovxbw $tmp2,$tmp2\n\t"
>            "shiftop $tmp2,$shift\n\t"
>            "movdqu $dst,[0x00ff00ff0x00ff00ff]\n\t"
>            "pand $tmp2,$dst\n\t"
>            "pand $dst,$tmp1\n\t"
>            "packuswb $dst,$tmp2\n\t! packed16B shift" %}
>   ins_encode %{
>     emit_vshift16B_code(_masm, this->as_Mach()->ideal_Opcode() ,
>       $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister, $scratch$$Register);
>   %}
>   ins_pipe( pipe_slow );
> %}
>
> can be turned into something like:
>
> instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
>   predicate(n->as_Vector()->length() == 16);
>   match(Set dst (LShiftVB src shift));
>   match(Set dst (RShiftVB src shift));
>   match(Set dst (URShiftVB src shift));
>   effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
>   format %{"packed16B shift" %}
>   ins_encode %{
>     int vlen = 0; // 128-bit
>     BasicType elem_type = T_BYTE;
>     int shift_mode = ...; // L/R/UR or S/U + L/R
>     __ vshift(vlen, elem_type, shift_mode,
>               $dst$$..., $src$$..., $shift$$...,
>               $tmp1$$..., $tmp2$$..., $scratch$$...);
>   %}
>
> Then MA::vshift can dispatch between different implementations depending on SSE/AVX level available. Do you see any problems with that from footprint perspective?
>
> Ideally, I'd prefer to see a library of operations on vectors encapsulated in MacroAssembler (or a subclass) and used in x86.ad. That will accommodate further reductions in AD instructions needed.
>
> Best regards,
> Vladimir Ivanov
>
>> With this webrev the ad file has only about 60 lines effectively added.
>> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%.
>> I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules.
>> What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself.
>> The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE.
>>
>> In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool.
>> Also taken into account reducing the temporary by using TEMP dst for multiply rules.
>>
>> The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL.
>>
>> Your review and feedback is welcome.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>> Viswanathan, Sandhya
>> Sent: Wednesday, April 10, 2019 10:22 AM
>> To: Vladimir Kozlov ; B. Blaser
>>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86
>>
>> Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used:
>>
>>   __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len);
>>   __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, vector_len);
>>
>> Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, April 10, 2019 9:59 AM
>> To: Viswanathan, Sandhya ; B. Blaser
>>
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>>
>> On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote:
>>> Hi Bernard,
>>>
>>> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src.
>>
>> Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead
>> $tmp2 to avoid overwriting
>> $src2 before we get value from it if $dst = $src2.
>>
>> On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result.
>>
>> It is a little mess which may cause ineffective use of registers in compiled code.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: B. Blaser [mailto:bsrbnd at gmail.com]
>>> Sent: Wednesday, April 10, 2019 4:10 AM
>>> To: Viswanathan, Sandhya
>>> Cc: Vladimir Kozlov ;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>>>
>>> Hi Sandhya and Vladimir K.,
>>>
>>> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote:
>>>>
>>>> Hi Vladimir,
>>>>
>>>> Yes, I missed the question below:
>>>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there?
>>>>
>>>> No it is not intentional, we can use the dst register in those cases and reduced the tmps.
>>>
>>> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg():
>>>
>>>   format %{"pmovsxbw $tmp,$src1\n\t"
>>>            "pmovsxbw $tmp2,$src2\n\t"
>>>
>>> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think?
>>>
>>> Thanks,
>>> Bernard
>>>

From vladimir.x.ivanov at oracle.com Fri May 3 23:22:16 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 3 May 2019 16:22:16 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB7472@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
 <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB7472@FMSMSX126.amr.corp.intel.com>
Message-ID: <52876f29-4da2-2885-fe18-5e362b57eb2b@oracle.com>

> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.02/

Much better! I like how AD files look now. I assume static footprint
numbers you provided earlier are still valid.

+void MacroAssembler::vabsnegd(int opcode, XMMRegister dst, Register scr) {
+  if (opcode == Op_AbsVD) {
+    andpd(dst, ExternalAddress(StubRoutines::x86::vector_double_sign_mask()), scr);
+  } else {
+    assert((opcode == Op_NegVD),"opcode should be Op_NegD");
+    xorpd(dst, ExternalAddress(StubRoutines::x86::vector_double_sign_flip()), scr);
+  }
+}

It's a bit odd to see C2-specific stuff in MacroAssembler, but I'm
perfectly fine with incrementally refactoring it later. For now, just
guard relevant code with #ifdef COMPILER2.

Otherwise, looks very good!

Best regards,
Vladimir Ivanov

>
> Looking forward to your feedback.
>
> Best Regards,
> Sandhya

From fujie at loongson.cn Fri May 3 23:24:28 2019
From: fujie at loongson.cn (Jie Fu)
Date: Sat, 4 May 2019 07:24:28 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References:

Message-ID: <8510740c-ac56-f8d2-3c5e-451dfa6948a0@loongson.cn> Hi Vladimir Ivanov, The patch in the attachment has been updated by adding brackets to the checks in InlineTree::is_not_reached. Is it OK to be pushed? If so, could you please sponsor it? Thanks a lot. Best regards, Jie On 2019?05?04? 06:06, Vladimir Ivanov wrote: > CCing Jie Fu. > > Best regards, > Vladimir Ivanov > > On 03/05/2019 14:55, coleen.phillimore at oracle.com wrote: >> >> http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.02/src/hotspot/share/oops/cpCache.cpp.frames.html >> >> >> This looks like it should have gotten the wrong answer without this >> change (there appears to be protection from an index out of range) >> even without your patch. f2 is Method* for invokeinterface now. >> >> The runtime part of this change look good to me. >> >> Thanks, >> Coleen >> >> >> On 5/2/19 5:23 AM, Tobias Hartmann wrote: >>> Hi Jie, >>> >>> this looks good to me too but please add brackets to the checks in >>> InlineTree::is_not_reached. >>> >>> I've submitted some extended testing and let you know once it passed. >>> >>> Someone from the runtime team should also have a look at this >>> because your changes affect the >>> interpreter. CC'ing runtime-dev. >>> >>> Thanks, >>> Tobias >>> >>> On 29.04.19 15:43, Jie Fu wrote: >>>> Hi all, >>>> >>>> May I have another review for this change [1] to finalize the fix? >>>> Thanks a lot. >>>> >>>> Best regards, >>>> Jie >>>> >>>> [1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.02/ >>>> >>>> >>>> On 2019?04?20? 11:35, Jie Fu wrote: >>>>> Ah, I got it. >>>>> I like your patch and benefit a lot from you. >>>>> Thank you so much, Vladimir. >>>>> >>>>> Any comments from other reviewers? >>>>> Thanks. >>>>> >>>>> Best regards, >>>>> Jie >>>>> >>>>> On 2019/4/20 ??11:18, Vladimir Ivanov wrote: >>>>>>>> After some explorations I decided to keep original behavior for >>>>>>>> immature profiles >>>>>>>> (profile.count == -1). >>>>>>> I agree. 
>>>>>>> >>>>>>> I have two questions here. >>>>>>> >>>>>>> 1. What's the difference of the following two if statements? >>>>>>> ------------------------------------------------- >>>>>>> + if (!callee_method->was_executed_more_than(0)) return true; >>>>>>> // callee was never executed >>>>>>> + >>>>>>> + if (caller_method->is_not_reached(caller_bci)) return true; >>>>>>> // call site not resolved >>>>>>> ------------------------------------------------- >>>>>>> I think only one of them is needed. >>>>>> The checks are complimentary: one inspects callee and the other >>>>>> looks at call site. >>>>>> >>>>>> "!callee_method->was_executed_more_than(0)" ensures that callee >>>>>> was executed at least once. >>>>>> >>>>>> "caller_method->is_not_reached(caller_bci)" inspects the state of >>>>>> the call site. If corresponding >>>>>> CP entry is not resolved, then the call site isn't reached. If >>>>>> is_not_reached() returns false, >>>>>> it's not a definitive answer: there's still a chance the site is >>>>>> not reached - consider the case >>>>>> of virtual calls where callee_method may differ for the same >>>>>> resolved method. >>>>>> >>>>>>> 2. Does the assert in InlineTree::is_not_reached(...) make sense? >>>>>>> Since we have >>>>>>> ------------------------------------------------- >>>>>>> if (profile.count() > 0) return false; // reachable according >>>>>>> to profile >>>>>>> ------------------------------------------------- >>>>>>> and >>>>>>> ------------------------------------------------- >>>>>>> if (profile.count() == -1) {...} >>>>>>> ------------------------------------------------- >>>>>>> before >>>>>>> ------------------------------------------------- >>>>>>> assert(profile.count() == 0, "sanity"); >>>>>>> ------------------------------------------------- >>>>>>> is the assert redundant? >>>>>> Asserts are intended to be redundant :-) But still catch bugs >>>>>> from time to time. 
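The three cases of profile.count() discussed in this exchange (positive, -1 for an immature profile, and 0) can be sketched in C++ as below. CallProfile, Reachability and classify are hypothetical names used only for illustration; they are not the HotSpot types. What the real code does in the -1 case is elided in the thread, so the sketch merely labels it Unknown.

```cpp
#include <cassert>

// Hypothetical stand-in for a call-site profile; only count() mirrors the thread.
struct CallProfile {
  int count_;  // -1: immature profile, 0: never taken, >0: taken at least once
  int count() const { return count_; }
};

// Hypothetical result type; the real function returns a bool.
enum class Reachability { Reached, Unknown, NotReached };

// Classify a call site from its profile count, in the order quoted in the thread.
inline Reachability classify(const CallProfile& profile) {
  if (profile.count() > 0)   return Reachability::Reached;  // reachable according to profile
  if (profile.count() == -1) return Reachability::Unknown;  // immature profile (assumption: no decision)
  // Redundant by construction: only the == 0 case can reach this point.
  // The assert still documents that fact and fires if count() ever
  // yields a value below -1.
  assert(profile.count() == 0 && "sanity");
  return Reachability::NotReached;
}
```

With this ordering the assert can only fail if the invariant count() >= -1 is broken, which is the point made in the reply.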
>>>>>> >>>>>> This one, in particular, checks invariant on profile.count() >= >>>>>> -1 (which is not very useful by >>>>>> itself), but also stresses that "profile.count() == 0" case is >>>>>> being processed. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: 8221542.patch Type: text/x-patch Size: 7109 bytes Desc: not available URL: From sandhya.viswanathan at intel.com Sat May 4 00:01:33 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Sat, 4 May 2019 00:01:33 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <52876f29-4da2-2885-fe18-5e362b57eb2b@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com> <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com> <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB7472@FMSMSX126.amr.corp.intel.com> <52876f29-4da2-2885-fe18-5e362b57eb2b@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB74D2@FMSMSX126.amr.corp.intel.com> Hi Vladimir, The updated webrev with #ifdef change is at: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.03/ Yes, the footprint numbers continue to hold with this patch: The x86.ad file is 126 lines smaller. The libjvm size increase is only 0.24%. 
Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] Sent: Friday, May 03, 2019 4:22 PM To: Viswanathan, Sandhya Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.02/ Much better! I like how AD files look now. I assume static footprint numbers you provided earlier are still valid. +void MacroAssembler::vabsnegd(int opcode, XMMRegister dst, Register scr) { + if (opcode == Op_AbsVD) { + andpd(dst, ExternalAddress(StubRoutines::x86::vector_double_sign_mask()), scr); + } else { + assert((opcode == Op_NegVD),"opcode should be Op_NegD"); + xorpd(dst, ExternalAddress(StubRoutines::x86::vector_double_sign_flip()), scr); + } +} It's a bit odd to see C2-specific stuff in MacroAssembler, but I'm perfectly fine with incrementally refactor it later. For now, just guard relevant code with #ifdef COMPILER2. Otherwise, looks very good! Best regards, Vladimir Ivanov > > Looking forward to your feedback. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] > Sent: Wednesday, May 01, 2019 5:09 PM > To: Viswanathan, Sandhya ; Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > Sounds good, thanks! > > Best regards, > Vladimir Ivanov > > On 01/05/2019 15:16, Viswanathan, Sandhya wrote: >> I should add here that your suggestion of adding generic shift instruction etc to the macroAssembler is also wonderful instead of function pointer. I will look into making that change as well. 
>> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Viswanathan, Sandhya >> Sent: Wednesday, May 01, 2019 3:10 PM >> To: 'Vladimir Ivanov' ; Vladimir Kozlov >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> Hi Vladimir, >> >> I agree, I wanted to show both the approaches in this patch to get your feedback: >> 1) with emit as a function >> 2) with emit part in the instruct body itself >> >> With emit as a function it becomes hard to read and I personally prefer it in the instruct itself as is done for vabsneg2D etc. That is what you are recommending as well so I feel good. >> >> Once the adlc enhancement is done both the approaches should give similar binary size. Till then there will be small overhead with approach 2) as emit is duplicated per match rule. >> >> I will send an updated patch fixing the two issues you mentioned in your previous email plus this change of using approach 2). >> >> Please do let me know if you want to see any other change in this patch. >> >> Best Regards, >> Sandhya >> >> >> >> -----Original Message----- >> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] >> Sent: Wednesday, May 01, 2019 2:58 PM >> To: Viswanathan, Sandhya ; Vladimir Kozlov >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> >>> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/ >> >> Nice job, Sandhya! Glad to hear the approach pays off! >> >> Unfortunately, I must note that AD file becomes much more obscure. >> Especially with those function pointers. >> >> 1528 void emit_vshift16B_code(MacroAssembler& _masm, int opcode, XMMRegister dst, >> 1529 XMMRegister src, XMMRegister shift, >> 1530 XMMRegister tmp1, XMMRegister tmp2, >> Register scratch) { >> 1531 XX_Inst extendinst = get_extend_inst(opcode == Op_URShiftVB ? 
>> false : true); >> 1532 XX_Inst shiftinst = get_xx_inst(opcode); >> 1533 >> 1534 (_masm.*extendinst)(tmp1, src); >> 1535 (_masm.*shiftinst)(tmp1, shift); >> 1536 __ pshufd(tmp2, src, 0xE); >> 1537 (_masm.*extendinst)(tmp2, tmp2); >> 1538 (_masm.*shiftinst)(tmp2, shift); >> 1539 __ movdqu(dst, ExternalAddress(vector_short_to_byte_mask()), >> scratch); >> 1540 __ pand(tmp2, dst); >> 1541 __ pand(dst, tmp1); >> 1542 __ packuswb(dst, tmp2); >> 1543 } >> >> Have you tried to encapsulate that into x86-specific MacroAssembler? >> >> 8682 instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{ >> 8683 predicate(UseSSE > 3 && UseAVX <= 1 && n->as_Vector()->length() >> == 16); >> 8684 match(Set dst (LShiftVB src shift)); >> 8685 match(Set dst (RShiftVB src shift)); >> 8686 match(Set dst (URShiftVB src shift)); >> 8687 effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch); >> 8688 format %{"pmovxbw $tmp1,$src\n\t" >> 8689 "shiftop $tmp1,$shift\n\t" >> 8690 "pshufd $tmp2,$src\n\t" >> 8691 "pmovxbw $tmp2,$tmp2\n\t" >> 8692 "shiftop $tmp2,$shift\n\t" >> 8693 "movdqu $dst,[0x00ff00ff0x00ff00ff]\n\t" >> 8694 "pand $tmp2,$dst\n\t" >> 8695 "pand $dst,$tmp1\n\t" >> 8696 "packuswb $dst,$tmp2\n\t! 
packed16B shift" %} >> 8697 ins_encode %{ >> 8698 emit_vshift16B_code(_masm, this->as_Mach()->ideal_Opcode() , >> $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister, $scratch$$Register); >> 8699 %} >> 8700 ins_pipe( pipe_slow ); >> 8701 %} >> >> can be turned into something like: >> >> instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{ >> predicate(n->as_Vector()->length() == 16); >> match(Set dst (LShiftVB src shift)); >> match(Set dst (RShiftVB src shift)); >> match(Set dst (URShiftVB src shift)); >> effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch); >> format %{"packed16B shift" %} >> ins_encode %{ >> int vlen = 0; // 128-bit >> BasicType elem_type = T_BYTE; >> int shift_mode = ...; // L/R/UR or S/U + L/R >> __ vshift(vlen, elem_type, shift_mode, >> $dst$$..., $src$$..., $shift$$..., >> $tmp1$$..., $tmp2$$..., $scratch$$...); >> %} >> >> Then MA::vshift can dispatch between different implementations depending on SSE/AVX level available. Do you see any problems with that from footprint perspective? >> >> Ideally, I'd prefer to see a library of operations on vectors encapsulated in MacroAssembler (or a subclass) and used in x86.ad. That will accommodate further reductions in AD instructions needed. >> >> Best regards, >> Vladimir Ivanov >> >>> With this webrev the ad file has only about 60 lines effectively added. >>> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%. >>> I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules. >>> What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself. >>> The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE. 
>>> >>> In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool. >>> Also taken into account reducing the temporary by using TEMP dst for multiply rules. >>> >>> The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL. >>> >>> Your review and feedback is welcome. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Viswanathan, Sandhya >>> Sent: Wednesday, April 10, 2019 10:22 AM >>> To: Vladimir Kozlov ; B. Blaser >>> >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used: >>> >>> __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len); >>> __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, >>> vector_len); >>> >>> Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, April 10, 2019 9:59 AM >>> To: Viswanathan, Sandhya ; B. Blaser >>> >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote: >>>> Hi Bernard, >>>> >>>> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src. >>> >>> Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead >>> $tmp2 to avoid overwriting >>> $src2 before we get value from it if $dst = $src2. >>> >>> On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result. 
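The aliasing concern in this exchange (without TEMP dst the allocator may hand out the same register for $dst and a source) can be illustrated with a toy register file. This is a sketch under stated assumptions, not HotSpot code: RegFile and mul_with_temps are hypothetical, and the two copies stand in for the two pmovsxbw steps of mul4B_reg().

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file; an index plays the role of a physical register.
using RegFile = std::array<int32_t, 8>;

// Toy model of "widen src1 into t1, widen src2 into t2, multiply into dst".
// t1 and t2 are the registers used as temporaries; either may be dst itself.
inline void mul_with_temps(RegFile& r, int dst, int src1, int src2, int t1, int t2) {
  r[t1] = r[src1];          // first write: clobbers src2 if t1 aliases it
  r[t2] = r[src2];          // second write: reads src2 only now
  r[dst] = r[t1] * r[t2];   // final result
}
```

If dst shares a register with src2, using dst as the first temporary destroys src2 before it is read, while using dst as the second temporary is harmless because src2 has been consumed by then. This matches the two observations made in the thread about $tmp and $tmp2.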
>>> >>> It is a little mess which may cause ineffective use of registers in compiled code. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Best Regards, >>>> Sandhya >>>> >>>> >>>> -----Original Message----- >>>> From: B. Blaser [mailto:bsrbnd at gmail.com] >>>> Sent: Wednesday, April 10, 2019 4:10 AM >>>> To: Viswanathan, Sandhya >>>> Cc: Vladimir Kozlov ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >>>> >>>> Hi Sandhya and Vladimir K., >>>> >>>> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: >>>>> >>>>> Hi Vladimir, >>>>> >>>>> Yes, I missed the question below: >>>>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? >>>>> >>>>> No it is not intentional, we can use the dst register in those cases and reduced the tmps. >>>> >>>> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg(): >>>> >>>> 7349 format %{"pmovsxbw $tmp,$src1\n\t" >>>> 7350 "pmovsxbw $tmp2,$src2\n\t" >>>> >>>> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? >>>> >>>> Thanks, >>>> Bernard >>>> From jesper.wilhelmsson at oracle.com Sat May 4 01:13:04 2019 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Sat, 4 May 2019 03:13:04 +0200 Subject: RFR: JDK-8222665 - Update Graal Message-ID: <74DF69E0-2792-492E-99DC-BEA707375BBB@oracle.com> Hi, Please review the patch to integrate recent Graal changes into OpenJDK. Graal tip to integrate: 556bed673d5bccbed227e2e108dc36eaf00239eb Bug: https://bugs.openjdk.java.net/browse/JDK-8222665 Webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/ This integration did overwrite changes already in place in OpenJDK. The diff has been attached to the umbrella bug. 
Thanks, /Jesper -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From vladimir.kozlov at oracle.com Sat May 4 02:04:57 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 3 May 2019 19:04:57 -0700 Subject: RFR: JDK-8222665 - Update Graal In-Reply-To: <74DF69E0-2792-492E-99DC-BEA707375BBB@oracle.com> References: <74DF69E0-2792-492E-99DC-BEA707375BBB@oracle.com> Message-ID: <998bc1f0-ba55-6c06-21ed-349267a8271f@oracle.com> This is too old Graal's tip. You need at least take a29972bd6677f2e8165438caf1073ff596b95f26 to get next changes and to avoid rollback needed changes listed in overwrite file. Otherwise you will get tests failures. my changes: [GR-14499] Update jdk9 version of GraalServices.java and Dean's: [GR-15582] Replace getCompilationLevelAdjustment with excludeFromJVMCICompilation after JDK-8219403. Vladimir On 5/3/19 6:13 PM, jesper.wilhelmsson at oracle.com wrote: > Hi, > > Please review the patch to integrate recent Graal changes into OpenJDK. > Graal tip to integrate: 556bed673d5bccbed227e2e108dc36eaf00239eb > > Bug: https://bugs.openjdk.java.net/browse/JDK-8222665 > Webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/ > > This integration did overwrite changes already in place in OpenJDK. The diff has been attached to the umbrella bug. 
> > Thanks, > /Jesper > From jesper.wilhelmsson at oracle.com Sat May 4 02:15:58 2019 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Sat, 4 May 2019 04:15:58 +0200 Subject: RFR: JDK-8222665 - Update Graal In-Reply-To: <998bc1f0-ba55-6c06-21ed-349267a8271f@oracle.com> References: <74DF69E0-2792-492E-99DC-BEA707375BBB@oracle.com> <998bc1f0-ba55-6c06-21ed-349267a8271f@oracle.com> Message-ID: <5C486583-81AC-4372-8B5A-8A7B369E8BCE@oracle.com> Ok, then I withdraw this RFR and we need to wait until we have a more recent clean Graal nightly. Thanks, /Jesper > On 4 May 2019, at 04:04, Vladimir Kozlov wrote: > > This is too old Graal's tip. > > You need at least take a29972bd6677f2e8165438caf1073ff596b95f26 to get next changes and to avoid rollback needed changes listed in overwrite file. Otherwise you will get tests failures. > > my changes: [GR-14499] Update jdk9 version of GraalServices.java > and Dean's: [GR-15582] Replace getCompilationLevelAdjustment with excludeFromJVMCICompilation after JDK-8219403. > > Vladimir > > On 5/3/19 6:13 PM, jesper.wilhelmsson at oracle.com wrote: >> Hi, >> Please review the patch to integrate recent Graal changes into OpenJDK. >> Graal tip to integrate: 556bed673d5bccbed227e2e108dc36eaf00239eb >> Bug: https://bugs.openjdk.java.net/browse/JDK-8222665 >> Webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/ >> This integration did overwrite changes already in place in OpenJDK. The diff has been attached to the umbrella bug. >> Thanks, >> /Jesper -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL:

From vladimir.kozlov at oracle.com Sat May 4 17:54:07 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sat, 4 May 2019 10:54:07 -0700
Subject: [13] RFR(M) 8223332: Update JVMCI
Message-ID: <085cd3bd-e2ba-4bdc-0573-c7f5cad98fed@oracle.com>

http://cr.openjdk.java.net/~kvn/8223332/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8223332

Sync the latest JVMCI changes from graal-jvmci-8 [1]. The list of changes is in the bug report.

[1] https://github.com/graalvm/graal-jvmci-8/commits/master

--
Thanks,
Vladimir

From igor.ignatyev at oracle.com Mon May 6 03:31:37 2019
From: igor.ignatyev at oracle.com (Igor Ignatev)
Date: Sun, 5 May 2019 20:31:37 -0700
Subject: RFR(trivial): 8223054: [TESTBUG] Put graalJarsCP before existing classpath in GraalUnitTestLauncher
In-Reply-To:
References:
Message-ID: <803B96E9-8EDD-4469-9137-63451E825724@oracle.com>

Looks good to me.

// moved to hotspot compiler list

-- Igor

> On May 4, 2019, at 6:32 PM, Pengfei Li (Arm Technology China) wrote:
>
> Hi,
>
> Please help review this trivial change on GraalUnitTestLauncher.
>
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8223054/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8223054
>
> The current Graal unit test in jtreg requires junit-4.12.jar as a dependency. In GraalUnitTestLauncher.java, we put the path of this file into graalJarsCP and concatenate it with the existing classpath. But the existing classpath may contain another version of JUnit, the one with which the jtreg tool was built. (According to the OpenJDK "Building jtreg" webpage [1], the recommended version of JUnit to build jtreg is junit-4.10.)
>
> In this patch, graalJarsCP is put before the existing classpath returned by System.getProperty() when generating the new classpath string, to avoid incompatibility issues. The jtreg Graal unit test cases passed after this change.
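Why the order matters: a class is taken from the first classpath entry that provides it, so whichever JUnit jar comes first wins. The sketch below models this in C++ with hypothetical types (Jar, join_paths, resolve); it is only an illustration of first-match lookup, not the code of GraalUnitTestLauncher.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one classpath entry: a jar and the classes it contains.
struct Jar {
  std::string name;
  std::vector<std::string> classes;
};

// Concatenate two classpaths, keeping `first` ahead of `second`.
inline std::vector<Jar> join_paths(std::vector<Jar> first, const std::vector<Jar>& second) {
  first.insert(first.end(), second.begin(), second.end());
  return first;
}

// Return the name of the jar that would supply `cls`: the first entry on the
// path that contains it, or an empty string if no entry does.
inline std::string resolve(const std::vector<Jar>& classpath, const std::string& cls) {
  for (const Jar& jar : classpath) {
    for (const std::string& c : jar.classes) {
      if (c == cls) return jar.name;
    }
  }
  return "";
}
```

With the existing classpath first, a lookup of a JUnit class would resolve to the jtreg copy (junit-4.10 in the example from the message); with graalJarsCP first, it resolves to junit-4.12, which is what the patch intends.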
>
> [1] https://openjdk.java.net/jtreg/build.html
>
> --
> Thanks,
> Pengfei
>

From robbin.ehn at oracle.com Mon May 6 08:42:11 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 6 May 2019 10:42:11 +0200
Subject: RFR(m): 8221734: Deoptimize with handshakes
In-Reply-To:
References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com> <64a8afca-9dc8-b119-0a12-dd05799bdd22@oracle.com>
Message-ID:

Hi Dan,

> src/hotspot/share/runtime/biasedLocking.cpp
>     nit - Please update copyright year for this file.

Updated in 8220724.

>     Nice refactoring into more readable chunks! I'm assuming that
>     Patricio is also reviewing these changes...

Great, good!

> src/hotspot/share/runtime/deoptimization.cpp
>     L778:  bool _in_handshake;
>         nit - needs one more space of indent.

Fixed.

>     Nice refactoring while adding in the handshake support.

Great!

> src/hotspot/share/runtime/deoptimization.hpp
>     L147:  public:
>     L148:
>     L149:   // Deoptimizes a frame lazily. nmethod gets patched deopt happens on return to the frame
>     L163:   static void fix_monitors(JavaThread* thread, frame fr, RegisterMap* map)
>         Style nit: I would put the blank line on L148 above L147.

Fixed.

>     L164:     { inflate_monitors(thread, fr, map); }
>         Style nit: Should be:
>
>             static void fix_monitors(JavaThread* thread, frame fr, RegisterMap* map) {
>               inflate_monitors(thread, fr, map);
>             }

Fixed.

> src/hotspot/share/runtime/mutexLocker.cpp
>     No comments. (So OsrList_lock is now 'special-1' instead of 'leaf'.
>     I presume the Compiler team is okay with that...

Since we need to hold CodeCache_lock while iterating nmethods, all locks that might be taken needed to be pushed down under CodeCache_lock. So I hope they are okay with that.

> Thumbs up!  I don't need to see a webrev if you fix the nits...

Thanks Dan! Fixed!

I did t6-7 over the weekend, no issues found.
/Robbin > > Dan > > >> >> # Note >> http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html >> line 630 >> This is revert to the original, I accidental had left in a temporary test >> change, as you can see here in full diff: >> http://cr.openjdk.java.net/~rehn/8221734/v2/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html >> >> >> I think I manage to address all review comments. >> >> Dean can you please cast an extra eye on: >> http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/oops/method.cpp.sdiff.html >> >> This OR should be correct. >> >> Dan please do the same on the biased locking changes. >> >> I left out the merge with MutexLocker changes, since it was not interesting. >> There were some conflicts with JVMCI changes, so incremental contains some >> parts of that merge. >> >> Passes t1-5 and local testing. >> I'll continue with some additional testing. >> >> Thanks, Robbin >> >> On 4/25/19 2:05 PM, Robbin Ehn wrote: >>> Hi all, please review. >>> >>> Let's deopt with handshakes. >>> Removed VM op Deoptimize, instead we handshake. >>> Locks needs to be inflate since we are not in a safepoint. >>> >>> Goes on top of: >>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html >>> >>> >>> Code: >>> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8221734 >>> >>> Passes t1-7 and multiple t1-5 runs. >>> >>> A few startup benchmark see a small speedup. >>> >>> Thanks, Robbin > From Pengfei.Li at arm.com Mon May 6 10:41:00 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Mon, 6 May 2019 10:41:00 +0000 Subject: RFR(trivial): 8223054: [TESTBUG] Put graalJarsCP before existing classpath in GraalUnitTestLauncher In-Reply-To: <803B96E9-8EDD-4469-9137-63451E825724@oracle.com> References: <803B96E9-8EDD-4469-9137-63451E825724@oracle.com> Message-ID: Thanks Igor. 
Do I need another reviewer for this trivial change?

// Also cc graal-dev list

--
Thanks,
Pengfei

>
> Looks good to me.
>
> // moved to hotspot compiler list
>
> -- Igor
>
> > On May 4, 2019, at 6:32 PM, Pengfei Li (Arm Technology China) wrote:
> >
> > Hi,
> >
> > Please help review this trivial change on GraalUnitTestLauncher.
> >
> > Webrev: http://cr.openjdk.java.net/~pli/rfr/8223054/webrev.00/
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8223054
> >
> > The current Graal unit test in jtreg requires junit-4.12.jar as a dependency. In GraalUnitTestLauncher.java, we put the path of this file into graalJarsCP and concatenate it with the existing classpath. But the existing classpath may contain another version of JUnit, the one with which the jtreg tool was built. (According to the OpenJDK "Building jtreg" webpage [1], the recommended version of JUnit to build jtreg is junit-4.10.)
> >
> > In this patch, graalJarsCP is put before the existing classpath returned by System.getProperty() when generating the new classpath string, to avoid incompatibility issues. The jtreg Graal unit test cases passed after this change.
> >
> > [1] https://openjdk.java.net/jtreg/build.html
> >
> > --
> > Thanks,
> > Pengfei
> >

From martin.doerr at sap.com Mon May 6 12:54:01 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 6 May 2019 12:54:01 +0000
Subject: RFR(S): jdk11u-dev backport 8216556: Unnecessary liveness computation with JVMTI
Message-ID:

Hi,

I'd like to backport this change to jdk11u because it's very simple and avoids some unnecessary overhead.
It applies almost cleanly (it only needs manual resolution because a neighboring hunk has changed: CompileTheWorld removal).

bug:
https://bugs.openjdk.java.net/browse/JDK-8216556

original change:
http://hg.openjdk.java.net/jdk/jdk/rev/91ab128a65a3

jdk11u webrev:
http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/jdk11u/webrev.00/

I only had to reapply the change around
"if (CURRENT_ENV->should_retain_local_variables() || DeoptimizeALot || CompileTheWorld) {"
(ciMethod.cpp) because CompileTheWorld was removed.

Please review.

Best regards,
Martin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From goetz.lindenmaier at sap.com Mon May 6 12:59:23 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 6 May 2019 12:59:23 +0000
Subject: RFR(S): jdk11u-dev backport 8216556: Unnecessary liveness computation with JVMTI
In-Reply-To:
References:
Message-ID:

Hi Martin,

you integrated this change into 11 well.
Thanks for downporting it, it will help the debugging performance slightly.

Best regards,
Goetz.

> -----Original Message-----
> From: Doerr, Martin
> Sent: Monday, 6 May 2019 14:54
> To: 'hotspot-compiler-dev at openjdk.java.net'; Lindenmaier, Goetz
> Subject: RFR(S): jdk11u-dev backport 8216556: Unnecessary liveness computation with JVMTI
>
> Hi,
>
> I'd like to backport this change to jdk11u because it's very simple and avoids some unnecessary overhead.
>
> It applies almost cleanly (it only needs manual resolution because a neighboring hunk has changed: CompileTheWorld removal).
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8216556
>
> original change:
> http://hg.openjdk.java.net/jdk/jdk/rev/91ab128a65a3
>
> jdk11u webrev:
> http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/jdk11u/webrev.00/
>
> I only had to reapply the change around
> "if (CURRENT_ENV->should_retain_local_variables() || DeoptimizeALot || CompileTheWorld) {"
> (ciMethod.cpp) because CompileTheWorld was removed.
>
> Please review.
>
> Best regards,
> Martin
>

From martin.doerr at sap.com Mon May 6 13:05:13 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 6 May 2019 13:05:13 +0000
Subject: RFR(S): jdk11u-dev backport 8216556: Unnecessary liveness computation with JVMTI
In-Reply-To:
References:
Message-ID:

Hi Götz,

thank you for reviewing.

Best regards,
Martin

-----Original Message-----
From: Lindenmaier, Goetz
Sent: Monday, 6 May 2019 14:59
To: Doerr, Martin; 'hotspot-compiler-dev at openjdk.java.net'; jdk-updates-dev at openjdk.java.net
Subject: RE: RFR(S): jdk11u-dev backport 8216556: Unnecessary liveness computation with JVMTI

Hi Martin,

you integrated this change into 11 well.
Thanks for downporting it, it will help the debugging performance slightly.

Best regards,
Goetz.

> -----Original Message-----
> From: Doerr, Martin
> Sent: Monday, 6 May 2019 14:54
> To: 'hotspot-compiler-dev at openjdk.java.net'; Lindenmaier, Goetz
> Subject: RFR(S): jdk11u-dev backport 8216556: Unnecessary liveness computation with JVMTI
>
> Hi,
>
> I'd like to backport this change to jdk11u because it's very simple and avoids some unnecessary overhead.
>
> It applies almost cleanly (it only needs manual resolution because a neighboring hunk has changed: CompileTheWorld removal).
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8216556
>
> original change:
> http://hg.openjdk.java.net/jdk/jdk/rev/91ab128a65a3
>
> jdk11u webrev:
> http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/jdk11u/webrev.00/
>
> I only had to reapply the change around
> "if (CURRENT_ENV->should_retain_local_variables() || DeoptimizeALot || CompileTheWorld) {"
> (ciMethod.cpp) because CompileTheWorld was removed.
>
> Please review.
> > > > Best regards, > > Martin > > From jesper.wilhelmsson at oracle.com Mon May 6 14:18:16 2019 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Mon, 6 May 2019 16:18:16 +0200 Subject: RFR: JDK-8222665 - Update Graal Message-ID: Hi, Please review the patch to integrate recent Graal changes into OpenJDK. Graal tip to integrate: 88c3adb11b1bc10f6443435685b65227e7584b43 Bug: https://bugs.openjdk.java.net/browse/JDK-8222665 Webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/ This integration did overwrite changes already in place in OpenJDK. The diff has been attached to the umbrella bug. Thanks, /Jesper -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From eric.caspole at oracle.com Mon May 6 14:21:08 2019 From: eric.caspole at oracle.com (Eric Caspole) Date: Mon, 6 May 2019 10:21:08 -0400 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB7472@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com> <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com> <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB7472@FMSMSX126.amr.corp.intel.com> Message-ID: Hi Sandhya, Could add some new JMH to this webrev 
that target the java code that show the benefit of these changes? Or, you could look through the existing ones in test/micro/org/openjdk/bench/ and mention in the bug which existing ones exercise these changes. That will be a big help to us in the course of working on JDK 13. Thanks, Eric On 5/3/19 19:02, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Please find below the updated webrev which implements all your inputs: > http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.02/ > > Looking forward to your feedback. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] > Sent: Wednesday, May 01, 2019 5:09 PM > To: Viswanathan, Sandhya ; Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > Sounds good, thanks! > > Best regards, > Vladimir Ivanov > > On 01/05/2019 15:16, Viswanathan, Sandhya wrote: >> I should add here that your suggestion of adding generic shift instruction etc to the macroAssembler is also wonderful instead of function pointer. I will look into making that change as well. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Viswanathan, Sandhya >> Sent: Wednesday, May 01, 2019 3:10 PM >> To: 'Vladimir Ivanov' ; Vladimir Kozlov >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> Hi Vladimir, >> >> I agree, I wanted to show both the approaches in this patch to get your feedback: >> 1) with emit as a function >> 2) with emit part in the instruct body itself >> >> With emit as a function it becomes hard to read and I personally prefer it in the instruct itself as is done for vabsneg2D etc. That is what you are recommending as well so I feel good. >> >> Once the adlc enhancement is done both the approaches should give similar binary size. 
Till then there will be small overhead with approach 2) as emit is duplicated per match rule.
>>
>> I will send an updated patch fixing the two issues you mentioned in your previous email plus this change of using approach 2).
>>
>> Please do let me know if you want to see any other change in this patch.
>>
>> Best Regards,
>> Sandhya
>>
>>
>>
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Wednesday, May 01, 2019 2:58 PM
>> To: Viswanathan, Sandhya ; Vladimir Kozlov
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>>
>>
>>> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/
>>
>> Nice job, Sandhya! Glad to hear the approach pays off!
>>
>> Unfortunately, I must note that AD file becomes much more obscure.
>> Especially with those function pointers.
>>
>> void emit_vshift16B_code(MacroAssembler& _masm, int opcode, XMMRegister dst,
>>                          XMMRegister src, XMMRegister shift,
>>                          XMMRegister tmp1, XMMRegister tmp2, Register scratch) {
>>   XX_Inst extendinst = get_extend_inst(opcode == Op_URShiftVB ? false : true);
>>   XX_Inst shiftinst = get_xx_inst(opcode);
>>
>>   (_masm.*extendinst)(tmp1, src);
>>   (_masm.*shiftinst)(tmp1, shift);
>>   __ pshufd(tmp2, src, 0xE);
>>   (_masm.*extendinst)(tmp2, tmp2);
>>   (_masm.*shiftinst)(tmp2, shift);
>>   __ movdqu(dst, ExternalAddress(vector_short_to_byte_mask()), scratch);
>>   __ pand(tmp2, dst);
>>   __ pand(dst, tmp1);
>>   __ packuswb(dst, tmp2);
>> }
>>
>> Have you tried to encapsulate that into x86-specific MacroAssembler?
>>
>> instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
>>   predicate(UseSSE > 3 && UseAVX <= 1 && n->as_Vector()->length() == 16);
>>   match(Set dst (LShiftVB src shift));
>>   match(Set dst (RShiftVB src shift));
>>   match(Set dst (URShiftVB src shift));
>>   effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
>>   format %{"pmovxbw $tmp1,$src\n\t"
>>            "shiftop $tmp1,$shift\n\t"
>>            "pshufd $tmp2,$src\n\t"
>>            "pmovxbw $tmp2,$tmp2\n\t"
>>            "shiftop $tmp2,$shift\n\t"
>>            "movdqu $dst,[0x00ff00ff0x00ff00ff]\n\t"
>>            "pand $tmp2,$dst\n\t"
>>            "pand $dst,$tmp1\n\t"
>>            "packuswb $dst,$tmp2\n\t! packed16B shift" %}
>>   ins_encode %{
>>     emit_vshift16B_code(_masm, this->as_Mach()->ideal_Opcode() ,
>>                         $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister, $scratch$$Register);
>>   %}
>>   ins_pipe( pipe_slow );
>> %}
>>
>> can be turned into something like:
>>
>> instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
>>   predicate(n->as_Vector()->length() == 16);
>>   match(Set dst (LShiftVB src shift));
>>   match(Set dst (RShiftVB src shift));
>>   match(Set dst (URShiftVB src shift));
>>   effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
>>   format %{"packed16B shift" %}
>>   ins_encode %{
>>     int vlen = 0; // 128-bit
>>     BasicType elem_type = T_BYTE;
>>     int shift_mode = ...; // L/R/UR or S/U + L/R
>>     __ vshift(vlen, elem_type, shift_mode,
>>               $dst$$..., $src$$..., $shift$$...,
>>               $tmp1$$..., $tmp2$$..., $scratch$$...);
>>   %}
>>
>> Then MA::vshift can dispatch between different implementations depending on SSE/AVX level available. Do you see any problems with that from footprint perspective?
>>
>> Ideally, I'd prefer to see a library of operations on vectors encapsulated in MacroAssembler (or a subclass) and used in x86.ad.
That will accommodate further reductions in AD instructions needed. >> >> Best regards, >> Vladimir Ivanov >> >>> With this webrev the ad file has only about 60 lines effectively added. >>> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%. >>> I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules. >>> What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself. >>> The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE. >>> >>> In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool. >>> Also taken into account reducing the temporary by using TEMP dst for multiply rules. >>> >>> The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL. >>> >>> Your review and feedback is welcome. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Viswanathan, Sandhya >>> Sent: Wednesday, April 10, 2019 10:22 AM >>> To: Vladimir Kozlov ; B. Blaser >>> >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used: >>> >>> __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len); >>> __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, >>> vector_len); >>> >>> Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required. 
>>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, April 10, 2019 9:59 AM >>> To: Viswanathan, Sandhya ; B. Blaser >>> >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote: >>>> Hi Bernard, >>>> >>>> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src. >>> >>> Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead >>> $tmp2 to avoid overwriting >>> $src2 before we get value from it if $dst = $src2. >>> >>> On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result. >>> >>> It is a little mess which may cause ineffective use of registers in compiled code. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Best Regards, >>>> Sandhya >>>> >>>> >>>> -----Original Message----- >>>> From: B. Blaser [mailto:bsrbnd at gmail.com] >>>> Sent: Wednesday, April 10, 2019 4:10 AM >>>> To: Viswanathan, Sandhya >>>> Cc: Vladimir Kozlov ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >>>> >>>> Hi Sandhya and Vladimir K., >>>> >>>> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: >>>>> >>>>> Hi Vladimir, >>>>> >>>>> Yes, I missed the question below: >>>>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? >>>>> >>>>> No it is not intentional, we can use the dst register in those cases and reduced the tmps. >>>> >>>> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. 
Also, I'm not sure this would be possible in the case of mul4B_reg(): >>>> >>>> 7349 format %{"pmovsxbw $tmp,$src1\n\t" >>>> 7350 "pmovsxbw $tmp2,$src2\n\t" >>>> >>>> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? >>>> >>>> Thanks, >>>> Bernard >>>> From patricio.chilano.mateo at oracle.com Mon May 6 16:10:16 2019 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Mon, 6 May 2019 12:10:16 -0400 Subject: RFR(m): 8221734: Deoptimize with handshakes In-Reply-To: References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com> <64a8afca-9dc8-b119-0a12-dd05799bdd22@oracle.com>

Message-ID: <259f2edc-a842-8f14-39d6-74eb47a2964c@oracle.com>

Hi Robbin,

I'm going to just review the biased locking part since I'm not really familiar with the rest of the code.

In BiasedLocking::revoke_and_rebias_in_handshake(), why do you need to execute fast_revoke(obj, false)? If these are objects locked by the JavaThread you are handshaking, then it seems they should either be normal locks (no bias pattern) or the condition (mark->biased_locker() == THREAD && prototype_header->bias_epoch() == mark->bias_epoch()) you are testing for later should hold. Then that would save the extra comparisons in fast_revoke(). Also, instead of placing the condition (mark->biased_locker() == THREAD && prototype_header->bias_epoch() == mark->bias_epoch()) inside an if() and then later using a ShouldNotReachHere(), wouldn't it be better to make that an assertion, place that code outside the if() and remove the ShouldNotReachHere()?

For the execution of revoke_bias() inside BiasedLocking::revoke_and_rebias_in_handshake() you could use a shorter version of BiasedLocking::revoke_and_rebias() that avoids the extra comparisons made for the general case and just starts at the stack-walking part, but I'm actually doing that for 8191890 so I can merge that with my patch.

In deoptimization.cpp you have methods inflate_monitors() and inflate_monitors_handshake(), but in inflate_monitors() you are not inflating the monitors, you just revoke the ones that have bias. You mentioned in your first email that we need to inflate if we are not at a safepoint; why is that? Since revocation seems to be the common factor between those methods, maybe s/inflate/revoke is a better name?

Thanks!
Patricio

On 5/6/19 4:42 AM, Robbin Ehn wrote:
> Hi Dan,
>
>> src/hotspot/share/runtime/biasedLocking.cpp
>>     nit - Please update copyright year for this file.
>>
>
> Updated in 8220724.
>
>>     Nice refactoring into more readable chunks! I'm assuming that
>>     Patricio is also reviewing these changes...
>
> Great, good!
>
>> src/hotspot/share/runtime/deoptimization.cpp
>>     L778:  bool _in_handshake;
>>         nit - needs one more space of indent.
>
> Fixed.
>
>>
>>     Nice refactoring while adding in the handshake support.
>
> Great!
>
>>
>> src/hotspot/share/runtime/deoptimization.hpp
>>     L147:  public:
>>     L148:
>>     L149:   // Deoptimizes a frame lazily. nmethod gets patched deopt happens on return to the frame
>>     L163:   static void fix_monitors(JavaThread* thread, frame fr, RegisterMap* map)
>>         Style nit: I would put the blank line on L148 above L147.
>
> Fixed.
>
>>
>>     L164:     { inflate_monitors(thread, fr, map); }
>>         Style nit: Should be:
>>
>>             static void fix_monitors(JavaThread* thread, frame fr, RegisterMap* map) {
>>               inflate_monitors(thread, fr, map);
>>             }
>
> Fixed.
>
>> src/hotspot/share/runtime/mutexLocker.cpp
>>     No comments. (So OsrList_lock is now 'special-1' instead of 'leaf'.
>>     I presume the Compiler team is okay with that...
>
> Since we need to hold CodeCache_lock while iterating nmethods, all locks
> that might be taken needed to be pushed down under CodeCache_lock.
> So I hope they are okay with that.
>
>> Thumbs up! I don't need to see a webrev if you fix the nits...
>
> Thanks Dan! Fixed!
>
> I did t6-7 over the weekend, no issues found.
>
> /Robbin
>
>>
>> Dan
>>
>>
>>>
>>> # Note
>>> http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html
>>> line 630
>>> This is a revert to the original; I accidentally had left in a temporary
>>> test change, as you can see here in full diff:
>>> http://cr.openjdk.java.net/~rehn/8221734/v2/webrev/src/hotspot/share/runtime/biasedLocking.cpp.sdiff.html
>>>
>>>
>>> I think I managed to address all review comments.
>>>
>>> Dean can you please cast an extra eye on:
>>> http://cr.openjdk.java.net/~rehn/8221734/v2/inc/webrev/src/hotspot/share/oops/method.cpp.sdiff.html
>>>
>>> This OR should be correct.
>>>
>>> Dan please do the same on the biased locking changes.
>>>
>>> I left out the merge with MutexLocker changes, since it was not
>>> interesting.
>>> There were some conflicts with JVMCI changes, so incremental
>>> contains some parts of that merge.
>>>
>>> Passes t1-5 and local testing.
>>> I'll continue with some additional testing.
>>>
>>> Thanks, Robbin
>>>
>>> On 4/25/19 2:05 PM, Robbin Ehn wrote:
>>>> Hi all, please review.
>>>>
>>>> Let's deopt with handshakes.
>>>> Removed VM op Deoptimize, instead we handshake.
>>>> Locks need to be inflated since we are not in a safepoint.
>>>>
>>>> Goes on top of:
>>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html
>>>>
>>>>
>>>> Code:
>>>> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8221734
>>>>
>>>> Passes t1-7 and multiple t1-5 runs.
>>>>
>>>> A few startup benchmarks see a small speedup.
>>>>
>>>> Thanks, Robbin
>>

From tom.rodriguez at oracle.com Mon May 6 17:00:59 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 6 May 2019 10:00:59 -0700
Subject: [13] RFR(M) 8223332: Update JVMCI
In-Reply-To: <085cd3bd-e2ba-4bdc-0573-c7f5cad98fed@oracle.com>
References: <085cd3bd-e2ba-4bdc-0573-c7f5cad98fed@oracle.com>
Message-ID:

Looks good.

tom

Vladimir Kozlov wrote on 5/4/19 10:54 AM:
> http://cr.openjdk.java.net/~kvn/8223332/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8223332
>
> Sync latest JVMCI changes from graal-jvmci-8 [1]
> The list in the bug report.
> > [1] https://github.com/graalvm/graal-jvmci-8/commits/master > From vladimir.kozlov at oracle.com Mon May 6 17:13:56 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 6 May 2019 10:13:56 -0700 Subject: [13] RFR(M) 8223332: Update JVMCI In-Reply-To: References: <085cd3bd-e2ba-4bdc-0573-c7f5cad98fed@oracle.com> Message-ID: Thank you, Tom Vladimir On 5/6/19 10:00 AM, Tom Rodriguez wrote: > Looks good. > > tom > > Vladimir Kozlov wrote on 5/4/19 10:54 AM: >> http://cr.openjdk.java.net/~kvn/8223332/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8223332 >> >> Sync latest JVMCI changes from graal-jvmci-8 [1] >> The list in the bug report. >> >> [1] https://github.com/graalvm/graal-jvmci-8/commits/master >> From vladimir.kozlov at oracle.com Mon May 6 18:11:38 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 6 May 2019 11:11:38 -0700 Subject: RFR: JDK-8222665 - Update Graal In-Reply-To: References: Message-ID: <5ff3bca1-2ca5-24d4-976a-0d3ffdcaa874@oracle.com> It seems webrev is wrong. Jesper, is it possible you sent old webrev? I looked on patch (from submitted test job) and it seems correct. For example, from next changes [1] it correctly updated only Copyright year in JDK (in JDK it was old 2018). But webrev shows reversed changes [2]. The patch does not have IsGraalPredicate.java changes. But webrev has it with reversed changes again [3]. The same for GraalServices.java file changes. No changes in patch but reverse changes in webrev. 
Thanks, Vladimir [1] https://github.com/oracle/graal/commit/4fa819e120212393122b55e2c95e9de7c6101ccf#diff-3f2f58ebefeb6c5489c4d264ec8ae502 [2] http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/meta/DefaultHotSpotLoweringProvider.java.udiff.html [3] http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/IsGraalPredicate.java.udiff.html On 5/6/19 7:18 AM, jesper.wilhelmsson at oracle.com wrote: > Hi, > > Please review the patch to integrate recent Graal changes into OpenJDK. > Graal tip to integrate: 88c3adb11b1bc10f6443435685b65227e7584b43 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8222665 > Webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/ > > This integration did overwrite changes already in place in OpenJDK. The diff has been attached to the umbrella bug. > > Thanks, > /Jesper > From jesper.wilhelmsson at oracle.com Mon May 6 18:32:37 2019 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Mon, 6 May 2019 20:32:37 +0200 Subject: RFR: JDK-8222665 - Update Graal In-Reply-To: <5ff3bca1-2ca5-24d4-976a-0d3ffdcaa874@oracle.com> References: <5ff3bca1-2ca5-24d4-976a-0d3ffdcaa874@oracle.com> Message-ID: Sorry! I forgot to remove the old one so the script automatically created webrev.01 but still linked to the old in the email. Current webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.01/ /Jesper > On 6 May 2019, at 20:11, Vladimir Kozlov wrote: > > It seems webrev is wrong. Jesper, is it possible you sent old webrev? > > I looked on patch (from submitted test job) and it seems correct. For example, from next changes [1] it correctly updated only Copyright year in JDK (in JDK it was old 2018). > But webrev shows reversed changes [2]. > > The patch does not have IsGraalPredicate.java changes. 
But webrev has it with reversed changes again [3].
>
> The same for GraalServices.java file changes. No changes in patch but reverse changes in webrev.
>
> Thanks,
> Vladimir
>
> [1] https://github.com/oracle/graal/commit/4fa819e120212393122b55e2c95e9de7c6101ccf#diff-3f2f58ebefeb6c5489c4d264ec8ae502
>
> [2] http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/meta/DefaultHotSpotLoweringProvider.java.udiff.html
>
> [3] http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/IsGraalPredicate.java.udiff.html
>
> On 5/6/19 7:18 AM, jesper.wilhelmsson at oracle.com wrote:
>> Hi,
>> Please review the patch to integrate recent Graal changes into OpenJDK.
>> Graal tip to integrate: 88c3adb11b1bc10f6443435685b65227e7584b43
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8222665
>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/
>> This integration did overwrite changes already in place in OpenJDK. The diff has been attached to the umbrella bug.
>> Thanks,
>> /Jesper

From dean.long at oracle.com Mon May 6 18:39:23 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 6 May 2019 11:39:23 -0700
Subject: [13] RFR (M): 8223216: C2: Unify class initialization checks between new, getstatic, and putstatic
In-Reply-To:
References: <5e67b2d3-9856-069e-4886-8366c89bc3f8@oracle.com>
Message-ID: <2d4adb75-c86b-8115-16b0-fe7c5b4129e0@oracle.com>

OK, thanks for the explanation. Looks good.

dl

On 5/3/19 3:49 PM, Vladimir Ivanov wrote:
> Thanks for the feedback, Dean.
>
>> Do you want to have a Runtime reviewer take a look at the new logic?
>
> I'm definitely looking for feedback on 8223213 from the Runtime team. But
> 8223216 is C2-specific and incrementally builds on top of it, so I
> don't think there's anything new for the Runtime team to look at.
>
>> Can you explain why Parse::clinit_deopt() changed from testing for
>>
>> InstanceKlass::fully_initialized
>>
>> to testing for
>>
>> InstanceKlass::being_initialized
>>
>> instead? How do we know it is the initializing thread?
>
> The initializing thread is irrelevant here. The check is solely about the
> current state of the holder class.
>
> Parse::clinit_deopt() is not mandatory (the nmethod clinit barrier on
> entry covers all important cases), but an optimization. It is added by
> 8223213 specifically for C2 to trigger recompilation once the holder
> class is fully initialized. The motivation is to get better code when
> a class is fully initialized.
>
> The change in 8223216 is intended as a refactoring: since there are
> only 2 states allowed here (being_initialized and fully_initialized),
> it doesn't matter which state is checked (== being_initialized vs !=
> fully_initialized).
>
> Best regards,
> Vladimir Ivanov
>
>> On 5/1/19 4:37 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8223216/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8223216
>>>
>>> (The patch has minor dependencies on 8223213 [1] I sent out for
>>> review earlier.)
>>>
>>> C2 implements class initialization checks for new and
>>> getstatic/putstatic differently: while "new" supports fast class
>>> initialization checks, static field accesses rely on uncommon traps
>>> which may lead to deoptimization/recompilation storms during
>>> long-running class initialisation.
>>>
>>> Proposed patch unifies implementation between them and uses the
>>> following barrier:
>>>   if (holder->is_initialized()) {
>>>     uncommon_trap(initialized, reinterpret);
>>>   }
>>>   if (!holder->is_reentrant_initialization(current_thread)) {
>>>     uncommon_trap(uninitialized, none);
>>>   }
>>>
>>> It also enhances checks for not-yet-initialized classes
>>> (Compile::needs_clinit_barrier) and unifies the implementation
>>> between new, invokestatic, and getfield/putfield.
>>>
>>> Testing: tier1-5, targeted microbenchmarks, new test from 8223213
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8223213
>>>
>>

From vladimir.kozlov at oracle.com Mon May 6 18:43:39 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 6 May 2019 11:43:39 -0700
Subject: RFR: JDK-8222665 - Update Graal
In-Reply-To:
References: <5ff3bca1-2ca5-24d4-976a-0d3ffdcaa874@oracle.com>
Message-ID:

Yes, this one looks good!

And testing seems fine - most failures are timeouts due to Graal runs with -Xcomp -XX:-TieredCompilation, which is a known issue, 8222524.

Thanks,
Vladimir

On 5/6/19 11:32 AM, jesper.wilhelmsson at oracle.com wrote:
> Sorry! I forgot to remove the old one so the script automatically created webrev.01 but still linked to the old in the
> email.
>
> Current webrev:
> http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.01/
>
> /Jesper
>
>> On 6 May 2019, at 20:11, Vladimir Kozlov wrote:
>>
>> It seems webrev is wrong. Jesper, is it possible you sent old webrev?
>>
>> I looked on patch (from submitted test job) and it seems correct. For example, from next changes [1] it correctly
>> updated only Copyright year in JDK (in JDK it was old 2018).
>> But webrev shows reversed changes [2].
>>
>> The patch does not have IsGraalPredicate.java changes. But webrev has it with reversed changes again [3].
>>
>> The same for GraalServices.java file changes. No changes in patch but reverse changes in webrev.
>>
>> Thanks,
>> Vladimir
>>
>> [1] https://github.com/oracle/graal/commit/4fa819e120212393122b55e2c95e9de7c6101ccf#diff-3f2f58ebefeb6c5489c4d264ec8ae502
>>
>> [2] http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/meta/DefaultHotSpotLoweringProvider.java.udiff.html
>>
>> [3] http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/IsGraalPredicate.java.udiff.html
>>
>> On 5/6/19 7:18 AM, jesper.wilhelmsson at oracle.com wrote:
>>> Hi,
>>> Please review the patch to integrate recent Graal changes into OpenJDK.
>>> Graal tip to integrate: 88c3adb11b1bc10f6443435685b65227e7584b43
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8222665
>>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8222665/webrev.00/
>>> This integration did overwrite changes already in place in OpenJDK. The diff has been attached to the umbrella bug.
>>> Thanks,
>>> /Jesper
>

From dean.long at oracle.com Mon May 6 18:45:33 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 6 May 2019 11:45:33 -0700
Subject: RFR(trivial): 8223054: [TESTBUG] Put graalJarsCP before existing classpath in GraalUnitTestLauncher
In-Reply-To:
References: <803B96E9-8EDD-4469-9137-63451E825724@oracle.com>
Message-ID:

Looks good (and trivial) to me.

dl

On 5/6/19 3:41 AM, Pengfei Li (Arm Technology China) wrote:
> Thanks Igor. Do I need another reviewer for this trivial change?
>
> // Also cc graal-dev list
>
> --
> Thanks,
> Pengfei
>
>> Looks good to me.
>>
>> // moved to hotspot compiler list
>>
>> -- Igor
>>
>>> On May 4, 2019, at 6:32 PM, Pengfei Li (Arm Technology China) wrote:
>>> Hi,
>>>
>>> Please help review this trivial change on GraalUnitTestLauncher.
>>>
>>> Webrev: http://cr.openjdk.java.net/~pli/rfr/8223054/webrev.00/
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8223054
>>>
>>> The current Graal unit test in jtreg requires junit-4.12.jar as a dependency. In GraalUnitTestLauncher.java, we put the path of this file into graalJarsCP and concatenate it with the existing classpath. But the existing classpath may contain another version of JUnit, the one with which the jtreg tool is built. (According to the OpenJDK "Building jtreg" webpage [1], the recommended version of JUnit to build jtreg is junit-4.10.)
>>> In this patch, graalJarsCP is put before the existing classpath returned by System.getProperty() when generating the new classpath string, to avoid incompatibility issues. The jtreg Graal unit test cases passed after this change.
>>> [1] https://openjdk.java.net/jtreg/build.html
>>>
>>> --
>>> Thanks,
>>> Pengfei
>>>

From vladimir.x.ivanov at oracle.com Mon May 6 19:15:21 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 6 May 2019 12:15:21 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB74D2@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
 <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com>
 <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB5094@FMSMSX126.amr.corp.intel.com>
 <0cd3fd93-0f1e-a6d0-d4c3-f8d95b533ff7@oracle.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB56B1@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB7472@FMSMSX126.amr.corp.intel.com>
 <52876f29-4da2-2885-fe18-5e362b57eb2b@oracle.com>
<02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AB74D2@FMSMSX126.amr.corp.intel.com> Message-ID: > http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.03/ Looks good. Testing results are good as well. Best regards, Vladimir Ivanov > Yes, the footprint numbers continue to hold with this patch: > The x86.ad file is 126 lines smaller. > The libjvm size increase is only 0.24%. > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] > Sent: Friday, May 03, 2019 4:22 PM > To: Viswanathan, Sandhya > Cc: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > >> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.02/ > > Much better! I like how AD files look now. I assume static footprint > numbers you provided earlier are still valid. > > +void MacroAssembler::vabsnegd(int opcode, XMMRegister dst, Register scr) { > + if (opcode == Op_AbsVD) { > + andpd(dst, > ExternalAddress(StubRoutines::x86::vector_double_sign_mask()), scr); > + } else { > + assert((opcode == Op_NegVD),"opcode should be Op_NegD"); > + xorpd(dst, > ExternalAddress(StubRoutines::x86::vector_double_sign_flip()), scr); > + } > +} > > It's a bit odd to see C2-specific stuff in MacroAssembler, but I'm > perfectly fine with incrementally refactor it later. > > For now, just guard relevant code with #ifdef COMPILER2. > > Otherwise, looks very good! > > Best regards, > Vladimir Ivanov > >> >> Looking forward to your feedback. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] >> Sent: Wednesday, May 01, 2019 5:09 PM >> To: Viswanathan, Sandhya ; Vladimir Kozlov >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> Sounds good, thanks! 
>> >> Best regards, >> Vladimir Ivanov >> >> On 01/05/2019 15:16, Viswanathan, Sandhya wrote: >>> I should add here that your suggestion of adding generic shift instruction etc to the macroAssembler is also wonderful instead of function pointer. I will look into making that change as well. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Viswanathan, Sandhya >>> Sent: Wednesday, May 01, 2019 3:10 PM >>> To: 'Vladimir Ivanov' ; Vladimir Kozlov >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> Hi Vladimir, >>> >>> I agree, I wanted to show both the approaches in this patch to get your feedback: >>> 1) with emit as a function >>> 2) with emit part in the instruct body itself >>> >>> With emit as a function it becomes hard to read and I personally prefer it in the instruct itself as is done for vabsneg2D etc. That is what you are recommending as well so I feel good. >>> >>> Once the adlc enhancement is done both the approaches should give similar binary size. Till then there will be small overhead with approach 2) as emit is duplicated per match rule. >>> >>> I will send an updated patch fixing the two issues you mentioned in your previous email plus this change of using approach 2). >>> >>> Please do let me know if you want to see any other change in this patch. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> >>> -----Original Message----- >>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] >>> Sent: Wednesday, May 01, 2019 2:58 PM >>> To: Viswanathan, Sandhya ; Vladimir Kozlov >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> >>>> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/ >>> >>> Nice job, Sandhya! Glad to hear the approach pays off! >>> >>> Unfortunately, I must note that AD file becomes much more obscure. >>> Especially with those function pointers. 
>>>
>>> 1528 void emit_vshift16B_code(MacroAssembler& _masm, int opcode, XMMRegister dst,
>>> 1529                          XMMRegister src, XMMRegister shift,
>>> 1530                          XMMRegister tmp1, XMMRegister tmp2, Register scratch) {
>>> 1531   XX_Inst extendinst = get_extend_inst(opcode == Op_URShiftVB ? false : true);
>>> 1532   XX_Inst shiftinst = get_xx_inst(opcode);
>>> 1533
>>> 1534   (_masm.*extendinst)(tmp1, src);
>>> 1535   (_masm.*shiftinst)(tmp1, shift);
>>> 1536   __ pshufd(tmp2, src, 0xE);
>>> 1537   (_masm.*extendinst)(tmp2, tmp2);
>>> 1538   (_masm.*shiftinst)(tmp2, shift);
>>> 1539   __ movdqu(dst, ExternalAddress(vector_short_to_byte_mask()), scratch);
>>> 1540   __ pand(tmp2, dst);
>>> 1541   __ pand(dst, tmp1);
>>> 1542   __ packuswb(dst, tmp2);
>>> 1543 }
>>>
>>> Have you tried to encapsulate that into x86-specific MacroAssembler?
>>>
>>> 8682 instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
>>> 8683   predicate(UseSSE > 3 && UseAVX <= 1 && n->as_Vector()->length() == 16);
>>> 8684   match(Set dst (LShiftVB src shift));
>>> 8685   match(Set dst (RShiftVB src shift));
>>> 8686   match(Set dst (URShiftVB src shift));
>>> 8687   effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
>>> 8688   format %{"pmovxbw $tmp1,$src\n\t"
>>> 8689            "shiftop $tmp1,$shift\n\t"
>>> 8690            "pshufd $tmp2,$src\n\t"
>>> 8691            "pmovxbw $tmp2,$tmp2\n\t"
>>> 8692            "shiftop $tmp2,$shift\n\t"
>>> 8693            "movdqu $dst,[0x00ff00ff0x00ff00ff]\n\t"
>>> 8694            "pand $tmp2,$dst\n\t"
>>> 8695            "pand $dst,$tmp1\n\t"
>>> 8696            "packuswb $dst,$tmp2\n\t! packed16B shift" %}
>>> 8697   ins_encode %{
>>> 8698     emit_vshift16B_code(_masm, this->as_Mach()->ideal_Opcode() , $dst$$XMMRegister, $src$$XMMRegister, $shift$$XMMRegister, $tmp1$$XMMRegister, $tmp2$$XMMRegister, $scratch$$Register);
>>> 8699   %}
>>> 8700   ins_pipe( pipe_slow );
>>> 8701 %}
>>>
>>> can be turned into something like:
>>>
>>> instruct vshift16B(vecX dst, vecX src, vecS shift, vecX tmp1, vecX tmp2, rRegI scratch) %{
>>>   predicate(n->as_Vector()->length() == 16);
>>>   match(Set dst (LShiftVB src shift));
>>>   match(Set dst (RShiftVB src shift));
>>>   match(Set dst (URShiftVB src shift));
>>>   effect(TEMP dst, TEMP tmp1, TEMP tmp2, TEMP scratch);
>>>   format %{"packed16B shift" %}
>>>   ins_encode %{
>>>     int vlen = 0; // 128-bit
>>>     BasicType elem_type = T_BYTE;
>>>     int shift_mode = ...; // L/R/UR or S/U + L/R
>>>     __ vshift(vlen, elem_type, shift_mode,
>>>               $dst$$..., $src$$..., $shift$$...,
>>>               $tmp1$$..., $tmp2$$..., $scratch$$...);
>>>   %}
>>>
>>> Then MA::vshift can dispatch between different implementations depending on SSE/AVX level available. Do you see any problems with that from footprint perspective?
>>>
>>> Ideally, I'd prefer to see a library of operations on vectors encapsulated in MacroAssembler (or a subclass) and used in x86.ad. That will accommodate further reductions in AD instructions needed.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>> With this webrev the ad file has only about 60 lines effectively added.
>>>> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%.
>>>> I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules.
>>>> What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself.
>>>> The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE.
>>>>
>>>> In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool.
>>>> Also taken into account reducing the temporary by using TEMP dst for multiply rules.
>>>>
>>>> The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL.
>>>>
>>>> Your review and feedback is welcome.
>>>>
>>>> Best Regards,
>>>> Sandhya
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
>>>> Sent: Wednesday, April 10, 2019 10:22 AM
>>>> To: Vladimir Kozlov ; B. Blaser
>>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86
>>>>
>>>> Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used:
>>>>
>>>>   __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len);
>>>>   __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, vector_len);
>>>>
>>>> Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required.
>>>>
>>>> Best Regards,
>>>> Sandhya
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Wednesday, April 10, 2019 9:59 AM
>>>> To: Viswanathan, Sandhya ; B. Blaser
>>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>>>>
>>>> On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote:
>>>>> Hi Bernard,
>>>>>
>>>>> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src.
>>>>
>>>> Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead $tmp2 to avoid overwriting $src2 before we get value from it if $dst = $src2.
>>>>
>>>> On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result.
>>>>
>>>> It is a little mess which may cause ineffective use of registers in compiled code.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Best Regards,
>>>>> Sandhya
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: B. Blaser [mailto:bsrbnd at gmail.com]
>>>>> Sent: Wednesday, April 10, 2019 4:10 AM
>>>>> To: Viswanathan, Sandhya
>>>>> Cc: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>>>>>
>>>>> Hi Sandhya and Vladimir K.,
>>>>>
>>>>> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote:
>>>>>>
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Yes, I missed the question below:
>>>>>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there?
>>>>>>
>>>>>> No it is not intentional, we can use the dst register in those cases and reduced the tmps.
>>>>>
>>>>> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg():
>>>>>
>>>>> 7349 format %{"pmovsxbw $tmp,$src1\n\t"
>>>>> 7350 "pmovsxbw $tmp2,$src2\n\t"
>>>>>
>>>>> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think?
>>>>>
>>>>> Thanks,
>>>>> Bernard
>>>>>

From vladimir.x.ivanov at oracle.com Mon May 6 19:19:02 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 6 May 2019 12:19:02 -0700
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <8510740c-ac56-f8d2-3c5e-451dfa6948a0@loongson.cn>
References: