From jiefu at tencent.com Sat Aug 1 10:09:54 2020
From: jiefu at tencent.com (jiefu(傅杰))
Date: Sat, 1 Aug 2020 10:09:54 +0000
Subject: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail)
In-Reply-To: <40d947f8-ebdb-0850-274b-583be9a37aa3@oracle.com>
References: <11584C93-EDD5-42A9-A2CD-0738970F3181@tencent.com>,
 <40d947f8-ebdb-0850-274b-583be9a37aa3@oracle.com>
Message-ID: <97d105b27624408e89666fe7ebdb4d74@tencent.com>

Thanks Vladimir and Tobias for your review.

Pushed.

Best regards,
Jie
________________________________
From: Vladimir Kozlov
Sent: Saturday, August 1, 2020 7:54 AM
To: jiefu(傅杰)
Cc: hotspot compiler
Subject: Re: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail)

Yes, it is good.

Thanks,
Vladimir

On 7/31/20 4:43 PM, jiefu(傅杰) wrote:
> Hi Vladimir K,
>
> The latest version for the test case is here: http://cr.openjdk.java.net/~jiefu/8250825/webrev.02/
> Compared with webrev.01, the changes are:
> - Rename the test to TestMisalignedUnsafeAccess.java
> - Add @summary tag
> - Remove Xbatch
> - Remove initUnsafe
>
> Are you still OK with it?
>
> Thanks.
> Best regards,
> Jie
>
> On 2020/8/1, 12:46 AM, "Vladimir Kozlov" wrote:
>
> > Good.
> >
> > thanks,
> > Vladimir K
> >
> > On 7/30/20 10:06 PM, jiefu(傅杰) wrote:
> > > Hi Vladimir K,
> > >
> > > Thanks for your review.
> > >
> > > The test had been extended here:
> > > - http://cr.openjdk.java.net/~jiefu/8250825/webrev.01/
> > >
> > > Before the patch:
> > >   The unsafe access (put/get) to a static field will crash.
> > >   The unsafe access (put/get) to an instance field is fine.
> > >
> > > After the patch:
> > >   All is OK.
> > >
> > > Thanks a lot.
> > > Best regards,
> > > Jie
> > >
> > > On 2020/7/31, 2:24 AM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:
> > >
> > >   Hi Jie
> > >
> > >   Nodes generated by make_unsafe_address() are correct. The issue is that the Unsafe API
> > >   allows generating an offset that is unaligned (to fields) with an arbitrary type. As a
> > >   result the C2 type system can't find the corresponding field.
> > >
> > >   Did you try to do an unaligned unsafe access to instance fields?
> > >   Also try to set a value via Unsafe (Store node). There is code in C2 which checks for
> > >   narrow stores. It would be interesting how it behaves in the unsafe case.
> > >
> > >   Please, extend your test.
> > >
> > >   Otherwise the fix is good.
> > >
> > >   Thanks,
> > >   Vladimir K
> > >
> > >   On 7/30/20 6:09 AM, jiefu(傅杰) wrote:
> > >   > Hi all,
> > >   >
> > >   > JBS: https://bugs.openjdk.java.net/browse/JDK-8250825
> > >   > Webrev: http://cr.openjdk.java.net/~jiefu/8250825/webrev.00/
> > >   >
> > >   > When C2 tries to inline an unsafe-access method, it may generate the following pattern in make_unsafe_address:
> > >   >       ConP  ConL
> > >   >          \   |
> > >   >           \  |
> > >   >           AddP
> > >   > The current implementation of TypeOopPtr::TypeOopPtr(...) fails to recognize it as an unsafe operation, which leads to the crash.
> > >   >
> > >   > Testing:
> > >   >   - tier1-3 on Linux/x64
> > >   >
> > >   > Could you please review it and give me some advice?
> > >   >
> > >   > Thanks a lot.
> > >   > Best regards,
> > >   > Jie
> > >   >
> > >

From jatin.bhateja at intel.com Sun Aug 2 18:25:12 2020
From: jatin.bhateja at intel.com (Bhateja, Jatin)
Date: Sun, 2 Aug 2020 18:25:12 +0000
Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
In-Reply-To: <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com>
References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com>
 <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com>
 <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com>
Message-ID: 

Hi Vladimir,

Final patch is placed at the following link.
http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/

One more reviewer approval needed.

Best Regards,
Jatin

> -----Original Message-----
> From: Vladimir Ivanov
> Sent: Saturday, August 1, 2020 4:49 AM
> To: Bhateja, Jatin
> Cc: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
>
> > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/
>
> Looks good.
>
> Tier5 (where I saw the crashes) passed.
>
> Please, incorporate the following minor cleanups in the final version:
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanup/
>
> (Tested with hs-tier1,hs-tier2.)
>
> Best regards,
> Vladimir Ivanov
>
> >> -----Original Message-----
> >> From: Vladimir Ivanov
> >> Sent: Thursday, July 30, 2020 3:30 AM
> >> To: Bhateja, Jatin
> >> Cc: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
> >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
> >>
> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/
> >>>
> >>> Looks good. (Testing is in progress.)
> >>
> >> FYI test results are clean (tier1-tier5).
> >>
> >>>> I have removed the RotateLeftNode/RotateRightNode::Ideal routines since
> >>>> we are doing constant folding in the LShiftI/URShiftI value routines anyway.
> >>>> Since the Java rotate APIs are no longer intrinsified, these routines may
> >>>> no longer be useful.
> >>>
> >>> Nice observation! Good.
> >>
> >> As a second thought, it seems there's still a chance left that Rotate
> >> nodes get their input type narrowed after the folding happened. For
> >> example, as a result of incremental inlining or CFG transformations
> >> during loop optimizations. And it does happen in practice since the
> >> testing revealed some crashes due to the bug in RotateLeftNode/RotateRightNode::Ideal().
> >>
> >> So, it makes sense to keep the transformations. But I'm fine with
> >> addressing that as a followup enhancement.
> >>
> >> Best regards,
> >> Vladimir Ivanov
> >>
> >>>
> >>>>> It would be really nice to migrate to MacroAssembler along the way
> >>>>> (as a cleanup).
> >>>>
> >>>> I guess you are saying remove opcodes/encodings from the patterns and
> >>>> move them to the Assembler. Can we take this cleanup activity
> >>>> separately, since other patterns are also using these matcher directives.
> >>>
> >>> I'm perfectly fine with handling it as a separate enhancement.
> >>>
> >>>> Other synthetic comments have been taken care of. I have extended
> >>>> the test to cover all the newly added scalar transforms. Kindly let
> >>>> me know if there are other comments.
> >>>
> >>> Nice!
> >>>
> >>> Best regards,
> >>> Vladimir Ivanov
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Vladimir Ivanov
> >>>>> Sent: Friday, July 24, 2020 3:21 AM
> >>>>> To: Bhateja, Jatin
> >>>>> Cc: Viswanathan, Sandhya ; Andrew Haley ; hotspot-compiler-dev at openjdk.java.net
> >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
> >>>>>
> >>>>> Hi Jatin,
> >>>>>
> >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/
> >>>>>
> >>>>> Much better! Thanks.
> >>>>>
> >>>>>> Change Summary:
> >>>>>>
> >>>>>> 1) Unified the handling for scalar rotate operations. All scalar rotate
> >>>>> selection patterns are now dependent on newly created
> >>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing.
> >>>>> Currently > >>>>> if DAG nodes corresponding to a sub-pattern are shared (have > >>>>> multiple > >>>>> users) then existing complex patterns based on Or/LShiftL/URShift > >>>>> does not get matched and this prevents inferring rotate nodes. > >>>>> Please refer to JIT'ed assembly output with baseline[1] and with > >>>>> patch[2] . We can see that generated code size also went done from > >>>>> 832 byte to 768 bytes. Also this can cause perf degradation if > >>>>> shift-or dependency chain appears inside a hot region. > >>>>>> > >>>>>> 2) Due to enhanced rotate inferencing new patch shows better > >>>>>> performance > >>>>> even for legacy targets (non AVX-512). Please refer to the perf > >>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. > >>>>> > >>>>> Very nice! > >>>>>> 3) As suggested, removed Java API intrinsification changes and > >>>>>> scalar > >>>>> rotate transformation are done during OrI/OrL node idealizations. > >>>>> > >>>>> Good. > >>>>> > >>>>> (Still would be nice to factor the matching code from Ideal() and > >>>>> share it between multiple use sites. Especially considering > >>>>> OrVNode::Ideal() now does basically the same thing. As an > >>>>> example/idea, take a look at > >>>>> is_bmi_pattern() in x86.ad.) > >>>>> > >>>>>> 4) SLP always gets to work on new scalar Rotate nodes and creates > >>>>>> vector > >>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes > >>>>> if target does not supports vector rotates(non-AVX512). > >>>>> > >>>>> Good. > >>>>> > >>>>>> 5) Added new instruction patterns for vector shift Left/Right > >>>>>> operations > >>>>> with constant shift operands. This prevents emitting extra moves > >>>>> to > >> XMM. > >>>>> > >>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ > >>>>> +? match(Set dst (LShiftVI src shift)); > >>>>> > >>>>> I'd prefer to see a uniform Ideal IR shape being used irrespective > >>>>> of whether the argument is a constant or not. It should also > >>>>> simplify the logic in SuperWord and make it easier to support on > >>>>> non-x86 architectures. > >>>>> > >>>>> For example, here's how it is done on AArch64: > >>>>> > >>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ > >>>>> ??? predicate(n->as_Vector()->length() == 4); > >>>>> ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... > >>>>> > >>>>>> 6) Constant folding scenarios are covered in > >>>>>> RotateLeft/RotateRight > >>>>> idealization, inferencing of vector rotate through OrV > >>>>> idealization covers the vector patterns generated though non SLP > route i.e. > >>>>> VectorAPI. > >>>>> > >>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the > >>>>> general direction here - duplication of scalar transformations to > >>>>> lane-wise vector operations. It definitely won't scale and in a > >>>>> longer run it risks to diverge. Would be nice to find a way to > >>>>> automatically "lift" > >>>>> scalar transformations to vectors and apply them uniformly. But > >>>>> right now it is just an idea which requires more experimentation. > >>>>> > >>>>> > >>>>> Some other minor comments/suggestions: > >>>>> > >>>>> +? // Swap the computed left and right shift counts. > >>>>> +? if (is_rotate_left) { > >>>>> +??? Node* temp = shiftRCnt; > >>>>> +??? shiftRCnt? = shiftLCnt; > >>>>> +??? shiftLCnt? = temp; > >>>>> +? } > >>>>> > >>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? > >>>>> > >>>>> > >>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) > >>>>> +??? 
return true; > >>>>> > >>>>> Please, don't omit curly braces (even for simple cases). > >>>>> > >>>>> > >>>>> -// Rotate Right by variable > >>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 > >>>>> zero, rFlagsReg cr) > >>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) > >>>>> ?? %{ > >>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero > >>>>> shift)))); > >>>>> - > >>>>> +? predicate(!VM_Version::supports_bmi2() && > >>>>> n->bottom_type()->basic_type() == T_INT); > >>>>> +? match(Set dst (RotateRight dst shift)); > >>>>> +? format %{ "rorl???? $dst, $shift" %} > >>>>> ???? expand %{ > >>>>> -??? rorI_rReg_CL(dst, shift, cr); > >>>>> +??? rorI_rReg_imm8(dst, shift, cr); > >>>>> ???? %} > >>>>> > >>>>> It would be really nice to migrate to MacroAssembler along the way > >>>>> (as a cleanup). > >>>>> > >>>>>> Please push the patch through your testing framework and let me > >>>>>> know your > >>>>> review feedback. > >>>>> > >>>>> There's one new assertion failure: > >>>>> > >>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), > >>>>> pid=5476, tid=6219 > >>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize > >>>>> should return new nodes, use Identity to return old nodes > >>>>> > >>>>> I believe it comes from > >>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal > >>>>> which can return pre-contructed constants. I suggest to get rid of > >>>>> Ideal() methods and move constant folding logic into Node::Value() > >>>>> (as implemented for other bitwise/arithmethic nodes in > >>>>> addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more generic > >>>>> approach since it enables richer type information (ranges vs > >>>>> constants) and IMO it's more convenient to work with constants > >>>>> through Types than ConNodes. > >>>>> > >>>>> (I suspect that original/expanded IR shape may already provide > >>>>> more precise type info for non-constant case which can affect the > >>>>> benchmarks.) > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>>> > >>>>>> Best Regards, > >>>>>> Jatin > >>>>>> > >>>>>> [1] > >>>>>> > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. > >>>>>> txt [2] > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx > >>>>>> 2_ > >>>>>> asm > >>>>>> .txt [3] > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new > >>>>>> _p > >>>>>> atc > >>>>>> h.txt > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Vladimir Ivanov > >>>>>>> Sent: Saturday, July 18, 2020 12:25 AM > >>>>>>> To: Bhateja, Jatin ; Andrew Haley > >>>>>>> > >>>>>>> Cc: Viswanathan, Sandhya ; > >>>>>>> hotspot-compiler- dev at openjdk.java.net > >>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification > >>>>>>> for > >>>>>>> X86 > >>>>>>> > >>>>>>> Hi Jatin, > >>>>>>> > >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ > >>>>>>> > >>>>>>> It definitely looks better, but IMO it hasn't reached the sweet > >>>>>>> spot > >>>>> yet. > >>>>>>> It feels like the focus is on auto-vectorizer while the burden > >>>>>>> is put on scalar cases. > >>>>>>> > >>>>>>> First of all, considering GVN folds relevant operation patterns > >>>>>>> into a single Rotate node now, what's the motivation to > >>>>>>> introduce intrinsics? > >>>>>>> > >>>>>>> Another point is there's still significant duplication for > >>>>>>> scalar cases. 
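For readers skimming the archive: the scalar pattern this review keeps referring to is, at the Java level, just the shift-or rotate idiom. A minimal sketch follows (illustrative only, not code from the webrev; the class and method names are made up):

    class RotateIdiom {
        // Shift-or rotate idiom: with the patch under review, C2 folds the
        // corresponding OrI/LShiftI/URShiftI graph shape into a single
        // RotateLeft node during idealization, so a single rotate
        // instruction can be emitted on x86.
        static int rotateLeftByHand(int x, int s) {
            return (x << s) | (x >>> (32 - s)); // same result as Integer.rotateLeft(x, s)
        }
    }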
> >>>>>>> > >>>>>>> I'd prefer to see the legacy cases which rely on pattern > >>>>>>> matching to go away and be substituted with instructions which > >>>>>>> match Rotate instructions (migrating ). > >>>>>>> > >>>>>>> I understand that it will penalize the vectorization > >>>>>>> implementation, but IMO reducing overall complexity is worth it. > >>>>>>> On auto-vectorizer side, I see > >>>>>>> 2 ways to fix it: > >>>>>>> > >>>>>>> ???? (1) introduce additional AD instructions for > >>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; > >>>>>>> > >>>>>>> ???? (2) in SuperWord::output(), when matcher doesn't support > >>>>>>> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), > >>>>>>> generate vectorized version of the original pattern. > >>>>>>> > >>>>>>> Overall, it looks like more and more focus is made on scalar part. > >>>>>>> Considering the main goal of the patch is to enable > >>>>>>> vectorization, I'm fine with separating cleanup of scalar part. > >>>>>>> As an interim solution, it seems that leaving the scalar part as > >>>>>>> it is now and matching scalar bit rotate pattern in > >>>>>>> VectorNode::is_rotate() should be enough to keep the > >>>>>>> vectorization part functioning. Then scalar Rotate nodes and > relevant cleanups can be integrated later. > >>>>>>> (Or vice > >>>>>>> versa: clean up scalar part first and then follow up with > >>>>>>> vectorization.) > >>>>>>> > >>>>>>> Some other comments: > >>>>>>> > >>>>>>> * There's a lot of duplication between OrINode::Ideal and > >>>>> OrLNode::Ideal. > >>>>>>> What do you think about introducing a super type > >>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? > >>>>>>> > >>>>>>> > >>>>>>> * src/hotspot/cpu/x86/x86.ad > >>>>>>> > >>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ > >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== > >>>>>>> T_INT > >>>>> || > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== T_LONG); > >>>>>>> > >>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ > >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== > >>>>>>> T_INT > >>>>> || > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== T_LONG); > >>>>>>> > >>>>>>> The predicates are redundant here. > >>>>>>> > >>>>>>> > >>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > >>>>>>> > >>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType > >>>>>>> +etype, > >>>>>>> XMMRegister dst, XMMRegister src, > >>>>>>> +???????????????????????????????????? int shift, int vector_len) > >>>>>>> +{ if (opcode == Op_RotateLeftV) { > >>>>>>> +??? if (etype == T_INT) { > >>>>>>> +????? evprold(dst, src, shift, vector_len); > >>>>>>> +??? } else { > >>>>>>> +????? evprolq(dst, src, shift, vector_len); > >>>>>>> +??? } > >>>>>>> > >>>>>>> Please, put an assert for the false case (assert(etype == > >>>>>>> T_LONG, > >>>>> "...")). > >>>>>>> > >>>>>>> > >>>>>>> * On testing (with previous version of the patch): -XX:UseAVX is > >>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 > >> platforms. > >>>>>>> Either omitting the flag or adding > >>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. 
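As a concrete illustration of that last testing note (a hypothetical jtreg test header, not taken from the patch): with -XX:+IgnoreUnrecognizedVMOptions in front, the x86-only -XX:UseAVX flag is ignored on other architectures instead of making the run fail.

    /*
     * @test
     * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3
     *                   compiler.intrinsics.SomeRotateTest
     */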
> >>>>>>> > >>>>>>> Best regards, > >>>>>>> Vladimir Ivanov > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Summary of changes: > >>>>>>>> 1) Optimization is specifically targeted to exploit vector > >>>>>>>> rotation > >>>>>>> instruction added for X86 AVX512. A single rotate instruction > >>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers better > >>>>>>> latency at reduced instruction count. > >>>>>>>> > >>>>>>>> 2) There were two approaches to implement this: > >>>>>>>> ?????? a)? Let everything remain the same and add new wide > >>>>>>>> complex > >>>>>>> instruction patterns in the matcher for e.g. > >>>>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary > >>>>>>>> ReplicateI > >>>>>>>> shift)) > >>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate > >>>>>>> shift)) > >>>>>>>> ?????? It would have been an overoptimistic assumption to > >>>>>>>> expect that graph > >>>>>>> shape would be preserved till the matcher for correct inferencing. > >>>>>>>> ?????? In addition we would have required multiple such bulky > >>>>>>>> patterns. > >>>>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, > >>>>>>>> these gets > >>>>>>> generated during intrinsification as well as during additional > >>>>>>> pattern > >>>>>>>> ?????? matching during node Idealization, later on these nodes > >>>>>>>> are consumed > >>>>>>> by SLP for valid vectorization scenarios to emit their vector > >>>>>>>> ?????? counterparts which eventually emits vector rotates. > >>>>>>>> > >>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here > >>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate nodes > >>>>>>>> should either be > >>>>>>> dismantled back to OR/SHIFT pattern or we penalize the > >>>>>>> vectorization which would be very costly, other option would > >>>>>>> have been to add additional vector rotate pattern for UseAVX=3 > >>>>>>> in the matcher which emit vector OR-SHIFTs instruction but then > >>>>>>> it will loose on emitting efficient instruction sequence which > >>>>>>> node sharing > >>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus it > >>>>>>> will not be beneficial for non-AVX512 targets, only saving will > >>>>>>> be in terms of cleanup of few existing scalar rotate matcher > >>>>>>> patterns, also old targets does not offer this powerful rotate > >> instruction. > >>>>>>> Therefore new scalar nodes are created only for AVX512 targets. > >>>>>>>> > >>>>>>>> As per suggestions constant folding scenarios have been covered > >>>>>>>> during > >>>>>>> Idealizations of newly added scalar nodes. > >>>>>>>> > >>>>>>>> Please review the latest version and share your feedback and > >>>>>>>> test > >>>>>>> results. > >>>>>>>> > >>>>>>>> Best Regards, > >>>>>>>> Jatin > >>>>>>>> > >>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: Andrew Haley > >>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM > >>>>>>>>> To: Vladimir Ivanov ; Bhateja, > >>>>>>>>> Jatin ; > >>>>>>>>> hotspot-compiler-dev at openjdk.java.net > >>>>>>>>> Cc: Viswanathan, Sandhya > >>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API > >>>>>>>>> intrinsification for > >>>>>>>>> X86 > >>>>>>>>> > >>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: > >>>>>>>>> > >>>>>>>>> ??? > High-level comment: so far, there were no pressing need > >>>>>>>>> in > >>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL > >>>>>>>>> instructions > >>>>>>>>>> were selected during matching [1]. 
Now the patch introduces > >>>>>>>>>> > > >>>>>>>>> dedicated nodes > >>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > which > >>>>>>>>> partly duplicates existing logic. > >>>>>>>>> > >>>>>>>>> The lack of rotate nodes in the IR has always meant that > >>>>>>>>> AArch64 doesn't generate optimal code for e.g. > >>>>>>>>> > >>>>>>>>> ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) > >>>>>>>>> > >>>>>>>>> because, with the RotateLeft expanded to its full combination > >>>>>>>>> of ORs and shifts, it's to complicated to match. At the time I > >>>>>>>>> put this to one side because it wasn't urgent. This is a shame > >>>>>>>>> because although such combinations are unusual they are used > >>>>>>>>> in some crypto > >>>>> operations. > >>>>>>>>> > >>>>>>>>> If we can generate immediate-form rotate nodes early by > >>>>>>>>> pattern matching during parsing (rather than depending on > >>>>>>>>> intrinsics) we'll get more value than by depending on > >>>>>>>>> programmers calling > >> intrinsics. > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Andrew Haley? (he/him) > >>>>>>>>> Java Platform Lead Engineer > >>>>>>>>> Red Hat UK Ltd. > >>>>>>>>> https://keybase.io/andrewhaley > >>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > >>>>>>>> From boris.ulasevich at bell-sw.com Sun Aug 2 20:54:47 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sun, 2 Aug 2020 23:54:47 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms Message-ID: Hi all, Please review a simple change to C2 to fix a regression: AbsI/AbsL nodes are used without checking that the platform supports them (for now it is the issue for ARM32 and 32-bit x86 platforms). http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 http://bugs.openjdk.java.net/browse/JDK-8248445 thanks, Boris From vladimir.x.ivanov at oracle.com Mon Aug 3 10:37:28 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 3 Aug 2020 13:37:28 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: <89e9ab7a-5f42-075a-e770-2fb943da897a@oracle.com> > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 Looks good. Best regards, Vladimir Ivanov From luhenry at microsoft.com Mon Aug 3 14:39:20 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 14:39:20 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: , Message-ID: Hi, A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ Thank you Ludovic From evgeny.nikitin at oracle.com Mon Aug 3 15:22:40 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 3 Aug 2020 17:22:40 +0200 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> Message-ID: Hi Igor, thanks for review. > - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? Remnants from the previous developer and their decision :). I personally don't like inner classes and inner helper methods alike, so now I've extracted that into MethodHelper.java. 
The fact that the methods are used in another test strengthens this decision for me. > - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. I choose @requires. Descriptions in most cases are better then in-code logic. > - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? One of the reasons for the case was uncontrollable switch to another layer in background. I found that switch valuable to make the test behavior predictable. The new webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.01/ Please review, // Evgeny Nikitin. On 2020-07-31 19:11, Igor Ignatyev wrote: > Hi Evgeny, > > in general looks good to me, a couple comments/questions though: > - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? > - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. > - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? > > Thanks, > -- Igor > >> On Jul 27, 2020, at 12:38 PM, Evgeny Nikitin wrote: >> >> Hi, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 >> Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ >> >> Adjusting the test to current state of the VM. >> >> - Definition of 'trivial code' does not depend on whether the method has been profiled or not; >> - Trivial code does only go level 0 to level 1; >> - Some refactoring. >> >> The change has been checked in mach5 for the 5 platforms (passed). >> >> Please review, >> /Evgeny Nikitin. > From viv.desh at gmail.com Mon Aug 3 15:41:41 2020 From: viv.desh at gmail.com (Vivek Deshpande) Date: Mon, 3 Aug 2020 08:41:41 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: Message-ID: Hi ludovic Thanks for the change. It looks good to me. The approach also looks good to me. Thank you. Regards, Vivek On Mon, Aug 3, 2020 at 7:39 AM Ludovic Henry wrote: > Hi, > > A quick follow up on that change. Are you happy with the general approach, > or would rather have it done differently? 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ > > Thank you > Ludovic > -- Thanks and Regards, Vivek Deshpande viv.desh at gmail.com From hohensee at amazon.com Mon Aug 3 17:06:35 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Mon, 3 Aug 2020 17:06:35 +0000 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms Message-ID: <37381D95-CBD8-4EC0-9824-9B8AA2D140FB@amazon.com> +1. Paul ?On 8/3/20, 3:35 AM, "hotspot-compiler-dev on behalf of Vladimir Ivanov" wrote: > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 Looks good. Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Mon Aug 3 17:10:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 10:10:40 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: Message-ID: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> Hi Ludovic This is very professional work! CCing to Core-libs because you modified Java code and need review from Java library group. Few notes: Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': * This code is free software; you can rrdistribute it and/or modify it Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header Ludovic, it looks like you used only general instructions to implement this code. Can you add comment where the algorithm come from? Or it is just direct translation of Java code? Vivek, do we have SSE/AVX instructions which may improve performance of this code? It could be follow up update if we can. Did you test it on 32-bit x86? Would be interesting to see result of artificially switching off AVX and SSE: '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. Thanks, Vladimir On 8/3/20 7:39 AM, Ludovic Henry wrote: > Hi, > > A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ > > Thank you > Ludovic > From vladimir.kozlov at oracle.com Mon Aug 3 17:25:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 10:25:34 -0700 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: Hi Boris, The current code is hard to read. Can you rearrange it to have clear code flow (and correct spaces for if ())? Including F and D checks. To something like: if (tzero == TypeF::ZERO) { if (sub->Opcode() == Op_SubF && sub->in(2) == x && phase->type(sub->in(1)) == tzero)) { x = new AbsFNode(x); if (flip) { x = new SubFNode(sub->in(1), phase->transform(x)); } } } else if Thanks, Vladimir On 8/2/20 1:54 PM, Boris Ulasevich wrote: > Hi all, > > Please review a simple change to C2 to fix a regression: AbsI/AbsL > nodes are used without checking that the platform supports them > (for now it is the issue for ARM32 and 32-bit x86 platforms). 
> > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 > http://bugs.openjdk.java.net/browse/JDK-8248445 > > thanks, > Boris > From anthony.scarpino at oracle.com Mon Aug 3 17:31:38 2020 From: anthony.scarpino at oracle.com (Anthony Scarpino) Date: Mon, 3 Aug 2020 10:31:38 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> Message-ID: <724174CC-78F4-453C-9420-DC30B8E44664@oracle.com> I had looked at the java code changes and are fine with them Tony > On Aug 3, 2020, at 10:10 AM, Vladimir Kozlov wrote: > > ?Hi Ludovic > > This is very professional work! > > CCing to Core-libs because you modified Java code and need review from Java library group. > > Few notes: > > Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. > > In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': > > * This code is free software; you can rrdistribute it and/or modify it > > Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header > > Ludovic, it looks like you used only general instructions to implement this code. Can you add comment where the algorithm come from? Or it is just direct translation of Java code? > > Vivek, do we have SSE/AVX instructions which may improve performance of this code? It could be follow up update if we can. > > Did you test it on 32-bit x86? Would be interesting to see result of artificially switching off AVX and SSE: '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. > > Thanks, > Vladimir > >> On 8/3/20 7:39 AM, Ludovic Henry wrote: >> Hi, >> A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 >> Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ >> Thank you >> Ludovic From vladimir.kozlov at oracle.com Mon Aug 3 17:34:28 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 10:34:28 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> Message-ID: <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> And I got crash during JDK build on linux-x64: # Internal Error (src/hotspot/share/opto/library_call.cpp:5732), pid=18904, tid=19012 # assert(field != __null) failed: undefined field # # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16-internal+0-2020-08-03-1651458.vladimir.kozlov.jdkjdk, mixed mode, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 Current CompileTask: C2: 6204 1305 4 sun.security.provider.DigestBase::engineUpdate (189 bytes) Stack: [0x0000151bfcfc7000,0x0000151bfd0c8000], sp=0x0000151bfd0c3ed0, free space=1011k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 V [libjvm.so+0x11167ea] LibraryCallKit::get_long_state_from_digestBase_object(Node*)+0x2a V [libjvm.so+0x1116f2d] LibraryCallKit::inline_digestBase_implCompressMB(Node*, ciInstanceKlass*, bool, unsigned char*, char const*, Node*, Node*, Node*)+0x2cd V [libjvm.so+0x1117467] LibraryCallKit::inline_digestBase_implCompressMB(int)+0x397 V [libjvm.so+0x1121de1] LibraryIntrinsic::generate(JVMState*)+0x211 V [libjvm.so+0x75d61d] PredicatedIntrinsicGenerator::generate(JVMState*)+0xb8d Vladimir K On 8/3/20 10:10 AM, Vladimir Kozlov wrote: > Hi Ludovic > > This is very professional work! > > CCing to Core-libs because you modified Java code and need review from Java library group. > > Few notes: > > Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. > > In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': > > ?* This code is free software; you can rrdistribute it and/or modify it > > Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header > > Ludovic, it looks like you used only general instructions to implement this code. Can you add comment where the > algorithm come from? Or it is just direct translation of Java code? > > Vivek, do we have SSE/AVX instructions which may improve performance of this code? It could be follow up update if we can. > > Did you test it on 32-bit x86? Would be interesting to see result of artificially switching off AVX and SSE: > '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. > > Thanks, > Vladimir > > On 8/3/20 7:39 AM, Ludovic Henry wrote: >> Hi, >> >> A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? 
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 >> Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ >> >> Thank you >> Ludovic >> From igor.ignatyev at oracle.com Mon Aug 3 18:11:35 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 3 Aug 2020 11:11:35 -0700 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> Message-ID: <5F33FE0C-922F-435E-AD25-2A3445A51996@oracle.com> Hi Evgeny, webrev.01 looks good to me, thanks. -- Igor > On Aug 3, 2020, at 8:22 AM, Evgeny Nikitin wrote: > > Hi Igor, thanks for review. > > > - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? > > Remnants from the previous developer and their decision :). I personally don't like inner classes and inner helper methods alike, so now I've extracted that into MethodHelper.java. The fact that the methods are used in another test strengthens this decision for me. > > > - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. > > I choose @requires. Descriptions in most cases are better then in-code logic. > > > - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? > > One of the reasons for the case was uncontrollable switch to another layer in background. I found that switch valuable to make the test behavior predictable. > > The new webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.01/ > > Please review, > // Evgeny Nikitin. > > > > On 2020-07-31 19:11, Igor Ignatyev wrote: >> Hi Evgeny, >> in general looks good to me, a couple comments/questions though: >> - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? >> - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. >> - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? 
>> Thanks, >> -- Igor >>> On Jul 27, 2020, at 12:38 PM, Evgeny Nikitin wrote: >>> >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 >>> Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ >>> >>> Adjusting the test to current state of the VM. >>> >>> - Definition of 'trivial code' does not depend on whether the method has been profiled or not; >>> - Trivial code does only go level 0 to level 1; >>> - Some refactoring. >>> >>> The change has been checked in mach5 for the 5 platforms (passed). >>> >>> Please review, >>> /Evgeny Nikitin. From luhenry at microsoft.com Mon Aug 3 18:12:32 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 18:12:32 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: > And I got crash during JDK build on linux-x64: > > # Internal Error (src/hotspot/share/opto/library_call.cpp:5732), pid=18904, tid=19012 > # assert(field != __null) failed: undefined field > # > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16-internal+0-2020-08-03-1651458.vladimir.kozlov.jdkjdk, mixed mode, > tiered, compressed oops, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 > > Current CompileTask: > C2: 6204 1305 4 sun.security.provider.DigestBase::engineUpdate (189 bytes) > > Stack: [0x0000151bfcfc7000,0x0000151bfd0c8000], sp=0x0000151bfd0c3ed0, free space=1011k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 > V [libjvm.so+0x11167ea] LibraryCallKit::get_long_state_from_digestBase_object(Node*)+0x2a > V [libjvm.so+0x1116f2d] LibraryCallKit::inline_digestBase_implCompressMB(Node*, ciInstanceKlass*, bool, unsigned char*, char const*, Node*, Node*, Node*)+0x2cd > V [libjvm.so+0x1117467] LibraryCallKit::inline_digestBase_implCompressMB(int)+0x397 > V [libjvm.so+0x1121de1] LibraryIntrinsic::generate(JVMState*)+0x211 > V [libjvm.so+0x75d61d] PredicatedIntrinsicGenerator::generate(JVMState*)+0xb8d Interesting, I did all my work on Linux-x64 but didn't observe that. Let me try to reproduce and come back to you on that. From vladimir.kozlov at oracle.com Mon Aug 3 18:49:18 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 11:49:18 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hmm, I applied your http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/jdk.changeset But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. I will retest again with removed other changes. 
Vladimir K On 8/3/20 11:12 AM, Ludovic Henry wrote: >> And I got crash during JDK build on linux-x64: >> >> # Internal Error (src/hotspot/share/opto/library_call.cpp:5732), pid=18904, tid=19012 >> # assert(field != __null) failed: undefined field >> # >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16-internal+0-2020-08-03-1651458.vladimir.kozlov.jdkjdk, mixed mode, >> tiered, compressed oops, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 >> >> Current CompileTask: >> C2: 6204 1305 4 sun.security.provider.DigestBase::engineUpdate (189 bytes) >> >> Stack: [0x0000151bfcfc7000,0x0000151bfd0c8000], sp=0x0000151bfd0c3ed0, free space=1011k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 >> V [libjvm.so+0x11167ea] LibraryCallKit::get_long_state_from_digestBase_object(Node*)+0x2a >> V [libjvm.so+0x1116f2d] LibraryCallKit::inline_digestBase_implCompressMB(Node*, ciInstanceKlass*, bool, unsigned char*, char const*, Node*, Node*, Node*)+0x2cd >> V [libjvm.so+0x1117467] LibraryCallKit::inline_digestBase_implCompressMB(int)+0x397 >> V [libjvm.so+0x1121de1] LibraryIntrinsic::generate(JVMState*)+0x211 >> V [libjvm.so+0x75d61d] PredicatedIntrinsicGenerator::generate(JVMState*)+0xb8d > > Interesting, I did all my work on Linux-x64 but didn't observe that. Let me try to reproduce and come back to you on that. > From vladimir.kozlov at oracle.com Mon Aug 3 18:50:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 11:50:43 -0700 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: <5F33FE0C-922F-435E-AD25-2A3445A51996@oracle.com> References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> <5F33FE0C-922F-435E-AD25-2A3445A51996@oracle.com> Message-ID: <47918541-638d-a231-c1ba-67ce512a498d@oracle.com> +1 Thanks, Vladimir K On 8/3/20 11:11 AM, Igor Ignatyev wrote: > Hi Evgeny, > > webrev.01 looks good to me, thanks. > > -- Igor > >> On Aug 3, 2020, at 8:22 AM, Evgeny Nikitin wrote: >> >> Hi Igor, thanks for review. >> >>> - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? >> >> Remnants from the previous developer and their decision :). I personally don't like inner classes and inner helper methods alike, so now I've extracted that into MethodHelper.java. The fact that the methods are used in another test strengthens this decision for me. >> >>> - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. >> >> I choose @requires. 
Descriptions in most cases are better then in-code logic. >> >>> - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? >> >> One of the reasons for the case was uncontrollable switch to another layer in background. I found that switch valuable to make the test behavior predictable. >> >> The new webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.01/ >> >> Please review, >> // Evgeny Nikitin. >> >> >> >> On 2020-07-31 19:11, Igor Ignatyev wrote: >>> Hi Evgeny, >>> in general looks good to me, a couple comments/questions though: >>> - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? >>> - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. >>> - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? >>> Thanks, >>> -- Igor >>>> On Jul 27, 2020, at 12:38 PM, Evgeny Nikitin wrote: >>>> >>>> Hi, >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 >>>> Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ >>>> >>>> Adjusting the test to current state of the VM. >>>> >>>> - Definition of 'trivial code' does not depend on whether the method has been profiled or not; >>>> - Trivial code does only go level 0 to level 1; >>>> - Some refactoring. >>>> >>>> The change has been checked in mach5 for the 5 platforms (passed). >>>> >>>> Please review, >>>> /Evgeny Nikitin. > From luhenry at microsoft.com Mon Aug 3 18:52:34 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 18:52:34 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: > But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. > I will retest again with removed other changes. That looks like a mistake with me learning to use Mercurial, sorry about that. The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. From luhenry at microsoft.com Mon Aug 3 19:00:06 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 19:00:06 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: I've updated [1] with the proper patch. 
[1] http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.01/8250902.patch From vladimir.kozlov at oracle.com Mon Aug 3 19:18:53 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 12:18:53 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: I reproduced crash with only MD5 changes on my local linux machine during fastdebug build. Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): if (long_state) { state = get_state_from_digestBase_object(digestBase_obj); } else { state = get_long_state_from_digestBase_object(digestBase_obj); } Vladimir K On 8/3/20 11:52 AM, Ludovic Henry wrote: >> But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. >> I will retest again with removed other changes. > > That looks like a mistake with me learning to use Mercurial, sorry about that. > > The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. > > From viv.desh at gmail.com Mon Aug 3 22:08:22 2020 From: viv.desh at gmail.com (Vivek Deshpande) Date: Mon, 3 Aug 2020 15:08:22 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hi Vladimir It seems that the algorithm can be optimized further using SSE/AVX instructions. I am not aware of any specific SSE/AVX implementation which leverages those instructions in the best possible way. Sandhya can chime in more on that. As far as I know, I came across this which points to MD5 SSE/AVX implementation. https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html Regards, Vivek On Mon, Aug 3, 2020 at 12:21 PM Vladimir Kozlov wrote: > I reproduced crash with only MD5 changes on my local linux machine during > fastdebug build. > > Next code in inline_digestBase_implCompressMB should be reversed > (get_long_*() should be called for long_state): > > if (long_state) { > state = get_state_from_digestBase_object(digestBase_obj); > } else { > state = get_long_state_from_digestBase_object(digestBase_obj); > } > > Vladimir K > > On 8/3/20 11:52 AM, Ludovic Henry wrote: > >> But it looks like it has more changes (windows_aarch64) then just MD5 > intrinsic. > >> I will retest again with removed other changes. > > > > That looks like a mistake with me learning to use Mercurial, sorry about > that. > > > > The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, > all the others are my mistake. > > > > > -- Thanks and Regards, Vivek Deshpande viv.desh at gmail.com From vladimir.kozlov at oracle.com Mon Aug 3 23:10:21 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 16:10:21 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: <0d97ffec-1e6e-65d3-d1c3-b39f72145c14@oracle.com> Thank you, Vivek, for pointer. This is interesting ,could be somehitng Intel's mlib may have. Vladimir K On 8/3/20 3:08 PM, Vivek Deshpande wrote: > Hi Vladimir > > It seems that the algorithm can be optimized further using SSE/AVX > instructions. 
I am not aware of any specific SSE/AVX implementation which > leverages those instructions in the best possible way. Sandhya can chime > in more on that. > As far as I know, I came across this which points to MD5 SSE/AVX > implementation. > https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html > > Regards, > Vivek > > On Mon, Aug 3, 2020 at 12:21 PM Vladimir Kozlov > wrote: > >> I reproduced crash with only MD5 changes on my local linux machine during >> fastdebug build. >> >> Next code in inline_digestBase_implCompressMB should be reversed >> (get_long_*() should be called for long_state): >> >> if (long_state) { >> state = get_state_from_digestBase_object(digestBase_obj); >> } else { >> state = get_long_state_from_digestBase_object(digestBase_obj); >> } >> >> Vladimir K >> >> On 8/3/20 11:52 AM, Ludovic Henry wrote: >>>> But it looks like it has more changes (windows_aarch64) then just MD5 >> intrinsic. >>>> I will retest again with removed other changes. >>> >>> That looks like a mistake with me learning to use Mercurial, sorry about >> that. >>> >>> The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, >> all the others are my mistake. >>> >>> >> > > From vladimir.kozlov at oracle.com Mon Aug 3 23:58:59 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 16:58:59 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hmm, with that code reversed I now have failure only on Windows: V [jvm.dll+0x43abb7] report_vm_error+0x117 (debug.cpp:264) V [jvm.dll+0x8a222e] LibraryCallKit::load_field_from_object+0x1ae (library_call.cpp:5732) V [jvm.dll+0x88c3ea] LibraryCallKit::get_state_from_digestBase_object+0x3a (library_call.cpp:6614) V [jvm.dll+0x8909d5] LibraryCallKit::inline_digestBase_implCompressMB+0x115 (library_call.cpp:6598) V [jvm.dll+0x8908b1] LibraryCallKit::inline_digestBase_implCompressMB+0x411 (library_call.cpp:6578) V [jvm.dll+0x8a5b2d] LibraryCallKit::try_to_inline+0x184d (library_call.cpp:836) The bug is in the same code as before - typreo due to renaming. So the code should be: if (long_state) { state = get_long_state_from_digestBase_object(obj); } else { state = get_state_from_digestBase_object(obj); } BTW, Ludovic, you need to add next change [1] to Graal's test to avoid its failure. Thanks, Vladimir K [1] src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java @@ -423,6 +423,11 @@ "java/math/BigInteger.shiftRightImplWorker([I[IIII)V"); } + if (isJDK16OrHigher()) { + add(toBeInvestigated, + "sun/security/provider/MD5.implCompress0([BI)V"); + } + if (!config.inlineNotify()) { add(ignore, "java/lang/Object.notify()V"); } @@ -593,6 +598,14 @@ return JavaVersionUtil.JAVA_SPEC >= 14; } + private static boolean isJDK15OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 15; + } + + private static boolean isJDK16OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 16; + } + public interface Refiner { void refine(CheckGraalIntrinsics checker); } On 8/3/20 12:18 PM, Vladimir Kozlov wrote: > I reproduced crash with only MD5 changes on my local linux machine during fastdebug build. > > Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): > > ?? if (long_state) { > ???? 
state = get_state_from_digestBase_object(digestBase_obj); > ?? } else { > ???? state = get_long_state_from_digestBase_object(digestBase_obj); > ?? } > > Vladimir K > > On 8/3/20 11:52 AM, Ludovic Henry wrote: >>> But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. >>> I will retest again with removed other changes. >> >> That looks like a mistake with me learning to use Mercurial, sorry about that. >> >> The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. >> From sandhya.viswanathan at intel.com Mon Aug 3 23:59:59 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 3 Aug 2020 23:59:59 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <0d97ffec-1e6e-65d3-d1c3-b39f72145c14@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <0d97ffec-1e6e-65d3-d1c3-b39f72145c14@oracle.com> Message-ID: The link that Vivek shared is for multi-buffer implementation where multiple MD5 hashes for different buffers is calculated at once using SIMD. What is needed here is the acceleration of single buffer hash. I think that is what Henry's patch is proposing. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Monday, August 03, 2020 4:10 PM To: Vivek Deshpande Cc: Ludovic Henry ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev ; Viswanathan, Sandhya Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64 Thank you, Vivek, for pointer. This is interesting ,could be somehitng Intel's mlib may have. Vladimir K On 8/3/20 3:08 PM, Vivek Deshpande wrote: > Hi Vladimir > > It seems that the algorithm can be optimized further using SSE/AVX > instructions. I am not aware of any specific SSE/AVX implementation > which leverages those instructions in the best possible way. Sandhya > can chime in more on that. > As far as I know, I came across this which points to MD5 SSE/AVX > implementation. > https://software.intel.com/content/www/us/en/develop/articles/intel-is > a-l-cryptographic-hashes-for-cloud-storage.html > > Regards, > Vivek > > On Mon, Aug 3, 2020 at 12:21 PM Vladimir Kozlov > > wrote: > >> I reproduced crash with only MD5 changes on my local linux machine >> during fastdebug build. >> >> Next code in inline_digestBase_implCompressMB should be reversed >> (get_long_*() should be called for long_state): >> >> if (long_state) { >> state = get_state_from_digestBase_object(digestBase_obj); >> } else { >> state = get_long_state_from_digestBase_object(digestBase_obj); >> } >> >> Vladimir K >> >> On 8/3/20 11:52 AM, Ludovic Henry wrote: >>>> But it looks like it has more changes (windows_aarch64) then just >>>> MD5 >> intrinsic. >>>> I will retest again with removed other changes. >>> >>> That looks like a mistake with me learning to use Mercurial, sorry >>> about >> that. >>> >>> The only patch you need is `8250902: Implement MD5 Intrinsics on >>> x86`, >> all the others are my mistake. >>> >>> >> > > From luhenry at microsoft.com Tue Aug 4 00:13:06 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 00:13:06 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: I've fixed it at [1]. I'm sending an update very soon as soon as I have the performance numbers you asked for, and the test suites results on the different platforms of interest. 
[1] http://cr.openjdk.java.net/~luhenry/8250902/webrev.02/ -----Original Message----- From: Vladimir Kozlov Sent: Monday, August 3, 2020 4:59 PM To: Ludovic Henry ; hotspot-compiler-dev at openjdk.java.net; Vivek Deshpande Cc: core-libs-dev Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64 Hmm, with that code reversed I now have failure only on Windows: V [jvm.dll+0x43abb7] report_vm_error+0x117 (debug.cpp:264) V [jvm.dll+0x8a222e] LibraryCallKit::load_field_from_object+0x1ae (library_call.cpp:5732) V [jvm.dll+0x88c3ea] LibraryCallKit::get_state_from_digestBase_object+0x3a (library_call.cpp:6614) V [jvm.dll+0x8909d5] LibraryCallKit::inline_digestBase_implCompressMB+0x115 (library_call.cpp:6598) V [jvm.dll+0x8908b1] LibraryCallKit::inline_digestBase_implCompressMB+0x411 (library_call.cpp:6578) V [jvm.dll+0x8a5b2d] LibraryCallKit::try_to_inline+0x184d (library_call.cpp:836) The bug is in the same code as before - typreo due to renaming. So the code should be: if (long_state) { state = get_long_state_from_digestBase_object(obj); } else { state = get_state_from_digestBase_object(obj); } BTW, Ludovic, you need to add next change [1] to Graal's test to avoid its failure. Thanks, Vladimir K [1] src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java @@ -423,6 +423,11 @@ "java/math/BigInteger.shiftRightImplWorker([I[IIII)V"); } + if (isJDK16OrHigher()) { + add(toBeInvestigated, + "sun/security/provider/MD5.implCompress0([BI)V"); + } + if (!config.inlineNotify()) { add(ignore, "java/lang/Object.notify()V"); } @@ -593,6 +598,14 @@ return JavaVersionUtil.JAVA_SPEC >= 14; } + private static boolean isJDK15OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 15; + } + + private static boolean isJDK16OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 16; + } + public interface Refiner { void refine(CheckGraalIntrinsics checker); } On 8/3/20 12:18 PM, Vladimir Kozlov wrote: > I reproduced crash with only MD5 changes on my local linux machine during fastdebug build. > > Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): > > ?? if (long_state) { > ???? state = get_state_from_digestBase_object(digestBase_obj); > ?? } else { > ???? state = get_long_state_from_digestBase_object(digestBase_obj); > ?? } > > Vladimir K > > On 8/3/20 11:52 AM, Ludovic Henry wrote: >>> But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. >>> I will retest again with removed other changes. >> >> That looks like a mistake with me learning to use Mercurial, sorry about that. >> >> The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. >> From luhenry at microsoft.com Tue Aug 4 04:07:49 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 04:07:49 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Updated webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.02 > Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): > > if (long_state) { > state = get_state_from_digestBase_object(digestBase_obj); > } else { > state = get_long_state_from_digestBase_object(digestBase_obj); > } Thanks for pointing that out. 
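The reason the swapped branch matters: the digests behind DigestBase keep their state in different layouts -- MD5, SHA-1 and SHA-256 use an int[] state array, while SHA-384/512 use a long[] -- so calling the wrong accessor breaks one of the two families once the intrinsic kicks in. A Java-level sanity check that exercises both layouts (a sketch only; the iteration count is just meant to get implCompress hot enough to be compiled) could be:

    import java.security.MessageDigest;
    import java.util.Arrays;

    public class DigestStateSanity {
        static void hashMany(String algo, byte[] input, int iters) throws Exception {
            MessageDigest md = MessageDigest.getInstance(algo);
            byte[] expected = md.digest(input);        // first result, typically interpreted
            for (int i = 0; i < iters; i++) {          // keep going until implCompress is compiled
                byte[] again = md.digest(input);
                if (!Arrays.equals(expected, again)) {
                    throw new AssertionError(algo + " result changed after " + i + " iterations");
                }
            }
        }

        public static void main(String[] args) throws Exception {
            byte[] input = new byte[1024];
            for (int i = 0; i < input.length; i++) {
                input[i] = (byte) i;
            }
            hashMany("MD5", input, 100_000);      // int[]  state -> get_state_from_digestBase_object
            hashMany("SHA-512", input, 100_000);  // long[] state -> get_long_state_from_digestBase_object
            System.out.println("MD5 and SHA-512 results stayed stable");
        }
    }

A divergence between the compiled and interpreted results for either algorithm shows up as the AssertionError above.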
I tested everything with `hotspot:tier1` and `jdk:tier1` in fastdebug on Windows-x86, Windows-x64 and Linux-x64. > It seems that the algorithm can be optimized further using SSE/AVX instructions. I am not aware of any specific SSE/AVX implementation which leverages those instructions in the best possible way. Sandhya can chime in more on that. I have done some research prior to implementing this intrinsic and the only pointers I could find to vectorized MD5 is on computing _multiple_ MD5 hashes in parallel but not a _single_ MD5 hash. Using vectors effectively parallelize the computation of many MD5 hash, but it does not accelerate the computation of a single MD5 hash. And looking at the algorithm, every step depends on the previous step's result, which make it particularly hard to parallelize/vectorize. > As far as I know, I came across this which points to MD5 SSE/AVX implementation. https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html That library points to computing many MD5 hashes in parallel. Quoting: "Intel? ISA-L uses a novel technique called multi-buffer hashing, which [...] compute several hashes at once within a single core." That is similar to what I found in researching how to vectorize MD5. I also did not find any reference of an ISA-level implementation of MD5, neither in x86 nor ARM. If you can point me to a document describing how to vectorize MD5, I would be more than happy to take a look and implement the algorithm. However, my understanding is that MD5 is not vectorizable by-design. > Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. I looked at these tests and they already cover MD5. I am not sure what's the best way to add tests here: 1. should I rename ` compiler/intrinsics/sha` to ` compiler/intrinsics/digest` and add the md5 tests there, 2. should I just add ` compiler/intrinsics/md5`, or 3. the name doesn't matter and I can just add it in ` compiler/intrinsics/sha`? > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. Fixed. > In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': > > * This code is free software; you can rrdistribute it and/or modify it > > Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header I updated the header, and added the license for the original code for the MD5 core algorithm. > Did you test it on 32-bit x86? I did run `hotspot:tier1` and `jdk:tier1` on Windows-x86, Windows-x64 and Linux-x64. > Would be interesting to see result of artificially switching off AVX and SSE: > '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. The results are below: -XX:-UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 3512.618 ? 9.384 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 450.037 ? 1.213 ops/ms MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.887 ? 0.057 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.485 ? 0.002 ops/ms -XX:+UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 4212.156 ? 7.781 ops/ ms => 19% speedup MessageDigests.digest md5 1024 DEFAULT thrpt 10 548.609 ? 1.374 ops/ ms => 22% speedup MessageDigests.digest md5 16384 DEFAULT thrpt 10 37.961 ? 
0.079 ops/ ms => 27% speedup MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.596 ? 0.006 ops/ ms => 23% speedup -XX:-UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 3462.769 ? 4.992 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 443.858 ? 0.576 ops/ms MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.723 ? 0.480 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.470 ? 0.001 ops/ms -XX:+UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 4237.219 ? 15.627 ops/ms => 22% speedup MessageDigests.digest md5 1024 DEFAULT thrpt 10 564.625 ? 1.510 ops/ms => 27% speedup MessageDigests.digest md5 16384 DEFAULT thrpt 10 38.004 ? 0.078 ops/ms => 28% speedup MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.597 ? 0.002 ops/ms => 27% speedup Thank you, Ludovic From luhenry at microsoft.com Tue Aug 4 04:23:11 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 04:23:11 +0000 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> , <116277BD-EA21-49AA-8DE1-DBC06ED43C43@oracle.com> Message-ID: Hello, A quick follow up on that change. Webrev: http://cr.openjdk.java.net/~luhenry/8248672/webrev.01/8248672.patch Thank you, Ludovic From viv.desh at gmail.com Tue Aug 4 04:39:54 2020 From: viv.desh at gmail.com (Vivek Deshpande) Date: Mon, 3 Aug 2020 21:39:54 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Thanks Ludovic Detailed explanation and Sandhya for clarification on the vectorization. Regards, Vivek On Mon, Aug 3, 2020 at 9:07 PM Ludovic Henry wrote: > Updated webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.02 > > > Next code in inline_digestBase_implCompressMB should be reversed > (get_long_*() should be called for long_state): > > > > if (long_state) { > > state = get_state_from_digestBase_object(digestBase_obj); > > } else { > > state = get_long_state_from_digestBase_object(digestBase_obj); > > } > > Thanks for pointing that out. I tested everything with `hotspot:tier1` and > `jdk:tier1` in fastdebug on Windows-x86, Windows-x64 and Linux-x64. > > > It seems that the algorithm can be optimized further using SSE/AVX > instructions. I am not aware of any specific SSE/AVX implementation which > leverages those instructions in the best possible way. Sandhya can chime in > more on that. > > I have done some research prior to implementing this intrinsic and the > only pointers I could find to vectorized MD5 is on computing _multiple_ MD5 > hashes in parallel but not a _single_ MD5 hash. Using vectors effectively > parallelize the computation of many MD5 hash, but it does not accelerate > the computation of a single MD5 hash. And looking at the algorithm, every > step depends on the previous step's result, which make it particularly hard > to parallelize/vectorize. > > > As far as I know, I came across this which points to MD5 SSE/AVX > implementation. 
> https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html > > That library points to computing many MD5 hashes in parallel. Quoting: > "Intel? ISA-L uses a novel technique called multi-buffer hashing, which > [...] compute several hashes at once within a single core." That is similar > to what I found in researching how to vectorize MD5. I also did not find > any reference of an ISA-level implementation of MD5, neither in x86 nor ARM. > > If you can point me to a document describing how to vectorize MD5, I would > be more than happy to take a look and implement the algorithm. However, my > understanding is that MD5 is not vectorizable by-design. > > > Add tests to verify intrinsic implementation. You can use > test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > I looked at these tests and they already cover MD5. I am not sure what's > the best way to add tests here: 1. should I rename ` > compiler/intrinsics/sha` to ` compiler/intrinsics/digest` and add the md5 > tests there, 2. should I just add ` compiler/intrinsics/md5`, or 3. the > name doesn't matter and I can just add it in ` compiler/intrinsics/sha`? > > > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA > flag setting. > > Fixed. > > > In new file macroAssembler_x86_md5.cpp no need empty line after > copyright line. There is also typo 'rrdistribute': > > > > * This code is free software; you can rrdistribute it and/or modify it > > > > Our validate-headers check failed. See GPL header template: > ./make/templates/gpl-header > > I updated the header, and added the license for the original code for the > MD5 core algorithm. > > > Did you test it on 32-bit x86? > > I did run `hotspot:tier1` and `jdk:tier1` on Windows-x86, Windows-x64 and > Linux-x64. > > > Would be interesting to see result of artificially switching off AVX and > SSE: > > '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general > instructions are needed. > > The results are below: > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 3512.618 ? 9.384 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 450.037 ? 1.213 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 29.887 ? 0.057 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.485 ? 0.002 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 4212.156 ? 7.781 ops/ ms => 19% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 548.609 ? 1.374 ops/ ms => 22% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 37.961 ? 0.079 ops/ ms => 27% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.596 ? 0.006 ops/ ms => 23% speedup > > -XX:-UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 3462.769 ? 4.992 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 443.858 ? 0.576 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 29.723 ? 0.480 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.470 ? 0.001 ops/ms > > -XX:+UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 4237.219 ? 
15.627 ops/ms => 22% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 564.625 ? 1.510 ops/ms => 27% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 38.004 ? 0.078 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.597 ? 0.002 ops/ms => 27% speedup > > Thank you, > Ludovic > -- Thanks and Regards, Vivek Deshpande viv.desh at gmail.com From tobias.hartmann at oracle.com Tue Aug 4 06:25:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 4 Aug 2020 08:25:11 +0200 Subject: [16] RFR(M) 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp:173 In-Reply-To: References: Message-ID: <4a82bdf6-1f45-3335-cb5d-4aa92f682353@oracle.com> Hi Vladimir, nice cleanup, looks good to me. Best regards, Tobias On 31.07.20 04:54, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8250233/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8250233 > > Main issue was missing EnableJVMCI flag check when calling > JVMCICompiler::print_compilation_timers(). I addition to fixinf that I did next refactoring. > > The code which collects and print statistics per compiler was guarded by #if INCLUDE_JVMCI but not > by any JVMCI flags. > As result it is default code used by all JIT compilers since JVMCI was added in JDK 9. > > I decided to make it not JVMCI specific and used it on all platforms. > > I also added statistic per compilation tier which provides more useful information than combined > date for C1. > > Removed in CompileBroker::print_times() code which calculate total values based on data in > compiler's statistic. Such data is already collected in CompileBroker's static fields. > > Added checks for 0 values in print statements to avoid division by 0 (whioch produced NaN values for > doubles). > > Don't print empty data in JVMCICompiler::print_compilation_timers() but print total compilation time > in JVMCICompiler::print_timers(). > > Tested hs-tier1-3. > > Thanks, > Vladimir > > Beginning of CITime new output: > > Individual compiler times (for compiled methods only) > ------------------------------------------------ > > ? C1 {speed: 49626.710 bytes/s; standard:? 0.037 s, 1842 bytes, 35 methods; osr:? 0.000 s, 0 bytes, > 0 methods; nmethods_size: 51096 bytes; nmethods_code_size: 30880 bytes} > ? C2 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 0.000 s, 0 bytes, 0 > methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} > > Individual compilation Tier times (for compiled methods only) > ------------------------------------------------ > > ? Tier1 {speed: 21162.963 bytes/s; standard:? 0.002 s, 47 bytes, 10 methods; osr:? 0.000 s, 0 bytes, > 0 methods; nmethods_size: 3160 bytes; nmethods_code_size: 1504 bytes} > ? Tier2 {speed:? 0.000 bytes/s; standard:? 0.000 s, 0 bytes, 0 methods; osr:? 0.000 s, 0 bytes, 0 > methods; nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} > ? Tier3 {speed: 51438.195 bytes/s; standard:? 0.035 s, 1795 bytes, 25 methods; osr:? 0.000 s, 0 > bytes, 0 methods; nmethods_size: 47936 bytes; nmethods_code_size: 29376 bytes} > ? Tier4 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 0.000 s, 0 bytes, 0 > methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} > > Accumulated compiler times > ---------------------------------------------------------- > ? Total compilation time?? :?? 0.038 s > ??? Standard compilation?? :?? 0.038 s, Average : 0.001 s > ??? Bailed out compilation :?? 0.000 s, Average : 0.000 s > ??? 
On stack replacement?? :?? 0.000 s, Average : 0.000 s > ??? Invalidated??????????? :?? 0.000 s, Average : 0.000 s From xxinliu at amazon.com Tue Aug 4 06:39:52 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 4 Aug 2020 06:39:52 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com>, <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> Message-ID: <1596523192072.15354@amazon.com> hi, Nils, Tobias would like to keep the parser behavior consistency. I think it means that the hotspot need to suppress the warning if the intrinsic_id doesn't exists in compiler directive. eg. -XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. What do you think about it? Here is the latest webrev: http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ thanks, --lx ________________________________________ From: Tobias Hartmann Sent: Friday, July 24, 2020 2:52 AM To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Liu, On 23.07.20 18:02, Liu, Xin wrote: > That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. > It just exacts information from CompileCommand as many as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. > > I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. > This behavior is from Nils comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. Okay, thanks for the explanation! I would prefer consistency in error handling of compiler directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide. Best regards, Tobias From boris.ulasevich at bell-sw.com Tue Aug 4 16:56:58 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 4 Aug 2020 19:56:58 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: Message-ID: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> Hi, gently reminding of this review request. thanks, Boris On 24.07.2020 13:48, Boris Ulasevich wrote: > Hi, > > Please review the change to C2 and AArch64 which reduces constructs > like? "(v1 & 0xFF) | ((v2 & 0xFF) << 8)" into two Bitfield Insert > instructions. > > http://bugs.openjdk.java.net/browse/JDK-8249893 > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00 > > The change in common code was made to enable Node::is_AndL method. > The method in the rule predicate is required to find out if we are within > the straight or reversed rule (ADLC adds rule with swapped parameters > for commutative operands). > > Tested with JTREG and generated [1] tests. 
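For context, the construct mentioned above is ordinary mask-and-shift packing at the Java level; a tiny example of the kind of method the new AArch64 rules are meant to match (method and variable names are only for illustration) is:

    public class PackBytes {
        // (v1 & 0xFF) | ((v2 & 0xFF) << 8): packs two byte-sized values into
        // the low 16 bits of an int. The change discussed above aims to emit
        // this shape as bitfield insert instructions on AArch64 instead of
        // separate AND/shift/OR instructions.
        static int packLow16(int v1, int v2) {
            return (v1 & 0xFF) | ((v2 & 0xFF) << 8);
        }

        public static void main(String[] args) {
            int acc = 0;
            // Warm the method up so it gets JIT-compiled before spot-checking a value.
            for (int i = 0; i < 1_000_000; i++) {
                acc += packLow16(i, i >> 3);
            }
            System.out.println(Integer.toHexString(packLow16(0xAB, 0xCD))); // prints cdab
            System.out.println(acc); // keep the loop result alive
        }
    }
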
> > thanks, > Boris > > [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00/Gen.java From boris.ulasevich at bell-sw.com Tue Aug 4 16:58:18 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 4 Aug 2020 19:58:18 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: Hi Vladimir, Yes, thank you. I've re-written this to improve readability by changing the logic slightly. http://cr.openjdk.java.net/~bulasevich/8248445/webrev.03 thanks, Boris On 03.08.2020 20:25, Vladimir Kozlov wrote: > Hi Boris, > > The current code is hard to read. Can you rearrange it to have clear > code flow (and correct spaces for if ())? Including F and D checks. To > something like: > > ? if (tzero == TypeF::ZERO) { > ??? if (sub->Opcode() == Op_SubF && > ??????? sub->in(2) == x && > ??????? phase->type(sub->in(1)) == tzero)) { > ????? x = new AbsFNode(x); > ????? if (flip) { > ??????? x = new SubFNode(sub->in(1), phase->transform(x)); > ????? } > ??? } > ? } else if > > Thanks, > Vladimir > > On 8/2/20 1:54 PM, Boris Ulasevich wrote: >> Hi all, >> >> Please review a simple change to C2 to fix a regression: AbsI/AbsL >> nodes are used without checking that the platform supports them >> (for now it is the issue for ARM32 and 32-bit x86 platforms). >> >> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 >> http://bugs.openjdk.java.net/browse/JDK-8248445 >> >> thanks, >> Boris >> From vladimir.kozlov at oracle.com Tue Aug 4 17:19:56 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:19:56 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hi Ludovic, On 8/3/20 9:07 PM, Ludovic Henry wrote: > Updated webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.02 > >> Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): >> >> if (long_state) { >> state = get_state_from_digestBase_object(digestBase_obj); >> } else { >> state = get_long_state_from_digestBase_object(digestBase_obj); >> } > > Thanks for pointing that out. I tested everything with `hotspot:tier1` and `jdk:tier1` in fastdebug on Windows-x86, Windows-x64 and Linux-x64. Code in library_call.cpp is good now. > >> It seems that the algorithm can be optimized further using SSE/AVX instructions. I am not aware of any specific SSE/AVX implementation which leverages those instructions in the best possible way. Sandhya can chime in more on that. > > I have done some research prior to implementing this intrinsic and the only pointers I could find to vectorized MD5 is on computing _multiple_ MD5 hashes in parallel but not a _single_ MD5 hash. Using vectors effectively parallelize the computation of many MD5 hash, but it does not accelerate the computation of a single MD5 hash. And looking at the algorithm, every step depends on the previous step's result, which make it particularly hard to parallelize/vectorize. > >> As far as I know, I came across this which points to MD5 SSE/AVX implementation. https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html > > That library points to computing many MD5 hashes in parallel. Quoting: "Intel? ISA-L uses a novel technique called multi-buffer hashing, which [...] compute several hashes at once within a single core." 
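The multi-buffer idea quoted above is easy to picture from Java: independent buffers can be hashed concurrently because each digest depends only on its own input, while the rounds inside a single MD5 computation each consume the previous round's output and cannot be split up. A purely illustrative sketch of the "many independent hashes" case (unrelated to the ISA-L library itself):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.ArrayList;
    import java.util.List;

    public class ManyHashes {
        static byte[] md5(byte[] data) {
            try {
                return MessageDigest.getInstance("MD5").digest(data);
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError(e); // MD5 is required to be available
            }
        }

        public static void main(String[] args) {
            // 64 independent inputs: the hashes have no data dependency on each
            // other, so they can be computed concurrently (the multi-buffer idea).
            List<byte[]> buffers = new ArrayList<>();
            for (int i = 0; i < 64; i++) {
                byte[] b = new byte[1024];
                b[0] = (byte) i;
                buffers.add(b);
            }
            byte[][] digests = buffers.parallelStream()
                                      .map(ManyHashes::md5)
                                      .toArray(byte[][]::new);
            System.out.println("computed " + digests.length + " independent digests");
        }
    }
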
That is similar to what I found in researching how to vectorize MD5. I also did not find any reference of an ISA-level implementation of MD5, neither in x86 nor ARM. > > If you can point me to a document describing how to vectorize MD5, I would be more than happy to take a look and implement the algorithm. However, my understanding is that MD5 is not vectorizable by-design. I would leave this investigation to Intel's Java group. They are expert in this area! For now, lets put current implementation into JDK. > >> Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > I looked at these tests and they already cover MD5. I am not sure what's the best way to add tests here: 1. should I rename ` compiler/intrinsics/sha` to ` compiler/intrinsics/digest` and add the md5 tests there, 2. should I just add ` compiler/intrinsics/md5`, or 3. the name doesn't matter and I can just add it in ` compiler/intrinsics/sha`? 3. Just add MD5 tests into existing SHA directory. Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I understand. > >> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. > > Fixed. It is not moved in webrev.02 > >> In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': >> >> * This code is free software; you can rrdistribute it and/or modify it >> >> Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header > > I updated the header, and added the license for the original code for the MD5 core algorithm. You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. Thank you for adding license for original code. > >> Did you test it on 32-bit x86? > > I did run `hotspot:tier1` and `jdk:tier1` on Windows-x86, Windows-x64 and Linux-x64. > >> Would be interesting to see result of artificially switching off AVX and SSE: >> '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. > > The results are below: Very good. Thank you for testing it. Regards, Vladimir > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 3512.618 ? 9.384 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 450.037 ? 1.213 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.887 ? 0.057 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.485 ? 0.002 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 4212.156 ? 7.781 ops/ ms => 19% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 548.609 ? 1.374 ops/ ms => 22% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 37.961 ? 0.079 ops/ ms => 27% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.596 ? 0.006 ops/ ms => 23% speedup > > -XX:-UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 3462.769 ? 4.992 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 443.858 ? 0.576 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.723 ? 0.480 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.470 ? 
0.001 ops/ms > > -XX:+UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 4237.219 ? 15.627 ops/ms => 22% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 564.625 ? 1.510 ops/ms => 27% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 38.004 ? 0.078 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.597 ? 0.002 ops/ms => 27% speedup > > Thank you, > Ludovic > From vladimir.kozlov at oracle.com Tue Aug 4 17:29:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:29:13 -0700 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> <116277BD-EA21-49AA-8DE1-DBC06ED43C43@oracle.com> Message-ID: Good. Vladimir K On 8/3/20 9:23 PM, Ludovic Henry wrote: > Hello, > > A quick follow up on that change. > > Webrev: http://cr.openjdk.java.net/~luhenry/8248672/webrev.01/8248672.patch > > Thank you, > Ludovic > From vladimir.kozlov at oracle.com Tue Aug 4 17:33:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:33:34 -0700 Subject: [16] RFR(M) 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp:173 In-Reply-To: <4a82bdf6-1f45-3335-cb5d-4aa92f682353@oracle.com> References: <4a82bdf6-1f45-3335-cb5d-4aa92f682353@oracle.com> Message-ID: <5ae75bab-7331-9984-63f2-0107902fd7e8@oracle.com> Thank you, Tobias Vladimir K On 8/3/20 11:25 PM, Tobias Hartmann wrote: > Hi Vladimir, > > nice cleanup, looks good to me. > > Best regards, > Tobias > > On 31.07.20 04:54, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8250233/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8250233 >> >> Main issue was missing EnableJVMCI flag check when calling >> JVMCICompiler::print_compilation_timers(). I addition to fixinf that I did next refactoring. >> >> The code which collects and print statistics per compiler was guarded by #if INCLUDE_JVMCI but not >> by any JVMCI flags. >> As result it is default code used by all JIT compilers since JVMCI was added in JDK 9. >> >> I decided to make it not JVMCI specific and used it on all platforms. >> >> I also added statistic per compilation tier which provides more useful information than combined >> date for C1. >> >> Removed in CompileBroker::print_times() code which calculate total values based on data in >> compiler's statistic. Such data is already collected in CompileBroker's static fields. >> >> Added checks for 0 values in print statements to avoid division by 0 (whioch produced NaN values for >> doubles). >> >> Don't print empty data in JVMCICompiler::print_compilation_timers() but print total compilation time >> in JVMCICompiler::print_timers(). >> >> Tested hs-tier1-3. >> >> Thanks, >> Vladimir >> >> Beginning of CITime new output: >> >> Individual compiler times (for compiled methods only) >> ------------------------------------------------ >> >> ? C1 {speed: 49626.710 bytes/s; standard:? 0.037 s, 1842 bytes, 35 methods; osr:? 0.000 s, 0 bytes, >> 0 methods; nmethods_size: 51096 bytes; nmethods_code_size: 30880 bytes} >> ? C2 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 
0.000 s, 0 bytes, 0 >> methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} >> >> Individual compilation Tier times (for compiled methods only) >> ------------------------------------------------ >> >> ? Tier1 {speed: 21162.963 bytes/s; standard:? 0.002 s, 47 bytes, 10 methods; osr:? 0.000 s, 0 bytes, >> 0 methods; nmethods_size: 3160 bytes; nmethods_code_size: 1504 bytes} >> ? Tier2 {speed:? 0.000 bytes/s; standard:? 0.000 s, 0 bytes, 0 methods; osr:? 0.000 s, 0 bytes, 0 >> methods; nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} >> ? Tier3 {speed: 51438.195 bytes/s; standard:? 0.035 s, 1795 bytes, 25 methods; osr:? 0.000 s, 0 >> bytes, 0 methods; nmethods_size: 47936 bytes; nmethods_code_size: 29376 bytes} >> ? Tier4 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 0.000 s, 0 bytes, 0 >> methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} >> >> Accumulated compiler times >> ---------------------------------------------------------- >> ? Total compilation time?? :?? 0.038 s >> ??? Standard compilation?? :?? 0.038 s, Average : 0.001 s >> ??? Bailed out compilation :?? 0.000 s, Average : 0.000 s >> ??? On stack replacement?? :?? 0.000 s, Average : 0.000 s >> ??? Invalidated??????????? :?? 0.000 s, Average : 0.000 s From vladimir.kozlov at oracle.com Tue Aug 4 17:55:14 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:55:14 -0700 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: Hi Boris, Good change. Add year to test's copyright line. Regards, Vladimir K On 8/4/20 9:58 AM, Boris Ulasevich wrote: > Hi Vladimir, > > Yes, thank you. I've re-written this to improve readability by changing the logic slightly. > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.03 > > thanks, > Boris > > On 03.08.2020 20:25, Vladimir Kozlov wrote: >> Hi Boris, >> >> The current code is hard to read. Can you rearrange it to have clear code flow (and correct spaces for if ())? >> Including F and D checks. To something like: >> >> ? if (tzero == TypeF::ZERO) { >> ??? if (sub->Opcode() == Op_SubF && >> ??????? sub->in(2) == x && >> ??????? phase->type(sub->in(1)) == tzero)) { >> ????? x = new AbsFNode(x); >> ????? if (flip) { >> ??????? x = new SubFNode(sub->in(1), phase->transform(x)); >> ????? } >> ??? } >> ? } else if >> >> Thanks, >> Vladimir >> >> On 8/2/20 1:54 PM, Boris Ulasevich wrote: >>> Hi all, >>> >>> Please review a simple change to C2 to fix a regression: AbsI/AbsL >>> nodes are used without checking that the platform supports them >>> (for now it is the issue for ARM32 and 32-bit x86 platforms). >>> >>> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 >>> http://bugs.openjdk.java.net/browse/JDK-8248445 >>> >>> thanks, >>> Boris >>> > From vladimir.kozlov at oracle.com Tue Aug 4 19:52:29 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 12:52:29 -0700 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: Hi Vladimir, Looks good. I have only few small questions. compile.cpp: what is next comment about? + // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. print_method(): NodeClassNames[] should be available in product. Node::Name() method is not, but we can move it to product. 
But I am fine to do that later. Why VectorSupport.java does not have copyright header? Thanks, Vladimir K On 7/28/20 3:29 PM, Vladimir Ivanov wrote: > Hi, > > Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! > > Here are the latest changes for Vector API support in HotSpot shared code: > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 > > Incremental changes (diff against webrev.00): > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 > > I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor > cleanups / simple bug fixes. > > Detailed summary: > ? - rebased to jdk/jdk tip; > ? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > ? - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); > ? - got rid of x86-specific changes in shared code; > ? - fix for 8244867 [1]; > ? - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics > ? - numerous minor cleanups > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 > ??? http://jbs.oracle.com/browse/JDK-8244867 > ??? 8244867: 2 vector api tests crash with assert(is_reference_type(basic_type())) failed: wrong type > Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. > > On 04.04.2020 02:12, Vladimir Ivanov wrote: >> Hi, >> >> Following up on review requests of API [0] and Java implementation [1] for Vector API (JEP 338 [2]), here's a request >> for review of general HotSpot changes (in shared code) required for supporting the API: >> >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >> >> (First of all, to set proper expectations: since the JEP is still in Candidate state, the intention is to initiate >> preliminary round(s) of review to inform the community and gather feedback before sending out final/official RFRs once >> the JEP is Targeted to a release.) >> >> Vector API (being developed in Project Panama [3]) relies on JVM support to utilize optimal vector hardware >> instructions at runtime. It interacts with JVM through intrinsics (declared in jdk.internal.vm.vector.VectorSupport >> [4]) which expose vector operations support in C2 JIT-compiler. >> >> As Paul wrote earlier: "A vector intrinsic is an internal low-level vector operation. The last argument to the >> intrinsic is fall back behavior in Java, implementing the scalar operation over the number of elements held by the >> vector.? Thus, If the intrinsic is not supported in C2 for the other arguments then the Java implementation is >> executed (the Java implementation is always executed when running in the interpreter or for C1)." >> >> The rest of JVM support is about aggressively optimizing vector boxes to minimize (ideally eliminate) the overhead of >> boxing for vector values. >> It's a stop-the-gap solution for vector box elimination problem until inline classes arrive. Vector classes are >> value-based and in the longer term will be migrated to inline classes once the support becomes available. >> >> Vector API talk from JVMLS'18 [5] contains brief overview of JVM implementation and some details. >> >> Complete implementation resides in vector-unstable branch of panama/dev repository [6]. 
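The "fall back behavior in Java as the last argument" point above can be pictured with a heavily simplified stand-in; the method name and signature below are invented for illustration and are not the actual jdk.internal.vm.vector.VectorSupport API:

    import java.util.Arrays;
    import java.util.function.BinaryOperator;

    public class IntrinsicFallbackSketch {
        // Stand-in for an intrinsified entry point: when the JIT supports the
        // argument shape it can replace the call with vector instructions; the
        // interpreter, C1, or an unsupported shape simply runs the fallback
        // passed in as the last argument. (This sketch has no intrinsic, so it
        // always takes the fallback path.)
        static int[] binaryOp(int[] a, int[] b, BinaryOperator<int[]> javaFallback) {
            return javaFallback.apply(a, b);
        }

        public static void main(String[] args) {
            int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
            int[] b = {8, 7, 6, 5, 4, 3, 2, 1};
            int[] sum = binaryOp(a, b, (x, y) -> {
                // Scalar fallback: the same operation applied lane by lane.
                int[] r = new int[x.length];
                for (int i = 0; i < x.length; i++) {
                    r[i] = x[i] + y[i];
                }
                return r;
            });
            System.out.println(Arrays.toString(sum)); // [9, 9, 9, 9, 9, 9, 9, 9]
        }
    }
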
>> >> Now to gory details (the patch is split in multiple "sub-webrevs"): >> >> =========================================================== >> >> (1) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >> >> Ideal vector nodes for new operations introduced by Vector API. >> >> (Platform-specific back end support will be posted for review separately). >> >> =========================================================== >> >> (2) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >> >> JVM Java interface (VectorSupport) and intrinsic support in C2. >> >> Vector instances are initially represented as VectorBox macro nodes and "unboxing" is represented by VectorUnbox node. >> It simplifies vector box elimination analysis and the nodes are expanded later right before EA pass. >> >> Vectors have 2-level on-heap representation: for the vector value primitive array is used as a backing storage and it >> is encapsulated in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a int[8] instance which is used >> to store vector value). >> >> Unless VectorBox node goes away, it needs to be expanded into an allocation eventually, but it is a pure node and >> doesn't have any JVM state associated with it. The problem is solved by keeping JVM state separately in a >> VectorBoxAllocate node associated with VectorBox node and use it during expansion. >> >> Also, to simplify vector box elimination, inlining of vector reboxing calls (VectorSupport::maybeRebox) is delayed >> until the analysis is over. >> >> =========================================================== >> >> (3) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >> >> Vector box elimination analysis implementation. (Brief overview: slides #36-42 [5].) >> >> The main part is devoted to scalarization across safepoints and rematerialization support during deoptimization. In >> C2-generated code vector operations work with raw vector values which live in registers or spilled on the stack and it >> allows to avoid boxing/unboxing when a vector value is alive across a safepoint. As with other values, there's just a >> location of the vector value at the safepoint and vector type information recorded in the relevant nmethod metadata >> and all the heavy-lifting happens only when rematerialization takes place. >> >> The analysis preserves object identity invariants except during aggressive reboxing (guarded by >> -XX:+EnableAggressiveReboxing). >> >> (Aggressive reboxing is crucial for cases when vectors "escape": it allocates a fresh instance at every escape point >> thus enabling original instance to go away.) >> >> =========================================================== >> >> (4) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >> >> HotSpot changes for jdk.incubator.vector module. Vector support is makred experimental and turned off by default. JEP >> 338 proposes the API to be released as an incubator module, so a user has to specify "--add-module >> jdk.incubator.vector" on the command line to be able to use it. >> When user does that, JVM automatically enables Vector API support. >> It improves usability (user doesn't need to separately "open" the API and enable JVM support) while minimizing risks >> of destabilitzation from new code when the API is not used. >> >> >> That's it! Will be happy to answer any questions. 
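The 2-level on-heap representation described above -- a typed wrapper whose only payload is a primitive backing array -- reduced to its bare bones (a sketch of the layout idea only; the real Int256Vector class looks nothing like this):

    // Bare-bones sketch: a typed wrapper object whose only payload is the
    // primitive array backing the vector's lanes.
    final class Int256Sketch {
        static final int LANES = 8;      // 256 bits of 32-bit lanes
        private final int[] lanes;       // backing storage for the vector value

        Int256Sketch(int[] lanes) {
            if (lanes.length != LANES) throw new IllegalArgumentException();
            this.lanes = lanes.clone();  // value-based: payload never escapes
        }

        Int256Sketch add(Int256Sketch other) {
            int[] r = new int[LANES];
            for (int i = 0; i < LANES; i++) {
                r[i] = lanes[i] + other.lanes[i];
            }
            return new Int256Sketch(r);  // a fresh box per operation -- the very
                                         // thing box elimination tries to remove
        }

        int lane(int i) { return lanes[i]; }
    }
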
>> >> And thanks in advance for any feedback! >> >> Best regards, >> Vladimir Ivanov >> >> [0] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >> >> [2] https://openjdk.java.net/jeps/338 >> >> [3] https://openjdk.java.net/projects/panama/ >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >> >> >> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >> >> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >> >> ???? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b vector-unstable From vladimir.kozlov at oracle.com Tue Aug 4 19:59:52 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 12:59:52 -0700 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: x86 changes seems fine. Thanks, Vladimir K On 7/29/20 11:19 AM, Viswanathan, Sandhya wrote: > Hi, > > Likewise, the corresponding x86 backend changes since first review are also only minor cleanups and simple bug fixes: > > X86: > Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.01/ > Incremental: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00-webrev.01/ > > Summary: > - rebased to jdk/jdk tip; > - backend changes related to removal of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - vector insert bug fix > - some minor cleanups > > Older webrev links for your reference: > X86b backend: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00/ > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Ivanov > Sent: Tuesday, July 28, 2020 3:30 PM > To: hotspot-dev ; hotspot compiler > Cc: Viswanathan, Sandhya ; panama-dev > Subject: Re: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes > > Hi, > > Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! > > Here are the latest changes for Vector API support in HotSpot shared code: > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 > > Incremental changes (diff against webrev.00): > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 > > I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor cleanups / simple bug fixes. > > Detailed summary: > - rebased to jdk/jdk tip; > - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); > - got rid of x86-specific changes in shared code; > - fix for 8244867 [1]; > - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics > - numerous minor cleanups > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 > http://jbs.oracle.com/browse/JDK-8244867 > 8244867: 2 vector api tests crash with > assert(is_reference_type(basic_type())) failed: wrong type > Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. 
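From the user's point of view the incubator module shows up roughly like this (a sketch against the incubating jdk.incubator.vector API as proposed; compile and run with --add-modules jdk.incubator.vector so the JVM support described in (4) is enabled):

    // javac --add-modules jdk.incubator.vector VectorAddExample.java
    // java  --add-modules jdk.incubator.vector VectorAddExample
    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorSpecies;

    public class VectorAddExample {
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

        // c[i] = a[i] + b[i], processed SPECIES.length() lanes at a time,
        // with a scalar tail loop for the remainder.
        static void add(int[] a, int[] b, int[] c) {
            int i = 0;
            int upper = SPECIES.loopBound(a.length);
            for (; i < upper; i += SPECIES.length()) {
                IntVector va = IntVector.fromArray(SPECIES, a, i);
                IntVector vb = IntVector.fromArray(SPECIES, b, i);
                va.add(vb).intoArray(c, i);
            }
            for (; i < a.length; i++) {
                c[i] = a[i] + b[i];
            }
        }

        public static void main(String[] args) {
            int[] a = new int[1000], b = new int[1000], c = new int[1000];
            for (int i = 0; i < a.length; i++) { a[i] = i; b[i] = 2 * i; }
            add(a, b, c);
            System.out.println(c[999]);  // 2997
        }
    }
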
> > On 04.04.2020 02:12, Vladimir Ivanov wrote: >> Hi, >> >> Following up on review requests of API [0] and Java implementation [1] >> for Vector API (JEP 338 [2]), here's a request for review of general >> HotSpot changes (in shared code) required for supporting the API: >> >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/all.00-03/ >> >> >> (First of all, to set proper expectations: since the JEP is still in >> Candidate state, the intention is to initiate preliminary round(s) of >> review to inform the community and gather feedback before sending out >> final/official RFRs once the JEP is Targeted to a release.) >> >> Vector API (being developed in Project Panama [3]) relies on JVM >> support to utilize optimal vector hardware instructions at runtime. It >> interacts with JVM through intrinsics (declared in >> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >> operations support in C2 JIT-compiler. >> >> As Paul wrote earlier: "A vector intrinsic is an internal low-level >> vector operation. The last argument to the intrinsic is fall back >> behavior in Java, implementing the scalar operation over the number of >> elements held by the vector.? Thus, If the intrinsic is not supported >> in >> C2 for the other arguments then the Java implementation is executed >> (the Java implementation is always executed when running in the >> interpreter or for C1)." >> >> The rest of JVM support is about aggressively optimizing vector boxes >> to minimize (ideally eliminate) the overhead of boxing for vector values. >> It's a stop-the-gap solution for vector box elimination problem until >> inline classes arrive. Vector classes are value-based and in the >> longer term will be migrated to inline classes once the support becomes available. >> >> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >> implementation and some details. >> >> Complete implementation resides in vector-unstable branch of >> panama/dev repository [6]. >> >> Now to gory details (the patch is split in multiple "sub-webrevs"): >> >> =========================================================== >> >> (1) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/00.backend.shared/ >> >> >> Ideal vector nodes for new operations introduced by Vector API. >> >> (Platform-specific back end support will be posted for review separately). >> >> =========================================================== >> >> (2) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/01.intrinsics/ >> >> >> JVM Java interface (VectorSupport) and intrinsic support in C2. >> >> Vector instances are initially represented as VectorBox macro nodes >> and "unboxing" is represented by VectorUnbox node. It simplifies >> vector box elimination analysis and the nodes are expanded later right before EA pass. >> >> Vectors have 2-level on-heap representation: for the vector value >> primitive array is used as a backing storage and it is encapsulated in >> a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a >> int[8] instance which is used to store vector value). >> >> Unless VectorBox node goes away, it needs to be expanded into an >> allocation eventually, but it is a pure node and doesn't have any JVM >> state associated with it. The problem is solved by keeping JVM state >> separately in a VectorBoxAllocate node associated with VectorBox node >> and use it during expansion. 
>> >> Also, to simplify vector box elimination, inlining of vector reboxing >> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >> >> =========================================================== >> >> (3) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/02.vbox_elimination/ >> >> >> Vector box elimination analysis implementation. (Brief overview: >> slides >> #36-42 [5].) >> >> The main part is devoted to scalarization across safepoints and >> rematerialization support during deoptimization. In C2-generated code >> vector operations work with raw vector values which live in registers >> or spilled on the stack and it allows to avoid boxing/unboxing when a >> vector value is alive across a safepoint. As with other values, >> there's just a location of the vector value at the safepoint and >> vector type information recorded in the relevant nmethod metadata and >> all the heavy-lifting happens only when rematerialization takes place. >> >> The analysis preserves object identity invariants except during >> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >> >> (Aggressive reboxing is crucial for cases when vectors "escape": it >> allocates a fresh instance at every escape point thus enabling >> original instance to go away.) >> >> =========================================================== >> >> (4) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/03.module.hotspot/ >> >> >> HotSpot changes for jdk.incubator.vector module. Vector support is >> makred experimental and turned off by default. JEP 338 proposes the >> API to be released as an incubator module, so a user has to specify >> "--add-module jdk.incubator.vector" on the command line to be able to >> use it. >> When user does that, JVM automatically enables Vector API support. >> It improves usability (user doesn't need to separately "open" the API >> and enable JVM support) while minimizing risks of destabilitzation >> from new code when the API is not used. >> >> >> That's it! Will be happy to answer any questions. >> >> And thanks in advance for any feedback! >> >> Best regards, >> Vladimir Ivanov >> >> [0] >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/06534 >> 5.html >> >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228. >> html >> >> [2] https://openjdk.java.net/jeps/338 >> >> [3] https://openjdk.java.net/projects/panama/ >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm >> /vector/VectorSupport.java.html >> >> >> [5] >> http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >> >> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >> >> ??? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b >> vector-unstable From luhenry at microsoft.com Tue Aug 4 20:21:02 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 20:21:02 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 Testing: everything passes with hotspot:tier1 and jdk:tier1 in fastdebug on Linux-x64. > I would leave this investigation to Intel's Java group. They are expert in this area! 
Ok, we'll reach out to Intel on our end as well to figure out whether they have any specific guidance on that. > 3. Just add MD5 tests into existing SHA directory. Done. I've done some small renames (TestSHA -> TestDigest, SHAOptionsBase -> DigestOptionsBase), modified some of the SHA-specific code for non-SHA cases (GenericTestCaseFor*.java), and added MD5-specific tests. > Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I understand. I extended the existing tests to cover MD5 on the same level as SHA, and I made sure that all tests are still passing. >> >>> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. >> >> Fixed. > > It is not moved in webrev.02 Fixed. > You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. Fixed. From vladimir.kozlov at oracle.com Tue Aug 4 22:03:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 15:03:38 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> Good. I will run Hotspot and JDK testing and let you know results. Regards, Vladimir K On 8/4/20 1:21 PM, Ludovic Henry wrote: > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > Testing: everything passes with hotspot:tier1 and jdk:tier1 in fastdebug on Linux-x64. > >> I would leave this investigation to Intel's Java group. They are expert in this area! > > Ok, we'll reach out to Intel on our end as well to figure out whether they have any specific guidance on that. > >> 3. Just add MD5 tests into existing SHA directory. > > Done. I've done some small renames (TestSHA -> TestDigest, SHAOptionsBase -> DigestOptionsBase), modified some of the SHA-specific code for non-SHA cases (GenericTestCaseFor*.java), and added MD5-specific tests. > >> Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I understand. > > I extended the existing tests to cover MD5 on the same level as SHA, and I made sure that all tests are still passing. > >>> >>>> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. >>> >>> Fixed. >> >> It is not moved in webrev.02 > > Fixed. > >> You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. > > Fixed. > > From igor.ignatyev at oracle.com Tue Aug 4 23:58:48 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 16:58:48 -0700 Subject: RFR(S) : 8251126 : nsk.share.GoldChecker should read golden file from ${test.src} Message-ID: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ > 37 lines changed: 7 ins; 20 del; 10 mod; Hi all, could you please review this patch? from JBS: > as of now, nsk.share.GoldChecker reads golden files from the current directory, which makes it necessary to copy golden files from ${test.src} before the execution of the tests which use GoldChecker. after this patch, FileInstaller actions will become redundant in 103 of :vmTestbase_vm_compiler tests and will be removed by 8251127. 
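The underlying change is just resolving the golden file against the directory jtreg publishes through the test.src system property instead of the current working directory. Schematically (an illustration of the pattern, not the actual nsk.share.GoldChecker code; the file name is made up):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class GoldenFileLookup {
        // Resolve a golden file against the directory jtreg exposes via the
        // "test.src" system property, falling back to the working directory
        // when the property is absent (e.g. when run outside jtreg).
        static Path goldenFile(String name) {
            String testSrc = System.getProperty("test.src", ".");
            return Paths.get(testSrc).resolve(name);
        }

        public static void main(String[] args) throws Exception {
            Path gold = goldenFile("example.gold");
            if (Files.exists(gold)) {
                System.out.print(Files.readString(gold));
            } else {
                System.out.println("no golden file next to the test sources: " + gold);
            }
        }
    }

With the lookup done against test.src, the FileInstaller copy step mentioned above becomes unnecessary.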
JBS: https://bugs.openjdk.java.net/browse/JDK-8251126 webrev: http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ testing: :vmTestbase_vm_compiler tests 8251127: https://bugs.openjdk.java.net/browse/JDK-8251127 Thanks, -- Igor From sandhya.viswanathan at intel.com Wed Aug 5 00:16:44 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 5 Aug 2020 00:16:44 +0000 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: Thanks a lot for the review. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Tuesday, August 04, 2020 1:00 PM To: Viswanathan, Sandhya ; Vladimir Ivanov ; hotspot-dev ; hotspot compiler Cc: panama-dev Subject: Re: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes x86 changes seems fine. Thanks, Vladimir K On 7/29/20 11:19 AM, Viswanathan, Sandhya wrote: > Hi, > > Likewise, the corresponding x86 backend changes since first review are also only minor cleanups and simple bug fixes: > > X86: > Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.01/ > Incremental: > http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00 > -webrev.01/ > > Summary: > - rebased to jdk/jdk tip; > - backend changes related to removal of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - vector insert bug fix > - some minor cleanups > > Older webrev links for your reference: > X86b backend: > http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00 > / > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Ivanov > Sent: Tuesday, July 28, 2020 3:30 PM > To: hotspot-dev ; hotspot compiler > > Cc: Viswanathan, Sandhya ; panama-dev > > Subject: Re: RFR (XXL): 8223347: Integration of Vector API > (Incubator): General HotSpot changes > > Hi, > > Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! > > Here are the latest changes for Vector API support in HotSpot shared code: > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar > ed/webrev.01 > > Incremental changes (diff against webrev.00): > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar > ed/webrev.01_00 > > I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor cleanups / simple bug fixes. > > Detailed summary: > - rebased to jdk/jdk tip; > - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); > - got rid of x86-specific changes in shared code; > - fix for 8244867 [1]; > - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics > - numerous minor cleanups > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 > http://jbs.oracle.com/browse/JDK-8244867 > 8244867: 2 vector api tests crash with > assert(is_reference_type(basic_type())) failed: wrong type > Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. 
> > On 04.04.2020 02:12, Vladimir Ivanov wrote: >> Hi, >> >> Following up on review requests of API [0] and Java implementation >> [1] for Vector API (JEP 338 [2]), here's a request for review of >> general HotSpot changes (in shared code) required for supporting the API: >> >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/all.00-03/ >> >> >> (First of all, to set proper expectations: since the JEP is still in >> Candidate state, the intention is to initiate preliminary round(s) of >> review to inform the community and gather feedback before sending out >> final/official RFRs once the JEP is Targeted to a release.) >> >> Vector API (being developed in Project Panama [3]) relies on JVM >> support to utilize optimal vector hardware instructions at runtime. >> It interacts with JVM through intrinsics (declared in >> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >> operations support in C2 JIT-compiler. >> >> As Paul wrote earlier: "A vector intrinsic is an internal low-level >> vector operation. The last argument to the intrinsic is fall back >> behavior in Java, implementing the scalar operation over the number >> of elements held by the vector.? Thus, If the intrinsic is not >> supported in >> C2 for the other arguments then the Java implementation is executed >> (the Java implementation is always executed when running in the >> interpreter or for C1)." >> >> The rest of JVM support is about aggressively optimizing vector boxes >> to minimize (ideally eliminate) the overhead of boxing for vector values. >> It's a stop-the-gap solution for vector box elimination problem until >> inline classes arrive. Vector classes are value-based and in the >> longer term will be migrated to inline classes once the support becomes available. >> >> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >> implementation and some details. >> >> Complete implementation resides in vector-unstable branch of >> panama/dev repository [6]. >> >> Now to gory details (the patch is split in multiple "sub-webrevs"): >> >> =========================================================== >> >> (1) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/00.backend.shared/ >> >> >> Ideal vector nodes for new operations introduced by Vector API. >> >> (Platform-specific back end support will be posted for review separately). >> >> =========================================================== >> >> (2) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/01.intrinsics/ >> >> >> JVM Java interface (VectorSupport) and intrinsic support in C2. >> >> Vector instances are initially represented as VectorBox macro nodes >> and "unboxing" is represented by VectorUnbox node. It simplifies >> vector box elimination analysis and the nodes are expanded later right before EA pass. >> >> Vectors have 2-level on-heap representation: for the vector value >> primitive array is used as a backing storage and it is encapsulated >> in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains >> a int[8] instance which is used to store vector value). >> >> Unless VectorBox node goes away, it needs to be expanded into an >> allocation eventually, but it is a pure node and doesn't have any JVM >> state associated with it. The problem is solved by keeping JVM state >> separately in a VectorBoxAllocate node associated with VectorBox node >> and use it during expansion. 
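(for illustration only -- a simplified sketch, not the real jdk.incubator.vector sources -- the "2-level" on-heap representation described above is essentially a typed wrapper class around a primitive array:)

    // simplified sketch of the 2-level on-heap shape: a typed wrapper
    // holding a primitive array as the backing storage for the vector value
    final class Int256Vector {
        static final int LENGTH = 8;   // 256 bits of 32-bit int lanes
        private final int[] vec;       // backing storage

        Int256Vector(int[] v) { this.vec = v.clone(); }
        int lane(int i) { return vec[i]; }
    }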
>> >> Also, to simplify vector box elimination, inlining of vector reboxing >> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >> >> =========================================================== >> >> (3) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/02.vbox_elimination/ >> >> >> Vector box elimination analysis implementation. (Brief overview: >> slides >> #36-42 [5].) >> >> The main part is devoted to scalarization across safepoints and >> rematerialization support during deoptimization. In C2-generated code >> vector operations work with raw vector values which live in registers >> or spilled on the stack and it allows to avoid boxing/unboxing when a >> vector value is alive across a safepoint. As with other values, >> there's just a location of the vector value at the safepoint and >> vector type information recorded in the relevant nmethod metadata and >> all the heavy-lifting happens only when rematerialization takes place. >> >> The analysis preserves object identity invariants except during >> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >> >> (Aggressive reboxing is crucial for cases when vectors "escape": it >> allocates a fresh instance at every escape point thus enabling >> original instance to go away.) >> >> =========================================================== >> >> (4) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/03.module.hotspot/ >> >> >> HotSpot changes for jdk.incubator.vector module. Vector support is >> makred experimental and turned off by default. JEP 338 proposes the >> API to be released as an incubator module, so a user has to specify >> "--add-module jdk.incubator.vector" on the command line to be able to >> use it. >> When user does that, JVM automatically enables Vector API support. >> It improves usability (user doesn't need to separately "open" the API >> and enable JVM support) while minimizing risks of destabilitzation >> from new code when the API is not used. >> >> >> That's it! Will be happy to answer any questions. >> >> And thanks in advance for any feedback! >> >> Best regards, >> Vladimir Ivanov >> >> [0] >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/0653 >> 4 >> 5.html >> >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228. >> html >> >> [2] https://openjdk.java.net/jeps/338 >> >> [3] https://openjdk.java.net/projects/panama/ >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/v >> m >> /vector/VectorSupport.java.html >> >> >> [5] >> http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >> >> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >> >> ??? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b >> vector-unstable From igor.ignatyev at oracle.com Wed Aug 5 00:22:09 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 17:22:09 -0700 Subject: RFR(T) : 8251128 : remove vmTestbase/vm/compiler/jbe/combine Message-ID: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 > 25 lines changed: 0 ins; 25 del; 0 mod; Hi all, could you please review the patch which removes test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory (or rather the only file it contained -- README)? 
> % hg rm test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine > removing test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README > % hg st > R test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README from JBS: > test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory doesn't any tests and should be removed. JBS: https://bugs.openjdk.java.net/browse/JDK-8251128 webrev: http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 Thanks, -- Igor From vladimir.kozlov at oracle.com Wed Aug 5 00:34:52 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 17:34:52 -0700 Subject: RFR(T) : 8251128 : remove vmTestbase/vm/compiler/jbe/combine In-Reply-To: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> References: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> Message-ID: <88d72d5b-329b-0b7d-04a0-2b2f2032b952@oracle.com> Looks good but where original tests were moved? Which RFE did that? Thanks, Vladimir On 8/4/20 5:22 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 >> 25 lines changed: 0 ins; 25 del; 0 mod; > > Hi all, > > could you please review the patch which removes test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory (or rather the only file it contained -- README)? >> % hg rm test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine >> removing test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README >> % hg st >> R test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README > > from JBS: >> test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory doesn't any tests and should be removed. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8251128 > webrev: http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 > > Thanks, > -- Igor > From vladimir.kozlov at oracle.com Wed Aug 5 01:33:09 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 18:33:09 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> Message-ID: <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Hi Ludovic, Tests are mostly clean so far except: new 3 MD5 tests failed on aarch64 because UseMD5Intrinsics flag is 'true' incorrectly: bool UseMD5Intrinsics = true {diagnostic} {command line} compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnUnsupportedCPU.java I think you need to set flag to false (to overwrite setting on command line) in vm_version_*.cpp files on all other CPUs until they have implementation: http://hg.openjdk.java.net/jdk/jdk/file/5bda40c115c1/src/hotspot/cpu/ppc/vm_version_ppc.cpp#l278 Also I forgot to ask to update copyright year in files you touched. Thanks, Vladimir K On 8/4/20 3:03 PM, Vladimir Kozlov wrote: > Good. > > I will run Hotspot and JDK testing and let you know results. > > Regards, > Vladimir K > > On 8/4/20 1:21 PM, Ludovic Henry wrote: >> Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 >> Testing: everything passes with hotspot:tier1 and jdk:tier1 in fastdebug on Linux-x64. >> >>> I would leave this investigation to Intel's Java group. They are expert in this area! >> >> Ok, we'll reach out to Intel on our end as well to figure out whether they have any specific guidance on that. >> >>> 3. 
Just add MD5 tests into existing SHA directory. >> >> Done. I've done some small renames (TestSHA -> TestDigest, SHAOptionsBase -> DigestOptionsBase), modified some of the >> SHA-specific code for non-SHA cases (GenericTestCaseFor*.java), and added MD5-specific tests. >> >>> Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I >>> understand. >> >> I extended the existing tests to cover MD5 on the same level as SHA, and I made sure that all tests are still passing. >> >>>> >>>>> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. >>>> >>>> Fixed. >>> >>> It is not moved in webrev.02 >> >> Fixed. >> >>> You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. >> >> Fixed. >> From luhenry at microsoft.com Wed Aug 5 02:09:08 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Wed, 5 Aug 2020 02:09:08 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Message-ID: Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > I think you need to set flag to false (to overwrite setting on command line) in vm_version_*.cpp files on all other CPUs until they have implementation: Fixed. > Also I forgot to ask to update copyright year in files you touched. Fixed. From igor.ignatyev at oracle.com Wed Aug 5 02:25:57 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 19:25:57 -0700 Subject: RFR(T) : 8251128 : remove vmTestbase/vm/compiler/jbe/combine In-Reply-To: <88d72d5b-329b-0b7d-04a0-2b2f2032b952@oracle.com> References: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> <88d72d5b-329b-0b7d-04a0-2b2f2032b952@oracle.com> Message-ID: Hi Vladimir, thanks for your review. as for the original tests, they haven't been co-located and hence not open-sourced due to different reasons. -- Igor > On Aug 4, 2020, at 5:34 PM, Vladimir Kozlov wrote: > > Looks good but where original tests were moved? Which RFE did that? > > Thanks, > Vladimir > > On 8/4/20 5:22 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 >>> 25 lines changed: 0 ins; 25 del; 0 mod; >> Hi all, >> could you please review the patch which removes test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory (or rather the only file it contained -- README)? >>> % hg rm test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine >>> removing test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README >>> % hg st >>> R test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README >> from JBS: >>> test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory doesn't contain any tests and should be removed.
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8251128 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 >> Thanks, >> -- Igor From david.holmes at oracle.com Wed Aug 5 02:29:26 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 5 Aug 2020 12:29:26 +1000 Subject: RFR(S) : 8251126 : nsk.share.GoldChecker should read golden file from ${test.src} In-Reply-To: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> References: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> Message-ID: <41261a03-cd46-3d48-839b-d934a9fb92bb@oracle.com> Hi Igor, This seems fine. The code cleanup looks good too. Thanks, David On 5/08/2020 9:58 am, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ >> 37 lines changed: 7 ins; 20 del; 10 mod; > > Hi all, > > could you please review this patch? > from JBS: >> as of now, nsk.share.GoldChecker reads golden files from the current directory, which makes it necessary to copy golden files from ${test.src} before the execution of the tests which use GoldChecker. > > after this patch, FileInstaller actions will become redundant in 103 of :vmTestbase_vm_compiler tests and will be removed by 8251127. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8251126 > webrev: http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ > testing: :vmTestbase_vm_compiler tests > > 8251127: https://bugs.openjdk.java.net/browse/JDK-8251127 > > Thanks, > -- Igor > > From vladimir.kozlov at oracle.com Wed Aug 5 04:36:07 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 21:36:07 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Message-ID: It looks like you created the webrev based on an old state of jdk or its branch. Your vm_version_aarch64.cpp change did not apply to the latest jdk source. There are also a few copyright year updates for files which already have it. I fixed it and started a new round of testing. Vladimir K On 8/4/20 7:09 PM, Ludovic Henry wrote: > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > >> I think you need to set flag to false (to overwrite setting on command line) in vm_version_*.cpp files on all other CPUs until they have implementation: > > Fixed. > > > Also I forgot to ask to update copyright year in files you touched. > > Fixed. > From igor.ignatyev at oracle.com Wed Aug 5 05:18:56 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 22:18:56 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests Message-ID: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 > 498 lines changed: 0 ins; 132 del; 366 mod; Hi all, could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests? from JBS: > main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. > > this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and executing them "directly" by jtreg will lead to failures due to a few extra frames from jtreg.
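for illustration, the tag change in an individual test looks roughly like this (hypothetical test name, not an actual file from the webrev):

    package jit.t.t001;

    /*
     * hypothetical example of the change described above:
     *
     *   before:  @run driver ExecDriver --java jit.t.t001.t001
     *   after:   @run main/othervm jit.t.t001.t001
     *
     * which requires the main test class to be public so that jtreg
     * can invoke it directly:
     */
    public class t001 {
        public static void main(String[] args) {
            // test body unchanged; the only source change is the added 'public' modifier
        }
    }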
JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 Thanks, -- Igor From boris.ulasevich at bell-sw.com Wed Aug 5 08:31:11 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Wed, 5 Aug 2020 11:31:11 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: <2f447be9-0349-240d-c511-5a6e06f662af@bell-sw.com> Hi Vladimir, Ok. Thank you for review! regards, Boris On 04.08.2020 20:55, Vladimir Kozlov wrote: > Hi Boris, > > Good change. > > Add year to test's copyright line. > > Regards, > Vladimir K > > On 8/4/20 9:58 AM, Boris Ulasevich wrote: >> Hi Vladimir, >> >> Yes, thank you. I've re-written this to improve readability by >> changing the logic slightly. >> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.03 >> >> thanks, >> Boris >> >> On 03.08.2020 20:25, Vladimir Kozlov wrote: >>> Hi Boris, >>> >>> The current code is hard to read. Can you rearrange it to have clear >>> code flow (and correct spaces for if ())? Including F and D checks. >>> To something like: >>> >>> ? if (tzero == TypeF::ZERO) { >>> ??? if (sub->Opcode() == Op_SubF && >>> ??????? sub->in(2) == x && >>> ??????? phase->type(sub->in(1)) == tzero)) { >>> ????? x = new AbsFNode(x); >>> ????? if (flip) { >>> ??????? x = new SubFNode(sub->in(1), phase->transform(x)); >>> ????? } >>> ??? } >>> ? } else if >>> >>> Thanks, >>> Vladimir >>> >>> On 8/2/20 1:54 PM, Boris Ulasevich wrote: >>>> Hi all, >>>> >>>> Please review a simple change to C2 to fix a regression: AbsI/AbsL >>>> nodes are used without checking that the platform supports them >>>> (for now it is the issue for ARM32 and 32-bit x86 platforms). >>>> >>>> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 >>>> http://bugs.openjdk.java.net/browse/JDK-8248445 >>>> >>>> thanks, >>>> Boris >>>> >> From aph at redhat.com Wed Aug 5 09:08:39 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 5 Aug 2020 10:08:39 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> Message-ID: <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> Hi, On 8/4/20 5:56 PM, Boris Ulasevich wrote: > gently reminding of this review request. >> http://bugs.openjdk.java.net/browse/JDK-8249893 >> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00 I'm leaning towards no. The code is too complicated and difficult to maintain for such a small gain. As I suggested to Eric Liu when discussing 8248870, we should try canonicalizing this stuff early in compilation then matching with BFM rules. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Wed Aug 5 16:16:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 09:16:43 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests In-Reply-To: References: Message-ID: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> Hi Igor We were always told to use '/othervm' only if additional VM flags are specified. Also based on RFE description making classes public will allow to execute them directly by jtreg. So why you use '/othervm'? 
Also, since you are cleaning up all these tests, can you use a uniform format for the class declaration line? I see different variations:

public class DivTest{

public class Filtering
{

public class Robert
{

public class collapse {

I think the last example is what we usually use. Code indentation is also all over the place. I understand that fixing many files by hand would be hard. But if you can do something (with a script) which will not take a lot of your time, we should do that. Thanks, Vladimir K On 8/4/20 10:18 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >> 498 lines changed: 0 ins; 132 del; 366 mod; > Hi all, > could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests? > from JBS: >> main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. >> >> this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. > the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and executing them "directly" by jtreg will lead to failures due to a few extra frames from jtreg. > JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 > testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 > webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 > Thanks, > -- Igor > From igor.ignatyev at oracle.com Wed Aug 5 16:44:50 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 5 Aug 2020 09:44:50 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests In-Reply-To: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> References: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> Message-ID: > On Aug 5, 2020, at 9:16 AM, Vladimir Kozlov wrote: > > Hi Igor > > We were always told to use '/othervm' only if additional VM flags are specified. > Also based on RFE description making classes public will allow to execute them directly by jtreg. > > So why you use '/othervm'? /othervm tests are run directly by jtreg, as opposed to tests which use ExecDriver, where jtreg runs ExecDriver and ExecDriver spawns a new process to run a test. I used /othervm to keep the tests closer to their current state, i.e. each test is run in a separate clean JVM. Removing /othervm would require a bit more detailed analysis of whether these tests really require a clean state, which I'd prefer to do separately. > >> Also, since you are cleaning up all these tests, can you use a uniform format for the class declaration line? >> I see different variations: >> >> public class DivTest{ >> >> public class Filtering >> { >> >> public class Robert >> { >> >> public class collapse { >> >> I think the last example is what we usually use. >> >> Code indentation is also all over the place. >> >> I understand that fixing many files by hand would be hard. But if you can do something (with a script) which will not take a lot of your time, we should do that. > I guess I can run some auto-formatter on all these files, yet to make it cleaner I'd prefer to do it by another RFE. -- Igor > >> Thanks, >> Vladimir K >> On 8/4/20 10:18 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >>>> 498 lines changed: 0 ins; 132 del; 366 mod; >>> Hi all, >>> could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests?
>> from JBS: >>> main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. >>> >>> this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. >> the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and execution them "directly" by jtreg will lead to failures due to a few extra frames from jtreg. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 >> testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >> Thanks, >> -- Igor From vladimir.kozlov at oracle.com Wed Aug 5 16:48:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 09:48:43 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests In-Reply-To: References: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> Message-ID: On 8/5/20 9:44 AM, Igor Ignatyev wrote: > > >> On Aug 5, 2020, at 9:16 AM, Vladimir Kozlov wrote: >> >> Hi Igor >> >> We were always told to use '/othervm' only if additional VM flags are specified. >> Also based on RFE description making classes public will allow to execute them directly by jtreg. >> >> So why you use '/othervm'? > > /othervm tests are run directly by jtreg, as opposed to tests which use ExecDriver, where jtreg runs ExecDriver and ExecDriver spawns a new process to run a test. > > I used to /othervm to keep the tests closer to their current state, i.e. each test is run in a separate clean JVM. removing /othervm would require a bit more detail analysis on wherever these tests really require clean state, I'd prefer to do separately. Okay. Add this to RFE comment to avoid confusion later. > >> >> Also since you cleaning all this test can you use uniform format for class declaration line. >> I see different variations: >> >> public class DivTest{ >> >> public class Filtering >> { >> >> public class Robert >> { >> >> public class collapse { >> >> I think the last example is what we usually use. >> >> Code indent is also all over places. >> >> I understand that fixing many files by hand would be hard. But we you can do something (with script) which will not take a lot of your time we should do that. > I guess I can run some auto-formater on all these files, yet to make it cleaner I'd prefer to do it by another RFE. Agree. Thanks, Vladimir K > > -- Igor > >> >> Thanks, >> Vladimir K >> >> On 8/4/20 10:18 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >>>> 498 lines changed: 0 ins; 132 del; 366 mod; >>> Hi all, >>> could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests? >>> from JBS: >>>> main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. >>>> >>>> this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. >>> the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and execution them "directly" by jtreg will lead to failures due to a few extra frames from jtreg. 
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 >>> testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 >>> webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >>> Thanks, >>> -- Igor From leonid.mesnik at oracle.com Wed Aug 5 17:54:01 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 5 Aug 2020 10:54:01 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers Message-ID: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> Hi Could you please review the following fix, which disables testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed.

bug: https://bugs.openjdk.java.net/browse/JDK-8161684
diff:

diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java
--- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200
+++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700
@@ -380,6 +380,10 @@
             return "false";
         }

+        if (WB.getBooleanVMFlag("VerifyOops")) {
+            return "false";
+        }
+
         switch (GC.selected()) {
             case Serial:
             case Parallel:

Leonid

From vladimir.kozlov at oracle.com Wed Aug 5 18:22:01 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 11:22:01 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> References: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> Message-ID: Hi Leonid, Dean is working on 8209961 fix and it can be done 'soon'. How urgent are your changes? Can you wait a little? Thanks, Vladimir K On 8/5/20 10:54 AM, Leonid Mesnik wrote: > Hi > Could you please review the following fix, which disables testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed.
>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >> diff: >> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >> @@ -380,6 +380,10 @@ >> return "false"; >> } >> + if (WB.getBooleanVMFlag("VerifyOops")) { >> + return "false"; >> + } >> + >> switch (GC.selected()) { >> case Serial: >> case Parallel: >> Leonid From vladimir.kozlov at oracle.com Wed Aug 5 18:53:32 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 11:53:32 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <733129AD-775A-4042-8373-B78E3CDCE47D@oracle.com> References: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> <733129AD-775A-4042-8373-B78E3CDCE47D@oracle.com> Message-ID: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> Okay. Then you change is good. Thanks, Vladimir K On 8/5/20 11:35 AM, Leonid Mesnik wrote: > Hi > > I checked with Dean status of 8209961. He said that he run into some issue and it makes sense to disable VerifyOops for AOT now. > > Leonid > >> On Aug 5, 2020, at 11:22 AM, Vladimir Kozlov wrote: >> >> Hi Leonid, >> >> Dean is working on 8209961 fix and it can be done 'soon'. >> >> How urgent your changes? Can you wait a little? >> >> Thanks, >> Vladimir K >> >> On 8/5/20 10:54 AM, Leonid Mesnik wrote: >>> Hi >>> Could you please review following fix which disable testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed. >>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >>> diff: >>> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >>> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >>> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >>> @@ -380,6 +380,10 @@ >>> return "false"; >>> } >>> + if (WB.getBooleanVMFlag("VerifyOops")) { >>> + return "false"; >>> + } >>> + >>> switch (GC.selected()) { >>> case Serial: >>> case Parallel: >>> Leonid > From vladimir.x.ivanov at oracle.com Wed Aug 5 19:16:30 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 5 Aug 2020 22:16:30 +0300 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: Thanks for the review, Vladimir. > compile.cpp: what is next comment about? > > +? // FIXME for_igvn() is corrupted from here: new_worklist which is > set_for_ignv() was allocated on stack. It documents a bug in the preceding code which makes for_igvn() node list unusable beyond that point: 2098 if (!failing() && RenumberLiveNodes && live_nodes() + NodeLimitFudgeFactor < unique()) { 2099 Compile::TracePhase tp("", &timers[_t_renumberLive]); 2100 initial_gvn()->replace_with(&igvn); 2101 for_igvn()->clear(); 2102 Unique_Node_List new_worklist(C->comp_arena()); 2103 { 2104 ResourceMark rm; 2105 PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist); 2106 } 2107 set_for_igvn(&new_worklist); 2108 igvn = PhaseIterGVN(initial_gvn()); 2109 igvn.optimize(); 2110 } 2111 2112 // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. I'm fine with removing the commend and filing a bug instead. > print_method(): NodeClassNames[] should be available in product. > Node::Name() method is not, but we can move it to product. 
But I am fine > to do that later. Good point. I'll migrate print_method() to NodeClassNames[] for now. > Why VectorSupport.java does not have copyright header? Good catch! Will fix it and incorporate into the webrev in-place shortly. Best regards, Vladimir Ivanov > On 7/28/20 3:29 PM, Vladimir Ivanov wrote: >> Hi, >> >> Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and >> Ekaterina! >> >> Here are the latest changes for Vector API support in HotSpot shared >> code: >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 >> >> >> Incremental changes (diff against webrev.00): >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 >> >> >> I decided to post it here and not initiate a new round of reviews >> because the changes are mostly limited to minor cleanups / simple bug >> fixes. >> >> Detailed summary: >> ?? - rebased to jdk/jdk tip; >> ?? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; >> ?? - restore lazy cleanup logic during incremental inlining (see >> needs_cleanup in compile.cpp); >> ?? - got rid of x86-specific changes in shared code; >> ?? - fix for 8244867 [1]; >> ?? - fix Graal test failure: enumerate VectorSupport intrinsics in >> CheckGraalIntrinsics >> ?? - numerous minor cleanups >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 >> ???? http://jbs.oracle.com/browse/JDK-8244867 >> ???? 8244867: 2 vector api tests crash with >> assert(is_reference_type(basic_type())) failed: wrong type >> Summary: Adding safety checks to prevent intrinsification if class >> arguments of non-primitive types are uninitialized. >> >> On 04.04.2020 02:12, Vladimir Ivanov wrote: >>> Hi, >>> >>> Following up on review requests of API [0] and Java implementation >>> [1] for Vector API (JEP 338 [2]), here's a request for review of >>> general HotSpot changes (in shared code) required for supporting the >>> API: >>> >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >>> >>> >>> (First of all, to set proper expectations: since the JEP is still in >>> Candidate state, the intention is to initiate preliminary round(s) of >>> review to inform the community and gather feedback before sending out >>> final/official RFRs once the JEP is Targeted to a release.) >>> >>> Vector API (being developed in Project Panama [3]) relies on JVM >>> support to utilize optimal vector hardware instructions at runtime. >>> It interacts with JVM through intrinsics (declared in >>> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >>> operations support in C2 JIT-compiler. >>> >>> As Paul wrote earlier: "A vector intrinsic is an internal low-level >>> vector operation. The last argument to the intrinsic is fall back >>> behavior in Java, implementing the scalar operation over the number >>> of elements held by the vector.? Thus, If the intrinsic is not >>> supported in C2 for the other arguments then the Java implementation >>> is executed (the Java implementation is always executed when running >>> in the interpreter or for C1)." >>> >>> The rest of JVM support is about aggressively optimizing vector boxes >>> to minimize (ideally eliminate) the overhead of boxing for vector >>> values. >>> It's a stop-the-gap solution for vector box elimination problem until >>> inline classes arrive. 
Vector classes are value-based and in the >>> longer term will be migrated to inline classes once the support >>> becomes available. >>> >>> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >>> implementation and some details. >>> >>> Complete implementation resides in vector-unstable branch of >>> panama/dev repository [6]. >>> >>> Now to gory details (the patch is split in multiple "sub-webrevs"): >>> >>> =========================================================== >>> >>> (1) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >>> >>> >>> Ideal vector nodes for new operations introduced by Vector API. >>> >>> (Platform-specific back end support will be posted for review >>> separately). >>> >>> =========================================================== >>> >>> (2) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >>> >>> >>> JVM Java interface (VectorSupport) and intrinsic support in C2. >>> >>> Vector instances are initially represented as VectorBox macro nodes >>> and "unboxing" is represented by VectorUnbox node. It simplifies >>> vector box elimination analysis and the nodes are expanded later >>> right before EA pass. >>> >>> Vectors have 2-level on-heap representation: for the vector value >>> primitive array is used as a backing storage and it is encapsulated >>> in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains >>> a int[8] instance which is used to store vector value). >>> >>> Unless VectorBox node goes away, it needs to be expanded into an >>> allocation eventually, but it is a pure node and doesn't have any JVM >>> state associated with it. The problem is solved by keeping JVM state >>> separately in a VectorBoxAllocate node associated with VectorBox node >>> and use it during expansion. >>> >>> Also, to simplify vector box elimination, inlining of vector reboxing >>> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >>> >>> =========================================================== >>> >>> (3) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >>> >>> >>> Vector box elimination analysis implementation. (Brief overview: >>> slides #36-42 [5].) >>> >>> The main part is devoted to scalarization across safepoints and >>> rematerialization support during deoptimization. In C2-generated code >>> vector operations work with raw vector values which live in registers >>> or spilled on the stack and it allows to avoid boxing/unboxing when a >>> vector value is alive across a safepoint. As with other values, >>> there's just a location of the vector value at the safepoint and >>> vector type information recorded in the relevant nmethod metadata and >>> all the heavy-lifting happens only when rematerialization takes place. >>> >>> The analysis preserves object identity invariants except during >>> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >>> >>> (Aggressive reboxing is crucial for cases when vectors "escape": it >>> allocates a fresh instance at every escape point thus enabling >>> original instance to go away.) >>> >>> =========================================================== >>> >>> (4) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >>> >>> >>> HotSpot changes for jdk.incubator.vector module. Vector support is >>> makred experimental and turned off by default. 
JEP 338 proposes the >>> API to be released as an incubator module, so a user has to specify >>> "--add-module jdk.incubator.vector" on the command line to be able to >>> use it. >>> When user does that, JVM automatically enables Vector API support. >>> It improves usability (user doesn't need to separately "open" the API >>> and enable JVM support) while minimizing risks of destabilitzation >>> from new code when the API is not used. >>> >>> >>> That's it! Will be happy to answer any questions. >>> >>> And thanks in advance for any feedback! >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [0] >>> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >>> >>> >>> [1] >>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >>> >>> >>> [2] https://openjdk.java.net/jeps/338 >>> >>> [3] https://openjdk.java.net/projects/panama/ >>> >>> [4] >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >>> >>> >>> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >>> >>> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >>> >>> ???? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b >>> vector-unstable From vladimir.x.ivanov at oracle.com Wed Aug 5 19:17:00 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 5 Aug 2020 22:17:00 +0300 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: <9c538834-903b-5431-bb43-908b58a1b70a@oracle.com> References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> <9c538834-903b-5431-bb43-908b58a1b70a@oracle.com> Message-ID: <90f71dc2-8ff0-5956-d08d-0af28f59c7df@oracle.com> Thanks for the review, Coleen. Best regards, Vladimir Ivanov On 31.07.2020 22:38, coleen.phillimore at oracle.com wrote: > The runtime code still looks good to me. > Coleen > > On 7/28/20 6:29 PM, Vladimir Ivanov wrote: >> Hi, >> >> Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and >> Ekaterina! >> >> Here are the latest changes for Vector API support in HotSpot shared >> code: >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 >> >> >> Incremental changes (diff against webrev.00): >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 >> >> >> I decided to post it here and not initiate a new round of reviews >> because the changes are mostly limited to minor cleanups / simple bug >> fixes. >> >> Detailed summary: >> ? - rebased to jdk/jdk tip; >> ? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; >> ? - restore lazy cleanup logic during incremental inlining (see >> needs_cleanup in compile.cpp); >> ? - got rid of x86-specific changes in shared code; >> ? - fix for 8244867 [1]; >> ? - fix Graal test failure: enumerate VectorSupport intrinsics in >> CheckGraalIntrinsics >> ? - numerous minor cleanups >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 >> ??? http://jbs.oracle.com/browse/JDK-8244867 >> ??? 8244867: 2 vector api tests crash with >> assert(is_reference_type(basic_type())) failed: wrong type >> Summary: Adding safety checks to prevent intrinsification if class >> arguments of non-primitive types are uninitialized. 
>> >> On 04.04.2020 02:12, Vladimir Ivanov wrote: >>> Hi, >>> >>> Following up on review requests of API [0] and Java implementation >>> [1] for Vector API (JEP 338 [2]), here's a request for review of >>> general HotSpot changes (in shared code) required for supporting the >>> API: >>> >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >>> >>> >>> (First of all, to set proper expectations: since the JEP is still in >>> Candidate state, the intention is to initiate preliminary round(s) of >>> review to inform the community and gather feedback before sending out >>> final/official RFRs once the JEP is Targeted to a release.) >>> >>> Vector API (being developed in Project Panama [3]) relies on JVM >>> support to utilize optimal vector hardware instructions at runtime. >>> It interacts with JVM through intrinsics (declared in >>> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >>> operations support in C2 JIT-compiler. >>> >>> As Paul wrote earlier: "A vector intrinsic is an internal low-level >>> vector operation. The last argument to the intrinsic is fall back >>> behavior in Java, implementing the scalar operation over the number >>> of elements held by the vector.? Thus, If the intrinsic is not >>> supported in C2 for the other arguments then the Java implementation >>> is executed (the Java implementation is always executed when running >>> in the interpreter or for C1)." >>> >>> The rest of JVM support is about aggressively optimizing vector boxes >>> to minimize (ideally eliminate) the overhead of boxing for vector >>> values. >>> It's a stop-the-gap solution for vector box elimination problem until >>> inline classes arrive. Vector classes are value-based and in the >>> longer term will be migrated to inline classes once the support >>> becomes available. >>> >>> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >>> implementation and some details. >>> >>> Complete implementation resides in vector-unstable branch of >>> panama/dev repository [6]. >>> >>> Now to gory details (the patch is split in multiple "sub-webrevs"): >>> >>> =========================================================== >>> >>> (1) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >>> >>> >>> Ideal vector nodes for new operations introduced by Vector API. >>> >>> (Platform-specific back end support will be posted for review >>> separately). >>> >>> =========================================================== >>> >>> (2) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >>> >>> >>> JVM Java interface (VectorSupport) and intrinsic support in C2. >>> >>> Vector instances are initially represented as VectorBox macro nodes >>> and "unboxing" is represented by VectorUnbox node. It simplifies >>> vector box elimination analysis and the nodes are expanded later >>> right before EA pass. >>> >>> Vectors have 2-level on-heap representation: for the vector value >>> primitive array is used as a backing storage and it is encapsulated >>> in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains >>> a int[8] instance which is used to store vector value). >>> >>> Unless VectorBox node goes away, it needs to be expanded into an >>> allocation eventually, but it is a pure node and doesn't have any JVM >>> state associated with it. 
The problem is solved by keeping JVM state >>> separately in a VectorBoxAllocate node associated with VectorBox node >>> and use it during expansion. >>> >>> Also, to simplify vector box elimination, inlining of vector reboxing >>> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >>> >>> =========================================================== >>> >>> (3) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >>> >>> >>> Vector box elimination analysis implementation. (Brief overview: >>> slides #36-42 [5].) >>> >>> The main part is devoted to scalarization across safepoints and >>> rematerialization support during deoptimization. In C2-generated code >>> vector operations work with raw vector values which live in registers >>> or spilled on the stack and it allows to avoid boxing/unboxing when a >>> vector value is alive across a safepoint. As with other values, >>> there's just a location of the vector value at the safepoint and >>> vector type information recorded in the relevant nmethod metadata and >>> all the heavy-lifting happens only when rematerialization takes place. >>> >>> The analysis preserves object identity invariants except during >>> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >>> >>> (Aggressive reboxing is crucial for cases when vectors "escape": it >>> allocates a fresh instance at every escape point thus enabling >>> original instance to go away.) >>> >>> =========================================================== >>> >>> (4) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >>> >>> >>> HotSpot changes for jdk.incubator.vector module. Vector support is >>> makred experimental and turned off by default. JEP 338 proposes the >>> API to be released as an incubator module, so a user has to specify >>> "--add-module jdk.incubator.vector" on the command line to be able to >>> use it. >>> When user does that, JVM automatically enables Vector API support. >>> It improves usability (user doesn't need to separately "open" the API >>> and enable JVM support) while minimizing risks of destabilitzation >>> from new code when the API is not used. >>> >>> >>> That's it! Will be happy to answer any questions. >>> >>> And thanks in advance for any feedback! >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [0] >>> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >>> >>> >>> [1] >>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >>> >>> >>> [2] https://openjdk.java.net/jeps/338 >>> >>> [3] https://openjdk.java.net/projects/panama/ >>> >>> [4] >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >>> >>> >>> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >>> >>> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >>> >>> ???? 
$ hg clone http://hg.openjdk.java.net/panama/dev/ -b >>> vector-unstable > From vladimir.kozlov at oracle.com Wed Aug 5 19:18:39 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 12:18:39 -0700 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: <6f25a6c6-c675-ee46-596d-f97a4119b95a@oracle.com> On 8/5/20 12:16 PM, Vladimir Ivanov wrote: > Thanks for the review, Vladimir. > >> compile.cpp: what is next comment about? >> >> +? // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. > > It documents a bug in the preceding code which makes for_igvn() node list unusable beyond that point: > > 2098?? if (!failing() && RenumberLiveNodes && live_nodes() + NodeLimitFudgeFactor < unique()) { > 2099???? Compile::TracePhase tp("", &timers[_t_renumberLive]); > 2100???? initial_gvn()->replace_with(&igvn); > 2101???? for_igvn()->clear(); > 2102???? Unique_Node_List new_worklist(C->comp_arena()); > 2103???? { > 2104?????? ResourceMark rm; > 2105?????? PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist); > 2106???? } > 2107???? set_for_igvn(&new_worklist); > 2108???? igvn = PhaseIterGVN(initial_gvn()); > 2109???? igvn.optimize(); > 2110?? } > 2111 > 2112?? // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. > > I'm fine with removing the commend and filing a bug instead. Yes, please. > >> print_method(): NodeClassNames[] should be available in product. Node::Name() method is not, but we can move it to >> product. But I am fine to do that later. > > Good point. I'll migrate print_method() to NodeClassNames[] for now. Okay. > >> Why VectorSupport.java does not have copyright header? > > Good catch! Will fix it and incorporate into the webrev in-place shortly. Thanks, Vladimir K > > Best regards, > Vladimir Ivanov > >> On 7/28/20 3:29 PM, Vladimir Ivanov wrote: >>> Hi, >>> >>> Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! >>> >>> Here are the latest changes for Vector API support in HotSpot shared code: >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 >>> >>> Incremental changes (diff against webrev.00): >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 >>> >>> I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor >>> cleanups / simple bug fixes. >>> >>> Detailed summary: >>> ?? - rebased to jdk/jdk tip; >>> ?? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; >>> ?? - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); >>> ?? - got rid of x86-specific changes in shared code; >>> ?? - fix for 8244867 [1]; >>> ?? - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics >>> ?? - numerous minor cleanups >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 >>> ???? http://jbs.oracle.com/browse/JDK-8244867 >>> ???? 8244867: 2 vector api tests crash with assert(is_reference_type(basic_type())) failed: wrong type >>> Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. 
>>> >>> On 04.04.2020 02:12, Vladimir Ivanov wrote: >>>> Hi, >>>> >>>> Following up on review requests of API [0] and Java implementation [1] for Vector API (JEP 338 [2]), here's a >>>> request for review of general HotSpot changes (in shared code) required for supporting the API: >>>> >>>> >>>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >>>> >>>> (First of all, to set proper expectations: since the JEP is still in Candidate state, the intention is to initiate >>>> preliminary round(s) of review to inform the community and gather feedback before sending out final/official RFRs >>>> once the JEP is Targeted to a release.) >>>> >>>> Vector API (being developed in Project Panama [3]) relies on JVM support to utilize optimal vector hardware >>>> instructions at runtime. It interacts with JVM through intrinsics (declared in jdk.internal.vm.vector.VectorSupport >>>> [4]) which expose vector operations support in C2 JIT-compiler. >>>> >>>> As Paul wrote earlier: "A vector intrinsic is an internal low-level vector operation. The last argument to the >>>> intrinsic is fall back behavior in Java, implementing the scalar operation over the number of elements held by the >>>> vector.? Thus, If the intrinsic is not supported in C2 for the other arguments then the Java implementation is >>>> executed (the Java implementation is always executed when running in the interpreter or for C1)." >>>> >>>> The rest of JVM support is about aggressively optimizing vector boxes to minimize (ideally eliminate) the overhead >>>> of boxing for vector values. >>>> It's a stop-the-gap solution for vector box elimination problem until inline classes arrive. Vector classes are >>>> value-based and in the longer term will be migrated to inline classes once the support becomes available. >>>> >>>> Vector API talk from JVMLS'18 [5] contains brief overview of JVM implementation and some details. >>>> >>>> Complete implementation resides in vector-unstable branch of panama/dev repository [6]. >>>> >>>> Now to gory details (the patch is split in multiple "sub-webrevs"): >>>> >>>> =========================================================== >>>> >>>> (1) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >>>> >>>> Ideal vector nodes for new operations introduced by Vector API. >>>> >>>> (Platform-specific back end support will be posted for review separately). >>>> >>>> =========================================================== >>>> >>>> (2) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >>>> >>>> JVM Java interface (VectorSupport) and intrinsic support in C2. >>>> >>>> Vector instances are initially represented as VectorBox macro nodes and "unboxing" is represented by VectorUnbox >>>> node. It simplifies vector box elimination analysis and the nodes are expanded later right before EA pass. >>>> >>>> Vectors have 2-level on-heap representation: for the vector value primitive array is used as a backing storage and >>>> it is encapsulated in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a int[8] instance which is >>>> used to store vector value). >>>> >>>> Unless VectorBox node goes away, it needs to be expanded into an allocation eventually, but it is a pure node and >>>> doesn't have any JVM state associated with it. 
The problem is solved by keeping JVM state separately in a >>>> VectorBoxAllocate node associated with VectorBox node and use it during expansion. >>>> >>>> Also, to simplify vector box elimination, inlining of vector reboxing calls (VectorSupport::maybeRebox) is delayed >>>> until the analysis is over. >>>> >>>> =========================================================== >>>> >>>> (3) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >>>> >>>> Vector box elimination analysis implementation. (Brief overview: slides #36-42 [5].) >>>> >>>> The main part is devoted to scalarization across safepoints and rematerialization support during deoptimization. In >>>> C2-generated code vector operations work with raw vector values which live in registers or spilled on the stack and >>>> it allows to avoid boxing/unboxing when a vector value is alive across a safepoint. As with other values, there's >>>> just a location of the vector value at the safepoint and vector type information recorded in the relevant nmethod >>>> metadata and all the heavy-lifting happens only when rematerialization takes place. >>>> >>>> The analysis preserves object identity invariants except during aggressive reboxing (guarded by >>>> -XX:+EnableAggressiveReboxing). >>>> >>>> (Aggressive reboxing is crucial for cases when vectors "escape": it allocates a fresh instance at every escape point >>>> thus enabling original instance to go away.) >>>> >>>> =========================================================== >>>> >>>> (4) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >>>> >>>> HotSpot changes for jdk.incubator.vector module. Vector support is makred experimental and turned off by default. >>>> JEP 338 proposes the API to be released as an incubator module, so a user has to specify "--add-module >>>> jdk.incubator.vector" on the command line to be able to use it. >>>> When user does that, JVM automatically enables Vector API support. >>>> It improves usability (user doesn't need to separately "open" the API and enable JVM support) while minimizing risks >>>> of destabilitzation from new code when the API is not used. >>>> >>>> >>>> That's it! Will be happy to answer any questions. >>>> >>>> And thanks in advance for any feedback! >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [0] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >>>> >>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >>>> >>>> [2] https://openjdk.java.net/jeps/338 >>>> >>>> [3] https://openjdk.java.net/projects/panama/ >>>> >>>> [4] >>>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >>>> >>>> >>>> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >>>> >>>> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >>>> >>>> ???? 
$ hg clone http://hg.openjdk.java.net/panama/dev/ -b vector-unstable From igor.ignatyev at oracle.com Wed Aug 5 19:52:25 2020 From: igor.ignatyev at oracle.com (igor.ignatyev at oracle.com) Date: Wed, 5 Aug 2020 12:52:25 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> References: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> Message-ID: <2638BD7F-407B-4C3E-9789-9DE1D4836382@oracle.com> Leonid, Could you please add a comment saying that this code should be reverted when 8209961 is fixed? ? Igor > On Aug 5, 2020, at 11:54 AM, Vladimir Kozlov wrote: > > ?Okay. Then you change is good. > > Thanks, > Vladimir K > >>> On 8/5/20 11:35 AM, Leonid Mesnik wrote: >> Hi >> I checked with Dean status of 8209961. He said that he run into some issue and it makes sense to disable VerifyOops for AOT now. >> Leonid >>>> On Aug 5, 2020, at 11:22 AM, Vladimir Kozlov wrote: >>> Hi Leonid, >>> Dean is working on 8209961 fix and it can be done 'soon'. >>> How urgent your changes? Can you wait a little? >>> Thanks, >>> Vladimir K >>> On 8/5/20 10:54 AM, Leonid Mesnik wrote: >>>> Hi >>>> Could you please review following fix which disable testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed. >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >>>> diff: >>>> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >>>> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >>>> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >>>> @@ -380,6 +380,10 @@ >>>> return "false"; >>>> } >>>> + if (WB.getBooleanVMFlag("VerifyOops")) { >>>> + return "false"; >>>> + } >>>> + >>>> switch (GC.selected()) { >>>> case Serial: >>>> case Parallel: >>>> Leonid From leonid.mesnik at oracle.com Wed Aug 5 21:39:21 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 5 Aug 2020 14:39:21 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <2638BD7F-407B-4C3E-9789-9DE1D4836382@oracle.com> References: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> <2638BD7F-407B-4C3E-9789-9DE1D4836382@oracle.com> Message-ID: Sure, will do. Leonid > On Aug 5, 2020, at 12:52 PM, igor.ignatyev at oracle.com wrote: > > Leonid, > > Could you please add a comment saying that this code should be reverted when 8209961 is fixed? > > ? Igor > >> On Aug 5, 2020, at 11:54 AM, Vladimir Kozlov wrote: >> >> ?Okay. Then you change is good. >> >> Thanks, >> Vladimir K >> >>>> On 8/5/20 11:35 AM, Leonid Mesnik wrote: >>> Hi >>> I checked with Dean status of 8209961. He said that he run into some issue and it makes sense to disable VerifyOops for AOT now. >>> Leonid >>>>> On Aug 5, 2020, at 11:22 AM, Vladimir Kozlov wrote: >>>> Hi Leonid, >>>> Dean is working on 8209961 fix and it can be done 'soon'. >>>> How urgent your changes? Can you wait a little? >>>> Thanks, >>>> Vladimir K >>>> On 8/5/20 10:54 AM, Leonid Mesnik wrote: >>>>> Hi >>>>> Could you please review following fix which disable testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed. 
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >>>>> diff: >>>>> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >>>>> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >>>>> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >>>>> @@ -380,6 +380,10 @@ >>>>> return "false"; >>>>> } >>>>> + if (WB.getBooleanVMFlag("VerifyOops")) { >>>>> + return "false"; >>>>> + } >>>>> + >>>>> switch (GC.selected()) { >>>>> case Serial: >>>>> case Parallel: >>>>> Leonid > From igor.ignatyev at oracle.com Wed Aug 5 23:54:26 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 5 Aug 2020 16:54:26 -0700 Subject: RFR(S) : 8251126 : nsk.share.GoldChecker should read golden file from ${test.src} In-Reply-To: <41261a03-cd46-3d48-839b-d934a9fb92bb@oracle.com> References: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> <41261a03-cd46-3d48-839b-d934a9fb92bb@oracle.com> Message-ID: <35C137CF-698C-4396-B68E-98A158CE481F@oracle.com> Hi David, thanks for your review, pushed. -- Igor > On Aug 4, 2020, at 7:29 PM, David Holmes wrote: > > Hi Igor, > > This seems fine. The code cleanup looks good too. > > Thanks, > David > > On 5/08/2020 9:58 am, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ >>> 37 lines changed: 7 ins; 20 del; 10 mod; >> Hi all, >> could you please review this patch? >> from JBS: >>> as of now, nsk.share.GoldChecker reads golden files from the current directory, which makes it necessary to copy golden files from ${test.src} before the execution of the tests which use GoldChecker. >> after this patch, FileInstaller actions will become redundant in 103 of :vmTestbase_vm_compiler tests and will be removed by 8251127. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251126 >> webrev: http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ >> testing: :vmTestbase_vm_compiler tests >> 8251127: https://bugs.openjdk.java.net/browse/JDK-8251127 >> Thanks, >> -- Igor From Xiaohong.Gong at arm.com Thu Aug 6 02:43:24 2020 From: Xiaohong.Gong at arm.com (Xiaohong Gong) Date: Thu, 6 Aug 2020 02:43:24 +0000 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations Message-ID: Hi, Could you please help to review this simple patch? It adds the re-association for loop invariants with other associative operations in the C2 compiler. JBS: https://bugs.openjdk.java.net/browse/JDK-8250808 Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/ C2 has re-association of loop invariants. However, the current implementation only supports the re-associations for add and subtract with 32-bits integer type. For other associative expressions like multiplication and the logic operations, the re-association is also applicable, and also for the operations with long type. This patch adds the missing re-associations for other associative operations together with the support for long type. With this patch, the following expressions: (x * inv1) * inv2 (x | inv1) | inv2 (x & inv1) & inv2 (x ^ inv1) ^ inv2 ; inv1, inv2 are invariants can be re-associated to: x * (inv1 * inv2) ; "inv1 * inv2" can be hoisted x | (inv1 | inv2) ; "inv1 | inv2" can be hoisted x & (inv1 & inv2) ; "inv1 & inv2" can be hoisted x ^ (inv1 ^ inv2) ; "inv1 ^ inv2" can be hoisted Performance: Here is the micro benchmark: http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java And the results on X86_64: Before: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 988.142 ? 
0.110 ns/op loopInvariantAndInt 1024 avgt 15 843.850 ? 0.522 ns/op loopInvariantAndLong 1024 avgt 15 990.551 ? 10.458 ns/op loopInvariantMulInt 1024 avgt 15 1209.003 ? 0.247 ns/op loopInvariantMulLong 1024 avgt 15 1213.923 ? 0.438 ns/op loopInvariantOrInt 1024 avgt 15 843.908 ? 0.132 ns/op loopInvariantOrLong 1024 avgt 15 990.710 ? 10.484 ns/op loopInvariantSubLong 1024 avgt 15 988.170 ? 0.159 ns/op loopInvariantXorInt 1024 avgt 15 806.949 ? 7.860 ns/op loopInvariantXorLong 1024 avgt 15 990.963 ? 8.321 ns/op After: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 842.854 ? 9.036 ns/op loopInvariantAndInt 1024 avgt 15 698.097 ? 0.916 ns/op loopInvariantAndLong 1024 avgt 15 841.120 ? 0.118 ns/op loopInvariantMulInt 1024 avgt 15 691.000 ? 7.696 ns/op loopInvariantMulLong 1024 avgt 15 846.907 ? 0.189 ns/op loopInvariantOrInt 1024 avgt 15 698.423 ? 4.969 ns/op loopInvariantOrLong 1024 avgt 15 843.465 ? 10.196 ns/op loopInvariantSubLong 1024 avgt 15 841.314 ? 2.906 ns/op loopInvariantXorInt 1024 avgt 15 652.529 ? 0.556 ns/op loopInvariantXorLong 1024 avgt 15 841.860 ? 2.491 ns/op Results on AArch64: Before: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 514.437 ? 0.351 ns/op loopInvariantAndInt 1024 avgt 15 435.301 ? 0.415 ns/op loopInvariantAndLong 1024 avgt 15 572.437 ? 0.057 ns/op loopInvariantMulInt 1024 avgt 15 1154.544 ? 0.030 ns/op loopInvariantMulLong 1024 avgt 15 1188.109 ? 0.299 ns/op loopInvariantOrInt 1024 avgt 15 435.605 ? 0.977 ns/op loopInvariantOrLong 1024 avgt 15 572.475 ? 0.093 ns/op loopInvariantSubLong 1024 avgt 15 514.340 ? 0.154 ns/op loopInvariantXorInt 1024 avgt 15 426.186 ? 0.105 ns/op loopInvariantXorLong 1024 avgt 15 572.505 ? 0.259 ns/op After: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 508.179 ? 0.108 ns/op loopInvariantAndInt 1024 avgt 15 394.706 ? 0.199 ns/op loopInvariantAndLong 1024 avgt 15 434.443 ? 0.247 ns/op loopInvariantMulInt 1024 avgt 15 762.477 ? 0.079 ns/op loopInvariantMulLong 1024 avgt 15 775.975 ? 0.159 ns/op loopInvariantOrInt 1024 avgt 15 394.657 ? 0.156 ns/op loopInvariantOrLong 1024 avgt 15 434.428 ? 0.282 ns/op loopInvariantSubLong 1024 avgt 15 507.475 ? 0.151 ns/op loopInvariantXorInt 1024 avgt 15 396.000 ? 0.011 ns/op loopInvariantXorLong 1024 avgt 15 434.255 ? 0.099 ns/op Tests: Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1 and jcstress:tests-custom, and all tests pass without new failure. Thanks, Xiaohong Gong From luhenry at microsoft.com Thu Aug 6 04:36:07 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Thu, 6 Aug 2020 04:36:07 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Message-ID: Pushed with https://hg.openjdk.java.net/jdk/jdk/rev/b8231f177eaf Thank you to all involved ?? 
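The loop-invariant re-association proposed in the 8250808 RFR above boils down to rewriting (x op inv1) op inv2 into x op (inv1 op inv2) so that the invariant part can be hoisted out of the loop. A minimal Java sketch of the shape the optimization targets (illustrative only, not the benchmark from the webrev):

    // Before re-association: the invariant product inv1 * inv2 stays inside the
    // loop body, so every iteration performs two multiplies.
    static void mulPair(int[] a, int[] r, int inv1, int inv2) {
        for (int i = 0; i < a.length; i++) {
            r[i] = (a[i] * inv1) * inv2;
        }
    }

    // Roughly what the loop looks like once C2 re-associates to
    // a[i] * (inv1 * inv2): the invariant product is computed once outside the
    // loop (hoisted by hand here to make the effect visible) and each iteration
    // does a single multiply.
    static void mulPairHoisted(int[] a, int[] r, int inv1, int inv2) {
        int inv = inv1 * inv2;
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] * inv;
        }
    }

The same rewrite is valid for the &, | and ^ forms listed in the RFR, since those operators are associative as well.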
From christian.hagedorn at oracle.com Thu Aug 6 09:34:11 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 6 Aug 2020 11:34:11 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid Message-ID: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8249603 http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ Register allocation fails in C1 in the testcase because two intervals overlap (they both have the same stack slot assigned). The problem can be traced back to the optimization to assign the same spill slot to non-intersecting intervals in LinearScanWalker::combine_spilled_intervals(). In this method, we look at a split parent interval 'cur' and its register hint interval 'register_hint'. A register hint is present when the interval represents either the source or the target operand of a move operation and the register hint the target or source operand, respectively (the register hint is used to try to assign the same register to the source and target operand such that we can completely remove the move operation). If the register hint is set, then we do some additional checks and make sure that the split parent and the register hint do not intersect. If all checks pass, the split parent 'cur' gets the same spill slot as the register hint [1]. This means that both intervals get the same slot on the stack if they are spilled. The problem now is that we do not consider any split children of the register hint which all share the same spill slot with the register hint (their split parent). In the testcase, the split parent 'cur' does not intersect with the register hint but with one of its split children. As a result, they both get the same spill slot and are later indeed both spilled (i.e. both virtual registers/operands are put to the same stack location at the same time). The fix now additionally checks if the split parent 'cur' does not intersect any split children of the register hint in combine_spilled_intervals(). If there is such an intersection, then we bail out of the optimization. Some standard benchmark testing did not show any regressions. Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From jamsheed.c.m at oracle.com Thu Aug 6 12:07:40 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 6 Aug 2020 17:37:40 +0530 Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions Message-ID: Hi all, JBS: https://bugs.openjdk.java.net/browse/JDK-8249451 webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/ testing : mach1-5(links in jbs) While working on JDK-8246381 it was noticed that compilation request path clears all exceptions(including async) and doesn't propagate[1]. Fix: patch restores the propagation behavior for the probable async exceptions. Compilation request path propagate exception as in [2]. MDO and MethodCounter doesn't expect any exception other than metaspace OOM(added comments). Deoptimization path doesn't clear probable async exceptions and take unpack_exception path for non uncommontraps. Added java_lang_InternalError to well known classes. Request for review. Best Regards, Jamsheed [1] w.r.t changes done for JDK-7131259 [2] ??? (a) ??? -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp ????? | ?????? 
----- compilationPolicy.cpp/tieredThresholdPolicy.cpp ???????? | ????????? ------ compileBroker.cpp ??? (b) ??? Xcomp versions ??? ------> compilationPolicy.cpp ?????? | ??????? ------> compileBroker.cpp ??? (c) ??? Direct call to? compile_method in compileBroker.cpp ??? JVMCI bootstrap, whitebox, replayCompile. From tobias.hartmann at oracle.com Thu Aug 6 13:53:27 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 6 Aug 2020 15:53:27 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint Message-ID: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Hi, please review the following fix: https://bugs.openjdk.java.net/browse/JDK-8249608 http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' is equal. The fix is to make sure to always update 'max_vlen_in_bytes'. When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never pushed. I've added it to this webrev and extended it such that it also covers the new issue. 
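The failure mode described here can be pictured with a loop that mixes int and long stores of loop-invariant values. The sketch below only illustrates that shape; it is not the actual regression test added in the webrev:

    // Illustrative only: SuperWord forms a 4 x int pack (16 bytes) and a
    // 4 x long pack (32 bytes) for the two stores. If max_vlen_in_bytes is
    // only updated when the lane count grows, and the 16-byte int pack is
    // processed first, the 32-byte long pack never raises the maximum, so the
    // nmethod is not marked has_wide_vectors and the safepoint handler does
    // not save the full-width vector registers.
    static void test(int[] ints, long[] longs) {
        for (int i = 0; i < ints.length; i++) {
            longs[i] = 42L;   // 4 x long  -> 32-byte StoreVector
            ints[i]  = 42;    // 4 x int   -> 16-byte StoreVector
        }
    }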
Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8193518 [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 [3] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 [4] -XX:+TraceSuperWord output: After filter_packs packset Pack: 0 align: 0 1101 StoreL === 1115 1120 1102 174 [[ 1098 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 align: 8 1098 StoreL === 1115 1101 1099 174 [[ 993 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 align: 16 993 StoreL === 1115 1098 994 174 [[ 866 214 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 align: 24 214 StoreL === 1115 993 212 174 [[ 1120 864 255 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 Pack: 1 align: 0 1097 StoreI === 1115 1119 1106 41 [[ 1096 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 align: 4 1096 StoreI === 1115 1097 1104 41 [[ 989 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 align: 8 989 StoreI === 1115 1096 996 41 [[ 867 253 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 align: 12 253 StoreI === 1115 989 251 41 [[ 1119 860 255 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 new Vector node: 1491 ReplicateI === _ 41 [[]] #vectorx[4]:{int} new Vector node: 1492 StoreVector === 1115 1119 1106 1491 [[ 1487 1119 255 1486 ]] @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 new Vector node: 1493 ReplicateL === _ 174 [[]] #vectory[4]:{long} new Vector node: 1494 StoreVector === 1115 1120 1102 1493 [[ 1489 1120 255 1488 ]] @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 From vladimir.x.ivanov at oracle.com Thu Aug 6 14:07:43 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 6 Aug 2020 17:07:43 +0300 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Message-ID: <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> > http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ Looks good. Best regards, Vladimir Ivanov > > The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a > loop in a C2 compiled method is corrupted at a safepoint. 
Again, the root cause is the superword > optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod > being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. > > This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > > max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for > the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int > StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' > is equal. > > The fix is to make sure to always update 'max_vlen_in_bytes'. > > When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never > pushed. I've added it to this webrev and extended it such that it also covers the new issue. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8193518 > [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 > > [4] -XX:+TraceSuperWord output: > > After filter_packs > packset > Pack: 0 > align: 0 1101 StoreL === 1115 1120 1102 174 [[ 1098 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 8 1098 StoreL === 1115 1101 1099 174 [[ 993 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 16 993 StoreL === 1115 1098 994 174 [[ 866 214 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ > bci:17 Test::main @ bci:8 > align: 24 214 StoreL === 1115 993 212 174 [[ 1120 864 255 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > Pack: 1 > align: 0 1097 StoreI === 1115 1119 1106 41 [[ 1096 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 4 1096 StoreI === 1115 1097 1104 41 [[ 989 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 8 989 StoreI === 1115 1096 996 41 [[ 867 253 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 12 253 StoreI === 1115 989 251 41 [[ 1119 860 255 ]] @int[int:>=0]:exact+any *, > idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > > new Vector node: 1491 ReplicateI === _ 41 [[]] #vectorx[4]:{int} > new Vector node: 1492 StoreVector === 1115 1119 1106 1491 [[ 1487 1119 255 1486 ]] > @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched Memory: @int[int:>=0]:NotNull:exact+any *, > idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 > new Vector node: 1493 ReplicateL === _ 174 [[]] #vectory[4]:{long} > new Vector node: 1494 StoreVector === 1115 1120 1102 1493 [[ 1489 1120 255 1488 ]] > @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0]:NotNull:exact+any *, > 
idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 > From tobias.hartmann at oracle.com Thu Aug 6 14:11:38 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 6 Aug 2020 16:11:38 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> Message-ID: Thanks Vladimir! Best regards, Tobias On 06.08.20 16:07, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >> >> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >> is equal. >> >> The fix is to make sure to always update 'max_vlen_in_bytes'. >> >> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >> >> [4] -XX:+TraceSuperWord output: >> >> After filter_packs >> packset >> Pack: 0 >> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >> bci:17 Test::main @ bci:8 >> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 864? 255 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> Pack: 1 >> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? 
@int[int:>=0]:exact+any *, >> idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? >> @int[int:>=0]:exact+any *, >> idx=8;? Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> >> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >> From christian.hagedorn at oracle.com Thu Aug 6 14:28:21 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 6 Aug 2020 16:28:21 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Message-ID: Hi Tobias Looks good to me! Best regards, Christian On 06.08.20 15:53, Tobias Hartmann wrote: > Hi, > > please review the following fix: > https://bugs.openjdk.java.net/browse/JDK-8249608 > http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ > > The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a > loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword > optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod > being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. > > This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > > max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for > the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int > StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' > is equal. > > The fix is to make sure to always update 'max_vlen_in_bytes'. > > When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never > pushed. I've added it to this webrev and extended it such that it also covers the new issue. 
> > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8193518 > [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 > > [4] -XX:+TraceSuperWord output: > > After filter_packs > packset > Pack: 0 > align: 0 1101 StoreL === 1115 1120 1102 174 [[ 1098 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 8 1098 StoreL === 1115 1101 1099 174 [[ 993 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 16 993 StoreL === 1115 1098 994 174 [[ 866 214 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ > bci:17 Test::main @ bci:8 > align: 24 214 StoreL === 1115 993 212 174 [[ 1120 864 255 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > Pack: 1 > align: 0 1097 StoreI === 1115 1119 1106 41 [[ 1096 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 4 1096 StoreI === 1115 1097 1104 41 [[ 989 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 8 989 StoreI === 1115 1096 996 41 [[ 867 253 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 12 253 StoreI === 1115 989 251 41 [[ 1119 860 255 ]] @int[int:>=0]:exact+any *, > idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > > new Vector node: 1491 ReplicateI === _ 41 [[]] #vectorx[4]:{int} > new Vector node: 1492 StoreVector === 1115 1119 1106 1491 [[ 1487 1119 255 1486 ]] > @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched Memory: @int[int:>=0]:NotNull:exact+any *, > idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 > new Vector node: 1493 ReplicateL === _ 174 [[]] #vectory[4]:{long} > new Vector node: 1494 StoreVector === 1115 1120 1102 1493 [[ 1489 1120 255 1488 ]] > @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0]:NotNull:exact+any *, > idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 > From tobias.hartmann at oracle.com Thu Aug 6 14:29:12 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 6 Aug 2020 16:29:12 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Message-ID: Thanks Christian! Best regards, Tobias On 06.08.20 16:28, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me! 
> > Best regards, > Christian > > On 06.08.20 15:53, Tobias Hartmann wrote: >> Hi, >> >> please review the following fix: >> https://bugs.openjdk.java.net/browse/JDK-8249608 >> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ >> >> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >> >> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >> is equal. >> >> The fix is to make sure to always update 'max_vlen_in_bytes'. >> >> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >> >> [4] -XX:+TraceSuperWord output: >> >> After filter_packs >> packset >> Pack: 0 >> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >> bci:17 Test::main @ bci:8 >> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 864? 255 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> Pack: 1 >> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? @int[int:>=0]:exact+any *, >> idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? >> @int[int:>=0]:exact+any *, >> idx=8;? 
Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> >> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >> From vladimir.kozlov at oracle.com Thu Aug 6 19:00:00 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 6 Aug 2020 12:00:00 -0700 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> Message-ID: <8ddafcf8-5fcf-c0cc-ccd0-29692dd1c19b@oracle.com> +1 Thanks, Vladimir K On 8/6/20 7:07 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >> >> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >> is equal. >> >> The fix is to make sure to always update 'max_vlen_in_bytes'. >> >> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >> >> [4] -XX:+TraceSuperWord output: >> >> After filter_packs >> packset >> Pack: 0 >> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? @long[int:>=0]:exact+any *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? 
@long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >> bci:17 Test::main @ bci:8 >> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 864? 255 ]]? @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> Pack: 1 >> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? @int[int:>=0]:exact+any *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? @int[int:>=0]:exact+any *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? @int[int:>=0]:exact+any *, >> idx=8;? Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> >> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >> From vladimir.kozlov at oracle.com Thu Aug 6 19:19:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 6 Aug 2020 12:19:13 -0700 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Message-ID: Fix looks good. And very nice description of the issue. Thanks, Vladimir K On 8/6/20 2:34 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8249603 > http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ > > Register allocation fails in C1 in the testcase because two intervals overlap (they both have the same stack slot > assigned). The problem can be traced back to the optimization to assign the same spill slot to non-intersecting > intervals in LinearScanWalker::combine_spilled_intervals(). > > In this method, we look at a split parent interval 'cur' and its register hint interval 'register_hint'. A register hint > is present when the interval represents either the source or the target operand of a move operation and the register > hint the target or source operand, respectively (the register hint is used to try to assign the same register to the > source and target operand such that we can completely remove the move operation). 
> > If the register hint is set, then we do some additional checks and make sure that the split parent and the register hint > do not intersect. If all checks pass, the split parent 'cur' gets the same spill slot as the register hint [1]. This > means that both intervals get the same slot on the stack if they are spilled. > > The problem now is that we do not consider any split children of the register hint which all share the same spill slot > with the register hint (their split parent). In the testcase, the split parent 'cur' does not intersect with the > register hint but with one of its split children. As a result, they both get the same spill slot and are later indeed > both spilled (i.e. both virtual registers/operands are put to the same stack location at the same time). > > The fix now additionally checks if the split parent 'cur' does not intersect any split children of the register hint in > combine_spilled_intervals(). If there is such an intersection, then we bail out of the optimization. > > Some standard benchmark testing did not show any regressions. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From vladimir.kozlov at oracle.com Thu Aug 6 21:45:53 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 6 Aug 2020 14:45:53 -0700 Subject: [16] (S) RFR 8251260: two MD5 tests fail "RuntimeException: Unexpected count of intrinsic" Message-ID: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8251260 New MD5 intrinsic tests failed when run with AOTed java.base. And old SHA tests are problem listed for AOT. SHA and MD5 intrinsic tests parse -XX:+LogCompilation output looking for compilation of sun/security/provider methods as intrinsics. But these methods are already pre-compiled by AOT when AOTed java.base is used. As result LogCompilation does not have corresponding entries. I think we should not run these MD5 and SHA tests with AOTed java.base module. I added corresponding @requires. Old SHA tests were problem listed referencing 8167430 [1] bug but I think it is incorrect. The original SHA tests crash with AOT 8207358 [2] bug was closed as duplicate of 8167430 because of conflict how intirnsics flags are set by default during AOT compilation. But we simply should not run these tests with AOTed java.base. So I am adding @requires to them as well and removing them from AOT problem list. Tested hs-tier1, hs-tier2 (runs sha,md5 tests), hs-tier6 (now skips sha,md5 tests when AOTed java.base is used). Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8167430 [2] https://bugs.openjdk.java.net/browse/JDK-8207358 From verghese at amazon.com Thu Aug 6 23:49:39 2020 From: verghese at amazon.com (Verghese, Clive) Date: Thu, 6 Aug 2020 23:49:39 +0000 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp Message-ID: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp I have tested this builds successfully for both PRODUCT and !PRODUCT. Ensured that there are no regressions in hotspot:tier1 tests. 
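The @requires gating described for 8251260 above is expressed in the jtreg test descriptor. A hedged sketch of what such a header can look like; the class name is made up, and vm.aot.enabled is assumed here as the relevant VMProps property (the exact property used in the actual change may differ):

    /*
     * @test
     * @summary intrinsic test that parses -XX:+LogCompilation output and
     *          therefore must not run against an AOT-compiled java.base
     * @requires !vm.aot.enabled
     * @run main/othervm SomeIntrinsicTest
     */
    public class SomeIntrinsicTest {
        public static void main(String[] args) {
            // test body elided; the point is the @requires line above, which
            // keeps jtreg from scheduling this test when vm.aot.enabled is true
        }
    }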
Regards, Clive Verghese From christian.hagedorn at oracle.com Fri Aug 7 06:53:08 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 7 Aug 2020 08:53:08 +0200 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> Message-ID: <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> Hi Clive The fix looks good to me. It makes sense to move it to chaitin.cpp since the calls to verify() are also in this file only. You could fix some minor code style things about the existing code that you moved while at it: - You can move the #ifdef ASSERT out of both methods and surround both methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() are only called in ASSERT blocks. And add a // ASSERT comment on the closing #endif to make it more clear. Don't forget to also surround the declarations in the .hpp file with an ASSERT. - In verify_base_ptrs(): - L2330: Missing curly braces for the loop - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea *a -> ResourceArea* a - There is a missing space in all asserts after the comma separating the condition and the failure string - In verify(): - L2386: Missing space and curly braces for the if statement Best regards, Christian On 07.08.20 01:49, Verghese, Clive wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 > > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp > > I have tested this builds successfully for both PRODUCT and !PRODUCT. > > Ensured that there are no regressions in hotspot:tier1 tests. > > > Regards, > Clive Verghese > From christian.hagedorn at oracle.com Fri Aug 7 06:55:24 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 7 Aug 2020 08:55:24 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Message-ID: <141abed0-ea8f-8c93-6031-4deebd799af0@oracle.com> Thanks a lot Vladimir! Best regards, Christian On 06.08.20 21:19, Vladimir Kozlov wrote: > Fix looks good. And very nice description of the issue. > > Thanks, > Vladimir K > > On 8/6/20 2:34 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8249603 >> http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ >> >> Register allocation fails in C1 in the testcase because two intervals >> overlap (they both have the same stack slot assigned). The problem can >> be traced back to the optimization to assign the same spill slot to >> non-intersecting intervals in >> LinearScanWalker::combine_spilled_intervals(). >> >> In this method, we look at a split parent interval 'cur' and its >> register hint interval 'register_hint'. A register hint is present >> when the interval represents either the source or the target operand >> of a move operation and the register hint the target or source >> operand, respectively (the register hint is used to try to assign the >> same register to the source and target operand such that we can >> completely remove the move operation). 
>> >> If the register hint is set, then we do some additional checks and >> make sure that the split parent and the register hint do not >> intersect. If all checks pass, the split parent 'cur' gets the same >> spill slot as the register hint [1]. This means that both intervals >> get the same slot on the stack if they are spilled. >> >> The problem now is that we do not consider any split children of the >> register hint which all share the same spill slot with the register >> hint (their split parent). In the testcase, the split parent 'cur' >> does not intersect with the register hint but with one of its split >> children. As a result, they both get the same spill slot and are later >> indeed both spilled (i.e. both virtual registers/operands are put to >> the same stack location at the same time). >> >> The fix now additionally checks if the split parent 'cur' does not >> intersect any split children of the register hint in >> combine_spilled_intervals(). If there is such an intersection, then we >> bail out of the optimization. >> >> Some standard benchmark testing did not show any regressions. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 >> From nick.gasson at arm.com Fri Aug 7 09:04:49 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 07 Aug 2020 17:04:49 +0800 Subject: RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop Message-ID: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8247354 Webrev: http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/ Running jtreg test vmTestbase/nsk/jdb/pop/pop001/pop001.java with -Xcomp causes this assertion failure: assert(oopDesc::is_oop(obj)) failed: not an oop: 0x0000ffff60b334c0 This test has a sequence of method calls func1(0) -> func2(1) -> ... -> func5(4) -> lastBreak() with a breakpoint in lastBreak(). func{2..5} are inlined into func1 when compiled. At the breakpoint the debugger is used to pop four frames and then continue executing from func2. This causes func1 to be deoptimized but the recreated interpreter frame for func2 has garbage values in its temporary expression stack (the parameters for func3), which triggers the above assertion when the invoke bytecode re-executes. The outgoing parameters in func2's expression stack should be filled in when we recreate the locals for func3. But on AArch64 the template interpreter inserts padding between the locals block and the saved sender SP to align the machine SP to 16-bytes. This extra padding is accounted for by AbstractInterpreter::size_activation() but not when recreating the frame in layout_activation(). This causes the incoming parameters in the callee frame to be misaligned with outgoing parameters in the caller frame. This patch fixes that by using the caller's ESP to calculate the location of the locals if the caller is an interpreted frame. Tested jtreg hotspot_all_no_apps, jdk_core plus tier1 with -XX:+DeoptimizeALot. 
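The call chain in this test has roughly the following shape; this is an illustrative outline, not the actual nsk/jdb pop001 source:

    // Illustrative outline: func2..func5 are small enough to be inlined into
    // the compiled func1, and lastBreak() carries the breakpoint. Popping four
    // frames in jdb and resuming in func2 forces func1 to deoptimize, and the
    // rebuilt interpreter frame for func2 must again hold func3's outgoing
    // arguments on its expression stack.
    static void func1(int i) { func2(i + 1); }
    static void func2(int i) { func3(i + 1); }
    static void func3(int i) { func4(i + 1); }
    static void func4(int i) { func5(i + 1); }
    static void func5(int i) { lastBreak(); }

    static void lastBreak() {
        // the jdb side of the test sets its breakpoint here
    }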
-- Thanks, Nick From jatin.bhateja at intel.com Fri Aug 7 09:27:38 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Fri, 7 Aug 2020 09:27:38 +0000 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> Message-ID: Hi Vladimir, Please let me know if final version looks fine to you. Also, if clearance from a second reviewer mandatory here or can we push this to trunk is no more comments. Best Regards, Jatin > -----Original Message----- > From: hotspot-compiler-dev On > Behalf Of Bhateja, Jatin > Sent: Sunday, August 2, 2020 11:55 PM > To: Vladimir Ivanov > Cc: Viswanathan, Sandhya ; hotspot-compiler- > dev at openjdk.java.net > Subject: RE: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 > > Hi Vladimir, > > Final patch is placed at following link. > > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ > > One more reviewer approval needed. > > Best Regards, > Jatin > > > -----Original Message----- > > From: Vladimir Ivanov > > Sent: Saturday, August 1, 2020 4:49 AM > > To: Bhateja, Jatin > > Cc: Viswanathan, Sandhya ; > > hotspot-compiler- dev at openjdk.java.net > > Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for > > X86 > > > > > > > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ > > > > Looks good. > > > > Tier5 (where I saw the crashes) passed. > > > > Please, incorporate the following minor cleanups in the final version: > > > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu > > p/ > > > > (Tested with hs-tier1,hs-tier2.) > > > > Best regards, > > Vladimir Ivanov > > > > >> -----Original Message----- > > >> From: Vladimir Ivanov > > >> Sent: Thursday, July 30, 2020 3:30 AM > > >> To: Bhateja, Jatin > > >> Cc: Viswanathan, Sandhya ; > > >> hotspot-compiler- dev at openjdk.java.net > > >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification > > >> for > > >> X86 > > >> > > >> > > >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ > > >>> > > >>> Looks good. (Testing is in progress.) > > >> > > >> FYI test results are clean (tier1-tier5). > > >> > > >>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines > > >>>> since we are anyways doing constant folding in LShiftI/URShiftI > > >>>> value routines. Since JAVA rotate APIs are no longer intrincified > > >>>> hence these routines may no longer be useful. > > >>> > > >>> Nice observation! Good. > > >> > > >> As a second thought, it seems there's still a chance left that > > >> Rotate nodes get their input type narrowed after the folding > > >> happened. For example, as a result of incremental inlining or CFG > > >> transformations during loop optimizations. And it does happen in > > >> practice since the testing revealed some crashes due to the bug in > > RotateLeftNode/RotateRightNode::Ideal(). > > >> > > >> So, it makes sense to keep the transformations. But I'm fine with > > >> addressing that as a followup enhancement. > > >> > > >> Best regards, > > >> Vladimir Ivanov > > >> > > >>> > > >>>>> It would be really nice to migrate to MacroAssembler along the > > >>>>> way (as a cleanup). > > >>>> > > >>>> I guess you are saying remove opcodes/encoding from patterns and > > >>>> move then to Assembler, Can we take this cleanup activity > > >>>> separately since other patterns are also using these matcher > > directives. 
> > >>> > > >>> I'm perfectly fine with handling it as a separate enhancement. > > >>> > > >>>> Other synthetic comments have been taken care of. I have extended > > >>>> the Test to cover all the newly added scalar transforms. Kindly > > >>>> let me know if there other comments. > > >>> > > >>> Nice! > > >>> > > >>> Best regards, > > >>> Vladimir Ivanov > > >>> > > >>>>> -----Original Message----- > > >>>>> From: Vladimir Ivanov > > >>>>> Sent: Friday, July 24, 2020 3:21 AM > > >>>>> To: Bhateja, Jatin > > >>>>> Cc: Viswanathan, Sandhya ; Andrew > > >>>>> Haley ; hotspot-compiler-dev at openjdk.java.net > > >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification > > >>>>> for > > >>>>> X86 > > >>>>> > > >>>>> Hi Jatin, > > >>>>> > > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ > > >>>>> > > >>>>> Much better! Thanks. > > >>>>> > > >>>>>> Change Summary: > > >>>>>> > > >>>>>> 1) Unified the handling for scalar rotate operation. All scalar > > >>>>>> rotate > > >>>>> selection patterns are now dependent on newly created > > >>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. > > >>>>> Currently > > >>>>> if DAG nodes corresponding to a sub-pattern are shared (have > > >>>>> multiple > > >>>>> users) then existing complex patterns based on > > >>>>> Or/LShiftL/URShift does not get matched and this prevents inferring > rotate nodes. > > >>>>> Please refer to JIT'ed assembly output with baseline[1] and with > > >>>>> patch[2] . We can see that generated code size also went done > > >>>>> from > > >>>>> 832 byte to 768 bytes. Also this can cause perf degradation if > > >>>>> shift-or dependency chain appears inside a hot region. > > >>>>>> > > >>>>>> 2) Due to enhanced rotate inferencing new patch shows better > > >>>>>> performance > > >>>>> even for legacy targets (non AVX-512). Please refer to the perf > > >>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. > > >>>>> > > >>>>> Very nice! > > >>>>>> 3) As suggested, removed Java API intrinsification changes and > > >>>>>> scalar > > >>>>> rotate transformation are done during OrI/OrL node idealizations. > > >>>>> > > >>>>> Good. > > >>>>> > > >>>>> (Still would be nice to factor the matching code from Ideal() > > >>>>> and share it between multiple use sites. Especially considering > > >>>>> OrVNode::Ideal() now does basically the same thing. As an > > >>>>> example/idea, take a look at > > >>>>> is_bmi_pattern() in x86.ad.) > > >>>>> > > >>>>>> 4) SLP always gets to work on new scalar Rotate nodes and > > >>>>>> creates vector > > >>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV > > >>>>> nodes if target does not supports vector rotates(non-AVX512). > > >>>>> > > >>>>> Good. > > >>>>> > > >>>>>> 5) Added new instruction patterns for vector shift Left/Right > > >>>>>> operations > > >>>>> with constant shift operands. This prevents emitting extra moves > > >>>>> to > > >> XMM. > > >>>>> > > >>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ > > >>>>> +? match(Set dst (LShiftVI src shift)); > > >>>>> > > >>>>> I'd prefer to see a uniform Ideal IR shape being used > > >>>>> irrespective of whether the argument is a constant or not. It > > >>>>> should also simplify the logic in SuperWord and make it easier > > >>>>> to support on > > >>>>> non-x86 architectures. > > >>>>> > > >>>>> For example, here's how it is done on AArch64: > > >>>>> > > >>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ > > >>>>> ??? 
predicate(n->as_Vector()->length() == 4); > > >>>>> ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... > > >>>>> > > >>>>>> 6) Constant folding scenarios are covered in > > >>>>>> RotateLeft/RotateRight > > >>>>> idealization, inferencing of vector rotate through OrV > > >>>>> idealization covers the vector patterns generated though non SLP > > route i.e. > > >>>>> VectorAPI. > > >>>>> > > >>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the > > >>>>> general direction here - duplication of scalar transformations > > >>>>> to lane-wise vector operations. It definitely won't scale and in > > >>>>> a longer run it risks to diverge. Would be nice to find a way to > > >>>>> automatically "lift" > > >>>>> scalar transformations to vectors and apply them uniformly. But > > >>>>> right now it is just an idea which requires more experimentation. > > >>>>> > > >>>>> > > >>>>> Some other minor comments/suggestions: > > >>>>> > > >>>>> +? // Swap the computed left and right shift counts. > > >>>>> +? if (is_rotate_left) { > > >>>>> +??? Node* temp = shiftRCnt; > > >>>>> +??? shiftRCnt? = shiftLCnt; > > >>>>> +??? shiftLCnt? = temp; > > >>>>> +? } > > >>>>> > > >>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? > > >>>>> > > >>>>> > > >>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) > > >>>>> +??? return true; > > >>>>> > > >>>>> Please, don't omit curly braces (even for simple cases). > > >>>>> > > >>>>> > > >>>>> -// Rotate Right by variable > > >>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, > > >>>>> immI0 zero, rFlagsReg cr) > > >>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg > > >>>>> +cr) > > >>>>> ?? %{ > > >>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI > > >>>>> zero shift)))); > > >>>>> - > > >>>>> +? predicate(!VM_Version::supports_bmi2() && > > >>>>> n->bottom_type()->basic_type() == T_INT); > > >>>>> +? match(Set dst (RotateRight dst shift)); > > >>>>> +? format %{ "rorl???? $dst, $shift" %} > > >>>>> ???? expand %{ > > >>>>> -??? rorI_rReg_CL(dst, shift, cr); > > >>>>> +??? rorI_rReg_imm8(dst, shift, cr); > > >>>>> ???? %} > > >>>>> > > >>>>> It would be really nice to migrate to MacroAssembler along the > > >>>>> way (as a cleanup). > > >>>>> > > >>>>>> Please push the patch through your testing framework and let me > > >>>>>> know your > > >>>>> review feedback. > > >>>>> > > >>>>> There's one new assertion failure: > > >>>>> > > >>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), > > >>>>> pid=5476, tid=6219 > > >>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize > > >>>>> should return new nodes, use Identity to return old nodes > > >>>>> > > >>>>> I believe it comes from > > >>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal > > >>>>> which can return pre-contructed constants. I suggest to get rid > > >>>>> of > > >>>>> Ideal() methods and move constant folding logic into > > >>>>> Node::Value() (as implemented for other bitwise/arithmethic > > >>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more > > >>>>> generic approach since it enables richer type information > > >>>>> (ranges vs > > >>>>> constants) and IMO it's more convenient to work with constants > > >>>>> through Types than ConNodes. > > >>>>> > > >>>>> (I suspect that original/expanded IR shape may already provide > > >>>>> more precise type info for non-constant case which can affect > > >>>>> the > > >>>>> benchmarks.) 
> > >>>>> > > >>>>> Best regards, > > >>>>> Vladimir Ivanov > > >>>>> > > >>>>>> > > >>>>>> Best Regards, > > >>>>>> Jatin > > >>>>>> > > >>>>>> [1] > > >>>>>> > > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. > > >>>>>> txt [2] > > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a > > >>>>>> vx > > >>>>>> 2_ > > >>>>>> asm > > >>>>>> .txt [3] > > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n > > >>>>>> ew > > >>>>>> _p > > >>>>>> atc > > >>>>>> h.txt > > >>>>>> > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: Vladimir Ivanov > > >>>>>>> Sent: Saturday, July 18, 2020 12:25 AM > > >>>>>>> To: Bhateja, Jatin ; Andrew Haley > > >>>>>>> > > >>>>>>> Cc: Viswanathan, Sandhya ; > > >>>>>>> hotspot-compiler- dev at openjdk.java.net > > >>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API > > >>>>>>> intrinsification for > > >>>>>>> X86 > > >>>>>>> > > >>>>>>> Hi Jatin, > > >>>>>>> > > >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ > > >>>>>>> > > >>>>>>> It definitely looks better, but IMO it hasn't reached the > > >>>>>>> sweet spot > > >>>>> yet. > > >>>>>>> It feels like the focus is on auto-vectorizer while the burden > > >>>>>>> is put on scalar cases. > > >>>>>>> > > >>>>>>> First of all, considering GVN folds relevant operation > > >>>>>>> patterns into a single Rotate node now, what's the motivation > > >>>>>>> to introduce intrinsics? > > >>>>>>> > > >>>>>>> Another point is there's still significant duplication for > > >>>>>>> scalar cases. > > >>>>>>> > > >>>>>>> I'd prefer to see the legacy cases which rely on pattern > > >>>>>>> matching to go away and be substituted with instructions which > > >>>>>>> match Rotate instructions (migrating ). > > >>>>>>> > > >>>>>>> I understand that it will penalize the vectorization > > >>>>>>> implementation, but IMO reducing overall complexity is worth it. > > >>>>>>> On auto-vectorizer side, I see > > >>>>>>> 2 ways to fix it: > > >>>>>>> > > >>>>>>> ???? (1) introduce additional AD instructions for > > >>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; > > >>>>>>> > > >>>>>>> ???? (2) in SuperWord::output(), when matcher doesn't support > > >>>>>>> RotateLeftV/RotateLeftV nodes > > >>>>>>> (Matcher::match_rule_supported()), > > >>>>>>> generate vectorized version of the original pattern. > > >>>>>>> > > >>>>>>> Overall, it looks like more and more focus is made on scalar > part. > > >>>>>>> Considering the main goal of the patch is to enable > > >>>>>>> vectorization, I'm fine with separating cleanup of scalar part. > > >>>>>>> As an interim solution, it seems that leaving the scalar part > > >>>>>>> as it is now and matching scalar bit rotate pattern in > > >>>>>>> VectorNode::is_rotate() should be enough to keep the > > >>>>>>> vectorization part functioning. Then scalar Rotate nodes and > > relevant cleanups can be integrated later. > > >>>>>>> (Or vice > > >>>>>>> versa: clean up scalar part first and then follow up with > > >>>>>>> vectorization.) > > >>>>>>> > > >>>>>>> Some other comments: > > >>>>>>> > > >>>>>>> * There's a lot of duplication between OrINode::Ideal and > > >>>>> OrLNode::Ideal. > > >>>>>>> What do you think about introducing a super type > > >>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? > > >>>>>>> > > >>>>>>> > > >>>>>>> * src/hotspot/cpu/x86/x86.ad > > >>>>>>> > > >>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ > > >>>>>>> +? 
predicate(n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== > > >>>>>>> T_INT > > >>>>> || > > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== T_LONG); > > >>>>>>> > > >>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ > > >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== > > >>>>>>> T_INT > > >>>>> || > > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== T_LONG); > > >>>>>>> > > >>>>>>> The predicates are redundant here. > > >>>>>>> > > >>>>>>> > > >>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > > >>>>>>> > > >>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType > > >>>>>>> +etype, > > >>>>>>> XMMRegister dst, XMMRegister src, > > >>>>>>> +???????????????????????????????????? int shift, int > > >>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { > > >>>>>>> +??? if (etype == T_INT) { > > >>>>>>> +????? evprold(dst, src, shift, vector_len); > > >>>>>>> +??? } else { > > >>>>>>> +????? evprolq(dst, src, shift, vector_len); > > >>>>>>> +??? } > > >>>>>>> > > >>>>>>> Please, put an assert for the false case (assert(etype == > > >>>>>>> T_LONG, > > >>>>> "...")). > > >>>>>>> > > >>>>>>> > > >>>>>>> * On testing (with previous version of the patch): -XX:UseAVX > > >>>>>>> is > > >>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 > > >> platforms. > > >>>>>>> Either omitting the flag or adding > > >>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. > > >>>>>>> > > >>>>>>> Best regards, > > >>>>>>> Vladimir Ivanov > > >>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> Summary of changes: > > >>>>>>>> 1) Optimization is specifically targeted to exploit vector > > >>>>>>>> rotation > > >>>>>>> instruction added for X86 AVX512. A single rotate instruction > > >>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers > > >>>>>>> better latency at reduced instruction count. > > >>>>>>>> > > >>>>>>>> 2) There were two approaches to implement this: > > >>>>>>>> ?????? a)? Let everything remain the same and add new wide > > >>>>>>>> complex > > >>>>>>> instruction patterns in the matcher for e.g. > > >>>>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary > > >>>>>>>> ReplicateI > > >>>>>>>> shift)) > > >>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( > > >>>>>>> Replicate > > >>>>>>> shift)) > > >>>>>>>> ?????? It would have been an overoptimistic assumption to > > >>>>>>>> expect that graph > > >>>>>>> shape would be preserved till the matcher for correct > inferencing. > > >>>>>>>> ?????? In addition we would have required multiple such > > >>>>>>>> bulky patterns. > > >>>>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, > > >>>>>>>> these gets > > >>>>>>> generated during intrinsification as well as during additional > > >>>>>>> pattern > > >>>>>>>> ?????? matching during node Idealization, later on these > > >>>>>>>> nodes are consumed > > >>>>>>> by SLP for valid vectorization scenarios to emit their vector > > >>>>>>>> ?????? counterparts which eventually emits vector rotates. 
> > >>>>>>>> > > >>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here > > >>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate > > >>>>>>>> nodes should either be > > >>>>>>> dismantled back to OR/SHIFT pattern or we penalize the > > >>>>>>> vectorization which would be very costly, other option would > > >>>>>>> have been to add additional vector rotate pattern for UseAVX=3 > > >>>>>>> in the matcher which emit vector OR-SHIFTs instruction but > > >>>>>>> then it will loose on emitting efficient instruction sequence > > >>>>>>> which node sharing > > >>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus > > >>>>>>> it will not be beneficial for non-AVX512 targets, only saving > > >>>>>>> will be in terms of cleanup of few existing scalar rotate > > >>>>>>> matcher patterns, also old targets does not offer this > > >>>>>>> powerful rotate > > >> instruction. > > >>>>>>> Therefore new scalar nodes are created only for AVX512 targets. > > >>>>>>>> > > >>>>>>>> As per suggestions constant folding scenarios have been > > >>>>>>>> covered during > > >>>>>>> Idealizations of newly added scalar nodes. > > >>>>>>>> > > >>>>>>>> Please review the latest version and share your feedback and > > >>>>>>>> test > > >>>>>>> results. > > >>>>>>>> > > >>>>>>>> Best Regards, > > >>>>>>>> Jatin > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>> -----Original Message----- > > >>>>>>>>> From: Andrew Haley > > >>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM > > >>>>>>>>> To: Vladimir Ivanov ; Bhateja, > > >>>>>>>>> Jatin ; > > >>>>>>>>> hotspot-compiler-dev at openjdk.java.net > > >>>>>>>>> Cc: Viswanathan, Sandhya > > >>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API > > >>>>>>>>> intrinsification for > > >>>>>>>>> X86 > > >>>>>>>>> > > >>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: > > >>>>>>>>> > > >>>>>>>>> ??? > High-level comment: so far, there were no pressing > > >>>>>>>>> need in > > >>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL > > >>>>>>>>> instructions > > >>>>>>>>>> were selected during matching [1]. Now the patch introduces > > >>>>>>>>>> > > > >>>>>>>>> dedicated nodes > > >>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > > > >>>>>>>>> which partly duplicates existing logic. > > >>>>>>>>> > > >>>>>>>>> The lack of rotate nodes in the IR has always meant that > > >>>>>>>>> AArch64 doesn't generate optimal code for e.g. > > >>>>>>>>> > > >>>>>>>>> ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) > > >>>>>>>>> > > >>>>>>>>> because, with the RotateLeft expanded to its full > > >>>>>>>>> combination of ORs and shifts, it's to complicated to match. > > >>>>>>>>> At the time I put this to one side because it wasn't urgent. > > >>>>>>>>> This is a shame because although such combinations are > > >>>>>>>>> unusual they are used in some crypto > > >>>>> operations. > > >>>>>>>>> > > >>>>>>>>> If we can generate immediate-form rotate nodes early by > > >>>>>>>>> pattern matching during parsing (rather than depending on > > >>>>>>>>> intrinsics) we'll get more value than by depending on > > >>>>>>>>> programmers calling > > >> intrinsics. > > >>>>>>>>> > > >>>>>>>>> -- > > >>>>>>>>> Andrew Haley? (he/him) > > >>>>>>>>> Java Platform Lead Engineer > > >>>>>>>>> Red Hat UK Ltd. 
> > >>>>>>>>> https://keybase.io/andrewhaley > > >>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > >>>>>>>> From vladimir.x.ivanov at oracle.com Fri Aug 7 12:15:12 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 7 Aug 2020 15:15:12 +0300 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> Message-ID: <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com> >> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ Still looks good. It would be nice to get one more (R)eview. Let's wait a little bit more. Best regards, Vladimir Ivanov >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Saturday, August 1, 2020 4:49 AM >>> To: Bhateja, Jatin >>> Cc: Viswanathan, Sandhya ; >>> hotspot-compiler- dev at openjdk.java.net >>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>> X86 >>> >>> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ >>> >>> Looks good. >>> >>> Tier5 (where I saw the crashes) passed. >>> >>> Please, incorporate the following minor cleanups in the final version: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu >>> p/ >>> >>> (Tested with hs-tier1,hs-tier2.) >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>>> -----Original Message----- >>>>> From: Vladimir Ivanov >>>>> Sent: Thursday, July 30, 2020 3:30 AM >>>>> To: Bhateja, Jatin >>>>> Cc: Viswanathan, Sandhya ; >>>>> hotspot-compiler- dev at openjdk.java.net >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>> for >>>>> X86 >>>>> >>>>> >>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ >>>>>> >>>>>> Looks good. (Testing is in progress.) >>>>> >>>>> FYI test results are clean (tier1-tier5). >>>>> >>>>>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines >>>>>>> since we are anyways doing constant folding in LShiftI/URShiftI >>>>>>> value routines. Since JAVA rotate APIs are no longer intrincified >>>>>>> hence these routines may no longer be useful. >>>>>> >>>>>> Nice observation! Good. >>>>> >>>>> As a second thought, it seems there's still a chance left that >>>>> Rotate nodes get their input type narrowed after the folding >>>>> happened. For example, as a result of incremental inlining or CFG >>>>> transformations during loop optimizations. And it does happen in >>>>> practice since the testing revealed some crashes due to the bug in >>> RotateLeftNode/RotateRightNode::Ideal(). >>>>> >>>>> So, it makes sense to keep the transformations. But I'm fine with >>>>> addressing that as a followup enhancement. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> >>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>> way (as a cleanup). >>>>>>> >>>>>>> I guess you are saying remove opcodes/encoding from patterns and >>>>>>> move then to Assembler, Can we take this cleanup activity >>>>>>> separately since other patterns are also using these matcher >>> directives. >>>>>> >>>>>> I'm perfectly fine with handling it as a separate enhancement. >>>>>> >>>>>>> Other synthetic comments have been taken care of. I have extended >>>>>>> the Test to cover all the newly added scalar transforms. Kindly >>>>>>> let me know if there other comments. >>>>>> >>>>>> Nice! 
>>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Vladimir Ivanov >>>>>>>> Sent: Friday, July 24, 2020 3:21 AM >>>>>>>> To: Bhateja, Jatin >>>>>>>> Cc: Viswanathan, Sandhya ; Andrew >>>>>>>> Haley ; hotspot-compiler-dev at openjdk.java.net >>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>>>> for >>>>>>>> X86 >>>>>>>> >>>>>>>> Hi Jatin, >>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>>>>>>> >>>>>>>> Much better! Thanks. >>>>>>>> >>>>>>>>> Change Summary: >>>>>>>>> >>>>>>>>> 1) Unified the handling for scalar rotate operation. All scalar >>>>>>>>> rotate >>>>>>>> selection patterns are now dependent on newly created >>>>>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>>>>>>> Currently >>>>>>>> if DAG nodes corresponding to a sub-pattern are shared (have >>>>>>>> multiple >>>>>>>> users) then existing complex patterns based on >>>>>>>> Or/LShiftL/URShift does not get matched and this prevents inferring >> rotate nodes. >>>>>>>> Please refer to JIT'ed assembly output with baseline[1] and with >>>>>>>> patch[2] . We can see that generated code size also went done >>>>>>>> from >>>>>>>> 832 byte to 768 bytes. Also this can cause perf degradation if >>>>>>>> shift-or dependency chain appears inside a hot region. >>>>>>>>> >>>>>>>>> 2) Due to enhanced rotate inferencing new patch shows better >>>>>>>>> performance >>>>>>>> even for legacy targets (non AVX-512). Please refer to the perf >>>>>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. >>>>>>>> >>>>>>>> Very nice! >>>>>>>>> 3) As suggested, removed Java API intrinsification changes and >>>>>>>>> scalar >>>>>>>> rotate transformation are done during OrI/OrL node idealizations. >>>>>>>> >>>>>>>> Good. >>>>>>>> >>>>>>>> (Still would be nice to factor the matching code from Ideal() >>>>>>>> and share it between multiple use sites. Especially considering >>>>>>>> OrVNode::Ideal() now does basically the same thing. As an >>>>>>>> example/idea, take a look at >>>>>>>> is_bmi_pattern() in x86.ad.) >>>>>>>> >>>>>>>>> 4) SLP always gets to work on new scalar Rotate nodes and >>>>>>>>> creates vector >>>>>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV >>>>>>>> nodes if target does not supports vector rotates(non-AVX512). >>>>>>>> >>>>>>>> Good. >>>>>>>> >>>>>>>>> 5) Added new instruction patterns for vector shift Left/Right >>>>>>>>> operations >>>>>>>> with constant shift operands. This prevents emitting extra moves >>>>>>>> to >>>>> XMM. >>>>>>>> >>>>>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >>>>>>>> +? match(Set dst (LShiftVI src shift)); >>>>>>>> >>>>>>>> I'd prefer to see a uniform Ideal IR shape being used >>>>>>>> irrespective of whether the argument is a constant or not. It >>>>>>>> should also simplify the logic in SuperWord and make it easier >>>>>>>> to support on >>>>>>>> non-x86 architectures. >>>>>>>> >>>>>>>> For example, here's how it is done on AArch64: >>>>>>>> >>>>>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >>>>>>>> ??? predicate(n->as_Vector()->length() == 4); >>>>>>>> ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... >>>>>>>> >>>>>>>>> 6) Constant folding scenarios are covered in >>>>>>>>> RotateLeft/RotateRight >>>>>>>> idealization, inferencing of vector rotate through OrV >>>>>>>> idealization covers the vector patterns generated though non SLP >>> route i.e. >>>>>>>> VectorAPI. 
>>>>>>>> >>>>>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the >>>>>>>> general direction here - duplication of scalar transformations >>>>>>>> to lane-wise vector operations. It definitely won't scale and in >>>>>>>> a longer run it risks to diverge. Would be nice to find a way to >>>>>>>> automatically "lift" >>>>>>>> scalar transformations to vectors and apply them uniformly. But >>>>>>>> right now it is just an idea which requires more experimentation. >>>>>>>> >>>>>>>> >>>>>>>> Some other minor comments/suggestions: >>>>>>>> >>>>>>>> +? // Swap the computed left and right shift counts. >>>>>>>> +? if (is_rotate_left) { >>>>>>>> +??? Node* temp = shiftRCnt; >>>>>>>> +??? shiftRCnt? = shiftLCnt; >>>>>>>> +??? shiftLCnt? = temp; >>>>>>>> +? } >>>>>>>> >>>>>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? >>>>>>>> >>>>>>>> >>>>>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >>>>>>>> +??? return true; >>>>>>>> >>>>>>>> Please, don't omit curly braces (even for simple cases). >>>>>>>> >>>>>>>> >>>>>>>> -// Rotate Right by variable >>>>>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, >>>>>>>> immI0 zero, rFlagsReg cr) >>>>>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg >>>>>>>> +cr) >>>>>>>> ?? %{ >>>>>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI >>>>>>>> zero shift)))); >>>>>>>> - >>>>>>>> +? predicate(!VM_Version::supports_bmi2() && >>>>>>>> n->bottom_type()->basic_type() == T_INT); >>>>>>>> +? match(Set dst (RotateRight dst shift)); >>>>>>>> +? format %{ "rorl???? $dst, $shift" %} >>>>>>>> ???? expand %{ >>>>>>>> -??? rorI_rReg_CL(dst, shift, cr); >>>>>>>> +??? rorI_rReg_imm8(dst, shift, cr); >>>>>>>> ???? %} >>>>>>>> >>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>> way (as a cleanup). >>>>>>>> >>>>>>>>> Please push the patch through your testing framework and let me >>>>>>>>> know your >>>>>>>> review feedback. >>>>>>>> >>>>>>>> There's one new assertion failure: >>>>>>>> >>>>>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >>>>>>>> pid=5476, tid=6219 >>>>>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize >>>>>>>> should return new nodes, use Identity to return old nodes >>>>>>>> >>>>>>>> I believe it comes from >>>>>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal >>>>>>>> which can return pre-contructed constants. I suggest to get rid >>>>>>>> of >>>>>>>> Ideal() methods and move constant folding logic into >>>>>>>> Node::Value() (as implemented for other bitwise/arithmethic >>>>>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more >>>>>>>> generic approach since it enables richer type information >>>>>>>> (ranges vs >>>>>>>> constants) and IMO it's more convenient to work with constants >>>>>>>> through Types than ConNodes. >>>>>>>> >>>>>>>> (I suspect that original/expanded IR shape may already provide >>>>>>>> more precise type info for non-constant case which can affect >>>>>>>> the >>>>>>>> benchmarks.) >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Vladimir Ivanov >>>>>>>> >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Jatin >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. 
>>>>>>>>> txt [2] >>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a >>>>>>>>> vx >>>>>>>>> 2_ >>>>>>>>> asm >>>>>>>>> .txt [3] >>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n >>>>>>>>> ew >>>>>>>>> _p >>>>>>>>> atc >>>>>>>>> h.txt >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Vladimir Ivanov >>>>>>>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>>>>>>> To: Bhateja, Jatin ; Andrew Haley >>>>>>>>>> >>>>>>>>>> Cc: Viswanathan, Sandhya ; >>>>>>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API >>>>>>>>>> intrinsification for >>>>>>>>>> X86 >>>>>>>>>> >>>>>>>>>> Hi Jatin, >>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>>>>>>> >>>>>>>>>> It definitely looks better, but IMO it hasn't reached the >>>>>>>>>> sweet spot >>>>>>>> yet. >>>>>>>>>> It feels like the focus is on auto-vectorizer while the burden >>>>>>>>>> is put on scalar cases. >>>>>>>>>> >>>>>>>>>> First of all, considering GVN folds relevant operation >>>>>>>>>> patterns into a single Rotate node now, what's the motivation >>>>>>>>>> to introduce intrinsics? >>>>>>>>>> >>>>>>>>>> Another point is there's still significant duplication for >>>>>>>>>> scalar cases. >>>>>>>>>> >>>>>>>>>> I'd prefer to see the legacy cases which rely on pattern >>>>>>>>>> matching to go away and be substituted with instructions which >>>>>>>>>> match Rotate instructions (migrating ). >>>>>>>>>> >>>>>>>>>> I understand that it will penalize the vectorization >>>>>>>>>> implementation, but IMO reducing overall complexity is worth it. >>>>>>>>>> On auto-vectorizer side, I see >>>>>>>>>> 2 ways to fix it: >>>>>>>>>> >>>>>>>>>> ???? (1) introduce additional AD instructions for >>>>>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>>>>>>> >>>>>>>>>> ???? (2) in SuperWord::output(), when matcher doesn't support >>>>>>>>>> RotateLeftV/RotateLeftV nodes >>>>>>>>>> (Matcher::match_rule_supported()), >>>>>>>>>> generate vectorized version of the original pattern. >>>>>>>>>> >>>>>>>>>> Overall, it looks like more and more focus is made on scalar >> part. >>>>>>>>>> Considering the main goal of the patch is to enable >>>>>>>>>> vectorization, I'm fine with separating cleanup of scalar part. >>>>>>>>>> As an interim solution, it seems that leaving the scalar part >>>>>>>>>> as it is now and matching scalar bit rotate pattern in >>>>>>>>>> VectorNode::is_rotate() should be enough to keep the >>>>>>>>>> vectorization part functioning. Then scalar Rotate nodes and >>> relevant cleanups can be integrated later. >>>>>>>>>> (Or vice >>>>>>>>>> versa: clean up scalar part first and then follow up with >>>>>>>>>> vectorization.) >>>>>>>>>> >>>>>>>>>> Some other comments: >>>>>>>>>> >>>>>>>>>> * There's a lot of duplication between OrINode::Ideal and >>>>>>>> OrLNode::Ideal. >>>>>>>>>> What do you think about introducing a super type >>>>>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> * src/hotspot/cpu/x86/x86.ad >>>>>>>>>> >>>>>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== >>>>>>>>>> T_INT >>>>>>>> || >>>>>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== T_LONG); >>>>>>>>>> >>>>>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>>>>>>> +? 
predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== >>>>>>>>>> T_INT >>>>>>>> || >>>>>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== T_LONG); >>>>>>>>>> >>>>>>>>>> The predicates are redundant here. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>>>>>>> >>>>>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType >>>>>>>>>> +etype, >>>>>>>>>> XMMRegister dst, XMMRegister src, >>>>>>>>>> +???????????????????????????????????? int shift, int >>>>>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { >>>>>>>>>> +??? if (etype == T_INT) { >>>>>>>>>> +????? evprold(dst, src, shift, vector_len); >>>>>>>>>> +??? } else { >>>>>>>>>> +????? evprolq(dst, src, shift, vector_len); >>>>>>>>>> +??? } >>>>>>>>>> >>>>>>>>>> Please, put an assert for the false case (assert(etype == >>>>>>>>>> T_LONG, >>>>>>>> "...")). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> * On testing (with previous version of the patch): -XX:UseAVX >>>>>>>>>> is >>>>>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 >>>>> platforms. >>>>>>>>>> Either omitting the flag or adding >>>>>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Vladimir Ivanov >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Summary of changes: >>>>>>>>>>> 1) Optimization is specifically targeted to exploit vector >>>>>>>>>>> rotation >>>>>>>>>> instruction added for X86 AVX512. A single rotate instruction >>>>>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers >>>>>>>>>> better latency at reduced instruction count. >>>>>>>>>>> >>>>>>>>>>> 2) There were two approaches to implement this: >>>>>>>>>>> ?????? a)? Let everything remain the same and add new wide >>>>>>>>>>> complex >>>>>>>>>> instruction patterns in the matcher for e.g. >>>>>>>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary >>>>>>>>>>> ReplicateI >>>>>>>>>>> shift)) >>>>>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( >>>>>>>>>> Replicate >>>>>>>>>> shift)) >>>>>>>>>>> ?????? It would have been an overoptimistic assumption to >>>>>>>>>>> expect that graph >>>>>>>>>> shape would be preserved till the matcher for correct >> inferencing. >>>>>>>>>>> ?????? In addition we would have required multiple such >>>>>>>>>>> bulky patterns. >>>>>>>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, >>>>>>>>>>> these gets >>>>>>>>>> generated during intrinsification as well as during additional >>>>>>>>>> pattern >>>>>>>>>>> ?????? matching during node Idealization, later on these >>>>>>>>>>> nodes are consumed >>>>>>>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>>>>>>> ?????? counterparts which eventually emits vector rotates. 
>>>>>>>>>>> >>>>>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here >>>>>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate >>>>>>>>>>> nodes should either be >>>>>>>>>> dismantled back to OR/SHIFT pattern or we penalize the >>>>>>>>>> vectorization which would be very costly, other option would >>>>>>>>>> have been to add additional vector rotate pattern for UseAVX=3 >>>>>>>>>> in the matcher which emit vector OR-SHIFTs instruction but >>>>>>>>>> then it will loose on emitting efficient instruction sequence >>>>>>>>>> which node sharing >>>>>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus >>>>>>>>>> it will not be beneficial for non-AVX512 targets, only saving >>>>>>>>>> will be in terms of cleanup of few existing scalar rotate >>>>>>>>>> matcher patterns, also old targets does not offer this >>>>>>>>>> powerful rotate >>>>> instruction. >>>>>>>>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>>>>>>>> >>>>>>>>>>> As per suggestions constant folding scenarios have been >>>>>>>>>>> covered during >>>>>>>>>> Idealizations of newly added scalar nodes. >>>>>>>>>>> >>>>>>>>>>> Please review the latest version and share your feedback and >>>>>>>>>>> test >>>>>>>>>> results. >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Jatin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: Andrew Haley >>>>>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>>>>>>>> To: Vladimir Ivanov ; Bhateja, >>>>>>>>>>>> Jatin ; >>>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API >>>>>>>>>>>> intrinsification for >>>>>>>>>>>> X86 >>>>>>>>>>>> >>>>>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>>>>>>>> >>>>>>>>>>>> ??? > High-level comment: so far, there were no pressing >>>>>>>>>>>> need in >>>>>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL >>>>>>>>>>>> instructions >>>>>>>>>>>>> were selected during matching [1]. Now the patch introduces >>>>>>>>>>>>>> >>>>>>>>>>>> dedicated nodes >>>>>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > >>>>>>>>>>>> which partly duplicates existing logic. >>>>>>>>>>>> >>>>>>>>>>>> The lack of rotate nodes in the IR has always meant that >>>>>>>>>>>> AArch64 doesn't generate optimal code for e.g. >>>>>>>>>>>> >>>>>>>>>>>> ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>>>>>>>> >>>>>>>>>>>> because, with the RotateLeft expanded to its full >>>>>>>>>>>> combination of ORs and shifts, it's to complicated to match. >>>>>>>>>>>> At the time I put this to one side because it wasn't urgent. >>>>>>>>>>>> This is a shame because although such combinations are >>>>>>>>>>>> unusual they are used in some crypto >>>>>>>> operations. >>>>>>>>>>>> >>>>>>>>>>>> If we can generate immediate-form rotate nodes early by >>>>>>>>>>>> pattern matching during parsing (rather than depending on >>>>>>>>>>>> intrinsics) we'll get more value than by depending on >>>>>>>>>>>> programmers calling >>>>> intrinsics. >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Andrew Haley? (he/him) >>>>>>>>>>>> Java Platform Lead Engineer >>>>>>>>>>>> Red Hat UK Ltd. 
>>>>>>>>>>>> https://keybase.io/andrewhaley >>>>>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>>>>>>>> From adinn at redhat.com Fri Aug 7 13:25:09 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 7 Aug 2020 14:25:09 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: Hi Ningsheng, On 31/07/2020 02:41, Ningsheng Jian wrote: > Hi Andrew, > > Thanks a lot!! > > FYI, the latest patch: > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html > > > And some descriptions: > > http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt Thanks for doing such a great job. This is very good work. Also, thanks for splitting the patch up to separate out the different steps -- that was immensely helpful. I have one general query and a small number of detailed comments which are provided separately for each patch. See below. Testing: I was able to test this patch on a loaned Fujitsu FX700. I replicated your results, passing tier1 tests and the jtreg compiler tests in vectorization, codegen, c2/cr6340864 and loopopts. I also eyeballed /some/ of the generated code to check that it looked ok. I'd really like to be able to do that systematically for a comprehensive test suite that exercised every rule but I only had the machine for a few days. This really ought to be done as a follow-up to ensure that all the rules are working as expected. General Comments: Sizing the NEON registers using 8 slots -- even though there might actually be more (or less!) slots in use for a VecA is fine. However, I think this needs a little bit more explanation in the .ad. file (see comments on ra webrev below) I'm ok with your choice to use p7 as an always true predicate register and also how you choose to init and re-init from code defined via the ad file based on C->max_vector_size(). I am not clear why you are choosing to re-init ptrue after certain JVM runtime calls (e.g. when Z calls into the runtime) and not others e.g. when we call a JVM_ENTRY. Could you explain the rationale you have followed here? Specific Comments (feature webrev): globals_aarch64.hpp:102 Just out of interest why does UseSVE have range(0,2)? It seems you are only testing for UseSVE > 0. Does value 2 correspond to an optional subset? Specific Comments (register allocator webrev): aarch64.ad:97-100 Why have you added a reg_def for R8 and R9 here and also to alloc_class chunk0 at lines 544-545? They aren't used by C2 so why define them? assembler_aarch64.hpp:280 (also 699) prf sets a predicate register field. pgrf sets a governing predicate register field. Should the name not be gprf. chaitin.cpp:648-660 The comment is rather oddly formatted. At line 650 you guard the assert with a test for lrg._is_vector. Is that not always going to be guaranteed by the outer condition lrg._is_scalable? If so then you should really assert lrg._is_vector. The special case code for computation of num_regs for a vector stack slot also appears in this file with a slightly different organization in find_first_set (line 1350) and in PhaseChaitin::Select (line 1590). There is another similar case in RegMask::num_registers at regmask.cpp: 98. It would be better to factor out the common code into methods of LRG. 
Maybe using the following?

bool LRG::is_scalable_vector() {
  if (_is_scalable) {
    assert(_is_vector == 1, "scalable LRGs are vector LRGs");
    assert(_num_regs == RegMask::SlotsPerVecA, "sanity");
    return true;
  }
  return false;
}

int LRG::scalable_num_regs() {
  assert(is_scalable_vector(), "sanity");
  if (OptoReg::is_stack(_reg)) {
    return _scalable_reg_slots;
  } else {
    return _num_regs;
  }
}

chaitin.cpp:1350
Once again the test for lrg._is_vector should be guaranteed by the outer test
of lrg._is_scalable. Refactoring using the common methods of LRG as above
ought to help.

chaitin.cpp:1591
Use common method code.

postaloc.cpp:308/323
Once again you should be able to use common method code of LRG here.

regmask.cpp:91
Once again you should be able to use common method code of LRG here.

Specific Comments (c2 webrev):

aarch64.ad:3815
Very nice defensive check!

assembler_aarch64.hpp:2469 & 2699+
Andrew Haley is definitely going to ask you to update function entry
(assembler_aarch64.cpp:76) to call these new instruction generation methods
and then validate the generated code using asm_check. So, I guess you might
as well do that now ;-)

zBarrierSetAssembler_aarch64.cpp:434
Can you explain why we need to check p7 here and not do so in other places
where we call into the JVM? I'm not saying this is wrong. I just want to know
how you decided where re-init of p7 was needed.

superword.cpp:97
Does this mean that if someone sets the maximum vector size to a non-power of
two, such as 384, all superword operations will be bypassed? Including those
which can be done using NEON vectors?

regards,

Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From Charlie.Gracie at microsoft.com Fri Aug 7 16:19:10 2020
From: Charlie.Gracie at microsoft.com (Charlie Gracie)
Date: Fri, 7 Aug 2020 16:19:10 +0000
Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree
Message-ID: 

Hi,

Please review this change to C2 that removes unused code from InlineTree.
I looked to see which change removed the last use of this code, but as far
back in the history as I could see it was never used.

Bug: https://bugs.openjdk.java.net/browse/JDK-8251303
Webrev: https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/
Sponsor Required: Yes
Test: Built on macOS {release,fastdebug}

Thanks,
Charlie

From vladimir.x.ivanov at oracle.com Fri Aug 7 16:45:56 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 7 Aug 2020 19:45:56 +0300
Subject: RFR: 8250808: Re-associate loop invariants with other associative operations
In-Reply-To: 
References: 
Message-ID: 

> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/

Looks good.

So far, testing results look good (hs-tier1/2 are clean, tier1-4 are in progress).

Best regards,
Vladimir Ivanov

> C2 has re-association of loop invariants. However, the current implementation
> only supports the re-associations for add and subtract with 32-bit integer type.
> For other associative expressions like multiplication and the logic operations,
> the re-association is also applicable, and also for the operations with long type.
>
> This patch adds the missing re-associations for other associative operations
> together with the support for long type.
> > With this patch, the following expressions: > (x * inv1) * inv2 > (x | inv1) | inv2 > (x & inv1) & inv2 > (x ^ inv1) ^ inv2 ; inv1, inv2 are invariants > > can be re-associated to: > x * (inv1 * inv2) ; "inv1 * inv2" can be hoisted > x | (inv1 | inv2) ; "inv1 | inv2" can be hoisted > x & (inv1 & inv2) ; "inv1 & inv2" can be hoisted > x ^ (inv1 ^ inv2) ; "inv1 ^ inv2" can be hoisted > > Performance: > Here is the micro benchmark: > http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java > > And the results on X86_64: > Before: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 988.142 ? 0.110 ns/op > loopInvariantAndInt 1024 avgt 15 843.850 ? 0.522 ns/op > loopInvariantAndLong 1024 avgt 15 990.551 ? 10.458 ns/op > loopInvariantMulInt 1024 avgt 15 1209.003 ? 0.247 ns/op > loopInvariantMulLong 1024 avgt 15 1213.923 ? 0.438 ns/op > loopInvariantOrInt 1024 avgt 15 843.908 ? 0.132 ns/op > loopInvariantOrLong 1024 avgt 15 990.710 ? 10.484 ns/op > loopInvariantSubLong 1024 avgt 15 988.170 ? 0.159 ns/op > loopInvariantXorInt 1024 avgt 15 806.949 ? 7.860 ns/op > loopInvariantXorLong 1024 avgt 15 990.963 ? 8.321 ns/op > > After: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 842.854 ? 9.036 ns/op > loopInvariantAndInt 1024 avgt 15 698.097 ? 0.916 ns/op > loopInvariantAndLong 1024 avgt 15 841.120 ? 0.118 ns/op > loopInvariantMulInt 1024 avgt 15 691.000 ? 7.696 ns/op > loopInvariantMulLong 1024 avgt 15 846.907 ? 0.189 ns/op > loopInvariantOrInt 1024 avgt 15 698.423 ? 4.969 ns/op > loopInvariantOrLong 1024 avgt 15 843.465 ? 10.196 ns/op > loopInvariantSubLong 1024 avgt 15 841.314 ? 2.906 ns/op > loopInvariantXorInt 1024 avgt 15 652.529 ? 0.556 ns/op > loopInvariantXorLong 1024 avgt 15 841.860 ? 2.491 ns/op > > Results on AArch64: > Before: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 514.437 ? 0.351 ns/op > loopInvariantAndInt 1024 avgt 15 435.301 ? 0.415 ns/op > loopInvariantAndLong 1024 avgt 15 572.437 ? 0.057 ns/op > loopInvariantMulInt 1024 avgt 15 1154.544 ? 0.030 ns/op > loopInvariantMulLong 1024 avgt 15 1188.109 ? 0.299 ns/op > loopInvariantOrInt 1024 avgt 15 435.605 ? 0.977 ns/op > loopInvariantOrLong 1024 avgt 15 572.475 ? 0.093 ns/op > loopInvariantSubLong 1024 avgt 15 514.340 ? 0.154 ns/op > loopInvariantXorInt 1024 avgt 15 426.186 ? 0.105 ns/op > loopInvariantXorLong 1024 avgt 15 572.505 ? 0.259 ns/op > > After: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 508.179 ? 0.108 ns/op > loopInvariantAndInt 1024 avgt 15 394.706 ? 0.199 ns/op > loopInvariantAndLong 1024 avgt 15 434.443 ? 0.247 ns/op > loopInvariantMulInt 1024 avgt 15 762.477 ? 0.079 ns/op > loopInvariantMulLong 1024 avgt 15 775.975 ? 0.159 ns/op > loopInvariantOrInt 1024 avgt 15 394.657 ? 0.156 ns/op > loopInvariantOrLong 1024 avgt 15 434.428 ? 0.282 ns/op > loopInvariantSubLong 1024 avgt 15 507.475 ? 0.151 ns/op > loopInvariantXorInt 1024 avgt 15 396.000 ? 0.011 ns/op > loopInvariantXorLong 1024 avgt 15 434.255 ? 0.099 ns/op > > Tests: > Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1 > and jcstress:tests-custom, and all tests pass without new failure. 
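For illustration, this is the shape of the hoisting described above, written as a C-like sketch (the function names are made up for the example; the Java loops in the quoted benchmark are analogous):

// Before re-association: both ANDs execute on every iteration.
void and_chain(int* a, int n, int inv1, int inv2) {
  for (int i = 0; i < n; i++) {
    a[i] = (a[i] & inv1) & inv2;
  }
}

// After re-association: (inv1 & inv2) is loop-invariant and is hoisted,
// leaving a single AND per iteration. The same rewrite applies to *, |, ^
// and to the long variants.
void and_chain_hoisted(int* a, int n, int inv1, int inv2) {
  int inv = inv1 & inv2;
  for (int i = 0; i < n; i++) {
    a[i] = a[i] & inv;
  }
}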
> > Thanks, > Xiaohong Gong > From vladimir.x.ivanov at oracle.com Fri Aug 7 16:50:33 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 7 Aug 2020 19:50:33 +0300 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree In-Reply-To: References: Message-ID: > Webrev: https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ Looks good. I'll submit it for testing. Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Aug 7 17:06:26 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 7 Aug 2020 20:06:26 +0300 Subject: [16] (S) RFR 8251260: two MD5 tests fail "RuntimeException: Unexpected count of intrinsic" In-Reply-To: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> References: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> Message-ID: <08098628-7484-f2dc-019a-54a45f37a9c9@oracle.com> > http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ Looks good. Best regards, Vladimir Ivanov > https://bugs.openjdk.java.net/browse/JDK-8251260 > > New MD5 intrinsic tests failed when run with AOTed java.base. And old > SHA tests are problem listed for AOT. > > SHA and MD5 intrinsic tests parse -XX:+LogCompilation output looking for > compilation of sun/security/provider methods as intrinsics. But these > methods are already pre-compiled by AOT when AOTed java.base is used. As > result LogCompilation does not have corresponding entries. > > I think we should not run these MD5 and SHA tests with AOTed java.base > module. I added corresponding @requires. > > Old SHA tests were problem listed referencing 8167430 [1] bug but I > think it is incorrect. The original SHA tests crash with AOT 8207358 [2] > bug was closed as duplicate of 8167430 because of conflict how > intirnsics flags are set by default during AOT compilation. But we > simply should not run these tests with AOTed java.base. So I am adding > @requires to them as well and removing them from AOT problem list. > > Tested hs-tier1, hs-tier2 (runs sha,md5 tests), hs-tier6 (now skips > sha,md5 tests when AOTed java.base is used). > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8167430 > [2] https://bugs.openjdk.java.net/browse/JDK-8207358 From vladimir.kozlov at oracle.com Fri Aug 7 17:08:17 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Aug 2020 10:08:17 -0700 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com> References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com> Message-ID: <9748e2ee-f47d-7c47-627d-58e7d98e1779@oracle.com> I see that you already discussed removal of opcodes/encoding from patterns in .ad file and move them to Assembler. I would like to see that too. Changes look good otherwise. Thank you for adding tests to verify new code. Thanks, Vladimir K On 8/7/20 5:15 AM, Vladimir Ivanov wrote: > >>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ > > Still looks good. > > It would be nice to get one more (R)eview. Let's wait a little bit more. 
> > Best regards, > Vladimir Ivanov > >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Saturday, August 1, 2020 4:49 AM >>>> To: Bhateja, Jatin >>>> Cc: Viswanathan, Sandhya ; >>>> hotspot-compiler- dev at openjdk.java.net >>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>>> X86 >>>> >>>> >>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ >>>> >>>> Looks good. >>>> >>>> Tier5 (where I saw the crashes) passed. >>>> >>>> Please, incorporate the following minor cleanups in the final version: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu >>>> p/ >>>> >>>> (Tested with hs-tier1,hs-tier2.) >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Ivanov >>>>>> Sent: Thursday, July 30, 2020 3:30 AM >>>>>> To: Bhateja, Jatin >>>>>> Cc: Viswanathan, Sandhya ; >>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>> for >>>>>> X86 >>>>>> >>>>>> >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ >>>>>>> >>>>>>> Looks good. (Testing is in progress.) >>>>>> >>>>>> FYI test results are clean (tier1-tier5). >>>>>> >>>>>>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines >>>>>>>> since we are anyways doing constant folding in LShiftI/URShiftI >>>>>>>> value routines. Since JAVA rotate APIs are no longer intrincified >>>>>>>> hence these routines may no longer be useful. >>>>>>> >>>>>>> Nice observation! Good. >>>>>> >>>>>> As a second thought, it seems there's still a chance left that >>>>>> Rotate nodes get their input type narrowed after the folding >>>>>> happened. For example, as a result of incremental inlining or CFG >>>>>> transformations during loop optimizations. And it does happen in >>>>>> practice since the testing revealed some crashes due to the bug in >>>> RotateLeftNode/RotateRightNode::Ideal(). >>>>>> >>>>>> So, it makes sense to keep the transformations. But I'm fine with >>>>>> addressing that as a followup enhancement. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>> >>>>>>>> I guess you are saying remove opcodes/encoding from patterns and >>>>>>>> move then to Assembler, Can we take this cleanup activity >>>>>>>> separately since other patterns are also using these matcher >>>> directives. >>>>>>> >>>>>>> I'm perfectly fine with handling it as a separate enhancement. >>>>>>> >>>>>>>> Other synthetic comments have been taken care of. I have extended >>>>>>>> the Test to cover all the newly added scalar transforms. Kindly >>>>>>>> let me know if there other comments. >>>>>>> >>>>>>> Nice! >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Vladimir Ivanov >>>>>>>>> Sent: Friday, July 24, 2020 3:21 AM >>>>>>>>> To: Bhateja, Jatin >>>>>>>>> Cc: Viswanathan, Sandhya ; Andrew >>>>>>>>> Haley ; hotspot-compiler-dev at openjdk.java.net >>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>>>>> for >>>>>>>>> X86 >>>>>>>>> >>>>>>>>> Hi Jatin, >>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>>>>>>>> >>>>>>>>> Much better! Thanks. >>>>>>>>> >>>>>>>>>> Change Summary: >>>>>>>>>> >>>>>>>>>> 1) Unified the handling for scalar rotate operation. 
All scalar >>>>>>>>>> rotate >>>>>>>>> selection patterns are now dependent on newly created >>>>>>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>>>>>>>> Currently >>>>>>>>> if DAG nodes corresponding to a sub-pattern are shared (have >>>>>>>>> multiple >>>>>>>>> users) then existing complex patterns based on >>>>>>>>> Or/LShiftL/URShift does not get matched and this prevents inferring >>> rotate nodes. >>>>>>>>> Please refer to JIT'ed assembly output with baseline[1] and with >>>>>>>>> patch[2] . We can see that generated code size also went done >>>>>>>>> from >>>>>>>>> 832 byte to 768 bytes. Also this can cause perf degradation if >>>>>>>>> shift-or dependency chain appears inside a hot region. >>>>>>>>>> >>>>>>>>>> 2) Due to enhanced rotate inferencing new patch shows better >>>>>>>>>> performance >>>>>>>>> even for legacy targets (non AVX-512). Please refer to the perf >>>>>>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. >>>>>>>>> >>>>>>>>> Very nice! >>>>>>>>>> 3) As suggested, removed Java API intrinsification changes and >>>>>>>>>> scalar >>>>>>>>> rotate transformation are done during OrI/OrL node idealizations. >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>> (Still would be nice to factor the matching code from Ideal() >>>>>>>>> and share it between multiple use sites. Especially considering >>>>>>>>> OrVNode::Ideal() now does basically the same thing. As an >>>>>>>>> example/idea, take a look at >>>>>>>>> is_bmi_pattern() in x86.ad.) >>>>>>>>> >>>>>>>>>> 4) SLP always gets to work on new scalar Rotate nodes and >>>>>>>>>> creates vector >>>>>>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV >>>>>>>>> nodes if target does not supports vector rotates(non-AVX512). >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>>> 5) Added new instruction patterns for vector shift Left/Right >>>>>>>>>> operations >>>>>>>>> with constant shift operands. This prevents emitting extra moves >>>>>>>>> to >>>>>> XMM. >>>>>>>>> >>>>>>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >>>>>>>>> +? match(Set dst (LShiftVI src shift)); >>>>>>>>> >>>>>>>>> I'd prefer to see a uniform Ideal IR shape being used >>>>>>>>> irrespective of whether the argument is a constant or not. It >>>>>>>>> should also simplify the logic in SuperWord and make it easier >>>>>>>>> to support on >>>>>>>>> non-x86 architectures. >>>>>>>>> >>>>>>>>> For example, here's how it is done on AArch64: >>>>>>>>> >>>>>>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >>>>>>>>> ? ??? predicate(n->as_Vector()->length() == 4); >>>>>>>>> ? ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... >>>>>>>>> >>>>>>>>>> 6) Constant folding scenarios are covered in >>>>>>>>>> RotateLeft/RotateRight >>>>>>>>> idealization, inferencing of vector rotate through OrV >>>>>>>>> idealization covers the vector patterns generated though non SLP >>>> route i.e. >>>>>>>>> VectorAPI. >>>>>>>>> >>>>>>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the >>>>>>>>> general direction here - duplication of scalar transformations >>>>>>>>> to lane-wise vector operations. It definitely won't scale and in >>>>>>>>> a longer run it risks to diverge. Would be nice to find a way to >>>>>>>>> automatically "lift" >>>>>>>>> scalar transformations to vectors and apply them uniformly. But >>>>>>>>> right now it is just an idea which requires more experimentation. >>>>>>>>> >>>>>>>>> >>>>>>>>> Some other minor comments/suggestions: >>>>>>>>> >>>>>>>>> +? 
// Swap the computed left and right shift counts. >>>>>>>>> +? if (is_rotate_left) { >>>>>>>>> +??? Node* temp = shiftRCnt; >>>>>>>>> +??? shiftRCnt? = shiftLCnt; >>>>>>>>> +??? shiftLCnt? = temp; >>>>>>>>> +? } >>>>>>>>> >>>>>>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? >>>>>>>>> >>>>>>>>> >>>>>>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >>>>>>>>> +??? return true; >>>>>>>>> >>>>>>>>> Please, don't omit curly braces (even for simple cases). >>>>>>>>> >>>>>>>>> >>>>>>>>> -// Rotate Right by variable >>>>>>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, >>>>>>>>> immI0 zero, rFlagsReg cr) >>>>>>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg >>>>>>>>> +cr) >>>>>>>>> ? ?? %{ >>>>>>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI >>>>>>>>> zero shift)))); >>>>>>>>> - >>>>>>>>> +? predicate(!VM_Version::supports_bmi2() && >>>>>>>>> n->bottom_type()->basic_type() == T_INT); >>>>>>>>> +? match(Set dst (RotateRight dst shift)); >>>>>>>>> +? format %{ "rorl???? $dst, $shift" %} >>>>>>>>> ? ???? expand %{ >>>>>>>>> -??? rorI_rReg_CL(dst, shift, cr); >>>>>>>>> +??? rorI_rReg_imm8(dst, shift, cr); >>>>>>>>> ? ???? %} >>>>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>>> >>>>>>>>>> Please push the patch through your testing framework and let me >>>>>>>>>> know your >>>>>>>>> review feedback. >>>>>>>>> >>>>>>>>> There's one new assertion failure: >>>>>>>>> >>>>>>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >>>>>>>>> pid=5476, tid=6219 >>>>>>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize >>>>>>>>> should return new nodes, use Identity to return old nodes >>>>>>>>> >>>>>>>>> I believe it comes from >>>>>>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal >>>>>>>>> which can return pre-contructed constants. I suggest to get rid >>>>>>>>> of >>>>>>>>> Ideal() methods and move constant folding logic into >>>>>>>>> Node::Value() (as implemented for other bitwise/arithmethic >>>>>>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more >>>>>>>>> generic approach since it enables richer type information >>>>>>>>> (ranges vs >>>>>>>>> constants) and IMO it's more convenient to work with constants >>>>>>>>> through Types than ConNodes. >>>>>>>>> >>>>>>>>> (I suspect that original/expanded IR shape may already provide >>>>>>>>> more precise type info for non-constant case which can affect >>>>>>>>> the >>>>>>>>> benchmarks.) >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Jatin >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. 
>>>>>>>>>> txt [2] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a >>>>>>>>>> vx >>>>>>>>>> 2_ >>>>>>>>>> asm >>>>>>>>>> .txt [3] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n >>>>>>>>>> ew >>>>>>>>>> _p >>>>>>>>>> atc >>>>>>>>>> h.txt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Vladimir Ivanov >>>>>>>>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>>>>>>>> To: Bhateja, Jatin ; Andrew Haley >>>>>>>>>>> >>>>>>>>>>> Cc: Viswanathan, Sandhya ; >>>>>>>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API >>>>>>>>>>> intrinsification for >>>>>>>>>>> X86 >>>>>>>>>>> >>>>>>>>>>> Hi Jatin, >>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>>>>>>>> >>>>>>>>>>> It definitely looks better, but IMO it hasn't reached the >>>>>>>>>>> sweet spot >>>>>>>>> yet. >>>>>>>>>>> It feels like the focus is on auto-vectorizer while the burden >>>>>>>>>>> is put on scalar cases. >>>>>>>>>>> >>>>>>>>>>> First of all, considering GVN folds relevant operation >>>>>>>>>>> patterns into a single Rotate node now, what's the motivation >>>>>>>>>>> to introduce intrinsics? >>>>>>>>>>> >>>>>>>>>>> Another point is there's still significant duplication for >>>>>>>>>>> scalar cases. >>>>>>>>>>> >>>>>>>>>>> I'd prefer to see the legacy cases which rely on pattern >>>>>>>>>>> matching to go away and be substituted with instructions which >>>>>>>>>>> match Rotate instructions (migrating ). >>>>>>>>>>> >>>>>>>>>>> I understand that it will penalize the vectorization >>>>>>>>>>> implementation, but IMO reducing overall complexity is worth it. >>>>>>>>>>> On auto-vectorizer side, I see >>>>>>>>>>> 2 ways to fix it: >>>>>>>>>>> >>>>>>>>>>> ? ???? (1) introduce additional AD instructions for >>>>>>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>>>>>>>> >>>>>>>>>>> ? ???? (2) in SuperWord::output(), when matcher doesn't support >>>>>>>>>>> RotateLeftV/RotateLeftV nodes >>>>>>>>>>> (Matcher::match_rule_supported()), >>>>>>>>>>> generate vectorized version of the original pattern. >>>>>>>>>>> >>>>>>>>>>> Overall, it looks like more and more focus is made on scalar >>> part. >>>>>>>>>>> Considering the main goal of the patch is to enable >>>>>>>>>>> vectorization, I'm fine with separating cleanup of scalar part. >>>>>>>>>>> As an interim solution, it seems that leaving the scalar part >>>>>>>>>>> as it is now and matching scalar bit rotate pattern in >>>>>>>>>>> VectorNode::is_rotate() should be enough to keep the >>>>>>>>>>> vectorization part functioning. Then scalar Rotate nodes and >>>> relevant cleanups can be integrated later. >>>>>>>>>>> (Or vice >>>>>>>>>>> versa: clean up scalar part first and then follow up with >>>>>>>>>>> vectorization.) >>>>>>>>>>> >>>>>>>>>>> Some other comments: >>>>>>>>>>> >>>>>>>>>>> * There's a lot of duplication between OrINode::Ideal and >>>>>>>>> OrLNode::Ideal. >>>>>>>>>>> What do you think about introducing a super type >>>>>>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/x86.ad >>>>>>>>>>> >>>>>>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>>>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> +??????????? 
n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>>>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> The predicates are redundant here. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>>>>>>>> >>>>>>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType >>>>>>>>>>> +etype, >>>>>>>>>>> XMMRegister dst, XMMRegister src, >>>>>>>>>>> +???????????????????????????????????? int shift, int >>>>>>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { >>>>>>>>>>> +??? if (etype == T_INT) { >>>>>>>>>>> +????? evprold(dst, src, shift, vector_len); >>>>>>>>>>> +??? } else { >>>>>>>>>>> +????? evprolq(dst, src, shift, vector_len); >>>>>>>>>>> +??? } >>>>>>>>>>> >>>>>>>>>>> Please, put an assert for the false case (assert(etype == >>>>>>>>>>> T_LONG, >>>>>>>>> "...")). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * On testing (with previous version of the patch): -XX:UseAVX >>>>>>>>>>> is >>>>>>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 >>>>>> platforms. >>>>>>>>>>> Either omitting the flag or adding >>>>>>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Vladimir Ivanov >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Summary of changes: >>>>>>>>>>>> 1) Optimization is specifically targeted to exploit vector >>>>>>>>>>>> rotation >>>>>>>>>>> instruction added for X86 AVX512. A single rotate instruction >>>>>>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers >>>>>>>>>>> better latency at reduced instruction count. >>>>>>>>>>>> >>>>>>>>>>>> 2) There were two approaches to implement this: >>>>>>>>>>>> ? ?????? a)? Let everything remain the same and add new wide >>>>>>>>>>>> complex >>>>>>>>>>> instruction patterns in the matcher for e.g. >>>>>>>>>>>> ? ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary >>>>>>>>>>>> ReplicateI >>>>>>>>>>>> shift)) >>>>>>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( >>>>>>>>>>> Replicate >>>>>>>>>>> shift)) >>>>>>>>>>>> ? ?????? It would have been an overoptimistic assumption to >>>>>>>>>>>> expect that graph >>>>>>>>>>> shape would be preserved till the matcher for correct >>> inferencing. >>>>>>>>>>>> ? ?????? In addition we would have required multiple such >>>>>>>>>>>> bulky patterns. >>>>>>>>>>>> ? ?????? b) Create new RotateLeft/RotateRight scalar nodes, >>>>>>>>>>>> these gets >>>>>>>>>>> generated during intrinsification as well as during additional >>>>>>>>>>> pattern >>>>>>>>>>>> ? ?????? matching during node Idealization, later on these >>>>>>>>>>>> nodes are consumed >>>>>>>>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>>>>>>>> ? ?????? counterparts which eventually emits vector rotates. 
>>>>>>>>>>>> >>>>>>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here >>>>>>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate >>>>>>>>>>>> nodes should either be >>>>>>>>>>> dismantled back to OR/SHIFT pattern or we penalize the >>>>>>>>>>> vectorization which would be very costly, other option would >>>>>>>>>>> have been to add additional vector rotate pattern for UseAVX=3 >>>>>>>>>>> in the matcher which emit vector OR-SHIFTs instruction but >>>>>>>>>>> then it will loose on emitting efficient instruction sequence >>>>>>>>>>> which node sharing >>>>>>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus >>>>>>>>>>> it will not be beneficial for non-AVX512 targets, only saving >>>>>>>>>>> will be in terms of cleanup of few existing scalar rotate >>>>>>>>>>> matcher patterns, also old targets does not offer this >>>>>>>>>>> powerful rotate >>>>>> instruction. >>>>>>>>>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>>>>>>>>> >>>>>>>>>>>> As per suggestions constant folding scenarios have been >>>>>>>>>>>> covered during >>>>>>>>>>> Idealizations of newly added scalar nodes. >>>>>>>>>>>> >>>>>>>>>>>> Please review the latest version and share your feedback and >>>>>>>>>>>> test >>>>>>>>>>> results. >>>>>>>>>>>> >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Jatin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: Andrew Haley >>>>>>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>>>>>>>>> To: Vladimir Ivanov ; Bhateja, >>>>>>>>>>>>> Jatin ; >>>>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API >>>>>>>>>>>>> intrinsification for >>>>>>>>>>>>> X86 >>>>>>>>>>>>> >>>>>>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> ? ??? > High-level comment: so far, there were no pressing >>>>>>>>>>>>> need in >>>>>>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL >>>>>>>>>>>>> instructions >>>>>>>>>>>>>> were selected during matching [1]. Now the patch introduces >>>>>>>>>>>>>>> >>>>>>>>>>>>> dedicated nodes >>>>>>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > >>>>>>>>>>>>> which partly duplicates existing logic. >>>>>>>>>>>>> >>>>>>>>>>>>> The lack of rotate nodes in the IR has always meant that >>>>>>>>>>>>> AArch64 doesn't generate optimal code for e.g. >>>>>>>>>>>>> >>>>>>>>>>>>> ? ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>>>>>>>>> >>>>>>>>>>>>> because, with the RotateLeft expanded to its full >>>>>>>>>>>>> combination of ORs and shifts, it's to complicated to match. >>>>>>>>>>>>> At the time I put this to one side because it wasn't urgent. >>>>>>>>>>>>> This is a shame because although such combinations are >>>>>>>>>>>>> unusual they are used in some crypto >>>>>>>>> operations. >>>>>>>>>>>>> >>>>>>>>>>>>> If we can generate immediate-form rotate nodes early by >>>>>>>>>>>>> pattern matching during parsing (rather than depending on >>>>>>>>>>>>> intrinsics) we'll get more value than by depending on >>>>>>>>>>>>> programmers calling >>>>>> intrinsics. >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Andrew Haley? (he/him) >>>>>>>>>>>>> Java Platform Lead Engineer >>>>>>>>>>>>> Red Hat UK Ltd. 
>>>>>>>>>>>>> https://keybase.io/andrewhaley >>>>>>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>>>>>>>>> From vladimir.kozlov at oracle.com Fri Aug 7 17:10:36 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Aug 2020 10:10:36 -0700 Subject: [16] (S) RFR 8251260: two MD5 tests fail "RuntimeException: Unexpected count of intrinsic" In-Reply-To: <08098628-7484-f2dc-019a-54a45f37a9c9@oracle.com> References: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> <08098628-7484-f2dc-019a-54a45f37a9c9@oracle.com> Message-ID: <80bf7d5e-6446-817f-732c-519fc0383ff5@oracle.com> Thank you, Vladimir On 8/7/20 10:06 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> https://bugs.openjdk.java.net/browse/JDK-8251260 >> >> New MD5 intrinsic tests failed when run with AOTed java.base. And old SHA tests are problem listed for AOT. >> >> SHA and MD5 intrinsic tests parse -XX:+LogCompilation output looking for compilation of sun/security/provider methods >> as intrinsics. But these methods are already pre-compiled by AOT when AOTed java.base is used. As result >> LogCompilation does not have corresponding entries. >> >> I think we should not run these MD5 and SHA tests with AOTed java.base module. I added corresponding @requires. >> >> Old SHA tests were problem listed referencing 8167430 [1] bug but I think it is incorrect. The original SHA tests >> crash with AOT 8207358 [2] bug was closed as duplicate of 8167430 because of conflict how intirnsics flags are set by >> default during AOT compilation. But we simply should not run these tests with AOTed java.base. So I am adding >> @requires to them as well and removing them from AOT problem list. >> >> Tested hs-tier1, hs-tier2 (runs sha,md5 tests), hs-tier6 (now skips sha,md5 tests when AOTed java.base is used). >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8167430 >> [2] https://bugs.openjdk.java.net/browse/JDK-8207358 From luhenry at microsoft.com Sat Aug 8 04:30:37 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sat, 8 Aug 2020 04:30:37 +0000 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 Message-ID: Hello, I would like to backport the newly added MD5 Intrinsic to JDK 11. The change is contained, limiting the chance of a regression, and provides a great speedup on a common pattern. This change also contains the follow-up fix by Vladimir Kozlov. As it is the first backport I go through, please let me know what other steps I need to take. Original Bugs: https://bugs.openjdk.java.net/browse/JDK-8250902 https://bugs.openjdk.java.net/browse/JDK-8251260 Original Webrevs: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8251319 Webrev: http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00 Testing: Linux-x64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha, hotspot:tier1, jdk:tier1. Thank you, Ludovic [1] From vladimir.kozlov at oracle.com Sat Aug 8 04:49:49 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Aug 2020 21:49:49 -0700 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: References: Message-ID: <569d6c8b-136f-de81-5816-47216333fcb8@oracle.com> Hi Ludovic, Usually we backport only bugs fixes to keep LTS (11u) release stable. To backport into 11u you need approval [1]. Here is example [2]. 
You need also point if backport applied cleanly or you have to make changes. Changes should be backported separately to keep track - do not combine changes. But it is okay to push both changesets together (especially if followup changes fixed first). Regards, Vladimir K [1] http://openjdk.java.net/projects/jdk-updates/approval.html [2] https://bugs.openjdk.java.net/browse/JDK-8248214 On 8/7/20 9:30 PM, Ludovic Henry wrote: > Hello, > > I would like to backport the newly added MD5 Intrinsic to JDK 11. The change is contained, limiting the chance of a regression, and provides a great speedup on a common pattern. This change also contains the follow-up fix by Vladimir Kozlov. > > As it is the first backport I go through, please let me know what other steps I need to take. > > Original Bugs: > https://bugs.openjdk.java.net/browse/JDK-8250902 > https://bugs.openjdk.java.net/browse/JDK-8251260 > > Original Webrevs: > http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8251319 > > Webrev: > http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00 > > Testing: Linux-x64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha, hotspot:tier1, jdk:tier1. > > Thank you, > Ludovic > > [1] > > From aph at redhat.com Sat Aug 8 12:08:46 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 8 Aug 2020 13:08:46 +0100 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: References: Message-ID: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> On 8/8/20 5:30 AM, Ludovic Henry wrote: > I would like to backport the newly added MD5 Intrinsic to JDK 11. It's too early for that: changes are supposed to bake in JDK head for a while. Also, since it's an enhancement rather than a bug fix we'd need to have the discussion. I would say it's marginal whether something like this should be back ported. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From luhenry at microsoft.com Sat Aug 8 17:30:07 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sat, 8 Aug 2020 17:30:07 +0000 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> References: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> Message-ID: Hi Andrew, Vladimir, > It's too early for that: changes are supposed to bake in JDK head for > a while. Also, since it's an enhancement rather than a bug fix we'd > need to have the discussion. I would say it's marginal whether > something like this should be back ported. > Usually we backport only bugs fixes to keep LTS (11u) release stable. It makes perfect sense. I'm happy to wait longer, and follow up on that thread later on to check if there is any appetite to get it backported. > You need also point if backport applied cleanly or you have to make changes. The code conflicts were trivial as the infrastructure for intrinsics didn't change much since 11 (and even 8). Conflicts: http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00/conflict.diff > Changes should be backported separately to keep track - do not combine changes. > But it is okay to push both changesets together (especially if followup changes fixed first). Sorry I do not fully understand. Is it ok in this case to combine both changes into a single changeset, since the second one is a followup that fixes the first one? 
Or should I still make 2 changeset, but have them pushed together? Thank you, Ludovic From jatin.bhateja at intel.com Sat Aug 8 21:06:18 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Sat, 8 Aug 2020 21:06:18 +0000 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: <9748e2ee-f47d-7c47-627d-58e7d98e1779@oracle.com> References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com>, <9748e2ee-f47d-7c47-627d-58e7d98e1779@oracle.com> Message-ID: Thanks Vladimir K, Vladimir I, Patch has been pushed with suggested changes. https://hg.openjdk.java.net/jdk/jdk/rev/ebe6d3b79edf Best Regards, Jatin -------- Original message -------- From: Vladimir Kozlov Date: 07/08/2020 22:40 (GMT+05:30) To: Vladimir Ivanov , "Bhateja, Jatin" Cc: "Viswanathan, Sandhya" , hotspot-compiler-dev at openjdk.java.net, Andrew Haley Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 I see that you already discussed removal of opcodes/encoding from patterns in .ad file and move them to Assembler. I would like to see that too. Changes look good otherwise. Thank you for adding tests to verify new code. Thanks, Vladimir K On 8/7/20 5:15 AM, Vladimir Ivanov wrote: > >>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ > > Still looks good. > > It would be nice to get one more (R)eview. Let's wait a little bit more. > > Best regards, > Vladimir Ivanov > >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Saturday, August 1, 2020 4:49 AM >>>> To: Bhateja, Jatin >>>> Cc: Viswanathan, Sandhya ; >>>> hotspot-compiler- dev at openjdk.java.net >>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>>> X86 >>>> >>>> >>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ >>>> >>>> Looks good. >>>> >>>> Tier5 (where I saw the crashes) passed. >>>> >>>> Please, incorporate the following minor cleanups in the final version: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu >>>> p/ >>>> >>>> (Tested with hs-tier1,hs-tier2.) >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Ivanov >>>>>> Sent: Thursday, July 30, 2020 3:30 AM >>>>>> To: Bhateja, Jatin >>>>>> Cc: Viswanathan, Sandhya ; >>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>> for >>>>>> X86 >>>>>> >>>>>> >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ >>>>>>> >>>>>>> Looks good. (Testing is in progress.) >>>>>> >>>>>> FYI test results are clean (tier1-tier5). >>>>>> >>>>>>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines >>>>>>>> since we are anyways doing constant folding in LShiftI/URShiftI >>>>>>>> value routines. Since JAVA rotate APIs are no longer intrincified >>>>>>>> hence these routines may no longer be useful. >>>>>>> >>>>>>> Nice observation! Good. >>>>>> >>>>>> As a second thought, it seems there's still a chance left that >>>>>> Rotate nodes get their input type narrowed after the folding >>>>>> happened. For example, as a result of incremental inlining or CFG >>>>>> transformations during loop optimizations. And it does happen in >>>>>> practice since the testing revealed some crashes due to the bug in >>>> RotateLeftNode/RotateRightNode::Ideal(). >>>>>> >>>>>> So, it makes sense to keep the transformations. 
But I'm fine with >>>>>> addressing that as a followup enhancement. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>> >>>>>>>> I guess you are saying remove opcodes/encoding from patterns and >>>>>>>> move then to Assembler, Can we take this cleanup activity >>>>>>>> separately since other patterns are also using these matcher >>>> directives. >>>>>>> >>>>>>> I'm perfectly fine with handling it as a separate enhancement. >>>>>>> >>>>>>>> Other synthetic comments have been taken care of. I have extended >>>>>>>> the Test to cover all the newly added scalar transforms. Kindly >>>>>>>> let me know if there other comments. >>>>>>> >>>>>>> Nice! >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Vladimir Ivanov >>>>>>>>> Sent: Friday, July 24, 2020 3:21 AM >>>>>>>>> To: Bhateja, Jatin >>>>>>>>> Cc: Viswanathan, Sandhya ; Andrew >>>>>>>>> Haley ; hotspot-compiler-dev at openjdk.java.net >>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>>>>> for >>>>>>>>> X86 >>>>>>>>> >>>>>>>>> Hi Jatin, >>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>>>>>>>> >>>>>>>>> Much better! Thanks. >>>>>>>>> >>>>>>>>>> Change Summary: >>>>>>>>>> >>>>>>>>>> 1) Unified the handling for scalar rotate operation. All scalar >>>>>>>>>> rotate >>>>>>>>> selection patterns are now dependent on newly created >>>>>>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>>>>>>>> Currently >>>>>>>>> if DAG nodes corresponding to a sub-pattern are shared (have >>>>>>>>> multiple >>>>>>>>> users) then existing complex patterns based on >>>>>>>>> Or/LShiftL/URShift does not get matched and this prevents inferring >>> rotate nodes. >>>>>>>>> Please refer to JIT'ed assembly output with baseline[1] and with >>>>>>>>> patch[2] . We can see that generated code size also went done >>>>>>>>> from >>>>>>>>> 832 byte to 768 bytes. Also this can cause perf degradation if >>>>>>>>> shift-or dependency chain appears inside a hot region. >>>>>>>>>> >>>>>>>>>> 2) Due to enhanced rotate inferencing new patch shows better >>>>>>>>>> performance >>>>>>>>> even for legacy targets (non AVX-512). Please refer to the perf >>>>>>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. >>>>>>>>> >>>>>>>>> Very nice! >>>>>>>>>> 3) As suggested, removed Java API intrinsification changes and >>>>>>>>>> scalar >>>>>>>>> rotate transformation are done during OrI/OrL node idealizations. >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>> (Still would be nice to factor the matching code from Ideal() >>>>>>>>> and share it between multiple use sites. Especially considering >>>>>>>>> OrVNode::Ideal() now does basically the same thing. As an >>>>>>>>> example/idea, take a look at >>>>>>>>> is_bmi_pattern() in x86.ad.) >>>>>>>>> >>>>>>>>>> 4) SLP always gets to work on new scalar Rotate nodes and >>>>>>>>>> creates vector >>>>>>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV >>>>>>>>> nodes if target does not supports vector rotates(non-AVX512). >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>>> 5) Added new instruction patterns for vector shift Left/Right >>>>>>>>>> operations >>>>>>>>> with constant shift operands. This prevents emitting extra moves >>>>>>>>> to >>>>>> XMM. 
>>>>>>>>> >>>>>>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >>>>>>>>> + match(Set dst (LShiftVI src shift)); >>>>>>>>> >>>>>>>>> I'd prefer to see a uniform Ideal IR shape being used >>>>>>>>> irrespective of whether the argument is a constant or not. It >>>>>>>>> should also simplify the logic in SuperWord and make it easier >>>>>>>>> to support on >>>>>>>>> non-x86 architectures. >>>>>>>>> >>>>>>>>> For example, here's how it is done on AArch64: >>>>>>>>> >>>>>>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >>>>>>>>> predicate(n->as_Vector()->length() == 4); >>>>>>>>> match(Set dst (LShiftVI src (LShiftCntV shift))); ... >>>>>>>>> >>>>>>>>>> 6) Constant folding scenarios are covered in >>>>>>>>>> RotateLeft/RotateRight >>>>>>>>> idealization, inferencing of vector rotate through OrV >>>>>>>>> idealization covers the vector patterns generated though non SLP >>>> route i.e. >>>>>>>>> VectorAPI. >>>>>>>>> >>>>>>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the >>>>>>>>> general direction here - duplication of scalar transformations >>>>>>>>> to lane-wise vector operations. It definitely won't scale and in >>>>>>>>> a longer run it risks to diverge. Would be nice to find a way to >>>>>>>>> automatically "lift" >>>>>>>>> scalar transformations to vectors and apply them uniformly. But >>>>>>>>> right now it is just an idea which requires more experimentation. >>>>>>>>> >>>>>>>>> >>>>>>>>> Some other minor comments/suggestions: >>>>>>>>> >>>>>>>>> + // Swap the computed left and right shift counts. >>>>>>>>> + if (is_rotate_left) { >>>>>>>>> + Node* temp = shiftRCnt; >>>>>>>>> + shiftRCnt = shiftLCnt; >>>>>>>>> + shiftLCnt = temp; >>>>>>>>> + } >>>>>>>>> >>>>>>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? >>>>>>>>> >>>>>>>>> >>>>>>>>> + if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >>>>>>>>> + return true; >>>>>>>>> >>>>>>>>> Please, don't omit curly braces (even for simple cases). >>>>>>>>> >>>>>>>>> >>>>>>>>> -// Rotate Right by variable >>>>>>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, >>>>>>>>> immI0 zero, rFlagsReg cr) >>>>>>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg >>>>>>>>> +cr) >>>>>>>>> %{ >>>>>>>>> - match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI >>>>>>>>> zero shift)))); >>>>>>>>> - >>>>>>>>> + predicate(!VM_Version::supports_bmi2() && >>>>>>>>> n->bottom_type()->basic_type() == T_INT); >>>>>>>>> + match(Set dst (RotateRight dst shift)); >>>>>>>>> + format %{ "rorl $dst, $shift" %} >>>>>>>>> expand %{ >>>>>>>>> - rorI_rReg_CL(dst, shift, cr); >>>>>>>>> + rorI_rReg_imm8(dst, shift, cr); >>>>>>>>> %} >>>>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>>> >>>>>>>>>> Please push the patch through your testing framework and let me >>>>>>>>>> know your >>>>>>>>> review feedback. >>>>>>>>> >>>>>>>>> There's one new assertion failure: >>>>>>>>> >>>>>>>>> # Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >>>>>>>>> pid=5476, tid=6219 >>>>>>>>> # assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize >>>>>>>>> should return new nodes, use Identity to return old nodes >>>>>>>>> >>>>>>>>> I believe it comes from >>>>>>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal >>>>>>>>> which can return pre-contructed constants. 
I suggest to get rid >>>>>>>>> of >>>>>>>>> Ideal() methods and move constant folding logic into >>>>>>>>> Node::Value() (as implemented for other bitwise/arithmethic >>>>>>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more >>>>>>>>> generic approach since it enables richer type information >>>>>>>>> (ranges vs >>>>>>>>> constants) and IMO it's more convenient to work with constants >>>>>>>>> through Types than ConNodes. >>>>>>>>> >>>>>>>>> (I suspect that original/expanded IR shape may already provide >>>>>>>>> more precise type info for non-constant case which can affect >>>>>>>>> the >>>>>>>>> benchmarks.) >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Jatin >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. >>>>>>>>>> txt [2] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a >>>>>>>>>> vx >>>>>>>>>> 2_ >>>>>>>>>> asm >>>>>>>>>> .txt [3] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n >>>>>>>>>> ew >>>>>>>>>> _p >>>>>>>>>> atc >>>>>>>>>> h.txt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Vladimir Ivanov >>>>>>>>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>>>>>>>> To: Bhateja, Jatin ; Andrew Haley >>>>>>>>>>> >>>>>>>>>>> Cc: Viswanathan, Sandhya ; >>>>>>>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API >>>>>>>>>>> intrinsification for >>>>>>>>>>> X86 >>>>>>>>>>> >>>>>>>>>>> Hi Jatin, >>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>>>>>>>> >>>>>>>>>>> It definitely looks better, but IMO it hasn't reached the >>>>>>>>>>> sweet spot >>>>>>>>> yet. >>>>>>>>>>> It feels like the focus is on auto-vectorizer while the burden >>>>>>>>>>> is put on scalar cases. >>>>>>>>>>> >>>>>>>>>>> First of all, considering GVN folds relevant operation >>>>>>>>>>> patterns into a single Rotate node now, what's the motivation >>>>>>>>>>> to introduce intrinsics? >>>>>>>>>>> >>>>>>>>>>> Another point is there's still significant duplication for >>>>>>>>>>> scalar cases. >>>>>>>>>>> >>>>>>>>>>> I'd prefer to see the legacy cases which rely on pattern >>>>>>>>>>> matching to go away and be substituted with instructions which >>>>>>>>>>> match Rotate instructions (migrating ). >>>>>>>>>>> >>>>>>>>>>> I understand that it will penalize the vectorization >>>>>>>>>>> implementation, but IMO reducing overall complexity is worth it. >>>>>>>>>>> On auto-vectorizer side, I see >>>>>>>>>>> 2 ways to fix it: >>>>>>>>>>> >>>>>>>>>>> (1) introduce additional AD instructions for >>>>>>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>>>>>>>> >>>>>>>>>>> (2) in SuperWord::output(), when matcher doesn't support >>>>>>>>>>> RotateLeftV/RotateLeftV nodes >>>>>>>>>>> (Matcher::match_rule_supported()), >>>>>>>>>>> generate vectorized version of the original pattern. >>>>>>>>>>> >>>>>>>>>>> Overall, it looks like more and more focus is made on scalar >>> part. >>>>>>>>>>> Considering the main goal of the patch is to enable >>>>>>>>>>> vectorization, I'm fine with separating cleanup of scalar part. >>>>>>>>>>> As an interim solution, it seems that leaving the scalar part >>>>>>>>>>> as it is now and matching scalar bit rotate pattern in >>>>>>>>>>> VectorNode::is_rotate() should be enough to keep the >>>>>>>>>>> vectorization part functioning. 
Then scalar Rotate nodes and >>>> relevant cleanups can be integrated later. >>>>>>>>>>> (Or vice >>>>>>>>>>> versa: clean up scalar part first and then follow up with >>>>>>>>>>> vectorization.) >>>>>>>>>>> >>>>>>>>>>> Some other comments: >>>>>>>>>>> >>>>>>>>>>> * There's a lot of duplication between OrINode::Ideal and >>>>>>>>> OrLNode::Ideal. >>>>>>>>>>> What do you think about introducing a super type >>>>>>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/x86.ad >>>>>>>>>>> >>>>>>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>>>>>>>> + predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> + n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>>>>>>>> + predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> + n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> The predicates are redundant here. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>>>>>>>> >>>>>>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType >>>>>>>>>>> +etype, >>>>>>>>>>> XMMRegister dst, XMMRegister src, >>>>>>>>>>> + int shift, int >>>>>>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { >>>>>>>>>>> + if (etype == T_INT) { >>>>>>>>>>> + evprold(dst, src, shift, vector_len); >>>>>>>>>>> + } else { >>>>>>>>>>> + evprolq(dst, src, shift, vector_len); >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> Please, put an assert for the false case (assert(etype == >>>>>>>>>>> T_LONG, >>>>>>>>> "...")). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * On testing (with previous version of the patch): -XX:UseAVX >>>>>>>>>>> is >>>>>>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 >>>>>> platforms. >>>>>>>>>>> Either omitting the flag or adding >>>>>>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Vladimir Ivanov >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Summary of changes: >>>>>>>>>>>> 1) Optimization is specifically targeted to exploit vector >>>>>>>>>>>> rotation >>>>>>>>>>> instruction added for X86 AVX512. A single rotate instruction >>>>>>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers >>>>>>>>>>> better latency at reduced instruction count. >>>>>>>>>>>> >>>>>>>>>>>> 2) There were two approaches to implement this: >>>>>>>>>>>> a) Let everything remain the same and add new wide >>>>>>>>>>>> complex >>>>>>>>>>> instruction patterns in the matcher for e.g. >>>>>>>>>>>> set Dst ( OrV (Binary (LShiftVI dst (Binary >>>>>>>>>>>> ReplicateI >>>>>>>>>>>> shift)) >>>>>>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( >>>>>>>>>>> Replicate >>>>>>>>>>> shift)) >>>>>>>>>>>> It would have been an overoptimistic assumption to >>>>>>>>>>>> expect that graph >>>>>>>>>>> shape would be preserved till the matcher for correct >>> inferencing. >>>>>>>>>>>> In addition we would have required multiple such >>>>>>>>>>>> bulky patterns. 
>>>>>>>>>>>> b) Create new RotateLeft/RotateRight scalar nodes, >>>>>>>>>>>> these gets >>>>>>>>>>> generated during intrinsification as well as during additional >>>>>>>>>>> pattern >>>>>>>>>>>> matching during node Idealization, later on these >>>>>>>>>>>> nodes are consumed >>>>>>>>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>>>>>>>> counterparts which eventually emits vector rotates. >>>>>>>>>>>> >>>>>>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here >>>>>>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate >>>>>>>>>>>> nodes should either be >>>>>>>>>>> dismantled back to OR/SHIFT pattern or we penalize the >>>>>>>>>>> vectorization which would be very costly, other option would >>>>>>>>>>> have been to add additional vector rotate pattern for UseAVX=3 >>>>>>>>>>> in the matcher which emit vector OR-SHIFTs instruction but >>>>>>>>>>> then it will loose on emitting efficient instruction sequence >>>>>>>>>>> which node sharing >>>>>>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus >>>>>>>>>>> it will not be beneficial for non-AVX512 targets, only saving >>>>>>>>>>> will be in terms of cleanup of few existing scalar rotate >>>>>>>>>>> matcher patterns, also old targets does not offer this >>>>>>>>>>> powerful rotate >>>>>> instruction. >>>>>>>>>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>>>>>>>>> >>>>>>>>>>>> As per suggestions constant folding scenarios have been >>>>>>>>>>>> covered during >>>>>>>>>>> Idealizations of newly added scalar nodes. >>>>>>>>>>>> >>>>>>>>>>>> Please review the latest version and share your feedback and >>>>>>>>>>>> test >>>>>>>>>>> results. >>>>>>>>>>>> >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Jatin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: Andrew Haley >>>>>>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>>>>>>>>> To: Vladimir Ivanov ; Bhateja, >>>>>>>>>>>>> Jatin ; >>>>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API >>>>>>>>>>>>> intrinsification for >>>>>>>>>>>>> X86 >>>>>>>>>>>>> >>>>>>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> > High-level comment: so far, there were no pressing >>>>>>>>>>>>> need in >>>>>>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL >>>>>>>>>>>>> instructions >>>>>>>>>>>>>> were selected during matching [1]. Now the patch introduces >>>>>>>>>>>>>>> >>>>>>>>>>>>> dedicated nodes >>>>>>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics > >>>>>>>>>>>>> which partly duplicates existing logic. >>>>>>>>>>>>> >>>>>>>>>>>>> The lack of rotate nodes in the IR has always meant that >>>>>>>>>>>>> AArch64 doesn't generate optimal code for e.g. >>>>>>>>>>>>> >>>>>>>>>>>>> (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>>>>>>>>> >>>>>>>>>>>>> because, with the RotateLeft expanded to its full >>>>>>>>>>>>> combination of ORs and shifts, it's to complicated to match. >>>>>>>>>>>>> At the time I put this to one side because it wasn't urgent. >>>>>>>>>>>>> This is a shame because although such combinations are >>>>>>>>>>>>> unusual they are used in some crypto >>>>>>>>> operations. 
>>>>>>>>>>>>> >>>>>>>>>>>>> If we can generate immediate-form rotate nodes early by >>>>>>>>>>>>> pattern matching during parsing (rather than depending on >>>>>>>>>>>>> intrinsics) we'll get more value than by depending on >>>>>>>>>>>>> programmers calling >>>>>> intrinsics. >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Andrew Haley (he/him) >>>>>>>>>>>>> Java Platform Lead Engineer >>>>>>>>>>>>> Red Hat UK Ltd. >>>>>>>>>>>>> https://keybase.io/andrewhaley >>>>>>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>>>>>>>>> From vladimir.kozlov at oracle.com Sun Aug 9 02:28:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 8 Aug 2020 19:28:42 -0700 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: References: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> Message-ID: On 8/8/20 10:30 AM, Ludovic Henry wrote: > Hi Andrew, Vladimir, > >> It's too early for that: changes are supposed to bake in JDK head for >> a while. Also, since it's an enhancement rather than a bug fix we'd >> need to have the discussion. I would say it's marginal whether >> something like this should be back ported. > >> Usually we backport only bugs fixes to keep LTS (11u) release stable. > > It makes perfect sense. I'm happy to wait longer, and follow up on that thread later on to check if there is any appetite to get it backported. > >> You need also point if backport applied cleanly or you have to make changes. > > The code conflicts were trivial as the infrastructure for intrinsics didn't change much since 11 (and even 8). > > Conflicts: > http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00/conflict.diff > >> Changes should be backported separately to keep track - do not combine changes. >> But it is okay to push both changesets together (especially if followup changes fixed first). > > Sorry I do not fully understand. Is it ok in this case to combine both changes into a single changeset, since the second one is a followup that fixes the first one? Or should I still make 2 changeset, but have them pushed together? It is not okay to combine changes into a single changeset. You need to make 2 (in this case) separate changesets but push them together. You can push them separately too but there is a chance that second push may miss a new build which would includes only first push. Also if a changeset applies cleanly you can use "hg export" and "hg import" commands - no need to do new commit. If changeset does not apply cleanly you need to send RFR for backport as you correctly did. Regards, Vladimir > > Thank you, > Ludovic > From luhenry at microsoft.com Sun Aug 9 03:19:20 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sun, 9 Aug 2020 03:19:20 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Message-ID: Hello, Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): -XX:-UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 
0.001 ops/ms

-XX:+UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error  Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  2005.233 ± 40.513  ops/ms  => 24% speedup
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   275.979 ±  0.455  ops/ms  => 28% speedup
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.279 ±  0.001  ops/ms  => 22% speedup

Thank you,
Ludovic

[1] https://bugs.openjdk.java.net/browse/JDK-8250902

From aph at redhat.com Sun Aug 9 14:32:48 2020
From: aph at redhat.com (Andrew Haley)
Date: Sun, 9 Aug 2020 15:32:48 +0100
Subject: [aarch64-port-dev ] RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop
In-Reply-To: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com>
References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com>
Message-ID: 

On 8/7/20 10:04 AM, Nick Gasson wrote:
> Bug:https://bugs.openjdk.java.net/browse/JDK-8247354
> Webrev:http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/

How did you test this? I'm looking through the test suite, but I can't
find the test vectors. They must be in there somewhere.

https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From nick.gasson at arm.com Mon Aug 10 01:34:41 2020
From: nick.gasson at arm.com (Nick Gasson)
Date: Mon, 10 Aug 2020 09:34:41 +0800
Subject: [aarch64-port-dev ] RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop
In-Reply-To: 
References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com>
Message-ID: <85lfinwafi.fsf@nicgas01-pc.shanghai.arm.com>
src/hotspot/share/utilities/exceptions.cpp src/hotspot/share/utilities/exceptions.hpp I don't think the changes here are correct or safe in general. First, adding the new macro and function to only clear non-async exceptions is fine itself. But naming wise the fact only non-async exceptions are cleared should be evident, and there is no "check" involved (in the sense of the existing CHECK_ macros) so I suggest: s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/ s/check_clear_pending_exception/clear_pending_nonasync_exceptions/ But changing the existing CHECK_AND_CLEAR macros to now leave async exceptions pending seems potentially dangerous as calling code may not be prepared for there to now be a pending exception. For example the use in thread.cpp: JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); get_java_runtime_name() is currently guaranteed to clear all exceptions, so all the other code is known to be safe to call. But that would no longer be true. That said, this is VM initialization code and an async exception is impossible at this stage. I think I would rather see CHECK_AND_CLEAR left as-is, and an actual CHECK_AND_CLEAR_NONASYNC introduced for those users of CHECK_AND_CLEAR that can encounter async exceptions and which should not clear them. + if (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) && + _pending_exception->klass() != SystemDictionary::InternalError_klass()) { Flagging all InternalErrors as async exceptions is probably also not correct. I don't see a good solution to this at the moment. I think we would need to introduce a new subclass of InternalError for the unsafe access error case**. Now it may be that all the other InternalError usages are "impossible" in the context of where the new macros are to be used, but that is very difficult to establish or assert. ** Or perhaps we could inject a field that allows the VM to identify instances related to unsafe access errors ... Ideally of course these unsafe access errors would be distinct from the async exception mechanism - something I would still like to pursue. --- General comments ... There is a general change from "JavaThread* thread" to "Thread* THREAD" (or TRAPS) to allow the use of the CHECK macros. This is unfortunate because the fact the thread is restricted to being a JavaThread is no longer evident in the method signatures. That is a flaw with the TRAPS/CHECK mechanism unfortunately :( . But as the methods no longer take a JavaThread* arg, they should assert that THREAD->is_Java_thread(). I will also look at an RFE to have as_JavaThread() to avoid the need for separate assertion checks before casting from "Thread*" to "JavaThread*". Note there's no need to use CHECK when the enclosing method is going to return immediately after the call that contains the CHECK. It just adds unnecessary checking of the exception state. The use of TRAPS shows that the methods may return with an exception pending. I've flagged all such occurrences I spotted below. --- + // Only metaspace OOM is expected. no Java code executed. Nit: s/no/No src/hotspot/share/compiler/compilationPolicy.cpp 410 method_invocation_event(method, CHECK_NULL); 489 CompileBroker::compile_method(m, InvocationEntryBci, comp_level, m, hot_count, CompileTask::Reason_InvocationCount, CHECK); Nit: there's no need to use CHECK here. 
--- src/hotspot/share/compiler/tieredThresholdPolicy.cpp 504 method_invocation_event(method, inlinee, comp_level, nm, CHECK_NULL); 570 compile(mh, bci, CompLevel_simple, CHECK); 581 compile(mh, bci, CompLevel_simple, CHECK); 595 CompileBroker::compile_method(mh, bci, level, mh, hot_count, CompileTask::Reason_Tiered, CHECK); 1062 compile(mh, InvocationEntryBci, next_level, CHECK); Nit: there's no need to use CHECK here. 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, Thread* THREAD) { Thank you for correcting this misuse of the THREAD name on a JavaThread* type. --- src/hotspot/share/interpreter/linkResolver.cpp 128 CompilationPolicy::compile_if_required(selected_method, CHECK); Nit: there's no need to use CHECK here. --- src/hotspot/share/jvmci/compilerRuntime.cpp 260 CompilationPolicy::policy()->event(emh, mh, InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK); 280 nmethod* osr_nm = CompilationPolicy::policy()->event(emh, mh, branch_bci, target_bci, CompLevel_aot, cm, CHECK); Nit: there's no need to use CHECK here. --- src/hotspot/share/jvmci/jvmciRuntime.cpp 102 // Donot clear probable async exceptions. typo: s/Donot/Do not/ --- src/hotspot/share/runtime/deoptimization.cpp 1686 void Deoptimization::load_class_by_index(const constantPoolHandle& constant_pool, int index) { This method should be declared with TRAPS now. 1693 // Donot clear probable Async Exceptions. typo: s/Donot/Do not/ > testing : mach1-5(links in jbs) There is very little existing testing that will actually test the key changes you have made here. You will need to do direct fault-injection testing anywhere you now allow async exceptions to remain, to see if the calling code can tolerate that. It will be difficult to test thoroughly. Thanks again for tackling this difficult problem! David ----- > > While working on JDK-8246381 it was noticed that compilation request > path clears all exceptions(including async) and doesn't propagate[1]. > > Fix: patch restores the propagation behavior for the probable async > exceptions. > > Compilation request path propagate exception as in [2]. MDO and > MethodCounter doesn't expect any exception other than metaspace > OOM(added comments). > > Deoptimization path doesn't clear probable async exceptions and take > unpack_exception path for non uncommontraps. > > Added java_lang_InternalError to well known classes. > > Request for review. > > Best Regards, > > Jamsheed > > [1] w.r.t changes done for JDK-7131259 > > [2] > > ??? (a) > ??? -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp > ????? | > ?????? ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp > ???????? | > ????????? ------ compileBroker.cpp > > ??? (b) > ??? Xcomp versions > ??? ------> compilationPolicy.cpp > ?????? | > ??????? ------> compileBroker.cpp > > ??? (c) > > ??? Direct call to? compile_method in compileBroker.cpp > > ??? JVMCI bootstrap, whitebox, replayCompile. 
> > From vladimir.kozlov at oracle.com Mon Aug 10 04:25:54 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 9 Aug 2020 21:25:54 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash Message-ID: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8249749 SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) As result it can't find memory reference to align vectors. But code ignores that and continue execution. Later when align_to_ref is referenced we hit SEGV because it is NULL. The fix is to check align_to_ref for NULL early and bailout. I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to vectorize test's code. And added missing _invar setting. And I slightly modified tracking code to investigate this issue. Added new test to check some complex address expressions similar to bug's test case. Not all cases in test are vectorized - there are other conditions which prevent that. Tested tier1,tier2,hs-tier3,precheckin-comp Thanks, Vladimir K From Pengfei.Li at arm.com Mon Aug 10 04:45:17 2020 From: Pengfei.Li at arm.com (Pengfei Li) Date: Mon, 10 Aug 2020 04:45:17 +0000 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: Hi Andrew Dinn, Thank you so much for taking the time to review this. As Ningsheng is on leave this week, I will attempt to answer your specific questions based on what I know. I'm sorry that I'm not able to answer all your questions since I'm not familiar with every detail of the patch. And you may still need to wait him coming back to update the webrev(s). > I was able to test this patch on a loaned Fujitsu FX700. I replicated your > results, passing tier1 tests and the jtreg compiler tests in vectorization, > codegen, c2/cr6340864 and loopopts. > > I also eyeballed /some/ of the generated code to check that it looked ok. I'd > really like to be able to do that systematically for a comprehensive test suite > that exercised every rule but I only had the machine for a few days. This > really ought to be done as a follow-up to ensure that all the rules are working > as expected. Not sure if you have tried my newly added test in the vectorization folder. It checks if expected SVE/NEON instructions are generated as expected for each C2 vectornode by checking the OptoAssembly output. I put it in another webrev so you may have missed it. http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/ > Specific Comments (feature webrev): > > > globals_aarch64.hpp:102 > > Just out of interest why does UseSVE have range(0,2)? It seems you are only > testing for UseSVE > 0. Does value 2 correspond to an optional subset? AArch64 SVE has multiple versions. Current Fujitsu FX machine supports SVE1 only. We leave 2 here for SVE2 support in the near future. 
https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator/resources/tutorials/sve/sve-vs-sve2/introduction-to-sve2 > Specific Comments (register allocator webrev): > > > aarch64.ad:97-100 > > Why have you added a reg_def for R8 and R9 here and also to alloc_class > chunk0 at lines 544-545? They aren't used by C2 so why define them? This has no functionality change to the two scratch registers. But if these are missing in the register definition, the regmask for vector registers won't start at an aligned position. So we prefer adding them back to make the computation easier. > assembler_aarch64.hpp:280 (also 699) > > prf sets a predicate register field. pgrf sets a governing predicate register field. > Should the name not be gprf. I guess the reason is that the ArmARM doc says "the Pg field". > chaitin.cpp:648-660 > > The comment is rather oddly formatted. Thanks for catching this. > At line 650 you guard the assert with a test for lrg._is_vector. Is that not > always going to be guaranteed by the outer condition lrg._is_scalable? If so > then you should really assert lrg._is_vector. > > The special case code for computation of num_regs for a vector stack slot > also appears in this file with a slightly different organization in find_first_set > (line 1350) and in PhaseChaitin::Select (line 1590). > There is another similar case in RegMask::num_registers at regmask.cpp: > 98. It would be better to factor out the common code into methods of LRG. > Maybe using the following? > > bool LRG::is_scalable_vector() { > if (_is_scalable) { > assert(_is_vector == 1); > assert(_num_regs == == RegMask::SlotsPerVecA) > return true; > } > return false; > } > > int LRG::scalable_num_regs() { > assert(is_scalable_vector()); > if (OptoReg::is_stack(_reg)) { > return _scalable_reg_slots > } else { > return num_reg_slots; > } > } > > > chaitin.cpp:1350 > > Once again the test for lrg._is_vector should be guaranteed by the outer test > of lrg._is_scalable. Refactoring using the common methods of LRG as above > ought to help. > > chaitin.cpp:1591 > > Use common method code. > > > postaloc.cpp:308/323 > > Once again you should be able to use common method code of LRG here. > > > regmask.cpp:91 > > Once again you should be able to use common method code of LRG here. Thanks for above suggestions. We will consider refactoring these parts. > Specific Comments (c2 webrev): > > > aarch64.ad:3815 > > very nice defensive check! > > > assembler_aarch64.hpp:2469 & 2699+ > > Andrew Haley is definitely going to ask you to update function entry > (assembler_aarch64.cpp:76) to call these new instruction generation > methods and then validate the generated code using asm_check So, I guess > you might as well do that now ;-) Thanks for letting us know. We will check how to validate those. > zBarrierSetAssembler_aarch64.cpp:434 > > Can you explain why we need to check p7 here and not do so in other places > where we call into the JVM? I'm not saying this is wrong. I just want to know > how you decided where re-init of p7 was needed. Sorry I don't know how the places are decided. But I will ask Ningsheng to explain this question and reply you later. > superword.cpp:97 > > Does this mean that is someone sets the maximum vector size to a non- > power of two, such as 384, all superword operations will be bypassed? > Including those which can be done using NEON vectors? The existing SLP doesn't support non-power-of-2 vector size (there are some assertions inside) so we added this. 
Yes, it's better if we have some mechanism to fall back to NEON for non-power-of-2 size. But so far in practice, we don't know any real chip implements the non-power-of-2 vector size. Also, we are now working on a new predicate-driven auto-vectorization pass to support SVE better. Do you think it's ok if we print some warnings if someone sets a non-power-of-2 size in vm options? Or any other suggestions in the short term? -- Thanks, Pengfei From tobias.hartmann at oracle.com Mon Aug 10 06:18:21 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 08:18:21 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <8ddafcf8-5fcf-c0cc-ccd0-29692dd1c19b@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> <8ddafcf8-5fcf-c0cc-ccd0-29692dd1c19b@oracle.com> Message-ID: <88ad17d1-d79c-3504-d535-a720a8239fe4@oracle.com> Thanks Vladimir! Best regards, Tobias On 06.08.20 21:00, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 8/6/20 7:07 AM, Vladimir Ivanov wrote: >> >>> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ >> >> Looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> >>> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >>> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >>> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >>> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >>> >>> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >>> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >>> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >>> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >>> is equal. >>> >>> The fix is to make sure to always update 'max_vlen_in_bytes'. >>> >>> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >>> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >>> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >>> [3] >>> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >>> >>> >>> [4] -XX:+TraceSuperWord output: >>> >>> After filter_packs >>> packset >>> Pack: 0 >>> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? >>> @long[int:>=0]:exact+any *, idx=6; >>> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >>> Test::main @ bci:8 >>> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any >>> *, idx=6; >>> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >>> Test::main @ bci:8 >>> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? >>> @long[int:>=0]:exact+any *, >>> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >>> bci:17 Test::main @ bci:8 >>> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 
864? 255 ]]? >>> @long[int:>=0]:exact+any *, >>> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >>> Test::main @ bci:8 >>> Pack: 1 >>> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any >>> *, idx=8; >>> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? @int[int:>=0]:exact+any >>> *, idx=8; >>> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? >>> @int[int:>=0]:exact+any *, idx=8; >>> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? >>> @int[int:>=0]:exact+any *, >>> idx=8;? Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> >>> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >>> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >>> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >>> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >>> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >>> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >>> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >>> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >>> From tobias.hartmann at oracle.com Mon Aug 10 07:20:05 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 09:20:05 +0200 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree In-Reply-To: References: Message-ID: +1 Best regards, Tobias On 07.08.20 18:50, Vladimir Ivanov wrote: >> Webrev: https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ > Looks good. > > I'll submit it for testing. > > Best regards, > Vladimir Ivanov From yueshi.zwj at alibaba-inc.com Mon Aug 10 07:24:52 2020 From: yueshi.zwj at alibaba-inc.com (Joshua Zhu) Date: Mon, 10 Aug 2020 15:24:52 +0800 Subject: =?UTF-8?B?562U5aSNOiBbYWFyY2g2NC1wb3J0LWRldiBdIFJGUihMKTogODIzMTQ0MTogQUFyY2g2NDog?= =?UTF-8?B?SW5pdGlhbCBTVkUgYmFja2VuZCBzdXBwb3J0?= In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <003101d66ee7$56b5e3f0$0421abd0$@alibaba-inc.com> Hi Andrew, Thanks a lot for your review. > As Ningsheng is on leave this week, I will attempt to answer your specific > questions based on what I know. I'm sorry that I'm not able to answer all > your questions since I'm not familiar with every detail of the patch. And you > may still need to wait him coming back to update the webrev(s). I will help answer questions related with RA. > > Specific Comments (register allocator webrev): > > > > > > aarch64.ad:97-100 > > > > Why have you added a reg_def for R8 and R9 here and also to > > alloc_class > > chunk0 at lines 544-545? 
They aren't used by C2 so why define them? > > This has no functionality change to the two scratch registers. But if these are > missing in the register definition, the regmask for vector registers won't start > at an aligned position. So we prefer adding them back to make the > computation easier. Yes. Thanks Pengfei. > > > assembler_aarch64.hpp:280 (also 699) > > > > prf sets a predicate register field. pgrf sets a governing predicate register > field. > > Should the name not be gprf. > > I guess the reason is that the ArmARM doc says "the Pg field". > > > chaitin.cpp:648-660 > > > > The comment is rather oddly formatted. > > Thanks for catching this. > > > At line 650 you guard the assert with a test for lrg._is_vector. Is > > that not always going to be guaranteed by the outer condition > > lrg._is_scalable? If so then you should really assert lrg._is_vector. _is_scalable tells the register length for the live range is scalable. This rule applies for both SVE vector register and predicate register. Each predicate register holds one bit per byte of SVE vector register, meaning that each predicate register is one-eighth of the size of SVE vector register. Each predicate register is an IMPLEMENTATION DEFINED multiple of 16 bits, up to 256 bits. Although the actual length of predicate register is scalable, the max slots is always defined as 1. class PRegisterImpl: public AbstractRegisterImpl { public: enum { number_of_registers = 16, max_slots_per_register = 1 }; I think this patch under review does not include the part of predicate register allocation. > > The special case code for computation of num_regs for a vector stack > > slot also appears in this file with a slightly different organization > > in find_first_set (line 1350) and in PhaseChaitin::Select (line 1590). > > There is another similar case in RegMask::num_registers at regmask.cpp: > > 98. It would be better to factor out the common code into methods of LRG. > > Maybe using the following? > > > > bool LRG::is_scalable_vector() { > > if (_is_scalable) { > > assert(_is_vector == 1); > > assert(_num_regs == == RegMask::SlotsPerVecA) > > return true; > > } > > return false; > > } > > > > int LRG::scalable_num_regs() { > > assert(is_scalable_vector()); > > if (OptoReg::is_stack(_reg)) { > > return _scalable_reg_slots > > } else { > > return num_reg_slots; > > } > > } > > > > chaitin.cpp:1350 > > > > Once again the test for lrg._is_vector should be guaranteed by the > > outer test of lrg._is_scalable. Refactoring using the common methods > > of LRG as above ought to help. > > > > chaitin.cpp:1591 > > > > Use common method code. > > > > > > postaloc.cpp:308/323 > > > > Once again you should be able to use common method code of LRG here. > > > > > > regmask.cpp:91 > > > > Once again you should be able to use common method code of LRG here. PhaseChaitin::Select (line 1590) will cover both SVE vector and predicate cases in future. 1590 // We always choose the high bit, then mask the low bits by register size 1591 if (lrg->_is_scalable && OptoReg::is_stack(lrg->reg())) { // stack 1592 n_regs = lrg->scalable_reg_slots(); 1593 } I think regmask.cpp (line 98) in future will look like: 98 if (lrg._is_scalable && OptoReg::is_stack(assigned)) { 99 if (lrg._is_vector) { 100 assert(ireg == Op_VecA, "scalable vector register"); 101 } else if (lrg._is_predicate) { assert(ireg == Op_RegVMask, "scalable predicate register"); } 102 n_regs = lrg.scalable_reg_slots(); 103 } 104 105 return n_regs; 106 } Please correct me if any issues. Thanks. 
Best Regards, Joshua From tobias.hartmann at oracle.com Mon Aug 10 07:32:51 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 09:32:51 +0200 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> Message-ID: <5f283812-510f-e22e-3e95-810103da2e43@oracle.com> Hi Vladimir, looks good to me. Little typo in the test on line 27: "explressions". Best regards, Tobias On 10.08.20 06:25, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8249749 > > SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: > > AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > > As result it can't find memory reference to align vectors. But code ignores that and continue > execution. > Later when align_to_ref is referenced we hit SEGV because it is NULL. > > The fix is to check align_to_ref for NULL early and bailout. > > I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to > vectorize test's code. > And added missing _invar setting. > > And I slightly modified tracking code to investigate this issue. > > Added new test to check some complex address expressions similar to bug's test case. Not all cases > in test are vectorized - there are other conditions which prevent that. > > Tested tier1,tier2,hs-tier3,precheckin-comp > > Thanks, > Vladimir K From tobias.hartmann at oracle.com Mon Aug 10 07:52:35 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 09:52:35 +0200 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations In-Reply-To: References: Message-ID: +1 Best regards, Tobias On 07.08.20 18:45, Vladimir Ivanov wrote: > >> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/ > > Looks good. > > So far, testing results look good (hs-tier1/2 are clean, tier1-4 are in progress). > > Best regards, > Vladimir Ivanov > >> C2 has re-association of loop invariants. However, the current implementation >> only supports the re-associations for add and subtract with 32-bits integer type. >> For other associative expressions like multiplication and the logic operations, >> the re-association is also applicable, and also for the operations with long type. >> >> This patch adds the missing re-associations for other associative operations >> together with the support for long type. >> >> With this patch, the following expressions: >> ?? (x * inv1) * inv2 >> ?? (x | inv1) | inv2 >> ?? (x & inv1) & inv2 >> ?? (x ^ inv1) ^ inv2???????? ; inv1, inv2 are invariants >> >> can be re-associated to: >> ?? x * (inv1 * inv2)???????? ; "inv1 * inv2" can be hoisted >> ?? x | (inv1 | inv2)???????? ; "inv1 | inv2" can be hoisted >> ?? x & (inv1 & inv2)?????? ; "inv1 & inv2" can be hoisted >> ?? x ^ (inv1 ^ inv2)???????? ; "inv1 ^ inv2" can be hoisted >> >> Performance: >> Here is the micro benchmark: >> http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java >> >> And the results on X86_64: >> Before: >> Benchmark?????????????????????????? (length)? Mode Cnt??? Score??????? Error????? Units >> loopInvariantAddLong????????? 1024????? avgt?? 15?? 988.142??? ?? 0.110?? ns/op >> loopInvariantAndInt????????????? 1024????? avgt?? 15?? 843.850??? ?? 0.522?? ns/op >> loopInvariantAndLong????????? 1024????? avgt?? 
15   990.551 ±  10.458  ns/op
>> loopInvariantMulInt        1024  avgt   15  1209.003 ±   0.247  ns/op
>> loopInvariantMulLong       1024  avgt   15  1213.923 ±   0.438  ns/op
>> loopInvariantOrInt         1024  avgt   15   843.908 ±   0.132  ns/op
>> loopInvariantOrLong        1024  avgt   15   990.710 ±  10.484  ns/op
>> loopInvariantSubLong       1024  avgt   15   988.170 ±   0.159  ns/op
>> loopInvariantXorInt        1024  avgt   15   806.949 ±   7.860  ns/op
>> loopInvariantXorLong       1024  avgt   15   990.963 ±   8.321  ns/op
>>
>> After:
>> Benchmark              (length)  Mode  Cnt     Score     Error  Units
>> loopInvariantAddLong       1024  avgt   15   842.854 ±   9.036  ns/op
>> loopInvariantAndInt        1024  avgt   15   698.097 ±   0.916  ns/op
>> loopInvariantAndLong       1024  avgt   15   841.120 ±   0.118  ns/op
>> loopInvariantMulInt        1024  avgt   15   691.000 ±   7.696  ns/op
>> loopInvariantMulLong       1024  avgt   15   846.907 ±   0.189  ns/op
>> loopInvariantOrInt         1024  avgt   15   698.423 ±   4.969  ns/op
>> loopInvariantOrLong        1024  avgt   15   843.465 ±  10.196  ns/op
>> loopInvariantSubLong       1024  avgt   15   841.314 ±   2.906  ns/op
>> loopInvariantXorInt        1024  avgt   15   652.529 ±   0.556  ns/op
>> loopInvariantXorLong       1024  avgt   15   841.860 ±   2.491  ns/op
>>
>> Results on AArch64:
>> Before:
>> Benchmark              (length)  Mode  Cnt     Score     Error  Units
>> loopInvariantAddLong       1024  avgt   15   514.437 ±   0.351  ns/op
>> loopInvariantAndInt        1024  avgt   15   435.301 ±   0.415  ns/op
>> loopInvariantAndLong       1024  avgt   15   572.437 ±   0.057  ns/op
>> loopInvariantMulInt        1024  avgt   15  1154.544 ±   0.030  ns/op
>> loopInvariantMulLong       1024  avgt   15  1188.109 ±   0.299  ns/op
>> loopInvariantOrInt         1024  avgt   15   435.605 ±   0.977  ns/op
>> loopInvariantOrLong        1024  avgt   15   572.475 ±   0.093  ns/op
>> loopInvariantSubLong       1024  avgt   15   514.340 ±   0.154  ns/op
>> loopInvariantXorInt        1024  avgt   15   426.186 ±   0.105  ns/op
>> loopInvariantXorLong       1024  avgt   15   572.505 ±   0.259  ns/op
>>
>> After:
>> Benchmark              (length)  Mode  Cnt     Score     Error  Units
>> loopInvariantAddLong       1024  avgt   15   508.179 ±   0.108  ns/op
>> loopInvariantAndInt        1024  avgt   15   394.706 ±   0.199  ns/op
>> loopInvariantAndLong       1024  avgt   15   434.443 ±   0.247  ns/op
>> loopInvariantMulInt        1024  avgt   15   762.477 ±   0.079  ns/op
>> loopInvariantMulLong       1024  avgt   15   775.975 ±   0.159  ns/op
>> loopInvariantOrInt         1024  avgt   15   394.657 ±   0.156  ns/op
>> loopInvariantOrLong        1024  avgt   15   434.428 ±   0.282  ns/op
>> loopInvariantSubLong       1024  avgt   15   507.475 ±   0.151  ns/op
>> loopInvariantXorInt        1024  avgt   15   396.000 ±   0.011  ns/op
>> loopInvariantXorLong       1024  avgt   15   434.255 ±   0.099  ns/op
>>
>> Tests:
>> Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1
>> and jcstress:tests-custom, and all tests pass without new failure.
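For illustration, here is a minimal, self-contained Java sketch of the loop shape this optimization targets. The class and variable names are made up and are not taken from the patch or from the benchmark; the point is only that (a[i] & inv1) & inv2 can be rewritten as a[i] & (inv1 & inv2), so the invariant part is computed once outside the loop:

public class ReassocSketch {
    // inv1 and inv2 are loop invariants, so after re-association C2 can
    // hoist (inv1 & inv2) out of the loop and keep a single AND per element.
    static int andReduce(int[] a, int inv1, int inv2) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += (a[i] & inv1) & inv2;
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        java.util.Arrays.fill(a, 7);
        System.out.println(andReduce(a, 0x0F, 0x3C));
    }
}

The same shape applies to the |, ^, * and long variants listed in the quoted expressions above.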
>> >> Thanks, >> Xiaohong Gong >> From tobias.hartmann at oracle.com Mon Aug 10 08:13:48 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 10:13:48 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Message-ID: <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> Hi Christian, I agree with Vladimir, very nice analysis. Although I'm not too familiar with the C1 register allocator, your explanation and fix makes sense to me. Just wondering, do we hit this case with any of our existing tests? Best regards, Tobias On 06.08.20 11:34, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8249603 > http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ > > Register allocation fails in C1 in the testcase because two intervals overlap (they both have the > same stack slot assigned). The problem can be traced back to the optimization to assign the same > spill slot to non-intersecting intervals in LinearScanWalker::combine_spilled_intervals(). > > In this method, we look at a split parent interval 'cur' and its register hint interval > 'register_hint'. A register hint is present when the interval represents either the source or the > target operand of a move operation and the register hint the target or source operand, respectively > (the register hint is used to try to assign the same register to the source and target operand such > that we can completely remove the move operation). > > If the register hint is set, then we do some additional checks and make sure that the split parent > and the register hint do not intersect. If all checks pass, the split parent 'cur' gets the same > spill slot as the register hint [1]. This means that both intervals get the same slot on the stack > if they are spilled. > > The problem now is that we do not consider any split children of the register hint which all share > the same spill slot with the register hint (their split parent). In the testcase, the split parent > 'cur' does not intersect with the register hint but with one of its split children. As a result, > they both get the same spill slot and are later indeed both spilled (i.e. both virtual > registers/operands are put to the same stack location at the same time). > > The fix now additionally checks if the split parent 'cur' does not intersect any split children of > the register hint in combine_spilled_intervals(). If there is such an intersection, then we bail out > of the optimization. > > Some standard benchmark testing did not show any regressions. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From vladimir.x.ivanov at oracle.com Mon Aug 10 08:30:59 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 10 Aug 2020 11:30:59 +0300 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations In-Reply-To: References: Message-ID: <40dffc1b-1c62-4a53-a21f-3cf041ab569b@oracle.com> >> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/ > > Looks good. > > So far, testing results look good (hs-tier1/2 are clean, tier1-4 are in > progress). FYI test results are clean. Best regards, Vladimir Ivanov >> C2 has re-association of loop invariants. 
However, the current >> implementation >> only supports the re-associations for add and subtract with 32-bits >> integer type. >> For other associative expressions like multiplication and the logic >> operations, >> the re-association is also applicable, and also for the operations >> with long type. >> >> This patch adds the missing re-associations for other associative >> operations >> together with the support for long type. >> >> With this patch, the following expressions: >> ?? (x * inv1) * inv2 >> ?? (x | inv1) | inv2 >> ?? (x & inv1) & inv2 >> ?? (x ^ inv1) ^ inv2???????? ; inv1, inv2 are invariants >> >> can be re-associated to: >> ?? x * (inv1 * inv2)???????? ; "inv1 * inv2" can be hoisted >> ?? x | (inv1 | inv2)???????? ; "inv1 | inv2" can be hoisted >> ?? x & (inv1 & inv2)?????? ; "inv1 & inv2" can be hoisted >> ?? x ^ (inv1 ^ inv2)???????? ; "inv1 ^ inv2" can be hoisted >> >> Performance: >> Here is the micro benchmark: >> http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java >> >> And the results on X86_64: >> Before: >> Benchmark?????????????????????????? (length)? Mode Cnt??? Score >> Error????? Units >> loopInvariantAddLong????????? 1024????? avgt?? 15?? 988.142??? ? >> 0.110?? ns/op >> loopInvariantAndInt????????????? 1024????? avgt?? 15?? 843.850??? ? >> 0.522?? ns/op >> loopInvariantAndLong????????? 1024????? avgt?? 15?? 990.551??? ? >> 10.458? ns/op >> loopInvariantMulInt????????????? 1024????? avgt?? 15? 1209.003?? ? >> 0.247?? ns/op >> loopInvariantMulLong????????? 1024????? avgt?? 15? 1213.923?? ? >> 0.438??? ns/op >> loopInvariantOrInt??????????????? 1024????? avgt?? 15?? 843.908??? ? >> 0.132??? ns/op >> loopInvariantOrLong???????????? 1024????? avgt?? 15?? 990.710?? ? >> 10.484? ns/op >> loopInvariantSubLong?????????? 1024????? avgt?? 15?? 988.170?? ? >> 0.159??? ns/op >> loopInvariantXorInt?????????????? 1024????? avgt?? 15?? 806.949?? ? >> 7.860??? ns/op >> loopInvariantXorLong?????????? 1024????? avgt?? 15?? 990.963?? ? >> 8.321??? ns/op >> >> After: >> Benchmark?????????????????????????? (length)? Mode? Cnt??? Score >> Error??? Units >> loopInvariantAddLong????????? 1024????? avgt?? 15?? 842.854?? ? >> 9.036? ns/op >> loopInvariantAndInt????????????? 1024????? avgt?? 15?? 698.097?? ? >> 0.916? ns/op >> loopInvariantAndLong????????? 1024????? avgt?? 15?? 841.120?? ? >> 0.118? ns/op >> loopInvariantMulInt????????????? 1024????? avgt?? 15?? 691.000?? ? >> 7.696? ns/op >> loopInvariantMulLong????????? 1024????? avgt?? 15?? 846.907?? ? >> 0.189? ns/op >> loopInvariantOrInt??????????????? 1024????? avgt?? 15?? 698.423?? ? >> 4.969? ns/op >> loopInvariantOrLong??????????? 1024????? avgt?? 15?? 843.465?? ? >> 10.196? ns/op >> loopInvariantSubLong????????? 1024????? avgt?? 15?? 841.314?? ? >> 2.906? ns/op >> loopInvariantXorInt????????????? 1024????? avgt?? 15?? 652.529?? ? >> 0.556? ns/op >> loopInvariantXorLong????????? 1024????? avgt?? 15?? 841.860?? ? >> 2.491? ns/op >> >> Results on AArch64: >> Before: >> Benchmark????????????????????????? (length)? Mode? Cnt??? Score >> Error???? Units >> loopInvariantAddLong???????? 1024????? avgt??? 15?? 514.437??? ? >> 0.351? ns/op >> loopInvariantAndInt??????????? 1024????? avgt???? 15?? 435.301??? ? >> 0.415? ns/op >> loopInvariantAndLong??????? 1024????? avgt???? 15?? 572.437??? ? >> 0.057? ns/op >> loopInvariantMulInt??????????? 1024????? avgt???? 15? 1154.544?? ? >> 0.030? ns/op >> loopInvariantMulLong??????? 1024????? avgt???? 15? 1188.109?? ? 0.299 >> ns/op >> loopInvariantOrInt????????????? 1024????? 
avgt???? 15?? 435.605??? ? >> 0.977? ns/op >> loopInvariantOrLong????????? 1024????? avgt???? 15?? 572.475???? ? >> 0.093? ns/op >> loopInvariantSubLong??????? 1024????? avgt???? 15?? 514.340??? ? >> 0.154? ns/op >> loopInvariantXorInt??????????? 1024????? avgt???? 15?? 426.186??? ? >> 0.105? ns/op >> loopInvariantXorLong??????? 1024????? avgt???? 15?? 572.505??? ? >> 0.259? ns/op >> >> After: >> Benchmark??????????????????????? (length)? Mode? Cnt??? Score >> Error??? Units >> loopInvariantAddLong?????? 1024???? avgt???? 15?? 508.179?? ? 0.108 >> ns/op >> loopInvariantAndInt?????????? 1024??? avgt???? 15?? 394.706?? ? 0.199 >> ns/op >> loopInvariantAndLong?????? 1024??? avgt???? 15?? 434.443?? ? 0.247? ns/op >> loopInvariantMulInt?????????? 1024??? avgt???? 15?? 762.477?? ? 0.079 >> ns/op >> loopInvariantMulLong?????? 1024??? avgt???? 15?? 775.975?? ? 0.159? ns/op >> loopInvariantOrInt???????????? 1024??? avgt???? 15?? 394.657?? ? >> 0.156? ns/op >> loopInvariantOrLong???????? 1024??? avgt???? 15?? 434.428?? ? 0.282 >> ns/op >> loopInvariantSubLong?????? 1024??? avgt???? 15?? 507.475?? ? 0.151? ns/op >> loopInvariantXorInt?????????? 1024??? avgt???? 15?? 396.000?? ? 0.011 >> ns/op >> loopInvariantXorLong?????? 1024??? avgt???? 15?? 434.255?? ? 0.099? ns/op >> >> Tests: >> Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1 >> and jcstress:tests-custom, and all tests pass without new failure. >> >> Thanks, >> Xiaohong Gong >> From vladimir.x.ivanov at oracle.com Mon Aug 10 08:33:51 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 10 Aug 2020 11:33:51 +0300 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree In-Reply-To: References: Message-ID: <500e1fc4-11a5-90c3-d554-11cdf2f3eaed@oracle.com> >> https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ > Looks good. > > I'll submit it for testing. Test results are clean. I'll push the patch for you. Best regards, Vladimir Ivanov From adinn at redhat.com Mon Aug 10 08:43:34 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 10 Aug 2020 09:43:34 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <6562d0da-f081-ede8-dfef-d3d6c70fb998@redhat.com> Hi Pengfei, On 10/08/2020 05:45, Pengfei Li wrote: >> I also eyeballed /some/ of the generated code to check that it >> looked ok. I'd really like to be able to do that systematically for >> a comprehensive test suite that exercised every rule but I only had >> the machine for a few days. This really ought to be done as a >> follow-up to ensure that all the rules are working as expected. > > Not sure if you have tried my newly added test in the vectorization > folder. It checks if expected SVE/NEON instructions are generated as > expected for each C2 vectornode by checking the OptoAssembly output. > I put it in another webrev so you may have missed it. > http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/ Ah, thank you. That was not in the patch I Ningsheng pointed me at. It is exactly what is needed to check the generation rules are all working. >> Just out of interest why does UseSVE have range(0,2)? It seems you >> are only testing for UseSVE > 0. Does value 2 correspond to an >> optional subset? 
> AArch64 SVE has multiple versions. Current Fujitsu FX machine > supports SVE1 only. We leave 2 here for SVE2 support in the near > future. > https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator/resources/tutorials/sve/sve-vs-sve2/introduction-to-sve2 Ah ok, thanks. Got it. Being able to switch on level 1 without level 2 is a good idea. >> Why have you added a reg_def for R8 and R9 here and also to >> alloc_class chunk0 at lines 544-545? They aren't used by C2 so why >> define them? > > This has no functionality change to the two scratch registers. But if > these are missing in the register definition, the regmask for vector > registers won't start at an aligned position. So we prefer adding > them back to make the computation easier. It would be good to make this clear with a comment. Also, I think you should change the name of the registers to R8_UNUSED and R9_UNUSED just to emphasize that these are not expected to be included in any register sets. >> prf sets a predicate register field. pgrf sets a governing >> predicate register field. Should the name not be gprf. > > I guess the reason is that the ArmARM doc says "the Pg field". Ok, let's leave it at that then and blame ARM ;-) >> chaitin.cpp:648-660 >> >> The comment is rather oddly formatted. > > Thanks for catching this. Well, that's what reviews are for ... >> At line 650 you guard the assert with a test for lrg._is_vector. Is >> that not always going to be guaranteed by the outer condition >> lrg._is_scalable? If so then you should really assert >> lrg._is_vector. >> >> . . . > Thanks for above suggestions. We will consider refactoring these > parts. Ok, I'll wait for an updated webrev. >> Andrew Haley is definitely going to ask you to update function >> entry (assembler_aarch64.cpp:76) to call these new instruction >> generation methods and then validate the generated code using >> asm_check So, I guess you might as well do that now ;-) > > Thanks for letting us know. We will check how to validate those. Ok, thanks. >> Can you explain why we need to check p7 here and not do so in other >> places where we call into the JVM? I'm not saying this is wrong. I >> just want to know how you decided where re-init of p7 was needed. > > Sorry I don't know how the places are decided. But I will ask > Ningsheng to explain this question and reply you later. Sure, thanks. >> Does this mean that is someone sets the maximum vector size to a >> non- power of two, such as 384, all superword operations will be >> bypassed? Including those which can be done using NEON vectors? > > The existing SLP doesn't support non-power-of-2 vector size (there > are some assertions inside) so we added this. Yes, it's better if we > have some mechanism to fall back to NEON for non-power-of-2 size. But > so far in practice, we don't know any real chip implements the > non-power-of-2 vector size. Also, we are now working on a new > predicate-driven auto-vectorization pass to support SVE better. Do > you think it's ok if we print some warnings if someone sets a > non-power-of-2 size in vm options? Or any other suggestions in the > short term? Well, the test for MaxVectorSize in vm_version.cpp currently only ensures it has been set to a multiple of 16. I think you probably ought to check for a power of two at that point and exit the VM otherwise. If hardware comes along that supports a non-power of two we can deal with it at that point. 
regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From Xiaohong.Gong at arm.com Mon Aug 10 08:44:02 2020 From: Xiaohong.Gong at arm.com (Xiaohong Gong) Date: Mon, 10 Aug 2020 08:44:02 +0000 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations In-Reply-To: References: Message-ID: Hi Tobias, > +1 Thanks for the review! Best Regards, Xiaohong From vladimir.x.ivanov at oracle.com Mon Aug 10 09:04:17 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 10 Aug 2020 12:04:17 +0300 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> Message-ID: <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> > http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ Looks good. Best regards, Vladimir Ivanov > https://bugs.openjdk.java.net/browse/JDK-8249749 > > SuperWord does not recognize array indexing pattern used in the test due > to additional AddI node: > > AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > > As result it can't find memory reference to align vectors. But code > ignores that and continue execution. > Later when align_to_ref is referenced we hit SEGV because it is NULL. > > The fix is to check align_to_ref for NULL early and bailout. > > I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize > this address pattern to vectorize test's code. > And added missing _invar setting. > > And I slightly modified tracking code to investigate this issue. > > Added new test to check some complex address expressions similar to > bug's test case. Not all cases in test are vectorized - there are other > conditions which prevent that. > > Tested tier1,tier2,hs-tier3,precheckin-comp > > Thanks, > Vladimir K From adinn at redhat.com Mon Aug 10 09:18:59 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 10 Aug 2020 10:18:59 +0100 Subject: =?UTF-8?B?UmU6IOetlOWkjTogW2FhcmNoNjQtcG9ydC1kZXYgXSBSRlIoTCk6IDgy?= =?UTF-8?Q?31441=3a_AArch64=3a_Initial_SVE_backend_support?= In-Reply-To: <003101d66ee7$56b5e3f0$0421abd0$@alibaba-inc.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <003101d66ee7$56b5e3f0$0421abd0$@alibaba-inc.com> Message-ID: Hi Joshua, On 10/08/2020 08:24, Joshua Zhu wrote: > I will help answer questions related with RA. Thanks for your help. >>> At line 650 you guard the assert with a test for lrg._is_vector. Is >>> that not always going to be guaranteed by the outer condition >>> lrg._is_scalable? If so then you should really assert lrg._is_vector. > > _is_scalable tells the register length for the live range is > scalable. This rule applies for both SVE vector register and > predicate register. Each predicate register holds one bit per > byte of SVE vector register, meaning that each predicate > register is one-eighth of the size of SVE vector register. > Each predicate register is an IMPLEMENTATION DEFINED multiple > of 16 bits, up to 256 bits. 
Although the actual length of > predicate register is scalable, the max slots is always defined > as 1.> class PRegisterImpl: public AbstractRegisterImpl { > public: > enum { > number_of_registers = 16, > max_slots_per_register = 1 > }; > I think this patch under review does not include the part of > predicate register allocation. Ok, I understand that _is_scalable is meant to identify both a predicate register and an SVE vector register. Something definitely seems to be missing because field LRG::_is_scalable is not set in the case where we have a PRegisterImpl (Op_RegVMask). In webrev03 it only ever gets set at chaitin.cpp:822: if (RegMask::is_vector(ireg)) { lrg._is_vector = 1; if (ireg == Op_VecA) { assert(Matcher::supports_scalable_vector(), "scalable vector should be supported"); lrg._is_scalable = 1; // For scalable vector, when it is allocated in physical register, // num_regs is RegMask::SlotsPerVecA for reg mask, // which may not be the actual physical register size. // If it is allocated in stack, we need to get the actual // physical length of scalable vector register. lrg.set_scalable_reg_slots(Matcher::scalable_vector_reg_size(T_FLOAT)); } So, it seems LRG::_is_scalable will only be set for a VecA register. If you could check what code might be missing and post a new webrev I'll look at this again. However, it would still be good to try to factor out some common code into methods if possible. >>> The special case code for computation of num_regs for a vector stack >>> slot also appears in this file with a slightly different organization >>> . . . > PhaseChaitin::Select (line 1590) will cover both SVE vector and predicate cases in future. > 1590 // We always choose the high bit, then mask the low bits by register size > 1591 if (lrg->_is_scalable && OptoReg::is_stack(lrg->reg())) { // stack > 1592 n_regs = lrg->scalable_reg_slots(); > 1593 } > > I think regmask.cpp (line 98) in future will look like: > 98 if (lrg._is_scalable && OptoReg::is_stack(assigned)) { > 99 if (lrg._is_vector) { > 100 assert(ireg == Op_VecA, "scalable vector register"); > 101 } > else if (lrg._is_predicate) { > assert(ireg == Op_RegVMask, "scalable predicate register"); > } > 102 n_regs = lrg.scalable_reg_slots(); > 103 } > 104 > 105 return n_regs; > 106 } > > Please correct me if any issues. Thanks. Ok, I agree that this will be correct when we can come across the case where lrg._is_scalable is true and ireg == Op_RegVMask. However, that case does not currently arise. So, a new webrev that allows for this case would help. Thanks for helping to explain what is going on here. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From doug.simon at oracle.com Mon Aug 10 09:55:30 2020 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 10 Aug 2020 11:55:30 +0200 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: Message-ID: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Hi Ludovic, Are you considering also implementing this intrinsic in Graal? Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. 
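To make the bounds-check angle concrete, here is a small, self-contained Java sketch of the access pattern involved; it is illustrative only and is not the code from sun/security/provider/MD5.java. MD5 decodes each 64-byte block into 16 little-endian 32-bit words, so without elimination every byte load below carries its own range check:

class LittleEndianSketch {
    // Decode one 32-bit little-endian word from buf starting at ofs.
    // A compiled MD5 loop performs this 16 times per 64-byte block.
    static int leInt(byte[] buf, int ofs) {
        return  (buf[ofs]     & 0xff)
             | ((buf[ofs + 1] & 0xff) << 8)
             | ((buf[ofs + 2] & 0xff) << 16)
             | ((buf[ofs + 3] & 0xff) << 24);
    }

    public static void main(String[] args) {
        byte[] block = new byte[64];
        block[0] = 1;
        System.out.println(leInt(block, 0));   // prints 1
    }
}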
-Doug > On 9 Aug 2020, at 05:19, Ludovic Henry wrote: > > Hello, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 > Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://bugs.openjdk.java.net/browse/JDK-8250902 From beurba at microsoft.com Mon Aug 10 13:01:45 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Mon, 10 Aug 2020 13:01:45 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> References: , <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Message-ID: Hey Doug, replying on behalf for Ludovic, as he is on vacation :-) Currently we are not planning to implement the intrinsic for Graal. Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already? This is the relevant Java method for the MD5 intrinsic: https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java#L172 -Bernhard ________________________________________ From: Doug Simon Sent: Monday, August 10, 2020 11:55 To: Ludovic Henry Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Hi Ludovic, Are you considering also implementing this intrinsic in Graal? Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. 
-Doug

> On 9 Aug 2020, at 05:19, Ludovic Henry wrote:
>
> Hello,
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8251216
> Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00
>
> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1
>
> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2):
>
> -XX:-UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  1616.238 ± 28.082  ops/ms
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   215.030 ±  0.691  ops/ms
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.228 ±  0.001  ops/ms
>
> -XX:+UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  2005.233 ± 40.513  ops/ms  => 24% speedup
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   275.979 ±  0.455  ops/ms  => 28% speedup
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.279 ±  0.001  ops/ms  => 22% speedup
>
> Thank you,
> Ludovic
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8250902

From doug.simon at oracle.com Mon Aug 10 13:38:42 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 10 Aug 2020 15:38:42 +0200
Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To: 
References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com>
Message-ID: 

Hi Bernhard,

> On 10 Aug 2020, at 15:01, Bernhard Urban-Forster wrote:
>
> Hey Doug,
>
> replying on behalf for Ludovic, as he is on vacation :-)
>
> Currently we are not planning to implement the intrinsic for Graal.

Schade ;-)

> Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already?

I don't think we do that anywhere currently, but I imagine it wouldn't be hard to put the BytecodeParser into a mode whereby an array access generates an AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck).
-Doug > > This is the relevant Java method for the MD5 intrinsic: > https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java*L172__;Iw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV$ > > > -Bernhard > > ________________________________________ > From: Doug Simon > > Sent: Monday, August 10, 2020 11:55 > To: Ludovic Henry > Cc: hotspot-compiler-dev at openjdk.java.net ; aarch64-port-dev at openjdk.java.net ; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Ludovic, > > Are you considering also implementing this intrinsic in Graal? > > Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. > > -Doug > >> On 9 Aug 2020, at 05:19, Ludovic Henry wrote: >> >> Hello, >> >> Bug: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=C7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3$ >> Webrev: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=http:*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D&reserved=0__;JSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ$ >> >> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 >> >> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): >> >> -XX:-UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms >> >> -XX:+UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 
0.001 ops/ms => 22% speedup >> >> Thank you, >> Ludovic >> >> [1] https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8250902&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=5KcoG5n10rnVMU9y8L076jpCoEd0NBzNqr*2F8M5ghO3c*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6SPJBTN$ From evgeny.nikitin at oracle.com Mon Aug 10 13:47:03 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 10 Aug 2020 15:47:03 +0200 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies Message-ID: Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. The change has been checked in mach5 for the 5 common platforms (passed). Please review, /Evgeny Nikitin. From evgeny.nikitin at oracle.com Mon Aug 10 14:22:39 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 10 Aug 2020 16:22:39 +0200 Subject: RFR(XS): 8069411: Un-quarantine OverloadCompileQueueTest.java Message-ID: Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8069411 Webrev: http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html The test failed previously due to a specific Assert class design from 2015 [1]. Please note that getMessage gets called for every comparison, causing a string copy. So the OOME was not caused by a test design or failure, it was just a common OOME, and Assert class was stressing the VM by copying error messages. These days Assert class has changed and I have run lengths attempting to reproduce OOME in that or any other place of the test. I suggest to enable the test in CI runs. Please review, //Evgeny Nikitin. ======== [1] http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html From Charlie.Gracie at microsoft.com Mon Aug 10 14:58:57 2020 From: Charlie.Gracie at microsoft.com (Charlie Gracie) Date: Mon, 10 Aug 2020 14:58:57 +0000 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree Message-ID: <756853D1-477A-4A7A-AC18-0FFA624502A3@microsoft.com> Thanks for the reviews Vladimir and Tobias. Thanks for testing and sponsoring the change Vladimir. Cheers, Charlie Gracie ?On 2020-08-10, 4:29 AM, "Vladimir Ivanov" wrote: >> https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ > Looks good. > > I'll submit it for testing. Test results are clean. I'll push the patch for you. Best regards, Vladimir Ivanov From igor.ignatyev at oracle.com Mon Aug 10 16:04:48 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 10 Aug 2020 09:04:48 -0700 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: References: Message-ID: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> Hi Evgeny, the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. 
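A minimal sketch of the idea, using a stand-in class instead of the real TestCaseImpl referenced by compiler/codecache/stress/Helper (all names below are made up for illustration):

class TestCaseImpl { }   // stand-in for the real test class

class Helper {
    // Referencing the Class object creates a compile-time dependency that is
    // statically detectable, so no explicit @build tag is needed and the name
    // string cannot go stale after a rename.
    static final String TEST_CASE_IMPL_CLASS_NAME = TestCaseImpl.class.getName();

    public static void main(String[] args) {
        System.out.println(TEST_CASE_IMPL_CLASS_NAME);
    }
}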
Thanks, -- Igor > On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 > Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ > > The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. > > The change has been checked in mach5 for the 5 common platforms (passed). > > Please review, > /Evgeny Nikitin. From igor.ignatyev at oracle.com Mon Aug 10 16:09:59 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 10 Aug 2020 09:09:59 -0700 Subject: RFR(XS): 8069411: Un-quarantine OverloadCompileQueueTest.java In-Reply-To: References: Message-ID: <6EF18AD8-F195-4DB5-98FE-50D1B49A3A90@oracle.com> Hi Evgeny, I'm assuming that you haven't seen timeouts in your reproducing attempts either, correct? the fix looks good (assuming to goes after 8251349) -- Igor > On Aug 10, 2020, at 7:22 AM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8069411 > Webrev: http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html > > The test failed previously due to a specific Assert class design from 2015 [1]. Please note that getMessage gets called for every comparison, causing a string copy. So the OOME was not caused by a test design or failure, it was just a common OOME, and Assert class was stressing the VM by copying error messages. > > These days Assert class has changed and I have run lengths attempting to reproduce OOME in that or any other place of the test. I suggest to enable the test in CI runs. > > > Please review, > //Evgeny Nikitin. > > ======== > [1] http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html From verghese at amazon.com Mon Aug 10 17:00:29 2020 From: verghese at amazon.com (Verghese, Clive) Date: Mon, 10 Aug 2020 17:00:29 +0000 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> Message-ID: <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> Hi Christian, Thank you for the feedback. I have updated the review addressing the comments below. http://cr.openjdk.java.net/~xliu/clive/8251268/01/webrev/ Regards, Clive Verghese ?On 8/6/20, 11:55 PM, "Christian Hagedorn" wrote: Hi Clive The fix looks good to me. It makes sense to move it to chaitin.cpp since the calls to verify() are also in this file only. You could fix some minor code style things about the existing code that you moved while at it: - You can move the #ifdef ASSERT out of both methods and surround both methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() are only called in ASSERT blocks. And add a // ASSERT comment on the closing #endif to make it more clear. Don't forget to also surround the declarations in the .hpp file with an ASSERT. 
- In verify_base_ptrs(): - L2330: Missing curly braces for the loop - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea *a -> ResourceArea* a - There is a missing space in all asserts after the comma separating the condition and the failure string - In verify(): - L2386: Missing space and curly braces for the if statement Best regards, Christian On 07.08.20 01:49, Verghese, Clive wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 > > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp > > I have tested this builds successfully for both PRODUCT and !PRODUCT. > > Ensured that there are no regressions in hotspot:tier1 tests. > > > Regards, > Clive Verghese > From vladimir.kozlov at oracle.com Mon Aug 10 17:01:01 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Aug 2020 10:01:01 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <5f283812-510f-e22e-3e95-810103da2e43@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <5f283812-510f-e22e-3e95-810103da2e43@oracle.com> Message-ID: <83200b26-6f48-0af0-19e3-1f8a2089d29b@oracle.com> Thank you, Tobias On 8/10/20 12:32 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Little typo in the test on line 27: "explressions". Fixed. Thanks, Vladimir K > > Best regards, > Tobias > > On 10.08.20 06:25, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8249749 >> >> SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: >> >> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >> >> As result it can't find memory reference to align vectors. But code ignores that and continue >> execution. >> Later when align_to_ref is referenced we hit SEGV because it is NULL. >> >> The fix is to check align_to_ref for NULL early and bailout. >> >> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to >> vectorize test's code. >> And added missing _invar setting. >> >> And I slightly modified tracking code to investigate this issue. >> >> Added new test to check some complex address expressions similar to bug's test case. Not all cases >> in test are vectorized - there are other conditions which prevent that. >> >> Tested tier1,tier2,hs-tier3,precheckin-comp >> >> Thanks, >> Vladimir K From vladimir.kozlov at oracle.com Mon Aug 10 17:02:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Aug 2020 10:02:34 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> Message-ID: <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> Thank you, Vladimir Vladimir K On 8/10/20 2:04 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > > Looks good. 
> > Best regards, > Vladimir Ivanov > >> https://bugs.openjdk.java.net/browse/JDK-8249749 >> >> SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: >> >> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >> >> As result it can't find memory reference to align vectors. But code ignores that and continue execution. >> Later when align_to_ref is referenced we hit SEGV because it is NULL. >> >> The fix is to check align_to_ref for NULL early and bailout. >> >> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to vectorize test's code. >> And added missing _invar setting. >> >> And I slightly modified tracking code to investigate this issue. >> >> Added new test to check some complex address expressions similar to bug's test case. Not all cases in test are >> vectorized - there are other conditions which prevent that. >> >> Tested tier1,tier2,hs-tier3,precheckin-comp >> >> Thanks, >> Vladimir K From evgeny.nikitin at oracle.com Mon Aug 10 19:25:05 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 10 Aug 2020 21:25:05 +0200 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> Message-ID: Hi Igor, I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html Again, the same one-time test run in mach5 on 5 platforms. Thanks in advance, //Evgeny On 2020-08-10 18:04, Igor Ignatyev wrote: > Hi Evgeny, > > the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. > > Thanks, > -- Igor > >> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >> >> Hi, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >> >> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >> >> The change has been checked in mach5 for the 5 common platforms (passed). >> >> Please review, >> /Evgeny Nikitin. > From igor.ignatyev at oracle.com Mon Aug 10 19:34:06 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 10 Aug 2020 12:34:06 -0700 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> Message-ID: <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> LGTM -- Igor > On Aug 10, 2020, at 12:25 PM, Evgeny Nikitin wrote: > > Hi Igor, > > I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: > > http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html > > Again, the same one-time test run in mach5 on 5 platforms. > > Thanks in advance, > //Evgeny > > On 2020-08-10 18:04, Igor Ignatyev wrote: >> Hi Evgeny, >> the fix looks good. 
there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. >> Thanks, >> -- Igor >>> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >>> >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >>> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >>> >>> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >>> >>> The change has been checked in mach5 for the 5 common platforms (passed). >>> >>> Please review, >>> /Evgeny Nikitin. From vladimir.kozlov at oracle.com Mon Aug 10 20:05:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Aug 2020 13:05:20 -0700 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> Message-ID: <429818e1-47e4-30d3-151f-0a38d2524bff@oracle.com> +1 Thanks, Vladimir K On 8/10/20 12:34 PM, Igor Ignatyev wrote: > LGTM > -- Igor > >> On Aug 10, 2020, at 12:25 PM, Evgeny Nikitin wrote: >> >> Hi Igor, >> >> I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: >> >> http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html >> >> Again, the same one-time test run in mach5 on 5 platforms. >> >> Thanks in advance, >> //Evgeny >> >> On 2020-08-10 18:04, Igor Ignatyev wrote: >>> Hi Evgeny, >>> the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. >>> Thanks, >>> -- Igor >>>> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >>>> >>>> Hi, >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >>>> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >>>> >>>> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >>>> >>>> The change has been checked in mach5 for the 5 common platforms (passed). >>>> >>>> Please review, >>>> /Evgeny Nikitin. > From christian.hagedorn at oracle.com Tue Aug 11 07:15:44 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 11 Aug 2020 09:15:44 +0200 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> Message-ID: Hi Clive Thanks a lot for taking care of this! One last comment: The existing spacing for the verify methods in the .hpp file is wrong. 
But since there are many more methods with a wrong spacing following it, I leave it up to you if you want to fix it for the verify methods or not. I'm fine with both. Either way, you don't need to send another webrev. Otherwise, it looks good to me! Best regards, Christian On 10.08.20 19:00, Verghese, Clive wrote: > Hi Christian, > > Thank you for the feedback. I have updated the review addressing the comments below. > > http://cr.openjdk.java.net/~xliu/clive/8251268/01/webrev/ > > Regards, > Clive Verghese > > > > ?On 8/6/20, 11:55 PM, "Christian Hagedorn" wrote: > > > Hi Clive > > The fix looks good to me. It makes sense to move it to chaitin.cpp since > the calls to verify() are also in this file only. > > You could fix some minor code style things about the existing code that > you moved while at it: > - You can move the #ifdef ASSERT out of both methods and surround both > methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() > are only called in ASSERT blocks. And add a // ASSERT comment on the > closing #endif to make it more clear. Don't forget to also surround the > declarations in the .hpp file with an ASSERT. > - In verify_base_ptrs(): > - L2330: Missing curly braces for the loop > - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea > *a -> ResourceArea* a > - There is a missing space in all asserts after the comma separating > the condition and the failure string > - In verify(): > - L2386: Missing space and curly braces for the if statement > > > Best regards, > Christian > > On 07.08.20 01:49, Verghese, Clive wrote: > > Hi, > > > > Requesting review for > > > > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ > > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 > > > > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp > > > > I have tested this builds successfully for both PRODUCT and !PRODUCT. > > > > Ensured that there are no regressions in hotspot:tier1 tests. > > > > > > Regards, > > Clive Verghese > > > > From christian.hagedorn at oracle.com Tue Aug 11 08:40:38 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 11 Aug 2020 10:40:38 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> Message-ID: Hi Tobias Thanks a lot! I think we will always catch an overlap later in the verification method unless we somehow correct the mistake until then. But I don't think that this is likely or even possible. Nevertheless, I still wanted to verify that to some extent and added an assert(false) in the newly added intersection bailout test with the split children and could not trigger it in tier 1-4 (apart from the newly added test). Best regards, Christian On 10.08.20 10:13, Tobias Hartmann wrote: > Hi Christian, > > I agree with Vladimir, very nice analysis. Although I'm not too familiar with the C1 register > allocator, your explanation and fix makes sense to me. > > Just wondering, do we hit this case with any of our existing tests? 
> > Best regards, > Tobias > > On 06.08.20 11:34, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8249603 >> http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ >> >> Register allocation fails in C1 in the testcase because two intervals overlap (they both have the >> same stack slot assigned). The problem can be traced back to the optimization to assign the same >> spill slot to non-intersecting intervals in LinearScanWalker::combine_spilled_intervals(). >> >> In this method, we look at a split parent interval 'cur' and its register hint interval >> 'register_hint'. A register hint is present when the interval represents either the source or the >> target operand of a move operation and the register hint the target or source operand, respectively >> (the register hint is used to try to assign the same register to the source and target operand such >> that we can completely remove the move operation). >> >> If the register hint is set, then we do some additional checks and make sure that the split parent >> and the register hint do not intersect. If all checks pass, the split parent 'cur' gets the same >> spill slot as the register hint [1]. This means that both intervals get the same slot on the stack >> if they are spilled. >> >> The problem now is that we do not consider any split children of the register hint which all share >> the same spill slot with the register hint (their split parent). In the testcase, the split parent >> 'cur' does not intersect with the register hint but with one of its split children. As a result, >> they both get the same spill slot and are later indeed both spilled (i.e. both virtual >> registers/operands are put to the same stack location at the same time). >> >> The fix now additionally checks if the split parent 'cur' does not intersect any split children of >> the register hint in combine_spilled_intervals(). If there is such an intersection, then we bail out >> of the optimization. >> >> Some standard benchmark testing did not show any regressions. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From tobias.hartmann at oracle.com Tue Aug 11 08:44:31 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 11 Aug 2020 10:44:31 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> Message-ID: Hi Christian, On 11.08.20 10:40, Christian Hagedorn wrote: > I think we will always catch an overlap later in the verification method unless we somehow correct > the mistake until then. But I don't think that this is likely or even possible. Nevertheless, I > still wanted to verify that to some extent and added an assert(false) in the newly added > intersection bailout test with the split children and could not trigger it in tier 1-4 (apart from > the newly added test). Okay, thanks for checking! Best regards, Tobias From patric.hedlin at oracle.com Tue Aug 11 09:00:15 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 11 Aug 2020 11:00:15 +0200 Subject: RFR(XS/T): 8250848: [aarch64] nativeGotJump_at() missing call to verify(). Message-ID: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> Please review this trivial change/update. --- a/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp??? 
Mon Aug 10 12:57:38 2020 +0100 +++ b/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp Mon Aug 10 16:50:20 2020 +0200 @@ -537,6 +537,7 @@ inline NativeGotJump* nativeGotJump_at(address addr) { NativeGotJump* jump = (NativeGotJump*)(addr); + DEBUG_ONLY(jump->verify()); return jump; } Issue: https://bugs.openjdk.java.net/browse/JDK-8250848 Webrev with additional (trivial) /code style/ conforming clean-up: http://cr.openjdk.java.net/~phedlin/tr8250848/ Testing: tier1-3 Best regards, Patric From aph at redhat.com Tue Aug 11 10:06:03 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 11 Aug 2020 11:06:03 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: Message-ID: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> On 09/08/2020 04:19, Ludovic Henry wrote: > Hello, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 > Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance > improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ± 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ± 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ± 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ± 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ± 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ± 0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://bugs.openjdk.java.net/browse/JDK-8250902 > How did you test this? I'm looking through the test suite, but I can't find the test vectors. They must be in there somewhere. https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Aug 11 10:06:28 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 11 Aug 2020 11:06:28 +0100 Subject: Re: [aarch64-port-dev ] RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop In-Reply-To: <85lfinwafi.fsf@nicgas01-pc.shanghai.arm.com> References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> <85lfinwafi.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <8b4d4db0-01d0-3501-8114-382fff8b06bc@redhat.com> On 10/08/2020 02:34, Nick Gasson wrote: > Hi Andrew, did you reply to the wrong mail...? Looks like it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Aug 11 10:08:53 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 11 Aug 2020 11:08:53 +0100 Subject: Re: [aarch64-port-dev ] RFR(XS/T): 8250848: [aarch64] nativeGotJump_at() missing call to verify().
In-Reply-To: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> References: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> Message-ID: <4ddd50f6-6449-fd4a-58c0-dc77a523434e@redhat.com> On 11/08/2020 10:00, Patric Hedlin wrote: > > Issue:https://bugs.openjdk.java.net/browse/JDK-8250848 > > Webrev with additional (trivial)/code style/ conforming clean-up: > http://cr.openjdk.java.net/~phedlin/tr8250848/ OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From evgeny.nikitin at oracle.com Tue Aug 11 10:16:21 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Tue, 11 Aug 2020 12:16:21 +0200 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> Message-ID: Hi Igor, Thank you. Please find the patch attached. I wonder how many such nano-fixes one needs to make to become a committer? :)) Thanks in advance, // Evgeny. On 2020-08-10 21:34, Igor Ignatyev wrote: > LGTM > -- Igor > >> On Aug 10, 2020, at 12:25 PM, Evgeny Nikitin wrote: >> >> Hi Igor, >> >> I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: >> >> http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html >> >> Again, the same one-time test run in mach5 on 5 platforms. >> >> Thanks in advance, >> //Evgeny >> >> On 2020-08-10 18:04, Igor Ignatyev wrote: >>> Hi Evgeny, >>> the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. >>> Thanks, >>> -- Igor >>>> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >>>> >>>> Hi, >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >>>> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >>>> >>>> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >>>> >>>> The change has been checked in mach5 for the 5 common platforms (passed). >>>> >>>> Please review, >>>> /Evgeny Nikitin. 
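Editor's note: before the attached changeset, a short illustration of why OverloadCompileQueueTest needs TestCaseImpl built ahead of time, as discussed in the 8251349 thread above. The helper reads the class back as a .class resource, which only works if the class file already exists on the class path. This is a hand-written sketch under that assumption, not the actual compiler/codecache/stress/Helper.java code.

    import java.io.InputStream;

    class ClassBytesSketch {
        // Reading a class back as a ".class" resource only works if the class
        // was compiled and is on the class path, hence the build dependency.
        static byte[] readClassBytes(Class<?> clazz) throws Exception {
            String resource = clazz.getName().replace('.', '/') + ".class";
            try (InputStream in = clazz.getClassLoader().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IllegalStateException("class file not found: " + resource);
                }
                return in.readAllBytes();
            }
        }
    }

Deriving the class name from TestCaseImpl.class.getName(), as the changeset below does, also gives jtreg a statically visible dependency so the class is compiled automatically.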
> -------------- next part -------------- # HG changeset patch # User enikitin # Date 1597084287 -7200 # Mon Aug 10 20:31:27 2020 +0200 # Node ID 060dd595dda6a12a38ccd944a565b9bd23c1933e # Parent c379dc750a02918dda02809fbc9edb2711c4a6ee 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies Reviewed-by: iignatyev, kvn diff -r c379dc750a02 -r 060dd595dda6 test/hotspot/jtreg/compiler/codecache/stress/Helper.java --- a/test/hotspot/jtreg/compiler/codecache/stress/Helper.java Mon Jul 27 11:34:19 2020 -0700 +++ b/test/hotspot/jtreg/compiler/codecache/stress/Helper.java Mon Aug 10 20:31:27 2020 +0200 @@ -37,7 +37,7 @@ public static final WhiteBox WHITE_BOX = WhiteBox.getWhiteBox(); private static final long THRESHOLD = WHITE_BOX.getIntxVMFlag("CompileThreshold"); - private static final String TEST_CASE_IMPL_CLASS_NAME = "compiler.codecache.stress.TestCaseImpl"; + private static final String TEST_CASE_IMPL_CLASS_NAME = TestCaseImpl.class.getName(); private static byte[] CLASS_DATA; static { try { From patric.hedlin at oracle.com Tue Aug 11 11:16:51 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 11 Aug 2020 13:16:51 +0200 Subject: [aarch64-port-dev ] RFR(XS/T): 8250848: [aarch64] nativeGotJump_at() missing call to verify(). In-Reply-To: <4ddd50f6-6449-fd4a-58c0-dc77a523434e@redhat.com> References: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> <4ddd50f6-6449-fd4a-58c0-dc77a523434e@redhat.com> Message-ID: Thank you for reviewing Andrew. /Patric On 2020-08-11 12:08, Andrew Haley wrote: > On 11/08/2020 10:00, Patric Hedlin wrote: >> >> Issue:https://bugs.openjdk.java.net/browse/JDK-8250848 >> >> Webrev with additional (trivial)/code style/? conforming clean-up: >> ???????? http://cr.openjdk.java.net/~phedlin/tr8250848/ > > OK. > From xxinliu at amazon.com Tue Aug 11 17:09:11 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 11 Aug 2020 17:09:11 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1596523192072.15354@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com>, <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com>, <1596523192072.15354@amazon.com> Message-ID: <1597165750921.4285@amazon.com> Hi, Reviewers, May I gently ping this? I stuck because I don't know which error handling is appropriate. If we do nothing, current hotspot ignores wrong intrinsic Ids in the cmdline. This patch aborts hotspot when it detects any invalid intrinsic id. thanks, --lx ________________________________________ From: Liu, Xin Sent: Monday, August 3, 2020 11:39 PM To: Tobias Hartmann; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: Re: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic hi, Nils, Tobias would like to keep the parser behavior consistency. I think it means that the hotspot need to suppress the warning if the intrinsic_id doesn't exists in compiler directive. eg. -XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. What do you think about it? 
Here is the latest webrev: http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ thanks, --lx ________________________________________ From: Tobias Hartmann Sent: Friday, July 24, 2020 2:52 AM To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Liu, On 23.07.20 18:02, Liu, Xin wrote: > That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. > It just exacts information from CompileCommand as many as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. > > I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. > This behavior is from Nils comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. Okay, thanks for the explanation! I would prefer consistency in error handling of compiler directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide. Best regards, Tobias From luhenry at microsoft.com Tue Aug 11 18:28:56 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 11 Aug 2020 18:28:56 +0000 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> Message-ID: Hi Andrew, (I'm currently on vacation and will come back on the 20th.) I've relied on the existing test suite, which was also enhanced when submitting the patch for the MD5 intrinsic on x86 [1]. To help in the development, I've also generated 1k random strings, got them through md5sum on Linux, and compared the output of this MD5 intrinsic on the same input. I did not use [2] as a testing bed, but would be happy to add it to the OpenJDK test suite (if the license allows for it, I didn't check yet where it's allowed). > I'm looking through the test suite, but I can't find the test vectors. They must be in there somewhere. test/hotspot/jtreg/compiler/intrinsics/sha/TestDigest.java covers that by running a single value with and without the intrinsic. -- Ludovic [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039327.html [2] https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data From vladimir.kozlov at oracle.com Tue Aug 11 19:32:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2020 12:32:20 -0700 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> Message-ID: On 8/11/20 12:15 AM, Christian Hagedorn wrote: > Hi Clive > > Thanks a lot for taking care of this! > > One last comment: The existing spacing for the verify methods in the .hpp file is wrong. But since there are many more > methods with a wrong spacing following it, I leave it up to you if you want to fix it for the verify methods or not. I'm > fine with both. Either way, you don't need to send another webrev. 
> > Otherwise, it looks good to me! +1 Thanks, Vladimir K > > Best regards, > Christian > > On 10.08.20 19:00, Verghese, Clive wrote: >> Hi Christian, >> >> Thank you for the feedback. I have updated the review addressing the comments below. >> >> http://cr.openjdk.java.net/~xliu/clive/8251268/01/webrev/ >> >> Regards, >> Clive Verghese >> >> >> >> ?On 8/6/20, 11:55 PM, "Christian Hagedorn" wrote: >> ???? Hi Clive >> ???? The fix looks good to me. It makes sense to move it to chaitin.cpp since >> ???? the calls to verify() are also in this file only. >> ???? You could fix some minor code style things about the existing code that >> ???? you moved while at it: >> ???? - You can move the #ifdef ASSERT out of both methods and surround both >> ???? methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() >> ???? are only called in ASSERT blocks. And add a // ASSERT comment on the >> ???? closing #endif to make it more clear. Don't forget to also surround the >> ???? declarations in the .hpp file with an ASSERT. >> ???? - In verify_base_ptrs(): >> ??????? - L2330: Missing curly braces for the loop >> ??????? - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea >> ???? *a -> ResourceArea* a >> ??????? - There is a missing space in all asserts after the comma separating >> ???? the condition and the failure string >> ???? - In verify(): >> ??????? - L2386: Missing space and curly braces for the if statement >> ???? Best regards, >> ???? Christian >> ???? On 07.08.20 01:49, Verghese, Clive wrote: >> ???? > Hi, >> ???? > >> ???? > Requesting review for >> ???? > >> ???? > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ >> ???? > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 >> ???? > >> ???? > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to >> chaitin.cpp >> ???? > >> ???? > I have tested this builds successfully for both PRODUCT and !PRODUCT. >> ???? > >> ???? > Ensured that there are no regressions in hotspot:tier1 tests. >> ???? > >> ???? > >> ???? > Regards, >> ???? > Clive Verghese >> ???? > >> From beurba at microsoft.com Tue Aug 11 20:23:50 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Tue, 11 Aug 2020 20:23:50 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> , Message-ID: Hey Doug, since I was curious I did a bit of digging. Here are my findings: 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected. 2. Thus the generated code by Graal is almost as fast as the MD5 intrinsic. 3. The gap, from what I can tell, is that the SchedulePhase decides to put all the 16 FloatingReadNodes at the top of the basic block, and thus increasing register pressure and therefore ending up needing to spill on x86_64. It would be nice if the read access would be scheduled next to its usage in this case. 
I couldn't figure out how to do that, it has been a while since I've touched that code :-) Here are some numbers plus the generated code of C2, the intrinsic and Graal: https://gist.github.com/lewurm/3b874558d369fd56b3737e28f1616740 -Bernhard ________________________________________ From: Doug Simon Sent: Monday, August 10, 2020 15:38 To: Bernhard Urban-Forster Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Hi Bernhard, On 10 Aug 2020, at 15:01, Bernhard Urban-Forster > wrote: Hey Doug, replying on behalf for Ludovic, as he is on vacation :-) Currently we are not planning to implement the intrinsic for Graal. Schade ;-) Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already? I don?t think we do that anywhere currently but I imagine it wouldn?t be hard to put the BytecodeParser into a mode whereby an array access generates a AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). -Doug This is the relevant Java method for the MD5 intrinsic: https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java*L172__;Iw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV$ -Bernhard ________________________________________ From: Doug Simon > Sent: Monday, August 10, 2020 11:55 To: Ludovic Henry Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Hi Ludovic, Are you considering also implementing this intrinsic in Graal? Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. -Doug On 9 Aug 2020, at 05:19, Ludovic Henry > wrote: Hello, Bug: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=C7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3$ Webrev: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=http:*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D&reserved=0__;JSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ$ Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. 
The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): -XX:-UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms -XX:+UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 0.001 ops/ms => 22% speedup Thank you, Ludovic [1] https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8250902&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=5KcoG5n10rnVMU9y8L076jpCoEd0NBzNqr*2F8M5ghO3c*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6SPJBTN$ From doug.simon at oracle.com Tue Aug 11 20:32:54 2020 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 11 Aug 2020 22:32:54 +0200 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Message-ID: <80E25174-9E0E-40EF-AF75-7295782CE360@oracle.com> Thanks for the digging and results Bernhard. We?ve discussed making the SchedulePhase do latency-aware scheduling within blocks but haven?t done anything yet. -Doug > On 11 Aug 2020, at 22:23, Bernhard Urban-Forster wrote: > > Hey Doug, > > since I was curious I did a bit of digging. Here are my findings: > > 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected. > 2. Thus the generated code by Graal is almost as fast as the MD5 intrinsic. > 3. The gap, from what I can tell, is that the SchedulePhase decides to put all the 16 FloatingReadNodes at the top of the basic block, and thus increasing register pressure and therefore ending up needing to spill on x86_64. It would be nice if the read access would be scheduled next to its usage in this case. I couldn't figure out how to do that, it has been a while since I've touched that code :-) > > Here are some numbers plus the generated code of C2, the intrinsic and Graal: > https://urldefense.com/v3/__https://gist.github.com/lewurm/3b874558d369fd56b3737e28f1616740__;!!GqivPVa7Brio!L6XXaWxHy6hbPWSWzBRyX9XuZtH1g0pfzTBa7gBrTWM3Fd7snIsiUwYctRenoV51$ > > -Bernhard > > ________________________________________ > From: Doug Simon > Sent: Monday, August 10, 2020 15:38 > To: Bernhard Urban-Forster > Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Bernhard, > > > On 10 Aug 2020, at 15:01, Bernhard Urban-Forster > wrote: > > Hey Doug, > > replying on behalf for Ludovic, as he is on vacation :-) > > Currently we are not planning to implement the intrinsic for Graal. > > Schade ;-) > > Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. 
Is something like that done for other methods already? > > I don?t think we do that anywhere currently but I imagine it wouldn?t be hard to put the BytecodeParser into a mode whereby an array access generates a AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). > > -Doug > > > This is the relevant Java method for the MD5 intrinsic: > https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java*L172__;Iw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV$ > > > -Bernhard > > ________________________________________ > From: Doug Simon > > Sent: Monday, August 10, 2020 11:55 > To: Ludovic Henry > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Ludovic, > > Are you considering also implementing this intrinsic in Graal? > > Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. > > -Doug > > On 9 Aug 2020, at 05:19, Ludovic Henry > wrote: > > Hello, > > Bug: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=C7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3$ > Webrev: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=http:*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D&reserved=0__;JSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ$ > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 
0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8250902&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=5KcoG5n10rnVMU9y8L076jpCoEd0NBzNqr*2F8M5ghO3c*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6SPJBTN$ > From beurba at microsoft.com Tue Aug 11 20:45:27 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Tue, 11 Aug 2020 20:45:27 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <80E25174-9E0E-40EF-AF75-7295782CE360@oracle.com> References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> , <80E25174-9E0E-40EF-AF75-7295782CE360@oracle.com> Message-ID: That's great to hear :-) Thank you, -Bernhard ________________________________________ From: Doug Simon Sent: Tuesday, August 11, 2020 22:32 To: Bernhard Urban-Forster Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64; Thomas Wuerthinger; David Leopoldseder Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Thanks for the digging and results Bernhard. We?ve discussed making the SchedulePhase do latency-aware scheduling within blocks but haven?t done anything yet. -Doug > On 11 Aug 2020, at 22:23, Bernhard Urban-Forster wrote: > > Hey Doug, > > since I was curious I did a bit of digging. Here are my findings: > > 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected. > 2. Thus the generated code by Graal is almost as fast as the MD5 intrinsic. > 3. The gap, from what I can tell, is that the SchedulePhase decides to put all the 16 FloatingReadNodes at the top of the basic block, and thus increasing register pressure and therefore ending up needing to spill on x86_64. It would be nice if the read access would be scheduled next to its usage in this case. I couldn't figure out how to do that, it has been a while since I've touched that code :-) > > Here are some numbers plus the generated code of C2, the intrinsic and Graal: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgist.github.com%2Flewurm%2F3b874558d369fd56b3737e28f1616740__%3B!!GqivPVa7Brio!L6XXaWxHy6hbPWSWzBRyX9XuZtH1g0pfzTBa7gBrTWM3Fd7snIsiUwYctRenoV51%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982263260&sdata=da4FNCvOEUajDgNYLdNl2DNo3diwgsCsGy8BCD%2BipPA%3D&reserved=0 > > -Bernhard > > ________________________________________ > From: Doug Simon > Sent: Monday, August 10, 2020 15:38 > To: Bernhard Urban-Forster > Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Bernhard, > > > On 10 Aug 2020, at 15:01, Bernhard Urban-Forster > wrote: > > Hey Doug, > > replying on behalf for Ludovic, as he is on vacation :-) > > Currently we are not planning to implement the intrinsic for Graal. > > Schade ;-) > > Also we didn't check the generated code by Graal. 
I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already? > > I don?t think we do that anywhere currently but I imagine it wouldn?t be hard to put the BytecodeParser into a mode whereby an array access generates a AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). > > -Doug > > > This is the relevant Java method for the MD5 intrinsic: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fblob%2F733218137289d6a0eb705103ed7be30f1e68d17a%2Fsrc%2Fjava.base%2Fshare%2Fclasses%2Fsun%2Fsecurity%2Fprovider%2FMD5.java*L172__%3BIw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982273214&sdata=BTu7UyXhnF1XPmbzhpVQ3y3mQ1evQHuVe0qKMgj%2FNDs%3D&reserved=0 > > > -Bernhard > > ________________________________________ > From: Doug Simon > > Sent: Monday, August 10, 2020 11:55 > To: Ludovic Henry > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Ludovic, > > Are you considering also implementing this intrinsic in Graal? > > Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. > > -Doug > > On 9 Aug 2020, at 05:19, Ludovic Henry > wrote: > > Hello, > > Bug: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttps*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216%26amp%3Bdata%3D02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034%26amp%3Bsdata%3DC7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D%26amp%3Breserved%3D0__%3BJSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982273214&sdata=gzxWbxSJGlmPvXnYko6rvVAKnbeJOWhWhISqTJvVaA8%3D&reserved=0 > Webrev: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttp%3A*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00%26amp%3Bdata%3D02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034%26amp%3Bsdata%3D0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D%26amp%3Breserved%3D0__%3BJSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982273214&sdata=%2FTz7d6sQ2Hx8MGSGgUv5eKLxgCxtKEIdSJA2EYX3pHE%3D&reserved=0 > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its 
implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ± 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ± 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ± 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ± 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ± 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ± 0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://bugs.openjdk.java.net/browse/JDK-8250902 > From jingxinc at amazon.com Tue Aug 11 22:41:46 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Tue, 11 Aug 2020 22:41:46 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers Message-ID: <99612339-38D5-411C-9459-89EA1A0F4284@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 The change casts the uint ni to an int so that the parameter passed to the method TypeOopPtr::cast_to_instance_id is an integer. I have tested that this builds successfully. Ensured that there are no regressions in hotspot:tier1 tests. Regards, Eric Chen From vladimir.kozlov at oracle.com Wed Aug 12 00:19:09 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2020 17:19:09 -0700 Subject: [16] RFR(T) 8251306: compiler/aot/cli/jaotc/IgnoreErrorsTest.java timed out on MacOS Message-ID: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8251306 Test runs 4 jaotc subtests and each took 4 mins on a particularly slow machine. Even though the timeout factor was "-timeoutFactor:4", it was not enough. Tests concurrency was '-concurrency:6' Flags were: '-ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation' So 2 C2 threads were compiling Graal during JAOTC execution when other tests were run concurrently. Which may explain the slow execution. Since it is a rare case, I suggest just increasing the test's timeout from the default 2 to 6 mins: test/hotspot/jtreg/compiler/aot/cli/jaotc/IgnoreErrorsTest.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2018, 2020, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
* * This code is free software; you can redistribute it and/or modify it @@ -26,7 +26,7 @@ * @requires vm.aot * @library / /test/lib /testlibrary * @compile IllegalClass.jasm - * @run driver compiler.aot.cli.jaotc.IgnoreErrorsTest + * @run driver/timeout=360 compiler.aot.cli.jaotc.IgnoreErrorsTest */ package compiler.aot.cli.jaotc; Thanks, Vladimir From igor.ignatyev at oracle.com Wed Aug 12 02:22:33 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 11 Aug 2020 19:22:33 -0700 Subject: [16] RFR(T) 8251306: compiler/aot/cli/jaotc/IgnoreErrorsTest.java timed out on MacOS In-Reply-To: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> References: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> Message-ID: <68AEF8FA-949F-414F-BBA8-D7A1A8D13469@oracle.com> Hi Vladimir, LGTM. -- Igor > On Aug 11, 2020, at 5:19 PM, Vladimir Kozlov wrote: > > https://bugs.openjdk.java.net/browse/JDK-8251306 > > Test runs 4 jaotc subtests and each took 4 mins on particular slow machine. > Even so timeout factor was "-timeoutFactor:4" it was not enough. > > Tests concurrency was '-concurrency:6' > Flags were: '-ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation' > > So 2 C2 threads were compiling Graal during JAOTC execution when other tests were run concurrently. > Which may explain slow execution. > > Since it is rare case I suggest just increase test's timeout from default 2 to 6 mins : > > test/hotspot/jtreg/compiler/aot/cli/jaotc/IgnoreErrorsTest.java > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2018, 2020, Oracle and/or its affiliates. All rights reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -26,7 +26,7 @@ > * @requires vm.aot > * @library / /test/lib /testlibrary > * @compile IllegalClass.jasm > - * @run driver compiler.aot.cli.jaotc.IgnoreErrorsTest > + * @run driver/timeout=360 compiler.aot.cli.jaotc.IgnoreErrorsTest > */ > > package compiler.aot.cli.jaotc; > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Wed Aug 12 02:23:10 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2020 19:23:10 -0700 Subject: [16] RFR(T) 8251306: compiler/aot/cli/jaotc/IgnoreErrorsTest.java timed out on MacOS In-Reply-To: <68AEF8FA-949F-414F-BBA8-D7A1A8D13469@oracle.com> References: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> <68AEF8FA-949F-414F-BBA8-D7A1A8D13469@oracle.com> Message-ID: <7ce1223b-7c1d-47b9-dc69-ae72baa26fda@oracle.com> Thank you, Igor Vladimir K On 8/11/20 7:22 PM, Igor Ignatyev wrote: > Hi Vladimir, > > LGTM. > > -- Igor > >> On Aug 11, 2020, at 5:19 PM, Vladimir Kozlov wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8251306 >> >> Test runs 4 jaotc subtests and each took 4 mins on particular slow machine. >> Even so timeout factor was "-timeoutFactor:4" it was not enough. >> >> Tests concurrency was '-concurrency:6' >> Flags were: '-ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation' >> >> So 2 C2 threads were compiling Graal during JAOTC execution when other tests were run concurrently. >> Which may explain slow execution. >> >> Since it is rare case I suggest just increase test's timeout from default 2 to 6 mins : >> >> test/hotspot/jtreg/compiler/aot/cli/jaotc/IgnoreErrorsTest.java >> @@ -1,5 +1,5 @@ >> /* >> - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. >> + * Copyright (c) 2018, 2020, Oracle and/or its affiliates. 
All rights reserved. >> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> * >> * This code is free software; you can redistribute it and/or modify it >> @@ -26,7 +26,7 @@ >> * @requires vm.aot >> * @library / /test/lib /testlibrary >> * @compile IllegalClass.jasm >> - * @run driver compiler.aot.cli.jaotc.IgnoreErrorsTest >> + * @run driver/timeout=360 compiler.aot.cli.jaotc.IgnoreErrorsTest >> */ >> >> package compiler.aot.cli.jaotc; >> >> Thanks, >> Vladimir > From OGATAK at jp.ibm.com Wed Aug 12 07:48:59 2020 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Wed, 12 Aug 2020 16:48:59 +0900 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler Message-ID: Hi, May I get review for JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler? This patch adds a development option to compile a method with C1 and print disassembly of the generated native code, but to skip execution of the generated code, in the same manner as OptoNoExecute option does in C2. Log-based debugging is useful to support a new processor. In C1, the existing options BailoutAfterHIR and BailoutAfterLIR can be used if printing HIR/LIR is sufficient. However, there is no way to print disassembly of the generated code because these existing options quit compilation before generating native code. So this issue proposes a new option for this purpose. Bug: https://bugs.openjdk.java.net/browse/JDK-8251470 Webrev: http://cr.openjdk.java.net/~ogatak/8251470/webrev.00/ Regards, Ogata From nils.eliasson at oracle.com Wed Aug 12 08:21:58 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 12 Aug 2020 10:21:58 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1597165750921.4285@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> Message-ID: Hi, Sorry for the delay. About the error handling: For CompilerDirectivesFile there are two scenarios: 1) If a file containing bad contents is passed on the commandline - the VM prints an descriptive error and refuses to start. 2) If a file containing bad contents is passed through jcmd - the VM prints and error on the jcmd stream and continues to run (ignoring the command). This is achieved by letting the parser just register any parsing error, and defer to the caller to decide how to handle the situation. Regards, Nils Eliasson On 2020-08-11 19:09, Liu, Xin wrote: > Hi, Reviewers, > > May I gently ping this? > > I stuck because I don't know which error handling is appropriate. > > If we do nothing, current hotspot ignores wrong intrinsic Ids in the cmdline. > This patch aborts hotspot when it detects any invalid intrinsic id. > > thanks, > --lx > > > ________________________________________ > From: Liu, Xin > Sent: Monday, August 3, 2020 11:39 PM > To: Tobias Hartmann; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev > Subject: Re: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic > > hi, Nils, > > Tobias would like to keep the parser behavior consistency. I think it means that the hotspot need to suppress the warning if the intrinsic_id doesn't exists in compiler directive. > eg. 
-XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. > > What do you think about it? > > Here is the latest webrev: > http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ > > thanks, > --lx > > ________________________________________ > From: Tobias Hartmann > Sent: Friday, July 24, 2020 2:52 AM > To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev > Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > Hi Liu, > > On 23.07.20 18:02, Liu, Xin wrote: >> That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. >> It just exacts information from CompileCommand as many as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. >> >> I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. >> This behavior is from Nils comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. > Okay, thanks for the explanation! I would prefer consistency in error handling of compiler > directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide. > > Best regards, > Tobias From tobias.hartmann at oracle.com Wed Aug 12 08:57:13 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 12 Aug 2020 10:57:13 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError Message-ID: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8251456 http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap space to allocate such large arrays. Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened verification of the array contents and used the exact same command line flags that Roland proposed in his fix for JDK-8193518 [1]. I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more reliable and reproduces the issues in every run. Best regards, Tobias [1] http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html From aph at redhat.com Wed Aug 12 09:47:50 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 12 Aug 2020 10:47:50 +0100 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Message-ID: On 8/10/20 2:38 PM, Doug Simon wrote: > I don?t think we do that anywhere currently but I imagine it > wouldn?t be hard to put the BytecodeParser into a mode whereby an > array access generates a AccessIndexedNode that omits the bounds > check (generated by > org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). 
We could do that in C2 as well. And it'd be far more attractive than hand-coded intrinsics. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Wed Aug 12 11:08:39 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 12 Aug 2020 13:08:39 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8251458 http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch is negative. The problem is an overflow when converting an uint counter value > max_jint from profile information to a jint. The fix is to handle such overflows by simply limiting the counter value to max_jint. Best regards, Tobias From christian.hagedorn at oracle.com Wed Aug 12 11:26:47 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 12 Aug 2020 13:26:47 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: Hi Tobias Looks good to me! Best regards, Christian On 12.08.20 13:08, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8251458 > http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ > > We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch > is negative. The problem is an overflow when converting an uint counter value > max_jint from > profile information to a jint. > > The fix is to handle such overflows by simply limiting the counter value to max_jint. > > Best regards, > Tobias > From stumon01 at arm.com Wed Aug 12 11:38:03 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Wed, 12 Aug 2020 12:38:03 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> Message-ID: On 11/08/2020 11:06, Andrew Haley wrote: > On 09/08/2020 04:19, Ludovic Henry wrote: >> Hello, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 >> Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 >> >> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 >> >> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance >> improvements are the following (on Linux-AArch64 on a Marvell TX2): >> >> -XX:-UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms >> >> -XX:+UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 
0.001 ops/ms => 22% speedup >> >> Thank you, >> Ludovic >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8250902 >> > > How did you test this? I'm looking through the test suite, but I can't > find the test vectors. They must be in there somewhere. > > https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data > I've been looking over this patch too. The fundamental unit test is: test/hotspot/jtreg/compiler/intrinsics/sha/TestDigest.java The method "testDigest" generates an byte array of a given size, with each element filled with it's own index & 0xff. The test is then run once, assumed uncompiled, it is then "warmed up" and the first generated digest is compared against the digest presumably generated by the intrinsic. This is the same test for all of the message digest algorithms. I'd say the test is no worse than what has gone before. There are additional tests under the jdk library tests, but nothing that addresses the correctness of the MD5 algorithm implementation itself. In terms of the status-quo, that patch looks ok to me. I think if the testing is to be expanded, it should be expanded to all of the message digest algorithms. BR, Stuart IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. From tobias.hartmann at oracle.com Wed Aug 12 11:58:50 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 12 Aug 2020 13:58:50 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: <9345c0b6-b40e-8cff-360d-4843a64b8aec@oracle.com> Thanks Christian! Best regards, Tobias On 12.08.20 13:26, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me! > > Best regards, > Christian > > On 12.08.20 13:08, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251458 >> http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ >> >> We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch >> is negative. The problem is an overflow when converting an uint counter value > max_jint from >> profile information to a jint. >> >> The fix is to handle such overflows by simply limiting the counter value to max_jint. >> >> Best regards, >> Tobias >> From christian.hagedorn at oracle.com Wed Aug 12 13:34:26 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 12 Aug 2020 15:34:26 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing Message-ID: Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8248791 http://cr.openjdk.java.net/~chagedorn/8248791/webrev.00/ The problem can be traced back to cloning an object and wrongly optimizing a field load from it to a constant zero. In LoadNode::Value(), we check if a load is performed on a freshly-allocated object. If that is the case we can replace the load by a constant zero. This is done by calling can_see_stored_value() at [1]. In this method, we first check if we can find a captured store with find_captured_store() [2]. 
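To make the affected pattern concrete, here is a minimal Java sketch of the kind of code this is about (the class and names below are made up for illustration and are not the actual failing test): an object is cloned and a field is then read from the clone, and that load must observe the copied value rather than a constant default.

public class CloneFieldLoad {
    static class Holder implements Cloneable {
        int value = 42;

        @Override
        public Holder clone() throws CloneNotSupportedException {
            return (Holder) super.clone();
        }
    }

    static int test(Holder h) throws CloneNotSupportedException {
        Holder copy = h.clone(); // the clone intrinsic may initialize the copy via an ArrayCopyNode
        return copy.value;       // must see 42; folding this load to a constant 0 is the bug
    }

    public static void main(String[] args) throws Exception {
        Holder h = new Holder();
        for (int i = 0; i < 100_000; i++) { // warm up so test() gets C2-compiled
            if (test(h) != 42) {
                throw new AssertionError("field load from clone was folded to a constant");
            }
        }
    }
}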
When enabling ReduceBulkZeroing in the testcase, then this method returns NULL because captured_store_insertion_point() bails out at [3] for completed InitializationNodes (is set to complete at [4] since ReduceBulkZeroing is enabled and the allocation belongs to a clone). When disabling ReduceBulkZeroing in the testcase, find_caputured_store() returns a non-NULL ProjNode because the InitializationNode of the allocation is not marked completed. We loop one more time and then return a constant zero at [5] because there is no store for the allocation (the ArrayCopyNode is responsible for the initialization of the cloned object). The fix now only returns a constant zero if ReduceBulkZeroing is enabled or when the allocation does not belong to an ArrayCopyNode clone (if ReduceBulkZeroing is disabled). Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1968 [2] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1115 [3] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l3737 [4] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/library_call.cpp#l4236 [5] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1106 From vladimir.kozlov at oracle.com Wed Aug 12 16:18:21 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Aug 2020 09:18:21 -0700 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 8/12/20 1:57 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8251456 > http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ > > The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if > they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap > space to allocate such large arrays. > > Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to > allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened > verification of the array contents and used the exact same command line flags that Roland proposed > in his fix for JDK-8193518 [1]. > > I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more > reliable and reproduces the issues in every run. > > Best regards, > Tobias > > [1] > http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html > From vladimir.kozlov at oracle.com Wed Aug 12 17:31:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Aug 2020 10:31:13 -0700 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: <426c9b61-8708-549d-ac5b-6e207aa2f508@oracle.com> +1 Thanks, Vladimir K On 8/12/20 4:26 AM, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me! 
> > Best regards, > Christian > > On 12.08.20 13:08, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251458 >> http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ >> >> We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch >> is negative. The problem is an overflow when converting an uint counter value > max_jint from >> profile information to a jint. >> >> The fix is to handle such overflows by simply limiting the counter value to max_jint. >> >> Best regards, >> Tobias >> From vladimir.kozlov at oracle.com Wed Aug 12 17:38:16 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Aug 2020 10:38:16 -0700 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: References: Message-ID: <13406ed1-5818-8062-6898-66598ba4f595@oracle.com> Good. Thanks, Vladimir K On 8/12/20 6:34 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8248791 > http://cr.openjdk.java.net/~chagedorn/8248791/webrev.00/ > > The problem can be traced back to cloning an object and wrongly optimizing a field load from it to a constant zero. In > LoadNode::Value(), we check if a load is performed on a freshly-allocated object. If that is the case we can replace the > load by a constant zero. This is done by calling can_see_stored_value() at [1]. In this method, we first check if we can > find a captured store with find_captured_store() [2]. > > When enabling ReduceBulkZeroing in the testcase, then this method returns NULL because captured_store_insertion_point() > bails out at [3] for completed InitializationNodes (is set to complete at [4] since ReduceBulkZeroing is enabled and the > allocation belongs to a clone). > > When disabling ReduceBulkZeroing in the testcase, find_caputured_store() returns a non-NULL ProjNode because the > InitializationNode of the allocation is not marked completed. We loop one more time and then return a constant zero at > [5] because there is no store for the allocation (the ArrayCopyNode is responsible for the initialization of the cloned > object). > > The fix now only returns a constant zero if ReduceBulkZeroing is enabled or when the allocation does not belong to an > ArrayCopyNode clone (if ReduceBulkZeroing is disabled). > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1968 > [2] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1115 > [3] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l3737 > [4] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/library_call.cpp#l4236 > [5] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1106 From vladimir.x.ivanov at oracle.com Wed Aug 12 22:24:55 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 13 Aug 2020 01:24:55 +0300 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> > http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ Though the fix itself looks sufficient, the code around is still not pretty... In particular, profile data goes through uint->jint->int->float(!) 
conversion which doesn't make any sense. It would be really nice to clean it up. Best regards, Vladimir Ivanov > We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch > is negative. The problem is an overflow when converting an uint counter value > max_jint from > profile information to a jint. > > The fix is to handle such overflows by simply limiting the counter value to max_jint. > > Best regards, > Tobias > From tobias.hartmann at oracle.com Thu Aug 13 05:59:00 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 07:59:00 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Message-ID: Thanks Vladimir. Best regards, Tobias On 12.08.20 18:18, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 8/12/20 1:57 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251456 >> http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ >> >> The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if >> they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap >> space to allocate such large arrays. >> >> Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to >> allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened >> verification of the array contents and used the exact same command line flags that Roland proposed >> in his fix for JDK-8193518 [1]. >> >> I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more >> reliable and reproduces the issues in every run. >> >> Best regards, >> Tobias >> >> [1] >> http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html >> >> From tobias.hartmann at oracle.com Thu Aug 13 06:01:35 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 08:01:35 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: <426c9b61-8708-549d-ac5b-6e207aa2f508@oracle.com> References: <426c9b61-8708-549d-ac5b-6e207aa2f508@oracle.com> Message-ID: <675e5e9b-3152-f836-7708-1eb69e445777@oracle.com> Thanks Vladimir. Best regards, Tobias On 12.08.20 19:31, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 8/12/20 4:26 AM, Christian Hagedorn wrote: >> Hi Tobias >> >> Looks good to me! >> >> Best regards, >> Christian >> >> On 12.08.20 13:08, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8251458 >>> http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ >>> >>> We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch >>> is negative. The problem is an overflow when converting an uint counter value > max_jint from >>> profile information to a jint. >>> >>> The fix is to handle such overflows by simply limiting the counter value to max_jint. 
>>> >>> Best regards, >>> Tobias >>> From tobias.hartmann at oracle.com Thu Aug 13 06:09:06 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 08:09:06 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> References: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> Message-ID: <6bf15413-3a6d-55cc-570d-c115a72397ec@oracle.com> Hi Vladimir, Thanks for looking at this! On 13.08.20 00:24, Vladimir Ivanov wrote: > Though the fix itself looks sufficient, the code around is still not pretty... In particular, > profile data goes through uint->jint->int->float(!) conversion which doesn't make any sense. > > It would be really nice to clean it up. Yes, I've noticed that as well but didn't want to clean it up with this patch because we need to backport to 11u. I've filed JDK-8251513 [1] for the cleanup. Best regards, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8251513 From tobias.hartmann at oracle.com Thu Aug 13 06:41:14 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 08:41:14 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: References: Message-ID: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> Hi Christian, what about other allocations that are marked as 'complete_with_arraycoppy'? Not all of them use an ArrayCopyNode for the actual initialization and therefore find_array_copy_clone will return false. For example, LibraryCallKit::inline_string_copy. Can't you just check if InitializeNode::is_complete_with_arraycopy is set? Best regards, Tobias On 12.08.20 15:34, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8248791 > http://cr.openjdk.java.net/~chagedorn/8248791/webrev.00/ > > The problem can be traced back to cloning an object and wrongly optimizing a field load from it to a > constant zero. In LoadNode::Value(), we check if a load is performed on a freshly-allocated object. > If that is the case we can replace the load by a constant zero. This is done by calling > can_see_stored_value() at [1]. In this method, we first check if we can find a captured store with > find_captured_store() [2]. > > When enabling ReduceBulkZeroing in the testcase, then this method returns NULL because > captured_store_insertion_point() bails out at [3] for completed InitializationNodes (is set to > complete at [4] since ReduceBulkZeroing is enabled and the allocation belongs to a clone). > > When disabling ReduceBulkZeroing in the testcase, find_caputured_store() returns a non-NULL ProjNode > because the InitializationNode of the allocation is not marked completed. We loop one more time and > then return a constant zero at [5] because there is no store for the allocation (the ArrayCopyNode > is responsible for the initialization of the cloned object). > > The fix now only returns a constant zero if ReduceBulkZeroing is enabled or when the allocation does > not belong to an ArrayCopyNode clone (if ReduceBulkZeroing is disabled). > > Thank you! 
> > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1968 > [2] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1115 > [3] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l3737 > [4] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/library_call.cpp#l4236 > [5] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1106 From tobias.hartmann at oracle.com Thu Aug 13 07:00:20 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 09:00:20 +0200 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler In-Reply-To: References: Message-ID: Hi Ogata, isn't that what -XX:-InstallMethods [1] is supposed to accomplish? It triggers a bailout right before Compilation::install_code, which is the same with your code. Also, why do you need the change in javaCalls.cpp? That would also affect C2 compiled code. Best regards, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/c1/c1_globals.hpp#l292 On 12.08.20 09:48, Kazunori Ogata wrote: > Hi, > > May I get review for JDK-8251470: Add a development option equivalant to > OptoNoExecute to C1 compiler? > > This patch adds a development option to compile a method with C1 and print > disassembly of the generated native code, but to skip execution of the > generated code, in the same manner as OptoNoExecute option does in C2. > > Log-based debugging is useful to support a new processor. In C1, the > existing options BailoutAfterHIR and BailoutAfterLIR can be used if > printing HIR/LIR is sufficient. However, there is no way to print > disassembly of the generated code because these existing options quit > compilation before generating native code. So this issue proposes a new > option for this purpose. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251470 > Webrev: http://cr.openjdk.java.net/~ogatak/8251470/webrev.00/ > > > Regards, > Ogata > From tobias.hartmann at oracle.com Thu Aug 13 07:31:14 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 09:31:14 +0200 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: <99612339-38D5-411C-9459-89EA1A0F4284@amazon.com> References: <99612339-38D5-411C-9459-89EA1A0F4284@amazon.com> Message-ID: <64dc3cb4-ebbe-a668-febf-5d7dd3ac71df@oracle.com> Hi Eric, there are other places where Node::_idx is casted to int (and a potential overflow might happen). For example, calls to Compile::node_notes_at. The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) checking that _idx is always <= MAX_INT. Best regards, Tobias On 12.08.20 00:41, Eric, Chan wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > I have tested this builds successfully . > > Ensured that there are no regressions in hotspot : tier1 tests. 
> > Regards, > Eric Chen > From adinn at redhat.com Thu Aug 13 08:06:13 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 13 Aug 2020 09:06:13 +0100 Subject: RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop In-Reply-To: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <10b37c70-c522-7f65-3c7e-bbeeaf7e1c3d@redhat.com> Hi Nick, On 07/08/2020 10:04, Nick Gasson wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8247354 > Webrev: http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/ Nice detective work. The patch looks ok to me. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From tobias.hartmann at oracle.com Thu Aug 13 08:21:40 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 10:21:40 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> References: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> Message-ID: On 13.08.20 08:41, Tobias Hartmann wrote: > what about other allocations that are marked as 'complete_with_arraycoppy'? Not all of them use an > ArrayCopyNode for the actual initialization and therefore find_array_copy_clone will return false. > For example, LibraryCallKit::inline_string_copy. > > Can't you just check if InitializeNode::is_complete_with_arraycopy is set? Okay, please ignore that. I've noticed that only the clone intrinsic respects ReduceBulkZeroing. Your fix looks good to me. Best regards, Tobias From aph at redhat.com Thu Aug 13 10:00:12 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 13 Aug 2020 11:00:12 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> Message-ID: <0586ead6-583d-1907-491e-64db6edf2106@redhat.com> On 12/08/2020 12:38, Stuart Monteith wrote: > The method "testDigest" generates an byte array of a given size, > with each element filled with it's own index & 0xff. > > The test is then run once, assumed uncompiled, it is then "warmed > up" and the first generated digest is compared against the digest > presumably generated by the intrinsic. This is the same test for all > of the message digest algorithms. > > I'd say the test is no worse than what has gone before. There are > additional tests under the jdk library tests, but nothing that > addresses the correctness of the MD5 algorithm implementation > itself. Good grief. So there are no compliance tests in the test suite at all. > In terms of the status-quo, that patch looks ok to me. I think if > the testing is to be expanded, it should be expanded to all of the > message digest algorithms. That's not much more that an excuse for doing nothing, IMO. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Thu Aug 13 10:17:54 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 13 Aug 2020 12:17:54 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: References: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> Message-ID: <39d1ae30-f7c0-c2d8-7c58-5b5e5cd3a522@oracle.com> Thank you Tobias for your careful review! Best regards, Christian On 13.08.20 10:21, Tobias Hartmann wrote: > > On 13.08.20 08:41, Tobias Hartmann wrote: >> what about other allocations that are marked as 'complete_with_arraycoppy'? Not all of them use an >> ArrayCopyNode for the actual initialization and therefore find_array_copy_clone will return false. >> For example, LibraryCallKit::inline_string_copy. >> >> Can't you just check if InitializeNode::is_complete_with_arraycopy is set? > > Okay, please ignore that. I've noticed that only the clone intrinsic respects ReduceBulkZeroing. > > Your fix looks good to me. > > Best regards, > Tobias > From christian.hagedorn at oracle.com Thu Aug 13 10:46:00 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 13 Aug 2020 12:46:00 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Message-ID: <3d9b4ae6-cee6-5203-87db-9b874403af1a@oracle.com> Hi Tobias Looks good to me. Best regards, Christian On 12.08.20 10:57, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8251456 > http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ > > The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if > they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap > space to allocate such large arrays. > > Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to > allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened > verification of the array contents and used the exact same command line flags that Roland proposed > in his fix for JDK-8193518 [1]. > > I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more > reliable and reproduces the issues in every run. > > Best regards, > Tobias > > [1] > http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html > From tobias.hartmann at oracle.com Thu Aug 13 10:47:10 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 12:47:10 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: <3d9b4ae6-cee6-5203-87db-9b874403af1a@oracle.com> References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> <3d9b4ae6-cee6-5203-87db-9b874403af1a@oracle.com> Message-ID: <5b5be4fb-2cc4-7b96-1741-e8f5cfda3531@oracle.com> Thanks Christian! Best regards, Tobias On 13.08.20 12:46, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me. 
> > Best regards, > Christian > > On 12.08.20 10:57, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251456 >> http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ >> >> The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if >> they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap >> space to allocate such large arrays. >> >> Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to >> allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened >> verification of the array contents and used the exact same command line flags that Roland proposed >> in his fix for JDK-8193518 [1]. >> >> I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more >> reliable and reproduces the issues in every run. >> >> Best regards, >> Tobias >> >> [1] >> http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html >> >> From dmitry.chuyko at bell-sw.com Thu Aug 13 11:04:37 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Thu, 13 Aug 2020 14:04:37 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) Message-ID: Hello, Please review a faster version of Math.signum() for AArch64. Two new intrinsics (double and float) are introduced in general code, with appropriate new nodes. New JTreg test is added to cover the intrinsic case (enabled only for aarch64). AArch64 implementation uses FACGT (compare abslute fp values) and BSL (fp bit selection) to avoid branches and moves to non-fp registers and back. Performance results show ~30% better time in the benchmark with a black hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, overhead is 2.9 ns/op. rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ testing: jck, jtreg including new dedicated test -Dmitry [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java From vladimir.x.ivanov at oracle.com Thu Aug 13 11:32:50 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 13 Aug 2020 14:32:50 +0300 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: <6bf15413-3a6d-55cc-570d-c115a72397ec@oracle.com> References: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> <6bf15413-3a6d-55cc-570d-c115a72397ec@oracle.com> Message-ID: >> Though the fix itself looks sufficient, the code around is still not pretty... In particular, >> profile data goes through uint->jint->int->float(!) conversion which doesn't make any sense. >> >> It would be really nice to clean it up. > > Yes, I've noticed that as well but didn't want to clean it up with this patch because we need to > backport to 11u. I've filed JDK-8251513 [1] for the cleanup. Sounds good. Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Aug 13 11:51:59 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 13 Aug 2020 14:51:59 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: Hi Dmitry, Some comments on shared code changes: src/hotspot/share/opto/library_call.cpp: + case vmIntrinsics::_dsignum: + return UseSignumIntrinsic && (Matcher::match_rule_supported(Op_SignumD) ? 
inline_double_math(id) : false); There's no need in repeating UseSignumIntrinsic and (Matcher::match_rule_supported(Op_SignumD) checks. C2Compiler::is_intrinsic_supported() already covers taht. src/hotspot/share/opto/signum.hpp: 32 class SignumNode : public Node { 33 public: 34 SignumNode(Node* in) : Node(0, in) {} 35 virtual int Opcode() const; 36 virtual const Type *bottom_type() const { return NULL; } 37 virtual uint ideal_reg() const { return Op_RegD; } 38 }; Any particular reason to keep SignumNode? I don't see any and would just drop it. Also, having a dedicated header file just for a couple of nodes with trivial implementations looks like an overkill. As an alternative location, intrinsicnode.cpp should be a better option. Best regards, Vladimir Ivanov On 13.08.2020 14:04, Dmitry Chuyko wrote: > Hello, > > Please review a faster version of Math.signum() for AArch64. > > Two new intrinsics (double and float) are introduced in general code, > with appropriate new nodes. New JTreg test is added to cover the > intrinsic case (enabled only for aarch64). > > AArch64 implementation uses FACGT (compare abslute fp values) and BSL > (fp bit selection) to avoid branches and moves to non-fp registers and > back. > > Performance results show ~30% better time in the benchmark with a black > hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, > overhead is 2.9 ns/op. > > rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ > testing: jck, jtreg including new dedicated test > > -Dmitry > > [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java > From aph at redhat.com Thu Aug 13 13:07:38 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 13 Aug 2020 14:07:38 +0100 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> On 13/08/2020 12:04, Dmitry Chuyko wrote: > Performance results show ~30% better time in the benchmark with a black > hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, > overhead is 2.9 ns/op. > > rfe:https://bugs.openjdk.java.net/browse/JDK-8251525 > webrev:http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ > testing: jck, jtreg including new dedicated test Please show all of the JMH results. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitry.chuyko at bell-sw.com Thu Aug 13 13:50:01 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Thu, 13 Aug 2020 16:50:01 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> Message-ID: <790171f2-0499-21ed-899c-59fd788c34ba@bell-sw.com> Hi Andrew, On 8/13/20 4:07 PM, Andrew Haley wrote: > On 13/08/2020 12:04, Dmitry Chuyko wrote: >> Performance results show ~30% better time in the benchmark with a black >> hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, >> overhead is 2.9 ns/op. >> ...... > > Please show all of the JMH results. > Results for other sub-benchmarks are listed in the RFE, here is a copy: Baseline DoubleSignum.ofMostlyNaN 5.019 ? 0.060 ns/op DoubleSignum.ofMostlyNeg 4.919 ? 0.030 ns/op DoubleSignum.ofMostlyPos 4.827 ? 0.081 ns/op DoubleSignum.ofMostlyZero 4.936 ? 0.107 ns/op DoubleSignum.ofRandom 4.825 ? 0.026 ns/op DoubleSignum.overhead 2.846 ? 
0.027 ns/op Patch DoubleSignum.ofMostlyNaN 3.478 ? 0.368 ns/op DoubleSignum.ofMostlyNeg 3.509 ? 0.487 ns/op DoubleSignum.ofMostlyPos 3.513 ? 0.451 ns/op DoubleSignum.ofMostlyZero 3.494 ? 0.220 ns/op DoubleSignum.ofRandom 3.506 ? 0.343 ns/op DoubleSignum.overhead 2.848 ? 0.019 ns/op -Dmitry From stuart.monteith at arm.com Thu Aug 13 15:48:34 2020 From: stuart.monteith at arm.com (Stuart Monteith) Date: Thu, 13 Aug 2020 16:48:34 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <0586ead6-583d-1907-491e-64db6edf2106@redhat.com> References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> <0586ead6-583d-1907-491e-64db6edf2106@redhat.com> Message-ID: On 13/08/2020 11:00, Andrew Haley wrote: > On 12/08/2020 12:38, Stuart Monteith wrote: > > > The method "testDigest" generates an byte array of a given size, > > with each element filled with it's own index & 0xff. > > > > The test is then run once, assumed uncompiled, it is then "warmed > > up" and the first generated digest is compared against the digest > > presumably generated by the intrinsic. This is the same test for all > > of the message digest algorithms. > > > > I'd say the test is no worse than what has gone before. There are > > additional tests under the jdk library tests, but nothing that > > addresses the correctness of the MD5 algorithm implementation > > itself. > > Good grief. So there are no compliance tests in the test suite at all. Yes for any algorithm, for either the intrinsics or the Java implementations. > > > In terms of the status-quo, that patch looks ok to me. I think if > > the testing is to be expanded, it should be expanded to all of the > > message digest algorithms. > > That's not much more that an excuse for doing nothing, IMO. > My intention was to suggest that more than MD5 or even just the intrinsics need to be tested, it's not an excuse to ignore this. The existing tests are simply a comparison between generated message digest for a single message between the Java code and the intrinsics. The NIST samples cover SHA1 and MD5, but there are additional samples here: https://csrc.nist.gov/Projects/cryptographic-standards-and-guidelines/example-values . The message digests in Java under sun.security.provider are: MD2, MD4, MD5, SHA1 SHA2: SHA2-224, SHA2-256, SHA3: SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE256 SHA5: SHA-512/224, SHA-512/256, SHA-512, SHA-384, The intrinsics implemented are: aarch64: SHA1, SHA2, SHA5 (+MD5) ppc64: SHA2, SHA5 x86_64: SHA1, SHA2, SHA5, MD5 x86_32: SHA1, SHA2, MD5 The MD5 patches have been merged already for x86. SHA3 doesn't have any intrinsic implementations. MD2 has some example values in its RFC https://tools.ietf.org/html/rfc1319 Likewise, MD4 has example values in its RFC too: https://tools.ietf.org/html/rfc1320 My suggestion is to add new tests for each of the message digest algorithms and share them between the JTreg jdk and hotspot instrinsics. The MD5 intrinsics could be merged after some demonstration of correctness? I've CC'd core-libs-dev as this affects the jdk library. 
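As a rough sketch of what a shared known-answer test could look like (the vectors below are from the RFC 1321 test suite; the class name and structure are illustrative only, not an existing test in the tree):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class MD5KnownAnswerSketch {
    // Known-answer vectors from RFC 1321.
    private static final String[][] VECTORS = {
        { "",    "d41d8cd98f00b204e9800998ecf8427e" },
        { "abc", "900150983cd24fb0d6963f7d28e17f72" },
    };

    public static void main(String[] args) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // Loop so the digest code is eventually C2-compiled and the intrinsic
        // (when enabled with -XX:+UseMD5Intrinsics) is actually exercised.
        for (int i = 0; i < 20_000; i++) {
            for (String[] v : VECTORS) {
                byte[] digest = md.digest(v[0].getBytes(StandardCharsets.US_ASCII));
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b & 0xff));
                }
                if (!hex.toString().equals(v[1])) {
                    throw new AssertionError("MD5(\"" + v[0] + "\") = " + hex + ", expected " + v[1]);
                }
            }
        }
    }
}

The same table could then be extended with the SHA vectors from the NIST examples page above and reused for both the Java implementations and the intrinsic paths.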
BR, Stuart From nils.eliasson at oracle.com Thu Aug 13 15:59:05 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 13 Aug 2020 17:59:05 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> Message-ID: <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com> Hi again, On second thought - please add some basic testing (reuse any old test, or write a new) that covers the different cases. I think this table covers all combinations. There should exist tests for most of them that you can piggy back on. |+-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics | valid | invalid | +-------------------------------------------------+-------+----------------------------------+ | vmflag | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand= | ok | print error and continue | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd | ok | print error, vm continues to run | +-------------------------------------------------+-------+----------------------------------+| Regards, Nils On 2020-08-12 10:21, Nils Eliasson wrote: > Hi, > > Sorry for the delay. > > About the error handling: > > For CompilerDirectivesFile there are two scenarios: > 1) If a file containing bad contents is passed on the commandline - > the VM prints an descriptive error and refuses to start. > 2) If a file containing bad contents is passed through jcmd - the VM > prints and error on the jcmd stream and continues to run (ignoring the > command). > > This is achieved by letting the parser just register any parsing > error, and defer to the caller to decide how to handle the situation. > > Regards, > Nils Eliasson > > > On 2020-08-11 19:09, Liu, Xin wrote: >> Hi, Reviewers, >> >> May I gently ping this? >> >> I stuck because I don't know which error handling is appropriate. >> >> If we do nothing, current hotspot ignores wrong intrinsic Ids in the >> cmdline. >> This patch aborts hotspot when it detects any invalid intrinsic id. >> >> thanks, >> --lx >> >> >> ________________________________________ >> From: Liu, Xin >> Sent: Monday, August 3, 2020 11:39 PM >> To: Tobias Hartmann; Nils Eliasson; >> hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev >> Subject: Re: [EXTERNAL] RFR(S): 8247732: validate user-input >> intrinsic_ids in ControlIntrinsic >> >> hi, Nils, >> >> Tobias would like to keep the parser behavior consistency.? I think >> it means that the hotspot need to suppress the warning if the >> intrinsic_id doesn't exists in compiler directive. >> eg. -XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. >> >> What do you think about it? 
>> >> Here is the latest webrev: >> http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ >> >> thanks, >> --lx >> >> ________________________________________ >> From: Tobias Hartmann >> Sent: Friday, July 24, 2020 2:52 AM >> To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev >> Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input >> intrinsic_ids in ControlIntrinsic >> >> CAUTION: This email originated from outside of the organization. Do >> not click links or open attachments unless you can confirm the sender >> and know the content is safe. >> >> >> >> Hi Liu, >> >> On 23.07.20 18:02, Liu, Xin wrote: >>> That is my intention too, but CompilerOracle doesn't exit JVM when >>> it encounters parsing errors. >>> It just exacts information from CompileCommand as many as possible. >>> That makes sense because compiler "directives" are supposed to be >>> optional for program execution. >>> >>> I do put the error message in parser's errorbuf.? I set a flag >>> "exit_on_error" to quit JVM after it dumps parser errors. yes, I >>> treat undefined intrinsics as fatal errors. >>> This behavior is from Nils comment: "I want to see an error on >>> startup if the user has specified unknown intrinsic names." It is >>> also consistent with JVM option -XX:ControlIntrinsic=. >> Okay, thanks for the explanation! I would prefer consistency in error >> handling of compiler >> directives, i.e., handle all parser failures the same way. But I >> leave it to Nils to decide. >> >> Best regards, >> Tobias > From nils.eliasson at oracle.com Thu Aug 13 16:17:26 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 13 Aug 2020 18:17:26 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com> Message-ID: <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com> That table didn't come out right... +-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics?????????????????????????????? | valid | invalid????????????????????????? | +-------------------------------------------------+-------+----------------------------------+ | vmflag????????????????????????????????????????? | ok??? | print error and don't start????? | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand=???????????? | ok??? | print error and continue???????? | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok??? | print error and don't start????? | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd???????????????????? | ok??? 
| print error, VM continues to run | +-------------------------------------------------+-------+----------------------------------+ // Regards Nils On 2020-08-13 17:59, Nils Eliasson wrote: > > |+-------------------------------------------------+-------+----------------------------------+ > | ControlIntrinsics | valid | invalid | > +-------------------------------------------------+-------+----------------------------------+ > | vmflag | ok | print error and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerOracle: -XX:CompileCommand= | ok | print error and continue > | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error > and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives via jcmd | ok | print error, vm continues to run > | > +-------------------------------------------------+-------+----------------------------------+| From igor.ignatyev at oracle.com Thu Aug 13 16:46:25 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 13 Aug 2020 09:46:25 -0700 Subject: RFR(T) : 8251526 : CTW fails to build after JDK-8251121 Message-ID: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com> Hi all, could you please review this one-liner patch? 8251121 introduced a dependency b/w jdk/test/lib/util/CoreUtils and jtreg/SkippedException, b/c SkippedException wasn't on the source path, ctw build failed. the patch simply adds test/lib/jtreg/*.java to the source path. JBS: https://bugs.openjdk.java.net/browse/JDK-8251526 patch: > diff -r ce770ba672fe test/hotspot/jtreg/testlibrary/ctw/Makefile > --- a/test/hotspot/jtreg/testlibrary/ctw/Makefile Wed Aug 12 12:37:16 2020 -0400 > +++ b/test/hotspot/jtreg/testlibrary/ctw/Makefile Thu Aug 13 09:42:09 2020 -0700 > @@ -45,6 +45,7 @@ > LIB_FILES = $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \ > $(TESTLIBRARY_DIR)/jdk/test/lib/process \ > $(TESTLIBRARY_DIR)/jdk/test/lib/util \ > + $(TESTLIBRARY_DIR)/jtreg \ > -maxdepth 1 -name '*.java') > WB_SRC_FILES = $(shell find $(TESTLIBRARY_DIR)/sun/hotspot -name '*.java') > EXPORTS=--add-exports java.base/jdk.internal.jimage=ALL-UNNAMED \ testing: cd test/hotspot/jtreg/testlibrary/ctw && make Thanks, -- Igor From shade at redhat.com Thu Aug 13 16:55:10 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 13 Aug 2020 18:55:10 +0200 Subject: RFR(T) : 8251526 : CTW fails to build after JDK-8251121 In-Reply-To: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com> References: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com> Message-ID: On 8/13/20 6:46 PM, Igor Ignatyev wrote: >> diff -r ce770ba672fe test/hotspot/jtreg/testlibrary/ctw/Makefile >> --- a/test/hotspot/jtreg/testlibrary/ctw/Makefile Wed Aug 12 12:37:16 2020 -0400 >> +++ b/test/hotspot/jtreg/testlibrary/ctw/Makefile Thu Aug 13 09:42:09 2020 -0700 >> @@ -45,6 +45,7 @@ >> LIB_FILES = $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \ >> $(TESTLIBRARY_DIR)/jdk/test/lib/process \ >> $(TESTLIBRARY_DIR)/jdk/test/lib/util \ >> + $(TESTLIBRARY_DIR)/jtreg \ >> -maxdepth 1 -name '*.java') Looks good and trivial to me. 
--
Thanks,
-Aleksey

From igor.ignatyev at oracle.com Thu Aug 13 17:35:02 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 13 Aug 2020 10:35:02 -0700
Subject: RFR(T) : 8251526 : CTW fails to build after JDK-8251121
In-Reply-To:
References: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com>
Message-ID: <887ADD5E-55E5-491D-AF06-FBAC6FA9C4A0@oracle.com>

Thanks Aleksey, pushed.

-- Igor

> On Aug 13, 2020, at 9:55 AM, Aleksey Shipilev wrote:
>
> On 8/13/20 6:46 PM, Igor Ignatyev wrote:
>>> diff -r ce770ba672fe test/hotspot/jtreg/testlibrary/ctw/Makefile
>>> --- a/test/hotspot/jtreg/testlibrary/ctw/Makefile Wed Aug 12 12:37:16 2020 -0400
>>> +++ b/test/hotspot/jtreg/testlibrary/ctw/Makefile Thu Aug 13 09:42:09 2020 -0700
>>> @@ -45,6 +45,7 @@
>>> LIB_FILES = $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \
>>> $(TESTLIBRARY_DIR)/jdk/test/lib/process \
>>> $(TESTLIBRARY_DIR)/jdk/test/lib/util \
>>> + $(TESTLIBRARY_DIR)/jtreg \
>>> -maxdepth 1 -name '*.java')
>
> Looks good and trivial to me.
>
> --
> Thanks,
> -Aleksey
>

From xxinliu at amazon.com Thu Aug 13 18:37:31 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 13 Aug 2020 18:37:31 +0000
Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic
In-Reply-To: <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com>
References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com>, <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com>
Message-ID: <1597343851213.53343@amazon.com>

hi, Nils,

Thank you for elaborating the answer with a table. I didn't know there were as many as 4 approaches to affecting compilation behavior until I saw this table! I got it. I will work on tests and make sure my next patch conforms to this spec.

thanks,
--lx

________________________________________
From: hotspot-compiler-dev on behalf of Nils Eliasson
Sent: Thursday, August 13, 2020 9:17 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

That table didn't come out right...
+-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics | valid | invalid | +-------------------------------------------------+-------+----------------------------------+ | vmflag | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand= | ok | print error and continue | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd | ok | print error, VM continues to run | +-------------------------------------------------+-------+----------------------------------+ // Regards Nils On 2020-08-13 17:59, Nils Eliasson wrote: > > |+-------------------------------------------------+-------+----------------------------------+ > | ControlIntrinsics | valid | invalid | > +-------------------------------------------------+-------+----------------------------------+ > | vmflag | ok | print error and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerOracle: -XX:CompileCommand= | ok | print error and continue > | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error > and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives via jcmd | ok | print error, vm continues to run > | > +-------------------------------------------------+-------+----------------------------------+| From hohensee at amazon.com Thu Aug 13 21:51:39 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Thu, 13 Aug 2020 21:51:39 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers Message-ID: Shouldn't all the uint type uses that represent node indices actually be node_idx_t? Thanks, Paul ?On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: Hi Eric, there are other places where Node::_idx is casted to int (and a potential overflow might happen). For example, calls to Compile::node_notes_at. The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) checking that _idx is always <= MAX_INT. Best regards, Tobias On 12.08.20 00:41, Eric, Chan wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > I have tested this builds successfully . > > Ensured that there are no regressions in hotspot : tier1 tests. 
> > Regards, > Eric Chen > From vladimir.kozlov at oracle.com Thu Aug 13 22:58:44 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Aug 2020 15:58:44 -0700 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: References: Message-ID: <7225b0a5-e685-89aa-1bc5-4ff162774fe5@oracle.com> Yes, it is sloppy :( Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be valid value - InstanceTop. And I agree that we should use node_idx_t everywhere. For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and uint when referencing them. Warning: it is not small change. Regards, Vladimir On 8/13/20 2:51 PM, Hohensee, Paul wrote: > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > Thanks, > Paul > > ?On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > Hi Eric, > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > For example, calls to Compile::node_notes_at. > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > checking that _idx is always <= MAX_INT. > > Best regards, > Tobias > > On 12.08.20 00:41, Eric, Chan wrote: > > Hi, > > > > Requesting review for > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > I have tested this builds successfully . > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > Regards, > > Eric Chen > > > From jptatton at amazon.com Thu Aug 13 23:02:34 2020 From: jptatton at amazon.com (Tatton, Jason) Date: Thu, 13 Aug 2020 23:02:34 +0000 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm Message-ID: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> Hi Everyone, I'm Jason. I recently joined Amazon on the team supporting OpenJDK. I am new to the OpenJDK project and would like to contribute some starter bug fixes/enhancements/cleanups. I am working with my sponsor, Paul Hohensee. I have a cleanup to submit. Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8180068 http://cr.openjdk.java.net/~phh/8180068/webrev.00/ The code change is very straightforward, simply a substitution of '0' with 'oopDesc::mark_offset_in_bytes()' in the relevant 6 locations. For testing I have run; 'run-test-tier1' and 'run-test-tier2' for: x86_64 and aarch64. 
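Independent of whether the six call sites in the 8180068 webrev really refer to the object header, the general argument for a named offset accessor over a literal 0 is about stating intent. The standalone sketch below uses an invented MockOop type, which does not reproduce HotSpot's real oopDesc layout; it only shows the pattern of forming an address from a named offset rather than a bare constant.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Invented mock of an object header layout, for illustration only.
    struct MockOop {
        uintptr_t mark;    // header word placed at offset 0
        void*     klass;
        static int mark_offset_in_bytes() { return offsetof(MockOop, mark); }
    };

    // Reading through the named accessor documents *why* the offset is 0;
    // a bare "base + 0" compiles to the same code but says nothing.
    static uintptr_t load_mark(const MockOop* obj) {
        const char* base = reinterpret_cast<const char*>(obj);
        return *reinterpret_cast<const uintptr_t*>(base + MockOop::mark_offset_in_bytes());
    }

    int main() {
        MockOop o{0x5, nullptr};
        std::printf("mark = 0x%lx\n", (unsigned long)load_mark(&o));
        return 0;
    }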
Regards, -- Jason Taton From nick.gasson at arm.com Fri Aug 14 02:26:11 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 14 Aug 2020 10:26:11 +0800 Subject: RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop In-Reply-To: <10b37c70-c522-7f65-3c7e-bbeeaf7e1c3d@redhat.com> References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> <10b37c70-c522-7f65-3c7e-bbeeaf7e1c3d@redhat.com> Message-ID: <85k0y2q7y4.fsf@nicgas01-pc.shanghai.arm.com> On 08/13/20 16:06 pm, Andrew Dinn wrote: > Hi Nick, > > On 07/08/2020 10:04, Nick Gasson wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8247354 >> Webrev: http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/ > Nice detective work. The patch looks ok to me. > Thanks for the review Andrew. I've pushed it. -- Nick From shade at redhat.com Fri Aug 14 07:22:26 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 14 Aug 2020 09:22:26 +0200 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> Message-ID: On 8/14/20 1:02 AM, Tatton, Jason wrote: > I have a cleanup to submit. Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8180068 > http://cr.openjdk.java.net/~phh/8180068/webrev.00/ > > The code change is very straightforward, simply a substitution of '0' with 'oopDesc::mark_offset_in_bytes()' in the relevant 6 locations. No, wait. None of these look relevant: *) The uses in load_heap_oop are the _load addresses_. They are naturally just *(obj + 0). This is not loading the mark word. *) The uses in try_resolve_jobject is decoding the JNI handle, "0" is valid there. This is not loading the mark word. See the native implementation in JNIHandles::resolve_impl that ends up loading off the dereferenced handle via: inline oop* JNIHandles::jobject_ptr(jobject handle) { assert(!is_jweak(handle), "precondition"); return reinterpret_cast(handle); } -- Thanks, -Aleksey From sergei.tsypanov at yandex.ru Fri Aug 14 07:43:59 2020 From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=) Date: Fri, 14 Aug 2020 09:43:59 +0200 Subject: JIT optimization broke mapping between compiled code and byte-code instructions on JDK 14 / 15EAP Message-ID: <894941597390093@mail.yandex.ru> Hello, while investigating an issue related to instantiation of Spring's `org.springframework.util.ConcurrentReferenceHashMap` (as of `spring-core-5.1.3.RELEASE`) I've used `LinuxPerfAsmProfiler` shipped along with JMH to profile generated assembly. I simply run this @Benchmark public Object measureInit() { return new ConcurrentReferenceHashMap<>(); } Benchmarking on JDK 8 allows to identify one of hot spots (full assembly layout can be found in [1]): 0.61% 0x00007f32d92772ea: lock addl $0x0,(%rsp) ;*putfield count ; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@11 (line 476) ; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184) 15.81% 0x00007f32d92772ef: mov 0x60(%r15),%rdx This corresponds unnecessary assignment of default value to a volatile field: protected final class Segment extends ReentrantLock { private volatile int count = 0; } Then I run the same benchmark on JDK 14 and again use `LinuxPerfAsmProfiler`, but now I don't have any explicit pointing to `volatile int count = 0` in captured assembly [2]. 
Looking for `lock addl $0x0` instuction which is assignment of `0` under `lock` prefix I have found this: 0.08% ? 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp) 23.74% ? 0x00007f3717d4618d: mov 0x120(%r15),%rbx which is likely to correspond `volatile int count = 0` because it follows the construction of `Segment`'s superclass `ReentrantLock`: 0.77% ? 0x00007f3717d46140: movq $0x0,0x18(%rax) ;*new {reexecute=0 rethrow=0 return_oop=0} ? ; - java.util.concurrent.locks.ReentrantLock::<init>@5 (line 294) ? ; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@6 (line 484) ? ; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184) 0.06% ? 0x00007f3717d46148: mov %r8,%rcx 0.05% ? 0x00007f3717d4614b: mov %rax,%rbx 0.03% ? 0x00007f3717d4614e: shr $0x3,%rbx 0.74% ? 0x00007f3717d46152: mov %ebx,0xc(%r8) 0.06% ? 0x00007f3717d46156: mov %rax,%rbx 0.05% ? 0x00007f3717d46159: xor %rcx,%rbx 0.02% ? 0x00007f3717d4615c: shr $0x14,%rbx 0.72% ? 0x00007f3717d46160: test %rbx,%rbx ? ? 0x00007f3717d46163: je 0x00007f3717d4617f ? ? 0x00007f3717d46165: shr $0x9,%rcx ? ? 0x00007f3717d46169: movabs $0x7f370a872000,%rdi ? ? 0x00007f3717d46173: add %rcx,%rdi ? ? 0x00007f3717d46176: cmpb $0x8,(%rdi) 0.00% ? ? 0x00007f3717d46179: jne 0x00007f3717d46509 0.04% ? ? 0x00007f3717d4617f: movl $0x0,0x14(%r8) 0.08% ? 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp) 23.74% ? 0x00007f3717d4618d: mov 0x120(%r15),%rbx The problem is that I don't have any mention of `putfield count` in generated assembly at all. I've asked the question on StackOverflow [4] and Andrey Pangin suggests in his comment [5] that this might be due to broken mapping between compiled code and byte-code causing miss of debug info in the output of -XX:+PrintAssembly P.S. The issue is reproducible on JDK 15 built locally, see [3] [1]: https://gist.github.com/stsypanov/ff5678987c6f95a2aaf292fa2a3b92a8 [2]: https://gist.github.com/stsypanov/2e4bd73c39d7465cbdd75ba26d4bc217 [3]: https://gist.github.com/stsypanov/30fc0f688e6d37612ca017b59ab3e631 [4]: https://stackoverflow.com/questions/63397711/linuxperfasmprofiler-shows-java-code-corresponding-assembly-hot-spot-for-java-8 [5]: https://stackoverflow.com/questions/63397711/linuxperfasmprofiler-shows-java-code-corresponding-assembly-hot-spot-for-java-8#comment112109002_63397711 From aph at redhat.com Fri Aug 14 08:24:45 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 14 Aug 2020 09:24:45 +0100 Subject: JIT optimization broke mapping between compiled code and byte-code instructions on JDK 14 / 15EAP In-Reply-To: <894941597390093@mail.yandex.ru> References: <894941597390093@mail.yandex.ru> Message-ID: <19bfa8d8-a9bc-69eb-1f66-af68ab537502@redhat.com> On 14/08/2020 08:43, ?????? ??????? wrote: > The problem is that I don't have any mention of `putfield count` in > generated assembly at all. It's here: 0.04% ? ? 0x00007f3717d4617f: movl $0x0,0x14(%r8) 0.08% ? 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp) There's never been any guarantee that debuginfo will be complete after transformations. Optimization rewrites things to such an extent that it's not really possible anyway: operations are reorganized and combined in such a way that the relationship between incoming bytecode and generated code is not 1:1. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Fri Aug 14 12:10:50 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 14 Aug 2020 14:10:50 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support Message-ID: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> Hi Please review the following enhancement for C1: https://bugs.openjdk.java.net/browse/JDK-8251093 http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure out what was going on. I think it would be useful to have this code around for the analysis of future C1 register allocator bugs. This RFE adds (everything non-product code): - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and parent. - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be manually looked up in other logs. I additionally did some cleanup of the touched code. We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of the register allocation algorithm. It currently just prints too much details on the higher levels. You often find yourself being interested in a specific part of the algorithm and only want to know more details there. To achieve that you now you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to clean this up if it's worth the effort - given that there are not many new issues filed for C1 register allocation today. Thank you! Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8251093 From hohensee at amazon.com Fri Aug 14 16:05:56 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Fri, 14 Aug 2020 16:05:56 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers Message-ID: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> Hi, Vladimir, What do you think of the following? 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. 3. New issue: Change from uint to node_idx_t. Thanks, Paul ?On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: Yes, it is sloppy :( Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be valid value - InstanceTop. And I agree that we should use node_idx_t everywhere. For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. 
Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and uint when referencing them. Warning: it is not small change. Regards, Vladimir On 8/13/20 2:51 PM, Hohensee, Paul wrote: > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > Thanks, > Paul > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > Hi Eric, > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > For example, calls to Compile::node_notes_at. > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > checking that _idx is always <= MAX_INT. > > Best regards, > Tobias > > On 12.08.20 00:41, Eric, Chan wrote: > > Hi, > > > > Requesting review for > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > I have tested this builds successfully . > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > Regards, > > Eric Chen > > > From dmitry.chuyko at bell-sw.com Fri Aug 14 17:14:54 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Fri, 14 Aug 2020 20:14:54 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: <220c8f3d-3443-4c4d-bf42-078bec651335@bell-sw.com> Hi Vladimir, Thank you for the comments. Here is a version with simplified node definitions: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.01/ -Dmitry On 8/13/20 2:51 PM, Vladimir Ivanov wrote: > Hi Dmitry, > > Some comments on shared code changes: > > src/hotspot/share/opto/library_call.cpp: > > +? case vmIntrinsics::_dsignum: > +??? return UseSignumIntrinsic && > (Matcher::match_rule_supported(Op_SignumD) ? inline_double_math(id) : > false); > > There's no need in repeating UseSignumIntrinsic and > (Matcher::match_rule_supported(Op_SignumD) checks. > C2Compiler::is_intrinsic_supported() already covers taht. > > > src/hotspot/share/opto/signum.hpp: > > ? 32 class SignumNode : public Node { > ? 33 public: > ? 34?? SignumNode(Node* in) : Node(0, in) {} > ? 35?? virtual int Opcode() const; > ? 36?? virtual const Type *bottom_type() const { return NULL; } > ? 37?? virtual uint ideal_reg() const { return Op_RegD; } > ? 38 }; > > Any particular reason to keep SignumNode? I don't see any and would > just drop it. > > Also, having a dedicated header file just for a couple of nodes with > trivial implementations looks like an overkill. As an alternative > location, intrinsicnode.cpp should be a better option. > > Best regards, > Vladimir Ivanov > > On 13.08.2020 14:04, Dmitry Chuyko wrote: >> Hello, >> >> Please review a faster version of Math.signum() for AArch64. >> >> Two new intrinsics (double and float) are introduced in general code, >> with appropriate new nodes. New JTreg test is added to cover the >> intrinsic case (enabled only for aarch64). >> >> AArch64 implementation uses FACGT (compare abslute fp values) and BSL >> (fp bit selection) to avoid branches and moves to non-fp registers >> and back. >> >> Performance results show ~30% better time in the benchmark with a >> black hole [1] on Cortex. E.g. 
on random numbers 4.8 ns/op --> 3.5 >> ns/op, overhead is 2.9 ns/op. >> >> rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 >> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ >> testing: jck, jtreg including new dedicated test >> >> -Dmitry >> >> [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java >> From vladimir.kozlov at oracle.com Fri Aug 14 18:03:51 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Aug 2020 11:03:51 -0700 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> References: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> Message-ID: On 8/14/20 9:05 AM, Hohensee, Paul wrote: > Hi, Vladimir, > > What do you think of the following? > > 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). I see only this change: - const TypeOopPtr* tinst = t->cast_to_instance_id(ni); + assert(ni<=INT_MAX,"node index cannot be negative"); + const TypeOopPtr* tinst = t->cast_to_instance_id((int)ni); I would like to see first what you are suggesting. > 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. > 3. New issue: Change from uint to node_idx_t. Yes, it is fine to split these 2. Regards, Vladimir > > Thanks, > Paul > > ?On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > Yes, it is sloppy :( > > Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. > > For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. > We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see > TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be > valid value - InstanceTop. > > And I agree that we should use node_idx_t everywhere. > > For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. > > Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and > uint when referencing them. > > Warning: it is not small change. > > Regards, > Vladimir > > On 8/13/20 2:51 PM, Hohensee, Paul wrote: > > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > > > Thanks, > > Paul > > > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > > > Hi Eric, > > > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > > For example, calls to Compile::node_notes_at. > > > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > > checking that _idx is always <= MAX_INT. 
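To make the InstanceTop/InstanceBot proposal in this thread concrete, here is a standalone sketch of unsigned instance ids with the two special ids encoded at the extremes of the range, and a meet that never does arithmetic on them. The constants and the simplified meet rules are illustrative assumptions, not HotSpot's current TypeOopPtr implementation.

    #include <cstdio>

    // Illustrative unsigned scheme: instance ids become plain unsigned
    // values, with the two sentinels at the ends of the range.
    typedef unsigned int instance_id_t;

    const instance_id_t InstanceBot = 0;    // "unknown/any instance"
    const instance_id_t InstanceTop = ~0u;  // "no instance yet"

    // Meet on this small lattice: TOP is the identity element, two
    // different concrete ids fall to BOTTOM. No arithmetic is ever done
    // on ids, so using the unsigned extremes as sentinels is safe.
    static instance_id_t meet_instance_id(instance_id_t a, instance_id_t b) {
        if (a == InstanceTop) return b;
        if (b == InstanceTop) return a;
        return (a == b) ? a : InstanceBot;
    }

    int main() {
        std::printf("%u\n", meet_instance_id(InstanceTop, 42u));  // 42
        std::printf("%u\n", meet_instance_id(42u, 42u));          // 42
        std::printf("%u\n", meet_instance_id(42u, 43u));          // 0 (InstanceBot)
        return 0;
    }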
> > > > Best regards, > > Tobias > > > > On 12.08.20 00:41, Eric, Chan wrote: > > > Hi, > > > > > > Requesting review for > > > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > > > I have tested this builds successfully . > > > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > > > Regards, > > > Eric Chen > > > > > > From vladimir.kozlov at oracle.com Fri Aug 14 18:09:49 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Aug 2020 11:09:49 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> Message-ID: <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> One note. Most of the code is guarded by #ifndef PRODUCT. But the flag is available only in DEBUG build: develop(intx, TraceLinearScanLevel, 0, Should we use #ifdef ASSERT and DEBUG() instead? Thanks, Vladimir On 8/14/20 5:10 AM, Christian Hagedorn wrote: > Hi > > Please review the following enhancement for C1: > https://bugs.openjdk.java.net/browse/JDK-8251093 > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ > > While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure out > what was going on. I think it would be useful to have this code around for the analysis of future C1 register allocator > bugs. > > This RFE adds (everything non-product code): > - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. > - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and parent. > - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful in > some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be manually looked > up in other logs. > > I additionally did some cleanup of the touched code. > > We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of the > register allocation algorithm. It currently just prints too much details on the higher levels. You often find yourself > being interested in a specific part of the algorithm and only want to know more details there. To achieve that you now > you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to clean this up if > it's worth the effort - given that there are not many new issues filed for C1 register allocation today. > > Thank you! > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8251093 > From vladimir.x.ivanov at oracle.com Fri Aug 14 18:53:04 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 14 Aug 2020 21:53:04 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <220c8f3d-3443-4c4d-bf42-078bec651335@bell-sw.com> References: <220c8f3d-3443-4c4d-bf42-078bec651335@bell-sw.com> Message-ID: <3d8f2ce2-d106-6cb6-ffb8-861905d8f49e@oracle.com> > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.01/ Changes in shared code look good. 
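As background to the Math.signum review in this thread: the FACGT/BSL selection described in Dmitry's patch can be rendered in portable C++ as a bit-select between x itself and a 1.0 carrying x's sign bit, which preserves +/-0.0 and NaN without branches. This is only a sketch of the idea under the assumption of C++20 (for std::bit_cast); it is not the actual matcher rules or the JDK library code.

    #include <bit>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    // Branch-free signum: if |x| > 0, take 1.0 with x's sign bit,
    // otherwise keep x itself (preserves +/-0.0 and NaN).
    static double signum(double x) {
        uint64_t xb   = std::bit_cast<uint64_t>(x);
        uint64_t oneb = std::bit_cast<uint64_t>(std::copysign(1.0, x));
        // Comparisons with NaN are false, so NaN falls into the "keep x" case.
        uint64_t mask = (std::fabs(x) > 0.0) ? ~0ULL : 0ULL;
        return std::bit_cast<double>((oneb & mask) | (xb & ~mask));
    }

    int main() {
        const double tests[] = {3.5, -0.25, 0.0, -0.0, NAN, INFINITY, -INFINITY};
        for (double t : tests) {
            std::printf("signum(% g) = % g\n", t, signum(t));
        }
        return 0;
    }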
Best regards, Vladimir Ivanov > On 8/13/20 2:51 PM, Vladimir Ivanov wrote: >> Hi Dmitry, >> >> Some comments on shared code changes: >> >> src/hotspot/share/opto/library_call.cpp: >> >> +? case vmIntrinsics::_dsignum: >> +??? return UseSignumIntrinsic && >> (Matcher::match_rule_supported(Op_SignumD) ? inline_double_math(id) : >> false); >> >> There's no need in repeating UseSignumIntrinsic and >> (Matcher::match_rule_supported(Op_SignumD) checks. >> C2Compiler::is_intrinsic_supported() already covers taht. >> >> >> src/hotspot/share/opto/signum.hpp: >> >> ? 32 class SignumNode : public Node { >> ? 33 public: >> ? 34?? SignumNode(Node* in) : Node(0, in) {} >> ? 35?? virtual int Opcode() const; >> ? 36?? virtual const Type *bottom_type() const { return NULL; } >> ? 37?? virtual uint ideal_reg() const { return Op_RegD; } >> ? 38 }; >> >> Any particular reason to keep SignumNode? I don't see any and would >> just drop it. >> >> Also, having a dedicated header file just for a couple of nodes with >> trivial implementations looks like an overkill. As an alternative >> location, intrinsicnode.cpp should be a better option. >> >> Best regards, >> Vladimir Ivanov >> >> On 13.08.2020 14:04, Dmitry Chuyko wrote: >>> Hello, >>> >>> Please review a faster version of Math.signum() for AArch64. >>> >>> Two new intrinsics (double and float) are introduced in general code, >>> with appropriate new nodes. New JTreg test is added to cover the >>> intrinsic case (enabled only for aarch64). >>> >>> AArch64 implementation uses FACGT (compare abslute fp values) and BSL >>> (fp bit selection) to avoid branches and moves to non-fp registers >>> and back. >>> >>> Performance results show ~30% better time in the benchmark with a >>> black hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 >>> ns/op, overhead is 2.9 ns/op. >>> >>> rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 >>> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ >>> testing: jck, jtreg including new dedicated test >>> >>> -Dmitry >>> >>> [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java >>> From jptatton at amazon.com Fri Aug 14 19:26:15 2020 From: jptatton at amazon.com (Tatton, Jason) Date: Fri, 14 Aug 2020 19:26:15 +0000 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> Message-ID: <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> Hi Aleksey, Thanks for having a look into this. I was mistaken in what these calls were doing, thank you for explaining this. I'm not able to find any other potential instances where 'oopDesc::mark_offset_in_bytes()' should be used. The bug is a few years old, so perhaps the codebase has naturally evolved in the intervening time to resolve this? Unless anyone can advise on other instances which I should change, I'd advise closing the bug? -Jason -----Original Message----- From: Aleksey Shipilev Sent: 14 August 2020 08:22 To: Tatton, Jason ; hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm On 8/14/20 1:02 AM, Tatton, Jason wrote: > I have a cleanup to submit. 
Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8180068 > http://cr.openjdk.java.net/~phh/8180068/webrev.00/ > > The code change is very straightforward, simply a substitution of '0' with 'oopDesc::mark_offset_in_bytes()' in the relevant 6 locations. No, wait. None of these look relevant: *) The uses in load_heap_oop are the _load addresses_. They are naturally just *(obj + 0). This is not loading the mark word. *) The uses in try_resolve_jobject is decoding the JNI handle, "0" is valid there. This is not loading the mark word. See the native implementation in JNIHandles::resolve_impl that ends up loading off the dereferenced handle via: inline oop* JNIHandles::jobject_ptr(jobject handle) { assert(!is_jweak(handle), "precondition"); return reinterpret_cast(handle); } -- Thanks, -Aleksey From hohensee at amazon.com Fri Aug 14 20:54:55 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Fri, 14 Aug 2020 20:54:55 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: References: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> Message-ID: <587AF7B9-5EE9-4F93-A587-9B3277E9183D@amazon.com> By "e.g.", I meant "ones like the one in the webrev". Tobais is correct that there are more. I grep'ed for "(int idx", ", int idx", "(int idx)", and so on, and found a bunch (not all of them are node_idx_t, but many of those that aren't should probably be uint too). So those would be fixed first. Thanks, Paul ?On 8/14/20, 11:04 AM, "Vladimir Kozlov" wrote: On 8/14/20 9:05 AM, Hohensee, Paul wrote: > Hi, Vladimir, > > What do you think of the following? > > 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). I see only this change: - const TypeOopPtr* tinst = t->cast_to_instance_id(ni); + assert(ni<=INT_MAX,"node index cannot be negative"); + const TypeOopPtr* tinst = t->cast_to_instance_id((int)ni); I would like to see first what you are suggesting. > 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. > 3. New issue: Change from uint to node_idx_t. Yes, it is fine to split these 2. Regards, Vladimir > > Thanks, > Paul > > On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > Yes, it is sloppy :( > > Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. > > For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. > We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see > TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be > valid value - InstanceTop. > > And I agree that we should use node_idx_t everywhere. > > For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. > > Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and > uint when referencing them. > > Warning: it is not small change. > > Regards, > Vladimir > > On 8/13/20 2:51 PM, Hohensee, Paul wrote: > > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? 
> > > > Thanks, > > Paul > > > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > > > Hi Eric, > > > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > > For example, calls to Compile::node_notes_at. > > > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > > checking that _idx is always <= MAX_INT. > > > > Best regards, > > Tobias > > > > On 12.08.20 00:41, Eric, Chan wrote: > > > Hi, > > > > > > Requesting review for > > > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > > > I have tested this builds successfully . > > > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > > > Regards, > > > Eric Chen > > > > > > From OGATAK at jp.ibm.com Fri Aug 14 21:04:58 2020 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Sat, 15 Aug 2020 06:04:58 +0900 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler In-Reply-To: References: Message-ID: Hi Tobias, Thank you for checking the webrev and pointing out InstallMethods option. I now realize that I failed to notice this option can be turned off. I remember I checked its default value is true, but I wasn't aware that it's a command line option... Regarding the change in javaCalls.cpp, I made this change when I was debugging my changes to support new instructions. I also made another change to make my code work. I guess I should have revisit the change in javaCalls.cpp when my code became workable. This change must not be necessary because my version of JVM works fine by only disabling InstallMethods. Anyway, I agree this RFR is unnecessary. Sorry for bothering you. Regards, Ogata Tobias Hartmann wrote on 2020/08/13 16:00:20: > From: Tobias Hartmann > To: Kazunori Ogata , hotspot-compiler-dev at openjdk.java.net > Date: 2020/08/13 16:02 > Subject: [EXTERNAL] Re: RFR: JDK-8251470: Add a development option > equivalant to OptoNoExecute to C1 compiler > > Hi Ogata, > > isn't that what -XX:-InstallMethods [1] is supposed to accomplish? It > triggers a bailout right > before Compilation::install_code, which is the same with your code. > > Also, why do you need the change in javaCalls.cpp? That would also affect > C2 compiled code. > > Best regards, > Tobias > > [1] INVALID URI REMOVED > u=http-3A__hg.openjdk.java.net_jdk_jdk_file_a7c030723240_src_hotspot_share_c1_c1-5Fglobals.hpp-23l292&d=DwICaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- > isw&s=TWFHKEFoj6wwSylXbeLhsD7-tv5nCR50A6-iptKDC00&e= > > > On 12.08.20 09:48, Kazunori Ogata wrote: > > Hi, > > > > May I get review for JDK-8251470: Add a development option equivalant to > > OptoNoExecute to C1 compiler? > > > > This patch adds a development option to compile a method with C1 and print > > disassembly of the generated native code, but to skip execution of the > > generated code, in the same manner as OptoNoExecute option does in C2. > > > > Log-based debugging is useful to support a new processor. 
In C1, the > > existing options BailoutAfterHIR and BailoutAfterLIR can be used if > > printing HIR/LIR is sufficient. However, there is no way to print > > disassembly of the generated code because these existing options quit > > compilation before generating native code. So this issue proposes a new > > option for this purpose. > > > > > > Bug: INVALID URI REMOVED > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8251470&d=DwICaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- > isw&s=JN0Zd_7HcvX3tVM-KdN-Q4hpX7Um5_muAy0Ma5sFWAI&e= > > Webrev: INVALID URI REMOVED > u=http-3A__cr.openjdk.java.net_-7Eogatak_8251470_webrev. > 00_&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- > isw&s=7Z4peF7vXvN0QRxSZALXIU3C91WHZWhS5pWyvRA4XlA&e= > > > > > > Regards, > > Ogata > > > From aph at redhat.com Sat Aug 15 13:50:42 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 15 Aug 2020 14:50:42 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> Message-ID: <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> I've been looking at the way Math.signum() is used, mostly by searching the GitHub code database. I've changed the JMH test to be IMO more realistic: it's at http://cr.openjdk.java.net/~aph/DoubleSignum.java. I think it's more realitic because signum() results usually aren't stored but are used to feed other arithmetic ops, usually + or *. Baseline: Benchmark Mode Cnt Score Error Units DoubleSignum.ofMostlyNaN avgt 3 2.409 ? 0.051 ns/op DoubleSignum.ofMostlyNeg avgt 3 2.475 ? 0.211 ns/op DoubleSignum.ofMostlyPos avgt 3 2.494 ? 0.015 ns/op DoubleSignum.ofMostlyZero avgt 3 2.501 ? 0.008 ns/op DoubleSignum.ofRandom avgt 3 2.458 ? 0.373 ns/op DoubleSignum.overhead avgt 3 2.373 ? 0.029 ns/op -XX:+UseSignumIntrinsic: Benchmark Mode Cnt Score Error Units DoubleSignum.ofMostlyNaN avgt 3 2.776 ? 0.006 ns/op DoubleSignum.ofMostlyNeg avgt 3 2.773 ? 0.066 ns/op DoubleSignum.ofMostlyPos avgt 3 2.772 ? 0.084 ns/op DoubleSignum.ofMostlyZero avgt 3 2.770 ? 0.045 ns/op DoubleSignum.ofRandom avgt 3 2.769 ? 0.005 ns/op DoubleSignum.overhead avgt 3 2.376 ? 0.013 ns/op I think it might be more useful for you to work on optimizing Math.copysign(). -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ningsheng.jian at arm.com Mon Aug 17 06:00:13 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 17 Aug 2020 14:00:13 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> Hi Andrew, Thanks a lot for the review! Sorry for the late reply, as I was on vacation last week. And thanks to Pengfei and Joshua for helping clarifying some details in the patch. > > Testing: > > I was able to test this patch on a loaned Fujitsu FX700. I replicated > your results, passing tier1 tests and the jtreg compiler tests in > vectorization, codegen, c2/cr6340864 and loopopts. > Thanks for the testing. 
> I also eyeballed /some/ of the generated code to check that it looked > ok. I'd really like to be able to do that systematically for a > comprehensive test suite that exercised every rule but I only had the > machine for a few days. This really ought to be done as a follow-up to > ensure that all the rules are working as expected. > > Yes, we would expect Pengfei's OptoAssembly check patch can get merged in future. > > General Comments: > > Sizing the NEON registers using 8 slots -- even though there might > actually be more (or less!) slots in use for a VecA is fine. However, I > think this needs a little bit more explanation in the .ad. file (see > comments on ra webrev below) > OK, I will try to have some more clear comments in ad file. > I'm ok with your choice to use p7 as an always true predicate register > and also how you choose to init and re-init from code defined via the ad > file based on C->max_vector_size(). > > I am not clear why you are choosing to re-init ptrue after certain JVM > runtime calls (e.g. when Z calls into the runtime) and not others e.g. > when we call a JVM_ENTRY. Could you explain the rationale you have > followed here? > > We do the re-init at any possible return points to c2 code, not in any runtime c++ functions, which will reduce the re-init calls. Actually I found those entries by some hack of jvm. In the hacky code below we use gcc option -finstrument-functions to build hotspot. With this option, each C/C++ function entry/exit will call the instrument functions we defined. In instrument functions, we clobber p7 (or other reg for test) register, and in c2 function return we verify that p7 (or other reg) has been reinitialized. http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch > > Specific Comments (feature webrev): > > > globals_aarch64.hpp:102 > > Just out of interest why does UseSVE have range(0,2)? It seems you are > only testing for UseSVE > 0. Does value 2 correspond to an optional subset? > > Thanks to Pengfei's reply for this. :-) > > Specific Comments (register allocator webrev): > > > aarch64.ad:97-100 > > Why have you added a reg_def for R8 and R9 here and also to alloc_class > chunk0 at lines 544-545? They aren't used by C2 so why define them? > I think Pengfei has helped to explain that. I will either add clear comments or rename the register name as you suggested. > > assembler_aarch64.hpp:280 (also 699) > > prf sets a predicate register field. pgrf sets a governing predicate > register field. Should the name not be gprf. > Thanks to Pengfei's comment. > > chaitin.cpp:648-660 > > The comment is rather oddly formatted. > Thanks! > At line 650 you guard the assert with a test for lrg._is_vector. Is that > not always going to be guaranteed by the outer condition > lrg._is_scalable? If so then you should really assert lrg._is_vector. > > The special case code for computation of num_regs for a vector stack > slot also appears in this file with a slightly different organization in > find_first_set (line 1350) and in PhaseChaitin::Select (line 1590). > There is another similar case in RegMask::num_registers at regmask.cpp: > 98. It would be better to factor out the common code into methods of > LRG. Maybe using the following? 
> > bool LRG::is_scalable_vector() { > if (_is_scalable) { > assert(_is_vector == 1); > assert(_num_regs == == RegMask::SlotsPerVecA) > return true; > } > return false; > } > > int LRG::scalable_num_regs() { > assert(is_scalable_vector()); > if (OptoReg::is_stack(_reg)) { > return _scalable_reg_slots > } else { > return num_reg_slots; > } > } > > > chaitin.cpp:1350 > > Once again the test for lrg._is_vector should be guaranteed by the outer > test of lrg._is_scalable. Refactoring using the common methods of LRG as > above ought to help. > > chaitin.cpp:1591 > > Use common method code. > > > postaloc.cpp:308/323 > > Once again you should be able to use common method code of LRG here. > > > regmask.cpp:91 > > Once again you should be able to use common method code of LRG here. > As Joshua clarified, we are also working on predicate scalable reg, which is not in this patch. Thanks for the suggestion, I will try to refactor this a bit. > Specific Comments (c2 webrev): > > > aarch64.ad:3815 > > very nice defensive check! > > > assembler_aarch64.hpp:2469 & 2699+ > > Andrew Haley is definitely going to ask you to update function entry > (assembler_aarch64.cpp:76) to call these new instruction generation > methods and then validate the generated code using asm_check So, I guess > you might as well do that now ;-) > > Yes! :-) Will add the test code. Thanks! > zBarrierSetAssembler_aarch64.cpp:434 > > Can you explain why we need to check p7 here and not do so in other > places where we call into the JVM? I'm not saying this is wrong. I just > want to know how you decided where re-init of p7 was needed. > Actually I found this by my hack patch above while running jtreg tests. The stub slowpath here can be a c++ function. > superword.cpp:97 > > Does this mean that is someone sets the maximum vector size to a > non-power of two, such as 384, all superword operations will be > bypassed? Including those which can be done using NEON vectors? > Current SLP vectorizer only supports power-of-2 vector size. We are trying to work out a new vectorizer to support all SVE vector sizes, so we would expect a size like 384 could go to that path. I tried current patch on a 512-bit SVE hardware which does not support 384-bit: $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) openjdk version "16-internal" 2021-03-16 $ java -XX:MaxVectorSize=48 -version OpenJDK 64-Bit Server VM warning: Current system only supports max SVE vector length 32. Set MaxVectorSize to 32 (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 instead of unsupported 48: https://www.kernel.org/doc/Documentation/arm64/sve.txt) Do you think we need to exit vm instead of warning and fallbacking to 32 here? Thanks, Ningsheng From tobias.hartmann at oracle.com Mon Aug 17 06:16:57 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 17 Aug 2020 08:16:57 +0200 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler In-Reply-To: References: Message-ID: Hi Ogata, thanks for the details, I've closed the bug as "Not An Issue". Best regards, Tobias On 14.08.20 23:04, Kazunori Ogata wrote: > Hi Tobias, > > Thank you for checking the webrev and pointing out InstallMethods option. > I now realize that I failed to notice this option can be turned off. I > remember I checked its default value is true, but I wasn't aware that it's > a command line option... > > Regarding the change in javaCalls.cpp, I made this change when I was > debugging my changes to support new instructions. 
I also made another > change to make my code work. I guess I should have revisit the change in > javaCalls.cpp when my code became workable. This change must not be > necessary because my version of JVM works fine by only disabling > InstallMethods. > > Anyway, I agree this RFR is unnecessary. Sorry for bothering you. > > > Regards, > Ogata > > > Tobias Hartmann wrote on 2020/08/13 16:00:20: > >> From: Tobias Hartmann >> To: Kazunori Ogata , > hotspot-compiler-dev at openjdk.java.net >> Date: 2020/08/13 16:02 >> Subject: [EXTERNAL] Re: RFR: JDK-8251470: Add a development option >> equivalant to OptoNoExecute to C1 compiler >> >> Hi Ogata, >> >> isn't that what -XX:-InstallMethods [1] is supposed to accomplish? It >> triggers a bailout right >> before Compilation::install_code, which is the same with your code. >> >> Also, why do you need the change in javaCalls.cpp? That would also > affect >> C2 compiled code. >> >> Best regards, >> Tobias >> >> [1] INVALID URI REMOVED >> > u=http-3A__hg.openjdk.java.net_jdk_jdk_file_a7c030723240_src_hotspot_share_c1_c1-5Fglobals.hpp-23l292&d=DwICaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=p- >> > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- >> isw&s=TWFHKEFoj6wwSylXbeLhsD7-tv5nCR50A6-iptKDC00&e= >> >> >> On 12.08.20 09:48, Kazunori Ogata wrote: >>> Hi, >>> >>> May I get review for JDK-8251470: Add a development option equivalant > to >>> OptoNoExecute to C1 compiler? >>> >>> This patch adds a development option to compile a method with C1 and > print >>> disassembly of the generated native code, but to skip execution of the > >>> generated code, in the same manner as OptoNoExecute option does in C2. >>> >>> Log-based debugging is useful to support a new processor. In C1, the >>> existing options BailoutAfterHIR and BailoutAfterLIR can be used if >>> printing HIR/LIR is sufficient. However, there is no way to print >>> disassembly of the generated code because these existing options quit >>> compilation before generating native code. So this issue proposes a > new >>> option for this purpose. >>> >>> >>> Bug: INVALID URI REMOVED >> > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8251470&d=DwICaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=p- >> > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- >> isw&s=JN0Zd_7HcvX3tVM-KdN-Q4hpX7Um5_muAy0Ma5sFWAI&e= >>> Webrev: INVALID URI REMOVED >> u=http-3A__cr.openjdk.java.net_-7Eogatak_8251470_webrev. >> 00_&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- >> > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- >> isw&s=7Z4peF7vXvN0QRxSZALXIU3C91WHZWhS5pWyvRA4XlA&e= >>> >>> >>> Regards, >>> Ogata >>> >> > > From tobias.hartmann at oracle.com Mon Aug 17 06:20:19 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 17 Aug 2020 08:20:19 +0200 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> Message-ID: <726c7edc-e0f1-4542-0136-dde41e982653@oracle.com> Hi Jason, On 14.08.20 21:26, Tatton, Jason wrote: > Unless anyone can advise on other instances which I should change, I'd advise closing the bug? Yes, please close as "Not an Issue" and link to this RFR [1]. 
Best regards, Tobias [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039509.html From christian.hagedorn at oracle.com Mon Aug 17 07:44:14 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 17 Aug 2020 09:44:14 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> Message-ID: <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> Hi Vladimir Yes, you're right, these should be changed into ASSERT and DEBUG(). I'm wondering though if these ifdefs are even required for if-blocks inside methods? Isn't, for example, this if-block: #ifndef PRODUCT if (TraceLinearScanLevel >= 2) { tty->print_cr("killing XMMs for trig"); } #endif removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be removed)? Or does it make a difference by explicitly guarding it with an ifdef? Best regards, Christian On 14.08.20 20:09, Vladimir Kozlov wrote: > One note. Most of the code is guarded by #ifndef PRODUCT. > > But the flag is available only in DEBUG build: > ? develop(intx, TraceLinearScanLevel, 0, > > Should we use #ifdef ASSERT and DEBUG() instead? > > Thanks, > Vladimir > > On 8/14/20 5:10 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following enhancement for C1: >> https://bugs.openjdk.java.net/browse/JDK-8251093 >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >> >> While I was working on JDK-8249603 [1], I added some additional >> debugging and logging code which helped to figure out what was going >> on. I think it would be useful to have this code around for the >> analysis of future C1 register allocator bugs. >> >> This RFE adds (everything non-product code): >> - find_interval(number): Can be called like that from gdb anywhere to >> find an interval with the given number. >> - Interval::print_children()/print_parent(): Useful when debugging >> with gdb to quickly show the split children and parent. >> - LinearScan::print_reg_num(number): Prints the register or stack >> location for this register number. This is useful in some places >> (logging with TraceLinearScanLevel set) where it just printed a number >> which first had to be manually looked up in other logs. >> >> I additionally did some cleanup of the touched code. >> >> We could additionally split the TraceLinearScanLevel flag into >> separate flags related to the different phases of the register >> allocation algorithm. It currently just prints too much details on the >> higher levels. You often find yourself being interested in a specific >> part of the algorithm and only want to know more details there. To >> achieve that you now you have to either handle all the noise or >> manually disable/enable other logs. We could file an RFE to clean this >> up if it's worth the effort - given that there are not many new issues >> filed for C1 register allocation today. >> >> Thank you! 
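Regarding Christian's question on whether an "if (TraceLinearScanLevel >= 2)" block still needs an explicit #ifndef PRODUCT guard: the difference is easiest to see with a toy model of a develop flag. The snippet below is a self-contained sketch of the general mechanism, not HotSpot's actual flag macros; the PRODUCT define and the flag name are used purely for illustration.

    #include <cstdio>

    // Toy model: in a "product" build the develop flag collapses to a
    // compile-time constant, otherwise it is a real mutable variable.
    #ifdef PRODUCT
    static const int TraceLinearScanLevel = 0;
    #else
    static int TraceLinearScanLevel = 0;
    #endif

    static void trace_example() {
        // With the product-build constant, this block is statically dead and
        // the compiler removes it even without an explicit #ifndef PRODUCT...
        if (TraceLinearScanLevel >= 2) {
            std::printf("killing XMMs for trig\n");
        }
    #ifndef PRODUCT
        // ...whereas an explicit guard is still required when the guarded
        // code refers to things that do not exist at all in product builds.
        std::printf("non-product-only diagnostics could go here\n");
    #endif
    }

    int main() {
        trace_example();
        return 0;
    }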
>> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >> From shade at redhat.com Mon Aug 17 08:45:30 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Aug 2020 10:45:30 +0200 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> Message-ID: Hi again, On 8/14/20 9:26 PM, Tatton, Jason wrote: > Thanks for having a look into this. I was mistaken in what these calls were doing, thank you for > explaining this. I'm not able to find any other potential instances where > 'oopDesc::mark_offset_in_bytes()' should be used. The bug is a few years old, so perhaps the > codebase has naturally evolved in the intervening time to resolve this? I think so. sparc parts are gone. I eyeballed arm parts for Address(...) usages, and there seem to be none that require changing 0 to oopDesc::mark_offset_in_bytes(). > Unless anyone can advise on other instances which I should change, I'd advise closing the bug? Yes, I think closing with "Not an Issue" would be in order. -- Thanks, -Aleksey From rwestrel at redhat.com Mon Aug 17 08:49:28 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 17 Aug 2020 10:49:28 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <875zbjw9m9.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> Message-ID: <87h7t13bdz.fsf@redhat.com> John, Tobias, > The last patch is flawed: predicates in the inner loop use the jvm state > from the predicates of the initial loop, that is the state before the > loop. If deoptimization happens for an inner loop predicate on an > iteration of the outer loop that's not the first one then execution > resumes as if the initial loop was never executed when it's already part > way through. > > To fix this, I changed the code so one iteration of the loop is peeled > when the loop is transformed to a long counted loop. State for > predicates is obtained from the safepoint at the end of the peeled > iteration of the loop. Does the fixed patch look ok to you? Roland. > http://cr.openjdk.java.net/~roland/8223051/webrev.03/ > > diff from previous patch: > http://cr.openjdk.java.net/~roland/8223051/webrev.02-03/ From adinn at redhat.com Mon Aug 17 08:52:56 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 17 Aug 2020 09:52:56 +0100 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: On 13/08/2020 12:04, Dmitry Chuyko wrote: > Please review a faster version of Math.signum() for AArch64. > > Two new intrinsics (double and float) are introduced in general code, > with appropriate new nodes. New JTreg test is added to cover the > intrinsic case (enabled only for aarch64). 
> > AArch64 implementation uses FACGT (compare abslute fp values) and BSL > (fp bit selection) to avoid branches and moves to non-fp registers and > back. > > Performance results show ~30% better time in the benchmark with a black > hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, > overhead is 2.9 ns/op. > > rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ > testing: jck, jtreg including new dedicated test The arrays float_cases and double_cases in that dedicated test (TestSignumIntrinsic) include some rather randomly picked float literals with either exponent or a large exponent. They do not include a denormal float literal (excluding the obvious corner cases Float/Double.MIN_VALUE). At least one sample value from the denormal range ought to be included even though (indeed, precisely because) it ought to be of no consequence for the algorithm being used. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From adinn at redhat.com Mon Aug 17 09:16:47 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 17 Aug 2020 10:16:47 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> Message-ID: <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Hi Pengfei, On 17/08/2020 07:00, Ningsheng Jian wrote: > Thanks a lot for the review! Sorry for the late reply, as I was on > vacation last week. And thanks to Pengfei and Joshua for helping > clarifying some details in the patch. Yes, they did a very good job of answering most of the pending questions. >> I also eyeballed /some/ of the generated code to check that it looked >> ok. I'd really like to be able to do that systematically for a >> comprehensive test suite that exercised every rule but I only had the >> machine for a few days. This really ought to be done as a follow-up to >> ensure that all the rules are working as expected. > > Yes, we would expect Pengfei's OptoAssembly check patch can get merged > in future. I'm fine with that as a follow-up patch if you raise a JIRA for it. >> I am not clear why you are choosing to re-init ptrue after certain JVM >> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >> when we call a JVM_ENTRY. Could you explain the rationale you have >> followed here? > > We do the re-init at any possible return points to c2 code, not in any > runtime c++ functions, which will reduce the re-init calls. > > Actually I found those entries by some hack of jvm. In the hacky code > below we use gcc option -finstrument-functions to build hotspot. With > this option, each C/C++ function entry/exit will call the instrument > functions we defined. In instrument functions, we clobber p7 (or other > reg for test) register, and in c2 function return we verify that p7 (or > other reg) has been reinitialized. > > http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch Nice work. It's very good to have that documented. 
I'm willing to accept i) that this has found all current cases and ii) that the verify will catch any cases that might get introduced by future changes (e.g. the callout introduced by ZGC that you mention below). As the above mot say there is a slim chance this might have missed some cases but I think it is pretty unlikely. >> Specific Comments (register allocator webrev): >> >> >> aarch64.ad:97-100 >> >> Why have you added a reg_def for R8 and R9 here and also to alloc_class >> chunk0 at lines 544-545? They aren't used by C2 so why define them? >> > > I think Pengfei has helped to explain that. I will either add clear > comments or rename the register name as you suggested. Ok, good. > As Joshua clarified, we are also working on predicate scalable reg, > which is not in this patch. Thanks for the suggestion, I will try to > refactor this a bit. Ok, I'll wait for an updated patch. Are you planning to include the scalable predicate reg code as part of this patch? I think that would be better as it would help to clarify the need to distinguish vector regs as a subset of scalable regs. >> zBarrierSetAssembler_aarch64.cpp:434 >> >> Can you explain why we need to check p7 here and not do so in other >> places where we call into the JVM? I'm not saying this is wrong. I just >> want to know how you decided where re-init of p7 was needed. >> > > Actually I found this by my hack patch above while running jtreg tests. > The stub slowpath here can be a c++ function. Yes, good catch. >> superword.cpp:97 >> >> Does this mean that is someone sets the maximum vector size to a >> non-power of two, such as 384, all superword operations will be >> bypassed? Including those which can be done using NEON vectors? >> > > Current SLP vectorizer only supports power-of-2 vector size. We are > trying to work out a new vectorizer to support all SVE vector sizes, so > we would expect a size like 384 could go to that path. I tried current > patch on a 512-bit SVE hardware which does not support 384-bit: > > $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) > openjdk version "16-internal" 2021-03-16 > > $ java -XX:MaxVectorSize=48 -version > OpenJDK 64-Bit Server VM warning: Current system only supports max SVE > vector length 32. Set MaxVectorSize to 32 > > (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 > instead of unsupported 48: > https://www.kernel.org/doc/Documentation/arm64/sve.txt) > > Do you think we need to exit vm instead of warning and fallbacking to 32 > here? Yes, I think a vm exit would probably be a better choice. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From ningsheng.jian at arm.com Mon Aug 17 10:19:04 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 17 Aug 2020 18:19:04 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <294d013f-e6d4-4eba-4455-78bd4cfd1148@arm.com> Hi Andrew, > >> As Joshua clarified, we are also working on predicate scalable reg, >> which is not in this patch. 
Thanks for the suggestion, I will try to >> refactor this a bit. > > Ok, I'll wait for an updated patch. Are you planning to include the > scalable predicate reg code as part of this patch? I think that would be > better as it would help to clarify the need to distinguish vector regs > as a subset of scalable regs. > My original plan was not to include scalable predicate reg related code, as they are not used and tested without proper mid-end/back-end code. Do you think just adding some comments is OK for now, e.g. saying that a scalable reg could also be a predicate reg in future? >> $ java -XX:MaxVectorSize=48 -version >> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >> vector length 32. Set MaxVectorSize to 32 >> >> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >> instead of unsupported 48: >> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >> >> Do you think we need to exit vm instead of warning and fallbacking to 32 >> here? > > Yes, I think a vm exit would probably be a better choice. > OK, will do that. Thanks! Regards, Ningsheng From adinn at redhat.com Mon Aug 17 10:29:11 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 17 Aug 2020 11:29:11 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <294d013f-e6d4-4eba-4455-78bd4cfd1148@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <294d013f-e6d4-4eba-4455-78bd4cfd1148@arm.com> Message-ID: <9e045763-4a5d-57b5-18c9-63f8a6072c2e@redhat.com> On 17/08/2020 11:19, Ningsheng Jian wrote: >>> As Joshua clarified, we are also working on predicate scalable reg, >>> which is not in this patch. Thanks for the suggestion, I will try to >>> refactor this a bit. >> >> Ok, I'll wait for an updated patch. Are you planning to include the >> scalable predicate reg code as part of this patch? I think that would be >> better as it would help to clarify the need to distinguish vector regs >> as a subset of scalable regs. >> > > My original plan was not to include scalable predicate reg related code, > as they are not used and tested without proper mid-end/back-end code. Do > you think just adding some comments is OK for now, e.g. saying that a > scalable reg could also be a predicate reg in future? Sure. A comment describing the meaning of the scalable and vector properties and their independence from each other will be good enough for now and it will still be helpful once the extra code is added. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From fairoz.matte at oracle.com Mon Aug 17 12:46:37 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 17 Aug 2020 05:46:37 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal Message-ID: Hi, Please review this small test change to work with Graal. Background: Graal require more code cache compared to c1/c2. but the test case always set it to 20MB. This may not be sufficient when running graal. 
Default configuration for ReservedCodeCacheSize = 250MB With graal enabled, ReservedCodeCacheSize = 350MB Either we can modify the framework to honor ReservedCodeCacheSize for graal or just update the testcase. There are not many test cases they rely on ReservedCodeCacheSize or InitialCodeCacheSize. So the fix prefer the later one. JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ Thanks, Fairoz From vladimir.kozlov at oracle.com Mon Aug 17 17:36:27 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 17 Aug 2020 10:36:27 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> Message-ID: On 8/17/20 12:44 AM, Christian Hagedorn wrote: > Hi Vladimir > > Yes, you're right, these should be changed into ASSERT and DEBUG(). > > I'm wondering though if these ifdefs are even required for if-blocks inside methods? > > Isn't, for example, this if-block: > > #ifndef PRODUCT > ??????? if (TraceLinearScanLevel >= 2) { > ????????? tty->print_cr("killing XMMs for trig"); > ??????? } > #endif > > removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be > removed)? Or does it make a difference by explicitly guarding it with an ifdef? You are right. It could be statically removed. But we keep #ifdef sometimes to indicate that code is executed only in debug build because we don't always remember type of a flag. Thanks, Vladimir K > > Best regards, > Christian > > On 14.08.20 20:09, Vladimir Kozlov wrote: >> One note. Most of the code is guarded by #ifndef PRODUCT. >> >> But the flag is available only in DEBUG build: >> ?? develop(intx, TraceLinearScanLevel, 0, >> >> Should we use #ifdef ASSERT and DEBUG() instead? >> >> Thanks, >> Vladimir >> >> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>> Hi >>> >>> Please review the following enhancement for C1: >>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>> >>> While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure out >>> what was going on. I think it would be useful to have this code around for the analysis of future C1 register >>> allocator bugs. >>> >>> This RFE adds (everything non-product code): >>> - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. >>> - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and >>> parent. >>> - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful >>> in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be manually >>> looked up in other logs. >>> >>> I additionally did some cleanup of the touched code. >>> >>> We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of the >>> register allocation algorithm. It currently just prints too much details on the higher levels. You often find >>> yourself being interested in a specific part of the algorithm and only want to know more details there. 
To achieve >>> that you now you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to >>> clean this up if it's worth the effort - given that there are not many new issues filed for C1 register allocation >>> today. >>> >>> Thank you! >>> >>> Best regards, >>> Christian >>> >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>> From vladimir.kozlov at oracle.com Mon Aug 17 17:52:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 17 Aug 2020 10:52:22 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: References: Message-ID: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> Hi Fairoz, How you determine that +10Mb is enough with Graal? Thanks, Vladimir On 8/17/20 5:46 AM, Fairoz Matte wrote: > Hi, > > > > Please review this small test change to work with Graal. > > > > Background: > > Graal require more code cache compared to c1/c2. but the test case always set it to 20MB. This may not be sufficient when running graal. > > Default configuration for ReservedCodeCacheSize = 250MB > > With graal enabled, ReservedCodeCacheSize = 350MB > > > > Either we can modify the framework to honor ReservedCodeCacheSize for graal or just update the testcase. > > There are not many test cases they rely on ReservedCodeCacheSize or InitialCodeCacheSize. So the fix prefer the later one. > > > > JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > > Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > > > > Thanks, > > Fairoz > > > From jingxinc at amazon.com Mon Aug 17 18:52:00 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Mon, 17 Aug 2020 18:52:00 +0000 Subject: RFR 8213777: purge outdated fp code in x86_32.ad Message-ID: <05D26CF9-9C02-4803-9FEF-1B8EB45A3BEA@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~phh/8213777/webrev.00/ JBS : https://bugs.openjdk.java.net/browse/JDK-8213777 I delete some outdate code in jdk-11. Since UseSSE is always larger than or equal to 2, some scenario when UseSSE less than two is outdated. I have tested this builds successfully . Ensured that there are no regressions in hotspot : tier1 tests. Regards, Eric Chen From vladimir.x.ivanov at oracle.com Mon Aug 17 19:35:21 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 17 Aug 2020 22:35:21 +0300 Subject: RFR 8213777: purge outdated fp code in x86_32.ad In-Reply-To: <05D26CF9-9C02-4803-9FEF-1B8EB45A3BEA@amazon.com> References: <05D26CF9-9C02-4803-9FEF-1B8EB45A3BEA@amazon.com> Message-ID: Hi Eric, UseSSE >= 2 invariant is valid only on x86-64 since it is guaranteed by system ABI. It is not applicable to x86-32. Best regards, Vladimir Ivanov On 17.08.2020 21:52, Eric, Chan wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~phh/8213777/webrev.00/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8213777 > > I delete some outdate code in jdk-11. Since UseSSE is always larger than or equal to 2, some scenario when UseSSE less than two is outdated. > > I have tested this builds successfully . > > Ensured that there are no regressions in hotspot : tier1 tests. > > Regards, > Eric Chen > From martin.doerr at sap.com Mon Aug 17 21:54:32 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 17 Aug 2020 21:54:32 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. Message-ID: Hi, I'd like to backport https://bugs.openjdk.java.net/browse/JDK-8241234 to JDK11u. 
Original JDK15 patch (https://hg.openjdk.java.net/jdk/jdk/rev/87c506c8be63) doesn't fit to JDK11u because the locking code has been reworked by https://bugs.openjdk.java.net/browse/JDK-8229844 As mentioned by Vladimir, there's already a GraalVM version available which consists of 2 patches (original + addon) and which can be applied: https://github.com/graalvm/labs-openjdk-11/commit/6c162cb15262e6aa77e36eb3a268320ef0a206a4 https://github.com/graalvm/labs-openjdk-11/commit/6a28a618cdbe595f9a3993e0eb63c01ccae1a528 Only JVMCI part from GraalVM doesn't apply automatically. The version of this file from JDK15 is very simple and fits perfectly. Please review the JDK11u backport webrev: http://cr.openjdk.java.net/~mdoerr/8241234_monitorenterexit_11u/webrev.00/ Thanks and best regards, Martin From tobias.hartmann at oracle.com Tue Aug 18 06:54:09 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 18 Aug 2020 08:54:09 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87h7t13bdz.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: Hi Roland, On 17.08.20 10:49, Roland Westrelin wrote: > Does the fixed patch look ok to you? Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. Best regards, Tobias From HORIE at jp.ibm.com Tue Aug 18 07:28:12 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 16:28:12 +0900 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com>, <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>, <20200630001528.GA26652@pacoca> Message-ID: Jose, Latest change looks good also to me. Marin, Do you think if I can push the change? Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" To: "joserz at linux.ibm.com" Cc: hotspot compiler , "horie at jp.ibm.com" Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Wed, Jul 1, 2020 4:01 AM Thanks for the much better flag description. Looks good. Best regards, Martin > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" : > > ?Hello team, > > Here's the 2nd version, implementing the suggestions asked by Martin. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you!! > > Jose > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: >> Hi Jose, >> >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? >> >> Please update the Copyright year in vm_version_poc.hpp. >> >> I can?t test the change, but it looks good to me. >> >> Best regards, >> Martin >> >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" : >>> >>> ?Hello team! 
>>> >>> This patch introduces Power10 to OpenJDK and implements three new instructions: >>> - brh - byte-reverse halfword >>> - brw - byte-reverse word >>> - brd - byte-reverse doubleword >>> >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 >>> >>> Thanks for your review! >>> >>> Jose R. Ziviani From richard.reingruber at sap.com Tue Aug 18 07:43:51 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 18 Aug 2020 07:43:51 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Goetz, I have collected the changes based on your feedback in a new webrev: Webrev.7: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7/ Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7.inc/ Most of the changes are renamings, commenting, and reformatting. Besides that ... - I converted the native agent of the test IterateHeapWithEscapeAnalysisEnabled from C to C++, because this seems to be preferred by serviceability developers. I also re-indented the file, but excluded this from the delta webrev. - I had to adapt test/jdk/com/sun/jdi/EATests.java to the fact that background compilation (-Xbatch) cannot be reliably disabled for JVMCI compilers. E.g. the compile broker will compile in the background if JVMCI is not yet fully initialized. Therefore it is possible that test cases are executed before the main test method is compiled on the highest level and then the test case fails. The higher the system load the higher the probability for this to happen. In webrev.7 I skip the compilation level check if the vm is configured to use the JVMCI compiler. I also answered you inline below. Thanks, Richard. -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 23. Juli 2020 16:20 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Thanks for your two further explanations in the other thread. That made the points clear to me. > > I was not that happy with the names saying not_global_escape > > and similar. I now agreed you have to use the terms of the escape > > analysis (NoEscape ArgEscape= throughout the runtime code. I'm still not happy with > > the 'not' in the term, I always try to expand the name to some > > sentence with a negated verb, but it makes no sense. > > For example, "has_not_global_escape_in_scope" expands to > > "Hasn't a global escape in its scope." in my thinking, which makes > > no sense. You probably mean > > "Has not-global escape in its scope." or "Has {ArgEscape|NoEscape} > > in its scope." > > > C2 is using the word "non" in this context, e.g., here > > alloc->is_non_escaping. > > There is also ConnectionGraph::not_global_escape() That talks about a single node that represents a single Object. An object has a single state wrt. ea. You use the term for safepoint which tracks a set of objects. Here, has_not_global_excape can mean 1. None of the several objects does escape globaly. 2. There is at least one object that escapes globaly. 
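To make the ambiguity concrete, the two properties one might read into such a name can be written out as follows (the enum is a simplified stand-in for C2's escape states, not code from the webrev):

  #include <vector>

  enum EscapeState { NoEscape, ArgEscape, GlobalEscape };

  // "No object in scope escapes globally."
  bool none_escapes_globally(const std::vector<EscapeState>& objs) {
    for (EscapeState e : objs) { if (e == GlobalEscape) return false; }
    return true;
  }

  // "At least one object in scope is NoEscape or ArgEscape."
  bool has_ea_local_object(const std::vector<EscapeState>& objs) {
    for (EscapeState e : objs) { if (e == NoEscape || e == ArgEscape) return true; }
    return false;
  }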
> > non obviously negates the adjective 'global', > > non-global or nonglobal even is a English term I find in the > > net. > > So what about "has_non_global_escape_in_scope?" > > And what about has_ea_local_in_scope? That's good. Please document somewhere that Ea_local == ArgEscape | NoEscape. That's what it is, right? > > Does jvmti specify that the same limits are used ...? > > ok on your side. > > I don't know and didn't find anything in a quick search. Ok, not your business. > > > jvmtiEnvBase.cpp ok > > jvmtiImpl.h|cpp ok > > jvmtiTagMap.cpp ok > > whitebox.cpp ok > > > deoptimization.cpp > > > line 177: Please break line > > line 246, 281: Please break line > > 1578, 1583, 1589, 1632, 1649, 1651 Break line > > > 1651: You use 'non'-terms, too: non-escaping :) > > I know :) At least here it is wrong I'd say. "...has to be a not escaping obj..." > sounds better > (hopefully not only to my german ears). I thought the term non-escpaing makes it quite clear. I just wanted to point out that using non above would be similar to the wording here. > > IterateHeapWithEscapeAnalysisEnabled.java > > > line 415: > > msg("wait until target thread has set testMethod_result"); > > while (testMethod_result == 0) { > > Thread.sleep(50); > > } > > Might the test run into timeouts at this place? > > The field is volatile, i.e. it will be reloaded > > in each iteration. But will dontinline_testMethod > > write it back to main memory in time? > > You mean, the test could hang in that loop for a couple of minutes? I don't > think so. There are cache coherence protocols in place which will invalidate > stale data very timely. Ok, anyways, it would only be a hanging test. > > Ok. I've removed quite a lot of the occurrances. > > > Also, I like full sentences in comments. > > Especially for me as foreign speaker, this makes > > things much more clear. I.e., I try to make it > > a real sentence with articles, capitalized and a > > dot at the end if there is a subject and a verb > > in first place. > > E.g., jvmtiEnvBase.cpp:1327 > > Are you referring to the following? > (from > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/src/hots > pot/share/prims/jvmtiEnvBase.cpp.frames.html) > > 1326 > 1327 // If the frame is a compiled one, need to deoptimize it. > 1328 if (vf->is_compiled_frame()) { > > This line 1327 is preexisting. Sorry, wrong line number again. I think I meant 1333 // eagerly reallocate scalar replaced objects. But I must admit, the subject is missing. It's one of these imperative sentences where the subject is left out, which are used throughout documentation. Bad example, but still a correct sentence, so qualifies for punctuation? Best regards, Goetz. From fairoz.matte at oracle.com Tue Aug 18 08:10:26 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Tue, 18 Aug 2020 01:10:26 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> Message-ID: Hi Vladimir, Thanks for looking into. This is intermittent crash, and is reproducible in windows debug build environment. Below is the testing performed. 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" 2. 
Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" Thanks, Fairoz > -----Original Message----- > From: Vladimir Kozlov > Sent: Monday, August 17, 2020 11:22 PM > To: Fairoz Matte ; hotspot-compiler- > dev at openjdk.java.net; serviceability-dev at openjdk.java.net > Cc: Coleen Phillimore ; Dean Long > > Subject: Re: RFR(s): 8248295: > serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal > > Hi Fairoz, > > How you determine that +10Mb is enough with Graal? > > Thanks, > Vladimir > > On 8/17/20 5:46 AM, Fairoz Matte wrote: > > Hi, > > > > > > > > Please review this small test change to work with Graal. > > > > > > > > Background: > > > > Graal require more code cache compared to c1/c2. but the test case always > set it to 20MB. This may not be sufficient when running graal. > > > > Default configuration for ReservedCodeCacheSize = 250MB > > > > With graal enabled, ReservedCodeCacheSize = 350MB > > > > > > > > Either we can modify the framework to honor ReservedCodeCacheSize for > graal or just update the testcase. > > > > There are not many test cases they rely on ReservedCodeCacheSize or > InitialCodeCacheSize. So the fix prefer the later one. > > > > > > > > JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > > > > Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > > > > > > > > Thanks, > > > > Fairoz > > > > > > From HORIE at jp.ibm.com Tue Aug 18 08:38:42 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 17:38:42 +0900 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp Message-ID: Dear all, Would you please review a small change? Bug: https://bugs.openjdk.java.net/browse/JDK-8251926 Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/ The load_const_optimized function in assembler_ppc.cpp has an unused variable named return_xd. It looks unnecessary in the current code. Best regards, Michihiro From martin.doerr at sap.com Tue Aug 18 09:13:39 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 18 Aug 2020 09:13:39 +0000 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com>, <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>, <20200630001528.GA26652@pacoca> Message-ID: Hi Michihiro and Jose, I had only done a quick review during my vacation. Thanks for updating the description of PowerArchitecturePPC64. After taking a second look, I have a few minor requests. Sorry for that. * ?UseByteReverseInstructions? (plural) would be more consistent with other names. * Please add ?size? specifications to the ppc.ad file. Otherwise, the compiler has to determine sizes dynamically every time. * bytes_reverse_short: ?format? specification misses ?extsh?. Unfortunately, I couldn?t find a Power10 machine in my garage ?? So we rely on your testing. Thanks and best regards, Martin From: Michihiro Horie Sent: Dienstag, 18. August 2020 09:28 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Jose, Latest change looks good also to me. Marin, Do you think if I can push the change? 
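For reference, the byte reversal each of the three new instructions performs is shown below per 16-, 32- and 64-bit value (an illustration in plain C++, not the webrev, which wires the instructions up to match rules such as bytes_reverse_short in ppc.ad; as the last point above implies, the short variant additionally sign-extends its result via extsh):

  #include <cstdint>

  uint16_t brh_equiv(uint16_t x) { return __builtin_bswap16(x); }  // halfword
  uint32_t brw_equiv(uint32_t x) { return __builtin_bswap32(x); }  // word
  uint64_t brd_equiv(uint64_t x) { return __builtin_bswap64(x); }  // doubleword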
Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" > To: "joserz at linux.ibm.com" > Cc: hotspot compiler >, "horie at jp.ibm.com" > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Wed, Jul 1, 2020 4:01 AM Thanks for the much better flag description. Looks good. Best regards, Martin > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" >: > > ?Hello team, > > Here's the 2nd version, implementing the suggestions asked by Martin. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you!! > > Jose > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: >> Hi Jose, >> >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? >> >> Please update the Copyright year in vm_version_poc.hpp. >> >> I can?t test the change, but it looks good to me. >> >> Best regards, >> Martin >> >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" >: >>> >>> ?Hello team! >>> >>> This patch introduces Power10 to OpenJDK and implements three new instructions: >>> - brh - byte-reverse halfword >>> - brw - byte-reverse word >>> - brd - byte-reverse doubleword >>> >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 >>> >>> Thanks for your review! >>> >>> Jose R. Ziviani From thomas.stuefe at gmail.com Tue Aug 18 09:23:04 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 18 Aug 2020 11:23:04 +0200 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: Hi Michihiro, seems fine and trivial. Thanks, Thomas On Tue, Aug 18, 2020 at 10:40 AM Michihiro Horie wrote: > Dear all, > > Would you please review a small change? > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251926 > Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/ > > The load_const_optimized function in assembler_ppc.cpp has an unused > variable named return_xd. It looks unnecessary in the current code. > > Best regards, > Michihiro > From HORIE at jp.ibm.com Tue Aug 18 09:43:34 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 18:43:34 +0900 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: , <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com>, <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>, <20200630001528.GA26652@pacoca> Message-ID: Hi Martin, Thank you so much for your in-depth review. I agree all of the three items should be updated. Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" To: Michihiro Horie , "joserz at linux.ibm.com" Cc: "hotspot-compiler-dev at openjdk.java.net" Subject: [EXTERNAL] RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Tue, Aug 18, 2020 6:13 PM Hi Michihiro and Jose, I had only done a quick review during my vacation. Thanks for updating the description of PowerArchitecturePPC64. After taking a second look, I have a few minor requests. Sorry for that. ?UseByteReverseInstructions? (plural) would be more consistent with other names. Please add ?size? specifications to the ppc.ad file. Otherwise, the compiler has to determine sizes dynamically every time. bytes_reverse_short: ?format? specification misses ?extsh?. 
Unfortunately, I couldn?t find a Power10 machine in my garage ?? So we rely on your testing. Thanks and best regards, Martin From: Michihiro Horie Sent: Dienstag, 18. August 2020 09:28 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Jose, Latest change looks good also to me. Marin, Do you think if I can push the change? Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" To: "joserz at linux.ibm.com" Cc: hotspot compiler , " horie at jp.ibm.com" Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Wed, Jul 1, 2020 4:01 AM Thanks for the much better flag description. Looks good. Best regards, Martin > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" < joserz at linux.ibm.com>: > > ?Hello team, > > Here's the 2nd version, implementing the suggestions asked by Martin. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you!! > > Jose > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: >> Hi Jose, >> >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? >> >> Please update the Copyright year in vm_version_poc.hpp. >> >> I can?t test the change, but it looks good to me. >> >> Best regards, >> Martin >> >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" < joserz at linux.ibm.com>: >>> >>> ?Hello team! >>> >>> This patch introduces Power10 to OpenJDK and implements three new instructions: >>> - brh - byte-reverse halfword >>> - brw - byte-reverse word >>> - brd - byte-reverse doubleword >>> >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 >>> >>> Thanks for your review! >>> >>> Jose R. Ziviani From HORIE at jp.ibm.com Tue Aug 18 09:49:19 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 18:49:19 +0900 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: , Message-ID: Hi Thomas, Thanks a lot! Best regards, Michihiro ----- Original message ----- From: "Thomas St?fe" To: Michihiro Horie Cc: ppc-aix-port-dev , hotspot compiler Subject: [EXTERNAL] Re: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp Date: Tue, Aug 18, 2020 6:23 PM Hi Michihiro, seems fine and trivial. Thanks, Thomas On Tue, Aug 18, 2020 at 10:40 AM Michihiro Horie wrote: Dear all, Would you please review a small change? Bug: https://bugs.openjdk.java.net/browse/JDK-8251926 Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/ The load_const_optimized function in assembler_ppc.cpp has an unused variable named return_xd. It looks unnecessary in the current code. 
Best regards, Michihiro From christian.hagedorn at oracle.com Tue Aug 18 13:16:12 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 18 Aug 2020 15:16:12 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> Message-ID: <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> Hi Vladimir On 17.08.20 19:36, Vladimir Kozlov wrote: > On 8/17/20 12:44 AM, Christian Hagedorn wrote: >> Hi Vladimir >> >> Yes, you're right, these should be changed into ASSERT and DEBUG(). >> >> I'm wondering though if these ifdefs are even required for if-blocks >> inside methods? >> >> Isn't, for example, this if-block: >> >> #ifndef PRODUCT >> ???????? if (TraceLinearScanLevel >= 2) { >> ?????????? tty->print_cr("killing XMMs for trig"); >> ???????? } >> #endif >> >> removed anyways when the flag is set to < 2 (which is statically known >> and thus would allow this entire block to be removed)? Or does it make >> a difference by explicitly guarding it with an ifdef? > > You are right. It could be statically removed. But we keep #ifdef > sometimes to indicate that code is executed only in debug build because > we don't always remember type of a flag. I see, that makes sense. I updated my patch and left the ifdefs there but changed them to ASSERT. I also updated other ifdefs belonging to TraceLinearScanLevel appropriately. http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ Best regards, Christian > > Thanks, > Vladimir K > >> >> Best regards, >> Christian >> >> On 14.08.20 20:09, Vladimir Kozlov wrote: >>> One note. Most of the code is guarded by #ifndef PRODUCT. >>> >>> But the flag is available only in DEBUG build: >>> ?? develop(intx, TraceLinearScanLevel, 0, >>> >>> Should we use #ifdef ASSERT and DEBUG() instead? >>> >>> Thanks, >>> Vladimir >>> >>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>> Hi >>>> >>>> Please review the following enhancement for C1: >>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>> >>>> While I was working on JDK-8249603 [1], I added some additional >>>> debugging and logging code which helped to figure out what was going >>>> on. I think it would be useful to have this code around for the >>>> analysis of future C1 register allocator bugs. >>>> >>>> This RFE adds (everything non-product code): >>>> - find_interval(number): Can be called like that from gdb anywhere >>>> to find an interval with the given number. >>>> - Interval::print_children()/print_parent(): Useful when debugging >>>> with gdb to quickly show the split children and parent. >>>> - LinearScan::print_reg_num(number): Prints the register or stack >>>> location for this register number. This is useful in some places >>>> (logging with TraceLinearScanLevel set) where it just printed a >>>> number which first had to be manually looked up in other logs. >>>> >>>> I additionally did some cleanup of the touched code. >>>> >>>> We could additionally split the TraceLinearScanLevel flag into >>>> separate flags related to the different phases of the register >>>> allocation algorithm. It currently just prints too much details on >>>> the higher levels. You often find yourself being interested in a >>>> specific part of the algorithm and only want to know more details >>>> there. 
To achieve that you now you have to either handle all the >>>> noise or manually disable/enable other logs. We could file an RFE to >>>> clean this up if it's worth the effort - given that there are not >>>> many new issues filed for C1 register allocation today. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> Christian >>>> >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>> From dmitry.chuyko at bell-sw.com Tue Aug 18 15:05:01 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 18 Aug 2020 18:05:01 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> Message-ID: <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Hi Andrew, Thanks for taking a look. This work has started as a try to improve common code, see JDK-8249198 [1] and short related discussion [2]. And the original benchmark [3] is quite similar to the one that you used. As you kindly tried the patch on a hardware where it shows degradation (baseline is quite slow btw), I think it makes sense to limit it to Cortex/Neoverse. So I restored UseSignumInrinsic flag which is enabled only for CPU_ARM. Disabling InlineMathNatives also disables it. webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.02/ As suggested by Anrew Dinn, there are few more test cases in the test: +-MIN_NORMAL and some denormal numbers. Some more results for a benchmark with reduce(): -XX:-UseSignumIntrinsic DoubleOrigSignum.ofMostlyNaN 0.914 ? 0.001 ns/op DoubleOrigSignum.ofMostlyNeg 1.178 ? 0.001 ns/op DoubleOrigSignum.ofMostlyPos 1.176 ? 0.017 ns/op DoubleOrigSignum.ofMostlyZero 0.803 ? 0.001 ns/op DoubleOrigSignum.ofRandom 1.175 ? 0.012 ns/op -XX:+UseSignumIntrinsic DoubleOrigSignum.ofMostlyNaN 1.040 ? 0.007 ns/op DoubleOrigSignum.ofMostlyNeg 1.040 ? 0.004 ns/op DoubleOrigSignum.ofMostlyPos 1.039 ? 0.003 ns/op DoubleOrigSignum.ofMostlyZero 1.040 ? 0.001 ns/op DoubleOrigSignum.ofRandom 1.040 ? 0.003 ns/op If we only intrinsify copySign() we lose free mask that we get from facgt. In such case improvement (for signum) decreases like from ~30% to ~15%, and it also greatly depends on the particular HW. We can additionally introduce an intrinsic for Math.copySign(), especially it makes sense for float where it can be just 2 fp instructions: movi+bsl (fmovd+fnegd+bsl for double). -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8249198 [2] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/067666.html [3] http://cr.openjdk.java.net/~dchuyko/8249198/webrev.00/raw_files/new/test/micro/org/openjdk/bench/java/lang/DoubleSignum.java On 8/15/20 4:50 PM, Andrew Haley wrote: > I've been looking at the way Math.signum() is used, mostly by > searching the GitHub code database. I've changed the JMH test to be > IMO more realistic: it's at > http://cr.openjdk.java.net/~aph/DoubleSignum.java. I think it's more > realitic because signum() results usually aren't stored but are used > to feed other arithmetic ops, usually + or *. > > Baseline: > > Benchmark Mode Cnt Score Error Units > DoubleSignum.ofMostlyNaN avgt 3 2.409 ? 0.051 ns/op > DoubleSignum.ofMostlyNeg avgt 3 2.475 ? 0.211 ns/op > DoubleSignum.ofMostlyPos avgt 3 2.494 ? 0.015 ns/op > DoubleSignum.ofMostlyZero avgt 3 2.501 ? 0.008 ns/op > DoubleSignum.ofRandom avgt 3 2.458 ? 0.373 ns/op > DoubleSignum.overhead avgt 3 2.373 ? 
0.029 ns/op > > -XX:+UseSignumIntrinsic: > > Benchmark Mode Cnt Score Error Units > DoubleSignum.ofMostlyNaN avgt 3 2.776 ? 0.006 ns/op > DoubleSignum.ofMostlyNeg avgt 3 2.773 ? 0.066 ns/op > DoubleSignum.ofMostlyPos avgt 3 2.772 ? 0.084 ns/op > DoubleSignum.ofMostlyZero avgt 3 2.770 ? 0.045 ns/op > DoubleSignum.ofRandom avgt 3 2.769 ? 0.005 ns/op > DoubleSignum.overhead avgt 3 2.376 ? 0.013 ns/op > > > I think it might be more useful for you to work on optimizing > Math.copysign(). > From vladimir.x.ivanov at oracle.com Tue Aug 18 15:08:30 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 18 Aug 2020 18:08:30 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87h7t13bdz.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: >> http://cr.openjdk.java.net/~roland/8223051/webrev.03/ Looks good! Thanks a lot for taking care of it! Some minor comments: =============== src/hotspot/share/opto/callnode.cpp: + Node* new_in = old_sosn->clone(sosn_map); + if (old_unique != C->unique()) { // New node? It's not a correctness issue, but strictly speaking it checks whether new nodes were allocated or not. It would be clearer to add a flag to SafePointScalarObjectNode::clone(Dict*) which signals that the returned node comes from the cache. Or just check that "new_in->_idx >= C->unique()". I see that the code comes from macro.cpp, but IMO it's a good opportunity to clean it up a bit. =============== src/hotspot/share/opto/callnode.cpp: // If you have back to back safepoints, remove one if( in(TypeFunc::Control)->is_SafePoint() ) return in(TypeFunc::Control); - if( in(0)->is_Proj() ) { + // Transforming long counted loops requires a safepoint node. Do not + // eliminate a safepoint until loop opts are over. + if (in(0)->is_Proj() && !phase->C->major_progress()) { Can you elaborate on this a bit? Why elimination of back-to-back safepoints cause problems during new transformation? Is it because you need specifically a SafePoint because CallNode doesn't fit? =============== src/hotspot/share/opto/loopnode.cpp: +void PhaseIdealLoop::add_empty_predicate(Deoptimization::DeoptReason reason, Node* inner_head, IdealLoopTree* loop, SafePointNode* sfpt) { Nothing actionable at the moment, but it's unfortunate to see more and more code being duplicated from GraphKit. I wish there were a way to share implementation between GraphKit, PhaseIdealLoop, and PhaseMacroExpand. Best regards, Vladimir Ivanov >> >> diff from previous patch: >> http://cr.openjdk.java.net/~roland/8223051/webrev.02-03/ > From vladimir.kozlov at oracle.com Tue Aug 18 15:12:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 08:12:31 -0700 (PDT) Subject: [16] RFR(M) 8251459: Compute caller save exclusion RegMasks once Message-ID: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> https://cr.openjdk.java.net/~kvn/8251459/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8251459 Claes once again found optimization for C2 code! 
Instead of per bit exclusion SOC and AS registers from debuginfo regmasks he suggested to calculate exclusion masks once in Matcher::init_spill_mask() during first compilation and use these masks to do per word exclusion. We can save 27k instructions per compilation on x64 with this! I modified Claes's original patch by removing refactoring code to see changes more clear. Tested: hs-tier1-3, xcomp Thanks, Vladimir From vladimir.x.ivanov at oracle.com Tue Aug 18 15:26:44 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 18 Aug 2020 18:26:44 +0300 Subject: [16] RFR(M) 8251459: Compute caller save exclusion RegMasks once In-Reply-To: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> References: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> Message-ID: <0b735f6f-39ad-6d6a-8cc5-3f4fd9ab1d96@oracle.com> Looks good. Best regards, Vladimir Ivanov On 18.08.2020 18:12, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8251459/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8251459 > > Claes once again found optimization for C2 code! > > Instead of per bit exclusion SOC and AS registers from debuginfo > regmasks he suggested to calculate exclusion masks once in > Matcher::init_spill_mask() during first compilation and use these masks > to do per word exclusion. > We can save 27k instructions per compilation on x64 with this! > > I modified Claes's original patch by removing refactoring code to see > changes more clear. > > Tested: hs-tier1-3, xcomp > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Tue Aug 18 15:32:28 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 08:32:28 -0700 Subject: [16] RFR(M) 8251459: Compute caller save exclusion RegMasks once In-Reply-To: <0b735f6f-39ad-6d6a-8cc5-3f4fd9ab1d96@oracle.com> References: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> <0b735f6f-39ad-6d6a-8cc5-3f4fd9ab1d96@oracle.com> Message-ID: Thanks! Vladimir K On 8/18/20 8:26 AM, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 18.08.2020 18:12, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8251459/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8251459 >> >> Claes once again found optimization for C2 code! >> >> Instead of per bit exclusion SOC and AS registers from debuginfo regmasks he suggested to calculate exclusion masks >> once in Matcher::init_spill_mask() during first compilation and use these masks to do per word exclusion. >> We can save 27k instructions per compilation on x64 with this! >> >> I modified Claes's original patch by removing refactoring code to see changes more clear. >> >> Tested: hs-tier1-3, xcomp >> >> Thanks, >> Vladimir From vladimir.kozlov at oracle.com Tue Aug 18 15:41:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 08:41:31 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> Message-ID: <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> c1_Compilation.hpp: looks like both versions of allocator() do the same thing. I suggest to build with configure --with-debug-level=optimized to check that NOT_PRODUCT can be built with these changes. 
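The "optimized" debug level is worth the extra check because in that configuration neither PRODUCT nor ASSERT is defined: code kept under NOT_PRODUCT still compiles there, while anything moved under ASSERT disappears, so mixing the two guards can break exactly that build. A sketch of the guard pattern being discussed (taken from the shape of the code quoted in the thread above, not the actual patch):

  #ifdef ASSERT
    if (TraceLinearScanLevel >= 2) {   // develop flag (see the discussion above)
      tty->print_cr("killing XMMs for trig");
    }
  #endif // ASSERT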
Thanks, Vladimir On 8/18/20 6:16 AM, Christian Hagedorn wrote: > Hi Vladimir > > On 17.08.20 19:36, Vladimir Kozlov wrote: >> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>> Hi Vladimir >>> >>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>> >>> I'm wondering though if these ifdefs are even required for if-blocks inside methods? >>> >>> Isn't, for example, this if-block: >>> >>> #ifndef PRODUCT >>> ???????? if (TraceLinearScanLevel >= 2) { >>> ?????????? tty->print_cr("killing XMMs for trig"); >>> ???????? } >>> #endif >>> >>> removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be >>> removed)? Or does it make a difference by explicitly guarding it with an ifdef? >> >> You are right. It could be statically removed. But we keep #ifdef sometimes to indicate that code is executed only in >> debug build because we don't always remember type of a flag. > > I see, that makes sense. I updated my patch and left the ifdefs there but changed them to ASSERT. I also updated other > ifdefs belonging to TraceLinearScanLevel appropriately. > > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ > > Best regards, > Christian > >> >> Thanks, >> Vladimir K >> >>> >>> Best regards, >>> Christian >>> >>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>> >>>> But the flag is available only in DEBUG build: >>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>> >>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>> Hi >>>>> >>>>> Please review the following enhancement for C1: >>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>> >>>>> While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure >>>>> out what was going on. I think it would be useful to have this code around for the analysis of future C1 register >>>>> allocator bugs. >>>>> >>>>> This RFE adds (everything non-product code): >>>>> - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. >>>>> - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and >>>>> parent. >>>>> - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful >>>>> in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be >>>>> manually looked up in other logs. >>>>> >>>>> I additionally did some cleanup of the touched code. >>>>> >>>>> We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of >>>>> the register allocation algorithm. It currently just prints too much details on the higher levels. You often find >>>>> yourself being interested in a specific part of the algorithm and only want to know more details there. To achieve >>>>> that you now you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to >>>>> clean this up if it's worth the effort - given that there are not many new issues filed for C1 register allocation >>>>> today. >>>>> >>>>> Thank you! 
>>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>> From vladimir.kozlov at oracle.com Tue Aug 18 19:14:01 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 12:14:01 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> Message-ID: <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> I would suggest to run test with -XX:+PrintCodeCache flag which prints CodeCache usage on exit. Also add '-ea -esa' flags - some runs failed with them because they increase Graal's methods size. Running test with immediately caused OOM error on my local linux machine: '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal' With -XX:ReservedCodeCacheSize=30m I got: [11.217s][warning][codecache] CodeCache is full. Compiler has been disabled. [11.217s][warning][codecache] Try increasing the code cache size using -XX:ReservedCodeCacheSize= With -XX:ReservedCodeCacheSize=50m I got this output: CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb May be you need to set it to 35m or better to 50m to be safe. Note, without Graal test uses only 5.5m: CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb ----------------------------- I also forgot to ask you to update test's Copyright year. Regards, Vladimir K On 8/18/20 1:10 AM, Fairoz Matte wrote: > Hi Vladimir, > > Thanks for looking into. > This is intermittent crash, and is reproducible in windows debug build environment. Below is the testing performed. > > 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" > 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" > > Thanks, > Fairoz > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Monday, August 17, 2020 11:22 PM >> To: Fairoz Matte ; hotspot-compiler- >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >> Cc: Coleen Phillimore ; Dean Long >> >> Subject: Re: RFR(s): 8248295: >> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal >> >> Hi Fairoz, >> >> How you determine that +10Mb is enough with Graal? >> >> Thanks, >> Vladimir >> >> On 8/17/20 5:46 AM, Fairoz Matte wrote: >>> Hi, >>> >>> >>> >>> Please review this small test change to work with Graal. >>> >>> >>> >>> Background: >>> >>> Graal require more code cache compared to c1/c2. but the test case always >> set it to 20MB. This may not be sufficient when running graal. >>> >>> Default configuration for ReservedCodeCacheSize = 250MB >>> >>> With graal enabled, ReservedCodeCacheSize = 350MB >>> >>> >>> >>> Either we can modify the framework to honor ReservedCodeCacheSize for >> graal or just update the testcase. >>> >>> There are not many test cases they rely on ReservedCodeCacheSize or >> InitialCodeCacheSize. So the fix prefer the later one. 
>>> >>> >>> >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 >>> >>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ >>> >>> >>> >>> Thanks, >>> >>> Fairoz >>> >>> >>> From evgeny.nikitin at oracle.com Tue Aug 18 19:40:45 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Tue, 18 Aug 2020 21:40:45 +0200 Subject: RFR(XS): 8208257: Un-quarantine vmTestbase/vm/mlvm/meth/func/jdi/breakpointOtherStratum Message-ID: <1b7315b8-98be-116d-d037-e3bb17f55f1b@oracle.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8208257 Webrev: http://cr.openjdk.java.net/~enikitin/8208257/webrev.00/JDK-8208257.patch I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it from ProblemList.txt. Second change is marking the test with randomness keyword from the https://bugs.openjdk.java.net/browse/JDK-8243427 (using reproducible random for mlvm tests). The change has been checked in mach5 for windows, macosx, linux in x64-debug, approx. 100 runs on each platform (passed). Please review, /Evgeny Nikitin. From martin.doerr at sap.com Tue Aug 18 21:25:50 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 18 Aug 2020 21:25:50 +0000 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> Message-ID: Hi Vladimir, we are hitting the following assertion after this change was pushed: assert(my_pack(s) == __null) failed: only in one pack Stack: V [jvm.dll+0xbbac55] SuperWord::construct_my_pack_map+0x135 (superword.cpp:1723) V [jvm.dll+0xbb57f7] SuperWord::SLP_extract+0x427 (superword.cpp:520) V [jvm.dll+0xbcba0b] SuperWord::transform_loop+0x48b (superword.cpp:170) V [jvm.dll+0x895a09] PhaseIdealLoop::build_and_optimize+0xef9 (loopnode.cpp:3270) V [jvm.dll+0x3df4b6] Compile::Optimize+0xf76 (compile.cpp:2187) ... Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. (May depend on CPU model.) Is this a known issue? Or should I open a bug? Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Vladimir Kozlov > Sent: Montag, 10. August 2020 19:03 > To: hotspot compiler > Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream > and a for cycle causes jre crash > > Thank you, Vladimir > > Vladimir K > > On 8/10/20 2:04 AM, Vladimir Ivanov wrote: > > > >> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > > > > Looks good. > > > > Best regards, > > Vladimir Ivanov > > > >> https://bugs.openjdk.java.net/browse/JDK-8249749 > >> > >> SuperWord does not recognize array indexing pattern used in the test > due to additional AddI node: > >> > >> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > >> > >> As result it can't find memory reference to align vectors. But code ignores > that and continue execution. > >> Later when align_to_ref is referenced we hit SEGV because it is NULL. > >> > >> The fix is to check align_to_ref for NULL early and bailout. > >> > >> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize > this address pattern to vectorize test's code. > >> And added missing _invar setting. > >> > >> And I slightly modified tracking code to investigate this issue. 
> >> > >> Added new test to check some complex address expressions similar to > bug's test case. Not all cases in test are > >> vectorized - there are other conditions which prevent that. > >> > >> Tested tier1,tier2,hs-tier3,precheckin-comp > >> > >> Thanks, > >> Vladimir K From vladimir.kozlov at oracle.com Tue Aug 18 21:57:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 14:57:42 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> Message-ID: <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> Thank you for reporting, Martin Please, file bug and specify JDK version, VM flags and CPUID features on machine where it fail. We test on x86 and aarch64 and I did not see any issues so far. Regards, Vladimir On 8/18/20 2:25 PM, Doerr, Martin wrote: > Hi Vladimir, > > we are hitting the following assertion after this change was pushed: > assert(my_pack(s) == __null) failed: only in one pack > > Stack: > V [jvm.dll+0xbbac55] SuperWord::construct_my_pack_map+0x135 (superword.cpp:1723) > V [jvm.dll+0xbb57f7] SuperWord::SLP_extract+0x427 (superword.cpp:520) > V [jvm.dll+0xbcba0b] SuperWord::transform_loop+0x48b (superword.cpp:170) > V [jvm.dll+0x895a09] PhaseIdealLoop::build_and_optimize+0xef9 (loopnode.cpp:3270) > V [jvm.dll+0x3df4b6] Compile::Optimize+0xf76 (compile.cpp:2187) > ... > > Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. > (May depend on CPU model.) > > Is this a known issue? > Or should I open a bug? > > Best regards, > Martin > > >> -----Original Message----- >> From: hotspot-compiler-dev > retn at openjdk.java.net> On Behalf Of Vladimir Kozlov >> Sent: Montag, 10. August 2020 19:03 >> To: hotspot compiler >> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream >> and a for cycle causes jre crash >> >> Thank you, Vladimir >> >> Vladimir K >> >> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: >>> >>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >>> >>> Looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> https://bugs.openjdk.java.net/browse/JDK-8249749 >>>> >>>> SuperWord does not recognize array indexing pattern used in the test >> due to additional AddI node: >>>> >>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >>>> >>>> As result it can't find memory reference to align vectors. But code ignores >> that and continue execution. >>>> Later when align_to_ref is referenced we hit SEGV because it is NULL. >>>> >>>> The fix is to check align_to_ref for NULL early and bailout. >>>> >>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize >> this address pattern to vectorize test's code. >>>> And added missing _invar setting. >>>> >>>> And I slightly modified tracking code to investigate this issue. >>>> >>>> Added new test to check some complex address expressions similar to >> bug's test case. Not all cases in test are >>>> vectorized - there are other conditions which prevent that. 
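The new regression test mentioned in the quoted summary ships as compiler/vectorization/TestComplexAddrExpr.java and is not reproduced in this digest. Purely as a hypothetical idea of what "complex address expressions" means here, kernels along the following lines vary the invariant part of the array index; the method names and exact expressions are invented, and only some such shapes end up being vectorized:

// Each kernel mixes the invariants (j, k, n) and the induction variable i
// differently; SuperWord accepts some of these shapes and bails out on others
// for unrelated reasons (alignment, dependences, unrolling limits, ...).
static void addr1(int[] a, int n, int j) {
    for (int i = 0; i < n - 1; i++) { a[j * n + i] += 1; }
}
static void addr2(int[] a, int n, int j) {
    for (int i = 0; i < n - 1; i++) { a[j * n + i + 1] += 1; }
}
static void addr3(int[] a, int n, int j, int k) {
    for (int i = 0; i < n - 1; i++) { a[j * n + k + i] += 1; }
}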
>>>> >>>> Tested tier1,tier2,hs-tier3,precheckin-comp >>>> >>>> Thanks, >>>> Vladimir K From vladimir.kozlov at oracle.com Tue Aug 18 22:03:32 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 15:03:32 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> Message-ID: <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> I reproduced it with -XX:UseAVX=0. I will file bug and take care of it. Thanks, Vladimir K On 8/18/20 2:57 PM, Vladimir Kozlov wrote: > Thank you for reporting, Martin > > Please, file bug and specify JDK version, VM flags and CPUID features on machine where it fail. > > We test on x86 and aarch64 and I did not see any issues so far. > > Regards, > Vladimir > > On 8/18/20 2:25 PM, Doerr, Martin wrote: >> Hi Vladimir, >> >> we are hitting the following assertion after this change was pushed: >> assert(my_pack(s) == __null) failed: only in one pack >> >> Stack: >> V? [jvm.dll+0xbbac55]? SuperWord::construct_my_pack_map+0x135? (superword.cpp:1723) >> V? [jvm.dll+0xbb57f7]? SuperWord::SLP_extract+0x427? (superword.cpp:520) >> V? [jvm.dll+0xbcba0b]? SuperWord::transform_loop+0x48b? (superword.cpp:170) >> V? [jvm.dll+0x895a09]? PhaseIdealLoop::build_and_optimize+0xef9? (loopnode.cpp:3270) >> V? [jvm.dll+0x3df4b6]? Compile::Optimize+0xf76? (compile.cpp:2187) >> ... >> >> Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. >> (May depend on CPU model.) >> >> Is this a known issue? >> Or should I open a bug? >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> retn at openjdk.java.net> On Behalf Of Vladimir Kozlov >>> Sent: Montag, 10. August 2020 19:03 >>> To: hotspot compiler >>> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream >>> and a for cycle causes jre crash >>> >>> Thank you, Vladimir >>> >>> Vladimir K >>> >>> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: >>>> >>>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >>>> >>>> Looks good. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8249749 >>>>> >>>>> SuperWord does not recognize array indexing pattern used in the test >>> due to additional AddI node: >>>>> >>>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >>>>> >>>>> As result it can't find memory reference to align vectors. But code ignores >>> that and continue execution. >>>>> Later when align_to_ref is referenced we hit SEGV because it is NULL. >>>>> >>>>> The fix is to check align_to_ref for NULL early and bailout. >>>>> >>>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize >>> this address pattern to vectorize test's code. >>>>> And added missing _invar setting. >>>>> >>>>> And I slightly modified tracking code to investigate this issue. >>>>> >>>>> Added new test to check some complex address expressions similar to >>> bug's test case. Not all cases in test are >>>>> vectorized - there are other conditions which prevent that. 
>>>>> >>>>> Tested tier1,tier2,hs-tier3,precheckin-comp >>>>> >>>>> Thanks, >>>>> Vladimir K From vladimir.kozlov at oracle.com Tue Aug 18 22:10:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 15:10:20 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> Message-ID: <860caa40-38f7-baff-54dc-3e6802a64425@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8251994 On 8/18/20 3:03 PM, Vladimir Kozlov wrote: > I reproduced it with -XX:UseAVX=0. > > I will file bug and take care of it. > > Thanks, > Vladimir K > > On 8/18/20 2:57 PM, Vladimir Kozlov wrote: >> Thank you for reporting, Martin >> >> Please, file bug and specify JDK version, VM flags and CPUID features on machine where it fail. >> >> We test on x86 and aarch64 and I did not see any issues so far. >> >> Regards, >> Vladimir >> >> On 8/18/20 2:25 PM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> we are hitting the following assertion after this change was pushed: >>> assert(my_pack(s) == __null) failed: only in one pack >>> >>> Stack: >>> V? [jvm.dll+0xbbac55]? SuperWord::construct_my_pack_map+0x135? (superword.cpp:1723) >>> V? [jvm.dll+0xbb57f7]? SuperWord::SLP_extract+0x427? (superword.cpp:520) >>> V? [jvm.dll+0xbcba0b]? SuperWord::transform_loop+0x48b? (superword.cpp:170) >>> V? [jvm.dll+0x895a09]? PhaseIdealLoop::build_and_optimize+0xef9? (loopnode.cpp:3270) >>> V? [jvm.dll+0x3df4b6]? Compile::Optimize+0xf76? (compile.cpp:2187) >>> ... >>> >>> Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. >>> (May depend on CPU model.) >>> >>> Is this a known issue? >>> Or should I open a bug? >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev >>> retn at openjdk.java.net> On Behalf Of Vladimir Kozlov >>>> Sent: Montag, 10. August 2020 19:03 >>>> To: hotspot compiler >>>> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream >>>> and a for cycle causes jre crash >>>> >>>> Thank you, Vladimir >>>> >>>> Vladimir K >>>> >>>> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: >>>>> >>>>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >>>>> >>>>> Looks good. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8249749 >>>>>> >>>>>> SuperWord does not recognize array indexing pattern used in the test >>>> due to additional AddI node: >>>>>> >>>>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >>>>>> >>>>>> As result it can't find memory reference to align vectors. But code ignores >>>> that and continue execution. >>>>>> Later when align_to_ref is referenced we hit SEGV because it is NULL. >>>>>> >>>>>> The fix is to check align_to_ref for NULL early and bailout. >>>>>> >>>>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize >>>> this address pattern to vectorize test's code. >>>>>> And added missing _invar setting. >>>>>> >>>>>> And I slightly modified tracking code to investigate this issue. >>>>>> >>>>>> Added new test to check some complex address expressions similar to >>>> bug's test case. 
Not all cases in test are >>>>>> vectorized - there are other conditions which prevent that. >>>>>> >>>>>> Tested tier1,tier2,hs-tier3,precheckin-comp >>>>>> >>>>>> Thanks, >>>>>> Vladimir K From igor.ignatyev at oracle.com Tue Aug 18 22:43:49 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 18 Aug 2020 15:43:49 -0700 Subject: RFR(XS): 8208257: Un-quarantine vmTestbase/vm/mlvm/meth/func/jdi/breakpointOtherStratum In-Reply-To: <1b7315b8-98be-116d-d037-e3bb17f55f1b@oracle.com> References: <1b7315b8-98be-116d-d037-e3bb17f55f1b@oracle.com> Message-ID: <6F5EA02D-CAC0-405B-B386-7FD2B7BA37EA@oracle.com> Hi Evgeny, looks good to me, you will need to update 8208257's title in JBS and close 8058176 as CNR. Cheers, -- Igor > On Aug 18, 2020, at 12:40 PM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8208257 > Webrev: http://cr.openjdk.java.net/~enikitin/8208257/webrev.00/JDK-8208257.patch > > I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it from ProblemList.txt. > > Second change is marking the test with randomness keyword from the https://bugs.openjdk.java.net/browse/JDK-8243427 (using reproducible random for mlvm tests). > > The change has been checked in mach5 for windows, macosx, linux in x64-debug, approx. 100 runs on each platform (passed). > > Please review, > /Evgeny Nikitin. From igor.ignatyev at oracle.com Tue Aug 18 23:42:19 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 18 Aug 2020 16:42:19 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase Message-ID: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ > 0 lines changed: 0 ins; 0 del; 0 mod; Hi all, could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 Thanks, -- Igor From joserz at linux.ibm.com Wed Aug 19 00:24:32 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Tue, 18 Aug 2020 21:24:32 -0300 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> Message-ID: <20200819002432.GA915540@pacoca> Hallo Martin! Thank you very much for your review. Here is the v3: Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 I run a functional test and it's working as expected. If you try to run it in a system Unfortunately, I couldn?t find a Power10 machine in my garage ?? ???????? 
This is the code I use to test: 8<--------------------------------------------------------------- import java.io.IOException; class ReverseBytes { public static void main(String[] args) throws IOException { for (int i = 0; i < 1000000; ++i) { if (Integer.reverseBytes(0x12345678) != 0x78563412) { throw new RuntimeException(); } if (Long.reverseBytes(0x123456789ABCDEF0L) != 0xF0DEBC9A78563412L) { throw new RuntimeException(); } if (Short.reverseBytes((short)0x1234) != (short)0x3412) { throw new RuntimeException(); } if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { throw new RuntimeException(); } } System.out.println("ok"); } } 8<--------------------------------------------------------------- Best regards! Jose On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > Hi Michihiro and Jose, > > I had only done a quick review during my vacation. Thanks for updating the description of PowerArchitecturePPC64. > After taking a second look, I have a few minor requests. Sorry for that. > > > * ?UseByteReverseInstructions? (plural) would be more consistent with other names. > * Please add ?size? specifications to the ppc.ad file. Otherwise, the compiler has to determine sizes dynamically every time. > * bytes_reverse_short: ?format? specification misses ?extsh?. > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > So we rely on your testing. > > Thanks and best regards, > Martin > > > From: Michihiro Horie > Sent: Dienstag, 18. August 2020 09:28 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions > > > Jose, > Latest change looks good also to me. > > Marin, > Do you think if I can push the change? > > Best regards, > Michihiro > > > ----- Original message ----- > From: "Doerr, Martin" > > To: "joserz at linux.ibm.com" > > Cc: hotspot compiler >, "horie at jp.ibm.com" > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions > Date: Wed, Jul 1, 2020 4:01 AM > > Thanks for the much better flag description. > Looks good. > > Best regards, > Martin > > > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" >: > > > > ?Hello team, > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > Thank you!! > > > > Jose > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > >> Hi Jose, > >> > >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? > >> > >> Please update the Copyright year in vm_version_poc.hpp. > >> > >> I can?t test the change, but it looks good to me. > >> > >> Best regards, > >> Martin > >> > >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" >: > >>> > >>> ?Hello team! > >>> > >>> This patch introduces Power10 to OpenJDK and implements three new instructions: > >>> - brh - byte-reverse halfword > >>> - brw - byte-reverse word > >>> - brd - byte-reverse doubleword > >>> > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > >>> > >>> Thanks for your review! > >>> > >>> Jose R. 
Ziviani > From aph at redhat.com Wed Aug 19 08:35:57 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Aug 2020 09:35:57 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: On 18/08/2020 16:05, Dmitry Chuyko wrote: > Some more results for a benchmark with reduce(): > > -XX:-UseSignumIntrinsic > DoubleOrigSignum.ofMostlyNaN 0.914 ? 0.001 ns/op > DoubleOrigSignum.ofMostlyNeg 1.178 ? 0.001 ns/op > DoubleOrigSignum.ofMostlyPos 1.176 ? 0.017 ns/op > DoubleOrigSignum.ofMostlyZero 0.803 ? 0.001 ns/op > DoubleOrigSignum.ofRandom 1.175 ? 0.012 ns/op > -XX:+UseSignumIntrinsic > DoubleOrigSignum.ofMostlyNaN 1.040 ? 0.007 ns/op > DoubleOrigSignum.ofMostlyNeg 1.040 ? 0.004 ns/op > DoubleOrigSignum.ofMostlyPos 1.039 ? 0.003 ns/op > DoubleOrigSignum.ofMostlyZero 1.040 ? 0.001 ns/op > DoubleOrigSignum.ofRandom 1.040 ? 0.003 ns/op That's almost no difference, isn't it? Down in the noise. > If we only intrinsify copySign() we lose free mask that we get from > facgt. In such case improvement (for signum) decreases like from ~30% to > ~15%, and it also greatly depends on the particular HW. We can > additionally introduce an intrinsic for Math.copySign(), especially it > makes sense for float where it can be just 2 fp instructions: movi+bsl > (fmovd+fnegd+bsl for double). I think this is worth doing, because moves between GPRs and vector regs tend to have a long latency. Can you please add that, and we can all try it on our various hardware. We're measuring two different things, throughput and latency. The first JMH test you provided was really testing latency, because Blackhole waits for everything to complete. [ Note to self: Blackhole.consume() seems to be particularly slow on some AArch64 implementations because it uses a volatile read. What seems to be happening, judging by how long it takes, is that the store buffer is drained before the volatile read. Maybe some other construct would work better but still provide the guarantees Blackhole.consume() needs. ] For throughput we want to keep everything moving. Sure, sometimes we are going to have to wait for some calculation to complete, so if we can improve latency without adverse cost we should. For that, staying in the vector regs helps. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
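To make the latency-versus-throughput point above concrete, here is a minimal, hypothetical JMH sketch; it is not Dmitry's DoubleOrigSignum benchmark, and the class, method and field names are invented. The first method hands every result to Blackhole.consume(), so it largely measures the dependent latency of Math.signum(); the second folds the results into a reduction, which lets independent iterations overlap and behaves more like a throughput test. Either can be run with -XX:+UseSignumIntrinsic and -XX:-UseSignumIntrinsic to reproduce the kind of comparison quoted above:

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class SignumSketch {
    double[] values;

    @Setup
    public void setup() {
        Random r = new Random(42);
        values = new double[1024];
        for (int i = 0; i < values.length; i++) {
            values[i] = r.nextDouble() - 0.5;   // mix of signs
        }
    }

    // Latency-flavoured: each result goes through Blackhole.consume(), which
    // serialises the loop (and is itself costly on some AArch64 parts).
    @Benchmark
    public void perElement(Blackhole bh) {
        for (double v : values) {
            bh.consume(Math.signum(v));
        }
    }

    // Throughput-flavoured: reduce into one accumulator, return it once.
    @Benchmark
    public double reduced() {
        double sum = 0.0;
        for (double v : values) {
            sum += Math.signum(v);
        }
        return sum;
    }
}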
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Wed Aug 19 08:37:16 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Wed, 19 Aug 2020 16:37:16 +0800 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() Message-ID: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ We see this crash occasionally when testing with Graal on some AArch64 systems: # # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 # assert(external_guard || result != __null) failed: Invalid JNI handle # V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc The full hs_err file is attached to the JBS entry. The handle here is _HotSpotJVMCIRuntime_instance which is initialised in JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); JVMCICompiler::force_comp_at_level_simple() checks whether the _object field inside the handle is null before calling JNIHandles::resolve() on it, which should avoid the above assertion failure where the pointee is null. However on a non-TSO architecture another thread may observe the store to _object when assigning _HotSpotJVMCIRuntime_instance before the store in JVMCIEnv::make_global() that initialises the pointed-to oop. We need to add a store-store barrier here to force the expected ordering. Tested with jcstress and Graal on the affected machine, which used to reproduce it quite reliably. -- Thanks, Nick From ningsheng.jian at arm.com Wed Aug 19 09:53:45 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Wed, 19 Aug 2020 17:53:45 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: Hi Andrew, I have updated the patch based on the review comments. Would you mind taking another look? Thanks! Full: http://cr.openjdk.java.net/~njian/8231441/webrev.04/ Incremental: http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ Also add build-dev, as there's a makefile change. And the split parts: 1) SVE feature detection: http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature 2) c2 register allocation: http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra 3) SVE c2 backend: http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 JTreg tests are still running, and so far no new failure found. Thanks, Ningsheng On 8/17/20 5:16 PM, Andrew Dinn wrote: > Hi Pengfei, > > On 17/08/2020 07:00, Ningsheng Jian wrote: >> Thanks a lot for the review! 
Sorry for the late reply, as I was on >> vacation last week. And thanks to Pengfei and Joshua for helping >> clarifying some details in the patch. > > Yes, they did a very good job of answering most of the pending questions. > >>> I also eyeballed /some/ of the generated code to check that it looked >>> ok. I'd really like to be able to do that systematically for a >>> comprehensive test suite that exercised every rule but I only had the >>> machine for a few days. This really ought to be done as a follow-up to >>> ensure that all the rules are working as expected. >> >> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >> in future. > > I'm fine with that as a follow-up patch if you raise a JIRA for it. > >>> I am not clear why you are choosing to re-init ptrue after certain JVM >>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>> when we call a JVM_ENTRY. Could you explain the rationale you have >>> followed here? >> >> We do the re-init at any possible return points to c2 code, not in any >> runtime c++ functions, which will reduce the re-init calls. >> >> Actually I found those entries by some hack of jvm. In the hacky code >> below we use gcc option -finstrument-functions to build hotspot. With >> this option, each C/C++ function entry/exit will call the instrument >> functions we defined. In instrument functions, we clobber p7 (or other >> reg for test) register, and in c2 function return we verify that p7 (or >> other reg) has been reinitialized. >> >> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch > > Nice work. It's very good to have that documented. I'm willing to accept > i) that this has found all current cases and ii) that the verify will > catch any cases that might get introduced by future changes (e.g. the > callout introduced by ZGC that you mention below). As the above mot say > there is a slim chance this might have missed some cases but I think it > is pretty unlikely. > > >>> Specific Comments (register allocator webrev): >>> >>> >>> aarch64.ad:97-100 >>> >>> Why have you added a reg_def for R8 and R9 here and also to alloc_class >>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>> >> >> I think Pengfei has helped to explain that. I will either add clear >> comments or rename the register name as you suggested. > > Ok, good. > >> As Joshua clarified, we are also working on predicate scalable reg, >> which is not in this patch. Thanks for the suggestion, I will try to >> refactor this a bit. > > Ok, I'll wait for an updated patch. Are you planning to include the > scalable predicate reg code as part of this patch? I think that would be > better as it would help to clarify the need to distinguish vector regs > as a subset of scalable regs. > >>> zBarrierSetAssembler_aarch64.cpp:434 >>> >>> Can you explain why we need to check p7 here and not do so in other >>> places where we call into the JVM? I'm not saying this is wrong. I just >>> want to know how you decided where re-init of p7 was needed. >>> >> >> Actually I found this by my hack patch above while running jtreg tests. >> The stub slowpath here can be a c++ function. > > Yes, good catch. > >>> superword.cpp:97 >>> >>> Does this mean that is someone sets the maximum vector size to a >>> non-power of two, such as 384, all superword operations will be >>> bypassed? Including those which can be done using NEON vectors? >>> >> >> Current SLP vectorizer only supports power-of-2 vector size. 
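As an aside on the MaxVectorSize exchange just above: the flag is specified in bytes, only power-of-two values are currently honoured by the SLP vectorizer, and a plain counted loop such as the hypothetical sketch below is the kind of code that gets widened. Whether it is vectorized at all, and at what width, depends on the detected CPU features and on the value the VM actually accepts (for example the SVE fallback to 32 described in the next paragraph):

// Try e.g. -XX:MaxVectorSize=16, 32 or 64 (bytes, power of two). The arrays
// are assumed to have the same length; the method itself is illustrative.
static void scale(float[] a, float[] b) {
    for (int i = 0; i < a.length; i++) {
        a[i] = b[i] * 2.0f;
    }
}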
We are >> trying to work out a new vectorizer to support all SVE vector sizes, so >> we would expect a size like 384 could go to that path. I tried current >> patch on a 512-bit SVE hardware which does not support 384-bit: >> >> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >> openjdk version "16-internal" 2021-03-16 >> >> $ java -XX:MaxVectorSize=48 -version >> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >> vector length 32. Set MaxVectorSize to 32 >> >> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >> instead of unsupported 48: >> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >> >> Do you think we need to exit vm instead of warning and fallbacking to 32 >> here? > > Yes, I think a vm exit would probably be a better choice. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From martin.doerr at sap.com Wed Aug 19 09:55:50 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 19 Aug 2020 09:55:50 +0000 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200819002432.GA915540@pacoca> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: Hi Jose, thanks for the update. I have never seen 2 format specifications in the ad file. Does that work or does the 2nd one overwrite the 1st one? I think it should be: format %{ "BRH $dst, $src\n\t" "EXTSH $dst, $dst" %} I don't need to see another webrev for that. Otherwise, the change looks good. Thanks for contributing. Best regards, Martin > -----Original Message----- > From: joserz at linux.ibm.com > Sent: Mittwoch, 19. August 2020 02:25 > To: Doerr, Martin > Cc: Michihiro Horie ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > byte-reverse instructions > > Hallo Martin! > > Thank you very much for your review. Here is the v3: > > Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > I run a functional test and it's working as expected. If you try to run it in a > system > $ java -XX:+UseByteReverseInstructions ReverseBytes > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > but needs at least Power10. > (continue with existing code) > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > ???????? > > This is the code I use to test: > 8<--------------------------------------------------------------- > import java.io.IOException; > > class ReverseBytes > { > public static void main(String[] args) throws IOException > { > for (int i = 0; i < 1000000; ++i) { > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > throw new RuntimeException(); > } > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > 0xF0DEBC9A78563412L) { > throw new RuntimeException(); > } > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > throw new RuntimeException(); > } > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > throw new RuntimeException(); > } > } > System.out.println("ok"); > } > } > 8<--------------------------------------------------------------- > > Best regards! 
> > Jose > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > Hi Michihiro and Jose, > > > > I had only done a quick review during my vacation. Thanks for updating the > description of PowerArchitecturePPC64. > > After taking a second look, I have a few minor requests. Sorry for that. > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent with > other names. > > * Please add ?size? specifications to the ppc.ad file. Otherwise, the > compiler has to determine sizes dynamically every time. > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > So we rely on your testing. > > > > Thanks and best regards, > > Martin > > > > > > From: Michihiro Horie > > Sent: Dienstag, 18. August 2020 09:28 > > To: Doerr, Martin > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > byte-reverse instructions > > > > > > Jose, > > Latest change looks good also to me. > > > > Marin, > > Do you think if I can push the change? > > > > Best regards, > > Michihiro > > > > > > ----- Original message ----- > > From: "Doerr, Martin" > > > > To: "joserz at linux.ibm.com" > > > > Cc: hotspot compiler dev at openjdk.java.net>, > "horie at jp.ibm.com" > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Date: Wed, Jul 1, 2020 4:01 AM > > > > Thanks for the much better flag description. > > Looks good. > > > > Best regards, > > Martin > > > > > Am 30.06.2020 um 02:15 schrieb > "joserz at linux.ibm.com" > >: > > > > > > ?Hello team, > > > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > Thank you!! > > > > > > Jose > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > >> Hi Jose, > > >> > > >> Can you replace the outdated description of PowerArchitecturePPC64 in > globals_poc.hpp by something generic, please? > > >> > > >> Please update the Copyright year in vm_version_poc.hpp. > > >> > > >> I can?t test the change, but it looks good to me. > > >> > > >> Best regards, > > >> Martin > > >> > > >>>> Am 26.06.2020 um 20:29 schrieb > "joserz at linux.ibm.com" > >: > > >>> > > >>> ?Hello team! > > >>> > > >>> This patch introduces Power10 to OpenJDK and implements three new > instructions: > > >>> - brh - byte-reverse halfword > > >>> - brw - byte-reverse word > > >>> - brd - byte-reverse doubleword > > >>> > > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > >>> > > >>> Thanks for your review! > > >>> > > >>> Jose R. 
Ziviani > > From magnus.ihse.bursie at oracle.com Wed Aug 19 10:05:01 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Wed, 19 Aug 2020 12:05:01 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> On 2020-08-19 11:53, Ningsheng Jian wrote: > Hi Andrew, > > I have updated the patch based on the review comments. Would you mind > taking another look? Thanks! > > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.04/ Build changes look good. Thank you for remembering to cc build-dev! This is maybe not relevant, but I was surprised to find src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, and b) the name implies that it is a test, even though that it resides in src. Is this really proper? /Magnus > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ > > Also add build-dev, as there's a makefile change. > > And the split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 > CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 > > JTreg tests are still running, and so far no new failure found. > > Thanks, > Ningsheng > > On 8/17/20 5:16 PM, Andrew Dinn wrote: >> Hi Pengfei, >> >> On 17/08/2020 07:00, Ningsheng Jian wrote: >>> Thanks a lot for the review! Sorry for the late reply, as I was on >>> vacation last week. And thanks to Pengfei and Joshua for helping >>> clarifying some details in the patch. >> >> Yes, they did a very good job of answering most of the pending >> questions. >> >>>> I also eyeballed /some/ of the generated code to check that it looked >>>> ok. I'd really like to be able to do that systematically for a >>>> comprehensive test suite that exercised every rule but I only had the >>>> machine for a few days. This really ought to be done as a follow-up to >>>> ensure that all the rules are working as expected. >>> >>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >>> in future. >> >> I'm fine with that as a follow-up patch if you raise a JIRA for it. >> >>>> I am not clear why you are choosing to re-init ptrue after certain JVM >>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>>> when we call a JVM_ENTRY. Could you explain the rationale you have >>>> followed here? >>> >>> We do the re-init at any possible return points to c2 code, not in any >>> runtime c++ functions, which will reduce the re-init calls. >>> >>> Actually I found those entries by some hack of jvm. In the hacky code >>> below we use gcc option -finstrument-functions to build hotspot. With >>> this option, each C/C++ function entry/exit will call the instrument >>> functions we defined. In instrument functions, we clobber p7 (or other >>> reg for test) register, and in c2 function return we verify that p7 (or >>> other reg) has been reinitialized. 
>>> >>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch >>> >> >> Nice work. It's very good to have that documented. I'm willing to accept >> i) that this has found all current cases and ii) that the verify will >> catch any cases that might get introduced by future changes (e.g. the >> callout introduced by ZGC that you mention below). As the above mot say >> there is a slim chance this might have missed some cases but I think it >> is pretty unlikely. >> >> >>>> Specific Comments (register allocator webrev): >>>> >>>> >>>> aarch64.ad:97-100 >>>> >>>> Why have you added a reg_def for R8 and R9 here and also to >>>> alloc_class >>>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>>> >>> >>> I think Pengfei has helped to explain that. I will either add clear >>> comments or rename the register name as you suggested. >> >> Ok, good. >> >>> As Joshua clarified, we are also working on predicate scalable reg, >>> which is not in this patch. Thanks for the suggestion, I will try to >>> refactor this a bit. >> >> Ok, I'll wait for an updated patch. Are you planning to include the >> scalable predicate reg code as part of this patch? I think that would be >> better as it would help to clarify the need to distinguish vector regs >> as a subset of scalable regs. >> >>>> zBarrierSetAssembler_aarch64.cpp:434 >>>> >>>> Can you explain why we need to check p7 here and not do so in other >>>> places where we call into the JVM? I'm not saying this is wrong. I >>>> just >>>> want to know how you decided where re-init of p7 was needed. >>>> >>> >>> Actually I found this by my hack patch above while running jtreg tests. >>> The stub slowpath here can be a c++ function. >> >> Yes, good catch. >> >>>> superword.cpp:97 >>>> >>>> Does this mean that is someone sets the maximum vector size to a >>>> non-power of two, such as 384, all superword operations will be >>>> bypassed? Including those which can be done using NEON vectors? >>>> >>> >>> Current SLP vectorizer only supports power-of-2 vector size. We are >>> trying to work out a new vectorizer to support all SVE vector sizes, so >>> we would expect a size like 384 could go to that path. I tried current >>> patch on a 512-bit SVE hardware which does not support 384-bit: >>> >>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >>> openjdk version "16-internal" 2021-03-16 >>> >>> $ java -XX:MaxVectorSize=48 -version >>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >>> vector length 32. Set MaxVectorSize to 32 >>> >>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >>> instead of unsupported 48: >>> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >>> >>> Do you think we need to exit vm instead of warning and fallbacking >>> to 32 >>> here? >> >> Yes, I think a vm exit would probably be a better choice. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Red Hat Distinguished Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. 
03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> > From martin.doerr at sap.com Wed Aug 19 10:16:37 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 19 Aug 2020 10:16:37 +0000 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <860caa40-38f7-baff-54dc-3e6802a64425@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> <860caa40-38f7-baff-54dc-3e6802a64425@oracle.com> Message-ID: Hi Vladimir, thank you for taking care of it. It's good to know that 11u is also affected. Best regards, Martin > -----Original Message----- > From: Vladimir Kozlov > Sent: Mittwoch, 19. August 2020 00:10 > To: Doerr, Martin ; hotspot compiler compiler-dev at openjdk.java.net> > Cc: Zeller, Arno > Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream > and a for cycle causes jre crash > > https://bugs.openjdk.java.net/browse/JDK-8251994 > > On 8/18/20 3:03 PM, Vladimir Kozlov wrote: > > I reproduced it with -XX:UseAVX=0. > > > > I will file bug and take care of it. > > > > Thanks, > > Vladimir K > > > > On 8/18/20 2:57 PM, Vladimir Kozlov wrote: > >> Thank you for reporting, Martin > >> > >> Please, file bug and specify JDK version, VM flags and CPUID features on > machine where it fail. > >> > >> We test on x86 and aarch64 and I did not see any issues so far. > >> > >> Regards, > >> Vladimir > >> > >> On 8/18/20 2:25 PM, Doerr, Martin wrote: > >>> Hi Vladimir, > >>> > >>> we are hitting the following assertion after this change was pushed: > >>> assert(my_pack(s) == __null) failed: only in one pack > >>> > >>> Stack: > >>> > V? [jvm.dll+0xbbac55]? SuperWord::construct_my_pack_map+0x135? (superw > ord.cpp:1723) > >>> > V? [jvm.dll+0xbb57f7]? SuperWord::SLP_extract+0x427? (superword.cpp:520) > >>> > V? [jvm.dll+0xbcba0b]? SuperWord::transform_loop+0x48b? (superword.cpp: > 170) > >>> > V? [jvm.dll+0x895a09]? PhaseIdealLoop::build_and_optimize+0xef9? (loopnod > e.cpp:3270) > >>> V? [jvm.dll+0x3df4b6]? Compile::Optimize+0xf76? (compile.cpp:2187) > >>> ... > >>> > >>> Seems to be reproducible by JTREG test > compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 > machines. > >>> (May depend on CPU model.) > >>> > >>> Is this a known issue? > >>> Or should I open a bug? > >>> > >>> Best regards, > >>> Martin > >>> > >>> > >>>> -----Original Message----- > >>>> From: hotspot-compiler-dev >>>> retn at openjdk.java.net> On Behalf Of Vladimir Kozlov > >>>> Sent: Montag, 10. August 2020 19:03 > >>>> To: hotspot compiler > >>>> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a > stream > >>>> and a for cycle causes jre crash > >>>> > >>>> Thank you, Vladimir > >>>> > >>>> Vladimir K > >>>> > >>>> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: > >>>>> > >>>>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > >>>>> > >>>>> Looks good. > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>>> https://bugs.openjdk.java.net/browse/JDK-8249749 > >>>>>> > >>>>>> SuperWord does not recognize array indexing pattern used in the > test > >>>> due to additional AddI node: > >>>>>> > >>>>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > >>>>>> > >>>>>> As result it can't find memory reference to align vectors. 
But code > ignores > >>>> that and continue execution. > >>>>>> Later when align_to_ref is referenced we hit SEGV because it is > NULL. > >>>>>> > >>>>>> The fix is to check align_to_ref for NULL early and bailout. > >>>>>> > >>>>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to > recognize > >>>> this address pattern to vectorize test's code. > >>>>>> And added missing _invar setting. > >>>>>> > >>>>>> And I slightly modified tracking code to investigate this issue. > >>>>>> > >>>>>> Added new test to check some complex address expressions similar > to > >>>> bug's test case. Not all cases in test are > >>>>>> vectorized - there are other conditions which prevent that. > >>>>>> > >>>>>> Tested tier1,tier2,hs-tier3,precheckin-comp > >>>>>> > >>>>>> Thanks, > >>>>>> Vladimir K From ningsheng.jian at arm.com Wed Aug 19 10:40:49 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Wed, 19 Aug 2020 18:40:49 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> Message-ID: <4ec335ca-0a88-3b98-f6e4-fe7a0453ae7b@arm.com> Hi Magnus, Thanks for the review! On 8/19/20 6:05 PM, Magnus Ihse Bursie wrote: > On 2020-08-19 11:53, Ningsheng Jian wrote: >> Hi Andrew, >> >> I have updated the patch based on the review comments. Would you mind >> taking another look? Thanks! >> >> Full: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04/ > Build changes look good. Thank you for remembering to cc build-dev! > > This is maybe not relevant, but I was surprised to find > src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, > and b) the name implies that it is a test, even though that it resides > in src. Is this really proper? This handy script is used to (manually) generate some code in assembler_aarch64.cpp. The generated code is for assembler smoke test, so it named that. It's helpful to make sure the assembler emits correct binary code, but I am not sure whether a python code in the project is proper or not. Thanks, Ningsheng > > /Magnus >> >> Incremental: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ >> >> Also add build-dev, as there's a makefile change. >> >> And the split parts: >> >> 1) SVE feature detection: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature >> >> 2) c2 register allocation: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra >> >> 3) SVE c2 backend: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 >> CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 >> >> JTreg tests are still running, and so far no new failure found. >> >> Thanks, >> Ningsheng >> >> On 8/17/20 5:16 PM, Andrew Dinn wrote: >>> Hi Pengfei, >>> >>> On 17/08/2020 07:00, Ningsheng Jian wrote: >>>> Thanks a lot for the review! Sorry for the late reply, as I was on >>>> vacation last week. And thanks to Pengfei and Joshua for helping >>>> clarifying some details in the patch. >>> >>> Yes, they did a very good job of answering most of the pending >>> questions. 
>>> >>>>> I also eyeballed /some/ of the generated code to check that it looked >>>>> ok. I'd really like to be able to do that systematically for a >>>>> comprehensive test suite that exercised every rule but I only had the >>>>> machine for a few days. This really ought to be done as a follow-up to >>>>> ensure that all the rules are working as expected. >>>> >>>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >>>> in future. >>> >>> I'm fine with that as a follow-up patch if you raise a JIRA for it. >>> >>>>> I am not clear why you are choosing to re-init ptrue after certain JVM >>>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>>>> when we call a JVM_ENTRY. Could you explain the rationale you have >>>>> followed here? >>>> >>>> We do the re-init at any possible return points to c2 code, not in any >>>> runtime c++ functions, which will reduce the re-init calls. >>>> >>>> Actually I found those entries by some hack of jvm. In the hacky code >>>> below we use gcc option -finstrument-functions to build hotspot. With >>>> this option, each C/C++ function entry/exit will call the instrument >>>> functions we defined. In instrument functions, we clobber p7 (or other >>>> reg for test) register, and in c2 function return we verify that p7 (or >>>> other reg) has been reinitialized. >>>> >>>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch >>>> >>> >>> Nice work. It's very good to have that documented. I'm willing to accept >>> i) that this has found all current cases and ii) that the verify will >>> catch any cases that might get introduced by future changes (e.g. the >>> callout introduced by ZGC that you mention below). As the above mot say >>> there is a slim chance this might have missed some cases but I think it >>> is pretty unlikely. >>> >>> >>>>> Specific Comments (register allocator webrev): >>>>> >>>>> >>>>> aarch64.ad:97-100 >>>>> >>>>> Why have you added a reg_def for R8 and R9 here and also to >>>>> alloc_class >>>>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>>>> >>>> >>>> I think Pengfei has helped to explain that. I will either add clear >>>> comments or rename the register name as you suggested. >>> >>> Ok, good. >>> >>>> As Joshua clarified, we are also working on predicate scalable reg, >>>> which is not in this patch. Thanks for the suggestion, I will try to >>>> refactor this a bit. >>> >>> Ok, I'll wait for an updated patch. Are you planning to include the >>> scalable predicate reg code as part of this patch? I think that would be >>> better as it would help to clarify the need to distinguish vector regs >>> as a subset of scalable regs. >>> >>>>> zBarrierSetAssembler_aarch64.cpp:434 >>>>> >>>>> Can you explain why we need to check p7 here and not do so in other >>>>> places where we call into the JVM? I'm not saying this is wrong. I >>>>> just >>>>> want to know how you decided where re-init of p7 was needed. >>>>> >>>> >>>> Actually I found this by my hack patch above while running jtreg tests. >>>> The stub slowpath here can be a c++ function. >>> >>> Yes, good catch. >>> >>>>> superword.cpp:97 >>>>> >>>>> Does this mean that is someone sets the maximum vector size to a >>>>> non-power of two, such as 384, all superword operations will be >>>>> bypassed? Including those which can be done using NEON vectors? >>>>> >>>> >>>> Current SLP vectorizer only supports power-of-2 vector size. 
We are >>>> trying to work out a new vectorizer to support all SVE vector sizes, so >>>> we would expect a size like 384 could go to that path. I tried current >>>> patch on a 512-bit SVE hardware which does not support 384-bit: >>>> >>>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >>>> openjdk version "16-internal" 2021-03-16 >>>> >>>> $ java -XX:MaxVectorSize=48 -version >>>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >>>> vector length 32. Set MaxVectorSize to 32 >>>> >>>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >>>> instead of unsupported 48: >>>> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >>>> >>>> Do you think we need to exit vm instead of warning and fallbacking >>>> to 32 >>>> here? >>> >>> Yes, I think a vm exit would probably be a better choice. >>> >>> regards, >>> >>> >>> Andrew Dinn >>> ----------- >>> Red Hat Distinguished Engineer >>> Red Hat UK Ltd >>> Registered in England and Wales under Company Registration No. 03798903 >>> Directors: Michael Cunningham, Michael ("Mike") O'Neill >>> >> > From aph at redhat.com Wed Aug 19 11:10:10 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Aug 2020 12:10:10 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> Message-ID: On 19/08/2020 11:05, Magnus Ihse Bursie wrote: > This is maybe not relevant, but I was surprised to find > src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, > and b) the name implies that it is a test, even though that it resides > in src. Is this really proper? I have no idea whether it's really proper, but it allows us to check that instructions are encoded correctly by cross-checking with the system's assembler. There might well be a more hygienic way to do that, but I don't want to be without it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From fairoz.matte at oracle.com Wed Aug 19 12:30:47 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Wed, 19 Aug 2020 05:30:47 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> Message-ID: <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> Hi Vladimir, Thanks for the review. > I would suggest to run test with -XX:+PrintCodeCache flag which prints > CodeCache usage on exit. > > Also add '-ea -esa' flags - some runs failed with them because they increase > Graal's methods size. > > Running test with immediately caused OOM error on my local linux machine: > > '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - > XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > XX:+UseJVMCICompiler -Djvmci.Compiler=graal' > > With -XX:ReservedCodeCacheSize=30m I got: > > [11.217s][warning][codecache] CodeCache is full. Compiler has been > disabled. 
> [11.217s][warning][codecache] Try increasing the code cache size using - > XX:ReservedCodeCacheSize= > > With -XX:ReservedCodeCacheSize=50m I got this output: Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is the safe one to use. > > CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb > > May be you need to set it to 35m or better to 50m to be safe. > > Note, without Graal test uses only 5.5m: > > CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb > > ----------------------------- > > I also forgot to ask you to update test's Copyright year. I have updated the copyright year. Updated webrev for the reference - http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ Thanks, Fairoz > > Regards, > Vladimir K > > On 8/18/20 1:10 AM, Fairoz Matte wrote: > > Hi Vladimir, > > > > Thanks for looking into. > > This is intermittent crash, and is reproducible in windows debug build > environment. Below is the testing performed. > > > > 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- > XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > XX:+UseJVMCICompiler" > > 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- > XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > XX:+UseJVMCICompiler" > > > > Thanks, > > Fairoz > > > >> -----Original Message----- > >> From: Vladimir Kozlov > >> Sent: Monday, August 17, 2020 11:22 PM > >> To: Fairoz Matte ; hotspot-compiler- > >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net > >> Cc: Coleen Phillimore ; Dean Long > >> > >> Subject: Re: RFR(s): 8248295: > >> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with > >> Graal > >> > >> Hi Fairoz, > >> > >> How you determine that +10Mb is enough with Graal? > >> > >> Thanks, > >> Vladimir > >> > >> On 8/17/20 5:46 AM, Fairoz Matte wrote: > >>> Hi, > >>> > >>> > >>> > >>> Please review this small test change to work with Graal. > >>> > >>> > >>> > >>> Background: > >>> > >>> Graal require more code cache compared to c1/c2. but the test case > >>> always > >> set it to 20MB. This may not be sufficient when running graal. > >>> > >>> Default configuration for ReservedCodeCacheSize = 250MB > >>> > >>> With graal enabled, ReservedCodeCacheSize = 350MB > >>> > >>> > >>> > >>> Either we can modify the framework to honor ReservedCodeCacheSize > >>> for > >> graal or just update the testcase. > >>> > >>> There are not many test cases they rely on ReservedCodeCacheSize or > >> InitialCodeCacheSize. So the fix prefer the later one. > >>> > >>> > >>> > >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > >>> > >>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > >>> > >>> > >>> > >>> Thanks, > >>> > >>> Fairoz > >>> > >>> > >>> From adinn at redhat.com Wed Aug 19 13:01:44 2020 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 19 Aug 2020 14:01:44 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> Hi Ningsheng, On 19/08/2020 10:53, Ningsheng Jian wrote: > I have updated the patch based on the review comments. Would you mind > taking another look? Thanks! 
> > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.04/ > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ That looks ok. A few suggested tweaks: aarch64.ad:168 I think the following comment explains more clearly what is going on: // For SVE vector registers, we simply extend vector register size to 8 // 'logical' slots. This is nominally 256 bits but it actually covers // all possible 'physical' SVE vector register lengths from 128 ~ 2048 bits. // The 'physical' SVE vector register length is detected during startup // so the register allocator is able to identify the correct number of // bytes needed for an SVE spill/unspill. // Note that a vector register with 4 slots, denotes a 128-bit NEON // register allowing it to be distinguished from the // corresponding SVE vector register when the SVE vector length // is 128 bits. postaloc.cpp:312 & 322 311 if (lrgs(val_idx).is_scalable()) { 312 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); . . . 321 if (lrgs(val_idx).is_scalable()) { 322 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); You don't strictly need the asserts here as this is already asserted in the call to is_scalable(). > JTreg tests are still running, and so far no new failure found. Ok, well assuming they pass I am happy with this latest patch modulo the tweaks above. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From christian.hagedorn at oracle.com Wed Aug 19 14:06:57 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 19 Aug 2020 16:06:57 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> Message-ID: <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> On 18.08.20 17:41, Vladimir Kozlov wrote: > c1_Compilation.hpp: looks like both versions of allocator() do the same > thing. Right, I first wanted to have a public allocator() version in non-product only - but that might be over-engineered as they do the same thing. I changed it back to a single public version. > I suggest to build with configure --with-debug-level=optimized to check > that NOT_PRODUCT can be built with these changes. That's a good idea! I indeed forgot about one NOT_PRODUCT -> DEBUG_ONLY change. I also found other build issues with the optimized build. I filed [1] and already sent an RFR for it. It builds successfully with this patch on top of it. http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8252037 > Thanks, > Vladimir > > On 8/18/20 6:16 AM, Christian Hagedorn wrote: >> Hi Vladimir >> >> On 17.08.20 19:36, Vladimir Kozlov wrote: >>> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>>> Hi Vladimir >>>> >>>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>>> >>>> I'm wondering though if these ifdefs are even required for if-blocks >>>> inside methods? >>>> >>>> Isn't, for example, this if-block: >>>> >>>> #ifndef PRODUCT >>>> ???????? if (TraceLinearScanLevel >= 2) { >>>> ?????????? 
tty->print_cr("killing XMMs for trig"); >>>> ???????? } >>>> #endif >>>> >>>> removed anyways when the flag is set to < 2 (which is statically >>>> known and thus would allow this entire block to be removed)? Or does >>>> it make a difference by explicitly guarding it with an ifdef? >>> >>> You are right. It could be statically removed. But we keep #ifdef >>> sometimes to indicate that code is executed only in debug build >>> because we don't always remember type of a flag. >> >> I see, that makes sense. I updated my patch and left the ifdefs there >> but changed them to ASSERT. I also updated other ifdefs belonging to >> TraceLinearScanLevel appropriately. >> >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ >> >> Best regards, >> Christian >> >>> >>> Thanks, >>> Vladimir K >>> >>>> >>>> Best regards, >>>> Christian >>>> >>>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>>> >>>>> But the flag is available only in DEBUG build: >>>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>>> >>>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>>> Hi >>>>>> >>>>>> Please review the following enhancement for C1: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>>> >>>>>> While I was working on JDK-8249603 [1], I added some additional >>>>>> debugging and logging code which helped to figure out what was >>>>>> going on. I think it would be useful to have this code around for >>>>>> the analysis of future C1 register allocator bugs. >>>>>> >>>>>> This RFE adds (everything non-product code): >>>>>> - find_interval(number): Can be called like that from gdb anywhere >>>>>> to find an interval with the given number. >>>>>> - Interval::print_children()/print_parent(): Useful when debugging >>>>>> with gdb to quickly show the split children and parent. >>>>>> - LinearScan::print_reg_num(number): Prints the register or stack >>>>>> location for this register number. This is useful in some places >>>>>> (logging with TraceLinearScanLevel set) where it just printed a >>>>>> number which first had to be manually looked up in other logs. >>>>>> >>>>>> I additionally did some cleanup of the touched code. >>>>>> >>>>>> We could additionally split the TraceLinearScanLevel flag into >>>>>> separate flags related to the different phases of the register >>>>>> allocation algorithm. It currently just prints too much details on >>>>>> the higher levels. You often find yourself being interested in a >>>>>> specific part of the algorithm and only want to know more details >>>>>> there. To achieve that you now you have to either handle all the >>>>>> noise or manually disable/enable other logs. We could file an RFE >>>>>> to clean this up if it's worth the effort - given that there are >>>>>> not many new issues filed for C1 register allocation today. >>>>>> >>>>>> Thank you! 
>>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>> From vladimir.kozlov at oracle.com Wed Aug 19 16:38:24 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 09:38:24 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> Message-ID: <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> Looks good. Thanks, Vladimir K On 8/19/20 5:30 AM, Fairoz Matte wrote: > Hi Vladimir, > > Thanks for the review. > >> I would suggest to run test with -XX:+PrintCodeCache flag which prints >> CodeCache usage on exit. >> >> Also add '-ea -esa' flags - some runs failed with them because they increase >> Graal's methods size. >> >> Running test with immediately caused OOM error on my local linux machine: >> >> '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - >> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >> XX:+UseJVMCICompiler -Djvmci.Compiler=graal' >> >> With -XX:ReservedCodeCacheSize=30m I got: >> >> [11.217s][warning][codecache] CodeCache is full. Compiler has been >> disabled. >> [11.217s][warning][codecache] Try increasing the code cache size using - >> XX:ReservedCodeCacheSize= >> >> With -XX:ReservedCodeCacheSize=50m I got this output: > > Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is the safe one to use. > >> >> CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb >> >> May be you need to set it to 35m or better to 50m to be safe. >> >> Note, without Graal test uses only 5.5m: >> >> CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb >> >> ----------------------------- >> >> I also forgot to ask you to update test's Copyright year. > > I have updated the copyright year. > Updated webrev for the reference - http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ > > Thanks, > Fairoz >> >> Regards, >> Vladimir K >> >> On 8/18/20 1:10 AM, Fairoz Matte wrote: >>> Hi Vladimir, >>> >>> Thanks for looking into. >>> This is intermittent crash, and is reproducible in windows debug build >> environment. Below is the testing performed. >>> >>> 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- >> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >> XX:+UseJVMCICompiler" >>> 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- >> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >> XX:+UseJVMCICompiler" >>> >>> Thanks, >>> Fairoz >>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Monday, August 17, 2020 11:22 PM >>>> To: Fairoz Matte ; hotspot-compiler- >>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>> Cc: Coleen Phillimore ; Dean Long >>>> >>>> Subject: Re: RFR(s): 8248295: >>>> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with >>>> Graal >>>> >>>> Hi Fairoz, >>>> >>>> How you determine that +10Mb is enough with Graal? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 8/17/20 5:46 AM, Fairoz Matte wrote: >>>>> Hi, >>>>> >>>>> >>>>> >>>>> Please review this small test change to work with Graal. >>>>> >>>>> >>>>> >>>>> Background: >>>>> >>>>> Graal require more code cache compared to c1/c2. but the test case >>>>> always >>>> set it to 20MB. This may not be sufficient when running graal. 
>>>>> >>>>> Default configuration for ReservedCodeCacheSize = 250MB >>>>> >>>>> With graal enabled, ReservedCodeCacheSize = 350MB >>>>> >>>>> >>>>> >>>>> Either we can modify the framework to honor ReservedCodeCacheSize >>>>> for >>>> graal or just update the testcase. >>>>> >>>>> There are not many test cases they rely on ReservedCodeCacheSize or >>>> InitialCodeCacheSize. So the fix prefer the later one. >>>>> >>>>> >>>>> >>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 >>>>> >>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Fairoz >>>>> >>>>> >>>>> From vladimir.kozlov at oracle.com Wed Aug 19 16:43:08 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 09:43:08 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: Looks good. Thanks, Vladimir K On 8/19/20 7:06 AM, Christian Hagedorn wrote: > On 18.08.20 17:41, Vladimir Kozlov wrote: >> c1_Compilation.hpp: looks like both versions of allocator() do the same thing. > > Right, I first wanted to have a public allocator() version in non-product only - but that might be over-engineered as > they do the same thing. I changed it back to a single public version. > >> I suggest to build with configure --with-debug-level=optimized to check that NOT_PRODUCT can be built with these changes. > > That's a good idea! I indeed forgot about one NOT_PRODUCT -> DEBUG_ONLY change. I also found other build issues with the > optimized build. I filed [1] and already sent an RFR for it. It builds successfully with this patch on top of it. > > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ > > Best regards, > Christian > > [1] https://bugs.openjdk.java.net/browse/JDK-8252037 > >> Thanks, >> Vladimir >> >> On 8/18/20 6:16 AM, Christian Hagedorn wrote: >>> Hi Vladimir >>> >>> On 17.08.20 19:36, Vladimir Kozlov wrote: >>>> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>>>> Hi Vladimir >>>>> >>>>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>>>> >>>>> I'm wondering though if these ifdefs are even required for if-blocks inside methods? >>>>> >>>>> Isn't, for example, this if-block: >>>>> >>>>> #ifndef PRODUCT >>>>> ???????? if (TraceLinearScanLevel >= 2) { >>>>> ?????????? tty->print_cr("killing XMMs for trig"); >>>>> ???????? } >>>>> #endif >>>>> >>>>> removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be >>>>> removed)? Or does it make a difference by explicitly guarding it with an ifdef? >>>> >>>> You are right. It could be statically removed. But we keep #ifdef sometimes to indicate that code is executed only >>>> in debug build because we don't always remember type of a flag. >>> >>> I see, that makes sense. I updated my patch and left the ifdefs there but changed them to ASSERT. I also updated >>> other ifdefs belonging to TraceLinearScanLevel appropriately. 
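Concretely, the guard change being described is of the following shape. This is only an illustration built from the snippet quoted earlier in the thread, not the webrev diff itself:

    // Before: guarded as "not product", even though TraceLinearScanLevel
    // is a develop flag, i.e. a compile-time constant 0 in product builds,
    // so the block is statically removed there anyway.
    #ifndef PRODUCT
      if (TraceLinearScanLevel >= 2) {
        tty->print_cr("killing XMMs for trig");
      }
    #endif

    // After: guarded as debug-only, matching where the flag actually exists.
    #ifdef ASSERT
      if (TraceLinearScanLevel >= 2) {
        tty->print_cr("killing XMMs for trig");
      }
    #endif
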
>>> >>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ >>> >>> Best regards, >>> Christian >>> >>>> >>>> Thanks, >>>> Vladimir K >>>> >>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>>>> >>>>>> But the flag is available only in DEBUG build: >>>>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>>>> >>>>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>>>> Hi >>>>>>> >>>>>>> Please review the following enhancement for C1: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>>>> >>>>>>> While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure >>>>>>> out what was going on. I think it would be useful to have this code around for the analysis of future C1 register >>>>>>> allocator bugs. >>>>>>> >>>>>>> This RFE adds (everything non-product code): >>>>>>> - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. >>>>>>> - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children >>>>>>> and parent. >>>>>>> - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is >>>>>>> useful in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to >>>>>>> be manually looked up in other logs. >>>>>>> >>>>>>> I additionally did some cleanup of the touched code. >>>>>>> >>>>>>> We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of >>>>>>> the register allocation algorithm. It currently just prints too much details on the higher levels. You often find >>>>>>> yourself being interested in a specific part of the algorithm and only want to know more details there. To >>>>>>> achieve that you now you have to either handle all the noise or manually disable/enable other logs. We could file >>>>>>> an RFE to clean this up if it's worth the effort - given that there are not many new issues filed for C1 register >>>>>>> allocation today. >>>>>>> >>>>>>> Thank you! >>>>>>> >>>>>>> Best regards, >>>>>>> Christian >>>>>>> >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>> From joserz at linux.ibm.com Wed Aug 19 16:53:38 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Wed, 19 Aug 2020 13:53:38 -0300 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: <20200819165338.GA978936@pacoca> On Wed, Aug 19, 2020 at 09:55:50AM +0000, Doerr, Martin wrote: > Hi Jose, > > thanks for the update. > > I have never seen 2 format specifications in the ad file. Does that work or does the 2nd one overwrite the 1st one? > I think it should be: > format %{ "BRH $dst, $src\n\t" > "EXTSH $dst, $dst" %} You're right, actually the 2nd one overwrote the first. I just fixed it. Thanks sir! > > I don't need to see another webrev for that. Otherwise, the change looks good. Thanks for contributing. 
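For readers without the webrev handy, the corrected short byte-reverse rule would look roughly like the sketch below, with the two format strings merged into one and a size() added. The operand classes, predicate name and pipe class here are assumptions; webrev.02 has the actual change:

    // Sketch only, not the webrev: reverse the bytes of a short.
    // brh byte-reverses the halfword, extsh then sign-extends the result.
    instruct bytes_reverse_short(iRegIdst dst, iRegIsrc src) %{
      match(Set dst (ReverseBytesS src));
      predicate(UseByteReverseInstructions);   // needs at least Power10
      size(8);                                 // two 4-byte instructions
      format %{ "BRH     $dst, $src\n\t"
                "EXTSH   $dst, $dst" %}
      ins_encode %{
        __ brh($dst$$Register, $src$$Register);
        __ extsh($dst$$Register, $dst$$Register);
      %}
      ins_pipe(pipe_class_default);
    %}
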
> > Best regards, > Martin > > > > -----Original Message----- > > From: joserz at linux.ibm.com > > Sent: Mittwoch, 19. August 2020 02:25 > > To: Doerr, Martin > > Cc: Michihiro Horie ; hotspot-compiler- > > dev at openjdk.java.net > > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > Hallo Martin! > > > > Thank you very much for your review. Here is the v3: > > > > Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > I run a functional test and it's working as expected. If you try to run it in a > > system > > > $ java -XX:+UseByteReverseInstructions ReverseBytes > > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > > but needs at least Power10. > > (continue with existing code) > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > ???????? > > > > This is the code I use to test: > > 8<--------------------------------------------------------------- > > import java.io.IOException; > > > > class ReverseBytes > > { > > public static void main(String[] args) throws IOException > > { > > for (int i = 0; i < 1000000; ++i) { > > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > > throw new RuntimeException(); > > } > > > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > > 0xF0DEBC9A78563412L) { > > throw new RuntimeException(); > > } > > > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > > throw new RuntimeException(); > > } > > > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > > throw new RuntimeException(); > > } > > } > > System.out.println("ok"); > > } > > } > > 8<--------------------------------------------------------------- > > > > Best regards! > > > > Jose > > > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > > Hi Michihiro and Jose, > > > > > > I had only done a quick review during my vacation. Thanks for updating the > > description of PowerArchitecturePPC64. > > > After taking a second look, I have a few minor requests. Sorry for that. > > > > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent with > > other names. > > > * Please add ?size? specifications to the ppc.ad file. Otherwise, the > > compiler has to determine sizes dynamically every time. > > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > So we rely on your testing. > > > > > > Thanks and best regards, > > > Martin > > > > > > > > > From: Michihiro Horie > > > Sent: Dienstag, 18. August 2020 09:28 > > > To: Doerr, Martin > > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > > > > > > Jose, > > > Latest change looks good also to me. > > > > > > Marin, > > > Do you think if I can push the change? > > > > > > Best regards, > > > Michihiro > > > > > > > > > ----- Original message ----- > > > From: "Doerr, Martin" > > > > > > To: "joserz at linux.ibm.com" > > > > > > Cc: hotspot compiler > dev at openjdk.java.net>, > > "horie at jp.ibm.com" > > > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > and use new byte-reverse instructions > > > Date: Wed, Jul 1, 2020 4:01 AM > > > > > > Thanks for the much better flag description. > > > Looks good. 
> > > > > > Best regards, > > > Martin > > > > > > > Am 30.06.2020 um 02:15 schrieb > > "joserz at linux.ibm.com" > > >: > > > > > > > > ?Hello team, > > > > > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > > > Thank you!! > > > > > > > > Jose > > > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > > >> Hi Jose, > > > >> > > > >> Can you replace the outdated description of PowerArchitecturePPC64 in > > globals_poc.hpp by something generic, please? > > > >> > > > >> Please update the Copyright year in vm_version_poc.hpp. > > > >> > > > >> I can?t test the change, but it looks good to me. > > > >> > > > >> Best regards, > > > >> Martin > > > >> > > > >>>> Am 26.06.2020 um 20:29 schrieb > > "joserz at linux.ibm.com" > > >: > > > >>> > > > >>> ?Hello team! > > > >>> > > > >>> This patch introduces Power10 to OpenJDK and implements three new > > instructions: > > > >>> - brh - byte-reverse halfword > > > >>> - brw - byte-reverse word > > > >>> - brd - byte-reverse doubleword > > > >>> > > > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > >>> > > > >>> Thanks for your review! > > > >>> > > > >>> Jose R. Ziviani > > > From cjashfor at linux.ibm.com Wed Aug 19 18:10:50 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 19 Aug 2020 11:10:50 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Michihiro Horie posted up a new iteration of this webrev for me. This time the webrev includes a complete implementation of the intrinsic for Power9 and Power10. You can find it here: http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ Changes in webrev.02 vs. webrev.01: * The method header for the intrinsic in the Base64 code has been rewritten using the Javadoc style. The clarity of the comments has been improved and some verbosity has been removed. There are no additional functional changes to Base64.java. * The code needed to martial and check the intrinsic parameters has been added, using the base64 encodeBlock intrinsic as a guideline. * A complete intrinsic implementation for Power9 and Power10 is included. * Adds some Power9 and Power10 assembler instructions needed by the intrinsic which hadn't been defined before. The intrinsic implementation in this patch accelerates the decoding of large blocks of base64 data by a factor of about 3.5X on Power9. I'm attaching two Java test cases I am using for testing and benchmarking. The TestBase64_VB encodes and decodes randomly-sized buffers of random data and checks that original data matches the encoded-then-decoded data. TestBase64Errors encodes a 48K block of random bytes, then corrupts each byte of the encoded data, one at a time, checking to see if the decoder catches the illegal byte. Any comments/suggestions would be appreciated. 
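The attached tests are not preserved in this archive; a minimal round-trip check in the spirit of the described TestBase64_VB might look like the following. This is an illustration only, not the attachment:

    import java.util.Arrays;
    import java.util.Base64;
    import java.util.Random;

    public class Base64RoundTrip {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            Base64.Encoder enc = Base64.getEncoder();
            Base64.Decoder dec = Base64.getDecoder();
            for (int i = 0; i < 100_000; i++) {
                byte[] src = new byte[rnd.nextInt(512)];    // random length
                rnd.nextBytes(src);                         // random content
                // decode path that the proposed intrinsic accelerates
                byte[] back = dec.decode(enc.encode(src));
                if (!Arrays.equals(src, back)) {
                    throw new RuntimeException("mismatch at iteration " + i);
                }
            }
            System.out.println("ok");
        }
    }
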
Thanks, - Corey On 7/27/20 6:49 PM, Corey Ashford wrote: > Michihiro Horie uploaded a new revision of the Base64 decodeBlock > intrinsic API for me: > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ > > It has the following changes with respect to the original one posted: > > ?* In the event of encountering a non-base64 character, instead of > having a separate error code of -1, the intrinsic can now just return > either 0, or the number of data bytes produced up to the point where the > illegal base64 character was encountered.? This reduces the number of > special cases, and also provides a way to speed up the process of > finding the bad character by the slower, pure-Java algorithm. > > ?* The isMIME boolean is removed from the API for two reasons: > ?? - The current API is not sufficient to handle the isMIME case, > because there isn't a strict relationship between the number of input > bytes and the number of output bytes, because there can be an arbitrary > number of non-base64 characters in the source. > ?? - If an intrinsic only implements the (isMIME == false) case as ours > does, it will always return 0 bytes processed, which will slightly slow > down the normal path of processing an (isMIME == true) instantiation. > ?? - We considered adding a separate hotspot candidate for the (isMIME > == true) case, but since we don't have an intrinsic implementation to > test that, we decided to leave it as a future optimization. > > Comments and suggestions are welcome.? Thanks for your consideration. > > - Corey > > On 6/23/20 6:23 PM, Michihiro Horie wrote: >> Hi Corey, >> >> Following is the issue I created. >> https://bugs.openjdk.java.net/browse/JDK-8248188 >> >> I will upload a webrev when you're ready as we talked in private. >> >> Best regards, >> Michihiro >> >> Inactive hide details for "Corey Ashford" ---2020/06/24 >> 09:40:10---Currently in java.util.Base64, there is a >> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently >> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for >> encodeBlock, but no >> >> From: "Corey Ashford" >> To: "hotspot-compiler-dev at openjdk.java.net" >> , >> "ppc-aix-port-dev at openjdk.java.net" >> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori Ogata/Japan/IBM at IBMJP, >> joserz at br.ibm.com >> Date: 2020/06/24 09:40 >> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >> Base64 decoding >> >> ------------------------------------------------------------------------ >> >> >> >> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and >> API for encodeBlock, but none for decoding. ?This means that only >> encoding gets acceleration from the underlying CPU's vector hardware. >> >> I'd like to propose adding a new intrinsic for decodeBlock. ?The >> considerations I have for this new intrinsic's API: >> >> ??* Don't make any assumptions about the underlying capability of the >> hardware. ?For example, do not impose any specific block size >> granularity. >> >> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >> modes, but also let them decide if they will process the data regardless >> of the settings of the two booleans. >> >> ??* Any remaining data that is not processed by the intrinsic will be >> processed by the pure Java implementation. ?This allows the intrinsic to >> process whatever block sizes it's good at without the complexity of >> handling the end fragments. 
>> >> ??* If any illegal character is discovered in the decoding process, the >> intrinsic will simply return -1, instead of requiring it to throw a >> proper exception from the context of the intrinsic. ?In the event of >> getting a -1 returned from the intrinsic, the Java Base64 library code >> simply calls the pure Java implementation to have it find the error and >> properly throw an exception. ?This is a performance trade-off in the >> case of an error (which I expect to be very rare). >> >> ??* One thought I have for a further optimization (not implemented in >> the current patch), is that when the intrinsic decides not to process a >> block because of some combination of isURL and isMIME settings it >> doesn't handle, it could return extra bits in the return code, encoded >> as a negative number. ?For example: >> >> Illegal_Base64_char ? = 0b001; >> isMIME_unsupported ? ?= 0b010; >> isURL_unsupported ? ? = 0b100; >> >> These can be OR'd together as needed and then negated (flip the sign). >> The Base64 library code could then cache these flags, so it will know >> not to call the intrinsic again when another decodeBlock is requested >> but with an unsupported mode. ?This will save the performance hit of >> calling the intrinsic when it is guaranteed to fail. >> >> I've tested the attached patch with an actual intrinsic coded up for >> Power9/Power10, but those runtime intrinsics and arch-specific patches >> aren't attached today. ?I want to get some consensus on the >> library-level intrinsic API first. >> >> Also attached is a simple test case to test that the new intrinsic API >> doesn't break anything. >> >> I'm open to any comments about this. >> >> Thanks for your consideration, >> >> - Corey >> >> >> Corey Ashford >> IBM Systems, Linux Technology Center, OpenJDK team >> cjashfor at us dot ibm dot com >> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro >> Horie/Japan/IBM] >> >> > From doug.simon at oracle.com Wed Aug 19 19:16:54 2020 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 19 Aug 2020 21:16:54 +0200 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() In-Reply-To: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> References: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> Looks good to me. -Doug > On 19 Aug 2020, at 10:37, Nick Gasson wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 > Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ > > We see this crash occasionally when testing with Graal on some AArch64 > systems: > > # > # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 > # assert(external_guard || result != __null) failed: Invalid JNI handle > # > > V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c > V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 > V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 > V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c > V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc > > The full hs_err file is attached to the JBS entry. 
> > The handle here is _HotSpotJVMCIRuntime_instance which is initialised in > JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): > > JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); > _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); > > JVMCICompiler::force_comp_at_level_simple() checks whether the _object > field inside the handle is null before calling JNIHandles::resolve() on > it, which should avoid the above assertion failure where the pointee is > null. However on a non-TSO architecture another thread may observe the > store to _object when assigning _HotSpotJVMCIRuntime_instance before the > store in JVMCIEnv::make_global() that initialises the pointed-to oop. We > need to add a store-store barrier here to force the expected ordering. > > Tested with jcstress and Graal on the affected machine, which used to > reproduce it quite reliably. > > -- > Thanks, > Nick From vladimir.kozlov at oracle.com Wed Aug 19 19:18:45 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 12:18:45 -0700 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() In-Reply-To: <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> References: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> Message-ID: <83e818a0-b9d2-205b-6a25-4869fc1e2101@oracle.com> +1 Thanks, Vladimir K On 8/19/20 12:16 PM, Doug Simon wrote: > Looks good to me. > > -Doug > >> On 19 Aug 2020, at 10:37, Nick Gasson wrote: >> >> Hi, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 >> Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ >> >> We see this crash occasionally when testing with Graal on some AArch64 >> systems: >> >> # >> # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 >> # assert(external_guard || result != __null) failed: Invalid JNI handle >> # >> >> V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c >> V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 >> V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 >> V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c >> V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc >> >> The full hs_err file is attached to the JBS entry. >> >> The handle here is _HotSpotJVMCIRuntime_instance which is initialised in >> JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): >> >> JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); >> _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); >> >> JVMCICompiler::force_comp_at_level_simple() checks whether the _object >> field inside the handle is null before calling JNIHandles::resolve() on >> it, which should avoid the above assertion failure where the pointee is >> null. However on a non-TSO architecture another thread may observe the >> store to _object when assigning _HotSpotJVMCIRuntime_instance before the >> store in JVMCIEnv::make_global() that initialises the pointed-to oop. We >> need to add a store-store barrier here to force the expected ordering. >> >> Tested with jcstress and Graal on the affected machine, which used to >> reproduce it quite reliably. 
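In code form, the fix described above amounts to something like the following fragment of initialize_HotSpotJVMCIRuntime(). The local variable and exact barrier placement are illustrative; the pushed change is in the webrev:

    JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK);
    JVMCIObject instance = JVMCIENV->make_global(result);  // initialises the pointed-to oop
    // Make sure the oop store inside make_global() is visible before the
    // handle is published to other threads (matters on non-TSO targets).
    OrderAccess::storestore();
    _HotSpotJVMCIRuntime_instance = instance;
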
>> >> -- >> Thanks, >> Nick > From serguei.spitsyn at oracle.com Wed Aug 19 20:14:56 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 19 Aug 2020 13:14:56 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> Message-ID: <6f104422-11cc-1bea-2ebf-a916a22f10fd@oracle.com> Hi Fairoz, LGTM++ Thanks, Serguei On 8/19/20 09:38, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir K > > On 8/19/20 5:30 AM, Fairoz Matte wrote: >> Hi Vladimir, >> >> Thanks for the review. >> >>> I would suggest to run test with -XX:+PrintCodeCache flag which prints >>> CodeCache usage on exit. >>> >>> Also add '-ea -esa' flags - some runs failed with them because they >>> increase >>> Graal's methods size. >>> >>> Running test with immediately caused OOM error on my local linux >>> machine: >>> >>> '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >>> XX:+UseJVMCICompiler -Djvmci.Compiler=graal' >>> >>> With -XX:ReservedCodeCacheSize=30m I got: >>> >>> [11.217s][warning][codecache] CodeCache is full. Compiler has been >>> disabled. >>> [11.217s][warning][codecache] Try increasing the code cache size >>> using - >>> XX:ReservedCodeCacheSize= >>> >>> With -XX:ReservedCodeCacheSize=50m I got this output: >> >> Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is >> the safe one to use. >> >>> >>> CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb >>> >>> May be you need to set it to 35m or better to 50m to be safe. >>> >>> Note, without Graal test uses only 5.5m: >>> >>> CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb >>> >>> ----------------------------- >>> >>> I also forgot to ask you to update test's Copyright year. >> >> I have updated the copyright year. >> Updated webrev for the reference - >> http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ >> >> Thanks, >> Fairoz >>> >>> Regards, >>> Vladimir K >>> >>> On 8/18/20 1:10 AM, Fairoz Matte wrote: >>>> Hi Vladimir, >>>> >>>> Thanks for looking into. >>>> This is intermittent crash, and is reproducible in windows debug build >>> environment. Below is the testing performed. >>>> >>>> 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >>> XX:+UseJVMCICompiler" >>>> 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >>> XX:+UseJVMCICompiler" >>>> >>>> Thanks, >>>> Fairoz >>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov >>>>> Sent: Monday, August 17, 2020 11:22 PM >>>>> To: Fairoz Matte ; hotspot-compiler- >>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>>> Cc: Coleen Phillimore ; Dean Long >>>>> >>>>> Subject: Re: RFR(s): 8248295: >>>>> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with >>>>> Graal >>>>> >>>>> Hi Fairoz, >>>>> >>>>> How you determine that +10Mb is enough with Graal? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 8/17/20 5:46 AM, Fairoz Matte wrote: >>>>>> Hi, >>>>>> >>>>>> >>>>>> >>>>>> Please review this small test change to work with Graal. 
>>>>>> >>>>>> >>>>>> >>>>>> Background: >>>>>> >>>>>> Graal require more code cache compared to c1/c2. but the test case >>>>>> always >>>>> set it to 20MB. This may not be sufficient when running graal. >>>>>> >>>>>> Default configuration for ReservedCodeCacheSize = 250MB >>>>>> >>>>>> With graal enabled, ReservedCodeCacheSize = 350MB >>>>>> >>>>>> >>>>>> >>>>>> Either we can modify the framework to honor ReservedCodeCacheSize >>>>>> for >>>>> graal or just update the testcase. >>>>>> >>>>>> There are not many test cases they rely on ReservedCodeCacheSize or >>>>> InitialCodeCacheSize. So the fix prefer the later one. >>>>>> >>>>>> >>>>>> >>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 >>>>>> >>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Fairoz >>>>>> >>>>>> >>>>>> From mikael.vidstedt at oracle.com Wed Aug 19 22:14:21 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Wed, 19 Aug 2020 15:14:21 -0700 Subject: RFR(XS): 8252051: Make mlvmJvmtiUtils strncpy uses GCC 10.x friendly Message-ID: Please review this small change which updates the strncpy code in mlvmJvmtiUtils.cpp to make gcc 10.x happy: JBS: https://bugs.openjdk.java.net/browse/JDK-8252051 webrev: http://cr.openjdk.java.net/~mikael/webrevs/8252051/webrev.00/open/webrev/ * Background (from JBS) gcc 10.2 is producing a warning for mlvmJmvtiUtils.cpp: In file included from test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/func/jvmti/share/libIndyRedefineClass.cpp:31: test/hotspot/jtreg/vmTestbase/vm/mlvm/share/mlvmJvmtiUtils.cpp:100:12: error: 'char* strncpy(char*, const char*, size_t)' specified bound 256 equals destination size [-Werror=stringop-truncation] 100 | strncpy(mn->classSig, szSignature, sizeof(mn->classSig)); | ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors It seems like gcc is not smart enough to realize that the strncpy on the previous line (mn->methodName) cannot modify szSignature. * Testing tier1 and test/hotspot/jtreg:vmTestbase_vm_mlvm locally Cheers, Mikael From igor.ignatyev at oracle.com Wed Aug 19 22:25:12 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 19 Aug 2020 15:25:12 -0700 Subject: RFR(XS): 8252051: Make mlvmJvmtiUtils strncpy uses GCC 10.x friendly In-Reply-To: References: Message-ID: LGTM -- Igor > On Aug 19, 2020, at 3:14 PM, Mikael Vidstedt wrote: > > > Please review this small change which updates the strncpy code in mlvmJvmtiUtils.cpp to make gcc 10.x happy: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252051 > webrev: http://cr.openjdk.java.net/~mikael/webrevs/8252051/webrev.00/open/webrev/ > > * Background (from JBS) > > gcc 10.2 is producing a warning for mlvmJmvtiUtils.cpp: > > In file included from test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/func/jvmti/share/libIndyRedefineClass.cpp:31: > test/hotspot/jtreg/vmTestbase/vm/mlvm/share/mlvmJvmtiUtils.cpp:100:12: error: 'char* strncpy(char*, const char*, size_t)' specified bound 256 equals destination size [-Werror=stringop-truncation] > 100 | strncpy(mn->classSig, szSignature, sizeof(mn->classSig)); > | ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > cc1plus: all warnings being treated as errors > > It seems like gcc is not smart enough to realize that the strncpy on the previous line (mn->methodName) cannot modify szSignature. 
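The webrev itself is not inlined here; the usual way to keep such a copy truncation-safe and silence -Wstringop-truncation is to bound the copy one byte short of the destination and terminate explicitly. A sketch of that general pattern, not necessarily the exact change:

    // Leave room for the terminator and add it explicitly, so the
    // destination is always NUL-terminated and gcc 10 stays quiet.
    strncpy(mn->classSig, szSignature, sizeof(mn->classSig) - 1);
    mn->classSig[sizeof(mn->classSig) - 1] = '\0';
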
> > > * Testing > > tier1 and test/hotspot/jtreg:vmTestbase_vm_mlvm locally > > > Cheers, > Mikael > From vladimir.kozlov at oracle.com Wed Aug 19 22:59:55 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 15:59:55 -0700 Subject: RFR(XS): 8252051: Make mlvmJvmtiUtils strncpy uses GCC 10.x friendly In-Reply-To: References: Message-ID: <6c1007ab-2a92-769f-688f-b123324d5d5b@oracle.com> +1 Vladimir K On 8/19/20 3:25 PM, Igor Ignatyev wrote: > LGTM > -- Igor > >> On Aug 19, 2020, at 3:14 PM, Mikael Vidstedt wrote: >> >> >> Please review this small change which updates the strncpy code in mlvmJvmtiUtils.cpp to make gcc 10.x happy: >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252051 >> webrev: http://cr.openjdk.java.net/~mikael/webrevs/8252051/webrev.00/open/webrev/ >> >> * Background (from JBS) >> >> gcc 10.2 is producing a warning for mlvmJmvtiUtils.cpp: >> >> In file included from test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/func/jvmti/share/libIndyRedefineClass.cpp:31: >> test/hotspot/jtreg/vmTestbase/vm/mlvm/share/mlvmJvmtiUtils.cpp:100:12: error: 'char* strncpy(char*, const char*, size_t)' specified bound 256 equals destination size [-Werror=stringop-truncation] >> 100 | strncpy(mn->classSig, szSignature, sizeof(mn->classSig)); >> | ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> cc1plus: all warnings being treated as errors >> >> It seems like gcc is not smart enough to realize that the strncpy on the previous line (mn->methodName) cannot modify szSignature. >> >> >> * Testing >> >> tier1 and test/hotspot/jtreg:vmTestbase_vm_mlvm locally >> >> >> Cheers, >> Mikael >> > From serguei.spitsyn at oracle.com Wed Aug 19 23:22:08 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 19 Aug 2020 16:22:08 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase In-Reply-To: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> References: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> Message-ID: <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> Hi Igor, This looks reasonable. Thanks, Serguei On 8/18/20 16:42, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >> 0 lines changed: 0 ins; 0 del; 0 mod; > Hi all, > > could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? 
> > (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) > > webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 > > Thanks, > -- Igor > > From john.r.rose at oracle.com Thu Aug 20 00:47:02 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 19 Aug 2020 17:47:02 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87h7t13bdz.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> On Aug 17, 2020, at 1:49 AM, Roland Westrelin wrote: > > Does the fixed patch look ok to you? I?m going over it one more time (between smoky breaths from the California fires) and I have a question. What is the exact structure of outer_phi? 1. At first it is a clone of phi, its region patched and the other edges the same: outer_phi := Phi(outer_head, init, AddL(phi, stride)) 2. After long_loop_replace_long_iv, the interior phi links it back to itself: outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) The thing I?m suspicious of (as fragile code) is the retention of the addend ?stride?. The inner loop (on ?inner_phi?) *also* adds the stride. What prevents there from being a superfluous number of strides added? I suppose the answer is that the ending output of the inner loop produces the post-incremented value as (inner_incr + outer_phi), while the pre-incremented value is (inner_phi + outer_phi), which is *always* short of the final count by the stride; thus it?s OK to add single missing stride in the outer loop. But, this seems fragile to me. Would it not be safer to make the outer phi just copy the final inner IV value (the one that fails the loop?s test)? So: outer_phi := Phi(outer_head, init, AddL(inner_incr, outer_phi)) I?m worried that some edge-case of loop loop might actually miss a stride unless the outer phi has the latter, dead-simple form. In particular, there are loops where there is only one IV (no separate incr). I?m not confident that those loops will work correctly; it seems to me that the existing ?outer_phi?, with its extra stride addend, may well contribute an unwanted extra step, when the inner_phi is post-incremented. As a related issue, I think the pseudocode comment at the top is false, with the same problem. It should probably not say this: // L1: for (long phi1 = init; phi1 < limit; phi1 += stride) { // // phi1 := Phi(L1, init, phi1 + stride) but rather this: // L1: for (long phi1 = init; phi1 < limit; phi1 += phi2) { // // phi1 := Phi(L1, init, phi1 + phi2) This sort of bug will show up if (a) we test long loops with large trip counts, and (b) also use the stress mode which makes the outer loop trip two or three times, and finally (c) we get several kinds of loops; ones with and without phi == incr, and with and without ?limit_check_required?, and with each kind of possible termination condition (< <= > >=). ? John P.S. 
I came to this question while working through the transform logic on pseudocode. Here it is, for reference. It think it might make a good diagram to place in the code, just before the comment that says ?Peel one iteration?. From john.r.rose at oracle.com Thu Aug 20 00:47:46 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 19 Aug 2020 17:47:46 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> Message-ID: On Aug 19, 2020, at 5:47 PM, John Rose wrote: > > On Aug 17, 2020, at 1:49 AM, Roland Westrelin > wrote: >> >> Does the fixed patch look ok to you? > > I?m going over it one more time (between smoky breaths from > the California fires) and I have a question. What is the exact > structure of outer_phi? > > 1. At first it is a clone of phi, its region patched and the other edges the same: > > outer_phi := Phi(outer_head, init, AddL(phi, stride)) > > 2. After long_loop_replace_long_iv, the interior phi links it back to itself: > > outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) > > The thing I?m suspicious of (as fragile code) is the retention of the addend > ?stride?. The inner loop (on ?inner_phi?) *also* adds the stride. What > prevents there from being a superfluous number of strides added? > > I suppose the answer is that the ending output of the inner loop produces > the post-incremented value as (inner_incr + outer_phi), while the > pre-incremented value is (inner_phi + outer_phi), which is *always* > short of the final count by the stride; thus it?s OK to add single missing > stride in the outer loop. > > But, this seems fragile to me. Would it not be safer to make the outer > phi just copy the final inner IV value (the one that fails the loop?s test)? > So: > > outer_phi := Phi(outer_head, init, AddL(inner_incr, outer_phi)) > > I?m worried that some edge-case of loop loop might actually miss > a stride unless the outer phi has the latter, dead-simple form. > > In particular, there are loops where there is only one IV (no separate > incr). I?m not confident that those loops will work correctly; it seems > to me that the existing ?outer_phi?, with its extra stride addend, > may well contribute an unwanted extra step, when the inner_phi > is post-incremented. > > As a related issue, I think the pseudocode comment at the top is > false, with the same problem. 
It should probably not say this: > > // L1: for (long phi1 = init; phi1 < limit; phi1 += stride) { > // // phi1 := Phi(L1, init, phi1 + stride) > > but rather this: > > // L1: for (long phi1 = init; phi1 < limit; phi1 += phi2) { > // // phi1 := Phi(L1, init, phi1 + phi2) > > This sort of bug will show up if (a) we test long loops with > large trip counts, and (b) also use the stress mode which > makes the outer loop trip two or three times, and finally > (c) we get several kinds of loops; ones with and without > phi == incr, and with and without ?limit_check_required?, > and with each kind of possible termination condition > (< <= > >=). > > ? John > > P.S. I came to this question while working through the transform > logic on pseudocode. Here it is, for reference. It think it might > make a good diagram to place in the code, just before the comment > that says ?Peel one iteration?. == old IR nodes => entry_control: {...} x: for (long phi = init;;) { // phi := Phi(x, init, phi + stride) exit_test: if (phi < limit) back_control: fallthrough; else exit_branch: break; // test happens after increment => phi == phi_incr != NULL long incr = (phi + stride); ... use phi and incr ... phi = incr; } == new IR nodes (before final peel) => entry_control: {...} long adjusted_limit = limit + stride; //because phi_incr != NULL assert(!limit_check_required || (extralong)limit + stride == adjusted_limit); // else deopt ulong inner_iters_limit = max_jint - ABS(stride) - 1; //near 0x7FFFFFF0 outer_head: for (long outer_phi = init;;) { //phi->clone(), in(0):=outer_head // outer_phi := Phi(outer_head, init, inner_phi, phi=>(outer_phi+inner_phi) + stride) // >>> ISSUE: is the extra '+ stride' here always harmless? <<< ulong inner_iters_max = (ulong) MAX(0LL, ((extralong)adjusted_limit - outer_phi) * SGN(stride)); int inner_iters_actual_int = (int) MIN(inner_iters_limit, inner_iters_max) * SGN(stride); inner_head: x: //in(1) := outer_head for (int inner_phi = 0;;) { // inner_phi := Phi(x, intcon(0), inner_phi + stride) int inner_incr = inner_phi + stride; bool inner_bol = (inner_incr < inner_iters_actual_int); exit_test: //exit_test->in(1) := inner_bol; if (inner_bol) // WAS (phi < limit) back_control: fallthrough; else inner_exit_branch: break; //exit_branch->clone() // REPLACE phi => (outer_phi+inner_phi) // REPLACE incr => (outer_phi+inner_incr) ... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ... 
inner_phi = inner_phi + stride; // inner_incr } outer_exit_test: //exit_test->clone(), in(0):=inner_exit_branch if ((outer_phi+inner_phi) < limit) // WAS (phi < limit) outer_back_branch: fallthrough; //back_control->clone(), in(0):=outer_exit_test else exit_branch: break; //in(0) := outer_exit_test } From ningsheng.jian at arm.com Thu Aug 20 02:27:08 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Thu, 20 Aug 2020 10:27:08 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> Message-ID: <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> Hi Andrew, On 8/19/20 9:01 PM, Andrew Dinn wrote: > Hi Ningsheng, > > On 19/08/2020 10:53, Ningsheng Jian wrote: >> I have updated the patch based on the review comments. Would you mind >> taking another look? Thanks! >> >> Full: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04/ >> >> Incremental: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ > > That looks ok. A few suggested tweaks: > Thanks! > aarch64.ad:168 > > I think the following comment explains more clearly what is going on: > > // For SVE vector registers, we simply extend vector register size to 8 > // 'logical' slots. This is nominally 256 bits but it actually covers > // all possible 'physical' SVE vector register lengths from 128 ~ 2048 bits. > // The 'physical' SVE vector register length is detected during startup > // so the register allocator is able to identify the correct number of > // bytes needed for an SVE spill/unspill. > // Note that a vector register with 4 slots, denotes a 128-bit NEON > // register allowing it to be distinguished from the > // corresponding SVE vector register when the SVE vector length > // is 128 bits. > This looks better than mine. Thanks! :-) > postaloc.cpp:312 & 322 > > 311 if (lrgs(val_idx).is_scalable()) { > 312 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); > > . . . > > 321 if (lrgs(val_idx).is_scalable()) { > 322 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); > > You don't strictly need the asserts here as this is already asserted in > the call to is_scalable(). > The assertion in LRG::is_scalable() is different, while this is an assertion for ideal_reg of a given node. > >> JTreg tests are still running, and so far no new failure found. > Ok, well assuming they pass I am happy with this latest patch modulo the > tweaks above. > Will report back once the tests on real hardware passed. Thanks, Ningsheng From nick.gasson at arm.com Thu Aug 20 03:26:59 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 11:26:59 +0800 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() In-Reply-To: <83e818a0-b9d2-205b-6a25-4869fc1e2101@oracle.com> References: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> <83e818a0-b9d2-205b-6a25-4869fc1e2101@oracle.com> Message-ID: <85eeo2xaik.fsf@nicgas01-pc.shanghai.arm.com> Thank you both for the reviews. I've pushed it. 
-- Nick On 08/20/20 03:18 am, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 8/19/20 12:16 PM, Doug Simon wrote: >> Looks good to me. >> >> -Doug >> >>> On 19 Aug 2020, at 10:37, Nick Gasson wrote: >>> >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 >>> Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ >>> >>> We see this crash occasionally when testing with Graal on some AArch64 >>> systems: >>> >>> # >>> # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 >>> # assert(external_guard || result != __null) failed: Invalid JNI handle >>> # >>> >>> V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c >>> V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 >>> V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 >>> V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c >>> V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc >>> >>> The full hs_err file is attached to the JBS entry. >>> >>> The handle here is _HotSpotJVMCIRuntime_instance which is initialised in >>> JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): >>> >>> JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); >>> _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); >>> >>> JVMCICompiler::force_comp_at_level_simple() checks whether the _object >>> field inside the handle is null before calling JNIHandles::resolve() on >>> it, which should avoid the above assertion failure where the pointee is >>> null. However on a non-TSO architecture another thread may observe the >>> store to _object when assigning _HotSpotJVMCIRuntime_instance before the >>> store in JVMCIEnv::make_global() that initialises the pointed-to oop. We >>> need to add a store-store barrier here to force the expected ordering. >>> >>> Tested with jcstress and Graal on the affected machine, which used to >>> reproduce it quite reliably. >>> >>> -- >>> Thanks, >>> Nick >> From fairoz.matte at oracle.com Thu Aug 20 03:39:51 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Wed, 19 Aug 2020 20:39:51 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <6f104422-11cc-1bea-2ebf-a916a22f10fd@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> <6f104422-11cc-1bea-2ebf-a916a22f10fd@oracle.com> Message-ID: <94f5c0a2-f324-4613-abbd-68c4d7df6f52@default> Thanks Vladimir and Serguei for the reviews. Thanks, Fairoz > -----Original Message----- > From: Serguei Spitsyn > Sent: Thursday, August 20, 2020 1:45 AM > To: Vladimir Kozlov ; Fairoz Matte > ; hotspot-compiler-dev at openjdk.java.net; > serviceability-dev at openjdk.java.net > Cc: Coleen Phillimore > Subject: Re: RFR(s): 8248295: > serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal > > Hi Fairoz, > > LGTM++ > > Thanks, > Serguei > > > On 8/19/20 09:38, Vladimir Kozlov wrote: > > Looks good. > > > > Thanks, > > Vladimir K > > > > On 8/19/20 5:30 AM, Fairoz Matte wrote: > >> Hi Vladimir, > >> > >> Thanks for the review. > >> > >>> I would suggest to run test with -XX:+PrintCodeCache flag which > >>> prints CodeCache usage on exit. > >>> > >>> Also add '-ea -esa' flags - some runs failed with them because they > >>> increase Graal's methods size. 
> >>> > >>> Running test with immediately caused OOM error on my local linux > >>> machine: > >>> > >>> '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - > >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > >>> XX:+UseJVMCICompiler -Djvmci.Compiler=graal' > >>> > >>> With -XX:ReservedCodeCacheSize=30m I got: > >>> > >>> [11.217s][warning][codecache] CodeCache is full. Compiler has been > >>> disabled. > >>> [11.217s][warning][codecache] Try increasing the code cache size > >>> using - XX:ReservedCodeCacheSize= > >>> > >>> With -XX:ReservedCodeCacheSize=50m I got this output: > >> > >> Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is > >> the safe one to use. > >> > >>> > >>> CodeCache: size=51200Kb used=34401Kb max_used=34401Kb > free=16798Kb > >>> > >>> May be you need to set it to 35m or better to 50m to be safe. > >>> > >>> Note, without Graal test uses only 5.5m: > >>> > >>> CodeCache: size=20480Kb used=5677Kb max_used=5688Kb > free=14803Kb > >>> > >>> ----------------------------- > >>> > >>> I also forgot to ask you to update test's Copyright year. > >> > >> I have updated the copyright year. > >> Updated webrev for the reference - > >> http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ > >> > >> Thanks, > >> Fairoz > >>> > >>> Regards, > >>> Vladimir K > >>> > >>> On 8/18/20 1:10 AM, Fairoz Matte wrote: > >>>> Hi Vladimir, > >>>> > >>>> Thanks for looking into. > >>>> This is intermittent crash, and is reproducible in windows debug > >>>> build > >>> environment. Below is the testing performed. > >>>> > >>>> 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- > >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > >>> XX:+UseJVMCICompiler" > >>>> 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- > >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > >>> XX:+UseJVMCICompiler" > >>>> > >>>> Thanks, > >>>> Fairoz > >>>> > >>>>> -----Original Message----- > >>>>> From: Vladimir Kozlov > >>>>> Sent: Monday, August 17, 2020 11:22 PM > >>>>> To: Fairoz Matte ; hotspot-compiler- > >>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net > >>>>> Cc: Coleen Phillimore ; Dean Long > >>>>> > >>>>> Subject: Re: RFR(s): 8248295: > >>>>> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with > >>>>> Graal > >>>>> > >>>>> Hi Fairoz, > >>>>> > >>>>> How you determine that +10Mb is enough with Graal? > >>>>> > >>>>> Thanks, > >>>>> Vladimir > >>>>> > >>>>> On 8/17/20 5:46 AM, Fairoz Matte wrote: > >>>>>> Hi, > >>>>>> > >>>>>> > >>>>>> > >>>>>> Please review this small test change to work with Graal. > >>>>>> > >>>>>> > >>>>>> > >>>>>> Background: > >>>>>> > >>>>>> Graal require more code cache compared to c1/c2. but the test > >>>>>> case always > >>>>> set it to 20MB. This may not be sufficient when running graal. > >>>>>> > >>>>>> Default configuration for ReservedCodeCacheSize = 250MB > >>>>>> > >>>>>> With graal enabled, ReservedCodeCacheSize = 350MB > >>>>>> > >>>>>> > >>>>>> > >>>>>> Either we can modify the framework to honor > ReservedCodeCacheSize > >>>>>> for > >>>>> graal or just update the testcase. > >>>>>> > >>>>>> There are not many test cases they rely on ReservedCodeCacheSize > >>>>>> or > >>>>> InitialCodeCacheSize. So the fix prefer the later one. 
> >>>>>> > >>>>>> > >>>>>> > >>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > >>>>>> > >>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > >>>>>> > >>>>>> > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Fairoz > >>>>>> > >>>>>> > >>>>>> > From nick.gasson at arm.com Thu Aug 20 04:48:30 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 12:48:30 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> Message-ID: <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> On 08/19/20 19:10 pm, Andrew Haley wrote: > On 19/08/2020 11:05, Magnus Ihse Bursie wrote: >> This is maybe not relevant, but I was surprised to find >> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, >> and b) the name implies that it is a test, even though that it resides >> in src. Is this really proper? > > I have no idea whether it's really proper, but it allows us to check > that instructions are encoded correctly by cross-checking with the > system's assembler. There might well be a more hygienic way to do > that, but I don't want to be without it. It is perhaps a bit strange to have the test code under src/ and embedded in the assembler implementation. How about we move it under test/ using the existing gtest framework for native code tests? That runs in tier1 and also for release builds. I tried this just now and it's easy to do. -- Thanks, Nick From christian.hagedorn at oracle.com Thu Aug 20 07:10:57 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 20 Aug 2020 09:10:57 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: <419a1ca8-3bb0-1ed9-3d6b-6dec9fa4217e@oracle.com> Thank you Vladimir for your careful review! Best regards, Christian On 19.08.20 18:43, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir K > > On 8/19/20 7:06 AM, Christian Hagedorn wrote: >> On 18.08.20 17:41, Vladimir Kozlov wrote: >>> c1_Compilation.hpp: looks like both versions of allocator() do the >>> same thing. >> >> Right, I first wanted to have a public allocator() version in >> non-product only - but that might be over-engineered as they do the >> same thing. I changed it back to a single public version. >> >>> I suggest to build with configure --with-debug-level=optimized to >>> check that NOT_PRODUCT can be built with these changes. >> >> That's a good idea! I indeed forgot about one NOT_PRODUCT -> >> DEBUG_ONLY change. I also found other build issues with the optimized >> build. I filed [1] and already sent an RFR for it. It builds >> successfully with this patch on top of it. 
>> >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ >> >> Best regards, >> Christian >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252037 >> >>> Thanks, >>> Vladimir >>> >>> On 8/18/20 6:16 AM, Christian Hagedorn wrote: >>>> Hi Vladimir >>>> >>>> On 17.08.20 19:36, Vladimir Kozlov wrote: >>>>> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>>>>> >>>>>> I'm wondering though if these ifdefs are even required for >>>>>> if-blocks inside methods? >>>>>> >>>>>> Isn't, for example, this if-block: >>>>>> >>>>>> #ifndef PRODUCT >>>>>> ???????? if (TraceLinearScanLevel >= 2) { >>>>>> ?????????? tty->print_cr("killing XMMs for trig"); >>>>>> ???????? } >>>>>> #endif >>>>>> >>>>>> removed anyways when the flag is set to < 2 (which is statically >>>>>> known and thus would allow this entire block to be removed)? Or >>>>>> does it make a difference by explicitly guarding it with an ifdef? >>>>> >>>>> You are right. It could be statically removed. But we keep #ifdef >>>>> sometimes to indicate that code is executed only in debug build >>>>> because we don't always remember type of a flag. >>>> >>>> I see, that makes sense. I updated my patch and left the ifdefs >>>> there but changed them to ASSERT. I also updated other ifdefs >>>> belonging to TraceLinearScanLevel appropriately. >>>> >>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ >>>> >>>> Best regards, >>>> Christian >>>> >>>>> >>>>> Thanks, >>>>> Vladimir K >>>>> >>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>>>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>>>>> >>>>>>> But the flag is available only in DEBUG build: >>>>>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>>>>> >>>>>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>>>>> Hi >>>>>>>> >>>>>>>> Please review the following enhancement for C1: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>>>>> >>>>>>>> While I was working on JDK-8249603 [1], I added some additional >>>>>>>> debugging and logging code which helped to figure out what was >>>>>>>> going on. I think it would be useful to have this code around >>>>>>>> for the analysis of future C1 register allocator bugs. >>>>>>>> >>>>>>>> This RFE adds (everything non-product code): >>>>>>>> - find_interval(number): Can be called like that from gdb >>>>>>>> anywhere to find an interval with the given number. >>>>>>>> - Interval::print_children()/print_parent(): Useful when >>>>>>>> debugging with gdb to quickly show the split children and parent. >>>>>>>> - LinearScan::print_reg_num(number): Prints the register or >>>>>>>> stack location for this register number. This is useful in some >>>>>>>> places (logging with TraceLinearScanLevel set) where it just >>>>>>>> printed a number which first had to be manually looked up in >>>>>>>> other logs. >>>>>>>> >>>>>>>> I additionally did some cleanup of the touched code. >>>>>>>> >>>>>>>> We could additionally split the TraceLinearScanLevel flag into >>>>>>>> separate flags related to the different phases of the register >>>>>>>> allocation algorithm. It currently just prints too much details >>>>>>>> on the higher levels. 
You often find yourself being interested >>>>>>>> in a specific part of the algorithm and only want to know more >>>>>>>> details there. To achieve that you now you have to either handle >>>>>>>> all the noise or manually disable/enable other logs. We could >>>>>>>> file an RFE to clean this up if it's worth the effort - given >>>>>>>> that there are not many new issues filed for C1 register >>>>>>>> allocation today. >>>>>>>> >>>>>>>> Thank you! >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Christian >>>>>>>> >>>>>>>> >>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>>> From adinn at redhat.com Thu Aug 20 08:34:45 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2020 09:34:45 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> Message-ID: Hi Ningsheng, >> postaloc.cpp:312 & 322 >> >> 311???? if (lrgs(val_idx).is_scalable()) { >> 312?????? assert(val->ideal_reg() == Op_VecA, "scalable vector >> register"); >> >> ???????? . . . >> >> 321?????? if (lrgs(val_idx).is_scalable()) { >> 322???????? assert(val->ideal_reg() == Op_VecA, "scalable vector >> register"); >> >> You don't strictly need the asserts here as this is already asserted in >> the call to is_scalable(). > > The assertion in LRG::is_scalable() is different, while this is an > assertion for ideal_reg of a given node. Yes, my apologies for misreading that. These assertions should be retained. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From adinn at redhat.com Thu Aug 20 08:48:07 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2020 09:48:07 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> On 20/08/2020 05:48, Nick Gasson wrote: > On 08/19/20 19:10 pm, Andrew Haley wrote: >> On 19/08/2020 11:05, Magnus Ihse Bursie wrote: >>> This is maybe not relevant, but I was surprised to find >>> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, >>> and b) the name implies that it is a test, even though that it resides >>> in src. Is this really proper? >> >> I have no idea whether it's really proper, but it allows us to check >> that instructions are encoded correctly by cross-checking with the >> system's assembler. There might well be a more hygienic way to do >> that, but I don't want to be without it. 
> > It is perhaps a bit strange to have the test code under src/ and > embedded in the assembler implementation. How about we move it under > test/ using the existing gtest framework for native code tests? That > runs in tier1 and also for release builds. I tried this just now and > it's easy to do. I'm not sure that would be an improvement. This python code is used to generate C code run as part of JVM startup in a debug JVM build i.e. code that is linked into the JVM itself. So, the code it generates is really the same as the debug code embedded in the JVM. It doesn't really bear any relation to the code in the test tree. If the generator code were to go anywhere else it would perhaps make most sense to put it in the make tree. I'm not sure that is required though or even appropriate. There is already a precedent for keeping generator code in the source tree and, when it is specific to a given arch, keeping it next to the related source. The adlc generator code sits in the shared source tree. The m4 file used to generate parts of aarch64.ad is in the aarch64 source tree. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu Aug 20 08:50:32 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 09:50:32 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> Message-ID: <7a12ad31-9196-c724-16c9-9994b096974c@redhat.com> On 20/08/2020 03:27, Ningsheng Jian wrote: > // Note that a vector register with 4 slots, denotes a 128-bit NEON Lose the comma. :-) Never known to miss a trivial point, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Aug 20 08:53:52 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 09:53:52 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <6fea2144-b416-cad3-8c99-068a82490256@redhat.com> On 20/08/2020 05:48, Nick Gasson wrote: > On 08/19/20 19:10 pm, Andrew Haley wrote: >> On 19/08/2020 11:05, Magnus Ihse Bursie wrote: >>> This is maybe not relevant, but I was surprised to find >>> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, >>> and b) the name implies that it is a test, even though that it resides >>> in src. Is this really proper? 
>> >> I have no idea whether it's really proper, but it allows us to check >> that instructions are encoded correctly by cross-checking with the >> system's assembler. There might well be a more hygienic way to do >> that, but I don't want to be without it. > > It is perhaps a bit strange to have the test code under src/ and > embedded in the assembler implementation. How about we move it under > test/ using the existing gtest framework for native code tests? That > runs in tier1 and also for release builds. I tried this just now and > it's easy to do. Go on, then! Bear in mind that the idea of this test is that it checks the encoding of all instructions, regardless of whether the processor supports them or not. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Thu Aug 20 08:58:57 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 16:58:57 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> Message-ID: <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> Hi Andrew, On 08/20/20 16:48 pm, Andrew Dinn wrote: >> >> It is perhaps a bit strange to have the test code under src/ and >> embedded in the assembler implementation. How about we move it under >> test/ using the existing gtest framework for native code tests? That >> runs in tier1 and also for release builds. I tried this just now and >> it's easy to do. > I'm not sure that would be an improvement. This python code is used to > generate C code run as part of JVM startup in a debug JVM build i.e. > code that is linked into the JVM itself. So, the code it generates is > really the same as the debug code embedded in the JVM. It doesn't really > bear any relation to the code in the test tree. > I meant move the test itself - entry() and asm_check() in assembler_aarch64.cpp - under test/hotspot/gtest. The generator would move with it. > If the generator code were to go anywhere else it would perhaps make > most sense to put it in the make tree. I'm not sure that is required > though or even appropriate. There is already a precedent for keeping > generator code in the source tree and, when it is specific to a given > arch, keeping it next to the related source. The adlc generator code > sits in the shared source tree. The m4 file used to generate parts of > aarch64.ad is in the aarch64 source tree. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu Aug 20 09:08:18 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 10:08:18 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: Hi, On 20/08/2020 09:58, Nick Gasson wrote: > > On 08/20/20 16:48 pm, Andrew Dinn wrote: >>> >>> It is perhaps a bit strange to have the test code under src/ and >>> embedded in the assembler implementation. How about we move it under >>> test/ using the existing gtest framework for native code tests? That >>> runs in tier1 and also for release builds. I tried this just now and >>> it's easy to do. >> I'm not sure that would be an improvement. This python code is used to >> generate C code run as part of JVM startup in a debug JVM build i.e. >> code that is linked into the JVM itself. So, the code it generates is >> really the same as the debug code embedded in the JVM. It doesn't really >> bear any relation to the code in the test tree. > > I meant move the test itself - entry() and asm_check() in > assembler_aarch64.cpp - under test/hotspot/gtest. The generator would > move with it. Hmm. I'm still not sure how this would work. Let's see the patch and we can talk about it. It still sounds to me rather like pointlessly moving the furniture around. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Thu Aug 20 09:12:54 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 11:12:54 +0200 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL Message-ID: <877dtt3ckp.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8251527/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8251527 This triggers with Shenandoah but the fix (and the bug) is in shared C2 code. CallNode::extract_projections(), once it has found the control ProjNode looks for the CatchNode at the first use of the ProjNode. In the case of the crash, the ProjNode has more than one use and the first use is not the CatchNode (but a pinned LoadNode). I propose using unique_ctrl_out() instead. The ProjNode has a LoadNode because one is pinned on a ProjNode by PhaseIdealLoop::split_if_with_blocks_post() when it tries to sink the LoadNode out of loop. A LoadNode becomes the first use of the ProjNode after the loop body is cloned during unswitching. Roland. 
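To make the proposed change concrete, the lookup in CallNode::extract_projections() becomes something along these lines (a rough sketch of the idea only, not the actual webrev; the local variable name is made up):

    // pn is the TypeFunc::Control projection of the call.
    // unique_ctrl_out() only considers control (CFG) users, so a pinned
    // Load hanging off the projection no longer hides the Catch node.
    Node* u = pn->unique_ctrl_out();
    if (u != NULL && u->is_Catch()) {
      // collect the CatchProj outputs as before
    }

unique_ctrl_out() can also return NULL, so the result still has to be checked before use.
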
From nick.gasson at arm.com Thu Aug 20 09:40:34 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 17:40:34 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> On 08/20/20 17:08 pm, Andrew Haley wrote: >> >> I meant move the test itself - entry() and asm_check() in >> assembler_aarch64.cpp - under test/hotspot/gtest. The generator would >> move with it. > > Hmm. I'm still not sure how this would work. Let's see the patch and > we can talk about it. It still sounds to me rather like pointlessly > moving the furniture around. http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ Then you'd run it with make exploded-test TEST="gtest:AssemblerAArch64" The downside is that it won't run on every startup of a debug build, but it will run in the tier1 tests, including for release builds, which arguably gives more coverage. It looks a lot tidier to me, but that's clearly subjective. -- Thanks, Nick From christian.hagedorn at oracle.com Thu Aug 20 09:45:05 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 20 Aug 2020 11:45:05 +0200 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL In-Reply-To: <877dtt3ckp.fsf@redhat.com> References: <877dtt3ckp.fsf@redhat.com> Message-ID: <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> Hi Roland That looks good to me. Best regards, Christian On 20.08.20 11:12, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8251527/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8251527 > > This triggers with Shenandoah but the fix (and the bug) is in shared > C2 code. > > CallNode::extract_projections(), once it has found the control ProjNode > looks for the CatchNode at the first use of the ProjNode. In the case of > the crash, the ProjNode has more than one use and the first use is not > the CatchNode (but a pinned LoadNode). I propose using unique_ctrl_out() > instead. > > The ProjNode has a LoadNode because one is pinned on a ProjNode by > PhaseIdealLoop::split_if_with_blocks_post() when it tries to sink the > LoadNode out of loop. A LoadNode becomes the first use of the ProjNode > after the loop body is cloned during unswitching. > > Roland. 
> From magnus.ihse.bursie at oracle.com Thu Aug 20 10:14:36 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Thu, 20 Aug 2020 12:14:36 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> On 2020-08-20 11:40, Nick Gasson wrote: > On 08/20/20 17:08 pm, Andrew Haley wrote: >>> I meant move the test itself - entry() and asm_check() in >>> assembler_aarch64.cpp - under test/hotspot/gtest. The generator would >>> move with it. >> Hmm. I'm still not sure how this would work. Let's see the patch and >> we can talk about it. It still sounds to me rather like pointlessly >> moving the furniture around. > http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ > > Then you'd run it with > > make exploded-test TEST="gtest:AssemblerAArch64" > > The downside is that it won't run on every startup of a debug build, but > it will run in the tier1 tests, including for release builds, which > arguably gives more coverage. It looks a lot tidier to me, but that's > clearly subjective. FWIW, it definitely looks tidier to me too. /Magnus > > -- > Thanks, > Nick From adinn at redhat.com Thu Aug 20 10:38:32 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2020 11:38:32 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> Message-ID: <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> On 20/08/2020 11:14, Magnus Ihse Bursie wrote: > On 2020-08-20 11:40, Nick Gasson wrote: >> http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ >> >> Then you'd run it with >> >> ?? make exploded-test TEST="gtest:AssemblerAArch64" >> >> The downside is that it won't run on every startup of a debug build, but >> it will run in the tier1 tests, including for release builds, which >> arguably gives more coverage. It looks a lot tidier to me, but that's >> clearly subjective. > FWIW, it definitely looks tidier to me too. Well, perhaps this check ought to be done as a standalone test rather than as debug build validation. I don't really have any deep commitment either way. However, if we do proceed with this I think it ought to be in a separate follow-up patch and with its own JIRA. 
It should not stop Ningsheng's SVE patch going in as is since that merely corrects the status quo to allow for SVE instructions. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.x.ivanov at oracle.com Thu Aug 20 12:29:27 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 20 Aug 2020 15:29:27 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> Hi Ningsheng, > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html Impressive work, Ningsheng! > http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt "Since the bottom 128 bits are shared with the NEON, we extend current register mask definition of V0-V31 registers. Currently, c2 uses one bit mask for a 32-bit register slot, so to define at most 2048 bits we will need to add 64 slots in AD file. That's a really large number, and will also break current regmask assumption." Can you, please, elaborate on the last point? What RegMask assumptions are broken for 2048-bit vectors? I'm looking at [1] and try to understand the motivation for the changes in shared code. Compared to x86 w/ AVX512, architectural state for vector registers is 4x larger in the worst case (ignoring predicate registers for now). Here are the relevant constants on x86: gensrc/adfiles/adGlobals_x86.hpp: // the number of reserved registers + machine registers. #define REG_COUNT 545 ... // Size of register-mask in ints #define RM_SIZE 22 My estimate is that for AArch64 with SVE support the constants will be: REG_COUNT < 2500 RM_SIZE < 100 which don't look too bad. Also, I don't see any changes related to stack management. So, I assume it continues to be managed in slots. Any problems there? As I understand, wide SVE registers are caller-save, so there may be many spills of huge vectors around a call. (Probably, not possible with C2 auto-vectorizer as it is now, but Vector API will expose it.) Have you noticed any performance problems? If that's the case, then AVX512 support on x86 would benefit from similar optimization as well. FTR there was a similar exercise [2] on x86 to abstract away exact sizes of vector registers, but it didn't have to worry about RA since all the operands were already available. Also, vectors of all different sizes may be used. So, it makes it hard to compare. Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~njian/8231441/webrev.03-ra/ [2] https://bugs.openjdk.java.net/browse/JDK-8230015 > On 7/30/20 7:26 PM, Andrew Dinn wrote: >> Hi Ningsheng, >> >> I will start to review this either later today or (more likely) >> tomorrow. It will probably take some time to work through it all. I will >> work from the updated patch posted by PengFei. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Red Hat Distinguished Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. 
03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> >> On 21/07/2020 07:05, Ningsheng Jian wrote: >>> [Ping] >>> >>> Could anyone please help to review this patch, especially for the c2 >>> register allocation part? >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8231441 >>> >>> The latest webrev: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02 >>> >>> In the latest webrev, we block one predicate register (p7) with all >>> elements preset to TRUE, so that c2 compiled code can use it freely to >>> generate instructions for unpredicated operations. >>> >>> And the split parts: >>> >>> 1) SVE feature detection: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02-feature >>> >>> 2) c2 register allocation: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02-ra >>> >>> 3) SVE c2 backend: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02-c2 >>> >>> The initial RFR which has some descriptions of the patch: >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-March/037628.html >>> >>> >>> >>> The description can also be found at: >>> http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt >>> >>> Notes to verify the patch on QEMU user emulation, with an example of >>> compiled code: >>> http://cr.openjdk.java.net/~njian/8231441/running-sve-in-qemu-user.txt >>> >>> Thanks, >>> Ningsheng >>> >>> >>> On 5/27/20 3:23 PM, Ningsheng Jian wrote: >>>> Hi, >>>> >>>> I have rebased this patch with some more comments added. And also >>>> relaxed the instruction matching conditions for 128-bit vector. >>>> >>>> I would appreciate if someone could help to review this. >>>> >>>> Whole patch: >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01 >>>> >>>> Different parts of changes: >>>> >>>> 1) SVE feature detection >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-feature >>>> >>>> 2) c2 registion allocation >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-ra >>>> >>>> 3) SVE c2 backend >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-c2 >>>> >>>> (Or should I split this into different JBS?) >>>> >>>> Thanks, >>>> Ningsheng >>>> >>>> On 3/25/20 2:37 PM, Ningsheng Jian wrote: >>>>> Hi, >>>>> >>>>> Could you please help to review this patch adding AArch64 SVE support? >>>>> It also touches c2 compiler shared code. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 >>>>> Webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.00 >>>>> >>>>> Arm has released new vector ISA extension for AArch64, SVE [1] and >>>>> SVE2 [2]. This patch adds the initial SVE support in OpenJDK. In this >>>>> patch we have: >>>>> >>>>> 1) SVE feature enablement and detection >>>>> 2) SVE vector register allocation support with initial predicate >>>>> register definition >>>>> 3) SVE c2 backend for current SLP based vectorizer. (We also have a >>>>> POC >>>>> patch of a new vectorizer using SVE predicate-driven loop control, but >>>>> that's still under development.) >>>>> >>>>> SVE register definition >>>>> ======================= >>>>> Unlike other SIMD architectures, SVE allows hardware >>>>> implementations to >>>>> choose a vector register length from 128 and 2048 bits, multiple of >>>>> 128 >>>>> bits. So we introduce a new vector type VectorA, i.e. length agnostic >>>>> (scalable) vector type, and Op_VecA for machine vectora register. 
>>>>> In the >>>>> meantime, to minimize register allocation code changes, we also take >>>>> advantage of one JIT compiler aspect, that is during the compile >>>>> time we >>>>> actually know the real hardware SVE vector register size of current >>>>> running machine. So, the register allocator actually knows how many >>>>> register slots an Op_VecA ideal reg requires, and could work fine >>>>> without much modification. >>>>> >>>>> Since the bottom 128 bits are shared with the NEON, we extend current >>>>> register mask definition of V0-V31 registers. Currently, c2 uses >>>>> one bit >>>>> mask for a 32-bit register slot, so to define at most 2048 bits we >>>>> will >>>>> need to add 64 slots in AD file. That's a really large number, and >>>>> will >>>>> also break current regmask assumption. Considering the SVE vector >>>>> register is architecturally scalable for different sizes, we just >>>>> define >>>>> double of original NEON vector register slots, i.e. 8 slots: Vx, Vx_H, >>>>> Vx_J ... Vx_O. After adlc, the generated register masks now looks >>>>> like: >>>>> >>>>> const RegMask _VECTORA_REG_mask( 0x0, 0x0, 0xffffffff, 0xffffffff, >>>>> 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, ... >>>>> >>>>> const RegMask _VECTORD_REG_mask( 0x0, 0x0, 0x3030303, 0x3030303, >>>>> 0x3030303, 0x3030303, 0x3030303, 0x3030303, ... >>>>> >>>>> const RegMask _VECTORX_REG_mask( 0x0, 0x0, 0xf0f0f0f, 0xf0f0f0f, >>>>> 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, ... >>>>> >>>>> And we use SlotsPerVecA to indicate regmask bit size for a VecA >>>>> register. >>>>> >>>>> Although for physical register allocation, register allocator does not >>>>> need to know the real VecA register size, while doing spill/unspill, >>>>> current register allocation needs to know actual stack slot size to >>>>> store/load VecA registers. SVE is able to do vector size agnostic >>>>> spilling, but to minimize the code changes, as I mentioned before, we >>>>> just let RA know the actual vector register size in current running >>>>> machine, by calling scalable_vector_reg_size(). >>>>> >>>>> In the meantime, since some vector operations do not have unpredicated >>>>> SVE1 instructions, but only predicate version, e.g. vector multiply, >>>>> vector load/store. We have also defined predicate registers in this >>>>> patch, and c2 register allocator will allocate a temp predicate >>>>> register >>>>> to fulfill the expecting unpredicated operations. And this can also be >>>>> used for future predicate-driven vectorizer. This is not efficient for >>>>> now, as we can see many ptrue instructions in the generated code. One >>>>> possible solution I can see, is to block one predicate register, and >>>>> preset it to all true. But to preserve/reinitialize a caller save >>>>> register value cross calls seems risky to work in this patch. I decide >>>>> to defer it to further optimization work. If anyone has any >>>>> suggestions >>>>> on this, I would appreciate. >>>>> >>>>> SVE feature detection >>>>> ===================== >>>>> Since we may have some compiled code based on the initial detected SVE >>>>> vector register length and the compiled code is compiled only for that >>>>> vector register length, we assume that the SVE vector register length >>>>> will not be changed during the JVM lifetime. However, SVE vector >>>>> length >>>>> is per-thread and can be changed by system call [3], so we need to >>>>> make >>>>> sure that each jni call will not change the sve vector length. 
>>>>>
>>>>> Currently, we verify the SVE vector register length on each JNI return,
>>>>> and if an SVE vector length change is detected, jvm simply reports error
>>>>> and stops running. The VM running vector length can also be set by
>>>>> existing VM option MaxVectorSize with c2 enabled. If MaxVectorSize is
>>>>> specified not the same as system default sve vector length (in
>>>>> /proc/sys/abi/sve_default_vector_length), JVM will set current process
>>>>> sve vector length to the specified vector length.
>>>>>
>>>>> Compiled code
>>>>> =============
>>>>> We have added all current c2 backend codegen on par with NEON, but only
>>>>> for vector length larger than 128-bit.
>>>>>
>>>>> On a 1024 bit SVE environment, for the following simple loop with int
>>>>> array element type:
>>>>>
>>>>>     for (int i = 0; i < LENGTH; i++) {
>>>>>       c[i] = a[i] + b[i];
>>>>>     }
>>>>>
>>>>> c2 generated loop:
>>>>>
>>>>>     0x0000ffff811c0820:  sbfiz  x11, x10, #2, #32
>>>>>     0x0000ffff811c0824:  add    x13, x18, x11
>>>>>     0x0000ffff811c0828:  add    x14, x1, x11
>>>>>     0x0000ffff811c082c:  add    x13, x13, #0x10
>>>>>     0x0000ffff811c0830:  add    x14, x14, #0x10
>>>>>     0x0000ffff811c0834:  add    x11, x0, x11
>>>>>     0x0000ffff811c0838:  add    x11, x11, #0x10
>>>>>     0x0000ffff811c083c:  ptrue  p1.s    // To be optimized
>>>>>     0x0000ffff811c0840:  ld1w   {z16.s}, p1/z, [x14]
>>>>>     0x0000ffff811c0844:  ptrue  p0.s
>>>>>     0x0000ffff811c0848:  ld1w   {z17.s}, p0/z, [x13]
>>>>>     0x0000ffff811c084c:  add    z16.s, z17.s, z16.s
>>>>>     0x0000ffff811c0850:  ptrue  p1.s
>>>>>     0x0000ffff811c0854:  st1w   {z16.s}, p1, [x11]
>>>>>     0x0000ffff811c0858:  add    w10, w10, #0x20
>>>>>     0x0000ffff811c085c:  cmp    w10, w12
>>>>>     0x0000ffff811c0860:  b.lt   0x0000ffff811c0820
>>>>>
>>>>> Test
>>>>> ====
>>>>> Currently, we don't have real hardware to verify SVE features (and
>>>>> performance). But we have run jtreg tests with SVE in some emulators. On
>>>>> QEMU system emulator, which has SVE emulation support, jtreg tier1-3
>>>>> passed with different vector sizes. We've also verified it with full
>>>>> jtreg tests without SVE on both x86 and AArch64, to make sure that
>>>>> there's no regression.
>>>>>
>>>>> The patch has also been applied to Vector API code base, and verified on
>>>>> emulator. In Vector API, there are more vector related tests and is more
>>>>> possible to generate vector instructions by intrinsification.
>>>>>
>>>>> A simple test can also run in QEMU user emulation, e.g.
>>>>>
>>>>> $ qemu-aarch64 -cpu max,sve-max-vq=2 java -XX:UseSVE=1 SIMD
>>>>>
>>>>> (
>>>>> To run it in user emulation mode, we will need to bypass SVE feature
>>>>> detection code in this patch. E.g. apply:
>>>>> http://cr.openjdk.java.net/~njian/8231441/user-emulation.patch
>>>>> )
>>>>>
>>>>> Others
>>>>> ======
>>>>> Since this patch is a bit large, I've also split it into 3 parts, for
>>>>> easy review:
>>>>>
>>>>> 1) SVE feature detection
>>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-feature
>>>>>
>>>>> 2) c2 register allocation
>>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-ra
>>>>>
>>>>> 3) SVE c2 backend
>>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-c2
>>>>>
>>>>> Part of this patch has been contributed by Joshua Zhu and Yang Zhang.
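For readers who want to reproduce the codegen quoted above, the loop only needs a trivial harness along the lines below. The class name SIMD matches the qemu invocation shown above; the array length and the warmup count are made-up values, not something taken from the patch.

    public class SIMD {
        static final int LENGTH = 1024;
        static int[] a = new int[LENGTH];
        static int[] b = new int[LENGTH];
        static int[] c = new int[LENGTH];

        static void add() {
            for (int i = 0; i < LENGTH; i++) {
                c[i] = a[i] + b[i];
            }
        }

        public static void main(String[] args) {
            // Run enough iterations for the loop to reach C2 and get vectorized.
            for (int n = 0; n < 20_000; n++) {
                add();
            }
            System.out.println(c[0]);
        }
    }
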
>>>>> >>>>> Refs >>>>> ==== >>>>> [1] https://developer.arm.com/docs/ddi0584/latest >>>>> [2] https://developer.arm.com/docs/ddi0602/latest >>>>> [3] https://www.kernel.org/doc/Documentation/arm64/sve.txt >>>>> >>>>> Thanks, >>>>> Ningsheng >>>>> >>>> >>> >> > From yudi.zheng at oracle.com Thu Aug 20 12:37:18 2020 From: yudi.zheng at oracle.com (Yudi Zheng) Date: Thu, 20 Aug 2020 14:37:18 +0200 Subject: RFR: 8252058: [JVMCI] Rework setting is_method_handle_invoke flag in jvmciCodeInstaller In-Reply-To: <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com> <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> Message-ID: Please review this rework of setting is_method_handle_invoke flag in jvmciCodeInstaller. http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8252058 Changes since last time are at http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java.udiff.html -Yudi > On 7 Jun 2020, at 23:14, Dean Long wrote: > > Looks good! > > dl > > On 6/7/20 1:06 PM, Yudi Zheng wrote: >> Thanks Dean! >> Here is a revision including your suggestion: http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/ >> >> -Yudi >> >>> On 6 Jun 2020, at 11:33, Dean Long wrote: >>> >>> I found a problem. You need to make CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by adding the JVMCI logic that looks backwards by the size of the call instruction. >>> >>> dl >>> >>> On 6/4/20 12:03 AM, Yudi Zheng wrote: >>>> I did not push this yet. It might require changes on the Graal side. I am still thinking about how to merge. >>>> >>>> -Yudi >>>> >>>>> On 4 Jun 2020, at 01:22, Dean Long wrote: >>>>> >>>>> Does this require recent Graal change in order to work correctly? >>>>> >>>>> dl >>>>> >>>>> On 6/3/20 3:47 PM, Dean Long wrote: >>>>>> Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do? >>>>>> >>>>>> dl >>>>>> >>>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote: >>>>>>> Hello, >>>>>>> >>>>>>> Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller. 
>>>>>>> >>>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347 >>>>>>> >>>>>>> Many thanks, >>>>>>> Yudi >>> >> > From rwestrel at redhat.com Thu Aug 20 12:51:42 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 14:51:42 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> Message-ID: <87y2m91nvl.fsf@redhat.com> Hi John, > I?m going over it one more time (between smoky breaths from > the California fires) and I have a question. What is the exact > structure of outer_phi? Thanks for taking another look at this. > 1. At first it is a clone of phi, its region patched and the other edges the same: > > outer_phi := Phi(outer_head, init, AddL(phi, stride)) It's only a clone so: outer_phi := Phi(outer_head, init, AddI(phi, stride)) (that is no AddL) > 2. After long_loop_replace_long_iv, the interior phi links it back to itself: > > outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) I don't think that's right. There are 2 calls to long_loop_replace_long_iv(). One to replace phi and the other one to replace incr (that is the AddI above). outer_phi := Phi(outer_head, init, AddL(I2L(AddI(inner_phi, stride)), outer_phi)) (actually this is not quite accurate because peeling one iteration causes an extra phi to be added to merge the peeled iteration with the counted loop in most cases). Do you see a problem with the above outer_phi structure? Roland. From aph at redhat.com Thu Aug 20 14:19:18 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 15:19:18 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> Message-ID: <57d07f23-c0f1-31eb-586a-71fa59b80891@redhat.com> On 20/08/2020 11:38, Andrew Dinn wrote: > Well, perhaps this check ought to be done as a standalone test rather > than as debug build validation. I don't really have any deep commitment > either way. However, if we do proceed with this I think it ought to be > in a separate follow-up patch and with its own JIRA. 
It should not stop > Ningsheng's SVE patch going in as is since that merely corrects the > status quo to allow for SVE instructions. I agree. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From magnus.ihse.bursie at oracle.com Thu Aug 20 14:37:42 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Thu, 20 Aug 2020 16:37:42 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> Message-ID: On 2020-08-20 12:38, Andrew Dinn wrote: > On 20/08/2020 11:14, Magnus Ihse Bursie wrote: >> On 2020-08-20 11:40, Nick Gasson wrote: >>> http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ >>> >>> Then you'd run it with >>> >>> ?? make exploded-test TEST="gtest:AssemblerAArch64" >>> >>> The downside is that it won't run on every startup of a debug build, but >>> it will run in the tier1 tests, including for release builds, which >>> arguably gives more coverage. It looks a lot tidier to me, but that's >>> clearly subjective. >> FWIW, it definitely looks tidier to me too. > Well, perhaps this check ought to be done as a standalone test rather > than as debug build validation. I don't really have any deep commitment > either way. However, if we do proceed with this I think it ought to be > in a separate follow-up patch and with its own JIRA. It should not stop > Ningsheng's SVE patch going in as is since that merely corrects the > status quo to allow for SVE instructions. Yes, I fully agree, and never meant to imply anything else. /Magnus > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From shade at redhat.com Thu Aug 20 15:02:59 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 20 Aug 2020 17:02:59 +0200 Subject: RFR (XS) 8252120: compiler/oracle/TestCompileCommand.java misspells "occured" Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8252120 Noticed this while reading some related test code. There is no way VM could emit the message the assert checks, which means the assert always passes. See the history in the bug. 
Fix: diff -r 53629f4016c6 test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java --- a/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 11:42:12 2020 +0100 +++ b/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 17:01:40 2020 +0200 @@ -63,5 +63,5 @@ } - out.shouldNotContain("CompileCommand: An error occured during parsing"); + out.shouldNotContain("CompileCommand: An error occurred during parsing"); out.shouldHaveExitValue(0); } Testing: affected test -- Thanks, -Aleksey From rwestrel at redhat.com Thu Aug 20 15:34:24 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 17:34:24 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: <87tuwx1gcf.fsf@redhat.com> > Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. Thanks for the review and testing! Roland. From rwestrel at redhat.com Thu Aug 20 16:05:39 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 18:05:39 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: <87r1s11ewc.fsf@redhat.com> Hi Vladimir, Thanks for taking a look at this. > =============== > src/hotspot/share/opto/callnode.cpp: > > // If you have back to back safepoints, remove one > if( in(TypeFunc::Control)->is_SafePoint() ) > return in(TypeFunc::Control); > > - if( in(0)->is_Proj() ) { > + // Transforming long counted loops requires a safepoint node. Do not > + // eliminate a safepoint until loop opts are over. > + if (in(0)->is_Proj() && !phase->C->major_progress()) { > > Can you elaborate on this a bit? Why elimination of back-to-back > safepoints cause problems during new transformation? Is it because you > need specifically a SafePoint because CallNode doesn't fit? The issue is with a call followed by a SafePointNode. A call captures the state before the call but we would need the state after the call otherwise on a deoptimization we would re-executed the call. > =============== > src/hotspot/share/opto/loopnode.cpp: > > +void PhaseIdealLoop::add_empty_predicate(Deoptimization::DeoptReason > reason, Node* inner_head, IdealLoopTree* loop, SafePointNode* sfpt) { > > Nothing actionable at the moment, but it's unfortunate to see more and > more code being duplicated from GraphKit. I wish there were a way to > share implementation between GraphKit, PhaseIdealLoop, and > PhaseMacroExpand. 
Actually, there might be a way. In the valhalla repo, Tobias pushed a change to GraphKit so it's possible to build one with an igvn argument. So we could do this: JVMState* jvms = cloned_sfpt->jvms()->clone_shallow(C); SafePointNode* map = cloned_sfpt->clone()->as_SafePoint(); map->set_jvms(jvms); jvms->set_map(map); GraphKit kit(jvms, &_igvn); kit.set_control(inner_head->in(LoopNode::EntryControl)); kit.add_empty_predicates(0); _igvn.replace_input_of(inner_head, LoopNode::EntryControl, kit.control()); _igvn.remove_dead_node(map); instead of: if (UseLoopPredicate) { add_empty_predicate(Deoptimization::Reason_predicate, inner_head, outer_ilt, cloned_sfpt); } if (UseProfiledLoopPredicate) { add_empty_predicate(Deoptimization::Reason_profile_predicate, inner_head, outer_ilt, cloned_sfpt); } add_empty_predicate(Deoptimization::Reason_loop_limit_check, inner_head, outer_ilt, cloned_sfpt); and the new PhaseIdealLoop::add_empty_predicate() wouldn't be needed anymore. One thing to consider is that new nodes for predicates are added by GraphKit now and they are not registered with PhaseIdealLoop. It may not be a problem because peeling sets major_progress so no further loop opts will be applied in this round. Anyway, if we wanted to pursue this further, I think it would make sense to push Tobias' patch first. What do you think? Roland. From igor.ignatyev at oracle.com Thu Aug 20 16:16:33 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 09:16:33 -0700 Subject: RFR (XS) 8252120: compiler/oracle/TestCompileCommand.java misspells "occured" In-Reply-To: References: Message-ID: Hi Aleksey, LGTM -- Igor > On Aug 20, 2020, at 8:02 AM, Aleksey Shipilev wrote: > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8252120 > > Noticed this while reading some related test code. There is no way VM could emit the message the assert checks, which means the assert always passes. See the history in the bug. > > Fix: > > diff -r 53629f4016c6 test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java > --- a/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 11:42:12 2020 +0100 > +++ b/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 17:01:40 2020 +0200 > @@ -63,5 +63,5 @@ > } > > - out.shouldNotContain("CompileCommand: An error occured during parsing"); > + out.shouldNotContain("CompileCommand: An error occurred during parsing"); > out.shouldHaveExitValue(0); > } > > > Testing: affected test > > -- > Thanks, > -Aleksey > From igor.ignatyev at oracle.com Thu Aug 20 17:16:31 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 10:16:31 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase In-Reply-To: <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> References: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> Message-ID: Hi Serguei, thanks for your review. I've decided to slightly modify the patch and use the ids of subtasks in TEST.properties files (instead of main bug id) in order to avoid possible confusion in the future: - incremental: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.0-1/index.html - whole: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.01/index.html could you please re-review it? Thanks, -- Igor > On Aug 19, 2020, at 4:22 PM, serguei.spitsyn at oracle.com wrote: > > Hi Igor, > > This looks reasonable. 
> > Thanks, > Serguei > > > On 8/18/20 16:42, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >>> 0 lines changed: 0 ins; 0 del; 0 mod; >> Hi all, >> >> could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? >> >> (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) >> >> webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 >> >> Thanks, >> -- Igor >> >> > From igor.ignatyev at oracle.com Thu Aug 20 18:18:19 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 11:18:19 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase In-Reply-To: <8eb1187f-8030-2adf-b20d-d289bfa35198@oracle.com> References: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> <8eb1187f-8030-2adf-b20d-d289bfa35198@oracle.com> Message-ID: <3CB6B3FF-458B-4B76-872B-46A6D30B7A33@oracle.com> thanks Serguei, pushed. -- Igor > On Aug 20, 2020, at 10:55 AM, serguei.spitsyn at oracle.com wrote: > > Hi Igor, > > Still looks good to me. > The webrev is veeeeery slow. > > Thanks, > Serguei > > > On 8/20/20 10:16, Igor Ignatyev wrote: >> Hi Serguei, >> >> thanks for your review. I've decided to slightly modify the patch and use the ids of subtasks in TEST.properties files (instead of main bug id) in order to avoid possible confusion in the future: >> - incremental: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.0-1/index.html >> - whole: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.01/index.html >> >> could you please re-review it? >> >> Thanks, >> -- Igor >> >>> On Aug 19, 2020, at 4:22 PM, serguei.spitsyn at oracle.com wrote: >>> >>> Hi Igor, >>> >>> This looks reasonable. >>> >>> Thanks, >>> Serguei >>> >>> >>> On 8/18/20 16:42, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >>>>> 0 lines changed: 0 ins; 0 del; 0 mod; >>>> Hi all, >>>> >>>> could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? >>>> >>>> (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) >>>> >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 >>>> >>>> Thanks, >>>> -- Igor >>>> >>>> >>> >> > From vladimir.kozlov at oracle.com Thu Aug 20 19:21:53 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2020 12:21:53 -0700 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL In-Reply-To: <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> References: <877dtt3ckp.fsf@redhat.com> <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> Message-ID: +1 Thanks, Vladimir K On 8/20/20 2:45 AM, Christian Hagedorn wrote: > Hi Roland > > That looks good to me. > > Best regards, > Christian > > On 20.08.20 11:12, Roland Westrelin wrote: >> >> http://cr.openjdk.java.net/~roland/8251527/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8251527 >> >> This triggers with Shenandoah but the fix (and the bug) is in shared >> C2 code. 
>> >> CallNode::extract_projections(), once it has found the control ProjNode >> looks for the CatchNode at the first use of the ProjNode. In the case of >> the crash, the ProjNode has more than one use and the first use is not >> the CatchNode (but a pinned LoadNode). I propose using unique_ctrl_out() >> instead. >> >> The ProjNode has a LoadNode because one is pinned on a ProjNode by >> PhaseIdealLoop::split_if_with_blocks_post() when it tries to sink the >> LoadNode out of loop. A LoadNode becomes the first use of the ProjNode >> after the loop body is cloned during unswitching. >> >> Roland. >> From igor.ignatyev at oracle.com Thu Aug 20 20:47:07 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 13:47:07 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit Message-ID: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 > 75 lines changed: 13 ins; 29 del; 33 mod; Hi all, could you please review this small patch which removes usage of PropertyResolvingWrapper class from vm/compiler/complog/uninit? a bit of background (from 8219140): > CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats LogCompilationTest: whitespace, imports cleanup, etc. JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 testing: :vmTestbase_vm_compiler Thanks, -- Igor From igor.ignatyev at oracle.com Thu Aug 20 20:57:34 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 13:57:34 -0700 Subject: RFR(S) : 8251998 remove usage of PropertyResolvingWrapper in vmTestbase/jit/t Message-ID: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 > 69 lines changed: 4 ins; 24 del; 41 mod; Hi all, could you please review this small patch which removes usages of PropertyResolvingWrapper from vmTestbase/jit/t tests and reenabled allowSmartActionArgs? background from the main bug: > CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. 
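To make the converted shape concrete, a minimal jtreg test of that kind could look like the sketch below. The test name, class and the -Xbatch option are invented for illustration, and it assumes allowSmartActionArgs is enabled for the directory in question, which is what this series of changes re-enables; the real tests are in the webrev below.

```
/*
 * @test
 * @summary Illustration only: with smart action args, the externally supplied
 *          options can be written directly on the action line instead of being
 *          funneled through PropertyResolvingWrapper.
 * @run main/othervm ${test.vm.opts} ${test.java.opts} -Xbatch SmartActionArgsDemo
 */
public class SmartActionArgsDemo {
    public static void main(String[] args) {
        System.out.println("launched with test.vm.opts/test.java.opts expanded by jtreg");
    }
}
```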
JBS: https://bugs.openjdk.java.net/browse/JDK-8251998 webrev: http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 testing: :vmTestbase_vm_compiler Thanks, -- Igor From vladimir.x.ivanov at oracle.com Thu Aug 20 22:01:14 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 21 Aug 2020 01:01:14 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87r1s11ewc.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> Message-ID: >> src/hotspot/share/opto/callnode.cpp: >> >> // If you have back to back safepoints, remove one >> if( in(TypeFunc::Control)->is_SafePoint() ) >> return in(TypeFunc::Control); >> >> - if( in(0)->is_Proj() ) { >> + // Transforming long counted loops requires a safepoint node. Do not >> + // eliminate a safepoint until loop opts are over. >> + if (in(0)->is_Proj() && !phase->C->major_progress()) { >> >> Can you elaborate on this a bit? Why elimination of back-to-back >> safepoints cause problems during new transformation? Is it because you >> need specifically a SafePoint because CallNode doesn't fit? > > The issue is with a call followed by a SafePointNode. A call captures > the state before the call but we would need the state after the call > otherwise on a deoptimization we would re-executed the call. Sorry, I don't get it. Normally JVM state associated with a call is a state right after the call returns. Do you mean there are cases when call has reexecute bit set and hence it has JVM state before the call associated with it? Anyway, it's trivial to convert between 2 states (before and after) and we already do that in some places (e.g., late inline prepares JVM state for the parser based on the state associated with CallStaticJava node). >> =============== >> src/hotspot/share/opto/loopnode.cpp: >> >> +void PhaseIdealLoop::add_empty_predicate(Deoptimization::DeoptReason >> reason, Node* inner_head, IdealLoopTree* loop, SafePointNode* sfpt) { >> >> Nothing actionable at the moment, but it's unfortunate to see more and >> more code being duplicated from GraphKit. I wish there were a way to >> share implementation between GraphKit, PhaseIdealLoop, and >> PhaseMacroExpand. > > Actually, there might be a way. In the valhalla repo, Tobias pushed a > change to GraphKit so it's possible to build one with an igvn > argument. 
So we could do this: > > JVMState* jvms = cloned_sfpt->jvms()->clone_shallow(C); > SafePointNode* map = cloned_sfpt->clone()->as_SafePoint(); > map->set_jvms(jvms); > jvms->set_map(map); > GraphKit kit(jvms, &_igvn); > kit.set_control(inner_head->in(LoopNode::EntryControl)); > > kit.add_empty_predicates(0); > > _igvn.replace_input_of(inner_head, LoopNode::EntryControl, kit.control()); > _igvn.remove_dead_node(map); > > instead of: > > if (UseLoopPredicate) { > add_empty_predicate(Deoptimization::Reason_predicate, inner_head, outer_ilt, cloned_sfpt); > } > if (UseProfiledLoopPredicate) { > add_empty_predicate(Deoptimization::Reason_profile_predicate, inner_head, outer_ilt, cloned_sfpt); > } > add_empty_predicate(Deoptimization::Reason_loop_limit_check, inner_head, outer_ilt, cloned_sfpt); > > and the new PhaseIdealLoop::add_empty_predicate() wouldn't be needed > anymore. > > One thing to consider is that new nodes for predicates are added by > GraphKit now and they are not registered with PhaseIdealLoop. It may not > be a problem because peeling sets major_progress so no further loop opts > will be applied in this round. > > Anyway, if we wanted to pursue this further, I think it would make sense > to push Tobias' patch first. > > What do you think? Wow, it looks very promising! I'm perfectly fine with addressing it later. Best regards, Vladimir Ivanov From ekaterina.pavlova at oracle.com Thu Aug 20 22:03:50 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 20 Aug 2020 15:03:50 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit In-Reply-To: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> References: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> Message-ID: <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> Looks good, -katya On 8/20/20 1:47 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >> 75 lines changed: 13 ins; 29 del; 33 mod; > > Hi all, > > could you please review this small patch which removes usage of PropertyResolvingWrapper class from vm/compiler/complog/uninit? > > a bit of background (from 8219140): >> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. > > jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats LogCompilationTest: whitespace, imports cleanup, etc. 
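As a rough sketch of that getTestJavaOpts() direction, the fragment below shows the general idea; the class name, the flags being launched and the use of ProcessTools/OutputAnalyzer are illustrative assumptions (this is not the actual LogCompilationTest change), and it presumes the test declares @library /test/lib so jdk.test.lib is on the class path.

```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import jdk.test.lib.Utils;
import jdk.test.lib.process.OutputAnalyzer;
import jdk.test.lib.process.ProcessTools;

// Illustration only: pick up the externally supplied flags from the test
// library instead of having jtreg hand "${test.vm.opts} ${test.java.opts}"
// to the test as a single -options argument.
public class TestJavaOptsDemo {
    public static void main(String[] args) throws Exception {
        List<String> cmd = new ArrayList<>();
        Collections.addAll(cmd, Utils.getTestJavaOpts()); // test.vm.opts + test.java.opts
        cmd.add("-XX:+PrintCompilation");
        cmd.add("-version");
        ProcessBuilder pb = ProcessTools.createJavaProcessBuilder(cmd.toArray(new String[0]));
        OutputAnalyzer out = new OutputAnalyzer(pb.start());
        out.shouldHaveExitValue(0);
    }
}
```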
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 > webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 > testing: :vmTestbase_vm_compiler > > Thanks, > -- Igor > From john.r.rose at oracle.com Fri Aug 21 00:12:23 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 20 Aug 2020 17:12:23 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87y2m91nvl.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> <87y2m91nvl.fsf@redhat.com> Message-ID: <9CBCBEBB-7C33-4263-8348-900AAC068D65@oracle.com> On Aug 20, 2020, at 5:51 AM, Roland Westrelin wrote: > > Hi John, > >> I?m going over it one more time (between smoky breaths from >> the California fires) and I have a question. What is the exact >> structure of outer_phi? > > Thanks for taking another look at this. > >> 1. At first it is a clone of phi, its region patched and the other edges the same: >> >> outer_phi := Phi(outer_head, init, AddL(phi, stride)) > > It's only a clone so: > > outer_phi := Phi(outer_head, init, AddI(phi, stride)) > > (that is no AddL) I?m not sure what you mean? The original incr is an AddL, since we are transforming a long loop. The AddI goes somewhere else in the transformed code. But, yes, the first step is just to make a cloned phi and patch it into the outer loop. > >> 2. After long_loop_replace_long_iv, the interior phi links it back to itself: >> >> outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) > > I don't think that's right. There are 2 calls to > long_loop_replace_long_iv(). One to replace phi and the other one to > replace incr (that is the AddI above). Right; I missed the fact that the second replace_long_iv step replaces the AddL completely. So while phi is replaced by AddL(I2L(inner_phi), outer_phi), incr is replaced by AddL(I2L(inner_incr), outer_phi). phi := Phi(x, init, incr) => AddL(I2L(inner_phi), outer_phi) incr := AddL(phi, stride) => AddL(I2L(inner_incr), outer_phi) And the effect of those replacements on outer_phi (the patched clone of phi) is: outer_phi := Phi(outer_head, init, <>) => Phi(outer_head, init, <>) => outer_phi := Phi(outer_head, init, AddL(I2L(AddI(inner_phi, intcon(stride))), outer_phi)) not (as I was said previously): outer_phi := Phi(outer_head, init, AddL(<>, longcon(stride))) => Phi(outer_head, init, AddL(<>, longcon(stride))) And, in the corrected transform, there is no worrying extra addition of stride (using AddL directly). > > outer_phi := Phi(outer_head, init, AddL(I2L(AddI(inner_phi, stride)), outer_phi)) > > (actually this is not quite accurate because peeling one iteration > causes an extra phi to be added to merge the peeled iteration with the > counted loop in most cases). > > Do you see a problem with the above outer_phi structure? Not any more. Let?s just make sure the transform gets exercised, OK? After the P.S. is an amended chunk of pseudocode showing how it works. 
I created it by labeling the various expressions in the example loop with the names used in is_long_counted_loop, and then I stepped through is_long_counted_loop and edited the pseudocode to reflect each step. If you agree I did it correctly, and that it helps explain the code, you could place it as a comment at bottom, just before the final peel. Otherwise, we can just leave it here FTR. I do have this specific request: Please replace the pseudocode at the top (already in the webrev) with the following corrected pseudocode. It uses names more consistent with the actual C++ code and corresponds more accurately to the transformed IR. ``` // range of long values from the initial loop in (at most) max int // steps. That is: x: for (long phi = init; phi < limit; phi += stride) { // phi := Phi(L, init, incr) // incr := AddL(phi, longcon(stride)) // phi_incr := phi (test happens before increment) long incr = phi + stride; ... use phi and incr ... } OR: x: for (long phi = init; (phi += stride) < limit; ) { // phi := Phi(L, AddL(init, stride), incr) // incr := AddL(phi, longcon(stride)) // phi_incr := NULL (test happens after increment) long incr = phi + stride; ... use phi and (phi + stride) ... } ==transform=> const ulong inner_iters_limit = INT_MAX - stride - 1; //near 0x7FFFFFF0 assert(stride <= inner_iters_limit); // else abort transform assert((extralong)limit + stride <= LONG_MAX); // else deopt outer_head: for (long outer_phi = init;;) { // outer_phi := Phi(outer_head, init, AddL(outer_phi, I2L(inner_phi))) ulong inner_iters_max = (ulong) MAX(0, ((extralong)limit + stride - outer_phi)); long inner_iters_actual = MIN(inner_iters_limit, inner_iters_max); assert(inner_iters_actual == (int)inner_iters_actual); int inner_phi, inner_incr; x: for (inner_phi = 0;; inner_phi = inner_incr) { // inner_phi := Phi(x, intcon(0), inner_incr) // inner_incr := AddI(inner_phi, intcon(stride)) inner_incr = inner_phi + stride; if (inner_incr < inner_iters_actual) { ... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ... continue; } else break; } if ((outer_phi+inner_phi) < limit) //OR (outer_phi+inner_incr) < limit continue; else break; } ``` Thanks! ? John P.S. Here are the intermediate steps, annotated with the C++ variable names for the various nodes, and with the steps that created the transformed loop nodes. == old IR nodes => entry_control: {...} x: for (long phi = init;;) { // phi := Phi(x, init, incr) // incr := AddL(phi, longcon(stride)) exit_test: if (phi < limit) back_control: fallthrough; else exit_branch: break; // test happens before increment => phi == phi_incr != NULL long incr = phi + stride; ... use phi and incr ... 
phi = incr; } == new IR nodes (just before final peel) => entry_control: {...} long adjusted_limit = limit + stride; //because phi_incr != NULL assert(!limit_check_required || (extralong)limit + stride == adjusted_limit); // else deopt ulong inner_iters_limit = max_jint - ABS(stride) - 1; //near 0x7FFFFFF0 outer_head: for (long outer_phi = init;;) { // outer_phi := phi->clone(), in(0):=outer_head, => Phi(outer_head, init, incr) // REPLACE phi => AddL(outer_phi, I2L(inner_phi)) // REPLACE incr => AddL(outer_phi, I2L(inner_incr)) // SO THAT outer_phi := Phi(outer_head, init, AddL(outer_phi, I2L(inner_incr))) ulong inner_iters_max = (ulong) MAX(0, ((extralong)adjusted_limit - outer_phi) * SGN(stride)); int inner_iters_actual_int = (int) MIN(inner_iters_limit, inner_iters_max) * SGN(stride); inner_head: x: //in(1) := outer_head int inner_phi; for (inner_phi = 0;;) { // inner_phi := Phi(x, intcon(0), inner_phi + stride) int inner_incr = inner_phi + stride; bool inner_bol = (inner_incr < inner_iters_actual_int); exit_test: //exit_test->in(1) := inner_bol; if (inner_bol) // WAS (phi < limit) back_control: fallthrough; else inner_exit_branch: break; //exit_branch->clone() ... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ... inner_phi = inner_phi + stride; // inner_incr } outer_exit_test: //exit_test->clone(), in(0):=inner_exit_branch if ((outer_phi+inner_phi) < limit) // WAS (phi < limit) outer_back_branch: fallthrough; //back_control->clone(), in(0):=outer_exit_test else exit_branch: break; //in(0) := outer_exit_test } From vladimir.kozlov at oracle.com Fri Aug 21 00:17:00 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2020 17:17:00 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit In-Reply-To: <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> References: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> Message-ID: +1 Vladimir K On 8/20/20 3:03 PM, Ekaterina Pavlova wrote: > Looks good, > > -katya > > On 8/20/20 1:47 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >>> 75 lines changed: 13 ins; 29 del; 33 mod; >> >> Hi all, >> >> could you please review this small patch which removes usage of PropertyResolvingWrapper class from >> vm/compiler/complog/uninit? >> >> a bit of background (from 8219140): >>> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed >>> anymore and can be removed. >> >> jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and >> only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options >> "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats >> LogCompilationTest: whitespace, imports cleanup, etc. 
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >> testing: :vmTestbase_vm_compiler >> >> Thanks, >> -- Igor >> > From HORIE at jp.ibm.com Fri Aug 21 02:33:16 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 21 Aug 2020 11:33:16 +0900 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200819165338.GA978936@pacoca> References: <20200819165338.GA978936@pacoca>, <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: Hi Jose, One thing I noticed is a misaligned backslash in globals_ppc.hpp. Otherwise, the change looks good! /* special instructions */ \ + product(bool, UseByteReverseInstructions, false, \ Best regards, Michihiro ----- Original message ----- From: joserz at linux.ibm.com To: "Doerr, Martin" Cc: Michihiro Horie/Japan/IBM at IBMJP, "hotspot-compiler-dev at openjdk.java.net" Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Thu, Aug 20, 2020 1:53 AM On Wed, Aug 19, 2020 at 09:55:50AM +0000, Doerr, Martin wrote: > Hi Jose, > > thanks for the update. > > I have never seen 2 format specifications in the ad file. Does that work or does the 2nd one overwrite the 1st one? > I think it should be: > format %{ "BRH $dst, $src\n\t" > "EXTSH $dst, $dst" %} You're right, actually the 2nd one overwrote the first. I just fixed it. Thanks sir! > > I don't need to see another webrev for that. Otherwise, the change looks good. Thanks for contributing. > > Best regards, > Martin > > > > -----Original Message----- > > From: joserz at linux.ibm.com > > Sent: Mittwoch, 19. August 2020 02:25 > > To: Doerr, Martin > > Cc: Michihiro Horie ; hotspot-compiler- > > dev at openjdk.java.net > > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > Hallo Martin! > > > > Thank you very much for your review. Here is the v3: > > > > Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > I run a functional test and it's working as expected. If you try to run it in a > > system > > > $ java -XX:+UseByteReverseInstructions ReverseBytes > > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > > but needs at least Power10. > > (continue with existing code) > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > ???????? > > > > This is the code I use to test: > > 8<--------------------------------------------------------------- > > import java.io.IOException; > > > > class ReverseBytes > > { > > public static void main(String[] args) throws IOException > > { > > for (int i = 0; i < 1000000; ++i) { > > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > > throw new RuntimeException(); > > } > > > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > > 0xF0DEBC9A78563412L) { > > throw new RuntimeException(); > > } > > > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > > throw new RuntimeException(); > > } > > > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > > throw new RuntimeException(); > > } > > } > > System.out.println("ok"); > > } > > } > > 8<--------------------------------------------------------------- > > > > Best regards! 
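As background on why an EXTSH has to follow the BRH in that rule: Short.reverseBytes() returns a signed short, so a reversed value whose new high byte has the top bit set must come back sign-extended in the register, while Character.reverseBytes() is the unsigned 16-bit counterpart. A small stand-alone check (nothing below is from the webrev; it only demonstrates the Java-level semantics the instruction sequence has to match):

```
public class ReverseShortSignCheck {
    public static void main(String[] args) {
        short s = 0x00FF;
        short r = Short.reverseBytes(s);           // bytes swapped: 0xFF00 as a short
        System.out.println(r);                     // prints -256 (sign-extended), not 65280
        System.out.println(r == (short) 0xFF00);   // true

        char c = Character.reverseBytes((char) 0x00FF);
        System.out.println((int) c);               // prints 65280: char stays unsigned
    }
}
```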
> > > > Jose > > > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > > Hi Michihiro and Jose, > > > > > > I had only done a quick review during my vacation. Thanks for updating the > > description of PowerArchitecturePPC64. > > > After taking a second look, I have a few minor requests. Sorry for that. > > > > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent with > > other names. > > > * Please add ?size? specifications to the ppc.ad file. Otherwise, the > > compiler has to determine sizes dynamically every time. > > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > So we rely on your testing. > > > > > > Thanks and best regards, > > > Martin > > > > > > > > > From: Michihiro Horie > > > Sent: Dienstag, 18. August 2020 09:28 > > > To: Doerr, Martin > > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > > > > > > Jose, > > > Latest change looks good also to me. > > > > > > Marin, > > > Do you think if I can push the change? > > > > > > Best regards, > > > Michihiro > > > > > > > > > ----- Original message ----- > > > From: "Doerr, Martin" > > > > > > To: "joserz at linux.ibm.com" > > > > > > Cc: hotspot compiler > dev at openjdk.java.net>, > > "horie at jp.ibm.com" > > > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > and use new byte-reverse instructions > > > Date: Wed, Jul 1, 2020 4:01 AM > > > > > > Thanks for the much better flag description. > > > Looks good. > > > > > > Best regards, > > > Martin > > > > > > > Am 30.06.2020 um 02:15 schrieb > > "joserz at linux.ibm.com" > > >: > > > > > > > > ?Hello team, > > > > > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > > > Thank you!! > > > > > > > > Jose > > > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > > >> Hi Jose, > > > >> > > > >> Can you replace the outdated description of PowerArchitecturePPC64 in > > globals_poc.hpp by something generic, please? > > > >> > > > >> Please update the Copyright year in vm_version_poc.hpp. > > > >> > > > >> I can?t test the change, but it looks good to me. > > > >> > > > >> Best regards, > > > >> Martin > > > >> > > > >>>> Am 26.06.2020 um 20:29 schrieb > > "joserz at linux.ibm.com" > > >: > > > >>> > > > >>> ?Hello team! > > > >>> > > > >>> This patch introduces Power10 to OpenJDK and implements three new > > instructions: > > > >>> - brh - byte-reverse halfword > > > >>> - brw - byte-reverse word > > > >>> - brd - byte-reverse doubleword > > > >>> > > > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > >>> > > > >>> Thanks for your review! > > > >>> > > > >>> Jose R. 
Ziviani > > > From igor.ignatyev at oracle.com Fri Aug 21 03:18:50 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 20:18:50 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit In-Reply-To: References: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> Message-ID: <92D72E51-F252-49CF-AE72-367C77C24E9C@oracle.com> Vladimir, Katya, thank you for your reviews, pushed. -- Igor > On Aug 20, 2020, at 5:17 PM, Vladimir Kozlov wrote: > > +1 > > Vladimir K > > On 8/20/20 3:03 PM, Ekaterina Pavlova wrote: >> Looks good, >> -katya >> On 8/20/20 1:47 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >>>> 75 lines changed: 13 ins; 29 del; 33 mod; >>> >>> Hi all, >>> >>> could you please review this small patch which removes usage of PropertyResolvingWrapper class from vm/compiler/complog/uninit? >>> >>> a bit of background (from 8219140): >>>> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. >>> >>> jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats LogCompilationTest: whitespace, imports cleanup, etc. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 >>> webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >>> testing: :vmTestbase_vm_compiler >>> >>> Thanks, >>> -- Igor >>> From thomas.stuefe at gmail.com Fri Aug 21 05:58:42 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 21 Aug 2020 07:58:42 +0200 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <20200819165338.GA978936@pacoca> Message-ID: Hi, Version 3 of these changes look good to me too. Cheers, Thomas On Fri, Aug 21, 2020 at 4:33 AM Michihiro Horie wrote: > Hi Jose, > > One thing I noticed is a misaligned backslash in globals_ppc.hpp. > Otherwise, the change looks good! > > /* special instructions */ \ > + product(bool, UseByteReverseInstructions, false, \ > > > Best regards, > Michihiro > > > ----- Original message ----- > From: joserz at linux.ibm.com > To: "Doerr, Martin" > Cc: Michihiro Horie/Japan/IBM at IBMJP, " > hotspot-compiler-dev at openjdk.java.net" < > hotspot-compiler-dev at openjdk.java.net> > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > byte-reverse instructions > Date: Thu, Aug 20, 2020 1:53 AM > > On Wed, Aug 19, 2020 at 09:55:50AM +0000, Doerr, Martin wrote: > > Hi Jose, > > > > thanks for the update. > > > > I have never seen 2 format specifications in the ad file. Does that work > or does the 2nd one overwrite the 1st one? > > I think it should be: > > format %{ "BRH $dst, $src\n\t" > > "EXTSH $dst, $dst" %} > > You're right, actually the 2nd one overwrote the first. I just fixed it. > Thanks sir! > > > > > I don't need to see another webrev for that. Otherwise, the change looks > good. Thanks for contributing. 
> > > > Best regards, > > Martin > > > > > > > -----Original Message----- > > > From: joserz at linux.ibm.com > > > Sent: Mittwoch, 19. August 2020 02:25 > > > To: Doerr, Martin > > > Cc: Michihiro Horie ; hotspot-compiler- > > > dev at openjdk.java.net > > > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > > > byte-reverse instructions > > > > > > Hallo Martin! > > > > > > Thank you very much for your review. Here is the v3: > > > > > > Webrev: *http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/* > > > > Bug: *https://bugs.openjdk.java.net/browse/JDK-8248190* > > > > > > > I run a functional test and it's working as expected. If you try to > run it in a > > > system > > > > > $ java -XX:+UseByteReverseInstructions ReverseBytes > > > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > > > but needs at least Power10. > > > (continue with existing code) > > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > ???????? > > > > > > This is the code I use to test: > > > 8<--------------------------------------------------------------- > > > import java.io.IOException; > > > > > > class ReverseBytes > > > { > > > public static void main(String[] args) throws IOException > > > { > > > for (int i = 0; i < 1000000; ++i) { > > > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > > > throw new RuntimeException(); > > > } > > > > > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > > > 0xF0DEBC9A78563412L) { > > > throw new RuntimeException(); > > > } > > > > > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > > > throw new RuntimeException(); > > > } > > > > > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > > > throw new RuntimeException(); > > > } > > > } > > > System.out.println("ok"); > > > } > > > } > > > 8<--------------------------------------------------------------- > > > > > > Best regards! > > > > > > Jose > > > > > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > > > Hi Michihiro and Jose, > > > > > > > > I had only done a quick review during my vacation. Thanks for > updating the > > > description of PowerArchitecturePPC64. > > > > After taking a second look, I have a few minor requests. Sorry for > that. > > > > > > > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent > with > > > other names. > > > > * Please add ?size? specifications to the ppc.ad file. > Otherwise, the > > > compiler has to determine sizes dynamically every time. > > > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > > So we rely on your testing. > > > > > > > > Thanks and best regards, > > > > Martin > > > > > > > > > > > > From: Michihiro Horie > > > > Sent: Dienstag, 18. August 2020 09:28 > > > > To: Doerr, Martin > > > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > > > byte-reverse instructions > > > > > > > > > > > > Jose, > > > > Latest change looks good also to me. > > > > > > > > Marin, > > > > Do you think if I can push the change? 
> > > > > > > > Best regards, > > > > Michihiro > > > > > > > > > > > > ----- Original message ----- > > > > From: "Doerr, Martin" > > > >> > > > > To: "joserz at linux.ibm.com<*mailto:joserz at linux.ibm.com* > >" > > > >> > > > > Cc: hotspot compiler > > dev at openjdk.java.net<*mailto:hotspot-compiler-dev at openjdk.java.net* > >>, > > > "horie at jp.ibm.com<*mailto:horie at jp.ibm.com* >" > > > >> > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > > and use new byte-reverse instructions > > > > Date: Wed, Jul 1, 2020 4:01 AM > > > > > > > > Thanks for the much better flag description. > > > > Looks good. > > > > > > > > Best regards, > > > > Martin > > > > > > > > > Am 30.06.2020 um 02:15 schrieb > > > "joserz at linux.ibm.com<*mailto:joserz at linux.ibm.com* > >" > > > >>: > > > > > > > > > > ?Hello team, > > > > > > > > > > Here's the 2nd version, implementing the suggestions asked by > Martin. > > > > > > > > > > Webrev: *https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/* > > > > > > Bug: *https://bugs.openjdk.java.net/browse/JDK-8248190* > > > > > > > > > > > Thank you!! > > > > > > > > > > Jose > > > > > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > > > >> Hi Jose, > > > > >> > > > > >> Can you replace the outdated description of > PowerArchitecturePPC64 in > > > globals_poc.hpp by something generic, please? > > > > >> > > > > >> Please update the Copyright year in vm_version_poc.hpp. > > > > >> > > > > >> I can?t test the change, but it looks good to me. > > > > >> > > > > >> Best regards, > > > > >> Martin > > > > >> > > > > >>>> Am 26.06.2020 um 20:29 schrieb > > > "joserz at linux.ibm.com<*mailto:joserz at linux.ibm.com* > >" > > > >>: > > > > >>> > > > > >>> ?Hello team! > > > > >>> > > > > >>> This patch introduces Power10 to OpenJDK and implements three new > > > instructions: > > > > >>> - brh - byte-reverse halfword > > > > >>> - brw - byte-reverse word > > > > >>> - brd - byte-reverse doubleword > > > > >>> > > > > >>> Webrev: *https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/* > > > > > >>> Bug: *https://bugs.openjdk.java.net/browse/JDK-8248190* > > > > > >>> > > > > >>> Thanks for your review! > > > > >>> > > > > >>> Jose R. Ziviani > > > > > > > From shade at redhat.com Fri Aug 21 07:38:37 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 21 Aug 2020 09:38:37 +0200 Subject: RFR (XS) 8252120: compiler/oracle/TestCompileCommand.java misspells "occured" In-Reply-To: References: Message-ID: <9a286b3e-60a5-87f9-00cb-4a4aa303947e@redhat.com> On 8/20/20 6:16 PM, Igor Ignatyev wrote: > LGTM Thanks, pushed. 
-- -Aleksey From tobias.hartmann at oracle.com Fri Aug 21 07:43:03 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 21 Aug 2020 09:43:03 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87tuwx1gcf.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> Message-ID: <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> For the record, I've tested tier1-9 with "default" flags and tier1-5 with -XX:StressLongCountedLoop=1 and -XX:StressLongCountedLoop=4294967295. Please let me know if you think other flag combinations/values should be tested as well. Best regards, Tobias On 20.08.20 17:34, Roland Westrelin wrote: > >> Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. > > Thanks for the review and testing! > > Roland. > From rwestrel at redhat.com Fri Aug 21 07:51:17 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 09:51:17 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <9CBCBEBB-7C33-4263-8348-900AAC068D65@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> <87y2m91nvl.fsf@redhat.com> <9CBCBEBB-7C33-4263-8348-900AAC068D65@oracle.com> Message-ID: <87o8n41loq.fsf@redhat.com> > I?m not sure what you mean? The original incr is an AddL, > since we are transforming a long loop. The AddI goes > somewhere else in the transformed code. I was confusing myself. Ignore that comment. > Not any more. Let?s just make sure the transform gets exercised, OK? "make sure the transform gets exercised" = properly stress tested? Tobias listed StressLongCountedLoop he used in an other email. > After the P.S. is an amended chunk of pseudocode showing how it works. > I created it by labeling the various expressions in the example loop > with the names used in is_long_counted_loop, and then I stepped > through is_long_counted_loop and edited the pseudocode to > reflect each step. If you agree I did it correctly, and that it helps > explain the code, you could place it as a comment at bottom, just > before the final peel. Otherwise, we can just leave it here FTR. > > I do have this specific request: Please replace the pseudocode > at the top (already in the webrev) with the following corrected > pseudocode. It uses names more consistent with the actual C++ > code and corresponds more accurately to the transformed IR. Ok. Let me do that. Roland. 
From ningsheng.jian at arm.com Fri Aug 21 07:56:19 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Fri, 21 Aug 2020 15:56:19 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> Message-ID: <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> Hi Vladimir, Thanks a lot for looking at this! On 8/20/20 8:29 PM, Vladimir Ivanov wrote: > Hi Ningsheng, > >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html > > > Impressive work, Ningsheng! > >> http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt > > "Since the bottom 128 bits are shared with the NEON, we extend current > register mask definition of V0-V31 registers. Currently, c2 uses one bit > mask for a 32-bit register slot, so to define at most 2048 bits we will > need to add 64 slots in AD file. That's a really large number, and will > also break current regmask assumption." > > Can you, please, elaborate on the last point? What RegMask assumptions > are broken for 2048-bit vectors? I'm looking at [1] and try to > understand the motivation for the changes in shared code. Current regmask is handled by an array of ints, so an element of regmask array can handle at most 32*32=1024 bits. Some regmask handling functions, e.g. clear_to_sets() for alignment, need to be re-examined for the support of 2048 bits. And we may even want to support non power-of-two physical reg sizes, that could be a lot more work. > > Compared to x86 w/ AVX512, architectural state for vector registers is > 4x larger in the worst case (ignoring predicate registers for now). Here > are the relevant constants on x86: > > gensrc/adfiles/adGlobals_x86.hpp: > > // the number of reserved registers + machine registers. > #define REG_COUNT??? 545 > ... > // Size of register-mask in ints > #define RM_SIZE 22 > > My estimate is that for AArch64 with SVE support the constants will be: > > ? REG_COUNT < 2500 > ? RM_SIZE < 100 > > which don't look too bad. > Right, but given that most real hardware implementations will be no larger than 512 bits, I think. Having a large bitmask array, with most bits useless, will be less efficient for regmask computation. > Also, I don't see any changes related to stack management. So, I assume > it continues to be managed in slots. Any problems there? As I > understand, wide SVE registers are caller-save, so there may be many > spills of huge vectors around a call. (Probably, not possible with C2 > auto-vectorizer as it is now, but Vector API will expose it.) > Yes, the stack is still managed in slots, but it will be allocated with real vector register length instead of 'virtual' slots for VecA. See the usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also applied the patch to vector api, and did find a lot of vector spills with expected correct results. > Have you noticed any performance problems? If that's the case, then > AVX512 support on x86 would benefit from similar optimization as well. > Do you mean register allocation performance problems? I did not notice that before. Do you have any suggestion on how to measure that? 
> FTR there was a similar exercise [2] on x86 to abstract away exact sizes > of vector registers, but it didn't have to worry about RA since all the > operands were already available. Also, vectors of all different sizes > may be used. So, it makes it hard to compare. > I've also noticed that. That's an excellent work indeed. It could save a lot of backend match rules for different vector register sizes, which was one of the concerns when we started to work on SVE RA, if we defined all regmasks for different SVE vector register sizes. And yes, our current approach will also solve that problem. :-) > Best regards, > Vladimir Ivanov > > [1] http://cr.openjdk.java.net/~njian/8231441/webrev.03-ra/ > > [2] https://bugs.openjdk.java.net/browse/JDK-8230015 > Thanks, Ningsheng From rwestrel at redhat.com Fri Aug 21 07:59:53 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 09:59:53 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> Message-ID: <87lfi81lae.fsf@redhat.com> > Sorry, I don't get it. Normally JVM state associated with a call is a > state right after the call returns. Do you mean there are cases when > call has reexecute bit set and hence it has JVM state before the call > associated with it? JVM state at a call can't be state after the call because it would need to capture the return value which can't be an incoming edge to the call, right? > Anyway, it's trivial to convert between 2 states (before and after) and > we already do that in some places (e.g., late inline prepares JVM state > for the parser based on the state associated with CallStaticJava node). Sure it's feasible to build state after the call. I was concerned that the runtime would hardwire somewhere that state at the call is always state before the call. That would lead to nasty, rare and hard to debug failures. It felt a lot simpler and robuster to leave SafePoint nodes in the graph. We could have the patch go through a performance run and see if that change makes any difference if there's concern about it. Roland. From thomas.schatzl at oracle.com Fri Aug 21 08:04:38 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Aug 2020 10:04:38 +0200 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <20200819165338.GA978936@pacoca> <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> Hi, On 21.08.20 04:33, Michihiro Horie wrote: > > Hi Jose, > > One thing I noticed is a misaligned backslash in globals_ppc.hpp. > Otherwise, the change looks good! > > /* special instructions */ > \ > + product(bool, UseByteReverseInstructions, false, > \ Fwiw, for adding product options, you must go through the CSR process. 
Maybe there is an exception for platform specific ones? Thanks, Thomas From vladimir.x.ivanov at oracle.com Fri Aug 21 09:12:30 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 21 Aug 2020 12:12:30 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87lfi81lae.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> <87lfi81lae.fsf@redhat.com> Message-ID: >> Sorry, I don't get it. Normally JVM state associated with a call is a >> state right after the call returns. Do you mean there are cases when >> call has reexecute bit set and hence it has JVM state before the call >> associated with it? > > JVM state at a call can't be state after the call because it would need > to capture the return value which can't be an incoming edge to the call, > right? Yes, you are right. But, strictly speaking, it's not the state before the call either since all the arguments are not on the stack anymore (as an example [1]). >> Anyway, it's trivial to convert between 2 states (before and after) and >> we already do that in some places (e.g., late inline prepares JVM state >> for the parser based on the state associated with CallStaticJava node). > > Sure it's feasible to build state after the call. I was concerned that > the runtime would hardwire somewhere that state at the call is always > state before the call. That would lead to nasty, rare and hard to debug > failures. It felt a lot simpler and robuster to leave SafePoint nodes in > the graph. We could have the patch go through a performance run and see > if that change makes any difference if there's concern about it. Indeed, keeping a safepoint right after the call does look appealing. But it also means there should be always a safepoint accompanying the call and it should follow it immediately for the logic in question to be in effect. Do we guarantee that? Best regards, Vladimir Ivanov [1] 56 invokevirtual 179 152 bci: 56 VirtualCallData count(0) nonprofiled_count(0) entries(2) 'java/util/HashMap'(4607 0.99) 'java/util/LinkedHashMap'(57 0.01) (lldb) p jvms->dump() 149 SafePoint === 146 94 148 8 9 10 1 1 1 1 15 1 1 1 1 1 1 1 96 102 10 11 12 13 32 1 [[]] SafePoint replaced nodes: 127->136 !orig=55,[26],23 JVMS depth=1 loc=5 stk=18 arg=20 mon=26 scalar=26 end=26 mondepth=0 sp=2 bci=56 reexecute=false method=virtual jobject java.util.HashMap.putVal(jint, jobject, jobject, jboolean, jboolean) bc: locals(13): 10 1 1 1 1 15 1 1 1 1 1 1 1 stack(2): 96 102 args(6): 10 11 12 13 32 1 monitors(0): scalars(0): (lldb) p call->jvms()->dump() ... 
150 CallDynamicJava === 146 94 95 8 1 ( 10 11 12 13 32 10 1 1 1 1 15 1 1 1 1 1 1 1 96 102 ) [[ 151 152 153 155 156 163 164 ]] # Dynamic java.util.HashMap::newNode java/util/HashMap$Node * ( java/util/HashMap:NotNull *, int, java/lang/Object *, java/lang/Object *, java/util/HashMap$Node * ) HashMap::putVal @ bci:56 !jvms: HashMap::putVal @ bci:56 JVMS depth=1 loc=10 stk=23 arg=25 mon=25 scalar=25 end=25 mondepth=0 sp=2 bci=56 reexecute=false method=virtual jobject java.util.HashMap.putVal(jint, jobject, jobject, jboolean, jboolean) bc: locals(13): 10 1 1 1 1 15 1 1 1 1 1 1 1 stack(2): 96 102 args(0): monitors(0): scalars(0): (lldb) p new_jvms->dump() ... 149 SafePoint === 158 163 165 8 9 10 1 1 1 1 15 1 1 1 1 1 1 1 96 102 155 11 12 13 32 1 | 161 [[]] SafePoint replaced nodes: 127->136 !orig=55,[26],23 JVMS depth=1 loc=5 stk=18 arg=21 mon=26 scalar=26 end=26 mondepth=0 sp=3 bci=56 reexecute=false method=virtual jobject java.util.HashMap.putVal(jint, jobject, jobject, jboolean, jboolean) bc: locals(13): 10 1 1 1 1 15 1 1 1 1 1 1 1 stack(3): 96 102 155 args(5): 11 12 13 32 1 monitors(0): scalars(0): From rwestrel at redhat.com Fri Aug 21 11:41:41 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 13:41:41 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> <87lfi81lae.fsf@redhat.com> Message-ID: <87imdc1b0q.fsf@redhat.com> > But it also means there should be always a safepoint accompanying the > call and it should follow it immediately for the logic in question to be > in effect. Do we guarantee that? In general, it's not guaranteed that there's a safepoint above the loop exit test. We plant SafePointNodes on back branches in the bytecodes but if the destination of the backbranch is not the loop head then the SafePointNode is not above the exit test. If the SafePointNode is not right above the exit test, the current logic looks for a dominating one in the loop body and checks that there's no side effects between the safepoint and the exit test. So it's possible that we can't find a suitable safepoint in which case the transformation can proceed but without predicates for the inner loop (unless the exit test is a not equal test because then a loop limit check is likely required). So even if we find no safepoint, there's a good chance we can transform the loop and do a fair job of optimizing it. I ran ctw on the base module with the stress option that transforms an int counted loop to a long loop and back to an int counted loop nest to estimate how common it is that no suitable safepoint is found and there was only a handful of them. So it's possible that we end up with a loop that doesn't have a safepoint but the loop has a call that dominates the exit test and we could use its jvm state but it seems like a rare corner case so I don't think the extra complexity is worth it. We could revisit this if it turns out to be a common enough code pattern. Roland. 
From vladimir.x.ivanov at oracle.com Fri Aug 21 11:52:43 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 21 Aug 2020 14:52:43 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87imdc1b0q.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> <87lfi81lae.fsf@redhat.com> <87imdc1b0q.fsf@redhat.com> Message-ID: > So it's possible that we end up with a loop that doesn't have a > safepoint but the loop has a call that dominates the exit test and we > could use its jvm state but it seems like a rare corner case so I don't > think the extra complexity is worth it. We could revisit this if it > turns out to be a common enough code pattern. Sounds good. Thanks for the clarifications and additional experiments! Best regards, Vladimir Ivanov From rwestrel at redhat.com Fri Aug 21 12:29:33 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 14:29:33 +0200 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL In-Reply-To: References: <877dtt3ckp.fsf@redhat.com> <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> Message-ID: <87ft8g18sy.fsf@redhat.com> Thanks for the reviews, Christian and Vladimir. Roland. From joserz at linux.ibm.com Fri Aug 21 13:37:29 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Fri, 21 Aug 2020 10:37:29 -0300 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> Message-ID: <20200821133729.GA53991@pacoca> Hello! On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > Hi, > > On 21.08.20 04:33, Michihiro Horie wrote: > > > > Hi Jose, > > > > One thing I noticed is a misaligned backslash in globals_ppc.hpp. > > Otherwise, the change looks good! > > > > /* special instructions */ > > \ > > + product(bool, UseByteReverseInstructions, false, > > \ > > Fwiw, for adding product options, you must go through the CSR process. Maybe > there is an exception for platform specific ones? I didn't find any exception for platform specific options. But, "experimental" options don't need such CSR process and, to be honest, experimental seems more appropriate here. What do you think? Thank you for your review! 
:) > > Thanks, > Thomas From thomas.schatzl at oracle.com Fri Aug 21 13:45:17 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Aug 2020 15:45:17 +0200 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200821133729.GA53991@pacoca> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> Message-ID: Hi, On 21.08.20 15:37, joserz at linux.ibm.com wrote: > Hello! > > On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: >> Hi, >> >> On 21.08.20 04:33, Michihiro Horie wrote: >>> >>> Hi Jose, >>> >>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. >>> Otherwise, the change looks good! >>> >>> /* special instructions */ >>> \ >>> + product(bool, UseByteReverseInstructions, false, >>> \ >> >> Fwiw, for adding product options, you must go through the CSR process. Maybe >> there is an exception for platform specific ones? > > I didn't find any exception for platform specific options. But, "experimental" options > don't need such CSR process and, to be honest, experimental seems more appropriate here. > What do you think? > > Thank you for your review! :) Just a fly-by. It's up to you :) - just that product options need to be announced to the world. I kind of agree that experimental seems more appropriate. You can always "upgrade" it later. Thomas From christian.hagedorn at oracle.com Fri Aug 21 14:28:08 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 21 Aug 2020 16:28:08 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance Message-ID: Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8249607 http://cr.openjdk.java.net/~chagedorn/8249607/webrev.00/ In the testcase, a LoadSNode is cloned in PhaseIdealLoop::split_if_with_blocks_post() for each use such that they can float out of a loop. To ensure that these loads cannot float back into the loop, we pin them by setting their control input [1]. In the testcase, all 3 new clones are pinned to a loop exit node that is part of an outer strip mined loop (see [2]). The clones LoadS 901 and 902 have a late control that is outside of the strip mined loop 879. But the dominance information is still correct after the SplitIf optimization since the inner loop exit node 876 IfFalse is still on the dominator chains of these late controls. We later create pre/main/post loops and add additional RegionNodes to merge them together. However, we do not consider these LoadSNodes that have a control input from 876 IfFalse. When later verifying for each node that its early control dominates its latest possible control, we fail because we cannot reach 876 IfFalse anymore on a dominator chain for the late controls of LoadS 901 and 902 which start further down outside of the strip mined loop 879. We have two options to fix this. We could either update the wrong control inputs from 876 IfFalse during the creation/merging of pre/main/post loops or directly fix it inside split_if_with_blocks_post(). I think it is makes more sense and is also easier to directly fix it in split_if_with_blocks_post() where we could be less pessimistic when pinning loads. 
The fix now checks if late_load_ctrl is a loop exit of a loop that has an outer strip mined loop and if it dominates x_ctrl. If that is the case, we use the outer loop exit control instead. This also means that the loads can completely float out of the outer strip mined loop. Applying that to the testcase, we get [3] instead of [2]. LoadS 901 and 902 are both at the outer strip mined loop exit while 903 LoadS is still at the inner loop due to 575 StoreI (x_ctrl is 876 IfFalse and dominates the outer strip mined loop exit). The process of creating pre/main/post loops will then take care of these control inputs of the LoadSNodes and rewires them to the newly created RegionNode such that the dominator information is correct again. I additionally updated the printing output in case of such a dominance failure which I think improves the analysis of these problems. It now also prints the idom chain of the early node and the actual real LCA of early and the wrong LCA together with the idom index: Real LCA of early 876 (idom[5]) and (wrong) LCA 728 (idom[19]): 1052 If === 523 1051 [[ 1035 1053 ]] P=0.999999, C=-1.000000 Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/1c332a041243/src/hotspot/share/opto/loopopts.cpp#l1456 [2] https://bugs.openjdk.java.net/secure/attachment/89911/pinned_at_inner_loop_exit.png [3] https://bugs.openjdk.java.net/secure/attachment/89912/pinned_at_outer_strip_mined_loop_exit.png From martin.doerr at sap.com Fri Aug 21 15:06:55 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 21 Aug 2020 15:06:55 +0000 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> Message-ID: Hi Thomas, I agree with you in general. However, all PPC64 specific platform flags are "product" at the moment. Most of them should probably be "diagnostic". We should fix that at some point of time. But for now, I'm ok with Jose's webrev since it's consistent with the other PPC64 flags. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Thomas Schatzl > Sent: Freitag, 21. August 2020 15:45 > To: joserz at linux.ibm.com > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Hi, > > On 21.08.20 15:37, joserz at linux.ibm.com wrote: > > Hello! > > > > On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > >> Hi, > >> > >> On 21.08.20 04:33, Michihiro Horie wrote: > >>> > >>> Hi Jose, > >>> > >>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > >>> Otherwise, the change looks good! > >>> > >>> /* special instructions */ > >>> \ > >>> + product(bool, UseByteReverseInstructions, false, > >>> \ > >> > >> Fwiw, for adding product options, you must go through the CSR process. > Maybe > >> there is an exception for platform specific ones? > > > > I didn't find any exception for platform specific options. But, > "experimental" options > > don't need such CSR process and, to be honest, experimental seems more > appropriate here. > > What do you think? > > > > Thank you for your review! :) > > Just a fly-by. 
It's up to you :) - just that product options need to be > announced to the world. > > I kind of agree that experimental seems more appropriate. You can always > "upgrade" it later. > > Thomas From thomas.schatzl at oracle.com Fri Aug 21 15:12:19 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Aug 2020 17:12:19 +0200 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> Message-ID: <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> Hi, On 21.08.20 17:06, Doerr, Martin wrote: > Hi Thomas, > > I agree with you in general. However, all PPC64 specific platform flags are "product" at the moment. > Most of them should probably be "diagnostic". We should fix that at some point of time. > But for now, I'm ok with Jose's webrev since it's consistent with the other PPC64 flags. > I was merely pointing out what the rule is, that has not been a veto for the patch (which I haven't reviewed btw). If you want to go ahead with that for consistency's sake, with a plan to fix this I can see your point of keeping it. Thanks, Thomas > Best regards, > Martin > > >> -----Original Message----- >> From: hotspot-compiler-dev > retn at openjdk.java.net> On Behalf Of Thomas Schatzl >> Sent: Freitag, 21. August 2020 15:45 >> To: joserz at linux.ibm.com >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system >> and use new byte-reverse instructions >> >> Hi, >> >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: >>> Hello! >>> >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: >>>> Hi, >>>> >>>> On 21.08.20 04:33, Michihiro Horie wrote: >>>>> >>>>> Hi Jose, >>>>> >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. >>>>> Otherwise, the change looks good! >>>>> >>>>> /* special instructions */ >>>>> \ >>>>> + product(bool, UseByteReverseInstructions, false, >>>>> \ >>>> >>>> Fwiw, for adding product options, you must go through the CSR process. >> Maybe >>>> there is an exception for platform specific ones? >>> >>> I didn't find any exception for platform specific options. But, >> "experimental" options >>> don't need such CSR process and, to be honest, experimental seems more >> appropriate here. >>> What do you think? >>> >>> Thank you for your review! :) >> >> Just a fly-by. It's up to you :) - just that product options need to be >> announced to the world. >> >> I kind of agree that experimental seems more appropriate. You can always >> "upgrade" it later. >> >> Thomas From martin.doerr at sap.com Fri Aug 21 15:25:46 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 21 Aug 2020 15:25:46 +0000 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> Message-ID: Hi Thomas, I understand your point. 
My concern is that it may become a more political discussion how to handle CSR for PPC64 flags and I don't want to delay Jose's change for that. There are already other changes in the pipe which build on top of it. It will probably be us to handle and approve CSR requests for platforms which are maintained by SAP. We haven't done this so far. We are still handling such flags in a less formal way. I don't know how other non-Oracle platforms are handled. Best regards, Martin > -----Original Message----- > From: Thomas Schatzl > Sent: Freitag, 21. August 2020 17:12 > To: Doerr, Martin ; joserz at linux.ibm.com > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Hi, > > On 21.08.20 17:06, Doerr, Martin wrote: > > Hi Thomas, > > > > I agree with you in general. However, all PPC64 specific platform flags are > "product" at the moment. > > Most of them should probably be "diagnostic". We should fix that at some > point of time. > > But for now, I'm ok with Jose's webrev since it's consistent with the other > PPC64 flags. > > > > I was merely pointing out what the rule is, that has not been a veto > for the patch (which I haven't reviewed btw). If you want to go ahead > with that for consistency's sake, with a plan to fix this I can see your > point of keeping it. > > Thanks, > Thomas > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> retn at openjdk.java.net> On Behalf Of Thomas Schatzl > >> Sent: Freitag, 21. August 2020 15:45 > >> To: joserz at linux.ibm.com > >> Cc: hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > system > >> and use new byte-reverse instructions > >> > >> Hi, > >> > >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: > >>> Hello! > >>> > >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > >>>> Hi, > >>>> > >>>> On 21.08.20 04:33, Michihiro Horie wrote: > >>>>> > >>>>> Hi Jose, > >>>>> > >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > >>>>> Otherwise, the change looks good! > >>>>> > >>>>> /* special instructions */ > >>>>> \ > >>>>> + product(bool, UseByteReverseInstructions, false, > >>>>> \ > >>>> > >>>> Fwiw, for adding product options, you must go through the CSR > process. > >> Maybe > >>>> there is an exception for platform specific ones? > >>> > >>> I didn't find any exception for platform specific options. But, > >> "experimental" options > >>> don't need such CSR process and, to be honest, experimental seems > more > >> appropriate here. > >>> What do you think? > >>> > >>> Thank you for your review! :) > >> > >> Just a fly-by. It's up to you :) - just that product options need to be > >> announced to the world. > >> > >> I kind of agree that experimental seems more appropriate. You can > always > >> "upgrade" it later. > >> > >> Thomas From rwestrel at redhat.com Fri Aug 21 15:22:27 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 17:22:27 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: References: Message-ID: <87d03k10ss.fsf@redhat.com> Hi Christian, > We have two options to fix this. We could either update the wrong > control inputs from 876 IfFalse during the creation/merging of > pre/main/post loops or directly fix it inside > split_if_with_blocks_post(). 
I think it is makes more sense and is also > easier to directly fix it in split_if_with_blocks_post() where we could > be less pessimistic when pinning loads. > > The fix now checks if late_load_ctrl is a loop exit of a loop that has > an outer strip mined loop and if it dominates x_ctrl. If that is the > case, we use the outer loop exit control instead. This also means that > the loads can completely float out of the outer strip mined loop. > Applying that to the testcase, we get [3] instead of [2]. LoadS 901 and > 902 are both at the outer strip mined loop exit while 903 LoadS is still > at the inner loop due to 575 StoreI (x_ctrl is 876 IfFalse and dominates > the outer strip mined loop exit). The process of creating pre/main/post > loops will then take care of these control inputs of the LoadSNodes and > rewires them to the newly created RegionNode such that the dominator > information is correct again. I agree that fixing it in split_if_with_blocks_post() is the right thing to do. The load has no edges to the safepoint in the outer strip mined loop so why is it in the loop in the first place then? If java code has a load in a loop that's live outside the loop then it should be live at the safepoint on loop exit. Is anti dependence analysis too conservative? Also why does get_late_ctrl(n, n_ctrl) return a control inside the outer strip mined loop? And why is it safe to bypass that result? Roland. From erik.osterlund at oracle.com Fri Aug 21 16:21:53 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 21 Aug 2020 18:21:53 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: Hi, Have you tried this with ZGC on AArch64? It has custom code for saving live registers in the load barrier slow path. I can't see any code changes there, so assuming this will just crash instead. The relevant code is in ZBarrierSetAssembler on aarch64. Maybe I missed something? Thanks, /Erik On 2020-08-19 11:53, Ningsheng Jian wrote: > Hi Andrew, > > I have updated the patch based on the review comments. Would you mind > taking another look? Thanks! > > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.04/ > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ > > Also add build-dev, as there's a makefile change. > > And the split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 > CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 > > JTreg tests are still running, and so far no new failure found. > > Thanks, > Ningsheng > > On 8/17/20 5:16 PM, Andrew Dinn wrote: >> Hi Pengfei, >> >> On 17/08/2020 07:00, Ningsheng Jian wrote: >>> Thanks a lot for the review! Sorry for the late reply, as I was on >>> vacation last week. And thanks to Pengfei and Joshua for helping >>> clarifying some details in the patch. >> >> Yes, they did a very good job of answering most of the pending >> questions. 
>> >>>> I also eyeballed /some/ of the generated code to check that it looked >>>> ok. I'd really like to be able to do that systematically for a >>>> comprehensive test suite that exercised every rule but I only had the >>>> machine for a few days. This really ought to be done as a follow-up to >>>> ensure that all the rules are working as expected. >>> >>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >>> in future. >> >> I'm fine with that as a follow-up patch if you raise a JIRA for it. >> >>>> I am not clear why you are choosing to re-init ptrue after certain JVM >>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>>> when we call a JVM_ENTRY. Could you explain the rationale you have >>>> followed here? >>> >>> We do the re-init at any possible return points to c2 code, not in any >>> runtime c++ functions, which will reduce the re-init calls. >>> >>> Actually I found those entries by some hack of jvm. In the hacky code >>> below we use gcc option -finstrument-functions to build hotspot. With >>> this option, each C/C++ function entry/exit will call the instrument >>> functions we defined. In instrument functions, we clobber p7 (or other >>> reg for test) register, and in c2 function return we verify that p7 (or >>> other reg) has been reinitialized. >>> >>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch >>> >> >> Nice work. It's very good to have that documented. I'm willing to accept >> i) that this has found all current cases and ii) that the verify will >> catch any cases that might get introduced by future changes (e.g. the >> callout introduced by ZGC that you mention below). As the above mot say >> there is a slim chance this might have missed some cases but I think it >> is pretty unlikely. >> >> >>>> Specific Comments (register allocator webrev): >>>> >>>> >>>> aarch64.ad:97-100 >>>> >>>> Why have you added a reg_def for R8 and R9 here and also to >>>> alloc_class >>>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>>> >>> >>> I think Pengfei has helped to explain that. I will either add clear >>> comments or rename the register name as you suggested. >> >> Ok, good. >> >>> As Joshua clarified, we are also working on predicate scalable reg, >>> which is not in this patch. Thanks for the suggestion, I will try to >>> refactor this a bit. >> >> Ok, I'll wait for an updated patch. Are you planning to include the >> scalable predicate reg code as part of this patch? I think that would be >> better as it would help to clarify the need to distinguish vector regs >> as a subset of scalable regs. >> >>>> zBarrierSetAssembler_aarch64.cpp:434 >>>> >>>> Can you explain why we need to check p7 here and not do so in other >>>> places where we call into the JVM? I'm not saying this is wrong. I >>>> just >>>> want to know how you decided where re-init of p7 was needed. >>>> >>> >>> Actually I found this by my hack patch above while running jtreg tests. >>> The stub slowpath here can be a c++ function. >> >> Yes, good catch. >> >>>> superword.cpp:97 >>>> >>>> Does this mean that is someone sets the maximum vector size to a >>>> non-power of two, such as 384, all superword operations will be >>>> bypassed? Including those which can be done using NEON vectors? >>>> >>> >>> Current SLP vectorizer only supports power-of-2 vector size. We are >>> trying to work out a new vectorizer to support all SVE vector sizes, so >>> we would expect a size like 384 could go to that path. 
I tried current >>> patch on a 512-bit SVE hardware which does not support 384-bit: >>> >>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >>> openjdk version "16-internal" 2021-03-16 >>> >>> $ java -XX:MaxVectorSize=48 -version >>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >>> vector length 32. Set MaxVectorSize to 32 >>> >>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >>> instead of unsupported 48: >>> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >>> >>> Do you think we need to exit vm instead of warning and fallbacking >>> to 32 >>> here? >> >> Yes, I think a vm exit would probably be a better choice. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Red Hat Distinguished Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. 03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> > From vladimir.kozlov at oracle.com Fri Aug 21 20:04:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 21 Aug 2020 13:04:38 -0700 Subject: RFR(S) : 8251998 remove usage of PropertyResolvingWrapper in vmTestbase/jit/t In-Reply-To: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> References: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> Message-ID: Looks good. Thanks, Vladimir K On 8/20/20 1:57 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 >> 69 lines changed: 4 ins; 24 del; 41 mod; > > Hi all, > > could you please review this small patch which removes usages of PropertyResolvingWrapper from vmTestbase/jit/t tests and reenabled allowSmartActionArgs? > > background from the main bug: >> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8251998 > webrev: http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 > testing: :vmTestbase_vm_compiler > > Thanks, > -- Igor > > From vladimir.kozlov at oracle.com Fri Aug 21 20:07:24 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 21 Aug 2020 13:07:24 -0700 Subject: RFR: 8252058: [JVMCI] Rework setting is_method_handle_invoke flag in jvmciCodeInstaller In-Reply-To: References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com> <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> Message-ID: <679726fc-3a89-072e-45a6-d2a69eb5f068@oracle.com> Looks good. Thank you for testing it with changed version. Vladimir K On 8/20/20 5:37 AM, Yudi Zheng wrote: > Please review this rework of setting is_method_handle_invoke flag in jvmciCodeInstaller. > > http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252058 > > Changes since last time are at http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java.udiff.html > > -Yudi > >> On 7 Jun 2020, at 23:14, Dean Long wrote: >> >> Looks good! >> >> dl >> >> On 6/7/20 1:06 PM, Yudi Zheng wrote: >>> Thanks Dean! >>> Here is a revision including your suggestion: http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/ >>> >>> -Yudi >>> >>>> On 6 Jun 2020, at 11:33, Dean Long wrote: >>>> >>>> I found a problem. You need to make CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by adding the JVMCI logic that looks backwards by the size of the call instruction. 
>>>> >>>> dl >>>> >>>> On 6/4/20 12:03 AM, Yudi Zheng wrote: >>>>> I did not push this yet. It might require changes on the Graal side. I am still thinking about how to merge. >>>>> >>>>> -Yudi >>>>> >>>>>> On 4 Jun 2020, at 01:22, Dean Long wrote: >>>>>> >>>>>> Does this require recent Graal change in order to work correctly? >>>>>> >>>>>> dl >>>>>> >>>>>> On 6/3/20 3:47 PM, Dean Long wrote: >>>>>>> Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do? >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347 >>>>>>>> >>>>>>>> Many thanks, >>>>>>>> Yudi >>>> >>> >> > From vladimir.x.ivanov at oracle.com Fri Aug 21 22:34:40 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 22 Aug 2020 01:34:40 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> Message-ID: <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Thanks for clarifications, Ningsheng. Let me share my thoughts on the topic and I'll start with summarizing the experience of migrating x86 code to generic vectors. JVM has quite a bit of special logic to support vectors. It hasn't exhausted the complexity budget yet, but it's quite close to the limit (as you probably noticed). While extending x86 backend to support Vector API, we pushed it over the limit and had to address some of the issues. The ultimate goal was to move to vectors which represent full-width hardware registers. After we were convinced that it will work well in AD files, we encountered some inefficiencies with vector spills: depending on actual hardware, smaller (than available) vectors may be used (e.g., integer computations on AVX-capable CPU). So, we stopped half-way and left post-matching part intact: depending on actual vector value width, appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. (I believe you may be in a similar situation on AArch64 with NEON vs SVE where both 128-bit and wide SVE vectors may be used at runtime.) Now back to the patch. What I see in the patch is that you try to attack the problem from the opposite side: you introduce new concept of a size-agnostic vector register on RA side and then directly use it during matching: vecA is used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. Unfortunately, it extends the implementation in orthogonal direction which looks too aarch64-specific to benefit other architectures and x86 particular. I believe there's an alternative approach which can benefit both aarch64 and x86, but it requires more experimentation. If I were to start from scratch, I would choose between 3 options: #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported vector sizes to 128-/256-/512-bit values. 
#2: lift limitation on max size (to 1024/2048 bits), but ignore non-power-of-2 sizes; #3: introduce support for full range of vector register sizes (128-/.../2048-bit with 128-bit step); I see 2 (mostly unrelated) limitations: maximum vector size and non-power-of-2 sizes. My understanding is that you don't try to accurately represent SVE for now, but lay some foundations for future work: you give up on non-power-of-2 sized vectors, but still enable support for arbitrarily sized vectors (addressing both limitations on maximum size and size granularity) in RA (and it affects only spills). So, it is somewhere between #2 and #3. The ultimate goal is definitely #3, but how much more work will be required to teach the JVM about non-power-of-2 vectors? As I see in the patch, you don't have auto-vectorizer support yet, but Vector API will provide access to whatever size hardware exposes. What do you expect on hardware front in the near/mid-term future? Anything supporting vectors larger than 512-bit? What about 384-bit vectors? I don't have a good understanding where SVE/SVE2-capable hardware is moving and would benefit a lot from your insights about what to expect. If 256-/512-bit vectors end up as the only option, then #1 should fit them well. For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My understanding that existing RA machinery should support 1024-bit vectors well. So, unless 2048-bit vectors are needed, we could live with the framework we have right now. If hardware has non-power-of-2 vectors, but JVM doesn't support them, then JVM can work with just power-of-2 portion of them (384-bit => 256-bit). Giving up on #3 for now and starting with less ambitious goals (#1 or #2) would reduce pressure on RA and give more time for additional experiments to come with a better and more universal support/representation of generic/size-agnostic vectors. And, in a longer term, help reducing complexity and technical debt in the area. Some more comments follow inline. >> Compared to x86 w/ AVX512, architectural state for vector registers is >> 4x larger in the worst case (ignoring predicate registers for now). >> Here are the relevant constants on x86: >> >> gensrc/adfiles/adGlobals_x86.hpp: >> >> // the number of reserved registers + machine registers. >> #define REG_COUNT??? 545 >> ... >> // Size of register-mask in ints >> #define RM_SIZE 22 >> >> My estimate is that for AArch64 with SVE support the constants will be: >> >> ?? REG_COUNT < 2500 >> ?? RM_SIZE < 100 >> >> which don't look too bad. >> > > Right, but given that most real hardware implementations will be no > larger than 512 bits, I think. Having a large bitmask array, with most > bits useless, will be less efficient for regmask computation. Does it make sense to limit the maximum supported size to 512-bit then (at least, initially)? In that case, the overhead won't be worse it is on x86 now. >> Also, I don't see any changes related to stack management. So, I >> assume it continues to be managed in slots. Any problems there? As I >> understand, wide SVE registers are caller-save, so there may be many >> spills of huge vectors around a call. (Probably, not possible with C2 >> auto-vectorizer as it is now, but Vector API will expose it.) >> > > Yes, the stack is still managed in slots, but it will be allocated with > real vector register length instead of 'virtual' slots for VecA. See the > usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. 
We have also > applied the patch to vector api, and did find a lot of vector spills > with expected correct results. I'm curious whether similar problems may arise for spills. Considering wide vector registers are caller-saved, it's possible to have lots of 256-byte values to end up on stack (especially, with Vector API). Any concerns with that? >> Have you noticed any performance problems? If that's the case, then >> AVX512 support on x86 would benefit from similar optimization as well. >> > > Do you mean register allocation performance problems? I did not notice > that before. Do you have any suggestion on how to measure that? I'd try to run some applications/benchmarks with -XX:+CITime to get a sense how much RA may be affected. Best regards, Vladimir Ivanov From Divino.Cesar at microsoft.com Sat Aug 22 01:56:34 2020 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Sat, 22 Aug 2020 01:56:34 +0000 Subject: Help with JDK-8230525 - Adding new intrinsic Message-ID: Hey there, I'm working on JDK-8230525 (https://bugs.openjdk.java.net/browse/JDK-8230525) and for the past few days I'm struggling to get all the plumbing necessary to add a new intrinsic and instruction pattern for the Integer.reverse() method. Can someone please look at the code I currently have and advise what I'm doing wrong/missing here? The exact problem I'm struggling with is that C2 for some reason choose a previously existing instruction pattern (as defined in x86_64.ad) instead of the new instruction pattern I created. I shared the code here: https://github.com/JohnTortugo/jdk/pull/1 Thanks, Cesar From igor.ignatyev at oracle.com Sat Aug 22 02:02:33 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 21 Aug 2020 19:02:33 -0700 Subject: RFR(S) : 8251998 remove usage of PropertyResolvingWrapper in vmTestbase/jit/t In-Reply-To: References: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> Message-ID: Thank you Vladimir, pushed. -- Igor > On Aug 21, 2020, at 1:04 PM, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir K > > On 8/20/20 1:57 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 >>> 69 lines changed: 4 ins; 24 del; 41 mod; >> Hi all, >> could you please review this small patch which removes usages of PropertyResolvingWrapper from vmTestbase/jit/t tests and reenabled allowSmartActionArgs? >> background from the main bug: >>> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251998 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 >> testing: :vmTestbase_vm_compiler >> Thanks, >> -- Igor >> From igor.ignatyev at oracle.com Sat Aug 22 05:23:14 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 21 Aug 2020 22:23:14 -0700 Subject: RFR(S) : 8252186 : remove FileInstaller action from vmTestbase/jit/graph tests Message-ID: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ > 24 lines changed: 0 ins; 12 del; 12 mod; Hi all, could you please review this small cleanup of vmTestbase/jit/graph tests? from JBS: > vmTestbase/jit/graph tests use FileInstaller to copy ${test.src}/data/main.data to the current directory, and pass the path to it as '-path' option to jit.graph.CGT class. since JDK-8252005 enabled jtreg smart action args, we can use ${test.src} right in the argument and avoid copying. 
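To make the mechanics concrete, the change amounts to the following kind of header rewrite (a sketch for illustration only -- the directives below are not copied from the actual vmTestbase tests, and the FileInstaller arguments are assumed):

    /*
     * Illustrative jtreg header only -- not the real vmTestbase test file.
     *
     * Before: stage the data file into the scratch directory, then point
     * the test at the copy:
     *
     *   @run driver jdk.test.lib.FileInstaller data/main.data main.data
     *   @run main/othervm jit.graph.CGT -path main.data
     *
     * After (JDK-8252005, with allowSmartActionArgs=true in TEST.properties):
     * ${test.src} is expanded directly in the action arguments, so the copy
     * step disappears:
     *
     *   @run main/othervm jit.graph.CGT -path ${test.src}/data/main.data
     */
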
testing: :vmTestbase_vm_compiler JBS: https://bugs.openjdk.java.net/browse/JDK-8252186 webrev: http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ Thanks, -- Igor From goetz.lindenmaier at sap.com Sat Aug 22 05:45:40 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Sat, 22 Aug 2020 05:45:40 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Richard, I read through your change again. It looks good to me now. The new naming and additional comments make it easier to read I think, thank you. One small thing: deoptimization.cpp, l. 1503 You don't really need the brackets. Two lines below you don't use them either. (No webrev needed) Best regards, Goetz. -----Original Message----- From: Reingruber, Richard Sent: Dienstag, 18. August 2020 10:44 To: Lindenmaier, Goetz ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Goetz, I have collected the changes based on your feedback in a new webrev: Webrev.7: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7/ Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7.inc/ Most of the changes are renamings, commenting, and reformatting. Besides that ... - I converted the native agent of the test IterateHeapWithEscapeAnalysisEnabled from C to C++, because this seems to be preferred by serviceability developers. I also re-indented the file, but excluded this from the delta webrev. - I had to adapt test/jdk/com/sun/jdi/EATests.java to the fact that background compilation (-Xbatch) cannot be reliably disabled for JVMCI compilers. E.g. the compile broker will compile in the background if JVMCI is not yet fully initialized. Therefore it is possible that test cases are executed before the main test method is compiled on the highest level and then the test case fails. The higher the system load the higher the probability for this to happen. In webrev.7 I skip the compilation level check if the vm is configured to use the JVMCI compiler. I also answered you inline below. Thanks, Richard. -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 23. Juli 2020 16:20 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Thanks for your two further explanations in the other thread. That made the points clear to me. > > I was not that happy with the names saying not_global_escape > > and similar. I now agreed you have to use the terms of the escape > > analysis (NoEscape ArgEscape= throughout the runtime code. I'm still not happy with > > the 'not' in the term, I always try to expand the name to some > > sentence with a negated verb, but it makes no sense. > > For example, "has_not_global_escape_in_scope" expands to > > "Hasn't a global escape in its scope." in my thinking, which makes > > no sense. You probably mean > > "Has not-global escape in its scope." or "Has {ArgEscape|NoEscape} > > in its scope." 
> > > C2 is using the word "non" in this context, e.g., here > > alloc->is_non_escaping. > > There is also ConnectionGraph::not_global_escape() That talks about a single node that represents a single Object. An object has a single state wrt. ea. You use the term for safepoint which tracks a set of objects. Here, has_not_global_excape can mean 1. None of the several objects does escape globaly. 2. There is at least one object that escapes globaly. > > non obviously negates the adjective 'global', > > non-global or nonglobal even is a English term I find in the > > net. > > So what about "has_non_global_escape_in_scope?" > > And what about has_ea_local_in_scope? That's good. Please document somewhere that Ea_local == ArgEscape | NoEscape. That's what it is, right? > > Does jvmti specify that the same limits are used ...? > > ok on your side. > > I don't know and didn't find anything in a quick search. Ok, not your business. > > > jvmtiEnvBase.cpp ok > > jvmtiImpl.h|cpp ok > > jvmtiTagMap.cpp ok > > whitebox.cpp ok > > > deoptimization.cpp > > > line 177: Please break line > > line 246, 281: Please break line > > 1578, 1583, 1589, 1632, 1649, 1651 Break line > > > 1651: You use 'non'-terms, too: non-escaping :) > > I know :) At least here it is wrong I'd say. "...has to be a not escaping obj..." > sounds better > (hopefully not only to my german ears). I thought the term non-escpaing makes it quite clear. I just wanted to point out that using non above would be similar to the wording here. > > IterateHeapWithEscapeAnalysisEnabled.java > > > line 415: > > msg("wait until target thread has set testMethod_result"); > > while (testMethod_result == 0) { > > Thread.sleep(50); > > } > > Might the test run into timeouts at this place? > > The field is volatile, i.e. it will be reloaded > > in each iteration. But will dontinline_testMethod > > write it back to main memory in time? > > You mean, the test could hang in that loop for a couple of minutes? I don't > think so. There are cache coherence protocols in place which will invalidate > stale data very timely. Ok, anyways, it would only be a hanging test. > > Ok. I've removed quite a lot of the occurrances. > > > Also, I like full sentences in comments. > > Especially for me as foreign speaker, this makes > > things much more clear. I.e., I try to make it > > a real sentence with articles, capitalized and a > > dot at the end if there is a subject and a verb > > in first place. > > E.g., jvmtiEnvBase.cpp:1327 > > Are you referring to the following? > (from > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/src/hots > pot/share/prims/jvmtiEnvBase.cpp.frames.html) > > 1326 > 1327 // If the frame is a compiled one, need to deoptimize it. > 1328 if (vf->is_compiled_frame()) { > > This line 1327 is preexisting. Sorry, wrong line number again. I think I meant 1333 // eagerly reallocate scalar replaced objects. But I must admit, the subject is missing. It's one of these imperative sentences where the subject is left out, which are used throughout documentation. Bad example, but still a correct sentence, so qualifies for punctuation? Best regards, Goetz. 
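For readers not steeped in C2's escape analysis vocabulary, the three states behind the naming discussion above can be illustrated with a tiny sketch (purely illustrative; it is not taken from the webrev or its tests, and whether C2 really classifies these exact examples this way depends on inlining decisions):

    public class EscapeStates {
        static Object globalSink;

        // NoEscape: the array never leaves this method, so C2 may scalar
        // replace it; only a debugger or JVMTI agent could ever ask to see it.
        static int noEscape() {
            int[] box = {42};
            return box[0];
        }

        // ArgEscape: the StringBuilder is passed to callees (as the receiver,
        // assuming the calls are not inlined) but is not stored anywhere that
        // outlives the call.
        static int argEscape(java.util.List<String> words) {
            StringBuilder sb = new StringBuilder();
            for (String w : words) {
                sb.append(w);
            }
            return sb.length();
        }

        // GlobalEscape: the object is published through a static field and is
        // reachable from anywhere afterwards.
        static void globalEscape() {
            globalSink = new java.util.ArrayList<String>();
        }

        public static void main(String[] args) {
            System.out.println(noEscape() + " " + argEscape(java.util.List.of("a", "b")));
            globalEscape();
        }
    }
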
From vladimir.kozlov at oracle.com Sat Aug 22 17:55:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 22 Aug 2020 10:55:22 -0700 Subject: RFR(S) : 8252186 : remove FileInstaller action from vmTestbase/jit/graph tests In-Reply-To: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> References: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> Message-ID: <7488a613-f5ad-acc8-edc1-677d4511216a@oracle.com> LGTM Thanks, Vladimir K On 8/21/20 10:23 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ >> 24 lines changed: 0 ins; 12 del; 12 mod; > > Hi all, > > could you please review this small cleanup of vmTestbase/jit/graph tests? > from JBS: >> vmTestbase/jit/graph tests use FileInstaller to copy ${test.src}/data/main.data to the current directory, and pass the path to it as '-path' option to jit.graph.CGT class. since JDK-8252005 enabled jtreg smart action args, we can use ${test.src} right in the argument and avoid copying. > > testing: :vmTestbase_vm_compiler > JBS: https://bugs.openjdk.java.net/browse/JDK-8252186 > webrev: http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ > > Thanks, > -- Igor > From dean.long at oracle.com Sun Aug 23 04:14:00 2020 From: dean.long at oracle.com (Dean Long) Date: Sat, 22 Aug 2020 21:14:00 -0700 Subject: RFR: 8252058: [JVMCI] Rework setting is_method_handle_invoke flag in jvmciCodeInstaller In-Reply-To: <679726fc-3a89-072e-45a6-d2a69eb5f068@oracle.com> References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com> <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> <679726fc-3a89-072e-45a6-d2a69eb5f068@oracle.com> Message-ID: <18afd52c-3f80-2ecb-c1a4-33395081934a@oracle.com> +1 dl On 8/21/20 1:07 PM, Vladimir Kozlov wrote: > Looks good. Thank you for testing it with changed version. > > Vladimir K > > On 8/20/20 5:37 AM, Yudi Zheng wrote: >> Please review this rework of setting is_method_handle_invoke flag in >> jvmciCodeInstaller. >> >> http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8252058 >> >> Changes since last time are at >> http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java.udiff.html >> >> -Yudi >> >>> On 7 Jun 2020, at 23:14, Dean Long wrote: >>> >>> Looks good! >>> >>> dl >>> >>> On 6/7/20 1:06 PM, Yudi Zheng wrote: >>>> Thanks Dean! >>>> Here is a revision including your suggestion: >>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/ >>>> >>>> -Yudi >>>> >>>>> On 6 Jun 2020, at 11:33, Dean Long wrote: >>>>> >>>>> I found a problem.? You need to make >>>>> CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by >>>>> adding the JVMCI logic that looks backwards by the size of the >>>>> call instruction. >>>>> >>>>> dl >>>>> >>>>> On 6/4/20 12:03 AM, Yudi Zheng wrote: >>>>>> I did not push this yet. It might require changes on the Graal >>>>>> side. I am still thinking about how to merge. >>>>>> >>>>>> -Yudi >>>>>> >>>>>>> On 4 Jun 2020, at 01:22, Dean Long wrote: >>>>>>> >>>>>>> Does this require recent Graal change in order to work correctly? >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 6/3/20 3:47 PM, Dean Long wrote: >>>>>>>> Hi Yudi.? I'm seeing an assert in >>>>>>>> test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. >>>>>>>> Let me remove my changes and see if it still fails.? What >>>>>>>> testing did you do? 
>>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> Please review this patch that sets is_method_handle_invoke >>>>>>>>> flag accordingly when describing scope at call site in >>>>>>>>> jvmciCodeInstaller. >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347 >>>>>>>>> >>>>>>>>> Many thanks, >>>>>>>>> Yudi >>>>> >>>> >>> >> From boris.ulasevich at bell-sw.com Sun Aug 23 18:20:28 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sun, 23 Aug 2020 21:20:28 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> Message-ID: <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> Hi, Please review the updated change to C2 and AArch64 which introduces a new BitfieldInsert node to replace Or+Shift+And sequence when possible. Single BFI instruction is emitted for the new node. With the current change all the transformation logic is moved out of aarch64.ad file into the common C2 code. http://bugs.openjdk.java.net/browse/JDK-8249893 http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 The change in compiler.cpp was done to implicitly ask IGVN to run the idealization once again after the loop optimization phase. This extra step is necessary to make the BFI transform happen only after loop optimization. thanks, Boris On 05.08.2020 12:08, Andrew Haley wrote: > Hi, > > On 8/4/20 5:56 PM, Boris Ulasevich wrote: > >> gently reminding of this review request. >>> http://bugs.openjdk.java.net/browse/JDK-8249893 >>> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00 > I'm leaning towards no. The code is too complicated and difficult to > maintain for such a small gain. As I suggested to Eric Liu > when discussing 8248870, we should try > canonicalizing this stuff early in compilation then matching with > BFM rules. > From jamsheed.c.m at oracle.com Mon Aug 24 05:36:51 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Mon, 24 Aug 2020 11:06:51 +0530 Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions In-Reply-To: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> References: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> Message-ID: <03df9364-817d-04d6-6434-80be93a66526@oracle.com> Hi David, Thank you for the review and feedback. Agree on all of them. I will rework and get back. On 10/08/2020 07:33, David Holmes wrote: > Hi Jamsheed, > > On 6/08/2020 10:07 pm, Jamsheed C M wrote: >> Hi all, >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8249451 >> >> webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/ > > Thanks for tackling this messy issue. Overall I like the use of TRAPS > to more clearly document which methods can return with an exception > pending. I think there are some problems with the proposed changes. > I'll start with those comments and then move on to more general comments. > > src/hotspot/share/utilities/exceptions.cpp > src/hotspot/share/utilities/exceptions.hpp > > I don't think the changes here are correct or safe in general. > > First, adding the new macro and function to only clear non-async > exceptions is fine itself. 
But naming wise the fact only non-async > exceptions are cleared should be evident, and there is no "check" > involved (in the sense of the existing CHECK_ macros) so I suggest: > > s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/ > s/check_clear_pending_exception/clear_pending_nonasync_exceptions/ > Ok > But changing the existing CHECK_AND_CLEAR macros to now leave async > exceptions pending seems potentially dangerous as calling code may not > be prepared for there to now be a pending exception. For example the > use in thread.cpp: > > ?JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); > ?JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); > > get_java_runtime_name() is currently guaranteed to clear all > exceptions, so all the other code is known to be safe to call. But > that would no longer be true. That said, this is VM initialization > code and an async exception is impossible at this stage. > > I think I would rather see CHECK_AND_CLEAR left as-is, and an actual > CHECK_AND_CLEAR_NONASYNC introduced for those users of CHECK_AND_CLEAR > that can encounter async exceptions and which should not clear them. > > +?? if > (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) && > +?????? _pending_exception->klass() != > SystemDictionary::InternalError_klass()) { > Ok > Flagging all InternalErrors as async exceptions is probably also not > correct. I don't see a good solution to this at the moment. I think we > would need to introduce a new subclass of InternalError for the unsafe > access error case**. Now it may be that all the other InternalError > usages are "impossible" in the context of where the new macros are to > be used, but that is very difficult to establish or assert. > > ** Or perhaps we could inject a field that allows the VM to identify > instances related to unsafe access errors ... Ideally of course these > unsafe access errors would be distinct from the async exception > mechanism - something I would still like to pursue. > Ok > --- > > General comments ... > > There is a general change from "JavaThread* thread" to "Thread* > THREAD" (or TRAPS) to allow the use of the CHECK macros. This is > unfortunate because the fact the thread is restricted to being a > JavaThread is no longer evident in the method signatures. That is a > flaw with the TRAPS/CHECK mechanism unfortunately :( . But as the > methods no longer take a JavaThread* arg, they should assert that > THREAD->is_Java_thread(). I will also look at an RFE to have > as_JavaThread() to avoid the need for separate assertion checks before > casting from "Thread*" to "JavaThread*". > Ok > Note there's no need to use CHECK when the enclosing method is going > to return immediately after the call that contains the CHECK. It just > adds unnecessary checking of the exception state. The use of TRAPS > shows that the methods may return with an exception pending. I've > flagged all such occurrences I spotted below. > Ok > --- > > +?? // Only metaspace OOM is expected. no Java code executed. > > Nit: s/no/No > > > src/hotspot/share/compiler/compilationPolicy.cpp > > > ?410?????? method_invocation_event(method, CHECK_NULL); > ?489?????? CompileBroker::compile_method(m, InvocationEntryBci, > comp_level, m, hot_count, CompileTask::Reason_InvocationCount, CHECK); > > Nit: there's no need to use CHECK here. > > --- > > src/hotspot/share/compiler/tieredThresholdPolicy.cpp > > ?504???? method_invocation_event(method, inlinee, comp_level, nm, > CHECK_NULL); > ?570???????? 
compile(mh, bci, CompLevel_simple, CHECK); > ?581???????? compile(mh, bci, CompLevel_simple, CHECK); > ?595???? CompileBroker::compile_method(mh, bci, level, mh, hot_count, > CompileTask::Reason_Tiered, CHECK); > 1062?????? compile(mh, InvocationEntryBci, next_level, CHECK); > > Nit: there's no need to use CHECK here. > > 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, > Thread* THREAD) { > > Thank you for correcting this misuse of the THREAD name on a > JavaThread* type. > > --- > > src/hotspot/share/interpreter/linkResolver.cpp > > ?128?? CompilationPolicy::compile_if_required(selected_method, CHECK); > > Nit: there's no need to use CHECK here. > > --- > > src/hotspot/share/jvmci/compilerRuntime.cpp > > ?260???? CompilationPolicy::policy()->event(emh, mh, > InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK); > ?280???? nmethod* osr_nm = CompilationPolicy::policy()->event(emh, mh, > branch_bci, target_bci, CompLevel_aot, cm, CHECK); > > Nit: there's no need to use CHECK here. > > --- > > src/hotspot/share/jvmci/jvmciRuntime.cpp > > ?102???????? // Donot clear probable async exceptions. > > typo: s/Donot/Do not/ > > --- > > src/hotspot/share/runtime/deoptimization.cpp > > 1686 void Deoptimization::load_class_by_index(const > constantPoolHandle& constant_pool, int index) { > > This method should be declared with TRAPS now. > > 1693???? // Donot clear probable Async Exceptions. > > typo: s/Donot/Do not/ > > Ok >> testing : mach1-5(links in jbs) > > There is very little existing testing that will actually test the key > changes you have made here. You will need to do direct fault-injection > testing anywhere you now allow async exceptions to remain, to see if > the calling code can tolerate that. It will be difficult to test > thoroughly. > Ok > Thanks again for tackling this difficult problem! Best regards, Jamsheed > > David > ----- > >> >> While working on JDK-8246381 it was noticed that compilation request >> path clears all exceptions(including async) and doesn't propagate[1]. >> >> Fix: patch restores the propagation behavior for the probable async >> exceptions. >> >> Compilation request path propagate exception as in [2]. MDO and >> MethodCounter doesn't expect any exception other than metaspace >> OOM(added comments). >> >> Deoptimization path doesn't clear probable async exceptions and take >> unpack_exception path for non uncommontraps. >> >> Added java_lang_InternalError to well known classes. >> >> Request for review. >> >> Best Regards, >> >> Jamsheed >> >> [1] w.r.t changes done for JDK-7131259 >> >> [2] >> >> ???? (a) >> ???? -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp >> ?????? | >> ??????? ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp >> ????????? | >> ?????????? ------ compileBroker.cpp >> >> ???? (b) >> ???? Xcomp versions >> ???? ------> compilationPolicy.cpp >> ??????? | >> ???????? ------> compileBroker.cpp >> >> ???? (c) >> >> ???? Direct call to? compile_method in compileBroker.cpp >> >> ???? JVMCI bootstrap, whitebox, replayCompile. 
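As background for readers, "async exceptions" here means exceptions posted to a thread from the outside rather than raised by the code the thread is executing; ThreadDeath from Thread.stop() is the canonical example. A minimal, illustrative demo of that behaviour follows (JDK 16-era semantics; Thread.stop() is deprecated, so this is for illustration only and is not part of the fix or its tests):

    public class AsyncExceptionDemo {
        static volatile long counter;

        public static void main(String[] args) throws InterruptedException {
            Thread victim = new Thread(() -> {
                try {
                    while (true) {
                        counter++;               // keep the thread busy in Java code
                    }
                } catch (ThreadDeath td) {
                    // Nothing in this loop threw: the exception was posted from
                    // outside and delivered asynchronously by the VM.
                    System.out.println("caught async ThreadDeath");
                    throw td;                    // conventionally re-thrown
                }
            });
            victim.start();
            Thread.sleep(200);
            victim.stop();                       // posts the async ThreadDeath
            victim.join();
            System.out.println("victim terminated, counter = " + counter);
        }
    }
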
>> >> From ningsheng.jian at arm.com Mon Aug 24 09:16:07 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 24 Aug 2020 17:16:07 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Message-ID: Hi Vladimir, Thanks for your valuable inputs! On 8/22/20 6:34 AM, Vladimir Ivanov wrote: > Thanks for clarifications, Ningsheng. > > Let me share my thoughts on the topic and I'll start with summarizing > the experience of migrating x86 code to generic vectors. > > JVM has quite a bit of special logic to support vectors. It hasn't > exhausted the complexity budget yet, but it's quite close to the limit > (as you probably noticed). While extending x86 backend to support Vector > API, we pushed it over the limit and had to address some of the issues. > > The ultimate goal was to move to vectors which represent full-width > hardware registers. After we were convinced that it will work well in AD > files, we encountered some inefficiencies with vector spills: depending > on actual hardware, smaller (than available) vectors may be used (e.g., > integer computations on AVX-capable CPU). So, we stopped half-way and > left post-matching part intact: depending on actual vector value width, > appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. > > (I believe you may be in a similar situation on AArch64 with NEON vs SVE > where both 128-bit and wide SVE vectors may be used at runtime.) > Thanks for sharing the background. > Now back to the patch. > > What I see in the patch is that you try to attack the problem from the > opposite side: you introduce new concept of a size-agnostic vector > register on RA side and then directly use it during matching: vecA is > used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. > > Unfortunately, it extends the implementation in orthogonal direction > which looks too aarch64-specific to benefit other architectures and x86 > particular. I believe there's an alternative approach which can benefit > both aarch64 and x86, but it requires more experimentation. > Since vecA and vecX (and others) are architecturally different vector registers, I think it's quite natural that we just introduced the new vector register type vecA, to represent what we need for corresponding hardware vector register. Please note that in vector length agnostic ISA, like Arm SVE and RISC-V vector extension [1], the vector registers are architecturally the same type of register despite the different hardware implementations. > If I were to start from scratch, I would choose between 3 options: > > #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported > vector sizes to 128-/256-/512-bit values. > > #2: lift limitation on max size (to 1024/2048 bits), but ignore > non-power-of-2 sizes; > > #3: introduce support for full range of vector register sizes > (128-/.../2048-bit with 128-bit step); > > I see 2 (mostly unrelated) limitations: maximum vector size and > non-power-of-2 sizes. 
> > My understanding is that you don't try to accurately represent SVE for > now, but lay some foundations for future work: you give up on > non-power-of-2 sized vectors, but still enable support for arbitrarily > sized vectors (addressing both limitations on maximum size and size > granularity) in RA (and it affects only spills). So, it is somewhere > between #2 and #3. > > The ultimate goal is definitely #3, but how much more work will be > required to teach the JVM about non-power-of-2 vectors? As I see in the > patch, you don't have auto-vectorizer support yet, but Vector API will > provide access to whatever size hardware exposes. What do you expect on > hardware front in the near/mid-term future? Anything supporting vectors > larger than 512-bit? What about 384-bit vectors? > I think our patch is now in 3. :-) We do not give up non-power-of-2 sized vectors, instead we are supporting them well in this patch. And are still using current regmask framework. (Actually, I think the only limitation to the vector size is that it should be multiple of 32-bits - bits per 1 reg slot.) I am not sure about other Arm partners' hardware implementations in the mid-term future, as it's free for cpu implementer to choose any max vector sizes as long as it follows SVE architecture specification. But we did tested the patch with Vector API on different SVE supported vector sizes on emulator, e.g. 384, 768, 1024, 2048 etc. The register allocator including the spill/unspill works well on those different sizes with Vector API. (Thanks to your great work on Vector API. :-)) We currently limit the vector size to power-of-2 in vm_version_aarch64.cpp, as suggested by Andrew Dinn, is because current SLP vectorizer only supports power-of-2 vectors. With Vector API in, I think such restriction can be removed. And we are also working on a new vectorizer to support predication/mask, which should not have power-of-2 limitation. > I don't have a good understanding where SVE/SVE2-capable hardware is > moving and would benefit a lot from your insights about what to expect. > > If 256-/512-bit vectors end up as the only option, then #1 should fit > them well. > > For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My > understanding that existing RA machinery should support 1024-bit vectors > well. So, unless 2048-bit vectors are needed, we could live with the > framework we have right now. > > If hardware has non-power-of-2 vectors, but JVM doesn't support them, > then JVM can work with just power-of-2 portion of them (384-bit => 256-bit). > Yes, we can make JVM to support portion of vectors, at least for SVE. My concern is that the performance wouldn't be as good as the full available vector width. > Giving up on #3 for now and starting with less ambitious goals (#1 or > #2) would reduce pressure on RA and give more time for additional > experiments to come with a better and more universal > support/representation of generic/size-agnostic vectors. And, in a > longer term, help reducing complexity and technical debt in the area. > > Some more comments follow inline. > >>> Compared to x86 w/ AVX512, architectural state for vector registers is >>> 4x larger in the worst case (ignoring predicate registers for now). >>> Here are the relevant constants on x86: >>> >>> gensrc/adfiles/adGlobals_x86.hpp: >>> >>> // the number of reserved registers + machine registers. >>> #define REG_COUNT??? 545 >>> ... 
>>> // Size of register-mask in ints >>> #define RM_SIZE 22 >>> >>> My estimate is that for AArch64 with SVE support the constants will be: >>> >>> ?? REG_COUNT < 2500 >>> ?? RM_SIZE < 100 >>> >>> which don't look too bad. >>> >> >> Right, but given that most real hardware implementations will be no >> larger than 512 bits, I think. Having a large bitmask array, with most >> bits useless, will be less efficient for regmask computation. > > Does it make sense to limit the maximum supported size to 512-bit then > (at least, initially)? In that case, the overhead won't be worse it is > on x86 now. > Technically, this may be possible though I haven't tried. My concerns are: 1) A larger regmask arrays would be less efficient (we only use 256 bits - 8 slots for SVE in this patch), though won't be worse than x86. 2) Given that current patch already supports larger sizes and non-power-of-2 sizes well with relative small size in diff, if we want to support other sizes soon, there may be some more work to roll-back ad file changes. >>> Also, I don't see any changes related to stack management. So, I >>> assume it continues to be managed in slots. Any problems there? As I >>> understand, wide SVE registers are caller-save, so there may be many >>> spills of huge vectors around a call. (Probably, not possible with C2 >>> auto-vectorizer as it is now, but Vector API will expose it.) >>> >> >> Yes, the stack is still managed in slots, but it will be allocated with >> real vector register length instead of 'virtual' slots for VecA. See the >> usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also >> applied the patch to vector api, and did find a lot of vector spills >> with expected correct results. > > I'm curious whether similar problems may arise for spills. Considering > wide vector registers are caller-saved, it's possible to have lots of > 256-byte values to end up on stack (especially, with Vector API). Any > concerns with that? > No, we don't need to have such big (256-byte) slots for a smaller vector register. The spill slots are the same size as of real vector length, e.g. 48 bytes for 384-bit vector. Even for alignment, we currently choose SlotsPerVecA (8 slots for 32 bytes, 256 bits) for alignment (skipped slots can still be allocated to other args), which is still smaller than AVX512 (64 bytes, 512 bits). We can tweak the patch to choose other smaller value, if we think the alignment is too large. (Yes, we should always try to avoid spills for wide vectors, especially with Vector API, to avoid performance pitfalls.) >>> Have you noticed any performance problems? If that's the case, then >>> AVX512 support on x86 would benefit from similar optimization as well. >>> >> >> Do you mean register allocation performance problems? I did not notice >> that before. Do you have any suggestion on how to measure that? > > I'd try to run some applications/benchmarks with -XX:+CITime to get a > sense how much RA may be affected. > Thanks! I will give a try. 
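To make the spill-size arithmetic above concrete (illustration only, not code from the patch; one register slot is 32 bits):

    static int spillSlots(int actualVectorBits) {
        // a spill takes actualVectorBits / 32 slots, independent of the
        // fixed 8-slot 'logical' VecA regmask used during allocation
        return actualVectorBits / 32;
    }
    // spillSlots(128)  = 4  slots = 16 bytes
    // spillSlots(384)  = 12 slots = 48 bytes
    // spillSlots(2048) = 64 slots = 256 bytes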
[1] https://github.com/riscv/riscv-v-spec/releases/tag/0.9 Thanks, Ningsheng From ningsheng.jian at arm.com Mon Aug 24 09:59:20 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 24 Aug 2020 17:59:20 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> Hi Erik, Thanks for the review! On 8/22/20 12:21 AM, Erik ?sterlund wrote: > Hi, > > Have you tried this with ZGC on AArch64? It has custom code for saving > live registers in the load barrier slow path. > I can't see any code changes there, so assuming this will just crash > instead. > The relevant code is in ZBarrierSetAssembler on aarch64. > > Maybe I missed something? > I didn't add ZGC option while running tests. I think I need to update push_fp() which is called by ZSaveLiveRegisters. But do we need to get size info (float/neon/sve) instead of saving the whole vector register? Currently, it just simply saves the whole NEON register. And in ZBarrierSetAssembler::load_at(), before calling to runtime code, we call push_call_clobbered_registers_except(), which just saves floating point registers instead of the whole NEON vector registers. Similar behavior in x86 implementation. Is that correct (not saving vectors)? Thanks, Ningsheng From vladimir.x.ivanov at oracle.com Mon Aug 24 12:03:47 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 24 Aug 2020 15:03:47 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Message-ID: <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> Hi Ningsheng, >> What I see in the patch is that you try to attack the problem from the >> opposite side: you introduce new concept of a size-agnostic vector >> register on RA side and then directly use it during matching: vecA is >> used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. >> >> Unfortunately, it extends the implementation in orthogonal direction >> which looks too aarch64-specific to benefit other architectures and x86 >> particular. I believe there's an alternative approach which can benefit >> both aarch64 and x86, but it requires more experimentation. >> > > Since vecA and vecX (and others) are architecturally different vector > registers, I think it's quite natural that we just introduced the new > vector register type vecA, to represent what we need for corresponding > hardware vector register. Please note that in vector length agnostic > ISA, like Arm SVE and RISC-V vector extension [1], the vector registers > are architecturally the same type of register despite the different > hardware implementations. FTR vecX et al don't represent hardware registers, they represent vector values of predefined size. (For example, vecS, vecD, and vecX map to the very same set of 128-bit vector registers on x86.) 
My point is: in terms of existing concepts what you are adding is not "yet another flavor of vector". It's a new full-fledged concept (which is manifested as special cases across the JVM) and you end up with 2 different representations of vectors. I agree that hardware is quite different, but I don't see it makes much of a difference in the context of the JVM and abstractions used to hide it are similar. For example, as of now, most of x86-specific code in C2 works just fine with full-width hardware vectors which are oblivious of their sizes until RA kicks in. And SVE patch you propose completely omits implicit predication hardware provides which makes it similar to AVX512 (modulo wider range of vector width sizes supported). So, even though hardware abstractions being used aren't actually *that* different, vecA piles complexity and introduces a separate way to achieve similar results (but slightly differently). And that's what bothers me. I'd like to see more unification instead which should bring reduction in complexity and an opportunity to address long-standing technical debt (and 5 flavors of ideal registers for vectors is part of it IMO). So far, I see 2 main directions for RA work: (a) support vectors of arbitrary size: (1) helps push the upper limit on the size (1024-bit) (2) handle non-power-of-2 sizes (b) optimize RA implementation for large values Anything else? Speaking of (a), in particular, I don't see why possible solution for it should not supersede vecX et al altogether. Also, I may be wrong, but I don't see a clear evidence there's a pressing need to have all of that fixed right from the beginning. (That's why I put #1 and #2 options on the table.) Starting with #1/#2 would untie initial SVE support from the exploratory work needed to choose the most appropriate solution for (a) and (b). >> If I were to start from scratch, I would choose between 3 options: >> >> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported >> vector sizes to 128-/256-/512-bit values. >> >> ??? #2: lift limitation on max size (to 1024/2048 bits), but ignore >> non-power-of-2 sizes; >> >> ??? #3: introduce support for full range of vector register sizes >> (128-/.../2048-bit with 128-bit step); >> >> I see 2 (mostly unrelated) limitations: maximum vector size and >> non-power-of-2 sizes. >> >> My understanding is that you don't try to accurately represent SVE for >> now, but lay some foundations for future work: you give up on >> non-power-of-2 sized vectors, but still enable support for arbitrarily >> sized vectors (addressing both limitations on maximum size and size >> granularity) in RA (and it affects only spills). So, it is somewhere >> between #2 and #3. >> >> The ultimate goal is definitely #3, but how much more work will be >> required to teach the JVM about non-power-of-2 vectors? As I see in the >> patch, you don't have auto-vectorizer support yet, but Vector API will >> provide access to whatever size hardware exposes. What do you expect on >> hardware front in the near/mid-term future? Anything supporting vectors >> larger than 512-bit? What about 384-bit vectors? >> > > I think our patch is now in 3. :-) We do not give up non-power-of-2 > sized vectors, instead we are supporting them well in this patch. And > are still using current regmask framework. (Actually, I think the only > limitation to the vector size is that it should be multiple of 32-bits - > bits per 1 reg slot.) 
> I am not sure about other Arm partners' hardware implementations in the > mid-term future, as it's free for cpu implementer to choose any max > vector sizes as long as it follows SVE architecture specification. But > we did tested the patch with Vector API on different SVE supported > vector sizes on emulator, e.g. 384, 768, 1024, 2048 etc. The register > allocator including the spill/unspill works well on those different > sizes with Vector API. (Thanks to your great work on Vector API. :-)) > > We currently limit the vector size to power-of-2 in > vm_version_aarch64.cpp, as suggested by Andrew Dinn, is because current > SLP vectorizer only supports power-of-2 vectors. With Vector API in, I > think such restriction can be removed. And we are also working on a new > vectorizer to support predication/mask, which should not have power-of-2 > limitation. [...] > Yes, we can make JVM to support portion of vectors, at least for SVE. My > concern is that the performance wouldn't be as good as the full > available vector width. To be clear: I called it "somewhere between #2 and #3" solely because auto-vectorizer bails out on non-power-of-2 sizes. And even though Vector API will work with such cases just fine, IMO having auto-vectorizer support is required before calling #3 complete. In that respect, choosing smaller vector size auto-vectorizer supports is preferrable to picking up the full-width vectors and turning off auto-vectorizer (even though Vector API will support them). It can be turned into heuristic (by default, pick only power-of-2 sizes; let users explicitly specify non-power-of-2 sizes), but speaking of priorities, IMO auto-vectorizer support is more important. >> Giving up on #3 for now and starting with less ambitious goals (#1 or >> #2) would reduce pressure on RA and give more time for additional >> experiments to come with a better and more universal >> support/representation of generic/size-agnostic vectors. And, in a >> longer term, help reducing complexity and technical debt in the area. >> >> Some more comments follow inline. >> >>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>> 4x larger in the worst case (ignoring predicate registers for now). >>>> Here are the relevant constants on x86: >>>> >>>> gensrc/adfiles/adGlobals_x86.hpp: >>>> >>>> // the number of reserved registers + machine registers. >>>> #define REG_COUNT??? 545 >>>> ... >>>> // Size of register-mask in ints >>>> #define RM_SIZE 22 >>>> >>>> My estimate is that for AArch64 with SVE support the constants will be: >>>> >>>> ??? REG_COUNT < 2500 >>>> ??? RM_SIZE < 100 >>>> >>>> which don't look too bad. >>>> >>> >>> Right, but given that most real hardware implementations will be no >>> larger than 512 bits, I think. Having a large bitmask array, with most >>> bits useless, will be less efficient for regmask computation. >> >> Does it make sense to limit the maximum supported size to 512-bit then >> (at least, initially)? In that case, the overhead won't be worse it is >> on x86 now. >> > > Technically, this may be possible though I haven't tried. My concerns are: > > 1) A larger regmask arrays would be less efficient (we only use 256 bits > - 8 slots for SVE in this patch), though won't be worse than x86. > > 2) Given that current patch already supports larger sizes and > non-power-of-2 sizes well with relative small size in diff, if we want > to support other sizes soon, there may be some more work to roll-back ad > file changes. 
> >>>> Also, I don't see any changes related to stack management. So, I >>>> assume it continues to be managed in slots. Any problems there? As I >>>> understand, wide SVE registers are caller-save, so there may be many >>>> spills of huge vectors around a call. (Probably, not possible with C2 >>>> auto-vectorizer as it is now, but Vector API will expose it.) >>>> >>> >>> Yes, the stack is still managed in slots, but it will be allocated with >>> real vector register length instead of 'virtual' slots for VecA. See the >>> usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also >>> applied the patch to vector api, and did find a lot of vector spills >>> with expected correct results. >> >> I'm curious whether similar problems may arise for spills. Considering >> wide vector registers are caller-saved, it's possible to have lots of >> 256-byte values to end up on stack (especially, with Vector API). Any >> concerns with that? >> > > No, we don't need to have such big (256-byte) slots for a smaller vector > register. The spill slots are the same size as of real vector length, > e.g. 48 bytes for 384-bit vector. Even for alignment, we currently > choose SlotsPerVecA (8 slots for 32 bytes, 256 bits) for alignment > (skipped slots can still be allocated to other args), which is still > smaller than AVX512 (64 bytes, 512 bits). We can tweak the patch to > choose other smaller value, if we think the alignment is too large. > (Yes, we should always try to avoid spills for wide vectors, especially > with Vector API, to avoid performance pitfalls.) Thanks for the clarifications. Any new problems/hitting some limitations envisioned when spilling large number of huge vectors (2048-bit) on stack? Best regards, Vladimir Ivanov >>>> Have you noticed any performance problems? If that's the case, then >>>> AVX512 support on x86 would benefit from similar optimization as well. >>>> >>> >>> Do you mean register allocation performance problems? I did not notice >>> that before. Do you have any suggestion on how to measure that? >> >> I'd try to run some applications/benchmarks with -XX:+CITime to get a >> sense how much RA may be affected. >> > > Thanks! I will give a try. > > [1] > https://urldefense.com/v3/__https://github.com/riscv/riscv-v-spec/releases/tag/0.9__;!!GqivPVa7Brio!IwFEx-c_8JDZcWgXPLcWp2ypX3pr1-IWTBfC7O7PHo7_0skMWtQa4fyWpo-lVor0NFv4Ivo$ > > Thanks, > Ningsheng > From joserz at linux.ibm.com Mon Aug 24 12:35:40 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Mon, 24 Aug 2020 09:35:40 -0300 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> Message-ID: <20200824123540.GA166438@pacoca> Hallo Martin! Just to understand. Do I need to do something else? Ask more reviewers? Thank you :) Jose On Fri, Aug 21, 2020 at 03:25:46PM +0000, Doerr, Martin wrote: > Hi Thomas, > > I understand your point. My concern is that it may become a more political discussion how to handle CSR for PPC64 flags and I don't want to delay Jose's change for that. There are already other changes in the pipe which build on top of it. > > It will probably be us to handle and approve CSR requests for platforms which are maintained by SAP. We haven't done this so far. We are still handling such flags in a less formal way. 
> I don't know how other non-Oracle platforms are handled. > > Best regards, > Martin > > > > -----Original Message----- > > From: Thomas Schatzl > > Sent: Freitag, 21. August 2020 17:12 > > To: Doerr, Martin ; joserz at linux.ibm.com > > Cc: hotspot-compiler-dev at openjdk.java.net > > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > and use new byte-reverse instructions > > > > Hi, > > > > On 21.08.20 17:06, Doerr, Martin wrote: > > > Hi Thomas, > > > > > > I agree with you in general. However, all PPC64 specific platform flags are > > "product" at the moment. > > > Most of them should probably be "diagnostic". We should fix that at some > > point of time. > > > But for now, I'm ok with Jose's webrev since it's consistent with the other > > PPC64 flags. > > > > > > > I was merely pointing out what the rule is, that has not been a veto > > for the patch (which I haven't reviewed btw). If you want to go ahead > > with that for consistency's sake, with a plan to fix this I can see your > > point of keeping it. > > > > Thanks, > > Thomas > > > > > Best regards, > > > Martin > > > > > > > > >> -----Original Message----- > > >> From: hotspot-compiler-dev > >> retn at openjdk.java.net> On Behalf Of Thomas Schatzl > > >> Sent: Freitag, 21. August 2020 15:45 > > >> To: joserz at linux.ibm.com > > >> Cc: hotspot-compiler-dev at openjdk.java.net > > >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > > system > > >> and use new byte-reverse instructions > > >> > > >> Hi, > > >> > > >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: > > >>> Hello! > > >>> > > >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > > >>>> Hi, > > >>>> > > >>>> On 21.08.20 04:33, Michihiro Horie wrote: > > >>>>> > > >>>>> Hi Jose, > > >>>>> > > >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > > >>>>> Otherwise, the change looks good! > > >>>>> > > >>>>> /* special instructions */ > > >>>>> \ > > >>>>> + product(bool, UseByteReverseInstructions, false, > > >>>>> \ > > >>>> > > >>>> Fwiw, for adding product options, you must go through the CSR > > process. > > >> Maybe > > >>>> there is an exception for platform specific ones? > > >>> > > >>> I didn't find any exception for platform specific options. But, > > >> "experimental" options > > >>> don't need such CSR process and, to be honest, experimental seems > > more > > >> appropriate here. > > >>> What do you think? > > >>> > > >>> Thank you for your review! :) > > >> > > >> Just a fly-by. It's up to you :) - just that product options need to be > > >> announced to the world. > > >> > > >> I kind of agree that experimental seems more appropriate. You can > > always > > >> "upgrade" it later. > > >> > > >> Thomas > From martin.doerr at sap.com Mon Aug 24 13:03:25 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 24 Aug 2020 13:03:25 +0000 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200824123540.GA166438@pacoca> References: <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> <20200824123540.GA166438@pacoca> Message-ID: Hi Jose, you already have 2 reviews by JDK Reviewers. The change needs to get the formal information including "Reviewed-by" and "Contributed-by" information added such that it passes jcheck. Then you only need a sponsor to push it. 
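For reference, the changeset comment would then look roughly like this (the reviewer names below are placeholders for the OpenJDK user names of the actual Reviewers):

    8248190: PPC: Enable Power10 system and use new byte-reverse instructions
    Reviewed-by: <reviewer1>, <reviewer2>
    Contributed-by: joserz at linux.ibm.com

jcheck will reject the push if these lines are missing or malformed.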
I guess Michihiro wants to do that for you? Best regards, Martin > -----Original Message----- > From: joserz at linux.ibm.com > Sent: Montag, 24. August 2020 14:36 > To: Doerr, Martin > Cc: Thomas Schatzl ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Hallo Martin! > > Just to understand. Do I need to do something else? Ask more reviewers? > > Thank you :) > > Jose > > On Fri, Aug 21, 2020 at 03:25:46PM +0000, Doerr, Martin wrote: > > Hi Thomas, > > > > I understand your point. My concern is that it may become a more political > discussion how to handle CSR for PPC64 flags and I don't want to delay Jose's > change for that. There are already other changes in the pipe which build on > top of it. > > > > It will probably be us to handle and approve CSR requests for platforms > which are maintained by SAP. We haven't done this so far. We are still > handling such flags in a less formal way. > > I don't know how other non-Oracle platforms are handled. > > > > Best regards, > > Martin > > > > > > > -----Original Message----- > > > From: Thomas Schatzl > > > Sent: Freitag, 21. August 2020 17:12 > > > To: Doerr, Martin ; joserz at linux.ibm.com > > > Cc: hotspot-compiler-dev at openjdk.java.net > > > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > system > > > and use new byte-reverse instructions > > > > > > Hi, > > > > > > On 21.08.20 17:06, Doerr, Martin wrote: > > > > Hi Thomas, > > > > > > > > I agree with you in general. However, all PPC64 specific platform flags > are > > > "product" at the moment. > > > > Most of them should probably be "diagnostic". We should fix that at > some > > > point of time. > > > > But for now, I'm ok with Jose's webrev since it's consistent with the > other > > > PPC64 flags. > > > > > > > > > > I was merely pointing out what the rule is, that has not been a veto > > > for the patch (which I haven't reviewed btw). If you want to go ahead > > > with that for consistency's sake, with a plan to fix this I can see your > > > point of keeping it. > > > > > > Thanks, > > > Thomas > > > > > > > Best regards, > > > > Martin > > > > > > > > > > > >> -----Original Message----- > > > >> From: hotspot-compiler-dev > > >> retn at openjdk.java.net> On Behalf Of Thomas Schatzl > > > >> Sent: Freitag, 21. August 2020 15:45 > > > >> To: joserz at linux.ibm.com > > > >> Cc: hotspot-compiler-dev at openjdk.java.net > > > >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > > > system > > > >> and use new byte-reverse instructions > > > >> > > > >> Hi, > > > >> > > > >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: > > > >>> Hello! > > > >>> > > > >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > > > >>>> Hi, > > > >>>> > > > >>>> On 21.08.20 04:33, Michihiro Horie wrote: > > > >>>>> > > > >>>>> Hi Jose, > > > >>>>> > > > >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > > > >>>>> Otherwise, the change looks good! > > > >>>>> > > > >>>>> /* special instructions */ > > > >>>>> \ > > > >>>>> + product(bool, UseByteReverseInstructions, false, > > > >>>>> \ > > > >>>> > > > >>>> Fwiw, for adding product options, you must go through the CSR > > > process. > > > >> Maybe > > > >>>> there is an exception for platform specific ones? > > > >>> > > > >>> I didn't find any exception for platform specific options. 
But, > > > >> "experimental" options > > > >>> don't need such CSR process and, to be honest, experimental seems > > > more > > > >> appropriate here. > > > >>> What do you think? > > > >>> > > > >>> Thank you for your review! :) > > > >> > > > >> Just a fly-by. It's up to you :) - just that product options need to be > > > >> announced to the world. > > > >> > > > >> I kind of agree that experimental seems more appropriate. You can > > > always > > > >> "upgrade" it later. > > > >> > > > >> Thomas > > From adinn at redhat.com Mon Aug 24 13:40:53 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 24 Aug 2020 14:40:53 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Message-ID: <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> On 24/08/2020 10:16, Ningsheng Jian wrote: > On 8/22/20 6:34 AM, Vladimir Ivanov wrote: >> The ultimate goal was to move to vectors which represent full-width >> hardware registers. After we were convinced that it will work well in AD >> files, we encountered some inefficiencies with vector spills: depending >> on actual hardware, smaller (than available) vectors may be used (e.g., >> integer computations on AVX-capable CPU). So, we stopped half-way and >> left post-matching part intact: depending on actual vector value width, >> appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. >> >> (I believe you may be in a similar situation on AArch64 with NEON vs SVE >> where both 128-bit and wide SVE vectors may be used at runtime.) Your problem here seems to be a worry about spilling more data than is actually needed. As Ningsheng pointed out the amount of data spilled is determined by the actual length of the VecA registers, not by the logical size of the VecA mask (256 bits) nor by the maximum possible size of a VecA register on future architectures (2048 bits). So, no more stack space will be used than is needed to preserve the live bits that need preserving. >> Unfortunately, it extends the implementation in orthogonal direction >> which looks too aarch64-specific to benefit other architectures and x86 >> particular. I believe there's an alternative approach which can benefit >> both aarch64 and x86, but it requires more experimentation. >> > > Since vecA and vecX (and others) are architecturally different vector > registers, I think it's quite natural that we just introduced the new > vector register type vecA, to represent what we need for corresponding > hardware vector register. Please note that in vector length agnostic > ISA, like Arm SVE and RISC-V vector extension [1], the vector registers > are architecturally the same type of register despite the different > hardware implementations. Yes, I also see this as quite natural. Ningsheng's change extends the implementation in the architecture-specific direction that is needed for AArch64's vector model. The fact that this differs from x86_64 is not unexpected. >> If I were to start from scratch, I would choose between 3 options: >> >> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported >> vector sizes to 128-/256-/512-bit values. >> >> ??? 
#2: lift limitation on max size (to 1024/2048 bits), but ignore >> non-power-of-2 sizes; >> >> ??? #3: introduce support for full range of vector register sizes >> (128-/.../2048-bit with 128-bit step); >> >> I see 2 (mostly unrelated) limitations: maximum vector size and >> non-power-of-2 sizes. Yes, but this patch deals with both of those and I cannot see it causing any problems for x86_64 nor do I see it adding any great complexity. The extra shard paths deal with scalable vectors wich onlu occur on AArch64. A scalable VecA register (and also eventually the scalable predicate register) caters for all possible vector sizes via a single 'logical' vector of size 8 slots (also eventually a single 'logical' predicate register of size 1 slot). Catering for scalable registers in shared code is localized and does not change handling of the existing, non-scalable VecX/Y/Z registers. >> My understanding is that you don't try to accurately represent SVE for >> now, but lay some foundations for future work: you give up on >> non-power-of-2 sized vectors, but still enable support for arbitrarily >> sized vectors (addressing both limitations on maximum size and size >> granularity) in RA (and it affects only spills). So, it is somewhere >> between #2 and #3. I have to disagree with your statement that this proposal doesn't 'accurately' represent SVE. Yes, the vector mask for this arbitrary-size vector is modelled 'logically' using a nominal 8 slots. However, that is merely to avoid wasting bits in the bit masks plus cpu time processing them. The 'physical' vector length models the actual number of slots, and includes the option to model a non-power of two. That 'physical' size is used in all operations that manipulate VecA register contents. So, although I grant that the code is /parameterized/, it is also 100% accurate. >> The ultimate goal is definitely #3, but how much more work will be >> required to teach the JVM about non-power-of-2 vectors? As I see in the >> patch, you don't have auto-vectorizer support yet, but Vector API will >> provide access to whatever size hardware exposes. What do you expect on >> hardware front in the near/mid-term future? Anything supporting vectors >> larger than 512-bit? What about 384-bit vectors? Do we need to know for sure such hardware is going to arrive in order to allow for it now? If there were a significant cost to doing so I'd maybe say yes but I don't really see one here. Most importantly, the changes to the AArch64 register model and small changes to the shared chaitin/reg mask code proposed here already work with the auto-vectorizer if the VecA slots are any of the possible powers of 2 VecA sizes. The extra work needed to profit from non-power-of-two vector involves upgrading the auto-vectorizer code. While this may be tricky I don't see ti as impossible. However, more importantly, even if such an upgrade cannot be achieved then this proposal is still a very simple way to allow for arbitrarily scalable SVE vectors that are a power of two size. It also allows any architecture with a non-power of two to work with the lowest power of two that fits. So, this is a very siple way to cater for what may turn up. >> For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My >> understanding that existing RA machinery should support 1024-bit vectors >> well. So, unless 2048-bit vectors are needed, we could live with the >> framework we have right now. 
I'm not sure what you are proposing here but it sounds like introducing extra vectors beyond VecX, VecY for larger powers of two i.e. VecZ, vecZZ, VecZZZ ... and providing separate case processing for each of them where the relevant case is selected conditional on the actual vector size. Is that what you are proposing? I can't see any virtue in multiplying case handling fore ach new power-of-two size that turns up when all possible VecZ* power-of-two options can actually be handled as one uniform case. >> If hardware has non-power-of-2 vectors, but JVM doesn't support them, >> then JVM can work with just power-of-2 portion of them (384-bit => >> 256-bit). And, of course, the previous comment applies here /a fortiori/. >> Giving up on #3 for now and starting with less ambitious goals (#1 or >> #2) would reduce pressure on RA and give more time for additional >> experiments to come with a better and more universal >> support/representation of generic/size-agnostic vectors. And, in a >> longer term, help reducing complexity and technical debt in the area. Can you explain what you mean by 'reduce pressure on RA'? I'm also unclear as to what you see as complex about this proposal. >> Some more comments follow inline. >> >>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>> 4x larger in the worst case (ignoring predicate registers for now). >>>> Here are the relevant constants on x86: >>>> >>>> gensrc/adfiles/adGlobals_x86.hpp: >>>> >>>> // the number of reserved registers + machine registers. >>>> #define REG_COUNT??? 545 >>>> ... >>>> // Size of register-mask in ints >>>> #define RM_SIZE 22 >>>> >>>> My estimate is that for AArch64 with SVE support the constants will be: >>>> >>>> ??? REG_COUNT < 2500 >>>> ??? RM_SIZE < 100 >>>> >>>> which don't look too bad. I'm not sure what these numbers are meant to mean. The number of SVE vector registers is the same as the number of NEON vector registers i.e. 32. The register mask size for VecA registers is 8 * 32 bits. >>> Right, but given that most real hardware implementations will be no >>> larger than 512 bits, I think. Having a large bitmask array, with most >>> bits useless, will be less efficient for regmask computation. >> >> Does it make sense to limit the maximum supported size to 512-bit then >> (at least, initially)? In that case, the overhead won't be worse it is >> on x86 now. Well, no. It doesn't make sense when all you need is a 'logical' 8 * 32 bit mask whatever the actual 'physical' register size is. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From erik.osterlund at oracle.com Mon Aug 24 15:26:30 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 24 Aug 2020 17:26:30 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> Message-ID: Hi Ningsheng, On 2020-08-24 11:59, Ningsheng Jian wrote: > Hi Erik, > > Thanks for the review! 
> > On 8/22/20 12:21 AM, Erik ?sterlund wrote: >> Hi, >> >> Have you tried this with ZGC on AArch64? It has custom code for saving >> live registers in the load barrier slow path. >> I can't see any code changes there, so assuming this will just crash >> instead. >> The relevant code is in ZBarrierSetAssembler on aarch64. >> >> Maybe I missed something? >> > > I didn't add ZGC option while running tests. I think I need to update > push_fp() which is called by ZSaveLiveRegisters. But do we need to get > size info (float/neon/sve) instead of saving the whole vector > register? Currently, it just simply saves the whole NEON register. What we found on x86_64 was that there was a significant cost in saving vector registers in load barriers. That is why we perform some analysis so that only the exact registers that are affected, and only the parts of the registers that are affected, get spilled. It actually mattered. It will of course work either way, but that was our observation on x86_64. But I am okay with that being deferred to a separate RFE. I just wanted to make sure that it at the very least works with the new code, for a start, so it doesn't start crashing. > And in ZBarrierSetAssembler::load_at(), before calling to runtime > code, we call push_call_clobbered_registers_except(), which just saves > floating point registers instead of the whole NEON vector registers. > Similar behavior in x86 implementation. Is that correct (not saving > vectors)? Yes. The call contexts are: 1) Interpreter. Does not use vector registers. 2) Method handle intrinsic. Uses only floats that are part of the Java calling convention, rest is garbage. No vectors here. 3) Checkcast arraycopy. Does not use vectors. Thanks, /Erik > Thanks, > Ningsheng From aph at redhat.com Mon Aug 24 17:31:57 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 24 Aug 2020 18:31:57 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> Message-ID: <977c8d9b-a9a0-4412-8d1b-0ca6bb5db558@redhat.com> On 23/08/2020 19:20, Boris Ulasevich wrote: > With the current change all the transformation logic is moved out of > aarch64.ad file into the common C2 code. > > http://bugs.openjdk.java.net/browse/JDK-8249893 > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 > > The change in compiler.cpp was done to implicitly ask IGVN to run > the idealization once again after the loop optimization phase. > This extra step is necessary to make the BFI transform happen > only after loop optimization. This looks rather nice. How did you test it? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From igor.ignatyev at oracle.com Mon Aug 24 20:24:22 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 24 Aug 2020 13:24:22 -0700 Subject: RFR(S) : 8252186 : remove FileInstaller action from vmTestbase/jit/graph tests In-Reply-To: <7488a613-f5ad-acc8-edc1-677d4511216a@oracle.com> References: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> <7488a613-f5ad-acc8-edc1-677d4511216a@oracle.com> Message-ID: <6CDBB24F-A155-4C93-A85D-82D8D2E58DCD@oracle.com> thanks Vladimir, pushed. 
-- Igor > On Aug 22, 2020, at 10:55 AM, Vladimir Kozlov wrote: > > LGTM > > Thanks, > Vladimir K > > On 8/21/20 10:23 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ >>> 24 lines changed: 0 ins; 12 del; 12 mod; >> Hi all, >> could you please review this small cleanup of vmTestbase/jit/graph tests? >> from JBS: >>> vmTestbase/jit/graph tests use FileInstaller to copy ${test.src}/data/main.data to the current directory, and pass the path to it as '-path' option to jit.graph.CGT class. since JDK-8252005 enabled jtreg smart action args, we can use ${test.src} right in the argument and avoid copying. >> testing: :vmTestbase_vm_compiler >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252186 >> webrev: http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ >> Thanks, >> -- Igor From dmitry.chuyko at bell-sw.com Mon Aug 24 21:52:06 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 25 Aug 2020 00:52:06 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: Hi Andrew, I added two more intrinsics -- for copySign, they are controlled by UseCopySignIntrinsic flag. webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/ It also contains 'benchmarks' directory: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/benchmarks/ There are 8 benchmarks there: (double | float) x (blackhole | reduce) x (current j.l.Math.signum | abs()>0 check). My results on Arm are in signum-facgt-copysign.ods. Main case is 'random' which is actually a random from positive and negative numbers between -0.5 and +0.5. Basically we have ~14% improvement in 'reduce' benchmark variant but ~20% regression in 'blackhole' variant in case of only copySign() intrinsified. Same picture if abs()>0 check is used in signum() (+-5%). This variant is included as it shows very good results on x86. Intrinsic for signum() gives improvement of main case in both 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a noticeable difference. -Dmitry On 8/19/20 11:35 AM, Andrew Haley wrote: > On 18/08/2020 16:05, Dmitry Chuyko wrote: >> Some more results for a benchmark with reduce(): >> >> -XX:-UseSignumIntrinsic >> DoubleOrigSignum.ofMostlyNaN 0.914 ? 0.001 ns/op >> DoubleOrigSignum.ofMostlyNeg 1.178 ? 0.001 ns/op >> DoubleOrigSignum.ofMostlyPos 1.176 ? 0.017 ns/op >> DoubleOrigSignum.ofMostlyZero 0.803 ? 0.001 ns/op >> DoubleOrigSignum.ofRandom 1.175 ? 0.012 ns/op >> -XX:+UseSignumIntrinsic >> DoubleOrigSignum.ofMostlyNaN 1.040 ? 0.007 ns/op >> DoubleOrigSignum.ofMostlyNeg 1.040 ? 0.004 ns/op >> DoubleOrigSignum.ofMostlyPos 1.039 ? 0.003 ns/op >> DoubleOrigSignum.ofMostlyZero 1.040 ? 0.001 ns/op >> DoubleOrigSignum.ofRandom 1.040 ? 0.003 ns/op > That's almost no difference, isn't it? Down in the noise. > >> If we only intrinsify copySign() we lose free mask that we get from >> facgt. In such case improvement (for signum) decreases like from ~30% to >> ~15%, and it also greatly depends on the particular HW. We can >> additionally introduce an intrinsic for Math.copySign(), especially it >> makes sense for float where it can be just 2 fp instructions: movi+bsl >> (fmovd+fnegd+bsl for double). > I think this is worth doing, because moves between GPRs and vector regs > tend to have a long latency. 
Can you please add that, and we can all try > it on our various hardware. > > We're measuring two different things, throughput and latency. The > first JMH test you provided was really testing latency, because > Blackhole waits for everything to complete. > > [ Note to self: Blackhole.consume() seems to be particularly slow on > some AArch64 implementations because it uses a volatile read. What > seems to be happening, judging by how long it takes, is that the store > buffer is drained before the volatile read. Maybe some other construct > would work better but still provide the guarantees Blackhole.consume() > needs. ] > > For throughput we want to keep everything moving. Sure, sometimes we > are going to have to wait for some calculation to complete, so if we > can improve latency without adverse cost we should. For that, staying > in the vector regs helps. > From cjashfor at linux.ibm.com Tue Aug 25 01:21:59 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Mon, 24 Aug 2020 18:21:59 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> Here's a revised webrev which includes a JMH benchmark for the decode operation. http://cr.openjdk.java.net/~mhorie/8248188/webrev.03/ The added benchmark tries to be "fair" in that it doesn't prefer a large buffer size, which would favor the intrinsic. It pseudo-randomly (but reproducibly) chooses a buffer size between 8 and 20k+8 bytes, and fills it with random data to encode and decode. As part of the TearDown of an invocation, it also checks the decoded output data for correctness. Example runs on the Power9-based machine I use for development shows a 3X average improvement across these random buffer sizes. Here's an excerpt of the output when run with -XX:-UseBASE64Intrinsics : Iteration 1: 70795.623 ops/s Iteration 2: 71070.607 ops/s Iteration 3: 70867.544 ops/s Iteration 4: 71107.992 ops/s Iteration 5: 71048.281 ops/s And here's the output with the intrinsic enabled: Iteration 1: 208794.022 ops/s Iteration 2: 208630.904 ops/s Iteration 3: 208238.822 ops/s Iteration 4: 208714.967 ops/s Iteration 5: 209060.894 ops/s Taking the best of the two runs: 209060/71048 = 2.94 From other experiments where the benchmark uses a fixed-size, larger buffer, the performance ratio rises to about 4.0. Power10 should have a slightly higher ratio due to several factors, but I have not yet benchmarked on Power10. Other arches ought to be able to do at least this well, if not better, because of wider vector registers (> 128 bits) being available. Only a Power9/10 implementation is included in this webrev, however. Regards, - Corey On 8/19/20 11:20 AM, Roger Riggs wrote: > Hi Corey, > > For changes obviously performance motivated, it is conventional to run a > JMH perf test to demonstate > the improvement and prove it is worthwhile to add code complexity. > > I don't see any existing Base64 JMH tests but they would be in the repo > below or near: > ??? test/micro/org/openjdk/bench/java/util/ > > Please contribute a JMH test and results to show the difference. > > Regards, Roger > > > > On 8/19/20 2:10 PM, Corey Ashford wrote: >> Michihiro Horie posted up a new iteration of this webrev for me.? This >> time the webrev includes a complete implementation of the intrinsic >> for Power9 and Power10. 
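For reference, a decode benchmark of the shape described at the top of this mail looks roughly like the sketch below. Names, the seed and the exact structure are illustrative only; the actual benchmark is the one in webrev.03.

    import java.util.Arrays;
    import java.util.Base64;
    import java.util.Random;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    public class Base64DecodeBench {
        private final Random rnd = new Random(42);  // fixed seed: reproducible sizes and contents
        private byte[] plain;
        private byte[] encoded;
        private byte[] decoded;

        @Setup(Level.Invocation)
        public void newBuffer() {
            int len = 8 + rnd.nextInt(20 * 1024);   // 8 .. 20K+8 bytes
            plain = new byte[len];
            rnd.nextBytes(plain);
            encoded = Base64.getEncoder().encode(plain);
        }

        @Benchmark
        public byte[] decode() {
            decoded = Base64.getDecoder().decode(encoded);
            return decoded;
        }

        @TearDown(Level.Invocation)
        public void verify() {
            if (!Arrays.equals(decoded, plain)) {
                throw new AssertionError("decoded output does not match the original data");
            }
        }
    }

The Level.Invocation setup/teardown keep the buffer regeneration and the correctness check out of the measured decode() call itself.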
>> >> You can find it here: >> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >> >> Changes in webrev.02 vs. webrev.01: >> >> ? * The method header for the intrinsic in the Base64 code has been >> rewritten using the Javadoc style.? The clarity of the comments has >> been improved and some verbosity has been removed. There are no >> additional functional changes to Base64.java. >> >> ? * The code needed to martial and check the intrinsic parameters has >> been added, using the base64 encodeBlock intrinsic as a guideline. >> >> ? * A complete intrinsic implementation for Power9 and Power10 is >> included. >> >> ? * Adds some Power9 and Power10 assembler instructions needed by the >> intrinsic which hadn't been defined before. >> >> The intrinsic implementation in this patch accelerates the decoding of >> large blocks of base64 data by a factor of about 3.5X on Power9. >> >> I'm attaching two Java test cases I am using for testing and >> benchmarking.? The TestBase64_VB encodes and decodes randomly-sized >> buffers of random data and checks that original data matches the >> encoded-then-decoded data.? TestBase64Errors encodes a 48K block of >> random bytes, then corrupts each byte of the encoded data, one at a >> time, checking to see if the decoder catches the illegal byte. >> >> Any comments/suggestions would be appreciated. >> >> Thanks, >> >> - Corey >> >> On 7/27/20 6:49 PM, Corey Ashford wrote: >>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>> intrinsic API for me: >>> >>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>> >>> It has the following changes with respect to the original one posted: >>> >>> ??* In the event of encountering a non-base64 character, instead of >>> having a separate error code of -1, the intrinsic can now just return >>> either 0, or the number of data bytes produced up to the point where >>> the illegal base64 character was encountered. This reduces the number >>> of special cases, and also provides a way to speed up the process of >>> finding the bad character by the slower, pure-Java algorithm. >>> >>> ??* The isMIME boolean is removed from the API for two reasons: >>> ??? - The current API is not sufficient to handle the isMIME case, >>> because there isn't a strict relationship between the number of input >>> bytes and the number of output bytes, because there can be an >>> arbitrary number of non-base64 characters in the source. >>> ??? - If an intrinsic only implements the (isMIME == false) case as >>> ours does, it will always return 0 bytes processed, which will >>> slightly slow down the normal path of processing an (isMIME == true) >>> instantiation. >>> ??? - We considered adding a separate hotspot candidate for the >>> (isMIME == true) case, but since we don't have an intrinsic >>> implementation to test that, we decided to leave it as a future >>> optimization. >>> >>> Comments and suggestions are welcome.? Thanks for your consideration. >>> >>> - Corey >>> >>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>> Hi Corey, >>>> >>>> Following is the issue I created. >>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>> >>>> I will upload a webrev when you're ready as we talked in private. 
>>>> >>>> Best regards, >>>> Michihiro >>>> >>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>> 09:40:10---Currently in java.util.Base64, there is a >>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently >>>> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API >>>> for encodeBlock, but no >>>> >>>> From: "Corey Ashford" >>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>> , >>>> "ppc-aix-port-dev at openjdk.java.net" >>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori Ogata/Japan/IBM at IBMJP, >>>> joserz at br.ibm.com >>>> Date: 2020/06/24 09:40 >>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>> Base64 decoding >>>> >>>> ------------------------------------------------------------------------ >>>> >>>> >>>> >>>> >>>> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and >>>> API for encodeBlock, but none for decoding. ?This means that only >>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>> >>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>> considerations I have for this new intrinsic's API: >>>> >>>> ??* Don't make any assumptions about the underlying capability of the >>>> hardware. ?For example, do not impose any specific block size >>>> granularity. >>>> >>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>> modes, but also let them decide if they will process the data >>>> regardless >>>> of the settings of the two booleans. >>>> >>>> ??* Any remaining data that is not processed by the intrinsic will be >>>> processed by the pure Java implementation. ?This allows the >>>> intrinsic to >>>> process whatever block sizes it's good at without the complexity of >>>> handling the end fragments. >>>> >>>> ??* If any illegal character is discovered in the decoding process, the >>>> intrinsic will simply return -1, instead of requiring it to throw a >>>> proper exception from the context of the intrinsic. ?In the event of >>>> getting a -1 returned from the intrinsic, the Java Base64 library code >>>> simply calls the pure Java implementation to have it find the error and >>>> properly throw an exception. ?This is a performance trade-off in the >>>> case of an error (which I expect to be very rare). >>>> >>>> ??* One thought I have for a further optimization (not implemented in >>>> the current patch), is that when the intrinsic decides not to process a >>>> block because of some combination of isURL and isMIME settings it >>>> doesn't handle, it could return extra bits in the return code, encoded >>>> as a negative number. ?For example: >>>> >>>> Illegal_Base64_char ? = 0b001; >>>> isMIME_unsupported ? ?= 0b010; >>>> isURL_unsupported ? ? = 0b100; >>>> >>>> These can be OR'd together as needed and then negated (flip the sign). >>>> The Base64 library code could then cache these flags, so it will know >>>> not to call the intrinsic again when another decodeBlock is requested >>>> but with an unsupported mode. ?This will save the performance hit of >>>> calling the intrinsic when it is guaranteed to fail. >>>> >>>> I've tested the attached patch with an actual intrinsic coded up for >>>> Power9/Power10, but those runtime intrinsics and arch-specific patches >>>> aren't attached today. ?I want to get some consensus on the >>>> library-level intrinsic API first. >>>> >>>> Also attached is a simple test case to test that the new intrinsic API >>>> doesn't break anything. >>>> >>>> I'm open to any comments about this. 
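As a sketch of how the library side could consume such a negative return value (purely illustrative - this optimization is not part of the attached patch, and decodeBlock() below is only a stand-in for the intrinsic candidate):

    class DecodeDispatchSketch {
        static final int ILLEGAL_BASE64_CHAR = 0b001;
        static final int ISMIME_UNSUPPORTED  = 0b010;
        static final int ISURL_UNSUPPORTED   = 0b100;

        private int unsupportedModes = 0;   // cached across calls

        // stand-in for the HotSpotIntrinsicCandidate method: returns bytes
        // produced, or the negated flag set when it declines to process
        private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp,
                                boolean isURL, boolean isMIME) {
            return isMIME ? -ISMIME_UNSUPPORTED : 0;
        }

        int tryDecodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp,
                           boolean isURL, boolean isMIME) {
            if ((isMIME && (unsupportedModes & ISMIME_UNSUPPORTED) != 0)
                    || (isURL && (unsupportedModes & ISURL_UNSUPPORTED) != 0)) {
                return 0;               // known-unsupported mode: skip the intrinsic call
            }
            int r = decodeBlock(src, sp, sl, dst, dp, isURL, isMIME);
            if (r < 0) {
                // note: ILLEGAL_BASE64_CHAR is not cached, it depends on the input, not the mode
                unsupportedModes |= (-r) & (ISMIME_UNSUPPORTED | ISURL_UNSUPPORTED);
                return 0;               // 0 bytes consumed; the pure-Java path finds and reports any bad char
            }
            return r;                   // bytes produced by the intrinsic so far
        }
    }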
>>>> >>>> Thanks for your consideration, >>>> >>>> - Corey >>>> >>>> >>>> Corey Ashford >>>> IBM Systems, Linux Technology Center, OpenJDK team >>>> cjashfor at us dot ibm dot com >>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro >>>> Horie/Japan/IBM] >>>> >>>> >>> >> > From john.r.rose at oracle.com Tue Aug 25 05:23:27 2020 From: john.r.rose at oracle.com (John Rose) Date: Mon, 24 Aug 2020 22:23:27 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: On Aug 21, 2020, at 12:43 AM, Tobias Hartmann wrote: > > For the record, I've tested tier1-9 with "default" flags and tier1-5 with > -XX:StressLongCountedLoop=1 and -XX:StressLongCountedLoop=4294967295. > > Please let me know if you think other flag combinations/values should be tested as well. Those settings force iters_limit (normally 2^31-2) to be either preserved at 2^31-2 or reset to 0, respectively. The latter value is not very useful, since the transform will bail out for trip counts of 1 or 0. I suggest aiming for StressLongCountedLoop values which get inner loop trip counts that are a balance between two concerns: (a) large enough so that the inner loop makes a non-trivial number of trips, and (b) small enough so the *outer* loop makes a non-trivial number of trips. Concern (a) lets us to exercise further optimizations on the inner loop such as unrolling, peeling, and RCE. Concern (b) helps us be sure that back edge of the outer loop performs the right register moves, even if the inner loop is very complex and has many exit points. If we don?t worry about (a) we could mask bugs in the transformed inner loop (unlikely, but possible). If we don?t worry about (b) we could be ignorant about what happens when the outer loop runs the second time (or third, after peeling). For (a) we want an iters_limit on the order of 100 or more, while for (b) we want an iters_limit large enough that many tests (each loop of which has its own characteristic trip count) will run the outer loop three or more times. Tests which intentionally warm up loops go for a *cumulative* trip count of 20,000 or so, but the individual trip counts can vary widely. As a wild guess, I?ll say that many tests will run 100 or more times, which means we want an iter_limit of 300 or more. To derive a StressLongCountedLoop parameter X from a desired iter_limit, ensure that floor((2^31-2)/X) is close to the target iter_limit. So, I recommend a value of StressLongCountedLoop which is at most 21400000 (for an iters_limit of at least 100), and another which is at least 7150000 (for an iters_limit of at most 300). 
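For concreteness, since iters_limit here is just floor((2^31-2)/X), a throwaway snippet like the following (purely illustrative) reproduces those numbers:

    public class IterLimitCheck {
        public static void main(String[] args) {
            long max = (1L << 31) - 2;   // 2147483646, the normal iters_limit
            long[] xs = {1L, 4294967295L, 21_400_000, 7_150_000};
            for (long x : xs) {
                // long division truncates toward zero, i.e. floor() for these values
                System.out.println("X=" + x + " -> iters_limit=" + (max / x));
            }
            // prints 2147483646, 0, 100 and 300 respectively
        }
    }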
Putting these together, and choosing a round number which prioritizes concern (b) by moving closer to the limit of (a), if I had one more run to do I?d choose -XX: StressLongCountedLoop=20000000. If I were to do multiple runs, I might choose vary that stress parameter by adding and subtracting a couple of zeroes: -XX: StressLongCountedLoop=200000 -XX: StressLongCountedLoop=2000000 -XX: StressLongCountedLoop=20000000 -XX: StressLongCountedLoop=200000000 -XX: StressLongCountedLoop=2000000000 If any of those runs kicks out a bug or other suspicious behavior, it should be added to a permanent test list. Separately from those issues, we know that the stress mode converts 32-bit loops into 64-bit loops, which then re-nest using the new logic. But, are we confident that this re-nesting works? Roland did some manual testing to make sure the test works as intended, but it would be good to run the above stress tests with some sort of logging that ensures that there are at least ?lots and lots? of successful 32-to-64 loop conversions. If those loop conversions fail (staying at 64 bits) the tests will pass, but they won?t be testing what we need to be testing. HTH ? John > Best regards, > Tobias > > On 20.08.20 17:34, Roland Westrelin wrote: >> >>> Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. >> >> Thanks for the review and testing! >> >> Roland. >> From yueshi.zwj at alibaba-inc.com Tue Aug 25 06:03:10 2020 From: yueshi.zwj at alibaba-inc.com (Joshua Zhu) Date: Tue, 25 Aug 2020 14:03:10 +0800 Subject: RFR: 8252259: AArch64: Adjust default value of FLOATPRESSURE Message-ID: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> Hi, I have a small patch that will decrease the default value from 64 into 32 for aarch64's FLOATPRESSURE, which represents float LRG's number that constitutes high register pressure. With the proper value setting, in low register pressure (LRP) region, C2 can avoid unnecessary spilling and directly use register. I wrote a simple case that is able to reflect the effect of new value. http://cr.openjdk.java.net/~jzhu/8252259/Test.java For this case, with new FLOATPRESSURE value, only one iteration of iterative graph-coloring RA was required. The DefinitionSpillCopyNode was generated directly when crossing HRP boundary in Split phase [1]. And only one MemToRegSpillCopyNode in HRP region was generated at USE site. The dump of Split cycles and OptoAssembly is: http://cr.openjdk.java.net/~jzhu/8252259/frp_32.log For the same case, with current FLOATPRESSURE, the whole method was identified as LRP region. In the first iteration of graph-coloring, LRG was identified as spilled. In the second iteration, DefinitionSpillCopyNode was generated [2] and there were three MemToRegSpillCopy nodes were produced at each USE site. See dump: http://cr.openjdk.java.net/~jzhu/8252259/frp_64.log with the old FLOATPRSSURE. Therefore I propose the default value of FLOATPRESSURE be 32 because there are 32 float/SIMD registers on aarch64 and also the value of register pressure is the same as 1 for each LRG of Op_RegL/Op_RegD/Op_Vec. [3] Could you please help review this change? 
JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ [1] https://hg.openjdk.java.net/jdk/jdk/file/332b3a2eb4cc/src/hotspot/share/opto /reg_split.cpp#l855 [2] https://hg.openjdk.java.net/jdk/jdk/file/332b3a2eb4cc/src/hotspot/share/opto /reg_split.cpp#l1198 [3] https://hg.openjdk.java.net/jdk/jdk/file/332b3a2eb4cc/src/hotspot/share/opto /chaitin.cpp#l926 Best Regards, Joshua From boris.ulasevich at bell-sw.com Tue Aug 25 06:40:57 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 09:40:57 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <977c8d9b-a9a0-4412-8d1b-0ca6bb5db558@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <977c8d9b-a9a0-4412-8d1b-0ca6bb5db558@redhat.com> Message-ID: Hi Andrew, > This looks rather nice. Thank you! > How did you test it? I have run JCK and JTREG tests on arm and intel platforms (the transformation works in many places: StringUTF16, BigInteger, ZipUtils, etc). I checked that benchmark [1] shows positive results on both single call and vectorized case (adding the benchmarking code in a simple cycle). I checked with +PrintAssembly that expressions are generated as expected: ((v1 & 0xFF) << 24) | ((v2 & 0xFF) << 16) | ((v3 & 0xFF) << 8) | (v4 & 0xFF) I ran the generated brute force tests [2] that checks all possible mask/shift combinations for int/long types: (value1 & mask1) | ((value1 & mask2) << shift) thanks, Boris [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01/Benchmark.java [2] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00/Gen.java On 24.08.2020 20:31, Andrew Haley wrote: > On 23/08/2020 19:20, Boris Ulasevich wrote: >> With the current change all the transformation logic is moved out of >> aarch64.ad file into the common C2 code. >> >> http://bugs.openjdk.java.net/browse/JDK-8249893 >> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 >> >> The change in compiler.cpp was done to implicitly ask IGVN to run >> the idealization once again after the loop optimization phase. >> This extra step is necessary to make the BFI transform happen >> only after loop optimization. > > This looks rather nice. How did you test it? > From shade at redhat.com Tue Aug 25 07:08:00 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Aug 2020 09:08:00 +0200 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag Message-ID: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8252215 VerifyOptoOopOffsets flag does not seem to be used (no tests in the current test base), and it does not seem to work reliably (see JDK-4834891). It might be a good time to remove it. JDK-4834891 evaluation says: "The flag VerifyOptoOopOffsets has not been valid since the introduction of sun/misc/Unsafe and the flag should not be used for general testing." How about we remove it? https://cr.openjdk.java.net/~shade/8252215/webrev.01/ Testing: tier1 (locally); jdk-submit (still running?) 
-- Thanks, -Aleksey From christian.hagedorn at oracle.com Tue Aug 25 07:25:51 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 25 Aug 2020 09:25:51 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <87d03k10ss.fsf@redhat.com> References: <87d03k10ss.fsf@redhat.com> Message-ID: <2334c98c-48da-2fc1-a98d-9e9b983c7500@oracle.com> Hi Roland On 21.08.20 17:22, Roland Westrelin wrote: > > Hi Christian, > >> We have two options to fix this. We could either update the wrong >> control inputs from 876 IfFalse during the creation/merging of >> pre/main/post loops or directly fix it inside >> split_if_with_blocks_post(). I think it is makes more sense and is also >> easier to directly fix it in split_if_with_blocks_post() where we could >> be less pessimistic when pinning loads. >> >> The fix now checks if late_load_ctrl is a loop exit of a loop that has >> an outer strip mined loop and if it dominates x_ctrl. If that is the >> case, we use the outer loop exit control instead. This also means that >> the loads can completely float out of the outer strip mined loop. >> Applying that to the testcase, we get [3] instead of [2]. LoadS 901 and >> 902 are both at the outer strip mined loop exit while 903 LoadS is still >> at the inner loop due to 575 StoreI (x_ctrl is 876 IfFalse and dominates >> the outer strip mined loop exit). The process of creating pre/main/post >> loops will then take care of these control inputs of the LoadSNodes and >> rewires them to the newly created RegionNode such that the dominator >> information is correct again. > > I agree that fixing it in split_if_with_blocks_post() is the right thing > to do. > > The load has no edges to the safepoint in the outer strip mined loop so > why is it in the loop in the first place then? If java code has a load > in a loop that's live outside the loop then it should be live at the > safepoint on loop exit. Is anti dependence analysis too conservative? I maybe should have shared another image of the graph before the LoadS clones 901-903 are created. The original 572 LoadS (see [1]) is an input into 575 StoreI which is an input of 578 MergeMem which goes into the 881 SafePoint in the outer strip mined loop. The other two uses (897 Phi and 893 Phi) are uses outside of the outer strip mined loop. > Also why does get_late_ctrl(n, n_ctrl) return a control inside the outer > strip mined loop? And why is it safe to bypass that result? Due to 575 StoreI being needed inside the outer strip mined loop, get_late_ctrl() of 572 LoadS also returns the inner loop exit 876 IfFalse. My thinking was that since we now clone 572 LoadS and create a new LoadS for each use, then we don't need to pin the LoadS going into Phi 893 and 897 to 876 IfFalse, too, if x_ctrl is outside the outer strip mined loop but to the outer strip mined loop exit. But now thinking about it, do we need another get_late_ctrl(x, late_load_ctrl) for each clone and check if they can really be put outside of the strip mined loop instead of just checking dominance with x_ctrl (which is based on get_ctrl(u) of a use of the load)? In get_late_ctrl() we do consider anti dependencies. 
Maybe something like this (change on L1473): http://cr.openjdk.java.net/~chagedorn/8249607/webrev.01/ Best regards, Christian [1] https://bugs.openjdk.java.net/secure/attachment/89947/before_cloning_LoadS.png From shade at redhat.com Tue Aug 25 07:29:16 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Aug 2020 09:29:16 +0200 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator Message-ID: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Small cleanup: https://bugs.openjdk.java.net/browse/JDK-8252290 Static code inspection complains the enum below is unused. diff -r 13fdf97f0a8f src/hotspot/share/opto/callGenerator.hpp --- a/src/hotspot/share/opto/callGenerator.hpp Mon Aug 24 09:35:23 2020 +0200 +++ b/src/hotspot/share/opto/callGenerator.hpp Tue Aug 25 09:27:45 2020 +0200 @@ -37,9 +37,4 @@ class CallGenerator : public ResourceObj { - public: - enum { - xxxunusedxxx - }; - private: ciMethod* _method; // The method being called. Testing: grepping for "xxxunusedxxx", local builds -- Thanks, -Aleksey From shade at redhat.com Tue Aug 25 07:34:40 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Aug 2020 09:34:40 +0200 Subject: RFR (XS) 8252291: C2: Assignment in conditional in loopUnswitch.cpp Message-ID: Cleanup: https://bugs.openjdk.java.net/browse/JDK-8252291 Static code analysis complains there is the assignment in the conditional here. I believe the assignment should be explicit here. Code was introduced with JDK-8136725. diff -r 31de2a59348a src/hotspot/share/opto/loopUnswitch.cpp --- a/src/hotspot/share/opto/loopUnswitch.cpp Tue Aug 25 09:27:04 2020 +0200 +++ b/src/hotspot/share/opto/loopUnswitch.cpp Tue Aug 25 09:29:23 2020 +0200 @@ -442,7 +442,8 @@ if (iff->in(1)->Opcode() != Op_ConI) { return false; } - return _has_reserved = true; + _has_reserved = true; + return true; } Testing: local builds -- Thanks, -Aleksey From aph at redhat.com Tue Aug 25 08:10:19 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 09:10:19 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> Message-ID: <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> Hi, On 23/08/2020 19:20, Boris Ulasevich wrote: > > Please review the updated change to C2 and AArch64 which introduces > a new BitfieldInsert node to replace Or+Shift+And sequence when possible. > Single BFI instruction is emitted for the new node. > > With the current change all the transformation logic is moved out of > aarch64.ad file into the common C2 code. > > http://bugs.openjdk.java.net/browse/JDK-8249893 > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 > > The change in compiler.cpp was done to implicitly ask IGVN to run > the idealization once again after the loop optimization phase. > This extra step is necessary to make the BFI transform happen > only after loop optimization. So here's a strange thing. 
When I run a simple JMH test @State(Scope.Benchmark) public static class Result { public int a, b; public long x; } @Benchmark public static int bfm(Result r) { return (r.a & 0xFF) | ((r.b & 0xFF) << 8); } I get 0x0000ffff84644df0: ubfiz w12, w11, #8, #8 0x0000ffff84644df4: and w10, w10, #0xff 0x0000ffff84644df8: orr w2, w10, w12 ;*ior {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.Rotates::bfm at 19 (line 22) ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) instead of 0x0000ffff808554b4: and w10, w10, #0xff 0x0000ffff808554b8: and w12, w12, #0xff 0x0000ffff808554bc: orr w2, w12, w10, lsl #8 ;*ior ; - org.openjdk.Rotates::bfm at 19 (line 22) ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) Do you have any ideas why this might be? Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Tue Aug 25 08:23:19 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 10:23:19 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed Message-ID: <87wo1n6snc.fsf@redhat.com> https://bugs.openjdk.java.net/browse/JDK-8252292 http://cr.openjdk.java.net/~roland/8252292/webrev.00/ In 8240795, I modified alias analysis so non escaping allocations don't alias with bottom memory. While browsing that code last week, I noticed that that change didn't seem quite right and may cause some anti-dependences to be missed. I could indeed write a test case that fails with an incorrect execution. In the test case: the dst[9] load after the ArrayCopy is transformed into a src[9] load before the ArrayCopy. Anti dependence analysis find src[9] shares the memory of the ArrayCopy but because of the way I tweaked the code with 8240795, anti-dependence analysis finds the src[9] and ArrayCopy don't alias so src[9] can sink out of the loop which is wrong because of the src[9] store. Anti-dependence analysis in that case would need to look at the memory uses of ArrayCopy too. Roland. From rwestrel at redhat.com Tue Aug 25 08:34:00 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 10:34:00 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) Message-ID: <87tuwr6s5j.fsf@redhat.com> https://bugs.openjdk.java.net/browse/JDK-8241486 http://cr.openjdk.java.net/~roland/8241486/webrev.00/ Setting LoopStripMiningIter on the command line for a GC that has loop strip mining implicitly enabled causes a warning to be printed and loop strip mining to be turned off. As suggested in the bug report, this change moves the validation of loop strip mining options to "AfterErgo". Roland. 
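Going back to Roland's 8252292 mail above: the failure mode is easier to picture in source form. The following is a rough, hypothetical sketch of the shape he describes (the actual reproducer is in the webrev and certainly differs in detail); it is only meant to show where the missed anti-dependence comes from:

    // Hypothetical sketch of the 8252292 scenario described above (not the real test).
    public class ArrayCopyAntiDepSketch {
        static int test(int[] src) {
            int[] dst = new int[src.length];   // non-escaping allocation
            int v = 0;
            for (int i = 0; i < 1000; i++) {
                System.arraycopy(src, 0, dst, 0, src.length);
                v = dst[9];   // may be rewritten into a src[9] load placed before the arraycopy
                src[9] = i;   // the anti-dependence that must keep that load inside the loop
            }
            return v;
        }
        public static void main(String[] args) {
            // in this sketch the correct answer is 998; letting the src[9] load
            // sink out of the loop past the store would make it observe 999
            System.out.println(test(new int[100]));
        }
    }
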
From boris.ulasevich at bell-sw.com Tue Aug 25 08:57:14 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 11:57:14 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> Message-ID: <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com>

Hi,

On 25.08.2020 11:10, Andrew Haley wrote:
> Hi,
>
> On 23/08/2020 19:20, Boris Ulasevich wrote:
> >
> > Please review the updated change to C2 and AArch64 which introduces
> > a new BitfieldInsert node to replace Or+Shift+And sequence when possible.
> > Single BFI instruction is emitted for the new node.
> >
> > With the current change all the transformation logic is moved out of
> > aarch64.ad file into the common C2 code.
> >
> > http://bugs.openjdk.java.net/browse/JDK-8249893
> > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01
> >
> > The change in compiler.cpp was done to implicitly ask IGVN to run
> > the idealization once again after the loop optimization phase.
> > This extra step is necessary to make the BFI transform happen
> > only after loop optimization.
>
> So here's a strange thing. When I run a simple JMH test
>
>     @State(Scope.Benchmark)
>     public static class Result {
>         public int a, b;
>         public long x;
>     }
>
>     @Benchmark
>     public static int bfm(Result r) {
>         return (r.a & 0xFF) | ((r.b & 0xFF) << 8);
>     }
>
> I get
>
>   0x0000ffff84644df0:   ubfiz   w12, w11, #8, #8
>   0x0000ffff84644df4:   and     w10, w10, #0xff
>   0x0000ffff84644df8:   orr     w2, w10, w12            ;*ior {reexecute=0 rethrow=0 return_oop=0}
>                                                         ; - org.openjdk.Rotates::bfm at 19 (line 22)
>                                                         ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199)
>
> instead of
>
>   0x0000ffff808554b4: and   w10, w10, #0xff
>   0x0000ffff808554b8: and   w12, w12, #0xff
>   0x0000ffff808554bc: orr   w2, w12, w10, lsl #8  ;*ior
>                                                   ; - org.openjdk.Rotates::bfm at 19 (line 22)
>                                                   ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199)
>
> Do you have any ideas why this might be? Thanks.
>

Both variants are correct, isn't it?

I think matcher preferred UBFIZ to OR rule because ins_cost was set to 1.9 for OR:
https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l12130
https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l11675

With my change it would work like this:

0x0000ffff7c587fe0:   and   w2, w10, #0xff
0x0000ffff7c587fe8:   bfi
x2, x12, #8, #8 From aph at redhat.com Tue Aug 25 09:17:12 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 10:17:12 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> Message-ID: <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> On 25/08/2020 09:57, Boris Ulasevich wrote: > Hi, > > On 25.08.2020 11:10, Andrew Haley wrote: >> Hi, >> >> On 23/08/2020 19:20, Boris Ulasevich wrote: >> ?> >> ?> Please review the updated change to C2 and AArch64 which introduces >> ?> a new BitfieldInsert node to replace Or+Shift+And sequence when >> possible. >> ?> Single BFI instruction is emitted for the new node. >> ?> >> ?> With the current change all the transformation logic is moved out of >> ?> aarch64.ad file into the common C2 code. >> ?> >> ?> http://bugs.openjdk.java.net/browse/JDK-8249893 >> ?> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 >> ?> >> ?> The change in compiler.cpp was done to implicitly ask IGVN to run >> ?> the idealization once again after the loop optimization phase. >> ?> This extra step is necessary to make the BFI transform happen >> ?> only after loop optimization. >> >> So here's a strange thing. When I run a simple JMH test >> >> ???? @State(Scope.Benchmark) >> ???? public static class Result { >> ???????? public int a, b; >> ???????? public long x; >> ???? } >> >> ???? @Benchmark >> ???? public static int bfm(Result r) { >> ???????? return (r.a & 0xFF) | ((r.b & 0xFF) << 8); >> ???? } >> >> I get >> >> ?? 0x0000ffff84644df0:?? ubfiz??? w12, w11, #8, #8 >> ?? 0x0000ffff84644df4:?? and??? w10, w10, #0xff >> ?? 0x0000ffff84644df8:?? orr??? w2, w10, w12??????????????? ;*ior >> {reexecute=0 rethrow=0 return_oop=0} >> ???????????????????????????????????????????????????????????? ; - >> org.openjdk.Rotates::bfm at 19 (line 22) >> ???????????????????????????????????????????????????????????? ; - >> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) >> >> instead of >> >> ?? 0x0000ffff808554b4: and??? w10, w10, #0xff >> ?? 0x0000ffff808554b8: and??? w12, w12, #0xff >> ?? 0x0000ffff808554bc: orr??? w2, w12, w10, lsl #8? ;*ior >> ???????????????????????????????????????????????? ; - >> org.openjdk.Rotates::bfm at 19 (line 22) >> ???????????????????????????????????????????????? ; - >> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) >> >> Do you have any ideas why this might be? Thanks. >> > > Both variants are correct, isn't it? Well, yes. But I thought that the idea was to generate fewer instructions. > I think matcher preferred UBFIZto OR rule becauseins_costwas set to 1.9 > for OR: > https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l12130 > https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l11675 > > With my change it would work like this: > > 0x0000ffff7c587fe0:?? and??? w2, w10, #0xff > 0x0000ffff7c587fe8:?? bfi??? x2, x12, #8, #8 But it didn't. I'm asking you why that is. The first code I showed you was the JMH test in http://cr.openjdk.java.net/~aph/scratch/. This was after I applied your patch. 
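For readers following along: a "bitfield insert" takes the low-order bits of one value and deposits them at a given offset in another, leaving the remaining bits untouched; the masked-or-shift expression in the benchmark above is one spelling of that operation. A small, patch-independent Java illustration of the equivalence (purely for orientation, not code from the webrev):

    // General shape of a bitfield insert: clear <width> bits of dst starting at
    // <lsb>, then copy the low <width> bits of src into that field.
    static int bitfieldInsert(int dst, int src, int lsb, int width) {
        int mask = (width == 32) ? -1 : (1 << width) - 1;
        return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
    }
    // For the kernel above, where the target bits of the first operand are
    // already zero: bitfieldInsert(a & 0xFF, b, 8, 8) == (a & 0xFF) | ((b & 0xFF) << 8)
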
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From richard.reingruber at sap.com Tue Aug 25 09:28:32 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 25 Aug 2020 09:28:32 +0000 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: Hi Aleksey, the cleanup looks good to me. That enum was already part of the initial load with xxxunusedxxx as the only element [1]. So there's no open version history. I could not find any references either (rtags, grep). Probably the enum had more elements originally which were removed. Thanks, Richard. [1] https://github.com/openjdk/jdk/blame/d4626d89cc778b8b7108036f389548c95d52e56a/src/hotspot/share/opto/callGenerator.hpp#L41 -----Original Message----- From: hotspot-compiler-dev On Behalf Of Aleksey Shipilev Sent: Dienstag, 25. August 2020 09:29 To: hotspot compiler Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator Small cleanup: https://bugs.openjdk.java.net/browse/JDK-8252290 Static code inspection complains the enum below is unused. diff -r 13fdf97f0a8f src/hotspot/share/opto/callGenerator.hpp --- a/src/hotspot/share/opto/callGenerator.hpp Mon Aug 24 09:35:23 2020 +0200 +++ b/src/hotspot/share/opto/callGenerator.hpp Tue Aug 25 09:27:45 2020 +0200 @@ -37,9 +37,4 @@ class CallGenerator : public ResourceObj { - public: - enum { - xxxunusedxxx - }; - private: ciMethod* _method; // The method being called. Testing: grepping for "xxxunusedxxx", local builds -- Thanks, -Aleksey From boris.ulasevich at bell-sw.com Tue Aug 25 09:47:11 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 12:47:11 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> Message-ID: <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> On 25.08.2020 12:17, Andrew Haley wrote: > On 25/08/2020 09:57, Boris Ulasevich wrote: >> Hi, >> >> On 25.08.2020 11:10, Andrew Haley wrote: >>> Hi, >>> >>> On 23/08/2020 19:20, Boris Ulasevich wrote: >>> ??> >>> ??> Please review the updated change to C2 and AArch64 which introduces >>> ??> a new BitfieldInsert node to replace Or+Shift+And sequence when >>> possible. >>> ??> Single BFI instruction is emitted for the new node. >>> ??> >>> ??> With the current change all the transformation logic is moved >>> out of >>> ??> aarch64.ad file into the common C2 code. >>> ??> >>> ??> http://bugs.openjdk.java.net/browse/JDK-8249893 >>> ??> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 >>> ??> >>> ??> The change in compiler.cpp was done to implicitly ask IGVN to run >>> ??> the idealization once again after the loop optimization phase. >>> ??> This extra step is necessary to make the BFI transform happen >>> ??> only after loop optimization. >>> >>> So here's a strange thing. When I run a simple JMH test >>> >>> ????? @State(Scope.Benchmark) >>> ????? public static class Result { >>> ????????? 
public int a, b; >>> ????????? public long x; >>> ????? } >>> >>> ????? @Benchmark >>> ????? public static int bfm(Result r) { >>> ????????? return (r.a & 0xFF) | ((r.b & 0xFF) << 8); >>> ????? } >>> >>> I get >>> >>> ??? 0x0000ffff84644df0:?? ubfiz??? w12, w11, #8, #8 >>> ??? 0x0000ffff84644df4:?? and??? w10, w10, #0xff >>> ??? 0x0000ffff84644df8:?? orr??? w2, w10, w12 ;*ior >>> {reexecute=0 rethrow=0 return_oop=0} >>> ; - >>> org.openjdk.Rotates::bfm at 19 (line 22) >>> ; - >>> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line >>> 199) >>> >>> instead of >>> >>> ??? 0x0000ffff808554b4: and??? w10, w10, #0xff >>> ??? 0x0000ffff808554b8: and??? w12, w12, #0xff >>> ??? 0x0000ffff808554bc: orr??? w2, w12, w10, lsl #8? ;*ior >>> ????????????????????????????????????????????????? ; - >>> org.openjdk.Rotates::bfm at 19 (line 22) >>> ????????????????????????????????????????????????? ; - >>> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line >>> 199) >>> >>> Do you have any ideas why this might be? Thanks. >>> >> >> Both variants are correct, isn't it? > > Well, yes. But I thought that the idea was to generate fewer > instructions. > >> I think matcher preferred UBFIZto OR rule becauseins_costwas set to 1.9 >> for OR: >> https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l12130 >> >> https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l11675 >> >> >> With my change it would work like this: >> >> 0x0000ffff7c587fe0:?? and??? w2, w10, #0xff >> 0x0000ffff7c587fe8:?? bfi??? x2, x12, #8, #8 > > But it didn't. I'm asking you why that is. The first code I showed you > was the JMH test > in http://cr.openjdk.java.net/~aph/scratch/. This was after I applied > your patch. Ok. Can you please check that my patch [1] has been applied and built correctly. With my change I see this picture: ....[Hottest Region 2]........................................... c2, level 4, org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub, ??????????? 0x0000ffff84584dac:?? add??? x11, x14, #0x94 ??????????? 0x0000ffff84584db0:?? stp??? x21, x19, [sp] ??????????? 0x0000ffff84584db4:?? stp??? x20, x14, [sp, #16] ??????????? 0x0000ffff84584db8:?? stp??? x15, x10, [sp, #32] ??????????? 0x0000ffff84584dbc:?? str??? x11, [sp, #48] ??????????? 0x0000ffff84584dc0:?? b??? 0x0000ffff84584dd8 ??????????? 0x0000ffff84584dc4:?? nop ??????????? 0x0000ffff84584dc8:?? nop ??????????? 0x0000ffff84584dcc:?? nop ? 3.64%? ?? 0x0000ffff84584dd0:?? str??? x19, [sp, #16] ? 0.07%? ?? 0x0000ffff84584dd4:?? mov??? x16, x29 ???????? ?? 0x0000ffff84584dd8:?? ldr??? w10, [x16, #12] ;*invokestatic bfm ? 3.92%? ?? 0x0000ffff84584ddc:?? ldr??? w12, [x16, #24] ? 4.69%? ?? 0x0000ffff84584de0:?? and??? w2, w10, #0xff ? 0.03%? ?? 0x0000ffff84584de4:?? mov??? x29, x16 ? 0.02%? ?? 0x0000ffff84584de8:?? bfi??? x2, x12, #8, #8???? ;*ior {reexecute=0 rethrow=0 return_oop=0} ???????? ??????????????????????????????????????????????????? 
; - org.openjdk.Rotates::bfm at 19 (line 23) [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01/jdk-jdk.patch From ningsheng.jian at arm.com Tue Aug 25 10:07:30 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Tue, 25 Aug 2020 18:07:30 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> Message-ID: Hi Vladimir, On 8/24/20 8:03 PM, Vladimir Ivanov wrote: > Hi Ningsheng, > >>> What I see in the patch is that you try to attack the problem from the >>> opposite side: you introduce new concept of a size-agnostic vector >>> register on RA side and then directly use it during matching: vecA is >>> used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. >>> >>> Unfortunately, it extends the implementation in orthogonal direction >>> which looks too aarch64-specific to benefit other architectures and x86 >>> particular. I believe there's an alternative approach which can benefit >>> both aarch64 and x86, but it requires more experimentation. >>> >> >> Since vecA and vecX (and others) are architecturally different vector >> registers, I think it's quite natural that we just introduced the new >> vector register type vecA, to represent what we need for corresponding >> hardware vector register. Please note that in vector length agnostic >> ISA, like Arm SVE and RISC-V vector extension [1], the vector >> registers are architecturally the same type of register despite the >> different hardware implementations. > > FTR vecX et al don't represent hardware registers, they represent vector > values of predefined size. (For example, vecS, vecD, and vecX map to the > very same set of 128-bit vector registers on x86.) > > My point is: in terms of existing concepts what you are adding is not > "yet another flavor of vector". It's a new full-fledged concept (which > is manifested as special cases across the JVM) and you end up with 2 > different representations of vectors. > > I agree that hardware is quite different, but I don't see it makes much > of a difference in the context of the JVM and abstractions used to hide > it are similar. > > For example, as of now, most of x86-specific code in C2 works just fine > with full-width hardware vectors which are oblivious of their sizes > until RA kicks in. And SVE patch you propose completely omits implicit > predication hardware provides which makes it similar to AVX512 (modulo > wider range of vector width sizes supported). > > So, even though hardware abstractions being used aren't actually *that* > different, vecA piles complexity and introduces a separate way to > achieve similar results (but slightly differently). And that's what > bothers me. I'd like to see more unification instead which should bring > reduction in complexity and an opportunity to address long-standing > technical debt (and 5 flavors of ideal registers for vectors is part of > it IMO). > I can understand that a total solution for different archs and vector sizes is preferable. Do you have any initial idea how to achieve that? > So far, I see 2 main directions for RA work: > > ? 
(a) support vectors of arbitrary size: > ??? (1) helps push the upper limit on the size (1024-bit) > ??? (2) handle non-power-of-2 sizes > > ? (b) optimize RA implementation for large values > > Anything else? > Yes, and it's not just vector. SVE predicate register has scalable size (vector_size/8) as well. We also have predicate register allocator support well with proposed approach (not in this patch.). > Speaking of (a), in particular, I don't see why possible solution for it > should not supersede vecX et al altogether. > > Also, I may be wrong, but I don't see a clear evidence there's a > pressing need to have all of that fixed right from the beginning. > (That's why I put #1 and #2 options on the table.) Starting with #1/#2 > would untie initial SVE support from the exploratory work needed to > choose the most appropriate solution for (a) and (b). > Staring from partial SVE register support might be acceptable for initial patch (Andrew may not agree :-)), but I think we may end up with more follow-up work, given that our proposed approach already supports SVE well in terms of (a) and (b). If there's no other solution, would it be possible to use current proposed method? It's not difficult to backout our changes in register allocation part, if we find other better solution to support arbitrary vector/predicate sizes in future, as the patch there is actually not big IMO. >>> If I were to start from scratch, I would choose between 3 options: >>> >>> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit >>> supported >>> vector sizes to 128-/256-/512-bit values. >>> >>> ??? #2: lift limitation on max size (to 1024/2048 bits), but ignore >>> non-power-of-2 sizes; >>> >>> ??? #3: introduce support for full range of vector register sizes >>> (128-/.../2048-bit with 128-bit step); >>> >>> I see 2 (mostly unrelated) limitations: maximum vector size and >>> non-power-of-2 sizes. >>> >>> My understanding is that you don't try to accurately represent SVE for >>> now, but lay some foundations for future work: you give up on >>> non-power-of-2 sized vectors, but still enable support for arbitrarily >>> sized vectors (addressing both limitations on maximum size and size >>> granularity) in RA (and it affects only spills). So, it is somewhere >>> between #2 and #3. >>> >>> The ultimate goal is definitely #3, but how much more work will be >>> required to teach the JVM about non-power-of-2 vectors? As I see in the >>> patch, you don't have auto-vectorizer support yet, but Vector API will >>> provide access to whatever size hardware exposes. What do you expect on >>> hardware front in the near/mid-term future? Anything supporting vectors >>> larger than 512-bit? What about 384-bit vectors? >>> >> >> I think our patch is now in 3. :-) We do not give up non-power-of-2 >> sized vectors, instead we are supporting them well in this patch. And >> are still using current regmask framework. (Actually, I think the only >> limitation to the vector size is that it should be multiple of 32-bits >> - bits per 1 reg slot.) > >> I am not sure about other Arm partners' hardware implementations in >> the mid-term future, as it's free for cpu implementer to choose any >> max vector sizes as long as it follows SVE architecture specification. >> But we did tested the patch with Vector API on different SVE supported >> vector sizes on emulator, e.g. 384, 768, 1024, 2048 etc. The register >> allocator including the spill/unspill works well on those different >> sizes with Vector API. 
(Thanks to your great work on Vector API. :-)) >> >> We currently limit the vector size to power-of-2 in >> vm_version_aarch64.cpp, as suggested by Andrew Dinn, is because >> current SLP vectorizer only supports power-of-2 vectors. With Vector >> API in, I think such restriction can be removed. And we are also >> working on a new vectorizer to support predication/mask, which should >> not have power-of-2 limitation. > > [...] > >> Yes, we can make JVM to support portion of vectors, at least for SVE. >> My concern is that the performance wouldn't be as good as the full >> available vector width. > > To be clear: I called it "somewhere between #2 and #3" solely because > auto-vectorizer bails out on non-power-of-2 sizes. And even though > Vector API will work with such cases just fine, IMO having > auto-vectorizer support is required before calling #3 complete. > > In that respect, choosing smaller vector size auto-vectorizer supports > is preferrable to picking up the full-width vectors and turning off > auto-vectorizer (even though Vector API will support them). > > It can be turned into heuristic (by default, pick only power-of-2 sizes; > let users explicitly specify non-power-of-2 sizes), but speaking of > priorities, IMO auto-vectorizer support is more important. > I agree that auto-vectorizer support is more important, and we are working on that. >>> Giving up on #3 for now and starting with less ambitious goals (#1 or >>> #2) would reduce pressure on RA and give more time for additional >>> experiments to come with a better and more universal >>> support/representation of generic/size-agnostic vectors. And, in a >>> longer term, help reducing complexity and technical debt in the area. >>> >>> Some more comments follow inline. >>> >>>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>>> 4x larger in the worst case (ignoring predicate registers for now). >>>>> Here are the relevant constants on x86: >>>>> >>>>> gensrc/adfiles/adGlobals_x86.hpp: >>>>> >>>>> // the number of reserved registers + machine registers. >>>>> #define REG_COUNT??? 545 >>>>> ... >>>>> // Size of register-mask in ints >>>>> #define RM_SIZE 22 >>>>> >>>>> My estimate is that for AArch64 with SVE support the constants will >>>>> be: >>>>> >>>>> ??? REG_COUNT < 2500 >>>>> ??? RM_SIZE < 100 >>>>> >>>>> which don't look too bad. >>>>> >>>> >>>> Right, but given that most real hardware implementations will be no >>>> larger than 512 bits, I think. Having a large bitmask array, with most >>>> bits useless, will be less efficient for regmask computation. >>> >>> Does it make sense to limit the maximum supported size to 512-bit then >>> (at least, initially)? In that case, the overhead won't be worse it is >>> on x86 now. >>> >> >> Technically, this may be possible though I haven't tried. My concerns >> are: >> >> 1) A larger regmask arrays would be less efficient (we only use 256 >> bits - 8 slots for SVE in this patch), though won't be worse than x86. >> >> 2) Given that current patch already supports larger sizes and >> non-power-of-2 sizes well with relative small size in diff, if we want >> to support other sizes soon, there may be some more work to roll-back >> ad file changes. >> >>>>> Also, I don't see any changes related to stack management. So, I >>>>> assume it continues to be managed in slots. Any problems there? As I >>>>> understand, wide SVE registers are caller-save, so there may be many >>>>> spills of huge vectors around a call. 
(Probably, not possible with C2 >>>>> auto-vectorizer as it is now, but Vector API will expose it.) >>>>> >>>> >>>> Yes, the stack is still managed in slots, but it will be allocated with >>>> real vector register length instead of 'virtual' slots for VecA. See >>>> the >>>> usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also >>>> applied the patch to vector api, and did find a lot of vector spills >>>> with expected correct results. >>> >>> I'm curious whether similar problems may arise for spills. Considering >>> wide vector registers are caller-saved, it's possible to have lots of >>> 256-byte values to end up on stack (especially, with Vector API). Any >>> concerns with that? >>> >> >> No, we don't need to have such big (256-byte) slots for a smaller >> vector register. The spill slots are the same size as of real vector >> length, e.g. 48 bytes for 384-bit vector. Even for alignment, we >> currently choose SlotsPerVecA (8 slots for 32 bytes, 256 bits) for >> alignment (skipped slots can still be allocated to other args), which >> is still smaller than AVX512 (64 bytes, 512 bits). We can tweak the >> patch to choose other smaller value, if we think the alignment is too >> large. (Yes, we should always try to avoid spills for wide vectors, >> especially with Vector API, to avoid performance pitfalls.) > > Thanks for the clarifications. > > Any new problems/hitting some limitations envisioned when spilling large > number of huge vectors (2048-bit) on stack? > I haven't seen any so far. > Best regards, > Vladimir Ivanov > >>>>> Have you noticed any performance problems? If that's the case, then >>>>> AVX512 support on x86 would benefit from similar optimization as well. >>>>> >>>> >>>> Do you mean register allocation performance problems? I did not notice >>>> that before. Do you have any suggestion on how to measure that? >>> >>> I'd try to run some applications/benchmarks with -XX:+CITime to get a >>> sense how much RA may be affected. >>> >> >> Thanks! I will give a try. >> >> [1] >> https://urldefense.com/v3/__https://github.com/riscv/riscv-v-spec/releases/tag/0.9__;!!GqivPVa7Brio!IwFEx-c_8JDZcWgXPLcWp2ypX3pr1-IWTBfC7O7PHo7_0skMWtQa4fyWpo-lVor0NFv4Ivo$ >> Thanks, Ningsheng From ningsheng.jian at arm.com Tue Aug 25 10:13:14 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Tue, 25 Aug 2020 18:13:14 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> Message-ID: Hi Erik, On 8/24/20 11:26 PM, Erik ?sterlund wrote: > Hi Ningsheng, > > On 2020-08-24 11:59, Ningsheng Jian wrote: >> Hi Erik, >> >> Thanks for the review! >> >> On 8/22/20 12:21 AM, Erik ?sterlund wrote: >>> Hi, >>> >>> Have you tried this with ZGC on AArch64? It has custom code for saving >>> live registers in the load barrier slow path. >>> I can't see any code changes there, so assuming this will just crash >>> instead. >>> The relevant code is in ZBarrierSetAssembler on aarch64. >>> >>> Maybe I missed something? >>> >> >> I didn't add ZGC option while running tests. I think I need to update >> push_fp() which is called by ZSaveLiveRegisters. 
But do we need to get >> size info (float/neon/sve) instead of saving the whole vector >> register? Currently, it just simply saves the whole NEON register. > > What we found on x86_64 was that there was a significant cost in saving > vector registers in load barriers. That is why we perform some analysis > so that only the exact registers that are affected, and only the parts > of the registers that are affected, get spilled. It actually mattered. > It will of course work either way, but that was our observation on > x86_64. But I am okay with that being deferred to a separate RFE. I just > wanted to make sure that it at the very least works with the new code, > for a start, so it doesn't start crashing. > OK, I will make it to save the whole reg in this patch and have a separate RFE to optimize as what x86 does. >> And in ZBarrierSetAssembler::load_at(), before calling to runtime >> code, we call push_call_clobbered_registers_except(), which just saves >> floating point registers instead of the whole NEON vector registers. >> Similar behavior in x86 implementation. Is that correct (not saving >> vectors)? > > Yes. The call contexts are: > 1) Interpreter. Does not use vector registers. > 2) Method handle intrinsic. Uses only floats that are part of the Java > calling convention, rest is garbage. No vectors here. > 3) Checkcast arraycopy. Does not use vectors. > Thanks for sharing this. Thanks, Ningsheng From aph at redhat.com Tue Aug 25 11:52:52 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 12:52:52 +0100 Subject: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of FLOATPRESSURE In-Reply-To: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> References: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> Message-ID: On 25/08/2020 07:03, Joshua Zhu wrote: > Therefore I propose the default value of FLOATPRESSURE be 32 because > there are 32 float/SIMD registers on aarch64 and also the value of register > pressure is the same as 1 for each LRG of Op_RegL/Op_RegD/Op_Vec. [3] > > Could you please help review this change? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 > Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ Yes, thanks. I can't remember why FLOATPRESSURE is 64, but it certainly looks like 32 is a much more sensible value. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Tue Aug 25 12:12:38 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 25 Aug 2020 15:12:38 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> Message-ID: <5b452edb-2851-f35a-ac30-523d74d95851@oracle.com> > I can understand that a total solution for different archs and vector > sizes is preferable. Do you have any initial idea how to achieve that? 
I have only ideas right now (unfortunately) :-) So far, my observations from working on refactoring vector support on x86 with Intel folks are the following: (1) full-width register representation is good enough; Though on x86 all vector registers are accurately modeled (register masks properly track sizes and aliasing), it turns out that what matters in practice is aliasing. So, it's enough to use a single "virtual" slot to model XMM, YMM, and ZMM registers all at once unless RA supports packing multiple smaller vector values into a single register (separately managing lower and upper parts of the register; e.g., YMM = XMM(hi):XMM(lo) ). Though currently RA does support it, there are no code which utilizes that and no plans to do that in the future. I believe the situation on AArch64 with NEON and SVE is similar. (And scalable vectors make it harder to support packing in RA.) (2) vector width matters only for spills/refills and reg2reg moves. Matcher does type capturing, so all vector mach nodes keep precise type of the value they produce. On x86 it is heavily used later in code emission phase, but RA still relies on ideal registers (Op_VecX et al). I don't see why RA can't be migrated from ideal registers to types (TypeVect) to determine vector size when performing spilling. From aforementioned observations, I conclude there should be a way to declare a single ideal vector register (Op_Vec) which represents full-width vector supported by the hardware and use captured vector types (TypeVect instances) to guide RA and code generation. And that's the state where I'd like to see vector support in C2 be moving to. Regarding predicate registers, I haven't thought too much about them, so I don't have a strong opinion about whether they should be a separate entity (Op_RegVMask in your patch) or just treated as a vector of bits (Op_Vec). >> So far, I see 2 main directions for RA work: >> >> ?? (a) support vectors of arbitrary size: >> ???? (1) helps push the upper limit on the size (1024-bit) >> ???? (2) handle non-power-of-2 sizes >> >> ?? (b) optimize RA implementation for large values >> >> Anything else? >> > > Yes, and it's not just vector. SVE predicate register has scalable size > (vector_size/8) as well. We also have predicate register allocator > support well with proposed approach (not in this patch.). Though with AVX512 support predicate register support was left aside, I agree that predicate registers should be taken into account from the very beginning. (And glad to hear you are already working on supporting them!) Also, I believe options #1/#2 may be extended to cover predicate registers as well without too much effort. >> Speaking of (a), in particular, I don't see why possible solution for >> it should not supersede vecX et al altogether. >> >> Also, I may be wrong, but I don't see a clear evidence there's a >> pressing need to have all of that fixed right from the beginning. >> (That's why I put #1 and #2 options on the table.) Starting with #1/#2 >> would untie initial SVE support from the exploratory work needed to >> choose the most appropriate solution for (a) and (b). >> > > Staring from partial SVE register support might be acceptable for > initial patch (Andrew may not agree :-)), but I think we may end up with > more follow-up work, given that our proposed approach already supports > SVE well in terms of (a) and (b). If there's no other solution, would it > be possible to use current proposed method? 
It's not difficult to > backout our changes in register allocation part, if we find other better > solution to support arbitrary vector/predicate sizes in future, as the > patch there is actually not big IMO. Unfortunately, temporary solutions usually end up as permanent ones since there's much less motivation to replace them (and harder to justify the effort) after initial pressure is relieved. I'm OK with the proposed patch if we agree it's a stop-the-gap/temporary solution to the immediate problems you face with initial SVE support and are ready to commit resources into replacing it. That's why I think it's the right time to discuss general direction, work on a plan, and use it to guide the coordinated effort to improve vector support in C2. Also, considering it a stop-the-gap solution means we should strive for the simplest solution and that's another reason I put #1/#2 options on the table to consider. [...] >> Any new problems/hitting some limitations envisioned when spilling >> large number of huge vectors (2048-bit) on stack? >> > > I haven't seen any so far. Ok, good to know. I was curious whether stack representation should also move away from 32-bit slots to a more compact representation. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Tue Aug 25 12:37:21 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:37:21 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: Hi Christian, On 19.08.20 16:06, Christian Hagedorn wrote: > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ Looks good to me, just noticed some style issues (no new webrev required): c1_LinearScan.cpp: - Wrong indentation in lines 5445, 5509, 5681 TestTraceLinearScanLevel.java: - "... in a HelloWorld program". It's not a HelloWorld program, right? ;) Best regards, Tobias From tobias.hartmann at oracle.com Tue Aug 25 12:43:10 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:43:10 +0200 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag In-Reply-To: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> References: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> Message-ID: Hi Aleksey, looks good to me. Best regards, Tobias On 25.08.20 09:08, Aleksey Shipilev wrote: > RFE: > ? https://bugs.openjdk.java.net/browse/JDK-8252215 > > VerifyOptoOopOffsets flag does not seem to be used (no tests in the current test base), and it does > not seem to work reliably (see JDK-4834891). It might be a good time to remove it. JDK-4834891 > evaluation says: "The flag VerifyOptoOopOffsets has not been valid since the introduction of > sun/misc/Unsafe and the flag should not be used for general testing." > > How about we remove it? > ? https://cr.openjdk.java.net/~shade/8252215/webrev.01/ > > Testing: tier1 (locally); jdk-submit (still running?) 
> From tobias.hartmann at oracle.com Tue Aug 25 12:44:17 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:44:17 +0200 Subject: RFR (XS) 8252291: C2: Assignment in conditional in loopUnswitch.cpp In-Reply-To: References: Message-ID: <811ea63c-b897-ccf0-2559-82842b52e4be@oracle.com> Hi Aleksey, looks good and trivial to me. Best regards, Tobias On 25.08.20 09:34, Aleksey Shipilev wrote: > Cleanup: > ? https://bugs.openjdk.java.net/browse/JDK-8252291 > > Static code analysis complains there is the assignment in the conditional here. I believe the > assignment should be explicit here. Code was introduced with JDK-8136725. > > diff -r 31de2a59348a src/hotspot/share/opto/loopUnswitch.cpp > --- a/src/hotspot/share/opto/loopUnswitch.cpp?? Tue Aug 25 09:27:04 2020 +0200 > +++ b/src/hotspot/share/opto/loopUnswitch.cpp?? Tue Aug 25 09:29:23 2020 +0200 > @@ -442,7 +442,8 @@ > > ?? if (iff->in(1)->Opcode() != Op_ConI) { > ???? return false; > ?? } > > -? return _has_reserved = true; > +? _has_reserved = true; > +? return true; > ?} > > Testing: local builds > From tobias.hartmann at oracle.com Tue Aug 25 12:46:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:46:11 +0200 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: +1 On 25.08.20 11:28, Reingruber, Richard wrote: > Static code inspection complains the enum below is unused. Just curious, which analyzer are you using? Best regards, Tobias From tobias.hartmann at oracle.com Tue Aug 25 12:57:22 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:57:22 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: <87tuwr6s5j.fsf@redhat.com> References: <87tuwr6s5j.fsf@redhat.com> Message-ID: Hi Roland, > * @requires vm.gc.G1 & vm.gc.Shenandoah & vm.gc.Z & vm.gc.Epsilon That doesn't look right. The test would never be executed. Best regards, Tobias From tobias.hartmann at oracle.com Tue Aug 25 13:18:17 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 15:18:17 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: References: <87tuwr6s5j.fsf@redhat.com> Message-ID: On 25.08.20 14:57, Tobias Hartmann wrote: >> * @requires vm.gc.G1 & vm.gc.Shenandoah & vm.gc.Z & vm.gc.Epsilon > That doesn't look right. The test would never be executed. Sorry, confused it with the vm.gc == .. check. You are just checking if the VM supports the GC. Looks good to me. 
Best regards, Tobias From vladimir.x.ivanov at oracle.com Tue Aug 25 13:18:12 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 25 Aug 2020 16:18:12 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> Message-ID: <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> Hi Andrew, I elaborated on some of the points in the thread with Ningsheng. I put my responses in-line, but will try to avoid repeating myself too much. >>> The ultimate goal was to move to vectors which represent full-width >>> hardware registers. After we were convinced that it will work well in AD >>> files, we encountered some inefficiencies with vector spills: depending >>> on actual hardware, smaller (than available) vectors may be used (e.g., >>> integer computations on AVX-capable CPU). So, we stopped half-way and >>> left post-matching part intact: depending on actual vector value width, >>> appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. >>> >>> (I believe you may be in a similar situation on AArch64 with NEON vs SVE >>> where both 128-bit and wide SVE vectors may be used at runtime.) > > Your problem here seems to be a worry about spilling more data than is > actually needed. As Ningsheng pointed out the amount of data spilled is > determined by the actual length of the VecA registers, not by the > logical size of the VecA mask (256 bits) nor by the maximum possible > size of a VecA register on future architectures (2048 bits). So, no more > stack space will be used than is needed to preserve the live bits that > need preserving. I described the experience with doing a similar exercise on x86: migrating away from [leg]vec[SDXYZ] operands to a uniform size-agnostic representation (legVec/vec). The only problem with abandoning Op_VecX et al was the need to track the size of vector values in RA. >>> Unfortunately, it extends the implementation in orthogonal direction >>> which looks too aarch64-specific to benefit other architectures and x86 >>> particular. I believe there's an alternative approach which can benefit >>> both aarch64 and x86, but it requires more experimentation. >>> >> >> Since vecA and vecX (and others) are architecturally different vector >> registers, I think it's quite natural that we just introduced the new >> vector register type vecA, to represent what we need for corresponding >> hardware vector register. Please note that in vector length agnostic >> ISA, like Arm SVE and RISC-V vector extension [1], the vector registers >> are architecturally the same type of register despite the different >> hardware implementations. > > Yes, I also see this as quite natural. Ningsheng's change extends the > implementation in the architecture-specific direction that is needed for > AArch64's vector model. The fact that this differs from x86_64 is not > unexpected. And still C2 can model them in a similar way. Moreover, recent changes on x86 I described brings x86 very close to SVE. (I elaborated on that in the previous response to Ningsheng.) 
>>> If I were to start from scratch, I would choose between 3 options: >>> >>> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported >>> vector sizes to 128-/256-/512-bit values. >>> >>> ??? #2: lift limitation on max size (to 1024/2048 bits), but ignore >>> non-power-of-2 sizes; >>> >>> ??? #3: introduce support for full range of vector register sizes >>> (128-/.../2048-bit with 128-bit step); >>> >>> I see 2 (mostly unrelated) limitations: maximum vector size and >>> non-power-of-2 sizes. > > Yes, but this patch deals with both of those and I cannot see it causing > any problems for x86_64 nor do I see it adding any great complexity. The > extra shard paths deal with scalable vectors wich onlu occur on AArch64. > A scalable VecA register (and also eventually the scalable predicate > register) caters for all possible vector sizes via a single 'logical' > vector of size 8 slots (also eventually a single 'logical' predicate > register of size 1 slot). Catering for scalable registers in shared code > is localized and does not change handling of the existing, non-scalable > VecX/Y/Z registers. Code needed for vector support in C2 has been growing in size over the years and now it comprises a noticeable part of the compiler. And it got there through relatively small incremental and localized changes. I agree that the proposed solution demonstrates a very clever way to overcome some of the limitations imposed by existing implementation. But it is still a workaround which only emphasizes the architectural limitations. And it's not specific to AArch64 with SVE: x86 stretches it hard as well (though in a slightly different direction) which FTR forced recent migration to "generic vectors". So, instead of proceeding with incremental changes and accumulating complexity (and technical debt along the way), I suggest to look into reworking vector support and making it relevant to the modern hardware (both x86 and AArch64). >>> My understanding is that you don't try to accurately represent SVE for >>> now, but lay some foundations for future work: you give up on >>> non-power-of-2 sized vectors, but still enable support for arbitrarily >>> sized vectors (addressing both limitations on maximum size and size >>> granularity) in RA (and it affects only spills). So, it is somewhere >>> between #2 and #3. > > I have to disagree with your statement that this proposal doesn't > 'accurately' represent SVE. Yes, the vector mask for this arbitrary-size > vector is modelled 'logically' using a nominal 8 slots. However, that is > merely to avoid wasting bits in the bit masks plus cpu time processing > them. The 'physical' vector length models the actual number of slots, > and includes the option to model a non-power of two. That 'physical' > size is used in all operations that manipulate VecA register contents. > So, although I grant that the code is /parameterized/, it is also 100% > accurate. My point is: the proposed solution makes a number of simplifying assumptions which makes it much easier to support SVE (e.g., VecA represents full-width vector which completely ignores implicit predication provided by the ISA). >>> The ultimate goal is definitely #3, but how much more work will be >>> required to teach the JVM about non-power-of-2 vectors? As I see in the >>> patch, you don't have auto-vectorizer support yet, but Vector API will >>> provide access to whatever size hardware exposes. What do you expect on >>> hardware front in the near/mid-term future? 
Anything supporting vectors >>> larger than 512-bit? What about 384-bit vectors? > > Do we need to know for sure such hardware is going to arrive in order to > allow for it now? If there were a significant cost to doing so I'd maybe > say yes but I don't really see one here. Most importantly, the changes > to the AArch64 register model and small changes to the shared > chaitin/reg mask code proposed here already work with the > auto-vectorizer if the VecA slots are any of the possible powers of 2 > VecA sizes. > > The extra work needed to profit from non-power-of-two vector involves > upgrading the auto-vectorizer code. While this may be tricky I don't see > ti as impossible. However, more importantly, even if such an upgrade > cannot be achieved then this proposal is still a very simple way to > allow for arbitrarily scalable SVE vectors that are a power of two size. > It also allows any architecture with a non-power of two to work with the > lowest power of two that fits. So, this is a very siple way to cater for > what may turn up. If it makes options #1/#2 viable, then there's no need to change shared code at all. Choosing between no code changes and low risk / small code changes which won't be used in practice, I'm strongly in favor of the former. >>> For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My >>> understanding that existing RA machinery should support 1024-bit vectors >>> well. So, unless 2048-bit vectors are needed, we could live with the >>> framework we have right now. > > I'm not sure what you are proposing here but it sounds like introducing > extra vectors beyond VecX, VecY for larger powers of two i.e. VecZ, > vecZZ, VecZZZ ... and providing separate case processing for each of > them where the relevant case is selected conditional on the actual > vector size. Is that what you are proposing? I can't see any virtue in > multiplying case handling fore ach new power-of-two size that turns up > when all possible VecZ* power-of-two options can actually be handled as > one uniform case. Option #1 doesn't require anything more than Vec[SDXYZ]. Option #2 assumes 1 more operand & ideal register for 1024-bit. As Ningsheng pointed out, without introducing length-agnostic vectors, supporting 2048-bit vectors require changes in RegMask to accommodate for values spanning 64 slots. >>> Giving up on #3 for now and starting with less ambitious goals (#1 or >>> #2) would reduce pressure on RA and give more time for additional >>> experiments to come with a better and more universal >>> support/representation of generic/size-agnostic vectors. And, in a >>> longer term, help reducing complexity and technical debt in the area. > > Can you explain what you mean by 'reduce pressure on RA'? I'm also > unclear as to what you see as complex about this proposal. IMO vector support already introduces significant complexity in C2. Adding platform-specific features will only increase it. So, I'm in favor of reworking the support than applying band-aids to relax some inherent limitations of it. >>> Some more comments follow inline. >>> >>>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>>> 4x larger in the worst case (ignoring predicate registers for now). >>>>> Here are the relevant constants on x86: >>>>> >>>>> gensrc/adfiles/adGlobals_x86.hpp: >>>>> >>>>> // the number of reserved registers + machine registers. >>>>> #define REG_COUNT??? 545 >>>>> ... 
>>>>> // Size of register-mask in ints >>>>> #define RM_SIZE 22 >>>>> >>>>> My estimate is that for AArch64 with SVE support the constants will be: >>>>> >>>>> ??? REG_COUNT < 2500 >>>>> ??? RM_SIZE < 100 >>>>> >>>>> which don't look too bad. > > I'm not sure what these numbers are meant to mean. The number of SVE > vector registers is the same as the number of NEON vector registers i.e. > 32. The register mask size for VecA registers is 8 * 32 bits. I attempted to estimate the sizes of relevant structures if VecA is modelled the same way as VecX et al. >>>> Right, but given that most real hardware implementations will be no >>>> larger than 512 bits, I think. Having a large bitmask array, with most >>>> bits useless, will be less efficient for regmask computation. >>> >>> Does it make sense to limit the maximum supported size to 512-bit then >>> (at least, initially)? In that case, the overhead won't be worse it is >>> on x86 now. > > Well, no. It doesn't make sense when all you need is a 'logical' 8 * 32 > bit mask whatever the actual 'physical' register size is. I asked that question in a different context trying to get a sense of other simplifying assumptions which could be made in the initial implementation. But you should definitely prefer 1-slot design for vector registers then ;-) Best regards, Vladimir Ivanov From rwestrel at redhat.com Tue Aug 25 13:37:10 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 15:37:10 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: <87r1ru7sop.fsf@redhat.com> > Putting these together, and choosing a round number > which prioritizes concern (b) by moving closer to the > limit of (a), if I had one more run to do I?d choose > -XX: StressLongCountedLoop=20000000. > > If I were to do multiple runs, I might choose vary that > stress parameter by adding and subtracting a couple > of zeroes: > > -XX: StressLongCountedLoop=200000 > -XX: StressLongCountedLoop=2000000 > -XX: StressLongCountedLoop=20000000 > -XX: StressLongCountedLoop=200000000 > -XX: StressLongCountedLoop=2000000000 FWIW, I ran my own tests with -XX: StressLongCountedLoop=10000. Tobias runs did catch failures I didn't run into. > Separately from those issues, we know that the stress mode > converts 32-bit loops into 64-bit loops, which then re-nest > using the new logic. But, are we confident that this re-nesting > works? Roland did some manual testing to make sure the > test works as intended, but it would be good to run the above > stress tests with some sort of logging that ensures that there > are at least ?lots and lots? of successful 32-to-64 loop conversions. > If those loop conversions fail (staying at 64 bits) the tests will > pass, but they won?t be testing what we need to be testing. What about using the new statistics? 
A CTW run of the base module reports: long loops=11/36 A CTW run of the base module with -XX:StressLongCountedLoop=1000 reports: long loops=3271/3410 Granted, the first counter is only incremented once the loop nest is created but not when the inner loop is converted to a counted loop. On another run with a third counter incremented on counted loop creation: 2889/2971/3106 (that not all created inner loops are transformed to counted loops is strange. Maybe some become dead between the 2 steps.) Roland. From martin.doerr at sap.com Tue Aug 25 13:37:38 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 25 Aug 2020 13:37:38 +0000 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Hi Corey, thanks for proposing this change. I have comments and suggestions regarding various files. Base64.java This is the only file which needs another review from core-libs-dev. First of all, I like the idea to use a HotSpotIntrinsicCandidate which can consume as many bytes as the implementation wants. Comment before decodeBlock: Let's be precise: "should process a multiple of four" => "must process a multiple of four" > If any illegal base64 bytes are encountered in the source by the > intrinsic, the intrinsic can return a data length of zero or any > number of bytes before the place where the illegal base64 byte > was encountered. I think this has a drawback. Somebody may use a debugger and want to stop when throwing IllegalArgumentException. He should see the position which matches the Java implementation. Please note that the comment indentation differs from other comments. decode0: Final "else" after return is redundant. stubGenerator_ppc.cpp "__vector" breaks AIX build! Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? Please either support Big Endian properly or #ifdef it out. What exactly does it on linux? I remember that we had tried such prefixes but were not satisfied. I think it didn't enforce 16 Byte alignment if I remember correctly. Attention: C2 does no longer convert int/bool to 64 bit values (since JDK-8086069). So the argument registers for offset, length and isURL may contain garbage in the higher bits. You may want to use load_const_optimized which produces shorter code. You may want to use __ align(32) to align unrolled_loop_start. I'll review the algorithm in detail when I find more time. assembler_ppc.hpp assembler_ppc.inline.hpp vm_version_ppc.cpp vm_version_ppc.hpp Please rebase. Parts of the change were pushed as part of 8248190: Enable Power10 system and implement new byte-reverse instructions vmSymbols.hpp Indentation looks odd at the end. library_call.cpp Good. Indentation style of the call parameters differs from encodeBlock. runtime.cpp Good. aotCodeHeap.cpp vmSymbols.cpp shenandoahSupport.cpp vmStructs_jvmci.cpp shenandoahSupport.cpp escape.cpp runtime.hpp stubRoutines.cpp stubRoutines.hpp vmStructs.cpp Good and trivial. Tests: I think we should have JTREG tests to check for regressions in the future. Best regards, Martin > -----Original Message----- > From: Corey Ashford > Sent: Mittwoch, 19. 
August 2020 20:11 > To: Michihiro Horie > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev dev at openjdk.java.net>; Kazunori Ogata ; > joserz at br.ibm.com; Doerr, Martin > Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > API for Base64 decoding > > Michihiro Horie posted up a new iteration of this webrev for me. This > time the webrev includes a complete implementation of the intrinsic for > Power9 and Power10. > > You can find it here: > http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ > > Changes in webrev.02 vs. webrev.01: > > * The method header for the intrinsic in the Base64 code has been > rewritten using the Javadoc style. The clarity of the comments has been > improved and some verbosity has been removed. There are no additional > functional changes to Base64.java. > > * The code needed to martial and check the intrinsic parameters has > been added, using the base64 encodeBlock intrinsic as a guideline. > > * A complete intrinsic implementation for Power9 and Power10 is included. > > * Adds some Power9 and Power10 assembler instructions needed by the > intrinsic which hadn't been defined before. > > The intrinsic implementation in this patch accelerates the decoding of > large blocks of base64 data by a factor of about 3.5X on Power9. > > I'm attaching two Java test cases I am using for testing and > benchmarking. The TestBase64_VB encodes and decodes randomly-sized > buffers of random data and checks that original data matches the > encoded-then-decoded data. TestBase64Errors encodes a 48K block of > random bytes, then corrupts each byte of the encoded data, one at a > time, checking to see if the decoder catches the illegal byte. > > Any comments/suggestions would be appreciated. > > Thanks, > > - Corey > > On 7/27/20 6:49 PM, Corey Ashford wrote: > > Michihiro Horie uploaded a new revision of the Base64 decodeBlock > > intrinsic API for me: > > > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ > > > > It has the following changes with respect to the original one posted: > > > > ?* In the event of encountering a non-base64 character, instead of > > having a separate error code of -1, the intrinsic can now just return > > either 0, or the number of data bytes produced up to the point where the > > illegal base64 character was encountered.? This reduces the number of > > special cases, and also provides a way to speed up the process of > > finding the bad character by the slower, pure-Java algorithm. > > > > ?* The isMIME boolean is removed from the API for two reasons: > > ?? - The current API is not sufficient to handle the isMIME case, > > because there isn't a strict relationship between the number of input > > bytes and the number of output bytes, because there can be an arbitrary > > number of non-base64 characters in the source. > > ?? - If an intrinsic only implements the (isMIME == false) case as ours > > does, it will always return 0 bytes processed, which will slightly slow > > down the normal path of processing an (isMIME == true) instantiation. > > ?? - We considered adding a separate hotspot candidate for the (isMIME > > == true) case, but since we don't have an intrinsic implementation to > > test that, we decided to leave it as a future optimization. > > > > Comments and suggestions are welcome.? Thanks for your consideration. > > > > - Corey > > > > On 6/23/20 6:23 PM, Michihiro Horie wrote: > >> Hi Corey, > >> > >> Following is the issue I created. 
> >> https://bugs.openjdk.java.net/browse/JDK-8248188 > >> > >> I will upload a webrev when you're ready as we talked in private. > >> > >> Best regards, > >> Michihiro > >> > >> Inactive hide details for "Corey Ashford" ---2020/06/24 > >> 09:40:10---Currently in java.util.Base64, there is a > >> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently > >> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for > >> encodeBlock, but no > >> > >> From: "Corey Ashford" > >> To: "hotspot-compiler-dev at openjdk.java.net" > >> , > >> "ppc-aix-port-dev at openjdk.java.net" dev at openjdk.java.net> > >> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori > Ogata/Japan/IBM at IBMJP, > >> joserz at br.ibm.com > >> Date: 2020/06/24 09:40 > >> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for > >> Base64 decoding > >> > >> ------------------------------------------------------------------------ > >> > >> > >> > >> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and > >> API for encodeBlock, but none for decoding. ?This means that only > >> encoding gets acceleration from the underlying CPU's vector hardware. > >> > >> I'd like to propose adding a new intrinsic for decodeBlock. ?The > >> considerations I have for this new intrinsic's API: > >> > >> ??* Don't make any assumptions about the underlying capability of the > >> hardware. ?For example, do not impose any specific block size > >> granularity. > >> > >> ??* Don't assume the underlying intrinsic can handle isMIME or isURL > >> modes, but also let them decide if they will process the data regardless > >> of the settings of the two booleans. > >> > >> ??* Any remaining data that is not processed by the intrinsic will be > >> processed by the pure Java implementation. ?This allows the intrinsic to > >> process whatever block sizes it's good at without the complexity of > >> handling the end fragments. > >> > >> ??* If any illegal character is discovered in the decoding process, the > >> intrinsic will simply return -1, instead of requiring it to throw a > >> proper exception from the context of the intrinsic. ?In the event of > >> getting a -1 returned from the intrinsic, the Java Base64 library code > >> simply calls the pure Java implementation to have it find the error and > >> properly throw an exception. ?This is a performance trade-off in the > >> case of an error (which I expect to be very rare). > >> > >> ??* One thought I have for a further optimization (not implemented in > >> the current patch), is that when the intrinsic decides not to process a > >> block because of some combination of isURL and isMIME settings it > >> doesn't handle, it could return extra bits in the return code, encoded > >> as a negative number. ?For example: > >> > >> Illegal_Base64_char ? = 0b001; > >> isMIME_unsupported ? ?= 0b010; > >> isURL_unsupported ? ? = 0b100; > >> > >> These can be OR'd together as needed and then negated (flip the sign). > >> The Base64 library code could then cache these flags, so it will know > >> not to call the intrinsic again when another decodeBlock is requested > >> but with an unsupported mode. ?This will save the performance hit of > >> calling the intrinsic when it is guaranteed to fail. > >> > >> I've tested the attached patch with an actual intrinsic coded up for > >> Power9/Power10, but those runtime intrinsics and arch-specific patches > >> aren't attached today. ?I want to get some consensus on the > >> library-level intrinsic API first. 
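To make the shape of the API being discussed concrete, here is a rough, self-contained sketch of the control flow around such a candidate. The method name, signature and plain-Java fallback below are illustrative assumptions drawn from the description above, not the code in the webrev:

import java.util.Arrays;
import java.util.Base64;

public class DecodeBlockSketch {

    // Intrinsic candidate: may decode some multiple-of-four prefix of
    // src[sp, sl) into dst starting at dp and returns the number of bytes
    // written. Returning 0 (or a partial count up to an illegal byte) is
    // always legal; the caller finishes the work.
    private static int decodeBlock(byte[] src, int sp, int sl,
                                   byte[] dst, int dp, boolean isURL) {
        return 0; // plain-Java behaviour: decode nothing, defer to the caller
    }

    static int decode(byte[] src, int sp, int sl, byte[] dst, int dp, boolean isURL) {
        int written = decodeBlock(src, sp, sl, dst, dp, isURL);
        int consumed = (written / 3) * 4;   // 4 input bytes produce 3 output bytes
        byte[] tail = Arrays.copyOfRange(src, sp + consumed, sl);
        byte[] rest = (isURL ? Base64.getUrlDecoder() : Base64.getDecoder()).decode(tail);
        System.arraycopy(rest, 0, dst, dp + written, rest.length);
        return written + rest.length;
    }

    public static void main(String[] args) {
        byte[] encoded = Base64.getEncoder().encode("hello, decodeBlock".getBytes());
        byte[] out = new byte[32];
        int n = decode(encoded, 0, encoded.length, out, 0, false);
        System.out.println(new String(out, 0, n));
    }
}

Because the candidate is free to stop early, the contract stays simple: whatever it does not consume is handled by the existing scalar path, including padding and error reporting.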
> >> > >> Also attached is a simple test case to test that the new intrinsic API > >> doesn't break anything. > >> > >> I'm open to any comments about this. > >> > >> Thanks for your consideration, > >> > >> - Corey > >> > >> > >> Corey Ashford > >> IBM Systems, Linux Technology Center, OpenJDK team > >> cjashfor at us dot ibm dot com > >> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro > >> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro > >> Horie/Japan/IBM] > >> > >> > > From tobias.hartmann at oracle.com Tue Aug 25 13:49:43 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 15:49:43 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: <87wo1n6snc.fsf@redhat.com> References: <87wo1n6snc.fsf@redhat.com> Message-ID: Hi Roland, Good catch, the fix looks reasonable to me. I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to conflicting GC options if another GC is set. Best regards, Tobias On 25.08.20 10:23, Roland Westrelin wrote: > > https://bugs.openjdk.java.net/browse/JDK-8252292 > http://cr.openjdk.java.net/~roland/8252292/webrev.00/ > > In 8240795, I modified alias analysis so non escaping allocations don't > alias with bottom memory. While browsing that code last week, I noticed > that that change didn't seem quite right and may cause some > anti-dependences to be missed. I could indeed write a test case that > fails with an incorrect execution. > > In the test case: the dst[9] load after the ArrayCopy is transformed > into a src[9] load before the ArrayCopy. Anti dependence analysis find > src[9] shares the memory of the ArrayCopy but because of the way I > tweaked the code with 8240795, anti-dependence analysis finds the src[9] > and ArrayCopy don't alias so src[9] can sink out of the loop which is > wrong because of the src[9] store. Anti-dependence analysis in that case > would need to look at the memory uses of ArrayCopy too. > > Roland. > From tobias.hartmann at oracle.com Tue Aug 25 14:06:38 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 16:06:38 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: On 25.08.20 07:23, John Rose wrote: > Putting these together, and choosing a round number > which prioritizes concern (b) by moving closer to the > limit of (a), if I had one more run to do I?d choose > -XX: StressLongCountedLoop=20000000. > > If I were to do multiple runs, I might choose vary that > stress parameter by adding and subtracting a couple > of zeroes: > > -XX: StressLongCountedLoop=200000 > -XX: StressLongCountedLoop=2000000 > -XX: StressLongCountedLoop=20000000 > -XX: StressLongCountedLoop=200000000 > -XX: StressLongCountedLoop=2000000000 Okay, thanks, I'll run some more testing with these values. Will report back once it finished. 
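For readers following the thread: the transformation being stress-tested here turns a loop with a long induction variable into a nest whose inner loop is an ordinary int counted loop, which C2 already knows how to optimize; StressLongCountedLoop additionally promotes int loops to the long form so the conversion is exercised everywhere. A hand-written analogue of the target shape, purely for illustration (the chunk bound is arbitrary and the real IR transformation differs in detail):

public class LongLoopNestSketch {

    // Before: a long induction variable, not a counted loop on its own.
    static long sumBefore(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Conceptual "after": a long outer loop advancing in int-sized chunks,
    // with an int inner counted loop doing the iterations.
    static long sumAfter(long n) {
        long sum = 0;
        for (long base = 0; base < n; ) {
            int chunk = (int) Math.min(n - base, 1_000_000_000L);
            for (int j = 0; j < chunk; j++) {
                sum += base + j;
            }
            base += chunk;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBefore(1000) == sumAfter(1000)); // true
    }
}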
> If any of those runs kicks out a bug or other suspicious behavior, > it should be added to a permanent test list. Earlier runs with 1 and 4294967295 already found bugs. I think we should add a selection of stress values to higher CI tiers. Best regards, Tobias From rwestrel at redhat.com Tue Aug 25 14:13:24 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 16:13:24 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: References: Message-ID: <87o8my7r0b.fsf@redhat.com> > In the testcase, a LoadSNode is cloned in > PhaseIdealLoop::split_if_with_blocks_post() for each use such that they > can float out of a loop. To ensure that these loads cannot float back > into the loop, we pin them by setting their control input [1]. In the > testcase, all 3 new clones are pinned to a loop exit node that is part > of an outer strip mined loop (see [2]). Do I understand this right, that all 3 clones are pinned with the same control? So they common and only of them is kept? Roland. From rwestrel at redhat.com Tue Aug 25 14:21:55 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 16:21:55 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: <87k0xm7qm4.fsf@redhat.com> > Good catch, the fix looks reasonable to me. Thanks for the review. > I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to > conflicting GC options if another GC is set. Indeed. I'll make that change before I push the fix. Roland. From igor.ignatyev at oracle.com Tue Aug 25 14:25:26 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 25 Aug 2020 07:25:26 -0700 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: > On Aug 25, 2020, at 6:49 AM, Tobias Hartmann wrote: > > Hi Roland, > > Good catch, the fix looks reasonable to me. > > I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to > conflicting GC options if another GC is set. Hi Roland, '@requires vm.gc.Parallel' should be used to limit execution of the test to configurations where ParallelGC is available and selectable (meaning no GC has been explicitly specified or explicitly specified GC is Parallel). -- Igor > > Best regards, > Tobias > > On 25.08.20 10:23, Roland Westrelin wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8252292 >> http://cr.openjdk.java.net/~roland/8252292/webrev.00/ >> >> In 8240795, I modified alias analysis so non escaping allocations don't >> alias with bottom memory. While browsing that code last week, I noticed >> that that change didn't seem quite right and may cause some >> anti-dependences to be missed. I could indeed write a test case that >> fails with an incorrect execution. >> >> In the test case: the dst[9] load after the ArrayCopy is transformed >> into a src[9] load before the ArrayCopy. Anti dependence analysis find >> src[9] shares the memory of the ArrayCopy but because of the way I >> tweaked the code with 8240795, anti-dependence analysis finds the src[9] >> and ArrayCopy don't alias so src[9] can sink out of the loop which is >> wrong because of the src[9] store. Anti-dependence analysis in that case >> would need to look at the memory uses of ArrayCopy too. >> >> Roland. 
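Igor's suggestion amounts to a test header along these lines; the class name, bug line and summary are a hypothetical sketch, not the actual test being pushed:

/*
 * @test
 * @bug 8252292
 * @summary skeleton only: run with Parallel GC and let jtreg skip the test
 *          when a different GC has been selected explicitly
 * @requires vm.gc.Parallel
 * @run main/othervm -XX:+UseParallelGC compiler.arraycopy.AntiDependenceSketch
 */
package compiler.arraycopy;

public class AntiDependenceSketch {
    public static void main(String[] args) {
        // test body elided; the point is the @requires line above
    }
}

With vm.gc.Parallel the test runs when no GC was specified or when Parallel was, which avoids the conflicting-flags failure Tobias pointed out.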
>> From rwestrel at redhat.com Tue Aug 25 14:31:16 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 16:31:16 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: <87h7sq7q6j.fsf@redhat.com> Hi Igor, > '@requires vm.gc.Parallel' should be used to limit execution of the > test to configurations where ParallelGC is available and selectable > (meaning no GC has been explicitly specified or explicitly specified > GC is Parallel). Thanks for the clarification. I'll go with your suggestion. Roland. From aph at redhat.com Tue Aug 25 14:55:40 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 15:55:40 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> Message-ID: On 25/08/2020 10:47, Boris Ulasevich wrote: > Ok. Can you please check that my patch [1] has been applied > and built correctly. With my change I see this picture: > > ....[Hottest Region 2]........................................... > c2, level 4, org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub, > > ??????????? 0x0000ffff84584dac:?? add??? x11, x14, #0x94 > > ??????????? 0x0000ffff84584db0:?? stp??? x21, x19, [sp] > ??????????? 0x0000ffff84584db4:?? stp??? x20, x14, [sp, #16] > ??????????? 0x0000ffff84584db8:?? stp??? x15, x10, [sp, #32] > ??????????? 0x0000ffff84584dbc:?? str??? x11, [sp, #48] > ??????????? 0x0000ffff84584dc0:?? b??? 0x0000ffff84584dd8 > ??????????? 0x0000ffff84584dc4:?? nop > ??????????? 0x0000ffff84584dc8:?? nop > ??????????? 0x0000ffff84584dcc:?? nop > ? 3.64%? ?? 0x0000ffff84584dd0:?? str??? x19, [sp, #16] > ? 0.07%? ?? 0x0000ffff84584dd4:?? mov??? x16, x29 > ???????? ?? 0x0000ffff84584dd8:?? ldr??? w10, [x16, #12] ;*invokestatic bfm > ? 3.92%? ?? 0x0000ffff84584ddc:?? ldr??? w12, [x16, #24] > ? 4.69%? ?? 0x0000ffff84584de0:?? and??? w2, w10, #0xff > ? 0.03%? ?? 0x0000ffff84584de4:?? mov??? x29, x16 > ? 0.02%? ?? 0x0000ffff84584de8:?? bfi??? x2, x12, #8, #8???? ;*ior {reexecute=0 rethrow=0 return_oop=0} > ???????? ??????????????????????????????????????????????????? ; - My apologies, I must have messed the patch up. I rebuilt cleanly. One odd thing, though, is that it only works with some forms, and not necessarily the most common ones. Good: @Benchmark public static int bfm(Result r) { return (r.a & 0xFF) | ((r.b & 0xFF) << 8); } 8.13% ? 0x0000fffface550f0: and w2, w11, #0xff 0.69% ? 0x0000fffface550f4: bfi x2, x10, #8, #8 ;*ior {reexecute=0 rethrow=0 return_oop=0} Not so good: @Benchmark public static int shift_bfm(Result r) { return ((r.a << 24 >>> 24) | (r.b << 24 >>> 16)); } 8.56% ? 0x0000ffff88e50e70: lsl w12, w11, #24 ? 0x0000ffff88e50e74: and w10, w10, #0xff 8.59% ? 0x0000ffff88e50e78: orr w2, w10, w12, lsr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} @Benchmark public static int shift_sbfm(Result r) { return ((r.a << 24 >>> 24) | (r.b << 24 >> 16)); } 9.40% ? 0x0000ffff84e51070: lsl w12, w11, #24 0.12% ? 0x0000ffff84e51074: and w10, w10, #0xff 8.06% ? 
0x0000ffff84e51078: orr w2, w10, w12, asr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} Does this matter? Bits.java uses the (a & 0xff) | ((b & 0xFF) << 8) idiom so maybe we don't care about the shift left followed by shift right form. But it feels to me a bit unsatisfactory to miss it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Tue Aug 25 15:29:50 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 25 Aug 2020 18:29:50 +0300 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: <0f0a8779-0d70-7bd8-e302-f83fbefee24c@oracle.com> > Just curious, which analyzer are you using? One of the other bugs [1] filed by Aleksey mentions CLion. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8252237 "CLion static analyzer highlights this oddity." From aph at redhat.com Tue Aug 25 16:55:38 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 17:55:38 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: On 24/08/2020 22:52, Dmitry Chuyko wrote: > > I added two more intrinsics -- for copySign, they are controlled by > UseCopySignIntrinsic flag. > > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/ > > It also contains 'benchmarks' directory: > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/benchmarks/ > > There are 8 benchmarks there: (double | float) x (blackhole | reduce) x > (current j.l.Math.signum | abs()>0 check). > > My results on Arm are in signum-facgt-copysign.ods. Main case is > 'random' which is actually a random from positive and negative numbers > between -0.5 and +0.5. > > Basically we have ~14% improvement in 'reduce' benchmark variant but > ~20% regression in 'blackhole' variant in case of only copySign() > intrinsified. > > Same picture if abs()>0 check is used in signum() (+-5%). This variant > is included as it shows very good results on x86. > > Intrinsic for signum() gives improvement of main case in both > 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a > noticeable difference. Ignoring Blackhole for the moment, this is what I'm seeing for the reduction/random case: Benchmark Mode Cnt Score Error Units ThunderX 2: -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.456 ? 0.065 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.766 ? 0.107 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.537 ? 0.770 ns/op Neoverse N1 (Actually Amazon m6g.16xlarge): -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.173 ? 0.001 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.043 ? 0.022 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.012 ? 0.001 ns/op By your own numbers, in the reduce benchmark the signum intrinsic is worse than default for all 0 and NaN, but about 12% better for random, >0, and <0. If you take the average of the sppedups and slowdowns it's actually worse than default. 
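For context, the "reduce" flavour of these benchmarks has roughly the following JMH shape; the names, seed and array size are assumptions for illustration, not Dmitry's actual DoubleReduceBench:

import java.util.SplittableRandom;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class SignumReduceSketch {

    double[] random;   // mixed positive and negative values in (-0.5, 0.5)

    @Setup
    public void setup() {
        random = new SplittableRandom(42).doubles(1024, -0.5, 0.5).toArray();
    }

    // "reduce" variant: results are accumulated, so per-element stores and
    // blackhole calls stay out of the measured loop.
    @Benchmark
    public double ofRandom() {
        double acc = 0.0;
        for (double v : random) {
            acc += Math.signum(v);
        }
        return acc;
    }

    // The abs() > 0 variant mentioned in the thread, for comparison.
    @Benchmark
    public double ofRandomAbsCheck() {
        double acc = 0.0;
        for (double v : random) {
            acc += (Math.abs(v) > 0.0) ? Math.copySign(1.0, v) : v;
        }
        return acc;
    }
}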
By my reckoning, if you take all possibilities (Nan, <0, >0, 0, Random) into account, the best-performing on the reduce test is actually Abs/Copysign, but there's very little in it. The only time that the signum intrinsic actually wins is when you're storing the result into memory *and* flushing the store buffer. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From boris.ulasevich at bell-sw.com Tue Aug 25 17:30:02 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 20:30:02 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> Message-ID: <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> Andrew, Thanks for looking into this. I believe masking with left shift and right shift is not common. Search though jdk repository does not give such patterns while there is a hundreds of mask+lshift expressions. I implemented a simple is_bitrange_zero() method for counting the bitranges of sub-expressions: power-of-two masks and left shift only. We can take into account more cases (careful testing is a main concern). But particularly about "r.a << 24 >>> 24" expression I think it is worse to think about canonicalization: "left shift + right shift" to "mask + left shift" (or may be the backwards). regards, Boris On 25.08.2020 17:55, Andrew Haley wrote: > On 25/08/2020 10:47, Boris Ulasevich wrote: >> Ok. Can you please check that my patch [1] has been applied >> and built correctly. With my change I see this picture: >> >> ....[Hottest Region 2]........................................... >> c2, level 4, org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub, >> >> ??????????? 0x0000ffff84584dac:?? add??? x11, x14, #0x94 >> >> ??????????? 0x0000ffff84584db0:?? stp??? x21, x19, [sp] >> ??????????? 0x0000ffff84584db4:?? stp??? x20, x14, [sp, #16] >> ??????????? 0x0000ffff84584db8:?? stp??? x15, x10, [sp, #32] >> ??????????? 0x0000ffff84584dbc:?? str??? x11, [sp, #48] >> ??????????? 0x0000ffff84584dc0:?? b??? 0x0000ffff84584dd8 >> ??????????? 0x0000ffff84584dc4:?? nop >> ??????????? 0x0000ffff84584dc8:?? nop >> ??????????? 0x0000ffff84584dcc:?? nop >> ? 3.64%? ?? 0x0000ffff84584dd0:?? str??? x19, [sp, #16] >> ? 0.07%? ?? 0x0000ffff84584dd4:?? mov??? x16, x29 >> ???????? ?? 0x0000ffff84584dd8:?? ldr??? w10, [x16, #12] ;*invokestatic bfm >> ? 3.92%? ?? 0x0000ffff84584ddc:?? ldr??? w12, [x16, #24] >> ? 4.69%? ?? 0x0000ffff84584de0:?? and??? w2, w10, #0xff >> ? 0.03%? ?? 0x0000ffff84584de4:?? mov??? x29, x16 >> ? 0.02%? ?? 0x0000ffff84584de8:?? bfi??? x2, x12, #8, #8???? ;*ior {reexecute=0 rethrow=0 return_oop=0} >> ???????? ??????????????????????????????????????????????????? ; - > My apologies, I must have messed the patch up. I rebuilt cleanly. One odd thing, > though, is that it only works with some forms, and not necessarily the most > common ones. > > Good: > > @Benchmark > public static int bfm(Result r) { > return (r.a & 0xFF) | ((r.b & 0xFF) << 8); > } > > 8.13% ? 0x0000fffface550f0: and w2, w11, #0xff > 0.69% ? 
0x0000fffface550f4: bfi x2, x10, #8, #8 ;*ior {reexecute=0 rethrow=0 return_oop=0} > > Not so good: > > @Benchmark > public static int shift_bfm(Result r) { > return ((r.a << 24 >>> 24) | (r.b << 24 >>> 16)); > } > > 8.56% ? 0x0000ffff88e50e70: lsl w12, w11, #24 > ? 0x0000ffff88e50e74: and w10, w10, #0xff > 8.59% ? 0x0000ffff88e50e78: orr w2, w10, w12, lsr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} > > @Benchmark > public static int shift_sbfm(Result r) { > return ((r.a << 24 >>> 24) | (r.b << 24 >> 16)); > } > > 9.40% ? 0x0000ffff84e51070: lsl w12, w11, #24 > 0.12% ? 0x0000ffff84e51074: and w10, w10, #0xff > 8.06% ? 0x0000ffff84e51078: orr w2, w10, w12, asr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} > > Does this matter? Bits.java uses the (a & 0xff) | ((b & 0xFF) << 8) idiom so maybe > we don't care about the shift left followed by shift right form. But it feels > to me a bit unsatisfactory to miss it. From christian.hagedorn at oracle.com Tue Aug 25 17:42:38 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 25 Aug 2020 19:42:38 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <87o8my7r0b.fsf@redhat.com> References: <87o8my7r0b.fsf@redhat.com> Message-ID: <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> On 25.08.20 16:13, Roland Westrelin wrote: > >> In the testcase, a LoadSNode is cloned in >> PhaseIdealLoop::split_if_with_blocks_post() for each use such that they >> can float out of a loop. To ensure that these loads cannot float back >> into the loop, we pin them by setting their control input [1]. In the >> testcase, all 3 new clones are pinned to a loop exit node that is part >> of an outer strip mined loop (see [2]). > > Do I understand this right, that all 3 clones are pinned with the same > control? So they common and only of them is kept? Yes, exactly. All are pinned to the inner loop exit node. But at the time we hit the assertion failure, we still got one cloned load (903 LoadS) that is an input to the store (575 StoreI) that's going into the outer strip mined loop safepoint, and one load (901 LoadS) that is triggering the dominance failure. LoadS 902 was removed at some point in between due to other optimizations. Best regards, Christian From john.r.rose at oracle.com Tue Aug 25 19:04:53 2020 From: john.r.rose at oracle.com (John Rose) Date: Tue, 25 Aug 2020 12:04:53 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87r1ru7sop.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> <87r1ru7sop.fsf@redhat.com> Message-ID: <18900602-6B03-483E-986B-30C8153F9F6F@oracle.com> On Aug 25, 2020, at 6:37 AM, Roland Westrelin wrote: > > >> Putting these together, and choosing a round number >> which prioritizes concern (b) by moving closer to the >> limit of (a), if I had one more run to do I?d choose >> -XX: StressLongCountedLoop=20000000. 
>> >> If I were to do multiple runs, I might choose vary that >> stress parameter by adding and subtracting a couple >> of zeroes: >> >> -XX: StressLongCountedLoop=200000 >> -XX: StressLongCountedLoop=2000000 >> -XX: StressLongCountedLoop=20000000 >> -XX: StressLongCountedLoop=200000000 >> -XX: StressLongCountedLoop=2000000000 > > FWIW, I ran my own tests with -XX: StressLongCountedLoop=10000. Tobias > runs did catch failures I didn't run into. > >> Separately from those issues, we know that the stress mode >> converts 32-bit loops into 64-bit loops, which then re-nest >> using the new logic. But, are we confident that this re-nesting >> works? Roland did some manual testing to make sure the >> test works as intended, but it would be good to run the above >> stress tests with some sort of logging that ensures that there >> are at least ?lots and lots? of successful 32-to-64 loop conversions. >> If those loop conversions fail (staying at 64 bits) the tests will >> pass, but they won?t be testing what we need to be testing. > > What about using the new statistics? > A CTW run of the base module reports: long loops=11/36 > A CTW run of the base module with -XX:StressLongCountedLoop=1000 reports: long loops=3271/3410 > > Granted, the first counter is only incremented once the loop nest is > created but not when the inner loop is converted to a counted loop. On > another run with a third counter incremented on counted loop creation: > > 2889/2971/3106 > > (that not all created inner loops are transformed to counted loops is > strange. Maybe some become dead between the 2 steps.) Yes, that?s the sort of manual testing I was referring to. The numbers you show are a reasonable value of ?lots and lots? for a CTW run. Who knows what the conversion rates are for real applications driven by profiles. I?m curious what they are but I suppose we can live without them. We don?t have AFAIK a way to set up a special cumulative report for a tier of testing on those parameters. I guess we are several levels of improvement short of being able to set up a probe across a set of tests and roll up statistics from it. So, with Tobias running those extra tests (the ?20s?) we are more than good. Thanks! ? John From john.r.rose at oracle.com Tue Aug 25 19:09:43 2020 From: john.r.rose at oracle.com (John Rose) Date: Tue, 25 Aug 2020 12:09:43 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <18900602-6B03-483E-986B-30C8153F9F6F@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> <87r1ru7sop.fsf@redhat.com> <18900602-6B03-483E-986B-30C8153F9F6F@oracle.com> Message-ID: <4EF06EFC-9790-4B46-AA1D-E688C571D171@oracle.com> On Aug 25, 2020, at 12:04 PM, John Rose wrote: > > I guess we are > several levels of improvement short of being able to set up > a probe across a set of tests and roll up statistics from it. P.S. Those levels might be: 1. Plumb our ad hoc statistics into JFR and/or BPF publication points. 
(Either recode, or write some sort of log-file stripper.) 2. Fit the JVM with a side channel to manage external connections to said publication points. 3. Fit the side channel to off-the-shelf tools for log data aggregation. 4. Ensure that our testing framework has options for hooking up said aggregation tools to the test jobs. From igor.ignatyev at oracle.com Wed Aug 26 01:01:44 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 25 Aug 2020 18:01:44 -0700 Subject: RFR(M/S) : 8251127 : clean up FileInstaller $test.src $cwd in remaining vmTestbase_vm_compiler tests : Message-ID: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ > 560 lines changed: 132 ins; 367 del; 61 mod; Hi all, could you please review the patch which removes FileInstaller actions from :vmTestbase_vm_compiler? the biggest chunk of the patch is just removal for '@run jdk.test.lib.FileInstaller' produced by sed '/jdk.test.lib.FileInstaller \. \./d'. human-made changes are: - moving jtreg test descriptions to the test source in t108-t113, corresponding changes in TEST.quick-groups and fixing line numbers in t108-t113.gold files - adding -Dtest.src=${test.src} to the tests which use ExecDriver (t087,t088,t108-t113), so GoldChecker would be able to find .gold file testing: :vmTestbase_vm_compiler JBS: https://bugs.openjdk.java.net/browse/JDK-8251127 webrev: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ Thanks, -- Igor From vladimir.kozlov at oracle.com Wed Aug 26 01:10:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 25 Aug 2020 18:10:40 -0700 Subject: RFR(M/S) : 8251127 : clean up FileInstaller $test.src $cwd in remaining vmTestbase_vm_compiler tests : In-Reply-To: References: Message-ID: <5859dffd-9ed9-21d3-102b-3070013d7fe0@oracle.com> Good. Thanks, Vladimir K On 8/25/20 6:01 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ >> 560 lines changed: 132 ins; 367 del; 61 mod; > > Hi all, > > could you please review the patch which removes FileInstaller actions from :vmTestbase_vm_compiler? > > the biggest chunk of the patch is just removal for '@run jdk.test.lib.FileInstaller' produced by sed '/jdk.test.lib.FileInstaller \. \./d'. human-made changes are: > - moving jtreg test descriptions to the test source in t108-t113, corresponding changes in TEST.quick-groups and fixing line numbers in t108-t113.gold files > - adding -Dtest.src=${test.src} to the tests which use ExecDriver (t087,t088,t108-t113), so GoldChecker would be able to find .gold file > > testing: :vmTestbase_vm_compiler > JBS: https://bugs.openjdk.java.net/browse/JDK-8251127 > webrev: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ > > Thanks, > -- Igor > > > > From shade at redhat.com Wed Aug 26 07:28:04 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 09:28:04 +0200 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag In-Reply-To: References: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> Message-ID: <6d334364-c26c-13b5-b804-7d61d8fad8d4@redhat.com> On 8/25/20 2:43 PM, Tobias Hartmann wrote: > looks good to me. Thanks! I'll wait a bit for more opinions on this. -- -Aleksey From shade at redhat.com Wed Aug 26 07:30:19 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 09:30:19 +0200 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: On 8/25/20 2:46 PM, Tobias Hartmann wrote: > +1 Thanks, pushed. 
> On 25.08.20 11:28, Reingruber, Richard wrote: >> Static code inspection complains the enum below is unused. > > Just curious, which analyzer are you using? Yup, CLion analyzers. They highlight all sorts of errors when I browse the code :) -- Thanks, -Aleksey From shade at redhat.com Wed Aug 26 07:30:17 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 09:30:17 +0200 Subject: RFR (XS) 8252291: C2: Assignment in conditional in loopUnswitch.cpp In-Reply-To: <811ea63c-b897-ccf0-2559-82842b52e4be@oracle.com> References: <811ea63c-b897-ccf0-2559-82842b52e4be@oracle.com> Message-ID: <173e2b6a-e1ce-32b3-8ded-f84077e55979@redhat.com> On 8/25/20 2:44 PM, Tobias Hartmann wrote: > looks good and trivial to me. Thanks, pushed. -- -Aleksey From shade at redhat.com Wed Aug 26 08:06:48 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 10:06:48 +0200 Subject: RFR (XS) 8252362: C2: Remove no-op checking for callee-saved-floats Message-ID: Cleanup: https://bugs.openjdk.java.net/browse/JDK-8252362 The block below does not do anything, because there are no side-effects anywhere, and then callee_saved_floats is left unused. It is this way since the initial load. I believe C2 (matching) code just uses SOE/SOC info from .ad. Anyhow, I cannot find where the rest of runtime codifies SOE/SOC registers to check here. There are plenty of hand-enumerated registers in, say, macroAssembler-s. I think it is cleaner to remove the block: diff -r e12584d50765 src/hotspot/share/opto/c2compiler.cpp --- a/src/hotspot/share/opto/c2compiler.cpp Wed Aug 26 09:29:46 2020 +0200 +++ b/src/hotspot/share/opto/c2compiler.cpp Wed Aug 26 10:02:48 2020 +0200 @@ -64,14 +64,4 @@ } - // Check that runtime and architecture description agree on callee-saved-floats - bool callee_saved_floats = false; - for( OptoReg::Name i=OptoReg::Name(0); i References: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> Message-ID: <003801d67b88$04e60340$0eb209c0$@alibaba-inc.com> Andrew, thanks a lot for your review. Ningsheng, could you please help push this change? Best Regards, Joshua > -----Original Message----- > From: Andrew Haley > Sent: 2020?8?25? 19:53 > To: Joshua Zhu ; hotspot-compiler- > dev at openjdk.java.net > Cc: aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of > FLOATPRESSURE > > On 25/08/2020 07:03, Joshua Zhu wrote: > > Therefore I propose the default value of FLOATPRESSURE be 32 because > > there are 32 float/SIMD registers on aarch64 and also the value of > > register pressure is the same as 1 for each LRG of > > Op_RegL/Op_RegD/Op_Vec. [3] > > > > Could you please help review this change? > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 > > Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ > > Yes, thanks. I can't remember why FLOATPRESSURE is 64, but it certainly looks > like 32 is a much more sensible value. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Wed Aug 26 09:06:03 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 26 Aug 2020 11:06:03 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections References: <87d03d7pdk.fsf@redhat.com> Message-ID: <878se17p50.fsf@redhat.com> Should have gone to hotspot-compiler-dev as well... 
-------------------- Start of forwarded message -------------------- From: Roland Westrelin To: shenandoah-dev at openjdk.java.net Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections Date: Wed, 26 Aug 2020 11:00:55 +0200 http://cr.openjdk.java.net/~roland/8252296/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8252296 My fix for 8251527 has caused failures with shenandoah enabled because CallNode::extract_projections() is called with a graph in the process of being modified where a ProjNode has more than one control use. Roland. -------------------- End of forwarded message -------------------- From vladimir.x.ivanov at oracle.com Wed Aug 26 09:30:25 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 26 Aug 2020 12:30:25 +0300 Subject: RFR (XS) 8252362: C2: Remove no-op checking for callee-saved-floats In-Reply-To: References: Message-ID: <697ad989-ba33-7eb4-281e-3763e722fa10@oracle.com> Looks good and trivial. The code was added as part of JDK-6527187 [1], but it was useless from the very beginning. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-6527187 On 26.08.2020 11:06, Aleksey Shipilev wrote: > Cleanup: > ? https://bugs.openjdk.java.net/browse/JDK-8252362 > > The block below does not do anything, because there are no side-effects > anywhere, and then callee_saved_floats is left unused. It is this way > since the initial load. I believe C2 (matching) code just uses SOE/SOC > info from .ad. Anyhow, I cannot find where the rest of runtime codifies > SOE/SOC registers to check here. There are plenty of hand-enumerated > registers in, say, macroAssembler-s. > > I think it is cleaner to remove the block: > > diff -r e12584d50765 src/hotspot/share/opto/c2compiler.cpp > --- a/src/hotspot/share/opto/c2compiler.cpp???? Wed Aug 26 09:29:46 2020 > +0200 > +++ b/src/hotspot/share/opto/c2compiler.cpp???? Wed Aug 26 10:02:48 2020 > +0200 > @@ -64,14 +64,4 @@ > ?? } > > -? // Check that runtime and architecture description agree on > callee-saved-floats > -? bool callee_saved_floats = false; > -? for( OptoReg::Name i=OptoReg::Name(0); > i -??? // Is there a callee-saved float or double? > -??? if( register_save_policy[i] == 'E' /* callee-saved */ && > -?????? (register_save_type[i] == Op_RegF || register_save_type[i] == > Op_RegD) ) { > -????? callee_saved_floats = true; > -??? } > -? } > - > ?? DEBUG_ONLY( Node::init_NodeProperty(); ) > > Testing: local tier1 > From ningsheng.jian at arm.com Wed Aug 26 09:31:41 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Wed, 26 Aug 2020 17:31:41 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <5b452edb-2851-f35a-ac30-523d74d95851@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> <5b452edb-2851-f35a-ac30-523d74d95851@oracle.com> Message-ID: <15ea964d-6605-7ba4-63bc-e61007407ed8@arm.com> Hi Vladimir, On 8/25/20 8:12 PM, Vladimir Ivanov wrote: > [...] 
> > So, it's enough to use a single "virtual" slot to model XMM, YMM, and > ZMM registers all at once unless RA supports packing multiple smaller > vector values into a single register (separately managing lower and > upper parts of the register; e.g., YMM = XMM(hi):XMM(lo) ). Though > currently RA does support it, there are no code which utilizes that and > no plans to do that in the future. > > I believe the situation on AArch64 with NEON and SVE is similar. (And > scalable vectors make it harder to support packing in RA.) > Right. > ? (2) vector width matters only for spills/refills and reg2reg moves. > > Matcher does type capturing, so all vector mach nodes keep precise type > of the value they produce. On x86 it is heavily used later in code > emission phase, but RA still relies on ideal registers (Op_VecX et al). > I don't see why RA can't be migrated from ideal registers to types > (TypeVect) to determine vector size when performing spilling. > > From aforementioned observations, I conclude there should be a way to > declare a single ideal vector register (Op_Vec) which represents > full-width vector supported by the hardware and use captured vector > types (TypeVect instances) to guide RA and code generation. And that's > the state where I'd like to see vector support in C2 be moving to. > That may be true. I think we can move forward step-by-step for easy maintenance. > Regarding predicate registers, I haven't thought too much about them, so > I don't have a strong opinion about whether they should be a separate > entity (Op_RegVMask in your patch) or just treated as a vector of bits > (Op_Vec). > >>> So far, I see 2 main directions for RA work: >>> >>> ?? (a) support vectors of arbitrary size: >>> ???? (1) helps push the upper limit on the size (1024-bit) >>> ???? (2) handle non-power-of-2 sizes >>> >>> ?? (b) optimize RA implementation for large values >>> >>> Anything else? >>> >> >> Yes, and it's not just vector. SVE predicate register has scalable >> size (vector_size/8) as well. We also have predicate register >> allocator support well with proposed approach (not in this patch.). > > Though with AVX512 support predicate register support was left aside, I > agree that predicate registers should be taken into account from the > very beginning. (And glad to hear you are already working on supporting > them!) > As that's one of the main feature of SVE, we have to do that. :-) With initial SVE support in, our further work on that could be easier. > Also, I believe options #1/#2 may be extended to cover predicate > registers as well without too much effort. > >>> Speaking of (a), in particular, I don't see why possible solution for >>> it should not supersede vecX et al altogether. >>> >>> Also, I may be wrong, but I don't see a clear evidence there's a >>> pressing need to have all of that fixed right from the beginning. >>> (That's why I put #1 and #2 options on the table.) Starting with >>> #1/#2 would untie initial SVE support from the exploratory work >>> needed to choose the most appropriate solution for (a) and (b). >>> >> >> Staring from partial SVE register support might be acceptable for >> initial patch (Andrew may not agree :-)), but I think we may end up >> with more follow-up work, given that our proposed approach already >> supports SVE well in terms of (a) and (b). If there's no other >> solution, would it be possible to use current proposed method? 
It's >> not difficult to backout our changes in register allocation part, if >> we find other better solution to support arbitrary vector/predicate >> sizes in future, as the patch there is actually not big IMO. > > Unfortunately, temporary solutions usually end up as permanent ones > since there's much less motivation to replace them (and harder to > justify the effort) after initial pressure is relieved. > > I'm OK with the proposed patch if we agree it's a stop-the-gap/temporary > solution to the immediate problems you face with initial SVE support and > are ready to commit resources into replacing it. > Yes, we will continue to maintain and improve it. Our idea might be Arm biased :), so we will need collaborations and suggestions from the community. > That's why I think it's the right time to discuss general direction, > work on a plan, and use it to guide the coordinated effort to improve > vector support in C2. > > Also, considering it a stop-the-gap solution means we should strive for > the simplest solution and that's another reason I put #1/#2 options on > the table to consider. > > [...] > >>> Any new problems/hitting some limitations envisioned when spilling >>> large number of huge vectors (2048-bit) on stack? >>> >> >> I haven't seen any so far. > > Ok, good to know. > > I was curious whether stack representation should also move away from > 32-bit slots to a more compact representation. > I think that's possible, if we could also have the alignment handled. Thanks, Ningsheng From shade at redhat.com Wed Aug 26 09:37:47 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 11:37:47 +0200 Subject: RFR (XS) 8252362: C2: Remove no-op checking for callee-saved-floats In-Reply-To: <697ad989-ba33-7eb4-281e-3763e722fa10@oracle.com> References: <697ad989-ba33-7eb4-281e-3763e722fa10@oracle.com> Message-ID: <3df1803d-84f2-b84f-bbd4-859dddd769e7@redhat.com> On 8/26/20 11:30 AM, Vladimir Ivanov wrote: > Looks good and trivial. Ack. I'll wait a bit and then push. > The code was added as part of JDK-6527187 [1], but it was useless from > the very beginning. Ah. Thanks for digging into pre-OpenJDK history. Added that breadcrumb to the JIRA. -- Thanks, -Aleksey From Ningsheng.Jian at arm.com Wed Aug 26 09:43:23 2020 From: Ningsheng.Jian at arm.com (Ningsheng Jian) Date: Wed, 26 Aug 2020 09:43:23 +0000 Subject: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of FLOATPRESSURE In-Reply-To: <003801d67b88$04e60340$0eb209c0$@alibaba-inc.com> References: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> <003801d67b88$04e60340$0eb209c0$@alibaba-inc.com> Message-ID: Pushed. Regards, Ningsheng > -----Original Message----- > From: Joshua Zhu > Sent: Wednesday, August 26, 2020 5:05 PM > To: 'Andrew Haley' ; hotspot-compiler-dev at openjdk.java.net; > Ningsheng Jian > Cc: aarch64-port-dev at openjdk.java.net > Subject: RE: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of > FLOATPRESSURE > > Andrew, thanks a lot for your review. > Ningsheng, could you please help push this change? > > Best Regards, > Joshua > > > -----Original Message----- > > From: Andrew Haley > > Sent: 2020?8?25? 
19:53 > > To: Joshua Zhu ; hotspot-compiler- > > dev at openjdk.java.net > > Cc: aarch64-port-dev at openjdk.java.net > > Subject: Re: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default > > value of FLOATPRESSURE > > > > On 25/08/2020 07:03, Joshua Zhu wrote: > > > Therefore I propose the default value of FLOATPRESSURE be 32 because > > > there are 32 float/SIMD registers on aarch64 and also the value of > > > register pressure is the same as 1 for each LRG of > > > Op_RegL/Op_RegD/Op_Vec. [3] > > > > > > Could you please help review this change? > > > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 > > > Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ > > > > Yes, thanks. I can't remember why FLOATPRESSURE is 64, but it > > certainly looks like 32 is a much more sensible value. > > > > -- > > Andrew Haley (he/him) > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > https://keybase.io/andrewhaley > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Wed Aug 26 11:10:41 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 26 Aug 2020 13:10:41 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: <78c28a8c-8a7b-f10d-95e9-e583a278b03c@oracle.com> Hi Tobias Thank you for your review! On 25.08.20 14:37, Tobias Hartmann wrote: > Hi Christian, > > On 19.08.20 16:06, Christian Hagedorn wrote: >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ > Looks good to me, just noticed some style issues (no new webrev required): > > c1_LinearScan.cpp: > - Wrong indentation in lines 5445, 5509, 5681 Thanks, fixed it inline. > TestTraceLinearScanLevel.java: > - "... in a HelloWorld program". It's not a HelloWorld program, right? ;) Oh, you're right! Should have written "... in a *silent* HelloWorld program" :-) Best regards, Christian From christian.hagedorn at oracle.com Wed Aug 26 12:43:20 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 26 Aug 2020 14:43:20 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <878se17p50.fsf@redhat.com> References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> Message-ID: <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> Hi Roland Looks good and trivial to me. Best regards, Christian On 26.08.20 11:06, Roland Westrelin wrote: > > Should have gone to hotspot-compiler-dev as well... > > -------------------- Start of forwarded message -------------------- > From: Roland Westrelin > To: shenandoah-dev at openjdk.java.net > Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections > Date: Wed, 26 Aug 2020 11:00:55 +0200 > > > http://cr.openjdk.java.net/~roland/8252296/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252296 > > My fix for 8251527 has caused failures with shenandoah enabled because > CallNode::extract_projections() is called with a graph in the process of > being modified where a ProjNode has more than one control use. > > Roland. 
> -------------------- End of forwarded message -------------------- > From adinn at redhat.com Wed Aug 26 12:54:26 2020 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 26 Aug 2020 13:54:26 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> Message-ID: <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Hi Vladimir, On 25/08/2020 14:18, Vladimir Ivanov wrote: > I elaborated on some of the points in the thread with Ningsheng. > > I put my responses in-line, but will try to avoid repeating myself too > much. Thanks for the response and also clarification in replies to Ningsheng. So, if I can summarize (please correct me if I misunderstand): You are as concerned about existing complexity in vector handling as much as complexity added by this patch, whether the latter is to AArch64 code or shared code. The goal you would like to achieve is a single set of rules for a single kind of vector register whose size is parameterized, the appropriate value being derived from each specific vector operation. Your main concern about this patch is that it adds yet another additional vector kind to the current 'wrong' multi-kind vector model and, what is worse, one with a different behaviour, taking us further from your desired goal. Your other concern is that this design does not allow for the AArch64 ISA predication or, indeed, for what you treat uniformly as the 'implicit' predication imposed on a 'logical' max vector size (2048 bits) by the specific AVX/SVE/NEON hardware vector size. > But you should definitely prefer 1-slot design for vector registers then > ;-) Indeed I do :-] So, let me respond to the above summary points, assuming I have them down right. I agree that your end goal is highly desirable. However, we are not there yet and since your attempts to do so have not succeeded so far I don't think that means we are compelled to drop the current patch. As you say this could (and, if it is adopted, should) be regarded as a useful stop-gap until we come up with a unified, parameterized vector implementation that makes it redundant. That said, I'm not pushing hard to keep the patch if the consequence is generating significant work later to undo it. The number of users who might benefit from using SVE vectors from Java now or in the near future does not look like it is going to be very large (if you are not making a lot of use of SVE registers then that is a lot of wasted silicon and I suspect it's going to be the rare case that someone codes an app in Java that needs to make continuous use of SVE -- mind you, by the same token I guess that also applies for AVX on Intel). I'm not sure pushing this now will add a lot more work later. It seems to me that this code is actually moving in the right direction for the sort of solution you want. The AArch64 VecA register /is/ size-parameterized, albeit by a size fixed at startup rather than per operation. So, that's one reason why I don't know if this implies a lot more rework to move towards your desired goal. 
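As a toy illustration of the two models (plain Java, nothing like real C2
code; every name below is invented for the example), the only point is that
the spill size can be derived from the value's vector type instead of from
the register kind itself:

public class VectorSlotModel {
    // Today's model: each ideal register kind carries a fixed width.
    enum FixedKind {
        VecD(64), VecX(128), VecY(256), VecZ(512);
        final int bits;
        FixedKind(int bits) { this.bits = bits; }
        int spillSlots() { return bits / 32; }          // 32-bit stack slots
    }

    // Sketched alternative: one scalable kind, with the width supplied by
    // the value's type (the role TypeVect plays in C2).
    static final class VectValue {
        final int lengthInBytes;
        VectValue(int lengthInBytes) { this.lengthInBytes = lengthInBytes; }
        int spillSlots() { return lengthInBytes * 8 / 32; }
    }

    public static void main(String[] args) {
        System.out.println("VecX spill slots: " + FixedKind.VecX.spillSlots());                   // 4
        System.out.println("2048-bit SVE value spill slots: " + new VectValue(256).spillSlots()); // 64
    }
}

Once the width travels with the type, nothing in spilling or reg-to-reg moves
depends on which fixed-size kind the value happens to be, which is exactly why
a single VecA-style register class looks attractive for scalable vectors.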
Surely, if we do arrive at a unifying vector model that can replace the existing multi-kind vectors then it ought to be able to subsume this code - unless of course it replaces it wholesale. Are you concerned that adding this patch will result in more cases to pick through and correct? Are you worried that we might have to withdraw some of the support this patch enables to arrive at the final goal? Also, Ningsheng and his colleagues have laid some foundations for implementing predicated operations with this patch and have that work in the pipeline. Once again this is moving towards the desired goal even if it might end up doign so in a slightly sideways fashion. Perhaps we could continue this stop-gap experiment as an experimental option in order to learn from the experience? regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Wed Aug 26 14:21:35 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Aug 2020 15:21:35 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> Message-ID: <5cbb89bb-32c7-8064-a6e9-f9b0d0a2b195@redhat.com> On 25/08/2020 18:30, Boris Ulasevich wrote: > I believe masking with left shift and right shift is not common. > Search though jdk repository does not give such patterns while > there is a hundreds of mask+lshift expressions. > I implemented a simple is_bitrange_zero() method for counting the > bitranges of sub-expressions: power-of-two masks and left shift only. > We can take into account more cases (careful testing is a main > concern). But particularly about "r.a << 24 >>> 24" expression > I think it is worse to think about canonicalization: "left shift + right > shift" to "mask + left shift" (or may be the backwards). I'm running your test program, and for example I get this, old on the left, new on the right. 
Compiled method (c2) 11832 1113 SubTest0::tst2 (184 bytes) : and x11, x2, #0x1 ;*land : and x11, x2, #0x1 : and x10, x1, #0x1 ;*land : and x10, x1, #0x1 : orr x11, x11, x11, lsl #3 : bfi x11, x2, #3, #1 : orr x10, x10, x10, lsl #3 : bfi x10, x1, #3, #1 : and xmethod, x3, #0x1 ;*land : and xmethod, x3, #0x1 : add x10, x10, x11 : bfi xmethod, x3, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : and xmethod, x4, #0x1 ;*land : and x11, x4, #0x1 : add x10, x11, x10 : bfi x11, x4, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : and xmethod, x5, #0x1 ;*land : and xmethod, x5, #0x1 : add x10, x11, x10 : bfi xmethod, x5, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : and xmethod, x6, #0x1 ;*land : and x11, x6, #0x1 : add x10, x11, x10 : bfi x11, x6, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : and xmethod, x7, #0x1 ;*land : and xmethod, x7, #0x1 : add x10, x11, x10 : bfi xmethod, x7, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : and xmethod, x0, #0x1 ;*land : add x10, x10, xmethod : add x10, x11, x10 : ldr x13, [sp,#32] : orr x11, xmethod, xmethod, lsl #3 : and x11, x0, #0x1 : ldr xmethod, [sp,#32] : and xmethod, x13, #0x1 : and xmethod, xmethod, #0x1 : bfi x11, x0, #3, #1 : add x10, x11, x10 : bfi xmethod, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : ldr xmethod, [sp,#40] : ldr x13, [sp,#40] : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 : add x10, x11, x10 : bfi x11, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : ldr xmethod, [sp,#48] : ldr x13, [sp,#48] : and xmethod, xmethod, #0x1 : and xmethod, x13, #0x1 : add x10, x11, x10 : bfi xmethod, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : ldr xmethod, [sp,#56] : ldr x13, [sp,#56] : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 : add x10, x11, x10 : bfi x11, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : add x0, x11, x10 ;*ladd : add x0, x10, x11 I've also tried a bunch of different test cases doing operations that could match BFI instructions, and in only a few of them does it happen. In almost all cases, then, this change does not help, *even your own test case*. I think that you've got something that is potentially useful, but it needs some careful analysis to make sure it actually gets used. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From fw at deneb.enyo.de Wed Aug 26 14:59:26 2020 From: fw at deneb.enyo.de (Florian Weimer) Date: Wed, 26 Aug 2020 16:59:26 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> (Christian Hagedorn's message of "Wed, 26 Aug 2020 14:43:20 +0200") References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> Message-ID: <874koptpv5.fsf@mid.deneb.enyo.de> * Christian Hagedorn: > Looks good and trivial to me. It seems to fix my reproducer, too. Thanks. From lutz.schmidt at sap.com Wed Aug 26 15:20:52 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 26 Aug 2020 15:20:52 +0000 Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods Message-ID: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com> Dear all, may I please request reviews for this fix/improvement to CodeHeap State Analytics. 
Explained in a nutshell, it removes the last holes through which the analysis
code could potentially access memory which is no longer associated with the
entity being inspected.

There has been a long-lasting, off-list discussion with Erik Österlund until
all pitfalls were identified and agreeable solutions were found. The important
parts of that discussion are reflected in the bug comments.

There are two major changes:
 1) All accesses to the CodeHeap are now protected by continuously holding
    the CodeCache_lock and, in addition, the Compile_lock. Information is
    aggregated in local data structures for later printing without holding
    the above locks.
 2) Printing the names of all code blobs has been disabled except for one
    operation mode where the locks can be held while printing.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8219586
Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8219586.02/

This change has JDK-8250635 (currently out for review) as a prerequisite. It
will not compile without.

Thank you!
Lutz

From lutz.schmidt at sap.com  Wed Aug 26 15:18:31 2020
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Wed, 26 Aug 2020 15:18:31 +0000
Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in
 favour of fancy checks
Message-ID: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com>

Dear all,

may I please request reviews for this small enhancement?

Instead of calling a method doing complicated and fancy (hard to understand)
checks, the iteration over all nmethods is now protected by holding the
Compile_lock in addition to the CodeCache_lock.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8250635
Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/

Thank you!
Lutz

From martin.doerr at sap.com  Wed Aug 26 15:26:59 2020
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 26 Aug 2020 15:26:59 +0000
Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
 API for Base64 decoding
In-Reply-To: 
References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com>
Message-ID: 

Hi Corey,

I should explain my comments regarding Base64.java better.

> Let's be precise: "should process a multiple of four" => "must process a
> multiple of four"
Did you try to support non-multiple of 4 and this was intended as recommendation?
I think making it a requirement and simplifying the logic in decode0 is better.
Or what's the benefit of the recommendation?

> > If any illegal base64 bytes are encountered in the source by the
> > intrinsic, the intrinsic can return a data length of zero or any
> > number of bytes before the place where the illegal base64 byte
> > was encountered.
> I think this has a drawback. Somebody may use a debugger and want to stop
> when throwing IllegalArgumentException. He should see the position which
> matches the Java implementation.
This is probably hard to understand. Let me try to explain it by example:
1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the destination array.
2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed by your specification.
3. The compiled method containing the intrinsic hits a safepoint (e.g. in the large while loop in decodeBlockSlow).
4. A JVMTI agent (debugger) reads dp and dst.
5. The person using the debugger gets angry because more bytes than dp were written into dst. The JVM didn't follow the specified behavior.

I guess we can and should avoid it by specifying that the intrinsic needs to
return the dp value matching the number of Bytes written.
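To make that contract concrete, here is a small pure-Java sketch of a
decodeBlock with the required property (this is not the webrev code; the
signature is simplified -- no isURL/isMIME parameters, no '=' padding, plain
base64 alphabet only -- and all names are made up for the illustration). The
one thing it demonstrates is that the returned count never exceeds the number
of bytes actually stored into dst:

import java.nio.charset.StandardCharsets;

public class DecodeBlockSketch {

    private static final int[] FROM_BASE64 = new int[256];
    static {
        java.util.Arrays.fill(FROM_BASE64, -1);
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            FROM_BASE64[alphabet.charAt(i)] = i;
        }
    }

    // Decodes whole 4-byte groups from src[sp, sl) into dst starting at dp.
    // Returns the number of bytes written; on the first illegal base64 byte
    // it stops and still reports exactly what has been stored so far.
    static int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp) {
        int written = 0;
        while (sp + 4 <= sl) {
            int b0 = FROM_BASE64[src[sp]     & 0xff];
            int b1 = FROM_BASE64[src[sp + 1] & 0xff];
            int b2 = FROM_BASE64[src[sp + 2] & 0xff];
            int b3 = FROM_BASE64[src[sp + 3] & 0xff];
            if ((b0 | b1 | b2 | b3) < 0) {
                return written;          // never claim more than was written
            }
            int bits = (b0 << 18) | (b1 << 12) | (b2 << 6) | b3;
            dst[dp + written]     = (byte) (bits >> 16);
            dst[dp + written + 1] = (byte) (bits >> 8);
            dst[dp + written + 2] = (byte) bits;
            written += 3;
            sp += 4;
        }
        return written;
    }

    public static void main(String[] args) {
        byte[] src = "SGVsbG8h".getBytes(StandardCharsets.US_ASCII); // "Hello!"
        byte[] dst = new byte[6];
        int n = decodeBlock(src, 0, src.length, dst, 0);
        System.out.println(n + " bytes: " + new String(dst, 0, n, StandardCharsets.US_ASCII));
    }
}

An intrinsic written to the same rule can bail out at any point (misaligned
tail, unsupported mode, illegal byte) and a JVMTI agent reading dp and dst
will still see a consistent pair.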
Best regards, Martin > -----Original Message----- > From: Doerr, Martin > Sent: Dienstag, 25. August 2020 15:38 > To: Corey Ashford ; Michihiro Horie > > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev dev at openjdk.java.net>; Kazunori Ogata ; > joserz at br.ibm.com > Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > API for Base64 decoding > > Hi Corey, > > thanks for proposing this change. I have comments and suggestions > regarding various files. > > > Base64.java > > This is the only file which needs another review from core-libs-dev. > First of all, I like the idea to use a HotSpotIntrinsicCandidate which can > consume as many bytes as the implementation wants. > > Comment before decodeBlock: > Let's be precise: "should process a multiple of four" => "must process a > multiple of four" > > > If any illegal base64 bytes are encountered in the source by the > > intrinsic, the intrinsic can return a data length of zero or any > > number of bytes before the place where the illegal base64 byte > > was encountered. > I think this has a drawback. Somebody may use a debugger and want to stop > when throwing IllegalArgumentException. He should see the position which > matches the Java implementation. > > Please note that the comment indentation differs from other comments. > > decode0: Final "else" after return is redundant. > > > stubGenerator_ppc.cpp > > "__vector" breaks AIX build! > Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? > Please either support Big Endian properly or #ifdef it out. > What exactly does it on linux? > I remember that we had tried such prefixes but were not satisfied. I think it > didn't enforce 16 Byte alignment if I remember correctly. > > Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- > 8086069). So the argument registers for offset, length and isURL may contain > garbage in the higher bits. > > You may want to use load_const_optimized which produces shorter code. > > You may want to use __ align(32) to align unrolled_loop_start. > > I'll review the algorithm in detail when I find more time. > > > assembler_ppc.hpp > assembler_ppc.inline.hpp > vm_version_ppc.cpp > vm_version_ppc.hpp > Please rebase. Parts of the change were pushed as part of 8248190: Enable > Power10 system and implement new byte-reverse instructions > > > vmSymbols.hpp > Indentation looks odd at the end. > > > library_call.cpp > Good. Indentation style of the call parameters differs from encodeBlock. > > > runtime.cpp > Good. > > > aotCodeHeap.cpp > vmSymbols.cpp > shenandoahSupport.cpp > vmStructs_jvmci.cpp > shenandoahSupport.cpp > escape.cpp > runtime.hpp > stubRoutines.cpp > stubRoutines.hpp > vmStructs.cpp > Good and trivial. > > > Tests: > I think we should have JTREG tests to check for regressions in the future. > > Best regards, > Martin > > > > -----Original Message----- > > From: Corey Ashford > > Sent: Mittwoch, 19. August 2020 20:11 > > To: Michihiro Horie > > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev > dev at openjdk.java.net>; Kazunori Ogata ; > > joserz at br.ibm.com; Doerr, Martin > > Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > > API for Base64 decoding > > > > Michihiro Horie posted up a new iteration of this webrev for me. This > > time the webrev includes a complete implementation of the intrinsic for > > Power9 and Power10. > > > > You can find it here: > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ > > > > Changes in webrev.02 vs. 
webrev.01: > > > > * The method header for the intrinsic in the Base64 code has been > > rewritten using the Javadoc style. The clarity of the comments has been > > improved and some verbosity has been removed. There are no additional > > functional changes to Base64.java. > > > > * The code needed to martial and check the intrinsic parameters has > > been added, using the base64 encodeBlock intrinsic as a guideline. > > > > * A complete intrinsic implementation for Power9 and Power10 is > included. > > > > * Adds some Power9 and Power10 assembler instructions needed by the > > intrinsic which hadn't been defined before. > > > > The intrinsic implementation in this patch accelerates the decoding of > > large blocks of base64 data by a factor of about 3.5X on Power9. > > > > I'm attaching two Java test cases I am using for testing and > > benchmarking. The TestBase64_VB encodes and decodes randomly-sized > > buffers of random data and checks that original data matches the > > encoded-then-decoded data. TestBase64Errors encodes a 48K block of > > random bytes, then corrupts each byte of the encoded data, one at a > > time, checking to see if the decoder catches the illegal byte. > > > > Any comments/suggestions would be appreciated. > > > > Thanks, > > > > - Corey > > > > On 7/27/20 6:49 PM, Corey Ashford wrote: > > > Michihiro Horie uploaded a new revision of the Base64 decodeBlock > > > intrinsic API for me: > > > > > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ > > > > > > It has the following changes with respect to the original one posted: > > > > > > ?* In the event of encountering a non-base64 character, instead of > > > having a separate error code of -1, the intrinsic can now just return > > > either 0, or the number of data bytes produced up to the point where > the > > > illegal base64 character was encountered.? This reduces the number of > > > special cases, and also provides a way to speed up the process of > > > finding the bad character by the slower, pure-Java algorithm. > > > > > > ?* The isMIME boolean is removed from the API for two reasons: > > > ?? - The current API is not sufficient to handle the isMIME case, > > > because there isn't a strict relationship between the number of input > > > bytes and the number of output bytes, because there can be an arbitrary > > > number of non-base64 characters in the source. > > > ?? - If an intrinsic only implements the (isMIME == false) case as ours > > > does, it will always return 0 bytes processed, which will slightly slow > > > down the normal path of processing an (isMIME == true) instantiation. > > > ?? - We considered adding a separate hotspot candidate for the (isMIME > > > == true) case, but since we don't have an intrinsic implementation to > > > test that, we decided to leave it as a future optimization. > > > > > > Comments and suggestions are welcome.? Thanks for your consideration. > > > > > > - Corey > > > > > > On 6/23/20 6:23 PM, Michihiro Horie wrote: > > >> Hi Corey, > > >> > > >> Following is the issue I created. > > >> https://bugs.openjdk.java.net/browse/JDK-8248188 > > >> > > >> I will upload a webrev when you're ready as we talked in private. 
> > >> > > >> Best regards, > > >> Michihiro > > >> > > >> Inactive hide details for "Corey Ashford" ---2020/06/24 > > >> 09:40:10---Currently in java.util.Base64, there is a > > >> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently > > >> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for > > >> encodeBlock, but no > > >> > > >> From: "Corey Ashford" > > >> To: "hotspot-compiler-dev at openjdk.java.net" > > >> , > > >> "ppc-aix-port-dev at openjdk.java.net" > dev at openjdk.java.net> > > >> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori > > Ogata/Japan/IBM at IBMJP, > > >> joserz at br.ibm.com > > >> Date: 2020/06/24 09:40 > > >> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for > > >> Base64 decoding > > >> > > >> ------------------------------------------------------------------------ > > >> > > >> > > >> > > >> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and > > >> API for encodeBlock, but none for decoding. ?This means that only > > >> encoding gets acceleration from the underlying CPU's vector hardware. > > >> > > >> I'd like to propose adding a new intrinsic for decodeBlock. ?The > > >> considerations I have for this new intrinsic's API: > > >> > > >> ??* Don't make any assumptions about the underlying capability of the > > >> hardware. ?For example, do not impose any specific block size > > >> granularity. > > >> > > >> ??* Don't assume the underlying intrinsic can handle isMIME or isURL > > >> modes, but also let them decide if they will process the data regardless > > >> of the settings of the two booleans. > > >> > > >> ??* Any remaining data that is not processed by the intrinsic will be > > >> processed by the pure Java implementation. ?This allows the intrinsic to > > >> process whatever block sizes it's good at without the complexity of > > >> handling the end fragments. > > >> > > >> ??* If any illegal character is discovered in the decoding process, the > > >> intrinsic will simply return -1, instead of requiring it to throw a > > >> proper exception from the context of the intrinsic. ?In the event of > > >> getting a -1 returned from the intrinsic, the Java Base64 library code > > >> simply calls the pure Java implementation to have it find the error and > > >> properly throw an exception. ?This is a performance trade-off in the > > >> case of an error (which I expect to be very rare). > > >> > > >> ??* One thought I have for a further optimization (not implemented in > > >> the current patch), is that when the intrinsic decides not to process a > > >> block because of some combination of isURL and isMIME settings it > > >> doesn't handle, it could return extra bits in the return code, encoded > > >> as a negative number. ?For example: > > >> > > >> Illegal_Base64_char ? = 0b001; > > >> isMIME_unsupported ? ?= 0b010; > > >> isURL_unsupported ? ? = 0b100; > > >> > > >> These can be OR'd together as needed and then negated (flip the sign). > > >> The Base64 library code could then cache these flags, so it will know > > >> not to call the intrinsic again when another decodeBlock is requested > > >> but with an unsupported mode. ?This will save the performance hit of > > >> calling the intrinsic when it is guaranteed to fail. > > >> > > >> I've tested the attached patch with an actual intrinsic coded up for > > >> Power9/Power10, but those runtime intrinsics and arch-specific patches > > >> aren't attached today. ?I want to get some consensus on the > > >> library-level intrinsic API first. 
> > >> > > >> Also attached is a simple test case to test that the new intrinsic API > > >> doesn't break anything. > > >> > > >> I'm open to any comments about this. > > >> > > >> Thanks for your consideration, > > >> > > >> - Corey > > >> > > >> > > >> Corey Ashford > > >> IBM Systems, Linux Technology Center, OpenJDK team > > >> cjashfor at us dot ibm dot com > > >> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro > > >> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro > > >> Horie/Japan/IBM] > > >> > > >> > > > From vladimir.kozlov at oracle.com Wed Aug 26 16:44:59 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 09:44:59 -0700 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag In-Reply-To: References: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> Message-ID: I agree. It does not even check that the field in particular offset is oop. It just check that there is a field for which we have a ton of other checks. Also in shenandoahBarrierSetC2.cpp it check tp == NULL in assert after code already referenced through it! Thanks, Vladimir On 8/25/20 5:43 AM, Tobias Hartmann wrote: > Hi Aleksey, > > looks good to me. > > Best regards, > Tobias > > On 25.08.20 09:08, Aleksey Shipilev wrote: >> RFE: >> ? https://bugs.openjdk.java.net/browse/JDK-8252215 >> >> VerifyOptoOopOffsets flag does not seem to be used (no tests in the current test base), and it does >> not seem to work reliably (see JDK-4834891). It might be a good time to remove it. JDK-4834891 >> evaluation says: "The flag VerifyOptoOopOffsets has not been valid since the introduction of >> sun/misc/Unsafe and the flag should not be used for general testing." >> >> How about we remove it? >> ? https://cr.openjdk.java.net/~shade/8252215/webrev.01/ >> >> Testing: tier1 (locally); jdk-submit (still running?) >> From cjashfor at linux.ibm.com Wed Aug 26 16:50:05 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 26 Aug 2020 09:50:05 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Thanks for your careful review, Martin. I will consider what you have said, and reply with comments/questions and possibly a revised webrev if I think I can satisfy your concerns. Regards, - Corey On 8/26/20 8:26 AM, Doerr, Martin wrote: > Hi Corey, > > I should explain my comments regarding Base64.java better. > >> Let's be precise: "should process a multiple of four" => "must process a >> multiple of four" > Did you try to support non-multiple of 4 and this was intended as recommendation? > I think making it a requirement and simplifying the logic in decode0 is better. > Or what's the benefit of the recommendation? > >>> If any illegal base64 bytes are encountered in the source by the >>> intrinsic, the intrinsic can return a data length of zero or any >>> number of bytes before the place where the illegal base64 byte >>> was encountered. >> I think this has a drawback. Somebody may use a debugger and want to stop >> when throwing IllegalArgumentException. He should see the position which >> matches the Java implementation. > This is probably hard to understand. Let me try to explain it by example: > 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the destination array. > 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed by your specification. > 3. 
The compiled method containing the intrinsic hits a safepoint (e.g. in the large while loop in decodeBlockSlow). > 4. A JVMTI agent (debugger) reads dp and dst. > 5. The person using the debugger gets angry because more bytes than dp were written into dst. The JVM didn't follow the specified behavior. > > I guess we can and should avoid it by specifying that the intrinsic needs to return the dp value matching the number of Bytes written. > > Best regards, > Martin > > >> -----Original Message----- >> From: Doerr, Martin >> Sent: Dienstag, 25. August 2020 15:38 >> To: Corey Ashford ; Michihiro Horie >> >> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev > dev at openjdk.java.net>; Kazunori Ogata ; >> joserz at br.ibm.com >> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and >> API for Base64 decoding >> >> Hi Corey, >> >> thanks for proposing this change. I have comments and suggestions >> regarding various files. >> >> >> Base64.java >> >> This is the only file which needs another review from core-libs-dev. >> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can >> consume as many bytes as the implementation wants. >> >> Comment before decodeBlock: >> Let's be precise: "should process a multiple of four" => "must process a >> multiple of four" >> >>> If any illegal base64 bytes are encountered in the source by the >>> intrinsic, the intrinsic can return a data length of zero or any >>> number of bytes before the place where the illegal base64 byte >>> was encountered. >> I think this has a drawback. Somebody may use a debugger and want to stop >> when throwing IllegalArgumentException. He should see the position which >> matches the Java implementation. >> >> Please note that the comment indentation differs from other comments. >> >> decode0: Final "else" after return is redundant. >> >> >> stubGenerator_ppc.cpp >> >> "__vector" breaks AIX build! >> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? >> Please either support Big Endian properly or #ifdef it out. >> What exactly does it on linux? >> I remember that we had tried such prefixes but were not satisfied. I think it >> didn't enforce 16 Byte alignment if I remember correctly. >> >> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- >> 8086069). So the argument registers for offset, length and isURL may contain >> garbage in the higher bits. >> >> You may want to use load_const_optimized which produces shorter code. >> >> You may want to use __ align(32) to align unrolled_loop_start. >> >> I'll review the algorithm in detail when I find more time. >> >> >> assembler_ppc.hpp >> assembler_ppc.inline.hpp >> vm_version_ppc.cpp >> vm_version_ppc.hpp >> Please rebase. Parts of the change were pushed as part of 8248190: Enable >> Power10 system and implement new byte-reverse instructions >> >> >> vmSymbols.hpp >> Indentation looks odd at the end. >> >> >> library_call.cpp >> Good. Indentation style of the call parameters differs from encodeBlock. >> >> >> runtime.cpp >> Good. >> >> >> aotCodeHeap.cpp >> vmSymbols.cpp >> shenandoahSupport.cpp >> vmStructs_jvmci.cpp >> shenandoahSupport.cpp >> escape.cpp >> runtime.hpp >> stubRoutines.cpp >> stubRoutines.hpp >> vmStructs.cpp >> Good and trivial. >> >> >> Tests: >> I think we should have JTREG tests to check for regressions in the future. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: Corey Ashford >>> Sent: Mittwoch, 19. 
August 2020 20:11 >>> To: Michihiro Horie >>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev >> dev at openjdk.java.net>; Kazunori Ogata ; >>> joserz at br.ibm.com; Doerr, Martin >>> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and >>> API for Base64 decoding >>> >>> Michihiro Horie posted up a new iteration of this webrev for me. This >>> time the webrev includes a complete implementation of the intrinsic for >>> Power9 and Power10. >>> >>> You can find it here: >>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >>> >>> Changes in webrev.02 vs. webrev.01: >>> >>> * The method header for the intrinsic in the Base64 code has been >>> rewritten using the Javadoc style. The clarity of the comments has been >>> improved and some verbosity has been removed. There are no additional >>> functional changes to Base64.java. >>> >>> * The code needed to martial and check the intrinsic parameters has >>> been added, using the base64 encodeBlock intrinsic as a guideline. >>> >>> * A complete intrinsic implementation for Power9 and Power10 is >> included. >>> >>> * Adds some Power9 and Power10 assembler instructions needed by the >>> intrinsic which hadn't been defined before. >>> >>> The intrinsic implementation in this patch accelerates the decoding of >>> large blocks of base64 data by a factor of about 3.5X on Power9. >>> >>> I'm attaching two Java test cases I am using for testing and >>> benchmarking. The TestBase64_VB encodes and decodes randomly-sized >>> buffers of random data and checks that original data matches the >>> encoded-then-decoded data. TestBase64Errors encodes a 48K block of >>> random bytes, then corrupts each byte of the encoded data, one at a >>> time, checking to see if the decoder catches the illegal byte. >>> >>> Any comments/suggestions would be appreciated. >>> >>> Thanks, >>> >>> - Corey >>> >>> On 7/27/20 6:49 PM, Corey Ashford wrote: >>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>>> intrinsic API for me: >>>> >>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>>> >>>> It has the following changes with respect to the original one posted: >>>> >>>> ?* In the event of encountering a non-base64 character, instead of >>>> having a separate error code of -1, the intrinsic can now just return >>>> either 0, or the number of data bytes produced up to the point where >> the >>>> illegal base64 character was encountered.? This reduces the number of >>>> special cases, and also provides a way to speed up the process of >>>> finding the bad character by the slower, pure-Java algorithm. >>>> >>>> ?* The isMIME boolean is removed from the API for two reasons: >>>> ?? - The current API is not sufficient to handle the isMIME case, >>>> because there isn't a strict relationship between the number of input >>>> bytes and the number of output bytes, because there can be an arbitrary >>>> number of non-base64 characters in the source. >>>> ?? - If an intrinsic only implements the (isMIME == false) case as ours >>>> does, it will always return 0 bytes processed, which will slightly slow >>>> down the normal path of processing an (isMIME == true) instantiation. >>>> ?? - We considered adding a separate hotspot candidate for the (isMIME >>>> == true) case, but since we don't have an intrinsic implementation to >>>> test that, we decided to leave it as a future optimization. >>>> >>>> Comments and suggestions are welcome.? Thanks for your consideration. 
>>>> >>>> - Corey >>>> >>>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>>> Hi Corey, >>>>> >>>>> Following is the issue I created. >>>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>>> >>>>> I will upload a webrev when you're ready as we talked in private. >>>>> >>>>> Best regards, >>>>> Michihiro >>>>> >>>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently >>>>> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for >>>>> encodeBlock, but no >>>>> >>>>> From: "Corey Ashford" >>>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>>> , >>>>> "ppc-aix-port-dev at openjdk.java.net" >> dev at openjdk.java.net> >>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori >>> Ogata/Japan/IBM at IBMJP, >>>>> joserz at br.ibm.com >>>>> Date: 2020/06/24 09:40 >>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>>> Base64 decoding >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> >>>>> >>>>> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and >>>>> API for encodeBlock, but none for decoding. ?This means that only >>>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>>> >>>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>>> considerations I have for this new intrinsic's API: >>>>> >>>>> ??* Don't make any assumptions about the underlying capability of the >>>>> hardware. ?For example, do not impose any specific block size >>>>> granularity. >>>>> >>>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>>> modes, but also let them decide if they will process the data regardless >>>>> of the settings of the two booleans. >>>>> >>>>> ??* Any remaining data that is not processed by the intrinsic will be >>>>> processed by the pure Java implementation. ?This allows the intrinsic to >>>>> process whatever block sizes it's good at without the complexity of >>>>> handling the end fragments. >>>>> >>>>> ??* If any illegal character is discovered in the decoding process, the >>>>> intrinsic will simply return -1, instead of requiring it to throw a >>>>> proper exception from the context of the intrinsic. ?In the event of >>>>> getting a -1 returned from the intrinsic, the Java Base64 library code >>>>> simply calls the pure Java implementation to have it find the error and >>>>> properly throw an exception. ?This is a performance trade-off in the >>>>> case of an error (which I expect to be very rare). >>>>> >>>>> ??* One thought I have for a further optimization (not implemented in >>>>> the current patch), is that when the intrinsic decides not to process a >>>>> block because of some combination of isURL and isMIME settings it >>>>> doesn't handle, it could return extra bits in the return code, encoded >>>>> as a negative number. ?For example: >>>>> >>>>> Illegal_Base64_char ? = 0b001; >>>>> isMIME_unsupported ? ?= 0b010; >>>>> isURL_unsupported ? ? = 0b100; >>>>> >>>>> These can be OR'd together as needed and then negated (flip the sign). >>>>> The Base64 library code could then cache these flags, so it will know >>>>> not to call the intrinsic again when another decodeBlock is requested >>>>> but with an unsupported mode. ?This will save the performance hit of >>>>> calling the intrinsic when it is guaranteed to fail. 
>>>>> >>>>> I've tested the attached patch with an actual intrinsic coded up for >>>>> Power9/Power10, but those runtime intrinsics and arch-specific patches >>>>> aren't attached today. ?I want to get some consensus on the >>>>> library-level intrinsic API first. >>>>> >>>>> Also attached is a simple test case to test that the new intrinsic API >>>>> doesn't break anything. >>>>> >>>>> I'm open to any comments about this. >>>>> >>>>> Thanks for your consideration, >>>>> >>>>> - Corey >>>>> >>>>> >>>>> Corey Ashford >>>>> IBM Systems, Linux Technology Center, OpenJDK team >>>>> cjashfor at us dot ibm dot com >>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro >>>>> Horie/Japan/IBM] >>>>> >>>>> >>>> > From vladimir.kozlov at oracle.com Wed Aug 26 16:59:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 09:59:43 -0700 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: <30db6ea6-cf81-4fb8-b43f-3a275fa7acab@oracle.com> On 8/25/20 2:28 AM, Reingruber, Richard wrote: > Hi Aleksey, > > the cleanup looks good to me. +1 > > That enum was already part of the initial load with xxxunusedxxx as the only element [1]. > So there's no open version history. > > I could not find any references either (rtags, grep). Probably the enum had more elements > originally which were removed. Nope. Old history shows that it was like this from time when callGenerator.hpp was created. I assume it is leftover from C2 implementation work. Regards, Vladimir K > > Thanks, Richard. > > [1] https://github.com/openjdk/jdk/blame/d4626d89cc778b8b7108036f389548c95d52e56a/src/hotspot/share/opto/callGenerator.hpp#L41 > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Aleksey Shipilev > Sent: Dienstag, 25. August 2020 09:29 > To: hotspot compiler > Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator > > Small cleanup: > https://bugs.openjdk.java.net/browse/JDK-8252290 > > Static code inspection complains the enum below is unused. > > diff -r 13fdf97f0a8f src/hotspot/share/opto/callGenerator.hpp > --- a/src/hotspot/share/opto/callGenerator.hpp Mon Aug 24 09:35:23 2020 +0200 > +++ b/src/hotspot/share/opto/callGenerator.hpp Tue Aug 25 09:27:45 2020 +0200 > @@ -37,9 +37,4 @@ > > class CallGenerator : public ResourceObj { > - public: > - enum { > - xxxunusedxxx > - }; > - > private: > ciMethod* _method; // The method being called. > > Testing: grepping for "xxxunusedxxx", local builds > From vladimir.kozlov at oracle.com Wed Aug 26 17:10:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 10:10:42 -0700 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: <09a82d80-208c-6cea-da6b-e501d65e0f79@oracle.com> On 8/25/20 6:49 AM, Tobias Hartmann wrote: > Hi Roland, > > Good catch, the fix looks reasonable to me. +1 Thanks, Vladimir K > > I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to > conflicting GC options if another GC is set. > > Best regards, > Tobias > > On 25.08.20 10:23, Roland Westrelin wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8252292 >> http://cr.openjdk.java.net/~roland/8252292/webrev.00/ >> >> In 8240795, I modified alias analysis so non escaping allocations don't >> alias with bottom memory. 
While browsing that code last week, I noticed >> that that change didn't seem quite right and may cause some >> anti-dependences to be missed. I could indeed write a test case that >> fails with an incorrect execution. >> >> In the test case: the dst[9] load after the ArrayCopy is transformed >> into a src[9] load before the ArrayCopy. Anti dependence analysis find >> src[9] shares the memory of the ArrayCopy but because of the way I >> tweaked the code with 8240795, anti-dependence analysis finds the src[9] >> and ArrayCopy don't alias so src[9] can sink out of the loop which is >> wrong because of the src[9] store. Anti-dependence analysis in that case >> would need to look at the memory uses of ArrayCopy too. >> >> Roland. >> From vladimir.kozlov at oracle.com Wed Aug 26 18:07:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 11:07:38 -0700 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: References: <87tuwr6s5j.fsf@redhat.com> Message-ID: +1 Thanks, Vladimir K On 8/25/20 6:18 AM, Tobias Hartmann wrote: > > On 25.08.20 14:57, Tobias Hartmann wrote: >>> * @requires vm.gc.G1 & vm.gc.Shenandoah & vm.gc.Z & vm.gc.Epsilon >> That doesn't look right. The test would never be executed. > > Sorry, confused it with the vm.gc == .. check. You are just checking if the VM supports the GC. > > Looks good to me. > > Best regards, > Tobias > From honguye at microsoft.com Wed Aug 26 18:55:07 2020 From: honguye at microsoft.com (Nhat Nguyen) Date: Wed, 26 Aug 2020 18:55:07 +0000 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes Message-ID: Hi hotspot-compiler-dev, Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271 The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead. I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended. webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ Thank you, Nhat From jingxinc at amazon.com Wed Aug 26 21:36:52 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Wed, 26 Aug 2020 21:36:52 +0000 Subject: RFR 8239090: Improve CPU feature support in VM_version Message-ID: <21DF2FC1-7D91-4D2A-87EB-8F42EA1E276D@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~xliu/eric/8213777/01/webrev/ JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. Regards, Eric Chen From cjashfor at linux.ibm.com Wed Aug 26 22:17:25 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 26 Aug 2020 15:17:25 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Hi Martin, Some inline responses below. On 8/26/20 8:26 AM, Doerr, Martin wrote: > Hi Corey, > > I should explain my comments regarding Base64.java better. > >> Let's be precise: "should process a multiple of four" => "must process a >> multiple of four" > Did you try to support non-multiple of 4 and this was intended as recommendation? > I think making it a requirement and simplifying the logic in decode0 is better. > Or what's the benefit of the recommendation? 
If I make a requirement, I feel decode0 should check that the requirement is
met, and raise some kind of internal error if it isn't. That actually was my
first implementation, but I received some comments during an internal review
suggesting that I just "round down" the destination count to the closest
multiple of 3 less than or equal to the returned value, rather than throw an
internal exception which would confuse users. This "enforces" the rule, in
some sense, without error handling. Do you have some thoughts about this?

>
>>> If any illegal base64 bytes are encountered in the source by the
>>> intrinsic, the intrinsic can return a data length of zero or any
>>> number of bytes before the place where the illegal base64 byte
>>> was encountered.
>> I think this has a drawback. Somebody may use a debugger and want to stop
>> when throwing IllegalArgumentException. He should see the position which
>> matches the Java implementation.
> This is probably hard to understand. Let me try to explain it by example:
> 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the destination array.
> 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed by your specification.
> 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the large while loop in decodeBlockSlow).
> 4. A JVMTI agent (debugger) reads dp and dst.
> 5. The person using the debugger gets angry because more bytes than dp were written into dst. The JVM didn't follow the specified behavior.
>
> I guess we can and should avoid it by specifying that the intrinsic needs to return the dp value matching the number of Bytes written.

That's an interesting point. I will change the specification, and the
intrinsic implementation. Right now the Power9/10 intrinsic returns 0 when
any illegal character is discovered, but I've been thinking about returning
the number of bytes already written, which will allow decodeBlockSlow to more
quickly find the offending character. This provides another good reason to
make that change.

>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Doerr, Martin
>> Sent: Dienstag, 25. August 2020 15:38
>> To: Corey Ashford ; Michihiro Horie
>>
>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev
> dev at openjdk.java.net>; Kazunori Ogata ;
>> joserz at br.ibm.com
>> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
>> API for Base64 decoding
>>
>> Hi Corey,
>>
>> thanks for proposing this change. I have comments and suggestions
>> regarding various files.
>>
>>
>> Base64.java
>>
>> This is the only file which needs another review from core-libs-dev.
>> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can
>> consume as many bytes as the implementation wants.
>>
>> Comment before decodeBlock:
>> Let's be precise: "should process a multiple of four" => "must process a
>> multiple of four"
>>
>>> If any illegal base64 bytes are encountered in the source by the
>>> intrinsic, the intrinsic can return a data length of zero or any
>>> number of bytes before the place where the illegal base64 byte
>>> was encountered.
>> I think this has a drawback. Somebody may use a debugger and want to stop
>> when throwing IllegalArgumentException. He should see the position which
>> matches the Java implementation.
>>
>> Please note that the comment indentation differs from other comments.

Will fix.

>>
>> decode0: Final "else" after return is redundant.

Will fix.
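For what it's worth, the "round down" guard mentioned at the top of this
reply needs only a couple of lines. A standalone sketch (hypothetical names;
in the real code this would sit at the decodeBlock call site in
java.util.Base64.decode0):

public class RoundDownSketch {
    // Keep only whole 3-byte output groups, so decoding resumes on a clean
    // 4-byte input boundary no matter what count the intrinsic reported.
    static int usableFromIntrinsic(int reported) {
        return reported - (reported % 3);
    }

    public static void main(String[] args) {
        int dl = usableFromIntrinsic(59);   // a stray report of 59 is treated as 57
        System.out.println(dl + " bytes accepted, " + (dl / 3 * 4) + " source bytes consumed");
    }
}

The caller would then advance dp by the rounded value and sp by 4/3 of it,
which keeps the slow path's view of the source position consistent as well.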
>> >> >> stubGenerator_ppc.cpp >> >> "__vector" breaks AIX build! >> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? >> Please either support Big Endian properly or #ifdef it out. I have been compiling with only Advance Toolchain 13, which is 9.3.1, and only on Linux. It will not work with big endian, so it won't work on AIX, however obviously it shouldn't break the AIX build, so I will address that. There's code to set UseBASE64Intrinsics to false on big endian, but you're right -- I should ifdef all of the intrinsic code for little endian for now. Getting it to work on big endian / AIX shouldn't be difficult, but it's not in my scope of work at the moment. I will double check that everything compiles and runs properly with gcc 7.3.1. >> What exactly does it (do) on linux? It's an arch-specific type that's 16 bytes in size and aligned on a 16-byte boundary. >> I remember that we had tried such prefixes but were not satisfied. I think it >> didn't enforce 16 Byte alignment if I remember correctly. I will use __attribute__ ((align(16))) instead of __vector, and make them arrays of 16 unsigned char. >> >> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- >> 8086069). So the argument registers for offset, length and isURL may contain >> garbage in the higher bits. Wow, that's good to know! I will mask off the incoming values. >> >> You may want to use load_const_optimized which produces shorter code. Will fix. >> >> You may want to use __ align(32) to align unrolled_loop_start. Will fix. >> >> I'll review the algorithm in detail when I find more time. >> >> >> assembler_ppc.hpp >> assembler_ppc.inline.hpp >> vm_version_ppc.cpp >> vm_version_ppc.hpp >> Please rebase. Parts of the change were pushed as part of 8248190: Enable >> Power10 system and implement new byte-reverse instructions Will do. >> >> >> vmSymbols.hpp >> Indentation looks odd at the end. I was following what was done for encodeBlock, but it appears encodeBlock's style isn't what is used for the other intrinsics. I will correct decodeBlock to use the prevailing style. Another patch should be added (not part of this webrev) to correct encodeBlock's style. >> >> >> library_call.cpp >> Good. Indentation style of the call parameters differs from encodeBlock. Will fix. >> >> >> runtime.cpp >> Good. >> >> >> aotCodeHeap.cpp >> vmSymbols.cpp >> shenandoahSupport.cpp >> vmStructs_jvmci.cpp >> shenandoahSupport.cpp >> escape.cpp >> runtime.hpp >> stubRoutines.cpp >> stubRoutines.hpp >> vmStructs.cpp >> Good and trivial. >> >> >> Tests: >> I think we should have JTREG tests to check for regressions in the future. Ah, this is another thing I didn't know about. I will make some regression tests. Thanks for your time on this. As you can tell, I'm inexperienced in writing openjdk code, so your patience and careful review is really appreciated. - Corey From vladimir.kozlov at oracle.com Wed Aug 26 23:31:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 16:31:19 -0700 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: <587AF7B9-5EE9-4F93-A587-9B3277E9183D@amazon.com> References: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> <587AF7B9-5EE9-4F93-A587-9B3277E9183D@amazon.com> Message-ID: Missed this. On 8/14/20 1:54 PM, Hohensee, Paul wrote: > By "e.g.", I meant "ones like the one in the webrev". Tobais is correct that there are more. 
I grep'ed for "(int idx", ", int idx", "(int idx)", and so on, and found a bunch (not all of them are node_idx_t, but many of those that aren't should probably be uint too). So those would be fixed first. Yes, I am okay with fixing them first. Thanks, Vladimir K > > Thanks, > Paul > > ?On 8/14/20, 11:04 AM, "Vladimir Kozlov" wrote: > > On 8/14/20 9:05 AM, Hohensee, Paul wrote: > > Hi, Vladimir, > > > > What do you think of the following? > > > > 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). > > I see only this change: > > - const TypeOopPtr* tinst = t->cast_to_instance_id(ni); > + assert(ni<=INT_MAX,"node index cannot be negative"); > + const TypeOopPtr* tinst = t->cast_to_instance_id((int)ni); > > I would like to see first what you are suggesting. > > > 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. > > 3. New issue: Change from uint to node_idx_t. > > Yes, it is fine to split these 2. > > Regards, > Vladimir > > > > > Thanks, > > Paul > > > > On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > > > Yes, it is sloppy :( > > > > Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. > > > > For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. > > We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see > > TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be > > valid value - InstanceTop. > > > > And I agree that we should use node_idx_t everywhere. > > > > For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. > > > > Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and > > uint when referencing them. > > > > Warning: it is not small change. > > > > Regards, > > Vladimir > > > > On 8/13/20 2:51 PM, Hohensee, Paul wrote: > > > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > > > > > Thanks, > > > Paul > > > > > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > > > > > Hi Eric, > > > > > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > > > For example, calls to Compile::node_notes_at. > > > > > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > > > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > > > checking that _idx is always <= MAX_INT. > > > > > > Best regards, > > > Tobias > > > > > > On 12.08.20 00:41, Eric, Chan wrote: > > > > Hi, > > > > > > > > Requesting review for > > > > > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > > > > > I have tested this builds successfully . > > > > > > > > Ensured that there are no regressions in hotspot : tier1 tests. 
> > > > > > > > Regards, > > > > Eric Chen > > > > > > > > > > From jiefu at tencent.com Wed Aug 26 23:37:37 2020 From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=) Date: Wed, 26 Aug 2020 23:37:37 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs Message-ID: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> Hi all, May I get reviews for this fix? JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ Thanks. Best regards, Jie From igor.ignatyev at oracle.com Thu Aug 27 00:08:09 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 26 Aug 2020 17:08:09 -0700 Subject: RFR(M/S) : 8251127 : clean up FileInstaller $test.src $cwd in remaining vmTestbase_vm_compiler tests : In-Reply-To: <5859dffd-9ed9-21d3-102b-3070013d7fe0@oracle.com> References: <5859dffd-9ed9-21d3-102b-3070013d7fe0@oracle.com> Message-ID: <40E57766-0F5A-48E0-9B9A-5353642A75D0@oracle.com> thanks Vladimir, pushed. -- Igor > On Aug 25, 2020, at 6:10 PM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir K > > On 8/25/20 6:01 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ >>> 560 lines changed: 132 ins; 367 del; 61 mod; >> Hi all, >> could you please review the patch which removes FileInstaller actions from :vmTestbase_vm_compiler? >> the biggest chunk of the patch is just removal for '@run jdk.test.lib.FileInstaller' produced by sed '/jdk.test.lib.FileInstaller \. \./d'. human-made changes are: >> - moving jtreg test descriptions to the test source in t108-t113, corresponding changes in TEST.quick-groups and fixing line numbers in t108-t113.gold files >> - adding -Dtest.src=${test.src} to the tests which use ExecDriver (t087,t088,t108-t113), so GoldChecker would be able to find .gold file >> testing: :vmTestbase_vm_compiler >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251127 >> webrev: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ >> Thanks, >> -- Igor >> From vladimir.kozlov at oracle.com Thu Aug 27 00:32:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 17:32:19 -0700 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code Message-ID: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> http://cr.openjdk.java.net/~kvn/8252396/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8252396 8252058 added new markId DEOPT_MH_HANDLER_ENTRY to handle deoptimization for MH invoke. But changes did not updated AOT (jaotc and Hotspot's AOT code) to handle this new markId. We should handle DEOPT_MH_HANDLER_ENTRY in AOT similar to DEOPT_HANDLER_ENTRY. In aotCompiledMethod.hpp, if DEOPT_MH_HANDLER_ENTRY value is set, CompiledMethod::_deopt_mh_handler_begin [2] is set similar to Graal JIT [3]. I kept current code to set _deopt_mh_handler_begin to 'this' when DEOPT_MH_HANDLER_ENTRY value is not set. But may be it should be set to NULL as in [3]. May be it does not matter because offset is not used when there are not MH invoke in method. Tested: ran tests which used AOT (including Graal testing). 
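[A hypothetical sketch of the choice described above; 'deopt_mh_offset' and the surrounding code are assumptions for illustration, not the actual aotCompiledMethod.hpp code.]

if (deopt_mh_offset != 0) {            // DEOPT_MH_HANDLER_ENTRY was recorded
  _deopt_mh_handler_begin = code_begin() + deopt_mh_offset;
} else {
  _deopt_mh_handler_begin = (address) this;  // current fallback; NULL, as in
                                             // nmethod.cpp [3], may be cleaner,
                                             // and the value is unused when the
                                             // method has no MH invokes
}
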
Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8252058 [2] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/compiledMethod.hpp#l168 [3] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/nmethod.cpp#l764 From vladimir.kozlov at oracle.com Thu Aug 27 02:20:17 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 19:20:17 -0700 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs In-Reply-To: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> Message-ID: <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> Since test's method is empty, it does not make sense to run it when C1's flag is not available. I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: * @requires vm.debug == true & vm.compiler1.enabled Thanks, Vladimir K [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java On 8/26/20 4:37 PM, jiefu(??) wrote: > Hi all, > > May I get reviews for this fix? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 > Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ > > Thanks. > Best regards, > Jie > From jiefu at tencent.com Thu Aug 27 02:38:34 2020 From: jiefu at tencent.com (=?iso-2022-jp?B?amllZnUoGyRCUHxbPxsoQik=?=) Date: Thu, 27 Aug 2020 02:38:34 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) In-Reply-To: <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com>, <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> Message-ID: <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com> Hi Vladimir K, Thanks for your review. Updated: http://cr.openjdk.java.net/~jiefu/8252404/webrev.01/ Best regards, Jie ________________________________ From: Vladimir Kozlov Sent: Thursday, August 27, 2020 10:20 AM To: jiefu(??); hotspot compiler Subject: Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) Since test's method is empty, it does not make sense to run it when C1's flag is not available. I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: * @requires vm.debug == true & vm.compiler1.enabled Thanks, Vladimir K [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java On 8/26/20 4:37 PM, jiefu(??) wrote: > Hi all, > > May I get reviews for this fix? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 > Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ > > Thanks. > Best regards, > Jie > From vladimir.kozlov at oracle.com Thu Aug 27 02:47:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 19:47:38 -0700 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) In-Reply-To: <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com> Message-ID: <2f2cbd0e-639a-9c8a-41ab-33e16483e12c@oracle.com> Good. Vladimir K On 8/26/20 7:38 PM, jiefu(??) wrote: > Hi Vladimir K, > > Thanks for your review. 
> > Updated: http://cr.openjdk.java.net/~jiefu/8252404/webrev.01/ > > Best regards, > Jie > > > > ------------------------------------------------------------------------------------------------------------------------ > *From:* Vladimir Kozlov > *Sent:* Thursday, August 27, 2020 10:20 AM > *To:* jiefu(??); hotspot compiler > *Subject:* Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) > Since test's method is empty, it does not make sense to run it when C1's flag is not available. > > I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: > > * @requires vm.debug == true & vm.compiler1.enabled > > Thanks, > Vladimir K > > [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java > > On 8/26/20 4:37 PM, jiefu(??) wrote: >> Hi all, >> >> May I get reviews for this fix? >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 >> Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ >> >> Thanks. >> Best regards, >> Jie >> > From jiefu at tencent.com Thu Aug 27 02:54:36 2020 From: jiefu at tencent.com (=?iso-2022-jp?B?amllZnUoGyRCUHxbPxsoQik=?=) Date: Thu, 27 Aug 2020 02:54:36 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) In-Reply-To: <2f2cbd0e-639a-9c8a-41ab-33e16483e12c@oracle.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com>, <2f2cbd0e-639a-9c8a-41ab-33e16483e12c@oracle.com> Message-ID: <528dc08156b348c48700797c729a7f2c@tencent.com> Thanks Vladimir K. Can I push it right now? I think it's trivial and this is a tier1 failure. Best regards, Jie ________________________________ From: Vladimir Kozlov Sent: Thursday, August 27, 2020 10:47 AM To: jiefu(??); hotspot compiler Subject: Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) Good. Vladimir K On 8/26/20 7:38 PM, jiefu(??) wrote: > Hi Vladimir K, > > Thanks for your review. > > Updated: http://cr.openjdk.java.net/~jiefu/8252404/webrev.01/ > > Best regards, > Jie > > > > ------------------------------------------------------------------------------------------------------------------------ > *From:* Vladimir Kozlov > *Sent:* Thursday, August 27, 2020 10:20 AM > *To:* jiefu(??); hotspot compiler > *Subject:* Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) > Since test's method is empty, it does not make sense to run it when C1's flag is not available. > > I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: > > * @requires vm.debug == true & vm.compiler1.enabled > > Thanks, > Vladimir K > > [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java > > On 8/26/20 4:37 PM, jiefu(??) wrote: >> Hi all, >> >> May I get reviews for this fix? >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 >> Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ >> >> Thanks. 
>> Best regards, >> Jie >> > From xxinliu at amazon.com Thu Aug 27 05:37:25 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Thu, 27 Aug 2020 05:37:25 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1597343851213.53343@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com>, <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com>, <1597343851213.53343@amazon.com> Message-ID: <1598506645473.15178@amazon.com> Hi, Reviewers, May I ask to review the new revision of JDK-8247732? Webrev: http://cr.openjdk.java.net/~xliu/8247732/02/webrev/ Compared to the previous revision, I suppress invalid Intrinsic Ids in -XX:CompileCommand= and -XX:CompileCommandFile=. This behavior conforms to Tobias and Nils comments before. I extent the testing framework to support a new CompileCommand 'INTRINSIC'. It actually represents ControlIntrinic= in both compiler command and compiler directive. The reason I don't test DisableIntrinsic because it will deprecate. 3 new ControlIntrinsicTest.java files are added to test ControlIntrinsic appears in -XX:CompileCommand=, -XX:CompilerDirectivesFile= and JCMD respectively. As the following table described, only -XX:CompilerDirectivesFile= will abort hotspot process with non-zero exit value. The current testing framework can't test vmflag case directly, I ran test manually like I did in comment before. https://bugs.openjdk.java.net/browse/JDK-8247732?focusedCommentId=14349960&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14349960 Testing: hotspot tier1 test and gtest. thanks, --lx ________________________________________ From: hotspot-compiler-dev on behalf of Liu, Xin Sent: Thursday, August 13, 2020 11:37 AM To: Nils Eliasson; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic hi, Nils, Thank you to elaborate the answer with a table. I don't know there are up to 4 approaches to affect compilation behaviors until this table! I got it. I will work tests and make sure my next patch conform this spec. thanks, --lx ________________________________________ From: hotspot-compiler-dev on behalf of Nils Eliasson Sent: Thursday, August 13, 2020 9:17 AM To: hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. That table didn't come out right... 
+-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics | valid | invalid | +-------------------------------------------------+-------+----------------------------------+ | vmflag | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand= | ok | print error and continue | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd | ok | print error, VM continues to run | +-------------------------------------------------+-------+----------------------------------+ // Regards Nils On 2020-08-13 17:59, Nils Eliasson wrote: > > |+-------------------------------------------------+-------+----------------------------------+ > | ControlIntrinsics | valid | invalid | > +-------------------------------------------------+-------+----------------------------------+ > | vmflag | ok | print error and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerOracle: -XX:CompileCommand= | ok | print error and continue > | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error > and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives via jcmd | ok | print error, vm continues to run > | > +-------------------------------------------------+-------+----------------------------------+| From jiefu at tencent.com Thu Aug 27 06:29:24 2020 From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=) Date: Thu, 27 Aug 2020 06:29:24 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs Message-ID: Thanks Tobias for your review. I'll push it later. Best regards, Jie ?On 2020/8/27, 2:23 PM, "Tobias Hartmann" wrote: On 27.08.20 04:54, jiefu(??) wrote: > Can I push it right now? > > I think it's trivial and this is a tier1 failure. Looks good and trivial to me as well. Best regards, Tobias From rwestrel at redhat.com Thu Aug 27 07:25:44 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 09:25:44 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> Message-ID: <87wo1k5z47.fsf@redhat.com> Thanks for the review, Christian. Roland. From rwestrel at redhat.com Thu Aug 27 07:26:13 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 09:26:13 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <874koptpv5.fsf@mid.deneb.enyo.de> References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> <874koptpv5.fsf@mid.deneb.enyo.de> Message-ID: <87tuwo5z3e.fsf@redhat.com> > It seems to fix my reproducer, too. Thanks. Thanks for verifying the fix. Roland. 
From christian.hagedorn at oracle.com Thu Aug 27 07:53:28 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 27 Aug 2020 09:53:28 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> Message-ID: <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> On 25.08.20 19:42, Christian Hagedorn wrote: > On 25.08.20 16:13, Roland Westrelin wrote: >> >>> In the testcase, a LoadSNode is cloned in >>> PhaseIdealLoop::split_if_with_blocks_post() for each use such that they >>> can float out of a loop. To ensure that these loads cannot float back >>> into the loop, we pin them by setting their control input [1]. In the >>> testcase, all 3 new clones are pinned to a loop exit node that is part >>> of an outer strip mined loop (see [2]). >> >> Do I understand this right, that all 3 clones are pinned with the same >> control? So they common and only of them is kept? > > Yes, exactly. All are pinned to the inner loop exit node. But at the > time we hit the assertion failure, we still got one cloned load (903 > LoadS) that is an input to the store (575 StoreI) that's going into the > outer strip mined loop safepoint, and one load (901 LoadS) that is > triggering the dominance failure. LoadS 902 was removed at some point in > between due to other optimizations. As Roland and I have discussed offline, it seems to be better and safer to do a simpler fix that does not change the original behavior of the optimization. The new fix suggests not yank AddP nodes (which are inputs to the cloned LoadSNodes in the testcase) and also to not yank gc barriers. In the testcase, the cloned LoadSNodes are still pinned at the loop exit but now they can be optimized and common up to one node during igvn that only belongs to the safepoint in the outer strip mined loop (i.e. no load after the loop anymore). The load is still successfully removed from the inner loop: http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ I left the improved dominance failure dumping as it is. We think that it would be a good idea to revisit this cloning optimization in an RFE and also consider webrev.01 there as it seems to be more like an enhancement for loop strip mining rather than a bug fix. I filed [1] which summarizes some thoughts about it. What do others think about that? Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8252372 From dean.long at oracle.com Thu Aug 27 08:36:11 2020 From: dean.long at oracle.com (Dean Long) Date: Thu, 27 Aug 2020 01:36:11 -0700 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: Looks good. dl On 8/26/20 5:32 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8252396/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252396 > > 8252058 added new markId DEOPT_MH_HANDLER_ENTRY to handle > deoptimization for MH invoke. > But changes did not updated AOT (jaotc and Hotspot's AOT code) to > handle this new markId. > > We should handle DEOPT_MH_HANDLER_ENTRY in AOT similar to > DEOPT_HANDLER_ENTRY. > > In aotCompiledMethod.hpp, if DEOPT_MH_HANDLER_ENTRY value is set, > CompiledMethod::_deopt_mh_handler_begin [2] is set similar to Graal > JIT [3]. 
I kept current code to set _deopt_mh_handler_begin to 'this' > when DEOPT_MH_HANDLER_ENTRY value is not set. But may be it should be > set to NULL as in [3]. May be it does not matter because offset is not > used when there are not MH invoke in method. > > Tested: ran tests which used AOT (including Graal testing). > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8252058 > [2] > http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/compiledMethod.hpp#l168 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/nmethod.cpp#l764 From aph at redhat.com Thu Aug 27 09:18:46 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Aug 2020 10:18:46 +0100 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: On 27/08/2020 01:32, Vladimir Kozlov wrote: > [1] https://bugs.openjdk.java.net/browse/JDK-8252058 You can't view this issue It may have been deleted or you don't have permission to view it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Aug 27 09:44:41 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Aug 2020 10:44:41 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: Message-ID: On 17/08/2020 22:54, Doerr, Martin wrote: > Hi, > > I'd like to backport https://bugs.openjdk.java.net/browse/JDK-8241234 to JDK11u. > > Original JDK15 patch (https://hg.openjdk.java.net/jdk/jdk/rev/87c506c8be63) doesn't fit to JDK11u because the locking code has been reworked by https://bugs.openjdk.java.net/browse/JDK-8229844 > As mentioned by Vladimir, there's already a GraalVM version available which consists of 2 patches (original + addon) and which can be applied: > https://github.com/graalvm/labs-openjdk-11/commit/6c162cb15262e6aa77e36eb3a268320ef0a206a4 > https://github.com/graalvm/labs-openjdk-11/commit/6a28a618cdbe595f9a3993e0eb63c01ccae1a528 > Only JVMCI part from GraalVM doesn't apply automatically. The version of this file from JDK15 is very simple and fits perfectly. > > Please review the JDK11u backport webrev: > http://cr.openjdk.java.net/~mdoerr/8241234_monitorenterexit_11u/webrev.00/ Why is anyone backporting a P4 Enhancement? Seems weird. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Thu Aug 27 10:04:28 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 27 Aug 2020 10:04:28 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: Message-ID: Hi Andrew, > Why is anyone backporting a P4 Enhancement? Seems weird. This is a good question in general. Personally, I'd vote for backporting fewer less important things to 11u in the future. We should better focus on 17 IMHO. However, there are some arguments for backporting this one: - Oracle has done so. There may be more backports in this area and I'd expect less effort if we have the same code in the open version. - Performance is supposed to be better. (Though I didn't measure it.) - New code is much cleaner. Let's keep in mind that we have to support it for quite a while. Are you ok with it? 
Best regards, Martin > -----Original Message----- > From: Andrew Haley > Sent: Donnerstag, 27. August 2020 11:45 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' ; jdk- > updates-dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > On 17/08/2020 22:54, Doerr, Martin wrote: > > Hi, > > > > I'd like to backport https://bugs.openjdk.java.net/browse/JDK-8241234 to > JDK11u. > > > > Original JDK15 patch > (https://hg.openjdk.java.net/jdk/jdk/rev/87c506c8be63) doesn't fit to > JDK11u because the locking code has been reworked by > https://bugs.openjdk.java.net/browse/JDK-8229844 > > As mentioned by Vladimir, there's already a GraalVM version available > which consists of 2 patches (original + addon) and which can be applied: > > https://github.com/graalvm/labs-openjdk- > 11/commit/6c162cb15262e6aa77e36eb3a268320ef0a206a4 > > https://github.com/graalvm/labs-openjdk- > 11/commit/6a28a618cdbe595f9a3993e0eb63c01ccae1a528 > > Only JVMCI part from GraalVM doesn't apply automatically. The version of > this file from JDK15 is very simple and fits perfectly. > > > > Please review the JDK11u backport webrev: > > > http://cr.openjdk.java.net/~mdoerr/8241234_monitorenterexit_11u/webre > v.00/ > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Thu Aug 27 11:43:06 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 13:43:06 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: References: <87tuwr6s5j.fsf@redhat.com> Message-ID: <87r1rs5n79.fsf@redhat.com> Thanks for the reviews Vladimir and Tobias. Roland. From rwestrel at redhat.com Thu Aug 27 11:52:43 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 13:52:43 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: <09a82d80-208c-6cea-da6b-e501d65e0f79@oracle.com> References: <87wo1n6snc.fsf@redhat.com> <09a82d80-208c-6cea-da6b-e501d65e0f79@oracle.com> Message-ID: <87o8mw5mr8.fsf@redhat.com> Thanks for the review, Vladimir. Roland. From vladimir.x.ivanov at oracle.com Thu Aug 27 12:54:23 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 27 Aug 2020 15:54:23 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: Hi Andrew, > So, if I can summarize (please correct me if I misunderstand): > > You are as concerned about existing complexity in vector handling as > much as complexity added by this patch, whether the latter is to AArch64 > code or shared code. 
> > The goal you would like to achieve is a single set of rules for a > single kind of vector register whose size is parameterized, the > appropriate value being derived from each specific vector operation. > > Your main concern about this patch is that it adds yet another > additional vector kind to the current 'wrong' multi-kind vector model > and, what is worse, one with a different behaviour, taking us further > from your desired goal. Yes, correct. > Your other concern is that this design does not allow for the AArch64 > ISA predication or, indeed, for what you treat uniformly as the > 'implicit' predication imposed on a 'logical' max vector size (2048 > bits) by the specific AVX/SVE/NEON hardware vector size. No, I'm not concerned about that. I mentioned SVE implicit predication to illustrate that there's a higher-level abstraction in the JVM above ISA level which hides some of the functionality ISA exposes. And I'm perfectly fine with that. >> But you should definitely prefer 1-slot design for vector registers then >> ;-) > > Indeed I do :-] > > So, let me respond to the above summary points, assuming I have them > down right. > > I agree that your end goal is highly desirable. However, we are not > there yet and since your attempts to do so have not succeeded so far I > don't think that means we are compelled to drop the current patch. As > you say this could (and, if it is adopted, should) be regarded as a > useful stop-gap until we come up with a unified, parameterized vector > implementation that makes it redundant. Unfortunately, there was simply not enough motivation on x86 (and hence resources spent) to address it there. Vector API support for x86 stretched the implementation in a different direction: combinatorial explosion of AD instructions needed to cover all useful cases. It required switching to full-width vectors in x86.ad file which left RA concerns waiting next opportunity. > That said, I'm not pushing hard to keep the patch if the consequence is > generating significant work later to undo it. The number of users who > might benefit from using SVE vectors from Java now or in the near future > does not look like it is going to be very large (if you are not making a > lot of use of SVE registers then that is a lot of wasted silicon and I > suspect it's going to be the rare case that someone codes an app in Java > that needs to make continuous use of SVE -- mind you, by the same token > I guess that also applies for AVX on Intel). I don't consider RA part of the patch as the show-stopper issue for initial SVE support. As I said to Ningsheng, I'm fine with the patch as it is now if we agree it's a stop-the-gap solution and there's a commitment to invest into the proper support. I initially put options #1/#2 (which don't require any changes in RA shared code) as possible alternatives way to temporarily address the problem. Both require additional simplifying assumptions and hence I didn't insist they should be chosen. > I'm not sure pushing this now will add a lot more work later. It seems > to me that this code is actually moving in the right direction for the > sort of solution you want. The AArch64 VecA register /is/ > size-parameterized, albeit by a size fixed at startup rather than per > operation. So, that's one reason why I don't know if this implies a lot > more rework to move towards your desired goal. 
Surely, if we do arrive > at a unifying vector model that can replace the existing multi-kind > vectors then it ought to be able to subsume this code - unless of course > it replaces it wholesale. > > Are you concerned that adding this patch will result in more cases to > pick through and correct? > > Are you worried that we might have to withdraw some of the support this > patch enables to arrive at the final goal? > > Also, Ningsheng and his colleagues have laid some foundations for > implementing predicated operations with this patch and have that work in > the pipeline. Once again this is moving towards the desired goal even if > it might end up doign so in a slightly sideways fashion. Perhaps we > could continue this stop-gap experiment as an experimental option in > order to learn from the experience? I definitely don't want to hinder/block the impressive work Ningsheng and others at Arm are doing for SVE support. Frankly speaking, my main concern is that the implementation can stay that way forever ;-) That's why I'm trying to get enough ground covered in the discussion and some agreements/commitments to be made before it is integrated. I don't have any strong objections to the patch which could justify blocking its integration, but on a higher-level I do voice my concerns about where it pushes the implementation longer-term. Unfortunately, as it is shaped now, I don't see how x86 can benefit from it. So, I'm afraid this particular route with vecA and _is_scalable bit will stay purely AArch64-specific exercise. Leaving RA part aside, I have one suggestion which should help in the future: let's try to consistently follow full-width vector abstraction. In AD file, vecA operand is way too similar to vecX et al which makes a wrong impression it's yet another vector flavor. So, choosing a better name will help when representation changes. For example, x86 moved away from vecX/... operands to a single generic one (called "vec") and you can take a loot at x86.ad to see the result. Best regards, Vladimir Ivanov From christian.hagedorn at oracle.com Thu Aug 27 14:54:22 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 27 Aug 2020 16:54:22 +0200 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: References: Message-ID: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> Hi Nhat Looks good to me! Just make sure you that next time you assign the bug to you or a sponsor and/or leave a comment that you intend to work on it to avoid the possibility of some duplicated work (was no problem in this case) ;-) Best regards, Christian On 26.08.20 20:55, Nhat Nguyen wrote: > Hi hotspot-compiler-dev, > > Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271 > The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead. > I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended. 
> > webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ > > Thank you, > Nhat > From martin.doerr at sap.com Thu Aug 27 15:07:08 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 27 Aug 2020 15:07:08 +0000 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Hi Corey, > If I make a requirement, I feel decode0 should check that the > requirement is met, and raise some kind of internal error if it isn't. > That actually was my first implementation, but I received some comments > during an internal review suggesting that I just "round down" the > destination count to the closest multiple of 3 less than or equal to the > returned value, rather than throw an internal exception which would > confuse users. This "enforces" the rule, in some sense, without error > handling. Do you have some thoughts about this? I think the rounding logic is hard to understand and I'm not sure if it's correct (you're rounding up for the 1st computation of chars_decoded). If we don't use it, it will never get tested (because the intrinsic always returns a multiple of 3). I prefer having a more simple version which is easy to understand and for which we can test all cases. I think we should be able to catch violations of this requirement by adding good JTREG tests. An illegal intrinsic implementation should never pass the tests. So I don't see a need to catch an illegal state in the Java source code in this case. I guess this will be best for intrinsic implementors for other platforms as well. I'd appreciate more opinions on this. > I will double check that everything compiles and runs properly with gcc > 7.3.1. Please note that 7.3.1 is our minimum for Big Endian linux. For Little Endian it's 7.4.0. You can also find this information here: https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms under "Other JDK 13 build platforms" which hasn't changed since then. > I will use __attribute__ ((align(16))) instead of __vector, and make > them arrays of 16 unsigned char. Maybe __vectors works as expected, too, now. Whatever we use, I'd appreciate to double-check the alignment e.g. by using gdb. I don't remember what we had tried and why it didn't work as desired. > I was following what was done for encodeBlock, but it appears > encodeBlock's style isn't what is used for the other intrinsics. I will > correct decodeBlock to use the prevailing style. Another patch should > be added (not part of this webrev) to correct encodeBlock's style. In your code one '\' is not aligned with the other ones. > Ah, this is another thing I didn't know about. I will make some > regression tests. Thanks. There's some documentation available: https://openjdk.java.net/jtreg/ I guess your colleagues can assist you with that so you don't have to figure out everything alone. > Thanks for your time on this. As you can tell, I'm inexperienced in > writing openjdk code, so your patience and careful review is really > appreciated. I'm glad you work on contributions. I think we should welcome new contributors and assist as far as we can. Best regards, Martin > -----Original Message----- > From: Corey Ashford > Sent: Donnerstag, 27. 
August 2020 00:17 > To: Doerr, Martin ; Michihiro Horie > > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev dev at openjdk.java.net>; Kazunori Ogata ; > joserz at br.ibm.com > Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > API for Base64 decoding > > Hi Martin, > > Some inline responses below. > > On 8/26/20 8:26 AM, Doerr, Martin wrote: > > > Hi Corey, > > > > I should explain my comments regarding Base64.java better. > > > >> Let's be precise: "should process a multiple of four" => "must process a > >> multiple of four" > > Did you try to support non-multiple of 4 and this was intended as > recommendation? > > I think making it a requirement and simplifying the logic in decode0 is > better. > > Or what's the benefit of the recommendation? > > If I make a requirement, I feel decode0 should check that the > requirement is met, and raise some kind of internal error if it isn't. > That actually was my first implementation, but I received some comments > during an internal review suggesting that I just "round down" the > destination count to the closest multiple of 3 less than or equal to the > returned value, rather than throw an internal exception which would > confuse users. This "enforces" the rule, in some sense, without error > handling. Do you have some thoughts about this? > > > > >>> If any illegal base64 bytes are encountered in the source by the > >>> intrinsic, the intrinsic can return a data length of zero or any > >>> number of bytes before the place where the illegal base64 byte > >>> was encountered. > >> I think this has a drawback. Somebody may use a debugger and want to > stop > >> when throwing IllegalArgumentException. He should see the position > which > >> matches the Java implementation.kkkk > > This is probably hard to understand. Let me try to explain it by example: > > 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the > destination array. > > 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed > by your specification. > > 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the > large while loop in decodeBlockSlow). > > 4. A JVMTI agent (debugger) reads dp and dst. > > 5. The person using the debugger gets angry because more bytes than dp > were written into dst. The JVM didn't follow the specified behavior. > > > > I guess we can and should avoid it by specifying that the intrinsic needs to > return the dp value matching the number of Bytes written. > > That's an interesting point. I will change the specification, and the > intrinsic implementation. Right now the Power9/10 intrinsic returns 0 > when any illegal character is discovered, but I've been thinking about > returning the number of bytes already written, which will allow > decodeBlockSlow to more quickly find the offending character. This > provides another good reason to make that change. > > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: Doerr, Martin > >> Sent: Dienstag, 25. August 2020 15:38 > >> To: Corey Ashford ; Michihiro Horie > >> > >> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev >> dev at openjdk.java.net>; Kazunori Ogata ; > >> joserz at br.ibm.com > >> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate > and > >> API for Base64 decoding > >> > >> Hi Corey, > >> > >> thanks for proposing this change. I have comments and suggestions > >> regarding various files. 
> >> > >> > >> Base64.java > >> > >> This is the only file which needs another review from core-libs-dev. > >> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can > >> consume as many bytes as the implementation wants. > >> > >> Comment before decodeBlock: > >> Let's be precise: "should process a multiple of four" => "must process a > >> multiple of four" > >> > >>> If any illegal base64 bytes are encountered in the source by the > >>> intrinsic, the intrinsic can return a data length of zero or any > >>> number of bytes before the place where the illegal base64 byte > >>> was encountered. > >> I think this has a drawback. Somebody may use a debugger and want to > stop > >> when throwing IllegalArgumentException. He should see the position > which > >> matches the Java implementation. > >> > >> Please note that the comment indentation differs from other comments. > > Will fix. > > >> > >> decode0: Final "else" after return is redundant. > > Will fix. > > >> > >> > >> stubGenerator_ppc.cpp > >> > >> "__vector" breaks AIX build! > >> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? > >> Please either support Big Endian properly or #ifdef it out. > > I have been compiling with only Advance Toolchain 13, which is 9.3.1, > and only on Linux. It will not work with big endian, so it won't work > on AIX, however obviously it shouldn't break the AIX build, so I will > address that. There's code to set UseBASE64Intrinsics to false on big > endian, but you're right -- I should ifdef all of the intrinsic code for > little endian for now. Getting it to work on big endian / AIX shouldn't > be difficult, but it's not in my scope of work at the moment. > > I will double check that everything compiles and runs properly with gcc > 7.3.1. > > >> What exactly does it (do) on linux? > > It's an arch-specific type that's 16 bytes in size and aligned on a > 16-byte boundary. > > >> I remember that we had tried such prefixes but were not satisfied. I think > it > >> didn't enforce 16 Byte alignment if I remember correctly. > > I will use __attribute__ ((align(16))) instead of __vector, and make > them arrays of 16 unsigned char. > > >> > >> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- > >> 8086069). So the argument registers for offset, length and isURL may > contain > >> garbage in the higher bits. > > Wow, that's good to know! I will mask off the incoming values. > > >> > >> You may want to use load_const_optimized which produces shorter code. > > Will fix. > > >> > >> You may want to use __ align(32) to align unrolled_loop_start. > > Will fix. > > >> > >> I'll review the algorithm in detail when I find more time. > >> > >> > >> assembler_ppc.hpp > >> assembler_ppc.inline.hpp > >> vm_version_ppc.cpp > >> vm_version_ppc.hpp > >> Please rebase. Parts of the change were pushed as part of 8248190: > Enable > >> Power10 system and implement new byte-reverse instructions > > Will do. > > >> > >> > >> vmSymbols.hpp > >> Indentation looks odd at the end. > > I was following what was done for encodeBlock, but it appears > encodeBlock's style isn't what is used for the other intrinsics. I will > correct decodeBlock to use the prevailing style. Another patch should > be added (not part of this webrev) to correct encodeBlock's style. > > >> > >> > >> library_call.cpp > >> Good. Indentation style of the call parameters differs from encodeBlock. > > Will fix. > > >> > >> > >> runtime.cpp > >> Good. 
> >> > >> > >> aotCodeHeap.cpp > >> vmSymbols.cpp > >> shenandoahSupport.cpp > >> vmStructs_jvmci.cpp > >> shenandoahSupport.cpp > >> escape.cpp > >> runtime.hpp > >> stubRoutines.cpp > >> stubRoutines.hpp > >> vmStructs.cpp > >> Good and trivial. > >> > >> > >> Tests: > >> I think we should have JTREG tests to check for regressions in the future. > > Ah, this is another thing I didn't know about. I will make some > regression tests. > > Thanks for your time on this. As you can tell, I'm inexperienced in > writing openjdk code, so your patience and careful review is really > appreciated. > > - Corey From aph at redhat.com Thu Aug 27 15:25:16 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Aug 2020 16:25:16 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: Message-ID: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Hi, On 27/08/2020 11:04, Doerr, Martin wrote: > >> Why is anyone backporting a P4 Enhancement? Seems weird. > This is a good question in general. Personally, I'd vote for > backporting fewer less important things to 11u in the future. We > should better focus on 17 IMHO. > > However, there are some arguments for backporting this one: > - Oracle has done so. There may be more backports in this area and > I'd expect less effort if we have the same code in the open version. > - Performance is supposed to be better. (Though I didn't measure it.) > - New code is much cleaner. Let's keep in mind that we have to > support it for quite a while. > > Are you ok with it? I'm unsure. While "Oracle has backported it" has been a slam-dunk justification for many patches, I am concerned about the destabilizing effect of the volume of patches we are processing. "Better performance" is not in itself justification for a backport unless the improvement is really compelling. "Cleanups" are a red flag. The miserable history of code that has been broken by seemingly innocuous cleanups is long. This is a big change that affects some very delicate code, but the fact that there is already a GraalVM patch we can use is quite persuasive. So I'm not refusing it, I want people's opinions. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sgehwolf at redhat.com Thu Aug 27 15:59:32 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Thu, 27 Aug 2020 17:59:32 +0200 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > Hi, > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > This is a good question in general. Personally, I'd vote for > > backporting fewer less important things to 11u in the future. We > > should better focus on 17 IMHO. > > > > However, there are some arguments for backporting this one: > > - Oracle has done so. There may be more backports in this area and > > I'd expect less effort if we have the same code in the open version. > > - Performance is supposed to be better. (Though I didn't measure it.) > > - New code is much cleaner. Let's keep in mind that we have to > > support it for quite a while. > > > > Are you ok with it? > > I'm unsure. 
While "Oracle has backported it" has been a slam-dunk > justification for many patches, I am concerned about the destabilizing > effect of the volume of patches we are processing. > > "Better performance" is not in itself justification for a backport > unless the improvement is really compelling. > > "Cleanups" are a red flag. The miserable history of code that has been > broken by seemingly innocuous cleanups is long. This is a big change > that affects some very delicate code, but the fact that there is > already a GraalVM patch we can use is quite persuasive. > > So I'm not refusing it, I want people's opinions. It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems to be coming from Graal. Until there is a more compelling reason to backport this (other than performance for some JVMCI impl) we shouldn't backport this. We already have a label for these: jdk11u-jvmci-defer. We should apply that and re-evaluate later if needed. My $0.02 Thanks, Severin From vladimir.kozlov at oracle.com Thu Aug 27 16:33:44 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 09:33:44 -0700 Subject: New EA Metropolis build Message-ID: The build at the Project Metropolis Early Access page [1] has been refreshed. It was updated to JDK 15. Binaries are based on Metropolis repository [2] which was synced with jdk-15+36 (JDK 15 build 36). Graal in Metropolis is based on GraalVM CE version of Graal [3]. It was updated up to GR-24572 commit [4] and additional patch was applied [5] to enable libgraal build with JDK 15. Regards, Vladimir Kozlov [1] https://jdk.java.net/metropolis/ [2] https://github.com/openjdk/metropolis [3] https://github.com/oracle/graal [4] [GR-24572] JDK15 java.lang.invoke.MemberName is reachable. https://github.com/oracle/graal/commit/b0735cd5fb384cfdb522488edf1d83b013507d72 [5] [GR-25120] Fixed leaked indirect java constants on jdk15. https://github.com/oracle/graal/commit/e82d1090c23493a6d665e579cacad8241ea75318 From vladimir.kozlov at oracle.com Thu Aug 27 17:23:52 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 10:23:52 -0700 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> Message-ID: On 8/27/20 12:53 AM, Christian Hagedorn wrote: > On 25.08.20 19:42, Christian Hagedorn wrote: >> On 25.08.20 16:13, Roland Westrelin wrote: >>> >>>> In the testcase, a LoadSNode is cloned in >>>> PhaseIdealLoop::split_if_with_blocks_post() for each use such that they >>>> can float out of a loop. To ensure that these loads cannot float back >>>> into the loop, we pin them by setting their control input [1]. In the >>>> testcase, all 3 new clones are pinned to a loop exit node that is part >>>> of an outer strip mined loop (see [2]). >>> >>> Do I understand this right, that all 3 clones are pinned with the same >>> control? So they common and only of them is kept? >> >> Yes, exactly. All are pinned to the inner loop exit node. But at the time we hit the assertion failure, we still got >> one cloned load (903 LoadS) that is an input to the store (575 StoreI) that's going into the outer strip mined loop >> safepoint, and one load (901 LoadS) that is triggering the dominance failure. LoadS 902 was removed at some point in >> between due to other optimizations. 
> > As Roland and I have discussed offline, it seems to be better and safer to do a simpler fix that does not change the > original behavior of the optimization. The new fix suggests not yank AddP nodes (which are inputs to the cloned > LoadSNodes in the testcase) and also to not yank gc barriers. In the testcase, the cloned LoadSNodes are still pinned at > the loop exit but now they can be optimized and common up to one node during igvn that only belongs to the safepoint in > the outer strip mined loop (i.e. no load after the loop anymore). The load is still successfully removed from the inner > loop: > > http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ > > I left the improved dominance failure dumping as it is. Good. > > We think that it would be a good idea to revisit this cloning optimization in an RFE and also consider webrev.01 there > as it seems to be more like an enhancement for loop strip mining rather than a bug fix. I filed [1] which summarizes > some thoughts about it. > > What do others think about that? I agree with that. Thanks, Vladimir > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8252372 From vladimir.kozlov at oracle.com Thu Aug 27 17:27:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 10:27:19 -0700 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: Thank you, Dean Vladimir K On 8/27/20 1:36 AM, Dean Long wrote: > Looks good. > > dl > > On 8/26/20 5:32 PM, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8252396/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8252396 >> >> 8252058 added new markId DEOPT_MH_HANDLER_ENTRY to handle deoptimization for MH invoke. >> But changes did not updated AOT (jaotc and Hotspot's AOT code) to handle this new markId. >> >> We should handle DEOPT_MH_HANDLER_ENTRY in AOT similar to DEOPT_HANDLER_ENTRY. >> >> In aotCompiledMethod.hpp, if DEOPT_MH_HANDLER_ENTRY value is set, CompiledMethod::_deopt_mh_handler_begin [2] is set >> similar to Graal JIT [3]. I kept current code to set _deopt_mh_handler_begin to 'this' when DEOPT_MH_HANDLER_ENTRY >> value is not set. But may be it should be set to NULL as in [3]. May be it does not matter because offset is not used >> when there are not MH invoke in method. >> >> Tested: ran tests which used AOT (including Graal testing). >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252058 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/compiledMethod.hpp#l168 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/nmethod.cpp#l764 > From vladimir.kozlov at oracle.com Thu Aug 27 17:39:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 10:39:42 -0700 (PDT) Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: <94cfa9c9-f5dc-a443-cf4a-53b642c68c84@oracle.com> I created open bug and will use its ID for changeset: https://bugs.openjdk.java.net/browse/JDK-8252467 Thank, Vladimir K On 8/27/20 2:18 AM, Andrew Haley wrote: > On 27/08/2020 01:32, Vladimir Kozlov wrote: >> [1] https://bugs.openjdk.java.net/browse/JDK-8252396 > > You can't view this issue > > It may have been deleted or you don't have permission to view it. 
> From jingxinc at amazon.com Thu Aug 27 18:08:49 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Thu, 27 Aug 2020 18:08:49 +0000 Subject: RFR 8239090: Improve CPU feature support in VM_version Message-ID: Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/ JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 Yesterday I sent a wrong one, so I send it again, I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. Regards, Eric Chen From Divino.Cesar at microsoft.com Thu Aug 27 19:36:27 2020 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Thu, 27 Aug 2020 19:36:27 +0000 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc Message-ID: Hi there, RFE: https://bugs.openjdk.java.net/browse/JDK-8250668 Webrev: https://cr.openjdk.java.net/~adityam/cesar/8250668/0/ Need sponsor: Yes Tested on: Windows/Linux/MacOS tiers 1-3 can I please get some reviews for the Webrev linked above? The work consists of renaming "method_oop" ocurrences all around the code base to just "method". I've tested this on x86_64 only?* Can someone please help testing on other architectures as well: x86_32, PPC, ARM32/64, S390? Thank you, Cesar From richard.reingruber at sap.com Thu Aug 27 20:32:36 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Thu, 27 Aug 2020 20:32:36 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Goetz, > I read through your change again. It looks good to me now. > The new naming and additional comments make it > easier to read I think, thank you. Thanks for all your input! > One small thing: > deoptimization.cpp, l. 1503 > You don't really need the brackets. Two lines below you don't use them either. > (No webrev needed) Thanks for providing the correct line off list. Fixed! I prepared a new webrev, because I had to rebase after JDK-8249293 [1] and because I wanted to make use of JDK-8251384 [2] Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/ The delta looks bigger than it is. Most of it is re-indentation of VM_GetOrSetLocal::deoptimize_objects(). You can see this if you look at http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/src/hotspot/share/prims/jvmtiImpl.cpp.udiff.html which does not include the whitespace change. Hope you are still ok with webrev.8. The changes are marginal. I've commented each below. Thanks, Richard. --- Details below --- src/hotspot/share/prims/jvmtiImpl.cpp @@ -425,11 +425,11 @@ , _depth(depth) , _index(index) , _type(type) , _jvf(NULL) , _set(false) - , _eb(NULL, NULL, false) // no references escape + , _eb(NULL, NULL, type == T_OBJECT) , _result(JVMTI_ERROR_NONE) Currently 'type' is never equal to T_OBJECT at this location, still I think it is better to check. The compiler will replace the compare with false. 
@@ -630,11 +630,11 @@ } // Revert optimizations based on escape analysis if this is an access to a local object bool VM_GetOrSetLocal::deoptimize_objects(javaVFrame* jvf) { #if COMPILER2_OR_JVMCI - if (NOT_JVMCI(DoEscapeAnalysis &&) _type == T_OBJECT) { + assert(_type == T_OBJECT, "EscapeBarrier should not be active if _type != T_OBJECT"); I removed the if from VM_GetOrSetLocal::deoptimize_objects(), because now it only gets called if the VM_GetOrSetLocal instance has an active EscapeBarrier which will be the case iff the local type is T_OBJECT and if either C2 escape analysis is enabled or Graal is used. src/hotspot/share/runtime/deoptimization.cpp You suggested to remove the braces. Done. src/hotspot/share/runtime/deoptimization.hpp Must provide definition of EscapeBarrier::barrier_active() for new call site in VM_GetOrSetLocal::doit_prologue() if building with COMPILER2_OR_JVMCI not defined. test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysisEnabled.java Make use of [2] and pass test with minimal vm. [1] https://bugs.openjdk.java.net/browse/JDK-8249293 [2] https://bugs.openjdk.java.net/browse/JDK-8251384 -----Original Message----- From: Lindenmaier, Goetz Sent: Samstag, 22. August 2020 07:46 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, I read through your change again. It looks good to me now. The new naming and additional comments make it easier to read I think, thank you. One small thing: deoptimization.cpp, l. 1503 You don't really need the brackets. Two lines below you don't use them either. (No webrev needed) Best regards, Goetz. From igor.veresov at oracle.com Fri Aug 28 01:20:48 2020 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 27 Aug 2020 18:20:48 -0700 Subject: RFR 8239090: Improve CPU feature support in VM_version In-Reply-To: References: Message-ID: <47EE441C-09D0-43C1-A339-E8323B866A66@oracle.com> You can actually make a constexpr array of feature objects and then use constexpr function with a loop to look it up. The c++ compiler will generate an O(1) table lookup for it. That would be a good way to get rid of the ugly macro (we allow c++14 now). 
For example foo() in this example: enum E { a, b, c }; struct P { E _e; // key int _v; // value constexpr P(E e, int v) : _e(e), _v(v) { } }; constexpr static P ps[3] = { P(a, 0xdead), P(b, 0xbeef), P(c, 0xf00d)}; constexpr int match(E e) { for (const auto& p : ps) { if (p._e == e) { return p._v; } } return -1; } int foo(E e) { return match(e); } Will be compiled into: __Z3foo1E: ## @_Z3foo1E .cfi_startproc ## %bb.0: movl $-1, %eax cmpl $2, %edi ja LBB0_2 ## %bb.1: pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq %rsp, %rbp .cfi_def_cfa_register %rbp movslq %edi, %rax leaq l_switch.table._Z3foo1E(%rip), %rcx movq (%rcx,%rax,8), %rax movl 4(%rax), %eax popq %rbp LBB0_2: retq .cfi_endproc ## -- End function .section __TEXT,__const .p2align 4 ## @_ZL2ps __ZL2ps: .long 0 ## 0x0 .long 57005 ## 0xdead .long 1 ## 0x1 .long 48879 ## 0xbeef .long 2 ## 0x2 .long 61453 ## 0xf00d .section __DATA,__const .p2align 3 ## @switch.table._Z3foo1E l_switch.table._Z3foo1E: .quad __ZL2ps .quad __ZL2ps+8 .quad __ZL2ps+16 igor > On Aug 27, 2020, at 11:08 AM, Eric, Chan wrote: > > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 > > Yesterday I sent a wrong one, so I send it again, > I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. > > Regards, > Eric Chen > From ningsheng.jian at arm.com Fri Aug 28 05:56:56 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Fri, 28 Aug 2020 13:56:56 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: Hi Vladimir, Thanks a lot for helping clarifying your concerns which will benefit future direction. On 8/27/20 8:54 PM, Vladimir Ivanov wrote: > Hi Andrew, > >> So, if I can summarize (please correct me if I misunderstand): >> >> You are as concerned about existing complexity in vector handling as >> much as complexity added by this patch, whether the latter is to AArch64 >> code or shared code. >> >> The goal you would like to achieve is a single set of rules for a >> single kind of vector register whose size is parameterized, the >> appropriate value being derived from each specific vector operation. >> >> Your main concern about this patch is that it adds yet another >> additional vector kind to the current 'wrong' multi-kind vector model >> and, what is worse, one with a different behaviour, taking us further >> from your desired goal. > > Yes, correct. > >> Your other concern is that this design does not allow for the AArch64 >> ISA predication or, indeed, for what you treat uniformly as the >> 'implicit' predication imposed on a 'logical' max vector size (2048 >> bits) by the specific AVX/SVE/NEON hardware vector size. > > No, I'm not concerned about that. 
I mentioned SVE implicit predication > to illustrate that there's a higher-level abstraction in the JVM above > ISA level which hides some of the functionality ISA exposes. And I'm > perfectly fine with that. > >>> But you should definitely prefer 1-slot design for vector registers then >>> ;-) >> >> Indeed I do :-] >> >> So, let me respond to the above summary points, assuming I have them >> down right. >> >> I agree that your end goal is highly desirable. However, we are not >> there yet and since your attempts to do so have not succeeded so far I >> don't think that means we are compelled to drop the current patch. As >> you say this could (and, if it is adopted, should) be regarded as a >> useful stop-gap until we come up with a unified, parameterized vector >> implementation that makes it redundant. > > Unfortunately, there was simply not enough motivation on x86 (and hence > resources spent) to address it there. Vector API support for x86 > stretched the implementation in a different direction: combinatorial > explosion of AD instructions needed to cover all useful cases. It > required switching to full-width vectors in x86.ad file which left RA > concerns waiting next opportunity. > >> That said, I'm not pushing hard to keep the patch if the consequence is >> generating significant work later to undo it. The number of users who >> might benefit from using SVE vectors from Java now or in the near future >> does not look like it is going to be very large (if you are not making a >> lot of use of SVE registers then that is a lot of wasted silicon and I >> suspect it's going to be the rare case that someone codes an app in Java >> that needs to make continuous use of SVE -- mind you, by the same token >> I guess that also applies for AVX on Intel). > > I don't consider RA part of the patch as the show-stopper issue for > initial SVE support. As I said to Ningsheng, I'm fine with the patch as > it is now if we agree it's a stop-the-gap solution and there's a > commitment to invest into the proper support. > > I initially put options #1/#2 (which don't require any changes in RA > shared code) as possible alternatives way to temporarily address the > problem. Both require additional simplifying assumptions and hence I > didn't insist they should be chosen. > >> I'm not sure pushing this now will add a lot more work later. It seems >> to me that this code is actually moving in the right direction for the >> sort of solution you want. The AArch64 VecA register /is/ >> size-parameterized, albeit by a size fixed at startup rather than per >> operation. So, that's one reason why I don't know if this implies a lot >> more rework to move towards your desired goal. Surely, if we do arrive >> at a unifying vector model that can replace the existing multi-kind >> vectors then it ought to be able to subsume this code - unless of course >> it replaces it wholesale. >> >> Are you concerned that adding this patch will result in more cases to >> pick through and correct? >> >> Are you worried that we might have to withdraw some of the support this >> patch enables to arrive at the final goal? >> >> Also, Ningsheng and his colleagues have laid some foundations for >> implementing predicated operations with this patch and have that work in >> the pipeline. Once again this is moving towards the desired goal even if >> it might end up doign so in a slightly sideways fashion. Perhaps we >> could continue this stop-gap experiment as an experimental option in >> order to learn from the experience? 
> > I definitely don't want to hinder/block the impressive work Ningsheng > and others at Arm are doing for SVE support. > > Frankly speaking, my main concern is that the implementation can stay > that way forever ;-) That's why I'm trying to get enough ground covered > in the discussion and some agreements/commitments to be made before it > is integrated. > > I don't have any strong objections to the patch which could justify > blocking its integration, but on a higher-level I do voice my concerns > about where it pushes the implementation longer-term. > > Unfortunately, as it is shaped now, I don't see how x86 can benefit from > it. So, I'm afraid this particular route with vecA and _is_scalable bit > will stay purely AArch64-specific exercise. > > Leaving RA part aside, I have one suggestion which should help in the > future: let's try to consistently follow full-width vector abstraction. > In AD file, vecA operand is way too similar to vecX et al which makes a > wrong impression it's yet another vector flavor. So, choosing a better > name will help when representation changes. For example, x86 moved away > from vecX/... operands to a single generic one (called "vec") and you > can take a loot at x86.ad to see the result. > Thanks for the suggestion. In current implementation vecA does not include vecD/vecX for NEON - so actually it's regarded as another vector flavor. We try to keep the SVE implementation separated from original NEON code (and a new ad file is also introduced), to make the code better maintainable and reviewable. What do you think about this naming, Andrew? Thanks, Ningsheng From goetz.lindenmaier at sap.com Fri Aug 28 06:37:39 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 06:37:39 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Richard, Thanks for the new webrev. The small improvements are fine, too. Reviewed from my side. Best regards, Goetz. > -----Original Message----- > From: Reingruber, Richard > Sent: Thursday, August 27, 2020 10:33 PM > To: Lindenmaier, Goetz ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Goetz, > > > I read through your change again. It looks good to me now. > > The new naming and additional comments make it > > easier to read I think, thank you. > > Thanks for all your input! > > > One small thing: > > deoptimization.cpp, l. 1503 > > You don't really need the brackets. Two lines below you don't use them > either. > > (No webrev needed) > > Thanks for providing the correct line off list. Fixed! > > I prepared a new webrev, because I had to rebase after JDK-8249293 [1] and > because I wanted to make use of JDK-8251384 [2] > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ > Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/ > > The delta looks bigger than it is. Most of it is re-indentation of > VM_GetOrSetLocal::deoptimize_objects(). 
You can see this if you look at > > http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/src/hotsp > ot/share/prims/jvmtiImpl.cpp.udiff.html > > which does not include the whitespace change. > > Hope you are still ok with webrev.8. The changes are marginal. I've > commented > each below. > > Thanks, Richard. > > --- Details below --- > > src/hotspot/share/prims/jvmtiImpl.cpp > > @@ -425,11 +425,11 @@ > , _depth(depth) > , _index(index) > , _type(type) > , _jvf(NULL) > , _set(false) > - , _eb(NULL, NULL, false) // no references escape > + , _eb(NULL, NULL, type == T_OBJECT) > , _result(JVMTI_ERROR_NONE) > > Currently 'type' is never equal to T_OBJECT at this location, still I think it > is better to check. The compiler will replace the compare with false. > > @@ -630,11 +630,11 @@ > } > > // Revert optimizations based on escape analysis if this is an access to a > local object > bool VM_GetOrSetLocal::deoptimize_objects(javaVFrame* jvf) { > #if COMPILER2_OR_JVMCI > - if (NOT_JVMCI(DoEscapeAnalysis &&) _type == T_OBJECT) { > + assert(_type == T_OBJECT, "EscapeBarrier should not be active if _type != > T_OBJECT"); > > I removed the if from VM_GetOrSetLocal::deoptimize_objects(), because > now it > only gets called if the VM_GetOrSetLocal instance has an active > EscapeBarrier > which will be the case iff the local type is T_OBJECT and if either C2 escape > analysis is enabled or Graal is used. > > src/hotspot/share/runtime/deoptimization.cpp > > You suggested to remove the braces. Done. > > src/hotspot/share/runtime/deoptimization.hpp > > Must provide definition of EscapeBarrier::barrier_active() for new call site in > VM_GetOrSetLocal::doit_prologue() if building with COMPILER2_OR_JVMCI > not > defined. > > test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysis > Enabled.java > > Make use of [2] and pass test with minimal vm. > > [1] https://bugs.openjdk.java.net/browse/JDK-8249293 > [2] https://bugs.openjdk.java.net/browse/JDK-8251384 > > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Samstag, 22. August 2020 07:46 > To: Reingruber, Richard ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Richard, > > I read through your change again. It looks good to me now. > The new naming and additional comments make it > easier to read I think, thank you. > > One small thing: > deoptimization.cpp, l. 1503 > You don't really need the brackets. Two lines below you don't use them > either. > (No webrev needed) > > Best regards, > Goetz. From richard.reingruber at sap.com Fri Aug 28 07:41:02 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 28 Aug 2020 07:41:02 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Thanks a lot! Richard. -----Original Message----- From: Lindenmaier, Goetz Sent: Freitag, 28. 
August 2020 08:38 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Thanks for the new webrev. The small improvements are fine, too. Reviewed from my side. Best regards, Goetz. > -----Original Message----- > From: Reingruber, Richard > Sent: Thursday, August 27, 2020 10:33 PM > To: Lindenmaier, Goetz ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Goetz, > > > I read through your change again. It looks good to me now. > > The new naming and additional comments make it > > easier to read I think, thank you. > > Thanks for all your input! > > > One small thing: > > deoptimization.cpp, l. 1503 > > You don't really need the brackets. Two lines below you don't use them > either. > > (No webrev needed) > > Thanks for providing the correct line off list. Fixed! > > I prepared a new webrev, because I had to rebase after JDK-8249293 [1] and > because I wanted to make use of JDK-8251384 [2] > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ > Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/ > > The delta looks bigger than it is. Most of it is re-indentation of > VM_GetOrSetLocal::deoptimize_objects(). You can see this if you look at > > http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/src/hotsp > ot/share/prims/jvmtiImpl.cpp.udiff.html > > which does not include the whitespace change. > > Hope you are still ok with webrev.8. The changes are marginal. I've > commented > each below. > > Thanks, Richard. > > --- Details below --- > > src/hotspot/share/prims/jvmtiImpl.cpp > > @@ -425,11 +425,11 @@ > , _depth(depth) > , _index(index) > , _type(type) > , _jvf(NULL) > , _set(false) > - , _eb(NULL, NULL, false) // no references escape > + , _eb(NULL, NULL, type == T_OBJECT) > , _result(JVMTI_ERROR_NONE) > > Currently 'type' is never equal to T_OBJECT at this location, still I think it > is better to check. The compiler will replace the compare with false. > > @@ -630,11 +630,11 @@ > } > > // Revert optimizations based on escape analysis if this is an access to a > local object > bool VM_GetOrSetLocal::deoptimize_objects(javaVFrame* jvf) { > #if COMPILER2_OR_JVMCI > - if (NOT_JVMCI(DoEscapeAnalysis &&) _type == T_OBJECT) { > + assert(_type == T_OBJECT, "EscapeBarrier should not be active if _type != > T_OBJECT"); > > I removed the if from VM_GetOrSetLocal::deoptimize_objects(), because > now it > only gets called if the VM_GetOrSetLocal instance has an active > EscapeBarrier > which will be the case iff the local type is T_OBJECT and if either C2 escape > analysis is enabled or Graal is used. > > src/hotspot/share/runtime/deoptimization.cpp > > You suggested to remove the braces. Done. > > src/hotspot/share/runtime/deoptimization.hpp > > Must provide definition of EscapeBarrier::barrier_active() for new call site in > VM_GetOrSetLocal::doit_prologue() if building with COMPILER2_OR_JVMCI > not > defined. > > test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysis > Enabled.java > > Make use of [2] and pass test with minimal vm. 
> > [1] https://bugs.openjdk.java.net/browse/JDK-8249293 > [2] https://bugs.openjdk.java.net/browse/JDK-8251384 > > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Samstag, 22. August 2020 07:46 > To: Reingruber, Richard ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Richard, > > I read through your change again. It looks good to me now. > The new naming and additional comments make it > easier to read I think, thank you. > > One small thing: > deoptimization.cpp, l. 1503 > You don't really need the brackets. Two lines below you don't use them > either. > (No webrev needed) > > Best regards, > Goetz. From christian.hagedorn at oracle.com Fri Aug 28 08:10:08 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 28 Aug 2020 10:10:08 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> Message-ID: <32e58c35-e19a-c4cf-608e-10aa2a8fa12e@oracle.com> Hi Vladimir On 27.08.20 19:23, Vladimir Kozlov wrote: > On 8/27/20 12:53 AM, Christian Hagedorn wrote: >> On 25.08.20 19:42, Christian Hagedorn wrote: >>> On 25.08.20 16:13, Roland Westrelin wrote: >>>> >>>>> In the testcase, a LoadSNode is cloned in >>>>> PhaseIdealLoop::split_if_with_blocks_post() for each use such that >>>>> they >>>>> can float out of a loop. To ensure that these loads cannot float back >>>>> into the loop, we pin them by setting their control input [1]. In the >>>>> testcase, all 3 new clones are pinned to a loop exit node that is part >>>>> of an outer strip mined loop (see [2]). >>>> >>>> Do I understand this right, that all 3 clones are pinned with the same >>>> control? So they common and only of them is kept? >>> >>> Yes, exactly. All are pinned to the inner loop exit node. But at the >>> time we hit the assertion failure, we still got one cloned load (903 >>> LoadS) that is an input to the store (575 StoreI) that's going into >>> the outer strip mined loop safepoint, and one load (901 LoadS) that >>> is triggering the dominance failure. LoadS 902 was removed at some >>> point in between due to other optimizations. >> >> As Roland and I have discussed offline, it seems to be better and >> safer to do a simpler fix that does not change the original behavior >> of the optimization. The new fix suggests not yank AddP nodes (which >> are inputs to the cloned LoadSNodes in the testcase) and also to not >> yank gc barriers. In the testcase, the cloned LoadSNodes are still >> pinned at the loop exit but now they can be optimized and common up to >> one node during igvn that only belongs to the safepoint in the outer >> strip mined loop (i.e. no load after the loop anymore). The load is >> still successfully removed from the inner loop: >> >> http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ >> >> I left the improved dominance failure dumping as it is. > > Good. Thank you for your review! >> >> We think that it would be a good idea to revisit this cloning >> optimization in an RFE and also consider webrev.01 there as it seems >> to be more like an enhancement for loop strip mining rather than a bug >> fix. I filed [1] which summarizes some thoughts about it. >> >> What do others think about that? 
> > I agree with that. Great! Best regards, Christian > Thanks, > Vladimir > >> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252372 From rwestrel at redhat.com Fri Aug 28 08:27:46 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 28 Aug 2020 10:27:46 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> Message-ID: <87lfhz5g59.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ That looks good to me. Roland. From christian.hagedorn at oracle.com Fri Aug 28 08:33:13 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 28 Aug 2020 10:33:13 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <87lfhz5g59.fsf@redhat.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> <87lfhz5g59.fsf@redhat.com> Message-ID: <97e85c41-03b2-d0f5-8e8d-7cfe0d120644@oracle.com> Thank you Roland for your review and your help discussing it! Best regards, Christian On 28.08.20 10:27, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ > > That looks good to me. > > Roland. > From goetz.lindenmaier at sap.com Fri Aug 28 08:57:07 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 08:57:07 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: Hi, I'd prefer to push this. I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. Unfortunately there is nobody in the open community to address this. And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement for the Oracle vm. So I would continue to try to take all changes that go to 11-oracle to OpenJDK 11, too. And as this is now ported to 11, let's push it. Anyways, it also affects C1 and other shared code, so it might simplify integrating follow-ups. Best regards, Goetz. > -----Original Message----- > From: Severin Gehwolf > Sent: Thursday, August 27, 2020 6:00 PM > To: Andrew Haley ; Doerr, Martin > ; 'hotspot-compiler-dev at openjdk.java.net' > ; jdk-updates- > dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > > Hi, > > > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > This is a good question in general. Personally, I'd vote for > > > backporting fewer less important things to 11u in the future. We > > > should better focus on 17 IMHO. > > > > > > However, there are some arguments for backporting this one: > > > - Oracle has done so. There may be more backports in this area and > > > I'd expect less effort if we have the same code in the open version. > > > - Performance is supposed to be better. (Though I didn't measure it.) > > > - New code is much cleaner. Let's keep in mind that we have to > > > support it for quite a while. > > > > > > Are you ok with it? > > > > I'm unsure. 
While "Oracle has backported it" has been a slam-dunk > > justification for many patches, I am concerned about the destabilizing > > effect of the volume of patches we are processing. > > > > "Better performance" is not in itself justification for a backport > > unless the improvement is really compelling. > > > > "Cleanups" are a red flag. The miserable history of code that has been > > broken by seemingly innocuous cleanups is long. This is a big change > > that affects some very delicate code, but the fact that there is > > already a GraalVM patch we can use is quite persuasive. > > > > So I'm not refusing it, I want people's opinions. > > It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems > to be coming from Graal. Until there is a more compelling reason to > backport this (other than performance for some JVMCI impl) we shouldn't > backport this. We already have a label for these: jdk11u-jvmci-defer. > We should apply that and re-evaluate later if needed. > > My $0.02 > > Thanks, > Severin From adinn at redhat.com Fri Aug 28 09:21:29 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 28 Aug 2020 10:21:29 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: On 28/08/2020 06:56, Ningsheng Jian wrote: > On 8/27/20 8:54 PM, Vladimir Ivanov wrote: >> I definitely don't want to hinder/block the impressive work Ningsheng >> and others at Arm are doing for SVE support. >> >> Frankly speaking, my main concern is that the implementation can stay >> that way forever ;-) That's why I'm trying to get enough ground covered >> in the discussion and some agreements/commitments to be made before it >> is integrated. Sure, I agree that we should use this implementation as a stepping stone to a set of unified AArch64 vector rules that handle operations for vectors of all size. Having looked at the latest x86 vector code I get the impression that there is a much greater problem unifying the plethora of different cases within the x86_64 family than there will be unifying x86_64 and AArch64 in this regard. Your solution of using the vec (and legVec) register class(es) has tamed the proliferation of match rules yet it still leaves a great deal of complexity in the logic that controls the handling of those matches. I think it will be much easier to subsume the AArch64 Neon and SVE cases under one common vec type and the resulting case handling will be much less complex. Of course, the rationale for doing so is far less pressing than with x86 since the multiplication of match rules is not so large (particularly as there is no cross-combination with memory operands). Yet, it still seems worth doing. >> Leaving RA part aside, I have one suggestion which should help in the >> future: let's try to consistently follow full-width vector abstraction. >> In AD file, vecA operand is way too similar to vecX et al which makes a >> wrong impression it's yet another vector flavor. So, choosing a better >> name will help when representation changes. 
For example, x86 moved away >> from vecX/... operands to a single generic one (called "vec") and you >> can take a loot at x86.ad to see the result. > > Thanks for the suggestion. In current implementation vecA does not > include vecD/vecX for NEON - so actually it's regarded as another vector > flavor. We try to keep the SVE implementation separated from original > NEON code (and a new ad file is also introduced), to make the code > better maintainable and reviewable. What do you think about this naming, > Andrew? If the goal is that eventually a vec register class will parametrize the relevant rules for VecD, VecX and VecA operations then I don't see any harm in re-labelling the vecA class to simply be called vec. The intention to use this to handle all cases can be signalled by documenting this register class to explain that it is currently only used to specify VecA rules but will eventually be used as a generic class, parameterizing rules that subsume all applicable VecD, VecX and VecA cases. When that happens we can quite naturally fold the aarch64_sve rules back into aarch64.ad with common and/or special case handling merging under a single rule. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.x.ivanov at oracle.com Fri Aug 28 09:56:10 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 28 Aug 2020 12:56:10 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: <9b585dff-38be-16b5-b1a1-4ea0207458b9@oracle.com> >>> Frankly speaking, my main concern is that the implementation can stay >>> that way forever ;-) That's why I'm trying to get enough ground covered >>> in the discussion and some agreements/commitments to be made before it >>> is integrated. > > Sure, I agree that we should use this implementation as a stepping stone > to a set of unified AArch64 vector rules that handle operations for > vectors of all size. Having looked at the latest x86 vector code I get > the impression that there is a much greater problem unifying the > plethora of different cases within the x86_64 family than there will be > unifying x86_64 and AArch64 in this regard. Your solution of using the > vec (and legVec) register class(es) has tamed the proliferation of match > rules yet it still leaves a great deal of complexity in the logic that > controls the handling of those matches. I believe you are referring to ubiquitous presence of predicates in AD instructions for vector cases. The root cause is that operands have very limited influence on matching logic. There's a promising idea to introduce predicated operands and factor complex predicates into a set of simpler ones placed on operands instead. It should significantly reduce the perceived complexity, but the prototyping hasn't been finished yet. [...] 
>>> Leaving RA part aside, I have one suggestion which should help in the >>> future: let's try to consistently follow full-width vector abstraction. >>> In AD file, vecA operand is way too similar to vecX et al which makes a >>> wrong impression it's yet another vector flavor. So, choosing a better >>> name will help when representation changes. For example, x86 moved away >>> from vecX/... operands to a single generic one (called "vec") and you >>> can take a loot at x86.ad to see the result. >> >> Thanks for the suggestion. In current implementation vecA does not >> include vecD/vecX for NEON - so actually it's regarded as another vector >> flavor. We try to keep the SVE implementation separated from original >> NEON code (and a new ad file is also introduced), to make the code >> better maintainable and reviewable. What do you think about this naming, >> Andrew? > If the goal is that eventually a vec register class will parametrize the > relevant rules for VecD, VecX and VecA operations then I don't see any > harm in re-labelling the vecA class to simply be called vec. The > intention to use this to handle all cases can be signalled by > documenting this register class to explain that it is currently only > used to specify VecA rules but will eventually be used as a generic > class, parameterizing rules that subsume all applicable VecD, VecX and > VecA cases. When that happens we can quite naturally fold the > aarch64_sve rules back into aarch64.ad with common and/or special case > handling merging under a single rule. One more point on naming: though it was me who proposed the name "vec" on x86, I don't think it's the best option anymore. Considering it's desirable to get rid of VecS/VecD/VecX/... machine ideal registers and replace them with a single one, I think using Op_RegV is a better alternative to Op_Vec. Hence, regV/rRegV/vReg look better (depending on conventions adopted in particular AD file). Best regards, Vladimir Ivanov From goetz.lindenmaier at sap.com Fri Aug 28 11:48:13 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 11:48:13 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: <778219306.3426.1598610873510.JavaMail.www@wwinf1p10> References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <778219306.3426.1598610873510.JavaMail.www@wwinf1p10> Message-ID: Hi, There are queries for this on the jdk11 project page: https://wiki.openjdk.java.net/display/JDKUpdates/JDK11u e.g. https://bugs.openjdk.java.net/issues/?filter=39054 Best regards, Goetz. From: gouessej at orange.fr Sent: Friday, August 28, 2020 12:35 PM To: Lindenmaier, Goetz ; 'Severin Gehwolf' ; Andrew Haley ; Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' ; jdk-updates-dev at openjdk.java.net Subject: RE: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. Please can you elaborate about " there are enough other changes OpenJDK 11 lacks wrt. 11-oracle"? > Message du 28/08/20 11:03 > De : "Lindenmaier, Goetz" > > A : "'Severin Gehwolf'" >, "Andrew Haley" >, "Doerr, Martin" >, "'hotspot-compiler-dev at openjdk.java.net'" >, "jdk-updates-dev at openjdk.java.net" > > Copie ? : > Objet : RE: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > Hi, > > I'd prefer to push this. > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > Unfortunately there is nobody in the open community to address this. > And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. 
> If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement > for the Oracle vm. > > So I would continue to try to take all changes that go to 11-oracle > to OpenJDK 11, too. > > And as this is now ported to 11, let's push it. > Anyways, it also affects C1 and other shared code, so it might > simplify integrating follow-ups. > > Best regards, > Goetz. > > > > -----Original Message----- > > From: Severin Gehwolf > > > Sent: Thursday, August 27, 2020 6:00 PM > > To: Andrew Haley >; Doerr, Martin > > >; 'hotspot-compiler-dev at openjdk.java.net' > > >; jdk-updates- > > dev at openjdk.java.net > > Cc: Lindenmaier, Goetz > > > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > > > On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > > > Hi, > > > > > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > > This is a good question in general. Personally, I'd vote for > > > > backporting fewer less important things to 11u in the future. We > > > > should better focus on 17 IMHO. > > > > > > > > However, there are some arguments for backporting this one: > > > > - Oracle has done so. There may be more backports in this area and > > > > I'd expect less effort if we have the same code in the open version. > > > > - Performance is supposed to be better. (Though I didn't measure it.) > > > > - New code is much cleaner. Let's keep in mind that we have to > > > > support it for quite a while. > > > > > > > > Are you ok with it? > > > > > > I'm unsure. While "Oracle has backported it" has been a slam-dunk > > > justification for many patches, I am concerned about the destabilizing > > > effect of the volume of patches we are processing. > > > > > > "Better performance" is not in itself justification for a backport > > > unless the improvement is really compelling. > > > > > > "Cleanups" are a red flag. The miserable history of code that has been > > > broken by seemingly innocuous cleanups is long. This is a big change > > > that affects some very delicate code, but the fact that there is > > > already a GraalVM patch we can use is quite persuasive. > > > > > > So I'm not refusing it, I want people's opinions. > > > > It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems > > to be coming from Graal. Until there is a more compelling reason to > > backport this (other than performance for some JVMCI impl) we shouldn't > > backport this. We already have a label for these: jdk11u-jvmci-defer. > > We should apply that and re-evaluate later if needed. > > > > My $0.02 > > > > Thanks, > > Severin > > From aph at redhat.com Fri Aug 28 12:52:18 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Aug 2020 13:52:18 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> On 28/08/2020 09:57, Lindenmaier, Goetz wrote: > I'd prefer to push this. > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > Unfortunately there is nobody in the open community to address this. > And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. > If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement > for the Oracle vm. What JVMCI issue is this? Please explain. All that I see is a faster "slow" locking path for monitors. 
> So I would continue to try to take all changes that go to 11-oracle > to OpenJDK 11, too. > > And as this is now ported to 11, let's push it. > Anyways, it also affects C1 and other shared code, so it might > simplify integrating follow-ups. That is not a good reason for backporting. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From goetz.lindenmaier at sap.com Fri Aug 28 13:11:57 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 13:11:57 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: Hi Andrew, > > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > What JVMCI issue is this? Please explain. All that I see is a faster > "slow" locking path for monitors. This was meant as a more general comment. I wanted to address that we don't integrate many of the JVMCI changes so the OpenJDK 11 is probably not usable with graal. The comment was not tailored to this specific change. Unfortunately our team has not the capacity to look at JVMCI/graal. Best regards, Goetz. From gouessej at orange.fr Fri Aug 28 10:34:33 2020 From: gouessej at orange.fr (gouessej at orange.fr) Date: Fri, 28 Aug 2020 12:34:33 +0200 (CEST) Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: <778219306.3426.1598610873510.JavaMail.www@wwinf1p10> Please can you elaborate about " there are enough other changes OpenJDK 11 lacks wrt. 11-oracle"? ? ? > Message du 28/08/20 11:03 > De : "Lindenmaier, Goetz" > A : "'Severin Gehwolf'" , "Andrew Haley" , "Doerr, Martin" , "'hotspot-compiler-dev at openjdk.java.net'" , "jdk-updates-dev at openjdk.java.net" > Copie ? : > Objet : RE: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > Hi, > > I'd prefer to push this. > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > Unfortunately there is nobody in the open community to address this. > And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. > If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement > for the Oracle vm. > > So I would continue to try to take all changes that go to 11-oracle > to OpenJDK 11, too. > > And as this is now ported to 11, let's push it. > Anyways, it also affects C1 and other shared code, so it might > simplify integrating follow-ups. > > Best regards, > Goetz. > > > > -----Original Message----- > > From: Severin Gehwolf > > Sent: Thursday, August 27, 2020 6:00 PM > > To: Andrew Haley ; Doerr, Martin > > ; 'hotspot-compiler-dev at openjdk.java.net' > > ; jdk-updates- > > dev at openjdk.java.net > > Cc: Lindenmaier, Goetz > > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > > > On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > > > Hi, > > > > > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > > This is a good question in general. Personally, I'd vote for > > > > backporting fewer less important things to 11u in the future. We > > > > should better focus on 17 IMHO. 
> > > > > > > > However, there are some arguments for backporting this one: > > > > - Oracle has done so. There may be more backports in this area and > > > > I'd expect less effort if we have the same code in the open version. > > > > - Performance is supposed to be better. (Though I didn't measure it.) > > > > - New code is much cleaner. Let's keep in mind that we have to > > > > support it for quite a while. > > > > > > > > Are you ok with it? > > > > > > I'm unsure. While "Oracle has backported it" has been a slam-dunk > > > justification for many patches, I am concerned about the destabilizing > > > effect of the volume of patches we are processing. > > > > > > "Better performance" is not in itself justification for a backport > > > unless the improvement is really compelling. > > > > > > "Cleanups" are a red flag. The miserable history of code that has been > > > broken by seemingly innocuous cleanups is long. This is a big change > > > that affects some very delicate code, but the fact that there is > > > already a GraalVM patch we can use is quite persuasive. > > > > > > So I'm not refusing it, I want people's opinions. > > > > It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems > > to be coming from Graal. Until there is a more compelling reason to > > backport this (other than performance for some JVMCI impl) we shouldn't > > backport this. We already have a label for these: jdk11u-jvmci-defer. > > We should apply that and re-evaluate later if needed. > > > > My $0.02 > > > > Thanks, > > Severin > > From sgehwolf at redhat.com Fri Aug 28 13:30:51 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Fri, 28 Aug 2020 15:30:51 +0200 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: On Fri, 2020-08-28 at 13:11 +0000, Lindenmaier, Goetz wrote: > I wanted to address that we don't integrate many of the JVMCI changes > so the OpenJDK 11 is probably not usable with graal. https://github.com/graalvm/mandrel#how-does-mandrel-differ-from-graal Thanks, Severin From goetz.lindenmaier at sap.com Fri Aug 28 14:30:32 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 14:30:32 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: That's cool. So it works ?? They would probably profit from '8241234: Unify monitor enter/exit runtime entries." Best regards, Goetz. > -----Original Message----- > From: Severin Gehwolf > Sent: Friday, August 28, 2020 3:31 PM > To: Lindenmaier, Goetz ; 'Andrew Haley' > ; Doerr, Martin ; 'hotspot- > compiler-dev at openjdk.java.net' ; > jdk-updates-dev at openjdk.java.net > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > On Fri, 2020-08-28 at 13:11 +0000, Lindenmaier, Goetz wrote: > > I wanted to address that we don't integrate many of the JVMCI changes > > so the OpenJDK 11 is probably not usable with graal. > > https://github.com/graalvm/mandrel#how-does-mandrel-differ-from-graal > > Thanks, > Severin From aph at redhat.com Fri Aug 28 14:35:39 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Aug 2020 15:35:39 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. 
In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: Hi, On 28/08/2020 14:11, Lindenmaier, Goetz wrote: > >>> I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. >> What JVMCI issue is this? Please explain. All that I see is a faster >> "slow" locking path for monitors. > > This was meant as a more general comment. I wanted to address that > we don't integrate many of the JVMCI changes so the OpenJDK 11 is > probably not usable with graal. The comment was not tailored to > this specific change. Unfortunately our team has not the capacity > to look at JVMCI/graal. Fair enough. Now, let's think about the wider point. Any change is bad because our users want, above all else, stability. So first we should avoid change. In order to justify any change, I want backport patches to have a real justification. That is to say, they must have a real effect on a Java user's experience. Fixing visible bugs obviously qualifies, as does a significant performance bump, as does meeting a new crypto specification, etc, etc. The other good reason is improved stability, which includes better testing. A real justification doesn't exclude "cleanups", as long as there is some other benefit, such as making making a proposed backport cleaner. But it has to be a backport that we are actually doing, not some unknown backport that might happen some day. It may well be that the 8241234 fix has a definite performance advantage, in which case it might be a reasonable thing to do. The provided justifications were: - Oracle has done so. There may be more backports in this area and I'd expect less effort if we have the same code in the open version. - Performance is supposed to be better. - New code is much cleaner. But even though the new code is much cleaner, it's a significant change in a very delicate area. Bugs in this are can take a long time to reveal themselves, usually under heavy load in a production situation. I am not saying no to this patch. I am asking "Are you sure that this change is worth making the change?" Given that I doubt anyone will ever notice this change unless it breaks something important, I have my doubts. So, anyone: is there any chance that this patch will break something? Is this change worth the churn? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Fri Aug 28 15:02:53 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 28 Aug 2020 15:02:53 +0000 Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks In-Reply-To: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> Message-ID: Hi Lutz, just for my understanding: What exactly are we protecting against by holding Compile_lock? Is it for concurrent initialization or concurrent unloading? Note that it's also possible to iterate only over alive nmethods: NMethodIterator iter(NMethodIterator::only_alive); Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Mittwoch, 26. August 2020 17:19 > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(S): 8250635: MethodArityHistogram should use > Compile_lock in favour of fancy checks > > Dear all, > > may I please request reviews for this small enhancement? 
Instead of calling a > method doing complicated and fancy (hard to understand) checks, the > iteration over all nmethods is now protected by holding the Compile_lock in > addition to the CodeCache_lock. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250635 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/ > > Thank you! > Lutz > From headius at headius.com Fri Aug 28 15:41:35 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 28 Aug 2020 10:41:35 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com> Message-ID: It has been a couple months so I want to wake this thread up again. As far as I know nothing has changed. Just to emphasize the importance here: if indy call sites are not inlining, then JRuby is clearly missing out on tons of performance. It seems likely to also affect other languages using invokedynamic, and based on other reports (and my own experiments) it may not matter if exotic classloader structures are in use. What is the next step for me to help get this problem solved? - Charlie On Mon, Jun 15, 2020 at 4:38 PM Charles Oliver Nutter wrote: > > Charlie Gracie figured out a nice Hotspot incantation to reproduce > 100% and dump just the PriintInlining graph in question. > > He also managed this with tiered compilation *turned off*, so that may > have been a red herring. > > jruby \ > -Xcompile.invokedynamic \ > "-J-XX:CompileCommand=option *::*foo*,PrintInlining" \ > "-J-XX:CompileCommand=compileonly,*::*foo*" \ > "-J-XX:-TieredCompilation" \ > main.rb > > On Mon, Jun 15, 2020 at 4:23 PM Claes Redestad > wrote: > > If so, a possible workaround might be to pass the generated class > > through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on > > 15+) > > I added Unsafe.ensureClassInitialized right after the JIT class has > been defined, and it did not appear to help. > > I tried turning off JRuby's background JIT threads, which could cause > a method to get jitted and loaded twice (into separate classloaders). > The JRuby flag is "-Xjit.background=false" but it also did not help. > > - Charlie From vladimir.x.ivanov at oracle.com Fri Aug 28 15:51:02 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 28 Aug 2020 18:51:02 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com> Message-ID: <25761258-96b7-9795-41db-94147ff2b3c5@oracle.com> Hi Charles, I'll take a look and will try to reproduce it myself. Meanwhile, here's what Charlie reported: "Starting at the error message unloaded signature classes I worked backwards to find the class(es) which were causing the error. The first class in the signature that caused issues was org/jruby/RubyModule. This class was found on the current class loader but it is rejected due to a protection domain check. There was a 2nd failure related to java/lang/String which is just not found on the particular class loader." It does sound like there's something fishy happening with class loaders and compilation context. Does it ring any bell for you? Best regards, Vladimir Ivanov On 28.08.2020 18:41, Charles Oliver Nutter wrote: > It has been a couple months so I want to wake this thread up again. As > far as I know nothing has changed. 
> > Just to emphasize the importance here: if indy call sites are not > inlining, then JRuby is clearly missing out on tons of performance. It > seems likely to also affect other languages using invokedynamic, and > based on other reports (and my own experiments) it may not matter if > exotic classloader structures are in use. > > What is the next step for me to help get this problem solved? > > - Charlie > > On Mon, Jun 15, 2020 at 4:38 PM Charles Oliver Nutter > wrote: >> >> Charlie Gracie figured out a nice Hotspot incantation to reproduce >> 100% and dump just the PriintInlining graph in question. >> >> He also managed this with tiered compilation *turned off*, so that may >> have been a red herring. >> >> jruby \ >> -Xcompile.invokedynamic \ >> "-J-XX:CompileCommand=option *::*foo*,PrintInlining" \ >> "-J-XX:CompileCommand=compileonly,*::*foo*" \ >> "-J-XX:-TieredCompilation" \ >> main.rb >> >> On Mon, Jun 15, 2020 at 4:23 PM Claes Redestad >> wrote: >>> If so, a possible workaround might be to pass the generated class >>> through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on >>> 15+) >> >> I added Unsafe.ensureClassInitialized right after the JIT class has >> been defined, and it did not appear to help. >> >> I tried turning off JRuby's background JIT threads, which could cause >> a method to get jitted and loaded twice (into separate classloaders). >> The JRuby flag is "-Xjit.background=false" but it also did not help. >> >> - Charlie From headius at headius.com Fri Aug 28 15:53:10 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 28 Aug 2020 10:53:10 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: <25761258-96b7-9795-41db-94147ff2b3c5@oracle.com> References: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com> <25761258-96b7-9795-41db-94147ff2b3c5@oracle.com> Message-ID: It does not ring any bells but we do generate runtime-compiled methods into their own classloaders. They should be pretty simple, though... same protection domain as parent classloader and as each other. I have also tried forcing all methods to be generated into the same classloader and did not see any improvement. I would love for this to be my problem, so I can fix it! - Charlie On Fri, Aug 28, 2020 at 10:51 AM Vladimir Ivanov wrote: > > Hi Charles, > > I'll take a look and will try to reproduce it myself. > > Meanwhile, here's what Charlie reported: > > "Starting at the error message unloaded signature classes I worked > backwards to find the class(es) which were causing the error. The first > class in the signature that caused issues was org/jruby/RubyModule. This > class was found on the current class loader but it is rejected due to a > protection domain check. There was a 2nd failure related to > java/lang/String which is just not found on the particular class loader." > > It does sound like there's something fishy happening with class loaders > and compilation context. Does it ring any bell for you? > > Best regards, > Vladimir Ivanov > > On 28.08.2020 18:41, Charles Oliver Nutter wrote: > > It has been a couple months so I want to wake this thread up again. As > > far as I know nothing has changed. > > > > Just to emphasize the importance here: if indy call sites are not > > inlining, then JRuby is clearly missing out on tons of performance. 
It > > seems likely to also affect other languages using invokedynamic, and > > based on other reports (and my own experiments) it may not matter if > > exotic classloader structures are in use. > > > > What is the next step for me to help get this problem solved? > > > > - Charlie > > > > On Mon, Jun 15, 2020 at 4:38 PM Charles Oliver Nutter > > wrote: > >> > >> Charlie Gracie figured out a nice Hotspot incantation to reproduce > >> 100% and dump just the PriintInlining graph in question. > >> > >> He also managed this with tiered compilation *turned off*, so that may > >> have been a red herring. > >> > >> jruby \ > >> -Xcompile.invokedynamic \ > >> "-J-XX:CompileCommand=option *::*foo*,PrintInlining" \ > >> "-J-XX:CompileCommand=compileonly,*::*foo*" \ > >> "-J-XX:-TieredCompilation" \ > >> main.rb > >> > >> On Mon, Jun 15, 2020 at 4:23 PM Claes Redestad > >> wrote: > >>> If so, a possible workaround might be to pass the generated class > >>> through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on > >>> 15+) > >> > >> I added Unsafe.ensureClassInitialized right after the JIT class has > >> been defined, and it did not appear to help. > >> > >> I tried turning off JRuby's background JIT threads, which could cause > >> a method to get jitted and loaded twice (into separate classloaders). > >> The JRuby flag is "-Xjit.background=false" but it also did not help. > >> > >> - Charlie From lutz.schmidt at sap.com Fri Aug 28 16:01:26 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 28 Aug 2020 16:01:26 +0000 Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks In-Reply-To: References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> Message-ID: <73F65D4B-5970-41A3-B678-1F947BEE7392@sap.com> Hi Martin, good question. Originally, the iteration was only protected by the CodeCache_lock. That proved insufficient: the CodeCache_lock only protects against structural changes in the CodeCache. The contents of the individual code blobs can be, and is, modified independently. By acquiring the Compile_lock, those modifications are blocked while iterating. With the help of a consistency check (not contained in the RFR code), it was found that there is a slight chance to see the case (nm != NULL) && (method() == NULL). That chance is eliminated by adding the is_alive() check which is less invasive compared to adding a new nmethods_do() variant. Regards, Lutz ?On 28.08.20, 17:02, "Doerr, Martin" wrote: Hi Lutz, just for my understanding: What exactly are we protecting against by holding Compile_lock? Is it for concurrent initialization or concurrent unloading? Note that it's also possible to iterate only over alive nmethods: NMethodIterator iter(NMethodIterator::only_alive); Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Mittwoch, 26. August 2020 17:19 > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(S): 8250635: MethodArityHistogram should use > Compile_lock in favour of fancy checks > > Dear all, > > may I please request reviews for this small enhancement? Instead of calling a > method doing complicated and fancy (hard to understand) checks, the > iteration over all nmethods is now protected by holding the Compile_lock in > addition to the CodeCache_lock. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250635 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/ > > Thank you! 
> Lutz > From martin.doerr at sap.com Fri Aug 28 16:19:35 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 28 Aug 2020 16:19:35 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: Hi, seems like two different philosophies collide here. 1. Some people assume that all of Oracle's 11u changes should get integrated into the open version. 2. Others only want to take them on demand with a good reason. I think there are good arguments for and against both ones. Personally, I think approach 1. is better at the beginning of an updates branch while it may be reasonable to switch at some point of time. At the moment, I still prefer to stay in sync with Oracle as far as we can. Regarding this change, I don't see a high risk. What it basically does is that it reuses better code which is already used by C2 for C1 and JVMCI compilers. So there's no substantial new code. It's tested by GraalVM and by our internal testing. There are no known issues with it. So I'd rather vote for taking it. Best regards, Martin > -----Original Message----- > From: Andrew Haley > Sent: Freitag, 28. August 2020 16:36 > To: Lindenmaier, Goetz ; 'Severin Gehwolf' > ; Doerr, Martin ; 'hotspot- > compiler-dev at openjdk.java.net' dev at openjdk.java.net>; jdk-updates-dev at openjdk.java.net > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > Hi, > > On 28/08/2020 14:11, Lindenmaier, Goetz wrote: > > > >>> I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > >> What JVMCI issue is this? Please explain. All that I see is a faster > >> "slow" locking path for monitors. > > > > This was meant as a more general comment. I wanted to address that > > we don't integrate many of the JVMCI changes so the OpenJDK 11 is > > probably not usable with graal. The comment was not tailored to > > this specific change. Unfortunately our team has not the capacity > > to look at JVMCI/graal. > > Fair enough. > > Now, let's think about the wider point. > > Any change is bad because our users want, above all else, > stability. So first we should avoid change. > > In order to justify any change, I want backport patches to have a real > justification. That is to say, they must have a real effect on a Java > user's experience. Fixing visible bugs obviously qualifies, as does a > significant performance bump, as does meeting a new crypto > specification, etc, etc. > > The other good reason is improved stability, which includes better > testing. > > A real justification doesn't exclude "cleanups", as long as there is > some other benefit, such as making making a proposed backport > cleaner. But it has to be a backport that we are actually doing, not > some unknown backport that might happen some day. > > It may well be that the 8241234 fix has a definite performance > advantage, in which case it might be a reasonable thing to do. > The provided justifications were: > > - Oracle has done so. There may be more backports in this area and I'd > expect less effort if we have the same code in the open version. > - Performance is supposed to be better. > - New code is much cleaner. > > But even though the new code is much cleaner, it's a significant > change in a very delicate area. Bugs in this are can take a long time > to reveal themselves, usually under heavy load in a production > situation. > > I am not saying no to this patch. 
I am asking "Are you sure that this > change is worth making the change?" Given that I doubt anyone will > ever notice this change unless it breaks something important, I have > my doubts. > > So, anyone: is there any chance that this patch will break something? > Is this change worth the churn? > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From hohensee at amazon.com Fri Aug 28 16:40:28 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Fri, 28 Aug 2020 16:40:28 +0000 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) Message-ID: <5D556B7D-1995-4FBC-9176-E79FFC789571@amazon.com> One's perspective on the benchmark results depends on the expected frequency of the input types. If we don't expect frequent NaNs (I don?t, because they mean your algorithm is numerically unstable and you're wasting your time running it), or zeros (somewhat arguable, but note that most codes go to some lengths to eliminate zeros, e.g., using sparse arrays), then this patch seems to me to be a win. Thanks, Paul ?On 8/25/20, 9:57 AM, "hotspot-compiler-dev on behalf of Andrew Haley" wrote: On 24/08/2020 22:52, Dmitry Chuyko wrote: > > I added two more intrinsics -- for copySign, they are controlled by > UseCopySignIntrinsic flag. > > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/ > > It also contains 'benchmarks' directory: > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/benchmarks/ > > There are 8 benchmarks there: (double | float) x (blackhole | reduce) x > (current j.l.Math.signum | abs()>0 check). > > My results on Arm are in signum-facgt-copysign.ods. Main case is > 'random' which is actually a random from positive and negative numbers > between -0.5 and +0.5. > > Basically we have ~14% improvement in 'reduce' benchmark variant but > ~20% regression in 'blackhole' variant in case of only copySign() > intrinsified. > > Same picture if abs()>0 check is used in signum() (+-5%). This variant > is included as it shows very good results on x86. > > Intrinsic for signum() gives improvement of main case in both > 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a > noticeable difference. Ignoring Blackhole for the moment, this is what I'm seeing for the reduction/random case: Benchmark Mode Cnt Score Error Units ThunderX 2: -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.456 ? 0.065 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.766 ? 0.107 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.537 ? 0.770 ns/op Neoverse N1 (Actually Amazon m6g.16xlarge): -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.173 ? 0.001 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.043 ? 0.022 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.012 ? 0.001 ns/op By your own numbers, in the reduce benchmark the signum intrinsic is worse than default for all 0 and NaN, but about 12% better for random, >0, and <0. If you take the average of the sppedups and slowdowns it's actually worse than default. By my reckoning, if you take all possibilities (Nan, <0, >0, 0, Random) into account, the best-performing on the reduce test is actually Abs/Copysign, but there's very little in it. 
The only time that the signum intrinsic actually wins is when you're storing the result into memory *and* flushing the store buffer. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From honguye at microsoft.com Fri Aug 28 17:46:07 2020 From: honguye at microsoft.com (Nhat Nguyen) Date: Fri, 28 Aug 2020 17:46:07 +0000 Subject: [EXTERNAL] Re: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> References: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> Message-ID: Thank you Christian for taking a look at the patch! I'll be sure to ask a sponsor to assign the bug for me in the future. Nhat -----Original Message----- From: Christian Hagedorn Sent: Thursday, August 27, 2020 7:54 AM To: Nhat Nguyen ; hotspot-compiler-dev at openjdk.java.net Subject: [EXTERNAL] Re: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes Hi Nhat Looks good to me! Just make sure you that next time you assign the bug to you or a sponsor and/or leave a comment that you intend to work on it to avoid the possibility of some duplicated work (was no problem in this case) ;-) Best regards, Christian On 26.08.20 20:55, Nhat Nguyen wrote: > Hi hotspot-compiler-dev, > > Please review the following patch to address > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs > .openjdk.java.net%2Fbrowse%2FJDK-8251271&data=02%7C01%7Chonguye%40 > microsoft.com%7C52cd8fdc324d4e86326b08d84a991fef%7C72f988bf86f141af91a > b2d7cd011db47%7C1%7C0%7C637341368808595657&sdata=j3YM%2BfxaO8KK1Ie > CbKCPRYjwmGVfCUBrNULXDCJcUxM%3D&reserved=0 > The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead. > I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended. > > webrev: > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open > jdk.java.net%2F~burban%2Fnhat%2FJDK-8251271%2Fwebrev.00%2F&data=02 > %7C01%7Chonguye%40microsoft.com%7C52cd8fdc324d4e86326b08d84a991fef%7C7 > 2f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637341368808595657&sdata > =PsHUTKZf9MrvM8Et5zPXsXpj32mfsGfBRGoZATjOv0I%3D&reserved=0 > > Thank you, > Nhat > From Roger.Riggs at oracle.com Fri Aug 28 17:54:42 2020 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Fri, 28 Aug 2020 13:54:42 -0400 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> Message-ID: <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> Hi Corey, A few comments on core-libs side... The naming convention for methods that end in '0' is usually to indicate they are the bottom-most method or a native method. So I think you can/should rename the methods to make the most sense as to their function. Comparing with the way that the Base64 encoder was intrinsified, the method that is intrinsified should have a method body that does the same function, so it is interchangable.? That likely will just shift the "fast path" code into the decodeBlock method. Keeping the symmetry between encoder and decoder will make it easier to maintain the code. 
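As a sketch of what that interchangeable shape could look like (the signature, parameter names and lookup tables below are illustrative, not the actual webrev code), the intrinsifiable method decodes whole 4-character groups and reports how many output bytes it produced, leaving everything else to the existing scalar path:

    @HotSpotIntrinsicCandidate
    private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp, boolean isURL) {
        // Same work as the intrinsic: decode full 4-character groups and stop
        // at the first byte that is not a plain base64 character, returning
        // the number of output bytes produced so far.
        int[] base64 = isURL ? fromBase64URL : fromBase64;
        int produced = 0;
        while (sp + 4 <= sl) {
            int b0 = base64[src[sp] & 0xff],     b1 = base64[src[sp + 1] & 0xff],
                b2 = base64[src[sp + 2] & 0xff], b3 = base64[src[sp + 3] & 0xff];
            if ((b0 | b1 | b2 | b3) < 0) {
                return produced;              // caller finishes with the slow path
            }
            int bits = b0 << 18 | b1 << 12 | b2 << 6 | b3;
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
            sp += 4;
            produced += 3;
        }
        return produced;
    }

With a contract like that, the caller can resume the scalar loop at the corresponding source offset and handle padding and any trailing partial group exactly as today.
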
Given intrinsic only handles 2 of the three cases, and the java code handles all three, I would add an extra arg to decodeBlock to reflect the isMime case and have the intrinsic take an early exit until it was implemented. It is unfortunate that taking advantage of vectorization has to be hand coded. If/when the Vector API is ready (JEP 338 https://openjdk.java.net/jeps/338) the java code should be replaced to use the Vector API and then it would work for a new hardware without specific coding for each platform. "Just" implement the Vector API.? There's a lot more bang for the buck going for that approach. Thanks, Roger On 8/24/20 9:21 PM, Corey Ashford wrote: > Here's a revised webrev which includes a JMH benchmark for the decode > operation. > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.03/ > > The added benchmark tries to be "fair" in that it doesn't prefer a > large buffer size, which would favor the intrinsic.? It > pseudo-randomly (but reproducibly) chooses a buffer size between 8 and > 20k+8 bytes, and fills it with random data to encode and decode.? As > part of the TearDown of an invocation, it also checks the decoded > output data for correctness. > > Example runs on the Power9-based machine I use for development shows a > 3X average improvement across these random buffer sizes. Here's an > excerpt of the output when run with -XX:-UseBASE64Intrinsics : > > Iteration?? 1: 70795.623 ops/s > Iteration?? 2: 71070.607 ops/s > Iteration?? 3: 70867.544 ops/s > Iteration?? 4: 71107.992 ops/s > Iteration?? 5: 71048.281 ops/s > > And here's the output with the intrinsic enabled: > > Iteration?? 1: 208794.022 ops/s > Iteration?? 2: 208630.904 ops/s > Iteration?? 3: 208238.822 ops/s > Iteration?? 4: 208714.967 ops/s > Iteration?? 5: 209060.894 ops/s > > Taking the best of the two runs: 209060/71048 = 2.94 > > From other experiments where the benchmark uses a fixed-size, larger > buffer, the performance ratio rises to about 4.0. > > Power10 should have a slightly higher ratio due to several factors, > but I have not yet benchmarked on Power10. > > Other arches ought to be able to do at least this well, if not better, > because of wider vector registers (> 128 bits) being available.? Only > a Power9/10 implementation is included in this webrev, however. > > Regards, > > - Corey > > > On 8/19/20 11:20 AM, Roger Riggs wrote: >> Hi Corey, >> >> For changes obviously performance motivated, it is conventional to >> run a JMH perf test to demonstate >> the improvement and prove it is worthwhile to add code complexity. >> >> I don't see any existing Base64 JMH tests but they would be in the >> repo below or near: >> ???? test/micro/org/openjdk/bench/java/util/ >> >> Please contribute a JMH test and results to show the difference. >> >> Regards, Roger >> >> >> >> On 8/19/20 2:10 PM, Corey Ashford wrote: >>> Michihiro Horie posted up a new iteration of this webrev for me.? >>> This time the webrev includes a complete implementation of the >>> intrinsic for Power9 and Power10. >>> >>> You can find it here: >>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >>> >>> Changes in webrev.02 vs. webrev.01: >>> >>> ? * The method header for the intrinsic in the Base64 code has been >>> rewritten using the Javadoc style.? The clarity of the comments has >>> been improved and some verbosity has been removed. There are no >>> additional functional changes to Base64.java. >>> >>> ? 
* The code needed to martial and check the intrinsic parameters >>> has been added, using the base64 encodeBlock intrinsic as a guideline. >>> >>> ? * A complete intrinsic implementation for Power9 and Power10 is >>> included. >>> >>> ? * Adds some Power9 and Power10 assembler instructions needed by >>> the intrinsic which hadn't been defined before. >>> >>> The intrinsic implementation in this patch accelerates the decoding >>> of large blocks of base64 data by a factor of about 3.5X on Power9. >>> >>> I'm attaching two Java test cases I am using for testing and >>> benchmarking.? The TestBase64_VB encodes and decodes randomly-sized >>> buffers of random data and checks that original data matches the >>> encoded-then-decoded data.? TestBase64Errors encodes a 48K block of >>> random bytes, then corrupts each byte of the encoded data, one at a >>> time, checking to see if the decoder catches the illegal byte. >>> >>> Any comments/suggestions would be appreciated. >>> >>> Thanks, >>> >>> - Corey >>> >>> On 7/27/20 6:49 PM, Corey Ashford wrote: >>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>>> intrinsic API for me: >>>> >>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>>> >>>> It has the following changes with respect to the original one posted: >>>> >>>> ??* In the event of encountering a non-base64 character, instead of >>>> having a separate error code of -1, the intrinsic can now just >>>> return either 0, or the number of data bytes produced up to the >>>> point where the illegal base64 character was encountered. This >>>> reduces the number of special cases, and also provides a way to >>>> speed up the process of finding the bad character by the slower, >>>> pure-Java algorithm. >>>> >>>> ??* The isMIME boolean is removed from the API for two reasons: >>>> ??? - The current API is not sufficient to handle the isMIME case, >>>> because there isn't a strict relationship between the number of >>>> input bytes and the number of output bytes, because there can be an >>>> arbitrary number of non-base64 characters in the source. >>>> ??? - If an intrinsic only implements the (isMIME == false) case as >>>> ours does, it will always return 0 bytes processed, which will >>>> slightly slow down the normal path of processing an (isMIME == >>>> true) instantiation. >>>> ??? - We considered adding a separate hotspot candidate for the >>>> (isMIME == true) case, but since we don't have an intrinsic >>>> implementation to test that, we decided to leave it as a future >>>> optimization. >>>> >>>> Comments and suggestions are welcome.? Thanks for your consideration. >>>> >>>> - Corey >>>> >>>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>>> Hi Corey, >>>>> >>>>> Following is the issue I created. >>>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>>> >>>>> I will upload a webrev when you're ready as we talked in private. 
>>>>> >>>>> Best regards, >>>>> Michihiro >>>>> >>>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 >>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCandidate and API for encodeBlock, but no >>>>> >>>>> From: "Corey Ashford" >>>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>>> , >>>>> "ppc-aix-port-dev at openjdk.java.net" >>>>> >>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori >>>>> Ogata/Japan/IBM at IBMJP, joserz at br.ibm.com >>>>> Date: 2020/06/24 09:40 >>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>>> Base64 decoding >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> >>>>> >>>>> >>>>> Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCandidate and >>>>> API for encodeBlock, but none for decoding. ?This means that only >>>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>>> >>>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>>> considerations I have for this new intrinsic's API: >>>>> >>>>> ??* Don't make any assumptions about the underlying capability of the >>>>> hardware. ?For example, do not impose any specific block size >>>>> granularity. >>>>> >>>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>>> modes, but also let them decide if they will process the data >>>>> regardless >>>>> of the settings of the two booleans. >>>>> >>>>> ??* Any remaining data that is not processed by the intrinsic will be >>>>> processed by the pure Java implementation. ?This allows the >>>>> intrinsic to >>>>> process whatever block sizes it's good at without the complexity of >>>>> handling the end fragments. >>>>> >>>>> ??* If any illegal character is discovered in the decoding >>>>> process, the >>>>> intrinsic will simply return -1, instead of requiring it to throw a >>>>> proper exception from the context of the intrinsic. ?In the event of >>>>> getting a -1 returned from the intrinsic, the Java Base64 library >>>>> code >>>>> simply calls the pure Java implementation to have it find the >>>>> error and >>>>> properly throw an exception. ?This is a performance trade-off in the >>>>> case of an error (which I expect to be very rare). >>>>> >>>>> ??* One thought I have for a further optimization (not implemented in >>>>> the current patch), is that when the intrinsic decides not to >>>>> process a >>>>> block because of some combination of isURL and isMIME settings it >>>>> doesn't handle, it could return extra bits in the return code, >>>>> encoded >>>>> as a negative number. ?For example: >>>>> >>>>> Illegal_Base64_char ? = 0b001; >>>>> isMIME_unsupported ? ?= 0b010; >>>>> isURL_unsupported ? ? = 0b100; >>>>> >>>>> These can be OR'd together as needed and then negated (flip the >>>>> sign). >>>>> The Base64 library code could then cache these flags, so it will know >>>>> not to call the intrinsic again when another decodeBlock is requested >>>>> but with an unsupported mode. ?This will save the performance hit of >>>>> calling the intrinsic when it is guaranteed to fail. >>>>> >>>>> I've tested the attached patch with an actual intrinsic coded up for >>>>> Power9/Power10, but those runtime intrinsics and arch-specific >>>>> patches >>>>> aren't attached today. ?I want to get some consensus on the >>>>> library-level intrinsic API first. 
>>>>> >>>>> Also attached is a simple test case to test that the new intrinsic >>>>> API >>>>> doesn't break anything. >>>>> >>>>> I'm open to any comments about this. >>>>> >>>>> Thanks for your consideration, >>>>> >>>>> - Corey >>>>> >>>>> >>>>> Corey Ashford >>>>> IBM Systems, Linux Technology Center, OpenJDK team >>>>> cjashfor at us dot ibm dot com >>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by >>>>> Michihiro Horie/Japan/IBM] >>>>> >>>>> >>>> >>> >> > From dean.long at oracle.com Sat Aug 29 01:41:51 2020 From: dean.long at oracle.com (Dean Long) Date: Fri, 28 Aug 2020 18:41:51 -0700 Subject: RFR(M) 8209961: [AOT] crash in Graal stub when -XX:+VerifyOops is used Message-ID: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8209961 http://cr.openjdk.java.net/~dlong/8209961/webrev/ This change fixes support for -XX:+VerifyOops when used with AOT. The feature is disabled in generated AOT code by default unless -J-Dgraal.AOTVerifyOops=true is passed to jaotc (similar idea as --compile-with-assertions).? The JVM changes are minimal.? The Graal changes are all from upstream Graal and have already been reviewed and pushed there. dl From boris.ulasevich at bell-sw.com Sat Aug 29 15:39:02 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sat, 29 Aug 2020 18:39:02 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <5cbb89bb-32c7-8064-a6e9-f9b0d0a2b195@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> <5cbb89bb-32c7-8064-a6e9-f9b0d0a2b195@redhat.com> Message-ID: <24ed2bde-c80d-b6db-3167-6c31cc8fb4a7@bell-sw.com> Hi Andrew, Thank you once again. Can you please look at my update. I have added a functional test to demonstrate which cases are covered by the change and made a small update (OrI case in is_bitrange_zero) to add the missing transformation on java.awt.Color case: http://cr.openjdk.java.net/~bulasevich/8249893/webrev.02 The test shows successful transformation for typical int/long value construction cases I found in jdk java sources: ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF) (high << 32) | (low & 0xffffffffL) Was there anything else among your test cases? On my test case SubTest0::tst2 output I see that the BFI transformation works, but for this particular case (compiled with template=template1 where value1=value2) the result is not faster than default one. (value1 & 0x1L) | ((value1 & 0x1L) << 3) : and? x11, x2, #0x1 orr? x11, x11, x11, lsl #3 -> and? x11, x2, #0x1 bfi? x11, x2, #3, #1 I think it is Ok, using bfi here does not reduce the number of instructions used. The same case with different inputs (template=template2) is better: (value1 & 0x1L) | ((valueC & 0x1L) << 1) : and? x18, x10, #0x1 and? x10, x1, #0x1 orr? x10, x10, x18, lsl #3 -> and? x11, x3, #0x1 bfi? x11, x18, #3, #1 Do you think TestBFI test cases are Ok or I should implement more checks? The "a << 24 >>> 24" case IMO should be implemented as a LShiftI::Ideal transformation which should be done separately. 
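Spelled out as plain Java, the construction shapes in question look like the sketch below (illustrative only, with an explicit widening cast added so the long case compiles as intended; the TestBFI sources may differ):

    // four byte-sized fields packed into one int (java.awt.Color style)
    static int packArgb(int a, int r, int g, int b) {
        return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    // long assembled from a high and a low 32-bit half
    static long packHalves(int high, int low) {
        return ((long) high << 32) | (low & 0xffffffffL);
    }

    // (a << 24) >>> 24 zero-extends the low byte, i.e. it computes the same
    // value as (a & 0xFF); that is the case suggested above for a separate
    // LShiftI::Ideal canonicalization
    static int lowByte(int a) {
        return (a << 24) >>> 24;
    }
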
thanks, Boris On 26.08.2020 17:21, Andrew Haley wrote: > On 25/08/2020 18:30, Boris Ulasevich wrote: >> I believe masking with left shift and right shift is not common. >> Search though jdk repository does not give such patterns while >> there is a hundreds of mask+lshift expressions. > >> I implemented a simple is_bitrange_zero() method for counting the >> bitranges of sub-expressions: power-of-two masks and left shift only. >> We can take into account more cases (careful testing is a main >> concern). But particularly about "r.a << 24 >>> 24" expression >> I think it is worse to think about canonicalization: "left shift + right >> shift" to "mask + left shift" (or may be the backwards). > I'm running your test program, and for example I get this, old on the > left, new on the right. > > Compiled method (c2) 11832 1113 SubTest0::tst2 (184 bytes) > > : and x11, x2, #0x1 ;*land : and x11, x2, #0x1 > : and x10, x1, #0x1 ;*land : and x10, x1, #0x1 > : orr x11, x11, x11, lsl #3 : bfi x11, x2, #3, #1 > : orr x10, x10, x10, lsl #3 : bfi x10, x1, #3, #1 > : and xmethod, x3, #0x1 ;*land : and xmethod, x3, #0x1 > : add x10, x10, x11 : bfi xmethod, x3, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : and xmethod, x4, #0x1 ;*land : and x11, x4, #0x1 > : add x10, x11, x10 : bfi x11, x4, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : and xmethod, x5, #0x1 ;*land : and xmethod, x5, #0x1 > : add x10, x11, x10 : bfi xmethod, x5, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : and xmethod, x6, #0x1 ;*land : and x11, x6, #0x1 > : add x10, x11, x10 : bfi x11, x6, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : and xmethod, x7, #0x1 ;*land : and xmethod, x7, #0x1 > : add x10, x11, x10 : bfi xmethod, x7, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : and xmethod, x0, #0x1 ;*land : add x10, x10, xmethod > : add x10, x11, x10 : ldr x13, [sp,#32] > : orr x11, xmethod, xmethod, lsl #3 : and x11, x0, #0x1 > : ldr xmethod, [sp,#32] : and xmethod, x13, #0x1 > : and xmethod, xmethod, #0x1 : bfi x11, x0, #3, #1 > : add x10, x11, x10 : bfi xmethod, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : ldr xmethod, [sp,#40] : ldr x13, [sp,#40] > : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 > : add x10, x11, x10 : bfi x11, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : ldr xmethod, [sp,#48] : ldr x13, [sp,#48] > : and xmethod, xmethod, #0x1 : and xmethod, x13, #0x1 > : add x10, x11, x10 : bfi xmethod, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : ldr xmethod, [sp,#56] : ldr x13, [sp,#56] > : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 > : add x10, x11, x10 : bfi x11, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : add x0, x11, x10 ;*ladd : add x0, x10, x11 > > I've also tried a bunch of different test cases doing operations that > could match BFI instructions, and in only a few of them does it > happen. In almost all cases, then, this change does not help, *even > your own test case*. > > I think that you've got something that is potentially useful, but it > needs some careful analysis to make sure it actually gets used. 
> From xxinliu at amazon.com Sat Aug 29 20:08:36 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Sat, 29 Aug 2020 20:08:36 +0000 Subject: RFR: 8251464: make Node::dump(int depth) support indent Message-ID: <1598731717217.87517@amazon.com> hi, Reviewers, Could you review this patch? JBS:https://bugs.openjdk.java.net/browse/JDK-8251464 Webrev: http://cr.openjdk.java.net/~xliu/8251464/00/webrev/ This patch attempts to improve the formation of nodes when developers try to dump an ideal graph or snippet of a graph. In practice, I found it's pretty handy if Node::dump(int d) can support indent. The basic idea is to support indention for the utility function: collect_nodes_i(GrowableArray* queue, const Node* start, int direction, uint depth, bool include_start, bool only_ctrl, bool only_data) It only affects Node::dump family and -XX::PrintIdeal. It won't impact the output for igv. This can help developers who try to inspect a cluster of nodes in gdb. Another change is naming. collect_nodes_i uses breadth-first search. the container is used in fifo way instead of filo. I think the name "queue" serve better. TEST: hotspot:tier1 and gtest. mach-5 thanks, --lx From cjashfor at linux.ibm.com Sat Aug 29 20:19:42 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Sat, 29 Aug 2020 13:19:42 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> Message-ID: <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> Hi Roger, Thanks for your reply and thoughts! Comments interspersed below: On 8/28/20 10:54 AM, Roger Riggs wrote: > Hi Corey, > > A few comments on core-libs side... > > The naming convention for methods that end in '0' is usually to indicate > they are the bottom-most method or a native method. > So I think you can/should rename the methods to make the most sense > as to their function. Ok, I will fix that. > > Comparing with the way that the Base64 encoder was intrinsified, the > method that is intrinsified should have a method body that does > the same function, so it is interchangable.? That likely will just shift > the "fast path" code into the decodeBlock method. > Keeping the symmetry between encoder and decoder will > make it easier to maintain the code. Good point. I'll investigate what this looks like in terms of the actual code, and will report back (perhaps in a new webrev). > > Given intrinsic only handles 2 of the three cases, and the java code > handles > all three, I would add an extra arg to decodeBlock to reflect the isMime > case > and have the intrinsic take an early exit until it was implemented. > I did consider doing that, but didn't for two reasons: * Implementing isMIME using vector hardware would be very difficult due to the need to ignore non-base64 characters. This requires eliminating those characters from the vector, then reading and shifting more in, repeatedly until there are no non-base64 characters left. This isn't a trivial/fast thing to do, at least on Power arch. None of the published base64 encode/decode functions for vector processors address the MIME case. In fact they don't address isURL=true either, but fortunately that is a relatively easy addition. 
* If isMIME=true is not implemented by the intrinsic, it will cost unnecessary overhead for that case, because of the need to martial the parameters, call the intrinsic, and then do an early return. I benchmarked this approach before, and saw an approx 5% drop in performance when isMIME = true. So that's why we decided to leave the isMIME=true case as a later optimization. Because of the extra complexity of the algorithm, it probably shouldn't share the same intrinsic anyway; only the isMIME=true case should take the performance hit. > > It is unfortunate that taking advantage of vectorization has to be hand > coded. > If/when the Vector API is ready (JEP 338 https://openjdk.java.net/jeps/338) > the java code should be replaced to use the Vector API and then it would > work for a new hardware without specific coding for each platform. > "Just" implement the Vector API.? There's a lot more bang for the buck > going for that approach. The kind of vector processing used in this intrinsic operates mostly on bytes within one vector, not between two vectors (for example in matrix-multiply algorithms), otherwise known as SWAR (https://en.wikipedia.org/wiki/SWAR). Because of that, it's very sensitive to which exact instructions are available in the vector processor. There isn't much standardization of SWAR instructions between different arches, so I think it would be hard to get a generic SWAR API that gives good performance across several arches. From briefly looking at the link you provided, it doesn't appear to address SWAR operations, so it doesn't seem to me that waiting for the vector API would be worth the wait, and in fact may not provide any method at all to boost performance of base64 decode/encode. Regards, - Corey P.S. I work only two days a week, so the updates will be slower compared to other developers. > > Thanks, Roger > > > On 8/24/20 9:21 PM, Corey Ashford wrote: >> Here's a revised webrev which includes a JMH benchmark for the decode >> operation. >> >> http://cr.openjdk.java.net/~mhorie/8248188/webrev.03/ >> >> The added benchmark tries to be "fair" in that it doesn't prefer a >> large buffer size, which would favor the intrinsic.? It >> pseudo-randomly (but reproducibly) chooses a buffer size between 8 and >> 20k+8 bytes, and fills it with random data to encode and decode.? As >> part of the TearDown of an invocation, it also checks the decoded >> output data for correctness. >> >> Example runs on the Power9-based machine I use for development shows a >> 3X average improvement across these random buffer sizes. Here's an >> excerpt of the output when run with -XX:-UseBASE64Intrinsics : >> >> Iteration?? 1: 70795.623 ops/s >> Iteration?? 2: 71070.607 ops/s >> Iteration?? 3: 70867.544 ops/s >> Iteration?? 4: 71107.992 ops/s >> Iteration?? 5: 71048.281 ops/s >> >> And here's the output with the intrinsic enabled: >> >> Iteration?? 1: 208794.022 ops/s >> Iteration?? 2: 208630.904 ops/s >> Iteration?? 3: 208238.822 ops/s >> Iteration?? 4: 208714.967 ops/s >> Iteration?? 5: 209060.894 ops/s >> >> Taking the best of the two runs: 209060/71048 = 2.94 >> >> From other experiments where the benchmark uses a fixed-size, larger >> buffer, the performance ratio rises to about 4.0. >> >> Power10 should have a slightly higher ratio due to several factors, >> but I have not yet benchmarked on Power10. >> >> Other arches ought to be able to do at least this well, if not better, >> because of wider vector registers (> 128 bits) being available.? 
Only >> a Power9/10 implementation is included in this webrev, however. >> >> Regards, >> >> - Corey >> >> >> On 8/19/20 11:20 AM, Roger Riggs wrote: >>> Hi Corey, >>> >>> For changes obviously performance motivated, it is conventional to >>> run a JMH perf test to demonstate >>> the improvement and prove it is worthwhile to add code complexity. >>> >>> I don't see any existing Base64 JMH tests but they would be in the >>> repo below or near: >>> ???? test/micro/org/openjdk/bench/java/util/ >>> >>> Please contribute a JMH test and results to show the difference. >>> >>> Regards, Roger >>> >>> >>> >>> On 8/19/20 2:10 PM, Corey Ashford wrote: >>>> Michihiro Horie posted up a new iteration of this webrev for me. >>>> This time the webrev includes a complete implementation of the >>>> intrinsic for Power9 and Power10. >>>> >>>> You can find it here: >>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >>>> >>>> Changes in webrev.02 vs. webrev.01: >>>> >>>> ? * The method header for the intrinsic in the Base64 code has been >>>> rewritten using the Javadoc style.? The clarity of the comments has >>>> been improved and some verbosity has been removed. There are no >>>> additional functional changes to Base64.java. >>>> >>>> ? * The code needed to martial and check the intrinsic parameters >>>> has been added, using the base64 encodeBlock intrinsic as a guideline. >>>> >>>> ? * A complete intrinsic implementation for Power9 and Power10 is >>>> included. >>>> >>>> ? * Adds some Power9 and Power10 assembler instructions needed by >>>> the intrinsic which hadn't been defined before. >>>> >>>> The intrinsic implementation in this patch accelerates the decoding >>>> of large blocks of base64 data by a factor of about 3.5X on Power9. >>>> >>>> I'm attaching two Java test cases I am using for testing and >>>> benchmarking.? The TestBase64_VB encodes and decodes randomly-sized >>>> buffers of random data and checks that original data matches the >>>> encoded-then-decoded data.? TestBase64Errors encodes a 48K block of >>>> random bytes, then corrupts each byte of the encoded data, one at a >>>> time, checking to see if the decoder catches the illegal byte. >>>> >>>> Any comments/suggestions would be appreciated. >>>> >>>> Thanks, >>>> >>>> - Corey >>>> >>>> On 7/27/20 6:49 PM, Corey Ashford wrote: >>>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>>>> intrinsic API for me: >>>>> >>>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>>>> >>>>> It has the following changes with respect to the original one posted: >>>>> >>>>> ??* In the event of encountering a non-base64 character, instead of >>>>> having a separate error code of -1, the intrinsic can now just >>>>> return either 0, or the number of data bytes produced up to the >>>>> point where the illegal base64 character was encountered. This >>>>> reduces the number of special cases, and also provides a way to >>>>> speed up the process of finding the bad character by the slower, >>>>> pure-Java algorithm. >>>>> >>>>> ??* The isMIME boolean is removed from the API for two reasons: >>>>> ??? - The current API is not sufficient to handle the isMIME case, >>>>> because there isn't a strict relationship between the number of >>>>> input bytes and the number of output bytes, because there can be an >>>>> arbitrary number of non-base64 characters in the source. >>>>> ??? 
- If an intrinsic only implements the (isMIME == false) case as >>>>> ours does, it will always return 0 bytes processed, which will >>>>> slightly slow down the normal path of processing an (isMIME == >>>>> true) instantiation. >>>>> ??? - We considered adding a separate hotspot candidate for the >>>>> (isMIME == true) case, but since we don't have an intrinsic >>>>> implementation to test that, we decided to leave it as a future >>>>> optimization. >>>>> >>>>> Comments and suggestions are welcome.? Thanks for your consideration. >>>>> >>>>> - Corey >>>>> >>>>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>>>> Hi Corey, >>>>>> >>>>>> Following is the issue I created. >>>>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>>>> >>>>>> I will upload a webrev when you're ready as we talked in private. >>>>>> >>>>>> Best regards, >>>>>> Michihiro >>>>>> >>>>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 >>>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>>> HotSpotIntrinsicCandidate and API for encodeBlock, but no >>>>>> >>>>>> From: "Corey Ashford" >>>>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>>>> , >>>>>> "ppc-aix-port-dev at openjdk.java.net" >>>>>> >>>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori >>>>>> Ogata/Japan/IBM at IBMJP, joserz at br.ibm.com >>>>>> Date: 2020/06/24 09:40 >>>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>>>> Base64 decoding >>>>>> >>>>>> ------------------------------------------------------------------------ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Currently in java.util.Base64, there is a >>>>>> HotSpotIntrinsicCandidate and >>>>>> API for encodeBlock, but none for decoding. ?This means that only >>>>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>>>> >>>>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>>>> considerations I have for this new intrinsic's API: >>>>>> >>>>>> ??* Don't make any assumptions about the underlying capability of the >>>>>> hardware. ?For example, do not impose any specific block size >>>>>> granularity. >>>>>> >>>>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>>>> modes, but also let them decide if they will process the data >>>>>> regardless >>>>>> of the settings of the two booleans. >>>>>> >>>>>> ??* Any remaining data that is not processed by the intrinsic will be >>>>>> processed by the pure Java implementation. ?This allows the >>>>>> intrinsic to >>>>>> process whatever block sizes it's good at without the complexity of >>>>>> handling the end fragments. >>>>>> >>>>>> ??* If any illegal character is discovered in the decoding >>>>>> process, the >>>>>> intrinsic will simply return -1, instead of requiring it to throw a >>>>>> proper exception from the context of the intrinsic. ?In the event of >>>>>> getting a -1 returned from the intrinsic, the Java Base64 library >>>>>> code >>>>>> simply calls the pure Java implementation to have it find the >>>>>> error and >>>>>> properly throw an exception. ?This is a performance trade-off in the >>>>>> case of an error (which I expect to be very rare). 
>>>>>> >>>>>> ??* One thought I have for a further optimization (not implemented in >>>>>> the current patch), is that when the intrinsic decides not to >>>>>> process a >>>>>> block because of some combination of isURL and isMIME settings it >>>>>> doesn't handle, it could return extra bits in the return code, >>>>>> encoded >>>>>> as a negative number. ?For example: >>>>>> >>>>>> Illegal_Base64_char ? = 0b001; >>>>>> isMIME_unsupported ? ?= 0b010; >>>>>> isURL_unsupported ? ? = 0b100; >>>>>> >>>>>> These can be OR'd together as needed and then negated (flip the >>>>>> sign). >>>>>> The Base64 library code could then cache these flags, so it will know >>>>>> not to call the intrinsic again when another decodeBlock is requested >>>>>> but with an unsupported mode. ?This will save the performance hit of >>>>>> calling the intrinsic when it is guaranteed to fail. >>>>>> >>>>>> I've tested the attached patch with an actual intrinsic coded up for >>>>>> Power9/Power10, but those runtime intrinsics and arch-specific >>>>>> patches >>>>>> aren't attached today. ?I want to get some consensus on the >>>>>> library-level intrinsic API first. >>>>>> >>>>>> Also attached is a simple test case to test that the new intrinsic >>>>>> API >>>>>> doesn't break anything. >>>>>> >>>>>> I'm open to any comments about this. >>>>>> >>>>>> Thanks for your consideration, >>>>>> >>>>>> - Corey >>>>>> >>>>>> >>>>>> Corey Ashford >>>>>> IBM Systems, Linux Technology Center, OpenJDK team >>>>>> cjashfor at us dot ibm dot com >>>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by >>>>>> Michihiro Horie/Japan/IBM] >>>>>> >>>>>> >>>>> >>>> >>> >> > From aph at redhat.com Sun Aug 30 08:34:57 2020 From: aph at redhat.com (Andrew Haley) Date: Sun, 30 Aug 2020 09:34:57 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <5D556B7D-1995-4FBC-9176-E79FFC789571@amazon.com> References: <5D556B7D-1995-4FBC-9176-E79FFC789571@amazon.com> Message-ID: <8289015c-711b-286f-ba99-e589edfed8a5@redhat.com> On 28/08/2020 17:40, Hohensee, Paul wrote: > One's perspective on the benchmark results depends on the expected > frequency of the input types. If we don't expect frequent NaNs (I > don?t, because they mean your algorithm is numerically unstable and > you're wasting your time running it), or zeros (somewhat arguable, > but note that most codes go to some lengths to eliminate zeros, > e.g., using sparse arrays), then this patch seems to me to be a win. Possibly. But it's a significant change that improves some cases while making some other cases worse. When it does makes some cases better, it's only by a small factor and it's not consistent across hardware implementations. Please consider the numbers. When you look at Abs/Copysign it improves all cases except 0, and it doesn't make any of them any worse. Copysign on its own gets close. Copysign is nearly as good. That's true at least for the reduce case, which I argue is representative, more so than the blackhole case, where the blackhole operation itself swamps the calculation we're trying to measure. Ignoring NaN, I've added averages for the four cases to http://cr.openjdk.java.net/~aph/signum-facgt-copysign.ods. But we still don't know what effect all of this has, if any, on real code. My guess is that copysign should always helps because it avoids a move between FPU and integer unit and is otherwise identical. 
But the blackhole benchmark suggests it can make latency worse, and I have no explanation for that. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From boris.ulasevich at bell-sw.com Sun Aug 30 17:18:30 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sun, 30 Aug 2020 20:18:30 +0300 Subject: RFR(S) 8252311: AArch64: save two words in itable lookup stub Message-ID: Hi, The interface method lookup stub becomes hot when interface calls are performed frequently. The stub assembly code can be made shorter (132->124 bytes) by using a pre-increment instruction variant. http://cr.openjdk.java.net/~bulasevich/8252311/webrev.00 http://bugs.openjdk.java.net/browse/JDK-8252311 The benchmark [1] shows [2] performance and icache loads improvement: performance: 6165206 -> 6307798 ops/s L1-icache-loads: 307.271 -> 274.604 The change was tested with JTREG. thanks, Boris [1] http://cr.openjdk.java.net/~bulasevich/8252311/InvokeInterface.java [2] http://cr.openjdk.java.net/~bulasevich/8252311/InvokeInterface.perf.txt From vladimir.kozlov at oracle.com Sun Aug 30 22:16:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 30 Aug 2020 15:16:38 -0700 Subject: RFR(M) 8209961: [AOT] crash in Graal stub when -XX:+VerifyOops is used In-Reply-To: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> References: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> Message-ID: <16b6eb21-9d92-790c-4d57-0300be288ffa@oracle.com> Looks good. Thanks, Vladimir K On 8/28/20 6:41 PM, Dean Long wrote: > https://bugs.openjdk.java.net/browse/JDK-8209961 > http://cr.openjdk.java.net/~dlong/8209961/webrev/ > > This change fixes support for -XX:+VerifyOops when used with AOT. The feature is disabled in generated AOT code by > default unless -J-Dgraal.AOTVerifyOops=true > is passed to jaotc (similar idea as --compile-with-assertions).? The JVM changes are minimal.? The Graal changes are all > from upstream Graal and have already been reviewed and pushed there. > > dl > From dean.long at oracle.com Sun Aug 30 22:37:08 2020 From: dean.long at oracle.com (Dean Long) Date: Sun, 30 Aug 2020 15:37:08 -0700 Subject: RFR(M) 8209961: [AOT] crash in Graal stub when -XX:+VerifyOops is used In-Reply-To: <16b6eb21-9d92-790c-4d57-0300be288ffa@oracle.com> References: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> <16b6eb21-9d92-790c-4d57-0300be288ffa@oracle.com> Message-ID: Thanks Vladimir. dl On 8/30/20 3:16 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir K > > On 8/28/20 6:41 PM, Dean Long wrote: >> https://bugs.openjdk.java.net/browse/JDK-8209961 >> http://cr.openjdk.java.net/~dlong/8209961/webrev/ >> >> This change fixes support for -XX:+VerifyOops when used with AOT. The >> feature is disabled in generated AOT code by default unless >> -J-Dgraal.AOTVerifyOops=true >> is passed to jaotc (similar idea as --compile-with-assertions). The >> JVM changes are minimal.? The Graal changes are all from upstream >> Graal and have already been reviewed and pushed there. 
>> >> dl >> From ningsheng.jian at arm.com Mon Aug 31 04:00:48 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 31 Aug 2020 12:00:48 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <9b585dff-38be-16b5-b1a1-4ea0207458b9@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> <9b585dff-38be-16b5-b1a1-4ea0207458b9@oracle.com> Message-ID: Hi Vladimir, On 8/28/20 5:56 PM, Vladimir Ivanov wrote: > [...] > > One more point on naming: though it was me who proposed the name "vec" > on x86, I don't think it's the best option anymore. Considering it's > desirable to get rid of VecS/VecD/VecX/... machine ideal registers and > replace them with a single one, I think using Op_RegV is a better > alternative to Op_Vec. Hence, regV/rRegV/vReg look better (depending on > conventions adopted in particular AD file). > vReg looks good to me. I will update it in the new webrev. Thanks! Regards, Ningsheng From felix.yang at huawei.com Mon Aug 31 06:50:34 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 31 Aug 2020 06:50:34 +0000 Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic Message-ID: Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8252204 Webrev: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/ This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. Patch passed jtreg tier1-3 tests with QEMU system emulator. Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that there's no regression. We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java We measured the performance benefit with an aarch64 cycle-accurate simulator. Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is detected. Comments? 
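For reference, the block size derived from digestLength is just the Keccak rate in bytes; a minimal sketch (names here are illustrative) makes the relation explicit:

    // SHA3 block size (rate) in bytes: 200 bytes of Keccak state minus
    // twice the digest length (the capacity).
    static int sha3BlockSize(int digestLength) {
        return 200 - 2 * digestLength;  // SHA3-256: 136, SHA3-384: 104, SHA3-512: 72
    }
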
Thanks, Felix From tobias.hartmann at oracle.com Mon Aug 31 07:16:23 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 31 Aug 2020 09:16:23 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: <13aba46a-3200-30fa-7f37-b08a42dc9f8e@oracle.com> On 25.08.20 16:06, Tobias Hartmann wrote: > Okay, thanks, I'll run some more testing with these values. Will report back once it finished. All done. Apart from expected test failures (TestIntVect due to failed vectorization and UseCountedLoopSafepointsTest due to a missing safepoint) and unrelated/known issues, I'm seeing the following failure: compiler/loopopts/TestRangeCheckPredicatesControl.java -server -Xcomp -XX:+IgnoreUnrecognizedVMOptions -XX:StressLongCountedLoop=200000000 # SIGSEGV (0xb) at pc=0x00007fb970ac2b73, pid=2312839, tid=2312845 # Problematic frame: # V [libjvm.so+0x18b3b73] ZMark::try_mark_object(ZMarkCache*, unsigned long, bool)+0x53 Current thread (0x00007fb968059260): GCTaskThread "ZWorker#2" [stack: 0x00007fb96ea6c000,0x00007fb96eb6c000] [id=2312845] Stack: [0x00007fb96ea6c000,0x00007fb96eb6c000], sp=0x00007fb96eb64be0, free space=994k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x18b3b73] ZMark::try_mark_object(ZMarkCache*, unsigned long, bool)+0x53 V [libjvm.so+0x18b59a8] ZMark::work_without_timeout(ZMarkCache*, ZMarkStripe*, ZMarkThreadLocalStacks*)+0x148 V [libjvm.so+0x18b6048] ZMark::work(unsigned long)+0xa8 V [libjvm.so+0x18f1b8d] ZTask::GangTask::work(unsigned int)+0x1d V [libjvm.so+0x187a4c4] GangWorker::run_task(WorkData)+0x84 V [libjvm.so+0x187a604] GangWorker::loop()+0x44 V [libjvm.so+0x173ab90] Thread::call_run()+0x100 V [libjvm.so+0x143fc16] thread_native_entry(Thread*)+0x116 Roland, could you please try to reproduce and check if it's related to your patch? Best regards, Tobias From aph at redhat.com Mon Aug 31 08:41:26 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 31 Aug 2020 09:41:26 +0100 Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: References: Message-ID: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> On 31/08/2020 07:50, Yangfei (Felix) wrote: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8252204 > Webrev: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/ > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-cecore.S?h=v5.4.52 > Trivial adaptation in SHA3. implCompress is needed for the purpose > of adding the intrinsic. For SHA3, we need to pass one extra > parameter "digestLength" to the stub for the calculation of block > size. 
"digestLength" is also used in for the EOR loop before > keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator > which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions > on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that > there's no regression. > > We used one existing JMH test for performance test: > test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 > cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on > specific SHA3 digest length and size of the message. > For now, this feature will not be enabled automatically for > aarch64. We can auto-enable this when it is fully tested on > real hardware. > But for the above testing purposes, this is auto-enabled when > the corresponding hardware feature is detected. > > Comments? This looks like a direct copy of the sha3-cecore.S file.You'll need Linaro to contribute it. I don't imagine they'll have any problem with that: they are OCA signatories Also, given that we've got the assembly source file, why not just copy that into OpenJDK? I can't see the point rewriting it into the HotSpot assembler. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Mon Aug 31 09:46:58 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 31 Aug 2020 09:46:58 +0000 Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> References: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Monday, August 31, 2020 4:41 PM > To: Yangfei (Felix) ; hotspot-compiler- > dev at openjdk.java.net; core-libs-dev at openjdk.java.net > Cc: aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 > accelerator/intrinsic > > On 31/08/2020 07:50, Yangfei (Felix) wrote: > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8252204 > > Webrev: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/ > > > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto > Extensions. > > Reference implementation for core SHA-3 transform using ARMv8.2 > Crypto Extensions: > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/ar > m64/crypto/sha3-cecore.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose > > of adding the intrinsic. For SHA3, we need to pass one extra > > parameter "digestLength" to the stub for the calculation of block > > size. "digestLength" is also used in for the EOR loop before > > keccak to differentiate different SHA3 variants. > > > > We added jtreg tests for SHA3 and used QEMU system emulator > > which supports SHA3 instructions to test the functionality. > > Patch passed jtreg tier1-3 tests with QEMU system emulator. > > Also verified with jtreg tier1-3 tests without SHA3 instructions > > on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that > > there's no regression. 
> > > > We used one existing JMH test for performance test: > > test/micro/org/openjdk/bench/java/security/MessageDigests.java > > We measured the performance benefit with an aarch64 > > cycle-accurate simulator. > > Patch delivers 20% - 40% performance improvement depending on > > specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for > > aarch64. We can auto-enable this when it is fully tested on > > real hardware. > > But for the above testing purposes, this is auto-enabled when > > the corresponding hardware feature is detected. > > > > Comments? > > This looks like a direct copy of the sha3-cecore.S file.You'll need Linaro to > contribute it. I don't imagine they'll have any problem with that: they are > OCA signatories Since the code in sha3-ce-core.S works in kernel space, we need several modifications here to make it work in hotspot. First, we need to add callee-save & restore for d8 - d15 according to the aarch64 ABI. Also, the following code snippet is not needed for user-space:
  if_will_cond_yield_neon
    add  x8, x19, #32
    st1  { v0.1d- v3.1d}, [x19]
    st1  { v4.1d- v7.1d}, [x8], #32
    st1  { v8.1d-v11.1d}, [x8], #32
    st1  {v12.1d-v15.1d}, [x8], #32
    st1  {v16.1d-v19.1d}, [x8], #32
    st1  {v20.1d-v23.1d}, [x8], #32
    st1  {v24.1d}, [x8]
    do_cond_yield_neon
    b    0b
  endif_yield_neon
And we need to handle the multi-block case differently for StubRoutines::sha3_implCompressMB:
  3485     if (multi_block) {
  3486       // block_size = 200 - 2 * digest_length, ofs += block_size
  3487       __ add(ofs, ofs, 200);
  3488       __ sub(ofs, ofs, digest_length, Assembler::LSL, 1);
  3489
  3490       __ cmp(ofs, limit);
  3491       __ br(Assembler::LE, sha3_loop);
  3492       __ mov(c_rarg0, ofs); // return ofs
  3493     }
And StubRoutines::sha3_implCompress does not even need this multi-block check logic. > Also, given that we've got the assembly source file, why not just copy that > into OpenJDK? I can't see the point rewriting it into the HotSpot assembler. Actually, we referenced the existing intrinsics implementation and took a similar approach. It looks strange to have one intrinsic that goes differently. And we won't be able to emit this code on demand if we go that different way. Some CPUs do not support these special sha3 instructions and thus do not need this code at all. I think that's one advantage of using a stub. Thanks, Felix From vladimir.x.ivanov at oracle.com Mon Aug 31 13:52:58 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 31 Aug 2020 16:52:58 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: Message-ID: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> Hi Charlie, > So we have a puzzle. Why does running this code with tiered > compilation cause it to (erroneously?) claim a signature class has not > been loaded? I didn't try to answer this exact question, but looked at what happens during the failed inlining attempt. What surprised me is that the absent class which causes the failure is java.lang.String. But it turns out java.lang.String is never accessed from callee method [1] and hence there are no guarantees it is resolved in the context of the context class loader (instance of org/jruby/util/OneShotClassLoader) by the time the compilation kicks in. You can work around that by forcing j.l.String resolution when instantiating the class loader.
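For illustration only, a minimal sketch of what such eager resolution could look like; the helper name and the place where it would be called are my assumptions, not code from JRuby or the JDK:

  // Hypothetical helper: resolve java.lang.String in the context of the
  // freshly created class loader (e.g. the OneShotClassLoader instance)
  // before any generated method that mentions String gets compiled.
  // Class.forName with initialize=false is enough to record the loader
  // as an initiating loader for String.
  static void preloadString(ClassLoader loader) {
      try {
          Class.forName("java.lang.String", false, loader);
      } catch (ClassNotFoundException e) {
          throw new AssertionError("java.lang.String should always resolve", e);
      }
  }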
Best regards, Vladimir Ivanov [1] Users.vlivanov.ws.tmp.TIERED.inline::RUBY$method$bar$0 (Lorg/jruby/runtime/ThreadContext;Lorg/jruby/parser/StaticScope;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;Lorg/jruby/RubyModule;Ljava/lang/String;)Lorg/jruby/runtime/builtin/IRubyObject; 0 nop 1 nop 2 fast_aload_0 3 invokedynamic bsm=18 22 0 bci: 3 CounterData count(14485) argument types 0: stack(0) 'org/jruby/runtime/ThreadContext' return type 'org/jruby/RubyFixnum' 8 areturn 9 athrow - class loader data: loader data: 0x0000000134c18570 for instance a 'org/jruby/util/OneShotClassLoader'{0x0000000702198648} Java dictionary (table_size=107, classes=10, resizable=true) ^ indicates that initiating loader is different from defining loader 8: ^java.lang.Object, loader data: 0x000000010043e520 of 'bootstrap' 15: ^org.jruby.ir.targets.Bootstrap, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 36: ^org.jruby.runtime.builtin.IRubyObject, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 53: ^org.jruby.runtime.Block, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 53: ^org.jruby.ir.IRScope, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 69: Users.vlivanov.ws.tmp.TIERED.inline, loader data: 0x0000000134c18570 for instance a 'org/jruby/util/OneShotClassLoader'{0x0000000702198648} 73: ^org.jruby.runtime.ThreadContext, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 74: ^org.jruby.ir.targets.FixnumObjectSite, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 94: ^org.jruby.parser.StaticScope, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 95: ^org.jruby.RubyModule, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} > This appears to affect every OpenJDK release at least back to 8u222, > the earliest version we tested. > > To reproduce, create the two scripts in the bug, download a JRuby > distribution from jruby.org, and execute the main script like this: > > bin/jruby -Xcompile.invokedynamic -J-XX:+WhateverHotspotFlag main.rb > > PrintInlining and PrintAssembly output will show that the "bar" method > fails to inline into "foo" in the inline.rb part of the example. > > Help! > > - Charlie > From dmitry.chuyko at bell-sw.com Mon Aug 31 14:28:46 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Mon, 31 Aug 2020 17:28:46 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> Hi Andrew, Here is another version of intrinsics. It is an extension of webrev.03. Additional thing is that constants 0 and 1 that are used internally by intrinics are constructed as nodes. This is somehow similar to what is done for passing pointers to tables. 
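(As a quick reminder of the semantics the intrinsic has to preserve, before the numbers below; plain Java shown only for illustration, not code from the webrev:)

  // Reference behaviour of Math.signum(double): NaN stays NaN, +0.0 and
  // -0.0 are returned unchanged, everything else collapses to +/-1.0
  // with the sign of the input.
  static double signumReference(double d) {
      if (Double.isNaN(d) || d == 0.0) {   // d == 0.0 holds for both +0.0 and -0.0
          return d;
      }
      return Math.copySign(1.0, d);
  }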
webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/ results: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/benchmarks/signum-facgt_ir-copysign.ods As you can see, the case of the intrinsic for the entire signum is now up to 29.2% better for "random" data. NaN is 30% better also. The only suffering case is 0, which is just 1 number (in two representations) of the whole range, and the regression is ~7%/10%. Performance in the case of 0 becomes the same as for all other numbers (and NaN). I don't suppose that 0 is so special: if the input data is all zeroes and the program produces zeroes during the computation, it is trivial, and if zeroes make up half of the data, there will still be a win. For the case of copySign(double), making a constant in the IR amplifies the regression in the Blackhole benchmark, but it still may be interesting to experiment with. Just in case, it will be interesting to remeasure the Blackhole variants once compiler support [1] is implemented. Here is also a benchmark variant [2] where we consume different data, and it shows the same effects as Blackhole.consume(signum). -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8252505 [2] http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/benchmarks/DoubleSideSinkBench.java From cjashfor at linux.ibm.com Mon Aug 31 16:41:32 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Mon, 31 Aug 2020 09:41:32 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: <83ee5372-3890-fb07-721b-9d51641865da@linux.ibm.com> On 8/27/20 8:07 AM, Doerr, Martin wrote: > Hi Corey, > >> If I make a requirement, I feel decode0 should check that the >> requirement is met, and raise some kind of internal error if it isn't. >> That actually was my first implementation, but I received some comments >> during an internal review suggesting that I just "round down" the >> destination count to the closest multiple of 3 less than or equal to the >> returned value, rather than throw an internal exception which would >> confuse users. This "enforces" the rule, in some sense, without error >> handling. Do you have some thoughts about this? > > I think the rounding logic is hard to understand and I'm not sure if it's correct (you're rounding up for the 1st computation of chars_decoded). > If we don't use it, it will never get tested (because the intrinsic always returns a multiple of 3). > I prefer having a more simple version which is easy to understand and for which we can test all cases. I will see what I can do with the calculation of chars_decoded, at least in the comments, to make it more clear as to the "why" of the calculation. I will remove the round down code: "dl = (dl / 3) * 3;" and leave it for intrinsics implementers/maintainers to check that assumption when the intrinsic returns. > > I think we should be able to catch violations of this requirement by adding good JTREG tests. > An illegal intrinsic implementation should never pass the tests. So I don't see a need to catch an illegal state in the Java source code in this case. > I guess this will be best for intrinsic implementors for other platforms as well. > > I'd appreciate more opinions on this. > > >> I will double check that everything compiles and runs properly with gcc >> 7.3.1. > Please note that 7.3.1 is our minimum for Big Endian linux. For Little Endian it's 7.4.0.
Ah, that might explain why I wasn't able to find gcc-7.3.1 on RHEL 8.1 (gcc-8.3.1) or Ubuntu 16.04 (gcc-7.4.0) for Power9. As long as the code is enabled on little endian machines only, there should be no trouble with compilation. I did compile and run the tests against 7.4.0, and it worked without a problem. > You can also find this information here: > https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms > under "Other JDK 13 build platforms" which hasn't changed since then. > Great, thank you. >> I will use __attribute__ ((align(16))) instead of __vector, and make >> them arrays of 16 unsigned char. > Maybe __vectors works as expected, too, now. Whatever we use, I'd appreciate to double-check the alignment e.g. by using gdb. Ok, I will experiment with that with some small test cases and see if I can make the compiler stumble and not align the vector. The lxv instruction can handle unaligned vectors in memory, but it would be better to have the vectors aligned for performance reasons. > I don't remember what we had tried and why it didn't work as desired. > > >> I was following what was done for encodeBlock, but it appears >> encodeBlock's style isn't what is used for the other intrinsics. I will >> correct decodeBlock to use the prevailing style. Another patch should >> be added (not part of this webrev) to correct encodeBlock's style. > In your code one '\' is not aligned with the other ones. Yes, it's corrected now. > > >> Ah, this is another thing I didn't know about. I will make some >> regression tests. > Thanks. There's some documentation available: > https://openjdk.java.net/jtreg/ > I guess your colleagues can assist you with that so you don't have to figure out everything alone. Yes, thank you. JTREG tests will be part of the next webrev version. Regards, - Corey > > >> Thanks for your time on this. As you can tell, I'm inexperienced in >> writing openjdk code, so your patience and careful review is really >> appreciated. > I'm glad you work on contributions. I think we should welcome new contributors and assist as far as we can. > > Best regards, > Martin > > >> -----Original Message----- >> From: Corey Ashford >> Sent: Donnerstag, 27. August 2020 00:17 >> To: Doerr, Martin ; Michihiro Horie >> >> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev > dev at openjdk.java.net>; Kazunori Ogata ; >> joserz at br.ibm.com >> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and >> API for Base64 decoding >> >> Hi Martin, >> >> Some inline responses below. >> >> On 8/26/20 8:26 AM, Doerr, Martin wrote: >> >>> Hi Corey, >>> >>> I should explain my comments regarding Base64.java better. >>> >>>> Let's be precise: "should process a multiple of four" => "must process a >>>> multiple of four" >>> Did you try to support non-multiple of 4 and this was intended as >> recommendation? >>> I think making it a requirement and simplifying the logic in decode0 is >> better. >>> Or what's the benefit of the recommendation? >> >> If I make a requirement, I feel decode0 should check that the >> requirement is met, and raise some kind of internal error if it isn't. >> That actually was my first implementation, but I received some comments >> during an internal review suggesting that I just "round down" the >> destination count to the closest multiple of 3 less than or equal to the >> returned value, rather than throw an internal exception which would >> confuse users. This "enforces" the rule, in some sense, without error >> handling. 
Do you have some thoughts about this? >> >>> >>>>> If any illegal base64 bytes are encountered in the source by the >>>>> intrinsic, the intrinsic can return a data length of zero or any >>>>> number of bytes before the place where the illegal base64 byte >>>>> was encountered. >>>> I think this has a drawback. Somebody may use a debugger and want to >> stop >>>> when throwing IllegalArgumentException. He should see the position >> which >>>> matches the Java implementation.kkkk >>> This is probably hard to understand. Let me try to explain it by example: >>> 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the >> destination array. >>> 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed >> by your specification. >>> 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the >> large while loop in decodeBlockSlow). >>> 4. A JVMTI agent (debugger) reads dp and dst. >>> 5. The person using the debugger gets angry because more bytes than dp >> were written into dst. The JVM didn't follow the specified behavior. >>> >>> I guess we can and should avoid it by specifying that the intrinsic needs to >> return the dp value matching the number of Bytes written. >> >> That's an interesting point. I will change the specification, and the >> intrinsic implementation. Right now the Power9/10 intrinsic returns 0 >> when any illegal character is discovered, but I've been thinking about >> returning the number of bytes already written, which will allow >> decodeBlockSlow to more quickly find the offending character. This >> provides another good reason to make that change. >> >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: Doerr, Martin >>>> Sent: Dienstag, 25. August 2020 15:38 >>>> To: Corey Ashford ; Michihiro Horie >>>> >>>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev >>> dev at openjdk.java.net>; Kazunori Ogata ; >>>> joserz at br.ibm.com >>>> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate >> and >>>> API for Base64 decoding >>>> >>>> Hi Corey, >>>> >>>> thanks for proposing this change. I have comments and suggestions >>>> regarding various files. >>>> >>>> >>>> Base64.java >>>> >>>> This is the only file which needs another review from core-libs-dev. >>>> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can >>>> consume as many bytes as the implementation wants. >>>> >>>> Comment before decodeBlock: >>>> Let's be precise: "should process a multiple of four" => "must process a >>>> multiple of four" >>>> >>>>> If any illegal base64 bytes are encountered in the source by the >>>>> intrinsic, the intrinsic can return a data length of zero or any >>>>> number of bytes before the place where the illegal base64 byte >>>>> was encountered. >>>> I think this has a drawback. Somebody may use a debugger and want to >> stop >>>> when throwing IllegalArgumentException. He should see the position >> which >>>> matches the Java implementation. >>>> >>>> Please note that the comment indentation differs from other comments. >> >> Will fix. >> >>>> >>>> decode0: Final "else" after return is redundant. >> >> Will fix. >> >>>> >>>> >>>> stubGenerator_ppc.cpp >>>> >>>> "__vector" breaks AIX build! >>>> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? >>>> Please either support Big Endian properly or #ifdef it out. >> >> I have been compiling with only Advance Toolchain 13, which is 9.3.1, >> and only on Linux. 
It will not work with big endian, so it won't work >> on AIX, however obviously it shouldn't break the AIX build, so I will >> address that. There's code to set UseBASE64Intrinsics to false on big >> endian, but you're right -- I should ifdef all of the intrinsic code for >> little endian for now. Getting it to work on big endian / AIX shouldn't >> be difficult, but it's not in my scope of work at the moment. >> >> I will double check that everything compiles and runs properly with gcc >> 7.3.1. >> >>>> What exactly does it (do) on linux? >> >> It's an arch-specific type that's 16 bytes in size and aligned on a >> 16-byte boundary. >> >>>> I remember that we had tried such prefixes but were not satisfied. I think >> it >>>> didn't enforce 16 Byte alignment if I remember correctly. >> >> I will use __attribute__ ((align(16))) instead of __vector, and make >> them arrays of 16 unsigned char. >> >>>> >>>> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- >>>> 8086069). So the argument registers for offset, length and isURL may >> contain >>>> garbage in the higher bits. >> >> Wow, that's good to know! I will mask off the incoming values. >> >>>> >>>> You may want to use load_const_optimized which produces shorter code. >> >> Will fix. >> >>>> >>>> You may want to use __ align(32) to align unrolled_loop_start. >> >> Will fix. >> >>>> >>>> I'll review the algorithm in detail when I find more time. >>>> >>>> >>>> assembler_ppc.hpp >>>> assembler_ppc.inline.hpp >>>> vm_version_ppc.cpp >>>> vm_version_ppc.hpp >>>> Please rebase. Parts of the change were pushed as part of 8248190: >> Enable >>>> Power10 system and implement new byte-reverse instructions >> >> Will do. >> >>>> >>>> >>>> vmSymbols.hpp >>>> Indentation looks odd at the end. >> >> I was following what was done for encodeBlock, but it appears >> encodeBlock's style isn't what is used for the other intrinsics. I will >> correct decodeBlock to use the prevailing style. Another patch should >> be added (not part of this webrev) to correct encodeBlock's style. >> >>>> >>>> >>>> library_call.cpp >>>> Good. Indentation style of the call parameters differs from encodeBlock. >> >> Will fix. >> >>>> >>>> >>>> runtime.cpp >>>> Good. >>>> >>>> >>>> aotCodeHeap.cpp >>>> vmSymbols.cpp >>>> shenandoahSupport.cpp >>>> vmStructs_jvmci.cpp >>>> shenandoahSupport.cpp >>>> escape.cpp >>>> runtime.hpp >>>> stubRoutines.cpp >>>> stubRoutines.hpp >>>> vmStructs.cpp >>>> Good and trivial. >>>> >>>> >>>> Tests: >>>> I think we should have JTREG tests to check for regressions in the future. >> >> Ah, this is another thing I didn't know about. I will make some >> regression tests. >> >> Thanks for your time on this. As you can tell, I'm inexperienced in >> writing openjdk code, so your patience and careful review is really >> appreciated. >> >> - Corey From headius at headius.com Mon Aug 31 18:38:53 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 31 Aug 2020 13:38:53 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> Message-ID: On Mon, Aug 31, 2020 at 8:53 AM Vladimir Ivanov wrote: > What surprised me is that the absent class which causes the failure is > java.lang.String. 
But it turns out java.lang.String is never accessed > from callee method [1] and hence there are no guarantees it is resolved > in the context of the context class loader (instance of > org/jruby/util/OneShotClassLoader) by the time the compilation kicks in. > > You can work around that by forcing j.l.String resolution when > instantiating the class loader. I can give this a shot, but if I'm resolving the target method's class, and that class is using String (there's definitely references to String in the generated code), why is String still unresolved at the point where I actually bind the method and call it? I guess I can't tell whether you're saying "this is not your fault and here's a workaround" or "this is your fault and this is how you should fix it". - Charlie From evgeny.nikitin at oracle.com Mon Aug 31 20:22:03 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 31 Aug 2020 22:22:03 +0200 Subject: RFR(M): 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java Message-ID: <34b013fb-4eea-1a88-d3f1-6af990fecfbc@oracle.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8166554 Webrev: http://cr.openjdk.java.net/~enikitin//8166554/webrev.00/index.html ==== Problem explanation ==== The immediate reason for the test timeout is a compilation lock within the JVM during shutdown. In such a case, the VM gives the stuck compiler threads 10 seconds to finish [0], after which it shuts down without waiting for them. Those 10 seconds of VM shutdown are enough for the test to fail in some cases - the test is a stress test that uses up almost all of the available timeout [1]. The compilation lock, in turn, is taken via WhiteBox by the test. It should gracefully unlock the compilation in the 'finally' block, but the lockUnlocker thread is declared daemon [2], and therefore may not execute the 'finally'. ==== Solution ==== Since 'lockUnlock' is started via InfiniteLoop, it's not possible to un-daemon it. So I just turned the lockUnlock method into a Thread descendant, which gets joined at the end. Not the most beautiful solution, given the direct work with delays, but its main lock-unlock cycle is small and it is clear about what it does. Please review, // Evgeny Nikitin. ======== [0] http://hg.openjdk.java.net/jdk/jdk/file/6db0cb3893c5/src/hotspot/share/runtime/vmOperations.cpp#l388 [1] http://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java#l40 [2] http://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/Helper.java#l59 From yumin.qi at oracle.com Mon Aug 31 21:32:26 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Mon, 31 Aug 2020 14:32:26 -0700 Subject: 8248337: sparc related code clean up after solaris removal Message-ID: Hi, Please review for bug: https://bugs.openjdk.java.net/browse/JDK-8248337 webrev: http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-01/ Summary: After the Solaris support files were removed from the repo, there are some remnants which need cleaning up. Some comments are not correct, and some refer to wrong files. There is a flag that seems only useful for SPARC: UseRDPCForConstantTableBase, which got removed in this patch. Also in postaloc.cpp, the delay slot seems to be only for SPARC too, but I am not sure about that. Most of the patch is in comment sections. Tests passed tier1-4. Thanks,
Yumin From vladimir.x.ivanov at oracle.com Mon Aug 31 21:58:31 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 1 Sep 2020 00:58:31 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> Message-ID: >> What surprised me is that the absent class which causes the failure is >> java.lang.String. But it turns out java.lang.String is never accessed >> from callee method [1] and hence there are no guarantees it is resolved >> in the context of the context class loader (instance of >> org/jruby/util/OneShotClassLoader) by the time the compilation kicks in. >> >> You can work around that by forcing j.l.String resolution when >> instantiating the class loader. > > I can give this a shot, but if I'm resolving the target method's > class, and that class is using String (there's definitely references > to String in the generated code), why is String still unresolved at > the point where I actually bind the method and call it? As I can see with the test case, the target method is loaded in a separate instance of OneShotClassLoader (and, moreover, I see j.l.String loaded there!). So, it doesn't matter whether a class is loaded in a "parent" (?) script at all since they are loaded by separate class loaders. > I guess I can't tell whether you're saying "this is not your fault and > here's a workaround" or "this is your fault and this is how you should > fix it". It's hard to draw a line here. My feeling is the JVM can do a better job here (but I haven't worked out all the consequences yet). But if you want to get rid of this quirk running on 8u, you'd definitely better fix your app (JRuby). Best regards, Vladimir Ivanov From cjashfor at linux.ibm.com Mon Aug 31 22:22:47 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Mon, 31 Aug 2020 15:22:47 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> Message-ID: On 8/29/20 1:19 PM, Corey Ashford wrote: > Hi Roger, > > Thanks for your reply and thoughts! Comments interspersed below: > > On 8/28/20 10:54 AM, Roger Riggs wrote: ... >> Comparing with the way that the Base64 encoder was intrinsified, the >> method that is intrinsified should have a method body that does >> the same function, so it is interchangable. That likely will just shift >> the "fast path" code into the decodeBlock method. >> Keeping the symmetry between encoder and decoder will >> make it easier to maintain the code. > > Good point. I'll investigate what this looks like in terms of the > actual code, and will report back (perhaps in a new webrev). > Having looked at this again, I don't think it makes sense. One thing that differs significantly from the encodeBlock intrinsic is that the decodeBlock intrinsic only needs to process a prefix of the data, and so it can leave virtually any amount of data at the end of the src buffer unprocessed, whereas with the encodeBlock intrinsic, if it exists, it must process the entire buffer.
In the (common) case where the decodeBlock intrinsic returns not having processed everything, it still needs to call the Java code, and if that Java code is "replaced" by the intrinsic, it's inaccessible. Is there something I'm overlooking here? Basically I want the decode API to behave differently than the encode API, mostly to make the arch-specific intrinsic easier to implement. If that's not acceptable, then I need to rethink the API, and also figure out how to deal with the illegal character case. The latter could perhaps be done by throwing an exception from the intrinsic, or maybe by returning a negative length that specifies the index of the illegal src byte, and then having the Java code throw the exception. Regards, - Corey
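P.S. Purely as an illustration of the "negative length" idea above (not part of any webrev), the failing source index could be folded into the single int return value, much like Arrays.binarySearch encodes insertion points:

  // Non-negative result: number of bytes written to dst.
  // Negative result: illegal base64 byte at src index (-result - 1).
  static int failureResult(int srcIndex)   { return -(srcIndex + 1); }
  static boolean isFailure(int result)     { return result < 0; }
  static int failingIndex(int result)      { return -result - 1; }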