From jiefu at tencent.com Sat Aug 1 10:09:54 2020
From: jiefu at tencent.com (jiefu(傅杰))
Date: Sat, 1 Aug 2020 10:09:54 +0000
Subject: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail)
In-Reply-To: <40d947f8-ebdb-0850-274b-583be9a37aa3@oracle.com>
References: <11584C93-EDD5-42A9-A2CD-0738970F3181@tencent.com>,
 <40d947f8-ebdb-0850-274b-583be9a37aa3@oracle.com>
Message-ID: <97d105b27624408e89666fe7ebdb4d74@tencent.com>

Thanks Vladimir and Tobias for your review.

Pushed.

Best regards,
Jie
________________________________
From: Vladimir Kozlov
Sent: Saturday, August 1, 2020 7:54 AM
To: jiefu(傅杰)
Cc: hotspot compiler
Subject: Re: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail)

Yes, it is good.

Thanks,
Vladimir

On 7/31/20 4:43 PM, jiefu(傅杰) wrote:
> Hi Vladimir K,
>
> The latest version for the test case is here: http://cr.openjdk.java.net/~jiefu/8250825/webrev.02/
> Compared with webrev.01, the changes are:
> - Rename the test to TestMisalignedUnsafeAccess.java
> - Add @summary tag
> - Remove Xbatch
> - Remove initUnsafe
>
> Are you still OK with it?
>
> Thanks.
> Best regards,
> Jie
>
> On 2020/8/1, 12:46 AM, "Vladimir Kozlov" wrote:
>
> > Good.
> >
> > thanks,
> > Vladimir K
> >
> > On 7/30/20 10:06 PM, jiefu(傅杰) wrote:
> > > Hi Vladimir K,
> > >
> > > Thanks for your review.
> > >
> > > The test had been extended here:
> > > - http://cr.openjdk.java.net/~jiefu/8250825/webrev.01/
> > >
> > > Before the patch:
> > >   The unsafe access (put/get) to a static field will crash.
> > >   The unsafe access (put/get) to an instance field is fine.
> > >
> > > After the patch:
> > >   All is OK.
> > >
> > > Thanks a lot.
> > > Best regards,
> > > Jie
> > >
> > > On 2020/7/31, 2:24 AM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:
> > >
> > >   Hi Jie
> > >
> > >   Nodes generated by make_unsafe_address() are correct. The issue is that the Unsafe API
> > >   allows generating an offset that is unaligned (to fields) with an arbitrary type. As a
> > >   result the C2 type system can't find the corresponding field.
> > >
> > >   Did you try to do an unaligned unsafe access to instance fields?
> > >   Also try to set a value via Unsafe (Store node). There is code in C2 which checks for
> > >   narrow stores. It would be interesting how it behaves in the unsafe case.
> > >
> > >   Please, extend your test.
> > >
> > >   Otherwise the fix is good.
> > >
> > >   Thanks,
> > >   Vladimir K
> > >
> > >   On 7/30/20 6:09 AM, jiefu(傅杰) wrote:
> > >   > Hi all,
> > >   >
> > >   > JBS: https://bugs.openjdk.java.net/browse/JDK-8250825
> > >   > Webrev: http://cr.openjdk.java.net/~jiefu/8250825/webrev.00/
> > >   >
> > >   > When C2 tries to inline an unsafe-access method, it may generate the following pattern in make_unsafe_address:
> > >   >       ConP  ConL
> > >   >          \   |
> > >   >           \  |
> > >   >           AddP
> > >   > The current implementation of TypeOopPtr::TypeOopPtr(...) fails to recognize it as an unsafe operation, which leads to the crash.
> > >   >
> > >   > Testing:
> > >   >   - tier1-3 on Linux/x64
> > >   >
> > >   > Could you please review it and give me some advice?
> > >   >
> > >   > Thanks a lot.
> > >   > Best regards,
> > >   > Jie
> > >   >
> > >

From jatin.bhateja at intel.com Sun Aug 2 18:25:12 2020
From: jatin.bhateja at intel.com (Bhateja, Jatin)
Date: Sun, 2 Aug 2020 18:25:12 +0000
Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
In-Reply-To: <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com>
References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com>
 <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com>
 <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com>
Message-ID: 

Hi Vladimir,

Final patch is placed at the following link.
http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/

One more reviewer approval needed.

Best Regards,
Jatin

> -----Original Message-----
> From: Vladimir Ivanov
> Sent: Saturday, August 1, 2020 4:49 AM
> To: Bhateja, Jatin
> Cc: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
>
> > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/
>
> Looks good.
>
> Tier5 (where I saw the crashes) passed.
>
> Please, incorporate the following minor cleanups in the final version:
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanup/
>
> (Tested with hs-tier1,hs-tier2.)
>
> Best regards,
> Vladimir Ivanov
>
> >> -----Original Message-----
> >> From: Vladimir Ivanov
> >> Sent: Thursday, July 30, 2020 3:30 AM
> >> To: Bhateja, Jatin
> >> Cc: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
> >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
> >>
> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/
> >>>
> >>> Looks good. (Testing is in progress.)
> >>
> >> FYI test results are clean (tier1-tier5).
> >>
> >>>> I have removed the RotateLeftNode/RotateRightNode::Ideal routines since
> >>>> we are doing constant folding in the LShiftI/URShiftI value routines anyway.
> >>>> Since the Java rotate APIs are no longer intrinsified, these routines may
> >>>> no longer be useful.
> >>>
> >>> Nice observation! Good.
> >>
> >> As a second thought, it seems there's still a chance left that Rotate
> >> nodes get their input type narrowed after the folding happened. For
> >> example, as a result of incremental inlining or CFG transformations
> >> during loop optimizations. And it does happen in practice since the
> >> testing revealed some crashes due to the bug in RotateLeftNode/RotateRightNode::Ideal().
> >>
> >> So, it makes sense to keep the transformations. But I'm fine with
> >> addressing that as a followup enhancement.
> >>
> >> Best regards,
> >> Vladimir Ivanov
> >>
> >>>
> >>>>> It would be really nice to migrate to MacroAssembler along the way
> >>>>> (as a cleanup).
> >>>>
> >>>> I guess you are saying remove opcodes/encodings from the patterns and
> >>>> move them to the Assembler. Can we take this cleanup activity
> >>>> separately, since other patterns are also using these matcher directives.
> >>>
> >>> I'm perfectly fine with handling it as a separate enhancement.
> >>>
> >>>> Other synthetic comments have been taken care of. I have extended
> >>>> the test to cover all the newly added scalar transforms. Kindly let
> >>>> me know if there are other comments.
> >>>
> >>> Nice!
> >>>
> >>> Best regards,
> >>> Vladimir Ivanov
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Vladimir Ivanov
> >>>>> Sent: Friday, July 24, 2020 3:21 AM
> >>>>> To: Bhateja, Jatin
> >>>>> Cc: Viswanathan, Sandhya ; Andrew Haley ; hotspot-compiler-dev at openjdk.java.net
> >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
> >>>>>
> >>>>> Hi Jatin,
> >>>>>
> >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/
> >>>>>
> >>>>> Much better! Thanks.
> >>>>>
> >>>>>> Change Summary:
> >>>>>>
> >>>>>> 1) Unified the handling for scalar rotate operations. All scalar rotate
> >>>>> selection patterns are now dependent on newly created
> >>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing.
> >>>>> Currently > >>>>> if DAG nodes corresponding to a sub-pattern are shared (have > >>>>> multiple > >>>>> users) then existing complex patterns based on Or/LShiftL/URShift > >>>>> does not get matched and this prevents inferring rotate nodes. > >>>>> Please refer to JIT'ed assembly output with baseline[1] and with > >>>>> patch[2] . We can see that generated code size also went done from > >>>>> 832 byte to 768 bytes. Also this can cause perf degradation if > >>>>> shift-or dependency chain appears inside a hot region. > >>>>>> > >>>>>> 2) Due to enhanced rotate inferencing new patch shows better > >>>>>> performance > >>>>> even for legacy targets (non AVX-512). Please refer to the perf > >>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. > >>>>> > >>>>> Very nice! > >>>>>> 3) As suggested, removed Java API intrinsification changes and > >>>>>> scalar > >>>>> rotate transformation are done during OrI/OrL node idealizations. > >>>>> > >>>>> Good. > >>>>> > >>>>> (Still would be nice to factor the matching code from Ideal() and > >>>>> share it between multiple use sites. Especially considering > >>>>> OrVNode::Ideal() now does basically the same thing. As an > >>>>> example/idea, take a look at > >>>>> is_bmi_pattern() in x86.ad.) > >>>>> > >>>>>> 4) SLP always gets to work on new scalar Rotate nodes and creates > >>>>>> vector > >>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes > >>>>> if target does not supports vector rotates(non-AVX512). > >>>>> > >>>>> Good. > >>>>> > >>>>>> 5) Added new instruction patterns for vector shift Left/Right > >>>>>> operations > >>>>> with constant shift operands. This prevents emitting extra moves > >>>>> to > >> XMM. > >>>>> > >>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ > >>>>> +? match(Set dst (LShiftVI src shift)); > >>>>> > >>>>> I'd prefer to see a uniform Ideal IR shape being used irrespective > >>>>> of whether the argument is a constant or not. It should also > >>>>> simplify the logic in SuperWord and make it easier to support on > >>>>> non-x86 architectures. > >>>>> > >>>>> For example, here's how it is done on AArch64: > >>>>> > >>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ > >>>>> ??? predicate(n->as_Vector()->length() == 4); > >>>>> ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... > >>>>> > >>>>>> 6) Constant folding scenarios are covered in > >>>>>> RotateLeft/RotateRight > >>>>> idealization, inferencing of vector rotate through OrV > >>>>> idealization covers the vector patterns generated though non SLP > route i.e. > >>>>> VectorAPI. > >>>>> > >>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the > >>>>> general direction here - duplication of scalar transformations to > >>>>> lane-wise vector operations. It definitely won't scale and in a > >>>>> longer run it risks to diverge. Would be nice to find a way to > >>>>> automatically "lift" > >>>>> scalar transformations to vectors and apply them uniformly. But > >>>>> right now it is just an idea which requires more experimentation. > >>>>> > >>>>> > >>>>> Some other minor comments/suggestions: > >>>>> > >>>>> +? // Swap the computed left and right shift counts. > >>>>> +? if (is_rotate_left) { > >>>>> +??? Node* temp = shiftRCnt; > >>>>> +??? shiftRCnt? = shiftLCnt; > >>>>> +??? shiftLCnt? = temp; > >>>>> +? } > >>>>> > >>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? > >>>>> > >>>>> > >>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) > >>>>> +??? 
return true; > >>>>> > >>>>> Please, don't omit curly braces (even for simple cases). > >>>>> > >>>>> > >>>>> -// Rotate Right by variable > >>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 > >>>>> zero, rFlagsReg cr) > >>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) > >>>>> ?? %{ > >>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero > >>>>> shift)))); > >>>>> - > >>>>> +? predicate(!VM_Version::supports_bmi2() && > >>>>> n->bottom_type()->basic_type() == T_INT); > >>>>> +? match(Set dst (RotateRight dst shift)); > >>>>> +? format %{ "rorl???? $dst, $shift" %} > >>>>> ???? expand %{ > >>>>> -??? rorI_rReg_CL(dst, shift, cr); > >>>>> +??? rorI_rReg_imm8(dst, shift, cr); > >>>>> ???? %} > >>>>> > >>>>> It would be really nice to migrate to MacroAssembler along the way > >>>>> (as a cleanup). > >>>>> > >>>>>> Please push the patch through your testing framework and let me > >>>>>> know your > >>>>> review feedback. > >>>>> > >>>>> There's one new assertion failure: > >>>>> > >>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), > >>>>> pid=5476, tid=6219 > >>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize > >>>>> should return new nodes, use Identity to return old nodes > >>>>> > >>>>> I believe it comes from > >>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal > >>>>> which can return pre-contructed constants. I suggest to get rid of > >>>>> Ideal() methods and move constant folding logic into Node::Value() > >>>>> (as implemented for other bitwise/arithmethic nodes in > >>>>> addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more generic > >>>>> approach since it enables richer type information (ranges vs > >>>>> constants) and IMO it's more convenient to work with constants > >>>>> through Types than ConNodes. > >>>>> > >>>>> (I suspect that original/expanded IR shape may already provide > >>>>> more precise type info for non-constant case which can affect the > >>>>> benchmarks.) > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>>> > >>>>>> Best Regards, > >>>>>> Jatin > >>>>>> > >>>>>> [1] > >>>>>> > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. > >>>>>> txt [2] > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx > >>>>>> 2_ > >>>>>> asm > >>>>>> .txt [3] > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new > >>>>>> _p > >>>>>> atc > >>>>>> h.txt > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Vladimir Ivanov > >>>>>>> Sent: Saturday, July 18, 2020 12:25 AM > >>>>>>> To: Bhateja, Jatin ; Andrew Haley > >>>>>>> > >>>>>>> Cc: Viswanathan, Sandhya ; > >>>>>>> hotspot-compiler- dev at openjdk.java.net > >>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification > >>>>>>> for > >>>>>>> X86 > >>>>>>> > >>>>>>> Hi Jatin, > >>>>>>> > >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ > >>>>>>> > >>>>>>> It definitely looks better, but IMO it hasn't reached the sweet > >>>>>>> spot > >>>>> yet. > >>>>>>> It feels like the focus is on auto-vectorizer while the burden > >>>>>>> is put on scalar cases. > >>>>>>> > >>>>>>> First of all, considering GVN folds relevant operation patterns > >>>>>>> into a single Rotate node now, what's the motivation to > >>>>>>> introduce intrinsics? > >>>>>>> > >>>>>>> Another point is there's still significant duplication for > >>>>>>> scalar cases. 
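For readers skimming the archive: the scalar pattern this review keeps referring to is, at the Java level, just the shift-or rotate idiom. A minimal sketch follows (illustrative only, not code from the webrev; the class and method names are made up):

    class RotateIdiom {
        // Shift-or rotate idiom: with the patch under review, C2 folds the
        // corresponding OrI/LShiftI/URShiftI graph shape into a single
        // RotateLeft node during idealization, so a single rotate
        // instruction can be emitted on x86.
        static int rotateLeftByHand(int x, int s) {
            return (x << s) | (x >>> (32 - s)); // same result as Integer.rotateLeft(x, s)
        }
    }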
> >>>>>>> > >>>>>>> I'd prefer to see the legacy cases which rely on pattern > >>>>>>> matching to go away and be substituted with instructions which > >>>>>>> match Rotate instructions (migrating ). > >>>>>>> > >>>>>>> I understand that it will penalize the vectorization > >>>>>>> implementation, but IMO reducing overall complexity is worth it. > >>>>>>> On auto-vectorizer side, I see > >>>>>>> 2 ways to fix it: > >>>>>>> > >>>>>>> ???? (1) introduce additional AD instructions for > >>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; > >>>>>>> > >>>>>>> ???? (2) in SuperWord::output(), when matcher doesn't support > >>>>>>> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), > >>>>>>> generate vectorized version of the original pattern. > >>>>>>> > >>>>>>> Overall, it looks like more and more focus is made on scalar part. > >>>>>>> Considering the main goal of the patch is to enable > >>>>>>> vectorization, I'm fine with separating cleanup of scalar part. > >>>>>>> As an interim solution, it seems that leaving the scalar part as > >>>>>>> it is now and matching scalar bit rotate pattern in > >>>>>>> VectorNode::is_rotate() should be enough to keep the > >>>>>>> vectorization part functioning. Then scalar Rotate nodes and > relevant cleanups can be integrated later. > >>>>>>> (Or vice > >>>>>>> versa: clean up scalar part first and then follow up with > >>>>>>> vectorization.) > >>>>>>> > >>>>>>> Some other comments: > >>>>>>> > >>>>>>> * There's a lot of duplication between OrINode::Ideal and > >>>>> OrLNode::Ideal. > >>>>>>> What do you think about introducing a super type > >>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? > >>>>>>> > >>>>>>> > >>>>>>> * src/hotspot/cpu/x86/x86.ad > >>>>>>> > >>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ > >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== > >>>>>>> T_INT > >>>>> || > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== T_LONG); > >>>>>>> > >>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ > >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== > >>>>>>> T_INT > >>>>> || > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > >>>>>>> +== T_LONG); > >>>>>>> > >>>>>>> The predicates are redundant here. > >>>>>>> > >>>>>>> > >>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > >>>>>>> > >>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType > >>>>>>> +etype, > >>>>>>> XMMRegister dst, XMMRegister src, > >>>>>>> +???????????????????????????????????? int shift, int vector_len) > >>>>>>> +{ if (opcode == Op_RotateLeftV) { > >>>>>>> +??? if (etype == T_INT) { > >>>>>>> +????? evprold(dst, src, shift, vector_len); > >>>>>>> +??? } else { > >>>>>>> +????? evprolq(dst, src, shift, vector_len); > >>>>>>> +??? } > >>>>>>> > >>>>>>> Please, put an assert for the false case (assert(etype == > >>>>>>> T_LONG, > >>>>> "...")). > >>>>>>> > >>>>>>> > >>>>>>> * On testing (with previous version of the patch): -XX:UseAVX is > >>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 > >> platforms. > >>>>>>> Either omitting the flag or adding > >>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. 
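As a concrete illustration of that last testing note (a hypothetical jtreg test header, not taken from the patch): with -XX:+IgnoreUnrecognizedVMOptions in front, the x86-only -XX:UseAVX flag is ignored on other architectures instead of making the run fail.

    /*
     * @test
     * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3
     *                   compiler.intrinsics.SomeRotateTest
     */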
> >>>>>>> > >>>>>>> Best regards, > >>>>>>> Vladimir Ivanov > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Summary of changes: > >>>>>>>> 1) Optimization is specifically targeted to exploit vector > >>>>>>>> rotation > >>>>>>> instruction added for X86 AVX512. A single rotate instruction > >>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers better > >>>>>>> latency at reduced instruction count. > >>>>>>>> > >>>>>>>> 2) There were two approaches to implement this: > >>>>>>>> ?????? a)? Let everything remain the same and add new wide > >>>>>>>> complex > >>>>>>> instruction patterns in the matcher for e.g. > >>>>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary > >>>>>>>> ReplicateI > >>>>>>>> shift)) > >>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate > >>>>>>> shift)) > >>>>>>>> ?????? It would have been an overoptimistic assumption to > >>>>>>>> expect that graph > >>>>>>> shape would be preserved till the matcher for correct inferencing. > >>>>>>>> ?????? In addition we would have required multiple such bulky > >>>>>>>> patterns. > >>>>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, > >>>>>>>> these gets > >>>>>>> generated during intrinsification as well as during additional > >>>>>>> pattern > >>>>>>>> ?????? matching during node Idealization, later on these nodes > >>>>>>>> are consumed > >>>>>>> by SLP for valid vectorization scenarios to emit their vector > >>>>>>>> ?????? counterparts which eventually emits vector rotates. > >>>>>>>> > >>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here > >>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate nodes > >>>>>>>> should either be > >>>>>>> dismantled back to OR/SHIFT pattern or we penalize the > >>>>>>> vectorization which would be very costly, other option would > >>>>>>> have been to add additional vector rotate pattern for UseAVX=3 > >>>>>>> in the matcher which emit vector OR-SHIFTs instruction but then > >>>>>>> it will loose on emitting efficient instruction sequence which > >>>>>>> node sharing > >>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus it > >>>>>>> will not be beneficial for non-AVX512 targets, only saving will > >>>>>>> be in terms of cleanup of few existing scalar rotate matcher > >>>>>>> patterns, also old targets does not offer this powerful rotate > >> instruction. > >>>>>>> Therefore new scalar nodes are created only for AVX512 targets. > >>>>>>>> > >>>>>>>> As per suggestions constant folding scenarios have been covered > >>>>>>>> during > >>>>>>> Idealizations of newly added scalar nodes. > >>>>>>>> > >>>>>>>> Please review the latest version and share your feedback and > >>>>>>>> test > >>>>>>> results. > >>>>>>>> > >>>>>>>> Best Regards, > >>>>>>>> Jatin > >>>>>>>> > >>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: Andrew Haley > >>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM > >>>>>>>>> To: Vladimir Ivanov ; Bhateja, > >>>>>>>>> Jatin ; > >>>>>>>>> hotspot-compiler-dev at openjdk.java.net > >>>>>>>>> Cc: Viswanathan, Sandhya > >>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API > >>>>>>>>> intrinsification for > >>>>>>>>> X86 > >>>>>>>>> > >>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: > >>>>>>>>> > >>>>>>>>> ??? > High-level comment: so far, there were no pressing need > >>>>>>>>> in > >>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL > >>>>>>>>> instructions > >>>>>>>>>> were selected during matching [1]. 
Now the patch introduces > >>>>>>>>>> > > >>>>>>>>> dedicated nodes > >>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > which > >>>>>>>>> partly duplicates existing logic. > >>>>>>>>> > >>>>>>>>> The lack of rotate nodes in the IR has always meant that > >>>>>>>>> AArch64 doesn't generate optimal code for e.g. > >>>>>>>>> > >>>>>>>>> ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) > >>>>>>>>> > >>>>>>>>> because, with the RotateLeft expanded to its full combination > >>>>>>>>> of ORs and shifts, it's to complicated to match. At the time I > >>>>>>>>> put this to one side because it wasn't urgent. This is a shame > >>>>>>>>> because although such combinations are unusual they are used > >>>>>>>>> in some crypto > >>>>> operations. > >>>>>>>>> > >>>>>>>>> If we can generate immediate-form rotate nodes early by > >>>>>>>>> pattern matching during parsing (rather than depending on > >>>>>>>>> intrinsics) we'll get more value than by depending on > >>>>>>>>> programmers calling > >> intrinsics. > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Andrew Haley? (he/him) > >>>>>>>>> Java Platform Lead Engineer > >>>>>>>>> Red Hat UK Ltd. > >>>>>>>>> https://keybase.io/andrewhaley > >>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > >>>>>>>> From boris.ulasevich at bell-sw.com Sun Aug 2 20:54:47 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sun, 2 Aug 2020 23:54:47 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms Message-ID: Hi all, Please review a simple change to C2 to fix a regression: AbsI/AbsL nodes are used without checking that the platform supports them (for now it is the issue for ARM32 and 32-bit x86 platforms). http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 http://bugs.openjdk.java.net/browse/JDK-8248445 thanks, Boris From vladimir.x.ivanov at oracle.com Mon Aug 3 10:37:28 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 3 Aug 2020 13:37:28 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: <89e9ab7a-5f42-075a-e770-2fb943da897a@oracle.com> > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 Looks good. Best regards, Vladimir Ivanov From luhenry at microsoft.com Mon Aug 3 14:39:20 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 14:39:20 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: , Message-ID: Hi, A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ Thank you Ludovic From evgeny.nikitin at oracle.com Mon Aug 3 15:22:40 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 3 Aug 2020 17:22:40 +0200 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> Message-ID: Hi Igor, thanks for review. > - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? Remnants from the previous developer and their decision :). I personally don't like inner classes and inner helper methods alike, so now I've extracted that into MethodHelper.java. 
The fact that the methods are used in another test strengthens this decision for me. > - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. I choose @requires. Descriptions in most cases are better then in-code logic. > - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? One of the reasons for the case was uncontrollable switch to another layer in background. I found that switch valuable to make the test behavior predictable. The new webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.01/ Please review, // Evgeny Nikitin. On 2020-07-31 19:11, Igor Ignatyev wrote: > Hi Evgeny, > > in general looks good to me, a couple comments/questions though: > - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? > - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. > - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? > > Thanks, > -- Igor > >> On Jul 27, 2020, at 12:38 PM, Evgeny Nikitin wrote: >> >> Hi, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 >> Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ >> >> Adjusting the test to current state of the VM. >> >> - Definition of 'trivial code' does not depend on whether the method has been profiled or not; >> - Trivial code does only go level 0 to level 1; >> - Some refactoring. >> >> The change has been checked in mach5 for the 5 platforms (passed). >> >> Please review, >> /Evgeny Nikitin. > From viv.desh at gmail.com Mon Aug 3 15:41:41 2020 From: viv.desh at gmail.com (Vivek Deshpande) Date: Mon, 3 Aug 2020 08:41:41 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: Message-ID: Hi ludovic Thanks for the change. It looks good to me. The approach also looks good to me. Thank you. Regards, Vivek On Mon, Aug 3, 2020 at 7:39 AM Ludovic Henry wrote: > Hi, > > A quick follow up on that change. Are you happy with the general approach, > or would rather have it done differently? 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ > > Thank you > Ludovic > -- Thanks and Regards, Vivek Deshpande viv.desh at gmail.com From hohensee at amazon.com Mon Aug 3 17:06:35 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Mon, 3 Aug 2020 17:06:35 +0000 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms Message-ID: <37381D95-CBD8-4EC0-9824-9B8AA2D140FB@amazon.com> +1. Paul ?On 8/3/20, 3:35 AM, "hotspot-compiler-dev on behalf of Vladimir Ivanov" wrote: > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 Looks good. Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Mon Aug 3 17:10:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 10:10:40 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: Message-ID: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> Hi Ludovic This is very professional work! CCing to Core-libs because you modified Java code and need review from Java library group. Few notes: Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': * This code is free software; you can rrdistribute it and/or modify it Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header Ludovic, it looks like you used only general instructions to implement this code. Can you add comment where the algorithm come from? Or it is just direct translation of Java code? Vivek, do we have SSE/AVX instructions which may improve performance of this code? It could be follow up update if we can. Did you test it on 32-bit x86? Would be interesting to see result of artificially switching off AVX and SSE: '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. Thanks, Vladimir On 8/3/20 7:39 AM, Ludovic Henry wrote: > Hi, > > A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ > > Thank you > Ludovic > From vladimir.kozlov at oracle.com Mon Aug 3 17:25:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 10:25:34 -0700 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: Hi Boris, The current code is hard to read. Can you rearrange it to have clear code flow (and correct spaces for if ())? Including F and D checks. To something like: if (tzero == TypeF::ZERO) { if (sub->Opcode() == Op_SubF && sub->in(2) == x && phase->type(sub->in(1)) == tzero)) { x = new AbsFNode(x); if (flip) { x = new SubFNode(sub->in(1), phase->transform(x)); } } } else if Thanks, Vladimir On 8/2/20 1:54 PM, Boris Ulasevich wrote: > Hi all, > > Please review a simple change to C2 to fix a regression: AbsI/AbsL > nodes are used without checking that the platform supports them > (for now it is the issue for ARM32 and 32-bit x86 platforms). 
> > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 > http://bugs.openjdk.java.net/browse/JDK-8248445 > > thanks, > Boris > From anthony.scarpino at oracle.com Mon Aug 3 17:31:38 2020 From: anthony.scarpino at oracle.com (Anthony Scarpino) Date: Mon, 3 Aug 2020 10:31:38 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> Message-ID: <724174CC-78F4-453C-9420-DC30B8E44664@oracle.com> I had looked at the java code changes and are fine with them Tony > On Aug 3, 2020, at 10:10 AM, Vladimir Kozlov wrote: > > ?Hi Ludovic > > This is very professional work! > > CCing to Core-libs because you modified Java code and need review from Java library group. > > Few notes: > > Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. > > In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': > > * This code is free software; you can rrdistribute it and/or modify it > > Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header > > Ludovic, it looks like you used only general instructions to implement this code. Can you add comment where the algorithm come from? Or it is just direct translation of Java code? > > Vivek, do we have SSE/AVX instructions which may improve performance of this code? It could be follow up update if we can. > > Did you test it on 32-bit x86? Would be interesting to see result of artificially switching off AVX and SSE: '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. > > Thanks, > Vladimir > >> On 8/3/20 7:39 AM, Ludovic Henry wrote: >> Hi, >> A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 >> Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ >> Thank you >> Ludovic From vladimir.kozlov at oracle.com Mon Aug 3 17:34:28 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 10:34:28 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> Message-ID: <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> And I got crash during JDK build on linux-x64: # Internal Error (src/hotspot/share/opto/library_call.cpp:5732), pid=18904, tid=19012 # assert(field != __null) failed: undefined field # # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16-internal+0-2020-08-03-1651458.vladimir.kozlov.jdkjdk, mixed mode, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 Current CompileTask: C2: 6204 1305 4 sun.security.provider.DigestBase::engineUpdate (189 bytes) Stack: [0x0000151bfcfc7000,0x0000151bfd0c8000], sp=0x0000151bfd0c3ed0, free space=1011k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 V [libjvm.so+0x11167ea] LibraryCallKit::get_long_state_from_digestBase_object(Node*)+0x2a V [libjvm.so+0x1116f2d] LibraryCallKit::inline_digestBase_implCompressMB(Node*, ciInstanceKlass*, bool, unsigned char*, char const*, Node*, Node*, Node*)+0x2cd V [libjvm.so+0x1117467] LibraryCallKit::inline_digestBase_implCompressMB(int)+0x397 V [libjvm.so+0x1121de1] LibraryIntrinsic::generate(JVMState*)+0x211 V [libjvm.so+0x75d61d] PredicatedIntrinsicGenerator::generate(JVMState*)+0xb8d Vladimir K On 8/3/20 10:10 AM, Vladimir Kozlov wrote: > Hi Ludovic > > This is very professional work! > > CCing to Core-libs because you modified Java code and need review from Java library group. > > Few notes: > > Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. > > In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': > > ?* This code is free software; you can rrdistribute it and/or modify it > > Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header > > Ludovic, it looks like you used only general instructions to implement this code. Can you add comment where the > algorithm come from? Or it is just direct translation of Java code? > > Vivek, do we have SSE/AVX instructions which may improve performance of this code? It could be follow up update if we can. > > Did you test it on 32-bit x86? Would be interesting to see result of artificially switching off AVX and SSE: > '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. > > Thanks, > Vladimir > > On 8/3/20 7:39 AM, Ludovic Henry wrote: >> Hi, >> >> A quick follow up on that change. Are you happy with the general approach, or would rather have it done differently? 
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8250902 >> Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/ >> >> Thank you >> Ludovic >> From igor.ignatyev at oracle.com Mon Aug 3 18:11:35 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 3 Aug 2020 11:11:35 -0700 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> Message-ID: <5F33FE0C-922F-435E-AD25-2A3445A51996@oracle.com> Hi Evgeny, webrev.01 looks good to me, thanks. -- Igor > On Aug 3, 2020, at 8:22 AM, Evgeny Nikitin wrote: > > Hi Igor, thanks for review. > > > - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? > > Remnants from the previous developer and their decision :). I personally don't like inner classes and inner helper methods alike, so now I've extracted that into MethodHelper.java. The fact that the methods are used in another test strengthens this decision for me. > > > - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. > > I choose @requires. Descriptions in most cases are better then in-code logic. > > > - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? > > One of the reasons for the case was uncontrollable switch to another layer in background. I found that switch valuable to make the test behavior predictable. > > The new webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.01/ > > Please review, > // Evgeny Nikitin. > > > > On 2020-07-31 19:11, Igor Ignatyev wrote: >> Hi Evgeny, >> in general looks good to me, a couple comments/questions though: >> - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? >> - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. >> - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? 
>> Thanks, >> -- Igor >>> On Jul 27, 2020, at 12:38 PM, Evgeny Nikitin wrote: >>> >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 >>> Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ >>> >>> Adjusting the test to current state of the VM. >>> >>> - Definition of 'trivial code' does not depend on whether the method has been profiled or not; >>> - Trivial code does only go level 0 to level 1; >>> - Some refactoring. >>> >>> The change has been checked in mach5 for the 5 platforms (passed). >>> >>> Please review, >>> /Evgeny Nikitin. From luhenry at microsoft.com Mon Aug 3 18:12:32 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 18:12:32 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: > And I got crash during JDK build on linux-x64: > > # Internal Error (src/hotspot/share/opto/library_call.cpp:5732), pid=18904, tid=19012 > # assert(field != __null) failed: undefined field > # > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16-internal+0-2020-08-03-1651458.vladimir.kozlov.jdkjdk, mixed mode, > tiered, compressed oops, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 > > Current CompileTask: > C2: 6204 1305 4 sun.security.provider.DigestBase::engineUpdate (189 bytes) > > Stack: [0x0000151bfcfc7000,0x0000151bfd0c8000], sp=0x0000151bfd0c3ed0, free space=1011k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 > V [libjvm.so+0x11167ea] LibraryCallKit::get_long_state_from_digestBase_object(Node*)+0x2a > V [libjvm.so+0x1116f2d] LibraryCallKit::inline_digestBase_implCompressMB(Node*, ciInstanceKlass*, bool, unsigned char*, char const*, Node*, Node*, Node*)+0x2cd > V [libjvm.so+0x1117467] LibraryCallKit::inline_digestBase_implCompressMB(int)+0x397 > V [libjvm.so+0x1121de1] LibraryIntrinsic::generate(JVMState*)+0x211 > V [libjvm.so+0x75d61d] PredicatedIntrinsicGenerator::generate(JVMState*)+0xb8d Interesting, I did all my work on Linux-x64 but didn't observe that. Let me try to reproduce and come back to you on that. From vladimir.kozlov at oracle.com Mon Aug 3 18:49:18 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 11:49:18 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hmm, I applied your http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/jdk.changeset But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. I will retest again with removed other changes. 
Vladimir K On 8/3/20 11:12 AM, Ludovic Henry wrote: >> And I got crash during JDK build on linux-x64: >> >> # Internal Error (src/hotspot/share/opto/library_call.cpp:5732), pid=18904, tid=19012 >> # assert(field != __null) failed: undefined field >> # >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16-internal+0-2020-08-03-1651458.vladimir.kozlov.jdkjdk, mixed mode, >> tiered, compressed oops, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 >> >> Current CompileTask: >> C2: 6204 1305 4 sun.security.provider.DigestBase::engineUpdate (189 bytes) >> >> Stack: [0x0000151bfcfc7000,0x0000151bfd0c8000], sp=0x0000151bfd0c3ed0, free space=1011k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x11123f4] LibraryCallKit::load_field_from_object(Node*, char const*, char const*, bool, bool, ciInstanceKlass*)+0x334 >> V [libjvm.so+0x11167ea] LibraryCallKit::get_long_state_from_digestBase_object(Node*)+0x2a >> V [libjvm.so+0x1116f2d] LibraryCallKit::inline_digestBase_implCompressMB(Node*, ciInstanceKlass*, bool, unsigned char*, char const*, Node*, Node*, Node*)+0x2cd >> V [libjvm.so+0x1117467] LibraryCallKit::inline_digestBase_implCompressMB(int)+0x397 >> V [libjvm.so+0x1121de1] LibraryIntrinsic::generate(JVMState*)+0x211 >> V [libjvm.so+0x75d61d] PredicatedIntrinsicGenerator::generate(JVMState*)+0xb8d > > Interesting, I did all my work on Linux-x64 but didn't observe that. Let me try to reproduce and come back to you on that. > From vladimir.kozlov at oracle.com Mon Aug 3 18:50:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 11:50:43 -0700 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: <5F33FE0C-922F-435E-AD25-2A3445A51996@oracle.com> References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> <5F33FE0C-922F-435E-AD25-2A3445A51996@oracle.com> Message-ID: <47918541-638d-a231-c1ba-67ce512a498d@oracle.com> +1 Thanks, Vladimir K On 8/3/20 11:11 AM, Igor Ignatyev wrote: > Hi Evgeny, > > webrev.01 looks good to me, thanks. > > -- Igor > >> On Aug 3, 2020, at 8:22 AM, Evgeny Nikitin wrote: >> >> Hi Igor, thanks for review. >> >>> - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? >> >> Remnants from the previous developer and their decision :). I personally don't like inner classes and inner helper methods alike, so now I've extracted that into MethodHelper.java. The fact that the methods are used in another test strengthens this decision for me. >> >>> - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. >> >> I choose @requires. 
Descriptions in most cases are better then in-code logic. >> >>> - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? >> >> One of the reasons for the case was uncontrollable switch to another layer in background. I found that switch valuable to make the test behavior predictable. >> >> The new webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.01/ >> >> Please review, >> // Evgeny Nikitin. >> >> >> >> On 2020-07-31 19:11, Igor Ignatyev wrote: >>> Hi Evgeny, >>> in general looks good to me, a couple comments/questions though: >>> - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? >>> - if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. >>> - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? >>> Thanks, >>> -- Igor >>>> On Jul 27, 2020, at 12:38 PM, Evgeny Nikitin wrote: >>>> >>>> Hi, >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 >>>> Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ >>>> >>>> Adjusting the test to current state of the VM. >>>> >>>> - Definition of 'trivial code' does not depend on whether the method has been profiled or not; >>>> - Trivial code does only go level 0 to level 1; >>>> - Some refactoring. >>>> >>>> The change has been checked in mach5 for the 5 platforms (passed). >>>> >>>> Please review, >>>> /Evgeny Nikitin. > From luhenry at microsoft.com Mon Aug 3 18:52:34 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 18:52:34 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: > But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. > I will retest again with removed other changes. That looks like a mistake with me learning to use Mercurial, sorry about that. The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. From luhenry at microsoft.com Mon Aug 3 19:00:06 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Mon, 3 Aug 2020 19:00:06 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: I've updated [1] with the proper patch. 
[1] http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.01/8250902.patch From vladimir.kozlov at oracle.com Mon Aug 3 19:18:53 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 12:18:53 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: I reproduced crash with only MD5 changes on my local linux machine during fastdebug build. Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): if (long_state) { state = get_state_from_digestBase_object(digestBase_obj); } else { state = get_long_state_from_digestBase_object(digestBase_obj); } Vladimir K On 8/3/20 11:52 AM, Ludovic Henry wrote: >> But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. >> I will retest again with removed other changes. > > That looks like a mistake with me learning to use Mercurial, sorry about that. > > The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. > > From viv.desh at gmail.com Mon Aug 3 22:08:22 2020 From: viv.desh at gmail.com (Vivek Deshpande) Date: Mon, 3 Aug 2020 15:08:22 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hi Vladimir It seems that the algorithm can be optimized further using SSE/AVX instructions. I am not aware of any specific SSE/AVX implementation which leverages those instructions in the best possible way. Sandhya can chime in more on that. As far as I know, I came across this which points to MD5 SSE/AVX implementation. https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html Regards, Vivek On Mon, Aug 3, 2020 at 12:21 PM Vladimir Kozlov wrote: > I reproduced crash with only MD5 changes on my local linux machine during > fastdebug build. > > Next code in inline_digestBase_implCompressMB should be reversed > (get_long_*() should be called for long_state): > > if (long_state) { > state = get_state_from_digestBase_object(digestBase_obj); > } else { > state = get_long_state_from_digestBase_object(digestBase_obj); > } > > Vladimir K > > On 8/3/20 11:52 AM, Ludovic Henry wrote: > >> But it looks like it has more changes (windows_aarch64) then just MD5 > intrinsic. > >> I will retest again with removed other changes. > > > > That looks like a mistake with me learning to use Mercurial, sorry about > that. > > > > The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, > all the others are my mistake. > > > > > -- Thanks and Regards, Vivek Deshpande viv.desh at gmail.com From vladimir.kozlov at oracle.com Mon Aug 3 23:10:21 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 16:10:21 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: <0d97ffec-1e6e-65d3-d1c3-b39f72145c14@oracle.com> Thank you, Vivek, for pointer. This is interesting ,could be somehitng Intel's mlib may have. Vladimir K On 8/3/20 3:08 PM, Vivek Deshpande wrote: > Hi Vladimir > > It seems that the algorithm can be optimized further using SSE/AVX > instructions. 
I am not aware of any specific SSE/AVX implementation which > leverages those instructions in the best possible way. Sandhya can chime > in more on that. > As far as I know, I came across this which points to MD5 SSE/AVX > implementation. > https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html > > Regards, > Vivek > > On Mon, Aug 3, 2020 at 12:21 PM Vladimir Kozlov > wrote: > >> I reproduced crash with only MD5 changes on my local linux machine during >> fastdebug build. >> >> Next code in inline_digestBase_implCompressMB should be reversed >> (get_long_*() should be called for long_state): >> >> if (long_state) { >> state = get_state_from_digestBase_object(digestBase_obj); >> } else { >> state = get_long_state_from_digestBase_object(digestBase_obj); >> } >> >> Vladimir K >> >> On 8/3/20 11:52 AM, Ludovic Henry wrote: >>>> But it looks like it has more changes (windows_aarch64) then just MD5 >> intrinsic. >>>> I will retest again with removed other changes. >>> >>> That looks like a mistake with me learning to use Mercurial, sorry about >> that. >>> >>> The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, >> all the others are my mistake. >>> >>> >> > > From vladimir.kozlov at oracle.com Mon Aug 3 23:58:59 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2020 16:58:59 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hmm, with that code reversed I now have failure only on Windows: V [jvm.dll+0x43abb7] report_vm_error+0x117 (debug.cpp:264) V [jvm.dll+0x8a222e] LibraryCallKit::load_field_from_object+0x1ae (library_call.cpp:5732) V [jvm.dll+0x88c3ea] LibraryCallKit::get_state_from_digestBase_object+0x3a (library_call.cpp:6614) V [jvm.dll+0x8909d5] LibraryCallKit::inline_digestBase_implCompressMB+0x115 (library_call.cpp:6598) V [jvm.dll+0x8908b1] LibraryCallKit::inline_digestBase_implCompressMB+0x411 (library_call.cpp:6578) V [jvm.dll+0x8a5b2d] LibraryCallKit::try_to_inline+0x184d (library_call.cpp:836) The bug is in the same code as before - typreo due to renaming. So the code should be: if (long_state) { state = get_long_state_from_digestBase_object(obj); } else { state = get_state_from_digestBase_object(obj); } BTW, Ludovic, you need to add next change [1] to Graal's test to avoid its failure. Thanks, Vladimir K [1] src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java @@ -423,6 +423,11 @@ "java/math/BigInteger.shiftRightImplWorker([I[IIII)V"); } + if (isJDK16OrHigher()) { + add(toBeInvestigated, + "sun/security/provider/MD5.implCompress0([BI)V"); + } + if (!config.inlineNotify()) { add(ignore, "java/lang/Object.notify()V"); } @@ -593,6 +598,14 @@ return JavaVersionUtil.JAVA_SPEC >= 14; } + private static boolean isJDK15OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 15; + } + + private static boolean isJDK16OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 16; + } + public interface Refiner { void refine(CheckGraalIntrinsics checker); } On 8/3/20 12:18 PM, Vladimir Kozlov wrote: > I reproduced crash with only MD5 changes on my local linux machine during fastdebug build. > > Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): > > ?? if (long_state) { > ???? 
state = get_state_from_digestBase_object(digestBase_obj); > ?? } else { > ???? state = get_long_state_from_digestBase_object(digestBase_obj); > ?? } > > Vladimir K > > On 8/3/20 11:52 AM, Ludovic Henry wrote: >>> But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. >>> I will retest again with removed other changes. >> >> That looks like a mistake with me learning to use Mercurial, sorry about that. >> >> The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. >> From sandhya.viswanathan at intel.com Mon Aug 3 23:59:59 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 3 Aug 2020 23:59:59 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <0d97ffec-1e6e-65d3-d1c3-b39f72145c14@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <0d97ffec-1e6e-65d3-d1c3-b39f72145c14@oracle.com> Message-ID: The link that Vivek shared is for multi-buffer implementation where multiple MD5 hashes for different buffers is calculated at once using SIMD. What is needed here is the acceleration of single buffer hash. I think that is what Henry's patch is proposing. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Monday, August 03, 2020 4:10 PM To: Vivek Deshpande Cc: Ludovic Henry ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev ; Viswanathan, Sandhya Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64 Thank you, Vivek, for pointer. This is interesting ,could be somehitng Intel's mlib may have. Vladimir K On 8/3/20 3:08 PM, Vivek Deshpande wrote: > Hi Vladimir > > It seems that the algorithm can be optimized further using SSE/AVX > instructions. I am not aware of any specific SSE/AVX implementation > which leverages those instructions in the best possible way. Sandhya > can chime in more on that. > As far as I know, I came across this which points to MD5 SSE/AVX > implementation. > https://software.intel.com/content/www/us/en/develop/articles/intel-is > a-l-cryptographic-hashes-for-cloud-storage.html > > Regards, > Vivek > > On Mon, Aug 3, 2020 at 12:21 PM Vladimir Kozlov > > wrote: > >> I reproduced crash with only MD5 changes on my local linux machine >> during fastdebug build. >> >> Next code in inline_digestBase_implCompressMB should be reversed >> (get_long_*() should be called for long_state): >> >> if (long_state) { >> state = get_state_from_digestBase_object(digestBase_obj); >> } else { >> state = get_long_state_from_digestBase_object(digestBase_obj); >> } >> >> Vladimir K >> >> On 8/3/20 11:52 AM, Ludovic Henry wrote: >>>> But it looks like it has more changes (windows_aarch64) then just >>>> MD5 >> intrinsic. >>>> I will retest again with removed other changes. >>> >>> That looks like a mistake with me learning to use Mercurial, sorry >>> about >> that. >>> >>> The only patch you need is `8250902: Implement MD5 Intrinsics on >>> x86`, >> all the others are my mistake. >>> >>> >> > > From luhenry at microsoft.com Tue Aug 4 00:13:06 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 00:13:06 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: I've fixed it at [1]. I'm sending an update very soon as soon as I have the performance numbers you asked for, and the test suites results on the different platforms of interest. 
[1] http://cr.openjdk.java.net/~luhenry/8250902/webrev.02/ -----Original Message----- From: Vladimir Kozlov Sent: Monday, August 3, 2020 4:59 PM To: Ludovic Henry ; hotspot-compiler-dev at openjdk.java.net; Vivek Deshpande Cc: core-libs-dev Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64 Hmm, with that code reversed I now have failure only on Windows: V [jvm.dll+0x43abb7] report_vm_error+0x117 (debug.cpp:264) V [jvm.dll+0x8a222e] LibraryCallKit::load_field_from_object+0x1ae (library_call.cpp:5732) V [jvm.dll+0x88c3ea] LibraryCallKit::get_state_from_digestBase_object+0x3a (library_call.cpp:6614) V [jvm.dll+0x8909d5] LibraryCallKit::inline_digestBase_implCompressMB+0x115 (library_call.cpp:6598) V [jvm.dll+0x8908b1] LibraryCallKit::inline_digestBase_implCompressMB+0x411 (library_call.cpp:6578) V [jvm.dll+0x8a5b2d] LibraryCallKit::try_to_inline+0x184d (library_call.cpp:836) The bug is in the same code as before - typreo due to renaming. So the code should be: if (long_state) { state = get_long_state_from_digestBase_object(obj); } else { state = get_state_from_digestBase_object(obj); } BTW, Ludovic, you need to add next change [1] to Graal's test to avoid its failure. Thanks, Vladimir K [1] src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java @@ -423,6 +423,11 @@ "java/math/BigInteger.shiftRightImplWorker([I[IIII)V"); } + if (isJDK16OrHigher()) { + add(toBeInvestigated, + "sun/security/provider/MD5.implCompress0([BI)V"); + } + if (!config.inlineNotify()) { add(ignore, "java/lang/Object.notify()V"); } @@ -593,6 +598,14 @@ return JavaVersionUtil.JAVA_SPEC >= 14; } + private static boolean isJDK15OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 15; + } + + private static boolean isJDK16OrHigher() { + return JavaVersionUtil.JAVA_SPEC >= 16; + } + public interface Refiner { void refine(CheckGraalIntrinsics checker); } On 8/3/20 12:18 PM, Vladimir Kozlov wrote: > I reproduced crash with only MD5 changes on my local linux machine during fastdebug build. > > Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): > > ?? if (long_state) { > ???? state = get_state_from_digestBase_object(digestBase_obj); > ?? } else { > ???? state = get_long_state_from_digestBase_object(digestBase_obj); > ?? } > > Vladimir K > > On 8/3/20 11:52 AM, Ludovic Henry wrote: >>> But it looks like it has more changes (windows_aarch64) then just MD5 intrinsic. >>> I will retest again with removed other changes. >> >> That looks like a mistake with me learning to use Mercurial, sorry about that. >> >> The only patch you need is `8250902: Implement MD5 Intrinsics on x86`, all the others are my mistake. >> From luhenry at microsoft.com Tue Aug 4 04:07:49 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 04:07:49 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Updated webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.02 > Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): > > if (long_state) { > state = get_state_from_digestBase_object(digestBase_obj); > } else { > state = get_long_state_from_digestBase_object(digestBase_obj); > } Thanks for pointing that out. 
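The reason the swapped branch matters: the digests behind DigestBase keep their state in different layouts -- MD5, SHA-1 and SHA-256 use an int[] state array, while SHA-384/512 use a long[] -- so calling the wrong accessor breaks one of the two families once the intrinsic kicks in. A Java-level sanity check that exercises both layouts (a sketch only; the iteration count is just meant to get implCompress hot enough to be compiled) could be:

    import java.security.MessageDigest;
    import java.util.Arrays;

    public class DigestStateSanity {
        static void hashMany(String algo, byte[] input, int iters) throws Exception {
            MessageDigest md = MessageDigest.getInstance(algo);
            byte[] expected = md.digest(input);        // first result, typically interpreted
            for (int i = 0; i < iters; i++) {          // keep going until implCompress is compiled
                byte[] again = md.digest(input);
                if (!Arrays.equals(expected, again)) {
                    throw new AssertionError(algo + " result changed after " + i + " iterations");
                }
            }
        }

        public static void main(String[] args) throws Exception {
            byte[] input = new byte[1024];
            for (int i = 0; i < input.length; i++) {
                input[i] = (byte) i;
            }
            hashMany("MD5", input, 100_000);      // int[]  state -> get_state_from_digestBase_object
            hashMany("SHA-512", input, 100_000);  // long[] state -> get_long_state_from_digestBase_object
            System.out.println("MD5 and SHA-512 results stayed stable");
        }
    }

A divergence between the compiled and interpreted results for either algorithm shows up as the AssertionError above.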
I tested everything with `hotspot:tier1` and `jdk:tier1` in fastdebug on Windows-x86, Windows-x64 and Linux-x64. > It seems that the algorithm can be optimized further using SSE/AVX instructions. I am not aware of any specific SSE/AVX implementation which leverages those instructions in the best possible way. Sandhya can chime in more on that. I have done some research prior to implementing this intrinsic and the only pointers I could find to vectorized MD5 is on computing _multiple_ MD5 hashes in parallel but not a _single_ MD5 hash. Using vectors effectively parallelize the computation of many MD5 hash, but it does not accelerate the computation of a single MD5 hash. And looking at the algorithm, every step depends on the previous step's result, which make it particularly hard to parallelize/vectorize. > As far as I know, I came across this which points to MD5 SSE/AVX implementation. https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html That library points to computing many MD5 hashes in parallel. Quoting: "Intel? ISA-L uses a novel technique called multi-buffer hashing, which [...] compute several hashes at once within a single core." That is similar to what I found in researching how to vectorize MD5. I also did not find any reference of an ISA-level implementation of MD5, neither in x86 nor ARM. If you can point me to a document describing how to vectorize MD5, I would be more than happy to take a look and implement the algorithm. However, my understanding is that MD5 is not vectorizable by-design. > Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. I looked at these tests and they already cover MD5. I am not sure what's the best way to add tests here: 1. should I rename ` compiler/intrinsics/sha` to ` compiler/intrinsics/digest` and add the md5 tests there, 2. should I just add ` compiler/intrinsics/md5`, or 3. the name doesn't matter and I can just add it in ` compiler/intrinsics/sha`? > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. Fixed. > In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': > > * This code is free software; you can rrdistribute it and/or modify it > > Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header I updated the header, and added the license for the original code for the MD5 core algorithm. > Did you test it on 32-bit x86? I did run `hotspot:tier1` and `jdk:tier1` on Windows-x86, Windows-x64 and Linux-x64. > Would be interesting to see result of artificially switching off AVX and SSE: > '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. The results are below: -XX:-UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 3512.618 ? 9.384 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 450.037 ? 1.213 ops/ms MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.887 ? 0.057 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.485 ? 0.002 ops/ms -XX:+UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 4212.156 ? 7.781 ops/ ms => 19% speedup MessageDigests.digest md5 1024 DEFAULT thrpt 10 548.609 ? 1.374 ops/ ms => 22% speedup MessageDigests.digest md5 16384 DEFAULT thrpt 10 37.961 ? 
0.079 ops/ ms => 27% speedup MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.596 ? 0.006 ops/ ms => 23% speedup -XX:-UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 3462.769 ? 4.992 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 443.858 ? 0.576 ops/ms MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.723 ? 0.480 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.470 ? 0.001 ops/ms -XX:+UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 4237.219 ? 15.627 ops/ms => 22% speedup MessageDigests.digest md5 1024 DEFAULT thrpt 10 564.625 ? 1.510 ops/ms => 27% speedup MessageDigests.digest md5 16384 DEFAULT thrpt 10 38.004 ? 0.078 ops/ms => 28% speedup MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.597 ? 0.002 ops/ms => 27% speedup Thank you, Ludovic From luhenry at microsoft.com Tue Aug 4 04:23:11 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 04:23:11 +0000 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> , <116277BD-EA21-49AA-8DE1-DBC06ED43C43@oracle.com> Message-ID: Hello, A quick follow up on that change. Webrev: http://cr.openjdk.java.net/~luhenry/8248672/webrev.01/8248672.patch Thank you, Ludovic From viv.desh at gmail.com Tue Aug 4 04:39:54 2020 From: viv.desh at gmail.com (Vivek Deshpande) Date: Mon, 3 Aug 2020 21:39:54 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Thanks Ludovic Detailed explanation and Sandhya for clarification on the vectorization. Regards, Vivek On Mon, Aug 3, 2020 at 9:07 PM Ludovic Henry wrote: > Updated webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.02 > > > Next code in inline_digestBase_implCompressMB should be reversed > (get_long_*() should be called for long_state): > > > > if (long_state) { > > state = get_state_from_digestBase_object(digestBase_obj); > > } else { > > state = get_long_state_from_digestBase_object(digestBase_obj); > > } > > Thanks for pointing that out. I tested everything with `hotspot:tier1` and > `jdk:tier1` in fastdebug on Windows-x86, Windows-x64 and Linux-x64. > > > It seems that the algorithm can be optimized further using SSE/AVX > instructions. I am not aware of any specific SSE/AVX implementation which > leverages those instructions in the best possible way. Sandhya can chime in > more on that. > > I have done some research prior to implementing this intrinsic and the > only pointers I could find to vectorized MD5 is on computing _multiple_ MD5 > hashes in parallel but not a _single_ MD5 hash. Using vectors effectively > parallelize the computation of many MD5 hash, but it does not accelerate > the computation of a single MD5 hash. And looking at the algorithm, every > step depends on the previous step's result, which make it particularly hard > to parallelize/vectorize. > > > As far as I know, I came across this which points to MD5 SSE/AVX > implementation. 
> https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html > > That library points to computing many MD5 hashes in parallel. Quoting: > "Intel? ISA-L uses a novel technique called multi-buffer hashing, which > [...] compute several hashes at once within a single core." That is similar > to what I found in researching how to vectorize MD5. I also did not find > any reference of an ISA-level implementation of MD5, neither in x86 nor ARM. > > If you can point me to a document describing how to vectorize MD5, I would > be more than happy to take a look and implement the algorithm. However, my > understanding is that MD5 is not vectorizable by-design. > > > Add tests to verify intrinsic implementation. You can use > test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > I looked at these tests and they already cover MD5. I am not sure what's > the best way to add tests here: 1. should I rename ` > compiler/intrinsics/sha` to ` compiler/intrinsics/digest` and add the md5 > tests there, 2. should I just add ` compiler/intrinsics/md5`, or 3. the > name doesn't matter and I can just add it in ` compiler/intrinsics/sha`? > > > In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA > flag setting. > > Fixed. > > > In new file macroAssembler_x86_md5.cpp no need empty line after > copyright line. There is also typo 'rrdistribute': > > > > * This code is free software; you can rrdistribute it and/or modify it > > > > Our validate-headers check failed. See GPL header template: > ./make/templates/gpl-header > > I updated the header, and added the license for the original code for the > MD5 core algorithm. > > > Did you test it on 32-bit x86? > > I did run `hotspot:tier1` and `jdk:tier1` on Windows-x86, Windows-x64 and > Linux-x64. > > > Would be interesting to see result of artificially switching off AVX and > SSE: > > '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general > instructions are needed. > > The results are below: > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 3512.618 ? 9.384 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 450.037 ? 1.213 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 29.887 ? 0.057 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.485 ? 0.002 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 4212.156 ? 7.781 ops/ ms => 19% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 548.609 ? 1.374 ops/ ms => 22% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 37.961 ? 0.079 ops/ ms => 27% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.596 ? 0.006 ops/ ms => 23% speedup > > -XX:-UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 3462.769 ? 4.992 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 443.858 ? 0.576 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 29.723 ? 0.480 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.470 ? 0.001 ops/ms > > -XX:+UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt > Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 > 4237.219 ? 
15.627 ops/ms => 22% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 > 564.625 ? 1.510 ops/ms => 27% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 > 38.004 ? 0.078 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 > 0.597 ? 0.002 ops/ms => 27% speedup > > Thank you, > Ludovic > -- Thanks and Regards, Vivek Deshpande viv.desh at gmail.com From tobias.hartmann at oracle.com Tue Aug 4 06:25:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 4 Aug 2020 08:25:11 +0200 Subject: [16] RFR(M) 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp:173 In-Reply-To: References: Message-ID: <4a82bdf6-1f45-3335-cb5d-4aa92f682353@oracle.com> Hi Vladimir, nice cleanup, looks good to me. Best regards, Tobias On 31.07.20 04:54, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8250233/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8250233 > > Main issue was missing EnableJVMCI flag check when calling > JVMCICompiler::print_compilation_timers(). I addition to fixinf that I did next refactoring. > > The code which collects and print statistics per compiler was guarded by #if INCLUDE_JVMCI but not > by any JVMCI flags. > As result it is default code used by all JIT compilers since JVMCI was added in JDK 9. > > I decided to make it not JVMCI specific and used it on all platforms. > > I also added statistic per compilation tier which provides more useful information than combined > date for C1. > > Removed in CompileBroker::print_times() code which calculate total values based on data in > compiler's statistic. Such data is already collected in CompileBroker's static fields. > > Added checks for 0 values in print statements to avoid division by 0 (whioch produced NaN values for > doubles). > > Don't print empty data in JVMCICompiler::print_compilation_timers() but print total compilation time > in JVMCICompiler::print_timers(). > > Tested hs-tier1-3. > > Thanks, > Vladimir > > Beginning of CITime new output: > > Individual compiler times (for compiled methods only) > ------------------------------------------------ > > ? C1 {speed: 49626.710 bytes/s; standard:? 0.037 s, 1842 bytes, 35 methods; osr:? 0.000 s, 0 bytes, > 0 methods; nmethods_size: 51096 bytes; nmethods_code_size: 30880 bytes} > ? C2 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 0.000 s, 0 bytes, 0 > methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} > > Individual compilation Tier times (for compiled methods only) > ------------------------------------------------ > > ? Tier1 {speed: 21162.963 bytes/s; standard:? 0.002 s, 47 bytes, 10 methods; osr:? 0.000 s, 0 bytes, > 0 methods; nmethods_size: 3160 bytes; nmethods_code_size: 1504 bytes} > ? Tier2 {speed:? 0.000 bytes/s; standard:? 0.000 s, 0 bytes, 0 methods; osr:? 0.000 s, 0 bytes, 0 > methods; nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} > ? Tier3 {speed: 51438.195 bytes/s; standard:? 0.035 s, 1795 bytes, 25 methods; osr:? 0.000 s, 0 > bytes, 0 methods; nmethods_size: 47936 bytes; nmethods_code_size: 29376 bytes} > ? Tier4 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 0.000 s, 0 bytes, 0 > methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} > > Accumulated compiler times > ---------------------------------------------------------- > ? Total compilation time?? :?? 0.038 s > ??? Standard compilation?? :?? 0.038 s, Average : 0.001 s > ??? Bailed out compilation :?? 0.000 s, Average : 0.000 s > ??? 
On stack replacement?? :?? 0.000 s, Average : 0.000 s > ??? Invalidated??????????? :?? 0.000 s, Average : 0.000 s From xxinliu at amazon.com Tue Aug 4 06:39:52 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 4 Aug 2020 06:39:52 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com>, <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> Message-ID: <1596523192072.15354@amazon.com> hi, Nils, Tobias would like to keep the parser behavior consistency. I think it means that the hotspot need to suppress the warning if the intrinsic_id doesn't exists in compiler directive. eg. -XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. What do you think about it? Here is the latest webrev: http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ thanks, --lx ________________________________________ From: Tobias Hartmann Sent: Friday, July 24, 2020 2:52 AM To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Liu, On 23.07.20 18:02, Liu, Xin wrote: > That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. > It just exacts information from CompileCommand as many as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. > > I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. > This behavior is from Nils comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. Okay, thanks for the explanation! I would prefer consistency in error handling of compiler directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide. Best regards, Tobias From boris.ulasevich at bell-sw.com Tue Aug 4 16:56:58 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 4 Aug 2020 19:56:58 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: Message-ID: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> Hi, gently reminding of this review request. thanks, Boris On 24.07.2020 13:48, Boris Ulasevich wrote: > Hi, > > Please review the change to C2 and AArch64 which reduces constructs > like? "(v1 & 0xFF) | ((v2 & 0xFF) << 8)" into two Bitfield Insert > instructions. > > http://bugs.openjdk.java.net/browse/JDK-8249893 > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00 > > The change in common code was made to enable Node::is_AndL method. > The method in the rule predicate is required to find out if we are within > the straight or reversed rule (ADLC adds rule with swapped parameters > for commutative operands). > > Tested with JTREG and generated [1] tests. 
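For context, the construct mentioned above is ordinary mask-and-shift packing at the Java level; a tiny example of the kind of method the new AArch64 rules are meant to match (method and variable names are only for illustration) is:

    public class PackBytes {
        // (v1 & 0xFF) | ((v2 & 0xFF) << 8): packs two byte-sized values into
        // the low 16 bits of an int. The change discussed above aims to emit
        // this shape as bitfield insert instructions on AArch64 instead of
        // separate AND/shift/OR instructions.
        static int packLow16(int v1, int v2) {
            return (v1 & 0xFF) | ((v2 & 0xFF) << 8);
        }

        public static void main(String[] args) {
            int acc = 0;
            // Warm the method up so it gets JIT-compiled before spot-checking a value.
            for (int i = 0; i < 1_000_000; i++) {
                acc += packLow16(i, i >> 3);
            }
            System.out.println(Integer.toHexString(packLow16(0xAB, 0xCD))); // prints cdab
            System.out.println(acc); // keep the loop result alive
        }
    }
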
> > thanks, > Boris > > [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00/Gen.java From boris.ulasevich at bell-sw.com Tue Aug 4 16:58:18 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 4 Aug 2020 19:58:18 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: Hi Vladimir, Yes, thank you. I've re-written this to improve readability by changing the logic slightly. http://cr.openjdk.java.net/~bulasevich/8248445/webrev.03 thanks, Boris On 03.08.2020 20:25, Vladimir Kozlov wrote: > Hi Boris, > > The current code is hard to read. Can you rearrange it to have clear > code flow (and correct spaces for if ())? Including F and D checks. To > something like: > > ? if (tzero == TypeF::ZERO) { > ??? if (sub->Opcode() == Op_SubF && > ??????? sub->in(2) == x && > ??????? phase->type(sub->in(1)) == tzero)) { > ????? x = new AbsFNode(x); > ????? if (flip) { > ??????? x = new SubFNode(sub->in(1), phase->transform(x)); > ????? } > ??? } > ? } else if > > Thanks, > Vladimir > > On 8/2/20 1:54 PM, Boris Ulasevich wrote: >> Hi all, >> >> Please review a simple change to C2 to fix a regression: AbsI/AbsL >> nodes are used without checking that the platform supports them >> (for now it is the issue for ARM32 and 32-bit x86 platforms). >> >> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 >> http://bugs.openjdk.java.net/browse/JDK-8248445 >> >> thanks, >> Boris >> From vladimir.kozlov at oracle.com Tue Aug 4 17:19:56 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:19:56 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Hi Ludovic, On 8/3/20 9:07 PM, Ludovic Henry wrote: > Updated webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.02 > >> Next code in inline_digestBase_implCompressMB should be reversed (get_long_*() should be called for long_state): >> >> if (long_state) { >> state = get_state_from_digestBase_object(digestBase_obj); >> } else { >> state = get_long_state_from_digestBase_object(digestBase_obj); >> } > > Thanks for pointing that out. I tested everything with `hotspot:tier1` and `jdk:tier1` in fastdebug on Windows-x86, Windows-x64 and Linux-x64. Code in library_call.cpp is good now. > >> It seems that the algorithm can be optimized further using SSE/AVX instructions. I am not aware of any specific SSE/AVX implementation which leverages those instructions in the best possible way. Sandhya can chime in more on that. > > I have done some research prior to implementing this intrinsic and the only pointers I could find to vectorized MD5 is on computing _multiple_ MD5 hashes in parallel but not a _single_ MD5 hash. Using vectors effectively parallelize the computation of many MD5 hash, but it does not accelerate the computation of a single MD5 hash. And looking at the algorithm, every step depends on the previous step's result, which make it particularly hard to parallelize/vectorize. > >> As far as I know, I came across this which points to MD5 SSE/AVX implementation. https://software.intel.com/content/www/us/en/develop/articles/intel-isa-l-cryptographic-hashes-for-cloud-storage.html > > That library points to computing many MD5 hashes in parallel. Quoting: "Intel? ISA-L uses a novel technique called multi-buffer hashing, which [...] compute several hashes at once within a single core." 
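The multi-buffer idea quoted above is easy to picture from Java: independent buffers can be hashed concurrently because each digest depends only on its own input, while the rounds inside a single MD5 computation each consume the previous round's output and cannot be split up. A purely illustrative sketch of the "many independent hashes" case (unrelated to the ISA-L library itself):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.ArrayList;
    import java.util.List;

    public class ManyHashes {
        static byte[] md5(byte[] data) {
            try {
                return MessageDigest.getInstance("MD5").digest(data);
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError(e); // MD5 is required to be available
            }
        }

        public static void main(String[] args) {
            // 64 independent inputs: the hashes have no data dependency on each
            // other, so they can be computed concurrently (the multi-buffer idea).
            List<byte[]> buffers = new ArrayList<>();
            for (int i = 0; i < 64; i++) {
                byte[] b = new byte[1024];
                b[0] = (byte) i;
                buffers.add(b);
            }
            byte[][] digests = buffers.parallelStream()
                                      .map(ManyHashes::md5)
                                      .toArray(byte[][]::new);
            System.out.println("computed " + digests.length + " independent digests");
        }
    }
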
That is similar to what I found in researching how to vectorize MD5. I also did not find any reference of an ISA-level implementation of MD5, neither in x86 nor ARM. > > If you can point me to a document describing how to vectorize MD5, I would be more than happy to take a look and implement the algorithm. However, my understanding is that MD5 is not vectorizable by-design. I would leave this investigation to Intel's Java group. They are expert in this area! For now, lets put current implementation into JDK. > >> Add tests to verify intrinsic implementation. You can use test/hotspot/jtreg/compiler/intrinsics/sha/ as examples. > > I looked at these tests and they already cover MD5. I am not sure what's the best way to add tests here: 1. should I rename ` compiler/intrinsics/sha` to ` compiler/intrinsics/digest` and add the md5 tests there, 2. should I just add ` compiler/intrinsics/md5`, or 3. the name doesn't matter and I can just add it in ` compiler/intrinsics/sha`? 3. Just add MD5 tests into existing SHA directory. Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I understand. > >> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. > > Fixed. It is not moved in webrev.02 > >> In new file macroAssembler_x86_md5.cpp no need empty line after copyright line. There is also typo 'rrdistribute': >> >> * This code is free software; you can rrdistribute it and/or modify it >> >> Our validate-headers check failed. See GPL header template: ./make/templates/gpl-header > > I updated the header, and added the license for the original code for the MD5 core algorithm. You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. Thank you for adding license for original code. > >> Did you test it on 32-bit x86? > > I did run `hotspot:tier1` and `jdk:tier1` on Windows-x86, Windows-x64 and Linux-x64. > >> Would be interesting to see result of artificially switching off AVX and SSE: >> '-XX:UseSSE=0 -XX:UseAVX=0'. It will make sure that only general instructions are needed. > > The results are below: Very good. Thank you for testing it. Regards, Vladimir > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 3512.618 ? 9.384 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 450.037 ? 1.213 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.887 ? 0.057 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.485 ? 0.002 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 4212.156 ? 7.781 ops/ ms => 19% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 548.609 ? 1.374 ops/ ms => 22% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 37.961 ? 0.079 ops/ ms => 27% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.596 ? 0.006 ops/ ms => 23% speedup > > -XX:-UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 3462.769 ? 4.992 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 443.858 ? 0.576 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 10 29.723 ? 0.480 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.470 ? 
0.001 ops/ms > > -XX:+UseMD5Intrinsics -XX:UseSSE=0 -XX:UseAVX=0 > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 4237.219 ? 15.627 ops/ms => 22% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 564.625 ? 1.510 ops/ms => 27% speedup > MessageDigests.digest md5 16384 DEFAULT thrpt 10 38.004 ? 0.078 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.597 ? 0.002 ops/ms => 27% speedup > > Thank you, > Ludovic > From vladimir.kozlov at oracle.com Tue Aug 4 17:29:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:29:13 -0700 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> <116277BD-EA21-49AA-8DE1-DBC06ED43C43@oracle.com> Message-ID: Good. Vladimir K On 8/3/20 9:23 PM, Ludovic Henry wrote: > Hello, > > A quick follow up on that change. > > Webrev: http://cr.openjdk.java.net/~luhenry/8248672/webrev.01/8248672.patch > > Thank you, > Ludovic > From vladimir.kozlov at oracle.com Tue Aug 4 17:33:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:33:34 -0700 Subject: [16] RFR(M) 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp:173 In-Reply-To: <4a82bdf6-1f45-3335-cb5d-4aa92f682353@oracle.com> References: <4a82bdf6-1f45-3335-cb5d-4aa92f682353@oracle.com> Message-ID: <5ae75bab-7331-9984-63f2-0107902fd7e8@oracle.com> Thank you, Tobias Vladimir K On 8/3/20 11:25 PM, Tobias Hartmann wrote: > Hi Vladimir, > > nice cleanup, looks good to me. > > Best regards, > Tobias > > On 31.07.20 04:54, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8250233/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8250233 >> >> Main issue was missing EnableJVMCI flag check when calling >> JVMCICompiler::print_compilation_timers(). I addition to fixinf that I did next refactoring. >> >> The code which collects and print statistics per compiler was guarded by #if INCLUDE_JVMCI but not >> by any JVMCI flags. >> As result it is default code used by all JIT compilers since JVMCI was added in JDK 9. >> >> I decided to make it not JVMCI specific and used it on all platforms. >> >> I also added statistic per compilation tier which provides more useful information than combined >> date for C1. >> >> Removed in CompileBroker::print_times() code which calculate total values based on data in >> compiler's statistic. Such data is already collected in CompileBroker's static fields. >> >> Added checks for 0 values in print statements to avoid division by 0 (whioch produced NaN values for >> doubles). >> >> Don't print empty data in JVMCICompiler::print_compilation_timers() but print total compilation time >> in JVMCICompiler::print_timers(). >> >> Tested hs-tier1-3. >> >> Thanks, >> Vladimir >> >> Beginning of CITime new output: >> >> Individual compiler times (for compiled methods only) >> ------------------------------------------------ >> >> ? C1 {speed: 49626.710 bytes/s; standard:? 0.037 s, 1842 bytes, 35 methods; osr:? 0.000 s, 0 bytes, >> 0 methods; nmethods_size: 51096 bytes; nmethods_code_size: 30880 bytes} >> ? C2 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 
0.000 s, 0 bytes, 0 >> methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} >> >> Individual compilation Tier times (for compiled methods only) >> ------------------------------------------------ >> >> ? Tier1 {speed: 21162.963 bytes/s; standard:? 0.002 s, 47 bytes, 10 methods; osr:? 0.000 s, 0 bytes, >> 0 methods; nmethods_size: 3160 bytes; nmethods_code_size: 1504 bytes} >> ? Tier2 {speed:? 0.000 bytes/s; standard:? 0.000 s, 0 bytes, 0 methods; osr:? 0.000 s, 0 bytes, 0 >> methods; nmethods_size: 0 bytes; nmethods_code_size: 0 bytes} >> ? Tier3 {speed: 51438.195 bytes/s; standard:? 0.035 s, 1795 bytes, 25 methods; osr:? 0.000 s, 0 >> bytes, 0 methods; nmethods_size: 47936 bytes; nmethods_code_size: 29376 bytes} >> ? Tier4 {speed: 1451.769 bytes/s; standard:? 0.001 s, 2 bytes, 2 methods; osr:? 0.000 s, 0 bytes, 0 >> methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes} >> >> Accumulated compiler times >> ---------------------------------------------------------- >> ? Total compilation time?? :?? 0.038 s >> ??? Standard compilation?? :?? 0.038 s, Average : 0.001 s >> ??? Bailed out compilation :?? 0.000 s, Average : 0.000 s >> ??? On stack replacement?? :?? 0.000 s, Average : 0.000 s >> ??? Invalidated??????????? :?? 0.000 s, Average : 0.000 s From vladimir.kozlov at oracle.com Tue Aug 4 17:55:14 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 10:55:14 -0700 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: Hi Boris, Good change. Add year to test's copyright line. Regards, Vladimir K On 8/4/20 9:58 AM, Boris Ulasevich wrote: > Hi Vladimir, > > Yes, thank you. I've re-written this to improve readability by changing the logic slightly. > http://cr.openjdk.java.net/~bulasevich/8248445/webrev.03 > > thanks, > Boris > > On 03.08.2020 20:25, Vladimir Kozlov wrote: >> Hi Boris, >> >> The current code is hard to read. Can you rearrange it to have clear code flow (and correct spaces for if ())? >> Including F and D checks. To something like: >> >> ? if (tzero == TypeF::ZERO) { >> ??? if (sub->Opcode() == Op_SubF && >> ??????? sub->in(2) == x && >> ??????? phase->type(sub->in(1)) == tzero)) { >> ????? x = new AbsFNode(x); >> ????? if (flip) { >> ??????? x = new SubFNode(sub->in(1), phase->transform(x)); >> ????? } >> ??? } >> ? } else if >> >> Thanks, >> Vladimir >> >> On 8/2/20 1:54 PM, Boris Ulasevich wrote: >>> Hi all, >>> >>> Please review a simple change to C2 to fix a regression: AbsI/AbsL >>> nodes are used without checking that the platform supports them >>> (for now it is the issue for ARM32 and 32-bit x86 platforms). >>> >>> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 >>> http://bugs.openjdk.java.net/browse/JDK-8248445 >>> >>> thanks, >>> Boris >>> > From vladimir.kozlov at oracle.com Tue Aug 4 19:52:29 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 12:52:29 -0700 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: Hi Vladimir, Looks good. I have only few small questions. compile.cpp: what is next comment about? + // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. print_method(): NodeClassNames[] should be available in product. Node::Name() method is not, but we can move it to product. 
But I am fine to do that later. Why VectorSupport.java does not have copyright header? Thanks, Vladimir K On 7/28/20 3:29 PM, Vladimir Ivanov wrote: > Hi, > > Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! > > Here are the latest changes for Vector API support in HotSpot shared code: > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 > > Incremental changes (diff against webrev.00): > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 > > I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor > cleanups / simple bug fixes. > > Detailed summary: > ? - rebased to jdk/jdk tip; > ? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > ? - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); > ? - got rid of x86-specific changes in shared code; > ? - fix for 8244867 [1]; > ? - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics > ? - numerous minor cleanups > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 > ??? http://jbs.oracle.com/browse/JDK-8244867 > ??? 8244867: 2 vector api tests crash with assert(is_reference_type(basic_type())) failed: wrong type > Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. > > On 04.04.2020 02:12, Vladimir Ivanov wrote: >> Hi, >> >> Following up on review requests of API [0] and Java implementation [1] for Vector API (JEP 338 [2]), here's a request >> for review of general HotSpot changes (in shared code) required for supporting the API: >> >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >> >> (First of all, to set proper expectations: since the JEP is still in Candidate state, the intention is to initiate >> preliminary round(s) of review to inform the community and gather feedback before sending out final/official RFRs once >> the JEP is Targeted to a release.) >> >> Vector API (being developed in Project Panama [3]) relies on JVM support to utilize optimal vector hardware >> instructions at runtime. It interacts with JVM through intrinsics (declared in jdk.internal.vm.vector.VectorSupport >> [4]) which expose vector operations support in C2 JIT-compiler. >> >> As Paul wrote earlier: "A vector intrinsic is an internal low-level vector operation. The last argument to the >> intrinsic is fall back behavior in Java, implementing the scalar operation over the number of elements held by the >> vector.? Thus, If the intrinsic is not supported in C2 for the other arguments then the Java implementation is >> executed (the Java implementation is always executed when running in the interpreter or for C1)." >> >> The rest of JVM support is about aggressively optimizing vector boxes to minimize (ideally eliminate) the overhead of >> boxing for vector values. >> It's a stop-the-gap solution for vector box elimination problem until inline classes arrive. Vector classes are >> value-based and in the longer term will be migrated to inline classes once the support becomes available. >> >> Vector API talk from JVMLS'18 [5] contains brief overview of JVM implementation and some details. >> >> Complete implementation resides in vector-unstable branch of panama/dev repository [6]. 
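The "fall back behavior in Java as the last argument" point above can be pictured with a heavily simplified stand-in; the method name and signature below are invented for illustration and are not the actual jdk.internal.vm.vector.VectorSupport API:

    import java.util.Arrays;
    import java.util.function.BinaryOperator;

    public class IntrinsicFallbackSketch {
        // Stand-in for an intrinsified entry point: when the JIT supports the
        // argument shape it can replace the call with vector instructions; the
        // interpreter, C1, or an unsupported shape simply runs the fallback
        // passed in as the last argument. (This sketch has no intrinsic, so it
        // always takes the fallback path.)
        static int[] binaryOp(int[] a, int[] b, BinaryOperator<int[]> javaFallback) {
            return javaFallback.apply(a, b);
        }

        public static void main(String[] args) {
            int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
            int[] b = {8, 7, 6, 5, 4, 3, 2, 1};
            int[] sum = binaryOp(a, b, (x, y) -> {
                // Scalar fallback: the same operation applied lane by lane.
                int[] r = new int[x.length];
                for (int i = 0; i < x.length; i++) {
                    r[i] = x[i] + y[i];
                }
                return r;
            });
            System.out.println(Arrays.toString(sum)); // [9, 9, 9, 9, 9, 9, 9, 9]
        }
    }
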
>> >> Now to gory details (the patch is split in multiple "sub-webrevs"): >> >> =========================================================== >> >> (1) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >> >> Ideal vector nodes for new operations introduced by Vector API. >> >> (Platform-specific back end support will be posted for review separately). >> >> =========================================================== >> >> (2) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >> >> JVM Java interface (VectorSupport) and intrinsic support in C2. >> >> Vector instances are initially represented as VectorBox macro nodes and "unboxing" is represented by VectorUnbox node. >> It simplifies vector box elimination analysis and the nodes are expanded later right before EA pass. >> >> Vectors have 2-level on-heap representation: for the vector value primitive array is used as a backing storage and it >> is encapsulated in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a int[8] instance which is used >> to store vector value). >> >> Unless VectorBox node goes away, it needs to be expanded into an allocation eventually, but it is a pure node and >> doesn't have any JVM state associated with it. The problem is solved by keeping JVM state separately in a >> VectorBoxAllocate node associated with VectorBox node and use it during expansion. >> >> Also, to simplify vector box elimination, inlining of vector reboxing calls (VectorSupport::maybeRebox) is delayed >> until the analysis is over. >> >> =========================================================== >> >> (3) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >> >> Vector box elimination analysis implementation. (Brief overview: slides #36-42 [5].) >> >> The main part is devoted to scalarization across safepoints and rematerialization support during deoptimization. In >> C2-generated code vector operations work with raw vector values which live in registers or spilled on the stack and it >> allows to avoid boxing/unboxing when a vector value is alive across a safepoint. As with other values, there's just a >> location of the vector value at the safepoint and vector type information recorded in the relevant nmethod metadata >> and all the heavy-lifting happens only when rematerialization takes place. >> >> The analysis preserves object identity invariants except during aggressive reboxing (guarded by >> -XX:+EnableAggressiveReboxing). >> >> (Aggressive reboxing is crucial for cases when vectors "escape": it allocates a fresh instance at every escape point >> thus enabling original instance to go away.) >> >> =========================================================== >> >> (4) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >> >> HotSpot changes for jdk.incubator.vector module. Vector support is makred experimental and turned off by default. JEP >> 338 proposes the API to be released as an incubator module, so a user has to specify "--add-module >> jdk.incubator.vector" on the command line to be able to use it. >> When user does that, JVM automatically enables Vector API support. >> It improves usability (user doesn't need to separately "open" the API and enable JVM support) while minimizing risks >> of destabilitzation from new code when the API is not used. >> >> >> That's it! Will be happy to answer any questions. 
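The 2-level on-heap representation described above -- a typed wrapper whose only payload is a primitive backing array -- reduced to its bare bones (a sketch of the layout idea only; the real Int256Vector class looks nothing like this):

    // Bare-bones sketch: a typed wrapper object whose only payload is the
    // primitive array backing the vector's lanes.
    final class Int256Sketch {
        static final int LANES = 8;      // 256 bits of 32-bit lanes
        private final int[] lanes;       // backing storage for the vector value

        Int256Sketch(int[] lanes) {
            if (lanes.length != LANES) throw new IllegalArgumentException();
            this.lanes = lanes.clone();  // value-based: payload never escapes
        }

        Int256Sketch add(Int256Sketch other) {
            int[] r = new int[LANES];
            for (int i = 0; i < LANES; i++) {
                r[i] = lanes[i] + other.lanes[i];
            }
            return new Int256Sketch(r);  // a fresh box per operation -- the very
                                         // thing box elimination tries to remove
        }

        int lane(int i) { return lanes[i]; }
    }
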
>> >> And thanks in advance for any feedback! >> >> Best regards, >> Vladimir Ivanov >> >> [0] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >> >> [2] https://openjdk.java.net/jeps/338 >> >> [3] https://openjdk.java.net/projects/panama/ >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >> >> >> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >> >> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >> >> ???? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b vector-unstable From vladimir.kozlov at oracle.com Tue Aug 4 19:59:52 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 12:59:52 -0700 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: x86 changes seems fine. Thanks, Vladimir K On 7/29/20 11:19 AM, Viswanathan, Sandhya wrote: > Hi, > > Likewise, the corresponding x86 backend changes since first review are also only minor cleanups and simple bug fixes: > > X86: > Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.01/ > Incremental: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00-webrev.01/ > > Summary: > - rebased to jdk/jdk tip; > - backend changes related to removal of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - vector insert bug fix > - some minor cleanups > > Older webrev links for your reference: > X86b backend: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00/ > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Ivanov > Sent: Tuesday, July 28, 2020 3:30 PM > To: hotspot-dev ; hotspot compiler > Cc: Viswanathan, Sandhya ; panama-dev > Subject: Re: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes > > Hi, > > Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! > > Here are the latest changes for Vector API support in HotSpot shared code: > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 > > Incremental changes (diff against webrev.00): > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 > > I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor cleanups / simple bug fixes. > > Detailed summary: > - rebased to jdk/jdk tip; > - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); > - got rid of x86-specific changes in shared code; > - fix for 8244867 [1]; > - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics > - numerous minor cleanups > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 > http://jbs.oracle.com/browse/JDK-8244867 > 8244867: 2 vector api tests crash with > assert(is_reference_type(basic_type())) failed: wrong type > Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. 
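From the user's point of view the incubator module shows up roughly like this (a sketch against the incubating jdk.incubator.vector API as proposed; compile and run with --add-modules jdk.incubator.vector so the JVM support described in (4) is enabled):

    // javac --add-modules jdk.incubator.vector VectorAddExample.java
    // java  --add-modules jdk.incubator.vector VectorAddExample
    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorSpecies;

    public class VectorAddExample {
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

        // c[i] = a[i] + b[i], processed SPECIES.length() lanes at a time,
        // with a scalar tail loop for the remainder.
        static void add(int[] a, int[] b, int[] c) {
            int i = 0;
            int upper = SPECIES.loopBound(a.length);
            for (; i < upper; i += SPECIES.length()) {
                IntVector va = IntVector.fromArray(SPECIES, a, i);
                IntVector vb = IntVector.fromArray(SPECIES, b, i);
                va.add(vb).intoArray(c, i);
            }
            for (; i < a.length; i++) {
                c[i] = a[i] + b[i];
            }
        }

        public static void main(String[] args) {
            int[] a = new int[1000], b = new int[1000], c = new int[1000];
            for (int i = 0; i < a.length; i++) { a[i] = i; b[i] = 2 * i; }
            add(a, b, c);
            System.out.println(c[999]);  // 2997
        }
    }
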
> > On 04.04.2020 02:12, Vladimir Ivanov wrote: >> Hi, >> >> Following up on review requests of API [0] and Java implementation [1] >> for Vector API (JEP 338 [2]), here's a request for review of general >> HotSpot changes (in shared code) required for supporting the API: >> >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/all.00-03/ >> >> >> (First of all, to set proper expectations: since the JEP is still in >> Candidate state, the intention is to initiate preliminary round(s) of >> review to inform the community and gather feedback before sending out >> final/official RFRs once the JEP is Targeted to a release.) >> >> Vector API (being developed in Project Panama [3]) relies on JVM >> support to utilize optimal vector hardware instructions at runtime. It >> interacts with JVM through intrinsics (declared in >> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >> operations support in C2 JIT-compiler. >> >> As Paul wrote earlier: "A vector intrinsic is an internal low-level >> vector operation. The last argument to the intrinsic is fall back >> behavior in Java, implementing the scalar operation over the number of >> elements held by the vector.? Thus, If the intrinsic is not supported >> in >> C2 for the other arguments then the Java implementation is executed >> (the Java implementation is always executed when running in the >> interpreter or for C1)." >> >> The rest of JVM support is about aggressively optimizing vector boxes >> to minimize (ideally eliminate) the overhead of boxing for vector values. >> It's a stop-the-gap solution for vector box elimination problem until >> inline classes arrive. Vector classes are value-based and in the >> longer term will be migrated to inline classes once the support becomes available. >> >> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >> implementation and some details. >> >> Complete implementation resides in vector-unstable branch of >> panama/dev repository [6]. >> >> Now to gory details (the patch is split in multiple "sub-webrevs"): >> >> =========================================================== >> >> (1) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/00.backend.shared/ >> >> >> Ideal vector nodes for new operations introduced by Vector API. >> >> (Platform-specific back end support will be posted for review separately). >> >> =========================================================== >> >> (2) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/01.intrinsics/ >> >> >> JVM Java interface (VectorSupport) and intrinsic support in C2. >> >> Vector instances are initially represented as VectorBox macro nodes >> and "unboxing" is represented by VectorUnbox node. It simplifies >> vector box elimination analysis and the nodes are expanded later right before EA pass. >> >> Vectors have 2-level on-heap representation: for the vector value >> primitive array is used as a backing storage and it is encapsulated in >> a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a >> int[8] instance which is used to store vector value). >> >> Unless VectorBox node goes away, it needs to be expanded into an >> allocation eventually, but it is a pure node and doesn't have any JVM >> state associated with it. The problem is solved by keeping JVM state >> separately in a VectorBoxAllocate node associated with VectorBox node >> and use it during expansion. 
>> >> Also, to simplify vector box elimination, inlining of vector reboxing >> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >> >> =========================================================== >> >> (3) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/02.vbox_elimination/ >> >> >> Vector box elimination analysis implementation. (Brief overview: >> slides >> #36-42 [5].) >> >> The main part is devoted to scalarization across safepoints and >> rematerialization support during deoptimization. In C2-generated code >> vector operations work with raw vector values which live in registers >> or spilled on the stack and it allows to avoid boxing/unboxing when a >> vector value is alive across a safepoint. As with other values, >> there's just a location of the vector value at the safepoint and >> vector type information recorded in the relevant nmethod metadata and >> all the heavy-lifting happens only when rematerialization takes place. >> >> The analysis preserves object identity invariants except during >> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >> >> (Aggressive reboxing is crucial for cases when vectors "escape": it >> allocates a fresh instance at every escape point thus enabling >> original instance to go away.) >> >> =========================================================== >> >> (4) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/03.module.hotspot/ >> >> >> HotSpot changes for jdk.incubator.vector module. Vector support is >> makred experimental and turned off by default. JEP 338 proposes the >> API to be released as an incubator module, so a user has to specify >> "--add-module jdk.incubator.vector" on the command line to be able to >> use it. >> When user does that, JVM automatically enables Vector API support. >> It improves usability (user doesn't need to separately "open" the API >> and enable JVM support) while minimizing risks of destabilitzation >> from new code when the API is not used. >> >> >> That's it! Will be happy to answer any questions. >> >> And thanks in advance for any feedback! >> >> Best regards, >> Vladimir Ivanov >> >> [0] >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/06534 >> 5.html >> >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228. >> html >> >> [2] https://openjdk.java.net/jeps/338 >> >> [3] https://openjdk.java.net/projects/panama/ >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar >> ed/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm >> /vector/VectorSupport.java.html >> >> >> [5] >> http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >> >> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >> >> ??? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b >> vector-unstable From luhenry at microsoft.com Tue Aug 4 20:21:02 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 4 Aug 2020 20:21:02 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 Testing: everything passes with hotspot:tier1 and jdk:tier1 in fastdebug on Linux-x64. > I would leave this investigation to Intel's Java group. They are expert in this area! 
Ok, we'll reach out to Intel on our end as well to figure out whether they have any specific guidance on that. > 3. Just add MD5 tests into existing SHA directory. Done. I've done some small renames (TestSHA -> TestDigest, SHAOptionsBase -> DigestOptionsBase), modified some of the SHA-specific code for non-SHA cases (GenericTestCaseFor*.java), and added MD5-specific tests. > Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I understand. I extended the existing tests to cover MD5 on the same level as SHA, and I made sure that all tests are still passing. >> >>> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. >> >> Fixed. > > It is not moved in webrev.02 Fixed. > You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. Fixed. From vladimir.kozlov at oracle.com Tue Aug 4 22:03:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 15:03:38 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> Message-ID: <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> Good. I will run Hotspot and JDK testing and let you know results. Regards, Vladimir K On 8/4/20 1:21 PM, Ludovic Henry wrote: > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > Testing: everything passes with hotspot:tier1 and jdk:tier1 in fastdebug on Linux-x64. > >> I would leave this investigation to Intel's Java group. They are expert in this area! > > Ok, we'll reach out to Intel on our end as well to figure out whether they have any specific guidance on that. > >> 3. Just add MD5 tests into existing SHA directory. > > Done. I've done some small renames (TestSHA -> TestDigest, SHAOptionsBase -> DigestOptionsBase), modified some of the SHA-specific code for non-SHA cases (GenericTestCaseFor*.java), and added MD5-specific tests. > >> Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I understand. > > I extended the existing tests to cover MD5 on the same level as SHA, and I made sure that all tests are still passing. > >>> >>>> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. >>> >>> Fixed. >> >> It is not moved in webrev.02 > > Fixed. > >> You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. > > Fixed. > > From igor.ignatyev at oracle.com Tue Aug 4 23:58:48 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 16:58:48 -0700 Subject: RFR(S) : 8251126 : nsk.share.GoldChecker should read golden file from ${test.src} Message-ID: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ > 37 lines changed: 7 ins; 20 del; 10 mod; Hi all, could you please review this patch? from JBS: > as of now, nsk.share.GoldChecker reads golden files from the current directory, which makes it necessary to copy golden files from ${test.src} before the execution of the tests which use GoldChecker. after this patch, FileInstaller actions will become redundant in 103 of :vmTestbase_vm_compiler tests and will be removed by 8251127. 
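The underlying change is just resolving the golden file against the directory jtreg publishes through the test.src system property instead of the current working directory. Schematically (an illustration of the pattern, not the actual nsk.share.GoldChecker code; the file name is made up):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class GoldenFileLookup {
        // Resolve a golden file against the directory jtreg exposes via the
        // "test.src" system property, falling back to the working directory
        // when the property is absent (e.g. when run outside jtreg).
        static Path goldenFile(String name) {
            String testSrc = System.getProperty("test.src", ".");
            return Paths.get(testSrc).resolve(name);
        }

        public static void main(String[] args) throws Exception {
            Path gold = goldenFile("example.gold");
            if (Files.exists(gold)) {
                System.out.print(Files.readString(gold));
            } else {
                System.out.println("no golden file next to the test sources: " + gold);
            }
        }
    }

With the lookup done against test.src, the FileInstaller copy step mentioned above becomes unnecessary.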
JBS: https://bugs.openjdk.java.net/browse/JDK-8251126 webrev: http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ testing: :vmTestbase_vm_compiler tests 8251127: https://bugs.openjdk.java.net/browse/JDK-8251127 Thanks, -- Igor From sandhya.viswanathan at intel.com Wed Aug 5 00:16:44 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 5 Aug 2020 00:16:44 +0000 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: Thanks a lot for the review. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Tuesday, August 04, 2020 1:00 PM To: Viswanathan, Sandhya ; Vladimir Ivanov ; hotspot-dev ; hotspot compiler Cc: panama-dev Subject: Re: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes x86 changes seems fine. Thanks, Vladimir K On 7/29/20 11:19 AM, Viswanathan, Sandhya wrote: > Hi, > > Likewise, the corresponding x86 backend changes since first review are also only minor cleanups and simple bug fixes: > > X86: > Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.01/ > Incremental: > http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00 > -webrev.01/ > > Summary: > - rebased to jdk/jdk tip; > - backend changes related to removal of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - vector insert bug fix > - some minor cleanups > > Older webrev links for your reference: > X86b backend: > http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00 > / > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Ivanov > Sent: Tuesday, July 28, 2020 3:30 PM > To: hotspot-dev ; hotspot compiler > > Cc: Viswanathan, Sandhya ; panama-dev > > Subject: Re: RFR (XXL): 8223347: Integration of Vector API > (Incubator): General HotSpot changes > > Hi, > > Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! > > Here are the latest changes for Vector API support in HotSpot shared code: > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar > ed/webrev.01 > > Incremental changes (diff against webrev.00): > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar > ed/webrev.01_00 > > I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor cleanups / simple bug fixes. > > Detailed summary: > - rebased to jdk/jdk tip; > - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); > - got rid of x86-specific changes in shared code; > - fix for 8244867 [1]; > - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics > - numerous minor cleanups > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 > http://jbs.oracle.com/browse/JDK-8244867 > 8244867: 2 vector api tests crash with > assert(is_reference_type(basic_type())) failed: wrong type > Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. 
> > On 04.04.2020 02:12, Vladimir Ivanov wrote: >> Hi, >> >> Following up on review requests of API [0] and Java implementation >> [1] for Vector API (JEP 338 [2]), here's a request for review of >> general HotSpot changes (in shared code) required for supporting the API: >> >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/all.00-03/ >> >> >> (First of all, to set proper expectations: since the JEP is still in >> Candidate state, the intention is to initiate preliminary round(s) of >> review to inform the community and gather feedback before sending out >> final/official RFRs once the JEP is Targeted to a release.) >> >> Vector API (being developed in Project Panama [3]) relies on JVM >> support to utilize optimal vector hardware instructions at runtime. >> It interacts with JVM through intrinsics (declared in >> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >> operations support in C2 JIT-compiler. >> >> As Paul wrote earlier: "A vector intrinsic is an internal low-level >> vector operation. The last argument to the intrinsic is fall back >> behavior in Java, implementing the scalar operation over the number >> of elements held by the vector.? Thus, If the intrinsic is not >> supported in >> C2 for the other arguments then the Java implementation is executed >> (the Java implementation is always executed when running in the >> interpreter or for C1)." >> >> The rest of JVM support is about aggressively optimizing vector boxes >> to minimize (ideally eliminate) the overhead of boxing for vector values. >> It's a stop-the-gap solution for vector box elimination problem until >> inline classes arrive. Vector classes are value-based and in the >> longer term will be migrated to inline classes once the support becomes available. >> >> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >> implementation and some details. >> >> Complete implementation resides in vector-unstable branch of >> panama/dev repository [6]. >> >> Now to gory details (the patch is split in multiple "sub-webrevs"): >> >> =========================================================== >> >> (1) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/00.backend.shared/ >> >> >> Ideal vector nodes for new operations introduced by Vector API. >> >> (Platform-specific back end support will be posted for review separately). >> >> =========================================================== >> >> (2) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/01.intrinsics/ >> >> >> JVM Java interface (VectorSupport) and intrinsic support in C2. >> >> Vector instances are initially represented as VectorBox macro nodes >> and "unboxing" is represented by VectorUnbox node. It simplifies >> vector box elimination analysis and the nodes are expanded later right before EA pass. >> >> Vectors have 2-level on-heap representation: for the vector value >> primitive array is used as a backing storage and it is encapsulated >> in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains >> a int[8] instance which is used to store vector value). >> >> Unless VectorBox node goes away, it needs to be expanded into an >> allocation eventually, but it is a pure node and doesn't have any JVM >> state associated with it. The problem is solved by keeping JVM state >> separately in a VectorBoxAllocate node associated with VectorBox node >> and use it during expansion. 
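(for illustration only -- a simplified sketch, not the real jdk.incubator.vector sources -- the "2-level" on-heap representation described above is essentially a typed wrapper class around a primitive array:)

    // simplified sketch of the 2-level on-heap shape: a typed wrapper
    // holding a primitive array as the backing storage for the vector value
    final class Int256Vector {
        static final int LENGTH = 8;   // 256 bits of 32-bit int lanes
        private final int[] vec;       // backing storage

        Int256Vector(int[] v) { this.vec = v.clone(); }
        int lane(int i) { return vec[i]; }
    }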
>> >> Also, to simplify vector box elimination, inlining of vector reboxing >> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >> >> =========================================================== >> >> (3) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/02.vbox_elimination/ >> >> >> Vector box elimination analysis implementation. (Brief overview: >> slides >> #36-42 [5].) >> >> The main part is devoted to scalarization across safepoints and >> rematerialization support during deoptimization. In C2-generated code >> vector operations work with raw vector values which live in registers >> or spilled on the stack and it allows to avoid boxing/unboxing when a >> vector value is alive across a safepoint. As with other values, >> there's just a location of the vector value at the safepoint and >> vector type information recorded in the relevant nmethod metadata and >> all the heavy-lifting happens only when rematerialization takes place. >> >> The analysis preserves object identity invariants except during >> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >> >> (Aggressive reboxing is crucial for cases when vectors "escape": it >> allocates a fresh instance at every escape point thus enabling >> original instance to go away.) >> >> =========================================================== >> >> (4) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/03.module.hotspot/ >> >> >> HotSpot changes for jdk.incubator.vector module. Vector support is >> makred experimental and turned off by default. JEP 338 proposes the >> API to be released as an incubator module, so a user has to specify >> "--add-module jdk.incubator.vector" on the command line to be able to >> use it. >> When user does that, JVM automatically enables Vector API support. >> It improves usability (user doesn't need to separately "open" the API >> and enable JVM support) while minimizing risks of destabilitzation >> from new code when the API is not used. >> >> >> That's it! Will be happy to answer any questions. >> >> And thanks in advance for any feedback! >> >> Best regards, >> Vladimir Ivanov >> >> [0] >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/0653 >> 4 >> 5.html >> >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228. >> html >> >> [2] https://openjdk.java.net/jeps/338 >> >> [3] https://openjdk.java.net/projects/panama/ >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.sha >> r >> ed/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/v >> m >> /vector/VectorSupport.java.html >> >> >> [5] >> http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >> >> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >> >> ??? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b >> vector-unstable From igor.ignatyev at oracle.com Wed Aug 5 00:22:09 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 17:22:09 -0700 Subject: RFR(T) : 8251128 : remove vmTestbase/vm/compiler/jbe/combine Message-ID: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 > 25 lines changed: 0 ins; 25 del; 0 mod; Hi all, could you please review the patch which removes test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory (or rather the only file it contained -- README)? 
> % hg rm test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine > removing test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README > % hg st > R test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README from JBS: > test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory doesn't any tests and should be removed. JBS: https://bugs.openjdk.java.net/browse/JDK-8251128 webrev: http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 Thanks, -- Igor From vladimir.kozlov at oracle.com Wed Aug 5 00:34:52 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 17:34:52 -0700 Subject: RFR(T) : 8251128 : remove vmTestbase/vm/compiler/jbe/combine In-Reply-To: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> References: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> Message-ID: <88d72d5b-329b-0b7d-04a0-2b2f2032b952@oracle.com> Looks good but where original tests were moved? Which RFE did that? Thanks, Vladimir On 8/4/20 5:22 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 >> 25 lines changed: 0 ins; 25 del; 0 mod; > > Hi all, > > could you please review the patch which removes test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory (or rather the only file it contained -- README)? >> % hg rm test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine >> removing test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README >> % hg st >> R test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README > > from JBS: >> test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory doesn't any tests and should be removed. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8251128 > webrev: http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 > > Thanks, > -- Igor > From vladimir.kozlov at oracle.com Wed Aug 5 01:33:09 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 18:33:09 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> Message-ID: <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Hi Ludovic, Tests are mostly clean so far except: new 3 MD5 tests failed on aarch64 because UseMD5Intrinsics flag is 'true' incorrectly: bool UseMD5Intrinsics = true {diagnostic} {command line} compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnUnsupportedCPU.java I think you need to set flag to false (to overwrite setting on command line) in vm_version_*.cpp files on all other CPUs until they have implementation: http://hg.openjdk.java.net/jdk/jdk/file/5bda40c115c1/src/hotspot/cpu/ppc/vm_version_ppc.cpp#l278 Also I forgot to ask to update copyright year in files you touched. Thanks, Vladimir K On 8/4/20 3:03 PM, Vladimir Kozlov wrote: > Good. > > I will run Hotspot and JDK testing and let you know results. > > Regards, > Vladimir K > > On 8/4/20 1:21 PM, Ludovic Henry wrote: >> Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 >> Testing: everything passes with hotspot:tier1 and jdk:tier1 in fastdebug on Linux-x64. >> >>> I would leave this investigation to Intel's Java group. They are expert in this area! >> >> Ok, we'll reach out to Intel on our end as well to figure out whether they have any specific guidance on that. >> >>> 3. 
Just add MD5 tests into existing SHA directory. >> >> Done. I've done some small renames (TestSHA -> TestDigest, SHAOptionsBase -> DigestOptionsBase), modified some of the >> SHA-specific code for non-SHA cases (GenericTestCaseFor*.java), and added MD5-specific tests. >> >>> Note, compiler/intrinsics/sha testing is done in tier2. I ran it and it passed but it does not test MD5 a lot as I >>> understand. >> >> I extended the existing tests to cover MD5 on the same level as SHA, and I made sure that all tests are still passing. >> >>>> >>>>> In vm_version_x86.cpp move UseMD5Intrinsics flag setting near UseSHA flag setting. >>>> >>>> Fixed. >>> >>> It is not moved in webrev.02 >> >> Fixed. >> >>> You don't need to use Oracle copyright line. Using original Microsoft's copyright line is fine since you are author. >> >> Fixed. >> From luhenry at microsoft.com Wed Aug 5 02:09:08 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Wed, 5 Aug 2020 02:09:08 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Message-ID: Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > I think you need to set flag to false (to overwrite setting on command line) in vm_version_*.cpp files on all other CPUs until they have implementation: Fixed. > Also I forgot to ask to update copyright year in files you touched. Fixed. From igor.ignatyev at oracle.com Wed Aug 5 02:25:57 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 19:25:57 -0700 Subject: RFR(T) : 8251128 : remove vmTestbase/vm/compiler/jbe/combine In-Reply-To: <88d72d5b-329b-0b7d-04a0-2b2f2032b952@oracle.com> References: <09FAF175-7A12-40DE-8E2E-B825A6E103F2@oracle.com> <88d72d5b-329b-0b7d-04a0-2b2f2032b952@oracle.com> Message-ID: Hi Vladimir, thanks for your review. as for the original tests, they haven't been co-located and hence not open-sourced due to different reasons. -- Igor > On Aug 4, 2020, at 5:34 PM, Vladimir Kozlov wrote: > > Looks good but where original tests were moved? Which RFE did that? > > Thanks, > Vladimir > > On 8/4/20 5:22 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 >>> 25 lines changed: 0 ins; 25 del; 0 mod; >> Hi all, >> could you please review the patch which removes test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory (or rather the only file it contained -- README)? >>> % hg rm test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine >>> removing test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README >>> % hg st >>> R test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine/README >> from JBS: >>> test/hotspot/jtreg/vmTestbase/vm/compiler/jbe/combine directory doesn't contain any tests and should be removed.
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8251128 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251128/webrev.00 >> Thanks, >> -- Igor From david.holmes at oracle.com Wed Aug 5 02:29:26 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 5 Aug 2020 12:29:26 +1000 Subject: RFR(S) : 8251126 : nsk.share.GoldChecker should read golden file from ${test.src} In-Reply-To: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> References: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> Message-ID: <41261a03-cd46-3d48-839b-d934a9fb92bb@oracle.com> Hi Igor, This seems fine. The code cleanup looks good too. Thanks, David On 5/08/2020 9:58 am, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ >> 37 lines changed: 7 ins; 20 del; 10 mod; > > Hi all, > > could you please review this patch? > from JBS: >> as of now, nsk.share.GoldChecker reads golden files from the current directory, which makes it necessary to copy golden files from ${test.src} before the execution of the tests which use GoldChecker. > > after this patch, FileInstaller actions will become redundant in 103 of :vmTestbase_vm_compiler tests and will be removed by 8251127. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8251126 > webrev: http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ > testing: :vmTestbase_vm_compiler tests > > 8251127: https://bugs.openjdk.java.net/browse/JDK-8251127 > > Thanks, > -- Igor > > From vladimir.kozlov at oracle.com Wed Aug 5 04:36:07 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 4 Aug 2020 21:36:07 -0700 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Message-ID: It looks like you created the webrev based on an old state of jdk or its branch. Your vm_version_aarch64.cpp change did not apply to the latest jdk source. There are also a few copyright year updates for files which already have it. I fixed it and started a new round of testing. Vladimir K On 8/4/20 7:09 PM, Ludovic Henry wrote: > Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > >> I think you need to set flag to false (to overwrite setting on command line) in vm_version_*.cpp files on all other CPUs until they have implementation: > > Fixed. > > > Also I forgot to ask to update copyright year in files you touched. > > Fixed. > From igor.ignatyev at oracle.com Wed Aug 5 05:18:56 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 4 Aug 2020 22:18:56 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests Message-ID: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 > 498 lines changed: 0 ins; 132 del; 366 mod; Hi all, could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests? from JBS: > main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. > > this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and executing them "directly" by jtreg will lead to failures due to a few extra frames from jtreg.
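for illustration, the tag change in an individual test looks roughly like this (hypothetical test name, not an actual file from the webrev):

    package jit.t.t001;

    /*
     * hypothetical example of the change described above:
     *
     *   before:  @run driver ExecDriver --java jit.t.t001.t001
     *   after:   @run main/othervm jit.t.t001.t001
     *
     * which requires the main test class to be public so that jtreg
     * can invoke it directly:
     */
    public class t001 {
        public static void main(String[] args) {
            // test body unchanged; the only source change is the added 'public' modifier
        }
    }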
JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 Thanks, -- Igor From boris.ulasevich at bell-sw.com Wed Aug 5 08:31:11 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Wed, 5 Aug 2020 11:31:11 +0300 Subject: RFR(S) 8248445: Use of AbsI/AbsL nodes should be limited to supported platforms In-Reply-To: References: Message-ID: <2f447be9-0349-240d-c511-5a6e06f662af@bell-sw.com> Hi Vladimir, Ok. Thank you for review! regards, Boris On 04.08.2020 20:55, Vladimir Kozlov wrote: > Hi Boris, > > Good change. > > Add year to test's copyright line. > > Regards, > Vladimir K > > On 8/4/20 9:58 AM, Boris Ulasevich wrote: >> Hi Vladimir, >> >> Yes, thank you. I've re-written this to improve readability by >> changing the logic slightly. >> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.03 >> >> thanks, >> Boris >> >> On 03.08.2020 20:25, Vladimir Kozlov wrote: >>> Hi Boris, >>> >>> The current code is hard to read. Can you rearrange it to have clear >>> code flow (and correct spaces for if ())? Including F and D checks. >>> To something like: >>> >>> ? if (tzero == TypeF::ZERO) { >>> ??? if (sub->Opcode() == Op_SubF && >>> ??????? sub->in(2) == x && >>> ??????? phase->type(sub->in(1)) == tzero)) { >>> ????? x = new AbsFNode(x); >>> ????? if (flip) { >>> ??????? x = new SubFNode(sub->in(1), phase->transform(x)); >>> ????? } >>> ??? } >>> ? } else if >>> >>> Thanks, >>> Vladimir >>> >>> On 8/2/20 1:54 PM, Boris Ulasevich wrote: >>>> Hi all, >>>> >>>> Please review a simple change to C2 to fix a regression: AbsI/AbsL >>>> nodes are used without checking that the platform supports them >>>> (for now it is the issue for ARM32 and 32-bit x86 platforms). >>>> >>>> http://cr.openjdk.java.net/~bulasevich/8248445/webrev.02 >>>> http://bugs.openjdk.java.net/browse/JDK-8248445 >>>> >>>> thanks, >>>> Boris >>>> >> From aph at redhat.com Wed Aug 5 09:08:39 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 5 Aug 2020 10:08:39 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> Message-ID: <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> Hi, On 8/4/20 5:56 PM, Boris Ulasevich wrote: > gently reminding of this review request. >> http://bugs.openjdk.java.net/browse/JDK-8249893 >> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00 I'm leaning towards no. The code is too complicated and difficult to maintain for such a small gain. As I suggested to Eric Liu when discussing 8248870, we should try canonicalizing this stuff early in compilation then matching with BFM rules. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Wed Aug 5 16:16:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 09:16:43 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests In-Reply-To: References: Message-ID: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> Hi Igor We were always told to use '/othervm' only if additional VM flags are specified. Also based on RFE description making classes public will allow to execute them directly by jtreg. So why you use '/othervm'? 
Also, since you are cleaning up all these tests, can you use a uniform format for the class declaration line? I see different variations:

public class DivTest{

public class Filtering
{

public class Robert
{

public class collapse {

I think the last example is what we usually use. Code indentation is also all over the place. I understand that fixing many files by hand would be hard. But if you can do something (with a script) which will not take a lot of your time, we should do that. Thanks, Vladimir K On 8/4/20 10:18 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >> 498 lines changed: 0 ins; 132 del; 366 mod; > Hi all, > could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests? > from JBS: >> main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. >> >> this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. > the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and executing them "directly" by jtreg will lead to failures due to a few extra frames from jtreg. > JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 > testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 > webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 > Thanks, > -- Igor > From igor.ignatyev at oracle.com Wed Aug 5 16:44:50 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 5 Aug 2020 09:44:50 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests In-Reply-To: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> References: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> Message-ID: > On Aug 5, 2020, at 9:16 AM, Vladimir Kozlov wrote: > > Hi Igor > > We were always told to use '/othervm' only if additional VM flags are specified. > Also based on RFE description making classes public will allow to execute them directly by jtreg. > > So why you use '/othervm'? /othervm tests are run directly by jtreg, as opposed to tests which use ExecDriver, where jtreg runs ExecDriver and ExecDriver spawns a new process to run a test. I used /othervm to keep the tests closer to their current state, i.e. each test is run in a separate clean JVM. Removing /othervm would require a bit more detailed analysis of whether these tests really require a clean state, which I'd prefer to do separately. > >> Also, since you are cleaning up all these tests, can you use a uniform format for the class declaration line? >> I see different variations: >> >> public class DivTest{ >> >> public class Filtering >> { >> >> public class Robert >> { >> >> public class collapse { >> >> I think the last example is what we usually use. >> >> Code indentation is also all over the place. >> >> I understand that fixing many files by hand would be hard. But if you can do something (with a script) which will not take a lot of your time, we should do that. > I guess I can run some auto-formatter on all these files, yet to make it cleaner I'd prefer to do it by another RFE. -- Igor > >> Thanks, >> Vladimir K >> On 8/4/20 10:18 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >>>> 498 lines changed: 0 ins; 132 del; 366 mod; >>> Hi all, >>> could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests?
>> from JBS: >>> main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. >>> >>> this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. >> the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and execution them "directly" by jtreg will lead to failures due to a few extra frames from jtreg. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 >> testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >> Thanks, >> -- Igor From vladimir.kozlov at oracle.com Wed Aug 5 16:48:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 09:48:43 -0700 Subject: RFR(M) : 8251132 : make main classes public in vmTestbase/jit tests In-Reply-To: References: <3baaf7fd-374f-5005-20e9-619a61710ee3@oracle.com> Message-ID: On 8/5/20 9:44 AM, Igor Ignatyev wrote: > > >> On Aug 5, 2020, at 9:16 AM, Vladimir Kozlov wrote: >> >> Hi Igor >> >> We were always told to use '/othervm' only if additional VM flags are specified. >> Also based on RFE description making classes public will allow to execute them directly by jtreg. >> >> So why you use '/othervm'? > > /othervm tests are run directly by jtreg, as opposed to tests which use ExecDriver, where jtreg runs ExecDriver and ExecDriver spawns a new process to run a test. > > I used to /othervm to keep the tests closer to their current state, i.e. each test is run in a separate clean JVM. removing /othervm would require a bit more detail analysis on wherever these tests really require clean state, I'd prefer to do separately. Okay. Add this to RFE comment to avoid confusion later. > >> >> Also since you cleaning all this test can you use uniform format for class declaration line. >> I see different variations: >> >> public class DivTest{ >> >> public class Filtering >> { >> >> public class Robert >> { >> >> public class collapse { >> >> I think the last example is what we usually use. >> >> Code indent is also all over places. >> >> I understand that fixing many files by hand would be hard. But we you can do something (with script) which will not take a lot of your time we should do that. > I guess I can run some auto-formater on all these files, yet to make it cleaner I'd prefer to do it by another RFE. Agree. Thanks, Vladimir K > > -- Igor > >> >> Thanks, >> Vladimir K >> >> On 8/4/20 10:18 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >>>> 498 lines changed: 0 ins; 132 del; 366 mod; >>> Hi all, >>> could you please review the patch which adds public modifier to "main" test classes in vmTestbase/jit tests? >>> from JBS: >>>> main test classes of several vmTestbase/jit tests are package-private, as a result, jtreg can't run them directly and we had to use `driver ExecDriver --java ` to run them. >>>> >>>> this RFE is to make these classes public and to replace ExecDriver w/ regular `main/othervm` where appropriate. >>> the patch also removes ExecDriver and @build in all but 6 tests. those 6 (vmTestbase/jit/t/t108--t113) compare stack traces to the golden ones, and execution them "directly" by jtreg will lead to failures due to a few extra frames from jtreg. 
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8251132 >>> testing: :vmTestbase_vm_compiler on {linux,windows,macos}-x64 >>> webrev: http://cr.openjdk.java.net/~iignatyev//8251132/webrev.00 >>> Thanks, >>> -- Igor From leonid.mesnik at oracle.com Wed Aug 5 17:54:01 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 5 Aug 2020 10:54:01 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers Message-ID: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> Hi Could you please review the following fix, which disables testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed.

bug: https://bugs.openjdk.java.net/browse/JDK-8161684
diff:

diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java
--- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200
+++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700
@@ -380,6 +380,10 @@
             return "false";
         }

+        if (WB.getBooleanVMFlag("VerifyOops")) {
+            return "false";
+        }
+
         switch (GC.selected()) {
             case Serial:
             case Parallel:

Leonid

From vladimir.kozlov at oracle.com Wed Aug 5 18:22:01 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 11:22:01 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> References: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> Message-ID: Hi Leonid, Dean is working on 8209961 fix and it can be done 'soon'. How urgent are your changes? Can you wait a little? Thanks, Vladimir K On 8/5/20 10:54 AM, Leonid Mesnik wrote: > Hi > Could you please review the following fix, which disables testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed.
>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >> diff: >> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >> @@ -380,6 +380,10 @@ >> return "false"; >> } >> + if (WB.getBooleanVMFlag("VerifyOops")) { >> + return "false"; >> + } >> + >> switch (GC.selected()) { >> case Serial: >> case Parallel: >> Leonid From vladimir.kozlov at oracle.com Wed Aug 5 18:53:32 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 11:53:32 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <733129AD-775A-4042-8373-B78E3CDCE47D@oracle.com> References: <2D3C0414-4A37-4A02-BD2C-F3A221E4658C@oracle.com> <733129AD-775A-4042-8373-B78E3CDCE47D@oracle.com> Message-ID: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> Okay. Then you change is good. Thanks, Vladimir K On 8/5/20 11:35 AM, Leonid Mesnik wrote: > Hi > > I checked with Dean status of 8209961. He said that he run into some issue and it makes sense to disable VerifyOops for AOT now. > > Leonid > >> On Aug 5, 2020, at 11:22 AM, Vladimir Kozlov wrote: >> >> Hi Leonid, >> >> Dean is working on 8209961 fix and it can be done 'soon'. >> >> How urgent your changes? Can you wait a little? >> >> Thanks, >> Vladimir K >> >> On 8/5/20 10:54 AM, Leonid Mesnik wrote: >>> Hi >>> Could you please review following fix which disable testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed. >>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >>> diff: >>> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >>> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >>> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >>> @@ -380,6 +380,10 @@ >>> return "false"; >>> } >>> + if (WB.getBooleanVMFlag("VerifyOops")) { >>> + return "false"; >>> + } >>> + >>> switch (GC.selected()) { >>> case Serial: >>> case Parallel: >>> Leonid > From vladimir.x.ivanov at oracle.com Wed Aug 5 19:16:30 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 5 Aug 2020 22:16:30 +0300 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: Thanks for the review, Vladimir. > compile.cpp: what is next comment about? > > +? // FIXME for_igvn() is corrupted from here: new_worklist which is > set_for_ignv() was allocated on stack. It documents a bug in the preceding code which makes for_igvn() node list unusable beyond that point: 2098 if (!failing() && RenumberLiveNodes && live_nodes() + NodeLimitFudgeFactor < unique()) { 2099 Compile::TracePhase tp("", &timers[_t_renumberLive]); 2100 initial_gvn()->replace_with(&igvn); 2101 for_igvn()->clear(); 2102 Unique_Node_List new_worklist(C->comp_arena()); 2103 { 2104 ResourceMark rm; 2105 PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist); 2106 } 2107 set_for_igvn(&new_worklist); 2108 igvn = PhaseIterGVN(initial_gvn()); 2109 igvn.optimize(); 2110 } 2111 2112 // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. I'm fine with removing the commend and filing a bug instead. > print_method(): NodeClassNames[] should be available in product. > Node::Name() method is not, but we can move it to product. 
But I am fine > to do that later. Good point. I'll migrate print_method() to NodeClassNames[] for now. > Why VectorSupport.java does not have copyright header? Good catch! Will fix it and incorporate into the webrev in-place shortly. Best regards, Vladimir Ivanov > On 7/28/20 3:29 PM, Vladimir Ivanov wrote: >> Hi, >> >> Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and >> Ekaterina! >> >> Here are the latest changes for Vector API support in HotSpot shared >> code: >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 >> >> >> Incremental changes (diff against webrev.00): >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 >> >> >> I decided to post it here and not initiate a new round of reviews >> because the changes are mostly limited to minor cleanups / simple bug >> fixes. >> >> Detailed summary: >> ?? - rebased to jdk/jdk tip; >> ?? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; >> ?? - restore lazy cleanup logic during incremental inlining (see >> needs_cleanup in compile.cpp); >> ?? - got rid of x86-specific changes in shared code; >> ?? - fix for 8244867 [1]; >> ?? - fix Graal test failure: enumerate VectorSupport intrinsics in >> CheckGraalIntrinsics >> ?? - numerous minor cleanups >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 >> ???? http://jbs.oracle.com/browse/JDK-8244867 >> ???? 8244867: 2 vector api tests crash with >> assert(is_reference_type(basic_type())) failed: wrong type >> Summary: Adding safety checks to prevent intrinsification if class >> arguments of non-primitive types are uninitialized. >> >> On 04.04.2020 02:12, Vladimir Ivanov wrote: >>> Hi, >>> >>> Following up on review requests of API [0] and Java implementation >>> [1] for Vector API (JEP 338 [2]), here's a request for review of >>> general HotSpot changes (in shared code) required for supporting the >>> API: >>> >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >>> >>> >>> (First of all, to set proper expectations: since the JEP is still in >>> Candidate state, the intention is to initiate preliminary round(s) of >>> review to inform the community and gather feedback before sending out >>> final/official RFRs once the JEP is Targeted to a release.) >>> >>> Vector API (being developed in Project Panama [3]) relies on JVM >>> support to utilize optimal vector hardware instructions at runtime. >>> It interacts with JVM through intrinsics (declared in >>> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >>> operations support in C2 JIT-compiler. >>> >>> As Paul wrote earlier: "A vector intrinsic is an internal low-level >>> vector operation. The last argument to the intrinsic is fall back >>> behavior in Java, implementing the scalar operation over the number >>> of elements held by the vector.? Thus, If the intrinsic is not >>> supported in C2 for the other arguments then the Java implementation >>> is executed (the Java implementation is always executed when running >>> in the interpreter or for C1)." >>> >>> The rest of JVM support is about aggressively optimizing vector boxes >>> to minimize (ideally eliminate) the overhead of boxing for vector >>> values. >>> It's a stop-the-gap solution for vector box elimination problem until >>> inline classes arrive. 
Vector classes are value-based and in the >>> longer term will be migrated to inline classes once the support >>> becomes available. >>> >>> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >>> implementation and some details. >>> >>> Complete implementation resides in vector-unstable branch of >>> panama/dev repository [6]. >>> >>> Now to gory details (the patch is split in multiple "sub-webrevs"): >>> >>> =========================================================== >>> >>> (1) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >>> >>> >>> Ideal vector nodes for new operations introduced by Vector API. >>> >>> (Platform-specific back end support will be posted for review >>> separately). >>> >>> =========================================================== >>> >>> (2) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >>> >>> >>> JVM Java interface (VectorSupport) and intrinsic support in C2. >>> >>> Vector instances are initially represented as VectorBox macro nodes >>> and "unboxing" is represented by VectorUnbox node. It simplifies >>> vector box elimination analysis and the nodes are expanded later >>> right before EA pass. >>> >>> Vectors have 2-level on-heap representation: for the vector value >>> primitive array is used as a backing storage and it is encapsulated >>> in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains >>> a int[8] instance which is used to store vector value). >>> >>> Unless VectorBox node goes away, it needs to be expanded into an >>> allocation eventually, but it is a pure node and doesn't have any JVM >>> state associated with it. The problem is solved by keeping JVM state >>> separately in a VectorBoxAllocate node associated with VectorBox node >>> and use it during expansion. >>> >>> Also, to simplify vector box elimination, inlining of vector reboxing >>> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >>> >>> =========================================================== >>> >>> (3) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >>> >>> >>> Vector box elimination analysis implementation. (Brief overview: >>> slides #36-42 [5].) >>> >>> The main part is devoted to scalarization across safepoints and >>> rematerialization support during deoptimization. In C2-generated code >>> vector operations work with raw vector values which live in registers >>> or spilled on the stack and it allows to avoid boxing/unboxing when a >>> vector value is alive across a safepoint. As with other values, >>> there's just a location of the vector value at the safepoint and >>> vector type information recorded in the relevant nmethod metadata and >>> all the heavy-lifting happens only when rematerialization takes place. >>> >>> The analysis preserves object identity invariants except during >>> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >>> >>> (Aggressive reboxing is crucial for cases when vectors "escape": it >>> allocates a fresh instance at every escape point thus enabling >>> original instance to go away.) >>> >>> =========================================================== >>> >>> (4) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >>> >>> >>> HotSpot changes for jdk.incubator.vector module. Vector support is >>> makred experimental and turned off by default. 
JEP 338 proposes the >>> API to be released as an incubator module, so a user has to specify >>> "--add-module jdk.incubator.vector" on the command line to be able to >>> use it. >>> When user does that, JVM automatically enables Vector API support. >>> It improves usability (user doesn't need to separately "open" the API >>> and enable JVM support) while minimizing risks of destabilitzation >>> from new code when the API is not used. >>> >>> >>> That's it! Will be happy to answer any questions. >>> >>> And thanks in advance for any feedback! >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [0] >>> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >>> >>> >>> [1] >>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >>> >>> >>> [2] https://openjdk.java.net/jeps/338 >>> >>> [3] https://openjdk.java.net/projects/panama/ >>> >>> [4] >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >>> >>> >>> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >>> >>> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >>> >>> ???? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b >>> vector-unstable From vladimir.x.ivanov at oracle.com Wed Aug 5 19:17:00 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 5 Aug 2020 22:17:00 +0300 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: <9c538834-903b-5431-bb43-908b58a1b70a@oracle.com> References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> <9c538834-903b-5431-bb43-908b58a1b70a@oracle.com> Message-ID: <90f71dc2-8ff0-5956-d08d-0af28f59c7df@oracle.com> Thanks for the review, Coleen. Best regards, Vladimir Ivanov On 31.07.2020 22:38, coleen.phillimore at oracle.com wrote: > The runtime code still looks good to me. > Coleen > > On 7/28/20 6:29 PM, Vladimir Ivanov wrote: >> Hi, >> >> Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and >> Ekaterina! >> >> Here are the latest changes for Vector API support in HotSpot shared >> code: >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 >> >> >> Incremental changes (diff against webrev.00): >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 >> >> >> I decided to post it here and not initiate a new round of reviews >> because the changes are mostly limited to minor cleanups / simple bug >> fixes. >> >> Detailed summary: >> ? - rebased to jdk/jdk tip; >> ? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; >> ? - restore lazy cleanup logic during incremental inlining (see >> needs_cleanup in compile.cpp); >> ? - got rid of x86-specific changes in shared code; >> ? - fix for 8244867 [1]; >> ? - fix Graal test failure: enumerate VectorSupport intrinsics in >> CheckGraalIntrinsics >> ? - numerous minor cleanups >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 >> ??? http://jbs.oracle.com/browse/JDK-8244867 >> ??? 8244867: 2 vector api tests crash with >> assert(is_reference_type(basic_type())) failed: wrong type >> Summary: Adding safety checks to prevent intrinsification if class >> arguments of non-primitive types are uninitialized. 
>> >> On 04.04.2020 02:12, Vladimir Ivanov wrote: >>> Hi, >>> >>> Following up on review requests of API [0] and Java implementation >>> [1] for Vector API (JEP 338 [2]), here's a request for review of >>> general HotSpot changes (in shared code) required for supporting the >>> API: >>> >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >>> >>> >>> (First of all, to set proper expectations: since the JEP is still in >>> Candidate state, the intention is to initiate preliminary round(s) of >>> review to inform the community and gather feedback before sending out >>> final/official RFRs once the JEP is Targeted to a release.) >>> >>> Vector API (being developed in Project Panama [3]) relies on JVM >>> support to utilize optimal vector hardware instructions at runtime. >>> It interacts with JVM through intrinsics (declared in >>> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >>> operations support in C2 JIT-compiler. >>> >>> As Paul wrote earlier: "A vector intrinsic is an internal low-level >>> vector operation. The last argument to the intrinsic is fall back >>> behavior in Java, implementing the scalar operation over the number >>> of elements held by the vector.? Thus, If the intrinsic is not >>> supported in C2 for the other arguments then the Java implementation >>> is executed (the Java implementation is always executed when running >>> in the interpreter or for C1)." >>> >>> The rest of JVM support is about aggressively optimizing vector boxes >>> to minimize (ideally eliminate) the overhead of boxing for vector >>> values. >>> It's a stop-the-gap solution for vector box elimination problem until >>> inline classes arrive. Vector classes are value-based and in the >>> longer term will be migrated to inline classes once the support >>> becomes available. >>> >>> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >>> implementation and some details. >>> >>> Complete implementation resides in vector-unstable branch of >>> panama/dev repository [6]. >>> >>> Now to gory details (the patch is split in multiple "sub-webrevs"): >>> >>> =========================================================== >>> >>> (1) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >>> >>> >>> Ideal vector nodes for new operations introduced by Vector API. >>> >>> (Platform-specific back end support will be posted for review >>> separately). >>> >>> =========================================================== >>> >>> (2) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >>> >>> >>> JVM Java interface (VectorSupport) and intrinsic support in C2. >>> >>> Vector instances are initially represented as VectorBox macro nodes >>> and "unboxing" is represented by VectorUnbox node. It simplifies >>> vector box elimination analysis and the nodes are expanded later >>> right before EA pass. >>> >>> Vectors have 2-level on-heap representation: for the vector value >>> primitive array is used as a backing storage and it is encapsulated >>> in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains >>> a int[8] instance which is used to store vector value). >>> >>> Unless VectorBox node goes away, it needs to be expanded into an >>> allocation eventually, but it is a pure node and doesn't have any JVM >>> state associated with it. 
The problem is solved by keeping JVM state >>> separately in a VectorBoxAllocate node associated with VectorBox node >>> and use it during expansion. >>> >>> Also, to simplify vector box elimination, inlining of vector reboxing >>> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >>> >>> =========================================================== >>> >>> (3) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >>> >>> >>> Vector box elimination analysis implementation. (Brief overview: >>> slides #36-42 [5].) >>> >>> The main part is devoted to scalarization across safepoints and >>> rematerialization support during deoptimization. In C2-generated code >>> vector operations work with raw vector values which live in registers >>> or spilled on the stack and it allows to avoid boxing/unboxing when a >>> vector value is alive across a safepoint. As with other values, >>> there's just a location of the vector value at the safepoint and >>> vector type information recorded in the relevant nmethod metadata and >>> all the heavy-lifting happens only when rematerialization takes place. >>> >>> The analysis preserves object identity invariants except during >>> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). >>> >>> (Aggressive reboxing is crucial for cases when vectors "escape": it >>> allocates a fresh instance at every escape point thus enabling >>> original instance to go away.) >>> >>> =========================================================== >>> >>> (4) >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >>> >>> >>> HotSpot changes for jdk.incubator.vector module. Vector support is >>> makred experimental and turned off by default. JEP 338 proposes the >>> API to be released as an incubator module, so a user has to specify >>> "--add-module jdk.incubator.vector" on the command line to be able to >>> use it. >>> When user does that, JVM automatically enables Vector API support. >>> It improves usability (user doesn't need to separately "open" the API >>> and enable JVM support) while minimizing risks of destabilitzation >>> from new code when the API is not used. >>> >>> >>> That's it! Will be happy to answer any questions. >>> >>> And thanks in advance for any feedback! >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [0] >>> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >>> >>> >>> [1] >>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >>> >>> >>> [2] https://openjdk.java.net/jeps/338 >>> >>> [3] https://openjdk.java.net/projects/panama/ >>> >>> [4] >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >>> >>> >>> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >>> >>> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >>> >>> ???? 
$ hg clone http://hg.openjdk.java.net/panama/dev/ -b >>> vector-unstable > From vladimir.kozlov at oracle.com Wed Aug 5 19:18:39 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 5 Aug 2020 12:18:39 -0700 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: <6f25a6c6-c675-ee46-596d-f97a4119b95a@oracle.com> On 8/5/20 12:16 PM, Vladimir Ivanov wrote: > Thanks for the review, Vladimir. > >> compile.cpp: what is next comment about? >> >> +? // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. > > It documents a bug in the preceding code which makes for_igvn() node list unusable beyond that point: > > 2098?? if (!failing() && RenumberLiveNodes && live_nodes() + NodeLimitFudgeFactor < unique()) { > 2099???? Compile::TracePhase tp("", &timers[_t_renumberLive]); > 2100???? initial_gvn()->replace_with(&igvn); > 2101???? for_igvn()->clear(); > 2102???? Unique_Node_List new_worklist(C->comp_arena()); > 2103???? { > 2104?????? ResourceMark rm; > 2105?????? PhaseRenumberLive prl = PhaseRenumberLive(initial_gvn(), for_igvn(), &new_worklist); > 2106???? } > 2107???? set_for_igvn(&new_worklist); > 2108???? igvn = PhaseIterGVN(initial_gvn()); > 2109???? igvn.optimize(); > 2110?? } > 2111 > 2112?? // FIXME for_igvn() is corrupted from here: new_worklist which is set_for_ignv() was allocated on stack. > > I'm fine with removing the commend and filing a bug instead. Yes, please. > >> print_method(): NodeClassNames[] should be available in product. Node::Name() method is not, but we can move it to >> product. But I am fine to do that later. > > Good point. I'll migrate print_method() to NodeClassNames[] for now. Okay. > >> Why VectorSupport.java does not have copyright header? > > Good catch! Will fix it and incorporate into the webrev in-place shortly. Thanks, Vladimir K > > Best regards, > Vladimir Ivanov > >> On 7/28/20 3:29 PM, Vladimir Ivanov wrote: >>> Hi, >>> >>> Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! >>> >>> Here are the latest changes for Vector API support in HotSpot shared code: >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 >>> >>> Incremental changes (diff against webrev.00): >>> >>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 >>> >>> I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor >>> cleanups / simple bug fixes. >>> >>> Detailed summary: >>> ?? - rebased to jdk/jdk tip; >>> ?? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; >>> ?? - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); >>> ?? - got rid of x86-specific changes in shared code; >>> ?? - fix for 8244867 [1]; >>> ?? - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics >>> ?? - numerous minor cleanups >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 >>> ???? http://jbs.oracle.com/browse/JDK-8244867 >>> ???? 8244867: 2 vector api tests crash with assert(is_reference_type(basic_type())) failed: wrong type >>> Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. 
>>> >>> On 04.04.2020 02:12, Vladimir Ivanov wrote: >>>> Hi, >>>> >>>> Following up on review requests of API [0] and Java implementation [1] for Vector API (JEP 338 [2]), here's a >>>> request for review of general HotSpot changes (in shared code) required for supporting the API: >>>> >>>> >>>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >>>> >>>> (First of all, to set proper expectations: since the JEP is still in Candidate state, the intention is to initiate >>>> preliminary round(s) of review to inform the community and gather feedback before sending out final/official RFRs >>>> once the JEP is Targeted to a release.) >>>> >>>> Vector API (being developed in Project Panama [3]) relies on JVM support to utilize optimal vector hardware >>>> instructions at runtime. It interacts with JVM through intrinsics (declared in jdk.internal.vm.vector.VectorSupport >>>> [4]) which expose vector operations support in C2 JIT-compiler. >>>> >>>> As Paul wrote earlier: "A vector intrinsic is an internal low-level vector operation. The last argument to the >>>> intrinsic is fall back behavior in Java, implementing the scalar operation over the number of elements held by the >>>> vector.? Thus, If the intrinsic is not supported in C2 for the other arguments then the Java implementation is >>>> executed (the Java implementation is always executed when running in the interpreter or for C1)." >>>> >>>> The rest of JVM support is about aggressively optimizing vector boxes to minimize (ideally eliminate) the overhead >>>> of boxing for vector values. >>>> It's a stop-the-gap solution for vector box elimination problem until inline classes arrive. Vector classes are >>>> value-based and in the longer term will be migrated to inline classes once the support becomes available. >>>> >>>> Vector API talk from JVMLS'18 [5] contains brief overview of JVM implementation and some details. >>>> >>>> Complete implementation resides in vector-unstable branch of panama/dev repository [6]. >>>> >>>> Now to gory details (the patch is split in multiple "sub-webrevs"): >>>> >>>> =========================================================== >>>> >>>> (1) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >>>> >>>> Ideal vector nodes for new operations introduced by Vector API. >>>> >>>> (Platform-specific back end support will be posted for review separately). >>>> >>>> =========================================================== >>>> >>>> (2) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >>>> >>>> JVM Java interface (VectorSupport) and intrinsic support in C2. >>>> >>>> Vector instances are initially represented as VectorBox macro nodes and "unboxing" is represented by VectorUnbox >>>> node. It simplifies vector box elimination analysis and the nodes are expanded later right before EA pass. >>>> >>>> Vectors have 2-level on-heap representation: for the vector value primitive array is used as a backing storage and >>>> it is encapsulated in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a int[8] instance which is >>>> used to store vector value). >>>> >>>> Unless VectorBox node goes away, it needs to be expanded into an allocation eventually, but it is a pure node and >>>> doesn't have any JVM state associated with it. 
The problem is solved by keeping JVM state separately in a >>>> VectorBoxAllocate node associated with VectorBox node and use it during expansion. >>>> >>>> Also, to simplify vector box elimination, inlining of vector reboxing calls (VectorSupport::maybeRebox) is delayed >>>> until the analysis is over. >>>> >>>> =========================================================== >>>> >>>> (3) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >>>> >>>> Vector box elimination analysis implementation. (Brief overview: slides #36-42 [5].) >>>> >>>> The main part is devoted to scalarization across safepoints and rematerialization support during deoptimization. In >>>> C2-generated code vector operations work with raw vector values which live in registers or spilled on the stack and >>>> it allows to avoid boxing/unboxing when a vector value is alive across a safepoint. As with other values, there's >>>> just a location of the vector value at the safepoint and vector type information recorded in the relevant nmethod >>>> metadata and all the heavy-lifting happens only when rematerialization takes place. >>>> >>>> The analysis preserves object identity invariants except during aggressive reboxing (guarded by >>>> -XX:+EnableAggressiveReboxing). >>>> >>>> (Aggressive reboxing is crucial for cases when vectors "escape": it allocates a fresh instance at every escape point >>>> thus enabling original instance to go away.) >>>> >>>> =========================================================== >>>> >>>> (4) http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ >>>> >>>> HotSpot changes for jdk.incubator.vector module. Vector support is makred experimental and turned off by default. >>>> JEP 338 proposes the API to be released as an incubator module, so a user has to specify "--add-module >>>> jdk.incubator.vector" on the command line to be able to use it. >>>> When user does that, JVM automatically enables Vector API support. >>>> It improves usability (user doesn't need to separately "open" the API and enable JVM support) while minimizing risks >>>> of destabilitzation from new code when the API is not used. >>>> >>>> >>>> That's it! Will be happy to answer any questions. >>>> >>>> And thanks in advance for any feedback! >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [0] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >>>> >>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >>>> >>>> [2] https://openjdk.java.net/jeps/338 >>>> >>>> [3] https://openjdk.java.net/projects/panama/ >>>> >>>> [4] >>>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >>>> >>>> >>>> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >>>> >>>> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >>>> >>>> ???? 
$ hg clone http://hg.openjdk.java.net/panama/dev/ -b vector-unstable From igor.ignatyev at oracle.com Wed Aug 5 19:52:25 2020 From: igor.ignatyev at oracle.com (igor.ignatyev at oracle.com) Date: Wed, 5 Aug 2020 12:52:25 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> References: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> Message-ID: <2638BD7F-407B-4C3E-9789-9DE1D4836382@oracle.com> Leonid, Could you please add a comment saying that this code should be reverted when 8209961 is fixed? ? Igor > On Aug 5, 2020, at 11:54 AM, Vladimir Kozlov wrote: > > ?Okay. Then you change is good. > > Thanks, > Vladimir K > >>> On 8/5/20 11:35 AM, Leonid Mesnik wrote: >> Hi >> I checked with Dean status of 8209961. He said that he run into some issue and it makes sense to disable VerifyOops for AOT now. >> Leonid >>>> On Aug 5, 2020, at 11:22 AM, Vladimir Kozlov wrote: >>> Hi Leonid, >>> Dean is working on 8209961 fix and it can be done 'soon'. >>> How urgent your changes? Can you wait a little? >>> Thanks, >>> Vladimir K >>> On 8/5/20 10:54 AM, Leonid Mesnik wrote: >>>> Hi >>>> Could you please review following fix which disable testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed. >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >>>> diff: >>>> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >>>> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >>>> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >>>> @@ -380,6 +380,10 @@ >>>> return "false"; >>>> } >>>> + if (WB.getBooleanVMFlag("VerifyOops")) { >>>> + return "false"; >>>> + } >>>> + >>>> switch (GC.selected()) { >>>> case Serial: >>>> case Parallel: >>>> Leonid From leonid.mesnik at oracle.com Wed Aug 5 21:39:21 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 5 Aug 2020 14:39:21 -0700 Subject: RFR: 8161684: [testconf] Add VerifyOops' testing into compiler tiers In-Reply-To: <2638BD7F-407B-4C3E-9789-9DE1D4836382@oracle.com> References: <28f13e69-7234-1d1f-e6e7-7f4775d3f948@oracle.com> <2638BD7F-407B-4C3E-9789-9DE1D4836382@oracle.com> Message-ID: Sure, will do. Leonid > On Aug 5, 2020, at 12:52 PM, igor.ignatyev at oracle.com wrote: > > Leonid, > > Could you please add a comment saying that this code should be reverted when 8209961 is fixed? > > ? Igor > >> On Aug 5, 2020, at 11:54 AM, Vladimir Kozlov wrote: >> >> ?Okay. Then you change is good. >> >> Thanks, >> Vladimir K >> >>>> On 8/5/20 11:35 AM, Leonid Mesnik wrote: >>> Hi >>> I checked with Dean status of 8209961. He said that he run into some issue and it makes sense to disable VerifyOops for AOT now. >>> Leonid >>>>> On Aug 5, 2020, at 11:22 AM, Vladimir Kozlov wrote: >>>> Hi Leonid, >>>> Dean is working on 8209961 fix and it can be done 'soon'. >>>> How urgent your changes? Can you wait a little? >>>> Thanks, >>>> Vladimir K >>>> On 8/5/20 10:54 AM, Leonid Mesnik wrote: >>>>> Hi >>>>> Could you please review following fix which disable testing of AOT when VerifyOops is enabled until https://bugs.openjdk.java.net/browse/JDK-8209961 is fixed. 
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8161684 >>>>> diff: >>>>> diff -r 0d5c9dffe1f6 test/jtreg-ext/requires/VMProps.java >>>>> --- a/test/jtreg-ext/requires/VMProps.java Mon Jul 27 22:59:27 2020 +0200 >>>>> +++ b/test/jtreg-ext/requires/VMProps.java Wed Aug 05 10:50:20 2020 -0700 >>>>> @@ -380,6 +380,10 @@ >>>>> return "false"; >>>>> } >>>>> + if (WB.getBooleanVMFlag("VerifyOops")) { >>>>> + return "false"; >>>>> + } >>>>> + >>>>> switch (GC.selected()) { >>>>> case Serial: >>>>> case Parallel: >>>>> Leonid > From igor.ignatyev at oracle.com Wed Aug 5 23:54:26 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 5 Aug 2020 16:54:26 -0700 Subject: RFR(S) : 8251126 : nsk.share.GoldChecker should read golden file from ${test.src} In-Reply-To: <41261a03-cd46-3d48-839b-d934a9fb92bb@oracle.com> References: <7510EC68-7A8C-4F1E-A928-5910F13FA5D9@oracle.com> <41261a03-cd46-3d48-839b-d934a9fb92bb@oracle.com> Message-ID: <35C137CF-698C-4396-B68E-98A158CE481F@oracle.com> Hi David, thanks for your review, pushed. -- Igor > On Aug 4, 2020, at 7:29 PM, David Holmes wrote: > > Hi Igor, > > This seems fine. The code cleanup looks good too. > > Thanks, > David > > On 5/08/2020 9:58 am, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ >>> 37 lines changed: 7 ins; 20 del; 10 mod; >> Hi all, >> could you please review this patch? >> from JBS: >>> as of now, nsk.share.GoldChecker reads golden files from the current directory, which makes it necessary to copy golden files from ${test.src} before the execution of the tests which use GoldChecker. >> after this patch, FileInstaller actions will become redundant in 103 of :vmTestbase_vm_compiler tests and will be removed by 8251127. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251126 >> webrev: http://cr.openjdk.java.net/~iignatyev/8251126/webrev.00/ >> testing: :vmTestbase_vm_compiler tests >> 8251127: https://bugs.openjdk.java.net/browse/JDK-8251127 >> Thanks, >> -- Igor From Xiaohong.Gong at arm.com Thu Aug 6 02:43:24 2020 From: Xiaohong.Gong at arm.com (Xiaohong Gong) Date: Thu, 6 Aug 2020 02:43:24 +0000 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations Message-ID: Hi, Could you please help to review this simple patch? It adds the re-association for loop invariants with other associative operations in the C2 compiler. JBS: https://bugs.openjdk.java.net/browse/JDK-8250808 Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/ C2 has re-association of loop invariants. However, the current implementation only supports the re-associations for add and subtract with 32-bits integer type. For other associative expressions like multiplication and the logic operations, the re-association is also applicable, and also for the operations with long type. This patch adds the missing re-associations for other associative operations together with the support for long type. With this patch, the following expressions: (x * inv1) * inv2 (x | inv1) | inv2 (x & inv1) & inv2 (x ^ inv1) ^ inv2 ; inv1, inv2 are invariants can be re-associated to: x * (inv1 * inv2) ; "inv1 * inv2" can be hoisted x | (inv1 | inv2) ; "inv1 | inv2" can be hoisted x & (inv1 & inv2) ; "inv1 & inv2" can be hoisted x ^ (inv1 ^ inv2) ; "inv1 ^ inv2" can be hoisted Performance: Here is the micro benchmark: http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java And the results on X86_64: Before: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 988.142 ? 
0.110 ns/op loopInvariantAndInt 1024 avgt 15 843.850 ? 0.522 ns/op loopInvariantAndLong 1024 avgt 15 990.551 ? 10.458 ns/op loopInvariantMulInt 1024 avgt 15 1209.003 ? 0.247 ns/op loopInvariantMulLong 1024 avgt 15 1213.923 ? 0.438 ns/op loopInvariantOrInt 1024 avgt 15 843.908 ? 0.132 ns/op loopInvariantOrLong 1024 avgt 15 990.710 ? 10.484 ns/op loopInvariantSubLong 1024 avgt 15 988.170 ? 0.159 ns/op loopInvariantXorInt 1024 avgt 15 806.949 ? 7.860 ns/op loopInvariantXorLong 1024 avgt 15 990.963 ? 8.321 ns/op After: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 842.854 ? 9.036 ns/op loopInvariantAndInt 1024 avgt 15 698.097 ? 0.916 ns/op loopInvariantAndLong 1024 avgt 15 841.120 ? 0.118 ns/op loopInvariantMulInt 1024 avgt 15 691.000 ? 7.696 ns/op loopInvariantMulLong 1024 avgt 15 846.907 ? 0.189 ns/op loopInvariantOrInt 1024 avgt 15 698.423 ? 4.969 ns/op loopInvariantOrLong 1024 avgt 15 843.465 ? 10.196 ns/op loopInvariantSubLong 1024 avgt 15 841.314 ? 2.906 ns/op loopInvariantXorInt 1024 avgt 15 652.529 ? 0.556 ns/op loopInvariantXorLong 1024 avgt 15 841.860 ? 2.491 ns/op Results on AArch64: Before: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 514.437 ? 0.351 ns/op loopInvariantAndInt 1024 avgt 15 435.301 ? 0.415 ns/op loopInvariantAndLong 1024 avgt 15 572.437 ? 0.057 ns/op loopInvariantMulInt 1024 avgt 15 1154.544 ? 0.030 ns/op loopInvariantMulLong 1024 avgt 15 1188.109 ? 0.299 ns/op loopInvariantOrInt 1024 avgt 15 435.605 ? 0.977 ns/op loopInvariantOrLong 1024 avgt 15 572.475 ? 0.093 ns/op loopInvariantSubLong 1024 avgt 15 514.340 ? 0.154 ns/op loopInvariantXorInt 1024 avgt 15 426.186 ? 0.105 ns/op loopInvariantXorLong 1024 avgt 15 572.505 ? 0.259 ns/op After: Benchmark (length) Mode Cnt Score Error Units loopInvariantAddLong 1024 avgt 15 508.179 ? 0.108 ns/op loopInvariantAndInt 1024 avgt 15 394.706 ? 0.199 ns/op loopInvariantAndLong 1024 avgt 15 434.443 ? 0.247 ns/op loopInvariantMulInt 1024 avgt 15 762.477 ? 0.079 ns/op loopInvariantMulLong 1024 avgt 15 775.975 ? 0.159 ns/op loopInvariantOrInt 1024 avgt 15 394.657 ? 0.156 ns/op loopInvariantOrLong 1024 avgt 15 434.428 ? 0.282 ns/op loopInvariantSubLong 1024 avgt 15 507.475 ? 0.151 ns/op loopInvariantXorInt 1024 avgt 15 396.000 ? 0.011 ns/op loopInvariantXorLong 1024 avgt 15 434.255 ? 0.099 ns/op Tests: Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1 and jcstress:tests-custom, and all tests pass without new failure. Thanks, Xiaohong Gong From luhenry at microsoft.com Thu Aug 6 04:36:07 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Thu, 6 Aug 2020 04:36:07 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: <061fb2de-563b-7a55-1b14-d8c6b6b4e0f4@oracle.com> <159bfdfb-6476-0826-8cf7-145202c7ce33@oracle.com> <410fb009-94ab-fea8-9c1c-51c835b27b72@oracle.com> <832a89ec-bd6b-5d40-9a1d-5a5e688399e7@oracle.com> Message-ID: Pushed with https://hg.openjdk.java.net/jdk/jdk/rev/b8231f177eaf Thank you to all involved ?? 
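The loop-invariant re-association proposed in the 8250808 RFR above boils down to rewriting (x op inv1) op inv2 into x op (inv1 op inv2) so that the invariant part can be hoisted out of the loop. A minimal Java sketch of the shape the optimization targets (illustrative only, not the benchmark from the webrev):

    // Before re-association: the invariant product inv1 * inv2 stays inside the
    // loop body, so every iteration performs two multiplies.
    static void mulPair(int[] a, int[] r, int inv1, int inv2) {
        for (int i = 0; i < a.length; i++) {
            r[i] = (a[i] * inv1) * inv2;
        }
    }

    // Roughly what the loop looks like once C2 re-associates to
    // a[i] * (inv1 * inv2): the invariant product is computed once outside the
    // loop (hoisted by hand here to make the effect visible) and each iteration
    // does a single multiply.
    static void mulPairHoisted(int[] a, int[] r, int inv1, int inv2) {
        int inv = inv1 * inv2;
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] * inv;
        }
    }

The same rewrite is valid for the &, | and ^ forms listed in the RFR, since those operators are associative as well.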
From christian.hagedorn at oracle.com Thu Aug 6 09:34:11 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 6 Aug 2020 11:34:11 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid Message-ID: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8249603 http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ Register allocation fails in C1 in the testcase because two intervals overlap (they both have the same stack slot assigned). The problem can be traced back to the optimization to assign the same spill slot to non-intersecting intervals in LinearScanWalker::combine_spilled_intervals(). In this method, we look at a split parent interval 'cur' and its register hint interval 'register_hint'. A register hint is present when the interval represents either the source or the target operand of a move operation and the register hint the target or source operand, respectively (the register hint is used to try to assign the same register to the source and target operand such that we can completely remove the move operation). If the register hint is set, then we do some additional checks and make sure that the split parent and the register hint do not intersect. If all checks pass, the split parent 'cur' gets the same spill slot as the register hint [1]. This means that both intervals get the same slot on the stack if they are spilled. The problem now is that we do not consider any split children of the register hint which all share the same spill slot with the register hint (their split parent). In the testcase, the split parent 'cur' does not intersect with the register hint but with one of its split children. As a result, they both get the same spill slot and are later indeed both spilled (i.e. both virtual registers/operands are put to the same stack location at the same time). The fix now additionally checks if the split parent 'cur' does not intersect any split children of the register hint in combine_spilled_intervals(). If there is such an intersection, then we bail out of the optimization. Some standard benchmark testing did not show any regressions. Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From jamsheed.c.m at oracle.com Thu Aug 6 12:07:40 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 6 Aug 2020 17:37:40 +0530 Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions Message-ID: Hi all, JBS: https://bugs.openjdk.java.net/browse/JDK-8249451 webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/ testing : mach1-5(links in jbs) While working on JDK-8246381 it was noticed that compilation request path clears all exceptions(including async) and doesn't propagate[1]. Fix: patch restores the propagation behavior for the probable async exceptions. Compilation request path propagate exception as in [2]. MDO and MethodCounter doesn't expect any exception other than metaspace OOM(added comments). Deoptimization path doesn't clear probable async exceptions and take unpack_exception path for non uncommontraps. Added java_lang_InternalError to well known classes. Request for review. Best Regards, Jamsheed [1] w.r.t changes done for JDK-7131259 [2] ??? (a) ??? -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp ????? | ?????? 
----- compilationPolicy.cpp/tieredThresholdPolicy.cpp ???????? | ????????? ------ compileBroker.cpp ??? (b) ??? Xcomp versions ??? ------> compilationPolicy.cpp ?????? | ??????? ------> compileBroker.cpp ??? (c) ??? Direct call to? compile_method in compileBroker.cpp ??? JVMCI bootstrap, whitebox, replayCompile. From tobias.hartmann at oracle.com Thu Aug 6 13:53:27 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 6 Aug 2020 15:53:27 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint Message-ID: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Hi, please review the following fix: https://bugs.openjdk.java.net/browse/JDK-8249608 http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' is equal. The fix is to make sure to always update 'max_vlen_in_bytes'. When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never pushed. I've added it to this webrev and extended it such that it also covers the new issue. 
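The failure mode described here can be pictured with a loop that mixes int and long stores of loop-invariant values. The sketch below only illustrates that shape; it is not the actual regression test added in the webrev:

    // Illustrative only: SuperWord forms a 4 x int pack (16 bytes) and a
    // 4 x long pack (32 bytes) for the two stores. If max_vlen_in_bytes is
    // only updated when the lane count grows, and the 16-byte int pack is
    // processed first, the 32-byte long pack never raises the maximum, so the
    // nmethod is not marked has_wide_vectors and the safepoint handler does
    // not save the full-width vector registers.
    static void test(int[] ints, long[] longs) {
        for (int i = 0; i < ints.length; i++) {
            longs[i] = 42L;   // 4 x long  -> 32-byte StoreVector
            ints[i]  = 42;    // 4 x int   -> 16-byte StoreVector
        }
    }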
Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8193518 [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 [3] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 [4] -XX:+TraceSuperWord output: After filter_packs packset Pack: 0 align: 0 1101 StoreL === 1115 1120 1102 174 [[ 1098 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 align: 8 1098 StoreL === 1115 1101 1099 174 [[ 993 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 align: 16 993 StoreL === 1115 1098 994 174 [[ 866 214 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 align: 24 214 StoreL === 1115 993 212 174 [[ 1120 864 255 ]] @long[int:>=0]:exact+any *, idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 Pack: 1 align: 0 1097 StoreI === 1115 1119 1106 41 [[ 1096 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 align: 4 1096 StoreI === 1115 1097 1104 41 [[ 989 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 align: 8 989 StoreI === 1115 1096 996 41 [[ 867 253 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 align: 12 253 StoreI === 1115 989 251 41 [[ 1119 860 255 ]] @int[int:>=0]:exact+any *, idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 new Vector node: 1491 ReplicateI === _ 41 [[]] #vectorx[4]:{int} new Vector node: 1492 StoreVector === 1115 1119 1106 1491 [[ 1487 1119 255 1486 ]] @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 new Vector node: 1493 ReplicateL === _ 174 [[]] #vectory[4]:{long} new Vector node: 1494 StoreVector === 1115 1120 1102 1493 [[ 1489 1120 255 1488 ]] @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 From vladimir.x.ivanov at oracle.com Thu Aug 6 14:07:43 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 6 Aug 2020 17:07:43 +0300 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Message-ID: <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> > http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ Looks good. Best regards, Vladimir Ivanov > > The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a > loop in a C2 compiled method is corrupted at a safepoint. 
Again, the root cause is the superword > optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod > being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. > > This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > > max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for > the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int > StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' > is equal. > > The fix is to make sure to always update 'max_vlen_in_bytes'. > > When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never > pushed. I've added it to this webrev and extended it such that it also covers the new issue. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8193518 > [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 > > [4] -XX:+TraceSuperWord output: > > After filter_packs > packset > Pack: 0 > align: 0 1101 StoreL === 1115 1120 1102 174 [[ 1098 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 8 1098 StoreL === 1115 1101 1099 174 [[ 993 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 16 993 StoreL === 1115 1098 994 174 [[ 866 214 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ > bci:17 Test::main @ bci:8 > align: 24 214 StoreL === 1115 993 212 174 [[ 1120 864 255 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > Pack: 1 > align: 0 1097 StoreI === 1115 1119 1106 41 [[ 1096 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 4 1096 StoreI === 1115 1097 1104 41 [[ 989 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 8 989 StoreI === 1115 1096 996 41 [[ 867 253 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 12 253 StoreI === 1115 989 251 41 [[ 1119 860 255 ]] @int[int:>=0]:exact+any *, > idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > > new Vector node: 1491 ReplicateI === _ 41 [[]] #vectorx[4]:{int} > new Vector node: 1492 StoreVector === 1115 1119 1106 1491 [[ 1487 1119 255 1486 ]] > @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched Memory: @int[int:>=0]:NotNull:exact+any *, > idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 > new Vector node: 1493 ReplicateL === _ 174 [[]] #vectory[4]:{long} > new Vector node: 1494 StoreVector === 1115 1120 1102 1493 [[ 1489 1120 255 1488 ]] > @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0]:NotNull:exact+any *, > 
idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 > From tobias.hartmann at oracle.com Thu Aug 6 14:11:38 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 6 Aug 2020 16:11:38 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> Message-ID: Thanks Vladimir! Best regards, Tobias On 06.08.20 16:07, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >> >> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >> is equal. >> >> The fix is to make sure to always update 'max_vlen_in_bytes'. >> >> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >> >> [4] -XX:+TraceSuperWord output: >> >> After filter_packs >> packset >> Pack: 0 >> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >> bci:17 Test::main @ bci:8 >> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 864? 255 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> Pack: 1 >> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? 
@int[int:>=0]:exact+any *, >> idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? >> @int[int:>=0]:exact+any *, >> idx=8;? Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> >> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >> From christian.hagedorn at oracle.com Thu Aug 6 14:28:21 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 6 Aug 2020 16:28:21 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Message-ID: Hi Tobias Looks good to me! Best regards, Christian On 06.08.20 15:53, Tobias Hartmann wrote: > Hi, > > please review the following fix: > https://bugs.openjdk.java.net/browse/JDK-8249608 > http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ > > The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a > loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword > optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod > being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. > > This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > > max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for > the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int > StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' > is equal. > > The fix is to make sure to always update 'max_vlen_in_bytes'. > > When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never > pushed. I've added it to this webrev and extended it such that it also covers the new issue. 
> > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8193518 > [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 > > [4] -XX:+TraceSuperWord output: > > After filter_packs > packset > Pack: 0 > align: 0 1101 StoreL === 1115 1120 1102 174 [[ 1098 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 8 1098 StoreL === 1115 1101 1099 174 [[ 993 ]] @long[int:>=0]:exact+any *, idx=6; > Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > align: 16 993 StoreL === 1115 1098 994 174 [[ 866 214 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ > bci:17 Test::main @ bci:8 > align: 24 214 StoreL === 1115 993 212 174 [[ 1120 864 255 ]] @long[int:>=0]:exact+any *, > idx=6; Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 > Test::main @ bci:8 > Pack: 1 > align: 0 1097 StoreI === 1115 1119 1106 41 [[ 1096 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 4 1096 StoreI === 1115 1097 1104 41 [[ 989 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 8 989 StoreI === 1115 1096 996 41 [[ 867 253 ]] @int[int:>=0]:exact+any *, idx=8; > Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > align: 12 253 StoreI === 1115 989 251 41 [[ 1119 860 255 ]] @int[int:>=0]:exact+any *, > idx=8; Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 > Test::main @ bci:8 > > new Vector node: 1491 ReplicateI === _ 41 [[]] #vectorx[4]:{int} > new Vector node: 1492 StoreVector === 1115 1119 1106 1491 [[ 1487 1119 255 1486 ]] > @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched Memory: @int[int:>=0]:NotNull:exact+any *, > idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 > new Vector node: 1493 ReplicateL === _ 174 [[]] #vectory[4]:{long} > new Vector node: 1494 StoreVector === 1115 1120 1102 1493 [[ 1489 1120 255 1488 ]] > @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0]:NotNull:exact+any *, > idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 > From tobias.hartmann at oracle.com Thu Aug 6 14:29:12 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 6 Aug 2020 16:29:12 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> Message-ID: Thanks Christian! Best regards, Tobias On 06.08.20 16:28, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me! 
> > Best regards, > Christian > > On 06.08.20 15:53, Tobias Hartmann wrote: >> Hi, >> >> please review the following fix: >> https://bugs.openjdk.java.net/browse/JDK-8249608 >> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ >> >> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >> >> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >> is equal. >> >> The fix is to make sure to always update 'max_vlen_in_bytes'. >> >> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >> >> [4] -XX:+TraceSuperWord output: >> >> After filter_packs >> packset >> Pack: 0 >> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any >> *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >> bci:17 Test::main @ bci:8 >> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 864? 255 ]]? >> @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> Pack: 1 >> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? @int[int:>=0]:exact+any *, >> idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? @int[int:>=0]:exact+any >> *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? >> @int[int:>=0]:exact+any *, >> idx=8;? 
Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> >> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >> From vladimir.kozlov at oracle.com Thu Aug 6 19:00:00 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 6 Aug 2020 12:00:00 -0700 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> Message-ID: <8ddafcf8-5fcf-c0cc-ccd0-29692dd1c19b@oracle.com> +1 Thanks, Vladimir K On 8/6/20 7:07 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> >> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >> >> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >> is equal. >> >> The fix is to make sure to always update 'max_vlen_in_bytes'. >> >> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >> >> [4] -XX:+TraceSuperWord output: >> >> After filter_packs >> packset >> Pack: 0 >> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? @long[int:>=0]:exact+any *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any *, idx=6; >> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? 
@long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >> bci:17 Test::main @ bci:8 >> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 864? 255 ]]? @long[int:>=0]:exact+any *, >> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >> Test::main @ bci:8 >> Pack: 1 >> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? @int[int:>=0]:exact+any *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? @int[int:>=0]:exact+any *, idx=8; >> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? @int[int:>=0]:exact+any *, >> idx=8;? Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >> Test::main @ bci:8 >> >> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >> From vladimir.kozlov at oracle.com Thu Aug 6 19:19:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 6 Aug 2020 12:19:13 -0700 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Message-ID: Fix looks good. And very nice description of the issue. Thanks, Vladimir K On 8/6/20 2:34 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8249603 > http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ > > Register allocation fails in C1 in the testcase because two intervals overlap (they both have the same stack slot > assigned). The problem can be traced back to the optimization to assign the same spill slot to non-intersecting > intervals in LinearScanWalker::combine_spilled_intervals(). > > In this method, we look at a split parent interval 'cur' and its register hint interval 'register_hint'. A register hint > is present when the interval represents either the source or the target operand of a move operation and the register > hint the target or source operand, respectively (the register hint is used to try to assign the same register to the > source and target operand such that we can completely remove the move operation). 
> > If the register hint is set, then we do some additional checks and make sure that the split parent and the register hint > do not intersect. If all checks pass, the split parent 'cur' gets the same spill slot as the register hint [1]. This > means that both intervals get the same slot on the stack if they are spilled. > > The problem now is that we do not consider any split children of the register hint which all share the same spill slot > with the register hint (their split parent). In the testcase, the split parent 'cur' does not intersect with the > register hint but with one of its split children. As a result, they both get the same spill slot and are later indeed > both spilled (i.e. both virtual registers/operands are put to the same stack location at the same time). > > The fix now additionally checks if the split parent 'cur' does not intersect any split children of the register hint in > combine_spilled_intervals(). If there is such an intersection, then we bail out of the optimization. > > Some standard benchmark testing did not show any regressions. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From vladimir.kozlov at oracle.com Thu Aug 6 21:45:53 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 6 Aug 2020 14:45:53 -0700 Subject: [16] (S) RFR 8251260: two MD5 tests fail "RuntimeException: Unexpected count of intrinsic" Message-ID: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8251260 New MD5 intrinsic tests failed when run with AOTed java.base. And old SHA tests are problem listed for AOT. SHA and MD5 intrinsic tests parse -XX:+LogCompilation output looking for compilation of sun/security/provider methods as intrinsics. But these methods are already pre-compiled by AOT when AOTed java.base is used. As result LogCompilation does not have corresponding entries. I think we should not run these MD5 and SHA tests with AOTed java.base module. I added corresponding @requires. Old SHA tests were problem listed referencing 8167430 [1] bug but I think it is incorrect. The original SHA tests crash with AOT 8207358 [2] bug was closed as duplicate of 8167430 because of conflict how intirnsics flags are set by default during AOT compilation. But we simply should not run these tests with AOTed java.base. So I am adding @requires to them as well and removing them from AOT problem list. Tested hs-tier1, hs-tier2 (runs sha,md5 tests), hs-tier6 (now skips sha,md5 tests when AOTed java.base is used). Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8167430 [2] https://bugs.openjdk.java.net/browse/JDK-8207358 From verghese at amazon.com Thu Aug 6 23:49:39 2020 From: verghese at amazon.com (Verghese, Clive) Date: Thu, 6 Aug 2020 23:49:39 +0000 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp Message-ID: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp I have tested this builds successfully for both PRODUCT and !PRODUCT. Ensured that there are no regressions in hotspot:tier1 tests. 
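The @requires gating described for 8251260 above is expressed in the jtreg test descriptor. A hedged sketch of what such a header can look like; the class name is made up, and vm.aot.enabled is assumed here as the relevant VMProps property (the exact property used in the actual change may differ):

    /*
     * @test
     * @summary intrinsic test that parses -XX:+LogCompilation output and
     *          therefore must not run against an AOT-compiled java.base
     * @requires !vm.aot.enabled
     * @run main/othervm SomeIntrinsicTest
     */
    public class SomeIntrinsicTest {
        public static void main(String[] args) {
            // test body elided; the point is the @requires line above, which
            // keeps jtreg from scheduling this test when vm.aot.enabled is true
        }
    }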
Regards, Clive Verghese From christian.hagedorn at oracle.com Fri Aug 7 06:53:08 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 7 Aug 2020 08:53:08 +0200 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> Message-ID: <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> Hi Clive The fix looks good to me. It makes sense to move it to chaitin.cpp since the calls to verify() are also in this file only. You could fix some minor code style things about the existing code that you moved while at it: - You can move the #ifdef ASSERT out of both methods and surround both methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() are only called in ASSERT blocks. And add a // ASSERT comment on the closing #endif to make it more clear. Don't forget to also surround the declarations in the .hpp file with an ASSERT. - In verify_base_ptrs(): - L2330: Missing curly braces for the loop - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea *a -> ResourceArea* a - There is a missing space in all asserts after the comma separating the condition and the failure string - In verify(): - L2386: Missing space and curly braces for the if statement Best regards, Christian On 07.08.20 01:49, Verghese, Clive wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 > > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp > > I have tested this builds successfully for both PRODUCT and !PRODUCT. > > Ensured that there are no regressions in hotspot:tier1 tests. > > > Regards, > Clive Verghese > From christian.hagedorn at oracle.com Fri Aug 7 06:55:24 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 7 Aug 2020 08:55:24 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Message-ID: <141abed0-ea8f-8c93-6031-4deebd799af0@oracle.com> Thanks a lot Vladimir! Best regards, Christian On 06.08.20 21:19, Vladimir Kozlov wrote: > Fix looks good. And very nice description of the issue. > > Thanks, > Vladimir K > > On 8/6/20 2:34 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8249603 >> http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ >> >> Register allocation fails in C1 in the testcase because two intervals >> overlap (they both have the same stack slot assigned). The problem can >> be traced back to the optimization to assign the same spill slot to >> non-intersecting intervals in >> LinearScanWalker::combine_spilled_intervals(). >> >> In this method, we look at a split parent interval 'cur' and its >> register hint interval 'register_hint'. A register hint is present >> when the interval represents either the source or the target operand >> of a move operation and the register hint the target or source >> operand, respectively (the register hint is used to try to assign the >> same register to the source and target operand such that we can >> completely remove the move operation). 
>> >> If the register hint is set, then we do some additional checks and >> make sure that the split parent and the register hint do not >> intersect. If all checks pass, the split parent 'cur' gets the same >> spill slot as the register hint [1]. This means that both intervals >> get the same slot on the stack if they are spilled. >> >> The problem now is that we do not consider any split children of the >> register hint which all share the same spill slot with the register >> hint (their split parent). In the testcase, the split parent 'cur' >> does not intersect with the register hint but with one of its split >> children. As a result, they both get the same spill slot and are later >> indeed both spilled (i.e. both virtual registers/operands are put to >> the same stack location at the same time). >> >> The fix now additionally checks if the split parent 'cur' does not >> intersect any split children of the register hint in >> combine_spilled_intervals(). If there is such an intersection, then we >> bail out of the optimization. >> >> Some standard benchmark testing did not show any regressions. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 >> From nick.gasson at arm.com Fri Aug 7 09:04:49 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 07 Aug 2020 17:04:49 +0800 Subject: RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop Message-ID: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8247354 Webrev: http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/ Running jtreg test vmTestbase/nsk/jdb/pop/pop001/pop001.java with -Xcomp causes this assertion failure: assert(oopDesc::is_oop(obj)) failed: not an oop: 0x0000ffff60b334c0 This test has a sequence of method calls func1(0) -> func2(1) -> ... -> func5(4) -> lastBreak() with a breakpoint in lastBreak(). func{2..5} are inlined into func1 when compiled. At the breakpoint the debugger is used to pop four frames and then continue executing from func2. This causes func1 to be deoptimized but the recreated interpreter frame for func2 has garbage values in its temporary expression stack (the parameters for func3), which triggers the above assertion when the invoke bytecode re-executes. The outgoing parameters in func2's expression stack should be filled in when we recreate the locals for func3. But on AArch64 the template interpreter inserts padding between the locals block and the saved sender SP to align the machine SP to 16-bytes. This extra padding is accounted for by AbstractInterpreter::size_activation() but not when recreating the frame in layout_activation(). This causes the incoming parameters in the callee frame to be misaligned with outgoing parameters in the caller frame. This patch fixes that by using the caller's ESP to calculate the location of the locals if the caller is an interpreted frame. Tested jtreg hotspot_all_no_apps, jdk_core plus tier1 with -XX:+DeoptimizeALot. 
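The call chain in this test has roughly the following shape; this is an illustrative outline, not the actual nsk/jdb pop001 source:

    // Illustrative outline: func2..func5 are small enough to be inlined into
    // the compiled func1, and lastBreak() carries the breakpoint. Popping four
    // frames in jdb and resuming in func2 forces func1 to deoptimize, and the
    // rebuilt interpreter frame for func2 must again hold func3's outgoing
    // arguments on its expression stack.
    static void func1(int i) { func2(i + 1); }
    static void func2(int i) { func3(i + 1); }
    static void func3(int i) { func4(i + 1); }
    static void func4(int i) { func5(i + 1); }
    static void func5(int i) { lastBreak(); }

    static void lastBreak() {
        // the jdb side of the test sets its breakpoint here
    }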
-- Thanks, Nick From jatin.bhateja at intel.com Fri Aug 7 09:27:38 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Fri, 7 Aug 2020 09:27:38 +0000 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> Message-ID: Hi Vladimir, Please let me know if final version looks fine to you. Also, if clearance from a second reviewer mandatory here or can we push this to trunk is no more comments. Best Regards, Jatin > -----Original Message----- > From: hotspot-compiler-dev On > Behalf Of Bhateja, Jatin > Sent: Sunday, August 2, 2020 11:55 PM > To: Vladimir Ivanov > Cc: Viswanathan, Sandhya ; hotspot-compiler- > dev at openjdk.java.net > Subject: RE: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 > > Hi Vladimir, > > Final patch is placed at following link. > > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ > > One more reviewer approval needed. > > Best Regards, > Jatin > > > -----Original Message----- > > From: Vladimir Ivanov > > Sent: Saturday, August 1, 2020 4:49 AM > > To: Bhateja, Jatin > > Cc: Viswanathan, Sandhya ; > > hotspot-compiler- dev at openjdk.java.net > > Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for > > X86 > > > > > > > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ > > > > Looks good. > > > > Tier5 (where I saw the crashes) passed. > > > > Please, incorporate the following minor cleanups in the final version: > > > > http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu > > p/ > > > > (Tested with hs-tier1,hs-tier2.) > > > > Best regards, > > Vladimir Ivanov > > > > >> -----Original Message----- > > >> From: Vladimir Ivanov > > >> Sent: Thursday, July 30, 2020 3:30 AM > > >> To: Bhateja, Jatin > > >> Cc: Viswanathan, Sandhya ; > > >> hotspot-compiler- dev at openjdk.java.net > > >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification > > >> for > > >> X86 > > >> > > >> > > >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ > > >>> > > >>> Looks good. (Testing is in progress.) > > >> > > >> FYI test results are clean (tier1-tier5). > > >> > > >>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines > > >>>> since we are anyways doing constant folding in LShiftI/URShiftI > > >>>> value routines. Since JAVA rotate APIs are no longer intrincified > > >>>> hence these routines may no longer be useful. > > >>> > > >>> Nice observation! Good. > > >> > > >> As a second thought, it seems there's still a chance left that > > >> Rotate nodes get their input type narrowed after the folding > > >> happened. For example, as a result of incremental inlining or CFG > > >> transformations during loop optimizations. And it does happen in > > >> practice since the testing revealed some crashes due to the bug in > > RotateLeftNode/RotateRightNode::Ideal(). > > >> > > >> So, it makes sense to keep the transformations. But I'm fine with > > >> addressing that as a followup enhancement. > > >> > > >> Best regards, > > >> Vladimir Ivanov > > >> > > >>> > > >>>>> It would be really nice to migrate to MacroAssembler along the > > >>>>> way (as a cleanup). > > >>>> > > >>>> I guess you are saying remove opcodes/encoding from patterns and > > >>>> move then to Assembler, Can we take this cleanup activity > > >>>> separately since other patterns are also using these matcher > > directives. 
> > >>> > > >>> I'm perfectly fine with handling it as a separate enhancement. > > >>> > > >>>> Other synthetic comments have been taken care of. I have extended > > >>>> the Test to cover all the newly added scalar transforms. Kindly > > >>>> let me know if there other comments. > > >>> > > >>> Nice! > > >>> > > >>> Best regards, > > >>> Vladimir Ivanov > > >>> > > >>>>> -----Original Message----- > > >>>>> From: Vladimir Ivanov > > >>>>> Sent: Friday, July 24, 2020 3:21 AM > > >>>>> To: Bhateja, Jatin > > >>>>> Cc: Viswanathan, Sandhya ; Andrew > > >>>>> Haley ; hotspot-compiler-dev at openjdk.java.net > > >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification > > >>>>> for > > >>>>> X86 > > >>>>> > > >>>>> Hi Jatin, > > >>>>> > > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ > > >>>>> > > >>>>> Much better! Thanks. > > >>>>> > > >>>>>> Change Summary: > > >>>>>> > > >>>>>> 1) Unified the handling for scalar rotate operation. All scalar > > >>>>>> rotate > > >>>>> selection patterns are now dependent on newly created > > >>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. > > >>>>> Currently > > >>>>> if DAG nodes corresponding to a sub-pattern are shared (have > > >>>>> multiple > > >>>>> users) then existing complex patterns based on > > >>>>> Or/LShiftL/URShift does not get matched and this prevents inferring > rotate nodes. > > >>>>> Please refer to JIT'ed assembly output with baseline[1] and with > > >>>>> patch[2] . We can see that generated code size also went done > > >>>>> from > > >>>>> 832 byte to 768 bytes. Also this can cause perf degradation if > > >>>>> shift-or dependency chain appears inside a hot region. > > >>>>>> > > >>>>>> 2) Due to enhanced rotate inferencing new patch shows better > > >>>>>> performance > > >>>>> even for legacy targets (non AVX-512). Please refer to the perf > > >>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. > > >>>>> > > >>>>> Very nice! > > >>>>>> 3) As suggested, removed Java API intrinsification changes and > > >>>>>> scalar > > >>>>> rotate transformation are done during OrI/OrL node idealizations. > > >>>>> > > >>>>> Good. > > >>>>> > > >>>>> (Still would be nice to factor the matching code from Ideal() > > >>>>> and share it between multiple use sites. Especially considering > > >>>>> OrVNode::Ideal() now does basically the same thing. As an > > >>>>> example/idea, take a look at > > >>>>> is_bmi_pattern() in x86.ad.) > > >>>>> > > >>>>>> 4) SLP always gets to work on new scalar Rotate nodes and > > >>>>>> creates vector > > >>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV > > >>>>> nodes if target does not supports vector rotates(non-AVX512). > > >>>>> > > >>>>> Good. > > >>>>> > > >>>>>> 5) Added new instruction patterns for vector shift Left/Right > > >>>>>> operations > > >>>>> with constant shift operands. This prevents emitting extra moves > > >>>>> to > > >> XMM. > > >>>>> > > >>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ > > >>>>> +? match(Set dst (LShiftVI src shift)); > > >>>>> > > >>>>> I'd prefer to see a uniform Ideal IR shape being used > > >>>>> irrespective of whether the argument is a constant or not. It > > >>>>> should also simplify the logic in SuperWord and make it easier > > >>>>> to support on > > >>>>> non-x86 architectures. > > >>>>> > > >>>>> For example, here's how it is done on AArch64: > > >>>>> > > >>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ > > >>>>> ??? 
predicate(n->as_Vector()->length() == 4); > > >>>>> ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... > > >>>>> > > >>>>>> 6) Constant folding scenarios are covered in > > >>>>>> RotateLeft/RotateRight > > >>>>> idealization, inferencing of vector rotate through OrV > > >>>>> idealization covers the vector patterns generated though non SLP > > route i.e. > > >>>>> VectorAPI. > > >>>>> > > >>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the > > >>>>> general direction here - duplication of scalar transformations > > >>>>> to lane-wise vector operations. It definitely won't scale and in > > >>>>> a longer run it risks to diverge. Would be nice to find a way to > > >>>>> automatically "lift" > > >>>>> scalar transformations to vectors and apply them uniformly. But > > >>>>> right now it is just an idea which requires more experimentation. > > >>>>> > > >>>>> > > >>>>> Some other minor comments/suggestions: > > >>>>> > > >>>>> +? // Swap the computed left and right shift counts. > > >>>>> +? if (is_rotate_left) { > > >>>>> +??? Node* temp = shiftRCnt; > > >>>>> +??? shiftRCnt? = shiftLCnt; > > >>>>> +??? shiftLCnt? = temp; > > >>>>> +? } > > >>>>> > > >>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? > > >>>>> > > >>>>> > > >>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) > > >>>>> +??? return true; > > >>>>> > > >>>>> Please, don't omit curly braces (even for simple cases). > > >>>>> > > >>>>> > > >>>>> -// Rotate Right by variable > > >>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, > > >>>>> immI0 zero, rFlagsReg cr) > > >>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg > > >>>>> +cr) > > >>>>> ?? %{ > > >>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI > > >>>>> zero shift)))); > > >>>>> - > > >>>>> +? predicate(!VM_Version::supports_bmi2() && > > >>>>> n->bottom_type()->basic_type() == T_INT); > > >>>>> +? match(Set dst (RotateRight dst shift)); > > >>>>> +? format %{ "rorl???? $dst, $shift" %} > > >>>>> ???? expand %{ > > >>>>> -??? rorI_rReg_CL(dst, shift, cr); > > >>>>> +??? rorI_rReg_imm8(dst, shift, cr); > > >>>>> ???? %} > > >>>>> > > >>>>> It would be really nice to migrate to MacroAssembler along the > > >>>>> way (as a cleanup). > > >>>>> > > >>>>>> Please push the patch through your testing framework and let me > > >>>>>> know your > > >>>>> review feedback. > > >>>>> > > >>>>> There's one new assertion failure: > > >>>>> > > >>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), > > >>>>> pid=5476, tid=6219 > > >>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize > > >>>>> should return new nodes, use Identity to return old nodes > > >>>>> > > >>>>> I believe it comes from > > >>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal > > >>>>> which can return pre-contructed constants. I suggest to get rid > > >>>>> of > > >>>>> Ideal() methods and move constant folding logic into > > >>>>> Node::Value() (as implemented for other bitwise/arithmethic > > >>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more > > >>>>> generic approach since it enables richer type information > > >>>>> (ranges vs > > >>>>> constants) and IMO it's more convenient to work with constants > > >>>>> through Types than ConNodes. > > >>>>> > > >>>>> (I suspect that original/expanded IR shape may already provide > > >>>>> more precise type info for non-constant case which can affect > > >>>>> the > > >>>>> benchmarks.) 
> > >>>>> > > >>>>> Best regards, > > >>>>> Vladimir Ivanov > > >>>>> > > >>>>>> > > >>>>>> Best Regards, > > >>>>>> Jatin > > >>>>>> > > >>>>>> [1] > > >>>>>> > > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. > > >>>>>> txt [2] > > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a > > >>>>>> vx > > >>>>>> 2_ > > >>>>>> asm > > >>>>>> .txt [3] > > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n > > >>>>>> ew > > >>>>>> _p > > >>>>>> atc > > >>>>>> h.txt > > >>>>>> > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: Vladimir Ivanov > > >>>>>>> Sent: Saturday, July 18, 2020 12:25 AM > > >>>>>>> To: Bhateja, Jatin ; Andrew Haley > > >>>>>>> > > >>>>>>> Cc: Viswanathan, Sandhya ; > > >>>>>>> hotspot-compiler- dev at openjdk.java.net > > >>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API > > >>>>>>> intrinsification for > > >>>>>>> X86 > > >>>>>>> > > >>>>>>> Hi Jatin, > > >>>>>>> > > >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ > > >>>>>>> > > >>>>>>> It definitely looks better, but IMO it hasn't reached the > > >>>>>>> sweet spot > > >>>>> yet. > > >>>>>>> It feels like the focus is on auto-vectorizer while the burden > > >>>>>>> is put on scalar cases. > > >>>>>>> > > >>>>>>> First of all, considering GVN folds relevant operation > > >>>>>>> patterns into a single Rotate node now, what's the motivation > > >>>>>>> to introduce intrinsics? > > >>>>>>> > > >>>>>>> Another point is there's still significant duplication for > > >>>>>>> scalar cases. > > >>>>>>> > > >>>>>>> I'd prefer to see the legacy cases which rely on pattern > > >>>>>>> matching to go away and be substituted with instructions which > > >>>>>>> match Rotate instructions (migrating ). > > >>>>>>> > > >>>>>>> I understand that it will penalize the vectorization > > >>>>>>> implementation, but IMO reducing overall complexity is worth it. > > >>>>>>> On auto-vectorizer side, I see > > >>>>>>> 2 ways to fix it: > > >>>>>>> > > >>>>>>> ???? (1) introduce additional AD instructions for > > >>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; > > >>>>>>> > > >>>>>>> ???? (2) in SuperWord::output(), when matcher doesn't support > > >>>>>>> RotateLeftV/RotateLeftV nodes > > >>>>>>> (Matcher::match_rule_supported()), > > >>>>>>> generate vectorized version of the original pattern. > > >>>>>>> > > >>>>>>> Overall, it looks like more and more focus is made on scalar > part. > > >>>>>>> Considering the main goal of the patch is to enable > > >>>>>>> vectorization, I'm fine with separating cleanup of scalar part. > > >>>>>>> As an interim solution, it seems that leaving the scalar part > > >>>>>>> as it is now and matching scalar bit rotate pattern in > > >>>>>>> VectorNode::is_rotate() should be enough to keep the > > >>>>>>> vectorization part functioning. Then scalar Rotate nodes and > > relevant cleanups can be integrated later. > > >>>>>>> (Or vice > > >>>>>>> versa: clean up scalar part first and then follow up with > > >>>>>>> vectorization.) > > >>>>>>> > > >>>>>>> Some other comments: > > >>>>>>> > > >>>>>>> * There's a lot of duplication between OrINode::Ideal and > > >>>>> OrLNode::Ideal. > > >>>>>>> What do you think about introducing a super type > > >>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? > > >>>>>>> > > >>>>>>> > > >>>>>>> * src/hotspot/cpu/x86/x86.ad > > >>>>>>> > > >>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ > > >>>>>>> +? 
predicate(n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== > > >>>>>>> T_INT > > >>>>> || > > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== T_LONG); > > >>>>>>> > > >>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ > > >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== > > >>>>>>> T_INT > > >>>>> || > > >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() > > >>>>>>> +== T_LONG); > > >>>>>>> > > >>>>>>> The predicates are redundant here. > > >>>>>>> > > >>>>>>> > > >>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > > >>>>>>> > > >>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType > > >>>>>>> +etype, > > >>>>>>> XMMRegister dst, XMMRegister src, > > >>>>>>> +???????????????????????????????????? int shift, int > > >>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { > > >>>>>>> +??? if (etype == T_INT) { > > >>>>>>> +????? evprold(dst, src, shift, vector_len); > > >>>>>>> +??? } else { > > >>>>>>> +????? evprolq(dst, src, shift, vector_len); > > >>>>>>> +??? } > > >>>>>>> > > >>>>>>> Please, put an assert for the false case (assert(etype == > > >>>>>>> T_LONG, > > >>>>> "...")). > > >>>>>>> > > >>>>>>> > > >>>>>>> * On testing (with previous version of the patch): -XX:UseAVX > > >>>>>>> is > > >>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 > > >> platforms. > > >>>>>>> Either omitting the flag or adding > > >>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. > > >>>>>>> > > >>>>>>> Best regards, > > >>>>>>> Vladimir Ivanov > > >>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> Summary of changes: > > >>>>>>>> 1) Optimization is specifically targeted to exploit vector > > >>>>>>>> rotation > > >>>>>>> instruction added for X86 AVX512. A single rotate instruction > > >>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers > > >>>>>>> better latency at reduced instruction count. > > >>>>>>>> > > >>>>>>>> 2) There were two approaches to implement this: > > >>>>>>>> ?????? a)? Let everything remain the same and add new wide > > >>>>>>>> complex > > >>>>>>> instruction patterns in the matcher for e.g. > > >>>>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary > > >>>>>>>> ReplicateI > > >>>>>>>> shift)) > > >>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( > > >>>>>>> Replicate > > >>>>>>> shift)) > > >>>>>>>> ?????? It would have been an overoptimistic assumption to > > >>>>>>>> expect that graph > > >>>>>>> shape would be preserved till the matcher for correct > inferencing. > > >>>>>>>> ?????? In addition we would have required multiple such > > >>>>>>>> bulky patterns. > > >>>>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, > > >>>>>>>> these gets > > >>>>>>> generated during intrinsification as well as during additional > > >>>>>>> pattern > > >>>>>>>> ?????? matching during node Idealization, later on these > > >>>>>>>> nodes are consumed > > >>>>>>> by SLP for valid vectorization scenarios to emit their vector > > >>>>>>>> ?????? counterparts which eventually emits vector rotates. 
> > >>>>>>>> > > >>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here > > >>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate > > >>>>>>>> nodes should either be > > >>>>>>> dismantled back to OR/SHIFT pattern or we penalize the > > >>>>>>> vectorization which would be very costly, other option would > > >>>>>>> have been to add additional vector rotate pattern for UseAVX=3 > > >>>>>>> in the matcher which emit vector OR-SHIFTs instruction but > > >>>>>>> then it will loose on emitting efficient instruction sequence > > >>>>>>> which node sharing > > >>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus > > >>>>>>> it will not be beneficial for non-AVX512 targets, only saving > > >>>>>>> will be in terms of cleanup of few existing scalar rotate > > >>>>>>> matcher patterns, also old targets does not offer this > > >>>>>>> powerful rotate > > >> instruction. > > >>>>>>> Therefore new scalar nodes are created only for AVX512 targets. > > >>>>>>>> > > >>>>>>>> As per suggestions constant folding scenarios have been > > >>>>>>>> covered during > > >>>>>>> Idealizations of newly added scalar nodes. > > >>>>>>>> > > >>>>>>>> Please review the latest version and share your feedback and > > >>>>>>>> test > > >>>>>>> results. > > >>>>>>>> > > >>>>>>>> Best Regards, > > >>>>>>>> Jatin > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>> -----Original Message----- > > >>>>>>>>> From: Andrew Haley > > >>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM > > >>>>>>>>> To: Vladimir Ivanov ; Bhateja, > > >>>>>>>>> Jatin ; > > >>>>>>>>> hotspot-compiler-dev at openjdk.java.net > > >>>>>>>>> Cc: Viswanathan, Sandhya > > >>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API > > >>>>>>>>> intrinsification for > > >>>>>>>>> X86 > > >>>>>>>>> > > >>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: > > >>>>>>>>> > > >>>>>>>>> ??? > High-level comment: so far, there were no pressing > > >>>>>>>>> need in > > >>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL > > >>>>>>>>> instructions > > >>>>>>>>>> were selected during matching [1]. Now the patch introduces > > >>>>>>>>>> > > > >>>>>>>>> dedicated nodes > > >>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > > > >>>>>>>>> which partly duplicates existing logic. > > >>>>>>>>> > > >>>>>>>>> The lack of rotate nodes in the IR has always meant that > > >>>>>>>>> AArch64 doesn't generate optimal code for e.g. > > >>>>>>>>> > > >>>>>>>>> ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) > > >>>>>>>>> > > >>>>>>>>> because, with the RotateLeft expanded to its full > > >>>>>>>>> combination of ORs and shifts, it's to complicated to match. > > >>>>>>>>> At the time I put this to one side because it wasn't urgent. > > >>>>>>>>> This is a shame because although such combinations are > > >>>>>>>>> unusual they are used in some crypto > > >>>>> operations. > > >>>>>>>>> > > >>>>>>>>> If we can generate immediate-form rotate nodes early by > > >>>>>>>>> pattern matching during parsing (rather than depending on > > >>>>>>>>> intrinsics) we'll get more value than by depending on > > >>>>>>>>> programmers calling > > >> intrinsics. > > >>>>>>>>> > > >>>>>>>>> -- > > >>>>>>>>> Andrew Haley? (he/him) > > >>>>>>>>> Java Platform Lead Engineer > > >>>>>>>>> Red Hat UK Ltd. 
> > >>>>>>>>> https://keybase.io/andrewhaley > > >>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > >>>>>>>> From vladimir.x.ivanov at oracle.com Fri Aug 7 12:15:12 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 7 Aug 2020 15:15:12 +0300 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> Message-ID: <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com> >> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ Still looks good. It would be nice to get one more (R)eview. Let's wait a little bit more. Best regards, Vladimir Ivanov >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Saturday, August 1, 2020 4:49 AM >>> To: Bhateja, Jatin >>> Cc: Viswanathan, Sandhya ; >>> hotspot-compiler- dev at openjdk.java.net >>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>> X86 >>> >>> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ >>> >>> Looks good. >>> >>> Tier5 (where I saw the crashes) passed. >>> >>> Please, incorporate the following minor cleanups in the final version: >>> >>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu >>> p/ >>> >>> (Tested with hs-tier1,hs-tier2.) >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>>> -----Original Message----- >>>>> From: Vladimir Ivanov >>>>> Sent: Thursday, July 30, 2020 3:30 AM >>>>> To: Bhateja, Jatin >>>>> Cc: Viswanathan, Sandhya ; >>>>> hotspot-compiler- dev at openjdk.java.net >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>> for >>>>> X86 >>>>> >>>>> >>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ >>>>>> >>>>>> Looks good. (Testing is in progress.) >>>>> >>>>> FYI test results are clean (tier1-tier5). >>>>> >>>>>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines >>>>>>> since we are anyways doing constant folding in LShiftI/URShiftI >>>>>>> value routines. Since JAVA rotate APIs are no longer intrincified >>>>>>> hence these routines may no longer be useful. >>>>>> >>>>>> Nice observation! Good. >>>>> >>>>> As a second thought, it seems there's still a chance left that >>>>> Rotate nodes get their input type narrowed after the folding >>>>> happened. For example, as a result of incremental inlining or CFG >>>>> transformations during loop optimizations. And it does happen in >>>>> practice since the testing revealed some crashes due to the bug in >>> RotateLeftNode/RotateRightNode::Ideal(). >>>>> >>>>> So, it makes sense to keep the transformations. But I'm fine with >>>>> addressing that as a followup enhancement. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> >>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>> way (as a cleanup). >>>>>>> >>>>>>> I guess you are saying remove opcodes/encoding from patterns and >>>>>>> move then to Assembler, Can we take this cleanup activity >>>>>>> separately since other patterns are also using these matcher >>> directives. >>>>>> >>>>>> I'm perfectly fine with handling it as a separate enhancement. >>>>>> >>>>>>> Other synthetic comments have been taken care of. I have extended >>>>>>> the Test to cover all the newly added scalar transforms. Kindly >>>>>>> let me know if there other comments. >>>>>> >>>>>> Nice! 
>>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Vladimir Ivanov >>>>>>>> Sent: Friday, July 24, 2020 3:21 AM >>>>>>>> To: Bhateja, Jatin >>>>>>>> Cc: Viswanathan, Sandhya ; Andrew >>>>>>>> Haley ; hotspot-compiler-dev at openjdk.java.net >>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>>>> for >>>>>>>> X86 >>>>>>>> >>>>>>>> Hi Jatin, >>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>>>>>>> >>>>>>>> Much better! Thanks. >>>>>>>> >>>>>>>>> Change Summary: >>>>>>>>> >>>>>>>>> 1) Unified the handling for scalar rotate operation. All scalar >>>>>>>>> rotate >>>>>>>> selection patterns are now dependent on newly created >>>>>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>>>>>>> Currently >>>>>>>> if DAG nodes corresponding to a sub-pattern are shared (have >>>>>>>> multiple >>>>>>>> users) then existing complex patterns based on >>>>>>>> Or/LShiftL/URShift does not get matched and this prevents inferring >> rotate nodes. >>>>>>>> Please refer to JIT'ed assembly output with baseline[1] and with >>>>>>>> patch[2] . We can see that generated code size also went done >>>>>>>> from >>>>>>>> 832 byte to 768 bytes. Also this can cause perf degradation if >>>>>>>> shift-or dependency chain appears inside a hot region. >>>>>>>>> >>>>>>>>> 2) Due to enhanced rotate inferencing new patch shows better >>>>>>>>> performance >>>>>>>> even for legacy targets (non AVX-512). Please refer to the perf >>>>>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. >>>>>>>> >>>>>>>> Very nice! >>>>>>>>> 3) As suggested, removed Java API intrinsification changes and >>>>>>>>> scalar >>>>>>>> rotate transformation are done during OrI/OrL node idealizations. >>>>>>>> >>>>>>>> Good. >>>>>>>> >>>>>>>> (Still would be nice to factor the matching code from Ideal() >>>>>>>> and share it between multiple use sites. Especially considering >>>>>>>> OrVNode::Ideal() now does basically the same thing. As an >>>>>>>> example/idea, take a look at >>>>>>>> is_bmi_pattern() in x86.ad.) >>>>>>>> >>>>>>>>> 4) SLP always gets to work on new scalar Rotate nodes and >>>>>>>>> creates vector >>>>>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV >>>>>>>> nodes if target does not supports vector rotates(non-AVX512). >>>>>>>> >>>>>>>> Good. >>>>>>>> >>>>>>>>> 5) Added new instruction patterns for vector shift Left/Right >>>>>>>>> operations >>>>>>>> with constant shift operands. This prevents emitting extra moves >>>>>>>> to >>>>> XMM. >>>>>>>> >>>>>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >>>>>>>> +? match(Set dst (LShiftVI src shift)); >>>>>>>> >>>>>>>> I'd prefer to see a uniform Ideal IR shape being used >>>>>>>> irrespective of whether the argument is a constant or not. It >>>>>>>> should also simplify the logic in SuperWord and make it easier >>>>>>>> to support on >>>>>>>> non-x86 architectures. >>>>>>>> >>>>>>>> For example, here's how it is done on AArch64: >>>>>>>> >>>>>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >>>>>>>> ??? predicate(n->as_Vector()->length() == 4); >>>>>>>> ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... >>>>>>>> >>>>>>>>> 6) Constant folding scenarios are covered in >>>>>>>>> RotateLeft/RotateRight >>>>>>>> idealization, inferencing of vector rotate through OrV >>>>>>>> idealization covers the vector patterns generated though non SLP >>> route i.e. >>>>>>>> VectorAPI. 
>>>>>>>> >>>>>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the >>>>>>>> general direction here - duplication of scalar transformations >>>>>>>> to lane-wise vector operations. It definitely won't scale and in >>>>>>>> a longer run it risks to diverge. Would be nice to find a way to >>>>>>>> automatically "lift" >>>>>>>> scalar transformations to vectors and apply them uniformly. But >>>>>>>> right now it is just an idea which requires more experimentation. >>>>>>>> >>>>>>>> >>>>>>>> Some other minor comments/suggestions: >>>>>>>> >>>>>>>> +? // Swap the computed left and right shift counts. >>>>>>>> +? if (is_rotate_left) { >>>>>>>> +??? Node* temp = shiftRCnt; >>>>>>>> +??? shiftRCnt? = shiftLCnt; >>>>>>>> +??? shiftLCnt? = temp; >>>>>>>> +? } >>>>>>>> >>>>>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? >>>>>>>> >>>>>>>> >>>>>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >>>>>>>> +??? return true; >>>>>>>> >>>>>>>> Please, don't omit curly braces (even for simple cases). >>>>>>>> >>>>>>>> >>>>>>>> -// Rotate Right by variable >>>>>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, >>>>>>>> immI0 zero, rFlagsReg cr) >>>>>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg >>>>>>>> +cr) >>>>>>>> ?? %{ >>>>>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI >>>>>>>> zero shift)))); >>>>>>>> - >>>>>>>> +? predicate(!VM_Version::supports_bmi2() && >>>>>>>> n->bottom_type()->basic_type() == T_INT); >>>>>>>> +? match(Set dst (RotateRight dst shift)); >>>>>>>> +? format %{ "rorl???? $dst, $shift" %} >>>>>>>> ???? expand %{ >>>>>>>> -??? rorI_rReg_CL(dst, shift, cr); >>>>>>>> +??? rorI_rReg_imm8(dst, shift, cr); >>>>>>>> ???? %} >>>>>>>> >>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>> way (as a cleanup). >>>>>>>> >>>>>>>>> Please push the patch through your testing framework and let me >>>>>>>>> know your >>>>>>>> review feedback. >>>>>>>> >>>>>>>> There's one new assertion failure: >>>>>>>> >>>>>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >>>>>>>> pid=5476, tid=6219 >>>>>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize >>>>>>>> should return new nodes, use Identity to return old nodes >>>>>>>> >>>>>>>> I believe it comes from >>>>>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal >>>>>>>> which can return pre-contructed constants. I suggest to get rid >>>>>>>> of >>>>>>>> Ideal() methods and move constant folding logic into >>>>>>>> Node::Value() (as implemented for other bitwise/arithmethic >>>>>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more >>>>>>>> generic approach since it enables richer type information >>>>>>>> (ranges vs >>>>>>>> constants) and IMO it's more convenient to work with constants >>>>>>>> through Types than ConNodes. >>>>>>>> >>>>>>>> (I suspect that original/expanded IR shape may already provide >>>>>>>> more precise type info for non-constant case which can affect >>>>>>>> the >>>>>>>> benchmarks.) >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Vladimir Ivanov >>>>>>>> >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Jatin >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. 
>>>>>>>>> txt [2] >>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a >>>>>>>>> vx >>>>>>>>> 2_ >>>>>>>>> asm >>>>>>>>> .txt [3] >>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n >>>>>>>>> ew >>>>>>>>> _p >>>>>>>>> atc >>>>>>>>> h.txt >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Vladimir Ivanov >>>>>>>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>>>>>>> To: Bhateja, Jatin ; Andrew Haley >>>>>>>>>> >>>>>>>>>> Cc: Viswanathan, Sandhya ; >>>>>>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API >>>>>>>>>> intrinsification for >>>>>>>>>> X86 >>>>>>>>>> >>>>>>>>>> Hi Jatin, >>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>>>>>>> >>>>>>>>>> It definitely looks better, but IMO it hasn't reached the >>>>>>>>>> sweet spot >>>>>>>> yet. >>>>>>>>>> It feels like the focus is on auto-vectorizer while the burden >>>>>>>>>> is put on scalar cases. >>>>>>>>>> >>>>>>>>>> First of all, considering GVN folds relevant operation >>>>>>>>>> patterns into a single Rotate node now, what's the motivation >>>>>>>>>> to introduce intrinsics? >>>>>>>>>> >>>>>>>>>> Another point is there's still significant duplication for >>>>>>>>>> scalar cases. >>>>>>>>>> >>>>>>>>>> I'd prefer to see the legacy cases which rely on pattern >>>>>>>>>> matching to go away and be substituted with instructions which >>>>>>>>>> match Rotate instructions (migrating ). >>>>>>>>>> >>>>>>>>>> I understand that it will penalize the vectorization >>>>>>>>>> implementation, but IMO reducing overall complexity is worth it. >>>>>>>>>> On auto-vectorizer side, I see >>>>>>>>>> 2 ways to fix it: >>>>>>>>>> >>>>>>>>>> ???? (1) introduce additional AD instructions for >>>>>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>>>>>>> >>>>>>>>>> ???? (2) in SuperWord::output(), when matcher doesn't support >>>>>>>>>> RotateLeftV/RotateLeftV nodes >>>>>>>>>> (Matcher::match_rule_supported()), >>>>>>>>>> generate vectorized version of the original pattern. >>>>>>>>>> >>>>>>>>>> Overall, it looks like more and more focus is made on scalar >> part. >>>>>>>>>> Considering the main goal of the patch is to enable >>>>>>>>>> vectorization, I'm fine with separating cleanup of scalar part. >>>>>>>>>> As an interim solution, it seems that leaving the scalar part >>>>>>>>>> as it is now and matching scalar bit rotate pattern in >>>>>>>>>> VectorNode::is_rotate() should be enough to keep the >>>>>>>>>> vectorization part functioning. Then scalar Rotate nodes and >>> relevant cleanups can be integrated later. >>>>>>>>>> (Or vice >>>>>>>>>> versa: clean up scalar part first and then follow up with >>>>>>>>>> vectorization.) >>>>>>>>>> >>>>>>>>>> Some other comments: >>>>>>>>>> >>>>>>>>>> * There's a lot of duplication between OrINode::Ideal and >>>>>>>> OrLNode::Ideal. >>>>>>>>>> What do you think about introducing a super type >>>>>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> * src/hotspot/cpu/x86/x86.ad >>>>>>>>>> >>>>>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== >>>>>>>>>> T_INT >>>>>>>> || >>>>>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== T_LONG); >>>>>>>>>> >>>>>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>>>>>>> +? 
predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== >>>>>>>>>> T_INT >>>>>>>> || >>>>>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>> +== T_LONG); >>>>>>>>>> >>>>>>>>>> The predicates are redundant here. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>>>>>>> >>>>>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType >>>>>>>>>> +etype, >>>>>>>>>> XMMRegister dst, XMMRegister src, >>>>>>>>>> +???????????????????????????????????? int shift, int >>>>>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { >>>>>>>>>> +??? if (etype == T_INT) { >>>>>>>>>> +????? evprold(dst, src, shift, vector_len); >>>>>>>>>> +??? } else { >>>>>>>>>> +????? evprolq(dst, src, shift, vector_len); >>>>>>>>>> +??? } >>>>>>>>>> >>>>>>>>>> Please, put an assert for the false case (assert(etype == >>>>>>>>>> T_LONG, >>>>>>>> "...")). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> * On testing (with previous version of the patch): -XX:UseAVX >>>>>>>>>> is >>>>>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 >>>>> platforms. >>>>>>>>>> Either omitting the flag or adding >>>>>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Vladimir Ivanov >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Summary of changes: >>>>>>>>>>> 1) Optimization is specifically targeted to exploit vector >>>>>>>>>>> rotation >>>>>>>>>> instruction added for X86 AVX512. A single rotate instruction >>>>>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers >>>>>>>>>> better latency at reduced instruction count. >>>>>>>>>>> >>>>>>>>>>> 2) There were two approaches to implement this: >>>>>>>>>>> ?????? a)? Let everything remain the same and add new wide >>>>>>>>>>> complex >>>>>>>>>> instruction patterns in the matcher for e.g. >>>>>>>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary >>>>>>>>>>> ReplicateI >>>>>>>>>>> shift)) >>>>>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( >>>>>>>>>> Replicate >>>>>>>>>> shift)) >>>>>>>>>>> ?????? It would have been an overoptimistic assumption to >>>>>>>>>>> expect that graph >>>>>>>>>> shape would be preserved till the matcher for correct >> inferencing. >>>>>>>>>>> ?????? In addition we would have required multiple such >>>>>>>>>>> bulky patterns. >>>>>>>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, >>>>>>>>>>> these gets >>>>>>>>>> generated during intrinsification as well as during additional >>>>>>>>>> pattern >>>>>>>>>>> ?????? matching during node Idealization, later on these >>>>>>>>>>> nodes are consumed >>>>>>>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>>>>>>> ?????? counterparts which eventually emits vector rotates. 
>>>>>>>>>>> >>>>>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here >>>>>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate >>>>>>>>>>> nodes should either be >>>>>>>>>> dismantled back to OR/SHIFT pattern or we penalize the >>>>>>>>>> vectorization which would be very costly, other option would >>>>>>>>>> have been to add additional vector rotate pattern for UseAVX=3 >>>>>>>>>> in the matcher which emit vector OR-SHIFTs instruction but >>>>>>>>>> then it will loose on emitting efficient instruction sequence >>>>>>>>>> which node sharing >>>>>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus >>>>>>>>>> it will not be beneficial for non-AVX512 targets, only saving >>>>>>>>>> will be in terms of cleanup of few existing scalar rotate >>>>>>>>>> matcher patterns, also old targets does not offer this >>>>>>>>>> powerful rotate >>>>> instruction. >>>>>>>>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>>>>>>>> >>>>>>>>>>> As per suggestions constant folding scenarios have been >>>>>>>>>>> covered during >>>>>>>>>> Idealizations of newly added scalar nodes. >>>>>>>>>>> >>>>>>>>>>> Please review the latest version and share your feedback and >>>>>>>>>>> test >>>>>>>>>> results. >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Jatin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: Andrew Haley >>>>>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>>>>>>>> To: Vladimir Ivanov ; Bhateja, >>>>>>>>>>>> Jatin ; >>>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API >>>>>>>>>>>> intrinsification for >>>>>>>>>>>> X86 >>>>>>>>>>>> >>>>>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>>>>>>>> >>>>>>>>>>>> ??? > High-level comment: so far, there were no pressing >>>>>>>>>>>> need in >>>>>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL >>>>>>>>>>>> instructions >>>>>>>>>>>>> were selected during matching [1]. Now the patch introduces >>>>>>>>>>>>>> >>>>>>>>>>>> dedicated nodes >>>>>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > >>>>>>>>>>>> which partly duplicates existing logic. >>>>>>>>>>>> >>>>>>>>>>>> The lack of rotate nodes in the IR has always meant that >>>>>>>>>>>> AArch64 doesn't generate optimal code for e.g. >>>>>>>>>>>> >>>>>>>>>>>> ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>>>>>>>> >>>>>>>>>>>> because, with the RotateLeft expanded to its full >>>>>>>>>>>> combination of ORs and shifts, it's to complicated to match. >>>>>>>>>>>> At the time I put this to one side because it wasn't urgent. >>>>>>>>>>>> This is a shame because although such combinations are >>>>>>>>>>>> unusual they are used in some crypto >>>>>>>> operations. >>>>>>>>>>>> >>>>>>>>>>>> If we can generate immediate-form rotate nodes early by >>>>>>>>>>>> pattern matching during parsing (rather than depending on >>>>>>>>>>>> intrinsics) we'll get more value than by depending on >>>>>>>>>>>> programmers calling >>>>> intrinsics. >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Andrew Haley? (he/him) >>>>>>>>>>>> Java Platform Lead Engineer >>>>>>>>>>>> Red Hat UK Ltd. 
>>>>>>>>>>>> https://keybase.io/andrewhaley >>>>>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>>>>>>>> From adinn at redhat.com Fri Aug 7 13:25:09 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 7 Aug 2020 14:25:09 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: Hi Ningsheng, On 31/07/2020 02:41, Ningsheng Jian wrote: > Hi Andrew, > > Thanks a lot!! > > FYI, the latest patch: > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html > > > And some descriptions: > > http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt Thanks for doing such a great job. This is very good work. Also, thanks for splitting the patch up to separate out the different steps -- that was immensely helpful. I have one general query and a small number of detailed comments which are provided separately for each patch. See below. Testing: I was able to test this patch on a loaned Fujitsu FX700. I replicated your results, passing tier1 tests and the jtreg compiler tests in vectorization, codegen, c2/cr6340864 and loopopts. I also eyeballed /some/ of the generated code to check that it looked ok. I'd really like to be able to do that systematically for a comprehensive test suite that exercised every rule but I only had the machine for a few days. This really ought to be done as a follow-up to ensure that all the rules are working as expected. General Comments: Sizing the NEON registers using 8 slots -- even though there might actually be more (or less!) slots in use for a VecA is fine. However, I think this needs a little bit more explanation in the .ad. file (see comments on ra webrev below) I'm ok with your choice to use p7 as an always true predicate register and also how you choose to init and re-init from code defined via the ad file based on C->max_vector_size(). I am not clear why you are choosing to re-init ptrue after certain JVM runtime calls (e.g. when Z calls into the runtime) and not others e.g. when we call a JVM_ENTRY. Could you explain the rationale you have followed here? Specific Comments (feature webrev): globals_aarch64.hpp:102 Just out of interest why does UseSVE have range(0,2)? It seems you are only testing for UseSVE > 0. Does value 2 correspond to an optional subset? Specific Comments (register allocator webrev): aarch64.ad:97-100 Why have you added a reg_def for R8 and R9 here and also to alloc_class chunk0 at lines 544-545? They aren't used by C2 so why define them? assembler_aarch64.hpp:280 (also 699) prf sets a predicate register field. pgrf sets a governing predicate register field. Should the name not be gprf. chaitin.cpp:648-660 The comment is rather oddly formatted. At line 650 you guard the assert with a test for lrg._is_vector. Is that not always going to be guaranteed by the outer condition lrg._is_scalable? If so then you should really assert lrg._is_vector. The special case code for computation of num_regs for a vector stack slot also appears in this file with a slightly different organization in find_first_set (line 1350) and in PhaseChaitin::Select (line 1590). There is another similar case in RegMask::num_registers at regmask.cpp: 98. It would be better to factor out the common code into methods of LRG. 
Maybe using the following?

bool LRG::is_scalable_vector() {
  if (_is_scalable) {
    assert(_is_vector == 1, "scalable LRGs are vector LRGs");
    assert(_num_regs == RegMask::SlotsPerVecA, "sanity");
    return true;
  }
  return false;
}

int LRG::scalable_num_regs() {
  assert(is_scalable_vector(), "sanity");
  if (OptoReg::is_stack(_reg)) {
    return _scalable_reg_slots;
  } else {
    return _num_regs;
  }
}

chaitin.cpp:1350
Once again the test for lrg._is_vector should be guaranteed by the outer test
of lrg._is_scalable. Refactoring using the common methods of LRG as above
ought to help.

chaitin.cpp:1591
Use common method code.

postaloc.cpp:308/323
Once again you should be able to use common method code of LRG here.

regmask.cpp:91
Once again you should be able to use common method code of LRG here.

Specific Comments (c2 webrev):

aarch64.ad:3815
Very nice defensive check!

assembler_aarch64.hpp:2469 & 2699+
Andrew Haley is definitely going to ask you to update function entry
(assembler_aarch64.cpp:76) to call these new instruction generation methods
and then validate the generated code using asm_check. So, I guess you might
as well do that now ;-)

zBarrierSetAssembler_aarch64.cpp:434
Can you explain why we need to check p7 here and not do so in other places
where we call into the JVM? I'm not saying this is wrong. I just want to know
how you decided where re-init of p7 was needed.

superword.cpp:97
Does this mean that if someone sets the maximum vector size to a non-power of
two, such as 384, all superword operations will be bypassed? Including those
which can be done using NEON vectors?

regards,

Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

From Charlie.Gracie at microsoft.com Fri Aug 7 16:19:10 2020
From: Charlie.Gracie at microsoft.com (Charlie Gracie)
Date: Fri, 7 Aug 2020 16:19:10 +0000
Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree
Message-ID: 

Hi,

Please review this change to C2 that removes unused code from InlineTree.
I looked to see which change removed the last use of this code, but as far
back in the history as I could see it was never used.

Bug: https://bugs.openjdk.java.net/browse/JDK-8251303
Webrev: https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/
Sponsor Required: Yes
Test: Built on macOS {release,fastdebug}

Thanks,
Charlie

From vladimir.x.ivanov at oracle.com Fri Aug 7 16:45:56 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 7 Aug 2020 19:45:56 +0300
Subject: RFR: 8250808: Re-associate loop invariants with other associative operations
In-Reply-To: 
References: 
Message-ID: 

> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/

Looks good.

So far, testing results look good (hs-tier1/2 are clean, tier1-4 are in progress).

Best regards,
Vladimir Ivanov

> C2 has re-association of loop invariants. However, the current implementation
> only supports the re-associations for add and subtract with 32-bit integer type.
> For other associative expressions like multiplication and the logic operations,
> the re-association is also applicable, and also for the operations with long type.
>
> This patch adds the missing re-associations for other associative operations
> together with the support for long type.
> > With this patch, the following expressions: > (x * inv1) * inv2 > (x | inv1) | inv2 > (x & inv1) & inv2 > (x ^ inv1) ^ inv2 ; inv1, inv2 are invariants > > can be re-associated to: > x * (inv1 * inv2) ; "inv1 * inv2" can be hoisted > x | (inv1 | inv2) ; "inv1 | inv2" can be hoisted > x & (inv1 & inv2) ; "inv1 & inv2" can be hoisted > x ^ (inv1 ^ inv2) ; "inv1 ^ inv2" can be hoisted > > Performance: > Here is the micro benchmark: > http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java > > And the results on X86_64: > Before: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 988.142 ? 0.110 ns/op > loopInvariantAndInt 1024 avgt 15 843.850 ? 0.522 ns/op > loopInvariantAndLong 1024 avgt 15 990.551 ? 10.458 ns/op > loopInvariantMulInt 1024 avgt 15 1209.003 ? 0.247 ns/op > loopInvariantMulLong 1024 avgt 15 1213.923 ? 0.438 ns/op > loopInvariantOrInt 1024 avgt 15 843.908 ? 0.132 ns/op > loopInvariantOrLong 1024 avgt 15 990.710 ? 10.484 ns/op > loopInvariantSubLong 1024 avgt 15 988.170 ? 0.159 ns/op > loopInvariantXorInt 1024 avgt 15 806.949 ? 7.860 ns/op > loopInvariantXorLong 1024 avgt 15 990.963 ? 8.321 ns/op > > After: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 842.854 ? 9.036 ns/op > loopInvariantAndInt 1024 avgt 15 698.097 ? 0.916 ns/op > loopInvariantAndLong 1024 avgt 15 841.120 ? 0.118 ns/op > loopInvariantMulInt 1024 avgt 15 691.000 ? 7.696 ns/op > loopInvariantMulLong 1024 avgt 15 846.907 ? 0.189 ns/op > loopInvariantOrInt 1024 avgt 15 698.423 ? 4.969 ns/op > loopInvariantOrLong 1024 avgt 15 843.465 ? 10.196 ns/op > loopInvariantSubLong 1024 avgt 15 841.314 ? 2.906 ns/op > loopInvariantXorInt 1024 avgt 15 652.529 ? 0.556 ns/op > loopInvariantXorLong 1024 avgt 15 841.860 ? 2.491 ns/op > > Results on AArch64: > Before: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 514.437 ? 0.351 ns/op > loopInvariantAndInt 1024 avgt 15 435.301 ? 0.415 ns/op > loopInvariantAndLong 1024 avgt 15 572.437 ? 0.057 ns/op > loopInvariantMulInt 1024 avgt 15 1154.544 ? 0.030 ns/op > loopInvariantMulLong 1024 avgt 15 1188.109 ? 0.299 ns/op > loopInvariantOrInt 1024 avgt 15 435.605 ? 0.977 ns/op > loopInvariantOrLong 1024 avgt 15 572.475 ? 0.093 ns/op > loopInvariantSubLong 1024 avgt 15 514.340 ? 0.154 ns/op > loopInvariantXorInt 1024 avgt 15 426.186 ? 0.105 ns/op > loopInvariantXorLong 1024 avgt 15 572.505 ? 0.259 ns/op > > After: > Benchmark (length) Mode Cnt Score Error Units > loopInvariantAddLong 1024 avgt 15 508.179 ? 0.108 ns/op > loopInvariantAndInt 1024 avgt 15 394.706 ? 0.199 ns/op > loopInvariantAndLong 1024 avgt 15 434.443 ? 0.247 ns/op > loopInvariantMulInt 1024 avgt 15 762.477 ? 0.079 ns/op > loopInvariantMulLong 1024 avgt 15 775.975 ? 0.159 ns/op > loopInvariantOrInt 1024 avgt 15 394.657 ? 0.156 ns/op > loopInvariantOrLong 1024 avgt 15 434.428 ? 0.282 ns/op > loopInvariantSubLong 1024 avgt 15 507.475 ? 0.151 ns/op > loopInvariantXorInt 1024 avgt 15 396.000 ? 0.011 ns/op > loopInvariantXorLong 1024 avgt 15 434.255 ? 0.099 ns/op > > Tests: > Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1 > and jcstress:tests-custom, and all tests pass without new failure. 
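For illustration, this is the shape of the hoisting described above, written as a C-like sketch (the function names are made up for the example; the Java loops in the quoted benchmark are analogous):

// Before re-association: both ANDs execute on every iteration.
void and_chain(int* a, int n, int inv1, int inv2) {
  for (int i = 0; i < n; i++) {
    a[i] = (a[i] & inv1) & inv2;
  }
}

// After re-association: (inv1 & inv2) is loop-invariant and is hoisted,
// leaving a single AND per iteration. The same rewrite applies to *, |, ^
// and to the long variants.
void and_chain_hoisted(int* a, int n, int inv1, int inv2) {
  int inv = inv1 & inv2;
  for (int i = 0; i < n; i++) {
    a[i] = a[i] & inv;
  }
}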
> > Thanks, > Xiaohong Gong > From vladimir.x.ivanov at oracle.com Fri Aug 7 16:50:33 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 7 Aug 2020 19:50:33 +0300 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree In-Reply-To: References: Message-ID: > Webrev: https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ Looks good. I'll submit it for testing. Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Aug 7 17:06:26 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 7 Aug 2020 20:06:26 +0300 Subject: [16] (S) RFR 8251260: two MD5 tests fail "RuntimeException: Unexpected count of intrinsic" In-Reply-To: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> References: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> Message-ID: <08098628-7484-f2dc-019a-54a45f37a9c9@oracle.com> > http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ Looks good. Best regards, Vladimir Ivanov > https://bugs.openjdk.java.net/browse/JDK-8251260 > > New MD5 intrinsic tests failed when run with AOTed java.base. And old > SHA tests are problem listed for AOT. > > SHA and MD5 intrinsic tests parse -XX:+LogCompilation output looking for > compilation of sun/security/provider methods as intrinsics. But these > methods are already pre-compiled by AOT when AOTed java.base is used. As > result LogCompilation does not have corresponding entries. > > I think we should not run these MD5 and SHA tests with AOTed java.base > module. I added corresponding @requires. > > Old SHA tests were problem listed referencing 8167430 [1] bug but I > think it is incorrect. The original SHA tests crash with AOT 8207358 [2] > bug was closed as duplicate of 8167430 because of conflict how > intirnsics flags are set by default during AOT compilation. But we > simply should not run these tests with AOTed java.base. So I am adding > @requires to them as well and removing them from AOT problem list. > > Tested hs-tier1, hs-tier2 (runs sha,md5 tests), hs-tier6 (now skips > sha,md5 tests when AOTed java.base is used). > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8167430 > [2] https://bugs.openjdk.java.net/browse/JDK-8207358 From vladimir.kozlov at oracle.com Fri Aug 7 17:08:17 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Aug 2020 10:08:17 -0700 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com> References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com> Message-ID: <9748e2ee-f47d-7c47-627d-58e7d98e1779@oracle.com> I see that you already discussed removal of opcodes/encoding from patterns in .ad file and move them to Assembler. I would like to see that too. Changes look good otherwise. Thank you for adding tests to verify new code. Thanks, Vladimir K On 8/7/20 5:15 AM, Vladimir Ivanov wrote: > >>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ > > Still looks good. > > It would be nice to get one more (R)eview. Let's wait a little bit more. 
> > Best regards, > Vladimir Ivanov > >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Saturday, August 1, 2020 4:49 AM >>>> To: Bhateja, Jatin >>>> Cc: Viswanathan, Sandhya ; >>>> hotspot-compiler- dev at openjdk.java.net >>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>>> X86 >>>> >>>> >>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ >>>> >>>> Looks good. >>>> >>>> Tier5 (where I saw the crashes) passed. >>>> >>>> Please, incorporate the following minor cleanups in the final version: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu >>>> p/ >>>> >>>> (Tested with hs-tier1,hs-tier2.) >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Ivanov >>>>>> Sent: Thursday, July 30, 2020 3:30 AM >>>>>> To: Bhateja, Jatin >>>>>> Cc: Viswanathan, Sandhya ; >>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>> for >>>>>> X86 >>>>>> >>>>>> >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ >>>>>>> >>>>>>> Looks good. (Testing is in progress.) >>>>>> >>>>>> FYI test results are clean (tier1-tier5). >>>>>> >>>>>>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines >>>>>>>> since we are anyways doing constant folding in LShiftI/URShiftI >>>>>>>> value routines. Since JAVA rotate APIs are no longer intrincified >>>>>>>> hence these routines may no longer be useful. >>>>>>> >>>>>>> Nice observation! Good. >>>>>> >>>>>> As a second thought, it seems there's still a chance left that >>>>>> Rotate nodes get their input type narrowed after the folding >>>>>> happened. For example, as a result of incremental inlining or CFG >>>>>> transformations during loop optimizations. And it does happen in >>>>>> practice since the testing revealed some crashes due to the bug in >>>> RotateLeftNode/RotateRightNode::Ideal(). >>>>>> >>>>>> So, it makes sense to keep the transformations. But I'm fine with >>>>>> addressing that as a followup enhancement. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>> >>>>>>>> I guess you are saying remove opcodes/encoding from patterns and >>>>>>>> move then to Assembler, Can we take this cleanup activity >>>>>>>> separately since other patterns are also using these matcher >>>> directives. >>>>>>> >>>>>>> I'm perfectly fine with handling it as a separate enhancement. >>>>>>> >>>>>>>> Other synthetic comments have been taken care of. I have extended >>>>>>>> the Test to cover all the newly added scalar transforms. Kindly >>>>>>>> let me know if there other comments. >>>>>>> >>>>>>> Nice! >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Vladimir Ivanov >>>>>>>>> Sent: Friday, July 24, 2020 3:21 AM >>>>>>>>> To: Bhateja, Jatin >>>>>>>>> Cc: Viswanathan, Sandhya ; Andrew >>>>>>>>> Haley ; hotspot-compiler-dev at openjdk.java.net >>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>>>>> for >>>>>>>>> X86 >>>>>>>>> >>>>>>>>> Hi Jatin, >>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>>>>>>>> >>>>>>>>> Much better! Thanks. >>>>>>>>> >>>>>>>>>> Change Summary: >>>>>>>>>> >>>>>>>>>> 1) Unified the handling for scalar rotate operation. 
All scalar >>>>>>>>>> rotate >>>>>>>>> selection patterns are now dependent on newly created >>>>>>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>>>>>>>> Currently >>>>>>>>> if DAG nodes corresponding to a sub-pattern are shared (have >>>>>>>>> multiple >>>>>>>>> users) then existing complex patterns based on >>>>>>>>> Or/LShiftL/URShift does not get matched and this prevents inferring >>> rotate nodes. >>>>>>>>> Please refer to JIT'ed assembly output with baseline[1] and with >>>>>>>>> patch[2] . We can see that generated code size also went done >>>>>>>>> from >>>>>>>>> 832 byte to 768 bytes. Also this can cause perf degradation if >>>>>>>>> shift-or dependency chain appears inside a hot region. >>>>>>>>>> >>>>>>>>>> 2) Due to enhanced rotate inferencing new patch shows better >>>>>>>>>> performance >>>>>>>>> even for legacy targets (non AVX-512). Please refer to the perf >>>>>>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. >>>>>>>>> >>>>>>>>> Very nice! >>>>>>>>>> 3) As suggested, removed Java API intrinsification changes and >>>>>>>>>> scalar >>>>>>>>> rotate transformation are done during OrI/OrL node idealizations. >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>> (Still would be nice to factor the matching code from Ideal() >>>>>>>>> and share it between multiple use sites. Especially considering >>>>>>>>> OrVNode::Ideal() now does basically the same thing. As an >>>>>>>>> example/idea, take a look at >>>>>>>>> is_bmi_pattern() in x86.ad.) >>>>>>>>> >>>>>>>>>> 4) SLP always gets to work on new scalar Rotate nodes and >>>>>>>>>> creates vector >>>>>>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV >>>>>>>>> nodes if target does not supports vector rotates(non-AVX512). >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>>> 5) Added new instruction patterns for vector shift Left/Right >>>>>>>>>> operations >>>>>>>>> with constant shift operands. This prevents emitting extra moves >>>>>>>>> to >>>>>> XMM. >>>>>>>>> >>>>>>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >>>>>>>>> +? match(Set dst (LShiftVI src shift)); >>>>>>>>> >>>>>>>>> I'd prefer to see a uniform Ideal IR shape being used >>>>>>>>> irrespective of whether the argument is a constant or not. It >>>>>>>>> should also simplify the logic in SuperWord and make it easier >>>>>>>>> to support on >>>>>>>>> non-x86 architectures. >>>>>>>>> >>>>>>>>> For example, here's how it is done on AArch64: >>>>>>>>> >>>>>>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >>>>>>>>> ? ??? predicate(n->as_Vector()->length() == 4); >>>>>>>>> ? ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... >>>>>>>>> >>>>>>>>>> 6) Constant folding scenarios are covered in >>>>>>>>>> RotateLeft/RotateRight >>>>>>>>> idealization, inferencing of vector rotate through OrV >>>>>>>>> idealization covers the vector patterns generated though non SLP >>>> route i.e. >>>>>>>>> VectorAPI. >>>>>>>>> >>>>>>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the >>>>>>>>> general direction here - duplication of scalar transformations >>>>>>>>> to lane-wise vector operations. It definitely won't scale and in >>>>>>>>> a longer run it risks to diverge. Would be nice to find a way to >>>>>>>>> automatically "lift" >>>>>>>>> scalar transformations to vectors and apply them uniformly. But >>>>>>>>> right now it is just an idea which requires more experimentation. >>>>>>>>> >>>>>>>>> >>>>>>>>> Some other minor comments/suggestions: >>>>>>>>> >>>>>>>>> +? 
// Swap the computed left and right shift counts. >>>>>>>>> +? if (is_rotate_left) { >>>>>>>>> +??? Node* temp = shiftRCnt; >>>>>>>>> +??? shiftRCnt? = shiftLCnt; >>>>>>>>> +??? shiftLCnt? = temp; >>>>>>>>> +? } >>>>>>>>> >>>>>>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? >>>>>>>>> >>>>>>>>> >>>>>>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >>>>>>>>> +??? return true; >>>>>>>>> >>>>>>>>> Please, don't omit curly braces (even for simple cases). >>>>>>>>> >>>>>>>>> >>>>>>>>> -// Rotate Right by variable >>>>>>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, >>>>>>>>> immI0 zero, rFlagsReg cr) >>>>>>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg >>>>>>>>> +cr) >>>>>>>>> ? ?? %{ >>>>>>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI >>>>>>>>> zero shift)))); >>>>>>>>> - >>>>>>>>> +? predicate(!VM_Version::supports_bmi2() && >>>>>>>>> n->bottom_type()->basic_type() == T_INT); >>>>>>>>> +? match(Set dst (RotateRight dst shift)); >>>>>>>>> +? format %{ "rorl???? $dst, $shift" %} >>>>>>>>> ? ???? expand %{ >>>>>>>>> -??? rorI_rReg_CL(dst, shift, cr); >>>>>>>>> +??? rorI_rReg_imm8(dst, shift, cr); >>>>>>>>> ? ???? %} >>>>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>>> >>>>>>>>>> Please push the patch through your testing framework and let me >>>>>>>>>> know your >>>>>>>>> review feedback. >>>>>>>>> >>>>>>>>> There's one new assertion failure: >>>>>>>>> >>>>>>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >>>>>>>>> pid=5476, tid=6219 >>>>>>>>> #? assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize >>>>>>>>> should return new nodes, use Identity to return old nodes >>>>>>>>> >>>>>>>>> I believe it comes from >>>>>>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal >>>>>>>>> which can return pre-contructed constants. I suggest to get rid >>>>>>>>> of >>>>>>>>> Ideal() methods and move constant folding logic into >>>>>>>>> Node::Value() (as implemented for other bitwise/arithmethic >>>>>>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more >>>>>>>>> generic approach since it enables richer type information >>>>>>>>> (ranges vs >>>>>>>>> constants) and IMO it's more convenient to work with constants >>>>>>>>> through Types than ConNodes. >>>>>>>>> >>>>>>>>> (I suspect that original/expanded IR shape may already provide >>>>>>>>> more precise type info for non-constant case which can affect >>>>>>>>> the >>>>>>>>> benchmarks.) >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Jatin >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. 
>>>>>>>>>> txt [2] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a >>>>>>>>>> vx >>>>>>>>>> 2_ >>>>>>>>>> asm >>>>>>>>>> .txt [3] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n >>>>>>>>>> ew >>>>>>>>>> _p >>>>>>>>>> atc >>>>>>>>>> h.txt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Vladimir Ivanov >>>>>>>>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>>>>>>>> To: Bhateja, Jatin ; Andrew Haley >>>>>>>>>>> >>>>>>>>>>> Cc: Viswanathan, Sandhya ; >>>>>>>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API >>>>>>>>>>> intrinsification for >>>>>>>>>>> X86 >>>>>>>>>>> >>>>>>>>>>> Hi Jatin, >>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>>>>>>>> >>>>>>>>>>> It definitely looks better, but IMO it hasn't reached the >>>>>>>>>>> sweet spot >>>>>>>>> yet. >>>>>>>>>>> It feels like the focus is on auto-vectorizer while the burden >>>>>>>>>>> is put on scalar cases. >>>>>>>>>>> >>>>>>>>>>> First of all, considering GVN folds relevant operation >>>>>>>>>>> patterns into a single Rotate node now, what's the motivation >>>>>>>>>>> to introduce intrinsics? >>>>>>>>>>> >>>>>>>>>>> Another point is there's still significant duplication for >>>>>>>>>>> scalar cases. >>>>>>>>>>> >>>>>>>>>>> I'd prefer to see the legacy cases which rely on pattern >>>>>>>>>>> matching to go away and be substituted with instructions which >>>>>>>>>>> match Rotate instructions (migrating ). >>>>>>>>>>> >>>>>>>>>>> I understand that it will penalize the vectorization >>>>>>>>>>> implementation, but IMO reducing overall complexity is worth it. >>>>>>>>>>> On auto-vectorizer side, I see >>>>>>>>>>> 2 ways to fix it: >>>>>>>>>>> >>>>>>>>>>> ? ???? (1) introduce additional AD instructions for >>>>>>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>>>>>>>> >>>>>>>>>>> ? ???? (2) in SuperWord::output(), when matcher doesn't support >>>>>>>>>>> RotateLeftV/RotateLeftV nodes >>>>>>>>>>> (Matcher::match_rule_supported()), >>>>>>>>>>> generate vectorized version of the original pattern. >>>>>>>>>>> >>>>>>>>>>> Overall, it looks like more and more focus is made on scalar >>> part. >>>>>>>>>>> Considering the main goal of the patch is to enable >>>>>>>>>>> vectorization, I'm fine with separating cleanup of scalar part. >>>>>>>>>>> As an interim solution, it seems that leaving the scalar part >>>>>>>>>>> as it is now and matching scalar bit rotate pattern in >>>>>>>>>>> VectorNode::is_rotate() should be enough to keep the >>>>>>>>>>> vectorization part functioning. Then scalar Rotate nodes and >>>> relevant cleanups can be integrated later. >>>>>>>>>>> (Or vice >>>>>>>>>>> versa: clean up scalar part first and then follow up with >>>>>>>>>>> vectorization.) >>>>>>>>>>> >>>>>>>>>>> Some other comments: >>>>>>>>>>> >>>>>>>>>>> * There's a lot of duplication between OrINode::Ideal and >>>>>>>>> OrLNode::Ideal. >>>>>>>>>>> What do you think about introducing a super type >>>>>>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/x86.ad >>>>>>>>>>> >>>>>>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>>>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> +??????????? 
n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>>>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> The predicates are redundant here. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>>>>>>>> >>>>>>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType >>>>>>>>>>> +etype, >>>>>>>>>>> XMMRegister dst, XMMRegister src, >>>>>>>>>>> +???????????????????????????????????? int shift, int >>>>>>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { >>>>>>>>>>> +??? if (etype == T_INT) { >>>>>>>>>>> +????? evprold(dst, src, shift, vector_len); >>>>>>>>>>> +??? } else { >>>>>>>>>>> +????? evprolq(dst, src, shift, vector_len); >>>>>>>>>>> +??? } >>>>>>>>>>> >>>>>>>>>>> Please, put an assert for the false case (assert(etype == >>>>>>>>>>> T_LONG, >>>>>>>>> "...")). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * On testing (with previous version of the patch): -XX:UseAVX >>>>>>>>>>> is >>>>>>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 >>>>>> platforms. >>>>>>>>>>> Either omitting the flag or adding >>>>>>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Vladimir Ivanov >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Summary of changes: >>>>>>>>>>>> 1) Optimization is specifically targeted to exploit vector >>>>>>>>>>>> rotation >>>>>>>>>>> instruction added for X86 AVX512. A single rotate instruction >>>>>>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers >>>>>>>>>>> better latency at reduced instruction count. >>>>>>>>>>>> >>>>>>>>>>>> 2) There were two approaches to implement this: >>>>>>>>>>>> ? ?????? a)? Let everything remain the same and add new wide >>>>>>>>>>>> complex >>>>>>>>>>> instruction patterns in the matcher for e.g. >>>>>>>>>>>> ? ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary >>>>>>>>>>>> ReplicateI >>>>>>>>>>>> shift)) >>>>>>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( >>>>>>>>>>> Replicate >>>>>>>>>>> shift)) >>>>>>>>>>>> ? ?????? It would have been an overoptimistic assumption to >>>>>>>>>>>> expect that graph >>>>>>>>>>> shape would be preserved till the matcher for correct >>> inferencing. >>>>>>>>>>>> ? ?????? In addition we would have required multiple such >>>>>>>>>>>> bulky patterns. >>>>>>>>>>>> ? ?????? b) Create new RotateLeft/RotateRight scalar nodes, >>>>>>>>>>>> these gets >>>>>>>>>>> generated during intrinsification as well as during additional >>>>>>>>>>> pattern >>>>>>>>>>>> ? ?????? matching during node Idealization, later on these >>>>>>>>>>>> nodes are consumed >>>>>>>>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>>>>>>>> ? ?????? counterparts which eventually emits vector rotates. 
>>>>>>>>>>>> >>>>>>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here >>>>>>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate >>>>>>>>>>>> nodes should either be >>>>>>>>>>> dismantled back to OR/SHIFT pattern or we penalize the >>>>>>>>>>> vectorization which would be very costly, other option would >>>>>>>>>>> have been to add additional vector rotate pattern for UseAVX=3 >>>>>>>>>>> in the matcher which emit vector OR-SHIFTs instruction but >>>>>>>>>>> then it will loose on emitting efficient instruction sequence >>>>>>>>>>> which node sharing >>>>>>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus >>>>>>>>>>> it will not be beneficial for non-AVX512 targets, only saving >>>>>>>>>>> will be in terms of cleanup of few existing scalar rotate >>>>>>>>>>> matcher patterns, also old targets does not offer this >>>>>>>>>>> powerful rotate >>>>>> instruction. >>>>>>>>>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>>>>>>>>> >>>>>>>>>>>> As per suggestions constant folding scenarios have been >>>>>>>>>>>> covered during >>>>>>>>>>> Idealizations of newly added scalar nodes. >>>>>>>>>>>> >>>>>>>>>>>> Please review the latest version and share your feedback and >>>>>>>>>>>> test >>>>>>>>>>> results. >>>>>>>>>>>> >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Jatin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: Andrew Haley >>>>>>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>>>>>>>>> To: Vladimir Ivanov ; Bhateja, >>>>>>>>>>>>> Jatin ; >>>>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API >>>>>>>>>>>>> intrinsification for >>>>>>>>>>>>> X86 >>>>>>>>>>>>> >>>>>>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> ? ??? > High-level comment: so far, there were no pressing >>>>>>>>>>>>> need in >>>>>>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL >>>>>>>>>>>>> instructions >>>>>>>>>>>>>> were selected during matching [1]. Now the patch introduces >>>>>>>>>>>>>>> >>>>>>>>>>>>> dedicated nodes >>>>>>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > >>>>>>>>>>>>> which partly duplicates existing logic. >>>>>>>>>>>>> >>>>>>>>>>>>> The lack of rotate nodes in the IR has always meant that >>>>>>>>>>>>> AArch64 doesn't generate optimal code for e.g. >>>>>>>>>>>>> >>>>>>>>>>>>> ? ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>>>>>>>>> >>>>>>>>>>>>> because, with the RotateLeft expanded to its full >>>>>>>>>>>>> combination of ORs and shifts, it's to complicated to match. >>>>>>>>>>>>> At the time I put this to one side because it wasn't urgent. >>>>>>>>>>>>> This is a shame because although such combinations are >>>>>>>>>>>>> unusual they are used in some crypto >>>>>>>>> operations. >>>>>>>>>>>>> >>>>>>>>>>>>> If we can generate immediate-form rotate nodes early by >>>>>>>>>>>>> pattern matching during parsing (rather than depending on >>>>>>>>>>>>> intrinsics) we'll get more value than by depending on >>>>>>>>>>>>> programmers calling >>>>>> intrinsics. >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Andrew Haley? (he/him) >>>>>>>>>>>>> Java Platform Lead Engineer >>>>>>>>>>>>> Red Hat UK Ltd. 
>>>>>>>>>>>>> https://keybase.io/andrewhaley >>>>>>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>>>>>>>>> From vladimir.kozlov at oracle.com Fri Aug 7 17:10:36 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Aug 2020 10:10:36 -0700 Subject: [16] (S) RFR 8251260: two MD5 tests fail "RuntimeException: Unexpected count of intrinsic" In-Reply-To: <08098628-7484-f2dc-019a-54a45f37a9c9@oracle.com> References: <7ecc9a5b-3af6-78a4-832c-03d043340f9f@oracle.com> <08098628-7484-f2dc-019a-54a45f37a9c9@oracle.com> Message-ID: <80bf7d5e-6446-817f-732c-519fc0383ff5@oracle.com> Thank you, Vladimir On 8/7/20 10:06 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ > > Looks good. > > Best regards, > Vladimir Ivanov > >> https://bugs.openjdk.java.net/browse/JDK-8251260 >> >> New MD5 intrinsic tests failed when run with AOTed java.base. And old SHA tests are problem listed for AOT. >> >> SHA and MD5 intrinsic tests parse -XX:+LogCompilation output looking for compilation of sun/security/provider methods >> as intrinsics. But these methods are already pre-compiled by AOT when AOTed java.base is used. As result >> LogCompilation does not have corresponding entries. >> >> I think we should not run these MD5 and SHA tests with AOTed java.base module. I added corresponding @requires. >> >> Old SHA tests were problem listed referencing 8167430 [1] bug but I think it is incorrect. The original SHA tests >> crash with AOT 8207358 [2] bug was closed as duplicate of 8167430 because of conflict how intirnsics flags are set by >> default during AOT compilation. But we simply should not run these tests with AOTed java.base. So I am adding >> @requires to them as well and removing them from AOT problem list. >> >> Tested hs-tier1, hs-tier2 (runs sha,md5 tests), hs-tier6 (now skips sha,md5 tests when AOTed java.base is used). >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8167430 >> [2] https://bugs.openjdk.java.net/browse/JDK-8207358 From luhenry at microsoft.com Sat Aug 8 04:30:37 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sat, 8 Aug 2020 04:30:37 +0000 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 Message-ID: Hello, I would like to backport the newly added MD5 Intrinsic to JDK 11. The change is contained, limiting the chance of a regression, and provides a great speedup on a common pattern. This change also contains the follow-up fix by Vladimir Kozlov. As it is the first backport I go through, please let me know what other steps I need to take. Original Bugs: https://bugs.openjdk.java.net/browse/JDK-8250902 https://bugs.openjdk.java.net/browse/JDK-8251260 Original Webrevs: http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8251319 Webrev: http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00 Testing: Linux-x64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha, hotspot:tier1, jdk:tier1. Thank you, Ludovic [1] From vladimir.kozlov at oracle.com Sat Aug 8 04:49:49 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 7 Aug 2020 21:49:49 -0700 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: References: Message-ID: <569d6c8b-136f-de81-5816-47216333fcb8@oracle.com> Hi Ludovic, Usually we backport only bugs fixes to keep LTS (11u) release stable. To backport into 11u you need approval [1]. Here is example [2]. 
You need also point if backport applied cleanly or you have to make changes. Changes should be backported separately to keep track - do not combine changes. But it is okay to push both changesets together (especially if followup changes fixed first). Regards, Vladimir K [1] http://openjdk.java.net/projects/jdk-updates/approval.html [2] https://bugs.openjdk.java.net/browse/JDK-8248214 On 8/7/20 9:30 PM, Ludovic Henry wrote: > Hello, > > I would like to backport the newly added MD5 Intrinsic to JDK 11. The change is contained, limiting the chance of a regression, and provides a great speedup on a common pattern. This change also contains the follow-up fix by Vladimir Kozlov. > > As it is the first backport I go through, please let me know what other steps I need to take. > > Original Bugs: > https://bugs.openjdk.java.net/browse/JDK-8250902 > https://bugs.openjdk.java.net/browse/JDK-8251260 > > Original Webrevs: > http://cr.openjdk.java.net/~luhenry/8250902/webrev.03 > http://cr.openjdk.java.net/~kvn/8251260/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8251319 > > Webrev: > http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00 > > Testing: Linux-x64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha, hotspot:tier1, jdk:tier1. > > Thank you, > Ludovic > > [1] > > From aph at redhat.com Sat Aug 8 12:08:46 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 8 Aug 2020 13:08:46 +0100 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: References: Message-ID: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> On 8/8/20 5:30 AM, Ludovic Henry wrote: > I would like to backport the newly added MD5 Intrinsic to JDK 11. It's too early for that: changes are supposed to bake in JDK head for a while. Also, since it's an enhancement rather than a bug fix we'd need to have the discussion. I would say it's marginal whether something like this should be back ported. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From luhenry at microsoft.com Sat Aug 8 17:30:07 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sat, 8 Aug 2020 17:30:07 +0000 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> References: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> Message-ID: Hi Andrew, Vladimir, > It's too early for that: changes are supposed to bake in JDK head for > a while. Also, since it's an enhancement rather than a bug fix we'd > need to have the discussion. I would say it's marginal whether > something like this should be back ported. > Usually we backport only bugs fixes to keep LTS (11u) release stable. It makes perfect sense. I'm happy to wait longer, and follow up on that thread later on to check if there is any appetite to get it backported. > You need also point if backport applied cleanly or you have to make changes. The code conflicts were trivial as the infrastructure for intrinsics didn't change much since 11 (and even 8). Conflicts: http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00/conflict.diff > Changes should be backported separately to keep track - do not combine changes. > But it is okay to push both changesets together (especially if followup changes fixed first). Sorry I do not fully understand. Is it ok in this case to combine both changes into a single changeset, since the second one is a followup that fixes the first one? 
Or should I still make 2 changeset, but have them pushed together? Thank you, Ludovic From jatin.bhateja at intel.com Sat Aug 8 21:06:18 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Sat, 8 Aug 2020 21:06:18 +0000 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: <9748e2ee-f47d-7c47-627d-58e7d98e1779@oracle.com> References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> <95066bec-d74e-eb55-9a05-463239016b2a@oracle.com>, <9748e2ee-f47d-7c47-627d-58e7d98e1779@oracle.com> Message-ID: Thanks Vladimir K, Vladimir I, Patch has been pushed with suggested changes. https://hg.openjdk.java.net/jdk/jdk/rev/ebe6d3b79edf Best Regards, Jatin -------- Original message -------- From: Vladimir Kozlov Date: 07/08/2020 22:40 (GMT+05:30) To: Vladimir Ivanov , "Bhateja, Jatin" Cc: "Viswanathan, Sandhya" , hotspot-compiler-dev at openjdk.java.net, Andrew Haley Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 I see that you already discussed removal of opcodes/encoding from patterns in .ad file and move them to Assembler. I would like to see that too. Changes look good otherwise. Thank you for adding tests to verify new code. Thanks, Vladimir K On 8/7/20 5:15 AM, Vladimir Ivanov wrote: > >>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/ > > Still looks good. > > It would be nice to get one more (R)eview. Let's wait a little bit more. > > Best regards, > Vladimir Ivanov > >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Saturday, August 1, 2020 4:49 AM >>>> To: Bhateja, Jatin >>>> Cc: Viswanathan, Sandhya ; >>>> hotspot-compiler- dev at openjdk.java.net >>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>>> X86 >>>> >>>> >>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ >>>> >>>> Looks good. >>>> >>>> Tier5 (where I saw the crashes) passed. >>>> >>>> Please, incorporate the following minor cleanups in the final version: >>>> >>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu >>>> p/ >>>> >>>> (Tested with hs-tier1,hs-tier2.) >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Ivanov >>>>>> Sent: Thursday, July 30, 2020 3:30 AM >>>>>> To: Bhateja, Jatin >>>>>> Cc: Viswanathan, Sandhya ; >>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>> for >>>>>> X86 >>>>>> >>>>>> >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ >>>>>>> >>>>>>> Looks good. (Testing is in progress.) >>>>>> >>>>>> FYI test results are clean (tier1-tier5). >>>>>> >>>>>>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines >>>>>>>> since we are anyways doing constant folding in LShiftI/URShiftI >>>>>>>> value routines. Since JAVA rotate APIs are no longer intrincified >>>>>>>> hence these routines may no longer be useful. >>>>>>> >>>>>>> Nice observation! Good. >>>>>> >>>>>> As a second thought, it seems there's still a chance left that >>>>>> Rotate nodes get their input type narrowed after the folding >>>>>> happened. For example, as a result of incremental inlining or CFG >>>>>> transformations during loop optimizations. And it does happen in >>>>>> practice since the testing revealed some crashes due to the bug in >>>> RotateLeftNode/RotateRightNode::Ideal(). >>>>>> >>>>>> So, it makes sense to keep the transformations. 
But I'm fine with >>>>>> addressing that as a followup enhancement. >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>> >>>>>>>> I guess you are saying remove opcodes/encoding from patterns and >>>>>>>> move then to Assembler, Can we take this cleanup activity >>>>>>>> separately since other patterns are also using these matcher >>>> directives. >>>>>>> >>>>>>> I'm perfectly fine with handling it as a separate enhancement. >>>>>>> >>>>>>>> Other synthetic comments have been taken care of. I have extended >>>>>>>> the Test to cover all the newly added scalar transforms. Kindly >>>>>>>> let me know if there other comments. >>>>>>> >>>>>>> Nice! >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Vladimir Ivanov >>>>>>>>> Sent: Friday, July 24, 2020 3:21 AM >>>>>>>>> To: Bhateja, Jatin >>>>>>>>> Cc: Viswanathan, Sandhya ; Andrew >>>>>>>>> Haley ; hotspot-compiler-dev at openjdk.java.net >>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>>>>> for >>>>>>>>> X86 >>>>>>>>> >>>>>>>>> Hi Jatin, >>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>>>>>>>> >>>>>>>>> Much better! Thanks. >>>>>>>>> >>>>>>>>>> Change Summary: >>>>>>>>>> >>>>>>>>>> 1) Unified the handling for scalar rotate operation. All scalar >>>>>>>>>> rotate >>>>>>>>> selection patterns are now dependent on newly created >>>>>>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>>>>>>>> Currently >>>>>>>>> if DAG nodes corresponding to a sub-pattern are shared (have >>>>>>>>> multiple >>>>>>>>> users) then existing complex patterns based on >>>>>>>>> Or/LShiftL/URShift does not get matched and this prevents inferring >>> rotate nodes. >>>>>>>>> Please refer to JIT'ed assembly output with baseline[1] and with >>>>>>>>> patch[2] . We can see that generated code size also went done >>>>>>>>> from >>>>>>>>> 832 byte to 768 bytes. Also this can cause perf degradation if >>>>>>>>> shift-or dependency chain appears inside a hot region. >>>>>>>>>> >>>>>>>>>> 2) Due to enhanced rotate inferencing new patch shows better >>>>>>>>>> performance >>>>>>>>> even for legacy targets (non AVX-512). Please refer to the perf >>>>>>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. >>>>>>>>> >>>>>>>>> Very nice! >>>>>>>>>> 3) As suggested, removed Java API intrinsification changes and >>>>>>>>>> scalar >>>>>>>>> rotate transformation are done during OrI/OrL node idealizations. >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>> (Still would be nice to factor the matching code from Ideal() >>>>>>>>> and share it between multiple use sites. Especially considering >>>>>>>>> OrVNode::Ideal() now does basically the same thing. As an >>>>>>>>> example/idea, take a look at >>>>>>>>> is_bmi_pattern() in x86.ad.) >>>>>>>>> >>>>>>>>>> 4) SLP always gets to work on new scalar Rotate nodes and >>>>>>>>>> creates vector >>>>>>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV >>>>>>>>> nodes if target does not supports vector rotates(non-AVX512). >>>>>>>>> >>>>>>>>> Good. >>>>>>>>> >>>>>>>>>> 5) Added new instruction patterns for vector shift Left/Right >>>>>>>>>> operations >>>>>>>>> with constant shift operands. This prevents emitting extra moves >>>>>>>>> to >>>>>> XMM. 
>>>>>>>>> >>>>>>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >>>>>>>>> + match(Set dst (LShiftVI src shift)); >>>>>>>>> >>>>>>>>> I'd prefer to see a uniform Ideal IR shape being used >>>>>>>>> irrespective of whether the argument is a constant or not. It >>>>>>>>> should also simplify the logic in SuperWord and make it easier >>>>>>>>> to support on >>>>>>>>> non-x86 architectures. >>>>>>>>> >>>>>>>>> For example, here's how it is done on AArch64: >>>>>>>>> >>>>>>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >>>>>>>>> predicate(n->as_Vector()->length() == 4); >>>>>>>>> match(Set dst (LShiftVI src (LShiftCntV shift))); ... >>>>>>>>> >>>>>>>>>> 6) Constant folding scenarios are covered in >>>>>>>>>> RotateLeft/RotateRight >>>>>>>>> idealization, inferencing of vector rotate through OrV >>>>>>>>> idealization covers the vector patterns generated though non SLP >>>> route i.e. >>>>>>>>> VectorAPI. >>>>>>>>> >>>>>>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the >>>>>>>>> general direction here - duplication of scalar transformations >>>>>>>>> to lane-wise vector operations. It definitely won't scale and in >>>>>>>>> a longer run it risks to diverge. Would be nice to find a way to >>>>>>>>> automatically "lift" >>>>>>>>> scalar transformations to vectors and apply them uniformly. But >>>>>>>>> right now it is just an idea which requires more experimentation. >>>>>>>>> >>>>>>>>> >>>>>>>>> Some other minor comments/suggestions: >>>>>>>>> >>>>>>>>> + // Swap the computed left and right shift counts. >>>>>>>>> + if (is_rotate_left) { >>>>>>>>> + Node* temp = shiftRCnt; >>>>>>>>> + shiftRCnt = shiftLCnt; >>>>>>>>> + shiftLCnt = temp; >>>>>>>>> + } >>>>>>>>> >>>>>>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? >>>>>>>>> >>>>>>>>> >>>>>>>>> + if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >>>>>>>>> + return true; >>>>>>>>> >>>>>>>>> Please, don't omit curly braces (even for simple cases). >>>>>>>>> >>>>>>>>> >>>>>>>>> -// Rotate Right by variable >>>>>>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, >>>>>>>>> immI0 zero, rFlagsReg cr) >>>>>>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg >>>>>>>>> +cr) >>>>>>>>> %{ >>>>>>>>> - match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI >>>>>>>>> zero shift)))); >>>>>>>>> - >>>>>>>>> + predicate(!VM_Version::supports_bmi2() && >>>>>>>>> n->bottom_type()->basic_type() == T_INT); >>>>>>>>> + match(Set dst (RotateRight dst shift)); >>>>>>>>> + format %{ "rorl $dst, $shift" %} >>>>>>>>> expand %{ >>>>>>>>> - rorI_rReg_CL(dst, shift, cr); >>>>>>>>> + rorI_rReg_imm8(dst, shift, cr); >>>>>>>>> %} >>>>>>>>> >>>>>>>>> It would be really nice to migrate to MacroAssembler along the >>>>>>>>> way (as a cleanup). >>>>>>>>> >>>>>>>>>> Please push the patch through your testing framework and let me >>>>>>>>>> know your >>>>>>>>> review feedback. >>>>>>>>> >>>>>>>>> There's one new assertion failure: >>>>>>>>> >>>>>>>>> # Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >>>>>>>>> pid=5476, tid=6219 >>>>>>>>> # assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize >>>>>>>>> should return new nodes, use Identity to return old nodes >>>>>>>>> >>>>>>>>> I believe it comes from >>>>>>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal >>>>>>>>> which can return pre-contructed constants. 
I suggest to get rid >>>>>>>>> of >>>>>>>>> Ideal() methods and move constant folding logic into >>>>>>>>> Node::Value() (as implemented for other bitwise/arithmethic >>>>>>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more >>>>>>>>> generic approach since it enables richer type information >>>>>>>>> (ranges vs >>>>>>>>> constants) and IMO it's more convenient to work with constants >>>>>>>>> through Types than ConNodes. >>>>>>>>> >>>>>>>>> (I suspect that original/expanded IR shape may already provide >>>>>>>>> more precise type info for non-constant case which can affect >>>>>>>>> the >>>>>>>>> benchmarks.) >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Vladimir Ivanov >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Jatin >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. >>>>>>>>>> txt [2] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a >>>>>>>>>> vx >>>>>>>>>> 2_ >>>>>>>>>> asm >>>>>>>>>> .txt [3] >>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n >>>>>>>>>> ew >>>>>>>>>> _p >>>>>>>>>> atc >>>>>>>>>> h.txt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Vladimir Ivanov >>>>>>>>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>>>>>>>> To: Bhateja, Jatin ; Andrew Haley >>>>>>>>>>> >>>>>>>>>>> Cc: Viswanathan, Sandhya ; >>>>>>>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API >>>>>>>>>>> intrinsification for >>>>>>>>>>> X86 >>>>>>>>>>> >>>>>>>>>>> Hi Jatin, >>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>>>>>>>> >>>>>>>>>>> It definitely looks better, but IMO it hasn't reached the >>>>>>>>>>> sweet spot >>>>>>>>> yet. >>>>>>>>>>> It feels like the focus is on auto-vectorizer while the burden >>>>>>>>>>> is put on scalar cases. >>>>>>>>>>> >>>>>>>>>>> First of all, considering GVN folds relevant operation >>>>>>>>>>> patterns into a single Rotate node now, what's the motivation >>>>>>>>>>> to introduce intrinsics? >>>>>>>>>>> >>>>>>>>>>> Another point is there's still significant duplication for >>>>>>>>>>> scalar cases. >>>>>>>>>>> >>>>>>>>>>> I'd prefer to see the legacy cases which rely on pattern >>>>>>>>>>> matching to go away and be substituted with instructions which >>>>>>>>>>> match Rotate instructions (migrating ). >>>>>>>>>>> >>>>>>>>>>> I understand that it will penalize the vectorization >>>>>>>>>>> implementation, but IMO reducing overall complexity is worth it. >>>>>>>>>>> On auto-vectorizer side, I see >>>>>>>>>>> 2 ways to fix it: >>>>>>>>>>> >>>>>>>>>>> (1) introduce additional AD instructions for >>>>>>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>>>>>>>> >>>>>>>>>>> (2) in SuperWord::output(), when matcher doesn't support >>>>>>>>>>> RotateLeftV/RotateLeftV nodes >>>>>>>>>>> (Matcher::match_rule_supported()), >>>>>>>>>>> generate vectorized version of the original pattern. >>>>>>>>>>> >>>>>>>>>>> Overall, it looks like more and more focus is made on scalar >>> part. >>>>>>>>>>> Considering the main goal of the patch is to enable >>>>>>>>>>> vectorization, I'm fine with separating cleanup of scalar part. >>>>>>>>>>> As an interim solution, it seems that leaving the scalar part >>>>>>>>>>> as it is now and matching scalar bit rotate pattern in >>>>>>>>>>> VectorNode::is_rotate() should be enough to keep the >>>>>>>>>>> vectorization part functioning. 
Then scalar Rotate nodes and >>>> relevant cleanups can be integrated later. >>>>>>>>>>> (Or vice >>>>>>>>>>> versa: clean up scalar part first and then follow up with >>>>>>>>>>> vectorization.) >>>>>>>>>>> >>>>>>>>>>> Some other comments: >>>>>>>>>>> >>>>>>>>>>> * There's a lot of duplication between OrINode::Ideal and >>>>>>>>> OrLNode::Ideal. >>>>>>>>>>> What do you think about introducing a super type >>>>>>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/x86.ad >>>>>>>>>>> >>>>>>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>>>>>>>> + predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> + n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>>>>>>>> + predicate(n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== >>>>>>>>>>> T_INT >>>>>>>>> || >>>>>>>>>>> + n->bottom_type()->is_vect()->element_basic_type() >>>>>>>>>>> +== T_LONG); >>>>>>>>>>> >>>>>>>>>>> The predicates are redundant here. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>>>>>>>> >>>>>>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType >>>>>>>>>>> +etype, >>>>>>>>>>> XMMRegister dst, XMMRegister src, >>>>>>>>>>> + int shift, int >>>>>>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) { >>>>>>>>>>> + if (etype == T_INT) { >>>>>>>>>>> + evprold(dst, src, shift, vector_len); >>>>>>>>>>> + } else { >>>>>>>>>>> + evprolq(dst, src, shift, vector_len); >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> Please, put an assert for the false case (assert(etype == >>>>>>>>>>> T_LONG, >>>>>>>>> "...")). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * On testing (with previous version of the patch): -XX:UseAVX >>>>>>>>>>> is >>>>>>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 >>>>>> platforms. >>>>>>>>>>> Either omitting the flag or adding >>>>>>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Vladimir Ivanov >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Summary of changes: >>>>>>>>>>>> 1) Optimization is specifically targeted to exploit vector >>>>>>>>>>>> rotation >>>>>>>>>>> instruction added for X86 AVX512. A single rotate instruction >>>>>>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers >>>>>>>>>>> better latency at reduced instruction count. >>>>>>>>>>>> >>>>>>>>>>>> 2) There were two approaches to implement this: >>>>>>>>>>>> a) Let everything remain the same and add new wide >>>>>>>>>>>> complex >>>>>>>>>>> instruction patterns in the matcher for e.g. >>>>>>>>>>>> set Dst ( OrV (Binary (LShiftVI dst (Binary >>>>>>>>>>>> ReplicateI >>>>>>>>>>>> shift)) >>>>>>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( >>>>>>>>>>> Replicate >>>>>>>>>>> shift)) >>>>>>>>>>>> It would have been an overoptimistic assumption to >>>>>>>>>>>> expect that graph >>>>>>>>>>> shape would be preserved till the matcher for correct >>> inferencing. >>>>>>>>>>>> In addition we would have required multiple such >>>>>>>>>>>> bulky patterns. 
>>>>>>>>>>>> b) Create new RotateLeft/RotateRight scalar nodes, >>>>>>>>>>>> these gets >>>>>>>>>>> generated during intrinsification as well as during additional >>>>>>>>>>> pattern >>>>>>>>>>>> matching during node Idealization, later on these >>>>>>>>>>>> nodes are consumed >>>>>>>>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>>>>>>>> counterparts which eventually emits vector rotates. >>>>>>>>>>>> >>>>>>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here >>>>>>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate >>>>>>>>>>>> nodes should either be >>>>>>>>>>> dismantled back to OR/SHIFT pattern or we penalize the >>>>>>>>>>> vectorization which would be very costly, other option would >>>>>>>>>>> have been to add additional vector rotate pattern for UseAVX=3 >>>>>>>>>>> in the matcher which emit vector OR-SHIFTs instruction but >>>>>>>>>>> then it will loose on emitting efficient instruction sequence >>>>>>>>>>> which node sharing >>>>>>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus >>>>>>>>>>> it will not be beneficial for non-AVX512 targets, only saving >>>>>>>>>>> will be in terms of cleanup of few existing scalar rotate >>>>>>>>>>> matcher patterns, also old targets does not offer this >>>>>>>>>>> powerful rotate >>>>>> instruction. >>>>>>>>>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>>>>>>>>> >>>>>>>>>>>> As per suggestions constant folding scenarios have been >>>>>>>>>>>> covered during >>>>>>>>>>> Idealizations of newly added scalar nodes. >>>>>>>>>>>> >>>>>>>>>>>> Please review the latest version and share your feedback and >>>>>>>>>>>> test >>>>>>>>>>> results. >>>>>>>>>>>> >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Jatin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: Andrew Haley >>>>>>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>>>>>>>>> To: Vladimir Ivanov ; Bhateja, >>>>>>>>>>>>> Jatin ; >>>>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API >>>>>>>>>>>>> intrinsification for >>>>>>>>>>>>> X86 >>>>>>>>>>>>> >>>>>>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> > High-level comment: so far, there were no pressing >>>>>>>>>>>>> need in >>>>>>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL >>>>>>>>>>>>> instructions >>>>>>>>>>>>>> were selected during matching [1]. Now the patch introduces >>>>>>>>>>>>>>> >>>>>>>>>>>>> dedicated nodes >>>>>>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics > >>>>>>>>>>>>> which partly duplicates existing logic. >>>>>>>>>>>>> >>>>>>>>>>>>> The lack of rotate nodes in the IR has always meant that >>>>>>>>>>>>> AArch64 doesn't generate optimal code for e.g. >>>>>>>>>>>>> >>>>>>>>>>>>> (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>>>>>>>>> >>>>>>>>>>>>> because, with the RotateLeft expanded to its full >>>>>>>>>>>>> combination of ORs and shifts, it's to complicated to match. >>>>>>>>>>>>> At the time I put this to one side because it wasn't urgent. >>>>>>>>>>>>> This is a shame because although such combinations are >>>>>>>>>>>>> unusual they are used in some crypto >>>>>>>>> operations. 
>>>>>>>>>>>>> >>>>>>>>>>>>> If we can generate immediate-form rotate nodes early by >>>>>>>>>>>>> pattern matching during parsing (rather than depending on >>>>>>>>>>>>> intrinsics) we'll get more value than by depending on >>>>>>>>>>>>> programmers calling >>>>>> intrinsics. >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Andrew Haley (he/him) >>>>>>>>>>>>> Java Platform Lead Engineer >>>>>>>>>>>>> Red Hat UK Ltd. >>>>>>>>>>>>> https://keybase.io/andrewhaley >>>>>>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>>>>>>>>> From vladimir.kozlov at oracle.com Sun Aug 9 02:28:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 8 Aug 2020 19:28:42 -0700 Subject: [11u] RFR[M]: 8250902: Implement MD5 Intrinsics on x86 In-Reply-To: References: <8b61da32-3cc1-fa14-a607-1f871f7e3d70@redhat.com> Message-ID: On 8/8/20 10:30 AM, Ludovic Henry wrote: > Hi Andrew, Vladimir, > >> It's too early for that: changes are supposed to bake in JDK head for >> a while. Also, since it's an enhancement rather than a bug fix we'd >> need to have the discussion. I would say it's marginal whether >> something like this should be back ported. > >> Usually we backport only bugs fixes to keep LTS (11u) release stable. > > It makes perfect sense. I'm happy to wait longer, and follow up on that thread later on to check if there is any appetite to get it backported. > >> You need also point if backport applied cleanly or you have to make changes. > > The code conflicts were trivial as the infrastructure for intrinsics didn't change much since 11 (and even 8). > > Conflicts: > http://cr.openjdk.java.net/~luhenry/8250902-11u/webrev.00/conflict.diff > >> Changes should be backported separately to keep track - do not combine changes. >> But it is okay to push both changesets together (especially if followup changes fixed first). > > Sorry I do not fully understand. Is it ok in this case to combine both changes into a single changeset, since the second one is a followup that fixes the first one? Or should I still make 2 changeset, but have them pushed together? It is not okay to combine changes into a single changeset. You need to make 2 (in this case) separate changesets but push them together. You can push them separately too but there is a chance that second push may miss a new build which would includes only first push. Also if a changeset applies cleanly you can use "hg export" and "hg import" commands - no need to do new commit. If changeset does not apply cleanly you need to send RFR for backport as you correctly did. Regards, Vladimir > > Thank you, > Ludovic > From luhenry at microsoft.com Sun Aug 9 03:19:20 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sun, 9 Aug 2020 03:19:20 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Message-ID: Hello, Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): -XX:-UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 
0.001 ops/ms

-XX:+UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error  Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  2005.233 ± 40.513  ops/ms  => 24% speedup
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   275.979 ±  0.455  ops/ms  => 28% speedup
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.279 ±  0.001  ops/ms  => 22% speedup

Thank you,
Ludovic

[1] https://bugs.openjdk.java.net/browse/JDK-8250902

From aph at redhat.com Sun Aug 9 14:32:48 2020
From: aph at redhat.com (Andrew Haley)
Date: Sun, 9 Aug 2020 15:32:48 +0100
Subject: [aarch64-port-dev ] RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop
In-Reply-To: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com>
References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com>
Message-ID: 

On 8/7/20 10:04 AM, Nick Gasson wrote:
> Bug:https://bugs.openjdk.java.net/browse/JDK-8247354
> Webrev:http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/

How did you test this? I'm looking through the test suite, but I can't
find the test vectors. They must be in there somewhere.

https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From nick.gasson at arm.com Mon Aug 10 01:34:41 2020
From: nick.gasson at arm.com (Nick Gasson)
Date: Mon, 10 Aug 2020 09:34:41 +0800
Subject: [aarch64-port-dev ] RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop
In-Reply-To: 
References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com>
Message-ID: <85lfinwafi.fsf@nicgas01-pc.shanghai.arm.com>
src/hotspot/share/utilities/exceptions.cpp src/hotspot/share/utilities/exceptions.hpp I don't think the changes here are correct or safe in general. First, adding the new macro and function to only clear non-async exceptions is fine itself. But naming wise the fact only non-async exceptions are cleared should be evident, and there is no "check" involved (in the sense of the existing CHECK_ macros) so I suggest: s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/ s/check_clear_pending_exception/clear_pending_nonasync_exceptions/ But changing the existing CHECK_AND_CLEAR macros to now leave async exceptions pending seems potentially dangerous as calling code may not be prepared for there to now be a pending exception. For example the use in thread.cpp: JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); get_java_runtime_name() is currently guaranteed to clear all exceptions, so all the other code is known to be safe to call. But that would no longer be true. That said, this is VM initialization code and an async exception is impossible at this stage. I think I would rather see CHECK_AND_CLEAR left as-is, and an actual CHECK_AND_CLEAR_NONASYNC introduced for those users of CHECK_AND_CLEAR that can encounter async exceptions and which should not clear them. + if (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) && + _pending_exception->klass() != SystemDictionary::InternalError_klass()) { Flagging all InternalErrors as async exceptions is probably also not correct. I don't see a good solution to this at the moment. I think we would need to introduce a new subclass of InternalError for the unsafe access error case**. Now it may be that all the other InternalError usages are "impossible" in the context of where the new macros are to be used, but that is very difficult to establish or assert. ** Or perhaps we could inject a field that allows the VM to identify instances related to unsafe access errors ... Ideally of course these unsafe access errors would be distinct from the async exception mechanism - something I would still like to pursue. --- General comments ... There is a general change from "JavaThread* thread" to "Thread* THREAD" (or TRAPS) to allow the use of the CHECK macros. This is unfortunate because the fact the thread is restricted to being a JavaThread is no longer evident in the method signatures. That is a flaw with the TRAPS/CHECK mechanism unfortunately :( . But as the methods no longer take a JavaThread* arg, they should assert that THREAD->is_Java_thread(). I will also look at an RFE to have as_JavaThread() to avoid the need for separate assertion checks before casting from "Thread*" to "JavaThread*". Note there's no need to use CHECK when the enclosing method is going to return immediately after the call that contains the CHECK. It just adds unnecessary checking of the exception state. The use of TRAPS shows that the methods may return with an exception pending. I've flagged all such occurrences I spotted below. --- + // Only metaspace OOM is expected. no Java code executed. Nit: s/no/No src/hotspot/share/compiler/compilationPolicy.cpp 410 method_invocation_event(method, CHECK_NULL); 489 CompileBroker::compile_method(m, InvocationEntryBci, comp_level, m, hot_count, CompileTask::Reason_InvocationCount, CHECK); Nit: there's no need to use CHECK here. 
--- src/hotspot/share/compiler/tieredThresholdPolicy.cpp 504 method_invocation_event(method, inlinee, comp_level, nm, CHECK_NULL); 570 compile(mh, bci, CompLevel_simple, CHECK); 581 compile(mh, bci, CompLevel_simple, CHECK); 595 CompileBroker::compile_method(mh, bci, level, mh, hot_count, CompileTask::Reason_Tiered, CHECK); 1062 compile(mh, InvocationEntryBci, next_level, CHECK); Nit: there's no need to use CHECK here. 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, Thread* THREAD) { Thank you for correcting this misuse of the THREAD name on a JavaThread* type. --- src/hotspot/share/interpreter/linkResolver.cpp 128 CompilationPolicy::compile_if_required(selected_method, CHECK); Nit: there's no need to use CHECK here. --- src/hotspot/share/jvmci/compilerRuntime.cpp 260 CompilationPolicy::policy()->event(emh, mh, InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK); 280 nmethod* osr_nm = CompilationPolicy::policy()->event(emh, mh, branch_bci, target_bci, CompLevel_aot, cm, CHECK); Nit: there's no need to use CHECK here. --- src/hotspot/share/jvmci/jvmciRuntime.cpp 102 // Donot clear probable async exceptions. typo: s/Donot/Do not/ --- src/hotspot/share/runtime/deoptimization.cpp 1686 void Deoptimization::load_class_by_index(const constantPoolHandle& constant_pool, int index) { This method should be declared with TRAPS now. 1693 // Donot clear probable Async Exceptions. typo: s/Donot/Do not/ > testing : mach1-5(links in jbs) There is very little existing testing that will actually test the key changes you have made here. You will need to do direct fault-injection testing anywhere you now allow async exceptions to remain, to see if the calling code can tolerate that. It will be difficult to test thoroughly. Thanks again for tackling this difficult problem! David ----- > > While working on JDK-8246381 it was noticed that compilation request > path clears all exceptions(including async) and doesn't propagate[1]. > > Fix: patch restores the propagation behavior for the probable async > exceptions. > > Compilation request path propagate exception as in [2]. MDO and > MethodCounter doesn't expect any exception other than metaspace > OOM(added comments). > > Deoptimization path doesn't clear probable async exceptions and take > unpack_exception path for non uncommontraps. > > Added java_lang_InternalError to well known classes. > > Request for review. > > Best Regards, > > Jamsheed > > [1] w.r.t changes done for JDK-7131259 > > [2] > > ??? (a) > ??? -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp > ????? | > ?????? ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp > ???????? | > ????????? ------ compileBroker.cpp > > ??? (b) > ??? Xcomp versions > ??? ------> compilationPolicy.cpp > ?????? | > ??????? ------> compileBroker.cpp > > ??? (c) > > ??? Direct call to? compile_method in compileBroker.cpp > > ??? JVMCI bootstrap, whitebox, replayCompile. 
> > From vladimir.kozlov at oracle.com Mon Aug 10 04:25:54 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 9 Aug 2020 21:25:54 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash Message-ID: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8249749 SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) As result it can't find memory reference to align vectors. But code ignores that and continue execution. Later when align_to_ref is referenced we hit SEGV because it is NULL. The fix is to check align_to_ref for NULL early and bailout. I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to vectorize test's code. And added missing _invar setting. And I slightly modified tracking code to investigate this issue. Added new test to check some complex address expressions similar to bug's test case. Not all cases in test are vectorized - there are other conditions which prevent that. Tested tier1,tier2,hs-tier3,precheckin-comp Thanks, Vladimir K From Pengfei.Li at arm.com Mon Aug 10 04:45:17 2020 From: Pengfei.Li at arm.com (Pengfei Li) Date: Mon, 10 Aug 2020 04:45:17 +0000 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: Hi Andrew Dinn, Thank you so much for taking the time to review this. As Ningsheng is on leave this week, I will attempt to answer your specific questions based on what I know. I'm sorry that I'm not able to answer all your questions since I'm not familiar with every detail of the patch. And you may still need to wait him coming back to update the webrev(s). > I was able to test this patch on a loaned Fujitsu FX700. I replicated your > results, passing tier1 tests and the jtreg compiler tests in vectorization, > codegen, c2/cr6340864 and loopopts. > > I also eyeballed /some/ of the generated code to check that it looked ok. I'd > really like to be able to do that systematically for a comprehensive test suite > that exercised every rule but I only had the machine for a few days. This > really ought to be done as a follow-up to ensure that all the rules are working > as expected. Not sure if you have tried my newly added test in the vectorization folder. It checks if expected SVE/NEON instructions are generated as expected for each C2 vectornode by checking the OptoAssembly output. I put it in another webrev so you may have missed it. http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/ > Specific Comments (feature webrev): > > > globals_aarch64.hpp:102 > > Just out of interest why does UseSVE have range(0,2)? It seems you are only > testing for UseSVE > 0. Does value 2 correspond to an optional subset? AArch64 SVE has multiple versions. Current Fujitsu FX machine supports SVE1 only. We leave 2 here for SVE2 support in the near future. 
https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator/resources/tutorials/sve/sve-vs-sve2/introduction-to-sve2 > Specific Comments (register allocator webrev): > > > aarch64.ad:97-100 > > Why have you added a reg_def for R8 and R9 here and also to alloc_class > chunk0 at lines 544-545? They aren't used by C2 so why define them? This has no functionality change to the two scratch registers. But if these are missing in the register definition, the regmask for vector registers won't start at an aligned position. So we prefer adding them back to make the computation easier. > assembler_aarch64.hpp:280 (also 699) > > prf sets a predicate register field. pgrf sets a governing predicate register field. > Should the name not be gprf. I guess the reason is that the ArmARM doc says "the Pg field". > chaitin.cpp:648-660 > > The comment is rather oddly formatted. Thanks for catching this. > At line 650 you guard the assert with a test for lrg._is_vector. Is that not > always going to be guaranteed by the outer condition lrg._is_scalable? If so > then you should really assert lrg._is_vector. > > The special case code for computation of num_regs for a vector stack slot > also appears in this file with a slightly different organization in find_first_set > (line 1350) and in PhaseChaitin::Select (line 1590). > There is another similar case in RegMask::num_registers at regmask.cpp: > 98. It would be better to factor out the common code into methods of LRG. > Maybe using the following? > > bool LRG::is_scalable_vector() { > if (_is_scalable) { > assert(_is_vector == 1); > assert(_num_regs == == RegMask::SlotsPerVecA) > return true; > } > return false; > } > > int LRG::scalable_num_regs() { > assert(is_scalable_vector()); > if (OptoReg::is_stack(_reg)) { > return _scalable_reg_slots > } else { > return num_reg_slots; > } > } > > > chaitin.cpp:1350 > > Once again the test for lrg._is_vector should be guaranteed by the outer test > of lrg._is_scalable. Refactoring using the common methods of LRG as above > ought to help. > > chaitin.cpp:1591 > > Use common method code. > > > postaloc.cpp:308/323 > > Once again you should be able to use common method code of LRG here. > > > regmask.cpp:91 > > Once again you should be able to use common method code of LRG here. Thanks for above suggestions. We will consider refactoring these parts. > Specific Comments (c2 webrev): > > > aarch64.ad:3815 > > very nice defensive check! > > > assembler_aarch64.hpp:2469 & 2699+ > > Andrew Haley is definitely going to ask you to update function entry > (assembler_aarch64.cpp:76) to call these new instruction generation > methods and then validate the generated code using asm_check So, I guess > you might as well do that now ;-) Thanks for letting us know. We will check how to validate those. > zBarrierSetAssembler_aarch64.cpp:434 > > Can you explain why we need to check p7 here and not do so in other places > where we call into the JVM? I'm not saying this is wrong. I just want to know > how you decided where re-init of p7 was needed. Sorry I don't know how the places are decided. But I will ask Ningsheng to explain this question and reply you later. > superword.cpp:97 > > Does this mean that is someone sets the maximum vector size to a non- > power of two, such as 384, all superword operations will be bypassed? > Including those which can be done using NEON vectors? The existing SLP doesn't support non-power-of-2 vector size (there are some assertions inside) so we added this. 
Yes, it's better if we have some mechanism to fall back to NEON for non-power-of-2 size. But so far in practice, we don't know any real chip implements the non-power-of-2 vector size. Also, we are now working on a new predicate-driven auto-vectorization pass to support SVE better. Do you think it's ok if we print some warnings if someone sets a non-power-of-2 size in vm options? Or any other suggestions in the short term? -- Thanks, Pengfei From tobias.hartmann at oracle.com Mon Aug 10 06:18:21 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 08:18:21 +0200 Subject: [16] RFR(S): 8249608: Vector register used by C2 compiled method corrupted at safepoint In-Reply-To: <8ddafcf8-5fcf-c0cc-ccd0-29692dd1c19b@oracle.com> References: <9dca10a8-0fcb-e63f-f0f9-c2552e5218c1@oracle.com> <57163077-f113-b538-2830-86e43c5bd8ea@oracle.com> <8ddafcf8-5fcf-c0cc-ccd0-29692dd1c19b@oracle.com> Message-ID: <88ad17d1-d79c-3504-d535-a720a8239fe4@oracle.com> Thanks Vladimir! Best regards, Tobias On 06.08.20 21:00, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 8/6/20 7:07 AM, Vladimir Ivanov wrote: >> >>> http://cr.openjdk.java.net/~thartmann/8249608/webrev.00/ >> >> Looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> >>> The problem is very similar to JDK-8193518 [1], a vector register (ymm0) used for vectorization of a >>> loop in a C2 compiled method is corrupted at a safepoint. Again, the root cause is the superword >>> optimization setting 'max_vector_size' to 16 bytes instead of 32 bytes which leads to the nmethod >>> being marked as !has_wide_vectors and the safepoint handler not saving vector registers [3]. >>> >>> This time, the problem is that the superword code only updates 'max_vlen_in_bytes' if 'vlen > >>> max_vlen'. In the failing case, 'vlen' is 4 for all packs (see [4]) but 'vlen_in_bytes' is 16 for >>> the 4 x int StoreVector and 32 for the 4 x long StoreVector. Once we've processed the int >>> StoreVector, we are not updating 'max_vlen_in_bytes' when processing long StoreVector because 'vlen' >>> is equal. >>> >>> The fix is to make sure to always update 'max_vlen_in_bytes'. >>> >>> When looking at JDK-8193518 [1], I've noticed that the corresponding regression test was never >>> pushed. I've added it to this webrev and extended it such that it also covers the new issue. >>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8193518 >>> [2] http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/opto/output.cpp#l3313 >>> [3] >>> http://hg.openjdk.java.net/jdk/jdk/file/1f74c0319302/src/hotspot/share/runtime/sharedRuntime.cpp#l551 >>> >>> >>> [4] -XX:+TraceSuperWord output: >>> >>> After filter_packs >>> packset >>> Pack: 0 >>> ? align: 0????? 1101??? StoreL??? ===? 1115? 1120? 1102? 174? [[ 1098 ]]? >>> @long[int:>=0]:exact+any *, idx=6; >>> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=993,214,[1012] !jvms: Test::test @ bci:17 >>> Test::main @ bci:8 >>> ? align: 8????? 1098??? StoreL??? ===? 1115? 1101? 1099? 174? [[ 993 ]]? @long[int:>=0]:exact+any >>> *, idx=6; >>> Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ bci:17 >>> Test::main @ bci:8 >>> ? align: 16????? 993??? StoreL??? ===? 1115? 1098? 994? 174? [[ 866? 214 ]]? >>> @long[int:>=0]:exact+any *, >>> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=214,[1012] !jvms: Test::test @ >>> bci:17 Test::main @ bci:8 >>> ? align: 24????? 214??? StoreL??? ===? 1115? 993? 212? 174? [[ 1120? 
864? 255 ]]? >>> @long[int:>=0]:exact+any *, >>> idx=6;? Memory: @long[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1012] !jvms: Test::test @ bci:17 >>> Test::main @ bci:8 >>> Pack: 1 >>> ? align: 0????? 1097??? StoreI??? ===? 1115? 1119? 1106? 41? [[ 1096 ]]? @int[int:>=0]:exact+any >>> *, idx=8; >>> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=989,253,[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> ? align: 4????? 1096??? StoreI??? ===? 1115? 1097? 1104? 41? [[ 989 ]]? @int[int:>=0]:exact+any >>> *, idx=8; >>> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> ? align: 8????? 989??? StoreI??? ===? 1115? 1096? 996? 41? [[ 867? 253 ]]? >>> @int[int:>=0]:exact+any *, idx=8; >>> Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=253,[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> ? align: 12????? 253??? StoreI??? ===? 1115? 989? 251? 41? [[ 1119? 860? 255 ]]? >>> @int[int:>=0]:exact+any *, >>> idx=8;? Memory: @int[int:>=0]:NotNull:exact+any *, idx=8; !orig=[1009] !jvms: Test::test @ bci:23 >>> Test::main @ bci:8 >>> >>> new Vector node:? 1491??? ReplicateI??? === _? 41? [[]]? #vectorx[4]:{int} >>> new Vector node:? 1492??? StoreVector??? ===? 1115? 1119? 1106? 1491? [[ 1487? 1119? 255? 1486 ]] >>> @int[int:>=0]:NotNull:exact+any *, idx=8; mismatched? Memory: @int[int:>=0]:NotNull:exact+any *, >>> idx=8; !orig=[1097],[989],[253],[1009] !jvms: Test::test @ bci:23 Test::main @ bci:8 >>> new Vector node:? 1493??? ReplicateL??? === _? 174? [[]]? #vectory[4]:{long} >>> new Vector node:? 1494??? StoreVector??? ===? 1115? 1120? 1102? 1493? [[ 1489? 1120? 255? 1488 ]] >>> @long[int:>=0]:NotNull:exact+any *, idx=6; mismatched? Memory: @long[int:>=0]:NotNull:exact+any *, >>> idx=6; !orig=[1101],[993],[214],[1012] !jvms: Test::test @ bci:17 Test::main @ bci:8 >>> From tobias.hartmann at oracle.com Mon Aug 10 07:20:05 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 09:20:05 +0200 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree In-Reply-To: References: Message-ID: +1 Best regards, Tobias On 07.08.20 18:50, Vladimir Ivanov wrote: >> Webrev: https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ > Looks good. > > I'll submit it for testing. > > Best regards, > Vladimir Ivanov From yueshi.zwj at alibaba-inc.com Mon Aug 10 07:24:52 2020 From: yueshi.zwj at alibaba-inc.com (Joshua Zhu) Date: Mon, 10 Aug 2020 15:24:52 +0800 Subject: =?UTF-8?B?562U5aSNOiBbYWFyY2g2NC1wb3J0LWRldiBdIFJGUihMKTogODIzMTQ0MTogQUFyY2g2NDog?= =?UTF-8?B?SW5pdGlhbCBTVkUgYmFja2VuZCBzdXBwb3J0?= In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <003101d66ee7$56b5e3f0$0421abd0$@alibaba-inc.com> Hi Andrew, Thanks a lot for your review. > As Ningsheng is on leave this week, I will attempt to answer your specific > questions based on what I know. I'm sorry that I'm not able to answer all > your questions since I'm not familiar with every detail of the patch. And you > may still need to wait him coming back to update the webrev(s). I will help answer questions related with RA. > > Specific Comments (register allocator webrev): > > > > > > aarch64.ad:97-100 > > > > Why have you added a reg_def for R8 and R9 here and also to > > alloc_class > > chunk0 at lines 544-545? 
They aren't used by C2 so why define them? > > This has no functionality change to the two scratch registers. But if these are > missing in the register definition, the regmask for vector registers won't start > at an aligned position. So we prefer adding them back to make the > computation easier. Yes. Thanks Pengfei. > > > assembler_aarch64.hpp:280 (also 699) > > > > prf sets a predicate register field. pgrf sets a governing predicate register > field. > > Should the name not be gprf. > > I guess the reason is that the ArmARM doc says "the Pg field". > > > chaitin.cpp:648-660 > > > > The comment is rather oddly formatted. > > Thanks for catching this. > > > At line 650 you guard the assert with a test for lrg._is_vector. Is > > that not always going to be guaranteed by the outer condition > > lrg._is_scalable? If so then you should really assert lrg._is_vector. _is_scalable tells the register length for the live range is scalable. This rule applies for both SVE vector register and predicate register. Each predicate register holds one bit per byte of SVE vector register, meaning that each predicate register is one-eighth of the size of SVE vector register. Each predicate register is an IMPLEMENTATION DEFINED multiple of 16 bits, up to 256 bits. Although the actual length of predicate register is scalable, the max slots is always defined as 1. class PRegisterImpl: public AbstractRegisterImpl { public: enum { number_of_registers = 16, max_slots_per_register = 1 }; I think this patch under review does not include the part of predicate register allocation. > > The special case code for computation of num_regs for a vector stack > > slot also appears in this file with a slightly different organization > > in find_first_set (line 1350) and in PhaseChaitin::Select (line 1590). > > There is another similar case in RegMask::num_registers at regmask.cpp: > > 98. It would be better to factor out the common code into methods of LRG. > > Maybe using the following? > > > > bool LRG::is_scalable_vector() { > > if (_is_scalable) { > > assert(_is_vector == 1); > > assert(_num_regs == == RegMask::SlotsPerVecA) > > return true; > > } > > return false; > > } > > > > int LRG::scalable_num_regs() { > > assert(is_scalable_vector()); > > if (OptoReg::is_stack(_reg)) { > > return _scalable_reg_slots > > } else { > > return num_reg_slots; > > } > > } > > > > chaitin.cpp:1350 > > > > Once again the test for lrg._is_vector should be guaranteed by the > > outer test of lrg._is_scalable. Refactoring using the common methods > > of LRG as above ought to help. > > > > chaitin.cpp:1591 > > > > Use common method code. > > > > > > postaloc.cpp:308/323 > > > > Once again you should be able to use common method code of LRG here. > > > > > > regmask.cpp:91 > > > > Once again you should be able to use common method code of LRG here. PhaseChaitin::Select (line 1590) will cover both SVE vector and predicate cases in future. 1590 // We always choose the high bit, then mask the low bits by register size 1591 if (lrg->_is_scalable && OptoReg::is_stack(lrg->reg())) { // stack 1592 n_regs = lrg->scalable_reg_slots(); 1593 } I think regmask.cpp (line 98) in future will look like: 98 if (lrg._is_scalable && OptoReg::is_stack(assigned)) { 99 if (lrg._is_vector) { 100 assert(ireg == Op_VecA, "scalable vector register"); 101 } else if (lrg._is_predicate) { assert(ireg == Op_RegVMask, "scalable predicate register"); } 102 n_regs = lrg.scalable_reg_slots(); 103 } 104 105 return n_regs; 106 } Please correct me if any issues. Thanks. 
Best Regards, Joshua From tobias.hartmann at oracle.com Mon Aug 10 07:32:51 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 09:32:51 +0200 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> Message-ID: <5f283812-510f-e22e-3e95-810103da2e43@oracle.com> Hi Vladimir, looks good to me. Little typo in the test on line 27: "explressions". Best regards, Tobias On 10.08.20 06:25, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8249749 > > SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: > > AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > > As result it can't find memory reference to align vectors. But code ignores that and continue > execution. > Later when align_to_ref is referenced we hit SEGV because it is NULL. > > The fix is to check align_to_ref for NULL early and bailout. > > I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to > vectorize test's code. > And added missing _invar setting. > > And I slightly modified tracking code to investigate this issue. > > Added new test to check some complex address expressions similar to bug's test case. Not all cases > in test are vectorized - there are other conditions which prevent that. > > Tested tier1,tier2,hs-tier3,precheckin-comp > > Thanks, > Vladimir K From tobias.hartmann at oracle.com Mon Aug 10 07:52:35 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 09:52:35 +0200 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations In-Reply-To: References: Message-ID: +1 Best regards, Tobias On 07.08.20 18:45, Vladimir Ivanov wrote: > >> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/ > > Looks good. > > So far, testing results look good (hs-tier1/2 are clean, tier1-4 are in progress). > > Best regards, > Vladimir Ivanov > >> C2 has re-association of loop invariants. However, the current implementation >> only supports the re-associations for add and subtract with 32-bits integer type. >> For other associative expressions like multiplication and the logic operations, >> the re-association is also applicable, and also for the operations with long type. >> >> This patch adds the missing re-associations for other associative operations >> together with the support for long type. >> >> With this patch, the following expressions: >> ?? (x * inv1) * inv2 >> ?? (x | inv1) | inv2 >> ?? (x & inv1) & inv2 >> ?? (x ^ inv1) ^ inv2???????? ; inv1, inv2 are invariants >> >> can be re-associated to: >> ?? x * (inv1 * inv2)???????? ; "inv1 * inv2" can be hoisted >> ?? x | (inv1 | inv2)???????? ; "inv1 | inv2" can be hoisted >> ?? x & (inv1 & inv2)?????? ; "inv1 & inv2" can be hoisted >> ?? x ^ (inv1 ^ inv2)???????? ; "inv1 ^ inv2" can be hoisted >> >> Performance: >> Here is the micro benchmark: >> http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java >> >> And the results on X86_64: >> Before: >> Benchmark?????????????????????????? (length)? Mode Cnt??? Score??????? Error????? Units >> loopInvariantAddLong????????? 1024????? avgt?? 15?? 988.142??? ?? 0.110?? ns/op >> loopInvariantAndInt????????????? 1024????? avgt?? 15?? 843.850??? ?? 0.522?? ns/op >> loopInvariantAndLong????????? 1024????? avgt?? 
15   990.551 ±  10.458  ns/op
>> loopInvariantMulInt        1024  avgt   15  1209.003 ±   0.247  ns/op
>> loopInvariantMulLong       1024  avgt   15  1213.923 ±   0.438  ns/op
>> loopInvariantOrInt         1024  avgt   15   843.908 ±   0.132  ns/op
>> loopInvariantOrLong        1024  avgt   15   990.710 ±  10.484  ns/op
>> loopInvariantSubLong       1024  avgt   15   988.170 ±   0.159  ns/op
>> loopInvariantXorInt        1024  avgt   15   806.949 ±   7.860  ns/op
>> loopInvariantXorLong       1024  avgt   15   990.963 ±   8.321  ns/op
>>
>> After:
>> Benchmark              (length)  Mode  Cnt     Score     Error  Units
>> loopInvariantAddLong       1024  avgt   15   842.854 ±   9.036  ns/op
>> loopInvariantAndInt        1024  avgt   15   698.097 ±   0.916  ns/op
>> loopInvariantAndLong       1024  avgt   15   841.120 ±   0.118  ns/op
>> loopInvariantMulInt        1024  avgt   15   691.000 ±   7.696  ns/op
>> loopInvariantMulLong       1024  avgt   15   846.907 ±   0.189  ns/op
>> loopInvariantOrInt         1024  avgt   15   698.423 ±   4.969  ns/op
>> loopInvariantOrLong        1024  avgt   15   843.465 ±  10.196  ns/op
>> loopInvariantSubLong       1024  avgt   15   841.314 ±   2.906  ns/op
>> loopInvariantXorInt        1024  avgt   15   652.529 ±   0.556  ns/op
>> loopInvariantXorLong       1024  avgt   15   841.860 ±   2.491  ns/op
>>
>> Results on AArch64:
>> Before:
>> Benchmark              (length)  Mode  Cnt     Score     Error  Units
>> loopInvariantAddLong       1024  avgt   15   514.437 ±   0.351  ns/op
>> loopInvariantAndInt        1024  avgt   15   435.301 ±   0.415  ns/op
>> loopInvariantAndLong       1024  avgt   15   572.437 ±   0.057  ns/op
>> loopInvariantMulInt        1024  avgt   15  1154.544 ±   0.030  ns/op
>> loopInvariantMulLong       1024  avgt   15  1188.109 ±   0.299  ns/op
>> loopInvariantOrInt         1024  avgt   15   435.605 ±   0.977  ns/op
>> loopInvariantOrLong        1024  avgt   15   572.475 ±   0.093  ns/op
>> loopInvariantSubLong       1024  avgt   15   514.340 ±   0.154  ns/op
>> loopInvariantXorInt        1024  avgt   15   426.186 ±   0.105  ns/op
>> loopInvariantXorLong       1024  avgt   15   572.505 ±   0.259  ns/op
>>
>> After:
>> Benchmark              (length)  Mode  Cnt     Score     Error  Units
>> loopInvariantAddLong       1024  avgt   15   508.179 ±   0.108  ns/op
>> loopInvariantAndInt        1024  avgt   15   394.706 ±   0.199  ns/op
>> loopInvariantAndLong       1024  avgt   15   434.443 ±   0.247  ns/op
>> loopInvariantMulInt        1024  avgt   15   762.477 ±   0.079  ns/op
>> loopInvariantMulLong       1024  avgt   15   775.975 ±   0.159  ns/op
>> loopInvariantOrInt         1024  avgt   15   394.657 ±   0.156  ns/op
>> loopInvariantOrLong        1024  avgt   15   434.428 ±   0.282  ns/op
>> loopInvariantSubLong       1024  avgt   15   507.475 ±   0.151  ns/op
>> loopInvariantXorInt        1024  avgt   15   396.000 ±   0.011  ns/op
>> loopInvariantXorLong       1024  avgt   15   434.255 ±   0.099  ns/op
>>
>> Tests:
>> Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1
>> and jcstress:tests-custom, and all tests pass without new failure.
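For illustration, here is a minimal, self-contained Java sketch of the loop shape this optimization targets. The class and variable names are made up and are not taken from the patch or from the benchmark; the point is only that (a[i] & inv1) & inv2 can be rewritten as a[i] & (inv1 & inv2), so the invariant part is computed once outside the loop:

public class ReassocSketch {
    // inv1 and inv2 are loop invariants, so after re-association C2 can
    // hoist (inv1 & inv2) out of the loop and keep a single AND per element.
    static int andReduce(int[] a, int inv1, int inv2) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += (a[i] & inv1) & inv2;
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        java.util.Arrays.fill(a, 7);
        System.out.println(andReduce(a, 0x0F, 0x3C));
    }
}

The same shape applies to the |, ^, * and long variants listed in the quoted expressions above.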
>> >> Thanks, >> Xiaohong Gong >> From tobias.hartmann at oracle.com Mon Aug 10 08:13:48 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 10 Aug 2020 10:13:48 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> Message-ID: <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> Hi Christian, I agree with Vladimir, very nice analysis. Although I'm not too familiar with the C1 register allocator, your explanation and fix makes sense to me. Just wondering, do we hit this case with any of our existing tests? Best regards, Tobias On 06.08.20 11:34, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8249603 > http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ > > Register allocation fails in C1 in the testcase because two intervals overlap (they both have the > same stack slot assigned). The problem can be traced back to the optimization to assign the same > spill slot to non-intersecting intervals in LinearScanWalker::combine_spilled_intervals(). > > In this method, we look at a split parent interval 'cur' and its register hint interval > 'register_hint'. A register hint is present when the interval represents either the source or the > target operand of a move operation and the register hint the target or source operand, respectively > (the register hint is used to try to assign the same register to the source and target operand such > that we can completely remove the move operation). > > If the register hint is set, then we do some additional checks and make sure that the split parent > and the register hint do not intersect. If all checks pass, the split parent 'cur' gets the same > spill slot as the register hint [1]. This means that both intervals get the same slot on the stack > if they are spilled. > > The problem now is that we do not consider any split children of the register hint which all share > the same spill slot with the register hint (their split parent). In the testcase, the split parent > 'cur' does not intersect with the register hint but with one of its split children. As a result, > they both get the same spill slot and are later indeed both spilled (i.e. both virtual > registers/operands are put to the same stack location at the same time). > > The fix now additionally checks if the split parent 'cur' does not intersect any split children of > the register hint in combine_spilled_intervals(). If there is such an intersection, then we bail out > of the optimization. > > Some standard benchmark testing did not show any regressions. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From vladimir.x.ivanov at oracle.com Mon Aug 10 08:30:59 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 10 Aug 2020 11:30:59 +0300 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations In-Reply-To: References: Message-ID: <40dffc1b-1c62-4a53-a21f-3cf041ab569b@oracle.com> >> Webrev: http://cr.openjdk.java.net/~xgong/rfr/8250808/webrev.00/ > > Looks good. > > So far, testing results look good (hs-tier1/2 are clean, tier1-4 are in > progress). FYI test results are clean. Best regards, Vladimir Ivanov >> C2 has re-association of loop invariants. 
However, the current >> implementation >> only supports the re-associations for add and subtract with 32-bits >> integer type. >> For other associative expressions like multiplication and the logic >> operations, >> the re-association is also applicable, and also for the operations >> with long type. >> >> This patch adds the missing re-associations for other associative >> operations >> together with the support for long type. >> >> With this patch, the following expressions: >> ?? (x * inv1) * inv2 >> ?? (x | inv1) | inv2 >> ?? (x & inv1) & inv2 >> ?? (x ^ inv1) ^ inv2???????? ; inv1, inv2 are invariants >> >> can be re-associated to: >> ?? x * (inv1 * inv2)???????? ; "inv1 * inv2" can be hoisted >> ?? x | (inv1 | inv2)???????? ; "inv1 | inv2" can be hoisted >> ?? x & (inv1 & inv2)?????? ; "inv1 & inv2" can be hoisted >> ?? x ^ (inv1 ^ inv2)???????? ; "inv1 ^ inv2" can be hoisted >> >> Performance: >> Here is the micro benchmark: >> http://cr.openjdk.java.net/~xgong/rfr/8250808/LoopInvariant.java >> >> And the results on X86_64: >> Before: >> Benchmark?????????????????????????? (length)? Mode Cnt??? Score >> Error????? Units >> loopInvariantAddLong????????? 1024????? avgt?? 15?? 988.142??? ? >> 0.110?? ns/op >> loopInvariantAndInt????????????? 1024????? avgt?? 15?? 843.850??? ? >> 0.522?? ns/op >> loopInvariantAndLong????????? 1024????? avgt?? 15?? 990.551??? ? >> 10.458? ns/op >> loopInvariantMulInt????????????? 1024????? avgt?? 15? 1209.003?? ? >> 0.247?? ns/op >> loopInvariantMulLong????????? 1024????? avgt?? 15? 1213.923?? ? >> 0.438??? ns/op >> loopInvariantOrInt??????????????? 1024????? avgt?? 15?? 843.908??? ? >> 0.132??? ns/op >> loopInvariantOrLong???????????? 1024????? avgt?? 15?? 990.710?? ? >> 10.484? ns/op >> loopInvariantSubLong?????????? 1024????? avgt?? 15?? 988.170?? ? >> 0.159??? ns/op >> loopInvariantXorInt?????????????? 1024????? avgt?? 15?? 806.949?? ? >> 7.860??? ns/op >> loopInvariantXorLong?????????? 1024????? avgt?? 15?? 990.963?? ? >> 8.321??? ns/op >> >> After: >> Benchmark?????????????????????????? (length)? Mode? Cnt??? Score >> Error??? Units >> loopInvariantAddLong????????? 1024????? avgt?? 15?? 842.854?? ? >> 9.036? ns/op >> loopInvariantAndInt????????????? 1024????? avgt?? 15?? 698.097?? ? >> 0.916? ns/op >> loopInvariantAndLong????????? 1024????? avgt?? 15?? 841.120?? ? >> 0.118? ns/op >> loopInvariantMulInt????????????? 1024????? avgt?? 15?? 691.000?? ? >> 7.696? ns/op >> loopInvariantMulLong????????? 1024????? avgt?? 15?? 846.907?? ? >> 0.189? ns/op >> loopInvariantOrInt??????????????? 1024????? avgt?? 15?? 698.423?? ? >> 4.969? ns/op >> loopInvariantOrLong??????????? 1024????? avgt?? 15?? 843.465?? ? >> 10.196? ns/op >> loopInvariantSubLong????????? 1024????? avgt?? 15?? 841.314?? ? >> 2.906? ns/op >> loopInvariantXorInt????????????? 1024????? avgt?? 15?? 652.529?? ? >> 0.556? ns/op >> loopInvariantXorLong????????? 1024????? avgt?? 15?? 841.860?? ? >> 2.491? ns/op >> >> Results on AArch64: >> Before: >> Benchmark????????????????????????? (length)? Mode? Cnt??? Score >> Error???? Units >> loopInvariantAddLong???????? 1024????? avgt??? 15?? 514.437??? ? >> 0.351? ns/op >> loopInvariantAndInt??????????? 1024????? avgt???? 15?? 435.301??? ? >> 0.415? ns/op >> loopInvariantAndLong??????? 1024????? avgt???? 15?? 572.437??? ? >> 0.057? ns/op >> loopInvariantMulInt??????????? 1024????? avgt???? 15? 1154.544?? ? >> 0.030? ns/op >> loopInvariantMulLong??????? 1024????? avgt???? 15? 1188.109?? ? 0.299 >> ns/op >> loopInvariantOrInt????????????? 1024????? 
avgt???? 15?? 435.605??? ? >> 0.977? ns/op >> loopInvariantOrLong????????? 1024????? avgt???? 15?? 572.475???? ? >> 0.093? ns/op >> loopInvariantSubLong??????? 1024????? avgt???? 15?? 514.340??? ? >> 0.154? ns/op >> loopInvariantXorInt??????????? 1024????? avgt???? 15?? 426.186??? ? >> 0.105? ns/op >> loopInvariantXorLong??????? 1024????? avgt???? 15?? 572.505??? ? >> 0.259? ns/op >> >> After: >> Benchmark??????????????????????? (length)? Mode? Cnt??? Score >> Error??? Units >> loopInvariantAddLong?????? 1024???? avgt???? 15?? 508.179?? ? 0.108 >> ns/op >> loopInvariantAndInt?????????? 1024??? avgt???? 15?? 394.706?? ? 0.199 >> ns/op >> loopInvariantAndLong?????? 1024??? avgt???? 15?? 434.443?? ? 0.247? ns/op >> loopInvariantMulInt?????????? 1024??? avgt???? 15?? 762.477?? ? 0.079 >> ns/op >> loopInvariantMulLong?????? 1024??? avgt???? 15?? 775.975?? ? 0.159? ns/op >> loopInvariantOrInt???????????? 1024??? avgt???? 15?? 394.657?? ? >> 0.156? ns/op >> loopInvariantOrLong???????? 1024??? avgt???? 15?? 434.428?? ? 0.282 >> ns/op >> loopInvariantSubLong?????? 1024??? avgt???? 15?? 507.475?? ? 0.151? ns/op >> loopInvariantXorInt?????????? 1024??? avgt???? 15?? 396.000?? ? 0.011 >> ns/op >> loopInvariantXorLong?????? 1024??? avgt???? 15?? 434.255?? ? 0.099? ns/op >> >> Tests: >> Tested jtreg hotspot::hotspot_all_no_apps,jdk::jdk_core,langtools::tier1 >> and jcstress:tests-custom, and all tests pass without new failure. >> >> Thanks, >> Xiaohong Gong >> From vladimir.x.ivanov at oracle.com Mon Aug 10 08:33:51 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 10 Aug 2020 11:33:51 +0300 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree In-Reply-To: References: Message-ID: <500e1fc4-11a5-90c3-d554-11cdf2f3eaed@oracle.com> >> https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ > Looks good. > > I'll submit it for testing. Test results are clean. I'll push the patch for you. Best regards, Vladimir Ivanov From adinn at redhat.com Mon Aug 10 08:43:34 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 10 Aug 2020 09:43:34 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <6562d0da-f081-ede8-dfef-d3d6c70fb998@redhat.com> Hi Pengfei, On 10/08/2020 05:45, Pengfei Li wrote: >> I also eyeballed /some/ of the generated code to check that it >> looked ok. I'd really like to be able to do that systematically for >> a comprehensive test suite that exercised every rule but I only had >> the machine for a few days. This really ought to be done as a >> follow-up to ensure that all the rules are working as expected. > > Not sure if you have tried my newly added test in the vectorization > folder. It checks if expected SVE/NEON instructions are generated as > expected for each C2 vectornode by checking the OptoAssembly output. > I put it in another webrev so you may have missed it. > http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/ Ah, thank you. That was not in the patch I Ningsheng pointed me at. It is exactly what is needed to check the generation rules are all working. >> Just out of interest why does UseSVE have range(0,2)? It seems you >> are only testing for UseSVE > 0. Does value 2 correspond to an >> optional subset? 
> AArch64 SVE has multiple versions. Current Fujitsu FX machine > supports SVE1 only. We leave 2 here for SVE2 support in the near > future. > https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator/resources/tutorials/sve/sve-vs-sve2/introduction-to-sve2 Ah ok, thanks. Got it. Being able to switch on level 1 without level 2 is a good idea. >> Why have you added a reg_def for R8 and R9 here and also to >> alloc_class chunk0 at lines 544-545? They aren't used by C2 so why >> define them? > > This has no functionality change to the two scratch registers. But if > these are missing in the register definition, the regmask for vector > registers won't start at an aligned position. So we prefer adding > them back to make the computation easier. It would be good to make this clear with a comment. Also, I think you should change the name of the registers to R8_UNUSED and R9_UNUSED just to emphasize that these are not expected to be included in any register sets. >> prf sets a predicate register field. pgrf sets a governing >> predicate register field. Should the name not be gprf. > > I guess the reason is that the ArmARM doc says "the Pg field". Ok, let's leave it at that then and blame ARM ;-) >> chaitin.cpp:648-660 >> >> The comment is rather oddly formatted. > > Thanks for catching this. Well, that's what reviews are for ... >> At line 650 you guard the assert with a test for lrg._is_vector. Is >> that not always going to be guaranteed by the outer condition >> lrg._is_scalable? If so then you should really assert >> lrg._is_vector. >> >> . . . > Thanks for above suggestions. We will consider refactoring these > parts. Ok, I'll wait for an updated webrev. >> Andrew Haley is definitely going to ask you to update function >> entry (assembler_aarch64.cpp:76) to call these new instruction >> generation methods and then validate the generated code using >> asm_check So, I guess you might as well do that now ;-) > > Thanks for letting us know. We will check how to validate those. Ok, thanks. >> Can you explain why we need to check p7 here and not do so in other >> places where we call into the JVM? I'm not saying this is wrong. I >> just want to know how you decided where re-init of p7 was needed. > > Sorry I don't know how the places are decided. But I will ask > Ningsheng to explain this question and reply you later. Sure, thanks. >> Does this mean that is someone sets the maximum vector size to a >> non- power of two, such as 384, all superword operations will be >> bypassed? Including those which can be done using NEON vectors? > > The existing SLP doesn't support non-power-of-2 vector size (there > are some assertions inside) so we added this. Yes, it's better if we > have some mechanism to fall back to NEON for non-power-of-2 size. But > so far in practice, we don't know any real chip implements the > non-power-of-2 vector size. Also, we are now working on a new > predicate-driven auto-vectorization pass to support SVE better. Do > you think it's ok if we print some warnings if someone sets a > non-power-of-2 size in vm options? Or any other suggestions in the > short term? Well, the test for MaxVectorSize in vm_version.cpp currently only ensures it has been set to a multiple of 16. I think you probably ought to check for a power of two at that point and exit the VM otherwise. If hardware comes along that supports a non-power of two we can deal with it at that point. 
regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From Xiaohong.Gong at arm.com Mon Aug 10 08:44:02 2020 From: Xiaohong.Gong at arm.com (Xiaohong Gong) Date: Mon, 10 Aug 2020 08:44:02 +0000 Subject: RFR: 8250808: Re-associate loop invariants with other associative operations In-Reply-To: References: Message-ID: Hi Tobias, > +1 Thanks for the review! Best Regards, Xiaohong From vladimir.x.ivanov at oracle.com Mon Aug 10 09:04:17 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 10 Aug 2020 12:04:17 +0300 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> Message-ID: <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> > http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ Looks good. Best regards, Vladimir Ivanov > https://bugs.openjdk.java.net/browse/JDK-8249749 > > SuperWord does not recognize array indexing pattern used in the test due > to additional AddI node: > > AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > > As result it can't find memory reference to align vectors. But code > ignores that and continue execution. > Later when align_to_ref is referenced we hit SEGV because it is NULL. > > The fix is to check align_to_ref for NULL early and bailout. > > I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize > this address pattern to vectorize test's code. > And added missing _invar setting. > > And I slightly modified tracking code to investigate this issue. > > Added new test to check some complex address expressions similar to > bug's test case. Not all cases in test are vectorized - there are other > conditions which prevent that. > > Tested tier1,tier2,hs-tier3,precheckin-comp > > Thanks, > Vladimir K From adinn at redhat.com Mon Aug 10 09:18:59 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 10 Aug 2020 10:18:59 +0100 Subject: =?UTF-8?B?UmU6IOetlOWkjTogW2FhcmNoNjQtcG9ydC1kZXYgXSBSRlIoTCk6IDgy?= =?UTF-8?Q?31441=3a_AArch64=3a_Initial_SVE_backend_support?= In-Reply-To: <003101d66ee7$56b5e3f0$0421abd0$@alibaba-inc.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <003101d66ee7$56b5e3f0$0421abd0$@alibaba-inc.com> Message-ID: Hi Joshua, On 10/08/2020 08:24, Joshua Zhu wrote: > I will help answer questions related with RA. Thanks for your help. >>> At line 650 you guard the assert with a test for lrg._is_vector. Is >>> that not always going to be guaranteed by the outer condition >>> lrg._is_scalable? If so then you should really assert lrg._is_vector. > > _is_scalable tells the register length for the live range is > scalable. This rule applies for both SVE vector register and > predicate register. Each predicate register holds one bit per > byte of SVE vector register, meaning that each predicate > register is one-eighth of the size of SVE vector register. > Each predicate register is an IMPLEMENTATION DEFINED multiple > of 16 bits, up to 256 bits. 
Although the actual length of > predicate register is scalable, the max slots is always defined > as 1.> class PRegisterImpl: public AbstractRegisterImpl { > public: > enum { > number_of_registers = 16, > max_slots_per_register = 1 > }; > I think this patch under review does not include the part of > predicate register allocation. Ok, I understand that _is_scalable is meant to identify both a predicate register and an SVE vector register. Something definitely seems to be missing because field LRG::_is_scalable is not set in the case where we have a PRegisterImpl (Op_RegVMask). In webrev03 it only ever gets set at chaitin.cpp:822: if (RegMask::is_vector(ireg)) { lrg._is_vector = 1; if (ireg == Op_VecA) { assert(Matcher::supports_scalable_vector(), "scalable vector should be supported"); lrg._is_scalable = 1; // For scalable vector, when it is allocated in physical register, // num_regs is RegMask::SlotsPerVecA for reg mask, // which may not be the actual physical register size. // If it is allocated in stack, we need to get the actual // physical length of scalable vector register. lrg.set_scalable_reg_slots(Matcher::scalable_vector_reg_size(T_FLOAT)); } So, it seems LRG::_is_scalable will only be set for a VecA register. If you could check what code might be missing and post a new webrev I'll look at this again. However, it would still be good to try to factor out some common code into methods if possible. >>> The special case code for computation of num_regs for a vector stack >>> slot also appears in this file with a slightly different organization >>> . . . > PhaseChaitin::Select (line 1590) will cover both SVE vector and predicate cases in future. > 1590 // We always choose the high bit, then mask the low bits by register size > 1591 if (lrg->_is_scalable && OptoReg::is_stack(lrg->reg())) { // stack > 1592 n_regs = lrg->scalable_reg_slots(); > 1593 } > > I think regmask.cpp (line 98) in future will look like: > 98 if (lrg._is_scalable && OptoReg::is_stack(assigned)) { > 99 if (lrg._is_vector) { > 100 assert(ireg == Op_VecA, "scalable vector register"); > 101 } > else if (lrg._is_predicate) { > assert(ireg == Op_RegVMask, "scalable predicate register"); > } > 102 n_regs = lrg.scalable_reg_slots(); > 103 } > 104 > 105 return n_regs; > 106 } > > Please correct me if any issues. Thanks. Ok, I agree that this will be correct when we can come across the case where lrg._is_scalable is true and ireg == Op_RegVMask. However, that case does not currently arise. So, a new webrev that allows for this case would help. Thanks for helping to explain what is going on here. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From doug.simon at oracle.com Mon Aug 10 09:55:30 2020 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 10 Aug 2020 11:55:30 +0200 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: Message-ID: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Hi Ludovic, Are you considering also implementing this intrinsic in Graal? Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. 
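To make the bounds-check angle concrete, here is a small, self-contained Java sketch of the access pattern involved; it is illustrative only and is not the code from sun/security/provider/MD5.java. MD5 decodes each 64-byte block into 16 little-endian 32-bit words, so without elimination every byte load below carries its own range check:

class LittleEndianSketch {
    // Decode one 32-bit little-endian word from buf starting at ofs.
    // A compiled MD5 loop performs this 16 times per 64-byte block.
    static int leInt(byte[] buf, int ofs) {
        return  (buf[ofs]     & 0xff)
             | ((buf[ofs + 1] & 0xff) << 8)
             | ((buf[ofs + 2] & 0xff) << 16)
             | ((buf[ofs + 3] & 0xff) << 24);
    }

    public static void main(String[] args) {
        byte[] block = new byte[64];
        block[0] = 1;
        System.out.println(leInt(block, 0));   // prints 1
    }
}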
-Doug > On 9 Aug 2020, at 05:19, Ludovic Henry wrote: > > Hello, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 > Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://bugs.openjdk.java.net/browse/JDK-8250902 From beurba at microsoft.com Mon Aug 10 13:01:45 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Mon, 10 Aug 2020 13:01:45 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> References: , <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Message-ID: Hey Doug, replying on behalf for Ludovic, as he is on vacation :-) Currently we are not planning to implement the intrinsic for Graal. Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already? This is the relevant Java method for the MD5 intrinsic: https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java#L172 -Bernhard ________________________________________ From: Doug Simon Sent: Monday, August 10, 2020 11:55 To: Ludovic Henry Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Hi Ludovic, Are you considering also implementing this intrinsic in Graal? Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. 
-Doug

> On 9 Aug 2020, at 05:19, Ludovic Henry wrote:
>
> Hello,
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8251216
> Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00
>
> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1
>
> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2):
>
> -XX:-UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  1616.238 ± 28.082  ops/ms
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   215.030 ±  0.691  ops/ms
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.228 ±  0.001  ops/ms
>
> -XX:+UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  2005.233 ± 40.513  ops/ms  => 24% speedup
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   275.979 ±  0.455  ops/ms  => 28% speedup
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.279 ±  0.001  ops/ms  => 22% speedup
>
> Thank you,
> Ludovic
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8250902

From doug.simon at oracle.com Mon Aug 10 13:38:42 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 10 Aug 2020 15:38:42 +0200
Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To: 
References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com>
Message-ID: 

Hi Bernhard,

> On 10 Aug 2020, at 15:01, Bernhard Urban-Forster wrote:
>
> Hey Doug,
>
> replying on behalf for Ludovic, as he is on vacation :-)
>
> Currently we are not planning to implement the intrinsic for Graal.

Schade ;-)

> Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already?

I don't think we do that anywhere currently, but I imagine it wouldn't be hard to put the BytecodeParser into a mode whereby an array access generates an AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck).
-Doug > > This is the relevant Java method for the MD5 intrinsic: > https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java*L172__;Iw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV$ > > > -Bernhard > > ________________________________________ > From: Doug Simon > > Sent: Monday, August 10, 2020 11:55 > To: Ludovic Henry > Cc: hotspot-compiler-dev at openjdk.java.net ; aarch64-port-dev at openjdk.java.net ; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Ludovic, > > Are you considering also implementing this intrinsic in Graal? > > Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. > > -Doug > >> On 9 Aug 2020, at 05:19, Ludovic Henry wrote: >> >> Hello, >> >> Bug: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=C7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3$ >> Webrev: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=http:*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D&reserved=0__;JSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ$ >> >> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 >> >> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): >> >> -XX:-UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms >> >> -XX:+UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 
0.001 ops/ms => 22% speedup >> >> Thank you, >> Ludovic >> >> [1] https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8250902&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=5KcoG5n10rnVMU9y8L076jpCoEd0NBzNqr*2F8M5ghO3c*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6SPJBTN$ From evgeny.nikitin at oracle.com Mon Aug 10 13:47:03 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 10 Aug 2020 15:47:03 +0200 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies Message-ID: Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. The change has been checked in mach5 for the 5 common platforms (passed). Please review, /Evgeny Nikitin. From evgeny.nikitin at oracle.com Mon Aug 10 14:22:39 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 10 Aug 2020 16:22:39 +0200 Subject: RFR(XS): 8069411: Un-quarantine OverloadCompileQueueTest.java Message-ID: Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8069411 Webrev: http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html The test failed previously due to a specific Assert class design from 2015 [1]. Please note that getMessage gets called for every comparison, causing a string copy. So the OOME was not caused by a test design or failure, it was just a common OOME, and Assert class was stressing the VM by copying error messages. These days Assert class has changed and I have run lengths attempting to reproduce OOME in that or any other place of the test. I suggest to enable the test in CI runs. Please review, //Evgeny Nikitin. ======== [1] http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html From Charlie.Gracie at microsoft.com Mon Aug 10 14:58:57 2020 From: Charlie.Gracie at microsoft.com (Charlie Gracie) Date: Mon, 10 Aug 2020 14:58:57 +0000 Subject: RFR: 8251303: C2: remove unused _site_invoke_ratio and related code from InlineTree Message-ID: <756853D1-477A-4A7A-AC18-0FFA624502A3@microsoft.com> Thanks for the reviews Vladimir and Tobias. Thanks for testing and sponsoring the change Vladimir. Cheers, Charlie Gracie ?On 2020-08-10, 4:29 AM, "Vladimir Ivanov" wrote: >> https://cr.openjdk.java.net/~burban/cgracie/unused_code/webrev0.0/ > Looks good. > > I'll submit it for testing. Test results are clean. I'll push the patch for you. Best regards, Vladimir Ivanov From igor.ignatyev at oracle.com Mon Aug 10 16:04:48 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 10 Aug 2020 09:04:48 -0700 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: References: Message-ID: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> Hi Evgeny, the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. 
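A minimal sketch of the idea, using a stand-in class instead of the real TestCaseImpl referenced by compiler/codecache/stress/Helper (all names below are made up for illustration):

class TestCaseImpl { }   // stand-in for the real test class

class Helper {
    // Referencing the Class object creates a compile-time dependency that is
    // statically detectable, so no explicit @build tag is needed and the name
    // string cannot go stale after a rename.
    static final String TEST_CASE_IMPL_CLASS_NAME = TestCaseImpl.class.getName();

    public static void main(String[] args) {
        System.out.println(TEST_CASE_IMPL_CLASS_NAME);
    }
}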
Thanks, -- Igor > On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 > Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ > > The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. > > The change has been checked in mach5 for the 5 common platforms (passed). > > Please review, > /Evgeny Nikitin. From igor.ignatyev at oracle.com Mon Aug 10 16:09:59 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 10 Aug 2020 09:09:59 -0700 Subject: RFR(XS): 8069411: Un-quarantine OverloadCompileQueueTest.java In-Reply-To: References: Message-ID: <6EF18AD8-F195-4DB5-98FE-50D1B49A3A90@oracle.com> Hi Evgeny, I'm assuming that you haven't seen timeouts in your reproducing attempts either, correct? the fix looks good (assuming to goes after 8251349) -- Igor > On Aug 10, 2020, at 7:22 AM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8069411 > Webrev: http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html > > The test failed previously due to a specific Assert class design from 2015 [1]. Please note that getMessage gets called for every comparison, causing a string copy. So the OOME was not caused by a test design or failure, it was just a common OOME, and Assert class was stressing the VM by copying error messages. > > These days Assert class has changed and I have run lengths attempting to reproduce OOME in that or any other place of the test. I suggest to enable the test in CI runs. > > > Please review, > //Evgeny Nikitin. > > ======== > [1] http://cr.openjdk.java.net/~enikitin//8069411/webrev.00/index.html From verghese at amazon.com Mon Aug 10 17:00:29 2020 From: verghese at amazon.com (Verghese, Clive) Date: Mon, 10 Aug 2020 17:00:29 +0000 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> Message-ID: <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> Hi Christian, Thank you for the feedback. I have updated the review addressing the comments below. http://cr.openjdk.java.net/~xliu/clive/8251268/01/webrev/ Regards, Clive Verghese ?On 8/6/20, 11:55 PM, "Christian Hagedorn" wrote: Hi Clive The fix looks good to me. It makes sense to move it to chaitin.cpp since the calls to verify() are also in this file only. You could fix some minor code style things about the existing code that you moved while at it: - You can move the #ifdef ASSERT out of both methods and surround both methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() are only called in ASSERT blocks. And add a // ASSERT comment on the closing #endif to make it more clear. Don't forget to also surround the declarations in the .hpp file with an ASSERT. 
- In verify_base_ptrs(): - L2330: Missing curly braces for the loop - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea *a -> ResourceArea* a - There is a missing space in all asserts after the comma separating the condition and the failure string - In verify(): - L2386: Missing space and curly braces for the if statement Best regards, Christian On 07.08.20 01:49, Verghese, Clive wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 > > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp > > I have tested this builds successfully for both PRODUCT and !PRODUCT. > > Ensured that there are no regressions in hotspot:tier1 tests. > > > Regards, > Clive Verghese > From vladimir.kozlov at oracle.com Mon Aug 10 17:01:01 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Aug 2020 10:01:01 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <5f283812-510f-e22e-3e95-810103da2e43@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <5f283812-510f-e22e-3e95-810103da2e43@oracle.com> Message-ID: <83200b26-6f48-0af0-19e3-1f8a2089d29b@oracle.com> Thank you, Tobias On 8/10/20 12:32 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Little typo in the test on line 27: "explressions". Fixed. Thanks, Vladimir K > > Best regards, > Tobias > > On 10.08.20 06:25, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8249749 >> >> SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: >> >> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >> >> As result it can't find memory reference to align vectors. But code ignores that and continue >> execution. >> Later when align_to_ref is referenced we hit SEGV because it is NULL. >> >> The fix is to check align_to_ref for NULL early and bailout. >> >> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to >> vectorize test's code. >> And added missing _invar setting. >> >> And I slightly modified tracking code to investigate this issue. >> >> Added new test to check some complex address expressions similar to bug's test case. Not all cases >> in test are vectorized - there are other conditions which prevent that. >> >> Tested tier1,tier2,hs-tier3,precheckin-comp >> >> Thanks, >> Vladimir K From vladimir.kozlov at oracle.com Mon Aug 10 17:02:34 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Aug 2020 10:02:34 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> Message-ID: <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> Thank you, Vladimir Vladimir K On 8/10/20 2:04 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > > Looks good. 
> > Best regards, > Vladimir Ivanov > >> https://bugs.openjdk.java.net/browse/JDK-8249749 >> >> SuperWord does not recognize array indexing pattern used in the test due to additional AddI node: >> >> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >> >> As result it can't find memory reference to align vectors. But code ignores that and continue execution. >> Later when align_to_ref is referenced we hit SEGV because it is NULL. >> >> The fix is to check align_to_ref for NULL early and bailout. >> >> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize this address pattern to vectorize test's code. >> And added missing _invar setting. >> >> And I slightly modified tracking code to investigate this issue. >> >> Added new test to check some complex address expressions similar to bug's test case. Not all cases in test are >> vectorized - there are other conditions which prevent that. >> >> Tested tier1,tier2,hs-tier3,precheckin-comp >> >> Thanks, >> Vladimir K From evgeny.nikitin at oracle.com Mon Aug 10 19:25:05 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 10 Aug 2020 21:25:05 +0200 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> Message-ID: Hi Igor, I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html Again, the same one-time test run in mach5 on 5 platforms. Thanks in advance, //Evgeny On 2020-08-10 18:04, Igor Ignatyev wrote: > Hi Evgeny, > > the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. > > Thanks, > -- Igor > >> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >> >> Hi, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >> >> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >> >> The change has been checked in mach5 for the 5 common platforms (passed). >> >> Please review, >> /Evgeny Nikitin. > From igor.ignatyev at oracle.com Mon Aug 10 19:34:06 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 10 Aug 2020 12:34:06 -0700 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> Message-ID: <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> LGTM -- Igor > On Aug 10, 2020, at 12:25 PM, Evgeny Nikitin wrote: > > Hi Igor, > > I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: > > http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html > > Again, the same one-time test run in mach5 on 5 platforms. > > Thanks in advance, > //Evgeny > > On 2020-08-10 18:04, Igor Ignatyev wrote: >> Hi Evgeny, >> the fix looks good. 
there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. >> Thanks, >> -- Igor >>> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >>> >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >>> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >>> >>> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >>> >>> The change has been checked in mach5 for the 5 common platforms (passed). >>> >>> Please review, >>> /Evgeny Nikitin. From vladimir.kozlov at oracle.com Mon Aug 10 20:05:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 10 Aug 2020 13:05:20 -0700 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> Message-ID: <429818e1-47e4-30d3-151f-0a38d2524bff@oracle.com> +1 Thanks, Vladimir K On 8/10/20 12:34 PM, Igor Ignatyev wrote: > LGTM > -- Igor > >> On Aug 10, 2020, at 12:25 PM, Evgeny Nikitin wrote: >> >> Hi Igor, >> >> I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: >> >> http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html >> >> Again, the same one-time test run in mach5 on 5 platforms. >> >> Thanks in advance, >> //Evgeny >> >> On 2020-08-10 18:04, Igor Ignatyev wrote: >>> Hi Evgeny, >>> the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. >>> Thanks, >>> -- Igor >>>> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >>>> >>>> Hi, >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >>>> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >>>> >>>> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >>>> >>>> The change has been checked in mach5 for the 5 common platforms (passed). >>>> >>>> Please review, >>>> /Evgeny Nikitin. > From christian.hagedorn at oracle.com Tue Aug 11 07:15:44 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 11 Aug 2020 09:15:44 +0200 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> Message-ID: Hi Clive Thanks a lot for taking care of this! One last comment: The existing spacing for the verify methods in the .hpp file is wrong. 
But since there are many more methods with a wrong spacing following it, I leave it up to you if you want to fix it for the verify methods or not. I'm fine with both. Either way, you don't need to send another webrev. Otherwise, it looks good to me! Best regards, Christian On 10.08.20 19:00, Verghese, Clive wrote: > Hi Christian, > > Thank you for the feedback. I have updated the review addressing the comments below. > > http://cr.openjdk.java.net/~xliu/clive/8251268/01/webrev/ > > Regards, > Clive Verghese > > > > ?On 8/6/20, 11:55 PM, "Christian Hagedorn" wrote: > > > Hi Clive > > The fix looks good to me. It makes sense to move it to chaitin.cpp since > the calls to verify() are also in this file only. > > You could fix some minor code style things about the existing code that > you moved while at it: > - You can move the #ifdef ASSERT out of both methods and surround both > methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() > are only called in ASSERT blocks. And add a // ASSERT comment on the > closing #endif to make it more clear. Don't forget to also surround the > declarations in the .hpp file with an ASSERT. > - In verify_base_ptrs(): > - L2330: Missing curly braces for the loop > - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea > *a -> ResourceArea* a > - There is a missing space in all asserts after the comma separating > the condition and the failure string > - In verify(): > - L2386: Missing space and curly braces for the if statement > > > Best regards, > Christian > > On 07.08.20 01:49, Verghese, Clive wrote: > > Hi, > > > > Requesting review for > > > > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ > > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 > > > > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to chaitin.cpp > > > > I have tested this builds successfully for both PRODUCT and !PRODUCT. > > > > Ensured that there are no regressions in hotspot:tier1 tests. > > > > > > Regards, > > Clive Verghese > > > > From christian.hagedorn at oracle.com Tue Aug 11 08:40:38 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 11 Aug 2020 10:40:38 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> Message-ID: Hi Tobias Thanks a lot! I think we will always catch an overlap later in the verification method unless we somehow correct the mistake until then. But I don't think that this is likely or even possible. Nevertheless, I still wanted to verify that to some extent and added an assert(false) in the newly added intersection bailout test with the split children and could not trigger it in tier 1-4 (apart from the newly added test). Best regards, Christian On 10.08.20 10:13, Tobias Hartmann wrote: > Hi Christian, > > I agree with Vladimir, very nice analysis. Although I'm not too familiar with the C1 register > allocator, your explanation and fix makes sense to me. > > Just wondering, do we hit this case with any of our existing tests? 
> > Best regards, > Tobias > > On 06.08.20 11:34, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8249603 >> http://cr.openjdk.java.net/~chagedorn/8249603/webrev.00/ >> >> Register allocation fails in C1 in the testcase because two intervals overlap (they both have the >> same stack slot assigned). The problem can be traced back to the optimization to assign the same >> spill slot to non-intersecting intervals in LinearScanWalker::combine_spilled_intervals(). >> >> In this method, we look at a split parent interval 'cur' and its register hint interval >> 'register_hint'. A register hint is present when the interval represents either the source or the >> target operand of a move operation and the register hint the target or source operand, respectively >> (the register hint is used to try to assign the same register to the source and target operand such >> that we can completely remove the move operation). >> >> If the register hint is set, then we do some additional checks and make sure that the split parent >> and the register hint do not intersect. If all checks pass, the split parent 'cur' gets the same >> spill slot as the register hint [1]. This means that both intervals get the same slot on the stack >> if they are spilled. >> >> The problem now is that we do not consider any split children of the register hint which all share >> the same spill slot with the register hint (their split parent). In the testcase, the split parent >> 'cur' does not intersect with the register hint but with one of its split children. As a result, >> they both get the same spill slot and are later indeed both spilled (i.e. both virtual >> registers/operands are put to the same stack location at the same time). >> >> The fix now additionally checks if the split parent 'cur' does not intersect any split children of >> the register hint in combine_spilled_intervals(). If there is such an intersection, then we bail out >> of the optimization. >> >> Some standard benchmark testing did not show any regressions. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/7a3522ab48b3/src/hotspot/share/c1/c1_LinearScan.cpp#l5728 From tobias.hartmann at oracle.com Tue Aug 11 08:44:31 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 11 Aug 2020 10:44:31 +0200 Subject: [16] RFR(S): 8249603: C1: assert(has_error == false) failed: register allocation invalid In-Reply-To: References: <348bd2c6-df94-d9f5-47d9-d51443e1e5d6@oracle.com> <9f5d2b18-a080-d569-a0a4-d357c8c2c8a6@oracle.com> Message-ID: Hi Christian, On 11.08.20 10:40, Christian Hagedorn wrote: > I think we will always catch an overlap later in the verification method unless we somehow correct > the mistake until then. But I don't think that this is likely or even possible. Nevertheless, I > still wanted to verify that to some extent and added an assert(false) in the newly added > intersection bailout test with the split children and could not trigger it in tier 1-4 (apart from > the newly added test). Okay, thanks for checking! Best regards, Tobias From patric.hedlin at oracle.com Tue Aug 11 09:00:15 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 11 Aug 2020 11:00:15 +0200 Subject: RFR(XS/T): 8250848: [aarch64] nativeGotJump_at() missing call to verify(). Message-ID: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> Please review this trivial change/update. --- a/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp??? 
Mon Aug 10 12:57:38 2020 +0100 +++ b/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp Mon Aug 10 16:50:20 2020 +0200 @@ -537,6 +537,7 @@ inline NativeGotJump* nativeGotJump_at(address addr) { NativeGotJump* jump = (NativeGotJump*)(addr); + DEBUG_ONLY(jump->verify()); return jump; } Issue: https://bugs.openjdk.java.net/browse/JDK-8250848 Webrev with additional (trivial) /code style/ conforming clean-up: http://cr.openjdk.java.net/~phedlin/tr8250848/ Testing: tier1-3 Best regards, Patric From aph at redhat.com Tue Aug 11 10:06:03 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 11 Aug 2020 11:06:03 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: Message-ID: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> On 09/08/2020 04:19, Ludovic Henry wrote: > Hello, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 > Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance > improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ± 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ± 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ± 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ± 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ± 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ± 0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://bugs.openjdk.java.net/browse/JDK-8250902 > How did you test this? I'm looking through the test suite, but I can't find the test vectors. They must be in there somewhere. https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Aug 11 10:06:28 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 11 Aug 2020 11:06:28 +0100 Subject: Re: [aarch64-port-dev ] RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop In-Reply-To: <85lfinwafi.fsf@nicgas01-pc.shanghai.arm.com> References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> <85lfinwafi.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <8b4d4db0-01d0-3501-8114-382fff8b06bc@redhat.com> On 10/08/2020 02:34, Nick Gasson wrote: > Hi Andrew, did you reply to the wrong mail...? Looks like it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Aug 11 10:08:53 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 11 Aug 2020 11:08:53 +0100 Subject: Re: [aarch64-port-dev ] RFR(XS/T): 8250848: [aarch64] nativeGotJump_at() missing call to verify().
In-Reply-To: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> References: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> Message-ID: <4ddd50f6-6449-fd4a-58c0-dc77a523434e@redhat.com> On 11/08/2020 10:00, Patric Hedlin wrote: > > Issue:https://bugs.openjdk.java.net/browse/JDK-8250848 > > Webrev with additional (trivial)/code style/ conforming clean-up: > http://cr.openjdk.java.net/~phedlin/tr8250848/ OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From evgeny.nikitin at oracle.com Tue Aug 11 10:16:21 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Tue, 11 Aug 2020 12:16:21 +0200 Subject: RFR(XS): 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies In-Reply-To: <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> References: <64A989AA-7A20-4AB2-828B-C0BABE31E6D6@oracle.com> <415F5369-AD2E-4E57-8A9A-C3A58BC4F99A@oracle.com> Message-ID: Hi Igor, Thank you. Please find the patch attached. I wonder how many such nano-fixes one needs to make to become a committer? :)) Thanks in advance, // Evgeny. On 2020-08-10 21:34, Igor Ignatyev wrote: > LGTM > -- Igor > >> On Aug 10, 2020, at 12:25 PM, Evgeny Nikitin wrote: >> >> Hi Igor, >> >> I agree, using reflection would be better. For those using IDEs as well. Here's the new webrev: >> >> http://cr.openjdk.java.net/~enikitin//8251349/webrev.01/index.html >> >> Again, the same one-time test run in mach5 on 5 platforms. >> >> Thanks in advance, >> //Evgeny >> >> On 2020-08-10 18:04, Igor Ignatyev wrote: >>> Hi Evgeny, >>> the fix looks good. there is although another (arguable better) way to solve that: update test/hotspot/jtreg/compiler/codecache/stress/Helper.java to get TestCaseImpl classname from TestCaseImpl.class, so there will be statically detectable dependency b/w TestCaseImpl and compiler/codecache/stress/Helper (and all test classes which use it, including OverloadCompileQueueTest), so the tests won't have to have explicit @build. >>> Thanks, >>> -- Igor >>>> On Aug 10, 2020, at 6:47 AM, Evgeny Nikitin wrote: >>>> >>>> Hi, >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251349 >>>> Webrev: https://cr.openjdk.java.net/~enikitin/8251349/webrev.00/ >>>> >>>> The test loads said class (TestCaseImpl) as a resource from disk. The test obviously needs the class to get compiled in advance. >>>> >>>> The change has been checked in mach5 for the 5 common platforms (passed). >>>> >>>> Please review, >>>> /Evgeny Nikitin. 
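Editor's note: before the attached changeset, a short illustration of why OverloadCompileQueueTest needs TestCaseImpl built ahead of time, as discussed in the 8251349 thread above. The helper reads the class back as a .class resource, which only works if the class file already exists on the class path. This is a hand-written sketch under that assumption, not the actual compiler/codecache/stress/Helper.java code.

    import java.io.InputStream;

    class ClassBytesSketch {
        // Reading a class back as a ".class" resource only works if the class
        // was compiled and is on the class path, hence the build dependency.
        static byte[] readClassBytes(Class<?> clazz) throws Exception {
            String resource = clazz.getName().replace('.', '/') + ".class";
            try (InputStream in = clazz.getClassLoader().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IllegalStateException("class file not found: " + resource);
                }
                return in.readAllBytes();
            }
        }
    }

Deriving the class name from TestCaseImpl.class.getName(), as the changeset below does, also gives jtreg a statically visible dependency so the class is compiled automatically.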
> -------------- next part -------------- # HG changeset patch # User enikitin # Date 1597084287 -7200 # Mon Aug 10 20:31:27 2020 +0200 # Node ID 060dd595dda6a12a38ccd944a565b9bd23c1933e # Parent c379dc750a02918dda02809fbc9edb2711c4a6ee 8251349: Add TestCaseImpl to OverloadCompileQueueTest.java's build dependencies Reviewed-by: iignatyev, kvn diff -r c379dc750a02 -r 060dd595dda6 test/hotspot/jtreg/compiler/codecache/stress/Helper.java --- a/test/hotspot/jtreg/compiler/codecache/stress/Helper.java Mon Jul 27 11:34:19 2020 -0700 +++ b/test/hotspot/jtreg/compiler/codecache/stress/Helper.java Mon Aug 10 20:31:27 2020 +0200 @@ -37,7 +37,7 @@ public static final WhiteBox WHITE_BOX = WhiteBox.getWhiteBox(); private static final long THRESHOLD = WHITE_BOX.getIntxVMFlag("CompileThreshold"); - private static final String TEST_CASE_IMPL_CLASS_NAME = "compiler.codecache.stress.TestCaseImpl"; + private static final String TEST_CASE_IMPL_CLASS_NAME = TestCaseImpl.class.getName(); private static byte[] CLASS_DATA; static { try { From patric.hedlin at oracle.com Tue Aug 11 11:16:51 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 11 Aug 2020 13:16:51 +0200 Subject: [aarch64-port-dev ] RFR(XS/T): 8250848: [aarch64] nativeGotJump_at() missing call to verify(). In-Reply-To: <4ddd50f6-6449-fd4a-58c0-dc77a523434e@redhat.com> References: <6a148562-eb36-1fc5-895f-024a6135b002@oracle.com> <4ddd50f6-6449-fd4a-58c0-dc77a523434e@redhat.com> Message-ID: Thank you for reviewing Andrew. /Patric On 2020-08-11 12:08, Andrew Haley wrote: > On 11/08/2020 10:00, Patric Hedlin wrote: >> >> Issue:https://bugs.openjdk.java.net/browse/JDK-8250848 >> >> Webrev with additional (trivial)/code style/? conforming clean-up: >> ???????? http://cr.openjdk.java.net/~phedlin/tr8250848/ > > OK. > From xxinliu at amazon.com Tue Aug 11 17:09:11 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 11 Aug 2020 17:09:11 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1596523192072.15354@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com>, <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com>, <1596523192072.15354@amazon.com> Message-ID: <1597165750921.4285@amazon.com> Hi, Reviewers, May I gently ping this? I stuck because I don't know which error handling is appropriate. If we do nothing, current hotspot ignores wrong intrinsic Ids in the cmdline. This patch aborts hotspot when it detects any invalid intrinsic id. thanks, --lx ________________________________________ From: Liu, Xin Sent: Monday, August 3, 2020 11:39 PM To: Tobias Hartmann; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: Re: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic hi, Nils, Tobias would like to keep the parser behavior consistency. I think it means that the hotspot need to suppress the warning if the intrinsic_id doesn't exists in compiler directive. eg. -XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. What do you think about it? 
Here is the latest webrev: http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ thanks, --lx ________________________________________ From: Tobias Hartmann Sent: Friday, July 24, 2020 2:52 AM To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Liu, On 23.07.20 18:02, Liu, Xin wrote: > That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. > It just exacts information from CompileCommand as many as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. > > I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. > This behavior is from Nils comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. Okay, thanks for the explanation! I would prefer consistency in error handling of compiler directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide. Best regards, Tobias From luhenry at microsoft.com Tue Aug 11 18:28:56 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 11 Aug 2020 18:28:56 +0000 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> Message-ID: Hi Andrew, (I'm currently on vacation and will come back on the 20th.) I've relied on the existing test suite, which was also enhanced when submitting the patch for the MD5 intrinsic on x86 [1]. To help in the development, I've also generated 1k random strings, got them through md5sum on Linux, and compared the output of this MD5 intrinsic on the same input. I did not use [2] as a testing bed, but would be happy to add it to the OpenJDK test suite (if the license allows for it, I didn't check yet where it's allowed). > I'm looking through the test suite, but I can't find the test vectors. They must be in there somewhere. test/hotspot/jtreg/compiler/intrinsics/sha/TestDigest.java covers that by running a single value with and without the intrinsic. -- Ludovic [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039327.html [2] https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data From vladimir.kozlov at oracle.com Tue Aug 11 19:32:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2020 12:32:20 -0700 Subject: RFR 8251268: Move PhaseChaitin definations from live.cpp to chaitin.cpp In-Reply-To: References: <54AD5187-E7EE-410F-BD5D-11658E8D2F6E@amazon.com> <6593ec2c-78dc-4a72-7f5b-f6c60deda41d@oracle.com> <8DD352C1-18EA-4D50-9646-18C333CCC118@amazon.com> Message-ID: On 8/11/20 12:15 AM, Christian Hagedorn wrote: > Hi Clive > > Thanks a lot for taking care of this! > > One last comment: The existing spacing for the verify methods in the .hpp file is wrong. But since there are many more > methods with a wrong spacing following it, I leave it up to you if you want to fix it for the verify methods or not. I'm > fine with both. Either way, you don't need to send another webrev. 
> > Otherwise, it looks good to me! +1 Thanks, Vladimir K > > Best regards, > Christian > > On 10.08.20 19:00, Verghese, Clive wrote: >> Hi Christian, >> >> Thank you for the feedback. I have updated the review addressing the comments below. >> >> http://cr.openjdk.java.net/~xliu/clive/8251268/01/webrev/ >> >> Regards, >> Clive Verghese >> >> >> >> ?On 8/6/20, 11:55 PM, "Christian Hagedorn" wrote: >> ???? Hi Clive >> ???? The fix looks good to me. It makes sense to move it to chaitin.cpp since >> ???? the calls to verify() are also in this file only. >> ???? You could fix some minor code style things about the existing code that >> ???? you moved while at it: >> ???? - You can move the #ifdef ASSERT out of both methods and surround both >> ???? methods by one single #ifdef ASSERT since verify()/verify_base_ptrs() >> ???? are only called in ASSERT blocks. And add a // ASSERT comment on the >> ???? closing #endif to make it more clear. Don't forget to also surround the >> ???? declarations in the .hpp file with an ASSERT. >> ???? - In verify_base_ptrs(): >> ??????? - L2330: Missing curly braces for the loop >> ??????? - L2297, 2309, 2316: The asterisk should be at the type: ResourceArea >> ???? *a -> ResourceArea* a >> ??????? - There is a missing space in all asserts after the comma separating >> ???? the condition and the failure string >> ???? - In verify(): >> ??????? - L2386: Missing space and curly braces for the if statement >> ???? Best regards, >> ???? Christian >> ???? On 07.08.20 01:49, Verghese, Clive wrote: >> ???? > Hi, >> ???? > >> ???? > Requesting review for >> ???? > >> ???? > Webrev : http://cr.openjdk.java.net/~xliu/clive/8251268/00/webrev/ >> ???? > JBS : https://bugs.openjdk.java.net/browse/JDK-8251268 >> ???? > >> ???? > The change moves the definition of PhaseChaitin::verify_base_ptrs and PhaseChaitin::verify from live.cpp to >> chaitin.cpp >> ???? > >> ???? > I have tested this builds successfully for both PRODUCT and !PRODUCT. >> ???? > >> ???? > Ensured that there are no regressions in hotspot:tier1 tests. >> ???? > >> ???? > >> ???? > Regards, >> ???? > Clive Verghese >> ???? > >> From beurba at microsoft.com Tue Aug 11 20:23:50 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Tue, 11 Aug 2020 20:23:50 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> , Message-ID: Hey Doug, since I was curious I did a bit of digging. Here are my findings: 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected. 2. Thus the generated code by Graal is almost as fast as the MD5 intrinsic. 3. The gap, from what I can tell, is that the SchedulePhase decides to put all the 16 FloatingReadNodes at the top of the basic block, and thus increasing register pressure and therefore ending up needing to spill on x86_64. It would be nice if the read access would be scheduled next to its usage in this case. 
I couldn't figure out how to do that, it has been a while since I've touched that code :-) Here are some numbers plus the generated code of C2, the intrinsic and Graal: https://gist.github.com/lewurm/3b874558d369fd56b3737e28f1616740 -Bernhard ________________________________________ From: Doug Simon Sent: Monday, August 10, 2020 15:38 To: Bernhard Urban-Forster Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Hi Bernhard, On 10 Aug 2020, at 15:01, Bernhard Urban-Forster > wrote: Hey Doug, replying on behalf for Ludovic, as he is on vacation :-) Currently we are not planning to implement the intrinsic for Graal. Schade ;-) Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already? I don?t think we do that anywhere currently but I imagine it wouldn?t be hard to put the BytecodeParser into a mode whereby an array access generates a AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). -Doug This is the relevant Java method for the MD5 intrinsic: https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java*L172__;Iw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV$ -Bernhard ________________________________________ From: Doug Simon > Sent: Monday, August 10, 2020 11:55 To: Ludovic Henry Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Hi Ludovic, Are you considering also implementing this intrinsic in Graal? Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. -Doug On 9 Aug 2020, at 05:19, Ludovic Henry > wrote: Hello, Bug: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=C7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3$ Webrev: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=http:*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D&reserved=0__;JSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ$ Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. 
The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): -XX:-UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms -XX:+UseMD5Intrinsics Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 0.001 ops/ms => 22% speedup Thank you, Ludovic [1] https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8250902&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=5KcoG5n10rnVMU9y8L076jpCoEd0NBzNqr*2F8M5ghO3c*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6SPJBTN$ From doug.simon at oracle.com Tue Aug 11 20:32:54 2020 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 11 Aug 2020 22:32:54 +0200 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Message-ID: <80E25174-9E0E-40EF-AF75-7295782CE360@oracle.com> Thanks for the digging and results Bernhard. We?ve discussed making the SchedulePhase do latency-aware scheduling within blocks but haven?t done anything yet. -Doug > On 11 Aug 2020, at 22:23, Bernhard Urban-Forster wrote: > > Hey Doug, > > since I was curious I did a bit of digging. Here are my findings: > > 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected. > 2. Thus the generated code by Graal is almost as fast as the MD5 intrinsic. > 3. The gap, from what I can tell, is that the SchedulePhase decides to put all the 16 FloatingReadNodes at the top of the basic block, and thus increasing register pressure and therefore ending up needing to spill on x86_64. It would be nice if the read access would be scheduled next to its usage in this case. I couldn't figure out how to do that, it has been a while since I've touched that code :-) > > Here are some numbers plus the generated code of C2, the intrinsic and Graal: > https://urldefense.com/v3/__https://gist.github.com/lewurm/3b874558d369fd56b3737e28f1616740__;!!GqivPVa7Brio!L6XXaWxHy6hbPWSWzBRyX9XuZtH1g0pfzTBa7gBrTWM3Fd7snIsiUwYctRenoV51$ > > -Bernhard > > ________________________________________ > From: Doug Simon > Sent: Monday, August 10, 2020 15:38 > To: Bernhard Urban-Forster > Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Bernhard, > > > On 10 Aug 2020, at 15:01, Bernhard Urban-Forster > wrote: > > Hey Doug, > > replying on behalf for Ludovic, as he is on vacation :-) > > Currently we are not planning to implement the intrinsic for Graal. > > Schade ;-) > > Also we didn't check the generated code by Graal. I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. 
Is something like that done for other methods already? > > I don?t think we do that anywhere currently but I imagine it wouldn?t be hard to put the BytecodeParser into a mode whereby an array access generates a AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). > > -Doug > > > This is the relevant Java method for the MD5 intrinsic: > https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java*L172__;Iw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV$ > > > -Bernhard > > ________________________________________ > From: Doug Simon > > Sent: Monday, August 10, 2020 11:55 > To: Ludovic Henry > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Ludovic, > > Are you considering also implementing this intrinsic in Graal? > > Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. > > -Doug > > On 9 Aug 2020, at 05:19, Ludovic Henry > wrote: > > Hello, > > Bug: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=C7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3$ > Webrev: https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=http:*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D&reserved=0__;JSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ$ > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 
0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8250902&data=02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034&sdata=5KcoG5n10rnVMU9y8L076jpCoEd0NBzNqr*2F8M5ghO3c*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6SPJBTN$ > From beurba at microsoft.com Tue Aug 11 20:45:27 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Tue, 11 Aug 2020 20:45:27 +0000 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <80E25174-9E0E-40EF-AF75-7295782CE360@oracle.com> References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> , <80E25174-9E0E-40EF-AF75-7295782CE360@oracle.com> Message-ID: That's great to hear :-) Thank you, -Bernhard ________________________________________ From: Doug Simon Sent: Tuesday, August 11, 2020 22:32 To: Bernhard Urban-Forster Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64; Thomas Wuerthinger; David Leopoldseder Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 Thanks for the digging and results Bernhard. We?ve discussed making the SchedulePhase do latency-aware scheduling within blocks but haven?t done anything yet. -Doug > On 11 Aug 2020, at 22:23, Bernhard Urban-Forster wrote: > > Hey Doug, > > since I was curious I did a bit of digging. Here are my findings: > > 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected. > 2. Thus the generated code by Graal is almost as fast as the MD5 intrinsic. > 3. The gap, from what I can tell, is that the SchedulePhase decides to put all the 16 FloatingReadNodes at the top of the basic block, and thus increasing register pressure and therefore ending up needing to spill on x86_64. It would be nice if the read access would be scheduled next to its usage in this case. I couldn't figure out how to do that, it has been a while since I've touched that code :-) > > Here are some numbers plus the generated code of C2, the intrinsic and Graal: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgist.github.com%2Flewurm%2F3b874558d369fd56b3737e28f1616740__%3B!!GqivPVa7Brio!L6XXaWxHy6hbPWSWzBRyX9XuZtH1g0pfzTBa7gBrTWM3Fd7snIsiUwYctRenoV51%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982263260&sdata=da4FNCvOEUajDgNYLdNl2DNo3diwgsCsGy8BCD%2BipPA%3D&reserved=0 > > -Bernhard > > ________________________________________ > From: Doug Simon > Sent: Monday, August 10, 2020 15:38 > To: Bernhard Urban-Forster > Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Bernhard, > > > On 10 Aug 2020, at 15:01, Bernhard Urban-Forster > wrote: > > Hey Doug, > > replying on behalf for Ludovic, as he is on vacation :-) > > Currently we are not planning to implement the intrinsic for Graal. > > Schade ;-) > > Also we didn't check the generated code by Graal. 
I believe it will do a better job eliminated array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already? > > I don?t think we do that anywhere currently but I imagine it wouldn?t be hard to put the BytecodeParser into a mode whereby an array access generates a AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). > > -Doug > > > This is the relevant Java method for the MD5 intrinsic: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fblob%2F733218137289d6a0eb705103ed7be30f1e68d17a%2Fsrc%2Fjava.base%2Fshare%2Fclasses%2Fsun%2Fsecurity%2Fprovider%2FMD5.java*L172__%3BIw!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E6ijVLDV%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982273214&sdata=BTu7UyXhnF1XPmbzhpVQ3y3mQ1evQHuVe0qKMgj%2FNDs%3D&reserved=0 > > > -Bernhard > > ________________________________________ > From: Doug Simon > > Sent: Monday, August 10, 2020 11:55 > To: Ludovic Henry > Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64 > Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 > > Hi Ludovic, > > Are you considering also implementing this intrinsic in Graal? > > Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have the Graal intrinsify the method by compiling the relevant Java code without array bounds checks. > > -Doug > > On 9 Aug 2020, at 05:19, Ludovic Henry > wrote: > > Hello, > > Bug: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttps*3A*2F*2Fbugs.openjdk.java.net*2Fbrowse*2FJDK-8251216%26amp%3Bdata%3D02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034%26amp%3Bsdata%3DC7Bi8BTsmtR3HFgWgYTw7jww63BcHGutNXE8o9x2bdY*3D%26amp%3Breserved%3D0__%3BJSUlJSUlJSUlJSUlJSU!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E97IPBA3%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982273214&sdata=gzxWbxSJGlmPvXnYko6rvVAKnbeJOWhWhISqTJvVaA8%3D&reserved=0 > Webrev: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttp%3A*2F*2Fcr.openjdk.java.net*2F*luhenry*2F8251216*2Fwebrev.00%26amp%3Bdata%3D02*7C01*7Cbeurba*40microsoft.com*7C087d5d80f9484f13ddcc08d83d138f3a*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637326501506459034%26amp%3Bsdata%3D0CZOMfpmtPZiy64za8NYYpVjCdawmjGacEOc3WfADDA*3D%26amp%3Breserved%3D0__%3BJSUlfiUlJSUlJSUlJSUl!!GqivPVa7Brio!JeGQSBZgTB8CIzN7-UVXxlNivNOxJk8QFqhCQ1eJZaNvYHYqSf2gkNv2E84nlzLJ%24&data=02%7C01%7Cbeurba%40microsoft.com%7C4604acd9be3e4dafdb8d08d83e35c6ae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637327747982273214&sdata=%2FTz7d6sQ2Hx8MGSGgUv5eKLxgCxtKEIdSJA2EYX3pHE%3D&reserved=0 > > Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 > > This patch implements the MD5 intrinsic on AArch64 following its 
implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2): > > -XX:-UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ± 28.082 ops/ms > MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ± 0.691 ops/ms > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ± 0.001 ops/ms > > -XX:+UseMD5Intrinsics > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ± 40.513 ops/ms => 24% speedup > MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ± 0.455 ops/ms => 28% speedup > MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ± 0.001 ops/ms => 22% speedup > > Thank you, > Ludovic > > [1] https://bugs.openjdk.java.net/browse/JDK-8250902 > From jingxinc at amazon.com Tue Aug 11 22:41:46 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Tue, 11 Aug 2020 22:41:46 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers Message-ID: <99612339-38D5-411C-9459-89EA1A0F4284@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 The change casts the uint ni to an int so that the parameter passed to the method TypeOopPtr::cast_to_instance_id is an integer. I have tested that this builds successfully. Ensured that there are no regressions in hotspot:tier1 tests. Regards, Eric Chen From vladimir.kozlov at oracle.com Wed Aug 12 00:19:09 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2020 17:19:09 -0700 Subject: [16] RFR(T) 8251306: compiler/aot/cli/jaotc/IgnoreErrorsTest.java timed out on MacOS Message-ID: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8251306 Test runs 4 jaotc subtests and each took 4 mins on a particularly slow machine. Even though the timeout factor was "-timeoutFactor:4", it was not enough. Tests concurrency was '-concurrency:6' Flags were: '-ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation' So 2 C2 threads were compiling Graal during JAOTC execution when other tests were run concurrently. Which may explain the slow execution. Since it is a rare case, I suggest just increasing the test's timeout from the default 2 to 6 mins: test/hotspot/jtreg/compiler/aot/cli/jaotc/IgnoreErrorsTest.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2018, 2020, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
* * This code is free software; you can redistribute it and/or modify it @@ -26,7 +26,7 @@ * @requires vm.aot * @library / /test/lib /testlibrary * @compile IllegalClass.jasm - * @run driver compiler.aot.cli.jaotc.IgnoreErrorsTest + * @run driver/timeout=360 compiler.aot.cli.jaotc.IgnoreErrorsTest */ package compiler.aot.cli.jaotc; Thanks, Vladimir From igor.ignatyev at oracle.com Wed Aug 12 02:22:33 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 11 Aug 2020 19:22:33 -0700 Subject: [16] RFR(T) 8251306: compiler/aot/cli/jaotc/IgnoreErrorsTest.java timed out on MacOS In-Reply-To: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> References: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> Message-ID: <68AEF8FA-949F-414F-BBA8-D7A1A8D13469@oracle.com> Hi Vladimir, LGTM. -- Igor > On Aug 11, 2020, at 5:19 PM, Vladimir Kozlov wrote: > > https://bugs.openjdk.java.net/browse/JDK-8251306 > > Test runs 4 jaotc subtests and each took 4 mins on particular slow machine. > Even so timeout factor was "-timeoutFactor:4" it was not enough. > > Tests concurrency was '-concurrency:6' > Flags were: '-ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation' > > So 2 C2 threads were compiling Graal during JAOTC execution when other tests were run concurrently. > Which may explain slow execution. > > Since it is rare case I suggest just increase test's timeout from default 2 to 6 mins : > > test/hotspot/jtreg/compiler/aot/cli/jaotc/IgnoreErrorsTest.java > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2018, 2020, Oracle and/or its affiliates. All rights reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -26,7 +26,7 @@ > * @requires vm.aot > * @library / /test/lib /testlibrary > * @compile IllegalClass.jasm > - * @run driver compiler.aot.cli.jaotc.IgnoreErrorsTest > + * @run driver/timeout=360 compiler.aot.cli.jaotc.IgnoreErrorsTest > */ > > package compiler.aot.cli.jaotc; > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Wed Aug 12 02:23:10 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2020 19:23:10 -0700 Subject: [16] RFR(T) 8251306: compiler/aot/cli/jaotc/IgnoreErrorsTest.java timed out on MacOS In-Reply-To: <68AEF8FA-949F-414F-BBA8-D7A1A8D13469@oracle.com> References: <62be0c5d-1bb4-4f24-6f9c-25e2b4059a07@oracle.com> <68AEF8FA-949F-414F-BBA8-D7A1A8D13469@oracle.com> Message-ID: <7ce1223b-7c1d-47b9-dc69-ae72baa26fda@oracle.com> Thank you, Igor Vladimir K On 8/11/20 7:22 PM, Igor Ignatyev wrote: > Hi Vladimir, > > LGTM. > > -- Igor > >> On Aug 11, 2020, at 5:19 PM, Vladimir Kozlov wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8251306 >> >> Test runs 4 jaotc subtests and each took 4 mins on particular slow machine. >> Even so timeout factor was "-timeoutFactor:4" it was not enough. >> >> Tests concurrency was '-concurrency:6' >> Flags were: '-ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation' >> >> So 2 C2 threads were compiling Graal during JAOTC execution when other tests were run concurrently. >> Which may explain slow execution. >> >> Since it is rare case I suggest just increase test's timeout from default 2 to 6 mins : >> >> test/hotspot/jtreg/compiler/aot/cli/jaotc/IgnoreErrorsTest.java >> @@ -1,5 +1,5 @@ >> /* >> - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. >> + * Copyright (c) 2018, 2020, Oracle and/or its affiliates. 
All rights reserved. >> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> * >> * This code is free software; you can redistribute it and/or modify it >> @@ -26,7 +26,7 @@ >> * @requires vm.aot >> * @library / /test/lib /testlibrary >> * @compile IllegalClass.jasm >> - * @run driver compiler.aot.cli.jaotc.IgnoreErrorsTest >> + * @run driver/timeout=360 compiler.aot.cli.jaotc.IgnoreErrorsTest >> */ >> >> package compiler.aot.cli.jaotc; >> >> Thanks, >> Vladimir > From OGATAK at jp.ibm.com Wed Aug 12 07:48:59 2020 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Wed, 12 Aug 2020 16:48:59 +0900 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler Message-ID: Hi, May I get review for JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler? This patch adds a development option to compile a method with C1 and print disassembly of the generated native code, but to skip execution of the generated code, in the same manner as OptoNoExecute option does in C2. Log-based debugging is useful to support a new processor. In C1, the existing options BailoutAfterHIR and BailoutAfterLIR can be used if printing HIR/LIR is sufficient. However, there is no way to print disassembly of the generated code because these existing options quit compilation before generating native code. So this issue proposes a new option for this purpose. Bug: https://bugs.openjdk.java.net/browse/JDK-8251470 Webrev: http://cr.openjdk.java.net/~ogatak/8251470/webrev.00/ Regards, Ogata From nils.eliasson at oracle.com Wed Aug 12 08:21:58 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 12 Aug 2020 10:21:58 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1597165750921.4285@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> Message-ID: Hi, Sorry for the delay. About the error handling: For CompilerDirectivesFile there are two scenarios: 1) If a file containing bad contents is passed on the commandline - the VM prints an descriptive error and refuses to start. 2) If a file containing bad contents is passed through jcmd - the VM prints and error on the jcmd stream and continues to run (ignoring the command). This is achieved by letting the parser just register any parsing error, and defer to the caller to decide how to handle the situation. Regards, Nils Eliasson On 2020-08-11 19:09, Liu, Xin wrote: > Hi, Reviewers, > > May I gently ping this? > > I stuck because I don't know which error handling is appropriate. > > If we do nothing, current hotspot ignores wrong intrinsic Ids in the cmdline. > This patch aborts hotspot when it detects any invalid intrinsic id. > > thanks, > --lx > > > ________________________________________ > From: Liu, Xin > Sent: Monday, August 3, 2020 11:39 PM > To: Tobias Hartmann; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev > Subject: Re: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic > > hi, Nils, > > Tobias would like to keep the parser behavior consistency. I think it means that the hotspot need to suppress the warning if the intrinsic_id doesn't exists in compiler directive. > eg. 
-XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. > > What do you think about it? > > Here is the latest webrev: > http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ > > thanks, > --lx > > ________________________________________ > From: Tobias Hartmann > Sent: Friday, July 24, 2020 2:52 AM > To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev > Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > Hi Liu, > > On 23.07.20 18:02, Liu, Xin wrote: >> That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. >> It just exacts information from CompileCommand as many as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. >> >> I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. >> This behavior is from Nils comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. > Okay, thanks for the explanation! I would prefer consistency in error handling of compiler > directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide. > > Best regards, > Tobias From tobias.hartmann at oracle.com Wed Aug 12 08:57:13 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 12 Aug 2020 10:57:13 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError Message-ID: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8251456 http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap space to allocate such large arrays. Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened verification of the array contents and used the exact same command line flags that Roland proposed in his fix for JDK-8193518 [1]. I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more reliable and reproduces the issues in every run. Best regards, Tobias [1] http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html From aph at redhat.com Wed Aug 12 09:47:50 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 12 Aug 2020 10:47:50 +0100 Subject: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <9074F4C9-589A-4519-BBAB-2F3161B814D7@oracle.com> Message-ID: On 8/10/20 2:38 PM, Doug Simon wrote: > I don?t think we do that anywhere currently but I imagine it > wouldn?t be hard to put the BytecodeParser into a mode whereby an > array access generates a AccessIndexedNode that omits the bounds > check (generated by > org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck). 
We could do that in C2 as well. And it'd be far more attractive than hand-coded intrinsics. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Wed Aug 12 11:08:39 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 12 Aug 2020 13:08:39 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8251458 http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch is negative. The problem is an overflow when converting an uint counter value > max_jint from profile information to a jint. The fix is to handle such overflows by simply limiting the counter value to max_jint. Best regards, Tobias From christian.hagedorn at oracle.com Wed Aug 12 11:26:47 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 12 Aug 2020 13:26:47 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: Hi Tobias Looks good to me! Best regards, Christian On 12.08.20 13:08, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8251458 > http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ > > We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch > is negative. The problem is an overflow when converting an uint counter value > max_jint from > profile information to a jint. > > The fix is to handle such overflows by simply limiting the counter value to max_jint. > > Best regards, > Tobias > From stumon01 at arm.com Wed Aug 12 11:38:03 2020 From: stumon01 at arm.com (Stuart Monteith) Date: Wed, 12 Aug 2020 12:38:03 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> Message-ID: On 11/08/2020 11:06, Andrew Haley wrote: > On 09/08/2020 04:19, Ludovic Henry wrote: >> Hello, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8251216 >> Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00 >> >> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1 >> >> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance >> improvements are the following (on Linux-AArch64 on a Marvell TX2): >> >> -XX:-UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 1616.238 ? 28.082 ops/ms >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 215.030 ? 0.691 ops/ms >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.228 ? 0.001 ops/ms >> >> -XX:+UseMD5Intrinsics >> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units >> MessageDigests.digest md5 64 DEFAULT thrpt 10 2005.233 ? 40.513 ops/ms => 24% speedup >> MessageDigests.digest md5 1024 DEFAULT thrpt 10 275.979 ? 0.455 ops/ms => 28% speedup >> MessageDigests.digest md5 1048576 DEFAULT thrpt 10 0.279 ? 
0.001 ops/ms => 22% speedup >> >> Thank you, >> Ludovic >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8250902 >> > > How did you test this? I'm looking through the test suite, but I can't > find the test vectors. They must be in there somewhere. > > https://www.nist.gov/itl/ssd/software-quality-group/nsrl-test-data > I've been looking over this patch too. The fundamental unit test is: test/hotspot/jtreg/compiler/intrinsics/sha/TestDigest.java The method "testDigest" generates an byte array of a given size, with each element filled with it's own index & 0xff. The test is then run once, assumed uncompiled, it is then "warmed up" and the first generated digest is compared against the digest presumably generated by the intrinsic. This is the same test for all of the message digest algorithms. I'd say the test is no worse than what has gone before. There are additional tests under the jdk library tests, but nothing that addresses the correctness of the MD5 algorithm implementation itself. In terms of the status-quo, that patch looks ok to me. I think if the testing is to be expanded, it should be expanded to all of the message digest algorithms. BR, Stuart IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. From tobias.hartmann at oracle.com Wed Aug 12 11:58:50 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 12 Aug 2020 13:58:50 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: <9345c0b6-b40e-8cff-360d-4843a64b8aec@oracle.com> Thanks Christian! Best regards, Tobias On 12.08.20 13:26, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me! > > Best regards, > Christian > > On 12.08.20 13:08, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251458 >> http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ >> >> We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch >> is negative. The problem is an overflow when converting an uint counter value > max_jint from >> profile information to a jint. >> >> The fix is to handle such overflows by simply limiting the counter value to max_jint. >> >> Best regards, >> Tobias >> From christian.hagedorn at oracle.com Wed Aug 12 13:34:26 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 12 Aug 2020 15:34:26 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing Message-ID: Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8248791 http://cr.openjdk.java.net/~chagedorn/8248791/webrev.00/ The problem can be traced back to cloning an object and wrongly optimizing a field load from it to a constant zero. In LoadNode::Value(), we check if a load is performed on a freshly-allocated object. If that is the case we can replace the load by a constant zero. This is done by calling can_see_stored_value() at [1]. In this method, we first check if we can find a captured store with find_captured_store() [2]. 
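To make the affected pattern concrete, here is a minimal Java sketch of the kind of code this is about (the class and names below are made up for illustration and are not the actual failing test): an object is cloned and a field is then read from the clone, and that load must observe the copied value rather than a constant default.

public class CloneFieldLoad {
    static class Holder implements Cloneable {
        int value = 42;

        @Override
        public Holder clone() throws CloneNotSupportedException {
            return (Holder) super.clone();
        }
    }

    static int test(Holder h) throws CloneNotSupportedException {
        Holder copy = h.clone(); // the clone intrinsic may initialize the copy via an ArrayCopyNode
        return copy.value;       // must see 42; folding this load to a constant 0 is the bug
    }

    public static void main(String[] args) throws Exception {
        Holder h = new Holder();
        for (int i = 0; i < 100_000; i++) { // warm up so test() gets C2-compiled
            if (test(h) != 42) {
                throw new AssertionError("field load from clone was folded to a constant");
            }
        }
    }
}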
When enabling ReduceBulkZeroing in the testcase, then this method returns NULL because captured_store_insertion_point() bails out at [3] for completed InitializationNodes (is set to complete at [4] since ReduceBulkZeroing is enabled and the allocation belongs to a clone). When disabling ReduceBulkZeroing in the testcase, find_caputured_store() returns a non-NULL ProjNode because the InitializationNode of the allocation is not marked completed. We loop one more time and then return a constant zero at [5] because there is no store for the allocation (the ArrayCopyNode is responsible for the initialization of the cloned object). The fix now only returns a constant zero if ReduceBulkZeroing is enabled or when the allocation does not belong to an ArrayCopyNode clone (if ReduceBulkZeroing is disabled). Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1968 [2] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1115 [3] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l3737 [4] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/library_call.cpp#l4236 [5] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1106 From vladimir.kozlov at oracle.com Wed Aug 12 16:18:21 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Aug 2020 09:18:21 -0700 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 8/12/20 1:57 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8251456 > http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ > > The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if > they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap > space to allocate such large arrays. > > Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to > allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened > verification of the array contents and used the exact same command line flags that Roland proposed > in his fix for JDK-8193518 [1]. > > I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more > reliable and reproduces the issues in every run. > > Best regards, > Tobias > > [1] > http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html > From vladimir.kozlov at oracle.com Wed Aug 12 17:31:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Aug 2020 10:31:13 -0700 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: <426c9b61-8708-549d-ac5b-6e207aa2f508@oracle.com> +1 Thanks, Vladimir K On 8/12/20 4:26 AM, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me! 
> > Best regards, > Christian > > On 12.08.20 13:08, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251458 >> http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ >> >> We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch >> is negative. The problem is an overflow when converting an uint counter value > max_jint from >> profile information to a jint. >> >> The fix is to handle such overflows by simply limiting the counter value to max_jint. >> >> Best regards, >> Tobias >> From vladimir.kozlov at oracle.com Wed Aug 12 17:38:16 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 12 Aug 2020 10:38:16 -0700 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: References: Message-ID: <13406ed1-5818-8062-6898-66598ba4f595@oracle.com> Good. Thanks, Vladimir K On 8/12/20 6:34 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8248791 > http://cr.openjdk.java.net/~chagedorn/8248791/webrev.00/ > > The problem can be traced back to cloning an object and wrongly optimizing a field load from it to a constant zero. In > LoadNode::Value(), we check if a load is performed on a freshly-allocated object. If that is the case we can replace the > load by a constant zero. This is done by calling can_see_stored_value() at [1]. In this method, we first check if we can > find a captured store with find_captured_store() [2]. > > When enabling ReduceBulkZeroing in the testcase, then this method returns NULL because captured_store_insertion_point() > bails out at [3] for completed InitializationNodes (is set to complete at [4] since ReduceBulkZeroing is enabled and the > allocation belongs to a clone). > > When disabling ReduceBulkZeroing in the testcase, find_caputured_store() returns a non-NULL ProjNode because the > InitializationNode of the allocation is not marked completed. We loop one more time and then return a constant zero at > [5] because there is no store for the allocation (the ArrayCopyNode is responsible for the initialization of the cloned > object). > > The fix now only returns a constant zero if ReduceBulkZeroing is enabled or when the allocation does not belong to an > ArrayCopyNode clone (if ReduceBulkZeroing is disabled). > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1968 > [2] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1115 > [3] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l3737 > [4] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/library_call.cpp#l4236 > [5] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1106 From vladimir.x.ivanov at oracle.com Wed Aug 12 22:24:55 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 13 Aug 2020 01:24:55 +0300 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: References: Message-ID: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> > http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ Though the fix itself looks sufficient, the code around is still not pretty... In particular, profile data goes through uint->jint->int->float(!) 
conversion which doesn't make any sense. It would be really nice to clean it up. Best regards, Vladimir Ivanov > We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch > is negative. The problem is an overflow when converting an uint counter value > max_jint from > profile information to a jint. > > The fix is to handle such overflows by simply limiting the counter value to max_jint. > > Best regards, > Tobias > From tobias.hartmann at oracle.com Thu Aug 13 05:59:00 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 07:59:00 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Message-ID: Thanks Vladimir. Best regards, Tobias On 12.08.20 18:18, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 8/12/20 1:57 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251456 >> http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ >> >> The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if >> they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap >> space to allocate such large arrays. >> >> Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to >> allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened >> verification of the array contents and used the exact same command line flags that Roland proposed >> in his fix for JDK-8193518 [1]. >> >> I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more >> reliable and reproduces the issues in every run. >> >> Best regards, >> Tobias >> >> [1] >> http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html >> >> From tobias.hartmann at oracle.com Thu Aug 13 06:01:35 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 08:01:35 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: <426c9b61-8708-549d-ac5b-6e207aa2f508@oracle.com> References: <426c9b61-8708-549d-ac5b-6e207aa2f508@oracle.com> Message-ID: <675e5e9b-3152-f836-7708-1eb69e445777@oracle.com> Thanks Vladimir. Best regards, Tobias On 12.08.20 19:31, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 8/12/20 4:26 AM, Christian Hagedorn wrote: >> Hi Tobias >> >> Looks good to me! >> >> Best regards, >> Christian >> >> On 12.08.20 13:08, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8251458 >>> http://cr.openjdk.java.net/~thartmann/8251458/webrev.00/ >>> >>> We hit an assert in Parse::do_lookupswitch() because the "taken" counter for a lookupswitch branch >>> is negative. The problem is an overflow when converting an uint counter value > max_jint from >>> profile information to a jint. >>> >>> The fix is to handle such overflows by simply limiting the counter value to max_jint. 
>>> >>> Best regards, >>> Tobias >>> From tobias.hartmann at oracle.com Thu Aug 13 06:09:06 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 08:09:06 +0200 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> References: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> Message-ID: <6bf15413-3a6d-55cc-570d-c115a72397ec@oracle.com> Hi Vladimir, Thanks for looking at this! On 13.08.20 00:24, Vladimir Ivanov wrote: > Though the fix itself looks sufficient, the code around is still not pretty... In particular, > profile data goes through uint->jint->int->float(!) conversion which doesn't make any sense. > > It would be really nice to clean it up. Yes, I've noticed that as well but didn't want to clean it up with this patch because we need to backport to 11u. I've filed JDK-8251513 [1] for the cleanup. Best regards, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8251513 From tobias.hartmann at oracle.com Thu Aug 13 06:41:14 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 08:41:14 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: References: Message-ID: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> Hi Christian, what about other allocations that are marked as 'complete_with_arraycoppy'? Not all of them use an ArrayCopyNode for the actual initialization and therefore find_array_copy_clone will return false. For example, LibraryCallKit::inline_string_copy. Can't you just check if InitializeNode::is_complete_with_arraycopy is set? Best regards, Tobias On 12.08.20 15:34, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8248791 > http://cr.openjdk.java.net/~chagedorn/8248791/webrev.00/ > > The problem can be traced back to cloning an object and wrongly optimizing a field load from it to a > constant zero. In LoadNode::Value(), we check if a load is performed on a freshly-allocated object. > If that is the case we can replace the load by a constant zero. This is done by calling > can_see_stored_value() at [1]. In this method, we first check if we can find a captured store with > find_captured_store() [2]. > > When enabling ReduceBulkZeroing in the testcase, then this method returns NULL because > captured_store_insertion_point() bails out at [3] for completed InitializationNodes (is set to > complete at [4] since ReduceBulkZeroing is enabled and the allocation belongs to a clone). > > When disabling ReduceBulkZeroing in the testcase, find_caputured_store() returns a non-NULL ProjNode > because the InitializationNode of the allocation is not marked completed. We loop one more time and > then return a constant zero at [5] because there is no store for the allocation (the ArrayCopyNode > is responsible for the initialization of the cloned object). > > The fix now only returns a constant zero if ReduceBulkZeroing is enabled or when the allocation does > not belong to an ArrayCopyNode clone (if ReduceBulkZeroing is disabled). > > Thank you! 
> > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1968 > [2] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1115 > [3] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l3737 > [4] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/library_call.cpp#l4236 > [5] http://hg.openjdk.java.net/jdk/jdk/file/e7109ed4bbb0/src/hotspot/share/opto/memnode.cpp#l1106 From tobias.hartmann at oracle.com Thu Aug 13 07:00:20 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 09:00:20 +0200 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler In-Reply-To: References: Message-ID: Hi Ogata, isn't that what -XX:-InstallMethods [1] is supposed to accomplish? It triggers a bailout right before Compilation::install_code, which is the same with your code. Also, why do you need the change in javaCalls.cpp? That would also affect C2 compiled code. Best regards, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/c1/c1_globals.hpp#l292 On 12.08.20 09:48, Kazunori Ogata wrote: > Hi, > > May I get review for JDK-8251470: Add a development option equivalant to > OptoNoExecute to C1 compiler? > > This patch adds a development option to compile a method with C1 and print > disassembly of the generated native code, but to skip execution of the > generated code, in the same manner as OptoNoExecute option does in C2. > > Log-based debugging is useful to support a new processor. In C1, the > existing options BailoutAfterHIR and BailoutAfterLIR can be used if > printing HIR/LIR is sufficient. However, there is no way to print > disassembly of the generated code because these existing options quit > compilation before generating native code. So this issue proposes a new > option for this purpose. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251470 > Webrev: http://cr.openjdk.java.net/~ogatak/8251470/webrev.00/ > > > Regards, > Ogata > From tobias.hartmann at oracle.com Thu Aug 13 07:31:14 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 09:31:14 +0200 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: <99612339-38D5-411C-9459-89EA1A0F4284@amazon.com> References: <99612339-38D5-411C-9459-89EA1A0F4284@amazon.com> Message-ID: <64dc3cb4-ebbe-a668-febf-5d7dd3ac71df@oracle.com> Hi Eric, there are other places where Node::_idx is casted to int (and a potential overflow might happen). For example, calls to Compile::node_notes_at. The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) checking that _idx is always <= MAX_INT. Best regards, Tobias On 12.08.20 00:41, Eric, Chan wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > I have tested this builds successfully . > > Ensured that there are no regressions in hotspot : tier1 tests. 
> > Regards, > Eric Chen > From adinn at redhat.com Thu Aug 13 08:06:13 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 13 Aug 2020 09:06:13 +0100 Subject: RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop In-Reply-To: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <10b37c70-c522-7f65-3c7e-bbeeaf7e1c3d@redhat.com> Hi Nick, On 07/08/2020 10:04, Nick Gasson wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8247354 > Webrev: http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/ Nice detective work. The patch looks ok to me. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From tobias.hartmann at oracle.com Thu Aug 13 08:21:40 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 10:21:40 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> References: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> Message-ID: On 13.08.20 08:41, Tobias Hartmann wrote: > what about other allocations that are marked as 'complete_with_arraycoppy'? Not all of them use an > ArrayCopyNode for the actual initialization and therefore find_array_copy_clone will return false. > For example, LibraryCallKit::inline_string_copy. > > Can't you just check if InitializeNode::is_complete_with_arraycopy is set? Okay, please ignore that. I've noticed that only the clone intrinsic respects ReduceBulkZeroing. Your fix looks good to me. Best regards, Tobias From aph at redhat.com Thu Aug 13 10:00:12 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 13 Aug 2020 11:00:12 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> Message-ID: <0586ead6-583d-1907-491e-64db6edf2106@redhat.com> On 12/08/2020 12:38, Stuart Monteith wrote: > The method "testDigest" generates an byte array of a given size, > with each element filled with it's own index & 0xff. > > The test is then run once, assumed uncompiled, it is then "warmed > up" and the first generated digest is compared against the digest > presumably generated by the intrinsic. This is the same test for all > of the message digest algorithms. > > I'd say the test is no worse than what has gone before. There are > additional tests under the jdk library tests, but nothing that > addresses the correctness of the MD5 algorithm implementation > itself. Good grief. So there are no compliance tests in the test suite at all. > In terms of the status-quo, that patch looks ok to me. I think if > the testing is to be expanded, it should be expanded to all of the > message digest algorithms. That's not much more that an excuse for doing nothing, IMO. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Thu Aug 13 10:17:54 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 13 Aug 2020 12:17:54 +0200 Subject: [16] RFR(S): 8248791: sun/util/resources/cldr/TimeZoneNamesTest.java fails with -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing In-Reply-To: References: <1375695a-e955-a341-45f3-d7168b3c9bc8@oracle.com> Message-ID: <39d1ae30-f7c0-c2d8-7c58-5b5e5cd3a522@oracle.com> Thank you Tobias for your careful review! Best regards, Christian On 13.08.20 10:21, Tobias Hartmann wrote: > > On 13.08.20 08:41, Tobias Hartmann wrote: >> what about other allocations that are marked as 'complete_with_arraycoppy'? Not all of them use an >> ArrayCopyNode for the actual initialization and therefore find_array_copy_clone will return false. >> For example, LibraryCallKit::inline_string_copy. >> >> Can't you just check if InitializeNode::is_complete_with_arraycopy is set? > > Okay, please ignore that. I've noticed that only the clone intrinsic respects ReduceBulkZeroing. > > Your fix looks good to me. > > Best regards, > Tobias > From christian.hagedorn at oracle.com Thu Aug 13 10:46:00 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 13 Aug 2020 12:46:00 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> Message-ID: <3d9b4ae6-cee6-5203-87db-9b874403af1a@oracle.com> Hi Tobias Looks good to me. Best regards, Christian On 12.08.20 10:57, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8251456 > http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ > > The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if > they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap > space to allocate such large arrays. > > Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to > allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened > verification of the array contents and used the exact same command line flags that Roland proposed > in his fix for JDK-8193518 [1]. > > I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more > reliable and reproduces the issues in every run. > > Best regards, > Tobias > > [1] > http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html > From tobias.hartmann at oracle.com Thu Aug 13 10:47:10 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 13 Aug 2020 12:47:10 +0200 Subject: [16] RFR(S): 8251456: [TESTBUG] compiler/vectorization/TestVectorsNotSavedAtSafepoint.java failed OutOfMemoryError In-Reply-To: <3d9b4ae6-cee6-5203-87db-9b874403af1a@oracle.com> References: <55ac928f-7d77-2900-1688-100e5a6fbd4f@oracle.com> <3d9b4ae6-cee6-5203-87db-9b874403af1a@oracle.com> Message-ID: <5b5be4fb-2cc4-7b96-1741-e8f5cfda3531@oracle.com> Thanks Christian! Best regards, Tobias On 13.08.20 12:46, Christian Hagedorn wrote: > Hi Tobias > > Looks good to me. 
> > Best regards, > Christian > > On 12.08.20 10:57, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8251456 >> http://cr.openjdk.java.net/~thartmann/8251456/webrev.00/ >> >> The test is supposed to fill up the heap to trigger GC which will then corrupt vector registers if >> they are not saved at safepoints. It sometimes fails with OOME because there is not enough heap >> space to allocate such large arrays. >> >> Adding a System.gc() call to the GarbageProducerThread triggers GCs more often without the need to >> allocate large garbage arrays and running for many (warmup) iterations. I've also strengthened >> verification of the array contents and used the exact same command line flags that Roland proposed >> in his fix for JDK-8193518 [1]. >> >> I've verified that the test still reproduces JDK-8249608 and JDK-8193518. It's now much more >> reliable and reproduces the issues in every run. >> >> Best regards, >> Tobias >> >> [1] >> http://cr.openjdk.java.net/~roland/8193518/webrev.01/test/hotspot/jtreg/compiler/vectorization/TestVectorsNotSavedAtSafepoint.java.html >> >> From dmitry.chuyko at bell-sw.com Thu Aug 13 11:04:37 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Thu, 13 Aug 2020 14:04:37 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) Message-ID: Hello, Please review a faster version of Math.signum() for AArch64. Two new intrinsics (double and float) are introduced in general code, with appropriate new nodes. New JTreg test is added to cover the intrinsic case (enabled only for aarch64). AArch64 implementation uses FACGT (compare abslute fp values) and BSL (fp bit selection) to avoid branches and moves to non-fp registers and back. Performance results show ~30% better time in the benchmark with a black hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, overhead is 2.9 ns/op. rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ testing: jck, jtreg including new dedicated test -Dmitry [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java From vladimir.x.ivanov at oracle.com Thu Aug 13 11:32:50 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 13 Aug 2020 14:32:50 +0300 Subject: [16] RFR(S): 8251458: Parse::do_lookupswitch fails with "assert(_cnt >= 0) failed" In-Reply-To: <6bf15413-3a6d-55cc-570d-c115a72397ec@oracle.com> References: <965ba14c-2e1c-280e-4403-736ede11dd97@oracle.com> <6bf15413-3a6d-55cc-570d-c115a72397ec@oracle.com> Message-ID: >> Though the fix itself looks sufficient, the code around is still not pretty... In particular, >> profile data goes through uint->jint->int->float(!) conversion which doesn't make any sense. >> >> It would be really nice to clean it up. > > Yes, I've noticed that as well but didn't want to clean it up with this patch because we need to > backport to 11u. I've filed JDK-8251513 [1] for the cleanup. Sounds good. Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Aug 13 11:51:59 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 13 Aug 2020 14:51:59 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: Hi Dmitry, Some comments on shared code changes: src/hotspot/share/opto/library_call.cpp: + case vmIntrinsics::_dsignum: + return UseSignumIntrinsic && (Matcher::match_rule_supported(Op_SignumD) ? 
inline_double_math(id) : false); There's no need in repeating UseSignumIntrinsic and (Matcher::match_rule_supported(Op_SignumD) checks. C2Compiler::is_intrinsic_supported() already covers taht. src/hotspot/share/opto/signum.hpp: 32 class SignumNode : public Node { 33 public: 34 SignumNode(Node* in) : Node(0, in) {} 35 virtual int Opcode() const; 36 virtual const Type *bottom_type() const { return NULL; } 37 virtual uint ideal_reg() const { return Op_RegD; } 38 }; Any particular reason to keep SignumNode? I don't see any and would just drop it. Also, having a dedicated header file just for a couple of nodes with trivial implementations looks like an overkill. As an alternative location, intrinsicnode.cpp should be a better option. Best regards, Vladimir Ivanov On 13.08.2020 14:04, Dmitry Chuyko wrote: > Hello, > > Please review a faster version of Math.signum() for AArch64. > > Two new intrinsics (double and float) are introduced in general code, > with appropriate new nodes. New JTreg test is added to cover the > intrinsic case (enabled only for aarch64). > > AArch64 implementation uses FACGT (compare abslute fp values) and BSL > (fp bit selection) to avoid branches and moves to non-fp registers and > back. > > Performance results show ~30% better time in the benchmark with a black > hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, > overhead is 2.9 ns/op. > > rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ > testing: jck, jtreg including new dedicated test > > -Dmitry > > [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java > From aph at redhat.com Thu Aug 13 13:07:38 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 13 Aug 2020 14:07:38 +0100 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> On 13/08/2020 12:04, Dmitry Chuyko wrote: > Performance results show ~30% better time in the benchmark with a black > hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, > overhead is 2.9 ns/op. > > rfe:https://bugs.openjdk.java.net/browse/JDK-8251525 > webrev:http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ > testing: jck, jtreg including new dedicated test Please show all of the JMH results. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitry.chuyko at bell-sw.com Thu Aug 13 13:50:01 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Thu, 13 Aug 2020 16:50:01 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> Message-ID: <790171f2-0499-21ed-899c-59fd788c34ba@bell-sw.com> Hi Andrew, On 8/13/20 4:07 PM, Andrew Haley wrote: > On 13/08/2020 12:04, Dmitry Chuyko wrote: >> Performance results show ~30% better time in the benchmark with a black >> hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, >> overhead is 2.9 ns/op. >> ...... > > Please show all of the JMH results. > Results for other sub-benchmarks are listed in the RFE, here is a copy: Baseline DoubleSignum.ofMostlyNaN 5.019 ? 0.060 ns/op DoubleSignum.ofMostlyNeg 4.919 ? 0.030 ns/op DoubleSignum.ofMostlyPos 4.827 ? 0.081 ns/op DoubleSignum.ofMostlyZero 4.936 ? 0.107 ns/op DoubleSignum.ofRandom 4.825 ? 0.026 ns/op DoubleSignum.overhead 2.846 ? 
0.027 ns/op Patch DoubleSignum.ofMostlyNaN 3.478 ? 0.368 ns/op DoubleSignum.ofMostlyNeg 3.509 ? 0.487 ns/op DoubleSignum.ofMostlyPos 3.513 ? 0.451 ns/op DoubleSignum.ofMostlyZero 3.494 ? 0.220 ns/op DoubleSignum.ofRandom 3.506 ? 0.343 ns/op DoubleSignum.overhead 2.848 ? 0.019 ns/op -Dmitry From stuart.monteith at arm.com Thu Aug 13 15:48:34 2020 From: stuart.monteith at arm.com (Stuart Monteith) Date: Thu, 13 Aug 2020 16:48:34 +0100 Subject: [aarch64-port-dev ] [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: <0586ead6-583d-1907-491e-64db6edf2106@redhat.com> References: <2d960bbe-0191-db94-2d5c-7df511a36dba@redhat.com> <0586ead6-583d-1907-491e-64db6edf2106@redhat.com> Message-ID: On 13/08/2020 11:00, Andrew Haley wrote: > On 12/08/2020 12:38, Stuart Monteith wrote: > > > The method "testDigest" generates an byte array of a given size, > > with each element filled with it's own index & 0xff. > > > > The test is then run once, assumed uncompiled, it is then "warmed > > up" and the first generated digest is compared against the digest > > presumably generated by the intrinsic. This is the same test for all > > of the message digest algorithms. > > > > I'd say the test is no worse than what has gone before. There are > > additional tests under the jdk library tests, but nothing that > > addresses the correctness of the MD5 algorithm implementation > > itself. > > Good grief. So there are no compliance tests in the test suite at all. Yes for any algorithm, for either the intrinsics or the Java implementations. > > > In terms of the status-quo, that patch looks ok to me. I think if > > the testing is to be expanded, it should be expanded to all of the > > message digest algorithms. > > That's not much more that an excuse for doing nothing, IMO. > My intention was to suggest that more than MD5 or even just the intrinsics need to be tested, it's not an excuse to ignore this. The existing tests are simply a comparison between generated message digest for a single message between the Java code and the intrinsics. The NIST samples cover SHA1 and MD5, but there are additional samples here: https://csrc.nist.gov/Projects/cryptographic-standards-and-guidelines/example-values . The message digests in Java under sun.security.provider are: MD2, MD4, MD5, SHA1 SHA2: SHA2-224, SHA2-256, SHA3: SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE256 SHA5: SHA-512/224, SHA-512/256, SHA-512, SHA-384, The intrinsics implemented are: aarch64: SHA1, SHA2, SHA5 (+MD5) ppc64: SHA2, SHA5 x86_64: SHA1, SHA2, SHA5, MD5 x86_32: SHA1, SHA2, MD5 The MD5 patches have been merged already for x86. SHA3 doesn't have any intrinsic implementations. MD2 has some example values in its RFC https://tools.ietf.org/html/rfc1319 Likewise, MD4 has example values in its RFC too: https://tools.ietf.org/html/rfc1320 My suggestion is to add new tests for each of the message digest algorithms and share them between the JTreg jdk and hotspot instrinsics. The MD5 intrinsics could be merged after some demonstration of correctness? I've CC'd core-libs-dev as this affects the jdk library. 
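As a rough sketch of what a shared known-answer test could look like (the vectors below are from the RFC 1321 test suite; the class name and structure are illustrative only, not an existing test in the tree):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class MD5KnownAnswerSketch {
    // Known-answer vectors from RFC 1321.
    private static final String[][] VECTORS = {
        { "",    "d41d8cd98f00b204e9800998ecf8427e" },
        { "abc", "900150983cd24fb0d6963f7d28e17f72" },
    };

    public static void main(String[] args) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // Loop so the digest code is eventually C2-compiled and the intrinsic
        // (when enabled with -XX:+UseMD5Intrinsics) is actually exercised.
        for (int i = 0; i < 20_000; i++) {
            for (String[] v : VECTORS) {
                byte[] digest = md.digest(v[0].getBytes(StandardCharsets.US_ASCII));
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b & 0xff));
                }
                if (!hex.toString().equals(v[1])) {
                    throw new AssertionError("MD5(\"" + v[0] + "\") = " + hex + ", expected " + v[1]);
                }
            }
        }
    }
}

The same table could then be extended with the SHA vectors from the NIST examples page above and reused for both the Java implementations and the intrinsic paths.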
BR, Stuart From nils.eliasson at oracle.com Thu Aug 13 15:59:05 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 13 Aug 2020 17:59:05 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> Message-ID: <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com> Hi again, On second thought - please add some basic testing (reuse any old test, or write a new) that covers the different cases. I think this table covers all combinations. There should exist tests for most of them that you can piggy back on. |+-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics | valid | invalid | +-------------------------------------------------+-------+----------------------------------+ | vmflag | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand= | ok | print error and continue | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd | ok | print error, vm continues to run | +-------------------------------------------------+-------+----------------------------------+| Regards, Nils On 2020-08-12 10:21, Nils Eliasson wrote: > Hi, > > Sorry for the delay. > > About the error handling: > > For CompilerDirectivesFile there are two scenarios: > 1) If a file containing bad contents is passed on the commandline - > the VM prints an descriptive error and refuses to start. > 2) If a file containing bad contents is passed through jcmd - the VM > prints and error on the jcmd stream and continues to run (ignoring the > command). > > This is achieved by letting the parser just register any parsing > error, and defer to the caller to decide how to handle the situation. > > Regards, > Nils Eliasson > > > On 2020-08-11 19:09, Liu, Xin wrote: >> Hi, Reviewers, >> >> May I gently ping this? >> >> I stuck because I don't know which error handling is appropriate. >> >> If we do nothing, current hotspot ignores wrong intrinsic Ids in the >> cmdline. >> This patch aborts hotspot when it detects any invalid intrinsic id. >> >> thanks, >> --lx >> >> >> ________________________________________ >> From: Liu, Xin >> Sent: Monday, August 3, 2020 11:39 PM >> To: Tobias Hartmann; Nils Eliasson; >> hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev >> Subject: Re: [EXTERNAL] RFR(S): 8247732: validate user-input >> intrinsic_ids in ControlIntrinsic >> >> hi, Nils, >> >> Tobias would like to keep the parser behavior consistency.? I think >> it means that the hotspot need to suppress the warning if the >> intrinsic_id doesn't exists in compiler directive. >> eg. -XX:CompileCommand=option,,ControlIntrinsic=-_nonexist. >> >> What do you think about it? 
>> >> Here is the latest webrev: >> http://cr.openjdk.java.net/~xliu/8247732/01/webrev/ >> >> thanks, >> --lx >> >> ________________________________________ >> From: Tobias Hartmann >> Sent: Friday, July 24, 2020 2:52 AM >> To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; >> hotspot-runtime-dev >> Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input >> intrinsic_ids in ControlIntrinsic >> >> CAUTION: This email originated from outside of the organization. Do >> not click links or open attachments unless you can confirm the sender >> and know the content is safe. >> >> >> >> Hi Liu, >> >> On 23.07.20 18:02, Liu, Xin wrote: >>> That is my intention too, but CompilerOracle doesn't exit JVM when >>> it encounters parsing errors. >>> It just exacts information from CompileCommand as many as possible. >>> That makes sense because compiler "directives" are supposed to be >>> optional for program execution. >>> >>> I do put the error message in parser's errorbuf.? I set a flag >>> "exit_on_error" to quit JVM after it dumps parser errors. yes, I >>> treat undefined intrinsics as fatal errors. >>> This behavior is from Nils comment: "I want to see an error on >>> startup if the user has specified unknown intrinsic names." It is >>> also consistent with JVM option -XX:ControlIntrinsic=. >> Okay, thanks for the explanation! I would prefer consistency in error >> handling of compiler >> directives, i.e., handle all parser failures the same way. But I >> leave it to Nils to decide. >> >> Best regards, >> Tobias > From nils.eliasson at oracle.com Thu Aug 13 16:17:26 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 13 Aug 2020 18:17:26 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com> Message-ID: <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com> That table didn't come out right... +-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics?????????????????????????????? | valid | invalid????????????????????????? | +-------------------------------------------------+-------+----------------------------------+ | vmflag????????????????????????????????????????? | ok??? | print error and don't start????? | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand=???????????? | ok??? | print error and continue???????? | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok??? | print error and don't start????? | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd???????????????????? | ok??? 
| print error, VM continues to run | +-------------------------------------------------+-------+----------------------------------+ // Regards Nils On 2020-08-13 17:59, Nils Eliasson wrote: > > |+-------------------------------------------------+-------+----------------------------------+ > | ControlIntrinsics | valid | invalid | > +-------------------------------------------------+-------+----------------------------------+ > | vmflag | ok | print error and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerOracle: -XX:CompileCommand= | ok | print error and continue > | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error > and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives via jcmd | ok | print error, vm continues to run > | > +-------------------------------------------------+-------+----------------------------------+| From igor.ignatyev at oracle.com Thu Aug 13 16:46:25 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 13 Aug 2020 09:46:25 -0700 Subject: RFR(T) : 8251526 : CTW fails to build after JDK-8251121 Message-ID: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com> Hi all, could you please review this one-liner patch? 8251121 introduced a dependency b/w jdk/test/lib/util/CoreUtils and jtreg/SkippedException, b/c SkippedException wasn't on the source path, ctw build failed. the patch simply adds test/lib/jtreg/*.java to the source path. JBS: https://bugs.openjdk.java.net/browse/JDK-8251526 patch: > diff -r ce770ba672fe test/hotspot/jtreg/testlibrary/ctw/Makefile > --- a/test/hotspot/jtreg/testlibrary/ctw/Makefile Wed Aug 12 12:37:16 2020 -0400 > +++ b/test/hotspot/jtreg/testlibrary/ctw/Makefile Thu Aug 13 09:42:09 2020 -0700 > @@ -45,6 +45,7 @@ > LIB_FILES = $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \ > $(TESTLIBRARY_DIR)/jdk/test/lib/process \ > $(TESTLIBRARY_DIR)/jdk/test/lib/util \ > + $(TESTLIBRARY_DIR)/jtreg \ > -maxdepth 1 -name '*.java') > WB_SRC_FILES = $(shell find $(TESTLIBRARY_DIR)/sun/hotspot -name '*.java') > EXPORTS=--add-exports java.base/jdk.internal.jimage=ALL-UNNAMED \ testing: cd test/hotspot/jtreg/testlibrary/ctw && make Thanks, -- Igor From shade at redhat.com Thu Aug 13 16:55:10 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 13 Aug 2020 18:55:10 +0200 Subject: RFR(T) : 8251526 : CTW fails to build after JDK-8251121 In-Reply-To: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com> References: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com> Message-ID: On 8/13/20 6:46 PM, Igor Ignatyev wrote: >> diff -r ce770ba672fe test/hotspot/jtreg/testlibrary/ctw/Makefile >> --- a/test/hotspot/jtreg/testlibrary/ctw/Makefile Wed Aug 12 12:37:16 2020 -0400 >> +++ b/test/hotspot/jtreg/testlibrary/ctw/Makefile Thu Aug 13 09:42:09 2020 -0700 >> @@ -45,6 +45,7 @@ >> LIB_FILES = $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \ >> $(TESTLIBRARY_DIR)/jdk/test/lib/process \ >> $(TESTLIBRARY_DIR)/jdk/test/lib/util \ >> + $(TESTLIBRARY_DIR)/jtreg \ >> -maxdepth 1 -name '*.java') Looks good and trivial to me. 
--
Thanks,
-Aleksey

From igor.ignatyev at oracle.com Thu Aug 13 17:35:02 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 13 Aug 2020 10:35:02 -0700
Subject: RFR(T) : 8251526 : CTW fails to build after JDK-8251121
In-Reply-To:
References: <60FD6988-4ED2-4E64-A4F7-C4F7C8033748@oracle.com>
Message-ID: <887ADD5E-55E5-491D-AF06-FBAC6FA9C4A0@oracle.com>

Thanks Aleksey, pushed.

-- Igor

> On Aug 13, 2020, at 9:55 AM, Aleksey Shipilev wrote:
>
> On 8/13/20 6:46 PM, Igor Ignatyev wrote:
>>> diff -r ce770ba672fe test/hotspot/jtreg/testlibrary/ctw/Makefile
>>> --- a/test/hotspot/jtreg/testlibrary/ctw/Makefile Wed Aug 12 12:37:16 2020 -0400
>>> +++ b/test/hotspot/jtreg/testlibrary/ctw/Makefile Thu Aug 13 09:42:09 2020 -0700
>>> @@ -45,6 +45,7 @@
>>> LIB_FILES = $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \
>>> $(TESTLIBRARY_DIR)/jdk/test/lib/process \
>>> $(TESTLIBRARY_DIR)/jdk/test/lib/util \
>>> + $(TESTLIBRARY_DIR)/jtreg \
>>> -maxdepth 1 -name '*.java')
>
> Looks good and trivial to me.
>
> --
> Thanks,
> -Aleksey
>

From xxinliu at amazon.com Thu Aug 13 18:37:31 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 13 Aug 2020 18:37:31 +0000
Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic
In-Reply-To: <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com>
References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com>, <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com>
Message-ID: <1597343851213.53343@amazon.com>

hi, Nils,

Thank you for elaborating the answer with a table. I didn't know there were as many as 4 approaches to affecting compilation behavior until I saw this table! I got it. I will work on tests and make sure my next patch conforms to this spec.

thanks,
--lx

________________________________________
From: hotspot-compiler-dev on behalf of Nils Eliasson
Sent: Thursday, August 13, 2020 9:17 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

That table didn't come out right...
+-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics | valid | invalid | +-------------------------------------------------+-------+----------------------------------+ | vmflag | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand= | ok | print error and continue | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd | ok | print error, VM continues to run | +-------------------------------------------------+-------+----------------------------------+ // Regards Nils On 2020-08-13 17:59, Nils Eliasson wrote: > > |+-------------------------------------------------+-------+----------------------------------+ > | ControlIntrinsics | valid | invalid | > +-------------------------------------------------+-------+----------------------------------+ > | vmflag | ok | print error and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerOracle: -XX:CompileCommand= | ok | print error and continue > | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error > and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives via jcmd | ok | print error, vm continues to run > | > +-------------------------------------------------+-------+----------------------------------+| From hohensee at amazon.com Thu Aug 13 21:51:39 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Thu, 13 Aug 2020 21:51:39 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers Message-ID: Shouldn't all the uint type uses that represent node indices actually be node_idx_t? Thanks, Paul ?On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: Hi Eric, there are other places where Node::_idx is casted to int (and a potential overflow might happen). For example, calls to Compile::node_notes_at. The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) checking that _idx is always <= MAX_INT. Best regards, Tobias On 12.08.20 00:41, Eric, Chan wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > I have tested this builds successfully . > > Ensured that there are no regressions in hotspot : tier1 tests. 
> > Regards, > Eric Chen > From vladimir.kozlov at oracle.com Thu Aug 13 22:58:44 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Aug 2020 15:58:44 -0700 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: References: Message-ID: <7225b0a5-e685-89aa-1bc5-4ff162774fe5@oracle.com> Yes, it is sloppy :( Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be valid value - InstanceTop. And I agree that we should use node_idx_t everywhere. For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and uint when referencing them. Warning: it is not small change. Regards, Vladimir On 8/13/20 2:51 PM, Hohensee, Paul wrote: > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > Thanks, > Paul > > ?On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > Hi Eric, > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > For example, calls to Compile::node_notes_at. > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > checking that _idx is always <= MAX_INT. > > Best regards, > Tobias > > On 12.08.20 00:41, Eric, Chan wrote: > > Hi, > > > > Requesting review for > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > I have tested this builds successfully . > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > Regards, > > Eric Chen > > > From jptatton at amazon.com Thu Aug 13 23:02:34 2020 From: jptatton at amazon.com (Tatton, Jason) Date: Thu, 13 Aug 2020 23:02:34 +0000 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm Message-ID: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> Hi Everyone, I'm Jason. I recently joined Amazon on the team supporting OpenJDK. I am new to the OpenJDK project and would like to contribute some starter bug fixes/enhancements/cleanups. I am working with my sponsor, Paul Hohensee. I have a cleanup to submit. Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8180068 http://cr.openjdk.java.net/~phh/8180068/webrev.00/ The code change is very straightforward, simply a substitution of '0' with 'oopDesc::mark_offset_in_bytes()' in the relevant 6 locations. For testing I have run; 'run-test-tier1' and 'run-test-tier2' for: x86_64 and aarch64. 
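Independent of whether the six call sites in the 8180068 webrev really refer to the object header, the general argument for a named offset accessor over a literal 0 is about stating intent. The standalone sketch below uses an invented MockOop type, which does not reproduce HotSpot's real oopDesc layout; it only shows the pattern of forming an address from a named offset rather than a bare constant.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Invented mock of an object header layout, for illustration only.
    struct MockOop {
        uintptr_t mark;    // header word placed at offset 0
        void*     klass;
        static int mark_offset_in_bytes() { return offsetof(MockOop, mark); }
    };

    // Reading through the named accessor documents *why* the offset is 0;
    // a bare "base + 0" compiles to the same code but says nothing.
    static uintptr_t load_mark(const MockOop* obj) {
        const char* base = reinterpret_cast<const char*>(obj);
        return *reinterpret_cast<const uintptr_t*>(base + MockOop::mark_offset_in_bytes());
    }

    int main() {
        MockOop o{0x5, nullptr};
        std::printf("mark = 0x%lx\n", (unsigned long)load_mark(&o));
        return 0;
    }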
Regards, -- Jason Taton From nick.gasson at arm.com Fri Aug 14 02:26:11 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 14 Aug 2020 10:26:11 +0800 Subject: RFR: 8247354: [aarch64] PopFrame causes assert(oopDesc::is_oop(obj)) failed: not an oop In-Reply-To: <10b37c70-c522-7f65-3c7e-bbeeaf7e1c3d@redhat.com> References: <85364ykery.fsf@nicgas01-pc.shanghai.arm.com> <10b37c70-c522-7f65-3c7e-bbeeaf7e1c3d@redhat.com> Message-ID: <85k0y2q7y4.fsf@nicgas01-pc.shanghai.arm.com> On 08/13/20 16:06 pm, Andrew Dinn wrote: > Hi Nick, > > On 07/08/2020 10:04, Nick Gasson wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8247354 >> Webrev: http://cr.openjdk.java.net/~ngasson/8247354/webrev.0/ > Nice detective work. The patch looks ok to me. > Thanks for the review Andrew. I've pushed it. -- Nick From shade at redhat.com Fri Aug 14 07:22:26 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 14 Aug 2020 09:22:26 +0200 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> Message-ID: On 8/14/20 1:02 AM, Tatton, Jason wrote: > I have a cleanup to submit. Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8180068 > http://cr.openjdk.java.net/~phh/8180068/webrev.00/ > > The code change is very straightforward, simply a substitution of '0' with 'oopDesc::mark_offset_in_bytes()' in the relevant 6 locations. No, wait. None of these look relevant: *) The uses in load_heap_oop are the _load addresses_. They are naturally just *(obj + 0). This is not loading the mark word. *) The uses in try_resolve_jobject is decoding the JNI handle, "0" is valid there. This is not loading the mark word. See the native implementation in JNIHandles::resolve_impl that ends up loading off the dereferenced handle via: inline oop* JNIHandles::jobject_ptr(jobject handle) { assert(!is_jweak(handle), "precondition"); return reinterpret_cast(handle); } -- Thanks, -Aleksey From sergei.tsypanov at yandex.ru Fri Aug 14 07:43:59 2020 From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=) Date: Fri, 14 Aug 2020 09:43:59 +0200 Subject: JIT optimization broke mapping between compiled code and byte-code instructions on JDK 14 / 15EAP Message-ID: <894941597390093@mail.yandex.ru> Hello, while investigating an issue related to instantiation of Spring's `org.springframework.util.ConcurrentReferenceHashMap` (as of `spring-core-5.1.3.RELEASE`) I've used `LinuxPerfAsmProfiler` shipped along with JMH to profile generated assembly. I simply run this @Benchmark public Object measureInit() { return new ConcurrentReferenceHashMap<>(); } Benchmarking on JDK 8 allows to identify one of hot spots (full assembly layout can be found in [1]): 0.61% 0x00007f32d92772ea: lock addl $0x0,(%rsp) ;*putfield count ; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@11 (line 476) ; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184) 15.81% 0x00007f32d92772ef: mov 0x60(%r15),%rdx This corresponds unnecessary assignment of default value to a volatile field: protected final class Segment extends ReentrantLock { private volatile int count = 0; } Then I run the same benchmark on JDK 14 and again use `LinuxPerfAsmProfiler`, but now I don't have any explicit pointing to `volatile int count = 0` in captured assembly [2]. 
Looking for `lock addl $0x0` instuction which is assignment of `0` under `lock` prefix I have found this: 0.08% ? 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp) 23.74% ? 0x00007f3717d4618d: mov 0x120(%r15),%rbx which is likely to correspond `volatile int count = 0` because it follows the construction of `Segment`'s superclass `ReentrantLock`: 0.77% ? 0x00007f3717d46140: movq $0x0,0x18(%rax) ;*new {reexecute=0 rethrow=0 return_oop=0} ? ; - java.util.concurrent.locks.ReentrantLock::<init>@5 (line 294) ? ; - org.springframework.util.ConcurrentReferenceHashMap$Segment::<init>@6 (line 484) ? ; - org.springframework.util.ConcurrentReferenceHashMap::<init>@141 (line 184) 0.06% ? 0x00007f3717d46148: mov %r8,%rcx 0.05% ? 0x00007f3717d4614b: mov %rax,%rbx 0.03% ? 0x00007f3717d4614e: shr $0x3,%rbx 0.74% ? 0x00007f3717d46152: mov %ebx,0xc(%r8) 0.06% ? 0x00007f3717d46156: mov %rax,%rbx 0.05% ? 0x00007f3717d46159: xor %rcx,%rbx 0.02% ? 0x00007f3717d4615c: shr $0x14,%rbx 0.72% ? 0x00007f3717d46160: test %rbx,%rbx ? ? 0x00007f3717d46163: je 0x00007f3717d4617f ? ? 0x00007f3717d46165: shr $0x9,%rcx ? ? 0x00007f3717d46169: movabs $0x7f370a872000,%rdi ? ? 0x00007f3717d46173: add %rcx,%rdi ? ? 0x00007f3717d46176: cmpb $0x8,(%rdi) 0.00% ? ? 0x00007f3717d46179: jne 0x00007f3717d46509 0.04% ? ? 0x00007f3717d4617f: movl $0x0,0x14(%r8) 0.08% ? 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp) 23.74% ? 0x00007f3717d4618d: mov 0x120(%r15),%rbx The problem is that I don't have any mention of `putfield count` in generated assembly at all. I've asked the question on StackOverflow [4] and Andrey Pangin suggests in his comment [5] that this might be due to broken mapping between compiled code and byte-code causing miss of debug info in the output of -XX:+PrintAssembly P.S. The issue is reproducible on JDK 15 built locally, see [3] [1]: https://gist.github.com/stsypanov/ff5678987c6f95a2aaf292fa2a3b92a8 [2]: https://gist.github.com/stsypanov/2e4bd73c39d7465cbdd75ba26d4bc217 [3]: https://gist.github.com/stsypanov/30fc0f688e6d37612ca017b59ab3e631 [4]: https://stackoverflow.com/questions/63397711/linuxperfasmprofiler-shows-java-code-corresponding-assembly-hot-spot-for-java-8 [5]: https://stackoverflow.com/questions/63397711/linuxperfasmprofiler-shows-java-code-corresponding-assembly-hot-spot-for-java-8#comment112109002_63397711 From aph at redhat.com Fri Aug 14 08:24:45 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 14 Aug 2020 09:24:45 +0100 Subject: JIT optimization broke mapping between compiled code and byte-code instructions on JDK 14 / 15EAP In-Reply-To: <894941597390093@mail.yandex.ru> References: <894941597390093@mail.yandex.ru> Message-ID: <19bfa8d8-a9bc-69eb-1f66-af68ab537502@redhat.com> On 14/08/2020 08:43, ?????? ??????? wrote: > The problem is that I don't have any mention of `putfield count` in > generated assembly at all. It's here: 0.04% ? ? 0x00007f3717d4617f: movl $0x0,0x14(%r8) 0.08% ? 0x00007f3717d46187: lock addl $0x0,-0x40(%rsp) There's never been any guarantee that debuginfo will be complete after transformations. Optimization rewrites things to such an extent that it's not really possible anyway: operations are reorganized and combined in such a way that the relationship between incoming bytecode and generated code is not 1:1. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Fri Aug 14 12:10:50 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 14 Aug 2020 14:10:50 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support Message-ID: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> Hi Please review the following enhancement for C1: https://bugs.openjdk.java.net/browse/JDK-8251093 http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure out what was going on. I think it would be useful to have this code around for the analysis of future C1 register allocator bugs. This RFE adds (everything non-product code): - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and parent. - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be manually looked up in other logs. I additionally did some cleanup of the touched code. We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of the register allocation algorithm. It currently just prints too much details on the higher levels. You often find yourself being interested in a specific part of the algorithm and only want to know more details there. To achieve that you now you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to clean this up if it's worth the effort - given that there are not many new issues filed for C1 register allocation today. Thank you! Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8251093 From hohensee at amazon.com Fri Aug 14 16:05:56 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Fri, 14 Aug 2020 16:05:56 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers Message-ID: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> Hi, Vladimir, What do you think of the following? 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. 3. New issue: Change from uint to node_idx_t. Thanks, Paul ?On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: Yes, it is sloppy :( Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be valid value - InstanceTop. And I agree that we should use node_idx_t everywhere. For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. 
Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and uint when referencing them. Warning: it is not small change. Regards, Vladimir On 8/13/20 2:51 PM, Hohensee, Paul wrote: > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > Thanks, > Paul > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > Hi Eric, > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > For example, calls to Compile::node_notes_at. > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > checking that _idx is always <= MAX_INT. > > Best regards, > Tobias > > On 12.08.20 00:41, Eric, Chan wrote: > > Hi, > > > > Requesting review for > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > I have tested this builds successfully . > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > Regards, > > Eric Chen > > > From dmitry.chuyko at bell-sw.com Fri Aug 14 17:14:54 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Fri, 14 Aug 2020 20:14:54 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: <220c8f3d-3443-4c4d-bf42-078bec651335@bell-sw.com> Hi Vladimir, Thank you for the comments. Here is a version with simplified node definitions: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.01/ -Dmitry On 8/13/20 2:51 PM, Vladimir Ivanov wrote: > Hi Dmitry, > > Some comments on shared code changes: > > src/hotspot/share/opto/library_call.cpp: > > +? case vmIntrinsics::_dsignum: > +??? return UseSignumIntrinsic && > (Matcher::match_rule_supported(Op_SignumD) ? inline_double_math(id) : > false); > > There's no need in repeating UseSignumIntrinsic and > (Matcher::match_rule_supported(Op_SignumD) checks. > C2Compiler::is_intrinsic_supported() already covers taht. > > > src/hotspot/share/opto/signum.hpp: > > ? 32 class SignumNode : public Node { > ? 33 public: > ? 34?? SignumNode(Node* in) : Node(0, in) {} > ? 35?? virtual int Opcode() const; > ? 36?? virtual const Type *bottom_type() const { return NULL; } > ? 37?? virtual uint ideal_reg() const { return Op_RegD; } > ? 38 }; > > Any particular reason to keep SignumNode? I don't see any and would > just drop it. > > Also, having a dedicated header file just for a couple of nodes with > trivial implementations looks like an overkill. As an alternative > location, intrinsicnode.cpp should be a better option. > > Best regards, > Vladimir Ivanov > > On 13.08.2020 14:04, Dmitry Chuyko wrote: >> Hello, >> >> Please review a faster version of Math.signum() for AArch64. >> >> Two new intrinsics (double and float) are introduced in general code, >> with appropriate new nodes. New JTreg test is added to cover the >> intrinsic case (enabled only for aarch64). >> >> AArch64 implementation uses FACGT (compare abslute fp values) and BSL >> (fp bit selection) to avoid branches and moves to non-fp registers >> and back. >> >> Performance results show ~30% better time in the benchmark with a >> black hole [1] on Cortex. E.g. 
on random numbers 4.8 ns/op --> 3.5 >> ns/op, overhead is 2.9 ns/op. >> >> rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 >> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ >> testing: jck, jtreg including new dedicated test >> >> -Dmitry >> >> [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java >> From vladimir.kozlov at oracle.com Fri Aug 14 18:03:51 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Aug 2020 11:03:51 -0700 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> References: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> Message-ID: On 8/14/20 9:05 AM, Hohensee, Paul wrote: > Hi, Vladimir, > > What do you think of the following? > > 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). I see only this change: - const TypeOopPtr* tinst = t->cast_to_instance_id(ni); + assert(ni<=INT_MAX,"node index cannot be negative"); + const TypeOopPtr* tinst = t->cast_to_instance_id((int)ni); I would like to see first what you are suggesting. > 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. > 3. New issue: Change from uint to node_idx_t. Yes, it is fine to split these 2. Regards, Vladimir > > Thanks, > Paul > > ?On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > Yes, it is sloppy :( > > Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. > > For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. > We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see > TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be > valid value - InstanceTop. > > And I agree that we should use node_idx_t everywhere. > > For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. > > Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and > uint when referencing them. > > Warning: it is not small change. > > Regards, > Vladimir > > On 8/13/20 2:51 PM, Hohensee, Paul wrote: > > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > > > Thanks, > > Paul > > > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > > > Hi Eric, > > > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > > For example, calls to Compile::node_notes_at. > > > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > > checking that _idx is always <= MAX_INT. 
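To make the InstanceTop/InstanceBot proposal in this thread concrete, here is a standalone sketch of unsigned instance ids with the two special ids encoded at the extremes of the range, and a meet that never does arithmetic on them. The constants and the simplified meet rules are illustrative assumptions, not HotSpot's current TypeOopPtr implementation.

    #include <cstdio>

    // Illustrative unsigned scheme: instance ids become plain unsigned
    // values, with the two sentinels at the ends of the range.
    typedef unsigned int instance_id_t;

    const instance_id_t InstanceBot = 0;    // "unknown/any instance"
    const instance_id_t InstanceTop = ~0u;  // "no instance yet"

    // Meet on this small lattice: TOP is the identity element, two
    // different concrete ids fall to BOTTOM. No arithmetic is ever done
    // on ids, so using the unsigned extremes as sentinels is safe.
    static instance_id_t meet_instance_id(instance_id_t a, instance_id_t b) {
        if (a == InstanceTop) return b;
        if (b == InstanceTop) return a;
        return (a == b) ? a : InstanceBot;
    }

    int main() {
        std::printf("%u\n", meet_instance_id(InstanceTop, 42u));  // 42
        std::printf("%u\n", meet_instance_id(42u, 42u));          // 42
        std::printf("%u\n", meet_instance_id(42u, 43u));          // 0 (InstanceBot)
        return 0;
    }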
> > > > Best regards, > > Tobias > > > > On 12.08.20 00:41, Eric, Chan wrote: > > > Hi, > > > > > > Requesting review for > > > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > > > I have tested this builds successfully . > > > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > > > Regards, > > > Eric Chen > > > > > > From vladimir.kozlov at oracle.com Fri Aug 14 18:09:49 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 14 Aug 2020 11:09:49 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> Message-ID: <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> One note. Most of the code is guarded by #ifndef PRODUCT. But the flag is available only in DEBUG build: develop(intx, TraceLinearScanLevel, 0, Should we use #ifdef ASSERT and DEBUG() instead? Thanks, Vladimir On 8/14/20 5:10 AM, Christian Hagedorn wrote: > Hi > > Please review the following enhancement for C1: > https://bugs.openjdk.java.net/browse/JDK-8251093 > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ > > While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure out > what was going on. I think it would be useful to have this code around for the analysis of future C1 register allocator > bugs. > > This RFE adds (everything non-product code): > - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. > - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and parent. > - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful in > some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be manually looked > up in other logs. > > I additionally did some cleanup of the touched code. > > We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of the > register allocation algorithm. It currently just prints too much details on the higher levels. You often find yourself > being interested in a specific part of the algorithm and only want to know more details there. To achieve that you now > you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to clean this up if > it's worth the effort - given that there are not many new issues filed for C1 register allocation today. > > Thank you! > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8251093 > From vladimir.x.ivanov at oracle.com Fri Aug 14 18:53:04 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 14 Aug 2020 21:53:04 +0300 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <220c8f3d-3443-4c4d-bf42-078bec651335@bell-sw.com> References: <220c8f3d-3443-4c4d-bf42-078bec651335@bell-sw.com> Message-ID: <3d8f2ce2-d106-6cb6-ffb8-861905d8f49e@oracle.com> > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.01/ Changes in shared code look good. 
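As background to the Math.signum review in this thread: the FACGT/BSL selection described in Dmitry's patch can be rendered in portable C++ as a bit-select between x itself and a 1.0 carrying x's sign bit, which preserves +/-0.0 and NaN without branches. This is only a sketch of the idea under the assumption of C++20 (for std::bit_cast); it is not the actual matcher rules or the JDK library code.

    #include <bit>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    // Branch-free signum: if |x| > 0, take 1.0 with x's sign bit,
    // otherwise keep x itself (preserves +/-0.0 and NaN).
    static double signum(double x) {
        uint64_t xb   = std::bit_cast<uint64_t>(x);
        uint64_t oneb = std::bit_cast<uint64_t>(std::copysign(1.0, x));
        // Comparisons with NaN are false, so NaN falls into the "keep x" case.
        uint64_t mask = (std::fabs(x) > 0.0) ? ~0ULL : 0ULL;
        return std::bit_cast<double>((oneb & mask) | (xb & ~mask));
    }

    int main() {
        const double tests[] = {3.5, -0.25, 0.0, -0.0, NAN, INFINITY, -INFINITY};
        for (double t : tests) {
            std::printf("signum(% g) = % g\n", t, signum(t));
        }
        return 0;
    }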
Best regards, Vladimir Ivanov > On 8/13/20 2:51 PM, Vladimir Ivanov wrote: >> Hi Dmitry, >> >> Some comments on shared code changes: >> >> src/hotspot/share/opto/library_call.cpp: >> >> +? case vmIntrinsics::_dsignum: >> +??? return UseSignumIntrinsic && >> (Matcher::match_rule_supported(Op_SignumD) ? inline_double_math(id) : >> false); >> >> There's no need in repeating UseSignumIntrinsic and >> (Matcher::match_rule_supported(Op_SignumD) checks. >> C2Compiler::is_intrinsic_supported() already covers taht. >> >> >> src/hotspot/share/opto/signum.hpp: >> >> ? 32 class SignumNode : public Node { >> ? 33 public: >> ? 34?? SignumNode(Node* in) : Node(0, in) {} >> ? 35?? virtual int Opcode() const; >> ? 36?? virtual const Type *bottom_type() const { return NULL; } >> ? 37?? virtual uint ideal_reg() const { return Op_RegD; } >> ? 38 }; >> >> Any particular reason to keep SignumNode? I don't see any and would >> just drop it. >> >> Also, having a dedicated header file just for a couple of nodes with >> trivial implementations looks like an overkill. As an alternative >> location, intrinsicnode.cpp should be a better option. >> >> Best regards, >> Vladimir Ivanov >> >> On 13.08.2020 14:04, Dmitry Chuyko wrote: >>> Hello, >>> >>> Please review a faster version of Math.signum() for AArch64. >>> >>> Two new intrinsics (double and float) are introduced in general code, >>> with appropriate new nodes. New JTreg test is added to cover the >>> intrinsic case (enabled only for aarch64). >>> >>> AArch64 implementation uses FACGT (compare abslute fp values) and BSL >>> (fp bit selection) to avoid branches and moves to non-fp registers >>> and back. >>> >>> Performance results show ~30% better time in the benchmark with a >>> black hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 >>> ns/op, overhead is 2.9 ns/op. >>> >>> rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 >>> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ >>> testing: jck, jtreg including new dedicated test >>> >>> -Dmitry >>> >>> [1] https://cr.openjdk.java.net/~dchuyko/8249198/DoubleSignum.java >>> From jptatton at amazon.com Fri Aug 14 19:26:15 2020 From: jptatton at amazon.com (Tatton, Jason) Date: Fri, 14 Aug 2020 19:26:15 +0000 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> Message-ID: <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> Hi Aleksey, Thanks for having a look into this. I was mistaken in what these calls were doing, thank you for explaining this. I'm not able to find any other potential instances where 'oopDesc::mark_offset_in_bytes()' should be used. The bug is a few years old, so perhaps the codebase has naturally evolved in the intervening time to resolve this? Unless anyone can advise on other instances which I should change, I'd advise closing the bug? -Jason -----Original Message----- From: Aleksey Shipilev Sent: 14 August 2020 08:22 To: Tatton, Jason ; hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm On 8/14/20 1:02 AM, Tatton, Jason wrote: > I have a cleanup to submit. 
Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8180068 > http://cr.openjdk.java.net/~phh/8180068/webrev.00/ > > The code change is very straightforward, simply a substitution of '0' with 'oopDesc::mark_offset_in_bytes()' in the relevant 6 locations. No, wait. None of these look relevant: *) The uses in load_heap_oop are the _load addresses_. They are naturally just *(obj + 0). This is not loading the mark word. *) The uses in try_resolve_jobject is decoding the JNI handle, "0" is valid there. This is not loading the mark word. See the native implementation in JNIHandles::resolve_impl that ends up loading off the dereferenced handle via: inline oop* JNIHandles::jobject_ptr(jobject handle) { assert(!is_jweak(handle), "precondition"); return reinterpret_cast(handle); } -- Thanks, -Aleksey From hohensee at amazon.com Fri Aug 14 20:54:55 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Fri, 14 Aug 2020 20:54:55 +0000 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: References: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> Message-ID: <587AF7B9-5EE9-4F93-A587-9B3277E9183D@amazon.com> By "e.g.", I meant "ones like the one in the webrev". Tobais is correct that there are more. I grep'ed for "(int idx", ", int idx", "(int idx)", and so on, and found a bunch (not all of them are node_idx_t, but many of those that aren't should probably be uint too). So those would be fixed first. Thanks, Paul ?On 8/14/20, 11:04 AM, "Vladimir Kozlov" wrote: On 8/14/20 9:05 AM, Hohensee, Paul wrote: > Hi, Vladimir, > > What do you think of the following? > > 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). I see only this change: - const TypeOopPtr* tinst = t->cast_to_instance_id(ni); + assert(ni<=INT_MAX,"node index cannot be negative"); + const TypeOopPtr* tinst = t->cast_to_instance_id((int)ni); I would like to see first what you are suggesting. > 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. > 3. New issue: Change from uint to node_idx_t. Yes, it is fine to split these 2. Regards, Vladimir > > Thanks, > Paul > > On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > Yes, it is sloppy :( > > Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. > > For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. > We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see > TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be > valid value - InstanceTop. > > And I agree that we should use node_idx_t everywhere. > > For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. > > Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and > uint when referencing them. > > Warning: it is not small change. > > Regards, > Vladimir > > On 8/13/20 2:51 PM, Hohensee, Paul wrote: > > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? 
> > > > Thanks, > > Paul > > > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > > > Hi Eric, > > > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > > For example, calls to Compile::node_notes_at. > > > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > > checking that _idx is always <= MAX_INT. > > > > Best regards, > > Tobias > > > > On 12.08.20 00:41, Eric, Chan wrote: > > > Hi, > > > > > > Requesting review for > > > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > > > I have tested this builds successfully . > > > > > > Ensured that there are no regressions in hotspot : tier1 tests. > > > > > > Regards, > > > Eric Chen > > > > > > From OGATAK at jp.ibm.com Fri Aug 14 21:04:58 2020 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Sat, 15 Aug 2020 06:04:58 +0900 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler In-Reply-To: References: Message-ID: Hi Tobias, Thank you for checking the webrev and pointing out InstallMethods option. I now realize that I failed to notice this option can be turned off. I remember I checked its default value is true, but I wasn't aware that it's a command line option... Regarding the change in javaCalls.cpp, I made this change when I was debugging my changes to support new instructions. I also made another change to make my code work. I guess I should have revisit the change in javaCalls.cpp when my code became workable. This change must not be necessary because my version of JVM works fine by only disabling InstallMethods. Anyway, I agree this RFR is unnecessary. Sorry for bothering you. Regards, Ogata Tobias Hartmann wrote on 2020/08/13 16:00:20: > From: Tobias Hartmann > To: Kazunori Ogata , hotspot-compiler-dev at openjdk.java.net > Date: 2020/08/13 16:02 > Subject: [EXTERNAL] Re: RFR: JDK-8251470: Add a development option > equivalant to OptoNoExecute to C1 compiler > > Hi Ogata, > > isn't that what -XX:-InstallMethods [1] is supposed to accomplish? It > triggers a bailout right > before Compilation::install_code, which is the same with your code. > > Also, why do you need the change in javaCalls.cpp? That would also affect > C2 compiled code. > > Best regards, > Tobias > > [1] INVALID URI REMOVED > u=http-3A__hg.openjdk.java.net_jdk_jdk_file_a7c030723240_src_hotspot_share_c1_c1-5Fglobals.hpp-23l292&d=DwICaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- > isw&s=TWFHKEFoj6wwSylXbeLhsD7-tv5nCR50A6-iptKDC00&e= > > > On 12.08.20 09:48, Kazunori Ogata wrote: > > Hi, > > > > May I get review for JDK-8251470: Add a development option equivalant to > > OptoNoExecute to C1 compiler? > > > > This patch adds a development option to compile a method with C1 and print > > disassembly of the generated native code, but to skip execution of the > > generated code, in the same manner as OptoNoExecute option does in C2. > > > > Log-based debugging is useful to support a new processor. 
In C1, the > > existing options BailoutAfterHIR and BailoutAfterLIR can be used if > > printing HIR/LIR is sufficient. However, there is no way to print > > disassembly of the generated code because these existing options quit > > compilation before generating native code. So this issue proposes a new > > option for this purpose. > > > > > > Bug: INVALID URI REMOVED > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8251470&d=DwICaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- > isw&s=JN0Zd_7HcvX3tVM-KdN-Q4hpX7Um5_muAy0Ma5sFWAI&e= > > Webrev: INVALID URI REMOVED > u=http-3A__cr.openjdk.java.net_-7Eogatak_8251470_webrev. > 00_&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- > isw&s=7Z4peF7vXvN0QRxSZALXIU3C91WHZWhS5pWyvRA4XlA&e= > > > > > > Regards, > > Ogata > > > From aph at redhat.com Sat Aug 15 13:50:42 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 15 Aug 2020 14:50:42 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> Message-ID: <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> I've been looking at the way Math.signum() is used, mostly by searching the GitHub code database. I've changed the JMH test to be IMO more realistic: it's at http://cr.openjdk.java.net/~aph/DoubleSignum.java. I think it's more realitic because signum() results usually aren't stored but are used to feed other arithmetic ops, usually + or *. Baseline: Benchmark Mode Cnt Score Error Units DoubleSignum.ofMostlyNaN avgt 3 2.409 ? 0.051 ns/op DoubleSignum.ofMostlyNeg avgt 3 2.475 ? 0.211 ns/op DoubleSignum.ofMostlyPos avgt 3 2.494 ? 0.015 ns/op DoubleSignum.ofMostlyZero avgt 3 2.501 ? 0.008 ns/op DoubleSignum.ofRandom avgt 3 2.458 ? 0.373 ns/op DoubleSignum.overhead avgt 3 2.373 ? 0.029 ns/op -XX:+UseSignumIntrinsic: Benchmark Mode Cnt Score Error Units DoubleSignum.ofMostlyNaN avgt 3 2.776 ? 0.006 ns/op DoubleSignum.ofMostlyNeg avgt 3 2.773 ? 0.066 ns/op DoubleSignum.ofMostlyPos avgt 3 2.772 ? 0.084 ns/op DoubleSignum.ofMostlyZero avgt 3 2.770 ? 0.045 ns/op DoubleSignum.ofRandom avgt 3 2.769 ? 0.005 ns/op DoubleSignum.overhead avgt 3 2.376 ? 0.013 ns/op I think it might be more useful for you to work on optimizing Math.copysign(). -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ningsheng.jian at arm.com Mon Aug 17 06:00:13 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 17 Aug 2020 14:00:13 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> Hi Andrew, Thanks a lot for the review! Sorry for the late reply, as I was on vacation last week. And thanks to Pengfei and Joshua for helping clarifying some details in the patch. > > Testing: > > I was able to test this patch on a loaned Fujitsu FX700. I replicated > your results, passing tier1 tests and the jtreg compiler tests in > vectorization, codegen, c2/cr6340864 and loopopts. > Thanks for the testing. 
> I also eyeballed /some/ of the generated code to check that it looked > ok. I'd really like to be able to do that systematically for a > comprehensive test suite that exercised every rule but I only had the > machine for a few days. This really ought to be done as a follow-up to > ensure that all the rules are working as expected. > > Yes, we would expect Pengfei's OptoAssembly check patch can get merged in future. > > General Comments: > > Sizing the NEON registers using 8 slots -- even though there might > actually be more (or less!) slots in use for a VecA is fine. However, I > think this needs a little bit more explanation in the .ad. file (see > comments on ra webrev below) > OK, I will try to have some more clear comments in ad file. > I'm ok with your choice to use p7 as an always true predicate register > and also how you choose to init and re-init from code defined via the ad > file based on C->max_vector_size(). > > I am not clear why you are choosing to re-init ptrue after certain JVM > runtime calls (e.g. when Z calls into the runtime) and not others e.g. > when we call a JVM_ENTRY. Could you explain the rationale you have > followed here? > > We do the re-init at any possible return points to c2 code, not in any runtime c++ functions, which will reduce the re-init calls. Actually I found those entries by some hack of jvm. In the hacky code below we use gcc option -finstrument-functions to build hotspot. With this option, each C/C++ function entry/exit will call the instrument functions we defined. In instrument functions, we clobber p7 (or other reg for test) register, and in c2 function return we verify that p7 (or other reg) has been reinitialized. http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch > > Specific Comments (feature webrev): > > > globals_aarch64.hpp:102 > > Just out of interest why does UseSVE have range(0,2)? It seems you are > only testing for UseSVE > 0. Does value 2 correspond to an optional subset? > > Thanks to Pengfei's reply for this. :-) > > Specific Comments (register allocator webrev): > > > aarch64.ad:97-100 > > Why have you added a reg_def for R8 and R9 here and also to alloc_class > chunk0 at lines 544-545? They aren't used by C2 so why define them? > I think Pengfei has helped to explain that. I will either add clear comments or rename the register name as you suggested. > > assembler_aarch64.hpp:280 (also 699) > > prf sets a predicate register field. pgrf sets a governing predicate > register field. Should the name not be gprf. > Thanks to Pengfei's comment. > > chaitin.cpp:648-660 > > The comment is rather oddly formatted. > Thanks! > At line 650 you guard the assert with a test for lrg._is_vector. Is that > not always going to be guaranteed by the outer condition > lrg._is_scalable? If so then you should really assert lrg._is_vector. > > The special case code for computation of num_regs for a vector stack > slot also appears in this file with a slightly different organization in > find_first_set (line 1350) and in PhaseChaitin::Select (line 1590). > There is another similar case in RegMask::num_registers at regmask.cpp: > 98. It would be better to factor out the common code into methods of > LRG. Maybe using the following? 
> > bool LRG::is_scalable_vector() { > if (_is_scalable) { > assert(_is_vector == 1); > assert(_num_regs == == RegMask::SlotsPerVecA) > return true; > } > return false; > } > > int LRG::scalable_num_regs() { > assert(is_scalable_vector()); > if (OptoReg::is_stack(_reg)) { > return _scalable_reg_slots > } else { > return num_reg_slots; > } > } > > > chaitin.cpp:1350 > > Once again the test for lrg._is_vector should be guaranteed by the outer > test of lrg._is_scalable. Refactoring using the common methods of LRG as > above ought to help. > > chaitin.cpp:1591 > > Use common method code. > > > postaloc.cpp:308/323 > > Once again you should be able to use common method code of LRG here. > > > regmask.cpp:91 > > Once again you should be able to use common method code of LRG here. > As Joshua clarified, we are also working on predicate scalable reg, which is not in this patch. Thanks for the suggestion, I will try to refactor this a bit. > Specific Comments (c2 webrev): > > > aarch64.ad:3815 > > very nice defensive check! > > > assembler_aarch64.hpp:2469 & 2699+ > > Andrew Haley is definitely going to ask you to update function entry > (assembler_aarch64.cpp:76) to call these new instruction generation > methods and then validate the generated code using asm_check So, I guess > you might as well do that now ;-) > > Yes! :-) Will add the test code. Thanks! > zBarrierSetAssembler_aarch64.cpp:434 > > Can you explain why we need to check p7 here and not do so in other > places where we call into the JVM? I'm not saying this is wrong. I just > want to know how you decided where re-init of p7 was needed. > Actually I found this by my hack patch above while running jtreg tests. The stub slowpath here can be a c++ function. > superword.cpp:97 > > Does this mean that is someone sets the maximum vector size to a > non-power of two, such as 384, all superword operations will be > bypassed? Including those which can be done using NEON vectors? > Current SLP vectorizer only supports power-of-2 vector size. We are trying to work out a new vectorizer to support all SVE vector sizes, so we would expect a size like 384 could go to that path. I tried current patch on a 512-bit SVE hardware which does not support 384-bit: $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) openjdk version "16-internal" 2021-03-16 $ java -XX:MaxVectorSize=48 -version OpenJDK 64-Bit Server VM warning: Current system only supports max SVE vector length 32. Set MaxVectorSize to 32 (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 instead of unsupported 48: https://www.kernel.org/doc/Documentation/arm64/sve.txt) Do you think we need to exit vm instead of warning and fallbacking to 32 here? Thanks, Ningsheng From tobias.hartmann at oracle.com Mon Aug 17 06:16:57 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 17 Aug 2020 08:16:57 +0200 Subject: RFR: JDK-8251470: Add a development option equivalant to OptoNoExecute to C1 compiler In-Reply-To: References: Message-ID: Hi Ogata, thanks for the details, I've closed the bug as "Not An Issue". Best regards, Tobias On 14.08.20 23:04, Kazunori Ogata wrote: > Hi Tobias, > > Thank you for checking the webrev and pointing out InstallMethods option. > I now realize that I failed to notice this option can be turned off. I > remember I checked its default value is true, but I wasn't aware that it's > a command line option... > > Regarding the change in javaCalls.cpp, I made this change when I was > debugging my changes to support new instructions. 
I also made another > change to make my code work. I guess I should have revisit the change in > javaCalls.cpp when my code became workable. This change must not be > necessary because my version of JVM works fine by only disabling > InstallMethods. > > Anyway, I agree this RFR is unnecessary. Sorry for bothering you. > > > Regards, > Ogata > > > Tobias Hartmann wrote on 2020/08/13 16:00:20: > >> From: Tobias Hartmann >> To: Kazunori Ogata , > hotspot-compiler-dev at openjdk.java.net >> Date: 2020/08/13 16:02 >> Subject: [EXTERNAL] Re: RFR: JDK-8251470: Add a development option >> equivalant to OptoNoExecute to C1 compiler >> >> Hi Ogata, >> >> isn't that what -XX:-InstallMethods [1] is supposed to accomplish? It >> triggers a bailout right >> before Compilation::install_code, which is the same with your code. >> >> Also, why do you need the change in javaCalls.cpp? That would also > affect >> C2 compiled code. >> >> Best regards, >> Tobias >> >> [1] INVALID URI REMOVED >> > u=http-3A__hg.openjdk.java.net_jdk_jdk_file_a7c030723240_src_hotspot_share_c1_c1-5Fglobals.hpp-23l292&d=DwICaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=p- >> > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- >> isw&s=TWFHKEFoj6wwSylXbeLhsD7-tv5nCR50A6-iptKDC00&e= >> >> >> On 12.08.20 09:48, Kazunori Ogata wrote: >>> Hi, >>> >>> May I get review for JDK-8251470: Add a development option equivalant > to >>> OptoNoExecute to C1 compiler? >>> >>> This patch adds a development option to compile a method with C1 and > print >>> disassembly of the generated native code, but to skip execution of the > >>> generated code, in the same manner as OptoNoExecute option does in C2. >>> >>> Log-based debugging is useful to support a new processor. In C1, the >>> existing options BailoutAfterHIR and BailoutAfterLIR can be used if >>> printing HIR/LIR is sufficient. However, there is no way to print >>> disassembly of the generated code because these existing options quit >>> compilation before generating native code. So this issue proposes a > new >>> option for this purpose. >>> >>> >>> Bug: INVALID URI REMOVED >> > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8251470&d=DwICaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=p- >> > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- >> isw&s=JN0Zd_7HcvX3tVM-KdN-Q4hpX7Um5_muAy0Ma5sFWAI&e= >>> Webrev: INVALID URI REMOVED >> u=http-3A__cr.openjdk.java.net_-7Eogatak_8251470_webrev. >> 00_&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=p- >> > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=9MlMOi5vGmX_CxK_2Eh5nMKekBrYPPdxQkvLPDv- >> isw&s=7Z4peF7vXvN0QRxSZALXIU3C91WHZWhS5pWyvRA4XlA&e= >>> >>> >>> Regards, >>> Ogata >>> >> > > From tobias.hartmann at oracle.com Mon Aug 17 06:20:19 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 17 Aug 2020 08:20:19 +0200 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> Message-ID: <726c7edc-e0f1-4542-0136-dde41e982653@oracle.com> Hi Jason, On 14.08.20 21:26, Tatton, Jason wrote: > Unless anyone can advise on other instances which I should change, I'd advise closing the bug? Yes, please close as "Not an Issue" and link to this RFR [1]. 
Best regards, Tobias [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039509.html From christian.hagedorn at oracle.com Mon Aug 17 07:44:14 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 17 Aug 2020 09:44:14 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> Message-ID: <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> Hi Vladimir Yes, you're right, these should be changed into ASSERT and DEBUG(). I'm wondering though if these ifdefs are even required for if-blocks inside methods? Isn't, for example, this if-block: #ifndef PRODUCT if (TraceLinearScanLevel >= 2) { tty->print_cr("killing XMMs for trig"); } #endif removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be removed)? Or does it make a difference by explicitly guarding it with an ifdef? Best regards, Christian On 14.08.20 20:09, Vladimir Kozlov wrote: > One note. Most of the code is guarded by #ifndef PRODUCT. > > But the flag is available only in DEBUG build: > ? develop(intx, TraceLinearScanLevel, 0, > > Should we use #ifdef ASSERT and DEBUG() instead? > > Thanks, > Vladimir > > On 8/14/20 5:10 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following enhancement for C1: >> https://bugs.openjdk.java.net/browse/JDK-8251093 >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >> >> While I was working on JDK-8249603 [1], I added some additional >> debugging and logging code which helped to figure out what was going >> on. I think it would be useful to have this code around for the >> analysis of future C1 register allocator bugs. >> >> This RFE adds (everything non-product code): >> - find_interval(number): Can be called like that from gdb anywhere to >> find an interval with the given number. >> - Interval::print_children()/print_parent(): Useful when debugging >> with gdb to quickly show the split children and parent. >> - LinearScan::print_reg_num(number): Prints the register or stack >> location for this register number. This is useful in some places >> (logging with TraceLinearScanLevel set) where it just printed a number >> which first had to be manually looked up in other logs. >> >> I additionally did some cleanup of the touched code. >> >> We could additionally split the TraceLinearScanLevel flag into >> separate flags related to the different phases of the register >> allocation algorithm. It currently just prints too much details on the >> higher levels. You often find yourself being interested in a specific >> part of the algorithm and only want to know more details there. To >> achieve that you now you have to either handle all the noise or >> manually disable/enable other logs. We could file an RFE to clean this >> up if it's worth the effort - given that there are not many new issues >> filed for C1 register allocation today. >> >> Thank you! 
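Regarding Christian's question on whether an "if (TraceLinearScanLevel >= 2)" block still needs an explicit #ifndef PRODUCT guard: the difference is easiest to see with a toy model of a develop flag. The snippet below is a self-contained sketch of the general mechanism, not HotSpot's actual flag macros; the PRODUCT define and the flag name are used purely for illustration.

    #include <cstdio>

    // Toy model: in a "product" build the develop flag collapses to a
    // compile-time constant, otherwise it is a real mutable variable.
    #ifdef PRODUCT
    static const int TraceLinearScanLevel = 0;
    #else
    static int TraceLinearScanLevel = 0;
    #endif

    static void trace_example() {
        // With the product-build constant, this block is statically dead and
        // the compiler removes it even without an explicit #ifndef PRODUCT...
        if (TraceLinearScanLevel >= 2) {
            std::printf("killing XMMs for trig\n");
        }
    #ifndef PRODUCT
        // ...whereas an explicit guard is still required when the guarded
        // code refers to things that do not exist at all in product builds.
        std::printf("non-product-only diagnostics could go here\n");
    #endif
    }

    int main() {
        trace_example();
        return 0;
    }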
>> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >> From shade at redhat.com Mon Aug 17 08:45:30 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Aug 2020 10:45:30 +0200 Subject: JDK-8180068: Access of mark word should use oopDesc::mark_offset_in_bytes() instead of '0' for sparc & arm In-Reply-To: <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> References: <13d13861e3c14681a9180dc2271589ba@EX13D46EUB003.ant.amazon.com> <868e69bd583b4e5f9c8be4f40c934a5d@EX13D46EUB003.ant.amazon.com> Message-ID: Hi again, On 8/14/20 9:26 PM, Tatton, Jason wrote: > Thanks for having a look into this. I was mistaken in what these calls were doing, thank you for > explaining this. I'm not able to find any other potential instances where > 'oopDesc::mark_offset_in_bytes()' should be used. The bug is a few years old, so perhaps the > codebase has naturally evolved in the intervening time to resolve this? I think so. sparc parts are gone. I eyeballed arm parts for Address(...) usages, and there seem to be none that require changing 0 to oopDesc::mark_offset_in_bytes(). > Unless anyone can advise on other instances which I should change, I'd advise closing the bug? Yes, I think closing with "Not an Issue" would be in order. -- Thanks, -Aleksey From rwestrel at redhat.com Mon Aug 17 08:49:28 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 17 Aug 2020 10:49:28 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <875zbjw9m9.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> Message-ID: <87h7t13bdz.fsf@redhat.com> John, Tobias, > The last patch is flawed: predicates in the inner loop use the jvm state > from the predicates of the initial loop, that is the state before the > loop. If deoptimization happens for an inner loop predicate on an > iteration of the outer loop that's not the first one then execution > resumes as if the initial loop was never executed when it's already part > way through. > > To fix this, I changed the code so one iteration of the loop is peeled > when the loop is transformed to a long counted loop. State for > predicates is obtained from the safepoint at the end of the peeled > iteration of the loop. Does the fixed patch look ok to you? Roland. > http://cr.openjdk.java.net/~roland/8223051/webrev.03/ > > diff from previous patch: > http://cr.openjdk.java.net/~roland/8223051/webrev.02-03/ From adinn at redhat.com Mon Aug 17 08:52:56 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 17 Aug 2020 09:52:56 +0100 Subject: [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: Message-ID: On 13/08/2020 12:04, Dmitry Chuyko wrote: > Please review a faster version of Math.signum() for AArch64. > > Two new intrinsics (double and float) are introduced in general code, > with appropriate new nodes. New JTreg test is added to cover the > intrinsic case (enabled only for aarch64). 
> > AArch64 implementation uses FACGT (compare abslute fp values) and BSL > (fp bit selection) to avoid branches and moves to non-fp registers and > back. > > Performance results show ~30% better time in the benchmark with a black > hole [1] on Cortex. E.g. on random numbers 4.8 ns/op --> 3.5 ns/op, > overhead is 2.9 ns/op. > > rfe: https://bugs.openjdk.java.net/browse/JDK-8251525 > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.00/ > testing: jck, jtreg including new dedicated test The arrays float_cases and double_cases in that dedicated test (TestSignumIntrinsic) include some rather randomly picked float literals with either exponent or a large exponent. They do not include a denormal float literal (excluding the obvious corner cases Float/Double.MIN_VALUE). At least one sample value from the denormal range ought to be included even though (indeed, precisely because) it ought to be of no consequence for the algorithm being used. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From adinn at redhat.com Mon Aug 17 09:16:47 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 17 Aug 2020 10:16:47 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> Message-ID: <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Hi Pengfei, On 17/08/2020 07:00, Ningsheng Jian wrote: > Thanks a lot for the review! Sorry for the late reply, as I was on > vacation last week. And thanks to Pengfei and Joshua for helping > clarifying some details in the patch. Yes, they did a very good job of answering most of the pending questions. >> I also eyeballed /some/ of the generated code to check that it looked >> ok. I'd really like to be able to do that systematically for a >> comprehensive test suite that exercised every rule but I only had the >> machine for a few days. This really ought to be done as a follow-up to >> ensure that all the rules are working as expected. > > Yes, we would expect Pengfei's OptoAssembly check patch can get merged > in future. I'm fine with that as a follow-up patch if you raise a JIRA for it. >> I am not clear why you are choosing to re-init ptrue after certain JVM >> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >> when we call a JVM_ENTRY. Could you explain the rationale you have >> followed here? > > We do the re-init at any possible return points to c2 code, not in any > runtime c++ functions, which will reduce the re-init calls. > > Actually I found those entries by some hack of jvm. In the hacky code > below we use gcc option -finstrument-functions to build hotspot. With > this option, each C/C++ function entry/exit will call the instrument > functions we defined. In instrument functions, we clobber p7 (or other > reg for test) register, and in c2 function return we verify that p7 (or > other reg) has been reinitialized. > > http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch Nice work. It's very good to have that documented. 
I'm willing to accept i) that this has found all current cases and ii) that the verify will catch any cases that might get introduced by future changes (e.g. the callout introduced by ZGC that you mention below). As the above mot say there is a slim chance this might have missed some cases but I think it is pretty unlikely. >> Specific Comments (register allocator webrev): >> >> >> aarch64.ad:97-100 >> >> Why have you added a reg_def for R8 and R9 here and also to alloc_class >> chunk0 at lines 544-545? They aren't used by C2 so why define them? >> > > I think Pengfei has helped to explain that. I will either add clear > comments or rename the register name as you suggested. Ok, good. > As Joshua clarified, we are also working on predicate scalable reg, > which is not in this patch. Thanks for the suggestion, I will try to > refactor this a bit. Ok, I'll wait for an updated patch. Are you planning to include the scalable predicate reg code as part of this patch? I think that would be better as it would help to clarify the need to distinguish vector regs as a subset of scalable regs. >> zBarrierSetAssembler_aarch64.cpp:434 >> >> Can you explain why we need to check p7 here and not do so in other >> places where we call into the JVM? I'm not saying this is wrong. I just >> want to know how you decided where re-init of p7 was needed. >> > > Actually I found this by my hack patch above while running jtreg tests. > The stub slowpath here can be a c++ function. Yes, good catch. >> superword.cpp:97 >> >> Does this mean that is someone sets the maximum vector size to a >> non-power of two, such as 384, all superword operations will be >> bypassed? Including those which can be done using NEON vectors? >> > > Current SLP vectorizer only supports power-of-2 vector size. We are > trying to work out a new vectorizer to support all SVE vector sizes, so > we would expect a size like 384 could go to that path. I tried current > patch on a 512-bit SVE hardware which does not support 384-bit: > > $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) > openjdk version "16-internal" 2021-03-16 > > $ java -XX:MaxVectorSize=48 -version > OpenJDK 64-Bit Server VM warning: Current system only supports max SVE > vector length 32. Set MaxVectorSize to 32 > > (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 > instead of unsupported 48: > https://www.kernel.org/doc/Documentation/arm64/sve.txt) > > Do you think we need to exit vm instead of warning and fallbacking to 32 > here? Yes, I think a vm exit would probably be a better choice. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From ningsheng.jian at arm.com Mon Aug 17 10:19:04 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 17 Aug 2020 18:19:04 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <294d013f-e6d4-4eba-4455-78bd4cfd1148@arm.com> Hi Andrew, > >> As Joshua clarified, we are also working on predicate scalable reg, >> which is not in this patch. 
Thanks for the suggestion, I will try to >> refactor this a bit. > > Ok, I'll wait for an updated patch. Are you planning to include the > scalable predicate reg code as part of this patch? I think that would be > better as it would help to clarify the need to distinguish vector regs > as a subset of scalable regs. > My original plan was not to include scalable predicate reg related code, as they are not used and tested without proper mid-end/back-end code. Do you think just adding some comments is OK for now, e.g. saying that a scalable reg could also be a predicate reg in future? >> $ java -XX:MaxVectorSize=48 -version >> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >> vector length 32. Set MaxVectorSize to 32 >> >> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >> instead of unsupported 48: >> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >> >> Do you think we need to exit vm instead of warning and fallbacking to 32 >> here? > > Yes, I think a vm exit would probably be a better choice. > OK, will do that. Thanks! Regards, Ningsheng From adinn at redhat.com Mon Aug 17 10:29:11 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 17 Aug 2020 11:29:11 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <294d013f-e6d4-4eba-4455-78bd4cfd1148@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <294d013f-e6d4-4eba-4455-78bd4cfd1148@arm.com> Message-ID: <9e045763-4a5d-57b5-18c9-63f8a6072c2e@redhat.com> On 17/08/2020 11:19, Ningsheng Jian wrote: >>> As Joshua clarified, we are also working on predicate scalable reg, >>> which is not in this patch. Thanks for the suggestion, I will try to >>> refactor this a bit. >> >> Ok, I'll wait for an updated patch. Are you planning to include the >> scalable predicate reg code as part of this patch? I think that would be >> better as it would help to clarify the need to distinguish vector regs >> as a subset of scalable regs. >> > > My original plan was not to include scalable predicate reg related code, > as they are not used and tested without proper mid-end/back-end code. Do > you think just adding some comments is OK for now, e.g. saying that a > scalable reg could also be a predicate reg in future? Sure. A comment describing the meaning of the scalable and vector properties and their independence from each other will be good enough for now and it will still be helpful once the extra code is added. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From fairoz.matte at oracle.com Mon Aug 17 12:46:37 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 17 Aug 2020 05:46:37 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal Message-ID: Hi, Please review this small test change to work with Graal. Background: Graal require more code cache compared to c1/c2. but the test case always set it to 20MB. This may not be sufficient when running graal. 
Default configuration for ReservedCodeCacheSize = 250MB With graal enabled, ReservedCodeCacheSize = 350MB Either we can modify the framework to honor ReservedCodeCacheSize for graal or just update the testcase. There are not many test cases they rely on ReservedCodeCacheSize or InitialCodeCacheSize. So the fix prefer the later one. JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ Thanks, Fairoz From vladimir.kozlov at oracle.com Mon Aug 17 17:36:27 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 17 Aug 2020 10:36:27 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> Message-ID: On 8/17/20 12:44 AM, Christian Hagedorn wrote: > Hi Vladimir > > Yes, you're right, these should be changed into ASSERT and DEBUG(). > > I'm wondering though if these ifdefs are even required for if-blocks inside methods? > > Isn't, for example, this if-block: > > #ifndef PRODUCT > ??????? if (TraceLinearScanLevel >= 2) { > ????????? tty->print_cr("killing XMMs for trig"); > ??????? } > #endif > > removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be > removed)? Or does it make a difference by explicitly guarding it with an ifdef? You are right. It could be statically removed. But we keep #ifdef sometimes to indicate that code is executed only in debug build because we don't always remember type of a flag. Thanks, Vladimir K > > Best regards, > Christian > > On 14.08.20 20:09, Vladimir Kozlov wrote: >> One note. Most of the code is guarded by #ifndef PRODUCT. >> >> But the flag is available only in DEBUG build: >> ?? develop(intx, TraceLinearScanLevel, 0, >> >> Should we use #ifdef ASSERT and DEBUG() instead? >> >> Thanks, >> Vladimir >> >> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>> Hi >>> >>> Please review the following enhancement for C1: >>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>> >>> While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure out >>> what was going on. I think it would be useful to have this code around for the analysis of future C1 register >>> allocator bugs. >>> >>> This RFE adds (everything non-product code): >>> - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. >>> - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and >>> parent. >>> - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful >>> in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be manually >>> looked up in other logs. >>> >>> I additionally did some cleanup of the touched code. >>> >>> We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of the >>> register allocation algorithm. It currently just prints too much details on the higher levels. You often find >>> yourself being interested in a specific part of the algorithm and only want to know more details there. 
To achieve >>> that you now you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to >>> clean this up if it's worth the effort - given that there are not many new issues filed for C1 register allocation >>> today. >>> >>> Thank you! >>> >>> Best regards, >>> Christian >>> >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>> From vladimir.kozlov at oracle.com Mon Aug 17 17:52:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 17 Aug 2020 10:52:22 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: References: Message-ID: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> Hi Fairoz, How you determine that +10Mb is enough with Graal? Thanks, Vladimir On 8/17/20 5:46 AM, Fairoz Matte wrote: > Hi, > > > > Please review this small test change to work with Graal. > > > > Background: > > Graal require more code cache compared to c1/c2. but the test case always set it to 20MB. This may not be sufficient when running graal. > > Default configuration for ReservedCodeCacheSize = 250MB > > With graal enabled, ReservedCodeCacheSize = 350MB > > > > Either we can modify the framework to honor ReservedCodeCacheSize for graal or just update the testcase. > > There are not many test cases they rely on ReservedCodeCacheSize or InitialCodeCacheSize. So the fix prefer the later one. > > > > JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > > Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > > > > Thanks, > > Fairoz > > > From jingxinc at amazon.com Mon Aug 17 18:52:00 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Mon, 17 Aug 2020 18:52:00 +0000 Subject: RFR 8213777: purge outdated fp code in x86_32.ad Message-ID: <05D26CF9-9C02-4803-9FEF-1B8EB45A3BEA@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~phh/8213777/webrev.00/ JBS : https://bugs.openjdk.java.net/browse/JDK-8213777 I delete some outdate code in jdk-11. Since UseSSE is always larger than or equal to 2, some scenario when UseSSE less than two is outdated. I have tested this builds successfully . Ensured that there are no regressions in hotspot : tier1 tests. Regards, Eric Chen From vladimir.x.ivanov at oracle.com Mon Aug 17 19:35:21 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 17 Aug 2020 22:35:21 +0300 Subject: RFR 8213777: purge outdated fp code in x86_32.ad In-Reply-To: <05D26CF9-9C02-4803-9FEF-1B8EB45A3BEA@amazon.com> References: <05D26CF9-9C02-4803-9FEF-1B8EB45A3BEA@amazon.com> Message-ID: Hi Eric, UseSSE >= 2 invariant is valid only on x86-64 since it is guaranteed by system ABI. It is not applicable to x86-32. Best regards, Vladimir Ivanov On 17.08.2020 21:52, Eric, Chan wrote: > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~phh/8213777/webrev.00/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8213777 > > I delete some outdate code in jdk-11. Since UseSSE is always larger than or equal to 2, some scenario when UseSSE less than two is outdated. > > I have tested this builds successfully . > > Ensured that there are no regressions in hotspot : tier1 tests. > > Regards, > Eric Chen > From martin.doerr at sap.com Mon Aug 17 21:54:32 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 17 Aug 2020 21:54:32 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. Message-ID: Hi, I'd like to backport https://bugs.openjdk.java.net/browse/JDK-8241234 to JDK11u. 
Original JDK15 patch (https://hg.openjdk.java.net/jdk/jdk/rev/87c506c8be63) doesn't fit to JDK11u because the locking code has been reworked by https://bugs.openjdk.java.net/browse/JDK-8229844 As mentioned by Vladimir, there's already a GraalVM version available which consists of 2 patches (original + addon) and which can be applied: https://github.com/graalvm/labs-openjdk-11/commit/6c162cb15262e6aa77e36eb3a268320ef0a206a4 https://github.com/graalvm/labs-openjdk-11/commit/6a28a618cdbe595f9a3993e0eb63c01ccae1a528 Only JVMCI part from GraalVM doesn't apply automatically. The version of this file from JDK15 is very simple and fits perfectly. Please review the JDK11u backport webrev: http://cr.openjdk.java.net/~mdoerr/8241234_monitorenterexit_11u/webrev.00/ Thanks and best regards, Martin From tobias.hartmann at oracle.com Tue Aug 18 06:54:09 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 18 Aug 2020 08:54:09 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87h7t13bdz.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: Hi Roland, On 17.08.20 10:49, Roland Westrelin wrote: > Does the fixed patch look ok to you? Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. Best regards, Tobias From HORIE at jp.ibm.com Tue Aug 18 07:28:12 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 16:28:12 +0900 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com>, <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>, <20200630001528.GA26652@pacoca> Message-ID: Jose, Latest change looks good also to me. Marin, Do you think if I can push the change? Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" To: "joserz at linux.ibm.com" Cc: hotspot compiler , "horie at jp.ibm.com" Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Wed, Jul 1, 2020 4:01 AM Thanks for the much better flag description. Looks good. Best regards, Martin > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" : > > ?Hello team, > > Here's the 2nd version, implementing the suggestions asked by Martin. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you!! > > Jose > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: >> Hi Jose, >> >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? >> >> Please update the Copyright year in vm_version_poc.hpp. >> >> I can?t test the change, but it looks good to me. >> >> Best regards, >> Martin >> >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" : >>> >>> ?Hello team! 
>>> >>> This patch introduces Power10 to OpenJDK and implements three new instructions: >>> - brh - byte-reverse halfword >>> - brw - byte-reverse word >>> - brd - byte-reverse doubleword >>> >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 >>> >>> Thanks for your review! >>> >>> Jose R. Ziviani From richard.reingruber at sap.com Tue Aug 18 07:43:51 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 18 Aug 2020 07:43:51 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Goetz, I have collected the changes based on your feedback in a new webrev: Webrev.7: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7/ Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7.inc/ Most of the changes are renamings, commenting, and reformatting. Besides that ... - I converted the native agent of the test IterateHeapWithEscapeAnalysisEnabled from C to C++, because this seems to be preferred by serviceability developers. I also re-indented the file, but excluded this from the delta webrev. - I had to adapt test/jdk/com/sun/jdi/EATests.java to the fact that background compilation (-Xbatch) cannot be reliably disabled for JVMCI compilers. E.g. the compile broker will compile in the background if JVMCI is not yet fully initialized. Therefore it is possible that test cases are executed before the main test method is compiled on the highest level and then the test case fails. The higher the system load the higher the probability for this to happen. In webrev.7 I skip the compilation level check if the vm is configured to use the JVMCI compiler. I also answered you inline below. Thanks, Richard. -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 23. Juli 2020 16:20 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Thanks for your two further explanations in the other thread. That made the points clear to me. > > I was not that happy with the names saying not_global_escape > > and similar. I now agreed you have to use the terms of the escape > > analysis (NoEscape ArgEscape= throughout the runtime code. I'm still not happy with > > the 'not' in the term, I always try to expand the name to some > > sentence with a negated verb, but it makes no sense. > > For example, "has_not_global_escape_in_scope" expands to > > "Hasn't a global escape in its scope." in my thinking, which makes > > no sense. You probably mean > > "Has not-global escape in its scope." or "Has {ArgEscape|NoEscape} > > in its scope." > > > C2 is using the word "non" in this context, e.g., here > > alloc->is_non_escaping. > > There is also ConnectionGraph::not_global_escape() That talks about a single node that represents a single Object. An object has a single state wrt. ea. You use the term for safepoint which tracks a set of objects. Here, has_not_global_excape can mean 1. None of the several objects does escape globaly. 2. There is at least one object that escapes globaly. 
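To make the ambiguity concrete, the two properties one might read into such a name can be written out as follows (the enum is a simplified stand-in for C2's escape states, not code from the webrev):

  #include <vector>

  enum EscapeState { NoEscape, ArgEscape, GlobalEscape };

  // "No object in scope escapes globally."
  bool none_escapes_globally(const std::vector<EscapeState>& objs) {
    for (EscapeState e : objs) { if (e == GlobalEscape) return false; }
    return true;
  }

  // "At least one object in scope is NoEscape or ArgEscape."
  bool has_ea_local_object(const std::vector<EscapeState>& objs) {
    for (EscapeState e : objs) { if (e == NoEscape || e == ArgEscape) return true; }
    return false;
  }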
> > non obviously negates the adjective 'global', > > non-global or nonglobal even is a English term I find in the > > net. > > So what about "has_non_global_escape_in_scope?" > > And what about has_ea_local_in_scope? That's good. Please document somewhere that Ea_local == ArgEscape | NoEscape. That's what it is, right? > > Does jvmti specify that the same limits are used ...? > > ok on your side. > > I don't know and didn't find anything in a quick search. Ok, not your business. > > > jvmtiEnvBase.cpp ok > > jvmtiImpl.h|cpp ok > > jvmtiTagMap.cpp ok > > whitebox.cpp ok > > > deoptimization.cpp > > > line 177: Please break line > > line 246, 281: Please break line > > 1578, 1583, 1589, 1632, 1649, 1651 Break line > > > 1651: You use 'non'-terms, too: non-escaping :) > > I know :) At least here it is wrong I'd say. "...has to be a not escaping obj..." > sounds better > (hopefully not only to my german ears). I thought the term non-escpaing makes it quite clear. I just wanted to point out that using non above would be similar to the wording here. > > IterateHeapWithEscapeAnalysisEnabled.java > > > line 415: > > msg("wait until target thread has set testMethod_result"); > > while (testMethod_result == 0) { > > Thread.sleep(50); > > } > > Might the test run into timeouts at this place? > > The field is volatile, i.e. it will be reloaded > > in each iteration. But will dontinline_testMethod > > write it back to main memory in time? > > You mean, the test could hang in that loop for a couple of minutes? I don't > think so. There are cache coherence protocols in place which will invalidate > stale data very timely. Ok, anyways, it would only be a hanging test. > > Ok. I've removed quite a lot of the occurrances. > > > Also, I like full sentences in comments. > > Especially for me as foreign speaker, this makes > > things much more clear. I.e., I try to make it > > a real sentence with articles, capitalized and a > > dot at the end if there is a subject and a verb > > in first place. > > E.g., jvmtiEnvBase.cpp:1327 > > Are you referring to the following? > (from > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/src/hots > pot/share/prims/jvmtiEnvBase.cpp.frames.html) > > 1326 > 1327 // If the frame is a compiled one, need to deoptimize it. > 1328 if (vf->is_compiled_frame()) { > > This line 1327 is preexisting. Sorry, wrong line number again. I think I meant 1333 // eagerly reallocate scalar replaced objects. But I must admit, the subject is missing. It's one of these imperative sentences where the subject is left out, which are used throughout documentation. Bad example, but still a correct sentence, so qualifies for punctuation? Best regards, Goetz. From fairoz.matte at oracle.com Tue Aug 18 08:10:26 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Tue, 18 Aug 2020 01:10:26 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> Message-ID: Hi Vladimir, Thanks for looking into. This is intermittent crash, and is reproducible in windows debug build environment. Below is the testing performed. 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" 2. 
Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" Thanks, Fairoz > -----Original Message----- > From: Vladimir Kozlov > Sent: Monday, August 17, 2020 11:22 PM > To: Fairoz Matte ; hotspot-compiler- > dev at openjdk.java.net; serviceability-dev at openjdk.java.net > Cc: Coleen Phillimore ; Dean Long > > Subject: Re: RFR(s): 8248295: > serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal > > Hi Fairoz, > > How you determine that +10Mb is enough with Graal? > > Thanks, > Vladimir > > On 8/17/20 5:46 AM, Fairoz Matte wrote: > > Hi, > > > > > > > > Please review this small test change to work with Graal. > > > > > > > > Background: > > > > Graal require more code cache compared to c1/c2. but the test case always > set it to 20MB. This may not be sufficient when running graal. > > > > Default configuration for ReservedCodeCacheSize = 250MB > > > > With graal enabled, ReservedCodeCacheSize = 350MB > > > > > > > > Either we can modify the framework to honor ReservedCodeCacheSize for > graal or just update the testcase. > > > > There are not many test cases they rely on ReservedCodeCacheSize or > InitialCodeCacheSize. So the fix prefer the later one. > > > > > > > > JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > > > > Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > > > > > > > > Thanks, > > > > Fairoz > > > > > > From HORIE at jp.ibm.com Tue Aug 18 08:38:42 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 17:38:42 +0900 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp Message-ID: Dear all, Would you please review a small change? Bug: https://bugs.openjdk.java.net/browse/JDK-8251926 Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/ The load_const_optimized function in assembler_ppc.cpp has an unused variable named return_xd. It looks unnecessary in the current code. Best regards, Michihiro From martin.doerr at sap.com Tue Aug 18 09:13:39 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 18 Aug 2020 09:13:39 +0000 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com>, <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>, <20200630001528.GA26652@pacoca> Message-ID: Hi Michihiro and Jose, I had only done a quick review during my vacation. Thanks for updating the description of PowerArchitecturePPC64. After taking a second look, I have a few minor requests. Sorry for that. * ?UseByteReverseInstructions? (plural) would be more consistent with other names. * Please add ?size? specifications to the ppc.ad file. Otherwise, the compiler has to determine sizes dynamically every time. * bytes_reverse_short: ?format? specification misses ?extsh?. Unfortunately, I couldn?t find a Power10 machine in my garage ?? So we rely on your testing. Thanks and best regards, Martin From: Michihiro Horie Sent: Dienstag, 18. August 2020 09:28 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Jose, Latest change looks good also to me. Marin, Do you think if I can push the change? 
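For reference, the byte reversal each of the three new instructions performs is shown below per 16-, 32- and 64-bit value (an illustration in plain C++, not the webrev, which wires the instructions up to match rules such as bytes_reverse_short in ppc.ad; as the last point above implies, the short variant additionally sign-extends its result via extsh):

  #include <cstdint>

  uint16_t brh_equiv(uint16_t x) { return __builtin_bswap16(x); }  // halfword
  uint32_t brw_equiv(uint32_t x) { return __builtin_bswap32(x); }  // word
  uint64_t brd_equiv(uint64_t x) { return __builtin_bswap64(x); }  // doubleword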
Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" > To: "joserz at linux.ibm.com" > Cc: hotspot compiler >, "horie at jp.ibm.com" > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Wed, Jul 1, 2020 4:01 AM Thanks for the much better flag description. Looks good. Best regards, Martin > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" >: > > ?Hello team, > > Here's the 2nd version, implementing the suggestions asked by Martin. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you!! > > Jose > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: >> Hi Jose, >> >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? >> >> Please update the Copyright year in vm_version_poc.hpp. >> >> I can?t test the change, but it looks good to me. >> >> Best regards, >> Martin >> >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" >: >>> >>> ?Hello team! >>> >>> This patch introduces Power10 to OpenJDK and implements three new instructions: >>> - brh - byte-reverse halfword >>> - brw - byte-reverse word >>> - brd - byte-reverse doubleword >>> >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 >>> >>> Thanks for your review! >>> >>> Jose R. Ziviani From thomas.stuefe at gmail.com Tue Aug 18 09:23:04 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 18 Aug 2020 11:23:04 +0200 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: Hi Michihiro, seems fine and trivial. Thanks, Thomas On Tue, Aug 18, 2020 at 10:40 AM Michihiro Horie wrote: > Dear all, > > Would you please review a small change? > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251926 > Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/ > > The load_const_optimized function in assembler_ppc.cpp has an unused > variable named return_xd. It looks unnecessary in the current code. > > Best regards, > Michihiro > From HORIE at jp.ibm.com Tue Aug 18 09:43:34 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 18:43:34 +0900 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: , <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com>, <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com>, <20200630001528.GA26652@pacoca> Message-ID: Hi Martin, Thank you so much for your in-depth review. I agree all of the three items should be updated. Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" To: Michihiro Horie , "joserz at linux.ibm.com" Cc: "hotspot-compiler-dev at openjdk.java.net" Subject: [EXTERNAL] RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Tue, Aug 18, 2020 6:13 PM Hi Michihiro and Jose, I had only done a quick review during my vacation. Thanks for updating the description of PowerArchitecturePPC64. After taking a second look, I have a few minor requests. Sorry for that. ?UseByteReverseInstructions? (plural) would be more consistent with other names. Please add ?size? specifications to the ppc.ad file. Otherwise, the compiler has to determine sizes dynamically every time. bytes_reverse_short: ?format? specification misses ?extsh?. 
Unfortunately, I couldn?t find a Power10 machine in my garage ?? So we rely on your testing. Thanks and best regards, Martin From: Michihiro Horie Sent: Dienstag, 18. August 2020 09:28 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Jose, Latest change looks good also to me. Marin, Do you think if I can push the change? Best regards, Michihiro ----- Original message ----- From: "Doerr, Martin" To: "joserz at linux.ibm.com" Cc: hotspot compiler , " horie at jp.ibm.com" Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Wed, Jul 1, 2020 4:01 AM Thanks for the much better flag description. Looks good. Best regards, Martin > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" < joserz at linux.ibm.com>: > > ?Hello team, > > Here's the 2nd version, implementing the suggestions asked by Martin. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > Thank you!! > > Jose > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: >> Hi Jose, >> >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? >> >> Please update the Copyright year in vm_version_poc.hpp. >> >> I can?t test the change, but it looks good to me. >> >> Best regards, >> Martin >> >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" < joserz at linux.ibm.com>: >>> >>> ?Hello team! >>> >>> This patch introduces Power10 to OpenJDK and implements three new instructions: >>> - brh - byte-reverse halfword >>> - brw - byte-reverse word >>> - brd - byte-reverse doubleword >>> >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 >>> >>> Thanks for your review! >>> >>> Jose R. Ziviani From HORIE at jp.ibm.com Tue Aug 18 09:49:19 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 18 Aug 2020 18:49:19 +0900 Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: , Message-ID: Hi Thomas, Thanks a lot! Best regards, Michihiro ----- Original message ----- From: "Thomas St?fe" To: Michihiro Horie Cc: ppc-aix-port-dev , hotspot compiler Subject: [EXTERNAL] Re: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp Date: Tue, Aug 18, 2020 6:23 PM Hi Michihiro, seems fine and trivial. Thanks, Thomas On Tue, Aug 18, 2020 at 10:40 AM Michihiro Horie wrote: Dear all, Would you please review a small change? Bug: https://bugs.openjdk.java.net/browse/JDK-8251926 Webrev: http://cr.openjdk.java.net/~mhorie/8251926/webrev.00/ The load_const_optimized function in assembler_ppc.cpp has an unused variable named return_xd. It looks unnecessary in the current code. 
Best regards, Michihiro From christian.hagedorn at oracle.com Tue Aug 18 13:16:12 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 18 Aug 2020 15:16:12 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> Message-ID: <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> Hi Vladimir On 17.08.20 19:36, Vladimir Kozlov wrote: > On 8/17/20 12:44 AM, Christian Hagedorn wrote: >> Hi Vladimir >> >> Yes, you're right, these should be changed into ASSERT and DEBUG(). >> >> I'm wondering though if these ifdefs are even required for if-blocks >> inside methods? >> >> Isn't, for example, this if-block: >> >> #ifndef PRODUCT >> ???????? if (TraceLinearScanLevel >= 2) { >> ?????????? tty->print_cr("killing XMMs for trig"); >> ???????? } >> #endif >> >> removed anyways when the flag is set to < 2 (which is statically known >> and thus would allow this entire block to be removed)? Or does it make >> a difference by explicitly guarding it with an ifdef? > > You are right. It could be statically removed. But we keep #ifdef > sometimes to indicate that code is executed only in debug build because > we don't always remember type of a flag. I see, that makes sense. I updated my patch and left the ifdefs there but changed them to ASSERT. I also updated other ifdefs belonging to TraceLinearScanLevel appropriately. http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ Best regards, Christian > > Thanks, > Vladimir K > >> >> Best regards, >> Christian >> >> On 14.08.20 20:09, Vladimir Kozlov wrote: >>> One note. Most of the code is guarded by #ifndef PRODUCT. >>> >>> But the flag is available only in DEBUG build: >>> ?? develop(intx, TraceLinearScanLevel, 0, >>> >>> Should we use #ifdef ASSERT and DEBUG() instead? >>> >>> Thanks, >>> Vladimir >>> >>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>> Hi >>>> >>>> Please review the following enhancement for C1: >>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>> >>>> While I was working on JDK-8249603 [1], I added some additional >>>> debugging and logging code which helped to figure out what was going >>>> on. I think it would be useful to have this code around for the >>>> analysis of future C1 register allocator bugs. >>>> >>>> This RFE adds (everything non-product code): >>>> - find_interval(number): Can be called like that from gdb anywhere >>>> to find an interval with the given number. >>>> - Interval::print_children()/print_parent(): Useful when debugging >>>> with gdb to quickly show the split children and parent. >>>> - LinearScan::print_reg_num(number): Prints the register or stack >>>> location for this register number. This is useful in some places >>>> (logging with TraceLinearScanLevel set) where it just printed a >>>> number which first had to be manually looked up in other logs. >>>> >>>> I additionally did some cleanup of the touched code. >>>> >>>> We could additionally split the TraceLinearScanLevel flag into >>>> separate flags related to the different phases of the register >>>> allocation algorithm. It currently just prints too much details on >>>> the higher levels. You often find yourself being interested in a >>>> specific part of the algorithm and only want to know more details >>>> there. 
To achieve that you now you have to either handle all the >>>> noise or manually disable/enable other logs. We could file an RFE to >>>> clean this up if it's worth the effort - given that there are not >>>> many new issues filed for C1 register allocation today. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> Christian >>>> >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>> From dmitry.chuyko at bell-sw.com Tue Aug 18 15:05:01 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 18 Aug 2020 18:05:01 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> Message-ID: <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Hi Andrew, Thanks for taking a look. This work has started as a try to improve common code, see JDK-8249198 [1] and short related discussion [2]. And the original benchmark [3] is quite similar to the one that you used. As you kindly tried the patch on a hardware where it shows degradation (baseline is quite slow btw), I think it makes sense to limit it to Cortex/Neoverse. So I restored UseSignumInrinsic flag which is enabled only for CPU_ARM. Disabling InlineMathNatives also disables it. webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.02/ As suggested by Anrew Dinn, there are few more test cases in the test: +-MIN_NORMAL and some denormal numbers. Some more results for a benchmark with reduce(): -XX:-UseSignumIntrinsic DoubleOrigSignum.ofMostlyNaN 0.914 ? 0.001 ns/op DoubleOrigSignum.ofMostlyNeg 1.178 ? 0.001 ns/op DoubleOrigSignum.ofMostlyPos 1.176 ? 0.017 ns/op DoubleOrigSignum.ofMostlyZero 0.803 ? 0.001 ns/op DoubleOrigSignum.ofRandom 1.175 ? 0.012 ns/op -XX:+UseSignumIntrinsic DoubleOrigSignum.ofMostlyNaN 1.040 ? 0.007 ns/op DoubleOrigSignum.ofMostlyNeg 1.040 ? 0.004 ns/op DoubleOrigSignum.ofMostlyPos 1.039 ? 0.003 ns/op DoubleOrigSignum.ofMostlyZero 1.040 ? 0.001 ns/op DoubleOrigSignum.ofRandom 1.040 ? 0.003 ns/op If we only intrinsify copySign() we lose free mask that we get from facgt. In such case improvement (for signum) decreases like from ~30% to ~15%, and it also greatly depends on the particular HW. We can additionally introduce an intrinsic for Math.copySign(), especially it makes sense for float where it can be just 2 fp instructions: movi+bsl (fmovd+fnegd+bsl for double). -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8249198 [2] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/067666.html [3] http://cr.openjdk.java.net/~dchuyko/8249198/webrev.00/raw_files/new/test/micro/org/openjdk/bench/java/lang/DoubleSignum.java On 8/15/20 4:50 PM, Andrew Haley wrote: > I've been looking at the way Math.signum() is used, mostly by > searching the GitHub code database. I've changed the JMH test to be > IMO more realistic: it's at > http://cr.openjdk.java.net/~aph/DoubleSignum.java. I think it's more > realitic because signum() results usually aren't stored but are used > to feed other arithmetic ops, usually + or *. > > Baseline: > > Benchmark Mode Cnt Score Error Units > DoubleSignum.ofMostlyNaN avgt 3 2.409 ? 0.051 ns/op > DoubleSignum.ofMostlyNeg avgt 3 2.475 ? 0.211 ns/op > DoubleSignum.ofMostlyPos avgt 3 2.494 ? 0.015 ns/op > DoubleSignum.ofMostlyZero avgt 3 2.501 ? 0.008 ns/op > DoubleSignum.ofRandom avgt 3 2.458 ? 0.373 ns/op > DoubleSignum.overhead avgt 3 2.373 ? 
0.029 ns/op > > -XX:+UseSignumIntrinsic: > > Benchmark Mode Cnt Score Error Units > DoubleSignum.ofMostlyNaN avgt 3 2.776 ? 0.006 ns/op > DoubleSignum.ofMostlyNeg avgt 3 2.773 ? 0.066 ns/op > DoubleSignum.ofMostlyPos avgt 3 2.772 ? 0.084 ns/op > DoubleSignum.ofMostlyZero avgt 3 2.770 ? 0.045 ns/op > DoubleSignum.ofRandom avgt 3 2.769 ? 0.005 ns/op > DoubleSignum.overhead avgt 3 2.376 ? 0.013 ns/op > > > I think it might be more useful for you to work on optimizing > Math.copysign(). > From vladimir.x.ivanov at oracle.com Tue Aug 18 15:08:30 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 18 Aug 2020 18:08:30 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87h7t13bdz.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: >> http://cr.openjdk.java.net/~roland/8223051/webrev.03/ Looks good! Thanks a lot for taking care of it! Some minor comments: =============== src/hotspot/share/opto/callnode.cpp: + Node* new_in = old_sosn->clone(sosn_map); + if (old_unique != C->unique()) { // New node? It's not a correctness issue, but strictly speaking it checks whether new nodes were allocated or not. It would be clearer to add a flag to SafePointScalarObjectNode::clone(Dict*) which signals that the returned node comes from the cache. Or just check that "new_in->_idx >= C->unique()". I see that the code comes from macro.cpp, but IMO it's a good opportunity to clean it up a bit. =============== src/hotspot/share/opto/callnode.cpp: // If you have back to back safepoints, remove one if( in(TypeFunc::Control)->is_SafePoint() ) return in(TypeFunc::Control); - if( in(0)->is_Proj() ) { + // Transforming long counted loops requires a safepoint node. Do not + // eliminate a safepoint until loop opts are over. + if (in(0)->is_Proj() && !phase->C->major_progress()) { Can you elaborate on this a bit? Why elimination of back-to-back safepoints cause problems during new transformation? Is it because you need specifically a SafePoint because CallNode doesn't fit? =============== src/hotspot/share/opto/loopnode.cpp: +void PhaseIdealLoop::add_empty_predicate(Deoptimization::DeoptReason reason, Node* inner_head, IdealLoopTree* loop, SafePointNode* sfpt) { Nothing actionable at the moment, but it's unfortunate to see more and more code being duplicated from GraphKit. I wish there were a way to share implementation between GraphKit, PhaseIdealLoop, and PhaseMacroExpand. Best regards, Vladimir Ivanov >> >> diff from previous patch: >> http://cr.openjdk.java.net/~roland/8223051/webrev.02-03/ > From vladimir.kozlov at oracle.com Tue Aug 18 15:12:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 08:12:31 -0700 (PDT) Subject: [16] RFR(M) 8251459: Compute caller save exclusion RegMasks once Message-ID: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> https://cr.openjdk.java.net/~kvn/8251459/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8251459 Claes once again found optimization for C2 code! 
Instead of per bit exclusion SOC and AS registers from debuginfo regmasks he suggested to calculate exclusion masks once in Matcher::init_spill_mask() during first compilation and use these masks to do per word exclusion. We can save 27k instructions per compilation on x64 with this! I modified Claes's original patch by removing refactoring code to see changes more clear. Tested: hs-tier1-3, xcomp Thanks, Vladimir From vladimir.x.ivanov at oracle.com Tue Aug 18 15:26:44 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 18 Aug 2020 18:26:44 +0300 Subject: [16] RFR(M) 8251459: Compute caller save exclusion RegMasks once In-Reply-To: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> References: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> Message-ID: <0b735f6f-39ad-6d6a-8cc5-3f4fd9ab1d96@oracle.com> Looks good. Best regards, Vladimir Ivanov On 18.08.2020 18:12, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8251459/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8251459 > > Claes once again found optimization for C2 code! > > Instead of per bit exclusion SOC and AS registers from debuginfo > regmasks he suggested to calculate exclusion masks once in > Matcher::init_spill_mask() during first compilation and use these masks > to do per word exclusion. > We can save 27k instructions per compilation on x64 with this! > > I modified Claes's original patch by removing refactoring code to see > changes more clear. > > Tested: hs-tier1-3, xcomp > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Tue Aug 18 15:32:28 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 08:32:28 -0700 Subject: [16] RFR(M) 8251459: Compute caller save exclusion RegMasks once In-Reply-To: <0b735f6f-39ad-6d6a-8cc5-3f4fd9ab1d96@oracle.com> References: <0ac5e7ce-2f98-cf5d-6668-fd3b15f9e0ab@oracle.com> <0b735f6f-39ad-6d6a-8cc5-3f4fd9ab1d96@oracle.com> Message-ID: Thanks! Vladimir K On 8/18/20 8:26 AM, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 18.08.2020 18:12, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8251459/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8251459 >> >> Claes once again found optimization for C2 code! >> >> Instead of per bit exclusion SOC and AS registers from debuginfo regmasks he suggested to calculate exclusion masks >> once in Matcher::init_spill_mask() during first compilation and use these masks to do per word exclusion. >> We can save 27k instructions per compilation on x64 with this! >> >> I modified Claes's original patch by removing refactoring code to see changes more clear. >> >> Tested: hs-tier1-3, xcomp >> >> Thanks, >> Vladimir From vladimir.kozlov at oracle.com Tue Aug 18 15:41:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 08:41:31 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> Message-ID: <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> c1_Compilation.hpp: looks like both versions of allocator() do the same thing. I suggest to build with configure --with-debug-level=optimized to check that NOT_PRODUCT can be built with these changes. 
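The "optimized" debug level is worth the extra check because in that configuration neither PRODUCT nor ASSERT is defined: code kept under NOT_PRODUCT still compiles there, while anything moved under ASSERT disappears, so mixing the two guards can break exactly that build. A sketch of the guard pattern being discussed (taken from the shape of the code quoted in the thread above, not the actual patch):

  #ifdef ASSERT
    if (TraceLinearScanLevel >= 2) {   // develop flag (see the discussion above)
      tty->print_cr("killing XMMs for trig");
    }
  #endif // ASSERT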
Thanks, Vladimir On 8/18/20 6:16 AM, Christian Hagedorn wrote: > Hi Vladimir > > On 17.08.20 19:36, Vladimir Kozlov wrote: >> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>> Hi Vladimir >>> >>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>> >>> I'm wondering though if these ifdefs are even required for if-blocks inside methods? >>> >>> Isn't, for example, this if-block: >>> >>> #ifndef PRODUCT >>> ???????? if (TraceLinearScanLevel >= 2) { >>> ?????????? tty->print_cr("killing XMMs for trig"); >>> ???????? } >>> #endif >>> >>> removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be >>> removed)? Or does it make a difference by explicitly guarding it with an ifdef? >> >> You are right. It could be statically removed. But we keep #ifdef sometimes to indicate that code is executed only in >> debug build because we don't always remember type of a flag. > > I see, that makes sense. I updated my patch and left the ifdefs there but changed them to ASSERT. I also updated other > ifdefs belonging to TraceLinearScanLevel appropriately. > > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ > > Best regards, > Christian > >> >> Thanks, >> Vladimir K >> >>> >>> Best regards, >>> Christian >>> >>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>> >>>> But the flag is available only in DEBUG build: >>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>> >>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>> Hi >>>>> >>>>> Please review the following enhancement for C1: >>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>> >>>>> While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure >>>>> out what was going on. I think it would be useful to have this code around for the analysis of future C1 register >>>>> allocator bugs. >>>>> >>>>> This RFE adds (everything non-product code): >>>>> - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. >>>>> - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children and >>>>> parent. >>>>> - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is useful >>>>> in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to be >>>>> manually looked up in other logs. >>>>> >>>>> I additionally did some cleanup of the touched code. >>>>> >>>>> We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of >>>>> the register allocation algorithm. It currently just prints too much details on the higher levels. You often find >>>>> yourself being interested in a specific part of the algorithm and only want to know more details there. To achieve >>>>> that you now you have to either handle all the noise or manually disable/enable other logs. We could file an RFE to >>>>> clean this up if it's worth the effort - given that there are not many new issues filed for C1 register allocation >>>>> today. >>>>> >>>>> Thank you! 
>>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>> From vladimir.kozlov at oracle.com Tue Aug 18 19:14:01 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 12:14:01 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> Message-ID: <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> I would suggest to run test with -XX:+PrintCodeCache flag which prints CodeCache usage on exit. Also add '-ea -esa' flags - some runs failed with them because they increase Graal's methods size. Running test with immediately caused OOM error on my local linux machine: '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal' With -XX:ReservedCodeCacheSize=30m I got: [11.217s][warning][codecache] CodeCache is full. Compiler has been disabled. [11.217s][warning][codecache] Try increasing the code cache size using -XX:ReservedCodeCacheSize= With -XX:ReservedCodeCacheSize=50m I got this output: CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb May be you need to set it to 35m or better to 50m to be safe. Note, without Graal test uses only 5.5m: CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb ----------------------------- I also forgot to ask you to update test's Copyright year. Regards, Vladimir K On 8/18/20 1:10 AM, Fairoz Matte wrote: > Hi Vladimir, > > Thanks for looking into. > This is intermittent crash, and is reproducible in windows debug build environment. Below is the testing performed. > > 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" > 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler" > > Thanks, > Fairoz > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Monday, August 17, 2020 11:22 PM >> To: Fairoz Matte ; hotspot-compiler- >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >> Cc: Coleen Phillimore ; Dean Long >> >> Subject: Re: RFR(s): 8248295: >> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal >> >> Hi Fairoz, >> >> How you determine that +10Mb is enough with Graal? >> >> Thanks, >> Vladimir >> >> On 8/17/20 5:46 AM, Fairoz Matte wrote: >>> Hi, >>> >>> >>> >>> Please review this small test change to work with Graal. >>> >>> >>> >>> Background: >>> >>> Graal require more code cache compared to c1/c2. but the test case always >> set it to 20MB. This may not be sufficient when running graal. >>> >>> Default configuration for ReservedCodeCacheSize = 250MB >>> >>> With graal enabled, ReservedCodeCacheSize = 350MB >>> >>> >>> >>> Either we can modify the framework to honor ReservedCodeCacheSize for >> graal or just update the testcase. >>> >>> There are not many test cases they rely on ReservedCodeCacheSize or >> InitialCodeCacheSize. So the fix prefer the later one. 
>>> >>> >>> >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 >>> >>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ >>> >>> >>> >>> Thanks, >>> >>> Fairoz >>> >>> >>> From evgeny.nikitin at oracle.com Tue Aug 18 19:40:45 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Tue, 18 Aug 2020 21:40:45 +0200 Subject: RFR(XS): 8208257: Un-quarantine vmTestbase/vm/mlvm/meth/func/jdi/breakpointOtherStratum Message-ID: <1b7315b8-98be-116d-d037-e3bb17f55f1b@oracle.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8208257 Webrev: http://cr.openjdk.java.net/~enikitin/8208257/webrev.00/JDK-8208257.patch I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it from ProblemList.txt. Second change is marking the test with randomness keyword from the https://bugs.openjdk.java.net/browse/JDK-8243427 (using reproducible random for mlvm tests). The change has been checked in mach5 for windows, macosx, linux in x64-debug, approx. 100 runs on each platform (passed). Please review, /Evgeny Nikitin. From martin.doerr at sap.com Tue Aug 18 21:25:50 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 18 Aug 2020 21:25:50 +0000 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> Message-ID: Hi Vladimir, we are hitting the following assertion after this change was pushed: assert(my_pack(s) == __null) failed: only in one pack Stack: V [jvm.dll+0xbbac55] SuperWord::construct_my_pack_map+0x135 (superword.cpp:1723) V [jvm.dll+0xbb57f7] SuperWord::SLP_extract+0x427 (superword.cpp:520) V [jvm.dll+0xbcba0b] SuperWord::transform_loop+0x48b (superword.cpp:170) V [jvm.dll+0x895a09] PhaseIdealLoop::build_and_optimize+0xef9 (loopnode.cpp:3270) V [jvm.dll+0x3df4b6] Compile::Optimize+0xf76 (compile.cpp:2187) ... Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. (May depend on CPU model.) Is this a known issue? Or should I open a bug? Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Vladimir Kozlov > Sent: Montag, 10. August 2020 19:03 > To: hotspot compiler > Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream > and a for cycle causes jre crash > > Thank you, Vladimir > > Vladimir K > > On 8/10/20 2:04 AM, Vladimir Ivanov wrote: > > > >> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > > > > Looks good. > > > > Best regards, > > Vladimir Ivanov > > > >> https://bugs.openjdk.java.net/browse/JDK-8249749 > >> > >> SuperWord does not recognize array indexing pattern used in the test > due to additional AddI node: > >> > >> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > >> > >> As result it can't find memory reference to align vectors. But code ignores > that and continue execution. > >> Later when align_to_ref is referenced we hit SEGV because it is NULL. > >> > >> The fix is to check align_to_ref for NULL early and bailout. > >> > >> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize > this address pattern to vectorize test's code. > >> And added missing _invar setting. > >> > >> And I slightly modified tracking code to investigate this issue. 
> >> > >> Added new test to check some complex address expressions similar to > bug's test case. Not all cases in test are > >> vectorized - there are other conditions which prevent that. > >> > >> Tested tier1,tier2,hs-tier3,precheckin-comp > >> > >> Thanks, > >> Vladimir K From vladimir.kozlov at oracle.com Tue Aug 18 21:57:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 14:57:42 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> Message-ID: <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> Thank you for reporting, Martin Please, file bug and specify JDK version, VM flags and CPUID features on machine where it fail. We test on x86 and aarch64 and I did not see any issues so far. Regards, Vladimir On 8/18/20 2:25 PM, Doerr, Martin wrote: > Hi Vladimir, > > we are hitting the following assertion after this change was pushed: > assert(my_pack(s) == __null) failed: only in one pack > > Stack: > V [jvm.dll+0xbbac55] SuperWord::construct_my_pack_map+0x135 (superword.cpp:1723) > V [jvm.dll+0xbb57f7] SuperWord::SLP_extract+0x427 (superword.cpp:520) > V [jvm.dll+0xbcba0b] SuperWord::transform_loop+0x48b (superword.cpp:170) > V [jvm.dll+0x895a09] PhaseIdealLoop::build_and_optimize+0xef9 (loopnode.cpp:3270) > V [jvm.dll+0x3df4b6] Compile::Optimize+0xf76 (compile.cpp:2187) > ... > > Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. > (May depend on CPU model.) > > Is this a known issue? > Or should I open a bug? > > Best regards, > Martin > > >> -----Original Message----- >> From: hotspot-compiler-dev > retn at openjdk.java.net> On Behalf Of Vladimir Kozlov >> Sent: Montag, 10. August 2020 19:03 >> To: hotspot compiler >> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream >> and a for cycle causes jre crash >> >> Thank you, Vladimir >> >> Vladimir K >> >> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: >>> >>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >>> >>> Looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> https://bugs.openjdk.java.net/browse/JDK-8249749 >>>> >>>> SuperWord does not recognize array indexing pattern used in the test >> due to additional AddI node: >>>> >>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >>>> >>>> As result it can't find memory reference to align vectors. But code ignores >> that and continue execution. >>>> Later when align_to_ref is referenced we hit SEGV because it is NULL. >>>> >>>> The fix is to check align_to_ref for NULL early and bailout. >>>> >>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize >> this address pattern to vectorize test's code. >>>> And added missing _invar setting. >>>> >>>> And I slightly modified tracking code to investigate this issue. >>>> >>>> Added new test to check some complex address expressions similar to >> bug's test case. Not all cases in test are >>>> vectorized - there are other conditions which prevent that. 
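The new regression test mentioned in the quoted summary ships as compiler/vectorization/TestComplexAddrExpr.java and is not reproduced in this digest. Purely as a hypothetical idea of what "complex address expressions" means here, kernels along the following lines vary the invariant part of the array index; the method names and exact expressions are invented, and only some such shapes end up being vectorized:

// Each kernel mixes the invariants (j, k, n) and the induction variable i
// differently; SuperWord accepts some of these shapes and bails out on others
// for unrelated reasons (alignment, dependences, unrolling limits, ...).
static void addr1(int[] a, int n, int j) {
    for (int i = 0; i < n - 1; i++) { a[j * n + i] += 1; }
}
static void addr2(int[] a, int n, int j) {
    for (int i = 0; i < n - 1; i++) { a[j * n + i + 1] += 1; }
}
static void addr3(int[] a, int n, int j, int k) {
    for (int i = 0; i < n - 1; i++) { a[j * n + k + i] += 1; }
}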
>>>> >>>> Tested tier1,tier2,hs-tier3,precheckin-comp >>>> >>>> Thanks, >>>> Vladimir K From vladimir.kozlov at oracle.com Tue Aug 18 22:03:32 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 15:03:32 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> Message-ID: <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> I reproduced it with -XX:UseAVX=0. I will file bug and take care of it. Thanks, Vladimir K On 8/18/20 2:57 PM, Vladimir Kozlov wrote: > Thank you for reporting, Martin > > Please, file bug and specify JDK version, VM flags and CPUID features on machine where it fail. > > We test on x86 and aarch64 and I did not see any issues so far. > > Regards, > Vladimir > > On 8/18/20 2:25 PM, Doerr, Martin wrote: >> Hi Vladimir, >> >> we are hitting the following assertion after this change was pushed: >> assert(my_pack(s) == __null) failed: only in one pack >> >> Stack: >> V? [jvm.dll+0xbbac55]? SuperWord::construct_my_pack_map+0x135? (superword.cpp:1723) >> V? [jvm.dll+0xbb57f7]? SuperWord::SLP_extract+0x427? (superword.cpp:520) >> V? [jvm.dll+0xbcba0b]? SuperWord::transform_loop+0x48b? (superword.cpp:170) >> V? [jvm.dll+0x895a09]? PhaseIdealLoop::build_and_optimize+0xef9? (loopnode.cpp:3270) >> V? [jvm.dll+0x3df4b6]? Compile::Optimize+0xf76? (compile.cpp:2187) >> ... >> >> Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. >> (May depend on CPU model.) >> >> Is this a known issue? >> Or should I open a bug? >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: hotspot-compiler-dev >> retn at openjdk.java.net> On Behalf Of Vladimir Kozlov >>> Sent: Montag, 10. August 2020 19:03 >>> To: hotspot compiler >>> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream >>> and a for cycle causes jre crash >>> >>> Thank you, Vladimir >>> >>> Vladimir K >>> >>> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: >>>> >>>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >>>> >>>> Looks good. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8249749 >>>>> >>>>> SuperWord does not recognize array indexing pattern used in the test >>> due to additional AddI node: >>>>> >>>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >>>>> >>>>> As result it can't find memory reference to align vectors. But code ignores >>> that and continue execution. >>>>> Later when align_to_ref is referenced we hit SEGV because it is NULL. >>>>> >>>>> The fix is to check align_to_ref for NULL early and bailout. >>>>> >>>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize >>> this address pattern to vectorize test's code. >>>>> And added missing _invar setting. >>>>> >>>>> And I slightly modified tracking code to investigate this issue. >>>>> >>>>> Added new test to check some complex address expressions similar to >>> bug's test case. Not all cases in test are >>>>> vectorized - there are other conditions which prevent that. 
>>>>> >>>>> Tested tier1,tier2,hs-tier3,precheckin-comp >>>>> >>>>> Thanks, >>>>> Vladimir K From vladimir.kozlov at oracle.com Tue Aug 18 22:10:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 18 Aug 2020 15:10:20 -0700 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> Message-ID: <860caa40-38f7-baff-54dc-3e6802a64425@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8251994 On 8/18/20 3:03 PM, Vladimir Kozlov wrote: > I reproduced it with -XX:UseAVX=0. > > I will file bug and take care of it. > > Thanks, > Vladimir K > > On 8/18/20 2:57 PM, Vladimir Kozlov wrote: >> Thank you for reporting, Martin >> >> Please, file bug and specify JDK version, VM flags and CPUID features on machine where it fail. >> >> We test on x86 and aarch64 and I did not see any issues so far. >> >> Regards, >> Vladimir >> >> On 8/18/20 2:25 PM, Doerr, Martin wrote: >>> Hi Vladimir, >>> >>> we are hitting the following assertion after this change was pushed: >>> assert(my_pack(s) == __null) failed: only in one pack >>> >>> Stack: >>> V? [jvm.dll+0xbbac55]? SuperWord::construct_my_pack_map+0x135? (superword.cpp:1723) >>> V? [jvm.dll+0xbb57f7]? SuperWord::SLP_extract+0x427? (superword.cpp:520) >>> V? [jvm.dll+0xbcba0b]? SuperWord::transform_loop+0x48b? (superword.cpp:170) >>> V? [jvm.dll+0x895a09]? PhaseIdealLoop::build_and_optimize+0xef9? (loopnode.cpp:3270) >>> V? [jvm.dll+0x3df4b6]? Compile::Optimize+0xf76? (compile.cpp:2187) >>> ... >>> >>> Seems to be reproducible by JTREG test compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 machines. >>> (May depend on CPU model.) >>> >>> Is this a known issue? >>> Or should I open a bug? >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev >>> retn at openjdk.java.net> On Behalf Of Vladimir Kozlov >>>> Sent: Montag, 10. August 2020 19:03 >>>> To: hotspot compiler >>>> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream >>>> and a for cycle causes jre crash >>>> >>>> Thank you, Vladimir >>>> >>>> Vladimir K >>>> >>>> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: >>>>> >>>>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ >>>>> >>>>> Looks good. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8249749 >>>>>> >>>>>> SuperWord does not recognize array indexing pattern used in the test >>>> due to additional AddI node: >>>>>> >>>>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) >>>>>> >>>>>> As result it can't find memory reference to align vectors. But code ignores >>>> that and continue execution. >>>>>> Later when align_to_ref is referenced we hit SEGV because it is NULL. >>>>>> >>>>>> The fix is to check align_to_ref for NULL early and bailout. >>>>>> >>>>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to recognize >>>> this address pattern to vectorize test's code. >>>>>> And added missing _invar setting. >>>>>> >>>>>> And I slightly modified tracking code to investigate this issue. >>>>>> >>>>>> Added new test to check some complex address expressions similar to >>>> bug's test case. 
Not all cases in test are >>>>>> vectorized - there are other conditions which prevent that. >>>>>> >>>>>> Tested tier1,tier2,hs-tier3,precheckin-comp >>>>>> >>>>>> Thanks, >>>>>> Vladimir K From igor.ignatyev at oracle.com Tue Aug 18 22:43:49 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 18 Aug 2020 15:43:49 -0700 Subject: RFR(XS): 8208257: Un-quarantine vmTestbase/vm/mlvm/meth/func/jdi/breakpointOtherStratum In-Reply-To: <1b7315b8-98be-116d-d037-e3bb17f55f1b@oracle.com> References: <1b7315b8-98be-116d-d037-e3bb17f55f1b@oracle.com> Message-ID: <6F5EA02D-CAC0-405B-B386-7FD2B7BA37EA@oracle.com> Hi Evgeny, looks good to me, you will need to update 8208257's title in JBS and close 8058176 as CNR. Cheers, -- Igor > On Aug 18, 2020, at 12:40 PM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8208257 > Webrev: http://cr.openjdk.java.net/~enikitin/8208257/webrev.00/JDK-8208257.patch > > I tried to reproduce the test multiple times with different VM parameters, but it always passes. I suggest removing it from ProblemList.txt. > > Second change is marking the test with randomness keyword from the https://bugs.openjdk.java.net/browse/JDK-8243427 (using reproducible random for mlvm tests). > > The change has been checked in mach5 for windows, macosx, linux in x64-debug, approx. 100 runs on each platform (passed). > > Please review, > /Evgeny Nikitin. From igor.ignatyev at oracle.com Tue Aug 18 23:42:19 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 18 Aug 2020 16:42:19 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase Message-ID: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ > 0 lines changed: 0 ins; 0 del; 0 mod; Hi all, could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 Thanks, -- Igor From joserz at linux.ibm.com Wed Aug 19 00:24:32 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Tue, 18 Aug 2020 21:24:32 -0300 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> Message-ID: <20200819002432.GA915540@pacoca> Hallo Martin! Thank you very much for your review. Here is the v3: Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 I run a functional test and it's working as expected. If you try to run it in a system Unfortunately, I couldn?t find a Power10 machine in my garage ?? ???????? 
This is the code I use to test: 8<--------------------------------------------------------------- import java.io.IOException; class ReverseBytes { public static void main(String[] args) throws IOException { for (int i = 0; i < 1000000; ++i) { if (Integer.reverseBytes(0x12345678) != 0x78563412) { throw new RuntimeException(); } if (Long.reverseBytes(0x123456789ABCDEF0L) != 0xF0DEBC9A78563412L) { throw new RuntimeException(); } if (Short.reverseBytes((short)0x1234) != (short)0x3412) { throw new RuntimeException(); } if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { throw new RuntimeException(); } } System.out.println("ok"); } } 8<--------------------------------------------------------------- Best regards! Jose On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > Hi Michihiro and Jose, > > I had only done a quick review during my vacation. Thanks for updating the description of PowerArchitecturePPC64. > After taking a second look, I have a few minor requests. Sorry for that. > > > * ?UseByteReverseInstructions? (plural) would be more consistent with other names. > * Please add ?size? specifications to the ppc.ad file. Otherwise, the compiler has to determine sizes dynamically every time. > * bytes_reverse_short: ?format? specification misses ?extsh?. > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > So we rely on your testing. > > Thanks and best regards, > Martin > > > From: Michihiro Horie > Sent: Dienstag, 18. August 2020 09:28 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions > > > Jose, > Latest change looks good also to me. > > Marin, > Do you think if I can push the change? > > Best regards, > Michihiro > > > ----- Original message ----- > From: "Doerr, Martin" > > To: "joserz at linux.ibm.com" > > Cc: hotspot compiler >, "horie at jp.ibm.com" > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions > Date: Wed, Jul 1, 2020 4:01 AM > > Thanks for the much better flag description. > Looks good. > > Best regards, > Martin > > > Am 30.06.2020 um 02:15 schrieb "joserz at linux.ibm.com" >: > > > > ?Hello team, > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > Thank you!! > > > > Jose > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > >> Hi Jose, > >> > >> Can you replace the outdated description of PowerArchitecturePPC64 in globals_poc.hpp by something generic, please? > >> > >> Please update the Copyright year in vm_version_poc.hpp. > >> > >> I can?t test the change, but it looks good to me. > >> > >> Best regards, > >> Martin > >> > >>>> Am 26.06.2020 um 20:29 schrieb "joserz at linux.ibm.com" >: > >>> > >>> ?Hello team! > >>> > >>> This patch introduces Power10 to OpenJDK and implements three new instructions: > >>> - brh - byte-reverse halfword > >>> - brw - byte-reverse word > >>> - brd - byte-reverse doubleword > >>> > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > >>> > >>> Thanks for your review! > >>> > >>> Jose R. 
Ziviani > From aph at redhat.com Wed Aug 19 08:35:57 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Aug 2020 09:35:57 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: On 18/08/2020 16:05, Dmitry Chuyko wrote: > Some more results for a benchmark with reduce(): > > -XX:-UseSignumIntrinsic > DoubleOrigSignum.ofMostlyNaN 0.914 ? 0.001 ns/op > DoubleOrigSignum.ofMostlyNeg 1.178 ? 0.001 ns/op > DoubleOrigSignum.ofMostlyPos 1.176 ? 0.017 ns/op > DoubleOrigSignum.ofMostlyZero 0.803 ? 0.001 ns/op > DoubleOrigSignum.ofRandom 1.175 ? 0.012 ns/op > -XX:+UseSignumIntrinsic > DoubleOrigSignum.ofMostlyNaN 1.040 ? 0.007 ns/op > DoubleOrigSignum.ofMostlyNeg 1.040 ? 0.004 ns/op > DoubleOrigSignum.ofMostlyPos 1.039 ? 0.003 ns/op > DoubleOrigSignum.ofMostlyZero 1.040 ? 0.001 ns/op > DoubleOrigSignum.ofRandom 1.040 ? 0.003 ns/op That's almost no difference, isn't it? Down in the noise. > If we only intrinsify copySign() we lose free mask that we get from > facgt. In such case improvement (for signum) decreases like from ~30% to > ~15%, and it also greatly depends on the particular HW. We can > additionally introduce an intrinsic for Math.copySign(), especially it > makes sense for float where it can be just 2 fp instructions: movi+bsl > (fmovd+fnegd+bsl for double). I think this is worth doing, because moves between GPRs and vector regs tend to have a long latency. Can you please add that, and we can all try it on our various hardware. We're measuring two different things, throughput and latency. The first JMH test you provided was really testing latency, because Blackhole waits for everything to complete. [ Note to self: Blackhole.consume() seems to be particularly slow on some AArch64 implementations because it uses a volatile read. What seems to be happening, judging by how long it takes, is that the store buffer is drained before the volatile read. Maybe some other construct would work better but still provide the guarantees Blackhole.consume() needs. ] For throughput we want to keep everything moving. Sure, sometimes we are going to have to wait for some calculation to complete, so if we can improve latency without adverse cost we should. For that, staying in the vector regs helps. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
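To make the latency-versus-throughput point above concrete, here is a minimal, hypothetical JMH sketch; it is not Dmitry's DoubleOrigSignum benchmark, and the class, method and field names are invented. The first method hands every result to Blackhole.consume(), so it largely measures the dependent latency of Math.signum(); the second folds the results into a reduction, which lets independent iterations overlap and behaves more like a throughput test. Either can be run with -XX:+UseSignumIntrinsic and -XX:-UseSignumIntrinsic to reproduce the kind of comparison quoted above:

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class SignumSketch {
    double[] values;

    @Setup
    public void setup() {
        Random r = new Random(42);
        values = new double[1024];
        for (int i = 0; i < values.length; i++) {
            values[i] = r.nextDouble() - 0.5;   // mix of signs
        }
    }

    // Latency-flavoured: each result goes through Blackhole.consume(), which
    // serialises the loop (and is itself costly on some AArch64 parts).
    @Benchmark
    public void perElement(Blackhole bh) {
        for (double v : values) {
            bh.consume(Math.signum(v));
        }
    }

    // Throughput-flavoured: reduce into one accumulator, return it once.
    @Benchmark
    public double reduced() {
        double sum = 0.0;
        for (double v : values) {
            sum += Math.signum(v);
        }
        return sum;
    }
}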
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Wed Aug 19 08:37:16 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Wed, 19 Aug 2020 16:37:16 +0800 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() Message-ID: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ We see this crash occasionally when testing with Graal on some AArch64 systems: # # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 # assert(external_guard || result != __null) failed: Invalid JNI handle # V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc The full hs_err file is attached to the JBS entry. The handle here is _HotSpotJVMCIRuntime_instance which is initialised in JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); JVMCICompiler::force_comp_at_level_simple() checks whether the _object field inside the handle is null before calling JNIHandles::resolve() on it, which should avoid the above assertion failure where the pointee is null. However on a non-TSO architecture another thread may observe the store to _object when assigning _HotSpotJVMCIRuntime_instance before the store in JVMCIEnv::make_global() that initialises the pointed-to oop. We need to add a store-store barrier here to force the expected ordering. Tested with jcstress and Graal on the affected machine, which used to reproduce it quite reliably. -- Thanks, Nick From ningsheng.jian at arm.com Wed Aug 19 09:53:45 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Wed, 19 Aug 2020 17:53:45 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: Hi Andrew, I have updated the patch based on the review comments. Would you mind taking another look? Thanks! Full: http://cr.openjdk.java.net/~njian/8231441/webrev.04/ Incremental: http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ Also add build-dev, as there's a makefile change. And the split parts: 1) SVE feature detection: http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature 2) c2 register allocation: http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra 3) SVE c2 backend: http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 JTreg tests are still running, and so far no new failure found. Thanks, Ningsheng On 8/17/20 5:16 PM, Andrew Dinn wrote: > Hi Pengfei, > > On 17/08/2020 07:00, Ningsheng Jian wrote: >> Thanks a lot for the review! 
Sorry for the late reply, as I was on >> vacation last week. And thanks to Pengfei and Joshua for helping >> clarifying some details in the patch. > > Yes, they did a very good job of answering most of the pending questions. > >>> I also eyeballed /some/ of the generated code to check that it looked >>> ok. I'd really like to be able to do that systematically for a >>> comprehensive test suite that exercised every rule but I only had the >>> machine for a few days. This really ought to be done as a follow-up to >>> ensure that all the rules are working as expected. >> >> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >> in future. > > I'm fine with that as a follow-up patch if you raise a JIRA for it. > >>> I am not clear why you are choosing to re-init ptrue after certain JVM >>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>> when we call a JVM_ENTRY. Could you explain the rationale you have >>> followed here? >> >> We do the re-init at any possible return points to c2 code, not in any >> runtime c++ functions, which will reduce the re-init calls. >> >> Actually I found those entries by some hack of jvm. In the hacky code >> below we use gcc option -finstrument-functions to build hotspot. With >> this option, each C/C++ function entry/exit will call the instrument >> functions we defined. In instrument functions, we clobber p7 (or other >> reg for test) register, and in c2 function return we verify that p7 (or >> other reg) has been reinitialized. >> >> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch > > Nice work. It's very good to have that documented. I'm willing to accept > i) that this has found all current cases and ii) that the verify will > catch any cases that might get introduced by future changes (e.g. the > callout introduced by ZGC that you mention below). As the above mot say > there is a slim chance this might have missed some cases but I think it > is pretty unlikely. > > >>> Specific Comments (register allocator webrev): >>> >>> >>> aarch64.ad:97-100 >>> >>> Why have you added a reg_def for R8 and R9 here and also to alloc_class >>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>> >> >> I think Pengfei has helped to explain that. I will either add clear >> comments or rename the register name as you suggested. > > Ok, good. > >> As Joshua clarified, we are also working on predicate scalable reg, >> which is not in this patch. Thanks for the suggestion, I will try to >> refactor this a bit. > > Ok, I'll wait for an updated patch. Are you planning to include the > scalable predicate reg code as part of this patch? I think that would be > better as it would help to clarify the need to distinguish vector regs > as a subset of scalable regs. > >>> zBarrierSetAssembler_aarch64.cpp:434 >>> >>> Can you explain why we need to check p7 here and not do so in other >>> places where we call into the JVM? I'm not saying this is wrong. I just >>> want to know how you decided where re-init of p7 was needed. >>> >> >> Actually I found this by my hack patch above while running jtreg tests. >> The stub slowpath here can be a c++ function. > > Yes, good catch. > >>> superword.cpp:97 >>> >>> Does this mean that is someone sets the maximum vector size to a >>> non-power of two, such as 384, all superword operations will be >>> bypassed? Including those which can be done using NEON vectors? >>> >> >> Current SLP vectorizer only supports power-of-2 vector size. 
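As an aside on the MaxVectorSize exchange just above: the flag is specified in bytes, only power-of-two values are currently honoured by the SLP vectorizer, and a plain counted loop such as the hypothetical sketch below is the kind of code that gets widened. Whether it is vectorized at all, and at what width, depends on the detected CPU features and on the value the VM actually accepts (for example the SVE fallback to 32 described in the next paragraph):

// Try e.g. -XX:MaxVectorSize=16, 32 or 64 (bytes, power of two). The arrays
// are assumed to have the same length; the method itself is illustrative.
static void scale(float[] a, float[] b) {
    for (int i = 0; i < a.length; i++) {
        a[i] = b[i] * 2.0f;
    }
}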
We are >> trying to work out a new vectorizer to support all SVE vector sizes, so >> we would expect a size like 384 could go to that path. I tried current >> patch on a 512-bit SVE hardware which does not support 384-bit: >> >> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >> openjdk version "16-internal" 2021-03-16 >> >> $ java -XX:MaxVectorSize=48 -version >> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >> vector length 32. Set MaxVectorSize to 32 >> >> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >> instead of unsupported 48: >> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >> >> Do you think we need to exit vm instead of warning and fallbacking to 32 >> here? > > Yes, I think a vm exit would probably be a better choice. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From martin.doerr at sap.com Wed Aug 19 09:55:50 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 19 Aug 2020 09:55:50 +0000 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200819002432.GA915540@pacoca> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: Hi Jose, thanks for the update. I have never seen 2 format specifications in the ad file. Does that work or does the 2nd one overwrite the 1st one? I think it should be: format %{ "BRH $dst, $src\n\t" "EXTSH $dst, $dst" %} I don't need to see another webrev for that. Otherwise, the change looks good. Thanks for contributing. Best regards, Martin > -----Original Message----- > From: joserz at linux.ibm.com > Sent: Mittwoch, 19. August 2020 02:25 > To: Doerr, Martin > Cc: Michihiro Horie ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > byte-reverse instructions > > Hallo Martin! > > Thank you very much for your review. Here is the v3: > > Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > I run a functional test and it's working as expected. If you try to run it in a > system > $ java -XX:+UseByteReverseInstructions ReverseBytes > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > but needs at least Power10. > (continue with existing code) > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > ???????? > > This is the code I use to test: > 8<--------------------------------------------------------------- > import java.io.IOException; > > class ReverseBytes > { > public static void main(String[] args) throws IOException > { > for (int i = 0; i < 1000000; ++i) { > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > throw new RuntimeException(); > } > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > 0xF0DEBC9A78563412L) { > throw new RuntimeException(); > } > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > throw new RuntimeException(); > } > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > throw new RuntimeException(); > } > } > System.out.println("ok"); > } > } > 8<--------------------------------------------------------------- > > Best regards! 
> > Jose > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > Hi Michihiro and Jose, > > > > I had only done a quick review during my vacation. Thanks for updating the > description of PowerArchitecturePPC64. > > After taking a second look, I have a few minor requests. Sorry for that. > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent with > other names. > > * Please add ?size? specifications to the ppc.ad file. Otherwise, the > compiler has to determine sizes dynamically every time. > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > So we rely on your testing. > > > > Thanks and best regards, > > Martin > > > > > > From: Michihiro Horie > > Sent: Dienstag, 18. August 2020 09:28 > > To: Doerr, Martin > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > byte-reverse instructions > > > > > > Jose, > > Latest change looks good also to me. > > > > Marin, > > Do you think if I can push the change? > > > > Best regards, > > Michihiro > > > > > > ----- Original message ----- > > From: "Doerr, Martin" > > > > To: "joserz at linux.ibm.com" > > > > Cc: hotspot compiler dev at openjdk.java.net>, > "horie at jp.ibm.com" > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Date: Wed, Jul 1, 2020 4:01 AM > > > > Thanks for the much better flag description. > > Looks good. > > > > Best regards, > > Martin > > > > > Am 30.06.2020 um 02:15 schrieb > "joserz at linux.ibm.com" > >: > > > > > > ?Hello team, > > > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > Thank you!! > > > > > > Jose > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > >> Hi Jose, > > >> > > >> Can you replace the outdated description of PowerArchitecturePPC64 in > globals_poc.hpp by something generic, please? > > >> > > >> Please update the Copyright year in vm_version_poc.hpp. > > >> > > >> I can?t test the change, but it looks good to me. > > >> > > >> Best regards, > > >> Martin > > >> > > >>>> Am 26.06.2020 um 20:29 schrieb > "joserz at linux.ibm.com" > >: > > >>> > > >>> ?Hello team! > > >>> > > >>> This patch introduces Power10 to OpenJDK and implements three new > instructions: > > >>> - brh - byte-reverse halfword > > >>> - brw - byte-reverse word > > >>> - brd - byte-reverse doubleword > > >>> > > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > >>> > > >>> Thanks for your review! > > >>> > > >>> Jose R. 
Ziviani > > From magnus.ihse.bursie at oracle.com Wed Aug 19 10:05:01 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Wed, 19 Aug 2020 12:05:01 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> On 2020-08-19 11:53, Ningsheng Jian wrote: > Hi Andrew, > > I have updated the patch based on the review comments. Would you mind > taking another look? Thanks! > > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.04/ Build changes look good. Thank you for remembering to cc build-dev! This is maybe not relevant, but I was surprised to find src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, and b) the name implies that it is a test, even though that it resides in src. Is this really proper? /Magnus > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ > > Also add build-dev, as there's a makefile change. > > And the split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 > CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 > > JTreg tests are still running, and so far no new failure found. > > Thanks, > Ningsheng > > On 8/17/20 5:16 PM, Andrew Dinn wrote: >> Hi Pengfei, >> >> On 17/08/2020 07:00, Ningsheng Jian wrote: >>> Thanks a lot for the review! Sorry for the late reply, as I was on >>> vacation last week. And thanks to Pengfei and Joshua for helping >>> clarifying some details in the patch. >> >> Yes, they did a very good job of answering most of the pending >> questions. >> >>>> I also eyeballed /some/ of the generated code to check that it looked >>>> ok. I'd really like to be able to do that systematically for a >>>> comprehensive test suite that exercised every rule but I only had the >>>> machine for a few days. This really ought to be done as a follow-up to >>>> ensure that all the rules are working as expected. >>> >>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >>> in future. >> >> I'm fine with that as a follow-up patch if you raise a JIRA for it. >> >>>> I am not clear why you are choosing to re-init ptrue after certain JVM >>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>>> when we call a JVM_ENTRY. Could you explain the rationale you have >>>> followed here? >>> >>> We do the re-init at any possible return points to c2 code, not in any >>> runtime c++ functions, which will reduce the re-init calls. >>> >>> Actually I found those entries by some hack of jvm. In the hacky code >>> below we use gcc option -finstrument-functions to build hotspot. With >>> this option, each C/C++ function entry/exit will call the instrument >>> functions we defined. In instrument functions, we clobber p7 (or other >>> reg for test) register, and in c2 function return we verify that p7 (or >>> other reg) has been reinitialized. 
>>> >>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch >>> >> >> Nice work. It's very good to have that documented. I'm willing to accept >> i) that this has found all current cases and ii) that the verify will >> catch any cases that might get introduced by future changes (e.g. the >> callout introduced by ZGC that you mention below). As the above mot say >> there is a slim chance this might have missed some cases but I think it >> is pretty unlikely. >> >> >>>> Specific Comments (register allocator webrev): >>>> >>>> >>>> aarch64.ad:97-100 >>>> >>>> Why have you added a reg_def for R8 and R9 here and also to >>>> alloc_class >>>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>>> >>> >>> I think Pengfei has helped to explain that. I will either add clear >>> comments or rename the register name as you suggested. >> >> Ok, good. >> >>> As Joshua clarified, we are also working on predicate scalable reg, >>> which is not in this patch. Thanks for the suggestion, I will try to >>> refactor this a bit. >> >> Ok, I'll wait for an updated patch. Are you planning to include the >> scalable predicate reg code as part of this patch? I think that would be >> better as it would help to clarify the need to distinguish vector regs >> as a subset of scalable regs. >> >>>> zBarrierSetAssembler_aarch64.cpp:434 >>>> >>>> Can you explain why we need to check p7 here and not do so in other >>>> places where we call into the JVM? I'm not saying this is wrong. I >>>> just >>>> want to know how you decided where re-init of p7 was needed. >>>> >>> >>> Actually I found this by my hack patch above while running jtreg tests. >>> The stub slowpath here can be a c++ function. >> >> Yes, good catch. >> >>>> superword.cpp:97 >>>> >>>> Does this mean that is someone sets the maximum vector size to a >>>> non-power of two, such as 384, all superword operations will be >>>> bypassed? Including those which can be done using NEON vectors? >>>> >>> >>> Current SLP vectorizer only supports power-of-2 vector size. We are >>> trying to work out a new vectorizer to support all SVE vector sizes, so >>> we would expect a size like 384 could go to that path. I tried current >>> patch on a 512-bit SVE hardware which does not support 384-bit: >>> >>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >>> openjdk version "16-internal" 2021-03-16 >>> >>> $ java -XX:MaxVectorSize=48 -version >>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >>> vector length 32. Set MaxVectorSize to 32 >>> >>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >>> instead of unsupported 48: >>> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >>> >>> Do you think we need to exit vm instead of warning and fallbacking >>> to 32 >>> here? >> >> Yes, I think a vm exit would probably be a better choice. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Red Hat Distinguished Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. 
03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> > From martin.doerr at sap.com Wed Aug 19 10:16:37 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 19 Aug 2020 10:16:37 +0000 Subject: [16] RFR (S) 8249749: modify a primitive array through a stream and a for cycle causes jre crash In-Reply-To: <860caa40-38f7-baff-54dc-3e6802a64425@oracle.com> References: <1d42d9d4-2744-5bf0-f8e2-239be7be10b5@oracle.com> <03653a8f-d196-4e4c-d893-a619f2973011@oracle.com> <3d0dc868-e824-a141-02a2-58a58ad5b450@oracle.com> <2f7bd657-ce3b-b14b-e682-2b209c2240eb@oracle.com> <89ebdebb-2d8d-7e3a-1594-a3a6888329bc@oracle.com> <860caa40-38f7-baff-54dc-3e6802a64425@oracle.com> Message-ID: Hi Vladimir, thank you for taking care of it. It's good to know that 11u is also affected. Best regards, Martin > -----Original Message----- > From: Vladimir Kozlov > Sent: Mittwoch, 19. August 2020 00:10 > To: Doerr, Martin ; hotspot compiler compiler-dev at openjdk.java.net> > Cc: Zeller, Arno > Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a stream > and a for cycle causes jre crash > > https://bugs.openjdk.java.net/browse/JDK-8251994 > > On 8/18/20 3:03 PM, Vladimir Kozlov wrote: > > I reproduced it with -XX:UseAVX=0. > > > > I will file bug and take care of it. > > > > Thanks, > > Vladimir K > > > > On 8/18/20 2:57 PM, Vladimir Kozlov wrote: > >> Thank you for reporting, Martin > >> > >> Please, file bug and specify JDK version, VM flags and CPUID features on > machine where it fail. > >> > >> We test on x86 and aarch64 and I did not see any issues so far. > >> > >> Regards, > >> Vladimir > >> > >> On 8/18/20 2:25 PM, Doerr, Martin wrote: > >>> Hi Vladimir, > >>> > >>> we are hitting the following assertion after this change was pushed: > >>> assert(my_pack(s) == __null) failed: only in one pack > >>> > >>> Stack: > >>> > V? [jvm.dll+0xbbac55]? SuperWord::construct_my_pack_map+0x135? (superw > ord.cpp:1723) > >>> > V? [jvm.dll+0xbb57f7]? SuperWord::SLP_extract+0x427? (superword.cpp:520) > >>> > V? [jvm.dll+0xbcba0b]? SuperWord::transform_loop+0x48b? (superword.cpp: > 170) > >>> > V? [jvm.dll+0x895a09]? PhaseIdealLoop::build_and_optimize+0xef9? (loopnod > e.cpp:3270) > >>> V? [jvm.dll+0x3df4b6]? Compile::Optimize+0xf76? (compile.cpp:2187) > >>> ... > >>> > >>> Seems to be reproducible by JTREG test > compiler/vectorization/TestComplexAddrExpr.java on some x64 and aarch64 > machines. > >>> (May depend on CPU model.) > >>> > >>> Is this a known issue? > >>> Or should I open a bug? > >>> > >>> Best regards, > >>> Martin > >>> > >>> > >>>> -----Original Message----- > >>>> From: hotspot-compiler-dev >>>> retn at openjdk.java.net> On Behalf Of Vladimir Kozlov > >>>> Sent: Montag, 10. August 2020 19:03 > >>>> To: hotspot compiler > >>>> Subject: Re: [16] RFR (S) 8249749: modify a primitive array through a > stream > >>>> and a for cycle causes jre crash > >>>> > >>>> Thank you, Vladimir > >>>> > >>>> Vladimir K > >>>> > >>>> On 8/10/20 2:04 AM, Vladimir Ivanov wrote: > >>>>> > >>>>>> http://cr.openjdk.java.net/~kvn/8249749/webrev.00/ > >>>>> > >>>>> Looks good. > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>>> https://bugs.openjdk.java.net/browse/JDK-8249749 > >>>>>> > >>>>>> SuperWord does not recognize array indexing pattern used in the > test > >>>> due to additional AddI node: > >>>>>> > >>>>>> AddI(AddI(Invariant(j*n), Loop_phi(i)), Loop_inc(1)) > >>>>>> > >>>>>> As result it can't find memory reference to align vectors. 
But code > ignores > >>>> that and continue execution. > >>>>>> Later when align_to_ref is referenced we hit SEGV because it is > NULL. > >>>>>> > >>>>>> The fix is to check align_to_ref for NULL early and bailout. > >>>>>> > >>>>>> I also adjusted code in SWPointer::scaled_iv_plus_offset() to > recognize > >>>> this address pattern to vectorize test's code. > >>>>>> And added missing _invar setting. > >>>>>> > >>>>>> And I slightly modified tracking code to investigate this issue. > >>>>>> > >>>>>> Added new test to check some complex address expressions similar > to > >>>> bug's test case. Not all cases in test are > >>>>>> vectorized - there are other conditions which prevent that. > >>>>>> > >>>>>> Tested tier1,tier2,hs-tier3,precheckin-comp > >>>>>> > >>>>>> Thanks, > >>>>>> Vladimir K From ningsheng.jian at arm.com Wed Aug 19 10:40:49 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Wed, 19 Aug 2020 18:40:49 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> Message-ID: <4ec335ca-0a88-3b98-f6e4-fe7a0453ae7b@arm.com> Hi Magnus, Thanks for the review! On 8/19/20 6:05 PM, Magnus Ihse Bursie wrote: > On 2020-08-19 11:53, Ningsheng Jian wrote: >> Hi Andrew, >> >> I have updated the patch based on the review comments. Would you mind >> taking another look? Thanks! >> >> Full: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04/ > Build changes look good. Thank you for remembering to cc build-dev! > > This is maybe not relevant, but I was surprised to find > src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, > and b) the name implies that it is a test, even though that it resides > in src. Is this really proper? This handy script is used to (manually) generate some code in assembler_aarch64.cpp. The generated code is for assembler smoke test, so it named that. It's helpful to make sure the assembler emits correct binary code, but I am not sure whether a python code in the project is proper or not. Thanks, Ningsheng > > /Magnus >> >> Incremental: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ >> >> Also add build-dev, as there's a makefile change. >> >> And the split parts: >> >> 1) SVE feature detection: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature >> >> 2) c2 register allocation: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra >> >> 3) SVE c2 backend: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 >> CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 >> >> JTreg tests are still running, and so far no new failure found. >> >> Thanks, >> Ningsheng >> >> On 8/17/20 5:16 PM, Andrew Dinn wrote: >>> Hi Pengfei, >>> >>> On 17/08/2020 07:00, Ningsheng Jian wrote: >>>> Thanks a lot for the review! Sorry for the late reply, as I was on >>>> vacation last week. And thanks to Pengfei and Joshua for helping >>>> clarifying some details in the patch. >>> >>> Yes, they did a very good job of answering most of the pending >>> questions. 
>>> >>>>> I also eyeballed /some/ of the generated code to check that it looked >>>>> ok. I'd really like to be able to do that systematically for a >>>>> comprehensive test suite that exercised every rule but I only had the >>>>> machine for a few days. This really ought to be done as a follow-up to >>>>> ensure that all the rules are working as expected. >>>> >>>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >>>> in future. >>> >>> I'm fine with that as a follow-up patch if you raise a JIRA for it. >>> >>>>> I am not clear why you are choosing to re-init ptrue after certain JVM >>>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>>>> when we call a JVM_ENTRY. Could you explain the rationale you have >>>>> followed here? >>>> >>>> We do the re-init at any possible return points to c2 code, not in any >>>> runtime c++ functions, which will reduce the re-init calls. >>>> >>>> Actually I found those entries by some hack of jvm. In the hacky code >>>> below we use gcc option -finstrument-functions to build hotspot. With >>>> this option, each C/C++ function entry/exit will call the instrument >>>> functions we defined. In instrument functions, we clobber p7 (or other >>>> reg for test) register, and in c2 function return we verify that p7 (or >>>> other reg) has been reinitialized. >>>> >>>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch >>>> >>> >>> Nice work. It's very good to have that documented. I'm willing to accept >>> i) that this has found all current cases and ii) that the verify will >>> catch any cases that might get introduced by future changes (e.g. the >>> callout introduced by ZGC that you mention below). As the above mot say >>> there is a slim chance this might have missed some cases but I think it >>> is pretty unlikely. >>> >>> >>>>> Specific Comments (register allocator webrev): >>>>> >>>>> >>>>> aarch64.ad:97-100 >>>>> >>>>> Why have you added a reg_def for R8 and R9 here and also to >>>>> alloc_class >>>>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>>>> >>>> >>>> I think Pengfei has helped to explain that. I will either add clear >>>> comments or rename the register name as you suggested. >>> >>> Ok, good. >>> >>>> As Joshua clarified, we are also working on predicate scalable reg, >>>> which is not in this patch. Thanks for the suggestion, I will try to >>>> refactor this a bit. >>> >>> Ok, I'll wait for an updated patch. Are you planning to include the >>> scalable predicate reg code as part of this patch? I think that would be >>> better as it would help to clarify the need to distinguish vector regs >>> as a subset of scalable regs. >>> >>>>> zBarrierSetAssembler_aarch64.cpp:434 >>>>> >>>>> Can you explain why we need to check p7 here and not do so in other >>>>> places where we call into the JVM? I'm not saying this is wrong. I >>>>> just >>>>> want to know how you decided where re-init of p7 was needed. >>>>> >>>> >>>> Actually I found this by my hack patch above while running jtreg tests. >>>> The stub slowpath here can be a c++ function. >>> >>> Yes, good catch. >>> >>>>> superword.cpp:97 >>>>> >>>>> Does this mean that is someone sets the maximum vector size to a >>>>> non-power of two, such as 384, all superword operations will be >>>>> bypassed? Including those which can be done using NEON vectors? >>>>> >>>> >>>> Current SLP vectorizer only supports power-of-2 vector size. 
We are >>>> trying to work out a new vectorizer to support all SVE vector sizes, so >>>> we would expect a size like 384 could go to that path. I tried current >>>> patch on a 512-bit SVE hardware which does not support 384-bit: >>>> >>>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >>>> openjdk version "16-internal" 2021-03-16 >>>> >>>> $ java -XX:MaxVectorSize=48 -version >>>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >>>> vector length 32. Set MaxVectorSize to 32 >>>> >>>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >>>> instead of unsupported 48: >>>> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >>>> >>>> Do you think we need to exit vm instead of warning and fallbacking >>>> to 32 >>>> here? >>> >>> Yes, I think a vm exit would probably be a better choice. >>> >>> regards, >>> >>> >>> Andrew Dinn >>> ----------- >>> Red Hat Distinguished Engineer >>> Red Hat UK Ltd >>> Registered in England and Wales under Company Registration No. 03798903 >>> Directors: Michael Cunningham, Michael ("Mike") O'Neill >>> >> > From aph at redhat.com Wed Aug 19 11:10:10 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Aug 2020 12:10:10 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> Message-ID: On 19/08/2020 11:05, Magnus Ihse Bursie wrote: > This is maybe not relevant, but I was surprised to find > src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, > and b) the name implies that it is a test, even though that it resides > in src. Is this really proper? I have no idea whether it's really proper, but it allows us to check that instructions are encoded correctly by cross-checking with the system's assembler. There might well be a more hygienic way to do that, but I don't want to be without it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From fairoz.matte at oracle.com Wed Aug 19 12:30:47 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Wed, 19 Aug 2020 05:30:47 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> Message-ID: <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> Hi Vladimir, Thanks for the review. > I would suggest to run test with -XX:+PrintCodeCache flag which prints > CodeCache usage on exit. > > Also add '-ea -esa' flags - some runs failed with them because they increase > Graal's methods size. > > Running test with immediately caused OOM error on my local linux machine: > > '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - > XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > XX:+UseJVMCICompiler -Djvmci.Compiler=graal' > > With -XX:ReservedCodeCacheSize=30m I got: > > [11.217s][warning][codecache] CodeCache is full. Compiler has been > disabled. 
> [11.217s][warning][codecache] Try increasing the code cache size using - > XX:ReservedCodeCacheSize= > > With -XX:ReservedCodeCacheSize=50m I got this output: Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is the safe one to use. > > CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb > > May be you need to set it to 35m or better to 50m to be safe. > > Note, without Graal test uses only 5.5m: > > CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb > > ----------------------------- > > I also forgot to ask you to update test's Copyright year. I have updated the copyright year. Updated webrev for the reference - http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ Thanks, Fairoz > > Regards, > Vladimir K > > On 8/18/20 1:10 AM, Fairoz Matte wrote: > > Hi Vladimir, > > > > Thanks for looking into. > > This is intermittent crash, and is reproducible in windows debug build > environment. Below is the testing performed. > > > > 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- > XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > XX:+UseJVMCICompiler" > > 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- > XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > XX:+UseJVMCICompiler" > > > > Thanks, > > Fairoz > > > >> -----Original Message----- > >> From: Vladimir Kozlov > >> Sent: Monday, August 17, 2020 11:22 PM > >> To: Fairoz Matte ; hotspot-compiler- > >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net > >> Cc: Coleen Phillimore ; Dean Long > >> > >> Subject: Re: RFR(s): 8248295: > >> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with > >> Graal > >> > >> Hi Fairoz, > >> > >> How you determine that +10Mb is enough with Graal? > >> > >> Thanks, > >> Vladimir > >> > >> On 8/17/20 5:46 AM, Fairoz Matte wrote: > >>> Hi, > >>> > >>> > >>> > >>> Please review this small test change to work with Graal. > >>> > >>> > >>> > >>> Background: > >>> > >>> Graal require more code cache compared to c1/c2. but the test case > >>> always > >> set it to 20MB. This may not be sufficient when running graal. > >>> > >>> Default configuration for ReservedCodeCacheSize = 250MB > >>> > >>> With graal enabled, ReservedCodeCacheSize = 350MB > >>> > >>> > >>> > >>> Either we can modify the framework to honor ReservedCodeCacheSize > >>> for > >> graal or just update the testcase. > >>> > >>> There are not many test cases they rely on ReservedCodeCacheSize or > >> InitialCodeCacheSize. So the fix prefer the later one. > >>> > >>> > >>> > >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > >>> > >>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > >>> > >>> > >>> > >>> Thanks, > >>> > >>> Fairoz > >>> > >>> > >>> From adinn at redhat.com Wed Aug 19 13:01:44 2020 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 19 Aug 2020 14:01:44 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> Hi Ningsheng, On 19/08/2020 10:53, Ningsheng Jian wrote: > I have updated the patch based on the review comments. Would you mind > taking another look? Thanks! 
> > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.04/ > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ That looks ok. A few suggested tweaks: aarch64.ad:168 I think the following comment explains more clearly what is going on: // For SVE vector registers, we simply extend vector register size to 8 // 'logical' slots. This is nominally 256 bits but it actually covers // all possible 'physical' SVE vector register lengths from 128 ~ 2048 bits. // The 'physical' SVE vector register length is detected during startup // so the register allocator is able to identify the correct number of // bytes needed for an SVE spill/unspill. // Note that a vector register with 4 slots, denotes a 128-bit NEON // register allowing it to be distinguished from the // corresponding SVE vector register when the SVE vector length // is 128 bits. postaloc.cpp:312 & 322 311 if (lrgs(val_idx).is_scalable()) { 312 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); . . . 321 if (lrgs(val_idx).is_scalable()) { 322 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); You don't strictly need the asserts here as this is already asserted in the call to is_scalable(). > JTreg tests are still running, and so far no new failure found. Ok, well assuming they pass I am happy with this latest patch modulo the tweaks above. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From christian.hagedorn at oracle.com Wed Aug 19 14:06:57 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 19 Aug 2020 16:06:57 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> Message-ID: <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> On 18.08.20 17:41, Vladimir Kozlov wrote: > c1_Compilation.hpp: looks like both versions of allocator() do the same > thing. Right, I first wanted to have a public allocator() version in non-product only - but that might be over-engineered as they do the same thing. I changed it back to a single public version. > I suggest to build with configure --with-debug-level=optimized to check > that NOT_PRODUCT can be built with these changes. That's a good idea! I indeed forgot about one NOT_PRODUCT -> DEBUG_ONLY change. I also found other build issues with the optimized build. I filed [1] and already sent an RFR for it. It builds successfully with this patch on top of it. http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8252037 > Thanks, > Vladimir > > On 8/18/20 6:16 AM, Christian Hagedorn wrote: >> Hi Vladimir >> >> On 17.08.20 19:36, Vladimir Kozlov wrote: >>> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>>> Hi Vladimir >>>> >>>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>>> >>>> I'm wondering though if these ifdefs are even required for if-blocks >>>> inside methods? >>>> >>>> Isn't, for example, this if-block: >>>> >>>> #ifndef PRODUCT >>>> ???????? if (TraceLinearScanLevel >= 2) { >>>> ?????????? 
tty->print_cr("killing XMMs for trig"); >>>> ???????? } >>>> #endif >>>> >>>> removed anyways when the flag is set to < 2 (which is statically >>>> known and thus would allow this entire block to be removed)? Or does >>>> it make a difference by explicitly guarding it with an ifdef? >>> >>> You are right. It could be statically removed. But we keep #ifdef >>> sometimes to indicate that code is executed only in debug build >>> because we don't always remember type of a flag. >> >> I see, that makes sense. I updated my patch and left the ifdefs there >> but changed them to ASSERT. I also updated other ifdefs belonging to >> TraceLinearScanLevel appropriately. >> >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ >> >> Best regards, >> Christian >> >>> >>> Thanks, >>> Vladimir K >>> >>>> >>>> Best regards, >>>> Christian >>>> >>>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>>> >>>>> But the flag is available only in DEBUG build: >>>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>>> >>>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>>> Hi >>>>>> >>>>>> Please review the following enhancement for C1: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>>> >>>>>> While I was working on JDK-8249603 [1], I added some additional >>>>>> debugging and logging code which helped to figure out what was >>>>>> going on. I think it would be useful to have this code around for >>>>>> the analysis of future C1 register allocator bugs. >>>>>> >>>>>> This RFE adds (everything non-product code): >>>>>> - find_interval(number): Can be called like that from gdb anywhere >>>>>> to find an interval with the given number. >>>>>> - Interval::print_children()/print_parent(): Useful when debugging >>>>>> with gdb to quickly show the split children and parent. >>>>>> - LinearScan::print_reg_num(number): Prints the register or stack >>>>>> location for this register number. This is useful in some places >>>>>> (logging with TraceLinearScanLevel set) where it just printed a >>>>>> number which first had to be manually looked up in other logs. >>>>>> >>>>>> I additionally did some cleanup of the touched code. >>>>>> >>>>>> We could additionally split the TraceLinearScanLevel flag into >>>>>> separate flags related to the different phases of the register >>>>>> allocation algorithm. It currently just prints too much details on >>>>>> the higher levels. You often find yourself being interested in a >>>>>> specific part of the algorithm and only want to know more details >>>>>> there. To achieve that you now you have to either handle all the >>>>>> noise or manually disable/enable other logs. We could file an RFE >>>>>> to clean this up if it's worth the effort - given that there are >>>>>> not many new issues filed for C1 register allocation today. >>>>>> >>>>>> Thank you! 
>>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>> From vladimir.kozlov at oracle.com Wed Aug 19 16:38:24 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 09:38:24 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> Message-ID: <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> Looks good. Thanks, Vladimir K On 8/19/20 5:30 AM, Fairoz Matte wrote: > Hi Vladimir, > > Thanks for the review. > >> I would suggest to run test with -XX:+PrintCodeCache flag which prints >> CodeCache usage on exit. >> >> Also add '-ea -esa' flags - some runs failed with them because they increase >> Graal's methods size. >> >> Running test with immediately caused OOM error on my local linux machine: >> >> '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - >> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >> XX:+UseJVMCICompiler -Djvmci.Compiler=graal' >> >> With -XX:ReservedCodeCacheSize=30m I got: >> >> [11.217s][warning][codecache] CodeCache is full. Compiler has been >> disabled. >> [11.217s][warning][codecache] Try increasing the code cache size using - >> XX:ReservedCodeCacheSize= >> >> With -XX:ReservedCodeCacheSize=50m I got this output: > > Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is the safe one to use. > >> >> CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb >> >> May be you need to set it to 35m or better to 50m to be safe. >> >> Note, without Graal test uses only 5.5m: >> >> CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb >> >> ----------------------------- >> >> I also forgot to ask you to update test's Copyright year. > > I have updated the copyright year. > Updated webrev for the reference - http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ > > Thanks, > Fairoz >> >> Regards, >> Vladimir K >> >> On 8/18/20 1:10 AM, Fairoz Matte wrote: >>> Hi Vladimir, >>> >>> Thanks for looking into. >>> This is intermittent crash, and is reproducible in windows debug build >> environment. Below is the testing performed. >>> >>> 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- >> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >> XX:+UseJVMCICompiler" >>> 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- >> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >> XX:+UseJVMCICompiler" >>> >>> Thanks, >>> Fairoz >>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Monday, August 17, 2020 11:22 PM >>>> To: Fairoz Matte ; hotspot-compiler- >>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>> Cc: Coleen Phillimore ; Dean Long >>>> >>>> Subject: Re: RFR(s): 8248295: >>>> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with >>>> Graal >>>> >>>> Hi Fairoz, >>>> >>>> How you determine that +10Mb is enough with Graal? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 8/17/20 5:46 AM, Fairoz Matte wrote: >>>>> Hi, >>>>> >>>>> >>>>> >>>>> Please review this small test change to work with Graal. >>>>> >>>>> >>>>> >>>>> Background: >>>>> >>>>> Graal require more code cache compared to c1/c2. but the test case >>>>> always >>>> set it to 20MB. This may not be sufficient when running graal. 
>>>>> >>>>> Default configuration for ReservedCodeCacheSize = 250MB >>>>> >>>>> With graal enabled, ReservedCodeCacheSize = 350MB >>>>> >>>>> >>>>> >>>>> Either we can modify the framework to honor ReservedCodeCacheSize >>>>> for >>>> graal or just update the testcase. >>>>> >>>>> There are not many test cases they rely on ReservedCodeCacheSize or >>>> InitialCodeCacheSize. So the fix prefer the later one. >>>>> >>>>> >>>>> >>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 >>>>> >>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Fairoz >>>>> >>>>> >>>>> From vladimir.kozlov at oracle.com Wed Aug 19 16:43:08 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 09:43:08 -0700 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: Looks good. Thanks, Vladimir K On 8/19/20 7:06 AM, Christian Hagedorn wrote: > On 18.08.20 17:41, Vladimir Kozlov wrote: >> c1_Compilation.hpp: looks like both versions of allocator() do the same thing. > > Right, I first wanted to have a public allocator() version in non-product only - but that might be over-engineered as > they do the same thing. I changed it back to a single public version. > >> I suggest to build with configure --with-debug-level=optimized to check that NOT_PRODUCT can be built with these changes. > > That's a good idea! I indeed forgot about one NOT_PRODUCT -> DEBUG_ONLY change. I also found other build issues with the > optimized build. I filed [1] and already sent an RFR for it. It builds successfully with this patch on top of it. > > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ > > Best regards, > Christian > > [1] https://bugs.openjdk.java.net/browse/JDK-8252037 > >> Thanks, >> Vladimir >> >> On 8/18/20 6:16 AM, Christian Hagedorn wrote: >>> Hi Vladimir >>> >>> On 17.08.20 19:36, Vladimir Kozlov wrote: >>>> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>>>> Hi Vladimir >>>>> >>>>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>>>> >>>>> I'm wondering though if these ifdefs are even required for if-blocks inside methods? >>>>> >>>>> Isn't, for example, this if-block: >>>>> >>>>> #ifndef PRODUCT >>>>> ???????? if (TraceLinearScanLevel >= 2) { >>>>> ?????????? tty->print_cr("killing XMMs for trig"); >>>>> ???????? } >>>>> #endif >>>>> >>>>> removed anyways when the flag is set to < 2 (which is statically known and thus would allow this entire block to be >>>>> removed)? Or does it make a difference by explicitly guarding it with an ifdef? >>>> >>>> You are right. It could be statically removed. But we keep #ifdef sometimes to indicate that code is executed only >>>> in debug build because we don't always remember type of a flag. >>> >>> I see, that makes sense. I updated my patch and left the ifdefs there but changed them to ASSERT. I also updated >>> other ifdefs belonging to TraceLinearScanLevel appropriately. 
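Concretely, the guard change being described is of the following shape. This is only an illustration built from the snippet quoted earlier in the thread, not the webrev diff itself:

    // Before: guarded as "not product", even though TraceLinearScanLevel
    // is a develop flag, i.e. a compile-time constant 0 in product builds,
    // so the block is statically removed there anyway.
    #ifndef PRODUCT
      if (TraceLinearScanLevel >= 2) {
        tty->print_cr("killing XMMs for trig");
      }
    #endif

    // After: guarded as debug-only, matching where the flag actually exists.
    #ifdef ASSERT
      if (TraceLinearScanLevel >= 2) {
        tty->print_cr("killing XMMs for trig");
      }
    #endif
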
>>> >>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ >>> >>> Best regards, >>> Christian >>> >>>> >>>> Thanks, >>>> Vladimir K >>>> >>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>>>> >>>>>> But the flag is available only in DEBUG build: >>>>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>>>> >>>>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>>>> Hi >>>>>>> >>>>>>> Please review the following enhancement for C1: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>>>> >>>>>>> While I was working on JDK-8249603 [1], I added some additional debugging and logging code which helped to figure >>>>>>> out what was going on. I think it would be useful to have this code around for the analysis of future C1 register >>>>>>> allocator bugs. >>>>>>> >>>>>>> This RFE adds (everything non-product code): >>>>>>> - find_interval(number): Can be called like that from gdb anywhere to find an interval with the given number. >>>>>>> - Interval::print_children()/print_parent(): Useful when debugging with gdb to quickly show the split children >>>>>>> and parent. >>>>>>> - LinearScan::print_reg_num(number): Prints the register or stack location for this register number. This is >>>>>>> useful in some places (logging with TraceLinearScanLevel set) where it just printed a number which first had to >>>>>>> be manually looked up in other logs. >>>>>>> >>>>>>> I additionally did some cleanup of the touched code. >>>>>>> >>>>>>> We could additionally split the TraceLinearScanLevel flag into separate flags related to the different phases of >>>>>>> the register allocation algorithm. It currently just prints too much details on the higher levels. You often find >>>>>>> yourself being interested in a specific part of the algorithm and only want to know more details there. To >>>>>>> achieve that you now you have to either handle all the noise or manually disable/enable other logs. We could file >>>>>>> an RFE to clean this up if it's worth the effort - given that there are not many new issues filed for C1 register >>>>>>> allocation today. >>>>>>> >>>>>>> Thank you! >>>>>>> >>>>>>> Best regards, >>>>>>> Christian >>>>>>> >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>> From joserz at linux.ibm.com Wed Aug 19 16:53:38 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Wed, 19 Aug 2020 13:53:38 -0300 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: <20200819165338.GA978936@pacoca> On Wed, Aug 19, 2020 at 09:55:50AM +0000, Doerr, Martin wrote: > Hi Jose, > > thanks for the update. > > I have never seen 2 format specifications in the ad file. Does that work or does the 2nd one overwrite the 1st one? > I think it should be: > format %{ "BRH $dst, $src\n\t" > "EXTSH $dst, $dst" %} You're right, actually the 2nd one overwrote the first. I just fixed it. Thanks sir! > > I don't need to see another webrev for that. Otherwise, the change looks good. Thanks for contributing. 
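For readers without the webrev handy, the corrected short byte-reverse rule would look roughly like the sketch below, with the two format strings merged into one and a size() added. The operand classes, predicate name and pipe class here are assumptions; webrev.02 has the actual change:

    // Sketch only, not the webrev: reverse the bytes of a short.
    // brh byte-reverses the halfword, extsh then sign-extends the result.
    instruct bytes_reverse_short(iRegIdst dst, iRegIsrc src) %{
      match(Set dst (ReverseBytesS src));
      predicate(UseByteReverseInstructions);   // needs at least Power10
      size(8);                                 // two 4-byte instructions
      format %{ "BRH     $dst, $src\n\t"
                "EXTSH   $dst, $dst" %}
      ins_encode %{
        __ brh($dst$$Register, $src$$Register);
        __ extsh($dst$$Register, $dst$$Register);
      %}
      ins_pipe(pipe_class_default);
    %}
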
> > Best regards, > Martin > > > > -----Original Message----- > > From: joserz at linux.ibm.com > > Sent: Mittwoch, 19. August 2020 02:25 > > To: Doerr, Martin > > Cc: Michihiro Horie ; hotspot-compiler- > > dev at openjdk.java.net > > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > Hallo Martin! > > > > Thank you very much for your review. Here is the v3: > > > > Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > I run a functional test and it's working as expected. If you try to run it in a > > system > > > $ java -XX:+UseByteReverseInstructions ReverseBytes > > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > > but needs at least Power10. > > (continue with existing code) > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > ???????? > > > > This is the code I use to test: > > 8<--------------------------------------------------------------- > > import java.io.IOException; > > > > class ReverseBytes > > { > > public static void main(String[] args) throws IOException > > { > > for (int i = 0; i < 1000000; ++i) { > > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > > throw new RuntimeException(); > > } > > > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > > 0xF0DEBC9A78563412L) { > > throw new RuntimeException(); > > } > > > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > > throw new RuntimeException(); > > } > > > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > > throw new RuntimeException(); > > } > > } > > System.out.println("ok"); > > } > > } > > 8<--------------------------------------------------------------- > > > > Best regards! > > > > Jose > > > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > > Hi Michihiro and Jose, > > > > > > I had only done a quick review during my vacation. Thanks for updating the > > description of PowerArchitecturePPC64. > > > After taking a second look, I have a few minor requests. Sorry for that. > > > > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent with > > other names. > > > * Please add ?size? specifications to the ppc.ad file. Otherwise, the > > compiler has to determine sizes dynamically every time. > > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > So we rely on your testing. > > > > > > Thanks and best regards, > > > Martin > > > > > > > > > From: Michihiro Horie > > > Sent: Dienstag, 18. August 2020 09:28 > > > To: Doerr, Martin > > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > > > > > > Jose, > > > Latest change looks good also to me. > > > > > > Marin, > > > Do you think if I can push the change? > > > > > > Best regards, > > > Michihiro > > > > > > > > > ----- Original message ----- > > > From: "Doerr, Martin" > > > > > > To: "joserz at linux.ibm.com" > > > > > > Cc: hotspot compiler > dev at openjdk.java.net>, > > "horie at jp.ibm.com" > > > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > and use new byte-reverse instructions > > > Date: Wed, Jul 1, 2020 4:01 AM > > > > > > Thanks for the much better flag description. > > > Looks good. 
> > > > > > Best regards, > > > Martin > > > > > > > Am 30.06.2020 um 02:15 schrieb > > "joserz at linux.ibm.com" > > >: > > > > > > > > ?Hello team, > > > > > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > > > Thank you!! > > > > > > > > Jose > > > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > > >> Hi Jose, > > > >> > > > >> Can you replace the outdated description of PowerArchitecturePPC64 in > > globals_poc.hpp by something generic, please? > > > >> > > > >> Please update the Copyright year in vm_version_poc.hpp. > > > >> > > > >> I can?t test the change, but it looks good to me. > > > >> > > > >> Best regards, > > > >> Martin > > > >> > > > >>>> Am 26.06.2020 um 20:29 schrieb > > "joserz at linux.ibm.com" > > >: > > > >>> > > > >>> ?Hello team! > > > >>> > > > >>> This patch introduces Power10 to OpenJDK and implements three new > > instructions: > > > >>> - brh - byte-reverse halfword > > > >>> - brw - byte-reverse word > > > >>> - brd - byte-reverse doubleword > > > >>> > > > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > >>> > > > >>> Thanks for your review! > > > >>> > > > >>> Jose R. Ziviani > > > From cjashfor at linux.ibm.com Wed Aug 19 18:10:50 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 19 Aug 2020 11:10:50 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Michihiro Horie posted up a new iteration of this webrev for me. This time the webrev includes a complete implementation of the intrinsic for Power9 and Power10. You can find it here: http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ Changes in webrev.02 vs. webrev.01: * The method header for the intrinsic in the Base64 code has been rewritten using the Javadoc style. The clarity of the comments has been improved and some verbosity has been removed. There are no additional functional changes to Base64.java. * The code needed to martial and check the intrinsic parameters has been added, using the base64 encodeBlock intrinsic as a guideline. * A complete intrinsic implementation for Power9 and Power10 is included. * Adds some Power9 and Power10 assembler instructions needed by the intrinsic which hadn't been defined before. The intrinsic implementation in this patch accelerates the decoding of large blocks of base64 data by a factor of about 3.5X on Power9. I'm attaching two Java test cases I am using for testing and benchmarking. The TestBase64_VB encodes and decodes randomly-sized buffers of random data and checks that original data matches the encoded-then-decoded data. TestBase64Errors encodes a 48K block of random bytes, then corrupts each byte of the encoded data, one at a time, checking to see if the decoder catches the illegal byte. Any comments/suggestions would be appreciated. 
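The attached tests are not preserved in this archive; a minimal round-trip check in the spirit of the described TestBase64_VB might look like the following. This is an illustration only, not the attachment:

    import java.util.Arrays;
    import java.util.Base64;
    import java.util.Random;

    public class Base64RoundTrip {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            Base64.Encoder enc = Base64.getEncoder();
            Base64.Decoder dec = Base64.getDecoder();
            for (int i = 0; i < 100_000; i++) {
                byte[] src = new byte[rnd.nextInt(512)];    // random length
                rnd.nextBytes(src);                         // random content
                // decode path that the proposed intrinsic accelerates
                byte[] back = dec.decode(enc.encode(src));
                if (!Arrays.equals(src, back)) {
                    throw new RuntimeException("mismatch at iteration " + i);
                }
            }
            System.out.println("ok");
        }
    }
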
Thanks, - Corey On 7/27/20 6:49 PM, Corey Ashford wrote: > Michihiro Horie uploaded a new revision of the Base64 decodeBlock > intrinsic API for me: > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ > > It has the following changes with respect to the original one posted: > > ?* In the event of encountering a non-base64 character, instead of > having a separate error code of -1, the intrinsic can now just return > either 0, or the number of data bytes produced up to the point where the > illegal base64 character was encountered.? This reduces the number of > special cases, and also provides a way to speed up the process of > finding the bad character by the slower, pure-Java algorithm. > > ?* The isMIME boolean is removed from the API for two reasons: > ?? - The current API is not sufficient to handle the isMIME case, > because there isn't a strict relationship between the number of input > bytes and the number of output bytes, because there can be an arbitrary > number of non-base64 characters in the source. > ?? - If an intrinsic only implements the (isMIME == false) case as ours > does, it will always return 0 bytes processed, which will slightly slow > down the normal path of processing an (isMIME == true) instantiation. > ?? - We considered adding a separate hotspot candidate for the (isMIME > == true) case, but since we don't have an intrinsic implementation to > test that, we decided to leave it as a future optimization. > > Comments and suggestions are welcome.? Thanks for your consideration. > > - Corey > > On 6/23/20 6:23 PM, Michihiro Horie wrote: >> Hi Corey, >> >> Following is the issue I created. >> https://bugs.openjdk.java.net/browse/JDK-8248188 >> >> I will upload a webrev when you're ready as we talked in private. >> >> Best regards, >> Michihiro >> >> Inactive hide details for "Corey Ashford" ---2020/06/24 >> 09:40:10---Currently in java.util.Base64, there is a >> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently >> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for >> encodeBlock, but no >> >> From: "Corey Ashford" >> To: "hotspot-compiler-dev at openjdk.java.net" >> , >> "ppc-aix-port-dev at openjdk.java.net" >> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori Ogata/Japan/IBM at IBMJP, >> joserz at br.ibm.com >> Date: 2020/06/24 09:40 >> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >> Base64 decoding >> >> ------------------------------------------------------------------------ >> >> >> >> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and >> API for encodeBlock, but none for decoding. ?This means that only >> encoding gets acceleration from the underlying CPU's vector hardware. >> >> I'd like to propose adding a new intrinsic for decodeBlock. ?The >> considerations I have for this new intrinsic's API: >> >> ??* Don't make any assumptions about the underlying capability of the >> hardware. ?For example, do not impose any specific block size >> granularity. >> >> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >> modes, but also let them decide if they will process the data regardless >> of the settings of the two booleans. >> >> ??* Any remaining data that is not processed by the intrinsic will be >> processed by the pure Java implementation. ?This allows the intrinsic to >> process whatever block sizes it's good at without the complexity of >> handling the end fragments. 
>> >> ??* If any illegal character is discovered in the decoding process, the >> intrinsic will simply return -1, instead of requiring it to throw a >> proper exception from the context of the intrinsic. ?In the event of >> getting a -1 returned from the intrinsic, the Java Base64 library code >> simply calls the pure Java implementation to have it find the error and >> properly throw an exception. ?This is a performance trade-off in the >> case of an error (which I expect to be very rare). >> >> ??* One thought I have for a further optimization (not implemented in >> the current patch), is that when the intrinsic decides not to process a >> block because of some combination of isURL and isMIME settings it >> doesn't handle, it could return extra bits in the return code, encoded >> as a negative number. ?For example: >> >> Illegal_Base64_char ? = 0b001; >> isMIME_unsupported ? ?= 0b010; >> isURL_unsupported ? ? = 0b100; >> >> These can be OR'd together as needed and then negated (flip the sign). >> The Base64 library code could then cache these flags, so it will know >> not to call the intrinsic again when another decodeBlock is requested >> but with an unsupported mode. ?This will save the performance hit of >> calling the intrinsic when it is guaranteed to fail. >> >> I've tested the attached patch with an actual intrinsic coded up for >> Power9/Power10, but those runtime intrinsics and arch-specific patches >> aren't attached today. ?I want to get some consensus on the >> library-level intrinsic API first. >> >> Also attached is a simple test case to test that the new intrinsic API >> doesn't break anything. >> >> I'm open to any comments about this. >> >> Thanks for your consideration, >> >> - Corey >> >> >> Corey Ashford >> IBM Systems, Linux Technology Center, OpenJDK team >> cjashfor at us dot ibm dot com >> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro >> Horie/Japan/IBM] >> >> > From doug.simon at oracle.com Wed Aug 19 19:16:54 2020 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 19 Aug 2020 21:16:54 +0200 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() In-Reply-To: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> References: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> Looks good to me. -Doug > On 19 Aug 2020, at 10:37, Nick Gasson wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 > Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ > > We see this crash occasionally when testing with Graal on some AArch64 > systems: > > # > # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 > # assert(external_guard || result != __null) failed: Invalid JNI handle > # > > V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c > V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 > V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 > V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c > V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc > > The full hs_err file is attached to the JBS entry. 
> > The handle here is _HotSpotJVMCIRuntime_instance which is initialised in > JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): > > JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); > _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); > > JVMCICompiler::force_comp_at_level_simple() checks whether the _object > field inside the handle is null before calling JNIHandles::resolve() on > it, which should avoid the above assertion failure where the pointee is > null. However on a non-TSO architecture another thread may observe the > store to _object when assigning _HotSpotJVMCIRuntime_instance before the > store in JVMCIEnv::make_global() that initialises the pointed-to oop. We > need to add a store-store barrier here to force the expected ordering. > > Tested with jcstress and Graal on the affected machine, which used to > reproduce it quite reliably. > > -- > Thanks, > Nick From vladimir.kozlov at oracle.com Wed Aug 19 19:18:45 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 12:18:45 -0700 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() In-Reply-To: <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> References: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> Message-ID: <83e818a0-b9d2-205b-6a25-4869fc1e2101@oracle.com> +1 Thanks, Vladimir K On 8/19/20 12:16 PM, Doug Simon wrote: > Looks good to me. > > -Doug > >> On 19 Aug 2020, at 10:37, Nick Gasson wrote: >> >> Hi, >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 >> Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ >> >> We see this crash occasionally when testing with Graal on some AArch64 >> systems: >> >> # >> # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 >> # assert(external_guard || result != __null) failed: Invalid JNI handle >> # >> >> V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c >> V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 >> V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 >> V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c >> V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc >> >> The full hs_err file is attached to the JBS entry. >> >> The handle here is _HotSpotJVMCIRuntime_instance which is initialised in >> JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): >> >> JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); >> _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); >> >> JVMCICompiler::force_comp_at_level_simple() checks whether the _object >> field inside the handle is null before calling JNIHandles::resolve() on >> it, which should avoid the above assertion failure where the pointee is >> null. However on a non-TSO architecture another thread may observe the >> store to _object when assigning _HotSpotJVMCIRuntime_instance before the >> store in JVMCIEnv::make_global() that initialises the pointed-to oop. We >> need to add a store-store barrier here to force the expected ordering. >> >> Tested with jcstress and Graal on the affected machine, which used to >> reproduce it quite reliably. 
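In code form, the fix described above amounts to something like the following fragment of initialize_HotSpotJVMCIRuntime(). The local variable and exact barrier placement are illustrative; the pushed change is in the webrev:

    JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK);
    JVMCIObject instance = JVMCIENV->make_global(result);  // initialises the pointed-to oop
    // Make sure the oop store inside make_global() is visible before the
    // handle is published to other threads (matters on non-TSO targets).
    OrderAccess::storestore();
    _HotSpotJVMCIRuntime_instance = instance;
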
>> >> -- >> Thanks, >> Nick > From serguei.spitsyn at oracle.com Wed Aug 19 20:14:56 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 19 Aug 2020 13:14:56 -0700 Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> Message-ID: <6f104422-11cc-1bea-2ebf-a916a22f10fd@oracle.com> Hi Fairoz, LGTM++ Thanks, Serguei On 8/19/20 09:38, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir K > > On 8/19/20 5:30 AM, Fairoz Matte wrote: >> Hi Vladimir, >> >> Thanks for the review. >> >>> I would suggest to run test with -XX:+PrintCodeCache flag which prints >>> CodeCache usage on exit. >>> >>> Also add '-ea -esa' flags - some runs failed with them because they >>> increase >>> Graal's methods size. >>> >>> Running test with immediately caused OOM error on my local linux >>> machine: >>> >>> '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >>> XX:+UseJVMCICompiler -Djvmci.Compiler=graal' >>> >>> With -XX:ReservedCodeCacheSize=30m I got: >>> >>> [11.217s][warning][codecache] CodeCache is full. Compiler has been >>> disabled. >>> [11.217s][warning][codecache] Try increasing the code cache size >>> using - >>> XX:ReservedCodeCacheSize= >>> >>> With -XX:ReservedCodeCacheSize=50m I got this output: >> >> Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is >> the safe one to use. >> >>> >>> CodeCache: size=51200Kb used=34401Kb max_used=34401Kb free=16798Kb >>> >>> May be you need to set it to 35m or better to 50m to be safe. >>> >>> Note, without Graal test uses only 5.5m: >>> >>> CodeCache: size=20480Kb used=5677Kb max_used=5688Kb free=14803Kb >>> >>> ----------------------------- >>> >>> I also forgot to ask you to update test's Copyright year. >> >> I have updated the copyright year. >> Updated webrev for the reference - >> http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ >> >> Thanks, >> Fairoz >>> >>> Regards, >>> Vladimir K >>> >>> On 8/18/20 1:10 AM, Fairoz Matte wrote: >>>> Hi Vladimir, >>>> >>>> Thanks for looking into. >>>> This is intermittent crash, and is reproducible in windows debug build >>> environment. Below is the testing performed. >>>> >>>> 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >>> XX:+UseJVMCICompiler" >>>> 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - >>> XX:+UseJVMCICompiler" >>>> >>>> Thanks, >>>> Fairoz >>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov >>>>> Sent: Monday, August 17, 2020 11:22 PM >>>>> To: Fairoz Matte ; hotspot-compiler- >>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >>>>> Cc: Coleen Phillimore ; Dean Long >>>>> >>>>> Subject: Re: RFR(s): 8248295: >>>>> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with >>>>> Graal >>>>> >>>>> Hi Fairoz, >>>>> >>>>> How you determine that +10Mb is enough with Graal? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 8/17/20 5:46 AM, Fairoz Matte wrote: >>>>>> Hi, >>>>>> >>>>>> >>>>>> >>>>>> Please review this small test change to work with Graal. 
>>>>>> >>>>>> >>>>>> >>>>>> Background: >>>>>> >>>>>> Graal require more code cache compared to c1/c2. but the test case >>>>>> always >>>>> set it to 20MB. This may not be sufficient when running graal. >>>>>> >>>>>> Default configuration for ReservedCodeCacheSize = 250MB >>>>>> >>>>>> With graal enabled, ReservedCodeCacheSize = 350MB >>>>>> >>>>>> >>>>>> >>>>>> Either we can modify the framework to honor ReservedCodeCacheSize >>>>>> for >>>>> graal or just update the testcase. >>>>>> >>>>>> There are not many test cases they rely on ReservedCodeCacheSize or >>>>> InitialCodeCacheSize. So the fix prefer the later one. >>>>>> >>>>>> >>>>>> >>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 >>>>>> >>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Fairoz >>>>>> >>>>>> >>>>>> From mikael.vidstedt at oracle.com Wed Aug 19 22:14:21 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Wed, 19 Aug 2020 15:14:21 -0700 Subject: RFR(XS): 8252051: Make mlvmJvmtiUtils strncpy uses GCC 10.x friendly Message-ID: Please review this small change which updates the strncpy code in mlvmJvmtiUtils.cpp to make gcc 10.x happy: JBS: https://bugs.openjdk.java.net/browse/JDK-8252051 webrev: http://cr.openjdk.java.net/~mikael/webrevs/8252051/webrev.00/open/webrev/ * Background (from JBS) gcc 10.2 is producing a warning for mlvmJmvtiUtils.cpp: In file included from test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/func/jvmti/share/libIndyRedefineClass.cpp:31: test/hotspot/jtreg/vmTestbase/vm/mlvm/share/mlvmJvmtiUtils.cpp:100:12: error: 'char* strncpy(char*, const char*, size_t)' specified bound 256 equals destination size [-Werror=stringop-truncation] 100 | strncpy(mn->classSig, szSignature, sizeof(mn->classSig)); | ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors It seems like gcc is not smart enough to realize that the strncpy on the previous line (mn->methodName) cannot modify szSignature. * Testing tier1 and test/hotspot/jtreg:vmTestbase_vm_mlvm locally Cheers, Mikael From igor.ignatyev at oracle.com Wed Aug 19 22:25:12 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 19 Aug 2020 15:25:12 -0700 Subject: RFR(XS): 8252051: Make mlvmJvmtiUtils strncpy uses GCC 10.x friendly In-Reply-To: References: Message-ID: LGTM -- Igor > On Aug 19, 2020, at 3:14 PM, Mikael Vidstedt wrote: > > > Please review this small change which updates the strncpy code in mlvmJvmtiUtils.cpp to make gcc 10.x happy: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252051 > webrev: http://cr.openjdk.java.net/~mikael/webrevs/8252051/webrev.00/open/webrev/ > > * Background (from JBS) > > gcc 10.2 is producing a warning for mlvmJmvtiUtils.cpp: > > In file included from test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/func/jvmti/share/libIndyRedefineClass.cpp:31: > test/hotspot/jtreg/vmTestbase/vm/mlvm/share/mlvmJvmtiUtils.cpp:100:12: error: 'char* strncpy(char*, const char*, size_t)' specified bound 256 equals destination size [-Werror=stringop-truncation] > 100 | strncpy(mn->classSig, szSignature, sizeof(mn->classSig)); > | ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > cc1plus: all warnings being treated as errors > > It seems like gcc is not smart enough to realize that the strncpy on the previous line (mn->methodName) cannot modify szSignature. 
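The webrev itself is not inlined here; the usual way to keep such a copy truncation-safe and silence -Wstringop-truncation is to bound the copy one byte short of the destination and terminate explicitly. A sketch of that general pattern, not necessarily the exact change:

    // Leave room for the terminator and add it explicitly, so the
    // destination is always NUL-terminated and gcc 10 stays quiet.
    strncpy(mn->classSig, szSignature, sizeof(mn->classSig) - 1);
    mn->classSig[sizeof(mn->classSig) - 1] = '\0';
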
> > > * Testing > > tier1 and test/hotspot/jtreg:vmTestbase_vm_mlvm locally > > > Cheers, > Mikael > From vladimir.kozlov at oracle.com Wed Aug 19 22:59:55 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2020 15:59:55 -0700 Subject: RFR(XS): 8252051: Make mlvmJvmtiUtils strncpy uses GCC 10.x friendly In-Reply-To: References: Message-ID: <6c1007ab-2a92-769f-688f-b123324d5d5b@oracle.com> +1 Vladimir K On 8/19/20 3:25 PM, Igor Ignatyev wrote: > LGTM > -- Igor > >> On Aug 19, 2020, at 3:14 PM, Mikael Vidstedt wrote: >> >> >> Please review this small change which updates the strncpy code in mlvmJvmtiUtils.cpp to make gcc 10.x happy: >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252051 >> webrev: http://cr.openjdk.java.net/~mikael/webrevs/8252051/webrev.00/open/webrev/ >> >> * Background (from JBS) >> >> gcc 10.2 is producing a warning for mlvmJmvtiUtils.cpp: >> >> In file included from test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/func/jvmti/share/libIndyRedefineClass.cpp:31: >> test/hotspot/jtreg/vmTestbase/vm/mlvm/share/mlvmJvmtiUtils.cpp:100:12: error: 'char* strncpy(char*, const char*, size_t)' specified bound 256 equals destination size [-Werror=stringop-truncation] >> 100 | strncpy(mn->classSig, szSignature, sizeof(mn->classSig)); >> | ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> cc1plus: all warnings being treated as errors >> >> It seems like gcc is not smart enough to realize that the strncpy on the previous line (mn->methodName) cannot modify szSignature. >> >> >> * Testing >> >> tier1 and test/hotspot/jtreg:vmTestbase_vm_mlvm locally >> >> >> Cheers, >> Mikael >> > From serguei.spitsyn at oracle.com Wed Aug 19 23:22:08 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 19 Aug 2020 16:22:08 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase In-Reply-To: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> References: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> Message-ID: <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> Hi Igor, This looks reasonable. Thanks, Serguei On 8/18/20 16:42, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >> 0 lines changed: 0 ins; 0 del; 0 mod; > Hi all, > > could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? 
> > (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) > > webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 > > Thanks, > -- Igor > > From john.r.rose at oracle.com Thu Aug 20 00:47:02 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 19 Aug 2020 17:47:02 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87h7t13bdz.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> On Aug 17, 2020, at 1:49 AM, Roland Westrelin wrote: > > Does the fixed patch look ok to you? I?m going over it one more time (between smoky breaths from the California fires) and I have a question. What is the exact structure of outer_phi? 1. At first it is a clone of phi, its region patched and the other edges the same: outer_phi := Phi(outer_head, init, AddL(phi, stride)) 2. After long_loop_replace_long_iv, the interior phi links it back to itself: outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) The thing I?m suspicious of (as fragile code) is the retention of the addend ?stride?. The inner loop (on ?inner_phi?) *also* adds the stride. What prevents there from being a superfluous number of strides added? I suppose the answer is that the ending output of the inner loop produces the post-incremented value as (inner_incr + outer_phi), while the pre-incremented value is (inner_phi + outer_phi), which is *always* short of the final count by the stride; thus it?s OK to add single missing stride in the outer loop. But, this seems fragile to me. Would it not be safer to make the outer phi just copy the final inner IV value (the one that fails the loop?s test)? So: outer_phi := Phi(outer_head, init, AddL(inner_incr, outer_phi)) I?m worried that some edge-case of loop loop might actually miss a stride unless the outer phi has the latter, dead-simple form. In particular, there are loops where there is only one IV (no separate incr). I?m not confident that those loops will work correctly; it seems to me that the existing ?outer_phi?, with its extra stride addend, may well contribute an unwanted extra step, when the inner_phi is post-incremented. As a related issue, I think the pseudocode comment at the top is false, with the same problem. It should probably not say this: // L1: for (long phi1 = init; phi1 < limit; phi1 += stride) { // // phi1 := Phi(L1, init, phi1 + stride) but rather this: // L1: for (long phi1 = init; phi1 < limit; phi1 += phi2) { // // phi1 := Phi(L1, init, phi1 + phi2) This sort of bug will show up if (a) we test long loops with large trip counts, and (b) also use the stress mode which makes the outer loop trip two or three times, and finally (c) we get several kinds of loops; ones with and without phi == incr, and with and without ?limit_check_required?, and with each kind of possible termination condition (< <= > >=). ? John P.S. 
I came to this question while working through the transform logic on pseudocode. Here it is, for reference. It think it might make a good diagram to place in the code, just before the comment that says ?Peel one iteration?. From john.r.rose at oracle.com Thu Aug 20 00:47:46 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 19 Aug 2020 17:47:46 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> Message-ID: On Aug 19, 2020, at 5:47 PM, John Rose wrote: > > On Aug 17, 2020, at 1:49 AM, Roland Westrelin > wrote: >> >> Does the fixed patch look ok to you? > > I?m going over it one more time (between smoky breaths from > the California fires) and I have a question. What is the exact > structure of outer_phi? > > 1. At first it is a clone of phi, its region patched and the other edges the same: > > outer_phi := Phi(outer_head, init, AddL(phi, stride)) > > 2. After long_loop_replace_long_iv, the interior phi links it back to itself: > > outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) > > The thing I?m suspicious of (as fragile code) is the retention of the addend > ?stride?. The inner loop (on ?inner_phi?) *also* adds the stride. What > prevents there from being a superfluous number of strides added? > > I suppose the answer is that the ending output of the inner loop produces > the post-incremented value as (inner_incr + outer_phi), while the > pre-incremented value is (inner_phi + outer_phi), which is *always* > short of the final count by the stride; thus it?s OK to add single missing > stride in the outer loop. > > But, this seems fragile to me. Would it not be safer to make the outer > phi just copy the final inner IV value (the one that fails the loop?s test)? > So: > > outer_phi := Phi(outer_head, init, AddL(inner_incr, outer_phi)) > > I?m worried that some edge-case of loop loop might actually miss > a stride unless the outer phi has the latter, dead-simple form. > > In particular, there are loops where there is only one IV (no separate > incr). I?m not confident that those loops will work correctly; it seems > to me that the existing ?outer_phi?, with its extra stride addend, > may well contribute an unwanted extra step, when the inner_phi > is post-incremented. > > As a related issue, I think the pseudocode comment at the top is > false, with the same problem. 
It should probably not say this: > > // L1: for (long phi1 = init; phi1 < limit; phi1 += stride) { > // // phi1 := Phi(L1, init, phi1 + stride) > > but rather this: > > // L1: for (long phi1 = init; phi1 < limit; phi1 += phi2) { > // // phi1 := Phi(L1, init, phi1 + phi2) > > This sort of bug will show up if (a) we test long loops with > large trip counts, and (b) also use the stress mode which > makes the outer loop trip two or three times, and finally > (c) we get several kinds of loops; ones with and without > phi == incr, and with and without ?limit_check_required?, > and with each kind of possible termination condition > (< <= > >=). > > ? John > > P.S. I came to this question while working through the transform > logic on pseudocode. Here it is, for reference. It think it might > make a good diagram to place in the code, just before the comment > that says ?Peel one iteration?. == old IR nodes => entry_control: {...} x: for (long phi = init;;) { // phi := Phi(x, init, phi + stride) exit_test: if (phi < limit) back_control: fallthrough; else exit_branch: break; // test happens after increment => phi == phi_incr != NULL long incr = (phi + stride); ... use phi and incr ... phi = incr; } == new IR nodes (before final peel) => entry_control: {...} long adjusted_limit = limit + stride; //because phi_incr != NULL assert(!limit_check_required || (extralong)limit + stride == adjusted_limit); // else deopt ulong inner_iters_limit = max_jint - ABS(stride) - 1; //near 0x7FFFFFF0 outer_head: for (long outer_phi = init;;) { //phi->clone(), in(0):=outer_head // outer_phi := Phi(outer_head, init, inner_phi, phi=>(outer_phi+inner_phi) + stride) // >>> ISSUE: is the extra '+ stride' here always harmless? <<< ulong inner_iters_max = (ulong) MAX(0LL, ((extralong)adjusted_limit - outer_phi) * SGN(stride)); int inner_iters_actual_int = (int) MIN(inner_iters_limit, inner_iters_max) * SGN(stride); inner_head: x: //in(1) := outer_head for (int inner_phi = 0;;) { // inner_phi := Phi(x, intcon(0), inner_phi + stride) int inner_incr = inner_phi + stride; bool inner_bol = (inner_incr < inner_iters_actual_int); exit_test: //exit_test->in(1) := inner_bol; if (inner_bol) // WAS (phi < limit) back_control: fallthrough; else inner_exit_branch: break; //exit_branch->clone() // REPLACE phi => (outer_phi+inner_phi) // REPLACE incr => (outer_phi+inner_incr) ... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ... 
inner_phi = inner_phi + stride; // inner_incr } outer_exit_test: //exit_test->clone(), in(0):=inner_exit_branch if ((outer_phi+inner_phi) < limit) // WAS (phi < limit) outer_back_branch: fallthrough; //back_control->clone(), in(0):=outer_exit_test else exit_branch: break; //in(0) := outer_exit_test } From ningsheng.jian at arm.com Thu Aug 20 02:27:08 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Thu, 20 Aug 2020 10:27:08 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> Message-ID: <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> Hi Andrew, On 8/19/20 9:01 PM, Andrew Dinn wrote: > Hi Ningsheng, > > On 19/08/2020 10:53, Ningsheng Jian wrote: >> I have updated the patch based on the review comments. Would you mind >> taking another look? Thanks! >> >> Full: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04/ >> >> Incremental: >> http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ > > That looks ok. A few suggested tweaks: > Thanks! > aarch64.ad:168 > > I think the following comment explains more clearly what is going on: > > // For SVE vector registers, we simply extend vector register size to 8 > // 'logical' slots. This is nominally 256 bits but it actually covers > // all possible 'physical' SVE vector register lengths from 128 ~ 2048 bits. > // The 'physical' SVE vector register length is detected during startup > // so the register allocator is able to identify the correct number of > // bytes needed for an SVE spill/unspill. > // Note that a vector register with 4 slots, denotes a 128-bit NEON > // register allowing it to be distinguished from the > // corresponding SVE vector register when the SVE vector length > // is 128 bits. > This looks better than mine. Thanks! :-) > postaloc.cpp:312 & 322 > > 311 if (lrgs(val_idx).is_scalable()) { > 312 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); > > . . . > > 321 if (lrgs(val_idx).is_scalable()) { > 322 assert(val->ideal_reg() == Op_VecA, "scalable vector register"); > > You don't strictly need the asserts here as this is already asserted in > the call to is_scalable(). > The assertion in LRG::is_scalable() is different, while this is an assertion for ideal_reg of a given node. > >> JTreg tests are still running, and so far no new failure found. > Ok, well assuming they pass I am happy with this latest patch modulo the > tweaks above. > Will report back once the tests on real hardware passed. Thanks, Ningsheng From nick.gasson at arm.com Thu Aug 20 03:26:59 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 11:26:59 +0800 Subject: RFR(S): 8251923: "Invalid JNI handle" assertion failure in JVMCICompiler::force_comp_at_level_simple() In-Reply-To: <83e818a0-b9d2-205b-6a25-4869fc1e2101@oracle.com> References: <85pn7nxc8z.fsf@nicgas01-pc.shanghai.arm.com> <6E15607C-D983-4645-86DB-115BDB7F563E@oracle.com> <83e818a0-b9d2-205b-6a25-4869fc1e2101@oracle.com> Message-ID: <85eeo2xaik.fsf@nicgas01-pc.shanghai.arm.com> Thank you both for the reviews. I've pushed it. 
-- Nick On 08/20/20 03:18 am, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 8/19/20 12:16 PM, Doug Simon wrote: >> Looks good to me. >> >> -Doug >> >>> On 19 Aug 2020, at 10:37, Nick Gasson wrote: >>> >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8251923 >>> Webrev: http://cr.openjdk.java.net/~ngasson/8251923/webrev.1/ >>> >>> We see this crash occasionally when testing with Graal on some AArch64 >>> systems: >>> >>> # >>> # Internal Error (/home/ent-user/jdk_src/src/hotspot/share/runtime/jniHandles.inline.hpp:63), pid=92161, tid=92593 >>> # assert(external_guard || result != __null) failed: Invalid JNI handle >>> # >>> >>> V [libjvm.so+0xdfaa84] JNIHandles::resolve(_jobject*)+0x19c >>> V [libjvm.so+0xf25104] HotSpotJVMCI::resolve(JVMCIObject)+0x14 >>> V [libjvm.so+0xe9bd20] JVMCICompiler::force_comp_at_level_simple(methodHandle const&)+0xa0 >>> V [libjvm.so+0x174bd6c] TieredThresholdPolicy::is_mature(Method*)+0x51c >>> V [libjvm.so+0x76e68c] ciMethodData::load_data()+0x9cc >>> >>> The full hs_err file is attached to the JBS entry. >>> >>> The handle here is _HotSpotJVMCIRuntime_instance which is initialised in >>> JVMCIRuntime::initialize_HotSpotJVMCIRuntime(): >>> >>> JVMCIObject result = JVMCIENV->call_HotSpotJVMCIRuntime_runtime(JVMCI_CHECK); >>> _HotSpotJVMCIRuntime_instance = JVMCIENV->make_global(result); >>> >>> JVMCICompiler::force_comp_at_level_simple() checks whether the _object >>> field inside the handle is null before calling JNIHandles::resolve() on >>> it, which should avoid the above assertion failure where the pointee is >>> null. However on a non-TSO architecture another thread may observe the >>> store to _object when assigning _HotSpotJVMCIRuntime_instance before the >>> store in JVMCIEnv::make_global() that initialises the pointed-to oop. We >>> need to add a store-store barrier here to force the expected ordering. >>> >>> Tested with jcstress and Graal on the affected machine, which used to >>> reproduce it quite reliably. >>> >>> -- >>> Thanks, >>> Nick >> From fairoz.matte at oracle.com Thu Aug 20 03:39:51 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Wed, 19 Aug 2020 20:39:51 -0700 (PDT) Subject: RFR(s): 8248295: serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal In-Reply-To: <6f104422-11cc-1bea-2ebf-a916a22f10fd@oracle.com> References: <1bdcbd35-097e-1681-3a0c-32f9709497a4@oracle.com> <7040f785-b871-9771-94a2-4c3472a6bf6d@oracle.com> <59cd0914-5a61-463e-b46f-ebdc1496ab9f@default> <1b7f5767-7d1f-1f43-87bb-556801ef1c41@oracle.com> <6f104422-11cc-1bea-2ebf-a916a22f10fd@oracle.com> Message-ID: <94f5c0a2-f324-4613-abbd-68c4d7df6f52@default> Thanks Vladimir and Serguei for the reviews. Thanks, Fairoz > -----Original Message----- > From: Serguei Spitsyn > Sent: Thursday, August 20, 2020 1:45 AM > To: Vladimir Kozlov ; Fairoz Matte > ; hotspot-compiler-dev at openjdk.java.net; > serviceability-dev at openjdk.java.net > Cc: Coleen Phillimore > Subject: Re: RFR(s): 8248295: > serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with Graal > > Hi Fairoz, > > LGTM++ > > Thanks, > Serguei > > > On 8/19/20 09:38, Vladimir Kozlov wrote: > > Looks good. > > > > Thanks, > > Vladimir K > > > > On 8/19/20 5:30 AM, Fairoz Matte wrote: > >> Hi Vladimir, > >> > >> Thanks for the review. > >> > >>> I would suggest to run test with -XX:+PrintCodeCache flag which > >>> prints CodeCache usage on exit. > >>> > >>> Also add '-ea -esa' flags - some runs failed with them because they > >>> increase Graal's methods size. 
> >>> > >>> Running test with immediately caused OOM error on my local linux > >>> machine: > >>> > >>> '-server -ea -esa -XX:+TieredCompilation -XX:+PrintCodeCache - > >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > >>> XX:+UseJVMCICompiler -Djvmci.Compiler=graal' > >>> > >>> With -XX:ReservedCodeCacheSize=30m I got: > >>> > >>> [11.217s][warning][codecache] CodeCache is full. Compiler has been > >>> disabled. > >>> [11.217s][warning][codecache] Try increasing the code cache size > >>> using - XX:ReservedCodeCacheSize= > >>> > >>> With -XX:ReservedCodeCacheSize=50m I got this output: > >> > >> Further testing with PrintCodeCache, ReservedCodeCacheSize = 50MB is > >> the safe one to use. > >> > >>> > >>> CodeCache: size=51200Kb used=34401Kb max_used=34401Kb > free=16798Kb > >>> > >>> May be you need to set it to 35m or better to 50m to be safe. > >>> > >>> Note, without Graal test uses only 5.5m: > >>> > >>> CodeCache: size=20480Kb used=5677Kb max_used=5688Kb > free=14803Kb > >>> > >>> ----------------------------- > >>> > >>> I also forgot to ask you to update test's Copyright year. > >> > >> I have updated the copyright year. > >> Updated webrev for the reference - > >> http://cr.openjdk.java.net/~fmatte/8248295/webrev.01/ > >> > >> Thanks, > >> Fairoz > >>> > >>> Regards, > >>> Vladimir K > >>> > >>> On 8/18/20 1:10 AM, Fairoz Matte wrote: > >>>> Hi Vladimir, > >>>> > >>>> Thanks for looking into. > >>>> This is intermittent crash, and is reproducible in windows debug > >>>> build > >>> environment. Below is the testing performed. > >>>> > >>>> 1. Issues observed 7/100 runs, ReservedCodeCacheSize=20m with "- > >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > >>> XX:+UseJVMCICompiler" > >>>> 2. Issues observed 0/300 runs, ReservedCodeCacheSize=30m with "- > >>> XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI - > >>> XX:+UseJVMCICompiler" > >>>> > >>>> Thanks, > >>>> Fairoz > >>>> > >>>>> -----Original Message----- > >>>>> From: Vladimir Kozlov > >>>>> Sent: Monday, August 17, 2020 11:22 PM > >>>>> To: Fairoz Matte ; hotspot-compiler- > >>>>> dev at openjdk.java.net; serviceability-dev at openjdk.java.net > >>>>> Cc: Coleen Phillimore ; Dean Long > >>>>> > >>>>> Subject: Re: RFR(s): 8248295: > >>>>> serviceability/jvmti/CompiledMethodLoad/Zombie.java failure with > >>>>> Graal > >>>>> > >>>>> Hi Fairoz, > >>>>> > >>>>> How you determine that +10Mb is enough with Graal? > >>>>> > >>>>> Thanks, > >>>>> Vladimir > >>>>> > >>>>> On 8/17/20 5:46 AM, Fairoz Matte wrote: > >>>>>> Hi, > >>>>>> > >>>>>> > >>>>>> > >>>>>> Please review this small test change to work with Graal. > >>>>>> > >>>>>> > >>>>>> > >>>>>> Background: > >>>>>> > >>>>>> Graal require more code cache compared to c1/c2. but the test > >>>>>> case always > >>>>> set it to 20MB. This may not be sufficient when running graal. > >>>>>> > >>>>>> Default configuration for ReservedCodeCacheSize = 250MB > >>>>>> > >>>>>> With graal enabled, ReservedCodeCacheSize = 350MB > >>>>>> > >>>>>> > >>>>>> > >>>>>> Either we can modify the framework to honor > ReservedCodeCacheSize > >>>>>> for > >>>>> graal or just update the testcase. > >>>>>> > >>>>>> There are not many test cases they rely on ReservedCodeCacheSize > >>>>>> or > >>>>> InitialCodeCacheSize. So the fix prefer the later one. 
> >>>>>> > >>>>>> > >>>>>> > >>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8248295 > >>>>>> > >>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8248295/webrev.00/ > >>>>>> > >>>>>> > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Fairoz > >>>>>> > >>>>>> > >>>>>> > From nick.gasson at arm.com Thu Aug 20 04:48:30 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 12:48:30 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> Message-ID: <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> On 08/19/20 19:10 pm, Andrew Haley wrote: > On 19/08/2020 11:05, Magnus Ihse Bursie wrote: >> This is maybe not relevant, but I was surprised to find >> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, >> and b) the name implies that it is a test, even though that it resides >> in src. Is this really proper? > > I have no idea whether it's really proper, but it allows us to check > that instructions are encoded correctly by cross-checking with the > system's assembler. There might well be a more hygienic way to do > that, but I don't want to be without it. It is perhaps a bit strange to have the test code under src/ and embedded in the assembler implementation. How about we move it under test/ using the existing gtest framework for native code tests? That runs in tier1 and also for release builds. I tried this just now and it's easy to do. -- Thanks, Nick From christian.hagedorn at oracle.com Thu Aug 20 07:10:57 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 20 Aug 2020 09:10:57 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: <419a1ca8-3bb0-1ed9-3d6b-6dec9fa4217e@oracle.com> Thank you Vladimir for your careful review! Best regards, Christian On 19.08.20 18:43, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir K > > On 8/19/20 7:06 AM, Christian Hagedorn wrote: >> On 18.08.20 17:41, Vladimir Kozlov wrote: >>> c1_Compilation.hpp: looks like both versions of allocator() do the >>> same thing. >> >> Right, I first wanted to have a public allocator() version in >> non-product only - but that might be over-engineered as they do the >> same thing. I changed it back to a single public version. >> >>> I suggest to build with configure --with-debug-level=optimized to >>> check that NOT_PRODUCT can be built with these changes. >> >> That's a good idea! I indeed forgot about one NOT_PRODUCT -> >> DEBUG_ONLY change. I also found other build issues with the optimized >> build. I filed [1] and already sent an RFR for it. It builds >> successfully with this patch on top of it. 
>> >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ >> >> Best regards, >> Christian >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252037 >> >>> Thanks, >>> Vladimir >>> >>> On 8/18/20 6:16 AM, Christian Hagedorn wrote: >>>> Hi Vladimir >>>> >>>> On 17.08.20 19:36, Vladimir Kozlov wrote: >>>>> On 8/17/20 12:44 AM, Christian Hagedorn wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> Yes, you're right, these should be changed into ASSERT and DEBUG(). >>>>>> >>>>>> I'm wondering though if these ifdefs are even required for >>>>>> if-blocks inside methods? >>>>>> >>>>>> Isn't, for example, this if-block: >>>>>> >>>>>> #ifndef PRODUCT >>>>>> ???????? if (TraceLinearScanLevel >= 2) { >>>>>> ?????????? tty->print_cr("killing XMMs for trig"); >>>>>> ???????? } >>>>>> #endif >>>>>> >>>>>> removed anyways when the flag is set to < 2 (which is statically >>>>>> known and thus would allow this entire block to be removed)? Or >>>>>> does it make a difference by explicitly guarding it with an ifdef? >>>>> >>>>> You are right. It could be statically removed. But we keep #ifdef >>>>> sometimes to indicate that code is executed only in debug build >>>>> because we don't always remember type of a flag. >>>> >>>> I see, that makes sense. I updated my patch and left the ifdefs >>>> there but changed them to ASSERT. I also updated other ifdefs >>>> belonging to TraceLinearScanLevel appropriately. >>>> >>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.01/ >>>> >>>> Best regards, >>>> Christian >>>> >>>>> >>>>> Thanks, >>>>> Vladimir K >>>>> >>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> On 14.08.20 20:09, Vladimir Kozlov wrote: >>>>>>> One note. Most of the code is guarded by #ifndef PRODUCT. >>>>>>> >>>>>>> But the flag is available only in DEBUG build: >>>>>>> ?? develop(intx, TraceLinearScanLevel, 0, >>>>>>> >>>>>>> Should we use #ifdef ASSERT and DEBUG() instead? >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 8/14/20 5:10 AM, Christian Hagedorn wrote: >>>>>>>> Hi >>>>>>>> >>>>>>>> Please review the following enhancement for C1: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>>> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.00/ >>>>>>>> >>>>>>>> While I was working on JDK-8249603 [1], I added some additional >>>>>>>> debugging and logging code which helped to figure out what was >>>>>>>> going on. I think it would be useful to have this code around >>>>>>>> for the analysis of future C1 register allocator bugs. >>>>>>>> >>>>>>>> This RFE adds (everything non-product code): >>>>>>>> - find_interval(number): Can be called like that from gdb >>>>>>>> anywhere to find an interval with the given number. >>>>>>>> - Interval::print_children()/print_parent(): Useful when >>>>>>>> debugging with gdb to quickly show the split children and parent. >>>>>>>> - LinearScan::print_reg_num(number): Prints the register or >>>>>>>> stack location for this register number. This is useful in some >>>>>>>> places (logging with TraceLinearScanLevel set) where it just >>>>>>>> printed a number which first had to be manually looked up in >>>>>>>> other logs. >>>>>>>> >>>>>>>> I additionally did some cleanup of the touched code. >>>>>>>> >>>>>>>> We could additionally split the TraceLinearScanLevel flag into >>>>>>>> separate flags related to the different phases of the register >>>>>>>> allocation algorithm. It currently just prints too much details >>>>>>>> on the higher levels. 
You often find yourself being interested >>>>>>>> in a specific part of the algorithm and only want to know more >>>>>>>> details there. To achieve that you now you have to either handle >>>>>>>> all the noise or manually disable/enable other logs. We could >>>>>>>> file an RFE to clean this up if it's worth the effort - given >>>>>>>> that there are not many new issues filed for C1 register >>>>>>>> allocation today. >>>>>>>> >>>>>>>> Thank you! >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Christian >>>>>>>> >>>>>>>> >>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8251093 >>>>>>>> From adinn at redhat.com Thu Aug 20 08:34:45 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2020 09:34:45 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> Message-ID: Hi Ningsheng, >> postaloc.cpp:312 & 322 >> >> 311???? if (lrgs(val_idx).is_scalable()) { >> 312?????? assert(val->ideal_reg() == Op_VecA, "scalable vector >> register"); >> >> ???????? . . . >> >> 321?????? if (lrgs(val_idx).is_scalable()) { >> 322???????? assert(val->ideal_reg() == Op_VecA, "scalable vector >> register"); >> >> You don't strictly need the asserts here as this is already asserted in >> the call to is_scalable(). > > The assertion in LRG::is_scalable() is different, while this is an > assertion for ideal_reg of a given node. Yes, my apologies for misreading that. These assertions should be retained. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From adinn at redhat.com Thu Aug 20 08:48:07 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2020 09:48:07 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> On 20/08/2020 05:48, Nick Gasson wrote: > On 08/19/20 19:10 pm, Andrew Haley wrote: >> On 19/08/2020 11:05, Magnus Ihse Bursie wrote: >>> This is maybe not relevant, but I was surprised to find >>> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, >>> and b) the name implies that it is a test, even though that it resides >>> in src. Is this really proper? >> >> I have no idea whether it's really proper, but it allows us to check >> that instructions are encoded correctly by cross-checking with the >> system's assembler. There might well be a more hygienic way to do >> that, but I don't want to be without it. 
> > It is perhaps a bit strange to have the test code under src/ and > embedded in the assembler implementation. How about we move it under > test/ using the existing gtest framework for native code tests? That > runs in tier1 and also for release builds. I tried this just now and > it's easy to do. I'm not sure that would be an improvement. This python code is used to generate C code run as part of JVM startup in a debug JVM build i.e. code that is linked into the JVM itself. So, the code it generates is really the same as the debug code embedded in the JVM. It doesn't really bear any relation to the code in the test tree. If the generator code were to go anywhere else it would perhaps make most sense to put it in the make tree. I'm not sure that is required though or even appropriate. There is already a precedent for keeping generator code in the source tree and, when it is specific to a given arch, keeping it next to the related source. The adlc generator code sits in the shared source tree. The m4 file used to generate parts of aarch64.ad is in the aarch64 source tree. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu Aug 20 08:50:32 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 09:50:32 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <47a0b915-291d-7bee-c298-a85d57b1c3a7@redhat.com> <01299e5a-8786-bd78-83f4-5e7f900f96da@arm.com> Message-ID: <7a12ad31-9196-c724-16c9-9994b096974c@redhat.com> On 20/08/2020 03:27, Ningsheng Jian wrote: > // Note that a vector register with 4 slots, denotes a 128-bit NEON Lose the comma. :-) Never known to miss a trivial point, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Aug 20 08:53:52 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 09:53:52 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <6fea2144-b416-cad3-8c99-068a82490256@redhat.com> On 20/08/2020 05:48, Nick Gasson wrote: > On 08/19/20 19:10 pm, Andrew Haley wrote: >> On 19/08/2020 11:05, Magnus Ihse Bursie wrote: >>> This is maybe not relevant, but I was surprised to find >>> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, >>> and b) the name implies that it is a test, even though that it resides >>> in src. Is this really proper? 
>> >> I have no idea whether it's really proper, but it allows us to check >> that instructions are encoded correctly by cross-checking with the >> system's assembler. There might well be a more hygienic way to do >> that, but I don't want to be without it. > > It is perhaps a bit strange to have the test code under src/ and > embedded in the assembler implementation. How about we move it under > test/ using the existing gtest framework for native code tests? That > runs in tier1 and also for release builds. I tried this just now and > it's easy to do. Go on, then! Bear in mind that the idea of this test is that it checks the encoding of all instructions, regardless of whether the processor supports them or not. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Thu Aug 20 08:58:57 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 16:58:57 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> Message-ID: <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> Hi Andrew, On 08/20/20 16:48 pm, Andrew Dinn wrote: >> >> It is perhaps a bit strange to have the test code under src/ and >> embedded in the assembler implementation. How about we move it under >> test/ using the existing gtest framework for native code tests? That >> runs in tier1 and also for release builds. I tried this just now and >> it's easy to do. > I'm not sure that would be an improvement. This python code is used to > generate C code run as part of JVM startup in a debug JVM build i.e. > code that is linked into the JVM itself. So, the code it generates is > really the same as the debug code embedded in the JVM. It doesn't really > bear any relation to the code in the test tree. > I meant move the test itself - entry() and asm_check() in assembler_aarch64.cpp - under test/hotspot/gtest. The generator would move with it. > If the generator code were to go anywhere else it would perhaps make > most sense to put it in the make tree. I'm not sure that is required > though or even appropriate. There is already a precedent for keeping > generator code in the source tree and, when it is specific to a given > arch, keeping it next to the related source. The adlc generator code > sits in the shared source tree. The m4 file used to generate parts of > aarch64.ad is in the aarch64 source tree. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu Aug 20 09:08:18 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 10:08:18 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: Hi, On 20/08/2020 09:58, Nick Gasson wrote: > > On 08/20/20 16:48 pm, Andrew Dinn wrote: >>> >>> It is perhaps a bit strange to have the test code under src/ and >>> embedded in the assembler implementation. How about we move it under >>> test/ using the existing gtest framework for native code tests? That >>> runs in tier1 and also for release builds. I tried this just now and >>> it's easy to do. >> I'm not sure that would be an improvement. This python code is used to >> generate C code run as part of JVM startup in a debug JVM build i.e. >> code that is linked into the JVM itself. So, the code it generates is >> really the same as the debug code embedded in the JVM. It doesn't really >> bear any relation to the code in the test tree. > > I meant move the test itself - entry() and asm_check() in > assembler_aarch64.cpp - under test/hotspot/gtest. The generator would > move with it. Hmm. I'm still not sure how this would work. Let's see the patch and we can talk about it. It still sounds to me rather like pointlessly moving the furniture around. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Thu Aug 20 09:12:54 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 11:12:54 +0200 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL Message-ID: <877dtt3ckp.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8251527/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8251527 This triggers with Shenandoah but the fix (and the bug) is in shared C2 code. CallNode::extract_projections(), once it has found the control ProjNode looks for the CatchNode at the first use of the ProjNode. In the case of the crash, the ProjNode has more than one use and the first use is not the CatchNode (but a pinned LoadNode). I propose using unique_ctrl_out() instead. The ProjNode has a LoadNode because one is pinned on a ProjNode by PhaseIdealLoop::split_if_with_blocks_post() when it tries to sink the LoadNode out of loop. A LoadNode becomes the first use of the ProjNode after the loop body is cloned during unswitching. Roland. 
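To make the proposed change concrete, the lookup in CallNode::extract_projections() becomes something along these lines (a rough sketch of the idea only, not the actual webrev; the local variable name is made up):

    // pn is the TypeFunc::Control projection of the call.
    // unique_ctrl_out() only considers control (CFG) users, so a pinned
    // Load hanging off the projection no longer hides the Catch node.
    Node* u = pn->unique_ctrl_out();
    if (u != NULL && u->is_Catch()) {
      // collect the CatchProj outputs as before
    }

unique_ctrl_out() can also return NULL, so the result still has to be checked before use.
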
From nick.gasson at arm.com Thu Aug 20 09:40:34 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 20 Aug 2020 17:40:34 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> On 08/20/20 17:08 pm, Andrew Haley wrote: >> >> I meant move the test itself - entry() and asm_check() in >> assembler_aarch64.cpp - under test/hotspot/gtest. The generator would >> move with it. > > Hmm. I'm still not sure how this would work. Let's see the patch and > we can talk about it. It still sounds to me rather like pointlessly > moving the furniture around. http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ Then you'd run it with make exploded-test TEST="gtest:AssemblerAArch64" The downside is that it won't run on every startup of a debug build, but it will run in the tier1 tests, including for release builds, which arguably gives more coverage. It looks a lot tidier to me, but that's clearly subjective. -- Thanks, Nick From christian.hagedorn at oracle.com Thu Aug 20 09:45:05 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 20 Aug 2020 11:45:05 +0200 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL In-Reply-To: <877dtt3ckp.fsf@redhat.com> References: <877dtt3ckp.fsf@redhat.com> Message-ID: <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> Hi Roland That looks good to me. Best regards, Christian On 20.08.20 11:12, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8251527/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8251527 > > This triggers with Shenandoah but the fix (and the bug) is in shared > C2 code. > > CallNode::extract_projections(), once it has found the control ProjNode > looks for the CatchNode at the first use of the ProjNode. In the case of > the crash, the ProjNode has more than one use and the first use is not > the CatchNode (but a pinned LoadNode). I propose using unique_ctrl_out() > instead. > > The ProjNode has a LoadNode because one is pinned on a ProjNode by > PhaseIdealLoop::split_if_with_blocks_post() when it tries to sink the > LoadNode out of loop. A LoadNode becomes the first use of the ProjNode > after the loop body is cloned during unswitching. > > Roland. 
> From magnus.ihse.bursie at oracle.com Thu Aug 20 10:14:36 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Thu, 20 Aug 2020 12:14:36 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> On 2020-08-20 11:40, Nick Gasson wrote: > On 08/20/20 17:08 pm, Andrew Haley wrote: >>> I meant move the test itself - entry() and asm_check() in >>> assembler_aarch64.cpp - under test/hotspot/gtest. The generator would >>> move with it. >> Hmm. I'm still not sure how this would work. Let's see the patch and >> we can talk about it. It still sounds to me rather like pointlessly >> moving the furniture around. > http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ > > Then you'd run it with > > make exploded-test TEST="gtest:AssemblerAArch64" > > The downside is that it won't run on every startup of a debug build, but > it will run in the tier1 tests, including for release builds, which > arguably gives more coverage. It looks a lot tidier to me, but that's > clearly subjective. FWIW, it definitely looks tidier to me too. /Magnus > > -- > Thanks, > Nick From adinn at redhat.com Thu Aug 20 10:38:32 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2020 11:38:32 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> Message-ID: <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> On 20/08/2020 11:14, Magnus Ihse Bursie wrote: > On 2020-08-20 11:40, Nick Gasson wrote: >> http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ >> >> Then you'd run it with >> >> ?? make exploded-test TEST="gtest:AssemblerAArch64" >> >> The downside is that it won't run on every startup of a debug build, but >> it will run in the tier1 tests, including for release builds, which >> arguably gives more coverage. It looks a lot tidier to me, but that's >> clearly subjective. > FWIW, it definitely looks tidier to me too. Well, perhaps this check ought to be done as a standalone test rather than as debug build validation. I don't really have any deep commitment either way. However, if we do proceed with this I think it ought to be in a separate follow-up patch and with its own JIRA. 
It should not stop Ningsheng's SVE patch going in as is since that merely corrects the status quo to allow for SVE instructions. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.x.ivanov at oracle.com Thu Aug 20 12:29:27 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 20 Aug 2020 15:29:27 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Message-ID: <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> Hi Ningsheng, > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html Impressive work, Ningsheng! > http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt "Since the bottom 128 bits are shared with the NEON, we extend current register mask definition of V0-V31 registers. Currently, c2 uses one bit mask for a 32-bit register slot, so to define at most 2048 bits we will need to add 64 slots in AD file. That's a really large number, and will also break current regmask assumption." Can you, please, elaborate on the last point? What RegMask assumptions are broken for 2048-bit vectors? I'm looking at [1] and try to understand the motivation for the changes in shared code. Compared to x86 w/ AVX512, architectural state for vector registers is 4x larger in the worst case (ignoring predicate registers for now). Here are the relevant constants on x86: gensrc/adfiles/adGlobals_x86.hpp: // the number of reserved registers + machine registers. #define REG_COUNT 545 ... // Size of register-mask in ints #define RM_SIZE 22 My estimate is that for AArch64 with SVE support the constants will be: REG_COUNT < 2500 RM_SIZE < 100 which don't look too bad. Also, I don't see any changes related to stack management. So, I assume it continues to be managed in slots. Any problems there? As I understand, wide SVE registers are caller-save, so there may be many spills of huge vectors around a call. (Probably, not possible with C2 auto-vectorizer as it is now, but Vector API will expose it.) Have you noticed any performance problems? If that's the case, then AVX512 support on x86 would benefit from similar optimization as well. FTR there was a similar exercise [2] on x86 to abstract away exact sizes of vector registers, but it didn't have to worry about RA since all the operands were already available. Also, vectors of all different sizes may be used. So, it makes it hard to compare. Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~njian/8231441/webrev.03-ra/ [2] https://bugs.openjdk.java.net/browse/JDK-8230015 > On 7/30/20 7:26 PM, Andrew Dinn wrote: >> Hi Ningsheng, >> >> I will start to review this either later today or (more likely) >> tomorrow. It will probably take some time to work through it all. I will >> work from the updated patch posted by PengFei. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Red Hat Distinguished Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. 
03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> >> On 21/07/2020 07:05, Ningsheng Jian wrote: >>> [Ping] >>> >>> Could anyone please help to review this patch, especially for the c2 >>> register allocation part? >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8231441 >>> >>> The latest webrev: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02 >>> >>> In the latest webrev, we block one predicate register (p7) with all >>> elements preset to TRUE, so that c2 compiled code can use it freely to >>> generate instructions for unpredicated operations. >>> >>> And the split parts: >>> >>> 1) SVE feature detection: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02-feature >>> >>> 2) c2 register allocation: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02-ra >>> >>> 3) SVE c2 backend: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.02-c2 >>> >>> The initial RFR which has some descriptions of the patch: >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-March/037628.html >>> >>> >>> >>> The description can also be found at: >>> http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt >>> >>> Notes to verify the patch on QEMU user emulation, with an example of >>> compiled code: >>> http://cr.openjdk.java.net/~njian/8231441/running-sve-in-qemu-user.txt >>> >>> Thanks, >>> Ningsheng >>> >>> >>> On 5/27/20 3:23 PM, Ningsheng Jian wrote: >>>> Hi, >>>> >>>> I have rebased this patch with some more comments added. And also >>>> relaxed the instruction matching conditions for 128-bit vector. >>>> >>>> I would appreciate if someone could help to review this. >>>> >>>> Whole patch: >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01 >>>> >>>> Different parts of changes: >>>> >>>> 1) SVE feature detection >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-feature >>>> >>>> 2) c2 registion allocation >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-ra >>>> >>>> 3) SVE c2 backend >>>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-c2 >>>> >>>> (Or should I split this into different JBS?) >>>> >>>> Thanks, >>>> Ningsheng >>>> >>>> On 3/25/20 2:37 PM, Ningsheng Jian wrote: >>>>> Hi, >>>>> >>>>> Could you please help to review this patch adding AArch64 SVE support? >>>>> It also touches c2 compiler shared code. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 >>>>> Webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.00 >>>>> >>>>> Arm has released new vector ISA extension for AArch64, SVE [1] and >>>>> SVE2 [2]. This patch adds the initial SVE support in OpenJDK. In this >>>>> patch we have: >>>>> >>>>> 1) SVE feature enablement and detection >>>>> 2) SVE vector register allocation support with initial predicate >>>>> register definition >>>>> 3) SVE c2 backend for current SLP based vectorizer. (We also have a >>>>> POC >>>>> patch of a new vectorizer using SVE predicate-driven loop control, but >>>>> that's still under development.) >>>>> >>>>> SVE register definition >>>>> ======================= >>>>> Unlike other SIMD architectures, SVE allows hardware >>>>> implementations to >>>>> choose a vector register length from 128 and 2048 bits, multiple of >>>>> 128 >>>>> bits. So we introduce a new vector type VectorA, i.e. length agnostic >>>>> (scalable) vector type, and Op_VecA for machine vectora register. 
>>>>> In the >>>>> meantime, to minimize register allocation code changes, we also take >>>>> advantage of one JIT compiler aspect, that is during the compile >>>>> time we >>>>> actually know the real hardware SVE vector register size of current >>>>> running machine. So, the register allocator actually knows how many >>>>> register slots an Op_VecA ideal reg requires, and could work fine >>>>> without much modification. >>>>> >>>>> Since the bottom 128 bits are shared with the NEON, we extend current >>>>> register mask definition of V0-V31 registers. Currently, c2 uses >>>>> one bit >>>>> mask for a 32-bit register slot, so to define at most 2048 bits we >>>>> will >>>>> need to add 64 slots in AD file. That's a really large number, and >>>>> will >>>>> also break current regmask assumption. Considering the SVE vector >>>>> register is architecturally scalable for different sizes, we just >>>>> define >>>>> double of original NEON vector register slots, i.e. 8 slots: Vx, Vx_H, >>>>> Vx_J ... Vx_O. After adlc, the generated register masks now looks >>>>> like: >>>>> >>>>> const RegMask _VECTORA_REG_mask( 0x0, 0x0, 0xffffffff, 0xffffffff, >>>>> 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, ... >>>>> >>>>> const RegMask _VECTORD_REG_mask( 0x0, 0x0, 0x3030303, 0x3030303, >>>>> 0x3030303, 0x3030303, 0x3030303, 0x3030303, ... >>>>> >>>>> const RegMask _VECTORX_REG_mask( 0x0, 0x0, 0xf0f0f0f, 0xf0f0f0f, >>>>> 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, ... >>>>> >>>>> And we use SlotsPerVecA to indicate regmask bit size for a VecA >>>>> register. >>>>> >>>>> Although for physical register allocation, register allocator does not >>>>> need to know the real VecA register size, while doing spill/unspill, >>>>> current register allocation needs to know actual stack slot size to >>>>> store/load VecA registers. SVE is able to do vector size agnostic >>>>> spilling, but to minimize the code changes, as I mentioned before, we >>>>> just let RA know the actual vector register size in current running >>>>> machine, by calling scalable_vector_reg_size(). >>>>> >>>>> In the meantime, since some vector operations do not have unpredicated >>>>> SVE1 instructions, but only predicate version, e.g. vector multiply, >>>>> vector load/store. We have also defined predicate registers in this >>>>> patch, and c2 register allocator will allocate a temp predicate >>>>> register >>>>> to fulfill the expecting unpredicated operations. And this can also be >>>>> used for future predicate-driven vectorizer. This is not efficient for >>>>> now, as we can see many ptrue instructions in the generated code. One >>>>> possible solution I can see, is to block one predicate register, and >>>>> preset it to all true. But to preserve/reinitialize a caller save >>>>> register value cross calls seems risky to work in this patch. I decide >>>>> to defer it to further optimization work. If anyone has any >>>>> suggestions >>>>> on this, I would appreciate. >>>>> >>>>> SVE feature detection >>>>> ===================== >>>>> Since we may have some compiled code based on the initial detected SVE >>>>> vector register length and the compiled code is compiled only for that >>>>> vector register length, we assume that the SVE vector register length >>>>> will not be changed during the JVM lifetime. However, SVE vector >>>>> length >>>>> is per-thread and can be changed by system call [3], so we need to >>>>> make >>>>> sure that each jni call will not change the sve vector length. 
>>>>>
>>>>> Currently, we verify the SVE vector register length on each JNI return,
>>>>> and if an SVE vector length change is detected, jvm simply reports error
>>>>> and stops running. The VM running vector length can also be set by
>>>>> existing VM option MaxVectorSize with c2 enabled. If MaxVectorSize is
>>>>> specified not the same as system default sve vector length (in
>>>>> /proc/sys/abi/sve_default_vector_length), JVM will set current process
>>>>> sve vector length to the specified vector length.
>>>>>
>>>>> Compiled code
>>>>> =============
>>>>> We have added all current c2 backend codegen on par with NEON, but only
>>>>> for vector length larger than 128-bit.
>>>>>
>>>>> On a 1024 bit SVE environment, for the following simple loop with int
>>>>> array element type:
>>>>>
>>>>>     for (int i = 0; i < LENGTH; i++) {
>>>>>       c[i] = a[i] + b[i];
>>>>>     }
>>>>>
>>>>> c2 generated loop:
>>>>>
>>>>>     0x0000ffff811c0820:  sbfiz  x11, x10, #2, #32
>>>>>     0x0000ffff811c0824:  add    x13, x18, x11
>>>>>     0x0000ffff811c0828:  add    x14, x1, x11
>>>>>     0x0000ffff811c082c:  add    x13, x13, #0x10
>>>>>     0x0000ffff811c0830:  add    x14, x14, #0x10
>>>>>     0x0000ffff811c0834:  add    x11, x0, x11
>>>>>     0x0000ffff811c0838:  add    x11, x11, #0x10
>>>>>     0x0000ffff811c083c:  ptrue  p1.s    // To be optimized
>>>>>     0x0000ffff811c0840:  ld1w   {z16.s}, p1/z, [x14]
>>>>>     0x0000ffff811c0844:  ptrue  p0.s
>>>>>     0x0000ffff811c0848:  ld1w   {z17.s}, p0/z, [x13]
>>>>>     0x0000ffff811c084c:  add    z16.s, z17.s, z16.s
>>>>>     0x0000ffff811c0850:  ptrue  p1.s
>>>>>     0x0000ffff811c0854:  st1w   {z16.s}, p1, [x11]
>>>>>     0x0000ffff811c0858:  add    w10, w10, #0x20
>>>>>     0x0000ffff811c085c:  cmp    w10, w12
>>>>>     0x0000ffff811c0860:  b.lt   0x0000ffff811c0820
>>>>>
>>>>> Test
>>>>> ====
>>>>> Currently, we don't have real hardware to verify SVE features (and
>>>>> performance). But we have run jtreg tests with SVE in some emulators. On
>>>>> QEMU system emulator, which has SVE emulation support, jtreg tier1-3
>>>>> passed with different vector sizes. We've also verified it with full
>>>>> jtreg tests without SVE on both x86 and AArch64, to make sure that
>>>>> there's no regression.
>>>>>
>>>>> The patch has also been applied to Vector API code base, and verified on
>>>>> emulator. In Vector API, there are more vector related tests and is more
>>>>> possible to generate vector instructions by intrinsification.
>>>>>
>>>>> A simple test can also run in QEMU user emulation, e.g.
>>>>>
>>>>> $ qemu-aarch64 -cpu max,sve-max-vq=2 java -XX:UseSVE=1 SIMD
>>>>>
>>>>> (
>>>>> To run it in user emulation mode, we will need to bypass SVE feature
>>>>> detection code in this patch. E.g. apply:
>>>>> http://cr.openjdk.java.net/~njian/8231441/user-emulation.patch
>>>>> )
>>>>>
>>>>> Others
>>>>> ======
>>>>> Since this patch is a bit large, I've also split it into 3 parts, for
>>>>> easy review:
>>>>>
>>>>> 1) SVE feature detection
>>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-feature
>>>>>
>>>>> 2) c2 register allocation
>>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-ra
>>>>>
>>>>> 3) SVE c2 backend
>>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-c2
>>>>>
>>>>> Part of this patch has been contributed by Joshua Zhu and Yang Zhang.
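For readers who want to reproduce the codegen quoted above, the loop only needs a trivial harness along the lines below. The class name SIMD matches the qemu invocation shown above; the array length and the warmup count are made-up values, not something taken from the patch.

    public class SIMD {
        static final int LENGTH = 1024;
        static int[] a = new int[LENGTH];
        static int[] b = new int[LENGTH];
        static int[] c = new int[LENGTH];

        static void add() {
            for (int i = 0; i < LENGTH; i++) {
                c[i] = a[i] + b[i];
            }
        }

        public static void main(String[] args) {
            // Run enough iterations for the loop to reach C2 and get vectorized.
            for (int n = 0; n < 20_000; n++) {
                add();
            }
            System.out.println(c[0]);
        }
    }
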
>>>>> >>>>> Refs >>>>> ==== >>>>> [1] https://developer.arm.com/docs/ddi0584/latest >>>>> [2] https://developer.arm.com/docs/ddi0602/latest >>>>> [3] https://www.kernel.org/doc/Documentation/arm64/sve.txt >>>>> >>>>> Thanks, >>>>> Ningsheng >>>>> >>>> >>> >> > From yudi.zheng at oracle.com Thu Aug 20 12:37:18 2020 From: yudi.zheng at oracle.com (Yudi Zheng) Date: Thu, 20 Aug 2020 14:37:18 +0200 Subject: RFR: 8252058: [JVMCI] Rework setting is_method_handle_invoke flag in jvmciCodeInstaller In-Reply-To: <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com> <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> Message-ID: Please review this rework of setting is_method_handle_invoke flag in jvmciCodeInstaller. http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8252058 Changes since last time are at http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java.udiff.html -Yudi > On 7 Jun 2020, at 23:14, Dean Long wrote: > > Looks good! > > dl > > On 6/7/20 1:06 PM, Yudi Zheng wrote: >> Thanks Dean! >> Here is a revision including your suggestion: http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/ >> >> -Yudi >> >>> On 6 Jun 2020, at 11:33, Dean Long wrote: >>> >>> I found a problem. You need to make CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by adding the JVMCI logic that looks backwards by the size of the call instruction. >>> >>> dl >>> >>> On 6/4/20 12:03 AM, Yudi Zheng wrote: >>>> I did not push this yet. It might require changes on the Graal side. I am still thinking about how to merge. >>>> >>>> -Yudi >>>> >>>>> On 4 Jun 2020, at 01:22, Dean Long wrote: >>>>> >>>>> Does this require recent Graal change in order to work correctly? >>>>> >>>>> dl >>>>> >>>>> On 6/3/20 3:47 PM, Dean Long wrote: >>>>>> Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do? >>>>>> >>>>>> dl >>>>>> >>>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote: >>>>>>> Hello, >>>>>>> >>>>>>> Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller. 
>>>>>>> >>>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347 >>>>>>> >>>>>>> Many thanks, >>>>>>> Yudi >>> >> > From rwestrel at redhat.com Thu Aug 20 12:51:42 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 14:51:42 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> Message-ID: <87y2m91nvl.fsf@redhat.com> Hi John, > I?m going over it one more time (between smoky breaths from > the California fires) and I have a question. What is the exact > structure of outer_phi? Thanks for taking another look at this. > 1. At first it is a clone of phi, its region patched and the other edges the same: > > outer_phi := Phi(outer_head, init, AddL(phi, stride)) It's only a clone so: outer_phi := Phi(outer_head, init, AddI(phi, stride)) (that is no AddL) > 2. After long_loop_replace_long_iv, the interior phi links it back to itself: > > outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) I don't think that's right. There are 2 calls to long_loop_replace_long_iv(). One to replace phi and the other one to replace incr (that is the AddI above). outer_phi := Phi(outer_head, init, AddL(I2L(AddI(inner_phi, stride)), outer_phi)) (actually this is not quite accurate because peeling one iteration causes an extra phi to be added to merge the peeled iteration with the counted loop in most cases). Do you see a problem with the above outer_phi structure? Roland. From aph at redhat.com Thu Aug 20 14:19:18 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2020 15:19:18 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> Message-ID: <57d07f23-c0f1-31eb-586a-71fa59b80891@redhat.com> On 20/08/2020 11:38, Andrew Dinn wrote: > Well, perhaps this check ought to be done as a standalone test rather > than as debug build validation. I don't really have any deep commitment > either way. However, if we do proceed with this I think it ought to be > in a separate follow-up patch and with its own JIRA. 
It should not stop > Ningsheng's SVE patch going in as is since that merely corrects the > status quo to allow for SVE instructions. I agree. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From magnus.ihse.bursie at oracle.com Thu Aug 20 14:37:42 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Thu, 20 Aug 2020 16:37:42 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <2d6e1fc5-d8f3-d046-325e-f07aa5f3cd83@oracle.com> <85blj5ylb5.fsf@nicgas01-pc.shanghai.arm.com> <301b8518-164d-b328-ed14-6aaa3b69b2ef@redhat.com> <857dtty9pq.fsf@nicgas01-pc.shanghai.arm.com> <85364hy7sd.fsf@nicgas01-pc.shanghai.arm.com> <39664366-1ba9-6eb1-dcee-c8a4f07877b7@oracle.com> <4ea979d6-c81f-fa3b-7c23-e563f52141dd@redhat.com> Message-ID: On 2020-08-20 12:38, Andrew Dinn wrote: > On 20/08/2020 11:14, Magnus Ihse Bursie wrote: >> On 2020-08-20 11:40, Nick Gasson wrote: >>> http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/ >>> >>> Then you'd run it with >>> >>> ?? make exploded-test TEST="gtest:AssemblerAArch64" >>> >>> The downside is that it won't run on every startup of a debug build, but >>> it will run in the tier1 tests, including for release builds, which >>> arguably gives more coverage. It looks a lot tidier to me, but that's >>> clearly subjective. >> FWIW, it definitely looks tidier to me too. > Well, perhaps this check ought to be done as a standalone test rather > than as debug build validation. I don't really have any deep commitment > either way. However, if we do proceed with this I think it ought to be > in a separate follow-up patch and with its own JIRA. It should not stop > Ningsheng's SVE patch going in as is since that merely corrects the > status quo to allow for SVE instructions. Yes, I fully agree, and never meant to imply anything else. /Magnus > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From shade at redhat.com Thu Aug 20 15:02:59 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 20 Aug 2020 17:02:59 +0200 Subject: RFR (XS) 8252120: compiler/oracle/TestCompileCommand.java misspells "occured" Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8252120 Noticed this while reading some related test code. There is no way VM could emit the message the assert checks, which means the assert always passes. See the history in the bug. 
Fix: diff -r 53629f4016c6 test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java --- a/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 11:42:12 2020 +0100 +++ b/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 17:01:40 2020 +0200 @@ -63,5 +63,5 @@ } - out.shouldNotContain("CompileCommand: An error occured during parsing"); + out.shouldNotContain("CompileCommand: An error occurred during parsing"); out.shouldHaveExitValue(0); } Testing: affected test -- Thanks, -Aleksey From rwestrel at redhat.com Thu Aug 20 15:34:24 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 17:34:24 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: <87tuwx1gcf.fsf@redhat.com> > Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. Thanks for the review and testing! Roland. From rwestrel at redhat.com Thu Aug 20 16:05:39 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 20 Aug 2020 18:05:39 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> Message-ID: <87r1s11ewc.fsf@redhat.com> Hi Vladimir, Thanks for taking a look at this. > =============== > src/hotspot/share/opto/callnode.cpp: > > // If you have back to back safepoints, remove one > if( in(TypeFunc::Control)->is_SafePoint() ) > return in(TypeFunc::Control); > > - if( in(0)->is_Proj() ) { > + // Transforming long counted loops requires a safepoint node. Do not > + // eliminate a safepoint until loop opts are over. > + if (in(0)->is_Proj() && !phase->C->major_progress()) { > > Can you elaborate on this a bit? Why elimination of back-to-back > safepoints cause problems during new transformation? Is it because you > need specifically a SafePoint because CallNode doesn't fit? The issue is with a call followed by a SafePointNode. A call captures the state before the call but we would need the state after the call otherwise on a deoptimization we would re-executed the call. > =============== > src/hotspot/share/opto/loopnode.cpp: > > +void PhaseIdealLoop::add_empty_predicate(Deoptimization::DeoptReason > reason, Node* inner_head, IdealLoopTree* loop, SafePointNode* sfpt) { > > Nothing actionable at the moment, but it's unfortunate to see more and > more code being duplicated from GraphKit. I wish there were a way to > share implementation between GraphKit, PhaseIdealLoop, and > PhaseMacroExpand. 
Actually, there might be a way. In the valhalla repo, Tobias pushed a change to GraphKit so it's possible to build one with an igvn argument. So we could do this: JVMState* jvms = cloned_sfpt->jvms()->clone_shallow(C); SafePointNode* map = cloned_sfpt->clone()->as_SafePoint(); map->set_jvms(jvms); jvms->set_map(map); GraphKit kit(jvms, &_igvn); kit.set_control(inner_head->in(LoopNode::EntryControl)); kit.add_empty_predicates(0); _igvn.replace_input_of(inner_head, LoopNode::EntryControl, kit.control()); _igvn.remove_dead_node(map); instead of: if (UseLoopPredicate) { add_empty_predicate(Deoptimization::Reason_predicate, inner_head, outer_ilt, cloned_sfpt); } if (UseProfiledLoopPredicate) { add_empty_predicate(Deoptimization::Reason_profile_predicate, inner_head, outer_ilt, cloned_sfpt); } add_empty_predicate(Deoptimization::Reason_loop_limit_check, inner_head, outer_ilt, cloned_sfpt); and the new PhaseIdealLoop::add_empty_predicate() wouldn't be needed anymore. One thing to consider is that new nodes for predicates are added by GraphKit now and they are not registered with PhaseIdealLoop. It may not be a problem because peeling sets major_progress so no further loop opts will be applied in this round. Anyway, if we wanted to pursue this further, I think it would make sense to push Tobias' patch first. What do you think? Roland. From igor.ignatyev at oracle.com Thu Aug 20 16:16:33 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 09:16:33 -0700 Subject: RFR (XS) 8252120: compiler/oracle/TestCompileCommand.java misspells "occured" In-Reply-To: References: Message-ID: Hi Aleksey, LGTM -- Igor > On Aug 20, 2020, at 8:02 AM, Aleksey Shipilev wrote: > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8252120 > > Noticed this while reading some related test code. There is no way VM could emit the message the assert checks, which means the assert always passes. See the history in the bug. > > Fix: > > diff -r 53629f4016c6 test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java > --- a/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 11:42:12 2020 +0100 > +++ b/test/hotspot/jtreg/compiler/oracle/TestCompileCommand.java Thu Aug 20 17:01:40 2020 +0200 > @@ -63,5 +63,5 @@ > } > > - out.shouldNotContain("CompileCommand: An error occured during parsing"); > + out.shouldNotContain("CompileCommand: An error occurred during parsing"); > out.shouldHaveExitValue(0); > } > > > Testing: affected test > > -- > Thanks, > -Aleksey > From igor.ignatyev at oracle.com Thu Aug 20 17:16:31 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 10:16:31 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase In-Reply-To: <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> References: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> Message-ID: Hi Serguei, thanks for your review. I've decided to slightly modify the patch and use the ids of subtasks in TEST.properties files (instead of main bug id) in order to avoid possible confusion in the future: - incremental: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.0-1/index.html - whole: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.01/index.html could you please re-review it? Thanks, -- Igor > On Aug 19, 2020, at 4:22 PM, serguei.spitsyn at oracle.com wrote: > > Hi Igor, > > This looks reasonable. 
> > Thanks, > Serguei > > > On 8/18/20 16:42, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >>> 0 lines changed: 0 ins; 0 del; 0 mod; >> Hi all, >> >> could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? >> >> (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) >> >> webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 >> >> Thanks, >> -- Igor >> >> > From igor.ignatyev at oracle.com Thu Aug 20 18:18:19 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 11:18:19 -0700 Subject: RFR(T) : 8252005 : narrow disabling of allowSmartActionArgs in vmTestbase In-Reply-To: <8eb1187f-8030-2adf-b20d-d289bfa35198@oracle.com> References: <4E6FECE6-9103-46ED-84B2-79DBA0123ED9@oracle.com> <17a8369e-5f38-ebab-974b-28e083378aa2@oracle.com> <8eb1187f-8030-2adf-b20d-d289bfa35198@oracle.com> Message-ID: <3CB6B3FF-458B-4B76-872B-46A6D30B7A33@oracle.com> thanks Serguei, pushed. -- Igor > On Aug 20, 2020, at 10:55 AM, serguei.spitsyn at oracle.com wrote: > > Hi Igor, > > Still looks good to me. > The webrev is veeeeery slow. > > Thanks, > Serguei > > > On 8/20/20 10:16, Igor Ignatyev wrote: >> Hi Serguei, >> >> thanks for your review. I've decided to slightly modify the patch and use the ids of subtasks in TEST.properties files (instead of main bug id) in order to avoid possible confusion in the future: >> - incremental: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.0-1/index.html >> - whole: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.01/index.html >> >> could you please re-review it? >> >> Thanks, >> -- Igor >> >>> On Aug 19, 2020, at 4:22 PM, serguei.spitsyn at oracle.com wrote: >>> >>> Hi Igor, >>> >>> This looks reasonable. >>> >>> Thanks, >>> Serguei >>> >>> >>> On 8/18/20 16:42, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >>>>> 0 lines changed: 0 ins; 0 del; 0 mod; >>>> Hi all, >>>> >>>> could you please review this trivial (and apparently empty) patch which sets allowSmartActionArgs to false only in subdirectories of vmTestbase which currently use PropertyResolvingWrapper? >>>> >>>> (it's hard to tell from webrev or patch, but test/hotspot/jtreg/vmTestbase/TEST.properties is effectively removed) >>>> >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8252005/webrev.00/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8252005 >>>> >>>> Thanks, >>>> -- Igor >>>> >>>> >>> >> > From vladimir.kozlov at oracle.com Thu Aug 20 19:21:53 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2020 12:21:53 -0700 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL In-Reply-To: <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> References: <877dtt3ckp.fsf@redhat.com> <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> Message-ID: +1 Thanks, Vladimir K On 8/20/20 2:45 AM, Christian Hagedorn wrote: > Hi Roland > > That looks good to me. > > Best regards, > Christian > > On 20.08.20 11:12, Roland Westrelin wrote: >> >> http://cr.openjdk.java.net/~roland/8251527/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8251527 >> >> This triggers with Shenandoah but the fix (and the bug) is in shared >> C2 code. 
>> >> CallNode::extract_projections(), once it has found the control ProjNode >> looks for the CatchNode at the first use of the ProjNode. In the case of >> the crash, the ProjNode has more than one use and the first use is not >> the CatchNode (but a pinned LoadNode). I propose using unique_ctrl_out() >> instead. >> >> The ProjNode has a LoadNode because one is pinned on a ProjNode by >> PhaseIdealLoop::split_if_with_blocks_post() when it tries to sink the >> LoadNode out of loop. A LoadNode becomes the first use of the ProjNode >> after the loop body is cloned during unswitching. >> >> Roland. >> From igor.ignatyev at oracle.com Thu Aug 20 20:47:07 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 13:47:07 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit Message-ID: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 > 75 lines changed: 13 ins; 29 del; 33 mod; Hi all, could you please review this small patch which removes usage of PropertyResolvingWrapper class from vm/compiler/complog/uninit? a bit of background (from 8219140): > CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats LogCompilationTest: whitespace, imports cleanup, etc. JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 testing: :vmTestbase_vm_compiler Thanks, -- Igor From igor.ignatyev at oracle.com Thu Aug 20 20:57:34 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 13:57:34 -0700 Subject: RFR(S) : 8251998 remove usage of PropertyResolvingWrapper in vmTestbase/jit/t Message-ID: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 > 69 lines changed: 4 ins; 24 del; 41 mod; Hi all, could you please review this small patch which removes usages of PropertyResolvingWrapper from vmTestbase/jit/t tests and reenabled allowSmartActionArgs? background from the main bug: > CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. 
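To make the converted shape concrete, a minimal jtreg test of that kind could look like the sketch below. The test name, class and the -Xbatch option are invented for illustration, and it assumes allowSmartActionArgs is enabled for the directory in question, which is what this series of changes re-enables; the real tests are in the webrev below.

```
/*
 * @test
 * @summary Illustration only: with smart action args, the externally supplied
 *          options can be written directly on the action line instead of being
 *          funneled through PropertyResolvingWrapper.
 * @run main/othervm ${test.vm.opts} ${test.java.opts} -Xbatch SmartActionArgsDemo
 */
public class SmartActionArgsDemo {
    public static void main(String[] args) {
        System.out.println("launched with test.vm.opts/test.java.opts expanded by jtreg");
    }
}
```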
JBS: https://bugs.openjdk.java.net/browse/JDK-8251998 webrev: http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 testing: :vmTestbase_vm_compiler Thanks, -- Igor From vladimir.x.ivanov at oracle.com Thu Aug 20 22:01:14 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 21 Aug 2020 01:01:14 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87r1s11ewc.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> Message-ID: >> src/hotspot/share/opto/callnode.cpp: >> >> // If you have back to back safepoints, remove one >> if( in(TypeFunc::Control)->is_SafePoint() ) >> return in(TypeFunc::Control); >> >> - if( in(0)->is_Proj() ) { >> + // Transforming long counted loops requires a safepoint node. Do not >> + // eliminate a safepoint until loop opts are over. >> + if (in(0)->is_Proj() && !phase->C->major_progress()) { >> >> Can you elaborate on this a bit? Why elimination of back-to-back >> safepoints cause problems during new transformation? Is it because you >> need specifically a SafePoint because CallNode doesn't fit? > > The issue is with a call followed by a SafePointNode. A call captures > the state before the call but we would need the state after the call > otherwise on a deoptimization we would re-executed the call. Sorry, I don't get it. Normally JVM state associated with a call is a state right after the call returns. Do you mean there are cases when call has reexecute bit set and hence it has JVM state before the call associated with it? Anyway, it's trivial to convert between 2 states (before and after) and we already do that in some places (e.g., late inline prepares JVM state for the parser based on the state associated with CallStaticJava node). >> =============== >> src/hotspot/share/opto/loopnode.cpp: >> >> +void PhaseIdealLoop::add_empty_predicate(Deoptimization::DeoptReason >> reason, Node* inner_head, IdealLoopTree* loop, SafePointNode* sfpt) { >> >> Nothing actionable at the moment, but it's unfortunate to see more and >> more code being duplicated from GraphKit. I wish there were a way to >> share implementation between GraphKit, PhaseIdealLoop, and >> PhaseMacroExpand. > > Actually, there might be a way. In the valhalla repo, Tobias pushed a > change to GraphKit so it's possible to build one with an igvn > argument. 
So we could do this: > > JVMState* jvms = cloned_sfpt->jvms()->clone_shallow(C); > SafePointNode* map = cloned_sfpt->clone()->as_SafePoint(); > map->set_jvms(jvms); > jvms->set_map(map); > GraphKit kit(jvms, &_igvn); > kit.set_control(inner_head->in(LoopNode::EntryControl)); > > kit.add_empty_predicates(0); > > _igvn.replace_input_of(inner_head, LoopNode::EntryControl, kit.control()); > _igvn.remove_dead_node(map); > > instead of: > > if (UseLoopPredicate) { > add_empty_predicate(Deoptimization::Reason_predicate, inner_head, outer_ilt, cloned_sfpt); > } > if (UseProfiledLoopPredicate) { > add_empty_predicate(Deoptimization::Reason_profile_predicate, inner_head, outer_ilt, cloned_sfpt); > } > add_empty_predicate(Deoptimization::Reason_loop_limit_check, inner_head, outer_ilt, cloned_sfpt); > > and the new PhaseIdealLoop::add_empty_predicate() wouldn't be needed > anymore. > > One thing to consider is that new nodes for predicates are added by > GraphKit now and they are not registered with PhaseIdealLoop. It may not > be a problem because peeling sets major_progress so no further loop opts > will be applied in this round. > > Anyway, if we wanted to pursue this further, I think it would make sense > to push Tobias' patch first. > > What do you think? Wow, it looks very promising! I'm perfectly fine with addressing it later. Best regards, Vladimir Ivanov From ekaterina.pavlova at oracle.com Thu Aug 20 22:03:50 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 20 Aug 2020 15:03:50 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit In-Reply-To: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> References: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> Message-ID: <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> Looks good, -katya On 8/20/20 1:47 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >> 75 lines changed: 13 ins; 29 del; 33 mod; > > Hi all, > > could you please review this small patch which removes usage of PropertyResolvingWrapper class from vm/compiler/complog/uninit? > > a bit of background (from 8219140): >> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. > > jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats LogCompilationTest: whitespace, imports cleanup, etc. 
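As a rough sketch of that getTestJavaOpts() direction, the fragment below shows the general idea; the class name, the flags being launched and the use of ProcessTools/OutputAnalyzer are illustrative assumptions (this is not the actual LogCompilationTest change), and it presumes the test declares @library /test/lib so jdk.test.lib is on the class path.

```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import jdk.test.lib.Utils;
import jdk.test.lib.process.OutputAnalyzer;
import jdk.test.lib.process.ProcessTools;

// Illustration only: pick up the externally supplied flags from the test
// library instead of having jtreg hand "${test.vm.opts} ${test.java.opts}"
// to the test as a single -options argument.
public class TestJavaOptsDemo {
    public static void main(String[] args) throws Exception {
        List<String> cmd = new ArrayList<>();
        Collections.addAll(cmd, Utils.getTestJavaOpts()); // test.vm.opts + test.java.opts
        cmd.add("-XX:+PrintCompilation");
        cmd.add("-version");
        ProcessBuilder pb = ProcessTools.createJavaProcessBuilder(cmd.toArray(new String[0]));
        OutputAnalyzer out = new OutputAnalyzer(pb.start());
        out.shouldHaveExitValue(0);
    }
}
```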
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 > webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 > testing: :vmTestbase_vm_compiler > > Thanks, > -- Igor > From john.r.rose at oracle.com Fri Aug 21 00:12:23 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 20 Aug 2020 17:12:23 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87y2m91nvl.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> <87y2m91nvl.fsf@redhat.com> Message-ID: <9CBCBEBB-7C33-4263-8348-900AAC068D65@oracle.com> On Aug 20, 2020, at 5:51 AM, Roland Westrelin wrote: > > Hi John, > >> I?m going over it one more time (between smoky breaths from >> the California fires) and I have a question. What is the exact >> structure of outer_phi? > > Thanks for taking another look at this. > >> 1. At first it is a clone of phi, its region patched and the other edges the same: >> >> outer_phi := Phi(outer_head, init, AddL(phi, stride)) > > It's only a clone so: > > outer_phi := Phi(outer_head, init, AddI(phi, stride)) > > (that is no AddL) I?m not sure what you mean? The original incr is an AddL, since we are transforming a long loop. The AddI goes somewhere else in the transformed code. But, yes, the first step is just to make a cloned phi and patch it into the outer loop. > >> 2. After long_loop_replace_long_iv, the interior phi links it back to itself: >> >> outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride)) > > I don't think that's right. There are 2 calls to > long_loop_replace_long_iv(). One to replace phi and the other one to > replace incr (that is the AddI above). Right; I missed the fact that the second replace_long_iv step replaces the AddL completely. So while phi is replaced by AddL(I2L(inner_phi), outer_phi), incr is replaced by AddL(I2L(inner_incr), outer_phi). phi := Phi(x, init, incr) => AddL(I2L(inner_phi), outer_phi) incr := AddL(phi, stride) => AddL(I2L(inner_incr), outer_phi) And the effect of those replacements on outer_phi (the patched clone of phi) is: outer_phi := Phi(outer_head, init, <>) => Phi(outer_head, init, <>) => outer_phi := Phi(outer_head, init, AddL(I2L(AddI(inner_phi, intcon(stride))), outer_phi)) not (as I was said previously): outer_phi := Phi(outer_head, init, AddL(<>, longcon(stride))) => Phi(outer_head, init, AddL(<>, longcon(stride))) And, in the corrected transform, there is no worrying extra addition of stride (using AddL directly). > > outer_phi := Phi(outer_head, init, AddL(I2L(AddI(inner_phi, stride)), outer_phi)) > > (actually this is not quite accurate because peeling one iteration > causes an extra phi to be added to merge the peeled iteration with the > counted loop in most cases). > > Do you see a problem with the above outer_phi structure? Not any more. Let?s just make sure the transform gets exercised, OK? After the P.S. is an amended chunk of pseudocode showing how it works. 
I created it by labeling the various expressions in the example loop with the names used in is_long_counted_loop, and then I stepped through is_long_counted_loop and edited the pseudocode to reflect each step. If you agree I did it correctly, and that it helps explain the code, you could place it as a comment at bottom, just before the final peel. Otherwise, we can just leave it here FTR. I do have this specific request: Please replace the pseudocode at the top (already in the webrev) with the following corrected pseudocode. It uses names more consistent with the actual C++ code and corresponds more accurately to the transformed IR. ``` // range of long values from the initial loop in (at most) max int // steps. That is: x: for (long phi = init; phi < limit; phi += stride) { // phi := Phi(L, init, incr) // incr := AddL(phi, longcon(stride)) // phi_incr := phi (test happens before increment) long incr = phi + stride; ... use phi and incr ... } OR: x: for (long phi = init; (phi += stride) < limit; ) { // phi := Phi(L, AddL(init, stride), incr) // incr := AddL(phi, longcon(stride)) // phi_incr := NULL (test happens after increment) long incr = phi + stride; ... use phi and (phi + stride) ... } ==transform=> const ulong inner_iters_limit = INT_MAX - stride - 1; //near 0x7FFFFFF0 assert(stride <= inner_iters_limit); // else abort transform assert((extralong)limit + stride <= LONG_MAX); // else deopt outer_head: for (long outer_phi = init;;) { // outer_phi := Phi(outer_head, init, AddL(outer_phi, I2L(inner_phi))) ulong inner_iters_max = (ulong) MAX(0, ((extralong)limit + stride - outer_phi)); long inner_iters_actual = MIN(inner_iters_limit, inner_iters_max); assert(inner_iters_actual == (int)inner_iters_actual); int inner_phi, inner_incr; x: for (inner_phi = 0;; inner_phi = inner_incr) { // inner_phi := Phi(x, intcon(0), inner_incr) // inner_incr := AddI(inner_phi, intcon(stride)) inner_incr = inner_phi + stride; if (inner_incr < inner_iters_actual) { ... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ... continue; } else break; } if ((outer_phi+inner_phi) < limit) //OR (outer_phi+inner_incr) < limit continue; else break; } ``` Thanks! ? John P.S. Here are the intermediate steps, annotated with the C++ variable names for the various nodes, and with the steps that created the transformed loop nodes. == old IR nodes => entry_control: {...} x: for (long phi = init;;) { // phi := Phi(x, init, incr) // incr := AddL(phi, longcon(stride)) exit_test: if (phi < limit) back_control: fallthrough; else exit_branch: break; // test happens before increment => phi == phi_incr != NULL long incr = phi + stride; ... use phi and incr ... 
phi = incr; } == new IR nodes (just before final peel) => entry_control: {...} long adjusted_limit = limit + stride; //because phi_incr != NULL assert(!limit_check_required || (extralong)limit + stride == adjusted_limit); // else deopt ulong inner_iters_limit = max_jint - ABS(stride) - 1; //near 0x7FFFFFF0 outer_head: for (long outer_phi = init;;) { // outer_phi := phi->clone(), in(0):=outer_head, => Phi(outer_head, init, incr) // REPLACE phi => AddL(outer_phi, I2L(inner_phi)) // REPLACE incr => AddL(outer_phi, I2L(inner_incr)) // SO THAT outer_phi := Phi(outer_head, init, AddL(outer_phi, I2L(inner_incr))) ulong inner_iters_max = (ulong) MAX(0, ((extralong)adjusted_limit - outer_phi) * SGN(stride)); int inner_iters_actual_int = (int) MIN(inner_iters_limit, inner_iters_max) * SGN(stride); inner_head: x: //in(1) := outer_head int inner_phi; for (inner_phi = 0;;) { // inner_phi := Phi(x, intcon(0), inner_phi + stride) int inner_incr = inner_phi + stride; bool inner_bol = (inner_incr < inner_iters_actual_int); exit_test: //exit_test->in(1) := inner_bol; if (inner_bol) // WAS (phi < limit) back_control: fallthrough; else inner_exit_branch: break; //exit_branch->clone() ... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ... inner_phi = inner_phi + stride; // inner_incr } outer_exit_test: //exit_test->clone(), in(0):=inner_exit_branch if ((outer_phi+inner_phi) < limit) // WAS (phi < limit) outer_back_branch: fallthrough; //back_control->clone(), in(0):=outer_exit_test else exit_branch: break; //in(0) := outer_exit_test } From vladimir.kozlov at oracle.com Fri Aug 21 00:17:00 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2020 17:17:00 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit In-Reply-To: <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> References: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> Message-ID: +1 Vladimir K On 8/20/20 3:03 PM, Ekaterina Pavlova wrote: > Looks good, > > -katya > > On 8/20/20 1:47 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >>> 75 lines changed: 13 ins; 29 del; 33 mod; >> >> Hi all, >> >> could you please review this small patch which removes usage of PropertyResolvingWrapper class from >> vm/compiler/complog/uninit? >> >> a bit of background (from 8219140): >>> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed >>> anymore and can be removed. >> >> jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and >> only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options >> "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats >> LogCompilationTest: whitespace, imports cleanup, etc. 
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >> testing: :vmTestbase_vm_compiler >> >> Thanks, >> -- Igor >> > From HORIE at jp.ibm.com Fri Aug 21 02:33:16 2020 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 21 Aug 2020 11:33:16 +0900 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200819165338.GA978936@pacoca> References: <20200819165338.GA978936@pacoca>, <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: Hi Jose, One thing I noticed is a misaligned backslash in globals_ppc.hpp. Otherwise, the change looks good! /* special instructions */ \ + product(bool, UseByteReverseInstructions, false, \ Best regards, Michihiro ----- Original message ----- From: joserz at linux.ibm.com To: "Doerr, Martin" Cc: Michihiro Horie/Japan/IBM at IBMJP, "hotspot-compiler-dev at openjdk.java.net" Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions Date: Thu, Aug 20, 2020 1:53 AM On Wed, Aug 19, 2020 at 09:55:50AM +0000, Doerr, Martin wrote: > Hi Jose, > > thanks for the update. > > I have never seen 2 format specifications in the ad file. Does that work or does the 2nd one overwrite the 1st one? > I think it should be: > format %{ "BRH $dst, $src\n\t" > "EXTSH $dst, $dst" %} You're right, actually the 2nd one overwrote the first. I just fixed it. Thanks sir! > > I don't need to see another webrev for that. Otherwise, the change looks good. Thanks for contributing. > > Best regards, > Martin > > > > -----Original Message----- > > From: joserz at linux.ibm.com > > Sent: Mittwoch, 19. August 2020 02:25 > > To: Doerr, Martin > > Cc: Michihiro Horie ; hotspot-compiler- > > dev at openjdk.java.net > > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > Hallo Martin! > > > > Thank you very much for your review. Here is the v3: > > > > Webrev: http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > I run a functional test and it's working as expected. If you try to run it in a > > system > > > $ java -XX:+UseByteReverseInstructions ReverseBytes > > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > > but needs at least Power10. > > (continue with existing code) > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > ???????? > > > > This is the code I use to test: > > 8<--------------------------------------------------------------- > > import java.io.IOException; > > > > class ReverseBytes > > { > > public static void main(String[] args) throws IOException > > { > > for (int i = 0; i < 1000000; ++i) { > > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > > throw new RuntimeException(); > > } > > > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > > 0xF0DEBC9A78563412L) { > > throw new RuntimeException(); > > } > > > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > > throw new RuntimeException(); > > } > > > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > > throw new RuntimeException(); > > } > > } > > System.out.println("ok"); > > } > > } > > 8<--------------------------------------------------------------- > > > > Best regards! 
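As background on why an EXTSH has to follow the BRH in that rule: Short.reverseBytes() returns a signed short, so a reversed value whose new high byte has the top bit set must come back sign-extended in the register, while Character.reverseBytes() is the unsigned 16-bit counterpart. A small stand-alone check (nothing below is from the webrev; it only demonstrates the Java-level semantics the instruction sequence has to match):

```
public class ReverseShortSignCheck {
    public static void main(String[] args) {
        short s = 0x00FF;
        short r = Short.reverseBytes(s);           // bytes swapped: 0xFF00 as a short
        System.out.println(r);                     // prints -256 (sign-extended), not 65280
        System.out.println(r == (short) 0xFF00);   // true

        char c = Character.reverseBytes((char) 0x00FF);
        System.out.println((int) c);               // prints 65280: char stays unsigned
    }
}
```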
> > > > Jose > > > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > > Hi Michihiro and Jose, > > > > > > I had only done a quick review during my vacation. Thanks for updating the > > description of PowerArchitecturePPC64. > > > After taking a second look, I have a few minor requests. Sorry for that. > > > > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent with > > other names. > > > * Please add ?size? specifications to the ppc.ad file. Otherwise, the > > compiler has to determine sizes dynamically every time. > > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > So we rely on your testing. > > > > > > Thanks and best regards, > > > Martin > > > > > > > > > From: Michihiro Horie > > > Sent: Dienstag, 18. August 2020 09:28 > > > To: Doerr, Martin > > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > > byte-reverse instructions > > > > > > > > > Jose, > > > Latest change looks good also to me. > > > > > > Marin, > > > Do you think if I can push the change? > > > > > > Best regards, > > > Michihiro > > > > > > > > > ----- Original message ----- > > > From: "Doerr, Martin" > > > > > > To: "joserz at linux.ibm.com" > > > > > > Cc: hotspot compiler > dev at openjdk.java.net>, > > "horie at jp.ibm.com" > > > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > and use new byte-reverse instructions > > > Date: Wed, Jul 1, 2020 4:01 AM > > > > > > Thanks for the much better flag description. > > > Looks good. > > > > > > Best regards, > > > Martin > > > > > > > Am 30.06.2020 um 02:15 schrieb > > "joserz at linux.ibm.com" > > >: > > > > > > > > ?Hello team, > > > > > > > > Here's the 2nd version, implementing the suggestions asked by Martin. > > > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/ > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > > > Thank you!! > > > > > > > > Jose > > > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > > >> Hi Jose, > > > >> > > > >> Can you replace the outdated description of PowerArchitecturePPC64 in > > globals_poc.hpp by something generic, please? > > > >> > > > >> Please update the Copyright year in vm_version_poc.hpp. > > > >> > > > >> I can?t test the change, but it looks good to me. > > > >> > > > >> Best regards, > > > >> Martin > > > >> > > > >>>> Am 26.06.2020 um 20:29 schrieb > > "joserz at linux.ibm.com" > > >: > > > >>> > > > >>> ?Hello team! > > > >>> > > > >>> This patch introduces Power10 to OpenJDK and implements three new > > instructions: > > > >>> - brh - byte-reverse halfword > > > >>> - brw - byte-reverse word > > > >>> - brd - byte-reverse doubleword > > > >>> > > > >>> Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/ > > > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > >>> > > > >>> Thanks for your review! > > > >>> > > > >>> Jose R. 
Ziviani > > > From igor.ignatyev at oracle.com Fri Aug 21 03:18:50 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 20 Aug 2020 20:18:50 -0700 Subject: RFR(S) : 8251996 : remove usage of PropertyResolvingWrapper in vm/compiler/complog/uninit In-Reply-To: References: <5DA75BC6-7102-4582-903A-F5299C398254@oracle.com> <53db4f4d-75b5-be5b-5799-00ae8e567e65@oracle.com> Message-ID: <92D72E51-F252-49CF-AE72-367C77C24E9C@oracle.com> Vladimir, Katya, thank you for your reviews, pushed. -- Igor > On Aug 20, 2020, at 5:17 PM, Vladimir Kozlov wrote: > > +1 > > Vladimir K > > On 8/20/20 3:03 PM, Ekaterina Pavlova wrote: >> Looks good, >> -katya >> On 8/20/20 1:47 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >>>> 75 lines changed: 13 ins; 29 del; 33 mod; >>> >>> Hi all, >>> >>> could you please review this small patch which removes usage of PropertyResolvingWrapper class from vm/compiler/complog/uninit? >>> >>> a bit of background (from 8219140): >>>> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. >>> >>> jtreg can't pass "${test.vm.opts} ${test.java.opts}" as one option, so v.c.c.share.LogCompilationTest (used by and only by 13 complog/uninit tests) was updated to use j.t.lib.Utils::getTestJavaOpts() to get vm flags, and '-options "${test.vm.opts} ${test.java.opts}"' was removed from all 13 tests. the patch also slightly reformats LogCompilationTest: whitespace, imports cleanup, etc. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8251996 >>> webrev: http://cr.openjdk.java.net/~iignatyev//8251996/webrev.00 >>> testing: :vmTestbase_vm_compiler >>> >>> Thanks, >>> -- Igor >>> From thomas.stuefe at gmail.com Fri Aug 21 05:58:42 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 21 Aug 2020 07:58:42 +0200 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <20200819165338.GA978936@pacoca> Message-ID: Hi, Version 3 of these changes look good to me too. Cheers, Thomas On Fri, Aug 21, 2020 at 4:33 AM Michihiro Horie wrote: > Hi Jose, > > One thing I noticed is a misaligned backslash in globals_ppc.hpp. > Otherwise, the change looks good! > > /* special instructions */ \ > + product(bool, UseByteReverseInstructions, false, \ > > > Best regards, > Michihiro > > > ----- Original message ----- > From: joserz at linux.ibm.com > To: "Doerr, Martin" > Cc: Michihiro Horie/Japan/IBM at IBMJP, " > hotspot-compiler-dev at openjdk.java.net" < > hotspot-compiler-dev at openjdk.java.net> > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > byte-reverse instructions > Date: Thu, Aug 20, 2020 1:53 AM > > On Wed, Aug 19, 2020 at 09:55:50AM +0000, Doerr, Martin wrote: > > Hi Jose, > > > > thanks for the update. > > > > I have never seen 2 format specifications in the ad file. Does that work > or does the 2nd one overwrite the 1st one? > > I think it should be: > > format %{ "BRH $dst, $src\n\t" > > "EXTSH $dst, $dst" %} > > You're right, actually the 2nd one overwrote the first. I just fixed it. > Thanks sir! > > > > > I don't need to see another webrev for that. Otherwise, the change looks > good. Thanks for contributing. 
> > > > Best regards, > > Martin > > > > > > > -----Original Message----- > > > From: joserz at linux.ibm.com > > > Sent: Mittwoch, 19. August 2020 02:25 > > > To: Doerr, Martin > > > Cc: Michihiro Horie ; hotspot-compiler- > > > dev at openjdk.java.net > > > Subject: Re: RFR(M): 8248190: PPC: Enable Power10 system and use new > > > byte-reverse instructions > > > > > > Hallo Martin! > > > > > > Thank you very much for your review. Here is the v3: > > > > > > Webrev: *http://cr.openjdk.java.net/~mhorie/8248190/webrev.02/* > > > > Bug: *https://bugs.openjdk.java.net/browse/JDK-8248190* > > > > > > > I run a functional test and it's working as expected. If you try to > run it in a > > > system > > > > > $ java -XX:+UseByteReverseInstructions ReverseBytes > > > OpenJDK 64-Bit Server VM warning: UseByteReverseInstructions specified, > > > but needs at least Power10. > > > (continue with existing code) > > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > ???????? > > > > > > This is the code I use to test: > > > 8<--------------------------------------------------------------- > > > import java.io.IOException; > > > > > > class ReverseBytes > > > { > > > public static void main(String[] args) throws IOException > > > { > > > for (int i = 0; i < 1000000; ++i) { > > > if (Integer.reverseBytes(0x12345678) != 0x78563412) { > > > throw new RuntimeException(); > > > } > > > > > > if (Long.reverseBytes(0x123456789ABCDEF0L) != > > > 0xF0DEBC9A78563412L) { > > > throw new RuntimeException(); > > > } > > > > > > if (Short.reverseBytes((short)0x1234) != (short)0x3412) { > > > throw new RuntimeException(); > > > } > > > > > > if (Character.reverseBytes((char)0xabcd) != (char)0xcdab) { > > > throw new RuntimeException(); > > > } > > > } > > > System.out.println("ok"); > > > } > > > } > > > 8<--------------------------------------------------------------- > > > > > > Best regards! > > > > > > Jose > > > > > > On Tue, Aug 18, 2020 at 09:13:39AM +0000, Doerr, Martin wrote: > > > > Hi Michihiro and Jose, > > > > > > > > I had only done a quick review during my vacation. Thanks for > updating the > > > description of PowerArchitecturePPC64. > > > > After taking a second look, I have a few minor requests. Sorry for > that. > > > > > > > > > > > > * ?UseByteReverseInstructions? (plural) would be more consistent > with > > > other names. > > > > * Please add ?size? specifications to the ppc.ad file. > Otherwise, the > > > compiler has to determine sizes dynamically every time. > > > > * bytes_reverse_short: ?format? specification misses ?extsh?. > > > > > > > > Unfortunately, I couldn?t find a Power10 machine in my garage ?? > > > > So we rely on your testing. > > > > > > > > Thanks and best regards, > > > > Martin > > > > > > > > > > > > From: Michihiro Horie > > > > Sent: Dienstag, 18. August 2020 09:28 > > > > To: Doerr, Martin > > > > Cc: hotspot-compiler-dev at openjdk.java.net; joserz at linux.ibm.com > > > > Subject: RE: RFR(M): 8248190: PPC: Enable Power10 system and use new > > > byte-reverse instructions > > > > > > > > > > > > Jose, > > > > Latest change looks good also to me. > > > > > > > > Marin, > > > > Do you think if I can push the change? 
> > > > > > > > Best regards, > > > > Michihiro > > > > > > > > > > > > ----- Original message ----- > > > > From: "Doerr, Martin" > > > >> > > > > To: "joserz at linux.ibm.com<*mailto:joserz at linux.ibm.com* > >" > > > >> > > > > Cc: hotspot compiler > > dev at openjdk.java.net<*mailto:hotspot-compiler-dev at openjdk.java.net* > >>, > > > "horie at jp.ibm.com<*mailto:horie at jp.ibm.com* >" > > > >> > > > > Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > > and use new byte-reverse instructions > > > > Date: Wed, Jul 1, 2020 4:01 AM > > > > > > > > Thanks for the much better flag description. > > > > Looks good. > > > > > > > > Best regards, > > > > Martin > > > > > > > > > Am 30.06.2020 um 02:15 schrieb > > > "joserz at linux.ibm.com<*mailto:joserz at linux.ibm.com* > >" > > > >>: > > > > > > > > > > ?Hello team, > > > > > > > > > > Here's the 2nd version, implementing the suggestions asked by > Martin. > > > > > > > > > > Webrev: *https://cr.openjdk.java.net/~mhorie/8248190/webrev.01/* > > > > > > Bug: *https://bugs.openjdk.java.net/browse/JDK-8248190* > > > > > > > > > > > Thank you!! > > > > > > > > > > Jose > > > > > > > > > >> On Sat, Jun 27, 2020 at 09:29:32AM +0000, Doerr, Martin wrote: > > > > >> Hi Jose, > > > > >> > > > > >> Can you replace the outdated description of > PowerArchitecturePPC64 in > > > globals_poc.hpp by something generic, please? > > > > >> > > > > >> Please update the Copyright year in vm_version_poc.hpp. > > > > >> > > > > >> I can?t test the change, but it looks good to me. > > > > >> > > > > >> Best regards, > > > > >> Martin > > > > >> > > > > >>>> Am 26.06.2020 um 20:29 schrieb > > > "joserz at linux.ibm.com<*mailto:joserz at linux.ibm.com* > >" > > > >>: > > > > >>> > > > > >>> ?Hello team! > > > > >>> > > > > >>> This patch introduces Power10 to OpenJDK and implements three new > > > instructions: > > > > >>> - brh - byte-reverse halfword > > > > >>> - brw - byte-reverse word > > > > >>> - brd - byte-reverse doubleword > > > > >>> > > > > >>> Webrev: *https://cr.openjdk.java.net/~mhorie/8248190/webrev.00/* > > > > > >>> Bug: *https://bugs.openjdk.java.net/browse/JDK-8248190* > > > > > >>> > > > > >>> Thanks for your review! > > > > >>> > > > > >>> Jose R. Ziviani > > > > > > > From shade at redhat.com Fri Aug 21 07:38:37 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 21 Aug 2020 09:38:37 +0200 Subject: RFR (XS) 8252120: compiler/oracle/TestCompileCommand.java misspells "occured" In-Reply-To: References: Message-ID: <9a286b3e-60a5-87f9-00cb-4a4aa303947e@redhat.com> On 8/20/20 6:16 PM, Igor Ignatyev wrote: > LGTM Thanks, pushed. 
-- -Aleksey From tobias.hartmann at oracle.com Fri Aug 21 07:43:03 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 21 Aug 2020 09:43:03 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87tuwx1gcf.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> Message-ID: <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> For the record, I've tested tier1-9 with "default" flags and tier1-5 with -XX:StressLongCountedLoop=1 and -XX:StressLongCountedLoop=4294967295. Please let me know if you think other flag combinations/values should be tested as well. Best regards, Tobias On 20.08.20 17:34, Roland Westrelin wrote: > >> Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. > > Thanks for the review and testing! > > Roland. > From rwestrel at redhat.com Fri Aug 21 07:51:17 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 09:51:17 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <9CBCBEBB-7C33-4263-8348-900AAC068D65@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <668C77F1-5CCC-43CA-9C5E-2EE390D3137A@oracle.com> <87y2m91nvl.fsf@redhat.com> <9CBCBEBB-7C33-4263-8348-900AAC068D65@oracle.com> Message-ID: <87o8n41loq.fsf@redhat.com> > I?m not sure what you mean? The original incr is an AddL, > since we are transforming a long loop. The AddI goes > somewhere else in the transformed code. I was confusing myself. Ignore that comment. > Not any more. Let?s just make sure the transform gets exercised, OK? "make sure the transform gets exercised" = properly stress tested? Tobias listed StressLongCountedLoop he used in an other email. > After the P.S. is an amended chunk of pseudocode showing how it works. > I created it by labeling the various expressions in the example loop > with the names used in is_long_counted_loop, and then I stepped > through is_long_counted_loop and edited the pseudocode to > reflect each step. If you agree I did it correctly, and that it helps > explain the code, you could place it as a comment at bottom, just > before the final peel. Otherwise, we can just leave it here FTR. > > I do have this specific request: Please replace the pseudocode > at the top (already in the webrev) with the following corrected > pseudocode. It uses names more consistent with the actual C++ > code and corresponds more accurately to the transformed IR. Ok. Let me do that. Roland. 
From ningsheng.jian at arm.com Fri Aug 21 07:56:19 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Fri, 21 Aug 2020 15:56:19 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> Message-ID: <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> Hi Vladimir, Thanks a lot for looking at this! On 8/20/20 8:29 PM, Vladimir Ivanov wrote: > Hi Ningsheng, > >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html > > > Impressive work, Ningsheng! > >> http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt > > "Since the bottom 128 bits are shared with the NEON, we extend current > register mask definition of V0-V31 registers. Currently, c2 uses one bit > mask for a 32-bit register slot, so to define at most 2048 bits we will > need to add 64 slots in AD file. That's a really large number, and will > also break current regmask assumption." > > Can you, please, elaborate on the last point? What RegMask assumptions > are broken for 2048-bit vectors? I'm looking at [1] and try to > understand the motivation for the changes in shared code. Current regmask is handled by an array of ints, so an element of regmask array can handle at most 32*32=1024 bits. Some regmask handling functions, e.g. clear_to_sets() for alignment, need to be re-examined for the support of 2048 bits. And we may even want to support non power-of-two physical reg sizes, that could be a lot more work. > > Compared to x86 w/ AVX512, architectural state for vector registers is > 4x larger in the worst case (ignoring predicate registers for now). Here > are the relevant constants on x86: > > gensrc/adfiles/adGlobals_x86.hpp: > > // the number of reserved registers + machine registers. > #define REG_COUNT??? 545 > ... > // Size of register-mask in ints > #define RM_SIZE 22 > > My estimate is that for AArch64 with SVE support the constants will be: > > ? REG_COUNT < 2500 > ? RM_SIZE < 100 > > which don't look too bad. > Right, but given that most real hardware implementations will be no larger than 512 bits, I think. Having a large bitmask array, with most bits useless, will be less efficient for regmask computation. > Also, I don't see any changes related to stack management. So, I assume > it continues to be managed in slots. Any problems there? As I > understand, wide SVE registers are caller-save, so there may be many > spills of huge vectors around a call. (Probably, not possible with C2 > auto-vectorizer as it is now, but Vector API will expose it.) > Yes, the stack is still managed in slots, but it will be allocated with real vector register length instead of 'virtual' slots for VecA. See the usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also applied the patch to vector api, and did find a lot of vector spills with expected correct results. > Have you noticed any performance problems? If that's the case, then > AVX512 support on x86 would benefit from similar optimization as well. > Do you mean register allocation performance problems? I did not notice that before. Do you have any suggestion on how to measure that? 
> FTR there was a similar exercise [2] on x86 to abstract away exact sizes > of vector registers, but it didn't have to worry about RA since all the > operands were already available. Also, vectors of all different sizes > may be used. So, it makes it hard to compare. > I've also noticed that. That's an excellent work indeed. It could save a lot of backend match rules for different vector register sizes, which was one of the concerns when we started to work on SVE RA, if we defined all regmasks for different SVE vector register sizes. And yes, our current approach will also solve that problem. :-) > Best regards, > Vladimir Ivanov > > [1] http://cr.openjdk.java.net/~njian/8231441/webrev.03-ra/ > > [2] https://bugs.openjdk.java.net/browse/JDK-8230015 > Thanks, Ningsheng From rwestrel at redhat.com Fri Aug 21 07:59:53 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 09:59:53 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> Message-ID: <87lfi81lae.fsf@redhat.com> > Sorry, I don't get it. Normally JVM state associated with a call is a > state right after the call returns. Do you mean there are cases when > call has reexecute bit set and hence it has JVM state before the call > associated with it? JVM state at a call can't be state after the call because it would need to capture the return value which can't be an incoming edge to the call, right? > Anyway, it's trivial to convert between 2 states (before and after) and > we already do that in some places (e.g., late inline prepares JVM state > for the parser based on the state associated with CallStaticJava node). Sure it's feasible to build state after the call. I was concerned that the runtime would hardwire somewhere that state at the call is always state before the call. That would lead to nasty, rare and hard to debug failures. It felt a lot simpler and robuster to leave SafePoint nodes in the graph. We could have the patch go through a performance run and see if that change makes any difference if there's concern about it. Roland. From thomas.schatzl at oracle.com Fri Aug 21 08:04:38 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Aug 2020 10:04:38 +0200 Subject: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <20200819165338.GA978936@pacoca> <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> Message-ID: <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> Hi, On 21.08.20 04:33, Michihiro Horie wrote: > > Hi Jose, > > One thing I noticed is a misaligned backslash in globals_ppc.hpp. > Otherwise, the change looks good! > > /* special instructions */ > \ > + product(bool, UseByteReverseInstructions, false, > \ Fwiw, for adding product options, you must go through the CSR process. 
Maybe there is an exception for platform specific ones? Thanks, Thomas From vladimir.x.ivanov at oracle.com Fri Aug 21 09:12:30 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 21 Aug 2020 12:12:30 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87lfi81lae.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> <87lfi81lae.fsf@redhat.com> Message-ID: >> Sorry, I don't get it. Normally JVM state associated with a call is a >> state right after the call returns. Do you mean there are cases when >> call has reexecute bit set and hence it has JVM state before the call >> associated with it? > > JVM state at a call can't be state after the call because it would need > to capture the return value which can't be an incoming edge to the call, > right? Yes, you are right. But, strictly speaking, it's not the state before the call either since all the arguments are not on the stack anymore (as an example [1]). >> Anyway, it's trivial to convert between 2 states (before and after) and >> we already do that in some places (e.g., late inline prepares JVM state >> for the parser based on the state associated with CallStaticJava node). > > Sure it's feasible to build state after the call. I was concerned that > the runtime would hardwire somewhere that state at the call is always > state before the call. That would lead to nasty, rare and hard to debug > failures. It felt a lot simpler and robuster to leave SafePoint nodes in > the graph. We could have the patch go through a performance run and see > if that change makes any difference if there's concern about it. Indeed, keeping a safepoint right after the call does look appealing. But it also means there should be always a safepoint accompanying the call and it should follow it immediately for the logic in question to be in effect. Do we guarantee that? Best regards, Vladimir Ivanov [1] 56 invokevirtual 179 152 bci: 56 VirtualCallData count(0) nonprofiled_count(0) entries(2) 'java/util/HashMap'(4607 0.99) 'java/util/LinkedHashMap'(57 0.01) (lldb) p jvms->dump() 149 SafePoint === 146 94 148 8 9 10 1 1 1 1 15 1 1 1 1 1 1 1 96 102 10 11 12 13 32 1 [[]] SafePoint replaced nodes: 127->136 !orig=55,[26],23 JVMS depth=1 loc=5 stk=18 arg=20 mon=26 scalar=26 end=26 mondepth=0 sp=2 bci=56 reexecute=false method=virtual jobject java.util.HashMap.putVal(jint, jobject, jobject, jboolean, jboolean) bc: locals(13): 10 1 1 1 1 15 1 1 1 1 1 1 1 stack(2): 96 102 args(6): 10 11 12 13 32 1 monitors(0): scalars(0): (lldb) p call->jvms()->dump() ... 
150 CallDynamicJava === 146 94 95 8 1 ( 10 11 12 13 32 10 1 1 1 1 15 1 1 1 1 1 1 1 96 102 ) [[ 151 152 153 155 156 163 164 ]] # Dynamic java.util.HashMap::newNode java/util/HashMap$Node * ( java/util/HashMap:NotNull *, int, java/lang/Object *, java/lang/Object *, java/util/HashMap$Node * ) HashMap::putVal @ bci:56 !jvms: HashMap::putVal @ bci:56 JVMS depth=1 loc=10 stk=23 arg=25 mon=25 scalar=25 end=25 mondepth=0 sp=2 bci=56 reexecute=false method=virtual jobject java.util.HashMap.putVal(jint, jobject, jobject, jboolean, jboolean) bc: locals(13): 10 1 1 1 1 15 1 1 1 1 1 1 1 stack(2): 96 102 args(0): monitors(0): scalars(0): (lldb) p new_jvms->dump() ... 149 SafePoint === 158 163 165 8 9 10 1 1 1 1 15 1 1 1 1 1 1 1 96 102 155 11 12 13 32 1 | 161 [[]] SafePoint replaced nodes: 127->136 !orig=55,[26],23 JVMS depth=1 loc=5 stk=18 arg=21 mon=26 scalar=26 end=26 mondepth=0 sp=3 bci=56 reexecute=false method=virtual jobject java.util.HashMap.putVal(jint, jobject, jobject, jboolean, jboolean) bc: locals(13): 10 1 1 1 1 15 1 1 1 1 1 1 1 stack(3): 96 102 155 args(5): 11 12 13 32 1 monitors(0): scalars(0): From rwestrel at redhat.com Fri Aug 21 11:41:41 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 13:41:41 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> <87lfi81lae.fsf@redhat.com> Message-ID: <87imdc1b0q.fsf@redhat.com> > But it also means there should be always a safepoint accompanying the > call and it should follow it immediately for the logic in question to be > in effect. Do we guarantee that? In general, it's not guaranteed that there's a safepoint above the loop exit test. We plant SafePointNodes on back branches in the bytecodes but if the destination of the backbranch is not the loop head then the SafePointNode is not above the exit test. If the SafePointNode is not right above the exit test, the current logic looks for a dominating one in the loop body and checks that there's no side effects between the safepoint and the exit test. So it's possible that we can't find a suitable safepoint in which case the transformation can proceed but without predicates for the inner loop (unless the exit test is a not equal test because then a loop limit check is likely required). So even if we find no safepoint, there's a good chance we can transform the loop and do a fair job of optimizing it. I ran ctw on the base module with the stress option that transforms an int counted loop to a long loop and back to an int counted loop nest to estimate how common it is that no suitable safepoint is found and there was only a handful of them. So it's possible that we end up with a loop that doesn't have a safepoint but the loop has a call that dominates the exit test and we could use its jvm state but it seems like a rare corner case so I don't think the extra complexity is worth it. We could revisit this if it turns out to be a common enough code pattern. Roland. 
From vladimir.x.ivanov at oracle.com Fri Aug 21 11:52:43 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 21 Aug 2020 14:52:43 +0300 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87imdc1b0q.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87r1s11ewc.fsf@redhat.com> <87lfi81lae.fsf@redhat.com> <87imdc1b0q.fsf@redhat.com> Message-ID: > So it's possible that we end up with a loop that doesn't have a > safepoint but the loop has a call that dominates the exit test and we > could use its jvm state but it seems like a rare corner case so I don't > think the extra complexity is worth it. We could revisit this if it > turns out to be a common enough code pattern. Sounds good. Thanks for the clarifications and additional experiments! Best regards, Vladimir Ivanov From rwestrel at redhat.com Fri Aug 21 12:29:33 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 14:29:33 +0200 Subject: RFR(XS): 8251527: CTW: C2 (Shenandoah) compilation fails with SEGV due to unhandled catchproj == NULL In-Reply-To: References: <877dtt3ckp.fsf@redhat.com> <4c3aa4cf-af9f-9000-a12d-010bdd477b30@oracle.com> Message-ID: <87ft8g18sy.fsf@redhat.com> Thanks for the reviews, Christian and Vladimir. Roland. From joserz at linux.ibm.com Fri Aug 21 13:37:29 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Fri, 21 Aug 2020 10:37:29 -0300 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> Message-ID: <20200821133729.GA53991@pacoca> Hello! On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > Hi, > > On 21.08.20 04:33, Michihiro Horie wrote: > > > > Hi Jose, > > > > One thing I noticed is a misaligned backslash in globals_ppc.hpp. > > Otherwise, the change looks good! > > > > /* special instructions */ > > \ > > + product(bool, UseByteReverseInstructions, false, > > \ > > Fwiw, for adding product options, you must go through the CSR process. Maybe > there is an exception for platform specific ones? I didn't find any exception for platform specific options. But, "experimental" options don't need such CSR process and, to be honest, experimental seems more appropriate here. What do you think? Thank you for your review! 
:) > > Thanks, > Thomas From thomas.schatzl at oracle.com Fri Aug 21 13:45:17 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Aug 2020 15:45:17 +0200 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200821133729.GA53991@pacoca> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> Message-ID: Hi, On 21.08.20 15:37, joserz at linux.ibm.com wrote: > Hello! > > On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: >> Hi, >> >> On 21.08.20 04:33, Michihiro Horie wrote: >>> >>> Hi Jose, >>> >>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. >>> Otherwise, the change looks good! >>> >>> /* special instructions */ >>> \ >>> + product(bool, UseByteReverseInstructions, false, >>> \ >> >> Fwiw, for adding product options, you must go through the CSR process. Maybe >> there is an exception for platform specific ones? > > I didn't find any exception for platform specific options. But, "experimental" options > don't need such CSR process and, to be honest, experimental seems more appropriate here. > What do you think? > > Thank you for your review! :) Just a fly-by. It's up to you :) - just that product options need to be announced to the world. I kind of agree that experimental seems more appropriate. You can always "upgrade" it later. Thomas From christian.hagedorn at oracle.com Fri Aug 21 14:28:08 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 21 Aug 2020 16:28:08 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance Message-ID: Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8249607 http://cr.openjdk.java.net/~chagedorn/8249607/webrev.00/ In the testcase, a LoadSNode is cloned in PhaseIdealLoop::split_if_with_blocks_post() for each use such that they can float out of a loop. To ensure that these loads cannot float back into the loop, we pin them by setting their control input [1]. In the testcase, all 3 new clones are pinned to a loop exit node that is part of an outer strip mined loop (see [2]). The clones LoadS 901 and 902 have a late control that is outside of the strip mined loop 879. But the dominance information is still correct after the SplitIf optimization since the inner loop exit node 876 IfFalse is still on the dominator chains of these late controls. We later create pre/main/post loops and add additional RegionNodes to merge them together. However, we do not consider these LoadSNodes that have a control input from 876 IfFalse. When later verifying for each node that its early control dominates its latest possible control, we fail because we cannot reach 876 IfFalse anymore on a dominator chain for the late controls of LoadS 901 and 902 which start further down outside of the strip mined loop 879. We have two options to fix this. We could either update the wrong control inputs from 876 IfFalse during the creation/merging of pre/main/post loops or directly fix it inside split_if_with_blocks_post(). I think it is makes more sense and is also easier to directly fix it in split_if_with_blocks_post() where we could be less pessimistic when pinning loads. 
The fix now checks if late_load_ctrl is a loop exit of a loop that has an outer strip mined loop and if it dominates x_ctrl. If that is the case, we use the outer loop exit control instead. This also means that the loads can completely float out of the outer strip mined loop. Applying that to the testcase, we get [3] instead of [2]. LoadS 901 and 902 are both at the outer strip mined loop exit while 903 LoadS is still at the inner loop due to 575 StoreI (x_ctrl is 876 IfFalse and dominates the outer strip mined loop exit). The process of creating pre/main/post loops will then take care of these control inputs of the LoadSNodes and rewires them to the newly created RegionNode such that the dominator information is correct again. I additionally updated the printing output in case of such a dominance failure which I think improves the analysis of these problems. It now also prints the idom chain of the early node and the actual real LCA of early and the wrong LCA together with the idom index: Real LCA of early 876 (idom[5]) and (wrong) LCA 728 (idom[19]): 1052 If === 523 1051 [[ 1035 1053 ]] P=0.999999, C=-1.000000 Thank you! Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/1c332a041243/src/hotspot/share/opto/loopopts.cpp#l1456 [2] https://bugs.openjdk.java.net/secure/attachment/89911/pinned_at_inner_loop_exit.png [3] https://bugs.openjdk.java.net/secure/attachment/89912/pinned_at_outer_strip_mined_loop_exit.png From martin.doerr at sap.com Fri Aug 21 15:06:55 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 21 Aug 2020 15:06:55 +0000 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> Message-ID: Hi Thomas, I agree with you in general. However, all PPC64 specific platform flags are "product" at the moment. Most of them should probably be "diagnostic". We should fix that at some point of time. But for now, I'm ok with Jose's webrev since it's consistent with the other PPC64 flags. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Thomas Schatzl > Sent: Freitag, 21. August 2020 15:45 > To: joserz at linux.ibm.com > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Hi, > > On 21.08.20 15:37, joserz at linux.ibm.com wrote: > > Hello! > > > > On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > >> Hi, > >> > >> On 21.08.20 04:33, Michihiro Horie wrote: > >>> > >>> Hi Jose, > >>> > >>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > >>> Otherwise, the change looks good! > >>> > >>> /* special instructions */ > >>> \ > >>> + product(bool, UseByteReverseInstructions, false, > >>> \ > >> > >> Fwiw, for adding product options, you must go through the CSR process. > Maybe > >> there is an exception for platform specific ones? > > > > I didn't find any exception for platform specific options. But, > "experimental" options > > don't need such CSR process and, to be honest, experimental seems more > appropriate here. > > What do you think? > > > > Thank you for your review! :) > > Just a fly-by. 
It's up to you :) - just that product options need to be > announced to the world. > > I kind of agree that experimental seems more appropriate. You can always > "upgrade" it later. > > Thomas From thomas.schatzl at oracle.com Fri Aug 21 15:12:19 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Aug 2020 17:12:19 +0200 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> Message-ID: <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> Hi, On 21.08.20 17:06, Doerr, Martin wrote: > Hi Thomas, > > I agree with you in general. However, all PPC64 specific platform flags are "product" at the moment. > Most of them should probably be "diagnostic". We should fix that at some point of time. > But for now, I'm ok with Jose's webrev since it's consistent with the other PPC64 flags. > I was merely pointing out what the rule is, that has not been a veto for the patch (which I haven't reviewed btw). If you want to go ahead with that for consistency's sake, with a plan to fix this I can see your point of keeping it. Thanks, Thomas > Best regards, > Martin > > >> -----Original Message----- >> From: hotspot-compiler-dev > retn at openjdk.java.net> On Behalf Of Thomas Schatzl >> Sent: Freitag, 21. August 2020 15:45 >> To: joserz at linux.ibm.com >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system >> and use new byte-reverse instructions >> >> Hi, >> >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: >>> Hello! >>> >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: >>>> Hi, >>>> >>>> On 21.08.20 04:33, Michihiro Horie wrote: >>>>> >>>>> Hi Jose, >>>>> >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. >>>>> Otherwise, the change looks good! >>>>> >>>>> /* special instructions */ >>>>> \ >>>>> + product(bool, UseByteReverseInstructions, false, >>>>> \ >>>> >>>> Fwiw, for adding product options, you must go through the CSR process. >> Maybe >>>> there is an exception for platform specific ones? >>> >>> I didn't find any exception for platform specific options. But, >> "experimental" options >>> don't need such CSR process and, to be honest, experimental seems more >> appropriate here. >>> What do you think? >>> >>> Thank you for your review! :) >> >> Just a fly-by. It's up to you :) - just that product options need to be >> announced to the world. >> >> I kind of agree that experimental seems more appropriate. You can always >> "upgrade" it later. >> >> Thomas From martin.doerr at sap.com Fri Aug 21 15:25:46 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 21 Aug 2020 15:25:46 +0000 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> References: <573B5B05-4E70-445C-9EB6-5C5DAD8A41CB@sap.com> <20200626182644.GA262544@pacoca> <5BE37834-2072-426C-A2AD-3331A1B71A26@sap.com> <20200630001528.GA26652@pacoca> <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> Message-ID: Hi Thomas, I understand your point. 
My concern is that it may become a more political discussion how to handle CSR for PPC64 flags and I don't want to delay Jose's change for that. There are already other changes in the pipe which build on top of it. It will probably be us to handle and approve CSR requests for platforms which are maintained by SAP. We haven't done this so far. We are still handling such flags in a less formal way. I don't know how other non-Oracle platforms are handled. Best regards, Martin > -----Original Message----- > From: Thomas Schatzl > Sent: Freitag, 21. August 2020 17:12 > To: Doerr, Martin ; joserz at linux.ibm.com > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Hi, > > On 21.08.20 17:06, Doerr, Martin wrote: > > Hi Thomas, > > > > I agree with you in general. However, all PPC64 specific platform flags are > "product" at the moment. > > Most of them should probably be "diagnostic". We should fix that at some > point of time. > > But for now, I'm ok with Jose's webrev since it's consistent with the other > PPC64 flags. > > > > I was merely pointing out what the rule is, that has not been a veto > for the patch (which I haven't reviewed btw). If you want to go ahead > with that for consistency's sake, with a plan to fix this I can see your > point of keeping it. > > Thanks, > Thomas > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: hotspot-compiler-dev >> retn at openjdk.java.net> On Behalf Of Thomas Schatzl > >> Sent: Freitag, 21. August 2020 15:45 > >> To: joserz at linux.ibm.com > >> Cc: hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > system > >> and use new byte-reverse instructions > >> > >> Hi, > >> > >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: > >>> Hello! > >>> > >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > >>>> Hi, > >>>> > >>>> On 21.08.20 04:33, Michihiro Horie wrote: > >>>>> > >>>>> Hi Jose, > >>>>> > >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > >>>>> Otherwise, the change looks good! > >>>>> > >>>>> /* special instructions */ > >>>>> \ > >>>>> + product(bool, UseByteReverseInstructions, false, > >>>>> \ > >>>> > >>>> Fwiw, for adding product options, you must go through the CSR > process. > >> Maybe > >>>> there is an exception for platform specific ones? > >>> > >>> I didn't find any exception for platform specific options. But, > >> "experimental" options > >>> don't need such CSR process and, to be honest, experimental seems > more > >> appropriate here. > >>> What do you think? > >>> > >>> Thank you for your review! :) > >> > >> Just a fly-by. It's up to you :) - just that product options need to be > >> announced to the world. > >> > >> I kind of agree that experimental seems more appropriate. You can > always > >> "upgrade" it later. > >> > >> Thomas From rwestrel at redhat.com Fri Aug 21 15:22:27 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 21 Aug 2020 17:22:27 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: References: Message-ID: <87d03k10ss.fsf@redhat.com> Hi Christian, > We have two options to fix this. We could either update the wrong > control inputs from 876 IfFalse during the creation/merging of > pre/main/post loops or directly fix it inside > split_if_with_blocks_post(). 
I think it is makes more sense and is also > easier to directly fix it in split_if_with_blocks_post() where we could > be less pessimistic when pinning loads. > > The fix now checks if late_load_ctrl is a loop exit of a loop that has > an outer strip mined loop and if it dominates x_ctrl. If that is the > case, we use the outer loop exit control instead. This also means that > the loads can completely float out of the outer strip mined loop. > Applying that to the testcase, we get [3] instead of [2]. LoadS 901 and > 902 are both at the outer strip mined loop exit while 903 LoadS is still > at the inner loop due to 575 StoreI (x_ctrl is 876 IfFalse and dominates > the outer strip mined loop exit). The process of creating pre/main/post > loops will then take care of these control inputs of the LoadSNodes and > rewires them to the newly created RegionNode such that the dominator > information is correct again. I agree that fixing it in split_if_with_blocks_post() is the right thing to do. The load has no edges to the safepoint in the outer strip mined loop so why is it in the loop in the first place then? If java code has a load in a loop that's live outside the loop then it should be live at the safepoint on loop exit. Is anti dependence analysis too conservative? Also why does get_late_ctrl(n, n_ctrl) return a control inside the outer strip mined loop? And why is it safe to bypass that result? Roland. From erik.osterlund at oracle.com Fri Aug 21 16:21:53 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 21 Aug 2020 18:21:53 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: Hi, Have you tried this with ZGC on AArch64? It has custom code for saving live registers in the load barrier slow path. I can't see any code changes there, so assuming this will just crash instead. The relevant code is in ZBarrierSetAssembler on aarch64. Maybe I missed something? Thanks, /Erik On 2020-08-19 11:53, Ningsheng Jian wrote: > Hi Andrew, > > I have updated the patch based on the review comments. Would you mind > taking another look? Thanks! > > Full: > http://cr.openjdk.java.net/~njian/8231441/webrev.04/ > > Incremental: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/ > > Also add build-dev, as there's a makefile change. > > And the split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2 > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 > CSR: https://bugs.openjdk.java.net/browse/JDK-8248742 > > JTreg tests are still running, and so far no new failure found. > > Thanks, > Ningsheng > > On 8/17/20 5:16 PM, Andrew Dinn wrote: >> Hi Pengfei, >> >> On 17/08/2020 07:00, Ningsheng Jian wrote: >>> Thanks a lot for the review! Sorry for the late reply, as I was on >>> vacation last week. And thanks to Pengfei and Joshua for helping >>> clarifying some details in the patch. >> >> Yes, they did a very good job of answering most of the pending >> questions. 
>> >>>> I also eyeballed /some/ of the generated code to check that it looked >>>> ok. I'd really like to be able to do that systematically for a >>>> comprehensive test suite that exercised every rule but I only had the >>>> machine for a few days. This really ought to be done as a follow-up to >>>> ensure that all the rules are working as expected. >>> >>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged >>> in future. >> >> I'm fine with that as a follow-up patch if you raise a JIRA for it. >> >>>> I am not clear why you are choosing to re-init ptrue after certain JVM >>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g. >>>> when we call a JVM_ENTRY. Could you explain the rationale you have >>>> followed here? >>> >>> We do the re-init at any possible return points to c2 code, not in any >>> runtime c++ functions, which will reduce the re-init calls. >>> >>> Actually I found those entries by some hack of jvm. In the hacky code >>> below we use gcc option -finstrument-functions to build hotspot. With >>> this option, each C/C++ function entry/exit will call the instrument >>> functions we defined. In instrument functions, we clobber p7 (or other >>> reg for test) register, and in c2 function return we verify that p7 (or >>> other reg) has been reinitialized. >>> >>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch >>> >> >> Nice work. It's very good to have that documented. I'm willing to accept >> i) that this has found all current cases and ii) that the verify will >> catch any cases that might get introduced by future changes (e.g. the >> callout introduced by ZGC that you mention below). As the above mot say >> there is a slim chance this might have missed some cases but I think it >> is pretty unlikely. >> >> >>>> Specific Comments (register allocator webrev): >>>> >>>> >>>> aarch64.ad:97-100 >>>> >>>> Why have you added a reg_def for R8 and R9 here and also to >>>> alloc_class >>>> chunk0 at lines 544-545? They aren't used by C2 so why define them? >>>> >>> >>> I think Pengfei has helped to explain that. I will either add clear >>> comments or rename the register name as you suggested. >> >> Ok, good. >> >>> As Joshua clarified, we are also working on predicate scalable reg, >>> which is not in this patch. Thanks for the suggestion, I will try to >>> refactor this a bit. >> >> Ok, I'll wait for an updated patch. Are you planning to include the >> scalable predicate reg code as part of this patch? I think that would be >> better as it would help to clarify the need to distinguish vector regs >> as a subset of scalable regs. >> >>>> zBarrierSetAssembler_aarch64.cpp:434 >>>> >>>> Can you explain why we need to check p7 here and not do so in other >>>> places where we call into the JVM? I'm not saying this is wrong. I >>>> just >>>> want to know how you decided where re-init of p7 was needed. >>>> >>> >>> Actually I found this by my hack patch above while running jtreg tests. >>> The stub slowpath here can be a c++ function. >> >> Yes, good catch. >> >>>> superword.cpp:97 >>>> >>>> Does this mean that is someone sets the maximum vector size to a >>>> non-power of two, such as 384, all superword operations will be >>>> bypassed? Including those which can be done using NEON vectors? >>>> >>> >>> Current SLP vectorizer only supports power-of-2 vector size. We are >>> trying to work out a new vectorizer to support all SVE vector sizes, so >>> we would expect a size like 384 could go to that path. 
I tried current >>> patch on a 512-bit SVE hardware which does not support 384-bit: >>> >>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same) >>> openjdk version "16-internal" 2021-03-16 >>> >>> $ java -XX:MaxVectorSize=48 -version >>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE >>> vector length 32. Set MaxVectorSize to 32 >>> >>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32 >>> instead of unsupported 48: >>> https://www.kernel.org/doc/Documentation/arm64/sve.txt) >>> >>> Do you think we need to exit vm instead of warning and fallbacking >>> to 32 >>> here? >> >> Yes, I think a vm exit would probably be a better choice. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Red Hat Distinguished Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. 03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill >> > From vladimir.kozlov at oracle.com Fri Aug 21 20:04:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 21 Aug 2020 13:04:38 -0700 Subject: RFR(S) : 8251998 remove usage of PropertyResolvingWrapper in vmTestbase/jit/t In-Reply-To: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> References: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> Message-ID: Looks good. Thanks, Vladimir K On 8/20/20 1:57 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 >> 69 lines changed: 4 ins; 24 del; 41 mod; > > Hi all, > > could you please review this small patch which removes usages of PropertyResolvingWrapper from vmTestbase/jit/t tests and reenabled allowSmartActionArgs? > > background from the main bug: >> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8251998 > webrev: http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 > testing: :vmTestbase_vm_compiler > > Thanks, > -- Igor > > From vladimir.kozlov at oracle.com Fri Aug 21 20:07:24 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 21 Aug 2020 13:07:24 -0700 Subject: RFR: 8252058: [JVMCI] Rework setting is_method_handle_invoke flag in jvmciCodeInstaller In-Reply-To: References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com> <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> Message-ID: <679726fc-3a89-072e-45a6-d2a69eb5f068@oracle.com> Looks good. Thank you for testing it with changed version. Vladimir K On 8/20/20 5:37 AM, Yudi Zheng wrote: > Please review this rework of setting is_method_handle_invoke flag in jvmciCodeInstaller. > > http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252058 > > Changes since last time are at http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java.udiff.html > > -Yudi > >> On 7 Jun 2020, at 23:14, Dean Long wrote: >> >> Looks good! >> >> dl >> >> On 6/7/20 1:06 PM, Yudi Zheng wrote: >>> Thanks Dean! >>> Here is a revision including your suggestion: http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/ >>> >>> -Yudi >>> >>>> On 6 Jun 2020, at 11:33, Dean Long wrote: >>>> >>>> I found a problem. You need to make CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by adding the JVMCI logic that looks backwards by the size of the call instruction. 
>>>> >>>> dl >>>> >>>> On 6/4/20 12:03 AM, Yudi Zheng wrote: >>>>> I did not push this yet. It might require changes on the Graal side. I am still thinking about how to merge. >>>>> >>>>> -Yudi >>>>> >>>>>> On 4 Jun 2020, at 01:22, Dean Long wrote: >>>>>> >>>>>> Does this require recent Graal change in order to work correctly? >>>>>> >>>>>> dl >>>>>> >>>>>> On 6/3/20 3:47 PM, Dean Long wrote: >>>>>>> Hi Yudi. I'm seeing an assert in test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. Let me remove my changes and see if it still fails. What testing did you do? >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> Please review this patch that sets is_method_handle_invoke flag accordingly when describing scope at call site in jvmciCodeInstaller. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347 >>>>>>>> >>>>>>>> Many thanks, >>>>>>>> Yudi >>>> >>> >> > From vladimir.x.ivanov at oracle.com Fri Aug 21 22:34:40 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 22 Aug 2020 01:34:40 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> Message-ID: <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Thanks for clarifications, Ningsheng. Let me share my thoughts on the topic and I'll start with summarizing the experience of migrating x86 code to generic vectors. JVM has quite a bit of special logic to support vectors. It hasn't exhausted the complexity budget yet, but it's quite close to the limit (as you probably noticed). While extending x86 backend to support Vector API, we pushed it over the limit and had to address some of the issues. The ultimate goal was to move to vectors which represent full-width hardware registers. After we were convinced that it will work well in AD files, we encountered some inefficiencies with vector spills: depending on actual hardware, smaller (than available) vectors may be used (e.g., integer computations on AVX-capable CPU). So, we stopped half-way and left post-matching part intact: depending on actual vector value width, appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. (I believe you may be in a similar situation on AArch64 with NEON vs SVE where both 128-bit and wide SVE vectors may be used at runtime.) Now back to the patch. What I see in the patch is that you try to attack the problem from the opposite side: you introduce new concept of a size-agnostic vector register on RA side and then directly use it during matching: vecA is used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. Unfortunately, it extends the implementation in orthogonal direction which looks too aarch64-specific to benefit other architectures and x86 particular. I believe there's an alternative approach which can benefit both aarch64 and x86, but it requires more experimentation. If I were to start from scratch, I would choose between 3 options: #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported vector sizes to 128-/256-/512-bit values. 
#2: lift limitation on max size (to 1024/2048 bits), but ignore non-power-of-2 sizes; #3: introduce support for full range of vector register sizes (128-/.../2048-bit with 128-bit step); I see 2 (mostly unrelated) limitations: maximum vector size and non-power-of-2 sizes. My understanding is that you don't try to accurately represent SVE for now, but lay some foundations for future work: you give up on non-power-of-2 sized vectors, but still enable support for arbitrarily sized vectors (addressing both limitations on maximum size and size granularity) in RA (and it affects only spills). So, it is somewhere between #2 and #3. The ultimate goal is definitely #3, but how much more work will be required to teach the JVM about non-power-of-2 vectors? As I see in the patch, you don't have auto-vectorizer support yet, but Vector API will provide access to whatever size hardware exposes. What do you expect on hardware front in the near/mid-term future? Anything supporting vectors larger than 512-bit? What about 384-bit vectors? I don't have a good understanding where SVE/SVE2-capable hardware is moving and would benefit a lot from your insights about what to expect. If 256-/512-bit vectors end up as the only option, then #1 should fit them well. For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My understanding that existing RA machinery should support 1024-bit vectors well. So, unless 2048-bit vectors are needed, we could live with the framework we have right now. If hardware has non-power-of-2 vectors, but JVM doesn't support them, then JVM can work with just power-of-2 portion of them (384-bit => 256-bit). Giving up on #3 for now and starting with less ambitious goals (#1 or #2) would reduce pressure on RA and give more time for additional experiments to come with a better and more universal support/representation of generic/size-agnostic vectors. And, in a longer term, help reducing complexity and technical debt in the area. Some more comments follow inline. >> Compared to x86 w/ AVX512, architectural state for vector registers is >> 4x larger in the worst case (ignoring predicate registers for now). >> Here are the relevant constants on x86: >> >> gensrc/adfiles/adGlobals_x86.hpp: >> >> // the number of reserved registers + machine registers. >> #define REG_COUNT??? 545 >> ... >> // Size of register-mask in ints >> #define RM_SIZE 22 >> >> My estimate is that for AArch64 with SVE support the constants will be: >> >> ?? REG_COUNT < 2500 >> ?? RM_SIZE < 100 >> >> which don't look too bad. >> > > Right, but given that most real hardware implementations will be no > larger than 512 bits, I think. Having a large bitmask array, with most > bits useless, will be less efficient for regmask computation. Does it make sense to limit the maximum supported size to 512-bit then (at least, initially)? In that case, the overhead won't be worse it is on x86 now. >> Also, I don't see any changes related to stack management. So, I >> assume it continues to be managed in slots. Any problems there? As I >> understand, wide SVE registers are caller-save, so there may be many >> spills of huge vectors around a call. (Probably, not possible with C2 >> auto-vectorizer as it is now, but Vector API will expose it.) >> > > Yes, the stack is still managed in slots, but it will be allocated with > real vector register length instead of 'virtual' slots for VecA. See the > usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. 
We have also > applied the patch to vector api, and did find a lot of vector spills > with expected correct results. I'm curious whether similar problems may arise for spills. Considering wide vector registers are caller-saved, it's possible to have lots of 256-byte values to end up on stack (especially, with Vector API). Any concerns with that? >> Have you noticed any performance problems? If that's the case, then >> AVX512 support on x86 would benefit from similar optimization as well. >> > > Do you mean register allocation performance problems? I did not notice > that before. Do you have any suggestion on how to measure that? I'd try to run some applications/benchmarks with -XX:+CITime to get a sense how much RA may be affected. Best regards, Vladimir Ivanov From Divino.Cesar at microsoft.com Sat Aug 22 01:56:34 2020 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Sat, 22 Aug 2020 01:56:34 +0000 Subject: Help with JDK-8230525 - Adding new intrinsic Message-ID: Hey there, I'm working on JDK-8230525 (https://bugs.openjdk.java.net/browse/JDK-8230525) and for the past few days I'm struggling to get all the plumbing necessary to add a new intrinsic and instruction pattern for the Integer.reverse() method. Can someone please look at the code I currently have and advise what I'm doing wrong/missing here? The exact problem I'm struggling with is that C2 for some reason choose a previously existing instruction pattern (as defined in x86_64.ad) instead of the new instruction pattern I created. I shared the code here: https://github.com/JohnTortugo/jdk/pull/1 Thanks, Cesar From igor.ignatyev at oracle.com Sat Aug 22 02:02:33 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 21 Aug 2020 19:02:33 -0700 Subject: RFR(S) : 8251998 remove usage of PropertyResolvingWrapper in vmTestbase/jit/t In-Reply-To: References: <1041CE41-B5C9-407F-AF91-918A52885DA8@oracle.com> Message-ID: Thank you Vladimir, pushed. -- Igor > On Aug 21, 2020, at 1:04 PM, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir K > > On 8/20/20 1:57 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 >>> 69 lines changed: 4 ins; 24 del; 41 mod; >> Hi all, >> could you please review this small patch which removes usages of PropertyResolvingWrapper from vmTestbase/jit/t tests and reenabled allowSmartActionArgs? >> background from the main bug: >>> CODETOOLS-7902352 added support of using ${property} in action directive, so PropertyResolvingWrapper isn't needed anymore and can be removed. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251998 >> webrev: http://cr.openjdk.java.net/~iignatyev//8251998/webrev.00 >> testing: :vmTestbase_vm_compiler >> Thanks, >> -- Igor >> From igor.ignatyev at oracle.com Sat Aug 22 05:23:14 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 21 Aug 2020 22:23:14 -0700 Subject: RFR(S) : 8252186 : remove FileInstaller action from vmTestbase/jit/graph tests Message-ID: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ > 24 lines changed: 0 ins; 12 del; 12 mod; Hi all, could you please review this small cleanup of vmTestbase/jit/graph tests? from JBS: > vmTestbase/jit/graph tests use FileInstaller to copy ${test.src}/data/main.data to the current directory, and pass the path to it as '-path' option to jit.graph.CGT class. since JDK-8252005 enabled jtreg smart action args, we can use ${test.src} right in the argument and avoid copying. 
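To make the mechanics concrete, the change amounts to the following kind of header rewrite (a sketch for illustration only -- the directives below are not copied from the actual vmTestbase tests, and the FileInstaller arguments are assumed):

    /*
     * Illustrative jtreg header only -- not the real vmTestbase test file.
     *
     * Before: stage the data file into the scratch directory, then point
     * the test at the copy:
     *
     *   @run driver jdk.test.lib.FileInstaller data/main.data main.data
     *   @run main/othervm jit.graph.CGT -path main.data
     *
     * After (JDK-8252005, with allowSmartActionArgs=true in TEST.properties):
     * ${test.src} is expanded directly in the action arguments, so the copy
     * step disappears:
     *
     *   @run main/othervm jit.graph.CGT -path ${test.src}/data/main.data
     */
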
testing: :vmTestbase_vm_compiler JBS: https://bugs.openjdk.java.net/browse/JDK-8252186 webrev: http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ Thanks, -- Igor From goetz.lindenmaier at sap.com Sat Aug 22 05:45:40 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Sat, 22 Aug 2020 05:45:40 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Richard, I read through your change again. It looks good to me now. The new naming and additional comments make it easier to read I think, thank you. One small thing: deoptimization.cpp, l. 1503 You don't really need the brackets. Two lines below you don't use them either. (No webrev needed) Best regards, Goetz. -----Original Message----- From: Reingruber, Richard Sent: Dienstag, 18. August 2020 10:44 To: Lindenmaier, Goetz ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Goetz, I have collected the changes based on your feedback in a new webrev: Webrev.7: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7/ Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.7.inc/ Most of the changes are renamings, commenting, and reformatting. Besides that ... - I converted the native agent of the test IterateHeapWithEscapeAnalysisEnabled from C to C++, because this seems to be preferred by serviceability developers. I also re-indented the file, but excluded this from the delta webrev. - I had to adapt test/jdk/com/sun/jdi/EATests.java to the fact that background compilation (-Xbatch) cannot be reliably disabled for JVMCI compilers. E.g. the compile broker will compile in the background if JVMCI is not yet fully initialized. Therefore it is possible that test cases are executed before the main test method is compiled on the highest level and then the test case fails. The higher the system load the higher the probability for this to happen. In webrev.7 I skip the compilation level check if the vm is configured to use the JVMCI compiler. I also answered you inline below. Thanks, Richard. -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 23. Juli 2020 16:20 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Thanks for your two further explanations in the other thread. That made the points clear to me. > > I was not that happy with the names saying not_global_escape > > and similar. I now agreed you have to use the terms of the escape > > analysis (NoEscape ArgEscape= throughout the runtime code. I'm still not happy with > > the 'not' in the term, I always try to expand the name to some > > sentence with a negated verb, but it makes no sense. > > For example, "has_not_global_escape_in_scope" expands to > > "Hasn't a global escape in its scope." in my thinking, which makes > > no sense. You probably mean > > "Has not-global escape in its scope." or "Has {ArgEscape|NoEscape} > > in its scope." 
> > > C2 is using the word "non" in this context, e.g., here > > alloc->is_non_escaping. > > There is also ConnectionGraph::not_global_escape() That talks about a single node that represents a single Object. An object has a single state wrt. ea. You use the term for safepoint which tracks a set of objects. Here, has_not_global_excape can mean 1. None of the several objects does escape globaly. 2. There is at least one object that escapes globaly. > > non obviously negates the adjective 'global', > > non-global or nonglobal even is a English term I find in the > > net. > > So what about "has_non_global_escape_in_scope?" > > And what about has_ea_local_in_scope? That's good. Please document somewhere that Ea_local == ArgEscape | NoEscape. That's what it is, right? > > Does jvmti specify that the same limits are used ...? > > ok on your side. > > I don't know and didn't find anything in a quick search. Ok, not your business. > > > jvmtiEnvBase.cpp ok > > jvmtiImpl.h|cpp ok > > jvmtiTagMap.cpp ok > > whitebox.cpp ok > > > deoptimization.cpp > > > line 177: Please break line > > line 246, 281: Please break line > > 1578, 1583, 1589, 1632, 1649, 1651 Break line > > > 1651: You use 'non'-terms, too: non-escaping :) > > I know :) At least here it is wrong I'd say. "...has to be a not escaping obj..." > sounds better > (hopefully not only to my german ears). I thought the term non-escpaing makes it quite clear. I just wanted to point out that using non above would be similar to the wording here. > > IterateHeapWithEscapeAnalysisEnabled.java > > > line 415: > > msg("wait until target thread has set testMethod_result"); > > while (testMethod_result == 0) { > > Thread.sleep(50); > > } > > Might the test run into timeouts at this place? > > The field is volatile, i.e. it will be reloaded > > in each iteration. But will dontinline_testMethod > > write it back to main memory in time? > > You mean, the test could hang in that loop for a couple of minutes? I don't > think so. There are cache coherence protocols in place which will invalidate > stale data very timely. Ok, anyways, it would only be a hanging test. > > Ok. I've removed quite a lot of the occurrances. > > > Also, I like full sentences in comments. > > Especially for me as foreign speaker, this makes > > things much more clear. I.e., I try to make it > > a real sentence with articles, capitalized and a > > dot at the end if there is a subject and a verb > > in first place. > > E.g., jvmtiEnvBase.cpp:1327 > > Are you referring to the following? > (from > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/src/hots > pot/share/prims/jvmtiEnvBase.cpp.frames.html) > > 1326 > 1327 // If the frame is a compiled one, need to deoptimize it. > 1328 if (vf->is_compiled_frame()) { > > This line 1327 is preexisting. Sorry, wrong line number again. I think I meant 1333 // eagerly reallocate scalar replaced objects. But I must admit, the subject is missing. It's one of these imperative sentences where the subject is left out, which are used throughout documentation. Bad example, but still a correct sentence, so qualifies for punctuation? Best regards, Goetz. 
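For readers not steeped in C2's escape analysis vocabulary, the three states behind the naming discussion above can be illustrated with a tiny sketch (purely illustrative; it is not taken from the webrev or its tests, and whether C2 really classifies these exact examples this way depends on inlining decisions):

    public class EscapeStates {
        static Object globalSink;

        // NoEscape: the array never leaves this method, so C2 may scalar
        // replace it; only a debugger or JVMTI agent could ever ask to see it.
        static int noEscape() {
            int[] box = {42};
            return box[0];
        }

        // ArgEscape: the StringBuilder is passed to callees (as the receiver,
        // assuming the calls are not inlined) but is not stored anywhere that
        // outlives the call.
        static int argEscape(java.util.List<String> words) {
            StringBuilder sb = new StringBuilder();
            for (String w : words) {
                sb.append(w);
            }
            return sb.length();
        }

        // GlobalEscape: the object is published through a static field and is
        // reachable from anywhere afterwards.
        static void globalEscape() {
            globalSink = new java.util.ArrayList<String>();
        }

        public static void main(String[] args) {
            System.out.println(noEscape() + " " + argEscape(java.util.List.of("a", "b")));
            globalEscape();
        }
    }
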
From vladimir.kozlov at oracle.com Sat Aug 22 17:55:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 22 Aug 2020 10:55:22 -0700 Subject: RFR(S) : 8252186 : remove FileInstaller action from vmTestbase/jit/graph tests In-Reply-To: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> References: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> Message-ID: <7488a613-f5ad-acc8-edc1-677d4511216a@oracle.com> LGTM Thanks, Vladimir K On 8/21/20 10:23 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ >> 24 lines changed: 0 ins; 12 del; 12 mod; > > Hi all, > > could you please review this small cleanup of vmTestbase/jit/graph tests? > from JBS: >> vmTestbase/jit/graph tests use FileInstaller to copy ${test.src}/data/main.data to the current directory, and pass the path to it as '-path' option to jit.graph.CGT class. since JDK-8252005 enabled jtreg smart action args, we can use ${test.src} right in the argument and avoid copying. > > testing: :vmTestbase_vm_compiler > JBS: https://bugs.openjdk.java.net/browse/JDK-8252186 > webrev: http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ > > Thanks, > -- Igor > From dean.long at oracle.com Sun Aug 23 04:14:00 2020 From: dean.long at oracle.com (Dean Long) Date: Sat, 22 Aug 2020 21:14:00 -0700 Subject: RFR: 8252058: [JVMCI] Rework setting is_method_handle_invoke flag in jvmciCodeInstaller In-Reply-To: <679726fc-3a89-072e-45a6-d2a69eb5f068@oracle.com> References: <5aec53c6-5e79-aa07-aa97-ca46fccb3f58@oracle.com> <96D2E077-C8A9-4DB6-9107-359C151A004B@oracle.com> <24dd9111-9119-3b00-fb48-733ef6042cae@oracle.com> <679726fc-3a89-072e-45a6-d2a69eb5f068@oracle.com> Message-ID: <18afd52c-3f80-2ecb-c1a4-33395081934a@oracle.com> +1 dl On 8/21/20 1:07 PM, Vladimir Kozlov wrote: > Looks good. Thank you for testing it with changed version. > > Vladimir K > > On 8/20/20 5:37 AM, Yudi Zheng wrote: >> Please review this rework of setting is_method_handle_invoke flag in >> jvmciCodeInstaller. >> >> http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8252058 >> >> Changes since last time are at >> http://cr.openjdk.java.net/~yzheng/8252058/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java.udiff.html >> >> -Yudi >> >>> On 7 Jun 2020, at 23:14, Dean Long wrote: >>> >>> Looks good! >>> >>> dl >>> >>> On 6/7/20 1:06 PM, Yudi Zheng wrote: >>>> Thanks Dean! >>>> Here is a revision including your suggestion: >>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.01/ >>>> >>>> -Yudi >>>> >>>>> On 6 Jun 2020, at 11:33, Dean Long wrote: >>>>> >>>>> I found a problem.? You need to make >>>>> CompiledMethod::is_deopt_mh_entry() look like is_deopt_entry() by >>>>> adding the JVMCI logic that looks backwards by the size of the >>>>> call instruction. >>>>> >>>>> dl >>>>> >>>>> On 6/4/20 12:03 AM, Yudi Zheng wrote: >>>>>> I did not push this yet. It might require changes on the Graal >>>>>> side. I am still thinking about how to merge. >>>>>> >>>>>> -Yudi >>>>>> >>>>>>> On 4 Jun 2020, at 01:22, Dean Long wrote: >>>>>>> >>>>>>> Does this require recent Graal change in order to work correctly? >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 6/3/20 3:47 PM, Dean Long wrote: >>>>>>>> Hi Yudi.? I'm seeing an assert in >>>>>>>> test/jdk/java/lang/invoke/CallSiteTest.java with a debug build. >>>>>>>> Let me remove my changes and see if it still fails.? What >>>>>>>> testing did you do? 
>>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> On 6/2/20 9:38 AM, Yudi Zheng wrote: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> Please review this patch that sets is_method_handle_invoke >>>>>>>>> flag accordingly when describing scope at call site in >>>>>>>>> jvmciCodeInstaller. >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~yzheng/8246347/webrev.00/ >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246347 >>>>>>>>> >>>>>>>>> Many thanks, >>>>>>>>> Yudi >>>>> >>>> >>> >> From boris.ulasevich at bell-sw.com Sun Aug 23 18:20:28 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sun, 23 Aug 2020 21:20:28 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> Message-ID: <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> Hi, Please review the updated change to C2 and AArch64 which introduces a new BitfieldInsert node to replace Or+Shift+And sequence when possible. Single BFI instruction is emitted for the new node. With the current change all the transformation logic is moved out of aarch64.ad file into the common C2 code. http://bugs.openjdk.java.net/browse/JDK-8249893 http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 The change in compiler.cpp was done to implicitly ask IGVN to run the idealization once again after the loop optimization phase. This extra step is necessary to make the BFI transform happen only after loop optimization. thanks, Boris On 05.08.2020 12:08, Andrew Haley wrote: > Hi, > > On 8/4/20 5:56 PM, Boris Ulasevich wrote: > >> gently reminding of this review request. >>> http://bugs.openjdk.java.net/browse/JDK-8249893 >>> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00 > I'm leaning towards no. The code is too complicated and difficult to > maintain for such a small gain. As I suggested to Eric Liu > when discussing 8248870, we should try > canonicalizing this stuff early in compilation then matching with > BFM rules. > From jamsheed.c.m at oracle.com Mon Aug 24 05:36:51 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Mon, 24 Aug 2020 11:06:51 +0530 Subject: RFR: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions In-Reply-To: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> References: <442caa21-ca0a-f6eb-60a5-1e74bf994894@oracle.com> Message-ID: <03df9364-817d-04d6-6434-80be93a66526@oracle.com> Hi David, Thank you for the review and feedback. Agree on all of them. I will rework and get back. On 10/08/2020 07:33, David Holmes wrote: > Hi Jamsheed, > > On 6/08/2020 10:07 pm, Jamsheed C M wrote: >> Hi all, >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8249451 >> >> webrev: http://cr.openjdk.java.net/~jcm/8249451/webrev.00/ > > Thanks for tackling this messy issue. Overall I like the use of TRAPS > to more clearly document which methods can return with an exception > pending. I think there are some problems with the proposed changes. > I'll start with those comments and then move on to more general comments. > > src/hotspot/share/utilities/exceptions.cpp > src/hotspot/share/utilities/exceptions.hpp > > I don't think the changes here are correct or safe in general. > > First, adding the new macro and function to only clear non-async > exceptions is fine itself. 
But naming wise the fact only non-async > exceptions are cleared should be evident, and there is no "check" > involved (in the sense of the existing CHECK_ macros) so I suggest: > > s/CHECK_CLEAR_PENDING_EXCEPTION/CLEAR_PENDING_NONASYNC_EXCEPTIONS/ > s/check_clear_pending_exception/clear_pending_nonasync_exceptions/ > Ok > But changing the existing CHECK_AND_CLEAR macros to now leave async > exceptions pending seems potentially dangerous as calling code may not > be prepared for there to now be a pending exception. For example the > use in thread.cpp: > > ?JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); > ?JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); > > get_java_runtime_name() is currently guaranteed to clear all > exceptions, so all the other code is known to be safe to call. But > that would no longer be true. That said, this is VM initialization > code and an async exception is impossible at this stage. > > I think I would rather see CHECK_AND_CLEAR left as-is, and an actual > CHECK_AND_CLEAR_NONASYNC introduced for those users of CHECK_AND_CLEAR > that can encounter async exceptions and which should not clear them. > > +?? if > (!_pending_exception->is_a(SystemDictionary::ThreadDeath_klass()) && > +?????? _pending_exception->klass() != > SystemDictionary::InternalError_klass()) { > Ok > Flagging all InternalErrors as async exceptions is probably also not > correct. I don't see a good solution to this at the moment. I think we > would need to introduce a new subclass of InternalError for the unsafe > access error case**. Now it may be that all the other InternalError > usages are "impossible" in the context of where the new macros are to > be used, but that is very difficult to establish or assert. > > ** Or perhaps we could inject a field that allows the VM to identify > instances related to unsafe access errors ... Ideally of course these > unsafe access errors would be distinct from the async exception > mechanism - something I would still like to pursue. > Ok > --- > > General comments ... > > There is a general change from "JavaThread* thread" to "Thread* > THREAD" (or TRAPS) to allow the use of the CHECK macros. This is > unfortunate because the fact the thread is restricted to being a > JavaThread is no longer evident in the method signatures. That is a > flaw with the TRAPS/CHECK mechanism unfortunately :( . But as the > methods no longer take a JavaThread* arg, they should assert that > THREAD->is_Java_thread(). I will also look at an RFE to have > as_JavaThread() to avoid the need for separate assertion checks before > casting from "Thread*" to "JavaThread*". > Ok > Note there's no need to use CHECK when the enclosing method is going > to return immediately after the call that contains the CHECK. It just > adds unnecessary checking of the exception state. The use of TRAPS > shows that the methods may return with an exception pending. I've > flagged all such occurrences I spotted below. > Ok > --- > > +?? // Only metaspace OOM is expected. no Java code executed. > > Nit: s/no/No > > > src/hotspot/share/compiler/compilationPolicy.cpp > > > ?410?????? method_invocation_event(method, CHECK_NULL); > ?489?????? CompileBroker::compile_method(m, InvocationEntryBci, > comp_level, m, hot_count, CompileTask::Reason_InvocationCount, CHECK); > > Nit: there's no need to use CHECK here. > > --- > > src/hotspot/share/compiler/tieredThresholdPolicy.cpp > > ?504???? method_invocation_event(method, inlinee, comp_level, nm, > CHECK_NULL); > ?570???????? 
compile(mh, bci, CompLevel_simple, CHECK); > ?581???????? compile(mh, bci, CompLevel_simple, CHECK); > ?595???? CompileBroker::compile_method(mh, bci, level, mh, hot_count, > CompileTask::Reason_Tiered, CHECK); > 1062?????? compile(mh, InvocationEntryBci, next_level, CHECK); > > Nit: there's no need to use CHECK here. > > 814 void TieredThresholdPolicy::create_mdo(const methodHandle& mh, > Thread* THREAD) { > > Thank you for correcting this misuse of the THREAD name on a > JavaThread* type. > > --- > > src/hotspot/share/interpreter/linkResolver.cpp > > ?128?? CompilationPolicy::compile_if_required(selected_method, CHECK); > > Nit: there's no need to use CHECK here. > > --- > > src/hotspot/share/jvmci/compilerRuntime.cpp > > ?260???? CompilationPolicy::policy()->event(emh, mh, > InvocationEntryBci, InvocationEntryBci, CompLevel_aot, cm, CHECK); > ?280???? nmethod* osr_nm = CompilationPolicy::policy()->event(emh, mh, > branch_bci, target_bci, CompLevel_aot, cm, CHECK); > > Nit: there's no need to use CHECK here. > > --- > > src/hotspot/share/jvmci/jvmciRuntime.cpp > > ?102???????? // Donot clear probable async exceptions. > > typo: s/Donot/Do not/ > > --- > > src/hotspot/share/runtime/deoptimization.cpp > > 1686 void Deoptimization::load_class_by_index(const > constantPoolHandle& constant_pool, int index) { > > This method should be declared with TRAPS now. > > 1693???? // Donot clear probable Async Exceptions. > > typo: s/Donot/Do not/ > > Ok >> testing : mach1-5(links in jbs) > > There is very little existing testing that will actually test the key > changes you have made here. You will need to do direct fault-injection > testing anywhere you now allow async exceptions to remain, to see if > the calling code can tolerate that. It will be difficult to test > thoroughly. > Ok > Thanks again for tackling this difficult problem! Best regards, Jamsheed > > David > ----- > >> >> While working on JDK-8246381 it was noticed that compilation request >> path clears all exceptions(including async) and doesn't propagate[1]. >> >> Fix: patch restores the propagation behavior for the probable async >> exceptions. >> >> Compilation request path propagate exception as in [2]. MDO and >> MethodCounter doesn't expect any exception other than metaspace >> OOM(added comments). >> >> Deoptimization path doesn't clear probable async exceptions and take >> unpack_exception path for non uncommontraps. >> >> Added java_lang_InternalError to well known classes. >> >> Request for review. >> >> Best Regards, >> >> Jamsheed >> >> [1] w.r.t changes done for JDK-7131259 >> >> [2] >> >> ???? (a) >> ???? -----> c1_Runtime1.cpp/interpreterRuntime.cpp/compilerRuntime.cpp >> ?????? | >> ??????? ----- compilationPolicy.cpp/tieredThresholdPolicy.cpp >> ????????? | >> ?????????? ------ compileBroker.cpp >> >> ???? (b) >> ???? Xcomp versions >> ???? ------> compilationPolicy.cpp >> ??????? | >> ???????? ------> compileBroker.cpp >> >> ???? (c) >> >> ???? Direct call to? compile_method in compileBroker.cpp >> >> ???? JVMCI bootstrap, whitebox, replayCompile. 
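As background for readers, "async exceptions" here means exceptions posted to a thread from the outside rather than raised by the code the thread is executing; ThreadDeath from Thread.stop() is the canonical example. A minimal, illustrative demo of that behaviour follows (JDK 16-era semantics; Thread.stop() is deprecated, so this is for illustration only and is not part of the fix or its tests):

    public class AsyncExceptionDemo {
        static volatile long counter;

        public static void main(String[] args) throws InterruptedException {
            Thread victim = new Thread(() -> {
                try {
                    while (true) {
                        counter++;               // keep the thread busy in Java code
                    }
                } catch (ThreadDeath td) {
                    // Nothing in this loop threw: the exception was posted from
                    // outside and delivered asynchronously by the VM.
                    System.out.println("caught async ThreadDeath");
                    throw td;                    // conventionally re-thrown
                }
            });
            victim.start();
            Thread.sleep(200);
            victim.stop();                       // posts the async ThreadDeath
            victim.join();
            System.out.println("victim terminated, counter = " + counter);
        }
    }
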
>> >> From ningsheng.jian at arm.com Mon Aug 24 09:16:07 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 24 Aug 2020 17:16:07 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Message-ID: Hi Vladimir, Thanks for your valuable inputs! On 8/22/20 6:34 AM, Vladimir Ivanov wrote: > Thanks for clarifications, Ningsheng. > > Let me share my thoughts on the topic and I'll start with summarizing > the experience of migrating x86 code to generic vectors. > > JVM has quite a bit of special logic to support vectors. It hasn't > exhausted the complexity budget yet, but it's quite close to the limit > (as you probably noticed). While extending x86 backend to support Vector > API, we pushed it over the limit and had to address some of the issues. > > The ultimate goal was to move to vectors which represent full-width > hardware registers. After we were convinced that it will work well in AD > files, we encountered some inefficiencies with vector spills: depending > on actual hardware, smaller (than available) vectors may be used (e.g., > integer computations on AVX-capable CPU). So, we stopped half-way and > left post-matching part intact: depending on actual vector value width, > appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. > > (I believe you may be in a similar situation on AArch64 with NEON vs SVE > where both 128-bit and wide SVE vectors may be used at runtime.) > Thanks for sharing the background. > Now back to the patch. > > What I see in the patch is that you try to attack the problem from the > opposite side: you introduce new concept of a size-agnostic vector > register on RA side and then directly use it during matching: vecA is > used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. > > Unfortunately, it extends the implementation in orthogonal direction > which looks too aarch64-specific to benefit other architectures and x86 > particular. I believe there's an alternative approach which can benefit > both aarch64 and x86, but it requires more experimentation. > Since vecA and vecX (and others) are architecturally different vector registers, I think it's quite natural that we just introduced the new vector register type vecA, to represent what we need for corresponding hardware vector register. Please note that in vector length agnostic ISA, like Arm SVE and RISC-V vector extension [1], the vector registers are architecturally the same type of register despite the different hardware implementations. > If I were to start from scratch, I would choose between 3 options: > > #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported > vector sizes to 128-/256-/512-bit values. > > #2: lift limitation on max size (to 1024/2048 bits), but ignore > non-power-of-2 sizes; > > #3: introduce support for full range of vector register sizes > (128-/.../2048-bit with 128-bit step); > > I see 2 (mostly unrelated) limitations: maximum vector size and > non-power-of-2 sizes. 
> > My understanding is that you don't try to accurately represent SVE for > now, but lay some foundations for future work: you give up on > non-power-of-2 sized vectors, but still enable support for arbitrarily > sized vectors (addressing both limitations on maximum size and size > granularity) in RA (and it affects only spills). So, it is somewhere > between #2 and #3. > > The ultimate goal is definitely #3, but how much more work will be > required to teach the JVM about non-power-of-2 vectors? As I see in the > patch, you don't have auto-vectorizer support yet, but Vector API will > provide access to whatever size hardware exposes. What do you expect on > hardware front in the near/mid-term future? Anything supporting vectors > larger than 512-bit? What about 384-bit vectors? > I think our patch is now in 3. :-) We do not give up non-power-of-2 sized vectors, instead we are supporting them well in this patch. And are still using current regmask framework. (Actually, I think the only limitation to the vector size is that it should be multiple of 32-bits - bits per 1 reg slot.) I am not sure about other Arm partners' hardware implementations in the mid-term future, as it's free for cpu implementer to choose any max vector sizes as long as it follows SVE architecture specification. But we did tested the patch with Vector API on different SVE supported vector sizes on emulator, e.g. 384, 768, 1024, 2048 etc. The register allocator including the spill/unspill works well on those different sizes with Vector API. (Thanks to your great work on Vector API. :-)) We currently limit the vector size to power-of-2 in vm_version_aarch64.cpp, as suggested by Andrew Dinn, is because current SLP vectorizer only supports power-of-2 vectors. With Vector API in, I think such restriction can be removed. And we are also working on a new vectorizer to support predication/mask, which should not have power-of-2 limitation. > I don't have a good understanding where SVE/SVE2-capable hardware is > moving and would benefit a lot from your insights about what to expect. > > If 256-/512-bit vectors end up as the only option, then #1 should fit > them well. > > For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My > understanding that existing RA machinery should support 1024-bit vectors > well. So, unless 2048-bit vectors are needed, we could live with the > framework we have right now. > > If hardware has non-power-of-2 vectors, but JVM doesn't support them, > then JVM can work with just power-of-2 portion of them (384-bit => 256-bit). > Yes, we can make JVM to support portion of vectors, at least for SVE. My concern is that the performance wouldn't be as good as the full available vector width. > Giving up on #3 for now and starting with less ambitious goals (#1 or > #2) would reduce pressure on RA and give more time for additional > experiments to come with a better and more universal > support/representation of generic/size-agnostic vectors. And, in a > longer term, help reducing complexity and technical debt in the area. > > Some more comments follow inline. > >>> Compared to x86 w/ AVX512, architectural state for vector registers is >>> 4x larger in the worst case (ignoring predicate registers for now). >>> Here are the relevant constants on x86: >>> >>> gensrc/adfiles/adGlobals_x86.hpp: >>> >>> // the number of reserved registers + machine registers. >>> #define REG_COUNT??? 545 >>> ... 
>>> // Size of register-mask in ints >>> #define RM_SIZE 22 >>> >>> My estimate is that for AArch64 with SVE support the constants will be: >>> >>> ?? REG_COUNT < 2500 >>> ?? RM_SIZE < 100 >>> >>> which don't look too bad. >>> >> >> Right, but given that most real hardware implementations will be no >> larger than 512 bits, I think. Having a large bitmask array, with most >> bits useless, will be less efficient for regmask computation. > > Does it make sense to limit the maximum supported size to 512-bit then > (at least, initially)? In that case, the overhead won't be worse it is > on x86 now. > Technically, this may be possible though I haven't tried. My concerns are: 1) A larger regmask arrays would be less efficient (we only use 256 bits - 8 slots for SVE in this patch), though won't be worse than x86. 2) Given that current patch already supports larger sizes and non-power-of-2 sizes well with relative small size in diff, if we want to support other sizes soon, there may be some more work to roll-back ad file changes. >>> Also, I don't see any changes related to stack management. So, I >>> assume it continues to be managed in slots. Any problems there? As I >>> understand, wide SVE registers are caller-save, so there may be many >>> spills of huge vectors around a call. (Probably, not possible with C2 >>> auto-vectorizer as it is now, but Vector API will expose it.) >>> >> >> Yes, the stack is still managed in slots, but it will be allocated with >> real vector register length instead of 'virtual' slots for VecA. See the >> usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also >> applied the patch to vector api, and did find a lot of vector spills >> with expected correct results. > > I'm curious whether similar problems may arise for spills. Considering > wide vector registers are caller-saved, it's possible to have lots of > 256-byte values to end up on stack (especially, with Vector API). Any > concerns with that? > No, we don't need to have such big (256-byte) slots for a smaller vector register. The spill slots are the same size as of real vector length, e.g. 48 bytes for 384-bit vector. Even for alignment, we currently choose SlotsPerVecA (8 slots for 32 bytes, 256 bits) for alignment (skipped slots can still be allocated to other args), which is still smaller than AVX512 (64 bytes, 512 bits). We can tweak the patch to choose other smaller value, if we think the alignment is too large. (Yes, we should always try to avoid spills for wide vectors, especially with Vector API, to avoid performance pitfalls.) >>> Have you noticed any performance problems? If that's the case, then >>> AVX512 support on x86 would benefit from similar optimization as well. >>> >> >> Do you mean register allocation performance problems? I did not notice >> that before. Do you have any suggestion on how to measure that? > > I'd try to run some applications/benchmarks with -XX:+CITime to get a > sense how much RA may be affected. > Thanks! I will give a try. 
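To make the spill-size arithmetic above concrete (illustration only, not code from the patch; one register slot is 32 bits):

    static int spillSlots(int actualVectorBits) {
        // a spill takes actualVectorBits / 32 slots, independent of the
        // fixed 8-slot 'logical' VecA regmask used during allocation
        return actualVectorBits / 32;
    }
    // spillSlots(128)  = 4  slots = 16 bytes
    // spillSlots(384)  = 12 slots = 48 bytes
    // spillSlots(2048) = 64 slots = 256 bytes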
[1] https://github.com/riscv/riscv-v-spec/releases/tag/0.9 Thanks, Ningsheng From ningsheng.jian at arm.com Mon Aug 24 09:59:20 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 24 Aug 2020 17:59:20 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> Message-ID: <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> Hi Erik, Thanks for the review! On 8/22/20 12:21 AM, Erik ?sterlund wrote: > Hi, > > Have you tried this with ZGC on AArch64? It has custom code for saving > live registers in the load barrier slow path. > I can't see any code changes there, so assuming this will just crash > instead. > The relevant code is in ZBarrierSetAssembler on aarch64. > > Maybe I missed something? > I didn't add ZGC option while running tests. I think I need to update push_fp() which is called by ZSaveLiveRegisters. But do we need to get size info (float/neon/sve) instead of saving the whole vector register? Currently, it just simply saves the whole NEON register. And in ZBarrierSetAssembler::load_at(), before calling to runtime code, we call push_call_clobbered_registers_except(), which just saves floating point registers instead of the whole NEON vector registers. Similar behavior in x86 implementation. Is that correct (not saving vectors)? Thanks, Ningsheng From vladimir.x.ivanov at oracle.com Mon Aug 24 12:03:47 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 24 Aug 2020 15:03:47 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Message-ID: <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> Hi Ningsheng, >> What I see in the patch is that you try to attack the problem from the >> opposite side: you introduce new concept of a size-agnostic vector >> register on RA side and then directly use it during matching: vecA is >> used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. >> >> Unfortunately, it extends the implementation in orthogonal direction >> which looks too aarch64-specific to benefit other architectures and x86 >> particular. I believe there's an alternative approach which can benefit >> both aarch64 and x86, but it requires more experimentation. >> > > Since vecA and vecX (and others) are architecturally different vector > registers, I think it's quite natural that we just introduced the new > vector register type vecA, to represent what we need for corresponding > hardware vector register. Please note that in vector length agnostic > ISA, like Arm SVE and RISC-V vector extension [1], the vector registers > are architecturally the same type of register despite the different > hardware implementations. FTR vecX et al don't represent hardware registers, they represent vector values of predefined size. (For example, vecS, vecD, and vecX map to the very same set of 128-bit vector registers on x86.) 
My point is: in terms of existing concepts what you are adding is not "yet another flavor of vector". It's a new full-fledged concept (which is manifested as special cases across the JVM) and you end up with 2 different representations of vectors. I agree that hardware is quite different, but I don't see it makes much of a difference in the context of the JVM and abstractions used to hide it are similar. For example, as of now, most of x86-specific code in C2 works just fine with full-width hardware vectors which are oblivious of their sizes until RA kicks in. And SVE patch you propose completely omits implicit predication hardware provides which makes it similar to AVX512 (modulo wider range of vector width sizes supported). So, even though hardware abstractions being used aren't actually *that* different, vecA piles complexity and introduces a separate way to achieve similar results (but slightly differently). And that's what bothers me. I'd like to see more unification instead which should bring reduction in complexity and an opportunity to address long-standing technical debt (and 5 flavors of ideal registers for vectors is part of it IMO). So far, I see 2 main directions for RA work: (a) support vectors of arbitrary size: (1) helps push the upper limit on the size (1024-bit) (2) handle non-power-of-2 sizes (b) optimize RA implementation for large values Anything else? Speaking of (a), in particular, I don't see why possible solution for it should not supersede vecX et al altogether. Also, I may be wrong, but I don't see a clear evidence there's a pressing need to have all of that fixed right from the beginning. (That's why I put #1 and #2 options on the table.) Starting with #1/#2 would untie initial SVE support from the exploratory work needed to choose the most appropriate solution for (a) and (b). >> If I were to start from scratch, I would choose between 3 options: >> >> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported >> vector sizes to 128-/256-/512-bit values. >> >> ??? #2: lift limitation on max size (to 1024/2048 bits), but ignore >> non-power-of-2 sizes; >> >> ??? #3: introduce support for full range of vector register sizes >> (128-/.../2048-bit with 128-bit step); >> >> I see 2 (mostly unrelated) limitations: maximum vector size and >> non-power-of-2 sizes. >> >> My understanding is that you don't try to accurately represent SVE for >> now, but lay some foundations for future work: you give up on >> non-power-of-2 sized vectors, but still enable support for arbitrarily >> sized vectors (addressing both limitations on maximum size and size >> granularity) in RA (and it affects only spills). So, it is somewhere >> between #2 and #3. >> >> The ultimate goal is definitely #3, but how much more work will be >> required to teach the JVM about non-power-of-2 vectors? As I see in the >> patch, you don't have auto-vectorizer support yet, but Vector API will >> provide access to whatever size hardware exposes. What do you expect on >> hardware front in the near/mid-term future? Anything supporting vectors >> larger than 512-bit? What about 384-bit vectors? >> > > I think our patch is now in 3. :-) We do not give up non-power-of-2 > sized vectors, instead we are supporting them well in this patch. And > are still using current regmask framework. (Actually, I think the only > limitation to the vector size is that it should be multiple of 32-bits - > bits per 1 reg slot.) 
> I am not sure about other Arm partners' hardware implementations in the > mid-term future, as it's free for cpu implementer to choose any max > vector sizes as long as it follows SVE architecture specification. But > we did tested the patch with Vector API on different SVE supported > vector sizes on emulator, e.g. 384, 768, 1024, 2048 etc. The register > allocator including the spill/unspill works well on those different > sizes with Vector API. (Thanks to your great work on Vector API. :-)) > > We currently limit the vector size to power-of-2 in > vm_version_aarch64.cpp, as suggested by Andrew Dinn, is because current > SLP vectorizer only supports power-of-2 vectors. With Vector API in, I > think such restriction can be removed. And we are also working on a new > vectorizer to support predication/mask, which should not have power-of-2 > limitation. [...] > Yes, we can make JVM to support portion of vectors, at least for SVE. My > concern is that the performance wouldn't be as good as the full > available vector width. To be clear: I called it "somewhere between #2 and #3" solely because auto-vectorizer bails out on non-power-of-2 sizes. And even though Vector API will work with such cases just fine, IMO having auto-vectorizer support is required before calling #3 complete. In that respect, choosing smaller vector size auto-vectorizer supports is preferrable to picking up the full-width vectors and turning off auto-vectorizer (even though Vector API will support them). It can be turned into heuristic (by default, pick only power-of-2 sizes; let users explicitly specify non-power-of-2 sizes), but speaking of priorities, IMO auto-vectorizer support is more important. >> Giving up on #3 for now and starting with less ambitious goals (#1 or >> #2) would reduce pressure on RA and give more time for additional >> experiments to come with a better and more universal >> support/representation of generic/size-agnostic vectors. And, in a >> longer term, help reducing complexity and technical debt in the area. >> >> Some more comments follow inline. >> >>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>> 4x larger in the worst case (ignoring predicate registers for now). >>>> Here are the relevant constants on x86: >>>> >>>> gensrc/adfiles/adGlobals_x86.hpp: >>>> >>>> // the number of reserved registers + machine registers. >>>> #define REG_COUNT??? 545 >>>> ... >>>> // Size of register-mask in ints >>>> #define RM_SIZE 22 >>>> >>>> My estimate is that for AArch64 with SVE support the constants will be: >>>> >>>> ??? REG_COUNT < 2500 >>>> ??? RM_SIZE < 100 >>>> >>>> which don't look too bad. >>>> >>> >>> Right, but given that most real hardware implementations will be no >>> larger than 512 bits, I think. Having a large bitmask array, with most >>> bits useless, will be less efficient for regmask computation. >> >> Does it make sense to limit the maximum supported size to 512-bit then >> (at least, initially)? In that case, the overhead won't be worse it is >> on x86 now. >> > > Technically, this may be possible though I haven't tried. My concerns are: > > 1) A larger regmask arrays would be less efficient (we only use 256 bits > - 8 slots for SVE in this patch), though won't be worse than x86. > > 2) Given that current patch already supports larger sizes and > non-power-of-2 sizes well with relative small size in diff, if we want > to support other sizes soon, there may be some more work to roll-back ad > file changes. 
> >>>> Also, I don't see any changes related to stack management. So, I >>>> assume it continues to be managed in slots. Any problems there? As I >>>> understand, wide SVE registers are caller-save, so there may be many >>>> spills of huge vectors around a call. (Probably, not possible with C2 >>>> auto-vectorizer as it is now, but Vector API will expose it.) >>>> >>> >>> Yes, the stack is still managed in slots, but it will be allocated with >>> real vector register length instead of 'virtual' slots for VecA. See the >>> usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also >>> applied the patch to vector api, and did find a lot of vector spills >>> with expected correct results. >> >> I'm curious whether similar problems may arise for spills. Considering >> wide vector registers are caller-saved, it's possible to have lots of >> 256-byte values to end up on stack (especially, with Vector API). Any >> concerns with that? >> > > No, we don't need to have such big (256-byte) slots for a smaller vector > register. The spill slots are the same size as of real vector length, > e.g. 48 bytes for 384-bit vector. Even for alignment, we currently > choose SlotsPerVecA (8 slots for 32 bytes, 256 bits) for alignment > (skipped slots can still be allocated to other args), which is still > smaller than AVX512 (64 bytes, 512 bits). We can tweak the patch to > choose other smaller value, if we think the alignment is too large. > (Yes, we should always try to avoid spills for wide vectors, especially > with Vector API, to avoid performance pitfalls.) Thanks for the clarifications. Any new problems/hitting some limitations envisioned when spilling large number of huge vectors (2048-bit) on stack? Best regards, Vladimir Ivanov >>>> Have you noticed any performance problems? If that's the case, then >>>> AVX512 support on x86 would benefit from similar optimization as well. >>>> >>> >>> Do you mean register allocation performance problems? I did not notice >>> that before. Do you have any suggestion on how to measure that? >> >> I'd try to run some applications/benchmarks with -XX:+CITime to get a >> sense how much RA may be affected. >> > > Thanks! I will give a try. > > [1] > https://urldefense.com/v3/__https://github.com/riscv/riscv-v-spec/releases/tag/0.9__;!!GqivPVa7Brio!IwFEx-c_8JDZcWgXPLcWp2ypX3pr1-IWTBfC7O7PHo7_0skMWtQa4fyWpo-lVor0NFv4Ivo$ > > Thanks, > Ningsheng > From joserz at linux.ibm.com Mon Aug 24 12:35:40 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Mon, 24 Aug 2020 09:35:40 -0300 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> Message-ID: <20200824123540.GA166438@pacoca> Hallo Martin! Just to understand. Do I need to do something else? Ask more reviewers? Thank you :) Jose On Fri, Aug 21, 2020 at 03:25:46PM +0000, Doerr, Martin wrote: > Hi Thomas, > > I understand your point. My concern is that it may become a more political discussion how to handle CSR for PPC64 flags and I don't want to delay Jose's change for that. There are already other changes in the pipe which build on top of it. > > It will probably be us to handle and approve CSR requests for platforms which are maintained by SAP. We haven't done this so far. We are still handling such flags in a less formal way. 
> I don't know how other non-Oracle platforms are handled. > > Best regards, > Martin > > > > -----Original Message----- > > From: Thomas Schatzl > > Sent: Freitag, 21. August 2020 17:12 > > To: Doerr, Martin ; joserz at linux.ibm.com > > Cc: hotspot-compiler-dev at openjdk.java.net > > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > > and use new byte-reverse instructions > > > > Hi, > > > > On 21.08.20 17:06, Doerr, Martin wrote: > > > Hi Thomas, > > > > > > I agree with you in general. However, all PPC64 specific platform flags are > > "product" at the moment. > > > Most of them should probably be "diagnostic". We should fix that at some > > point of time. > > > But for now, I'm ok with Jose's webrev since it's consistent with the other > > PPC64 flags. > > > > > > > I was merely pointing out what the rule is, that has not been a veto > > for the patch (which I haven't reviewed btw). If you want to go ahead > > with that for consistency's sake, with a plan to fix this I can see your > > point of keeping it. > > > > Thanks, > > Thomas > > > > > Best regards, > > > Martin > > > > > > > > >> -----Original Message----- > > >> From: hotspot-compiler-dev > >> retn at openjdk.java.net> On Behalf Of Thomas Schatzl > > >> Sent: Freitag, 21. August 2020 15:45 > > >> To: joserz at linux.ibm.com > > >> Cc: hotspot-compiler-dev at openjdk.java.net > > >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > > system > > >> and use new byte-reverse instructions > > >> > > >> Hi, > > >> > > >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: > > >>> Hello! > > >>> > > >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > > >>>> Hi, > > >>>> > > >>>> On 21.08.20 04:33, Michihiro Horie wrote: > > >>>>> > > >>>>> Hi Jose, > > >>>>> > > >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > > >>>>> Otherwise, the change looks good! > > >>>>> > > >>>>> /* special instructions */ > > >>>>> \ > > >>>>> + product(bool, UseByteReverseInstructions, false, > > >>>>> \ > > >>>> > > >>>> Fwiw, for adding product options, you must go through the CSR > > process. > > >> Maybe > > >>>> there is an exception for platform specific ones? > > >>> > > >>> I didn't find any exception for platform specific options. But, > > >> "experimental" options > > >>> don't need such CSR process and, to be honest, experimental seems > > more > > >> appropriate here. > > >>> What do you think? > > >>> > > >>> Thank you for your review! :) > > >> > > >> Just a fly-by. It's up to you :) - just that product options need to be > > >> announced to the world. > > >> > > >> I kind of agree that experimental seems more appropriate. You can > > always > > >> "upgrade" it later. > > >> > > >> Thomas > From martin.doerr at sap.com Mon Aug 24 13:03:25 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 24 Aug 2020 13:03:25 +0000 Subject: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20200824123540.GA166438@pacoca> References: <20200819002432.GA915540@pacoca> <1ca467b0-14a2-f526-2f97-cf62c5fa148d@oracle.com> <20200821133729.GA53991@pacoca> <6202cdf2-10b8-dd70-60ee-da9917cf8a28@oracle.com> <20200824123540.GA166438@pacoca> Message-ID: Hi Jose, you already have 2 reviews by JDK Reviewers. The change needs to get the formal information including "Reviewed-by" and "Contributed-by" information added such that it passes jcheck. Then you only need a sponsor to push it. 
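For reference, the changeset comment would then look roughly like this (the reviewer names below are placeholders for the OpenJDK user names of the actual Reviewers):

    8248190: PPC: Enable Power10 system and use new byte-reverse instructions
    Reviewed-by: <reviewer1>, <reviewer2>
    Contributed-by: joserz at linux.ibm.com

jcheck will reject the push if these lines are missing or malformed.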
I guess Michihiro wants to do that for you? Best regards, Martin > -----Original Message----- > From: joserz at linux.ibm.com > Sent: Montag, 24. August 2020 14:36 > To: Doerr, Martin > Cc: Thomas Schatzl ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 system > and use new byte-reverse instructions > > Hallo Martin! > > Just to understand. Do I need to do something else? Ask more reviewers? > > Thank you :) > > Jose > > On Fri, Aug 21, 2020 at 03:25:46PM +0000, Doerr, Martin wrote: > > Hi Thomas, > > > > I understand your point. My concern is that it may become a more political > discussion how to handle CSR for PPC64 flags and I don't want to delay Jose's > change for that. There are already other changes in the pipe which build on > top of it. > > > > It will probably be us to handle and approve CSR requests for platforms > which are maintained by SAP. We haven't done this so far. We are still > handling such flags in a less formal way. > > I don't know how other non-Oracle platforms are handled. > > > > Best regards, > > Martin > > > > > > > -----Original Message----- > > > From: Thomas Schatzl > > > Sent: Freitag, 21. August 2020 17:12 > > > To: Doerr, Martin ; joserz at linux.ibm.com > > > Cc: hotspot-compiler-dev at openjdk.java.net > > > Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > system > > > and use new byte-reverse instructions > > > > > > Hi, > > > > > > On 21.08.20 17:06, Doerr, Martin wrote: > > > > Hi Thomas, > > > > > > > > I agree with you in general. However, all PPC64 specific platform flags > are > > > "product" at the moment. > > > > Most of them should probably be "diagnostic". We should fix that at > some > > > point of time. > > > > But for now, I'm ok with Jose's webrev since it's consistent with the > other > > > PPC64 flags. > > > > > > > > > > I was merely pointing out what the rule is, that has not been a veto > > > for the patch (which I haven't reviewed btw). If you want to go ahead > > > with that for consistency's sake, with a plan to fix this I can see your > > > point of keeping it. > > > > > > Thanks, > > > Thomas > > > > > > > Best regards, > > > > Martin > > > > > > > > > > > >> -----Original Message----- > > > >> From: hotspot-compiler-dev > > >> retn at openjdk.java.net> On Behalf Of Thomas Schatzl > > > >> Sent: Freitag, 21. August 2020 15:45 > > > >> To: joserz at linux.ibm.com > > > >> Cc: hotspot-compiler-dev at openjdk.java.net > > > >> Subject: Re: [EXTERNAL] Re: RFR(M): 8248190: PPC: Enable Power10 > > > system > > > >> and use new byte-reverse instructions > > > >> > > > >> Hi, > > > >> > > > >> On 21.08.20 15:37, joserz at linux.ibm.com wrote: > > > >>> Hello! > > > >>> > > > >>> On Fri, Aug 21, 2020 at 10:04:38AM +0200, Thomas Schatzl wrote: > > > >>>> Hi, > > > >>>> > > > >>>> On 21.08.20 04:33, Michihiro Horie wrote: > > > >>>>> > > > >>>>> Hi Jose, > > > >>>>> > > > >>>>> One thing I noticed is a misaligned backslash in globals_ppc.hpp. > > > >>>>> Otherwise, the change looks good! > > > >>>>> > > > >>>>> /* special instructions */ > > > >>>>> \ > > > >>>>> + product(bool, UseByteReverseInstructions, false, > > > >>>>> \ > > > >>>> > > > >>>> Fwiw, for adding product options, you must go through the CSR > > > process. > > > >> Maybe > > > >>>> there is an exception for platform specific ones? > > > >>> > > > >>> I didn't find any exception for platform specific options. 
But, > > > >> "experimental" options > > > >>> don't need such CSR process and, to be honest, experimental seems > > > more > > > >> appropriate here. > > > >>> What do you think? > > > >>> > > > >>> Thank you for your review! :) > > > >> > > > >> Just a fly-by. It's up to you :) - just that product options need to be > > > >> announced to the world. > > > >> > > > >> I kind of agree that experimental seems more appropriate. You can > > > always > > > >> "upgrade" it later. > > > >> > > > >> Thomas > > From adinn at redhat.com Mon Aug 24 13:40:53 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 24 Aug 2020 14:40:53 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> Message-ID: <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> On 24/08/2020 10:16, Ningsheng Jian wrote: > On 8/22/20 6:34 AM, Vladimir Ivanov wrote: >> The ultimate goal was to move to vectors which represent full-width >> hardware registers. After we were convinced that it will work well in AD >> files, we encountered some inefficiencies with vector spills: depending >> on actual hardware, smaller (than available) vectors may be used (e.g., >> integer computations on AVX-capable CPU). So, we stopped half-way and >> left post-matching part intact: depending on actual vector value width, >> appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. >> >> (I believe you may be in a similar situation on AArch64 with NEON vs SVE >> where both 128-bit and wide SVE vectors may be used at runtime.) Your problem here seems to be a worry about spilling more data than is actually needed. As Ningsheng pointed out the amount of data spilled is determined by the actual length of the VecA registers, not by the logical size of the VecA mask (256 bits) nor by the maximum possible size of a VecA register on future architectures (2048 bits). So, no more stack space will be used than is needed to preserve the live bits that need preserving. >> Unfortunately, it extends the implementation in orthogonal direction >> which looks too aarch64-specific to benefit other architectures and x86 >> particular. I believe there's an alternative approach which can benefit >> both aarch64 and x86, but it requires more experimentation. >> > > Since vecA and vecX (and others) are architecturally different vector > registers, I think it's quite natural that we just introduced the new > vector register type vecA, to represent what we need for corresponding > hardware vector register. Please note that in vector length agnostic > ISA, like Arm SVE and RISC-V vector extension [1], the vector registers > are architecturally the same type of register despite the different > hardware implementations. Yes, I also see this as quite natural. Ningsheng's change extends the implementation in the architecture-specific direction that is needed for AArch64's vector model. The fact that this differs from x86_64 is not unexpected. >> If I were to start from scratch, I would choose between 3 options: >> >> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported >> vector sizes to 128-/256-/512-bit values. >> >> ??? 
#2: lift limitation on max size (to 1024/2048 bits), but ignore >> non-power-of-2 sizes; >> >> ??? #3: introduce support for full range of vector register sizes >> (128-/.../2048-bit with 128-bit step); >> >> I see 2 (mostly unrelated) limitations: maximum vector size and >> non-power-of-2 sizes. Yes, but this patch deals with both of those and I cannot see it causing any problems for x86_64 nor do I see it adding any great complexity. The extra shard paths deal with scalable vectors wich onlu occur on AArch64. A scalable VecA register (and also eventually the scalable predicate register) caters for all possible vector sizes via a single 'logical' vector of size 8 slots (also eventually a single 'logical' predicate register of size 1 slot). Catering for scalable registers in shared code is localized and does not change handling of the existing, non-scalable VecX/Y/Z registers. >> My understanding is that you don't try to accurately represent SVE for >> now, but lay some foundations for future work: you give up on >> non-power-of-2 sized vectors, but still enable support for arbitrarily >> sized vectors (addressing both limitations on maximum size and size >> granularity) in RA (and it affects only spills). So, it is somewhere >> between #2 and #3. I have to disagree with your statement that this proposal doesn't 'accurately' represent SVE. Yes, the vector mask for this arbitrary-size vector is modelled 'logically' using a nominal 8 slots. However, that is merely to avoid wasting bits in the bit masks plus cpu time processing them. The 'physical' vector length models the actual number of slots, and includes the option to model a non-power of two. That 'physical' size is used in all operations that manipulate VecA register contents. So, although I grant that the code is /parameterized/, it is also 100% accurate. >> The ultimate goal is definitely #3, but how much more work will be >> required to teach the JVM about non-power-of-2 vectors? As I see in the >> patch, you don't have auto-vectorizer support yet, but Vector API will >> provide access to whatever size hardware exposes. What do you expect on >> hardware front in the near/mid-term future? Anything supporting vectors >> larger than 512-bit? What about 384-bit vectors? Do we need to know for sure such hardware is going to arrive in order to allow for it now? If there were a significant cost to doing so I'd maybe say yes but I don't really see one here. Most importantly, the changes to the AArch64 register model and small changes to the shared chaitin/reg mask code proposed here already work with the auto-vectorizer if the VecA slots are any of the possible powers of 2 VecA sizes. The extra work needed to profit from non-power-of-two vector involves upgrading the auto-vectorizer code. While this may be tricky I don't see ti as impossible. However, more importantly, even if such an upgrade cannot be achieved then this proposal is still a very simple way to allow for arbitrarily scalable SVE vectors that are a power of two size. It also allows any architecture with a non-power of two to work with the lowest power of two that fits. So, this is a very siple way to cater for what may turn up. >> For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My >> understanding that existing RA machinery should support 1024-bit vectors >> well. So, unless 2048-bit vectors are needed, we could live with the >> framework we have right now. 
I'm not sure what you are proposing here but it sounds like introducing extra vectors beyond VecX, VecY for larger powers of two i.e. VecZ, vecZZ, VecZZZ ... and providing separate case processing for each of them where the relevant case is selected conditional on the actual vector size. Is that what you are proposing? I can't see any virtue in multiplying case handling fore ach new power-of-two size that turns up when all possible VecZ* power-of-two options can actually be handled as one uniform case. >> If hardware has non-power-of-2 vectors, but JVM doesn't support them, >> then JVM can work with just power-of-2 portion of them (384-bit => >> 256-bit). And, of course, the previous comment applies here /a fortiori/. >> Giving up on #3 for now and starting with less ambitious goals (#1 or >> #2) would reduce pressure on RA and give more time for additional >> experiments to come with a better and more universal >> support/representation of generic/size-agnostic vectors. And, in a >> longer term, help reducing complexity and technical debt in the area. Can you explain what you mean by 'reduce pressure on RA'? I'm also unclear as to what you see as complex about this proposal. >> Some more comments follow inline. >> >>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>> 4x larger in the worst case (ignoring predicate registers for now). >>>> Here are the relevant constants on x86: >>>> >>>> gensrc/adfiles/adGlobals_x86.hpp: >>>> >>>> // the number of reserved registers + machine registers. >>>> #define REG_COUNT??? 545 >>>> ... >>>> // Size of register-mask in ints >>>> #define RM_SIZE 22 >>>> >>>> My estimate is that for AArch64 with SVE support the constants will be: >>>> >>>> ??? REG_COUNT < 2500 >>>> ??? RM_SIZE < 100 >>>> >>>> which don't look too bad. I'm not sure what these numbers are meant to mean. The number of SVE vector registers is the same as the number of NEON vector registers i.e. 32. The register mask size for VecA registers is 8 * 32 bits. >>> Right, but given that most real hardware implementations will be no >>> larger than 512 bits, I think. Having a large bitmask array, with most >>> bits useless, will be less efficient for regmask computation. >> >> Does it make sense to limit the maximum supported size to 512-bit then >> (at least, initially)? In that case, the overhead won't be worse it is >> on x86 now. Well, no. It doesn't make sense when all you need is a 'logical' 8 * 32 bit mask whatever the actual 'physical' register size is. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From erik.osterlund at oracle.com Mon Aug 24 15:26:30 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 24 Aug 2020 17:26:30 +0200 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> Message-ID: Hi Ningsheng, On 2020-08-24 11:59, Ningsheng Jian wrote: > Hi Erik, > > Thanks for the review! 
> > On 8/22/20 12:21 AM, Erik ?sterlund wrote: >> Hi, >> >> Have you tried this with ZGC on AArch64? It has custom code for saving >> live registers in the load barrier slow path. >> I can't see any code changes there, so assuming this will just crash >> instead. >> The relevant code is in ZBarrierSetAssembler on aarch64. >> >> Maybe I missed something? >> > > I didn't add ZGC option while running tests. I think I need to update > push_fp() which is called by ZSaveLiveRegisters. But do we need to get > size info (float/neon/sve) instead of saving the whole vector > register? Currently, it just simply saves the whole NEON register. What we found on x86_64 was that there was a significant cost in saving vector registers in load barriers. That is why we perform some analysis so that only the exact registers that are affected, and only the parts of the registers that are affected, get spilled. It actually mattered. It will of course work either way, but that was our observation on x86_64. But I am okay with that being deferred to a separate RFE. I just wanted to make sure that it at the very least works with the new code, for a start, so it doesn't start crashing. > And in ZBarrierSetAssembler::load_at(), before calling to runtime > code, we call push_call_clobbered_registers_except(), which just saves > floating point registers instead of the whole NEON vector registers. > Similar behavior in x86 implementation. Is that correct (not saving > vectors)? Yes. The call contexts are: 1) Interpreter. Does not use vector registers. 2) Method handle intrinsic. Uses only floats that are part of the Java calling convention, rest is garbage. No vectors here. 3) Checkcast arraycopy. Does not use vectors. Thanks, /Erik > Thanks, > Ningsheng From aph at redhat.com Mon Aug 24 17:31:57 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 24 Aug 2020 18:31:57 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> Message-ID: <977c8d9b-a9a0-4412-8d1b-0ca6bb5db558@redhat.com> On 23/08/2020 19:20, Boris Ulasevich wrote: > With the current change all the transformation logic is moved out of > aarch64.ad file into the common C2 code. > > http://bugs.openjdk.java.net/browse/JDK-8249893 > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 > > The change in compiler.cpp was done to implicitly ask IGVN to run > the idealization once again after the loop optimization phase. > This extra step is necessary to make the BFI transform happen > only after loop optimization. This looks rather nice. How did you test it? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From igor.ignatyev at oracle.com Mon Aug 24 20:24:22 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 24 Aug 2020 13:24:22 -0700 Subject: RFR(S) : 8252186 : remove FileInstaller action from vmTestbase/jit/graph tests In-Reply-To: <7488a613-f5ad-acc8-edc1-677d4511216a@oracle.com> References: <221D21A7-B791-48CF-B48E-8E6E8CF8F4B0@oracle.com> <7488a613-f5ad-acc8-edc1-677d4511216a@oracle.com> Message-ID: <6CDBB24F-A155-4C93-A85D-82D8D2E58DCD@oracle.com> thanks Vladimir, pushed. 
-- Igor > On Aug 22, 2020, at 10:55 AM, Vladimir Kozlov wrote: > > LGTM > > Thanks, > Vladimir K > > On 8/21/20 10:23 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ >>> 24 lines changed: 0 ins; 12 del; 12 mod; >> Hi all, >> could you please review this small cleanup of vmTestbase/jit/graph tests? >> from JBS: >>> vmTestbase/jit/graph tests use FileInstaller to copy ${test.src}/data/main.data to the current directory, and pass the path to it as '-path' option to jit.graph.CGT class. since JDK-8252005 enabled jtreg smart action args, we can use ${test.src} right in the argument and avoid copying. >> testing: :vmTestbase_vm_compiler >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252186 >> webrev: http://cr.openjdk.java.net/~iignatyev/8252186/webrev.00/ >> Thanks, >> -- Igor From dmitry.chuyko at bell-sw.com Mon Aug 24 21:52:06 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 25 Aug 2020 00:52:06 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: Hi Andrew, I added two more intrinsics -- for copySign, they are controlled by UseCopySignIntrinsic flag. webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/ It also contains 'benchmarks' directory: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/benchmarks/ There are 8 benchmarks there: (double | float) x (blackhole | reduce) x (current j.l.Math.signum | abs()>0 check). My results on Arm are in signum-facgt-copysign.ods. Main case is 'random' which is actually a random from positive and negative numbers between -0.5 and +0.5. Basically we have ~14% improvement in 'reduce' benchmark variant but ~20% regression in 'blackhole' variant in case of only copySign() intrinsified. Same picture if abs()>0 check is used in signum() (+-5%). This variant is included as it shows very good results on x86. Intrinsic for signum() gives improvement of main case in both 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a noticeable difference. -Dmitry On 8/19/20 11:35 AM, Andrew Haley wrote: > On 18/08/2020 16:05, Dmitry Chuyko wrote: >> Some more results for a benchmark with reduce(): >> >> -XX:-UseSignumIntrinsic >> DoubleOrigSignum.ofMostlyNaN 0.914 ? 0.001 ns/op >> DoubleOrigSignum.ofMostlyNeg 1.178 ? 0.001 ns/op >> DoubleOrigSignum.ofMostlyPos 1.176 ? 0.017 ns/op >> DoubleOrigSignum.ofMostlyZero 0.803 ? 0.001 ns/op >> DoubleOrigSignum.ofRandom 1.175 ? 0.012 ns/op >> -XX:+UseSignumIntrinsic >> DoubleOrigSignum.ofMostlyNaN 1.040 ? 0.007 ns/op >> DoubleOrigSignum.ofMostlyNeg 1.040 ? 0.004 ns/op >> DoubleOrigSignum.ofMostlyPos 1.039 ? 0.003 ns/op >> DoubleOrigSignum.ofMostlyZero 1.040 ? 0.001 ns/op >> DoubleOrigSignum.ofRandom 1.040 ? 0.003 ns/op > That's almost no difference, isn't it? Down in the noise. > >> If we only intrinsify copySign() we lose free mask that we get from >> facgt. In such case improvement (for signum) decreases like from ~30% to >> ~15%, and it also greatly depends on the particular HW. We can >> additionally introduce an intrinsic for Math.copySign(), especially it >> makes sense for float where it can be just 2 fp instructions: movi+bsl >> (fmovd+fnegd+bsl for double). > I think this is worth doing, because moves between GPRs and vector regs > tend to have a long latency. 
Can you please add that, and we can all try > it on our various hardware. > > We're measuring two different things, throughput and latency. The > first JMH test you provided was really testing latency, because > Blackhole waits for everything to complete. > > [ Note to self: Blackhole.consume() seems to be particularly slow on > some AArch64 implementations because it uses a volatile read. What > seems to be happening, judging by how long it takes, is that the store > buffer is drained before the volatile read. Maybe some other construct > would work better but still provide the guarantees Blackhole.consume() > needs. ] > > For throughput we want to keep everything moving. Sure, sometimes we > are going to have to wait for some calculation to complete, so if we > can improve latency without adverse cost we should. For that, staying > in the vector regs helps. > From cjashfor at linux.ibm.com Tue Aug 25 01:21:59 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Mon, 24 Aug 2020 18:21:59 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> Here's a revised webrev which includes a JMH benchmark for the decode operation. http://cr.openjdk.java.net/~mhorie/8248188/webrev.03/ The added benchmark tries to be "fair" in that it doesn't prefer a large buffer size, which would favor the intrinsic. It pseudo-randomly (but reproducibly) chooses a buffer size between 8 and 20k+8 bytes, and fills it with random data to encode and decode. As part of the TearDown of an invocation, it also checks the decoded output data for correctness. Example runs on the Power9-based machine I use for development shows a 3X average improvement across these random buffer sizes. Here's an excerpt of the output when run with -XX:-UseBASE64Intrinsics : Iteration 1: 70795.623 ops/s Iteration 2: 71070.607 ops/s Iteration 3: 70867.544 ops/s Iteration 4: 71107.992 ops/s Iteration 5: 71048.281 ops/s And here's the output with the intrinsic enabled: Iteration 1: 208794.022 ops/s Iteration 2: 208630.904 ops/s Iteration 3: 208238.822 ops/s Iteration 4: 208714.967 ops/s Iteration 5: 209060.894 ops/s Taking the best of the two runs: 209060/71048 = 2.94 From other experiments where the benchmark uses a fixed-size, larger buffer, the performance ratio rises to about 4.0. Power10 should have a slightly higher ratio due to several factors, but I have not yet benchmarked on Power10. Other arches ought to be able to do at least this well, if not better, because of wider vector registers (> 128 bits) being available. Only a Power9/10 implementation is included in this webrev, however. Regards, - Corey On 8/19/20 11:20 AM, Roger Riggs wrote: > Hi Corey, > > For changes obviously performance motivated, it is conventional to run a > JMH perf test to demonstate > the improvement and prove it is worthwhile to add code complexity. > > I don't see any existing Base64 JMH tests but they would be in the repo > below or near: > ??? test/micro/org/openjdk/bench/java/util/ > > Please contribute a JMH test and results to show the difference. > > Regards, Roger > > > > On 8/19/20 2:10 PM, Corey Ashford wrote: >> Michihiro Horie posted up a new iteration of this webrev for me.? This >> time the webrev includes a complete implementation of the intrinsic >> for Power9 and Power10. 
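For reference, a decode benchmark of the shape described at the top of this mail looks roughly like the sketch below. Names, the seed and the exact structure are illustrative only; the actual benchmark is the one in webrev.03.

    import java.util.Arrays;
    import java.util.Base64;
    import java.util.Random;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    public class Base64DecodeBench {
        private final Random rnd = new Random(42);  // fixed seed: reproducible sizes and contents
        private byte[] plain;
        private byte[] encoded;
        private byte[] decoded;

        @Setup(Level.Invocation)
        public void newBuffer() {
            int len = 8 + rnd.nextInt(20 * 1024);   // 8 .. 20K+8 bytes
            plain = new byte[len];
            rnd.nextBytes(plain);
            encoded = Base64.getEncoder().encode(plain);
        }

        @Benchmark
        public byte[] decode() {
            decoded = Base64.getDecoder().decode(encoded);
            return decoded;
        }

        @TearDown(Level.Invocation)
        public void verify() {
            if (!Arrays.equals(decoded, plain)) {
                throw new AssertionError("decoded output does not match the original data");
            }
        }
    }

The Level.Invocation setup/teardown keep the buffer regeneration and the correctness check out of the measured decode() call itself.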
>> >> You can find it here: >> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >> >> Changes in webrev.02 vs. webrev.01: >> >> ? * The method header for the intrinsic in the Base64 code has been >> rewritten using the Javadoc style.? The clarity of the comments has >> been improved and some verbosity has been removed. There are no >> additional functional changes to Base64.java. >> >> ? * The code needed to martial and check the intrinsic parameters has >> been added, using the base64 encodeBlock intrinsic as a guideline. >> >> ? * A complete intrinsic implementation for Power9 and Power10 is >> included. >> >> ? * Adds some Power9 and Power10 assembler instructions needed by the >> intrinsic which hadn't been defined before. >> >> The intrinsic implementation in this patch accelerates the decoding of >> large blocks of base64 data by a factor of about 3.5X on Power9. >> >> I'm attaching two Java test cases I am using for testing and >> benchmarking.? The TestBase64_VB encodes and decodes randomly-sized >> buffers of random data and checks that original data matches the >> encoded-then-decoded data.? TestBase64Errors encodes a 48K block of >> random bytes, then corrupts each byte of the encoded data, one at a >> time, checking to see if the decoder catches the illegal byte. >> >> Any comments/suggestions would be appreciated. >> >> Thanks, >> >> - Corey >> >> On 7/27/20 6:49 PM, Corey Ashford wrote: >>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>> intrinsic API for me: >>> >>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>> >>> It has the following changes with respect to the original one posted: >>> >>> ??* In the event of encountering a non-base64 character, instead of >>> having a separate error code of -1, the intrinsic can now just return >>> either 0, or the number of data bytes produced up to the point where >>> the illegal base64 character was encountered. This reduces the number >>> of special cases, and also provides a way to speed up the process of >>> finding the bad character by the slower, pure-Java algorithm. >>> >>> ??* The isMIME boolean is removed from the API for two reasons: >>> ??? - The current API is not sufficient to handle the isMIME case, >>> because there isn't a strict relationship between the number of input >>> bytes and the number of output bytes, because there can be an >>> arbitrary number of non-base64 characters in the source. >>> ??? - If an intrinsic only implements the (isMIME == false) case as >>> ours does, it will always return 0 bytes processed, which will >>> slightly slow down the normal path of processing an (isMIME == true) >>> instantiation. >>> ??? - We considered adding a separate hotspot candidate for the >>> (isMIME == true) case, but since we don't have an intrinsic >>> implementation to test that, we decided to leave it as a future >>> optimization. >>> >>> Comments and suggestions are welcome.? Thanks for your consideration. >>> >>> - Corey >>> >>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>> Hi Corey, >>>> >>>> Following is the issue I created. >>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>> >>>> I will upload a webrev when you're ready as we talked in private. 
>>>> >>>> Best regards, >>>> Michihiro >>>> >>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>> 09:40:10---Currently in java.util.Base64, there is a >>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently >>>> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API >>>> for encodeBlock, but no >>>> >>>> From: "Corey Ashford" >>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>> , >>>> "ppc-aix-port-dev at openjdk.java.net" >>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori Ogata/Japan/IBM at IBMJP, >>>> joserz at br.ibm.com >>>> Date: 2020/06/24 09:40 >>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>> Base64 decoding >>>> >>>> ------------------------------------------------------------------------ >>>> >>>> >>>> >>>> >>>> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and >>>> API for encodeBlock, but none for decoding. ?This means that only >>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>> >>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>> considerations I have for this new intrinsic's API: >>>> >>>> ??* Don't make any assumptions about the underlying capability of the >>>> hardware. ?For example, do not impose any specific block size >>>> granularity. >>>> >>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>> modes, but also let them decide if they will process the data >>>> regardless >>>> of the settings of the two booleans. >>>> >>>> ??* Any remaining data that is not processed by the intrinsic will be >>>> processed by the pure Java implementation. ?This allows the >>>> intrinsic to >>>> process whatever block sizes it's good at without the complexity of >>>> handling the end fragments. >>>> >>>> ??* If any illegal character is discovered in the decoding process, the >>>> intrinsic will simply return -1, instead of requiring it to throw a >>>> proper exception from the context of the intrinsic. ?In the event of >>>> getting a -1 returned from the intrinsic, the Java Base64 library code >>>> simply calls the pure Java implementation to have it find the error and >>>> properly throw an exception. ?This is a performance trade-off in the >>>> case of an error (which I expect to be very rare). >>>> >>>> ??* One thought I have for a further optimization (not implemented in >>>> the current patch), is that when the intrinsic decides not to process a >>>> block because of some combination of isURL and isMIME settings it >>>> doesn't handle, it could return extra bits in the return code, encoded >>>> as a negative number. ?For example: >>>> >>>> Illegal_Base64_char ? = 0b001; >>>> isMIME_unsupported ? ?= 0b010; >>>> isURL_unsupported ? ? = 0b100; >>>> >>>> These can be OR'd together as needed and then negated (flip the sign). >>>> The Base64 library code could then cache these flags, so it will know >>>> not to call the intrinsic again when another decodeBlock is requested >>>> but with an unsupported mode. ?This will save the performance hit of >>>> calling the intrinsic when it is guaranteed to fail. >>>> >>>> I've tested the attached patch with an actual intrinsic coded up for >>>> Power9/Power10, but those runtime intrinsics and arch-specific patches >>>> aren't attached today. ?I want to get some consensus on the >>>> library-level intrinsic API first. >>>> >>>> Also attached is a simple test case to test that the new intrinsic API >>>> doesn't break anything. >>>> >>>> I'm open to any comments about this. 
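As a sketch of how the library side could consume such a negative return value (purely illustrative - this optimization is not part of the attached patch, and decodeBlock() below is only a stand-in for the intrinsic candidate):

    class DecodeDispatchSketch {
        static final int ILLEGAL_BASE64_CHAR = 0b001;
        static final int ISMIME_UNSUPPORTED  = 0b010;
        static final int ISURL_UNSUPPORTED   = 0b100;

        private int unsupportedModes = 0;   // cached across calls

        // stand-in for the HotSpotIntrinsicCandidate method: returns bytes
        // produced, or the negated flag set when it declines to process
        private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp,
                                boolean isURL, boolean isMIME) {
            return isMIME ? -ISMIME_UNSUPPORTED : 0;
        }

        int tryDecodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp,
                           boolean isURL, boolean isMIME) {
            if ((isMIME && (unsupportedModes & ISMIME_UNSUPPORTED) != 0)
                    || (isURL && (unsupportedModes & ISURL_UNSUPPORTED) != 0)) {
                return 0;               // known-unsupported mode: skip the intrinsic call
            }
            int r = decodeBlock(src, sp, sl, dst, dp, isURL, isMIME);
            if (r < 0) {
                // note: ILLEGAL_BASE64_CHAR is not cached, it depends on the input, not the mode
                unsupportedModes |= (-r) & (ISMIME_UNSUPPORTED | ISURL_UNSUPPORTED);
                return 0;               // 0 bytes consumed; the pure-Java path finds and reports any bad char
            }
            return r;                   // bytes produced by the intrinsic so far
        }
    }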
>>>> >>>> Thanks for your consideration, >>>> >>>> - Corey >>>> >>>> >>>> Corey Ashford >>>> IBM Systems, Linux Technology Center, OpenJDK team >>>> cjashfor at us dot ibm dot com >>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro >>>> Horie/Japan/IBM] >>>> >>>> >>> >> > From john.r.rose at oracle.com Tue Aug 25 05:23:27 2020 From: john.r.rose at oracle.com (John Rose) Date: Mon, 24 Aug 2020 22:23:27 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: On Aug 21, 2020, at 12:43 AM, Tobias Hartmann wrote: > > For the record, I've tested tier1-9 with "default" flags and tier1-5 with > -XX:StressLongCountedLoop=1 and -XX:StressLongCountedLoop=4294967295. > > Please let me know if you think other flag combinations/values should be tested as well. Those settings force iters_limit (normally 2^31-2) to be either preserved at 2^31-2 or reset to 0, respectively. The latter value is not very useful, since the transform will bail out for trip counts of 1 or 0. I suggest aiming for StressLongCountedLoop values which get inner loop trip counts that are a balance between two concerns: (a) large enough so that the inner loop makes a non-trivial number of trips, and (b) small enough so the *outer* loop makes a non-trivial number of trips. Concern (a) lets us to exercise further optimizations on the inner loop such as unrolling, peeling, and RCE. Concern (b) helps us be sure that back edge of the outer loop performs the right register moves, even if the inner loop is very complex and has many exit points. If we don?t worry about (a) we could mask bugs in the transformed inner loop (unlikely, but possible). If we don?t worry about (b) we could be ignorant about what happens when the outer loop runs the second time (or third, after peeling). For (a) we want an iters_limit on the order of 100 or more, while for (b) we want an iters_limit large enough that many tests (each loop of which has its own characteristic trip count) will run the outer loop three or more times. Tests which intentionally warm up loops go for a *cumulative* trip count of 20,000 or so, but the individual trip counts can vary widely. As a wild guess, I?ll say that many tests will run 100 or more times, which means we want an iter_limit of 300 or more. To derive a StressLongCountedLoop parameter X from a desired iter_limit, ensure that floor((2^31-2)/X) is close to the target iter_limit. So, I recommend a value of StressLongCountedLoop which is at most 21400000 (for an iters_limit of at least 100), and another which is at least 7150000 (for an iters_limit of at most 300). 
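For concreteness, since iters_limit here is just floor((2^31-2)/X), a throwaway snippet like the following (purely illustrative) reproduces those numbers:

    public class IterLimitCheck {
        public static void main(String[] args) {
            long max = (1L << 31) - 2;   // 2147483646, the normal iters_limit
            long[] xs = {1L, 4294967295L, 21_400_000, 7_150_000};
            for (long x : xs) {
                // long division truncates toward zero, i.e. floor() for these values
                System.out.println("X=" + x + " -> iters_limit=" + (max / x));
            }
            // prints 2147483646, 0, 100 and 300 respectively
        }
    }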
Putting these together, and choosing a round number which prioritizes concern (b) by moving closer to the limit of (a), if I had one more run to do I?d choose -XX: StressLongCountedLoop=20000000. If I were to do multiple runs, I might choose vary that stress parameter by adding and subtracting a couple of zeroes: -XX: StressLongCountedLoop=200000 -XX: StressLongCountedLoop=2000000 -XX: StressLongCountedLoop=20000000 -XX: StressLongCountedLoop=200000000 -XX: StressLongCountedLoop=2000000000 If any of those runs kicks out a bug or other suspicious behavior, it should be added to a permanent test list. Separately from those issues, we know that the stress mode converts 32-bit loops into 64-bit loops, which then re-nest using the new logic. But, are we confident that this re-nesting works? Roland did some manual testing to make sure the test works as intended, but it would be good to run the above stress tests with some sort of logging that ensures that there are at least ?lots and lots? of successful 32-to-64 loop conversions. If those loop conversions fail (staying at 64 bits) the tests will pass, but they won?t be testing what we need to be testing. HTH ? John > Best regards, > Tobias > > On 20.08.20 17:34, Roland Westrelin wrote: >> >>> Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good. >> >> Thanks for the review and testing! >> >> Roland. >> From yueshi.zwj at alibaba-inc.com Tue Aug 25 06:03:10 2020 From: yueshi.zwj at alibaba-inc.com (Joshua Zhu) Date: Tue, 25 Aug 2020 14:03:10 +0800 Subject: RFR: 8252259: AArch64: Adjust default value of FLOATPRESSURE Message-ID: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> Hi, I have a small patch that will decrease the default value from 64 into 32 for aarch64's FLOATPRESSURE, which represents float LRG's number that constitutes high register pressure. With the proper value setting, in low register pressure (LRP) region, C2 can avoid unnecessary spilling and directly use register. I wrote a simple case that is able to reflect the effect of new value. http://cr.openjdk.java.net/~jzhu/8252259/Test.java For this case, with new FLOATPRESSURE value, only one iteration of iterative graph-coloring RA was required. The DefinitionSpillCopyNode was generated directly when crossing HRP boundary in Split phase [1]. And only one MemToRegSpillCopyNode in HRP region was generated at USE site. The dump of Split cycles and OptoAssembly is: http://cr.openjdk.java.net/~jzhu/8252259/frp_32.log For the same case, with current FLOATPRESSURE, the whole method was identified as LRP region. In the first iteration of graph-coloring, LRG was identified as spilled. In the second iteration, DefinitionSpillCopyNode was generated [2] and there were three MemToRegSpillCopy nodes were produced at each USE site. See dump: http://cr.openjdk.java.net/~jzhu/8252259/frp_64.log with the old FLOATPRSSURE. Therefore I propose the default value of FLOATPRESSURE be 32 because there are 32 float/SIMD registers on aarch64 and also the value of register pressure is the same as 1 for each LRG of Op_RegL/Op_RegD/Op_Vec. [3] Could you please help review this change? 
JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ [1] https://hg.openjdk.java.net/jdk/jdk/file/332b3a2eb4cc/src/hotspot/share/opto /reg_split.cpp#l855 [2] https://hg.openjdk.java.net/jdk/jdk/file/332b3a2eb4cc/src/hotspot/share/opto /reg_split.cpp#l1198 [3] https://hg.openjdk.java.net/jdk/jdk/file/332b3a2eb4cc/src/hotspot/share/opto /chaitin.cpp#l926 Best Regards, Joshua From boris.ulasevich at bell-sw.com Tue Aug 25 06:40:57 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 09:40:57 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <977c8d9b-a9a0-4412-8d1b-0ca6bb5db558@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <977c8d9b-a9a0-4412-8d1b-0ca6bb5db558@redhat.com> Message-ID: Hi Andrew, > This looks rather nice. Thank you! > How did you test it? I have run JCK and JTREG tests on arm and intel platforms (the transformation works in many places: StringUTF16, BigInteger, ZipUtils, etc). I checked that benchmark [1] shows positive results on both single call and vectorized case (adding the benchmarking code in a simple cycle). I checked with +PrintAssembly that expressions are generated as expected: ((v1 & 0xFF) << 24) | ((v2 & 0xFF) << 16) | ((v3 & 0xFF) << 8) | (v4 & 0xFF) I ran the generated brute force tests [2] that checks all possible mask/shift combinations for int/long types: (value1 & mask1) | ((value1 & mask2) << shift) thanks, Boris [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01/Benchmark.java [2] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00/Gen.java On 24.08.2020 20:31, Andrew Haley wrote: > On 23/08/2020 19:20, Boris Ulasevich wrote: >> With the current change all the transformation logic is moved out of >> aarch64.ad file into the common C2 code. >> >> http://bugs.openjdk.java.net/browse/JDK-8249893 >> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 >> >> The change in compiler.cpp was done to implicitly ask IGVN to run >> the idealization once again after the loop optimization phase. >> This extra step is necessary to make the BFI transform happen >> only after loop optimization. > > This looks rather nice. How did you test it? > From shade at redhat.com Tue Aug 25 07:08:00 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Aug 2020 09:08:00 +0200 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag Message-ID: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8252215 VerifyOptoOopOffsets flag does not seem to be used (no tests in the current test base), and it does not seem to work reliably (see JDK-4834891). It might be a good time to remove it. JDK-4834891 evaluation says: "The flag VerifyOptoOopOffsets has not been valid since the introduction of sun/misc/Unsafe and the flag should not be used for general testing." How about we remove it? https://cr.openjdk.java.net/~shade/8252215/webrev.01/ Testing: tier1 (locally); jdk-submit (still running?) 
-- Thanks, -Aleksey From christian.hagedorn at oracle.com Tue Aug 25 07:25:51 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 25 Aug 2020 09:25:51 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <87d03k10ss.fsf@redhat.com> References: <87d03k10ss.fsf@redhat.com> Message-ID: <2334c98c-48da-2fc1-a98d-9e9b983c7500@oracle.com> Hi Roland On 21.08.20 17:22, Roland Westrelin wrote: > > Hi Christian, > >> We have two options to fix this. We could either update the wrong >> control inputs from 876 IfFalse during the creation/merging of >> pre/main/post loops or directly fix it inside >> split_if_with_blocks_post(). I think it is makes more sense and is also >> easier to directly fix it in split_if_with_blocks_post() where we could >> be less pessimistic when pinning loads. >> >> The fix now checks if late_load_ctrl is a loop exit of a loop that has >> an outer strip mined loop and if it dominates x_ctrl. If that is the >> case, we use the outer loop exit control instead. This also means that >> the loads can completely float out of the outer strip mined loop. >> Applying that to the testcase, we get [3] instead of [2]. LoadS 901 and >> 902 are both at the outer strip mined loop exit while 903 LoadS is still >> at the inner loop due to 575 StoreI (x_ctrl is 876 IfFalse and dominates >> the outer strip mined loop exit). The process of creating pre/main/post >> loops will then take care of these control inputs of the LoadSNodes and >> rewires them to the newly created RegionNode such that the dominator >> information is correct again. > > I agree that fixing it in split_if_with_blocks_post() is the right thing > to do. > > The load has no edges to the safepoint in the outer strip mined loop so > why is it in the loop in the first place then? If java code has a load > in a loop that's live outside the loop then it should be live at the > safepoint on loop exit. Is anti dependence analysis too conservative? I maybe should have shared another image of the graph before the LoadS clones 901-903 are created. The original 572 LoadS (see [1]) is an input into 575 StoreI which is an input of 578 MergeMem which goes into the 881 SafePoint in the outer strip mined loop. The other two uses (897 Phi and 893 Phi) are uses outside of the outer strip mined loop. > Also why does get_late_ctrl(n, n_ctrl) return a control inside the outer > strip mined loop? And why is it safe to bypass that result? Due to 575 StoreI being needed inside the outer strip mined loop, get_late_ctrl() of 572 LoadS also returns the inner loop exit 876 IfFalse. My thinking was that since we now clone 572 LoadS and create a new LoadS for each use, then we don't need to pin the LoadS going into Phi 893 and 897 to 876 IfFalse, too, if x_ctrl is outside the outer strip mined loop but to the outer strip mined loop exit. But now thinking about it, do we need another get_late_ctrl(x, late_load_ctrl) for each clone and check if they can really be put outside of the strip mined loop instead of just checking dominance with x_ctrl (which is based on get_ctrl(u) of a use of the load)? In get_late_ctrl() we do consider anti dependencies. 
Maybe something like this (change on L1473): http://cr.openjdk.java.net/~chagedorn/8249607/webrev.01/ Best regards, Christian [1] https://bugs.openjdk.java.net/secure/attachment/89947/before_cloning_LoadS.png From shade at redhat.com Tue Aug 25 07:29:16 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Aug 2020 09:29:16 +0200 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator Message-ID: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Small cleanup: https://bugs.openjdk.java.net/browse/JDK-8252290 Static code inspection complains the enum below is unused. diff -r 13fdf97f0a8f src/hotspot/share/opto/callGenerator.hpp --- a/src/hotspot/share/opto/callGenerator.hpp Mon Aug 24 09:35:23 2020 +0200 +++ b/src/hotspot/share/opto/callGenerator.hpp Tue Aug 25 09:27:45 2020 +0200 @@ -37,9 +37,4 @@ class CallGenerator : public ResourceObj { - public: - enum { - xxxunusedxxx - }; - private: ciMethod* _method; // The method being called. Testing: grepping for "xxxunusedxxx", local builds -- Thanks, -Aleksey From shade at redhat.com Tue Aug 25 07:34:40 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Aug 2020 09:34:40 +0200 Subject: RFR (XS) 8252291: C2: Assignment in conditional in loopUnswitch.cpp Message-ID: Cleanup: https://bugs.openjdk.java.net/browse/JDK-8252291 Static code analysis complains there is the assignment in the conditional here. I believe the assignment should be explicit here. Code was introduced with JDK-8136725. diff -r 31de2a59348a src/hotspot/share/opto/loopUnswitch.cpp --- a/src/hotspot/share/opto/loopUnswitch.cpp Tue Aug 25 09:27:04 2020 +0200 +++ b/src/hotspot/share/opto/loopUnswitch.cpp Tue Aug 25 09:29:23 2020 +0200 @@ -442,7 +442,8 @@ if (iff->in(1)->Opcode() != Op_ConI) { return false; } - return _has_reserved = true; + _has_reserved = true; + return true; } Testing: local builds -- Thanks, -Aleksey From aph at redhat.com Tue Aug 25 08:10:19 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 09:10:19 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> Message-ID: <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> Hi, On 23/08/2020 19:20, Boris Ulasevich wrote: > > Please review the updated change to C2 and AArch64 which introduces > a new BitfieldInsert node to replace Or+Shift+And sequence when possible. > Single BFI instruction is emitted for the new node. > > With the current change all the transformation logic is moved out of > aarch64.ad file into the common C2 code. > > http://bugs.openjdk.java.net/browse/JDK-8249893 > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 > > The change in compiler.cpp was done to implicitly ask IGVN to run > the idealization once again after the loop optimization phase. > This extra step is necessary to make the BFI transform happen > only after loop optimization. So here's a strange thing. 
When I run a simple JMH test @State(Scope.Benchmark) public static class Result { public int a, b; public long x; } @Benchmark public static int bfm(Result r) { return (r.a & 0xFF) | ((r.b & 0xFF) << 8); } I get 0x0000ffff84644df0: ubfiz w12, w11, #8, #8 0x0000ffff84644df4: and w10, w10, #0xff 0x0000ffff84644df8: orr w2, w10, w12 ;*ior {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.Rotates::bfm at 19 (line 22) ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) instead of 0x0000ffff808554b4: and w10, w10, #0xff 0x0000ffff808554b8: and w12, w12, #0xff 0x0000ffff808554bc: orr w2, w12, w10, lsl #8 ;*ior ; - org.openjdk.Rotates::bfm at 19 (line 22) ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) Do you have any ideas why this might be? Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Tue Aug 25 08:23:19 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 10:23:19 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed Message-ID: <87wo1n6snc.fsf@redhat.com> https://bugs.openjdk.java.net/browse/JDK-8252292 http://cr.openjdk.java.net/~roland/8252292/webrev.00/ In 8240795, I modified alias analysis so non escaping allocations don't alias with bottom memory. While browsing that code last week, I noticed that that change didn't seem quite right and may cause some anti-dependences to be missed. I could indeed write a test case that fails with an incorrect execution. In the test case: the dst[9] load after the ArrayCopy is transformed into a src[9] load before the ArrayCopy. Anti dependence analysis find src[9] shares the memory of the ArrayCopy but because of the way I tweaked the code with 8240795, anti-dependence analysis finds the src[9] and ArrayCopy don't alias so src[9] can sink out of the loop which is wrong because of the src[9] store. Anti-dependence analysis in that case would need to look at the memory uses of ArrayCopy too. Roland. From rwestrel at redhat.com Tue Aug 25 08:34:00 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 10:34:00 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) Message-ID: <87tuwr6s5j.fsf@redhat.com> https://bugs.openjdk.java.net/browse/JDK-8241486 http://cr.openjdk.java.net/~roland/8241486/webrev.00/ Setting LoopStripMiningIter on the command line for a GC that has loop strip mining implicitly enabled causes a warning to be printed and loop strip mining to be turned off. As suggested in the bug report, this change moves the validation of loop strip mining options to "AfterErgo". Roland. 
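Going back to Roland's 8252292 mail above: the failure mode is easier to picture in source form. The following is a rough, hypothetical sketch of the shape he describes (the actual reproducer is in the webrev and certainly differs in detail); it is only meant to show where the missed anti-dependence comes from:

    // Hypothetical sketch of the 8252292 scenario described above (not the real test).
    public class ArrayCopyAntiDepSketch {
        static int test(int[] src) {
            int[] dst = new int[src.length];   // non-escaping allocation
            int v = 0;
            for (int i = 0; i < 1000; i++) {
                System.arraycopy(src, 0, dst, 0, src.length);
                v = dst[9];   // may be rewritten into a src[9] load placed before the arraycopy
                src[9] = i;   // the anti-dependence that must keep that load inside the loop
            }
            return v;
        }
        public static void main(String[] args) {
            // in this sketch the correct answer is 998; letting the src[9] load
            // sink out of the loop past the store would make it observe 999
            System.out.println(test(new int[100]));
        }
    }
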
From boris.ulasevich at bell-sw.com Tue Aug 25 08:57:14 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 11:57:14 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> Message-ID: <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com>

Hi,

On 25.08.2020 11:10, Andrew Haley wrote:
> Hi,
>
> On 23/08/2020 19:20, Boris Ulasevich wrote:
> >
> > Please review the updated change to C2 and AArch64 which introduces
> > a new BitfieldInsert node to replace Or+Shift+And sequence when possible.
> > Single BFI instruction is emitted for the new node.
> >
> > With the current change all the transformation logic is moved out of
> > aarch64.ad file into the common C2 code.
> >
> > http://bugs.openjdk.java.net/browse/JDK-8249893
> > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01
> >
> > The change in compiler.cpp was done to implicitly ask IGVN to run
> > the idealization once again after the loop optimization phase.
> > This extra step is necessary to make the BFI transform happen
> > only after loop optimization.
>
> So here's a strange thing. When I run a simple JMH test
>
>     @State(Scope.Benchmark)
>     public static class Result {
>         public int a, b;
>         public long x;
>     }
>
>     @Benchmark
>     public static int bfm(Result r) {
>         return (r.a & 0xFF) | ((r.b & 0xFF) << 8);
>     }
>
> I get
>
>   0x0000ffff84644df0:   ubfiz   w12, w11, #8, #8
>   0x0000ffff84644df4:   and     w10, w10, #0xff
>   0x0000ffff84644df8:   orr     w2, w10, w12            ;*ior {reexecute=0 rethrow=0 return_oop=0}
>                                                         ; - org.openjdk.Rotates::bfm at 19 (line 22)
>                                                         ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199)
>
> instead of
>
>   0x0000ffff808554b4: and   w10, w10, #0xff
>   0x0000ffff808554b8: and   w12, w12, #0xff
>   0x0000ffff808554bc: orr   w2, w12, w10, lsl #8  ;*ior
>                                                   ; - org.openjdk.Rotates::bfm at 19 (line 22)
>                                                   ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199)
>
> Do you have any ideas why this might be? Thanks.
>

Both variants are correct, isn't it?

I think matcher preferred UBFIZ to OR rule because ins_cost was set to 1.9 for OR:
https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l12130
https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l11675

With my change it would work like this:

0x0000ffff7c587fe0:   and   w2, w10, #0xff
0x0000ffff7c587fe8:   bfi
x2, x12, #8, #8 From aph at redhat.com Tue Aug 25 09:17:12 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 10:17:12 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> Message-ID: <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> On 25/08/2020 09:57, Boris Ulasevich wrote: > Hi, > > On 25.08.2020 11:10, Andrew Haley wrote: >> Hi, >> >> On 23/08/2020 19:20, Boris Ulasevich wrote: >> ?> >> ?> Please review the updated change to C2 and AArch64 which introduces >> ?> a new BitfieldInsert node to replace Or+Shift+And sequence when >> possible. >> ?> Single BFI instruction is emitted for the new node. >> ?> >> ?> With the current change all the transformation logic is moved out of >> ?> aarch64.ad file into the common C2 code. >> ?> >> ?> http://bugs.openjdk.java.net/browse/JDK-8249893 >> ?> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 >> ?> >> ?> The change in compiler.cpp was done to implicitly ask IGVN to run >> ?> the idealization once again after the loop optimization phase. >> ?> This extra step is necessary to make the BFI transform happen >> ?> only after loop optimization. >> >> So here's a strange thing. When I run a simple JMH test >> >> ???? @State(Scope.Benchmark) >> ???? public static class Result { >> ???????? public int a, b; >> ???????? public long x; >> ???? } >> >> ???? @Benchmark >> ???? public static int bfm(Result r) { >> ???????? return (r.a & 0xFF) | ((r.b & 0xFF) << 8); >> ???? } >> >> I get >> >> ?? 0x0000ffff84644df0:?? ubfiz??? w12, w11, #8, #8 >> ?? 0x0000ffff84644df4:?? and??? w10, w10, #0xff >> ?? 0x0000ffff84644df8:?? orr??? w2, w10, w12??????????????? ;*ior >> {reexecute=0 rethrow=0 return_oop=0} >> ???????????????????????????????????????????????????????????? ; - >> org.openjdk.Rotates::bfm at 19 (line 22) >> ???????????????????????????????????????????????????????????? ; - >> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) >> >> instead of >> >> ?? 0x0000ffff808554b4: and??? w10, w10, #0xff >> ?? 0x0000ffff808554b8: and??? w12, w12, #0xff >> ?? 0x0000ffff808554bc: orr??? w2, w12, w10, lsl #8? ;*ior >> ???????????????????????????????????????????????? ; - >> org.openjdk.Rotates::bfm at 19 (line 22) >> ???????????????????????????????????????????????? ; - >> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line 199) >> >> Do you have any ideas why this might be? Thanks. >> > > Both variants are correct, isn't it? Well, yes. But I thought that the idea was to generate fewer instructions. > I think matcher preferred UBFIZto OR rule becauseins_costwas set to 1.9 > for OR: > https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l12130 > https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l11675 > > With my change it would work like this: > > 0x0000ffff7c587fe0:?? and??? w2, w10, #0xff > 0x0000ffff7c587fe8:?? bfi??? x2, x12, #8, #8 But it didn't. I'm asking you why that is. The first code I showed you was the JMH test in http://cr.openjdk.java.net/~aph/scratch/. This was after I applied your patch. 
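For readers following along: a "bitfield insert" takes the low-order bits of one value and deposits them at a given offset in another, leaving the remaining bits untouched; the masked-or-shift expression in the benchmark above is one spelling of that operation. A small, patch-independent Java illustration of the equivalence (purely for orientation, not code from the webrev):

    // General shape of a bitfield insert: clear <width> bits of dst starting at
    // <lsb>, then copy the low <width> bits of src into that field.
    static int bitfieldInsert(int dst, int src, int lsb, int width) {
        int mask = (width == 32) ? -1 : (1 << width) - 1;
        return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
    }
    // For the kernel above, where the target bits of the first operand are
    // already zero: bitfieldInsert(a & 0xFF, b, 8, 8) == (a & 0xFF) | ((b & 0xFF) << 8)
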
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From richard.reingruber at sap.com Tue Aug 25 09:28:32 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 25 Aug 2020 09:28:32 +0000 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: Hi Aleksey, the cleanup looks good to me. That enum was already part of the initial load with xxxunusedxxx as the only element [1]. So there's no open version history. I could not find any references either (rtags, grep). Probably the enum had more elements originally which were removed. Thanks, Richard. [1] https://github.com/openjdk/jdk/blame/d4626d89cc778b8b7108036f389548c95d52e56a/src/hotspot/share/opto/callGenerator.hpp#L41 -----Original Message----- From: hotspot-compiler-dev On Behalf Of Aleksey Shipilev Sent: Dienstag, 25. August 2020 09:29 To: hotspot compiler Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator Small cleanup: https://bugs.openjdk.java.net/browse/JDK-8252290 Static code inspection complains the enum below is unused. diff -r 13fdf97f0a8f src/hotspot/share/opto/callGenerator.hpp --- a/src/hotspot/share/opto/callGenerator.hpp Mon Aug 24 09:35:23 2020 +0200 +++ b/src/hotspot/share/opto/callGenerator.hpp Tue Aug 25 09:27:45 2020 +0200 @@ -37,9 +37,4 @@ class CallGenerator : public ResourceObj { - public: - enum { - xxxunusedxxx - }; - private: ciMethod* _method; // The method being called. Testing: grepping for "xxxunusedxxx", local builds -- Thanks, -Aleksey From boris.ulasevich at bell-sw.com Tue Aug 25 09:47:11 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 12:47:11 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> Message-ID: <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> On 25.08.2020 12:17, Andrew Haley wrote: > On 25/08/2020 09:57, Boris Ulasevich wrote: >> Hi, >> >> On 25.08.2020 11:10, Andrew Haley wrote: >>> Hi, >>> >>> On 23/08/2020 19:20, Boris Ulasevich wrote: >>> ??> >>> ??> Please review the updated change to C2 and AArch64 which introduces >>> ??> a new BitfieldInsert node to replace Or+Shift+And sequence when >>> possible. >>> ??> Single BFI instruction is emitted for the new node. >>> ??> >>> ??> With the current change all the transformation logic is moved >>> out of >>> ??> aarch64.ad file into the common C2 code. >>> ??> >>> ??> http://bugs.openjdk.java.net/browse/JDK-8249893 >>> ??> http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01 >>> ??> >>> ??> The change in compiler.cpp was done to implicitly ask IGVN to run >>> ??> the idealization once again after the loop optimization phase. >>> ??> This extra step is necessary to make the BFI transform happen >>> ??> only after loop optimization. >>> >>> So here's a strange thing. When I run a simple JMH test >>> >>> ????? @State(Scope.Benchmark) >>> ????? public static class Result { >>> ????????? 
public int a, b; >>> ????????? public long x; >>> ????? } >>> >>> ????? @Benchmark >>> ????? public static int bfm(Result r) { >>> ????????? return (r.a & 0xFF) | ((r.b & 0xFF) << 8); >>> ????? } >>> >>> I get >>> >>> ??? 0x0000ffff84644df0:?? ubfiz??? w12, w11, #8, #8 >>> ??? 0x0000ffff84644df4:?? and??? w10, w10, #0xff >>> ??? 0x0000ffff84644df8:?? orr??? w2, w10, w12 ;*ior >>> {reexecute=0 rethrow=0 return_oop=0} >>> ; - >>> org.openjdk.Rotates::bfm at 19 (line 22) >>> ; - >>> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line >>> 199) >>> >>> instead of >>> >>> ??? 0x0000ffff808554b4: and??? w10, w10, #0xff >>> ??? 0x0000ffff808554b8: and??? w12, w12, #0xff >>> ??? 0x0000ffff808554bc: orr??? w2, w12, w10, lsl #8? ;*ior >>> ????????????????????????????????????????????????? ; - >>> org.openjdk.Rotates::bfm at 19 (line 22) >>> ????????????????????????????????????????????????? ; - >>> org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub at 20 (line >>> 199) >>> >>> Do you have any ideas why this might be? Thanks. >>> >> >> Both variants are correct, isn't it? > > Well, yes. But I thought that the idea was to generate fewer > instructions. > >> I think matcher preferred UBFIZto OR rule becauseins_costwas set to 1.9 >> for OR: >> https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l12130 >> >> https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l11675 >> >> >> With my change it would work like this: >> >> 0x0000ffff7c587fe0:?? and??? w2, w10, #0xff >> 0x0000ffff7c587fe8:?? bfi??? x2, x12, #8, #8 > > But it didn't. I'm asking you why that is. The first code I showed you > was the JMH test > in http://cr.openjdk.java.net/~aph/scratch/. This was after I applied > your patch. Ok. Can you please check that my patch [1] has been applied and built correctly. With my change I see this picture: ....[Hottest Region 2]........................................... c2, level 4, org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub, ??????????? 0x0000ffff84584dac:?? add??? x11, x14, #0x94 ??????????? 0x0000ffff84584db0:?? stp??? x21, x19, [sp] ??????????? 0x0000ffff84584db4:?? stp??? x20, x14, [sp, #16] ??????????? 0x0000ffff84584db8:?? stp??? x15, x10, [sp, #32] ??????????? 0x0000ffff84584dbc:?? str??? x11, [sp, #48] ??????????? 0x0000ffff84584dc0:?? b??? 0x0000ffff84584dd8 ??????????? 0x0000ffff84584dc4:?? nop ??????????? 0x0000ffff84584dc8:?? nop ??????????? 0x0000ffff84584dcc:?? nop ? 3.64%? ?? 0x0000ffff84584dd0:?? str??? x19, [sp, #16] ? 0.07%? ?? 0x0000ffff84584dd4:?? mov??? x16, x29 ???????? ?? 0x0000ffff84584dd8:?? ldr??? w10, [x16, #12] ;*invokestatic bfm ? 3.92%? ?? 0x0000ffff84584ddc:?? ldr??? w12, [x16, #24] ? 4.69%? ?? 0x0000ffff84584de0:?? and??? w2, w10, #0xff ? 0.03%? ?? 0x0000ffff84584de4:?? mov??? x29, x16 ? 0.02%? ?? 0x0000ffff84584de8:?? bfi??? x2, x12, #8, #8???? ;*ior {reexecute=0 rethrow=0 return_oop=0} ???????? ??????????????????????????????????????????????????? 
; - org.openjdk.Rotates::bfm at 19 (line 23) [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01/jdk-jdk.patch From ningsheng.jian at arm.com Tue Aug 25 10:07:30 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Tue, 25 Aug 2020 18:07:30 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> Message-ID: Hi Vladimir, On 8/24/20 8:03 PM, Vladimir Ivanov wrote: > Hi Ningsheng, > >>> What I see in the patch is that you try to attack the problem from the >>> opposite side: you introduce new concept of a size-agnostic vector >>> register on RA side and then directly use it during matching: vecA is >>> used in aarch64_sve.ad and aarch64.ad relies on vecD/vecX. >>> >>> Unfortunately, it extends the implementation in orthogonal direction >>> which looks too aarch64-specific to benefit other architectures and x86 >>> particular. I believe there's an alternative approach which can benefit >>> both aarch64 and x86, but it requires more experimentation. >>> >> >> Since vecA and vecX (and others) are architecturally different vector >> registers, I think it's quite natural that we just introduced the new >> vector register type vecA, to represent what we need for corresponding >> hardware vector register. Please note that in vector length agnostic >> ISA, like Arm SVE and RISC-V vector extension [1], the vector >> registers are architecturally the same type of register despite the >> different hardware implementations. > > FTR vecX et al don't represent hardware registers, they represent vector > values of predefined size. (For example, vecS, vecD, and vecX map to the > very same set of 128-bit vector registers on x86.) > > My point is: in terms of existing concepts what you are adding is not > "yet another flavor of vector". It's a new full-fledged concept (which > is manifested as special cases across the JVM) and you end up with 2 > different representations of vectors. > > I agree that hardware is quite different, but I don't see it makes much > of a difference in the context of the JVM and abstractions used to hide > it are similar. > > For example, as of now, most of x86-specific code in C2 works just fine > with full-width hardware vectors which are oblivious of their sizes > until RA kicks in. And SVE patch you propose completely omits implicit > predication hardware provides which makes it similar to AVX512 (modulo > wider range of vector width sizes supported). > > So, even though hardware abstractions being used aren't actually *that* > different, vecA piles complexity and introduces a separate way to > achieve similar results (but slightly differently). And that's what > bothers me. I'd like to see more unification instead which should bring > reduction in complexity and an opportunity to address long-standing > technical debt (and 5 flavors of ideal registers for vectors is part of > it IMO). > I can understand that a total solution for different archs and vector sizes is preferable. Do you have any initial idea how to achieve that? > So far, I see 2 main directions for RA work: > > ? 
(a) support vectors of arbitrary size: > ??? (1) helps push the upper limit on the size (1024-bit) > ??? (2) handle non-power-of-2 sizes > > ? (b) optimize RA implementation for large values > > Anything else? > Yes, and it's not just vector. SVE predicate register has scalable size (vector_size/8) as well. We also have predicate register allocator support well with proposed approach (not in this patch.). > Speaking of (a), in particular, I don't see why possible solution for it > should not supersede vecX et al altogether. > > Also, I may be wrong, but I don't see a clear evidence there's a > pressing need to have all of that fixed right from the beginning. > (That's why I put #1 and #2 options on the table.) Starting with #1/#2 > would untie initial SVE support from the exploratory work needed to > choose the most appropriate solution for (a) and (b). > Staring from partial SVE register support might be acceptable for initial patch (Andrew may not agree :-)), but I think we may end up with more follow-up work, given that our proposed approach already supports SVE well in terms of (a) and (b). If there's no other solution, would it be possible to use current proposed method? It's not difficult to backout our changes in register allocation part, if we find other better solution to support arbitrary vector/predicate sizes in future, as the patch there is actually not big IMO. >>> If I were to start from scratch, I would choose between 3 options: >>> >>> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit >>> supported >>> vector sizes to 128-/256-/512-bit values. >>> >>> ??? #2: lift limitation on max size (to 1024/2048 bits), but ignore >>> non-power-of-2 sizes; >>> >>> ??? #3: introduce support for full range of vector register sizes >>> (128-/.../2048-bit with 128-bit step); >>> >>> I see 2 (mostly unrelated) limitations: maximum vector size and >>> non-power-of-2 sizes. >>> >>> My understanding is that you don't try to accurately represent SVE for >>> now, but lay some foundations for future work: you give up on >>> non-power-of-2 sized vectors, but still enable support for arbitrarily >>> sized vectors (addressing both limitations on maximum size and size >>> granularity) in RA (and it affects only spills). So, it is somewhere >>> between #2 and #3. >>> >>> The ultimate goal is definitely #3, but how much more work will be >>> required to teach the JVM about non-power-of-2 vectors? As I see in the >>> patch, you don't have auto-vectorizer support yet, but Vector API will >>> provide access to whatever size hardware exposes. What do you expect on >>> hardware front in the near/mid-term future? Anything supporting vectors >>> larger than 512-bit? What about 384-bit vectors? >>> >> >> I think our patch is now in 3. :-) We do not give up non-power-of-2 >> sized vectors, instead we are supporting them well in this patch. And >> are still using current regmask framework. (Actually, I think the only >> limitation to the vector size is that it should be multiple of 32-bits >> - bits per 1 reg slot.) > >> I am not sure about other Arm partners' hardware implementations in >> the mid-term future, as it's free for cpu implementer to choose any >> max vector sizes as long as it follows SVE architecture specification. >> But we did tested the patch with Vector API on different SVE supported >> vector sizes on emulator, e.g. 384, 768, 1024, 2048 etc. The register >> allocator including the spill/unspill works well on those different >> sizes with Vector API. 
(Thanks to your great work on Vector API. :-)) >> >> We currently limit the vector size to power-of-2 in >> vm_version_aarch64.cpp, as suggested by Andrew Dinn, is because >> current SLP vectorizer only supports power-of-2 vectors. With Vector >> API in, I think such restriction can be removed. And we are also >> working on a new vectorizer to support predication/mask, which should >> not have power-of-2 limitation. > > [...] > >> Yes, we can make JVM to support portion of vectors, at least for SVE. >> My concern is that the performance wouldn't be as good as the full >> available vector width. > > To be clear: I called it "somewhere between #2 and #3" solely because > auto-vectorizer bails out on non-power-of-2 sizes. And even though > Vector API will work with such cases just fine, IMO having > auto-vectorizer support is required before calling #3 complete. > > In that respect, choosing smaller vector size auto-vectorizer supports > is preferrable to picking up the full-width vectors and turning off > auto-vectorizer (even though Vector API will support them). > > It can be turned into heuristic (by default, pick only power-of-2 sizes; > let users explicitly specify non-power-of-2 sizes), but speaking of > priorities, IMO auto-vectorizer support is more important. > I agree that auto-vectorizer support is more important, and we are working on that. >>> Giving up on #3 for now and starting with less ambitious goals (#1 or >>> #2) would reduce pressure on RA and give more time for additional >>> experiments to come with a better and more universal >>> support/representation of generic/size-agnostic vectors. And, in a >>> longer term, help reducing complexity and technical debt in the area. >>> >>> Some more comments follow inline. >>> >>>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>>> 4x larger in the worst case (ignoring predicate registers for now). >>>>> Here are the relevant constants on x86: >>>>> >>>>> gensrc/adfiles/adGlobals_x86.hpp: >>>>> >>>>> // the number of reserved registers + machine registers. >>>>> #define REG_COUNT??? 545 >>>>> ... >>>>> // Size of register-mask in ints >>>>> #define RM_SIZE 22 >>>>> >>>>> My estimate is that for AArch64 with SVE support the constants will >>>>> be: >>>>> >>>>> ??? REG_COUNT < 2500 >>>>> ??? RM_SIZE < 100 >>>>> >>>>> which don't look too bad. >>>>> >>>> >>>> Right, but given that most real hardware implementations will be no >>>> larger than 512 bits, I think. Having a large bitmask array, with most >>>> bits useless, will be less efficient for regmask computation. >>> >>> Does it make sense to limit the maximum supported size to 512-bit then >>> (at least, initially)? In that case, the overhead won't be worse it is >>> on x86 now. >>> >> >> Technically, this may be possible though I haven't tried. My concerns >> are: >> >> 1) A larger regmask arrays would be less efficient (we only use 256 >> bits - 8 slots for SVE in this patch), though won't be worse than x86. >> >> 2) Given that current patch already supports larger sizes and >> non-power-of-2 sizes well with relative small size in diff, if we want >> to support other sizes soon, there may be some more work to roll-back >> ad file changes. >> >>>>> Also, I don't see any changes related to stack management. So, I >>>>> assume it continues to be managed in slots. Any problems there? As I >>>>> understand, wide SVE registers are caller-save, so there may be many >>>>> spills of huge vectors around a call. 
(Probably, not possible with C2 >>>>> auto-vectorizer as it is now, but Vector API will expose it.) >>>>> >>>> >>>> Yes, the stack is still managed in slots, but it will be allocated with >>>> real vector register length instead of 'virtual' slots for VecA. See >>>> the >>>> usages of scalable_reg_slots(), e.g. in chaitin.cpp:1587. We have also >>>> applied the patch to vector api, and did find a lot of vector spills >>>> with expected correct results. >>> >>> I'm curious whether similar problems may arise for spills. Considering >>> wide vector registers are caller-saved, it's possible to have lots of >>> 256-byte values to end up on stack (especially, with Vector API). Any >>> concerns with that? >>> >> >> No, we don't need to have such big (256-byte) slots for a smaller >> vector register. The spill slots are the same size as of real vector >> length, e.g. 48 bytes for 384-bit vector. Even for alignment, we >> currently choose SlotsPerVecA (8 slots for 32 bytes, 256 bits) for >> alignment (skipped slots can still be allocated to other args), which >> is still smaller than AVX512 (64 bytes, 512 bits). We can tweak the >> patch to choose other smaller value, if we think the alignment is too >> large. (Yes, we should always try to avoid spills for wide vectors, >> especially with Vector API, to avoid performance pitfalls.) > > Thanks for the clarifications. > > Any new problems/hitting some limitations envisioned when spilling large > number of huge vectors (2048-bit) on stack? > I haven't seen any so far. > Best regards, > Vladimir Ivanov > >>>>> Have you noticed any performance problems? If that's the case, then >>>>> AVX512 support on x86 would benefit from similar optimization as well. >>>>> >>>> >>>> Do you mean register allocation performance problems? I did not notice >>>> that before. Do you have any suggestion on how to measure that? >>> >>> I'd try to run some applications/benchmarks with -XX:+CITime to get a >>> sense how much RA may be affected. >>> >> >> Thanks! I will give a try. >> >> [1] >> https://urldefense.com/v3/__https://github.com/riscv/riscv-v-spec/releases/tag/0.9__;!!GqivPVa7Brio!IwFEx-c_8JDZcWgXPLcWp2ypX3pr1-IWTBfC7O7PHo7_0skMWtQa4fyWpo-lVor0NFv4Ivo$ >> Thanks, Ningsheng From ningsheng.jian at arm.com Tue Aug 25 10:13:14 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Tue, 25 Aug 2020 18:13:14 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <50271ba1-cc78-a325-aed5-2fc468084515@arm.com> <66a9812d-256d-d8ef-d435-3a18daa6bb1e@redhat.com> <5397e0d1-9d40-0107-c164-304740bc5d7f@arm.com> Message-ID: Hi Erik, On 8/24/20 11:26 PM, Erik ?sterlund wrote: > Hi Ningsheng, > > On 2020-08-24 11:59, Ningsheng Jian wrote: >> Hi Erik, >> >> Thanks for the review! >> >> On 8/22/20 12:21 AM, Erik ?sterlund wrote: >>> Hi, >>> >>> Have you tried this with ZGC on AArch64? It has custom code for saving >>> live registers in the load barrier slow path. >>> I can't see any code changes there, so assuming this will just crash >>> instead. >>> The relevant code is in ZBarrierSetAssembler on aarch64. >>> >>> Maybe I missed something? >>> >> >> I didn't add ZGC option while running tests. I think I need to update >> push_fp() which is called by ZSaveLiveRegisters. 
But do we need to get >> size info (float/neon/sve) instead of saving the whole vector >> register? Currently, it just simply saves the whole NEON register. > > What we found on x86_64 was that there was a significant cost in saving > vector registers in load barriers. That is why we perform some analysis > so that only the exact registers that are affected, and only the parts > of the registers that are affected, get spilled. It actually mattered. > It will of course work either way, but that was our observation on > x86_64. But I am okay with that being deferred to a separate RFE. I just > wanted to make sure that it at the very least works with the new code, > for a start, so it doesn't start crashing. > OK, I will make it to save the whole reg in this patch and have a separate RFE to optimize as what x86 does. >> And in ZBarrierSetAssembler::load_at(), before calling to runtime >> code, we call push_call_clobbered_registers_except(), which just saves >> floating point registers instead of the whole NEON vector registers. >> Similar behavior in x86 implementation. Is that correct (not saving >> vectors)? > > Yes. The call contexts are: > 1) Interpreter. Does not use vector registers. > 2) Method handle intrinsic. Uses only floats that are part of the Java > calling convention, rest is garbage. No vectors here. > 3) Checkcast arraycopy. Does not use vectors. > Thanks for sharing this. Thanks, Ningsheng From aph at redhat.com Tue Aug 25 11:52:52 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 12:52:52 +0100 Subject: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of FLOATPRESSURE In-Reply-To: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> References: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> Message-ID: On 25/08/2020 07:03, Joshua Zhu wrote: > Therefore I propose the default value of FLOATPRESSURE be 32 because > there are 32 float/SIMD registers on aarch64 and also the value of register > pressure is the same as 1 for each LRG of Op_RegL/Op_RegD/Op_Vec. [3] > > Could you please help review this change? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 > Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ Yes, thanks. I can't remember why FLOATPRESSURE is 64, but it certainly looks like 32 is a much more sensible value. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Tue Aug 25 12:12:38 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 25 Aug 2020 15:12:38 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> Message-ID: <5b452edb-2851-f35a-ac30-523d74d95851@oracle.com> > I can understand that a total solution for different archs and vector > sizes is preferable. Do you have any initial idea how to achieve that? 
I have only ideas right now (unfortunately) :-) So far, my observations from working on refactoring vector support on x86 with Intel folks are the following: (1) full-width register representation is good enough; Though on x86 all vector registers are accurately modeled (register masks properly track sizes and aliasing), it turns out that what matters in practice is aliasing. So, it's enough to use a single "virtual" slot to model XMM, YMM, and ZMM registers all at once unless RA supports packing multiple smaller vector values into a single register (separately managing lower and upper parts of the register; e.g., YMM = XMM(hi):XMM(lo) ). Though currently RA does support it, there are no code which utilizes that and no plans to do that in the future. I believe the situation on AArch64 with NEON and SVE is similar. (And scalable vectors make it harder to support packing in RA.) (2) vector width matters only for spills/refills and reg2reg moves. Matcher does type capturing, so all vector mach nodes keep precise type of the value they produce. On x86 it is heavily used later in code emission phase, but RA still relies on ideal registers (Op_VecX et al). I don't see why RA can't be migrated from ideal registers to types (TypeVect) to determine vector size when performing spilling. From aforementioned observations, I conclude there should be a way to declare a single ideal vector register (Op_Vec) which represents full-width vector supported by the hardware and use captured vector types (TypeVect instances) to guide RA and code generation. And that's the state where I'd like to see vector support in C2 be moving to. Regarding predicate registers, I haven't thought too much about them, so I don't have a strong opinion about whether they should be a separate entity (Op_RegVMask in your patch) or just treated as a vector of bits (Op_Vec). >> So far, I see 2 main directions for RA work: >> >> ?? (a) support vectors of arbitrary size: >> ???? (1) helps push the upper limit on the size (1024-bit) >> ???? (2) handle non-power-of-2 sizes >> >> ?? (b) optimize RA implementation for large values >> >> Anything else? >> > > Yes, and it's not just vector. SVE predicate register has scalable size > (vector_size/8) as well. We also have predicate register allocator > support well with proposed approach (not in this patch.). Though with AVX512 support predicate register support was left aside, I agree that predicate registers should be taken into account from the very beginning. (And glad to hear you are already working on supporting them!) Also, I believe options #1/#2 may be extended to cover predicate registers as well without too much effort. >> Speaking of (a), in particular, I don't see why possible solution for >> it should not supersede vecX et al altogether. >> >> Also, I may be wrong, but I don't see a clear evidence there's a >> pressing need to have all of that fixed right from the beginning. >> (That's why I put #1 and #2 options on the table.) Starting with #1/#2 >> would untie initial SVE support from the exploratory work needed to >> choose the most appropriate solution for (a) and (b). >> > > Staring from partial SVE register support might be acceptable for > initial patch (Andrew may not agree :-)), but I think we may end up with > more follow-up work, given that our proposed approach already supports > SVE well in terms of (a) and (b). If there's no other solution, would it > be possible to use current proposed method? 
It's not difficult to > backout our changes in register allocation part, if we find other better > solution to support arbitrary vector/predicate sizes in future, as the > patch there is actually not big IMO. Unfortunately, temporary solutions usually end up as permanent ones since there's much less motivation to replace them (and harder to justify the effort) after initial pressure is relieved. I'm OK with the proposed patch if we agree it's a stop-the-gap/temporary solution to the immediate problems you face with initial SVE support and are ready to commit resources into replacing it. That's why I think it's the right time to discuss general direction, work on a plan, and use it to guide the coordinated effort to improve vector support in C2. Also, considering it a stop-the-gap solution means we should strive for the simplest solution and that's another reason I put #1/#2 options on the table to consider. [...] >> Any new problems/hitting some limitations envisioned when spilling >> large number of huge vectors (2048-bit) on stack? >> > > I haven't seen any so far. Ok, good to know. I was curious whether stack representation should also move away from 32-bit slots to a more compact representation. Best regards, Vladimir Ivanov From tobias.hartmann at oracle.com Tue Aug 25 12:37:21 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:37:21 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: Hi Christian, On 19.08.20 16:06, Christian Hagedorn wrote: > http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ Looks good to me, just noticed some style issues (no new webrev required): c1_LinearScan.cpp: - Wrong indentation in lines 5445, 5509, 5681 TestTraceLinearScanLevel.java: - "... in a HelloWorld program". It's not a HelloWorld program, right? ;) Best regards, Tobias From tobias.hartmann at oracle.com Tue Aug 25 12:43:10 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:43:10 +0200 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag In-Reply-To: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> References: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> Message-ID: Hi Aleksey, looks good to me. Best regards, Tobias On 25.08.20 09:08, Aleksey Shipilev wrote: > RFE: > ? https://bugs.openjdk.java.net/browse/JDK-8252215 > > VerifyOptoOopOffsets flag does not seem to be used (no tests in the current test base), and it does > not seem to work reliably (see JDK-4834891). It might be a good time to remove it. JDK-4834891 > evaluation says: "The flag VerifyOptoOopOffsets has not been valid since the introduction of > sun/misc/Unsafe and the flag should not be used for general testing." > > How about we remove it? > ? https://cr.openjdk.java.net/~shade/8252215/webrev.01/ > > Testing: tier1 (locally); jdk-submit (still running?) 
> From tobias.hartmann at oracle.com Tue Aug 25 12:44:17 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:44:17 +0200 Subject: RFR (XS) 8252291: C2: Assignment in conditional in loopUnswitch.cpp In-Reply-To: References: Message-ID: <811ea63c-b897-ccf0-2559-82842b52e4be@oracle.com> Hi Aleksey, looks good and trivial to me. Best regards, Tobias On 25.08.20 09:34, Aleksey Shipilev wrote: > Cleanup: > ? https://bugs.openjdk.java.net/browse/JDK-8252291 > > Static code analysis complains there is the assignment in the conditional here. I believe the > assignment should be explicit here. Code was introduced with JDK-8136725. > > diff -r 31de2a59348a src/hotspot/share/opto/loopUnswitch.cpp > --- a/src/hotspot/share/opto/loopUnswitch.cpp?? Tue Aug 25 09:27:04 2020 +0200 > +++ b/src/hotspot/share/opto/loopUnswitch.cpp?? Tue Aug 25 09:29:23 2020 +0200 > @@ -442,7 +442,8 @@ > > ?? if (iff->in(1)->Opcode() != Op_ConI) { > ???? return false; > ?? } > > -? return _has_reserved = true; > +? _has_reserved = true; > +? return true; > ?} > > Testing: local builds > From tobias.hartmann at oracle.com Tue Aug 25 12:46:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:46:11 +0200 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: +1 On 25.08.20 11:28, Reingruber, Richard wrote: > Static code inspection complains the enum below is unused. Just curious, which analyzer are you using? Best regards, Tobias From tobias.hartmann at oracle.com Tue Aug 25 12:57:22 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 14:57:22 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: <87tuwr6s5j.fsf@redhat.com> References: <87tuwr6s5j.fsf@redhat.com> Message-ID: Hi Roland, > * @requires vm.gc.G1 & vm.gc.Shenandoah & vm.gc.Z & vm.gc.Epsilon That doesn't look right. The test would never be executed. Best regards, Tobias From tobias.hartmann at oracle.com Tue Aug 25 13:18:17 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 15:18:17 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: References: <87tuwr6s5j.fsf@redhat.com> Message-ID: On 25.08.20 14:57, Tobias Hartmann wrote: >> * @requires vm.gc.G1 & vm.gc.Shenandoah & vm.gc.Z & vm.gc.Epsilon > That doesn't look right. The test would never be executed. Sorry, confused it with the vm.gc == .. check. You are just checking if the VM supports the GC. Looks good to me. 
Best regards, Tobias From vladimir.x.ivanov at oracle.com Tue Aug 25 13:18:12 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 25 Aug 2020 16:18:12 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> Message-ID: <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> Hi Andrew, I elaborated on some of the points in the thread with Ningsheng. I put my responses in-line, but will try to avoid repeating myself too much. >>> The ultimate goal was to move to vectors which represent full-width >>> hardware registers. After we were convinced that it will work well in AD >>> files, we encountered some inefficiencies with vector spills: depending >>> on actual hardware, smaller (than available) vectors may be used (e.g., >>> integer computations on AVX-capable CPU). So, we stopped half-way and >>> left post-matching part intact: depending on actual vector value width, >>> appropriate operand (vecX/vecY/vecZ + legacy variants) is chosen. >>> >>> (I believe you may be in a similar situation on AArch64 with NEON vs SVE >>> where both 128-bit and wide SVE vectors may be used at runtime.) > > Your problem here seems to be a worry about spilling more data than is > actually needed. As Ningsheng pointed out the amount of data spilled is > determined by the actual length of the VecA registers, not by the > logical size of the VecA mask (256 bits) nor by the maximum possible > size of a VecA register on future architectures (2048 bits). So, no more > stack space will be used than is needed to preserve the live bits that > need preserving. I described the experience with doing a similar exercise on x86: migrating away from [leg]vec[SDXYZ] operands to a uniform size-agnostic representation (legVec/vec). The only problem with abandoning Op_VecX et al was the need to track the size of vector values in RA. >>> Unfortunately, it extends the implementation in orthogonal direction >>> which looks too aarch64-specific to benefit other architectures and x86 >>> particular. I believe there's an alternative approach which can benefit >>> both aarch64 and x86, but it requires more experimentation. >>> >> >> Since vecA and vecX (and others) are architecturally different vector >> registers, I think it's quite natural that we just introduced the new >> vector register type vecA, to represent what we need for corresponding >> hardware vector register. Please note that in vector length agnostic >> ISA, like Arm SVE and RISC-V vector extension [1], the vector registers >> are architecturally the same type of register despite the different >> hardware implementations. > > Yes, I also see this as quite natural. Ningsheng's change extends the > implementation in the architecture-specific direction that is needed for > AArch64's vector model. The fact that this differs from x86_64 is not > unexpected. And still C2 can model them in a similar way. Moreover, recent changes on x86 I described brings x86 very close to SVE. (I elaborated on that in the previous response to Ningsheng.) 
>>> If I were to start from scratch, I would choose between 3 options: >>> >>> ??? #1: reuse existing VecX/VecY/VecZ ideal registers and limit supported >>> vector sizes to 128-/256-/512-bit values. >>> >>> ??? #2: lift limitation on max size (to 1024/2048 bits), but ignore >>> non-power-of-2 sizes; >>> >>> ??? #3: introduce support for full range of vector register sizes >>> (128-/.../2048-bit with 128-bit step); >>> >>> I see 2 (mostly unrelated) limitations: maximum vector size and >>> non-power-of-2 sizes. > > Yes, but this patch deals with both of those and I cannot see it causing > any problems for x86_64 nor do I see it adding any great complexity. The > extra shard paths deal with scalable vectors wich onlu occur on AArch64. > A scalable VecA register (and also eventually the scalable predicate > register) caters for all possible vector sizes via a single 'logical' > vector of size 8 slots (also eventually a single 'logical' predicate > register of size 1 slot). Catering for scalable registers in shared code > is localized and does not change handling of the existing, non-scalable > VecX/Y/Z registers. Code needed for vector support in C2 has been growing in size over the years and now it comprises a noticeable part of the compiler. And it got there through relatively small incremental and localized changes. I agree that the proposed solution demonstrates a very clever way to overcome some of the limitations imposed by existing implementation. But it is still a workaround which only emphasizes the architectural limitations. And it's not specific to AArch64 with SVE: x86 stretches it hard as well (though in a slightly different direction) which FTR forced recent migration to "generic vectors". So, instead of proceeding with incremental changes and accumulating complexity (and technical debt along the way), I suggest to look into reworking vector support and making it relevant to the modern hardware (both x86 and AArch64). >>> My understanding is that you don't try to accurately represent SVE for >>> now, but lay some foundations for future work: you give up on >>> non-power-of-2 sized vectors, but still enable support for arbitrarily >>> sized vectors (addressing both limitations on maximum size and size >>> granularity) in RA (and it affects only spills). So, it is somewhere >>> between #2 and #3. > > I have to disagree with your statement that this proposal doesn't > 'accurately' represent SVE. Yes, the vector mask for this arbitrary-size > vector is modelled 'logically' using a nominal 8 slots. However, that is > merely to avoid wasting bits in the bit masks plus cpu time processing > them. The 'physical' vector length models the actual number of slots, > and includes the option to model a non-power of two. That 'physical' > size is used in all operations that manipulate VecA register contents. > So, although I grant that the code is /parameterized/, it is also 100% > accurate. My point is: the proposed solution makes a number of simplifying assumptions which makes it much easier to support SVE (e.g., VecA represents full-width vector which completely ignores implicit predication provided by the ISA). >>> The ultimate goal is definitely #3, but how much more work will be >>> required to teach the JVM about non-power-of-2 vectors? As I see in the >>> patch, you don't have auto-vectorizer support yet, but Vector API will >>> provide access to whatever size hardware exposes. What do you expect on >>> hardware front in the near/mid-term future? 
Anything supporting vectors >>> larger than 512-bit? What about 384-bit vectors? > > Do we need to know for sure such hardware is going to arrive in order to > allow for it now? If there were a significant cost to doing so I'd maybe > say yes but I don't really see one here. Most importantly, the changes > to the AArch64 register model and small changes to the shared > chaitin/reg mask code proposed here already work with the > auto-vectorizer if the VecA slots are any of the possible powers of 2 > VecA sizes. > > The extra work needed to profit from non-power-of-two vector involves > upgrading the auto-vectorizer code. While this may be tricky I don't see > ti as impossible. However, more importantly, even if such an upgrade > cannot be achieved then this proposal is still a very simple way to > allow for arbitrarily scalable SVE vectors that are a power of two size. > It also allows any architecture with a non-power of two to work with the > lowest power of two that fits. So, this is a very siple way to cater for > what may turn up. If it makes options #1/#2 viable, then there's no need to change shared code at all. Choosing between no code changes and low risk / small code changes which won't be used in practice, I'm strongly in favor of the former. >>> For larger vectors #2 (or a mix of #1 and #2) may be a good fit. My >>> understanding that existing RA machinery should support 1024-bit vectors >>> well. So, unless 2048-bit vectors are needed, we could live with the >>> framework we have right now. > > I'm not sure what you are proposing here but it sounds like introducing > extra vectors beyond VecX, VecY for larger powers of two i.e. VecZ, > vecZZ, VecZZZ ... and providing separate case processing for each of > them where the relevant case is selected conditional on the actual > vector size. Is that what you are proposing? I can't see any virtue in > multiplying case handling fore ach new power-of-two size that turns up > when all possible VecZ* power-of-two options can actually be handled as > one uniform case. Option #1 doesn't require anything more than Vec[SDXYZ]. Option #2 assumes 1 more operand & ideal register for 1024-bit. As Ningsheng pointed out, without introducing length-agnostic vectors, supporting 2048-bit vectors require changes in RegMask to accommodate for values spanning 64 slots. >>> Giving up on #3 for now and starting with less ambitious goals (#1 or >>> #2) would reduce pressure on RA and give more time for additional >>> experiments to come with a better and more universal >>> support/representation of generic/size-agnostic vectors. And, in a >>> longer term, help reducing complexity and technical debt in the area. > > Can you explain what you mean by 'reduce pressure on RA'? I'm also > unclear as to what you see as complex about this proposal. IMO vector support already introduces significant complexity in C2. Adding platform-specific features will only increase it. So, I'm in favor of reworking the support than applying band-aids to relax some inherent limitations of it. >>> Some more comments follow inline. >>> >>>>> Compared to x86 w/ AVX512, architectural state for vector registers is >>>>> 4x larger in the worst case (ignoring predicate registers for now). >>>>> Here are the relevant constants on x86: >>>>> >>>>> gensrc/adfiles/adGlobals_x86.hpp: >>>>> >>>>> // the number of reserved registers + machine registers. >>>>> #define REG_COUNT??? 545 >>>>> ... 
>>>>> // Size of register-mask in ints >>>>> #define RM_SIZE 22 >>>>> >>>>> My estimate is that for AArch64 with SVE support the constants will be: >>>>> >>>>> ??? REG_COUNT < 2500 >>>>> ??? RM_SIZE < 100 >>>>> >>>>> which don't look too bad. > > I'm not sure what these numbers are meant to mean. The number of SVE > vector registers is the same as the number of NEON vector registers i.e. > 32. The register mask size for VecA registers is 8 * 32 bits. I attempted to estimate the sizes of relevant structures if VecA is modelled the same way as VecX et al. >>>> Right, but given that most real hardware implementations will be no >>>> larger than 512 bits, I think. Having a large bitmask array, with most >>>> bits useless, will be less efficient for regmask computation. >>> >>> Does it make sense to limit the maximum supported size to 512-bit then >>> (at least, initially)? In that case, the overhead won't be worse it is >>> on x86 now. > > Well, no. It doesn't make sense when all you need is a 'logical' 8 * 32 > bit mask whatever the actual 'physical' register size is. I asked that question in a different context trying to get a sense of other simplifying assumptions which could be made in the initial implementation. But you should definitely prefer 1-slot design for vector registers then ;-) Best regards, Vladimir Ivanov From rwestrel at redhat.com Tue Aug 25 13:37:10 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 15:37:10 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: <87r1ru7sop.fsf@redhat.com> > Putting these together, and choosing a round number > which prioritizes concern (b) by moving closer to the > limit of (a), if I had one more run to do I?d choose > -XX: StressLongCountedLoop=20000000. > > If I were to do multiple runs, I might choose vary that > stress parameter by adding and subtracting a couple > of zeroes: > > -XX: StressLongCountedLoop=200000 > -XX: StressLongCountedLoop=2000000 > -XX: StressLongCountedLoop=20000000 > -XX: StressLongCountedLoop=200000000 > -XX: StressLongCountedLoop=2000000000 FWIW, I ran my own tests with -XX: StressLongCountedLoop=10000. Tobias runs did catch failures I didn't run into. > Separately from those issues, we know that the stress mode > converts 32-bit loops into 64-bit loops, which then re-nest > using the new logic. But, are we confident that this re-nesting > works? Roland did some manual testing to make sure the > test works as intended, but it would be good to run the above > stress tests with some sort of logging that ensures that there > are at least ?lots and lots? of successful 32-to-64 loop conversions. > If those loop conversions fail (staying at 64 bits) the tests will > pass, but they won?t be testing what we need to be testing. What about using the new statistics? 
A CTW run of the base module reports: long loops=11/36 A CTW run of the base module with -XX:StressLongCountedLoop=1000 reports: long loops=3271/3410 Granted, the first counter is only incremented once the loop nest is created but not when the inner loop is converted to a counted loop. On another run with a third counter incremented on counted loop creation: 2889/2971/3106 (that not all created inner loops are transformed to counted loops is strange. Maybe some become dead between the 2 steps.) Roland. From martin.doerr at sap.com Tue Aug 25 13:37:38 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 25 Aug 2020 13:37:38 +0000 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Hi Corey, thanks for proposing this change. I have comments and suggestions regarding various files. Base64.java This is the only file which needs another review from core-libs-dev. First of all, I like the idea to use a HotSpotIntrinsicCandidate which can consume as many bytes as the implementation wants. Comment before decodeBlock: Let's be precise: "should process a multiple of four" => "must process a multiple of four" > If any illegal base64 bytes are encountered in the source by the > intrinsic, the intrinsic can return a data length of zero or any > number of bytes before the place where the illegal base64 byte > was encountered. I think this has a drawback. Somebody may use a debugger and want to stop when throwing IllegalArgumentException. He should see the position which matches the Java implementation. Please note that the comment indentation differs from other comments. decode0: Final "else" after return is redundant. stubGenerator_ppc.cpp "__vector" breaks AIX build! Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? Please either support Big Endian properly or #ifdef it out. What exactly does it on linux? I remember that we had tried such prefixes but were not satisfied. I think it didn't enforce 16 Byte alignment if I remember correctly. Attention: C2 does no longer convert int/bool to 64 bit values (since JDK-8086069). So the argument registers for offset, length and isURL may contain garbage in the higher bits. You may want to use load_const_optimized which produces shorter code. You may want to use __ align(32) to align unrolled_loop_start. I'll review the algorithm in detail when I find more time. assembler_ppc.hpp assembler_ppc.inline.hpp vm_version_ppc.cpp vm_version_ppc.hpp Please rebase. Parts of the change were pushed as part of 8248190: Enable Power10 system and implement new byte-reverse instructions vmSymbols.hpp Indentation looks odd at the end. library_call.cpp Good. Indentation style of the call parameters differs from encodeBlock. runtime.cpp Good. aotCodeHeap.cpp vmSymbols.cpp shenandoahSupport.cpp vmStructs_jvmci.cpp shenandoahSupport.cpp escape.cpp runtime.hpp stubRoutines.cpp stubRoutines.hpp vmStructs.cpp Good and trivial. Tests: I think we should have JTREG tests to check for regressions in the future. Best regards, Martin > -----Original Message----- > From: Corey Ashford > Sent: Mittwoch, 19. 
August 2020 20:11 > To: Michihiro Horie > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev dev at openjdk.java.net>; Kazunori Ogata ; > joserz at br.ibm.com; Doerr, Martin > Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > API for Base64 decoding > > Michihiro Horie posted up a new iteration of this webrev for me. This > time the webrev includes a complete implementation of the intrinsic for > Power9 and Power10. > > You can find it here: > http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ > > Changes in webrev.02 vs. webrev.01: > > * The method header for the intrinsic in the Base64 code has been > rewritten using the Javadoc style. The clarity of the comments has been > improved and some verbosity has been removed. There are no additional > functional changes to Base64.java. > > * The code needed to martial and check the intrinsic parameters has > been added, using the base64 encodeBlock intrinsic as a guideline. > > * A complete intrinsic implementation for Power9 and Power10 is included. > > * Adds some Power9 and Power10 assembler instructions needed by the > intrinsic which hadn't been defined before. > > The intrinsic implementation in this patch accelerates the decoding of > large blocks of base64 data by a factor of about 3.5X on Power9. > > I'm attaching two Java test cases I am using for testing and > benchmarking. The TestBase64_VB encodes and decodes randomly-sized > buffers of random data and checks that original data matches the > encoded-then-decoded data. TestBase64Errors encodes a 48K block of > random bytes, then corrupts each byte of the encoded data, one at a > time, checking to see if the decoder catches the illegal byte. > > Any comments/suggestions would be appreciated. > > Thanks, > > - Corey > > On 7/27/20 6:49 PM, Corey Ashford wrote: > > Michihiro Horie uploaded a new revision of the Base64 decodeBlock > > intrinsic API for me: > > > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ > > > > It has the following changes with respect to the original one posted: > > > > ?* In the event of encountering a non-base64 character, instead of > > having a separate error code of -1, the intrinsic can now just return > > either 0, or the number of data bytes produced up to the point where the > > illegal base64 character was encountered.? This reduces the number of > > special cases, and also provides a way to speed up the process of > > finding the bad character by the slower, pure-Java algorithm. > > > > ?* The isMIME boolean is removed from the API for two reasons: > > ?? - The current API is not sufficient to handle the isMIME case, > > because there isn't a strict relationship between the number of input > > bytes and the number of output bytes, because there can be an arbitrary > > number of non-base64 characters in the source. > > ?? - If an intrinsic only implements the (isMIME == false) case as ours > > does, it will always return 0 bytes processed, which will slightly slow > > down the normal path of processing an (isMIME == true) instantiation. > > ?? - We considered adding a separate hotspot candidate for the (isMIME > > == true) case, but since we don't have an intrinsic implementation to > > test that, we decided to leave it as a future optimization. > > > > Comments and suggestions are welcome.? Thanks for your consideration. > > > > - Corey > > > > On 6/23/20 6:23 PM, Michihiro Horie wrote: > >> Hi Corey, > >> > >> Following is the issue I created. 
> >> https://bugs.openjdk.java.net/browse/JDK-8248188 > >> > >> I will upload a webrev when you're ready as we talked in private. > >> > >> Best regards, > >> Michihiro > >> > >> Inactive hide details for "Corey Ashford" ---2020/06/24 > >> 09:40:10---Currently in java.util.Base64, there is a > >> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently > >> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for > >> encodeBlock, but no > >> > >> From: "Corey Ashford" > >> To: "hotspot-compiler-dev at openjdk.java.net" > >> , > >> "ppc-aix-port-dev at openjdk.java.net" dev at openjdk.java.net> > >> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori > Ogata/Japan/IBM at IBMJP, > >> joserz at br.ibm.com > >> Date: 2020/06/24 09:40 > >> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for > >> Base64 decoding > >> > >> ------------------------------------------------------------------------ > >> > >> > >> > >> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and > >> API for encodeBlock, but none for decoding. ?This means that only > >> encoding gets acceleration from the underlying CPU's vector hardware. > >> > >> I'd like to propose adding a new intrinsic for decodeBlock. ?The > >> considerations I have for this new intrinsic's API: > >> > >> ??* Don't make any assumptions about the underlying capability of the > >> hardware. ?For example, do not impose any specific block size > >> granularity. > >> > >> ??* Don't assume the underlying intrinsic can handle isMIME or isURL > >> modes, but also let them decide if they will process the data regardless > >> of the settings of the two booleans. > >> > >> ??* Any remaining data that is not processed by the intrinsic will be > >> processed by the pure Java implementation. ?This allows the intrinsic to > >> process whatever block sizes it's good at without the complexity of > >> handling the end fragments. > >> > >> ??* If any illegal character is discovered in the decoding process, the > >> intrinsic will simply return -1, instead of requiring it to throw a > >> proper exception from the context of the intrinsic. ?In the event of > >> getting a -1 returned from the intrinsic, the Java Base64 library code > >> simply calls the pure Java implementation to have it find the error and > >> properly throw an exception. ?This is a performance trade-off in the > >> case of an error (which I expect to be very rare). > >> > >> ??* One thought I have for a further optimization (not implemented in > >> the current patch), is that when the intrinsic decides not to process a > >> block because of some combination of isURL and isMIME settings it > >> doesn't handle, it could return extra bits in the return code, encoded > >> as a negative number. ?For example: > >> > >> Illegal_Base64_char ? = 0b001; > >> isMIME_unsupported ? ?= 0b010; > >> isURL_unsupported ? ? = 0b100; > >> > >> These can be OR'd together as needed and then negated (flip the sign). > >> The Base64 library code could then cache these flags, so it will know > >> not to call the intrinsic again when another decodeBlock is requested > >> but with an unsupported mode. ?This will save the performance hit of > >> calling the intrinsic when it is guaranteed to fail. > >> > >> I've tested the attached patch with an actual intrinsic coded up for > >> Power9/Power10, but those runtime intrinsics and arch-specific patches > >> aren't attached today. ?I want to get some consensus on the > >> library-level intrinsic API first. 
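To make the shape of the API being discussed concrete, here is a rough, self-contained sketch of the control flow around such a candidate. The method name, signature and plain-Java fallback below are illustrative assumptions drawn from the description above, not the code in the webrev:

import java.util.Arrays;
import java.util.Base64;

public class DecodeBlockSketch {

    // Intrinsic candidate: may decode some multiple-of-four prefix of
    // src[sp, sl) into dst starting at dp and returns the number of bytes
    // written. Returning 0 (or a partial count up to an illegal byte) is
    // always legal; the caller finishes the work.
    private static int decodeBlock(byte[] src, int sp, int sl,
                                   byte[] dst, int dp, boolean isURL) {
        return 0; // plain-Java behaviour: decode nothing, defer to the caller
    }

    static int decode(byte[] src, int sp, int sl, byte[] dst, int dp, boolean isURL) {
        int written = decodeBlock(src, sp, sl, dst, dp, isURL);
        int consumed = (written / 3) * 4;   // 4 input bytes produce 3 output bytes
        byte[] tail = Arrays.copyOfRange(src, sp + consumed, sl);
        byte[] rest = (isURL ? Base64.getUrlDecoder() : Base64.getDecoder()).decode(tail);
        System.arraycopy(rest, 0, dst, dp + written, rest.length);
        return written + rest.length;
    }

    public static void main(String[] args) {
        byte[] encoded = Base64.getEncoder().encode("hello, decodeBlock".getBytes());
        byte[] out = new byte[32];
        int n = decode(encoded, 0, encoded.length, out, 0, false);
        System.out.println(new String(out, 0, n));
    }
}

Because the candidate is free to stop early, the contract stays simple: whatever it does not consume is handled by the existing scalar path, including padding and error reporting.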
> >> > >> Also attached is a simple test case to test that the new intrinsic API > >> doesn't break anything. > >> > >> I'm open to any comments about this. > >> > >> Thanks for your consideration, > >> > >> - Corey > >> > >> > >> Corey Ashford > >> IBM Systems, Linux Technology Center, OpenJDK team > >> cjashfor at us dot ibm dot com > >> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro > >> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro > >> Horie/Japan/IBM] > >> > >> > > From tobias.hartmann at oracle.com Tue Aug 25 13:49:43 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 15:49:43 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: <87wo1n6snc.fsf@redhat.com> References: <87wo1n6snc.fsf@redhat.com> Message-ID: Hi Roland, Good catch, the fix looks reasonable to me. I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to conflicting GC options if another GC is set. Best regards, Tobias On 25.08.20 10:23, Roland Westrelin wrote: > > https://bugs.openjdk.java.net/browse/JDK-8252292 > http://cr.openjdk.java.net/~roland/8252292/webrev.00/ > > In 8240795, I modified alias analysis so non escaping allocations don't > alias with bottom memory. While browsing that code last week, I noticed > that that change didn't seem quite right and may cause some > anti-dependences to be missed. I could indeed write a test case that > fails with an incorrect execution. > > In the test case: the dst[9] load after the ArrayCopy is transformed > into a src[9] load before the ArrayCopy. Anti dependence analysis find > src[9] shares the memory of the ArrayCopy but because of the way I > tweaked the code with 8240795, anti-dependence analysis finds the src[9] > and ArrayCopy don't alias so src[9] can sink out of the loop which is > wrong because of the src[9] store. Anti-dependence analysis in that case > would need to look at the memory uses of ArrayCopy too. > > Roland. > From tobias.hartmann at oracle.com Tue Aug 25 14:06:38 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 25 Aug 2020 16:06:38 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: On 25.08.20 07:23, John Rose wrote: > Putting these together, and choosing a round number > which prioritizes concern (b) by moving closer to the > limit of (a), if I had one more run to do I?d choose > -XX: StressLongCountedLoop=20000000. > > If I were to do multiple runs, I might choose vary that > stress parameter by adding and subtracting a couple > of zeroes: > > -XX: StressLongCountedLoop=200000 > -XX: StressLongCountedLoop=2000000 > -XX: StressLongCountedLoop=20000000 > -XX: StressLongCountedLoop=200000000 > -XX: StressLongCountedLoop=2000000000 Okay, thanks, I'll run some more testing with these values. Will report back once it finished. 
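For readers following the thread: the transformation being stress-tested here turns a loop with a long induction variable into a nest whose inner loop is an ordinary int counted loop, which C2 already knows how to optimize; StressLongCountedLoop additionally promotes int loops to the long form so the conversion is exercised everywhere. A hand-written analogue of the target shape, purely for illustration (the chunk bound is arbitrary and the real IR transformation differs in detail):

public class LongLoopNestSketch {

    // Before: a long induction variable, not a counted loop on its own.
    static long sumBefore(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Conceptual "after": a long outer loop advancing in int-sized chunks,
    // with an int inner counted loop doing the iterations.
    static long sumAfter(long n) {
        long sum = 0;
        for (long base = 0; base < n; ) {
            int chunk = (int) Math.min(n - base, 1_000_000_000L);
            for (int j = 0; j < chunk; j++) {
                sum += base + j;
            }
            base += chunk;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBefore(1000) == sumAfter(1000)); // true
    }
}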
> If any of those runs kicks out a bug or other suspicious behavior, > it should be added to a permanent test list. Earlier runs with 1 and 4294967295 already found bugs. I think we should add a selection of stress values to higher CI tiers. Best regards, Tobias From rwestrel at redhat.com Tue Aug 25 14:13:24 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 16:13:24 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: References: Message-ID: <87o8my7r0b.fsf@redhat.com> > In the testcase, a LoadSNode is cloned in > PhaseIdealLoop::split_if_with_blocks_post() for each use such that they > can float out of a loop. To ensure that these loads cannot float back > into the loop, we pin them by setting their control input [1]. In the > testcase, all 3 new clones are pinned to a loop exit node that is part > of an outer strip mined loop (see [2]). Do I understand this right, that all 3 clones are pinned with the same control? So they common and only of them is kept? Roland. From rwestrel at redhat.com Tue Aug 25 14:21:55 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 16:21:55 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: <87k0xm7qm4.fsf@redhat.com> > Good catch, the fix looks reasonable to me. Thanks for the review. > I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to > conflicting GC options if another GC is set. Indeed. I'll make that change before I push the fix. Roland. From igor.ignatyev at oracle.com Tue Aug 25 14:25:26 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 25 Aug 2020 07:25:26 -0700 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: > On Aug 25, 2020, at 6:49 AM, Tobias Hartmann wrote: > > Hi Roland, > > Good catch, the fix looks reasonable to me. > > I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to > conflicting GC options if another GC is set. Hi Roland, '@requires vm.gc.Parallel' should be used to limit execution of the test to configurations where ParallelGC is available and selectable (meaning no GC has been explicitly specified or explicitly specified GC is Parallel). -- Igor > > Best regards, > Tobias > > On 25.08.20 10:23, Roland Westrelin wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8252292 >> http://cr.openjdk.java.net/~roland/8252292/webrev.00/ >> >> In 8240795, I modified alias analysis so non escaping allocations don't >> alias with bottom memory. While browsing that code last week, I noticed >> that that change didn't seem quite right and may cause some >> anti-dependences to be missed. I could indeed write a test case that >> fails with an incorrect execution. >> >> In the test case: the dst[9] load after the ArrayCopy is transformed >> into a src[9] load before the ArrayCopy. Anti dependence analysis find >> src[9] shares the memory of the ArrayCopy but because of the way I >> tweaked the code with 8240795, anti-dependence analysis finds the src[9] >> and ArrayCopy don't alias so src[9] can sink out of the loop which is >> wrong because of the src[9] store. Anti-dependence analysis in that case >> would need to look at the memory uses of ArrayCopy too. >> >> Roland. 
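Igor's suggestion amounts to a test header along these lines; the class name, bug line and summary are a hypothetical sketch, not the actual test being pushed:

/*
 * @test
 * @bug 8252292
 * @summary skeleton only: run with Parallel GC and let jtreg skip the test
 *          when a different GC has been selected explicitly
 * @requires vm.gc.Parallel
 * @run main/othervm -XX:+UseParallelGC compiler.arraycopy.AntiDependenceSketch
 */
package compiler.arraycopy;

public class AntiDependenceSketch {
    public static void main(String[] args) {
        // test body elided; the point is the @requires line above
    }
}

With vm.gc.Parallel the test runs when no GC was specified or when Parallel was, which avoids the conflicting-flags failure Tobias pointed out.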
>> From rwestrel at redhat.com Tue Aug 25 14:31:16 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Aug 2020 16:31:16 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: <87h7sq7q6j.fsf@redhat.com> Hi Igor, > '@requires vm.gc.Parallel' should be used to limit execution of the > test to configurations where ParallelGC is available and selectable > (meaning no GC has been explicitly specified or explicitly specified > GC is Parallel). Thanks for the clarification. I'll go with your suggestion. Roland. From aph at redhat.com Tue Aug 25 14:55:40 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 15:55:40 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> Message-ID: On 25/08/2020 10:47, Boris Ulasevich wrote: > Ok. Can you please check that my patch [1] has been applied > and built correctly. With my change I see this picture: > > ....[Hottest Region 2]........................................... > c2, level 4, org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub, > > ??????????? 0x0000ffff84584dac:?? add??? x11, x14, #0x94 > > ??????????? 0x0000ffff84584db0:?? stp??? x21, x19, [sp] > ??????????? 0x0000ffff84584db4:?? stp??? x20, x14, [sp, #16] > ??????????? 0x0000ffff84584db8:?? stp??? x15, x10, [sp, #32] > ??????????? 0x0000ffff84584dbc:?? str??? x11, [sp, #48] > ??????????? 0x0000ffff84584dc0:?? b??? 0x0000ffff84584dd8 > ??????????? 0x0000ffff84584dc4:?? nop > ??????????? 0x0000ffff84584dc8:?? nop > ??????????? 0x0000ffff84584dcc:?? nop > ? 3.64%? ?? 0x0000ffff84584dd0:?? str??? x19, [sp, #16] > ? 0.07%? ?? 0x0000ffff84584dd4:?? mov??? x16, x29 > ???????? ?? 0x0000ffff84584dd8:?? ldr??? w10, [x16, #12] ;*invokestatic bfm > ? 3.92%? ?? 0x0000ffff84584ddc:?? ldr??? w12, [x16, #24] > ? 4.69%? ?? 0x0000ffff84584de0:?? and??? w2, w10, #0xff > ? 0.03%? ?? 0x0000ffff84584de4:?? mov??? x29, x16 > ? 0.02%? ?? 0x0000ffff84584de8:?? bfi??? x2, x12, #8, #8???? ;*ior {reexecute=0 rethrow=0 return_oop=0} > ???????? ??????????????????????????????????????????????????? ; - My apologies, I must have messed the patch up. I rebuilt cleanly. One odd thing, though, is that it only works with some forms, and not necessarily the most common ones. Good: @Benchmark public static int bfm(Result r) { return (r.a & 0xFF) | ((r.b & 0xFF) << 8); } 8.13% ? 0x0000fffface550f0: and w2, w11, #0xff 0.69% ? 0x0000fffface550f4: bfi x2, x10, #8, #8 ;*ior {reexecute=0 rethrow=0 return_oop=0} Not so good: @Benchmark public static int shift_bfm(Result r) { return ((r.a << 24 >>> 24) | (r.b << 24 >>> 16)); } 8.56% ? 0x0000ffff88e50e70: lsl w12, w11, #24 ? 0x0000ffff88e50e74: and w10, w10, #0xff 8.59% ? 0x0000ffff88e50e78: orr w2, w10, w12, lsr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} @Benchmark public static int shift_sbfm(Result r) { return ((r.a << 24 >>> 24) | (r.b << 24 >> 16)); } 9.40% ? 0x0000ffff84e51070: lsl w12, w11, #24 0.12% ? 0x0000ffff84e51074: and w10, w10, #0xff 8.06% ? 
0x0000ffff84e51078: orr w2, w10, w12, asr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} Does this matter? Bits.java uses the (a & 0xff) | ((b & 0xFF) << 8) idiom so maybe we don't care about the shift left followed by shift right form. But it feels to me a bit unsatisfactory to miss it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Tue Aug 25 15:29:50 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 25 Aug 2020 18:29:50 +0300 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: <0f0a8779-0d70-7bd8-e302-f83fbefee24c@oracle.com> > Just curious, which analyzer are you using? One of the other bugs [1] filed by Aleksey mentions CLion. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8252237 "CLion static analyzer highlights this oddity." From aph at redhat.com Tue Aug 25 16:55:38 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2020 17:55:38 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: On 24/08/2020 22:52, Dmitry Chuyko wrote: > > I added two more intrinsics -- for copySign, they are controlled by > UseCopySignIntrinsic flag. > > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/ > > It also contains 'benchmarks' directory: > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/benchmarks/ > > There are 8 benchmarks there: (double | float) x (blackhole | reduce) x > (current j.l.Math.signum | abs()>0 check). > > My results on Arm are in signum-facgt-copysign.ods. Main case is > 'random' which is actually a random from positive and negative numbers > between -0.5 and +0.5. > > Basically we have ~14% improvement in 'reduce' benchmark variant but > ~20% regression in 'blackhole' variant in case of only copySign() > intrinsified. > > Same picture if abs()>0 check is used in signum() (+-5%). This variant > is included as it shows very good results on x86. > > Intrinsic for signum() gives improvement of main case in both > 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a > noticeable difference. Ignoring Blackhole for the moment, this is what I'm seeing for the reduction/random case: Benchmark Mode Cnt Score Error Units ThunderX 2: -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.456 ? 0.065 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.766 ? 0.107 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.537 ? 0.770 ns/op Neoverse N1 (Actually Amazon m6g.16xlarge): -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.173 ? 0.001 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.043 ? 0.022 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.012 ? 0.001 ns/op By your own numbers, in the reduce benchmark the signum intrinsic is worse than default for all 0 and NaN, but about 12% better for random, >0, and <0. If you take the average of the sppedups and slowdowns it's actually worse than default. 
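For context, the "reduce" flavour of these benchmarks has roughly the following JMH shape; the names, seed and array size are assumptions for illustration, not Dmitry's actual DoubleReduceBench:

import java.util.SplittableRandom;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class SignumReduceSketch {

    double[] random;   // mixed positive and negative values in (-0.5, 0.5)

    @Setup
    public void setup() {
        random = new SplittableRandom(42).doubles(1024, -0.5, 0.5).toArray();
    }

    // "reduce" variant: results are accumulated, so per-element stores and
    // blackhole calls stay out of the measured loop.
    @Benchmark
    public double ofRandom() {
        double acc = 0.0;
        for (double v : random) {
            acc += Math.signum(v);
        }
        return acc;
    }

    // The abs() > 0 variant mentioned in the thread, for comparison.
    @Benchmark
    public double ofRandomAbsCheck() {
        double acc = 0.0;
        for (double v : random) {
            acc += (Math.abs(v) > 0.0) ? Math.copySign(1.0, v) : v;
        }
        return acc;
    }
}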
By my reckoning, if you take all possibilities (Nan, <0, >0, 0, Random) into account, the best-performing on the reduce test is actually Abs/Copysign, but there's very little in it. The only time that the signum intrinsic actually wins is when you're storing the result into memory *and* flushing the store buffer. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From boris.ulasevich at bell-sw.com Tue Aug 25 17:30:02 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 25 Aug 2020 20:30:02 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> Message-ID: <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> Andrew, Thanks for looking into this. I believe masking with left shift and right shift is not common. Search though jdk repository does not give such patterns while there is a hundreds of mask+lshift expressions. I implemented a simple is_bitrange_zero() method for counting the bitranges of sub-expressions: power-of-two masks and left shift only. We can take into account more cases (careful testing is a main concern). But particularly about "r.a << 24 >>> 24" expression I think it is worse to think about canonicalization: "left shift + right shift" to "mask + left shift" (or may be the backwards). regards, Boris On 25.08.2020 17:55, Andrew Haley wrote: > On 25/08/2020 10:47, Boris Ulasevich wrote: >> Ok. Can you please check that my patch [1] has been applied >> and built correctly. With my change I see this picture: >> >> ....[Hottest Region 2]........................................... >> c2, level 4, org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub, >> >> ??????????? 0x0000ffff84584dac:?? add??? x11, x14, #0x94 >> >> ??????????? 0x0000ffff84584db0:?? stp??? x21, x19, [sp] >> ??????????? 0x0000ffff84584db4:?? stp??? x20, x14, [sp, #16] >> ??????????? 0x0000ffff84584db8:?? stp??? x15, x10, [sp, #32] >> ??????????? 0x0000ffff84584dbc:?? str??? x11, [sp, #48] >> ??????????? 0x0000ffff84584dc0:?? b??? 0x0000ffff84584dd8 >> ??????????? 0x0000ffff84584dc4:?? nop >> ??????????? 0x0000ffff84584dc8:?? nop >> ??????????? 0x0000ffff84584dcc:?? nop >> ? 3.64%? ?? 0x0000ffff84584dd0:?? str??? x19, [sp, #16] >> ? 0.07%? ?? 0x0000ffff84584dd4:?? mov??? x16, x29 >> ???????? ?? 0x0000ffff84584dd8:?? ldr??? w10, [x16, #12] ;*invokestatic bfm >> ? 3.92%? ?? 0x0000ffff84584ddc:?? ldr??? w12, [x16, #24] >> ? 4.69%? ?? 0x0000ffff84584de0:?? and??? w2, w10, #0xff >> ? 0.03%? ?? 0x0000ffff84584de4:?? mov??? x29, x16 >> ? 0.02%? ?? 0x0000ffff84584de8:?? bfi??? x2, x12, #8, #8???? ;*ior {reexecute=0 rethrow=0 return_oop=0} >> ???????? ??????????????????????????????????????????????????? ; - > My apologies, I must have messed the patch up. I rebuilt cleanly. One odd thing, > though, is that it only works with some forms, and not necessarily the most > common ones. > > Good: > > @Benchmark > public static int bfm(Result r) { > return (r.a & 0xFF) | ((r.b & 0xFF) << 8); > } > > 8.13% ? 0x0000fffface550f0: and w2, w11, #0xff > 0.69% ? 
0x0000fffface550f4: bfi x2, x10, #8, #8 ;*ior {reexecute=0 rethrow=0 return_oop=0} > > Not so good: > > @Benchmark > public static int shift_bfm(Result r) { > return ((r.a << 24 >>> 24) | (r.b << 24 >>> 16)); > } > > 8.56% ? 0x0000ffff88e50e70: lsl w12, w11, #24 > ? 0x0000ffff88e50e74: and w10, w10, #0xff > 8.59% ? 0x0000ffff88e50e78: orr w2, w10, w12, lsr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} > > @Benchmark > public static int shift_sbfm(Result r) { > return ((r.a << 24 >>> 24) | (r.b << 24 >> 16)); > } > > 9.40% ? 0x0000ffff84e51070: lsl w12, w11, #24 > 0.12% ? 0x0000ffff84e51074: and w10, w10, #0xff > 8.06% ? 0x0000ffff84e51078: orr w2, w10, w12, asr #16 ;*ior {reexecute=0 rethrow=0 return_oop=0} > > Does this matter? Bits.java uses the (a & 0xff) | ((b & 0xFF) << 8) idiom so maybe > we don't care about the shift left followed by shift right form. But it feels > to me a bit unsatisfactory to miss it. From christian.hagedorn at oracle.com Tue Aug 25 17:42:38 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 25 Aug 2020 19:42:38 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <87o8my7r0b.fsf@redhat.com> References: <87o8my7r0b.fsf@redhat.com> Message-ID: <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> On 25.08.20 16:13, Roland Westrelin wrote: > >> In the testcase, a LoadSNode is cloned in >> PhaseIdealLoop::split_if_with_blocks_post() for each use such that they >> can float out of a loop. To ensure that these loads cannot float back >> into the loop, we pin them by setting their control input [1]. In the >> testcase, all 3 new clones are pinned to a loop exit node that is part >> of an outer strip mined loop (see [2]). > > Do I understand this right, that all 3 clones are pinned with the same > control? So they common and only of them is kept? Yes, exactly. All are pinned to the inner loop exit node. But at the time we hit the assertion failure, we still got one cloned load (903 LoadS) that is an input to the store (575 StoreI) that's going into the outer strip mined loop safepoint, and one load (901 LoadS) that is triggering the dominance failure. LoadS 902 was removed at some point in between due to other optimizations. Best regards, Christian From john.r.rose at oracle.com Tue Aug 25 19:04:53 2020 From: john.r.rose at oracle.com (John Rose) Date: Tue, 25 Aug 2020 12:04:53 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <87r1ru7sop.fsf@redhat.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> <87r1ru7sop.fsf@redhat.com> Message-ID: <18900602-6B03-483E-986B-30C8153F9F6F@oracle.com> On Aug 25, 2020, at 6:37 AM, Roland Westrelin wrote: > > >> Putting these together, and choosing a round number >> which prioritizes concern (b) by moving closer to the >> limit of (a), if I had one more run to do I?d choose >> -XX: StressLongCountedLoop=20000000. 
>> >> If I were to do multiple runs, I might choose vary that >> stress parameter by adding and subtracting a couple >> of zeroes: >> >> -XX: StressLongCountedLoop=200000 >> -XX: StressLongCountedLoop=2000000 >> -XX: StressLongCountedLoop=20000000 >> -XX: StressLongCountedLoop=200000000 >> -XX: StressLongCountedLoop=2000000000 > > FWIW, I ran my own tests with -XX: StressLongCountedLoop=10000. Tobias > runs did catch failures I didn't run into. > >> Separately from those issues, we know that the stress mode >> converts 32-bit loops into 64-bit loops, which then re-nest >> using the new logic. But, are we confident that this re-nesting >> works? Roland did some manual testing to make sure the >> test works as intended, but it would be good to run the above >> stress tests with some sort of logging that ensures that there >> are at least ?lots and lots? of successful 32-to-64 loop conversions. >> If those loop conversions fail (staying at 64 bits) the tests will >> pass, but they won?t be testing what we need to be testing. > > What about using the new statistics? > A CTW run of the base module reports: long loops=11/36 > A CTW run of the base module with -XX:StressLongCountedLoop=1000 reports: long loops=3271/3410 > > Granted, the first counter is only incremented once the loop nest is > created but not when the inner loop is converted to a counted loop. On > another run with a third counter incremented on counted loop creation: > > 2889/2971/3106 > > (that not all created inner loops are transformed to counted loops is > strange. Maybe some become dead between the 2 steps.) Yes, that?s the sort of manual testing I was referring to. The numbers you show are a reasonable value of ?lots and lots? for a CTW run. Who knows what the conversion rates are for real applications driven by profiles. I?m curious what they are but I suppose we can live without them. We don?t have AFAIK a way to set up a special cumulative report for a tier of testing on those parameters. I guess we are several levels of improvement short of being able to set up a probe across a set of tests and roll up statistics from it. So, with Tobias running those extra tests (the ?20s?) we are more than good. Thanks! ? John From john.r.rose at oracle.com Tue Aug 25 19:09:43 2020 From: john.r.rose at oracle.com (John Rose) Date: Tue, 25 Aug 2020 12:09:43 -0700 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: <18900602-6B03-483E-986B-30C8153F9F6F@oracle.com> References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> <87r1ru7sop.fsf@redhat.com> <18900602-6B03-483E-986B-30C8153F9F6F@oracle.com> Message-ID: <4EF06EFC-9790-4B46-AA1D-E688C571D171@oracle.com> On Aug 25, 2020, at 12:04 PM, John Rose wrote: > > I guess we are > several levels of improvement short of being able to set up > a probe across a set of tests and roll up statistics from it. P.S. Those levels might be: 1. Plumb our ad hoc statistics into JFR and/or BPF publication points. 
(Either recode, or write some sort of log-file stripper.) 2. Fit the JVM with a side channel to manage external connections to said publication points. 3. Fit the side channel to off-the-shelf tools for log data aggregation. 4. Ensure that our testing framework has options for hooking up said aggregation tools to the test jobs. From igor.ignatyev at oracle.com Wed Aug 26 01:01:44 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 25 Aug 2020 18:01:44 -0700 Subject: RFR(M/S) : 8251127 : clean up FileInstaller $test.src $cwd in remaining vmTestbase_vm_compiler tests : Message-ID: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ > 560 lines changed: 132 ins; 367 del; 61 mod; Hi all, could you please review the patch which removes FileInstaller actions from :vmTestbase_vm_compiler? the biggest chunk of the patch is just removal for '@run jdk.test.lib.FileInstaller' produced by sed '/jdk.test.lib.FileInstaller \. \./d'. human-made changes are: - moving jtreg test descriptions to the test source in t108-t113, corresponding changes in TEST.quick-groups and fixing line numbers in t108-t113.gold files - adding -Dtest.src=${test.src} to the tests which use ExecDriver (t087,t088,t108-t113), so GoldChecker would be able to find .gold file testing: :vmTestbase_vm_compiler JBS: https://bugs.openjdk.java.net/browse/JDK-8251127 webrev: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ Thanks, -- Igor From vladimir.kozlov at oracle.com Wed Aug 26 01:10:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 25 Aug 2020 18:10:40 -0700 Subject: RFR(M/S) : 8251127 : clean up FileInstaller $test.src $cwd in remaining vmTestbase_vm_compiler tests : In-Reply-To: References: Message-ID: <5859dffd-9ed9-21d3-102b-3070013d7fe0@oracle.com> Good. Thanks, Vladimir K On 8/25/20 6:01 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ >> 560 lines changed: 132 ins; 367 del; 61 mod; > > Hi all, > > could you please review the patch which removes FileInstaller actions from :vmTestbase_vm_compiler? > > the biggest chunk of the patch is just removal for '@run jdk.test.lib.FileInstaller' produced by sed '/jdk.test.lib.FileInstaller \. \./d'. human-made changes are: > - moving jtreg test descriptions to the test source in t108-t113, corresponding changes in TEST.quick-groups and fixing line numbers in t108-t113.gold files > - adding -Dtest.src=${test.src} to the tests which use ExecDriver (t087,t088,t108-t113), so GoldChecker would be able to find .gold file > > testing: :vmTestbase_vm_compiler > JBS: https://bugs.openjdk.java.net/browse/JDK-8251127 > webrev: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ > > Thanks, > -- Igor > > > > From shade at redhat.com Wed Aug 26 07:28:04 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 09:28:04 +0200 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag In-Reply-To: References: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> Message-ID: <6d334364-c26c-13b5-b804-7d61d8fad8d4@redhat.com> On 8/25/20 2:43 PM, Tobias Hartmann wrote: > looks good to me. Thanks! I'll wait a bit for more opinions on this. -- -Aleksey From shade at redhat.com Wed Aug 26 07:30:19 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 09:30:19 +0200 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: On 8/25/20 2:46 PM, Tobias Hartmann wrote: > +1 Thanks, pushed. 
> On 25.08.20 11:28, Reingruber, Richard wrote: >> Static code inspection complains the enum below is unused. > > Just curious, which analyzer are you using? Yup, CLion analyzers. They highlight all sorts of errors when I browse the code :) -- Thanks, -Aleksey From shade at redhat.com Wed Aug 26 07:30:17 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 09:30:17 +0200 Subject: RFR (XS) 8252291: C2: Assignment in conditional in loopUnswitch.cpp In-Reply-To: <811ea63c-b897-ccf0-2559-82842b52e4be@oracle.com> References: <811ea63c-b897-ccf0-2559-82842b52e4be@oracle.com> Message-ID: <173e2b6a-e1ce-32b3-8ded-f84077e55979@redhat.com> On 8/25/20 2:44 PM, Tobias Hartmann wrote: > looks good and trivial to me. Thanks, pushed. -- -Aleksey From shade at redhat.com Wed Aug 26 08:06:48 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 10:06:48 +0200 Subject: RFR (XS) 8252362: C2: Remove no-op checking for callee-saved-floats Message-ID: Cleanup: https://bugs.openjdk.java.net/browse/JDK-8252362 The block below does not do anything, because there are no side-effects anywhere, and then callee_saved_floats is left unused. It is this way since the initial load. I believe C2 (matching) code just uses SOE/SOC info from .ad. Anyhow, I cannot find where the rest of runtime codifies SOE/SOC registers to check here. There are plenty of hand-enumerated registers in, say, macroAssembler-s. I think it is cleaner to remove the block: diff -r e12584d50765 src/hotspot/share/opto/c2compiler.cpp --- a/src/hotspot/share/opto/c2compiler.cpp Wed Aug 26 09:29:46 2020 +0200 +++ b/src/hotspot/share/opto/c2compiler.cpp Wed Aug 26 10:02:48 2020 +0200 @@ -64,14 +64,4 @@ } - // Check that runtime and architecture description agree on callee-saved-floats - bool callee_saved_floats = false; - for( OptoReg::Name i=OptoReg::Name(0); i References: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> Message-ID: <003801d67b88$04e60340$0eb209c0$@alibaba-inc.com> Andrew, thanks a lot for your review. Ningsheng, could you please help push this change? Best Regards, Joshua > -----Original Message----- > From: Andrew Haley > Sent: 2020?8?25? 19:53 > To: Joshua Zhu ; hotspot-compiler- > dev at openjdk.java.net > Cc: aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of > FLOATPRESSURE > > On 25/08/2020 07:03, Joshua Zhu wrote: > > Therefore I propose the default value of FLOATPRESSURE be 32 because > > there are 32 float/SIMD registers on aarch64 and also the value of > > register pressure is the same as 1 for each LRG of > > Op_RegL/Op_RegD/Op_Vec. [3] > > > > Could you please help review this change? > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 > > Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ > > Yes, thanks. I can't remember why FLOATPRESSURE is 64, but it certainly looks > like 32 is a much more sensible value. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Wed Aug 26 09:06:03 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 26 Aug 2020 11:06:03 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections References: <87d03d7pdk.fsf@redhat.com> Message-ID: <878se17p50.fsf@redhat.com> Should have gone to hotspot-compiler-dev as well... 
-------------------- Start of forwarded message -------------------- From: Roland Westrelin To: shenandoah-dev at openjdk.java.net Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections Date: Wed, 26 Aug 2020 11:00:55 +0200 http://cr.openjdk.java.net/~roland/8252296/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8252296 My fix for 8251527 has caused failures with shenandoah enabled because CallNode::extract_projections() is called with a graph in the process of being modified where a ProjNode has more than one control use. Roland. -------------------- End of forwarded message -------------------- From vladimir.x.ivanov at oracle.com Wed Aug 26 09:30:25 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 26 Aug 2020 12:30:25 +0300 Subject: RFR (XS) 8252362: C2: Remove no-op checking for callee-saved-floats In-Reply-To: References: Message-ID: <697ad989-ba33-7eb4-281e-3763e722fa10@oracle.com> Looks good and trivial. The code was added as part of JDK-6527187 [1], but it was useless from the very beginning. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-6527187 On 26.08.2020 11:06, Aleksey Shipilev wrote: > Cleanup: > ? https://bugs.openjdk.java.net/browse/JDK-8252362 > > The block below does not do anything, because there are no side-effects > anywhere, and then callee_saved_floats is left unused. It is this way > since the initial load. I believe C2 (matching) code just uses SOE/SOC > info from .ad. Anyhow, I cannot find where the rest of runtime codifies > SOE/SOC registers to check here. There are plenty of hand-enumerated > registers in, say, macroAssembler-s. > > I think it is cleaner to remove the block: > > diff -r e12584d50765 src/hotspot/share/opto/c2compiler.cpp > --- a/src/hotspot/share/opto/c2compiler.cpp???? Wed Aug 26 09:29:46 2020 > +0200 > +++ b/src/hotspot/share/opto/c2compiler.cpp???? Wed Aug 26 10:02:48 2020 > +0200 > @@ -64,14 +64,4 @@ > ?? } > > -? // Check that runtime and architecture description agree on > callee-saved-floats > -? bool callee_saved_floats = false; > -? for( OptoReg::Name i=OptoReg::Name(0); > i -??? // Is there a callee-saved float or double? > -??? if( register_save_policy[i] == 'E' /* callee-saved */ && > -?????? (register_save_type[i] == Op_RegF || register_save_type[i] == > Op_RegD) ) { > -????? callee_saved_floats = true; > -??? } > -? } > - > ?? DEBUG_ONLY( Node::init_NodeProperty(); ) > > Testing: local tier1 > From ningsheng.jian at arm.com Wed Aug 26 09:31:41 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Wed, 26 Aug 2020 17:31:41 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <5b452edb-2851-f35a-ac30-523d74d95851@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <9fd1e3b1-7884-1cf7-64ba-040a16c74425@oracle.com> <5b452edb-2851-f35a-ac30-523d74d95851@oracle.com> Message-ID: <15ea964d-6605-7ba4-63bc-e61007407ed8@arm.com> Hi Vladimir, On 8/25/20 8:12 PM, Vladimir Ivanov wrote: > [...] 
> > So, it's enough to use a single "virtual" slot to model XMM, YMM, and > ZMM registers all at once unless RA supports packing multiple smaller > vector values into a single register (separately managing lower and > upper parts of the register; e.g., YMM = XMM(hi):XMM(lo) ). Though > currently RA does support it, there are no code which utilizes that and > no plans to do that in the future. > > I believe the situation on AArch64 with NEON and SVE is similar. (And > scalable vectors make it harder to support packing in RA.) > Right. > ? (2) vector width matters only for spills/refills and reg2reg moves. > > Matcher does type capturing, so all vector mach nodes keep precise type > of the value they produce. On x86 it is heavily used later in code > emission phase, but RA still relies on ideal registers (Op_VecX et al). > I don't see why RA can't be migrated from ideal registers to types > (TypeVect) to determine vector size when performing spilling. > > From aforementioned observations, I conclude there should be a way to > declare a single ideal vector register (Op_Vec) which represents > full-width vector supported by the hardware and use captured vector > types (TypeVect instances) to guide RA and code generation. And that's > the state where I'd like to see vector support in C2 be moving to. > That may be true. I think we can move forward step-by-step for easy maintenance. > Regarding predicate registers, I haven't thought too much about them, so > I don't have a strong opinion about whether they should be a separate > entity (Op_RegVMask in your patch) or just treated as a vector of bits > (Op_Vec). > >>> So far, I see 2 main directions for RA work: >>> >>> ?? (a) support vectors of arbitrary size: >>> ???? (1) helps push the upper limit on the size (1024-bit) >>> ???? (2) handle non-power-of-2 sizes >>> >>> ?? (b) optimize RA implementation for large values >>> >>> Anything else? >>> >> >> Yes, and it's not just vector. SVE predicate register has scalable >> size (vector_size/8) as well. We also have predicate register >> allocator support well with proposed approach (not in this patch.). > > Though with AVX512 support predicate register support was left aside, I > agree that predicate registers should be taken into account from the > very beginning. (And glad to hear you are already working on supporting > them!) > As that's one of the main feature of SVE, we have to do that. :-) With initial SVE support in, our further work on that could be easier. > Also, I believe options #1/#2 may be extended to cover predicate > registers as well without too much effort. > >>> Speaking of (a), in particular, I don't see why possible solution for >>> it should not supersede vecX et al altogether. >>> >>> Also, I may be wrong, but I don't see a clear evidence there's a >>> pressing need to have all of that fixed right from the beginning. >>> (That's why I put #1 and #2 options on the table.) Starting with >>> #1/#2 would untie initial SVE support from the exploratory work >>> needed to choose the most appropriate solution for (a) and (b). >>> >> >> Staring from partial SVE register support might be acceptable for >> initial patch (Andrew may not agree :-)), but I think we may end up >> with more follow-up work, given that our proposed approach already >> supports SVE well in terms of (a) and (b). If there's no other >> solution, would it be possible to use current proposed method? 
It's >> not difficult to backout our changes in register allocation part, if >> we find other better solution to support arbitrary vector/predicate >> sizes in future, as the patch there is actually not big IMO. > > Unfortunately, temporary solutions usually end up as permanent ones > since there's much less motivation to replace them (and harder to > justify the effort) after initial pressure is relieved. > > I'm OK with the proposed patch if we agree it's a stop-the-gap/temporary > solution to the immediate problems you face with initial SVE support and > are ready to commit resources into replacing it. > Yes, we will continue to maintain and improve it. Our idea might be Arm biased :), so we will need collaborations and suggestions from the community. > That's why I think it's the right time to discuss general direction, > work on a plan, and use it to guide the coordinated effort to improve > vector support in C2. > > Also, considering it a stop-the-gap solution means we should strive for > the simplest solution and that's another reason I put #1/#2 options on > the table to consider. > > [...] > >>> Any new problems/hitting some limitations envisioned when spilling >>> large number of huge vectors (2048-bit) on stack? >>> >> >> I haven't seen any so far. > > Ok, good to know. > > I was curious whether stack representation should also move away from > 32-bit slots to a more compact representation. > I think that's possible, if we could also have the alignment handled. Thanks, Ningsheng From shade at redhat.com Wed Aug 26 09:37:47 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Aug 2020 11:37:47 +0200 Subject: RFR (XS) 8252362: C2: Remove no-op checking for callee-saved-floats In-Reply-To: <697ad989-ba33-7eb4-281e-3763e722fa10@oracle.com> References: <697ad989-ba33-7eb4-281e-3763e722fa10@oracle.com> Message-ID: <3df1803d-84f2-b84f-bbd4-859dddd769e7@redhat.com> On 8/26/20 11:30 AM, Vladimir Ivanov wrote: > Looks good and trivial. Ack. I'll wait a bit and then push. > The code was added as part of JDK-6527187 [1], but it was useless from > the very beginning. Ah. Thanks for digging into pre-OpenJDK history. Added that breadcrumb to the JIRA. -- Thanks, -Aleksey From Ningsheng.Jian at arm.com Wed Aug 26 09:43:23 2020 From: Ningsheng.Jian at arm.com (Ningsheng Jian) Date: Wed, 26 Aug 2020 09:43:23 +0000 Subject: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of FLOATPRESSURE In-Reply-To: <003801d67b88$04e60340$0eb209c0$@alibaba-inc.com> References: <001101d67aa5$69851450$3c8f3cf0$@alibaba-inc.com> <003801d67b88$04e60340$0eb209c0$@alibaba-inc.com> Message-ID: Pushed. Regards, Ningsheng > -----Original Message----- > From: Joshua Zhu > Sent: Wednesday, August 26, 2020 5:05 PM > To: 'Andrew Haley' ; hotspot-compiler-dev at openjdk.java.net; > Ningsheng Jian > Cc: aarch64-port-dev at openjdk.java.net > Subject: RE: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default value of > FLOATPRESSURE > > Andrew, thanks a lot for your review. > Ningsheng, could you please help push this change? > > Best Regards, > Joshua > > > -----Original Message----- > > From: Andrew Haley > > Sent: 2020?8?25? 
19:53 > > To: Joshua Zhu ; hotspot-compiler- > > dev at openjdk.java.net > > Cc: aarch64-port-dev at openjdk.java.net > > Subject: Re: [aarch64-port-dev ] RFR: 8252259: AArch64: Adjust default > > value of FLOATPRESSURE > > > > On 25/08/2020 07:03, Joshua Zhu wrote: > > > Therefore I propose the default value of FLOATPRESSURE be 32 because > > > there are 32 float/SIMD registers on aarch64 and also the value of > > > register pressure is the same as 1 for each LRG of > > > Op_RegL/Op_RegD/Op_Vec. [3] > > > > > > Could you please help review this change? > > > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252259 > > > Webrev: http://cr.openjdk.java.net/~jzhu/8252259/webrev.00/ > > > > Yes, thanks. I can't remember why FLOATPRESSURE is 64, but it > > certainly looks like 32 is a much more sensible value. > > > > -- > > Andrew Haley (he/him) > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > https://keybase.io/andrewhaley > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From christian.hagedorn at oracle.com Wed Aug 26 11:10:41 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 26 Aug 2020 13:10:41 +0200 Subject: [16] RFR(S): 8251093: Improve C1 register allocator logging and debugging support In-Reply-To: References: <485452b9-ce64-f0ef-d7ad-324195ae3324@oracle.com> <00af6d16-42c4-5a0f-1ae5-547c28eb6068@oracle.com> <5c6d0dae-a30d-7b32-4503-62b7ec460606@oracle.com> <4b6d3d8a-f2f3-9942-de60-078a0e5c46d9@oracle.com> <8cd1d560-f473-f4f1-a865-70e306d4750f@oracle.com> <0d5fd444-e836-8042-3039-6d16e62ecfb1@oracle.com> Message-ID: <78c28a8c-8a7b-f10d-95e9-e583a278b03c@oracle.com> Hi Tobias Thank you for your review! On 25.08.20 14:37, Tobias Hartmann wrote: > Hi Christian, > > On 19.08.20 16:06, Christian Hagedorn wrote: >> http://cr.openjdk.java.net/~chagedorn/8251093/webrev.02/ > Looks good to me, just noticed some style issues (no new webrev required): > > c1_LinearScan.cpp: > - Wrong indentation in lines 5445, 5509, 5681 Thanks, fixed it inline. > TestTraceLinearScanLevel.java: > - "... in a HelloWorld program". It's not a HelloWorld program, right? ;) Oh, you're right! Should have written "... in a *silent* HelloWorld program" :-) Best regards, Christian From christian.hagedorn at oracle.com Wed Aug 26 12:43:20 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 26 Aug 2020 14:43:20 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <878se17p50.fsf@redhat.com> References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> Message-ID: <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> Hi Roland Looks good and trivial to me. Best regards, Christian On 26.08.20 11:06, Roland Westrelin wrote: > > Should have gone to hotspot-compiler-dev as well... > > -------------------- Start of forwarded message -------------------- > From: Roland Westrelin > To: shenandoah-dev at openjdk.java.net > Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections > Date: Wed, 26 Aug 2020 11:00:55 +0200 > > > http://cr.openjdk.java.net/~roland/8252296/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252296 > > My fix for 8251527 has caused failures with shenandoah enabled because > CallNode::extract_projections() is called with a graph in the process of > being modified where a ProjNode has more than one control use. > > Roland. 
> -------------------- End of forwarded message -------------------- > From adinn at redhat.com Wed Aug 26 12:54:26 2020 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 26 Aug 2020 13:54:26 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> Message-ID: <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Hi Vladimir, On 25/08/2020 14:18, Vladimir Ivanov wrote: > I elaborated on some of the points in the thread with Ningsheng. > > I put my responses in-line, but will try to avoid repeating myself too > much. Thanks for the response and also clarification in replies to Ningsheng. So, if I can summarize (please correct me if I misunderstand): You are as concerned about existing complexity in vector handling as much as complexity added by this patch, whether the latter is to AArch64 code or shared code. The goal you would like to achieve is a single set of rules for a single kind of vector register whose size is parameterized, the appropriate value being derived from each specific vector operation. Your main concern about this patch is that it adds yet another additional vector kind to the current 'wrong' multi-kind vector model and, what is worse, one with a different behaviour, taking us further from your desired goal. Your other concern is that this design does not allow for the AArch64 ISA predication or, indeed, for what you treat uniformly as the 'implicit' predication imposed on a 'logical' max vector size (2048 bits) by the specific AVX/SVE/NEON hardware vector size. > But you should definitely prefer 1-slot design for vector registers then > ;-) Indeed I do :-] So, let me respond to the above summary points, assuming I have them down right. I agree that your end goal is highly desirable. However, we are not there yet and since your attempts to do so have not succeeded so far I don't think that means we are compelled to drop the current patch. As you say this could (and, if it is adopted, should) be regarded as a useful stop-gap until we come up with a unified, parameterized vector implementation that makes it redundant. That said, I'm not pushing hard to keep the patch if the consequence is generating significant work later to undo it. The number of users who might benefit from using SVE vectors from Java now or in the near future does not look like it is going to be very large (if you are not making a lot of use of SVE registers then that is a lot of wasted silicon and I suspect it's going to be the rare case that someone codes an app in Java that needs to make continuous use of SVE -- mind you, by the same token I guess that also applies for AVX on Intel). I'm not sure pushing this now will add a lot more work later. It seems to me that this code is actually moving in the right direction for the sort of solution you want. The AArch64 VecA register /is/ size-parameterized, albeit by a size fixed at startup rather than per operation. So, that's one reason why I don't know if this implies a lot more rework to move towards your desired goal. 
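As a toy illustration of the two models (plain Java, nothing like real C2
code; every name below is invented for the example), the only point is that
the spill size can be derived from the value's vector type instead of from
the register kind itself:

public class VectorSlotModel {
    // Today's model: each ideal register kind carries a fixed width.
    enum FixedKind {
        VecD(64), VecX(128), VecY(256), VecZ(512);
        final int bits;
        FixedKind(int bits) { this.bits = bits; }
        int spillSlots() { return bits / 32; }          // 32-bit stack slots
    }

    // Sketched alternative: one scalable kind, with the width supplied by
    // the value's type (the role TypeVect plays in C2).
    static final class VectValue {
        final int lengthInBytes;
        VectValue(int lengthInBytes) { this.lengthInBytes = lengthInBytes; }
        int spillSlots() { return lengthInBytes * 8 / 32; }
    }

    public static void main(String[] args) {
        System.out.println("VecX spill slots: " + FixedKind.VecX.spillSlots());                   // 4
        System.out.println("2048-bit SVE value spill slots: " + new VectValue(256).spillSlots()); // 64
    }
}

Once the width travels with the type, nothing in spilling or reg-to-reg moves
depends on which fixed-size kind the value happens to be, which is exactly why
a single VecA-style register class looks attractive for scalable vectors.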
Surely, if we do arrive at a unifying vector model that can replace the existing multi-kind vectors then it ought to be able to subsume this code - unless of course it replaces it wholesale. Are you concerned that adding this patch will result in more cases to pick through and correct? Are you worried that we might have to withdraw some of the support this patch enables to arrive at the final goal? Also, Ningsheng and his colleagues have laid some foundations for implementing predicated operations with this patch and have that work in the pipeline. Once again this is moving towards the desired goal even if it might end up doign so in a slightly sideways fashion. Perhaps we could continue this stop-gap experiment as an experimental option in order to learn from the experience? regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Wed Aug 26 14:21:35 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Aug 2020 15:21:35 +0100 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> Message-ID: <5cbb89bb-32c7-8064-a6e9-f9b0d0a2b195@redhat.com> On 25/08/2020 18:30, Boris Ulasevich wrote: > I believe masking with left shift and right shift is not common. > Search though jdk repository does not give such patterns while > there is a hundreds of mask+lshift expressions. > I implemented a simple is_bitrange_zero() method for counting the > bitranges of sub-expressions: power-of-two masks and left shift only. > We can take into account more cases (careful testing is a main > concern). But particularly about "r.a << 24 >>> 24" expression > I think it is worse to think about canonicalization: "left shift + right > shift" to "mask + left shift" (or may be the backwards). I'm running your test program, and for example I get this, old on the left, new on the right. 
Compiled method (c2) 11832 1113 SubTest0::tst2 (184 bytes) : and x11, x2, #0x1 ;*land : and x11, x2, #0x1 : and x10, x1, #0x1 ;*land : and x10, x1, #0x1 : orr x11, x11, x11, lsl #3 : bfi x11, x2, #3, #1 : orr x10, x10, x10, lsl #3 : bfi x10, x1, #3, #1 : and xmethod, x3, #0x1 ;*land : and xmethod, x3, #0x1 : add x10, x10, x11 : bfi xmethod, x3, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : and xmethod, x4, #0x1 ;*land : and x11, x4, #0x1 : add x10, x11, x10 : bfi x11, x4, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : and xmethod, x5, #0x1 ;*land : and xmethod, x5, #0x1 : add x10, x11, x10 : bfi xmethod, x5, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : and xmethod, x6, #0x1 ;*land : and x11, x6, #0x1 : add x10, x11, x10 : bfi x11, x6, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : and xmethod, x7, #0x1 ;*land : and xmethod, x7, #0x1 : add x10, x11, x10 : bfi xmethod, x7, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : and xmethod, x0, #0x1 ;*land : add x10, x10, xmethod : add x10, x11, x10 : ldr x13, [sp,#32] : orr x11, xmethod, xmethod, lsl #3 : and x11, x0, #0x1 : ldr xmethod, [sp,#32] : and xmethod, x13, #0x1 : and xmethod, xmethod, #0x1 : bfi x11, x0, #3, #1 : add x10, x11, x10 : bfi xmethod, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : ldr xmethod, [sp,#40] : ldr x13, [sp,#40] : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 : add x10, x11, x10 : bfi x11, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : ldr xmethod, [sp,#48] : ldr x13, [sp,#48] : and xmethod, xmethod, #0x1 : and xmethod, x13, #0x1 : add x10, x11, x10 : bfi xmethod, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 : ldr xmethod, [sp,#56] : ldr x13, [sp,#56] : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 : add x10, x11, x10 : bfi x11, x13, #3, #1 : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod : add x0, x11, x10 ;*ladd : add x0, x10, x11 I've also tried a bunch of different test cases doing operations that could match BFI instructions, and in only a few of them does it happen. In almost all cases, then, this change does not help, *even your own test case*. I think that you've got something that is potentially useful, but it needs some careful analysis to make sure it actually gets used. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From fw at deneb.enyo.de Wed Aug 26 14:59:26 2020 From: fw at deneb.enyo.de (Florian Weimer) Date: Wed, 26 Aug 2020 16:59:26 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> (Christian Hagedorn's message of "Wed, 26 Aug 2020 14:43:20 +0200") References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> Message-ID: <874koptpv5.fsf@mid.deneb.enyo.de> * Christian Hagedorn: > Looks good and trivial to me. It seems to fix my reproducer, too. Thanks. From lutz.schmidt at sap.com Wed Aug 26 15:20:52 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 26 Aug 2020 15:20:52 +0000 Subject: RFR(M): 8219586: CodeHeap State Analytics processes dead nmethods Message-ID: <6DA47071-83F8-4E02-A6A9-E7FD8B9B5813@sap.com> Dear all, may I please request reviews for this fix/improvement to CodeHeap State Analytics. 
Explained in a nutshell, it removes the last holes through which the analysis
code could potentially access memory which is no longer associated with the
entity being inspected.

There has been a long-lasting, off-list discussion with Erik Österlund until
all pitfalls were identified and agreeable solutions were found. The important
parts of that discussion are reflected in the bug comments.

There are two major changes:
 1) All accesses to the CodeHeap are now protected by continuously holding
    the CodeCache_lock and, in addition, the Compile_lock. Information is
    aggregated in local data structures for later printing without holding
    the above locks.
 2) Printing the names of all code blobs has been disabled except for one
    operation mode where the locks can be held while printing.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8219586
Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8219586.02/

This change has JDK-8250635 (currently out for review) as a prerequisite. It
will not compile without.

Thank you!
Lutz

From lutz.schmidt at sap.com  Wed Aug 26 15:18:31 2020
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Wed, 26 Aug 2020 15:18:31 +0000
Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in
 favour of fancy checks
Message-ID: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com>

Dear all,

may I please request reviews for this small enhancement?

Instead of calling a method doing complicated and fancy (hard to understand)
checks, the iteration over all nmethods is now protected by holding the
Compile_lock in addition to the CodeCache_lock.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8250635
Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/

Thank you!
Lutz

From martin.doerr at sap.com  Wed Aug 26 15:26:59 2020
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 26 Aug 2020 15:26:59 +0000
Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
 API for Base64 decoding
In-Reply-To: 
References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com>
Message-ID: 

Hi Corey,

I should explain my comments regarding Base64.java better.

> Let's be precise: "should process a multiple of four" => "must process a
> multiple of four"
Did you try to support non-multiple of 4 and this was intended as recommendation?
I think making it a requirement and simplifying the logic in decode0 is better.
Or what's the benefit of the recommendation?

> > If any illegal base64 bytes are encountered in the source by the
> > intrinsic, the intrinsic can return a data length of zero or any
> > number of bytes before the place where the illegal base64 byte
> > was encountered.
> I think this has a drawback. Somebody may use a debugger and want to stop
> when throwing IllegalArgumentException. He should see the position which
> matches the Java implementation.
This is probably hard to understand. Let me try to explain it by example:
1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the destination array.
2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed by your specification.
3. The compiled method containing the intrinsic hits a safepoint (e.g. in the large while loop in decodeBlockSlow).
4. A JVMTI agent (debugger) reads dp and dst.
5. The person using the debugger gets angry because more bytes than dp were written into dst. The JVM didn't follow the specified behavior.

I guess we can and should avoid it by specifying that the intrinsic needs to
return the dp value matching the number of Bytes written.
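To make that contract concrete, here is a small pure-Java sketch of a
decodeBlock with the required property (this is not the webrev code; the
signature is simplified -- no isURL/isMIME parameters, no '=' padding, plain
base64 alphabet only -- and all names are made up for the illustration). The
one thing it demonstrates is that the returned count never exceeds the number
of bytes actually stored into dst:

import java.nio.charset.StandardCharsets;

public class DecodeBlockSketch {

    private static final int[] FROM_BASE64 = new int[256];
    static {
        java.util.Arrays.fill(FROM_BASE64, -1);
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            FROM_BASE64[alphabet.charAt(i)] = i;
        }
    }

    // Decodes whole 4-byte groups from src[sp, sl) into dst starting at dp.
    // Returns the number of bytes written; on the first illegal base64 byte
    // it stops and still reports exactly what has been stored so far.
    static int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp) {
        int written = 0;
        while (sp + 4 <= sl) {
            int b0 = FROM_BASE64[src[sp]     & 0xff];
            int b1 = FROM_BASE64[src[sp + 1] & 0xff];
            int b2 = FROM_BASE64[src[sp + 2] & 0xff];
            int b3 = FROM_BASE64[src[sp + 3] & 0xff];
            if ((b0 | b1 | b2 | b3) < 0) {
                return written;          // never claim more than was written
            }
            int bits = (b0 << 18) | (b1 << 12) | (b2 << 6) | b3;
            dst[dp + written]     = (byte) (bits >> 16);
            dst[dp + written + 1] = (byte) (bits >> 8);
            dst[dp + written + 2] = (byte) bits;
            written += 3;
            sp += 4;
        }
        return written;
    }

    public static void main(String[] args) {
        byte[] src = "SGVsbG8h".getBytes(StandardCharsets.US_ASCII); // "Hello!"
        byte[] dst = new byte[6];
        int n = decodeBlock(src, 0, src.length, dst, 0);
        System.out.println(n + " bytes: " + new String(dst, 0, n, StandardCharsets.US_ASCII));
    }
}

An intrinsic written to the same rule can bail out at any point (misaligned
tail, unsupported mode, illegal byte) and a JVMTI agent reading dp and dst
will still see a consistent pair.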
Best regards, Martin > -----Original Message----- > From: Doerr, Martin > Sent: Dienstag, 25. August 2020 15:38 > To: Corey Ashford ; Michihiro Horie > > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev dev at openjdk.java.net>; Kazunori Ogata ; > joserz at br.ibm.com > Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > API for Base64 decoding > > Hi Corey, > > thanks for proposing this change. I have comments and suggestions > regarding various files. > > > Base64.java > > This is the only file which needs another review from core-libs-dev. > First of all, I like the idea to use a HotSpotIntrinsicCandidate which can > consume as many bytes as the implementation wants. > > Comment before decodeBlock: > Let's be precise: "should process a multiple of four" => "must process a > multiple of four" > > > If any illegal base64 bytes are encountered in the source by the > > intrinsic, the intrinsic can return a data length of zero or any > > number of bytes before the place where the illegal base64 byte > > was encountered. > I think this has a drawback. Somebody may use a debugger and want to stop > when throwing IllegalArgumentException. He should see the position which > matches the Java implementation. > > Please note that the comment indentation differs from other comments. > > decode0: Final "else" after return is redundant. > > > stubGenerator_ppc.cpp > > "__vector" breaks AIX build! > Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? > Please either support Big Endian properly or #ifdef it out. > What exactly does it on linux? > I remember that we had tried such prefixes but were not satisfied. I think it > didn't enforce 16 Byte alignment if I remember correctly. > > Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- > 8086069). So the argument registers for offset, length and isURL may contain > garbage in the higher bits. > > You may want to use load_const_optimized which produces shorter code. > > You may want to use __ align(32) to align unrolled_loop_start. > > I'll review the algorithm in detail when I find more time. > > > assembler_ppc.hpp > assembler_ppc.inline.hpp > vm_version_ppc.cpp > vm_version_ppc.hpp > Please rebase. Parts of the change were pushed as part of 8248190: Enable > Power10 system and implement new byte-reverse instructions > > > vmSymbols.hpp > Indentation looks odd at the end. > > > library_call.cpp > Good. Indentation style of the call parameters differs from encodeBlock. > > > runtime.cpp > Good. > > > aotCodeHeap.cpp > vmSymbols.cpp > shenandoahSupport.cpp > vmStructs_jvmci.cpp > shenandoahSupport.cpp > escape.cpp > runtime.hpp > stubRoutines.cpp > stubRoutines.hpp > vmStructs.cpp > Good and trivial. > > > Tests: > I think we should have JTREG tests to check for regressions in the future. > > Best regards, > Martin > > > > -----Original Message----- > > From: Corey Ashford > > Sent: Mittwoch, 19. August 2020 20:11 > > To: Michihiro Horie > > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev > dev at openjdk.java.net>; Kazunori Ogata ; > > joserz at br.ibm.com; Doerr, Martin > > Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > > API for Base64 decoding > > > > Michihiro Horie posted up a new iteration of this webrev for me. This > > time the webrev includes a complete implementation of the intrinsic for > > Power9 and Power10. > > > > You can find it here: > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ > > > > Changes in webrev.02 vs. 
webrev.01: > > > > * The method header for the intrinsic in the Base64 code has been > > rewritten using the Javadoc style. The clarity of the comments has been > > improved and some verbosity has been removed. There are no additional > > functional changes to Base64.java. > > > > * The code needed to martial and check the intrinsic parameters has > > been added, using the base64 encodeBlock intrinsic as a guideline. > > > > * A complete intrinsic implementation for Power9 and Power10 is > included. > > > > * Adds some Power9 and Power10 assembler instructions needed by the > > intrinsic which hadn't been defined before. > > > > The intrinsic implementation in this patch accelerates the decoding of > > large blocks of base64 data by a factor of about 3.5X on Power9. > > > > I'm attaching two Java test cases I am using for testing and > > benchmarking. The TestBase64_VB encodes and decodes randomly-sized > > buffers of random data and checks that original data matches the > > encoded-then-decoded data. TestBase64Errors encodes a 48K block of > > random bytes, then corrupts each byte of the encoded data, one at a > > time, checking to see if the decoder catches the illegal byte. > > > > Any comments/suggestions would be appreciated. > > > > Thanks, > > > > - Corey > > > > On 7/27/20 6:49 PM, Corey Ashford wrote: > > > Michihiro Horie uploaded a new revision of the Base64 decodeBlock > > > intrinsic API for me: > > > > > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ > > > > > > It has the following changes with respect to the original one posted: > > > > > > ?* In the event of encountering a non-base64 character, instead of > > > having a separate error code of -1, the intrinsic can now just return > > > either 0, or the number of data bytes produced up to the point where > the > > > illegal base64 character was encountered.? This reduces the number of > > > special cases, and also provides a way to speed up the process of > > > finding the bad character by the slower, pure-Java algorithm. > > > > > > ?* The isMIME boolean is removed from the API for two reasons: > > > ?? - The current API is not sufficient to handle the isMIME case, > > > because there isn't a strict relationship between the number of input > > > bytes and the number of output bytes, because there can be an arbitrary > > > number of non-base64 characters in the source. > > > ?? - If an intrinsic only implements the (isMIME == false) case as ours > > > does, it will always return 0 bytes processed, which will slightly slow > > > down the normal path of processing an (isMIME == true) instantiation. > > > ?? - We considered adding a separate hotspot candidate for the (isMIME > > > == true) case, but since we don't have an intrinsic implementation to > > > test that, we decided to leave it as a future optimization. > > > > > > Comments and suggestions are welcome.? Thanks for your consideration. > > > > > > - Corey > > > > > > On 6/23/20 6:23 PM, Michihiro Horie wrote: > > >> Hi Corey, > > >> > > >> Following is the issue I created. > > >> https://bugs.openjdk.java.net/browse/JDK-8248188 > > >> > > >> I will upload a webrev when you're ready as we talked in private. 
> > >> > > >> Best regards, > > >> Michihiro > > >> > > >> Inactive hide details for "Corey Ashford" ---2020/06/24 > > >> 09:40:10---Currently in java.util.Base64, there is a > > >> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently > > >> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for > > >> encodeBlock, but no > > >> > > >> From: "Corey Ashford" > > >> To: "hotspot-compiler-dev at openjdk.java.net" > > >> , > > >> "ppc-aix-port-dev at openjdk.java.net" > dev at openjdk.java.net> > > >> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori > > Ogata/Japan/IBM at IBMJP, > > >> joserz at br.ibm.com > > >> Date: 2020/06/24 09:40 > > >> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for > > >> Base64 decoding > > >> > > >> ------------------------------------------------------------------------ > > >> > > >> > > >> > > >> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and > > >> API for encodeBlock, but none for decoding. ?This means that only > > >> encoding gets acceleration from the underlying CPU's vector hardware. > > >> > > >> I'd like to propose adding a new intrinsic for decodeBlock. ?The > > >> considerations I have for this new intrinsic's API: > > >> > > >> ??* Don't make any assumptions about the underlying capability of the > > >> hardware. ?For example, do not impose any specific block size > > >> granularity. > > >> > > >> ??* Don't assume the underlying intrinsic can handle isMIME or isURL > > >> modes, but also let them decide if they will process the data regardless > > >> of the settings of the two booleans. > > >> > > >> ??* Any remaining data that is not processed by the intrinsic will be > > >> processed by the pure Java implementation. ?This allows the intrinsic to > > >> process whatever block sizes it's good at without the complexity of > > >> handling the end fragments. > > >> > > >> ??* If any illegal character is discovered in the decoding process, the > > >> intrinsic will simply return -1, instead of requiring it to throw a > > >> proper exception from the context of the intrinsic. ?In the event of > > >> getting a -1 returned from the intrinsic, the Java Base64 library code > > >> simply calls the pure Java implementation to have it find the error and > > >> properly throw an exception. ?This is a performance trade-off in the > > >> case of an error (which I expect to be very rare). > > >> > > >> ??* One thought I have for a further optimization (not implemented in > > >> the current patch), is that when the intrinsic decides not to process a > > >> block because of some combination of isURL and isMIME settings it > > >> doesn't handle, it could return extra bits in the return code, encoded > > >> as a negative number. ?For example: > > >> > > >> Illegal_Base64_char ? = 0b001; > > >> isMIME_unsupported ? ?= 0b010; > > >> isURL_unsupported ? ? = 0b100; > > >> > > >> These can be OR'd together as needed and then negated (flip the sign). > > >> The Base64 library code could then cache these flags, so it will know > > >> not to call the intrinsic again when another decodeBlock is requested > > >> but with an unsupported mode. ?This will save the performance hit of > > >> calling the intrinsic when it is guaranteed to fail. > > >> > > >> I've tested the attached patch with an actual intrinsic coded up for > > >> Power9/Power10, but those runtime intrinsics and arch-specific patches > > >> aren't attached today. ?I want to get some consensus on the > > >> library-level intrinsic API first. 
> > >> > > >> Also attached is a simple test case to test that the new intrinsic API > > >> doesn't break anything. > > >> > > >> I'm open to any comments about this. > > >> > > >> Thanks for your consideration, > > >> > > >> - Corey > > >> > > >> > > >> Corey Ashford > > >> IBM Systems, Linux Technology Center, OpenJDK team > > >> cjashfor at us dot ibm dot com > > >> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro > > >> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro > > >> Horie/Japan/IBM] > > >> > > >> > > > From vladimir.kozlov at oracle.com Wed Aug 26 16:44:59 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 09:44:59 -0700 Subject: RFR (S) 8252215: Remove VerifyOptoOopOffsets flag In-Reply-To: References: <96144e25-02b7-ed81-285e-b8d487fd6cfb@redhat.com> Message-ID: I agree. It does not even check that the field in particular offset is oop. It just check that there is a field for which we have a ton of other checks. Also in shenandoahBarrierSetC2.cpp it check tp == NULL in assert after code already referenced through it! Thanks, Vladimir On 8/25/20 5:43 AM, Tobias Hartmann wrote: > Hi Aleksey, > > looks good to me. > > Best regards, > Tobias > > On 25.08.20 09:08, Aleksey Shipilev wrote: >> RFE: >> ? https://bugs.openjdk.java.net/browse/JDK-8252215 >> >> VerifyOptoOopOffsets flag does not seem to be used (no tests in the current test base), and it does >> not seem to work reliably (see JDK-4834891). It might be a good time to remove it. JDK-4834891 >> evaluation says: "The flag VerifyOptoOopOffsets has not been valid since the introduction of >> sun/misc/Unsafe and the flag should not be used for general testing." >> >> How about we remove it? >> ? https://cr.openjdk.java.net/~shade/8252215/webrev.01/ >> >> Testing: tier1 (locally); jdk-submit (still running?) >> From cjashfor at linux.ibm.com Wed Aug 26 16:50:05 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 26 Aug 2020 09:50:05 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Thanks for your careful review, Martin. I will consider what you have said, and reply with comments/questions and possibly a revised webrev if I think I can satisfy your concerns. Regards, - Corey On 8/26/20 8:26 AM, Doerr, Martin wrote: > Hi Corey, > > I should explain my comments regarding Base64.java better. > >> Let's be precise: "should process a multiple of four" => "must process a >> multiple of four" > Did you try to support non-multiple of 4 and this was intended as recommendation? > I think making it a requirement and simplifying the logic in decode0 is better. > Or what's the benefit of the recommendation? > >>> If any illegal base64 bytes are encountered in the source by the >>> intrinsic, the intrinsic can return a data length of zero or any >>> number of bytes before the place where the illegal base64 byte >>> was encountered. >> I think this has a drawback. Somebody may use a debugger and want to stop >> when throwing IllegalArgumentException. He should see the position which >> matches the Java implementation. > This is probably hard to understand. Let me try to explain it by example: > 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the destination array. > 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed by your specification. > 3. 
The compiled method containing the intrinsic hits a safepoint (e.g. in the large while loop in decodeBlockSlow). > 4. A JVMTI agent (debugger) reads dp and dst. > 5. The person using the debugger gets angry because more bytes than dp were written into dst. The JVM didn't follow the specified behavior. > > I guess we can and should avoid it by specifying that the intrinsic needs to return the dp value matching the number of Bytes written. > > Best regards, > Martin > > >> -----Original Message----- >> From: Doerr, Martin >> Sent: Dienstag, 25. August 2020 15:38 >> To: Corey Ashford ; Michihiro Horie >> >> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev > dev at openjdk.java.net>; Kazunori Ogata ; >> joserz at br.ibm.com >> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and >> API for Base64 decoding >> >> Hi Corey, >> >> thanks for proposing this change. I have comments and suggestions >> regarding various files. >> >> >> Base64.java >> >> This is the only file which needs another review from core-libs-dev. >> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can >> consume as many bytes as the implementation wants. >> >> Comment before decodeBlock: >> Let's be precise: "should process a multiple of four" => "must process a >> multiple of four" >> >>> If any illegal base64 bytes are encountered in the source by the >>> intrinsic, the intrinsic can return a data length of zero or any >>> number of bytes before the place where the illegal base64 byte >>> was encountered. >> I think this has a drawback. Somebody may use a debugger and want to stop >> when throwing IllegalArgumentException. He should see the position which >> matches the Java implementation. >> >> Please note that the comment indentation differs from other comments. >> >> decode0: Final "else" after return is redundant. >> >> >> stubGenerator_ppc.cpp >> >> "__vector" breaks AIX build! >> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? >> Please either support Big Endian properly or #ifdef it out. >> What exactly does it on linux? >> I remember that we had tried such prefixes but were not satisfied. I think it >> didn't enforce 16 Byte alignment if I remember correctly. >> >> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- >> 8086069). So the argument registers for offset, length and isURL may contain >> garbage in the higher bits. >> >> You may want to use load_const_optimized which produces shorter code. >> >> You may want to use __ align(32) to align unrolled_loop_start. >> >> I'll review the algorithm in detail when I find more time. >> >> >> assembler_ppc.hpp >> assembler_ppc.inline.hpp >> vm_version_ppc.cpp >> vm_version_ppc.hpp >> Please rebase. Parts of the change were pushed as part of 8248190: Enable >> Power10 system and implement new byte-reverse instructions >> >> >> vmSymbols.hpp >> Indentation looks odd at the end. >> >> >> library_call.cpp >> Good. Indentation style of the call parameters differs from encodeBlock. >> >> >> runtime.cpp >> Good. >> >> >> aotCodeHeap.cpp >> vmSymbols.cpp >> shenandoahSupport.cpp >> vmStructs_jvmci.cpp >> shenandoahSupport.cpp >> escape.cpp >> runtime.hpp >> stubRoutines.cpp >> stubRoutines.hpp >> vmStructs.cpp >> Good and trivial. >> >> >> Tests: >> I think we should have JTREG tests to check for regressions in the future. >> >> Best regards, >> Martin >> >> >>> -----Original Message----- >>> From: Corey Ashford >>> Sent: Mittwoch, 19. 
August 2020 20:11 >>> To: Michihiro Horie >>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev >> dev at openjdk.java.net>; Kazunori Ogata ; >>> joserz at br.ibm.com; Doerr, Martin >>> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and >>> API for Base64 decoding >>> >>> Michihiro Horie posted up a new iteration of this webrev for me. This >>> time the webrev includes a complete implementation of the intrinsic for >>> Power9 and Power10. >>> >>> You can find it here: >>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >>> >>> Changes in webrev.02 vs. webrev.01: >>> >>> * The method header for the intrinsic in the Base64 code has been >>> rewritten using the Javadoc style. The clarity of the comments has been >>> improved and some verbosity has been removed. There are no additional >>> functional changes to Base64.java. >>> >>> * The code needed to martial and check the intrinsic parameters has >>> been added, using the base64 encodeBlock intrinsic as a guideline. >>> >>> * A complete intrinsic implementation for Power9 and Power10 is >> included. >>> >>> * Adds some Power9 and Power10 assembler instructions needed by the >>> intrinsic which hadn't been defined before. >>> >>> The intrinsic implementation in this patch accelerates the decoding of >>> large blocks of base64 data by a factor of about 3.5X on Power9. >>> >>> I'm attaching two Java test cases I am using for testing and >>> benchmarking. The TestBase64_VB encodes and decodes randomly-sized >>> buffers of random data and checks that original data matches the >>> encoded-then-decoded data. TestBase64Errors encodes a 48K block of >>> random bytes, then corrupts each byte of the encoded data, one at a >>> time, checking to see if the decoder catches the illegal byte. >>> >>> Any comments/suggestions would be appreciated. >>> >>> Thanks, >>> >>> - Corey >>> >>> On 7/27/20 6:49 PM, Corey Ashford wrote: >>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>>> intrinsic API for me: >>>> >>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>>> >>>> It has the following changes with respect to the original one posted: >>>> >>>> ?* In the event of encountering a non-base64 character, instead of >>>> having a separate error code of -1, the intrinsic can now just return >>>> either 0, or the number of data bytes produced up to the point where >> the >>>> illegal base64 character was encountered.? This reduces the number of >>>> special cases, and also provides a way to speed up the process of >>>> finding the bad character by the slower, pure-Java algorithm. >>>> >>>> ?* The isMIME boolean is removed from the API for two reasons: >>>> ?? - The current API is not sufficient to handle the isMIME case, >>>> because there isn't a strict relationship between the number of input >>>> bytes and the number of output bytes, because there can be an arbitrary >>>> number of non-base64 characters in the source. >>>> ?? - If an intrinsic only implements the (isMIME == false) case as ours >>>> does, it will always return 0 bytes processed, which will slightly slow >>>> down the normal path of processing an (isMIME == true) instantiation. >>>> ?? - We considered adding a separate hotspot candidate for the (isMIME >>>> == true) case, but since we don't have an intrinsic implementation to >>>> test that, we decided to leave it as a future optimization. >>>> >>>> Comments and suggestions are welcome.? Thanks for your consideration. 
>>>> >>>> - Corey >>>> >>>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>>> Hi Corey, >>>>> >>>>> Following is the issue I created. >>>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>>> >>>>> I will upload a webrev when you're ready as we talked in private. >>>>> >>>>> Best regards, >>>>> Michihiro >>>>> >>>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently >>>>> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for >>>>> encodeBlock, but no >>>>> >>>>> From: "Corey Ashford" >>>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>>> , >>>>> "ppc-aix-port-dev at openjdk.java.net" >> dev at openjdk.java.net> >>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori >>> Ogata/Japan/IBM at IBMJP, >>>>> joserz at br.ibm.com >>>>> Date: 2020/06/24 09:40 >>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>>> Base64 decoding >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> >>>>> >>>>> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and >>>>> API for encodeBlock, but none for decoding. ?This means that only >>>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>>> >>>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>>> considerations I have for this new intrinsic's API: >>>>> >>>>> ??* Don't make any assumptions about the underlying capability of the >>>>> hardware. ?For example, do not impose any specific block size >>>>> granularity. >>>>> >>>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>>> modes, but also let them decide if they will process the data regardless >>>>> of the settings of the two booleans. >>>>> >>>>> ??* Any remaining data that is not processed by the intrinsic will be >>>>> processed by the pure Java implementation. ?This allows the intrinsic to >>>>> process whatever block sizes it's good at without the complexity of >>>>> handling the end fragments. >>>>> >>>>> ??* If any illegal character is discovered in the decoding process, the >>>>> intrinsic will simply return -1, instead of requiring it to throw a >>>>> proper exception from the context of the intrinsic. ?In the event of >>>>> getting a -1 returned from the intrinsic, the Java Base64 library code >>>>> simply calls the pure Java implementation to have it find the error and >>>>> properly throw an exception. ?This is a performance trade-off in the >>>>> case of an error (which I expect to be very rare). >>>>> >>>>> ??* One thought I have for a further optimization (not implemented in >>>>> the current patch), is that when the intrinsic decides not to process a >>>>> block because of some combination of isURL and isMIME settings it >>>>> doesn't handle, it could return extra bits in the return code, encoded >>>>> as a negative number. ?For example: >>>>> >>>>> Illegal_Base64_char ? = 0b001; >>>>> isMIME_unsupported ? ?= 0b010; >>>>> isURL_unsupported ? ? = 0b100; >>>>> >>>>> These can be OR'd together as needed and then negated (flip the sign). >>>>> The Base64 library code could then cache these flags, so it will know >>>>> not to call the intrinsic again when another decodeBlock is requested >>>>> but with an unsupported mode. ?This will save the performance hit of >>>>> calling the intrinsic when it is guaranteed to fail. 
>>>>> >>>>> I've tested the attached patch with an actual intrinsic coded up for >>>>> Power9/Power10, but those runtime intrinsics and arch-specific patches >>>>> aren't attached today. ?I want to get some consensus on the >>>>> library-level intrinsic API first. >>>>> >>>>> Also attached is a simple test case to test that the new intrinsic API >>>>> doesn't break anything. >>>>> >>>>> I'm open to any comments about this. >>>>> >>>>> Thanks for your consideration, >>>>> >>>>> - Corey >>>>> >>>>> >>>>> Corey Ashford >>>>> IBM Systems, Linux Technology Center, OpenJDK team >>>>> cjashfor at us dot ibm dot com >>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro >>>>> Horie/Japan/IBM] >>>>> >>>>> >>>> > From vladimir.kozlov at oracle.com Wed Aug 26 16:59:43 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 09:59:43 -0700 Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator In-Reply-To: References: <30d68060-d518-a2c6-f853-9e870d48f0ad@redhat.com> Message-ID: <30db6ea6-cf81-4fb8-b43f-3a275fa7acab@oracle.com> On 8/25/20 2:28 AM, Reingruber, Richard wrote: > Hi Aleksey, > > the cleanup looks good to me. +1 > > That enum was already part of the initial load with xxxunusedxxx as the only element [1]. > So there's no open version history. > > I could not find any references either (rtags, grep). Probably the enum had more elements > originally which were removed. Nope. Old history shows that it was like this from time when callGenerator.hpp was created. I assume it is leftover from C2 implementation work. Regards, Vladimir K > > Thanks, Richard. > > [1] https://github.com/openjdk/jdk/blame/d4626d89cc778b8b7108036f389548c95d52e56a/src/hotspot/share/opto/callGenerator.hpp#L41 > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Aleksey Shipilev > Sent: Dienstag, 25. August 2020 09:29 > To: hotspot compiler > Subject: RFR (XS) 8252290: C2: Remove unused enum in CallGenerator > > Small cleanup: > https://bugs.openjdk.java.net/browse/JDK-8252290 > > Static code inspection complains the enum below is unused. > > diff -r 13fdf97f0a8f src/hotspot/share/opto/callGenerator.hpp > --- a/src/hotspot/share/opto/callGenerator.hpp Mon Aug 24 09:35:23 2020 +0200 > +++ b/src/hotspot/share/opto/callGenerator.hpp Tue Aug 25 09:27:45 2020 +0200 > @@ -37,9 +37,4 @@ > > class CallGenerator : public ResourceObj { > - public: > - enum { > - xxxunusedxxx > - }; > - > private: > ciMethod* _method; // The method being called. > > Testing: grepping for "xxxunusedxxx", local builds > From vladimir.kozlov at oracle.com Wed Aug 26 17:10:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 10:10:42 -0700 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: References: <87wo1n6snc.fsf@redhat.com> Message-ID: <09a82d80-208c-6cea-da6b-e501d65e0f79@oracle.com> On 8/25/20 6:49 AM, Tobias Hartmann wrote: > Hi Roland, > > Good catch, the fix looks reasonable to me. +1 Thanks, Vladimir K > > I think the test needs a @requires vm.gc == "Parallel" | vm.gc == "null" to not fail due to > conflicting GC options if another GC is set. > > Best regards, > Tobias > > On 25.08.20 10:23, Roland Westrelin wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8252292 >> http://cr.openjdk.java.net/~roland/8252292/webrev.00/ >> >> In 8240795, I modified alias analysis so non escaping allocations don't >> alias with bottom memory. 
While browsing that code last week, I noticed >> that that change didn't seem quite right and may cause some >> anti-dependences to be missed. I could indeed write a test case that >> fails with an incorrect execution. >> >> In the test case: the dst[9] load after the ArrayCopy is transformed >> into a src[9] load before the ArrayCopy. Anti dependence analysis find >> src[9] shares the memory of the ArrayCopy but because of the way I >> tweaked the code with 8240795, anti-dependence analysis finds the src[9] >> and ArrayCopy don't alias so src[9] can sink out of the loop which is >> wrong because of the src[9] store. Anti-dependence analysis in that case >> would need to look at the memory uses of ArrayCopy too. >> >> Roland. >> From vladimir.kozlov at oracle.com Wed Aug 26 18:07:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 11:07:38 -0700 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: References: <87tuwr6s5j.fsf@redhat.com> Message-ID: +1 Thanks, Vladimir K On 8/25/20 6:18 AM, Tobias Hartmann wrote: > > On 25.08.20 14:57, Tobias Hartmann wrote: >>> * @requires vm.gc.G1 & vm.gc.Shenandoah & vm.gc.Z & vm.gc.Epsilon >> That doesn't look right. The test would never be executed. > > Sorry, confused it with the vm.gc == .. check. You are just checking if the VM supports the GC. > > Looks good to me. > > Best regards, > Tobias > From honguye at microsoft.com Wed Aug 26 18:55:07 2020 From: honguye at microsoft.com (Nhat Nguyen) Date: Wed, 26 Aug 2020 18:55:07 +0000 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes Message-ID: Hi hotspot-compiler-dev, Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271 The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead. I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended. webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ Thank you, Nhat From jingxinc at amazon.com Wed Aug 26 21:36:52 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Wed, 26 Aug 2020 21:36:52 +0000 Subject: RFR 8239090: Improve CPU feature support in VM_version Message-ID: <21DF2FC1-7D91-4D2A-87EB-8F42EA1E276D@amazon.com> Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~xliu/eric/8213777/01/webrev/ JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. Regards, Eric Chen From cjashfor at linux.ibm.com Wed Aug 26 22:17:25 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Wed, 26 Aug 2020 15:17:25 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Hi Martin, Some inline responses below. On 8/26/20 8:26 AM, Doerr, Martin wrote: > Hi Corey, > > I should explain my comments regarding Base64.java better. > >> Let's be precise: "should process a multiple of four" => "must process a >> multiple of four" > Did you try to support non-multiple of 4 and this was intended as recommendation? > I think making it a requirement and simplifying the logic in decode0 is better. > Or what's the benefit of the recommendation? 
If I make a requirement, I feel decode0 should check that the requirement is
met, and raise some kind of internal error if it isn't. That actually was my
first implementation, but I received some comments during an internal review
suggesting that I just "round down" the destination count to the closest
multiple of 3 less than or equal to the returned value, rather than throw an
internal exception which would confuse users. This "enforces" the rule, in
some sense, without error handling. Do you have some thoughts about this?

>
>>> If any illegal base64 bytes are encountered in the source by the
>>> intrinsic, the intrinsic can return a data length of zero or any
>>> number of bytes before the place where the illegal base64 byte
>>> was encountered.
>> I think this has a drawback. Somebody may use a debugger and want to stop
>> when throwing IllegalArgumentException. He should see the position which
>> matches the Java implementation.
> This is probably hard to understand. Let me try to explain it by example:
> 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the destination array.
> 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed by your specification.
> 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the large while loop in decodeBlockSlow).
> 4. A JVMTI agent (debugger) reads dp and dst.
> 5. The person using the debugger gets angry because more bytes than dp were written into dst. The JVM didn't follow the specified behavior.
>
> I guess we can and should avoid it by specifying that the intrinsic needs to return the dp value matching the number of Bytes written.

That's an interesting point. I will change the specification, and the
intrinsic implementation. Right now the Power9/10 intrinsic returns 0 when
any illegal character is discovered, but I've been thinking about returning
the number of bytes already written, which will allow decodeBlockSlow to more
quickly find the offending character. This provides another good reason to
make that change.

>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Doerr, Martin
>> Sent: Dienstag, 25. August 2020 15:38
>> To: Corey Ashford ; Michihiro Horie
>>
>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev
> dev at openjdk.java.net>; Kazunori Ogata ;
>> joserz at br.ibm.com
>> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
>> API for Base64 decoding
>>
>> Hi Corey,
>>
>> thanks for proposing this change. I have comments and suggestions
>> regarding various files.
>>
>>
>> Base64.java
>>
>> This is the only file which needs another review from core-libs-dev.
>> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can
>> consume as many bytes as the implementation wants.
>>
>> Comment before decodeBlock:
>> Let's be precise: "should process a multiple of four" => "must process a
>> multiple of four"
>>
>>> If any illegal base64 bytes are encountered in the source by the
>>> intrinsic, the intrinsic can return a data length of zero or any
>>> number of bytes before the place where the illegal base64 byte
>>> was encountered.
>> I think this has a drawback. Somebody may use a debugger and want to stop
>> when throwing IllegalArgumentException. He should see the position which
>> matches the Java implementation.
>>
>> Please note that the comment indentation differs from other comments.

Will fix.

>>
>> decode0: Final "else" after return is redundant.

Will fix.
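For what it's worth, the "round down" guard mentioned at the top of this
reply needs only a couple of lines. A standalone sketch (hypothetical names;
in the real code this would sit at the decodeBlock call site in
java.util.Base64.decode0):

public class RoundDownSketch {
    // Keep only whole 3-byte output groups, so decoding resumes on a clean
    // 4-byte input boundary no matter what count the intrinsic reported.
    static int usableFromIntrinsic(int reported) {
        return reported - (reported % 3);
    }

    public static void main(String[] args) {
        int dl = usableFromIntrinsic(59);   // a stray report of 59 is treated as 57
        System.out.println(dl + " bytes accepted, " + (dl / 3 * 4) + " source bytes consumed");
    }
}

The caller would then advance dp by the rounded value and sp by 4/3 of it,
which keeps the slow path's view of the source position consistent as well.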
>> >> >> stubGenerator_ppc.cpp >> >> "__vector" breaks AIX build! >> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? >> Please either support Big Endian properly or #ifdef it out. I have been compiling with only Advance Toolchain 13, which is 9.3.1, and only on Linux. It will not work with big endian, so it won't work on AIX, however obviously it shouldn't break the AIX build, so I will address that. There's code to set UseBASE64Intrinsics to false on big endian, but you're right -- I should ifdef all of the intrinsic code for little endian for now. Getting it to work on big endian / AIX shouldn't be difficult, but it's not in my scope of work at the moment. I will double check that everything compiles and runs properly with gcc 7.3.1. >> What exactly does it (do) on linux? It's an arch-specific type that's 16 bytes in size and aligned on a 16-byte boundary. >> I remember that we had tried such prefixes but were not satisfied. I think it >> didn't enforce 16 Byte alignment if I remember correctly. I will use __attribute__ ((align(16))) instead of __vector, and make them arrays of 16 unsigned char. >> >> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- >> 8086069). So the argument registers for offset, length and isURL may contain >> garbage in the higher bits. Wow, that's good to know! I will mask off the incoming values. >> >> You may want to use load_const_optimized which produces shorter code. Will fix. >> >> You may want to use __ align(32) to align unrolled_loop_start. Will fix. >> >> I'll review the algorithm in detail when I find more time. >> >> >> assembler_ppc.hpp >> assembler_ppc.inline.hpp >> vm_version_ppc.cpp >> vm_version_ppc.hpp >> Please rebase. Parts of the change were pushed as part of 8248190: Enable >> Power10 system and implement new byte-reverse instructions Will do. >> >> >> vmSymbols.hpp >> Indentation looks odd at the end. I was following what was done for encodeBlock, but it appears encodeBlock's style isn't what is used for the other intrinsics. I will correct decodeBlock to use the prevailing style. Another patch should be added (not part of this webrev) to correct encodeBlock's style. >> >> >> library_call.cpp >> Good. Indentation style of the call parameters differs from encodeBlock. Will fix. >> >> >> runtime.cpp >> Good. >> >> >> aotCodeHeap.cpp >> vmSymbols.cpp >> shenandoahSupport.cpp >> vmStructs_jvmci.cpp >> shenandoahSupport.cpp >> escape.cpp >> runtime.hpp >> stubRoutines.cpp >> stubRoutines.hpp >> vmStructs.cpp >> Good and trivial. >> >> >> Tests: >> I think we should have JTREG tests to check for regressions in the future. Ah, this is another thing I didn't know about. I will make some regression tests. Thanks for your time on this. As you can tell, I'm inexperienced in writing openjdk code, so your patience and careful review is really appreciated. - Corey From vladimir.kozlov at oracle.com Wed Aug 26 23:31:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 16:31:19 -0700 Subject: RFR 8164632: Node indices should be treated as unsigned integers In-Reply-To: <587AF7B9-5EE9-4F93-A587-9B3277E9183D@amazon.com> References: <05F44A7B-7BF3-4EF0-B1A6-8131600A3919@amazon.com> <587AF7B9-5EE9-4F93-A587-9B3277E9183D@amazon.com> Message-ID: Missed this. On 8/14/20 1:54 PM, Hohensee, Paul wrote: > By "e.g.", I meant "ones like the one in the webrev". Tobais is correct that there are more. 
I grep'ed for "(int idx", ", int idx", "(int idx)", and so on, and found a bunch (not all of them are node_idx_t, but many of those that aren't should probably be uint too). So those would be fixed first. Yes, I am okay with fixing them first. Thanks, Vladimir K > > Thanks, > Paul > > ?On 8/14/20, 11:04 AM, "Vladimir Kozlov" wrote: > > On 8/14/20 9:05 AM, Hohensee, Paul wrote: > > Hi, Vladimir, > > > > What do you think of the following? > > > > 1. Fix 8164632, i.e., replace int with uint, and add guarantees where idxs are passed to a different type (as in e.g., Eric's webrev). > > I see only this change: > > - const TypeOopPtr* tinst = t->cast_to_instance_id(ni); > + assert(ni<=INT_MAX,"node index cannot be negative"); > + const TypeOopPtr* tinst = t->cast_to_instance_id((int)ni); > > I would like to see first what you are suggesting. > > > 2. New issue: Define an enum type for _instance_id, (typedef uint instance_idx_t) and change the guarantees to check < InstanceTop and > InstanceBot (InstanceTop = ~(uint)0, InstanceBot = 0). And change from instance ids from int to instance_idx_t. > > 3. New issue: Change from uint to node_idx_t. > > Yes, it is fine to split these 2. > > Regards, > Vladimir > > > > > Thanks, > > Paul > > > > On 8/13/20, 4:00 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > > > Yes, it is sloppy :( > > > > Mostly it bases on value of MaxNodeLimit = 80000 by default and as result node's idx will never reach MAX_INT. > > > > For EA we need 2 special types TOP and BOTTOM as Paul correctly pointed in RFE. > > We can make InstanceTop == max_juint and node_idx_t type for _instance_id . We don't do arithmetic on it, see > > TypeOopPtr::meet_instance_id(). But we can't use assert in this case to check incoming idx because max_juint will be > > valid value - InstanceTop. > > > > And I agree that we should use node_idx_t everywhere. > > > > For example, Node::Init(), init_node_notes(), node_notes_at() and set_node_notes_at() should use it. > > > > Same goes for req and other Node's methods arguments. All Node fields defined as node_idx_t but we have mix of int and > > uint when referencing them. > > > > Warning: it is not small change. > > > > Regards, > > Vladimir > > > > On 8/13/20 2:51 PM, Hohensee, Paul wrote: > > > Shouldn't all the uint type uses that represent node indices actually be node_idx_t? > > > > > > Thanks, > > > Paul > > > > > > On 8/13/20, 12:34 AM, "hotspot-compiler-dev on behalf of Tobias Hartmann" wrote: > > > > > > Hi Eric, > > > > > > there are other places where Node::_idx is casted to int (and a potential overflow might happen). > > > For example, calls to Compile::node_notes_at. > > > > > > The purpose of this RFE was to replace all Node::_idx uint -> int casts and consistently use uint > > > for the node index. If that's not feasible, we should at least add a guarantee (not only an assert) > > > checking that _idx is always <= MAX_INT. > > > > > > Best regards, > > > Tobias > > > > > > On 12.08.20 00:41, Eric, Chan wrote: > > > > Hi, > > > > > > > > Requesting review for > > > > > > > > Webrev : http://cr.openjdk.java.net/~xliu/eric/8164632/00/webrev/ > > > > JBS : https://bugs.openjdk.java.net/browse/JDK-8164632 > > > > > > > > The change cast uint ni to integer so that the parameter that pass to method TypeOopPtr::cast_to_instance_id is a integer. > > > > > > > > I have tested this builds successfully . > > > > > > > > Ensured that there are no regressions in hotspot : tier1 tests. 
> > > > > > > > Regards, > > > > Eric Chen > > > > > > > > > > From jiefu at tencent.com Wed Aug 26 23:37:37 2020 From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=) Date: Wed, 26 Aug 2020 23:37:37 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs Message-ID: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> Hi all, May I get reviews for this fix? JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ Thanks. Best regards, Jie From igor.ignatyev at oracle.com Thu Aug 27 00:08:09 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 26 Aug 2020 17:08:09 -0700 Subject: RFR(M/S) : 8251127 : clean up FileInstaller $test.src $cwd in remaining vmTestbase_vm_compiler tests : In-Reply-To: <5859dffd-9ed9-21d3-102b-3070013d7fe0@oracle.com> References: <5859dffd-9ed9-21d3-102b-3070013d7fe0@oracle.com> Message-ID: <40E57766-0F5A-48E0-9B9A-5353642A75D0@oracle.com> thanks Vladimir, pushed. -- Igor > On Aug 25, 2020, at 6:10 PM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir K > > On 8/25/20 6:01 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ >>> 560 lines changed: 132 ins; 367 del; 61 mod; >> Hi all, >> could you please review the patch which removes FileInstaller actions from :vmTestbase_vm_compiler? >> the biggest chunk of the patch is just removal for '@run jdk.test.lib.FileInstaller' produced by sed '/jdk.test.lib.FileInstaller \. \./d'. human-made changes are: >> - moving jtreg test descriptions to the test source in t108-t113, corresponding changes in TEST.quick-groups and fixing line numbers in t108-t113.gold files >> - adding -Dtest.src=${test.src} to the tests which use ExecDriver (t087,t088,t108-t113), so GoldChecker would be able to find .gold file >> testing: :vmTestbase_vm_compiler >> JBS: https://bugs.openjdk.java.net/browse/JDK-8251127 >> webrev: http://cr.openjdk.java.net/~iignatyev/8251127/webrev.00/ >> Thanks, >> -- Igor >> From vladimir.kozlov at oracle.com Thu Aug 27 00:32:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 17:32:19 -0700 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code Message-ID: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> http://cr.openjdk.java.net/~kvn/8252396/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8252396 8252058 added new markId DEOPT_MH_HANDLER_ENTRY to handle deoptimization for MH invoke. But changes did not updated AOT (jaotc and Hotspot's AOT code) to handle this new markId. We should handle DEOPT_MH_HANDLER_ENTRY in AOT similar to DEOPT_HANDLER_ENTRY. In aotCompiledMethod.hpp, if DEOPT_MH_HANDLER_ENTRY value is set, CompiledMethod::_deopt_mh_handler_begin [2] is set similar to Graal JIT [3]. I kept current code to set _deopt_mh_handler_begin to 'this' when DEOPT_MH_HANDLER_ENTRY value is not set. But may be it should be set to NULL as in [3]. May be it does not matter because offset is not used when there are not MH invoke in method. Tested: ran tests which used AOT (including Graal testing). 
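[A hypothetical sketch of the choice described above; 'deopt_mh_offset' and the surrounding code are assumptions for illustration, not the actual aotCompiledMethod.hpp code.]

if (deopt_mh_offset != 0) {            // DEOPT_MH_HANDLER_ENTRY was recorded
  _deopt_mh_handler_begin = code_begin() + deopt_mh_offset;
} else {
  _deopt_mh_handler_begin = (address) this;  // current fallback; NULL, as in
                                             // nmethod.cpp [3], may be cleaner,
                                             // and the value is unused when the
                                             // method has no MH invokes
}
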
Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8252058 [2] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/compiledMethod.hpp#l168 [3] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/nmethod.cpp#l764 From vladimir.kozlov at oracle.com Thu Aug 27 02:20:17 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 19:20:17 -0700 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs In-Reply-To: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> Message-ID: <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> Since test's method is empty, it does not make sense to run it when C1's flag is not available. I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: * @requires vm.debug == true & vm.compiler1.enabled Thanks, Vladimir K [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java On 8/26/20 4:37 PM, jiefu(??) wrote: > Hi all, > > May I get reviews for this fix? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 > Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ > > Thanks. > Best regards, > Jie > From jiefu at tencent.com Thu Aug 27 02:38:34 2020 From: jiefu at tencent.com (=?iso-2022-jp?B?amllZnUoGyRCUHxbPxsoQik=?=) Date: Thu, 27 Aug 2020 02:38:34 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) In-Reply-To: <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com>, <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> Message-ID: <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com> Hi Vladimir K, Thanks for your review. Updated: http://cr.openjdk.java.net/~jiefu/8252404/webrev.01/ Best regards, Jie ________________________________ From: Vladimir Kozlov Sent: Thursday, August 27, 2020 10:20 AM To: jiefu(??); hotspot compiler Subject: Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) Since test's method is empty, it does not make sense to run it when C1's flag is not available. I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: * @requires vm.debug == true & vm.compiler1.enabled Thanks, Vladimir K [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java On 8/26/20 4:37 PM, jiefu(??) wrote: > Hi all, > > May I get reviews for this fix? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 > Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ > > Thanks. > Best regards, > Jie > From vladimir.kozlov at oracle.com Thu Aug 27 02:47:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2020 19:47:38 -0700 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) In-Reply-To: <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com> Message-ID: <2f2cbd0e-639a-9c8a-41ab-33e16483e12c@oracle.com> Good. Vladimir K On 8/26/20 7:38 PM, jiefu(??) wrote: > Hi Vladimir K, > > Thanks for your review. 
> > Updated: http://cr.openjdk.java.net/~jiefu/8252404/webrev.01/ > > Best regards, > Jie > > > > ------------------------------------------------------------------------------------------------------------------------ > *From:* Vladimir Kozlov > *Sent:* Thursday, August 27, 2020 10:20 AM > *To:* jiefu(??); hotspot compiler > *Subject:* Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) > Since test's method is empty, it does not make sense to run it when C1's flag is not available. > > I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: > > * @requires vm.debug == true & vm.compiler1.enabled > > Thanks, > Vladimir K > > [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java > > On 8/26/20 4:37 PM, jiefu(??) wrote: >> Hi all, >> >> May I get reviews for this fix? >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 >> Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ >> >> Thanks. >> Best regards, >> Jie >> > From jiefu at tencent.com Thu Aug 27 02:54:36 2020 From: jiefu at tencent.com (=?iso-2022-jp?B?amllZnUoGyRCUHxbPxsoQik=?=) Date: Thu, 27 Aug 2020 02:54:36 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) In-Reply-To: <2f2cbd0e-639a-9c8a-41ab-33e16483e12c@oracle.com> References: <1D07C3F6-F236-4934-9A1D-7F95960D1C24@tencent.com> <405d4932-df45-8967-c4d6-79d119baa511@oracle.com> <7c0fb874a5e6477fb7ca0c9ec659d004@tencent.com>, <2f2cbd0e-639a-9c8a-41ab-33e16483e12c@oracle.com> Message-ID: <528dc08156b348c48700797c729a7f2c@tencent.com> Thanks Vladimir K. Can I push it right now? I think it's trivial and this is a tier1 failure. Best regards, Jie ________________________________ From: Vladimir Kozlov Sent: Thursday, August 27, 2020 10:47 AM To: jiefu(??); hotspot compiler Subject: Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) Good. Vladimir K On 8/26/20 7:38 PM, jiefu(??) wrote: > Hi Vladimir K, > > Thanks for your review. > > Updated: http://cr.openjdk.java.net/~jiefu/8252404/webrev.01/ > > Best regards, > Jie > > > > ------------------------------------------------------------------------------------------------------------------------ > *From:* Vladimir Kozlov > *Sent:* Thursday, August 27, 2020 10:20 AM > *To:* jiefu(??); hotspot compiler > *Subject:* Re: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs(Internet mail) > Since test's method is empty, it does not make sense to run it when C1's flag is not available. > > I suggest to add @requires instead of IgnoreUnrecognizedVMOptions flag, as in an other test [1]: > > * @requires vm.debug == true & vm.compiler1.enabled > > Thanks, > Vladimir K > > [1] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/test/hotspot/jtreg/compiler/c1/TestPrintIRDuringConstruction.java > > On 8/26/20 4:37 PM, jiefu(??) wrote: >> Hi all, >> >> May I get reviews for this fix? >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8252404 >> Webrev: http://cr.openjdk.java.net/~jiefu/8252404/webrev.00/ >> >> Thanks. 
>> Best regards, >> Jie >> > From xxinliu at amazon.com Thu Aug 27 05:37:25 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Thu, 27 Aug 2020 05:37:25 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1597343851213.53343@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> <1596523192072.15354@amazon.com> <1597165750921.4285@amazon.com> <9e3fae0e-ecf7-07a9-dba3-c1cef2646eb3@oracle.com>, <4c70ed76-d31a-4077-14b7-37937b5c22ae@oracle.com>, <1597343851213.53343@amazon.com> Message-ID: <1598506645473.15178@amazon.com> Hi, Reviewers, May I ask to review the new revision of JDK-8247732? Webrev: http://cr.openjdk.java.net/~xliu/8247732/02/webrev/ Compared to the previous revision, I suppress invalid Intrinsic Ids in -XX:CompileCommand= and -XX:CompileCommandFile=. This behavior conforms to Tobias and Nils comments before. I extent the testing framework to support a new CompileCommand 'INTRINSIC'. It actually represents ControlIntrinic= in both compiler command and compiler directive. The reason I don't test DisableIntrinsic because it will deprecate. 3 new ControlIntrinsicTest.java files are added to test ControlIntrinsic appears in -XX:CompileCommand=, -XX:CompilerDirectivesFile= and JCMD respectively. As the following table described, only -XX:CompilerDirectivesFile= will abort hotspot process with non-zero exit value. The current testing framework can't test vmflag case directly, I ran test manually like I did in comment before. https://bugs.openjdk.java.net/browse/JDK-8247732?focusedCommentId=14349960&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14349960 Testing: hotspot tier1 test and gtest. thanks, --lx ________________________________________ From: hotspot-compiler-dev on behalf of Liu, Xin Sent: Thursday, August 13, 2020 11:37 AM To: Nils Eliasson; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic hi, Nils, Thank you to elaborate the answer with a table. I don't know there are up to 4 approaches to affect compilation behaviors until this table! I got it. I will work tests and make sure my next patch conform this spec. thanks, --lx ________________________________________ From: hotspot-compiler-dev on behalf of Nils Eliasson Sent: Thursday, August 13, 2020 9:17 AM To: hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. That table didn't come out right... 
+-------------------------------------------------+-------+----------------------------------+ | ControlIntrinsics | valid | invalid | +-------------------------------------------------+-------+----------------------------------+ | vmflag | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerOracle: -XX:CompileCommand= | ok | print error and continue | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error and don't start | +-------------------------------------------------+-------+----------------------------------+ | CompilerDirectives via jcmd | ok | print error, VM continues to run | +-------------------------------------------------+-------+----------------------------------+ // Regards Nils On 2020-08-13 17:59, Nils Eliasson wrote: > > |+-------------------------------------------------+-------+----------------------------------+ > | ControlIntrinsics | valid | invalid | > +-------------------------------------------------+-------+----------------------------------+ > | vmflag | ok | print error and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerOracle: -XX:CompileCommand= | ok | print error and continue > | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives: -XX:CompilerDirectivesFile= | ok | print error > and don't start | > +-------------------------------------------------+-------+----------------------------------+ > | CompilerDirectives via jcmd | ok | print error, vm continues to run > | > +-------------------------------------------------+-------+----------------------------------+| From jiefu at tencent.com Thu Aug 27 06:29:24 2020 From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=) Date: Thu, 27 Aug 2020 06:29:24 +0000 Subject: RFR: 8252404: compiler/c1/TestTraceLinearScanLevel.java fails with release VMs Message-ID: Thanks Tobias for your review. I'll push it later. Best regards, Jie ?On 2020/8/27, 2:23 PM, "Tobias Hartmann" wrote: On 27.08.20 04:54, jiefu(??) wrote: > Can I push it right now? > > I think it's trivial and this is a tier1 failure. Looks good and trivial to me as well. Best regards, Tobias From rwestrel at redhat.com Thu Aug 27 07:25:44 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 09:25:44 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> Message-ID: <87wo1k5z47.fsf@redhat.com> Thanks for the review, Christian. Roland. From rwestrel at redhat.com Thu Aug 27 07:26:13 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 09:26:13 +0200 Subject: RFR(T): 8252296: Shenandoah: crash in CallNode::extract_projections In-Reply-To: <874koptpv5.fsf@mid.deneb.enyo.de> References: <87d03d7pdk.fsf@redhat.com> <878se17p50.fsf@redhat.com> <312607ab-2d2f-7966-519c-5354951d5184@oracle.com> <874koptpv5.fsf@mid.deneb.enyo.de> Message-ID: <87tuwo5z3e.fsf@redhat.com> > It seems to fix my reproducer, too. Thanks. Thanks for verifying the fix. Roland. 
From christian.hagedorn at oracle.com Thu Aug 27 07:53:28 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 27 Aug 2020 09:53:28 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> Message-ID: <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> On 25.08.20 19:42, Christian Hagedorn wrote: > On 25.08.20 16:13, Roland Westrelin wrote: >> >>> In the testcase, a LoadSNode is cloned in >>> PhaseIdealLoop::split_if_with_blocks_post() for each use such that they >>> can float out of a loop. To ensure that these loads cannot float back >>> into the loop, we pin them by setting their control input [1]. In the >>> testcase, all 3 new clones are pinned to a loop exit node that is part >>> of an outer strip mined loop (see [2]). >> >> Do I understand this right, that all 3 clones are pinned with the same >> control? So they common and only of them is kept? > > Yes, exactly. All are pinned to the inner loop exit node. But at the > time we hit the assertion failure, we still got one cloned load (903 > LoadS) that is an input to the store (575 StoreI) that's going into the > outer strip mined loop safepoint, and one load (901 LoadS) that is > triggering the dominance failure. LoadS 902 was removed at some point in > between due to other optimizations. As Roland and I have discussed offline, it seems to be better and safer to do a simpler fix that does not change the original behavior of the optimization. The new fix suggests not yank AddP nodes (which are inputs to the cloned LoadSNodes in the testcase) and also to not yank gc barriers. In the testcase, the cloned LoadSNodes are still pinned at the loop exit but now they can be optimized and common up to one node during igvn that only belongs to the safepoint in the outer strip mined loop (i.e. no load after the loop anymore). The load is still successfully removed from the inner loop: http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ I left the improved dominance failure dumping as it is. We think that it would be a good idea to revisit this cloning optimization in an RFE and also consider webrev.01 there as it seems to be more like an enhancement for loop strip mining rather than a bug fix. I filed [1] which summarizes some thoughts about it. What do others think about that? Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8252372 From dean.long at oracle.com Thu Aug 27 08:36:11 2020 From: dean.long at oracle.com (Dean Long) Date: Thu, 27 Aug 2020 01:36:11 -0700 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: Looks good. dl On 8/26/20 5:32 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8252396/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8252396 > > 8252058 added new markId DEOPT_MH_HANDLER_ENTRY to handle > deoptimization for MH invoke. > But changes did not updated AOT (jaotc and Hotspot's AOT code) to > handle this new markId. > > We should handle DEOPT_MH_HANDLER_ENTRY in AOT similar to > DEOPT_HANDLER_ENTRY. > > In aotCompiledMethod.hpp, if DEOPT_MH_HANDLER_ENTRY value is set, > CompiledMethod::_deopt_mh_handler_begin [2] is set similar to Graal > JIT [3]. 
I kept current code to set _deopt_mh_handler_begin to 'this' > when DEOPT_MH_HANDLER_ENTRY value is not set. But may be it should be > set to NULL as in [3]. May be it does not matter because offset is not > used when there are not MH invoke in method. > > Tested: ran tests which used AOT (including Graal testing). > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8252058 > [2] > http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/compiledMethod.hpp#l168 > [3] > http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/nmethod.cpp#l764 From aph at redhat.com Thu Aug 27 09:18:46 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Aug 2020 10:18:46 +0100 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: On 27/08/2020 01:32, Vladimir Kozlov wrote: > [1] https://bugs.openjdk.java.net/browse/JDK-8252058 You can't view this issue It may have been deleted or you don't have permission to view it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Aug 27 09:44:41 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Aug 2020 10:44:41 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: Message-ID: On 17/08/2020 22:54, Doerr, Martin wrote: > Hi, > > I'd like to backport https://bugs.openjdk.java.net/browse/JDK-8241234 to JDK11u. > > Original JDK15 patch (https://hg.openjdk.java.net/jdk/jdk/rev/87c506c8be63) doesn't fit to JDK11u because the locking code has been reworked by https://bugs.openjdk.java.net/browse/JDK-8229844 > As mentioned by Vladimir, there's already a GraalVM version available which consists of 2 patches (original + addon) and which can be applied: > https://github.com/graalvm/labs-openjdk-11/commit/6c162cb15262e6aa77e36eb3a268320ef0a206a4 > https://github.com/graalvm/labs-openjdk-11/commit/6a28a618cdbe595f9a3993e0eb63c01ccae1a528 > Only JVMCI part from GraalVM doesn't apply automatically. The version of this file from JDK15 is very simple and fits perfectly. > > Please review the JDK11u backport webrev: > http://cr.openjdk.java.net/~mdoerr/8241234_monitorenterexit_11u/webrev.00/ Why is anyone backporting a P4 Enhancement? Seems weird. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Thu Aug 27 10:04:28 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 27 Aug 2020 10:04:28 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: Message-ID: Hi Andrew, > Why is anyone backporting a P4 Enhancement? Seems weird. This is a good question in general. Personally, I'd vote for backporting fewer less important things to 11u in the future. We should better focus on 17 IMHO. However, there are some arguments for backporting this one: - Oracle has done so. There may be more backports in this area and I'd expect less effort if we have the same code in the open version. - Performance is supposed to be better. (Though I didn't measure it.) - New code is much cleaner. Let's keep in mind that we have to support it for quite a while. Are you ok with it? 
Best regards, Martin > -----Original Message----- > From: Andrew Haley > Sent: Donnerstag, 27. August 2020 11:45 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' ; jdk- > updates-dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > On 17/08/2020 22:54, Doerr, Martin wrote: > > Hi, > > > > I'd like to backport https://bugs.openjdk.java.net/browse/JDK-8241234 to > JDK11u. > > > > Original JDK15 patch > (https://hg.openjdk.java.net/jdk/jdk/rev/87c506c8be63) doesn't fit to > JDK11u because the locking code has been reworked by > https://bugs.openjdk.java.net/browse/JDK-8229844 > > As mentioned by Vladimir, there's already a GraalVM version available > which consists of 2 patches (original + addon) and which can be applied: > > https://github.com/graalvm/labs-openjdk- > 11/commit/6c162cb15262e6aa77e36eb3a268320ef0a206a4 > > https://github.com/graalvm/labs-openjdk- > 11/commit/6a28a618cdbe595f9a3993e0eb63c01ccae1a528 > > Only JVMCI part from GraalVM doesn't apply automatically. The version of > this file from JDK15 is very simple and fits perfectly. > > > > Please review the JDK11u backport webrev: > > > http://cr.openjdk.java.net/~mdoerr/8241234_monitorenterexit_11u/webre > v.00/ > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Thu Aug 27 11:43:06 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 13:43:06 +0200 Subject: RFR(S): 8241486: G1/Z give warning when using LoopStripMiningIter and turn off LoopStripMiningIter (0) In-Reply-To: References: <87tuwr6s5j.fsf@redhat.com> Message-ID: <87r1rs5n79.fsf@redhat.com> Thanks for the reviews Vladimir and Tobias. Roland. From rwestrel at redhat.com Thu Aug 27 11:52:43 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 27 Aug 2020 13:52:43 +0200 Subject: RFR(S): 8252292: 8240795 may cause anti-dependence to be missed In-Reply-To: <09a82d80-208c-6cea-da6b-e501d65e0f79@oracle.com> References: <87wo1n6snc.fsf@redhat.com> <09a82d80-208c-6cea-da6b-e501d65e0f79@oracle.com> Message-ID: <87o8mw5mr8.fsf@redhat.com> Thanks for the review, Vladimir. Roland. From vladimir.x.ivanov at oracle.com Thu Aug 27 12:54:23 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 27 Aug 2020 15:54:23 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: Hi Andrew, > So, if I can summarize (please correct me if I misunderstand): > > You are as concerned about existing complexity in vector handling as > much as complexity added by this patch, whether the latter is to AArch64 > code or shared code. 
> > The goal you would like to achieve is a single set of rules for a > single kind of vector register whose size is parameterized, the > appropriate value being derived from each specific vector operation. > > Your main concern about this patch is that it adds yet another > additional vector kind to the current 'wrong' multi-kind vector model > and, what is worse, one with a different behaviour, taking us further > from your desired goal. Yes, correct. > Your other concern is that this design does not allow for the AArch64 > ISA predication or, indeed, for what you treat uniformly as the > 'implicit' predication imposed on a 'logical' max vector size (2048 > bits) by the specific AVX/SVE/NEON hardware vector size. No, I'm not concerned about that. I mentioned SVE implicit predication to illustrate that there's a higher-level abstraction in the JVM above ISA level which hides some of the functionality ISA exposes. And I'm perfectly fine with that. >> But you should definitely prefer 1-slot design for vector registers then >> ;-) > > Indeed I do :-] > > So, let me respond to the above summary points, assuming I have them > down right. > > I agree that your end goal is highly desirable. However, we are not > there yet and since your attempts to do so have not succeeded so far I > don't think that means we are compelled to drop the current patch. As > you say this could (and, if it is adopted, should) be regarded as a > useful stop-gap until we come up with a unified, parameterized vector > implementation that makes it redundant. Unfortunately, there was simply not enough motivation on x86 (and hence resources spent) to address it there. Vector API support for x86 stretched the implementation in a different direction: combinatorial explosion of AD instructions needed to cover all useful cases. It required switching to full-width vectors in x86.ad file which left RA concerns waiting next opportunity. > That said, I'm not pushing hard to keep the patch if the consequence is > generating significant work later to undo it. The number of users who > might benefit from using SVE vectors from Java now or in the near future > does not look like it is going to be very large (if you are not making a > lot of use of SVE registers then that is a lot of wasted silicon and I > suspect it's going to be the rare case that someone codes an app in Java > that needs to make continuous use of SVE -- mind you, by the same token > I guess that also applies for AVX on Intel). I don't consider RA part of the patch as the show-stopper issue for initial SVE support. As I said to Ningsheng, I'm fine with the patch as it is now if we agree it's a stop-the-gap solution and there's a commitment to invest into the proper support. I initially put options #1/#2 (which don't require any changes in RA shared code) as possible alternatives way to temporarily address the problem. Both require additional simplifying assumptions and hence I didn't insist they should be chosen. > I'm not sure pushing this now will add a lot more work later. It seems > to me that this code is actually moving in the right direction for the > sort of solution you want. The AArch64 VecA register /is/ > size-parameterized, albeit by a size fixed at startup rather than per > operation. So, that's one reason why I don't know if this implies a lot > more rework to move towards your desired goal. 
Surely, if we do arrive > at a unifying vector model that can replace the existing multi-kind > vectors then it ought to be able to subsume this code - unless of course > it replaces it wholesale. > > Are you concerned that adding this patch will result in more cases to > pick through and correct? > > Are you worried that we might have to withdraw some of the support this > patch enables to arrive at the final goal? > > Also, Ningsheng and his colleagues have laid some foundations for > implementing predicated operations with this patch and have that work in > the pipeline. Once again this is moving towards the desired goal even if > it might end up doign so in a slightly sideways fashion. Perhaps we > could continue this stop-gap experiment as an experimental option in > order to learn from the experience? I definitely don't want to hinder/block the impressive work Ningsheng and others at Arm are doing for SVE support. Frankly speaking, my main concern is that the implementation can stay that way forever ;-) That's why I'm trying to get enough ground covered in the discussion and some agreements/commitments to be made before it is integrated. I don't have any strong objections to the patch which could justify blocking its integration, but on a higher-level I do voice my concerns about where it pushes the implementation longer-term. Unfortunately, as it is shaped now, I don't see how x86 can benefit from it. So, I'm afraid this particular route with vecA and _is_scalable bit will stay purely AArch64-specific exercise. Leaving RA part aside, I have one suggestion which should help in the future: let's try to consistently follow full-width vector abstraction. In AD file, vecA operand is way too similar to vecX et al which makes a wrong impression it's yet another vector flavor. So, choosing a better name will help when representation changes. For example, x86 moved away from vecX/... operands to a single generic one (called "vec") and you can take a loot at x86.ad to see the result. Best regards, Vladimir Ivanov From christian.hagedorn at oracle.com Thu Aug 27 14:54:22 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 27 Aug 2020 16:54:22 +0200 Subject: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: References: Message-ID: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> Hi Nhat Looks good to me! Just make sure you that next time you assign the bug to you or a sponsor and/or leave a comment that you intend to work on it to avoid the possibility of some duplicated work (was no problem in this case) ;-) Best regards, Christian On 26.08.20 20:55, Nhat Nguyen wrote: > Hi hotspot-compiler-dev, > > Please review the following patch to address https://bugs.openjdk.java.net/browse/JDK-8251271 > The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead. > I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended. 
> > webrev: http://cr.openjdk.java.net/~burban/nhat/JDK-8251271/webrev.00/ > > Thank you, > Nhat > From martin.doerr at sap.com Thu Aug 27 15:07:08 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 27 Aug 2020 15:07:08 +0000 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: Hi Corey, > If I make a requirement, I feel decode0 should check that the > requirement is met, and raise some kind of internal error if it isn't. > That actually was my first implementation, but I received some comments > during an internal review suggesting that I just "round down" the > destination count to the closest multiple of 3 less than or equal to the > returned value, rather than throw an internal exception which would > confuse users. This "enforces" the rule, in some sense, without error > handling. Do you have some thoughts about this? I think the rounding logic is hard to understand and I'm not sure if it's correct (you're rounding up for the 1st computation of chars_decoded). If we don't use it, it will never get tested (because the intrinsic always returns a multiple of 3). I prefer having a more simple version which is easy to understand and for which we can test all cases. I think we should be able to catch violations of this requirement by adding good JTREG tests. An illegal intrinsic implementation should never pass the tests. So I don't see a need to catch an illegal state in the Java source code in this case. I guess this will be best for intrinsic implementors for other platforms as well. I'd appreciate more opinions on this. > I will double check that everything compiles and runs properly with gcc > 7.3.1. Please note that 7.3.1 is our minimum for Big Endian linux. For Little Endian it's 7.4.0. You can also find this information here: https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms under "Other JDK 13 build platforms" which hasn't changed since then. > I will use __attribute__ ((align(16))) instead of __vector, and make > them arrays of 16 unsigned char. Maybe __vectors works as expected, too, now. Whatever we use, I'd appreciate to double-check the alignment e.g. by using gdb. I don't remember what we had tried and why it didn't work as desired. > I was following what was done for encodeBlock, but it appears > encodeBlock's style isn't what is used for the other intrinsics. I will > correct decodeBlock to use the prevailing style. Another patch should > be added (not part of this webrev) to correct encodeBlock's style. In your code one '\' is not aligned with the other ones. > Ah, this is another thing I didn't know about. I will make some > regression tests. Thanks. There's some documentation available: https://openjdk.java.net/jtreg/ I guess your colleagues can assist you with that so you don't have to figure out everything alone. > Thanks for your time on this. As you can tell, I'm inexperienced in > writing openjdk code, so your patience and careful review is really > appreciated. I'm glad you work on contributions. I think we should welcome new contributors and assist as far as we can. Best regards, Martin > -----Original Message----- > From: Corey Ashford > Sent: Donnerstag, 27. 
August 2020 00:17 > To: Doerr, Martin ; Michihiro Horie > > Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev dev at openjdk.java.net>; Kazunori Ogata ; > joserz at br.ibm.com > Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and > API for Base64 decoding > > Hi Martin, > > Some inline responses below. > > On 8/26/20 8:26 AM, Doerr, Martin wrote: > > > Hi Corey, > > > > I should explain my comments regarding Base64.java better. > > > >> Let's be precise: "should process a multiple of four" => "must process a > >> multiple of four" > > Did you try to support non-multiple of 4 and this was intended as > recommendation? > > I think making it a requirement and simplifying the logic in decode0 is > better. > > Or what's the benefit of the recommendation? > > If I make a requirement, I feel decode0 should check that the > requirement is met, and raise some kind of internal error if it isn't. > That actually was my first implementation, but I received some comments > during an internal review suggesting that I just "round down" the > destination count to the closest multiple of 3 less than or equal to the > returned value, rather than throw an internal exception which would > confuse users. This "enforces" the rule, in some sense, without error > handling. Do you have some thoughts about this? > > > > >>> If any illegal base64 bytes are encountered in the source by the > >>> intrinsic, the intrinsic can return a data length of zero or any > >>> number of bytes before the place where the illegal base64 byte > >>> was encountered. > >> I think this has a drawback. Somebody may use a debugger and want to > stop > >> when throwing IllegalArgumentException. He should see the position > which > >> matches the Java implementation.kkkk > > This is probably hard to understand. Let me try to explain it by example: > > 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the > destination array. > > 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed > by your specification. > > 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the > large while loop in decodeBlockSlow). > > 4. A JVMTI agent (debugger) reads dp and dst. > > 5. The person using the debugger gets angry because more bytes than dp > were written into dst. The JVM didn't follow the specified behavior. > > > > I guess we can and should avoid it by specifying that the intrinsic needs to > return the dp value matching the number of Bytes written. > > That's an interesting point. I will change the specification, and the > intrinsic implementation. Right now the Power9/10 intrinsic returns 0 > when any illegal character is discovered, but I've been thinking about > returning the number of bytes already written, which will allow > decodeBlockSlow to more quickly find the offending character. This > provides another good reason to make that change. > > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: Doerr, Martin > >> Sent: Dienstag, 25. August 2020 15:38 > >> To: Corey Ashford ; Michihiro Horie > >> > >> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev >> dev at openjdk.java.net>; Kazunori Ogata ; > >> joserz at br.ibm.com > >> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate > and > >> API for Base64 decoding > >> > >> Hi Corey, > >> > >> thanks for proposing this change. I have comments and suggestions > >> regarding various files. 
> >> > >> > >> Base64.java > >> > >> This is the only file which needs another review from core-libs-dev. > >> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can > >> consume as many bytes as the implementation wants. > >> > >> Comment before decodeBlock: > >> Let's be precise: "should process a multiple of four" => "must process a > >> multiple of four" > >> > >>> If any illegal base64 bytes are encountered in the source by the > >>> intrinsic, the intrinsic can return a data length of zero or any > >>> number of bytes before the place where the illegal base64 byte > >>> was encountered. > >> I think this has a drawback. Somebody may use a debugger and want to > stop > >> when throwing IllegalArgumentException. He should see the position > which > >> matches the Java implementation. > >> > >> Please note that the comment indentation differs from other comments. > > Will fix. > > >> > >> decode0: Final "else" after return is redundant. > > Will fix. > > >> > >> > >> stubGenerator_ppc.cpp > >> > >> "__vector" breaks AIX build! > >> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? > >> Please either support Big Endian properly or #ifdef it out. > > I have been compiling with only Advance Toolchain 13, which is 9.3.1, > and only on Linux. It will not work with big endian, so it won't work > on AIX, however obviously it shouldn't break the AIX build, so I will > address that. There's code to set UseBASE64Intrinsics to false on big > endian, but you're right -- I should ifdef all of the intrinsic code for > little endian for now. Getting it to work on big endian / AIX shouldn't > be difficult, but it's not in my scope of work at the moment. > > I will double check that everything compiles and runs properly with gcc > 7.3.1. > > >> What exactly does it (do) on linux? > > It's an arch-specific type that's 16 bytes in size and aligned on a > 16-byte boundary. > > >> I remember that we had tried such prefixes but were not satisfied. I think > it > >> didn't enforce 16 Byte alignment if I remember correctly. > > I will use __attribute__ ((align(16))) instead of __vector, and make > them arrays of 16 unsigned char. > > >> > >> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- > >> 8086069). So the argument registers for offset, length and isURL may > contain > >> garbage in the higher bits. > > Wow, that's good to know! I will mask off the incoming values. > > >> > >> You may want to use load_const_optimized which produces shorter code. > > Will fix. > > >> > >> You may want to use __ align(32) to align unrolled_loop_start. > > Will fix. > > >> > >> I'll review the algorithm in detail when I find more time. > >> > >> > >> assembler_ppc.hpp > >> assembler_ppc.inline.hpp > >> vm_version_ppc.cpp > >> vm_version_ppc.hpp > >> Please rebase. Parts of the change were pushed as part of 8248190: > Enable > >> Power10 system and implement new byte-reverse instructions > > Will do. > > >> > >> > >> vmSymbols.hpp > >> Indentation looks odd at the end. > > I was following what was done for encodeBlock, but it appears > encodeBlock's style isn't what is used for the other intrinsics. I will > correct decodeBlock to use the prevailing style. Another patch should > be added (not part of this webrev) to correct encodeBlock's style. > > >> > >> > >> library_call.cpp > >> Good. Indentation style of the call parameters differs from encodeBlock. > > Will fix. > > >> > >> > >> runtime.cpp > >> Good. 
> >> > >> > >> aotCodeHeap.cpp > >> vmSymbols.cpp > >> shenandoahSupport.cpp > >> vmStructs_jvmci.cpp > >> shenandoahSupport.cpp > >> escape.cpp > >> runtime.hpp > >> stubRoutines.cpp > >> stubRoutines.hpp > >> vmStructs.cpp > >> Good and trivial. > >> > >> > >> Tests: > >> I think we should have JTREG tests to check for regressions in the future. > > Ah, this is another thing I didn't know about. I will make some > regression tests. > > Thanks for your time on this. As you can tell, I'm inexperienced in > writing openjdk code, so your patience and careful review is really > appreciated. > > - Corey From aph at redhat.com Thu Aug 27 15:25:16 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Aug 2020 16:25:16 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: Message-ID: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Hi, On 27/08/2020 11:04, Doerr, Martin wrote: > >> Why is anyone backporting a P4 Enhancement? Seems weird. > This is a good question in general. Personally, I'd vote for > backporting fewer less important things to 11u in the future. We > should better focus on 17 IMHO. > > However, there are some arguments for backporting this one: > - Oracle has done so. There may be more backports in this area and > I'd expect less effort if we have the same code in the open version. > - Performance is supposed to be better. (Though I didn't measure it.) > - New code is much cleaner. Let's keep in mind that we have to > support it for quite a while. > > Are you ok with it? I'm unsure. While "Oracle has backported it" has been a slam-dunk justification for many patches, I am concerned about the destabilizing effect of the volume of patches we are processing. "Better performance" is not in itself justification for a backport unless the improvement is really compelling. "Cleanups" are a red flag. The miserable history of code that has been broken by seemingly innocuous cleanups is long. This is a big change that affects some very delicate code, but the fact that there is already a GraalVM patch we can use is quite persuasive. So I'm not refusing it, I want people's opinions. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sgehwolf at redhat.com Thu Aug 27 15:59:32 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Thu, 27 Aug 2020 17:59:32 +0200 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > Hi, > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > This is a good question in general. Personally, I'd vote for > > backporting fewer less important things to 11u in the future. We > > should better focus on 17 IMHO. > > > > However, there are some arguments for backporting this one: > > - Oracle has done so. There may be more backports in this area and > > I'd expect less effort if we have the same code in the open version. > > - Performance is supposed to be better. (Though I didn't measure it.) > > - New code is much cleaner. Let's keep in mind that we have to > > support it for quite a while. > > > > Are you ok with it? > > I'm unsure. 
While "Oracle has backported it" has been a slam-dunk > justification for many patches, I am concerned about the destabilizing > effect of the volume of patches we are processing. > > "Better performance" is not in itself justification for a backport > unless the improvement is really compelling. > > "Cleanups" are a red flag. The miserable history of code that has been > broken by seemingly innocuous cleanups is long. This is a big change > that affects some very delicate code, but the fact that there is > already a GraalVM patch we can use is quite persuasive. > > So I'm not refusing it, I want people's opinions. It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems to be coming from Graal. Until there is a more compelling reason to backport this (other than performance for some JVMCI impl) we shouldn't backport this. We already have a label for these: jdk11u-jvmci-defer. We should apply that and re-evaluate later if needed. My $0.02 Thanks, Severin From vladimir.kozlov at oracle.com Thu Aug 27 16:33:44 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 09:33:44 -0700 Subject: New EA Metropolis build Message-ID: The build at the Project Metropolis Early Access page [1] has been refreshed. It was updated to JDK 15. Binaries are based on Metropolis repository [2] which was synced with jdk-15+36 (JDK 15 build 36). Graal in Metropolis is based on GraalVM CE version of Graal [3]. It was updated up to GR-24572 commit [4] and additional patch was applied [5] to enable libgraal build with JDK 15. Regards, Vladimir Kozlov [1] https://jdk.java.net/metropolis/ [2] https://github.com/openjdk/metropolis [3] https://github.com/oracle/graal [4] [GR-24572] JDK15 java.lang.invoke.MemberName is reachable. https://github.com/oracle/graal/commit/b0735cd5fb384cfdb522488edf1d83b013507d72 [5] [GR-25120] Fixed leaked indirect java constants on jdk15. https://github.com/oracle/graal/commit/e82d1090c23493a6d665e579cacad8241ea75318 From vladimir.kozlov at oracle.com Thu Aug 27 17:23:52 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 10:23:52 -0700 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> Message-ID: On 8/27/20 12:53 AM, Christian Hagedorn wrote: > On 25.08.20 19:42, Christian Hagedorn wrote: >> On 25.08.20 16:13, Roland Westrelin wrote: >>> >>>> In the testcase, a LoadSNode is cloned in >>>> PhaseIdealLoop::split_if_with_blocks_post() for each use such that they >>>> can float out of a loop. To ensure that these loads cannot float back >>>> into the loop, we pin them by setting their control input [1]. In the >>>> testcase, all 3 new clones are pinned to a loop exit node that is part >>>> of an outer strip mined loop (see [2]). >>> >>> Do I understand this right, that all 3 clones are pinned with the same >>> control? So they common and only of them is kept? >> >> Yes, exactly. All are pinned to the inner loop exit node. But at the time we hit the assertion failure, we still got >> one cloned load (903 LoadS) that is an input to the store (575 StoreI) that's going into the outer strip mined loop >> safepoint, and one load (901 LoadS) that is triggering the dominance failure. LoadS 902 was removed at some point in >> between due to other optimizations. 
> > As Roland and I have discussed offline, it seems to be better and safer to do a simpler fix that does not change the > original behavior of the optimization. The new fix suggests not yank AddP nodes (which are inputs to the cloned > LoadSNodes in the testcase) and also to not yank gc barriers. In the testcase, the cloned LoadSNodes are still pinned at > the loop exit but now they can be optimized and common up to one node during igvn that only belongs to the safepoint in > the outer strip mined loop (i.e. no load after the loop anymore). The load is still successfully removed from the inner > loop: > > http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ > > I left the improved dominance failure dumping as it is. Good. > > We think that it would be a good idea to revisit this cloning optimization in an RFE and also consider webrev.01 there > as it seems to be more like an enhancement for loop strip mining rather than a bug fix. I filed [1] which summarizes > some thoughts about it. > > What do others think about that? I agree with that. Thanks, Vladimir > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8252372 From vladimir.kozlov at oracle.com Thu Aug 27 17:27:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 10:27:19 -0700 Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: Thank you, Dean Vladimir K On 8/27/20 1:36 AM, Dean Long wrote: > Looks good. > > dl > > On 8/26/20 5:32 PM, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8252396/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8252396 >> >> 8252058 added new markId DEOPT_MH_HANDLER_ENTRY to handle deoptimization for MH invoke. >> But changes did not updated AOT (jaotc and Hotspot's AOT code) to handle this new markId. >> >> We should handle DEOPT_MH_HANDLER_ENTRY in AOT similar to DEOPT_HANDLER_ENTRY. >> >> In aotCompiledMethod.hpp, if DEOPT_MH_HANDLER_ENTRY value is set, CompiledMethod::_deopt_mh_handler_begin [2] is set >> similar to Graal JIT [3]. I kept current code to set _deopt_mh_handler_begin to 'this' when DEOPT_MH_HANDLER_ENTRY >> value is not set. But may be it should be set to NULL as in [3]. May be it does not matter because offset is not used >> when there are not MH invoke in method. >> >> Tested: ran tests which used AOT (including Graal testing). >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252058 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/compiledMethod.hpp#l168 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/6abdfb11f342/src/hotspot/share/code/nmethod.cpp#l764 > From vladimir.kozlov at oracle.com Thu Aug 27 17:39:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 27 Aug 2020 10:39:42 -0700 (PDT) Subject: [16] RFR(M) 825239: AOT need to process new markId DEOPT_MH_HANDLER_ENTRY in compiled code In-Reply-To: References: <9c278576-08e3-1f5b-28d2-6c3b980a6511@oracle.com> Message-ID: <94cfa9c9-f5dc-a443-cf4a-53b642c68c84@oracle.com> I created open bug and will use its ID for changeset: https://bugs.openjdk.java.net/browse/JDK-8252467 Thank, Vladimir K On 8/27/20 2:18 AM, Andrew Haley wrote: > On 27/08/2020 01:32, Vladimir Kozlov wrote: >> [1] https://bugs.openjdk.java.net/browse/JDK-8252396 > > You can't view this issue > > It may have been deleted or you don't have permission to view it. 
> From jingxinc at amazon.com Thu Aug 27 18:08:49 2020 From: jingxinc at amazon.com (Eric, Chan) Date: Thu, 27 Aug 2020 18:08:49 +0000 Subject: RFR 8239090: Improve CPU feature support in VM_version Message-ID: Hi, Requesting review for Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/ JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 Yesterday I sent a wrong one, so I send it again, I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. Regards, Eric Chen From Divino.Cesar at microsoft.com Thu Aug 27 19:36:27 2020 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Thu, 27 Aug 2020 19:36:27 +0000 Subject: [16] RFR(S): 8250668: Clean up method_oop names in adlc Message-ID: Hi there, RFE: https://bugs.openjdk.java.net/browse/JDK-8250668 Webrev: https://cr.openjdk.java.net/~adityam/cesar/8250668/0/ Need sponsor: Yes Tested on: Windows/Linux/MacOS tiers 1-3 can I please get some reviews for the Webrev linked above? The work consists of renaming "method_oop" ocurrences all around the code base to just "method". I've tested this on x86_64 only?* Can someone please help testing on other architectures as well: x86_32, PPC, ARM32/64, S390? Thank you, Cesar From richard.reingruber at sap.com Thu Aug 27 20:32:36 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Thu, 27 Aug 2020 20:32:36 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Goetz, > I read through your change again. It looks good to me now. > The new naming and additional comments make it > easier to read I think, thank you. Thanks for all your input! > One small thing: > deoptimization.cpp, l. 1503 > You don't really need the brackets. Two lines below you don't use them either. > (No webrev needed) Thanks for providing the correct line off list. Fixed! I prepared a new webrev, because I had to rebase after JDK-8249293 [1] and because I wanted to make use of JDK-8251384 [2] Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/ The delta looks bigger than it is. Most of it is re-indentation of VM_GetOrSetLocal::deoptimize_objects(). You can see this if you look at http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/src/hotspot/share/prims/jvmtiImpl.cpp.udiff.html which does not include the whitespace change. Hope you are still ok with webrev.8. The changes are marginal. I've commented each below. Thanks, Richard. --- Details below --- src/hotspot/share/prims/jvmtiImpl.cpp @@ -425,11 +425,11 @@ , _depth(depth) , _index(index) , _type(type) , _jvf(NULL) , _set(false) - , _eb(NULL, NULL, false) // no references escape + , _eb(NULL, NULL, type == T_OBJECT) , _result(JVMTI_ERROR_NONE) Currently 'type' is never equal to T_OBJECT at this location, still I think it is better to check. The compiler will replace the compare with false. 
@@ -630,11 +630,11 @@ } // Revert optimizations based on escape analysis if this is an access to a local object bool VM_GetOrSetLocal::deoptimize_objects(javaVFrame* jvf) { #if COMPILER2_OR_JVMCI - if (NOT_JVMCI(DoEscapeAnalysis &&) _type == T_OBJECT) { + assert(_type == T_OBJECT, "EscapeBarrier should not be active if _type != T_OBJECT"); I removed the if from VM_GetOrSetLocal::deoptimize_objects(), because now it only gets called if the VM_GetOrSetLocal instance has an active EscapeBarrier which will be the case iff the local type is T_OBJECT and if either C2 escape analysis is enabled or Graal is used. src/hotspot/share/runtime/deoptimization.cpp You suggested to remove the braces. Done. src/hotspot/share/runtime/deoptimization.hpp Must provide definition of EscapeBarrier::barrier_active() for new call site in VM_GetOrSetLocal::doit_prologue() if building with COMPILER2_OR_JVMCI not defined. test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysisEnabled.java Make use of [2] and pass test with minimal vm. [1] https://bugs.openjdk.java.net/browse/JDK-8249293 [2] https://bugs.openjdk.java.net/browse/JDK-8251384 -----Original Message----- From: Lindenmaier, Goetz Sent: Samstag, 22. August 2020 07:46 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, I read through your change again. It looks good to me now. The new naming and additional comments make it easier to read I think, thank you. One small thing: deoptimization.cpp, l. 1503 You don't really need the brackets. Two lines below you don't use them either. (No webrev needed) Best regards, Goetz. From igor.veresov at oracle.com Fri Aug 28 01:20:48 2020 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 27 Aug 2020 18:20:48 -0700 Subject: RFR 8239090: Improve CPU feature support in VM_version In-Reply-To: References: Message-ID: <47EE441C-09D0-43C1-A339-E8323B866A66@oracle.com> You can actually make a constexpr array of feature objects and then use constexpr function with a loop to look it up. The c++ compiler will generate an O(1) table lookup for it. That would be a good way to get rid of the ugly macro (we allow c++14 now). 
For example foo() in this example: enum E { a, b, c }; struct P { E _e; // key int _v; // value constexpr P(E e, int v) : _e(e), _v(v) { } }; constexpr static P ps[3] = { P(a, 0xdead), P(b, 0xbeef), P(c, 0xf00d)}; constexpr int match(E e) { for (const auto& p : ps) { if (p._e == e) { return p._v; } } return -1; } int foo(E e) { return match(e); } Will be compiled into: __Z3foo1E: ## @_Z3foo1E .cfi_startproc ## %bb.0: movl $-1, %eax cmpl $2, %edi ja LBB0_2 ## %bb.1: pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq %rsp, %rbp .cfi_def_cfa_register %rbp movslq %edi, %rax leaq l_switch.table._Z3foo1E(%rip), %rcx movq (%rcx,%rax,8), %rax movl 4(%rax), %eax popq %rbp LBB0_2: retq .cfi_endproc ## -- End function .section __TEXT,__const .p2align 4 ## @_ZL2ps __ZL2ps: .long 0 ## 0x0 .long 57005 ## 0xdead .long 1 ## 0x1 .long 48879 ## 0xbeef .long 2 ## 0x2 .long 61453 ## 0xf00d .section __DATA,__const .p2align 3 ## @switch.table._Z3foo1E l_switch.table._Z3foo1E: .quad __ZL2ps .quad __ZL2ps+8 .quad __ZL2ps+16 igor > On Aug 27, 2020, at 11:08 AM, Eric, Chan wrote: > > Hi, > > Requesting review for > > Webrev : http://cr.openjdk.java.net/~phh/8239090/webrev.00/ > JBS : https://bugs.openjdk.java.net/browse/JDK-8239090 > > Yesterday I sent a wrong one, so I send it again, > I improve the ?get_processor_features? method by store every cpu features in an enum array so that we don?t have to count how many ?%s? that need to added. I passed the tier1 test successfully. > > Regards, > Eric Chen > From ningsheng.jian at arm.com Fri Aug 28 05:56:56 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Fri, 28 Aug 2020 13:56:56 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: Hi Vladimir, Thanks a lot for helping clarifying your concerns which will benefit future direction. On 8/27/20 8:54 PM, Vladimir Ivanov wrote: > Hi Andrew, > >> So, if I can summarize (please correct me if I misunderstand): >> >> You are as concerned about existing complexity in vector handling as >> much as complexity added by this patch, whether the latter is to AArch64 >> code or shared code. >> >> The goal you would like to achieve is a single set of rules for a >> single kind of vector register whose size is parameterized, the >> appropriate value being derived from each specific vector operation. >> >> Your main concern about this patch is that it adds yet another >> additional vector kind to the current 'wrong' multi-kind vector model >> and, what is worse, one with a different behaviour, taking us further >> from your desired goal. > > Yes, correct. > >> Your other concern is that this design does not allow for the AArch64 >> ISA predication or, indeed, for what you treat uniformly as the >> 'implicit' predication imposed on a 'logical' max vector size (2048 >> bits) by the specific AVX/SVE/NEON hardware vector size. > > No, I'm not concerned about that. 
I mentioned SVE implicit predication > to illustrate that there's a higher-level abstraction in the JVM above > ISA level which hides some of the functionality ISA exposes. And I'm > perfectly fine with that. > >>> But you should definitely prefer 1-slot design for vector registers then >>> ;-) >> >> Indeed I do :-] >> >> So, let me respond to the above summary points, assuming I have them >> down right. >> >> I agree that your end goal is highly desirable. However, we are not >> there yet and since your attempts to do so have not succeeded so far I >> don't think that means we are compelled to drop the current patch. As >> you say this could (and, if it is adopted, should) be regarded as a >> useful stop-gap until we come up with a unified, parameterized vector >> implementation that makes it redundant. > > Unfortunately, there was simply not enough motivation on x86 (and hence > resources spent) to address it there. Vector API support for x86 > stretched the implementation in a different direction: combinatorial > explosion of AD instructions needed to cover all useful cases. It > required switching to full-width vectors in x86.ad file which left RA > concerns waiting next opportunity. > >> That said, I'm not pushing hard to keep the patch if the consequence is >> generating significant work later to undo it. The number of users who >> might benefit from using SVE vectors from Java now or in the near future >> does not look like it is going to be very large (if you are not making a >> lot of use of SVE registers then that is a lot of wasted silicon and I >> suspect it's going to be the rare case that someone codes an app in Java >> that needs to make continuous use of SVE -- mind you, by the same token >> I guess that also applies for AVX on Intel). > > I don't consider RA part of the patch as the show-stopper issue for > initial SVE support. As I said to Ningsheng, I'm fine with the patch as > it is now if we agree it's a stop-the-gap solution and there's a > commitment to invest into the proper support. > > I initially put options #1/#2 (which don't require any changes in RA > shared code) as possible alternatives way to temporarily address the > problem. Both require additional simplifying assumptions and hence I > didn't insist they should be chosen. > >> I'm not sure pushing this now will add a lot more work later. It seems >> to me that this code is actually moving in the right direction for the >> sort of solution you want. The AArch64 VecA register /is/ >> size-parameterized, albeit by a size fixed at startup rather than per >> operation. So, that's one reason why I don't know if this implies a lot >> more rework to move towards your desired goal. Surely, if we do arrive >> at a unifying vector model that can replace the existing multi-kind >> vectors then it ought to be able to subsume this code - unless of course >> it replaces it wholesale. >> >> Are you concerned that adding this patch will result in more cases to >> pick through and correct? >> >> Are you worried that we might have to withdraw some of the support this >> patch enables to arrive at the final goal? >> >> Also, Ningsheng and his colleagues have laid some foundations for >> implementing predicated operations with this patch and have that work in >> the pipeline. Once again this is moving towards the desired goal even if >> it might end up doign so in a slightly sideways fashion. Perhaps we >> could continue this stop-gap experiment as an experimental option in >> order to learn from the experience? 
> > I definitely don't want to hinder/block the impressive work Ningsheng > and others at Arm are doing for SVE support. > > Frankly speaking, my main concern is that the implementation can stay > that way forever ;-) That's why I'm trying to get enough ground covered > in the discussion and some agreements/commitments to be made before it > is integrated. > > I don't have any strong objections to the patch which could justify > blocking its integration, but on a higher-level I do voice my concerns > about where it pushes the implementation longer-term. > > Unfortunately, as it is shaped now, I don't see how x86 can benefit from > it. So, I'm afraid this particular route with vecA and _is_scalable bit > will stay purely AArch64-specific exercise. > > Leaving RA part aside, I have one suggestion which should help in the > future: let's try to consistently follow full-width vector abstraction. > In AD file, vecA operand is way too similar to vecX et al which makes a > wrong impression it's yet another vector flavor. So, choosing a better > name will help when representation changes. For example, x86 moved away > from vecX/... operands to a single generic one (called "vec") and you > can take a loot at x86.ad to see the result. > Thanks for the suggestion. In current implementation vecA does not include vecD/vecX for NEON - so actually it's regarded as another vector flavor. We try to keep the SVE implementation separated from original NEON code (and a new ad file is also introduced), to make the code better maintainable and reviewable. What do you think about this naming, Andrew? Thanks, Ningsheng From goetz.lindenmaier at sap.com Fri Aug 28 06:37:39 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 06:37:39 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Richard, Thanks for the new webrev. The small improvements are fine, too. Reviewed from my side. Best regards, Goetz. > -----Original Message----- > From: Reingruber, Richard > Sent: Thursday, August 27, 2020 10:33 PM > To: Lindenmaier, Goetz ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Goetz, > > > I read through your change again. It looks good to me now. > > The new naming and additional comments make it > > easier to read I think, thank you. > > Thanks for all your input! > > > One small thing: > > deoptimization.cpp, l. 1503 > > You don't really need the brackets. Two lines below you don't use them > either. > > (No webrev needed) > > Thanks for providing the correct line off list. Fixed! > > I prepared a new webrev, because I had to rebase after JDK-8249293 [1] and > because I wanted to make use of JDK-8251384 [2] > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ > Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/ > > The delta looks bigger than it is. Most of it is re-indentation of > VM_GetOrSetLocal::deoptimize_objects(). 
You can see this if you look at > > http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/src/hotsp > ot/share/prims/jvmtiImpl.cpp.udiff.html > > which does not include the whitespace change. > > Hope you are still ok with webrev.8. The changes are marginal. I've > commented > each below. > > Thanks, Richard. > > --- Details below --- > > src/hotspot/share/prims/jvmtiImpl.cpp > > @@ -425,11 +425,11 @@ > , _depth(depth) > , _index(index) > , _type(type) > , _jvf(NULL) > , _set(false) > - , _eb(NULL, NULL, false) // no references escape > + , _eb(NULL, NULL, type == T_OBJECT) > , _result(JVMTI_ERROR_NONE) > > Currently 'type' is never equal to T_OBJECT at this location, still I think it > is better to check. The compiler will replace the compare with false. > > @@ -630,11 +630,11 @@ > } > > // Revert optimizations based on escape analysis if this is an access to a > local object > bool VM_GetOrSetLocal::deoptimize_objects(javaVFrame* jvf) { > #if COMPILER2_OR_JVMCI > - if (NOT_JVMCI(DoEscapeAnalysis &&) _type == T_OBJECT) { > + assert(_type == T_OBJECT, "EscapeBarrier should not be active if _type != > T_OBJECT"); > > I removed the if from VM_GetOrSetLocal::deoptimize_objects(), because > now it > only gets called if the VM_GetOrSetLocal instance has an active > EscapeBarrier > which will be the case iff the local type is T_OBJECT and if either C2 escape > analysis is enabled or Graal is used. > > src/hotspot/share/runtime/deoptimization.cpp > > You suggested to remove the braces. Done. > > src/hotspot/share/runtime/deoptimization.hpp > > Must provide definition of EscapeBarrier::barrier_active() for new call site in > VM_GetOrSetLocal::doit_prologue() if building with COMPILER2_OR_JVMCI > not > defined. > > test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysis > Enabled.java > > Make use of [2] and pass test with minimal vm. > > [1] https://bugs.openjdk.java.net/browse/JDK-8249293 > [2] https://bugs.openjdk.java.net/browse/JDK-8251384 > > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Samstag, 22. August 2020 07:46 > To: Reingruber, Richard ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Richard, > > I read through your change again. It looks good to me now. > The new naming and additional comments make it > easier to read I think, thank you. > > One small thing: > deoptimization.cpp, l. 1503 > You don't really need the brackets. Two lines below you don't use them > either. > (No webrev needed) > > Best regards, > Goetz. From richard.reingruber at sap.com Fri Aug 28 07:41:02 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 28 Aug 2020 07:41:02 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Thanks a lot! Richard. -----Original Message----- From: Lindenmaier, Goetz Sent: Freitag, 28. 
August 2020 08:38 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, Thanks for the new webrev. The small improvements are fine, too. Reviewed from my side. Best regards, Goetz. > -----Original Message----- > From: Reingruber, Richard > Sent: Thursday, August 27, 2020 10:33 PM > To: Lindenmaier, Goetz ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Goetz, > > > I read through your change again. It looks good to me now. > > The new naming and additional comments make it > > easier to read I think, thank you. > > Thanks for all your input! > > > One small thing: > > deoptimization.cpp, l. 1503 > > You don't really need the brackets. Two lines below you don't use them > either. > > (No webrev needed) > > Thanks for providing the correct line off list. Fixed! > > I prepared a new webrev, because I had to rebase after JDK-8249293 [1] and > because I wanted to make use of JDK-8251384 [2] > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8/ > Delta: http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/ > > The delta looks bigger than it is. Most of it is re-indentation of > VM_GetOrSetLocal::deoptimize_objects(). You can see this if you look at > > http://cr.openjdk.java.net/~rrich/webrevs/8227745/webrev.8.inc/src/hotsp > ot/share/prims/jvmtiImpl.cpp.udiff.html > > which does not include the whitespace change. > > Hope you are still ok with webrev.8. The changes are marginal. I've > commented > each below. > > Thanks, Richard. > > --- Details below --- > > src/hotspot/share/prims/jvmtiImpl.cpp > > @@ -425,11 +425,11 @@ > , _depth(depth) > , _index(index) > , _type(type) > , _jvf(NULL) > , _set(false) > - , _eb(NULL, NULL, false) // no references escape > + , _eb(NULL, NULL, type == T_OBJECT) > , _result(JVMTI_ERROR_NONE) > > Currently 'type' is never equal to T_OBJECT at this location, still I think it > is better to check. The compiler will replace the compare with false. > > @@ -630,11 +630,11 @@ > } > > // Revert optimizations based on escape analysis if this is an access to a > local object > bool VM_GetOrSetLocal::deoptimize_objects(javaVFrame* jvf) { > #if COMPILER2_OR_JVMCI > - if (NOT_JVMCI(DoEscapeAnalysis &&) _type == T_OBJECT) { > + assert(_type == T_OBJECT, "EscapeBarrier should not be active if _type != > T_OBJECT"); > > I removed the if from VM_GetOrSetLocal::deoptimize_objects(), because > now it > only gets called if the VM_GetOrSetLocal instance has an active > EscapeBarrier > which will be the case iff the local type is T_OBJECT and if either C2 escape > analysis is enabled or Graal is used. > > src/hotspot/share/runtime/deoptimization.cpp > > You suggested to remove the braces. Done. > > src/hotspot/share/runtime/deoptimization.hpp > > Must provide definition of EscapeBarrier::barrier_active() for new call site in > VM_GetOrSetLocal::doit_prologue() if building with COMPILER2_OR_JVMCI > not > defined. > > test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysis > Enabled.java > > Make use of [2] and pass test with minimal vm. 
> > [1] https://bugs.openjdk.java.net/browse/JDK-8249293 > [2] https://bugs.openjdk.java.net/browse/JDK-8251384 > > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Samstag, 22. August 2020 07:46 > To: Reingruber, Richard ; serviceability- > dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Richard, > > I read through your change again. It looks good to me now. > The new naming and additional comments make it > easier to read I think, thank you. > > One small thing: > deoptimization.cpp, l. 1503 > You don't really need the brackets. Two lines below you don't use them > either. > (No webrev needed) > > Best regards, > Goetz. From christian.hagedorn at oracle.com Fri Aug 28 08:10:08 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 28 Aug 2020 10:10:08 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> Message-ID: <32e58c35-e19a-c4cf-608e-10aa2a8fa12e@oracle.com> Hi Vladimir On 27.08.20 19:23, Vladimir Kozlov wrote: > On 8/27/20 12:53 AM, Christian Hagedorn wrote: >> On 25.08.20 19:42, Christian Hagedorn wrote: >>> On 25.08.20 16:13, Roland Westrelin wrote: >>>> >>>>> In the testcase, a LoadSNode is cloned in >>>>> PhaseIdealLoop::split_if_with_blocks_post() for each use such that >>>>> they >>>>> can float out of a loop. To ensure that these loads cannot float back >>>>> into the loop, we pin them by setting their control input [1]. In the >>>>> testcase, all 3 new clones are pinned to a loop exit node that is part >>>>> of an outer strip mined loop (see [2]). >>>> >>>> Do I understand this right, that all 3 clones are pinned with the same >>>> control? So they common and only of them is kept? >>> >>> Yes, exactly. All are pinned to the inner loop exit node. But at the >>> time we hit the assertion failure, we still got one cloned load (903 >>> LoadS) that is an input to the store (575 StoreI) that's going into >>> the outer strip mined loop safepoint, and one load (901 LoadS) that >>> is triggering the dominance failure. LoadS 902 was removed at some >>> point in between due to other optimizations. >> >> As Roland and I have discussed offline, it seems to be better and >> safer to do a simpler fix that does not change the original behavior >> of the optimization. The new fix suggests not yank AddP nodes (which >> are inputs to the cloned LoadSNodes in the testcase) and also to not >> yank gc barriers. In the testcase, the cloned LoadSNodes are still >> pinned at the loop exit but now they can be optimized and common up to >> one node during igvn that only belongs to the safepoint in the outer >> strip mined loop (i.e. no load after the loop anymore). The load is >> still successfully removed from the inner loop: >> >> http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ >> >> I left the improved dominance failure dumping as it is. > > Good. Thank you for your review! >> >> We think that it would be a good idea to revisit this cloning >> optimization in an RFE and also consider webrev.01 there as it seems >> to be more like an enhancement for loop strip mining rather than a bug >> fix. I filed [1] which summarizes some thoughts about it. >> >> What do others think about that? 
> > I agree with that. Great! Best regards, Christian > Thanks, > Vladimir > >> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252372 From rwestrel at redhat.com Fri Aug 28 08:27:46 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 28 Aug 2020 10:27:46 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> Message-ID: <87lfhz5g59.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ That looks good to me. Roland. From christian.hagedorn at oracle.com Fri Aug 28 08:33:13 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 28 Aug 2020 10:33:13 +0200 Subject: [16] RFR(M): 8249607: C2: assert(!had_error) failed: bad dominance In-Reply-To: <87lfhz5g59.fsf@redhat.com> References: <87o8my7r0b.fsf@redhat.com> <738ba102-2cbd-a842-0f23-2984a9293035@oracle.com> <95e31d50-aab1-a56c-9077-3e6370d1d94a@oracle.com> <87lfhz5g59.fsf@redhat.com> Message-ID: <97e85c41-03b2-d0f5-8e8d-7cfe0d120644@oracle.com> Thank you Roland for your review and your help discussing it! Best regards, Christian On 28.08.20 10:27, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8249607/webrev.02/ > > That looks good to me. > > Roland. > From goetz.lindenmaier at sap.com Fri Aug 28 08:57:07 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 08:57:07 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: Hi, I'd prefer to push this. I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. Unfortunately there is nobody in the open community to address this. And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement for the Oracle vm. So I would continue to try to take all changes that go to 11-oracle to OpenJDK 11, too. And as this is now ported to 11, let's push it. Anyways, it also affects C1 and other shared code, so it might simplify integrating follow-ups. Best regards, Goetz. > -----Original Message----- > From: Severin Gehwolf > Sent: Thursday, August 27, 2020 6:00 PM > To: Andrew Haley ; Doerr, Martin > ; 'hotspot-compiler-dev at openjdk.java.net' > ; jdk-updates- > dev at openjdk.java.net > Cc: Lindenmaier, Goetz > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > > Hi, > > > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > This is a good question in general. Personally, I'd vote for > > > backporting fewer less important things to 11u in the future. We > > > should better focus on 17 IMHO. > > > > > > However, there are some arguments for backporting this one: > > > - Oracle has done so. There may be more backports in this area and > > > I'd expect less effort if we have the same code in the open version. > > > - Performance is supposed to be better. (Though I didn't measure it.) > > > - New code is much cleaner. Let's keep in mind that we have to > > > support it for quite a while. > > > > > > Are you ok with it? > > > > I'm unsure. 
While "Oracle has backported it" has been a slam-dunk > > justification for many patches, I am concerned about the destabilizing > > effect of the volume of patches we are processing. > > > > "Better performance" is not in itself justification for a backport > > unless the improvement is really compelling. > > > > "Cleanups" are a red flag. The miserable history of code that has been > > broken by seemingly innocuous cleanups is long. This is a big change > > that affects some very delicate code, but the fact that there is > > already a GraalVM patch we can use is quite persuasive. > > > > So I'm not refusing it, I want people's opinions. > > It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems > to be coming from Graal. Until there is a more compelling reason to > backport this (other than performance for some JVMCI impl) we shouldn't > backport this. We already have a label for these: jdk11u-jvmci-defer. > We should apply that and re-evaluate later if needed. > > My $0.02 > > Thanks, > Severin From adinn at redhat.com Fri Aug 28 09:21:29 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 28 Aug 2020 10:21:29 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: On 28/08/2020 06:56, Ningsheng Jian wrote: > On 8/27/20 8:54 PM, Vladimir Ivanov wrote: >> I definitely don't want to hinder/block the impressive work Ningsheng >> and others at Arm are doing for SVE support. >> >> Frankly speaking, my main concern is that the implementation can stay >> that way forever ;-) That's why I'm trying to get enough ground covered >> in the discussion and some agreements/commitments to be made before it >> is integrated. Sure, I agree that we should use this implementation as a stepping stone to a set of unified AArch64 vector rules that handle operations for vectors of all size. Having looked at the latest x86 vector code I get the impression that there is a much greater problem unifying the plethora of different cases within the x86_64 family than there will be unifying x86_64 and AArch64 in this regard. Your solution of using the vec (and legVec) register class(es) has tamed the proliferation of match rules yet it still leaves a great deal of complexity in the logic that controls the handling of those matches. I think it will be much easier to subsume the AArch64 Neon and SVE cases under one common vec type and the resulting case handling will be much less complex. Of course, the rationale for doing so is far less pressing than with x86 since the multiplication of match rules is not so large (particularly as there is no cross-combination with memory operands). Yet, it still seems worth doing. >> Leaving RA part aside, I have one suggestion which should help in the >> future: let's try to consistently follow full-width vector abstraction. >> In AD file, vecA operand is way too similar to vecX et al which makes a >> wrong impression it's yet another vector flavor. So, choosing a better >> name will help when representation changes. 
For example, x86 moved away >> from vecX/... operands to a single generic one (called "vec") and you >> can take a loot at x86.ad to see the result. > > Thanks for the suggestion. In current implementation vecA does not > include vecD/vecX for NEON - so actually it's regarded as another vector > flavor. We try to keep the SVE implementation separated from original > NEON code (and a new ad file is also introduced), to make the code > better maintainable and reviewable. What do you think about this naming, > Andrew? If the goal is that eventually a vec register class will parametrize the relevant rules for VecD, VecX and VecA operations then I don't see any harm in re-labelling the vecA class to simply be called vec. The intention to use this to handle all cases can be signalled by documenting this register class to explain that it is currently only used to specify VecA rules but will eventually be used as a generic class, parameterizing rules that subsume all applicable VecD, VecX and VecA cases. When that happens we can quite naturally fold the aarch64_sve rules back into aarch64.ad with common and/or special case handling merging under a single rule. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.x.ivanov at oracle.com Fri Aug 28 09:56:10 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 28 Aug 2020 12:56:10 +0300 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> Message-ID: <9b585dff-38be-16b5-b1a1-4ea0207458b9@oracle.com> >>> Frankly speaking, my main concern is that the implementation can stay >>> that way forever ;-) That's why I'm trying to get enough ground covered >>> in the discussion and some agreements/commitments to be made before it >>> is integrated. > > Sure, I agree that we should use this implementation as a stepping stone > to a set of unified AArch64 vector rules that handle operations for > vectors of all size. Having looked at the latest x86 vector code I get > the impression that there is a much greater problem unifying the > plethora of different cases within the x86_64 family than there will be > unifying x86_64 and AArch64 in this regard. Your solution of using the > vec (and legVec) register class(es) has tamed the proliferation of match > rules yet it still leaves a great deal of complexity in the logic that > controls the handling of those matches. I believe you are referring to ubiquitous presence of predicates in AD instructions for vector cases. The root cause is that operands have very limited influence on matching logic. There's a promising idea to introduce predicated operands and factor complex predicates into a set of simpler ones placed on operands instead. It should significantly reduce the perceived complexity, but the prototyping hasn't been finished yet. [...] 
>>> Leaving RA part aside, I have one suggestion which should help in the >>> future: let's try to consistently follow full-width vector abstraction. >>> In AD file, vecA operand is way too similar to vecX et al which makes a >>> wrong impression it's yet another vector flavor. So, choosing a better >>> name will help when representation changes. For example, x86 moved away >>> from vecX/... operands to a single generic one (called "vec") and you >>> can take a loot at x86.ad to see the result. >> >> Thanks for the suggestion. In current implementation vecA does not >> include vecD/vecX for NEON - so actually it's regarded as another vector >> flavor. We try to keep the SVE implementation separated from original >> NEON code (and a new ad file is also introduced), to make the code >> better maintainable and reviewable. What do you think about this naming, >> Andrew? > If the goal is that eventually a vec register class will parametrize the > relevant rules for VecD, VecX and VecA operations then I don't see any > harm in re-labelling the vecA class to simply be called vec. The > intention to use this to handle all cases can be signalled by > documenting this register class to explain that it is currently only > used to specify VecA rules but will eventually be used as a generic > class, parameterizing rules that subsume all applicable VecD, VecX and > VecA cases. When that happens we can quite naturally fold the > aarch64_sve rules back into aarch64.ad with common and/or special case > handling merging under a single rule. One more point on naming: though it was me who proposed the name "vec" on x86, I don't think it's the best option anymore. Considering it's desirable to get rid of VecS/VecD/VecX/... machine ideal registers and replace them with a single one, I think using Op_RegV is a better alternative to Op_Vec. Hence, regV/rRegV/vReg look better (depending on conventions adopted in particular AD file). Best regards, Vladimir Ivanov From goetz.lindenmaier at sap.com Fri Aug 28 11:48:13 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 11:48:13 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: <778219306.3426.1598610873510.JavaMail.www@wwinf1p10> References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <778219306.3426.1598610873510.JavaMail.www@wwinf1p10> Message-ID: Hi, There are queries for this on the jdk11 project page: https://wiki.openjdk.java.net/display/JDKUpdates/JDK11u e.g. https://bugs.openjdk.java.net/issues/?filter=39054 Best regards, Goetz. From: gouessej at orange.fr Sent: Friday, August 28, 2020 12:35 PM To: Lindenmaier, Goetz ; 'Severin Gehwolf' ; Andrew Haley ; Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' ; jdk-updates-dev at openjdk.java.net Subject: RE: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. Please can you elaborate about " there are enough other changes OpenJDK 11 lacks wrt. 11-oracle"? > Message du 28/08/20 11:03 > De : "Lindenmaier, Goetz" > > A : "'Severin Gehwolf'" >, "Andrew Haley" >, "Doerr, Martin" >, "'hotspot-compiler-dev at openjdk.java.net'" >, "jdk-updates-dev at openjdk.java.net" > > Copie ? : > Objet : RE: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > Hi, > > I'd prefer to push this. > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > Unfortunately there is nobody in the open community to address this. > And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. 
> If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement > for the Oracle vm. > > So I would continue to try to take all changes that go to 11-oracle > to OpenJDK 11, too. > > And as this is now ported to 11, let's push it. > Anyways, it also affects C1 and other shared code, so it might > simplify integrating follow-ups. > > Best regards, > Goetz. > > > > -----Original Message----- > > From: Severin Gehwolf > > > Sent: Thursday, August 27, 2020 6:00 PM > > To: Andrew Haley >; Doerr, Martin > > >; 'hotspot-compiler-dev at openjdk.java.net' > > >; jdk-updates- > > dev at openjdk.java.net > > Cc: Lindenmaier, Goetz > > > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > > > On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > > > Hi, > > > > > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > > This is a good question in general. Personally, I'd vote for > > > > backporting fewer less important things to 11u in the future. We > > > > should better focus on 17 IMHO. > > > > > > > > However, there are some arguments for backporting this one: > > > > - Oracle has done so. There may be more backports in this area and > > > > I'd expect less effort if we have the same code in the open version. > > > > - Performance is supposed to be better. (Though I didn't measure it.) > > > > - New code is much cleaner. Let's keep in mind that we have to > > > > support it for quite a while. > > > > > > > > Are you ok with it? > > > > > > I'm unsure. While "Oracle has backported it" has been a slam-dunk > > > justification for many patches, I am concerned about the destabilizing > > > effect of the volume of patches we are processing. > > > > > > "Better performance" is not in itself justification for a backport > > > unless the improvement is really compelling. > > > > > > "Cleanups" are a red flag. The miserable history of code that has been > > > broken by seemingly innocuous cleanups is long. This is a big change > > > that affects some very delicate code, but the fact that there is > > > already a GraalVM patch we can use is quite persuasive. > > > > > > So I'm not refusing it, I want people's opinions. > > > > It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems > > to be coming from Graal. Until there is a more compelling reason to > > backport this (other than performance for some JVMCI impl) we shouldn't > > backport this. We already have a label for these: jdk11u-jvmci-defer. > > We should apply that and re-evaluate later if needed. > > > > My $0.02 > > > > Thanks, > > Severin > > From aph at redhat.com Fri Aug 28 12:52:18 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Aug 2020 13:52:18 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> On 28/08/2020 09:57, Lindenmaier, Goetz wrote: > I'd prefer to push this. > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > Unfortunately there is nobody in the open community to address this. > And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. > If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement > for the Oracle vm. What JVMCI issue is this? Please explain. All that I see is a faster "slow" locking path for monitors. 
> So I would continue to try to take all changes that go to 11-oracle > to OpenJDK 11, too. > > And as this is now ported to 11, let's push it. > Anyways, it also affects C1 and other shared code, so it might > simplify integrating follow-ups. That is not a good reason for backporting. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From goetz.lindenmaier at sap.com Fri Aug 28 13:11:57 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 13:11:57 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: Hi Andrew, > > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > What JVMCI issue is this? Please explain. All that I see is a faster > "slow" locking path for monitors. This was meant as a more general comment. I wanted to address that we don't integrate many of the JVMCI changes so the OpenJDK 11 is probably not usable with graal. The comment was not tailored to this specific change. Unfortunately our team has not the capacity to look at JVMCI/graal. Best regards, Goetz. From gouessej at orange.fr Fri Aug 28 10:34:33 2020 From: gouessej at orange.fr (gouessej at orange.fr) Date: Fri, 28 Aug 2020 12:34:33 +0200 (CEST) Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> Message-ID: <778219306.3426.1598610873510.JavaMail.www@wwinf1p10> Please can you elaborate about " there are enough other changes OpenJDK 11 lacks wrt. 11-oracle"? ? ? > Message du 28/08/20 11:03 > De : "Lindenmaier, Goetz" > A : "'Severin Gehwolf'" , "Andrew Haley" , "Doerr, Martin" , "'hotspot-compiler-dev at openjdk.java.net'" , "jdk-updates-dev at openjdk.java.net" > Copie ? : > Objet : RE: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > Hi, > > I'd prefer to push this. > I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > Unfortunately there is nobody in the open community to address this. > And there are enough other changes OpenJDK 11 lacks wrt. 11-oracle. > If this gap grows big, we can no more claim OpenJDK 11 is a valid replacement > for the Oracle vm. > > So I would continue to try to take all changes that go to 11-oracle > to OpenJDK 11, too. > > And as this is now ported to 11, let's push it. > Anyways, it also affects C1 and other shared code, so it might > simplify integrating follow-ups. > > Best regards, > Goetz. > > > > -----Original Message----- > > From: Severin Gehwolf > > Sent: Thursday, August 27, 2020 6:00 PM > > To: Andrew Haley ; Doerr, Martin > > ; 'hotspot-compiler-dev at openjdk.java.net' > > ; jdk-updates- > > dev at openjdk.java.net > > Cc: Lindenmaier, Goetz > > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > > > On Thu, 2020-08-27 at 16:25 +0100, Andrew Haley wrote: > > > Hi, > > > > > > On 27/08/2020 11:04, Doerr, Martin wrote: > > > > > Why is anyone backporting a P4 Enhancement? Seems weird. > > > > This is a good question in general. Personally, I'd vote for > > > > backporting fewer less important things to 11u in the future. We > > > > should better focus on 17 IMHO. 
> > > > > > > > However, there are some arguments for backporting this one: > > > > - Oracle has done so. There may be more backports in this area and > > > > I'd expect less effort if we have the same code in the open version. > > > > - Performance is supposed to be better. (Though I didn't measure it.) > > > > - New code is much cleaner. Let's keep in mind that we have to > > > > support it for quite a while. > > > > > > > > Are you ok with it? > > > > > > I'm unsure. While "Oracle has backported it" has been a slam-dunk > > > justification for many patches, I am concerned about the destabilizing > > > effect of the volume of patches we are processing. > > > > > > "Better performance" is not in itself justification for a backport > > > unless the improvement is really compelling. > > > > > > "Cleanups" are a red flag. The miserable history of code that has been > > > broken by seemingly innocuous cleanups is long. This is a big change > > > that affects some very delicate code, but the fact that there is > > > already a GraalVM patch we can use is quite persuasive. > > > > > > So I'm not refusing it, I want people's opinions. > > > > It seems like a nice-to-have fix for OpenJDK 11 itself. Interest seems > > to be coming from Graal. Until there is a more compelling reason to > > backport this (other than performance for some JVMCI impl) we shouldn't > > backport this. We already have a label for these: jdk11u-jvmci-defer. > > We should apply that and re-evaluate later if needed. > > > > My $0.02 > > > > Thanks, > > Severin > > From sgehwolf at redhat.com Fri Aug 28 13:30:51 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Fri, 28 Aug 2020 15:30:51 +0200 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: On Fri, 2020-08-28 at 13:11 +0000, Lindenmaier, Goetz wrote: > I wanted to address that we don't integrate many of the JVMCI changes > so the OpenJDK 11 is probably not usable with graal. https://github.com/graalvm/mandrel#how-does-mandrel-differ-from-graal Thanks, Severin From goetz.lindenmaier at sap.com Fri Aug 28 14:30:32 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 28 Aug 2020 14:30:32 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: That's cool. So it works ?? They would probably profit from '8241234: Unify monitor enter/exit runtime entries." Best regards, Goetz. > -----Original Message----- > From: Severin Gehwolf > Sent: Friday, August 28, 2020 3:31 PM > To: Lindenmaier, Goetz ; 'Andrew Haley' > ; Doerr, Martin ; 'hotspot- > compiler-dev at openjdk.java.net' ; > jdk-updates-dev at openjdk.java.net > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > On Fri, 2020-08-28 at 13:11 +0000, Lindenmaier, Goetz wrote: > > I wanted to address that we don't integrate many of the JVMCI changes > > so the OpenJDK 11 is probably not usable with graal. > > https://github.com/graalvm/mandrel#how-does-mandrel-differ-from-graal > > Thanks, > Severin From aph at redhat.com Fri Aug 28 14:35:39 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Aug 2020 15:35:39 +0100 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. 
In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: Hi, On 28/08/2020 14:11, Lindenmaier, Goetz wrote: > >>> I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. >> What JVMCI issue is this? Please explain. All that I see is a faster >> "slow" locking path for monitors. > > This was meant as a more general comment. I wanted to address that > we don't integrate many of the JVMCI changes so the OpenJDK 11 is > probably not usable with graal. The comment was not tailored to > this specific change. Unfortunately our team has not the capacity > to look at JVMCI/graal. Fair enough. Now, let's think about the wider point. Any change is bad because our users want, above all else, stability. So first we should avoid change. In order to justify any change, I want backport patches to have a real justification. That is to say, they must have a real effect on a Java user's experience. Fixing visible bugs obviously qualifies, as does a significant performance bump, as does meeting a new crypto specification, etc, etc. The other good reason is improved stability, which includes better testing. A real justification doesn't exclude "cleanups", as long as there is some other benefit, such as making making a proposed backport cleaner. But it has to be a backport that we are actually doing, not some unknown backport that might happen some day. It may well be that the 8241234 fix has a definite performance advantage, in which case it might be a reasonable thing to do. The provided justifications were: - Oracle has done so. There may be more backports in this area and I'd expect less effort if we have the same code in the open version. - Performance is supposed to be better. - New code is much cleaner. But even though the new code is much cleaner, it's a significant change in a very delicate area. Bugs in this are can take a long time to reveal themselves, usually under heavy load in a production situation. I am not saying no to this patch. I am asking "Are you sure that this change is worth making the change?" Given that I doubt anyone will ever notice this change unless it breaks something important, I have my doubts. So, anyone: is there any chance that this patch will break something? Is this change worth the churn? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Fri Aug 28 15:02:53 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 28 Aug 2020 15:02:53 +0000 Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks In-Reply-To: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> Message-ID: Hi Lutz, just for my understanding: What exactly are we protecting against by holding Compile_lock? Is it for concurrent initialization or concurrent unloading? Note that it's also possible to iterate only over alive nmethods: NMethodIterator iter(NMethodIterator::only_alive); Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Mittwoch, 26. August 2020 17:19 > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(S): 8250635: MethodArityHistogram should use > Compile_lock in favour of fancy checks > > Dear all, > > may I please request reviews for this small enhancement? 
Instead of calling a > method doing complicated and fancy (hard to understand) checks, the > iteration over all nmethods is now protected by holding the Compile_lock in > addition to the CodeCache_lock. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250635 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/ > > Thank you! > Lutz > From headius at headius.com Fri Aug 28 15:41:35 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 28 Aug 2020 10:41:35 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com> Message-ID: It has been a couple months so I want to wake this thread up again. As far as I know nothing has changed. Just to emphasize the importance here: if indy call sites are not inlining, then JRuby is clearly missing out on tons of performance. It seems likely to also affect other languages using invokedynamic, and based on other reports (and my own experiments) it may not matter if exotic classloader structures are in use. What is the next step for me to help get this problem solved? - Charlie On Mon, Jun 15, 2020 at 4:38 PM Charles Oliver Nutter wrote: > > Charlie Gracie figured out a nice Hotspot incantation to reproduce > 100% and dump just the PriintInlining graph in question. > > He also managed this with tiered compilation *turned off*, so that may > have been a red herring. > > jruby \ > -Xcompile.invokedynamic \ > "-J-XX:CompileCommand=option *::*foo*,PrintInlining" \ > "-J-XX:CompileCommand=compileonly,*::*foo*" \ > "-J-XX:-TieredCompilation" \ > main.rb > > On Mon, Jun 15, 2020 at 4:23 PM Claes Redestad > wrote: > > If so, a possible workaround might be to pass the generated class > > through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on > > 15+) > > I added Unsafe.ensureClassInitialized right after the JIT class has > been defined, and it did not appear to help. > > I tried turning off JRuby's background JIT threads, which could cause > a method to get jitted and loaded twice (into separate classloaders). > The JRuby flag is "-Xjit.background=false" but it also did not help. > > - Charlie From vladimir.x.ivanov at oracle.com Fri Aug 28 15:51:02 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 28 Aug 2020 18:51:02 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com> Message-ID: <25761258-96b7-9795-41db-94147ff2b3c5@oracle.com> Hi Charles, I'll take a look and will try to reproduce it myself. Meanwhile, here's what Charlie reported: "Starting at the error message unloaded signature classes I worked backwards to find the class(es) which were causing the error. The first class in the signature that caused issues was org/jruby/RubyModule. This class was found on the current class loader but it is rejected due to a protection domain check. There was a 2nd failure related to java/lang/String which is just not found on the particular class loader." It does sound like there's something fishy happening with class loaders and compilation context. Does it ring any bell for you? Best regards, Vladimir Ivanov On 28.08.2020 18:41, Charles Oliver Nutter wrote: > It has been a couple months so I want to wake this thread up again. As > far as I know nothing has changed. 
> > Just to emphasize the importance here: if indy call sites are not > inlining, then JRuby is clearly missing out on tons of performance. It > seems likely to also affect other languages using invokedynamic, and > based on other reports (and my own experiments) it may not matter if > exotic classloader structures are in use. > > What is the next step for me to help get this problem solved? > > - Charlie > > On Mon, Jun 15, 2020 at 4:38 PM Charles Oliver Nutter > wrote: >> >> Charlie Gracie figured out a nice Hotspot incantation to reproduce >> 100% and dump just the PriintInlining graph in question. >> >> He also managed this with tiered compilation *turned off*, so that may >> have been a red herring. >> >> jruby \ >> -Xcompile.invokedynamic \ >> "-J-XX:CompileCommand=option *::*foo*,PrintInlining" \ >> "-J-XX:CompileCommand=compileonly,*::*foo*" \ >> "-J-XX:-TieredCompilation" \ >> main.rb >> >> On Mon, Jun 15, 2020 at 4:23 PM Claes Redestad >> wrote: >>> If so, a possible workaround might be to pass the generated class >>> through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on >>> 15+) >> >> I added Unsafe.ensureClassInitialized right after the JIT class has >> been defined, and it did not appear to help. >> >> I tried turning off JRuby's background JIT threads, which could cause >> a method to get jitted and loaded twice (into separate classloaders). >> The JRuby flag is "-Xjit.background=false" but it also did not help. >> >> - Charlie From headius at headius.com Fri Aug 28 15:53:10 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 28 Aug 2020 10:53:10 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: <25761258-96b7-9795-41db-94147ff2b3c5@oracle.com> References: <2f8c8f7a-3563-758b-9bb2-4e267ef7d694@oracle.com> <25761258-96b7-9795-41db-94147ff2b3c5@oracle.com> Message-ID: It does not ring any bells but we do generate runtime-compiled methods into their own classloaders. They should be pretty simple, though... same protection domain as parent classloader and as each other. I have also tried forcing all methods to be generated into the same classloader and did not see any improvement. I would love for this to be my problem, so I can fix it! - Charlie On Fri, Aug 28, 2020 at 10:51 AM Vladimir Ivanov wrote: > > Hi Charles, > > I'll take a look and will try to reproduce it myself. > > Meanwhile, here's what Charlie reported: > > "Starting at the error message unloaded signature classes I worked > backwards to find the class(es) which were causing the error. The first > class in the signature that caused issues was org/jruby/RubyModule. This > class was found on the current class loader but it is rejected due to a > protection domain check. There was a 2nd failure related to > java/lang/String which is just not found on the particular class loader." > > It does sound like there's something fishy happening with class loaders > and compilation context. Does it ring any bell for you? > > Best regards, > Vladimir Ivanov > > On 28.08.2020 18:41, Charles Oliver Nutter wrote: > > It has been a couple months so I want to wake this thread up again. As > > far as I know nothing has changed. > > > > Just to emphasize the importance here: if indy call sites are not > > inlining, then JRuby is clearly missing out on tons of performance. 
It > > seems likely to also affect other languages using invokedynamic, and > > based on other reports (and my own experiments) it may not matter if > > exotic classloader structures are in use. > > > > What is the next step for me to help get this problem solved? > > > > - Charlie > > > > On Mon, Jun 15, 2020 at 4:38 PM Charles Oliver Nutter > > wrote: > >> > >> Charlie Gracie figured out a nice Hotspot incantation to reproduce > >> 100% and dump just the PriintInlining graph in question. > >> > >> He also managed this with tiered compilation *turned off*, so that may > >> have been a red herring. > >> > >> jruby \ > >> -Xcompile.invokedynamic \ > >> "-J-XX:CompileCommand=option *::*foo*,PrintInlining" \ > >> "-J-XX:CompileCommand=compileonly,*::*foo*" \ > >> "-J-XX:-TieredCompilation" \ > >> main.rb > >> > >> On Mon, Jun 15, 2020 at 4:23 PM Claes Redestad > >> wrote: > >>> If so, a possible workaround might be to pass the generated class > >>> through Unsafe.ensureClassInitialized (or Lookup.ensureInitialized if on > >>> 15+) > >> > >> I added Unsafe.ensureClassInitialized right after the JIT class has > >> been defined, and it did not appear to help. > >> > >> I tried turning off JRuby's background JIT threads, which could cause > >> a method to get jitted and loaded twice (into separate classloaders). > >> The JRuby flag is "-Xjit.background=false" but it also did not help. > >> > >> - Charlie From lutz.schmidt at sap.com Fri Aug 28 16:01:26 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 28 Aug 2020 16:01:26 +0000 Subject: RFR(S): 8250635: MethodArityHistogram should use Compile_lock in favour of fancy checks In-Reply-To: References: <6C21DEE4-95FD-4EDA-9DBF-2B12560A5C04@sap.com> Message-ID: <73F65D4B-5970-41A3-B678-1F947BEE7392@sap.com> Hi Martin, good question. Originally, the iteration was only protected by the CodeCache_lock. That proved insufficient: the CodeCache_lock only protects against structural changes in the CodeCache. The contents of the individual code blobs can be, and is, modified independently. By acquiring the Compile_lock, those modifications are blocked while iterating. With the help of a consistency check (not contained in the RFR code), it was found that there is a slight chance to see the case (nm != NULL) && (method() == NULL). That chance is eliminated by adding the is_alive() check which is less invasive compared to adding a new nmethods_do() variant. Regards, Lutz ?On 28.08.20, 17:02, "Doerr, Martin" wrote: Hi Lutz, just for my understanding: What exactly are we protecting against by holding Compile_lock? Is it for concurrent initialization or concurrent unloading? Note that it's also possible to iterate only over alive nmethods: NMethodIterator iter(NMethodIterator::only_alive); Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of Schmidt, Lutz > Sent: Mittwoch, 26. August 2020 17:19 > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(S): 8250635: MethodArityHistogram should use > Compile_lock in favour of fancy checks > > Dear all, > > may I please request reviews for this small enhancement? Instead of calling a > method doing complicated and fancy (hard to understand) checks, the > iteration over all nmethods is now protected by holding the Compile_lock in > addition to the CodeCache_lock. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250635 > Webrev: https://cr.openjdk.java.net/~lucy/webrevs/8250635.00/ > > Thank you! 
> Lutz > From martin.doerr at sap.com Fri Aug 28 16:19:35 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 28 Aug 2020 16:19:35 +0000 Subject: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. In-Reply-To: References: <17a295cc-cea0-8534-f5bb-f667376e81d4@redhat.com> <4137e474-cf95-b380-1fd5-ca71f1313d22@redhat.com> Message-ID: Hi, seems like two different philosophies collide here. 1. Some people assume that all of Oracle's 11u changes should get integrated into the open version. 2. Others only want to take them on demand with a good reason. I think there are good arguments for and against both ones. Personally, I think approach 1. is better at the beginning of an updates branch while it may be reasonable to switch at some point of time. At the moment, I still prefer to stay in sync with Oracle as far as we can. Regarding this change, I don't see a high risk. What it basically does is that it reuses better code which is already used by C2 for C1 and JVMCI compilers. So there's no substantial new code. It's tested by GraalVM and by our internal testing. There are no known issues with it. So I'd rather vote for taking it. Best regards, Martin > -----Original Message----- > From: Andrew Haley > Sent: Freitag, 28. August 2020 16:36 > To: Lindenmaier, Goetz ; 'Severin Gehwolf' > ; Doerr, Martin ; 'hotspot- > compiler-dev at openjdk.java.net' dev at openjdk.java.net>; jdk-updates-dev at openjdk.java.net > Subject: Re: [11u] RFR(S): 8241234: Unify monitor enter/exit runtime entries. > > Hi, > > On 28/08/2020 14:11, Lindenmaier, Goetz wrote: > > > >>> I'm not really happy with 11 staying behind 11-oracle in the JVMCI issue. > >> What JVMCI issue is this? Please explain. All that I see is a faster > >> "slow" locking path for monitors. > > > > This was meant as a more general comment. I wanted to address that > > we don't integrate many of the JVMCI changes so the OpenJDK 11 is > > probably not usable with graal. The comment was not tailored to > > this specific change. Unfortunately our team has not the capacity > > to look at JVMCI/graal. > > Fair enough. > > Now, let's think about the wider point. > > Any change is bad because our users want, above all else, > stability. So first we should avoid change. > > In order to justify any change, I want backport patches to have a real > justification. That is to say, they must have a real effect on a Java > user's experience. Fixing visible bugs obviously qualifies, as does a > significant performance bump, as does meeting a new crypto > specification, etc, etc. > > The other good reason is improved stability, which includes better > testing. > > A real justification doesn't exclude "cleanups", as long as there is > some other benefit, such as making making a proposed backport > cleaner. But it has to be a backport that we are actually doing, not > some unknown backport that might happen some day. > > It may well be that the 8241234 fix has a definite performance > advantage, in which case it might be a reasonable thing to do. > The provided justifications were: > > - Oracle has done so. There may be more backports in this area and I'd > expect less effort if we have the same code in the open version. > - Performance is supposed to be better. > - New code is much cleaner. > > But even though the new code is much cleaner, it's a significant > change in a very delicate area. Bugs in this are can take a long time > to reveal themselves, usually under heavy load in a production > situation. > > I am not saying no to this patch. 
I am asking "Are you sure that this > change is worth making the change?" Given that I doubt anyone will > ever notice this change unless it breaks something important, I have > my doubts. > > So, anyone: is there any chance that this patch will break something? > Is this change worth the churn? > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From hohensee at amazon.com Fri Aug 28 16:40:28 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Fri, 28 Aug 2020 16:40:28 +0000 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) Message-ID: <5D556B7D-1995-4FBC-9176-E79FFC789571@amazon.com> One's perspective on the benchmark results depends on the expected frequency of the input types. If we don't expect frequent NaNs (I don?t, because they mean your algorithm is numerically unstable and you're wasting your time running it), or zeros (somewhat arguable, but note that most codes go to some lengths to eliminate zeros, e.g., using sparse arrays), then this patch seems to me to be a win. Thanks, Paul ?On 8/25/20, 9:57 AM, "hotspot-compiler-dev on behalf of Andrew Haley" wrote: On 24/08/2020 22:52, Dmitry Chuyko wrote: > > I added two more intrinsics -- for copySign, they are controlled by > UseCopySignIntrinsic flag. > > webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/ > > It also contains 'benchmarks' directory: > http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/benchmarks/ > > There are 8 benchmarks there: (double | float) x (blackhole | reduce) x > (current j.l.Math.signum | abs()>0 check). > > My results on Arm are in signum-facgt-copysign.ods. Main case is > 'random' which is actually a random from positive and negative numbers > between -0.5 and +0.5. > > Basically we have ~14% improvement in 'reduce' benchmark variant but > ~20% regression in 'blackhole' variant in case of only copySign() > intrinsified. > > Same picture if abs()>0 check is used in signum() (+-5%). This variant > is included as it shows very good results on x86. > > Intrinsic for signum() gives improvement of main case in both > 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a > noticeable difference. Ignoring Blackhole for the moment, this is what I'm seeing for the reduction/random case: Benchmark Mode Cnt Score Error Units ThunderX 2: -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.456 ? 0.065 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.766 ? 0.107 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 2.537 ? 0.770 ns/op Neoverse N1 (Actually Amazon m6g.16xlarge): -XX:-UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.173 ? 0.001 ns/op -XX:+UseSignumIntrinsic -XX:-UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.043 ? 0.022 ns/op -XX:-UseSignumIntrinsic -XX:+UseCopySignIntrinsic DoubleReduceBench.ofRandom avgt 3 1.012 ? 0.001 ns/op By your own numbers, in the reduce benchmark the signum intrinsic is worse than default for all 0 and NaN, but about 12% better for random, >0, and <0. If you take the average of the sppedups and slowdowns it's actually worse than default. By my reckoning, if you take all possibilities (Nan, <0, >0, 0, Random) into account, the best-performing on the reduce test is actually Abs/Copysign, but there's very little in it. 
The only time that the signum intrinsic actually wins is when you're storing the result into memory *and* flushing the store buffer. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From honguye at microsoft.com Fri Aug 28 17:46:07 2020 From: honguye at microsoft.com (Nhat Nguyen) Date: Fri, 28 Aug 2020 17:46:07 +0000 Subject: [EXTERNAL] Re: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> References: <3c989485-754f-b7f5-e91f-c7c0adfdaf88@oracle.com> Message-ID: Thank you Christian for taking a look at the patch! I'll be sure to ask a sponsor to assign the bug for me in the future. Nhat -----Original Message----- From: Christian Hagedorn Sent: Thursday, August 27, 2020 7:54 AM To: Nhat Nguyen ; hotspot-compiler-dev at openjdk.java.net Subject: [EXTERNAL] Re: RFR(S) 8251271- C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes Hi Nhat Looks good to me! Just make sure you that next time you assign the bug to you or a sponsor and/or leave a comment that you intend to work on it to avoid the possibility of some duplicated work (was no problem in this case) ;-) Best regards, Christian On 26.08.20 20:55, Nhat Nguyen wrote: > Hi hotspot-compiler-dev, > > Please review the following patch to address > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs > .openjdk.java.net%2Fbrowse%2FJDK-8251271&data=02%7C01%7Chonguye%40 > microsoft.com%7C52cd8fdc324d4e86326b08d84a991fef%7C72f988bf86f141af91a > b2d7cd011db47%7C1%7C0%7C637341368808595657&sdata=j3YM%2BfxaO8KK1Ie > CbKCPRYjwmGVfCUBrNULXDCJcUxM%3D&reserved=0 > The bug is currently assigned to Christian Hagedorn, but he was supportive of me submitting the patch instead. > I have run hotspot/tier1 and jdk/tier1 tests to make sure that the change is working as intended. > > webrev: > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.open > jdk.java.net%2F~burban%2Fnhat%2FJDK-8251271%2Fwebrev.00%2F&data=02 > %7C01%7Chonguye%40microsoft.com%7C52cd8fdc324d4e86326b08d84a991fef%7C7 > 2f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637341368808595657&sdata > =PsHUTKZf9MrvM8Et5zPXsXpj32mfsGfBRGoZATjOv0I%3D&reserved=0 > > Thank you, > Nhat > From Roger.Riggs at oracle.com Fri Aug 28 17:54:42 2020 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Fri, 28 Aug 2020 13:54:42 -0400 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> Message-ID: <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> Hi Corey, A few comments on core-libs side... The naming convention for methods that end in '0' is usually to indicate they are the bottom-most method or a native method. So I think you can/should rename the methods to make the most sense as to their function. Comparing with the way that the Base64 encoder was intrinsified, the method that is intrinsified should have a method body that does the same function, so it is interchangable.? That likely will just shift the "fast path" code into the decodeBlock method. Keeping the symmetry between encoder and decoder will make it easier to maintain the code. 
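As a sketch of what that interchangeable shape could look like (the signature, parameter names and lookup tables below are illustrative, not the actual webrev code), the intrinsifiable method decodes whole 4-character groups and reports how many output bytes it produced, leaving everything else to the existing scalar path:

    @HotSpotIntrinsicCandidate
    private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp, boolean isURL) {
        // Same work as the intrinsic: decode full 4-character groups and stop
        // at the first byte that is not a plain base64 character, returning
        // the number of output bytes produced so far.
        int[] base64 = isURL ? fromBase64URL : fromBase64;
        int produced = 0;
        while (sp + 4 <= sl) {
            int b0 = base64[src[sp] & 0xff],     b1 = base64[src[sp + 1] & 0xff],
                b2 = base64[src[sp + 2] & 0xff], b3 = base64[src[sp + 3] & 0xff];
            if ((b0 | b1 | b2 | b3) < 0) {
                return produced;              // caller finishes with the slow path
            }
            int bits = b0 << 18 | b1 << 12 | b2 << 6 | b3;
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
            sp += 4;
            produced += 3;
        }
        return produced;
    }

With a contract like that, the caller can resume the scalar loop at the corresponding source offset and handle padding and any trailing partial group exactly as today.
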
Given intrinsic only handles 2 of the three cases, and the java code handles all three, I would add an extra arg to decodeBlock to reflect the isMime case and have the intrinsic take an early exit until it was implemented. It is unfortunate that taking advantage of vectorization has to be hand coded. If/when the Vector API is ready (JEP 338 https://openjdk.java.net/jeps/338) the java code should be replaced to use the Vector API and then it would work for a new hardware without specific coding for each platform. "Just" implement the Vector API.? There's a lot more bang for the buck going for that approach. Thanks, Roger On 8/24/20 9:21 PM, Corey Ashford wrote: > Here's a revised webrev which includes a JMH benchmark for the decode > operation. > > http://cr.openjdk.java.net/~mhorie/8248188/webrev.03/ > > The added benchmark tries to be "fair" in that it doesn't prefer a > large buffer size, which would favor the intrinsic.? It > pseudo-randomly (but reproducibly) chooses a buffer size between 8 and > 20k+8 bytes, and fills it with random data to encode and decode.? As > part of the TearDown of an invocation, it also checks the decoded > output data for correctness. > > Example runs on the Power9-based machine I use for development shows a > 3X average improvement across these random buffer sizes. Here's an > excerpt of the output when run with -XX:-UseBASE64Intrinsics : > > Iteration?? 1: 70795.623 ops/s > Iteration?? 2: 71070.607 ops/s > Iteration?? 3: 70867.544 ops/s > Iteration?? 4: 71107.992 ops/s > Iteration?? 5: 71048.281 ops/s > > And here's the output with the intrinsic enabled: > > Iteration?? 1: 208794.022 ops/s > Iteration?? 2: 208630.904 ops/s > Iteration?? 3: 208238.822 ops/s > Iteration?? 4: 208714.967 ops/s > Iteration?? 5: 209060.894 ops/s > > Taking the best of the two runs: 209060/71048 = 2.94 > > From other experiments where the benchmark uses a fixed-size, larger > buffer, the performance ratio rises to about 4.0. > > Power10 should have a slightly higher ratio due to several factors, > but I have not yet benchmarked on Power10. > > Other arches ought to be able to do at least this well, if not better, > because of wider vector registers (> 128 bits) being available.? Only > a Power9/10 implementation is included in this webrev, however. > > Regards, > > - Corey > > > On 8/19/20 11:20 AM, Roger Riggs wrote: >> Hi Corey, >> >> For changes obviously performance motivated, it is conventional to >> run a JMH perf test to demonstate >> the improvement and prove it is worthwhile to add code complexity. >> >> I don't see any existing Base64 JMH tests but they would be in the >> repo below or near: >> ???? test/micro/org/openjdk/bench/java/util/ >> >> Please contribute a JMH test and results to show the difference. >> >> Regards, Roger >> >> >> >> On 8/19/20 2:10 PM, Corey Ashford wrote: >>> Michihiro Horie posted up a new iteration of this webrev for me.? >>> This time the webrev includes a complete implementation of the >>> intrinsic for Power9 and Power10. >>> >>> You can find it here: >>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >>> >>> Changes in webrev.02 vs. webrev.01: >>> >>> ? * The method header for the intrinsic in the Base64 code has been >>> rewritten using the Javadoc style.? The clarity of the comments has >>> been improved and some verbosity has been removed. There are no >>> additional functional changes to Base64.java. >>> >>> ? 
* The code needed to martial and check the intrinsic parameters >>> has been added, using the base64 encodeBlock intrinsic as a guideline. >>> >>> ? * A complete intrinsic implementation for Power9 and Power10 is >>> included. >>> >>> ? * Adds some Power9 and Power10 assembler instructions needed by >>> the intrinsic which hadn't been defined before. >>> >>> The intrinsic implementation in this patch accelerates the decoding >>> of large blocks of base64 data by a factor of about 3.5X on Power9. >>> >>> I'm attaching two Java test cases I am using for testing and >>> benchmarking.? The TestBase64_VB encodes and decodes randomly-sized >>> buffers of random data and checks that original data matches the >>> encoded-then-decoded data.? TestBase64Errors encodes a 48K block of >>> random bytes, then corrupts each byte of the encoded data, one at a >>> time, checking to see if the decoder catches the illegal byte. >>> >>> Any comments/suggestions would be appreciated. >>> >>> Thanks, >>> >>> - Corey >>> >>> On 7/27/20 6:49 PM, Corey Ashford wrote: >>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>>> intrinsic API for me: >>>> >>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>>> >>>> It has the following changes with respect to the original one posted: >>>> >>>> ??* In the event of encountering a non-base64 character, instead of >>>> having a separate error code of -1, the intrinsic can now just >>>> return either 0, or the number of data bytes produced up to the >>>> point where the illegal base64 character was encountered. This >>>> reduces the number of special cases, and also provides a way to >>>> speed up the process of finding the bad character by the slower, >>>> pure-Java algorithm. >>>> >>>> ??* The isMIME boolean is removed from the API for two reasons: >>>> ??? - The current API is not sufficient to handle the isMIME case, >>>> because there isn't a strict relationship between the number of >>>> input bytes and the number of output bytes, because there can be an >>>> arbitrary number of non-base64 characters in the source. >>>> ??? - If an intrinsic only implements the (isMIME == false) case as >>>> ours does, it will always return 0 bytes processed, which will >>>> slightly slow down the normal path of processing an (isMIME == >>>> true) instantiation. >>>> ??? - We considered adding a separate hotspot candidate for the >>>> (isMIME == true) case, but since we don't have an intrinsic >>>> implementation to test that, we decided to leave it as a future >>>> optimization. >>>> >>>> Comments and suggestions are welcome.? Thanks for your consideration. >>>> >>>> - Corey >>>> >>>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>>> Hi Corey, >>>>> >>>>> Following is the issue I created. >>>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>>> >>>>> I will upload a webrev when you're ready as we talked in private. 
>>>>> >>>>> Best regards, >>>>> Michihiro >>>>> >>>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 >>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCandidate and API for encodeBlock, but no >>>>> >>>>> From: "Corey Ashford" >>>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>>> , >>>>> "ppc-aix-port-dev at openjdk.java.net" >>>>> >>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori >>>>> Ogata/Japan/IBM at IBMJP, joserz at br.ibm.com >>>>> Date: 2020/06/24 09:40 >>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>>> Base64 decoding >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> >>>>> >>>>> >>>>> Currently in java.util.Base64, there is a >>>>> HotSpotIntrinsicCandidate and >>>>> API for encodeBlock, but none for decoding. ?This means that only >>>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>>> >>>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>>> considerations I have for this new intrinsic's API: >>>>> >>>>> ??* Don't make any assumptions about the underlying capability of the >>>>> hardware. ?For example, do not impose any specific block size >>>>> granularity. >>>>> >>>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>>> modes, but also let them decide if they will process the data >>>>> regardless >>>>> of the settings of the two booleans. >>>>> >>>>> ??* Any remaining data that is not processed by the intrinsic will be >>>>> processed by the pure Java implementation. ?This allows the >>>>> intrinsic to >>>>> process whatever block sizes it's good at without the complexity of >>>>> handling the end fragments. >>>>> >>>>> ??* If any illegal character is discovered in the decoding >>>>> process, the >>>>> intrinsic will simply return -1, instead of requiring it to throw a >>>>> proper exception from the context of the intrinsic. ?In the event of >>>>> getting a -1 returned from the intrinsic, the Java Base64 library >>>>> code >>>>> simply calls the pure Java implementation to have it find the >>>>> error and >>>>> properly throw an exception. ?This is a performance trade-off in the >>>>> case of an error (which I expect to be very rare). >>>>> >>>>> ??* One thought I have for a further optimization (not implemented in >>>>> the current patch), is that when the intrinsic decides not to >>>>> process a >>>>> block because of some combination of isURL and isMIME settings it >>>>> doesn't handle, it could return extra bits in the return code, >>>>> encoded >>>>> as a negative number. ?For example: >>>>> >>>>> Illegal_Base64_char ? = 0b001; >>>>> isMIME_unsupported ? ?= 0b010; >>>>> isURL_unsupported ? ? = 0b100; >>>>> >>>>> These can be OR'd together as needed and then negated (flip the >>>>> sign). >>>>> The Base64 library code could then cache these flags, so it will know >>>>> not to call the intrinsic again when another decodeBlock is requested >>>>> but with an unsupported mode. ?This will save the performance hit of >>>>> calling the intrinsic when it is guaranteed to fail. >>>>> >>>>> I've tested the attached patch with an actual intrinsic coded up for >>>>> Power9/Power10, but those runtime intrinsics and arch-specific >>>>> patches >>>>> aren't attached today. ?I want to get some consensus on the >>>>> library-level intrinsic API first. 
>>>>> >>>>> Also attached is a simple test case to test that the new intrinsic >>>>> API >>>>> doesn't break anything. >>>>> >>>>> I'm open to any comments about this. >>>>> >>>>> Thanks for your consideration, >>>>> >>>>> - Corey >>>>> >>>>> >>>>> Corey Ashford >>>>> IBM Systems, Linux Technology Center, OpenJDK team >>>>> cjashfor at us dot ibm dot com >>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by >>>>> Michihiro Horie/Japan/IBM] >>>>> >>>>> >>>> >>> >> > From dean.long at oracle.com Sat Aug 29 01:41:51 2020 From: dean.long at oracle.com (Dean Long) Date: Fri, 28 Aug 2020 18:41:51 -0700 Subject: RFR(M) 8209961: [AOT] crash in Graal stub when -XX:+VerifyOops is used Message-ID: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8209961 http://cr.openjdk.java.net/~dlong/8209961/webrev/ This change fixes support for -XX:+VerifyOops when used with AOT. The feature is disabled in generated AOT code by default unless -J-Dgraal.AOTVerifyOops=true is passed to jaotc (similar idea as --compile-with-assertions).? The JVM changes are minimal.? The Graal changes are all from upstream Graal and have already been reviewed and pushed there. dl From boris.ulasevich at bell-sw.com Sat Aug 29 15:39:02 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sat, 29 Aug 2020 18:39:02 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <5cbb89bb-32c7-8064-a6e9-f9b0d0a2b195@redhat.com> References: <2fae4597-6803-a7ba-62a3-1e1827819309@bell-sw.com> <667e7712-1f7f-f13b-d24a-b45658382d6c@redhat.com> <6a1df054-829f-d0b0-a85c-8ee311bab657@bell-sw.com> <8635e113-53c0-f6a7-e51c-acf20222e216@redhat.com> <89c44f2f-9a4f-4298-4e78-358fff27b6a7@bell-sw.com> <95cf8beb-2071-8c41-ff71-d4998681e742@redhat.com> <2323d921-8db3-b98f-af7a-bba7b7c345be@bell-sw.com> <405af8db-d12b-66ef-ff1b-8d0e2fb1273c@bell-sw.com> <5cbb89bb-32c7-8064-a6e9-f9b0d0a2b195@redhat.com> Message-ID: <24ed2bde-c80d-b6db-3167-6c31cc8fb4a7@bell-sw.com> Hi Andrew, Thank you once again. Can you please look at my update. I have added a functional test to demonstrate which cases are covered by the change and made a small update (OrI case in is_bitrange_zero) to add the missing transformation on java.awt.Color case: http://cr.openjdk.java.net/~bulasevich/8249893/webrev.02 The test shows successful transformation for typical int/long value construction cases I found in jdk java sources: ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF) (high << 32) | (low & 0xffffffffL) Was there anything else among your test cases? On my test case SubTest0::tst2 output I see that the BFI transformation works, but for this particular case (compiled with template=template1 where value1=value2) the result is not faster than default one. (value1 & 0x1L) | ((value1 & 0x1L) << 3) : and? x11, x2, #0x1 orr? x11, x11, x11, lsl #3 -> and? x11, x2, #0x1 bfi? x11, x2, #3, #1 I think it is Ok, using bfi here does not reduce the number of instructions used. The same case with different inputs (template=template2) is better: (value1 & 0x1L) | ((valueC & 0x1L) << 1) : and? x18, x10, #0x1 and? x10, x1, #0x1 orr? x10, x10, x18, lsl #3 -> and? x11, x3, #0x1 bfi? x11, x18, #3, #1 Do you think TestBFI test cases are Ok or I should implement more checks? The "a << 24 >>> 24" case IMO should be implemented as a LShiftI::Ideal transformation which should be done separately. 
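Spelled out as plain Java, the construction shapes in question look like the sketch below (illustrative only, with an explicit widening cast added so the long case compiles as intended; the TestBFI sources may differ):

    // four byte-sized fields packed into one int (java.awt.Color style)
    static int packArgb(int a, int r, int g, int b) {
        return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    // long assembled from a high and a low 32-bit half
    static long packHalves(int high, int low) {
        return ((long) high << 32) | (low & 0xffffffffL);
    }

    // (a << 24) >>> 24 zero-extends the low byte, i.e. it computes the same
    // value as (a & 0xFF); that is the case suggested above for a separate
    // LShiftI::Ideal canonicalization
    static int lowByte(int a) {
        return (a << 24) >>> 24;
    }
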
thanks, Boris On 26.08.2020 17:21, Andrew Haley wrote: > On 25/08/2020 18:30, Boris Ulasevich wrote: >> I believe masking with left shift and right shift is not common. >> Search though jdk repository does not give such patterns while >> there is a hundreds of mask+lshift expressions. > >> I implemented a simple is_bitrange_zero() method for counting the >> bitranges of sub-expressions: power-of-two masks and left shift only. >> We can take into account more cases (careful testing is a main >> concern). But particularly about "r.a << 24 >>> 24" expression >> I think it is worse to think about canonicalization: "left shift + right >> shift" to "mask + left shift" (or may be the backwards). > I'm running your test program, and for example I get this, old on the > left, new on the right. > > Compiled method (c2) 11832 1113 SubTest0::tst2 (184 bytes) > > : and x11, x2, #0x1 ;*land : and x11, x2, #0x1 > : and x10, x1, #0x1 ;*land : and x10, x1, #0x1 > : orr x11, x11, x11, lsl #3 : bfi x11, x2, #3, #1 > : orr x10, x10, x10, lsl #3 : bfi x10, x1, #3, #1 > : and xmethod, x3, #0x1 ;*land : and xmethod, x3, #0x1 > : add x10, x10, x11 : bfi xmethod, x3, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : and xmethod, x4, #0x1 ;*land : and x11, x4, #0x1 > : add x10, x11, x10 : bfi x11, x4, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : and xmethod, x5, #0x1 ;*land : and xmethod, x5, #0x1 > : add x10, x11, x10 : bfi xmethod, x5, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : and xmethod, x6, #0x1 ;*land : and x11, x6, #0x1 > : add x10, x11, x10 : bfi x11, x6, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : and xmethod, x7, #0x1 ;*land : and xmethod, x7, #0x1 > : add x10, x11, x10 : bfi xmethod, x7, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : and xmethod, x0, #0x1 ;*land : add x10, x10, xmethod > : add x10, x11, x10 : ldr x13, [sp,#32] > : orr x11, xmethod, xmethod, lsl #3 : and x11, x0, #0x1 > : ldr xmethod, [sp,#32] : and xmethod, x13, #0x1 > : and xmethod, xmethod, #0x1 : bfi x11, x0, #3, #1 > : add x10, x11, x10 : bfi xmethod, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : ldr xmethod, [sp,#40] : ldr x13, [sp,#40] > : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 > : add x10, x11, x10 : bfi x11, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : ldr xmethod, [sp,#48] : ldr x13, [sp,#48] > : and xmethod, xmethod, #0x1 : and xmethod, x13, #0x1 > : add x10, x11, x10 : bfi xmethod, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, x11 > : ldr xmethod, [sp,#56] : ldr x13, [sp,#56] > : and xmethod, xmethod, #0x1 : and x11, x13, #0x1 > : add x10, x11, x10 : bfi x11, x13, #3, #1 > : orr x11, xmethod, xmethod, lsl #3 : add x10, x10, xmethod > : add x0, x11, x10 ;*ladd : add x0, x10, x11 > > I've also tried a bunch of different test cases doing operations that > could match BFI instructions, and in only a few of them does it > happen. In almost all cases, then, this change does not help, *even > your own test case*. > > I think that you've got something that is potentially useful, but it > needs some careful analysis to make sure it actually gets used. 
> From xxinliu at amazon.com Sat Aug 29 20:08:36 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Sat, 29 Aug 2020 20:08:36 +0000 Subject: RFR: 8251464: make Node::dump(int depth) support indent Message-ID: <1598731717217.87517@amazon.com> hi, Reviewers, Could you review this patch? JBS:https://bugs.openjdk.java.net/browse/JDK-8251464 Webrev: http://cr.openjdk.java.net/~xliu/8251464/00/webrev/ This patch attempts to improve the formation of nodes when developers try to dump an ideal graph or snippet of a graph. In practice, I found it's pretty handy if Node::dump(int d) can support indent. The basic idea is to support indention for the utility function: collect_nodes_i(GrowableArray* queue, const Node* start, int direction, uint depth, bool include_start, bool only_ctrl, bool only_data) It only affects Node::dump family and -XX::PrintIdeal. It won't impact the output for igv. This can help developers who try to inspect a cluster of nodes in gdb. Another change is naming. collect_nodes_i uses breadth-first search. the container is used in fifo way instead of filo. I think the name "queue" serve better. TEST: hotspot:tier1 and gtest. mach-5 thanks, --lx From cjashfor at linux.ibm.com Sat Aug 29 20:19:42 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Sat, 29 Aug 2020 13:19:42 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> Message-ID: <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> Hi Roger, Thanks for your reply and thoughts! Comments interspersed below: On 8/28/20 10:54 AM, Roger Riggs wrote: > Hi Corey, > > A few comments on core-libs side... > > The naming convention for methods that end in '0' is usually to indicate > they are the bottom-most method or a native method. > So I think you can/should rename the methods to make the most sense > as to their function. Ok, I will fix that. > > Comparing with the way that the Base64 encoder was intrinsified, the > method that is intrinsified should have a method body that does > the same function, so it is interchangable.? That likely will just shift > the "fast path" code into the decodeBlock method. > Keeping the symmetry between encoder and decoder will > make it easier to maintain the code. Good point. I'll investigate what this looks like in terms of the actual code, and will report back (perhaps in a new webrev). > > Given intrinsic only handles 2 of the three cases, and the java code > handles > all three, I would add an extra arg to decodeBlock to reflect the isMime > case > and have the intrinsic take an early exit until it was implemented. > I did consider doing that, but didn't for two reasons: * Implementing isMIME using vector hardware would be very difficult due to the need to ignore non-base64 characters. This requires eliminating those characters from the vector, then reading and shifting more in, repeatedly until there are no non-base64 characters left. This isn't a trivial/fast thing to do, at least on Power arch. None of the published base64 encode/decode functions for vector processors address the MIME case. In fact they don't address isURL=true either, but fortunately that is a relatively easy addition. 
* If isMIME=true is not implemented by the intrinsic, it will cost unnecessary overhead for that case, because of the need to martial the parameters, call the intrinsic, and then do an early return. I benchmarked this approach before, and saw an approx 5% drop in performance when isMIME = true. So that's why we decided to leave the isMIME=true case as a later optimization. Because of the extra complexity of the algorithm, it probably shouldn't share the same intrinsic anyway; only the isMIME=true case should take the performance hit. > > It is unfortunate that taking advantage of vectorization has to be hand > coded. > If/when the Vector API is ready (JEP 338 https://openjdk.java.net/jeps/338) > the java code should be replaced to use the Vector API and then it would > work for a new hardware without specific coding for each platform. > "Just" implement the Vector API.? There's a lot more bang for the buck > going for that approach. The kind of vector processing used in this intrinsic operates mostly on bytes within one vector, not between two vectors (for example in matrix-multiply algorithms), otherwise known as SWAR (https://en.wikipedia.org/wiki/SWAR). Because of that, it's very sensitive to which exact instructions are available in the vector processor. There isn't much standardization of SWAR instructions between different arches, so I think it would be hard to get a generic SWAR API that gives good performance across several arches. From briefly looking at the link you provided, it doesn't appear to address SWAR operations, so it doesn't seem to me that waiting for the vector API would be worth the wait, and in fact may not provide any method at all to boost performance of base64 decode/encode. Regards, - Corey P.S. I work only two days a week, so the updates will be slower compared to other developers. > > Thanks, Roger > > > On 8/24/20 9:21 PM, Corey Ashford wrote: >> Here's a revised webrev which includes a JMH benchmark for the decode >> operation. >> >> http://cr.openjdk.java.net/~mhorie/8248188/webrev.03/ >> >> The added benchmark tries to be "fair" in that it doesn't prefer a >> large buffer size, which would favor the intrinsic.? It >> pseudo-randomly (but reproducibly) chooses a buffer size between 8 and >> 20k+8 bytes, and fills it with random data to encode and decode.? As >> part of the TearDown of an invocation, it also checks the decoded >> output data for correctness. >> >> Example runs on the Power9-based machine I use for development shows a >> 3X average improvement across these random buffer sizes. Here's an >> excerpt of the output when run with -XX:-UseBASE64Intrinsics : >> >> Iteration?? 1: 70795.623 ops/s >> Iteration?? 2: 71070.607 ops/s >> Iteration?? 3: 70867.544 ops/s >> Iteration?? 4: 71107.992 ops/s >> Iteration?? 5: 71048.281 ops/s >> >> And here's the output with the intrinsic enabled: >> >> Iteration?? 1: 208794.022 ops/s >> Iteration?? 2: 208630.904 ops/s >> Iteration?? 3: 208238.822 ops/s >> Iteration?? 4: 208714.967 ops/s >> Iteration?? 5: 209060.894 ops/s >> >> Taking the best of the two runs: 209060/71048 = 2.94 >> >> From other experiments where the benchmark uses a fixed-size, larger >> buffer, the performance ratio rises to about 4.0. >> >> Power10 should have a slightly higher ratio due to several factors, >> but I have not yet benchmarked on Power10. >> >> Other arches ought to be able to do at least this well, if not better, >> because of wider vector registers (> 128 bits) being available.? 
Only >> a Power9/10 implementation is included in this webrev, however. >> >> Regards, >> >> - Corey >> >> >> On 8/19/20 11:20 AM, Roger Riggs wrote: >>> Hi Corey, >>> >>> For changes obviously performance motivated, it is conventional to >>> run a JMH perf test to demonstate >>> the improvement and prove it is worthwhile to add code complexity. >>> >>> I don't see any existing Base64 JMH tests but they would be in the >>> repo below or near: >>> ???? test/micro/org/openjdk/bench/java/util/ >>> >>> Please contribute a JMH test and results to show the difference. >>> >>> Regards, Roger >>> >>> >>> >>> On 8/19/20 2:10 PM, Corey Ashford wrote: >>>> Michihiro Horie posted up a new iteration of this webrev for me. >>>> This time the webrev includes a complete implementation of the >>>> intrinsic for Power9 and Power10. >>>> >>>> You can find it here: >>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/ >>>> >>>> Changes in webrev.02 vs. webrev.01: >>>> >>>> ? * The method header for the intrinsic in the Base64 code has been >>>> rewritten using the Javadoc style.? The clarity of the comments has >>>> been improved and some verbosity has been removed. There are no >>>> additional functional changes to Base64.java. >>>> >>>> ? * The code needed to martial and check the intrinsic parameters >>>> has been added, using the base64 encodeBlock intrinsic as a guideline. >>>> >>>> ? * A complete intrinsic implementation for Power9 and Power10 is >>>> included. >>>> >>>> ? * Adds some Power9 and Power10 assembler instructions needed by >>>> the intrinsic which hadn't been defined before. >>>> >>>> The intrinsic implementation in this patch accelerates the decoding >>>> of large blocks of base64 data by a factor of about 3.5X on Power9. >>>> >>>> I'm attaching two Java test cases I am using for testing and >>>> benchmarking.? The TestBase64_VB encodes and decodes randomly-sized >>>> buffers of random data and checks that original data matches the >>>> encoded-then-decoded data.? TestBase64Errors encodes a 48K block of >>>> random bytes, then corrupts each byte of the encoded data, one at a >>>> time, checking to see if the decoder catches the illegal byte. >>>> >>>> Any comments/suggestions would be appreciated. >>>> >>>> Thanks, >>>> >>>> - Corey >>>> >>>> On 7/27/20 6:49 PM, Corey Ashford wrote: >>>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock >>>>> intrinsic API for me: >>>>> >>>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/ >>>>> >>>>> It has the following changes with respect to the original one posted: >>>>> >>>>> ??* In the event of encountering a non-base64 character, instead of >>>>> having a separate error code of -1, the intrinsic can now just >>>>> return either 0, or the number of data bytes produced up to the >>>>> point where the illegal base64 character was encountered. This >>>>> reduces the number of special cases, and also provides a way to >>>>> speed up the process of finding the bad character by the slower, >>>>> pure-Java algorithm. >>>>> >>>>> ??* The isMIME boolean is removed from the API for two reasons: >>>>> ??? - The current API is not sufficient to handle the isMIME case, >>>>> because there isn't a strict relationship between the number of >>>>> input bytes and the number of output bytes, because there can be an >>>>> arbitrary number of non-base64 characters in the source. >>>>> ??? 
- If an intrinsic only implements the (isMIME == false) case as >>>>> ours does, it will always return 0 bytes processed, which will >>>>> slightly slow down the normal path of processing an (isMIME == >>>>> true) instantiation. >>>>> ??? - We considered adding a separate hotspot candidate for the >>>>> (isMIME == true) case, but since we don't have an intrinsic >>>>> implementation to test that, we decided to leave it as a future >>>>> optimization. >>>>> >>>>> Comments and suggestions are welcome.? Thanks for your consideration. >>>>> >>>>> - Corey >>>>> >>>>> On 6/23/20 6:23 PM, Michihiro Horie wrote: >>>>>> Hi Corey, >>>>>> >>>>>> Following is the issue I created. >>>>>> https://bugs.openjdk.java.net/browse/JDK-8248188 >>>>>> >>>>>> I will upload a webrev when you're ready as we talked in private. >>>>>> >>>>>> Best regards, >>>>>> Michihiro >>>>>> >>>>>> Inactive hide details for "Corey Ashford" ---2020/06/24 >>>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 >>>>>> 09:40:10---Currently in java.util.Base64, there is a >>>>>> HotSpotIntrinsicCandidate and API for encodeBlock, but no >>>>>> >>>>>> From: "Corey Ashford" >>>>>> To: "hotspot-compiler-dev at openjdk.java.net" >>>>>> , >>>>>> "ppc-aix-port-dev at openjdk.java.net" >>>>>> >>>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori >>>>>> Ogata/Japan/IBM at IBMJP, joserz at br.ibm.com >>>>>> Date: 2020/06/24 09:40 >>>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for >>>>>> Base64 decoding >>>>>> >>>>>> ------------------------------------------------------------------------ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Currently in java.util.Base64, there is a >>>>>> HotSpotIntrinsicCandidate and >>>>>> API for encodeBlock, but none for decoding. ?This means that only >>>>>> encoding gets acceleration from the underlying CPU's vector hardware. >>>>>> >>>>>> I'd like to propose adding a new intrinsic for decodeBlock. ?The >>>>>> considerations I have for this new intrinsic's API: >>>>>> >>>>>> ??* Don't make any assumptions about the underlying capability of the >>>>>> hardware. ?For example, do not impose any specific block size >>>>>> granularity. >>>>>> >>>>>> ??* Don't assume the underlying intrinsic can handle isMIME or isURL >>>>>> modes, but also let them decide if they will process the data >>>>>> regardless >>>>>> of the settings of the two booleans. >>>>>> >>>>>> ??* Any remaining data that is not processed by the intrinsic will be >>>>>> processed by the pure Java implementation. ?This allows the >>>>>> intrinsic to >>>>>> process whatever block sizes it's good at without the complexity of >>>>>> handling the end fragments. >>>>>> >>>>>> ??* If any illegal character is discovered in the decoding >>>>>> process, the >>>>>> intrinsic will simply return -1, instead of requiring it to throw a >>>>>> proper exception from the context of the intrinsic. ?In the event of >>>>>> getting a -1 returned from the intrinsic, the Java Base64 library >>>>>> code >>>>>> simply calls the pure Java implementation to have it find the >>>>>> error and >>>>>> properly throw an exception. ?This is a performance trade-off in the >>>>>> case of an error (which I expect to be very rare). 
>>>>>> >>>>>> ??* One thought I have for a further optimization (not implemented in >>>>>> the current patch), is that when the intrinsic decides not to >>>>>> process a >>>>>> block because of some combination of isURL and isMIME settings it >>>>>> doesn't handle, it could return extra bits in the return code, >>>>>> encoded >>>>>> as a negative number. ?For example: >>>>>> >>>>>> Illegal_Base64_char ? = 0b001; >>>>>> isMIME_unsupported ? ?= 0b010; >>>>>> isURL_unsupported ? ? = 0b100; >>>>>> >>>>>> These can be OR'd together as needed and then negated (flip the >>>>>> sign). >>>>>> The Base64 library code could then cache these flags, so it will know >>>>>> not to call the intrinsic again when another decodeBlock is requested >>>>>> but with an unsupported mode. ?This will save the performance hit of >>>>>> calling the intrinsic when it is guaranteed to fail. >>>>>> >>>>>> I've tested the attached patch with an actual intrinsic coded up for >>>>>> Power9/Power10, but those runtime intrinsics and arch-specific >>>>>> patches >>>>>> aren't attached today. ?I want to get some consensus on the >>>>>> library-level intrinsic API first. >>>>>> >>>>>> Also attached is a simple test case to test that the new intrinsic >>>>>> API >>>>>> doesn't break anything. >>>>>> >>>>>> I'm open to any comments about this. >>>>>> >>>>>> Thanks for your consideration, >>>>>> >>>>>> - Corey >>>>>> >>>>>> >>>>>> Corey Ashford >>>>>> IBM Systems, Linux Technology Center, OpenJDK team >>>>>> cjashfor at us dot ibm dot com >>>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro >>>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by >>>>>> Michihiro Horie/Japan/IBM] >>>>>> >>>>>> >>>>> >>>> >>> >> > From aph at redhat.com Sun Aug 30 08:34:57 2020 From: aph at redhat.com (Andrew Haley) Date: Sun, 30 Aug 2020 09:34:57 +0100 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: <5D556B7D-1995-4FBC-9176-E79FFC789571@amazon.com> References: <5D556B7D-1995-4FBC-9176-E79FFC789571@amazon.com> Message-ID: <8289015c-711b-286f-ba99-e589edfed8a5@redhat.com> On 28/08/2020 17:40, Hohensee, Paul wrote: > One's perspective on the benchmark results depends on the expected > frequency of the input types. If we don't expect frequent NaNs (I > don?t, because they mean your algorithm is numerically unstable and > you're wasting your time running it), or zeros (somewhat arguable, > but note that most codes go to some lengths to eliminate zeros, > e.g., using sparse arrays), then this patch seems to me to be a win. Possibly. But it's a significant change that improves some cases while making some other cases worse. When it does makes some cases better, it's only by a small factor and it's not consistent across hardware implementations. Please consider the numbers. When you look at Abs/Copysign it improves all cases except 0, and it doesn't make any of them any worse. Copysign on its own gets close. Copysign is nearly as good. That's true at least for the reduce case, which I argue is representative, more so than the blackhole case, where the blackhole operation itself swamps the calculation we're trying to measure. Ignoring NaN, I've added averages for the four cases to http://cr.openjdk.java.net/~aph/signum-facgt-copysign.ods. But we still don't know what effect all of this has, if any, on real code. My guess is that copysign should always helps because it avoids a move between FPU and integer unit and is otherwise identical. 
But the blackhole benchmark suggests it can make latency worse, and I have no explanation for that. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From boris.ulasevich at bell-sw.com Sun Aug 30 17:18:30 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Sun, 30 Aug 2020 20:18:30 +0300 Subject: RFR(S) 8252311: AArch64: save two words in itable lookup stub Message-ID: Hi, The interface method lookup stub becomes hot when interface calls are performed frequently. The stub assembly code can be made shorter (132->124 bytes) by using a pre-increment instruction variant. http://cr.openjdk.java.net/~bulasevich/8252311/webrev.00 http://bugs.openjdk.java.net/browse/JDK-8252311 The benchmark [1] shows [2] performance and icache loads improvement: performance: 6165206 -> 6307798 ops/s L1-icache-loads: 307.271 -> 274.604 The change was tested with JTREG. thanks, Boris [1] http://cr.openjdk.java.net/~bulasevich/8252311/InvokeInterface.java [2] http://cr.openjdk.java.net/~bulasevich/8252311/InvokeInterface.perf.txt From vladimir.kozlov at oracle.com Sun Aug 30 22:16:38 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 30 Aug 2020 15:16:38 -0700 Subject: RFR(M) 8209961: [AOT] crash in Graal stub when -XX:+VerifyOops is used In-Reply-To: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> References: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> Message-ID: <16b6eb21-9d92-790c-4d57-0300be288ffa@oracle.com> Looks good. Thanks, Vladimir K On 8/28/20 6:41 PM, Dean Long wrote: > https://bugs.openjdk.java.net/browse/JDK-8209961 > http://cr.openjdk.java.net/~dlong/8209961/webrev/ > > This change fixes support for -XX:+VerifyOops when used with AOT. The feature is disabled in generated AOT code by > default unless -J-Dgraal.AOTVerifyOops=true > is passed to jaotc (similar idea as --compile-with-assertions).? The JVM changes are minimal.? The Graal changes are all > from upstream Graal and have already been reviewed and pushed there. > > dl > From dean.long at oracle.com Sun Aug 30 22:37:08 2020 From: dean.long at oracle.com (Dean Long) Date: Sun, 30 Aug 2020 15:37:08 -0700 Subject: RFR(M) 8209961: [AOT] crash in Graal stub when -XX:+VerifyOops is used In-Reply-To: <16b6eb21-9d92-790c-4d57-0300be288ffa@oracle.com> References: <00d162e0-90ca-5647-6062-fa2e8aa70fd6@oracle.com> <16b6eb21-9d92-790c-4d57-0300be288ffa@oracle.com> Message-ID: Thanks Vladimir. dl On 8/30/20 3:16 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir K > > On 8/28/20 6:41 PM, Dean Long wrote: >> https://bugs.openjdk.java.net/browse/JDK-8209961 >> http://cr.openjdk.java.net/~dlong/8209961/webrev/ >> >> This change fixes support for -XX:+VerifyOops when used with AOT. The >> feature is disabled in generated AOT code by default unless >> -J-Dgraal.AOTVerifyOops=true >> is passed to jaotc (similar idea as --compile-with-assertions). The >> JVM changes are minimal.? The Graal changes are all from upstream >> Graal and have already been reviewed and pushed there. 
>> >> dl >> From ningsheng.jian at arm.com Mon Aug 31 04:00:48 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 31 Aug 2020 12:00:48 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <9b585dff-38be-16b5-b1a1-4ea0207458b9@oracle.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> <01af5faf-0a40-fd8d-4466-5387e5b2c08c@oracle.com> <23fe078f-bdf8-d010-4e7d-0e699ecaf842@arm.com> <59e535e4-05bc-38cc-0049-5d9f29a882cb@oracle.com> <35f8801f-4383-4804-2a9b-5818d1bda763@redhat.com> <524a4aaa-3cf7-7b5e-3e0e-0fd7f4f89fbf@oracle.com> <670fad6f-16ff-a7b3-8775-08dd79809ddf@redhat.com> <9b585dff-38be-16b5-b1a1-4ea0207458b9@oracle.com> Message-ID: Hi Vladimir, On 8/28/20 5:56 PM, Vladimir Ivanov wrote: > [...] > > One more point on naming: though it was me who proposed the name "vec" > on x86, I don't think it's the best option anymore. Considering it's > desirable to get rid of VecS/VecD/VecX/... machine ideal registers and > replace them with a single one, I think using Op_RegV is a better > alternative to Op_Vec. Hence, regV/rRegV/vReg look better (depending on > conventions adopted in particular AD file). > vReg looks good to me. I will update it in the new webrev. Thanks! Regards, Ningsheng From felix.yang at huawei.com Mon Aug 31 06:50:34 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 31 Aug 2020 06:50:34 +0000 Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic Message-ID: Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8252204 Webrev: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/ This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. Patch passed jtreg tier1-3 tests with QEMU system emulator. Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that there's no regression. We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java We measured the performance benefit with an aarch64 cycle-accurate simulator. Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is detected. Comments? 
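For reference, the block size derived from digestLength is just the Keccak rate in bytes; a minimal sketch (names here are illustrative) makes the relation explicit:

    // SHA3 block size (rate) in bytes: 200 bytes of Keccak state minus
    // twice the digest length (the capacity).
    static int sha3BlockSize(int digestLength) {
        return 200 - 2 * digestLength;  // SHA3-256: 136, SHA3-384: 104, SHA3-512: 72
    }
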
Thanks, Felix From tobias.hartmann at oracle.com Mon Aug 31 07:16:23 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 31 Aug 2020 09:16:23 +0200 Subject: RFR(M): 8223051: support loops with long (64b) trip counts In-Reply-To: References: <87lfmd8lip.fsf@redhat.com> <87h7wv7jny.fsf@redhat.com> <601CD9EB-C4E2-413E-988A-03CE5DE9FB00@oracle.com> <87y2q55rj4.fsf@redhat.com> <497B34CC-BA72-4674-8C5A-CF04DEF0CDC2@oracle.com> <87lflcyz67.fsf@redhat.com> <0CD2D156-D877-40AF-8FE6-CF5C64F127D9@oracle.com> <87d06nyoue.fsf@redhat.com> <2D6A14FF-ABCA-4762-8BE8-1BEA9C855DBB@oracle.com> <49518e41-6d94-f27f-0354-01576bafa3d1@oracle.com> <87y2p4y0fg.fsf@redhat.com> <216f325e-661e-02ef-3f58-a1e5c7578d80@oracle.com> <875zbjw9m9.fsf@redhat.com> <87h7t13bdz.fsf@redhat.com> <87tuwx1gcf.fsf@redhat.com> <7433dd28-94ce-781a-a50c-e79234e2986e@oracle.com> Message-ID: <13aba46a-3200-30fa-7f37-b08a42dc9f8e@oracle.com> On 25.08.20 16:06, Tobias Hartmann wrote: > Okay, thanks, I'll run some more testing with these values. Will report back once it finished. All done. Apart from expected test failures (TestIntVect due to failed vectorization and UseCountedLoopSafepointsTest due to a missing safepoint) and unrelated/known issues, I'm seeing the following failure: compiler/loopopts/TestRangeCheckPredicatesControl.java -server -Xcomp -XX:+IgnoreUnrecognizedVMOptions -XX:StressLongCountedLoop=200000000 # SIGSEGV (0xb) at pc=0x00007fb970ac2b73, pid=2312839, tid=2312845 # Problematic frame: # V [libjvm.so+0x18b3b73] ZMark::try_mark_object(ZMarkCache*, unsigned long, bool)+0x53 Current thread (0x00007fb968059260): GCTaskThread "ZWorker#2" [stack: 0x00007fb96ea6c000,0x00007fb96eb6c000] [id=2312845] Stack: [0x00007fb96ea6c000,0x00007fb96eb6c000], sp=0x00007fb96eb64be0, free space=994k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x18b3b73] ZMark::try_mark_object(ZMarkCache*, unsigned long, bool)+0x53 V [libjvm.so+0x18b59a8] ZMark::work_without_timeout(ZMarkCache*, ZMarkStripe*, ZMarkThreadLocalStacks*)+0x148 V [libjvm.so+0x18b6048] ZMark::work(unsigned long)+0xa8 V [libjvm.so+0x18f1b8d] ZTask::GangTask::work(unsigned int)+0x1d V [libjvm.so+0x187a4c4] GangWorker::run_task(WorkData)+0x84 V [libjvm.so+0x187a604] GangWorker::loop()+0x44 V [libjvm.so+0x173ab90] Thread::call_run()+0x100 V [libjvm.so+0x143fc16] thread_native_entry(Thread*)+0x116 Roland, could you please try to reproduce and check if it's related to your patch? Best regards, Tobias From aph at redhat.com Mon Aug 31 08:41:26 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 31 Aug 2020 09:41:26 +0100 Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: References: Message-ID: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> On 31/08/2020 07:50, Yangfei (Felix) wrote: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8252204 > Webrev: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/ > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-cecore.S?h=v5.4.52 > Trivial adaptation in SHA3. implCompress is needed for the purpose > of adding the intrinsic. For SHA3, we need to pass one extra > parameter "digestLength" to the stub for the calculation of block > size. 
"digestLength" is also used in for the EOR loop before > keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator > which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions > on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that > there's no regression. > > We used one existing JMH test for performance test: > test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 > cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on > specific SHA3 digest length and size of the message. > For now, this feature will not be enabled automatically for > aarch64. We can auto-enable this when it is fully tested on > real hardware. > But for the above testing purposes, this is auto-enabled when > the corresponding hardware feature is detected. > > Comments? This looks like a direct copy of the sha3-cecore.S file.You'll need Linaro to contribute it. I don't imagine they'll have any problem with that: they are OCA signatories Also, given that we've got the assembly source file, why not just copy that into OpenJDK? I can't see the point rewriting it into the HotSpot assembler. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Mon Aug 31 09:46:58 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 31 Aug 2020 09:46:58 +0000 Subject: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic In-Reply-To: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> References: <1729f1b1-056d-76c9-c820-d38bd6c1235d@redhat.com> Message-ID: > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Monday, August 31, 2020 4:41 PM > To: Yangfei (Felix) ; hotspot-compiler- > dev at openjdk.java.net; core-libs-dev at openjdk.java.net > Cc: aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR: 8252204: AArch64: Implement SHA3 > accelerator/intrinsic > > On 31/08/2020 07:50, Yangfei (Felix) wrote: > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8252204 > > Webrev: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/ > > > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto > Extensions. > > Reference implementation for core SHA-3 transform using ARMv8.2 > Crypto Extensions: > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/ar > m64/crypto/sha3-cecore.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose > > of adding the intrinsic. For SHA3, we need to pass one extra > > parameter "digestLength" to the stub for the calculation of block > > size. "digestLength" is also used in for the EOR loop before > > keccak to differentiate different SHA3 variants. > > > > We added jtreg tests for SHA3 and used QEMU system emulator > > which supports SHA3 instructions to test the functionality. > > Patch passed jtreg tier1-3 tests with QEMU system emulator. > > Also verified with jtreg tier1-3 tests without SHA3 instructions > > on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that > > there's no regression. 
> > > > We used one existing JMH test for performance test: > > test/micro/org/openjdk/bench/java/security/MessageDigests.java > > We measured the performance benefit with an aarch64 > > cycle-accurate simulator. > > Patch delivers 20% - 40% performance improvement depending on > > specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for > > aarch64. We can auto-enable this when it is fully tested on > > real hardware. > > But for the above testing purposes, this is auto-enabled when > > the corresponding hardware feature is detected. > > > > Comments? > > This looks like a direct copy of the sha3-cecore.S file.You'll need Linaro to > contribute it. I don't imagine they'll have any problem with that: they are > OCA signatories Since the code in sha3-ce-core.S works in kernel space, we need several modifications here to make it work in hotspot. First, we need to add callee-save & restore for d8 - d15 according to the aarch64 ABI. Also, the following code snippet is not needed for user-space:
  if_will_cond_yield_neon
    add  x8, x19, #32
    st1  { v0.1d- v3.1d}, [x19]
    st1  { v4.1d- v7.1d}, [x8], #32
    st1  { v8.1d-v11.1d}, [x8], #32
    st1  {v12.1d-v15.1d}, [x8], #32
    st1  {v16.1d-v19.1d}, [x8], #32
    st1  {v20.1d-v23.1d}, [x8], #32
    st1  {v24.1d}, [x8]
    do_cond_yield_neon
    b    0b
  endif_yield_neon
And we need to handle the multi-block case differently for StubRoutines::sha3_implCompressMB:
  3485     if (multi_block) {
  3486       // block_size = 200 - 2 * digest_length, ofs += block_size
  3487       __ add(ofs, ofs, 200);
  3488       __ sub(ofs, ofs, digest_length, Assembler::LSL, 1);
  3489
  3490       __ cmp(ofs, limit);
  3491       __ br(Assembler::LE, sha3_loop);
  3492       __ mov(c_rarg0, ofs); // return ofs
  3493     }
And StubRoutines::sha3_implCompress does not even need this multi-block check logic. > Also, given that we've got the assembly source file, why not just copy that > into OpenJDK? I can't see the point rewriting it into the HotSpot assembler. Actually, we referenced the existing intrinsics implementation and took a similar approach. It looks strange to have one intrinsic that goes differently. And we won't be able to emit this code on demand if we go that different way. Some CPUs do not support these special sha3 instructions and thus do not need this code at all. I think that's one advantage of using a stub. Thanks, Felix From vladimir.x.ivanov at oracle.com Mon Aug 31 13:52:58 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 31 Aug 2020 16:52:58 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: Message-ID: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> Hi Charlie, > So we have a puzzle. Why does running this code with tiered > compilation cause it to (erroneously?) claim a signature class has not > been loaded? I didn't try to answer this exact question, but looked at what happens during the failed inlining attempt. What surprised me is that the absent class which causes the failure is java.lang.String. But it turns out java.lang.String is never accessed from callee method [1] and hence there are no guarantees it is resolved in the context of the context class loader (instance of org/jruby/util/OneShotClassLoader) by the time the compilation kicks in. You can work around that by forcing j.l.String resolution when instantiating the class loader.
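For illustration only, a minimal sketch of what such eager resolution could look like; the helper name and the place where it would be called are my assumptions, not code from JRuby or the JDK:

  // Hypothetical helper: resolve java.lang.String in the context of the
  // freshly created class loader (e.g. the OneShotClassLoader instance)
  // before any generated method that mentions String gets compiled.
  // Class.forName with initialize=false is enough to record the loader
  // as an initiating loader for String.
  static void preloadString(ClassLoader loader) {
      try {
          Class.forName("java.lang.String", false, loader);
      } catch (ClassNotFoundException e) {
          throw new AssertionError("java.lang.String should always resolve", e);
      }
  }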
Best regards, Vladimir Ivanov [1] Users.vlivanov.ws.tmp.TIERED.inline::RUBY$method$bar$0 (Lorg/jruby/runtime/ThreadContext;Lorg/jruby/parser/StaticScope;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;Lorg/jruby/RubyModule;Ljava/lang/String;)Lorg/jruby/runtime/builtin/IRubyObject; 0 nop 1 nop 2 fast_aload_0 3 invokedynamic bsm=18 22 0 bci: 3 CounterData count(14485) argument types 0: stack(0) 'org/jruby/runtime/ThreadContext' return type 'org/jruby/RubyFixnum' 8 areturn 9 athrow - class loader data: loader data: 0x0000000134c18570 for instance a 'org/jruby/util/OneShotClassLoader'{0x0000000702198648} Java dictionary (table_size=107, classes=10, resizable=true) ^ indicates that initiating loader is different from defining loader 8: ^java.lang.Object, loader data: 0x000000010043e520 of 'bootstrap' 15: ^org.jruby.ir.targets.Bootstrap, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 36: ^org.jruby.runtime.builtin.IRubyObject, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 53: ^org.jruby.runtime.Block, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 53: ^org.jruby.ir.IRScope, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 69: Users.vlivanov.ws.tmp.TIERED.inline, loader data: 0x0000000134c18570 for instance a 'org/jruby/util/OneShotClassLoader'{0x0000000702198648} 73: ^org.jruby.runtime.ThreadContext, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 74: ^org.jruby.ir.targets.FixnumObjectSite, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 94: ^org.jruby.parser.StaticScope, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} 95: ^org.jruby.RubyModule, loader data: 0x0000000127b2e010 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000700001480} > This appears to affect every OpenJDK release at least back to 8u222, > the earliest version we tested. > > To reproduce, create the two scripts in the bug, download a JRuby > distribution from jruby.org, and execute the main script like this: > > bin/jruby -Xcompile.invokedynamic -J-XX:+WhateverHotspotFlag main.rb > > PrintInlining and PrintAssembly output will show that the "bar" method > fails to inline into "foo" in the inline.rb part of the example. > > Help! > > - Charlie > From dmitry.chuyko at bell-sw.com Mon Aug 31 14:28:46 2020 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Mon, 31 Aug 2020 17:28:46 +0300 Subject: [aarch64-port-dev ] [16] RFR(S): 8251525: AARCH64: Faster Math.signum(fp) In-Reply-To: References: <4b0176e2-317b-8fa2-1409-0f77be3f41c3@redhat.com> <67e67230-cac7-d940-1cca-6ab4e8cba8d4@redhat.com> <9e792a33-4f90-8829-2f7b-158d07d3fd15@bell-sw.com> Message-ID: <0cca5c0c-9240-3a9f-98f0-519384ea69cb@bell-sw.com> Hi Andrew, Here is another version of intrinsics. It is an extension of webrev.03. Additional thing is that constants 0 and 1 that are used internally by intrinics are constructed as nodes. This is somehow similar to what is done for passing pointers to tables. 
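(As a quick reminder of the semantics the intrinsic has to preserve, before the numbers below; plain Java shown only for illustration, not code from the webrev:)

  // Reference behaviour of Math.signum(double): NaN stays NaN, +0.0 and
  // -0.0 are returned unchanged, everything else collapses to +/-1.0
  // with the sign of the input.
  static double signumReference(double d) {
      if (Double.isNaN(d) || d == 0.0) {   // d == 0.0 holds for both +0.0 and -0.0
          return d;
      }
      return Math.copySign(1.0, d);
  }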
webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/ results: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/benchmarks/signum-facgt_ir-copysign.ods As you can see, the case of the intrinsic for the entire signum is now up to 29.2% better for "random" data. NaN is 30% better also. The only suffering case is 0, which is just 1 number (in two representations) of the whole range, and the regression is ~7%/10%. Performance in the case of 0 becomes the same as for all other numbers (and NaN). I don't suppose that 0 is so special: if the input data is all zeroes and the program produces zeroes during the computation, it is trivial, and if zeroes make up half of the data, there will still be a win. For the case of copySign(double), making a constant in the IR amplifies the regression in the Blackhole benchmark, but it still may be interesting to experiment with. Just in case, it will be interesting to remeasure the Blackhole variants once compiler support [1] is implemented. Here is also a benchmark variant [2] where we consume different data, and it shows the same effects as Blackhole.consume(signum). -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8252505 [2] http://cr.openjdk.java.net/~dchuyko/8251525/webrev.04/benchmarks/DoubleSideSinkBench.java From cjashfor at linux.ibm.com Mon Aug 31 16:41:32 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Mon, 31 Aug 2020 09:41:32 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> Message-ID: <83ee5372-3890-fb07-721b-9d51641865da@linux.ibm.com> On 8/27/20 8:07 AM, Doerr, Martin wrote: > Hi Corey, > >> If I make a requirement, I feel decode0 should check that the >> requirement is met, and raise some kind of internal error if it isn't. >> That actually was my first implementation, but I received some comments >> during an internal review suggesting that I just "round down" the >> destination count to the closest multiple of 3 less than or equal to the >> returned value, rather than throw an internal exception which would >> confuse users. This "enforces" the rule, in some sense, without error >> handling. Do you have some thoughts about this? > > I think the rounding logic is hard to understand and I'm not sure if it's correct (you're rounding up for the 1st computation of chars_decoded). > If we don't use it, it will never get tested (because the intrinsic always returns a multiple of 3). > I prefer having a more simple version which is easy to understand and for which we can test all cases. I will see what I can do with the calculation of chars_decoded, at least in the comments, to make it more clear as to the "why" of the calculation. I will remove the round down code: "dl = (dl / 3) * 3;" and leave it for intrinsics implementers/maintainers to check that assumption when the intrinsic returns. > > I think we should be able to catch violations of this requirement by adding good JTREG tests. > An illegal intrinsic implementation should never pass the tests. So I don't see a need to catch an illegal state in the Java source code in this case. > I guess this will be best for intrinsic implementors for other platforms as well. > > I'd appreciate more opinions on this. > > >> I will double check that everything compiles and runs properly with gcc >> 7.3.1. > Please note that 7.3.1 is our minimum for Big Endian linux. For Little Endian it's 7.4.0.
Ah, that might explain why I wasn't able to find gcc-7.3.1 on RHEL 8.1 (gcc-8.3.1) or Ubuntu 16.04 (gcc-7.4.0) for Power9. As long as the code is enabled on little endian machines only, there should be no trouble with compilation. I did compile and run the tests against 7.4.0, and it worked without a problem. > You can also find this information here: > https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms > under "Other JDK 13 build platforms" which hasn't changed since then. > Great, thank you. >> I will use __attribute__ ((align(16))) instead of __vector, and make >> them arrays of 16 unsigned char. > Maybe __vectors works as expected, too, now. Whatever we use, I'd appreciate to double-check the alignment e.g. by using gdb. Ok, I will experiment with that with some small test cases and see if I can make the compiler stumble and not align the vector. The lxv instruction can handle unaligned vectors in memory, but it would be better to have the vectors aligned for performance reasons. > I don't remember what we had tried and why it didn't work as desired. > > >> I was following what was done for encodeBlock, but it appears >> encodeBlock's style isn't what is used for the other intrinsics. I will >> correct decodeBlock to use the prevailing style. Another patch should >> be added (not part of this webrev) to correct encodeBlock's style. > In your code one '\' is not aligned with the other ones. Yes, it's corrected now. > > >> Ah, this is another thing I didn't know about. I will make some >> regression tests. > Thanks. There's some documentation available: > https://openjdk.java.net/jtreg/ > I guess your colleagues can assist you with that so you don't have to figure out everything alone. Yes, thank you. JTREG tests will be part of the next webrev version. Regards, - Corey > > >> Thanks for your time on this. As you can tell, I'm inexperienced in >> writing openjdk code, so your patience and careful review is really >> appreciated. > I'm glad you work on contributions. I think we should welcome new contributors and assist as far as we can. > > Best regards, > Martin > > >> -----Original Message----- >> From: Corey Ashford >> Sent: Donnerstag, 27. August 2020 00:17 >> To: Doerr, Martin ; Michihiro Horie >> >> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev > dev at openjdk.java.net>; Kazunori Ogata ; >> joserz at br.ibm.com >> Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and >> API for Base64 decoding >> >> Hi Martin, >> >> Some inline responses below. >> >> On 8/26/20 8:26 AM, Doerr, Martin wrote: >> >>> Hi Corey, >>> >>> I should explain my comments regarding Base64.java better. >>> >>>> Let's be precise: "should process a multiple of four" => "must process a >>>> multiple of four" >>> Did you try to support non-multiple of 4 and this was intended as >> recommendation? >>> I think making it a requirement and simplifying the logic in decode0 is >> better. >>> Or what's the benefit of the recommendation? >> >> If I make a requirement, I feel decode0 should check that the >> requirement is met, and raise some kind of internal error if it isn't. >> That actually was my first implementation, but I received some comments >> during an internal review suggesting that I just "round down" the >> destination count to the closest multiple of 3 less than or equal to the >> returned value, rather than throw an internal exception which would >> confuse users. This "enforces" the rule, in some sense, without error >> handling. 
Do you have some thoughts about this? >> >>> >>>>> If any illegal base64 bytes are encountered in the source by the >>>>> intrinsic, the intrinsic can return a data length of zero or any >>>>> number of bytes before the place where the illegal base64 byte >>>>> was encountered. >>>> I think this has a drawback. Somebody may use a debugger and want to >> stop >>>> when throwing IllegalArgumentException. He should see the position >> which >>>> matches the Java implementation.kkkk >>> This is probably hard to understand. Let me try to explain it by example: >>> 1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the >> destination array. >>> 2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed >> by your specification. >>> 3. The compiled method containing the intrinsic hits a safepoint (e.g. in the >> large while loop in decodeBlockSlow). >>> 4. A JVMTI agent (debugger) reads dp and dst. >>> 5. The person using the debugger gets angry because more bytes than dp >> were written into dst. The JVM didn't follow the specified behavior. >>> >>> I guess we can and should avoid it by specifying that the intrinsic needs to >> return the dp value matching the number of Bytes written. >> >> That's an interesting point. I will change the specification, and the >> intrinsic implementation. Right now the Power9/10 intrinsic returns 0 >> when any illegal character is discovered, but I've been thinking about >> returning the number of bytes already written, which will allow >> decodeBlockSlow to more quickly find the offending character. This >> provides another good reason to make that change. >> >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: Doerr, Martin >>>> Sent: Dienstag, 25. August 2020 15:38 >>>> To: Corey Ashford ; Michihiro Horie >>>> >>>> Cc: hotspot-compiler-dev at openjdk.java.net; core-libs-dev >>> dev at openjdk.java.net>; Kazunori Ogata ; >>>> joserz at br.ibm.com >>>> Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate >> and >>>> API for Base64 decoding >>>> >>>> Hi Corey, >>>> >>>> thanks for proposing this change. I have comments and suggestions >>>> regarding various files. >>>> >>>> >>>> Base64.java >>>> >>>> This is the only file which needs another review from core-libs-dev. >>>> First of all, I like the idea to use a HotSpotIntrinsicCandidate which can >>>> consume as many bytes as the implementation wants. >>>> >>>> Comment before decodeBlock: >>>> Let's be precise: "should process a multiple of four" => "must process a >>>> multiple of four" >>>> >>>>> If any illegal base64 bytes are encountered in the source by the >>>>> intrinsic, the intrinsic can return a data length of zero or any >>>>> number of bytes before the place where the illegal base64 byte >>>>> was encountered. >>>> I think this has a drawback. Somebody may use a debugger and want to >> stop >>>> when throwing IllegalArgumentException. He should see the position >> which >>>> matches the Java implementation. >>>> >>>> Please note that the comment indentation differs from other comments. >> >> Will fix. >> >>>> >>>> decode0: Final "else" after return is redundant. >> >> Will fix. >> >>>> >>>> >>>> stubGenerator_ppc.cpp >>>> >>>> "__vector" breaks AIX build! >>>> Does it work on Big Endian linux with old gcc (we require 7.3.1, now)? >>>> Please either support Big Endian properly or #ifdef it out. >> >> I have been compiling with only Advance Toolchain 13, which is 9.3.1, >> and only on Linux. 
It will not work with big endian, so it won't work >> on AIX, however obviously it shouldn't break the AIX build, so I will >> address that. There's code to set UseBASE64Intrinsics to false on big >> endian, but you're right -- I should ifdef all of the intrinsic code for >> little endian for now. Getting it to work on big endian / AIX shouldn't >> be difficult, but it's not in my scope of work at the moment. >> >> I will double check that everything compiles and runs properly with gcc >> 7.3.1. >> >>>> What exactly does it (do) on linux? >> >> It's an arch-specific type that's 16 bytes in size and aligned on a >> 16-byte boundary. >> >>>> I remember that we had tried such prefixes but were not satisfied. I think >> it >>>> didn't enforce 16 Byte alignment if I remember correctly. >> >> I will use __attribute__ ((align(16))) instead of __vector, and make >> them arrays of 16 unsigned char. >> >>>> >>>> Attention: C2 does no longer convert int/bool to 64 bit values (since JDK- >>>> 8086069). So the argument registers for offset, length and isURL may >> contain >>>> garbage in the higher bits. >> >> Wow, that's good to know! I will mask off the incoming values. >> >>>> >>>> You may want to use load_const_optimized which produces shorter code. >> >> Will fix. >> >>>> >>>> You may want to use __ align(32) to align unrolled_loop_start. >> >> Will fix. >> >>>> >>>> I'll review the algorithm in detail when I find more time. >>>> >>>> >>>> assembler_ppc.hpp >>>> assembler_ppc.inline.hpp >>>> vm_version_ppc.cpp >>>> vm_version_ppc.hpp >>>> Please rebase. Parts of the change were pushed as part of 8248190: >> Enable >>>> Power10 system and implement new byte-reverse instructions >> >> Will do. >> >>>> >>>> >>>> vmSymbols.hpp >>>> Indentation looks odd at the end. >> >> I was following what was done for encodeBlock, but it appears >> encodeBlock's style isn't what is used for the other intrinsics. I will >> correct decodeBlock to use the prevailing style. Another patch should >> be added (not part of this webrev) to correct encodeBlock's style. >> >>>> >>>> >>>> library_call.cpp >>>> Good. Indentation style of the call parameters differs from encodeBlock. >> >> Will fix. >> >>>> >>>> >>>> runtime.cpp >>>> Good. >>>> >>>> >>>> aotCodeHeap.cpp >>>> vmSymbols.cpp >>>> shenandoahSupport.cpp >>>> vmStructs_jvmci.cpp >>>> shenandoahSupport.cpp >>>> escape.cpp >>>> runtime.hpp >>>> stubRoutines.cpp >>>> stubRoutines.hpp >>>> vmStructs.cpp >>>> Good and trivial. >>>> >>>> >>>> Tests: >>>> I think we should have JTREG tests to check for regressions in the future. >> >> Ah, this is another thing I didn't know about. I will make some >> regression tests. >> >> Thanks for your time on this. As you can tell, I'm inexperienced in >> writing openjdk code, so your patience and careful review is really >> appreciated. >> >> - Corey From headius at headius.com Mon Aug 31 18:38:53 2020 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 31 Aug 2020 13:38:53 -0500 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> Message-ID: On Mon, Aug 31, 2020 at 8:53 AM Vladimir Ivanov wrote: > What surprised me is that the absent class which causes the failure is > java.lang.String. 
But it turns out java.lang.String is never accessed > from callee method [1] and hence there are no guarantees it is resolved > in the context of the context class loader (instance of > org/jruby/util/OneShotClassLoader) by the time the compilation kicks in. > > You can work around that by forcing j.l.String resolution when > instantiating the class loader. I can give this a shot, but if I'm resolving the target method's class, and that class is using String (there's definitely references to String in the generated code), why is String still unresolved at the point where I actually bind the method and call it? I guess I can't tell whether you're saying "this is not your fault and here's a workaround" or "this is your fault and this is how you should fix it". - Charlie From evgeny.nikitin at oracle.com Mon Aug 31 20:22:03 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 31 Aug 2020 22:22:03 +0200 Subject: RFR(M): 8166554: Avoid compilation blocking in OverloadCompileQueueTest.java Message-ID: <34b013fb-4eea-1a88-d3f1-6af990fecfbc@oracle.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8166554 Webrev: http://cr.openjdk.java.net/~enikitin//8166554/webrev.00/index.html ==== Problem explanation ==== The immediate reason for the test timeout is a compilation lock within the JVM during shutdown. In such a case, the VM gives the stuck compiler threads 10 seconds to finish [0], after which it shuts down without waiting for them. Those 10 seconds of VM shutdown are enough for the test to fail in some cases - the test is a stress test that uses up almost all of the available timeout [1]. The compilation lock, in turn, is taken via WhiteBox by the test. It should gracefully unlock the compilation in the 'finally' block, but the lockUnlocker thread is declared daemon [2], and therefore may not execute the 'finally'. ==== Solution ==== Since 'lockUnlock' is started via InfiniteLoop, it's not possible to un-daemon it. So I just turned the lockUnlock method into a Thread descendant, which gets joined at the end. Not the most beautiful solution, given the direct work with delays, but its main lock-unlock cycle is small and it is clear about what it does. Please review, // Evgeny Nikitin. ======== [0] http://hg.openjdk.java.net/jdk/jdk/file/6db0cb3893c5/src/hotspot/share/runtime/vmOperations.cpp#l388 [1] http://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/CodeCacheStressRunner.java#l40 [2] http://hg.openjdk.java.net/jdk/jdk/file/e10f558e1df5/test/hotspot/jtreg/compiler/codecache/stress/Helper.java#l59 From yumin.qi at oracle.com Mon Aug 31 21:32:26 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Mon, 31 Aug 2020 14:32:26 -0700 Subject: 8248337: sparc related code clean up after solaris removal Message-ID: Hi, Please review for bug: https://bugs.openjdk.java.net/browse/JDK-8248337 webrev: http://cr.openjdk.java.net/~minqi/2020/8248337/webrev-01/ Summary: After the Solaris support files were removed from the repo, there are some remnants which need cleaning up. Some comments are not correct, and some refer to wrong files. There is a flag that seems only useful for SPARC: UseRDPCForConstantTableBase, which got removed in this patch. Also in postaloc.cpp, the delay slot seems to be only for SPARC too, but I am not sure about that. Most of the patch is in comment sections. Tests passed tier1-4. Thanks,
Yumin From vladimir.x.ivanov at oracle.com Mon Aug 31 21:58:31 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 1 Sep 2020 00:58:31 +0300 Subject: Tiered compilation leads to "unloaded signature class" inlining failures in JRuby In-Reply-To: References: <416425ef-0980-ba2c-0bdf-8eebefa5e81e@oracle.com> Message-ID: >> What surprised me is that the absent class which causes the failure is >> java.lang.String. But it turns out java.lang.String is never accessed >> from callee method [1] and hence there are no guarantees it is resolved >> in the context of the context class loader (instance of >> org/jruby/util/OneShotClassLoader) by the time the compilation kicks in. >> >> You can work around that by forcing j.l.String resolution when >> instantiating the class loader. > > I can give this a shot, but if I'm resolving the target method's > class, and that class is using String (there's definitely references > to String in the generated code), why is String still unresolved at > the point where I actually bind the method and call it? As I can see with the test case, the target method is loaded in a separate instance of OneShotClassLoader (and, moreover, I see j.l.String loaded there!). So, it doesn't matter whether a class is loaded in a "parent" (?) script at all since they are loaded by separate class loaders. > I guess I can't tell whether you're saying "this is not your fault and > here's a workaround" or "this is your fault and this is how you should > fix it". It's hard to draw a line here. My feeling is the JVM can do a better job here (but I haven't worked out all the consequences yet). But if you want to get rid of this quirk running on 8u, you'd definitely better fix your app (JRuby). Best regards, Vladimir Ivanov From cjashfor at linux.ibm.com Mon Aug 31 22:22:47 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Mon, 31 Aug 2020 15:22:47 -0700 Subject: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding In-Reply-To: <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com> <8ece8d2e-fd99-b734-211e-a32b534a7dc8@linux.ibm.com> <8d53dcf8-635a-11e2-4f6a-39b70e2c3b8b@oracle.com> <65ed7919-86fc-adfa-3cd5-58dd96a3487f@linux.ibm.com> Message-ID: On 8/29/20 1:19 PM, Corey Ashford wrote: > Hi Roger, > > Thanks for your reply and thoughts! Comments interspersed below: > > On 8/28/20 10:54 AM, Roger Riggs wrote: ... >> Comparing with the way that the Base64 encoder was intrinsified, the >> method that is intrinsified should have a method body that does >> the same function, so it is interchangable. That likely will just shift >> the "fast path" code into the decodeBlock method. >> Keeping the symmetry between encoder and decoder will >> make it easier to maintain the code. > > Good point. I'll investigate what this looks like in terms of the > actual code, and will report back (perhaps in a new webrev). > Having looked at this again, I don't think it makes sense. One thing that differs significantly from the encodeBlock intrinsic is that the decodeBlock intrinsic only needs to process a prefix of the data, and so it can leave virtually any amount of data at the end of the src buffer unprocessed, whereas with the encodeBlock intrinsic, if it exists, it must process the entire buffer.
In the (common) case where the decodeBlock intrinsic returns not having processed everything, it still needs to call the Java code, and if that Java code is "replaced" by the intrinsic, it's inaccessible. Is there something I'm overlooking here? Basically I want the decode API to behave differently than the encode API, mostly to make the arch-specific intrinsic easier to implement. If that's not acceptable, then I need to rethink the API, and also figure out how to deal with the illegal character case. The latter could perhaps be done by throwing an exception from the intrinsic, or maybe by returning a negative length that specifies the index of the illegal src byte, and then having the Java code throw the exception. Regards, - Corey
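P.S. Purely as an illustration of the "negative length" idea above (not part of any webrev), the failing source index could be folded into the single int return value, much like Arrays.binarySearch encodes insertion points:

  // Non-negative result: number of bytes written to dst.
  // Negative result: illegal base64 byte at src index (-result - 1).
  static int failureResult(int srcIndex)   { return -(srcIndex + 1); }
  static boolean isFailure(int result)     { return result < 0; }
  static int failingIndex(int result)      { return -result - 1; }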