RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
Bhateja, Jatin
jatin.bhateja at intel.com
Fri Aug 7 09:27:38 UTC 2020
Hi Vladimir,
Please let me know if final version looks fine to you.
Also, if clearance from a second reviewer mandatory here or can we push this
to trunk is no more comments.
Best Regards,
Jatin
> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.java.net> On
> Behalf Of Bhateja, Jatin
> Sent: Sunday, August 2, 2020 11:55 PM
> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-
> dev at openjdk.java.net
> Subject: RE: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
>
> Hi Vladimir,
>
> Final patch is placed at following link.
>
> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.06/
>
> One more reviewer approval needed.
>
> Best Regards,
> Jatin
>
> > -----Original Message-----
> > From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> > Sent: Saturday, August 1, 2020 4:49 AM
> > To: Bhateja, Jatin <jatin.bhateja at intel.com>
> > Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> > hotspot-compiler- dev at openjdk.java.net
> > Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for
> > X86
> >
> >
> > > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/
> >
> > Looks good.
> >
> > Tier5 (where I saw the crashes) passed.
> >
> > Please, incorporate the following minor cleanups in the final version:
> >
> > http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanu
> > p/
> >
> > (Tested with hs-tier1,hs-tier2.)
> >
> > Best regards,
> > Vladimir Ivanov
> >
> > >> -----Original Message-----
> > >> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> > >> Sent: Thursday, July 30, 2020 3:30 AM
> > >> To: Bhateja, Jatin <jatin.bhateja at intel.com>
> > >> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> > >> hotspot-compiler- dev at openjdk.java.net
> > >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification
> > >> for
> > >> X86
> > >>
> > >>
> > >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/
> > >>>
> > >>> Looks good. (Testing is in progress.)
> > >>
> > >> FYI test results are clean (tier1-tier5).
> > >>
> > >>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines
> > >>>> since we are anyways doing constant folding in LShiftI/URShiftI
> > >>>> value routines. Since JAVA rotate APIs are no longer intrincified
> > >>>> hence these routines may no longer be useful.
> > >>>
> > >>> Nice observation! Good.
> > >>
> > >> As a second thought, it seems there's still a chance left that
> > >> Rotate nodes get their input type narrowed after the folding
> > >> happened. For example, as a result of incremental inlining or CFG
> > >> transformations during loop optimizations. And it does happen in
> > >> practice since the testing revealed some crashes due to the bug in
> > RotateLeftNode/RotateRightNode::Ideal().
> > >>
> > >> So, it makes sense to keep the transformations. But I'm fine with
> > >> addressing that as a followup enhancement.
> > >>
> > >> Best regards,
> > >> Vladimir Ivanov
> > >>
> > >>>
> > >>>>> It would be really nice to migrate to MacroAssembler along the
> > >>>>> way (as a cleanup).
> > >>>>
> > >>>> I guess you are saying remove opcodes/encoding from patterns and
> > >>>> move then to Assembler, Can we take this cleanup activity
> > >>>> separately since other patterns are also using these matcher
> > directives.
> > >>>
> > >>> I'm perfectly fine with handling it as a separate enhancement.
> > >>>
> > >>>> Other synthetic comments have been taken care of. I have extended
> > >>>> the Test to cover all the newly added scalar transforms. Kindly
> > >>>> let me know if there other comments.
> > >>>
> > >>> Nice!
> > >>>
> > >>> Best regards,
> > >>> Vladimir Ivanov
> > >>>
> > >>>>> -----Original Message-----
> > >>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> > >>>>> Sent: Friday, July 24, 2020 3:21 AM
> > >>>>> To: Bhateja, Jatin <jatin.bhateja at intel.com>
> > >>>>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Andrew
> > >>>>> Haley <aph at redhat.com>; hotspot-compiler-dev at openjdk.java.net
> > >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification
> > >>>>> for
> > >>>>> X86
> > >>>>>
> > >>>>> Hi Jatin,
> > >>>>>
> > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/
> > >>>>>
> > >>>>> Much better! Thanks.
> > >>>>>
> > >>>>>> Change Summary:
> > >>>>>>
> > >>>>>> 1) Unified the handling for scalar rotate operation. All scalar
> > >>>>>> rotate
> > >>>>> selection patterns are now dependent on newly created
> > >>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing.
> > >>>>> Currently
> > >>>>> if DAG nodes corresponding to a sub-pattern are shared (have
> > >>>>> multiple
> > >>>>> users) then existing complex patterns based on
> > >>>>> Or/LShiftL/URShift does not get matched and this prevents inferring
> rotate nodes.
> > >>>>> Please refer to JIT'ed assembly output with baseline[1] and with
> > >>>>> patch[2] . We can see that generated code size also went done
> > >>>>> from
> > >>>>> 832 byte to 768 bytes. Also this can cause perf degradation if
> > >>>>> shift-or dependency chain appears inside a hot region.
> > >>>>>>
> > >>>>>> 2) Due to enhanced rotate inferencing new patch shows better
> > >>>>>> performance
> > >>>>> even for legacy targets (non AVX-512). Please refer to the perf
> > >>>>> result[3] over AVX2 machine for JMH benchmark part of the patch.
> > >>>>>
> > >>>>> Very nice!
> > >>>>>> 3) As suggested, removed Java API intrinsification changes and
> > >>>>>> scalar
> > >>>>> rotate transformation are done during OrI/OrL node idealizations.
> > >>>>>
> > >>>>> Good.
> > >>>>>
> > >>>>> (Still would be nice to factor the matching code from Ideal()
> > >>>>> and share it between multiple use sites. Especially considering
> > >>>>> OrVNode::Ideal() now does basically the same thing. As an
> > >>>>> example/idea, take a look at
> > >>>>> is_bmi_pattern() in x86.ad.)
> > >>>>>
> > >>>>>> 4) SLP always gets to work on new scalar Rotate nodes and
> > >>>>>> creates vector
> > >>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV
> > >>>>> nodes if target does not supports vector rotates(non-AVX512).
> > >>>>>
> > >>>>> Good.
> > >>>>>
> > >>>>>> 5) Added new instruction patterns for vector shift Left/Right
> > >>>>>> operations
> > >>>>> with constant shift operands. This prevents emitting extra moves
> > >>>>> to
> > >> XMM.
> > >>>>>
> > >>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{
> > >>>>> + match(Set dst (LShiftVI src shift));
> > >>>>>
> > >>>>> I'd prefer to see a uniform Ideal IR shape being used
> > >>>>> irrespective of whether the argument is a constant or not. It
> > >>>>> should also simplify the logic in SuperWord and make it easier
> > >>>>> to support on
> > >>>>> non-x86 architectures.
> > >>>>>
> > >>>>> For example, here's how it is done on AArch64:
> > >>>>>
> > >>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{
> > >>>>> predicate(n->as_Vector()->length() == 4);
> > >>>>> match(Set dst (LShiftVI src (LShiftCntV shift))); ...
> > >>>>>
> > >>>>>> 6) Constant folding scenarios are covered in
> > >>>>>> RotateLeft/RotateRight
> > >>>>> idealization, inferencing of vector rotate through OrV
> > >>>>> idealization covers the vector patterns generated though non SLP
> > route i.e.
> > >>>>> VectorAPI.
> > >>>>>
> > >>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the
> > >>>>> general direction here - duplication of scalar transformations
> > >>>>> to lane-wise vector operations. It definitely won't scale and in
> > >>>>> a longer run it risks to diverge. Would be nice to find a way to
> > >>>>> automatically "lift"
> > >>>>> scalar transformations to vectors and apply them uniformly. But
> > >>>>> right now it is just an idea which requires more experimentation.
> > >>>>>
> > >>>>>
> > >>>>> Some other minor comments/suggestions:
> > >>>>>
> > >>>>> + // Swap the computed left and right shift counts.
> > >>>>> + if (is_rotate_left) {
> > >>>>> + Node* temp = shiftRCnt;
> > >>>>> + shiftRCnt = shiftLCnt;
> > >>>>> + shiftLCnt = temp;
> > >>>>> + }
> > >>>>>
> > >>>>> Maybe use swap() here (declared in globalDefinitions.hpp)?
> > >>>>>
> > >>>>>
> > >>>>> + if (Matcher::match_rule_supported_vector(vopc, vlen, bt))
> > >>>>> + return true;
> > >>>>>
> > >>>>> Please, don't omit curly braces (even for simple cases).
> > >>>>>
> > >>>>>
> > >>>>> -// Rotate Right by variable
> > >>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift,
> > >>>>> immI0 zero, rFlagsReg cr)
> > >>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg
> > >>>>> +cr)
> > >>>>> %{
> > >>>>> - match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI
> > >>>>> zero shift))));
> > >>>>> -
> > >>>>> + predicate(!VM_Version::supports_bmi2() &&
> > >>>>> n->bottom_type()->basic_type() == T_INT);
> > >>>>> + match(Set dst (RotateRight dst shift));
> > >>>>> + format %{ "rorl $dst, $shift" %}
> > >>>>> expand %{
> > >>>>> - rorI_rReg_CL(dst, shift, cr);
> > >>>>> + rorI_rReg_imm8(dst, shift, cr);
> > >>>>> %}
> > >>>>>
> > >>>>> It would be really nice to migrate to MacroAssembler along the
> > >>>>> way (as a cleanup).
> > >>>>>
> > >>>>>> Please push the patch through your testing framework and let me
> > >>>>>> know your
> > >>>>> review feedback.
> > >>>>>
> > >>>>> There's one new assertion failure:
> > >>>>>
> > >>>>> # Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238),
> > >>>>> pid=5476, tid=6219
> > >>>>> # assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize
> > >>>>> should return new nodes, use Identity to return old nodes
> > >>>>>
> > >>>>> I believe it comes from
> > >>>>> RotateLeftNode::Ideal/RotateRightNode::Ideal
> > >>>>> which can return pre-contructed constants. I suggest to get rid
> > >>>>> of
> > >>>>> Ideal() methods and move constant folding logic into
> > >>>>> Node::Value() (as implemented for other bitwise/arithmethic
> > >>>>> nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more
> > >>>>> generic approach since it enables richer type information
> > >>>>> (ranges vs
> > >>>>> constants) and IMO it's more convenient to work with constants
> > >>>>> through Types than ConNodes.
> > >>>>>
> > >>>>> (I suspect that original/expanded IR shape may already provide
> > >>>>> more precise type info for non-constant case which can affect
> > >>>>> the
> > >>>>> benchmarks.)
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Vladimir Ivanov
> > >>>>>
> > >>>>>>
> > >>>>>> Best Regards,
> > >>>>>> Jatin
> > >>>>>>
> > >>>>>> [1]
> > >>>>>>
> > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm.
> > >>>>>> txt [2]
> > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_a
> > >>>>>> vx
> > >>>>>> 2_
> > >>>>>> asm
> > >>>>>> .txt [3]
> > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_n
> > >>>>>> ew
> > >>>>>> _p
> > >>>>>> atc
> > >>>>>> h.txt
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> > >>>>>>> Sent: Saturday, July 18, 2020 12:25 AM
> > >>>>>>> To: Bhateja, Jatin <jatin.bhateja at intel.com>; Andrew Haley
> > >>>>>>> <aph at redhat.com>
> > >>>>>>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> > >>>>>>> hotspot-compiler- dev at openjdk.java.net
> > >>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API
> > >>>>>>> intrinsification for
> > >>>>>>> X86
> > >>>>>>>
> > >>>>>>> Hi Jatin,
> > >>>>>>>
> > >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/
> > >>>>>>>
> > >>>>>>> It definitely looks better, but IMO it hasn't reached the
> > >>>>>>> sweet spot
> > >>>>> yet.
> > >>>>>>> It feels like the focus is on auto-vectorizer while the burden
> > >>>>>>> is put on scalar cases.
> > >>>>>>>
> > >>>>>>> First of all, considering GVN folds relevant operation
> > >>>>>>> patterns into a single Rotate node now, what's the motivation
> > >>>>>>> to introduce intrinsics?
> > >>>>>>>
> > >>>>>>> Another point is there's still significant duplication for
> > >>>>>>> scalar cases.
> > >>>>>>>
> > >>>>>>> I'd prefer to see the legacy cases which rely on pattern
> > >>>>>>> matching to go away and be substituted with instructions which
> > >>>>>>> match Rotate instructions (migrating ).
> > >>>>>>>
> > >>>>>>> I understand that it will penalize the vectorization
> > >>>>>>> implementation, but IMO reducing overall complexity is worth it.
> > >>>>>>> On auto-vectorizer side, I see
> > >>>>>>> 2 ways to fix it:
> > >>>>>>>
> > >>>>>>> (1) introduce additional AD instructions for
> > >>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware;
> > >>>>>>>
> > >>>>>>> (2) in SuperWord::output(), when matcher doesn't support
> > >>>>>>> RotateLeftV/RotateLeftV nodes
> > >>>>>>> (Matcher::match_rule_supported()),
> > >>>>>>> generate vectorized version of the original pattern.
> > >>>>>>>
> > >>>>>>> Overall, it looks like more and more focus is made on scalar
> part.
> > >>>>>>> Considering the main goal of the patch is to enable
> > >>>>>>> vectorization, I'm fine with separating cleanup of scalar part.
> > >>>>>>> As an interim solution, it seems that leaving the scalar part
> > >>>>>>> as it is now and matching scalar bit rotate pattern in
> > >>>>>>> VectorNode::is_rotate() should be enough to keep the
> > >>>>>>> vectorization part functioning. Then scalar Rotate nodes and
> > relevant cleanups can be integrated later.
> > >>>>>>> (Or vice
> > >>>>>>> versa: clean up scalar part first and then follow up with
> > >>>>>>> vectorization.)
> > >>>>>>>
> > >>>>>>> Some other comments:
> > >>>>>>>
> > >>>>>>> * There's a lot of duplication between OrINode::Ideal and
> > >>>>> OrLNode::Ideal.
> > >>>>>>> What do you think about introducing a super type
> > >>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> * src/hotspot/cpu/x86/x86.ad
> > >>>>>>>
> > >>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{
> > >>>>>>> + predicate(n->bottom_type()->is_vect()->element_basic_type()
> > >>>>>>> +==
> > >>>>>>> T_INT
> > >>>>> ||
> > >>>>>>> + n->bottom_type()->is_vect()->element_basic_type()
> > >>>>>>> +== T_LONG);
> > >>>>>>>
> > >>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{
> > >>>>>>> + predicate(n->bottom_type()->is_vect()->element_basic_type()
> > >>>>>>> +==
> > >>>>>>> T_INT
> > >>>>> ||
> > >>>>>>> + n->bottom_type()->is_vect()->element_basic_type()
> > >>>>>>> +== T_LONG);
> > >>>>>>>
> > >>>>>>> The predicates are redundant here.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
> > >>>>>>>
> > >>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType
> > >>>>>>> +etype,
> > >>>>>>> XMMRegister dst, XMMRegister src,
> > >>>>>>> + int shift, int
> > >>>>>>> +vector_len) { if (opcode == Op_RotateLeftV) {
> > >>>>>>> + if (etype == T_INT) {
> > >>>>>>> + evprold(dst, src, shift, vector_len);
> > >>>>>>> + } else {
> > >>>>>>> + evprolq(dst, src, shift, vector_len);
> > >>>>>>> + }
> > >>>>>>>
> > >>>>>>> Please, put an assert for the false case (assert(etype ==
> > >>>>>>> T_LONG,
> > >>>>> "...")).
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> * On testing (with previous version of the patch): -XX:UseAVX
> > >>>>>>> is
> > >>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86
> > >> platforms.
> > >>>>>>> Either omitting the flag or adding
> > >>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue.
> > >>>>>>>
> > >>>>>>> Best regards,
> > >>>>>>> Vladimir Ivanov
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Summary of changes:
> > >>>>>>>> 1) Optimization is specifically targeted to exploit vector
> > >>>>>>>> rotation
> > >>>>>>> instruction added for X86 AVX512. A single rotate instruction
> > >>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers
> > >>>>>>> better latency at reduced instruction count.
> > >>>>>>>>
> > >>>>>>>> 2) There were two approaches to implement this:
> > >>>>>>>> a) Let everything remain the same and add new wide
> > >>>>>>>> complex
> > >>>>>>> instruction patterns in the matcher for e.g.
> > >>>>>>>> set Dst ( OrV (Binary (LShiftVI dst (Binary
> > >>>>>>>> ReplicateI
> > >>>>>>>> shift))
> > >>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) (
> > >>>>>>> Replicate
> > >>>>>>> shift))
> > >>>>>>>> It would have been an overoptimistic assumption to
> > >>>>>>>> expect that graph
> > >>>>>>> shape would be preserved till the matcher for correct
> inferencing.
> > >>>>>>>> In addition we would have required multiple such
> > >>>>>>>> bulky patterns.
> > >>>>>>>> b) Create new RotateLeft/RotateRight scalar nodes,
> > >>>>>>>> these gets
> > >>>>>>> generated during intrinsification as well as during additional
> > >>>>>>> pattern
> > >>>>>>>> matching during node Idealization, later on these
> > >>>>>>>> nodes are consumed
> > >>>>>>> by SLP for valid vectorization scenarios to emit their vector
> > >>>>>>>> counterparts which eventually emits vector rotates.
> > >>>>>>>>
> > >>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here
> > >>>>>>>> was that in non-evex mode (UseAVX < 3) new scalar Rotate
> > >>>>>>>> nodes should either be
> > >>>>>>> dismantled back to OR/SHIFT pattern or we penalize the
> > >>>>>>> vectorization which would be very costly, other option would
> > >>>>>>> have been to add additional vector rotate pattern for UseAVX=3
> > >>>>>>> in the matcher which emit vector OR-SHIFTs instruction but
> > >>>>>>> then it will loose on emitting efficient instruction sequence
> > >>>>>>> which node sharing
> > >>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus
> > >>>>>>> it will not be beneficial for non-AVX512 targets, only saving
> > >>>>>>> will be in terms of cleanup of few existing scalar rotate
> > >>>>>>> matcher patterns, also old targets does not offer this
> > >>>>>>> powerful rotate
> > >> instruction.
> > >>>>>>> Therefore new scalar nodes are created only for AVX512 targets.
> > >>>>>>>>
> > >>>>>>>> As per suggestions constant folding scenarios have been
> > >>>>>>>> covered during
> > >>>>>>> Idealizations of newly added scalar nodes.
> > >>>>>>>>
> > >>>>>>>> Please review the latest version and share your feedback and
> > >>>>>>>> test
> > >>>>>>> results.
> > >>>>>>>>
> > >>>>>>>> Best Regards,
> > >>>>>>>> Jatin
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>> From: Andrew Haley <aph at redhat.com>
> > >>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM
> > >>>>>>>>> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Bhateja,
> > >>>>>>>>> Jatin <jatin.bhateja at intel.com>;
> > >>>>>>>>> hotspot-compiler-dev at openjdk.java.net
> > >>>>>>>>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> > >>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API
> > >>>>>>>>> intrinsification for
> > >>>>>>>>> X86
> > >>>>>>>>>
> > >>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote:
> > >>>>>>>>>
> > >>>>>>>>> > High-level comment: so far, there were no pressing
> > >>>>>>>>> need in
> > >>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL
> > >>>>>>>>> instructions
> > >>>>>>>>>> were selected during matching [1]. Now the patch introduces
> > >>>>>>>>>> >
> > >>>>>>>>> dedicated nodes
> > >>>>>>>>> (RotateLeft/RotateRight) specifically for intrinsics >
> > >>>>>>>>> which partly duplicates existing logic.
> > >>>>>>>>>
> > >>>>>>>>> The lack of rotate nodes in the IR has always meant that
> > >>>>>>>>> AArch64 doesn't generate optimal code for e.g.
> > >>>>>>>>>
> > >>>>>>>>> (Set dst (XorL reg1 (RotateLeftL reg2 imm)))
> > >>>>>>>>>
> > >>>>>>>>> because, with the RotateLeft expanded to its full
> > >>>>>>>>> combination of ORs and shifts, it's to complicated to match.
> > >>>>>>>>> At the time I put this to one side because it wasn't urgent.
> > >>>>>>>>> This is a shame because although such combinations are
> > >>>>>>>>> unusual they are used in some crypto
> > >>>>> operations.
> > >>>>>>>>>
> > >>>>>>>>> If we can generate immediate-form rotate nodes early by
> > >>>>>>>>> pattern matching during parsing (rather than depending on
> > >>>>>>>>> intrinsics) we'll get more value than by depending on
> > >>>>>>>>> programmers calling
> > >> intrinsics.
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Andrew Haley (he/him)
> > >>>>>>>>> Java Platform Lead Engineer
> > >>>>>>>>> Red Hat UK Ltd. <https://www.redhat.com>
> > >>>>>>>>> https://keybase.io/andrewhaley
> > >>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
> > >>>>>>>>
More information about the hotspot-compiler-dev
mailing list