JIT code generation for Long/Integer.compare

Vladimir Kozlov vladimir.kozlov at oracle.com
Sat Sep 26 02:35:02 UTC 2015


By pattern matching I mean an ideal transformation in the ideal graph.
See, for example, is_x2logic() in cfgnode.cpp and the other
transformations for Phi nodes.

You will still need a new CmpI3 node and changes in the .ad file.
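
For illustration only (this is not part of the patch): a minimal Java
sketch of source shapes that all compute the same three-way result, which
is the kind of "similar code used without calling compare" raised in the
thread. The class and method names are hypothetical, and whether C2
builds the same ideal graph for each variant is an assumption that this
sketch does not verify.

```java
public class CompareShapes {
    // Calls the library method that the patch treats as an intrinsic.
    public static int viaLibrary(int x, int y) {
        return Integer.compare(x, y);
    }

    // Hand-written nested ternary with the same result, no call to compare.
    public static int viaTernary(int x, int y) {
        return (x < y) ? -1 : ((x == y) ? 0 : 1);
    }

    // If/else variant: different tests and constant order, same result.
    public static int viaBranches(int x, int y) {
        if (x > y) {
            return 1;
        }
        if (x < y) {
            return -1;
        }
        return 0;
    }

    // A consumer that only branches on the sign of the result,
    // in the style of the micro-benchmark loop quoted in this thread.
    public static int countLess(int[] x, int[] y) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (viaTernary(x[i], y[i]) < 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] x = {1, 5, 3, Integer.MIN_VALUE};
        int[] y = {2, 5, 1, Integer.MAX_VALUE};
        System.out.println(countLess(x, y));
    }
}
```

An intrinsic keyed on the method name would only see viaLibrary; a
transformation on the graph could in principle also catch the other two
variants.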

Thanks,
Vladimir

On 9/26/15 3:03 AM, Ian Rogers wrote:
> Thanks for the feedback! To summarize, we have a patch here that
> implements Long/Integer.compare using intrinsics, and there is a
> counterproposal that this would be better done by pattern matching. A
> limitation of the patch is that it only implements the CmpI3 intrinsic
> for x86_64; for CmpL3 it piggybacks on the bytecode implementation
> already present for all architectures. Implementing CmpI3 isn't
> challenging given that it is just a minor tweak to CmpL3.
>
> Could you give more details on how you would expect the pattern matching
> approach to work? For example, a convenient place to do this would be in
> the ad file, but this would require porting. It could be done as a
> simplification, but there aren't existing matchers of comparable scope,
> or am I missing something? The matcher would also have to be
> sufficiently general to handle variants in the bool nodes and constants,
> which would necessitate multiple matchers if done thoroughly in the ad
> file. I don't disagree that pattern matching is a more generic solution
> and would avoid the CmpI3 node, but the scope of the patch required for
> that seems substantial, no?
>
> Thanks,
> Ian
>
>
> On Thu, Sep 24, 2015 at 6:30 PM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>
>     I agree with Vitaly. It is better to use pattern matching (and
>     generate CmpI3 and CmpL3 nodes) in subnode.cpp, because the Java
>     code is very simple and we may match other cases too.
>
>     Also, the suggested changes are not complete, since a CmpI3
>     implementation was not added for the other platforms.
>
>     Thanks,
>     Vladimir
>
>     On 9/25/15 7:54 AM, Vitaly Davidovich wrote:
>
>         I must admit it's a bit strange seeing this implemented via an
>         intrinsic. Is this not possible via normal JIT optimizations?
>         There's nothing really "intrinsic" about the code. I get that
>         it's easier implementation-wise to latch on to a well-known
>         method, but what about similar code used without calling compare?
>
>         sent from my phone
>
>         On Sep 24, 2015 7:11 PM, "Ian Rogers" <irogers at google.com> wrote:
>
>              Agreed. The attached patch eliminates the cmpl3_flag enc_class
>              and implements both cmpl3 and cmpi3 as you suggest.
>
>              Thanks,
>              Ian
>
>              On Thu, Sep 24, 2015 at 12:51 PM, Christian Thalinger
>              <christian.thalinger at oracle.com> wrote:
>
>                  One comment about the .ad change: please don’t introduce
>                  new enc_class methods; use ins_encode %{ %} and
>                  MacroAssembler instructions instead, like this one:
>
>                     ins_encode %{
>                       Register Rp = $p$$Register;
>                       Register Rq = $q$$Register;
>                       Register Ry = $y$$Register;
>                       Label done;
>                       __ cmpl(Rp, Rq);
>                       __ jccb(Assembler::less, done);
>                       __ xorl(Ry, Ry);
>                       __ bind(done);
>                     %}
>
>                  Should be less painful too :-)
>
>                      On Sep 24, 2015, at 8:45 AM, Ian Rogers
>                      <irogers at google.com> wrote:
>
>                      Below is a patch to add JIT code generation for
>                      Long/Integer.compare. It has been reviewed internally
>                      by rasbold at google.com. I'd like to open a bug for
>                      this, get it reviewed, etc., but I lack a JBS account.
>                      I'd appreciate help in getting this reviewed and merged.
>
>                      Thanks,
>                      Ian Rogers
>
>                      Support JIT code generation for Long/Integer.compare as
>                      intrinsics that fold with branches on their result.
>
>                      Introduce a CmpI3 ideal node, mirroring the CmpL3 node,
>                      that implements Integer.compare. Allow this to fold
>                      with a CmpI node. Spot Long/Integer.compare as CmpL3
>                      and CmpI3 nodes. Add a CmpI3 implementation for x86-64.
>                      On a micro-benchmark loop of:
>                          for (int i = 0; i < x.length; i++) {
>                            if (compare(x[i], y[i]) < 0) {
>                              count++;
>                            }
>                          }
>                      Int speed up averages 1.18x, long speed up averages
>                      2.76x, over 30 runs of arrays sized at 5,000,000
>                      elements. This can be improved with work on
>                      instruction selection.
>                      Raw data:
>                      Int before:  23129us, 99.5% range: 19935us - 26046us
>                      Int after:   19557us, 99.5% range: 16972us - 26072us
>                      Long before: 26935us, 99.5% range: 25776us - 29323us
>                      Long after:   9749us, 99.5% range: 8850us  - 11968us
>
>                      <cmpi3-jdk9-tdiff.patch>
>
>
>
>


More information about the hotspot-compiler-dev mailing list