JIT code generation for Long/Integer.compare

Fri Sep 25 19:03:17 UTC 2015

Thanks for the feedback! To summarize, we have a patch here that implements
Long/Integer.compare using intrinsics, and there is a counter-proposal that
this would be better done by pattern matching. A limitation of the patch is
that it only implements the CmpI3 intrinsic for x86_64; for CmpL3 it
piggybacks on the bytecode implementation already present for all
architectures. Implementing CmpI3 isn't challenging, given that it is just a
minor tweak to CmpL3.
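
As a rough illustration of why the int case is only a small variation on the
long case, here is a Java-level sketch. The class and method names are
hypothetical and not part of the patch; it only shows that the two three-way
compares have the same shape and differ in operand width.

```java
// Hypothetical names for illustration; not taken from the patch.
final class ThreeWayCompare {
    // Three-way compare on longs, producing -1, 0 or 1.
    static int cmp3(long a, long b) {
        return a < b ? -1 : (a == b ? 0 : 1);
    }

    // The int version has the same shape; only the operand width differs.
    static int cmp3(int a, int b) {
        return a < b ? -1 : (a == b ? 0 : 1);
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                // Widening an int to long preserves ordering, so both agree.
                if (cmp3(a, b) != cmp3((long) a, (long) b)) {
                    throw new AssertionError(a + " vs " + b);
                }
            }
        }
        System.out.println("int and long three-way compares agree");
    }
}
```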

Could you give more details on how you would expect the pattern matching
approach to work? For example, a convenient place to do this would be in
the ad file, but this would require porting. It could be done as a
simplification, but there aren't matchers of comparable scope, or am I
missing something? The matcher would also have to be sufficiently general to
handle variants in the bool nodes and constants, which would necessitate
multiple matchers if done thoroughly in the ad file. I don't disagree that
pattern matching is a more generic solution and would avoid the CmpI3 node,
but the scope of the patch required for that seems substantial, no?
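
To make the concern about variants concrete, here is a sketch of a few
source-level spellings of the same three-way compare that a general matcher
would presumably need to recognize. The class and method names are
hypothetical, and this is only an illustration at the Java level, not a
description of how the ideal graph actually looks for each one.

```java
// Hypothetical names for illustration; not taken from the patch.
final class CompareShapes {
    // Tests "less than" first, then equality.
    static int lessFirst(int a, int b) {
        return a < b ? -1 : (a == b ? 0 : 1);
    }

    // Tests "greater than" first, with the constants in a different order.
    static int greaterFirst(int a, int b) {
        return a > b ? 1 : (a < b ? -1 : 0);
    }

    // Tests equality first, written with an early return.
    static int equalFirst(int a, int b) {
        if (a == b) {
            return 0;
        }
        return a < b ? -1 : 1;
    }

    // Same shape as the micro-benchmark loop, using hand-written code
    // rather than a call to Integer.compare.
    static int countLess(int[] x, int[] y) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (lessFirst(x[i], y[i]) < 0) {
                count++;
            }
        }
        return count;
    }
}
```

All three methods return the same value for every input, but each one
branches on a different condition first and places the constants differently.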

Thanks,
Ian

On Thu, Sep 24, 2015 at 6:30 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com
> wrote:

> I agree with Vitaly. It is better to use pattern matching (and generate
> CmpI3 and CmpL3 nodes) in subnode.cpp because java code is very simple and
> we may match other cases too.
>
> Also suggested changes are not complete since CmpI3 implementation was not
> added to other platforms.
>
> Thanks,
> Vladimir
>
> On 9/25/15 7:54 AM, Vitaly Davidovich wrote:
>
>> I must admit it's a bit strange seeing this implemented via intrinsic -
>> is this not possible via normal JIT optimizations? There's nothing
>> really "intrinsic" about the code.  I get that it's easier
>> implementation-wise to latch on to a well known method, but what about
>> similar code used without calling compare?
>>
>> sent from my phone
>>
>> On Sep 24, 2015 7:11 PM, "Ian Rogers" <irogers at google.com
>> <mailto:irogers at google.com>> wrote:
>>
>>     Agreed. The attached patch eliminates the cmpl3_flag enc_class and
>>     implements both cmpl3 and cmpi3 as you suggest.
>>
>>     Thanks,
>>     Ian
>>
>>     On Thu, Sep 24, 2015 at 12:51 PM, Christian Thalinger
>>     <christian.thalinger at oracle.com
>>     <mailto:christian.thalinger at oracle.com>> wrote:
>>
>>         One comment about the .ad change:  please don’t introduce new
>>         enc_class methods; use ins_encode %{ %} and MacroAssembler
>>         instructions instead, like this one:
>>
>>            ins_encode %{
>>              Register Rp = $p$$Register;
>>              Register Rq = $q$$Register;
>>              Register Ry = $y$$Register;
>>              Label done;
>>              __ cmpl(Rp, Rq);
>>              __ jccb(Assembler::less, done);
>>              __ xorl(Ry, Ry);
>>              __ bind(done);
>>            %}
>>
>>         Should be less painful too :-)
>>
>>         On Sep 24, 2015, at 8:45 AM, Ian Rogers <irogers at google.com
>>>         <mailto:irogers at google.com>> wrote:
>>>
>>>         Below is a patch to add JIT code generation for
>>>         Long/Integer.compare. It has been reviewed internally by
>>>         rasbold at google.com <mailto:rasbold at google.com>. I'd like to
>>>         open a bug for this, get it reviewed, etc. but I lack a JBS
>>>         account. I'd appreciate help in getting this reviewed and merged.
>>>
>>>         Thanks,
>>>         Ian Rogers
>>>
>>>         Support JIT code generation for Long/Integer.compare as
>>>         intrinsics that fold with branches on their result.
>>>
>>>         Introduce a CmpI3 ideal node mirroring the CmpL3 node, that
>>>         implements Integer.compare. Allow this to fold with a CmpI
>>>         node.  Spot Long/Integer.compare as CmpL3 and CmpI3 nodes.
>>>         Add a CmpI3 implementation for x86-64.  On a micro-benchmark
>>>         loop of:
>>>             for (int i = 0; i < x.length; i++) {
>>>               if (compare(x[i], y[i]) < 0) {
>>>                 count++;
>>>               }
>>>             }
>>>         Int speed up averages 1.18x, long speed up averages 2.76x,
>>>         over 30 runs of arrays sized at 5,000,000 elements. This can
>>>         be improved with work on instruction selection.
>>>         Raw data:
>>>         Int before:  23129us, 99.5% range: 19935us - 26046us
>>>         Int after:   19557us, 99.5% range: 16972us - 26072us
>>>         Long before: 26935us, 99.5% range: 25776us - 29323us
>>>         Long after:   9749us, 99.5% range: 8850us  - 11968us
>>>
>>>         <cmpi3-jdk9-tdiff.patch>
>>>
>>
>>
>>