JIT code generation for Long/Integer.compare

Thu Oct 1 17:05:04 UTC 2015

Hi Ian,

Don't you want to replicate these results yourself, since it may serve
to validate any polishings for the patch? ;) The benchmark you suggest
is easily runnable, and -prof perfasm can give you the generated code in
a minute.

Also, why would anyone do "Long.compare(a, b) < 0" when it's actually "a
< b"? Should we actually care about this (arguably) code smell?

Anyhow, my quick runs suggest with current JDK implementation C2
compiles "Long.compare(a, b) < 0" expressions nicely, collapsing some
code paths that used to return 0xff..ff / setne to just returning 1 / 0,
sometimes even folding into conditional move. The generated code with
the intrinsic is the same!

E.g.

    @Benchmark
    public void first(Blackhole bh) {
        for (long v : a) {
            bh.consume(compare(v, 0L));
        }
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    private boolean compare(long a, long b) {
        return Long.compare(a, b) < 0;
    }

...with bias=0.5 emits this with both baseline and intrinsified versions:

                  [Verified Entry Point]
  0.78%    0.92%    0x00007fa9395f9b00: sub    $0x18,%rsp
  6.37%    6.16%    0x00007fa9395f9b07: mov    %rbp,0x10(%rsp)
  0.08%    0.08%    0x00007fa9395f9b0c: xor    %r10d,%r10d
  1.05%    0.71%    0x00007fa9395f9b0f: mov    $0x1,%eax
                    0x00007fa9395f9b14: cmp    %rcx,%rdx
  6.23%    5.44%    0x00007fa9395f9b17: cmovge %r10d,%eax
  0.82%    0.48%    0x00007fa9395f9b1b: add    $0x10,%rsp
                    0x00007fa9395f9b1f: pop    %rbp
  6.25%    5.74%    0x00007fa9395f9b20: test   %eax,0x134ae4da(%rip)
  0.04%    0.08%    0x00007fa9395f9b26: retq

Other biases are swinging back and forth, mostly tied, depending on how
lucky you are in laying out the branches. (Admittedly, "< 0" shape is
favorited by non-optimized intrinsic code -- it without setne; so even
better benchmarks would quantify "> 0" and "== 0" patterns as well).

So, I would say we should mostly care about ternary (-1, 0, 1) cases? I
still think the work optimizing Long/Integer.compare is about dealing
with optimizer quirks...

Thanks,
-Aleksey

On 10/01/2015 07:05 PM, Ian Rogers wrote:
> Thanks Aleksey,
> 
> I'd comment on the bug but I have no login. With your analysis could you
> look at the effect of the codegen around:
> 
>     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>     private boolean compare(long a, long b) {
>         return Long.compare(a, b) < 0;
>     }
> 
> without the intrinsic you effectively (currently) get:
> 
>         return ((x < y) ? -1 : ((x == y) ? 0 : 1)) < 0;
> 
> with the intrinsic you get:
> 
>         return x < y;
> 
> That's not to say intrinsics are good, but recognizing the code as a
> CmplL3 means it folds into the outer comparison which is common in the
> use of compareTo methods. I didn't notice this in your examples above,
> which may be error on my part - apologies in advance.
> 
> Thanks,
> Ian
> 
> 
> On Mon, Sep 28, 2015 at 3:11 PM, Aleksey Shipilev
> <aleksey.shipilev at oracle.com <mailto:aleksey.shipilev at oracle.com>> wrote:
> 
>     Hi Ian,
> 
>     I filed:
>       https://bugs.openjdk.java.net/browse/JDK-8137309
> 
>     I can see some improvement with your patch, mostly when Longs are
>     random, and therefore branch profile is polluted, and current generated
>     code is not laid out well. Current Hotspot seems to generate multiple
>     redundant cmp-s, which looks like a generic issue that should be
>     addressed. See e.g. the very first disassembly here:
>       http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java
>     <http://cr.openjdk.java.net/%7Eshade/8137309/BiasedCompareTo.java>
> 
>     In some other cases, patched version is regressing. It seems to be the
>     interaction with intrinsic not using any branch profile to figure out
>     which option is more frequent, and some minor regalloc/constant
>     propagation issues. See e.g.:
>       http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java
>     <http://cr.openjdk.java.net/%7Eshade/8137309/PredictedCompareTo.java>
> 
>     With that, fixing the optimizer/codegen without resorting to point
>     intrinsics does look like a better course of action.
> 
>     Thanks,
>     -Aleksey
> 
>     On 09/28/2015 04:44 AM, Ian Rogers wrote:
>     > Thanks. I'm not disagreeing with anything here. is_x2logic is 45 lines
>     > long and doing less (at least half) pattern matching work than
>     would be
>     > required for compare. Wrt implementation approach, ReverseBytesI is
>     > purely an intrinsic but could also be implemented by pattern matching,
>     > which fwiw is how its implemented in Jikes RVM as part of instruction
>     > selection. With hindsight a high/ideal level BURS matching phase would
>     > be nice. The intrinsic implementation is trivially correct and
>     easy for
>     > developers to target, whereas pattern matching can be buggy, hard to
>     > test, and so on. Its inherently more LOC complex. I have long thought
>     > pattern matching is the more laudable approach, but there's also a
>     > certain amount of pragmatism when selecting between an intrinsic and
>     > pattern matching. The scale of the proposed pattern match is
>     beyond the
>     > size of any equivalent in C2, so I have concerns around
>     correctness and
>     > in the consistency of the codebase in implementing it this way.
>     >
>     > If I may explain some of the background for this change, aside pure
>     > performance. There is a common bug pattern of:
>     >
>     > class MyFoo implements Comparable<MyFoo> {
>     >   private long x;
>     >   int compareTo(MyFoo o) {
>     >     return (int)(o.x - x);
>     >   }
>     > }
>     >
>     > As the int cast doesn't preserve the sign of the long subtract,
>     subtract
>     > results that overflow 31-bits may produce incorrect sort orders.
>     > Rewriting the above code to use Long.compare is currently a
>     performance
>     > loss, although far more correct. What's true of long is also somewhat
>     > true for ints as the subtract approach doesn't behave correctly around
>     > Integer.MIN_VALUE. Having Long.compare and Integer.compare flagged as
>     > intrinsics would go someway to remove developers reticence to
>     prefer it
>     > over the pervasive subtract approach which is naively considered more
>     > performant.
>     >
>     > Thanks,
>     > Ian
>     >
>     >
>     > On Sat, Sep 26, 2015 at 12:02 PM, John Rose
>     <john.r.rose at oracle.com <mailto:john.r.rose at oracle.com>
>     > <mailto:john.r.rose at oracle.com <mailto:john.r.rose at oracle.com>>> wrote:
>     >
>     >     On Sep 25, 2015, at 7:35 PM, Vladimir Kozlov
>     >     <vladimir.kozlov at oracle.com
>     <mailto:vladimir.kozlov at oracle.com>
>     <mailto:vladimir.kozlov at oracle.com
>     <mailto:vladimir.kozlov at oracle.com>>> wrote:
>     >>
>     >>     For pattern matching I mean ideal transformation in ideal graph.
>     >>     See, fro example, is_x2logic() in cfgnode.cpp and other
>     >>     transformations for Phi node.
>     >>
>     >>     You will still need new CmpI3 node and changes in .ad file.
>     >
>     >     +1  Ideally, we pattern-match the body of the method we know
>     about and
>     >     generate the new IR, and then two good things happen:  Other
>     expressions
>     >     we didn't know about also pattern match, and after looking at the
>     >     pattern
>     >     logic we discover straightforward generalizations that go
>     beyond the
>     >     method
>     >     we knew about at first.
>     >
>     >     — John
>     >
>     >
> 
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20151001/f66ba27c/signature.asc>