JIT code generation for Long/Integer.compare
Aleksey Shipilev
aleksey.shipilev at oracle.com
Thu Oct 1 17:05:04 UTC 2015
Hi Ian,
Don't you want to replicate these results yourself, since it may serve
to validate any polishings for the patch? ;) The benchmark you suggest
is easily runnable, and -prof perfasm can give you the generated code in
a minute.
Also, why would anyone do "Long.compare(a, b) < 0" when it's actually "a
< b"? Should we actually care about this (arguably) code smell?
Anyhow, my quick runs suggest with current JDK implementation C2
compiles "Long.compare(a, b) < 0" expressions nicely, collapsing some
code paths that used to return 0xff..ff / setne to just returning 1 / 0,
sometimes even folding into conditional move. The generated code with
the intrinsic is the same!
E.g.
@Benchmark
public void first(Blackhole bh) {
for (long v : a) {
bh.consume(compare(v, 0L));
}
}
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
private boolean compare(long a, long b) {
return Long.compare(a, b) < 0;
}
...with bias=0.5 emits this with both baseline and intrinsified versions:
[Verified Entry Point]
0.78% 0.92% 0x00007fa9395f9b00: sub $0x18,%rsp
6.37% 6.16% 0x00007fa9395f9b07: mov %rbp,0x10(%rsp)
0.08% 0.08% 0x00007fa9395f9b0c: xor %r10d,%r10d
1.05% 0.71% 0x00007fa9395f9b0f: mov $0x1,%eax
0x00007fa9395f9b14: cmp %rcx,%rdx
6.23% 5.44% 0x00007fa9395f9b17: cmovge %r10d,%eax
0.82% 0.48% 0x00007fa9395f9b1b: add $0x10,%rsp
0x00007fa9395f9b1f: pop %rbp
6.25% 5.74% 0x00007fa9395f9b20: test %eax,0x134ae4da(%rip)
0.04% 0.08% 0x00007fa9395f9b26: retq
Other biases are swinging back and forth, mostly tied, depending on how
lucky you are in laying out the branches. (Admittedly, "< 0" shape is
favorited by non-optimized intrinsic code -- it without setne; so even
better benchmarks would quantify "> 0" and "== 0" patterns as well).
So, I would say we should mostly care about ternary (-1, 0, 1) cases? I
still think the work optimizing Long/Integer.compare is about dealing
with optimizer quirks...
Thanks,
-Aleksey
On 10/01/2015 07:05 PM, Ian Rogers wrote:
> Thanks Aleksey,
>
> I'd comment on the bug but I have no login. With your analysis could you
> look at the effect of the codegen around:
>
> @CompilerControl(CompilerControl.Mode.DONT_INLINE)
> private boolean compare(long a, long b) {
> return Long.compare(a, b) < 0;
> }
>
> without the intrinsic you effectively (currently) get:
>
> return ((x < y) ? -1 : ((x == y) ? 0 : 1)) < 0;
>
> with the intrinsic you get:
>
> return x < y;
>
> That's not to say intrinsics are good, but recognizing the code as a
> CmplL3 means it folds into the outer comparison which is common in the
> use of compareTo methods. I didn't notice this in your examples above,
> which may be error on my part - apologies in advance.
>
> Thanks,
> Ian
>
>
> On Mon, Sep 28, 2015 at 3:11 PM, Aleksey Shipilev
> <aleksey.shipilev at oracle.com <mailto:aleksey.shipilev at oracle.com>> wrote:
>
> Hi Ian,
>
> I filed:
> https://bugs.openjdk.java.net/browse/JDK-8137309
>
> I can see some improvement with your patch, mostly when Longs are
> random, and therefore branch profile is polluted, and current generated
> code is not laid out well. Current Hotspot seems to generate multiple
> redundant cmp-s, which looks like a generic issue that should be
> addressed. See e.g. the very first disassembly here:
> http://cr.openjdk.java.net/~shade/8137309/BiasedCompareTo.java
> <http://cr.openjdk.java.net/%7Eshade/8137309/BiasedCompareTo.java>
>
> In some other cases, patched version is regressing. It seems to be the
> interaction with intrinsic not using any branch profile to figure out
> which option is more frequent, and some minor regalloc/constant
> propagation issues. See e.g.:
> http://cr.openjdk.java.net/~shade/8137309/PredictedCompareTo.java
> <http://cr.openjdk.java.net/%7Eshade/8137309/PredictedCompareTo.java>
>
> With that, fixing the optimizer/codegen without resorting to point
> intrinsics does look like a better course of action.
>
> Thanks,
> -Aleksey
>
> On 09/28/2015 04:44 AM, Ian Rogers wrote:
> > Thanks. I'm not disagreeing with anything here. is_x2logic is 45 lines
> > long and doing less (at least half) pattern matching work than
> would be
> > required for compare. Wrt implementation approach, ReverseBytesI is
> > purely an intrinsic but could also be implemented by pattern matching,
> > which fwiw is how its implemented in Jikes RVM as part of instruction
> > selection. With hindsight a high/ideal level BURS matching phase would
> > be nice. The intrinsic implementation is trivially correct and
> easy for
> > developers to target, whereas pattern matching can be buggy, hard to
> > test, and so on. Its inherently more LOC complex. I have long thought
> > pattern matching is the more laudable approach, but there's also a
> > certain amount of pragmatism when selecting between an intrinsic and
> > pattern matching. The scale of the proposed pattern match is
> beyond the
> > size of any equivalent in C2, so I have concerns around
> correctness and
> > in the consistency of the codebase in implementing it this way.
> >
> > If I may explain some of the background for this change, aside pure
> > performance. There is a common bug pattern of:
> >
> > class MyFoo implements Comparable<MyFoo> {
> > private long x;
> > int compareTo(MyFoo o) {
> > return (int)(o.x - x);
> > }
> > }
> >
> > As the int cast doesn't preserve the sign of the long subtract,
> subtract
> > results that overflow 31-bits may produce incorrect sort orders.
> > Rewriting the above code to use Long.compare is currently a
> performance
> > loss, although far more correct. What's true of long is also somewhat
> > true for ints as the subtract approach doesn't behave correctly around
> > Integer.MIN_VALUE. Having Long.compare and Integer.compare flagged as
> > intrinsics would go someway to remove developers reticence to
> prefer it
> > over the pervasive subtract approach which is naively considered more
> > performant.
> >
> > Thanks,
> > Ian
> >
> >
> > On Sat, Sep 26, 2015 at 12:02 PM, John Rose
> <john.r.rose at oracle.com <mailto:john.r.rose at oracle.com>
> > <mailto:john.r.rose at oracle.com <mailto:john.r.rose at oracle.com>>> wrote:
> >
> > On Sep 25, 2015, at 7:35 PM, Vladimir Kozlov
> > <vladimir.kozlov at oracle.com
> <mailto:vladimir.kozlov at oracle.com>
> <mailto:vladimir.kozlov at oracle.com
> <mailto:vladimir.kozlov at oracle.com>>> wrote:
> >>
> >> For pattern matching I mean ideal transformation in ideal graph.
> >> See, fro example, is_x2logic() in cfgnode.cpp and other
> >> transformations for Phi node.
> >>
> >> You will still need new CmpI3 node and changes in .ad file.
> >
> > +1 Ideally, we pattern-match the body of the method we know
> about and
> > generate the new IR, and then two good things happen: Other
> expressions
> > we didn't know about also pattern match, and after looking at the
> > pattern
> > logic we discover straightforward generalizations that go
> beyond the
> > method
> > we knew about at first.
> >
> > — John
> >
> >
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20151001/f66ba27c/signature.asc>
More information about the hotspot-compiler-dev
mailing list