Intrinsics for Math.min and max
Andrew Haley
aph at redhat.com
Wed Apr 2 09:59:14 UTC 2014
On 04/02/2014 12:31 AM, Vitaly Davidovich wrote:
> Thanks for putting the jmh code inline.
>
> Yes, I tend to agree with not forcing cmov in the intrinsic given modern
> hardware (unless, of course, profiling via interpreter shows the branch
> highly unpredictable). Perhaps JIT should see if the min/max is executed
> in a loop body, and if so, consider it predictable (and generate jumps); if
> outside loop, it probably doesn't matter for perf all that much whether
> it's cmov or jump.
When the probabilities are equal (i.e. max selects its left and right
args equally often; setup code appended), HotSpot generates the same
code for the intrinsic and for the hand-written own() version:
0x00007f2659249fc1: mov $0x80000000,%eax ;*aload_2
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@20 (line 71)
    ;; B6: # B6 B7 <- B5 B6 Loop: B6-B6 inner pre of N141 Freq: 1.99806
0x00007f2659249fc6: mov 0x10(%rdx,%r10,4),%ecx ;*iaload
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@23 (line 71)
0x00007f2659249fcb: cmp %ecx,%eax
0x00007f2659249fcd: cmovl %ecx,%eax ;*invokestatic max
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@29 (line 71)
0x00007f2659249fd0: inc %r10d ;*iinc
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@33 (line 71)
0x00007f2659249fd3: cmp $0x1,%r10d
0x00007f2659249fd7: jl 0x00007f2659249fc6 ;*if_icmpge
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@17 (line 71)
and
0x00007fcf3124a283: mov $0x80000000,%eax ;*aload_2
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@20 (line 77)
    ;; B6: # B6 B7 <- B5 B6 Loop: B6-B6 inner pre of N157 Freq: 1.99805
0x00007fcf3124a288: mov 0x10(%rdx,%r10,4),%ecx ;*iaload
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@23 (line 77)
0x00007fcf3124a28d: cmp %ecx,%eax
0x00007fcf3124a28f: cmovl %ecx,%eax ;*ireturn
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::max@10 (line 82)
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@30 (line 77)
0x00007fcf3124a292: inc %r10d ;*iinc
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@34 (line 77)
0x00007fcf3124a295: cmp $0x1,%r10d
0x00007fcf3124a299: jl 0x00007fcf3124a288 ;*if_icmpge
    ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@17 (line 77)
Unsurprisingly, the measured time is the same for own() and intrinsic().
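The benchmark source is not repeated in this message. As a rough sketch only, assuming the two methods are plain running-max loops over an int array that start from Integer.MIN_VALUE (the 0x80000000 loaded into %eax in both listings), they might look something like the following. The class name MaxLoops and the method bodies are assumptions for illustration, not the actual JmhMaxBenchmark code, and the JMH annotations are left out so the sketch runs on its own.

```java
import java.util.Random;

// Hypothetical stand-in for the benchmark class; not the original source.
final class MaxLoops {
    // Running max via the library method, which HotSpot may intrinsify.
    static int intrinsic(int[] table) {
        int result = Integer.MIN_VALUE;
        for (int i = 0; i < table.length; ++i) {
            result = Math.max(result, table[i]);
        }
        return result;
    }

    // Hand-written max, assumed to be what the listings call "max".
    static int max(int a, int b) {
        return a >= b ? a : b;
    }

    // Same loop, but calling the hand-written max instead of Math.max.
    static int own(int[] table) {
        int result = Integer.MIN_VALUE;
        for (int i = 0; i < table.length; ++i) {
            result = max(result, table[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Fill the table the same way as the 50/50 setup: i or -i.
        int[] table = new int[1024];
        Random random = new Random(42);
        for (int i = 0; i < table.length; ++i) {
            table[i] = random.nextBoolean() ? i : -i;
        }
        System.out.println("intrinsic = " + intrinsic(table));
        System.out.println("own       = " + own(table));
    }
}
```

Both loops compute the same value, so any timing difference would have to come from the code the JIT emits for the max itself.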
I am concerned that the inner part of the loop is too small. As much
time is spent in the loop machinery as in the actual calculation. I
have noticed that substantial inlining occurs with some probabilities,
and this might significantly change the measurements.
With the setup code below it's easy to fiddle with probabilities and
see what happens.
Andrew.
@Setup public void setUp() {
    final Random random = new Random();
    for (int i=0; i<table.length; ++i) {
        // table[i] = random.nextInt();
        if (random.nextDouble() > 0.5) {
            table[i] = i;
        } else {
            table[i] = -i;
        }
    }
}
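To make "fiddling with probabilities" concrete, here is a small standalone sketch that pulls the 0.5 out into a parameter and measures how often a running max would pick its right-hand argument. The names TableSetup, fill, rightArgFraction, threshold and seed are hypothetical, and the fixed seed is an addition for repeatability. Because table[i] is either i or -i, every positive entry beats all earlier ones, so the fraction should come out close to 1 - threshold.

```java
import java.util.Random;

// Hypothetical helper, not part of the original benchmark.
final class TableSetup {
    // Same fill rule as setUp(), with the 0.5 replaced by a parameter.
    static int[] fill(int length, double threshold, long seed) {
        Random random = new Random(seed);
        int[] table = new int[length];
        for (int i = 0; i < length; ++i) {
            table[i] = random.nextDouble() > threshold ? i : -i;
        }
        return table;
    }

    // Fraction of iterations in which the new element replaces the
    // running max, i.e. max(result, table[i]) selects its right arg.
    static double rightArgFraction(int[] table) {
        if (table.length == 0) {
            return 0.0;
        }
        int result = Integer.MIN_VALUE;
        int taken = 0;
        for (int value : table) {
            if (value > result) {
                result = value;
                ++taken;
            }
        }
        return (double) taken / table.length;
    }

    public static void main(String[] args) {
        for (double threshold : new double[] {0.1, 0.5, 0.9}) {
            int[] table = fill(100_000, threshold, 1L);
            System.out.printf("threshold=%.1f fraction=%.3f%n",
                              threshold, rightArgFraction(table));
        }
    }
}
```

A threshold near 0 or 1 gives a highly predictable selection, while 0.5 gives the equal-probability case discussed above.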
More information about the hotspot-compiler-dev
mailing list