Intrinsics for Math.min and max
Vitaly Davidovich
vitalyd at gmail.com
Wed Apr 2 13:03:25 UTC 2014
Andrew,
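Do you mean that substantial unrolling occurs with certain probabilities? I
think that's where the cmov really hurts (for predictable branches): it turns
the compare into a data dependency, so the CPU is slowed down by the
dependency chain, whereas a correctly predicted jump lets it speculate ahead.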
As I mentioned in my other reply, it's hard to build a software model of a
branch prediction unit, as they're not simple probability counters but look
at branch patterns. I think the best HotSpot can do here is just record
highly likely/unlikely code paths and leave the gray area alone (i.e. prefer
jumps).
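
To make the point about patterns concrete, here is a toy sketch. It is not a
model of any real CPU and has nothing to do with HotSpot's actual profiling;
the class and method names are made up for illustration. It compares what a
plain taken/not-taken counter sees with what even a trivial one-bit-history
predictor achieves, for an alternating branch and for a random one.

```java
import java.util.Random;

// Illustrative toy only: not a model of real hardware or of HotSpot profiling.
final class PatternVsProbability {

    // Fraction of outcomes that were "taken"; all a plain probability counter sees.
    static double takenFraction(boolean[] outcomes) {
        int taken = 0;
        for (boolean o : outcomes) {
            if (o) taken++;
        }
        return (double) taken / outcomes.length;
    }

    // Toy history-based predictor: for each value of the previous outcome it
    // remembers what followed last time and predicts the same again.
    static double historyPredictorAccuracy(boolean[] outcomes) {
        boolean[] nextAfter = new boolean[2]; // [0]: after not-taken, [1]: after taken
        boolean previous = false;
        int correct = 0;
        for (boolean actual : outcomes) {
            int slot = previous ? 1 : 0;
            if (nextAfter[slot] == actual) correct++;
            nextAfter[slot] = actual; // learn what followed this history
            previous = actual;
        }
        return (double) correct / outcomes.length;
    }

    // taken, not-taken, taken, not-taken, ...
    static boolean[] alternating(int n) {
        boolean[] out = new boolean[n];
        for (int i = 0; i < n; i++) out[i] = (i % 2 == 0);
        return out;
    }

    // Independent coin flips with a fixed seed for repeatability.
    static boolean[] random(int n, long seed) {
        Random r = new Random(seed);
        boolean[] out = new boolean[n];
        for (int i = 0; i < n; i++) out[i] = r.nextBoolean();
        return out;
    }

    public static void main(String[] args) {
        boolean[] alt = alternating(10_000);
        boolean[] rnd = random(10_000, 42L);
        System.out.printf("alternating: taken=%.2f predicted=%.2f%n",
                takenFraction(alt), historyPredictorAccuracy(alt));
        System.out.printf("random:      taken=%.2f predicted=%.2f%n",
                takenFraction(rnd), historyPredictorAccuracy(rnd));
    }
}
```

Both inputs look like roughly 50/50 branches to a counter, yet the toy
predictor gets the alternating one almost entirely right, which is why a
profile near 50% is weak evidence for choosing cmov.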
On Apr 2, 2014 5:59 AM, "Andrew Haley" <aph at redhat.com> wrote:
> On 04/02/2014 12:31 AM, Vitaly Davidovich wrote:
> > Thanks for putting the jmh code inline.
> >
> > Yes, I tend to agree with not forcing cmov in the intrinsic given modern
> > hardware (unless, of course, profiling via interpreter shows the branch
> > highly unpredictable). Perhaps JIT should see if the min/max is executed
> > in a loop body, and if so, consider it predictable (and generate jumps); if
> > outside loop, it probably doesn't matter for perf all that much whether
> > it's cmov or jump.
>
> When probabilities are equal (i.e. max selects its left and right args
> equally often, code appended), HotSpot generates the same code for the
> intrinsic() version and the own() version:
>
> 0x00007f2659249fc1: mov $0x80000000,%eax ;*aload_2
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@20 (line 71)
>
> ;; B6: # B6 B7 <- B5 B6 Loop: B6-B6 inner pre of N141 Freq: 1.99806
>
> 0x00007f2659249fc6: mov 0x10(%rdx,%r10,4),%ecx ;*iaload
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@23 (line 71)
> 0x00007f2659249fcb: cmp %ecx,%eax
> 0x00007f2659249fcd: cmovl %ecx,%eax ;*invokestatic max
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@29 (line 71)
> 0x00007f2659249fd0: inc %r10d ;*iinc
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@33 (line 71)
> 0x00007f2659249fd3: cmp $0x1,%r10d
> 0x00007f2659249fd7: jl 0x00007f2659249fc6 ;*if_icmpge
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::intrinsic@17 (line 71)
>
> and
>
> 0x00007fcf3124a283: mov $0x80000000,%eax ;*aload_2
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@20 (line 77)
>
> ;; B6: # B6 B7 <- B5 B6 Loop: B6-B6 inner pre of N157 Freq: 1.99805
>
> 0x00007fcf3124a288: mov 0x10(%rdx,%r10,4),%ecx ;*iaload
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@23 (line 77)
> 0x00007fcf3124a28d: cmp %ecx,%eax
> 0x00007fcf3124a28f: cmovl %ecx,%eax ;*ireturn
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::max@10 (line 82)
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@30 (line 77)
> 0x00007fcf3124a292: inc %r10d ;*iinc
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@34 (line 77)
> 0x00007fcf3124a295: cmp $0x1,%r10d
> 0x00007fcf3124a299: jl 0x00007fcf3124a288 ;*if_icmpge
>     ; - org.openjdk.jmh.samples.JmhMaxBenchmark::own@17 (line 77)
>
> Unsurprisingly, the measured time is the same for own() and intrinsic().
>
> I am concerned that the inner part of the loop is too small. As much
> time is spent in the loop machinery as in the actual calculation. I
> have noticed that substantial inlining occurs with some probabilities,
> and this might significantly change the measurements.
>
> With the setup code below it's easy to fiddle with probabilities and
> see what happens.
>
> Andrew.
>
>
> @Setup public void setUp() {
>     final Random random = new Random();
>     for (int i = 0; i < table.length; ++i) {
>         // table[i] = random.nextInt();
>         if (random.nextDouble() > 0.5) {
>             table[i] = i;
>         } else {
>             table[i] = -i;
>         }
>     }
> }
>
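
For anyone who wants to play with the probabilities outside of JMH, the two
loops can be sketched roughly as follows. This is a guess at their shape based
only on the method names and bytecode comments in the listings (intrinsic, own,
max, table), not the actual benchmark source; the class name, the fill()
helper, and its pPositive and seed parameters are invented for the sketch, and
the JMH annotations are left out so it runs standalone.

```java
import java.util.Random;

// Sketch only: a guess at the benchmark's shape, not the real JmhMaxBenchmark.
final class MaxLoopSketch {

    // Same idea as setUp(): entry i is +i with probability pPositive, else -i.
    // pPositive and seed are parameters invented for this sketch.
    static int[] fill(int size, double pPositive, long seed) {
        Random random = new Random(seed);
        int[] table = new int[size];
        for (int i = 0; i < size; ++i) {
            table[i] = (random.nextDouble() < pPositive) ? i : -i;
        }
        return table;
    }

    // Running maximum via Math.max, the candidate for intrinsification.
    static int intrinsic(int[] table) {
        int result = Integer.MIN_VALUE; // the 0x80000000 loaded before the loop
        for (int i = 0; i < table.length; ++i) {
            result = Math.max(result, table[i]);
        }
        return result;
    }

    // Hand-written max, assumed to be a plain conditional.
    static int max(int a, int b) {
        return (a >= b) ? a : b;
    }

    // Same loop as intrinsic(), but calling the hand-written max.
    static int own(int[] table) {
        int result = Integer.MIN_VALUE;
        for (int i = 0; i < table.length; ++i) {
            result = max(result, table[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] table = fill(1024, 0.5, 1L);
        System.out.println(intrinsic(table) == own(table));
    }
}
```

Each iteration's result feeds the next compare, so with cmov the loop is one
long dependency chain; varying pPositive changes how often the running maximum
is replaced, which is the probability being discussed here.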