Intrinsics for Math.min and max

Vitaly Davidovich vitalyd at gmail.com
Tue Apr 1 23:31:17 UTC 2014


Thanks for putting the JMH code inline.

Yes, I tend to agree with not forcing cmov in the intrinsic given modern
hardware (unless, of course, profiling via the interpreter shows the branch
to be highly unpredictable).  Perhaps the JIT should check whether the
min/max is executed in a loop body, and if so, consider it predictable (and
generate jumps); outside a loop it probably doesn't matter much for
performance whether it's a cmov or a jump.
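
To illustrate the two situations (a sketch with made-up helper names, not
code from this thread):

    static int runningMax(int[] table) {
        int result = Integer.MIN_VALUE;
        // in a loop: if compiled to a compare + jump, the branch is almost
        // always "keep the current result" once result has warmed up, so it
        // predicts well
        for (final int x : table) result = Math.max(result, x);
        return result;
    }

    static int clampNonNegative(int value) {
        // standalone use: there is little branch history to exploit either
        // way, so cmov vs. jump is mostly a wash
        return Math.max(value, 0);
    }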

Sent from my phone
On Apr 1, 2014 7:06 PM, "Martin Grajcar" <maaartinus at gmail.com> wrote:

> Answered inline.
>
> On Tue, Apr 1, 2014 at 11:58 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>> Apologies, meant to reply to the list.
>>
>> Sent from my phone
>> On Apr 1, 2014 5:48 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:
>>
>>> I can't see the attachment (on my phone) but I'm guessing the jumps
>>> generated by the manual code are highly predictable? What if you try it
>>> with an array of random values?
>>>
>>
> The input array is random:
>
>     @Setup public void setUp() {
>         final Random random = new Random();
>         for (int i=0; i<table.length; ++i) table[i] = random.nextInt();
>     }
>
> The whole benchmark is this loop:
>
>     @GenerateMicroBenchmark public int intrinsic() {
>         int result = Integer.MIN_VALUE;
>         for (final int x : table) result = Math.max(result, x);
>         return result;
>     }
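>
> For comparison, the hand-written variant would look roughly like this (a
> sketch with an assumed method name, not code copied verbatim from the
> benchmark):
>
>     @GenerateMicroBenchmark public int manual() {
>         int result = Integer.MIN_VALUE;
>         // explicit compare-and-branch instead of the Math.max intrinsic
>         for (final int x : table) {
>             if (x > result) result = x;
>         }
>         return result;
>     }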
>
> The values are random, but the branch gets more and more predictable as
> result approaches the real maximum.
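>
> To see why, here is a quick standalone check (just an illustration, not
> part of the benchmark): over n random ints the running maximum is only
> updated about ln(n) times on average, so the "take the new maximum" side
> of the branch is almost never the one taken.
>
>     int result = Integer.MIN_VALUE;
>     int updates = 0;
>     final Random random = new Random();
>     for (int i = 0; i < 1000000; ++i) {
>         final int x = random.nextInt();
>         if (x > result) {          // the branch Math.max boils down to
>             result = x;
>             ++updates;
>         }
>     }
>     // expected: roughly ln(1000000) ~ 14 updates out of a million iterations
>     System.out.println(updates);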
>
>>> I'm guessing the cmov-based intrinsics only win on (a) CPUs with poor
>>> branch prediction, (b) unpredictable branches, or (c) code with lots of
>>> branches clogging up the branch history buffer.
>>>
> Agreed, but my point was not to *force* using cmov for Math.max when the
> compiler can do it anyway (though there are cases where it doesn't even
> though it should, like
> http://stackoverflow.com/questions/19689214/strange-branching-performance
> ).
>
>
>>> Also, is the compiler generating larger code when using jumps? If so,
>>> icache pressure could be an issue; I don't think a microbenchmark will
>>> capture that, though.
>>>
>> I'd guess the code size is about the same. Anyway, this microbenchmark is
> really tiny.
>
>