Intrinsics for Math.min and max

Tue Apr 1 23:45:09 UTC 2014

I agree that we should take profiling information into account (if it 
is available) in LibraryCallKit::inline_min_max(), as we do in 
PhaseIdealLoop::conditional_move(). At least the code will be consistent 
in both cases.

We still need to define how to quantify "highly unpredictable".
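
One possible way to make this concrete, purely as a sketch: treat a branch
as highly unpredictable when the profiled taken probability is close to 0.5,
i.e. when the less frequent outcome still happens often. The class name,
the method name and the 0.25 cutoff below are hypothetical and are not
anything HotSpot defines; the real check would live in the compiler's C++
code and read the interpreter's branch counters. Note also that a
probability near 0.5 is only a proxy, since a strictly alternating branch
has that probability and is still easy to predict.

```java
public final class BranchProfileSketch {
    // Hypothetical cutoff: the minority outcome occurs at least 25% of the time.
    static final double UNPREDICTABLE_THRESHOLD = 0.25;

    /**
     * Returns true if the profile counts suggest the branch is hard to predict.
     * With no profile data at all we conservatively answer false.
     */
    static boolean looksUnpredictable(long taken, long notTaken) {
        long total = taken + notTaken;
        if (total == 0) {
            return false; // no profiling information available
        }
        double p = (double) taken / total;
        // Distance of the taken probability from the nearer extreme (0 or 1).
        return Math.min(p, 1.0 - p) >= UNPREDICTABLE_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(looksUnpredictable(500, 500)); // evenly split counts
        System.out.println(looksUnpredictable(10, 990));  // heavily biased counts
    }
}
```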

Thanks,
Vladimir

On 4/1/14 4:31 PM, Vitaly Davidovich wrote:
> Thanks for putting the jmh code inline.
>
> Yes, I tend to agree with not forcing cmov in the intrinsic given modern
> hardware (unless, of course, profiling via interpreter shows the branch
> highly unpredictable).  Perhaps JIT should see if the min/max is
> executed in a loop body, and if so, consider it predictable (and
> generate jumps); if outside loop, it probably doesn't matter for perf
> all that much whether it's cmov or jump.
>
> Sent from my phone
>
> On Apr 1, 2014 7:06 PM, "Martin Grajcar" <maaartinus at gmail.com> wrote:
>
>     Answered inline.
>
>     On Tue, Apr 1, 2014 at 11:58 PM, Vitaly Davidovich
>     <vitalyd at gmail.com> wrote:
>
>         Apologies, meant to reply to the list.
>
>         Sent from my phone
>
>         On Apr 1, 2014 5:48 PM, "Vitaly Davidovich" <vitalyd at gmail.com>
>         wrote:
>
>             I can't see the attachment (on my phone) but I'm guessing
>             the jumps generated by manual code are highly predicted?
>             What if you try it with an array of random values?
>
>
>     The input array is random:
>
>          @Setup public void setUp() {
>              final Random random = new Random();
>              for (int i=0; i<table.length; ++i) table[i] = random.nextInt();
>          }
>
>     The whole benchmark is this loop:
>
>          @GenerateMicroBenchmark public int intrinsic() {
>              int result = Integer.MIN_VALUE;
>              for (final int x : table) result = Math.max(result, x);
>              return result;
>          }
>
>     The values are random, but the branch gets more and more predictable
>     as result approaches the real maximum.
>
>             I'm guessing the cmov based intrinsics only win on (a) cpus
>             with poor branch prediction, (b) unpredictable branches, or
>             (c) code with lots of branches clogging up branch history
>             buffer.
>
>     Agreed, but my point was not to /force/ using cmov for Math.max when
>     the compiler can do it anyway (though there are cases when it doesn't
>     although it should, like
>     http://stackoverflow.com/questions/19689214/strange-branching-performance).
>
>             Also, is compiler generating larger code when using jumps?
>             If so, icache pressure could be an issue; I don't think a
>             microbenchmark will capture that though.
>
>     I'd guess the code size is about the same. Anyway, this
>     microbenchmark is really tiny.
>
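
The remark quoted above, that the branch becomes more predictable as result
approaches the real maximum, can be checked without JMH by counting how
often the running maximum is actually replaced. This is only a sketch: the
class name, the helper name and the array size are hypothetical choices,
and the manual comparison stands in for whatever code the JIT emits for
Math.max.

```java
import java.util.Random;

public final class RunningMaxSketch {
    /**
     * Counts how many elements are strictly greater than the running maximum,
     * i.e. how many times the "update" side of the branch is taken.
     */
    static int countUpdates(int[] table) {
        int result = Integer.MIN_VALUE;
        int updates = 0;
        for (final int x : table) {
            if (x > result) { // same comparison Math.max(result, x) decides on
                result = x;
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        final int[] table = new int[1_000_000]; // hypothetical size
        final Random random = new Random();
        for (int i = 0; i < table.length; ++i) table[i] = random.nextInt();
        System.out.println("updates: " + countUpdates(table)
                + " out of " + table.length + " iterations");
    }
}
```

For random input the update count is typically tiny compared to the array
length, so the branch is almost always not taken after the first few
elements.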