Intrinsics for Math.min and max
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Apr 1 23:45:09 UTC 2014
I agree that we should take profiling information into account (if it
is available) in LibraryCallKit::inline_min_max(), as we do in
PhaseIdealLoop::conditional_move(). At least the code will then be
consistent in both cases.
We still need to define how to quantify "highly unpredictable".
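
As a rough illustration of what such a definition could look like, here
is a minimal Java sketch that classifies a branch from its taken and
not-taken counts. The class name, the method name and the 0.2 cutoff are
all hypothetical; none of them is a HotSpot flag or API, and this is not
the check that conditional_move() performs.

```java
final class BranchProfile {
    // Hypothetical cutoff (an assumption, not a HotSpot flag): a branch whose
    // taken probability lies within [0.2, 0.8] is treated as unpredictable.
    static final double UNPREDICTABLE_BAND = 0.2;

    // Returns true when the profile suggests neither outcome dominates.
    static boolean looksUnpredictable(long taken, long notTaken) {
        final long total = taken + notTaken;
        if (total == 0) {
            return false; // no profile available: make no claim
        }
        final double takenProbability = (double) taken / total;
        return takenProbability >= UNPREDICTABLE_BAND
            && takenProbability <= 1.0 - UNPREDICTABLE_BAND;
    }

    public static void main(String[] args) {
        System.out.println(looksUnpredictable(50, 50));
        System.out.println(looksUnpredictable(1, 999));
    }
}
```

Taken frequency is only a proxy: a strictly alternating branch is taken
half the time yet is easy for hardware to predict, so counts like these
can overestimate unpredictability.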
Thanks,
Vladimir
On 4/1/14 4:31 PM, Vitaly Davidovich wrote:
> Thanks for putting the jmh code inline.
>
> Yes, I tend to agree with not forcing cmov in the intrinsic given modern
> hardware (unless, of course, profiling via the interpreter shows the
> branch to be highly unpredictable). Perhaps the JIT should check whether
> the min/max is executed in a loop body and, if so, consider it
> predictable (and generate jumps); outside a loop, it probably doesn't
> matter much for performance whether it's a cmov or a jump.
>
> Sent from my phone
>
> On Apr 1, 2014 7:06 PM, "Martin Grajcar" <maaartinus at gmail.com> wrote:
>
> Answered inline.
>
> On Tue, Apr 1, 2014 at 11:58 PM, Vitaly Davidovich
> <vitalyd at gmail.com> wrote:
>
> Apologies, meant to reply to the list.
>
> Sent from my phone
>
> On Apr 1, 2014 5:48 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:
>
> I can't see the attachment (on my phone) but I'm guessing
> the jumps generated by manual code are highly predicted?
> What if you try it with an array of random values?
>
>
> The input array is random:
>
> @Setup public void setUp() {
>     final Random random = new Random();
>     for (int i=0; i<table.length; ++i) table[i] = random.nextInt();
> }
>
> The whole benchmark is this loop:
>
> @GenerateMicroBenchmark public int intrinsic() {
>     int result = Integer.MIN_VALUE;
>     for (final int x : table) result = Math.max(result, x);
>     return result;
> }
>
> The values are random, but the branch gets more and more predictable
> as result approaches the real maximum.
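
To make that concrete, here is a small stand-alone sketch (plain Java, not
JMH) that counts how often a hand-written `x > result` branch is actually
taken over a random array. The class name, the array size and the fixed
seed are assumptions made for illustration; they are not taken from the
benchmark above.

```java
import java.util.Random;

class RunningMaxBranch {
    // Counts how many times the running maximum is replaced, i.e. how often
    // the "x > result" branch is taken while scanning the array.
    static int countUpdates(int[] table) {
        int result = Integer.MIN_VALUE;
        int updates = 0;
        for (final int x : table) {
            if (x > result) {
                result = x;
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        final Random random = new Random(42); // fixed seed: an assumption, for repeatability
        final int[] table = new int[1_000_000]; // size is an assumption
        for (int i = 0; i < table.length; ++i) table[i] = random.nextInt();
        System.out.println(countUpdates(table) + " updates in " + table.length + " elements");
    }
}
```

For n independent random values the expected number of updates is roughly
ln n plus a small constant, so on the order of a dozen for a million
elements. After the first few iterations the branch is almost never
taken, which is why a branch predictor handles this loop well.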
>
> I'm guessing the cmov-based intrinsics only win on (a) CPUs
> with poor branch prediction, (b) unpredictable branches, or
> (c) code with lots of branches clogging up the branch history
> buffer.
>
> Agreed, but my point was not to /force/ using cmov for Math.max when
> the compiler can do it anyway (though there are cases where it doesn't
> although it should, such as
> http://stackoverflow.com/questions/19689214/strange-branching-performance).
>
> Also, is the compiler generating larger code when using jumps?
> If so, icache pressure could be an issue; I don't think a
> microbenchmark will capture that though.
>
> I'd guess the code size is about the same. Anyway, this
> microbenchmark is really tiny.
>