Intrinsics for Math.min and max

Wed Apr 2 12:55:43 UTC 2014

I think HotSpot profiling records cumulative taken vs. not-taken counts for
a branch - it doesn't record the pattern, right? This basically means that
the best HotSpot can do is determine whether a jump is very likely taken or
not, rather than being able to answer whether something is highly
unpredictable.  You can have a branch with a 50/50 chance of being taken or
not, but the pattern is just as important for hardware branch prediction.
For example, a loop runs for 100 iterations with the first 50 taking the
jump and the last 50 not; the branch predictor should do fine here even
though the frequency as recorded by HotSpot will simply say 50/50 (correct
me if I'm wrong though).
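
To make the 100-iteration example concrete, here is a rough sketch. It
builds two outcome sequences with the same 50/50 taken frequency and counts
mispredictions under a textbook 2-bit saturating counter. That predictor is
only a toy model I'm assuming for illustration (real hardware predictors
are far more elaborate), and all class and method names are made up.

```java
import java.util.Random;

class BranchPatternSketch {
    // Fraction of iterations in which the branch is taken. This is all a
    // cumulative taken/not-taken profile can express.
    static double takenFraction(boolean[] outcomes) {
        int taken = 0;
        for (boolean t : outcomes) if (t) taken++;
        return (double) taken / outcomes.length;
    }

    // Mispredictions of a 2-bit saturating counter (states 0..3, predict
    // "taken" when state >= 2). Assumption: a toy model, not a real CPU.
    static int mispredictions(boolean[] outcomes) {
        int state = 2; // start at "weakly taken"
        int misses = 0;
        for (boolean t : outcomes) {
            boolean predictedTaken = state >= 2;
            if (predictedTaken != t) misses++;
            state = t ? Math.min(3, state + 1) : Math.max(0, state - 1);
        }
        return misses;
    }

    // First n/2 iterations take the jump, the rest do not.
    static boolean[] firstHalfTaken(int n) {
        boolean[] b = new boolean[n];
        for (int i = 0; i < n / 2; i++) b[i] = true;
        return b;
    }

    // Same number of taken branches, but in shuffled (Fisher-Yates) order.
    static boolean[] shuffledHalfTaken(int n, long seed) {
        boolean[] b = firstHalfTaken(n);
        Random r = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1);
            boolean tmp = b[i]; b[i] = b[j]; b[j] = tmp;
        }
        return b;
    }

    public static void main(String[] args) {
        boolean[] ordered = firstHalfTaken(100);
        boolean[] shuffled = shuffledHalfTaken(100, 42L);
        System.out.println("ordered:  fraction=" + takenFraction(ordered)
                + " misses=" + mispredictions(ordered));
        System.out.println("shuffled: fraction=" + takenFraction(shuffled)
                + " misses=" + mispredictions(shuffled));
    }
}
```

Both sequences report the same taken fraction, so a frequency-only profile
cannot tell them apart, yet the toy predictor misses only around the single
switch point in the ordered case.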

So it really seems that HotSpot can only use profiling to identify branches
that are taken (or not taken) with very high probability, and cannot say
much about other cases because it doesn't model prediction the same way the
CPU does (nor should it).

Given that, it seems like code gen should prefer to emit jumps almost all
the time.  There's another aspect to this: if HotSpot emits jumps and
profiling shows high branch misprediction, the developer can possibly
change their code to either remove branches or make them more predictable.
If cmov is emitted though, then there's nothing the dev can do (unless you
guys modify the Math.min/max intrinsic to use profile info).
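
As an illustration of "removing branches" at the source level, a developer
could write a max without a conditional, along these lines. This is only a
sketch: the class and method names are hypothetical, and I'm not claiming
the JIT compiles it to anything faster than a jump or a cmov.

```java
class BranchFreeMaxSketch {
    // Max of two ints without a conditional in the source.
    // The subtraction is widened to long so it cannot overflow.
    static int branchFreeMax(int a, int b) {
        long diff = (long) a - (long) b; // negative exactly when a < b
        long mask = diff >> 63;          // all ones if a < b, else zero
        return (int) (a - (diff & mask)); // a - (a - b) = b, or a - 0 = a
    }

    public static void main(String[] args) {
        int[] table = {3, -7, 42, Integer.MIN_VALUE, 42, Integer.MAX_VALUE, 0};
        int result = Integer.MIN_VALUE;
        // Same reduction shape as the benchmark loop quoted below.
        for (final int x : table) result = branchFreeMax(result, x);
        System.out.println(result == Integer.MAX_VALUE);
    }
}
```

With jumps emitted for the plain version, the dev can measure both forms
and pick one; with a forced cmov that choice goes away.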

Am I missing something in my reasoning?

Thanks

Sent from my phone
On Apr 1, 2014 7:46 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com>
wrote:

> I agree that we should take into account profiling information (if it is
> available) in LibraryCallKit::inline_min_max() as we do in
> PhaseIdealLoop::conditional_move(). At least the code will be consistent
> in both cases.
>
> We still need to define how to quantify "highly unpredictable".
>
> Thanks,
> Vladimir
>
> On 4/1/14 4:31 PM, Vitaly Davidovich wrote:
>
>> Thanks for putting the jmh code inline.
>>
>> Yes, I tend to agree with not forcing cmov in the intrinsic given modern
>> hardware (unless, of course, profiling via interpreter shows the branch
>> highly unpredictable).  Perhaps JIT should see if the min/max is
>> executed in a loop body, and if so, consider it predictable (and
>> generate jumps); if outside loop, it probably doesn't matter for perf
>> all that much whether it's cmov or jump.
>>
>> Sent from my phone
>>
>> On Apr 1, 2014 7:06 PM, "Martin Grajcar" <maaartinus at gmail.com> wrote:
>>
>>     Answered inline.
>>
>>     On Tue, Apr 1, 2014 at 11:58 PM, Vitaly Davidovich
>>     <vitalyd at gmail.com> wrote:
>>
>>         Apologies, meant to reply to the list.
>>
>>         Sent from my phone
>>
>>         On Apr 1, 2014 5:48 PM, "Vitaly Davidovich" <vitalyd at gmail.com>
>>         wrote:
>>
>>             I can't see the attachment (on my phone) but I'm guessing
>>             the jumps generated by manual code are highly predicted?
>>             What if you try it with an array of random values?
>>
>>
>>     The input array is random:
>>
>>          @Setup public void setUp() {
>>              final Random random = new Random();
>>              for (int i=0; i<table.length; ++i) table[i] = random.nextInt();
>>          }
>>
>>     The whole benchmark is this loop:
>>
>>          @GenerateMicroBenchmark public int intrinsic() {
>>              int result = Integer.MIN_VALUE;
>>              for (final int x : table) result = Math.max(result, x);
>>              return result;
>>          }
>>
>>     The values are random, but the branch gets more and more predictable
>>     as result approaches the real maximum.
>>
>>             I'm guessing the cmov based intrinsics only win on (a) cpus
>>             with poor branch prediction, (b) unpredictable branches, or
>>             (c) code with lots of branches clogging up branch history
>>             buffer.
>>
>>     Agreed, but my point was not to /force/ using cmov for Math.max when
>>     the compiler can do it anyway (though there are cases when it doesn't
>>     although it should, like
>>     http://stackoverflow.com/questions/19689214/strange-branching-performance).
>>
>>             Also, is compiler generating larger code when using jumps?
>>             If so, icache pressure could be an issue; I don't think a
>>             microbenchmark will capture that though.
>>
>>     I'd guess the code size is about the same. Anyway, this
>>     microbenchmark is really tiny.
>>
>>