<div dir="ltr"><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Hi,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">While testing the min/max long intrinsic as part of [1], I encountered an oddity when benchmarking Math.min(II) called inside a loop.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br>As part of that PR I was trying to measure potential regressions that adding this intrinsic can cause when the code is not vectorized. To test this:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">1. I emulated non-vectorized code by passing -XX:-UseSuperWord.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">2. I emulated running with and without the intrinsic by disabling the minL/maxL intrinsic, which lets me easily compare results with and without the changes in my PR.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br>For comparison I tested with both longs and ints. 
The test is:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">    public int intReductionSimpleMin(LoopState state) {<br>        int result = 0;<br>        for (int i = 0; i < state.size; i++) {<br>            final int v = state.minIntA[i];<br>            result = Math.min(result, v);<br>        }<br>        return result;<br>    }<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">The results can sometimes look like this:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Benchmark                              (probability)  (size)   Mode  Cnt  -min/-max  +min/+max   Units<br>MinMaxVector.intReductionSimpleMin               100    2048  thrpt    4    460.530    460.490  ops/ms (2)<br>MinMaxVector.longReductionSimpleMin              100    2048  thrpt    4    959.507    459.197  ops/ms (-52%)<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div><br clear="all"></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">The probability is a way to control the branchiness of Math.min. 
With the value of 100 shown above, the benchmark fills the `state.minIntA` array so that the branch goes the same way on every iteration.</div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">The odd thing is that, on some runs, when the code is scalar and the min intrinsic is disabled, Math.min(II) is a lot slower than Math.max(II) or Math.min(JJ).</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Running with perfasm, I observed that on slow runs the Math.min(II) version uses cmov instead of cmp+mov. When the branch is this one-sided, the cmov version is much slower.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">This is odd because, with the array data arranged so that one side of the branch is always taken, one would not expect cmov to be used:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```<br>Node *PhaseIdealLoop::conditional_move( Node *region ) {<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">...</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">  // Check for highly predictable branch.  
No point in CMOV'ing if<br>  // we are going to predict accurately all the time.<br>  if (C->use_cmove() && (cmp_op == Op_CmpF || cmp_op == Op_CmpD)) {<br>    //keep going<br>  } else if (iff->_prob < infrequent_prob ||<br>      iff->_prob > (1.0f - infrequent_prob))<br>    return nullptr;<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Looking at the PrintMethodData output for the slow runs shows this:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">static java.lang.Math::min(II)I<br>  interpreter_invocation_count:       18171<br>  invocation_counter:                 18171<br>...</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">  0    bci: 2    BranchData         taken(7732) displacement(56)<br>                                    not taken(10180)<br>...<br>org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMin(Lorg/openjdk/bench/java/lang/MinMaxVector$LoopState;)I<br>...</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">  23 invokestatic 32 &lt;java/lang/Math.min(II)I&gt;<br>  32   bci: 23   CounterData        count(192512)...</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```<br></div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">There are two odd things here. First, the invocation counter for Math.min is way lower than the number of times the benchmark invokes Math.min. Second, the taken/not taken split is nowhere near 100% either way. 
I verified that the data in the array was correct.</div></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">What has happened is that Math.min was compiled before the benchmark code ran, using a branch profile different from the one the test expects. That causes Math.min(II) to use cmov in this scenario.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">So, where are these other Math.min invocations coming from? Looking at the PrintMethodData output, the majority of them seem to come from the Java serialization layer that JMH forked processes depend on:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">static java.util.Arrays::copyOfRange([BII)[B<br>  73 invokestatic 304 &lt;java/lang/Math.min(II)I&gt;<br>  416  bci: 73   CounterData        count(6878)<br><br>java.io.ObjectOutputStream$BlockDataOutputStream::write([BIIZ)V<br>  107 invokestatic 64 &lt;java/lang/Math.min(II)I&gt;<br>  488  bci: 107  CounterData        count(3611)<br><br>sun.nio.ch.NioSocketImpl::write([BII)V<br>  41 invokestatic 255 &lt;java/lang/Math.min(II)I&gt;<br>  128  bci: 41   CounterData        count(3623)<br><br>sun.nio.cs.UTF_8$Encoder::encodeArrayLoop(Ljava/nio/CharBuffer;Ljava/nio/ByteBuffer;)Ljava/nio/charset/CoderResult;<br>  75 invokestatic 62 &lt;java/lang/Math.min(II)I&gt;<br>  480  bci: 75   CounterData        
count(3599)<br><br>sun.nio.cs.StreamEncoder::growByteBufferIfNeeded(I)V<br>  34 invokestatic 252 &lt;java/lang/Math.min(II)I&gt;<br>  144  bci: 34   CounterData        count(3597)<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">```</div></div><div dir="ltr"><br></div><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Although this test setup is not typical (it has the Math.min intrinsic disabled), anyone benchmarking Math.min(II) could see odd results because of unintended profile pollution from other code that runs in the forked process.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Is there a way to avoid this issue? Can JMH somehow instruct HotSpot to deopt Math.min(II) before the warmup phase of the benchmark runs, to avoid the pollution? Any other ideas?</div></div><div dir="ltr"><br></div><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks</div></div><div dir="ltr">Galder<span class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"></span></div></div><div dir="ltr"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></span></div><div dir="ltr"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">[1] <a href="https://github.com/openjdk/jdk/pull/20098">https://github.com/openjdk/jdk/pull/20098</a></span></div></div></div></div></div>