RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11]

Galder Zamarreño galder at openjdk.org
Mon Feb 10 09:29:20 UTC 2025


On Fri, 7 Feb 2025 12:27:42 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> At 100% probability baseline fails to vectorize because it observes a control flow. This control flow is not the one you see in min/max implementations, but this is one added by HotSpot as a result of the JIT profiling. It observes that one branch is always taken so it optimizes for that, and adds a branch for the uncommon case where the branch is not taken.

I've dug further into this to try to understand how the baseline HotSpot code works, and the explanation above is not entirely correct. Let's look at the IR differences between, say, the 100% and 80% branch situations.

At branch 80% you see:

 1115  CountedLoop  === 1115 598 463  [[ 1101 1115 1116 1118 451 594 ]] inner stride: 2 main of N1115 strip mined !orig=[599],[590],[307] !jvms: MinMaxVector::longLoopMax @ bci:10 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)

  692  LoadL  === 1083 1101 393  [[ 747 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[395] !jvms: MinMaxVector::longLoopMax @ bci:26 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  651  LoadL  === 1095 1101 355  [[ 747 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[357] !jvms: MinMaxVector::longLoopMax @ bci:20 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  747  MaxL  === _ 651 692  [[ 451 ]]  !orig=[608],[416] !jvms: Math::max @ bci:11 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)

  451  StoreL  === 1115 1101 449 747  [[ 1116 454 911 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9;  Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=9; !orig=1124 !jvms: MinMaxVector::longLoopMax @ bci:30 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)

  594  CountedLoopEnd  === 1115 593  [[ 1123 463 ]] [lt] P=0.999731, C=780799.000000 !orig=[462] !jvms: MinMaxVector::longLoopMax @ bci:7 (line 235) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)


You see the counted loop with the LoadL nodes for the array loads and the MaxL node consuming them. The StoreL is for the array assignment (I think).

At branch 100% you see:


  650  LoadL  === 1105 1119 355  [[ 416 408 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[357] !jvms: MinMaxVector::longLoopMax @ bci:20 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  691  LoadL  === 1093 1119 393  [[ 416 408 ]]  @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[395] !jvms: MinMaxVector::longLoopMax @ bci:26 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  408  CmpL  === _ 650 691  [[ 409 ]]  !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  409  Bool  === _ 408  [[ 410 ]] [lt] !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  410  If  === 1132 409  [[ 411 412 ]] P=0.019892, C=79127.000000 !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  411  IfTrue  === 410  [[ 415 ]] #1 !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  412  IfFalse  === 410  [[ 415 ]] #0 !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
  415  Region  === 415 411 412  [[ 415 594 416 451 ]]  !orig=[423] !jvms: Math::max @ bci:11 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)

  594  CountedLoopEnd  === 415 593  [[ 1139 463 ]] [lt] P=0.999683, C=706030.000000 !orig=[462] !jvms: MinMaxVector::longLoopMax @ bci:7 (line 235) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)


You see a region within the counted loop with the if/else which belongs to the actual `Math.max` implementation, with the corresponding CmpL and the LoadL nodes for retrieving the longs from the arrays.
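
For readers less used to C2 IR dumps, the shape above can be pictured as a loop whose body still contains the compare-and-branch from `Math.max(long, long)`, which is `(a >= b) ? a : b`. The following is only a C++ analogy of that shape, not the benchmark code: `max_branchy` and `loop_max` are hypothetical names, and the assumption that the loop stores `max(a[i], b[i])` into a result array is inferred from the LoadL/StoreL nodes.

```c++
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the source shape of Math.max(long, long): a compare followed by
// real control flow (the CmpL/Bool/If/Region nodes in the 100% dump).
inline std::int64_t max_branchy(std::int64_t a, std::int64_t b) {
    if (a >= b) {
        return a;
    }
    return b;
}

// Hypothetical stand-in for the benchmark loop: two loads, one max, one store
// per iteration. Only the common prefix of the two inputs is processed.
std::vector<std::int64_t> loop_max(const std::vector<std::int64_t>& a,
                                   const std::vector<std::int64_t>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    std::vector<std::int64_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = max_branchy(a[i], b[i]);
    }
    return out;
}
```

In the 80% dump the same body has no If/Region left, only a single MaxL node between the loads and the store.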

What causes the difference? It's this section in `PhaseIdealLoop::conditional_move`:

```c++
  // Check for highly predictable branch.  No point in CMOV'ing if
  // we are going to predict accurately all the time.
  if (C->use_cmove() && (cmp_op == Op_CmpF || cmp_op == Op_CmpD)) {
    //keep going
  } else if (iff->_prob < infrequent_prob ||
      iff->_prob > (1.0f - infrequent_prob))
    return nullptr;
```


At branch 100, `iff->_prob > (1.0f - infrequent_prob)` is true, so no CMoveL is created and HotSpot seems to stick to the original bytecode implementation of `Math.max`. At branch 80 the probability is below that threshold, so a CMoveL is created, which eventually gets converted into a MaxL node, and vectorization kicks in. The numbers are interesting. `infrequent_prob` appears to be a fixed number, `0.181818187`, and `1.0f` minus that is `0.818181812`. At branch 100 `iff->_prob` is `0.906792104`, which is higher than `0.818181812`; at branch 80 it is `0.718619287`, which is lower. I would have expected those `iff->_prob` values to be closer to the branch % targets I set, but ignoring that, it seems like ~90% would be the cut-off.
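
To make the arithmetic concrete, here is a minimal sketch of just the probability part of that check, fed with the values I observed. `too_predictable_for_cmove` and `kInfrequentProb` are hypothetical names for illustration, not HotSpot identifiers, and the sketch ignores the `CmpF`/`CmpD` special case handled by the first `if`.

```c++
// Value of infrequent_prob as observed for this benchmark (not derived here).
constexpr float kInfrequentProb = 0.181818187f;

// Hypothetical helper: true when the quoted condition would make
// conditional_move return nullptr, i.e. the branch is considered so
// predictable, in either direction, that no CMove is created.
constexpr bool too_predictable_for_cmove(float prob,
                                         float infrequent_prob = kInfrequentProb) {
    return prob < infrequent_prob || prob > (1.0f - infrequent_prob);
}

// Branch 100: observed iff->_prob, above 0.818181812, so no CMoveL.
static_assert(too_predictable_for_cmove(0.906792104f), "branch 100 bails out");
// Branch 80: observed iff->_prob, inside the band, so a CMoveL is created.
static_assert(!too_predictable_for_cmove(0.718619287f), "branch 80 gets a CMove");
```

The check is symmetric, so a probability below `0.181818187` bails out in the same way as one above `0.818181812`.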

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2647410266

