RFR: 8266054: VectorAPI rotate operation optimization [v2]

Paul Sandoz psandoz at openjdk.java.net
Fri Apr 30 15:48:03 UTC 2021


On Fri, 30 Apr 2021 12:59:34 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:-
>> 
>>     vec1 = lanewise(VectorOperators.LSHL, n)
>>     vec2 = lanewise(VectorOperators.LSHR, n)
>>     res = lanewise(VectorOperations.OR, vec1 , vec2)
>> 
>> This patch moves above handling from Java side to C2 compiler which facilitates dismantling the rotate operation if target ISA does not support a direct rotate instruction.
>> 
>> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or for targets which do not support direct rotate operations )   instruction sequence comprising of vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>> 
>> Please find below the performance data for included JMH benchmark.
>> Machine:  Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascadelake Server)
>> 
>> ``
>> Benchmark | (TESTSIZE) | Baseline (ops/min) | Withopt (ops/min) | Gain %
>> -- | -- | -- | -- | --
>> RotateBenchmark.testRotateLeftI | 64 | 6813.033 | 6936.507 | 1.81
>> RotateBenchmark.testRotateLeftI | 64 | 11549.524 | 12109.845 | 4.85
>> RotateBenchmark.testRotateLeftI | 64 | 17780.166 | 18180.083 | 2.25
>> RotateBenchmark.testRotateLeftI | 128 | 3355.631 | 3426.436 | 2.11
>> RotateBenchmark.testRotateLeftI | 128 | 5925.534 | 6096.71 | 2.89
>> RotateBenchmark.testRotateLeftI | 128 | 8847.645 | 8964.224 | 1.32
>> RotateBenchmark.testRotateLeftI | 256 | 1704.192 | 1779.964 | 4.45
>> RotateBenchmark.testRotateLeftI | 256 | 2919.158 | 3000.392 | 2.78
>> RotateBenchmark.testRotateLeftI | 256 | 4386.134 | 4514.211 | 2.92
>> RotateBenchmark.testRotateLeftL | 64 | 3419.193 | 3500.522 | 2.38
>> RotateBenchmark.testRotateLeftL | 64 | 5982.87 | 6056.589 | 1.23
>> RotateBenchmark.testRotateLeftL | 64 | 8829.172 | 8949.157 | 1.36
>> RotateBenchmark.testRotateLeftL | 128 | 1684.745 | 1784.49 | 5.92
>> RotateBenchmark.testRotateLeftL | 128 | 2942.226 | 2947.684 | 0.19
>> RotateBenchmark.testRotateLeftL | 128 | 4385.488 | 4554.113 | 3.85
>> RotateBenchmark.testRotateLeftL | 256 | 834.195 | 856.939 | 2.73
>> RotateBenchmark.testRotateLeftL | 256 | 1463.802 | 1551.051 | 5.96
>> RotateBenchmark.testRotateLeftL | 256 | 2187.005 | 2268.596 | 3.73
>> RotateBenchmark.testRotateRightI | 64 | 6676.132 | 7077.587 | 6.01
>> RotateBenchmark.testRotateRightI | 64 | 11452.825 | 11855.45 | 3.52
>> RotateBenchmark.testRotateRightI | 64 | 17507.286 | 18164.757 | 3.76
>> RotateBenchmark.testRotateRightI | 128 | 3412.394 | 3519.829 | 3.15
>> RotateBenchmark.testRotateRightI | 128 | 5905.677 | 5875.698 | -0.51
>> RotateBenchmark.testRotateRightI | 128 | 8745.95 | 8890.757 | 1.66
>> RotateBenchmark.testRotateRightI | 256 | 1681.176 | 1708.54 | 1.63
>> RotateBenchmark.testRotateRightI | 256 | 3004.008 | 3005.606 | 0.05
>> RotateBenchmark.testRotateRightI | 256 | 4466.633 | 4548.232 | 1.83
>> RotateBenchmark.testRotateRightL | 64 | 3361.499 | 3461.121 | 2.96
>> RotateBenchmark.testRotateRightL | 64 | 5873.37 | 6072.209 | 3.39
>> RotateBenchmark.testRotateRightL | 64 | 8820.284 | 9016.172 | 2.22
>> RotateBenchmark.testRotateRightL | 128 | 1715.556 | 1720.67 | 0.30
>> RotateBenchmark.testRotateRightL | 128 | 2957.337 | 3149.193 | 6.49
>> RotateBenchmark.testRotateRightL | 128 | 4440.468 | 4473.221 | 0.74
>> RotateBenchmark.testRotateRightL | 256 | 851.391 | 875.371 | 2.82
>> RotateBenchmark.testRotateRightL | 256 | 1452.422 | 1522.942 | 4.86
>> RotateBenchmark.testRotateRightL | 256 | 2208.454 | 2263.349 | 2.49
>> 
>>  ``
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - 8266054: Review comments resolution.
>  - Merge http://github.com/openjdk/jdk into JDK-8266054
>  - 8266054: Changing gen-src.sh file permissions
>  - 8266054: VectorAPI rotate operation optimization

src/hotspot/cpu/x86/x86.ad line 1652:

> 1650:     case Op_RotateRightV:
> 1651:     case Op_RotateLeftV:
> 1652:       if (is_subword_type(bt)) {

Does that have the effect of not intrinsifying for `byte` or `short`?

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 728:

> 726:                 case VECTOR_OP_URSHIFT: return (v0, v1) ->
> 727:                         v0.bOp(v1, (i, a, n) -> ($type$)((a & LSHR_SETUP_MASK) >>> n));
> 728: #if[long]

I recommend you create new methods in `IntVector` etc called `rotateLeft` and `rotateRight` that do what is in the lambda expressions. Then you can collapse this to non-conditional cases calling those methods.

Do the same for the tests (like i did with the unsigned support), see

https://github.com/openjdk/jdk/blob/a31777959abc7cac1bf6d2a453d6fc15b628d866/test/jdk/jdk/incubator/vector/templates/Unit-header.template#L1429

and

https://github.com/openjdk/jdk/blob/a31777959abc7cac1bf6d2a453d6fc15b628d866/test/jdk/jdk/incubator/vector/gen-template.sh#L491

That will avoid the embedding of complex expressions.

test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 37:

> 35: 
> 36:   @Param({"64","128","256"})
> 37:   public int TESTSIZE;

Suggestion:

  int size;

Lower case for instance field names (same applies to the species).
No need for `public`.

test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 55:

> 53: 
> 54:   public final long[] specialValsL = {0L, -0L, Long.MIN_VALUE, Long.MAX_VALUE};
> 55:   public final int[] specialValsI = {0, -0, Integer.MIN_VALUE, Integer.MAX_VALUE};

Suggestion:

  static final long[] specialValsL = {0L, -0L, Long.MIN_VALUE, Long.MAX_VALUE};
  static final int[] specialValsI = {0, -0, Integer.MIN_VALUE, Integer.MAX_VALUE};

test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 83:

> 81:   @Benchmark
> 82:   public void testRotateLeftI(Blackhole bh) {
> 83:     for(int i = 0 ; i < 10000; i++) {

No need for the outer loop. JMH will do that for you.

test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 85:

> 83:     for(int i = 0 ; i < 10000; i++) {
> 84:       for (int j = 0 ; j < TESTSIZE; j+= ISPECIES.length()) {
> 85:         vecI = IntVector.fromArray(ISPECIES, inpI, j);

Suggestion:

        var vecI = IntVector.fromArray(ISPECIES, inpI, j);

Use a local variable instead of storing to a field

test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 91:

> 89:         vecI = vecI.lanewise(VectorOperators.ROL, i);
> 90:         vecI.lanewise(VectorOperators.ROL, i).intoArray(resI, j);
> 91:       }

Suggestion:

      }
      return resI;

return the result to better ensure HotSpot does not detect the result is unused.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3720


More information about the core-libs-dev mailing list