RFR: 8266054: VectorAPI rotate operation optimization [v2]
Paul Sandoz
psandoz at openjdk.java.net
Fri Apr 30 15:48:03 UTC 2021
On Fri, 30 Apr 2021 12:59:34 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Current VectorAPI Java side implementation expresses rotateLeft and rotateRight operation using following operations:-
>>
>> vec1 = lanewise(VectorOperators.LSHL, n)
>> vec2 = lanewise(VectorOperators.LSHR, n)
>> res = lanewise(VectorOperations.OR, vec1 , vec2)
>>
>> This patch moves above handling from Java side to C2 compiler which facilitates dismantling the rotate operation if target ISA does not support a direct rotate instruction.
>>
>> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over long and integer type vectors. For other cases (i.e. sub-word type vectors or for targets which do not support direct rotate operations ) instruction sequence comprising of vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>>
>> Please find below the performance data for included JMH benchmark.
>> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascadelake Server)
>>
>> ``
>> Benchmark | (TESTSIZE) | Baseline (ops/min) | Withopt (ops/min) | Gain %
>> -- | -- | -- | -- | --
>> RotateBenchmark.testRotateLeftI | 64 | 6813.033 | 6936.507 | 1.81
>> RotateBenchmark.testRotateLeftI | 64 | 11549.524 | 12109.845 | 4.85
>> RotateBenchmark.testRotateLeftI | 64 | 17780.166 | 18180.083 | 2.25
>> RotateBenchmark.testRotateLeftI | 128 | 3355.631 | 3426.436 | 2.11
>> RotateBenchmark.testRotateLeftI | 128 | 5925.534 | 6096.71 | 2.89
>> RotateBenchmark.testRotateLeftI | 128 | 8847.645 | 8964.224 | 1.32
>> RotateBenchmark.testRotateLeftI | 256 | 1704.192 | 1779.964 | 4.45
>> RotateBenchmark.testRotateLeftI | 256 | 2919.158 | 3000.392 | 2.78
>> RotateBenchmark.testRotateLeftI | 256 | 4386.134 | 4514.211 | 2.92
>> RotateBenchmark.testRotateLeftL | 64 | 3419.193 | 3500.522 | 2.38
>> RotateBenchmark.testRotateLeftL | 64 | 5982.87 | 6056.589 | 1.23
>> RotateBenchmark.testRotateLeftL | 64 | 8829.172 | 8949.157 | 1.36
>> RotateBenchmark.testRotateLeftL | 128 | 1684.745 | 1784.49 | 5.92
>> RotateBenchmark.testRotateLeftL | 128 | 2942.226 | 2947.684 | 0.19
>> RotateBenchmark.testRotateLeftL | 128 | 4385.488 | 4554.113 | 3.85
>> RotateBenchmark.testRotateLeftL | 256 | 834.195 | 856.939 | 2.73
>> RotateBenchmark.testRotateLeftL | 256 | 1463.802 | 1551.051 | 5.96
>> RotateBenchmark.testRotateLeftL | 256 | 2187.005 | 2268.596 | 3.73
>> RotateBenchmark.testRotateRightI | 64 | 6676.132 | 7077.587 | 6.01
>> RotateBenchmark.testRotateRightI | 64 | 11452.825 | 11855.45 | 3.52
>> RotateBenchmark.testRotateRightI | 64 | 17507.286 | 18164.757 | 3.76
>> RotateBenchmark.testRotateRightI | 128 | 3412.394 | 3519.829 | 3.15
>> RotateBenchmark.testRotateRightI | 128 | 5905.677 | 5875.698 | -0.51
>> RotateBenchmark.testRotateRightI | 128 | 8745.95 | 8890.757 | 1.66
>> RotateBenchmark.testRotateRightI | 256 | 1681.176 | 1708.54 | 1.63
>> RotateBenchmark.testRotateRightI | 256 | 3004.008 | 3005.606 | 0.05
>> RotateBenchmark.testRotateRightI | 256 | 4466.633 | 4548.232 | 1.83
>> RotateBenchmark.testRotateRightL | 64 | 3361.499 | 3461.121 | 2.96
>> RotateBenchmark.testRotateRightL | 64 | 5873.37 | 6072.209 | 3.39
>> RotateBenchmark.testRotateRightL | 64 | 8820.284 | 9016.172 | 2.22
>> RotateBenchmark.testRotateRightL | 128 | 1715.556 | 1720.67 | 0.30
>> RotateBenchmark.testRotateRightL | 128 | 2957.337 | 3149.193 | 6.49
>> RotateBenchmark.testRotateRightL | 128 | 4440.468 | 4473.221 | 0.74
>> RotateBenchmark.testRotateRightL | 256 | 851.391 | 875.371 | 2.82
>> RotateBenchmark.testRotateRightL | 256 | 1452.422 | 1522.942 | 4.86
>> RotateBenchmark.testRotateRightL | 256 | 2208.454 | 2263.349 | 2.49
>>
>> ``
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - 8266054: Review comments resolution.
> - Merge http://github.com/openjdk/jdk into JDK-8266054
> - 8266054: Changing gen-src.sh file permissions
> - 8266054: VectorAPI rotate operation optimization
src/hotspot/cpu/x86/x86.ad line 1652:
> 1650: case Op_RotateRightV:
> 1651: case Op_RotateLeftV:
> 1652: if (is_subword_type(bt)) {
Does that have the effect of not intrinsifying for `byte` or `short`?
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 728:
> 726: case VECTOR_OP_URSHIFT: return (v0, v1) ->
> 727: v0.bOp(v1, (i, a, n) -> ($type$)((a & LSHR_SETUP_MASK) >>> n));
> 728: #if[long]
I recommend you create new methods in `IntVector` etc called `rotateLeft` and `rotateRight` that do what is in the lambda expressions. Then you can collapse this to non-conditional cases calling those methods.
Do the same for the tests (like i did with the unsigned support), see
https://github.com/openjdk/jdk/blob/a31777959abc7cac1bf6d2a453d6fc15b628d866/test/jdk/jdk/incubator/vector/templates/Unit-header.template#L1429
and
https://github.com/openjdk/jdk/blob/a31777959abc7cac1bf6d2a453d6fc15b628d866/test/jdk/jdk/incubator/vector/gen-template.sh#L491
That will avoid the embedding of complex expressions.
test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 37:
> 35:
> 36: @Param({"64","128","256"})
> 37: public int TESTSIZE;
Suggestion:
int size;
Lower case for instance field names (same applies to the species).
No need for `public`.
test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 55:
> 53:
> 54: public final long[] specialValsL = {0L, -0L, Long.MIN_VALUE, Long.MAX_VALUE};
> 55: public final int[] specialValsI = {0, -0, Integer.MIN_VALUE, Integer.MAX_VALUE};
Suggestion:
static final long[] specialValsL = {0L, -0L, Long.MIN_VALUE, Long.MAX_VALUE};
static final int[] specialValsI = {0, -0, Integer.MIN_VALUE, Integer.MAX_VALUE};
test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 83:
> 81: @Benchmark
> 82: public void testRotateLeftI(Blackhole bh) {
> 83: for(int i = 0 ; i < 10000; i++) {
No need for the outer loop. JMH will do that for you.
test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 85:
> 83: for(int i = 0 ; i < 10000; i++) {
> 84: for (int j = 0 ; j < TESTSIZE; j+= ISPECIES.length()) {
> 85: vecI = IntVector.fromArray(ISPECIES, inpI, j);
Suggestion:
var vecI = IntVector.fromArray(ISPECIES, inpI, j);
Use a local variable instead of storing to a field
test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 91:
> 89: vecI = vecI.lanewise(VectorOperators.ROL, i);
> 90: vecI.lanewise(VectorOperators.ROL, i).intoArray(resI, j);
> 91: }
Suggestion:
}
return resI;
return the result to better ensure HotSpot does not detect the result is unused.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3720
More information about the core-libs-dev
mailing list