[vectorIntrinsics] RFR: 8244490: [vector] Move Vector API micro benchmarks under test/micro

Marcus G K Williams mgkwill at openjdk.java.net
Tue Jun 8 16:02:34 UTC 2021


On Thu, 6 May 2021 17:54:37 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Vector API micro benchmarks are currently located under test/jdk/jdk/incubator/vector/benchmark, which makes them effectively "dead" code that cannot be built or run.
>> 
>> The proper location for micro benchmarks is actually test/micro/ directory.
>> It would be nice to move Vector API benchmarks there so they can be built automatically as part of the 'test-image' make target.
>> Once they are built they can be run as
>>  make run-test TEST=micro:BENCHMARK_TEST_NAME
>
> I looked in more detail; there are some immediate issues, and I think an architectural issue with the split.
> 
> The more immediate issues are:
> 
> 1. Some benchmark tests fail to build via `make`. Some are related to bad ASCII characters in comments, some due to compiler warnings, and others due to external dependencies in the `pom.xml` (namely `org.junit.jupiter:junit-jupiter-api`). I think all those are fixable. (See https://openjdk.java.net/groups/build/doc/testing.html for configuring, compiling, and executing JMH tests.)
> 2. The structure under `test/micro` is misleading and does not follow the directory-to-package-name convention. This is also fixable. I think we need to remove the maven project and merge it in under `test/micro/org/openjdk/bench/jdk/incubator/vector`, placing the non-generated benchmark files directly under that directory (and fixing the ones that fail to compile). Generated benchmarks could be placed under, say, `operation`.
> 
> Initially I thought it was fine that generation scripts were split between unit tests and performance tests, but architecturally, I now think it is better to keep one set of bash scripts for generated code, maintained under the `vector` module. These are complex and splitting will cause divergence. These scripts can generate the `operation` performance tests under the `test/micro` directory at the appropriate location.

Thanks again for your review @PaulSandoz. I set this aside until now due to priority work for the JDK 17 freeze.

Updated per review. 

1. Reverted original draft refactor, to keep perf. test generation code in original location.
2. Changed location of generated perf. tests to `test/micro/org/openjdk/bench/jdk/incubator/vector/operation` 
3. Moved non-generated tests and respective directories to `test/micro/org/openjdk/bench/jdk/incubator/vector/`.

For example:
`test/micro/org/openjdk/bench/jdk/incubator/vector/bigdata`
`test/micro/org/openjdk/bench/jdk/incubator/vector/crypto`
`test/micro/org/openjdk/bench/jdk/incubator/vector/utf8`

4. Fixed broken tests; all tests now build.
  4a. Fixed comments that used invalid characters
  4b. Refactored to remove junit dependency
  4c. Removed `static` from some `@Benchmark` methods to fix the `static method should be qualified by type name, VectorDistance, instead of by an expression` compile warning.

5. Updated package location of all moved files and generated files to `org.openjdk.bench.jdk.incubator.vector.<dir name>`
6. Removed pom.xml and other no longer needed files.
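
To illustrate the warning from item 4c, here is a minimal sketch of what javac reports under `-Xlint:static`. The class and method names (`StaticQualifierDemo`, `Distance`, `l2Squared`) are hypothetical stand-ins, not the actual benchmark code. My assumption is that the JMH-generated stubs invoke benchmark methods through an instance reference, which would explain why making the methods non-static silences the warning.

```java
public class StaticQualifierDemo {

    // Hypothetical stand-in for a benchmark class such as VectorDistance.
    static final class Distance {
        // A static helper belongs to the type, not to any one instance.
        static float l2Squared(float[] a, float[] b) {
            float sum = 0f;
            for (int i = 0; i < a.length; i++) {
                float d = a[i] - b[i];
                sum += d * d;
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f};
        float[] b = {3f, 5f};

        Distance instance = new Distance();
        // Legal Java, but with -Xlint:static javac warns that the static method
        // should be qualified by the type name instead of by an expression.
        float viaExpression = instance.l2Squared(a, b);

        // Qualifying by type name avoids the warning.
        float viaType = Distance.l2Squared(a, b);

        System.out.println(viaExpression == viaType); // prints true
    }
}
```

Both calls resolve to the same method; only the way the call is written differs, so the fix is purely about how the call site (here, presumably the generated one) refers to the method.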

I can now build and run all tests using:
`make test TEST="micro:org.openjdk.bench.jdk.incubator.vector" MICRO="FORK=2;WARMUP_ITER=5;" CONF=linux-x86_64-server-release`


# Run complete. Total time: 04:40:06

Benchmark                                                               (ARRAY_LENGTH)  (dataSize)  (maxBytes)  (size)   Mode  Cnt       Score        Error   Units
o.o.b.j.i.v.bigdata.BooleanArrayCheck.filterAll                                   1024         N/A         N/A     N/A  thrpt    5    3884.266 ?    110.491  ops/ms
o.o.b.j.i.v.bigdata.BooleanArrayCheck.filterAll_vec                               1024         N/A         N/A     N/A  thrpt    5   16060.256 ?   5485.055  ops/ms
o.o.b.j.i.v.bigdata.ValueRangeCheckAndCastL2I.castL2I                             1024         N/A         N/A     N/A  thrpt    5    1470.185 ?     87.374  ops/ms
o.o.b.j.i.v.bigdata.ValueRangeCheckAndCastL2I.castL2I_vec                         1024         N/A         N/A     N/A  thrpt    5    1725.998 ?    763.099  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedScalarDouble                N/A         N/A         N/A     N/A  thrpt    5    3466.435 ?     61.524  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedScalarFloat                 N/A         N/A         N/A     N/A  thrpt    5    2949.337 ?    167.843  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorDouble128             N/A         N/A         N/A     N/A  thrpt    5    3831.298 ?    670.538  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorDouble256             N/A         N/A         N/A     N/A  thrpt    5    6916.338 ?   1240.536  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorDoubleMax             N/A         N/A         N/A     N/A  thrpt    5    6922.271 ?   1193.232  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorFloat128              N/A         N/A         N/A     N/A  thrpt    5    7667.293 ?   1627.940  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorFloat256              N/A         N/A         N/A     N/A  thrpt    5   12881.137 ?   2561.933  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorFloatMax              N/A         N/A         N/A     N/A  thrpt    5   12870.951 ?   2536.147  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilScalarDouble                         N/A         N/A         N/A     N/A  thrpt    5    3357.433 ?     77.093  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilScalarFloat                          N/A         N/A         N/A     N/A  thrpt    5    2812.819 ?    321.380  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorDouble128                      N/A         N/A         N/A     N/A  thrpt    5    3834.837 ?    865.640  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorDouble256                      N/A         N/A         N/A     N/A  thrpt    5    6894.900 ?   1743.191  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorDoubleMax                      N/A         N/A         N/A     N/A  thrpt    5    6906.721 ?   1718.390  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorFloat128                       N/A         N/A         N/A     N/A  thrpt    5    7443.532 ?   1897.726  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorFloat256                       N/A         N/A         N/A     N/A  thrpt    5   12136.630 ?   2997.062  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorFloatMax                       N/A         N/A         N/A     N/A  thrpt    5   12155.037 ?   2943.135  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredScalar                                 N/A         N/A         N/A     N/A  thrpt    5    3615.867 ?    188.036  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorDouble128                        N/A         N/A         N/A     N/A  thrpt    5    4036.363 ?   1023.672  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorDouble256                        N/A         N/A         N/A     N/A  thrpt    5    7766.613 ?   2090.510  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorDoubleMax                        N/A         N/A         N/A     N/A  thrpt    5    7758.038 ?   2196.819  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorFloat128                         N/A         N/A         N/A     N/A  thrpt    5    8603.290 ?   2372.100  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorFloat256                         N/A         N/A         N/A     N/A  thrpt    5   17002.021 ?   5948.044  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorFloatMax                         N/A         N/A         N/A     N/A  thrpt    5   17027.964 ?   5759.068  ops/ms
o.o.b.j.i.v.crypto.ChaChaBench.encrypt128                                          N/A       16384         N/A     N/A  thrpt    8   21730.541 ?   5043.050   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt128                                          N/A       65536         N/A     N/A  thrpt    8    5496.309 ?   1245.802   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt256                                          N/A       16384         N/A     N/A  thrpt    8   40332.908 ?  12166.064   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt256                                          N/A       65536         N/A     N/A  thrpt    8   10304.809 ?   2968.075   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt512                                          N/A       16384         N/A     N/A  thrpt    8     285.604 ?     82.992   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt512                                          N/A       65536         N/A     N/A  thrpt    8      58.590 ?     16.202   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth128                                           N/A       16384         N/A     N/A  thrpt    8     201.353 ?     67.948   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth128                                           N/A       65536         N/A     N/A  thrpt    8      49.908 ?     15.856   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth256                                           N/A       16384         N/A     N/A  thrpt    8     364.051 ?    108.472   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth256                                           N/A       65536         N/A     N/A  thrpt    8      99.809 ?     33.255   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth512                                           N/A       16384         N/A     N/A  thrpt    8     472.031 ?    149.266   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth512                                           N/A       65536         N/A     N/A  thrpt    8     120.617 ?     35.399   ops/s
o.o.b.j.i.v.operation.Byte128Vector.ABS                                            N/A         N/A         N/A    1024  thrpt    5   16498.980 ?   3687.013  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ABSMasked                                      N/A         N/A         N/A    1024  thrpt    5   13420.584 ?   4553.559  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADD                                            N/A         N/A         N/A    1024  thrpt    5   14389.853 ?   2559.982  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADDLanes                                       N/A         N/A         N/A    1024  thrpt    5    5408.304 ?   1057.194  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADDMasked                                      N/A         N/A         N/A    1024  thrpt    5   12571.391 ?   3170.842  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADDMaskedLanes                                 N/A         N/A         N/A    1024  thrpt    5    4300.352 ?    911.484  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.AND                                            N/A         N/A         N/A    1024  thrpt    5   14197.257 ?   2830.585  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ANDLanes                                       N/A         N/A         N/A    1024  thrpt    5    5437.373 ?    607.144  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ANDMasked                                      N/A         N/A         N/A    1024  thrpt    5   12535.445 ?   3085.079  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ANDMaskedLanes                                 N/A         N/A         N/A    1024  thrpt    5    4251.096 ?   1278.371  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.AND_NOT                                        N/A         N/A         N/A    1024  thrpt    5   11770.249 ?   2634.626  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.AND_NOTMasked                                  N/A         N/A         N/A    1024  thrpt    5   10638.213 ?   3156.234  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHR                                           N/A         N/A         N/A    1024  thrpt    5    2698.630 ?    866.569  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHRMasked                                     N/A         N/A         N/A    1024  thrpt    5    2438.774 ?    716.473  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHRMaskedShift                                N/A         N/A         N/A    1024  thrpt    5    4884.817 ?   1014.613  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHRShift                                      N/A         N/A         N/A    1024  thrpt    5    6436.859 ?    990.331  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.BITWISE_BLEND                                  N/A         N/A         N/A    1024  thrpt    5   11348.187 ?   3386.170  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.BITWISE_BLENDMasked                            N/A         N/A         N/A    1024  thrpt    5    9926.890 ?   4263.756  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.DIV                                            N/A         N/A         N/A    1024  thrpt    5     169.706 ?    165.121  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.DIVMasked                                      N/A         N/A         N/A    1024  thrpt    5      97.473 ?     77.981  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.EQ                                             N/A         N/A         N/A    1024  thrpt    5    1486.251 ?   2582.369  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.FIRST_NONZERO                                  N/A         N/A         N/A    1024  thrpt    5     491.076 ?    582.534  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.FIRST_NONZEROMasked                            N/A         N/A         N/A    1024  thrpt    5     396.484 ?    535.547  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.GE                                             N/A         N/A         N/A    1024  thrpt    5    1455.821 ?   2516.670  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.GT                                             N/A         N/A         N/A    1024  thrpt    5    1496.344 ?   2613.716  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.IS_DEFAULT                                     N/A         N/A         N/A    1024  thrpt    5    1447.973 ?   2375.492  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.IS_NEGATIVE                                    N/A         N/A         N/A    1024  thrpt    5    1429.487 ?   2438.604  ops/ms

...

o.o.b.j.i.v.operation.ShortScalar.scatter256                                       N/A         N/A         N/A    1024  thrpt    5    1348.385 ?    140.870  ops/ms
o.o.b.j.i.v.operation.ShortScalar.scatter512                                       N/A         N/A         N/A    1024  thrpt    5    1444.297 ?    101.135  ops/ms
o.o.b.j.i.v.operation.ShortScalar.scatterBase0                                     N/A         N/A         N/A    1024  thrpt    5    1682.814 ?     62.974  ops/ms
o.o.b.j.i.v.operation.ShortScalar.zero                                             N/A         N/A         N/A    1024  thrpt    5   24059.679 ?   2452.079  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI128                                    N/A         N/A         N/A      64  thrpt    5     124.008 ?    167.291  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI128                                    N/A         N/A         N/A    1024  thrpt    5       8.291 ?     11.679  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI128                                    N/A         N/A         N/A   65536  thrpt    5       0.132 ?      0.194  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI256                                    N/A         N/A         N/A      64  thrpt    5     104.170 ?    149.013  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI256                                    N/A         N/A         N/A    1024  thrpt    5       5.604 ?      9.367  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI256                                    N/A         N/A         N/A   65536  thrpt    5       0.094 ?      0.132  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI512                                    N/A         N/A         N/A      64  thrpt    5      12.467 ?     14.546  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI512                                    N/A         N/A         N/A    1024  thrpt    5       0.781 ?      0.915  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI512                                    N/A         N/A         N/A   65536  thrpt    5       0.012 ?      0.014  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI64                                     N/A         N/A         N/A      64  thrpt    5      41.097 ?     48.712  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI64                                     N/A         N/A         N/A    1024  thrpt    5       2.555 ?      3.112  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI64                                     N/A         N/A         N/A   65536  thrpt    5       0.041 ?      0.048  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.scalar                                    N/A         N/A         N/A      64  thrpt    5   34764.547 ?   2124.872  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.scalar                                    N/A         N/A         N/A    1024  thrpt    5    2604.449 ?     63.086  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.scalar                                    N/A         N/A         N/A    4096  thrpt    5     653.640 ?     24.208  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorInt                                 N/A         N/A         N/A      64  thrpt    5   68950.388 ?  27430.053  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorInt                                 N/A         N/A         N/A    1024  thrpt    5   15432.548 ?   7166.145  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorInt                                 N/A         N/A         N/A    4096  thrpt    5    4068.423 ?   1865.103  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorShort                               N/A         N/A         N/A      64  thrpt    5   52088.764 ?  56731.840  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorShort                               N/A         N/A         N/A    1024  thrpt    5    4369.735 ?   7629.283  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorShort                               N/A         N/A         N/A    4096  thrpt    5    2692.982 ?   3791.073  ops/ms
o.o.b.j.i.v.utf8.DecodeBench.decodeScalar                                          N/A       32768           1     N/A  thrpt    8   54369.096 ?   1063.656   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeScalar                                          N/A     8388608           1     N/A  thrpt    8     151.256 ?      6.312   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVector                                          N/A       32768           1     N/A  thrpt    8    6452.803 ?   2950.714   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVector                                          N/A     8388608           1     N/A  thrpt    8      24.918 ?     12.696   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVectorASCII                                     N/A       32768           1     N/A  thrpt    8  261340.791 ? 102913.900   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVectorASCII                                     N/A     8388608           1     N/A  thrpt    8     293.033 ?     74.431   ops/s
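
As a rough aid for reading the table above, the sketch below compares a vectorized score against its scalar baseline and brackets the ratio using the reported errors. It assumes the `?` column is JMH's `±` printed on a console that could not render it. The class and method names are hypothetical, and the bounds are a crude interval, not a statistical test. The two inputs are the `BooleanArrayCheck.filterAll` and `filterAll_vec` rows.

```java
public class SpeedupEstimate {

    // One row of JMH output: score plus or minus error, in the same units.
    static final class Result {
        final double score;
        final double error;

        Result(double score, double error) {
            this.score = score;
            this.error = error;
        }

        double low()  { return score - error; }
        double high() { return score + error; }
    }

    // Ratio of the point estimates.
    static double speedup(Result baseline, Result candidate) {
        return candidate.score / baseline.score;
    }

    // Worst case for the candidate: its low end against the baseline's high end.
    static double pessimistic(Result baseline, Result candidate) {
        return candidate.low() / baseline.high();
    }

    // Best case for the candidate; only meaningful when baseline.low() > 0.
    static double optimistic(Result baseline, Result candidate) {
        return candidate.high() / baseline.low();
    }

    public static void main(String[] args) {
        Result scalar = new Result(3884.266, 110.491);    // filterAll
        Result vector = new Result(16060.256, 5485.055);  // filterAll_vec

        System.out.printf("speedup %.2fx (between %.2fx and %.2fx)%n",
                speedup(scalar, vector),
                pessimistic(scalar, vector),
                optimistic(scalar, vector));
    }
}
```

Several rows report an error larger than the score itself (for example `Byte128Vector.EQ`), in which case the lower bound goes negative and a comparison like this says little beyond "rerun with more iterations".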

However, I am getting `java.lang.RuntimeException: Incorrect result` during the DecodeBench.java test run:

java.lang.RuntimeException: Incorrect result
        at org.openjdk.bench.jdk.incubator.vector.utf8.DecodeBench.tearDownInvocation(DecodeBench.java:204)
        at org.openjdk.bench.jdk.incubator.vector.utf8.jmh_generated.DecodeBench_decodeVectorASCII_jmhTest.decodeVectorASCII_thrpt_jmhStub(DecodeBench_decodeVectorASCII_jmhTest.java:127)
        at org.openjdk.bench.jdk.incubator.vector.utf8.jmh_generated.DecodeBench_decodeVectorASCII_jmhTest.decodeVectorASCII_Throughput(DecodeBench_decodeVectorASCII_jmhTest.java:85)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
        at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:831)


I'm looking into it; I'm not sure whether this is caused by missing hardware support, the test design, or a bug. Either way, this check should not be failing: https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/utf8/DecodeBench.java#L201

    @TearDown(Level.Invocation)
    public void tearDownInvocation() {
        out = new String(dst.array());
        if (!in.equals(out)) {
            System.out.println("in  = (" + in.length() + ") \"" + arrayToString(in.getBytes()) + "\"");
            System.out.println("out = (" + out.length() + ") \"" + arrayToString(out.getBytes()) + "\"");
            throw new RuntimeException("Incorrect result");
        }
    }
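
For reference while debugging, here is a self-contained sketch of the same kind of round-trip check using only the JDK's own UTF-8 decoder. It is not a diagnosis of the failure above. I have not confirmed how `DecodeBench` sizes or types `dst`, so the buffer handling here is an assumption, and `RoundTripCheck` and `decodeInto` are hypothetical names. It shows one property such a check depends on: building a string from the whole backing array only matches the input when the decoder filled the array exactly.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {

    // Decodes UTF-8 bytes with the JDK decoder into a buffer sized for the worst case.
    static CharBuffer decodeInto(byte[] utf8) {
        // UTF-8 never yields more chars than input bytes, so this capacity is sufficient.
        CharBuffer dst = CharBuffer.allocate(utf8.length);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        decoder.decode(ByteBuffer.wrap(utf8), dst, true);
        decoder.flush(dst);
        dst.flip(); // limit now marks the end of the decoded chars
        return dst;
    }

    public static void main(String[] args) {
        String in = "h\u00e9llo w\u00f6rld"; // contains two 2-byte characters
        CharBuffer dst = decodeInto(in.getBytes(StandardCharsets.UTF_8));

        // Only the chars actually written (position..limit).
        String written = dst.toString();
        // The entire backing array, including unused trailing capacity.
        String wholeArray = new String(dst.array());

        System.out.println(in.equals(written));    // prints true
        System.out.println(in.equals(wholeArray)); // prints false: trailing '\0' chars
    }
}
```

If the benchmark's buffer is always sized to the exact output length, this distinction does not apply, and the cause lies elsewhere.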


Previously I was only running the Int64Vector perf test, which hid build issues with the other tests. Any further feedback is appreciated.

> Separate from moving the location, but related to mainline integration, I have concerns as to the number of benchmarks generated. It may be we have to curate a default smaller set, but enable the ability for someone to generate more for local testing. We could consider that later on.

Do you have more thoughts here? Which tests should be included versus excluded?

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/77

