RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v4]
Paul Sandoz
psandoz at openjdk.org
Mon Apr 3 16:39:01 UTC 2023
On Sat, 1 Apr 2023 07:44:25 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method.
>>
>> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)`, which requires preparing the index vector and the blending mask. Even with these preparations hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some show tremendous increases in throughput because a mask of length 2 cannot currently be intrinsified, which forces a fallback to the Java implementations.
>>
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:
>
> - instruction asserts
> - Merge branch 'master' into sliceIntrinsics
> - add comments explaining anonymous classes
> - address reviews
> - sse2, increase warmup
> - aesthetic
> - optimise 64B
> - add jmh
> - vector slice intrinsics
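For reference, the slice semantics described in the quoted PR text can be modeled with plain arrays. This is only a scalar sketch, not the Vector API implementation or the C2 intrinsic: the class `SliceModel` and both method names are hypothetical, and the second method assumes that `iota` acts as a wrapping index sequence starting at `origin` and that `blendMask` selects lanes `i < length - origin` from the first input.

```java
import java.util.Arrays;

// Hypothetical scalar model of Vector::slice; names are illustrative only.
public class SliceModel {

    // Concatenate v1 and v2 conceptually and extract a window of v1.length
    // lanes starting at lane `origin` (0 <= origin <= v1.length).
    static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        if (v2.length != n || origin < 0 || origin > n) {
            throw new IllegalArgumentException("bad inputs");
        }
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            int idx = i + origin;
            r[i] = idx < n ? v1[idx] : v2[idx - n];
        }
        return r;
    }

    // Model of the rearrange-then-blend formulation: rotate both inputs by
    // `origin` (assumed wrapping iota), then take lanes below n - origin from
    // the rotated v1 and the remaining lanes from the rotated v2.
    static int[] sliceViaRearrangeBlend(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        if (v2.length != n || origin < 0 || origin > n) {
            throw new IllegalArgumentException("bad inputs");
        }
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            int src = (i + origin) % n;           // wrapping index
            boolean fromFirst = i < n - origin;   // assumed blend mask
            r[i] = fromFirst ? v1[src] : v2[src];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3};
        int[] b = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(a, b, 1)));                  // [1, 2, 3, 4]
        System.out.println(Arrays.toString(sliceViaRearrangeBlend(a, b, 1))); // [1, 2, 3, 4]
    }
}
```

Under these assumptions the two formulations agree for every origin from 0 to the vector length, which is the property a test such as TestVectorSlice would be expected to check against the intrinsified path.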
With the latest PR I am observing failures with debug builds for test compiler/vectorapi/TestVectorSlice.java on both AVX512 machines and aarch64 machines.
On AVX512 machines the test fails with JVM args `-XX:UseAVX=3` and with `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting`, producing a test assertion failure, e.g.:
Caused by: java.lang.RuntimeException: assertEquals: expected 70 to equal 0
    at jdk.test.lib.Asserts.fail(Asserts.java:594)
    at jdk.test.lib.Asserts.assertEquals(Asserts.java:205)
    at jdk.test.lib.Asserts.assertEquals(Asserts.java:189)
    at compiler.vectorapi.TestVectorSlice.lambda$testInts$2(TestVectorSlice.java:163)
    at compiler.vectorapi.TestVectorSlice.testInts(TestVectorSlice.java:181)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    ... 7 more
CPU flags are:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid md_clear arch_capabilities
On aarch64 there is an IR rule failure.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/12909#issuecomment-1494641261