RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v16]

Yuri Gaevsky duke at openjdk.org
Tue Aug 5 19:38:12 UTC 2025


On Tue, 5 Aug 2025 12:53:24 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> The patch adds the possibility of using RVV instructions for faster vectorizedHashCode calculations on hardware that supports RVV v1.0.0.
>> 
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
> 
>  - Merge master
>  - replaced vmul_vv + vadd_vv by vmadd_vv
>  - returned lmul==m4
>  - fixed error made for previous lmul-m1 experiment
>  - make an experiment with lmul==1 instead of lmul==4.
>  - move vredsum_vs out of VEC_LOOP to improve performance
>  - - removed tail processing with RVV instructions as simple scalar loop provides in general better results
>  - simplified arrays_hashcode_v() to be closer to VLA and use less general-purpose registers; minor cosmetic changes
>  - change slli+add sequence to shadd
>  - reorder instructions to make RVV instructions contiguous
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/ba0ae4cb...e7fac6c7

Updated data after the previous merge (`e7fac6c`), which includes [JDK-8362596](https://github.com/openjdk/jdk/commit/4189fcbac40943f3b26c3a01938837b4e4762285):

bpif3-16g%  ( for i in "-XX:DisableIntrinsic=_vectorizedHashCode" "-XX:-UseRVV" "-XX:+UseRVV" ; do ( echo "--- ${i} ---" && jdk/bin/java  -jar benchmarks.jar  --jvmArgs="-XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions ${i}" org.openjdk.bench.java.lang.ArraysHashCode.ints -p size=1,5,10,20,30,40,50,60,70,80,90,100,200,300 -f 1 -r 1 -w 1 -wi 5 -i 10 2>&1 | tail -15 ) done )
--- -XX:DisableIntrinsic=_vectorizedHashCode ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.274 ± 0.004  ns/op
ArraysHashCode.ints       5  avgt   10   28.837 ± 0.115  ns/op
ArraysHashCode.ints      10  avgt   10   43.109 ± 0.091  ns/op
ArraysHashCode.ints      20  avgt   10   68.190 ± 0.317  ns/op
ArraysHashCode.ints      30  avgt   10   88.075 ± 0.490  ns/op
ArraysHashCode.ints      40  avgt   10  115.032 ± 0.230  ns/op
ArraysHashCode.ints      50  avgt   10  136.004 ± 0.474  ns/op
ArraysHashCode.ints      60  avgt   10  161.900 ± 0.358  ns/op
ArraysHashCode.ints      70  avgt   10  169.663 ± 0.419  ns/op
ArraysHashCode.ints      80  avgt   10  193.207 ± 0.317  ns/op
ArraysHashCode.ints      90  avgt   10  208.696 ± 0.595  ns/op
ArraysHashCode.ints     100  avgt   10  232.698 ± 0.291  ns/op
ArraysHashCode.ints     200  avgt   10  447.169 ± 0.791  ns/op
ArraysHashCode.ints     300  avgt   10  655.249 ± 0.520  ns/op
--- -XX:-UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.273 ± 0.003  ns/op
ArraysHashCode.ints       5  avgt   10   23.180 ± 0.008  ns/op
ArraysHashCode.ints      10  avgt   10   32.735 ± 0.076  ns/op
ArraysHashCode.ints      20  avgt   10   50.745 ± 0.056  ns/op
ArraysHashCode.ints      30  avgt   10   71.264 ± 0.148  ns/op
ArraysHashCode.ints      40  avgt   10   88.367 ± 0.034  ns/op
ArraysHashCode.ints      50  avgt   10  108.355 ± 0.058  ns/op
ArraysHashCode.ints      60  avgt   10  125.885 ± 0.055  ns/op
ArraysHashCode.ints      70  avgt   10  146.049 ± 0.213  ns/op
ArraysHashCode.ints      80  avgt   10  163.479 ± 0.049  ns/op
ArraysHashCode.ints      90  avgt   10  183.507 ± 0.170  ns/op
ArraysHashCode.ints     100  avgt   10  201.041 ± 0.032  ns/op
ArraysHashCode.ints     200  avgt   10  389.416 ± 0.517  ns/op
ArraysHashCode.ints     300  avgt   10  576.795 ± 0.364  ns/op
--- -XX:+UseRVV ---
Benchmark            (size)  Mode  Cnt    Score   Error  Units
ArraysHashCode.ints       1  avgt   10   11.283 ± 0.005  ns/op
ArraysHashCode.ints       5  avgt   10   23.197 ± 0.023  ns/op
ArraysHashCode.ints      10  avgt   10   38.824 ± 0.007  ns/op
ArraysHashCode.ints      20  avgt   10   70.612 ± 0.372  ns/op
ArraysHashCode.ints      30  avgt   10  101.474 ± 0.027  ns/op
ArraysHashCode.ints      40  avgt   10  108.357 ± 0.034  ns/op
ArraysHashCode.ints      50  avgt   10  139.659 ± 0.061  ns/op
ArraysHashCode.ints      60  avgt   10  171.644 ± 0.047  ns/op
ArraysHashCode.ints      70  avgt   10  112.136 ± 0.051  ns/op
ArraysHashCode.ints      80  avgt   10  146.094 ± 0.289  ns/op
ArraysHashCode.ints      90  avgt   10  177.230 ± 0.032  ns/op
ArraysHashCode.ints     100  avgt   10  119.787 ± 0.270  ns/op
ArraysHashCode.ints     200  avgt   10  161.705 ± 0.086  ns/op
ArraysHashCode.ints     300  avgt   10  216.808 ± 0.364  ns/op
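
For readers unfamiliar with the technique being measured, here is a minimal Java sketch of how the `31 * h + x` polynomial hash can be split across independent lanes. It is not the code from this PR: the class name `HashCodeSketch`, the method names, and the fixed `LANES = 8` are hypothetical and chosen only for illustration. It mirrors, under those assumptions, three ideas named in the commit list: a fused multiply-add per lane inside the loop, a single reduction after the loop, and a plain scalar loop for the tail.

```java
import java.util.Arrays;

class HashCodeSketch {
    // Hypothetical lane count; real hardware derives this from VLEN and LMUL.
    static final int LANES = 8;

    // Reference: the same recurrence Arrays.hashCode(int[]) uses for non-null arrays.
    static int scalarHash(int[] a) {
        int h = 1;
        for (int x : a) {
            h = 31 * h + x;
        }
        return h;
    }

    // Lane-wise formulation. All arithmetic wraps modulo 2^32, as in the scalar version.
    static int laneHash(int[] a) {
        // coef[j] = 31^(LANES-1-j): weight of lane j within one block.
        int[] coef = new int[LANES];
        coef[LANES - 1] = 1;
        for (int j = LANES - 2; j >= 0; j--) {
            coef[j] = coef[j + 1] * 31;
        }
        int powLanes = coef[0] * 31; // 31^LANES

        int[] acc = new int[LANES]; // one accumulator per lane
        int h = 1;                  // carries the initial value, scaled per block
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                acc[j] = acc[j] * powLanes + a[i + j]; // multiply-add per lane
            }
            h *= powLanes;
        }
        // Single weighted reduction after the loop.
        for (int j = 0; j < LANES; j++) {
            h += acc[j] * coef[j];
        }
        // Scalar tail for the remaining (fewer than LANES) elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int k = 0; k < data.length; k++) {
            data[k] = k * 7919 - 13;
        }
        System.out.println(laneHash(data) == Arrays.hashCode(data));
    }
}
```

In this sketch, arrays shorter than `LANES` never enter the lane loop and are handled entirely by the scalar tail.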

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3156355237


More information about the hotspot-compiler-dev mailing list