RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version
Yuri Gaevsky
duke at openjdk.org
Sat Jan 13 09:27:26 UTC 2024
On Sat, 13 Jan 2024 09:21:37 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:
> The patch adds the possibility of using RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0-capable hardware.
>
> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
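For context, the computation being accelerated is the usual polynomial hash with multiplier 31, as computed by java.util.Arrays.hashCode. The sketch below is illustrative only and is not the code in the patch: the class name HashSketch, the method names, and the four-way unrolling are assumptions, chosen to show why the recurrence can be split into independent per-element products that a vector unit can evaluate in parallel.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the patch or of the JDK.
class HashSketch {
    // Scalar reference: h = 31*h + a[i], starting from 1.
    // This matches java.util.Arrays.hashCode(int[]) for a non-null array.
    static int scalarHash(int[] a) {
        int h = 1;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // The same hash, four elements per step. Expanding the recurrence four
    // times gives h' = 31^4*h + 31^3*a0 + 31^2*a1 + 31*a2 + a3, so the four
    // element products no longer depend on each other. int arithmetic wraps
    // modulo 2^32 in both versions, so the results are identical.
    static int unrolledHash(int[] a) {
        final int p1 = 31;
        final int p2 = p1 * 31;
        final int p3 = p2 * 31;
        final int p4 = p3 * 31;
        int h = 1;
        int i = 0;
        for (; i + 4 <= a.length; i += 4) {
            h = p4 * h + p3 * a[i] + p2 * a[i + 1] + p1 * a[i + 2] + a[i + 3];
        }
        // Scalar tail for the remaining 0..3 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, -7, 42, 0, 5, 9, 1};
        System.out.println(scalarHash(data) == Arrays.hashCode(data));
        System.out.println(unrolledHash(data) == scalarHash(data));
    }
}
```

A real vector implementation would keep the powers of 31 in a vector register and reduce the lanes at the end; the unrolled loop above only demonstrates the algebra that makes this legal.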
NB: I have no access to RVV v1.0.0 hardware, so to estimate the performance improvement I
adapted the patch to the RVV v0.7.1 ISA under OpenJDK-21 and ran the JMH test
org.openjdk.bench.java.lang.ArraysHashCode on a LicheePi-4A (TH1520), which does support
RVV v0.7.1.
The results are below. Hopefully they will be similar on RVV v1.0.0 hardware.
Legend: UseVHI ==> UseVectorizedHashCodeIntrinsic
----------------------------------------------------------------------------------------------------------------------------------------------
[-XX:-UseVHI -XX:-UseRVV] [-XX:-UseVHI -XX:+UseRVV] [-XX:+UseVHI -XX:-UseRVV] [-XX:+UseVHI -XX:+UseRVV]
----------------------------------------------------------------------------------------------------------------------------------------------
Benchmark (size) Mode Cnt | Score Error | Score Error | Score Error | Score Error |Units|
----------------------------------------------------------------------------------------------------------------------------------------------
bytes 1 avgt 10 | 20.292 ± 0.524 | 20.693 ± 1.706 | 20.458 ± 0.718 | 20.276 ± 0.525 |ns/op|
bytes 10 avgt 10 | 35.107 ± 0.180 | 35.054 ± 0.029 | 30.898 ± 0.109 | 31.033 ± 0.132 |ns/op|
bytes 100 avgt 10 | 188.190 ± 4.192 | 188.805 ± 4.345 | 152.324 ± 2.205 | 97.673 ± 3.145 |ns/op|
bytes 1000 avgt 10 | 1664.569 ± 1.662 | 1663.711 ± 2.229 | 1184.224 ± 0.731 | 656.340 ± 1.908 |ns/op|
bytes 10000 avgt 10 | 16419.434 ± 68.995 | 16407.357 ± 43.737 | 11599.876 ± 23.574 | 6171.500 ± 16.633 |ns/op|
bytes 100000 avgt 10 | 167738.927 ± 3313.255 | 166577.887 ± 1552.963 | 119475.413 ± 1358.363 | 62061.873 ± 130.268 |ns/op|
chars 1 avgt 10 | 20.420 ± 1.031 | 20.294 ± 0.527 | 20.402 ± 0.992 | 21.267 ± 0.027 |ns/op|
chars 10 avgt 10 | 35.800 ± 0.032 | 35.778 ± 0.049 | 31.170 ± 0.199 | 31.744 ± 0.169 |ns/op|
chars 100 avgt 10 | 185.715 ± 0.674 | 184.531 ± 1.152 | 143.918 ± 1.147 | 90.613 ± 0.092 |ns/op|
chars 1000 avgt 10 | 1683.711 ± 46.493 | 1668.926 ± 6.850 | 1120.730 ± 3.017 | 652.677 ± 2.026 |ns/op|
chars 10000 avgt 10 | 16402.007 ± 16.654 | 16468.497 ± 136.411 | 10939.505 ± 72.647 | 6174.555 ± 28.879 |ns/op|
chars 100000 avgt 10 | 164826.072 ± 381.240 | 165807.663 ± 4328.908 | 114787.826 ± 4217.557 | 61724.436 ± 45.819 |ns/op|
ints 1 avgt 10 | 20.730 ± 2.375 | 20.506 ± 1.458 | 20.277 ± 0.517 | 20.169 ± 0.015 |ns/op|
ints 10 avgt 10 | 36.878 ± 0.059 | 36.162 ± 1.033 | 31.338 ± 0.243 | 32.511 ± 0.165 |ns/op|
ints 100 avgt 10 | 184.288 ± 0.790 | 184.939 ± 0.624 | 143.794 ± 0.708 | 80.406 ± 6.987 |ns/op|
ints 1000 avgt 10 | 1669.219 ± 3.559 | 1670.992 ± 13.830 | 1118.856 ± 1.086 | 486.305 ± 4.471 |ns/op|
ints 10000 avgt 10 | 16432.730 ± 62.326 | 16710.540 ± 68.028 | 11128.766 ± 57.448 | 5232.062 ± 291.835 |ns/op|
ints 100000 avgt 10 | 165387.705 ± 431.814 | 165597.050 ± 278.567 | 115605.648 ± 8245.853 | 45468.032 ± 1793.979 |ns/op|
multibytes 1 avgt 10 | 3.459 ± 0.020 | 3.473 ± 0.055 | 3.477 ± 0.145 | 3.480 ± 0.043 |ns/op|
multibytes 10 avgt 10 | 16.983 ± 0.264 | 17.526 ± 0.375 | 12.325 ± 0.117 | 13.415 ± 0.136 |ns/op|
multibytes 100 avgt 10 | 105.251 ± 0.250 | 105.032 ± 0.180 | 78.795 ± 0.260 | 53.210 ± 1.024 |ns/op|
multibytes 1000 avgt 10 | 948.171 ± 5.950 | 957.757 ± 12.117 | 700.407 ± 1.928 | 440.352 ± 2.248 |ns/op|
multibytes 10000 avgt 10 | 8829.949 ± 64.161 | 9007.879 ± 510.217 | 6406.776 ± 17.982 | 3430.480 ± 35.108 |ns/op|
multibytes 100000 avgt 10 | 89545.793 ± 6151.064 | 88335.319 ± 51.310 | 64236.061 ± 46.572 | 33380.485 ± 56.708 |ns/op|
multichars 1 avgt 10 | 3.475 ± 0.054 | 3.453 ± 0.066 | 3.492 ± 0.122 | 3.495 ± 0.047 |ns/op|
multichars 10 avgt 10 | 17.719 ± 0.645 | 17.201 ± 0.152 | 12.318 ± 0.141 | 13.093 ± 0.147 |ns/op|
multichars 100 avgt 10 | 106.735 ± 0.283 | 106.625 ± 0.177 | 77.695 ± 0.212 | 51.495 ± 0.166 |ns/op|
multichars 1000 avgt 10 | 927.573 ± 6.839 | 932.211 ± 3.445 | 696.374 ± 1.757 | 471.226 ± 1.499 |ns/op|
multichars 10000 avgt 10 | 9846.872 ± 20.840 | 9909.611 ± 188.165 | 6392.901 ± 4.849 | 3978.730 ± 180.130 |ns/op|
multichars 100000 avgt 10 | 88110.303 ± 41.764 | 88892.543 ± 2534.299 | 60615.033 ± 94.002 | 33956.859 ± 199.178 |ns/op|
multiints 1 avgt 10 | 3.450 ± 0.328 | 3.382 ± 0.150 | 3.345 ± 0.024 | 3.380 ± 0.040 |ns/op|
multiints 10 avgt 10 | 18.265 ± 0.424 | 18.644 ± 1.433 | 12.036 ± 0.041 | 13.773 ± 0.114 |ns/op|
multiints 100 avgt 10 | 107.500 ± 0.636 | 107.318 ± 0.466 | 77.971 ± 0.296 | 47.700 ± 0.408 |ns/op|
multiints 1000 avgt 10 | 924.920 ± 9.106 | 937.609 ± 44.303 | 695.427 ± 2.075 | 449.475 ± 2.061 |ns/op|
multiints 10000 avgt 10 | 9322.880 ± 49.589 | 9277.425 ± 91.828 | 7009.704 ± 297.983 | 6196.819 ± 367.531 |ns/op|
multiints 100000 avgt 10 | 88154.281 ± 279.258 | 88272.818 ± 103.608 | 64118.963 ± 6445.702 | 55317.212 ± 916.179 |ns/op|
multishorts 1 avgt 10 | 3.488 ± 0.034 | 3.531 ± 0.227 | 3.521 ± 0.051 | 3.512 ± 0.054 |ns/op|
multishorts 10 avgt 10 | 17.907 ± 0.380 | 17.408 ± 0.659 | 12.252 ± 0.110 | 13.445 ± 0.102 |ns/op|
multishorts 100 avgt 10 | 106.588 ± 0.188 | 107.500 ± 0.531 | 79.630 ± 0.428 | 53.886 ± 3.243 |ns/op|
multishorts 1000 avgt 10 | 931.732 ± 6.891 | 923.814 ± 11.836 | 701.534 ± 1.742 | 470.312 ± 2.117 |ns/op|
multishorts 10000 avgt 10 | 9663.105 ± 1017.387 | 9859.034 ± 66.672 | 6422.864 ± 7.486 | 3785.710 ± 37.656 |ns/op|
multishorts 100000 avgt 10 | 88799.262 ± 2363.672 | 88015.545 ± 52.795 | 60541.966 ± 155.521 | 33888.677 ± 127.071 |ns/op|
shorts 1 avgt 10 | 20.199 ± 0.083 | 20.190 ± 0.027 | 21.389 ± 0.600 | 21.250 ± 0.024 |ns/op|
shorts 10 avgt 10 | 35.842 ± 0.189 | 35.806 ± 0.167 | 30.960 ± 0.186 | 31.451 ± 0.182 |ns/op|
shorts 100 avgt 10 | 184.323 ± 0.488 | 185.318 ± 0.776 | 143.652 ± 1.057 | 90.657 ± 0.052 |ns/op|
shorts 1000 avgt 10 | 1664.583 ± 2.016 | 1666.803 ± 3.100 | 1118.623 ± 0.661 | 652.112 ± 0.346 |ns/op|
shorts 10000 avgt 10 | 16395.042 ± 39.388 | 16426.231 ± 75.461 | 10933.090 ± 16.165 | 6200.135 ± 116.218 |ns/op|
shorts 100000 avgt 10 | 165037.332 ± 226.003 | 167782.156 ± 8844.288 | 114329.012 ± 4326.851 | 61693.056 ± 93.278 |ns/op|
----------------------------------------------------------------------------------------------------------------------------------------------
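To read the table as speedups rather than raw times, each optimized score can be divided into the baseline score of the same row (lower ns/op is better, so a ratio above 1 means faster). The snippet below is a small sketch of that arithmetic; the class and method names are hypothetical, and the input values are copied from the size-10000 rows for bytes and ints above.

```java
// Hypothetical helper for reading the table; not part of the benchmark.
class SpeedupSketch {
    // Ratio of baseline time to optimized time; values above 1.0 mean faster.
    static double speedup(double baselineNsPerOp, double optimizedNsPerOp) {
        return baselineNsPerOp / optimizedNsPerOp;
    }

    public static void main(String[] args) {
        // Columns: {-UseVHI -UseRVV, +UseVHI -UseRVV, +UseVHI +UseRVV}
        double[] bytes10000 = {16419.434, 11599.876, 6171.500};
        double[] ints10000 = {16432.730, 11128.766, 5232.062};

        System.out.printf("bytes 10000: scalar intrinsic %.2fx, RVV intrinsic %.2fx%n",
                speedup(bytes10000[0], bytes10000[1]),
                speedup(bytes10000[0], bytes10000[2]));
        System.out.printf("ints  10000: scalar intrinsic %.2fx, RVV intrinsic %.2fx%n",
                speedup(ints10000[0], ints10000[1]),
                speedup(ints10000[0], ints10000[2]));
    }
}
```

These ratios apply to the RVV v0.7.1 adaptation measured on the TH1520 only; they are not measurements of the v1.0.0 patch.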
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-1890392431