RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v2]
Fei Yang
fyang at openjdk.org
Wed Apr 10 07:34:10 UTC 2024
On Sat, 6 Apr 2024 02:24:04 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> ArsenyBochkarev has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available.
>
> I witnessed a performance regression on the Unmatched board when count >= 2048.
> JMH numbers:
>
> Before:
> Benchmark                      (count)   Mode  Cnt     Score    Error  Units
> TestAdler32.testAdler32Update       64  thrpt   25  1050.761 ± 54.862  ops/ms
> TestAdler32.testAdler32Update      128  thrpt   25   953.858 ± 42.102  ops/ms
> TestAdler32.testAdler32Update      256  thrpt   25   821.011 ± 21.154  ops/ms
> TestAdler32.testAdler32Update      512  thrpt   25   624.207 ± 19.724  ops/ms
> TestAdler32.testAdler32Update     1024  thrpt   25   436.040 ±  5.875  ops/ms
> TestAdler32.testAdler32Update     2048  thrpt   25   265.020 ±  3.058  ops/ms
> TestAdler32.testAdler32Update     5012  thrpt   25   124.934 ±  0.799  ops/ms
> TestAdler32.testAdler32Update     8192  thrpt   25    70.026 ±  0.243  ops/ms
> TestAdler32.testAdler32Update    16384  thrpt   25    35.885 ±  0.055  ops/ms
> TestAdler32.testAdler32Update    32768  thrpt   25    16.883 ±  0.027  ops/ms
> TestAdler32.testAdler32Update    65536  thrpt   25     7.648 ±  0.006  ops/ms
>
> After:
> Benchmark                      (count)   Mode  Cnt     Score    Error  Units
> TestAdler32.testAdler32Update       64  thrpt   25  4360.280 ± 39.921  ops/ms
> TestAdler32.testAdler32Update      128  thrpt   25  2766.595 ± 16.027  ops/ms
> TestAdler32.testAdler32Update      256  thrpt   25  1634.373 ±  5.412  ops/ms
> TestAdler32.testAdler32Update      512  thrpt   25   880.028 ±  1.463  ops/ms
> TestAdler32.testAdler32Update     1024  thrpt   25   457.724 ±  0.296  ops/ms
> TestAdler32.testAdler32Update     2048  thrpt   25   233.605 ±  0.072  ops/ms
> TestAdler32.testAdler32Update     5012  thrpt   25    96.610 ±  0.020  ops/ms
> TestAdler32.testAdler32Update     8192  thrpt   25    59.275 ±  0.012  ops/ms
> TestAdler32.testAdler32Update    16384  thrpt   25    29.726 ±  0.004  ops/ms
> TestAdler32.testAdler32Update    32768  thrpt   25    14.736 ±  0.009  ops/ms
> TestAdler32.testAdler32Update    65536  thrpt   25     6.658 ±  0.002  ops/ms
> @RealFYang Hi, thanks for pointing that out! To get additional acceleration, I vectorized the code and re-measured performance on a Kendryte K230 with RVV 1.0 enabled:
That's great to hear! I was not aware that it could run a full-featured Linux system.
May I ask which Linux distro you are running?
> It seems to me that there's huge room for improvement in the current implementation.
Have you finished improving this with RVV 1.0? I can take another look when that is done.
> BTW, the data I used as a comparison from the T-Head board was recorded a few months ago. Has the code generation improved significantly? Or is it just me making some kind of mistake in the measurements?
I am not sure what you mean, but I don't think there has been a big change in this part.
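
For reference, the before/after JMH scores quoted above can be compared with a short sketch like the one below. The class and method names are hypothetical and only serve the illustration; the numbers are copied from the quoted tables, and the error margins are ignored, so this is a rough reading of the data rather than a statistical analysis.

```java
// Hypothetical helper, not part of the PR: compares the quoted JMH scores.
public class Adler32ScoreComparison {
    // (count) values and throughput scores in ops/ms, copied from the quoted tables.
    static final int[] COUNTS = {64, 128, 256, 512, 1024, 2048, 5012, 8192, 16384, 32768, 65536};
    static final double[] BEFORE = {1050.761, 953.858, 821.011, 624.207, 436.040, 265.020,
                                    124.934, 70.026, 35.885, 16.883, 7.648};
    static final double[] AFTER = {4360.280, 2766.595, 1634.373, 880.028, 457.724, 233.605,
                                   96.610, 59.275, 29.726, 14.736, 6.658};

    // Throughput ratio: values above 1.0 mean the patched build is faster.
    static double ratio(double before, double after) {
        return after / before;
    }

    // Smallest count whose "after" score is below its "before" score, or -1 if none.
    static int firstRegressingCount() {
        for (int i = 0; i < COUNTS.length; i++) {
            if (AFTER[i] < BEFORE[i]) {
                return COUNTS[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.printf("%6d  after/before = %.3f%n", COUNTS[i], ratio(BEFORE[i], AFTER[i]));
        }
        System.out.println("first regressing count: " + firstRegressingCount());
    }
}
```

Read this way, the patched build is faster up to count = 1024, and the after/before ratio drops below 1.0 from count = 2048 onwards.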
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2046729405