RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7]
ArsenyBochkarev
duke at openjdk.org
Thu May 16 13:03:24 UTC 2024
> Hello everyone! Please review this ~non-vectorized~ implementation of the `_updateBytesAdler32` intrinsic. The reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281).
>
> ### Correctness checks
>
> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` passes. All tier1 tests also passed.
>
> ### Performance results on a T-Head board
>
> Enabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms |
>
> Disabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 655.633 | 5.845 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 587.418 | 10.062 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 546.675 | 11.598 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 432.328 | 11.517 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 311.771 | 4.238 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 202.648 | 2.486 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 100.246 | 1.119 | ops/ms |
> |Adler32.TestAdler32.testAdler32Update|8192|t...
ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision:
- Prettify L_nmax loop
- Add comments in functions
- Add explanation comment for L_nmax_loop
- Fix L_nmax_loop for big lengths
- Fix L_by16 loop step
- Prettify intrinsic
- Use LMUL=4 for most of the calculations
- Use LMUL to load multiple data in one step
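
For readers unfamiliar with the checksum itself, here is a minimal scalar sketch of Adler-32 in Java. It is not the RISC-V stub from this pull request; the class and method names (`Adler32Sketch`, `checksum`) are hypothetical. It assumes the usual formulation in which the two running sums are reduced modulo 65521 only once per block of at most 5552 bytes, which is presumably what the `L_nmax` loop named in the commit list refers to.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical illustration class, not part of the JDK or of this PR.
final class Adler32Sketch {
    static final int BASE = 65521; // largest prime below 2^16
    static final int NMAX = 5552;  // bytes that can be summed before the sums could exceed 32 bits

    // Computes the Adler-32 checksum of buf, starting from the initial value 1.
    static long checksum(byte[] buf) {
        long s1 = 1; // running sum of bytes
        long s2 = 0; // running sum of the s1 values
        int off = 0;
        int len = buf.length;
        while (len > 0) {
            // Process at most NMAX bytes, then reduce once instead of per byte.
            int chunk = Math.min(len, NMAX);
            len -= chunk;
            for (int end = off + chunk; off < end; off++) {
                s1 += buf[off] & 0xff; // treat the byte as unsigned
                s2 += s1;
            }
            s1 %= BASE;
            s2 %= BASE;
        }
        return (s2 << 16) | s1;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 reference = new Adler32();
        reference.update(data, 0, data.length);
        // Both values should agree if the sketch is correct.
        System.out.printf("sketch=%08x jdk=%08x%n", checksum(data), reference.getValue());
    }
}
```

Comparing the result against `java.util.zip.Adler32`, as `main` does, is a quick way to sanity-check any alternative implementation on inputs both shorter and longer than one 5552-byte block.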
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/18382/files
- new: https://git.openjdk.org/jdk/pull/18382/files/3cf649c9..be7d2551
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=06
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=05-06
Stats: 124 lines in 1 file changed: 56 ins; 15 del; 53 mod
Patch: https://git.openjdk.org/jdk/pull/18382.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382
PR: https://git.openjdk.org/jdk/pull/18382
More information about the hotspot-compiler-dev mailing list