RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Fri Apr 30 00:44:01 UTC 2021
On Thu, 29 Apr 2021 23:47:17 GMT, Xubo Zhang <github.com+58006833+xbzhang99 at openjdk.org> wrote:
> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>
> For the following benchmark:
> http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
>
> The optimization shows a roughly 4x to 5.5x improvement, from about 4.2x at count 64 to about 5.6x at count 65536.
>
> Base:
> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ± 0.005  us/op
> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ± 0.007  us/op
> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ± 0.014  us/op
> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ± 0.023  us/op
> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ± 0.077  us/op
> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ± 0.160  us/op
> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ± 0.319  us/op
>
>
> With patch:
> Benchmark                          (count)  Mode  Cnt   Score   Error  Units
> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ± 0.001  us/op
> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ± 0.002  us/op
> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ± 0.004  us/op
> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ± 0.010  us/op
> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ± 0.022  us/op
> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ± 0.052  us/op
> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ± 0.013  us/op
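For anyone who wants to sanity-check the intrinsic's results at the sizes benchmarked above, a minimal sketch of a scalar reference follows. It is not taken from the patch or from TestAdler32.java; the class name Adler32Reference and the seeded random buffers are assumptions made for illustration. It computes Adler-32 byte by byte as defined in RFC 1950 and compares the result against java.util.zip.Adler32.

```java
import java.util.Random;
import java.util.zip.Adler32;

// Hypothetical helper, not part of the patch under review.
public class Adler32Reference {
    static final int BASE = 65521; // largest prime below 2^16

    // Scalar Adler-32: a and b are running sums reduced mod BASE.
    static long adler32(byte[] data, int off, int len) {
        int a = 1, b = 0;
        for (int i = off; i < off + len; i++) {
            a = (a + (data[i] & 0xff)) % BASE;
            b = (b + a) % BASE;
        }
        // Widen b before shifting so the high half does not overflow int.
        return ((long) b << 16) | a;
    }

    public static void main(String[] args) {
        // Same count values as the benchmark table: 64 up to 65536.
        for (int count = 64; count <= 65536; count *= 2) {
            byte[] buf = new byte[count];
            new Random(count).nextBytes(buf); // assumed test data
            Adler32 jdk = new Adler32();
            jdk.update(buf, 0, count);
            if (jdk.getValue() != adler32(buf, 0, count)) {
                throw new AssertionError("mismatch at count=" + count);
            }
        }
        System.out.println("all sizes match");
    }
}
```

Running it on a build with and without the intrinsic enabled should give the same outcome, since the scalar loop does not depend on the stub.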
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5870:
> 5868: __ cmovl(Assembler::above, s, size); // s = min(size, LIMIT)
> 5869: __ lea(end, Address(s, data, Address::times_1, -CHUNKSIZE_M1));
> 5870: __ cmpq(data, end);
This should be cmpptr here, since data and end both hold addresses.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5895:
> 5893: // reduce
> 5894: __ vpslld(yb, yb, 3, Assembler::AVX_256bit); //b is scaled by 8
> 5895: __ vpmulld(ysa, ya, ExternalAddress((address) StubRoutines::x86::_adler32_ascale_table), Assembler::AVX_256bit); //need scratch register??
All the instructions that take an ExternalAddress can modify rscratch1, which is r10. It is better to pass an explicit scratch register to them as the last argument.
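As background for the reduce step quoted above, where b is scaled and a is multiplied by a table of weights: Adler-32 can be computed a block at a time, because each byte's contribution to b depends only on its distance from the end of the block. The sketch below illustrates that arithmetic in scalar Java. It is not the stub's code and makes no claim about its lane layout; the class name Adler32Blockwise and the block width of 8 are assumptions.

```java
import java.util.Random;
import java.util.zip.Adler32;

// Hypothetical illustration of blockwise Adler-32, not the stub's algorithm.
public class Adler32Blockwise {
    static final int BASE = 65521;
    static final int BLOCK = 8; // assumed block width, chosen for illustration

    static long adler32(byte[] data) {
        long a = 1, b = 0;
        int i = 0;
        while (i < data.length) {
            int n = Math.min(BLOCK, data.length - i);
            long sum = 0, weighted = 0;
            for (int j = 0; j < n; j++) {
                int d = data[i + j] & 0xff;
                sum += d;                       // plain sum feeds a
                weighted += (long) (n - j) * d; // byte j is counted (n - j) times in b
            }
            // b gains n copies of the old a plus the weighted byte sum.
            b = (b + n * a + weighted) % BASE;
            a = (a + sum) % BASE;
            i += n;
        }
        return (b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4099]; // deliberately not a multiple of BLOCK
        new Random(42).nextBytes(buf);
        Adler32 jdk = new Adler32();
        jdk.update(buf, 0, buf.length);
        if (jdk.getValue() != adler32(buf)) {
            throw new AssertionError("blockwise result differs");
        }
        System.out.println("blockwise matches java.util.zip.Adler32");
    }
}
```

The weights n - j play the role a constant multiplier table would play in a vectorized version, which is why the per-block work reduces to a sum, a weighted sum, and one multiply of the old a by n.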
-------------
PR: https://git.openjdk.java.net/jdk/pull/3806