RFR: 8266332: Adler32 intrinsic for x86 64-bit platforms [v10]
Vladimir Kozlov
kvn at openjdk.java.net
Mon May 17 19:24:41 UTC 2021
On Mon, 17 May 2021 18:57:28 GMT, Xubo Zhang <github.com+58006833+xbzhang99 at openjdk.org> wrote:
>> Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.
>>
>> The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li at arm.com).
>>
>> For this benchmark, the optimization shows ~5x improvement.
>>
>> Base:
>> Benchmark                          (count)  Mode  Cnt   Score    Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.084 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.104 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.146 ±  0.002  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.226 ±  0.002  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.390 ±  0.005  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.714 ±  0.007  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   1.359 ±  0.014  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   2.751 ±  0.023  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   5.494 ±  0.077  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25  11.058 ±  0.160  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25  22.198 ±  0.319  us/op
>>
>>
>> With patch:
>> Benchmark                          (count)  Mode  Cnt   Score    Error  Units
>> TestAdler32Perf.testAdler32Update       64  avgt   25   0.020 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      128  avgt   25   0.025 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      256  avgt   25   0.031 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update      512  avgt   25   0.048 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update     1024  avgt   25   0.078 ±  0.001  us/op
>> TestAdler32Perf.testAdler32Update     2048  avgt   25   0.139 ±  0.002  us/op
>> TestAdler32Perf.testAdler32Update     4096  avgt   25   0.262 ±  0.004  us/op
>> TestAdler32Perf.testAdler32Update     8192  avgt   25   0.524 ±  0.010  us/op
>> TestAdler32Perf.testAdler32Update    16384  avgt   25   1.017 ±  0.022  us/op
>> TestAdler32Perf.testAdler32Update    32768  avgt   25   2.058 ±  0.052  us/op
>> TestAdler32Perf.testAdler32Update    65536  avgt   25   3.994 ±  0.013  us/op
>
> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove -XX:+UseAdler32Intrinsics, as it will fail on non-supported platforms
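For context, here is a minimal scalar sketch of the Adler-32 checksum that the intrinsic accelerates. This is not the stub from the PR; it is a plain reference loop following the RFC 1950 definition, and the name `adler32_scalar` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference for Adler-32 (RFC 1950). Hypothetical helper, not HotSpot code.
// `adler` is the running checksum; pass 1 to start a fresh stream.
inline uint32_t adler32_scalar(uint32_t adler, const uint8_t* data, size_t len) {
  assert(data != nullptr || len == 0);
  const uint32_t MOD = 65521;            // largest prime below 2^16
  uint32_t a = adler & 0xFFFF;           // low half: 1 + sum of bytes
  uint32_t b = (adler >> 16) & 0xFFFF;   // high half: sum of the running `a` values
  for (size_t i = 0; i < len; i++) {
    a = (a + data[i]) % MOD;
    b = (b + a) % MOD;
  }
  return (b << 16) | a;
}
```

A vectorized version would typically process many bytes per iteration and defer the modulo reductions; the loop above only fixes what result such code must produce.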
I have 2 comments.
src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1322:
> 1320: Assembler::vpmulld(dst, nds, src, vector_len);
> 1321: }
> 1322: void vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg = rscratch1);
It looks like my earlier comment was lost.
I see that only the last version of the method is used in the stub. Why do you need the 2 additional wrapper methods?
Also, the code always passes `scratch_reg`, so you don't need to set a default value.
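To illustrate the point about the default argument, here is a small sketch with made-up stand-ins (`Reg`, `rscratch1_demo`, `emit_with_default`, `emit_explicit` are all hypothetical and are not HotSpot types or methods). When every call site already passes the scratch register, the default is never exercised, and dropping it makes each caller name the register it hands over.

```cpp
#include <string>

// Hypothetical stand-in for a register type; not HotSpot's Register.
struct Reg { const char* name; };
static const Reg rscratch1_demo{"rscratch1"};

// Variant with a default: callers may omit the scratch register.
inline std::string emit_with_default(const char* op, Reg scratch = rscratch1_demo) {
  return std::string(op) + " via " + scratch.name;
}

// Variant without a default: every caller must state which register is scratch.
inline std::string emit_explicit(const char* op, Reg scratch) {
  return std::string(op) + " via " + scratch.name;
}
```

With the second form, a call such as `emit_explicit("vpmulld")` fails to compile, so an omitted scratch register shows up at build time.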
src/hotspot/cpu/x86/vm_version_x86.cpp line 907:
> 905: }
> 906: } else if (UseAdler32Intrinsics) {
> 907: if (!FLAG_IS_DEFAULT(UseAdler32Intrinsics))
Add `{}` around the body of this `if`.
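A sketch of why the braces help, using a hypothetical flag model (the `Flag` struct and `disable_unsupported` are invented for illustration and do not use HotSpot's `FLAG_IS_DEFAULT` machinery). With explicit braces it is clear which statement is guarded by the inner condition and which runs unconditionally inside the outer branch.

```cpp
// Hypothetical flag model; not HotSpot's flag macros.
struct Flag {
  bool value;       // current setting
  bool is_default;  // true if the user did not set it on the command line
};

// Disables the flag when it is on. Returns true if a warning would be issued,
// which in this sketch happens only when the user set the flag explicitly.
inline bool disable_unsupported(Flag& f) {
  bool warned = false;
  if (f.value) {
    if (!f.is_default) {
      warned = true;   // guarded by the inner condition only
    }
    f.value = false;   // runs whenever the outer branch is taken
  }
  return warned;
}
```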
-------------
Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3806