RFR: 8300208: Optimize Adler32 stub for AVX-512 targets.
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Jan 27 01:41:19 UTC 2023
On Tue, 17 Jan 2023 17:24:20 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> This patch optimizes the Adler32 stub for AVX-512 targets.
>
> The main computation loop now uses a zero-extending, lane-widening vector load.
>
> The new sequence also honors AVX3Threshold, so the implementation uses the existing AVX2 instruction sequence on relevant targets
> if the input size is smaller than the threshold (default 4096).
>
> Following are the results of an [existing JMH micro](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestAdler32.java) on various targets.
>
> **System configurations: Turbo frequency scaling is disabled; all data is collected at a fixed frequency of 2.8 GHz.
> SUT1 : Intel® Xeon® Platinum 8480+ Processor (Sapphire Rapids) 56C 2S
> SUT2 : Intel(R) Xeon(R) Platinum 8380 CPU (Icelake Server) 40C 2S
> SUT3 : Intel(R) Xeon(R) Platinum 8280 CPU (Cascadelake Server) 28C 2S**
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
Could you please also update test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java to throw an exception on failure?
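A minimal sketch of the idea, not the actual jtreg test: compare `java.util.zip.Adler32` against a plain scalar reference and throw on a mismatch, so that a wrong checksum fails the run. The class name `Adler32Check` and its methods are hypothetical.

```java
import java.util.zip.Adler32;

// Hypothetical helper, for illustration only.
final class Adler32Check {
    private static final int BASE = 65521; // Adler-32 modulus (RFC 1950)

    // Straightforward scalar Adler-32, used as the expected value.
    static long reference(byte[] data, int off, int len) {
        int a = 1, b = 0;
        for (int i = off; i < off + len; i++) {
            a = (a + (data[i] & 0xff)) % BASE;
            b = (b + a) % BASE;
        }
        return ((long) b << 16) | a;
    }

    // Throws on mismatch instead of only reporting it.
    static void check(byte[] data, int off, int len) {
        Adler32 adler = new Adler32();
        adler.update(data, off, len);
        long actual = adler.getValue();
        long expected = reference(data, off, len);
        if (actual != expected) {
            throw new RuntimeException("Adler32 mismatch: off=" + off + " len=" + len
                    + " expected=" + Long.toHexString(expected)
                    + " actual=" + Long.toHexString(actual));
        }
    }
}
```

Calling `check` with lengths on both sides of the threshold (for example 128, 4095, 4096 and 4097 bytes) would exercise both the AVX2 and the AVX-512 paths on targets that have them.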
src/hotspot/cpu/x86/stubGenerator_x86_64_adler.cpp line 147:
> 145: // AVX2 performs better for smaller inputs because of leaner post loop reduction sequence..
> 146: __ cmpl(s, 128);
> 147: __ jcc(Assembler::belowEqual, SPRELOOP1A_AVX2);
These two compares can be merged into a single compare against the larger of avx3_threshold() and 128.
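A small sketch of why the merge is safe, assuming the other compare (not quoted here) is also a belowEqual check that branches to the same AVX2 label, and that the threshold is non-negative. The class and method names are hypothetical; `Integer.compareUnsigned` stands in for the unsigned `belowEqual` condition.

```java
// Hypothetical model of the branch condition, for illustration only.
final class MergedCompare {
    // Two separate unsigned "below or equal" checks going to the same target.
    static boolean twoCompares(int s, int avx3Threshold) {
        return Integer.compareUnsigned(s, avx3Threshold) <= 0
            || Integer.compareUnsigned(s, 128) <= 0;
    }

    // One check against the larger bound (assumes avx3Threshold >= 0).
    static boolean oneCompare(int s, int avx3Threshold) {
        return Integer.compareUnsigned(s, Math.max(avx3Threshold, 128)) <= 0;
    }
}
```

Both methods return the same result for every input size, because `s <= x || s <= y` holds exactly when `s <= max(x, y)`.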
src/hotspot/cpu/x86/stubGenerator_x86_64_adler.cpp line 155:
> 153: __ vpaddd(yb, yb, ya, Assembler::AVX_512bit);
> 154: __ addptr(data, CHUNKSIZE);
> 155: __ cmpptr(data, end);
This still processes 16 bytes of data per loop iteration, the same as the AVX2 loop. Have you considered processing double the size with AVX3?
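To illustrate that the chunk size is a free parameter of the algorithm, here is a scalar model of a chunked Adler-32 update. It is a sketch only and not the stub's vector sequence; the class name and the `chunk` parameter are hypothetical. For a chunk of n bytes d_0..d_(n-1), the sums advance as a' = a + sum(d_j) and b' = b + n*a + sum((n - j) * d_j).

```java
// Hypothetical scalar model, for illustration only.
final class ChunkedAdler32 {
    private static final int BASE = 65521; // Adler-32 modulus (RFC 1950)

    static long checksum(byte[] data, int chunk) {
        long a = 1, b = 0;
        int i = 0;
        // Main loop: consume `chunk` bytes per iteration.
        for (; i + chunk <= data.length; i += chunk) {
            long sum = 0, weighted = 0;
            for (int j = 0; j < chunk; j++) {
                int d = data[i + j] & 0xff;       // zero-extend the byte
                sum += d;
                weighted += (long) (chunk - j) * d;
            }
            b = (b + chunk * a + weighted) % BASE; // uses the old value of a
            a = (a + sum) % BASE;
        }
        // Tail: remaining bytes one at a time.
        for (; i < data.length; i++) {
            a = (a + (data[i] & 0xff)) % BASE;
            b = (b + a) % BASE;
        }
        return (b << 16) | a;
    }
}
```

`ChunkedAdler32.checksum(data, 16)` and `ChunkedAdler32.checksum(data, 32)` produce the same value, so doubling the bytes consumed per iteration changes only the per-iteration weights, not the result.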
-------------
PR: https://git.openjdk.org/jdk/pull/12045