RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2

Claes Redestad redestad at openjdk.org
Mon Jan 23 12:02:11 UTC 2023


On Sat, 21 Jan 2023 00:15:10 GMT, Scott Gibbons <duke at openjdk.org> wrote:

> Added code for Base64 acceleration (encode and decode), which speeds up both operations by roughly 4x on AVX2 platforms (about 5.6x for encode and 3.7x for decode in the benchmarks below).
> 
> Encode performance:
> **Old:**
> 
> Benchmark                      (maxNumBytes)   Mode  Cnt     Score   Error   Units
> Base64Encode.testBase64Encode           1024  thrpt    3  4309.439 ± 2.632  ops/ms
> 
> 
> **New:**
> 
> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
> 
> 
> Decode performance:
> **Old:**
> 
> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt     Score    Error   Units
> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  3961.768 ± 93.409  ops/ms
> 
> **New:**
> 
> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661:

> 2659:     __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit);
> 2660:     __ vmovdqu(xmm11, Address(r13, 0x28));
> 2661:     __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit);

Apologies in advance if I'm reading this wrong: the data that `r13` points to appears to be a repeated byte pattern (`0x2f2f2f...`). Does this mean that this `vpbroadcastb` and the `vpbroadcastq` above end up filling their respective registers with exactly the same bits? If so, and since neither register is modified in the code below, perhaps this could be simplified a bit by keeping only one of them.
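To make the question concrete, here is a minimal sketch in plain C++ that models the two broadcasts on a 32-byte array. It is not the stub code and does not use the HotSpot assembler. The names `Ymm`, `broadcast_byte`, `broadcast_qword` and `broadcasts_match` are hypothetical, and the assumption that the first eight bytes at `r13` are all `0x2f` comes only from my reading of the table.

```cpp
#include <array>
#include <cassert>   // only needed for the usage example below
#include <cstddef>
#include <cstdint>
#include <cstring>

// A 256-bit register modelled as 32 raw bytes (hypothetical helper type).
using Ymm = std::array<std::uint8_t, 32>;

// Model of a byte broadcast: the first byte of src is replicated
// into all 32 byte lanes.
inline Ymm broadcast_byte(const std::uint8_t* src) {
    Ymm r;
    r.fill(src[0]);
    return r;
}

// Model of a quadword broadcast: the first 8 bytes of src are replicated
// into each of the 4 quadword lanes.
inline Ymm broadcast_qword(const std::uint8_t* src) {
    Ymm r;
    for (std::size_t lane = 0; lane < 4; ++lane) {
        std::memcpy(r.data() + lane * 8, src, 8);
    }
    return r;
}

// True when both broadcasts from the same address yield identical bits.
inline bool broadcasts_match(const std::uint8_t* src) {
    return broadcast_byte(src) == broadcast_qword(src);
}
```

With eight identical bytes as input, e.g. `const std::uint8_t same[8] = {0x2f, 0x2f, 0x2f, 0x2f, 0x2f, 0x2f, 0x2f, 0x2f};`, `assert(broadcasts_match(same));` holds. If any of the first eight bytes differs, the two results diverge, so the simplification only applies if the table really is uniform over that range.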

-------------

PR: https://git.openjdk.org/jdk/pull/12126

