RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]
vpaprotsk
duke at openjdk.org
Fri Oct 28 19:52:03 UTC 2022
On Thu, 27 Oct 2022 05:10:59 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision:
>>
>> extra whitespace character
>
> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
>
>> 173: // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>> 174: // and not affect platforms without intrinsic support
>> 175: int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
>
> Since Poly1305 processes 16-byte chunks, a strength-reduced version of the above expression could be `len & ~(BLOCK_LENGTH - 1)`.
I have no issue with either version. I was mostly thinking about code clarity, but your version is arguably more reliable, so I'll switch to it. Thanks.
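For reference, a minimal sketch of why the two expressions agree. It assumes `BLOCK_LENGTH` is 16 (a power of two) and `len` is non-negative; the class and method names are hypothetical and not part of the patch.

```java
public class BlockRounding {
    // Assumption: Poly1305 block size of 16 bytes, as mentioned in the review comment.
    static final int BLOCK_LENGTH = 16;

    // Original form: integer division truncates, then multiply back.
    static int roundDownDiv(int len) {
        return (len / BLOCK_LENGTH) * BLOCK_LENGTH;
    }

    // Strength-reduced form: clear the low log2(BLOCK_LENGTH) bits.
    // Only valid because BLOCK_LENGTH is a power of two.
    static int roundDownMask(int len) {
        return len & ~(BLOCK_LENGTH - 1);
    }

    public static void main(String[] args) {
        // Both forms should agree for every non-negative length checked here.
        for (int len = 0; len <= 1 << 16; len++) {
            if (roundDownDiv(len) != roundDownMask(len)) {
                throw new AssertionError("mismatch at len=" + len);
            }
        }
        System.out.println("1030 rounds down to " + roundDownMask(1030));
    }
}
```

The two forms differ for negative `len` (division truncates toward zero, the mask rounds toward negative infinity), so the equivalence relies on the length having been validated as non-negative beforehand.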
> test/micro/org/openjdk/bench/javax/crypto/full/Poly1305DigestBench.java line 94:
>
>> 92: throw new RuntimeException(ex);
>> 93: }
>> 94: }
>
> On CLX the patch shows a performance regression of about 10% for block sizes 1024 and 2048.
>
> CLX (Non-IFMA target)
>
> Baseline (JDK-20):-
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score  Error  Units
> Poly1305DigestBench.digest          64              thrpt    2  3128928.978         ops/s
> Poly1305DigestBench.digest         256              thrpt    2  1526452.083         ops/s
> Poly1305DigestBench.digest        1024              thrpt    2   509267.401         ops/s
> Poly1305DigestBench.digest        2048              thrpt    2   305784.922         ops/s
> Poly1305DigestBench.digest        4096              thrpt    2   142175.885         ops/s
> Poly1305DigestBench.digest        8192              thrpt    2    72142.906         ops/s
> Poly1305DigestBench.digest       16384              thrpt    2    36357.000         ops/s
> Poly1305DigestBench.digest     1048576              thrpt    2      676.142         ops/s
>
>
> With opt:
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score  Error  Units
> Poly1305DigestBench.digest          64              thrpt    2  3136204.416         ops/s
> Poly1305DigestBench.digest         256              thrpt    2  1683221.124         ops/s
> Poly1305DigestBench.digest        1024              thrpt    2   457432.172         ops/s
> Poly1305DigestBench.digest        2048              thrpt    2   277563.817         ops/s
> Poly1305DigestBench.digest        4096              thrpt    2   149393.357         ops/s
> Poly1305DigestBench.digest        8192              thrpt    2    79463.734         ops/s
> Poly1305DigestBench.digest       16384              thrpt    2    41083.730         ops/s
> Poly1305DigestBench.digest     1048576              thrpt    2      705.419         ops/s
Odd. I measured it on `11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz`; I will run it again.
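To make the per-size deltas in the quoted CLX numbers easier to compare, here is a small sketch that computes the relative throughput change for each data size. The scores are copied from the two tables above; the class and method names are hypothetical. With only two iterations per row and no error column, the results should be read as indicative rather than conclusive.

```java
public class ThroughputDelta {
    // Relative change of the patched score versus the baseline, in percent.
    // Positive means the patched build has higher throughput.
    static double percentChange(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the quoted CLX benchmark tables (ops/s).
        int[] dataSize = {64, 256, 1024, 2048, 4096, 8192, 16384, 1048576};
        double[] baseline = {3128928.978, 1526452.083, 509267.401, 305784.922,
                             142175.885, 72142.906, 36357.000, 676.142};
        double[] patched = {3136204.416, 1683221.124, 457432.172, 277563.817,
                            149393.357, 79463.734, 41083.730, 705.419};

        for (int i = 0; i < dataSize.length; i++) {
            System.out.printf("%8d  %+6.1f%%%n",
                    dataSize[i], percentChange(baseline[i], patched[i]));
        }
    }
}
```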
-------------
PR: https://git.openjdk.org/jdk/pull/10582