RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

Volodymyr Paprotski duke at openjdk.org
Mon Nov 14 21:29:08 UTC 2022


On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>> 
>> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
>>   - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
>> - Added a JMH perf test.
>>    - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>> 
>> Perf before:
>> 
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>> 
>> and after:
>> 
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s
>
> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
> 
>  - Merge remote-tracking branch 'origin/master' into avx512-poly
>  - Vladimir's review
>  - live review with Sandhya
>  - jcheck
>  - Sandhya's review
>  - fix windows and 32b linux builds
>  - add getLimbs to interface and reviews
>  - fix 32-bit build
>  - make UsePolyIntrinsics option diagnostic
>  - Merge remote-tracking branch 'origin/master' into avx512-poly
>  - ... and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db

(Build finally passing!)

Hi @TobiHartmann you had mentioned there were some more tests to run? Looking to see what else needs fixing. Thanks.

@iwanowww thanks for the reviews! As you have time, let me know what else you see or if its good for approval? Don't want to switch too much to another intrinsic yet, one crypto algorithm is about what I can fit into my brain at a time.

-------------

PR: https://git.openjdk.org/jdk/pull/10582



More information about the security-dev mailing list