RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

Jamil Nimeh jnimeh at openjdk.org
Tue Oct 25 22:09:49 UTC 2022


On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk <duke at openjdk.org> wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>> 
>> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
>>   - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
>> - Added a JMH perf test.
>>    - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>> 
>> Perf before:
>> 
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>> 
>> and after:
>> 
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s
>
> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision:
> 
>   extra whitespace character

src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 171:

> 169:         }
> 170: 
> 171:         if (len >= 1024) {

Out of curiosity, do you have any perf numbers for the impact of this change on systems that do not support AVX512?  Does this help or hurt (or make a negligible impact) on poly1305 updates when the input is 1K or larger?

src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 296:

> 294:         keyBytes[12] &= (byte)252;
> 295: 
> 296:         // This should be enabled, but Poly1305KAT would fail

I'm on the fence about this change.  I have no problem with it in basic terms.  If we ever decided to make this a general purpose Mac in JCE then this would definitely be good to do.  As of right now, the only consumer is ChaCha20 and it would submit a key through the process in the RFC.  Seems really unlikely to run afoul of these checks, but admittedly not impossible.

I would agree with @sviswa7 that we could examine this in a separate change and we could look at other approaches to getting around the KAT issue, perhaps some package-private based way to disable the check.  As long as Poly1305 remains with package-private visibility, one could make another form of the constructor with a boolean that would disable this check and that is the constructor that the KAT would use.  This is just an off-the-cuff idea, but one way we might get the best of both worlds.

If we move this down the road then we should remove the commenting.  We can refer back to this PR later.

-------------

PR: https://git.openjdk.org/jdk/pull/10582



More information about the security-dev mailing list