RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]
Jamil Nimeh
jnimeh at openjdk.org
Thu Oct 27 21:21:33 UTC 2022
On Thu, 27 Oct 2022 09:22:03 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> One small thing, maybe: it doesn't look like R in `processMultipleBlocks`, or `rbytes`, ever changes, so there may be no need to serialize/deserialize them on every call to engineUpdate. There is already an `r` attached to the object that is an IntegerModuloP. Could that be used in `processMultipleBlocks`, with a private byte[] field in Poly1305 holding the serialized r, so that it can be passed into the intrinsic method rather than being created every time? It could be set in `setRSVals`. Perhaps we can recover a little performance there?
>
>> 10% is not a negligible impact. I see your point about AVX512 reaping the rewards of this change, but there are plenty of x86_64 systems without AVX512 that will be impacted, not to mention other platforms like aarch64 which (for this change at least) will never see the benefits from the intrinsic.
>>
>> I don't have any suggestions right at this moment for how this could be streamlined at all to help reduce the pain for non-AVX512 systems. Worth looking into though.
>
> Are you suggesting using WhiteBox APIs to query CPU features during Poly1305 static initialization, performing multi-block processing only on the relevant platforms and keeping the original implementation untouched for other targets? The VM does offer native WhiteBox primitives, and they are currently used by the test infrastructure.
No, going the WhiteBox route was not something I was thinking of. I sought feedback from a couple of HotSpot-knowledgeable people about the use of WhiteBox APIs, and both felt that it was not the right way to go. One said that WhiteBox is really for VM testing and not for these kinds of Java classes.
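
For reference, the caching idea quoted at the top of this message (serialize r once in `setRSVals` and reuse it on every `engineUpdate` call) could look roughly like the sketch below. This is only an illustration under assumptions: the class `Poly1305KeyCacheSketch`, its counters, and the body of `processMultipleBlocks` are hypothetical stand-ins and are not the code in the PR; only the r clamping masks are taken from RFC 8439.

```java
import java.util.Arrays;

/** Hypothetical sketch of caching the serialized r; not the JDK's Poly1305 class. */
final class Poly1305KeyCacheSketch {
    private static final int BLOCK_LENGTH = 16;
    private static final int RS_LENGTH = 16;

    private byte[] rBytes;        // clamped, serialized r; set once per key
    private int serializations;   // how many times r was serialized (for illustration)
    private long blocksProcessed; // stand-in for the real accumulator state

    /** Called once per key: clamp r as in RFC 8439 and keep the bytes around. */
    void setRSVals(byte[] key) {
        if (key.length != 2 * RS_LENGTH) {
            throw new IllegalArgumentException("expected a 32-byte key");
        }
        byte[] r = Arrays.copyOfRange(key, 0, RS_LENGTH);
        r[3] &= 15;  r[7] &= 15;  r[11] &= 15;  r[15] &= 15;
        r[4] &= (byte) 252;  r[8] &= (byte) 252;  r[12] &= (byte) 252;
        rBytes = r;
        serializations++;
    }

    /** Every update reuses the cached bytes instead of re-serializing r. */
    void engineUpdate(byte[] input, int offset, int length) {
        processMultipleBlocks(input, offset, length, rBytes);
    }

    /** Placeholder for the intrinsic candidate; the real arithmetic is omitted. */
    private void processMultipleBlocks(byte[] input, int offset, int length, byte[] r) {
        blocksProcessed += length / BLOCK_LENGTH;
    }

    byte[] serializedR() { return rBytes.clone(); }
    int serializationCount() { return serializations; }
    long blocksProcessed() { return blocksProcessed; }
}
```

With this shape, the per-call cost of producing the byte[] for r disappears from the update path; whether that recovers a measurable part of the regression on non-AVX512 systems would still need to be benchmarked.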
-------------
PR: https://git.openjdk.org/jdk/pull/10582
More information about the security-dev mailing list