RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU [v4]
Liming Liu
lliu at openjdk.org
Mon Jun 9 05:16:57 UTC 2025
On Fri, 6 Jun 2025 09:47:17 GMT, Andrew Haley <aph at openjdk.org> wrote:
>>> According to perf, post-increment ops help to reduce the access to TLB on Ampere1 in this case.
>>
>> Hmm, but it's code in a rather odd style in shared code. And from what I see, the intrinsic is only 22% of the runtime (for 128 bytes) anyway, and you're making the code larger. I certainly don't want to see this sort of thing proliferating in the intrinsics.
>>
>> In general, it's up to CPU designers to make simple, straightforward code work well.
>>
>> How important is this?
>
> On the other hand this code already exists in CRC32C, so it's simply unifying the two routines. OK, I won't object.
> you're making the code larger.

I don't think this makes the code larger.

> How important is this?

As I mentioned in problem 1, this causes a regression of about 14% on Ampere1 when handling 64 bytes. There are no obvious effects in other cases, though.

> so it's simply unifying the two routines.

Yes.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2135041760