RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU [v4]

Liming Liu lliu at openjdk.org
Mon Jun 9 05:16:57 UTC 2025


On Fri, 6 Jun 2025 09:47:17 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> According to perf, post-increment ops help to reduce the access to TLB on Ampere1 in this case.
>> 
>> Hmm, but it's code in a rather odd style in shared code. And from what I see, the intrinsic is only 22% of the runtime (for 128 bytes) anyway, and you're making the code larger. I certainly don't want to see this sort of thing proliferating in the intrinsics.
>> 
>> In general, it's up to CPU designers to make simple, straightforward code work well.
>> 
>> How important is this?
>
> On the other hand this code already exists in CRC32C, so it's simply unifying the two routines. OK, I won't object.

> you're making the code larger.

I don't think this makes the code larger.

> How important is this?

As I mentioned in problem 1, this causes a regression of about 14% on Ampere1 when handling 64 bytes. There are no obvious effects in other cases, though.

> so it's simply unifying the two routines.

Yes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2135041760


More information about the hotspot-dev mailing list