RFR: 8329254: optimize integral reverse operations on x86 GFNI target.

Sandhya Viswanathan sviswanathan at openjdk.org
Tue Apr 9 18:10:59 UTC 2024


On Thu, 28 Mar 2024 11:41:21 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> - An efficient GFNI-based instruction sequence to compute the integral reverse operation was added along with JEP-426 (VectorAPI 4th Incubation). https://bugs.openjdk.org/browse/JDK-8284960
> 
> - However, the CPUID-based feature detection for GFNI was incorrectly performed under the AVX512 check. Fixing it shows a roughly 2X performance improvement for the Integer/Long.reverse APIs on E-core targets (MTL+).
> 
> 
> Baseline:
> Benchmark              (size)  Mode  Cnt  Score   Error  Units
> Integers.reverse          500  avgt    2  0.120          us/op
> Longs.reverse             500  avgt    2  0.221          us/op
> 
> With optimization:
> Benchmark              (size)  Mode  Cnt  Score   Error  Units
> Integers.reverse          500  avgt    2  0.050          us/op
> Longs.reverse             500  avgt    2  0.086          us/op
> 
> 
> Kindly review.
> 
> Best Regards,
> Jatin
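
For readers unfamiliar with the operation being optimized: Integer.reverse can be decomposed into a bit reversal inside each byte followed by a byte swap. The sketch below checks that identity in plain Java. It assumes the GFNI sequence follows the same two-step shape (a per-byte bit reversal, then a byte-order reversal); it is not the HotSpot implementation, and the class and method names are hypothetical.

```java
public class BitReverseSketch {
    // Reverse the bit order inside a single byte value (0..255).
    static int reverseBitsInByte(int b) {
        int r = 0;
        for (int i = 0; i < 8; i++) {
            r = (r << 1) | ((b >>> i) & 1);
        }
        return r;
    }

    // Step 1: reverse the bits within each byte, leaving byte positions unchanged.
    // (Assumption: this is the part a GFNI affine transform would cover.)
    static int reverseBitsPerByte(int x) {
        int out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            out |= reverseBitsInByte((x >>> shift) & 0xFF) << shift;
        }
        return out;
    }

    // Step 2: reverse the byte order, which completes the full 32-bit reversal.
    static int reverseViaBytes(int x) {
        return Integer.reverseBytes(reverseBitsPerByte(x));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, 0x80000000, 0x0000FFFF};
        for (int x : samples) {
            if (reverseViaBytes(x) != Integer.reverse(x)) {
                throw new AssertionError("mismatch for 0x" + Integer.toHexString(x));
            }
        }
        System.out.println("all samples match Integer.reverse");
    }
}
```

The same decomposition applies to Long.reverse, with Long.reverseBytes as the second step.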

@jatin-bhateja Thanks a lot for putting this PR together. The register classes for the following two instructs in x86_64.ad also need to change:
From:
instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **regF** xtmp1, **regF** xtmp2, rRegL rtmp, rFlagsReg cr)
instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **regD** xtmp1, **regD** xtmp2, rRegL rtmp, rFlagsReg cr)

To:
instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **vlRegF** xtmp1, **vlRegF** xtmp2, rRegL rtmp, rFlagsReg cr)
instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **vlRegD** xtmp1, **vlRegD** xtmp2, rRegL rtmp, rFlagsReg cr)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18530#issuecomment-2045808500

