RFR: 8329254: optimize integral reverse operations on x86 GFNI target.
Sandhya Viswanathan
sviswanathan at openjdk.org
Tue Apr 9 18:10:59 UTC 2024
On Thu, 28 Mar 2024 11:41:21 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> - An efficient GFNI-based instruction sequence for the integral reverse operations was added along with JEP 426 (Vector API, Fourth Incubator): https://bugs.openjdk.org/browse/JDK-8284960
>
> - However, the CPUID-based feature detection for GFNI was incorrectly performed under the AVX512 check. Fixing it shows a roughly 2x or better performance improvement for the Integer/Long.reverse APIs on E-core targets (MTL+): about 2.4x and 2.6x in the runs below.
>
>
> Baseline:
> Benchmark         (size)  Mode  Cnt  Score  Error  Units
> Integers.reverse     500  avgt    2  0.120         us/op
> Longs.reverse        500  avgt    2  0.221         us/op
>
> With opt:
> Benchmark         (size)  Mode  Cnt  Score  Error  Units
> Integers.reverse     500  avgt    2  0.050         us/op
> Longs.reverse        500  avgt    2  0.086         us/op
>
>
> Kindly review.
>
> Best Regards,
> Jatin
@jatin-bhateja Thanks a lot for putting this PR together. The register classes of the XMM temporaries in the following two instructs in x86_64.ad also need to change:
From:
instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **regF** xtmp1, **regF** xtmp2, rRegL rtmp, rFlagsReg cr)
instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **regD** xtmp1, **regD** xtmp2, rRegL rtmp, rFlagsReg cr)
To:
instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **vlRegF** xtmp1, **vlRegF** xtmp2, rRegL rtmp, rFlagsReg cr)
instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **vlRegD** xtmp1, **vlRegD** xtmp2, rRegL rtmp, rFlagsReg cr)
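
For readers following the thread, here is a minimal Java sketch of the decomposition that the instruct names above (bytes_reversebit_*) suggest: reverse the bits inside each byte, then reverse the byte order. The class and method names are hypothetical, and the scalar per-byte loop only stands in for what a GFNI affine transform would do in a single instruction. It is an illustration of the identity, not the HotSpot implementation.

```java
// Hypothetical illustration: Integer.reverse(x) expressed as
// "reverse bits within each byte" followed by "reverse byte order".
class ReverseSketch {
    // Reverse the 8 bits of a single byte value in the range 0..255.
    static int reverseBitsInByte(int b) {
        int r = 0;
        for (int i = 0; i < 8; i++) {
            r = (r << 1) | ((b >>> i) & 1);
        }
        return r;
    }

    // Step 1: per-byte bit reversal (scalar stand-in for the GFNI step).
    // Step 2: byte swap, as Integer.reverseBytes does.
    static int reverseViaBytes(int x) {
        int perByte = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            perByte |= reverseBitsInByte((x >>> shift) & 0xFF) << shift;
        }
        return Integer.reverseBytes(perByte);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, 0x0F0F00FF};
        for (int x : samples) {
            if (reverseViaBytes(x) != Integer.reverse(x)) {
                throw new AssertionError("mismatch for 0x" + Integer.toHexString(x));
            }
        }
        System.out.println("ok");
    }
}
```

The check against Integer.reverse confirms only the arithmetic identity. It says nothing about which register classes or instruction encodings the JIT selects.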
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18530#issuecomment-2045808500