RFR: 8329254: optimize integral reverse operations on x86 GFNI target.

Wed Apr 10 18:40:11 UTC 2024

On Tue, 9 Apr 2024 18:08:27 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

> @jatin-bhateja Thanks a lot for putting this PR together. The register classes for the following two instructs in x86_64.ad also need to change:
> 
> From:
> instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **regF** xtmp1, **regF** xtmp2, rRegL rtmp, rFlagsReg cr)
> instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **regD** xtmp1, **regD** xtmp2, rRegL rtmp, rFlagsReg cr)
> 
> To:
> instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **vlRegF** xtmp1, **vlRegF** xtmp2, rRegL rtmp, rFlagsReg cr)
> instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **vlRegD** xtmp1, **vlRegD** xtmp2, rRegL rtmp, rFlagsReg cr)

Hi @sviswa7, GFNI is supported on Icelake+ CPUs. With the regD/regF register classes we select the entire range of registers xmm0-31 on AVX512 targets, which gives the assembler the freedom to auto-promote an instruction to EVEX encoding if the allocator assigns a register from the higher register bank. In this case, since the instruction operands are 128-bit registers, auto-promotion on an AVX512 target is in principle only feasible if the target supports VL. Given that all AVX512 GFNI targets support vector length orthogonality, we should be good to go.
For non-AVX512 targets with GFNI we deal with the lower register bank anyway.
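
A minimal sketch of the reasoning above, written as a standalone Java model rather than actual HotSpot code. The class and method names are hypothetical, and the selection predicates are my assumption about how the dynamic register classes behave: regF/regD widen to xmm0-31 whenever EVEX is available, while vlRegF/vlRegD widen only when AVX512VL is available as well.

```java
// Hypothetical model of register-bank selection; not HotSpot source.
public final class RegClassSketch {
    static final int LEGACY_MAX_XMM = 15; // lower bank: xmm0-15
    static final int EVEX_MAX_XMM = 31;   // full bank:  xmm0-31

    // Assumed regF/regD behaviour: full bank as soon as EVEX is supported.
    static int maxXmmRegF(boolean evex) {
        return evex ? EVEX_MAX_XMM : LEGACY_MAX_XMM;
    }

    // Assumed vlRegF/vlRegD behaviour: full bank only with EVEX and AVX512VL.
    static int maxXmmVlRegF(boolean evex, boolean avx512vl) {
        return (evex && avx512vl) ? EVEX_MAX_XMM : LEGACY_MAX_XMM;
    }

    // A 128-bit operand in xmm16-31 needs an EVEX.128 encoding, hence AVX512VL.
    static boolean encodable128(int xmm, boolean evex, boolean avx512vl) {
        return xmm <= LEGACY_MAX_XMM || (evex && avx512vl);
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean evex : flags) {
            for (boolean vl : flags) {
                if (!evex && vl) {
                    continue; // VL without AVX512 is not a meaningful combination
                }
                int loose = maxXmmRegF(evex);
                int strict = maxXmmVlRegF(evex, vl);
                System.out.printf(
                    "evex=%b vl=%b  regF: xmm0-%d (encodable=%b)  vlRegF: xmm0-%d (encodable=%b)%n",
                    evex, vl,
                    loose, encodable128(loose, evex, vl),
                    strict, encodable128(strict, evex, vl));
            }
        }
    }
}
```

Under these assumptions the two choices differ only for an AVX512 configuration without VL, which is the combination a custom feature set could expose.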

I still agree that it's better to be strict than to leave loose ends, given that cloud instances can be tuned to enable custom feature sets.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18530#issuecomment-2048208836