RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target. [v2]
Jatin Bhateja
jbhateja at openjdk.java.net
Fri Feb 11 17:56:12 UTC 2022
On Fri, 11 Feb 2022 13:56:31 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
> Since the lookup table is a replication of 16 bytes, it is possible to just emit a 16-byte lane and broadcast it to the whole vector. IIRC a broadcast has the same cost as a normal load. I think it is up to you to decide if a 16-byte constant is reasonable, but it is surely much more manageable than a 64-byte constant.
Hi
Yes, it varies from case to case; for example, replicating 8 bytes across a vector may be cheaper than loading from a constant table. If we are able to rematerialize the value without touching memory, that should be the preference. Also, adding a bulky constant table may have side effects on other optimizations that use native code size as a heuristic, whereas constant tables generated at load time are sharable. We can make good use of the enhancement in subsequent patches.
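
For readers following the thread, here is a minimal scalar sketch of the idea under discussion, assuming the usual nibble-lookup formulation of bit counting. It is not the code from the patch: the class `NibblePopCount` and its methods are hypothetical names used only for illustration. It shows that the table holds just 16 distinct bytes, so a 32-byte or 64-byte vector constant is that one lane repeated.

```java
// Illustrative sketch only; names are hypothetical and not taken from the PR.
final class NibblePopCount {
    // LUT[i] is the number of set bits in the 4-bit value i.
    private static final byte[] LUT = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

    // Models broadcasting the 16-byte lane across a wider vector:
    // the wide constant is the same 16 bytes repeated.
    static byte[] broadcastTable(int vectorBytes) {
        byte[] wide = new byte[vectorBytes];
        for (int i = 0; i < vectorBytes; i++) {
            wide[i] = LUT[i % LUT.length];
        }
        return wide;
    }

    // Counts set bits by looking up each of the eight nibbles of an int.
    static int bitCount(int value) {
        int count = 0;
        for (int shift = 0; shift < 32; shift += 4) {
            count += LUT[(value >>> shift) & 0xF];
        }
        return count;
    }
}
```

For any `int x`, `NibblePopCount.bitCount(x)` should agree with `Integer.bitCount(x)`. Whether the compiler emits the wide constant directly or broadcasts the 16-byte lane is the trade-off discussed above.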
Best Regards,
Jatin
-------------
PR: https://git.openjdk.java.net/jdk/pull/7373