RFR: 8283232: x86: Improve vector broadcast operations [v8]

Jatin Bhateja jbhateja at openjdk.org
Fri Jul 29 05:20:38 UTC 2022


On Fri, 29 Jul 2022 03:44:16 GMT, Quan Anh Mai <duke at openjdk.org> wrote:

>> Both of the above JIT sequences have a true dependency chain, so there is no scope for an additional architecture-imposed false dependency, the kind we use dep-breaking idioms for, to cause any further perf degradation.
>
> I'm sorry, I don't quite understand what you mean here. What I meant is that while `pcmpeqd xmmk, xmmk` is a dep-breaking idiom, `vpcmpeqd xmmk, xmmk, xmmk` seems not to be. As a result, I reverted that change, and in this context the only change is that I added a branch for non-AVX machines. Please review this patch. Thank you very much.

Yes, it's a valid one-idiom, and as per section E.1.2 of the [X86 Optimization manual](https://cdrdv2.intel.com/v1/dl/getContent/671488), such idioms are resolved by the renamer and do not reach the execution ports.

I faintly remember that there was a subtle difference between the handling of zeroing idioms and one-idioms on certain targets, where in some cases one-idioms still go beyond the renamer. But we can keep this change of yours, since even if the all-ones idiom (vpcmpeqd) reaches an execution port, latency-wise it's the same as vpternlog over a 256-bit vector.
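
For illustration only, here is a minimal C++ sketch of the all-ones idiom using SSE2 intrinsics. It is not the HotSpot code from this PR; the helper names `all_ones_128` and `is_all_ones` are hypothetical, and the instruction actually emitted (`pcmpeqd`, `vpcmpeqd`, or a folded constant) depends on the compiler and its flags.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: build an all-ones 128-bit vector by comparing a
// register with itself. Every 32-bit lane compares equal, so every lane
// becomes 0xFFFFFFFF regardless of the register's previous contents.
static inline __m128i all_ones_128() {
    __m128i x = _mm_setzero_si128();  // any defined value works as the operand
    return _mm_cmpeq_epi32(x, x);     // typically lowered to pcmpeqd / vpcmpeqd
}

// Hypothetical helper: returns true if every 32-bit lane of v is 0xFFFFFFFF.
static inline bool is_all_ones(__m128i v) {
    alignas(16) std::uint32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), v);
    for (std::uint32_t lane : lanes) {
        if (lane != 0xFFFFFFFFu) {
            return false;
        }
    }
    return true;
}
```

The sketch only shows the result of the idiom; whether the renamer resolves it or it occupies an execution port is a property of the target microarchitecture and cannot be observed from this code.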

-------------

PR: https://git.openjdk.org/jdk/pull/7832

