RFR: 8283232: x86: Improve vector broadcast operations [v8]
Jatin Bhateja
jbhateja at openjdk.org
Fri Jul 29 05:20:38 UTC 2022
On Fri, 29 Jul 2022 03:44:16 GMT, Quan Anh Mai <duke at openjdk.org> wrote:
>> Both of the above JIT sequences have a true dependency chain, so there is no scope for an additional architecture-imposed false dependency (the kind we use dep-breaking idioms for) to degrade performance any further.
>
> I'm sorry, I don't quite understand what you mean here. What I meant is that while `pcmpeqd xmmk, xmmk` is a dep-breaking idiom, `vpcmpeqd xmmk, xmmk, xmmk` seems not to be. As a result, I reverted that change, and in this context the only change is that I added a branch for non-AVX machines. Please review this patch. Thank you very much.
Yes, it's a valid ones idiom, and as per section E.1.2 of the [X86 Optimization manual](https://cdrdv2.intel.com/v1/dl/getContent/671488), such idioms are resolved by the renamer and do not reach the execution ports.
I faintly remember that there was a subtle difference between the handling of zeroing idioms and ones idioms on certain targets, where in some cases ones idioms still go beyond the renamer. But we can keep this change of yours, since even if the all-ones idiom (vpcmpeqd) reaches an execution port, latency-wise it's the same as vpternlog over a 256-bit vector.
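
For readers following along, here is a rough scalar model of why both instructions can serve as an all-ones idiom. It is only a sketch of the lane semantics as I understand them, not the HotSpot code from this PR: the helper names are made up for illustration, and it says nothing about renaming or latency, only about the values produced.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane


def pcmpeqd_lane(a: int, b: int) -> int:
    """Model one 32-bit lane of (v)pcmpeqd: all ones if equal, else zero."""
    return MASK32 if (a & MASK32) == (b & MASK32) else 0


def vpternlogd_lane(dst: int, src1: int, src2: int, imm8: int) -> int:
    """Model one 32-bit lane of vpternlogd.

    imm8 is an 8-entry truth table. For every bit position, the bits of
    dst, src1 and src2 form a 3-bit index (dst is the most significant),
    and the result bit is the imm8 bit at that index.
    """
    result = 0
    for bit in range(32):
        index = (
            (((dst >> bit) & 1) << 2)
            | (((src1 >> bit) & 1) << 1)
            | ((src2 >> bit) & 1)
        )
        result |= ((imm8 >> index) & 1) << bit
    return result


def all_ones_via_compare(lanes):
    """Hypothetical helper: compare each lane with itself (x == x always holds)."""
    return [pcmpeqd_lane(x, x) for x in lanes]


def all_ones_via_ternlog(lanes):
    """Hypothetical helper: imm8 = 0xFF sets every truth-table entry to 1."""
    return [vpternlogd_lane(x, x, x, 0xFF) for x in lanes]


# Eight 32-bit lanes stand in for one 256-bit vector with arbitrary contents.
lanes = [0, 1, 0xDEADBEEF, 42, 0xFFFFFFFF, 7, 0x80000000, 123456]
print(all_ones_via_compare(lanes) == [MASK32] * 8)
print(all_ones_via_ternlog(lanes) == [MASK32] * 8)
```

In this model both results are independent of the register's previous contents, which is what makes either form usable as an idiom. Whether the hardware also treats it as dependency-breaking is a separate, microarchitecture-specific question.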
-------------
PR: https://git.openjdk.org/jdk/pull/7832
More information about the hotspot-compiler-dev mailing list