RFR: 8283232: x86: Improve vector broadcast operations [v2]
Quan Anh Mai
duke at openjdk.org
Fri Jul 29 14:00:28 UTC 2022
On Wed, 16 Mar 2022 17:25:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>>> Hi, forwarding results within the same bypass domain does not incur a delay; a data bypass delay occurs when data crosses from one domain to another, according to the "Intel® 64 and IA-32 Architectures Optimization Reference Manual":
>>>
>>> > When a source of a micro-op executed in one stack comes from a micro-op executed in another stack, a delay can occur. The delay occurs also for transitions between Intel SSE integer and Intel SSE floating-point operations. In some of the cases, the data transition is done using a micro-op that is added to the instruction flow.
>>>
>>> The manual states this guideline in Section 3.5.2.2.
>>>
>>> Thanks.
>>
>> "Thanks" was meant to refer to the text above. I have removed the incorrect reference.
>
> It would still be good if we could come up with a microbenchmark that shows the gain from this patch.
@jatin-bhateja Thanks a lot for your comments; I have addressed them in the latest commit.
@vnkozlov Thanks very much for the review and testing.
-------------
PR: https://git.openjdk.org/jdk/pull/7832
More information about the hotspot-compiler-dev mailing list