RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v2]

Thu Feb 13 09:20:11 UTC 2025

On Wed, 12 Feb 2025 12:07:16 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>>> @jatin-bhateja Doing the transformation to `AndF` would be a more general solution and thus better.
>>> 
>>> > Introducing another new IR "AndF" will again need changes in auto-vectorizer.
>>> 
>>> But currently, `CopySign` and `MoveF2I` are not vectorized anyway, so we can do the vectorization of `AndF` in a separate patch without much hassle. `AndF` maps nicely onto the existing `AndV` node, so it is not very complicated work.
>> 
>> Yes, I have a follow-up patch to auto-vectorize CopySign.
>> 
>>> > this patch does not break existing IR invariants
>>> 
>>> Also, what invariant can be broken by transforming `AndI(MoveF2I(x), MoveF2I(y))` into `MoveF2I(AndF(x, y))`?
>> 
>> Hi @merykitty, I meant that in the context of CopySign: targets already emit efficient instruction sequences for the existing IR (CopySignF/D), and this patch simply tunes the x86 backend implementation to improve performance.
> 
> Also, the logical And mask is currently an integral (long) value. If we opt in to creating new AndF/D nodes, preserving the IR semantics would also require converting that integral constant into a floating-point constant. This incurs an additional memory-load penalty, since floating-point constants are emitted into the constant table placed before the native method body.
> 
> For the time being, taking the CopySign intrinsic route looks reasonable.
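For readers following the `AndI`/`MoveF2I` discussion above, here is a minimal Java sketch of copySign written as plain bit manipulation. It assumes the usual IEEE 754 sign and magnitude masks. The class name `CopySignSketch` is hypothetical, and the mapping to C2 nodes in the comments (raw-bits conversions roughly to `MoveF2I`/`MoveI2F`, `&` to `AndI`/`AndL`) is an informal reading of the thread, not a description of the patch. It also shows why the masks are integral constants: an `AndF`/`AndD` form would need them reinterpreted as floating-point constants.

```java
// Hypothetical sketch: copySign via integer masks on the raw IEEE 754 bits.
public class CopySignSketch {
    // binary32 masks: the sign bit, and everything else (exponent + significand).
    static final int FLOAT_SIGN_MASK = 0x8000_0000;
    static final int FLOAT_MAG_MASK = 0x7FFF_FFFF;
    // binary64 masks: these are the "long" mask constants.
    static final long DOUBLE_SIGN_MASK = 0x8000_0000_0000_0000L;
    static final long DOUBLE_MAG_MASK = 0x7FFF_FFFF_FFFF_FFFFL;

    // Raw-bits conversion ~ MoveF2I, '&' ~ AndI, '|' ~ OrI, back-conversion ~ MoveI2F (informal).
    public static float copySign(float magnitude, float sign) {
        int magBits = Float.floatToRawIntBits(magnitude) & FLOAT_MAG_MASK;
        int signBits = Float.floatToRawIntBits(sign) & FLOAT_SIGN_MASK;
        return Float.intBitsToFloat(magBits | signBits);
    }

    // Same idea for double, using the long masks.
    public static double copySign(double magnitude, double sign) {
        long magBits = Double.doubleToRawLongBits(magnitude) & DOUBLE_MAG_MASK;
        long signBits = Double.doubleToRawLongBits(sign) & DOUBLE_SIGN_MASK;
        return Double.longBitsToDouble(magBits | signBits);
    }

    public static void main(String[] args) {
        // Non-NaN inputs only; compare raw bits so that -0.0 and 0.0 are distinguished.
        float[] fs = {0.0f, -0.0f, 1.5f, -2.25f, Float.MIN_VALUE,
                      Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        for (float m : fs) {
            for (float s : fs) {
                if (Float.floatToRawIntBits(copySign(m, s))
                        != Float.floatToRawIntBits(Math.copySign(m, s))) {
                    throw new AssertionError("float mismatch: " + m + ", " + s);
                }
            }
        }
        double[] ds = {0.0, -0.0, 1.5, -2.25, Double.MIN_VALUE,
                       Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (double m : ds) {
            for (double s : ds) {
                if (Double.doubleToRawLongBits(copySign(m, s))
                        != Double.doubleToRawLongBits(Math.copySign(m, s))) {
                    throw new AssertionError("double mismatch: " + m + ", " + s);
                }
            }
        }
        System.out.println("ok");
    }
}
```

Reinterpreted as a floating-point value, the magnitude mask has an all-ones exponent and a nonzero significand, so `Float.intBitsToFloat(0x7FFF_FFFF)` is a NaN bit pattern. An `AndF`/`AndD` lowering would therefore have to materialize it as a floating-point constant rather than as an integer immediate.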

@jatin-bhateja let me know when this is ready for more testing / review.

Quick comment: it seems you are not just optimizing Math.copySign, as the PR title says, but also adding vector nodes. Maybe you should update the PR title? I have not looked at the code in enough detail yet to suggest a better one ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23386#issuecomment-2655983534