RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32]

Fri Oct 25 07:11:20 UTC 2024

On Thu, 24 Oct 2024 13:36:50 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators.
>> 
>> 
>>      . SUADD   : Saturating unsigned addition.
>>      . SADD    : Saturating signed addition. 
>>      . SUSUB   : Saturating unsigned subtraction.
>>      . SSUB    : Saturating signed subtraction.
>>      . UMAX    : Unsigned max
>>      . UMIN    : Unsigned min.
>>      
>> 
>> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
>> 
>> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>> 
>> Summary of changes:
>> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it.
>> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
>> 
>> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits:
> 
>  - Review resolutions.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201
>  - Factor out IR tests and Transforms to follow-up PRs.
>  - Replacing flag based checks with CPU feature checks in IR validation test.
>  - Remove Saturating IRNode patterns.
>  - Restrict IR validation to newly added UMin/UMax transforms.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201
>  - Prod build fix
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201
>  - New IR tests + additional IR transformations
>  - ... and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c

Ok, I'll approve it now, since @PaulSandoz is ok with the test coverage.
I did not review all the x86 backend instructions, I don't have the time or understanding for that - I rely on testing for correctness here.

Thanks for the work @jatin-bhateja 😊 
The VectorAPI is a lot of work, and we are grateful for your contributions!

I am still worried about general test coverage. One can always do it later - but will that actually be done? I suppose this is a trade-off between spending time on quality vs quantity of features. Bugs in the VectorAPI will probably mostly show in miscompilations. Those are hard to catch without solid testing and rigorous result verification. The VectorAPI is also not widely used yet, so miscompilations will only slowly be discovered. Miscompilations in Auto-Vectorization are quicker discovered because there is more code out there that is sensitive to it.

Maybe I'm coming across as annoying with all my RFE-splitting suggestions. But I do think that it is the job of the PR-author to make the code as reviewable and high-quality before it is sent for review. The author knows the code best, and should not burden the reviewers unnecessarily with combined features. A PR that is 2x as long is not just 2x as much work to review. It takes the reviewer more time - maybe 4x because it is harder to understand what belongs together, if everything is sufficiently tested. Also, the number of review-rounds increases. Every time the reviewer is then required to read all code again. This is really a waste of time and very frustrating. But I think you understand that now, and I am looking forward to nicely portioned PRs in the future ;)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2394395794