RFR: 8338021: Support saturating vector operators in VectorAPI [v5]

Tue Sep 3 12:54:24 UTC 2024

On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators.
>> 
>> 
>>      . SUADD   : Saturating unsigned addition.
>>      . SADD    : Saturating signed addition. 
>>      . SUSUB   : Saturating unsigned subtraction.
>>      . SSUB    : Saturating signed subtraction.
>>      . UMAX    : Unsigned max
>>      . UMIN    : Unsigned min.
>>      
>> 
>> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
>> 
>> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>> 
>> Summary of changes:
>> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it.
>> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
>> 
>> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments resolved

Ok. This is a huge change. And you do not just introduce changes to the VectorAPI and add Vector instructions. But you also add the scalar instructions.

Can you split this into at least 2 PR's that are smaller please?
- Scalar saturating instructions: they could even be made available to the user via `Integer.saturatingAdd` etc. Would that not be desired?
- Vector saturating instructions

I'm afraid that now you are not using the scalar ops individually at all, and they are only used as fallback when the vector-api code is not intrinsified. But how can we test this properly?

I'm just not very happy having to review 9K+ PR's 🙈

src/hotspot/cpu/x86/assembler_x86.cpp line 560:

> 558: }
> 559: 
> 560: bool Assembler::needs_evex(XMMRegister reg1, XMMRegister reg2, XMMRegister reg3) {

This is an ASSERT / DEBUG only method, correct? Do you want to `#ifdef ASSERT` it accordingly?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 914:

> 912:       case T_SHORT: vpminuw(dst, src1, src2, vlen_enc); break;
> 913:       case T_INT:   vpminud(dst, src1, src2, vlen_enc); break;
> 914:       case T_LONG:  evpminuq(dst, k0, src1, src2, false, vlen_enc); break;

Can you explain to me what the `k0` is and where it comes from?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 929:

> 927: }
> 928: 
> 929: void C2_MacroAssembler::vpuminmaxq(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2, XMMRegister xtmp1, XMMRegister xtmp2, int vlen_enc) {

Either wrap all inputs or none ;)

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4705:

> 4703:     default:
> 4704:       fatal("Unsupported operation  %s", NodeClassNames[ideal_opc]);
> 4705:       break;

Did you mean to explicitly mention these cases as unsupported? If yes, please add a comment to the code why.

src/hotspot/cpu/x86/x86.ad line 6527:

> 6525:   %}
> 6526:   ins_pipe( pipe_slow );
> 6527: %}

Should change the `uminmax_reg` to indicate that it is a `vector` operation? The `format` already says `vector_uminmax_reg`... Because what if we one day want to use the name `uminmax_reg` for a scalar operation?

src/hotspot/share/opto/addnode.hpp line 194:

> 192: class SaturatingAddINode : public Node {
> 193: public:
> 194:   SaturatingAddINode(Node* in1, Node* in2) : Node(in1,in2) {}

Suggestion:

  SaturatingAddINode(Node* in1, Node* in2) : Node(in1, in2) {}

In other places below as well.

src/hotspot/share/opto/addnode.hpp line 198:

> 196:   virtual const Type* bottom_type() const { return TypeInt::INT; }
> 197:   virtual uint ideal_reg() const { return Op_RegI; }
> 198: };

Are these not supposed to inherit from the `AddNode`, and then override the corresponding methods? Or are you making them separate for a good reason?

src/hotspot/share/opto/addnode.hpp line 462:

> 460: //------------------------------UMaxINode---------------------------------------
> 461: // Maximum of 2 unsigned integers.
> 462: class UMaxLNode : public Node {

Here you comment it with `UMaxINode`, but below it is the `UMaxLNode`. The `-------xyz------` comments are really useless. But the semantics description is useful (though you again say integer instead of long here...).

src/hotspot/share/opto/matcher.hpp line 380:

> 378:   static BasicType vector_element_basic_type(const MachNode* use, const MachOper* opnd);
> 379:   static const Type* vector_element_type(const Node* n);
> 380:   static const Type* vector_element_type(const MachNode* use, const MachOper* opnd);

You should probably create your own section for this, since this is not about the **basic** type.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2277262281
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741956515
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741964463
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741961089
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741971197
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741976975
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741990855
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741984047
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741987722
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741997411