RFR: 8338021: Support saturating vector operators in VectorAPI [v4]

Mon Aug 26 23:15:11 UTC 2024

On Mon, 19 Aug 2024 07:19:30 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators.
>> 
>> 
>>      . SUADD   : Saturating unsigned addition.
>>      . SADD    : Saturating signed addition. 
>>      . SUSUB   : Saturating unsigned subtraction.
>>      . SSUB    : Saturating signed subtraction.
>>      . UMAX    : Unsigned max
>>      . UMIN    : Unsigned min.
>>      
>> 
>> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
>> 
>> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>> 
>> Summary of changes:
>> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it.
>> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
>> 
>> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments resolutions.

src/hotspot/cpu/x86/assembler_x86.cpp line 3454:

> 3452: 
> 3453: void Assembler::evmovdquw(XMMRegister dst, KRegister mask, XMMRegister src, bool merge, int vector_len) {
> 3454:   assert(VM_Version::supports_avx512vlbw(), "");

vl not needed for 512 bit.

src/hotspot/cpu/x86/assembler_x86.cpp line 4583:

> 4581: void Assembler::evpcmpgtb(KRegister kdst, XMMRegister nds, Address src, int vector_len) {
> 4582:   assert(VM_Version::supports_avx512vlbw(), "");
> 4583:   assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), "");

The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw(). 
If it helps, the vl check is already happening in vex_prefix() if we use the higher bank registers for length < 512 bit.

src/hotspot/cpu/x86/assembler_x86.cpp line 4596:

> 4594: void Assembler::evpcmpgtb(KRegister kdst, KRegister mask, XMMRegister nds, Address src, int vector_len) {
> 4595:   assert(VM_Version::supports_avx512vlbw(), "");
> 4596:   assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), "");

The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw().

src/hotspot/cpu/x86/assembler_x86.cpp line 4611:

> 4609: void Assembler::evpcmpub(KRegister kdst, XMMRegister nds, XMMRegister src, ComparisonPredicate vcc, int vector_len) {
> 4610:   assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), "");
> 4611:   assert(VM_Version::supports_avx512vlbw(), "");

I think you meant this to be supports_avx512bw().

src/hotspot/cpu/x86/assembler_x86.cpp line 4620:

> 4618: void Assembler::evpcmpuw(KRegister kdst, XMMRegister nds, XMMRegister src, ComparisonPredicate vcc, int vector_len) {
> 4619:   assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), "");
> 4620:   assert(VM_Version::supports_avx512vlbw(), "");

The check for supports_avx512vlbw() in this function need to be changed to supports_avx512bw().

src/hotspot/cpu/x86/assembler_x86.cpp line 4645:

> 4643: void Assembler::evpcmpuw(KRegister kdst, XMMRegister nds, Address src, ComparisonPredicate vcc, int vector_len) {
> 4644:   assert(VM_Version::supports_avx512vlbw(), "");
> 4645:   assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), "");

The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw().

src/hotspot/cpu/x86/assembler_x86.cpp line 4672:

> 4670: void Assembler::evpcmpeqb(KRegister kdst, KRegister mask, XMMRegister nds, Address src, int vector_len) {
> 4671:   assert(VM_Version::supports_avx512vlbw(), "");
> 4672:   assert(vector_len == Assembler::AVX_512bit || VM_Version::supports_avx512vl(), "");

The check for supports_avx512vlbw() at previous line in this function need to be changed to supports_avx512bw().

src/hotspot/cpu/x86/assembler_x86.cpp line 8191:

> 8189: void Assembler::vpminub(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8190:   assert(UseAVX > 0 && (vector_len == Assembler::AVX_512bit || (!needs_evex(dst, nds, src) || VM_Version::supports_avx512vl())), "");
> 8191:   assert(!needs_evex(dst, nds, src) || VM_Version::supports_avx512bw(), "");

It will be good to keep the assert similar to vpaddsb for new vmin/vmax instructions.

src/hotspot/cpu/x86/assembler_x86.cpp line 8311:

> 8309: }
> 8310: 
> 8311: void Assembler::evpminud(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {

assert(VM_Version::supports_evex(), "") check missing.

src/hotspot/cpu/x86/assembler_x86.cpp line 8340:

> 8338: 
> 8339: void Assembler::evpminuq(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {
> 8340:   assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), "");

assert(VM_Version::supports_evex(), "") check missing.

src/hotspot/cpu/x86/assembler_x86.cpp line 8402:

> 8400: void Assembler::vpmaxuw(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8401:   assert(vector_len == AVX_128bit ? VM_Version::supports_avx() :
> 8402:         (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), "");

Why support_avx() check here only and not in other newly added  v* integral instructions? For avx1 platforms, integral vector width supported is only 128bit.

src/hotspot/cpu/x86/assembler_x86.cpp line 8478:

> 8476: 
> 8477: void Assembler::evpmaxud(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {
> 8478:   assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), "");

assert(VM_Version::supports_evex(), "") is  missing.

src/hotspot/cpu/x86/assembler_x86.cpp line 8506:

> 8504: 
> 8505: void Assembler::evpmaxuq(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {
> 8506:   assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), "");

assert(VM_Version::supports_evex(), "") is missing.

src/hotspot/cpu/x86/assembler_x86.cpp line 10229:

> 10227:   InstructionMark im(this);
> 10228:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10229:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

src/hotspot/cpu/x86/assembler_x86.cpp line 10256:

> 10254:   InstructionMark im(this);
> 10255:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10256:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

src/hotspot/cpu/x86/assembler_x86.cpp line 10283:

> 10281:   InstructionMark im(this);
> 10282:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10283:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

src/hotspot/cpu/x86/assembler_x86.cpp line 10310:

> 10308:   InstructionMark im(this);
> 10309:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10310:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

src/hotspot/cpu/x86/assembler_x86.cpp line 10337:

> 10335:   InstructionMark im(this);
> 10336:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10337:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

src/hotspot/cpu/x86/assembler_x86.cpp line 10364:

> 10362:   InstructionMark im(this);
> 10363:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10364:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

src/hotspot/cpu/x86/assembler_x86.cpp line 10391:

> 10389:   InstructionMark im(this);
> 10390:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10391:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

src/hotspot/cpu/x86/assembler_x86.cpp line 10419:

> 10417:   InstructionMark im(this);
> 10418:   assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
> 10419:   InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

vex_w could be false here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731912227
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731608860
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731609177
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731917735
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731612730
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731726012
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731726337
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731748671
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731769490
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731771330
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731823750
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731870793
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731870288
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731888852
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731889468
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731890265
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731909994
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731910246
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731910516
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731910755
PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1731911129