RFR: 8318650: Optimized subword gather for x86 targets. [v13]

Mon Feb 26 09:39:59 UTC 2024

On Sun, 25 Feb 2024 06:23:50 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/cpu/x86/x86.ad line 4120:
>> 
>>> 4118:     BasicType elem_bt = Matcher::vector_element_basic_type(this);
>>> 4119:     __ lea($tmp$$Register, $mem$$Address);
>>> 4120:     __ vgather8b(elem_bt, $dst$$XMMRegister, $tmp$$Register, $idx$$Register, $rtmp$$Register, vlen_enc);
>> 
>> The `LE8B` and `Matcher::vector_length_in_bytes(n) <= 8` suggest we can perform this with 4 bytes as well.
>> Is that correct?
>> Would that not lead to issues, when we are then reading `base_index` at bytes 4...7, which possibly have garbage, and then use that to gather?
>> Do we have tests for that?
>
> A 64-bit sub-word SPECIES holds either 8 byte values or 4 short values, and the algorithm handles both cases appropriately.

Are you saying that the constraints are too relaxed, but currently no outside algorithm would pass something bad?
But then why not tighten the constraint to be correct?
What if I at some point start using this node in SuperWord / AutoVectorization?
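
To make the concern concrete, here is a minimal scalar sketch in Java. It is not the real `vgather8b` and makes no claim about how that routine behaves; `SubwordGatherModel`, `gatherFixed8`, and `gatherLanes` are hypothetical names used only for illustration. It assumes a 4-byte vector whose index slots 4...7 hold garbage, and contrasts a routine that always consumes 8 index slots with one that is told the lane count.

```java
import java.util.Arrays;

// Hypothetical scalar model of a sub-word gather; not HotSpot code.
class SubwordGatherModel {
    // Always consumes 8 index slots, regardless of how many lanes are valid.
    static byte[] gatherFixed8(byte[] base, int[] idx) {
        byte[] dst = new byte[8];
        for (int lane = 0; lane < 8; lane++) {
            dst[lane] = base[idx[lane]];
        }
        return dst;
    }

    // Consumes only the first `lanes` index slots; the remaining lanes stay zero.
    static byte[] gatherLanes(byte[] base, int[] idx, int lanes) {
        byte[] dst = new byte[8];
        for (int lane = 0; lane < lanes; lane++) {
            dst[lane] = base[idx[lane]];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] base = {10, 20, 30, 40, 50, 60, 70, 80};
        // Assumption: only slots 0..3 are meaningful; slots 4..7 are garbage.
        int[] idx = {3, 2, 1, 0, -1, 999, -7, 12345};

        System.out.println(Arrays.toString(gatherLanes(base, idx, 4)));
        try {
            gatherFixed8(base, idx);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("fixed-8 gather consumed a garbage index: " + e.getMessage());
        }
    }
}
```

In Java the garbage index surfaces as an exception; in generated code there would be no such safety net, which is why I would prefer the constraint itself to exclude the case.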

>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3071:
>> 
>>> 3069:                 .fromArray(lsp, indexMap, mapOffset + i)
>>> 3070:                 .add(offset);
>>> 3071:             vix = VectorIntrinsics.checkIndex(vix, a.length);
>> 
>> Are you using `vix` after this assignment?
>
> Its purpose is to check for out-of-bounds indices.

But is it required to do the assignment?
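
What I have in mind can be sketched as follows. This is only an illustration under an assumption: `CheckIndexModel.checkIndex` is a hypothetical scalar stand-in that throws on a bad index and otherwise returns its argument unchanged. Whether the real `VectorIntrinsics.checkIndex` can ever return something other than its input is exactly what I am asking.

```java
// Hypothetical stand-in for a bounds check that is called for its side effect.
class CheckIndexModel {
    // Throws if any index is out of bounds; otherwise returns the indices unchanged.
    static int[] checkIndex(int[] indices, int length) {
        for (int i : indices) {
            if (i < 0 || i >= length) {
                throw new IndexOutOfBoundsException(
                    "Index " + i + " out of bounds for length " + length);
            }
        }
        return indices;
    }

    // Calls checkIndex without assigning the result; the check still happens.
    static int sumGather(int[] a, int[] indices) {
        checkIndex(indices, a.length);
        int sum = 0;
        for (int i : indices) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sumGather(a, new int[] {0, 3}));
        try {
            sumGather(a, new int[] {0, 4});
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the real method behaves like this stand-in, the call alone would be enough and the assignment could be dropped.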

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502297241
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502290579