RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v6]
Jatin Bhateja
jbhateja at openjdk.org
Mon Apr 10 17:24:54 UTC 2023
On Fri, 7 Apr 2023 18:04:16 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 96:
>>
>>> 94: }
>>> 95: Vector<?> shufvec = this.toBitsVector();
>>> 96: VectorMask<?> vecmask = shufvec.compare(VectorOperators.LT, 0);
>>
>> This may impact intrinsification on AVX1 targets for floating-point shuffles, since the bits vector is an integral vector, and AVX1 supports 32-byte float vectors but not 32-byte integral vectors.
>
> Yes, I think it is a drawback of this approach. However, we currently do not support shuffling 256-bit vectors on AVX1 machines either, and AVX1 seems to be a special case in this regard. These float and double species may also be less common in Vector API usage, since they are larger than SPECIES_PREFERRED.
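To make the concern concrete, here is a hypothetical scalar model (plain Java, no Vector API) of the quoted check. It assumes, as we understand the VectorShuffle documentation, that exceptional source indexes are kept partially wrapped into [-VLENGTH, -1], so a lane-wise "less than zero" test on the bits vector flags them. The class and method names are made up for illustration. For a Float256 shuffle the bits vector has 8 int lanes, i.e. 8 x 32 = 256 bits, which is exactly the integral width AVX1 lacks.

```java
/**
 * Hypothetical scalar model of the quoted AbstractShuffle hunk; not the JDK code.
 * Assumption: an exceptional source index is stored partially wrapped into
 * [-VLENGTH, -1], so "compare(LT, 0)" on the bits vector flags it.
 */
public class ShuffleBitsModel {

    /** Bit size of the integral "bits vector" backing a shuffle of laneCount lanes. */
    static int bitsVectorSize(int laneCount, int laneBits) {
        return laneCount * laneBits;
    }

    /** Scalar stand-in for shufvec.compare(VectorOperators.LT, 0). */
    static boolean[] ltZero(int[] bits) {
        boolean[] mask = new boolean[bits.length];
        for (int i = 0; i < bits.length; i++) {
            mask[i] = bits[i] < 0;
        }
        return mask;
    }

    /** Scalar stand-in for VectorMask.anyTrue(). */
    static boolean anyTrue(boolean[] mask) {
        for (boolean b : mask) {
            if (b) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Float256: 8 float lanes, so the bits vector holds 8 int lanes (256 bits),
        // an integral vector width that AVX1 does not support.
        System.out.println(bitsVectorSize(8, Integer.SIZE)); // 256
        int[] valid = {0, 1, 2, 3, 4, 5, 6, 7};
        int[] withExceptional = {0, 1, -8, 3, 4, 5, 6, 7}; // lane 2 is exceptional
        System.out.println(anyTrue(ltZero(valid)));           // false
        System.out.println(anyTrue(ltZero(withExceptional))); // true
    }
}
```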
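Hi @merykitty, I agree that SPECIES_PREFERRED is the preferred choice for vector algorithms involving both integral and floating-point vectors.
FTR, we now see a performance regression with a Float256-based micro-benchmark on UseAVX=1 targets. The first run below is with this patch; the second, after switching JAVA_HOME to JDK 20, is the baseline: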
import jdk.incubator.vector.*;

public class shufflef {
    public static short micro() {
        VectorShuffle<Float> iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true);
        return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1);
    }
}
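As a sanity check on what micro() computes, here is a hypothetical scalar model that does not use the Vector API. It assumes iotaShuffle(0, 1, true) produces index[i] = (0 + i * 1) mod VLENGTH, that VectorShuffle.cast keeps the source indexes unchanged (which requires equal lane counts; FloatVector.SPECIES_256 and ShortVector.SPECIES_128 both have 8 lanes), and that toVector() converts the indexes to short lanes. The class name MicroModel is made up.

```java
/**
 * Hypothetical scalar model of micro(); not the Vector API implementation.
 * Assumes iotaShuffle(start, step, true) gives index[i] = (start + i*step) mod VLENGTH
 * and that cast()/toVector() carry the source indexes over unchanged.
 */
public class MicroModel {

    static int[] iotaIndexes(int vlength, int start, int step) {
        int[] idx = new int[vlength];
        for (int i = 0; i < vlength; i++) {
            // wrap == true: reduce modulo VLENGTH into [0, VLENGTH)
            idx[i] = Math.floorMod(start + i * step, vlength);
        }
        return idx;
    }

    static short micro() {
        int vlength = 256 / Float.SIZE;          // 8 lanes in a 256-bit float species
        if (128 / Short.SIZE != vlength) {       // 8 lanes in a 128-bit short species
            throw new IllegalArgumentException("shuffle cast needs equal lane counts");
        }
        int[] iota = iotaIndexes(vlength, 0, 1); // {0, 1, ..., 7}
        short[] asShorts = new short[vlength];
        for (int i = 0; i < vlength; i++) {
            asShorts[i] = (short) iota[i];       // toVector() in short lanes
        }
        // reinterpretAsShorts() on a short vector leaves the lanes unchanged.
        return asShorts[1];
    }

    public static void main(String[] args) {
        System.out.println(micro()); // 1
    }
}
```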
CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
CompileCommand: compileonly shufflef.micro bool compileonly = true
** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0
** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0
@ 17 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 24 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic)
@ 34 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 54 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic)
@ 17 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 24 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 292 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 298 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 292 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 298 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic)
[time] 386ms [res]3392
CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/
CPROMPT>export PATH=$JAVA_HOME/bin:$PATH
CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
CompileCommand: compileonly shufflef.micro bool compileonly = true
WARNING: Using incubator modules: jdk.incubator.vector
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
@ 17 jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes) (intrinsic)
@ 292 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 298 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic)
[time] 7ms [res]3392
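Taking the two [time] lines at face value (single runs; the timing harness is not shown), the slowdown of the first run relative to the JDK 20 baseline is roughly:

```latex
\frac{386\ \text{ms}}{7\ \text{ms}} \approx 55
```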
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161810585
More information about the hotspot-compiler-dev mailing list