[jdk.incubator.vector] F2I conversion does not get intrinsified on osx
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Mar 12 15:19:01 UTC 2021
Hi Eugene,
Thanks for the test case.
It is affected by the following implementation limitation:
2712 70 b Demo$Simd::work (41 bytes)
** Rejected vector op (VectorCastF2X,int,8) because architecture does not support it
** not supported: arity=1 op=cast#444/3 vlen2=8 etype2=int ismask=0
...
@ 20 jdk.incubator.vector.AbstractVector::convert (27 bytes) force inline by annotation
...
@ 128 jdk.internal.vm.vector.VectorSupport::convert (39 bytes) failed to inline (intrinsic)
...
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1735
const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
  ...
    case Op_VectorCastF2X:
    case Op_VectorCastD2X:
      if (is_integral_type(bt)) {
        // Casts from FP to integral types require special fixup logic not easily
        // implementable with vectors.
        return false; // Implementation limitation
      }
Ideally, it should be intrinsified eventually, but even when there's no
vector support for float-to-int casts, vector box elimination shouldn't
be affected. That's something left for future performance work.
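To give a rough idea of the "special fixup logic" mentioned in the comment above, here is a minimal scalar sketch. It assumes that the raw x86 truncating conversion returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, while a Java (int) cast has to return 0 for NaN and saturate otherwise. The class and method names are hypothetical and only model the behavior; this is not the HotSpot code.

```java
public class F2IFixupDemo {
    // Hypothetical model of one lane of a raw x86 truncating convert
    // (assumption: NaN and out-of-range inputs give 0x80000000).
    public static int rawHardwareConvert(float f) {
        if (Float.isNaN(f) || f >= 2147483648.0f || f < -2147483648.0f) {
            return Integer.MIN_VALUE; // "integer indefinite"
        }
        return (int) f; // in-range values truncate toward zero
    }

    // Fixup step: patch the indefinite result so it matches Java cast semantics.
    public static int withFixup(float f) {
        int raw = rawHardwareConvert(f);
        if (raw != Integer.MIN_VALUE) {
            return raw;               // common case, no fixup needed
        }
        if (Float.isNaN(f)) {
            return 0;                 // Java: (int) NaN == 0
        }
        // Positive overflow saturates to MAX_VALUE; everything else that
        // reaches this point is really MIN_VALUE.
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        float[] inputs = {1.9f, -1.9f, Float.NaN, 1e20f, -1e20f};
        for (float f : inputs) {
            System.out.println(f + ": raw=" + rawHardwareConvert(f)
                    + " fixed=" + withFixup(f) + " java=" + (int) f);
        }
    }
}
```

A vectorized version would have to do the same per-lane comparison and blending, which is presumably why the cast is rejected for now.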
Also, I'm not sure I understand what your test case is intended to do:
jdk.incubator.vector.FloatVector.fromArray(VFP, f32, i)
.convert(jdk.incubator.vector.VectorOperators.F2I, 0)
.reinterpretAsInts()
.intoArray(i32, i);
In particular, why is convert() followed by a reinterpret cast
(reinterpretAsInts())?
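To spell out the distinction behind that question, here is a small scalar sketch (plain Java, no Vector API, with hypothetical class and method names): a value conversion such as F2I changes the bits to represent the nearest truncated integer, whereas a reinterpret cast keeps the bits and only changes how they are viewed.

```java
public class ConvertVsReinterpret {
    // Value conversion: scalar analogue of an F2I conversion.
    public static int valueConvert(float f) {
        return (int) f;
    }

    // Bit reinterpretation: same 32 bits, viewed as an int.
    public static int bitReinterpret(float f) {
        return Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        float f = 1.0f;
        System.out.println("convert:     " + valueConvert(f));
        System.out.println("reinterpret: 0x" + Integer.toHexString(bitReinterpret(f)));
    }
}
```

For 1.0f the conversion gives 1, while the reinterpretation gives the IEEE 754 bit pattern 0x3f800000.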
Best regards,
Vladimir Ivanov
On 12.03.2021 16:31, Eugene Kluchnikov wrote:
> Using F2I conversion greatly slows down execution (at least on OSX).
>
> I tried the official jdk-16+36 build and a build made manually from
> origin/vectorIntrinsics.
>
> Repro code:
> ```
> public class Demo {
>   private static class Simd {
>     private static final jdk.incubator.vector.VectorSpecies<Float> VFP =
>         jdk.incubator.vector.FloatVector.SPECIES_PREFERRED;
>     static final int STEP = VFP.length();
>     static void work(float[] f32, int[] i32) {
>       for (int i = 0; i < f32.length; i += STEP) {
>         jdk.incubator.vector.FloatVector.fromArray(VFP, f32, i)
>             .convert(jdk.incubator.vector.VectorOperators.F2I, 0)
>             .reinterpretAsInts()
>             .intoArray(i32, i);
>       }
>     }
>   }
>
>   static void work(float[] f32, int[] i32) {
>     for (int i = 0; i < f32.length; ++i) {
>       i32[i] = (int) f32[i];
>     }
>   }
>
>   public static void main(String[] args) {
>     float[] f32 = new float[1024 * 1024];
>     int[] i32 = new int[1024 * 1024];
>     long t0 = System.nanoTime();
>     for (int i = 0; i < 1024; ++i) {
>       if (args.length == 1) {
>         Simd.work(f32, i32);
>       } else {
>         work(f32, i32);
>       }
>     }
>     long t1 = System.nanoTime();
>     System.out.println("Elapsed time: " + (t1 - t0) / 1000000 + "ms");
>   }
> }
> ```
>
> Non-SIMD version finishes in 0.6s, SIMD version finishes in 6s.
>
> Best regards,
> Eugene.
>