[jdk.incubator.vector] F2I conversion is not intrinsified on OSX

Fri Mar 12 13:31:34 UTC 2021

Using the F2I conversion slows execution down dramatically (at least on OSX).

I tried both the official jdk-16+36 build and a build made manually from
origin/vectorIntrinsics.

Repro code:
```java
// Compile and run with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class Demo {
    private static class Simd {
        private static final VectorSpecies<Float> VFP = FloatVector.SPECIES_PREFERRED;
        static final int STEP = VFP.length();

        // Vectorized float -> int conversion; assumes f32.length is a multiple of STEP.
        static void work(float[] f32, int[] i32) {
            for (int i = 0; i < f32.length; i += STEP) {
                FloatVector.fromArray(VFP, f32, i)
                        .convert(VectorOperators.F2I, 0)
                        .reinterpretAsInts()
                        .intoArray(i32, i);
            }
        }
    }

    // Scalar baseline: a plain (int) cast per element.
    static void work(float[] f32, int[] i32) {
        for (int i = 0; i < f32.length; ++i) {
            i32[i] = (int) f32[i];
        }
    }

    public static void main(String[] args) {
        float[] f32 = new float[1024 * 1024];
        int[] i32 = new int[1024 * 1024];
        long t0 = System.nanoTime();
        for (int i = 0; i < 1024; ++i) {
            if (args.length == 1) {
                Simd.work(f32, i32); // exactly one argument selects the SIMD path
            } else {
                work(f32, i32);
            }
        }
        long t1 = System.nanoTime();
        System.out.println("Elapsed time: " + (t1 - t0) / 1000000 + "ms");
    }
}
```
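For reference, the Vector API documents F2I as a lane-wise `(int)` cast, so the scalar `work` method is the result the SIMD path has to match, including the Java-specific edge cases (truncation toward zero, NaN mapped to 0, saturation on overflow). A minimal sketch of those edge cases; the class name `F2IReference` is hypothetical and the code uses only the standard library:

```java
// Hypothetical reference class: shows what a lane-wise (int) cast must produce.
public class F2IReference {
    // Same conversion the scalar work() loop performs.
    static void convert(float[] src, int[] dst) {
        for (int i = 0; i < src.length; ++i) {
            dst[i] = (int) src[i]; // truncates toward zero, NaN -> 0, saturates
        }
    }

    public static void main(String[] args) {
        float[] in = {-1.7f, 2.9f, Float.NaN, 3e9f, -3e9f,
                      Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        int[] out = new int[in.length];
        convert(in, out);
        // prints [-1, 2, 0, 2147483647, -2147483648, 2147483647, -2147483648]
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Any vectorized F2I implementation, intrinsified or not, would need to reproduce these values element for element.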

The non-SIMD version (run with no arguments) finishes in 0.6 s; the SIMD version (run with one argument) takes 6 s, roughly 10x slower.
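The timings above include JIT warm-up, since the measured loop starts cold. A minimal sketch, assuming a hypothetical `WarmTimer` helper, that runs a kernel untimed first and then measures it; it uses the scalar kernel so it compiles without the incubator module, and `Simd::work` from the repro could be passed instead when compiled with `--add-modules jdk.incubator.vector`:

```java
import java.util.function.BiConsumer;

// Hypothetical harness: warm up a kernel before timing it, so the number
// reflects steady-state compiled code rather than interpretation/compilation.
public class WarmTimer {
    // Runs the kernel `iters` times and returns elapsed wall time in ms.
    static long timeMs(BiConsumer<float[], int[]> kernel, float[] f32, int[] i32, int iters) {
        long t0 = System.nanoTime();
        for (int i = 0; i < iters; ++i) {
            kernel.accept(f32, i32);
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    // Scalar kernel, identical to Demo.work.
    static void scalar(float[] f32, int[] i32) {
        for (int i = 0; i < f32.length; ++i) {
            i32[i] = (int) f32[i];
        }
    }

    public static void main(String[] args) {
        float[] f32 = new float[1024 * 1024];
        int[] i32 = new int[1024 * 1024];
        timeMs(WarmTimer::scalar, f32, i32, 200); // warm-up, result discarded
        long ms = timeMs(WarmTimer::scalar, f32, i32, 1024);
        System.out.println("Steady-state elapsed time: " + ms + "ms");
    }
}
```

As in the original `main`, timing one kernel per JVM run keeps the `kernel.accept` call site monomorphic, so the two paths do not affect each other's profiling.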

Best regards,
  Eugene.