[jdk.incubator.vector] F2I conversion does not get intrinsified on OSX

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Mar 12 15:19:01 UTC 2021


Hi Eugene,

Thanks for the test case.

It is affected by the following implementation limitation:

    2712   70    b        Demo$Simd::work (41 bytes)
   ** Rejected vector op (VectorCastF2X,int,8) because architecture does not support it
   ** not supported: arity=1 op=cast#444/3 vlen2=8 etype2=int ismask=0
...
@ 20   jdk.incubator.vector.AbstractVector::convert (27 bytes)   force inline by annotation
...
         @ 128   jdk.internal.vm.vector.VectorSupport::convert (39 bytes)   failed to inline (intrinsic)
...


https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1735

const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
...
     case Op_VectorCastF2X:
     case Op_VectorCastD2X:
       if (is_integral_type(bt)) {
         // Casts from FP to integral types require special fixup logic not easily
         // implementable with vectors.
         return false; // Implementation limitation
       }

Ideally, it should be intrinsified eventually, but even when there's no 
vector support for float-to-int casts, vector box elimination shouldn't 
be affected. That's something left for future performance work.
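
For context on the "special fixup logic" mentioned in that comment: Java's (int) cast maps NaN to 0 and saturates out-of-range values, while the x86 truncating conversions (e.g. CVTTPS2DQ) return the "integer indefinite" value 0x80000000 in those cases. Below is a minimal scalar sketch of the mismatch, not the HotSpot implementation; the class and method names (F2IFixupSketch, rawTruncate, fixup, javaF2I) are hypothetical and exist only for illustration.

```java
public class F2IFixupSketch {
    // Result of x86 truncating conversions for NaN and out-of-range inputs.
    static final int INDEFINITE = 0x80000000;

    // Hypothetical scalar model of one lane of the raw hardware conversion.
    static int rawTruncate(float f) {
        if (Float.isNaN(f) || f >= 0x1p31f || f < -0x1p31f) {
            return INDEFINITE;
        }
        return (int) f; // in range: plain truncation toward zero
    }

    // Fixup step: turn the raw result into what Java's (int) cast requires.
    static int fixup(float f, int raw) {
        if (raw != INDEFINITE) {
            return raw;                 // ordinary in-range result
        }
        if (Float.isNaN(f)) {
            return 0;                   // Java: NaN converts to 0
        }
        // Java: saturate. An exact -2^31 input also lands here and
        // correctly stays Integer.MIN_VALUE.
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    static int javaF2I(float f) {
        return fixup(f, rawTruncate(f));
    }

    public static void main(String[] args) {
        float[] samples = {1.9f, -1.9f, Float.NaN, 3e9f, -3e9f,
                           Float.POSITIVE_INFINITY, -0x1p31f};
        for (float f : samples) {
            System.out.println(f + ": raw=" + rawTruncate(f)
                    + " fixed=" + javaF2I(f) + " cast=" + (int) f);
        }
    }
}
```

In every case the "fixed" column should agree with the plain (int) cast. A vectorized version would need the same compare-and-blend step on each lane, which is the extra work the comment refers to.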

Also, I'm not sure I understand what your test case is intended to do:

  jdk.incubator.vector.FloatVector.fromArray(VFP, f32, i)
    .convert(jdk.incubator.vector.VectorOperators.F2I, 0)
    .reinterpretAsInts()
    .intoArray(i32, i);

In particular, why is convert() followed by a reinterpret cast 
(reinterpretAsInts())?
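
To spell out the distinction in scalar terms: a conversion changes the value representation, while a reinterpretation keeps the bits and only changes how they are read. The sketch below uses plain Java on a single lane; the class and method names are hypothetical. As far as I can tell, convert(F2I, 0) already yields int lanes, so a following reinterpretAsInts() should leave the bits unchanged and serve at most to obtain the IntVector static type.

```java
public class ConvertVsReinterpret {
    // Value conversion of one lane: 1.0f becomes the integer 1.
    static int convertLane(float f) {
        return (int) f;
    }

    // Bit reinterpretation of one lane: same 32 bits, read as an int.
    static int reinterpretLane(float f) {
        return Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        float f = 1.0f;
        System.out.println("convert:     " + convertLane(f));
        System.out.println("reinterpret: 0x"
                + Integer.toHexString(reinterpretLane(f)));
    }
}
```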

Best regards,
Vladimir Ivanov

On 12.03.2021 16:31, Eugene Kluchnikov wrote:
> Using F2I conversion greatly slows down execution (at least on OSX).
> 
> I tried the official jdk-16+36 build and a build made manually from
> origin/vectorIntrinsics.
> 
> Repro code:
> ```
> public class Demo {
>   private static class Simd {
>     private static final jdk.incubator.vector.VectorSpecies<Float> VFP =
>         jdk.incubator.vector.FloatVector.SPECIES_PREFERRED;
>     static final int STEP = VFP.length();
>
>     static void work(float[] f32, int[] i32) {
>       for (int i = 0; i < f32.length; i += STEP) {
>         jdk.incubator.vector.FloatVector.fromArray(VFP, f32, i)
>             .convert(jdk.incubator.vector.VectorOperators.F2I, 0)
>             .reinterpretAsInts()
>             .intoArray(i32, i);
>       }
>     }
>   }
>
>   static void work(float[] f32, int[] i32) {
>     for (int i = 0; i < f32.length; ++i) {
>       i32[i] = (int) f32[i];
>     }
>   }
>
>   public static void main(String[] args) {
>     float[] f32 = new float[1024 * 1024];
>     int[] i32 = new int[1024 * 1024];
>     long t0 = System.nanoTime();
>     for (int i = 0; i < 1024; ++i) {
>       if (args.length == 1) {
>         Simd.work(f32, i32);
>       } else {
>         work(f32, i32);
>       }
>     }
>     long t1 = System.nanoTime();
>     System.out.println("Elapsed time: " + (t1 - t0) / 1000000 + "ms");
>   }
> }
> ```
> 
> Non-SIMD version finishes in 0.6s, SIMD version finishes in 6s.
> 
> Best regards,
>    Eugene.
> 

