[jdk.incubator.vector] F2I conversion does not get intrinsified on osx

Paul Sandoz paul.sandoz at oracle.com
Fri Mar 12 19:19:29 UTC 2021


Regarding reinterpretAsInts(), I think it's just to avoid a cast and keep things fluent, as opposed to writing:

IntVector r = (IntVector) FloatVector.fromArray(SPECIES, in, i)
        .convert(VectorOperators.F2I, 0);
r.intoArray(out, i);

Paul.

> On Mar 12, 2021, at 7:19 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Hi Eugene,
> 
> Thanks for the test case.
> 
> It is affected by the following implementation limitation:
> 
>   2712   70    b        Demo$Simd::work (41 bytes)
>  ** Rejected vector op (VectorCastF2X,int,8) because architecture does not support it
>  ** not supported: arity=1 op=cast#444/3 vlen2=8 etype2=int ismask=0
> ...
> @ 20   jdk.incubator.vector.AbstractVector::convert (27 bytes)   force inline by annotation
> ...
>        @ 128   jdk.internal.vm.vector.VectorSupport::convert (39 bytes)   failed to inline (intrinsic)
> ...
> 
> 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1735
> 
> const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {
> ...
>    case Op_VectorCastF2X:
>    case Op_VectorCastD2X:
>      if (is_integral_type(bt)) {
>        // Casts from FP to integral types require special fixup logic not easily
>        // implementable with vectors.
>        return false; // Implementation limitation
>      }
> 
> Ideally, it should be intrinsified eventually, but even when there's no vector support for float-to-int casts, vector box elimination shouldn't be affected. That's something left for future performance work.
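
For context on the "special fixup logic" mentioned in the x86.ad comment above: Java's (int) cast maps NaN to 0 and saturates out-of-range values to Integer.MIN_VALUE or Integer.MAX_VALUE, whereas the x86 truncating conversion (cvttps2dq) returns 0x80000000 for NaN and for out-of-range inputs. The following is only a scalar sketch of that gap, one lane at a time. The names rawX86Convert and fixup are hypothetical helpers made up for illustration and are not HotSpot code.

```java
public class F2IFixupSketch {
    // Hypothetical model of the per-lane result of x86 cvttps2dq:
    // in-range values truncate toward zero; NaN and out-of-range inputs
    // yield the "integer indefinite" value 0x80000000.
    static int rawX86Convert(float f) {
        if (Float.isNaN(f) || f >= 2147483648.0f || f < -2147483648.0f) {
            return 0x80000000;
        }
        return (int) f;
    }

    // Hypothetical fixup restoring Java cast semantics (JLS 5.1.3):
    // NaN -> 0, positive overflow -> Integer.MAX_VALUE,
    // negative overflow -> Integer.MIN_VALUE.
    static int fixup(float f, int raw) {
        if (raw != 0x80000000) {
            return raw;                 // ordinary in-range result
        }
        if (Float.isNaN(f)) {
            return 0;
        }
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        float[] inputs = {1.9f, -1.9f, Float.NaN, 3e9f, -3e9f,
                          Float.POSITIVE_INFINITY};
        for (float f : inputs) {
            int fixed = fixup(f, rawX86Convert(f));
            System.out.println(f + " -> " + fixed
                    + " (Java cast: " + (int) f + ")");
        }
    }
}
```

A vectorized version would need the same per-lane compare-and-select steps after the raw conversion, which is the extra work the comment refers to.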
> 
> Also, I'm not sure I understand what your test case is intended to do:
> 
> jdk.incubator.vector.FloatVector.fromArray(VFP, f32, i)
>   .convert(jdk.incubator.vector.VectorOperators.F2I, 0)
>   .reinterpretAsInts()
>   .intoArray(i32, i);
> 
> In particular, why is convert() followed by a reinterpret cast (reinterpretAsInts())?
> 
> Best regards,
> Vladimir Ivanov
> 
> On 12.03.2021 16:31, Eugene Kluchnikov wrote:
>> Using F2I conversion greatly slows down execution (at least on OSX).
>> I tried the official jdk-16+36 build and a build made manually from
>> origin/vectorIntrinsics.
>> Repro code:
>> ```
>> public class Demo {
>>   private static class Simd {
>>     private static final jdk.incubator.vector.VectorSpecies<Float> VFP =
>>         jdk.incubator.vector.FloatVector.SPECIES_PREFERRED;
>>     static final int STEP = VFP.length();
>>     static void work(float[] f32, int[] i32) {
>>       for (int i = 0; i < f32.length; i += STEP) {
>>         jdk.incubator.vector.FloatVector.fromArray(VFP, f32, i)
>>             .convert(jdk.incubator.vector.VectorOperators.F2I, 0)
>>             .reinterpretAsInts()
>>             .intoArray(i32, i);
>>       }
>>     }
>>   }
>>   static void work(float[] f32, int[] i32) {
>>     for (int i = 0; i < f32.length; ++i) {
>>       i32[i] = (int) f32[i];
>>     }
>>   }
>>   public static void main(String[] args) {
>>     float[] f32 = new float[1024 * 1024];
>>     int[] i32 = new int[1024 * 1024];
>>     long t0 = System.nanoTime();
>>     for (int i = 0; i < 1024; ++i) {
>>       if (args.length == 1) {
>>         Simd.work(f32, i32);
>>       } else {
>>         work(f32, i32);
>>       }
>>     }
>>     long t1 = System.nanoTime();
>>     System.out.println("Elapsed time: " + (t1 - t0) / 1000000 + "ms");
>>   }
>> }
>> ```
>> Non-SIMD version finishes in 0.6s, SIMD version finishes in 6s.
>> Best regards,
>>   Eugene.


