[vector] reducing the cast implementation

Paul Sandoz paul.sandoz at oracle.com
Tue Jun 5 22:23:07 UTC 2018


Here’s a patch on top of the shuffle patch:

  http://cr.openjdk.java.net/~psandoz/panama/reduce-cast/webrev/

We can probably simplify the rebracket/resize implementations.


> On Jun 5, 2018, at 1:15 PM, Lupusoru, Razvan A <razvan.a.lupusoru at intel.com> wrote:
> 
> Looks good - thanks for trying it out! I believe the test in the loop is a range check. My guess is that if you run your tests with the "-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0" flag, you should see just the vector load, convert, and store without the extra tests.

Yes.
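
For readers following along, here is a minimal sketch of how a flag-guarded range check of this kind could be written. The property name is the one quoted above; the default value of 2, the meaning of the non-zero levels, and the helper name checkIndex are assumptions for illustration, not the actual VectorIntrinsics code.

```java
import java.util.Objects;

final class OobCheckSketch {
    // Assumption: the flag is read once into a static final field so that the
    // JIT can treat it as a constant. The default of 2 is an illustrative guess.
    static final int OOB_CHECK =
            Integer.getInteger("jdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK", 2);

    // Checks that [index, index + length) lies within an array of arrayLength
    // elements and returns index. Does nothing when the flag is 0.
    static int checkIndex(int index, int arrayLength, int length) {
        if (OOB_CHECK == 0) {
            return index; // checks disabled via the system property
        }
        return Objects.checkFromIndexSize(index, length, arrayLength);
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        // A 4-lane access starting at index 4 fits in an 8-element array.
        System.out.println(checkIndex(4, a.length, 4));
        try {
            // A 4-lane access starting at index 6 would run past the end.
            checkIndex(6, a.length, 4);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

When the property is set to 0, the branch on the static final field can be folded away by the JIT, which would be consistent with the cmp/jae pair disappearing from the loop. In a JMH run the flag can be passed to the forked JVMs with the jvmArgsAppend attribute of @Fork.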

—

I suspect we will still need to revisit Vector cast/reshape at some later point; they don't sit well with ARM SVE.

Paul.

> Anyway, thanks again!
>  
> --Razvan
>  
> From: Paul Sandoz [mailto:paul.sandoz at oracle.com] 
> Sent: Tuesday, June 05, 2018 11:52 AM
> To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector] reducing the cast implementation
>  
> Hi,
>  
> I wrote a simple benchmark (see below) and analyzed the generated code. I limited the testing mostly to shapes that are hardware supported on my laptop.
>  
> For these simple tests I can reduce the cast down to the following with no change in generated code:
> @Override
> @ForceInline
> @SuppressWarnings("unchecked")
> public <F, T extends Shape> Float128Vector cast(Vector<F, T> o) {
>     if (o.length() != LENGTH)
>         throw new IllegalArgumentException("Vector length and this species length differ");
> 
>     return VectorIntrinsics.cast(
>         (Class<Vector<F, T>>) o.getClass(), o.elementType(), LENGTH,
>         float.class, LENGTH, o,
>         (v, t) -> (Float128Vector) super.cast(v)
>     );
> }
> An example of generated code for int to float conversion (with loop unrolling switched off) is:
>  0.21%  ↗    0x0000000107510ab1: mov    %edx,%r11d         
> 22.26%  │ ↗  0x0000000107510ab4: vmovdqu 0x10(%r8,%r11,4),%xmm0  
>  8.91%  │ │  0x0000000107510abb: vcvtdq2ps %xmm0,%xmm0     
> 33.35%  │ │  0x0000000107510abf: cmp    %r9d,%r11d
>         │ │  0x0000000107510ac2: jae    0x0000000107510b7c 
>  0.16%  │ │  0x0000000107510ac8: vmovdqu %xmm0,0x10(%rcx,%r11,4)  
> 21.42%  │ │  0x0000000107510acf: add    $0x4,%edx
>  9.81%  │ │  0x0000000107510ad2: cmp    %ebx,%edx
>         ╰ │  0x0000000107510ad4: jl     0x0000000107510ab1  
>  
> It appears we don’t need the explicit if/else for the element type if o is type profiled.
>  
> I can clean this up further by adjusting the VectorIntrinsics signature and the generics. Further, I think we can remove the capturing lambda by placing the super cast implementation in a static method.
>  
> Paul.
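
The capturing-lambda point above can be sketched in isolation. This is a minimal illustration under stated assumptions: castIntrinsic, Species, slowCast and slowCastStatic are hypothetical names, and plain arrays stand in for vectors; it is not the VectorIntrinsics signature or the code in the patch.

```java
import java.util.Arrays;
import java.util.function.Function;

final class CastFallbackSketch {
    // Hypothetical stand-in for an intrinsic entry point. A JIT would replace
    // the call; here the fallback is always the code that runs.
    static float[] castIntrinsic(int[] from, Function<int[], float[]> fallback) {
        return fallback.apply(from);
    }

    static class Species {
        // Before: the lambda calls an instance method, so it captures `this`.
        float[] castCapturing(int[] v) {
            return castIntrinsic(v, x -> this.slowCast(x));
        }

        // After: the fallback is a static method, so the method reference
        // captures nothing.
        float[] castStatic(int[] v) {
            return castIntrinsic(v, Species::slowCastStatic);
        }

        float[] slowCast(int[] v) {
            return slowCastStatic(v);
        }

        // Scalar lane-by-lane conversion used as the fallback.
        static float[] slowCastStatic(int[] v) {
            float[] r = new float[v.length];
            for (int i = 0; i < v.length; i++) {
                r[i] = (float) v[i];
            }
            return r;
        }
    }

    public static void main(String[] args) {
        int[] in = {1, 2, 3};
        System.out.println(Arrays.toString(new Species().castStatic(in))); // [1.0, 2.0, 3.0]
    }
}
```

Both variants compute the same result; the difference is only that the static form needs no captured receiver when the fallback object is created.
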
>  
> @State(Scope.Thread)
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
> @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
> @Fork(2)
> public class CastTest {
> 
>     static final IntVector.IntSpecies<Shapes.S128Bit> INT_SPECIES =
>             IntVector.species(Shapes.S_128_BIT);
> 
>     static final FloatVector.FloatSpecies<Shapes.S128Bit> FLOAT_SPECIES =
>             FloatVector.species(Shapes.S_128_BIT);
> 
>     static final ShortVector.ShortSpecies<Shapes.S64Bit> SHORT_SPECIES =
>             ShortVector.species(Shapes.S_64_BIT);
> 
>     static final LongVector.LongSpecies<Shapes.S256Bit> LONG_SPECIES =
>             LongVector.species(Shapes.S_256_BIT);
> 
>     @Param({"1024"})
>     private int size;
> 
>     private int[] a;
>     private int[] ri;
>     private float[] rf;
>     private short[] rs;
>     private long[] rl;
> 
>     @Setup
>     public void setUp() {
>         a = new int[size];
>         ri = new int[size];
>         rf = new float[size];
>         rs = new short[size];
>         rl = new long[size];
>         for (int i = 0; i < size; i++) {
>             a[i] = 1;
>         }
>     }
> 
>     @Benchmark
>     public int[] castIntInt() {
>         for (int i = 0; i < a.length; i += INT_SPECIES.length()) {
>             IntVector<Shapes.S128Bit> av = INT_SPECIES.fromArray(a, i);
>             INT_SPECIES.cast(av).intoArray(ri, i);
>         }
>         return ri;
>     }
> 
>     @Benchmark
>     public float[] castIntFloat() {
>         for (int i = 0; i < a.length; i += INT_SPECIES.length()) {
>             IntVector<Shapes.S128Bit> av = INT_SPECIES.fromArray(a, i);
>             FLOAT_SPECIES.cast(av).intoArray(rf, i);
>         }
>         return rf;
>     }
> 
>     @Benchmark
>     public short[] castIntShort() {
>         for (int i = 0; i < a.length; i += INT_SPECIES.length()) {
>             IntVector<Shapes.S128Bit> av = INT_SPECIES.fromArray(a, i);
>             SHORT_SPECIES.cast(av).intoArray(rs, i);
>         }
>         return rs;
>     }
> 
>     @Benchmark
>     public long[] castIntLong() {
>         for (int i = 0; i < a.length; i += INT_SPECIES.length()) {
>             IntVector<Shapes.S128Bit> av = INT_SPECIES.fromArray(a, i);
>             LONG_SPECIES.cast(av).intoArray(rl, i);
>         }
>         return rl;
>     }
> }
>  
> 
> 
> On Jun 4, 2018, at 4:14 PM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
>  
> 
> 
> 
> On Jun 4, 2018, at 3:59 PM, Lupusoru, Razvan A <razvan.a.lupusoru at intel.com> wrote:
> 
> Hey Paul,
> 
> I am not sure just from looking at it, but I believe it should work. Hotspot already inlines o.bitSize() and this is based on type profile. Thus technically the cast is not needed since it should know by that point what type “o” is. The only part I am unsure about is whether the call to o.getClass() gets inlined so that Hotspot intrinsification resolves the class to a “constant oop”. Would you be able to do a simple cast micro and see if the generated code still looks good? If yes, then you can go ahead with your change.
> 
> 
> OK, I can write a micro benchmark to check.
> 
> Separately, should we simplify the cast intrinsic itself? That would likely require a split of the shared code for _VectorReinterpret and _VectorCast, which may be a good thing in terms of clarity.
> 
> 
> Thanks,
> Paul.


