RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations

Sandhya Viswanathan sviswanathan at openjdk.java.net
Thu Apr 29 23:39:00 UTC 2021


On Thu, 29 Apr 2021 22:59:13 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
>> 
>> Changes include:
>>  * Rewrite Vector API slice/unslice using already intrinsic methods
>>  * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails
>>  * Vector API conversion tests thresholds adjustment
>>  
>> Base Performance:
>> Benchmark (size) Mode Cnt Score Error Units
>> TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms
>> TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms
>> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms
>> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms
>> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
>> 
>> Performance with patch:
>> Benchmark (size) Mode Cnt Score Error Units
>> TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms
>> TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms
>> TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms
>> TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms
>> TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
>
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2283:
> 
>> 2281:     @ForceInline
>> 2282:     $abstractvectortype$ sliceTemplate(int origin) {
>> 2283:         Objects.checkIndex(origin, length());
> 
> For `Objects.checkIndex` the upper bound is exclusive, where as i think in this case it needs to be inclusive.
> 
> The original bound check was:
> 
>        if ((origin < 0) || (origin >= VLENGTH)) {
>          throw new ArrayIndexOutOfBoundsException("Index " + origin + " out of bounds for vector length " + VLENGTH);
>        }
> ...
> 
> in accordance with the spec in the JavaDoc.

Thanks for pointing this out. I will update this to Objects.checkIndex(origin, length()+1);

> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2284:
> 
>> 2282:     $abstractvectortype$ sliceTemplate(int origin) {
>> 2283:         Objects.checkIndex(origin, length());
>> 2284:         VectorShuffle<$Boxtype$> Iota = iotaShuffle();
> 
> Suggestion:
> 
>         VectorShuffle<$Boxtype$> iota = iotaShuffle();
> 
> use lower case first letter for variable names.

I will make this change.

> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2287:
> 
>> 2285:         VectorMask<$Boxtype$> BlendMask = Iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
>> 2286:         Iota = iotaShuffle(origin, 1, true);
>> 2287:         return vspecies().zero().blend(this.rearrange(Iota), BlendMask);
> 
> Observation: this is like `this.rearrange(iota, blendMask)` but we avoid the checking for exceptional shuffle indexes, since we know there are no such exceptional indexes.
> 
> And, of course,this method is like `slice(origin vspecies().zero())`. I wonder if this template should defer to the vector accepting tempting? Or the method on vector itself?

Trying to optimize as much as possible:
  There is overhead in checking for exception shuffle indices. 
  The slice(origin, vector) version has one extra rearrange call.

> test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java line 51:
> 
>> 49:     }
>> 50: 
>> 51:     static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100);
> 
> Why did you change this?

Following reasons:
The INVOC_COUNT is set as 100 for all other tests.
The conversion tests take longer time to complete than other tests.
For the conversion tests it looks like the intrinsics don't trigger sometimes and tests take longer to execute and timeout. This was one of the reasons for backout last time.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3804


More information about the hotspot-compiler-dev mailing list