RFR: 8265128: [REDO] Optimize Vector API slice and unslice operations
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin). Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms ------------- Commit messages: - 8265128: [REDO] Optimize Vector API slice and unslice operations Changes: https://git.openjdk.java.net/jdk/pull/3804/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8265128 Stats: 886 lines in 44 files changed: 175 ins; 518 del; 193 mod Patch: https://git.openjdk.java.net/jdk/pull/3804.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3804/head:pull/3804 PR: https://git.openjdk.java.net/jdk/pull/3804
On Thu, 29 Apr 2021 21:29:03 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2283:
2281: @ForceInline 2282: $abstractvectortype$ sliceTemplate(int origin) { 2283: Objects.checkIndex(origin, length());
For `Objects.checkIndex` the upper bound is exclusive, where as i think in this case it needs to be inclusive. The original bound check was: if ((origin < 0) || (origin >= VLENGTH)) { throw new ArrayIndexOutOfBoundsException("Index " + origin + " out of bounds for vector length " + VLENGTH); } ... in accordance with the spec in the JavaDoc. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2284:
2282: $abstractvectortype$ sliceTemplate(int origin) { 2283: Objects.checkIndex(origin, length()); 2284: VectorShuffle<$Boxtype$> Iota = iotaShuffle();
Suggestion: VectorShuffle<$Boxtype$> iota = iotaShuffle(); use lower case first letter for variable names. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2287:
2285: VectorMask<$Boxtype$> BlendMask = Iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin)))); 2286: Iota = iotaShuffle(origin, 1, true); 2287: return vspecies().zero().blend(this.rearrange(Iota), BlendMask);
Observation: this is like `this.rearrange(iota, blendMask)` but we avoid the checking for exceptional shuffle indexes, since we know there are no such exceptional indexes. And, of course,this method is like `slice(origin vspecies().zero())`. I wonder if this template should defer to the vector accepting tempting? Or the method on vector itself? test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java line 51:
49: } 50: 51: static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100);
Why did you change this? ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Thu, 29 Apr 2021 22:59:13 GMT, Paul Sandoz <psandoz@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2283:
2281: @ForceInline 2282: $abstractvectortype$ sliceTemplate(int origin) { 2283: Objects.checkIndex(origin, length());
For `Objects.checkIndex` the upper bound is exclusive, where as i think in this case it needs to be inclusive.
The original bound check was:
if ((origin < 0) || (origin >= VLENGTH)) { throw new ArrayIndexOutOfBoundsException("Index " + origin + " out of bounds for vector length " + VLENGTH); } ...
in accordance with the spec in the JavaDoc.
Thanks for pointing this out. I will update this to Objects.checkIndex(origin, length()+1);
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2284:
2282: $abstractvectortype$ sliceTemplate(int origin) { 2283: Objects.checkIndex(origin, length()); 2284: VectorShuffle<$Boxtype$> Iota = iotaShuffle();
Suggestion:
VectorShuffle<$Boxtype$> iota = iotaShuffle();
use lower case first letter for variable names.
I will make this change.
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2287:
2285: VectorMask<$Boxtype$> BlendMask = Iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin)))); 2286: Iota = iotaShuffle(origin, 1, true); 2287: return vspecies().zero().blend(this.rearrange(Iota), BlendMask);
Observation: this is like `this.rearrange(iota, blendMask)` but we avoid the checking for exceptional shuffle indexes, since we know there are no such exceptional indexes.
And, of course,this method is like `slice(origin vspecies().zero())`. I wonder if this template should defer to the vector accepting tempting? Or the method on vector itself?
Trying to optimize as much as possible: There is overhead in checking for exception shuffle indices. The slice(origin, vector) version has one extra rearrange call.
test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java line 51:
49: } 50: 51: static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100);
Why did you change this?
Following reasons: The INVOC_COUNT is set as 100 for all other tests. The conversion tests take longer time to complete than other tests. For the conversion tests it looks like the intrinsics don't trigger sometimes and tests take longer to execute and timeout. This was one of the reasons for backout last time. ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Thu, 29 Apr 2021 23:36:22 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
test/jdk/jdk/incubator/vector/AbstractVectorConversionTest.java line 51:
49: } 50: 51: static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100);
Why did you change this?
Following reasons: The INVOC_COUNT is set as 100 for all other tests. The conversion tests take longer time to complete than other tests. For the conversion tests it looks like the intrinsics don't trigger sometimes and tests take longer to execute and timeout. This was one of the reasons for backout last time.
ok ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Review comments implementation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3804/files - new: https://git.openjdk.java.net/jdk/pull/3804/files/54dd9206..59275ff6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=00-01 Stats: 140 lines in 7 files changed: 0 ins; 0 del; 140 mod Patch: https://git.openjdk.java.net/jdk/pull/3804.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3804/head:pull/3804 PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 00:17:24 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
Review comments implementation
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2255:
2253: Objects.checkIndex(origin, length() + 1); 2254: VectorShuffle<$Boxtype$> iota = iotaShuffle(); 2255: VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
Suggestion: VectorMask<$Boxtype$> blendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin)))); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2257:
2255: VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin)))); 2256: iota = iotaShuffle(origin, 1, true); 2257: return (($abstractvectortype$)v1).rearrange(iota).blend(this.rearrange(iota), BlendMask);
Suggestion: return that.rearrange(iota).blend(this.rearrange(iota), blendMask); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2285:
2283: Objects.checkIndex(origin, length() + 1); 2284: VectorShuffle<$Boxtype$> iota = iotaShuffle(); 2285: VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
Suggestion: VectorMask<$Boxtype$> blendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin)))); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2287:
2285: VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin)))); 2286: iota = iotaShuffle(origin, 1, true); 2287: return vspecies().zero().blend(this.rearrange(iota), BlendMask);
Suggestion: return vspecies().zero().blend(this.rearrange(iota), blendMask); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2306:
2304: Objects.checkIndex(origin, length() + 1); 2305: VectorShuffle<$Boxtype$> iota = iotaShuffle(); 2306: VectorMask<$Boxtype$> BlendMask = iota.toVector().compare((part == 0) ? VectorOperators.GE : VectorOperators.LT,
Suggestion: VectorMask<$Boxtype$> blendMask = iota.toVector().compare((part == 0) ? VectorOperators.GE : VectorOperators.LT, src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2309:
2307: (broadcast(($type$)(origin)))); 2308: iota = iotaShuffle(-origin, 1, true); 2309: return (($abstractvectortype$)w).blend(this.rearrange(iota), BlendMask);
Suggestion: return that.blend(this.rearrange(iota), blendMask); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2346:
2344: Objects.checkIndex(origin, length() + 1); 2345: VectorShuffle<$Boxtype$> iota = iotaShuffle(); 2346: VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.GE,
Suggestion: VectorMask<$Boxtype$> blendMask = iota.toVector().compare(VectorOperators.GE, src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2349:
2347: (broadcast(($type$)(origin)))); 2348: iota = iotaShuffle(-origin, 1, true); 2349: return vspecies().zero().blend(this.rearrange(iota), BlendMask);
Suggestion: return vspecies().zero().blend(this.rearrange(iota), blendMask); ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 00:17:24 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
Review comments implementation
@PaulSandoz Thanks a lot for these suggestions. I have included all of them in this updated commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Thu, 29 Apr 2021 21:29:03 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
@PaulSandoz I have implemented your review comments. Please take a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Review comments: blendmask etc ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3804/files - new: https://git.openjdk.java.net/jdk/pull/3804/files/59275ff6..94f184ef Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=01-02 Stats: 56 lines in 7 files changed: 0 ins; 0 del; 56 mod Patch: https://git.openjdk.java.net/jdk/pull/3804.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3804/head:pull/3804 PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 01:58:27 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
Review comments: blendmask etc
Marked as reviewed by psandoz (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 02:31:07 GMT, Paul Sandoz <psandoz@openjdk.org> wrote:
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
Review comments: blendmask etc
Marked as reviewed by psandoz (Reviewer).
@PaulSandoz would it be possible for you to run this through your testing? ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 02:31:07 GMT, Paul Sandoz <psandoz@openjdk.org> wrote:
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
Review comments: blendmask etc
Marked as reviewed by psandoz (Reviewer).
@PaulSandoz would it be possible for you to run this through your testing?
Started, will report back when done. ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 17:44:38 GMT, Paul Sandoz <psandoz@openjdk.org> wrote:
@PaulSandoz would it be possible for you to run this through your testing?
Started, will report back when done.
Tier 1 to 3 tests all pass on build profiles linux-x64 linux-aarch64 macosx-x64 windows-x64 ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 23:34:15 GMT, Paul Sandoz <psandoz@openjdk.org> wrote:
@PaulSandoz would it be possible for you to run this through your testing?
Started, will report back when done.
Tier 1 to 3 tests all pass on build profiles linux-x64 linux-aarch64 macosx-x64 windows-x64
@PaulSandoz Thanks a lot! ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 23:34:15 GMT, Paul Sandoz <psandoz@openjdk.org> wrote:
@PaulSandoz would it be possible for you to run this through your testing?
Started, will report back when done.
@PaulSandoz would it be possible for you to run this through your testing?
Started, will report back when done.
Tier 1 to 3 tests all pass on build profiles linux-x64 linux-aarch64 macosx-x64 windows-x64
@PaulSandoz After we fixed the Objects.checkIndex arguments per your review comment, the changes in library_call.cpp are no more needed so I have backed them out. That leaves only changes in vector api code which you have reviewed. Please let me know if it is ok to push. ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Fri, 30 Apr 2021 01:58:27 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
Review comments: blendmask etc
@iwanowww Could you please review if the change in library_call.cpp looks ok to you? ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: library_call.cpp changes not needed after Objects.checkIndex arguments fixed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/3804/files - new: https://git.openjdk.java.net/jdk/pull/3804/files/94f184ef..14439667 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3804&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/3804.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3804/head:pull/3804 PR: https://git.openjdk.java.net/jdk/pull/3804
On Mon, 10 May 2021 18:31:30 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
library_call.cpp changes not needed after Objects.checkIndex arguments fixed
I don't claim to understand the HotSpot details, but Java changes still look good. ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3804
On Mon, 10 May 2021 18:31:30 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
library_call.cpp changes not needed after Objects.checkIndex arguments fixed
Looks good. PS: I still think there's a problem with `LibraryCallKit::inline_preconditions_checkIndex`: it shouldn't bail out intrinsification in the middle of the process leaving a partially constructed graph behind. I don't see why short-circuiting the logic once the path is dead (`if (stopped()) return true;`) won't work. But that's a topic for another fix. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3804
On Mon, 10 May 2021 18:31:30 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
library_call.cpp changes not needed after Objects.checkIndex arguments fixed
Thanks a lot for the review Paul and Vladimir. Vladimir, I will submit a separate patch for LibraryCallKit::inline_preconditions_checkIndex fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
On Thu, 29 Apr 2021 21:29:03 GMT, Sandhya Viswanathan <sviswanathan@openjdk.org> wrote:
All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).
Changes include: * Rewrite Vector API slice/unslice using already intrinsic methods * Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails * Vector API conversion tests thresholds adjustment
Base Performance: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms
Performance with patch: Benchmark (size) Mode Cnt Score Error Units TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms
This pull request has now been integrated. Changeset: 23446f1f Author: Sandhya Viswanathan <sviswanathan@openjdk.org> URL: https://git.openjdk.java.net/jdk/commit/23446f1f5ee087376bc1ab89413a011fc52b... Stats: 881 lines in 43 files changed: 172 ins; 518 del; 191 mod 8265128: [REDO] Optimize Vector API slice and unslice operations Reviewed-by: psandoz, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/3804
participants (3)
-
Paul Sandoz
-
Sandhya Viswanathan
-
Vladimir Ivanov