RFR: 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE [v2]
Bhavana Kilambi
bkilambi at openjdk.org
Tue Mar 28 09:46:19 UTC 2023
> The cast operation for VectorMask from wider type to narrow type returns incorrect result for trueCount() method invocation for the resultant mask with SVE (on some SVE machines toLong() also results in incorrect values). An example narrow operation which results in incorrect toLong() and trueCount() values is shown below for a 128-bit -> 64-bit conversion and this can be extended to other narrow operations where the source mask in bytes is either 4x or 8x the size of the result mask in bytes -
>
>
> public class TestMaskCast {
>
> static final boolean [] mask_arr = {true, true, false, true};
>
> public static long narrow_long() {
> VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
> return lmask128.cast(IntVector.SPECIES_64).toLong();
> }
>
> public static void main(String[] args) {
> long r = 0L;
> for (int ic = 0; ic < 50000; ic++) {
> r = narrow_long();
> }
> System.out.println("toLong() : " + r);
> }
> }
>
>
> **C2 compilation result :**
> java --add-modules jdk.incubator.vector TestMaskCast
> toLong(): 15
>
> **Interpreter result (for verification) :**
> java --add-modules jdk.incubator.vector -Xint TestMaskCast
> toLong(): 3
>
> The incorrect results with toLong() have been observed only on the 128-bit and 256-bit SVE machines but they are not reproducible on a 512-bit machine. However, trueCount() returns incorrect values too and they are reproducible on all the SVE machines and thus is more reliable to use trueCount() to bring out the drawbacks of the current implementation of mask cast narrow operation for SVE.
>
> Replacing the call to toLong() by trueCount() in the above example -
>
>
> public class TestMaskCast {
>
> static final boolean [] mask_arr = {true, true, false, true};
>
> public static int narrow_long() {
> VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
> return lmask128.cast(IntVector.SPECIES_64).trueCount();
> }
>
> public static void main(String[] args) {
> int r = 0;
> for (int ic = 0; ic < 50000; ic++) {
> r = narrow_long();
> }
> System.out.println("trueCount() : " + r);
> }
> }
>
>
>
> **C2 compilation result:**
> java --add-modules jdk.incubator.vector TestMaskCast
> trueCount() : 4
>
> **Interpreter result:**
> java --add-modules jdk.incubator.vector -Xint TestMaskCast
> trueCount() : 2
>
> Since in this example, the source mask size in bytes is 2x that of the result mask, trueCount() returns 2x the number of true elements in the source mask. It would return 4x/8x the number of true elements in the source mask if the size of the source mask is 4x/8x that of result mask.
>
> The returned values are incorrect because of the higher order bits in the result not being cleared (since the result is narrowed down) and trueCount() or toLong() tend to consider the higher order bits in the vector register as well which results in incorrect value. For the 128-bit to 64-bit conversion with a mask - "TT" passed, the current implementation for mask cast narrow operation returns the same mask in the lower and upper half of the 128-bit register that is - "TTTT" which results in a long value of 15 (instead of 3 - "FFTT" for the 64-bit Integer mask) and number of true elements to be 4 (instead of 2).
>
> This patch proposes a fix for this problem. An already existing JTREG IR test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java" has also been modified to call the trueCount() method as well since the toString() method alone cannot be used to reproduce the incorrect values in this bug. This test passes successfully on 128-bit, 256-bit and 512-bit SVE machines. Since the IR test has been changed, it has been tested successfully on other platforms like x86 and aarch64 Neon machines as well to ensure the changes have not introduced any new errors.
Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
- Merge master
- 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE
The cast operation for VectorMask from wider type to narrow type returns
incorrect result for trueCount() method invocation for the resultant
mask with SVE (on some SVE machines toLong() also results in incorrect
values). An example narrow operation which results in incorrect toLong()
and trueCount() values is shown below for a 128-bit -> 64-bit conversion
and this can be extended to other narrow operations where the source
mask in bytes is either 4x or 8x the size of the result mask in
bytes -
public class TestMaskCast {
static final boolean [] mask_arr = {true, true, false, true};
public static long narrow_long() {
VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
return lmask128.cast(IntVector.SPECIES_64).toLong();
}
public static void main(String[] args) {
long r = 0L;
for (int ic = 0; ic < 50000; ic++) {
r = narrow_long();
}
System.out.println("toLong() : " + r);
}
}
C2 compilation result :
java --add-modules jdk.incubator.vector TestMaskCast
toLong(): 15
Interpreter result (for verification) :
java --add-modules jdk.incubator.vector -Xint TestMaskCast
toLong(): 3
The incorrect results with toLong() have been observed only on the
128-bit and 256-bit SVE machines but they are not reproducible on a
512-bit machine. However, trueCount() returns incorrect values too
and they are reproducible on all the SVE machines and thus is more
reliable to use trueCount() to bring out the drawbacks of the current
implementation of mask cast narrow operation for SVE.
Replacing the call to toLong() by trueCount() in the above example -
public class TestMaskCast {
static final boolean [] mask_arr = {true, true, false, true};
public static int narrow_long() {
VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
return lmask128.cast(IntVector.SPECIES_64).trueCount();
}
public static void main(String[] args) {
int r = 0;
for (int ic = 0; ic < 50000; ic++) {
r = narrow_long();
}
System.out.println("trueCount() : " + r);
}
}
C2 compilation result:
java --add-modules jdk.incubator.vector TestMaskCast
trueCount() : 4
Interpreter result:
java --add-modules jdk.incubator.vector -Xint TestMaskCast
trueCount() : 2
Since in this example, the source mask size in bytes is 2x that of the
result mask, trueCount() returns 2x the number of true elements in the
source mask. It would return 4x/8x the number of true elements in the
source mask if the size of the source mask is 4x/8x that of result mask.
The returned values are incorrect because of the higher order bits in
the result not being cleared (since the result is narrowed down) and
trueCount() or toLong() tend to consider the higher order bits in the
vector register as well which results in incorrect value.
For the 128-bit to 64-bit conversion with a mask - "TT" passed, the
current implementation for mask cast narrow operation returns the same
mask in the lower and upper half of the 128-bit register that is -
"TTTT" which results in a long value of 15 (instead of 3 - "FFTT" for
the 64-bit Integer mask) and number of true elements to be 4 (instead of
2).
This patch proposes a fix for this problem. An already existing JTREG IR
test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java"
has also been modified to call the trueCount() method as well since the
toString() method alone cannot be used to reproduce the incorrect values
in this bug. This test passes successfully on 128-bit, 256-bit and
512-bit SVE machines. Since the IR test has been changed, it has been
tested successfully on other platforms like x86 and aarch64 Neon
machines as well to ensure the changes have not introduced any new
errors.
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/12901/files
- new: https://git.openjdk.org/jdk/pull/12901/files/639281ed..ccb23e2d
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=12901&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=12901&range=00-01
Stats: 160063 lines in 1880 files changed: 111867 ins; 29582 del; 18614 mod
Patch: https://git.openjdk.org/jdk/pull/12901.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/12901/head:pull/12901
PR: https://git.openjdk.org/jdk/pull/12901
More information about the hotspot-compiler-dev
mailing list