RFR: 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE
Bhavana Kilambi
bkilambi at openjdk.org
Tue Mar 7 11:05:03 UTC 2023
The cast operation for VectorMask from wider type to narrow type returns incorrect result for trueCount() method invocation for the resultant mask with SVE (on some SVE machines toLong() also results in incorrect values). An example narrow operation which results in incorrect toLong() and trueCount() values is shown below for a 128-bit -> 64-bit conversion and this can be extended to other narrow operations where the source mask in bytes is either 4x or 8x the size of the result mask in bytes -
public class TestMaskCast {
static final boolean [] mask_arr = {true, true, false, true};
public static long narrow_long() {
VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
return lmask128.cast(IntVector.SPECIES_64).toLong();
}
public static void main(String[] args) {
long r = 0L;
for (int ic = 0; ic < 50000; ic++) {
r = narrow_long();
}
System.out.println("toLong() : " + r);
}
}
**C2 compilation result :**
java --add-modules jdk.incubator.vector TestMaskCast
toLong(): 15
**Interpreter result (for verification) :**
java --add-modules jdk.incubator.vector -Xint TestMaskCast
toLong(): 3
The incorrect results with toLong() have been observed only on the 128-bit and 256-bit SVE machines but they are not reproducible on a 512-bit machine. However, trueCount() returns incorrect values too and they are reproducible on all the SVE machines and thus is more reliable to use trueCount() to bring out the drawbacks of the current implementation of mask cast narrow operation for SVE.
Replacing the call to toLong() by trueCount() in the above example -
public class TestMaskCast {
static final boolean [] mask_arr = {true, true, false, true};
public static int narrow_long() {
VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
return lmask128.cast(IntVector.SPECIES_64).trueCount();
}
public static void main(String[] args) {
int r = 0;
for (int ic = 0; ic < 50000; ic++) {
r = narrow_long();
}
System.out.println("trueCount() : " + r);
}
}
**C2 compilation result:**
java --add-modules jdk.incubator.vector TestMaskCast
trueCount() : 4
**Interpreter result:**
java --add-modules jdk.incubator.vector -Xint TestMaskCast
trueCount() : 2
Since in this example, the source mask size in bytes is 2x that of the result mask, trueCount() returns 2x the number of true elements in the source mask. It would return 4x/8x the number of true elements in the source mask if the size of the source mask is 4x/8x that of result mask.
The returned values are incorrect because of the higher order bits in the result not being cleared (since the result is narrowed down) and trueCount() or toLong() tend to consider the higher order bits in the vector register as well which results in incorrect value. For the 128-bit to 64-bit conversion with a mask - "TT" passed, the current implementation for mask cast narrow operation returns the same mask in the lower and upper half of the 128-bit register that is - "TTTT" which results in a long value of 15 (instead of 3 - "FFTT" for the 64-bit Integer mask) and number of true elements to be 4 (instead of 2).
This patch proposes a fix for this problem. An already existing JTREG IR test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java" has also been modified to call the trueCount() method as well since the toString() method alone cannot be used to reproduce the incorrect values in this bug. This test passes successfully on 128-bit, 256-bit and 512-bit SVE machines. Since the IR test has been changed, it has been tested successfully on other platforms like x86 and aarch64 Neon machines as well to ensure the changes have not introduced any new errors.
-------------
Commit messages:
- 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE
Changes: https://git.openjdk.org/jdk/pull/12901/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12901&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8303161
Stats: 589 lines in 5 files changed: 449 ins; 0 del; 140 mod
Patch: https://git.openjdk.org/jdk/pull/12901.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12901/head:pull/12901
PR: https://git.openjdk.org/jdk/pull/12901
More information about the hotspot-compiler-dev
mailing list