RFR: 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE [v2]
Nick Gasson
ngasson at openjdk.org
Wed Mar 29 14:03:42 UTC 2023
On Tue, 28 Mar 2023 09:46:19 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> The cast operation for VectorMask from wider type to narrow type returns incorrect result for trueCount() method invocation for the resultant mask with SVE (on some SVE machines toLong() also results in incorrect values). An example narrow operation which results in incorrect toLong() and trueCount() values is shown below for a 128-bit -> 64-bit conversion and this can be extended to other narrow operations where the source mask in bytes is either 4x or 8x the size of the result mask in bytes -
>>
>>
>> public class TestMaskCast {
>>
>> static final boolean [] mask_arr = {true, true, false, true};
>>
>> public static long narrow_long() {
>> VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
>> return lmask128.cast(IntVector.SPECIES_64).toLong();
>> }
>>
>> public static void main(String[] args) {
>> long r = 0L;
>> for (int ic = 0; ic < 50000; ic++) {
>> r = narrow_long();
>> }
>> System.out.println("toLong() : " + r);
>> }
>> }
>>
>>
>> **C2 compilation result :**
>> java --add-modules jdk.incubator.vector TestMaskCast
>> toLong(): 15
>>
>> **Interpreter result (for verification) :**
>> java --add-modules jdk.incubator.vector -Xint TestMaskCast
>> toLong(): 3
>>
>> The incorrect results with toLong() have been observed only on the 128-bit and 256-bit SVE machines but they are not reproducible on a 512-bit machine. However, trueCount() returns incorrect values too and they are reproducible on all the SVE machines and thus is more reliable to use trueCount() to bring out the drawbacks of the current implementation of mask cast narrow operation for SVE.
>>
>> Replacing the call to toLong() by trueCount() in the above example -
>>
>>
>> public class TestMaskCast {
>>
>> static final boolean [] mask_arr = {true, true, false, true};
>>
>> public static int narrow_long() {
>> VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
>> return lmask128.cast(IntVector.SPECIES_64).trueCount();
>> }
>>
>> public static void main(String[] args) {
>> int r = 0;
>> for (int ic = 0; ic < 50000; ic++) {
>> r = narrow_long();
>> }
>> System.out.println("trueCount() : " + r);
>> }
>> }
>>
>>
>>
>> **C2 compilation result:**
>> java --add-modules jdk.incubator.vector TestMaskCast
>> trueCount() : 4
>>
>> **Interpreter result:**
>> java --add-modules jdk.incubator.vector -Xint TestMaskCast
>> trueCount() : 2
>>
>> Since in this example, the source mask size in bytes is 2x that of the result mask, trueCount() returns 2x the number of true elements in the source mask. It would return 4x/8x the number of true elements in the source mask if the size of the source mask is 4x/8x that of result mask.
>>
>> The returned values are incorrect because of the higher order bits in the result not being cleared (since the result is narrowed down) and trueCount() or toLong() tend to consider the higher order bits in the vector register as well which results in incorrect value. For the 128-bit to 64-bit conversion with a mask - "TT" passed, the current implementation for mask cast narrow operation returns the same mask in the lower and upper half of the 128-bit register that is - "TTTT" which results in a long value of 15 (instead of 3 - "FFTT" for the 64-bit Integer mask) and number of true elements to be 4 (instead of 2).
>>
>> This patch proposes a fix for this problem. An already existing JTREG IR test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java" has also been modified to call the trueCount() method as well since the toString() method alone cannot be used to reproduce the incorrect values in this bug. This test passes successfully on 128-bit, 256-bit and 512-bit SVE machines. Since the IR test has been changed, it has been tested successfully on other platforms like x86 and aarch64 Neon machines as well to ensure the changes have not introduced any new errors.
>
> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge master
> - 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE
>
> The cast operation for VectorMask from wider type to narrow type returns
> incorrect result for trueCount() method invocation for the resultant
> mask with SVE (on some SVE machines toLong() also results in incorrect
> values). An example narrow operation which results in incorrect toLong()
> and trueCount() values is shown below for a 128-bit -> 64-bit conversion
> and this can be extended to other narrow operations where the source
> mask in bytes is either 4x or 8x the size of the result mask in
> bytes -
>
> public class TestMaskCast {
>
> static final boolean [] mask_arr = {true, true, false, true};
>
> public static long narrow_long() {
> VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
> return lmask128.cast(IntVector.SPECIES_64).toLong();
> }
>
> public static void main(String[] args) {
> long r = 0L;
> for (int ic = 0; ic < 50000; ic++) {
> r = narrow_long();
> }
> System.out.println("toLong() : " + r);
> }
> }
>
> C2 compilation result :
> java --add-modules jdk.incubator.vector TestMaskCast
> toLong(): 15
>
> Interpreter result (for verification) :
> java --add-modules jdk.incubator.vector -Xint TestMaskCast
> toLong(): 3
>
> The incorrect results with toLong() have been observed only on the
> 128-bit and 256-bit SVE machines but they are not reproducible on a
> 512-bit machine. However, trueCount() returns incorrect values too
> and they are reproducible on all the SVE machines and thus is more
> reliable to use trueCount() to bring out the drawbacks of the current
> implementation of mask cast narrow operation for SVE.
>
> Replacing the call to toLong() by trueCount() in the above example -
> public class TestMaskCast {
>
> static final boolean [] mask_arr = {true, true, false, true};
>
> public static int narrow_long() {
> VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
> return lmask128.cast(IntVector.SPECIES_64).trueCount();
> }
>
> public static void main(String[] args) {
> int r = 0;
> for (int ic = 0; ic < 50000; ic++) {
> r = narrow_long();
> }
> System.out.println("trueCount() : " + r);
> }
> }
>
> C2 compilation result:
> java --add-modules jdk.incubator.vector TestMaskCast
> trueCount() : 4
>
> Interpreter result:
> java --add-modules jdk.incubator.vector -Xint TestMaskCast
> trueCount() : 2
>
> Since in this example, the source mask size in bytes is 2x that of the
> result mask, trueCount() returns 2x the number of true elements in the
> source mask. It would return 4x/8x the number of true elements in the
> source mask if the size of the source mask is 4x/8x that of result mask.
>
> The returned values are incorrect because of the higher order bits in
> the result not being cleared (since the result is narrowed down) and
> trueCount() or toLong() tend to consider the higher order bits in the
> vector register as well which results in incorrect value.
> For the 128-bit to 64-bit conversion with a mask - "TT" passed, the
> current implementation for mask cast narrow operation returns the same
> mask in the lower and upper half of the 128-bit register that is -
> "TTTT" which results in a long value of 15 (instead of 3 - "FFTT" for
> the 64-bit Integer mask) and number of true elements to be 4 (instead of
> 2).
>
> This patch proposes a fix for this problem. An already existing JTREG IR
> test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java"
> has also been modified to call the trueCount() method as well since the
> toString() method alone cannot be used to reproduce the incorrect values
> in this bug. This test passes successfully on 128-bit, 256-bit and
> 512-bit SVE machines. Since the IR test has been changed, it has been
> tested successfully on other platforms like x86 and aarch64 Neon
> machines as well to ensure the changes have not introduced any new
> errors.
Marked as reviewed by ngasson (Reviewer).
-------------
PR Review: https://git.openjdk.org/jdk/pull/12901#pullrequestreview-1363183016
More information about the hotspot-compiler-dev
mailing list