Integrated: 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE

Bhavana Kilambi bkilambi at openjdk.org
Wed Mar 29 16:16:40 UTC 2023


On Tue, 7 Mar 2023 10:56:40 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

> On SVE, casting a VectorMask from a wider element type to a narrower one yields a mask for which trueCount() returns an incorrect result (on some SVE machines toLong() is incorrect as well). The example below shows a narrow operation with incorrect toLong() and trueCount() values for a 128-bit -> 64-bit conversion; the same problem extends to other narrow operations where the source mask is 4x or 8x the size of the result mask in bytes -
> 
> 
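> import jdk.incubator.vector.IntVector;
> import jdk.incubator.vector.LongVector;
> import jdk.incubator.vector.VectorMask;
> 
> public class TestMaskCast {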
> 
>     static final boolean [] mask_arr = {true, true, false, true};
> 
>     public static long narrow_long() {
>         VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
>         return lmask128.cast(IntVector.SPECIES_64).toLong();
>     }
> 
>     public static void main(String[] args) {
>         long r = 0L;
>         for (int ic = 0; ic < 50000; ic++) {
>             r = narrow_long();
>         }
>         System.out.println("toLong() :  " + r);
>     }
> }
> 
> 
> **C2 compilation result :**
> java --add-modules jdk.incubator.vector TestMaskCast
> toLong() :  15
> 
> **Interpreter result (for verification) :**
> java --add-modules jdk.incubator.vector -Xint TestMaskCast
> toLong() :  3
> 
> The incorrect toLong() results have been observed only on 128-bit and 256-bit SVE machines; they are not reproducible on a 512-bit machine. However, trueCount() also returns incorrect values, and these are reproducible on all SVE machines, so trueCount() is the more reliable way to expose the shortcomings of the current implementation of the mask cast narrow operation for SVE.
> 
> Replacing the call to toLong() with trueCount() in the above example -
> 
> 
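> import jdk.incubator.vector.IntVector;
> import jdk.incubator.vector.LongVector;
> import jdk.incubator.vector.VectorMask;
> 
> public class TestMaskCast {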
> 
>     static final boolean [] mask_arr = {true, true, false, true};
> 
>     public static int narrow_long() {
>         VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
>         return lmask128.cast(IntVector.SPECIES_64).trueCount();
>     }
> 
>     public static void main(String[] args) {
>         int r = 0;
>         for (int ic = 0; ic < 50000; ic++) {
>             r = narrow_long();
>         }
>         System.out.println("trueCount() :  " + r);
>     }
> }
> 
> 
> 
> **C2 compilation result:**
> java --add-modules jdk.incubator.vector TestMaskCast 
> trueCount() :  4
> 
> **Interpreter result:**
> java --add-modules jdk.incubator.vector -Xint TestMaskCast 
> trueCount() :  2
> 
> Since the source mask size in bytes is 2x that of the result mask in this example, trueCount() returns 2x the number of true elements in the source mask. It would return 4x/8x the number of true elements in the source mask if the source mask were 4x/8x the size of the result mask.
> 
> The returned values are incorrect because the higher-order bits of the result are not cleared after narrowing, and trueCount() and toLong() can take those higher-order bits of the vector register into account as well. For the 128-bit to 64-bit conversion with the mask "TT", the current implementation of the mask cast narrow operation returns the same mask in both the lower and upper halves of the 128-bit register, that is "TTTT". This results in a long value of 15 (instead of 3, "FFTT", for the 64-bit Integer mask) and a true count of 4 (instead of 2).
> 
> This patch proposes a fix for this problem. The existing JTREG IR test "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java" has been modified to also call the trueCount() method, since the toString() method alone cannot reproduce the incorrect values in this bug. The test passes on 128-bit, 256-bit and 512-bit SVE machines. Because the IR test has changed, it has also been run successfully on other platforms, such as x86 and AArch64 Neon machines, to ensure the changes have not introduced any new errors.
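
The replication described in the quoted text can be illustrated with a small model in plain Java. This is only a sketch: it does not use the Vector API or any HotSpot code, the class and method names (MaskNarrowModel, narrowReplicated, narrowCleared) are hypothetical, and the tiling used for factors of 4 and 8 is an extrapolation from the description rather than a statement of what the SVE code or the patch actually does.

```java
public class MaskNarrowModel {

    // Models the behaviour described above: after narrowing, the register
    // holds `factor` copies of the mask instead of one copy in the low lanes.
    // (Assumption: factors of 4 and 8 tile the mask the same way as 2.)
    static boolean[] narrowReplicated(boolean[] mask, int factor) {
        boolean[] reg = new boolean[mask.length * factor];
        for (int i = 0; i < reg.length; i++) {
            reg[i] = mask[i % mask.length];
        }
        return reg;
    }

    // Models the expected result: the mask in the low lanes and all
    // higher-order lanes cleared.
    static boolean[] narrowCleared(boolean[] mask, int factor) {
        boolean[] reg = new boolean[mask.length * factor]; // defaults to false
        System.arraycopy(mask, 0, reg, 0, mask.length);
        return reg;
    }

    // Packs the lanes into a long, lane 0 in the least significant bit.
    static long toLong(boolean[] reg) {
        long bits = 0L;
        for (int i = 0; i < reg.length; i++) {
            if (reg[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Counts the set lanes across the whole modelled register.
    static int trueCount(boolean[] reg) {
        int n = 0;
        for (boolean b : reg) {
            if (b) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        boolean[] tt = {true, true};           // the "TT" source mask
        boolean[] bad = narrowReplicated(tt, 2);
        boolean[] good = narrowCleared(tt, 2);
        System.out.println("replicated: toLong=" + toLong(bad)
                + " trueCount=" + trueCount(bad));
        System.out.println("cleared:    toLong=" + toLong(good)
                + " trueCount=" + trueCount(good));
    }
}
```

For the "TT" mask with a factor of 2, the replicated register gives toLong=15 and trueCount=4, while the cleared register gives toLong=3 and trueCount=2, matching the C2 and interpreter results quoted above.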

This pull request has now been integrated.

Changeset: 67274906
Author:    Bhavana Kilambi <bkilambi at openjdk.org>
Committer: Nick Gasson <ngasson at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/67274906aeb7a6b83761e6aaf85688aa61aa8a20
Stats:     589 lines in 5 files changed: 449 ins; 0 del; 140 mod

8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE

Reviewed-by: eliu, xgong, ngasson

-------------

PR: https://git.openjdk.org/jdk/pull/12901


More information about the hotspot-compiler-dev mailing list