RFR: 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE [v2]

Bhavana Kilambi bkilambi at openjdk.org
Tue Mar 28 09:46:19 UTC 2023


> The cast operation for VectorMask from wider type to narrow type returns incorrect result for trueCount() method invocation for the resultant mask with SVE (on some SVE machines toLong() also results in incorrect values). An example narrow operation which results in incorrect toLong() and trueCount() values is shown below for a 128-bit -> 64-bit conversion and this can be extended to other narrow operations where the source mask in bytes is either 4x or 8x the size of the result mask in bytes -
> 
> 
> public class TestMaskCast {
> 
>     static final boolean [] mask_arr = {true, true, false, true};
> 
>     public static long narrow_long() {
>         VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
>         return lmask128.cast(IntVector.SPECIES_64).toLong();
>     }
> 
>     public static void main(String[] args) {
>         long r = 0L;
>         for (int ic = 0; ic < 50000; ic++) {
>             r = narrow_long();
>         }
>         System.out.println("toLong() :  " + r);
>     }
> }
> 
> 
> **C2 compilation result :**
> java --add-modules jdk.incubator.vector TestMaskCast
> toLong():  15
> 
> **Interpreter result (for verification) :**
> java --add-modules jdk.incubator.vector -Xint TestMaskCast
> toLong():  3
> 
> The incorrect results with toLong() have been observed only on the 128-bit and 256-bit SVE machines but they are not reproducible on a 512-bit machine. However, trueCount() returns incorrect values too and they are reproducible on all the SVE machines and thus is more reliable to use trueCount() to bring out the drawbacks of the current implementation of mask cast narrow operation for SVE.
> 
> Replacing the call to toLong() by trueCount() in the above example - 
> 
> 
> public class TestMaskCast {
> 
>     static final boolean [] mask_arr = {true, true, false, true};
> 
>     public static int narrow_long() {
>         VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
>         return lmask128.cast(IntVector.SPECIES_64).trueCount();
>     }
> 
>     public static void main(String[] args) {
>         int r = 0;
>         for (int ic = 0; ic < 50000; ic++) {
>             r = narrow_long();
>         }
>         System.out.println("trueCount() :  " + r);
>     }
> }
> 
> 
> 
> **C2 compilation result:**
> java --add-modules jdk.incubator.vector TestMaskCast 
> trueCount() :  4
> 
> **Interpreter result:**
> java --add-modules jdk.incubator.vector -Xint TestMaskCast 
> trueCount() :  2
> 
> Since in this example, the source mask size in bytes is 2x that of the result mask, trueCount() returns 2x the number of true elements in the source mask. It would return 4x/8x the number of true elements in the source mask if the size of the source mask is 4x/8x that of result mask.
> 
> The returned values are incorrect because of the higher order bits in the result not being cleared (since the result is narrowed down) and trueCount() or toLong() tend to consider the higher order bits in the vector register as well which results in incorrect value. For the 128-bit to 64-bit conversion with a mask - "TT" passed, the current implementation for mask cast narrow operation returns the same mask in the lower and upper half of the 128-bit register that is - "TTTT" which results in a long value of 15 (instead of 3 - "FFTT" for the 64-bit Integer mask) and number of true elements to be 4 (instead of 2).
> 
> This patch proposes a fix for this problem. An already existing JTREG IR test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java" has also been modified to call the trueCount() method as well since the toString() method alone cannot be used to reproduce the incorrect values in this bug. This test passes successfully on 128-bit, 256-bit and 512-bit SVE machines. Since the IR test has been changed, it has been tested successfully on other platforms like x86 and aarch64 Neon machines as well to ensure the changes have not introduced any new errors.

Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge master
 - 8303161: [vectorapi] VectorMask.cast narrow operation returns incorrect value with SVE
   
   The cast operation for VectorMask from wider type to narrow type returns
   incorrect result for trueCount() method invocation for the resultant
   mask with SVE (on some SVE machines toLong() also results in incorrect
   values). An example narrow operation which results in incorrect toLong()
   and trueCount() values is shown below for a 128-bit -> 64-bit conversion
   and this can be extended to other narrow operations where the source
   mask in bytes is either 4x or 8x the size of the result mask in
   bytes -
   
   public class TestMaskCast {
   
       static final boolean [] mask_arr = {true, true, false, true};
   
       public static long narrow_long() {
           VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
           return lmask128.cast(IntVector.SPECIES_64).toLong();
       }
   
       public static void main(String[] args) {
           long r = 0L;
           for (int ic = 0; ic < 50000; ic++) {
               r = narrow_long();
           }
           System.out.println("toLong() :  " + r);
       }
   }
   
   C2 compilation result :
   java --add-modules jdk.incubator.vector TestMaskCast
   toLong():  15
   
   Interpreter result (for verification) :
   java --add-modules jdk.incubator.vector -Xint TestMaskCast
   toLong():  3
   
   The incorrect results with toLong() have been observed only on the
   128-bit and 256-bit SVE machines but they are not reproducible on a
   512-bit machine. However, trueCount() returns incorrect values too
   and they are reproducible on all the SVE machines and thus is more
   reliable to use trueCount() to bring out the drawbacks of the current
   implementation of mask cast narrow operation for SVE.
   
   Replacing the call to toLong() by trueCount() in the above example -
   public class TestMaskCast {
   
       static final boolean [] mask_arr = {true, true, false, true};
   
       public static int narrow_long() {
           VectorMask<Long> lmask128 = VectorMask.fromArray(LongVector.SPECIES_128, mask_arr, 0);
           return lmask128.cast(IntVector.SPECIES_64).trueCount();
       }
   
       public static void main(String[] args) {
           int r = 0;
           for (int ic = 0; ic < 50000; ic++) {
               r = narrow_long();
           }
           System.out.println("trueCount() :  " + r);
       }
   }
   
   C2 compilation result:
   java --add-modules jdk.incubator.vector TestMaskCast
   trueCount() :  4
   
   Interpreter result:
   java --add-modules jdk.incubator.vector -Xint TestMaskCast
   trueCount() :  2
   
   Since in this example, the source mask size in bytes is 2x that of the
   result mask, trueCount() returns 2x the number of true elements in the
   source mask. It would return 4x/8x the number of true elements in the
   source mask if the size of the source mask is 4x/8x that of result mask.
   
   The returned values are incorrect because of the higher order bits in
   the result not being cleared (since the result is narrowed down) and
   trueCount() or toLong() tend to consider the higher order bits in the
   vector register as well which results in incorrect value.
   For the 128-bit to 64-bit conversion with a mask - "TT" passed, the
   current implementation for mask cast narrow operation returns the same
   mask in the lower and upper half of the 128-bit register that is -
   "TTTT" which results in a long value of 15 (instead of 3 - "FFTT" for
   the 64-bit Integer mask) and number of true elements to be 4 (instead of
   2).
   
   This patch proposes a fix for this problem. An already existing JTREG IR
   test - "test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastTest.java"
   has also been modified to call the trueCount() method as well since the
   toString() method alone cannot be used to reproduce the incorrect values
   in this bug. This test passes successfully on 128-bit, 256-bit and
   512-bit SVE machines. Since the IR test has been changed, it has been
   tested successfully on other platforms like x86 and aarch64 Neon
   machines as well to ensure the changes have not introduced any new
   errors.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12901/files
  - new: https://git.openjdk.org/jdk/pull/12901/files/639281ed..ccb23e2d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12901&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12901&range=00-01

  Stats: 160063 lines in 1880 files changed: 111867 ins; 29582 del; 18614 mod
  Patch: https://git.openjdk.org/jdk/pull/12901.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12901/head:pull/12901

PR: https://git.openjdk.org/jdk/pull/12901


More information about the hotspot-compiler-dev mailing list