RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v4]
Daniel Lundén
dlunden at openjdk.org
Thu Jul 10 18:25:43 UTC 2025
On Wed, 9 Jul 2025 01:23:43 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> ### Background
>> On AArch64, the minimum vector length supported is 64-bit for basic types, except for `byte` and `boolean` (32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions between `short` and wider types (e.g. `long/double`) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions.
>>
>> For example, type conversions between `ShortVector.SPECIES_128` and `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size.
>>
>> To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors.
>>
>> ### Impact Analysis
>> #### 1. Vector types
>> Only vectors with `short` element types are affected, since this change adds 32-bit vector support only for `short`.
>>
>> #### 2. Vector API
>> No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length.
>>
>> #### 3. Auto-vectorization
>> Enables vectorization of cases containing only 2 `short` lanes, with significant performance improvements. Since 32-bit vectors have long been supported for the `byte` type, extending this to `short` does not introduce additional risks.
>>
>> #### 4. Codegen of vector nodes
>> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored.
>>
>> Details:
>> - Lanewise vector operations are unaffected as explained above.
>> - NEON supports vector load/store instructions with 32-bit vector size, which we already use in relevant IRs (shared by SVE).
>> - Cross-lane operations like reduction may be affected, potentially causing incorrect results for `min/max/mul/and` reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, addin...
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Disable auto-vectorization of double to short conversion for NEON and update tests
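For readers following the quoted description, here is a minimal plain-Java sketch of two points it makes: why a `short` vector matching a 128-bit `long` vector is only 32 bits wide, and why running a 32-bit vector through 64-bit instructions is harmless for lanewise operations but not for reductions. This is not HotSpot code. The class and method names are hypothetical, and the 64-bit register is simulated as an array of four `short` lanes, of which only the first two hold valid data.

```java
// Illustrative simulation only; names are hypothetical and not part of the JDK.
final class ShortLaneSketch {
    // Number of lanes in a vector of the given bit width and element size.
    static int lanes(int vectorBits, int elemBits) {
        return vectorBits / elemBits;
    }

    // Bit width of a short vector with the same lane count as a long vector.
    static int shortVectorBitsFor(int longVectorBits) {
        return lanes(longVectorBits, Long.SIZE) * Short.SIZE;
    }

    // Lanewise add over every lane of the simulated register.
    // Junk in the upper lanes only produces junk in the upper lanes.
    static short[] addLanes(short[] a, short[] b) {
        short[] r = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = (short) (a[i] + b[i]);
        }
        return r;
    }

    // Min-reduction over the first `laneCount` lanes of the register.
    static short minReduce(short[] v, int laneCount) {
        short m = v[0];
        for (int i = 1; i < laneCount; i++) {
            m = (short) Math.min(m, v[i]);
        }
        return m;
    }

    public static void main(String[] args) {
        // A 128-bit long vector has 2 lanes, so the short side is 2 * 16 bits.
        System.out.println(shortVectorBitsFor(128)); // prints 32

        // Lanes 0..1 are valid; lanes 2..3 stand in for the ignored upper half.
        short[] a = {5, 7, -100, -200};
        short[] b = {1, 2, 3, 4};

        short[] sum = addLanes(a, b);
        System.out.println(sum[0] + " " + sum[1]); // prints 6 9

        // Reducing only the valid lanes gives the expected minimum.
        System.out.println(minReduce(a, 2)); // prints 5
        // Reducing all four lanes lets the junk upper half leak into the result.
        System.out.println(minReduce(a, 4)); // prints -200
    }
}
```

The last two calls show the hazard the description raises for cross-lane operations: a reduction that spans the full 64-bit register picks up values from lanes that were never part of the 32-bit vector.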
@XiaohongGong The code changes look sane, although, for the record, I'm not that familiar with this part of HotSpot. Testing also looks good, details below.
### Testing
- [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/16165935815)
- `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
- Performance testing on DaCapo, Renaissance, SPECjbb, and SPECjvm on Linux x64 and macOS aarch64. No observable improvements or regressions.
-------------
Marked as reviewed by dlunden (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26057#pullrequestreview-3006812696