RFR: 8359419: AArch64: Relax min vector length to 32-bit for short vectors [v3]

Bhavana Kilambi bkilambi at openjdk.org
Mon Jul 7 10:27:39 UTC 2025


On Mon, 7 Jul 2025 06:59:20 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Have you measured the performance of this micro-benchmark on a NEON machine?
>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java#L251-L256
>> 
>> We previously added a limitation only for `int`:
>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L131-L134
>> 
>> Perhaps we also need to impose a similar limitation on `short` if the same regression occurs.
>
> Good catch, and thanks so much for your input @fg1417! I will measure the performance and disable auto-vectorization for double-to-short casts if there is a regression.
> 
>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392
> 
> Actually, I didn't change the minimum vector size for `char` vectors in this patch. Relaxing `short` vectors to 32 bits is meant to support vector casts in the Vector API, which has no `char` species. Do you think it would be better to make the same change for `char` as well? That would only benefit auto-vectorization.
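The concern in the quoted exchange comes down to lane arithmetic. As a hedged illustration, the Java sketch below has the general shape of a double-to-short cast loop like the one the linked micro-benchmark exercises; the class and method names (`D2SCastSketch`, `convertD2S`, `vectorBits`) are hypothetical, and the benchmark's exact code may differ. Assuming 128-bit NEON registers, a full vector of `double` holds 2 lanes, so casting it to `short` yields 2 x 16 = 32 bits. Per the PR title, that is the size `short` vectors are being relaxed to.

```java
// Hypothetical sketch: names are illustrative, not taken from the JDK sources.
public class D2SCastSketch {

    // Scalar double-to-short cast loop; C2's auto-vectorizer may turn a loop
    // of this shape into vector cast nodes.
    static void convertD2S(double[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            // Java narrows double -> int -> short (JLS 5.1.3).
            dst[i] = (short) src[i];
        }
    }

    // Bits occupied by `lanes` elements of `elemBits` bits each.
    static int vectorBits(int lanes, int elemBits) {
        return lanes * elemBits;
    }

    public static void main(String[] args) {
        final int neonBits = 128;                 // assumed NEON register width
        int doubleLanes = neonBits / Double.SIZE; // 128 / 64 = 2 lanes
        int shortBits = vectorBits(doubleLanes, Short.SIZE);
        System.out.println(doubleLanes + " doubles -> " + shortBits + "-bit short vector");

        double[] src = {3.9, -1.5, 70000.0};
        short[] dst = new short[src.length];
        convertD2S(src, dst);
        // prints [3, -1, 4464]: 70000 wraps modulo 2^16 after the int step
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

The first line printed is "2 doubles -> 32-bit short vector", which is why a regression analogous to the earlier `int` case could plausibly appear for `short` once such vectors are allowed.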

Hi @XiaohongGong, is there any way we can implement 2HF -> 2S and 2S -> 2HF (two half-precision lanes to two single-precision lanes, and back) in these match rules?

https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4697

https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4679

The `fcvtn` and `fcvtl` instructions do not support these arrangements (for half precision they only convert between `4H`/`8H` and `4S`). Is there any other way we could implement these?
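For context on the operations behind these rules: at the Java level, half-precision values are carried as `short` bit patterns and converted with `Float.float16ToFloat` and `Float.floatToFloat16` (available since JDK 20). Judging by the `fcvtl`/`fcvtn` mention, the linked rules implement the vectorized form of these conversions. The sketch below is a hypothetical example of scalar loops of that kind. The names `HalfFloatSketch`, `hfToFloat`, and `floatToHf` are illustrative only. With 32-bit `short` vectors, a 2-lane vector of half floats widening to 2 floats is the 2HF -> 2S case in question.

```java
// Hypothetical sketch (requires JDK 20+ for Float.float16ToFloat/floatToFloat16).
public class HalfFloatSketch {

    // Widen half-precision bit patterns (stored in shorts) to floats: HF -> S.
    static void hfToFloat(short[] src, float[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.float16ToFloat(src[i]);
        }
    }

    // Narrow floats to half-precision bit patterns: S -> HF.
    static void floatToHf(float[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.floatToFloat16(src[i]);
        }
    }

    public static void main(String[] args) {
        // Two halves occupy 2 x 16 = 32 bits; two floats occupy 2 x 32 = 64 bits.
        short[] halves = {(short) 0x3C00, (short) 0xC000}; // 1.0 and -2.0 in binary16
        float[] floats = new float[halves.length];
        hfToFloat(halves, floats);
        System.out.println(floats[0] + " " + floats[1]); // 1.0 -2.0

        short[] back = new short[floats.length];
        floatToHf(floats, back);
        System.out.printf("0x%04X 0x%04X%n", back[0] & 0xFFFF, back[1] & 0xFFFF); // 0x3C00 0xC000
    }
}
```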

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3044358446


More information about the hotspot-compiler-dev mailing list