RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v7]
Vladimir Ivanov
vaivanov at openjdk.org
Thu Oct 2 21:55:46 UTC 2025
On Wed, 1 Oct 2025 21:39:48 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> testing for tier1, tier2 and tier3 were OK. Will review this part one more time.
>> Do you have test scenario that may reproduce this issue?
>
> Testing does not guarantee this path is tested. You need to disable AVX512 to use it otherwise `generate_fill_avx3()` will be used. I was thinking about `byte[63] array` with `-XX:UseAVX=2 -XX:-UseUnalignedLoadStores` flags to hit this path. I did small experiment but unfortunately it seems `arrayof_jbyte_fill` stub is not called with AVX2 so the path is not executed.
>
> I will let you do further investigations to force this path be executed. Here is my small test:
>
> $ java -XX:-TieredCompilation -Xbatch -XX:CompileOnly=TestFillArray::fill -XX:UseAVX=2 -XX:-UseUnalignedLoadStores TestFillArray
>
> $ cat TestFillArray.java
> public class TestFillArray {
>     private static byte[] ba;
>
>     static void fill() {
>         for (int i = 0; i < ba.length; i++) {
>             ba[i] = (byte) 123;
>         }
>     }
>
>     public static void main(String[] str) {
>         ba = new byte[63];
>         for (int i = 0; i < 10000; i++) {
>             fill();
>         }
>         ba = new byte[63];
>         fill();
>         for (int i = 0; i < ba.length; i++) {
>             if (ba[i] != (byte) 123) {
>                 System.out.println("ba[" + i + "] (" + ba[i] + ") != 123");
>             }
>         }
>     }
> }
With an extra outer loop around the code from main:
    public class TestFillArray {
        private static byte[] ba;

        static void fill() {
            for (int i = 0; i < ba.length; i++) {
                ba[i] = (byte) 123;
            }
        }

        static void iter() {
            ba = new byte[63];
            for (int i = 0; i < 1000000; i++) {
                fill();
            }
            ba = new byte[63];
            fill();
            for (int i = 0; i < ba.length; i++) {
                if (ba[i] != (byte) 123) {
                    System.out.println("ba[" + i + "] (" + ba[i] + ") != 123");
                }
            }
        }

        public static void main(String[] str) {
            for (int i = 0; i < 10000; i++) {
                iter();
            }
            System.out.println("Done.");
        }
    }
and with a command line like 'java -XX:-TieredCompilation -Xbatch -XX:CompileOnly=TestFillArray::fill -XX:+OptimizeFill -XX:UseAVX=2 -XX:-UseUnalignedLoadStores TestFillArray', VTune reports 'arrayof_jbyte_fill_stub' among the hot methods:
Function                                  Module                CPU Time  % of CPU Time
----------------------------------------  --------------------  --------  -------------
Interpreter                               [Dynamic code]        244.070s  84.2%
Stub Generator arrayof_jbyte_fill_stub    [Dynamic code]         24.070s   8.3%
TestFillArray::fill                       [Compiled Java code]   12.650s   4.4%
However, no failures were reported in runs with UseAVX=0/1/2 on the Xeon 6740E (economy cores) and the Xeon 6900P (performance cores), with UseUnalignedLoadStores both enabled and disabled.
The tier1 testing with '-XX:-UseUnalignedLoadStores' also passed on the Xeon 6740E.
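The test above only exercises byte[63] filled from index 0. One way to widen coverage would be to sweep fill lengths and start offsets and to check guard bytes on both sides of the filled range. The following is only a minimal sketch: the class FillSweep and its methods are hypothetical names, not part of the PR or of any existing test, and whether C2 routes this loop through a fill stub depends on the VM flags and was not verified here.

```java
// Hypothetical variant of TestFillArray; all names here are illustrative only.
class FillSweep {
    static final int GUARD = 8;        // untouched bytes kept after the filled range
    static final byte VALUE = (byte) 123;

    // Same loop shape as TestFillArray::fill, but over the sub-range [from, to).
    static void fill(byte[] a, int from, int to, byte v) {
        for (int i = from; i < to; i++) {
            a[i] = v;
        }
    }

    // Returns the index of the first unexpected element, or -1 if all are as expected.
    // Elements inside [from, to) must equal v; all others must still be 0.
    static int firstMismatch(byte[] a, int from, int to, byte v) {
        for (int i = 0; i < a.length; i++) {
            byte expected = (i >= from && i < to) ? v : 0;
            if (a[i] != expected) {
                return i;
            }
        }
        return -1;
    }

    // Fills every (length, start offset) combination 'repeats' times
    // and returns the number of combinations that produced a wrong array.
    static int sweep(int maxLen, int repeats) {
        int failures = 0;
        for (int r = 0; r < repeats; r++) {
            for (int len = 0; len <= maxLen; len++) {
                for (int from = 0; from <= GUARD; from++) {
                    byte[] a = new byte[from + len + GUARD];
                    fill(a, from, from + len, VALUE);
                    int bad = firstMismatch(a, from, from + len, VALUE);
                    if (bad >= 0) {
                        failures++;
                        System.out.println("len=" + len + " from=" + from
                                + ": unexpected a[" + bad + "] = " + a[bad]);
                    }
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Enough repeats for fill() to be compiled when run with -Xbatch.
        int failures = sweep(130, 200);
        System.out.println(failures == 0 ? "Done." : failures + " failure(s)");
    }
}
```

It could be run with the same flags as above, with CompileOnly pointing at FillSweep::fill instead of TestFillArray::fill.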
Sorry, I failed to find the faulty execution path. Could you give more details about the issue you found?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26747#discussion_r2400175313