RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v7]

Vladimir Ivanov vaivanov at openjdk.org
Thu Oct 2 21:55:46 UTC 2025


On Wed, 1 Oct 2025 21:39:48 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> testing for tier1, tier2 and tier3 were OK. Will review this part one more time.
>> Do you have test scenario that may reproduce this issue?
>
> Testing does not guarantee this path is tested. You need to disable AVX512 to use it otherwise `generate_fill_avx3()` will be used. I was thinking about `byte[63] array` with `-XX:UseAVX=2 -XX:-UseUnalignedLoadStores` flags to hit this path. I did small experiment but unfortunately it seems `arrayof_jbyte_fill` stub is not called with AVX2 so the path is not executed.
> 
> I will let you do further investigations to force this path be executed. Here is my small test:
> 
> $ java -XX:-TieredCompilation -Xbatch -XX:CompileOnly=TestFillArray::fill -XX:UseAVX=2 -XX:-UseUnalignedLoadStores TestFillArray
> 
> $ cat TestFillArray.java 
> public class TestFillArray {
>     private static byte[] ba;
> 
>     static void fill() {
>         for (int i = 0; i < ba.length; i++) {
>             ba[i] = (byte) 123;
>         }
>     }
> 
>     public static void main(String[] str) {
>         ba = new byte[63];
>         for (int i = 0; i < 10000; i++) {
>             fill();
>         }
>         ba = new byte[63];
>         fill();
>         for (int i = 0; i < ba.length; i++) {
>             if (ba[i] != (byte) 123) {
>                 System.out.println("ba[" + i + "] (" + ba[i] + ") != 123");
>             }
>         }
>     }
> }

With an extra outer loop in `main`:

```java
public class TestFillArray {
    private static byte[] ba;

    // Simple fill loop; with -XX:+OptimizeFill C2 may replace it with a fill stub call.
    static void fill() {
        for (int i = 0; i < ba.length; i++) {
            ba[i] = (byte) 123;
        }
    }

    static void iter() {
        ba = new byte[63];
        // Call fill() many times so it is compiled and the stub is exercised.
        for (int i = 0; i < 1000000; i++) {
            fill();
        }
        // Fill a fresh array once more and check every element.
        ba = new byte[63];
        fill();
        for (int i = 0; i < ba.length; i++) {
            if (ba[i] != (byte) 123) {
                System.out.println("ba[" + i + "] (" + ba[i] + ") != 123");
            }
        }
    }

    public static void main(String[] str) {
        // Repeat the whole sequence to accumulate enough samples for profiling.
        for (int i = 0; i < 10000; i++) {
            iter();
        }
        System.out.println("Done.");
    }
}
```

Running it with a command line such as `java -XX:-TieredCompilation -Xbatch -XX:CompileOnly=TestFillArray::fill -XX:+OptimizeFill -XX:UseAVX=2 -XX:-UseUnalignedLoadStores TestFillArray`, VTune reports `arrayof_jbyte_fill_stub` among the hot functions:
Function                                        Module                CPU Time  % of CPU Time(%)
----------------------------------------------  --------------------  --------  ----------------
Interpreter                                     [Dynamic code]        244.070s             84.2%
Stub Generator arrayof_jbyte_fill_stub          [Dynamic code]         24.070s              8.3%
TestFillArray::fill                             [Compiled Java code]   12.650s              4.4%

However, the test reported no mismatches in runs with `-XX:UseAVX=0/1/2` on the Xeon 6740E (Efficient-cores) and the Xeon 6900P (Performance-cores), with `UseUnalignedLoadStores` both enabled and disabled.
The tier1 testing with `-XX:-UseUnalignedLoadStores` also passed on the Xeon 6740E.
Sorry, I was unable to find the faulty execution path. Could you give more details about the issue you found?
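A possible extension, sketched here under assumptions rather than tested against this PR: the tests above always fill a 63-byte array from index 0, so the destination start is fixed relative to the array base. The hypothetical `TestFillArrayOffsets` below (class and helper names are illustrative) varies both the start index and the length, and checks that every byte outside the filled range keeps a guard value, which is where an incorrectly split or over-long tail store would become visible. Whether C2 turns this loop into a fill stub call, and which stub it selects for a non-constant start index, is an assumption to verify, for example with the same VTune setup.

```java
import java.util.Arrays;

public class TestFillArrayOffsets {
    // Fill value used by the original test.
    static final byte VAL = (byte) 123;
    // Sentinel written to every byte before filling, so stray writes are visible.
    static final byte GUARD = (byte) -1;

    // Simple counted loop over [from, to). Because 'from' is not a compile-time
    // constant, the alignment of the first store is not known to the compiler
    // (assumption: this is the shape OptimizeFill is meant to recognize).
    static void fill(byte[] a, int from, int to) {
        for (int i = from; i < to; i++) {
            a[i] = VAL;
        }
    }

    // Counts bytes with an unexpected value: inside [from, to) every byte must be
    // VAL, outside it every byte must still be GUARD.
    static int check(byte[] a, int from, int to) {
        int bad = 0;
        for (int i = 0; i < a.length; i++) {
            byte expected = (i >= from && i < to) ? VAL : GUARD;
            if (a[i] != expected) {
                bad++;
            }
        }
        return bad;
    }

    // Tries every start offset 0..15 and every length 0..maxLen, repeated 'rounds'
    // times so that fill() becomes hot. Returns the total number of bad bytes.
    static long runAll(int maxLen, int rounds) {
        long mismatches = 0;
        // 64 spare bytes leave guard bytes after the largest filled range.
        byte[] a = new byte[maxLen + 64];
        for (int r = 0; r < rounds; r++) {
            for (int from = 0; from < 16; from++) {
                for (int len = 0; len <= maxLen; len++) {
                    Arrays.fill(a, GUARD);
                    fill(a, from, from + len);
                    mismatches += check(a, from, from + len);
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int rounds = args.length > 0 ? Integer.parseInt(args[0]) : 50;
        long bad = runAll(130, rounds);
        System.out.println(bad == 0 ? "Done." : "Mismatched bytes: " + bad);
        if (bad != 0) {
            System.exit(1);
        }
    }
}
```

It could be run with the same flags as before, e.g. `java -XX:-TieredCompilation -Xbatch -XX:CompileOnly=TestFillArrayOffsets::fill -XX:+OptimizeFill -XX:UseAVX=2 -XX:-UseUnalignedLoadStores TestFillArrayOffsets 50`, where the optional argument is the number of rounds. With `CompileOnly`, the checking code stays interpreted, so larger round counts get slow.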

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26747#discussion_r2400175313

