RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18]
Emanuel Peter
epeter at openjdk.org
Fri Aug 22 07:05:05 UTC 2025
On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> TODO work that arose during review process / recent merges with master:
>>
>> - Vladimir asked for a benchmark where the predicate is disabled and only multiversioning is used, to show that peak performance is identical but compilation time is a bit higher. Investigation ongoing.
>> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE.
>>
>> ---------------
>>
>> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
>>
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>>
>> --------------------------
>>
>> **Where to start reviewing**
>>
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks.
>>
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>>
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>>
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and, in some cases, whether we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit more advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification currently covers mostly the array cases; the MemorySegment cases have some issues (see comments).
>> --------------------------
>>
>> **Details**
>>
>> Most fundamentally:
>> - I had to...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> disable flag if not possible
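For readers skimming the logs, here is a rough Java-level sketch of the multiversioning idea from the quoted description: a runtime check selects a `fast_loop` when the accessed ranges cannot overlap, and a `slow_loop` otherwise. This is only an illustration and not how C2 implements it; the real check is emitted on the compiler's IR. The class name, the `noAliasing` helper, and the method signature are hypothetical. Only the general shape of a byte-copy loop like `copy_B` is taken from the benchmark.

```java
// Illustrative sketch only: all names here are hypothetical, not JDK code.
final class AliasingCheckSketch {

    // Runtime aliasing check (assumed form): the two accessed ranges cannot
    // overlap if they are in different arrays, or if one range ends before
    // the other begins. Long arithmetic avoids int overflow in offset + len.
    static boolean noAliasing(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
        if (src != dst) {
            return true;
        }
        return (long) srcOff + len <= dstOff || (long) dstOff + len <= srcOff;
    }

    // Returns true if the "fast" version was taken, false for the "slow" one.
    static boolean copy(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
        if (noAliasing(src, srcOff, dst, dstOff, len)) {
            // fast_loop: no iteration can observe a store from another
            // iteration, so a compiler would be free to vectorize this loop.
            for (int i = 0; i < len; i++) {
                dst[dstOff + i] = src[srcOff + i];
            }
            return true;
        }
        // slow_loop: possible overlap, so keep strict element-by-element
        // order to preserve the scalar semantics.
        for (int i = 0; i < len; i++) {
            dst[dstOff + i] = src[srcOff + i];
        }
        return false;
    }
}
```

Both loops have the same source text on purpose: the difference is only in what the compiler is allowed to assume about them. With the predicate variant, the `else` branch would instead be a trap followed by recompilation without the speculation.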
Here are the logs:
[empeter at emanuel jdk-fork6]$ perf stat ./build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar "VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias" -prof perfasm
WARNING: A terminally deprecated method in sun.misc.Unsafe has been called
WARNING: sun.misc.Unsafe::objectFieldOffset has been called by org.openjdk.jmh.util.Utils (file:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar)
WARNING: Please consider reporting this to the maintainers of class org.openjdk.jmh.util.Utils
WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release
# JMH version: 1.37
# VM version: JDK 26-internal, Java HotSpot(TM) 64-Bit Server VM, 26-internal-2025-08-19-0806546.empeter...
# VM invoker: /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/jdk/bin/java
# VM options: -XX:+UseSuperWord -XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=0
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 1 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias
# Parameters: (SIZE = 10000, seed = 0)
# Run progress: 0.00% complete, ETA 00:00:11
# Fork: 1 of 1
# Preparing profilers: LinuxPerfAsmProfiler
# Profilers consume stdout and stderr from target VM, use -v EXTRA to copy to console
# Warmup Iteration 1: 3101.582 ns/op
Iteration 1: 2876.229 ns/op
Iteration 2: 2858.107 ns/op
Iteration 3: 2837.087 ns/op
Iteration 4: 2860.013 ns/op
Iteration 5: 2851.886 ns/op
Iteration 6: 2872.007 ns/op
Iteration 7: 2863.599 ns/op
Iteration 8: 2842.069 ns/op
Iteration 9: 2841.341 ns/op
Iteration 10: 2844.861 ns/op
# Processing profiler results: LinuxPerfAsmProfiler
Result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias":
2854.720 ±(99.9%) 20.377 ns/op [Average]
(min, avg, max) = (2837.087, 2854.720, 2876.229), stdev = 13.478
CI (99.9%): [2834.343, 2875.097] (assumes normal distribution)
Secondary result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias:asm":
PrintAssembly processed: 326211 total address lines.
Perf output processed (skipped 10.085 seconds):
Column 1: cycles (10297 events)
Hottest code regions (>10.00% "cycles" events):
Event counts are percents of total event count.
....[Hottest Region 1]..............................................................................
c2, level 4, org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261
0.03% 0x00007f28e8bef76b: movslq %r11d,%r11
0x00007f28e8bef76e: cmp %r11,%r10
0x00007f28e8bef771: jae 0x00007f28e8befb18
0.02% 0x00007f28e8bef777: movsbl 0x10(%rdx,%rbp,1),%r11d
0x00007f28e8bef77d: mov %r11b,0x10(%rcx,%rax,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0}
; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92)
0x00007f28e8bef782: add $0xffffffffffffffc1,%rbx
0x00007f28e8bef786: mov $0xffffffff80000000,%r10
0x00007f28e8bef78d: cmp $0xffffffff80000000,%rbx
0x00007f28e8bef794: cmovl %r10,%rbx
0.01% 0x00007f28e8bef798: mov %ebx,%r13d
0.01% 0x00007f28e8bef79b: mov $0x1,%esi
0.01% 0x00007f28e8bef7a0: cmp $0x1,%r13d
0x00007f28e8bef7a4: jle 0x00007f28e8befae6 ;*goto {reexecute=0 rethrow=0 return_oop=0}
; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91)
╭ 0x00007f28e8bef7aa: jmpq 0x00007f28e8befab8
│ 0x00007f28e8bef7af: nop
0.07% │↗ 0x00007f28e8bef7b0: vmovd %xmm0,%r8d ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
││ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 10 (line 92)
││ 0x00007f28e8bef7b5: vmovd %r8d,%xmm0
0.05% ││ 0x00007f28e8bef7ba: add %esi,%r8d
││ 0x00007f28e8bef7bd: vmovd %xmm2,%r10d
0.95% ││ 0x00007f28e8bef7c2: add %esi,%r10d
││ 0x00007f28e8bef7c5: movslq %r8d,%r11
1.17% ││ 0x00007f28e8bef7c8: movslq %r10d,%r8
0.01% ││ 0x00007f28e8bef7cb: movslq %esi,%r10
0.27% ││ 0x00007f28e8bef7ce: lea (%rax,%r10,1),%r9
││ 0x00007f28e8bef7d2: lea (%r10,%rbp,1),%rbx
0.32% ││ 0x00007f28e8bef7d6: movsbl 0x10(%rdx,%rbx,1),%r10d
0.92% ││ 0x00007f28e8bef7dc: mov %r10b,0x10(%rcx,%r9,1)
1.59% ││ 0x00007f28e8bef7e1: movsbl 0x11(%rdx,%rbx,1),%r10d
0.17% ││ 0x00007f28e8bef7e7: mov %r10b,0x11(%rcx,%r9,1)
1.20% ││ 0x00007f28e8bef7ec: movsbl 0x12(%rdx,%rbx,1),%r10d
0.20% ││ 0x00007f28e8bef7f2: mov %r10b,0x12(%rcx,%r9,1)
0.83% ││ 0x00007f28e8bef7f7: movsbl 0x13(%rdx,%r11,1),%r10d
0.28% ││ 0x00007f28e8bef7fd: mov %r10b,0x13(%rcx,%r8,1)
0.44% ││ 0x00007f28e8bef802: movsbl 0x14(%rdx,%r11,1),%r10d
0.23% ││ 0x00007f28e8bef808: mov %r10b,0x14(%rcx,%r8,1) ; {other}
1.08% ││ 0x00007f28e8bef80d: movsbl 0x15(%rdx,%r11,1),%r10d
0.43% ││ 0x00007f28e8bef813: mov %r10b,0x15(%rcx,%r8,1)
1.12% ││ 0x00007f28e8bef818: movsbl 0x16(%rdx,%r11,1),%r10d
0.51% ││ 0x00007f28e8bef81e: mov %r10b,0x16(%rcx,%r8,1)
1.45% ││ 0x00007f28e8bef823: movsbl 0x17(%rdx,%r11,1),%r10d
0.30% ││ 0x00007f28e8bef829: mov %r10b,0x17(%rcx,%r8,1)
1.05% ││ 0x00007f28e8bef82e: movsbl 0x18(%rdx,%r11,1),%r10d
0.19% ││ 0x00007f28e8bef834: mov %r10b,0x18(%rcx,%r8,1)
1.08% ││ 0x00007f28e8bef839: movsbl 0x19(%rdx,%r11,1),%r10d
0.21% ││ 0x00007f28e8bef83f: mov %r10b,0x19(%rcx,%r8,1)
0.95% ││ 0x00007f28e8bef844: movsbl 0x1a(%rdx,%r11,1),%r10d
0.16% ││ 0x00007f28e8bef84a: mov %r10b,0x1a(%rcx,%r8,1)
0.94% ││ 0x00007f28e8bef84f: movsbl 0x1b(%rdx,%r11,1),%r10d
0.47% ││ 0x00007f28e8bef855: mov %r10b,0x1b(%rcx,%r8,1)
0.96% ││ 0x00007f28e8bef85a: movsbl 0x1c(%rdx,%r11,1),%r10d
0.55% ││ 0x00007f28e8bef860: mov %r10b,0x1c(%rcx,%r8,1)
1.34% ││ 0x00007f28e8bef865: movsbl 0x1d(%rdx,%r11,1),%r10d
0.33% ││ 0x00007f28e8bef86b: mov %r10b,0x1d(%rcx,%r8,1)
1.15% ││ 0x00007f28e8bef870: movsbl 0x1e(%rdx,%r11,1),%r10d
0.25% ││ 0x00007f28e8bef876: mov %r10b,0x1e(%rcx,%r8,1)
1.11% ││ 0x00007f28e8bef87b: movsbl 0x1f(%rdx,%r11,1),%r10d
0.45% ││ 0x00007f28e8bef881: mov %r10b,0x1f(%rcx,%r8,1)
1.15% ││ 0x00007f28e8bef886: movsbl 0x20(%rdx,%r11,1),%r10d
0.19% ││ 0x00007f28e8bef88c: mov %r10b,0x20(%rcx,%r8,1)
1.10% ││ 0x00007f28e8bef891: movsbl 0x21(%rdx,%r11,1),%r10d
0.15% ││ 0x00007f28e8bef897: mov %r10b,0x21(%rcx,%r8,1)
0.97% ││ 0x00007f28e8bef89c: movsbl 0x22(%rdx,%r11,1),%r10d
0.58% ││ 0x00007f28e8bef8a2: mov %r10b,0x22(%rcx,%r8,1)
1.19% ││ 0x00007f28e8bef8a7: movsbl 0x23(%rdx,%r11,1),%r10d
0.38% ││ 0x00007f28e8bef8ad: mov %r10b,0x23(%rcx,%r8,1)
1.12% ││ 0x00007f28e8bef8b2: movsbl 0x24(%rdx,%r11,1),%r10d
0.22% ││ 0x00007f28e8bef8b8: mov %r10b,0x24(%rcx,%r8,1)
1.20% ││ 0x00007f28e8bef8bd: movsbl 0x25(%rdx,%r11,1),%r10d
0.20% ││ 0x00007f28e8bef8c3: mov %r10b,0x25(%rcx,%r8,1)
1.15% ││ 0x00007f28e8bef8c8: movsbl 0x26(%rdx,%r11,1),%r10d
0.10% ││ 0x00007f28e8bef8ce: mov %r10b,0x26(%rcx,%r8,1)
1.02% ││ 0x00007f28e8bef8d3: movsbl 0x27(%rdx,%r11,1),%r10d
0.14% ││ 0x00007f28e8bef8d9: mov %r10b,0x27(%rcx,%r8,1)
1.10% ││ 0x00007f28e8bef8de: movsbl 0x28(%rdx,%r11,1),%r10d
0.38% ││ 0x00007f28e8bef8e4: mov %r10b,0x28(%rcx,%r8,1)
1.02% ││ 0x00007f28e8bef8e9: movsbl 0x29(%rdx,%r11,1),%r10d
0.33% ││ 0x00007f28e8bef8ef: mov %r10b,0x29(%rcx,%r8,1)
1.03% ││ 0x00007f28e8bef8f4: movsbl 0x2a(%rdx,%r11,1),%r10d
0.33% ││ 0x00007f28e8bef8fa: mov %r10b,0x2a(%rcx,%r8,1)
1.08% ││ 0x00007f28e8bef8ff: movsbl 0x2b(%rdx,%r11,1),%r10d
0.32% ││ 0x00007f28e8bef905: mov %r10b,0x2b(%rcx,%r8,1) ; {other}
1.05% ││ 0x00007f28e8bef90a: movsbl 0x2c(%rdx,%r11,1),%r10d
0.27% ││ 0x00007f28e8bef910: mov %r10b,0x2c(%rcx,%r8,1)
1.18% ││ 0x00007f28e8bef915: movsbl 0x2d(%rdx,%r11,1),%r10d
0.24% ││ 0x00007f28e8bef91b: mov %r10b,0x2d(%rcx,%r8,1)
0.98% ││ 0x00007f28e8bef920: movsbl 0x2e(%rdx,%r11,1),%r10d
0.35% ││ 0x00007f28e8bef926: mov %r10b,0x2e(%rcx,%r8,1)
1.16% ││ 0x00007f28e8bef92b: movsbl 0x2f(%rdx,%r11,1),%r10d
0.38% ││ 0x00007f28e8bef931: mov %r10b,0x2f(%rcx,%r8,1)
1.14% ││ 0x00007f28e8bef936: movsbl 0x30(%rdx,%r11,1),%r10d
0.35% ││ 0x00007f28e8bef93c: mov %r10b,0x30(%rcx,%r8,1)
1.16% ││ 0x00007f28e8bef941: movsbl 0x31(%rdx,%r11,1),%r10d
0.32% ││ 0x00007f28e8bef947: mov %r10b,0x31(%rcx,%r8,1)
1.19% ││ 0x00007f28e8bef94c: movsbl 0x32(%rdx,%r11,1),%r10d
0.28% ││ 0x00007f28e8bef952: mov %r10b,0x32(%rcx,%r8,1)
0.98% ││ 0x00007f28e8bef957: movsbl 0x33(%rdx,%r11,1),%r10d
0.37% ││ 0x00007f28e8bef95d: mov %r10b,0x33(%rcx,%r8,1)
1.10% ││ 0x00007f28e8bef962: movsbl 0x34(%rdx,%r11,1),%r10d
0.30% ││ 0x00007f28e8bef968: mov %r10b,0x34(%rcx,%r8,1)
1.36% ││ 0x00007f28e8bef96d: movsbl 0x35(%rdx,%r11,1),%r10d
0.29% ││ 0x00007f28e8bef973: mov %r10b,0x35(%rcx,%r8,1)
1.11% ││ 0x00007f28e8bef978: movsbl 0x36(%rdx,%r11,1),%r10d
0.38% ││ 0x00007f28e8bef97e: mov %r10b,0x36(%rcx,%r8,1)
1.01% ││ 0x00007f28e8bef983: movsbl 0x37(%rdx,%r11,1),%r10d
0.45% ││ 0x00007f28e8bef989: mov %r10b,0x37(%rcx,%r8,1)
1.14% ││ 0x00007f28e8bef98e: movsbl 0x38(%rdx,%r11,1),%r10d
0.29% ││ 0x00007f28e8bef994: mov %r10b,0x38(%rcx,%r8,1)
1.20% ││ 0x00007f28e8bef999: movsbl 0x39(%rdx,%r11,1),%r10d
0.30% ││ 0x00007f28e8bef99f: mov %r10b,0x39(%rcx,%r8,1)
1.07% ││ 0x00007f28e8bef9a4: movsbl 0x3a(%rdx,%r11,1),%r10d
0.38% ││ 0x00007f28e8bef9aa: mov %r10b,0x3a(%rcx,%r8,1)
1.13% ││ 0x00007f28e8bef9af: movsbl 0x3b(%rdx,%r11,1),%r10d
0.26% ││ 0x00007f28e8bef9b5: mov %r10b,0x3b(%rcx,%r8,1)
1.01% ││ 0x00007f28e8bef9ba: movsbl 0x3c(%rdx,%r11,1),%r10d
0.34% ││ 0x00007f28e8bef9c0: mov %r10b,0x3c(%rcx,%r8,1)
1.42% ││ 0x00007f28e8bef9c5: movsbl 0x3d(%rdx,%r11,1),%r10d
0.35% ││ 0x00007f28e8bef9cb: mov %r10b,0x3d(%rcx,%r8,1)
1.09% ││ 0x00007f28e8bef9d0: movsbl 0x3e(%rdx,%r11,1),%r10d
0.26% ││ 0x00007f28e8bef9d6: mov %r10b,0x3e(%rcx,%r8,1)
1.25% ││ 0x00007f28e8bef9db: movsbl 0x3f(%rdx,%r11,1),%r10d
0.32% ││ 0x00007f28e8bef9e1: mov %r10b,0x3f(%rcx,%r8,1)
1.03% ││ 0x00007f28e8bef9e6: movsbl 0x40(%rdx,%r11,1),%r10d
0.35% ││ 0x00007f28e8bef9ec: mov %r10b,0x40(%rcx,%r8,1)
1.18% ││ 0x00007f28e8bef9f1: movsbl 0x41(%rdx,%rbx,1),%r10d
0.29% ││ 0x00007f28e8bef9f7: mov %r10b,0x41(%rcx,%r9,1)
1.18% ││ 0x00007f28e8bef9fc: movsbl 0x42(%rdx,%r11,1),%r10d
0.39% ││ 0x00007f28e8befa02: mov %r10b,0x42(%rcx,%r8,1)
1.15% ││ 0x00007f28e8befa07: movsbl 0x43(%rdx,%r11,1),%r10d ; {other}
0.26% ││ 0x00007f28e8befa0d: mov %r10b,0x43(%rcx,%r8,1)
1.09% ││ 0x00007f28e8befa12: movsbl 0x44(%rdx,%rbx,1),%r10d
0.32% ││ 0x00007f28e8befa18: mov %r10b,0x44(%rcx,%r9,1)
1.02% ││ 0x00007f28e8befa1d: movsbl 0x45(%rdx,%r11,1),%r10d
0.32% ││ 0x00007f28e8befa23: mov %r10b,0x45(%rcx,%r8,1)
1.15% ││ 0x00007f28e8befa28: movsbl 0x46(%rdx,%r11,1),%r10d
0.37% ││ 0x00007f28e8befa2e: mov %r10b,0x46(%rcx,%r8,1)
1.20% ││ 0x00007f28e8befa33: movsbl 0x47(%rdx,%rbx,1),%r10d
0.30% ││ 0x00007f28e8befa39: mov %r10b,0x47(%rcx,%r8,1)
1.01% ││ 0x00007f28e8befa3e: movsbl 0x48(%rdx,%r11,1),%r10d
0.35% ││ 0x00007f28e8befa44: mov %r10b,0x48(%rcx,%r9,1)
1.30% ││ 0x00007f28e8befa49: movsbl 0x49(%rdx,%r11,1),%r10d
0.44% ││ 0x00007f28e8befa4f: mov %r10b,0x49(%rcx,%r8,1)
1.18% ││ 0x00007f28e8befa54: movsbl 0x4a(%rdx,%r11,1),%r10d
0.31% ││ 0x00007f28e8befa5a: mov %r10b,0x4a(%rcx,%r8,1)
1.26% ││ 0x00007f28e8befa5f: movsbl 0x4b(%rdx,%r11,1),%r10d
0.28% ││ 0x00007f28e8befa65: mov %r10b,0x4b(%rcx,%r8,1)
1.01% ││ 0x00007f28e8befa6a: movsbl 0x4c(%rdx,%r11,1),%r10d
0.64% ││ 0x00007f28e8befa70: mov %r10b,0x4c(%rcx,%r8,1)
1.31% ││ 0x00007f28e8befa75: movsbl 0x4d(%rdx,%r11,1),%r10d
0.46% ││ 0x00007f28e8befa7b: mov %r10b,0x4d(%rcx,%r8,1)
1.22% ││ 0x00007f28e8befa80: movsbl 0x4e(%rdx,%r11,1),%r10d
0.44% ││ 0x00007f28e8befa86: mov %r10b,0x4e(%rcx,%r8,1)
1.22% ││ 0x00007f28e8befa8b: movsbl 0x4f(%rdx,%r11,1),%r10d
0.25% ││ 0x00007f28e8befa91: mov %r10b,0x4f(%rcx,%r8,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0}
││ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92)
1.09% ││ 0x00007f28e8befa96: add $0x40,%esi ;*iinc {reexecute=0 rethrow=0 return_oop=0}
││ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 23 (line 91)
││ 0x00007f28e8befa99: cmp %r14d,%esi
│╰ 0x00007f28e8befa9c: jl 0x00007f28e8bef7b0 ;*goto {reexecute=0 rethrow=0 return_oop=0}
│ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91)
│ 0x00007f28e8befaa2: mov 0x30(%r15),%r10 ; ImmutableOopMap {rcx=Oop rdx=Oop }
│ ;*goto {reexecute=1 rethrow=0 return_oop=0}
│ ; - (reexecute) org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91)
0.12% │ 0x00007f28e8befaa6: test %eax,(%r10) ;*goto {reexecute=0 rethrow=0 return_oop=0}
│ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91)
│ ; {poll}
0.04% │ 0x00007f28e8befaa9: cmp %r13d,%esi
│ 0x00007f28e8befaac: jge 0x00007f28e8befae6
│ 0x00007f28e8befaae: vmovd %xmm0,%r8d
│ 0x00007f28e8befab3: vmovd %xmm2,%r9d
↘ 0x00007f28e8befab8: mov %r13d,%r14d
0x00007f28e8befabb: sub %esi,%r14d
0x00007f28e8befabe: xor %r10d,%r10d
0x00007f28e8befac1: cmp %esi,%r13d
0x00007f28e8befac4: cmovl %r10d,%r14d
....................................................................................................
95.97% <total for region 1>
....[Hottest Regions]...............................................................................
95.97% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261
0.65% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261
0.27% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*)
0.14% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&)
0.12% libc.so.6 __futex_abstimed_wait_common
0.12% libc.so.6 clone3
0.11% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWordPretendNotProfitable_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1281
0.10% kernel [unknown]
0.08% libc.so.6 __GI___lll_lock_wait
0.07% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261
0.07% libjvm.so CompilerOracle::should_not_inline(methodHandle const&)
0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*)
0.07% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0]
0.06% libjvm.so CompilerOracle::should_exclude(methodHandle const&)
0.06% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&)
0.06% libjvm.so defaultStream::write(char const*, unsigned long)
0.06% libc.so.6 _IO_fwrite
0.05% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWordPretendNotProfitable_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1281
0.05% libjvm.so MethodMatcher::matches(methodHandle const&) const
0.05% libjvm.so os::pd_write(int, void const*, unsigned long)
1.80% <...other 139 warm regions...>
....................................................................................................
99.99% <totals>
....[Hottest Methods (after inlining)]..............................................................
96.70% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 3, compile id 1261
0.27% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*)
0.17% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&)
0.16% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWordPretendNotProfitable_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1281
0.15% libjvm.so defaultStream::write(char const*, unsigned long)
0.13% <unknown>
0.13% hsdis-amd64.so print_insn
0.12% libc.so.6 clone3
0.12% libc.so.6 __futex_abstimed_wait_common
0.10% kernel [unknown]
0.09% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0]
0.09% libc.so.6 _IO_fwrite
0.08% libc.so.6 __GI___lll_lock_wait
0.07% libjvm.so CompilerOracle::should_not_inline(methodHandle const&)
0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*)
0.06% libc.so.6 __vfprintf_internal
0.06% libjvm.so CompilerOracle::should_exclude(methodHandle const&)
0.06% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&)
0.05% libc.so.6 __GI___pthread_disable_asynccancel
0.05% libjvm.so os::pd_write(int, void const*, unsigned long)
1.31% <...other 94 warm methods...>
....................................................................................................
99.99% <totals>
....[Distribution by Source]........................................................................
96.86% c2, level 4
1.72% libjvm.so
0.88% libc.so.6
0.18% hsdis-amd64.so
0.13%
0.10% kernel
0.10% interpreter
0.01% perf-1337464.map
0.01% ld-linux-x86-64.so.2
....................................................................................................
99.99% <totals>
# Run complete. Total time: 00:00:21
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.
Benchmark (SIZE) (seed) Mode Cnt Score Error Units
VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias 10000 0 avgt 10 2854.720 ± 20.377 ns/op
VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias:asm 10000 0 avgt NaN ---
Performance counter stats for './build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar VectorAliasing.VectorAliasingSuperWordPretendNotProfitable.bench_copy_array_B_differentIndex_alias -prof perfasm':
38,626.40 msec task-clock:u # 1.671 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
65,014 page-faults:u # 1.683 K/sec
61,866,869,316 cycles:u # 1.602 GHz
139,866,728,493 instructions:u # 2.26 insn per cycle
12,248,937,904 branches:u # 317.113 M/sec
261,300,604 branch-misses:u # 2.13% of all branches
TopdownL1 # 18.9 % tma_backend_bound
# 33.9 % tma_bad_speculation
# 12.4 % tma_frontend_bound
# 34.9 % tma_retiring
23.119850997 seconds time elapsed
25.491034000 seconds user
13.058817000 seconds sys
VS
[empeter at emanuel jdk-fork6]$ perf stat ./build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar "VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias" -prof perfasm
WARNING: A terminally deprecated method in sun.misc.Unsafe has been called
WARNING: sun.misc.Unsafe::objectFieldOffset has been called by org.openjdk.jmh.util.Utils (file:/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar)
WARNING: Please consider reporting this to the maintainers of class org.openjdk.jmh.util.Utils
WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release
# JMH version: 1.37
# VM version: JDK 26-internal, Java HotSpot(TM) 64-Bit Server VM, 26-internal-2025-08-19-0806546.empeter...
# VM invoker: /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/jdk/bin/java
# VM options: -XX:+UseSuperWord
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 1 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias
# Parameters: (SIZE = 10000, seed = 0)
# Run progress: 0.00% complete, ETA 00:00:11
# Fork: 1 of 1
# Preparing profilers: LinuxPerfAsmProfiler
# Profilers consume stdout and stderr from target VM, use -v EXTRA to copy to console
# Warmup Iteration 1: 3546.830 ns/op
Iteration 1: 3178.654 ns/op
Iteration 2: 3191.249 ns/op
Iteration 3: 3184.110 ns/op
Iteration 4: 3199.210 ns/op
Iteration 5: 3188.098 ns/op
Iteration 6: 3190.187 ns/op
Iteration 7: 3177.316 ns/op
Iteration 8: 3166.970 ns/op
Iteration 9: 3175.117 ns/op
Iteration 10: 3165.729 ns/op
# Processing profiler results: LinuxPerfAsmProfiler
Result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias":
3181.664 ±(99.9%) 16.411 ns/op [Average]
(min, avg, max) = (3165.729, 3181.664, 3199.210), stdev = 10.855
CI (99.9%): [3165.253, 3198.075] (assumes normal distribution)
Secondary result "org.openjdk.bench.vm.compiler.VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias:asm":
PrintAssembly processed: 327081 total address lines.
Perf output processed (skipped 10.149 seconds):
Column 1: cycles (10319 events)
Hottest code regions (>10.00% "cycles" events):
Event counts are percents of total event count.
....[Hottest Region 1]..............................................................................
c2, level 4, org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267
0x00007fc4e0bef97f: movsbl 0x10(%rdx,%r13,1),%r11d
0x00007fc4e0bef985: mov %r11b,0x10(%rcx,%rbp,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0}
; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92)
0x00007fc4e0bef98a: mov $0x1,%r10d
0x00007fc4e0bef990: mov 0x8(%rsp),%r8d
0x00007fc4e0bef995: cmp $0x1,%r8d
╭ 0x00007fc4e0bef999: jle 0x00007fc4e0befccd
│ 0x00007fc4e0bef99f: mov $0xfa00,%esi
│╭ 0x00007fc4e0bef9a4: jmp 0x00007fc4e0bef9a9
││ ↗ 0x00007fc4e0bef9a6: mov %r14d,%r8d
0.02% │↘ │ 0x00007fc4e0bef9a9: mov %r8d,%r11d
│ │ 0x00007fc4e0bef9ac: sub %r10d,%r11d
│ │ 0x00007fc4e0bef9af: xor %r9d,%r9d
│ │ 0x00007fc4e0bef9b2: cmp %r10d,%r8d
│ │ 0x00007fc4e0bef9b5: cmovl %r9d,%r11d
│ │ 0x00007fc4e0bef9b9: cmp $0xfa00,%r11d
│ │ 0x00007fc4e0bef9c0: cmova %esi,%r11d
0.01% │ │ 0x00007fc4e0bef9c4: add %r10d,%r11d
│ │ 0x00007fc4e0bef9c7: mov %r8d,%r14d
│ │ 0x00007fc4e0bef9ca: nopw 0x0(%rax,%rax,1) ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 10 (line 92)
0.22% │ ↗│ 0x00007fc4e0bef9d0: vmovd %xmm0,%ebx
│ ││ 0x00007fc4e0bef9d4: add %r10d,%ebx
0.12% │ ││ 0x00007fc4e0bef9d7: mov 0x4(%rsp),%r9d
0.47% │ ││ 0x00007fc4e0bef9dc: add %r10d,%r9d
1.60% │ ││ 0x00007fc4e0bef9df: movslq %ebx,%r8
0.02% │ ││ 0x00007fc4e0bef9e2: movslq %r9d,%rbx
0.27% │ ││ 0x00007fc4e0bef9e5: movslq %r10d,%r9
│ ││ 0x00007fc4e0bef9e8: lea (%r9,%rbp,1),%rdi
0.25% │ ││ 0x00007fc4e0bef9ec: lea (%r9,%r13,1),%rax
0.01% │ ││ 0x00007fc4e0bef9f0: movsbl 0x10(%rdx,%rax,1),%r9d
0.41% │ ││ 0x00007fc4e0bef9f6: mov %r9b,0x10(%rcx,%rdi,1)
0.16% │ ││ 0x00007fc4e0bef9fb: movsbl 0x11(%rdx,%rax,1),%r9d
1.86% │ ││ 0x00007fc4e0befa01: mov %r9b,0x11(%rcx,%rdi,1)
0.78% │ ││ 0x00007fc4e0befa06: movsbl 0x12(%rdx,%rax,1),%r9d
0.36% │ ││ 0x00007fc4e0befa0c: mov %r9b,0x12(%rcx,%rdi,1)
0.83% │ ││ 0x00007fc4e0befa11: movsbl 0x13(%rdx,%r8,1),%r9d
1.06% │ ││ 0x00007fc4e0befa17: mov %r9b,0x13(%rcx,%rbx,1)
4.04% │ ││ 0x00007fc4e0befa1c: movsbl 0x14(%rdx,%r8,1),%r9d
4.36% │ ││ 0x00007fc4e0befa22: mov %r9b,0x14(%rcx,%rbx,1)
3.32% │ ││ 0x00007fc4e0befa27: movsbl 0x15(%rdx,%r8,1),%r9d
1.06% │ ││ 0x00007fc4e0befa2d: mov %r9b,0x15(%rcx,%rbx,1)
1.21% │ ││ 0x00007fc4e0befa32: movsbl 0x16(%rdx,%r8,1),%r9d
0.64% │ ││ 0x00007fc4e0befa38: mov %r9b,0x16(%rcx,%rbx,1)
1.11% │ ││ 0x00007fc4e0befa3d: movsbl 0x17(%rdx,%r8,1),%r9d
0.07% │ ││ 0x00007fc4e0befa43: mov %r9b,0x17(%rcx,%rbx,1)
0.70% │ ││ 0x00007fc4e0befa48: movsbl 0x18(%rdx,%r8,1),%r9d
1.01% │ ││ 0x00007fc4e0befa4e: mov %r9b,0x18(%rcx,%rbx,1)
1.61% │ ││ 0x00007fc4e0befa53: movsbl 0x19(%rdx,%r8,1),%r9d
0.58% │ ││ 0x00007fc4e0befa59: mov %r9b,0x19(%rcx,%rbx,1)
1.18% │ ││ 0x00007fc4e0befa5e: movsbl 0x1a(%rdx,%r8,1),%r9d
0.34% │ ││ 0x00007fc4e0befa64: mov %r9b,0x1a(%rcx,%rbx,1)
1.08% │ ││ 0x00007fc4e0befa69: movsbl 0x1b(%rdx,%r8,1),%r9d ; {other}
0.05% │ ││ 0x00007fc4e0befa6f: mov %r9b,0x1b(%rcx,%rbx,1)
0.79% │ ││ 0x00007fc4e0befa74: movsbl 0x1c(%rdx,%r8,1),%r9d
0.70% │ ││ 0x00007fc4e0befa7a: mov %r9b,0x1c(%rcx,%rbx,1)
1.49% │ ││ 0x00007fc4e0befa7f: movsbl 0x1d(%rdx,%r8,1),%r9d
0.33% │ ││ 0x00007fc4e0befa85: mov %r9b,0x1d(%rcx,%rbx,1)
0.80% │ ││ 0x00007fc4e0befa8a: movsbl 0x1e(%rdx,%r8,1),%r9d
0.49% │ ││ 0x00007fc4e0befa90: mov %r9b,0x1e(%rcx,%rbx,1)
1.03% │ ││ 0x00007fc4e0befa95: movsbl 0x1f(%rdx,%r8,1),%r9d
0.05% │ ││ 0x00007fc4e0befa9b: mov %r9b,0x1f(%rcx,%rbx,1)
0.98% │ ││ 0x00007fc4e0befaa0: movsbl 0x20(%rdx,%r8,1),%r9d
0.33% │ ││ 0x00007fc4e0befaa6: mov %r9b,0x20(%rcx,%rbx,1)
1.31% │ ││ 0x00007fc4e0befaab: movsbl 0x21(%rdx,%r8,1),%r9d
0.02% │ ││ 0x00007fc4e0befab1: mov %r9b,0x21(%rcx,%rbx,1)
0.84% │ ││ 0x00007fc4e0befab6: movsbl 0x22(%rdx,%r8,1),%r9d
0.05% │ ││ 0x00007fc4e0befabc: mov %r9b,0x22(%rcx,%rbx,1)
1.01% │ ││ 0x00007fc4e0befac1: movsbl 0x23(%rdx,%r8,1),%r9d
│ ││ 0x00007fc4e0befac7: mov %r9b,0x23(%rcx,%rbx,1)
0.76% │ ││ 0x00007fc4e0befacc: movsbl 0x24(%rdx,%r8,1),%r9d
0.08% │ ││ 0x00007fc4e0befad2: mov %r9b,0x24(%rcx,%rbx,1)
1.17% │ ││ 0x00007fc4e0befad7: movsbl 0x25(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befadd: mov %r9b,0x25(%rcx,%rbx,1)
0.88% │ ││ 0x00007fc4e0befae2: movsbl 0x26(%rdx,%r8,1),%r9d
0.07% │ ││ 0x00007fc4e0befae8: mov %r9b,0x26(%rcx,%rbx,1)
1.23% │ ││ 0x00007fc4e0befaed: movsbl 0x27(%rdx,%r8,1),%r9d
0.02% │ ││ 0x00007fc4e0befaf3: mov %r9b,0x27(%rcx,%rbx,1)
0.73% │ ││ 0x00007fc4e0befaf8: movsbl 0x28(%rdx,%r8,1),%r9d
0.16% │ ││ 0x00007fc4e0befafe: mov %r9b,0x28(%rcx,%rbx,1)
1.27% │ ││ 0x00007fc4e0befb03: movsbl 0x29(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befb09: mov %r9b,0x29(%rcx,%rbx,1)
0.75% │ ││ 0x00007fc4e0befb0e: movsbl 0x2a(%rdx,%r8,1),%r9d
0.13% │ ││ 0x00007fc4e0befb14: mov %r9b,0x2a(%rcx,%rbx,1)
1.27% │ ││ 0x00007fc4e0befb19: movsbl 0x2b(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befb1f: mov %r9b,0x2b(%rcx,%rbx,1)
0.63% │ ││ 0x00007fc4e0befb24: movsbl 0x2c(%rdx,%r8,1),%r9d
0.17% │ ││ 0x00007fc4e0befb2a: mov %r9b,0x2c(%rcx,%rbx,1)
1.26% │ ││ 0x00007fc4e0befb2f: movsbl 0x2d(%rdx,%r8,1),%r9d
0.04% │ ││ 0x00007fc4e0befb35: mov %r9b,0x2d(%rcx,%rbx,1)
0.76% │ ││ 0x00007fc4e0befb3a: movsbl 0x2e(%rdx,%r8,1),%r9d
0.23% │ ││ 0x00007fc4e0befb40: mov %r9b,0x2e(%rcx,%rbx,1)
1.49% │ ││ 0x00007fc4e0befb45: movsbl 0x2f(%rdx,%r8,1),%r9d
0.14% │ ││ 0x00007fc4e0befb4b: mov %r9b,0x2f(%rcx,%rbx,1)
0.79% │ ││ 0x00007fc4e0befb50: movsbl 0x30(%rdx,%r8,1),%r9d
0.33% │ ││ 0x00007fc4e0befb56: mov %r9b,0x30(%rcx,%rbx,1)
1.44% │ ││ 0x00007fc4e0befb5b: movsbl 0x31(%rdx,%r8,1),%r9d
0.20% │ ││ 0x00007fc4e0befb61: mov %r9b,0x31(%rcx,%rbx,1)
0.78% │ ││ 0x00007fc4e0befb66: movsbl 0x32(%rdx,%r8,1),%r9d
0.46% │ ││ 0x00007fc4e0befb6c: mov %r9b,0x32(%rcx,%rbx,1) ; {other}
1.46% │ ││ 0x00007fc4e0befb71: movsbl 0x33(%rdx,%r8,1),%r9d
│ ││ 0x00007fc4e0befb77: mov %r9b,0x33(%rcx,%rbx,1)
0.66% │ ││ 0x00007fc4e0befb7c: movsbl 0x34(%rdx,%r8,1),%r9d
0.07% │ ││ 0x00007fc4e0befb82: mov %r9b,0x34(%rcx,%rbx,1)
1.55% │ ││ 0x00007fc4e0befb87: movsbl 0x35(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befb8d: mov %r9b,0x35(%rcx,%rbx,1)
0.78% │ ││ 0x00007fc4e0befb92: movsbl 0x36(%rdx,%r8,1),%r9d
0.19% │ ││ 0x00007fc4e0befb98: mov %r9b,0x36(%rcx,%rbx,1)
1.47% │ ││ 0x00007fc4e0befb9d: movsbl 0x37(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befba3: mov %r9b,0x37(%rcx,%rbx,1)
0.74% │ ││ 0x00007fc4e0befba8: movsbl 0x38(%rdx,%r8,1),%r9d
0.15% │ ││ 0x00007fc4e0befbae: mov %r9b,0x38(%rcx,%rbx,1)
1.24% │ ││ 0x00007fc4e0befbb3: movsbl 0x39(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befbb9: mov %r9b,0x39(%rcx,%rbx,1)
0.68% │ ││ 0x00007fc4e0befbbe: movsbl 0x3a(%rdx,%r8,1),%r9d
0.25% │ ││ 0x00007fc4e0befbc4: mov %r9b,0x3a(%rcx,%rbx,1)
1.59% │ ││ 0x00007fc4e0befbc9: movsbl 0x3b(%rdx,%r8,1),%r9d
│ ││ 0x00007fc4e0befbcf: mov %r9b,0x3b(%rcx,%rbx,1)
0.57% │ ││ 0x00007fc4e0befbd4: movsbl 0x3c(%rdx,%r8,1),%r9d
0.12% │ ││ 0x00007fc4e0befbda: mov %r9b,0x3c(%rcx,%rbx,1)
1.55% │ ││ 0x00007fc4e0befbdf: movsbl 0x3d(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befbe5: mov %r9b,0x3d(%rcx,%rbx,1)
0.57% │ ││ 0x00007fc4e0befbea: movsbl 0x3e(%rdx,%r8,1),%r9d
0.12% │ ││ 0x00007fc4e0befbf0: mov %r9b,0x3e(%rcx,%rbx,1)
1.48% │ ││ 0x00007fc4e0befbf5: movsbl 0x3f(%rdx,%r8,1),%r9d
│ ││ 0x00007fc4e0befbfb: mov %r9b,0x3f(%rcx,%rbx,1)
0.41% │ ││ 0x00007fc4e0befc00: movsbl 0x40(%rdx,%r8,1),%r9d
0.14% │ ││ 0x00007fc4e0befc06: mov %r9b,0x40(%rcx,%rdi,1)
1.77% │ ││ 0x00007fc4e0befc0b: movsbl 0x41(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befc11: mov %r9b,0x41(%rcx,%rbx,1)
0.30% │ ││ 0x00007fc4e0befc16: movsbl 0x42(%rdx,%r8,1),%r9d
0.33% │ ││ 0x00007fc4e0befc1c: mov %r9b,0x42(%rcx,%rbx,1)
2.00% │ ││ 0x00007fc4e0befc21: movsbl 0x43(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befc27: mov %r9b,0x43(%rcx,%rbx,1)
0.34% │ ││ 0x00007fc4e0befc2c: movsbl 0x44(%rdx,%r8,1),%r9d
0.87% │ ││ 0x00007fc4e0befc32: mov %r9b,0x44(%rcx,%rbx,1)
2.25% │ ││ 0x00007fc4e0befc37: movsbl 0x45(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befc3d: mov %r9b,0x45(%rcx,%rbx,1)
0.19% │ ││ 0x00007fc4e0befc42: movsbl 0x46(%rdx,%r8,1),%r9d
0.65% │ ││ 0x00007fc4e0befc48: mov %r9b,0x46(%rcx,%rbx,1)
1.96% │ ││ 0x00007fc4e0befc4d: movsbl 0x47(%rdx,%r8,1),%r9d
0.02% │ ││ 0x00007fc4e0befc53: mov %r9b,0x47(%rcx,%rbx,1)
0.16% │ ││ 0x00007fc4e0befc58: movsbl 0x48(%rdx,%r8,1),%r9d
0.58% │ ││ 0x00007fc4e0befc5e: mov %r9b,0x48(%rcx,%rbx,1)
1.78% │ ││ 0x00007fc4e0befc63: movsbl 0x49(%rdx,%r8,1),%r9d
0.06% │ ││ 0x00007fc4e0befc69: mov %r9b,0x49(%rcx,%rbx,1) ; {other}
0.18% │ ││ 0x00007fc4e0befc6e: movsbl 0x4a(%rdx,%r8,1),%r9d
0.49% │ ││ 0x00007fc4e0befc74: mov %r9b,0x4a(%rcx,%rbx,1)
1.76% │ ││ 0x00007fc4e0befc79: movsbl 0x4b(%rdx,%r8,1),%r9d
0.01% │ ││ 0x00007fc4e0befc7f: mov %r9b,0x4b(%rcx,%rbx,1)
0.15% │ ││ 0x00007fc4e0befc84: movsbl 0x4c(%rdx,%r8,1),%r9d
0.49% │ ││ 0x00007fc4e0befc8a: mov %r9b,0x4c(%rcx,%rbx,1)
1.92% │ ││ 0x00007fc4e0befc8f: movsbl 0x4d(%rdx,%r8,1),%r9d
0.21% │ ││ 0x00007fc4e0befc95: mov %r9b,0x4d(%rcx,%rbx,1)
0.22% │ ││ 0x00007fc4e0befc9a: movsbl 0x4e(%rdx,%r8,1),%r9d
0.50% │ ││ 0x00007fc4e0befca0: mov %r9b,0x4e(%rcx,%rbx,1)
1.94% │ ││ 0x00007fc4e0befca5: movsbl 0x4f(%rdx,%r8,1),%r9d
0.12% │ ││ 0x00007fc4e0befcab: mov %r9b,0x4f(%rcx,%rbx,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0}
│ ││ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92)
0.18% │ ││ 0x00007fc4e0befcb0: add $0x40,%r10d ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ││ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 23 (line 91)
│ ││ 0x00007fc4e0befcb4: cmp %r11d,%r10d
│ ╰│ 0x00007fc4e0befcb7: jl 0x00007fc4e0bef9d0 ;*goto {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91)
0.01% │ │ 0x00007fc4e0befcbd: mov 0x30(%r15),%r11 ; ImmutableOopMap {rcx=Oop rdx=Oop }
│ │ ;*goto {reexecute=1 rethrow=0 return_oop=0}
│ │ ; - (reexecute) org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91)
0.11% │ │ 0x00007fc4e0befcc1: test %eax,(%r11) ;*goto {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 26 (line 91)
│ │ ; {poll}
0.02% │ │ 0x00007fc4e0befcc4: cmp %r14d,%r10d
│ ╰ 0x00007fc4e0befcc7: jl 0x00007fc4e0bef9a6
0.01% ↘ 0x00007fc4e0befccd: cmp (%rsp),%r10d
0x00007fc4e0befcd1: jge 0x00007fc4e0bef967
0.02% 0x00007fc4e0befcd7: nop ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 10 (line 92)
0.06% ↗ 0x00007fc4e0befcd8: movslq %r10d,%r11
0.01% │ 0x00007fc4e0befcdb: lea (%r11,%rbp,1),%r8
0.18% │ 0x00007fc4e0befcdf: lea (%r11,%r13,1),%r9
0.01% │ 0x00007fc4e0befce3: movsbl 0x10(%rdx,%r9,1),%r9d
0.04% │ 0x00007fc4e0befce9: mov %r9b,0x10(%rcx,%r8,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0}
│ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 22 (line 92)
0.21% │ 0x00007fc4e0befcee: inc %r10d ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 23 (line 91)
│ 0x00007fc4e0befcf1: cmp (%rsp),%r10d
╰ 0x00007fc4e0befcf5: jl 0x00007fc4e0befcd8
0x00007fc4e0befcf7: jmpq 0x00007fc4e0bef967 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - org.openjdk.bench.vm.compiler.VectorAliasing::copy_B at 7 (line 91)
0x00007fc4e0befcfc: mov $0xffffff6e,%esi
0x00007fc4e0befd01: mov %rdx,%rbp
0x00007fc4e0befd04: mov %rcx,(%rsp)
0x00007fc4e0befd08: mov %r8d,0x8(%rsp)
....................................................................................................
96.25% <total for region 1>
....[Hottest Regions]...............................................................................
96.25% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267
0.29% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*)
0.16% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&)
0.14% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&)
0.12% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267
0.12% libc.so.6 __futex_abstimed_wait_common
0.11% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&)
0.11% libjvm.so MethodMatcher::matches(methodHandle const&) const
0.09% libjvm.so CompilerOracle::should_not_inline(methodHandle const&)
0.09% libjvm.so fileStream::write(char const*, unsigned long)
0.08% libjvm.so defaultStream::write(char const*, unsigned long)
0.08% libc.so.6 _IO_fwrite
0.08% libc.so.6 clone3
0.07% kernel [unknown]
0.07% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWord_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1288
0.07% libjvm.so CompilerOracle::should_exclude(methodHandle const&)
0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*)
0.07% libc.so.6 __GI___pthread_disable_asynccancel
0.06% libjvm.so os::write(int, void const*, unsigned long)
0.05% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0]
1.85% <...other 145 warm regions...>
....................................................................................................
99.99% <totals>
....[Hottest Methods (after inlining)]..............................................................
96.41% c2, level 4 org.openjdk.bench.vm.compiler.VectorAliasing::copy_B, version 5, compile id 1267
0.29% libjvm.so ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*)
0.27% libjvm.so resolve_inlining_predicate(CompileCommandEnum, methodHandle const&)
0.15% hsdis-amd64.so print_insn
0.14% libjvm.so CompilerOracle::tag_blackhole_if_possible(methodHandle const&)
0.13% <unknown>
0.13% libc.so.6 _IO_fwrite
0.12% libc.so.6 __futex_abstimed_wait_common
0.12% libjvm.so fileStream::write(char const*, unsigned long)
0.11% libjvm.so MethodMatcher::matches(methodHandle const&) const
0.11% libjvm.so defaultStream::write(char const*, unsigned long)
0.09% libjvm.so CompilerOracle::should_not_inline(methodHandle const&)
0.09% c2, level 4 org.openjdk.bench.vm.compiler.jmh_generated.VectorAliasing_VectorAliasingSuperWord_bench_copy_array_B_differentIndex_alias_jmhTest::bench_copy_array_B_differentIndex_alias_avgt_jmhStub, version 5, compile id 1288
0.09% libjvm.so xmlStream::write_text(char const*, unsigned long) [clone .part.0]
0.08% libc.so.6 clone3
0.07% libc.so.6 __GI___pthread_disable_asynccancel
0.07% libjvm.so RelocIterator::initialize(nmethod*, unsigned char*, unsigned char*)
0.07% libjvm.so CompilerOracle::should_exclude(methodHandle const&)
0.07% kernel [unknown]
0.07% interpreter method entry point (kind = zerolocals)
1.36% <...other 99 warm methods...>
....................................................................................................
99.99% <totals>
....[Distribution by Source]........................................................................
96.54% c2, level 4
2.24% libjvm.so
0.69% libc.so.6
0.19% hsdis-amd64.so
0.13%
0.13% interpreter
0.07% kernel
0.01% ld-linux-x86-64.so.2
....................................................................................................
99.99% <totals>
# Run complete. Total time: 00:00:21
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.
Benchmark                                                                           (SIZE)  (seed)  Mode  Cnt     Score    Error  Units
VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias       10000       0  avgt   10  3181.664 ± 16.411  ns/op
VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias:asm   10000       0  avgt            NaN             ---
Performance counter stats for './build/linux-x64/jdk/bin/java -Djava.library.path=/home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/native -jar /home/empeter/Documents/oracle/jdk-fork6/build/linux-x64/images/test/micro/benchmarks.jar VectorAliasing.VectorAliasingSuperWord.bench_copy_array_B_differentIndex_alias -prof perfasm':
38,374.64 msec task-clock:u # 1.688 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
63,886 page-faults:u # 1.665 K/sec
61,212,825,552 cycles:u # 1.595 GHz
130,428,038,623 instructions:u # 2.13 insn per cycle
12,158,904,836 branches:u # 316.847 M/sec
259,878,216 branch-misses:u # 2.14% of all branches
TopdownL1 # 21.9 % tma_backend_bound
# 32.5 % tma_bad_speculation
# 12.8 % tma_frontend_bound
# 32.8 % tma_retiring
22.730051773 seconds time elapsed
25.260343000 seconds user
13.046024000 seconds sys
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3213283462