Vector API + MethodHandle optimization
Paul Sandoz
paul.sandoz at oracle.com
Mon Feb 22 20:05:41 UTC 2021
Hi Remi,
It will take me some time to analyze, but there might be some odd interaction going on with vector boxes and MHs.
As you experienced, masked operations are not yet fully optimized. Work is under way on supporting platforms to use hardware mask registers, e.g. on ARM SVE and AVX-512. On other platforms, such as AVX2, we have to resort to composing with blend. Loads and stores are particularly challenging, because we need to ensure we don’t access memory that is out of bounds.
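To make the blend point concrete, here is a minimal scalar sketch of a masked store composed from a full-width load, a blend and a full-width store. It is only a model of the idea, not what HotSpot generates; the class name, the LANES constant and the blendStore helper are made up for illustration.

```java
import java.util.Arrays;

class BlendStoreSketch {
    // Assumed lane count: 8 ints, as in a 256-bit vector.
    static final int LANES = 8;

    // Scalar model of a masked store emulated with blend:
    // load a full vector from dest, replace the lanes selected by the mask,
    // then store the full vector back.
    static void blendStore(int[] dest, int offset, int[] value, boolean[] mask) {
        // The load and the store are both full width, so all LANES elements
        // starting at offset must lie inside dest, whatever the mask says.
        if (offset < 0 || offset > dest.length - LANES) {
            throw new ArrayIndexOutOfBoundsException(
                "full-width access at " + offset + " does not fit in length " + dest.length);
        }
        int[] lanes = Arrays.copyOfRange(dest, offset, offset + LANES); // full-width load
        for (int l = 0; l < LANES; l++) {
            if (mask[l]) {
                lanes[l] = value[l];                                    // blend
            }
        }
        System.arraycopy(lanes, 0, dest, offset, LANES);                // full-width store
    }

    public static void main(String[] args) {
        int[] dest = new int[10];
        int[] value = {1, 2, 3, 4, 5, 6, 7, 8};
        boolean[] firstTwo = {true, true, false, false, false, false, false, false};

        blendStore(dest, 0, value, firstTwo);
        System.out.println(Arrays.toString(dest));

        // At the tail only 2 elements remain, so the full-width access does not fit,
        // even though the mask selects just 2 lanes.
        try {
            blendStore(dest, 8, value, firstTwo);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("tail cannot use the blend emulation: " + e.getMessage());
        }
    }
}
```

The second call is the interesting one: the mask only selects lanes that are in bounds, but the emulation still touches the whole vector, which is why the tail of an array needs special care.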
Paul.
> On Feb 16, 2021, at 10:17 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>
> Long mail because there is an issue with the way the method handle and the vector API are optimized together AND
> i believe there is a regression because i don't see any method handle related code when i use LogCompilation or PrintInlining anymore.
>
> This weekend, i spent a little time checking with JMH whether the code inside the vector-handle repo was optimized correctly.
> https://github.com/forax/vector-handle
>
> First, it did not look great, because it seems that using masks is really slow.
> So i've rewritten the code to not use masks but a classical loop at the end.
>
> default void apply(int[] dest, int[] a, int[] b, IIIOp operator) {
>     var length = dest.length;
>     if (a.length != length || b.length != length) {
>         throw new IllegalArgumentException("wrong length");
>     }
>     int i = 0;
>     int bound = INT_SPECIES.loopBound(a.length);
>     for (; i < bound; i += INT_SPECIES.length()) {
>         var va = IntVector.fromArray(INT_SPECIES, a, i);
>         var vb = IntVector.fromArray(INT_SPECIES, b, i);
>         var vc = (IntVector) invoke(operator, va, vb, null, null);
>         vc.intoArray(dest, i);
>     }
>     for (; i < a.length; i++) {
>         dest[i] = operator.apply(a[i], b[i]);
>     }
> }
>
> and the JMH benchmark is a call to
> VH.apply(DEST, A, B, (x, y) -> x + y * 2);
> with DEST, A and B being arrays of 100_000 ints declared as constants
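As a side note on the loop split above, here is a small sketch of the arithmetic behind loopBound, written in plain Java so it runs without --add-modules. It assumes 8 int lanes (the perfasm output further down mentions Int256Vector) and a power-of-two lane count; the class and method names are made up for illustration.

```java
class LoopBoundSketch {
    // Models VectorSpecies.loopBound for a power-of-two lane count:
    // the largest multiple of lanes that is <= length (for length >= 0).
    static int loopBound(int length, int lanes) {
        return length & -lanes;
    }

    public static void main(String[] args) {
        int lanes = 8;          // assumption: 256-bit vectors of int
        int length = 100_000;   // array size used by the benchmark
        int bound = loopBound(length, lanes);
        System.out.println("vector iterations: " + (bound / lanes));
        System.out.println("scalar tail elements: " + (length - bound));
    }
}
```

Under that assumption, 100_000 is an exact multiple of 8, so the vector loop covers the whole array and the scalar tail loop does no work in this benchmark.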
>
> Now i see a weird behavior with JMH: the warmup iterations go from 70 µs to 250 µs, then down to 18 µs.
> So at some point, the JITed code is far worse than the previous version (70 to 250) before being fully optimized (at 18).
>
> # JMH version: 1.27
> # VM version: JDK 16-ea, OpenJDK 64-Bit Server VM, 16-ea+28-2065
> # VM invoker: /home/forax/jdk/jdk-16/bin/java
> # VM options: --add-modules jdk.incubator.vector
> # JMH blackhole mode: full blackhole + dont-inline hint; set -Djmh.blackhole.mode=COMPILER to get compiler-assisted ones
> # Warmup: 5 iterations, 5 s each
> # Measurement: 5 iterations, 5 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Average time, time/op
> # Benchmark: com.github.forax.vectorhandle.VectorHandlePerfTest.vector_handle
>
> # Run progress: 0.00% complete, ETA 00:00:50
> # Fork: 1 of 1
> WARNING: Using incubator modules: jdk.incubator.vector
> # Warmup Iteration 1: 70.396 us/op
> # Warmup Iteration 2: 250.460 us/op
> # Warmup Iteration 3: 127.519 us/op
> # Warmup Iteration 4: 18.574 us/op
> # Warmup Iteration 5: 18.594 us/op
> Iteration 1: 18.516 us/op
> Iteration 2: 18.555 us/op
> Iteration 3: 18.516 us/op
> Iteration 4: 18.389 us/op
> Iteration 5: 18.331 us/op
>
> Result "com.github.forax.vectorhandle.VectorHandlePerfTest.vector_handle":
> 18.461 ±(99.9%) 0.371 us/op [Average]
> (min, avg, max) = (18.331, 18.461, 18.555), stdev = 0.096
> CI (99.9%): [18.091, 18.832] (assumes normal distribution)
>
> Using -prof perfasm, i get 3 regions. The first one is, i believe, the fully optimized code: the loop is unrolled into a series of vpmulld and vpaddd.
> The second region is the loop compiled without the method handle call (the one behind invoke()) being inlined (see callq 0x00007fa531840a80).
> The last region is the body of the method handle alone, compiled on its own.
>
> Then it gets even weirder, when i ask to get the compilation log using -XX:+LogCompilation or just print the inlining trees with -XX:+PrintInlining,
> i'm not able to find any trace of either the benchmark method or even any lambda.
> I wonder if there is not a regression here, it seems that any compilation log that contains a hidden class is now hidden.
>
> Getting back to the issue, when the loop is optimized the first time by c2, the JIT decides to call the method handle instead of inlining it,
> i don't understand exactly why but it's clearly not what i expect.
> Is this the normal behavior? Can we do better here? Do we need to add a hint, like a special method on MutableCallSite saying that the call is now stable enough to be inlined?
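For background on the inlining question, here is a minimal sketch of the two ways a method handle can reach a call. As far as I know, HotSpot can generally inline through a handle only when the JIT sees it as a constant, for instance a static final field or the target of a call site; a handle that arrives as an argument or is read from an ordinary field is not a constant unless the surrounding code is inlined far enough to make it one. Whether that explains the out-of-line call in the apply region is only a guess. All names below are made up for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ConstantHandleSketch {
    // Same shape as the benchmark lambda: (x, y) -> x + y * 2
    static int addTwice(int x, int y) {
        return x + y * 2;
    }

    // A handle in a static final field is a constant for the JIT.
    static final MethodHandle OP;
    static {
        try {
            OP = MethodHandles.lookup().findStatic(
                ConstantHandleSketch.class, "addTwice",
                MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Call through the constant handle.
    static int viaConstant(int x, int y) {
        try {
            return (int) OP.invokeExact(x, y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Call through a handle received as an argument: not a constant
    // when this method is compiled on its own.
    static int viaArgument(MethodHandle mh, int x, int y) {
        try {
            return (int) mh.invokeExact(x, y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaConstant(3, 4));
        System.out.println(viaArgument(OP, 3, 4));
    }
}
```

Both calls return the same value; the difference is only in what the JIT can prove about the handle at each call site, which this sketch does not measure.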
>
> regards,
> Rémi
>
> -------------------------------------------------------------------------------------------------------------------
>
> # JMH version: 1.27
> # VM version: JDK 16-ea, OpenJDK 64-Bit Server VM, 16-ea+28-2065
> # VM invoker: /home/forax/jdk/jdk-16/bin/java
> # VM options: --add-modules jdk.incubator.vector
> # JMH blackhole mode: full blackhole + dont-inline hint; set -Djmh.blackhole.mode=COMPILER to get compiler-assisted ones
> # Warmup: <none>
> # Measurement: 5 iterations, 5 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Average time, time/op
> # Benchmark: com.github.forax.vectorhandle.VectorHandlePerfTest.vector_handle
>
> # Run progress: 0.00% complete, ETA 00:00:25
> # Fork: 1 of 1
> # Preparing profilers: LinuxPerfAsmProfiler
> # Profilers consume stdout and stderr from target VM, use -v EXTRA to copy to console
> Iteration 1: 71.821 us/op
> Iteration 2: 241.065 us/op
> Iteration 3: 124.976 us/op
> Iteration 4: 18.637 us/op
> Iteration 5: 18.655 us/op
> # Processing profiler results: LinuxPerfAsmProfiler
>
>
> Result "com.github.forax.vectorhandle.VectorHandlePerfTest.vector_handle":
> 95.031 ±(99.9%) 357.254 us/op [Average]
> (min, avg, max) = (18.637, 95.031, 241.065), stdev = 92.778
> CI (99.9%): [≈ 0, 452.285] (assumes normal distribution)
>
> Secondary result "com.github.forax.vectorhandle.VectorHandlePerfTest.vector_handle:·asm":
> PrintAssembly processed: 187947 total address lines.
> Perf output processed (skipped 1.209 seconds):
> Column 1: cycles (25831 events)
>
> Hottest code regions (>10.00% "cycles" events):
>
> ....[Hottest Region 1]..............................................................................
> c2, level 4, com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub, version 890 (268 bytes)
>
> ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 23
> ; - java.lang.invoke.Invokers$Holder::invokeExact_MT at 26
> ; - com.github.forax.vectorhandle.VectorHandle::lambda$of$0 at 8 (line 410)
> ; - com.github.forax.vectorhandle.VectorHandle$$Lambda$78/0x0000000800c27610::invoke at 11
> ; - com.github.forax.vectorhandle.VectorHandle::apply at 81 (line 298)
> ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0x00007fa53932320b: mov $0x18688,%r10d
> 0x00007fa539323211: sub %r8d,%r10d
> 0x00007fa539323214: xor %r11d,%r11d
> 0.00% 0x00007fa539323217: cmp $0x18688,%r8d
> 0x00007fa53932321e: cmovg %r11d,%r10d
> 0x00007fa539323222: cmp $0x7d00,%r10d
> 0x00007fa539323229: mov $0x7d00,%r11d
> 0x00007fa53932322f: cmova %r11d,%r10d
> 0.00% 0x00007fa539323233: add %r8d,%r10d ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0x00007fa539323236: data16 nopw 0x0(%rax,%rax,1) ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0}
> ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 23
> ; - java.lang.invoke.Invokers$Holder::invokeExact_MT at 26
> ; - com.github.forax.vectorhandle.VectorHandle::lambda$of$0 at 8 (line 410)
> ; - com.github.forax.vectorhandle.VectorHandle$$Lambda$78/0x0000000800c27610::invoke at 11
> ; - com.github.forax.vectorhandle.VectorHandle::apply at 81 (line 298)
> ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0.26% ↗ 0x00007fa539323240: movabs $0x70ac04e30,%r11 ; {oop([I{0x000000070ac04e30})}
> 0.31% │ 0x00007fa53932324a: vpmulld 0x10(%r11,%r8,4),%ymm1,%ymm0
> 1.60% │ 0x00007fa539323251: movabs $0x70ad73c68,%r11 ; {oop([I{0x000000070ad73c68})}
> 0.28% │ 0x00007fa53932325b: vpaddd 0x10(%r11,%r8,4),%ymm0,%ymm0 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> │ ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 1.33% │ 0x00007fa539323262: movabs $0x70acd2c58,%r11 ; {oop([I{0x000000070acd2c58})}
> 0.28% │ 0x00007fa53932326c: vmovdqu %ymm0,0x10(%r11,%r8,4) ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0}
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 35
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> │ ; - java.lang.invoke.Invokers$Holder::invokeExact_MT at 26
> │ ; - com.github.forax.vectorhandle.VectorHandle::lambda$of$0 at 8 (line 410)
> │ ; - com.github.forax.vectorhandle.VectorHandle$$Lambda$78/0x0000000800c27610::invoke at 11
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 81 (line 298)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> │ ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0.51% │ 0x00007fa539323273: movabs $0x70ac04e30,%r11 ; {oop([I{0x000000070ac04e30})}
> 0.60% │ 0x00007fa53932327d: vpmulld 0x30(%r11,%r8,4),%ymm1,%ymm0
> 5.37% │ 0x00007fa539323284: movabs $0x70ad73c68,%r11 ; {oop([I{0x000000070ad73c68})}
> 0.42% │ 0x00007fa53932328e: vpaddd 0x30(%r11,%r8,4),%ymm0,%ymm0 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> │ ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 6.96% │ 0x00007fa539323295: movabs $0x70acd2c58,%r11 ; {oop([I{0x000000070acd2c58})}
> 0.28% │ 0x00007fa53932329f: vmovdqu %ymm0,0x30(%r11,%r8,4) ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 47 (line 295)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> │ ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0.52% │ 0x00007fa5393232a6: movabs $0x70ac04e30,%r11 ; {oop([I{0x000000070ac04e30})}
> 0.50% │ 0x00007fa5393232b0: vpmulld 0x50(%r11,%r8,4),%ymm1,%ymm0
> 1.44% │ 0x00007fa5393232b7: movabs $0x70ad73c68,%r11 ; {oop([I{0x000000070ad73c68})}
> 0.54% │ 0x00007fa5393232c1: vpaddd 0x50(%r11,%r8,4),%ymm0,%ymm0 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> │ ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 1.10% │ 0x00007fa5393232c8: movabs $0x70acd2c58,%r11 ; {oop([I{0x000000070acd2c58})}
> 0.42% │ 0x00007fa5393232d2: vmovdqu %ymm0,0x50(%r11,%r8,4) ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 47 (line 295)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> │ ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0.62% │ 0x00007fa5393232d9: movabs $0x70ac04e30,%r11 ; {oop([I{0x000000070ac04e30})}
> 0.29% │ 0x00007fa5393232e3: vpmulld 0x70(%r11,%r8,4),%ymm1,%ymm0
> 5.35% │ 0x00007fa5393232ea: movabs $0x70ad73c68,%r11 ; {oop([I{0x000000070ad73c68})}
> 0.26% │ 0x00007fa5393232f4: vpaddd 0x70(%r11,%r8,4),%ymm0,%ymm0 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> │ ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 6.09% │ 0x00007fa5393232fb: movabs $0x70acd2c58,%r11 ; {oop([I{0x000000070acd2c58})}
> 0.36% │ 0x00007fa539323305: vmovdqu %ymm0,0x70(%r11,%r8,4)
> 2.56% │ 0x00007fa53932330c: add $0x20,%r8d
> 0.27% │ 0x00007fa539323310: cmp %r10d,%r8d
> ╰ 0x00007fa539323313: jl 0x00007fa539323240 ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
> ; - java.util.Objects::checkIndex at 3 (line 359)
> ; - jdk.incubator.vector.VectorIntrinsics::checkFromIndexSize at 43 (line 74)
> ; - jdk.incubator.vector.IntVector::fromArray at 9 (line 2689)
> ; - com.github.forax.vectorhandle.VectorHandle::apply at 56 (line 296)
> ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0x00007fa539323319: mov 0x120(%r15),%r11 ; ImmutableOopMap {r9=Oop [80]=Oop [88]=Oop [96]=Oop }
> ;*goto {reexecute=1 rethrow=0 return_oop=0}
> ; - (reexecute) com.github.forax.vectorhandle.VectorHandle::apply at 112 (line 295)
> ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 17 (line 44)
> ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0.01% 0x00007fa539323320: test %eax,(%r11) ; {poll}
> 0.11% 0x00007fa539323323: cmp $0x18688,%r8d
> 0x00007fa53932332a: jl 0x00007fa53932320b
> 0x00007fa539323330: cmp $0x186a0,%r8d
> 0x00007fa539323337: jl 0x00007fa53932313e
> 0x00007fa53932333d: vmovdqu %ymm1,0x8(%rsp)
> 0x00007fa539323343: jmpq 0x00007fa53932318f ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0}
> ; - com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle at 12 (line 44)
> ; - com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub at 17 (line 190)
> 0x00007fa539323348: mov $0xfffffff6,%esi ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
> ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 101
> ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> ....................................................................................................
> 38.65% <total for region 1>
>
> ....[Hottest Region 2]..............................................................................
> c2, level 4, com.github.forax.vectorhandle.VectorHandle::apply, version 851 (626 bytes)
>
> 0x00007fa53931cd4c: mov %rdx,0x78(%rsp)
> 0x00007fa53931cd51: mov %rsi,0xb0(%rsp)
> 0x00007fa53931cd59: mov %r9d,0x9c(%rsp)
> 0x00007fa53931cd61: mov %r13d,0xb8(%rsp)
> 0x00007fa53931cd69: mov %r8d,0xbc(%rsp)
> 0x00007fa53931cd71: mov %r14d,0xc0(%rsp)
> 0x00007fa53931cd79: mov %r11d,0x84(%rsp)
> 0x00007fa53931cd81: mov %rbx,0xc8(%rsp)
> 0x00007fa53931cd89: mov %r10d,0x80(%rsp) ;*goto {reexecute=0 rethrow=0 return_oop=0}
> ; - com.github.forax.vectorhandle.VectorHandle::apply at 112 (line 295)
> 0.07% ↗ 0x00007fa53931cd91: mov 0x130(%r15),%rax
> 0.46% │ 0x00007fa53931cd98: mov %rax,%r10
> 0.03% │ 0x00007fa53931cd9b: add $0x30,%r10
> 0.05% │ 0x00007fa53931cd9f: movslq 0x80(%rsp),%r11
> 0.11% │ 0x00007fa53931cda7: shl $0x2,%r11
> 0.51% │ 0x00007fa53931cdab: mov %r11,(%rsp)
> 0.03% │ 0x00007fa53931cdaf: mov %r11,%rbp ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> 0.03% │ 0x00007fa53931cdb2: add $0x10,%rbp
> 0.09% │ 0x00007fa53931cdb6: data16 nopw 0x0(%rax,%rax,1)
> 0.68% │ 0x00007fa53931cdc0: cmp 0x140(%r15),%r10
> │ 0x00007fa53931cdc7: jae 0x00007fa53931d21e ;*goto {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 112 (line 295)
> 0.05% │ 0x00007fa53931cdcd: mov %r10,0x130(%r15)
> 0.05% │ 0x00007fa53931cdd4: prefetchw 0xc0(%r10)
> 0.09% │ 0x00007fa53931cddc: movq $0x1,(%rax)
> 0.50% │ 0x00007fa53931cde3: prefetchw 0x100(%r10)
> 0.03% │ 0x00007fa53931cdeb: movl $0x1165,0x8(%rax) ; {metadata({type array int})}
> 0.03% │ 0x00007fa53931cdf2: prefetchw 0x140(%r10)
> 0.08% │ 0x00007fa53931cdfa: movl $0x8,0xc(%rax)
> 0.63% │ 0x00007fa53931ce01: prefetchw 0x180(%r10)
> 1.71% │ 0x00007fa53931ce09: mov %r12,0x10(%rax)
> 0.05% │ 0x00007fa53931ce0d: mov %r12,0x18(%rax)
> 0.16% │ 0x00007fa53931ce11: mov %r12,0x20(%rax)
> 0.57% │ 0x00007fa53931ce15: mov %r12,0x28(%rax)
> 0.07% │ 0x00007fa53931ce19: mov %rax,0x8(%rsp)
> 0.04% │ 0x00007fa53931ce1e: mov 0xa8(%rsp),%r10
> 0.10% │ 0x00007fa53931ce26: mov (%rsp),%r11 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> 0.41% │ 0x00007fa53931ce2a: vmovdqu 0x10(%r10,%r11,1),%ymm0 ;*invokestatic load {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::fromArray0Template at 31 (line 3209)
> │ ; - jdk.incubator.vector.Int256Vector::fromArray0 at 3 (line 789)
> │ ; - jdk.incubator.vector.IntVector::fromArray at 24 (line 2691)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 56 (line 296)
> 2.49% │ 0x00007fa53931ce31: vmovdqu %ymm0,0x10(%rax)
> 0.58% │ 0x00007fa53931ce36: mov 0x130(%r15),%r8
> 0.06% │ 0x00007fa53931ce3d: mov %r8,%r10
> 0.24% │ 0x00007fa53931ce40: add $0x10,%r10
> 0.07% │ 0x00007fa53931ce44: cmp 0x140(%r15),%r10
> │ 0x00007fa53931ce4b: jae 0x00007fa53931d249
> 0.21% │ 0x00007fa53931ce51: mov %r10,0x130(%r15)
> 0.03% │ 0x00007fa53931ce58: prefetchw 0xc0(%r10)
> 0.28% │ 0x00007fa53931ce60: movq $0x1,(%r8)
> 0.03% │ 0x00007fa53931ce67: movl $0x186a6c,0x8(%r8) ; {metadata('jdk/incubator/vector/Int256Vector')}
> 0.20% │ 0x00007fa53931ce6f: mov 0x8(%rsp),%r10
> 0.09% │ 0x00007fa53931ce74: mov %r10,%r11
> 0.33% │ 0x00007fa53931ce77: shr $0x3,%r11
> 0.03% │ 0x00007fa53931ce7b: mov %r11d,0xc(%r8)
> 0.29% │ 0x00007fa53931ce7f: mov %r8,0x8(%rsp)
> 0.03% │ 0x00007fa53931ce84: mov 0x130(%r15),%r11
> 0.29% │ 0x00007fa53931ce8b: mov %r11,%r10
> 0.03% │ 0x00007fa53931ce8e: add $0x30,%r10
> 0.28% │ 0x00007fa53931ce92: data16 nopw 0x0(%rax,%rax,1)
> 0.08% │ 0x00007fa53931ce9c: data16 data16 xchg %ax,%ax
> 0.48% │ 0x00007fa53931cea0: cmp 0x140(%r15),%r10
> │ 0x00007fa53931cea7: jae 0x00007fa53931d27e ;*goto {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 112 (line 295)
> 0.03% │ 0x00007fa53931cead: mov %r10,0x130(%r15)
> 0.23% │ 0x00007fa53931ceb4: prefetchw 0xc0(%r10)
> 0.07% │ 0x00007fa53931cebc: movq $0x1,(%r11)
> 0.31% │ 0x00007fa53931cec3: prefetchw 0x100(%r10)
> 0.07% │ 0x00007fa53931cecb: movl $0x1165,0x8(%r11) ; {metadata({type array int})}
> 0.24% │ 0x00007fa53931ced3: prefetchw 0x140(%r10)
> 0.09% │ 0x00007fa53931cedb: movl $0x8,0xc(%r11)
> 0.34% │ 0x00007fa53931cee3: prefetchw 0x180(%r10)
> 1.79% │ 0x00007fa53931ceeb: mov %r12,0x10(%r11)
> 0.20% │ 0x00007fa53931ceef: mov %r12,0x18(%r11)
> 0.09% │ 0x00007fa53931cef3: mov %r12,0x20(%r11)
> 0.36% │ 0x00007fa53931cef7: mov %r12,0x28(%r11)
> 0.13% │ 0x00007fa53931cefb: mov %r11,0x10(%rsp)
> 0.17% │ 0x00007fa53931cf00: mov 0x78(%rsp),%r10
> 0.07% │ 0x00007fa53931cf05: mov (%rsp),%r11 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> 0.33% │ 0x00007fa53931cf09: vmovdqu 0x10(%r10,%r11,1),%ymm1 ;*invokestatic load {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::fromArray0Template at 31 (line 3209)
> │ ; - jdk.incubator.vector.Int256Vector::fromArray0 at 3 (line 789)
> │ ; - jdk.incubator.vector.IntVector::fromArray at 24 (line 2691)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 67 (line 297)
> 1.82% │ 0x00007fa53931cf10: mov 0x10(%rsp),%r10
> 0.17% │ 0x00007fa53931cf15: vmovdqu %ymm1,0x10(%r10)
> 0.88% │ 0x00007fa53931cf1b: mov 0x130(%r15),%r8
> 0.16% │ 0x00007fa53931cf22: mov %r8,%r10
> 0.05% │ 0x00007fa53931cf25: add $0x10,%r10
> 0.06% │ 0x00007fa53931cf29: cmp 0x140(%r15),%r10
> │ 0x00007fa53931cf30: jae 0x00007fa53931d2b6 ;*goto {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 112 (line 295)
> 0.32% │ 0x00007fa53931cf36: mov %r10,0x130(%r15)
> 0.22% │ 0x00007fa53931cf3d: prefetchw 0xc0(%r10)
> 0.07% │ 0x00007fa53931cf45: movq $0x1,(%r8)
> 0.12% │ 0x00007fa53931cf4c: movl $0x186a6c,0x8(%r8) ; {metadata('jdk/incubator/vector/Int256Vector')}
> 0.24% │ 0x00007fa53931cf54: mov 0xc8(%rsp),%r10 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> 0.22% │ 0x00007fa53931cf5c: mov 0xc(%r10),%r10d ;*getfield arg$1 {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.github.forax.vectorhandle.VectorHandle$$Lambda$78/0x0000000800c27610::invoke at 1
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 81 (line 298)
> 0.12% │ 0x00007fa53931cf60: mov 0x10(%rsp),%r11
> 0.13% │ 0x00007fa53931cf65: mov %r11,%r9
> 0.26% │ 0x00007fa53931cf68: shr $0x3,%r9
> 0.17% │ 0x00007fa53931cf6c: mov %r9d,0xc(%r8)
> 0.11% │ 0x00007fa53931cf70: mov 0x10(%r12,%r10,8),%ecx ; implicit exception: dispatches to 0x00007fa53931d420
> 0.44% │ 0x00007fa53931cf75: data16 data16 nopw 0x0(%rax,%rax,1)
> 0.24% │ 0x00007fa53931cf80: cmp $0xe15806c0,%ecx ; {oop(a 'java/lang/invoke/MethodType'{0x000000070ac03600} = (Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;)}
> │ 0x00007fa53931cf86: jne 0x00007fa53931d2e8 ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> 0.22% │ 0x00007fa53931cf8c: mov 0x14(%r12,%r10,8),%r11d ;*getfield form {reexecute=0 rethrow=0 return_oop=0}
> │ ; - java.lang.invoke.Invokers::checkCustomized at 9 (line 610)
> │ ; - java.lang.invoke.Invokers$Holder::invokeExact_MT at 15
> │ ; - com.github.forax.vectorhandle.VectorHandle::lambda$of$0 at 8 (line 410)
> │ ; - com.github.forax.vectorhandle.VectorHandle$$Lambda$78/0x0000000800c27610::invoke at 11
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 81 (line 298)
> 0.08% │ 0x00007fa53931cf91: mov 0x1c(%r12,%r11,8),%ecx ; implicit exception: dispatches to 0x00007fa53931d438
> 0.68% │ 0x00007fa53931cf96: test %ecx,%ecx
> │ 0x00007fa53931cf98: je 0x00007fa53931d31c
> 0.35% │ 0x00007fa53931cf9e: lea (%r12,%r10,8),%rsi
> 0.08% │ 0x00007fa53931cfa2: mov 0x18(%rsp),%rdx
> 0.07% │ 0x00007fa53931cfa7: mov 0x8(%rsp),%rcx
> 0.07% │ 0x00007fa53931cfac: xor %r9d,%r9d
> 0.39% │ 0x00007fa53931cfaf: xor %edi,%edi ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> 0.12% │ 0x00007fa53931cfb1: data16 xchg %ax,%ax
> 0.04% │ 0x00007fa53931cfb4: vzeroupper
> 0.53% │ 0x00007fa53931cfb7: callq 0x00007fa531840a80 ; ImmutableOopMap {[24]=Oop [120]=Oop [160]=Oop [168]=Oop [176]=Oop [200]=Oop }
> │ ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=1}
> │ ; - java.lang.invoke.Invokers$Holder::invokeExact_MT at 26
> │ ; - com.github.forax.vectorhandle.VectorHandle::lambda$of$0 at 8 (line 410)
> │ ; - com.github.forax.vectorhandle.VectorHandle$$Lambda$78/0x0000000800c27610::invoke at 11
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 81 (line 298)
> │ ; {optimized virtual_call}
> 0.07% │ 0x00007fa53931cfbc: nopl 0x0(%rax)
> 0.04% │ 0x00007fa53931cfc0: mov 0x8(%rax),%r11d ; implicit exception: dispatches to 0x00007fa53931d450
> 1.17% │ 0x00007fa53931cfc4: cmp $0x186a6c,%r11d ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::intoArray at 42 (line 2962)
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 96 (line 299)
> │ ; {metadata('jdk/incubator/vector/Int256Vector')}
> │ 0x00007fa53931cfcb: jne 0x00007fa53931d405 ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.github.forax.vectorhandle.VectorHandle::apply at 86 (line 298)
> 0.05% │ 0x00007fa53931cfd1: mov 0xc(%rax),%r11d
> 0.05% │ 0x00007fa53931cfd5: vmovdqu 0x10(%r12,%r11,8),%ymm0
> 0.10% │ 0x00007fa53931cfdc: mov 0xa0(%rsp),%r10
> 0.56% │ 0x00007fa53931cfe4: mov (%rsp),%r11
> 0.07% │ 0x00007fa53931cfe8: vmovdqu %ymm0,0x10(%r10,%r11,1)
> 0.37% │ 0x00007fa53931cfef: mov 0x80(%rsp),%r11d
> 0.54% │ 0x00007fa53931cff7: add $0x8,%r11d
> 0.03% │ 0x00007fa53931cffb: mov %r11d,0x80(%rsp)
> 0.03% │ 0x00007fa53931d003: cmp 0x84(%rsp),%r11d
> ╰ 0x00007fa53931d00b: jl 0x00007fa53931cd91
> 0x00007fa53931d011: mov 0xa0(%rsp),%rcx
> 0x00007fa53931d019: mov 0xa8(%rsp),%rdi
> 0x00007fa53931d021: mov 0x78(%rsp),%rdx
> 0x00007fa53931d026: mov 0x9c(%rsp),%r9d
> 0x00007fa53931d02e: mov 0xb8(%rsp),%r13d
> 0x00007fa53931d036: mov 0xbc(%rsp),%r8d
> 0.00% 0x00007fa53931d03e: mov 0xc0(%rsp),%r14d
> 0x00007fa53931d046: mov %r11d,%eax
> 0x00007fa53931d049: cmp %r13d,%eax ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
> ....................................................................................................
> 28.77% <total for region 2>
>
> ....[Hottest Region 3]..............................................................................
> c2, level 4, java.lang.invoke.LambdaForm$MH.0x0000000800c99400::invoke, version 844 (362 bytes)
>
> --------------------------------------------------------------------------------
> [Verified Entry Point]
> # {method} {0x00007fa528a483d0} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000800c99400'
> # parm0: rsi:rsi = 'java/lang/Object'
> # parm1: rdx:rdx = 'java/lang/Object'
> # parm2: rcx:rcx = 'java/lang/Object'
> # parm3: r8:r8 = 'java/lang/Object'
> # parm4: r9:r9 = 'java/lang/Object'
> # parm5: rdi:rdi = 'java/lang/Object'
> # [sp+0x60] (sp of caller)
> 2.59% 0x00007fa539317b60: mov %eax,-0x14000(%rsp) ; {no_reloc}
> 0.05% 0x00007fa539317b67: push %rbp
> 0.07% 0x00007fa539317b68: sub $0x50,%rsp ;*synchronization entry
> ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at -1
> 0.56% 0x00007fa539317b6c: test %r8,%r8
> ╭ 0x00007fa539317b6f: je 0x00007fa539317ccb
> 0.04% │ 0x00007fa539317b75: mov 0x8(%r8),%r10d
> 0.10% │ 0x00007fa539317b79: nopl 0x0(%rax)
> 0.54% │ 0x00007fa539317b80: cmp $0x186a6c,%r10d ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
> │ ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 244 (line 633)
> │ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 279)
> │ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 41)
> │ ; - jdk.incubator.vector.IntVector::add at 5 (line 1096)
> │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest$Template/0x0000000800c8e250::lambda at 13
> │ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeStatic at 11
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c91000::invoke at 16
> │ ; - java.lang.invoke.DelegatingMethodHandle$Holder::delegate at 21
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 125
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> │ ; {metadata('jdk/incubator/vector/Int256Vector')}
> │ 0x00007fa539317b87: jne 0x00007fa539317d70 ;*invokestatic linkToSpecial {reexecute=0 rethrow=0 return_oop=0}
> │ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeSpecial at 11
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c82400::invoke at 21
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 27
> │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> 0.03% │ ↗ 0x00007fa539317b8d: test %rcx,%rcx
> │╭ │ 0x00007fa539317b90: je 0x00007fa539317cd3
> 0.03% ││ │ 0x00007fa539317b96: mov 0x8(%rcx),%r11d
> 0.05% ││ │ 0x00007fa539317b9a: nopw 0x0(%rax,%rax,1)
> 0.54% ││ │ 0x00007fa539317ba0: cmp $0x186a6c,%r11d ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
> ││ │ ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 244 (line 633)
> ││ │ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 279)
> ││ │ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 41)
> ││ │ ; - jdk.incubator.vector.IntVector::add at 5 (line 1096)
> ││ │ ; - com.github.forax.vectorhandle.VectorHandlePerfTest$Template/0x0000000800c8e250::lambda at 13
> ││ │ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeStatic at 11
> ││ │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c91000::invoke at 16
> ││ │ ; - java.lang.invoke.DelegatingMethodHandle$Holder::delegate at 21
> ││ │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 125
> ││ │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> ││ │ ; {metadata('jdk/incubator/vector/Int256Vector')}
> ││ │ 0x00007fa539317ba7: jne 0x00007fa539317d94
> 0.03% ││ │ 0x00007fa539317bad: mov %rcx,0x20(%rsp) ;*invokestatic linkToSpecial {reexecute=0 rethrow=0 return_oop=0}
> ││ │ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeSpecial at 11
> ││ │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c82400::invoke at 21
> ││ │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 35
> ││ │ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> 0.04% ││ │↗ 0x00007fa539317bb2: movabs $0x719998c28,%r10 ; {oop(a 'com/github/forax/vectorhandle/VectorHandlePerfTest$$Lambda$80+0x0000000800c27cc0'{0x0000000719998c28})}
> 0.05% ││ ││ 0x00007fa539317bbc: nopl 0x0(%rax)
> 0.50% ││ ││ 0x00007fa539317bc0: cmp %r10,%rdx
> ││ ││ 0x00007fa539317bc3: jne 0x00007fa539317d45 ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
> ││ ││ ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 244 (line 633)
> ││ ││ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 279)
> ││ ││ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 41)
> ││ ││ ; - jdk.incubator.vector.IntVector::add at 5 (line 1096)
> ││ ││ ; - com.github.forax.vectorhandle.VectorHandlePerfTest$Template/0x0000000800c8e250::lambda at 13
> ││ ││ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeStatic at 11
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c91000::invoke at 16
> ││ ││ ; - java.lang.invoke.DelegatingMethodHandle$Holder::delegate at 21
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 125
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> 0.05% ││ ││ 0x00007fa539317bc9: vmovq -0x91(%rip),%xmm0 # 0x00007fa539317b40
> ││ ││ ; {section_word}
> 0.02% ││ ││ 0x00007fa539317bd1: vpbroadcastd %xmm0,%ymm0 ;*invokestatic broadcastCoerced {reexecute=0 rethrow=0 return_oop=0}
> ││ ││ ; - jdk.incubator.vector.IntVector$IntSpecies::broadcastBits at 18 (line 3504)
> ││ ││ ; - jdk.incubator.vector.IntVector$IntSpecies::broadcast at 5 (line 3513)
> ││ ││ ; - jdk.incubator.vector.IntVector::broadcast at 7 (line 484)
> ││ ││ ; - com.github.forax.vectorhandle.VectorHandlePerfTest$Template/0x0000000800c8e250::lambda at 7
> ││ ││ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeStatic at 11
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c91000::invoke at 16
> ││ ││ ; - java.lang.invoke.DelegatingMethodHandle$Holder::delegate at 21
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 125
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> 0.04% ││ ││ 0x00007fa539317bd6: mov 0xc(%r8),%r10d ; implicit exception: dispatches to 0x00007fa539317db8
> ││ ││ ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
> ││ ││ ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 244 (line 633)
> ││ ││ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 279)
> ││ ││ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 41)
> ││ ││ ; - jdk.incubator.vector.IntVector::add at 5 (line 1096)
> ││ ││ ; - com.github.forax.vectorhandle.VectorHandlePerfTest$Template/0x0000000800c8e250::lambda at 13
> ││ ││ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeStatic at 11
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c91000::invoke at 16
> ││ ││ ; - java.lang.invoke.DelegatingMethodHandle$Holder::delegate at 21
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 125
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> 0.53% ││ ││ 0x00007fa539317bda: vpmulld 0x10(%r12,%r10,8),%ymm0,%ymm1;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
> ││ ││ ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 244 (line 633)
> ││ ││ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 279)
> ││ ││ ; - jdk.incubator.vector.Int256Vector::lanewise at 3 (line 41)
> ││ ││ ; - jdk.incubator.vector.IntVector::mul at 5 (line 1245)
> ││ ││ ; - com.github.forax.vectorhandle.VectorHandlePerfTest$Template/0x0000000800c8e250::lambda at 10
> ││ ││ ; - java.lang.invoke.DirectMethodHandle$Holder::invokeStatic at 11
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c91000::invoke at 16
> ││ ││ ; - java.lang.invoke.DelegatingMethodHandle$Holder::delegate at 21
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99800::invoke at 125
> ││ ││ ; - java.lang.invoke.LambdaForm$MH/0x0000000800c99400::invoke at 64
> 0.19% ││ ││ 0x00007fa539317be1: mov 0x20(%rsp),%r10
> 0.54% ││ ││ 0x00007fa539317be6: mov 0xc(%r10),%r11d ; implicit exception: dispatches to 0x00007fa539317dcc
> 0.04% ││ ││ 0x00007fa539317bea: mov 0x130(%r15),%rbp
> 0.02% ││ ││ 0x00007fa539317bf1: mov %rbp,%r10
> 0.03% ││ ││ 0x00007fa539317bf4: add $0x30,%r10
> 0.45% ││ ││ 0x00007fa539317bf8: nopl 0x0(%rax,%rax,1)
> 0.05% ││ ││ 0x00007fa539317c00: cmp 0x140(%r15),%r10
> ││╭││ 0x00007fa539317c07: jae 0x00007fa539317ce5
> 0.03% │││││ 0x00007fa539317c0d: mov %r10,0x130(%r15)
> 0.07% │││││ 0x00007fa539317c14: prefetchw 0xc0(%r10)
> 0.55% │││││ 0x00007fa539317c1c: movq $0x1,0x0(%rbp)
> 0.02% │││││ 0x00007fa539317c24: prefetchw 0x100(%r10)
> 0.02% │││││ 0x00007fa539317c2c: movl $0x1165,0x8(%rbp) ; {metadata({type array int})}
> 0.05% │││││ 0x00007fa539317c33: prefetchw 0x140(%r10)
> 0.48% │││││ 0x00007fa539317c3b: movl $0x8,0xc(%rbp)
> 0.03% │││││ 0x00007fa539317c42: prefetchw 0x180(%r10)
> 1.67% │││││ 0x00007fa539317c4a: mov %r12,0x10(%rbp)
> 0.06% │││││ 0x00007fa539317c4e: mov %r12,0x18(%rbp)
> 0.48% │││││ 0x00007fa539317c52: mov %r12,0x20(%rbp)
> 0.02% │││││ 0x00007fa539317c56: mov %r12,0x28(%rbp)
> 0.05% │││││ 0x00007fa539317c5a: vmovdqu 0x10(%r12,%r11,8),%ymm0
> 0.04% │││││ 0x00007fa539317c61: mov 0x130(%r15),%rax
> 0.49% │││││ 0x00007fa539317c68: vpaddd %ymm1,%ymm0,%ymm0
> 0.02% │││││ 0x00007fa539317c6c: vmovdqu %ymm0,0x10(%rbp)
> 0.06% │││││ 0x00007fa539317c71: mov %rax,%r10
> 0.09% │││││ 0x00007fa539317c74: add $0x10,%r10
> 0.44% │││││ 0x00007fa539317c78: nopl 0x0(%rax,%rax,1)
> 0.02% │││││ 0x00007fa539317c80: cmp 0x140(%r15),%r10
> │││││ 0x00007fa539317c87: jae 0x00007fa539317d1a
> 0.08% │││││ 0x00007fa539317c8d: mov %r10,0x130(%r15)
> 0.05% │││││ 0x00007fa539317c94: prefetchw 0xc0(%r10)
> 0.53% │││││ 0x00007fa539317c9c: movq $0x1,(%rax)
> 0.04% │││││ 0x00007fa539317ca3: movl $0x186a6c,0x8(%rax) ; {metadata('jdk/incubator/vector/Int256Vector')}
> 0.03% │││││ 0x00007fa539317caa: mov %rbp,%r10
> 0.09% │││││ 0x00007fa539317cad: shr $0x3,%r10
> 0.47% │││││ 0x00007fa539317cb1: mov %r10d,0xc(%rax)
> 0.03% │││││ 0x00007fa539317cb5: vzeroupper
> 0.52% │││││ 0x00007fa539317cb8: add $0x50,%rsp
> 0.06% │││││ 0x00007fa539317cbc: pop %rbp
> 0.07% │││││ 0x00007fa539317cbd: cmp 0x118(%r15),%rsp ; {poll_return}
> │││││ 0x00007fa539317cc4: ja 0x00007fa539317df5
> 0.59% │││││ 0x00007fa539317cca: retq
> ↘││││ 0x00007fa539317ccb: xor %r8d,%r8d
> ││╰│ 0x00007fa539317cce: jmpq 0x00007fa539317b8d
> ↘│ │ 0x00007fa539317cd3: xor %r10d,%r10d
> │ │ 0x00007fa539317cd6: mov %r10,0x20(%rsp)
> │ │ 0x00007fa539317cdb: nopl 0x0(%rax,%rax,1)
> │ ╰ 0x00007fa539317ce0: jmpq 0x00007fa539317bb2
> ↘ 0x00007fa539317ce5: mov %r11d,0x28(%rsp)
> 0x00007fa539317cea: vmovdqu %ymm1,(%rsp)
> 0x00007fa539317cef: movabs $0x800008b28,%rsi ; {metadata({type array int})}
> 0x00007fa539317cf9: mov $0x8,%edx ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
> ....................................................................................................
> 14.46% <total for region 3>
>
> ....[Hottest Regions]...............................................................................
> 38.65% c2, level 4 com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub, version 890 (268 bytes)
> 28.77% c2, level 4 com.github.forax.vectorhandle.VectorHandle::apply, version 851 (626 bytes)
> 14.46% c2, level 4 java.lang.invoke.LambdaForm$MH.0x0000000800c99400::invoke, version 844 (362 bytes)
> 5.44% Unknown, level 0 java.lang.invoke.MethodHandle::invokeBasic, version 247 (34 bytes)
> 4.12% c2, level 4 com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle, version 854 (102 bytes)
> 1.43% c2, level 4 com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub, version 889 (226 bytes)
> 0.85% libpthread-2.31.so do_futex_wait.constprop.0 (0 bytes)
> 0.22% libjvm.so ElfSymbolTable::lookup (48 bytes)
> 0.17% libpthread-2.31.so pthread_cond_timedwait@@GLIBC_2.3.2 (0 bytes)
> 0.16% libjvm.so HeapRegionManager::par_iterate (103 bytes)
> 0.14% libc-2.31.so __clone (0 bytes)
> 0.13% libpthread-2.31.so pthread_cond_wait@@GLIBC_2.3.2 (12 bytes)
> 0.10% libjvm.so HeapRegionClaimer::claim_region (2 bytes)
> 0.09% kernel [unknown] (0 bytes)
> 0.08% kernel [unknown] (0 bytes)
> 0.08% libpthread-2.31.so __lll_lock_wait (0 bytes)
> 0.06% libjvm.so SpinPause (0 bytes)
> 0.06% libpthread-2.31.so __pthread_disable_asynccancel (43 bytes)
> 0.06% ld-2.31.so __tls_get_addr (47 bytes)
> 0.05% libpthread-2.31.so __libc_write (8 bytes)
> 4.88% <...other 780 warm regions...>
> ....................................................................................................
> 100.00% <totals>
>
> ....[Hottest Methods (after inlining)]..............................................................
> 38.68% c2, level 4 com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub, version 890
> 28.78% c2, level 4 com.github.forax.vectorhandle.VectorHandle::apply, version 851
> 14.47% c2, level 4 java.lang.invoke.LambdaForm$MH.0x0000000800c99400::invoke, version 844
> 5.44% Unknown, level 0 java.lang.invoke.MethodHandle::invokeBasic, version 247
> 4.12% c2, level 4 com.github.forax.vectorhandle.VectorHandlePerfTest::vector_handle, version 854
> 1.43% c2, level 4 com.github.forax.vectorhandle.jmh_generated.VectorHandlePerfTest_vector_handle_jmhTest::vector_handle_avgt_jmhStub, version 889
> 0.85% libpthread-2.31.so do_futex_wait.constprop.0
> 0.22% libjvm.so ElfSymbolTable::lookup
> 0.20% kernel [unknown]
> 0.18% libpthread-2.31.so pthread_cond_timedwait@@GLIBC_2.3.2
> 0.17% libjvm.so HeapRegionManager::par_iterate
> 0.14% libc-2.31.so __clone
> 0.14% libc-2.31.so __vfprintf_internal
> 0.13% libpthread-2.31.so pthread_cond_wait@@GLIBC_2.3.2
> 0.11% hsdis-amd64.so print_insn
> 0.10% libjvm.so HeapRegionClaimer::claim_region
> 0.10% libc-2.31.so _IO_fwrite
> 0.10% interpreter method entry point (kind = zerolocals)
> 0.08% libpthread-2.31.so __lll_lock_wait
> 0.07% interpreter return entry points
> 4.51% <...other 513 warm methods...>
> ....................................................................................................
> 100.00% <totals>
>
> ....[Distribution by Source]........................................................................
> 87.59% c2, level 4
> 5.45% Unknown, level 0
> 3.43% libjvm.so
> 1.44% libpthread-2.31.so
> 1.06% libc-2.31.so
> 0.43% interpreter
> 0.20% kernel
> 0.15% hsdis-amd64.so
> 0.07% ld-2.31.so
> 0.07% c1, level 3
> 0.03% [vdso]
> 0.03% c1, level 2
> 0.02%
> 0.02% perf-821535.map
> 0.01% libz.so.1.2.11
> 0.00% runtime stub
> ....................................................................................................
> 100.00% <totals>
>
>
More information about the panama-dev mailing list