Observations from a simple JMH benchmark
Paul Sandoz
paul.sandoz at oracle.com
Wed Feb 14 23:09:56 UTC 2018
Hi,
I have been playing around with a simple benchmark and JMH (separately, I can now get asm hotspots working on the Mac via dtrace!)
A simple vector addition loop:
for (int i = 0; i < a.length; i += SPECIES.length()) {
    IntVector<Shapes.S128Bit> av = SPECIES.fromArray(a, i);
    IntVector<Shapes.S128Bit> bv = SPECIES.fromArray(b, i);
    av.add(bv).intoArray(c, i);
}
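For readers without the incubator module, here is a plain-Java sketch of what that loop computes. It assumes 4 int lanes per 128-bit vector, which matches the `add $0x4` stride in the listings below. The inner lane loop stands in for a single vpaddd. As written above, the vector loop only stays in bounds when a.length is a multiple of the lane count. The scalar tail loop here is an addition for illustration and is not part of the original benchmark.

```java
import java.util.Arrays;

class AddSketch {
    // Assumption: a 128-bit vector of ints has 4 lanes (SPECIES.length() == 4).
    static final int LANES = 4;

    // Element-wise c[i] = a[i] + b[i], processed LANES elements at a time.
    static void add(int[] a, int[] b, int[] c) {
        // Largest multiple of LANES not exceeding a.length.
        int upperBound = a.length - (a.length % LANES);
        int i = 0;
        for (; i < upperBound; i += LANES) {
            // Stand-in for fromArray/add/intoArray on one 128-bit vector.
            for (int lane = 0; lane < LANES; lane++) {
                c[i + lane] = a[i + lane] + b[i + lane];
            }
        }
        // Scalar tail for elements that do not fill a whole vector.
        for (; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6};
        int[] b = {10, 20, 30, 40, 50, 60};
        int[] c = new int[a.length];
        add(a, b, c);
        System.out.println(Arrays.toString(c));
    }
}
```

With 6 elements, the first 4 go through the lane loop and the last 2 go through the tail.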
This produces code such as the following for a non-unrolled loop:
4.50% │↗│ 0x0000000119c33bf0: mov %r9d,%ebx
2.79% │││ 0x0000000119c33bf3: or %edi,%ebx
4.62% │││ 0x0000000119c33bf5: or $0x4,%ebx
2.78% │││ 0x0000000119c33bf8: test %ebx,%ebx
│││ 0x0000000119c33bfa: jl 0x0000000119c33ccc ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
│││ ; - jdk.incubator.vector.Int128Vector::intoArray at 37 (line 252)
│││ ; - jmh.AddTest::add at 45 (line 54)
│││ ; - jmh.generated.AddTest_add_jmhTest::add_avgt_jmhStub at 17 (line 186)
5.17% │││ 0x0000000119c33c00: vmovdqu 0x10(%r13,%r9,4),%xmm3 ;*invokestatic load {reexecute=0 rethrow=0 return_oop=0}
│││ ; - jdk.incubator.vector.Int128Vector$Int128Species::fromArray at 37 (line 677)
│││ ; - jdk.incubator.vector.Int128Vector$Int128Species::fromArray at 3 (line 566)
│││ ; - jmh.AddTest::add at 19 (line 52)
│││ ; - jmh.generated.AddTest_add_jmhTest::add_avgt_jmhStub at 17 (line 186)
9.60% │││ 0x0000000119c33c07: mov %r9d,%ebx
4.19% │││ 0x0000000119c33c0a: or %eax,%ebx
2.38% │││ 0x0000000119c33c0c: or $0x4,%ebx
4.47% │││ 0x0000000119c33c0f: test %ebx,%ebx
│││ 0x0000000119c33c11: jl 0x0000000119c33d0e ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
│││ ; - jdk.incubator.vector.Int128Vector::intoArray at 37 (line 252)
│││ ; - jmh.AddTest::add at 45 (line 54)
│││ ; - jmh.generated.AddTest_add_jmhTest::add_avgt_jmhStub at 17 (line 186)
4.04% │││ 0x0000000119c33c17: vpaddd 0x10(%rsi,%r9,4),%xmm3,%xmm3
│││ ;*invokestatic binaryOp {reexecute=0 rethrow=0 return_oop=0}
│││ ; - jdk.incubator.vector.Int128Vector::add at 26 (line 150)
│││ ; - jdk.incubator.vector.Int128Vector::add at 2 (line 34)
│││ ; - jmh.AddTest::add at 37 (line 54)
│││ ; - jmh.generated.AddTest_add_jmhTest::add_avgt_jmhStub at 17 (line 186)
14.79% │││ 0x0000000119c33c1e: mov %r9d,%ebx
3.55% │││ 0x0000000119c33c21: or %edx,%ebx
2.32% │││ 0x0000000119c33c23: or $0x4,%ebx
3.36% │││ 0x0000000119c33c26: test %ebx,%ebx
│││ 0x0000000119c33c28: jl 0x0000000119c33d4e
5.65% │││ 0x0000000119c33c2e: vmovdqu %xmm3,0x10(%rbp,%r9,4) ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
│││ ; - jdk.incubator.vector.Int128Vector::intoArray at 37 (line 252)
│││ ; - jmh.AddTest::add at 45 (line 54)
│││ ; - jmh.generated.AddTest_add_jmhTest::add_avgt_jmhStub at 17 (line 186)
17.97% │││ 0x0000000119c33c35: add $0x4,%r9d
2.56% │││ 0x0000000119c33c39: cmp %r10d,%r9d
0.00% │╰│ 0x0000000119c33c3c: jl 0x0000000119c33bf0
The vector instructions are there, but they are intermixed with what I believe are repeated artifacts of the Objects.checkFromIndexSize bounds checks, which are not hoisted out of the loop. Specifically, I believe these tests check whether any of the arguments are negative:
if ((length | fromIndex | size) < 0 || size > length - fromIndex)
    throw outOfBoundsCheckFromIndexSize(oobef, fromIndex, size, length);
return fromIndex;
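The negative-argument test in that snippet folds three comparisons into one. OR-ing ints sets the sign bit of the result if and only if at least one operand has its sign bit set, so (length | fromIndex | size) < 0 holds exactly when some argument is negative. With size fixed at 4 and the loop index in %r9d, this appears to correspond to the repeated mov / or / or $0x4 / test / jl sequences above. The following sketch checks that claim exhaustively over a few edge values. It also checks that a hypothetical re-implementation, inBounds, accepts the same arguments as java.util.Objects.checkFromIndexSize, which has been public since Java 9.

```java
import java.util.Objects;

class OrSignTrick {
    // Folded test: the OR's sign bit is set iff some operand's sign bit is set.
    static boolean anyNegativeFolded(int length, int fromIndex, int size) {
        return (length | fromIndex | size) < 0;
    }

    // The same test written as three separate comparisons.
    static boolean anyNegativeSeparate(int length, int fromIndex, int size) {
        return length < 0 || fromIndex < 0 || size < 0;
    }

    // Hypothetical mirror of the JDK condition: true iff
    // [fromIndex, fromIndex + size) lies within [0, length).
    static boolean inBounds(int fromIndex, int size, int length) {
        return !((length | fromIndex | size) < 0 || size > length - fromIndex);
    }

    // True iff Objects.checkFromIndexSize accepts the arguments.
    static boolean jdkAccepts(int fromIndex, int size, int length) {
        try {
            Objects.checkFromIndexSize(fromIndex, size, length);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] values = {Integer.MIN_VALUE, -5, -1, 0, 1, 3, 4, 7, 8, Integer.MAX_VALUE};
        for (int length : values)
            for (int from : values)
                for (int size : values) {
                    if (anyNegativeFolded(length, from, size) != anyNegativeSeparate(length, from, size))
                        throw new AssertionError("sign test mismatch");
                    if (inBounds(from, size, length) != jdkAccepts(from, size, length))
                        throw new AssertionError("bounds mismatch");
                }
        System.out.println("all combinations agree");
    }
}
```

Once the first clause has ruled out negatives, length - fromIndex cannot overflow, so the second clause is a safe comparison.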
Turning off the bounds checks (-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=false) produces:
6.92% ↗ │ 0x000000011bbd62d0: vmovdqu 0x10(%r8,%rbx,4),%xmm0
22.53% │ │ 0x000000011bbd62d7: vpaddd 0x10(%r10,%rbx,4),%xmm0,%xmm0
25.81% │ │ 0x000000011bbd62de: vmovdqu %xmm0,0x10(%r9,%rbx,4) ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - jdk.incubator.vector.Int128Vector::intoArray at 37 (line 252)
│ │ ; - jmh.AddTest::add at 45 (line 54)
│ │ ; - jmh.generated.AddTest_add_jmhTest::add_avgt_jmhStub at 17 (line 186)
30.80% │ │ 0x000000011bbd62e5: add $0x4,%ebx
6.30% │ │ 0x000000011bbd62e8: cmp %r11d,%ebx
╰ │ 0x000000011bbd62eb: jl 0x000000011bbd62d0 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│ ; - jmh.AddTest::add at 8 (line 51)
│ ; - jmh.generated.AddTest_add_jmhTest::add_avgt_jmhStub at 17 (line 186)
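Hoisting would replace the per-iteration checks with one range check per array before the loop. Because i only takes the values 0, 4, 8, ..., every per-iteration check succeeds if and only if the range [0, end) fits in each array, where end is 4 * ceil(a.length / 4). The following plain-Java sketch demonstrates that equivalence for small lengths. It assumes 4 lanes, uses Objects.checkFromToIndex (Java 9+), and assumes lengths are well below Integer.MAX_VALUE so that end does not overflow. The sketch only answers whether the loop would finish without an exception. It ignores the fact that the checked loop may already have stored some elements of c before failing, which a real compiler transformation would have to preserve. It says nothing about what C2 actually does.

```java
import java.util.Objects;

class HoistSketch {
    static final int LANES = 4; // assumption: 128-bit vector of ints

    // Per-iteration form: a check on each access, as in the loop body.
    static boolean perIterationOk(int aLength, int otherLength) {
        for (int i = 0; i < aLength; i += LANES) {
            try {
                Objects.checkFromIndexSize(i, LANES, aLength);
                Objects.checkFromIndexSize(i, LANES, otherLength);
            } catch (IndexOutOfBoundsException e) {
                return false;
            }
        }
        return true;
    }

    // Hoisted form: a single range check per array before the loop.
    static boolean hoistedOk(int aLength, int otherLength) {
        // End of the range touched by the loop: LANES * ceil(aLength / LANES).
        int end = (aLength + LANES - 1) / LANES * LANES;
        try {
            Objects.checkFromToIndex(0, end, aLength);
            Objects.checkFromToIndex(0, end, otherLength);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int aLen = 0; aLen <= 20; aLen++)
            for (int otherLen = 0; otherLen <= 20; otherLen++)
                if (perIterationOk(aLen, otherLen) != hoistedOk(aLen, otherLen))
                    throw new AssertionError("mismatch at " + aLen + ", " + otherLen);
        System.out.println("hoisted check is equivalent for all tested lengths");
    }
}
```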
Vladimir, I guess this is the kind of thing you were mentioning with regard to bounds checks?
Perhaps there are general optimization opportunities for such bounds checks. Currently, only Preconditions.checkIndex is an intrinsic.
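One speculative angle on the checkIndex point: when length >= 0 and size is a constant >= 1 (such as the lane count), checkFromIndexSize(fromIndex, size, length) accepts exactly the same fromIndex values as checkIndex(fromIndex, length - size + 1). Both accept exactly 0 <= fromIndex <= length - size. The following sketch uses the public Objects.checkIndex, which delegates to Preconditions.checkIndex, and verifies that equivalence over a small range. The helper name checkFromIndexSizeViaCheckIndex is hypothetical. Whether routing the vector access check through checkIndex would actually let C2 hoist it is not something this sketch can show.

```java
import java.util.Objects;

class CheckIndexSketch {
    // Hypothetical rewrite, valid only for length >= 0 and size >= 1:
    // [fromIndex, fromIndex + size) is in bounds iff fromIndex is a
    // valid index into a range of length - size + 1 (no overflow here).
    static int checkFromIndexSizeViaCheckIndex(int fromIndex, int size, int length) {
        return Objects.checkIndex(fromIndex, length - size + 1);
    }

    // True iff the check completes without IndexOutOfBoundsException.
    static boolean accepts(Runnable check) {
        try {
            check.run();
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        final int size = 4; // assumption: lane count of a 128-bit int vector
        for (int length = 0; length <= 16; length++) {
            for (int from = -8; from <= 24; from++) {
                final int len = length, i = from;
                boolean viaCheckIndex = accepts(() -> checkFromIndexSizeViaCheckIndex(i, size, len));
                boolean viaFromIndexSize = accepts(() -> Objects.checkFromIndexSize(i, size, len));
                if (viaCheckIndex != viaFromIndexSize)
                    throw new AssertionError("mismatch at length=" + len + ", from=" + i);
            }
        }
        System.out.println("checkIndex form agrees with checkFromIndexSize");
    }
}
```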
Paul.
More information about the panama-dev mailing list