RFR (14) 8235837: Memory access API refinements
Andrew Haley
aph at redhat.com
Wed Jan 15 18:00:00 UTC 2020
On 1/9/20 4:37 PM, Maurizio Cimadamore wrote:
> There you go
>
> cr.openjdk.java.net/~mcimadamore/8235837_javadoc
Thank you.
So I've been kicking the tyres, and I'm rather surprised at how poor
the performance seems to be. My simple test, which looks like this:

@Benchmark
public void intHandleTest(BenchmarkState state) {
    try (var segment = BenchmarkState.segment.acquire()) {
        var base = segment.baseAddress();
        final var byteSize = ARRAY_SIZE * 4;
        for (int i = 0; i < byteSize; i += 4) {
            BenchmarkState.intHandle.set(base.offset(i), (int) 4);
        }
    }
}

has a great deal of overhead. It was a bit of a struggle to get it to
unroll nicely, and the best I could get was
6.90% │ 0x00007faeeff7dec8: mov r9d,r11d
│ 0x00007faeeff7decb: add r9d,0x4 ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ; - org.sample.MemoryHandlesTest::intHandleTest at 45 (line 34)
│ ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)
│ 0x00007faeeff7decf: mov rdx,rbx
│ 0x00007faeeff7ded2: add rdx,0x10 ;*i2l {reexecute=0 rethrow=0 return_oop=0}
│ ; - org.sample.MemoryHandlesTest::intHandleTest at 35 (line 35)
│ ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)
0.06% │ 0x00007faeeff7ded6: cmp rdx,rdi
│ 0x00007faeeff7ded9: jg 0x00007faeeff7df94 ;*ifle {reexecute=0 rethrow=0 return_oop=0}
│ ; - jdk.internal.foreign.MemorySegmentImpl::checkBounds at 20 (line 196)
│ ; - jdk.internal.foreign.MemorySegmentImpl::checkRange at 29 (line 178)
│ ; - jdk.internal.foreign.MemoryAddressImpl::checkAccess at 21 (line 84)
│ ; - java.lang.invoke.VarHandleMemoryAddressAsInts::checkAddress at 15 (line 50)
│ ; - java.lang.invoke.VarHandleMemoryAddressAsInts::set0 at 7 (line 85)
│ ; - java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800bc3840::set at 7
│ ; - java.lang.invoke.VarHandleGuards::guard_LI_V at 33 (line 114)
│ ; - org.sample.MemoryHandlesTest::intHandleTest at 42 (line 35)
│ ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)
│ 0x00007faeeff7dedf: mov DWORD PTR [rsi+0x10],0x4 ;*invokevirtual putIntUnaligned {reexecute=0 rethrow=0 return_oop=0}
│ ; - jdk.internal.misc.Unsafe::putIntUnaligned at 10 (line 3693)
│ ; - java.lang.invoke.VarHandleMemoryAddressAsInts::set0 at 38 (line 86)
│ ; - java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800bc3840::set at 7
│ ; - java.lang.invoke.VarHandleGuards::guard_LI_V at 33 (line 114)
│ ; - org.sample.MemoryHandlesTest::intHandleTest at 42 (line 35)
│ ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)
for every store. In contrast, similar ByteBuffer code looks like:
0.08% ↗ 0x00007f3b5bf717c0: movsxd r13,r8d
0.16% │ 0x00007f3b5bf717c3: mov r14,rdx
│ 0x00007f3b5bf717c6: add r14,r13
1.00% │ 0x00007f3b5bf717c9: movsxd r13,r8d
0.04% │ 0x00007f3b5bf717cc: vmovdqu YMMWORD PTR [rdx+r13*1],ymm4
6.87% │ 0x00007f3b5bf717d2: vmovdqu YMMWORD PTR [r14+0x20],ymm4
5.77% │ 0x00007f3b5bf717d8: vmovdqu YMMWORD PTR [r14+0x40],ymm4
3.99% │ 0x00007f3b5bf717de: vmovdqu YMMWORD PTR [r14+0x60],ymm4
6.09% │ 0x00007f3b5bf717e4: vmovdqu YMMWORD PTR [r14+0x80],ymm4
4.97% │ 0x00007f3b5bf717ed: vmovdqu YMMWORD PTR [r14+0xa0],ymm4
4.93% │ 0x00007f3b5bf717f6: vmovdqu YMMWORD PTR [r14+0xc0],ymm4
5.07% │ 0x00007f3b5bf717ff: vmovdqu YMMWORD PTR [r14+0xe0],ymm4
4.87% │ 0x00007f3b5bf71808: vmovdqu YMMWORD PTR [r14+0x100],ymm4
7.39% │ 0x00007f3b5bf71811: vmovdqu YMMWORD PTR [r14+0x120],ymm4
5.19% │ 0x00007f3b5bf7181a: vmovdqu YMMWORD PTR [r14+0x140],ymm4
6.21% │ 0x00007f3b5bf71823: vmovdqu YMMWORD PTR [r14+0x160],ymm4
4.93% │ 0x00007f3b5bf7182c: vmovdqu YMMWORD PTR [r14+0x180],ymm4
5.69% │ 0x00007f3b5bf71835: vmovdqu YMMWORD PTR [r14+0x1a0],ymm4
11.28% │ 0x00007f3b5bf7183e: vmovdqu YMMWORD PTR [r14+0x1c0],ymm4
4.83% │ 0x00007f3b5bf71847: vmovdqu YMMWORD PTR [r14+0x1e0],ymm4;*invokevirtual putIntUnaligned {reexecute=0 rethrow=0 return_oop=0}
│ ; - jdk.internal.misc.Unsafe::putIntUnaligned at 10 (line 3693)
│ ; - java.nio.DirectByteBuffer::putInt at 18 (line 860)
│ ; - java.nio.DirectByteBuffer::putInt at 12 (line 881)
│ ; - org.sample.ByteBufferTest::floss at 15 (line 34)
│ ; - org.sample.ByteBufferTest::test at 14 (line 42)
│ ; - org.sample.generated.ByteBufferTest_test_jmhTest::test_avgt_jmhStub at 17 (line 241)
2.85% │ 0x00007f3b5bf71850: add r8d,0x200 ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ; - org.sample.ByteBufferTest::floss at 19 (line 33)
│ ; - org.sample.ByteBufferTest::test at 14 (line 42)
│ ; - org.sample.generated.ByteBufferTest_test_jmhTest::test_avgt_jmhStub at 17 (line 241)
│ 0x00007f3b5bf71857: cmp r8d,ecx
╰ 0x00007f3b5bf7185a: jl 0x00007f3b5bf717c0 ;*goto {reexecute=0 rethrow=0 return_oop=0}
Nice, eh?
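The two listings differ mainly in loop shape. In the memory-handle loop, each 4-byte store is preceded by a compare-and-branch (cmp rdx,rdi / jg) that the debug info attributes to MemorySegmentImpl::checkBounds. In the ByteBuffer loop, C2 has unrolled and vectorised the body into 16 consecutive 32-byte vmovdqu stores, i.e. 512 bytes or 128 ints per trip, which matches the add r8d,0x200 increment. The Java below is only a hypothetical model of those two shapes, not JDK or benchmark source: LoopShapes, checkBounds and newDirectBuffer are illustrative names, and floss is a guess at the body of ByteBufferTest::floss, which isn't shown in this message. It won't reproduce the generated code; it just makes the per-store check explicit.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical models of the two benchmark loops; neither is JDK or original benchmark source. */
final class LoopShapes {
    // Real JDK API: views a byte[] as native-order ints (plain get/set allow misaligned offsets).
    private static final VarHandle INT_VIEW =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Illustrative stand-in for the checkAccess -> checkRange -> checkBounds frames in the first listing.
    static void checkBounds(long offset, long length, long segmentSize) {
        if (offset < 0 || length < 0 || offset > segmentSize - length) {
            throw new IndexOutOfBoundsException("offset " + offset + ", length " + length);
        }
    }

    // Shape of intHandleTest as compiled above: one compare-and-branch, then one 4-byte store, per int.
    // The check never fails inside this loop, but it is still evaluated on every trip.
    static void fillCheckedPerStore(byte[] memory, int value) {
        for (int i = 0; i + Integer.BYTES <= memory.length; i += Integer.BYTES) {
            checkBounds(i, Integer.BYTES, memory.length);
            INT_VIEW.set(memory, i, value);
        }
    }

    // Guess at ByteBufferTest::floss: absolute putInt(index, value) on a direct buffer,
    // which the frames show reaching Unsafe.putIntUnaligned via DirectByteBuffer::putInt.
    static void floss(ByteBuffer buffer, int value) {
        for (int i = 0; i + Integer.BYTES <= buffer.capacity(); i += Integer.BYTES) {
            buffer.putInt(i, value);
        }
    }

    // Direct buffer sized for arraySize ints; native byte order is an assumption.
    static ByteBuffer newDirectBuffer(int arraySize) {
        return ByteBuffer.allocateDirect(arraySize * Integer.BYTES).order(ByteOrder.nativeOrder());
    }
}
```

Absolute putInt leaves the buffer's position untouched, so floss can be called repeatedly on the same buffer, as a JMH benchmark method would be.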
Benchmark                        Mode  Cnt     Score       Error  Units
ByteBufferTest.test              avgt    5   620.628 ±     2.947  ns/op
MemoryHandlesTest.intHandleTest  avgt    5  2778.602 ± 10557.068  ns/op
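Taking the averages at face value, the memory-handle loop is about 4.5 times slower. The error column deserves a second look, though: JMH reports it as a confidence-interval half-width (99.9% by default), and for intHandleTest it is roughly 3.8 times the score, so the interval extends well below zero and the 4.5x figure is itself very uncertain. The snippet just redoes that arithmetic on the numbers in the table; JmhComparison is a throwaway name.

```java
/** Redoes the arithmetic on the JMH averages quoted in the table (avgt mode, ns/op). */
final class JmhComparison {
    static final double BYTE_BUFFER_SCORE = 620.628;
    static final double BYTE_BUFFER_ERROR = 2.947;
    static final double MEMORY_HANDLE_SCORE = 2778.602;
    static final double MEMORY_HANDLE_ERROR = 10557.068;

    // Ratio of mean times: how many times slower the memory-handle loop is.
    static double slowdown() {
        return MEMORY_HANDLE_SCORE / BYTE_BUFFER_SCORE;
    }

    // JMH's Error column is a confidence-interval half-width, so this is the
    // lower end of the reported interval.
    static double lowerBound(double score, double error) {
        return score - error;
    }
}
```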
Are any C2 improvements, or something similar, being proposed to address this?
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
More information about the core-libs-dev mailing list