RFR (14) 8235837: Memory access API refinements

Andrew Haley aph at redhat.com
Wed Jan 15 18:00:00 UTC 2020


On 1/9/20 4:37 PM, Maurizio Cimadamore wrote:
> There you go
>
> cr.openjdk.java.net/~mcimadamore/8235837_javadoc

Thank you.

So I've been kicking the tyres, and I'm rather surprised at how poor
the performance seems to be. My simple test, like this:

    @Benchmark
    public void intHandleTest(BenchmarkState state) {
        try (var segment = BenchmarkState.segment.acquire()) {
            var base = segment.baseAddress();
            final var byteSize = ARRAY_SIZE * 4;
            for (int i = 0; i < byteSize; i += 4) {
                BenchmarkState.intHandle.set(base.offset(i), (int) 4);
            }
        }
    }

has a great deal of overhead. It was a bit of a struggle to get it to
unroll nicely, and the best I could get was

  6.90%  │  0x00007faeeff7dec8:   mov    r9d,r11d
         │  0x00007faeeff7decb:   add    r9d,0x4                      ;*iinc {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - org.sample.MemoryHandlesTest::intHandleTest at 45 (line 34)
         │                                                            ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)
         │  0x00007faeeff7decf:   mov    rdx,rbx
         │  0x00007faeeff7ded2:   add    rdx,0x10                     ;*i2l {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - org.sample.MemoryHandlesTest::intHandleTest at 35 (line 35)
         │                                                            ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)
  0.06%  │  0x00007faeeff7ded6:   cmp    rdx,rdi
         │  0x00007faeeff7ded9:   jg     0x00007faeeff7df94           ;*ifle {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - jdk.internal.foreign.MemorySegmentImpl::checkBounds at 20 (line 196)
         │                                                            ; - jdk.internal.foreign.MemorySegmentImpl::checkRange at 29 (line 178)
         │                                                            ; - jdk.internal.foreign.MemoryAddressImpl::checkAccess at 21 (line 84)
         │                                                            ; - java.lang.invoke.VarHandleMemoryAddressAsInts::checkAddress at 15 (line 50)
         │                                                            ; - java.lang.invoke.VarHandleMemoryAddressAsInts::set0 at 7 (line 85)
         │                                                            ; - java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800bc3840::set at 7
         │                                                            ; - java.lang.invoke.VarHandleGuards::guard_LI_V at 33 (line 114)
         │                                                            ; - org.sample.MemoryHandlesTest::intHandleTest at 42 (line 35)
         │                                                            ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)
         │  0x00007faeeff7dedf:   mov    DWORD PTR [rsi+0x10],0x4     ;*invokevirtual putIntUnaligned {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - jdk.internal.misc.Unsafe::putIntUnaligned at 10 (line 3693)
         │                                                            ; - java.lang.invoke.VarHandleMemoryAddressAsInts::set0 at 38 (line 86)
         │                                                            ; - java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800bc3840::set at 7
         │                                                            ; - java.lang.invoke.VarHandleGuards::guard_LI_V at 33 (line 114)
         │                                                            ; - org.sample.MemoryHandlesTest::intHandleTest at 42 (line 35)
         │                                                            ; - org.sample.generated.MemoryHandlesTest_intHandleTest_jmhTest::intHandleTest_avgt_jmhStub at 17 (line 191)

for every store. In contrast, similar ByteBuffer code looks like:


  0.08%   ↗  0x00007f3b5bf717c0:   movsxd r13,r8d
  0.16%   │  0x00007f3b5bf717c3:   mov    r14,rdx
          │  0x00007f3b5bf717c6:   add    r14,r13
  1.00%   │  0x00007f3b5bf717c9:   movsxd r13,r8d
  0.04%   │  0x00007f3b5bf717cc:   vmovdqu YMMWORD PTR [rdx+r13*1],ymm4
  6.87%   │  0x00007f3b5bf717d2:   vmovdqu YMMWORD PTR [r14+0x20],ymm4
  5.77%   │  0x00007f3b5bf717d8:   vmovdqu YMMWORD PTR [r14+0x40],ymm4
  3.99%   │  0x00007f3b5bf717de:   vmovdqu YMMWORD PTR [r14+0x60],ymm4
  6.09%   │  0x00007f3b5bf717e4:   vmovdqu YMMWORD PTR [r14+0x80],ymm4
  4.97%   │  0x00007f3b5bf717ed:   vmovdqu YMMWORD PTR [r14+0xa0],ymm4
  4.93%   │  0x00007f3b5bf717f6:   vmovdqu YMMWORD PTR [r14+0xc0],ymm4
  5.07%   │  0x00007f3b5bf717ff:   vmovdqu YMMWORD PTR [r14+0xe0],ymm4
  4.87%   │  0x00007f3b5bf71808:   vmovdqu YMMWORD PTR [r14+0x100],ymm4
  7.39%   │  0x00007f3b5bf71811:   vmovdqu YMMWORD PTR [r14+0x120],ymm4
  5.19%   │  0x00007f3b5bf7181a:   vmovdqu YMMWORD PTR [r14+0x140],ymm4
  6.21%   │  0x00007f3b5bf71823:   vmovdqu YMMWORD PTR [r14+0x160],ymm4
  4.93%   │  0x00007f3b5bf7182c:   vmovdqu YMMWORD PTR [r14+0x180],ymm4
  5.69%   │  0x00007f3b5bf71835:   vmovdqu YMMWORD PTR [r14+0x1a0],ymm4
 11.28%   │  0x00007f3b5bf7183e:   vmovdqu YMMWORD PTR [r14+0x1c0],ymm4
  4.83%   │  0x00007f3b5bf71847:   vmovdqu YMMWORD PTR [r14+0x1e0],ymm4;*invokevirtual putIntUnaligned {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - jdk.internal.misc.Unsafe::putIntUnaligned at 10 (line 3693)
          │                                                            ; - java.nio.DirectByteBuffer::putInt at 18 (line 860)
          │                                                            ; - java.nio.DirectByteBuffer::putInt at 12 (line 881)
          │                                                            ; - org.sample.ByteBufferTest::floss at 15 (line 34)
          │                                                            ; - org.sample.ByteBufferTest::test at 14 (line 42)
          │                                                            ; - org.sample.generated.ByteBufferTest_test_jmhTest::test_avgt_jmhStub at 17 (line 241)
  2.85%   │  0x00007f3b5bf71850:   add    r8d,0x200                    ;*iinc {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - org.sample.ByteBufferTest::floss at 19 (line 33)
          │                                                            ; - org.sample.ByteBufferTest::test at 14 (line 42)
          │                                                            ; - org.sample.generated.ByteBufferTest_test_jmhTest::test_avgt_jmhStub at 17 (line 241)
          │  0x00007f3b5bf71857:   cmp    r8d,ecx
          ╰  0x00007f3b5bf7185a:   jl     0x00007f3b5bf717c0           ;*goto {reexecute=0 rethrow=0 return_oop=0}

nice, eh?
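
The ByteBuffer benchmark source is not included in this message. As a
rough sketch only, a loop of the following shape would be the natural
counterpart to intHandleTest. The method name floss is taken from the
org.sample.ByteBufferTest::floss frame in the listing above; the class
name, the value of ARRAY_SIZE, the use of an absolute putInt, and the
native byte order are all assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferSketch {
    // Assumed element count; the real ARRAY_SIZE is not given in the message.
    static final int ARRAY_SIZE = 1024;

    // Stores the int 4 at every 4-byte offset, mirroring the loop in
    // intHandleTest. The body is a guess, not the benchmark's actual source.
    static void floss(ByteBuffer buffer) {
        final int byteSize = ARRAY_SIZE * 4;
        for (int i = 0; i < byteSize; i += 4) {
            buffer.putInt(i, 4); // absolute put; the buffer checks the index
        }
    }

    public static void main(String[] args) {
        // Direct buffer in native byte order (an assumption).
        ByteBuffer buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4)
                                      .order(ByteOrder.nativeOrder());
        floss(buffer);
        // Read back the first and last elements as a sanity check.
        System.out.println(buffer.getInt(0) + " " + buffer.getInt(ARRAY_SIZE * 4 - 4));
    }
}
```

With a loop like this, the ByteBuffer listing above shows C2 unrolling
and vectorizing the stores with no per-store range check inside the
loop body, whereas the VarHandle listing keeps a compare-and-branch
from checkBounds ahead of every 4-byte store.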

Benchmark                             Mode  Cnt     Score       Error  Units
ByteBufferTest.test                   avgt    5   620.628 ±     2.947  ns/op
MemoryHandlesTest.intHandleTest       avgt    5  2778.602 ± 10557.068  ns/op

Are any C2 improvements, or something similar, planned to address this?

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


More information about the core-libs-dev mailing list