[vectorIntrinsics] RFR: Optimize mem barriers for ByteBuffer cases

Thu Jul 29 18:16:49 UTC 2021

On Tue, 27 Jul 2021 20:42:13 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:

> # Description
> This change tries to remove mem bars for byte buffer cases.
> 
> Previously, mem bars were inserted almost unconditionally if an attempt to access native memory was detected. This patch tries to follow inline_unsafe_access and insert a bar only if it can't be determined whether the access is to heap or off-heap memory (type mismatch cases are not ported).
> 
> # Testing
> Memory tests should include rolling back the JDK changes and leaving only the HotSpot changes, as the intrinsics should be well guarded.
> 
> # Notes
> Polluted cases to be addressed later
> 
> # Benchmarks
> 
> Benchmark                                (size)  Mode  Cnt    Score   Error  Units
> ByteBufferVectorAccess.arrays              1024  avgt   10   12.585 ± 0.409  ns/op
> ByteBufferVectorAccess.directBuffers       1024  avgt   10   19.962 ± 0.080  ns/op
> ByteBufferVectorAccess.heapBuffers         1024  avgt   10   15.878 ± 0.187  ns/op
> ByteBufferVectorAccess.pollutedBuffers2    1024  avgt   10  123.702 ± 0.723  ns/op
> ByteBufferVectorAccess.pollutedBuffers3    1024  avgt   10  223.928 ± 1.906  ns/op
> 
> Before
> 
> Benchmark                                (size)  Mode  Cnt    Score   Error  Units
> ByteBufferVectorAccess.arrays              1024  avgt   10   14.730 ± 0.061  ns/op
> ByteBufferVectorAccess.directBuffers       1024  avgt   10   77.707 ± 4.867  ns/op
> ByteBufferVectorAccess.heapBuffers         1024  avgt   10   76.530 ± 1.076  ns/op
> ByteBufferVectorAccess.pollutedBuffers2    1024  avgt   10  143.331 ± 1.096  ns/op
> ByteBufferVectorAccess.pollutedBuffers3    1024  avgt   10  286.645 ± 3.444  ns/op

This is starting to look better. The logic for heap/native looks correct when compared with unsafe. Perhaps for clarity we should use `DecoratorSet`?
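
To make the heap/native decision concrete, here is a minimal plain-Java sketch of the rule as described in the PR: a barrier is needed only when the base cannot be proven to be either heap or off-heap. The names `BarrierDecision`, `BaseKind`, `classify` and `needsMemBar` are hypothetical and only model the idea; they are not the HotSpot code, which works on C2 node types rather than booleans.

```java
public class BarrierDecision {
    /** What the compiler could prove about the access base (hypothetical model). */
    enum BaseKind { HEAP, OFF_HEAP, UNKNOWN }

    /**
     * Classifies the base from two facts assumed to be known at compile time:
     * whether it is provably null (native address) or provably non-null (heap object).
     */
    static BaseKind classify(boolean provenNull, boolean provenNonNull) {
        if (provenNull) {
            return BaseKind.OFF_HEAP;   // null base: the address is a raw native address
        }
        if (provenNonNull) {
            return BaseKind.HEAP;       // non-null base: offset into a heap object
        }
        return BaseKind.UNKNOWN;        // could be either, e.g. a polluted call site
    }

    /** A barrier is kept only for the case that cannot be resolved. */
    static boolean needsMemBar(BaseKind kind) {
        return kind == BaseKind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(needsMemBar(classify(true, false)));   // direct buffer case
        System.out.println(needsMemBar(classify(false, true)));   // heap buffer case
        System.out.println(needsMemBar(classify(false, false)));  // undetermined case
    }
}
```

In this model the undetermined case is the only one that still pays for a barrier, which would match the polluted benchmarks staying comparatively slow.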

It got me wondering why we did not have to make similar updates in the Java code for VH accesses to byte buffers. Your `ByteBufferBarriersTests` benchmark indicates VHs work for the non-polluted heap and direct cases.

I played around with your patch and managed to simplify it by reducing the code changes in the scoped memory access classes to a cast of the base to `byte[]`, e.g. in src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template:

    @Scoped
    @ForceInline
    private static
    <V extends VectorSupport.Vector<E>, E, S extends VectorSupport.VectorSpecies<E>>
    V loadFromByteBufferScoped(ScopedMemoryAccess.Scope scope,
                          Class<? extends V> vmClass, Class<E> e, int length,
                          ByteBuffer bb, int offset,
                          S s,
                          VectorSupport.LoadOperation<ByteBuffer, V, E, S> defaultImpl) {
        try {
            if (scope != null) {
                scope.checkValidState();
            }

            return VectorSupport.load(vmClass, e, length,
                (byte[]) BufferAccess.bufferBase(bb), BufferAccess.bufferAddress(bb, offset),
                bb, offset, s,
                defaultImpl);
        } finally {
            Reference.reachabilityFence(scope);
        }
    }
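
At the language level, the cast itself is harmless for both buffer kinds. The small sketch below illustrates that, using a hypothetical `base` helper as a stand-in for `BufferAccess.bufferBase` and assuming the base is the backing `byte[]` for a heap buffer and `null` for a direct buffer. It only shows that the cast cannot fail in those two cases; it says nothing about how C2 treats the sharper static type.

```java
import java.nio.ByteBuffer;

public class BaseCastSketch {
    // Hypothetical stand-in for BufferAccess.bufferBase:
    // the backing array for an accessible heap buffer, null otherwise.
    static Object base(ByteBuffer bb) {
        return bb.hasArray() ? bb.array() : null;
    }

    // The cast under discussion: a null reference passes any reference cast,
    // and a byte[] passes a cast to byte[], so neither case can throw.
    static byte[] castBase(ByteBuffer bb) {
        return (byte[]) base(bb);
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        System.out.println("heap base is backing array: " + (castBase(heap) == heap.array()));
        System.out.println("direct base is null: " + (castBase(direct) == null));
    }
}
```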

I don't know why this works.

I am also observing an odd performance issue with heap access when tiered compilation is enabled. It may be unrelated, as I can reproduce it using the tip of `jdk/jdk`.

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/104