[vectorIntrinsics] RFR: Optimize mem barriers for ByteBuffer cases [v8]

Paul Sandoz psandoz at openjdk.java.net
Wed Aug 4 15:42:44 UTC 2021


On Tue, 3 Aug 2021 21:19:15 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:

>> # Description
>> This change tries to remove mem bars for byte buffer cases.
>> 
>> Previously mem bars were inserted almost unconditionally if attemp to native memory acees where detected. This patch tries to follow up inline_unsafe_access and insert bar only if can't determine if it's heap or off-heap (type missmatch cases are not ported).
>> 
>> # Testing
>> Memory tests should include rollbacking JDK changes, and leaving only hotspot, as intrinsics should be well guarded
>> 
>> # Notes
>> Polluted cases to be addressed later
>> 
>> # Benchmarks
>> 
>> Benchmark                                (size)  Mode  Cnt    Score   Error  Units
>> ByteBufferVectorAccess.arrays              1024  avgt   10   12.585 ? 0.409  ns/op
>> ByteBufferVectorAccess.directBuffers       1024  avgt   10   19.962 ? 0.080  ns/op
>> ByteBufferVectorAccess.heapBuffers         1024  avgt   10   15.878 ? 0.187  ns/op
>> ByteBufferVectorAccess.pollutedBuffers2    1024  avgt   10  123.702 ? 0.723  ns/op
>> ByteBufferVectorAccess.pollutedBuffers3    1024  avgt   10  223.928 ? 1.906  ns/op
>> 
>> Before
>> 
>> Benchmark                                (size)  Mode  Cnt    Score   Error  Units
>> ByteBufferVectorAccess.arrays              1024  avgt   10   14.730 ? 0.061  ns/op
>> ByteBufferVectorAccess.directBuffers       1024  avgt   10   77.707 ? 4.867  ns/op
>> ByteBufferVectorAccess.heapBuffers         1024  avgt   10   76.530 ? 1.076  ns/op
>> ByteBufferVectorAccess.pollutedBuffers2    1024  avgt   10  143.331 ? 1.096  ns/op
>> ByteBufferVectorAccess.pollutedBuffers3    1024  avgt   10  286.645 ? 3.444  ns/op
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor stylistic fixes.

Appreciate all your efforts and explorations. We have been at this for a few rounds and it looks like we are getting close.

Two things:
1. Did you ever get the `VectorMemoryAlias` test to fail e.g. by deliberately disabling the barriers?
2. I think we should use unsafe to test for a read only buffer. We can follow up with another PR. (My hope is this might can also resolve the issue with tiered compilation and heap buffers i was observing.)

src/hotspot/share/opto/vectorIntrinsics.cpp line 781:

> 779:   const TypeAryPtr* arr_type = addr_type->isa_aryptr();
> 780: 
> 781:   if (arr_type == NULL && !off_heap_access) {

Not sure this case can occur, since we only pass array instances or `null` and we cannot pollute with different types of array. Might need some guidance from a HotSpot expert? Perhaps we can remove this code or place in an assert?

We do have a form of mismatched access between `char[]` and `ShortVector` and `boolean[]` and `ByteVector` but that's a different case.

src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template line 421:

> 419:                       bb, offset, s,
> 420:                       defaultImpl);
> 421:             }

Can we try collapsing that to:
Suggestion:

              return VectorSupport.load(vmClass, e, length,
                      base, BufferAccess.bufferAddress(bb, offset),
                      bb, offset, s,
                      defaultImpl);

? That appeared to be ok when i ran the benchmark for the non-polluted cases.

test/hotspot/jtreg/compiler/vectorapi/VectorMemoryAlias.java line 58:

> 56:     final var bb = ByteBuffer.wrap(arr);
> 57:     final var vs = VectorSpecies.ofLargestShape(byte.class);
> 58:     final var ones = ByteVector.broadcast(vs, 1);

Make these static finals (or at least the former)?

test/micro/org/openjdk/bench/jdk/incubator/vector/ByteBufferVectorAccess.java line 146:

> 144: 
> 145:   @CompilerControl(CompilerControl.Mode.DONT_INLINE)
> 146:   protected void copyMemory(ByteBuffer in, ByteBuffer out) {

We may want to create a wrapping method that is not inlined just for the polluted cases e.g. `copyIntoNotInlined` calling `copyMemory`?

Thereby we avoid the measurement of a method call in the non-polluted cases so as to compare more fairly with `arrayCopy`?

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/104


More information about the panama-dev mailing list