[vectorIntrinsics] RFR: Optimize mem barriers for ByteBuffer cases [v8]
Paul Sandoz
psandoz at openjdk.java.net
Wed Aug 4 15:42:44 UTC 2021
On Tue, 3 Aug 2021 21:19:15 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:
>> # Description
>> This change tries to remove mem bars for byte buffer cases.
>>
>> Previously mem bars were inserted almost unconditionally if attemp to native memory acees where detected. This patch tries to follow up inline_unsafe_access and insert bar only if can't determine if it's heap or off-heap (type missmatch cases are not ported).
>>
>> # Testing
>> Memory tests should include rollbacking JDK changes, and leaving only hotspot, as intrinsics should be well guarded
>>
>> # Notes
>> Polluted cases to be addressed later
>>
>> # Benchmarks
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ByteBufferVectorAccess.arrays 1024 avgt 10 12.585 ? 0.409 ns/op
>> ByteBufferVectorAccess.directBuffers 1024 avgt 10 19.962 ? 0.080 ns/op
>> ByteBufferVectorAccess.heapBuffers 1024 avgt 10 15.878 ? 0.187 ns/op
>> ByteBufferVectorAccess.pollutedBuffers2 1024 avgt 10 123.702 ? 0.723 ns/op
>> ByteBufferVectorAccess.pollutedBuffers3 1024 avgt 10 223.928 ? 1.906 ns/op
>>
>> Before
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ByteBufferVectorAccess.arrays 1024 avgt 10 14.730 ? 0.061 ns/op
>> ByteBufferVectorAccess.directBuffers 1024 avgt 10 77.707 ? 4.867 ns/op
>> ByteBufferVectorAccess.heapBuffers 1024 avgt 10 76.530 ? 1.076 ns/op
>> ByteBufferVectorAccess.pollutedBuffers2 1024 avgt 10 143.331 ? 1.096 ns/op
>> ByteBufferVectorAccess.pollutedBuffers3 1024 avgt 10 286.645 ? 3.444 ns/op
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor stylistic fixes.
Appreciate all your efforts and explorations. We have been at this for a few rounds and it looks like we are getting close.
Two things:
1. Did you ever get the `VectorMemoryAlias` test to fail e.g. by deliberately disabling the barriers?
2. I think we should use unsafe to test for a read only buffer. We can follow up with another PR. (My hope is this might can also resolve the issue with tiered compilation and heap buffers i was observing.)
src/hotspot/share/opto/vectorIntrinsics.cpp line 781:
> 779: const TypeAryPtr* arr_type = addr_type->isa_aryptr();
> 780:
> 781: if (arr_type == NULL && !off_heap_access) {
Not sure this case can occur, since we only pass array instances or `null` and we cannot pollute with different types of array. Might need some guidance from a HotSpot expert? Perhaps we can remove this code or place in an assert?
We do have a form of mismatched access between `char[]` and `ShortVector` and `boolean[]` and `ByteVector` but that's a different case.
src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template line 421:
> 419: bb, offset, s,
> 420: defaultImpl);
> 421: }
Can we try collapsing that to:
Suggestion:
return VectorSupport.load(vmClass, e, length,
base, BufferAccess.bufferAddress(bb, offset),
bb, offset, s,
defaultImpl);
? That appeared to be ok when i ran the benchmark for the non-polluted cases.
test/hotspot/jtreg/compiler/vectorapi/VectorMemoryAlias.java line 58:
> 56: final var bb = ByteBuffer.wrap(arr);
> 57: final var vs = VectorSpecies.ofLargestShape(byte.class);
> 58: final var ones = ByteVector.broadcast(vs, 1);
Make these static finals (or at least the former)?
test/micro/org/openjdk/bench/jdk/incubator/vector/ByteBufferVectorAccess.java line 146:
> 144:
> 145: @CompilerControl(CompilerControl.Mode.DONT_INLINE)
> 146: protected void copyMemory(ByteBuffer in, ByteBuffer out) {
We may want to create a wrapping method that is not inlined just for the polluted cases e.g. `copyIntoNotInlined` calling `copyMemory`?
Thereby we avoid the measurement of a method call in the non-polluted cases so as to compare more fairly with `arrayCopy`?
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/104
More information about the panama-dev
mailing list