[vectorIntrinsics] RFR: Optimize mem barriers for ByteBuffer cases [v10]
Radoslaw Smogura
github.com+7535718+rsmogura at openjdk.java.net
Thu Aug 5 18:16:45 UTC 2021
On Wed, 4 Aug 2021 22:44:09 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:
>> # Description
>> This change tries to remove memory barriers for the ByteBuffer cases.
>>
>> Previously, memory barriers were inserted almost unconditionally whenever an attempt to access native memory was detected. This patch follows inline_unsafe_access and inserts a barrier only if it can't be determined whether the access is on-heap or off-heap (the type-mismatch cases are not ported).
>>
>> # Testing
>> Memory tests should include rolling back the JDK changes and leaving only the HotSpot ones, as the intrinsics should be well guarded.
>>
>> # Notes
>> Polluted cases are to be addressed later.
>>
>> # Benchmarks
>>
>> Benchmark                                (size) Mode Cnt   Score   Error Units
>> ByteBufferVectorAccess.arrays              1024 avgt  10  12.585 ± 0.409 ns/op
>> ByteBufferVectorAccess.directBuffers       1024 avgt  10  19.962 ± 0.080 ns/op
>> ByteBufferVectorAccess.heapBuffers         1024 avgt  10  15.878 ± 0.187 ns/op
>> ByteBufferVectorAccess.pollutedBuffers2    1024 avgt  10 123.702 ± 0.723 ns/op
>> ByteBufferVectorAccess.pollutedBuffers3    1024 avgt  10 223.928 ± 1.906 ns/op
>>
>> Before
>>
>> Benchmark                                (size) Mode Cnt   Score   Error Units
>> ByteBufferVectorAccess.arrays              1024 avgt  10  14.730 ± 0.061 ns/op
>> ByteBufferVectorAccess.directBuffers       1024 avgt  10  77.707 ± 4.867 ns/op
>> ByteBufferVectorAccess.heapBuffers         1024 avgt  10  76.530 ± 1.076 ns/op
>> ByteBufferVectorAccess.pollutedBuffers2    1024 avgt  10 143.331 ± 1.096 ns/op
>> ByteBufferVectorAccess.pollutedBuffers3    1024 avgt  10 286.645 ± 3.444 ns/op
>
> Radoslaw Smogura has updated the pull request incrementally with two additional commits since the last revision:
>
> - Revert: gitignore(s)
> - CR changes:
> * reformat checks
> * bring array mismatched access back
> > for a store we could assign result of StoreVector to two slices raw, and byte[] in a memory merge node,
>
> I don't see how it could work with the alias analysis (as it is implemented now).
> Every memory slice is "flattened" into a unique slice which doesn't alias with anything except the one represented with `TypePtr::BOTTOM`. What you suggest implies that some slices start to alias with raw memory. It will break the existing logic unless you find a smart way to fix it.
>
> > for a load, we could consume the whole memory as input, instead of a single slice.
>
> Still, you need to be very cautious about the alias index being assigned to the "wide" memory slice of mixed/mismatched access. Also, the logic which inserts anti-dependencies in the graph has to be taught about the aliasing slices.
>
> Overall, it looks error-prone, and it wouldn't necessarily lead to a simpler IR (and better generated code) compared to CPU memory barriers.
There was a comment in memnode.cpp: "A merge can take a "wide" memory state as one of its narrow inputs. This simply means that the merge observes out only the relevant parts of the wide input (...) (This is rare.)" So I thought we could mark a mixed store/load as delivering wide memory and assign it to only the two slices which can potentially be modified (we know that only one slice will be physically modified).
That was, more or less, one of the reasons I thought we could do this.
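To make the heap/off-heap distinction concrete, here is a small illustrative sketch (not JDK or HotSpot code; the class and method names are invented for this example). A heap ByteBuffer is backed by a byte[], while a direct one points at native memory, which is the property the intrinsic guard checks. It also shows a "polluted" call site in the spirit of the pollutedBuffers* benchmarks: both buffer kinds flow into one method, so a JIT compiling it cannot prove the access kind and must stay conservative.

```java
import java.nio.ByteBuffer;

public class BufferKindDemo {
    // Mimics the kind of check the intrinsic relies on: heap buffers
    // are backed by an accessible byte[], direct buffers are not.
    static boolean isHeapAccess(ByteBuffer bb) {
        return bb.hasArray();
    }

    // A "polluted" call site: both heap and direct buffers reach this
    // method, so type profiling cannot specialize the access and the
    // conservative path (with barriers) is kept.
    static int sumFirstInts(ByteBuffer bb, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += bb.getInt(i * Integer.BYTES);
        }
        return s;
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(64);
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        heap.putInt(0, 7);
        direct.putInt(0, 7);

        System.out.println(isHeapAccess(heap));    // true
        System.out.println(isHeapAccess(direct));  // false
        // Same bytecode, two receiver kinds -> profile pollution:
        System.out.println(sumFirstInts(heap, 1) + sumFirstInts(direct, 1));
    }
}
```

This is only meant to show why the known-kind cases (arrays, directBuffers, heapBuffers) can drop the barrier while the polluted cases cannot.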
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/104
More information about the panama-dev mailing list