[vectorIntrinsics] RFR: Optimize mem barriers for ByteBuffer cases [v4]
Radoslaw Smogura
github.com+7535718+rsmogura at openjdk.java.net
Mon Aug 2 21:02:24 UTC 2021
> # Description
> This change tries to remove mem bars for byte buffer cases.
>
> Previously, mem bars were inserted almost unconditionally whenever an attempt to access native memory was detected. This patch follows the approach of inline_unsafe_access and inserts a barrier only if it can't determine whether the access is heap or off-heap (type mismatch cases are not ported).
>
> # Testing
> Memory tests should include rolling back the JDK changes and leaving only the HotSpot changes, as the intrinsics should be well guarded.
>
> # Notes
> Polluted cases to be addressed later
>
> # Benchmarks
>
> Benchmark (size) Mode Cnt Score Error Units
> ByteBufferVectorAccess.arrays 1024 avgt 10 12.585 ± 0.409 ns/op
> ByteBufferVectorAccess.directBuffers 1024 avgt 10 19.962 ± 0.080 ns/op
> ByteBufferVectorAccess.heapBuffers 1024 avgt 10 15.878 ± 0.187 ns/op
> ByteBufferVectorAccess.pollutedBuffers2 1024 avgt 10 123.702 ± 0.723 ns/op
> ByteBufferVectorAccess.pollutedBuffers3 1024 avgt 10 223.928 ± 1.906 ns/op
>
> Before
>
> Benchmark (size) Mode Cnt Score Error Units
> ByteBufferVectorAccess.arrays 1024 avgt 10 14.730 ± 0.061 ns/op
> ByteBufferVectorAccess.directBuffers 1024 avgt 10 77.707 ± 4.867 ns/op
> ByteBufferVectorAccess.heapBuffers 1024 avgt 10 76.530 ± 1.076 ns/op
> ByteBufferVectorAccess.pollutedBuffers2 1024 avgt 10 143.331 ± 1.096 ns/op
> ByteBufferVectorAccess.pollutedBuffers3 1024 avgt 10 286.645 ± 3.444 ns/op
Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
Support polluted cases.
Factor loads and stores to support polluted cases.
Use more immutable memory and instance fields, to avoid
virtual calls.
Use immutable memory to help unswitching loops.
This code works suspiciously well (I see the loop get unswitched 4 times).
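
For readers unfamiliar with loop unswitching, the following is a minimal sketch of the idea, not code from this patch. It assumes a hypothetical `Source` holder whose `direct` flag is a `final` field, so the branch on it is loop-invariant; the second method is the hand-unswitched form that C2 may produce on its own when it can prove the condition does not change inside the loop. Plain `ByteBuffer` accessors are used instead of the Vector API so the example runs without incubator modules.

```java
import java.nio.ByteBuffer;

public class UnswitchSketch {
    // Hypothetical holder (not from the patch): 'direct' is final,
    // so its value cannot change while a loop over 'buf' is running.
    static final class Source {
        final boolean direct;
        final ByteBuffer buf;

        Source(ByteBuffer buf) {
            this.buf = buf;
            this.direct = buf.isDirect();
        }
    }

    // Loop with a loop-invariant branch in its body.
    static long sumBranchInside(Source s) {
        long sum = 0;
        for (int i = 0; i < s.buf.capacity(); i++) {
            if (s.direct) {
                sum += s.buf.get(i);                           // off-heap path
            } else {
                sum += s.buf.array()[s.buf.arrayOffset() + i]; // heap path
            }
        }
        return sum;
    }

    // The same computation with the branch hoisted out of the loop by hand:
    // one specialized loop per case, which is what unswitching amounts to.
    static long sumUnswitched(Source s) {
        long sum = 0;
        int n = s.buf.capacity();
        if (s.direct) {
            for (int i = 0; i < n; i++) {
                sum += s.buf.get(i);
            }
        } else {
            byte[] a = s.buf.array();
            int off = s.buf.arrayOffset();
            for (int i = 0; i < n; i++) {
                sum += a[off + i];
            }
        }
        return sum;
    }

    // Fill a buffer with the byte values 0, 1, 2, ... using absolute puts.
    static ByteBuffer filled(ByteBuffer b) {
        for (int i = 0; i < b.capacity(); i++) {
            b.put(i, (byte) i);
        }
        return b;
    }

    public static void main(String[] args) {
        Source heap = new Source(filled(ByteBuffer.allocate(16)));
        Source direct = new Source(filled(ByteBuffer.allocateDirect(16)));
        System.out.println(sumBranchInside(heap) + " " + sumUnswitched(heap));
        System.out.println(sumBranchInside(direct) + " " + sumUnswitched(direct));
    }
}
```

Both methods return the same value for a given buffer; the difference is only in where the heap/off-heap decision is made.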
```
Benchmark (size) Mode Cnt Score Error Units
ByteBufferVectorAccess.arrayCopy 1024 avgt 10 14.524 ± 0.356 ns/op
ByteBufferVectorAccess.directBuffers 1024 avgt 10 19.633 ± 0.137 ns/op
ByteBufferVectorAccess.heapBuffers 1024 avgt 10 19.148 ± 0.505 ns/op
ByteBufferVectorAccess.pollutedBuffers2 1024 avgt 10 31.682 ± 0.762 ns/op
ByteBufferVectorAccess.pollutedBuffers3 1024 avgt 10 74.878 ± 1.127 ns/op
ByteBufferVectorAccess.pollutedBuffers4 1024 avgt 10 71.133 ± 1.822 ns/op
ByteBufferVectorAccess.pollutedBuffers5 1024 avgt 10 66.990 ± 1.323 ns/op
```
With loop unrolling
```
Benchmark (size) Mode Cnt Score Error Units
ByteBufferVectorAccess.arrayCopy 1024 avgt 10 14.517 ± 0.103 ns/op
ByteBufferVectorAccess.directBuffers 1024 avgt 10 12.140 ± 0.134 ns/op
ByteBufferVectorAccess.pollutedBuffers2 1024 avgt 10 34.582 ± 0.250 ns/op
ByteBufferVectorAccess.pollutedBuffers3 1024 avgt 10 69.405 ± 0.845 ns/op
ByteBufferVectorAccess.pollutedBuffers4 1024 avgt 10 58.719 ± 0.491 ns/op
ByteBufferVectorAccess.pollutedBuffers5 1024 avgt 10 60.044 ± 0.338 ns/op
```
plus the heap buffer case, which sometimes executes slower:
```
ByteBufferVectorAccess.heapBuffers 1024 avgt 10 15.878 ± 0.423 ns/op
```
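
As a rough illustration of what a "polluted" case can look like, here is a minimal sketch under stated assumptions; it is not the `ByteBufferVectorAccess` benchmark itself. It assumes "polluted" means that one copy routine is invoked with more than one concrete `ByteBuffer` kind (heap and direct), so the JIT's type profile at the `get`/`put` call sites sees several receiver classes. The class and method names are hypothetical, and plain byte accessors stand in for the vector loads and stores.

```java
import java.nio.ByteBuffer;

public class PollutedCopySketch {
    // Hypothetical shared copy routine: the same bytecode handles every
    // buffer kind passed in, so its profile mixes heap and direct receivers.
    static void copy(ByteBuffer src, ByteBuffer dst) {
        int n = Math.min(src.capacity(), dst.capacity());
        for (int i = 0; i < n; i++) {
            dst.put(i, src.get(i)); // absolute accessors, position untouched
        }
    }

    // Fill a buffer with the byte values 0, 1, 2, ...
    static ByteBuffer filled(ByteBuffer b) {
        for (int i = 0; i < b.capacity(); i++) {
            b.put(i, (byte) i);
        }
        return b;
    }

    public static void main(String[] args) {
        ByteBuffer heapSrc = filled(ByteBuffer.allocate(1024));
        ByteBuffer directSrc = filled(ByteBuffer.allocateDirect(1024));
        ByteBuffer heapDst = ByteBuffer.allocate(1024);
        ByteBuffer directDst = ByteBuffer.allocateDirect(1024);

        // Alternating the buffer kinds at one call site is what
        // "pollutes" the profile in this sketch.
        for (int iter = 0; iter < 10_000; iter++) {
            copy(heapSrc, heapDst);
            copy(directSrc, directDst);
            copy(heapSrc, directDst);
        }
        System.out.println(heapDst.get(100) == directDst.get(100));
    }
}
```

A real measurement would wrap each variant in a JMH `@Benchmark` method; the loop in `main` only shows the calling pattern.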
-------------
Changes:
- all: https://git.openjdk.java.net/panama-vector/pull/104/files
- new: https://git.openjdk.java.net/panama-vector/pull/104/files/ed6c744d..4852ea23
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=104&range=03
- incr: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=104&range=02-03
Stats: 403 lines in 11 files changed: 287 ins; 0 del; 116 mod
Patch: https://git.openjdk.java.net/panama-vector/pull/104.diff
Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/104/head:pull/104
PR: https://git.openjdk.java.net/panama-vector/pull/104