[foreign-memaccess+abi] RFR: Add benchmarks to MemorySegmentVsBits [v2]

Thu Jan 5 21:51:09 UTC 2023

On Wed, 4 Jan 2023 08:00:15 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> This PR proposes the addition of some benchmarks, for example using a LongBuffer and a VarHandle.
>
> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Change to big endian for some variants

Bulk copy seems to be the best option. Using a bulk "put" on ByteBuffer is likely also much faster than using separate stores.
I arrived at bulk copy because I noticed that the code was not being vectorized, whereas the code C2 emits for a bulk copy contains vector instructions, so it should perform optimally (and it does). Since ByteBuffer and MemorySegment use the same underlying primitive for bulk copy (Unsafe::copyMemory), I'd expect the two to behave the same.
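To make the two bulk paths concrete, here is a minimal sketch that fills the same memory once with a ByteBuffer bulk put and once with `MemorySegment.copy`, then checks that the bytes match. It assumes JDK 22 or later, where `java.lang.foreign` is final (in earlier releases it is a preview API). The class and method names are hypothetical and not taken from the PR's benchmark; the sketch shows the shape of the code only and measures nothing.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class BulkTransferSketch {

    // Bulk path 1: a single LongBuffer.put of the whole array into a direct ByteBuffer.
    // asLongBuffer() moves only the view's position, so buf stays at position 0.
    static ByteBuffer bulkPut(long[] src) {
        ByteBuffer buf = ByteBuffer.allocateDirect(src.length * Long.BYTES)
                                   .order(ByteOrder.nativeOrder());
        buf.asLongBuffer().put(src);
        return buf;
    }

    // Bulk path 2: a single MemorySegment.copy from a heap segment into native memory.
    static MemorySegment bulkCopy(long[] src, Arena arena) {
        long bytes = (long) src.length * Long.BYTES;
        MemorySegment dst = arena.allocate(bytes, Long.BYTES);
        MemorySegment.copy(MemorySegment.ofArray(src), 0L, dst, 0L, bytes);
        return dst;
    }

    // Returns true when both bulk paths produced byte-for-byte identical contents.
    static boolean sameBytes(long[] src) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment viaBuffer = MemorySegment.ofBuffer(bulkPut(src));
            MemorySegment viaSegment = bulkCopy(src, arena);
            return viaBuffer.mismatch(viaSegment) == -1;
        }
    }

    public static void main(String[] args) {
        long[] data = new long[256];          // 256 longs = 2048 bytes
        Arrays.setAll(data, i -> i * 31L);
        System.out.println(sameBytes(data));  // true
    }
}
```

Both destinations use native byte order, which is why the comparison can be done byte by byte.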

That said, I don't think there's any reason for the non-bulk versions to be slower, and I hope that it turns out to be some "near miss" in the autovectorization code (we're looking into it).
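For comparison, the "non-bulk" shape versus the bulk shape for segments can be sketched as follows. This again assumes JDK 22 or later; `loopCopy` and `bulkCopy` are hypothetical names, not the benchmark's code. Both produce the same result. The message reports that C2 was not vectorizing the per-element version, which this sketch does not measure.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;

public class SegmentCopySketch {

    // Non-bulk: copies count longs one at a time, with a separate load and store per element.
    static void loopCopy(MemorySegment src, MemorySegment dst, long count) {
        for (long i = 0; i < count; i++) {
            dst.setAtIndex(ValueLayout.JAVA_LONG, i,
                           src.getAtIndex(ValueLayout.JAVA_LONG, i));
        }
    }

    // Bulk: copies the same range with one call that takes a byte count.
    static void bulkCopy(MemorySegment src, MemorySegment dst, long count) {
        MemorySegment.copy(src, 0L, dst, 0L, count * Long.BYTES);
    }

    public static void main(String[] args) {
        long[] source = new long[256];
        Arrays.setAll(source, i -> i * 7L);
        long[] viaLoop = new long[source.length];
        long[] viaBulk = new long[source.length];

        // Heap segments backed by long[] arrays.
        MemorySegment src = MemorySegment.ofArray(source);
        loopCopy(src, MemorySegment.ofArray(viaLoop), source.length);
        bulkCopy(src, MemorySegment.ofArray(viaBulk), source.length);

        System.out.println(Arrays.equals(viaLoop, viaBulk)); // true
        System.out.println(Arrays.equals(source, viaBulk));  // true
    }
}
```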

> According to this data, if your application primarily works with reading/writing/copying of buffers & binary data, then you're better off (perf-wise) using Panama FFM `MemorySegment` with `MemorySegment#copy` rather than ByteBuffers or byte arrays?
> 
> I'd be curious if you lose any of this benefit if you have to adapt the MemorySegment with `.asByteBuffer()` for calling existing APIs that require BBs, for example `java.nio.channels.AsynchronousServerSocketChannel` callbacks and the like.

Turning a segment into a BB is a rather cheap O(1) operation, so I wouldn't expect that to result in performance degradation.
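A minimal sketch of why the adaptation is cheap: `asByteBuffer()` returns a view over the segment's memory rather than a copy, so writes through either side are visible through the other. It assumes JDK 22 or later; the class and `view` helper are hypothetical names for illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

public class SegmentAsBufferSketch {

    // Wraps a segment as a ByteBuffer view; no bytes are copied.
    static ByteBuffer view(MemorySegment segment) {
        return segment.asByteBuffer();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(2048);  // room for 256 longs
            ByteBuffer buffer = view(segment);

            // Both objects address the same native memory, so a write through
            // one is visible through the other.
            segment.set(ValueLayout.JAVA_BYTE, 0, (byte) 42);
            buffer.put(1, (byte) 7);

            System.out.println(buffer.get(0));                                // 42
            System.out.println(segment.get(ValueLayout.JAVA_BYTE, 1));        // 7
            System.out.println(buffer.isDirect() + " " + buffer.capacity());  // true 2048
        }
    }
}
```

The buffer's lifetime is tied to the segment's: once the arena is closed, the buffer can no longer be accessed either.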

> (The size here is in `long-bytes`, right? So it's x8 with `256` being a 2kb buffer?)

Yes: 256 longs × 8 bytes = 2048 bytes (2 KiB).

-------------

PR: https://git.openjdk.org/panama-foreign/pull/762