compose MemorySegments
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Jun 11 10:47:11 UTC 2021
On 11/06/2021 00:24, Douglas Surber wrote:
>
> Maurizio,
>
> I can certainly respect a decision that composing multiple
> MemorySegments might be out of scope. Without composition I would
> write something like this.
>
> MemorySegment.ofArray(dest)
> .asSlice(destOffset, firstPartLength)
> .copyFrom(MemorySegment.ofArray(src0).asSlice(srcOffset,
> firstPartLength));
> MemorySegment.ofArray(dest)
> .asSlice(destOffset + firstPartLength, secondPartLength)
> .copyFrom(MemorySegment.ofArray(src1).asSlice(0L, secondPartLength));
>
>
> This would copy the bits from the end of src0 into the first part of
> dest and the bits from the beginning of src1 into the second part of
> dest. Would all this result in just two SIMD instructions modulo
> bounds checking? And no allocations?
Short answer: yes, possibly.
Slightly longer answer: as I mentioned, we're working to offer different
helpers to copy data from segments to arrays and vice versa, so that the
code above can be made significantly more readable.
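
As a rough sketch of what such a helper-based copy could look like: the
example below assumes the finalized java.lang.foreign API (Java 22 or
later), which postdates this thread, and uses its static
MemorySegment.copy overload that copies from a segment into an array.
The class name, method name and parameters are made up for illustration,
and the sources are wrapped with ofArray only to keep the sketch
self-contained.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch, assuming the java.lang.foreign API of Java 22+.
class CopyHelperSketch {
    // Copies firstPartLength bytes of src0 (starting at srcOffset) and then
    // secondPartLength bytes from the start of src1 into dest, starting at
    // destOffset. No explicit slices are created.
    static void copyTwoParts(byte[] src0, int srcOffset, int firstPartLength,
                             byte[] src1, int secondPartLength,
                             byte[] dest, int destOffset) {
        MemorySegment seg0 = MemorySegment.ofArray(src0);
        MemorySegment seg1 = MemorySegment.ofArray(src1);
        // Segment-to-array copy: (srcSegment, srcLayout, srcOffset,
        //                         dstArray, dstIndex, elementCount)
        MemorySegment.copy(seg0, ValueLayout.JAVA_BYTE, srcOffset,
                           dest, destOffset, firstPartLength);
        MemorySegment.copy(seg1, ValueLayout.JAVA_BYTE, 0L,
                           dest, destOffset + firstPartLength, secondPartLength);
    }
}
```

Offsets and lengths are passed directly to the copy call, so there are
no intermediate slice objects for the JIT to eliminate.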
Looking a bit further down the road, memory segments are effectively
immutable (the bit of mutable state which tells whether a resource is
released or not is in the so-called "resource scope", not in a
MemorySegment). So, right now, we can sometimes take advantage of escape
analysis to eliminate allocation of segment slices (e.g. if a segment
doesn't "escape", it can often be scalarized into registers). Of course
escape analysis doesn't always work, especially when a method contains
control flow. But Valhalla will give us the ability to implement the
MemorySegment interface with a true "primitive class" - for which
allocation behavior could be much more predictable.
So, if the Java code surrounding the bulk copy is optimized enough, you
do get a pretty optimized bulk copy. In your specific case, there's no
control flow, and the temporary slices you create are only used inside
the copy method - which makes me think that stuff like this should
already perform decently, assuming the code above gets inlined by C2.
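
For reference, here is the slice-based copy written as a small static
method of the kind C2 might inline. This is only a sketch: it assumes the
finalized java.lang.foreign API (Java 22 or later) rather than the
incubating API current at the time of this thread, and the class and
method names are hypothetical. Note that the second argument of asSlice
is the size of the slice, not an end offset.

```java
import java.lang.foreign.MemorySegment;

// Hypothetical sketch, assuming the java.lang.foreign API of Java 22+.
class SliceCopySketch {
    // Straight-line code, no control flow: the temporary slices are only
    // passed to copyFrom and never stored anywhere else.
    static void copyTwoParts(byte[] src0, long srcOffset, long firstPartLength,
                             byte[] src1, long secondPartLength,
                             byte[] dest, long destOffset) {
        MemorySegment destSeg = MemorySegment.ofArray(dest);
        // First part: firstPartLength bytes of src0, starting at srcOffset.
        destSeg.asSlice(destOffset, firstPartLength)
               .copyFrom(MemorySegment.ofArray(src0).asSlice(srcOffset, firstPartLength));
        // Second part: the first secondPartLength bytes of src1.
        destSeg.asSlice(destOffset + firstPartLength, secondPartLength)
               .copyFrom(MemorySegment.ofArray(src1).asSlice(0L, secondPartLength));
    }
}
```

Whether the slices are actually scalarized depends on inlining and escape
analysis, so the allocation rate is best confirmed with a benchmark.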
Here's a benchmark I've tweaked in the past to show the (non) cost of
slicing prior to a bulk op:
https://mail.openjdk.java.net/pipermail/panama-dev/2021-April/012889.html
and
https://mail.openjdk.java.net/pipermail/panama-dev/2021-April/012897.html
The first shows throughput, the second shows allocation rate, which is
zero for both Unsafe and the memory segment APIs.
This doesn't mean that in _all_ cases allocations will be eliminated
(see above), but we're in relatively good shape, and we have plans to
make allocations even less expensive as new VM features are rolled out.
Maurizio
>
> Douglas