compose MemorySegments

leerho leerho at gmail.com
Fri Jun 11 03:15:36 UTC 2021


Hi Douglas,
This is an interesting problem, and a difficult one to solve efficiently
without copies or additional allocations. I think the OS does similar
tricks when it maps virtual memory addresses into CPU caches or main
memory.  To the user's program it appears as a contiguous memory space, but
in reality it could be broken into many separate physical chunks.  I'm sure
there are ways one could do this in C++, but probably not as efficiently
as the OS can do it.

In looking at your code above I am guessing you may be trying to create a
1:1 association between fields (which may cross block boundaries) and
MemorySegments as a preprocessing step, so that when a user requests a
field it is already in a segment?

I'm not sure, but from what you have described there are statistical
techniques that could possibly be of help.  If your field sizes vary over a
wide dynamic range and your data is large, it is likely that the
distribution of field sizes roughly follows a power law.  In other words,
the fields that are large enough to actually cross block boundaries are
orders of magnitude fewer than the smaller fields that are always
contained in a single block.  This means that the expensive copy/slice
operation you describe above really only needs to be performed on the big
fields, so its cost, amortized across all fields, becomes much smaller on
average.  But perhaps there is more.
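
To make the amortization argument concrete, here is a minimal sketch in
Java.  It assumes a layout I am only guessing at: the data lives in
equally sized byte[] blocks and a field is identified by an absolute
offset and a length.  The names BlockFieldReader, crossesBoundary and read
are hypothetical, and plain ByteBuffer stands in for MemorySegment so the
sketch does not depend on the incubating API.  Fields that fit in one
block are returned as a zero-copy view; only fields that straddle a
boundary pay for an allocation and a copy.

```java
import java.nio.ByteBuffer;

final class BlockFieldReader {

    // True if a field of the given length starting at the absolute offset
    // touches more than one block. Zero-length fields never cross.
    static boolean crossesBoundary(long offset, int length, int blockSize) {
        if (length == 0) {
            return false;
        }
        return offset / blockSize != (offset + length - 1) / blockSize;
    }

    // Returns a buffer holding the field's bytes at indices 0..length-1.
    static ByteBuffer read(byte[][] blocks, int blockSize, long offset, int length) {
        int block = (int) (offset / blockSize);
        int pos = (int) (offset % blockSize);

        if (!crossesBoundary(offset, length, blockSize)) {
            // Common case: a view over the existing block, no bytes copied.
            return ByteBuffer.wrap(blocks[block], pos, length).slice();
        }

        // Rare case: stitch the pieces together in a freshly allocated array.
        byte[] out = new byte[length];
        int copied = 0;
        while (copied < length) {
            int n = Math.min(blockSize - pos, length - copied);
            System.arraycopy(blocks[block], pos, out, copied, n);
            copied += n;
            block++;
            pos = 0; // every block after the first is read from its start
        }
        return ByteBuffer.wrap(out);
    }
}
```

A caller would use read(blocks, blockSize, offset, length) for every field
and never needs to know which path was taken.  If the size distribution is
as skewed as I am guessing, the copying branch runs for only a small
fraction of the fields.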

There are streaming algorithms (called "sketches") for computing the shape
of your field size distribution on-the-fly. Perhaps you could use this
real-time information to partition off to another processing element or
memory space those fields that are likely to split across blocks. This
would allow some parallelization of the expensive composition operation.
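
As a rough illustration of tracking the size distribution on the fly,
here is a deliberately simplified stand-in for a real sketch: a fixed
array of power-of-two buckets updated in constant time and constant
memory.  This is not the API of any sketch library; the class
FieldSizeHistogram and its methods are hypothetical, and a production
quantile sketch would give much tighter error guarantees.

```java
final class FieldSizeHistogram {

    // Bucket b counts the sizes in the range [2^b, 2^(b+1)).
    private final long[] counts = new long[64];
    private long total;

    // Record one observed field size (must be positive).
    void update(long size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be positive");
        }
        counts[63 - Long.numberOfLeadingZeros(size)]++;
        total++;
    }

    // Fraction of observed sizes that may be >= threshold. The bucket
    // containing the threshold is counted in full, so this is an upper
    // bound; it is exact when the threshold is a power of two.
    double fractionAtLeast(long threshold) {
        if (total == 0) {
            return 0.0;
        }
        int first = threshold <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(threshold);
        long n = 0;
        for (int b = first; b < counts.length; b++) {
            n += counts[b];
        }
        return (double) n / total;
    }
}
```

Calling update(size) for each field as it streams by, and then
fractionAtLeast(blockSize), gives a running estimate of how many fields
are big enough to be worth routing to a separate composition path.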

This is clearly whimsical speculation because I can only guess at what your
application is actually doing, but perhaps it might trigger something for
you :)

Cheers,

Lee.

On Thu, Jun 10, 2021 at 4:25 PM Douglas Surber <douglas.surber at oracle.com>
wrote:

>
> Maurizio,
>
> I can certainly respect a decision that composing multiple MemorySegments
> might be out of scope. Without composition I would write something like
> this.
>
> MemorySegment.ofArray(dest)
> .asSlice(destOffset, firstPartLength)
> .copyFrom(MemorySegment.ofArray(src0).asSlice(srcOffset, firstPartLength));
> MemorySegment.ofArray(dest)
> .asSlice(destOffset + firstPartLength, secondPartLength)
> .copyFrom(MemorySegment.ofArray(src1).asSlice(0L, secondPartLength));
>
> This would copy the bits from the end of src0 into the first part of dest
> and the bits from the beginning of src1 into the second part of dest. Would
> all this result in just two SIMD instructions modulo bounds checking? And
> no allocations?
>
> Douglas
>


More information about the panama-dev mailing list