Propose 2 new methods for MemorySegment

leerho leerho at gmail.com
Thu Jun 17 17:26:19 UTC 2021


This is super! Thank you for this.  Is there a chance this can make it into
17?

Lee.

On Thu, Jun 17, 2021 at 2:43 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

>
> On 16/06/2021 22:36, leerho wrote:
>
> I think that will work, and it does more than I wanted.  If two segments
> overlap (either 2 segments on-heap or 2 segments off-heap), then using
> segmentOffset() I can figure out how to do a direct copy without having
> to create an extra buffer and without having to catch exceptions.
>
> I assume this would work even if the two segments are buried in a deep
> hierarchy, i.e., two great-grandchildren of a parent segment.
>
> I believe so - after you check that base object is the same (either same
> array object, or null, for native), you just need to do a range check.
> Typically this amounts to checking whether either the start _or_ the end
> address of a segment is contained in the other.
>
> But this would do nothing for the memory mapped case (second paragraph of
> the copyFrom javadoc) - although, as I said, I don't think we can
> realistically do much for that.
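
A minimal sketch of the check described above, assuming each segment can be
modelled as a base object (the backing array, or null for native memory), a
start address, and a size in bytes. The class and method names are
hypothetical and are not part of the MemorySegment API; address arithmetic is
assumed not to overflow.

```java
final class OverlapCheck {
    private OverlapCheck() {}

    /**
     * Returns true if the two regions share the same base object and their
     * half-open byte ranges [start, start + size) intersect.
     * A null base stands for native (off-heap) memory in this model.
     */
    static boolean overlaps(Object baseA, long startA, long sizeA,
                            Object baseB, long startB, long sizeB) {
        // Different backing arrays, or heap vs. native: cannot overlap.
        if (baseA != baseB) {
            return false;
        }
        // An empty region overlaps nothing.
        if (sizeA == 0 || sizeB == 0) {
            return false;
        }
        // Range check: each region must start before the other one ends.
        return startA < startB + sizeB && startB < startA + sizeA;
    }
}
```

As noted in the thread, a check like this says nothing about two mappings of
the same file at different addresses.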
>
> Maurizio
>
>
> Lee.
>
>
> On Wed, Jun 16, 2021 at 2:07 PM Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>> A few observations:
>>
>> * I think there is space to add a method which checks if two segments
>> _overlap_
>>
>> * This doesn't mean reasoning in terms of structure, like you are
>> suggesting (e.g. two slices of the same parent), but merely checking for
>> address overlap
>>
>> * I don't think that, in the general case, we carry around the mapped file
>> on which a segment/buffer is based. And even if we did, with symbolic
>> links etc. it would be pretty hard to uniformly detect these issues
>>
>> Given the above, the benefit of the proposed API seems rather slim
>> relative to its complexity.
>>
>> If the general feeling is that a _simple_ address overlap test would be
>> useful, we can add that - but compared with other things we're discussing
>> seems like low priority.
>>
>> Cheers
>> Maurizio
>>
>>
>> On 16/06/2021 20:06, leerho wrote:
>>
>> Maurizio,
>>   Well, I learned yet another corner of the API I hadn't found:
>> *MemoryAddress::segmentOffset()* :)
>>
>> However, having the boolean *isSameBaseResource(MemorySegment other)* would
>> still be very useful!
>>
>> Having to catch exceptions in order to understand some basic properties
>> of a segment (or a pair of them) is a real nuisance.  As the API stands
>> currently, given two segments:
>>
>>    1. If they are both independently allocated on-heap the
>>    segmentOffset() throws an exception.
>>    2. If one is on-heap and the other off-heap segmentOffset() throws an
>>    exception.
>>    3. If they are both independently allocated *off-heap*, segmentOffset()
>>    *does not* throw an exception!
>>
>> If you had the method *isSameBaseResource(MemorySegment other)*:
>>
>>    1. would return false
>>    2. would return false
>>    3. would return true (since the segmentOffset works in this case).
>>    4. Also, if both segments are descendants of a common ancestor
>>    segment, it would return true
>>
>> This would make handling of moving data between segments so much more
>> straightforward.
>>
>> I revise my request to just add the first method:
>>
>>    - MemorySegment::boolean isSameBaseResource(MemorySegment other);
>>    The intent is to reveal if *this* segment and the *other* segment
>>    share a common ancestor segment.
>>    ["It could also be extended to determine if the two segments share
>>    the same memory-mapped file (a true resource), thus possibly removing the
>>    caveat in paragraph 2 above". -- this may not be possible ]
>>
>> Now whether this removes your paragraph 2 caveat (at the top), I'm not
>> sure.  Perhaps the caveat is because memory regions of a memory-mapped file
>> can be swapped out at any time, making any assumptions about sub-regions
>> and offsets rather meaningless?  Are there other reasons?
>>
>> Lee.
>>
>> On Wed, Jun 16, 2021 at 1:43 AM Maurizio Cimadamore <
>> maurizio.cimadamore at oracle.com> wrote:
>>
>>> I see what you mean.
>>>
>>> I wonder if this use case isn't already partially covered by
>>> MemoryAddress::segmentOffset.
>>>
>>> E.g. can you do:
>>>
>>> long otherOffset = segment.address().segmentOffset(otherSegment);
>>>
>>> Then it should be easy to check if the offset is within the bounds of
>>> "otherSegment" ?
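
A rough sketch of the bounds check suggested above, modelled with plain long
addresses so that it runs without the incubator module. It assumes the offset
is simply the signed distance from the start of the other segment; the class
and method names are hypothetical.

```java
final class OffsetCheck {
    private OffsetCheck() {}

    /**
     * Returns true if 'address' falls inside the region that starts at
     * 'segStart' and spans 'segSize' bytes.
     */
    static boolean startsInside(long address, long segStart, long segSize) {
        // Signed distance of the address from the start of the other region.
        long offset = address - segStart;
        // A negative offset means the address lies before the region;
        // an offset >= segSize means it lies at or past the region's end.
        return offset >= 0 && offset < segSize;
    }
}
```

This only tells whether the start of one segment lies within the other; the
symmetric test is needed to catch the case where the other segment starts
inside this one.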
>>>
>>> (Note that the method already throws if you try to compare addresses and
>>> segments that are mismatched - e.g. on-heap vs. off-heap).
>>>
>>> Not saying a more direct API is ruled out, just pointing out what we
>>> have to see if it can be used.
>>>
>>> Maurizio
>>>
>>>
>>> On 16/06/2021 02:37, leerho wrote:
>>> > In working on https://github.com/openjdk/panama-foreign/pull/555
>>> which is
>>> > the PR for Memory Segment Efficient Array Handling, I discovered that
>>> there
>>> > are two methods that would be very useful beyond copying arrays, but
>>> useful
>>> > in other types of data movement operations between MemorySegments.
>>> >
>>> > I'd like to draw attention to the opening Javadoc of the
>>> > *MemorySegment::copyFrom(MemorySegment)* method:
>>> >
>>> > 1. Performs a bulk copy from given source segment to this segment. More
>>> >> specifically, the bytes at offset 0 through src.byteSize() - 1 in the
>>> >> source segment are copied into this segment at offset 0 through
>>> src.byteSize()
>>> >> - 1. If the source segment overlaps with this segment, then the
>>> copying
>>> >> is performed as if the bytes at offset 0 through src.byteSize() - 1 in
>>> >> the source segment were first copied into a temporary segment with
>>> size
>>> >> bytes, and then the contents of the temporary segment were copied into
>>> >> this segment at offset 0 through src.byteSize() - 1.
>>> >>
>>> >> 2. The result of a bulk copy is unspecified if, in the uncommon case,
>>> the
>>> >> source segment and this segment do not overlap, but refer to
>>> overlapping
>>> >> regions of the same backing storage using different addresses. For
>>> example,
>>> >> this may occur if the same file is mapped to two segments.
>>> >>
>>> > The first paragraph is a guarantee that even if two descendant segments
>>> > of a parent segment have an overlapping region, the copy operation
>>> > will work properly.  This is similar to the guarantee of
>>> > System.arraycopy().
>>> >
>>> > The second paragraph refers to memory-mapped files.  However, let's
>>> examine
>>> > the following scenario:
>>> >
>>> >     - A hierarchy of Memory Segments where two descendant segments may
>>> >     overlap a region of the parent segment.
>>> >     - The elements of the segments are more complex than Java
>>> primitives
>>> >     (thus, PR 555 doesn't apply).
>>> >     - The user wishes to copy a region of elements from one of the
>>> >     descendant segments to the other descendant segment.
>>> >     - The user only has the two descendant segments in hand and does
>>> not
>>> >     have access to the parent segment.
>>> >
>>> > With the current MemorySegment API, the descendant segments are blind
>>> to
>>> > the overlap, to wit:
>>> >
>>> >     - The user cannot determine if an overlap exists.
>>> >     - Or, if an overlap exists, where it lies with respect to the two
>>> >     segments in hand.
>>> >
>>> > In order to ensure that corruption doesn't occur during the copy, the
>>> user
>>> > must create a temporary duplicate of the destination segment, copy the
>>> data
>>> > into the duplicate, then copy the duplicate into the original
>>> destination
>>> > segment.  This can be expensive in time and space.
>>> >
>>> > If, however, the user can determine that an overlap exists, and where
>>> the
>>> > overlap occurs, the copy operation can be done safely, with no
>>> additional
>>> > storage, by properly choosing the direction of the iterative copy.
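
To illustrate the point about copy direction, here is a small sketch that uses
a byte[] as a stand-in for the shared parent memory and two offsets as
stand-ins for the descendant segments. The class and method names are
hypothetical; a real version would copy whole elements rather than single
bytes.

```java
import java.util.Objects;

final class DirectionalCopy {
    private DirectionalCopy() {}

    /**
     * Copies 'len' bytes from srcOff to dstOff within the same backing
     * array without a temporary buffer, choosing the iteration direction
     * so that no source byte is overwritten before it has been read.
     */
    static void copyWithin(byte[] backing, int srcOff, int dstOff, int len) {
        // Validate both ranges up front so a bad call cannot copy partially.
        Objects.checkFromIndexSize(srcOff, len, backing.length);
        Objects.checkFromIndexSize(dstOff, len, backing.length);
        if (dstOff <= srcOff) {
            // Destination starts at or before the source: copy low to high.
            for (int i = 0; i < len; i++) {
                backing[dstOff + i] = backing[srcOff + i];
            }
        } else {
            // Destination starts after the source: copy high to low.
            for (int i = len - 1; i >= 0; i--) {
                backing[dstOff + i] = backing[srcOff + i];
            }
        }
    }
}
```

Choosing the direction requires knowing the relative offsets of the two
regions, which is the information the proposed methods would expose.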
>>> >
>>> > To solve this, the user doesn't need access to the parent segment (this
>>> > could be for security reasons), but could use these two methods:
>>> >
>>> >     - MemorySegment::boolean isSameBaseResource(MemorySegment other);
>>> >     The intent is to reveal if *this* segment and the *other* segment
>>> share
>>> >     a common ancestor segment.  It could also be extended to determine
>>> if the
>>> >     two segments share the same memory-mapped file (a true resource),
>>> thus
>>> >     possibly removing the caveat in paragraph 2 above.
>>> >
>>> >
>>> >     - MemorySegment::long baseResourceOffsetBytes();
>>> >     This would return the offset in bytes of the start of this segment
>>> from
>>> >     the start of the highest common segment (or resource).
>>> >
>>> > With this information, the user can easily design a safe, efficient,
>>> and
>>> > fast data copy method for moving arbitrary elements from one segment to
>>> > another with the same guarantee as System.arraycopy().
>>> >
>>> > *Evidence*
>>> > See (copySwap(...)
>>> > <
>>> https://github.com/leerho/PanamaLocal/blob/main/src/main/java/org/apache/datasketches/panama/MemoryCopy.java#L667-L703
>>> >).
>>> > Before I had access to the new MemorySegment::void
>>> copyFrom(MemorySegment,
>>> > MemoryLayout, MemoryLayout), I had to design a proxy routine that
>>> would do
>>> > the copy (with swap) correctly, especially in the case where the two
>>> > segments overlapped. Note lines 682, 683 where I create a temporary
>>> > segment. If I had the above two methods, this extra copy operation
>>> would
>>> > not be needed.
>>> >
>>> > For exactly the above reasons, some years ago we implemented similar
>>> > methods in our DataSketches Memory Component
>>> > <
>>> https://datasketches.apache.org/api/memory/snapshot/apidocs/index.html
>>> >.
>>> > Specifically, in the class *WritableMemory*, the methods
>>> *getRegionOffset()*
>>> > and *isSameResource(that).*
>>> >
>>> > Lee.
>>>
>>

