Propose 2 new methods for MemorySegment

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Jun 17 09:43:00 UTC 2021


On 16/06/2021 22:36, leerho wrote:
> I think that will work and it does more than I wanted. If two
> segments overlap (either two on-heap or two off-heap segments), I can
> then use segmentOffset() to figure out how to do a direct copy without
> having to create an extra buffer, and without having to catch
> exceptions.
>
> I assume this would work even if the two segments are buried in a deep 
> hierarchy, i.e., two great-grandchildren of a parent segment.

I believe so - after you check that the base object is the same (either
the same array object, or null, for native), you just need to do a range
check. Typically this amounts to checking whether the start _or_ the end
address of either segment is contained in the other.
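The two-step check described above can be sketched in plain Java. This is a minimal sketch, not the jdk.incubator.foreign implementation: the `Region` record is a hypothetical stand-in for a segment, where `base` is the backing array (or `null` for native memory, as above), `start` is the byte offset or native address, and ranges are treated as half-open [start, start + size).

```java
public class OverlapCheck {

    /**
     * Hypothetical stand-in for a segment: the backing object (an array, or
     * null for native memory), the start offset or address, and the size in bytes.
     */
    record Region(Object base, long start, long size) {}

    /** True if both regions use the same base and their byte ranges intersect. */
    static boolean overlaps(Region a, Region b) {
        // Step 1: same base object (identity), with null == null covering native memory.
        if (a.base() != b.base()) {
            return false;
        }
        // Step 2: range check on half-open intervals [start, start + size).
        // Two non-empty intervals intersect iff each one starts before the other ends.
        return a.size() > 0 && b.size() > 0
                && a.start() < b.start() + b.size()
                && b.start() < a.start() + a.size();
    }

    public static void main(String[] args) {
        byte[] array = new byte[64];
        Region left = new Region(array, 0, 32);
        Region right = new Region(array, 16, 32);
        Region tail = new Region(array, 48, 16);
        Region unrelated = new Region(new byte[64], 0, 64);

        System.out.println(overlaps(left, right));     // true: bytes 16..31 are shared
        System.out.println(overlaps(left, tail));      // false: 0..31 and 48..63 are disjoint
        System.out.println(overlaps(left, unrelated)); // false: different backing arrays
    }
}
```

Comparing identity rather than equality is deliberate: two distinct arrays with equal contents are still different storage.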

But this would do nothing for the memory-mapped case (second paragraph
of the copyFrom javadoc) - although, as I said, I don't think we can
realistically do much for that.

Maurizio

>
> Lee.
>
>
> On Wed, Jun 16, 2021 at 2:07 PM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
>     A few observations:
>
>     * I think there is space to add a method which checks if two
>     segments _overlap_
>
>     * This doesn't mean reasoning in terms of structure, like you are
>     suggesting (e.g. two slices of the same parent), but merely
>     checking for address overlap
>
>     * I don't think that, in the general case, we carry around the
>     mapped file on which a segment/buffer is based. And even if we
>     did, with symbolic links etc. it would be pretty hard to
>     uniformly detect these issues.
>
>     Given the above, the benefit of the proposed API relative to its
>     complexity seems rather slim.
>
>     If the general feeling is that a _simple_ address overlap test
>     would be useful, we can add that - but compared with other things
>     we're discussing, it seems like a low priority.
>
>     Cheers
>     Maurizio
>
>
>     On 16/06/2021 20:06, leerho wrote:
>>     Maurizio,
>>       Well, I learned yet another corner of the API I hadn't found:
>>     /MemoryAddress::segmentOffset()/ :)
>>
>>     However, having the boolean /isSameBaseResource(MemorySegment
>>     other)/ would still be very useful!
>>
>>     Having to catch exceptions in order to understand some basic
>>     properties of a segment (or a pair of them) is a real nuisance. 
>>     As the API stands currently, given two segments:
>>
>>      1. If they are both independently allocated on-heap the
>>         segmentOffset() throws an exception.
>>      2. If one is on-heap and the other off-heap segmentOffset()
>>         throws an exception.
>>      3. If they are both independently allocated */off-heap/*,
>>         segmentOffset */does not/* throw an exception!
>>
>>     If you had the method /isSameBaseResource(MemorySegment other)/:
>>
>>      1. would return false
>>      2. would return false
>>      3. would return true (since the segmentOffset works in this case).
>>      4. Also, if both segments are descendants of a common ancestor
>>         segment, it would return true
>>
>>     This would make handling of moving data between segments so much
>>     more straightforward.
>>
>>     I revise my request to just add the first method:
>>
>>       * MemorySegment::boolean isSameBaseResource(MemorySegment other);
>>         The intent is to reveal if /*this*/ segment and the
>>         */other/* segment share a common ancestor segment.
>>         ["It could also be extended to determine if the two segments
>>         share the same memory-mapped file (a true resource), thus
>>         possibly removing the caveat in paragraph 2 above". -- this
>>         may not be possible ]
>>
>>     Now whether this removes your paragraph 2 caveat (at the top),
>>     I'm not sure.  Perhaps the caveat is because memory regions of a
>>     memory-mapped file can be swapped out at any time, making any
>>     assumptions about sub-regions and offsets rather meaningless? 
>>     Are there other reasons?
>>
>>     Lee.
>>
>>     On Wed, Jun 16, 2021 at 1:43 AM Maurizio Cimadamore
>>     <maurizio.cimadamore at oracle.com> wrote:
>>
>>         I see what you mean.
>>
>>         I wonder if this use case isn't already partially covered by
>>         MemoryAddress::segmentOffset.
>>
>>         E.g. can you do:
>>
>>         long otherOffset = segment.address().segmentOffset(otherSegment);
>>
>>         Then it should be easy to check whether the offset is within
>>         the bounds of "otherSegment"?
>>
>>         (Note that the method already throws if you try to compare
>>         addresses and
>>         segments that are mismatched - e.g. on-heap vs. off-heap).
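Assuming `otherOffset` has been obtained with the `segmentOffset` call quoted above, it is the start of `segment` minus the start of `otherSegment`, so it may be negative when `segment` starts first. The remaining bounds check is then plain arithmetic. The sketch below takes that value as a `long` instead of calling the incubating API, so it runs on any recent JDK; `sharedRangeInB` is a hypothetical helper name, with A standing for `segment` and B for `otherSegment`.

```java
public class RelativeOverlap {

    /**
     * relOffset is the start of segment A minus the start of segment B, i.e. the
     * value the quoted segmentOffset call would produce. Returns the half-open
     * byte range [from, to) of B that A also covers, in B's coordinates, or null
     * if the two segments do not intersect.
     */
    static long[] sharedRangeInB(long relOffset, long sizeA, long sizeB) {
        long from = Math.max(relOffset, 0);           // A may start before B
        long to = Math.min(relOffset + sizeA, sizeB); // A may end after B
        return from < to ? new long[] {from, to} : null;
    }

    public static void main(String[] args) {
        long[] r = sharedRangeInB(8, 16, 16);    // A starts 8 bytes into B
        System.out.println(r[0] + ".." + r[1]);  // 8..16
        r = sharedRangeInB(-4, 8, 16);           // A starts 4 bytes before B
        System.out.println(r[0] + ".." + r[1]);  // 0..4
        System.out.println(sharedRangeInB(16, 8, 16) == null); // true: A starts where B ends
    }
}
```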
>>
>>         Not saying a more direct API is ruled out, just pointing out
>>         what we
>>         have to see if it can be used.
>>
>>         Maurizio
>>
>>
>>         On 16/06/2021 02:37, leerho wrote:
>>         > In working on https://github.com/openjdk/panama-foreign/pull/555,
>>         > which is the PR for Memory Segment Efficient Array Handling, I
>>         > discovered that there are two methods that would be useful not
>>         > only for copying arrays but also for other types of data movement
>>         > operations between MemorySegments.
>>         >
>>         > I'd like to draw attention to the opening Javadoc of the
>>         > *MemorySegment::copyFrom(MemorySegment)* method:
>>         >
>>         >> 1. Performs a bulk copy from given source segment to this segment.
>>         >> More specifically, the bytes at offset 0 through src.byteSize() - 1
>>         >> in the source segment are copied into this segment at offset 0
>>         >> through src.byteSize() - 1. If the source segment overlaps with this
>>         >> segment, then the copying is performed as if the bytes at offset 0
>>         >> through src.byteSize() - 1 in the source segment were first copied
>>         >> into a temporary segment with size bytes, and then the contents of
>>         >> the temporary segment were copied into this segment at offset 0
>>         >> through src.byteSize() - 1.
>>         >>
>>         >> 2. The result of a bulk copy is unspecified if, in the uncommon case,
>>         >> the source segment and this segment do not overlap, but refer to
>>         >> overlapping regions of the same backing storage using different
>>         >> addresses. For example, this may occur if the same file is mapped
>>         >> to two segments.
>>         >>
>>         > The first paragraph guarantees that, even if the source and
>>         > destination overlap (e.g., two descendant segments that share a
>>         > region of a parent segment), the copy operation will work properly.
>>         > This is similar to the guarantee of System.arraycopy().
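For comparison, the System.arraycopy guarantee can be seen on a single array whose source and destination ranges overlap. The naive forward loop is included only as a contrast, to show the corruption that the "as if through a temporary array" rule prevents; both method names are illustrative.

```java
import java.util.Arrays;

public class ArraycopyOverlap {

    /** Shifts the first four elements right by one using System.arraycopy. */
    static int[] withArraycopy() {
        int[] a = {1, 2, 3, 4, 5};
        // Source a[0..3] and destination a[1..4] overlap within the same array.
        System.arraycopy(a, 0, a, 1, 4);
        return a;
    }

    /** The same shift with a naive forward loop, for contrast. */
    static int[] withNaiveForwardLoop() {
        int[] a = {1, 2, 3, 4, 5};
        for (int i = 0; i < 4; i++) {
            a[1 + i] = a[i]; // from i = 1 on, reads a value written by the previous iteration
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withArraycopy()));        // [1, 1, 2, 3, 4]
        System.out.println(Arrays.toString(withNaiveForwardLoop())); // [1, 1, 1, 1, 1]
    }
}
```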
>>         >
>>         > The second paragraph refers to memory-mapped files. However, let's
>>         > examine the following scenario:
>>         >
>>         >     - A hierarchy of Memory Segments where two descendant segments
>>         >     may overlap a region of the parent segment.
>>         >     - The elements of the segments are more complex than Java
>>         >     primitives (thus, PR 555 doesn't apply).
>>         >     - The user wishes to copy a region of elements from one of the
>>         >     descendant segments to the other descendant segment.
>>         >     - The user only has the two descendant segments in hand and does
>>         >     not have access to the parent segment.
>>         >
>>         > With the current MemorySegment API, the descendant segments are
>>         > blind to the overlap, to wit:
>>         >
>>         >     - The user cannot determine whether an overlap exists.
>>         >     - Nor, if an overlap exists, where it lies with respect to the
>>         >     two segments in hand.
>>         >
>>         > In order to ensure that corruption doesn't occur during the copy,
>>         > the user must create a temporary duplicate of the destination
>>         > segment, copy the data into the duplicate, then copy the duplicate
>>         > into the original destination segment. This can be expensive in
>>         > time and space.
>>         >
>>         > If, however, the user can determine that an overlap exists, and
>>         > where the overlap occurs, the copy operation can be done safely,
>>         > with no additional storage, by properly choosing the direction of
>>         > the iterative copy.
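A minimal sketch of choosing the direction of the iterative copy, assuming both descendant segments are modeled as index ranges into one shared `int[]` (real segments would be read and written through MemorySegment accessors instead). `Integer.reverseBytes` stands in for the byte swap of a copy-with-swap; `copySwap` here is a hypothetical helper, not the routine referenced in the Evidence section.

```java
import java.util.Arrays;

public class DirectionalCopySwap {

    /**
     * Copies count int elements from base[src..] to base[dst..] within the same
     * array, reversing the byte order of each element, without a temporary buffer.
     * The loop direction is chosen so no source element is overwritten before it is read.
     */
    static void copySwap(int[] base, int src, int dst, int count) {
        if (dst <= src) {
            // Destination starts at or before the source: going forward, every
            // write lands on a position that has already been read.
            for (int i = 0; i < count; i++) {
                base[dst + i] = Integer.reverseBytes(base[src + i]);
            }
        } else {
            // Destination starts after the source: go backward for the same reason.
            for (int i = count - 1; i >= 0; i--) {
                base[dst + i] = Integer.reverseBytes(base[src + i]);
            }
        }
    }

    public static void main(String[] args) {
        int[] shared = {0x01020304, 0x05060708, 0x090A0B0C, 0};
        // "Source" view is shared[0..2], "destination" view is shared[1..3]: they overlap.
        copySwap(shared, 0, 1, 3);
        System.out.println(Arrays.stream(shared).mapToObj(Integer::toHexString).toList());
        // prints [1020304, 4030201, 8070605, c0b0a09]
    }
}
```

This is the same direction rule a memmove-style copy uses; it only needs to know whether the destination starts before or after the source within the shared storage.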
>>         >
>>         > To solve this without giving the user access to the parent segment
>>         > (which may be withheld for security reasons), we could add these
>>         > two methods:
>>         >
>>         >     - MemorySegment::boolean isSameBaseResource(MemorySegment other);
>>         >     The intent is to reveal if *this* segment and the *other*
>>         >     segment share a common ancestor segment. It could also be
>>         >     extended to determine if the two segments share the same
>>         >     memory-mapped file (a true resource), thus possibly removing the
>>         >     caveat in paragraph 2 above.
>>         >
>>         >     - MemorySegment::long baseResourceOffsetBytes();
>>         >     This would return the offset in bytes of the start of this
>>         >     segment from the start of the highest common segment (or
>>         >     resource).
>>         >
>>         > With this information, the user can easily design a safe,
>>         > efficient, and fast data copy method for moving arbitrary elements
>>         > from one segment to another with the same guarantee as
>>         > System.arraycopy().
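If the two proposed methods existed, the copy direction could be decided from the base offsets alone. In this sketch `OffsetAware` is a hypothetical interface carrying just those two methods (neither exists on the incubating MemorySegment), `Window` is a toy implementation over a shared resource object, and `plan` is an illustrative helper. Different resources are simply assumed not to overlap, so the memory-mapped caveat is not addressed.

```java
public class ProposedMethodsSketch {

    /** Hypothetical interface holding the two methods proposed in this thread. */
    interface OffsetAware {
        boolean isSameBaseResource(OffsetAware other);
        long baseResourceOffsetBytes();
    }

    /** Toy implementation: a view starting at a byte offset into a shared resource. */
    record Window(Object resource, long offset) implements OffsetAware {
        @Override
        public boolean isSameBaseResource(OffsetAware other) {
            return other instanceof Window w && w.resource() == resource;
        }

        @Override
        public long baseResourceOffsetBytes() {
            return offset;
        }
    }

    /** Which loop direction a copy of len bytes from src to dst may use. */
    enum Plan { ANY, FORWARD, BACKWARD }

    static Plan plan(OffsetAware src, OffsetAware dst, long len) {
        if (!src.isSameBaseResource(dst)) {
            return Plan.ANY; // different resources are assumed not to overlap
        }
        long s = src.baseResourceOffsetBytes();
        long d = dst.baseResourceOffsetBytes();
        if (!(s < d + len && d < s + len)) {
            return Plan.ANY; // same resource, but the copied byte ranges are disjoint
        }
        // Overlapping ranges: copy backward only when the destination starts after the source.
        return d > s ? Plan.BACKWARD : Plan.FORWARD;
    }

    public static void main(String[] args) {
        Object parent = new Object(); // stands in for the common ancestor segment
        Window a = new Window(parent, 0);
        Window b = new Window(parent, 16);
        Window c = new Window(new Object(), 0);
        System.out.println(plan(a, b, 32)); // BACKWARD
        System.out.println(plan(b, a, 32)); // FORWARD
        System.out.println(plan(a, c, 32)); // ANY
    }
}
```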
>>         >
>>         > *Evidence*
>>         > See copySwap(...)
>>         > <https://github.com/leerho/PanamaLocal/blob/main/src/main/java/org/apache/datasketches/panama/MemoryCopy.java#L667-L703>.
>>         > Before I had access to the new MemorySegment::void
>>         > copyFrom(MemorySegment, MemoryLayout, MemoryLayout), I had to
>>         > design a proxy routine that would do the copy (with swap)
>>         > correctly, especially in the case where the two segments
>>         > overlapped. Note lines 682 and 683, where I create a temporary
>>         > segment. If I had the above two methods, this extra copy operation
>>         > would not be needed.
>>         >
>>         > For exactly the above reasons, some years ago we implemented
>>         > similar methods in our DataSketches Memory Component
>>         > <https://datasketches.apache.org/api/memory/snapshot/apidocs/index.html>:
>>         > specifically, the methods *getRegionOffset()* and
>>         > *isSameResource(that)* in the class *WritableMemory*.
>>         >
>>         > Lee.
>>


More information about the panama-dev mailing list