Propose 2 new methods for MemorySegment

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Jun 17 17:34:45 UTC 2021


On 17/06/2021 18:26, leerho wrote:
> This is super! Thank you for this.  Is there a chance this can make it 
> into 17?

No, Java 17 is in finalization stage [1], only high priority requests 
(e.g. broken stuff) can be accommodated at this point in time.

Cheers
Maurizio

[1] - https://mail.openjdk.java.net/pipermail/jdk-dev/2021-June/005690.html


>
> Lee.
>
> On Thu, Jun 17, 2021 at 2:43 AM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>
>     On 16/06/2021 22:36, leerho wrote:
>>     I think that will work and it does more than I wanted.  If two
>>     segments overlap (either 2 segments on-heap or 2 segments
>>     off-heap), then by using segmentOffset() I can figure out how
>>     to do a direct copy without having to create an extra buffer,
>>     and without having to catch exceptions.
>>
>>     I assume this would work even if the two segments are buried in a
>>     deep hierarchy, i.e., two great-grandchildren of a parent segment.
>
>     I believe so - after you check that base object is the same
>     (either same array object, or null, for native), you just need to
>     do a range check. Typically this amounts to checking whether either
>     the start _or_ the end address of a segment is contained in the other.
>
>     But this would do nothing for the memory mapped case (second
>     paragraph of the copyFrom javadoc) - although, as I said, I don't
>     think we can realistically do much for that.
>
>     Maurizio
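
The check described above (same base object, then a range test) can be sketched without the incubator API. The following is only an illustration under stated assumptions: SegmentModel is a hypothetical stand-in for a segment, not part of jdk.incubator.foreign, and it assumes non-empty segments. It uses the usual interval-intersection test, which also covers the case where one segment fully contains the other.

```java
// Hypothetical stand-in for a memory segment (NOT the real MemorySegment API):
// a base object (an array for heap segments, null for native memory),
// a start offset or address, and a size in bytes.
final class SegmentModel {
    final Object base;   // same array object => same heap region; null => native
    final long start;    // byte offset into the array, or raw address if native
    final long byteSize; // assumed to be > 0 in this sketch

    SegmentModel(Object base, long start, long byteSize) {
        this.base = base;
        this.start = start;
        this.byteSize = byteSize;
    }

    // True if both segments share a base and their byte ranges intersect.
    static boolean overlaps(SegmentModel a, SegmentModel b) {
        if (a.base != b.base) {
            return false; // different arrays, or heap vs. native
        }
        // Half-open ranges [start, start + byteSize) intersect if and only if
        // each one starts before the other one ends.
        return a.start < b.start + b.byteSize && b.start < a.start + a.byteSize;
    }

    public static void main(String[] args) {
        byte[] parent = new byte[64];
        SegmentModel left = new SegmentModel(parent, 0, 32);
        SegmentModel mid = new SegmentModel(parent, 16, 32);
        SegmentModel right = new SegmentModel(parent, 32, 32);
        System.out.println(overlaps(left, mid));   // true
        System.out.println(overlaps(left, right)); // false
        System.out.println(overlaps(left, new SegmentModel(new byte[64], 0, 32))); // false
    }
}
```

Because the test only looks at the base object and the byte range, it does not matter how deep in a slicing hierarchy the two segments sit. As noted above, it says nothing about two separate mappings of the same file.
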
>
>>
>>     Lee.
>>
>>
>>     On Wed, Jun 16, 2021 at 2:07 PM Maurizio Cimadamore
>>     <maurizio.cimadamore at oracle.com> wrote:
>>
>>         A few observations:
>>
>>         * I think there is space to add a method which checks if two
>>         segments _overlap_
>>
>>         * This doesn't mean reasoning in terms of structure, like you
>>         are suggesting (e.g. two slices of the same parent), but
>>         merely checking for address overlap
>>
>>         * I don't think that, in the general case, we carry around the
>>         mapped file on which a segment/buffer is based. And even
>>         if we did, with symbolic links etc. it would be pretty hard
>>         to uniformly detect these issues
>>
>>         Given the above, the benefit of the proposed API seems rather
>>         slim relative to its complexity.
>>
>>         If the general feeling is that a _simple_ address overlap
>>         test would be useful, we can add that - but compared with
>>         other things we're discussing seems like low priority.
>>
>>         Cheers
>>         Maurizio
>>
>>
>>         On 16/06/2021 20:06, leerho wrote:
>>>         Maurizio,
>>>           Well, I learned yet another corner of the API I hadn't
>>>         found: /MemoryAddress::segmentOffset()/ :)
>>>
>>>         However, having the boolean
>>>         /isSameBaseResource(MemorySegment other)/ would still be
>>>         very useful!
>>>
>>>         Having to catch exceptions in order to understand some basic
>>>         properties of a segment (or a pair of them) is a real
>>>         nuisance.  As the API stands currently, given two segments:
>>>
>>>          1. If they are both independently allocated on-heap the
>>>             segmentOffset() throws an exception.
>>>          2. If one is on-heap and the other off-heap segmentOffset()
>>>             throws an exception.
>>>          3. If they are both independently allocated *off-heap*
>>>             segmentOffset() *does not* throw an exception!
>>>
>>>         If you had the method /isSameBaseResource(MemorySegment other)/:
>>>
>>>          1. would return false
>>>          2. would return false
>>>          3. would return true (since the segmentOffset works in this
>>>             case).
>>>          4. Also, if both segments are descendants of a common
>>>             ancestor segment, it would return true
>>>
>>>         This would make handling of moving data between segments so
>>>         much more straightforward.
>>>
>>>         I revise my request to just add the first method:
>>>
>>>           * MemorySegment::boolean isSameBaseResource(MemorySegment
>>>             other);
>>>             The intent is to reveal if *this* segment and the
>>>             *other* segment share a common ancestor segment.
>>>             ["It could also be extended to determine if the two
>>>             segments share the same memory-mapped file (a true
>>>             resource), thus possibly removing the caveat in
>>>             paragraph 2 above". -- this may not be possible ]
>>>
>>>         Now whether this removes your paragraph 2 caveat (at the
>>>         top), I'm not sure.  Perhaps the caveat is because memory
>>>         regions of a memory-mapped file can be swapped out at any
>>>         time, making any assumptions about sub-regions and offsets
>>>         rather meaningless?  Are there other reasons?
>>>
>>>         Lee.
>>>
>>>         On Wed, Jun 16, 2021 at 1:43 AM Maurizio Cimadamore
>>>         <maurizio.cimadamore at oracle.com> wrote:
>>>
>>>             I see what you mean.
>>>
>>>             I wonder if this use case isn't already partially
>>>             covered by
>>>             MemoryAddress::segmentOffset.
>>>
>>>             E.g. can you do:
>>>
>>>             long otherOffset =
>>>             segment.address().segmentOffset(otherSegment);
>>>
>>>             Then it should be easy to check if the offset is within
>>>             the bounds of
>>>             "otherSegment" ?
>>>
>>>             (Note that the method already throws if you try to
>>>             compare addresses and
>>>             segments that are mismatched - e.g. on-heap vs. off-heap).
>>>
>>>             Not saying a more direct API is ruled out, just pointing
>>>             out what we
>>>             have to see if it can be used.
>>>
>>>             Maurizio
>>>
>>>
>>>             On 16/06/2021 02:37, leerho wrote:
>>>             > In working on
>>>             https://github.com/openjdk/panama-foreign/pull/555,
>>>             which is
>>>             > the PR for Memory Segment Efficient Array Handling, I
>>>             > discovered that there are two methods that would be very
>>>             > useful not only for copying arrays, but also for other types
>>>             > of data movement operations between MemorySegments.
>>>             >
>>>             > I'd like to draw attention to the opening Javadoc of the
>>>             > *MemorySegment::copyFrom(MemorySegment)* method:
>>>             >
>>>             > 1. Performs a bulk copy from given source segment to
>>>             this segment. More
>>>             >> specifically, the bytes at offset 0 through
>>>             src.byteSize() - 1 in the
>>>             >> source segment are copied into this segment at offset
>>>             0 through src.byteSize()
>>>             >> - 1. If the source segment overlaps with this
>>>             segment, then the copying
>>>             >> is performed as if the bytes at offset 0 through
>>>             src.byteSize() - 1 in
>>>             >> the source segment were first copied into a temporary
>>>             segment with size
>>>             >> bytes, and then the contents of the temporary segment
>>>             were copied into
>>>             >> this segment at offset 0 through src.byteSize() - 1.
>>>             >>
>>>             >> 2. The result of a bulk copy is unspecified if, in
>>>             the uncommon case, the
>>>             >> source segment and this segment do not overlap, but
>>>             refer to overlapping
>>>             >> regions of the same backing storage using different
>>>             addresses. For example,
>>>             >> this may occur if the same file is mapped
>>>             >>
>>>             <#mapFile(java.nio.file.Path,long,long,java.nio.channels.FileChannel.MapMode,jdk.incubator.foreign.ResourceScope)>
>>>             to
>>>             >> two segments.
>>>             >>
>>>             > The first paragraph is a guarantee that, even if two
>>>             > descendant segments overlap within a parent segment, the copy
>>>             > operation will work properly.  This is similar to the
>>>             > guarantee of System.arraycopy().
>>>             >
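
As a small illustration of the "as if copied through a temporary" guarantee in the first Javadoc paragraph, the sketch below uses plain byte arrays and System.arraycopy, whose specification gives the same guarantee when source and destination are the same array. The class and method names are made up for this example. It compares a literal two-step copy through a temporary array with a direct overlapping System.arraycopy call.

```java
import java.util.Arrays;

final class TempCopySemantics {
    // Overlap-safe copy spelled out literally: stage the source range in a
    // temporary array, then copy the temporary into the destination range.
    static void copyViaTemp(byte[] a, int srcPos, int dstPos, int len) {
        byte[] tmp = Arrays.copyOfRange(a, srcPos, srcPos + len);
        System.arraycopy(tmp, 0, a, dstPos, len);
    }

    public static void main(String[] args) {
        byte[] viaTemp = {1, 2, 3, 4, 5, 6};
        byte[] direct = viaTemp.clone();

        // Source range [0, 4) and destination range [2, 6) overlap.
        copyViaTemp(viaTemp, 0, 2, 4);
        System.arraycopy(direct, 0, direct, 2, 4);

        System.out.println(Arrays.toString(viaTemp));       // [1, 2, 1, 2, 3, 4]
        System.out.println(Arrays.equals(viaTemp, direct)); // true
    }
}
```
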
>>>             > The second paragraph refers to memory-mapped files. 
>>>             However, let's examine
>>>             > the following scenario:
>>>             >
>>>             >     - A hierarchy of Memory Segments where two
>>>             descendant segments may
>>>             >     overlap a region of the parent segment.
>>>             >     - The elements of the segments are more complex
>>>             than Java primitives
>>>             >     (thus, PR 555 doesn't apply).
>>>             >     - The user wishes to copy a region of elements
>>>             from one of the
>>>             >     descendant segments to the other descendant segment.
>>>             >     - The user only has the two descendant segments in
>>>             hand and does not
>>>             >     have access to the parent segment.
>>>             >
>>>             > With the current MemorySegment API, the descendant
>>>             segments are blind to
>>>             > the overlap, to wit:
>>>             >
>>>             >     - The user cannot determine if an overlap exists.
>>>             >     - Or, if an overlap exists, where the overlap lies with
>>>             >     respect to the two segments in hand.
>>>             >
>>>             > In order to ensure that corruption doesn't occur
>>>             during the copy, the user
>>>             > must create a temporary duplicate of the destination
>>>             segment, copy the data
>>>             > into the duplicate, then copy the duplicate into the
>>>             original destination
>>>             > segment.  This can be expensive in time and space.
>>>             >
>>>             > If, however, the user can determine that an overlap
>>>             exists, and where the
>>>             > overlap occurs, the copy operation can be done safely,
>>>             with no additional
>>>             > storage, by properly choosing the direction of the
>>>             iterative copy.
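
The direction-choosing copy mentioned above can be sketched as follows. This is a minimal illustration, not code from the thread or from the linked repository: it assumes both regions live in one byte array, so the relative offsets (the information the proposed methods would provide) are simply array indices. The class and method names are hypothetical.

```java
final class DirectionalCopy {
    // Copies len bytes within one array from srcPos to dstPos without a
    // temporary buffer. The loop direction is chosen so that no source byte
    // is overwritten before it has been read.
    static void copyWithin(byte[] a, int srcPos, int dstPos, int len) {
        if (dstPos <= srcPos) {
            // Destination starts at or before the source: copy front to back.
            for (int i = 0; i < len; i++) {
                a[dstPos + i] = a[srcPos + i];
            }
        } else {
            // Destination starts after the source: copy back to front.
            for (int i = len - 1; i >= 0; i--) {
                a[dstPos + i] = a[srcPos + i];
            }
        }
    }

    public static void main(String[] args) {
        byte[] right = {1, 2, 3, 4, 5, 6};
        copyWithin(right, 0, 2, 4);
        System.out.println(java.util.Arrays.toString(right)); // [1, 2, 1, 2, 3, 4]

        byte[] left = {1, 2, 3, 4, 5, 6};
        copyWithin(left, 2, 0, 4);
        System.out.println(java.util.Arrays.toString(left));  // [3, 4, 5, 6, 5, 6]
    }
}
```

The same direction rule applies when the loop body moves whole elements rather than single bytes, provided each element is read completely before any part of it is written.
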
>>>             >
>>>             > To solve this, the user doesn't need access to the
>>>             parent segment (this
>>>             > could be for security reasons), but could use these
>>>             two methods:
>>>             >
>>>             >     - MemorySegment::boolean
>>>             isSameBaseResource(MemorySegment other);
>>>             >     The intent is to reveal if *this* segment and the
>>>             *other* segment share
>>>             >     a common ancestor segment.  It could also be
>>>             extended to determine if the
>>>             >     two segments share the same memory-mapped file (a
>>>             true resource), thus
>>>             >     possibly removing the caveat in paragraph 2 above.
>>>             >
>>>             >
>>>             >     - MemorySegment::long baseResourceOffsetBytes();
>>>             >     This would return the offset in bytes of the start
>>>             of this segment from
>>>             >     the start of the highest common segment (or resource).
>>>             >
>>>             > With this information, the user can easily design a
>>>             safe, efficient, and
>>>             > fast data copy method for moving arbitrary elements
>>>             from one segment to
>>>             > another with the same guarantee as System.arraycopy().
>>>             >
>>>             > *Evidence*
>>>             > See (copySwap(...)
>>>             >
>>>             <https://github.com/leerho/PanamaLocal/blob/main/src/main/java/org/apache/datasketches/panama/MemoryCopy.java#L667-L703>).
>>>             > Before I had access to the new MemorySegment::void
>>>             copyFrom(MemorySegment,
>>>             > MemoryLayout, MemoryLayout), I had to design a proxy
>>>             routine that would do
>>>             > the copy (with swap) correctly, especially in the case
>>>             where the two
>>>             > segments overlapped. Note lines 682, 683 where I
>>>             create a temporary
>>>             > segment. If I had the above two methods, this extra
>>>             copy operation would
>>>             > not be needed.
>>>             >
>>>             > For exactly the above reasons, some years ago we
>>>             implemented similar
>>>             > methods in our DataSketches Memory Component
>>>             >
>>>             <https://datasketches.apache.org/api/memory/snapshot/apidocs/index.html>.
>>>             > Specifically, in the class *WritableMemory*, the
>>>             methods *getRegionOffset()*
>>>             > and *isSameResource(that).*
>>>             >
>>>             > Lee.
>>>


More information about the panama-dev mailing list