Propose 2 new methods for MemorySegment
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jun 17 17:34:45 UTC 2021
On 17/06/2021 18:26, leerho wrote:
> This is super! Thank you for this. Is there a chance this can make it
> into 17?
No, Java 17 is in its finalization stage [1]; only high-priority requests
(e.g. broken stuff) can be accommodated at this point in time.
Cheers
Maurizio
[1] - https://mail.openjdk.java.net/pipermail/jdk-dev/2021-June/005690.html
>
> Lee.
>
> On Thu, Jun 17, 2021 at 2:43 AM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
>
> On 16/06/2021 22:36, leerho wrote:
>> I think that will work, and it does more than I wanted. If two
>> segments overlap (either both on-heap or both off-heap), then
>> using segmentOffset() I can figure out how to do a direct copy
>> without creating an extra buffer and without catching exceptions.
>>
>> I assume this would work even if the two segments are buried in a
>> deep hierarchy, i.e., two great-grandchildren of a parent segment.
>
> I believe so - after you check that the base object is the same
> (either the same array object, or null, for native), you just need
> to do a range check. Typically this amounts to checking if either
> the start _or_ the end address of a segment is contained in the other.
>
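The range check described above can be sketched in plain Java. This is a minimal illustration, not the panama API: AddressRange and its fields are hypothetical stand-ins for a segment's base object (an array for heap memory, null for native memory), its start address, and its size in bytes. It uses the usual interval-intersection test, which also covers the case where one range fully contains the other.

```java
// Hypothetical stand-in for a segment: a base object (an array for heap
// memory, null for native memory), a start address, and a size in bytes.
final class AddressRange {
    final Object base;
    final long start;
    final long size;

    AddressRange(Object base, long start, long size) {
        this.base = base;
        this.start = start;
        this.size = size;
    }

    // True when both ranges share the same base and at least one byte.
    boolean overlaps(AddressRange other) {
        if (base != other.base) {
            return false; // different arrays, or heap vs. native
        }
        if (size <= 0 || other.size <= 0) {
            return false; // an empty range shares no bytes with anything
        }
        // Two half-open intervals [start, start + size) intersect exactly
        // when each one starts before the other one ends.
        return start < other.start + other.size
            && other.start < start + size;
    }
}
```

Two slices of the same array at offsets 0 and 4, each 8 bytes long, overlap under this test, while slices at offsets 0 and 8 do not. As noted in the thread, a check like this says nothing about two mappings of the same file at different addresses.
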
> But this would do nothing for the memory mapped case (second
> paragraph of the copyFrom javadoc) - although, as I said, I don't
> think we can realistically do much for that.
>
> Maurizio
>
>>
>> Lee.
>>
>>
>> On Wed, Jun 16, 2021 at 2:07 PM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com> wrote:
>>
>> A few observations:
>>
>> * I think there is space to add a method which checks if two
>> segments _overlap_
>>
>> * This doesn't mean reasoning in terms of structure, like you
>> are suggesting (e.g. two slices of the same parent), but
>> merely checking for address overlap
>>
>> * I don't think that, in the general case, we carry around the
>> mapped file on which a segment/buffer is based. And even
>> if we did, with symbolic links etc. it would be pretty hard
>> to uniformly detect these issues
>>
>> Given the above, the benefit of the proposed API seems rather
>> slim relative to its complexity.
>>
>> If the general feeling is that a _simple_ address overlap
>> test would be useful, we can add that - but compared with the
>> other things we're discussing it seems like low priority.
>>
>> Cheers
>> Maurizio
>>
>>
>> On 16/06/2021 20:06, leerho wrote:
>>> Maurizio,
>>> Well, I learned yet another corner of the API I hadn't
>>> found: /MemoryAddress::segmentOffset()/ :)
>>>
>>> However, having the boolean
>>> /isSameBaseResource(MemorySegment other)/ would still be
>>> very useful!
>>>
>>> Having to catch exceptions in order to understand some basic
>>> properties of a segment (or a pair of them) is a real
>>> nuisance. As the API stands currently, given two segments:
>>>
>>> 1. If they are both independently allocated on-heap the
>>> segmentOffset() throws an exception.
>>> 2. If one is on-heap and the other off-heap segmentOffset()
>>> throws an exception.
>>> 3. If they are both independently allocated *off-heap*,
>>> segmentOffset() *does not* throw an exception!
>>>
>>> If you had the method /isSameBaseResource(MemorySegment other)/, it:
>>>
>>> 1. would return false
>>> 2. would return false
>>> 3. would return true (since the segmentOffset works in this
>>> case).
>>> 4. Also, if both segments are descendants of a common
>>> ancestor segment, it would return true
>>>
>>> This would make handling of moving data between segments so
>>> much more straightforward.
>>>
>>> I revise my request to just add the first method:
>>>
>>> * MemorySegment::boolean isSameBaseResource(MemorySegment
>>> other);
>>> The intent is to reveal if *this* segment and the
>>> *other* segment share a common ancestor segment.
>>> ["It could also be extended to determine if the two
>>> segments share the same memory-mapped file (a true
>>> resource), thus possibly removing the caveat in
>>> paragraph 2 above." -- this may not be possible]
>>>
>>> Now whether this removes your paragraph 2 caveat (at the
>>> top), I'm not sure. Perhaps the caveat is because memory
>>> regions of a memory-mapped file can be swapped out at any
>>> time, making any assumptions about sub-regions and offsets
>>> rather meaningless? Are there other reasons?
>>>
>>> Lee.
>>>
>>> On Wed, Jun 16, 2021 at 1:43 AM Maurizio Cimadamore
>>> <maurizio.cimadamore at oracle.com> wrote:
>>>
>>> I see what you mean.
>>>
>>> I wonder if this use case isn't already partially covered by
>>> MemoryAddress::segmentOffset.
>>>
>>> E.g. can you do:
>>>
>>> long otherOffset = segment.address().segmentOffset(otherSegment);
>>>
>>> Then it should be easy to check if the offset is within the
>>> bounds of "otherSegment"?
>>>
>>> (Note that the method already throws if you try to compare
>>> addresses and segments that are mismatched - e.g. on-heap vs.
>>> off-heap).
>>>
>>> Not saying a more direct API is ruled out, just pointing out what
>>> we have to see if it can be used.
>>>
>>> Maurizio
>>>
>>>
>>> On 16/06/2021 02:37, leerho wrote:
>>> > In working on https://github.com/openjdk/panama-foreign/pull/555,
>>> > which is the PR for Memory Segment Efficient Array Handling, I
>>> > discovered that there are two methods that would be very useful,
>>> > not just for copying arrays but also for other types of data
>>> > movement operations between MemorySegments.
>>> >
>>> > I'd like to draw attention to the opening Javadoc of the
>>> > *MemorySegment::copyFrom(MemorySegment)* method:
>>> >
>>> >> 1. Performs a bulk copy from given source segment to this segment.
>>> >> More specifically, the bytes at offset 0 through src.byteSize() - 1
>>> >> in the source segment are copied into this segment at offset 0
>>> >> through src.byteSize() - 1. If the source segment overlaps with
>>> >> this segment, then the copying is performed as if the bytes at
>>> >> offset 0 through src.byteSize() - 1 in the source segment were
>>> >> first copied into a temporary segment with size bytes, and then
>>> >> the contents of the temporary segment were copied into this
>>> >> segment at offset 0 through src.byteSize() - 1.
>>> >>
>>> >> 2. The result of a bulk copy is unspecified if, in the uncommon
>>> >> case, the source segment and this segment do not overlap, but
>>> >> refer to overlapping regions of the same backing storage using
>>> >> different addresses. For example, this may occur if the same file
>>> >> is mapped to two segments.
>>> >>
>>> > The first paragraph is a guarantee that, even if two descendant
>>> > segments cover an overlapping region of a parent segment, the copy
>>> > operation will work properly. This is similar to the guarantee of
>>> > System.arraycopy().
>>> >
>>> > The second paragraph refers to memory-mapped files. However, let's
>>> > examine the following scenario:
>>> >
>>> > - A hierarchy of Memory Segments where two descendant segments may
>>> > overlap a region of the parent segment.
>>> > - The elements of the segments are more complex than Java primitives
>>> > (thus, PR 555 doesn't apply).
>>> > - The user wishes to copy a region of elements from one of the
>>> > descendant segments to the other descendant segment.
>>> > - The user only has the two descendant segments in hand and does not
>>> > have access to the parent segment.
>>> >
>>> > With the current MemorySegment API, the descendant segments are
>>> > blind to the overlap, to wit:
>>> >
>>> > - The user cannot determine if an overlap exists.
>>> > - Or, if an overlap exists, where it lies with respect to the two
>>> > segments in hand.
>>> >
>>> > In order to ensure that corruption doesn't occur during the copy,
>>> > the user must create a temporary duplicate of the destination
>>> > segment, copy the data into the duplicate, then copy the duplicate
>>> > into the original destination segment. This can be expensive in
>>> > time and space.
>>> >
>>> > If, however, the user can determine that an overlap exists, and
>>> > where the overlap occurs, the copy operation can be done safely,
>>> > with no additional storage, by properly choosing the direction of
>>> > the iterative copy.
>>> >
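One way to picture "choosing the direction of the iterative copy" is the sketch below. It assumes both regions live in the same backing array and that their offsets within it are known, which is the information the proposed methods would provide. OverlapSafeCopy.copy is a hypothetical helper, and plain bytes stand in for the more complex elements discussed in the thread.

```java
// Hypothetical helper: copies within a single backing array without a
// temporary buffer by picking the iteration direction from the offsets.
final class OverlapSafeCopy {
    static void copy(byte[] backing, int srcOff, int dstOff, int length) {
        if (dstOff <= srcOff) {
            // Destination starts at or before the source: walking forward
            // never overwrites a source byte that has not been read yet.
            for (int i = 0; i < length; i++) {
                backing[dstOff + i] = backing[srcOff + i];
            }
        } else {
            // Destination starts after the source: walk backward so the
            // tail of the source is read before it gets overwritten.
            for (int i = length - 1; i >= 0; i--) {
                backing[dstOff + i] = backing[srcOff + i];
            }
        }
    }
}
```

The same direction rule holds whether or not the regions actually overlap, so the helper needs only the two offsets relative to the common backing storage.
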
>>> > To solve this, the user doesn't need access to the parent segment
>>> > (this could be for security reasons), but could use these two
>>> > methods:
>>> >
>>> > - MemorySegment::boolean isSameBaseResource(MemorySegment other);
>>> > The intent is to reveal if *this* segment and the *other* segment
>>> > share a common ancestor segment. It could also be extended to
>>> > determine if the two segments share the same memory-mapped file (a
>>> > true resource), thus possibly removing the caveat in paragraph 2
>>> > above.
>>> >
>>> > - MemorySegment::long baseResourceOffsetBytes();
>>> > This would return the offset in bytes of the start of this segment
>>> > from the start of the highest common segment (or resource).
>>> >
>>> > With this information, the user can easily design a safe,
>>> > efficient, and fast data copy method for moving arbitrary elements
>>> > from one segment to another with the same guarantee as
>>> > System.arraycopy().
>>> >
>>> > *Evidence*
>>> > See copySwap(...)
>>> > <https://github.com/leerho/PanamaLocal/blob/main/src/main/java/org/apache/datasketches/panama/MemoryCopy.java#L667-L703>.
>>> > Before I had access to the new MemorySegment::void
>>> > copyFrom(MemorySegment, MemoryLayout, MemoryLayout), I had to design
>>> > a proxy routine that would do the copy (with swap) correctly,
>>> > especially in the case where the two segments overlapped. Note lines
>>> > 682, 683 where I create a temporary segment. If I had the above two
>>> > methods, this extra copy operation would not be needed.
>>> >
>>> > For exactly the above reasons, some years ago we implemented
>>> > similar methods in our DataSketches Memory Component
>>> > <https://datasketches.apache.org/api/memory/snapshot/apidocs/index.html>.
>>> > Specifically, in the class *WritableMemory*, the methods
>>> > *getRegionOffset()* and *isSameResource(that)*.
>>> >
>>> > Lee.
>>>