Propose 2 new methods for MemorySegment
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jun 17 09:43:00 UTC 2021
On 16/06/2021 22:36, leerho wrote:
> I think that will work, and it does more than I wanted. If two
> segments overlap (either 2 segments on-heap or 2 segments off-heap),
> then using segmentOffset() I can figure out how to do a direct
> copy without having to create an extra buffer and without having to
> catch exceptions.
>
> I assume this would work even if the two segments are buried in a deep
> hierarchy, i.e., two great-grandchildren of a parent segment.
I believe so - after you check that the base object is the same (either the
same array object, or null, for native), you just need to do a range check.
Typically this amounts to checking if either the start _or_ the end
address of a segment is contained in the other.
But this would do nothing for the memory mapped case (second paragraph
of the copyFrom javadoc) - although, as I said, I don't think we can
realistically do much for that.
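
The check described above (same base object, then a range check) can be sketched in plain Java roughly as follows. This is only an illustration: OverlapCheck and overlaps(...) are hypothetical names, not part of the MemorySegment API, and a segment is modelled here as a (base, start, size) triple where base is assumed to be the backing array or null for native memory. The interval comparison is equivalent to asking whether the start or the end of one range falls inside the other, and it also covers the case where one range fully contains the other. Arithmetic overflow of start + size is not handled.

```java
/** Hypothetical helper for illustration; not part of the MemorySegment API. */
final class OverlapCheck {
    private OverlapCheck() {}

    /**
     * Returns true if the two byte ranges share at least one byte.
     * base:  backing array, or null for native memory (assumption of this sketch)
     * start: byte offset into the array, or native address
     * size:  length of the range in bytes
     */
    static boolean overlaps(Object baseA, long startA, long sizeA,
                            Object baseB, long startB, long sizeB) {
        // Different arrays, or heap vs. native: the ranges cannot overlap.
        if (baseA != baseB) {
            return false;
        }
        // An empty range overlaps nothing.
        if (sizeA == 0 || sizeB == 0) {
            return false;
        }
        // Half-open intervals [start, start + size) intersect.
        return startA < startB + sizeB && startB < startA + sizeA;
    }
}
```

As noted, this does nothing for two mappings of the same file, which use different addresses.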
Maurizio
>
> Lee.
>
>
> On Wed, Jun 16, 2021 at 2:07 PM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com
> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>
> A few observations:
>
> * I think there is space to add a method which checks if two
> segments _overlap_
>
> * This doesn't mean reasoning in terms of structure, like you are
> suggesting (e.g. two slices of the same parent), but merely
> checking for address overlap
>
> * I don't think that, in the general case, we carry around the
> mapped file on which a segment/buffer is based. And even if
> we did, with symbolic links etc. it would be pretty hard to
> uniformly detect these issues
>
> Given the above, the benefit of the proposed API relative to its
> complexity seems rather slim.
>
> If the general feeling is that a _simple_ address overlap test
> would be useful, we can add that - but compared with other things
> we're discussing, it seems like low priority.
>
> Cheers
> Maurizio
>
>
> On 16/06/2021 20:06, leerho wrote:
>> Maurizio,
>> Well, I learned yet another corner of the API I hadn't found:
>> /MemoryAddress::segmentOffset()/ :)
>>
>> However, having the boolean /isSameBaseResource(MemorySegment
>> other)/ would still be very useful!
>>
>> Having to catch exceptions in order to understand some basic
>> properties of a segment (or a pair of them) is a real nuisance.
>> As the API stands currently, given two segments:
>>
>> 1. If they are both independently allocated on-heap the
>> segmentOffset() throws an exception.
>> 2. If one is on-heap and the other off-heap segmentOffset()
>> throws an exception.
>> 3. If they are both independently allocated */off-heap/*
>> segmentOffset */does not/* throw an exception!
>>
>> If you had the method /isSameBaseResource(MemorySegment other)/:
>>
>> 1. would return false
>> 2. would return false
>> 3. would return true (since the segmentOffset works in this case).
>> 4. Also, if both segments are descendants of a common ancestor
>> segment, it would return true
>>
>> This would make handling of moving data between segments so much
>> more straightforward.
>>
>> I revise my request to just add the first method:
>>
>> * MemorySegment::boolean isSameBaseResource(MemorySegment other);
>> The intent is to reveal if /*this*/ segment and the
>> */other/* segment share a common ancestor segment.
>> ["It could also be extended to determine if the two segments
>> share the same memory-mapped file (a true resource), thus
>> possibly removing the caveat in paragraph 2 above". -- this
>> may not be possible ]
>>
>> Now whether this removes your paragraph 2 caveat (at the top),
>> I'm not sure. Perhaps the caveat is because memory regions of a
>> memory-mapped file can be swapped out at any time, making any
>> assumptions about sub-regions and offsets rather meaningless?
>> Are there other reasons?
>>
>> Lee.
>>
>> On Wed, Jun 16, 2021 at 1:43 AM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com
>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>
>> I see what you mean.
>>
>> I wonder if this use case isn't already partially covered by
>> MemoryAddress::segmentOffset.
>>
>> E.g. can you do:
>>
>> long otherOffset = segment.address().segmentOffset(otherSegment);
>>
>> Then it should be easy to check if the offset is within the
>> bounds of
>> "otherSegment" ?
>>
>> (Note that the method already throws if you try to compare
>> addresses and
>> segments that are mismatched - e.g. on-heap vs. off-heap).
>>
>> Not saying a more direct API is ruled out, just pointing out
>> what we
>> have to see if it can be used.
>>
>> Maurizio
>>
>>
>> On 16/06/2021 02:37, leerho wrote:
>> > In working on
>> https://github.com/openjdk/panama-foreign/pull/555,
>> which is
>> > the PR for Memory Segment Efficient Array Handling, I
>> discovered that there
>> > are two methods that would be very useful beyond copying
>> arrays, but useful
>> > in other types of data movement operations between
>> MemorySegments.
>> >
>> > I'd like to draw attention to the opening Javadoc of the
>> > *MemorySegment::copyFrom(MemorySegment)* method:
>> >
>> > 1. Performs a bulk copy from given source segment to this
>> segment. More
>> >> specifically, the bytes at offset 0 through src.byteSize()
>> - 1 in the
>> >> source segment are copied into this segment at offset 0
>> through src.byteSize()
>> >> - 1. If the source segment overlaps with this segment,
>> then the copying
>> >> is performed as if the bytes at offset 0 through
>> src.byteSize() - 1 in
>> >> the source segment were first copied into a temporary
>> segment with size
>> >> bytes, and then the contents of the temporary segment were
>> copied into
>> >> this segment at offset 0 through src.byteSize() - 1.
>> >>
>> >> 2. The result of a bulk copy is unspecified if, in the
>> uncommon case, the
>> >> source segment and this segment do not overlap, but refer
>> to overlapping
>> >> regions of the same backing storage using different
>> addresses. For example,
>> >> this may occur if the same file is mapped
>> >>
>> <#mapFile(java.nio.file.Path,long,long,java.nio.channels.FileChannel.MapMode,jdk.incubator.foreign.ResourceScope)>
>> to
>> >> two segments.
>> >>
>> > The first paragraph guarantees that, even if two descendant segments
>> > overlap within a region of a parent segment, the copy operation
>> > will work properly. This is similar to the guarantee of
>> > System.arraycopy().
>> >
>> > The second paragraph refers to memory-mapped files.
>> However, let's examine
>> > the following scenario:
>> >
>> > - A hierarchy of Memory Segments where two descendant
>> segments may
>> > overlap a region of the parent segment.
>> > - The elements of the segments are more complex than
>> Java primitives
>> > (thus, PR 555 doesn't apply).
>> > - The user wishes to copy a region of elements from one
>> of the
>> > descendant segments to the other descendant segment.
>> > - The user only has the two descendant segments in hand
>> and does not
>> > have access to the parent segment.
>> >
>> > With the current MemorySegment API, the descendant segments
>> are blind to
>> > the overlap, to wit:
>> >
>> > - The user cannot determine if an overlap exists.
>> > - Or, if an overlap exists, where the overlap lies with
>> > respect to the two segments in hand.
>> >
>> > In order to ensure that corruption doesn't occur during the
>> copy, the user
>> > must create a temporary duplicate of the destination
>> segment, copy the data
>> > into the duplicate, then copy the duplicate into the
>> original destination
>> > segment. This can be expensive in time and space.
>> >
>> > If, however, the user can determine that an overlap exists,
>> and where the
>> > overlap occurs, the copy operation can be done safely, with
>> no additional
>> > storage, by properly choosing the direction of the
>> iterative copy.
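
A minimal sketch of the direction choice described above, in plain Java. It assumes both regions live in one backing byte[] at known offsets; OverlapSafeCopy and copyBytes are hypothetical names used only for illustration and are not part of the MemorySegment API. Copying forward when the destination starts below the source, and backward otherwise, means no source byte is overwritten before it has been read, so no temporary buffer is needed. A per-element transformation such as a byte swap is not shown here.

```java
/** Hypothetical helper for illustration; not part of the MemorySegment API. */
final class OverlapSafeCopy {
    private OverlapSafeCopy() {}

    /**
     * Copies length bytes from srcOff to dstOff within the same array,
     * choosing the iteration direction so overlapping ranges stay intact.
     */
    static void copyBytes(byte[] mem, int srcOff, int dstOff, int length) {
        if (srcOff == dstOff || length == 0) {
            return; // nothing to move
        }
        if (dstOff < srcOff) {
            // Destination starts first: copy low-to-high.
            for (int i = 0; i < length; i++) {
                mem[dstOff + i] = mem[srcOff + i];
            }
        } else {
            // Destination starts later: copy high-to-low.
            for (int i = length - 1; i >= 0; i--) {
                mem[dstOff + i] = mem[srcOff + i];
            }
        }
    }
}
```

For plain bytes this gives the same result as System.arraycopy() on a single array; the point of the sketch is only that the overlap information is enough to pick a safe direction.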
>> >
>> > To solve this, the user doesn't need access to the parent
>> segment (this
>> > could be for security reasons), but could use these two
>> methods:
>> >
>> > - MemorySegment::boolean
>> isSameBaseResource(MemorySegment other);
>> > The intent is to reveal if *this* segment and the
>> *other* segment share
>> > a common ancestor segment. It could also be extended
>> to determine if the
>> > two segments share the same memory-mapped file (a true
>> resource), thus
>> > possibly removing the caveat in paragraph 2 above.
>> >
>> >
>> > - MemorySegment::long baseResourceOffsetBytes();
>> > This would return the offset in bytes of the start of
>> this segment from
>> > the start of the highest common segment (or resource).
>> >
>> > With this information, the user can easily design a safe,
>> efficient, and
>> > fast data copy method for moving arbitrary elements from
>> one segment to
>> > another with the same guarantee as System.arraycopy().
>> >
>> > *Evidence*
>> > See copySwap(...)
>> > <https://github.com/leerho/PanamaLocal/blob/main/src/main/java/org/apache/datasketches/panama/MemoryCopy.java#L667-L703>.
>> > Before I had access to the new MemorySegment::void
>> copyFrom(MemorySegment,
>> > MemoryLayout, MemoryLayout), I had to design a proxy
>> routine that would do
>> > the copy (with swap) correctly, especially in the case
>> where the two
>> > segments overlapped. Note lines 682, 683 where I create a
>> temporary
>> > segment. If I had the above two methods, this extra copy
>> operation would
>> > not be needed.
>> >
>> > For exactly the above reasons, some years ago we
>> implemented similar
>> > methods in our DataSketches Memory Component
>> >
>> > <https://datasketches.apache.org/api/memory/snapshot/apidocs/index.html>.
>> > Specifically, in the class *WritableMemory*, the methods
>> *getRegionOffset()*
>> > and *isSameResource(that).*
>> >
>> > Lee.
>>
More information about the panama-dev
mailing list