Part 1 Proposal for JDK-8264594

Wed Jun 9 11:53:12 UTC 2021

Thanks for the background. I think my question was ill-posed, as I 
realize that even in the cases you describe, you could always avoid the 
need for a bulk copy with raw byte offset by doing slicing and copy.

e.g.

copyIntoArrayRaw(segment, byteOffset, long[], arrayIndex)

can be rewritten as:

copyIntoArrayLogical(segment.asSlice(byteOffset), 0, long[], arrayIndex)

So, if we find that having a physical (segment offset) vs. logical 
(array offset) mismatch is too confusing, we *could* fall back to 
slicing and using logical offsets only.

Maurizio

On 08/06/2021 17:54, leerho wrote:
>
>     Question: do you have concrete use cases where you'd like to copy
>     contents of a double[] at non-aligned positions inside a segment?
>
>
> Absolutely!   In fact, specifying the offset in bytes is probably the 
> most common.  This is because many of our structs start with a 
> preamble of a mix of byte, short, int, and float fields, and then 
> followed by several data array regions.  For example an array of ints 
> followed by an array of longs, which in some cases may not be unit 
> aligned with the base of the segment.  This can be particularly true 
> of structs that are defined by other users where alignment cannot be 
> guaranteed.  In fact, one of our customers has years of historical 
> data (petabytes) that are not at all byte aligned.  So to assume that 
> everything is neatly byte aligned would be a big mistake.  
> Unfortunately, many engineers that have created files of data over the 
> years did so without any awareness of the need for byte-alignment.
>
> Lee.
>
> On Tue, Jun 8, 2021 at 7:00 AM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com 
> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>
>     Hi Lee,
>     thanks for this, I'll try to put together a PR in the next week or so.
>
>     Re. overloads I was thinking about how the ByteBuffer API goes
>     about them, and, in I see that that API doesn't have the same
>     issues, given that ByteBuffer only provides bulk copy for byte
>     arrays. If you want to bulk copy a double[] you have to convert to
>     a DoubleBuffer first and then copy (and then use logical indices
>     for the buffer too).
>
>     Question: do you have concrete use cases where you'd like to copy
>     contents of a double[] at non-aligned positions inside a segment?
>
>     I'm asking because, in principle we could leave the methods in
>     CopyMemory cover the aligned, logical indices only part of the
>     API. And ask the user to fall back to MemorySegment::copyFrom if
>     they really really want unaligned (which should be uncommon?).
>
>     Cheers
>     Maurizio
>
>     On 08/06/2021 00:55, leerho wrote:
>>     Maurizio,
>>     I have updated MemoryCopy and TestMemoryCopy by extending the
>>     variable names to be even more clear, for example:
>>
>>       * srcArr -> srcArray
>>       * srcSeg -> srcSegment
>>       * srcCopyLen -> srcCopyLengthInts
>>       * srcIndex -> srcIndexLongs
>>       * Etc.
>>
>>     I also added a comment in the class javadoc describing the
>>     difference between variables with the word "index" and ones with
>>     the word "offset", etc.
>>
>>     I have looked at what it would take to add overloads that refer
>>     to the segment offset in array element units.  It can be done,
>>     but the names of these new methods will be kinda ugly or awkward:
>>
>>       * copyFromArrayAtIndex -- confusing since the array uses index
>>         units already
>>       * copyAtIndexFromArray -- not much better
>>       * copyFromArrayWithIndex -- nah
>>       * copyFromArrayToSegementUsingIndex -- a little more clear, but
>>         really awkward
>>
>>     If you have any better ideas, I am open.  (But IMHO, I don't
>>     think we need this :) )
>>
>>     Please let me know what the next step should be :)
>>
>>     Thanks!
>>
>>     Lee.
>>
>>     On Fri, Jun 4, 2021 at 7:09 AM leerho <leerho at gmail.com
>>     <mailto:leerho at gmail.com>> wrote:
>>
>>         Maurizio,
>>         Thank you for your reply.  I agree with your observation in
>>         the difference between the single item case and the bulk
>>         copy.  I guess I became so familiar with how offsets are
>>         handled in Unsafe where the single element and bulk copy
>>         cases the references to memory are always in bytes, it seamed
>>         so simple and natural.
>>
>>         In my experience anyway, reading and writing data to off-heap
>>         almost always involves a mix of preamble elements and bulk
>>         copy arrays where it was necessary to track the offset in
>>         bytes, where the multibyte elements may not be correctly
>>         aligned to the base of the allocated segment, or God forbid,
>>         the arrays.
>>
>>         What is most important is getting this into the JDK, and if I
>>         have to add more overloads, I will :)
>>
>>         Lee.
>>
>>         On Fri, Jun 4, 2021 at 2:34 AM Maurizio Cimadamore
>>         <maurizio.cimadamore at oracle.com
>>         <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>
>>
>>             On 04/06/2021 01:32, leerho wrote:
>>>             When I first saw all the logical index overloads in
>>>             MemoryAccess I shuddered.  It is a lot of API clutter as
>>>             far as I’m concerned.  But just because we made that
>>>             mistake with MemoryAccess doesn’t mean we have to
>>>             continue propagating it.
>>
>>             I'm sorry, I just don't see that as being a "mistake".
>>             Code like this:
>>
>>             ```
>>             for (int i = 0 ; i < limit ; i++) {
>>                long val = MemoryAccess.getLongAtIndex(segment, i);
>>                ...
>>             }
>>             ```
>>
>>             Is _extremely_ common. While it's true that memory
>>             segment is an unbiased array of bytes, there are times
>>             you want to access it in a more structured form - and
>>             saying "you can't do that" because it adds overloads
>>             doesn't seem like a great solution to me.
>>
>>>
>>>             Also I have explicitly named the two offsets very
>>>             differently to try to make it clear that they are very
>>>             different.  I guess I could go further and name the
>>>             “srcIndex” -> “srcIndexLongs” for the long array case,
>>>             for example.  But since “index” is used elsewhere as the
>>>             logical index I assumed that this would not be necessary.
>>>
>>>             But I would much rather use longer, more explicit names
>>>             than propagate API clutter.
>>
>>             I was thinking overnight that, perhaps, the memory copy
>>             and the single element cases are different. When you do
>>             single access (see above) you do that frequently in a
>>             loop - and you often get elements of the same type. As
>>             stated above, asking developers to work in terms of
>>             offsets seems a bit odd, when they are really accessing a
>>             flat array of element type T.
>>
>>             In the case of bulk copy, I don't think there's the same
>>             degree of pressure - you will probably call the bulk copy
>>             at the beginning, or at the end of your computation (or
>>             both ends!) - and having an offset shift is more tolerable.
>>
>>             So, I'm open to evaluating something more minimal, and
>>             see how far we can get with it.
>>
>>             Maurizio
>>
>>>
>>>             Lee
>>>
>>>             On Thu, Jun 3, 2021 at 1:26 PM Maurizio Cimadamore
>>>             <maurizio.cimadamore at oracle.com
>>>             <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>
>>>                 Thanks.
>>>
>>>                 While I agree that a segment is just a bunch of
>>>                 bytes, I think also think that having an API which
>>>                 takes two indices at the same time, one logical and
>>>                 one physical is kind of evil. I bet that a lot of
>>>                 developers will get that wrong, at least the first time.
>>>
>>>                 What you say about segments being an array of bytes,
>>>                 and thus not needing index-based overloads can also
>>>                 be said for MemoryAccess class - but I believe that
>>>                 the index-based accessors in the are quite popular.
>>>
>>>                 But overall, my worry is not performances, or
>>>                 symmetry with respect to other APIs - my main worry
>>>                 is what I said above: that people will just assume
>>>                 that indices are logical for both the segment and
>>>                 the array - so overloads with clearly different
>>>                 names should help quite a bit IMHO.
>>>
>>>
>>>                 Maurizio
>>>
>>>                 On 03/06/2021 20:50, leerho wrote:
>>>>                 Maurizio,
>>>>                 Here is the MemoryCopy class (w/o byte swap) and
>>>>                 the TestMemoryCopy class (w/o testing byte swap).
>>>>
>>>>                 I copied from the TestMemoryAccess class in that I
>>>>                 noticed that there is no package statement.  I am
>>>>                 not sure what @SuppressWarnings are allowed in your
>>>>                 test environment, I suspect I should remove all of
>>>>                 them.
>>>>
>>>>                 When the MemorySegment copyFrom(MemorySegment,
>>>>                 MemoryLayout, ByteOrder) becomes available I can
>>>>                 add the tests for that.
>>>>
>>>>                 *IMHO*
>>>>
>>>>                     I feel strongly that there is no need to
>>>>                     additionally overload these methods with the
>>>>                     segment offset specified in array index units. 
>>>>                     It is so trivial to convert from one to the
>>>>                     other and it can be done easily in the method
>>>>                     argument with a simple multiply by 2, 4 or 8.
>>>>                     This should compile down to a simple shift,
>>>>                     which becomes a single cycle CPU instruction.
>>>>                     So this is not a performance issue.
>>>>
>>>>                     Furthermore, it is best that the user becomes
>>>>                     accustomed to thinking of a segment
>>>>                     fundamentally as an array of bytes.  Once a
>>>>                     segment is loaded with some primitive array,
>>>>                     the segment loses the context of the type of
>>>>                     the array it was loaded with; in effect, a kind
>>>>                     of "type erasure" similar to Java's generics.
>>>>                     This is one of the reasons that MemorySegments
>>>>                     can be so powerful.
>>>>
>>>>
>>>>                     I have been programming with the
>>>>                     "MemorySegment" concept for a number of years
>>>>                     now and find that keeping in mind that segments
>>>>                     are just bytes is very useful.
>>>>
>>>>
>>>>                 Let me know what you think.
>>>>
>>>>                 Lee.
>>>>
>>>>
>>>>
>>>>                 On Mon, May 31, 2021 at 6:56 AM Maurizio Cimadamore
>>>>                 <maurizio.cimadamore at oracle.com
>>>>                 <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>>
>>>>                     Hi Lee,
>>>>                     this looks good.
>>>>
>>>>                     One thing I note is that there's an ambiguity
>>>>                     as to whether the segment index is expressed as
>>>>                     a logical index, or a raw byte offset. Your
>>>>                     snippet does the latter. If we want to follow
>>>>                     MemoryAccess, perhaps that calls for 2
>>>>                     overloads
>>>>                     (copyFromArrayAtIndex/copyFromArrayAtOffset),
>>>>                     as I imagine both could be useful, depending on
>>>>                     the case?
>>>>
>>>>                     The javadoc will have to say something when the
>>>>                     segment being used is backed by the very array
>>>>                     that is the source/target of the copy (we have
>>>>                     some text like that in MemorySegment::copyFrom).
>>>>
>>>>                     I think it would be helpful to progress further
>>>>                     with this, add the remaining templates (w/o
>>>>                     ByteOrder, for now) test and see how it works
>>>>                     in practice.
>>>>
>>>>                     I will add (or just file a simple PR, so that
>>>>                     you can just borrow from it - should be a
>>>>                     single method) something to do the segment copy
>>>>                     with swap.
>>>>
>>>>                     Thanks!
>>>>                     Maurizio
>>>>
>>>>                     On 28/05/2021 22:37, leerho wrote:
>>>>>                     Maurizio,
>>>>>
>>>>>                     Again sorry about the delay.
>>>>>
>>>>>                     Attached is a template proposal for the
>>>>>                     MemoryCopy class.
>>>>>
>>>>>                     I can't complete this without your proposed
>>>>>                     new copyFrom(...) method in MemorySegment. As
>>>>>                     it is written, it should work, but without the
>>>>>                     byte swap capability.
>>>>>
>>>>>                     I can complete the rest of the primitives like
>>>>>                     this template, if you would like with
>>>>>                     javadocs.  I could also start writing tests,
>>>>>                     but without the byte-swap.
>>>>>
>>>>>                     Let me know what would be most helpful.
>>>>>
>>>>>                     Cheers,
>>>>>
>>>>>                     Lee
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                     On Thu, May 20, 2021 at 11:36 AM leerho
>>>>>                     <leerho at gmail.com <mailto:leerho at gmail.com>>
>>>>>                     wrote:
>>>>>
>>>>>                         Of course!
>>>>>
>>>>>                         On Thu, May 20, 2021 at 9:42 AM Maurizio
>>>>>                         Cimadamore <maurizio.cimadamore at oracle.com
>>>>>                         <mailto:maurizio.cimadamore at oracle.com>>
>>>>>                         wrote:
>>>>>
>>>>>
>>>>>                             On 20/05/2021 17:13, leerho wrote:
>>>>>>                             I am not sure if the /dstSegSlice/
>>>>>>                             requires the /srcCopyLen/. I would
>>>>>>                             hope that it is smart enough to
>>>>>>                             realize that the input length is
>>>>>>                             smaller than the given offset minus
>>>>>>                             the segment size.
>>>>>
>>>>>                             asSlice has an overload that just
>>>>>                             takes an offset and infers the
>>>>>                             resulting size from there.
>>>>>
>>>>>                             But that doesn't seem what you want
>>>>>                             here - as you want the slice to have a
>>>>>                             specific size (the size of the input
>>>>>                             array).
>>>>>
>>>>>                             MemorySegment::copyFrom wants the two
>>>>>                             segments to have the same size, so I
>>>>>                             think you need that.
>>>>>
>>>>>                             In terms of performance, there's no
>>>>>                             difference between asSlice(offset) and
>>>>>                             asSlice(offset, size) - you have to
>>>>>                             create a new segment anyway.
>>>>>
>>>>>                             Maurizio
>>>>>
>>>>>
>>>             -- 
>>>             From my cell phone.
>>
>>         -- 
>>         From my cell phone.
>>