Part 1 Proposal for JDK-8264594

Wed Jun 9 21:15:29 UTC 2021

I'm sorry, I must be dense :-)  I don't understand your previous email.

   - Why would one want to avoid a bulk copy? If you have to do a copy, a
   bulk copy is the fastest way to do it.
   (i.e., by "bulk" I mean a low level copy a la System.arrayCopy(...),
   which at the lowest level is just moving bytes,
   and by that criteria, both of your statements above are "bulk" copies
   ... :-)   ).
   - Your "logical" rewrite is still specifying the offset in bytes as is
   the "Raw".  I thought this whole discussion was to create an overloaded
   "logical" API that would specify the segment offset in array units (in this
   case, longs).
   - Both of your statements are missing a length parameter, which would be
   in longs.

Please, please,  don't discard the portion of the API that allows easy
copying from/to arrays with a segment offset specified in bytes! I will
gladly write the "logical" overload methods, if that is what you feel you
need.  If they are there I will use them, but only rarely.

*****
Because I don't have your new "copyFrom" I'm going to temporarily mock it
so I can test the byte-order version as I have the byte-order version of
the test code ready to test.  (It is a completely different rewrite of the
test code I've given you so far.)

Lee.

On Wed, Jun 9, 2021 at 4:53 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Thanks for the background. I think my question was ill-posed, as I realize
> that even in the cases you describe, you could always avoid the need for a
> bulk copy with raw byte offset by doing slicing and copy.
>
> e.g.
>
> copyIntoArrayRaw(segment, byteOffset, long[], arrayIndex)
>
> can be rewritten as:
>
> copyIntoArrayLogical(segment.asSlice(byteOffset), 0, long[], arrayIndex)
>
>
> So, if we find that having a physical (segment offset) vs. logical (array
> offset) mismatch is too confusing, we *could* fall back to slicing and
> using logical offsets only.
>
> Maurizio
>
>
> On 08/06/2021 17:54, leerho wrote:
>
> Question: do you have concrete use cases where you'd like to copy contents
>> of a double[] at non-aligned positions inside a segment?
>
>
> Absolutely!   In fact, specifying the offset in bytes is probably the most
> common.  This is because many of our structs start with a preamble of a mix
> of byte, short, int, and float fields, and then followed by several data
> array regions.  For example an array of ints followed by an array of longs,
> which in some cases may not be unit aligned with the base of the segment.
> This can be particularly true of structs that are defined by other users
> where alignment cannot be guaranteed.   In fact, one of our customers has
> years of historical data (petabytes) that are not at all byte aligned.  So
> to assume that everything is neatly byte aligned would be a big mistake.
> Unfortunately, many engineers that have created files of data over the
> years did so without any awareness of the need for byte-alignment.
>
> Lee.
>
>
> On Tue, Jun 8, 2021 at 7:00 AM Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>> Hi Lee,
>> thanks for this, I'll try to put together a PR in the next week or so.
>>
>> Re. overloads I was thinking about how the ByteBuffer API goes about
>> them, and, in I see that that API doesn't have the same issues, given that
>> ByteBuffer only provides bulk copy for byte arrays. If you want to bulk
>> copy a double[] you have to convert to a DoubleBuffer first and then copy
>> (and then use logical indices for the buffer too).
>>
>> Question: do you have concrete use cases where you'd like to copy
>> contents of a double[] at non-aligned positions inside a segment?
>>
>> I'm asking because, in principle we could leave the methods in CopyMemory
>> cover the aligned, logical indices only part of the API. And ask the user
>> to fall back to MemorySegment::copyFrom if they really really want
>> unaligned (which should be uncommon?).
>>
>> Cheers
>> Maurizio
>> On 08/06/2021 00:55, leerho wrote:
>>
>> Maurizio,
>> I have updated MemoryCopy and TestMemoryCopy by extending the variable
>> names to be even more clear, for example:
>>
>>    - srcArr -> srcArray
>>    - srcSeg -> srcSegment
>>    - srcCopyLen -> srcCopyLengthInts
>>    - srcIndex -> srcIndexLongs
>>    - Etc.
>>
>> I also added a comment in the class javadoc describing the difference
>> between variables with the word "index" and ones with the word "offset",
>> etc.
>>
>> I have looked at what it would take to add overloads that refer to the
>> segment offset in array element units.  It can be done, but the names of
>> these new methods will be kinda ugly or awkward:
>>
>>    - copyFromArrayAtIndex -- confusing since the array uses index units
>>    already
>>    - copyAtIndexFromArray -- not much better
>>    - copyFromArrayWithIndex -- nah
>>    - copyFromArrayToSegementUsingIndex -- a little more clear, but
>>    really awkward
>>
>> If you have any better ideas, I am open.  (But IMHO, I don't think we
>> need this :) )
>>
>> Please let me know what the next step should be :)
>>
>> Thanks!
>>
>> Lee.
>>
>> On Fri, Jun 4, 2021 at 7:09 AM leerho <leerho at gmail.com> wrote:
>>
>>> Maurizio,
>>> Thank you for your reply.  I agree with your observation in the
>>> difference between the single item case and the bulk copy.  I guess I
>>> became so familiar with how offsets are handled in Unsafe where the single
>>> element and bulk copy cases the references to memory are always in bytes,
>>> it seamed so simple and natural.
>>>
>>> In my experience anyway, reading and writing data to off-heap almost
>>> always involves a mix of preamble elements and bulk copy arrays where it
>>> was necessary to track the offset in bytes, where the multibyte elements
>>> may not be correctly aligned to the base of the allocated segment, or God
>>> forbid, the arrays.
>>>
>>> What is most important is getting this into the JDK, and if I have to
>>> add more overloads, I will :)
>>>
>>> Lee.
>>>
>>> On Fri, Jun 4, 2021 at 2:34 AM Maurizio Cimadamore <
>>> maurizio.cimadamore at oracle.com> wrote:
>>>
>>>>
>>>> On 04/06/2021 01:32, leerho wrote:
>>>>
>>>> When I first saw all the logical index overloads in MemoryAccess I
>>>> shuddered.  It is a lot of API clutter as far as I’m concerned.  But just
>>>> because we made that mistake with MemoryAccess doesn’t mean we have to
>>>> continue propagating it.
>>>>
>>>> I'm sorry, I just don't see that as being a "mistake". Code like this:
>>>>
>>>> ```
>>>> for (int i = 0 ; i < limit ; i++) {
>>>>    long val = MemoryAccess.getLongAtIndex(segment, i);
>>>>    ...
>>>> }
>>>> ```
>>>>
>>>> Is _extremely_ common. While it's true that memory segment is an
>>>> unbiased array of bytes, there are times you want to access it in a more
>>>> structured form - and saying "you can't do that" because it adds overloads
>>>> doesn't seem like a great solution to me.
>>>>
>>>>
>>>> Also I have explicitly named the two offsets very differently to try to
>>>> make it clear that they are very different.  I guess I could go further and
>>>> name the “srcIndex” -> “srcIndexLongs” for the long array case, for
>>>> example.  But since “index” is used elsewhere as the logical index I
>>>> assumed that this would not be necessary.
>>>>
>>>> But I would much rather use longer, more explicit names than propagate
>>>> API clutter.
>>>>
>>>> I was thinking overnight that, perhaps, the memory copy and the single
>>>> element cases are different. When you do single access (see above) you do
>>>> that frequently in a loop - and you often get elements of the same type. As
>>>> stated above, asking developers to work in terms of offsets seems a bit
>>>> odd, when they are really accessing a flat array of element type T.
>>>>
>>>> In the case of bulk copy, I don't think there's the same degree of
>>>> pressure - you will probably call the bulk copy at the beginning, or at the
>>>> end of your computation (or both ends!) - and having an offset shift is
>>>> more tolerable.
>>>>
>>>> So, I'm open to evaluating something more minimal, and see how far we
>>>> can get with it.
>>>>
>>>> Maurizio
>>>>
>>>>
>>>> Lee
>>>>
>>>> On Thu, Jun 3, 2021 at 1:26 PM Maurizio Cimadamore <
>>>> maurizio.cimadamore at oracle.com> wrote:
>>>>
>>>>> Thanks.
>>>>>
>>>>> While I agree that a segment is just a bunch of bytes, I think also
>>>>> think that having an API which takes two indices at the same time, one
>>>>> logical and one physical is kind of evil. I bet that a lot of developers
>>>>> will get that wrong, at least the first time.
>>>>>
>>>>> What you say about segments being an array of bytes, and thus not
>>>>> needing index-based overloads can also be said for MemoryAccess class - but
>>>>> I believe that the index-based accessors in the are quite popular.
>>>>>
>>>>> But overall, my worry is not performances, or symmetry with respect to
>>>>> other APIs - my main worry is what I said above: that people will just
>>>>> assume that indices are logical for both the segment and the array - so
>>>>> overloads with clearly different names should help quite a bit IMHO.
>>>>>
>>>>>
>>>>> Maurizio
>>>>> On 03/06/2021 20:50, leerho wrote:
>>>>>
>>>>> Maurizio,
>>>>> Here is the MemoryCopy class (w/o byte swap) and the TestMemoryCopy
>>>>> class (w/o testing byte swap).
>>>>>
>>>>> I copied from the TestMemoryAccess class in that I noticed that there
>>>>> is no package statement.  I am not sure what @SuppressWarnings are allowed
>>>>> in your test environment, I suspect I should remove all of them.
>>>>>
>>>>> When the MemorySegment copyFrom(MemorySegment, MemoryLayout,
>>>>> ByteOrder) becomes available I can add the tests for that.
>>>>>
>>>>> *IMHO*
>>>>>
>>>>> I feel strongly that there is no need to additionally overload these
>>>>> methods with the segment offset specified in array index units.  It is so
>>>>> trivial to convert from one to the other and it can be done easily in the
>>>>> method argument with a simple multiply by 2, 4 or 8. This should compile
>>>>> down to a simple shift, which becomes a single cycle CPU instruction.  So
>>>>> this is not a performance issue.
>>>>>
>>>>> Furthermore, it is best that the user becomes accustomed to thinking
>>>>> of a segment fundamentally as an array of bytes.  Once a segment is loaded
>>>>> with some primitive array, the segment loses the context of the type of the
>>>>> array it was loaded with; in effect, a kind of "type erasure" similar to
>>>>> Java's generics.  This is one of the reasons that MemorySegments can be so
>>>>> powerful.
>>>>>
>>>>>
>>>>> I have been programming with the "MemorySegment" concept for a number
>>>>> of years now and find that keeping in mind that segments are just bytes is
>>>>> very useful.
>>>>>
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Lee.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, May 31, 2021 at 6:56 AM Maurizio Cimadamore <
>>>>> maurizio.cimadamore at oracle.com> wrote:
>>>>>
>>>>>> Hi Lee,
>>>>>> this looks good.
>>>>>>
>>>>>> One thing I note is that there's an ambiguity as to whether the
>>>>>> segment index is expressed as a logical index, or a raw byte offset. Your
>>>>>> snippet does the latter. If we want to follow MemoryAccess, perhaps that
>>>>>> calls for 2 overloads (copyFromArrayAtIndex/copyFromArrayAtOffset), as I
>>>>>> imagine both could be useful, depending on the case?
>>>>>>
>>>>>> The javadoc will have to say something when the segment being used is
>>>>>> backed by the very array that is the source/target of the copy (we have
>>>>>> some text like that in MemorySegment::copyFrom).
>>>>>>
>>>>>> I think it would be helpful to progress further with this, add the
>>>>>> remaining templates (w/o ByteOrder, for now) test and see how it works in
>>>>>> practice.
>>>>>>
>>>>>> I will add (or just file a simple PR, so that you can just borrow
>>>>>> from it - should be a single method) something to do the segment copy with
>>>>>> swap.
>>>>>>
>>>>>> Thanks!
>>>>>> Maurizio
>>>>>> On 28/05/2021 22:37, leerho wrote:
>>>>>>
>>>>>> Maurizio,
>>>>>>
>>>>>> Again sorry about the delay.
>>>>>>
>>>>>> Attached is a template proposal for the MemoryCopy class.
>>>>>>
>>>>>> I can't complete this without your proposed new copyFrom(...) method
>>>>>> in MemorySegment. As it is written, it should work, but without the byte
>>>>>> swap capability.
>>>>>>
>>>>>> I can complete the rest of the primitives like this template, if you
>>>>>> would like with javadocs.  I could also start writing tests, but without
>>>>>> the byte-swap.
>>>>>>
>>>>>> Let me know what would be most helpful.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Lee
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 20, 2021 at 11:36 AM leerho <leerho at gmail.com> wrote:
>>>>>>
>>>>>>> Of course!
>>>>>>>
>>>>>>> On Thu, May 20, 2021 at 9:42 AM Maurizio Cimadamore <
>>>>>>> maurizio.cimadamore at oracle.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On 20/05/2021 17:13, leerho wrote:
>>>>>>>>
>>>>>>>> I am not sure if the *dstSegSlice* requires the *srcCopyLen*. I
>>>>>>>> would hope that it is smart enough to realize that the input length is
>>>>>>>> smaller than the given offset minus the segment size.
>>>>>>>>
>>>>>>>> asSlice has an overload that just takes an offset and infers the
>>>>>>>> resulting size from there.
>>>>>>>>
>>>>>>>> But that doesn't seem what you want here - as you want the slice to
>>>>>>>> have a specific size (the size of the input array).
>>>>>>>>
>>>>>>>> MemorySegment::copyFrom wants the two segments to have the same
>>>>>>>> size, so I think you need that.
>>>>>>>>
>>>>>>>> In terms of performance, there's no difference between
>>>>>>>> asSlice(offset) and asSlice(offset, size) - you have to create a new
>>>>>>>> segment anyway.
>>>>>>>>
>>>>>>>> Maurizio
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>> From my cell phone.
>>>>
>>>> --
>>> From my cell phone.
>>>
>>