Part 1 Proposal for JDK-8264594

Tue Jun 8 16:54:50 UTC 2021

>
> Question: do you have concrete use cases where you'd like to copy contents
> of a double[] at non-aligned positions inside a segment?

Absolutely!   In fact, specifying the offset in bytes is probably the most
common.  This is because many of our structs start with a preamble of a mix
of byte, short, int, and float fields, and then followed by several data
array regions.  For example an array of ints followed by an array of longs,
which in some cases may not be unit aligned with the base of the segment.
This can be particularly true of structs that are defined by other users
where alignment cannot be guaranteed.   In fact, one of our customers has
years of historical data (petabytes) that are not at all byte aligned.  So
to assume that everything is neatly byte aligned would be a big mistake.
Unfortunately, many engineers that have created files of data over the
years did so without any awareness of the need for byte-alignment.

Lee.

On Tue, Jun 8, 2021 at 7:00 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Hi Lee,
> thanks for this, I'll try to put together a PR in the next week or so.
>
> Re. overloads I was thinking about how the ByteBuffer API goes about them,
> and, in I see that that API doesn't have the same issues, given that
> ByteBuffer only provides bulk copy for byte arrays. If you want to bulk
> copy a double[] you have to convert to a DoubleBuffer first and then copy
> (and then use logical indices for the buffer too).
>
> Question: do you have concrete use cases where you'd like to copy contents
> of a double[] at non-aligned positions inside a segment?
>
> I'm asking because, in principle we could leave the methods in CopyMemory
> cover the aligned, logical indices only part of the API. And ask the user
> to fall back to MemorySegment::copyFrom if they really really want
> unaligned (which should be uncommon?).
>
> Cheers
> Maurizio
> On 08/06/2021 00:55, leerho wrote:
>
> Maurizio,
> I have updated MemoryCopy and TestMemoryCopy by extending the variable
> names to be even more clear, for example:
>
>    - srcArr -> srcArray
>    - srcSeg -> srcSegment
>    - srcCopyLen -> srcCopyLengthInts
>    - srcIndex -> srcIndexLongs
>    - Etc.
>
> I also added a comment in the class javadoc describing the difference
> between variables with the word "index" and ones with the word "offset",
> etc.
>
> I have looked at what it would take to add overloads that refer to the
> segment offset in array element units.  It can be done, but the names of
> these new methods will be kinda ugly or awkward:
>
>    - copyFromArrayAtIndex -- confusing since the array uses index units
>    already
>    - copyAtIndexFromArray -- not much better
>    - copyFromArrayWithIndex -- nah
>    - copyFromArrayToSegementUsingIndex -- a little more clear, but really
>    awkward
>
> If you have any better ideas, I am open.  (But IMHO, I don't think we need
> this :) )
>
> Please let me know what the next step should be :)
>
> Thanks!
>
> Lee.
>
> On Fri, Jun 4, 2021 at 7:09 AM leerho <leerho at gmail.com> wrote:
>
>> Maurizio,
>> Thank you for your reply.  I agree with your observation in the
>> difference between the single item case and the bulk copy.  I guess I
>> became so familiar with how offsets are handled in Unsafe where the single
>> element and bulk copy cases the references to memory are always in bytes,
>> it seamed so simple and natural.
>>
>> In my experience anyway, reading and writing data to off-heap almost
>> always involves a mix of preamble elements and bulk copy arrays where it
>> was necessary to track the offset in bytes, where the multibyte elements
>> may not be correctly aligned to the base of the allocated segment, or God
>> forbid, the arrays.
>>
>> What is most important is getting this into the JDK, and if I have to add
>> more overloads, I will :)
>>
>> Lee.
>>
>> On Fri, Jun 4, 2021 at 2:34 AM Maurizio Cimadamore <
>> maurizio.cimadamore at oracle.com> wrote:
>>
>>>
>>> On 04/06/2021 01:32, leerho wrote:
>>>
>>> When I first saw all the logical index overloads in MemoryAccess I
>>> shuddered.  It is a lot of API clutter as far as I’m concerned.  But just
>>> because we made that mistake with MemoryAccess doesn’t mean we have to
>>> continue propagating it.
>>>
>>> I'm sorry, I just don't see that as being a "mistake". Code like this:
>>>
>>> ```
>>> for (int i = 0 ; i < limit ; i++) {
>>>    long val = MemoryAccess.getLongAtIndex(segment, i);
>>>    ...
>>> }
>>> ```
>>>
>>> Is _extremely_ common. While it's true that memory segment is an
>>> unbiased array of bytes, there are times you want to access it in a more
>>> structured form - and saying "you can't do that" because it adds overloads
>>> doesn't seem like a great solution to me.
>>>
>>>
>>> Also I have explicitly named the two offsets very differently to try to
>>> make it clear that they are very different.  I guess I could go further and
>>> name the “srcIndex” -> “srcIndexLongs” for the long array case, for
>>> example.  But since “index” is used elsewhere as the logical index I
>>> assumed that this would not be necessary.
>>>
>>> But I would much rather use longer, more explicit names than propagate
>>> API clutter.
>>>
>>> I was thinking overnight that, perhaps, the memory copy and the single
>>> element cases are different. When you do single access (see above) you do
>>> that frequently in a loop - and you often get elements of the same type. As
>>> stated above, asking developers to work in terms of offsets seems a bit
>>> odd, when they are really accessing a flat array of element type T.
>>>
>>> In the case of bulk copy, I don't think there's the same degree of
>>> pressure - you will probably call the bulk copy at the beginning, or at the
>>> end of your computation (or both ends!) - and having an offset shift is
>>> more tolerable.
>>>
>>> So, I'm open to evaluating something more minimal, and see how far we
>>> can get with it.
>>>
>>> Maurizio
>>>
>>>
>>> Lee
>>>
>>> On Thu, Jun 3, 2021 at 1:26 PM Maurizio Cimadamore <
>>> maurizio.cimadamore at oracle.com> wrote:
>>>
>>>> Thanks.
>>>>
>>>> While I agree that a segment is just a bunch of bytes, I think also
>>>> think that having an API which takes two indices at the same time, one
>>>> logical and one physical is kind of evil. I bet that a lot of developers
>>>> will get that wrong, at least the first time.
>>>>
>>>> What you say about segments being an array of bytes, and thus not
>>>> needing index-based overloads can also be said for MemoryAccess class - but
>>>> I believe that the index-based accessors in the are quite popular.
>>>>
>>>> But overall, my worry is not performances, or symmetry with respect to
>>>> other APIs - my main worry is what I said above: that people will just
>>>> assume that indices are logical for both the segment and the array - so
>>>> overloads with clearly different names should help quite a bit IMHO.
>>>>
>>>>
>>>> Maurizio
>>>> On 03/06/2021 20:50, leerho wrote:
>>>>
>>>> Maurizio,
>>>> Here is the MemoryCopy class (w/o byte swap) and the TestMemoryCopy
>>>> class (w/o testing byte swap).
>>>>
>>>> I copied from the TestMemoryAccess class in that I noticed that there
>>>> is no package statement.  I am not sure what @SuppressWarnings are allowed
>>>> in your test environment, I suspect I should remove all of them.
>>>>
>>>> When the MemorySegment copyFrom(MemorySegment, MemoryLayout, ByteOrder)
>>>> becomes available I can add the tests for that.
>>>>
>>>> *IMHO*
>>>>
>>>> I feel strongly that there is no need to additionally overload these
>>>> methods with the segment offset specified in array index units.  It is so
>>>> trivial to convert from one to the other and it can be done easily in the
>>>> method argument with a simple multiply by 2, 4 or 8. This should compile
>>>> down to a simple shift, which becomes a single cycle CPU instruction.  So
>>>> this is not a performance issue.
>>>>
>>>> Furthermore, it is best that the user becomes accustomed to thinking of
>>>> a segment fundamentally as an array of bytes.  Once a segment is loaded
>>>> with some primitive array, the segment loses the context of the type of the
>>>> array it was loaded with; in effect, a kind of "type erasure" similar to
>>>> Java's generics.  This is one of the reasons that MemorySegments can be so
>>>> powerful.
>>>>
>>>>
>>>> I have been programming with the "MemorySegment" concept for a number
>>>> of years now and find that keeping in mind that segments are just bytes is
>>>> very useful.
>>>>
>>>>
>>>> Let me know what you think.
>>>>
>>>> Lee.
>>>>
>>>>
>>>>
>>>> On Mon, May 31, 2021 at 6:56 AM Maurizio Cimadamore <
>>>> maurizio.cimadamore at oracle.com> wrote:
>>>>
>>>>> Hi Lee,
>>>>> this looks good.
>>>>>
>>>>> One thing I note is that there's an ambiguity as to whether the
>>>>> segment index is expressed as a logical index, or a raw byte offset. Your
>>>>> snippet does the latter. If we want to follow MemoryAccess, perhaps that
>>>>> calls for 2 overloads (copyFromArrayAtIndex/copyFromArrayAtOffset), as I
>>>>> imagine both could be useful, depending on the case?
>>>>>
>>>>> The javadoc will have to say something when the segment being used is
>>>>> backed by the very array that is the source/target of the copy (we have
>>>>> some text like that in MemorySegment::copyFrom).
>>>>>
>>>>> I think it would be helpful to progress further with this, add the
>>>>> remaining templates (w/o ByteOrder, for now) test and see how it works in
>>>>> practice.
>>>>>
>>>>> I will add (or just file a simple PR, so that you can just borrow from
>>>>> it - should be a single method) something to do the segment copy with swap.
>>>>>
>>>>> Thanks!
>>>>> Maurizio
>>>>> On 28/05/2021 22:37, leerho wrote:
>>>>>
>>>>> Maurizio,
>>>>>
>>>>> Again sorry about the delay.
>>>>>
>>>>> Attached is a template proposal for the MemoryCopy class.
>>>>>
>>>>> I can't complete this without your proposed new copyFrom(...) method
>>>>> in MemorySegment. As it is written, it should work, but without the byte
>>>>> swap capability.
>>>>>
>>>>> I can complete the rest of the primitives like this template, if you
>>>>> would like with javadocs.  I could also start writing tests, but without
>>>>> the byte-swap.
>>>>>
>>>>> Let me know what would be most helpful.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Lee
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 20, 2021 at 11:36 AM leerho <leerho at gmail.com> wrote:
>>>>>
>>>>>> Of course!
>>>>>>
>>>>>> On Thu, May 20, 2021 at 9:42 AM Maurizio Cimadamore <
>>>>>> maurizio.cimadamore at oracle.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 20/05/2021 17:13, leerho wrote:
>>>>>>>
>>>>>>> I am not sure if the *dstSegSlice* requires the *srcCopyLen*. I
>>>>>>> would hope that it is smart enough to realize that the input length is
>>>>>>> smaller than the given offset minus the segment size.
>>>>>>>
>>>>>>> asSlice has an overload that just takes an offset and infers the
>>>>>>> resulting size from there.
>>>>>>>
>>>>>>> But that doesn't seem what you want here - as you want the slice to
>>>>>>> have a specific size (the size of the input array).
>>>>>>>
>>>>>>> MemorySegment::copyFrom wants the two segments to have the same
>>>>>>> size, so I think you need that.
>>>>>>>
>>>>>>> In terms of performance, there's no difference between
>>>>>>> asSlice(offset) and asSlice(offset, size) - you have to create a new
>>>>>>> segment anyway.
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>>
>>>>>>> --
>>> From my cell phone.
>>>
>>> --
>> From my cell phone.
>>
>