Part 1 Proposal for JDK-8264594
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Jun 9 11:53:12 UTC 2021
Thanks for the background. I think my question was ill-posed, as I
realize that even in the cases you describe, you could always avoid the
need for a bulk copy with raw byte offset by doing slicing and copy.
e.g.
copyIntoArrayRaw(segment, byteOffset, long[], arrayIndex)
can be rewritten as:
copyIntoArrayLogical(segment.asSlice(byteOffset), 0, long[], arrayIndex)
So, if we find that having a physical (segment offset) vs. logical
(array offset) mismatch is too confusing, we *could* fall back to
slicing and using logical offsets only.
Maurizio
On 08/06/2021 17:54, leerho wrote:
>
> Question: do you have concrete use cases where you'd like to copy
> contents of a double[] at non-aligned positions inside a segment?
>
>
> Absolutely! In fact, specifying the offset in bytes is probably the
> most common. This is because many of our structs start with a
> preamble of a mix of byte, short, int, and float fields, and then
> followed by several data array regions. For example an array of ints
> followed by an array of longs, which in some cases may not be unit
> aligned with the base of the segment. This can be particularly true
> of structs that are defined by other users where alignment cannot be
> guaranteed. In fact, one of our customers has years of historical
> data (petabytes) that are not at all byte aligned. So to assume that
> everything is neatly byte aligned would be a big mistake.
> Unfortunately, many engineers that have created files of data over the
> years did so without any awareness of the need for byte-alignment.
>
> Lee.
>
> On Tue, Jun 8, 2021 at 7:00 AM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com
> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>
> Hi Lee,
> thanks for this, I'll try to put together a PR in the next week or so.
>
> Re. overloads I was thinking about how the ByteBuffer API goes
> about them, and, in I see that that API doesn't have the same
> issues, given that ByteBuffer only provides bulk copy for byte
> arrays. If you want to bulk copy a double[] you have to convert to
> a DoubleBuffer first and then copy (and then use logical indices
> for the buffer too).
>
> Question: do you have concrete use cases where you'd like to copy
> contents of a double[] at non-aligned positions inside a segment?
>
> I'm asking because, in principle we could leave the methods in
> CopyMemory cover the aligned, logical indices only part of the
> API. And ask the user to fall back to MemorySegment::copyFrom if
> they really really want unaligned (which should be uncommon?).
>
> Cheers
> Maurizio
>
> On 08/06/2021 00:55, leerho wrote:
>> Maurizio,
>> I have updated MemoryCopy and TestMemoryCopy by extending the
>> variable names to be even more clear, for example:
>>
>> * srcArr -> srcArray
>> * srcSeg -> srcSegment
>> * srcCopyLen -> srcCopyLengthInts
>> * srcIndex -> srcIndexLongs
>> * Etc.
>>
>> I also added a comment in the class javadoc describing the
>> difference between variables with the word "index" and ones with
>> the word "offset", etc.
>>
>> I have looked at what it would take to add overloads that refer
>> to the segment offset in array element units. It can be done,
>> but the names of these new methods will be kinda ugly or awkward:
>>
>> * copyFromArrayAtIndex -- confusing since the array uses index
>> units already
>> * copyAtIndexFromArray -- not much better
>> * copyFromArrayWithIndex -- nah
>> * copyFromArrayToSegementUsingIndex -- a little more clear, but
>> really awkward
>>
>> If you have any better ideas, I am open. (But IMHO, I don't
>> think we need this :) )
>>
>> Please let me know what the next step should be :)
>>
>> Thanks!
>>
>> Lee.
>>
>> On Fri, Jun 4, 2021 at 7:09 AM leerho <leerho at gmail.com
>> <mailto:leerho at gmail.com>> wrote:
>>
>> Maurizio,
>> Thank you for your reply. I agree with your observation in
>> the difference between the single item case and the bulk
>> copy. I guess I became so familiar with how offsets are
>> handled in Unsafe where the single element and bulk copy
>> cases the references to memory are always in bytes, it seamed
>> so simple and natural.
>>
>> In my experience anyway, reading and writing data to off-heap
>> almost always involves a mix of preamble elements and bulk
>> copy arrays where it was necessary to track the offset in
>> bytes, where the multibyte elements may not be correctly
>> aligned to the base of the allocated segment, or God forbid,
>> the arrays.
>>
>> What is most important is getting this into the JDK, and if I
>> have to add more overloads, I will :)
>>
>> Lee.
>>
>> On Fri, Jun 4, 2021 at 2:34 AM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com
>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>
>>
>> On 04/06/2021 01:32, leerho wrote:
>>> When I first saw all the logical index overloads in
>>> MemoryAccess I shuddered. It is a lot of API clutter as
>>> far as I’m concerned. But just because we made that
>>> mistake with MemoryAccess doesn’t mean we have to
>>> continue propagating it.
>>
>> I'm sorry, I just don't see that as being a "mistake".
>> Code like this:
>>
>> ```
>> for (int i = 0 ; i < limit ; i++) {
>> long val = MemoryAccess.getLongAtIndex(segment, i);
>> ...
>> }
>> ```
>>
>> Is _extremely_ common. While it's true that memory
>> segment is an unbiased array of bytes, there are times
>> you want to access it in a more structured form - and
>> saying "you can't do that" because it adds overloads
>> doesn't seem like a great solution to me.
>>
>>>
>>> Also I have explicitly named the two offsets very
>>> differently to try to make it clear that they are very
>>> different. I guess I could go further and name the
>>> “srcIndex” -> “srcIndexLongs” for the long array case,
>>> for example. But since “index” is used elsewhere as the
>>> logical index I assumed that this would not be necessary.
>>>
>>> But I would much rather use longer, more explicit names
>>> than propagate API clutter.
>>
>> I was thinking overnight that, perhaps, the memory copy
>> and the single element cases are different. When you do
>> single access (see above) you do that frequently in a
>> loop - and you often get elements of the same type. As
>> stated above, asking developers to work in terms of
>> offsets seems a bit odd, when they are really accessing a
>> flat array of element type T.
>>
>> In the case of bulk copy, I don't think there's the same
>> degree of pressure - you will probably call the bulk copy
>> at the beginning, or at the end of your computation (or
>> both ends!) - and having an offset shift is more tolerable.
>>
>> So, I'm open to evaluating something more minimal, and
>> see how far we can get with it.
>>
>> Maurizio
>>
>>>
>>> Lee
>>>
>>> On Thu, Jun 3, 2021 at 1:26 PM Maurizio Cimadamore
>>> <maurizio.cimadamore at oracle.com
>>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>
>>> Thanks.
>>>
>>> While I agree that a segment is just a bunch of
>>> bytes, I think also think that having an API which
>>> takes two indices at the same time, one logical and
>>> one physical is kind of evil. I bet that a lot of
>>> developers will get that wrong, at least the first time.
>>>
>>> What you say about segments being an array of bytes,
>>> and thus not needing index-based overloads can also
>>> be said for MemoryAccess class - but I believe that
>>> the index-based accessors in the are quite popular.
>>>
>>> But overall, my worry is not performances, or
>>> symmetry with respect to other APIs - my main worry
>>> is what I said above: that people will just assume
>>> that indices are logical for both the segment and
>>> the array - so overloads with clearly different
>>> names should help quite a bit IMHO.
>>>
>>>
>>> Maurizio
>>>
>>> On 03/06/2021 20:50, leerho wrote:
>>>> Maurizio,
>>>> Here is the MemoryCopy class (w/o byte swap) and
>>>> the TestMemoryCopy class (w/o testing byte swap).
>>>>
>>>> I copied from the TestMemoryAccess class in that I
>>>> noticed that there is no package statement. I am
>>>> not sure what @SuppressWarnings are allowed in your
>>>> test environment, I suspect I should remove all of
>>>> them.
>>>>
>>>> When the MemorySegment copyFrom(MemorySegment,
>>>> MemoryLayout, ByteOrder) becomes available I can
>>>> add the tests for that.
>>>>
>>>> *IMHO*
>>>>
>>>> I feel strongly that there is no need to
>>>> additionally overload these methods with the
>>>> segment offset specified in array index units.
>>>> It is so trivial to convert from one to the
>>>> other and it can be done easily in the method
>>>> argument with a simple multiply by 2, 4 or 8.
>>>> This should compile down to a simple shift,
>>>> which becomes a single cycle CPU instruction.
>>>> So this is not a performance issue.
>>>>
>>>> Furthermore, it is best that the user becomes
>>>> accustomed to thinking of a segment
>>>> fundamentally as an array of bytes. Once a
>>>> segment is loaded with some primitive array,
>>>> the segment loses the context of the type of
>>>> the array it was loaded with; in effect, a kind
>>>> of "type erasure" similar to Java's generics.
>>>> This is one of the reasons that MemorySegments
>>>> can be so powerful.
>>>>
>>>>
>>>> I have been programming with the
>>>> "MemorySegment" concept for a number of years
>>>> now and find that keeping in mind that segments
>>>> are just bytes is very useful.
>>>>
>>>>
>>>> Let me know what you think.
>>>>
>>>> Lee.
>>>>
>>>>
>>>>
>>>> On Mon, May 31, 2021 at 6:56 AM Maurizio Cimadamore
>>>> <maurizio.cimadamore at oracle.com
>>>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>>>
>>>> Hi Lee,
>>>> this looks good.
>>>>
>>>> One thing I note is that there's an ambiguity
>>>> as to whether the segment index is expressed as
>>>> a logical index, or a raw byte offset. Your
>>>> snippet does the latter. If we want to follow
>>>> MemoryAccess, perhaps that calls for 2
>>>> overloads
>>>> (copyFromArrayAtIndex/copyFromArrayAtOffset),
>>>> as I imagine both could be useful, depending on
>>>> the case?
>>>>
>>>> The javadoc will have to say something when the
>>>> segment being used is backed by the very array
>>>> that is the source/target of the copy (we have
>>>> some text like that in MemorySegment::copyFrom).
>>>>
>>>> I think it would be helpful to progress further
>>>> with this, add the remaining templates (w/o
>>>> ByteOrder, for now) test and see how it works
>>>> in practice.
>>>>
>>>> I will add (or just file a simple PR, so that
>>>> you can just borrow from it - should be a
>>>> single method) something to do the segment copy
>>>> with swap.
>>>>
>>>> Thanks!
>>>> Maurizio
>>>>
>>>> On 28/05/2021 22:37, leerho wrote:
>>>>> Maurizio,
>>>>>
>>>>> Again sorry about the delay.
>>>>>
>>>>> Attached is a template proposal for the
>>>>> MemoryCopy class.
>>>>>
>>>>> I can't complete this without your proposed
>>>>> new copyFrom(...) method in MemorySegment. As
>>>>> it is written, it should work, but without the
>>>>> byte swap capability.
>>>>>
>>>>> I can complete the rest of the primitives like
>>>>> this template, if you would like with
>>>>> javadocs. I could also start writing tests,
>>>>> but without the byte-swap.
>>>>>
>>>>> Let me know what would be most helpful.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Lee
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 20, 2021 at 11:36 AM leerho
>>>>> <leerho at gmail.com <mailto:leerho at gmail.com>>
>>>>> wrote:
>>>>>
>>>>> Of course!
>>>>>
>>>>> On Thu, May 20, 2021 at 9:42 AM Maurizio
>>>>> Cimadamore <maurizio.cimadamore at oracle.com
>>>>> <mailto:maurizio.cimadamore at oracle.com>>
>>>>> wrote:
>>>>>
>>>>>
>>>>> On 20/05/2021 17:13, leerho wrote:
>>>>>> I am not sure if the /dstSegSlice/
>>>>>> requires the /srcCopyLen/. I would
>>>>>> hope that it is smart enough to
>>>>>> realize that the input length is
>>>>>> smaller than the given offset minus
>>>>>> the segment size.
>>>>>
>>>>> asSlice has an overload that just
>>>>> takes an offset and infers the
>>>>> resulting size from there.
>>>>>
>>>>> But that doesn't seem what you want
>>>>> here - as you want the slice to have a
>>>>> specific size (the size of the input
>>>>> array).
>>>>>
>>>>> MemorySegment::copyFrom wants the two
>>>>> segments to have the same size, so I
>>>>> think you need that.
>>>>>
>>>>> In terms of performance, there's no
>>>>> difference between asSlice(offset) and
>>>>> asSlice(offset, size) - you have to
>>>>> create a new segment anyway.
>>>>>
>>>>> Maurizio
>>>>>
>>>>>
>>> --
>>> From my cell phone.
>>
>> --
>> From my cell phone.
>>
More information about the panama-dev
mailing list