[External] : Re: Class files in ByteBuffer
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Mar 21 15:58:31 UTC 2025
On 21/03/2025 15:44, David Lloyd wrote:
>
>
> On Fri, Mar 21, 2025 at 10:25 AM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
>
> On 21/03/2025 15:17, David Lloyd wrote:
>> Yes, you *could* wrap a byte[] with a MemorySegment, but then
>> you'd just have a heap memory segment; this is probably not what
>> the user is looking for (otherwise they'd probably just build to
>> a byte[] to begin with). By using the user-supplied allocator, I
>> can (for example) write the class bytes directly to a
>> memory-mapped file.
>
> Ah! I now see that first allocator parameter -- which makes sense.
>
> I'd suggest to turn that into SegmentAllocator -- note that Arena
> is also a SegmentAllocator, so you can pass either your own
> allocating lambda, or directly use an arena which is nice (I think).
>
>
> Ah, that's an interesting idea. The idea with using the plain function
> is that the user would explicitly be in charge of deciding all of the
> characteristics of the buffer (including, for example, alignment). I'm
> not quite sure how that shakes out (ergonomically speaking) with using
> a `SegmentAllocator`, because either I'd have to send in the alignment
> (which means, I guess, accepting it as a parameter in the
> `buildToMemorySegment` methods), or else rely on the user to override
> that method in their allocator. With a plain function, the user could
> always pass in `mySegmentAllocator::allocate` or `size ->
> mySegmentAllocator.allocate(size, 8)` or whatever. Also, with a plain
> function, I could pass one in which easily yields a subsegment of a
> parent segment e.g. `size -> parentSegment.asSlice(offset, size)`.
> Would it still be easy to do this while accepting a `SegmentAllocator`?
Note: SementAllocator is a functional interface. So you can always pass
a lambda to implement it -- e.g.
buildToMemorySegment((size, _) -> getMeASegment(size));
And, SegmentAllocator already suports sliced allocation - see
SegmentAllocator::slicingAllocator (which creates a segment allocator
from an existing segment and keeps slicing from it, at consecutive
offsets until it runs out), or SegmentAllocator::prefixAllocator (which
creates a segment allocator from an existing segment and keeps slicing
from it from the start of the segment -- possibly overwriting each time)
>
> ByteBuffer don't have the allocation abstraction -- so I wonder if
> we really should let them go, and maybe only add the segment API
> with the allocator parameter. Or maybe have a simpler API for
> ByteBuffer w/o allocator parameter, which always allocates a heap
> buffer. Then if users want something more complex (e.g. a mapped
> byte buffer) they can use the MS API instead, and then wrap the MS
> into a BB after the fact.
>
>
> Well, ByteBuffer has the same allocation abstraction that (for
> example) arrays do when you're calling `Collection.toArray(generator)`
> - that is, `IntFunction<T>`. Using this abstraction, one can basically
> get all the same benefits described above - the ability to select
> direct or heap, the ability to return a subslice of a parent buffer,
> etc. But I agree, it seems that semantically the buffer stuff is
> pretty much exactly redundant with respect to the MemorySegment
> variations (since we have `MemorySegment.asByteBuffer()` and
> `MemorySegment.ofBuffer()`), so I could see dropping it and just
> living with the minor asymmetry with `ClassLoader.defineClass(...)`.
Yeah -- we could go both ways... I guess my sense is that since
SegmentAllocator is an official thing (TM ;-) ), it has more right to be
in the API than a "random" BB-generating function. Also, I guess what
I'm saying is that, if you squint, the memory segment-accepting methods
are really the primitve ones as all the others (including byte[]) can be
derived from there. But, at the moment we know we can't go all the way
down there...
Maurizio
> Maurizio
>
>
>>
>> That said, the buildTo* variants exist because Brian suggested
>> that the API might be unacceptably asymmetrical otherwise. Either
>> way, the part that matters the most to me is the parsing side.
>>
>> The idea with the API is that, while the current impl would need
>> to copy to a byte array, it is at least theoretically possible
>> that in the future, that may change. As a user of the
>> byte[]-based API, I am already copying from my direct buffer, so
>> it's not worse than the status quo. By putting the copy into the
>> JDK, if the JDK does get enhanced someday to use MemorySegment
>> internally, or maybe Unsafe, or whatever, then I'll get the
>> benefit of this change. If it doesn't, then I'm no worse off than
>> I am today (slightly better off actually, because it saves me a
>> step going in). Supporting this use case would be beneficial for
>> the same reason that it is beneficial to be able to define
>> classes out of direct buffers (which has been supported since JDK
>> 1.5).
>>
>> The thing I'm mainly in doubt about is that the ability to parse
>> from or generate to byte buffers is potentially redundant with
>> respect to MemorySegment. It would just be a bit weird if I could
>> define a class using an array or a byte buffer, but parsing
>> classes used arrays or memory segments. Is it weird enough to
>> justify the third API variant? I don't know.
>>
>> On Fri, Mar 21, 2025 at 9:27 AM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com> wrote:
>>
>> I looked at the PR and I had a similar reaction as Adam.
>>
>> It seems like the buildToXYZ methods have one less copy,
>> but... if the API returns a byte[], you can always wrap the
>> byte[] as either a MemorySegment (MemorySegment::ofArrays) or
>> a ByteBuffer (ByteBuffer::wrap). These methods do _not_ copy.
>>
>> The way I see it, is that this PR doesn't remove the need to
>> copy on the way in (that would be a much more complex
>> change), and it doesn't improve things significantly _on the
>> way out_ (because, that side is already covered by existing
>> API/views support).
>>
>> So, while I'm not strongly opposed to the changes in the PR,
>> at the same time I don't see a lot of added value in the new
>> methods. At the same time I'm a bit skeptical to commit to
>> the new API in without doing the full exercise to see how
>> these methods might be implemented.
>>
>> A good motto in API design is "when in doubt, leave it out"
>> -- is this one of those cases?
>>
>> Maurizio
>>
>>
>>
>> On 21/03/2025 13:34, David Lloyd wrote:
>>> The idea is that in the future, it may be possible to do
>>> these things without the extra copy. In the PR, I found that
>>> we can already build to memory segments and byte buffers
>>> without more copies than what we're doing for arrays. On the
>>> parsing side, we can already sometimes work without copying
>>> in some cases that the user won't have access to (e.g.
>>> accessing the backing array of a memory segment, even if
>>> it's read-only). It's not hard to imagine that we could
>>> possibly have a way to parse without the extra copy in the
>>> native memory case in the future, without impacting current
>>> performance on arrays. But without the API support, it can
>>> never be possible.
>>>
>>> On Fri, Mar 21, 2025 at 8:26 AM Adam Sotona
>>> <adam.sotona at oracle.com> wrote:
>>>
>>> I’m more thinking that the API already provides all the
>>> important entries and conversion from and to
>>> `MemorySegment` can be done by simple call of
>>> `MemorySegment::toArray` and `MemorySegment::ofArray`.
>>>
>>> *From: *David Lloyd <david.lloyd at redhat.com>
>>> *Date: *Friday, 21 March 2025 at 13:37
>>> *To: *Adam Sotona <adam.sotona at oracle.com>
>>> *Cc: *classfile-api-dev at openjdk.org
>>> <classfile-api-dev at openjdk.org>
>>> *Subject: *[External] : Re: Class files in ByteBuffer
>>>
>>> Please have a look at the PR. If you feel the API
>>> surface has grown too much, maybe removing the
>>> `ByteBuffer` variants is a logical step, since users can
>>> always wrap a `ByteBuffer` with a `MemorySegment`? If
>>> you could comment on the PR if you feel that to be the
>>> case, I would appreciate it.
>>>
>>> Thanks.
>>>
>>> On Thu, Mar 20, 2025 at 4:09 PM Adam Sotona
>>> <adam.sotona at oracle.com> wrote:
>>>
>>> I’m sorry to join the discussion a bit late.
>>>
>>> Here are the points to consider:
>>>
>>> * Class-File API is implementation is after many
>>> rounds of performance optimizations purely based
>>> on byte arrays.
>>> * Internal use of ByteBuffer has been removed from
>>> the implementation, as it caused significant JDK
>>> bootstrap performance regression.
>>> * Enormous amount of work has been spent on the
>>> API surface reduction and removal of all
>>> unnecessary “conveniences”.
>>>
>>> Adam
>>>
>>> *From: *classfile-api-dev
>>> <classfile-api-dev-retn at openjdk.org> on behalf of
>>> David Lloyd <david.lloyd at redhat.com>
>>> *Date: *Thursday, 20 March 2025 at 21:11
>>> *To: *classfile-api-dev at openjdk.org
>>> <classfile-api-dev at openjdk.org>
>>> *Subject: *Re: Class files in ByteBuffer
>>>
>>> I've opened a bug [1] and pull request [2]
>>> incorporating this discussion (more or less). I've
>>> implemented support for both `MemorySegment` and
>>> `ByteBuffer`, but this could be revisited if it
>>> doesn't look OK. The implementation is not terribly
>>> invasive for now, only grabbing a few low-hanging
>>> optimizations.
>>>
>>> [1] https://bugs.openjdk.org/browse/JDK-8352536
>>> <https://bugs.openjdk.org/browse/JDK-8352536>
>>>
>>> [2] https://github.com/openjdk/jdk/pull/24139
>>> <https://urldefense.com/v3/__https:/github.com/openjdk/jdk/pull/24139__;!!ACWV5N9M2RV99hQ!MNdgtt56CgKWQQak5PkwC2t1L1q0tTZrT2fqzVPHYo3jer5CUfysIIx9uMuCoLb27YGoQWuUDgu3s_h8ZhlN-w$>
>>>
>>> On Mon, Mar 10, 2025 at 12:38 PM David Lloyd
>>> <david.lloyd at redhat.com> wrote:
>>>
>>> When defining a class in the JDK, one may either
>>> use a byte array or a byte buffer to hold the
>>> contents of the class. The latter is useful when
>>> (for example) a JAR file containing uncompressed
>>> classes is mapped into memory. Thus, some class
>>> loaders depend on this form of the API for class
>>> definition.
>>>
>>> If I were to supplement such a class loader with
>>> a class transformation step based on the class
>>> file API, I would have to copy the bytes of each
>>> class on to the heap as a byte[] before I could
>>> begin parsing it. This is potentially expensive,
>>> and definitely awkward.
>>>
>>> After transformation, it doesn't really matter
>>> if you have a byte[] or ByteBuffer because
>>> either way, the class can be defined directly.
>>>
>>> It would be nice if the class file parser could
>>> accept either a byte[] or a ByteBuffer. I did a
>>> quick bit of exploratory work and it looks like
>>> porting the code to read from a ByteBuffer
>>> instead of a byte[] (using ByteBuffer.wrap()
>>> for the array case) would be largely
>>> straightforward *except* for the code which
>>> parses UTF-8 constants into strings. Also there
>>> could be some small performance differences
>>> (maybe positive, maybe negative) depending on
>>> how the buffer is accessed.
>>>
>>> Is this something that might be considered?
>>>
>>> --
>>>
>>> - DML • he/him
>>>
>>>
>>> --
>>>
>>> - DML • he/him
>>>
>>>
>>> --
>>>
>>> - DML • he/him
>>>
>>>
>>>
>>> --
>>> - DML • he/him
>>
>>
>>
>> --
>> - DML • he/him
>
>
>
> --
> - DML • he/him
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20250321/e7b584e1/attachment-0001.htm>
More information about the classfile-api-dev
mailing list