[External] : Re: Class files in ByteBuffer

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Mar 21 15:58:31 UTC 2025


On 21/03/2025 15:44, David Lloyd wrote:
>
>
> On Fri, Mar 21, 2025 at 10:25 AM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>
>     On 21/03/2025 15:17, David Lloyd wrote:
>>     Yes, you *could* wrap a byte[] with a MemorySegment, but then
>>     you'd just have a heap memory segment; this is probably not what
>>     the user is looking for (otherwise they'd probably just build to
>>     a byte[] to begin with). By using the user-supplied allocator, I
>>     can (for example) write the class bytes directly to a
>>     memory-mapped file.
>
>     Ah! I now see that first allocator parameter -- which makes sense.
>
>     I'd suggest to turn that into SegmentAllocator -- note that Arena
>     is also a SegmentAllocator, so you can pass either your own
>     allocating lambda, or directly use an arena which is nice (I think).
>
>
> Ah, that's an interesting idea. The idea with using the plain function 
> is that the user would explicitly be in charge of deciding all of the 
> characteristics of the buffer (including, for example, alignment). I'm 
> not quite sure how that shakes out (ergonomically speaking) with using 
> a `SegmentAllocator`, because either I'd have to send in the alignment 
> (which means, I guess, accepting it as a parameter in the 
> `buildToMemorySegment` methods), or else rely on the user to override 
> that method in their allocator. With a plain function, the user could 
> always pass in `mySegmentAllocator::allocate` or `size -> 
> mySegmentAllocator.allocate(size, 8)` or whatever. Also, with a plain 
> function, I could pass one in which easily yields a subsegment of a 
> parent segment e.g. `size -> parentSegment.asSlice(offset, size)`. 
> Would it still be easy to do this while accepting a `SegmentAllocator`?

Note: SementAllocator is a functional interface. So you can always pass 
a lambda to implement it -- e.g.

buildToMemorySegment((size, _) -> getMeASegment(size));

And, SegmentAllocator already suports sliced allocation - see 
SegmentAllocator::slicingAllocator (which creates a segment allocator 
from an existing segment and keeps slicing from it, at consecutive 
offsets until it runs out), or SegmentAllocator::prefixAllocator (which 
creates a segment allocator from an existing segment and keeps slicing 
from it from the start of the segment -- possibly overwriting each time)

>
>     ByteBuffer don't have the allocation abstraction -- so I wonder if
>     we really should let them go, and maybe only add the segment API
>     with the allocator parameter. Or maybe have a simpler API for
>     ByteBuffer w/o allocator parameter, which always allocates a heap
>     buffer. Then if users want something more complex (e.g. a mapped
>     byte buffer) they can use the MS API instead, and then wrap the MS
>     into a BB after the fact.
>
>
> Well, ByteBuffer has the same allocation abstraction that (for 
> example) arrays do when you're calling `Collection.toArray(generator)` 
> - that is, `IntFunction<T>`. Using this abstraction, one can basically 
> get all the same benefits described above - the ability to select 
> direct or heap, the ability to return a subslice of a parent buffer, 
> etc. But I agree, it seems that semantically the buffer stuff is 
> pretty much exactly redundant with respect to the MemorySegment 
> variations (since we have `MemorySegment.asByteBuffer()` and 
> `MemorySegment.ofBuffer()`), so I could see dropping it and just 
> living with the minor asymmetry with `ClassLoader.defineClass(...)`.

Yeah -- we could go both ways... I guess my sense is that since 
SegmentAllocator is an official thing (TM ;-) ), it has more right to be 
in the API than a "random" BB-generating function. Also, I guess what 
I'm saying is that, if you squint, the memory segment-accepting methods 
are really the primitve ones as all the others (including byte[]) can be 
derived from there. But, at the moment we know we can't go all the way 
down there...

Maurizio

>     Maurizio
>
>
>>
>>     That said, the buildTo* variants exist because Brian suggested
>>     that the API might be unacceptably asymmetrical otherwise. Either
>>     way, the part that matters the most to me is the parsing side.
>>
>>     The idea with the API is that, while the current impl would need
>>     to copy to a byte array, it is at least theoretically possible
>>     that in the future, that may change. As a user of the
>>     byte[]-based API, I am already copying from my direct buffer, so
>>     it's not worse than the status quo. By putting the copy into the
>>     JDK, if the JDK does get enhanced someday to use MemorySegment
>>     internally, or maybe Unsafe, or whatever, then I'll get the
>>     benefit of this change. If it doesn't, then I'm no worse off than
>>     I am today (slightly better off actually, because it saves me a
>>     step going in). Supporting this use case would be beneficial for
>>     the same reason that it is beneficial to be able to define
>>     classes out of direct buffers (which has been supported since JDK
>>     1.5).
>>
>>     The thing I'm mainly in doubt about is that the ability to parse
>>     from or generate to byte buffers is potentially redundant with
>>     respect to MemorySegment. It would just be a bit weird if I could
>>     define a class using an array or a byte buffer, but parsing
>>     classes used arrays or memory segments. Is it weird enough to
>>     justify the third API variant? I don't know.
>>
>>     On Fri, Mar 21, 2025 at 9:27 AM Maurizio Cimadamore
>>     <maurizio.cimadamore at oracle.com> wrote:
>>
>>         I looked at the PR and I had a similar reaction as Adam.
>>
>>         It seems like the buildToXYZ methods have one less copy,
>>         but... if the API returns a byte[], you can always wrap the
>>         byte[] as either a MemorySegment (MemorySegment::ofArrays) or
>>         a ByteBuffer (ByteBuffer::wrap). These methods do _not_ copy.
>>
>>         The way I see it, is that this PR doesn't remove the need to
>>         copy on the way in (that would be a much more complex
>>         change), and it doesn't improve things significantly _on the
>>         way out_ (because, that side is already covered by existing
>>         API/views support).
>>
>>         So, while I'm not strongly opposed to the changes in the PR,
>>         at the same time I don't see a lot of added value in the new
>>         methods. At the same time I'm a bit skeptical to commit to
>>         the new API in without doing the full exercise to see how
>>         these methods might be implemented.
>>
>>         A good motto in API design is "when in doubt, leave it out"
>>         -- is this one of those cases?
>>
>>         Maurizio
>>
>>
>>
>>         On 21/03/2025 13:34, David Lloyd wrote:
>>>         The idea is that in the future, it may be possible to do
>>>         these things without the extra copy. In the PR, I found that
>>>         we can already build to memory segments and byte buffers
>>>         without more copies than what we're doing for arrays. On the
>>>         parsing side, we can already sometimes work without copying
>>>         in some cases that the user won't have access to (e.g.
>>>         accessing the backing array of a memory segment, even if
>>>         it's read-only). It's not hard to imagine that we could
>>>         possibly have a way to parse without the extra copy in the
>>>         native memory case in the future, without impacting current
>>>         performance on arrays. But without the API support, it can
>>>         never be possible.
>>>
>>>         On Fri, Mar 21, 2025 at 8:26 AM Adam Sotona
>>>         <adam.sotona at oracle.com> wrote:
>>>
>>>             I’m more thinking that the API already provides all the
>>>             important entries and conversion from and to
>>>             `MemorySegment` can be done by simple call of
>>>             `MemorySegment::toArray` and `MemorySegment::ofArray`.
>>>
>>>             *From: *David Lloyd <david.lloyd at redhat.com>
>>>             *Date: *Friday, 21 March 2025 at 13:37
>>>             *To: *Adam Sotona <adam.sotona at oracle.com>
>>>             *Cc: *classfile-api-dev at openjdk.org
>>>             <classfile-api-dev at openjdk.org>
>>>             *Subject: *[External] : Re: Class files in ByteBuffer
>>>
>>>             Please have a look at the PR. If you feel the API
>>>             surface has grown too much, maybe removing the
>>>             `ByteBuffer` variants is a logical step, since users can
>>>             always wrap a `ByteBuffer` with a `MemorySegment`? If
>>>             you could comment on the PR if you feel that to be the
>>>             case, I would appreciate it.
>>>
>>>             Thanks.
>>>
>>>             On Thu, Mar 20, 2025 at 4:09 PM Adam Sotona
>>>             <adam.sotona at oracle.com> wrote:
>>>
>>>                 I’m sorry to join the discussion a bit late.
>>>
>>>                 Here are the points to consider:
>>>
>>>                   * Class-File API is implementation is after many
>>>                     rounds of performance optimizations purely based
>>>                     on byte arrays.
>>>                   * Internal use of ByteBuffer has been removed from
>>>                     the implementation, as it caused significant JDK
>>>                     bootstrap performance regression.
>>>                   * Enormous amount of work has been spent on the
>>>                     API surface reduction and removal of all
>>>                     unnecessary “conveniences”.
>>>
>>>                 Adam
>>>
>>>                 *From: *classfile-api-dev
>>>                 <classfile-api-dev-retn at openjdk.org> on behalf of
>>>                 David Lloyd <david.lloyd at redhat.com>
>>>                 *Date: *Thursday, 20 March 2025 at 21:11
>>>                 *To: *classfile-api-dev at openjdk.org
>>>                 <classfile-api-dev at openjdk.org>
>>>                 *Subject: *Re: Class files in ByteBuffer
>>>
>>>                 I've opened a bug [1] and pull request [2]
>>>                 incorporating this discussion (more or less). I've
>>>                 implemented support for both `MemorySegment` and
>>>                 `ByteBuffer`, but this could be revisited if it
>>>                 doesn't look OK. The implementation is not terribly
>>>                 invasive for now, only grabbing a few low-hanging
>>>                 optimizations.
>>>
>>>                 [1] https://bugs.openjdk.org/browse/JDK-8352536
>>>                 <https://bugs.openjdk.org/browse/JDK-8352536>
>>>
>>>                 [2] https://github.com/openjdk/jdk/pull/24139
>>>                 <https://urldefense.com/v3/__https:/github.com/openjdk/jdk/pull/24139__;!!ACWV5N9M2RV99hQ!MNdgtt56CgKWQQak5PkwC2t1L1q0tTZrT2fqzVPHYo3jer5CUfysIIx9uMuCoLb27YGoQWuUDgu3s_h8ZhlN-w$>
>>>
>>>                 On Mon, Mar 10, 2025 at 12:38 PM David Lloyd
>>>                 <david.lloyd at redhat.com> wrote:
>>>
>>>                     When defining a class in the JDK, one may either
>>>                     use a byte array or a byte buffer to hold the
>>>                     contents of the class. The latter is useful when
>>>                     (for example) a JAR file containing uncompressed
>>>                     classes is mapped into memory. Thus, some class
>>>                     loaders depend on this form of the API for class
>>>                     definition.
>>>
>>>                     If I were to supplement such a class loader with
>>>                     a class transformation step based on the class
>>>                     file API, I would have to copy the bytes of each
>>>                     class on to the heap as a byte[] before I could
>>>                     begin parsing it. This is potentially expensive,
>>>                     and definitely awkward.
>>>
>>>                     After transformation, it doesn't really matter
>>>                     if you have a byte[] or ByteBuffer because
>>>                     either way, the class can be defined directly.
>>>
>>>                     It would be nice if the class file parser could
>>>                     accept either a byte[] or a ByteBuffer. I did a
>>>                     quick bit of exploratory work and it looks like
>>>                     porting the code to read from a ByteBuffer
>>>                     instead of a byte[]  (using ByteBuffer.wrap()
>>>                     for the array case) would be largely
>>>                     straightforward *except* for the code which
>>>                     parses UTF-8 constants into strings. Also there
>>>                     could be some small performance differences
>>>                     (maybe positive, maybe negative) depending on
>>>                     how the buffer is accessed.
>>>
>>>                     Is this something that might be considered?
>>>
>>>                     -- 
>>>
>>>                     - DML • he/him
>>>
>>>
>>>                 -- 
>>>
>>>                 - DML • he/him
>>>
>>>
>>>             -- 
>>>
>>>             - DML • he/him
>>>
>>>
>>>
>>>         -- 
>>>         - DML • he/him
>>
>>
>>
>>     -- 
>>     - DML • he/him
>
>
>
> -- 
> - DML • he/him
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20250321/e7b584e1/attachment-0001.htm>


More information about the classfile-api-dev mailing list