[External] : Re: Class files in ByteBuffer

Fri Mar 21 15:25:19 UTC 2025

On 21/03/2025 15:17, David Lloyd wrote:
> Yes, you *could* wrap a byte[] with a MemorySegment, but then you'd 
> just have a heap memory segment; this is probably not what the user is 
> looking for (otherwise they'd probably just build to a byte[] to begin 
> with). By using the user-supplied allocator, I can (for example) write 
> the class bytes directly to a memory-mapped file.

Ah! I now see that first allocator parameter -- which makes sense.

I'd suggest to turn that into SegmentAllocator -- note that Arena is 
also a SegmentAllocator, so you can pass either your own allocating 
lambda, or directly use an arena which is nice (I think).

ByteBuffer don't have the allocation abstraction -- so I wonder if we 
really should let them go, and maybe only add the segment API with the 
allocator parameter. Or maybe have a simpler API for ByteBuffer w/o 
allocator parameter, which always allocates a heap buffer. Then if users 
want something more complex (e.g. a mapped byte buffer) they can use the 
MS API instead, and then wrap the MS into a BB after the fact.

Maurizio

>
> That said, the buildTo* variants exist because Brian suggested that 
> the API might be unacceptably asymmetrical otherwise. Either way, the 
> part that matters the most to me is the parsing side.
>
> The idea with the API is that, while the current impl would need to 
> copy to a byte array, it is at least theoretically possible that in 
> the future, that may change. As a user of the byte[]-based API, I am 
> already copying from my direct buffer, so it's not worse than the 
> status quo. By putting the copy into the JDK, if the JDK does get 
> enhanced someday to use MemorySegment internally, or maybe Unsafe, or 
> whatever, then I'll get the benefit of this change. If it doesn't, 
> then I'm no worse off than I am today (slightly better off actually, 
> because it saves me a step going in). Supporting this use case would 
> be beneficial for the same reason that it is beneficial to be able to 
> define classes out of direct buffers (which has been supported since 
> JDK 1.5).
>
> The thing I'm mainly in doubt about is that the ability to parse from 
> or generate to byte buffers is potentially redundant with respect to 
> MemorySegment. It would just be a bit weird if I could define a class 
> using an array or a byte buffer, but parsing classes used arrays or 
> memory segments. Is it weird enough to justify the third API variant? 
> I don't know.
>
> On Fri, Mar 21, 2025 at 9:27 AM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>     I looked at the PR and I had a similar reaction as Adam.
>
>     It seems like the buildToXYZ methods have one less copy, but... if
>     the API returns a byte[], you can always wrap the byte[] as either
>     a MemorySegment (MemorySegment::ofArrays) or a ByteBuffer
>     (ByteBuffer::wrap). These methods do _not_ copy.
>
>     The way I see it, is that this PR doesn't remove the need to copy
>     on the way in (that would be a much more complex change), and it
>     doesn't improve things significantly _on the way out_ (because,
>     that side is already covered by existing API/views support).
>
>     So, while I'm not strongly opposed to the changes in the PR, at
>     the same time I don't see a lot of added value in the new methods.
>     At the same time I'm a bit skeptical to commit to the new API in
>     without doing the full exercise to see how these methods might be
>     implemented.
>
>     A good motto in API design is "when in doubt, leave it out" -- is
>     this one of those cases?
>
>     Maurizio
>
>
>
>     On 21/03/2025 13:34, David Lloyd wrote:
>>     The idea is that in the future, it may be possible to do these
>>     things without the extra copy. In the PR, I found that we can
>>     already build to memory segments and byte buffers without more
>>     copies than what we're doing for arrays. On the parsing side, we
>>     can already sometimes work without copying in some cases that the
>>     user won't have access to (e.g. accessing the backing array of a
>>     memory segment, even if it's read-only). It's not hard to imagine
>>     that we could possibly have a way to parse without the extra copy
>>     in the native memory case in the future, without impacting
>>     current performance on arrays. But without the API support, it
>>     can never be possible.
>>
>>     On Fri, Mar 21, 2025 at 8:26 AM Adam Sotona
>>     <adam.sotona at oracle.com> wrote:
>>
>>         I’m more thinking that the API already provides all the
>>         important entries and conversion from and to `MemorySegment`
>>         can be done by simple call of `MemorySegment::toArray` and
>>         `MemorySegment::ofArray`.
>>
>>         *From: *David Lloyd <david.lloyd at redhat.com>
>>         *Date: *Friday, 21 March 2025 at 13:37
>>         *To: *Adam Sotona <adam.sotona at oracle.com>
>>         *Cc: *classfile-api-dev at openjdk.org
>>         <classfile-api-dev at openjdk.org>
>>         *Subject: *[External] : Re: Class files in ByteBuffer
>>
>>         Please have a look at the PR. If you feel the API surface has
>>         grown too much, maybe removing the `ByteBuffer` variants is a
>>         logical step, since users can always wrap a `ByteBuffer` with
>>         a `MemorySegment`? If you could comment on the PR if you feel
>>         that to be the case, I would appreciate it.
>>
>>         Thanks.
>>
>>         On Thu, Mar 20, 2025 at 4:09 PM Adam Sotona
>>         <adam.sotona at oracle.com> wrote:
>>
>>             I’m sorry to join the discussion a bit late.
>>
>>             Here are the points to consider:
>>
>>               * Class-File API is implementation is after many rounds
>>                 of performance optimizations purely based on byte arrays.
>>               * Internal use of ByteBuffer has been removed from the
>>                 implementation, as it caused significant JDK
>>                 bootstrap performance regression.
>>               * Enormous amount of work has been spent on the API
>>                 surface reduction and removal of all unnecessary
>>                 “conveniences”.
>>
>>             Adam
>>
>>             *From: *classfile-api-dev
>>             <classfile-api-dev-retn at openjdk.org> on behalf of David
>>             Lloyd <david.lloyd at redhat.com>
>>             *Date: *Thursday, 20 March 2025 at 21:11
>>             *To: *classfile-api-dev at openjdk.org
>>             <classfile-api-dev at openjdk.org>
>>             *Subject: *Re: Class files in ByteBuffer
>>
>>             I've opened a bug [1] and pull request [2] incorporating
>>             this discussion (more or less). I've implemented support
>>             for both `MemorySegment` and `ByteBuffer`, but this could
>>             be revisited if it doesn't look OK. The implementation is
>>             not terribly invasive for now, only grabbing a few
>>             low-hanging optimizations.
>>
>>             [1] https://bugs.openjdk.org/browse/JDK-8352536
>>             <https://bugs.openjdk.org/browse/JDK-8352536>
>>
>>             [2] https://github.com/openjdk/jdk/pull/24139
>>             <https://urldefense.com/v3/__https:/github.com/openjdk/jdk/pull/24139__;!!ACWV5N9M2RV99hQ!MNdgtt56CgKWQQak5PkwC2t1L1q0tTZrT2fqzVPHYo3jer5CUfysIIx9uMuCoLb27YGoQWuUDgu3s_h8ZhlN-w$>
>>
>>             On Mon, Mar 10, 2025 at 12:38 PM David Lloyd
>>             <david.lloyd at redhat.com> wrote:
>>
>>                 When defining a class in the JDK, one may either use
>>                 a byte array or a byte buffer to hold the contents of
>>                 the class. The latter is useful when (for example) a
>>                 JAR file containing uncompressed classes is mapped
>>                 into memory. Thus, some class loaders depend on this
>>                 form of the API for class definition.
>>
>>                 If I were to supplement such a class loader with a
>>                 class transformation step based on the class file
>>                 API, I would have to copy the bytes of each class on
>>                 to the heap as a byte[] before I could begin parsing
>>                 it. This is potentially expensive, and definitely
>>                 awkward.
>>
>>                 After transformation, it doesn't really matter if you
>>                 have a byte[] or ByteBuffer because either way, the
>>                 class can be defined directly.
>>
>>                 It would be nice if the class file parser could
>>                 accept either a byte[] or a ByteBuffer. I did a quick
>>                 bit of exploratory work and it looks like porting the
>>                 code to read from a ByteBuffer instead of a byte[] 
>>                 (using ByteBuffer.wrap() for the array case) would be
>>                 largely straightforward *except* for the code which
>>                 parses UTF-8 constants into strings. Also there could
>>                 be some small performance differences (maybe
>>                 positive, maybe negative) depending on how the buffer
>>                 is accessed.
>>
>>                 Is this something that might be considered?
>>
>>                 -- 
>>
>>                 - DML • he/him
>>
>>
>>             -- 
>>
>>             - DML • he/him
>>
>>
>>         -- 
>>
>>         - DML • he/him
>>
>>
>>
>>     -- 
>>     - DML • he/him
>
>
>
> -- 
> - DML • he/him
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20250321/ffc5d035/attachment-0001.htm>