From david.lloyd at redhat.com  Mon Mar 10 17:38:12 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Mon, 10 Mar 2025 17:38:12 +0000
Subject: Class files in ByteBuffer
Message-ID: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>

When defining a class in the JDK, one may either use a byte array or a byte
buffer to hold the contents of the class. The latter is useful when (for
example) a JAR file containing uncompressed classes is mapped into memory.
Thus, some class loaders depend on this form of the API for class
definition.

If I were to supplement such a class loader with a class transformation
step based on the class file API, I would have to copy the bytes of each
class on to the heap as a byte[] before I could begin parsing it. This is
potentially expensive, and definitely awkward.
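
For illustration, the copy step described above might look like the
following sketch. The names BufferCopy and toByteArray are made up for this
example and are not part of any JDK API; the resulting array is what would
then be handed to a byte[]-based parser.

```java
import java.nio.ByteBuffer;

final class BufferCopy {
    /** Copies the remaining bytes of src into a new heap array without moving src's position. */
    static byte[] toByteArray(ByteBuffer src) {
        ByteBuffer view = src.duplicate();      // own position/limit, shared content
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);                        // bulk copy; also works for direct and mapped buffers
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE}).flip();
        byte[] copy = toByteArray(direct);
        System.out.println(copy.length + " bytes copied, position is " + direct.position());
    }
}
```

Every class pays for one extra allocation and one full copy here, which is
the cost being discussed.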

After transformation, it doesn't really matter if you have a byte[] or
ByteBuffer because either way, the class can be defined directly.

It would be nice if the class file parser could accept either a byte[] or a
ByteBuffer. I did a quick bit of exploratory work and it looks like porting
the code to read from a ByteBuffer instead of a byte[] (using
ByteBuffer.wrap() for the array case) would be largely straightforward
*except* for the code which parses UTF-8 constants into strings. Also there
could be some small performance differences (maybe positive, maybe
negative) depending on how the buffer is accessed.

Is this something that might be considered?

-- 
- DML • he/him

From brian.goetz at oracle.com  Mon Mar 10 17:52:10 2025
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 10 Mar 2025 13:52:10 -0400
Subject: Class files in ByteBuffer
In-Reply-To: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
Message-ID: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>

It sounds like you are asking two questions.  At the API level, you are
asking whether adding a Classfile.parse(ByteBuffer) method would be in
scope.  But at the implementation level, you are asking whether we would
be OK to make ByteBuffer *the primitive* on which processing the byte[] 
format is based, which is a more intrusive change.

My first reaction is that the first seems fine in theory, but if the 
only reasonable implementation strategy is the latter, then I am pretty 
skeptical.

A ByteBuffer-accepting factory that simply copied to a byte[] would be 
fine (this is what we do with the existing Path-accepting factory, it's 
a similar form of convenience), but it sounds like this would not make 
you any happier.



From david.lloyd at redhat.com  Mon Mar 10 18:13:39 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Mon, 10 Mar 2025 18:13:39 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
 <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
Message-ID: <CANghgrS0PM3WZ=9SHeoV8-92XxzrWwTxf3zjP1Nk-MG6v9Xvcg@mail.gmail.com>

Thanks for the response; comments inline.

On Mon, Mar 10, 2025 at 12:52 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> It sounds like you are asking two questions.  At the API level, you are
> asking whether adding a Classfile.parse(ByteBuffer) method would be in
> scope.  But at the implementation level, you are asking whether we would be
> OK to make ByteBuffer *the primitive* on which processing the byte[] format
> is based, which is a more intrusive change.
>
> My first reaction is that the first seems fine in theory, but if the only
> reasonable implementation strategy is the latter, then I am pretty
> skeptical.
>

> A ByteBuffer-accepting factory that simply copied to a byte[] would be fine
> (this is what we do with the existing Path-accepting factory, it's a
> similar form of convenience), but it sounds like this would not make you
> any happier.
>

Well, it honestly wouldn't make me unhappy, because it's not worse than
today's status quo. If the API exists, then optimization is always going to
be a future possibility. So I for one would be fine with this as a starting
point, especially if it would greatly increase the chances of such an API
being included in time for Java 25. Trying to find an optimal
implementation strategy might be a diverting future spare-time project for
someone (maybe even myself if I ever find enough of those elusive
"round tuits" I keep hearing about).

-- 
- DML • he/him

From brian.goetz at oracle.com  Mon Mar 10 18:18:19 2025
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 10 Mar 2025 14:18:19 -0400
Subject: Class files in ByteBuffer
In-Reply-To: <CANghgrS0PM3WZ=9SHeoV8-92XxzrWwTxf3zjP1Nk-MG6v9Xvcg@mail.gmail.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
 <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
 <CANghgrS0PM3WZ=9SHeoV8-92XxzrWwTxf3zjP1Nk-MG6v9Xvcg@mail.gmail.com>
Message-ID: <f8030bc9-b7b5-4150-8774-c8ddd6ccee42@oracle.com>

So, the other half of this is the overloads for 
Classfile::buildToByteBuffer, which I assume has a similarly trivial 
initial implementation; we wouldn't want to do one without the other, as 
it will seem a gratuitous asymmetry.  If both are shallow
implementations, I'm not averse to this -- though you'll probably want 
an @ImplNote that explains how the implementation works, to avoid 
unhappy performance surprises.


From chen.l.liang at oracle.com  Mon Mar 10 18:46:46 2025
From: chen.l.liang at oracle.com (Chen Liang)
Date: Mon, 10 Mar 2025 18:46:46 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
Message-ID: <SJ2PR10MB7669843B9621B4F531AC5AC3A2D62@SJ2PR10MB7669.namprd10.prod.outlook.com>

I think the use of ByteBuffer vs byte[] is a tradeoff - the JIT compiler has a lot of trouble with ByteBuffer due to polymorphism, and this might actually turn out to be a regression. (The ClassFile API previously used ByteBuffer for stack map generation, I think; it has since been eliminated for performance improvements.) Also, the ClassFile API depends on some sweet properties of byte[], such as using some String intrinsics on byte arrays to quickly process ASCII-compatible UTF8 entries.

Luckily the access to the array is nicely encapsulated in ClassReader for the most part and Utf8 entry is the only place where it escapes. You should be able to make a prototype of reading from ByteBuffer easily; your "using byte buffer as backing" approach might be accepted if you can prove there is no regression in the case of reading from plain byte arrays.
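
As a rough illustration of the tradeoff (this is not the actual ClassReader code), here is the same big-endian u2 read written once against a byte[] and once against a ByteBuffer. U2Readers and readU2 are hypothetical names chosen for the example.

```java
import java.nio.ByteBuffer;

final class U2Readers {
    /** Array-based read: plain indexed loads on a byte[]. */
    static int readU2(byte[] buf, int offset) {
        return ((buf[offset] & 0xFF) << 8) | (buf[offset + 1] & 0xFF);
    }

    /**
     * Buffer-based read: an absolute getShort, which does not move the position.
     * Assumes the buffer still has its default BIG_ENDIAN byte order.
     */
    static int readU2(ByteBuffer buf, int offset) {
        return buf.getShort(offset) & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(Integer.toHexString(readU2(header, 0)));
        System.out.println(Integer.toHexString(readU2(ByteBuffer.wrap(header), 2)));
    }
}
```

Both methods return the same value for the same bytes; whether the second one is compiled as well as the first, across heap, direct, and mapped buffers, is the part that would need benchmarks.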

Regards, Chen

From david.lloyd at redhat.com  Wed Mar 12 13:27:31 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Wed, 12 Mar 2025 13:27:31 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <f8030bc9-b7b5-4150-8774-c8ddd6ccee42@oracle.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
 <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
 <CANghgrS0PM3WZ=9SHeoV8-92XxzrWwTxf3zjP1Nk-MG6v9Xvcg@mail.gmail.com>
 <f8030bc9-b7b5-4150-8774-c8ddd6ccee42@oracle.com>
Message-ID: <CANghgrRjCPC9WU-KeBB+6Uj39OkeXbgjMTWwm+ZoVEc7kGrtiw@mail.gmail.com>

Making the output fully symmetrical might be a little bit more challenging
(interesting?) than it seemed to be at first glance. You'd have to think
about questions like "should the buffer be direct?". We could possibly
allow an `IntFunction<ByteBuffer>` to be passed in, to support flexible
allocation strategies and to allow (for example) writing to memory-mapped
areas and things like that. Since we're currently doing a couple of
`arraycopy` calls to write to the output, it should be trivial to create a
variation which bulk-writes to a user-supplied `ByteBuffer`. This would be
more broadly useful than just a naive `ByteBuffer.wrap()` on the byte array
output. That effect could however still be achieved if the user passes in
e.g. `ByteBuffer::allocate` as the buffer acquisition function (we could
possibly supply an overload which uses this strategy).
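
A minimal sketch of the buffer-acquisition idea, assuming the class bytes
have already been built into a byte[]. BufferOutput and writeTo are
hypothetical names for illustration, not a proposed signature.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

final class BufferOutput {
    /** Bulk-writes classBytes into a buffer obtained from the caller-supplied allocator. */
    static ByteBuffer writeTo(byte[] classBytes, IntFunction<ByteBuffer> allocator) {
        ByteBuffer out = allocator.apply(classBytes.length);
        if (out.remaining() < classBytes.length) {
            throw new IllegalArgumentException("allocator returned a buffer that is too small");
        }
        int start = out.position();
        out.put(classBytes);          // single bulk write
        out.limit(out.position());    // expose exactly the bytes just written
        out.position(start);
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        ByteBuffer heap = writeTo(bytes, ByteBuffer::allocate);
        ByteBuffer direct = writeTo(bytes, ByteBuffer::allocateDirect);
        System.out.println(heap.remaining() + " / " + direct.isDirect());
    }
}
```

Passing `ByteBuffer::allocate` gives the heap-buffer behavior, while
`ByteBuffer::allocateDirect` or a function returning a slice of a mapped
region covers the other cases.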


-- 
- DML • he/him

From maurizio.cimadamore at oracle.com  Wed Mar 12 15:53:45 2025
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 12 Mar 2025 15:53:45 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
 <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
Message-ID: <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>


On 10/03/2025 17:52, Brian Goetz wrote:
> My first reaction is that the first seems fine in theory

I wonder if an API accepting a MemorySegment would be more general -- 
you can construct a MS from a BB and you can of course go from MS to 
byte[] (which is what the impl needs). So I wonder if that would be more 
future-proof. (We can, of course, also provide both).
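
The two conversions mentioned here can be sketched as follows, assuming a 
JDK where java.lang.foreign is final (22 or later). SegmentBridge and its 
method names are illustrative only.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

final class SegmentBridge {
    /** BB to MS: the segment covers the buffer's bytes from position to limit. */
    static MemorySegment fromBuffer(ByteBuffer bb) {
        return MemorySegment.ofBuffer(bb);
    }

    /** MS to byte[]: copies the segment's contents into a fresh heap array. */
    static byte[] toHeapArray(MemorySegment segment) {
        return segment.toArray(ValueLayout.JAVA_BYTE);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4);
        bb.put(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE}).flip();
        byte[] bytes = toHeapArray(fromBuffer(bb));
        System.out.println(bytes.length);
    }
}
```

The toArray call is still a copy, so this only changes the shape of the 
API, not the cost of the shallow implementation.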

Maurizio


From brian.goetz at oracle.com  Wed Mar 12 16:25:31 2025
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 12 Mar 2025 12:25:31 -0400
Subject: Class files in ByteBuffer
In-Reply-To: <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
 <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
 <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>
Message-ID: <da776b1e-7ef8-417c-9238-5d55e452a254@oracle.com>

That does seem like a more future-proof choice.  (I suspect too it would
be less intrusive to adapt the internals to MS than BB.)


From maurizio.cimadamore at oracle.com  Wed Mar 12 16:51:41 2025
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 12 Mar 2025 16:51:41 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <da776b1e-7ef8-417c-9238-5d55e452a254@oracle.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
 <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
 <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>
 <da776b1e-7ef8-417c-9238-5d55e452a254@oracle.com>
Message-ID: <fe2af4aa-3bab-486e-ad4e-aae45142cfc3@oracle.com>


On 12/03/2025 16:25, Brian Goetz wrote:
> That does seem like a more future-proof choice.  (I suspect too it
> would be less intrusive to adapt the internals to MS than BB.)

They are probably similar in spirit -- but at least you would know that 
the MS path is more aggressively/actively optimized.

I do share some of Chen's concerns -- random access on MS (and BB) is 
not comparable to random access on a byte[]. So changing the internals 
of the classfile API to use MS/BB is something that needs to be done 
carefully (and with benchmarks at hand).

One possible area where adopting a "more raw" buffer would be beneficial
is when writing/reading custom attributes -- since BB/MS already provide
the primitives we need to load/store primitive values from/in the buffer.
But, again, this requires care and consideration; it's not a slam dunk.

Maurizio


From david.lloyd at redhat.com  Wed Mar 12 19:10:44 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Wed, 12 Mar 2025 19:10:44 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <fe2af4aa-3bab-486e-ad4e-aae45142cfc3@oracle.com>
References: <CANghgrQf44YJ0tux4QVpL0HU2aKoQhdh17kAsQ6JHW6-sV5__g@mail.gmail.com>
 <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
 <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>
 <da776b1e-7ef8-417c-9238-5d55e452a254@oracle.com>
 <fe2af4aa-3bab-486e-ad4e-aae45142cfc3@oracle.com>
Message-ID: <CANghgrSPLKLjkqH5i5-pDOFhYBxGKN24BgN-_3GA6+9G8RiY-w@mail.gmail.com>

On Wed, Mar 12, 2025 at 11:51 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

>
> On 12/03/2025 16:25, Brian Goetz wrote:
>
> That does seem like a more future-proof choice.  (I suspect too it would
> be less intrusive to adapt the internals to MS than BB.)
>
> They are probably similar in spirit -- but at least you would know that
> the MS path is more aggressively/actively optimized.
>
> I do share some of Chen's concerns -- random access on MS (and BB) is not
> comparable to random access on a byte[]. So changing the internals of the
> classfile API to use MS/BB is something that needs to be done carefully
> (and with benchmarks at hand).
>
> One possible area where adopting a "more raw" buffer would be beneficial
> is when writing/reading custom attributes -- since BB/MS will already
> provide the primitives we need to access load/store primitive values
> from/in the buffer. But -- again, something that requires care and
> consideration, it's not a slam dunk.
>
Internally, (on the parsing side at least) it is my expectation that we
would not likely be able to get away with having a single, general access
strategy using the `MemorySegment` API (but we could test that now, even
without the suggested API changes - I would love to be wrong). It seems
more likely that we'd want to keep the current array-based strategy (which
uses `Unsafe` liberally) and add a new direct memory address-based access
strategy (also using `Unsafe` in an equivalent manner), and select the
strategy based on the kind of `MemorySegment` or `ByteBuffer`.
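
One way to picture the strategy selection, using only `ByteBuffer` queries
that exist today. `AccessStrategy`, `Kind`, and `choose` are hypothetical
names; a real reader would presumably dispatch to different parsing code
rather than return an enum.

```java
import java.nio.ByteBuffer;

final class AccessStrategy {
    enum Kind { HEAP_ARRAY, DIRECT_MEMORY, COPY_FIRST }

    /** Picks an access strategy from the kind of buffer supplied by the caller. */
    static Kind choose(ByteBuffer input) {
        if (input.hasArray()) {
            return Kind.HEAP_ARRAY;      // backing byte[] is accessible: keep the array path
        }
        if (input.isDirect()) {
            return Kind.DIRECT_MEMORY;   // off-heap or mapped: address-based path
        }
        return Kind.COPY_FIRST;          // e.g. read-only heap buffer: fall back to copying
    }

    public static void main(String[] args) {
        System.out.println(choose(ByteBuffer.allocate(8)));
        System.out.println(choose(ByteBuffer.allocateDirect(8)));
        System.out.println(choose(ByteBuffer.allocate(8).asReadOnlyBuffer()));
    }
}
```

The third case matters because a read-only heap buffer reports neither an
accessible array nor direct memory.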

Having three parse and build APIs (one for each of `byte[]`, `ByteBuffer`,
and `MemorySegment`) makes sense to me because there's a use case for each
of them, and they can be implemented in terms of one another to a great
extent which gives a lot of flexibility. Particularly, it seems to me that
as long as `ClassLoader.defineClass(String,ByteBuffer,ProtectionDomain)`
exists, then ByteBuffer should be floated up to the API, even if it ends up
being e.g. `MemorySegment.ofBuffer()` on the inside. (That said, I
wouldn't hate it if a new `defineClass` which uses `MemorySegment` could be
defined someday.)

-- 
- DML • he/him