From david.lloyd at redhat.com  Mon Mar 10 17:38:12 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Mon, 10 Mar 2025 17:38:12 +0000
Subject: Class files in ByteBuffer

When defining a class in the JDK, one may either use a byte array or a byte buffer to hold the contents of the class. The latter is useful when (for example) a JAR file containing uncompressed classes is mapped into memory. Thus, some class loaders depend on this form of the API for class definition.

If I were to supplement such a class loader with a class transformation step based on the class file API, I would have to copy the bytes of each class on to the heap as a byte[] before I could begin parsing it. This is potentially expensive, and definitely awkward.

After transformation, it doesn't really matter if you have a byte[] or ByteBuffer because either way, the class can be defined directly.

It would be nice if the class file parser could accept either a byte[] or a ByteBuffer. I did a quick bit of exploratory work and it looks like porting the code to read from a ByteBuffer instead of a byte[] (using ByteBuffer.wrap() for the array case) would be largely straightforward *except* for the code which parses UTF-8 constants into strings. Also there could be some small performance differences (maybe positive, maybe negative) depending on how the buffer is accessed.

Is this something that might be considered?

--
- DML • he/him

From brian.goetz at oracle.com  Mon Mar 10 17:52:10 2025
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 10 Mar 2025 13:52:10 -0400
Subject: Class files in ByteBuffer
Message-ID: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>

It sounds like you are asking two questions. At the API level, you are asking whether adding a Classfile.parse(ByteBuffer) method would be in scope.
But at the implementation level, you are asking whether we would be OK to make ByteBuffer *the primitive* on which processing the byte[] format is based, which is a more intrusive change.

My first reaction is that the first seems fine in theory, but if the only reasonable implementation strategy is the latter, then I am pretty skeptical.

A ByteBuffer-accepting factory that simply copied to a byte[] would be fine (this is what we do with the existing Path-accepting factory, it's a similar form of convenience), but it sounds like this would not make you any happier.

On 3/10/2025 1:38 PM, David Lloyd wrote:
> When defining a class in the JDK, one may either use a byte array or a byte buffer to hold the contents of the class. The latter is useful when (for example) a JAR file containing uncompressed classes is mapped into memory. Thus, some class loaders depend on this form of the API for class definition.
>
> If I were to supplement such a class loader with a class transformation step based on the class file API, I would have to copy the bytes of each class on to the heap as a byte[] before I could begin parsing it. This is potentially expensive, and definitely awkward.
>
> After transformation, it doesn't really matter if you have a byte[] or ByteBuffer because either way, the class can be defined directly.
>
> It would be nice if the class file parser could accept either a byte[] or a ByteBuffer. I did a quick bit of exploratory work and it looks like porting the code to read from a ByteBuffer instead of a byte[] (using ByteBuffer.wrap() for the array case) would be largely straightforward *except* for the code which parses UTF-8 constants into strings. Also there could be some small performance differences (maybe positive, maybe negative) depending on how the buffer is accessed.
>
> Is this something that might be considered?
>
> --
> - DML • he/him
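
The "simply copied to a byte[]" factory described in this message could be sketched roughly as below. This is an illustration only, not code from the JDK: the class name `BufferCopy` and the method `toByteArray` are hypothetical, and the sketch assumes the caller wants the buffer's remaining bytes copied without its position being disturbed.

```java
import java.nio.ByteBuffer;

public final class BufferCopy {
    private BufferCopy() {}

    /**
     * Copies the remaining bytes of buf into a new heap array.
     * Works for heap, direct, and memory-mapped buffers alike,
     * and leaves the caller's position and limit untouched.
     */
    public static byte[] toByteArray(ByteBuffer buf) {
        // duplicate() shares the content but has its own position/limit
        ByteBuffer view = buf.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        direct.putInt(0xCAFEBABE).putInt(0x00000041).flip();
        byte[] copy = toByteArray(direct);
        System.out.println(copy.length + " bytes copied, position still " + direct.position());
    }
}
```

The resulting array could then be handed to whichever byte[]-accepting parse method is in use; the cost is one heap copy per class, which is the overhead the original question was hoping to avoid.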

From david.lloyd at redhat.com  Mon Mar 10 18:13:39 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Mon, 10 Mar 2025 18:13:39 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
References: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>

Thanks for the response; comments inline.

On Mon, Mar 10, 2025 at 12:52 PM Brian Goetz wrote:

> It sounds like you are asking two questions. At the API level, you are asking whether adding a Classfile.parse(ByteBuffer) method would be in scope. But at the implementation level, you are asking whether we would be OK to make ByteBuffer *the primitive* on which processing the byte[] format is based, which is a more intrusive change.
>
> My first reaction is that the first seems fine in theory, but if the only reasonable implementation strategy is the latter, then I am pretty skeptical.
>
> A ByteBuffer-accepting factory that simply copied to a byte[] would be fine (this is what we do with the existing Path-accepting factory, it's a similar form of convenience), but it sounds like this would not make you any happier.

Well, it honestly wouldn't make me unhappy, because it's not worse than today's status quo. If the API exists, then optimization is always going to be a future possibility. So I for one would be fine with this as a starting point, especially if it would greatly increase the chances of such an API being included in time for Java 25. Trying to find an optimal implementation strategy might be a diverting future spare-time project for someone (maybe even myself if I ever find enough of those elusive "round tuits" I keep hearing about).

--
- DML • he/him

From brian.goetz at oracle.com  Mon Mar 10 18:18:19 2025
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 10 Mar 2025 14:18:19 -0400
Subject: Class files in ByteBuffer
References: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>

So, the other half of this is the overloads for Classfile::buildToByteBuffer, which I assume has a similarly trivial initial implementation; we wouldn't want to do one without the other, as it will seem a gratuitous asymmetry. If both are shallow implementations, I'm not averse to this -- though you'll probably want an @ImplNote that explains how the implementation works, to avoid unhappy performance surprises.

On 3/10/2025 2:13 PM, David Lloyd wrote:
> Thanks for the response; comments inline.
>
> On Mon, Mar 10, 2025 at 12:52 PM Brian Goetz wrote:
>
>> It sounds like you are asking two questions. At the API level, you are asking whether adding a Classfile.parse(ByteBuffer) method would be in scope. But at the implementation level, you are asking whether we would be OK to make ByteBuffer *the primitive* on which processing the byte[] format is based, which is a more intrusive change.
>>
>> My first reaction is that the first seems fine in theory, but if the only reasonable implementation strategy is the latter, then I am pretty skeptical.
>>
>> A ByteBuffer-accepting factory that simply copied to a byte[] would be fine (this is what we do with the existing Path-accepting factory, it's a similar form of convenience), but it sounds like this would not make you any happier.
>
> Well, it honestly wouldn't make me unhappy, because it's not worse than today's status quo. If the API exists, then optimization is always going to be a future possibility. So I for one would be fine with this as a starting point, especially if it would greatly increase the chances of such an API being included in time for Java 25.
> Trying to find an optimal implementation strategy might be a diverting future spare-time project for someone (maybe even myself if I ever find enough of those elusive "round tuits" I keep hearing about).
>
> --
> - DML • he/him

From chen.l.liang at oracle.com  Mon Mar 10 18:46:46 2025
From: chen.l.liang at oracle.com (Chen Liang)
Date: Mon, 10 Mar 2025 18:46:46 +0000
Subject: Class files in ByteBuffer

I think the use of ByteBuffer vs byte[] is a tradeoff - JIT compiler has a lot of trouble with ByteBuffer due to polymorphism and this might actually turn out to be a regression. (ClassFile API previously used ByteBuffer for stack map generation I think; it has been since eliminated for performance improvements)

Also ClassFile API depends on some sweet properties of byte[], such as using some String intrinsics on byte array to quickly process ascii-compatible UTF8 entries.

Luckily the access to the array is nicely encapsulated in ClassReader for the most part and Utf8 entry is the only place where it escapes. You should be able to make a prototype of reading from ByteBuffer easily; your "using byte buffer as backing" approach might be accepted if you can prove there is no regression in the case of reading from plain byte arrays.

Regards,
Chen

________________________________
From: classfile-api-dev on behalf of David Lloyd
Sent: Monday, March 10, 2025 12:38 PM
To: classfile-api-dev at openjdk.org
Subject: Class files in ByteBuffer

When defining a class in the JDK, one may either use a byte array or a byte buffer to hold the contents of the class. The latter is useful when (for example) a JAR file containing uncompressed classes is mapped into memory. Thus, some class loaders depend on this form of the API for class definition.

If I were to supplement such a class loader with a class transformation step based on the class file API, I would have to copy the bytes of each class on to the heap as a byte[] before I could begin parsing it. This is potentially expensive, and definitely awkward.

After transformation, it doesn't really matter if you have a byte[] or ByteBuffer because either way, the class can be defined directly.

It would be nice if the class file parser could accept either a byte[] or a ByteBuffer. I did a quick bit of exploratory work and it looks like porting the code to read from a ByteBuffer instead of a byte[] (using ByteBuffer.wrap() for the array case) would be largely straightforward *except* for the code which parses UTF-8 constants into strings. Also there could be some small performance differences (maybe positive, maybe negative) depending on how the buffer is accessed.

Is this something that might be considered?

--
- DML • he/him

From david.lloyd at redhat.com  Wed Mar 12 13:27:31 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Wed, 12 Mar 2025 13:27:31 +0000
Subject: Class files in ByteBuffer
References: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>

Making the output fully symmetrical might be a little bit more challenging (interesting?) than it seemed to be at first glance. You'd have to think about questions like "should the buffer be direct?". We could possibly allow an `IntFunction` to be passed in, to support flexible allocation strategies and to allow (for example) writing to memory-mapped areas and things like that. Since we're currently doing a couple of `arraycopy` to write to the output, it should be trivial to create a variation which bulk-writes to a user-supplied `ByteBuffer`. This would be more broadly useful than just a naive `ByteBuffer.wrap()` on the byte array output.
That effect could however still be achieved if the user passes in e.g. `ByteBuffer::allocate` as the buffer acquisition function (we could possibly supply an overload which uses this strategy).

On Mon, Mar 10, 2025 at 1:18 PM Brian Goetz wrote:

> So, the other half of this is the overloads for Classfile::buildToByteBuffer, which I assume has a similarly trivial initial implementation; we wouldn't want to do one without the other, as it will seem a gratuitous asymmetry. If both are shallow implementations, I'm not averse to this -- though you'll probably want an @ImplNote that explains how the implementation works, to avoid unhappy performance surprises.
>
> On 3/10/2025 2:13 PM, David Lloyd wrote:
>
> Thanks for the response; comments inline.
>
> On Mon, Mar 10, 2025 at 12:52 PM Brian Goetz wrote:
>
>> It sounds like you are asking two questions. At the API level, you are asking whether adding a Classfile.parse(ByteBuffer) method would be in scope. But at the implementation level, you are asking whether we would be OK to make ByteBuffer *the primitive* on which processing the byte[] format is based, which is a more intrusive change.
>>
>> My first reaction is that the first seems fine in theory, but if the only reasonable implementation strategy is the latter, then I am pretty skeptical.
>>
>> A ByteBuffer-accepting factory that simply copied to a byte[] would be fine (this is what we do with the existing Path-accepting factory, it's a similar form of convenience), but it sounds like this would not make you any happier.
>
> Well, it honestly wouldn't make me unhappy, because it's not worse than today's status quo. If the API exists, then optimization is always going to be a future possibility. So I for one would be fine with this as a starting point, especially if it would greatly increase the chances of such an API being included in time for Java 25.
> Trying to find an optimal implementation strategy might be a diverting future spare-time project for someone (maybe even myself if I ever find enough of those elusive "round tuits" I keep hearing about).
>
> --
> - DML • he/him

--
- DML • he/him

From maurizio.cimadamore at oracle.com  Wed Mar 12 15:53:45 2025
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 12 Mar 2025 15:53:45 +0000
Subject: Class files in ByteBuffer
In-Reply-To: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
References: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com>
Message-ID: <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>

On 10/03/2025 17:52, Brian Goetz wrote:
> My first reaction is that the first seems fine in theory

I wonder if an API accepting a MemorySegment would be more general -- you can construct a MS from a BB and you can of course go from MS to byte[] (which is what the impl needs). So I wonder if that would be more future-proof. (We can, of course, also provide both).

Maurizio

From brian.goetz at oracle.com  Wed Mar 12 16:25:31 2025
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 12 Mar 2025 12:25:31 -0400
Subject: Class files in ByteBuffer
In-Reply-To: <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>
References: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com> <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>

That does seem like a more future-proof choice. (I suspect too it would be less intrusive to adapt the internals to MS than BB.)

On 3/12/2025 11:53 AM, Maurizio Cimadamore wrote:
>
> On 10/03/2025 17:52, Brian Goetz wrote:
>> My first reaction is that the first seems fine in theory
>
> I wonder if an API accepting a MemorySegment would be more general -- you can construct a MS from a BB and you can of course go from MS to byte[] (which is what the impl needs). So I wonder if that would be more future-proof.
> (We can, of course, also provide both).
>
> Maurizio

From maurizio.cimadamore at oracle.com  Wed Mar 12 16:51:41 2025
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 12 Mar 2025 16:51:41 +0000
Subject: Class files in ByteBuffer
References: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com> <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>

On 12/03/2025 16:25, Brian Goetz wrote:
> That does seem like a more future-proof choice. (I suspect too it would be less intrusive to adapt the internals to MS than BB.)

They are probably similar in spirit -- but at least you would know that the MS path is more aggressively/actively optimized.

I do share some of Chen's concerns -- random access on MS (and BB) is not comparable to random access on a byte[]. So changing the internals of the classfile API to use MS/BB is something that needs to be done carefully (and with benchmarks at hands).

One possible area where adopting a "more raw" buffer would be beneficial is when writing/reading custom attributes -- since BB/MS will already provide the primitives we need to access load/store primitive values from/in the buffer. But -- again, something that requires care and consideration, it's not a slam dunk.

Maurizio

>
> On 3/12/2025 11:53 AM, Maurizio Cimadamore wrote:
>>
>> On 10/03/2025 17:52, Brian Goetz wrote:
>>> My first reaction is that the first seems fine in theory
>>
>> I wonder if an API accepting a MemorySegment would be more general -- you can construct a MS from a BB and you can of course go from MS to byte[] (which is what the impl needs). So I wonder if that would be more future-proof. (We can, of course, also provide both).
>>
>> Maurizio
>>
>
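
The observation that a MS can be constructed from a BB and then turned into a byte[] can be sketched with the public `java.lang.foreign` API (final since Java 22). The class and method names below (`SegmentBridge`, `toByteArray`, `fromBuffer`) are hypothetical and do not correspond to anything in the class file API; this only shows the conversions mentioned in the messages above.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

public final class SegmentBridge {
    private SegmentBridge() {}

    /** Copies the full contents of a segment into a new heap array. */
    public static byte[] toByteArray(MemorySegment segment) {
        return segment.toArray(ValueLayout.JAVA_BYTE);
    }

    /**
     * Views the bytes between the buffer's position and limit as a
     * segment (no copy), then copies them out as a byte[].
     */
    public static byte[] fromBuffer(ByteBuffer buf) {
        return toByteArray(MemorySegment.ofBuffer(buf));
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {1, 2, 3, 4}).flip();
        System.out.println(fromBuffer(direct).length);
    }
}
```

Under this arrangement a MemorySegment-accepting entry point would cover heap arrays, direct buffers, and mapped files through a single parameter type, with the copy to byte[] remaining an implementation detail.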

From david.lloyd at redhat.com  Wed Mar 12 19:10:44 2025
From: david.lloyd at redhat.com (David Lloyd)
Date: Wed, 12 Mar 2025 19:10:44 +0000
Subject: Class files in ByteBuffer
References: <61213c8a-ca86-4d37-8d5a-1aff31834481@oracle.com> <16a12929-7ba7-417c-a8f4-aa0f04e5e11e@oracle.com>

On Wed, Mar 12, 2025 at 11:51 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:

> On 12/03/2025 16:25, Brian Goetz wrote:
>
> That does seem like a more future-proof choice. (I suspect too it would be less intrusive to adapt the internals to MS than BB.)
>
> They are probably similar in spirit -- but at least you would know that the MS path is more aggressively/actively optimized.
>
> I do share some of Chen's concerns -- random access on MS (and BB) is not comparable to random access on a byte[]. So changing the internals of the classfile API to use MS/BB is something that needs to be done carefully (and with benchmarks at hands).
>
> One possible area where adopting a "more raw" buffer would be beneficial is when writing/reading custom attributes -- since BB/MS will already provide the primitives we need to access load/store primitive values from/in the buffer. But -- again, something that requires care and consideration, it's not a slam dunk.

Internally, (on the parsing side at least) it is my expectation that we would not likely be able to get away with having a single, general access strategy using the `MemorySegment` API (but we could test that now, even without the suggested API changes - I would love to be wrong). It seems more likely that we'd want to keep the current array-based strategy (which uses `Unsafe` liberally) and add a new direct memory address-based access strategy (also using `Unsafe` in an equivalent manner), and select the strategy based on the kind of `MemorySegment` or `ByteBuffer`.
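
Selecting an access strategy by buffer kind might look roughly like the sketch below. It is deliberately simplified and makes several assumptions: it uses only public `ByteBuffer` methods rather than `Unsafe`, the `Reader` interface and both implementations are hypothetical, and it reads a single big-endian u2 rather than modelling the real parser.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class ReaderSelection {
    private ReaderSelection() {}

    /** Hypothetical reader abstraction; not part of any JDK API. */
    public interface Reader {
        /** Reads a big-endian unsigned 16-bit value at the given offset. */
        int readU2(int offset);
    }

    /** Array-backed strategy: plain indexed access into a byte[]. */
    static final class ArrayReader implements Reader {
        private final byte[] bytes;
        private final int base;

        ArrayReader(byte[] bytes, int base) {
            this.bytes = bytes;
            this.base = base;
        }

        @Override
        public int readU2(int offset) {
            int i = base + offset;
            return ((bytes[i] & 0xFF) << 8) | (bytes[i + 1] & 0xFF);
        }
    }

    /** Buffer-backed strategy: absolute gets on a big-endian view. */
    static final class BufferReader implements Reader {
        private final ByteBuffer view;

        BufferReader(ByteBuffer buf) {
            this.view = buf.slice().order(ByteOrder.BIG_ENDIAN);
        }

        @Override
        public int readU2(int offset) {
            return view.getShort(offset) & 0xFFFF;
        }
    }

    /** Picks the array strategy when the buffer exposes a backing array. */
    public static Reader forBuffer(ByteBuffer buf) {
        if (buf.hasArray()) {
            return new ArrayReader(buf.array(), buf.arrayOffset() + buf.position());
        }
        return new BufferReader(buf);
    }
}
```

Whether two such strategies can coexist without hurting the byte[] path is the open question raised earlier in the thread, since a second `Reader` implementation makes the call site polymorphic.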

Having three parse and build APIs (one for each of `byte[]`, `ByteBuffer`, and `MemorySegment`) makes sense to me because there's a use case for each of them, and they can be implemented in terms of one another to a great extent which gives a lot of flexibility. Particularly, it seems to me that as long as `ClassLoader.defineClass(String,ByteBuffer,ProtectionDomain)` exists, then ByteBuffer should be floated up to the API, even if it ends up being e.g. `MemorySegment.ofByteBuffer()` on the inside. (That said, I wouldn't hate it if a new `defineClass` which uses `MemorySegment` could be defined someday.)

--
- DML • he/him
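
On the build side, implementing the `ByteBuffer` form in terms of the `byte[]` form, with the caller-supplied buffer acquisition function suggested earlier in the thread, might be sketched as follows. The names `BuildToBuffer` and `writeTo` are hypothetical, the allocator is assumed to be an `IntFunction<ByteBuffer>` that receives the required size, and the class bytes are assumed to have been produced already by an existing byte[]-returning build step.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

public final class BuildToBuffer {
    private BuildToBuffer() {}

    /**
     * Bulk-writes already-built class bytes into a buffer obtained from
     * the allocator, and returns that buffer positioned for reading.
     */
    public static ByteBuffer writeTo(byte[] classBytes, IntFunction<ByteBuffer> allocator) {
        ByteBuffer out = allocator.apply(classBytes.length);
        if (out.remaining() < classBytes.length) {
            throw new IllegalArgumentException("allocator returned a buffer that is too small");
        }
        int start = out.position();
        out.put(classBytes);
        // Expose exactly the bytes just written: [start, start + length)
        out.limit(out.position()).position(start);
        return out;
    }

    public static void main(String[] args) {
        byte[] fake = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        ByteBuffer heap = writeTo(fake, ByteBuffer::allocate);
        ByteBuffer direct = writeTo(fake, ByteBuffer::allocateDirect);
        System.out.println(heap.remaining() + " " + direct.isDirect());
    }
}
```

Passing `ByteBuffer::allocate` reproduces the naive heap-buffer behaviour, while `ByteBuffer::allocateDirect` or a function returning a slice of a mapped region leaves the placement decision with the caller; the returned buffer is in the form that `ClassLoader.defineClass(String,ByteBuffer,ProtectionDomain)` accepts.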