From claes.redestad at oracle.com Thu May 2 11:34:38 2024
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 2 May 2024 11:34:38 +0000
Subject: Reducing classes loaded by ClassFile API usage
Message-ID:

Hi,

Looking at replacing ASM with the ClassFile API (CFA) in various places, we have observed both startup and footprint regressions. Startup times increase 4-5 ms on Hello World, 40 ms on a small GUI app and 250 ms on a larger app. So there's both a one-off cost and a scaling factor here.

We've been doing some analysis and picked a lot of low-hanging fruit. Bytecode executed has been reduced to about the same level and we've found improvements in dependencies such as the java.lang.constant API. All good. And the number of classes loaded on a Hello World style application has dropped by about 50. Great!

Still the overall picture persists: a Hello World style application that initializes a lambda takes a wee bit longer and the footprint is decidedly larger.

The main culprit, now that some low-hanging fruit has been plucked, seems to be that the trivial use of CFA to spin up lambda proxies is loading in about 160 classes: an ASM-based baseline loads 691 classes. The best recent CFA version (a merge of https://github.com/openjdk/jdk/pull/19006 and https://github.com/openjdk/jdk/pull/17108) loads 834. A net 143 class difference.

Loading classes slows down startup, increases memory footprint, and grows the default CDS archive. And involving more classes - and more code - is often costly even when accompanied by some of the solutions being explored to "fix" startup at large.

So why is this?

The CFA is mainly split up into two package structures, one public under java.lang.classfile and one internal under jdk.internal.classfile.impl. On the public side most of the types are sealed interfaces, which are then implemented by an assortment of abstract and concrete classes under jdk.internal.classfile.impl. Very neat. But I do fear this means we are at least doubling the number of loaded classes from this neat separation.

While it's a bit late in the game I still feel I must propose striking up a conversation about what, if anything, we could consider that would reduce the number of loaded classes, whether they are interfaces, abstract or concrete classes. I think any savings would be very welcome.

Here's an idea:

There are a number of cases where the separation seems unnecessary:

    public sealed interface ArrayLoadInstruction extends Instruction
            permits AbstractInstruction.UnboundArrayLoadInstruction {
        ...

        static ArrayLoadInstruction of(Opcode op) {
            Util.checkKind(op, Opcode.Kind.ARRAY_LOAD);
            return new AbstractInstruction.UnboundArrayLoadInstruction(op);
        }
    }

An interface in java.lang.classfile.instruction which only permits a single implementation class - and as it happens has a static factory method which is the only place where that concrete instruction is instantiated.

Making single-use interfaces such as this one a final class is doable[1], but now we'd have some instructions modeled as an interface, others as classes. Cats and dogs, living together. And it gets messy quick for instructions that can be bound or unbound, since those inherit from abstract BoundInstruction or UnboundInstruction respectively. But perhaps internal implementation details like whether an instruction is bound or unbound ought to be modeled with composition rather than inheritance (an optional CodeImpl + pos tuple) in a shared base class? Then it might follow that each of the interfaces in java.lang.classfile.instruction can really be a single final class. If all concrete instructions were folded into their corresponding interface, that could reduce the total number of implementation classes by 46 (though only 6 of those seem to be loaded on a Hello World).

Yikes, that's a deep cut for a small, incremental gain. From an API consumer point of view I can't say there's much difference, and the factories can still (be evolved to) produce different concrete types when necessary.

Maybe someone can think of other, simpler ways to reduce the number of types floating around in the ClassFile API?

Thank you for your consideration.

Claes

[1] https://github.com/openjdk/jdk/compare/master...cl4es:jdk:fold_instruction_example?expand=1

From brian.goetz at oracle.com Thu May 2 11:50:05 2024
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 2 May 2024 11:50:05 +0000
Subject: Reducing classes loaded by ClassFile API usage
In-Reply-To:
References:
Message-ID:

For what it's worth, there was an earlier experiment that merged the bound and unbound representation classes, and you could see a measurable performance loss just because these classes have more fields and there were more branches to access them.
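The trade-off between the two modeling styles discussed so far can be sketched with a toy example. All type names below (LoadInsn, BoundLoad, UnboundLoad, MergedLoad) are hypothetical stand-ins and not the JDK's actual classes; the sketch only assumes that a bound element reads lazily from raw bytes while an unbound element stores its value directly.

```java
// Illustrative only: these types are hypothetical, not JDK internals.
final class BoundUnboundSketch {
    public static void main(String[] args) {
        byte[] code = { 0x15, 0x03 }; // bytes of "iload 3"
        LoadInsn bound = new BoundLoad(code, 0);
        LoadInsn unbound = new UnboundLoad(3);
        MergedLoad merged = MergedLoad.bound(code, 0);
        System.out.println(bound.slot() + " " + unbound.slot() + " " + merged.slot());
    }
}

// Inheritance style: a sealed interface plus one small class per representation.
sealed interface LoadInsn permits BoundLoad, UnboundLoad {
    int slot();
}

// A bound element decodes its operand on demand from the bytes it was parsed from.
final class BoundLoad implements LoadInsn {
    private final byte[] code;
    private final int pos;
    BoundLoad(byte[] code, int pos) { this.code = code; this.pos = pos; }
    @Override public int slot() { return code[pos + 1] & 0xFF; }
}

// An unbound element simply stores the value it was created with.
final class UnboundLoad implements LoadInsn {
    private final int slot;
    UnboundLoad(int slot) { this.slot = slot; }
    @Override public int slot() { return slot; }
}

// Merged style: one class to load, but more fields and a branch on every access.
final class MergedLoad {
    private final byte[] code; // null when unbound
    private final int pos;     // unused when unbound
    private final int slot;    // unused when bound

    private MergedLoad(byte[] code, int pos, int slot) {
        this.code = code;
        this.pos = pos;
        this.slot = slot;
    }

    static MergedLoad bound(byte[] code, int pos) { return new MergedLoad(code, pos, -1); }
    static MergedLoad unbound(int slot) { return new MergedLoad(null, 0, slot); }

    int slot() { return code != null ? (code[pos + 1] & 0xFF) : slot; }
}
```

The inheritance style loads three types where the merged style loads one, which is the class-count saving under discussion; the merged style pays with wider objects and a conditional in each accessor, which is the cost described in the reply above.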
From claes.redestad at oracle.com Thu May 2 11:57:59 2024
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 2 May 2024 11:57:59 +0000
Subject: Reducing classes loaded by ClassFile API usage
In-Reply-To:
References:
Message-ID: <7B50AF59-E816-497E-84B9-F0C06C034A00@oracle.com>

A performance loss where exactly?

For classfile generation (and reflection) I'd be more concerned with cold-to-lukewarm cases of getting an app up and running than, say, the numbers you might get in synthetic benchmarks running the API at peak performance.
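For readers who want a rough feel for the cold-start cost being discussed, the following sketch counts how many classes the JVM loads while the first lambda in a program is bootstrapped. It uses the standard ClassLoadingMXBean; the class and method names (LambdaClassLoadCount, classesLoadedBy) are made up for this illustration, and this is not how the figures quoted in the thread were obtained. The count is approximate, since other threads may load classes at the same time; running with -Xlog:class+load gives a per-class listing instead.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

final class LambdaClassLoadCount {
    /** Returns how many classes the JVM reports as loaded while the action runs. */
    static long classesLoadedBy(Runnable action) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        action.run();
        return bean.getTotalLoadedClassCount() - before;
    }

    public static void main(String[] args) {
        // The probe is an anonymous class so that the harness itself does not
        // bootstrap a lambda before the one being measured.
        long loaded = classesLoadedBy(new Runnable() {
            @Override public void run() {
                Supplier<String> first = () -> "hello";
                first.get();
            }
        });
        System.out.println("classes loaded by first lambda: " + loaded);
    }
}
```

Comparing this number between two JDK builds gives only a coarse signal; it does not separate the lambda infrastructure from the class-file library used underneath it.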
From brian.goetz at oracle.com Thu May 2 12:04:21 2024
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 2 May 2024 12:04:21 +0000
Subject: Reducing classes loaded by ClassFile API usage
In-Reply-To: <7B50AF59-E816-497E-84B9-F0C06C034A00@oracle.com>
References: <7B50AF59-E816-497E-84B9-F0C06C034A00@oracle.com>
Message-ID:

The benchmark that we used most frequently when writing the library was the null adaptation benchmark, where we visit a class file with a transform that just passes the elements through. It is a measure of the cost to traverse and inflate the representation. We prioritize the case where a class file is transformed with only small changes because this is one of the most common cases for online transformation.
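A null adaptation of the kind described above can be sketched as follows. This assumes JDK 24 or later, where the Class-File API is a final feature and the method is named transformClass; at the time of this thread the API was still in preview and some names differed. The helper names (NullAdaptation, passThrough, readSystemClass) are invented for the example, and this is not the benchmark code itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.ClassTransform;

final class NullAdaptation {
    /** Parses the class and rebuilds it with a transform that forwards every element unchanged. */
    static byte[] passThrough(byte[] classBytes) {
        ClassFile cf = ClassFile.of();
        ClassModel model = cf.parse(classBytes);
        ClassTransform identity = (builder, element) -> builder.with(element);
        return cf.transformClass(model, identity);
    }

    /** Reads the bytes of a class visible to the system class loader, e.g. "java/util/ArrayList". */
    static byte[] readSystemClass(String internalName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + internalName);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = readSystemClass("java/util/ArrayList");
        byte[] copy = passThrough(original);
        ClassModel before = ClassFile.of().parse(original);
        ClassModel after = ClassFile.of().parse(copy);
        System.out.println(after.thisClass().asInternalName() + ": "
                + before.methods().size() + " methods in, "
                + after.methods().size() + " methods out");
    }
}
```

Timing passThrough in a loop would approximate a warm, peak-performance measurement, which is a different question from the cold-start cost raised earlier in the thread.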
From claes.redestad at oracle.com Thu May 2 13:12:28 2024
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 2 May 2024 13:12:28 +0000
Subject: Reducing classes loaded by ClassFile API usage
In-Reply-To:
References: <7B50AF59-E816-497E-84B9-F0C06C034A00@oracle.com>
Message-ID:

I'm curious what data we have on which cases of online transformations are common, and which of those common use cases are most performance-sensitive?

Regardless, I'm just looking for constructive ways to reduce the bootstrap overheads of the API. What we have here today is getting close to being acceptable, but we would be looking at a multitude of regressions if #17108 is integrated.
From liangchenblue at gmail.com Thu May 2 14:34:48 2024
From: liangchenblue at gmail.com (-)
Date: Thu, 2 May 2024 09:34:48 -0500
Subject: Reducing classes loaded by ClassFile API usage
In-Reply-To: <7B50AF59-E816-497E-84B9-F0C06C034A00@oracle.com>
References: <7B50AF59-E816-497E-84B9-F0C06C034A00@oracle.com>
Message-ID:

Hi Claes,

Class-File API is suitable for parsing and transforming class files too, yet these functionalities are rarely used within the JDK itself, which almost exclusively spins new classes. These no-op tests simulate the overhead of drilling into a detailed object (such as transforming only a particular instruction in a particular method in a class while keeping everything else intact). With ASM, all details have to be visited and all Strings are automatically expanded; Class-File API already wins over ASM in this aspect, as it is lazier.

For Class-File generation, I think our API has mostly focused on its stack-map generation bottleneck (one improvement being the use of MethodTypeDesc to speed up slot counting); otherwise I don't recall that we had major performance improvements for writing, so Adam is using approaches like storing known constants into a bound byte array constant pool so we can share the CP prefixes.

Chen
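As an illustration of what slot counting from a MethodTypeDesc involves, here is a minimal sketch using only the public java.lang.constant API. It is not the JDK's internal implementation; the class and method names (SlotCounting, parameterSlots) are made up for the example. The only fact it relies on is that long and double parameters occupy two local variable slots while every other type, including arrays and references, occupies one.

```java
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;

final class SlotCounting {
    /** Counts the local variable slots taken by the parameters of a method type. */
    static int parameterSlots(MethodTypeDesc type) {
        int slots = 0;
        for (ClassDesc param : type.parameterList()) {
            String descriptor = param.descriptorString();
            // long (J) and double (D) are category-2 types and take two slots.
            slots += (descriptor.equals("J") || descriptor.equals("D")) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // int, long, String, double, long[] -> 1 + 2 + 1 + 2 + 1
        MethodTypeDesc type = MethodTypeDesc.ofDescriptor("(IJLjava/lang/String;D[J)V");
        System.out.println(parameterSlots(type)); // prints 7
    }
}
```

Working from an already-parsed MethodTypeDesc avoids rescanning the descriptor string each time the slot count is needed, which is presumably where the speedup mentioned above comes from.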
> The main culprit now that some low-hanging fruit has been plucked seem to > be that the trivial use of CFA to spin up lambda proxies is loading in > about 160 classes: An ASM-based baseline loads 691 classes. The best recent > CFA version (a merge of https://github.com/openjdk/jdk/pull/19006 and > https://github.com/openjdk/jdk/pull/17108) loads 834. A net 143 class > difference. > > Loading classes slows down startup, increases memory footprint, grows the > default CDS archive. And involving more classes - and more code - is often > costly even accompanied with some of the solutions being explored to ?fix? > startup at large. > > So why is this? > > The CFA is mainly split up into two package stuctures, one public under > java.lang.classfile and one internal under jdk.internal.classfile.impl. In > the public side most of the types are sealed interfaces, which are then > implemented by an assortment of abstract and concrete classes under > jdk.internal.classfile.impl. Very neat. But I do fear this means we are at > least doubling the number of loaded classes from this neat separation. > > While it?s a bit late in the game I still feel I must propose striking up > a conversation about what, if anything, we could consider that would reduce > the number of loaded classes. Whether they are interfaces, abstract or > concrete classes. I think any savings would be very welcome. 
> > Here?s an idea: > > There are a number of cases where the separation seem unnecessary: > > public sealed interface ArrayLoadInstruction extends Instruction > permits AbstractInstruction.UnboundArrayLoadInstruction { > *?* > > static ArrayLoadInstruction of(Opcode op) { > Util.checkKind(op, Opcode.Kind.ARRAY_LOAD); > return new AbstractInstruction.UnboundArrayLoadInstruction(op); > } > } > > An interface in java.lang.classfile.instruction which only permits a > single implementation class - and as it happens has a static factory > method which is the only place where that concrete instruction is called. > > Making single-use interfaces such as this one a final class is doable[1], > but now we?d have some instructions modeled as an interface, others as > classes. Cats and dogs, living together. And it gets messy quick for > instructions that can be bound or unbound, since those inherit from > abstract BoundInstruction or UnboundInstruction respectively. But perhaps > internal implementation details like whether an instruction is bound or > unbound ought to be modeled with composition rather than inheritance (and > optional CodeImpl + pos tuple) in a shared base class? Then it might follow > that each of the interfaces in java.lang.classfile.instruction can really > be a single final class. If all concrete instructions were folded into > their corresponding interface that could reduce the total number of > implementation classes by 46 (though only 6 of those seem to be on a Hello > World) > > Yikes, that?s a deep cut for a small, incremental gain. From an API > consumer point of view I can?t say there?s much difference, and the > factories can still (be evolved to) produce different concrete types when > necessary. > > Maybe someone can think of other, simpler ways to reduce the number of > types floating around in the ClassFile API? > > Thank you for your consideration. 
> > Claes > > [1] > https://github.com/openjdk/jdk/compare/master...cl4es:jdk:fold_instruction_example?expand=1 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.sotona at oracle.com Thu May 2 16:40:41 2024 From: adam.sotona at oracle.com (Adam Sotona) Date: Thu, 2 May 2024 16:40:41 +0000 Subject: Reducing classes loaded by ClassFile API usage In-Reply-To: References: <7B50AF59-E816-497E-84B9-F0C06C034A00@oracle.com> Message-ID: First thanks Claes for recent performance improvements work and findings done. Unfortunately, I don't see much space for reduction when looking at the interfaces exposed as the API. Bound and unbound elements handling is way different, so beside some minor exceptions there is not a space for common implementation in a form of final classes. Performance difference between handling bound and unbound elements is one order of magnitude for each level you dive into the class model. I see small chance of abstract classes reduction in the implementation, however performance benefit would be questionable. Class numbers may reduce, however bytecode loaded may multiply as we would have to inline the abstractions into the individual implementation classes. Huge performance work has been already done on unnecessary from/to String conversions, unnecessary sub-Stringing, arrays/lists cloning. A lot of work has been done on stack maps calculating algorithm and there still may be a space for improvements. Now we are in process of static initialization footprint reduction, and it goes well (inline advertisement: guys, please don't forget to review pr/19006 and the CSR so we can make it into 23). We may also look for internal "unsafe" access to construct some of the symbols, which now must pass through layers of checks, even constructed from constants. Parsing Strings and counting number of square brackets during bootstrap for hard-coded constants is pure wasting of CPU cycles... 
and there may be more such places for improvement, or better implementations of the existing code.

In terms of the number of loaded classes, we cannot compare with ASM, where the visitor API approach is so different and the most frequently used symbol is String.

Personally (and after so many different prototypes), I suggest continuing with API evolution rather than revolution.

Thanks,
Adam

From: classfile-api-dev on behalf of Claes Redestad
Date: Thursday, 2 May 2024 at 15:12
To: Brian Goetz
Cc: classfile-api-dev at openjdk.org
Subject: Re: Reducing classes loaded by ClassFile API usage

I'm curious what data we have on which cases of online transformations are common, and which of those common use cases are most performance-sensitive?

Regardless, I'm just looking for constructive ways to reduce the bootstrap overheads of the API. What we have here today is getting close to being acceptable, but we would be looking at a multitude of regressions if #17108 is integrated.

On 2 May 2024, at 14:04, Brian Goetz wrote:

The benchmark that we used most frequently when writing the library was the null adaptation benchmark, where we visit a class file with a transform that just passes the elements through. It is a measure of the cost to traverse and inflate the representation. We prioritize the case where a class file is transformed with only small changes because this is one of the most common cases for online transformation.

Sent from my iPad

On May 2, 2024, at 7:58 AM, Claes Redestad wrote:

A performance loss where exactly? For classfile generation (and reflection) I'd be more concerned with cold-to-lukewarm cases of getting an app up and running than, say, the number you might get in synthetic benchmarks running the API at peak performance.

On 2 May 2024, at 13:50, Brian Goetz wrote:

For what it's worth, there was an earlier experiment that merged the bound and unbound representation classes, and you could see a measurable performance loss just because these classes have more fields and there were more branches to access them.

Sent from my iPad

From claes.redestad at oracle.com Thu May 2 09:49:39 2024
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 2 May 2024 09:49:39 +0000
Subject: Reducing the number of classes
Message-ID:

Hi,

Looking at replacing ASM with the ClassFile API (CFA) in various places we have observed both startup and footprint regressions. Startup times increase 4-5 ms on Hello World, 40 ms on a small GUI app and 250 ms on a larger app. So there's both a one-off cost and a scaling factor here.

We've been doing some analysis and picked a lot of low-hanging fruit. Bytecode executed has been reduced to about the same level and we've found improvements in dependencies such as the java.lang.constant API. All good. And the number of classes loaded on a Hello World style application has dropped by about 50. Great!

Still the overall picture persists: a Hello World style application that initializes a lambda takes a wee bit longer and the footprint is decidedly larger.

The main culprit, now that some low-hanging fruit has been plucked, seems to be that the trivial use of CFA to spin up lambda proxies is loading about 160 classes:

An ASM-based baseline loads 691 classes.
The best recent CFA version (a merge of https://github.com/openjdk/jdk/pull/19006 and https://github.com/openjdk/jdk/pull/17108) loads 834. A net 143 class difference.

Loading classes slows down startup and increases memory footprint (grows the default CDS archive by ~1 MB, grows process space). And loading/involving more classes - and more code - is typically also bad for at least some of the solutions being explored to "fix" startup at large.

So why is this?

The CFA is mainly split up into two package structures, one public under java.lang.classfile and one internal under jdk.internal.classfile.impl. On the public side most of the types are sealed interfaces, which are then implemented by an assortment of abstract and concrete classes under jdk.internal.classfile.impl. Very neat. But I do fear this means we are at least doubling the number of loaded classes from this neat separation.

While it's a bit late in the game I still feel I must propose striking up a conversation about what, if anything, we could consider that would reduce the number of loaded classes. Whether they are interfaces, abstract or concrete classes. I think any savings would be very welcome.

Here's an idea:

There are a number of cases where the separation seems unnecessary:

public sealed interface ArrayLoadInstruction extends Instruction
        permits AbstractInstruction.UnboundArrayLoadInstruction {
    ...

    static ArrayLoadInstruction of(Opcode op) {
        Util.checkKind(op, Opcode.Kind.ARRAY_LOAD);
        return new AbstractInstruction.UnboundArrayLoadInstruction(op);
    }
}

An interface in java.lang.classfile.instruction which only permits a single implementation class - and as it happens has a static factory method which is the only place where that concrete instruction is instantiated.

Making single-use interfaces such as this one a final class is doable, but now we'd have some instructions modeled as an interface, others as a final class, which would be unclean.
And it gets messy quickly for instructions that can be bound or unbound, since those inherit from abstract BoundInstruction or UnboundInstruction respectively. But perhaps internal implementation details like whether an instruction is bound or unbound ought to be modeled as state on a more common instruction class rather than relying on distinct types. Then it might follow that each of the interfaces in java.lang.classfile.instruction can really be a single final class. If all concrete instructions were folded into their corresponding interface that could reduce the total number of implementation classes by 46 (though only 6 of those seem to be on a Hello World).

Yikes, that's a deep cut for a trivial gain.

Maybe someone can think of simpler ways to reduce the number of types floating around in the ClassFile API?

Thank you for your consideration

Claes

From claes.redestad at oracle.com Fri May 3 18:38:34 2024
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 3 May 2024 18:38:34 +0000
Subject: Reducing the number of classes
In-Reply-To:
References:
Message-ID:

Sorry for the spam - the mailing list first fooled me into thinking I had signed up, then put several e-mails in the "moderation" queue, with broken links to the page that would allow me to cancel.

Mailing lists are dead, see you in the PRs!

Claes

> On 2 May 2024, at 11:49, Claes Redestad wrote:
>
> Hi!

From liangchenblue at gmail.com Sat May 4 00:07:44 2024
From: liangchenblue at gmail.com (-)
Date: Fri, 3 May 2024 19:07:44 -0500
Subject: Reducing the number of classes
In-Reply-To:
References:
Message-ID:

Hi Claes,
I just wonder if you can share the output of -Xlog:class+load before and after, to see which classes contribute the most? Bound instructions shouldn't be loaded during JDK startup, as the JDK, I think, is exclusively writing new class files.
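A before/after comparison of two -Xlog:class+load outputs can be done with a few lines of Java. The following is only a sketch: the class name ClassLoadDiff and its methods are made up for illustration, and it assumes log lines of the usual shape "[0.012s][info][class,load] java.lang.Object source: ..." (the default decorations); other decorator settings would need a different pattern.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: diffs two class+load logs.
final class ClassLoadDiff {
    // Assumed line shape: "[0.012s][info][class,load] java.lang.Object source: jrt:/java.base"
    private static final Pattern LOADED =
            Pattern.compile("\\[class,load\\s*\\]\\s+(\\S+)\\s+source:");

    // Collects the names of all classes reported as loaded in the given log lines.
    static Set<String> loadedClasses(List<String> logLines) {
        Set<String> names = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = LOADED.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    // Returns the classes that appear in the second log but not in the first.
    static Set<String> onlyInSecond(List<String> before, List<String> after) {
        Set<String> extra = loadedClasses(after);
        extra.removeAll(loadedClasses(before));
        return extra;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ClassLoadDiff <before.log> <after.log>");
            return;
        }
        List<String> before = Files.readAllLines(Path.of(args[0]));
        List<String> after = Files.readAllLines(Path.of(args[1]));
        onlyInSecond(before, after).forEach(System.out::println);
    }
}
```

The two inputs could be produced by running the same application on each build with -Xlog:class+load:file=before.log and -Xlog:class+load:file=after.log respectively.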
We do have a precedent where we merge bound and unbound objects, namely Utf8EntryImpl where the state of is . For instructions I don't think we have done so yet, and I would like to confirm via -Xlog:class+load that bound instruction classes are indeed loaded. If they do get loaded, we can consider using the same model that Utf8EntryImpl uses.

Thanks,
Chen

From liangchenblue at gmail.com Mon May 13 22:50:39 2024
From: liangchenblue at gmail.com (-)
Date: Mon, 13 May 2024 17:50:39 -0500
Subject: Moving core reflection's Signature parsing to Class-File API
Message-ID:

Hello,
The Class-File API has a convenient Signature model API, which can replace the internal Generic Tree API in the current core reflection implementation. The old internal Tree API + visitor pattern was bloated with a lot of unnecessary classes, which can be removed with the new CF API.

A working implementation with all tier 1 tests passing is available at https://github.com/openjdk/jdk/compare/master...liachmodded:jdk:feature/new-generic-info?expand=1 .

In my implementation:
- reflectiveObjects (classes directly implementing ParameterizedType, etc.) remain; they are exported, and their implementation is sufficient as-is.
- all other old generic classes are removed
- a few existing usages of old generic code are moved to BytecodeDescriptor, with use sites converted to throw GenericSignatureFormatError (preexisting behavior) instead of IllegalArgumentException
- Just 5 new classes are added to replace all the removed old classes.
Given the size of this patch, I decided to start a discussion on the mailing lists about this proposal. Are there any shortcomings with this patch, or is there anything that I should be cautious of?

Best regards,
Chen Liang

From liangchenblue at gmail.com Tue May 14 23:57:24 2024
From: liangchenblue at gmail.com (-)
Date: Tue, 14 May 2024 18:57:24 -0500
Subject: Class File error handling for users
Message-ID:

Hello ClassFile API programmers,
I noticed that the ClassFile API has recently had quite a few bugs caused by our minimal validation policies, such as JDK-8331940, JDK-8331655, JDK-8331320, JDK-8330684. They are mostly caused by us not defending against malicious values that conform to the CF structure (so they avoid IAE).

I believe our policy of minimal validation (i.e. only validating data that's required to build our API model, such as bci for Label, cp index + type for PoolEntry, etc.) is on the correct path for minimal performance impact. Yet these bugs may have implications for user code: users might frequently encounter such bugs, as we did, especially with bound attributes lazily reading from malicious class files and returning unsanitized primitive values.

As a solution to such problems, I recommend we put up a note in the package info of the ClassFile API, telling users that they should sanitize input class files. Our API has already taken a step in this direction, in that we throw IAE for structurally malformed class files; users should realize that such IAEs only happen with lazy expansion, and should perform extra validation and throw IAE themselves to further validate the correctness of class files.

Do you think this validation problem is valid? And if so, would this solution mitigate the issue? I don't think my solution is perfect and am open to input.

Chen Liang
From brian.goetz at oracle.com Thu May 16 12:36:09 2024
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 16 May 2024 12:36:09 +0000
Subject: Class File error handling for users
In-Reply-To:
References:
Message-ID:

I think it would be worthwhile to try to characterize the sorts of validation that we do and do not perform, so that users don't think "they caught error A but not error B" is purely a result of inconsistent implementation.

From o.myhre at gmail.com Thu May 16 13:54:17 2024
From: o.myhre at gmail.com (Øystein Myhre Andersen)
Date: Thu, 16 May 2024 15:54:17 +0200
Subject: Class File error handling for users
In-Reply-To:
References:
Message-ID:

As I mentioned before, I had problems with the error message "stack size mismatch" at an early stage of my implementation.
I didn't think the message was good enough. At least the message could state the sizes.
Later I got a hint about a suitable transform to monitor the stack.
I chose instead to add dummy jumps to force the error earlier.
I have no problem with this today.

Is it a good idea to add more explanation to the error messages?

- Øystein Myhre Andersen

From liangchenblue at gmail.com Mon May 20 13:36:35 2024
From: liangchenblue at gmail.com (-)
Date: Mon, 20 May 2024 08:36:35 -0500
Subject: Error handling for Label construction from bad BCI
Message-ID:

Hi ClassFile API subscribers,
Looking at the recent developments in the ClassFile API, there are quite a few bugfixes around AIOOBE from Label generation or around not anticipating IAE from Labels. Currently, we aim to always throw IAE instead of IOOBE, but this way malformed bcis cannot be retrieved programmatically from APIs returning Label.

I am thinking of 2 ways to access the bad bci for a label:
1. throw a subtype of IAE that returns the bad bci index (closer to current behavior)
2. return a dummy label (maybe create a new boundlabel type) that can produce the bad bci when used in `CodeAttribute.labelToBci`

Do you think accessing the bad bci programmatically is meaningful, and which approach do you think is better for accessing it?
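Option 1 above could look roughly like the following. This is only a sketch under stated assumptions: BadBciException and LabelSketch are hypothetical names that do not exist in the JDK, and the range check (0 <= bci <= codeLength, allowing a label at the very end of the code array) is an assumption about what counts as a valid bci.

```java
// Hypothetical IAE subtype that remembers the offending bci; not a JDK class.
final class BadBciException extends IllegalArgumentException {
    private static final long serialVersionUID = 1L;
    private final int bci;

    BadBciException(int bci, int codeLength) {
        super("Bytecode offset out of range; bci=" + bci + ", codeLength=" + codeLength);
        this.bci = bci;
    }

    // Lets callers recover the malformed value programmatically.
    int bci() {
        return bci;
    }
}

// Stand-in for the place where a label would be created from a raw bci.
final class LabelSketch {
    // Assumption: a label may point at any offset in [0, codeLength].
    static int checkedBci(int bci, int codeLength) {
        if (bci < 0 || bci > codeLength) {
            throw new BadBciException(bci, codeLength);
        }
        return bci;
    }

    public static void main(String[] args) {
        try {
            checkedBci(42, 10);
        } catch (BadBciException e) {
            // Existing code that catches IllegalArgumentException keeps working;
            // code that cares can catch the subtype and read the bad value.
            System.out.println("bad bci: " + e.bci());
        }
    }
}
```

Because the subtype still is an IllegalArgumentException, this variant would stay compatible with callers that only anticipate IAE, which is presumably why it is described as closer to current behavior.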
From liangchenblue at gmail.com Mon May 20 14:08:10 2024
From: liangchenblue at gmail.com (-)
Date: Mon, 20 May 2024 09:08:10 -0500
Subject: Type-checked entryByIndex and readEntryOrNull
Message-ID:

Hi ClassFile API list,
I call for the addition of a type-checked entryByIndex in ConstantPool, with the signature:
    <T extends PoolEntry> T entryByIndex(int index, Class<T> cls)
and a type-checked readEntryOrNull in ClassReader, with the signature:
    <T extends PoolEntry> T readEntryOrNull(int offset, Class<T> cls)
Both will throw ConstantPoolException if the entry is of a mismatched type, much like the type-checked readClassEntry in ClassReader.

A search for the existing generic ConstantPool::entryByIndex and ClassReader::readEntryOrNull in the JDK reveals that most of their usages within jdk.internal.classfile.impl and its subpackages involve a direct cast right after retrieving the result. These casts are susceptible to malformed classfiles putting entries of the wrong type in place, such as a Utf8 where the superclass entry is expected, and then throw ClassCastException, which is out of spec for the ClassFile API.

I recommend adding these 2 methods for user convenience, and migrating all existing entryByIndex/readEntryOrNull calls with casts to these 2 new methods, to enhance the robustness of the ClassFile API. (On a side note, we can promote ClassReader::utf8EntryByIndex to ConstantPool too)

Please feel free to comment on or critique this proposal.

Chen Liang

From adam.sotona at oracle.com Tue May 21 10:58:54 2024
From: adam.sotona at oracle.com (Adam Sotona)
Date: Tue, 21 May 2024 10:58:54 +0000
Subject: RFR: 8332597: Remove redundant methods from j.l.classfile.ClassReader API
In-Reply-To:
References:
Message-ID:

Hi,
Class-File API JCK work revealed some inappropriately exposed methods in the API.
Below is a proposal to remove two methods from j.l.classfile.ClassReader.
Please let me know of any objections, or review the PR and the related CSR: https://bugs.openjdk.org/browse/JDK-8332598

Thank you,
Adam

A j.l.classfile.ClassReader instance is exposed in the Class-File API only through the j.l.classfile.AttributeMapper::readAttribute method.
ClassReader's only purpose is to serve as a tool for reading the content of a custom attribute in a user-provided AttributeMapper. It contains a useful set of low-level class reading methods for users to implement a custom attribute content parser.

However, the methods ClassReader::thisClassPos and ClassReader::skipAttributeHolder are not necessary for parsing custom attribute content and so are redundant in the API.
The Class-File API implementation uses these methods internally, but they should not be exposed in the API.

This patch removes the methods from the API.

Please review.

Thanks,
Adam

-------------

Commit messages:
 - 8332597: Remove redundant methods from j.l.classfile.ClassReader API

Changes: https://git.openjdk.org/jdk/pull/19323/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19323&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332597
  Stats: 26 lines in 5 files changed: 0 ins; 21 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/19323.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19323/head:pull/19323

PR: https://git.openjdk.org/jdk/pull/19323

From liangchenblue at gmail.com Tue May 21 11:54:10 2024
From: liangchenblue at gmail.com (Chen Liang)
Date: Tue, 21 May 2024 06:54:10 -0500
Subject: Type-checked entryByIndex and readEntryOrNull
In-Reply-To:
References:
Message-ID:

Hi Adam,
This patch is simple; since it can prevent a lot of bugs around malicious CP references in crafted classfiles, should we consider this enhancement for JDK 23, or should we only have it as internal APIs in ClassReaderImpl?

- Chen

From adam.sotona at oracle.com Tue May 21 12:19:44 2024
From: adam.sotona at oracle.com (Adam Sotona)
Date: Tue, 21 May 2024 12:19:44 +0000
Subject: Type-checked entryByIndex and readEntryOrNull
In-Reply-To:
References:
Message-ID:

Hi Chen,
Internally it is already resolved, and changing only the internal implementation does not bring much value.

I'm OK with the proposed API addition; however, real use cases would give the proposal more weight.
The priority is to clean up the Class-File API, and additions to the API should be backed by real use cases or visible benefits in the existing code.
Procedurally, feel free to go ahead and propose it for 23.

Thanks,
Adam

From liangchenblue at gmail.com Wed May 22 14:24:19 2024
From: liangchenblue at gmail.com (Chen Liang)
Date: Wed, 22 May 2024 09:24:19 -0500
Subject: Type-checked entryByIndex and readEntryOrNull
In-Reply-To:
References:
Message-ID:

Hi Adam and the list,
I have since created JDK-8332614 and opened a pull request: https://github.com/openjdk/jdk/pull/19330
Feel free to take a look and comment!

Thanks,
Chen
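The difference between a cast at the call site and a type-checked lookup, as discussed in this thread, can be illustrated with a toy model. This is a minimal sketch, not the JDK implementation: TypedPoolSketch, its Entry types and PoolTypeException are hypothetical stand-ins (the real API uses PoolEntry and ConstantPoolException), and the toy pool is indexed from 0, unlike a real constant pool.

```java
import java.util.List;

// Toy constant pool illustrating a type-checked lookup; all names are hypothetical.
final class TypedPoolSketch {
    interface Entry {}
    record Utf8(String value) implements Entry {}
    record ClassRef(Utf8 name) implements Entry {}

    // Stand-in for ConstantPoolException: an IAE, so callers never see ClassCastException.
    static final class PoolTypeException extends IllegalArgumentException {
        private static final long serialVersionUID = 1L;

        PoolTypeException(String message) {
            super(message);
        }
    }

    private final List<Entry> entries;

    TypedPoolSketch(List<Entry> entries) {
        this.entries = List.copyOf(entries);
    }

    // Checks both the index and the entry type before returning, instead of
    // leaving an unchecked cast to the caller.
    <T extends Entry> T entryByIndex(int index, Class<T> cls) {
        if (index < 0 || index >= entries.size()) {
            throw new PoolTypeException("Bad constant pool index: " + index);
        }
        Entry entry = entries.get(index);
        if (!cls.isInstance(entry)) {
            throw new PoolTypeException("Entry " + index + " is a "
                    + entry.getClass().getSimpleName()
                    + ", expected " + cls.getSimpleName());
        }
        return cls.cast(entry);
    }

    public static void main(String[] args) {
        TypedPoolSketch pool = new TypedPoolSketch(List.of(
                new Utf8("java/lang/Object"),
                new ClassRef(new Utf8("java/lang/Object"))));
        System.out.println(pool.entryByIndex(1, ClassRef.class).name().value());
        try {
            // A crafted class file could make a "superclass" index point at a Utf8.
            pool.entryByIndex(0, ClassRef.class);
        } catch (PoolTypeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the check inside the lookup, a mismatched entry surfaces as the documented exception type rather than as a ClassCastException thrown from whichever caller happened to cast the result.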