From rafael.wth at gmail.com Fri Jul 1 13:51:57 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Fri, 1 Jul 2022 15:51:57 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
Message-ID:

Hello,

Thanks for the initiative, I am looking forward to having this as a public API at some point! I maintain Byte Buddy, a code generation library that relies on ASM for processing byte code. To also support the JDK API, I want to make the currently used ClassReader/ClassWriter pluggable so that it can use the JDK API once it is available. This way, Byte Buddy can offer some form of forward compatibility, for example for Java agents, by using ASM on older JVMs but OpenJDK-based readers/writers on JVMs that include these new APIs.

For a POC, I have now implemented a ClassReader that uses a ClassModel to delegate to ASM's API. I could validate that ASM and the JDK API generate the same output for all constellations I could think of: https://github.com/raphw/asm-jdk-bridge

This works really well, congratulations on such a well-working first attempt. Some details stuck out, however:

1. It does not seem like it is possible to model "CROP" frames. The frames that are read include zero additional local variables, but they do not indicate how many local variables are cropped. I would suggest adding subinterfaces for all frame types to allow for the same pattern matching style that works well with the rest of the API. This way, the declaredLocals and declaredStack methods would only be available if relevant and the crop frame type could add a croppedLocals() : int method.

2. StackMapFrame uses offsetDelta and absoluteOffset to indicate the frame's location. I found that a bit awkward, as I need to keep track of the offset just to add frames at the right location. With all other types, Labels are used to indicate the code location. Why are labels not used for frames to keep things consistent? Also, I didn't really understand the purpose of "initialFrame".
Is it a mere convenience?

3. Debugging with toString works well for most, but not all, classes; for example, subclasses of CodeElement have representations. It's probably an oversight, but it would be neat to add this quickly to make exploring the code easier.

4. The JDK code knows an attribute named CharacterRange. I must confess that I had never heard of it, and I could not find documentation either. I assume it is a JDK20 concept? This made me think, however, about how such "unknown" attributes can be handled. I would like to find a way to treat all attributes that I do not know or care about as an UnknownAttribute. This way, I could simply forward them as a binary dump, as I currently do for custom attributes, for example to forward them to ASM. However, today there is no way to convert an attribute payload back to an array.

5. I think the "findAttribute" method will be invoked a lot. Currently, it iterates over all attributes on each call. Ideally, this would only be done once to create a map of attributes with O(1) lookup speed. Of course, I could do this myself, but I think this would be such a common need that I wanted to suggest it as the implementation for the official API, especially since the API feels like a map lookup. This could be done as easily as storing attributes in a map after they are read for the first time, as the attribute keys are already compared by identity.

6. I found that TypeAnnotation.CatchTarget offers an index of the exception table entry, in addition to being visited inline. I found that model a bit awkward, as there is no indication of the index in the ExceptionCatch instruction that comes with the same pass. Also, there is no guarantee on when the type annotations are visited in relation to the try-catch block. Ideally, it would be guaranteed that the annotation is visited directly after the ExceptionCatch pseudo instruction, to allow for easy single-pass processing.

7.
There is a SAME_LOCALS_1_STACK_ITEM frame constant with an "EXTENDED" version where this word is appended. For the SAME constant, the extended version is called SAME_FRAME_EXTENDED. To keep it consistent, should this constant be renamed to SAME_EXTENDED? Also, there is a RESERVED_FOR_FUTURE_USE constant. Should that constant exist even now?

Finally, I have not yet started working on a ClassWriter equivalent. Here, I found the Consumer style to be incompatible with the way ASM works. This is of course a decision of style, but I would consider this a major difficulty in migrating current code at some point. Would it be an idea to offer both styles of class creation? The internal one, in addition to exposing an interface with the methods of DirectClassBuilder? With some API such as:

    DirectClassBuilder builder = Classfile.build(ClassDec);
    byte[] classFile = builder.toByteArray();

I could easily plug an OpenJDK-based ClassWriter into Byte Buddy's ASM code. Today, I can emulate this by building a new class for every instruction I encounter, but this is of course quite inefficient.

Thanks and best regards,
Rafael

From brian.goetz at oracle.com Fri Jul 1 15:01:59 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 1 Jul 2022 11:01:59 -0400
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References:
Message-ID: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com>

Thanks for test-driving this!

> This works really well, congratulations to such a well-working first attempt. Some details stuck out however:

Good to hear.

> 1. It does not seem like it is possible to model "CROP" frames. The frames that are read include zero additional local variables, but they do not indicate how many local variables are cropped. I would suggest adding subinterfaces for all frame types to allow for the same pattern matching style that works well with the rest of the API.
> This way, the declaredLocals and declaredStack methods would only be available if relevant and the crop frame type could add a croppedLocals() : int method.

In general, the API for stack maps hasn't gotten a lot of attention, as there is not a lot of user code that accesses stack maps at all, and we generate stack maps as part of code generation. So I'm not surprised there are missing bits or rough edges (same for next issue). Are you generating stack maps yourself? Is the generator in the library not meeting your needs?

> 3. Debugging with toString works well for the most, but not all classes, for example subclasses of CodeElement have representations. It's probably an oversight but it would be neat to add this quickly to make exploring the code easier.

We should make a pass through all the implementations and make sure that toString is both present and uniform. I am not surprised there are gaps here. That would be a fine "starter bug" for someone to grab.

> 4. The JDK code knows an attribute named CharacterRange. I must confess that I never heard of it and I could neither find documentation. I assume it is a JDK20 concept? This made me however think about how such "unknown" attributes can be handled. I would like to find a way to treat all attributes that I do not know or care about as an UnknownAttribute. This way, I could simply forward them as a binary dump, as I currently do it for custom attributes, for example to forward it to ASM. However, today there is no way to convert an attribute payload back to an array.

It is not a JDK20 concept; it is what is used by the `jcov` coverage tool (and I believe others such as jacoco), from the CodeTools project. Javac will generate this attribute if you ask it nicely.

You should just ignore attributes you don't understand; that's how attributes were designed to work (there will always be new ones). But your point about getting the raw contents is well taken.
Essentially, your request is to raise `UnknownAttribute::contents` up to Attribute, perhaps renaming to `rawContents`. This seems an easy and reasonable addition.

> 5. I think the "findAttribute" method will be invoked a lot. Currently, it iterates over all attributes on each call. Ideally, this would only be done once to create a map of attributes with O(1) lookup speed. Of course, I could do this myself, but I think this would be such a common need that I wanted to suggest it as implementation for the official API, especially since the API feels like a map lookup. This could be done as easily as by storing attributes in a map after they are read for the first time as the attribute keys already are compared by identity.

We considered this, and early performance tests suggested this was a loser. However, that was a lot of API and workload churn ago, before AttributeMapper, which is interned, so we could try with an IdentityHashMap here and see whether it perturbs the benchmarks.

> 6. I found that TypeAnnotation.CatchTarget offers an index of the exception table entry, additionally to being visited inline. I found that model a bit awkward as there is no indication of the index in the ExceptionCatch instruction that comes with the same pass. Also, there is no guarantee on when the type annotations are visited in relation to the try catch block. Ideally, it would be guaranteed that the annotation is visited directly after the ExceptionCatch pseudo instruction, to allow for an easy single-pass processing.

Yeah, everything about the type annotations attributes is a disaster... Transforming while keeping the validity of the TA attributes is essentially impossible, because it includes indexes into things like exception tables, code attributes, tvar lists, tvar bounds lists, etc etc.
We can decode a TA attribute, we can give you a way to write one, but for a while we seriously considered just _dropping TA attributes on the floor during adaptation_ because they're so inherently unstable.

The visitation order guarantee you ask for is pretty expensive, and something that very very few people would care about. So I surely wouldn't want to do that by default.

We've largely punted on doing anything better with type annotations, at least up until now, because they're such a disaster. Still, we can probably do better here. Open to suggestions.

> Finally, I have not yet started working on a ClassWriter equivalent. Here, I found that the style of the Consumer to be incompatible with the way ASM works. This is of course a decision of style, but I would consider this a major difficulty to migrate current code at some point. Would it be an idea to offer both styles of class creation? The internal one, additionally to exposing an interface with the methods of DirectClassBuilder?

We've struggled with this as well when adapting the back end of `javac`. This is a tricky issue. In theory, allowing users to ask for a ClassBuilder is not hard, as you've sketched out. But I am really worried it will become an attractive nuisance, where users will instinctively reach for that style because it's familiar and seems "easier", and then will lose out on some features that the Consumer approach offers, such as managing branch offset sizes. Success would be if this were allowed, but only used 1% of the time. Failure would be 3% :)

Q: in this case, is it enough if only ClassBuilder has this option, or do you need it for MethodBuilder and CodeBuilder as well?
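Editorial illustration of point 5 in the exchange above: a minimal sketch of what an identity-keyed attribute lookup might look like, next to the linear scan it would replace. `Mapper`, `Attr`, and `AttributeIndex` are hypothetical stand-ins invented for this sketch; they are not the prototype's `AttributeMapper`, `Attribute`, or `findAttribute`, whose real signatures are not shown in this thread. The only assumption taken from the thread is that mapper keys are interned and can be compared by identity.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an interned attribute key; compared by identity only.
final class Mapper<T> {
    final String name;
    Mapper(String name) { this.name = name; }
}

// Hypothetical stand-in for a parsed attribute.
final class Attr<T> {
    final Mapper<T> mapper;
    final T payload;
    Attr(Mapper<T> mapper, T payload) { this.mapper = mapper; this.payload = payload; }
}

final class AttributeIndex {
    private final List<Attr<?>> attributes;
    private Map<Mapper<?>, Attr<?>> byMapper; // built lazily on the first map lookup

    AttributeIndex(List<Attr<?>> attributes) { this.attributes = List.copyOf(attributes); }

    // Linear scan on every call: O(n) per lookup.
    <T> Optional<Attr<T>> findByScan(Mapper<T> mapper) {
        for (Attr<?> a : attributes) {
            if (a.mapper == mapper) {
                @SuppressWarnings("unchecked") Attr<T> hit = (Attr<T>) a;
                return Optional.of(hit);
            }
        }
        return Optional.empty();
    }

    // One pass to build the identity map, then O(1) expected per lookup.
    <T> Optional<Attr<T>> findByMap(Mapper<T> mapper) {
        if (byMapper == null) {
            Map<Mapper<?>, Attr<?>> m = new IdentityHashMap<>();
            for (Attr<?> a : attributes) {
                m.putIfAbsent(a.mapper, a); // keep the first occurrence, like the scan does
            }
            byMapper = m;
        }
        @SuppressWarnings("unchecked") Attr<T> hit = (Attr<T>) byMapper.get(mapper);
        return Optional.ofNullable(hit);
    }
}
```

Whether the map actually wins depends on how many attributes a typical element carries and how often lookups repeat; for a handful of attributes the scan may well be faster, which is consistent with the early benchmark result mentioned above.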
From rafael.wth at gmail.com Wed Jul 6 21:05:39 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Wed, 6 Jul 2022 23:05:39 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com>
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com>
Message-ID:

Hello again,

Thanks for merging my patches; the class reader API now works more or less equivalently to the ASM version. The one thing I do not have working so far is attributes that the JDK knows but ASM does not. I was trying to implement the "contents" method upwards, but this is difficult with the unbound attributes, as they normally require a BufWriter, which in turn needs a ConstantPoolBuilder. Somehow, I need to pipe the constant resolution to ASM, which cannot be done without unsealing some interfaces. I will try to prototype a solution here, too, but I wanted to get the writer working first.

With the writer, I have made some progress after adding a monadic view to ClassBuilder where one can apply a consumer multiple times before "closing" the writer to extract a class file. I pushed this experiment as a commit on my clone (https://github.com/raphw/jdk-sandbox/commit/2be58f400b9ebf96b851eda658e0b8d2560421c5) to showcase the way I thought this might work. In theory, it should allow for any optimization of the current API. At the same time, it is awkward enough that people would only use it if they really needed it and would therefore avoid it by default. And once they use it, any IDE would ask for closing each intermediate object when detecting the AutoCloseable interface. The only thing that I had to compromise on compared to the "non-open" API was the use of CodeBuilderImpl, which currently reapplies the consumer in case of a LabelOverflowException. At the same time, I hoped that this might be a temporary state anyway, as the possible reapplication is unlikely to be expected by any user.
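Editorial illustration of the trade-off described in the paragraph above: a minimal sketch contrasting a consumer-driven builder, where the library may replay the consumer, with an "open" AutoCloseable builder that is fed step by step. Every type here (`SketchBuilder`, `SketchClassfile`, `OpenBuilder`) and the `shortLimit` parameter are hypothetical and exist only for this sketch; they are not taken from the linked commit or from the prototype API. The size check merely stands in for a label overflow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical element sink; a real builder would emit class file structures.
final class SketchBuilder {
    final List<String> elements = new ArrayList<>();
    SketchBuilder with(String element) { elements.add(element); return this; }
}

final class SketchClassfile {
    // Closed style: the library owns control flow, so it is free to throw away
    // a first attempt and run the handler again (the "reapply" behaviour).
    static List<String> build(Consumer<SketchBuilder> handler, int shortLimit) {
        SketchBuilder builder = new SketchBuilder();
        handler.accept(builder);
        if (builder.elements.size() > shortLimit) { // stand-in for a label overflow
            builder = new SketchBuilder();          // a real retry would use wide branches
            handler.accept(builder);                // the handler runs a second time
        }
        return List.copyOf(builder.elements);
    }

    // Open style: the caller owns control flow and feeds the builder step by step.
    static OpenBuilder open() { return new OpenBuilder(); }
}

final class OpenBuilder implements AutoCloseable {
    private final SketchBuilder delegate = new SketchBuilder();
    private List<String> result;

    OpenBuilder apply(Consumer<SketchBuilder> step) {
        if (result != null) throw new IllegalStateException("builder already closed");
        step.accept(delegate); // each step runs exactly once; nothing can be replayed later
        return this;
    }

    @Override
    public void close() {
        if (result == null) result = List.copyOf(delegate.elements);
    }

    List<String> result() {
        if (result == null) throw new IllegalStateException("builder not closed yet");
        return result;
    }
}
```

In the open style the steps have already executed by the time an overflow could be detected, so the library would have to patch offsets in place instead of replaying user code. The sketch also shows the use-after-close and missing-close states that the open style introduces.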
For the type annotations on instructions: Would it be an option to add "getVisibleAnnotations" and "getInvisibleAnnotations" to the relevant CodeElement types? This way, I could for example query the "ExceptionCatch" value for its annotations.

StackMapFrames could on the other hand just be added at their position in the CodeElement iteration, to receive them where they become relevant. This way, one would not need to keep track of the current offset. This would also allow for an easier write model, as ASM does not allow you to know the offset of a stack map. I assume that the current model is very much modeled after the needs of the javap tool. Ideally, the frame objects would be reduced to the information that is contained in a class file, and the consumer could track implicit information such as the "effective" stack and locals.

Java agents do not normally compute stack map frames, as it requires determining the most specific common supertype. This is not always possible when classes are not yet loaded, as the type information is not yet available. Byte Buddy, for example, uses Java templates where two javac-compiled classes are merged and the stack map frames of both classes are combined on the fly. This works with a single pass and does not require a recomputation, which also makes it very efficient. This is why I believe there should be an explicit way of defining stack map frames.

If you want to see my attempt at a writer using my suggested "monad" API for writers, here's my status quo: https://github.com/raphw/asm-jdk-bridge/compare/main...writer-poc

Lastly, some notes I made while working with the writer API:

- ClassBuilder does not require flags when creating a field using "withField", but does so for "withMethod". Should this not be consistent?
- EnclosingMethodAttribute requires optional parameters in its "of" method. I find that a bit awkward and would rather see an overload, or have the parameters accept "null".

Thanks,
Rafael

From paul.sandoz at oracle.com Fri Jul 8 20:53:50 2022
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 8 Jul 2022 20:53:50 +0000
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com>
Message-ID:

The inversion of control in the current API is indeed awkward when something else wants to take overall control, and in those circumstances one would have to give up certain features (like reapply, as you noted) and there is more room for error (e.g. pairing closes, use after close).

Arguably, making those open builders AutoCloseable is misleading, since if the building can be lexically scoped one should use the existing API (ignoring details on exceptions and capturing, which I don't think are sufficient to justify a new mode of writing).

It feels like the ASM integration is more of an academic exercise. A useful one to play with the API and provide feedback, but in practice how useful is it? (Since one can always interoperate between classfiles.)

I am concerned the choice will be a distraction, but I don't have any better concrete ideas right now. It would be helpful to understand more about the integration experiments with the Java compiler to compare/contrast.

Paul.
From brian.goetz at oracle.com Sun Jul 10 17:40:19 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 10 Jul 2022 17:40:19 +0000
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com>
Message-ID: <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>

Just some quick comments, more when I get back from vacation.

> Thanks for merging my patches, the class reader API works more or less equivalent to the ASM version. The one thing I have not working so far are attributes that the JDK knows but ASM does not. I was trying to implement the "contents" method upwards, but this is difficult with the unbound attributes as they normally require a BufWriter which in turn needs a ConstantPoolBuilder. Somehow, I need to pipe the constant resolution to ASM which cannot be done without unsealing some interfaces.
> I will try to prototype a solution here, too, but I wanted to get the writer working first.

I'll take a look at this when I get back.

> For the type annotations on instructions: Would it be an option to add "getVisibleAnnotations" and "getInvisibleAnnotations" to the relevant CodeElement types? This way, I could for example query the "ExceptionCatch" value for its annotations.

I'd like to slice this in two parts; one to query the BCI for a (bound) CodeElement, and one to use that BCI to lookup annotations. Getting the BCI is an essential bit of functionality, but given that elements can be bound or unbound, such functionality will be necessarily partial, so we have to communicate "no BCI" as a possible return value.

> StackMapFrames could on the other hand just be added at their position in the CodeElement iteration to receive them where they become relevant. This way, one would not need to keep track of the current offset. This would also allow for an easier write model where ASM does not allow you to know the offset of a stack map. I assume that the current model is very much modeled after the needs of the javap tool. Ideally, the frame objects would be reduced to the information that is contained in a class file and the consumer could track implicit information such as the "effective" stack and locals.

There are two concerns with this, one minor and one major. The minor one is that this has a significant cost, and most users don't want this information. So we would surely want to gate this with an option whose default is false. (We care the most about transformation, and most transformations make only light changes, so we don't want to add costs that most users won't want to bear.)

The major one is how it perturbs the model. The element stream delivered by a model, and the one consumed by a builder, should be duals. What should a builder do when handed a frame? Switch responsibility over to the user for correct frame generation?
Ignore it and regenerate stack maps anyway? I think we need a better picture of "who is the audience for this sub feature" before designing the API.

> At last, some notes I made while working with the writer API:
> - ClassBuilder does not require flags when creating a field using "withField", but does so for "withMethod". Should this not be consistent?

One can always set flags through the AccessFlags element; anything else is just a convenience. The convenience is there for fields because they almost never have any other elements, but methods almost always do. I don't necessarily object to conveniences, but I don't want to confuse users with too many ways to do the same thing. (I think the flags-accepting overload for fields does NOT accept a Consumer, so there's always only one way to do it within each withField overload, which is a good thing.)

> - EnclosingMethodAttribute requires optional parameters in its "of" method. I find that a bit awkward and would rather see an overload or the parameters to accept "null".

This is a deeper question of API design than it first appears. (BTW originally we had the null-accepting/delivering version, and we found it to be error-prone.) We want the _data model_ to be the source of truth, and to derive the API from that. One reason for doing so is that it means that the reading API (accessors on the Element) and the writing API (withXxx methods on Builders) will be free of gratuitous inconsistencies, so you can always take things apart to whatever level you want, clone/modify/keep each individual datum, and feed them back to the corresponding builder without having to convert, wrap, unwrap, etc. Since `enclosingMethod` returns an Optional, there's a strong argument that the of() method should take one. Otherwise, users have to make up the difference in ad-hoc and error-prone ways when adapting classfiles.
Both Optional and null are reasonable ways to represent possibly-not-there values in an API like this, but mixing the two in the same API multiplies the chance for error. So I prefer to address this question at a higher level ("should anything ever return null?") rather than what a particular accessor or factory should deal in.

Cheers,
-Brian

From rafael.wth at gmail.com Sun Jul 10 18:54:43 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Sun, 10 Jul 2022 20:54:43 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com>
Message-ID:

You are right that using AutoCloseable might be misleading; I had the same thought. I still chose the interface, as IDEs know the type and warn if an AutoCloseable is neither passed on nor closed, so I felt it would be a better choice compared to a custom method.

I do however disagree with the ASM integration being an academic exercise. ASM is well-established and a de-facto standard for code generation. There's tons of code written in ASM and sometimes (like in Byte Buddy), ASM is publicly exposed and cannot be replaced without creating years of migration work. My hope is that a bridge like this would allow ASM to adapt to the OpenJDK APIs for its readers and writers (on VMs where those are available). By achieving this, the "ASM problem" could be solved with the JVM version in which the API is released, which is likely many years before the OpenJDK API could make ASM obsolete. I agree that the focus should be on a convenient API and not on reinventing ASM in new clothes, but I think the new API should aim to make itself integratable, as this would create huge value and speed up adoption.

From rafael.wth at gmail.com Sun Jul 10 20:57:04 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Sun, 10 Jul 2022 22:57:04 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To: <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
Message-ID:

I was able to complete a POC for an ASM-to-OpenJDK bridge. It works quite well, but I hit a few more roadblocks than last time: https://github.com/raphw/asm-jdk-bridge/tree/writer-poc

For this POC, I have used my API suggestion of "open builders" (https://github.com/raphw/jdk-sandbox/commit/2be58f400b9ebf96b851eda658e0b8d2560421c5), which I have modified slightly since my last mail. As Paul pointed out, it does not allow a code builder to be repeated in case of a LabelOverflowException. But personally, I would - as a user - not expect my CodeBuilder consumer to be consumed multiple times; I would hope that the implementation could instead reiterate to patch the label offsets, or at least do that for the "open version". Anyway, I think that this limitation is solvable.

I have again written a series of tests to see how well the bridge is working. Again, I made a few observations:

- The frame generation works, but generates slightly different frames from what javac generated if I leave the byte code untouched. For example, local variables are "voided" if they are not used after a frame.
  This is not a problem in practice, but I know that some tools prepend code to methods, assuming all variables still exist, without checking frames, which would then fail bytecode verification. This would however only happen if a class file was first processed by OpenJDK's API and later by ASM. In my bridge sample repo, this is reproduced by the test of the "TryThrowCatch" class.

- The frame generation does not work for optional types, which are often used, especially in enterprise-directed frameworks like Spring (whether this is fortunate is another question, but it is quite common). I understand that the class hierarchy resolution is pluggable, but for optional types, the class hierarchy cannot be resolved, even with a custom resolver, since these types are in fact missing at runtime. This is why Byte Buddy uses "frame weaving", which would require that frames can be written explicitly and are not handled by the class writer. I would therefore suggest that one should be able to opt out of frame generation, be it for single methods, and have some API to add frames explicitly. This normally also improves performance. Byte Buddy for example generates three times the overhead if ASM's frame computation is enabled, as it is rather I/O-heavy. An example of where OpenJDK creates invalid frames is documented in the "FrameWithMissingType" test of the bridge repo. Also, many byte code processing frameworks already need to keep track of the maximum stack and locals; I am not sure how expensive computation of those values is in practice, but ASM claims that it saves 20% on avoiding the bookkeeping. Maybe one could consider allowing these values to be specified too if they are available anyways.

- I noticed that when adding a TypeAnnotation on a NEW instruction, the offset in the attribute seems to be set incorrectly. The type annotation is added with reference to the DUP bytecode that follows NEW. I am not 100% sure if this is legal as the type reference still points to a NEW instruction, but I would still suggest fixing it. I tried to do this myself, but I got lost in the code. Maybe I will find some more time to still do it myself some time. The error is documented in the "TypeAnnotationInCode" test of my repo.

- I noticed that the JSR/RET instructions are not supported, even if they are just "passed through". Unfortunately, I still encounter them regularly when working with Java agents. JDBC drivers are often compiled to very old bytecode levels and, for example, vendors of APM tools like instrumenting those. I know they are difficult and annoying bytecodes, but I would expect that OpenJDK will eventually be able to process those. Is this only meant as an intermediate limitation?

- Custom/unknown attributes are also difficult to deal with when writing. In a way, it's the same problem as with reading them. As I mentioned then, I lack some way of resolving an unknown Attribute to an array of bytes given some constant pool. Currently, I cannot even attempt this as all interfaces are sealed, and I could not provide a custom implementation of a ConstantPoolBuilder to an AttributeMapper either. As things are, ASM does not support this either, and I made a suggestion to open up the attribute API, which is currently tied to ClassReader/ClassWriter implementations (https://gitlab.ow2.org/asm/asm/-/merge_requests/353). This is a prerequisite, and I would hope to have a similar API that is implementable in OpenJDK to pass to OpenJDK's AttributeMapper class, to write attributes from and to byte arrays.

- OpenJDK's "line number" API only allows setting a line number for the current code position. For most use cases, I think this is sufficient, but it stuck out to me as most other in-code attributes are currently tied to labels. Ideally, I would suggest that line numbers should also be definable for a previous position with a label; ASM at least allows this, and I assume people will have an easier time migrating code.

Before all of these things however, some form of "open API" would be required to even make the class writer bridge possible, so I hope that you can consider something in the direction of my suggested pull request. Thanks for considering! I will try to make suggestions for the minor things I pointed out in the meantime and follow up with the ASM crew to see if we can get the attribute problem sorted, too.

Many thanks, Rafael
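The suggestion that line numbers be definable for an earlier position through a label can be sketched in isolation. This is only an illustration under stated assumptions: `Label`, `Entry` and `LabelLineNumbers` are hypothetical stand-ins written for this example and are not types from the OpenJDK classfile API or from ASM. The idea is to record (label, line) pairs in any order and turn them into `start_pc`/`line_number` rows once every label has an offset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LabelLineNumbers {

    /** Hypothetical label: a code position that is unknown until it is bound. */
    public static final class Label {
        private int offset = -1;
        public void bind(int bytecodeOffset) { this.offset = bytecodeOffset; }
        public boolean isBound() { return offset >= 0; }
        public int offset() { return offset; }
    }

    /** One row of a line number table: start_pc and line_number. */
    public record Entry(int startPc, int line) {}

    /** A registration that has not been turned into a table row yet. */
    private record Pending(Label label, int line) {}

    private final List<Pending> pending = new ArrayList<>();

    /** Registers a line for a label; the label may refer to an earlier position. */
    public void lineNumber(int line, Label label) {
        pending.add(new Pending(label, line));
    }

    /** Resolves all registrations; fails if a label was never bound. */
    public List<Entry> resolve() {
        List<Entry> entries = new ArrayList<>();
        for (Pending p : pending) {
            if (!p.label().isBound()) {
                throw new IllegalStateException("unbound label for line " + p.line());
            }
            entries.add(new Entry(p.label().offset(), p.line()));
        }
        // Sorted by start_pc only to make the result easier to read.
        entries.sort(Comparator.comparingInt(Entry::startPc));
        return entries;
    }

    public static void main(String[] args) {
        LabelLineNumbers recorder = new LabelLineNumbers();
        Label start = new Label();
        Label later = new Label();
        start.bind(0);
        later.bind(5);
        // The line for the earlier position is registered last.
        recorder.lineNumber(11, later);
        recorder.lineNumber(10, start);
        System.out.println(recorder.resolve());
    }
}
```

Because resolution is deferred, the order in which lines are registered no longer has to match the order of the code positions, which is the property a label-accepting overload would give a bridge.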
From rafael.wth at gmail.com Mon Jul 11 07:41:31 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Mon, 11 Jul 2022 09:41:31 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
Message-ID:

Short follow up:

1. I figured out that the annotation on the NEW instruction used a label that was bound after the instruction when it should have been bound before. My hacky workaround for ASM, which only visits annotations after an instruction, is to always register a label prior to annotatable instructions, but ideally I would be able to offset a label by a given size to avoid this. I felt like this could be a solution as OpenJDK already exposes bytecode offsets in its public API.

2. I suggested a change for line numbers to optionally accept a label. This way, I can register line numbers later, similarly to how ASM does it. I understand it is unlikely to be a common need, but it is a minor change and it brings line numbers in line with the other metadata in the API, which all use labels.

Best regards, Rafael

From rafael.wth at gmail.com Mon Jul 11 22:09:06 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Tue, 12 Jul 2022 00:09:06 +0200
Subject: javap with classfile API
Message-ID:

Hello,

I completed a full round of comparing byte code output with ASM and OpenJDK's APIs and, in doing so, I was using the sandbox's version of javap. I found some errors in the tool, which I fixed for my own convenience along the way (if you want to adopt those fixes: https://github.com/openjdk/jdk-sandbox/pull/21; basically it makes the javap output identical to the ASM-based version).

I was however wondering about printing dynamic constants:

1. For invokedynamic instructions,

   printConstantPoolRefAndValue(instr.invokedynamic(), 0)

is used. I do not know why the 0 is printed here, but it was printed previously. Possibly, it was meant to yield an equivalent output for the "count" (JVMS 6.5) value of interface calls, but I do not see its value. Maybe this should be changed in javap.

2. I changed the print of DynamicConstantPoolEntrys to

   print("#" + info.bootstrap().bsmIndex() + ":#" + info.nameAndType().index());

, the same that is used in the string representation.
Previously, the index was read via info.bootstrap().bootstrapMethod().index(). I do not quite understand why those two would be different, and bsmIndex seems to return 0 for lambda meta factory values, but it is indeed the old behavior. Printing the reference to the lambda meta factory might however be the better option. In that case, the string representation should also be adjusted.

3. For handle types, the internal constant names were printed. Those are not part of the spec, so I changed it to print the documented REF_abc values. Those do not, however, indicate whether a reference addresses an interface invocation; maybe this information should be included in javap.

4. I thought I had found a series of issues with printing in-code type annotation type references, but it seems that javac has a series of issues with adding them to class files. So it is javac that gets those wrong, not javap. I validated that this has been an issue previously and is not related to the change in class file emitter. I reported these issues here (https://bugs.openjdk.org/browse/JDK-8290125) but wanted to mention it in case anybody runs into the same issue.

Best regards, Rafael

From forax at univ-mlv.fr Mon Jul 11 22:21:06 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 12 Jul 2022 00:21:06 +0200 (CEST)
Subject: javap with classfile API
In-Reply-To:
References:
Message-ID: <546997455.8954361.1657578066882.JavaMail.zimbra@u-pem.fr>

> From: "Rafael Winterhalter"
> To: "classfile-api-dev"
> Sent: Tuesday, July 12, 2022 12:09:06 AM
> Subject: javap with classfile API

> 1. For invokedynamic instructions,
> printConstantPoolRefAndValue(instr.invokedynamic(), 0)
> is used. I would not know why the 0 is printed here, but it was printed previously. Possibly, it was meant to yield an equivalent output for the "count" (JVMS 6.5) value of interface calls, but I do not see its value. Maybe this should be changed in javap.

invokedynamic format is 5 bytes with the same encoding as invokevirtual, the opcode, two bytes for the constant and two bytes that are 0.
Those two bytes initially at 0 are used by the VM to reference the target of the CallSite once the BSM is executed.

see https://docs.oracle.com/javase/specs/jvms/se18/html/jvms-6.html#jvms-6.5.invokedynamic

Rémi

From rafael.wth at gmail.com Tue Jul 12 06:54:20 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Tue, 12 Jul 2022 08:54:20 +0200
Subject: javap with classfile API
In-Reply-To: <546997455.8954361.1657578066882.JavaMail.zimbra@u-pem.fr>
References: <546997455.8954361.1657578066882.JavaMail.zimbra@u-pem.fr>
Message-ID:

Ah, of course, I could have thought of that. Then I still wonder if it makes sense to show this zero in javap.
Possibly, it was meant to yield an equivalent output for the > "count" (JVMS 6.5) value of interface calls, but I do not see its value. > Maybe this should be changed in javap. > > > invokedynamic format is 5 bytes with the same encoding as invokevirtual, > the opcode, two bytes for the constant and two bytes that are 0. > Those two bytes initially at 0 are used by the VM to reference the target > of the CallSite once the BSM is executed. > > see > https://docs.oracle.com/javase/specs/jvms/se18/html/jvms-6.html#jvms-6.5.invokedynamic > > > 2. I changed the print of DynamicConstantPoolEntrys to > > print("#" + info.bootstrap().bsmIndex() + ":#" + > info.nameAndType().index()); > > , the same that is used in the string representation. Previously, the > index was read via info.bootstrap().bootstrapMethod().index() - I do not > quite understand why those two would be different, and bsmIndex seems to > return 0 for lambda meta factory values, but it is indeed the old behavior. > Maybe printing the reference to lambda meta factory would however be the > better option. In this case, the string representation should however also > be adjusted. > > 3. For handle types, the internal constant names were printed. Those are > not part of the spec, so I changed it to print the documented REF_abc > values. Those do however not indicate if a reference addresses an interface > invocation; maybe this information should be included in javap. > > 4. I thought I found a row of issues with printing in-code type annotation > type references, but it seems that javac has a row of issues with adding > them to class files. So it's javac who gets those wrong, not javap. I > validated that this has been an issue previously and is not related to the > change in class file emitter. I reported these issues here ( > https://bugs.openjdk.org/browse/JDK-8290125) but wanted to mention it if > anybody ran into the same issue. 
>
> Best regards, Rafael
>
> Rémi

From forax at univ-mlv.fr Tue Jul 12 07:29:09 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 12 Jul 2022 09:29:09 +0200 (CEST)
Subject: javap with classfile API
In-Reply-To:
References: <546997455.8954361.1657578066882.JavaMail.zimbra@u-pem.fr>
Message-ID: <1691979391.9111533.1657610949490.JavaMail.zimbra@u-pem.fr>

> From: "Rafael Winterhalter"
> To: "Remi Forax"
> Cc: "classfile-api-dev"
> Sent: Tuesday, July 12, 2022 8:54:20 AM
> Subject: Re: javap with classfile API

A correction to my previous mail: the invokedynamic format is 5 bytes with the same encoding (in practice) as invokeinterface, not invokevirtual.

> Ah, of course, I could have thought of that. Then I still wonder if it makes
> sense to show this zero in javap.

At least, this is not very consistent: invokeinterface does not show the last two bytes of its encoding, so I guess invokedynamic should not show them either.

Rémi

From adam.sotona at oracle.com Tue Jul 12 09:47:59 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Tue, 12 Jul 2022 09:47:59 +0000
Subject: javap with classfile API
In-Reply-To:
References:
Message-ID:

Hi Rafael,

Could you please add test cases where the javap output of the classfile-api-dev branch differs from that of the standard javap? These cases should be added to the javap tests to cover backward compatibility of the output.

Thanks,
Adam

From adam.sotona at oracle.com Tue Jul 12 12:20:40 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Tue, 12 Jul 2022 12:20:40 +0000
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To: <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
Message-ID:

From: classfile-api-dev on behalf of Brian Goetz
Date: Sunday, 10 July 2022 19:40
To: Rafael Winterhalter
Cc: classfile-api-dev at openjdk.org
Subject: Re: POC: JDK ClassModel -> ASM ClassReader

> StackMapFrames could on the other hand just be added at their position in the CodeElement iteration to receive them where they become relevant. This way, one would not need to keep track of the current offset. This would also allow for an easier write model where ASM does not allow you to know the offset of a stack map. I assume that the current model is very much modeled after the needs of the javap tool. Ideally, the frame objects would be reduced to the information that is contained in a class file and the consumer could track implicit information such as the "effective" stack and locals.

There are two concerns with this, one minor and one major. The minor one is that this has a significant cost, and most users don't want this information. So we would surely want to gate this with an option whose default is false. (We care the most about transformation, and most transformations make only light changes, so we don't want to add costs that most users won't want to bear.)

The major one is how it perturbs the model. The element stream delivered by a model, and the one consumed by a builder, should be duals. What should a builder do when handed a frame? Switch responsibility over to the user for correct frame generation? Ignore it and regenerate stack maps anyway?

I think we need a better picture of "who is the audience for this sub feature" before designing the API.

I've been considering (and reconsidering) many scenarios related to stack maps.

The number one requirement is to generate valid stack maps under any circumstances and with a minimal performance penalty. And we already do a lot in this area:

* Stack map generation requires minimal information about the involved classes, and only to resolve ambiguous situations (for example, when looking for the common parent of two types assigned to a single local variable). The required information is limited to individual classes (is this specific class an interface, or what is its parent class). ASM, on the other hand, requires loading the classes with all their dependencies to generate stack maps.
* The generation process is fast and does not produce any temporary structures or objects. It is single-pass in >95% of cases, dual-pass in >4% of cases, and at most three-pass in the remaining ~1% of cases (statistics calculated from a very large corpus of classes).

Experiments involving transformed original stack maps led to increased complexity and worse performance, and mostly failed to produce valid stack maps. There is no benefit in passing user-created (or somehow transformed) stack map frames to the generator for final processing.

From the discussion (and from my experience with class instrumentation) I see one use case we didn't cover and one case where we can improve:

1. We handle class transformation from a single source well. A shared constant pool allows keeping the original stack maps for all methods with unmodified bytecode. However, class instrumentation is a transformation with at least two sources. Such a transformation can share only one constant pool. All methods from the second source must be exploded into instructions and reconstructed, and their stack maps generated from scratch. The author of such a transformation must be fully aware of the consequences, and having an option to pass stack maps through a non-shared CP transformation would be a valuable feature. It would require:
   a. An option to turn off stack map generation for individual methods (because there might also be synthetic methods where stack map generation is required). I would propose implementing a Code-level override of global options.
   b. A factory for manual stack map table construction (based on labels, not offsets). I would propose keeping this "manual mode" apart from CodeBuilder and implementing it as a StackMapTableAttribute.of(Frame...) factory.
2. There are cases where the required class hierarchy is not available (when there is no access to all jars), so it would be hard for the user to provide an appropriate ClassHierarchyResolver. However, much information about individual classes can theoretically be extracted from the source classes (from class headers, from existing stack maps, from other attributes, or from the bytecode itself). It is just an idea; however, I think it might be possible to implement an optional hierarchy resolver that learns from the parsed classes. It is a theoretical option for improvement that does not push that responsibility onto the user. However, any such solution would remain non-deterministic. Altering a stack map without minimal knowledge of the involved classes is still blind surgery.

Thanks,
Adam

From rafael.wth at gmail.com Wed Jul 13 11:49:09 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Wed, 13 Jul 2022 13:49:09 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
Message-ID:

ASM lets you override a method in ClassWriter to compute frames without loading classes. Byte Buddy does so, and it is possible to generate stack map frames this way. But it is still quite an overhead and, just as with OpenJDK right now, it does not work when types are missing, which is unfortunately rather common, for example if Spring is involved.
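As an illustration of computing a common supertype from recorded class headers instead of loaded classes, here is a minimal Java sketch. In ASM the corresponding hook is the protected `ClassWriter.getCommonSuperClass(String, String)` method; everything else below (the class `RecordedHierarchy`, its `record` method, and the fallback policy) is a hypothetical stand-in and is neither Byte Buddy's nor OpenJDK's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RecordedHierarchy {
    private static final String OBJECT = "java/lang/Object";
    // Internal class name -> internal name of its direct superclass.
    private final Map<String, String> superClassOf = new HashMap<>();
    private final Set<String> interfaces = new HashSet<>();

    /** Records what a class header tells us: the superclass and the interface flag. */
    void record(String name, String superName, boolean isInterface) {
        superClassOf.put(name, superName);
        if (isInterface) {
            interfaces.add(name);
        }
    }

    /** Merges two reference types using only recorded facts, never class loading. */
    String commonSuperClass(String a, String b) {
        if (a.equals(b)) {
            return a;
        }
        if (interfaces.contains(a) || interfaces.contains(b)) {
            return OBJECT;
        }
        Set<String> ancestors = new HashSet<>();
        // Walk a's superclass chain; stop at unknown types or on a cycle.
        for (String t = a; t != null && ancestors.add(t); t = superClassOf.get(t)) { }
        Set<String> seen = new HashSet<>();
        for (String t = b; t != null && seen.add(t); t = superClassOf.get(t)) {
            if (ancestors.contains(t)) {
                return t;
            }
        }
        // Nothing is known about at least one of the types: fall back to Object.
        return OBJECT;
    }
}
```

The fallback branch is where missing types hurt: when one of the two types was never recorded, this kind of merge can only answer `java/lang/Object`, even if both types share a more specific supertype or interface.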
In contrast, Byte Buddy's frame weaving is always single-pass, does not require any allocation (per frame), and works with missing types. (Currently, OpenJDK produces a verification error; see https://github.com/raphw/asm-jdk-bridge/blob/writer-poc/src/test/java/codes/rafael/asmjdkbridge/sample/FrameWithMissingType.java) This is why I would prefer to plug this logic into OpenJDK's class writer, if possible. Of course, this logic is based on some assumptions, but if you are not trying to cover the general case, you can always be more efficient than a generic processor; for that reason alone, it would be a helpful addition.

As for the generator API: The current StackMapFrame objects contain information that is not present in the class file. For example, the attribute offers an "initialFrame", and the frames themselves contain the effective stack and effective locals. Crop frames currently contain the types of the cropped locals. For a writer API, OpenJDK's API would hopefully only consume the data as it is written to the class file. That would be (a) the type of frame and (b) the declared values for stack and locals, or the number of cropped locals. Of course, one could link all frames within a StackMapTableAttribute together and compute this information on the fly, for example by providing some builder:

    StackMapFrameAttribute b = StackMapFrameAttribute.builder(MethodDesc)
        .append(...)
        .crop(...)
        .same(...)
        .same1(...)
        .full(...)
        .build();

One could then add this attribute, or fail the CodeBuilder if "manual mode" was set without the attribute being present. I think this latter option would be a decent API. If you would consider it, I can offer to prototype such a solution.

Best regards, Rafael

From adam.sotona at oracle.com Wed Jul 13 13:17:00 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Wed, 13 Jul 2022 13:17:00 +0000
Subject: [External] : Re: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
Message-ID:

On 13.07.2022 13:49, "Rafael Winterhalter" wrote:

> ASM offers to override a method in ClassWriter to compute frames without loading classes. Byte Buddy does so, and it is possible to generate stack map frames this way. But it is still quite an overhead and just as with OpenJDK right now, it does not work when types are missing what is unfortunately rather common, for example if Spring is involved.
> In contrast, Byte Buddy's frame weaving is always single-pass and does not require any allocation (per frame), and works with missing types. (Currently, OpenJDK produces a verification error https://github.com/raphw/asm-jdk-bridge/blob/writer-poc/src/test/java/codes/rafael/asmjdkbridge/sample/FrameWithMissingType.java). This is why I would prefer to plug this logic into OpenJDK's class writer, if possible. Of course, this logic is based on some assumptions, but if you are not trying to cover the general case, one can always be more efficient than a generic processor, just for that it would be a helpful addition.

Could you please be more specific about "OpenJDK produces a verification error"? I don't see any code using the Classfile API in your sample.

The Classfile API passes unchanged stack maps of unchanged methods through transformations (in the default shared constant pool mode).

What I agree with is an API extension that allows stack maps to be passed manually through the builder as well. However, I'm not sure what else you mean by "frame weaving".

> As for the generator API: The current StackMapFrame objects contain information that is not present in the class file. For example, the attribute offers an "initialFrame", and the frames themselves contain effective stack and effective locals. For crop frames, those currently contain the types of the cropped frames. For a writer API, OpenJDK's API would hopefully only consume the data as it is written to the class file. That would be (a) the type of frame and (b) the declared values for stack and locals, or the amount of cropped frames. Of course, one could link all frames within a StackMapTableAttribute together and compute this information on the fly. For example by providing some builder:
>
>     StackMapFrameAttribute b = StackMapFrameAttribute.builder(MethodDesc)
>         .append(...)
>         .crop(...)
>         .same(...)
>         .same1(...)
>         .full(...)
>         .build();
>
> One could then add this attribute or fail the CodeBuilder if "manual mode" was set without the attribute being present. I think this later option would be a decent API. If you would consider it, I can offer to prototype such a solution.

What you are asking for is the compressed form of the stack map table. The initial stack map frame is key information for anyone working with stack maps. Having only relative offsets and differentially compressed subsequent frames is useless without the initial frame. In my experience, only effective full stack map frames can be transformed or otherwise processed. Information about the compressed form of each frame is secondary. Every stack map table can be compressed into many equivalent forms.

I agree with an option for the user to specify labeled full stack map frames.

However, frame compression is something the user should not be responsible for; it is similar to requiring manual deflation when writing a zip file.

Thanks,
Adam

From rafael.wth at gmail.com Wed Jul 13 20:09:51 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Wed, 13 Jul 2022 22:09:51 +0200
Subject: [External] : Re: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
Message-ID:

If you execute the tests of this repo (you'd need to use my "monad" branch to do so, but the error is not related to it), then OpenJDK would, for FrameWithMissingType, where the "Missing" class is not present, compute the following bytecode if the instructions of "m" are copied to a new class:

    public m(Z)V
      ILOAD 1
      IFEQ 1
      NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present
      DUP
      INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present.<init> ()V
      ASTORE 2
      GOTO 2
     LABEL 1
     FRAME SAME
      NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing
      DUP
      INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing.<init> ()V
      ASTORE 2
     LABEL 2
     FRAME APPEND [java/lang/Object] // this will render slot 2 of type object, the next two lines yield the verification error
      ALOAD 2
      INVOKEINTERFACE codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface.m ()V (itf)
      RETURN

When the frames are specified explicitly in ASM, this does not happen; the append frame marked with the comment above then correctly reads:

    FRAME APPEND [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface]

By "frame weaving" I mean a technique that Byte Buddy uses to take the frames of two classes (the user class and the "advice" class) and to merge the frame information of both classes on the fly. Unlike AspectJ and similar tools, Byte Buddy uses this to compute frames where this merger contains enough information to always yield correct frames without any traversals.
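For readers who want to reproduce the listing above, the following Java sketch shows source of roughly the shape that would compile to it. It is reconstructed from the bytecode only; the actual test class in the linked repository may differ, and the class name `FrameWithMissingTypeSketch`, the `LOG` field, and the method bodies are assumptions added so the example is observable.

```java
final class FrameWithMissingTypeSketch {
    // Records which implementation ran; not part of the original sample.
    static final StringBuilder LOG = new StringBuilder();

    interface Iface {
        void m();
    }

    static final class Present implements Iface {
        public void m() { LOG.append('P'); }
    }

    // In the scenario from the thread, this class file is absent at generation time.
    static final class Missing implements Iface {
        public void m() { LOG.append('M'); }
    }

    void m(boolean flag) {
        Iface value; // local variable slot 2 in the listing
        if (flag) {
            value = new Present();
        } else {
            value = new Missing();
        }
        // Both branches join here, so a frame must describe the type of slot 2.
        value.m();
    }
}
```

The declared type `Iface` is what a compiler knows at the join point. A generator that works from the bytecode alone has to derive the slot type from `Present` and `Missing`, which is the step that needs hierarchy information.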
Also, these frames can be woven rather efficiently, and this has made Byte Buddy a very popular tool for Java agents. This is also the main reason why I am arguing for a way to explicitly generate stack map tables, as this technique has proven to be so useful.

As for your second point: I agree that there needs to be some context of what the initial frame is. I just would find it strange to specify it explicitly in an API for writing stack map frames, as this initial frame is implicit and not stored in the class file. I would also argue that there are many use cases for specifying reduced frames. The "same" and "same1" frames in particular are very popular in code generation frameworks: if you want to add code somewhere within a method, without writing local variables but with some if/else logic, you can simply add a same frame and not care about the greater context of the method. Similarly, append/crop frames are commonly used by modular byte code generators where every module only gets a "free" offset as its input, writes local variables after it via append frames, and crops those changes before handing over to the next module. I would therefore hope that a frame generation API allows specifying frames of any type explicitly.

Of course, OpenJDK could optimize in the background; ASM supports this too, on demand. ASM also has an option to "expand" frames and to only specify full frames, but I am not aware of it being used very often, as most code generation tools work on a level where they already have to retain meta information of some sort that makes emitting efficient frames rather easy. As far as I know, the expanded frame and frame computation modes are mainly used for prototyping. To me, this is another hint that there is demand for explicit frame generation beyond offering full frames.

Would you be open to a builder API like the one I suggested?
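To give the builder idea a concrete shape, here is a small self-contained Java sketch in which each reduced frame is declared as written to the class file while the builder tracks the effective locals on the fly. All names (`FrameTableSketch`, string labels, string type names) are hypothetical; this is not the Classfile API and not a prototype of it. As a simplification, the initial locals are passed in directly instead of being derived from a method descriptor.

```java
import java.util.ArrayList;
import java.util.List;

final class FrameTableSketch {
    private final List<String> locals;                 // effective locals after the last frame
    private final List<String> entries = new ArrayList<>();

    FrameTableSketch(List<String> initialLocals) {
        this.locals = new ArrayList<>(initialLocals);
    }

    FrameTableSketch same(String label) {
        return emit(label, "SAME", List.of());
    }

    FrameTableSketch same1(String label, String stackItem) {
        return emit(label, "SAME1", List.of(stackItem));
    }

    FrameTableSketch append(String label, String... added) {
        // The class file format allows an append frame to add 1 to 3 locals.
        if (added.length < 1 || added.length > 3) {
            throw new IllegalArgumentException("append adds 1 to 3 locals");
        }
        locals.addAll(List.of(added));
        return emit(label, "APPEND", List.of());
    }

    FrameTableSketch crop(String label, int croppedLocals) {
        // A crop (chop) frame only stores how many locals disappear, not their types.
        if (croppedLocals < 1 || croppedLocals > 3 || croppedLocals > locals.size()) {
            throw new IllegalArgumentException("crop removes 1 to 3 existing locals");
        }
        for (int i = 0; i < croppedLocals; i++) {
            locals.remove(locals.size() - 1);
        }
        return emit(label, "CROP", List.of());
    }

    FrameTableSketch full(String label, List<String> newLocals, List<String> stack) {
        locals.clear();
        locals.addAll(newLocals);
        return emit(label, "FULL", stack);
    }

    private FrameTableSketch emit(String label, String kind, List<String> stack) {
        entries.add(label + " " + kind + " locals=" + locals + " stack=" + stack);
        return this;
    }

    List<String> build() {
        return List.copyOf(entries);
    }
}
```

A caller would chain the calls in code order, for example `new FrameTableSketch(List.of("FrameWithMissingType", "I")).same("L1").append("L2", "Iface").crop("L3", 1).build()`. Each entry then carries both the declared frame kind and the effective locals that the builder derived from the chain.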
If one would want to use it for only full frames, this would already be supported by only specifying them along the way. If you are open to it, I still offer to create a prototype! Thanks, Rafael Am Mi., 13. Juli 2022 um 15:17 Uhr schrieb Adam Sotona < adam.sotona at oracle.com>: > > > > > On 13.07.2022 13:49, "Rafael Winterhalter" wrote: > > > > ASM offers to override a method in ClassWriter to compute frames without > loading classes. Byte Buddy does so, and it is possible to generate stack > map frames this way. But it is still quite an overhead and just as with > OpenJDK right now, it does not work when types are missing what is > unfortunately rather common, for example if Spring is involved. In > contrast, Byte Buddy's frame weaving is always single-pass and does not > require any allocation (per frame), and works with missing types. > (Currently, OpenJDK produces a verification error > https://github.com/raphw/asm-jdk-bridge/blob/writer-poc/src/test/java/codes/rafael/asmjdkbridge/sample/FrameWithMissingType.java > ). > This is why I would prefer to plug this logic into OpenJDK's class writer, > if possible. Of course, this logic is based on some assumptions, but if you > are not trying to cover the general case, one can always be more efficient > than a generic processor, just for that it would be a helpful addition. > > > > Could you be, please, more specific about ?OpenJDK produces a > verification error?. I don?t see any code using Classfile API in your > sample. > > Classfile API is passing unchanged stack maps of unchanged methods through > transformations (in default shared constant pool mode). > > What I agree with is API extension to allow manually pass stack maps also > through builder. However I?m not sure what else do you mean by ?frame > weaving?. > > > > As for the generator API: The current StackMapFrame objects contain > information that is not present in the class file. 
For example, the > attribute offers an "initialFrame", and the frames themselves contain > effective stack and effective locals. For crop frames, those currently > contain the types of the cropped frames. For a writer API, OpenJDK's API > would hopefully only consume the data as it is written to the class file. > That would be (a) the type of frame and (b) the declared values for stack > and locals, or the amount of cropped frames. Of course, one could link all > frames within a StackMapTableAttribute together and compute this > information on the fly. For example by providing some builder: > > > > StackMapFrameAttribute b = StackMapFrameAttribute.builder(MethodDesc) > > .append(...) > > .crop(...) > > .same(...) > > .same1(...) > > .full(...) > > .build(); > > > > One could then add this attribute or fail the CodeBuilder if "manual mode" > was set without the attribute being present. I think this later option > would be a decent API. If you would consider it, I can offer to prototype > such a solution. > > > > What you are asking for is compressed form of stack map table. Initial > stack map frame is a key information for anyone working with stack maps. > Having only relative offsets and differentially compressed subsequent > frames is useless information without the initial frame. According to my > experience only effective full stack map frames can be transformed or any > other way processed. Information about compressed form of every frame is > just a secondary. Every stack map table can be compressed into many > equivalent forms. > > I agree with an option for user to specificy labeled full stack map frames. > > However frames compression is something user should not be responsible of, > it is similar as to request of manual deflation when writing a zip file. > > > > Thanks, > > Adam > > > > Best regards, Rafael > > > > > > > > Am Di., 12. 
Juli 2022 um 14:20 Uhr schrieb Adam Sotona < > adam.sotona at oracle.com>: > > > > > > *From: *classfile-api-dev on behalf > of Brian Goetz > *Date: *Sunday, 10 July 2022 19:40 > *To: *Rafael Winterhalter > *Cc: *classfile-api-dev at openjdk.org > *Subject: *Re: POC: JDK ClassModel -> ASM ClassReader > > > > > StackMapFrames could on the other hand just be added at their position > in the CodeElement iteration to receive them where they become relevant. > This way, one would not need to keep track of the current offset. This > would also allow for an easier write model where ASM does not allow you to > know the offset of a stack map. I assume that the current model is very > much modeled after the needs of the javap tool. Ideally, the frame objects > would be reduced to the information that is contained in a class file and > the consumer could track implicit information such as the "effective" stack > and locals. > > There are two concerns with this, one minor and one major. The minor one > is that this has a significant cost, and most users don?t want this > information. So we would surely want to gate this with an option whose > default is false. (We care the most about transformation, and most > transformations make only light changes, so we don?t want to add costs that > most users won?t want to bear.). > > The major one is how it perturbs the model. The element stream delivered > by a model, and the one consumed by a builder, should be duals. What > should a builder do when handed a frame? Switch responsibility over to the > user for correct frame generation? Ignore it and regenerate stack maps > anyway? > > I think we need a better picture of ?who is the audience for this sub > feature? before designing the API. > > I?ve been considering (and re-considering) many various scenarios related > to stack maps. > > Number one requirement is to generate valid stack maps in any > circumstances and with minimal performance penalty. 
And we already do a lot > in this area: > > ? Stack maps generation requires minimal information about involved > classes and only to resolve controversial situations (for example when > looking for common parent of dual assignment to a single local variable). > Required information is minimal and limited to individual classes (is this > specific class an interface or what is its parent class). On the other side > for example ASM requires to load the classes with all dependencies to > generate stack maps. > > ? Generation process is fast and does not produce any temporary > structures and objects. It is single-pass in >95% cases, dual-pass in >4% > cases and maximum three-pass in the remaining ~1% of cases (statistics > calculated from very large corpus of classes). > > Experiments to involve transformed original stack maps led to increased > complexity, worse performance, and mainly failed to produce valid stack > maps. There is no benefit of passing user-created (or somehow transformed) > stack map frames to the generator for final processing. > > From the discussion (and from my experience with class instrumentation) I > see one use case we didn?t cover and one case where we can improve: > > 1. We handle well class transformation from single source. Shared > constant pool allows to keep original stack maps for all methods with > unmodified bytecode. However, class instrumentation is a transformation > with at least two sources. Such transformation can share only one constant > pool. All methods from the second source must be exploded to instructions, > reconstructed and stack maps generated from scratch. Author of such > transformation must be fully aware of the consequences and having an option > to pass stack maps through non-shared CP transformation would be a valuable > feature. It would require: > > a. Option to individually turn off stack map generation per method > (because there might be also synthetic methods where sm generation is > required). 
I would propose to implement Code-level override of global > options. > > b. Factory for stack map table manual construction (based on labels, > not offsets). I would propose to put this ?manual mode? aside from > CodeBuilder and implement it as StackMapTableAttribute.of(Frame?) factory. > > 2. There are cases where required class hierarchy is not available > (when there is no access to all jars), so it would be hard for user to > provide appropriate ClassHierarchyResolver. However, many individual class > information can be theoretically extracted from the source classes (from > class headers, from existing stack maps, from other attributes or from the > bytecode itself). It is just an idea; however I think it might be possible > to implement an optional hierarchy resolver, that will learn from the > parsed classes. It is a theoretical option for improvement, without > throwing that responsibility on user. However, any such solution would > remain in category of non-deterministic. Altering a stack map without > minimal knowledge of the involved classes is still a blind surgery. > > > > Thanks, > > Adam > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
From adam.sotona at oracle.com Thu Jul 14 09:15:04 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Thu, 14 Jul 2022 09:15:04 +0000
Subject: [External] : Re: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com>
Message-ID:

On 13.07.2022 22:10, "Rafael Winterhalter" wrote:

> If you execute the tests of this repo (you'd need to use my "monad" branch to do so, but the error is not related to it), then OpenJDK would, for FrameWithMissingType, where the "Missing" class is not present, compute the following bytecode if the instructions of "m" are copied to a new class:
>
>   public m(Z)V
>   ILOAD 1
>   IFEQ 1
>   NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present
>   DUP
>   INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present.<init> ()V
>   ASTORE 2
>   GOTO 2
>   LABEL 1
>   FRAME SAME
>   NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing
>   DUP
>   INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing.<init> ()V
>   ASTORE 2
>   LABEL 2
>   FRAME APPEND [java/lang/Object] // this will render slot 2 of type object, the next two lines yield the verification error
>   ALOAD 2
>   INVOKEINTERFACE codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface.m ()V (itf)
>   RETURN
>
> When the frames are specified explicitly in ASM, this does not happen; the append frame with the comment above is then correctly written as:
>
>   FRAME APPEND [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface]

Both FRAME APPEND [java/lang/Object] and FRAME APPEND [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface] are valid, and both pass verification during class loading.

The common ancestor of FrameWithMissingType$Present and FrameWithMissingType$Missing is java/lang/Object, and it is a valid entry in the stack map frame.

The declaration of the common interface FrameWithMissingType$Iface in the stack map frame is irrelevant, as interfaces are not subject to hierarchical assignability verification.
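
The common-ancestor argument above can be illustrated with a small sketch. This is not the Classfile API's stack map generator; `CommonAncestorSketch`, `commonAncestor`, the `parents` map, and the `sample/...` class names are all hypothetical, and the sketch assumes an acyclic superclass map that ignores interfaces entirely.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CommonAncestorSketch {
    static final String OBJECT = "java/lang/Object";

    /**
     * Hypothetical helper: finds the nearest common superclass of two class
     * types by walking their superclass chains. The map goes from an internal
     * class name to its superclass name; java/lang/Object has no entry.
     * Interfaces are deliberately not modeled.
     */
    public static String commonAncestor(String a, String b, Map<String, String> parents) {
        // Collect a and all of its superclasses.
        Set<String> ancestorsOfA = new HashSet<>();
        for (String t = a; t != null; t = parents.get(t)) {
            ancestorsOfA.add(t);
        }
        // The first type on b's chain that is also on a's chain is the answer.
        for (String t = b; t != null; t = parents.get(t)) {
            if (ancestorsOfA.contains(t)) {
                return t;
            }
        }
        // Nothing known in common: fall back to Object.
        return OBJECT;
    }

    public static void main(String[] args) {
        // Both sample classes extend Object directly (assumed for illustration).
        Map<String, String> parents = Map.of(
                "sample/Present", OBJECT,
                "sample/Missing", OBJECT);
        System.out.println(commonAncestor("sample/Present", "sample/Missing", parents));
        // prints "java/lang/Object"
    }
}
```

With only superclass links available, two classes that share nothing but an interface merge to java/lang/Object, which is the frame entry discussed in this message.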
-------------- next part -------------- An HTML attachment was scrubbed... URL: From rafael.wth at gmail.com Thu Jul 14 09:38:45 2022 From: rafael.wth at gmail.com (Rafael Winterhalter) Date: Thu, 14 Jul 2022 11:38:45 +0200 Subject: [External] : Re: POC: JDK ClassModel -> ASM ClassReader In-Reply-To: References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com> Message-ID: The frame is indeed valid. The verification fails after ALOAD 2 (now a Object type is on the stack) and the subsequent method invocation INVOKEINTERFACE codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface.m ()V. This method is of course not declared on Object. Adam Sotona schrieb am Do., 14. Juli 2022, 11:15: > > > > > On 13.07.2022 22:10, "Rafael Winterhalter" wrote: > > > > If you are exciting the tests of this repo (you'd need to use my "monad" > branch to do so, but the error is not related to it), then OpenJDK would, > for FrameWithMissingType - where the "Missing" class is not present - > compute the following bytecode if it the instructions of "m" are copied to > a new class: > > > public m(Z)V > ILOAD 1 > IFEQ 1 > NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present > DUP > INVOKESPECIAL > codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present. ()V > ASTORE 2 > > GOTO 2 > > LABEL 1 > > FRAME SAME > NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing > DUP > INVOKESPECIAL > codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing. 
()V > ASTORE 2 > > LABEL 2 > > FRAME APPEND [java/lang/Object] // this will render slot 2 of type > object, the next two lines yield the verification error > ALOAD 2 > INVOKEINTERFACE > codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface.m ()V (itf) > RETURN > > > > When specifying the frames explicitly in ASM, this does not happen where > the append frame with comment above is correctly: > > > > FRAME APPEND > [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface] > > > > Both FRAME APPEND [java/lang/Object] or FRAME APPEND > [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface] are valid and > both pass verification during class loading. > > Common ancestor of FrameWithMissingType$Present and > FrameWithMissingType$Missing is java/lang/Object and it is a valid entry in > the stack map frame. > > Declaration of common interface FrameWithMissingType$Iface in stack map > frame is irrelevant, as interfaces are not subject of hierarchical > assignability verification. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.sotona at oracle.com Thu Jul 14 09:51:39 2022 From: adam.sotona at oracle.com (Adam Sotona) Date: Thu, 14 Jul 2022 09:51:39 +0000 Subject: [External] : Re: POC: JDK ClassModel -> ASM ClassReader In-Reply-To: References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <59424A30-EDF8-4AAA-8259-6F1B24FAB0DF@oracle.com> Message-ID: Could you, please, provide exact verification error. As I mentioned ? interfaces are not subjects of assignability verification, so INVOKEINTERFACE FrameWithMissingType$Iface is valid with any object type (except for arrays) on stack. On 14.07.2022 11:39, "Rafael Winterhalter" wrote: The frame is indeed valid. The verification fails after ALOAD 2 (now a Object type is on the stack) and the subsequent method invocation INVOKEINTERFACE codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface.m ()V. This method is of course not declared on Object. 
Adam Sotona > schrieb am Do., 14. Juli 2022, 11:15: On 13.07.2022 22:10, "Rafael Winterhalter" > wrote: If you are exciting the tests of this repo (you'd need to use my "monad" branch to do so, but the error is not related to it), then OpenJDK would, for FrameWithMissingType - where the "Missing" class is not present - compute the following bytecode if it the instructions of "m" are copied to a new class: public m(Z)V ILOAD 1 IFEQ 1 NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present DUP INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Present. ()V ASTORE 2 GOTO 2 LABEL 1 FRAME SAME NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing DUP INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing. ()V ASTORE 2 LABEL 2 FRAME APPEND [java/lang/Object] // this will render slot 2 of type object, the next two lines yield the verification error ALOAD 2 INVOKEINTERFACE codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface.m ()V (itf) RETURN When specifying the frames explicitly in ASM, this does not happen where the append frame with comment above is correctly: FRAME APPEND [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface] Both FRAME APPEND [java/lang/Object] or FRAME APPEND [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Iface] are valid and both pass verification during class loading. Common ancestor of FrameWithMissingType$Present and FrameWithMissingType$Missing is java/lang/Object and it is a valid entry in the stack map frame. Declaration of common interface FrameWithMissingType$Iface in stack map frame is irrelevant, as interfaces are not subject of hierarchical assignability verification. -------------- next part -------------- An HTML attachment was scrubbed... 
From rafael.wth at gmail.com Thu Jul 14 19:20:26 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Thu, 14 Jul 2022 21:20:26 +0200
Subject: javap with classfile API
In-Reply-To:
References:
Message-ID:

My bad, my simple reproduction was too simple. If I make it an abstract super class with two subclasses, where one ("$Missing") is not available during instrumentation (for example from a build plugin), then the OpenJDK-generated byte code will be as follows:

  m(I)V
  ILOAD 1
  IFEQ L0
  NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Sub
  DUP
  INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Sub.<init> ()V
  ASTORE 2
  GOTO L1
  L0
  FRAME SAME
  NEW codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing
  DUP
  INVOKESPECIAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Missing.<init> ()V
  ASTORE 2
  L1
  FRAME APPEND [codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Sub] // should be "$Base"
  ALOAD 2
  INVOKEVIRTUAL codes/rafael/asmjdkbridge/sample/FrameWithMissingType$Base.m ()V
  RETURN

With ASM and explicit frame writing, the frames are issued correctly. With the OpenJDK version, the verification error occurs at the frame location, as slot 2 might contain "$Missing" (which is now present again at runtime). I have updated the code sample to represent this scenario.

Best regards, Rafael

On Tue, 12 July 2022 at 11:48, Adam Sotona <adam.sotona at oracle.com> wrote:

> Hi Rafael,
>
> Could you, please, add tests cases where classfile-api-dev-branch javap
> output differs from standard javap?
>
> These cases should be added into javap tests to cover backward
> compatibility of the output.
> > > > Thanks, > > Adam > > > > *From: *classfile-api-dev on behalf > of Rafael Winterhalter > *Date: *Tuesday, 12 July 2022 0:09 > *To: *classfile-api-dev at openjdk.org > *Subject: *javap with classfile API > > Hello, > > > > I completed a full round of comparing byte code output with ASM and > OpenJDK's APIs and doing so, I was using the sandboxes version of javap. I > found some errors in the tool doing so which I fixed for my own convenience > on the way (if you wanted to adapt those fixes, > https://github.com/openjdk/jdk-sandbox/pull/21, basically it makes the > javap output identical to the ASM-based version). > > > > I was however wondering about printing dynamic constants: > > > > 1. For invokedynamic instructions, > > > > printConstantPoolRefAndValue(instr.invokedynamic(), 0) > > > > is used. I would not know why the 0 is printed here, but it was printed > previously. Possibly, it was meant to yield an equivalent output for the > "count" (JVMS 6.5) value of interface calls, but I do not see its value. > Maybe this should be changed in javap. > > > > 2. I changed the print of DynamicConstantPoolEntrys to > > > > print("#" + info.bootstrap().bsmIndex() + ":#" + > info.nameAndType().index()); > > > > , the same that is used in the string representation. Previously, the > index was read via info.bootstrap().bootstrapMethod().index() - I do not > quite understand why those two would be different, and bsmIndex seems to > return 0 for lambda meta factory values, but it is indeed the old behavior. > Maybe printing the reference to lambda meta factory would however be the > better option. In this case, the string representation should however also > be adjusted. > > > > 3. For handle types, the internal constant names were printed. Those are > not part of the spec, so I changed it to print the documented REF_abc > values. Those do however not indicate if a reference addresses an interface > invocation; maybe this information should be included in javap. 
> > > > 4. I thought I found a row of issues with printing in-code type annotation > type references, but it seems that javac has a row of issues with adding > them to class files. So it's javac who gets those wrong, not javap. I > validated that this has been an issue previously and is not related to the > change in class file emitter. I reported these issues here ( > https://bugs.openjdk.org/browse/JDK-8290125) but wanted to mention it if > anybody ran into the same issue. > > > > Best regards, Rafael > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.sotona at oracle.com Fri Jul 15 11:09:38 2022 From: adam.sotona at oracle.com (Adam Sotona) Date: Fri, 15 Jul 2022 11:09:38 +0000 Subject: New jdk.jfr integration use cases in classfile-api-dev-branch Message-ID: Hi, I?m happy to inform you about fresh draft of Classfile API integration with jdk.jfr module. Complex JFR class instrumentation framework has been migrated from ASM to Classfile API and it joins growing list of Classfile API use cases in the classfile-api-dev-branch. See: https://github.com/openjdk/jdk-sandbox/tree/classfile-api-dev-branch#use-cases Adam -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.van.acken at gmail.com Mon Jul 18 09:36:15 2022 From: michael.van.acken at gmail.com (Michael van Acken) Date: Mon, 18 Jul 2022 11:36:15 +0200 Subject: constantInstruction() & DynamicConstantDesc Message-ID: I just started to move my most recent Clojure compiler iteration to the Classfile API. It is a pleasure to use. I appreciate the huge amount of work that is going into this. While going through my constant handling tests I found three issues. First, I expected ConstantDescs.NULL to be mapped to ACONST_NULL. I can just as well special case this in the compiler, of course. 
Second, to get a test case for the literal constant -0.0 to round-trip
I had to patch CodeBuilder.java to go via LDC:

  cause: expected not equivalent to actual value
  expected: [(LDC2_W -0.0) (DRETURN)]
  actual: [(DCONST_0 ) (DRETURN)]

Third, a round-trip involving the describeConstable() of a Character
instance (via ConstantDescs.BSM_EXPLICIT_CAST) was read back as a DCD
with a nameAndType of "_":"LC;" instead of the expected "_":"C".
Changing the ofCanonical invocation in ConcreteEntry.java fixed this
particular case for me, but I am only fishing around here.

The patch below has more details.

-- mva

diff --git a/src/java.base/share/classes/jdk/classfile/CodeBuilder.java b/src/java.base/share/classes/jdk/classfile/CodeBuilder.java
index 154050ae8b4..330980af725 100755
--- a/src/java.base/share/classes/jdk/classfile/CodeBuilder.java
+++ b/src/java.base/share/classes/jdk/classfile/CodeBuilder.java
@@ -27,6 +27,7 @@ package jdk.classfile;
 
 import java.lang.constant.ClassDesc;
 import java.lang.constant.ConstantDesc;
+import java.lang.constant.ConstantDescs;
 import java.lang.constant.DirectMethodHandleDesc;
 import java.lang.constant.DynamicCallSiteDesc;
 import java.lang.constant.MethodTypeDesc;
@@ -424,7 +425,7 @@ public sealed interface CodeBuilder
 
     default CodeBuilder constantInstruction(ConstantDesc value) {
         // This method must ensure any call to constant(Opcode, ConstantDesc) has a non-null Opcode.
-        if (value == null) {
+        if (value == null || ConstantDescs.NULL.equals(value)) {
             return constantInstruction(Opcode.ACONST_NULL, null);
         }
         else if (value instanceof Integer iVal) {
@@ -437,16 +438,16 @@ public sealed interface CodeBuilder
             else
                 return constantInstruction(Opcode.LDC2_W, lVal);
         } else if (value instanceof Float fVal) {
-            if (fVal == 0.0)
+            if (fVal.compareTo(0.0f) == 0) // 0.0f but not -0.0f
                 return with(ConstantInstruction.ofIntrinsic(Opcode.FCONST_0));
-            else if (fVal == 1.0)
+            else if (fVal == 1.0f)
                 return with(ConstantInstruction.ofIntrinsic(Opcode.FCONST_1));
-            else if (fVal == 2.0)
+            else if (fVal == 2.0f)
                 return with(ConstantInstruction.ofIntrinsic(Opcode.FCONST_2));
             else
                 return constantInstruction(Opcode.LDC, fVal);
         } else if (value instanceof Double dVal) {
-            if (dVal == 0.0d)
+            if (dVal.compareTo(0.0d) == 0) // 0.0d but not -0.0d
                 return with(ConstantInstruction.ofIntrinsic(Opcode.DCONST_0));
             else if (dVal == 1.0d)
                 return with(ConstantInstruction.ofIntrinsic(Opcode.DCONST_1));
diff --git a/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java b/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java
index bead58d7618..9dcc8eeb7e1 100755
--- a/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java
+++ b/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java
@@ -797,7 +797,7 @@ public abstract sealed class ConcreteEntry {
                 staticArgs[i] = args.get(i).constantValue();
 
             return DynamicConstantDesc.ofCanonical(bootstrap().bootstrapMethod().asSymbol(),
-                    nameAndType().name().stringValue(), ClassDesc.of(Util.toClassString(Util.descriptorToClass(nameAndType().type().stringValue()))), staticArgs);
+                    nameAndType().name().stringValue(), ClassDesc.ofDescriptor(nameAndType().type().stringValue()), staticArgs);
         }
     }
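
The two behaviours the patch relies on can be checked with plain JDK calls, independent of the Classfile API. The following is a minimal sketch: the class name `PatchChecks` and the helper `isPositiveZero` are made up for illustration, while `Double.compareTo`, `ClassDesc.of`, and `ClassDesc.ofDescriptor` are the standard JDK methods (`java.lang.constant` requires Java 12 or later).

```java
import java.lang.constant.ClassDesc;

public class PatchChecks {
    /**
     * Illustrative helper: true only for +0.0d. Double.compareTo orders
     * -0.0d strictly below 0.0d, unlike the primitive == operator.
     */
    public static boolean isPositiveZero(double v) {
        return Double.valueOf(v).compareTo(0.0d) == 0;
    }

    public static void main(String[] args) {
        double negZero = -0.0d;

        // Primitive comparison cannot tell the two zeros apart.
        System.out.println(negZero == 0.0d);         // prints "true"

        // compareTo distinguishes them, so -0.0d would not be mapped to DCONST_0.
        System.out.println(isPositiveZero(negZero)); // prints "false"
        System.out.println(isPositiveZero(0.0d));    // prints "true"

        // "C" parsed as a field descriptor is the primitive type char ...
        System.out.println(ClassDesc.ofDescriptor("C").descriptorString()); // prints "C"

        // ... while "C" parsed as a class name is a class called C.
        System.out.println(ClassDesc.of("C").descriptorString());           // prints "LC;"
    }
}
```

The last two lines show why a descriptor string routed through a class-name factory can come back as "LC;" instead of "C".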
From brian.goetz at oracle.com Mon Jul 18 13:06:08 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 18 Jul 2022 13:06:08 +0000
Subject: constantInstruction() & DynamicConstantDesc
In-Reply-To:
References:
Message-ID:

Thanks for the experience report! You raised three issues:

 - ConstantDescs.NULL
 - Preservation of signed zero
 - round-tripping of DCD

The signed zero definitely looks like a bug and we should fix it as per your patch.

Handling ConstantDescs.NULL is an interesting question. Since this is a "convenience" method (the user isn't asking for a specific opcode, they're saying "here's a constant, do something"), either treatment of NULL is probably valid. If we do nothing, you'll get a condy that evaluates to null, so you get the same runtime behavior, but then again we could LDC an integer constant of zero rather than iconst_0. So this is probably reasonable behavior to special-case NULL here.

I'll look into the third one.

Thanks,
-Brian

> On Jul 18, 2022, at 5:36 AM, Michael van Acken wrote:
>
> I just started to move my most recent Clojure compiler iteration to
> the Classfile API. It is a pleasure to use. I appreciate the huge
> amount of work that is going into this.
>
> While going through my constant handling tests I found three issues.
>
> First, I expected ConstantDescs.NULL to be mapped to ACONST_NULL. I
> can just as well special case this in the compiler, of course.
>
> Second, to get a test case for the literal constant -0.0 to round-trip
> I had to patch CodeBuilder.java to go via LDC:
>
> cause: expected not equivalent to actual value
> expected: [(LDC2_W -0.0) (DRETURN)]
> actual: [(DCONST_0 ) (DRETURN)]
>
> Third, a round-trip involving the describeConstable() of a Character
> instance (via ConstantDescs.BSM_EXPLICIT_CAST) was read back as a DCD
> with a nameAndType of "_":"LC;" instead of the expected "_":"C".
> Changing the ofCanonical invocation in ConcreteEntry.java fixed this > particular case for me, but I am only fishing around here. > > The patch below has more details. > > -- mva > > > diff --git a/src/java.base/share/classes/jdk/classfile/CodeBuilder.java b/src/java.base/share/classes/jdk/classfile/CodeBuilder.java > index 154050ae8b4..330980af725 100755 > --- a/src/java.base/share/classes/jdk/classfile/CodeBuilder.java > +++ b/src/java.base/share/classes/jdk/classfile/CodeBuilder.java > @@ -27,6 +27,7 @@ package jdk.classfile; > > import java.lang.constant.ClassDesc; > import java.lang.constant.ConstantDesc; > +import java.lang.constant.ConstantDescs; > import java.lang.constant.DirectMethodHandleDesc; > import java.lang.constant.DynamicCallSiteDesc; > import java.lang.constant.MethodTypeDesc; > @@ -424,7 +425,7 @@ public sealed interface CodeBuilder > > default CodeBuilder constantInstruction(ConstantDesc value) { > // This method must ensure any call to constant(Opcode, ConstantDesc) has a non-null Opcode. 
> - if (value == null) { > + if (value == null || ConstantDescs.NULL.equals(value)) { > return constantInstruction(Opcode.ACONST_NULL, null); > } > else if (value instanceof Integer iVal) { > @@ -437,16 +438,16 @@ public sealed interface CodeBuilder > else > return constantInstruction(Opcode.LDC2_W, lVal); > } else if (value instanceof Float fVal) { > - if (fVal == 0.0) > + if (fVal.compareTo(0.0f) == 0) // 0.0f but not -0.0f > return with(ConstantInstruction.ofIntrinsic(Opcode.FCONST_0)); > - else if (fVal == 1.0) > + else if (fVal == 1.0f) > return with(ConstantInstruction.ofIntrinsic(Opcode.FCONST_1)); > - else if (fVal == 2.0) > + else if (fVal == 2.0f) > return with(ConstantInstruction.ofIntrinsic(Opcode.FCONST_2)); > else > return constantInstruction(Opcode.LDC, fVal); > } else if (value instanceof Double dVal) { > - if (dVal == 0.0d) > + if (dVal.compareTo(0.0d) == 0) // 0.0d but not -0.0d > return with(ConstantInstruction.ofIntrinsic(Opcode.DCONST_0)); > else if (dVal == 1.0d) > return with(ConstantInstruction.ofIntrinsic(Opcode.DCONST_1)); > diff --git a/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java b/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java > index bead58d7618..9dcc8eeb7e1 100755 > --- a/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java > +++ b/src/java.base/share/classes/jdk/classfile/impl/ConcreteEntry.java > @@ -797,7 +797,7 @@ public abstract sealed class ConcreteEntry { > staticArgs[i] = args.get(i).constantValue(); > > return DynamicConstantDesc.ofCanonical(bootstrap().bootstrapMethod().asSymbol(), > - nameAndType().name().stringValue(), ClassDesc.of(Util.toClassString(Util.descriptorToClass(nameAndType().type().stringValue()))), staticArgs); > + nameAndType().name().stringValue(), ClassDesc.ofDescriptor(nameAndType().type().stringValue()), staticArgs); > } > } > From paul.sandoz at oracle.com Mon Jul 18 20:31:41 2022 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 18 Jul 2022 
20:31:41 +0000 Subject: POC: JDK ClassModel -> ASM ClassReader In-Reply-To: References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> Message-ID: <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> Fair point. How might you envisage the future of ASM? some stripped down version with integration hooks into the classfile API? Paul. > On Jul 10, 2022, at 11:54 AM, Rafael Winterhalter wrote: > > You are right that using AutoCloseable might be misleading, I had the same thought. I still chose the interface as IDEs know the type and warn if an AutoClosable is neither passed on nor closed, so I felt it would be a better choice compared to a custom method. I do however disagree with the ASM integration being an academic exercise. ASM is well-established and a de-facto standard for code generation. There's tons of code written in ASM and sometimes (like in Byte Buddy), ASM is publicly exposed and cannot be replaced without creating years of migration work. My hope is that a bridge like this would allow for ASM to adapt to the OpenJDK APIs for its readers and writers (on VMs where those are available). By achieving this, the "ASM problem" could be solved with the JVM version the API is released, which is likely many years before the OpenJDK API could make ASM obsolete. > > I agree that the focus should be on a convenient API and not to reinvent ASM in new cloths, but the new API should aim to make itself integratable, I think, as this would create huge value and speed-up adoption. > > Am Fr., 8. Juli 2022 um 22:53 Uhr schrieb Paul Sandoz : > The inversion of control in the current API is indeed awkward when something else wants to take overall control, and in those circumstances one would have to give up certain features (like reapply as you noted) and there is more room for error (e.g. paring closes, use after close). 
> > Arguably making those open builders AutoCloseable is misleading, since since if the building can be lexically scoped one should use the existing API (ignoring details on exceptions and capturing, which I don?t think are sufficient to justify a new mode of writing). > > It feels like the ASM integration is more of an academic exercise. A useful one to play with the API and provide feedback, but in practice how useful is it? (Since one can always interoperate between classifies.) > > I am concerned the choice will be a distraction, but I don?t have any better concrete ideas right now. It would be helpful to understand more about the integration experiments with the Java compiler to compare/contrast. > > Paul. > > > On Jul 6, 2022, at 2:05 PM, Rafael Winterhalter wrote: > > > > With the writer, I have made some progress after adding a monadic view to ClassBuilder where one can apply a consumer multiple times before "closing" the writer for extracting a class file. I pushed this experiment on a commit of my clone (https://github.com/raphw/jdk-sandbox/commit/2be58f400b9ebf96b851eda658e0b8d2560421c5) to showcase the way I thought this might work. In theory, it should allow for any optimization of the current API. At the same time, it is awkward enough that people would only use it if they really needed it and therefore avoid it by default. And once they use it, any IDE would ask for closing each intermediate object when detecting the AutoCloseable interface. The only thing that I had to compromise on compared to "non-open" API was the use of CodeBuilderImpl which is currently reapplying the consumer in case of a LabelOverflowException. At the same time, I hoped that this might be a temporary state anyways as the possible reapplication is unlikely to be expected by any user. 
> >
> >

From michael.van.acken at gmail.com Tue Jul 19 05:20:50 2022
From: michael.van.acken at gmail.com (Michael van Acken)
Date: Tue, 19 Jul 2022 07:20:50 +0200
Subject: constantInstruction() & DynamicConstantDesc
In-Reply-To:
References:
Message-ID:

On Mon, 18 July 2022 at 15:06, Brian Goetz <brian.goetz at oracle.com> wrote:

> [...]
>
> Handling ConstantDescs.NULL is an interesting question. Since this is a "convenience" method (the user isn't asking for a specific opcode, they're saying "here's a constant, do something"), either treatment of NULL is probably valid. If we do nothing, you'll get a condy that evaluates to null, so you get the same runtime behavior, but then again we could LDC an integer constant of zero rather than iconst_0. So it is probably reasonable behavior to special-case NULL here.

Besides the symmetry to the other single-byte encodings, having NULL leaves room for the application to have its own interpretation of lowercase null. This is what happened to me: every expression node of the compiler has a ConstantDesc member, with null signalling that the node cannot be represented by a constant. I never call constantInstruction(null) because here the literal meaning of lowercase null is "constantInstruction() is not applicable".

-- mva
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From benjamin.john.evans at gmail.com Tue Jul 19 10:56:05 2022
From: benjamin.john.evans at gmail.com (Ben Evans)
Date: Tue, 19 Jul 2022 12:56:05 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To: <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com>
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com>
Message-ID:

On Mon, Jul 18, 2022 at 10:31 PM Paul Sandoz wrote:
>
> Fair point. How might you envisage the future of ASM? some stripped down version with integration hooks into the classfile API?

If this approach was adopted by the ASM folks it would potentially remove a barrier to adoption of non-LTS JDK versions in production.

Specifically, it removes the risk of being stuck on an orphaned JDK version because the current JDK version is not supported by the ASM version that one of your key dependencies is pinned to.

If ASM becomes a wrapper & value-add over the JDK Classfile API then "maximum supported JDK version" is a property solely of the JDK, not of the ASM library, and so is no longer a barrier to upgrading the JDK.

I'm sure it's not the only barrier to getting more people to use non-LTS in production, but based on what I saw at New Relic (where customers raised this issue, or its consequences, fairly regularly) I think it could be a significant one.

Thanks,

Ben

From rafael.wth at gmail.com Tue Jul 19 11:34:46 2022
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Tue, 19 Jul 2022 13:34:46 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To: <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com>
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com>
Message-ID:

It's not really my decision what ASM is doing, but I think they agree with my attempt to integrate ASM and the API: https://gitlab.ow2.org/asm/asm/-/issues/317978

My favoured solution would be to merge this into core ASM (via a multi-release jar, for example) and to pick up the reader/writer if possible. This way ASM would automatically become forwards-compatible given no byte code changes. (And the possibility to run unknown attributes through ASM, which is still an open issue in my bridge.)

If ASM did not want to do that, I would still want to adopt it for Byte Buddy. I did a test with my bridge in Byte Buddy for a simple Java agent with a downgraded ASM version and it worked really well.

On Mon, 18 July 2022 at 22:31, Paul Sandoz wrote:

> Fair point. How might you envisage the future of ASM?
> some stripped down version with integration hooks into the classfile API?
>
> Paul.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From forax at univ-mlv.fr Tue Jul 19 12:52:28 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 19 Jul 2022 14:52:28 +0200 (CEST)
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com>
Message-ID: <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Ben Evans"
> To: "Paul Sandoz"
> Cc: "Rafael Winterhalter", "Brian Goetz", "classfile-api-dev"
> Sent: Tuesday, July 19, 2022 12:56:05 PM
> Subject: Re: POC: JDK ClassModel -> ASM ClassReader

Hi Ben,

> On Mon, Jul 18, 2022 at 10:31 PM Paul Sandoz wrote:
>>
>> Fair point. How might you envisage the future of ASM? some stripped down version with integration hooks into the classfile API?
>
> If this approach was adopted by the ASM folks it would potentially remove a barrier to adoption of non-LTS JDK versions in production.
>
> Specifically, it removes the risk of being stuck on an orphaned JDK version because the current JDK version is not supported by the ASM version that one of your key dependencies is pinned to.
>
> If ASM becomes a wrapper & value-add over the JDK Classfile API then "maximum supported JDK version" is a property solely of the JDK, not of the ASM library, and so is no longer a barrier to upgrading the JDK.
>
> I'm sure it's not the only barrier to getting more people to use non-LTS in production, but based on what I saw at New Relic (where customers raised this issue, or its consequences, fairly regularly) I think it could be a significant one.

ASM is fully backward compatible for *every* release of the JDK, LTS or not.

Usually, ASM is not directly the issue; the issue is some libraries that are using an older version of ASM and users not wanting to update solely ASM. So I do not think that ASM using the JDK as backend will change anything.
Also, Gradle was a usual suspect, and recently they updated Gradle to work even with older ASM versions by disabling incremental compilation.

>
> Thanks,
>
> Ben

Rémi

From benjamin.john.evans at gmail.com Tue Jul 19 13:07:25 2022
From: benjamin.john.evans at gmail.com (Ben Evans)
Date: Tue, 19 Jul 2022 15:07:25 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To: <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr>
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Tue, Jul 19, 2022 at 2:52 PM Remi Forax wrote:
>
> > From: "Ben Evans"
>
> > On Mon, Jul 18, 2022 at 10:31 PM Paul Sandoz wrote:
> >>
> >> Fair point. How might you envisage the future of ASM? some stripped down version with integration hooks into the classfile API?
> >
> > If this approach was adopted by the ASM folks it would potentially remove a barrier to adoption of non-LTS JDK versions in production.
> >
> > Specifically, it removes the risk of being stuck on an orphaned JDK version because the current JDK version is not supported by the ASM version that one of your key dependencies is pinned to.
> >
> > If ASM becomes a wrapper & value-add over the JDK Classfile API then "maximum supported JDK version" is a property solely of the JDK, not of the ASM library, and so is no longer a barrier to upgrading the JDK.
> >
> > I'm sure it's not the only barrier to getting more people to use non-LTS in production, but based on what I saw at New Relic (where customers raised this issue, or its consequences, fairly regularly) I think it could be a significant one.
>
> ASM is fully backward compatible for *every* release of the JDK, LTS or not.

Backward compatibility is not the issue here - we're talking about *forward* compatibility.

> Usually, ASM is not directly the issue; the issue is some libraries that are using an older version of ASM and users not wanting to update solely ASM.

That's exactly what I said above:

> > it removes the risk of being stuck on an orphaned JDK version because the current JDK version is not supported by the ASM version that one of your key dependencies is pinned to.

> So I do not think that ASM using the JDK as backend will change anything.

Why not? If the "maximum supported JDK version" constant comes from the JDK, not from ASM, then upgrading the JDK (without touching ASM) will also upgrade the max supported version that ASM can handle (in the majority of cases, i.e. those that don't touch any aspect of bytecode that might have been altered in the new JDK version).

Thanks,

Ben

From michael.van.acken at gmail.com Tue Jul 19 16:13:47 2022
From: michael.van.acken at gmail.com (Michael van Acken)
Date: Tue, 19 Jul 2022 18:13:47 +0200
Subject: Opcode, LabelResolver & NewMultiArrayInstruction
Message-ID:

Sorry for another post with three points.

In Opcode.java, the TypeKind of ISHR & LSHR does not match the opcode:

    ISHR(Classfile.ISHR, 1, CodeElement.Kind.OPERATOR, TypeKind.FloatType),
    LSHR(Classfile.LSHR, 1, CodeElement.Kind.OPERATOR, TypeKind.DoubleType),

Both CodeModel and CodeAttribute (from the published API) depend on jdk.classfile.impl.LabelResolver (from the implementation package) for the method labelToBci(). Maybe LabelResolver should be moved into the published API as well?

Finally, may I ask for an overload `of(ClassDesc,int)` in j.cf.i.NewMultiArrayInstruction similar to the one offered, e.g., by TypeCheckInstruction? Being able to create simple instructions outside of a CodeBuilder simplifies my intermediate representation significantly. I can handle all the simple instructions with a single node class, and only have to define dedicated classes for more complicated higher-level stuff.
Without the ClassDesc overload, I would have to move NMAI from the generic into the dedicated category.

-- mva
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ebruneton at free.fr Tue Jul 19 17:27:27 2022
From: ebruneton at free.fr (ebruneton at free.fr)
Date: Tue, 19 Jul 2022 19:27:27 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr>
Message-ID: <0556193f9cd3687d6ae0d50e25b842f0@free.fr>

Hi Ben,

On 19/07/2022 15:07, Ben Evans wrote:
> On Tue, Jul 19, 2022 at 2:52 PM Remi Forax wrote:
>>
>> > From: "Ben Evans"
>>
>> > On Mon, Jul 18, 2022 at 10:31 PM Paul Sandoz wrote:
>> >>
>> >> Fair point. How might you envisage the future of ASM? some stripped down version with integration hooks into the classfile API?
>> >
>> > If this approach was adopted by the ASM folks it would potentially remove a barrier to adoption of non-LTS JDK versions in production.
>> >
>> > Specifically, it removes the risk of being stuck on an orphaned JDK version because the current JDK version is not supported by the ASM version that one of your key dependencies is pinned to.
>> >
>> > If ASM becomes a wrapper & value-add over the JDK Classfile API then "maximum supported JDK version" is a property solely of the JDK, not of the ASM library, and so is no longer a barrier to upgrading the JDK.
>> >
>> > I'm sure it's not the only barrier to getting more people to use non-LTS in production, but based on what I saw at New Relic (where customers raised this issue, or its consequences, fairly regularly) I think it could be a significant one.
>>
>> ASM is fully backward compatible for *every* release of the JDK, LTS or not.
>
> Backward compatibility is not the issue here - we're talking about *forward* compatibility.
>
>> Usually, ASM is not directly the issue; the issue is some libraries that are using an older version of ASM and users not wanting to update solely ASM.
>
> That's exactly what I said above:
>
>> > it removes the risk of being stuck on an orphaned JDK version because the current JDK version is not supported by the ASM version that one of your key dependencies is pinned to.
>
>> So I do not think that ASM using the JDK as backend will change anything.
>
> Why not? If the "maximum supported JDK version" constant comes from the JDK, not from ASM, then upgrading the JDK (without touching ASM) will also upgrade the max supported version that ASM can handle (in the majority of cases, i.e. those that don't touch any aspect of bytecode that might have been altered in the new JDK version).

Forward compatibility is a difficult topic I think, even for the ClassFile API. I don't think users will be able to just upgrade to the latest JDK to have their ClassFile API code automatically work with the latest class version. In the majority of cases, maybe. But there will be cases where their code might silently fail, when the new class version introduces new features.

For example, consider the ClassRemapper here https://github.com/openjdk/jdk-sandbox/blob/classfile-api-branch/src/java.base/share/classes/jdk/classfile/transforms/ClassRemapper.java. It seems that it currently does not remap record components, annotations or type annotations, modules, LDC opcode, etc? You could imagine that it was written for an old class version, pre-annotations. If you upgrade to a new JDK version with records and type annotations, this code will silently produce incorrect results, where not everything is remapped.
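
To make the silent-failure scenario above concrete, here is a minimal, self-contained sketch. It does not use the ClassFile API or ASM: `ElementKind`, `remapLenient` and `remapStrict` are hypothetical names invented purely for illustration, and `RECORD_COMPONENT` stands in for any feature added after the remapper was written. The lenient variant passes unknown kinds through untouched, whereas the strict variant fails loudly.

```java
public class RemapSketch {
    // Hypothetical element kinds. RECORD_COMPONENT plays the role of a
    // class file feature introduced after the remapper below was written.
    public enum ElementKind { FIELD, METHOD, RECORD_COMPONENT }

    // Lenient: kinds the author did not know about are returned unchanged,
    // so the output is silently incomplete.
    public static String remapLenient(ElementKind kind, String name) {
        switch (kind) {
            case FIELD:
            case METHOD:
                return "renamed/" + name;
            default:
                return name; // silently left as-is
        }
    }

    // Strict: every kind known at the time of writing is listed explicitly;
    // anything else raises an exception instead of producing wrong output.
    public static String remapStrict(ElementKind kind, String name) {
        switch (kind) {
            case FIELD:
            case METHOD:
                return "renamed/" + name;
            default:
                throw new IllegalStateException("Unhandled element kind: " + kind);
        }
    }

    public static void main(String[] args) {
        // The lenient variant returns the name without remapping it.
        System.out.println(remapLenient(ElementKind.RECORD_COMPONENT, "x"));
        try {
            remapStrict(ElementKind.RECORD_COMPONENT, "x");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The strict variant trades convenience for safety: it has to be revisited whenever a new kind appears, but it cannot produce a partially remapped class without anyone noticing.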
It might be possible to rewrite it in a more secure way, but this would probably require every "switch" to list all options known at the time of writing (with the default case throwing an exception)?

If your code has a dependency on such a library, you will be stuck on old JDK versions as long as the library's author does not update it to support the new features, whatever class file API it uses (ASM, ClassFile, etc).

ASM has a "max recognized version" in ClassReader and an ASM API version provided by user code (which declares for which ASM API version it was written), in order to address these forward compatibility issues (they are discussed in more detail on p. 82 of https://asm.ow2.io/asm4-guide.pdf). The goal is that user code should not silently fail because of new class file features.

What kind of forward compatibility guarantees do you want to provide with the ClassFile API? Are they or will they be documented somewhere? Will there be recommendations for users so that their code does not silently fail in case of new class features? Or can this guarantee be ensured by the API somehow?

About the future of ASM, I would say we will continue to support users who need it. If at some point most users have switched to the ClassFile API, we can deprecate it. For the reasons explained above, I don't think that refactoring ASM as a thin layer on top of the ClassFile API would bring any benefit.

Eric

> Thanks,
>
> Ben

From brian.goetz at oracle.com Tue Jul 19 18:38:09 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 19 Jul 2022 18:38:09 +0000
Subject: Opcode, LabelResolver & NewMultiArrayInstruction
In-Reply-To:
References:
Message-ID:

> Sorry for another post with three points.

It's like you have a buffer :)

> In Opcode.java, the TypeKind of ISHR & LSHR does not match the opcode:
>
> ISHR(Classfile.ISHR, 1, CodeElement.Kind.OPERATOR, TypeKind.FloatType),
> LSHR(Classfile.LSHR, 1, CodeElement.Kind.OPERATOR, TypeKind.DoubleType),

Whoops :)

I'm not sure how much depends on these ancillary properties of Opcode, so this may not trigger any bad behavior, but obviously this is some sort of cut and paste error. In an earlier iteration there were tons of ancillary properties and over time they migrated elsewhere (or were not needed). Before we fix this, we should look at how widely the "primary type kind" property is used, and whether this is the right place for it.

> Both CodeModel and CodeAttribute (from the published API) depend on jdk.classfile.impl.LabelResolver (from the implementation package) for the method labelToBci(). Maybe LabelResolver should be moved into the published API as well?

I'm surprised we didn't see a warning for that, as there are supposed to be warnings for when encapsulated types leak into the API of exported types. But the LabelResolver type mostly serves the implementation, not the API, so I don't think there's a benefit in exposing it by name (the behavior it has is of course public).

> Finally, may I ask for an overload `of(ClassDesc,int)` in j.cf.i.NewMultiArrayInstruction similar to the one e.g. offered by TypeCheckInstruction? Being able to create simple instructions outside of a CodeBuilder simplifies my intermediate representation significantly. I can handle all the simple instructions with a single node class, and only have to define dedicated classes for more complicated higher-level stuff. Without the ClassDesc overload, I would have to move NMAI from the generic into the dedicated category.

This seems reasonable enough.
This is in the category of "convenience factories", of which we have many, but we have not yet done a fully disciplined pass through the API to ensure that they are consistent across all the instructions / attributes / etc. One of the risks of adding convenience factories like this is that each one we add sometimes raises "but why then isn't there an analogous one here." It helps to proceed from some principles about what kinds of convenience factories we want, and then ensure they are all consistently implemented, rather than adding them one by one.

From ebruneton at free.fr Wed Jul 20 06:08:11 2022
From: ebruneton at free.fr (ebruneton at free.fr)
Date: Wed, 20 Jul 2022 08:08:11 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
Message-ID:

Hi Ben,

On 19/07/2022 20:37, Ben Evans wrote:
> What worries me is that:
>
> a) The Classfile API ends up not containing everything that is really needed by library authors, so they don't switch

I don't think a rewrite of ASM on top of ClassFile is necessary for that. The work that Rafael is doing should be very helpful already. As well as all the feedback you receive on this mailing list.

> b) The cost of switching is very high, so authors keep using ASM, which therefore keeps being developed, and we lose the opportunity to close a big gap in the long upgrade cycle problem.

For the "long upgrade cycle problem" I think very common libraries which are no longer actively maintained (such as cglib - https://github.com/cglib/cglib/blob/master/README.md, used in many other libraries, such as EasyMock) will continue to be an issue anyway, even with the ClassFile API.

> c) Even if a) turns out not to be a problem, the time to get enough of the ecosystem moved over to Classfile API is potentially extremely long.

From these 3 points it seems that your goal is for the ClassFile API to eventually replace ASM? Yet this is listed as a non-goal in https://bugs.openjdk.org/browse/JDK-8280389? Can you clarify?

About the "majority of cases" where the class file format does not change between two JDK versions, why is the class file version still incremented? If it was not, this would solve several cases where tools such as ASM must be "updated" for no real reason. Another option would be to increment the minor version only, and to reserve major version increments for real format changes. If this was added to the JVMS, tools such as ASM could take advantage of this, and would need to be updated less often (in fact I realize that ASM is already checking the major version only, so we unconsciously already assumed that minor version changes are not important).

Eric

>
> There are no easy answers here, of course.
>
> Thanks,
>
> Ben

From adam.sotona at oracle.com Wed Jul 20 08:05:07 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Wed, 20 Jul 2022 08:05:07 +0000
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To: <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
Message-ID:

From: classfile-api-dev on behalf of ebruneton at free.fr
Date: Tuesday, 19 July 2022 19:27
To: Ben Evans
Cc: Remi Forax, Paul Sandoz, Rafael Winterhalter, Brian Goetz, classfile-api-dev, classfile-api-dev
Subject: Re: POC: JDK ClassModel -> ASM ClassReader

Forward compatibility is a difficult topic I think, even for the ClassFile API.
I don't think users will be able to just upgrade to the latest JDK to have their ClassFile API code automatically work with the latest class version. In the majority of cases, maybe. But there will be cases where their code might silently fail, when the new class version introduces new features.

For example, consider the ClassRemapper here https://github.com/openjdk/jdk-sandbox/blob/classfile-api-branch/src/java.base/share/classes/jdk/classfile/transforms/ClassRemapper.java. It seems that it currently does not remap record components, annotations or type annotations, modules, LDC opcode, etc? You could imagine that it was written for an old class version, pre-annotations. If you upgrade to a new JDK version with records and type annotations, this code will silently produce incorrect results, where not everything is remapped. It might be possible to rewrite it in a more secure way, but this would probably require every "switch" to list all options known at the time of writing (with the default case throwing an exception)?

Hi Eric,

Thanks for pointing ClassRemapper out. It was written as an example of transformation modularity for the jdk.jfr class instrumentation use case. The implementation is not yet complete; however, I see no blockers to finishing it and keeping it up to date with every new JDK feature.

Thanks,
Adam
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From adam.sotona at oracle.com Wed Jul 20 09:04:26 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Wed, 20 Jul 2022 09:04:26 +0000
Subject: Opcode, LabelResolver & NewMultiArrayInstruction
In-Reply-To:
References:
Message-ID:

Opcode::primaryTypeKind() forwards to LoadInstruction, StoreInstruction, ReturnInstruction, ArrayLoadInstruction, ArrayStoreInstruction, ConvertInstruction and OperatorInstruction. Specifically, for ConvertInstruction and OperatorInstruction I haven't found any use of the instruction type kind across the Classfile API and tests.
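
The ISHR/LSHR mismatch reported in this thread is the kind of slip a mechanical check can catch. The following is a minimal sketch only and does not use the real jdk.classfile types: `Kind`, `expectedKind` and `consistent` are hypothetical names, and the rule that the first letter of a mnemonic determines the type kind is an assumption that holds for simple operator opcodes such as these, not for the instruction set in general.

```java
public class OpcodeKindCheck {
    // Hypothetical stand-in for the type kinds relevant to operator opcodes.
    public enum Kind { INT, LONG, FLOAT, DOUBLE }

    // Derives the expected kind from the mnemonic's first letter
    // (assumption: valid only for prefix-typed operators like ISHR or DADD).
    public static Kind expectedKind(String mnemonic) {
        switch (mnemonic.charAt(0)) {
            case 'I': return Kind.INT;
            case 'L': return Kind.LONG;
            case 'F': return Kind.FLOAT;
            case 'D': return Kind.DOUBLE;
            default:
                throw new IllegalArgumentException("No type prefix: " + mnemonic);
        }
    }

    // True when the declared kind agrees with the mnemonic prefix.
    public static boolean consistent(String mnemonic, Kind declared) {
        return expectedKind(mnemonic) == declared;
    }

    public static void main(String[] args) {
        // Declarations as reported in the thread: flagged as inconsistent.
        System.out.println("ISHR/FLOAT consistent? " + consistent("ISHR", Kind.FLOAT));
        System.out.println("LSHR/DOUBLE consistent? " + consistent("LSHR", Kind.DOUBLE));
        // Declarations matching the mnemonic prefix: accepted.
        System.out.println("ISHR/INT consistent? " + consistent("ISHR", Kind.INT));
        System.out.println("LSHR/LONG consistent? " + consistent("LSHR", Kind.LONG));
    }
}
```

A check like this could run as a unit test over all operator entries of an opcode table, so that a cut-and-paste error surfaces even when nothing at runtime depends on the property yet.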
For these two kinds of opcodes the type kind is purely informative; however, it may still be useful. I would suggest fixing it.

Adam

From: classfile-api-dev on behalf of Brian Goetz
Date: Tuesday, 19 July 2022 20:38
To: Michael van Acken
Cc: classfile-api-dev at openjdk.org
Subject: Re: Opcode, LabelResolver & NewMultiArrayInstruction

> In Opcode.java, the TypeKind of ISHR & LSHR does not match the opcode:
>
> ISHR(Classfile.ISHR, 1, CodeElement.Kind.OPERATOR, TypeKind.FloatType),
> LSHR(Classfile.LSHR, 1, CodeElement.Kind.OPERATOR, TypeKind.DoubleType),

Whoops :)

I'm not sure how much depends on these ancillary properties of Opcode, so this may not trigger any bad behavior, but obviously this is some sort of cut and paste error. In an earlier iteration there were tons of ancillary properties and over time they migrated elsewhere (or were not needed). Before we fix this, we should look at how widely the "primary type kind" property is used, and whether this is the right place for it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From benjamin.john.evans at gmail.com Wed Jul 20 11:21:33 2022
From: benjamin.john.evans at gmail.com (Ben Evans)
Date: Wed, 20 Jul 2022 13:21:33 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
Message-ID:

On Wed, Jul 20, 2022 at 8:08 AM wrote:
>
> On 19/07/2022 20:37, Ben Evans wrote:
> > What worries me is that:
> >
> > a) The Classfile API ends up not containing everything that is really needed by library authors, so they don't switch
>
> I don't think a rewrite of ASM on top of ClassFile is necessary for that. The work that Rafael is doing should be very helpful already. As well as all the feedback you receive on this mailing list.

I think that you're absolutely right in that Rafael's input will be invaluable, but my take is that having 2 or more points of view from library authors would be even better.

And - to be completely explicit, I'm not actively working on this (I wish I had the time, but I'm fully committed elsewhere) - I'm only trying to provide some feedback as a former tool maker who was a consumer of ASM and who has seen first hand some of the problems that the status quo has caused for customers, and who can see real potential for the Classfile API to solve some of those problems, if we can find the right path forward.

> > c) Even if a) turns out not to be a problem, the time to get enough of the ecosystem moved over to Classfile API is potentially extremely long.
>
> From these 3 points it seems that your goal is for the ClassFile API to eventually replace ASM? Yet this is listed as a non goal in https://bugs.openjdk.org/browse/JDK-8280389? Can you clarify?

That's better directed at the Oracle folks. I read it as "within the context of this JEP, which does not include a public API". I think it's naive to assume that as and when a public API exists, people won't use it / won't seek to remove usage of third-party bytecode libraries.

> About the "majority of cases" where the class file format does not change between two JDK versions, why is the class file version still incremented? If it was not, this would solve several cases where tools such as ASM must be "updated" for no real reason. Another option would be to increment the minor version only, and to reserve major version increments for real format changes. If this was added to the JVMS, tools such as ASM could take advantage of this, and would need to be updated less often (in fact I realize that ASM is already checking the major version only, so we unconsciously already assumed that minor version changes are not important).

That's one for the Oracle folks.
Personally, I can see both sides of that argument & feel like either choice would have been a reasonable outcome - but that is strictly my own personal feeling.

Thanks,

Ben

From brian.goetz at oracle.com Wed Jul 20 13:29:55 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 20 Jul 2022 13:29:55 +0000
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
Message-ID: <8E564F83-3D3E-43F1-A8C8-568B2ED86F84@oracle.com>

> > From these 3 points it seems that your goal is for the ClassFile API to eventually replace ASM? Yet this is listed as a non goal in https://bugs.openjdk.org/browse/JDK-8280389? Can you clarify?
>
> That's better directed at the Oracle folks. I read it as "within the context of this JEP, which does not include a public API". I think it's naive to assume that as and when a public API exists, people won't use it / won't seek to remove usage of third-party bytecode libraries.

This is getting pretty off topic, so we should wind this line of speculation down. I think the JEP was pretty clear, though. There's no intent to "kill ASM" or any other bytecode library. There's a reason there are dozens of bytecode libraries; each focuses on different priorities and audiences. It would be naive to think that this would be the "one library to rule them all"; it will meet some people's needs, and not others.

If/when this API becomes public and permanent in Java 2x, it will be a long time before the ecosystem has fully baselined on Java 2x. So ASM and others are here to stay for a while, and that's all good. The future of ASM is in the hands of the ASM community, which seems strong to me. Users will make choices that make sense for them, on their own schedule.
> What worries me is that:
> a) The Classfile API ends up not containing everything that is really
> needed by library authors, so they don't switch

Don't worry about this; this is guaranteed. There will always be something that this library doesn't do (e.g., I doubt the TCK team will be able to use it to create some of the pathologically broken classfiles that are used to test the JVM) or doesn't do well enough for every situation. In those cases, users should use what works for them. There are choices.

> b) The cost of switching is very high, so authors keep using ASM, which
> therefore keeps being developed, and we lose the opportunity to close a
> big gap in the long upgrade cycle problem.

Don't worry about this; this is baked in. If people are happy with ASM, they'll stay with ASM, and that's fine. New projects will have more choices, and many of them may choose the one baked into the JDK. Some projects may make the switch (as the JDK is embarking on - and this is definitely not cheap) and some may not. Some will do it sooner, some later, some never. Ecosystems are messy; we can lead, but people get to choose whether they follow.

> c) Even if a) turns out not to be a problem, the time to get enough of
> the ecosystem moved over to Classfile API is potentially extremely long.

Don't worry about this; this is just how the world works. Some people will adopt early; some will migrate later; some never will.
From brian.goetz at oracle.com Wed Jul 20 13:50:45 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 20 Jul 2022 13:50:45 +0000
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
Message-ID: <6A20D208-B12A-4AB6-A767-05864042D03E@oracle.com>

On Jul 20, 2022, at 2:08 AM, ebruneton at free.fr wrote:

> About the "majority of cases" where the class file format does not
> change between two JDK versions, why is the class file version still
> incremented? If it was not, this would solve several cases where tools
> such as ASM must be "updated" for no real reason. Another option would
> be to increment the minor version only, and to reserve major version
> increments for real format changes. If this was added to the JVMS, tools
> such as ASM could take advantage of this, and would need to be updated
> less often (in fact I realize that ASM is already checking the major
> version only, so we unconsciously already assumed that minor version
> changes are not important).

This question comes up over and over again. There was a long and drawn-out thread about this a few years ago (rooted here: https://mail.openjdk.org/pipermail/jdk-dev/2019-October/003388.html) as well as several times before that, but of the many times this has come up, no one has presented a compelling argument to change the policy. I'll summarize the conclusions briefly, but would rather not reopen the "debate" unless there is dramatic new evidence.

The bottom line is that it is a myth that the "majority of versions" do not have classfile changes. There are actually two myths here:

- The only changes that matter are "format changes"
- Format changes are rare.

Sure, we only introduced one new bytecode (7). But new bytecodes are not the only kind of format changes.
There have also been three versions (5, 7, and 11) that added new _constant pool forms_. (Keeping score: that's three versions out of 19 versions with format changes significant enough to make classfile parsers fail, approximately 1 out of 6. 1 out of 6 is not "rare".)

But so-called "format changes" merely scratch the surface of the sorts of dependencies inherent in translation from source to class file, for which versioning is warranted. Additional forms of dependencies include:

- New class file attributes. The JVMS defines a number of attributes that carry important semantic information, including those that control access control decisions, such as NestMembers, PermittedSubclasses, Module (and friends), etc. Yes, the class file format was designed so that unrecognized attributes could be "skipped" without losing your place in parsing - but that doesn't mean that it is reasonable for a class file processor that claims to understand that class file version to ignore them! New class file attributes are also quite common, with additions in 5, 7, 8, 9, 11, 16, and 17. Not rare, and you should not be parsing (let alone instrumenting) Java 19 classfiles if you don't understand the semantics of the attributes defined for Java 19 classfiles.

- Dependencies between translation strategy and JDK libraries. It is not uncommon that language features require library support; the foreach loop depends on Iterable (5), try-with-resources depends on AutoCloseable (7), lambdas depend on the LambdaMetafactory bootstrap (8), indified string concatenation depends on the appropriate bootstraps (9), and similar for many new language features such as records (java.lang.Record) and pattern matching. Not bumping the class file version here would be setting a trap for users, because while such classfiles might look like they conform to an older class file spec, they won't run on older JDKs.

- Change in classfile validity. Small changes about valid combinations of classfile elements are made all the time.
For example, prior to Java 9, it was an error for a method in an interface to have ACC_PRIVATE; now that is valid. (Soon ACC_SUPER will be deprecated and later the bit value repurposed.) While the "format" hasn't changed, the semantics have. This is also not so infrequent.

As you can see, both assumptions - that "format changes" should be defined by the most narrow interpretation of "can I parse this without losing my place", and that changes that require us to bump the class file version are rare - appeal to wishful thinking. Even if we didn't bump the class file version "unless we had to", such bumps would happen in probably 70-80% of releases anyway.

(As a secondary matter, because features are late bound to releases, a policy of only bumping the version "when there are significant changes" creates a process tax because we have to wait to the end to see if a bump is warranted, and then do the appropriate work at the last minute, which introduces more friction and potential for error. While it may seem that that is "our problem", such things eventually become everybody's problem. If it were truly rare (once a decade), we might consider this, but that's not the case.)

From adam.sotona at oracle.com Wed Jul 20 14:19:09 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Wed, 20 Jul 2022 14:19:09 +0000
Subject: RFR: Classfile API ClassRemapper implemenation handling more attributes
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
Message-ID:

Hi,

I've implemented handling of the RecordAttribute, InnerClassesAttribute, EnclosingMethodAttribute, Annotations, TypeAnnotations and ParameterAnnotations attributes, as well as LoadableConstants, in ClassRemapper.
Please review: https://github.com/openjdk/jdk-sandbox/pull/27 Thanks, Adam From: classfile-api-dev on behalf of Adam Sotona Date: Wednesday, 20 July 2022 10:05 To: ebruneton at free.fr , Ben Evans Cc: Remi Forax , Paul Sandoz , Rafael Winterhalter , Brian Goetz , classfile-api-dev , classfile-api-dev Subject: Re: POC: JDK ClassModel -> ASM ClassReader From: classfile-api-dev on behalf of ebruneton at free.fr Date: Tuesday, 19 July 2022 19:27 To: Ben Evans Cc: Remi Forax , Paul Sandoz , Rafael Winterhalter , Brian Goetz , classfile-api-dev , classfile-api-dev Subject: Re: POC: JDK ClassModel -> ASM ClassReader Forward compatibility is a difficult topic I think, even for the ClassFile API. I don't think users will be able to just upgrade to the latest JDK to have their ClassFile API code automatically work with the latest class version. In the majority of cases, maybe. But there will be cases where their code might silently fail, when the new class version introduces new features. For example, consider the ClassRemapper here https://github.com/openjdk/jdk-sandbox/blob/classfile-api-branch/src/java.base/share/classes/jdk/classfile/transforms/ClassRemapper.java. It seems that it currently does not remap record components, annotations or type annotations, modules, LDC opcode, etc? You could imagine that it was written for an old class version, pre annotations. If you upgrade to a new JDK version with records and type annotations, this code will silently produce incorrect results, where not everything is remapped. It might be possible to rewrite it in a more secure way, but this would probably require all "switch" to list all known options at the time of writing (and the default case to throw an exception)? Hi Eric, Thanks for pointing ClassRemapper out. It was written as an example of transformations modularity for jdk.jfr class instrumentation use case. 
The implementation is not yet complete; however, I see no blockers to finishing it and keeping it up to date with every new JDK feature.

Thanks,
Adam

From ebruneton at free.fr Wed Jul 20 17:42:29 2022
From: ebruneton at free.fr (ebruneton at free.fr)
Date: Wed, 20 Jul 2022 19:42:29 +0200
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr>
Message-ID:

Hi Adam,

On 20/07/2022 10:05, Adam Sotona wrote:
> Thanks for pointing ClassRemapper out. It was written as an example of
> transformations modularity for jdk.jfr class instrumentation use case.
>
> The implementation is incomplete yet, however I see no blockers to
> finish it and keep it up to date with every new JDK feature coming.

Of course. I was not saying it can't be finished. My point was that if this were a library outside the JDK, written for JDK 1.5 or so (imagining that the ClassFile API existed back then) and no longer maintained since, everyone depending directly or indirectly on it would be stuck on JDK 1.5, despite the ClassFile API evolving in sync with the JDK. This is an imaginary example, but I think similar cases can happen with new libraries that will be written on top of ClassFile.
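One defensive pattern raised earlier in this thread is to list every known case explicitly and let the default branch throw, so that an unmaintained library fails loudly instead of silently skipping elements it has never heard of. A minimal sketch of that idea follows; the `ElementKind` enum and the `remap` method are hypothetical stand-ins for illustration, not types from the Classfile API or ASM.

```java
// Hypothetical sketch: ElementKind and remap() are illustrative names only.
final class StrictRemapSketch {

    // Pretend RECORD_COMPONENT was added to the format after remap() was written.
    enum ElementKind { FIELD, METHOD, INNER_CLASS, RECORD_COMPONENT }

    /** Renames elements of the kinds known at the time of writing; rejects all others. */
    static String remap(ElementKind kind, String name) {
        return switch (kind) {
            // Every kind the author knew about is listed explicitly.
            case FIELD, METHOD, INNER_CLASS -> "renamed/" + name;
            // Anything newer fails loudly rather than being passed through unmapped.
            default -> throw new UnsupportedOperationException(
                    "Unhandled element kind: " + kind);
        };
    }
}
```

The trade-off is the one discussed above: such code refuses to process newer classfiles until it is updated, which is safer than producing partially remapped output but does not remove the need for maintenance.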
Eric > > Thanks, > > Adam From ebruneton at free.fr Wed Jul 20 18:03:19 2022 From: ebruneton at free.fr (ebruneton at free.fr) Date: Wed, 20 Jul 2022 20:03:19 +0200 Subject: POC: JDK ClassModel -> ASM ClassReader In-Reply-To: <6A20D208-B12A-4AB6-A767-05864042D03E@oracle.com> References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <3647E41B-745F-425B-AEFD-F944E26BBAA3@oracle.com> <510150081.12807221.1658235148868.JavaMail.zimbra@u-pem.fr> <0556193f9cd3687d6ae0d50e25b842f0@free.fr> <6A20D208-B12A-4AB6-A767-05864042D03E@oracle.com> Message-ID: Hi Brian, Le 20/07/2022 15:50, Brian Goetz a ?crit?: >> On Jul 20, 2022, at 2:08 AM, ebruneton at free.fr wrote: >> About the "majority of cases" where the class file format does not >> change between two JDK versions, why is the class file version still >> incremented? If it was not, this would solve several cases where >> tools such as ASM must be "updated" for no real reason. Another >> option would be to increment the minor version only, and to reserve >> major version increments for real format changes. If this was added >> to the JVMS, tools such as ASM could take advantage of this, and >> would need to be updated less often (in fact I realize that ASM is >> already checking the major version only, so we unconsciously already >> assumed that minor version changes are not important). > > This question comes up over and over again. There was a long and > drawn out thread about this a few years ago (rooted here: > https://mail.openjdk.org/pipermail/jdk-dev/2019-October/003388.html > [1]) as well as several times before that, but of the many times this > has come up, no one has presented a compelling argument to change the > policy. I?ll summarize the conclusions briefly, but would rather > not reopen the ?debate? unless there is dramatic new evidence. > > The bottom line is that it is a myth that the ?majority of > versions? do not have classfile changes. 
There are actually two > myths here: > > - The only changes that matter are ?format changes? > - Format changes are rare. > > Sure, we only introduced one new bytecode (7). But new byte codes are > not the only kind of format changes. There have also been three > versions (5, 7, and 11) that added new _constant pool forms_. > (Keeping score: that?s three versions out of 19 versions with format > changes significant enough to make classfile parsers fail, > approximately 1 out of 6. 1 out of 6 is not ?rare?.) > > But so-called ?format changes? merely scratch the surface of the > sorts of dependencies inherent in translation from source to class > file, for which versioning is warranted. Additional forms of > dependencies include: > > - New class file attributes. The JVMS defines a number of attributes > that carry important semantic information, including those that > control access control decisions, such as NestMembers, > PermittedSubtypes, Module (and friends), etc. Yes, the class file > format was designed so that unrecognized attributes could be > ?skipped? without losing your place in parsing ? but that > doesn?t mean that it is reasonable for a class file processor who > claims to understand that class file version to ignore them! New > class file attributes are also quite common, with additions in 5, 7, > 8, 9, 11, 16, and 17. Not rare, and you should not be parsing (let > alone instrumenting) Java 19 classifies if you don?t understand the > semantics of the attributes defined for Java 19 classfiles. > > - Dependencies between translation strategy and JDK libraries. It is > not uncommon that language features require library support; the > foreach loop depends on Iterable (5), try-with-resources depends on > Autocloseable (7), lambdas depend on the LambdaMetafactory bootstrap > (8), indified string concatenation depends on the appropriate > bootstraps (9), and similar for many new language features such as > records (java.lang.Record) and pattern matching. 
Not bumping the > class file version here would be setting a trap for users, because > while they might look like they conform to an older class file spec, > they won?t run on older JDKs. > > - Change in classfile validity. Small changes about valid > combinations of classfile elements are made all the time. For > example, prior to Java 9, it was an error for a method in an interface > to have ACC_PRIVATE; now that is valid. (Soon ACC_SUPER will be > deprecated and later the bit value repurposed.). While the > ?format? hasn?t change, the semantics has. This is also not so > infrequent. > > As you can see, both the assumptions ? that ?format changes? > should be defined by the most narrow interpretation of ?can I parse > this without losing my place?, and the assumption that changes that > require us to bump the class file version are rare ? appeal to > wishful thinking. Even if we didn?t bump the class file version > ?unless we had to?, such bumps would happen in probably 70-80% of > releases anyway. > > (As a secondary matter, because features are late bound to releases, a > policy of only bumping the version ?when there are significant > changes? creates a process tax because we have to wait to the end to > see if a bump is warranted, and then do the appropriate work at the > last minute, which introduces more friction and potential for error. > While it may seem that that is ?our problem?, such things > eventually become everybody?s problem. If it were truly rare (once > a decade), we might consider this, but that?s not the case.) Thanks for all the arguments! I was counting new constant pool items and new attributes in the "new format" category, as well as new semantics (such as the removal of jsr/ret), but not the library support. But I was only considering the "recent" versions, since the 6-months release cycle. Also I was not proposing to not bump the version at all, but to use the minor / major version distinction provided by the class file format. 
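The minor / major pair referred to here sits in the first eight bytes of every class file: a 4-byte magic number, then `minor_version`, then `major_version`, all big-endian. A minimal sketch that reads both fields, assuming only `java.nio` (the class name `ClassFileVersionSketch` is made up for this example and is not part of ASM or the Classfile API):

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only.
final class ClassFileVersionSketch {

    record Version(int major, int minor) {}

    /** Reads magic, minor_version and major_version from the start of a class file. */
    static Version read(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // ByteBuffer is big-endian by default
        if (in.remaining() < 8 || in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file header");
        }
        int minor = in.getShort() & 0xFFFF; // minor_version is stored first
        int major = in.getShort() & 0xFFFF; // major_version follows
        return new Version(major, minor);
    }
}
```

For a class compiled for Java 19 without preview features, this yields major 63 and minor 0; a tool that checks only the major number still has to be updated whenever that number is bumped.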
Your last point seems to preclude this as well...

Eric

> Links:
> ------
> [1] https://mail.openjdk.org/pipermail/jdk-dev/2019-October/003388.html

From adam.sotona at oracle.com Thu Jul 21 10:17:32 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Thu, 21 Jul 2022 10:17:32 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
Message-ID:

Hi,

I would like to propose removal of solo class jdk.classfile.util.ClassPrinter (as the only class remaining in jdk.classfile.util package).

And integrate print functionality directly to ClassModel:

    /**
     * Print this classfile.
     *
     * @param output handler to receive printed text
     * @param printOptions optional print configuration
     */
    default void print(Consumer output, PrintOption... printOptions) {
        new ClassPrinterImpl(output, printOptions).printClass(this);
    }

And to MethodModel:

    /**
     * Print this method.
     *
     * @param output handler to receive printed text
     * @param printOptions optional print configuration
     */
    default void print(Consumer output, PrintOption... printOptions) {
        new ClassPrinterImpl(output, printOptions).printMethod(this);
    }

With help of very simple PrintOption:

    /**
     * An option that affects the printing.
     */
    public sealed interface PrintOption {

        /**
         * Selection of available print output formats.
         */
        public enum Format implements PrintOption {
            JSON, XML, YAML
        }

        /**
         * Verbosity level of the print.
         */
        public enum Verbosity implements PrintOption {
            MEMBERS_ONLY, CRITICAL_ATTRIBUTES, TRACE_ALL
        }
    }

Any comments are welcome.

Thanks,
Adam
URL: From michael.van.acken at gmail.com Thu Jul 21 11:21:41 2022 From: michael.van.acken at gmail.com (Michael van Acken) Date: Thu, 21 Jul 2022 13:21:41 +0200 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: Message-ID: Related but geared towards the builder side of things: Is there a way to print out a trace of parts fed to CodeBuilder instances? Just this morning I had Classfile die on me because of a stack underflow, and it was quite hard to find out which parts were missing from the Code attribute. And that with a Code totalling just 5 instructions... If there would have been bytes output, then I could have inspected the situation with javap. But if I mess up and pass inconsistent data to CodeBuilder, causing it to throw instead of producing a byte array, then I have an observability gap. -- mva -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Jul 21 11:38:03 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Jul 2022 11:38:03 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: Message-ID: <55A091D9-D9CD-4B0A-AB6A-29825013F271@oracle.com> Mechanics are there in the form of buffered builders, but they are not exposed in the easy way right now. This is something to figure out. Sent from my iPad On Jul 21, 2022, at 7:22 AM, Michael van Acken wrote: ? Related but geared towards the builder side of things: Is there a way to print out a trace of parts fed to CodeBuilder instances? Just this morning I had Classfile die on me because of a stack underflow, and it was quite hard to find out which parts were missing from the Code attribute. And that with a Code totalling just 5 instructions... If there would have been bytes output, then I could have inspected the situation with javap. 
But if I mess up and pass inconsistent data to CodeBuilder, causing it to throw instead of producing a byte array, then I have an observability gap. -- mva -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.sotona at oracle.com Thu Jul 21 11:59:23 2022 From: adam.sotona at oracle.com (Adam Sotona) Date: Thu, 21 Jul 2022 11:59:23 +0000 Subject: [External] : Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: Message-ID: Yes, this is valid request. There is no debug print when stack map generation fails in CodeBuilder. Thanks, Adam From: Michael van Acken Date: Thursday, 21 July 2022 13:21 To: Adam Sotona Cc: classfile-api-dev at openjdk.org Subject: [External] : Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel Related but geared towards the builder side of things: Is there a way to print out a trace of parts fed to CodeBuilder instances? Just this morning I had Classfile die on me because of a stack underflow, and it was quite hard to find out which parts were missing from the Code attribute. And that with a Code totalling just 5 instructions... If there would have been bytes output, then I could have inspected the situation with javap. But if I mess up and pass inconsistent data to CodeBuilder, causing it to throw instead of producing a byte array, then I have an observability gap. -- mva -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Jul 21 14:15:07 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Jul 2022 14:15:07 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: Message-ID: I think the root of the problem here is that CodeBuilder combines the ?build my method? 
with stackmap generation, and if the latter fails, nothing is produced. And when stack map generation fails, you'd like to see a javap-like output so you can see what you did wrong.

You can suppress stack map generation with an Option, and then you'll get a classfile out, but that is probably a little hard to discover.

This is a more general problem, not just for stack maps; there are other things that can cause code generation to fail (e.g., forward branch to a label that is never defined; invalid labels in exception tables; etc.) The main vector we have for feeding back information is the exception message, but putting the entire javap output of the method body in the exception message might be too much (but might not be, since any exception from building a classfile will trigger a round of debugging.)

Any thoughts on how you would like to see this information fed back?

> Is there a way to print out a trace of parts fed to CodeBuilder instances?
>
> Just this morning I had Classfile die on me because of a stack underflow,
> and it was quite hard to find out which parts were missing from the Code
> attribute. And that with a Code totalling just 5 instructions...
>
> If there would have been bytes output, then I could have inspected the
> situation with javap. But if I mess up and pass inconsistent data to
> CodeBuilder, causing it to throw instead of producing a byte array, then
> I have an observability gap.
>
> -- mva

From ebruneton at free.fr Thu Jul 21 15:33:02 2022
From: ebruneton at free.fr (Eric Bruneton)
Date: Thu, 21 Jul 2022 17:33:02 +0200
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References:
Message-ID:

On 21/07/2022 at 16:15, Brian Goetz wrote:
> I think the root of the problem here is that CodeBuilder combines the
> "build my method" with stackmap generation, and if the latter fails,
> nothing is produced.
> And when stack map generation fails, you'd like to
> see a javap-like output so you can see what you did wrong.
>
> You can suppress stack map generation with an Option, and then you'll
> get a classfile out, but that is probably a little hard to discover.
>
> This is a more general problem, not just for stack maps; there are other
> things that can cause code generation to fail (e.g., forward branch to a
> label that is never defined; invalid labels in exception tables; etc.)
> The main vector we have for feeding back information is the exception
> message, but putting the entire javap output of the method body in the
> exception message might be too much (but might not be, since any
> exception from building a classfile will trigger a round of debugging.)
>
> Any thoughts on how you would like to see this information fed back?

For debugging, could you replace the builder with a dummy builder which would print the class instead of actually building a byte array? More generally, could you insert a "no-op" class transform anywhere in a transformation chain, which would print the class instead of transforming it?

This is how we do this in ASM, with the TraceClassVisitor (which can be used instead of ClassWriter, or anywhere in a chain of visitors). But maybe this is not possible/desirable with this API?

Maybe a "ClassModelBuilder" (ClassModel build(ClassDesc thisClass, Consumer handler)) could be useful too?

Eric

>
>> Is there a way to print out a trace of parts fed to CodeBuilder instances?
>>
>> Just this morning I had Classfile die on me because of a stack underflow,
>> and it was quite hard to find out which parts were missing from the Code
>> attribute. And that with a Code totalling just 5 instructions...
>>
>> If there would have been bytes output, then I could have inspected the
>> situation with javap. But if I mess up and pass inconsistent data to
>> CodeBuilder, causing it to throw instead of producing a byte array, then
>> I have an observability gap.
>> >> -- mva >> > From adam.sotona at oracle.com Thu Jul 21 15:36:09 2022 From: adam.sotona at oracle.com (Adam Sotona) Date: Thu, 21 Jul 2022 15:36:09 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> Message-ID: Actual ClassPrinter is very monolithic by intention. It has been written for the very base purpose of seeing the content of ClassModel or individual MethodModel in human-readable form (as well as machine-readable). The implementation traverses the models and prints formatted output using custom internal templates. The monolithic printer code gives me a chance to reflect all changes in one single class and many changes are instantly handled during refactoring. Additional API/SPI layers would make it very complex, painful for maintenance and a nightmare for testing. What you propose is to transform actual models (ClassModel, MethodModel?) into a kind of ?printable models? and then implement various transformers of these printable models to provide formatted output. I think that another ?printable? SPI layer is a bit of overkill, as anyone can already implement consumer of the actual Classfile API models to print whatever is needed. Purpose of ClassPrinter and its tight integration with Classfile API is to provide the one standard and integrated printer (or set of printers), just a bit extended version of toString() method, and also javap-like embedded tool (all in one). Providing structured text output in these three different formats is a key to simplify parsing of the printed output and search for expected fragments in automated tests (in contrast to complex regexps used to parse actual javap outputs). JSON and XML can be parsed by many tools and libraries, while YAML advantage is to be the closest to human perception of a structured text out of the three. 
Specifying verbosity levels is also key, as some use cases need to print just brief info for a very large number of classes, while others need a full trace of a single huge class.

Thanks,
Adam

From: Brian Goetz
Date: Thursday, 21 July 2022 15:57
To: Adam Sotona
Cc: classfile-api-dev at openjdk.org
Subject: Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel

I am all for finding the right home / API for ClassPrinter; that it is the only class in "util" is kind of a red flag. But I'd like to consider a few other directions first.

Pushing things into ClassModel is moving in the "more monolithic" direction; this is evidenced by the fact that you have to feed it multiple kinds of options (both formatting kind and verbosity degrees) at the top, and that information flows down through the traversal of the tree. This means that users can't easily reuse or influence small pieces of the traversal. I'd like to explore a direction where we expose the constituent parts in a way the user can mix and match, or substitute their own.

The essence of CP is to take the tree of elements, which has many dozens of element types, and turn it into a much simpler tree, one which has a few kinds of nodes which vary by structure: key-value pairs, blocks, tables. Having reduced the complexity of the element tree to one that only exposes structure/arity, it is more amenable to shoveling into a hierarchical text format.

I might lean towards something like first turning the ClassModel into a tree of Map, where Node could be a key-value pair, a list of nodes, or a Map. This can be parameterized by the Verbosity enum. Then visit this Map with a format-specific traversal that shovels into JSON, XML, or YAML, using the key names to produce the JSON keys / XML element names / YAML paragraph names, and the type of the value to determine whether the payload is a scalar/array/object (YAML), simple element / complex element (XML), etc.
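The two-step shape described here (reduce to a small structural tree, then run a format-specific traversal over it) can be sketched roughly as follows. Everything in this block is an assumption made for illustration: the `Node` hierarchy, the string-keyed maps, and the hand-built `sample()` tree are hypothetical and do not come from the Classfile API, and the YAML-like writer does no quoting or escaping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types exist in jdk.classfile.
final class PrintableTreeSketch {

    /** A deliberately small tree: scalars, lists, and ordered maps. */
    sealed interface Node {
        record Leaf(Object value) implements Node {}
        record ListNode(List<Node> items) implements Node {}
        record MapNode(Map<String, Node> entries) implements Node {}
    }

    /** Format-specific traversal 1: compact JSON. */
    static String toJson(Node node) {
        if (node instanceof Node.Leaf leaf) {
            Object v = leaf.value();
            return (v instanceof Number || v instanceof Boolean)
                    ? v.toString() : quote(String.valueOf(v));
        }
        if (node instanceof Node.ListNode list) {
            return list.items().stream().map(PrintableTreeSketch::toJson)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        Node.MapNode map = (Node.MapNode) node;
        return map.entries().entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Format-specific traversal 2: simplified YAML-like text (no escaping). */
    static String toYaml(Node node) {
        StringBuilder out = new StringBuilder();
        yaml(node, 0, out);
        return out.toString();
    }

    private static void yaml(Node node, int indent, StringBuilder out) {
        String pad = " ".repeat(indent);
        if (node instanceof Node.MapNode map) {
            for (Map.Entry<String, Node> e : map.entries().entrySet()) {
                if (e.getValue() instanceof Node.Leaf leaf) {
                    out.append(pad).append(e.getKey()).append(": ")
                       .append(leaf.value()).append('\n');
                } else {
                    out.append(pad).append(e.getKey()).append(":\n");
                    yaml(e.getValue(), indent + 2, out);
                }
            }
        } else if (node instanceof Node.ListNode list) {
            for (Node item : list.items()) {
                if (item instanceof Node.Leaf leaf) {
                    out.append(pad).append("- ").append(leaf.value()).append('\n');
                } else {
                    out.append(pad).append("-\n");
                    yaml(item, indent + 2, out);
                }
            }
        } else {
            out.append(pad).append(((Node.Leaf) node).value()).append('\n');
        }
    }

    /** Hand-built stand-in for what a ClassModel-to-tree step might produce. */
    static Node sample() {
        Map<String, Node> method = new LinkedHashMap<>();
        method.put("name", new Node.Leaf("main"));
        method.put("flags", new Node.ListNode(
                List.of(new Node.Leaf("PUBLIC"), new Node.Leaf("STATIC"))));
        Map<String, Node> cls = new LinkedHashMap<>();
        cls.put("class", new Node.Leaf("Foo"));
        cls.put("version", new Node.Leaf(63));
        cls.put("methods", new Node.ListNode(List.of(new Node.MapNode(method))));
        return new Node.MapNode(cls);
    }
}
```

In this arrangement the writers know nothing about classfile element types, so a verbosity filter would only affect the step that builds the tree, and a user could supply a different writer without touching the traversal of the model.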
The current ClassPrinter complects traversal, filtering, element recognition, low-level format details, and high-level format structure. I think these can be teased apart into individual concerns. On Jul 21, 2022, at 6:17 AM, Adam Sotona > wrote: Hi, I would like to propose removal of solo class jdk.classfile.util.ClassPrinter (as the only class remaining in jdk.classfile.util package). And integrate print functionality directly to ClassModel: /** * Print this classfile. * * @param output handler to receive printed text * @param printOptions optional print configuration */ default void print(Consumer output, PrintOption... printOptions) { new ClassPrinterImpl(output, printOptions).printClass(this); } And to MethodModel: /** * Print this method. * * @param output handler to receive printed text * @param printOptions optional print configuration */ default void print(Consumer output, PrintOption... printOptions) { new ClassPrinterImpl(output, printOptions).printMethod(this); } With help of very simple PrintOption: /** * An option that affects the printing. */ public sealed interface PrintOption { /** * Selection of available print output formats. */ public enum Format implements PrintOption { JSON, XML, YAML } /** * Verbosity level of the print. */ public enum Verbosity implements PrintOption { MEMBERS_ONLY, CRITICAL_ATTRIBUTES, TRACE_ALL } } Any comments are welcome. Thanks, Adam -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.sotona at oracle.com Thu Jul 21 15:39:49 2022 From: adam.sotona at oracle.com (Adam Sotona) Date: Thu, 21 Jul 2022 15:39:49 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: Message-ID: As it is probably not the best to print to System.err and also filling large method print into an Exception message is overkill. 
I would suggest creating a new Classfile.Option for debug output, where the user can specify a Consumer as the handler for such situations. For verification purposes such a handler is a direct method argument, which could be unified to use the new Option.

BTW: turning an unfinished CodeBuilder into a MethodModel to re-use ClassPrinter is a bit tricky; however, I already have a prototype and it seems to be a valuable feature.

And one trick to debug stack map generation that I almost forgot: look at the StackMapGenerator Javadoc:
https://htmlpreview.github.io/?https://raw.githubusercontent.com/openjdk/jdk-sandbox/classfile-api-javadoc-branch/doc/classfile-api/javadoc/jdk/classfile/impl/StackMapGenerator.html

In case of an exception during the Generator loop there is just minimal information available in the exception message. To determine the root cause of the exception it is recommended to enable debug logging of the Generator in one of the two modes using the following java.lang.System properties:

-Djdk.classfile.impl.StackMapGenerator.DEBUG=true
    Activates debug logging with basic information + generated stack map frames in case of success. It also re-runs with full trace logging enabled in case of an error or exception.

-Djdk.classfile.impl.StackMapGenerator.TRACE=true
    Activates full detailed tracing of the generator process for all invocations.

From: Brian Goetz
Date: Thursday, 21 July 2022 16:15
To: Michael van Acken
Cc: Adam Sotona, classfile-api-dev at openjdk.org
Subject: Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel

I think the root of the problem here is that CodeBuilder combines the "build my method" with stackmap generation, and if the latter fails, nothing is produced. And when stack map generation fails, you'd like to see a javap-like output so you can see what you did wrong.

You can suppress stack map generation with an Option, and then you'll get a classfile out, but that is probably a little hard to discover.
This is a more general problem, not just for stack maps; there are other things that can cause code generation to fail (e.g., forward branch to a label that is never defined; invalid labels in exception tables; etc.) The main vector we have for feeding back information is the exception message, but putting the entire javap output of the method body in the exception message might be too much (but might not be, since any exception from building a classfile will trigger a round of debugging.)

Any thoughts on how you would like to see this information fed back?

Is there a way to print out a trace of parts fed to CodeBuilder instances?

Just this morning I had Classfile die on me because of a stack underflow, and it was quite hard to find out which parts were missing from the Code attribute. And that with a Code totalling just 5 instructions...

If there would have been bytes output, then I could have inspected the situation with javap. But if I mess up and pass inconsistent data to CodeBuilder, causing it to throw instead of producing a byte array, then I have an observability gap.

-- mva

From michael.van.acken at gmail.com Thu Jul 21 15:42:30 2022
From: michael.van.acken at gmail.com (Michael van Acken)
Date: Thu, 21 Jul 2022 17:42:30 +0200
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To: References: Message-ID:

On Thu, 21 July 2022 at 16:15, Brian Goetz <brian.goetz at oracle.com> wrote:

> I think the root of the problem here is that CodeBuilder combines the
> "build my method" with stackmap generation, and if the latter fails,
> nothing is produced. And when stack map generation fails, you'd like to
> see a javap-like output so you can see what you did wrong.
>
> You can suppress stack map generation with an Option, and then you'll get
> a classfile out, but that is probably a little hard to discover.
>
> This is a more general problem, not just for stack maps; there are other
> things that can cause code generation to fail (e.g., forward branch to a
> label that is never defined; invalid labels in exception tables; etc.) The
> main vector we have for feeding back information is the exception message,
> but putting the entire javap output of the method body in the exception
> message might be too much (but might not be, since any exception from
> building a classfile will trigger a round of debugging.)
>
> Any thoughts on how you would like to see this information fed back?
>

For me as a user building a compiler, the Classfile API is a black box. When it throws an exception that is caused by faulty input, one of the first questions I need to answer is "What did I do?"

Well, I called lots and lots of methods on builder instances (for the most part CodeBuilder, of course). This is my primary point of contact with the API, and a textual log of those method calls would be helpful. Something along the lines of how Unix strace prints a log of system calls: symbolic information about call details that allows one to piece information together, if need be from thousands of lines of text.

Having such call log lines emitted on demand as part of a Classfile.build() would be the most straightforward way to do this.

-- mva

From brian.goetz at oracle.com Thu Jul 21 15:59:19 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 21 Jul 2022 15:59:19 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To: References: Message-ID:

You could; we have a buffering builder internally (not yet exposed) that just accumulates a List; this is something that we could press into service for exposing a debugging mode. (We don't want to do this by default because it slows down the pipeline.)
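The strace-like call log asked for in this exchange can be approximated by the caller with a logging decorator. The sketch below is only an illustration under stated assumptions: ElementSink, LoggingSink and demo are hypothetical names, ElementSink is a stand-in for a builder and is not the real CodeBuilder, and the "stack underflow" failure is simulated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: ElementSink stands in for a builder; it is not the Classfile API.
final class CallLogSketch {

    /** Hypothetical stand-in for a builder that accepts elements one at a time. */
    interface ElementSink<E> {
        ElementSink<E> with(E element);
    }

    /** Decorator that emits one log line per call before delegating. */
    static final class LoggingSink<E> implements ElementSink<E> {
        private final ElementSink<E> delegate;
        private final Consumer<String> log;

        LoggingSink(ElementSink<E> delegate, Consumer<String> log) {
            this.delegate = delegate;
            this.log = log;
        }

        @Override
        public ElementSink<E> with(E element) {
            log.accept("with " + element);
            delegate.with(element);
            return this; // keep the caller talking to the logging wrapper
        }
    }

    /** Feeds three elements into a sink that fails on the second one and returns the log. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        ElementSink<String> failing = new ElementSink<>() {
            @Override
            public ElementSink<String> with(String element) {
                if (element.equals("IADD")) {
                    throw new IllegalStateException("stack underflow"); // simulated failure
                }
                return this;
            }
        };
        ElementSink<String> sink = new LoggingSink<>(failing, log::add);
        try {
            sink.with("ICONST_1").with("IADD").with("IRETURN");
        } catch (IllegalStateException e) {
            log.add("failed: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because each call is logged before it is delegated, the log still shows the offending element when the underlying sink throws, which is the observability that is missing when no bytes are produced.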
But, this is in the same category as "set the option to not generate stack maps": it means that the user has to re-run the computation with different code, which is annoying for a compiler and super-duper-annoying when the transform is being done on the fly as part of a framework.

Alternately, since build() and friends take a Consumer, the user can wrap that consumer with one that logs the elements somewhere. But again, that's a manual intervention.

Another idea is to embrace the "re-run the lambda" approach that we do with optimizing branch offsets; if a CodeBuilder fails to verify, re-run the same lambda in a buffering builder to get more observability.

The ClassModelBuilder approach you suggest could be easily exposed, and I think it has value, but it has some rough edges: the buffering builders don't implement all the functionality that the direct builders do (e.g., you can't ask it "what's the current BCI?"). So while this is totally a valid thing to expose and would likely be useful and is easy to do, it's not purely a drop-in replacement for building directly right now.

> For debugging, could you replace the builder with a dummy builder which would print the class instead of actually building a byte array? More generally could you insert a "no-op" class transform anywhere in a transformation chain, which would print the class instead of transforming it? This is how we do this in ASM, with the TraceClassVisitor (which can be used instead of ClassWriter, or anywhere in a chain of visitors). But maybe this is not possible/desirable with this API?
>
> Maybe a "ClassModelBuilder" (ClassModel build(ClassDesc thisClass, Consumer handler)) could be useful too?
>
> Eric
>
>>> Is there a way to print out a trace of parts fed to CodeBuilder instances?
>>>
>>> Just this morning I had Classfile die on me because of a stack underflow,
>>> and it was quite hard to find out which parts were missing from the Code
>>> attribute.
And that with a Code totalling just 5 instructions... >>> >>> If there would have been bytes output, then I could have inspected the >>> situation with javap. But if I mess up and pass inconsistent data to >>> CodeBuilder, causing it to throw instead of producing a byte array, then >>> I have an observability gap. >>> >>> -- mva >>> From brian.goetz at oracle.com Thu Jul 21 16:09:11 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Jul 2022 16:09:11 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> Message-ID: <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> Actual ClassPrinter is very monolithic by intention. It has been written for the very base purpose of seeing the content of ClassModel or individual MethodModel in human-readable form (as well as machine-readable). Yes, agreed. The implementation traverses the models and prints formatted output using custom internal templates. The monolithic printer code gives me a chance to reflect all changes in one single class and many changes are instantly handled during refactoring. Additional API/SPI layers would make it very complex, painful for maintenance and a nightmare for testing. I?m not sure about this. Yes, more knobs means more testing, but having simpler, well-defined layers also simplifies testing. One thing that concerns me having ?seen this movie before? is that highly complected APIs tend to have a high rate of enhancement requests, because the user can?t control things that they want to control. So while a monolithic thing is easier to design the API for initially, over the long term, the cost of maintenance is higher. What you propose is to transform actual models (ClassModel, MethodModel?) into a kind of ?printable models? and then implement various transformers of these printable models to provide formatted output. 
Yes, where those "printable models" are probably a standard data structure like Map, maybe involving a few custom record types along the way. I'm not at all worried that this process will involve more rounds of copying data; since printing is the ultimate destination (probably on its way for a human to debug it), creating a few thousand extra objects is in the noise.

I think that another "printable" SPI layer is a bit of overkill, as anyone can already implement a consumer of the actual Classfile API models to print whatever is needed.

Right, we could get away without a printing facility at all. To me, the value-add here is that someone has gone through the work, once, to switch over all the kinds of elements that can appear in a XxxModel and turn them into something simpler. The code to turn the "abstract model" into XML or JSON or whatever is probably trivial, but the code to turn the real models into abstract printable ones is a lot of kind-of-annoying case analysis, and I think this is where we will save the user time and frustration.

Let me propose an experiment: take the existing ClassPrinter implementation, still monolithic, and try refactoring to use internal "printable models", where the keys in the output (e.g., "class name") are derived from the keys in the printable model, rather than hard-coded format strings, and see how we like that? That should be a small step, and if we like it, we can take another step. If it turns out that is nastier to do than I am guessing, we can back off and think of another approach.

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From adam.sotona at oracle.com Thu Jul 21 16:28:08 2022 From: adam.sotona at oracle.com (Adam Sotona) Date: Thu, 21 Jul 2022 16:28:08 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> Message-ID: From: Brian Goetz Date: Thursday, 21 July 2022 18:09 To: Adam Sotona Cc: classfile-api-dev at openjdk.org Subject: Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel Actual ClassPrinter is very monolithic by intention. It has been written for the very base purpose of seeing the content of ClassModel or individual MethodModel in human-readable form (as well as machine-readable). Yes, agreed. The implementation traverses the models and prints formatted output using custom internal templates. The monolithic printer code gives me a chance to reflect all changes in one single class and many changes are instantly handled during refactoring. Additional API/SPI layers would make it very complex, painful for maintenance and a nightmare for testing. I?m not sure about this. Yes, more knobs means more testing, but having simpler, well-defined layers also simplifies testing. One thing that concerns me having ?seen this movie before? is that highly complected APIs tend to have a high rate of enhancement requests, because the user can?t control things that they want to control. So while a monolithic thing is easier to design the API for initially, over the long term, the cost of maintenance is higher. What you propose is to transform actual models (ClassModel, MethodModel?) into a kind of ?printable models? and then implement various transformers of these printable models to provide formatted output. Yes, where those ?printable models? 
are probably a standard data structure like Map, maybe involving a few custom record types along the way. I?m not at all worried that this process will involve more rounds of copying data; since printing is the ultimate destination (probably on its way for a human to debug it), creating a few thousand extra objects is in the noise. I think that another ?printable? SPI layer is a bit of overkill, as anyone can already implement consumer of the actual Classfile API models to print whatever is needed. Right, we could get away without a printing facility at all. To me, the value-add here is that someone has gone through the work ? once ? to switch over all the kinds of elements that can appear in a XxxModel and turn them into something simpler. The code to turn the ?abstract model? into XML or JSON or whatever is probably trivial, but the code to turn the real models into abstract printable ones is a lot of kind-of-annoying case analysis, and I think this is where we will save the user time and frustration. Let me propose an experiment: take the existing ClassPrinter implementation ? still monolithic ? and try refactoring to use internal ?printable models?, where the keys in the output (e.g., ?class name?) are derived from the keys in the printable model, rather than hard-coded format strings, and see how we like that? That should be a small step, and if we like it, we can take another step. If it turns out that is nastier to do that I am guessing, we can back off an think of another approach. I?m not quite how it would look like. 
This is one fragment of the actual templates in one format: new Block(",%n \"module\": {%n \"name\": \"%s\",%n \"flags\": %s,%n \"version\": \"%s\",%n \"uses\": %s", " }"), new Table(",%n \"requires\": [", "]", "%n { \"name\": \"%s\", \"flags\": %s, \"version\": \"%s\" }"), new Table(",%n \"exports\": [", "]", "%n { \"package\": \"%s\", \"flags\": %s, \"to\": %s }"), new Table(",%n \"opens\": [", "]", "%n { \"package\": \"%s\", \"flags\": %s, \"to\": %s }"), new Table(",%n \"provides\": [", "]", "%n { \"class\": \"%s\", \"with\": %s }"), While applied are following way: case ModuleAttribute ma -> { out.accept(template.module.header.formatted(ma.moduleName().name().stringValue(), quoteFlags(ma.moduleFlags()), ma.moduleVersion().map(Utf8Entry::stringValue).orElse(""), typesToString(ma.uses().stream().map(ce -> ce.asInternalName())))); printTable(template.requires, ma.requires(), req -> new Object[] {req.requires().name().stringValue(), quoteFlags(req.requiresFlags()), req.requiresVersion().map(Utf8Entry::stringValue).orElse(null)}); printTable(template.exports, ma.exports(), exp -> new Object[] {exp.exportedPackage().name().stringValue(), quoteFlags(exp.exportsFlags()), typesToString(exp.exportsTo().stream().map(me -> me.name().stringValue()))}); printTable(template.opens, ma.opens(), open -> new Object[] {open.openedPackage().name().stringValue(), quoteFlags(open.opensFlags()), typesToString(open.opensTo().stream().map(me -> me.name().stringValue()))}); printTable(template.provides, ma.provides(), provide -> new Object[] {provide.provides().asInternalName(), typesToString(provide.providesWith().stream().map(me -> me.asInternalName()))}); out.accept(template.module.footer.formatted()); } Each individual parameter of each template has its position (the same position in each format) and format-specific escaping methods are frequently (and individually based on context) called. 
How do you suggest to pass it through generic key-value maps, when String format is index-based? -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Jul 21 17:23:23 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Jul 2022 17:23:23 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> Message-ID: <856D314D-9237-40F3-8EF7-53CC73286B11@oracle.com> Let me propose an experiment: take the existing ClassPrinter implementation ? still monolithic ? and try refactoring to use internal ?printable models?, where the keys in the output (e.g., ?class name?) are derived from the keys in the printable model, rather than hard-coded format strings, and see how we like that? That should be a small step, and if we like it, we can take another step. If it turns out that is nastier to do that I am guessing, we can back off an think of another approach. I?m not quite how it would look like. This is one fragment of the actual templates in one format: new Block(",%n \"module\": {%n \"name\": \"%s\",%n \"flags\": %s,%n \"version\": \"%s\",%n \"uses\": %s", " }"), Right. So first, the words like ?module? and ?name? can be derived from the keys in the Map. We have latitude here, but let?s say for example that the key is ?class name? and in JSON this is rendered as ?class name? and in XML as ??. These are both mechanical translations. That gets all of the text out of the format, and carries the name in the Map instead. Note also that all of the format specifiers are %s, since we?ve already converted the leaves of the Map to String. So much of these format strings can be generated from the data itself. Second, in the code you have, there is an implicit alignment between the order of the arguments to ?formatted? 
and the order of the format specifiers, as is natural for formatters. But we can break this, by using a LinkedHashMap (which preserves order) for the keys to ensure they come out in the right order. Third, the block/table structure (with its prefix and suffix) can be derived from the shape of the Map. Instead of calling printTable, the key ?requires? maps to a LIST of { ?name? -> name, ?flags? -> flags, ?version? -> version }. This alone is enough to trigger generation of an object whose key is ?requires?, and whose corresponding value is an array of objects with name, flags, and version keys (and similar for other formats.). new Table(",%n \"requires\": [", "]", "%n { \"name\": \"%s\", \"flags\": %s, \"version\": \"%s\" }"), new Table(",%n \"exports\": [", "]", "%n { \"package\": \"%s\", \"flags\": %s, \"to\": %s }"), new Table(",%n \"opens\": [", "]", "%n { \"package\": \"%s\", \"flags\": %s, \"to\": %s }"), new Table(",%n \"provides\": [", "]", "%n { \"class\": \"%s\", \"with\": %s }"), While applied are following way: case ModuleAttribute ma -> { out.accept(template.module.header.formatted(ma.moduleName().name().stringValue(), quoteFlags(ma.moduleFlags()), ma.moduleVersion().map(Utf8Entry::stringValue).orElse(""), typesToString(ma.uses().stream().map(ce -> ce.asInternalName())))); printTable(template.requires, ma.requires(), req -> new Object[] {req.requires().name().stringValue(), quoteFlags(req.requiresFlags()), req.requiresVersion().map(Utf8Entry::stringValue).orElse(null)}); printTable(template.exports, ma.exports(), exp -> new Object[] {exp.exportedPackage().name().stringValue(), quoteFlags(exp.exportsFlags()), typesToString(exp.exportsTo().stream().map(me -> me.name().stringValue()))}); printTable(template.opens, ma.opens(), open -> new Object[] {open.openedPackage().name().stringValue(), quoteFlags(open.opensFlags()), typesToString(open.opensTo().stream().map(me -> me.name().stringValue()))}); printTable(template.provides, ma.provides(), provide -> new 
Object[] {provide.provides().asInternalName(), typesToString(provide.providesWith().stream().map(me -> me.asInternalName()))}); out.accept(template.module.footer.formatted()); } Each individual parameter of each template has its position (the same position in each format) and format-specific escaping methods are frequently (and individually based on context) called. How do you suggest to pass it through generic key-value maps, when String format is index-based? Control the order of keys in a Map with LHM; then just spool out the key-value pairs in order. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Thu Jul 21 17:31:43 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 21 Jul 2022 17:31:43 +0000 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: <856D314D-9237-40F3-8EF7-53CC73286B11@oracle.com> References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> <856D314D-9237-40F3-8EF7-53CC73286B11@oracle.com> Message-ID: <82682B3C-0912-4347-BDCA-40BF83D07CF1@oracle.com> To be more explicit: all the format strings go away, replaced by some mechanical logic about ?lists are rendered in [ ? ], maps are rendered in { ? }?. 
And the code like: case ModuleAttribute ma -> { out.accept(template.module.header.formatted(ma.moduleName().name().stringValue(), quoteFlags(ma.moduleFlags()), ma.moduleVersion().map(Utf8Entry::stringValue).orElse(""), typesToString(ma.uses().stream().map(ce -> ce.asInternalName())))); printTable(template.requires, ma.requires(), req -> new Object[] {req.requires().name().stringValue(), quoteFlags(req.requiresFlags()), req.requiresVersion().map(Utf8Entry::stringValue).orElse(null)}); printTable(template.exports, ma.exports(), exp -> new Object[] {exp.exportedPackage().name().stringValue(), quoteFlags(exp.exportsFlags()), typesToString(exp.exportsTo().stream().map(me -> me.name().stringValue()))}); printTable(template.opens, ma.opens(), open -> new Object[] {open.openedPackage().name().stringValue(), quoteFlags(open.opensFlags()), typesToString(open.opensTo().stream().map(me -> me.name().stringValue()))}); printTable(template.provides, ma.provides(), provide -> new Object[] {provide.provides().asInternalName(), typesToString(provide.providesWith().stream().map(me -> me.asInternalName()))}); out.accept(template.module.footer.formatted()); } Becomes adding the following to the map: { moduleName -> ma.moduleName().name().stringValue(), ?, requires -> ma.requires().stream().map(?).toList(), ? } -------------- next part -------------- An HTML attachment was scrubbed... 
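As a rough illustration of the point about index-based format strings versus ordered maps, the sketch below renders the same "requires" entry both ways. It is a simplification under stated assumptions: OrderedKeysSketch and its methods are hypothetical names, all values are treated as quoted strings, the sample values are made up, and no escaping is done.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; keys and values are made up for illustration.
final class OrderedKeysSketch {

    /** Index-based style: the argument order must match the order of the %s specifiers. */
    static String viaFormat(String name, String flags, String version) {
        return "{\"name\": \"%s\", \"flags\": \"%s\", \"version\": \"%s\"}"
                .formatted(name, flags, version);
    }

    /** Map-based style: the keys carry the labels and the LinkedHashMap carries the order. */
    static String viaOrderedMap(Map<String, String> entries) {
        return entries.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": \"" + e.getValue() + "\"")
                .collect(Collectors.joining(", ", "{", "}"));
    }

    /** One "requires" entry; insertion order is the output order. */
    static Map<String, String> sample() {
        Map<String, String> requires = new LinkedHashMap<>();
        requires.put("name", "java.base");
        requires.put("flags", "MANDATED");
        requires.put("version", "17");
        return requires;
    }

    public static void main(String[] args) {
        System.out.println(viaFormat("java.base", "MANDATED", "17"));
        System.out.println(viaOrderedMap(sample()));
    }
}
```

Both calls produce the same text, but in the second the format string has disappeared: the labels and their order live in the data.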
URL:

From brian.goetz at oracle.com Fri Jul 22 17:44:29 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 22 Jul 2022 17:44:29 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To: References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> <856D314D-9237-40F3-8EF7-53CC73286B11@oracle.com> <57280290-63A6-473E-8879-62C51B097838@oracle.com> Message-ID:

I've significantly progressed in the prototype as you propose to process ClassPrinter (to use Maps and Lists) and I'm experimenting with a minimal set of formatting features.

Cool, let's see where it leads. I wasn't saying that the map-of-maps is the only way to get there, as much as it seemed a viable way to get to a cleanly factored and flexible result. So let's see where it goes, and if it runs off the road, we'll try something else.

We can drop many features and it still looks OK, except for expanded vs compact lists and maps. A compact list can be represented as an array; however, I'm not aware of anything similar for Map. Or we can use an explicit delimiter class Compact, where everything inside is treated as compact form. Or we can avoid compact maps and use Maps and Lists for the expanded form and arrays for compact lists.

I am not sure what you mean by "compact lists and maps"? Can you give an example so I see what you are aiming at?

Who is the audience for the JSON/XML/YAML-formatted results? Is someone going to parse it with Jackson and do analysis on it? If human-readability is important, why is the javap formatting not sufficient?

The audience for JSON/XML/YAML is all test writers validating anything in the generated code. Yes, people prefer to parse structured text in a specific format over full-text or regexp searches across unstructured log output.
OK, so the idea here is that we think it is easier to grovel over the result with (say) XPath to make assertions, than to traverse the CM directly? Perhaps we need to provide more navigation operators on the models to make it easier to make such assertions?

I would propose to do it the other way:
- clean and refactor ClassPrinter
- integrate ClassPrinter with the Classfile API (also for internal debugging purposes)
- extend javap with options to produce JSON/XML/YAML output as a thin layer delegating to the Classfile API
- extend javap with a verification option as a thin layer delegating to the Classfile API (however that is a different topic :)

Yes, either way I think more of javap moves into ClassFile API, which is good. Let's see where this leads.

From adam.sotona at oracle.com Fri Jul 22 20:50:43 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Fri, 22 Jul 2022 20:50:43 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To: References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> <856D314D-9237-40F3-8EF7-53CC73286B11@oracle.com> <57280290-63A6-473E-8879-62C51B097838@oracle.com> Message-ID:

We can drop many features and it still looks OK, except for expanded vs compact lists and maps. A compact list can be represented as an array; however, I'm not aware of anything similar for Map. Or we can use an explicit delimiter class Compact, where everything inside is treated as compact form. Or we can avoid compact maps and use Maps and Lists for the expanded form and arrays for compact lists.

I am not sure what you mean by "compact lists and maps"? Can you give an example so I see what you are aiming at?

Expanded is everything spread across multiple lines. Expanded lists in the example below are the top-level list of classes and the lists of fields and methods.
Expanded maps are: content of class, constant pool, content of field, content of method, code and stack map frames.

Compact means all in one line. Compact lists are everything in [ ] and compact maps everything in { }.

If you traverse the tree from the root you can never return from compact to expanded, so an indicator (a list or map attribute, or another form of delimiter) should indicate where to start rendering the compact form of lists and maps. The same applies to XML and JSON if we would like to have them human-readable.

  - class name: 'Foo'
    version: '61.0'
    flags: [PUBLIC]
    superclass: 'Boo'
    interfaces: ['Phee', 'Phoo']
    attributes: [SourceFile]
    constant pool:
        1: [CONSTANT_Utf8, 'Foo']
        2: [CONSTANT_Class, {name index: 1, name: 'Foo'}]
        3: [CONSTANT_Utf8, 'Boo']
        4: [CONSTANT_Class, {name index: 3, name: 'Boo'}]
        5: [CONSTANT_Utf8, 'f']
        6: [CONSTANT_Utf8, 'Ljava/lang/String;']
        7: [CONSTANT_Utf8, 'm']
        8: [CONSTANT_Utf8, '(ZLjava/lang/Throwable;)Ljava/lang/Void;']
        9: [CONSTANT_Utf8, 'Phee']
        10: [CONSTANT_Class, {name index: 9, name: 'Phee'}]
        11: [CONSTANT_Utf8, 'Phoo']
        12: [CONSTANT_Class, {name index: 11, name: 'Phoo'}]
        13: [CONSTANT_Utf8, 'Code']
        14: [CONSTANT_Utf8, 'StackMapTable']
        15: [CONSTANT_Utf8, 'SourceFile']
        16: [CONSTANT_Utf8, 'Foo.java']
    source: 'Foo.java'
    fields:
      - field name: 'f'
        flags: [PRIVATE]
        descriptor: 'Ljava/lang/String;'
        attributes: []
    methods:
      - method name: 'm'
        flags: [PROTECTED]
        descriptor: '(ZLjava/lang/Throwable;)Ljava/lang/Void;'
        attributes: [Code]
        code:
            max stack: 1
            max locals: 3
            attributes: [StackMapTable]
            stack map frames:
                6: {locals: ['Foo', 'int', 'java/lang/Throwable'], stack: []}
            #stack map frame locals: ['Foo', 'int', 'java/lang/Throwable'], stack: []
            0: [ILOAD_1, {slot: 1}]
            1: [IFEQ, {target: 6}]
            4: [ALOAD_2, {slot: 2}]
            5: [ATHROW]
            #stack map frame locals: ['Foo', 'int', 'java/lang/Throwable'], stack: []
            6: [RETURN]

-------------- next part -------------- An HTML attachment was scrubbed...
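The expanded-versus-compact rule described in this message (once rendering switches to compact, everything nested stays on one line) can be sketched as follows. This is an illustration only: Compact is modelled on the "explicit delimiter class" floated in the thread, but its shape here, like CompactExpandedSketch, render and sample, is a hypothetical choice; for brevity the sketch supports expanded maps only and always renders lists in compact form.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: expanded maps only, lists are always rendered compact.
final class CompactExpandedSketch {

    /** Hypothetical marker: everything inside is rendered on a single line. */
    record Compact(Object value) {}

    static String render(Map<?, ?> root) {
        StringBuilder out = new StringBuilder();
        expanded(root, 0, out);
        return out.toString();
    }

    /** Expanded form: one key per line, nested maps indented by two spaces. */
    private static void expanded(Map<?, ?> map, int indent, StringBuilder out) {
        for (Map.Entry<?, ?> e : map.entrySet()) {
            out.append(" ".repeat(indent)).append(e.getKey()).append(":");
            if (e.getValue() instanceof Map<?, ?> nested) {
                out.append("\n");
                expanded(nested, indent + 2, out);
            } else {
                out.append(" ").append(compact(e.getValue())).append("\n");
            }
        }
    }

    /** Compact form: there is no way back to expanded rendering from here. */
    private static String compact(Object v) {
        if (v instanceof Compact c) {
            return compact(c.value());
        }
        if (v instanceof Map<?, ?> m) {
            return m.entrySet().stream()
                    .map(e -> e.getKey() + ": " + compact(e.getValue()))
                    .collect(Collectors.joining(", ", "{", "}"));
        }
        if (v instanceof List<?> l) {
            return l.stream()
                    .map(CompactExpandedSketch::compact)
                    .collect(Collectors.joining(", ", "[", "]"));
        }
        if (v instanceof String s) {
            return "'" + s + "'";
        }
        return String.valueOf(v);
    }

    /** A small tree loosely following the method/code part of the example above. */
    static Map<String, Object> sample() {
        Map<String, Object> frame = new LinkedHashMap<>();
        frame.put("locals", List.of("Foo", "int"));
        frame.put("stack", List.of());
        Map<String, Object> code = new LinkedHashMap<>();
        code.put("max stack", 1);
        code.put("6", new Compact(frame));                      // a map forced onto one line
        code.put("0", List.of("ILOAD_1", Map.of("slot", 1)));
        Map<String, Object> method = new LinkedHashMap<>();
        method.put("method name", "m");
        method.put("code", code);
        return method;
    }

    public static void main(String[] args) {
        System.out.print(render(sample()));
    }
}
```

The one-way switch is encoded structurally: expanded() may call compact(), but compact() never calls expanded().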
URL:

From michael.van.acken at gmail.com Sat Jul 23 08:49:35 2022
From: michael.van.acken at gmail.com (Michael van Acken)
Date: Sat, 23 Jul 2022 10:49:35 +0200
Subject: VerifyError: Frame offset out of bytecode range
Message-ID:

(Btw, kudos to whoever took Zelazny's Amber novels as an inspiration for a project name. Learning about this triggered a massive trip down memory lane for me.)

Note: This post is also a kind of addendum to the printing discussion on the other thread.

I have StackMapGenerator throwing "VerifyError: Frame offset out of bytecode range", and this may be caused by an interaction with the recently added ifThenElse().

Input is this unit test:

    (is (= '[["LOCAL 0: int i"]
             (ILOAD_0) (ICONST_1) (IADD) (ISTORE_1)
             ["LOCAL 1: int __temp"]
             (ILOAD_0) (ICONST_2) (IADD) (ISTORE_2)
             ["LOCAL 2: int __temp"]
             (ILOAD_0) (ICONST_3) (IADD) (ISTORE_3)
             ["LOCAL 3: int __temp"]
             (ILOAD_1) (ILOAD_2) (IF_ICMPGE L:1)
             (ILOAD_2) (ILOAD_3) (IF_ICMPGE L:0)
             (ICONST_1) (IRETURN)
             [L:0] (ICONST_0) (IRETURN)
             [L:1] (ICONST_0) (IRETURN)]
           (asm-expr [^int i] (< (+ i 1) (+ i 2) (+ i 3)))))

Extending CodeBuilder.java's "with" method with a println like this:

    public CodeBuilder with(CodeElement element) {
        System.out.println("with "+element);
        if (element.toString().contains("OP=GOTO")) {
            new Throwable().printStackTrace();
        }
        ((AbstractElement) element).writeTo(this);
        return this;
    }

I get the output below (after some manual indentation). With the exception of the additional GOTO inserted by ifThenElse() before the final ICONST_0, this seems to match the intended output.

:accept ClassBuilder start
:accept CodeBuilder start
with LocalVariable[Slot=0, name=i, descriptor='I']
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with Load[OP=ILOAD_0, slot=0]
with UnboundIntrinsicConstantInstruction[op=ICONST_1]
with UnboundOperatorInstruction[op=IADD]
with Store[OP=ISTORE_1, slot=1]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with LocalVariable[Slot=1, name=__temp, descriptor='I']
with Load[OP=ILOAD_0, slot=0]
with UnboundIntrinsicConstantInstruction[op=ICONST_2]
with UnboundOperatorInstruction[op=IADD]
with Store[OP=ISTORE_2, slot=2]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with LocalVariable[Slot=2, name=__temp, descriptor='I']
with Load[OP=ILOAD_0, slot=0]
with UnboundIntrinsicConstantInstruction[op=ICONST_3]
with UnboundOperatorInstruction[op=IADD]
with Store[OP=ISTORE_3, slot=3]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with LocalVariable[Slot=3, name=__temp, descriptor='I']
with Load[OP=ILOAD_1, slot=1]
with Load[OP=ILOAD_2, slot=2]
with Branch[OP=IF_ICMPGE]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
:accept CodeBuilder start
with Load[OP=ILOAD_2, slot=2]
with Load[OP=ILOAD_3, slot=3]
with Branch[OP=IF_ICMPGE]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
:accept CodeBuilder start
with UnboundIntrinsicConstantInstruction[op=ICONST_1]
with Return[OP=IRETURN]
:accept CodeBuilder end
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
:accept CodeBuilder start
with UnboundIntrinsicConstantInstruction[op=ICONST_0]
with Return[OP=IRETURN]
:accept CodeBuilder end
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
:accept CodeBuilder end
with Branch[OP=GOTO]
java.lang.Throwable
    at java.base/jdk.classfile.impl.DirectCodeBuilder.with(DirectCodeBuilder.java:143)
    at java.base/jdk.classfile.impl.DirectCodeBuilder.with(DirectCodeBuilder.java:75)
    at java.base/jdk.classfile.impl.BlockCodeBuilderImpl.with(BlockCodeBuilderImpl.java:90)
    at java.base/jdk.classfile.impl.BlockCodeBuilderImpl.with(BlockCodeBuilderImpl.java:39)
    at java.base/jdk.classfile.CodeBuilder.branchInstruction(CodeBuilder.java:383)
    at java.base/jdk.classfile.CodeBuilder.ifThenElse(CodeBuilder.java:288)
    at tcljc.emitter.bytecode.__ns100.expr-insns$split-join-insn~4(bytecode.cljt:638)
    [...]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
:accept CodeBuilder start
with UnboundIntrinsicConstantInstruction[op=ICONST_0]
with Return[OP=IRETURN]
:accept CodeBuilder end
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
with Label[context=CodeBuilder[id=1458540918], contextInfo=-1]
:accept CodeBuilder end
[...]
:accept ClassBuilder end
Generating stack maps for class: __ns100 method: fnbody~1 with signature: MethodTypeDesc[(int)boolean]
ERROR at tcljc.switch-test/int-cmp-test:45
[...]
java.lang.VerifyError: Frame offset out of bytecode range at fnbody~1
    at java.base/jdk.classfile.impl.StackMapGenerator.generatorError(StackMapGenerator.java:865)
    at java.base/jdk.classfile.impl.StackMapGenerator$2.set(StackMapGenerator.java:878)
    at java.base/jdk.classfile.impl.StackMapGenerator.detectFrameOffsets(StackMapGenerator.java:893)
    at java.base/jdk.classfile.impl.StackMapGenerator.generate(StackMapGenerator.java:308)
    at java.base/jdk.classfile.impl.StackMapGenerator.<init>(StackMapGenerator.java:248)
    at java.base/jdk.classfile.impl.DirectCodeBuilder$4.writeBody(DirectCodeBuilder.java:298)
    at java.base/jdk.classfile.impl.UnboundAttribute$AdHocAttribute.writeTo(UnboundAttribute.java:931)
    at java.base/jdk.classfile.impl.AttributeHolder.writeTo(AttributeHolder.java:60)
    at java.base/jdk.classfile.impl.DirectMethodBuilder.writeTo(DirectMethodBuilder.java:137)
    at java.base/jdk.classfile.impl.BufWriterImpl.writeList(BufWriterImpl.java:197)
    at java.base/jdk.classfile.impl.DirectClassBuilder.build(DirectClassBuilder.java:177)
    at java.base/jdk.classfile.Classfile.build(Classfile.java:216)
    at java.base/jdk.classfile.Classfile.build(Classfile.java:198)
    at java.base/jdk.classfile.Classfile.build(Classfile.java:184)
    at tcljc.emitter.__ns100.build-segment~1(emitter.cljt:190)
    [...]

-- mva

From adam.sotona at oracle.com Mon Jul 25 16:58:11 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Mon, 25 Jul 2022 16:58:11 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> <856D314D-9237-40F3-8EF7-53CC73286B11@oracle.com> <57280290-63A6-473E-8879-62C51B097838@oracle.com>
Message-ID:

During the experiments I found that a generic map-of-maps is too loose and lacks any formatting information.
The API below has been derived from the requirements of the actual ClassPrinter and made as a minimal set of formatting features unified for producing JSON, YAML and XML.

The API also tries to avoid problematic combinations (for example, a list of lists is not possible, but a list of maps of lists works perfectly).

Each Classfile API model can provide its printable form in the future.

Printer implementations are generic and very simple (~80 lines of code each, mainly one big switch expression), and custom printers can be implemented.

Please let me know your comments before I start rewriting the ClassPrinter into this intermediate form.

Thanks,
Adam

BTW: this is a re-send of a too-big email with code and examples, now referencing gist.github.com instead of attachments.

Here is a small API to specify printable fragments and an example of how a sample class file could compose its printable form:
https://gist.github.com/asotona/cfc559c2c82e48d27551e32d2ef89474

And here is the YAML print of the example above:
https://gist.github.com/asotona/dc62246e0ea2922a3f6b01018b02679a

XML print:
https://gist.github.com/asotona/3b731e820f98093731353f4826cd34ce

And JSON print:
https://gist.github.com/asotona/a1d0300d4ed68773d2d502db25892ade

From adam.sotona at oracle.com Mon Jul 25 17:03:12 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Mon, 25 Jul 2022 17:03:12 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To: <071e3754-17a8-f6a0-560c-54be6d17f36d@oracle.com>
References: <5D49653A-407E-4685-BA8F-8465DC6FE31C@oracle.com> <19814815-4482-4CB6-B3A3-BFEB156B03B1@oracle.com> <856D314D-9237-40F3-8EF7-53CC73286B11@oracle.com> <57280290-63A6-473E-8879-62C51B097838@oracle.com> <071e3754-17a8-f6a0-560c-54be6d17f36d@oracle.com>
Message-ID:

Unfortunately it is format-specific. We can unify it down to "all-quoted",
however the visual benefits will be lost.

On 25.07.2022 19:00, "Brian Goetz" wrote:

> This seems like nice progress. I wonder if we can boil it down further? Can we, for example, get rid of Quoted by Plain(key, "\"value\""), and such?

On 7/25/2022 12:40 PM, Adam Sotona wrote:

> Here is an example of small API to specify printable fragments:
>
> public sealed interface Printable {
>     public String key();
>     public sealed interface Fragment extends Printable {}
>     public record Plain(String key, String value) implements Fragment {}
>     public record Quoted(String key, String value) implements Fragment {}
>     public record Decimal(String key, int value) implements Fragment {}
>     public record PlainList(String key, List values) implements Fragment {}
>     public record QuotedList(String key, List values) implements Fragment {}
>     public record Mapping(String key, List fragments) implements Printable {}
>     public record BlockMapping(String key, List printables) implements Printable {}
>     public record BlockList(String key, List blockMappings) implements Printable {}
>     public record Comment(String key) implements Printable {}
> }

From adam.sotona at oracle.com Mon Jul 25 17:20:02 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Mon, 25 Jul 2022 17:20:02 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
Message-ID:

Or maybe we could let the printer auto-quote as needed by the actual value in the context of the actual format?

So yes, it can be reduced:

    public sealed interface Printable {
        public String key();
        public sealed interface Fragment extends Printable {}
        public record Value(String key, ConstantDesc value) implements Fragment {}
        public record Quoted(String key, String value) implements Fragment {}
        public record Decimal(String key, int value) implements Fragment {}
        public record ValueList(String key, List values) implements Fragment {}
        public record QuotedList(String key, List values) implements Fragment {}
        public record Mapping(String key, List fragments) implements Printable {}
        public record BlockMapping(String key, List printables) implements Printable {}
        public record BlockList(String key, List blockMappings) implements Printable {}
        public record Comment(String key) implements Printable {}
    }

From brian.goetz at oracle.com Mon Jul 25 17:28:29 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 25 Jul 2022 13:28:29 -0400
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References:
Message-ID:

That's a good move. Can we push further? We have types here for mapping a key to a:

 - list of simple values
 - list of arbitrary fragments
 - list of printables
 - list of list of printable (blocklist)

On 7/25/2022 1:20 PM, Adam Sotona wrote:
> Or maybe we could let the printer to auto-quote as needed by the actual value in the context of the actual format ?

From adam.sotona at oracle.com Mon Jul 25 17:51:11 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Mon, 25 Jul 2022 17:51:11 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References:
Message-ID:

Further reduction will require additional info about formatting. The current list allows very simple printer implementations:

    switch (node) {
        case Value v ->
        case ValueList vl ->
        case Mapping m ->
        case BlockList bl ->
        case BlockMapping bm ->
        case Comment c ->
    }

Replacing all simple values with lists of values will wrap everything into square brackets and significantly reduce readability. Alternatively, dynamic detection of a single value in a list will destroy any schema for parsing.

The visual difference between a rendered Mapping and a BlockMapping is significant in all three formats.

The key here means a mapping key in YAML, an object key in JSON and an element or attribute name in XML, so it defines a key in the schemas.

BlockMapping renders as a multi-line mapping in YAML, as a multi-line object in JSON and as multi-line nested elements in XML, while Mapping renders as a single-line flow in YAML, as a single-line object in JSON and as a single element with attributes only in XML.

Dynamic detection by content will make the print very unstable and unfriendly for reading (for example, one method will be collapsed while another will expand).

I don't see much space for reduction without losing visual features.

On 25.07.2022 19:28, "Brian Goetz" wrote:

That's a good move. Can we push further?
We have types here for mapping a key to a:

 - list of simple values
 - list of arbitrary fragments
 - list of printables
 - list of list of printable (blocklist)

From brian.goetz at oracle.com Mon Jul 25 17:59:42 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 25 Jul 2022 13:59:42 -0400
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References:
Message-ID:

If "block" means multi-line, and non-block means single-line, I'm confused as to why we have

    public record Mapping(String key, List fragments) implements Printable {}

but

    public record BlockMapping(String key, List printables) implements Printable {}

where one has fragments and the other only has Printables. Similarly, we have

    public record ValueList(String key, List values) implements Fragment {}
    public record BlockList(String key, List blockMappings) implements Printable {}

where again the List element differs. Is there no way to make the block/non-block orthogonal to the payload type? That would allow us to replace the two BlockXxx(...) with different payloads, with a Block wrapper as a formatting hint.

From brian.goetz at oracle.com Mon Jul 25 18:30:47 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 25 Jul 2022 14:30:47 -0400
Subject: POC: JDK ClassModel -> ASM ClassReader
In-Reply-To:
References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com>
Message-ID: <187a008a-bac6-c733-0c96-5da492b56331@oracle.com>

Going through some old messages after vacation. Not sure if I responded to any of these or not, so forgive me if I've said this already (or something contradictory.)

> Thanks for merging my patches, the class reader API works more or less equivalent to the ASM version. The one thing I have not working so far are attributes that the JDK knows but ASM does not.
> I was trying to implement the "contents" method upwards, but this is difficult with the unbound attributes as they normally require a BufWriter which in turn needs a ConstantPoolBuilder. Somehow, I need to pipe the constant resolution to ASM which cannot be done without unsealing some interfaces. I will try to prototype a solution here, too, but I wanted to get the writer working first.

This raises a more general question, which deserves a little bit of discussion. For nearly every entity, there is a bound and unbound implementation. The bound implementation is tied to a segment of a byte[] for a complete classfile; the unbound implementation can be considered a "deferred" implementation, and is disconnected from a constant pool. The deferred implementation will eventually get written to a BufWriter which is bound to a constant pool.

Similarly, when building, there are "direct" builders (who are accumulating various byte[] to write to the classfile) and "buffered" builders (which accumulate lists of elements.)

There are some questions about entities that can only be answered by "bound" entities or "direct" builders, such as "what's the current BCI" or "give me the bytes of this attribute." We have several choices here, none great:

 - Only expose APIs that can be satisfied by both implementations. This means you don't get to ask questions like "what's the current BCI". Seems unfortunate.

 - Expose explicitly partial APIs, such as accessors returning Optional. This makes it clear that you might not get an answer, but may annoy users as it is not always obvious that the question is partial (buffered builders, used when chaining transforms, are a non-obvious concept.)

 - Expose implicitly partial APIs, such as throwing when you ask a question in the wrong mode.

We've been steering towards the third, on the theory/hope that it will only be natural to ask the question in the cases when we can actually answer.
This theory has been working out well so far, though this could simply be lack of imagination on our part.

Under the third option, we'd move the contents() method to the base BoundAttribute class, and have the unbound classes throw "sorry, I can't answer that right now." This is arguably reasonable because you are much more likely to ask for the bytes of an attribute you've read from a classfile, than one you've just yourself asked to have written to a classfile. Where this might fall down is deep in a transform chain, where an earlier stage might replace a bound attribute with an unbound one (say, dropping an Exception from the MethodExceptions attribute.)

> For the type annotations on instructions: Would it be an option to add "getVisibleAnnotations" and "getInvisibleAnnotations" to the relevant CodeElement types? This way, I could for example query the "ExceptionCatch" value for its annotations.

This is a good example of something that would only work on bound instructions, because the unbound ones are "stateless".

> StackMapFrames could on the other hand just be added at their position in the CodeElement iteration to receive them where they become relevant. This way, one would not need to keep track of the current offset. This would also allow for an easier write model where ASM does not allow you to know the offset of a stack map. I assume that the current model is very much modeled after the needs of the javap tool. Ideally, the frame objects would be reduced to the information that is contained in a class file and the consumer could track implicit information such as the "effective" stack and locals.

If we dispense stack map frames as part of the CodeElement iteration, users will assume that they can make their own stack map frames and send them downstream to the builder too (the whole model of transforms is based on flatmapping the stream of elements.) It seems more likely that this will cause problems than will help.
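
The third option from the list earlier in this message (implicitly partial APIs) can be illustrated with a minimal Java sketch. The types `Attr`, `BoundAttr` and `UnboundAttr` are hypothetical stand-ins invented for this illustration, not the jdk.classfile classes; the only idea taken from the message is that a bound entity can hand out its bytes while an unbound one throws when asked.

```java
import java.util.Arrays;

// Hypothetical stand-in for an attribute; not the jdk.classfile API.
interface Attr {
    String name();

    // Implicitly partial: only bound attributes can answer this question.
    byte[] contents();
}

// "Bound": backed by a segment of the byte[] of a complete classfile.
final class BoundAttr implements Attr {
    private final String name;
    private final byte[] classfile;
    private final int offset;
    private final int length;

    BoundAttr(String name, byte[] classfile, int offset, int length) {
        this.name = name;
        this.classfile = classfile;
        this.offset = offset;
        this.length = length;
    }

    @Override public String name() { return name; }

    @Override public byte[] contents() {
        // Copy the payload so callers cannot modify the underlying classfile bytes.
        return Arrays.copyOfRange(classfile, offset, offset + length);
    }
}

// "Unbound": a deferred description that has not been written anywhere yet.
final class UnboundAttr implements Attr {
    private final String name;

    UnboundAttr(String name) { this.name = name; }

    @Override public String name() { return name; }

    @Override public byte[] contents() {
        // Asking in the wrong mode fails loudly instead of returning a guess.
        throw new UnsupportedOperationException(
                "unbound attribute '" + name + "' has no bytes yet");
    }
}

final class PartialApiDemo {
    public static void main(String[] args) {
        Attr bound = new BoundAttr("Demo", new byte[] {1, 2, 3, 4, 5}, 1, 3);
        System.out.println(Arrays.toString(bound.contents()));

        try {
            new UnboundAttr("Demo").contents();
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot answer: " + e.getMessage());
        }
    }
}
```

The trade-off against the second option is visible in the signature: `contents()` returns a plain `byte[]` rather than an `Optional`, so callers in the common (bound) case are not burdened, at the cost of a runtime failure deep in a transform chain if a bound attribute has been replaced by an unbound one.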

> > Q: in this case, is it enough if only ClassBuilder has this option, or do you need it for MethodBuilder and CodeBuilder as well?

Did you ever answer this question?

From adam.sotona at oracle.com Tue Jul 26 08:19:40 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Tue, 26 Jul 2022 08:19:40 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References:
Message-ID:

We are building structured documents with a focus on the visual aspect, which makes them human-readable. Such documents are very complex trees with zillions of combinations of various branch joints. These six classes (records) represent a careful selection of such joints, so their combination (hopefully) covers all our needs for visually appealing class printing in three major structured text formats. The asymmetry of the joints exactly delimits the permitted combinations. I didn't validate it by test yet, however I expect any possible combination of Value, ValueList, Mapping, BlockMapping, BlockList and Comment will always render into a valid (and nicely styled) document in all three formats.

I've been evaluating another two approaches:

1. Separate data from formatting, so data can be held in a Map-of-Maps, while each mapping will be attributed by a single Enum describing the formatting. However, I found no simple way to attribute a LinkedHashMap (without implementing custom keys, values, entries or the whole Map). Also, a generic Map is not a very specific API; all the code restrictions would have to move to Javadoc. And the implementation of each Printer will become a nightmare.

2. Produce generic data as a Map-of-Maps without any formatting attributes and match it by each printer with a specific "style" descriptor (matching by keys).
However, besides the same problems as in #1, it also ignores the fact that the data must be produced with formatting in mind, or the printers' code will be one big hell and there will be no way to cover the combinations by tests.

I'm not sure how using generics can reduce the actual number of 6 classes (records). The actual switch expression in each printer has exactly 6 cases to handle. When using generics we would need an additional API and a sub-switch to determine the differences.

Theoretically we can make the tree a bit more square and add an Enum parameter to determine the difference between Mapping, BlockMapping, ValueList and BlockList (for example public record Mapping(String key, ShapeEnum shape, List)), however it will more than double the possible combinations of the joints. Each combination must be covered in each printer, or excluded at runtime, documented in Javadoc and provided with a test.

To be more specific about the actual classes:

Mapping accepts only fragments, because it must render as a single line and, for example, in XML it renders as a single element with attributes. It can hold single values or lists of values, where the list is rendered into a single attribute value. It cannot hold another Mapping because we cannot embed an XML element into an attribute (so that is why it implements Printable). Rendering multiple elements on a single line is still a valid XML document, however far from human readability. Mapping can represent a Fragment in JSON and in YAML, however XML throws an axe into that possibility.

BlockMapping is the most powerful (multi-line indenting) joint, able to nest and render correctly another BlockMapping as well as any other Printable or Fragment.

ValueList is restricted to leaf values (Strings, quoted Strings, numbers), because it is simple in all three formats. Construction of a generic List of Fragments will require printers to render tons of other joint combinations, which we simply do not need.

BlockList makes sense only in combination with BlockMappings.
We would like to avoid a BlockList of BlockLists, as multi-level unnamed lists do not make much sense. Also, rendering lists with any other joints is useless. If you have a list of classes, a list of methods, or a list of fields, it does not make sense to put any other Fragment, BlockList or Comment in between them. Theoretically we can replace BlockList with ListOfBlockMaps(String key, String mapsKey, List> printables), however that does not make anything easier nor smaller.
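
The six joint types described in this message, and the "one case per type" printer shape, can be sketched in Java as follows. This is only an illustration under stated assumptions: the archive stripped the generic type parameters, so `List<ConstantDesc>`, `List<Fragment>`, `List<Printable>` and `List<BlockMapping>` are guesses, the records are declared top-level rather than nested, and `YamlLikePrinter` is a hypothetical name whose output is merely YAML-like (it is not the ClassPrinter and is not guaranteed to be valid YAML). An `instanceof` chain is used so the sketch compiles on Java 17; with pattern matching for switch the same dispatch could be written as a single switch.

```java
import java.lang.constant.ConstantDesc;
import java.util.List;
import java.util.stream.Collectors;

// Assumed reconstruction of the six-record hierarchy (generics are guesses).
sealed interface Printable { String key(); }
sealed interface Fragment extends Printable {}
record Value(String key, ConstantDesc value) implements Fragment {}
record ValueList(String key, List<ConstantDesc> values) implements Fragment {}
record Mapping(String key, List<Fragment> fragments) implements Printable {}
record BlockMapping(String key, List<Printable> printables) implements Printable {}
record BlockList(String key, List<BlockMapping> blockMappings) implements Printable {}
record Comment(String key) implements Printable {}

// Hypothetical printer: one branch per joint type.
final class YamlLikePrinter {

    static String print(Printable node) {
        StringBuilder out = new StringBuilder();
        print(node, 0, out);
        return out.toString();
    }

    private static void print(Printable node, int indent, StringBuilder out) {
        String pad = "  ".repeat(indent);
        if (node instanceof Value v) {
            out.append(pad).append(v.key()).append(": ").append(v.value()).append('\n');
        } else if (node instanceof ValueList vl) {
            out.append(pad).append(vl.key()).append(": ").append(vl.values()).append('\n');
        } else if (node instanceof Mapping m) {
            // Mapping holds only fragments, so it always fits on a single line.
            String flow = m.fragments().stream()
                    .map(YamlLikePrinter::flow)
                    .collect(Collectors.joining(", "));
            out.append(pad).append(m.key()).append(": {").append(flow).append("}\n");
        } else if (node instanceof BlockMapping bm) {
            // BlockMapping nests any Printable, one per line, indented.
            out.append(pad).append(bm.key()).append(":\n");
            for (Printable p : bm.printables()) print(p, indent + 1, out);
        } else if (node instanceof BlockList bl) {
            // BlockList holds only BlockMappings.
            out.append(pad).append(bl.key()).append(":\n");
            for (BlockMapping bm : bl.blockMappings()) print(bm, indent + 1, out);
        } else if (node instanceof Comment c) {
            out.append(pad).append("# ").append(c.key()).append('\n');
        }
    }

    // Renders a fragment inline as "key: value".
    private static String flow(Fragment f) {
        if (f instanceof Value v) return v.key() + ": " + v.value();
        ValueList vl = (ValueList) f;
        return vl.key() + ": " + vl.values();
    }

    public static void main(String[] args) {
        Printable cls = new BlockMapping("class", List.of(
                new Value("name", "Foo"),
                new Mapping("version", List.of(new Value("major", 63), new Value("minor", 0))),
                new ValueList("flags", List.of("PUBLIC", "FINAL")),
                new Comment("no methods")));
        System.out.print(print(cls));
    }
}
```

Because the element types of the lists differ per record, an invalid combination such as a Mapping inside a Mapping, or a BlockList of BlockLists, is rejected by the compiler rather than by the printer, which is the asymmetry the message argues for.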

From brian.goetz at oracle.com Tue Jul 26 19:43:03 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 26 Jul 2022 15:43:03 -0400
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References:
Message-ID: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com>

> To be more specific about the actual classes:
>
> Mapping accepts only fragments, because it must render as single line and for example in XML it renders as single element with attributes.

OK, so this unearths a previously unstated requirement: that we be able to turn certain maps into attributes, rather than embedded elements, when converting to XML. Does this requirement correspond to anything in other formats? Do we gain anything by dropping this, and formatting always to elements, and/or using other heuristics to determine when we can get away with this format optimization?

In general, I'd like to unearth more of this sort of requirements. Looking at the current hierarchy, I can't help but feel we've not yet "hit bottom"; it doesn't feel factored into separate concerns yet. But I'm optimistic we can get there.

Here's a naive decomposition, which looks a lot like the JSON spec if you squint:

    Value = SimpleValue(ConstantDesc) | ListValue(List) | MapValue(Map)

Converting to this form is useful separate from formatting -- it provides a basis for query/traversal using simple string keys. But it's not good enough for getting good formatting yet, because you need formatting hints to determine how to lay out / indent, right?

So let's consider how we might add these in as hints, that are more transparent.
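
The naive decomposition above, and the key-based query/traversal it is said to enable, could look roughly like this in Java. This is a sketch under assumptions: the archive shows bare `List` and `Map`, so `List<Value>` and `Map<String, Value>` are guesses at the intended element types, and `ValueQuery.find` is a hypothetical helper that does not appear in the thread.

```java
import java.lang.constant.ConstantDesc;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Value = SimpleValue(ConstantDesc) | ListValue(List) | MapValue(Map)
// The element types of List and Map are assumptions.
sealed interface Value permits SimpleValue, ListValue, MapValue {}
record SimpleValue(ConstantDesc value) implements Value {}
record ListValue(List<Value> values) implements Value {}
record MapValue(Map<String, Value> entries) implements Value {}

// Hypothetical helper showing traversal by simple string keys.
final class ValueQuery {

    // Follows the keys through nested maps; empty if a key is missing
    // or a non-map value is reached before the path is exhausted.
    static Optional<Value> find(Value root, String... path) {
        Value current = root;
        for (String key : path) {
            if (!(current instanceof MapValue m)) return Optional.empty();
            current = m.entries().get(key);
            if (current == null) return Optional.empty();
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        Map<String, Value> cls = new LinkedHashMap<>();
        cls.put("name", new SimpleValue("Foo"));
        cls.put("flags", new ListValue(List.of(
                new SimpleValue("PUBLIC"), new SimpleValue("FINAL"))));

        Map<String, Value> top = new LinkedHashMap<>();
        top.put("class", new MapValue(cls));
        Value root = new MapValue(top);

        System.out.println(find(root, "class", "name"));
        System.out.println(find(root, "class", "missing"));
    }
}
```

Note that nothing in this form says how a node should be laid out, which is the gap the formatting hints discussed next are meant to fill.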
So far, I've seen that it's useful to:

 - Indent properly in multi-line formatting
 - Render simple maps in XML using attributes rather than elements
 - Render some lists on one line, rather than one line per element
 - Render some maps on one line, rather than one line per element

The single-line / multi-line seems to be a hint based on the "complexity" of the element, which the traverser knows and wants to encode in the result. Here's an attempt at simplifying.

    Value = SimpleValue(ConstantDesc c) | ListValue(List l) | MapValue(Map m)
          | BlockValue(Value v)

Here, BlockValue is a "hint wrapper", which takes some Value and wraps it with a hint that "this thing is complex, don't try to do it all in one line." Non-format code can just ignore the hint and unwrap the BlockValue payload, and keep going.

You currently distinguish the payload kind between Mapping and BlockMapping; one has a List<Fragment>, whereas the other has a List<Printable>. If we detune the payload type to List (Value in this simple example), we only need one Mapping type. And the same thing goes with List vs BlockList.

This puts a small additional burden on the formatter -- dealing with the case where we have an unwrapped List (therefore, non-block formatting), but some of its elements are not simple values. I think this is easy enough, and there are choices: either reject these, or we fall back to block formatting when non-simple values are present. This is a simple matter of checking:

    l.stream().allMatch(v -> v instanceof SimpleValue)

and falling back to block formatting if this isn't true.

Alternately, we can get rid of the block wrapper, and add a "block hint" to the structured elements. Switching back to your notation (but keeping my "keys are only for maps" modeling):

    Printable = Value(ConstantDesc value)
        | ValueList(FormatHint f, List list)
        | ValueMap(FormatHint f, Map map)
        | Comment()

where FormatHint might be as simple as `enum FormatHint { BLOCK, NOT_BLOCK }`, but could of course be fancier.

When we lose the specificity of the List element type, we downgrade the formatting metadata from "authoritative requirement" to "hint", but that seems OK, since these are heuristics for optimizing the human-readable output.

If someone arrived at this ADT, I think they'd know immediately what it means. That's a big plus.

Would this work?

> It can hold single values or list of values, where the list is
> rendered into a single attribute value.
>
> It cannot hold another Mapping because we cannot embed XML element
> into an attribute (so that is why it implements Printable).
>
> Rendering multiple elements on a single line is still valid XML
> document, however far from human readability.
>
> Mapping can represent a Fragment in JSON and in YAML, however XML
> throws an axe into that possibility.
>
> BlockMapping is the most powerful (multi-line indenting) joint, able
> to nest and render correctly another BlockMapping as well as any other
> Printable or Fragment.
>
> ValueList is restricted to leaf values (Strings, quoted String,
> numbers), because it is simple in all three formats. Construction of
> generic List of Fragments will require printers to render tons of
> other joint combinations, which we simply do not need.
>
> BlockList makes sense only in combination with BlockMappings. We would
> like to avoid BlockList of BlockLists as multi-level unnamed lists do
> not make much sense. Also rendering lists with any other joints is
> useless. If you have list of classes, list of methods, or list of
> fields - it does not make sense to put any other Fragment, BlockList
> or Comment in between them. Theoretically we can replace BlockList
> with ListOfBlockMaps(String key, String mapsKey, List>
> printables), however that does not make anything easier nor smaller.
> > *From: *Brian Goetz > *Date: *Monday, 25 July 2022 19:59 > *To: *Adam Sotona > *Cc: *classfile-api-dev at openjdk.org > *Subject: *Re: Classfile API proposal to integrate basic print > functionality directly to ClassModel and MethodModel > > If "block" means multi-line, and non-block means single-line, I'm > confused as to why we have > > ?? public record Mapping(String key, List fragments) > implements Printable {} > but > ?? public record BlockMapping(String key, List printables) > implements Printable {} > > where one has fragments and the other only has Printables. Similarly, > we have > > ??? public record ValueList(String key, List values) > implements Fragment {} > ??? public record BlockList(String key, List > blockMappings) implements Printable {} > > where again the List element differs.?? Is there no way to make the > block/non-block orthogonal to the payload type?? That would allow us > to replace the two BlockXxx(...) with different payloads, with a > Block wrapper as a formatting hint. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafael.wth at gmail.com Tue Jul 26 22:23:41 2022 From: rafael.wth at gmail.com (Rafael Winterhalter) Date: Wed, 27 Jul 2022 00:23:41 +0200 Subject: POC: JDK ClassModel -> ASM ClassReader In-Reply-To: <187a008a-bac6-c733-0c96-5da492b56331@oracle.com> References: <8763d647-92d1-0eca-7adf-9193bc58f31b@oracle.com> <187a008a-bac6-c733-0c96-5da492b56331@oracle.com> Message-ID: Hi Brian, As for your question about the "open class writer": yes, I would need this for everything that is modelled as a visitor in ASM. I created a proof of concept here where this is shown: https://github.com/raphw/jdk-sandbox/tree/classfile-api-monad - I integrated it in my writer bridge for ASM here: https://github.com/raphw/asm-jdk-bridge/tree/writer-poc which is more or less working for the majority of cases. I agree with your conclusion about the stack map API. 
It would also be nice to opt-out on a per-method basis. With ASM, this has sometimes been limited to not being able to compute frames only for certain methods that would require to chain ASM-reader-writer chains. We had discussed a potential solution on the thread about functionality I lack to fully complete my writer adoption. As for attributes: Ideally I would like to see that the AttributeMapper's by Attributes would allow me to write an attribute to a ConstantPool interface that I can implement myself. If I meet an attribute that JDK knows, I would like to simply pipe it to ASM's ClassWriter where that writer represents the targeted constant pool, rather than the JDK one. Maybe it could accept a form of reduced BufWriter as an interface that is not sealed? This would allow for a use against a different sink than a JDK class writer. This would make these partial byte-getters unneeded as the raw bytes are indeed less valuable if one could write a constant-pool-aware representation of that byte array to a sink. Best regards, Rafael Am Mo., 25. Juli 2022 um 20:30 Uhr schrieb Brian Goetz < brian.goetz at oracle.com>: > Going through some old messages after vacation. Not sure if I responded > to any of these or not, so forgive me if I've said this already (or > something contradictory.) > > Thanks for merging my patches, the class reader API works more or less > equivalent to the ASM version. The one thing I have not working so far are > attributes that the JDK knows but ASM does not. I was trying to implement > the "contents" method upwards, but this is difficult with the unbound > attributes as they normally require a BufWriter which in turn needs a > ConstantPoolBuilder. Somehow, I need to pipe the constant resolution to ASM > which cannot be done without unsealing some interfaces. I will try to > prototype a solution here, too, but I wanted to get the writer working > first. > > > This raises a more general question, which deserves a little bit of > discussion. 
For nearly every entity, there is a bound and unbound > implementation. The bound implementation is tied to a segment of a byte[] > for a complete classfile; the unbound implementation can be considered a > "deferred" implementation, and is disconnected from a constant pool. The > deferred implementation will eventually get written to a BufWriter which is > bound to a constant pool. Similarly, when building, there are "direct" > builders (who are accumulating various byte[] to write to the classfile) > and "buffered" builders (which accumulate lists of elements.) > > There are some questions about entities that can only be answered by > "bound" entities or "direct" builders, such as "what's the current BCI" or > "give me the bytes of this attribute." We have several choices here, none > great: > > - Only expose APIs that can be satisfied by both implementations. This > means you don't get to ask questions like "what's the current BCI". Seems > unfortunate. > - Expose explicitly partial APIs, such as accessors returning Optional. > This makes it clear that you might not get an answer, but may annoy users > as it is not always obvious that the question is partial (buffered > builders, used when chaining transforms, are a non-obvious concept.) > - Expose implicitly partial APIs, such as throwing when you ask a > question in the wrong mode. > > We've been steering towards the third, on the theory/hope that it will > only be natural to ask the question in the cases when we can actually > answer. This theory has been working out well so far, though this could > simply be lack of imagination on our part. > > Under the third option, we'd move the contents() method to the base > BoundAttribute class, and have the unbound classes throw "sorry, I can't > answer that right now." This is arguably reasonable because you are much > more likely to ask for the bytes of an attribute you've read from a > classfile, than one you've just yourself asked to have written to a > classfile. 
Where this might fall down is deep in a transform chain, where > an earlier stage might replace a bound attribute with an unbound one (say, > dropping an Exception from the MethodExceptions attribute.) > > For the type annotations on instructions: Would it be an option to add > "getVisibleAnnotations" and "getInvisibleAnnotations" to the relevant > CodeElement types? This way, I could for example query the "ExceptionCatch" > value for its annotations. > > > This is a good example of something that would only work on bound > instructions, because the unbound ones are "stateless". > > StackMapFrames could on the other hand just be added at their position in > the CodeElement iteration to receive them where they become relevant. This > way, one would not need to keep track of the current offset. This would > also allow for an easier write model where ASM does not allow you to know > the offset of a stack map. I assume that the current model is very much > modeled after the needs of the javap tool. Ideally, the frame objects would > be reduced to the information that is contained in a class file and the > consumer could track implicit information such as the "effective" stack and > locals. > > > If we dispense stack map frames as part of the CodeElement iteration, > users will assume that they can make their own stack map frames and send > them downstream to the builder too (the whole model of transforms is based > on flatmapping the stream of elements.) It seems more likely that this > will cause problems than will help. > > > Q: in this case, is it enough if only ClassBuilder has this option, or >> do you need it for MethodBuilder and CodeBuilder as well? >> > > Did you ever answer this question? > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From adam.sotona at oracle.com Wed Jul 27 06:35:12 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Wed, 27 Jul 2022 06:35:12 +0000
Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com>
References: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com>
Message-ID:

On 26.07.2022 21:43, "Brian Goetz" wrote:

To be more specific about the actual classes:

Mapping accepts only fragments, because it must render as single line and for example in XML it renders as single element with attributes.

OK, so this unearths a previously unstated requirement: that we be able to turn certain maps into attributes, rather than embedded elements, when converting to XML. Does this requirement correspond to anything in other formats? Do we gain anything by dropping this, and formatting always to elements, and/or using other heuristics to determine when we can get away with this format optimization?

In general, I'd like to unearth more of this sort of requirements. Looking at the current hierarchy, I can't help but feel we've not yet "hit bottom"; it doesn't feel factored into separate concerns yet. But I'm optimistic we can get there.

Here's a naive decomposition, which looks a lot like the JSON spec if you squint:

    Value = SimpleValue(ConstantDesc) | ListValue(List) | MapValue(Map)

Association with Java Maps and Lists is a mistake here. Every value must be named because every XML element or attribute must have a name and there is no difference between rendering block list and block map in XML.

Mapping is a collection of named entries, where the keys are rendered in all formats.

List is a collection of named entries, where the keys are rendered in XML only.

We can make it more "square" and intuitive to Java developers as you propose, if we drop XML format.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From michael.van.acken at gmail.com Wed Jul 27 08:10:11 2022 From: michael.van.acken at gmail.com (Michael van Acken) Date: Wed, 27 Jul 2022 10:10:11 +0200 Subject: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com> References: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com> Message-ID: Am Di., 26. Juli 2022 um 21:43 Uhr schrieb Brian Goetz < brian.goetz at oracle.com>: > > To be more specific about the actual classes: > > Mapping accepts only fragments, because it must render as single line and > for example in XML it renders as single element with attributes. > > > OK, so this unearths a previously unstated requirement: that we be able to > turn certain maps into attributes, rather than embedded elements, when > converting to XML. Does this requirement correspond to anything in other > formats? Do we gain anything by dropping this, and formatting always to > elements, and/or using other heuristics to determine when we can get away > with this format optimization? > > In general, I'd like to unearth more of this sort of requirements. > Looking at the current hierarchy, I can't help but feel we've not yet "hit > bottom"; it doesn't feel factored into separate concerns yet. But I'm > optimistic we can get there. > > Here's a naive decomposition, which looks a lot like the JSON spec if you > squint: > > Value = SimpleValue(ConstantDesc) | ListValue(List) | > MapValue(Map) > > Converting to this form is useful separate from formatting -- it provides > a basis for query/traversal using simple string keys. But it's not good > enough for getting good formatting yet, because you need formatting hints > to determine how to lay out / indent, right? So let's consider how we > might add these in as hints, that are more transparent. [...] 
>

If it's a question of prettyprinting where indentation and line breaks are added automatically, then there is the option to layer all whitespace manipulation on top of the printed data. I use this to prettyprint Clojure data structures, which closely resemble your `Value` production above.

The key idea goes back to the 1980 paper "Prettyprinting" by Oppen and boils down to a single decision how to insert line breaks. Consider a delimited group: an opening delimiter, a sequence of elements divided by separators, and a closing delimiter, where the delimiters are usually literals, the elements arbitrary data, and each separator is eventually a single whitespace or a line break. Determine the width of the group assuming that all separators (both in the group itself and as part of elements) are of width one. If the group's width fits within the page width remaining, then print each separator as a single whitespace, otherwise print the group's separators as line breaks.

The "one line width" flows naturally from the leaves to the root. Combine this with indentation tracking, and the result is a simple but capable prettyprinter targeting a given page width as a soft target.

Building on this, I print colorized side-by-side diffs of lists of classfiles.

-- mva

From adam.sotona at oracle.com Wed Jul 27 10:04:33 2022
From: adam.sotona at oracle.com (Adam Sotona)
Date: Wed, 27 Jul 2022 10:04:33 +0000
Subject: [External] : Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com>
Message-ID:

Unfortunately, "Prettyprinting" can be reduced to indentation and new lines only for XML and JSON, however YAML offers much more dynamic formatting.
I think when we kick XML out of the equation (and work around comments), we can simplify the printing API to the following:

    enum Style { BLOCK, FLOW }

    public sealed interface Printable {}
    public record PrintableValue(ConstantDesc value) implements Printable {}
    public record PrintableList(Style style, List list) implements Printable {}
    public record PrintableMap(Style style, Map map) implements Printable {}

And get the following results:
https://gist.github.com/asotona/01a054ecca5d8f9608516cb738c33ce1
and
https://gist.github.com/asotona/1e9b1c233c606724ad1c4e0eff1df3de

On 27.07.2022 10:10, "Michael van Acken" wrote:

On Tue, 26 July 2022 at 21:43, Brian Goetz wrote:

To be more specific about the actual classes:

Mapping accepts only fragments, because it must render as single line and for example in XML it renders as single element with attributes.

OK, so this unearths a previously unstated requirement: that we be able to turn certain maps into attributes, rather than embedded elements, when converting to XML. Does this requirement correspond to anything in other formats? Do we gain anything by dropping this, and formatting always to elements, and/or using other heuristics to determine when we can get away with this format optimization?

In general, I'd like to unearth more of this sort of requirements. Looking at the current hierarchy, I can't help but feel we've not yet "hit bottom"; it doesn't feel factored into separate concerns yet. But I'm optimistic we can get there.

Here's a naive decomposition, which looks a lot like the JSON spec if you squint:

    Value = SimpleValue(ConstantDesc) | ListValue(List) | MapValue(Map)

Converting to this form is useful separate from formatting -- it provides a basis for query/traversal using simple string keys. But it's not good enough for getting good formatting yet, because you need formatting hints to determine how to lay out / indent, right? So let's consider how we might add these in as hints, that are more transparent. [...]
If it's a question of prettyprinting where indentation and line breaks are added automatically, then there is the option to layer all whitespace manipulation on top of the printed data. I use this to prettyprint Clojure data structures, which closely resemble your `Value` production above. The key idea goes back to the 1980 paper "Prettyprinting" by Oppen and boils down to a single decision how to insert line breaks. Given a delimited group ... where and are usually literals, arbitrary data, and is eventually a single whitespace or a line break. Determine the width of the group assuming that all (both in the group itself and as part of elements) are of width one. If the group's width fits within the page width remaining, then print each as a single whitespace, otherwise the group's separators as line breaks. The "one line width" flows naturally from the leaves to the root. Combine this with indentation tracking, and the result is a simple but capable prettyprinter targeting a given page width as a soft target. Building on this, I print colorized side-by-side diffs of lists of classfiles. -- mva -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.van.acken at gmail.com Wed Jul 27 14:00:53 2022 From: michael.van.acken at gmail.com (Michael van Acken) Date: Wed, 27 Jul 2022 16:00:53 +0200 Subject: [External] : Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel In-Reply-To: References: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com> Message-ID: Am Mi., 27. Juli 2022 um 12:04 Uhr schrieb Adam Sotona < adam.sotona at oracle.com>: > Unfortunately, ?Prettyprinting? can be reduced to indentation and new > lines only for XML and JSON, however YAML offers much more dynamic > formatting. > I have to admit that I never used YAML syntax in any form. 
Looking at the Wikipedia page, it seems like YAML's optional "inline-style" corresponds to the "single-line formatting" style of the prettyprinter approach, and its indented multi-line blocks to multi-line formatting. In this case, the width information passed up the tree would represent the "inline-style" width. I would not rule out that it can be made to work. I agree with you that it would entail more than just moving strings around, because printing of lists and maps would have two distinct output modes. But it may be possible to move the decision which mode to use out of the data model. -- mva -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.lloyd at redhat.com Wed Jul 27 14:42:44 2022 From: david.lloyd at redhat.com (David Lloyd) Date: Wed, 27 Jul 2022 09:42:44 -0500 Subject: Trying out the API for qbicc Message-ID: Our qbicc project handles classfiles extensively, as one might imagine, for both parsing and generation of classes. I've been playing around with using the new API instead. So far, I am liking this API a lot; the design is overall very sensible and usable as far as I can tell; though it is still pretty early and not a lot is working yet, I do have some initial impressions. Parsing When we parse a method body, our parser is processing it directly into SSA form for later analysis. We're doing this in a depth-first recursive manner, where we start from the top of the method, and at each instruction which corresponds to a basic block terminator (which would be things like GOTO*, IF*, *SWITCH, *RETURN, ATHROW, and also some special cases like method invocation inside of a `try` block), we close up the current block and recursively process each unprocessed successor block (if any). 
In this way we naturally ignore any unreachable bytecodes - not just from bizarrely-formed class files (though this is possible) but also from parsing conditional constructs where we can establish a constant condition early in processing. However this approach depends on being able to randomly access the bytecode body. This seems doable with the new API, but unless I missed some helper method(s), to do so apparently requires iterating the instruction list, collecting all of the labels, and building a label-to-integer-key mapping to locate the list indices where processing should be resumed for a given label. It would certainly be nice to be able to have a more flexible seeking solution, like a special iterator API which can seek based on labels for example. Of course another option is to rework the algorithm to process every basic block in the bytecode from top to bottom, and let unreachable basic blocks fall out of the graph. This is not an unreasonable option, though it does generally require at least one more pass of processing in order to do things like number the blocks, identify loops, and establish reachability. Another issue with the strong encapsulation of BCI as labels is that it does not seem possible to find the BCI of an arbitrary instruction. This can be a problem for example when we need to record the BCI of a method invocation within a try block. A usable solution could be to automatically generate a Label before any invocation within a try/catch region (unless this is already being done). It also makes debugging mode difficult, as presently every node in our program graph records original line number and BCI information to make it easy to correlate subgraphs with their original bytecodes. This is less of a problem on the generation side since it appears that I can generate a label before any instruction, and then collect the corresponding BCIs after the method body is compiled. 
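The workaround described above (iterate the element list once, collect the labels, and map each to the list index where processing can resume) can be sketched as follows. The element types here are stand-ins invented for illustration; they are not the classes of the API under discussion, and `LabelIndexSketch` and `indexLabels` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LabelIndexSketch {
    // Stand-in element model: a code body is a flat list of instructions
    // with label targets appearing inline at the position they mark.
    sealed interface Element {}
    record LabelTarget(String label) implements Element {}
    record Insn(String mnemonic) implements Element {}

    /** One pass over the element list, recording the list index at which each label is bound. */
    static Map<String, Integer> indexLabels(List<Element> elements) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < elements.size(); i++) {
            if (elements.get(i) instanceof LabelTarget t) {
                index.put(t.label(), i);
            }
        }
        return index;
    }
}
```

With such a map, a depth-first block walker could resume at `elements.listIterator(index.get(label))` for each unprocessed successor, at the cost of the extra pass mentioned above.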
Generation We also generate classes for various purposes so I was doing some experiments with this as well. So far I have found this to be fairly straightforward, but I have so far encountered one minor API issue with this API (which to be completely fair ASM also suffers from). With ASM, when you're emitting instructions, you have to know not only the opcode of the instruction you're emitting but also the particular API method which corresponds to the correct instruction shape. This is excusable to an extent within ASM, because the opcode argument is an `int` so if there was only one overloaded method name with every shape, it might be too easy to make a mistake (never mind how relatively poetic it would have been for the main assembly method in ASM to be called `asm` :-) ). However, this API is otherwise very strongly typed, taking full advantage of the new pattern matching and sealing capabilities. So I was a bit surprised when all instruction opcodes were still represented by a single type (in this case an `enum`), even though there are enough different opcode shapes or characteristics to warrant *six* different constructors. Would it not make sense to make `Opcode` a sealed interface, with an enum for each opcode shape? In this way, instead of having a method for each of *many* (but not all) instructions (many of which are highly similar internally) and several overlapping ASM-like "emit this shape by name" methods for *some* other instructions - which ambiguously accepts a plain, generally-typed Opcode - there could be (many fewer) emit methods which accept a specific opcode type as the first argument and the correct argument values for subsequent arguments? Then as a developer, I need only to know which opcode I want to emit, and in my IDE I can for example type `cb.emit(GOTO, ` and immediately see that the GOTO instruction requires a `Label` argument, because that overload will be unambiguously selected. 
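To make the suggestion concrete, here is a small Java sketch of what a sealed `Opcode` with one enum per shape might look like. Every name in it (`BranchOpcode`, `NoOperandOpcode`, `Emitter`, `emit`, and this `Label` record) is hypothetical and exists only for this illustration; it is not how the API is actually structured.

```java
public class OpcodeShapeSketch {
    // Hypothetical: one sealed root, one enum per operand shape.
    sealed interface Opcode permits BranchOpcode, NoOperandOpcode {}

    enum BranchOpcode implements Opcode {
        GOTO, IFEQ, IFNE;

        /** Shape-specific helper: the inverse condition of a conditional branch. */
        BranchOpcode complement() {
            return switch (this) {
                case IFEQ -> IFNE;
                case IFNE -> IFEQ;
                case GOTO -> throw new IllegalStateException("GOTO has no complement");
            };
        }
    }

    enum NoOperandOpcode implements Opcode { IADD, ISHL, RETURN }

    record Label(String name) {}

    /** Toy emitter that writes a textual listing instead of bytecode. */
    static final class Emitter {
        private final StringBuilder out = new StringBuilder();

        // Overloads are selected by the static type of the opcode constant.
        Emitter emit(BranchOpcode op, Label target) {
            out.append(op).append(' ').append(target.name()).append('\n');
            return this;
        }

        Emitter emit(NoOperandOpcode op) {
            out.append(op).append('\n');
            return this;
        }

        String listing() {
            return out.toString();
        }
    }
}
```

In this sketch `new Emitter().emit(BranchOpcode.GOTO, new Label("L1"))` compiles, while `emit(NoOperandOpcode.IADD, new Label("L1"))` is rejected by the compiler because no overload matches.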
This also makes it much harder to make an error involving the wrong instruction shape; invalid-opcode errors that this API would only raise at run time would then be detectable directly within one's IDE without even having to compile, which improves ease of use. Obviously this is wandering dangerously close to the bikeshed borderline, however one other real-world advantage is that an enum constant in a more specific `*Opcode` subtype type can store more useful information about itself that a consumer could use; for example, the opcode constant for `IFEQ` could have a method `complement` which yields `IFNE`, which can be useful for simplifying some code generators (and I can think of specific cases both within qbicc and within Quarkus where this would have been useful). -- - DML ? he/him -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.goetz at oracle.com Wed Jul 27 15:56:10 2022 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 27 Jul 2022 11:56:10 -0400 Subject: Trying out the API for qbicc In-Reply-To: References: Message-ID: Thanks for giving it a try! On 7/27/2022 10:42 AM, David Lloyd wrote: > Our qbicc project handles classfiles?extensively, as one might > imagine, for both parsing and generation of classes. I've been playing > around with using the new API instead. So far, I am liking this API a > lot; the design is overall very sensible and usable as far as I can > tell; though it is still pretty early and not a lot is working yet, I > do have some initial impressions. > > Parsing > > When we parse a method body, our parser is processing it directly into > SSA form for later analysis. 
We're doing this in a depth-first
> recursive manner, where we start from the top of the method, and at
> each instruction which corresponds to a basic block terminator (which
> would be things like GOTO*, IF*, *SWITCH, *RETURN, ATHROW, and also
> some special cases like method invocation inside of a `try` block), we
> close up the current block and recursively process each unprocessed
> successor block (if any). In this way we naturally ignore any
> unreachable bytecodes - not just from bizarrely-formed class files
> (though this is possible) but also from parsing conditional constructs
> where we can establish a constant condition early in processing.
>
> However this approach depends on being able to randomly access the
> bytecode body. This seems doable with the new API, but unless I missed
> some helper method(s), to do so apparently requires iterating the
> instruction list, collecting all of the labels, and building a
> label-to-integer-key mapping to locate the list indices where
> processing should be resumed for a given label. It would certainly be
> nice to be able to have a more flexible seeking solution, like a
> special iterator API which can seek based on labels for example.

In ASM, you would use the "tree" API, to materialize the body into a random-access data structure. This is a bit unfortunate, because (a) the tree API is much slower than the streaming API, and (b) it is also somewhat different from the streaming API. (And mutable.)

We intend to improve on that, by having the "materialized" API just be "put the elements in a list/tree structure". For ClassModel/MethodModel, you can see the idea in play; you can stream the elements of a ClassModel, and you'll get methods, fields, etc, but you could also just call ClassModel::fields and it will materialize (and cache) a List and return that.

What you want is the equivalent for CodeModel, which is conceptually similar but we are missing a few things.
You can of course call CodeModel::elementList and get a List out, which includes the label targets inline. What's missing is the ability to map labels to *list indexes*. We know we want this, we made a stab at it in an early prototype, it was a mess (because some other things were a mess), but we would like to return to this.

> Another issue with the strong encapsulation of BCI as labels is that
> it does not seem possible to find the BCI of an arbitrary instruction.

This is related to a comment recently from Rafael, in that this works when we are traversing a *bound* CodeModel, but not a buffered code model (which might result from an intermediate stage of a transformation.) If we are OK with making operations like bci() partial, we can address this by, say, defining a refined `Iterator` that also has a bci() accessor. This works when parsing, but not necessarily when transforming, but that might be OK.

> Generation
>
> We also generate classes for various purposes so I was doing some
> experiments with this as well. So far I have found this to be fairly
> straightforward, but I have so far encountered one minor API issue
> with this API (which to be completely fair ASM also suffers from).
>
> With ASM, when you're emitting instructions, you have to know not only
> the opcode of the instruction you're emitting but also the particular
> API method which corresponds to the correct instruction shape. This is
> excusable to an extent within ASM, because the opcode argument is an
> `int` so if there was only one overloaded method name with every
> shape, it might be too easy to make a mistake (never mind how
> relatively poetic it would have been for the main assembly method in
> ASM to be called `asm` :-) ).

You should think of the generation methods as layered. At the most abstract, there is `with(CodeElement)`. Every other generation method bottoms out here.
At the next level, there are the ones that correspond to the coarse categories in the data model, such as `load(kind, slot)` or `operator(opc)`. At the finest level, there are methods for aload_0() and ishl(), which again all bottom out in `with(CodeElement)`.

Our assumption is that most "hand coded" generation code will prefer the most fine-grained ones, pattern-driven transformation code will probably do things like match on `LoadInstruction` and turn around and call load() again, maybe with different arguments, and "purely mechanical" transformation code will probably prefer just making elements and shoveling them down the pipeline.

> However, this API is otherwise very strongly typed, taking full
> advantage of the new pattern matching and sealing capabilities. So I
> was a bit surprised when all instruction opcodes were still
> represented by a single type (in this case an `enum`), even though
> there are enough different opcode shapes or characteristics to warrant
> *six* different constructors.

I think you may be mixing the Opcode and Instruction abstractions? The `Opcode` abstraction is explicitly about bytecodes and bytecode-specific metadata, whereas an Instruction is an instantiation of an Opcode + operands. (Some instructions, of course, have no operands (e.g., `iadd`); in this case, you'll notice the implementation has a singleton cache.)

The Opcode type mostly serves the implementation, to facilitate mapping to metadata (instruction size, kind, etc), and to manage the weirdness of the WIDE opcodes. (If it were not for WIDE, I'd probably have just gone with `byte` and lookup functions.)

I find it a little unfortunate that some methods like `branch` require an Opcode argument -- feels like mixing levels, as you suggest -- but the alternatives were worse.

> Would it not make sense to make `Opcode` a sealed interface, with an
> enum for each opcode shape?

We tried something like this early on.
It ran into the problem that switching over multiple enums in one switch is not supported. So having multiple enums may be more rich in modeling, but clients pay a penalty -- multiple switches. This didn't feel like a good trade. (It is possible the API and implementation have evolved since then, to make this less problematic, but that would have to be established.)

> In this way, instead of having a method for each of *many* (but not all) instructions (many of which are highly similar internally) and several overlapping ASM-like "emit this shape by name" methods for *some* other instructions - which ambiguously accepts a plain, generally-typed Opcode - there could be (many fewer) emit methods which accept a specific opcode type as the first argument and the correct argument values for subsequent arguments?

I don't think there would be "many fewer" methods; it just means that some of the type checking can be moved from runtime to compile time (e.g., branch(opc, label) wouldn't let you use IADD as the opcode). But I would think all the same methods would be there, just with tighter types.

> Obviously this is wandering dangerously close to the bikeshed borderline, however one other real-world advantage is that an enum constant in a more specific `*Opcode` subtype type can store more useful information about itself that a consumer could use; for example, the opcode constant for `IFEQ` could have a method `complement` which yields `IFNE`, which can be useful for simplifying some code generators (and I can think of specific cases both within qbicc and within Quarkus where this would have been useful).

This method exists in the library as an Opcode -> Opcode method.

Cheers,
-Brian
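The trade-off discussed above (tighter compile-time types on one side, one switch per enum on the other) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Classfile API: `Op`, `BranchOp`, `OperatorOp` and `ShapeDemo` are hypothetical names invented for the example, and the opcode lists are deliberately tiny. It is written against the switch rules as described in this thread, where a single switch cannot mix constants from several enums.

```java
// Hypothetical types for illustration only; none of them come from jdk.classfile.
sealed interface Op permits BranchOp, OperatorOp {}

enum BranchOp implements Op { IFEQ, IFNE, GOTO }

enum OperatorOp implements Op { IADD, ISUB, INEG }

final class ShapeDemo {
    // Tighter typing: passing OperatorOp.IADD here is rejected at compile time.
    static String branch(BranchOp op, String label) {
        return op + " -> " + label;
    }

    // The cost for clients: a query over "any opcode" needs one switch per enum.
    // Returns the number of stack operands the instruction pops.
    static int operandCount(Op op) {
        if (op instanceof BranchOp b) {
            return switch (b) {
                case IFEQ, IFNE -> 1;
                case GOTO -> 0;
            };
        }
        OperatorOp o = (OperatorOp) op;
        return switch (o) {
            case IADD, ISUB -> 2;
            case INEG -> 1;
        };
    }
}
```

With a single `Opcode` enum, `operandCount` would be one flat switch, but `branch` could only check its opcode argument at run time.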
From michael.van.acken at gmail.com Wed Jul 27 19:22:32 2022
From: michael.van.acken at gmail.com (Michael van Acken)
Date: Wed, 27 Jul 2022 21:22:32 +0200
Subject: [External] : Re: Classfile API proposal to integrate basic print functionality directly to ClassModel and MethodModel
In-Reply-To:
References: <00c561bf-ad99-b633-76d1-60f272ac6941@oracle.com>
Message-ID:

On Wed, 27 July 2022 at 16:00, Michael van Acken <michael.van.acken at gmail.com> wrote:

> On Wed, 27 July 2022 at 12:04, Adam Sotona <adam.sotona at oracle.com> wrote:
>
>> Unfortunately, "Prettyprinting" can be reduced to indentation and new lines only for XML and JSON, however YAML offers much more dynamic formatting.
>
> I have to admit that I never used YAML syntax in any form. Looking at the Wikipedia page, it seems like YAML's optional "inline-style" corresponds to the "single-line formatting" style of the prettyprinter approach, and its indented multi-line blocks to multi-line formatting. In this case, the width information passed up the tree would represent the "inline-style" width.
>
> I would not rule out that it can be made to work. I agree with you that it would entail more than just moving strings around, because printing of lists and maps would have two distinct output modes. But it may be possible to move the decision which mode to use out of the data model.

Curiosity got the better of me and I tried to put together a proof of concept. I used sample.json as input and generated two attempts at YAML, one formatted to a page width of 80 and another one to a width of 40.

The diff between sample.yaml and output_pp80.yaml is attached. I am not sure whether it is still a valid representation with its three indented inline maps. Someone who understands YAML should be able to do a better job than this. I'm not sure I got the interaction between list, map, and map entry entirely correct.
The code is on GitHub at https://github.com/mva/eval-pp

-- mva

--- sample.yaml 2022-07-27 20:43:42.795748163 +0200
+++ output_pp80.yaml 2022-07-27 21:04:23.923566743 +0200
@@ -37,11 +37,13 @@
 max locals: 3
 attributes: [StackMapTable]
 stack map frames:
- '@6': {locals: [Foo, int, java/lang/Throwable], stack: []}
- //stack map frame @0: {locals: [Foo, int, java/lang/Throwable], stack: []}
+ {'@6': {locals: [Foo, int, java/lang/Throwable], stack: []}}
+ //stack map frame @0:
+ {locals: [Foo, int, java/lang/Throwable], stack: []}
 0: [ILOAD_1, {slot: 1}]
 1: [IFEQ, {target: 6}]
 4: [ALOAD_2, {slot: 2}]
 5: [ATHROW]
- //stack map frame @6: {locals: [Foo, int, java/lang/Throwable], stack: []}
+ //stack map frame @6:
+ {locals: [Foo, int, java/lang/Throwable], stack: []}
 6: [RETURN]

From david.lloyd at redhat.com Wed Jul 27 20:24:21 2022
From: david.lloyd at redhat.com (David Lloyd)
Date: Wed, 27 Jul 2022 15:24:21 -0500
Subject: Trying out the API for qbicc
In-Reply-To:
References:
Message-ID:

On Wed, Jul 27, 2022 at 1:41 PM Brian Goetz wrote:

> In ASM, you would use the "tree" API, to materialize the body into a random-access data structure. This is a bit unfortunate, because (a) the tree API is much slower than the streaming API, and (b) it is also somewhat different from the streaming API. (And mutable.)

Indeed, I abandoned using ASM for parsing for this reason, in favor of a hand-written ByteBuffer-backed parser. It goes without saying that I would be pleased to use something else though. :-)

> We intend to improve on that, by having the "materialized" API just be "put the elements in a list/tree structure". For ClassModel/MethodModel, you can see the idea in play; you can stream the elements of a ClassModel, and you'll get methods, fields, etc, but you could also just call ClassModel::fields and it will materialize (and cache) a List and return that.
> What you want is the equivalent for CodeModel, which is conceptually similar but we are missing a few things.
>
> You can of course call CodeModel::elementList and get a List out, which includes the label targets inline. What's missing is the ability to map labels to *list indexes*. We know we want this, we made a stab at it in an early prototype, it was a mess (because some other things were a mess), but we would like to return to this.

Great! I do especially like the lazy (or at the very least, potentially lazy) materialization, which was another contributing reason for leaving ASM behind on the parsing end.

> This is related to a comment recently from Rafael, in that this works when we are traversing a *bound* CodeModel, but not a buffered code model (which might result from an intermediate stage of a transformation.) If we are OK with making operations like bci() partial, we can address this by, say, defining a refined `Iterator` that also has a bci() accessor. This works when parsing, but not necessarily when transforming, but that might be OK.

OK, I look forward to seeing whether or how this gets addressed.

> I think you may be mixing the Opcode and Instruction abstractions? The `Opcode` abstraction is explicitly about bytecodes and bytecode-specific metadata, whereas an Instruction is an instantiation of an Opcode + operands. (Some instructions, of course, have no operands (e.g., `iadd`); in this case, you'll notice the implementation has a singleton cache.)

Maybe I explained poorly but I was specifically thinking of Opcodes and their characteristics, independently of any particular realization of an instruction in a code model.

> Would it not make sense to make `Opcode` a sealed interface, with an enum for each opcode shape?
>
> We tried something like this early on. It ran into the problem that switching over multiple enums in one switch is not supported.
> So having multiple enums may be more rich in modeling, but clients pay a penalty -- multiple switches. This didn't feel like a good trade. (It is possible the API and implementation have evolved since then, to make this less problematic, but that would have to be established.)

Ah, I had assumed that switching over multiple enums was addressed in the pattern-matching-switch update. Not having personally kept on top of the latest developments there, I had quickly sketched up a test in IntelliJ and it appeared to work so long as the static type of the switch argument was a sealed interface which permits only enum types *and* you happened to have static-imported the enum constant names (for syntactic reasons I suppose). But I didn't actually verify that this was allowed by spec and sure enough, `javac` rejects it as it does not consider the statically imported enum values to be constant expressions. Oh well.

> Obviously this is wandering dangerously close to the bikeshed borderline, however one other real-world advantage is that an enum constant in a more specific `*Opcode` subtype type can store more useful information about itself that a consumer could use; for example, the opcode constant for `IFEQ` could have a method `complement` which yields `IFNE`, which can be useful for simplifying some code generators (and I can think of specific cases both within qbicc and within Quarkus where this would have been useful).
>
> This method exists in the library as an Opcode -> Opcode method.

Ah yes, I found it in `BytecodeHelpers`, excellent. That's in the `impl` subpackage though, so it doesn't feel very "public". Perhaps that class could be moved into the `jdk.classfile` package?

--
- DML • he/him
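As a rough illustration of why a `complement` operation is handy for code generators, here is a minimal sketch. It does not reproduce the helper found in `BytecodeHelpers`; `Cond`, `IfEmitter` and `emitIf` are hypothetical names, and instructions are modelled as plain strings purely for the sake of the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the one-operand conditional branch opcodes.
enum Cond {
    IFEQ, IFNE, IFLT, IFGE, IFGT, IFLE;

    // Returns the branch that is taken exactly when this one is not.
    Cond complement() {
        return switch (this) {
            case IFEQ -> IFNE;
            case IFNE -> IFEQ;
            case IFLT -> IFGE;
            case IFGE -> IFLT;
            case IFGT -> IFLE;
            case IFLE -> IFGT;
        };
    }
}

final class IfEmitter {
    // Emits "if (cond) { body }" as: jump past the body when cond does not hold.
    // The label name "end" and the string encoding are assumptions of this sketch.
    static List<String> emitIf(Cond cond, List<String> body) {
        List<String> out = new ArrayList<>();
        out.add(cond.complement() + " end");
        out.addAll(body);
        out.add("end:");
        return out;
    }
}
```

A generator that knows the condition under which the body should run can then emit the skip branch without a hand-written table at each call site.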
From brian.goetz at oracle.com Wed Jul 27 20:34:08 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 27 Jul 2022 16:34:08 -0400
Subject: Trying out the API for qbicc
In-Reply-To:
References:
Message-ID: <56d62ce2-9ec5-5dab-77f0-0d83c14386c6@oracle.com>

>> Obviously this is wandering dangerously close to the bikeshed borderline, however one other real-world advantage is that an enum constant in a more specific `*Opcode` subtype type can store more useful information about itself that a consumer could use; for example, the opcode constant for `IFEQ` could have a method `complement` which yields `IFNE`, which can be useful for simplifying some code generators (and I can think of specific cases both within qbicc and within Quarkus where this would have been useful).
>
> This method exists in the library as an Opcode -> Opcode method.
>
> Ah yes, I found it in `BytecodeHelpers`, excellent. That's in the `impl` subpackage though, so it doesn't feel very "public". Perhaps that class could be moved into the `jdk.classfile` package?

Yes, we've been conservative about what we expose, since it's easier to expose something hidden than vice versa.

What other transforms on opcodes go along with this?

From david.lloyd at redhat.com Wed Jul 27 20:56:33 2022
From: david.lloyd at redhat.com (David Lloyd)
Date: Wed, 27 Jul 2022 15:56:33 -0500
Subject: Trying out the API for qbicc
In-Reply-To: <56d62ce2-9ec5-5dab-77f0-0d83c14386c6@oracle.com>
References: <56d62ce2-9ec5-5dab-77f0-0d83c14386c6@oracle.com>
Message-ID:

On Wed, Jul 27, 2022 at 3:34 PM Brian Goetz wrote:

> > This method exists in the library as an Opcode -> Opcode method.
>
> Ah yes, I found it in `BytecodeHelpers`, excellent. That's in the `impl` subpackage though, so it doesn't feel very "public". Perhaps that class could be moved into the `jdk.classfile` package?
>
> Yes, we've been conservative about what we expose, since it's easier to expose something hidden than vice versa.
>
> What other transforms on opcodes go along with this?

From my own experience, complementing a conditional is about it for transformations - though I have at least imagined a couple of the other transformations in that class, I haven't run across practical applications for them.

Some additional queries might be useful though. I've recently had a case where it would have been convenient to know whether a given binary operation was commutative, in order to know whether I had to care about operand order on the stack. Also, I note that both binary and unary operators are known just as "operators"; adding a distinction here might be useful as well.

I could probably speculate several more, but as they say: YAGNI.

--
- DML • he/him

From michael.van.acken at gmail.com Thu Jul 28 04:12:49 2022
From: michael.van.acken at gmail.com (Michael van Acken)
Date: Thu, 28 Jul 2022 06:12:49 +0200
Subject: Trying out the API for qbicc
In-Reply-To: <56d62ce2-9ec5-5dab-77f0-0d83c14386c6@oracle.com>
References: <56d62ce2-9ec5-5dab-77f0-0d83c14386c6@oracle.com>
Message-ID:

On Wed, 27 July 2022 at 22:34, Brian Goetz <brian.goetz at oracle.com> wrote:

>> Obviously this is wandering dangerously close to the bikeshed borderline, however one other real-world advantage is that an enum constant in a more specific `*Opcode` subtype type can store more useful information about itself that a consumer could use; for example, the opcode constant for `IFEQ` could have a method `complement` which yields `IFNE`, which can be useful for simplifying some code generators (and I can think of specific cases both within qbicc and within Quarkus where this would have been useful).
>>
>> This method exists in the library as an Opcode -> Opcode method.
>
> Ah yes, I found it in `BytecodeHelpers`, excellent. That's in the `impl` subpackage though, so it doesn't feel very "public". Perhaps that class could be moved into the `jdk.classfile` package?
>
> Yes, we've been conservative about what we expose, since it's easier to expose something hidden than vice versa.
>
> What other transforms on opcodes go along with this?

I'm using two related transforms. First, a "swap sides" for comparisons, mapping e.g. the LT variants to their GT companions while being the identity for EQ and NE. I use this to slide null and integer zero to the right-hand side. Then the second step takes two-operand compares against null/zero and maps them to their one-operand counterparts.

-- mva
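The two transforms described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the library or from the author's project: the enum `Branch` and the method names `swapSides` and `againstZeroOrNull` are hypothetical, and only the constant names mirror the JVM branch mnemonics.

```java
// Hypothetical enum covering the two-operand and one-operand conditional branches.
enum Branch {
    IF_ICMPEQ, IF_ICMPNE, IF_ICMPLT, IF_ICMPGE, IF_ICMPGT, IF_ICMPLE,
    IF_ACMPEQ, IF_ACMPNE,
    IFEQ, IFNE, IFLT, IFGE, IFGT, IFLE, IFNULL, IFNONNULL;

    // Step 1: the branch to use after exchanging the two operands.
    // "a < b" becomes "b > a"; equality tests are symmetric and stay as they are.
    Branch swapSides() {
        return switch (this) {
            case IF_ICMPLT -> IF_ICMPGT;
            case IF_ICMPGT -> IF_ICMPLT;
            case IF_ICMPLE -> IF_ICMPGE;
            case IF_ICMPGE -> IF_ICMPLE;
            default -> this;
        };
    }

    // Step 2: with integer zero (or null) known to be the right-hand operand,
    // pick the one-operand branch that tests the left operand alone.
    Branch againstZeroOrNull() {
        return switch (this) {
            case IF_ICMPEQ -> IFEQ;
            case IF_ICMPNE -> IFNE;
            case IF_ICMPLT -> IFLT;
            case IF_ICMPGE -> IFGE;
            case IF_ICMPGT -> IFGT;
            case IF_ICMPLE -> IFLE;
            case IF_ACMPEQ -> IFNULL;
            case IF_ACMPNE -> IFNONNULL;
            default -> throw new IllegalArgumentException("not a two-operand compare: " + this);
        };
    }
}
```

For example, a comparison written as `0 < x` starts out as `IF_ICMPLT` with zero on the left; swapping sides gives `IF_ICMPGT` (`x > 0`), and the second step reduces that to `IFGT` on `x` alone.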