<div dir="auto">Short follow up:<div dir="auto"><br></div><div dir="auto">1. I figured out that the annotation on the NEW instruction used a label that was bound after the instruction when it should have been bound before. My hacky workaround for ASM - which only visits annotations after an instruction - is to always register a label prior to annotatable instructions, but ideally, I would be able to offset a label with a given size to avoid this. I felt like this could be a solution as OpenJDK already exposes bytecode offsets in its public API.</div><div dir="auto"><br></div><div dir="auto">2. I suggested a change for line numbers to optionally accept a label. This way, I can register line numbers later similarly to how ASM does it. I understand it's unlikey a common need but it's a minor change and unifies the possibility with other metadata in the API which all use labels.</div><div dir="auto"><br></div><div dir="auto">Best regards, Rafael </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Rafael Winterhalter <<a href="mailto:rafael.wth@gmail.com" rel="noreferrer noreferrer" target="_blank">rafael.wth@gmail.com</a>> schrieb am So., 10. Juli 2022, 22:57:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I was able to complete a POC for an ASM-to-OpenJDK bridge. It works quite well, but I hit a few more road blocks then the last time: <a href="https://github.com/raphw/asm-jdk-bridge/tree/writer-poc" rel="noreferrer noreferrer noreferrer" target="_blank">https://github.com/raphw/asm-jdk-bridge/tree/writer-poc</a></div><div><br></div><div>I have used my API suggestion of "open builders" (<a href="https://github.com/raphw/jdk-sandbox/commit/2be58f400b9ebf96b851eda658e0b8d2560421c5" rel="noreferrer noreferrer noreferrer" target="_blank">https://github.com/raphw/jdk-sandbox/commit/2be58f400b9ebf96b851eda658e0b8d2560421c5</a>) for this POC which I had modified slightly since my last mail. As Paul pointed out, it does not allow to repeat a code builder in case of a LabelOverflowException. But personally, I would - as a user - not expect that my CodeBuilder-Consumer is consumed multiple times, I would hope that the implementation could rather reiterate to patch the label offsets, or at least do that for the "open version". Anyways, I think that this limitation is solvable.</div><div><br></div><div>I have again written a row of tests to see how well the bridge is working. Again, I made a few observations:</div><div><br></div><div>- The frame generation works, but generates slightly different frames from what javac generated if I leave the byte code untouched. For example, local variables are "voided" if they are not used after a frame. This is not a problem in practice, but I know that some tools prepend code to methods, assuming all variables still exist, without checking frames what would fail bytecode verification. This would however only happen if a class file was first processed by OpenJDKs API and later by ASM. In my bridge sample repo, this is reproduced by the test of the "TryThrowCatch" class.<br></div><div>- The frame generation does not work for optional types which are often used especially in enterprise-directed frameworks like Spring (if this is fortunate is another question, it is however quite common). I understand that the class hierarchy resolution is pluggable, but for optional types, the class hierarchy cannot be resolved, even with a custom resolver, since these types are in fact missing at runtime. This is why Byte Buddy uses "frame weaving" which would require that frames can be written explicitly and are not handled by the class writer. I would therefore suggest that one should be able to opt-out of frame generation, be it for single methods, and have some API to add them explicitly. This normally also improves performance. Byte Buddy for example generates three times the overhead if ASMs frame computation is enabled as it is rather I/O-heavy. An example of where OpenJDK creates invalid frames is documented in the "FrameWithMissingType" test of the bridge repo. Also, many byte code processing frameworks already need to keep track of the maximum stack and locals; I am not sure how expensive computation of those values is in practice, but ASM claims that it saves 20% on avoiding the bookkeeping. Maybe one could consider to allow specifying these values too if they are available anyways.<br></div><div>- I noticed that when adding a TypeAnnotation on a NEW instruction, the offset in the attribute seems to be set incorrectly. The type annotation is added with reference to the DUP bytecode that follows NEW. I am not 100% sure if this is legal as the type reference still points to a NEW instruction, but I would still suggest fixing it. I tried to do this myself, but I got lost in the code. Maybe I will find some more time to still do it myself some time. The error is documented in the "TypeAnnotationInCode" test of my repo.</div><div>- I noticed that the JSR/RET instructions are not supported, even if they are just "passed through". Unfortunately, I still encounter them regularly when working with Java agents. JDBC drivers are often compiled to very old bytecode levels and for example vendors for APM tools like instrumenting those. I know they are difficult and annoying bytecodes, but I would expect that OpenJDK will eventually be able to process those. Is this only meant as an intermediate limitation?</div><div>- Custom/unknown attributes are also difficult to deal with when writing. In a way, its the same problem as with reading them. As I mentioned then, I lack some way of resolving an unknown Attribute to an array of bytes given some constant pool. Currently, I cannot even attempt this as all interfaces are sealed and I could not provide a custom implementation of a ConstantPoolBuilder to an AttributeMapper either. As things are, ASM does neither support this and I made a suggestion to open up the attribute API which is currently tied to ClassReader/ClassWriter implementations (<a href="https://gitlab.ow2.org/asm/asm/-/merge_requests/353" rel="noreferrer noreferrer noreferrer" target="_blank">https://gitlab.ow2.org/asm/asm/-/merge_requests/353</a>). This is a prerequisit and I would hope to have a similar API that is implementable in OpenJDKs to pass to OpenJDK's AttributeMapper class, to write attributes from and to byte arrays..</div><div>- OpenJDK's "line number" API only allows setting a line number for the current code position. For most use cases, I think this is sufficient, but it stuck out to me as it is an attribute where most in-code attributes are currently tied to labels. Ideally, I would suggest that line numbers should also be definable for a previous position with a label; ASM does allow the same at least and I assume people will have an easier time migrating code.<br></div><div><br></div><div>Before all of these things however, some form of "open API" would be required to even make the class writer bridge possible, so I hope that you can consider something in the direction of my suggested pull request. Thanks for considering! I will try to make suggestions for the minor things I pointed out in the mean time and follow up with the ASM crew to see if we can get the attribute problem sorted, too.</div><div><br></div><div>Many thanks, Rafael<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Am So., 10. Juli 2022 um 19:40 Uhr schrieb Brian Goetz <<a href="mailto:brian.goetz@oracle.com" rel="noreferrer noreferrer noreferrer" target="_blank">brian.goetz@oracle.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Just some quick comments, more when I get back from vacation. <br>
<br>
> Thanks for merging my patches, the class reader API works more or less equivalent to the ASM version. The one thing I have not working so far are attributes that the JDK knows but ASM does not. I was trying to implement the "contents" method upwards, but this is difficult with the unbound attributes as they normally require a BufWriter which in turn needs a ConstantPoolBuilder. Somehow, I need to pipe the constant resolution to ASM which cannot be done without unsealing some interfaces. I will try to prototype a solution here, too, but I wanted to get the writer working first.<br>
<br>
I’ll take a look at this when I get back.<br>
<br>
> For the type annotations on instructions: Would it be an option to add "getVisibleAnnotations" and "getInvisibleAnnotations" to the relevant CodeElement types? This way, I could for example query the "ExceptionCatch" value for its annotations. <br>
<br>
I’d like to slice this in two parts; one to query the BCI for a (bound) CodeElement, and one to use that BCI to lookup annotations. Getting the BCI is an essential bit of functionality, but given that elements can be bound or unbound, such functionality will be necessarily partial, so we have to communicate “no BCI” as a possible return value. <br>
<br>
> StackMapFrames could on the other hand just be added at their position in the CodeElement iteration to receive them where they become relevant. This way, one would not need to keep track of the current offset. This would also allow for an easier write model where ASM does not allow you to know the offset of a stack map. I assume that the current model is very much modeled after the needs of the javap tool. Ideally, the frame objects would be reduced to the information that is contained in a class file and the consumer could track implicit information such as the "effective" stack and locals.<br>
<br>
There are two concerns with this, one minor and one major. The minor one is that this has a significant cost, and most users don’t want this information. So we would surely want to gate this with an option whose default is false. (We care the most about transformation, and most transformations make only light changes, so we don’t want to add costs that most users won’t want to bear.). <br>
<br>
The major one is how it perturbs the model. The element stream delivered by a model, and the one consumed by a builder, should be duals. What should a builder do when handed a frame? Switch responsibility over to the user for correct frame generation? Ignore it and regenerate stack maps anyway? <br>
<br>
I think we need a better picture of “who is the audience for this sub feature” before designing the API. <br>
<br>
> At last, some notes I made while working with the writer API:<br>
> - ClassBuilder does not require flags when creating a field using "withField", but does so for "withMethod". Should this not be consistent?<br>
<br>
One can always set flags through the AccessFlags element; anything else is just a convenience. The convenience is there for fields because they almost never have any other elements — but methods almost always do. I don’t necessarily object to conveniences, but I don’t want to confuse users with too many ways to do the same thing. (I think the flags-accepting overload for fields does NOT accept a Consumer, so there’s always only one way to do it within each withField overload — which is a good thing.). <br>
<br>
> - EnclosingMethodAttribute requires optional parameters in its "of" method. I find that a bit awkward and would rather see an overload or the parameters to accept "null”.<br>
<br>
This is a deeper question of API design than it first appears. (BTW originally we had the null-accepting/delivering version, and we found it to be error-prone.). <br>
<br>
We want for the _data model_ to be the source of truth, and derive the API from that. One reason for doing so is that it means that the reading API (accessors on the Element) and the writing API (withXxx methods on Builders) will be free of gratuitous inconsistencies, so you can always take things apart to whatever level you want, clone/modify/keep each individual datum, and feed them back to the corresponding builder without having to convert, wrap, unwrap, etc. So the return type of `enclosingMethod` returns an Optional, there’s a strong argument that the of() method should take one. Otherwise, users have to make up the difference in ad-hoc and error-prone ways when adapting classifies. <br>
<br>
Both Optional and null are reasonable ways to represent possibly-not-there values in an API like this, but mixing the two in the same API multiplies the chance for error. So I prefer to address this question at a higher level (“should anything ever return null”) rather than what a particular accessor or factory should deal in. <br>
<br>
Cheers,<br>
-Brian<br>
<br>
<br>
</blockquote></div>
</blockquote></div>