POC: JDK ClassModel -> ASM ClassReader

Mon Jul 11 07:41:31 UTC 2022

Short follow up:

1. I figured out that the annotation on the NEW instruction used a label
that was bound after the instruction when it should have been bound before.
My hacky workaround for ASM - which only visits annotations after an
instruction - is to always register a label prior to annotatable
instructions, but ideally, I would be able to offset a label with a given
size to avoid this. I felt like this could be a solution as OpenJDK already
exposes bytecode offsets in its public API.

2. I suggested a change for line numbers to optionally accept a label. This
way, I can register line numbers later similarly to how ASM does it. I
understand it's unlikey a common need but it's a minor change and unifies
the possibility with other metadata in the API which all use labels.

Best regards, Rafael

Rafael Winterhalter <rafael.wth at gmail.com> schrieb am So., 10. Juli 2022,
22:57:

> I was able to complete a POC for an ASM-to-OpenJDK bridge. It works quite
> well, but I hit a few more road blocks then the last time:
> https://github.com/raphw/asm-jdk-bridge/tree/writer-poc
>
> I have used my API suggestion of "open builders" (
> https://github.com/raphw/jdk-sandbox/commit/2be58f400b9ebf96b851eda658e0b8d2560421c5)
> for this POC which I had modified slightly since my last mail. As Paul
> pointed out, it does not allow to repeat a code builder in case of a
> LabelOverflowException. But personally, I would - as a user - not expect
> that my CodeBuilder-Consumer is consumed multiple times, I would hope that
> the implementation could rather reiterate to patch the label offsets, or at
> least do that for the "open version". Anyways, I think that this limitation
> is solvable.
>
> I have again written a row of tests to see how well the bridge is working.
> Again, I made a few observations:
>
> - The frame generation works, but generates slightly different frames from
> what javac generated if I leave the byte code untouched. For example, local
> variables are "voided" if they are not used after a frame. This is not a
> problem in practice, but I know that some tools prepend code to methods,
> assuming all variables still exist, without checking frames what would fail
> bytecode verification. This would however only happen if a class file was
> first processed by OpenJDKs API and later by ASM. In my bridge sample repo,
> this is reproduced by the test of the "TryThrowCatch" class.
> - The frame generation does not work for optional types which are often
> used especially in enterprise-directed frameworks like Spring (if this is
> fortunate is another question, it is however quite common). I understand
> that the class hierarchy resolution is pluggable, but for optional types,
> the class hierarchy cannot be resolved, even with a custom resolver, since
> these types are in fact missing at runtime. This is why Byte Buddy uses
> "frame weaving" which would require that frames can be written explicitly
> and are not handled by the class writer. I would therefore suggest that one
> should be able to opt-out of frame generation, be it for single methods,
> and have some API to add them explicitly. This normally also improves
> performance. Byte Buddy for example generates three times the overhead if
> ASMs frame computation is enabled as it is rather I/O-heavy. An example of
> where OpenJDK creates invalid frames is documented in the
> "FrameWithMissingType" test of the bridge repo. Also, many byte code
> processing frameworks already need to keep track of the maximum stack and
> locals; I am not sure how expensive computation of those values is in
> practice, but ASM claims that it saves 20% on avoiding the bookkeeping.
> Maybe one could consider to allow specifying these values too if they are
> available anyways.
> - I noticed that when adding a TypeAnnotation on a NEW instruction, the
> offset in the attribute seems to be set incorrectly. The type annotation is
> added with reference to the DUP bytecode that follows NEW. I am not 100%
> sure if this is legal as the type reference still points to a NEW
> instruction, but I would still suggest fixing it. I tried to do this
> myself, but I got lost in the code. Maybe I will find some more time to
> still do it myself some time. The error is documented in the
> "TypeAnnotationInCode" test of my repo.
> - I noticed that the JSR/RET instructions are not supported, even if they
> are just "passed through". Unfortunately, I still encounter them regularly
> when working with Java agents. JDBC drivers are often compiled to very old
> bytecode levels and for example vendors for APM tools like instrumenting
> those. I know they are difficult and annoying bytecodes, but I would expect
> that OpenJDK will eventually be able to process those. Is this only meant
> as an intermediate limitation?
> - Custom/unknown attributes are also difficult to deal with when writing.
> In a way, its the same problem as with reading them. As I mentioned then, I
> lack some way of resolving an unknown Attribute to an array of bytes given
> some constant pool. Currently, I cannot even attempt this as all interfaces
> are sealed and I could not provide a custom implementation of a
> ConstantPoolBuilder to an AttributeMapper either. As things are, ASM does
> neither support this and I made a suggestion to open up the attribute API
> which is currently tied to ClassReader/ClassWriter implementations (
> https://gitlab.ow2.org/asm/asm/-/merge_requests/353). This is a
> prerequisit and I would hope to have a similar API that is implementable
> in OpenJDKs to pass to OpenJDK's AttributeMapper class, to write attributes
> from and to byte arrays..
> - OpenJDK's "line number" API only allows setting a line number for the
> current code position. For most use cases, I think this is sufficient, but
> it stuck out to me as it is an attribute where most in-code attributes are
> currently tied to labels. Ideally, I would suggest that line numbers should
> also be definable for a previous position with a label; ASM does allow the
> same at least and I assume people will have an easier time migrating code.
>
> Before all of these things however, some form of "open API" would be
> required to even make the class writer bridge possible, so I hope that you
> can consider something in the direction of my suggested pull request.
> Thanks for considering! I will try to make suggestions for the minor things
> I pointed out in the mean time and follow up with the ASM crew to see if we
> can get the attribute problem sorted, too.
>
> Many thanks, Rafael
>
> Am So., 10. Juli 2022 um 19:40 Uhr schrieb Brian Goetz <
> brian.goetz at oracle.com>:
>
>> Just some quick comments, more when I get back from vacation.
>>
>> > Thanks for merging my patches, the class reader API works more or less
>> equivalent to the ASM version. The one thing I have not working so far are
>> attributes that the JDK knows but ASM does not. I was trying to implement
>> the "contents" method upwards, but this is difficult with the unbound
>> attributes as they normally require a BufWriter which in turn needs a
>> ConstantPoolBuilder. Somehow, I need to pipe the constant resolution to ASM
>> which cannot be done without unsealing some interfaces. I will try to
>> prototype a solution here, too, but I wanted to get the writer working
>> first.
>>
>> I’ll take a look at this when I get back.
>>
>> > For the type annotations on instructions: Would it be an option to add
>> "getVisibleAnnotations" and "getInvisibleAnnotations" to the relevant
>> CodeElement types? This way, I could for example query the "ExceptionCatch"
>> value for its annotations.
>>
>> I’d like to slice this in two parts; one to query the BCI for a (bound)
>> CodeElement, and one to use that BCI to lookup annotations.  Getting the
>> BCI is an essential bit of functionality, but given that elements can be
>> bound or unbound, such functionality will be necessarily partial, so we
>> have to communicate “no BCI” as a possible return value.
>>
>> > StackMapFrames could on the other hand just be added at their position
>> in the CodeElement iteration to receive them where they become relevant.
>> This way, one would not need to keep track of the current offset. This
>> would also allow for an easier write model where ASM does not allow you to
>> know the offset of a stack map. I assume that the current model is very
>> much modeled after the needs of the javap tool. Ideally, the frame objects
>> would be reduced to the information that is contained in a class file and
>> the consumer could track implicit information such as the "effective" stack
>> and locals.
>>
>> There are two concerns with this, one minor and one major.  The minor one
>> is that this has a significant cost, and most users don’t want this
>> information.  So we would surely want to gate this with an option whose
>> default is false.  (We care the most about transformation, and most
>> transformations make only light changes, so we don’t want to add costs that
>> most users won’t want to bear.).
>>
>> The major one is how it perturbs the model.  The element stream delivered
>> by a model, and the one consumed by a builder, should be duals.  What
>> should a builder do when handed a frame?  Switch responsibility over to the
>> user for correct frame generation?  Ignore it and regenerate stack maps
>> anyway?
>>
>> I think we need a better picture of “who is the audience for this sub
>> feature” before designing the API.
>>
>> > At last, some notes I made while working with the writer API:
>> > - ClassBuilder does not require flags when creating a field using
>> "withField", but does so for "withMethod". Should this not be consistent?
>>
>> One can always set flags through the AccessFlags element; anything else
>> is just a convenience.  The convenience is there for fields because they
>> almost never have any other elements — but methods almost always do.  I
>> don’t necessarily object to conveniences, but I don’t want to confuse users
>> with too many ways to do the same thing.  (I think the flags-accepting
>> overload for fields does NOT accept a Consumer, so there’s always only one
>> way to do it within each withField overload — which is a good thing.).
>>
>> > - EnclosingMethodAttribute requires optional parameters in its "of"
>> method. I find that a bit awkward and would rather see an overload or the
>> parameters to accept "null”.
>>
>> This is a deeper question of API design than it first appears.  (BTW
>> originally we had the null-accepting/delivering version, and we found it to
>> be error-prone.).
>>
>> We want for the _data model_ to be the source of truth, and derive the
>> API from that.  One reason for doing so is that it means that the reading
>> API (accessors on the Element) and the writing API (withXxx methods on
>> Builders) will be free of gratuitous inconsistencies, so you can always
>> take things apart to whatever level you want, clone/modify/keep each
>> individual datum, and feed them back to the corresponding builder without
>> having to convert, wrap, unwrap, etc.  So the return type of
>> `enclosingMethod` returns an Optional, there’s a strong argument that the
>> of() method should take one.  Otherwise, users have to make up the
>> difference in ad-hoc and error-prone ways when adapting classifies.
>>
>> Both Optional and null are reasonable ways to represent
>> possibly-not-there values in an API like this, but mixing the two in the
>> same API multiplies the chance for error.  So I prefer to address this
>> question at a higher level (“should anything ever return null”) rather than
>> what a particular accessor or factory should deal in.
>>
>> Cheers,
>> -Brian
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20220711/1ea3c2bd/attachment-0001.htm>