POC: JDK ClassModel -> ASM ClassReader

Sun Jul 10 20:57:04 UTC 2022

I was able to complete a POC for an ASM-to-OpenJDK bridge. It works quite
well, but I hit a few more road blocks then the last time:
https://github.com/raphw/asm-jdk-bridge/tree/writer-poc

I have used my API suggestion of "open builders" (
https://github.com/raphw/jdk-sandbox/commit/2be58f400b9ebf96b851eda658e0b8d2560421c5)
for this POC which I had modified slightly since my last mail. As Paul
pointed out, it does not allow to repeat a code builder in case of a
LabelOverflowException. But personally, I would - as a user - not expect
that my CodeBuilder-Consumer is consumed multiple times, I would hope that
the implementation could rather reiterate to patch the label offsets, or at
least do that for the "open version". Anyways, I think that this limitation
is solvable.

I have again written a row of tests to see how well the bridge is working.
Again, I made a few observations:

- The frame generation works, but generates slightly different frames from
what javac generated if I leave the byte code untouched. For example, local
variables are "voided" if they are not used after a frame. This is not a
problem in practice, but I know that some tools prepend code to methods,
assuming all variables still exist, without checking frames what would fail
bytecode verification. This would however only happen if a class file was
first processed by OpenJDKs API and later by ASM. In my bridge sample repo,
this is reproduced by the test of the "TryThrowCatch" class.
- The frame generation does not work for optional types which are often
used especially in enterprise-directed frameworks like Spring (if this is
fortunate is another question, it is however quite common). I understand
that the class hierarchy resolution is pluggable, but for optional types,
the class hierarchy cannot be resolved, even with a custom resolver, since
these types are in fact missing at runtime. This is why Byte Buddy uses
"frame weaving" which would require that frames can be written explicitly
and are not handled by the class writer. I would therefore suggest that one
should be able to opt-out of frame generation, be it for single methods,
and have some API to add them explicitly. This normally also improves
performance. Byte Buddy for example generates three times the overhead if
ASMs frame computation is enabled as it is rather I/O-heavy. An example of
where OpenJDK creates invalid frames is documented in the
"FrameWithMissingType" test of the bridge repo. Also, many byte code
processing frameworks already need to keep track of the maximum stack and
locals; I am not sure how expensive computation of those values is in
practice, but ASM claims that it saves 20% on avoiding the bookkeeping.
Maybe one could consider to allow specifying these values too if they are
available anyways.
- I noticed that when adding a TypeAnnotation on a NEW instruction, the
offset in the attribute seems to be set incorrectly. The type annotation is
added with reference to the DUP bytecode that follows NEW. I am not 100%
sure if this is legal as the type reference still points to a NEW
instruction, but I would still suggest fixing it. I tried to do this
myself, but I got lost in the code. Maybe I will find some more time to
still do it myself some time. The error is documented in the
"TypeAnnotationInCode" test of my repo.
- I noticed that the JSR/RET instructions are not supported, even if they
are just "passed through". Unfortunately, I still encounter them regularly
when working with Java agents. JDBC drivers are often compiled to very old
bytecode levels and for example vendors for APM tools like instrumenting
those. I know they are difficult and annoying bytecodes, but I would expect
that OpenJDK will eventually be able to process those. Is this only meant
as an intermediate limitation?
- Custom/unknown attributes are also difficult to deal with when writing.
In a way, its the same problem as with reading them. As I mentioned then, I
lack some way of resolving an unknown Attribute to an array of bytes given
some constant pool. Currently, I cannot even attempt this as all interfaces
are sealed and I could not provide a custom implementation of a
ConstantPoolBuilder to an AttributeMapper either. As things are, ASM does
neither support this and I made a suggestion to open up the attribute API
which is currently tied to ClassReader/ClassWriter implementations (
https://gitlab.ow2.org/asm/asm/-/merge_requests/353). This is a prerequisit
and I would hope to have a similar API that is implementable  in OpenJDKs
to pass to OpenJDK's AttributeMapper class, to write attributes from and to
byte arrays..
- OpenJDK's "line number" API only allows setting a line number for the
current code position. For most use cases, I think this is sufficient, but
it stuck out to me as it is an attribute where most in-code attributes are
currently tied to labels. Ideally, I would suggest that line numbers should
also be definable for a previous position with a label; ASM does allow the
same at least and I assume people will have an easier time migrating code.

Before all of these things however, some form of "open API" would be
required to even make the class writer bridge possible, so I hope that you
can consider something in the direction of my suggested pull request.
Thanks for considering! I will try to make suggestions for the minor things
I pointed out in the mean time and follow up with the ASM crew to see if we
can get the attribute problem sorted, too.

Many thanks, Rafael

Am So., 10. Juli 2022 um 19:40 Uhr schrieb Brian Goetz <
brian.goetz at oracle.com>:

> Just some quick comments, more when I get back from vacation.
>
> > Thanks for merging my patches, the class reader API works more or less
> equivalent to the ASM version. The one thing I have not working so far are
> attributes that the JDK knows but ASM does not. I was trying to implement
> the "contents" method upwards, but this is difficult with the unbound
> attributes as they normally require a BufWriter which in turn needs a
> ConstantPoolBuilder. Somehow, I need to pipe the constant resolution to ASM
> which cannot be done without unsealing some interfaces. I will try to
> prototype a solution here, too, but I wanted to get the writer working
> first.
>
> I’ll take a look at this when I get back.
>
> > For the type annotations on instructions: Would it be an option to add
> "getVisibleAnnotations" and "getInvisibleAnnotations" to the relevant
> CodeElement types? This way, I could for example query the "ExceptionCatch"
> value for its annotations.
>
> I’d like to slice this in two parts; one to query the BCI for a (bound)
> CodeElement, and one to use that BCI to lookup annotations.  Getting the
> BCI is an essential bit of functionality, but given that elements can be
> bound or unbound, such functionality will be necessarily partial, so we
> have to communicate “no BCI” as a possible return value.
>
> > StackMapFrames could on the other hand just be added at their position
> in the CodeElement iteration to receive them where they become relevant.
> This way, one would not need to keep track of the current offset. This
> would also allow for an easier write model where ASM does not allow you to
> know the offset of a stack map. I assume that the current model is very
> much modeled after the needs of the javap tool. Ideally, the frame objects
> would be reduced to the information that is contained in a class file and
> the consumer could track implicit information such as the "effective" stack
> and locals.
>
> There are two concerns with this, one minor and one major.  The minor one
> is that this has a significant cost, and most users don’t want this
> information.  So we would surely want to gate this with an option whose
> default is false.  (We care the most about transformation, and most
> transformations make only light changes, so we don’t want to add costs that
> most users won’t want to bear.).
>
> The major one is how it perturbs the model.  The element stream delivered
> by a model, and the one consumed by a builder, should be duals.  What
> should a builder do when handed a frame?  Switch responsibility over to the
> user for correct frame generation?  Ignore it and regenerate stack maps
> anyway?
>
> I think we need a better picture of “who is the audience for this sub
> feature” before designing the API.
>
> > At last, some notes I made while working with the writer API:
> > - ClassBuilder does not require flags when creating a field using
> "withField", but does so for "withMethod". Should this not be consistent?
>
> One can always set flags through the AccessFlags element; anything else is
> just a convenience.  The convenience is there for fields because they
> almost never have any other elements — but methods almost always do.  I
> don’t necessarily object to conveniences, but I don’t want to confuse users
> with too many ways to do the same thing.  (I think the flags-accepting
> overload for fields does NOT accept a Consumer, so there’s always only one
> way to do it within each withField overload — which is a good thing.).
>
> > - EnclosingMethodAttribute requires optional parameters in its "of"
> method. I find that a bit awkward and would rather see an overload or the
> parameters to accept "null”.
>
> This is a deeper question of API design than it first appears.  (BTW
> originally we had the null-accepting/delivering version, and we found it to
> be error-prone.).
>
> We want for the _data model_ to be the source of truth, and derive the API
> from that.  One reason for doing so is that it means that the reading API
> (accessors on the Element) and the writing API (withXxx methods on
> Builders) will be free of gratuitous inconsistencies, so you can always
> take things apart to whatever level you want, clone/modify/keep each
> individual datum, and feed them back to the corresponding builder without
> having to convert, wrap, unwrap, etc.  So the return type of
> `enclosingMethod` returns an Optional, there’s a strong argument that the
> of() method should take one.  Otherwise, users have to make up the
> difference in ad-hoc and error-prone ways when adapting classifies.
>
> Both Optional and null are reasonable ways to represent possibly-not-there
> values in an API like this, but mixing the two in the same API multiplies
> the chance for error.  So I prefer to address this question at a higher
> level (“should anything ever return null”) rather than what a particular
> accessor or factory should deal in.
>
> Cheers,
> -Brian
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20220710/a47d7e21/attachment.htm>