Another experience report

Sun Aug 7 15:46:23 UTC 2022

Thanks again for the thoughtful feedback.

> Because of the high density
> of instanceof checks on instruction types, this was the one place
> where I wished for shorter class names.

We did consider a few alternatives to the XxxInstruction convention, 
such as XxxOp.  I'm open to rationalizing names at this point; the 
current conventions were chosen early on to be clear and consistent, so 
we could move on to actually implementing the right idioms, but it is 
reasonable to circle back and question naming (for some bounded time) 
now that the shape of the API is more settled.

(Irritating confounding factor: if you name the builder methods load(), 
store(), etc, when you get to "return" and "instanceof", you have a 
problem...)

> Another huge boon is `block()`.  I call it just once in my code base,
> but it reduces the work of managing local slots, their lifetimes, and
> their debug information by an enormous amount.

These were added relatively late, especially the local-variable 
management, and we could especially use more feedback (and test 
coverage!) on that.  There are some missing bits here too, since 
`allocateLocal` doesn't automatically generate LVT/LVTT entries, and it 
should.  Feedback and contribution is welcome here.

> With regard to performance, I can only say that it is fast.

I'm sure authors of other frameworks will take that as "fighting words", 
but yes, it is likely to be somewhat surprising to many that such an 
allocation-heavy approach offers this degree of "fast enough" for most 
use cases.

> ### Lower than I like
>
> The bulk of API usage stays on a single level of abstraction, for
> example working with *Desc entry points instead of *Entry. There are
> rare places where this slipped at little.

I'm not following what you're getting at here.  Do you mean there are 
gaps where we are not consistent about choices of Desc vs Entry?  Or 
simply that you stuck to the Desc level as a user?  And if so, was there 
friction here?

> The New*Array family of instructions gave me a hard time.  I had
> forgotten that there are three of those, and there was some bumping
> around involved while re-learning this fact.  (Btw, anewarray() accepts
> a primitive ClassDesc and converts it into a reference type, e.g. "I"
> to "LI;".)  Two of the instructions have no ClassDesc factory, which
> meant I had to wrap the family into a intermediate representation node
> to carry essentially (ClassDesc,int) downstream to the CodeBuilder
> instance.  I wonder if it is worthwhile to move the three under an
> umbrella interface, similar to what ConstantInstruction is doing.

Quite possibly.  These instructions gave us a hard time too, because 
they are so irregular in how they work.  It is quite possible that there 
is a simplification here.  The constant instructions have a similar 
abstraction-resistence, and we did go around a few times with how much 
to lump and split there too.  The main constraint is that the lowest 
level modeling should map directly to the classfile spec, which means 
any creativity around merging representations has to happen one level up.

When designing the groupings of instructions, there are a lot of forces 
pulling in different directions.  The main normalizing force is that 
transformation is an emergent property of reading+writing, and we want 
for the no-op transformation to do as little "lifting and lowering" work 
as possible. This is why no XxxInstruction uses XxxDesc in its data 
model; converting from CP entry -> descriptor string -> XxxDesc -> 
descriptor string -> CP entry for a no-op transform is expensive and 
pointless.  We have to worry about this because the default 
representation is a stream of element; if the elements are "too far" 
from what's in the classfile, then the lifting and lowering overhead 
will eat you.

If you try to merge semi-related instructions like the constant or 
new-array instructions into a common representation, that representation 
will not be the natural representation for some or all of the 
instructions in the group.  When we hit walls like this, we fall back to 
splitting, and modeling the classfile spec directly, but there is 
probably more we can do atop them.  Suggestions welcome.

> The one place where I use labelToBci() is try/catch/finally.  There is
> the special case of exceptionCatch() failing for an empty region, a
> condition that in turn can lead to handler blocks becoming
> unreachable.  For me, the only robust way to deal with this to a)
> guard against an empty region by inspecting the bcis and b)
> subsequently omitting the invalid/unreachable parts.

This raises a good question about how much the library wants to do to 
"fix" questionable bytecode.  We already NOP out unreachable bytecode 
(otherwise the verifier freaks out).  Should we just silently drop catch 
clauses associated with empty try blocks?  (We won't know that they are 
empty until after all the labels are resolved, so we can't usually 
detect this at the point of emitting the catch entry.)  What about when, 
by the time we get to the end of generation, a label used in a 
try-catch, or LVT[T], isn't bound? Should we throw, or just drop the 
entry?  I suspect one size does not fit all here and we have to design 
some more options-handling.

> Another single use only is constantPool(), to go from a DMHD instance
> to CodeBuilder's (field|invoke)Instruction.  This was a consequence of
> DMHD only providing the lookupDescriptor() as String and not as an MTD
> as well.  With hindsight, it may have been better for me to recover
> the MTD from the String regardless, and to stay on the level of *Desc
> throughout.

Is there something missing that would bridge that for you?

> Finally, is there a way to decide between tableswitch and
> lookupswitch?  Lacking something better, I'm trying to emulate this
> code here:
> https://github.com/openjdk/jdk-sandbox/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/jvm/Gen.java#L1320

Most compilers use a heuristic based on the size of the two 
alternatives, comparing `(hi-lo) / count` to some density threshold.  
You're mostly optimizing for bytecode size here, since when the JIT gets 
its hands on it, it has its own heuristics.

> ### Lost in translation
>
> One feature I cannot duplicate with Classfile is try/catch/finally in
> expression position when the operand stack is not empty.  The old
> bytecode generator dealt with this case by unwinding the operand stack
> into locals, evaluating the t/c/f, and then rebuilding the operand
> stack with the result on top.  But to do this, one needs to know what
> the operand stack looks like at the point of the `try`.

Interesting point.  We do not build stack maps as we go, so we don't 
have our hands readily on the stack state.  However, I could imagine an 
overload of the try-catch builder that would let you feed it a 
TypeKind[], and that would use allocateLocal to automate the push/pop 
logic.  This is something you should be able to build from outside the 
library, too; this would be a good experiment try try. (You'd have to 
manually compute the stack state.)

Cheers,
-Brian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/classfile-api-dev/attachments/20220807/a7cb1489/attachment-0001.htm>