Another experience report

Michael van Acken michael.van.acken at gmail.com
Sun Aug 7 13:27:20 UTC 2022


Here is another experience report, although in the more pedestrian
setting of a compiler with Clojure as input.  Over the past 4 weeks I
moved an incomplete project of mine from a homegrown bytecode emitter
to Classfile.  I have now reached feature parity with the old
codebase, and classfile generation is essentially complete.

### Reading

The sole users of Classfile.parse() are the unit tests.  They take a
sequence of byte arrays and wrangle them into the data format expected
by the existing test cases.  In case of an expected/actual mismatch,
the nested data is diffed and visualised side by side.

Some minor work was involved in assigning nice names to labels and in
sliding LocalVariable and ExceptionCatch to their prior places in the
instruction sequences.  The rest of the mapping essentially wrote
itself by following the API's guidance.  Because of the high density
of instanceof checks on instruction types, this was the one place
where I wished for shorter class names.

### Writing

Manually writing parts of a class or a method's Code attribute is a
pleasure.  Concise, easy to write, and easy to read.  Similarly, going
from the intermediate representation to Classfile handlers is a
breeze.

I find the ability to represent bytecode instructions as immutable
data especially useful.  It allows me to stash away "simple" opcodes
in a uniform way in the IR early while parsing the source code, and
have them emit themselves when writing out the class.  A single node
class of the intermediate representation is sufficient to cover the
bulk of opcodes (currently with a single exception; see below).

Another huge boon is `block()`.  I call it just once in my code base,
but it reduces the work of managing local slots, their lifetimes, and
their debug information by an enormous amount.

With regard to performance, I can only say that it is fast.  For
example, turnaround time for the unit tests is usually below 150ms and
it is the best I have managed so far.  These tests encompass 2.3k
build() and 1.5k parse() calls on class files, run on virtual threads,
with file sizes ranging from tiny to small.

### Lower than I like

The bulk of API usage stays on a single level of abstraction, for
example working with *Desc entry points instead of *Entry.  There are
rare places where this slipped a little.

The New*Array family of instructions gave me a hard time.  I had
forgotten that there are three of those, and there was some bumping
around involved while re-learning this fact.  (Btw, anewarray() accepts
a primitive ClassDesc and converts it into a reference type, e.g. "I"
to "LI;".)  Two of the instructions have no ClassDesc factory, which
meant I had to wrap the family into an intermediate representation node
to carry essentially (ClassDesc,int) downstream to the CodeBuilder
instance.  I wonder if it is worthwhile to move the three under an
umbrella interface, similar to what ConstantInstruction is doing.
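
To make the three-way split concrete, here is a minimal sketch of such a
node using only java.lang.constant.  `NewArrayNode` and `Op` are
hypothetical names for illustration, not part of Classfile; the sketch
only decides which instruction applies and deliberately makes no
CodeBuilder calls.

```java
import java.lang.constant.ClassDesc;

/**
 * Hypothetical IR node: the component type of the array to create and
 * the number of dimensions that are allocated from counts on the stack.
 */
record NewArrayNode(ClassDesc component, int dimensions) {

    /** The three JVM array-creation instructions. */
    enum Op { NEWARRAY, ANEWARRAY, MULTIANEWARRAY }

    NewArrayNode {
        if (dimensions < 1) {
            throw new IllegalArgumentException("dimensions must be >= 1");
        }
    }

    /** Picks the instruction that fits this (ClassDesc,int) pair. */
    Op op() {
        if (dimensions > 1) {
            return Op.MULTIANEWARRAY;   // operand is the full array type
        }
        // One dimension: primitives need newarray, everything else
        // (classes, interfaces, nested array types) needs anewarray.
        return component.isPrimitive() ? Op.NEWARRAY : Op.ANEWARRAY;
    }

    /** Type of the array reference left on the operand stack. */
    ClassDesc arrayType() {
        return component.arrayType(dimensions);
    }
}
```

With this shape, the check for a primitive component happens before
anewarray is ever chosen, so a descriptor like "I" never reaches it.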

The one place where I use labelToBci() is try/catch/finally.  There is
the special case of exceptionCatch() failing for an empty region, a
condition that in turn can lead to handler blocks becoming
unreachable.  For me, the only robust way to deal with this is to a)
guard against an empty region by inspecting the bcis and b)
subsequently omit the invalid/unreachable parts.

Another call with only a single use is constantPool(), to go from a DMHD instance
to CodeBuilder's (field|invoke)Instruction.  This was a consequence of
DMHD only providing the lookupDescriptor() as String and not as an MTD
as well.  With hindsight, it may have been better for me to recover
the MTD from the String regardless, and to stay on the level of *Desc
throughout.
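
Recovering the *Desc from the String needs nothing beyond
java.lang.constant.  The following is a minimal sketch;
`LookupDescriptors` and its methods are hypothetical helper names.  It
relies on lookupDescriptor() being a method descriptor for method and
constructor kinds and a field type descriptor for the four field kinds.

```java
import java.lang.constant.ClassDesc;
import java.lang.constant.DirectMethodHandleDesc;
import java.lang.constant.MethodTypeDesc;

/** Hypothetical helper: turns a DMHD's lookup descriptor back into a *Desc. */
final class LookupDescriptors {
    private LookupDescriptors() {}

    /** True for the four field-access kinds. */
    static boolean isFieldKind(DirectMethodHandleDesc.Kind kind) {
        return switch (kind) {
            case GETTER, SETTER, STATIC_GETTER, STATIC_SETTER -> true;
            default -> false;
        };
    }

    /** Method type as it would appear in an invoke* instruction. */
    static MethodTypeDesc methodType(DirectMethodHandleDesc dmhd) {
        if (isFieldKind(dmhd.kind())) {
            throw new IllegalArgumentException("field handle: " + dmhd);
        }
        return MethodTypeDesc.ofDescriptor(dmhd.lookupDescriptor());
    }

    /** Field type as it would appear in a get/put instruction. */
    static ClassDesc fieldType(DirectMethodHandleDesc dmhd) {
        if (!isFieldKind(dmhd.kind())) {
            throw new IllegalArgumentException("method handle: " + dmhd);
        }
        return ClassDesc.ofDescriptor(dmhd.lookupDescriptor());
    }
}
```

Note that invocationType() is not a substitute here: for virtual and
interface kinds it includes the receiver as a leading parameter.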

Finally, is there a way to decide between tableswitch and
lookupswitch?  Lacking something better, I'm trying to emulate this
code here:
https://github.com/openjdk/jdk-sandbox/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/jvm/Gen.java#L1320
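
For reference, the heuristic there can be sketched as a standalone
function.  This is written from memory of javac's cost estimate, so the
constants are an assumption to be checked against the linked source;
`SwitchChoice` and `choose` are hypothetical names.

```java
/** Hypothetical helper emulating javac's tableswitch/lookupswitch choice. */
final class SwitchChoice {
    private SwitchChoice() {}

    enum Kind { TABLESWITCH, LOOKUPSWITCH }

    /**
     * @param keys the distinct case values, in any order
     * @return the instruction with the lower estimated space + 3 * time cost
     */
    static Kind choose(int[] keys) {
        if (keys.length == 0) {
            return Kind.LOOKUPSWITCH;   // only a default target
        }
        int lo = keys[0];
        int hi = keys[0];
        for (int k : keys) {
            lo = Math.min(lo, k);
            hi = Math.max(hi, k);
        }
        long nlabels = keys.length;
        // long arithmetic, since hi - lo + 1 can overflow an int
        long tableSpaceCost = 4 + ((long) hi - lo + 1);   // words
        long tableTimeCost = 3;                           // comparisons
        long lookupSpaceCost = 3 + 2 * nlabels;           // words
        long lookupTimeCost = nlabels;                    // comparisons
        return tableSpaceCost + 3 * tableTimeCost
                <= lookupSpaceCost + 3 * lookupTimeCost
                ? Kind.TABLESWITCH
                : Kind.LOOKUPSWITCH;
    }
}
```

Under these constants, dense keys such as 1, 2, 3 select tableswitch,
while sparse keys such as 1 and 1000 select lookupswitch.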

### Lost in translation

One feature I cannot duplicate with Classfile is try/catch/finally in
expression position when the operand stack is not empty.  The old
bytecode generator dealt with this case by unwinding the operand stack
into locals, evaluating the t/c/f, and then rebuilding the operand
stack with the result on top.  But to do this, one needs to know what
the operand stack looks like at the point of the `try`.

Echoing Dan's sentiment, I'm also looking forward to Classfile being
part of the JDK.

-- mva