Trying out the API for qbicc

Wed Jul 27 20:24:21 UTC 2022

On Wed, Jul 27, 2022 at 1:41 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> In ASM, you would use the "tree" API, to materialize the body into a
> random-access data structure.  This is a bit unfortunate, because (a) the
> tree API is much slower than the streaming API, and (b) it is also somewhat
> different from the streaming API.  (And mutable.)
>

Indeed, I abandoned using ASM for parsing for this reason, in favor of a
hand-written ByteBuffer-backed parser. It goes without saying that I would
be pleased to use something else though. :-)

> We intend to improve on that, by having the "materialized" API just be
> "put the elements in a list/tree structure".  For ClassModel/MethodModel,
> you can see the idea in play; you can stream the elements of a ClassModel,
> and you'll get methods, fields, etc, but you could also just call
> ClassModel::fields and it will materialize (and cache) a List<FieldModel>
> and return that.  What you want is the equivalent for CodeModel, which is
> conceptually similar but we are missing a few things.
>
> You can of course call CodeModel::elementList and get a List<CodeElement>
> out, which includes the label targets inline.  What's missing is the
> ability to map labels to *list indexes*.  We know we want this, we made a
> stab at it in an early prototype, it was a mess (because some other things
> were a mess), but we would like to return to this.
>

Great! I do especially like the lazy (or at the very least, potentially
lazy) materialization, which was another contributing reason for leaving
ASM behind on the parsing end.
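
For what it's worth, the label-to-list-index mapping discussed above could
be sketched roughly as follows. The `Element`, `Insn`, and `LabelTarget`
types are hypothetical stand-ins I made up for illustration; they are not
the real `jdk.classfile` types, and labels are just strings here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LabelIndexSketch {
    // Hypothetical stand-ins for code elements (NOT the real API types).
    sealed interface Element permits Insn, LabelTarget {}
    record Insn(String mnemonic) implements Element {}
    record LabelTarget(String label) implements Element {}

    // Walk the materialized element list once and record the list index
    // at which each label target appears.
    static Map<String, Integer> indexLabels(List<Element> elements) {
        Map<String, Integer> indexes = new HashMap<>();
        for (int i = 0; i < elements.size(); i++) {
            if (elements.get(i) instanceof LabelTarget target) {
                indexes.put(target.label(), i);
            }
        }
        return indexes;
    }
}
```

A consumer could then resolve a branch target to a position in the list
with a single map lookup instead of rescanning the elements.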

> This is related to a comment recently from Rafael, in that this works when
> we are traversing a *bound* CodeModel, but not a buffered code model (which
> might result from an intermediate stage of a transformation.)  If we are OK
> with making operations like bci() partial, we can address this by, say,
> defining a refined `Iterator<CodeElement>` that also has a bci() accessor.
> This works when parsing, but not necessarily when transforming, but that
> might be OK.
>

OK, I look forward to seeing whether or how this gets addressed.

> I think you may be mixing the Opcode and Instruction abstractions?  The
> `Opcode` abstraction is explicitly about bytecodes and bytecode-specific
> metadata, whereas an Instruction is an instantiation of an Opcode +
> operands.  (Some instructions, of course, have no operands (e.g.,
> `iadd`); in this case, you'll notice the implementation has a singleton
> cache.)
>

Maybe I explained poorly but I was specifically thinking of Opcodes and
their characteristics, independently of any particular realization of an
instruction in a code model.

> Would it not make sense to make `Opcode` a sealed interface, with an enum
> for each opcode shape?
>
>
> We tried something like this early on.  It ran into the problem that
> switching over multiple enums in one switch is not supported.  So having
> multiple enums may be more rich in modeling, but clients pay a penalty --
> multiple switches.  This didn't feel like a good trade.  (It is possible
> the API and implementation have evolved since then, to make this less
> problematic, but that would have to be established.)
>

Ah, I had assumed that switching over multiple enums was addressed in the
pattern-matching-switch update. Not having personally kept on top of the
latest developments there, I had quickly sketched up a test in IntelliJ and
it appeared to work so long as the static type of the switch argument was a
sealed interface which permits only enum types *and* you happened to have
static-imported the enum constant names (for syntactic reasons I suppose).
But I didn't actually verify that this was allowed by spec and sure enough,
`javac` rejects it as it does not consider the statically imported enum
values to be constant expressions. Oh well.
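
To make the "multiple switches" penalty concrete, here is a minimal sketch
of what a client ends up writing when `Opcode` is a sealed interface over
several enums. The shape names (`BranchOpcode`, `ArithmeticOpcode`) and the
`describe` method are hypothetical, chosen only for illustration; this
assumes Java 17 or later.

```java
final class OpcodeShapesSketch {
    // Hypothetical opcode shapes; not the real API.
    sealed interface Opcode permits BranchOpcode, ArithmeticOpcode {}
    enum BranchOpcode implements Opcode { IFEQ, IFNE }
    enum ArithmeticOpcode implements Opcode { IADD, ISUB }

    static String describe(Opcode op) {
        // First dispatch on the enum type, then switch within each enum:
        // one switch per opcode shape rather than one switch overall.
        if (op instanceof BranchOpcode b) {
            return switch (b) {
                case IFEQ -> "branch if zero";
                case IFNE -> "branch if nonzero";
            };
        } else if (op instanceof ArithmeticOpcode a) {
            return switch (a) {
                case IADD -> "int add";
                case ISUB -> "int subtract";
            };
        }
        // Not reachable while the sealed hierarchy has only these two enums.
        throw new AssertionError("unexpected opcode: " + op);
    }
}
```

Each inner switch is exhaustive over its own enum, but the outer dispatch
has to be written by hand.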

> Obviously this is wandering dangerously close to the bikeshed borderline,
> however one other real-world advantage is that an enum constant in a more
> specific `*Opcode` subtype can store more useful information about
> itself that a consumer could use; for example, the opcode constant for
> `IFEQ` could have a method `complement` which yields `IFNE`, which can be
> useful for simplifying some code generators (and I can think of specific
> cases both within qbicc and within Quarkus where this would have been
> useful).
>
>
> This method exists in the library as an Opcode -> Opcode method.
>

Ah yes, I found it in `BytecodeHelpers`, excellent. That's in the `impl`
subpackage though, so it doesn't feel very "public". Perhaps that class
could be moved into the `jdk.classfile` package?
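
For reference, the kind of thing I had in mind looks roughly like the
sketch below. This is a hypothetical enum written for illustration, not
the actual code in `BytecodeHelpers`, and it only covers the
single-operand integer branches.

```java
final class ComplementSketch {
    // Hypothetical branch opcode enum; not the real API.
    enum BranchOpcode {
        IFEQ, IFNE, IFLT, IFGE, IFGT, IFLE;

        // Returns the opcode that branches exactly when this one does not.
        BranchOpcode complement() {
            return switch (this) {
                case IFEQ -> IFNE;
                case IFNE -> IFEQ;
                case IFLT -> IFGE;
                case IFGE -> IFLT;
                case IFGT -> IFLE;
                case IFLE -> IFGT;
            };
        }
    }
}
```

A code generator can then invert a conditional branch by swapping its
targets and calling `complement()` instead of carrying its own lookup
table.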

-- 
- DML • he/him