<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    Thanks for giving it a try!  <br>

    <br>

    <div class="moz-cite-prefix">On 7/27/2022 10:42 AM, David Lloyd

      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:CANghgrT=r6qSJTck80cVNMu8nZepzhHOu7CDaaJzmcay3m-PCQ@mail.gmail.com">

      

      <div dir="ltr">

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Our qbicc

          project handles classfiles extensively, as one might imagine,

          for both parsing and generation of classes. I've been playing

          around with using the new API instead. <span class="gmail_default" style="font-family:Arial,Helvetica,sans-serif">S</span><span class="gmail_default">o far, I am liking this API a lot; the

            design is overall very sensible and usable as far as I can

            tell; though </span>it is still pretty early and not a lot

          is working yet, I do have some initial impressions.</div>

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>

        </div>

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Parsing</div>

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>

        </div>

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">When we parse a

          method body, our parser is processing it directly into SSA

          form for later analysis. We're doing this in a depth-first

          recursive manner, where we start from the top of the method,

          and at each instruction which corresponds to a basic block

          terminator (which would be things like GOTO*, IF*, *SWITCH,

          *RETURN, ATHROW, and also some special cases like method

          invocation inside of a `try` block), we close up the current

          block and recursively process each unprocessed successor block

          (if any). In this way we naturally ignore any unreachable

          bytecodes - not just from bizarrely-formed class files (though

          this is possible) but also from parsing conditional constructs

          where we can establish a constant condition early in

          processing.</div>

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>

        </div>

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">However this

          approach depends on being able to randomly access the bytecode

          body. This seems doable with the new API, but unless I missed

          some helper method(s), to do so apparently requires iterating

          the instruction list, collecting all of the labels, and

          building a label-to-integer-key mapping to locate the list

          indices where processing should be resumed for a given label. 

          It would certainly be nice to be able to have a more flexible

          seeking solution, like a special iterator API which can seek

          based on labels for example.</div>

      </div>

    </blockquote>

    <br>

    In ASM, you would use the "tree" API, to materialize the body into a

    random-access data structure.  This is a bit unfortunate, because

    (a) the tree API is much slower than the streaming API, and (b) it

    is also somewhat different from the streaming API.  (And mutable.) 

    <br>

    <br>

    We intend to improve on that, by having the "materialized" API just

    be "put the elements in a list/tree structure".  For

    ClassModel/MethodModel, you can see the idea in play; you can stream

    the elements of a ClassModel, and you'll get methods, fields, etc,

    but you could also just call ClassModel::fields and it will

    materialize (and cache) a List&lt;FieldModel&gt; and return that. 

    What you want is the equivalent for CodeModel, which is conceptually

    similar but we are missing a few things.  <br>

    <br>

    You can of course call CodeModel::elementList and get a

    List&lt;CodeElement&gt; out, which includes the label targets

    inline.  What's missing is the ability to map labels to *list

    indexes*.  We know we want this; we made a stab at it in an early

    prototype, but it was a mess (because some other things were a mess),

    and we would like to return to it.  <br>
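    <br>
    In the meantime, the workaround David describes can be sketched as
    follows.  This is only an illustration: `Elem`, `LabelMark`, and
    `Insn` are made-up stand-ins for the real element types, not the
    library's API, and the stand-in labels compare by name where real
    labels would more likely compare by identity.  <br>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the real CodeElement hierarchy (not the library's types).
sealed interface Elem permits LabelMark, Insn {}
record LabelMark(String name) implements Elem {}                   // a label target appearing inline
record Insn(String mnemonic, LabelMark target) implements Elem {}  // target is null for non-branches

class LabelIndexDemo {
    // One pass over the materialized list, recording the list index of each label.
    static Map<LabelMark, Integer> indexLabels(List<Elem> elements) {
        Map<LabelMark, Integer> index = new HashMap<>();
        for (int i = 0; i < elements.size(); i++) {
            if (elements.get(i) instanceof LabelMark l) {
                index.put(l, i);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        LabelMark exit = new LabelMark("exit");
        List<Elem> code = List.of(
                new Insn("iload_0", null),
                new Insn("ifeq", exit),
                new Insn("iconst_1", null),
                new Insn("ireturn", null),
                exit,
                new Insn("iconst_0", null),
                new Insn("ireturn", null));
        Map<LabelMark, Integer> index = indexLabels(code);
        // Look up where processing should resume for the branch at list index 1.
        Insn branch = (Insn) code.get(1);
        System.out.println("resume at list index " + index.get(branch.target()));
    }
}
```

    With such a map in hand, a depth-first parser can jump straight to
    the list position of any successor block.  <br>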

    <br>

    <blockquote type="cite" cite="mid:CANghgrT=r6qSJTck80cVNMu8nZepzhHOu7CDaaJzmcay3m-PCQ@mail.gmail.com">

      <div dir="ltr">Another issue with the strong encapsulation of BCI

        as labels is that it does not seem possible to find the BCI of

        an arbitrary instruction. </div>

    </blockquote>

    <br>

    This is related to a comment recently from Rafael, in that this

    works when we are traversing a *bound* CodeModel, but not a buffered

    code model (which might result from an intermediate stage of a

    transformation.)  If we are OK with making operations like bci()

    partial, we can address this by, say, defining a refined

    `Iterator&lt;CodeElement&gt;` that also has a bci() accessor.  This

    works when parsing, but not necessarily when transforming, but that

    might be OK.  <br>
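    <br>
    One possible shape for such an iterator, as a sketch only: the
    names `BciIterator` and `SizedInsn` are invented for illustration,
    and the `bound` flag stands in for "this model is backed by real
    class file bytes".  Partiality is modeled here with an
    `OptionalInt`; throwing would be another option.  <br>

```java
import java.util.Iterator;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical: an iterator that reports a bytecode index only when one is known.
interface BciIterator<T> extends Iterator<T> {
    // BCI of the element most recently returned by next(); empty when unknown.
    OptionalInt bci();
}

// Stand-in instruction that knows its own encoded size in bytes.
record SizedInsn(String mnemonic, int sizeInBytes) {}

class BciIteratorDemo {
    static BciIterator<SizedInsn> iterate(List<SizedInsn> insns, boolean bound) {
        Iterator<SizedInsn> it = insns.iterator();
        return new BciIterator<>() {
            private int nextBci = 0;     // BCI of the element next() will return
            private int currentBci = -1; // BCI of the element last returned, -1 before the first next()

            @Override public boolean hasNext() { return it.hasNext(); }

            @Override public SizedInsn next() {
                SizedInsn insn = it.next();
                currentBci = nextBci;
                nextBci += insn.sizeInBytes();
                return insn;
            }

            @Override public OptionalInt bci() {
                // Partial: only a bound model has meaningful offsets.
                return bound && currentBci >= 0 ? OptionalInt.of(currentBci) : OptionalInt.empty();
            }
        };
    }

    public static void main(String[] args) {
        List<SizedInsn> insns = List.of(
                new SizedInsn("aload_0", 1),
                new SizedInsn("bipush", 2),
                new SizedInsn("ireturn", 1));
        BciIterator<SizedInsn> it = iterate(insns, true);
        while (it.hasNext()) {
            SizedInsn insn = it.next();
            System.out.println(it.bci().getAsInt() + ": " + insn.mnemonic());
        }
    }
}
```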

    <br>

    <blockquote type="cite" cite="mid:CANghgrT=r6qSJTck80cVNMu8nZepzhHOu7CDaaJzmcay3m-PCQ@mail.gmail.com">

      <div dir="ltr">

        <div>Generation

          <div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>

          </div>

          <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">We also

            generate classes for various purposes so I was doing some

            experiments with this as well. So far I have found this to

            be fairly straightforward, but I have so far encountered one

            minor API issue with this API (which to be completely fair

            ASM also suffers from).</div>

          <div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>

          </div>

          <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">With ASM,

            when you're emitting instructions, you have to know not only

            the opcode of the instruction you're emitting but also the

            particular API method which corresponds to the correct

            instruction shape. This is excusable to an extent within

            ASM, because the opcode argument is an `int` so if there was

            only one overloaded method name with every shape, it might

            be too easy to make a mistake (never mind how relatively

            poetic it would have been for the main assembly method in

            ASM to be called `asm` :-) ).</div>

        </div>

      </div>

    </blockquote>

    <br>

    You should think of the generation methods as layered.  At the most

    abstract, there is `with(CodeElement)`.  Every other generation

    method bottoms out here.  At the next level, there are the ones that

    correspond to the coarse categories in the data model, such as

    `load(kind, slot)` or `operator(opc)`.  At the finest level, there

    are methods for aload_0() and ishl(), which again all bottom out in

    `with(CodeElement)`.  <br>
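    <br>
    A miniature of that layering, to make the idea concrete.  The method
    names follow the ones mentioned above, but `MiniBuilder`, `GenElem`,
    and the record types are invented for this sketch, and `operator`
    takes a plain string here rather than an opcode.  <br>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; not the real builder or element types.
enum Kind { INT, REF }
sealed interface GenElem permits Load, Operator {}
record Load(Kind kind, int slot) implements GenElem {}
record Operator(String mnemonic) implements GenElem {}

class MiniBuilder {
    final List<GenElem> out = new ArrayList<>();

    // Most abstract layer: every other method ends up here.
    MiniBuilder with(GenElem e) { out.add(e); return this; }

    // Middle layer: one method per coarse category of the data model.
    MiniBuilder load(Kind kind, int slot) { return with(new Load(kind, slot)); }
    MiniBuilder operator(String mnemonic) { return with(new Operator(mnemonic)); }

    // Finest layer: one method per instruction, delegating downward.
    MiniBuilder aload_0() { return load(Kind.REF, 0); }
    MiniBuilder ishl() { return operator("ishl"); }

    public static void main(String[] args) {
        MiniBuilder b = new MiniBuilder()
                .aload_0()                      // finest-grained
                .load(Kind.INT, 1)              // category-level
                .with(new Operator("ishl"));    // raw element
        System.out.println(b.out);
    }
}
```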

    <br>

    Our assumption is that most "hand coded" generation code will prefer

    the most fine-grained ones, pattern-driven transformation code will

    probably do things like match on `LoadInstruction` and turn around

    and call load() again, maybe with different arguments, and "purely

    mechanical" transformation code will probably prefer just making

    elements and shoveling them down the pipeline.  <br>

    <br>

    <blockquote type="cite" cite="mid:CANghgrT=r6qSJTck80cVNMu8nZepzhHOu7CDaaJzmcay3m-PCQ@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">However, this

            API is otherwise very strongly typed, taking full advantage

            of the new pattern matching and sealing capabilities. So I

            was a bit surprised when all instruction opcodes were still

            represented by a single type (in this case an `enum`), even

            though there are enough different opcode shapes or

            characteristics to warrant *six* different constructors.</div>

        </div>

      </div>

    </blockquote>

    <br>

    I think you may be mixing the Opcode and Instruction abstractions? 

    The `Opcode` abstraction is explicitly about bytecodes and

    bytecode-specific metadata, whereas an Instruction is an

    instantiation of an Opcode + operands.  (Some instructions, of

    course, have no operands (e.g., `iadd`); in this case, you'll

    notice the implementation has a singleton cache.)  <br>

    <br>

    The Opcode type mostly serves the implementation, to facilitate

    mapping to metadata (instruction size, kind, etc), and to manage the

    weirdness of the WIDE opcodes.  (If it were not for WIDE, I'd

    probably have just gone with `byte` and lookup functions.)  <br>

    <br>

    I find it a little unfortunate that some methods like `branch`

    require an Opcode argument -- feels like mixing levels, as you

    suggest -- but the alternatives were worse.  <br>

    <br>

    <blockquote type="cite" cite="mid:CANghgrT=r6qSJTck80cVNMu8nZepzhHOu7CDaaJzmcay3m-PCQ@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Would it not

            make sense to make `Opcode` a sealed interface, with an enum

            for each opcode shape? </div>

        </div>

      </div>

    </blockquote>

    <br>

    We tried something like this early on.  It ran into the problem that

    switching over multiple enums in one switch is not supported.  So

    having multiple enums may be more rich in modeling, but clients pay

    a penalty -- multiple switches.  This didn't feel like a good

    trade.  (It is possible the API and implementation have evolved since

    then, to make this less problematic, but that would have to be

    established.)<br>
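    <br>
    To illustrate the penalty with a small sketch (the `Op`, `BranchOp`,
    and `ArithOp` types are hypothetical, not anything in the library):
    with per-shape enums under a sealed interface, a client first has to
    work out which enum it is holding and only then switch within
    it.  <br>

```java
// Hypothetical split of opcodes into per-shape enums under a sealed interface.
sealed interface Op permits BranchOp, ArithOp {}
enum BranchOp implements Op { IFEQ, IFNE, GOTO }
enum ArithOp implements Op { IADD, ISUB }

class MultiEnumSwitchDemo {
    static String describe(Op op) {
        // First dispatch on the enum type, then switch inside each enum separately.
        if (op instanceof BranchOp b) {
            switch (b) {
                case IFEQ: return "branch if zero";
                case IFNE: return "branch if non-zero";
                default:   return "unconditional branch";
            }
        } else if (op instanceof ArithOp a) {
            switch (a) {
                case IADD: return "int add";
                default:   return "int subtract";
            }
        }
        throw new AssertionError("unreachable: Op is sealed");
    }

    public static void main(String[] args) {
        System.out.println(describe(BranchOp.IFEQ));
        System.out.println(describe(ArithOp.IADD));
    }
}
```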

    <br>

    <blockquote type="cite" cite="mid:CANghgrT=r6qSJTck80cVNMu8nZepzhHOu7CDaaJzmcay3m-PCQ@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">In this way,

            instead of having a method for each of *many* (but not all)

            instructions (many of which are highly similar internally)

            and several overlapping ASM-like "emit this shape by name"

            methods for *some* other instructions - which ambiguously

            accepts a plain, generally-typed Opcode - there could be

            (many fewer) emit methods which accept a specific opcode

            type as the first argument and the correct argument values

            for subsequent arguments? </div>

        </div>

      </div>

    </blockquote>

    <br>

    I don't think there would be "many fewer" methods; it just means

    that some of the type checking can be moved from runtime to compile

    time (e.g., branch(opc, label) wouldn't let you use IADD as the

    opcode).   But I would think all the same methods would be there,

    just with tighter types.  <br>
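    <br>
    A sketch of the difference, using invented types (`AnyOpcode` plays
    the role of the general opcode enum, `BranchOpcode` the tighter
    one).  The method count is the same; only the place where a bad
    opcode is rejected moves.  <br>

```java
// Hypothetical stand-ins for a general opcode enum and a branch-only enum.
enum AnyOpcode {
    IADD, IFEQ, GOTO;
    boolean isBranch() { return this != IADD; }
}
enum BranchOpcode { IFEQ, GOTO }

class TightTypesDemo {
    // Loosely typed: a non-branch opcode is only caught at run time.
    static String branch(AnyOpcode opc, String label) {
        if (!opc.isBranch()) {
            throw new IllegalArgumentException(opc + " is not a branch opcode");
        }
        return opc + " -> " + label;
    }

    // Tightly typed: an IADD-like constant does not exist in BranchOpcode,
    // so branch(BranchOpcode.IADD, "L1") would not compile.
    static String branch(BranchOpcode opc, String label) {
        return opc + " -> " + label;
    }

    public static void main(String[] args) {
        System.out.println(branch(BranchOpcode.IFEQ, "L1"));
        try {
            branch(AnyOpcode.IADD, "L1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at run time: " + e.getMessage());
        }
    }
}
```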

    <br>

    <blockquote type="cite" cite="mid:CANghgrT=r6qSJTck80cVNMu8nZepzhHOu7CDaaJzmcay3m-PCQ@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Obviously this

          is wandering dangerously close to the bikeshed borderline,

          however one other real-world advantage is that an enum

          constant in a more specific `*Opcode` subtype type can store

          more useful information about itself that a consumer could

          use; for example, the opcode constant for `IFEQ` could have a

          method `complement` which yields `IFNE`, which can be useful

          for simplifying some code generators (and I can think of

          specific cases both within qbicc and within Quarkus where this

          would have been useful).</div>

      </div>

    </blockquote>

    <br>

    This method exists in the library as an Opcode -> Opcode method.<br>
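    <br>
    For readers following along, the general shape of such a function,
    sketched over a made-up `CondOpcode` enum (this is not the library's
    method or its name, just an illustration of the mapping):  <br>

```java
// Hypothetical enum covering the single-operand conditional branches.
enum CondOpcode { IFEQ, IFNE, IFLT, IFGE, IFGT, IFLE }

class ComplementDemo {
    // Maps each conditional branch to the one that tests the opposite condition.
    static CondOpcode complement(CondOpcode opc) {
        switch (opc) {
            case IFEQ: return CondOpcode.IFNE;
            case IFNE: return CondOpcode.IFEQ;
            case IFLT: return CondOpcode.IFGE;
            case IFGE: return CondOpcode.IFLT;
            case IFGT: return CondOpcode.IFLE;
            case IFLE: return CondOpcode.IFGT;
            default: throw new AssertionError("unhandled opcode " + opc);
        }
    }

    public static void main(String[] args) {
        for (CondOpcode opc : CondOpcode.values()) {
            System.out.println(opc + " <-> " + complement(opc));
        }
    }
}
```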

    <br>

    Cheers,<br>

    -Brian<br>

  </body>

</html>