Another experience report

Sun Aug 7 17:45:47 UTC 2022

On Sun, 7 Aug 2022 at 17:46, Brian Goetz <
brian.goetz at oracle.com> wrote:

> ### Lower than I like
>
> The bulk of API usage stays on a single level of abstraction, for
> example working with *Desc entry points instead of *Entry.  There are
> rare places where this slipped a little.
>
>
> I'm not following what you're getting at here.  Do you mean there are gaps
> where we are not consistent about choices of Desc vs Entry?  Or simply that
> you stuck to the Desc level as a user?  And if so, was there friction here?
>

This was intended as a comment on a common theme of the following four
points, where a small gap in the API or in my understanding forced me
slightly out of the lane in which I would have preferred to stay.  As it
is, I find Classfile highly regular and predictable, even in this early
state.  The one exception that comes to mind is two New*Array
instructions not having a ClassDesc-based of() factory.

> The one place where I use labelToBci() is try/catch/finally.  There is
> the special case of exceptionCatch() failing for an empty region, a
> condition that in turn can lead to handler blocks becoming
> unreachable.  For me, the only robust way to deal with this is to a)
> guard against an empty region by inspecting the bcis and b)
> subsequently omit the invalid/unreachable parts.
>
>
> This raises a good question about how much the library wants to do to
> "fix" questionable bytecode.  We already NOP out unreachable bytecode
> (otherwise the verifier freaks out).  Should we just silently drop catch
> clauses associated with empty try blocks?  (We won't know that they are
> empty until after all the labels are resolved, so we can't usually detect
> this at the point of emitting the catch entry.)  What about when, by the
> time we get to the end of generation, a label used in a try-catch, or
> LVT[T], isn't bound?  Should we throw, or just drop the entry?  I suspect
> one size does not fit all here and we have to design some more
> options-handling.
>

I've currently disabled NOP-ing because I want to know about unreachable
code early.  If I find a situation where I cannot prevent such code with
reasonable effort, I will have to revert to the default behaviour.
exceptionCatch() on an empty region is an example of a situation that I
cannot detect upfront with reasonable effort, and the default of throwing
helped me to think through what is happening there and what the potential
consequences are.

From my limited experience with ifThenElse(), the higher level block-based
entry points are in a better position to produce semantically equivalent
code ("do what I mean") instead of just passing through an instruction
stream verbatim ("do as I say").  This probably requires exclusive control
over both sides of the involved labels' contract, position marking and
position targeting/consumption.

> Another single use only is constantPool(), to go from a DMHD instance
> to CodeBuilder's (field|invoke)Instruction.  This was a consequence of
> DMHD only providing the lookupDescriptor() as String and not as an MTD
> as well.  With hindsight, it may have been better for me to recover
> the MTD from the String regardless, and to stay on the level of *Desc
> throughout.
>
>
> Is there something missing that would bridge that for you?
>

I have many uses of DMHD: handover points to the runtime are stored as
DMHD, Java interop goes from Member to DMHD and then onward, and every
function arity eventually gets assigned one or two DMHD.  The need to
generate a field or invoke instruction from DMHD-like data arises quite
often, and a CodeBuilder method to facilitate this from DMHD components
feels kind of natural.  This needs a mapping from DMHD$Kind to Opcode, and
DMHD exposing its lookup as MTD would allow keeping this on the *Desc
level.

Right now I am using these two function variants:

(defn invoke
  (^CodeBuilder [^CodeBuilder xb ^DirectMethodHandleDesc$Kind kind
                 ^ClassDesc owner ^String method-name
                 ^String lookup-descriptor ^boolean owner-interface?]
   (let [cp (.constantPool xb)
         owner (.classEntry cp owner)
         nm (.utf8Entry cp method-name)
         tp (.utf8Entry cp lookup-descriptor)
         nat (.natEntry cp nm tp)
         opc (case (.refKind kind)
               #_REF_getField 1 Opcode/GETFIELD
               #_REF_getStatic 2 Opcode/GETSTATIC
               #_REF_putField 3 Opcode/PUTFIELD
               #_REF_putStatic 4 Opcode/PUTSTATIC
               #_REF_invokeVirtual 5 Opcode/INVOKEVIRTUAL
               #_REF_invokeStatic 6 Opcode/INVOKESTATIC
               #_REF_invokeSpecial 7 Opcode/INVOKESPECIAL
               #_REF_newInvokeSpecial 8 Opcode/INVOKESPECIAL
               #_REF_invokeInterface 9 Opcode/INVOKEINTERFACE)]
     (if (< (.refKind kind) #_REF_invokeVirtual 5)
       (.fieldInstruction xb opc (.fieldRefEntry cp owner nat))
       (.invokeInstruction xb opc (if owner-interface?
                                    (.interfaceMethodRefEntry cp owner nat)
                                    (.methodRefEntry cp owner nat))))))
  (^CodeBuilder [^CodeBuilder xb ^DirectMethodHandleDesc mhd]
   (invoke xb (.kind mhd) (.owner mhd) (.methodName mhd)
           (.lookupDescriptor mhd) (.isOwnerInterface mhd))))
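
For comparison, here is a minimal Java sketch of how the same bridge could stay on the *Desc level using only java.lang.constant.  The class name DmhdBridge and its three helpers are hypothetical and not part of Classfile; opcodes are returned as plain strings so that the sketch does not depend on any particular Opcode type.  It recovers the lookup type from lookupDescriptor(), which is a field descriptor for the field kinds and a method descriptor otherwise.

```java
import java.lang.constant.ClassDesc;
import java.lang.constant.DirectMethodHandleDesc;
import java.lang.constant.DirectMethodHandleDesc.Kind;
import java.lang.constant.MethodTypeDesc;

// Hypothetical helper, not part of the Classfile API.
final class DmhdBridge {

    /** Name of the JVM instruction that corresponds to a DMHD kind. */
    static String opcodeName(Kind kind) {
        return switch (kind) {
            case GETTER -> "GETFIELD";
            case STATIC_GETTER -> "GETSTATIC";
            case SETTER -> "PUTFIELD";
            case STATIC_SETTER -> "PUTSTATIC";
            case VIRTUAL -> "INVOKEVIRTUAL";
            case STATIC, INTERFACE_STATIC -> "INVOKESTATIC";
            case SPECIAL, INTERFACE_SPECIAL, CONSTRUCTOR -> "INVOKESPECIAL";
            case INTERFACE_VIRTUAL -> "INVOKEINTERFACE";
        };
    }

    /** True for the four field access kinds (refKind 1 to 4). */
    static boolean isField(Kind kind) {
        return kind.refKind < 5;
    }

    /** Lookup type of a method-like DMHD, recovered from its descriptor string. */
    static MethodTypeDesc lookupMethodType(DirectMethodHandleDesc mhd) {
        if (isField(mhd.kind())) {
            throw new IllegalArgumentException("field handle: " + mhd);
        }
        return MethodTypeDesc.ofDescriptor(mhd.lookupDescriptor());
    }

    /** Field type of a field-like DMHD, recovered from its descriptor string. */
    static ClassDesc lookupFieldType(DirectMethodHandleDesc mhd) {
        if (!isField(mhd.kind())) {
            throw new IllegalArgumentException("method handle: " + mhd);
        }
        return ClassDesc.ofDescriptor(mhd.lookupDescriptor());
    }
}
```

With these pieces, a caller would pick the instruction from opcodeName() and pass owner(), methodName() and the recovered MethodTypeDesc or ClassDesc to whatever *Desc-based entry point the builder offers.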

> Finally, is there a way to decide between tableswitch and
> lookupswitch?  Lacking something better, I'm trying to emulate this
> code here:
>
> https://github.com/openjdk/jdk-sandbox/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/jvm/Gen.java#L1320
>
>
> Most compilers use a heuristic based on the size of the two alternatives,
> comparing `(hi-lo) / count` to some density threshold.  You're mostly
> optimizing for bytecode size here, since when the JIT gets its hands on it,
> it has its own heuristics.
>

Understood.  It would be nice if Classfile would offer some "blessed" or
just reasonable heuristic here.
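
As a point of reference, a minimal Java sketch of such a heuristic, modeled on the space/time cost comparison in the javac code linked above as I read it.  The class and method names are hypothetical, and the weights are javac's choice rather than anything Classfile prescribes.

```java
// Hypothetical helper, not part of the Classfile API.
final class SwitchChoice {

    /**
     * Returns true if a tableswitch looks cheaper than a lookupswitch
     * for nlabels case labels whose keys span lo..hi (inclusive).
     */
    static boolean preferTableSwitch(int lo, int hi, int nlabels) {
        // Space is counted in 32-bit words, time in comparisons.
        long tableSpaceCost = 4 + ((long) hi - lo + 1);
        long tableTimeCost = 3;
        long lookupSpaceCost = 3 + 2 * (long) nlabels;
        long lookupTimeCost = nlabels;
        return nlabels > 0
            && tableSpaceCost + 3 * tableTimeCost
               <= lookupSpaceCost + 3 * lookupTimeCost;
    }
}
```

Ten labels covering 0..9 give a table cost of 23 against a lookup cost of 53, so tableswitch wins; three labels spread over 0..1000 give 1014 against 18, so lookupswitch wins.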

>
> ### Lost in translation
>
> One feature I cannot duplicate with Classfile is try/catch/finally in
> expression position when the operand stack is not empty.  The old
> bytecode generator dealt with this case by unwinding the operand stack
> into locals, evaluating the t/c/f, and then rebuilding the operand
> stack with the result on top.  But to do this, one needs to know what
> the operand stack looks like at the point of the `try`.
>
>
> Interesting point.  We do not build stack maps as we go, so we don't have
> our hands readily on the stack state.  However, I could imagine an overload
> of the try-catch builder that would let you feed it a TypeKind[], and that
> would use allocateLocal to automate the push/pop logic.  This is something
> you should be able to build from outside the library, too; this would be a
> good experiment to try.  (You'd have to manually compute the stack state.)
>

I'm currently experimenting with minimal stack tracking, basically an
approximated flag "no stack operands" passed down during parsing.  If a
try is reached without this flag being present, it is wrapped in a
no-argument closure and called.  My hope is that this is much easier to
get right than accurate stack tracking.
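
To illustrate the idea, here is a toy Java sketch, not my actual compiler: the expression types and the plan() function are made up for this example, and the result is a textual plan rather than bytecode.  The flag is passed down unchanged to the first operand of a call and cleared for every later operand, since earlier operands are then sitting on the stack.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names; it only shows how the flag propagates.
final class StackFlagSketch {
    interface Expr {}
    record Const(int value) implements Expr {}
    record Call(String fn, List<Expr> args) implements Expr {}
    record Try(Expr body) implements Expr {}

    /** Describes how e would be compiled, given the "no stack operands" flag. */
    static String plan(Expr e, boolean stackEmpty) {
        if (e instanceof Const c) {
            return Integer.toString(c.value());
        }
        if (e instanceof Call call) {
            List<String> parts = new ArrayList<>();
            boolean empty = stackEmpty;
            for (Expr arg : call.args()) {
                parts.add(plan(arg, empty));
                empty = false; // the operand just evaluated now occupies the stack
            }
            return call.fn() + "(" + String.join(", ", parts) + ")";
        }
        if (e instanceof Try t) {
            // The try body always starts with an empty stack, either in the
            // current frame or in the fresh frame of the wrapping closure.
            String body = "try(" + plan(t.body(), true) + ")";
            return stackEmpty ? body : "call(closure(" + body + "))";
        }
        throw new IllegalArgumentException("unknown expression: " + e);
    }
}
```

The flag is conservative: clearing it too eagerly only costs an unnecessary closure, whereas leaving it set by mistake would produce invalid code.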

--mva