null hygiene of Q-types: are we safe yet?

John Rose john.r.rose at
Tue Apr 20 17:17:25 UTC 2021

The current Valhalla JVM prototype allows `CONSTANT_Class[QFoo;]` as
well as `CONSTANT_Class[Foo]`.

It encodes the Q-vs-non-Q distinction as a bit flag in the CP, called
`JVM_CONSTANT_QDescBit` (0x80), which is adjoined bitwise to the
`CONSTANT_Class` tag value (0x07).
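In Java terms, the tag arithmetic looks something like this (the
constant names mirror HotSpot's, but this is an illustrative sketch,
not HotSpot code):

```java
public class QDescBitDemo {
    static final int JVM_CONSTANT_Class = 0x07;    // standard CP tag for a class entry
    static final int JVM_CONSTANT_QDescBit = 0x80; // flag bit marking a Q-descriptor

    // true if the tag carries the Q-descriptor flag
    static boolean hasQDescBit(int tag) {
        return (tag & JVM_CONSTANT_QDescBit) != 0;
    }

    // the underlying CP tag with the flag stripped off
    static int baseTag(int tag) {
        return tag & ~JVM_CONSTANT_QDescBit;
    }

    public static void main(String[] args) {
        int qClassTag = JVM_CONSTANT_Class | JVM_CONSTANT_QDescBit;
        System.out.println(Integer.toHexString(qClassTag));            // prints "87"
        System.out.println(hasQDescBit(qClassTag));                    // prints "true"
        System.out.println(baseTag(qClassTag) == JVM_CONSTANT_Class);  // prints "true"
    }
}
```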

The same information is redundantly encoded in the CP by always
recording the original symbol of a `CONSTANT_Class` item even after
resolution; if it is of the form “Q…;” then that provides the same
information as the QDescBit.
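A minimal sketch of the symbol-side check (the helper name here is
made up; HotSpot's actual check is `Symbol::is_Q_signature`, in C++):

```java
public class QSignatureDemo {
    // A descriptor of the form "Q<name>;" carries the same information
    // as the QDescBit: a leading 'Q', at least one name character, and
    // a trailing ';'.
    static boolean isQSignature(String desc) {
        return desc.length() >= 3 && desc.charAt(0) == 'Q' && desc.endsWith(";");
    }

    public static void main(String[] args) {
        System.out.println(isQSignature("QFoo;")); // prints "true"
        System.out.println(isQSignature("LFoo;")); // prints "false"
        System.out.println(isQSignature("Foo"));   // prints "false"
    }
}
```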

One use of the QDescBit is to gate the selection of ref-mirror
vs. val-mirror (the Q-mirror is the val-mirror).

The “Q bit” is also used in `StackMapReader::parse_verification_type`,
to decide which “flavor” of class type to hand to the verifier.  (It
consults CP::klass_name_at and then Symbol::is_Q_signature, not the
QDescBit.  Perhaps this should be cleaned up one way or the other, to
use a common convention for detecting Q-ness.)

The `checkcast` bytecode excludes `null` when it sees `QDescBit` set,
whether the CP entry is resolved or not.  This preserves null-hygiene
within Q-types.

(I do have some quibbles with the exact semantics of the bytecode,
mainly related to future-proofing.  Specifically, I think a `null`
query should be specified to perform class loading, even though,
today, the answer is no longer in question at that point.  Should
`instanceof` do a similar trick?  No: even if `null` were a valid
inline value some day, you still could not tell which class a `null`
belongs to; all nulls look the same.)
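For contrast, here is today's reference-type behavior, runnable on any
stock JVM: `instanceof` is always false on null, and a reference cast
lets null pass through, which is exactly what `checkcast QFoo;` must
not do:

```java
public class NullSemanticsDemo {
    public static void main(String[] args) {
        Object o = null;

        // instanceof never answers true for null, for any class
        System.out.println(o instanceof String); // prints "false"

        // a reference cast passes null through without any exception;
        // under the scheme above, a checkcast to a Q-type throws instead
        String s = (String) o;
        System.out.println(s == null);           // prints "true"
    }
}
```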

The verifier makes a distinction between Q-types and non-Q-types,
loading Q-descriptors with a special verification type
(`VT::inline_type`) in `SMR::parse_verification_type` as noted above.
The verifier generally keeps references and inlines separate.  This
has the effect of keeping `null`-dirty L-types from contaminating
Q-types during verification.

`VT::is_ref_assignable_from_inline_type` allows a Q-type to promote to
its own regular L-type (ref-type) or those of any of its supers
(including Object).  There is no implicit “demotion” from a regular
L-type down to a Q-type, in the verifier; this must always be done (as
with `null`) using `checkcast QFoo;`.
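Here is a toy model of that assignability rule, with types as plain
strings and a made-up supers table standing in for real class
hierarchy lookup:

```java
import java.util.Map;

public class AssignabilityDemo {
    // hypothetical hierarchy: QFoo's ref-type is Foo, whose super is Object
    static final Map<String, String> SUPER = Map.of("Foo", "Object");

    // A Q-type promotes to its own ref-type or any of its supers;
    // nothing here ever narrows a ref-type down to a Q-type.
    static boolean isRefAssignableFromInline(String refType, String inlineType) {
        // strip the leading 'Q' and trailing ';' to get the class name
        String cur = inlineType.substring(1, inlineType.length() - 1);
        while (cur != null) {
            if (cur.equals(refType)) return true;
            cur = SUPER.get(cur); // walk up the super chain
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRefAssignableFromInline("Foo", "QFoo;"));    // prints "true"
        System.out.println(isRefAssignableFromInline("Object", "QFoo;")); // prints "true"
        System.out.println(isRefAssignableFromInline("Bar", "QFoo;"));    // prints "false"
    }
}
```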

We want null-safety of Q-types to be strong enough so that when the
interpreter runs Q-values through Q-descriptors of method arguments,
nulls have already been excluded before (or during) method entry.  By
the time the JIT kicks in (either C1 or C2) we need Q-types to
participate in scalarized calling conventions (for
compiled-to-compiled code).  This doesn’t work well if there are
wandering nulls that show up for Q-values at method entry.  In
particular, it’s not good enough for the interpreter to catch nulls
_later than method entry_ while compiled code catches them _during
method entry_.  And the above scheme does seem to satisfy these goals.

For bytecode behaviors that are simpler than method argument passing,
we might assist the verifier (as needed) by adding implicit `null`
checks where Q-descriptors appear: when storing into flattened fields,
or when returning a Q-value from a method.
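Expressed as an explicit library call, such an implicit check could
amount to something like this (the method name and its placement are
hypothetical; in practice the interpreter would perform the check
itself):

```java
import java.util.Objects;

public class ImplicitNullCheckDemo {
    // Stand-in for the store path of a flattened (Q-typed) field:
    // a Q-typed slot cannot hold null, so reject it before the store.
    static Object storeToFlattenedField(Object qValue) {
        return Objects.requireNonNull(qValue, "null cannot be stored in a Q-typed field");
    }

    public static void main(String[] args) {
        System.out.println(storeToFlattenedField("payload")); // prints "payload"
        try {
            storeToFlattenedField(null);
        } catch (NullPointerException e) {
            System.out.println("rejected null");              // prints "rejected null"
        }
    }
}
```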

(We will need to revisit the problem of applying extra checks to
method arguments when we do specialized generics, because any of the
arguments to a generic method, and/or a method of a generic class or
interface, might be contextually specialized.  So we haven’t
completely escaped from the complexity of per-argument checks
in the interpreter.  I do think these per-argument checks can
be safely done on method entry, in most cases, which localizes
the complexity somewhat.)

I think all of the above is pretty null-safe.  So, where are the
remaining “cracks in the armor”?
