[External] : Re: VM model and aconst_init
John Rose
john.r.rose at oracle.com
Thu Jan 27 21:00:20 UTC 2022
On 25 Jan 2022, at 2:50, forax at univ-mlv.fr wrote:
> Let's talk a bit about having the L world and the Q world completely
> disjoint at least from the bytecode verifier POV.
Yes please; indeed, that is my strong recommendation. Doing this
(`Q>:<L` perhaps, the “squinting cat operator”?) allows the
interpreter (as well as the verifier) to concentrate on one “mode” at a
time, for values on stack. This in turn unlocks implementation
possibilities which are not as practical when `Q<:L` (our previous plan
of record).
But keeping Q and L disjoint is a relative notion. Clearly they have to
interoperate *somewhere*. And I believe that “somewhere”, where
Q’s can become L’s and vice versa, is in the dynamic behavior of
bytecodes, not in the type system (in the verifier) which relates
multiple bytecodes.
> It means that we need some checkcasts to move in both direction, from
> a Q-type to a L-type and vice-versa.
Correct. Checkcast is the mode-change operator (both directions). No
need to create a new bytecode for either direction.
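To make the mode change concrete, here is a hedged bytecode sketch (the `QPoint;` descriptor syntax follows current prototypes, but the exact checkcast semantics here are illustrative assumptions, not settled design):

```
// hypothetical bytecode; assumes Q-descriptors as in current prototypes
aload_1              // push a nullable reference in L mode (LPoint;)
checkcast QPoint;    // dynamic null/class check; value is now in Q mode
...
checkcast LPoint;    // mode change back to L; this direction cannot fail
```

The point is that the same opcode serves both directions; only the dynamic checks differ.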
> But at the same time, an array of L-type is a subtype of an array of
> Q-type ?
That depends on what you mean by “subtype”. The JVM has multiple
type systems. In the verifier we might have a choice of allowing
`Q[]<:L[]` or not. It’s better not. That puts no limitations on what
the `aaload` and `aastore` instructions do. They just dynamically sense
whether the array is flattened or not and DTRT. That has nothing to do
with verifier types. In fact, it’s a separate job of the runtime
type system.
For the runtime type system, we will say that arrays are covariant
across an *extends* which is an augmentation of the *subtype* system.
So for the runtime `T[]<:U[]` iff `T extends U`, so `Q[]<:L[]` (for the
same class in both modes). But in the verifier `Q[]>:<L[]`. Don’t
like it? Sorry, that’s the best we can do, given Java’s history
with arrays and the weakness of the verifier.
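Today’s `aastore` already works this way for reference arrays; a small sketch using only existing Java types (nothing Valhalla-specific) shows the dynamic check that a covariant `Q[]<:L[]` would extend:

```java
// Existing Java: the static type Object[] admits the store, but
// aastore senses the array's actual element type at runtime and DTRT.
public class CovariantStore {
    public static void main(String[] args) {
        Object[] objs = new Integer[2];  // covariant: Integer[] <: Object[]
        objs[0] = 42;                    // OK: dynamic element check passes
        try {
            objs[1] = "not an int";      // aastore rejects this at runtime
        } catch (ArrayStoreException e) {
            System.out.println("aastore rejected: " + e.getMessage());
        }
    }
}
```

None of this involves the verifier; the verifier happily types both stores, and the array itself enforces the runtime relation.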
> The result is a very uncommon/unconventional type system, and I'm not
> a big fan of surprises in that area.
You are looking for a conventional type system involving arrays? Then
surely Java is not where you should be looking! Arrays are always going
to be an embarrassment, but it’s going to be more embarrassing overall
if we let the whole design pivot around the task of minimizing array
embarrassments.
> Furthermore, i believe that subtyping is a key to avoid multiple
> bytecode verification of the generics code.
I recommend a far simpler technique: Just don’t. Don’t
multiple-verify, and don’t do anything that would *need* such
extra verification (with different types) to give the same answer (as
the first verification). Also, don’t think of a JVM-loaded set of
bytecodes as ever having more than one operational meaning.
(I don’t understand this obsession, not just with you, of imagining
generic bytecodes as statically “recopied” or “recapitulated” or
“stamped out” for each generic instance. I have an uncomfortable
feeling I’m missing something important about that. But I have to
say, that’s not how the JVM will ever work. The JIT is in charge of
managing copies of code paths, and the verifier should not be extended
in any way to support multiple passes or multiple meanings or multiple
copies. Maybe the idea about retyping bytecodes is a hold-over from the
Model 3 experiments, where prototypes were built that elaborated, at
some sort of load time, multiple copies of originally generic bytecodes,
for distinct instances. That was a quagmire in several ways.)
So, let’s not treat generic bytecodes as anything other than
singly-typed by a one-time pass of the verifier. Any attempt to
reverify later from a different POV just repeats earlier trips into
the quagmire.
Now, the *source code* can be *imagined* to be re-typed in generic
instances; that is what generic code *means*. But don’t confuse the
*implementation* of generics with the *semantic model*. The
implementation will (in our plan of record) use something the verifier
knows nothing about: Dynamic type restrictions (or something
equivalent) which manage the projections and embeddings of specific
instance types into the generic bound type. These *do not have to be*
(indeed *must not be*, see below) identified as verifier types. They
are associations of pairs of JVM runtime types.
> By example, with the TypeRestriction attribute [1], the restriction
> has to be subtype of the declared type/descriptor.
No, just no. (Not your fault; I suppose the TR docs are in a confusing
state.) There’s no reason to make that restriction (linking TRs to
the verifier type system), other than the futile obsession I called out
above. And if you do, you will make yourself sorry when you remember about
the verifier’s I/J/F/D types. And for many other reasons, such as
instance types which are inaccessible to the generic code. And the
simple fact that bytecodes are not really generic-capable, directly,
without serious (and futile) redesign.
Making bytecodes directly generic (in the sense of contextually
re-type-able) is never going to be as simple as using a TR mechanism
which occurs *apart from the verifier*.
(Plus, I don’t trust the verifier to take on new jobs. In its
existing job it has been an ever-flowing fountain of security bugs. It
made a hash of jsr/ret, and continually messed up instance creation
paths. I won’t re-hire it for any new jobs. It’s good for matching
descriptors in symbolic references, and maybe for helping the
*interpreter* make a minor optimization to the `getfield` and `putfield`
bytecodes, but that’s about it. The v-table index caching for
`invokevirtual` looks cool but is effectively a no-op. The JIT makes
all of the above irrelevant to performance, once the JIT kicks in. So
the verifier gives an incremental push to startup performance, or
interpreter-only performance if anyone still cares. Plus it gives us
extra “security” except when it doesn’t. OK, I’ll stop with the
verifier now.)
A “type system” for TRs is pretty meaningless, from the JVM POV. In
the JVM, a TR (despite its name) is most fundamentally an operational
specification of how to project (G->I) or embed (I->G) instance types
into generic types. It’s a function pair. It can connect `I=int` to
`G=Object`, which the JVM *knows* are disjoint types. A TR is very
useful for implementing source-level type relations that the JVM
doesn’t know directly about, such as `int extends Object`
(`int<|Object` maybe?). So talking about subtypes in TRs is nonsense as
far as I can see, unless in the special sense that “we use TRs in a
translation strategy, to help extend the JVM’s limited subtyping
notions to *our language’s* more expansive subtyping notions”. If
TRs were a language-level construct, then, yeah, maybe we say a TR has
some constraints about language-level subtypes. But it’s a JVM-only
construct that plugs in where the JVM’s type system doesn’t work
well enough.
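As a sketch of that function-pair reading (all names here are hypothetical, not a real JVM API), a TR connecting `I=int` to `G=Object` can be modeled as a projection plus an embedding:

```java
import java.util.function.Function;

// Hypothetical model of a type restriction as a project/embed pair
// between a generic bound type G and an instance type I. The JVM
// treats int and Object as disjoint; the pair bridges them.
public class TypeRestrictionSketch {
    record ProjectEmbed<G, I>(Function<G, I> project,  // G -> I, may fail dynamically
                              Function<I, G> embed) {} // I -> G, always succeeds

    public static void main(String[] args) {
        ProjectEmbed<Object, Integer> intRestriction =
            new ProjectEmbed<>(g -> (Integer) g,  // projection: cast can throw
                               i -> i);           // embedding: widening, total
        Object g = intRestriction.embed().apply(42);
        int i = intRestriction.project().apply(g);
        System.out.println(i);  // round trip recovers the instance value
    }
}
```

The asymmetry (total embedding, partial projection) is the whole content of the pair; nothing in it requires, or benefits from, a verifier-visible subtype relation.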
(Should we rename TR to PE, “Project-Embedding Pair”? I like TR,
but maybe only because I’m used to the term. As a JVM-level concept,
it doesn’t need to get a highly polished name. The audience for it is
limited.)