[External] : Re: VM model and aconst_init

John Rose john.r.rose at oracle.com
Thu Jan 27 21:00:20 UTC 2022


On 25 Jan 2022, at 2:50, forax at univ-mlv.fr wrote:

> Let's talk a bit about having the L world and the Q world completely 
> disjoint at least from the bytecode verifier POV.

Yes please; indeed, that is my strong recommendation.  Doing this 
(`Q>:<L` perhaps, the “squinting cat operator”?) allows the 
interpreter (as well as the verifier) to concentrate on one “mode” at 
a time for values on the stack.  This in turn unlocks implementation 
possibilities which are not as practical when `Q<:L` (our previous plan 
of record).

But keeping Q and L disjoint is a relative notion.  Clearly they have to 
interoperate *somewhere*.  And I believe that “somewhere”, where 
Q’s can become L’s and vice versa, is in the dynamic behavior of 
bytecodes, not in the type system (in the verifier) which relates 
multiple bytecodes.

> It means that we need some checkcasts to move in both direction, from 
> a Q-type to a L-type and vice-versa.

Correct.  Checkcast is the mode-change operator (both directions).  No 
need to create a new bytecode for either direction.
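
For concreteness, here is a hedged source-level sketch.  It assumes the 
current prototype’s surface syntax, in which `Point.ref` is the nullable 
L-mode view of a value class `Point`; under the disjoint-modes reading 
above, both directions would translate to plain checkcasts (on the 
Q-descriptor and the L-descriptor respectively), with the mode change 
happening dynamically:

    // Hypothetical sketch; prototype syntax, subject to change.
    // Assumes a value class Point with an L-mode view Point.ref.
    static Point toQ(Point.ref r) {
        // would compile to a checkcast on the Q-descriptor:
        // a dynamic mode change L -> Q, which must reject null
        return (Point) r;
    }
    static Point.ref toL(Point p) {
        // under disjoint modes this direction is also just a checkcast,
        // this time on the L-descriptor; no new bytecode either way
        return (Point.ref) p;
    }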

> But at the same time, an array of L-type is a subtype of an array of 
> Q-type ?

That depends on what you mean by “subtype”.  The JVM has multiple 
type systems.  In the verifier we might have a choice of allowing 
`Q[]<:L[]` or not.  It’s better not.  That puts no limitations on what 
the `aaload` and `aastore` instructions do.  They just dynamically sense 
whether the array is flattened or not and DTRT.  That has nothing to do 
with verifier types.  In fact, it’s a separate job of the runtime 
type system.

For the runtime type system, we will say that arrays are covariant 
across an *extends* which is an augmentation of the *subtype* system.  
So for the runtime, `T[]<:U[]` iff `T extends U`; hence `Q[]<:L[]` (for 
the same class in both modes).  But in the verifier `Q[]>:<L[]`.  Don’t 
like it?  Sorry, that’s the best we can do, given Java’s history 
with arrays and the weakness of the verifier.
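
For comparison, the dynamic half of this story is already how reference 
arrays behave today: `aastore` checks the stored element against the 
actual array type at run time, regardless of what the verifier proved 
statically.  A small standard-Java example:

    // Standard Java today: the Object[] variable may actually hold a
    // String[], and aastore checks each stored element dynamically.
    static void storeCheckDemo() {
        Object[] oa = new String[1];
        oa[0] = "ok";                 // passes the dynamic store check
        oa[0] = Integer.valueOf(42);  // ArrayStoreException at run time
    }

The new part is only that for `Q[]` and `L[]` the covariance itself is 
runtime-only: `T[]<:U[]` iff `T extends U` for the runtime type system, 
while the verifier keeps `Q[]>:<L[]` and `aaload`/`aastore` sense 
flattening dynamically.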

> The result to a very uncommon/unconventional type system, and i'm not 
> a big fan of surprises in that area.

You are looking for a conventional type system involving arrays?  Then 
surely Java is not where you should be looking!  Arrays are always going 
to be an embarrassment, but it’s going to be more embarrassing overall 
if we let the whole design pivot around the task of minimizing array 
embarrassments.

> Furthermore, i believe that subtyping is a key to avoid multiple 
> bytecode verification of the generics code.

I recommend a far simpler technique:  Just don’t.  Don’t 
multiple-verify, and don’t do anything that would *need* such 
extra verification (with different types) to give the same answer (as 
the first verification).  Also, don’t think of a JVM-loaded set of 
bytecodes as ever having more than one operational meaning.

(I don’t understand this obsession, not just yours, with imagining 
generic bytecodes as statically “recopied” or “recapitulated” or 
“stamped out” for each generic instance.  I have an uncomfortable 
feeling I’m missing something important about that.  But I have to 
say, that’s not how the JVM will ever work.  The JIT is in charge of 
managing copies of code paths, and the verifier should not be extended 
in any way to support multiple passes or multiple meanings or multiple 
copies.  Maybe the idea about retyping bytecodes is a hold-over from the 
Model 3 experiments, where prototypes were built that elaborated, at 
some sort of load time, multiple copies of originally generic bytecodes, 
for distinct instances.  That was a quagmire in several ways.)

So, let’s not treat generic bytecodes as anything other than 
singly-typed by a one-time pass of the verifier.  Any attempt to 
reverify later from a different POV just repeats earlier trips into 
the quagmire.

Now, the *source code* can be *imagined* to be re-typed in generic 
instances; that’s what generic code *means*.  But don’t confuse the 
*implementation* of generics with the *semantic model*.  The 
implementation will (in our plan of record) use something the verifier 
knows nothing about:  Dynamic type restrictions (or something 
equivalent) which manage the projections and embeddings of specific 
instance types into the generic bound type.  These *do not have to be* 
(indeed *must not be*, see below) identified as verifier types.  They 
are associations of pairs of JVM runtime types.

> By example, with the TypeRestriction attribute [1], the restriction 
> has to be subtype of the declared type/descriptor.

No, just no.  (Not your fault; I suppose the TR docs are in a confusing 
state.)  There’s no reason to make that restriction (linking TRs to 
the verifier type system), other than the futile obsession I called out 
above.  And if you do, you will make yourself sorry when you remember 
the verifier’s I/J/F/D types.  And for many other reasons, such as 
instance types which are inaccessible to the generic code.  And the 
simple fact that bytecodes are not really generic-capable, directly, 
without serious (and futile) redesign.

Making bytecodes directly generic (in the sense of contextually 
re-type-able) is never going to be as simple as using a TR mechanism 
which occurs *apart from the verifier*.

(Plus, I don’t trust the verifier to take on new jobs.  In its 
existing job it has been an ever-flowing fountain of security bugs.  It 
made a hash of jsr/ret, and continually messed up instance creation 
paths.  I won’t re-hire it for any new jobs.  It’s good for matching 
descriptors in symbolic references, and maybe for helping the 
*interpreter* make a minor optimization to the `getfield` and `putfield` 
bytecodes, but that’s about it.  The v-table index caching for 
`invokevirtual` looks cool but is effectively a no-op.  The JIT makes 
all of the above irrelevant to performance, once the JIT kicks in.  So 
the verifier gives an incremental push to startup performance, or 
interpreter-only performance if anyone still cares.  Plus it gives us 
extra “security” except when it doesn’t.  OK, I’ll stop with the 
verifier now.)

A “type system” for TRs is pretty meaningless, from the JVM POV.  In 
the JVM, a TR (despite its name) is most fundamentally an operational 
specification of how to project (G->I) or embed (I->G) instance types 
into generic types.  It’s a function pair.  It can connect `I=int` to 
`G=Object`, which the JVM *knows* are disjoint types.  A TR is very 
useful for implementing source-level type relations that the JVM 
doesn’t know directly about, such as `int extends Object` 
(`int<|Object` maybe?).  So talking about subtypes in TRs is nonsense as 
far as I can see, except in the special sense that “we use TRs in a 
translation strategy, to help extend the JVM’s limited subtyping 
notions to *our language’s* more expansive subtyping notions”.  If 
TRs were a language-level construct, then, yeah, maybe we say a TR has 
some constraints about language-level subtypes.  But it’s a JVM-only 
construct that plugs in where the JVM’s type system doesn’t work 
well enough.
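
To make the “function pair” reading concrete, here is a hedged, 
Java-level caricature (the names are invented for illustration; the real 
mechanism lives below the level of any Java API), using the familiar 
`I=int`, `G=Object` pair, where ordinary boxing supplies the embedding:

    // Invented names, illustration only: a type restriction read as a
    // projection G -> I plus an embedding I -> G, with I = int, G = Object.
    final class IntUnderObject {
        // projection: from the generic bound type down to the instance type
        static int project(Object g) { return (Integer) g; }  // unbox; may fail dynamically
        // embedding: from the instance type up into the generic bound type
        static Object embed(int i) { return i; }               // autobox
    }

Neither direction is a verifier-visible subtype relation; the pair simply 
records how `int` values travel through `Object`-shaped generic code.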

(Should we rename TR to PE, “Project-Embedding Pair”?  I like TR, 
but maybe only because I’m used to the term.  As a JVM-level concept, 
it doesn’t need to get a highly polished name.  The audience for it is 
limited.)

