[External] : Re: VM model and aconst_init
forax at univ-mlv.fr
Mon Jan 31 18:04:26 UTC 2022
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Thursday, January 27, 2022 10:00:20 PM
> Subject: Re: [External] : Re: VM model and aconst_init
> On 25 Jan 2022, at 2:50, forax at univ-mlv.fr wrote:
>> Let's talk a bit about having the L world and the Q world completely disjoint at
>> least from the bytecode verifier POV.
> Yes please; indeed, that is my strong recommendation. Doing this (Q >:< L
> perhaps, the “squinting cat operator”?) allows the interpreter (as well as the
> verifier) to concentrate on one “mode” at a time, for values on stack. This in
> turn unlocks implementation possibilities which are not as practical when Q <: L
> (our previous plan of record).
> But keeping Q and L disjoint is a relative notion. Clearly they have to
> interoperate somewhere. And I believe that “somewhere”, where Q’s can become
> L’s and vice versa, is in the dynamic behavior of bytecodes, not in the type
> system (in the verifier) which relates multiple bytecodes.
>> It means that we need some checkcasts to move in both directions, from a Q-type
>> to an L-type and vice versa.
> Correct. Checkcast is the mode-change operator (both directions). No need to
> create a new bytecode for either direction.
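To make that concrete, here is a minimal source-level sketch, assuming the Point/Point.ref spelling of the current drafts (Point being a hypothetical value class) and that javac translates both casts to a checkcast:

    Point.ref r = new Point(1, 2);  // L-mode: nullable reference, LPoint;
    Point p = (Point) r;            // checkcast LPoint; -> QPoint; (null check, may throw)
    Point.ref r2 = (Point.ref) p;   // checkcast QPoint; -> LPoint; (always succeeds)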
>> But at the same time, an array of L-type is a subtype of an array of Q-type?
> That depends on what you mean by “subtype”. The JVM has multiple type systems.
> In the verifier we might have a choice of allowing Q[]<:L[] or not. It’s better
> not. That puts no limitations on what the aaload and aastore instructions do.
> They just dynamically sense whether the array is flattened or not and DTRT.
> That has nothing to do with verifier types. In fact, it’s a separate job of
> the runtime type system.
> For the runtime type system, we will say that arrays are covariant across an
> extends relation which is an augmentation of the subtype system. So for the
> runtime, T[] <: U[] iff T extends U, so Q[] <: L[] (for the same class in both
> modes). But in the verifier Q[] >:< L[]. Don’t like it? Sorry, that’s the best
> we can do, given Java’s history with arrays and the weakness of the verifier.
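For what it's worth, this is already how L-world array covariance behaves today: the verifier accepts the store and aastore does the dynamic check.

    Object[] a = new String[1];  // String[] <: Object[] at runtime only
    a[0] = "ok";                 // aastore: dynamic element check passes
    a[0] = 42;                   // verifies fine, throws ArrayStoreException at runtime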
It did not occur to me that the verifier type system and the runtime type system could be so divorced.
Currently, the verifier merely skips some verifications (related to interfaces) that are instead done at runtime. But you propose the opposite: something that is illegal for the verifier yet legal at runtime.
I am not saying this is impossible, but the idea has a huge engineering cost, because it requires thinking in terms of two different mental models, and every discrepancy between them can become a source of bugs.
Also, what does it mean for java.lang.invoke, which tries to follow the bytecode semantics, but at runtime?
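For example (a sketch of the question only, with Point again standing for a hypothetical value class, so that Point[] plays the Q-array role):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    // Does asType() follow the verifier rule (Q[] and L[] unrelated, so the
    // adaptation is rejected) or the runtime rule (Q[] <: L[], so it succeeds)?
    MethodHandle getter = MethodHandles.arrayElementGetter(Point[].class);
    MethodHandle viewed = getter.asType(
        MethodType.methodType(Object.class, Object[].class, int.class));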
[...]
>> Furthermore, I believe that subtyping is key to avoiding multiple bytecode
>> verification of the generic code.
> I recommend a far simpler technique: Just don’t. Don’t multiple-verify, and
> don’t do anything that would need such extra verification (with different
> types) to give the same answer (as the first verification). Also, don’t think
> of a JVM-loaded set of bytecodes as ever having more than one operational
> meaning.
I agree: no multiple verification. But just as the VM has to do some runtime checks when there is a checkcast with an interface, the VM has to do something when there is a TypeRestriction.
[...]
> But don’t confuse the implementation of generics with the semantic model. The
> implementation will (in our plan of record) use something the verifier knows
> nothing about: dynamic type restrictions (or something equivalent) which manage
> the projections and embeddings of specific instance types into the generic
> bound type. These do not have to be (indeed must not be, see below) identified
> as verifier types. They are associations of pairs of JVM runtime types.
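In other words, a TR is morally a pair of functions. A minimal sketch of that shape (names hypothetical, not the actual JVM mechanism):

    // projection/embedding pair: invisible to the verifier, applied at runtime
    interface TypeRestriction<G, I> {
        I project(G g);  // G -> I: partial, may fail dynamically
        G embed(I i);    // I -> G: total, always succeeds
    }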
>> For example, with the TypeRestriction attribute [1], the restriction has to be
>> a subtype of the declared type/descriptor.
> No, just no. (Not your fault; I suppose the TR docs are in a confusing state.)
> There’s no reason to make that restriction (linking TRs to the verifier type
> system), other than the futile obsession I called out above. And if you do, you
> will make yourself sorry when you remember the verifier’s I/J/F/D types. And
> for many other reasons, such as instance types which are inaccessible to the
> generic code. And the simple fact that bytecodes are not really
> generic-capable, directly, without serious (and futile) redesign.
> Making bytecodes directly generic (in the sense of contextually re-type-able) is
> never going to be as simple as using a TR mechanism which occurs apart from the
> verifier.
Yes.
> (Plus, I don’t trust the verifier to take on new jobs. In its existing job it
> has been an ever-flowing fountain of security bugs. It made a hash of jsr/ret,
> and continually messed up instance creation paths. I won’t re-hire it for any
> new jobs. It’s good for matching descriptors in symbolic references, and maybe
> for helping the interpreter make a minor optimization to the getfield and
> putfield bytecodes, but that’s about it. The v-table index caching for
> invokevirtual looks cool but is effectively a no-op. The JIT makes all of the
> above irrelevant to performance, once the JIT kicks in. So the verifier gives
> an incremental push to startup performance, or interpreter-only performance if
> anyone still cares. Plus it gives us extra “security” except when it doesn’t.
> OK, I’ll stop with the verifier now.)
It also gives (mostly) meaningful error messages when my students try to generate bytecodes :)
> A “type system” for TRs is pretty meaningless, from the JVM POV. In the JVM, a
> TR (despite its name) is most fundamentally an operational specification of how
> to project (G->I) or embed (I->G) instance types into generic types. It’s a
> function pair. It can connect I=int to G=Object, which the JVM knows are
> disjoint types. A TR is very useful for implementing source-level type
> relations that the JVM doesn’t know directly about, such as int extends Object
> (int <| Object maybe?). So talking about subtypes in TRs is nonsense as far as I
> can see, unless in the special sense that “we use TRs in a translation
> strategy, to help extend the JVM’s limited subtyping notions to our language’s
> more expansive subtyping notions”. If TRs were a language-level construct,
> then, yeah, maybe we say a TR has some constraints about language-level
> subtypes. But it’s a JVM-only construct that plugs in where the JVM’s type
> system doesn’t work well enough.
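To spell out the shape described here: the I=int, G=Object pair would just be unboxing and boxing. A purely illustrative sketch, reusing the hypothetical interface above (Integer stands in for int, since generics cannot mention primitives today, which is the whole point):

    TypeRestriction<Object, Integer> intAsObject = new TypeRestriction<>() {
        public Integer project(Object g) { return (Integer) g; }  // unbox side: may throw CCE
        public Object embed(Integer i) { return i; }              // box side: total
    };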
First, I don't think you need int extends Object, because you can have Qjava/lang/Integer; which already extends Object: ArrayList<int> is just another spelling of ArrayList<Qjava/lang/Integer;>, which is great because it is a subtype (at runtime) of ArrayList<Integer>.
A TR is not a silver bullet: a TR has only two paths, one using the declared type and one using the restricted type. When there is inheritance, unlike bridges, which can generate several paths (at the cost of several methods/entry points), with a TR you are stuck with two, the fully optimized one and the un-optimized one; anything in between will go through the un-optimized path. Furthermore, any code with a wildcard will go through the un-optimized path. So the runtime checks associated with TRs have to be fast, hence the idea to restrict them to subtyping checks.
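Schematically (entry-point names purely hypothetical):

    // a TR yields exactly two entry points; bridges could generate more
    class SpecializedList {
        void add(Object e)         { /* un-optimized path: declared descriptor */ }
        void add$restricted(int e) { /* optimized path: restricted descriptor */ }
        // a call through ArrayList<?> can only reach add(Object),
        // so the TR check on that path has to be cheap
    }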
> (Should we rename TR to PE, “Project-Embedding Pair”? I like TR, but maybe only
> because I’m used to the term. As a JVM-level concept, it doesn’t need to get a
> highly polished name. The audience for it is limited.)
Rémi