Valhalla Minimal Value Types review invitation

Wed May 10 10:20:41 UTC 2017

(Note to Oracle people:  This is a duplicate of my message on an internal list!)

I have rolled most of the effect of these comments into the Shady doc also.

(Below I say that vunbox doesn't belong on the same list with vdefault,
but I changed the presentation of vunbox again in the Shady doc.
It might trigger DVT derivation, just like vdefault might.)

On Apr 26, 2017, at 7:47 AM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
> 
>> Next meeting: Wednesday April 26, 9am PT: 
>> 
>> NEW DIAL-IN: https://oracle.zoom.us/j/251372518
> Rough initial cut at load/link/init proposal - one potential topic for today’s agenda.
> 
> MVT Assumptions:
> VCC can not have a nullary constructor.
> DVT does not have a <clinit> nor an <init> method.

DVT has no code at all.  To pass verification maybe it has a nullary constructor,
but that constructor can be empty and the JVM will swallow it.  Actually, it could
*throw an exception*, if there is any chance that random user code could make
a call to it, although that is unlikely.  (If we ever have a DVT node in the
heap, it needs to be created using privileged operations, not a user-written
"new DVT" or "DVT.class.newInstance()".)  We want to keep the DVT, in its
"L-type" form, under the woodwork as much as possible.

> 
> Behavior Goals for contained value types for load, link, init:
> 
> I. Resolution of a VCC or DVT, i.e. classfile contains an LFoo or QFoo:
>   Resolve a VCC: (LFoo)
>     1. load VCC
>        annotation based: derive DVT class with an internal name  - eagerly load DVT

As Dan points out, the DVT derivation could be delayed
until the first resolution of the DVT per se.  But since they
will eventually be one class, this question is moot: There will
be only one class to load/link/init.  How closely do we
model this with our DVT/VCC pairs of classes in MVT?
I don't have a strong opinion:  We could treat them as
separate (although one is half-invisible), or we could
try to synchronize their bootstrap process as much as
possible, to simulate a single class using a pair of
closely coupled classes. Either is fine for now, and
I would even tolerate simplifications in the JVM and
spec. which led to distinct behaviors:  At worst it is
a bug (spec. vs. impl.) in a temporary prototype.

>     2. link VCC
>        does not trigger linking of DVT

+1

>     3. initialization of VCC
>        triggered by: new, static bytecodes
>        does not trigger initialization of DVT

+1 (in any case DVT <clinit> is missing, so init is a nop)

> 
>   Resolve a DVT: (QFoo)
>     1. load DVT
>        first load VCC, which derives and loads DVT

Or:  First load VCC (as if it were a superclass, which is kind of true),
and then "load" the DVT by deriving it from the loaded VCC.
That's the way I prefer to think about it, as long as they are separate.
(Alternatively, loading one loads the other; you can't do them separately.)

>     2. link DVT
>        first link VCC

Again, it's as if the VCC were a "super" of the DVT.
(Just as the JVM loads supers before subs, it *also* links
supers before subs.)  Or, again, just say that they are
always linked together, as if they were one class.

> 
>     3. initialize DVT
>        first initialize VCC
>          what triggers initialization of DVT?
>            normally: new, static bytecodes - these are invalid for DVT
>            vdefault
>            vunbox
>            anewarray/multianewarray on a DVT element type

The vunbox call does not trigger initialization in the final system,
since there is only one class present, and the value fed to the vunbox
op is already evidence of initialization.  In the MVT world, the DVT
has no <clinit>, so again we are free to dispense with initialization.

Bottom line:  "vunbox" doesn't seem to belong on the list with vdefault
and newarray-of-value (and eventual getstatic/putstatic/invokestatic).

Same argument for "vbox", in the other direction.

> 
> Open for Discussion:
>   The proposal is that you must not only load a DVT element of an array, you must also link and
>   initialize the DVT element.

Yes.  We cannot have values running around on the JVM stack or heap
until *after* the value class (DVT or eventual full VT) is loaded.  It would be
a disaster to try to process values of some type "Foo" before we have
decided what is the size and layout of Foo instances.  (It's easier with
object reference types, since null is always a valid reference value of
any type, including an unlinked type, or even an unloaded type.)
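
To make the "null as loose glue" point concrete, here is a small sketch in
today's Java (no value types involved).  The class names NullGlueDemo, Foo and
Holder are made up for illustration.  It shows that a null field and a fresh
reference array can both exist before Foo's static initializer has run, which
is exactly the slack that flattened values do not give us.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo classes; the names are illustrative only.
class NullGlueDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Foo {
        static { LOG.add("Foo.<clinit>"); }
    }

    static class Holder {
        Foo field;   // defaults to null; this does not initialize Foo
    }

    static List<String> run() {
        LOG.clear();
        Holder h = new Holder();
        Foo[] array = new Foo[4];   // reference array: elements start as null
        LOG.add("field=" + h.field + ", element=" + array[0]);
        new Foo();                  // first instance creation initializes Foo (once per JVM)
        LOG.add("after new Foo");
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The null field and null array element are logged before Foo's initializer is
ever triggered.  A flattened value array has no such placeholder to hand out.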

A while ago we decided to load embedded value types when
loading the containing object or value class.  It is as if the
embedded values are a kind of "super" to the embedding.  What
is common to both supers and embedded values is you cannot 
size and lay out the container until those prior dependencies are
sized and laid out.

As for linkage, that does not (AFAIK) contribute to the layout of the
value types.  What linkage contributes is the "vetting" of method
structure (verification and override analysis).  If we were to allow
values to run around on the JVM stack before linkage, we would
know how big they are and what parts they have but we would
not know if they had valid methods we could call on them.  This
edge case is clearly wrong enough to exclude completely.

Finally, as for initialization, what that contributes is the static state
that methods inside the value are assuming is true.  Again, but
more subtly, if you allow values to run around before initialization
is complete, the methods can fail if they assume that static state
is correctly spun up.  (While the <clinit> code is running, there are
necessarily some incomplete states potentially exposed, but only
for a short time and confined to one thread.)
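
The "incomplete states during <clinit>" window can already be observed with
ordinary classes.  A minimal sketch, with made-up names (ClinitWindowDemo, Foo,
EARLY, SCALE): an instance created inside the static initializer sees a static
that has not been assigned yet, while an instance created afterwards sees the
real value.

```java
// Hypothetical demo; names are illustrative only.
class ClinitWindowDemo {
    static class Foo {
        // Static initializers run in textual order, so EARLY is built
        // while SCALE still holds its default value of 0.
        static final Foo EARLY = new Foo();
        static final int SCALE = computeScale();   // not a compile-time constant

        final int seenScale;

        Foo() { seenScale = SCALE; }   // a method-like use that assumes static state

        static int computeScale() { return 10; }
    }

    public static void main(String[] args) {
        System.out.println("during <clinit>: " + Foo.EARLY.seenScale);
        System.out.println("after  <clinit>: " + new Foo().seenScale);
    }
}
```

The instance built during initialization is the "incomplete state" case; it is
only reachable from the initializing thread until <clinit> completes.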

Bottom line:  +100 on this invariant:  If a value-type (DVT) object
is anywhere on the JVM stack (or in locals), then either (1) the value
type class is fully initialized (the VCC, in MVT), or (2) the value type
class is in the process of being initialized, and the value type occurrence
is in a stack frame in the same thread as is running the <clinit>.

The various rules about arrays and vdefault (and get/put/invokestatic)
prevent values from leaking onto the JVM stack without enforcing that
invariant.
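
Stated as a predicate, the invariant looks roughly like the sketch below.  This
is only a model, not JVM code: InitInvariantSketch, ValueClassModel and
mayAppearInFrameOf are hypothetical names, and the three states are a
simplification of the real class initialization state machine.

```java
// Hypothetical model of the invariant; not how any JVM represents it.
class InitInvariantSketch {
    enum State { UNINITIALIZED, IN_PROGRESS, INITIALIZED }

    static final class ValueClassModel {
        State state = State.UNINITIALIZED;
        Thread initThread;   // thread running <clinit>, if any

        void beginInit(Thread t) { state = State.IN_PROGRESS; initThread = t; }
        void finishInit()        { state = State.INITIALIZED; initThread = null; }

        // True if a value of this type may sit in a stack frame of thread t:
        // (1) the class is fully initialized, or
        // (2) initialization is in progress and t is the initializing thread.
        boolean mayAppearInFrameOf(Thread t) {
            return state == State.INITIALIZED
                || (state == State.IN_PROGRESS && initThread == t);
        }
    }

    public static void main(String[] args) {
        ValueClassModel foo = new ValueClassModel();
        Thread me = Thread.currentThread();
        System.out.println(foo.mayAppearInFrameOf(me));
        foo.beginInit(me);
        System.out.println(foo.mayAppearInFrameOf(me));
        foo.finishInit();
        System.out.println(foo.mayAppearInFrameOf(me));
    }
}
```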

>   Otherwise you would need to link and initialize the DVT element on the first vaload,
>   in case you did not perform a prior vastore.

Yep.  Loading an uninitialized element of a value array is indistinguishable
from doing vdefault.  (This is one reason why vdefault is not as privileged as
vwithfield.)

>   The verifier could ensure that you perform a prior vastore, in which case you would only
>   need to load the DVT element of an array, not link and initialize it.

(I don't believe this.  The verifier cannot possibly track separately type-states
for heap variables, and especially not distinct elements of one array.)

In any case, if we don't push element-type initialization into array creation,
we must "poll" for it when loading elements from the array, which will add
useless expense to that (very common) operation.  Again, when you don't
have "nulls" as a sort of loose glue to tie things together, you have to be
careful about containers and embedded values.
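
A toy comparison of the two placements of the initialization check, under
heavy assumptions: a long[] stands in for a flattened value array, and
ElementTypeModel is a made-up stand-in for the element class with a counter
for how often we had to ask "initialized yet?".  None of these names come from
the prototype.

```java
// Hypothetical cost model; a long[] stands in for a flattened value array.
class ArrayInitPlacementSketch {
    /** Stand-in for a value class with a one-time initializer. */
    static final class ElementTypeModel {
        boolean initialized;
        int checks;   // number of "is it initialized?" checks performed

        void ensureInitialized() { checks++; initialized = true; }
    }

    // Policy 1: array creation triggers initialization; loads need no check.
    static long[] newArrayEager(ElementTypeModel t, int n) {
        t.ensureInitialized();
        return new long[n];
    }
    static long loadAfterEager(long[] a, int i) { return a[i]; }

    // Policy 2: array creation does nothing; every element load must poll.
    static long[] newArrayLazy(int n) { return new long[n]; }
    static long loadWithPoll(ElementTypeModel t, long[] a, int i) {
        t.ensureInitialized();
        return a[i];
    }

    public static void main(String[] args) {
        ElementTypeModel eager = new ElementTypeModel();
        long[] a = newArrayEager(eager, 1000);
        for (int i = 0; i < a.length; i++) loadAfterEager(a, i);

        ElementTypeModel lazy = new ElementTypeModel();
        long[] b = newArrayLazy(1000);
        for (int i = 0; i < b.length; i++) loadWithPoll(lazy, b, i);

        System.out.println("eager checks: " + eager.checks);
        System.out.println("polling checks: " + lazy.checks);
    }
}
```

The check count grows with the number of loads under the polling policy and
stays at one per array creation under the eager policy.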

Specifically:  Most of what we say about array elements of value types is
going to apply also to fields of value types, and vice versa.  When you
create a blank object which contains values, you should already have
run the initializers of those value types to completion.  …Sort of as if
the value types were supers of the object type containing those fields.

Going back to arrays:  It is as if the value-type element of an array were
a super to that array type.  It has to be initialized before
you can use an instance of the array.

(And this "generalized super" mentality gives a framework for dealing with vicious
cycles: We must detect and reject dependency cycles through value-type components,
in the load phase, just as we do with regular supers today.)
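
For the cycle check, a plain depth-first search over the "embeds a value of
type" relation would do.  The sketch below is an assumption about shape only:
the map from class name to embedded value-type names, and the names
ValueCycleCheckSketch and hasCycle, are invented for illustration and are not
taken from any implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical load-time check; the graph encoding is an assumption.
class ValueCycleCheckSketch {
    /** embeds maps a class name to the value types it embeds (fields or elements). */
    static boolean hasCycle(Map<String, List<String>> embeds) {
        Set<String> done = new HashSet<>();     // fully explored, known cycle-free
        Set<String> onPath = new HashSet<>();   // currently being explored
        for (String c : embeds.keySet()) {
            if (visit(c, embeds, onPath, done)) return true;
        }
        return false;
    }

    private static boolean visit(String c, Map<String, List<String>> embeds,
                                 Set<String> onPath, Set<String> done) {
        if (done.contains(c)) return false;
        if (!onPath.add(c)) return true;        // reached c again on the same path
        for (String d : embeds.getOrDefault(c, Collections.emptyList())) {
            if (visit(d, embeds, onPath, done)) return true;
        }
        onPath.remove(c);
        done.add(c);
        return false;
    }
}
```

A value type A that embeds B, which in turn embeds A, has no finite layout, so
a check of this kind would reject it at load time, the same way a class that is
its own superclass is rejected today.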

> II. Instance creation of a DVT, DVT has no <init>
> Creation of a default value type instance: which is all 0s in memory to represent
> the 0 or 0.0 or null value for fields of the DVT

Yes.  (Can we get away with no <init>?  Yay, that's the best!)

> Triggered by:
>   1) vdefault
>   2) anewarray/multianewarray on a derived value type
>     - creates a value array which is all 0s in memory representing the flattened
>     elements of the array
>     - which does not entail invoking any constructor on the VCC
>   3) vunbox

(Also if the value occurs as a field.  I just remembered that this is temporarily
excluded in MVT, which is fine.  I'd put it on the list anyway, with an asterisk
saying "we don't do this yet but here's how it would be treated if we did".)

>   4) internal implementation details such as copying a DVT - all of which imply that
>      the DVT is already initialized

(So #4 is not really a trigger; we could have a separate list of non-triggering
operations, notably by-value copy from any place to any other place.)

> It is required that the DVT is in the initialized state prior to the creation of a
> default value type instance.

Yes.  And we remind ourselves (by making the above lists) that creating
a default value type instance can happen explicitly via a value-bearing
bytecode (vdefault), or implicitly as part of creating an object (array only
in MVT) that contains a variable of that value type.

> 
> III. "uninitialized" value type/ partially initialized value type
> There is no such thing as an uninitialized value type.
> vdefault and anewarray/multianewarray can be invoked from anywhere.

+1

> QUESTION: vwithfield: is this restricted to invocation within the DVT?
> (For MVT, this would also be within the VCC)

Yes, please.  Also privileged code, so said code can act responsibly
on behalf of the VCC/DVT.

> That would allow the wither or instance factory to decide whether
> a partially initialized value type would be returned if there were
> an exception.

Precisely.  That's the user model for full value types as well.

> 
> IV. DVT in a Container: other object, other DVT
>    Class contains a DVT  field
>       - QUESTION: is this supported via bytecodes?

I think we said it's OK not to support this except for arrays.
But if it's easy to do, we should do it.  In any case, we need
to keep this case in mind, put on the asterisk that says
"in the next version", if we decide not to deliver it.

> 
>    For MVT, since we do not flatten DVT fields in objects or in other DVTs, then we
>    do not require preloading of DVT classes used to define fields.

Yes; that's the asterisk.  But as soon as we allow "QFoo;" to occur in
a classfile field definition (even if javac didn't generate it) then we take
off the asterisks.

Thanks for laying this out, Karen.

To recap:  Let's lean on the concept of "generalized supers", where
a class (or array type) can have the following dependencies which
are all treated on a similar footing:
 - any class depends on its super class
 - any class depends on its implemented interfaces
 - any class with embedded value-type fields (or array elements) depends on their types
 - the DVT (Q-type projection of VCC) depends on the VCC (principal L-type class)

For X in {load,link,initialize}, before a class can be ${X}ed,
it must first ${X} each of its "super-like" dependencies.
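
As a sanity check on that rule, here is a minimal sketch of it as a recursive
procedure.  Everything in it is hypothetical (GeneralizedSupersSketch, Node,
advance), it assumes the dependency graph is already known to be acyclic, and
it adds the ordinary requirement that a class finishes earlier phases before
later ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

// Hypothetical model of the "generalized supers" rule; assumes an acyclic graph.
class GeneralizedSupersSketch {
    enum Phase { LOAD, LINK, INITIALIZE }

    static final class Node {
        final String name;
        final List<Node> deps = new ArrayList<>();   // "super-like" dependencies
        final EnumSet<Phase> done = EnumSet.noneOf(Phase.class);

        Node(String name, Node... deps) {
            this.name = name;
            this.deps.addAll(Arrays.asList(deps));
        }
    }

    /** Bring n through phase x, doing the same for its dependencies first. */
    static void advance(Node n, Phase x, List<String> log) {
        if (n.done.contains(x)) return;
        for (Phase p : Phase.values()) {             // earlier phases of n itself
            if (p.ordinal() < x.ordinal()) advance(n, p, log);
        }
        for (Node d : n.deps) advance(d, x, log);    // dependencies before n
        n.done.add(x);
        log.add(x + " " + n.name);
    }

    public static void main(String[] args) {
        Node vcc = new Node("VCC");
        Node dvt = new Node("DVT", vcc);             // DVT depends on its VCC
        Node array = new Node("DVT[]", dvt);         // array depends on its element type
        List<String> log = new ArrayList<>();
        advance(array, Phase.INITIALIZE, log);
        log.forEach(System.out::println);
    }
}
```

In this model, initializing an array of DVT elements drags the DVT and then the
VCC through each phase ahead of the array type, which matches the array rule
discussed earlier.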

If you buy all that, then I think the only thing left to do is force vdefault
to trigger initialization of the DVT.  The reason vdefault is a special case
is that it creates a value type value out of thin air, rather than loading it
from memory.  When you load a value type from memory, you can rely
on the above load/link/initialize rules to have spun up the value properly.
If we make other bytecodes that create values out of thin air, they will
have to trigger initialization like vdefault does.  (I'm thinking vaguely
of a2b type instructions, but probably it can't happen.)

The unbox instruction has to "spin up" the DVT, but since it takes the
VCC as input, the only action left to do is initialize the DVT, and
since the DVT is a pure projection from the VCC, with none of its
own baggage (no <clinit>) then we are free to opine either way,
and in the one-class world the problem will be moot.  I'm inclined
to say that initializing the VCC automatically, implicitly initializes
the DVT also.  That will be true in the one-class world also, where
initializing a class is all the initialization you need for any of its
projections.

Also note that if you work *only* with the VCC, the above rules do not imply
spinning up the DVT (unless you say it was done invisibly).

When we go to the one-class world, we can make the projections depend (in the
"super-like" manner) on the principal types.  (There are choices there we don't
need to make yet, mainly deciding who or what is really principal.)

(Specialization may have non-trivial projection initialization.  If projections
have their own <clinit>s, then operations which form *those* projections *will*
require initialization triggers.)

— John