Moving from VVT to the L-world value types (LWVT)

Sat Jan 20 04:01:58 UTC 2018

On Jan 19, 2018, at 1:10 PM, Remi Forax <forax at univ-mlv.fr> wrote:
> 
> So I propose to introduce a new class attribute named ValueTypes that contains the set of all value types that are used by that class,
> so for the VM a type is by default a reference type (an L-type) unless the type is listed in the attribute ValueTypes, in which case it's a value type.
> Basically, having a bit for each type at class level that says whether a type behaves as a value type or not when the class was compiled.

This is a nice idea.  I hope we don't need it!  I'll explain.

I agree about the basic problem with Q-descriptors.
In L-world, there are relatively few places where the Q/R
distinction matters (and recall that U=L in L-world).
So remaining uses of Q-descriptors can be viewed
as places where we haven't yet figured out a different
way to express the Q/R distinction, in a place where
it matters.

For fields, a Q-descriptor can be replaced by a normal
L-descriptor, but with an ACC_VALUE bit in the field.
The same ACC_VALUE bit makes sense on a class
as a whole, I think.

Your ValueTypes table potentially gives a different
way to record ACC_VALUE bits, by associating
them with (the local views of) names rather than
schema elements.
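To make the contrast concrete, here is a minimal Java sketch of the two encodings. The flag value, the FieldInfo and ClassInfo records, and their method names are all hypothetical; neither is a real class-file structure. The bit answers the question for one schema element, while the table answers it for every mention of a name inside one class.

```java
import java.util.Set;

// Hypothetical model of two ways to record "this type is a value type".
// None of these names or bit values are part of any real class-file format.
final class ValueBitSketch {
    // Assumed bit position, for illustration only.
    static final int ACC_VALUE = 0x0100;

    // Encoding 1: the bit is attached to one schema element (a field here).
    record FieldInfo(String name, String descriptor, int accessFlags) {
        boolean isValueField() {
            return (accessFlags & ACC_VALUE) != 0;
        }
    }

    // Encoding 2: a class-level table keyed by class name (the ValueTypes idea).
    // Every mention of a listed name in this class is treated as a value type.
    record ClassInfo(String thisClass, Set<String> valueTypes) {
        boolean treatsAsValue(String className) {
            return valueTypes.contains(className);
        }
    }
}
```

For example, new FieldInfo("z", "LComplex;", ACC_VALUE).isValueField() is true, and new ClassInfo("Client", Set.of("Complex")) treats "Complex" as a value but not "String".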

> In terms of implementation,
>  - methods only have one descriptor, using L-types

For methods, a Q-descriptor has less utility.  It encodes
a different posture towards certain edge cases, such as
the acceptance of null.  But there are other ways to manage
such edge cases.  (For null, it can be a suitable NPE,
akin to what happens today when you cast a null to int.)
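The null edge case already has a precedent in today's Java: unboxing a null Integer to int throws NullPointerException. A minimal sketch, assuming a callee (or an adapter in front of it) simply null-checks an L-typed argument whose class turns out to be a value type; acceptValueArg and throwsNpe are hypothetical names, not VM APIs.

```java
import java.util.Objects;

// Today's precedent plus a hypothetical analogue for value-typed parameters.
final class NullEdgeCase {
    // Unboxing a null Integer throws NullPointerException, as does (int) (Integer) null.
    static int unbox(Integer boxed) {
        return boxed; // compiles to boxed.intValue()
    }

    // Hypothetical: a parameter with an L-descriptor whose class is a value type.
    // No Q-descriptor is needed; null is rejected with an NPE on entry.
    static <T> T acceptValueArg(T arg, String paramName) {
        return Objects.requireNonNull(arg, paramName + " must not be null: its type is a value type");
    }

    // Helper for demonstrating which calls throw NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Both throwsNpe(() -> unbox(null)) and throwsNpe(() -> acceptValueArg(null, "z")) return true, while non-null arguments pass through unchanged.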

I don't really buy claims that method descriptors are
needed to distinguish by-reference from by-value
argument passing.  Yes, there is a "real calling sequence"
for a type like Complex<double>, probably involving
xmm registers, but the interpreter does not (cannot)
know about this, and needs to use the uniform L-type
as one size that fits everything.

This does leave open the problem of saying when
the interpreter uses pointer copying vs. rebuffering
on return instructions.  To my mind that is similar to
the acmp problem:  You need to quickly decide whether
the L-world thingy needs rebuffering or not.  In all
cases you can decide whether the thingy is a value
simply by examining the payload of the L carrier type.
If it's null, it's a reference; if it's not, we look at its
class (or perhaps a tag bit or range check) to see
if it needs special value processing.  As long as we
can avoid doing this frequently, or optimize it well
when it is done frequently, the extra Q/R checks will
be in the noise.
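The return-path decision described above can be sketched in Java as follows. This is an illustration under stated assumptions: a marker interface stands in for whatever the VM really consults (class metadata, a tag bit, or a range check), and the Action names are hypothetical.

```java
// Hypothetical model of the interpreter's decision on a return instruction.
final class ReturnPathSketch {
    enum Action { COPY_POINTER, REBUFFER }

    // Stand-in for "this class is a value type"; a real VM would use class
    // metadata, a header tag bit, or an address-range check instead.
    interface ValueMarker {}

    record Complex(double re, double im) implements ValueMarker {}

    static Action onReturn(Object payload) {
        // A null payload can only be a reference, so plain pointer copying suffices.
        if (payload == null) {
            return Action.COPY_POINTER;
        }
        // Otherwise look at the payload's class to see if it needs value processing.
        if (ValueMarker.class.isAssignableFrom(payload.getClass())) {
            return Action.REBUFFER;
        }
        return Action.COPY_POINTER;
    }
}
```

Here onReturn(null) and onReturn("s") copy the pointer, while onReturn(new Complex(1, 2)) rebuffers.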

>  - the verifier rejects code that stores null, uses synchronized, etc. with a type which is listed in the class attribute ValueTypes.

The verifier doesn't track nulls, so this proposal doesn't
make sense unless nullability is added to the verifier
type system.  I don't think that is needed.  The Java
way is to assume non-nullness and back that up with
a runtime check, and NPE if it fails.

Same point for sync.  And for acmp.  The verifier has
nothing to add here.
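A minimal sketch of what backing these rules with run-time checks, rather than verifier rules, might look like for null stores and for synchronization. The method names, the marker interface, and the choice of IllegalMonitorStateException for sync on a value are assumptions for illustration; acmp is not modeled.

```java
// Hypothetical run-time checks standing in for verifier rules.
final class RuntimeChecksSketch {
    // Stand-in for "this class is a value type" (assumption).
    interface ValueMarker {}

    record Point(int x, int y) implements ValueMarker {}

    // A store into a field whose declared type is a value type: reject null with an NPE.
    static Object checkedStore(Object newValue, boolean fieldIsValueType) {
        if (fieldIsValueType && newValue == null) {
            throw new NullPointerException("null stored into a value-typed field");
        }
        return newValue; // the value that would actually be stored
    }

    // monitorenter: null already throws NPE today; a value is rejected too.
    static void checkedMonitorEnter(Object lock) {
        if (lock == null) {
            throw new NullPointerException("monitorenter on null");
        }
        if (lock instanceof ValueMarker) {
            throw new IllegalMonitorStateException("cannot synchronize on a value");
        }
        // An ordinary object would be locked here; this sketch does nothing further.
    }
}
```

The verifier never sees these cases; the checks fire only when the offending instruction actually executes.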

The verifier has a strategic weakness:  It does not load
classes (with a very few exceptions, carefully minimized).
It operates mostly on descriptor kinds (carrier types).

(Yes, it loads classes to determine sub/super relations
as required by control flow.  I think of that as a weakness
in the design, not a feature to repeat.)

Your proposal of a ValueTypes table works around this
limitation by providing a local (and approximate) view
of the contents of other files (specifically, the ACC_VALUE
bits).  Like I said, it's clever, but I hope we don't need it.

>  - at runtime, the VM checks, when loading a class listed in the attribute ValueTypes, that it is a value type.

The simple way to do this would seem to require a wave
of recursive loading, just to check value-type bits.  We try
to avoid that sort of thing.  (It's why the verifier is so weak.)
If we adopted your proposal, we'd probably want to do a
feature like class loader constraints, where classes that
claim X is a value register that claim, and when X eventually
loads, the load fails if the prediction was false.
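The constraint-like scheme could be modeled roughly as below: claims are recorded without loading anything, and are checked when the named class finally loads. This is a hypothetical registry, not HotSpot's loader-constraint code; using LinkageError mirrors how loader-constraint violations are reported today, but the exact error for this case is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry of "X is a value type" predictions, checked lazily,
// in the spirit of class loader constraints.
final class ValueClaimRegistry {
    private final Map<String, List<String>> pendingClaims = new HashMap<>();
    private final Map<String, Boolean> loadedIsValue = new HashMap<>();

    // A class claims that className is a value type; nothing is loaded here.
    void recordClaim(String claimant, String className) {
        Boolean actual = loadedIsValue.get(className);
        if (actual == null) {
            pendingClaims.computeIfAbsent(className, k -> new ArrayList<>()).add(claimant);
        } else if (!actual) {
            // Already loaded: the claim can be checked immediately.
            throw new LinkageError(claimant + " expects " + className + " to be a value type");
        }
    }

    // Called when className is finally loaded; the load fails if any claim was wrong.
    void onLoad(String className, boolean isValue) {
        List<String> claimants = pendingClaims.getOrDefault(className, List.of());
        if (!isValue && !claimants.isEmpty()) {
            throw new LinkageError(className + " is not a value type, but "
                    + claimants + " claimed it was");
        }
        pendingClaims.remove(className);
        loadedIsValue.put(className, isValue);
    }
}
```

The bookkeeping cost grows with every type a class mentions, which is the cost of managing such constraints.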

(This is one big reason I don't want to adopt your proposal,
but rather sneak around it.  I think method signatures mention
too many types to load them all, and I don't want to manage
constraints either.)

Note that the issue of constraints does not appear with field
descriptors, because mentioning a value field *does* require
an immediate load of the field's class, and the constraint is
checked immediately when the class loader is calculating
the memory layout of the new class.

>  - when creating the object layout of a class, fields of a type listed in ValueTypes may be flattened

That can be handled by an ACC_VALUE bit on the field.
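A minimal sketch of layout computation driven by a per-field ACC_VALUE bit, assuming hypothetical ClassMeta metadata and an 8-byte reference slot, and ignoring alignment and headers. Only fields with the bit set force a load of their class, and the value-type check happens right there, so no deferred constraint is needed.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical instance-layout computation driven by a per-field ACC_VALUE bit.
// Alignment, object headers, and field reordering are ignored.
final class LayoutSketch {
    static final int ACC_VALUE = 0x0100; // assumed bit, for illustration only
    static final int REF_SIZE = 8;       // assumed size of a reference slot

    record Field(String name, String typeName, int flags) {}

    // What the loader knows about a loaded class (hypothetical metadata).
    record ClassMeta(String name, boolean isValue, int flatSize) {}

    static int instanceSize(List<Field> fields, Function<String, ClassMeta> loadClass) {
        int size = 0;
        for (Field f : fields) {
            if ((f.flags() & ACC_VALUE) != 0) {
                // A value field must be loaded now to learn its size,
                // so the "is it really a value?" check happens immediately.
                ClassMeta meta = loadClass.apply(f.typeName());
                if (!meta.isValue()) {
                    throw new IncompatibleClassChangeError(
                            f.name() + ": " + f.typeName() + " is not a value type");
                }
                size += meta.flatSize(); // flattened in place
            } else {
                size += REF_SIZE; // plain reference; no load needed
            }
        }
        return size;
    }
}
```

With a 16-byte Complex marked ACC_VALUE and a plain String field, the sketch computes 16 + 8 = 24 bytes.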

>  - when creating an array, elements of a type listed in ValueTypes may be flattened.

That can be handled by querying the metadata of the array
component type.   Loading an array type always forces a
load of its component type.
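Sketched the same way, array creation just consults the component's metadata, which is available because loading the array type loads the component. ClassMeta, the Layout names, and the 8-byte reference size are assumptions for illustration.

```java
// Hypothetical array-creation logic that asks the component type's metadata.
final class ArraySketch {
    static final int REF_SIZE = 8; // assumed size of a reference slot

    // What the loader knows about the (already loaded) component class.
    record ClassMeta(String name, boolean isValue, int flatSize) {}

    enum Layout { FLAT, REFERENCES }

    record ArrayShape(ClassMeta component, Layout layout, int elementSize, int length) {
        long payloadBytes() {
            return (long) elementSize * length;
        }
    }

    // No per-class table is needed: the component's own metadata decides.
    static ArrayShape newArray(ClassMeta component, int length) {
        if (component.isValue()) {
            return new ArrayShape(component, Layout.FLAT, component.flatSize(), length);
        }
        return new ArrayShape(component, Layout.REFERENCES, REF_SIZE, length);
    }
}
```

A 10-element array of a 16-byte value Complex is laid out flat in 160 bytes; a 10-element String array holds 10 references.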

>  - when calling a method or accessing a field, after resolution, the interpreter may do adaptation in order to buffer (value type -> ref) or null check (ref -> value type).

Yes, *that* is the key trick.  This allows out-of-date clients to
use nulls freely.  The resulting NPEs are an artifact of the
client's need to recompile, just as CCEs are an artifact of
generic code that needs recompilation.  They are just as
rare and only-in-principle annoying.
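The two adaptations can be modeled as wrappers that an interpreter might install after resolution. This is a hypothetical Java sketch: Buffer stands in for a heap buffer holding a value, and since every Java object is already a reference, it only illustrates the direction of each adaptation.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical post-resolution adapters between old (reference-minded) callers
// and callees that treat a type as a value type.
final class AdapterSketch {
    // Stand-in for a heap buffer that lets a value travel as a reference.
    record Buffer<V>(V payload) {}

    // ref -> value type: the caller may pass null, so check before the call.
    static <A, R> Function<A, R> nullCheckArg(Function<A, R> callee) {
        return arg -> callee.apply(
                Objects.requireNonNull(arg, "null passed where a value type is expected"));
    }

    // value type -> ref: buffer the result so the caller receives a reference.
    static <A, V> Function<A, Buffer<V>> bufferResult(Function<A, V> callee) {
        return arg -> new Buffer<>(callee.apply(arg));
    }
}
```

An out-of-date caller that passes null through nullCheckArg gets an NPE at the boundary, which is the recompile-me signal described above.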

I'm sure we haven't exhausted the list of places where in
L-world we still want to reach for Q-descriptors, but I think
we *can* drive that list down to the empty list, without
a ValueTypes table and/or special descriptors.

— John