ParameterizedType encoding (was: Model 3 classfile design document)
Brian Goetz
brian.goetz at oracle.com
Thu Mar 24 21:47:17 UTC 2016
The complex structure you refer to exists in two places:
- The GenericClass attribute, which describes the structure of the
type variables;
- The ParameterizedType attribute, through its 'enclosing' field.
The structures of these two need to line up, which opens up various
possibilities for error that must be checked at runtime. So why did
I propose this, rather than just flattening the tvars into a linear array?
The proposed form is derived from some (presumed) compatibility
requirements. These leak in because, even though Outer<T> and Inner<U>
are derived from the same source file (and presumably therefore their
classfiles will always be consistent), *other* classfiles can describe
Outer<T>.Inner<U> using ParamType, and it's possible that Outer/Inner can
be modified without recompiling the client.
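To make the two structures concrete, here is a minimal Java sketch. All names in it (GenericClassFrame, PTDesc, StructureCheck, linesUp) are hypothetical and are not the proposed classfile format. It only models the idea that the GenericClass attribute lists one frame per class in the nesting chain (outermost first), while a ParameterizedType carries its own arguments plus an 'enclosing' link for the outer class's arguments. The linesUp check is the kind of validation that has to happen at runtime, because the client classfile holding the PT may have been compiled against an older Outer/Inner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a GenericClass frame: one per class in the nesting
// chain, outermost first, listing that class's own type variables.
record GenericClassFrame(String className, List<String> tvarNames) {}

// Hypothetical model of a ParameterizedType: the template class, its own
// type arguments, and the parameterization of the enclosing class (or null).
record PTDesc(String className, List<String> args, PTDesc enclosing) {}

class StructureCheck {
    // Walk the PT's enclosing chain and return it outermost-first.
    static List<PTDesc> chain(PTDesc pt) {
        List<PTDesc> result = new ArrayList<>();
        for (PTDesc p = pt; p != null; p = p.enclosing()) {
            result.add(p);
        }
        Collections.reverse(result);
        return result;
    }

    // Strict check: same number of levels, same class names, same arity.
    static boolean linesUp(List<GenericClassFrame> frames, PTDesc pt) {
        List<PTDesc> levels = chain(pt);
        if (levels.size() != frames.size()) return false;
        for (int i = 0; i < frames.size(); i++) {
            GenericClassFrame frame = frames.get(i);
            PTDesc level = levels.get(i);
            if (!frame.className().equals(level.className())) return false;
            if (frame.tvarNames().size() != level.args().size()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // class Outer<T> { class Inner<U> {} }, as described by Inner's GenericClass
        List<GenericClassFrame> frames = List.of(
            new GenericClassFrame("Outer", List.of("T")),
            new GenericClassFrame("Outer$Inner", List.of("U")));
        // A client's description of Outer<String>.Inner<Integer>
        PTDesc pt = new PTDesc("Outer$Inner", List.of("Integer"),
            new PTDesc("Outer", List.of("String"), null));
        System.out.println(linesUp(frames, pt)); // prints "true"
    }
}
```

The check is deliberately strict (same levels, same arity per level); requirements #4 and #5 are what push the real matching to be looser.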
Let's make these compatibility requirements more explicit. In the
absence of qualification, "compatible" means "binary and source
compatible for clients and subclasses."
1. Renaming a type variable should be compatible. The rationale here
is that the choice of method parameter names is an implementation detail, and it's
reasonable to treat the choice of type variable names the same way. We
accomplish this by encoding type variable uses with indexes instead of
names -- but this depends on the stability of indexes.
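A minimal sketch of the index-based encoding, assuming a hypothetical "TV" + index form for a type variable use (the real descriptor syntax is not given here; TvarIndexEncoding and encodeUse are illustrative names). Renaming T to E leaves the encoding unchanged, while reordering K and V does not, which is exactly the dependence on index stability.

```java
import java.util.List;

class TvarIndexEncoding {
    // Encode a use of a type variable by its position in the declaring
    // class's type-parameter list, not by its source name.
    // Hypothetical encoding: "TV" + index, e.g. "TV0".
    static String encodeUse(List<String> declaredTvars, String name) {
        int index = declaredTvars.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("not a type variable: " + name);
        }
        return "TV" + index;
    }

    public static void main(String[] args) {
        // class Box<T> { T get(); }  versus the renamed  class Box<E> { E get(); }
        String before = encodeUse(List.of("T"), "T");
        String after = encodeUse(List.of("E"), "E");
        System.out.println(before + " " + after); // prints "TV0 TV0"
    }
}
```

By contrast, encodeUse(List.of("K", "V"), "K") and encodeUse(List.of("V", "K"), "K") differ, so reordering is not compatible under this scheme.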
2. (Non-requirement.) We don't require that reordering or removing
type variables be compatible. While it would be nice if we could do
this, the number and order of type variables is part of a class's
interface definition. A Map<K,V> is a map from K to V; the order is
fixed when we first publish the class.
Note that (1) and (2) match the story for method argument lists today;
you can compatibly rename parameters, but not remove or reorder them.
(See below for adding.)
3. Anyfying an existing erased type variable should be compatible; I
should be able to evolve class Foo<T,U> to be class Foo<any T, U>. The
rationale is obvious; if this were not the case, we couldn't anyfy any
of our existing libraries.
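One way to see why anyfying is safe: an any tvar accepts every argument an erased tvar accepts, plus primitives. The sketch below assumes a hypothetical two-valued kind (ERASED, ANY) and treats single-letter descriptors such as "I" as primitives; Anyfication, accepts and isCompatibleEvolution are illustrative names, not part of the design.

```java
import java.util.List;
import java.util.Set;

class Anyfication {
    // Hypothetical kinds: erased tvars accept only reference types,
    // any tvars accept references and primitives.
    enum TvarKind { ERASED, ANY }

    // Primitive type descriptors, e.g. "I" for int.
    static final Set<String> PRIMITIVES = Set.of("I", "J", "F", "D", "Z", "B", "S", "C");

    static boolean accepts(TvarKind kind, String typeArg) {
        return kind == TvarKind.ANY || !PRIMITIVES.contains(typeArg);
    }

    // Same arity only: adding tvars is the separate case in #4 and #5.
    static boolean isCompatibleEvolution(List<TvarKind> before, List<TvarKind> after) {
        if (before.size() != after.size()) return false;
        for (int i = 0; i < before.size(); i++) {
            // ANY -> ERASED would reject primitive arguments clients already pass.
            if (before.get(i) == TvarKind.ANY && after.get(i) == TvarKind.ERASED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // class Foo<T, U>  evolves to  class Foo<any T, U>
        System.out.println(isCompatibleEvolution(
            List.of(TvarKind.ERASED, TvarKind.ERASED),
            List.of(TvarKind.ANY, TvarKind.ERASED))); // prints "true"
    }
}
```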
So far, nothing too controversial. Now, let's move on to some "would be
nice" evolution cases.
4. Generifying a non-generic class by adding one or more erased type
variables should be (at least binary) compatible. This means that if we
have Outer.Inner<U>, it would be nice if we could evolve this to
Outer<T>.Inner<U>. And similarly, if we have Outer<T>.Inner, it would
be nice if we could evolve this to be Outer<T>.Inner<U>.
(Note that #4 + #3 means we can also add any-tvars too.)
This is the case with existing erased generics; so long as we follow
some rules, we can generify existing classes without breaking clients.
But, adding type variables somewhere in the chain has the potential to
perturb the numbering scheme for type variables, conflicting with #1
above. Note that whether we start numbering outer-to-inner, or
inner-to-outer, a continuous numbering system will be perturbed by one
of the above scenarios.
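The claim that each continuous numbering breaks under one of the two generification scenarios can be checked directly. In this sketch (FlatNumbering, flatten and flatIndex are illustrative names), each class's own tvars are listed outermost first and then flattened in either order. Outer-to-inner numbering moves U when Outer gains T; inner-to-outer numbering moves T when Inner gains U.

```java
import java.util.ArrayList;
import java.util.List;

class FlatNumbering {
    // levels: each class's own type variables, outermost class first.
    // Returns the flattened list in the chosen numbering order.
    static List<String> flatten(List<List<String>> levels, boolean outerToInner) {
        List<String> flat = new ArrayList<>();
        if (outerToInner) {
            for (List<String> level : levels) flat.addAll(level);
        } else {
            for (int i = levels.size() - 1; i >= 0; i--) flat.addAll(levels.get(i));
        }
        return flat;
    }

    static int flatIndex(List<List<String>> levels, boolean outerToInner, String tvar) {
        return flatten(levels, outerToInner).indexOf(tvar);
    }

    public static void main(String[] args) {
        List<List<String>> outerDotInnerU = List.of(List.<String>of(), List.of("U")); // Outer.Inner<U>
        List<List<String>> outerTDotInner = List.of(List.of("T"), List.<String>of()); // Outer<T>.Inner
        List<List<String>> both = List.of(List.of("T"), List.of("U"));                // Outer<T>.Inner<U>

        // Outer-to-inner: generifying Outer moves U from 0 to 1.
        System.out.println(flatIndex(outerDotInnerU, true, "U") + " -> " + flatIndex(both, true, "U"));
        // Inner-to-outer: generifying Inner moves T from 0 to 1.
        System.out.println(flatIndex(outerTDotInner, false, "T") + " -> " + flatIndex(both, false, "T"));
        // prints "0 -> 1" twice
    }
}
```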
5. (Weak requirement.) It would be nice if adding new erased type
variables to the end of the type variable argument list were also binary
compatible, as it is today with erased generics. This also has the
potential to perturb continuous numbering schemes.
These requirements, gleaned from existing compatibility behaviors,
nudged us in the direction of explicitly modeling Outer<T>.Inner<U> as a
chain of parameterized type descriptors, rather than one flattened
descriptor. When we encounter a PT, we match its structure (loosely, to
account for #4 and #5) to the GenericClass descriptor for the described
template class, as well as validating type parameters (e.g., that we
haven't passed "I", i.e. int, to an erased type parameter).
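The message does not spell out what "loosely" means, so the sketch below makes an explicit assumption: a PT may omit an enclosing level (the outer class was generified later, #4) or supply fewer trailing arguments than the current declaration (#4, #5), but never more (#2). On top of the structural match it performs the validation mentioned above, rejecting a primitive such as "I" passed to an erased tvar. LooseMatch, Frame, PT and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LooseMatch {
    // Hypothetical tvar kinds: erased (references only) or any.
    enum Kind { ERASED, ANY }

    // Hypothetical GenericClass frame: a class and the kinds of its own tvars.
    record Frame(String className, List<Kind> tvars) {}

    // Hypothetical ParameterizedType: template class, own args, enclosing PT.
    record PT(String className, List<String> args, PT enclosing) {}

    // Single-character primitive descriptors such as "I" for int.
    static boolean isPrimitive(String descriptor) {
        return descriptor.length() == 1 && "ZBCSIJFD".contains(descriptor);
    }

    // frames are outermost-first; the PT chain is collected and reversed to match.
    static boolean matches(List<Frame> frames, PT pt) {
        List<PT> levels = new ArrayList<>();
        for (PT p = pt; p != null; p = p.enclosing()) {
            levels.add(p);
        }
        Collections.reverse(levels);
        // Assumption (#4): an older PT may omit outer levels; align from the innermost end.
        if (levels.size() > frames.size()) return false;
        int offset = frames.size() - levels.size();
        for (int i = 0; i < levels.size(); i++) {
            Frame frame = frames.get(offset + i);
            PT level = levels.get(i);
            if (!frame.className().equals(level.className())) return false;
            // Assumption (#4, #5): missing trailing args are tolerated; extra args are not (#2).
            if (level.args().size() > frame.tvars().size()) return false;
            for (int j = 0; j < level.args().size(); j++) {
                // Validation: never pass a primitive to an erased tvar.
                if (frame.tvars().get(j) == Kind.ERASED && isPrimitive(level.args().get(j))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // class Outer<T> { class Inner<any U> {} }
        List<Frame> frames = List.of(
            new Frame("Outer", List.of(Kind.ERASED)),
            new Frame("Outer$Inner", List.of(Kind.ANY)));
        PT full = new PT("Outer$Inner", List.of("I"),
            new PT("Outer", List.of("Ljava/lang/String;"), null)); // Outer<String>.Inner<int>
        PT old = new PT("Outer$Inner", List.of("I"), null);         // compiled before Outer had T
        PT bad = new PT("Outer$Inner", List.of("I"),
            new PT("Outer", List.of("I"), null));                   // int passed to erased T
        System.out.println(matches(frames, full) + " " + matches(frames, old) + " " + matches(frames, bad));
        // prints "true true false"
    }
}
```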
Alternately, we could have encoded type variables as (owner, index
within owner) pairs. But, even with such an encoding, we would have to
go through a similar validation process as above; this changes the
representation, but not the amount of work.
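A sketch of the (owner, index within owner) alternative, with hypothetical names (OwnerIndexEncoding, TvarRef, encode, resolves). Because U is numbered only within Inner, its encoding survives generifying Outer (#4) and appending V to Inner (#5). But a reference still has to be checked against the owner's current declaration when it is resolved, which is the validation the paragraph above says does not go away.

```java
import java.util.List;
import java.util.Map;

class OwnerIndexEncoding {
    // Hypothetical encoding of a tvar use: declaring class plus the index
    // within that class's own type-parameter list.
    record TvarRef(String owner, int index) {}

    static TvarRef encode(Map<String, List<String>> tvarsByOwner, String owner, String name) {
        int index = tvarsByOwner.get(owner).indexOf(name);
        if (index < 0) throw new IllegalArgumentException(name + " not declared by " + owner);
        return new TvarRef(owner, index);
    }

    // Validation still needed at resolution time: does the owner's current
    // declaration actually have a tvar at that index?
    static boolean resolves(Map<String, List<String>> tvarsByOwner, TvarRef ref) {
        List<String> tvars = tvarsByOwner.get(ref.owner());
        return tvars != null && ref.index() >= 0 && ref.index() < tvars.size();
    }

    public static void main(String[] args) {
        // Outer.Inner<U>  evolves to  Outer<T>.Inner<U, V>
        Map<String, List<String>> v1 = Map.of("Outer", List.<String>of(), "Outer$Inner", List.of("U"));
        Map<String, List<String>> v2 = Map.of("Outer", List.of("T"), "Outer$Inner", List.of("U", "V"));
        System.out.println(encode(v1, "Outer$Inner", "U").equals(encode(v2, "Outer$Inner", "U"))); // prints "true"
        // A reference to V, compiled against v2, does not resolve against v1.
        System.out.println(resolves(v1, encode(v2, "Outer$Inner", "V"))); // prints "false"
    }
}
```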
Alternately alternately, we could encode a snapshot of the
as-of-compile-time contents of the GenericClass, so that separate
compilation changes can be detected and possibly corrected. But this
seems like overkill.
On 3/22/2016 4:21 PM, John Rose wrote:
> The full display of type variables, with all their definition sites, strikes me as clunky, from a VM perspective. It's a large amount of AST info.
>
> For inner classes, we flatten up-level references by introducing synthetic variables and fields. In a few places core reflection needs an attribute to map backward, but the executable part is all flattened. This makes it easier to execute and compile.
>
> Could we do a similar trick for type variables? I.e. represent up-level type vars as a flat sequence of synthetic local copies.
>
> – John
>
>> On Feb 11, 2016, at 2:24 PM, Bjorn B Vardal <bjornvar at ca.ibm.com> wrote:
>>
>> where Inner doesn't declare any type variables, my understanding is that Inner will still have the GenericClass attribute because it may refer to T. Will Inner still appear as the first class frame, with tvarCount=0, enforcing the rule that the first element is always the class itself?