ParameterizedType encoding (was: Model 3 classfile design document)

Thu Mar 24 21:47:17 UTC 2016

The complex structure you refer to exists in two places:
  - The GenericClass attribute, which describes the structure of the 
type variables;
  - The ParameterizedType attribute, through its 'enclosing' field.

The structure of these two needs to line up, which opens up various 
possibilities for error, which must be checked at runtime. So, why did 
I propose this, and not just flattening the tvars into a linear array?

The proposed form is derived from some (presumed) compatibility 
requirements. These leak in because, even though Outer<T> and Inner 
are derived from the same source file (and presumably therefore their 
classfiles will always be consistent), *other* classfiles can describe 
Outer<T>.Inner using ParamType, and it's possible that Outer/Inner can 
be modified without recompiling the client.

Let's make these compatibility requirements more explicit.  In the 
absence of qualification, "compatible" means "binary and source 
compatible for clients and subclasses."

1.  Renaming a type variable should be compatible.  The rationale here 
is, the choice of parameter names is an implementation detail, and it's 
reasonable to treat the choice of type variable names the same way.  We 
accomplish this by encoding type variable uses with indexes instead of 
names -- but this depends on the stability of indexes.

2. (Non-requirement.) We don't require that reordering or removing 
type variables be compatible. While it would be nice if we could do 
this, the number and order of type variables is part of a class's 
interface definition. A Map<K,V> is a map from K to V; the order is 
fixed when we first publish the class.

Note that (1) and (2) match the story for method argument lists today; 
you can compatibly rename parameters, but not remove or reorder them.  
(See below for adding.)

3. Anyfying an existing erased type variable should be compatible; I 
should be able to evolve class Foo<T,U> to be class Foo<any T, U>. The 
rationale is obvious; if this were not the case, we couldn't anyfy any 
of our existing libraries.

So far, nothing too controversial.  Now, let's move on to some "would be 
nice" evolution cases.

4. Generifying a non-generic class by adding one or more erased type 
variables should be (at least binary) compatible. This means that if we 
have Outer.Inner, it would be nice if we could evolve this to 
Outer<T>.Inner. And similarly, if we have Outer<T>.Inner, it would 
be nice if we could evolve this to be Outer<T>.Inner<U>.

(Note that #4 + #3 means we can also add any-tvars too.)

This is the case with existing erased generics; so long as we follow 
some rules, we can generify existing classes without breaking clients.  
But, adding type variables somewhere in the chain has the potential to 
perturb the numbering scheme for type variables, conflicting with #1 
above.  Note that whether we start numbering outer-to-inner, or 
inner-to-outer, a continuous numbering system will be perturbed by one 
of the above scenarios.
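
To make the perturbation concrete, here is a minimal sketch, assuming a 
hypothetical flat numbering in which each class in the nesting chain 
contributes its declared tvars in order. The class FlatNumbering and 
both method names are made up for illustration and are not part of any 
proposed classfile format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: flat numbering of tvars across a nesting chain.
// Each inner list holds the tvar names declared by one class, outermost class first.
final class FlatNumbering {
    // Number tvars starting from the outermost class.
    static List<String> outerToInner(List<List<String>> chain) {
        List<String> flat = new ArrayList<>();
        for (List<String> frame : chain) {
            flat.addAll(frame);
        }
        return flat;
    }

    // Number tvars starting from the innermost class.
    static List<String> innerToOuter(List<List<String>> chain) {
        List<String> flat = new ArrayList<>();
        for (int i = chain.size() - 1; i >= 0; i--) {
            flat.addAll(chain.get(i));
        }
        return flat;
    }
}
```

Under this sketch, generifying Outer in Outer.Inner<U> moves U from 
index 0 to index 1 when numbering outer-to-inner, and generifying Inner 
in Outer<T>.Inner moves T from index 0 to index 1 when numbering 
inner-to-outer. Either direction is broken by one of the two edits.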

5.  (Weak requirement.)  It would be nice if adding new erased type 
variables to the end of the type variable argument list were also binary 
compatible, as it is today with erased generics.  This also has the 
potential to perturb continuous numbering schemes.

These requirements, gleaned from existing compatibility behaviors, 
nudged us in the direction of explicitly modeling Outer<T>.Inner as a 
chain of parameterized type descriptors, rather than one flattened 
descriptor. When we encounter a PT, we match its structure (loosely, to 
account for #4 and #5) to the GenericClass descriptor for the described 
template class, as well as validating type parameters (e.g., that we 
haven't passed "I" to an erased type parameter).
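
A rough sketch of what such a check might look like follows. Everything 
in it is an assumption made for illustration: the names PtValidator, 
Frame, PtLink and Kind, the use of single-letter descriptors such as "I" 
for primitives, and in particular the "loose" rule chosen here (a client 
may supply fewer type arguments than the class now declares, but never 
more). It is not the actual attribute layout or the actual runtime check.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the PT-vs-GenericClass check; not the real classfile layout.
final class PtValidator {
    enum Kind { ERASED, ANY }

    // One frame of a GenericClass descriptor: a class and the kinds of its declared tvars.
    static final class Frame {
        final String owner;
        final List<Kind> tvars;
        Frame(String owner, List<Kind> tvars) { this.owner = owner; this.tvars = tvars; }
    }

    // One link of a ParameterizedType chain: a class and the type arguments a client supplied.
    static final class PtLink {
        final String owner;
        final List<String> args;
        PtLink(String owner, List<String> args) { this.owner = owner; this.args = args; }
    }

    // Assumption: primitive type arguments appear as one-letter descriptors.
    static boolean isPrimitive(String desc) {
        return desc.length() == 1 && "ZBCSIJFD".indexOf(desc.charAt(0)) >= 0;
    }

    // Returns an error message, or Optional.empty() if the chain is acceptable.
    static Optional<String> validate(List<Frame> declared, List<PtLink> used) {
        if (declared.size() != used.size()) {
            return Optional.of("chain length mismatch");
        }
        for (int i = 0; i < declared.size(); i++) {
            Frame f = declared.get(i);
            PtLink p = used.get(i);
            if (!f.owner.equals(p.owner)) {
                return Optional.of("owner mismatch at frame " + i);
            }
            // Loose match: missing arguments are tolerated (client predates added tvars),
            // extra arguments are not.
            if (p.args.size() > f.tvars.size()) {
                return Optional.of("too many type arguments for " + f.owner);
            }
            for (int j = 0; j < p.args.size(); j++) {
                if (f.tvars.get(j) == Kind.ERASED && isPrimitive(p.args.get(j))) {
                    return Optional.of("primitive passed to erased tvar " + j + " of " + f.owner);
                }
            }
        }
        return Optional.empty();
    }
}
```

With these rules, a client compiled against Outer<T> still validates 
after Outer becomes Outer<T, U>, while a PT that binds "I" to an erased 
tvar is rejected.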

Alternately, we could have encoded type variables as (owner, index 
within owner) pairs.  But, even with such an encoding, we would have to 
go through a similar validation process as above; this changes the 
representation, but not the amount of work.
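
As a sketch of that alternative, assuming a hypothetical OwnedTvarRef 
pair and a map from owner class to its current tvar count (both invented 
here for illustration), a reference stays stable when some other class 
in the chain gains tvars, but resolving it still requires checking it 
against what the owner currently declares:

```java
import java.util.Map;

// Hypothetical (owner, index within owner) encoding of a type variable use.
final class OwnedTvarRef {
    final String owner;
    final int index;

    OwnedTvarRef(String owner, int index) {
        this.owner = owner;
        this.index = index;
    }

    // declaredCounts maps each class in the chain to the number of tvars it
    // currently declares. The reference resolves only if the owner is still
    // present and still declares a tvar at this index.
    static boolean resolves(OwnedTvarRef ref, Map<String, Integer> declaredCounts) {
        Integer count = declaredCounts.get(ref.owner);
        return count != null && ref.index >= 0 && ref.index < count;
    }
}
```

The index no longer shifts when Outer gains a tvar, but the lookup and 
bounds check are the same kind of work as matching the chain.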

Alternately alternately, we could encode a snapshot of the 
as-of-compile-time contents of the GenericClass, so that separate 
compilation changes can be detected and possibly corrected.  But this 
seems overkill.

On 3/22/2016 4:21 PM, John Rose wrote:
> The full display of type variables, with all their definition sites, strikes me as clunky, from a VM perspective. It's a large amount of AST info.
>
> For inner classes, we flatten up level references by introducing synthetic variables and fields. In a few places core reflection needs an attribute to map backward but the executable part is all flattened. This makes it easier to execute and compile.
>
> Could we do a similar trick for type variables?  I.e. represent up-level type vars as a flat sequence of synthetic local copies.
>
> – John
>
>> On Feb 11, 2016, at 2:24 PM, Bjorn B Vardal <bjornvar at ca.ibm.com> wrote:
>>
>> where Inner doesn't declare any type variables, my understanding is that Inner will still have the GenericClass attribute because it may refer to T. Will Inner still appear as the first class frame, with tvarCount=0, enforcing the rule that the first element is always the class itself?