[LW100] Specialized generics -- user model issues
Brian Goetz
brian.goetz at oracle.com
Wed Oct 17 20:55:13 UTC 2018
I spent some time trawling through several years of Valhalla e-mails to
excavate the user-model issues inherent in specialized generics. I won’t
address translation, VM support for template classes, or binary
migration compatibility issues in this mail — this is focused on user
model and source compatibility of migrating existing libraries.
Type variables
In erased generics, all type variables are assumed to be erased, and
instantiations are assumed to be subtypes of |Object|. This means that
the following idioms are fair game in erased generic code (where |t| is
of type |T|):
* t = null
* t == null
* t == anyReferenceType
* Object o = t
* Object[] os = arrayOfT
* synchronized (t) { … }
Since not all of these will be supported by value types, type variables
need to be restricted if we intend to generify over values (and
primitives). Our working model is to qualify such type variables with an
|any| modifier, which triggers stronger type checking.
*Comparison with null.* We had previously rejected comparisons between
an any-var (avar) and |null|. However, this is unlikely to be good
enough; it seems entirely reasonable for a T-accepting method to reject
null inputs:
|void accept(T t) { if (t == null) throw new NPE(); // do stuff } |
A preferable solution is to allow comparison to null, and at
specialization time, constant-fold the calculation to false when |T| is
a (non-nullable) value type.
*Assignment to null.* Assigning |null| to an avar is something we can
generally reject in any-generic code, though we’ve recently started
discussing how we might treat some value types as nullable (using
zero-field detection). In this case, not only might we want to support
this when we know T must be nullable. (This could be managed by a type
constraint |<T extends Nullable>|, where all reference types are
nullable, as well as all nullable value types.) For clients that just
want a zero-ish sentinel, they can use |T.default| instead of |null|,
which evaluates to |null| for erased |T|.
*Comparison to |Object|.* There’s a larger discussion to be had on |==|
on values; this is part of it.
*Assignment to |Object|.* As with primitives, this is a boxing +
widening reference conversion.
*Array covariance.* This is a larger discussion.
*Synchronization*. This can be rejected at compile time, or we can make
heroic attempts to have this work in the VM. This is a larger discussion.
To the extent that various dependent operations are implemented one way
for values and another for references, the compiler and template class
mechanism will need to conspire to accept a bytecode representation that
specializes down to the right thing.
We had been assuming that for compatibility, we would specialize value
and primitive instantiations but leave reference instantiations erased.
However, this choice (and how to override it) is open to discussion, as
the specialization mechanisms we prototyped seem like they would be up
for specializing on reference types too.
Object::equals
In Q-world, value types did not derive from |Object|, and could not be
used as the type operand of |instanceof|. This meant that the usual
|equals()| idiom:
|if (other instanceof Me) { Me o = (Me) other; return this.x == o.x &&
this.y == o.y; } else return false; |
could not be used. Instead, the signature of |equals()| was tightened to
take the value type itself. This caused all sorts of downstream pain.
But, now that there exists a boxing+widening conversion from any value
type to |Object|, and we can support value types in |instanceof|, we are
in better shape; value types can derive from |Object|, and implement
|equals(Object)| in the usual way. As we’ll see, this eliminates a lot
of pain.
Collection::contains
A number of methods in Collection (|contains|, |remove|, etc) take
|Object| instead of |T|. (This allows us to ask questions like
|dogs.contains(animal)|. One could argue that this is not necessary, but
its the collections API we have.)
In Q-world, these methods were problematic, because they would
eventually bottom out in |Object::equals|:
|boolean contains(Object o) { for (T t : elements) return t.equals(o);
return false; } |
But, if |V::equals| took a |V|, this code would not compile. Our
previous-best (not necessarily good) strategy for this was a complicated
dependent type that amounted to /(erased T ? Object : T)/, which would
let the signature of reified collections be |contains(T)| and erased
collections be |contains(Object)|.
Having healed the rift by allowing value classes to extend |Object| (or
close enough), this problem goes away; the existing signatures and
implementations are good enough.
Collection::removeAll
The problem that |equals()| imposes on |contains()| is further kicked up
the road to |containsAll()|, which accepts a |Collection<?>| rather than
a more narrowly typed collection (for the same reason:
|dogs.containsAll(animals)|.) And, having addressed |contains(Object)|,
we’re in good shape for |containsAll|:
|boolean containsAll(Collection<?> c) { for (Object o : c) if
(!contains(o)) return false; return true; } |
This assumes that a wildcard instantiation of an any-generic class
refers to any instantiation, not just an erased instantiation (which is
essentially a forced move if we don’t want users to hate us.)
Implementing Object::equals
Consider this erased generic class:
|class Box<T> { T t; public Box(T t) { this.t = t; } public boolean
equals(Object other) { return (other instanceof Box b) // pattern
matching FTW! && Objects.equals(t, b.t); } |
This code is OK, but it is somewhat unsatisfying, because we’re not
comparing that the two boxes hold the same type. This might be OK for
|Box| — because we’re going to call |equals()| on the held |T| value —
but doesn’t work as well for |List|. If I have an erased generic |List|,
then this implementation of |equals()| (which is what |List::equals|
does) would say that an empty |List<Integer>| is equal to an empty
|List<String>|. This is kind of unsatisfying. Its the best we can do
when the types are erased, but it’s not the best we can do for a
specialized |List|. Unfortunately, we would like to use the same code
for both.
But, it gets worse. If I’m an erased |Box|, then casting to |Box| or
|Box<?>| is the best I can do, and we don’t want to lie to the user and
let them ask |instanceof Box<T>| when we can’t really answer that
question. But if I’m a specialized |Box|, I’d like to check against and
cast to |Box<T>|, because I would like to take a |T| out of the box, and
not just |Object|.
We could punt here, and say this is the cost of possibly-erased
generics. And this might be OK. Our previous best (but not necessarily
good) idea was to introduce a dependent type |T.OR_ERASED|, which would
basically mean /(erased T ? erasure(T) : T)/. Then we could ask
|if (other instanceof Box<T.OR_ERASED>) Box<T.OR_ERASED> o =
(Box<T.OR_ERASED>) other; T t = o.get(); ... } |
which is exactly the question we want to ask. It also turns out to be
how we’re likely to represent |Box<T>| in the class file. But, it may be
too much to ask of users. (Though, we can give them the choice; users
who are willing to let an empty |List<int>| be equal to an empty
|List<String>| can go the old route, and users who want finer control
can use the new locution.)
List::remove(int)
List contains the following methods:
|boolean remove(Object object); E remove(int index); |
The former is inherited from |Collection|; the latter means “remove the
element at the given index.” The authors of this method thought there
was no way this could be an overload conflict, and so called this method
|remove| rather than the more verbose |removeAt|. But, if we allow the
instantiation |List<int>|, now these methods become ambiguous.
Our story here was some sort of conditional-member trick, where we said
something like:
* |remove(int)| is only defined on reference instantiations of |List|
* |List| acquires a new method, |removeAt(int)|.
* On reference instantiations of |List|, |removeAt(int)| has a default
that delegates to |remove(int)|.
* On other instantiations of |List|, |removeAt(int)| is abstract.
This allowed existing clients to work, because all existing
instantiations of |List| are typed as some sort of erased |List|. (If a
client anyfied their use of |List|, then its reasonable to ask them to
change their code.) It also allows existing subclasses to work, as these
are all erased. New clients and subclasses would have to use and declare
the new |removeAt(int)| method.
The leading syntactic choice for declaring conditional methods is to use
|this| parameters:
|E remove(List<ref T> this, int index); E removeAt(int index); default
removeAt(List<ref T> this, int index) { return remove(index); } |
Note that to get this right, we needed to declare |removeAt(int)| twice,
once as a total interface method, and once as a partial default.
Map::get
|Map::get| uses |null| to signal that the specified key was not found in
the map. For (non-nullable) value types, this is fundamentally a
misguided method signature. (In general, an unconditional get method is
ill-suited to types that have no available sentinel value, such as
|int|. Types that have spare bit patterns, such as |LocalDate|, could
still work in this model. Note that this pushes on the so-far implicit
linkage between erasure, reference instantiation, and nullability, which
may need to be revisited.)
One approach is to treat |Map::get| as being the same sort of pariah
that |List::remove(int)| is; we’ll grudgingly welcome it in erased
company, but as our types become more refined, we erase our memory of
our erstwhile coarser companions:
|V get(Map<K, ref V> this, K key); |
Of course, we have to add some new methods to replace this. This could
be an |Optional|-returning |get()|, or a |get()| that takes a
user-supplied sentinel, or a pattern with a binding variable. This is an
open libraries question.
An alternate route is to recognize that there’s a projection from every
type to a closely-related nullable type. If we give every value type a
companion box type, and have a way to denote it, we can totalize this
notation trivially, and declare |get| as:
|V.Box get(K k) |
For erased instantiations, |V.Box| is just |V|. In L-world, I like this
solution as the boxing conversion is so cheap, that attempting to avoid
boxing doesn’t seem worth it. (Which is what winning looks like!)
Collection::toArray
The return type of |Collection::toArray| is |Object[]|, rather than
|T[]|. This is largely a forced move for erased collections (otherwise
we’d risk heap pollution without issuing unchecked warnings), but for a
specialized collection, we can (and want to) do better.
As with |Map::get|, there are a few paths. We could restrict |toArray()|
to erased instantiations, and expose a new method, such as
|toArray(factory)|, as |Stream| does now (and which |Collection| is
likely to get soon.) This is doable, but kind of sucks as the user
shouldn’t have to provide |T[]::new| as a factory for a reified
collection, since the collection presumably already knows it is a
collection of |T|.
Alternately, we could make another trip to the type operator well, and
define |T.ARRAY| as /(erased T ? Object[] : T[])/. (This is a close
relative of the type operator we floated for |equals()|, and they could
probably be merged.) Then we redefine
|T.ARRAY toArray() |
which erases to |Object[]| for erased instantiations, making existing
clients and subclasses happy.
9-way method sets
There are a number of method overload sets (such as
|StringBuilder::append| or |Arrays::fill|) that have overloads for all
the types (eight primitives, and one generic or Object version.) In
Q-World, we had trouble specializing code that called these methods,
because there was no applicable method for value instantiations. (For
some of these (|Arrays::fill|), we’d like to turn these into a single
specializable method, with bridges for the existing
hand-specializations.) Our troubles here in Q-World stemmed from the
fact that we were not able to conclude that these overload groups could
be effectively unified into a total signature, so we could not call
these methods from any-generic code. This is an open issue involving
membership computation (can the compiler assume there is a total member
if it finds a covering set of overloads?) and template classfile design
(can we encode an invocation of such a method in a template classfile.)
Separately, we either need to decide to lump the value versions of these
in with |Object| (which works OK with |m(Object)| but not with
|m(Object[])| unless we plunk for covariance), or to have a |Value|
super type and let users code an |m(Value)| version.
Species statics
In migrating generic code, it is not uncommon to use erasure as a way of
sleazily sharing an instance. For example, this idiom is common:
|private static final Collection<?> EMPTY = new EmptyCollection();
@SuppressWarning(“unchecked”) public static<T> Collection<T>
emptyCollection() { return (Collection<T>) EMPTY); } |
This is OK in erased code, but a no-go in specialized code. But, users
will be reluctant to give up such idioms. We concluded the answer is
that some statics are static to the /class/, and others care static to
the /species/, and the above idiom is an example of the latter. (The
type variables are in scope in a species context, but not a class context.)
Wildcards
I saved the worst for last. In the language type system, |Foo<?>| is a
super type of every instantiation of |Foo|. Conveniently, we erase
|Foo<T>|, |Foo<?>|, and raw |Foo| to the same thing — |LFoo;|, and
existing classfiles will use this to denote both “erased Foo” and “any
Foo”. It is likely we’re going to have to pull some serious magic to
compatibly retain this illusion. But, we’re talking about user model,
not migration compatibility. Which means that we need a way to write
down a wildcard instantiation, and the VM needs to know that every
species is a subtype of the wildcard.
There are multiple wildcard instantiations, since there could be
multiple type variables. If we have |Foo<T,U>|, we could have
instantiations |Foo<?,String>| and |Foo<String,?>| and |Foo<?,?>|, with:
|Foo<String,String> <: Foo<String,?> <: Foo<?, ?> Foo<String,String> <:
Foo<?,String> <: Foo<?, ?> |
We could assign each wildcard type its own runtime type, or we could
collapse them to a single all-wild type. The tradeoff here is boxing vs
type proliferation; if we give each wildcard its own runtime type, we
get sharper type signatures (|Map<?,int>| is known to return |int|,
rather than erasing to |Object|) but more types and more bridges, but
less boxing. (Given that boxing just got a lot cheaper, that gives us a
nudge towards the simpler solution.)
Summary so far
The story so far is much better than it was in Q-World! Some of the
problems (such as |equals(Object)|) from Q-World have just gone away,
and others are amenable to techniques we have already identified.
Candidate pieces of the solution include:
* Template classfiles
* Species statics
* Partial / conditional members
* |T.default|
* |T.BOX|
* |T.OR_ERASED|
* |T.ARRAY|
Open issues include:
* Array covariance
* Equality comparisons
* Locking
* Whether “erased” and “reference instantiation” are coupled
* Whether we can get away with projected types like |T.OR_ERASED|
* A top value type
* Representation of wildcards
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/attachments/20181017/01d52b42/attachment-0001.html>
More information about the valhalla-spec-experts
mailing list