Type-dependent operations
Brian Goetz
brian.goetz at oracle.com
Fri Dec 25 00:13:56 UTC 2015
The previous discussion topic (which is still not remotely finished,
you're not off the hook yet!) centered on migrating generic APIs. A
related topic is that of migrating the /implementations/ of these APIs.
This exploration has been informed by the prototype of Collections and
Streams.
In the lucky case, no changes are needed to the bodies of methods after
anyfying the API -- just adding "any" is all you need. However, not all
cases are so lucky.
Here's a list of places where simply recompiling existing ref-generic as
any-generic could run into problems.
*Nullity. *References are nullable, values are not. (We'll have a
separate discussion for "nullable values", which may be desirable for
migration compatibility.)
*Variance. *References are polymorphic; values are not.
*Identity. *References have identity; values do not. For example, this
means that values cannot be used as the lock object for a synchronized
block.
*Object methods. *Value types will almost certainly have some form of
equals, hashCode, toString, and getClass; they will almost certainly not
have wait, notify, or notifyAll methods, and probably not clone either.
*Relationship with Object. *All reference types are subtypes of Object;
value types are not. Similarly, arrays of reference types are subtypes
of Object[]; arrays of value types are not.
*Array creation. *Currently, the idiom for creating a T[] array is to
create an Object[] array, and statically cast it to T[]. This won't
work with value arrays.
*Instanceof, casting, and type literals. *Instanceof doesn't permit a
parameterized type on the RHS. However, this is not specific enough for
specialized types; the runtime class of List<int> is different from
List<String>. Casting does permit a parameterized type, but currently
this is interpreted as a static cast. Similarly, type literals as
currently formulated are not specific enough either.
*Wildcards. *The existing wildcard Foo<?> means (and must continue to
mean) Foo<? extends Object>.
When dealing with quantities that might either be a reference or a
value, such as an expression of type 'T' where T is an avar, the
compiler must be conservative and only allow operations that can be
proven safe for either references or values. So it would have to
reject, for example, assigning a null to a T, since we don't know that
null is a member of the domain for all T.
Obviously, we are not going to add value types and any-tvars to the
language, and then not adjust the other places where type variables meet
other language features. So clearly some of these issues will be
addressed by extending the semantics of existing language features.
But, we don't necessarily *have* to change anything in order for things
to "work". A method in an any-generic class could be "peeled" into a
ref version and a value version:
    class Foo<any T> {
        <where ref T>
        public void moo(T t) {
            // existing method body
        }

        <where val T>
        public void moo(T t) {
            // alternate method body, that steers clear of restrictions
        }
    }
But, asking users to write their code twice would be rude, so we want to
keep this sort of peeling to a bare minimum, and preserve it as an
"escape hatch". If we're to minimize peeling, it stands to reason that
either new linguistic forms need to be added, or existing forms be
stretched to accommodate the broadened domain of genericity. Let's take
these one at a time.
*Nullity. *We can further break this down into assignment to null and
comparison to null.
Assignment to null is not going to fly. Our current prototype supports
the expression T.default, which evaluates to the default value for
whatever type T describes. For reference instantiations, this is null;
for primitives, this is zero/false. Assignment to null can be replaced
with assignment to T.default.
For comparison with null, there are some options. In the prototype, we
currently have a peeled generic method <any T>Any.isNull(), which
returns false for value invocations. However, even swapping out ==null
for Any.isNull() is somewhat intrusive; we can define ==null such that
it constant folds to false for value instantiations. Then existing
source code is unchanged (and there's no runtime overhead for the null
check in value instantiations, since it's been folded away.)
When we look more closely at the possibility of nullable value types,
this will have to be refined.
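To make the two nullity rules concrete, here is a rough model in
today's Java. It is only a sketch: NullityModel, defaultOf and the
isNull overloads are hypothetical names, a class token stands in for
the type variable, and overloads stand in for a peeled generic method.
None of this is the prototype's syntax.

```java
import java.lang.reflect.Array;

// Hypothetical helper, not part of the Valhalla prototype: approximates
// T.default and a peeled isNull() with class tokens and overloads.
final class NullityModel {
    private NullityModel() {}

    // Stand-in for T.default: a freshly allocated one-element array holds
    // the default value of its component type (null, 0, false, ...).
    // Primitive defaults come back boxed, since the return type is Object.
    static Object defaultOf(Class<?> type) {
        return Array.get(Array.newInstance(type, 1), 0);
    }

    // "Peeled" null test, reference case: an ordinary null comparison.
    static boolean isNull(Object ref) {
        return ref == null;
    }

    // "Peeled" null test, primitive case: a value can never be null, so
    // the answer is the constant false.
    static boolean isNull(int value) {
        return false;
    }
}
```

Calling NullityModel.isNull(0) selects the int overload and yields
false without inspecting the argument, which is the behaviour the
constant-folded ==null would give for value instantiations.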
*Variance. *We already fold "? extends T" and "? super T" to T when
T is known to be a value type. (More specifically, we treat wildcards
bounded by avars as a dependent type, (if erased T then ? extends T
else T)).
*Identity. *There are a few cases here -- synchronization, reference
comparison to an Object, System.identityHashCode. I think it makes
sense to reject synchronization, and instead ask for more explicit
lock-selection logic (perhaps appealing to a peeled <any T> Object
lockFor(T) method). For reference comparison to Object, we can treat
this as we propose above for comparison to null -- constant fold to
false. For System.identityHashCode, we can peel this into something
that uses ordinary hashCode for values.
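One possible shape for the peeled lock-selection and identity-hash
helpers, modelled with overloads in today's Java. This is a sketch
under assumptions: IdentityModel, lockFor and identityHash are
illustrative names, int stands in for "some value type", and the
striped lock pool is just one choice of policy, not something the
prototype prescribes.

```java
// Hypothetical sketch: overloads stand in for the ref and val peelings.
final class IdentityModel {
    private IdentityModel() {}

    // A small fixed pool of lock objects for values, which have no
    // identity of their own to synchronize on.
    private static final Object[] STRIPES = new Object[16];
    static {
        for (int i = 0; i < STRIPES.length; i++) {
            STRIPES[i] = new Object();
        }
    }

    // Reference case: the object itself can serve as the monitor.
    static Object lockFor(Object ref) {
        return ref;
    }

    // Value case: pick a stripe by hash, so equal values share a lock.
    static Object lockFor(int value) {
        return STRIPES[Math.floorMod(Integer.hashCode(value), STRIPES.length)];
    }

    // Reference case: the real identity hash.
    static int identityHash(Object ref) {
        return System.identityHashCode(ref);
    }

    // Value case: fall back to the ordinary hashCode.
    static int identityHash(int value) {
        return Integer.hashCode(value);
    }
}
```

A caller would then write synchronized (IdentityModel.lockFor(t)) { ... }
and get the object's own monitor for references and a shared stripe
for values.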
*Object methods. *We've already discussed the Objectible<any T>
interface, which would define the core methods equals, hashCode,
toString, and getClass. Other Object methods (wait, notify, notifyAll,
and probably clone) would not be available on any-T-valued expressions.
(Arguably clone() could be the identity function on values, but this may
not be worth it -- cloning is pretty broken.)
*Relationship with Object. *Assignment to Object could be accepted as a
possibly-autoboxed operation, but I'm not sure this is a great idea --
it might be better to have an explicit toObject() method (maybe even on
Objectible). Assignment of T[] to Object[] needs to be rejected, but in
most cases Object[] can be replaced with T.erasure[] (just like
replacing null with T.default.)
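Today's primitives already show both halves of this, so they make a
reasonable stand-in for values. The sketch below (ObjectRelationModel
and its method names are hypothetical) shows the autoboxed assignment
to Object and the array relationship that must be rejected.

```java
// Hypothetical helper using int as a stand-in for a value type.
final class ObjectRelationModel {
    private ObjectRelationModel() {}

    // Assignment to Object is accepted today only via autoboxing; an
    // explicit toObject() would make this conversion visible.
    static Object toObject(int value) {
        return value; // boxes to java.lang.Integer
    }

    // True when the argument can be viewed as an Object[] at runtime.
    static boolean isObjectArray(Object candidate) {
        return candidate instanceof Object[];
    }
}
```

Here isObjectArray(new String[1]) is true while isObjectArray(new int[1])
is false, which is why code that assigns a T[] to an Object[] cannot
simply be recompiled for value instantiations.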
*Array creation. *The current prototype supports the expression form
new T[n], which downgrades to Object[] when T is a reference type (and
issues an unchecked warning.) Alternately, we could provide a
reflective method <any T> T[] newArray(int), also with an unchecked
warning. We can make the unchecked warning go away if the new
expression were new T.erasure[n] (or the library version returned
T.erasure).
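For comparison, here is roughly what the two approaches look like in
today's Java. This is a sketch, not the prototype's API:
ArrayCreationModel, erasedArray and newArray are hypothetical names,
and the reflective version takes a class token and returns Object
because there is no way today to give it the type T[] for primitive T.

```java
import java.lang.reflect.Array;

// Hypothetical helper contrasting the erased idiom with reflection.
final class ArrayCreationModel {
    private ArrayCreationModel() {}

    // Today's idiom: allocate Object[] and cast unchecked. The runtime
    // class is always Object[], so this cannot produce an int[].
    @SuppressWarnings("unchecked")
    static <T> T[] erasedArray(int length) {
        return (T[]) new Object[length];
    }

    // Reflective alternative: the class token picks the real component
    // type, so primitive component types work too.
    static Object newArray(Class<?> componentType, int length) {
        return Array.newInstance(componentType, length);
    }
}
```

newArray(int.class, 3) really is an int[], whereas erasedArray can only
ever hand back an Object[] wearing a T[] static type.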
*Instanceof, casting, and type literals. *It is straightforward enough to
extend instanceof to support "instanceof Foo<int>", and similarly for
cast and type literals. We can do the same for the wildcard Foo<any>.
Supporting "instanceof Foo<T>" is trickier because T might be erased,
and so it might not give you the answer you expect. Currently in a
generic class Foo<T> you can ask if something is instanceof raw Foo, or
of wildcard Foo<?>. The equivalent question with any-generics is more
complicated; you want to express "If I am erased and the other is erased
Foo, OR I am not erased and the other is the same instantiation of Foo
as me." (This collapses to "do they have the same runtime class", but
that's not really what we want to encourage people to write.) Simply
extending instanceof to support Foo<T> (even with an unchecked warning)
seems insufficient here, because in the erased case, it will say yes
when all it can tell is "they're both erased Foo", and it seems like it
promises more than it delivers. (And, it should be possible to write a
sensible equals() method without unchecked warnings.) But all is not
lost! Our friendly dependent type T.erasure saves us here too:
if (other instanceof Foo<T.erasure>)
(This is a slight stretching of the syntax, since we're not really
asking if the other is an instance of Foo<Object> in the erased case,
but only slightly.) We can do the same for casting; I am not yet sure
it makes sense to do the same for type literals.
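The erased starting point is easy to observe in today's Java. The
sketch below (ErasureModel and its method names are hypothetical) shows
that two differently parameterized lists share one runtime class, and
that the sharpest instanceof question available is the wildcard one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing what erasure leaves visible at runtime.
final class ErasureModel {
    private ErasureModel() {}

    // Compares runtime classes; type arguments play no part because
    // they are erased.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    // The sharpest instanceof test expressible today: "is it some List?"
    static boolean isSomeList(Object o) {
        return o instanceof List<?>;
    }

    // Convenience for the comparison described in the text.
    static boolean stringAndIntegerListsShareClass() {
        return sameRuntimeClass(new ArrayList<String>(), new ArrayList<Integer>());
    }
}
```

With specialization, the analogous comparison between List<int> and
List<String> would come out false, which is the distinction the
extended instanceof needs to be able to express.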
*Wildcards. *Code that makes use of Foo<?> will likely want to migrate
to using Foo<any> instead.
Looking at how many times T.erasure plays into the answer, you can see
why I was arguing for it in the context of the API migration -- because
with any of the other API approaches, we would still have the same set
of problems / unchecked warnings when we get to the method body.
Take the equals() method. We would like to be able to write an equals()
method once, generically for all instantiations, with no peeling and no
unchecked warnings. The T.erasure approach gets us there.
If we have a class Box<T> today:
    class Box<T> {
        T t;

        boolean equals(Object o) {
            if (o instanceof Box<?>) {
                Box<?> other = (Box<?>) o;
                if (t == null)
                    return other.t == null;
                Object otherT = other.t;
                return t.equals(otherT);
            }
            else
                return false;
        }
    }
The parts in red (the Box<?> in the instanceof test and in the cast,
and the Object type of otherT) are those where erasure is exposed to
the programmer; the programmer would like to ask if the other object is
a Box<T>, cast it to a Box<T>, and extract its state as a T, but can't
do so safely, so we settle for answering a looser question.
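To see how loose that question is, here is a compilable variant of the
Box above in today's Java. It is a sketch: ErasedBox is a hypothetical
name, and the constructor, the public modifier, hashCode and the use
of Objects.equals are additions needed to make it stand alone.

```java
import java.util.Objects;

// Compilable variant of the pre-any Box; the name is hypothetical.
final class ErasedBox<T> {
    final T t;

    ErasedBox(T t) {
        this.t = t;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof ErasedBox<?>) {
            ErasedBox<?> other = (ErasedBox<?>) o;
            // Objects.equals covers both the null check and t.equals(other.t).
            return Objects.equals(t, other.t);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(t);
    }
}
```

An ErasedBox<String> holding "a" equals an ErasedBox<Object> holding
"a", and two boxes holding null are equal whatever their type
arguments: the type argument never participates in the comparison.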
Here's the same class, anyfied. Red is code that changes from the above
(the any modifier and each use of T.erasure).
    class Box<any T> {
        T t;

        boolean equals(Object o) {
            if (o instanceof Box<T.erasure>) {
                Box<T.erasure> other = (Box<T.erasure>) o;
                if (t == null)
                    return other.t == null;
                T.erasure otherT = other.t;
                return t.equals(otherT); // This is .equals(T.erasure) too
            }
            else
                return false;
        }
    }
My claim here is: not only is this safe (no unchecked warnings, no heap
pollution), and not only is it more generic because the domain of
genericity is broadened, but that it is /less polluted by erasure/
(despite the word "erasure" appearing prominently.) By using the
T.erasure type, we're able to explicitly say "use the sharpest type you
can, modulo erasure" in the instanceof, cast, variable extraction, and
equals contexts, and the limitations of our approximations are explicit
-- and we get more type checking than we would by manually erasing
things to "Object". We're working within the type system, rather than
outside it.
Overall, with the language features adjusted as described (loosely)
herein, we can migrate existing generic code to any-generic in a fairly
localized and mechanized manner, with only a few idioms (e.g., locking)
requiring any sort of peeling on the part of the user. The (incomplete)
prototype of Collections in the Valhalla repo seems consistent with this
theory.
Oh, and there's one more elephant in this room: serialization. Lots of
work will be needed for serialization, which uses Object everywhere ...
but that's another day.