Type-dependent operations

Fri Dec 25 00:13:56 UTC 2015

The previous discussion topic (which is still not remotely finished, 
you're not off the hook yet!) centered on migrating generic APIs.  A 
related topic is that of migrating the /implementations/ of these APIs.  
This exploration has been informed by the prototype of Collections and 
Streams.

In the lucky case, no changes are needed to the bodies of methods after 
anyfying the API -- just adding "any" is all you need. However, not all 
cases are so lucky.

Here's a list of places where simply recompiling existing ref-generic 
code as any-generic could run into problems.

*Nullity. *References are nullable, values are not.  (We'll have a 
separate discussion for "nullable values", which may be desirable for 
migration compatibility.)

*Variance. *References are polymorphic; values are not.

*Identity. *References have identity; values do not.  For example, this 
means that values cannot be used as the lock object for a synchronized 
block.

*Object methods. *Value types will almost certainly have some form of 
equals, hashCode, toString, and getClass; they will almost certainly not 
have wait, notify, or notifyAll methods, and probably not clone either.

*Relationship with Object. *All reference types are subtypes of Object; 
value types are not.  Similarly, arrays of reference types are subtypes 
of Object[]; arrays of value types are not.

*Array creation. *Currently, the idiom for creating a T[] array is to 
create an Object[] array, and statically cast it to T[].  This won't 
work with value arrays.

*Instanceof, casting, and type literals. *Instanceof doesn't permit a 
parameterized type on the RHS.  However, this is not specific enough for 
specialized types; the runtime class of List<int> is different from 
List<String>.  Casting does permit a parameterized type, but currently 
this is interpreted as a static cast.  Similarly, type literals as 
currently formulated are not specific enough either.

*Wildcards. *The existing wildcard Foo<?> means (and must continue to 
mean) Foo<? extends Object>.

When dealing with quantities that might either be a reference or a 
value, such as an expression of type 'T' where T is an avar, the 
compiler must be conservative and only allow operations that can be 
proven safe for either references or values.  So it would have to 
reject, for example, assigning a null to a T, since we don't know that 
null is a member of the domain for all T.

Obviously, we are not going to add value types and any-tvars to the 
language, and then not adjust the other places where type variables meet 
other language features.  So clearly some of these issues will be 
addressed by extending the semantics of existing language features.

But, we don't necessarily *have* to change anything in order for things 
to "work".  A method in an any-generic class could be "peeled" into a 
ref version and a value version:

class Foo<any T> {
     <where ref T>
     public void moo(T t) {
         // existing method body
     }

     <where val T>
     public void moo(T t) {
         // alternate method body, that steers clear of restrictions
     }
}
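
For comparison, today's Java already forces this kind of peeling by 
hand whenever code has to cover both references and primitives.  The 
sketch below is only an illustration in current Java, not the prototype 
syntax; the class and method names are hypothetical.

```java
import java.util.Objects;

// Hypothetical illustration: one logical operation, written twice.
final class PeelingSketch {
    // "ref" version: works for any reference element type.
    static <T> int indexOf(T[] array, T key) {
        for (int i = 0; i < array.length; i++) {
            if (Objects.equals(array[i], key)) return i;  // null-safe equals
        }
        return -1;
    }

    // "val" version, hand-written for int: same shape, but uses ==.
    static int indexOf(int[] array, int key) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == key) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf(new String[] {"a", "b"}, "b"));  // 1
        System.out.println(indexOf(new int[] {10, 20, 30}, 30));    // 2
    }
}
```

The two bodies differ only in how elements are compared, which is the 
kind of duplication the rest of this note tries to avoid.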

But, asking users to write their code twice would be rude, so we want to 
keep this sort of peeling to a bare minimum, and preserve it as an 
"escape hatch".  If we're to minimize peeling, it stands to reason that 
either new linguistic forms need to be added, or existing forms need to 
be stretched to accommodate the broadened domain of genericity.  Let's 
take these one at a time.

*Nullity. *We can further break this down into assignment to null and 
comparison to null.

Assignment to null is not going to fly.  Our current prototype supports 
the expression T.default, which evaluates to the default value for 
whatever type T describes.  For reference instantiations, this is null; 
for primitives, this is zero/false. Assignment to null can be replaced 
with assignment to T.default.
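
To make the intended meaning of T.default concrete, here is a rough 
emulation in today's Java.  It is an assumption-laden sketch, not how 
the prototype implements it: the helper name defaultValue is 
hypothetical, and it takes a Class token because current Java has no 
reified T to ask.

```java
import java.lang.reflect.Array;

// Hypothetical helper emulating T.default with a Class token.
final class DefaultValues {
    // A freshly allocated one-element array holds the component type's
    // default value: null for references, zero/false for primitives.
    // Primitive defaults come back boxed (e.g. Integer 0 for int.class).
    static Object defaultValue(Class<?> type) {
        return Array.get(Array.newInstance(type, 1), 0);
    }

    public static void main(String[] args) {
        System.out.println(defaultValue(String.class));   // null
        System.out.println(defaultValue(int.class));      // 0
        System.out.println(defaultValue(boolean.class));  // false
    }
}
```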

For comparison with null, there are some options.  In the prototype, we 
currently have a peeled generic method <any T>Any.isNull(), which 
returns false for value invocations. However, even swapping out ==null 
for Any.isNull() is somewhat intrusive; instead, we can define ==null 
such that it constant folds to false for value instantiations.  Then 
existing source code is unchanged (and there's no runtime overhead for 
the null check in value instantiations, since it's been folded away.)

When we look more closely at the possibility of nullable value types, 
this will have to be refined.
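
The behavior of a peeled isNull can be approximated in today's Java with 
overloads.  This is only a sketch of the semantics; the class name 
NullChecks is hypothetical and it is not the prototype's Any.isNull.

```java
// Hypothetical stand-in for a peeled isNull: one overload for references,
// one per primitive type of interest.
final class NullChecks {
    // Reference case: the ordinary null test.
    static boolean isNull(Object ref) {
        return ref == null;
    }

    // Value cases: a value can never be null, so the answer is constant.
    static boolean isNull(int value) {
        return false;
    }

    static boolean isNull(double value) {
        return false;
    }

    public static void main(String[] args) {
        String missing = null;
        System.out.println(isNull(missing));  // true
        System.out.println(isNull("x"));      // false
        System.out.println(isNull(0));        // false: 0 is a value, not null
    }
}
```

Because the value overloads return a constant, a compiler or JIT is free 
to fold the check away, which is the effect described above for ==null.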

*Variance. *We already fold  "? extends T"  and  "? super T"  to T when 
T is known to be a value type.  (More specifically, we treat wildcards 
bounded by avars as a dependent type, (if erased T  then  ? extends T  
else  T)).

*Identity. *There are a few cases here -- synchronization, reference 
comparison to an Object, System.identityHashCode.  I think it makes 
sense to reject synchronization, and instead ask for more explicit 
lock-selection logic (perhaps appealing to a peeled <any T> Object 
lockFor(T) method).  For reference comparison to Object, we can treat 
this as we propose above for comparison to null -- constant fold to 
false.  For System.identityHashCode, we can peel this into something 
that uses ordinary hashCode for values.
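
One possible shape for these peeled helpers, sketched with overloads in 
today's Java.  The names and the choice of a single shared lock for 
values are my assumptions for illustration; what lockFor should return 
for a value is exactly the "explicit lock-selection logic" that is left 
open above.

```java
// Hypothetical sketch of peeled identity helpers.
final class LockSelection {
    // Assumption: all values share one stand-in lock. A real design might
    // stripe locks or require the caller to supply one.
    private static final Object SHARED_VALUE_LOCK = new Object();

    // Reference case: the object is its own lock, as with synchronized today.
    static Object lockFor(Object ref) {
        return ref;
    }

    // Value case: no identity to lock on, so fall back to the stand-in.
    static Object lockFor(int value) {
        return SHARED_VALUE_LOCK;
    }

    // Reference case: the identity hash.
    static int identityHashFor(Object ref) {
        return System.identityHashCode(ref);
    }

    // Value case: the ordinary hashCode.
    static int identityHashFor(int value) {
        return Integer.hashCode(value);
    }

    public static void main(String[] args) {
        String key = "config";
        synchronized (lockFor(key)) {
            System.out.println("locked on the reference itself");
        }
        synchronized (lockFor(42)) {
            System.out.println("locked on the shared stand-in");
        }
    }
}
```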

*Object methods. *We've already discussed the Objectible<any T> 
interface, which would define the core methods equals, hashCode, 
toString, and getClass.  Other Object methods (wait, notify, notifyAll, 
and probably clone) would not be available on any-T-valued expressions.  
(Arguably clone() could be the identity function on values, but this may 
not be worth it -- cloning is pretty broken.)

*Relationship with Object. *Assignment to Object could be accepted as a 
possibly-autoboxed operation, but I'm not sure this is a great idea -- 
it might be better to have an explicit toObject() method (maybe even on 
Objectible).  Assignment of T[] to Object[] needs to be rejected, but in 
most cases Object[] can be replaced with T.erasure[] (just like 
replacing null with T.default.)

*Array creation. *The current prototype supports the expression form 
new T[n], which downgrades to Object[] when T is a reference type (and 
issues an unchecked warning.)  Alternately, we could provide a 
reflective method <any T> T[] newArray(int), also with an unchecked 
warning.  We can make the unchecked warning go away if the new 
expression were new T.erasure[n] (or the library version returned 
T.erasure).
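
For reference, here are the two array-creation routes available in 
today's Java.  This is a sketch with hypothetical method names; the 
reflective variant takes a Class token, which the <any T> newArray(int) 
form above would not need.

```java
import java.lang.reflect.Array;

// Hypothetical helpers contrasting the two array-creation idioms.
final class ArrayCreation {
    // Today's idiom: allocate Object[] and statically cast. The runtime
    // class stays Object[], hence the unchecked warning, and the trick
    // cannot produce an int[].
    @SuppressWarnings("unchecked")
    static <T> T[] newErasedArray(int n) {
        return (T[]) new Object[n];
    }

    // Reflective route: the runtime array class matches the component
    // type, including primitive component types.
    static Object newArray(Class<?> componentType, int n) {
        return Array.newInstance(componentType, n);
    }

    public static void main(String[] args) {
        Object ints = newArray(int.class, 3);
        System.out.println(ints.getClass().getSimpleName());  // int[]
        System.out.println(ints instanceof Object[]);         // false
    }
}
```

The second line of output is the "Relationship with Object" problem in 
miniature: an int[] is not an Object[].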

*Instanceof, casting, and type literals. *It is straightforward enough to 
extend instanceof to support "instanceof Foo<int>", and similarly for 
cast and type literals.  We can do the same for the wildcard Foo<any>.

Supporting "instanceof Foo<T>" is trickier because T might be erased, 
and so it might not give you the answer you expect. Currently in a 
generic class Foo<T> you can ask if something is instanceof raw Foo, or 
of wildcard Foo<?>.  The equivalent question with any-generics is more 
complicated; you want to express "If I am erased and the other is erased 
Foo, OR I am not erased and the other is the same instantiation of Foo 
as me."  (This collapses to "do they have the same runtime class", but 
that's not really what we want to encourage people to write.) Simply 
extending instanceof to support Foo<T> (even with an unchecked warning) 
seems insufficient here, because in the erased case, it will say yes 
when all it can tell is "they're both erased Foo", and it seems like it 
promises more than it delivers.  (And, it should be possible to write a 
sensible equals() method without unchecked warnings.)  But all is not 
lost!  Our friendly dependent type T.erasure saves us here too:

    if (other instanceof Foo<T.erasure>)

(This is a slight stretching of the syntax, since we're not really 
asking if the other is an instance of Foo<Object> in the erased case, 
but only slightly.)  We can do the same for casting; I am not yet sure 
it makes sense to do the same for type literals.
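
The erased half of that question can be observed in today's Java.  The 
sketch below (hypothetical class name) shows that two different 
parameterizations share one runtime class, while array types, which are 
reified, do not; the latter is only an analogy for how specialized 
instantiations such as List<int> would behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of what a runtime check can and cannot distinguish.
final class ErasedInstanceof {
    static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();

        // Both are erased ArrayList, so the runtime cannot tell them apart.
        System.out.println(sameRuntimeClass(strings, ints));            // true

        // Array types keep their element type at runtime.
        System.out.println(sameRuntimeClass(new int[0], new String[0])); // false

        // The wildcard form is all instanceof can promise for erased types.
        Object o = ints;
        System.out.println(o instanceof List<?>);                       // true
    }
}
```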

*Wildcards. *Code that makes use of Foo<?> will likely want to migrate 
to using Foo<any> instead.

Looking at how many times T.erasure plays into the answer, you can see 
why I was arguing for it in the context of the API migration -- because 
with any of the other API approaches, we would still have the same set 
of problems / unchecked warnings when we get to the method body.

Take the equals() method.  We would like to be able to write an equals() 
method once, generically for all instantiations, with no peeling and no 
unchecked warnings.  The T.erasure approach gets us there.

If we have a class Box<T> today:

class Box<T> {
     T t;

     public boolean equals(Object o) {
        if (o instanceof Box<?>) {             // can't ask for Box<T>
            Box<?> other = (Box<?>) o;         // can't cast to Box<T>
            if (t == null)
                return other.t == null;
            Object otherT = other.t;           // can't extract as a T
            return t.equals(otherT);
        }
        else
            return false;
     }
}

The instanceof test, the cast, and the declaration of otherT (shown in 
red in the HTML version of this message) are those where erasure is 
exposed to the programmer; the programmer would like to ask if the other 
object is a Box<T>, cast it to a Box<T>, and extract its state as a T, 
but can't do so safely, so we settle for answering a looser question.
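
To see how loose that question is, here is a runnable variant of the 
class in today's Java.  The name RefBox, the constructor, and the 
hashCode override are additions of mine for the sake of a 
self-contained example.

```java
import java.util.Objects;

// Hypothetical runnable variant of the Box<T> class above.
final class RefBox<T> {
    final T t;

    RefBox(T t) {
        this.t = t;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof RefBox<?>) {
            RefBox<?> other = (RefBox<?>) o;   // wildcard: T is unknown here
            if (t == null)
                return other.t == null;
            Object otherT = other.t;           // state extracted as Object
            return t.equals(otherT);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(t);            // consistent with equals
    }

    public static void main(String[] args) {
        // Different static types, yet equal: the type argument plays no part.
        System.out.println(new RefBox<String>("a").equals(new RefBox<Object>("a")));
        System.out.println(new RefBox<String>(null).equals(new RefBox<Integer>(null)));
    }
}
```

Both lines print true: a RefBox<String> and a RefBox<Integer> that both 
hold null compare equal, because the check never sees T.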

Here's the same class, anyfied.  The changes from the above are the 
"any" in the class header and each use of T.erasure.

class Box<any T> {
     T t;

     public boolean equals(Object o) {
        if (o instanceof Box<T.erasure>) {
            Box<T.erasure> other = (Box<T.erasure>) o;
            if (t == null)
                return other.t == null;
            T.erasure otherT = other.t;
            return t.equals(otherT);  // This is .equals(T.erasure) too
        }
        else
            return false;
     }
}

My claim here is: not only is this safe (no unchecked warnings, no heap 
pollution), and not only is it more generic because the domain of 
genericity is broadened, but that it is /less polluted by erasure/ 
(despite the word "erasure" appearing prominently.)  By using the 
T.erasure type, we're able to explicitly say "use the sharpest type you 
can, modulo erasure" in the instanceof, cast, variable extraction, and 
equals contexts, and the limitations of our approximations are explicit 
-- and we get more type checking than we would by manually erasing 
things to "Object".  We're working within the type system, rather than 
outside it.

Overall, with the language features adjusted as described (loosely) 
herein, we can migrate existing generic code to any-generic in a fairly 
localized and mechanized manner, with only a few idioms (e.g., locking) 
requiring any sort of peeling on the part of the user.  The (incomplete) 
prototype of Collections in the Valhalla repo seems consistent with this 
theory.

Oh, and there's one more elephant in this room: serialization. Lots of 
work will be needed for serialization, which uses Object everywhere ... 
but that's another day.
