Migrating methods in Collections
Brian Goetz
brian.goetz at oracle.com
Mon Dec 21 16:20:35 UTC 2015
Totally fair to ask "have we missed a simpler choice?" But sadly I
think we haven't (though I don't fully understand your second idea; I'm
interested to hear more).
> here we’re talking added complexity which directly
> affects any interaction with a generic class — not just its
> implementation.
I think this is either not true, or overstated (depending on what you
mean by "any interaction"). The only new thing that this adds (and it's
not really that new) is the fact that the set of members of a
parameterized type depends on the parameterization. For example,
List<String> might have members
    removeAt(int)   // new total method
    remove(int)     // legacy method

but List<int> would only have

    removeAt(int)   // new total method
But, this sort of dependency isn't even new; from the client
perspective, the signature of add in List<String> is add(String),
whereas its signature in List<Number> is add(Number). What's new is
that some methods won't appear in some parameterizations. I don't think
this rises to the level of "added complexity" (arguably it is even
"reduced complexity"?); the user will see this as a context-dependent
set of methods when they hit ctrl-space in their IDE. Just as the type
signatures are already specialized to the type of the receiver in this
context, methods that are not applicable will now be filtered. (Note
also that when we migrate a reference-specific method to a new method,
the new method is not value-specific, it's total, so it can be positioned
as "the old method has been deprecated in favor of this new, more
flexible method.")
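To make the client-side picture concrete, here's a sketch in today's
Java; only the part that already exists (signatures specializing to the
parameterization) actually runs, and the proposed filtering is described
in comments (removeAt is the hypothetical name used above, not an
existing method):

    import java.util.ArrayList;
    import java.util.List;

    public class ContextDependentMembers {
        public static void main(String[] args) {
            List<String> strings = new ArrayList<>();
            List<Number> numbers = new ArrayList<>();

            // Already true today: the same member presents a different
            // signature depending on the parameterization.
            strings.add("hello");     // seen here as add(String)
            numbers.add(42);          // seen here as add(Number)
            // strings.add(42);       // does not compile

            // Under the sketch above, a List<int> would additionally
            // lack the legacy remove(int) member and offer only the
            // total removeAt(int); completion would simply not show
            // the filtered method.
        }
    }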
> 1. We can simply not specialize the signatures of public collection
> methods (say, if [T] is the boxed-type of T, the signature of
> Map<K,V>.get(Object) will be [V] get(Object)). The JVM’s ability to
> avoid boxing might be good enough for this to yield the performance we
> want. New methods can, of course, be added. This approach can be taken
> in addition to or instead of superation.
Yes, this was something we considered early on. There are several issues:
- Is our box elision going to be good enough?
- Nullity
- Transparency
Elision. Deciding to not specialize the signatures means that we're
relying on box elision in the VM being good enough so that boxes are
elided "almost all the time." Sadly, I do not think this is going to be
the case. There are certainly reasons why box elision could be better
with value-boxes (we can be more hostile to their identity, and
therefore more freely elide them) than the existing wrappers, but if I
have a deep chain of calls that are passing a boxed value through a
library (common), there is a real risk of fall-off-the-cliff behavior
when we hit our various inlining limits, and can't see that both ends of
the chain prefer the unboxed variant. Further, the most important box
types -- Integer and friends -- are already deeply polluted with
identity (want to bet that no program ever locks on one?). So I think
this one goes in the "boy, it would be nice" column, but I don't think
it's something we can bet the farm on.
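To see why I'm pessimistic about eliding the existing wrappers, here's
a small, runnable illustration of the identity pollution point; nothing
in it is hypothetical, it's just today's Integer behaving like the
identity-laden box it is:

    public class BoxIdentity {
        public static void main(String[] args) {
            Integer a = 1000;
            Integer b = 1000;

            // Outside the small cache range, each boxing conversion
            // produces a distinct object, so the program can observe
            // the box's identity directly.
            System.out.println(a == b);        // false
            System.out.println(a.equals(b));   // true

            // And nothing stops code from locking on a box; eliding
            // the box would change what this synchronizes on.
            synchronized (a) {
                System.out.println("locked on an Integer");
            }
        }
    }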
Nullity. Even if elision were perfect, Map.get is still fundamentally
unrescuable, because it uses null as a return to signal non-presence.
(Forcing all values to be nullable is a non-starter.) This means that
we may never be able to elide the boxing in Map.get(), which would
cripple map performance -- non-starter. So some sort of migration
strategy is needed for Map.get() anyway -- and in fact, the "peeling"
technique was invented in the context of "what about Map.get", and the
rest was mostly an exploration of whether we needed any additional
hammers beyond that.
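To make the nullity problem concrete, here's an example using nothing
but today's API (the Map<String, int> mentioned in the comments is of
course hypothetical; getOrDefault is real, and shows the flavor of
replacement a migration might reach for):

    import java.util.HashMap;
    import java.util.Map;

    public class MapGetNullity {
        public static void main(String[] args) {
            Map<String, Integer> scores = new HashMap<>();
            scores.put("alice", 10);

            // Today, absence is signalled by returning null, which only
            // works because the value is a nullable reference type.
            Integer present = scores.get("alice");   // 10
            Integer absent  = scores.get("bob");     // null

            // For a hypothetical Map<String, int>, there is no int that
            // can mean "not present", so get() is stuck returning a box
            // forever; some replacement shaped like getOrDefault (or a
            // new total method) is needed instead.
            int safe = scores.getOrDefault("bob", -1);

            System.out.println(present + " / " + absent + " / " + safe);
        }
    }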
Transparency. Even if the above two were not issues, I think having box
types (or worse, Object) show up in signatures when the user is
expecting something involving T is a visible wart that the users will
notice. (Users would reasonably expect a List<int> to have methods that
truck in int, not Integer, and not Object.)
For these reasons, I think *some* intrusion into the API is unavoidable.
The work that's gone into this draft is aimed at trying to balance
compatibility with the current API (in both letter and spirit) with
minimizing the warts perceived by future clients of the anyfied APIs.
(Future *implementors* will experience warts, such as having to
implement both flavors of remove. However, these are migration-specific
warts; as new libraries are written that don't have to be migrated from
ref-generics, these won't even show up.)
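As a rough sketch of that implementor-side duplication (removeAt is the
hypothetical name from above; in real anyfied code the legacy method
would only surface for reference instantiations):

    import java.util.AbstractList;
    import java.util.ArrayList;
    import java.util.List;

    class MigratedList<E> extends AbstractList<E> {
        private final List<E> backing = new ArrayList<>();

        @Override public E get(int index) { return backing.get(index); }
        @Override public int size() { return backing.size(); }
        @Override public void add(int index, E element) {
            backing.add(index, element);
        }

        // New total method from the sketch above.
        public E removeAt(int index) { return backing.remove(index); }

        // Legacy flavor, kept so existing ref-generic clients keep
        // working; this is the duplication that new, non-migrated
        // libraries would never have to carry.
        @Override public E remove(int index) { return removeAt(index); }
    }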
> 2. If methods are to be removed (as in made partial), instead of
> magically disappearing them at the call site based on usage, perhaps we
> should consider hiding them by source-code version (not from the class
> file, of course, only hiding them in javac)? This is an explicit
> decision to break source compatibility, but it has two mitigating
> factors: 1/ javac conveniently has a source level (which, I hear, will
> also result in hiding new methods starting with Java 9) and 2/ Java
> already breaks source compatibility from time to time. I had quite a few
> classes that didn’t compile under 8 because 8 changed the name
> resolution rules wrt static imports (or, more precisely, made them
> conform to the JLS, whereas they hadn't in prior versions). It took me
> some time to figure out what was wrong, but hidden methods would be able
> to give much better error messages.
I'm not sure I'm following what problem you're trying to solve here?
(This sounds a little like the tricks we did with default methods when
compiling with the jdk8 compiler in -source 7 mode, where we didn't
consider default methods to be members of the class for some purposes
when viewed from 7 code?) Can you elaborate?
> Also, the superation idea seems very interesting, but I don’t understand
> how it would work for contains/remove(Object), as contains needs to be
> able to accept both super- /and/ subtypes of T (as in,
> animals.contains(dog)).
Yeah, this is what I meant by "Even though this works, it's still not
that obvious." If you have
    animals.contains(dog)

where

    <U super E> boolean contains(U)
then inference concludes U=Animal, so everything is fine. (The
constraints: U :> E, E=Animal, Dog <: U). But as I said, it's not
obvious. (Dan likens it to F-bounds; for most people, the best they can
do is learn "this is the idiom", rather than truly understand it.)
Hence, this is a downside of this approach -- that even smart people
will look at it and scratch their heads.
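For what it's worth, you can approximate the inference story with
today's wildcards (there is no <U super E> in current Java, so this
static helper stands in for the hypothetical member method):

    import java.util.Arrays;
    import java.util.List;

    public class SuperationInference {
        static class Animal { }
        static class Dog extends Animal { }

        // Stand-in for the hypothetical <U super E> boolean contains(U):
        // the element type must be a subtype of U, and the argument must
        // be a U, so the obviously-stupid candidates are excluded.
        static <U> boolean containsSuper(List<? extends U> list, U item) {
            for (U element : list) {
                if (element.equals(item)) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            Dog dog = new Dog();
            List<Animal> animals = Arrays.asList(new Animal(), dog);

            // Constraints: Animal <: U (from the list) and Dog <: U
            // (from the argument), so inference picks U = Animal.
            System.out.println(containsSuper(animals, dog));   // true
        }
    }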
> I believe its type — like that of equals() --
> should be <any T> contains(T x)
Maybe! But I think there's also a bit of Stockholm Syndrome in that
thinking, which derives from a pre-generics notion of the world. In a
generic world, you can use the type system to exclude the "obviously
stupid" candidates, such as those that are known not to be either a
subtype or a supertype of the type in question.
Secondarily, there's a contingent reason why I'm nervous about a method
as fundamental as Object.equals() being defined as a generic method --
when you follow the details of how any-generic methods are
implemented, the invocation cost is unavoidably higher. For new code,
this is probably acceptable, but for the cornerstone of the castle, it
doesn't seem to be.
The technique hinted at near the end of my mail is an attempt to get
the benefits of superation while not having to reach for either the big
contravariance hammer or the generic method hammer. The result would be
a single, non-generic method whose signature collapses to

    equals(Object)

for reference types and

    equals(V)
for value types. (All the animals.contains(dog) examples only show up
when there's variance, and value types are monomorphic, so they don't
have to deal with superclasses or subclasses showing up.) If we can
make this work, this seems preferable to any of the options explored
previously.