Migrating methods in Collections
Brian Goetz
brian.goetz at oracle.com
Mon Dec 21 16:20:35 UTC 2015
Totally fair to ask "have we missed a simpler choice?" But sadly I
think we haven't (though I don't fully understand your second idea; I'm
interested to hear more).
> here we’re talking added complexity which directly
> affects any interaction with a generic class — not just its
> implementation.
I think this is either not true, or overstated (depending on what you
mean by "any interaction"). The only new thing that this adds (and it's
not really that new) is the fact that the set of members of a
parameterized type depends on the parameterization. For example,
List<String> might have members
    removeAt(int)   // new total method
    remove(int)     // legacy method

but List<int> would only have

    removeAt(int)   // new total method
But, this sort of dependency isn't even new; from the client
perspective, the signature of add in List<String> is add(String),
whereas its signature in List<Number> is add(Number). What's new is
that some methods won't appear in some parameterizations. I don't think
this rises to the level of "added complexity" (arguably it is even
"reduced complexity"?); the user will see this as a context-dependent
set of methods when they hit ctrl-space in their IDE. Just as the type
signatures are already specialized to the type of the receiver in this
context, methods that are not applicable will now be filtered. (Note
also that when we migrate a reference-specific method to a new method,
the new method is not value-specific, it's total, so it can be positioned
as "the old method has been deprecated in favor of this new, more
flexible method.")
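To make the client-side picture concrete, here's a sketch in today's
Java; only the part that already exists (signatures specializing to the
parameterization) actually runs, and the proposed filtering is described
in comments (removeAt is the hypothetical name used above, not an
existing method):

    import java.util.ArrayList;
    import java.util.List;

    public class ContextDependentMembers {
        public static void main(String[] args) {
            List<String> strings = new ArrayList<>();
            List<Number> numbers = new ArrayList<>();

            // Already true today: the same member presents a different
            // signature depending on the parameterization.
            strings.add("hello");     // seen here as add(String)
            numbers.add(42);          // seen here as add(Number)
            // strings.add(42);       // does not compile

            // Under the sketch above, a List<int> would additionally
            // lack the legacy remove(int) member and offer only the
            // total removeAt(int); completion would simply not show
            // the filtered method.
        }
    }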
> 1. We can simply not specialize the signatures of public collection
> methods (say, if [T] is the boxed-type of T, the signature of
> Map<K,V>.get(Object) will be [V] get(Object)). The JVM’s ability to
> avoid boxing might be good enough for this to yield the performance we
> want. New methods can, of course, be added. This approach can be taken
> in addition to or instead of superation.
Yes, this was something we considered early on. There are several issues:
- Is our box elision going to be good enough?
- Nullity
- Transparency
Elision. Deciding to not specialize the signatures means that we're
relying on box elision in the VM being good enough so that boxes are
elided "almost all the time." Sadly, I do not think this is going to be
the case. There are certainly reasons why box elision could be better
with value-boxes (we can be more hostile to their identity, and
therefore more freely elide them) than the existing wrappers, but if I
have a deep chain of calls that are passing a boxed value through a
library (common), there is a real risk of fall-off-the-cliff behavior
when we hit our various inlining limits, and can't see that both ends of
the chain prefer the unboxed variant. Further, the most important box
types -- Integer and friends -- are already deeply polluted with
identity (want to bet that no program ever locks on one?). So I think
this one goes in the "boy, it would be nice" column, but I don't think
it's something we can bet the farm on.
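To see why I'm pessimistic about eliding the existing wrappers, here's
a small, runnable illustration of the identity pollution point; nothing
in it is hypothetical, it's just today's Integer behaving like the
identity-laden box it is:

    public class BoxIdentity {
        public static void main(String[] args) {
            Integer a = 1000;
            Integer b = 1000;

            // Outside the small cache range, each boxing conversion
            // produces a distinct object, so the program can observe
            // the box's identity directly.
            System.out.println(a == b);        // false
            System.out.println(a.equals(b));   // true

            // And nothing stops code from locking on a box; eliding
            // the box would change what this synchronizes on.
            synchronized (a) {
                System.out.println("locked on an Integer");
            }
        }
    }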
Nullity. Even if elision were perfect, Map.get is still fundamentally
unrescuable, because it uses null as a return to signal non-presence.
(Forcing all values to be nullable is a non-starter.) This means that
we may never be able to elide the boxing in Map.get(), which would
cripple map performance -- non-starter. So some sort of migration
strategy is needed for Map.get() anyway -- and in fact, the "peeling"
technique was invented in the context of "what about Map.get", and the
rest was mostly an exploration of whether we needed any additional
hammers beyond that.
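To make the nullity problem concrete, here's an example using nothing
but today's API (the Map<String, int> mentioned in the comments is of
course hypothetical; getOrDefault is real, and shows the flavor of
replacement a migration might reach for):

    import java.util.HashMap;
    import java.util.Map;

    public class MapGetNullity {
        public static void main(String[] args) {
            Map<String, Integer> scores = new HashMap<>();
            scores.put("alice", 10);

            // Today, absence is signalled by returning null, which only
            // works because the value is a nullable reference type.
            Integer present = scores.get("alice");   // 10
            Integer absent  = scores.get("bob");     // null

            // For a hypothetical Map<String, int>, there is no int that
            // can mean "not present", so get() is stuck returning a box
            // forever; some replacement shaped like getOrDefault (or a
            // new total method) is needed instead.
            int safe = scores.getOrDefault("bob", -1);

            System.out.println(present + " / " + absent + " / " + safe);
        }
    }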
Transparency. Even if the above two were not issues, I think having box
types (or worse, Object) show up in signatures when the user is
expecting something involving T is a visible wart that the users will
notice. (Users would reasonably expect a List<int> to have methods that
truck in int, not Integer, and not Object.)
For these reasons, I think *some* intrusion into the API is unavoidable.
The work that's gone into this draft is aimed at trying to balance
compatibility with the current API (in both letter and spirit) with
minimizing the warts perceived by future clients of the anyfied APIs.
(Future *implementors* will experience warts, such as having to
implement both flavors of remove. However, these are migration-specific
warts; as new libraries are written that don't have to be migrated from
ref-generics, these won't even show up.)
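As a rough sketch of that implementor-side duplication (removeAt is the
hypothetical name from above; in real anyfied code the legacy method
would only surface for reference instantiations):

    import java.util.AbstractList;
    import java.util.ArrayList;
    import java.util.List;

    class MigratedList<E> extends AbstractList<E> {
        private final List<E> backing = new ArrayList<>();

        @Override public E get(int index) { return backing.get(index); }
        @Override public int size() { return backing.size(); }
        @Override public void add(int index, E element) {
            backing.add(index, element);
        }

        // New total method from the sketch above.
        public E removeAt(int index) { return backing.remove(index); }

        // Legacy flavor, kept so existing ref-generic clients keep
        // working; this is the duplication that new, non-migrated
        // libraries would never have to carry.
        @Override public E remove(int index) { return removeAt(index); }
    }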
> 2. If methods are to be removed (as in made partial), instead of
> magically disappearing them at the call site based on usage, perhaps we
> should consider hiding them by source-code version (not from the class
> file, of course, only hiding them in javac)? This is an explicit
> decision to break source compatibility, but it has two mitigating
> factors: 1/ javac conveniently has a source level (which, I hear, will
> also result in hiding new methods starting with Java 9) and 2/ Java
> already breaks source compatibility from time to time. I had quite a few
> classes that didn’t compile under 8 because 8 changed the name
> resolution rules wrt static imports (or, more precisely, made them
> conform to the JLS, whereas they hadn't in prior versions). It took me
> some time to figure out what was wrong, but hidden methods would be able
> to give much better error messages.
I'm not sure I'm following what problem you're trying to solve here?
(This sounds a little like the tricks we did with default methods when
compiling with the jdk8 compiler in -source 7 mode, where we didn't
consider default methods to be members of the class for some purposes
when viewed from 7 code?) Can you elaborate?
> Also, the superation idea seems very interesting, but I don’t understand
> how it would work for contains/remove(Object), as contains needs to be
> able to accept both super- /and/ subtypes of T (as in,
> animals.contains(dog)).
Yeah, this is what I meant by "Even though this works, it's still not
that obvious." If you have
    animals.contains(dog)

where

    <U super E> boolean contains(U)
then inference concludes U=Animal, so everything is fine. (The
constraints: U :> E, E=Animal, Dog <: U). But as I said, it's not
obvious. (Dan likens it to F-bounds; for most people, the best they can
do is learn "this is the idiom", rather than truly understand it.)
Hence, this is a downside of this approach -- that even smart people
will look at it and scratch their heads.
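For what it's worth, you can approximate the inference story with
today's wildcards (there is no <U super E> in current Java, so this
static helper stands in for the hypothetical member method):

    import java.util.Arrays;
    import java.util.List;

    public class SuperationInference {
        static class Animal { }
        static class Dog extends Animal { }

        // Stand-in for the hypothetical <U super E> boolean contains(U):
        // the element type must be a subtype of U, and the argument must
        // be a U, so the obviously-stupid candidates are excluded.
        static <U> boolean containsSuper(List<? extends U> list, U item) {
            for (U element : list) {
                if (element.equals(item)) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            Dog dog = new Dog();
            List<Animal> animals = Arrays.asList(new Animal(), dog);

            // Constraints: Animal <: U (from the list) and Dog <: U
            // (from the argument), so inference picks U = Animal.
            System.out.println(containsSuper(animals, dog));   // true
        }
    }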
> I believe its type — like that of equals() --
> should be <any T> contains(T x)
Maybe! But I think there's also a bit of Stockholm Syndrome in that
thinking, which derives from a pre-generics notion of the world. In a
generic world, you can use the type system to exclude the "obviously
stupid" candidates, such as those that are known not to be either a
subtype or a supertype of the type in question.
Secondarily, there's a contingent reason why I'm nervous about a method
as fundamental as Object.equals() being defined as a generic method --
when you follow the details of how any-generic methods are
implemented, the invocation cost is unavoidably higher. For new code,
this is probably acceptable, but for the cornerstone of the castle, it
doesn't seem to be.
The technique hinted at near the end of my mail is an attempt to get
the benefits of superation while not having to reach for either the big
contravariance hammer or the generic method hammer. The result would be
a single, non-generic method whose signature collapses to

    equals(Object)

for reference types and

    equals(V)
for value types. (All the animals.contains(dog) examples only show up
when there's variance, and value types are monomorphic, so they don't
have to deal with superclasses or subclasses showing up.) If we can
make this work, this seems preferable to any of the options explored
previously.