Proposal for generics over primitives needs a rethink

Brian Goetz brian.goetz at oracle.com
Wed Dec 31 22:33:02 UTC 2014


Thanks, Gavin, for bringing up this point.

I'm actually a little surprised that no one has asked this question 
before; after all, the "why not 'just' have an Any type" question is 
kind of an obvious one after you start thinking about this problem for a 
few minutes.

*Obviously* it would be more desirable to integrate primitives and 
values into generics by leaning on the existing notion of type bound, 
rather than introducing all the additional complexity that we're 
considering.  (Also obviously, this must have occurred to us in the 
first five minutes of thought.  So why, after so much effort, have we 
said nothing about this possible approach?  Indeed, it's on our (long) 
to-do list to write up some of our analysis of various roads not taken, 
including this one.)

When designing a language (at least, one intended for real work), you 
need to pay attention to both the part where it meets the user, *and* 
the part where it meets the compilation target; if the mapping between 
source-level concepts and target-level concepts is not sufficiently 
straightforward, bad things will happen.  But, most suggestions we 
receive for evolving Java tend to focus only on the former.  (This is 
natural; developers usually only see the source code, not the bytecode, 
and even some language designers are willing to accept dramatic 
impedance mismatches between source code and bytecode if it gets them to 
their expressiveness goals.)

But the reality is that, if we were to ignore the latter, people would 
be happy for a few minutes and then unhappy forever due to the parade of 
corner cases, complexity, and performance potholes that this approach 
generally leads to.  We don't want to do this to our users.  (We are 
lucky enough to have some control over our compilation target, but we're 
constrained there as well, by the same compatibility requirements.)

For the record, the reason we rejected a unified 'Any' type is: it is a 
fiction.  (A "unifiction":
https://twitter.com/BrianGoetz/status/461539994197585920.)  Sure, it's 
easy to use 'Any' as a pseudo-type bound, and we could certainly choose 
to denote "Foo<any T>" as "Foo<T extends Any>", but all this does is 
draw the user further into the Any fiction while not actually making it 
a reality.

Where the wheels start to come off the wagon is: how do we represent a 
variable of type 'Any' in bytecode (a field, local variable, or method 
parameter or return type)?  If we can't answer that, we can't allow use 
of Any in these places.  And solving this problem amounts to only 
slightly less than a total redesign of the JVM and bytecode 
architecture.  So this harmless-seeming question (couched in claims of 
"simpler" and "more elegant") amounts to "Why not just redesign the VM 
completely?"
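
To make the representation problem concrete, here is a minimal sketch of
how the JVM's type signatures already separate references from each
primitive kind.  It relies only on java.lang.invoke.MethodType; the class
name DescriptorSketch and the helper descriptor() are made up for
illustration.  A one-argument method over Object, over int, and over long
each gets a different descriptor (and, in compiled code, different
load/return instructions such as aload/areturn versus iload/ireturn), so
there is no single descriptor that an 'Any' parameter could compile to.

```java
import java.lang.invoke.MethodType;

public class DescriptorSketch {
    // Hypothetical helper: the JVM method descriptor for a method that
    // takes one parameter of paramType and returns returnType.
    static String descriptor(Class<?> returnType, Class<?> paramType) {
        return MethodType.methodType(returnType, paramType)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // A reference-typed identity method.
        System.out.println(descriptor(Object.class, Object.class));
        // prints (Ljava/lang/Object;)Ljava/lang/Object;

        // The "same" method over int and over long: different descriptors.
        System.out.println(descriptor(int.class, int.class));   // prints (I)I
        System.out.println(descriptor(long.class, long.class)); // prints (J)J
    }
}
```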

Languages that have attempted to unify primitives and references on the 
JVM, with the existing bytecode architecture, while retaining some sort 
of compatibility with existing Java idioms, have failed at doing so. 
(And I am thankful to have those experiments to inform our work here!) 
As a concrete example, I point you to Paul Phillips' excellent "Scala War 
Stories" talk from JVM Language Summit 2013, which covers the failure of 
such unifictions, and more:
   http://medianetwork.oracle.com/video/player/2623635250001

But you might say "Wait a second, C# managed to pull off this redesign 
of the VM to support polymorphism over objects and primitives".  And 
indeed they did, and overall their solution is quite elegant.  And 
obviously, we must have known about this example, so why wouldn't we 
explore this?

Well, obviously we have.  The cost of the C# approach is that existing 
classes could not be gradually migrated to be generic; existing 
collections had to be effectively deprecated and replaced, or a "flag 
day" had to be declared where all the code (library and client) changes 
simultaneously.  These are not options for us.  At the risk of being 
obnoxious, C# was able to get away with it because at the time, they had 
a very small base of existing users and code and were not yet successful 
enough to have to worry about compatibility.  Lucky for them, unlucky 
for us.
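
As a rough illustration of what gradual migration means in practice in
today's Java, the sketch below has a client that was never generified (it
uses the raw type List) sharing one and the same ArrayList instance with a
generified client.  The class and method names (MigrationSketch,
legacyAppend, totalLength, demo) are hypothetical; only java.util.List and
java.util.ArrayList are real.

```java
import java.util.ArrayList;
import java.util.List;

public class MigrationSketch {
    // Stands in for pre-generics client code that was never updated:
    // it still uses the raw type List.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void legacyAppend(List list) {
        list.add("legacy");
    }

    // A client that has been migrated to generics.
    static int totalLength(List<String> names) {
        int total = 0;
        for (String s : names) {
            total += s.length();
        }
        return total;
    }

    // Both clients operate on the same object of the same class.
    static int demo() {
        List<String> names = new ArrayList<>();
        names.add("new");
        legacyAppend(names);        // old-style caller, same ArrayList
        return totalLength(names);  // "new" (3) + "legacy" (6)
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 9
    }
}
```

No second collection class and no flag day are involved here: the raw and
generic views are the same class at runtime.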

Some more comments inline.

> I'm rather concerned with this proposal

We're concerned with it too, as I think we've made quite clear.  Here's 
the position we're in: if we wait until we have a complete, 100% 
solution before sharing anything, people throw rocks at us for doing 
everything behind closed doors, but if we share our working thoughts in 
progress, people throw rocks at us for being half-baked.  We've chosen 
the latter poison, so by all means, throw your rocks, but don't kid 
yourself that you've spotted something that no one else has.  (And 
please, check the attitude at the door, it's just not helpful.)

> What this proposal does is introduce parametric polymorphism over
> primitive types, while leaving it impossible to abstract over
> primitives and reference types with subtype polymorphism. Thus, at the
> intersection of the two systems of abstraction, namely, *variance*, we
> get the broken behavior that a List<int> isn't a List<?>.

Indeed, we've already pointed this out, and it's not pretty.  All 
constructive suggestions accepted.  But implicitly dropping a key 
requirement (like gradual migration compatibility), and then claiming 
there's an obvious answer, is not really helpful.
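
The List<int> syntax does not compile in today's Java, so the following is
only an analogy, not the proposal itself: arrays already show the same kind
of split between primitives and references.  An Integer[] is an Object[],
but an int[] is not.  The class and method names below are hypothetical.

```java
public class ArrayVarianceSketch {
    // True if the given value is usable as an array of references.
    static boolean isObjectArray(Object candidate) {
        return candidate instanceof Object[];
    }

    public static void main(String[] args) {
        // Reference arrays are covariant: Integer[] is an Object[].
        System.out.println(isObjectArray(new Integer[0])); // prints true
        // A primitive array is an Object, but not an Object[].
        System.out.println(isObjectArray(new int[0]));     // prints false
    }
}
```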

> I therefore suggest a different, simpler, and much more natural
> starting point for this work: stop pretending that there is no type
> Any.

This is a particularly funny way to put it, as it is the notion that 
there *is* an Any type which requires pretending!  There is simply no 
way (without boxing) to represent this on the JVM as it currently 
stands.  But if boxing were good enough, then we wouldn't need to do 
anything -- we'd just write ArrayList<Integer> and be done with it!  But 
obviously boxing isn't good enough, since we're having this 
conversation.  Which means you need a VM story for how we're going to 
represent a flattened array of XY-points or make ArrayList<int> actually 
be backed by an int[] array or inline value types into containing 
classes, and still play nicely with generics.  Where the data hits the 
heap is where the boxing story (and therefore the Any story) falls apart.
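
A minimal sketch of the difference as it exists in today's Java (the names
BoxingSketch, sumBoxed, and sumFlat are made up for illustration): the
generic list can only hold references to Integer objects, while the int[]
stores the values themselves, contiguously.  Both compute the same answer;
what differs is the heap layout, which is exactly the part the boxing story
cannot fix.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingSketch {
    // Each element is a reference to an Integer object; reading it
    // unboxes the value.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // implicit unboxing
        }
        return total;
    }

    // Each element is the int itself, stored directly in the array.
    static long sumFlat(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        int[] flat = new int[1000];
        for (int i = 0; i < flat.length; i++) {
            boxed.add(i); // implicit boxing
            flat[i] = i;
        }
        System.out.println(sumBoxed(boxed) == sumFlat(flat)); // prints true

        // The element of the generic list really is an object.
        Object first = boxed.get(0);
        System.out.println(first instanceof Integer);         // prints true
    }
}
```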

All in all, you paint a picture of a beautiful world, but not the one we
find ourselves living in.  If we were designing a language from scratch,
or didn't have users, or hated our users, we would certainly be 
exploring this approach in preference to the current approach we've 
staked out.  (This is so obvious I wish I didn't even have to say it.) 
But we're not ready to throw our users under the bus to the degree that 
this approach seems to entail.  But if we've missed something obvious, 
by all means, point it out (but please, constructively).

And, feel free to prove us wrong!  Try implementing the changes you are 
envisioning in the JVM, and show how they can get us to the goal!





More information about the valhalla-dev mailing list