Proposal for generics over primitives needs a rethink
Remi Forax
forax at univ-mlv.fr
Wed Dec 31 22:58:25 UTC 2014
Just to add that there is another cost with the C# approach: you must
have exactly the same bytecode for Set<int> and Set<boolean>, so you
cannot, for example, specialize Set<boolean> to use a bitset.
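To make the bitset point concrete, here is a minimal sketch in Java. The class BooleanSetSketch and its methods are hypothetical names made up for illustration, not part of any proposal. It only shows the kind of representation a per-type specialization could choose and a single shared bytecode for every Set<T> could not.

```java
import java.util.HashSet;
import java.util.Set;

public class BooleanSetSketch {
    // Hypothetical hand-specialized set of booleans.
    // Bit 0 records membership of false, bit 1 records membership of true.
    private byte bits;

    private static int mask(boolean value) {
        return value ? 0b10 : 0b01;
    }

    /** Adds value; returns true if the set changed, like Set.add. */
    public boolean add(boolean value) {
        int before = bits;
        bits |= mask(value);
        return bits != before;
    }

    public boolean contains(boolean value) {
        return (bits & mask(value)) != 0;
    }

    public int size() {
        return Integer.bitCount(bits);
    }

    public static void main(String[] args) {
        // Generic version: each element is a boxed Boolean stored in a hash table.
        Set<Boolean> boxed = new HashSet<>();
        boxed.add(true);

        // Specialized version: no boxing and no hash table, just two bits of state.
        BooleanSetSketch packed = new BooleanSetSketch();
        packed.add(true);

        System.out.println(boxed.contains(true) == packed.contains(true));
        System.out.println(boxed.size() == packed.size());
    }
}
```

Both sets answer the same queries, but only the second one is free to pick its own layout, which is the freedom that is lost when every instantiation has to run the same bytecode.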
About the introduction of a type Any in the VM: it means that we now
have a type that can store either a primitive or a reference.
We know how to represent this kind of type, using tag bits (like V8 does)
or NaN boxing (like Mozilla's *Monkey VMs do).
Basically, representing Any in the VM is like asking to transform a
Java VM into something very close to a JavaScript VM.
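For readers who have not met NaN boxing, here is a rough sketch in Java of the idea: one 64-bit word holds either a double or a tagged payload hidden in the NaN range. The class NanBoxSketch, the two tag values, and the handle table standing in for heap pointers are all assumptions made for illustration; they are not how V8, the *Monkey VMs, or any JVM actually lay out values.

```java
import java.util.ArrayList;
import java.util.List;

public class NanBoxSketch {
    // The upper 16 bits select the kind of payload. Both tags lie in the NaN
    // range, and Double.doubleToLongBits never produces them because it
    // collapses every NaN to the canonical pattern 0x7ff8000000000000L.
    private static final long TAG_MASK = 0xFFFF_0000_0000_0000L;
    private static final long INT_TAG  = 0xFFF9_0000_0000_0000L; // assumed tag
    private static final long REF_TAG  = 0xFFFA_0000_0000_0000L; // assumed tag

    // Stand-in for heap pointers: plain Java cannot store an address in a long,
    // so a reference is represented by its index in this table.
    private static final List<Object> HANDLES = new ArrayList<>();

    public static long boxDouble(double d) { return Double.doubleToLongBits(d); }
    public static long boxInt(int i) { return INT_TAG | (i & 0xFFFF_FFFFL); }
    public static long boxRef(Object o) {
        HANDLES.add(o);
        return REF_TAG | (HANDLES.size() - 1);
    }

    public static boolean isInt(long w) { return (w & TAG_MASK) == INT_TAG; }
    public static boolean isRef(long w) { return (w & TAG_MASK) == REF_TAG; }
    public static boolean isDouble(long w) { return !isInt(w) && !isRef(w); }

    public static double unboxDouble(long w) { return Double.longBitsToDouble(w); }
    public static int unboxInt(long w) { return (int) w; } // keeps the low 32 bits
    public static Object unboxRef(long w) {
        return HANDLES.get((int) (w & 0xFFFF_FFFFL));
    }

    public static void main(String[] args) {
        // One array of "Any" slots holding a double, an int, and a reference.
        long[] slots = { boxDouble(1.5), boxInt(-7), boxRef("hello") };
        for (long w : slots) {
            if (isInt(w)) {
                System.out.println("int " + unboxInt(w));
            } else if (isRef(w)) {
                System.out.println("ref " + unboxRef(w));
            } else {
                System.out.println("double " + unboxDouble(w));
            }
        }
    }
}
```

Every read of a slot has to test the tag before it can use the value, and every field, local, and parameter of type Any would need to be such a word; that dynamic check on each access is what a JavaScript VM is built around and what the JVM's statically typed bytecode is not.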
happy new year,
Rémi
On 12/31/2014 11:33 PM, Brian Goetz wrote:
> Thanks, Gavin, for bringing up this point.
>
> I'm actually a little surprised that no one has asked this question
> before; after all, the "why not 'just' have an Any type" question is
> kind of an obvious one after you start thinking about this problem for
> a few minutes.
>
> *Obviously* it would be more desirable to integrate primitives and
> values into generics by leaning on the existing notion of type bound,
> rather than introducing all the additional complexity that we're
> considering. (Also obviously, this must have occurred to us in the
> first five minutes of thought. So why, after so much effort, have we
> said nothing about this possible approach? Indeed, it's on our (long)
> to-do list to write up some of our analysis of various roads not
> taken, including this one.)
>
> When designing a language (at least, one intended for real work), you
> need to pay attention to both the part where it meets the user, *and*
> the part where it meets the compilation target; if the mapping between
> source-level concepts and target-level concepts is not sufficiently
> straightforward, bad things will happen. But, most suggestions we
> receive for evolving Java tend to focus only on the former. (This is
> natural; developers usually only see the source code, not the
> bytecode, and even some language designers are willing to accept
> dramatic impedance mismatches between source code and bytecode if it
> gets them to their expressiveness goals.)
>
> But the reality is that, if we were to ignore the latter, people would
> be happy for a few minutes and then unhappy forever due to the parade
> of corner cases, complexity, and performance potholes that this
> approach generally leads to. We don't want to do this to our users.
> (We are lucky enough to have some control over our compilation target,
> but we're also constrained there as well by the same compatibility
> requirements.)
>
> For the record, the reason we rejected a unified 'Any' type is: it is
> a fiction. (A "unifiction".
> (https://twitter.com/BrianGoetz/status/461539994197585920)). Sure,
> it's easy to use 'Any' as a pseudo-type bound, and we could certainly
> choose to denote "Foo<any T>" as "Foo<T extends Any>", but all this
> does is draw the user further into the Any fiction while not actually
> making it a reality.
>
> Where the wheels start to come off the wagon is: how do we represent a
> variable of type 'Any' in bytecode (a field, local variable, or method
> parameter or return type)? If we can't answer that, we can't allow
> use of Any in these places. And solving this problem amounts to only
> slightly less than a total redesign of the JVM and bytecode
> architecture. So this harmless-seeming question (couched in claims of
> "simpler" and "more elegant") amounts to "Why not just redesign the VM
> completely?"
>
> Languages that have attempted to unify primitives and references on
> the JVM, with the existing bytecode architecture, while retaining some
> sort of compatibility with existing Java idioms, have failed at doing
> so. (And I am thankful to have those experiments to inform our work
> here!) As a concrete example, I point you to Paul Phillips' excellent
> "Scala War Stories" talk from JVM Language Summit 2013, which covers
> the failure of such unifictions, and more:
> http://medianetwork.oracle.com/video/player/2623635250001
>
> But you might say "Wait a second, C# managed to pull off this redesign
> of the VM to support polymorphism over objects and primitives". And
> indeed they did, and overall their solution is quite elegant. And
> obviously, we must have known about this example, so why wouldn't we
> explore this?
>
> Well, obviously we have. The cost of the C# approach is that existing
> classes could not be gradually migrated to be generic; existing
> collections had to be effectively deprecated and replaced, or a "flag
> day" had to be declared where all the code (library and client)
> changes simultaneously. These are not options for us. At the risk of
> being obnoxious, C# was able to get away with it because at the time,
> they had a very small base of existing users and code and were not yet
> successful enough to have to worry about compatibility. Lucky for
> them, unlucky for us.
>
> Some more comments inline.
>
>> I'm rather concerned with this proposal
>
> We're concerned with it too, as I think we've made quite clear. Here's
> the position we're in: if we wait until we have a complete, 100%
> solution before sharing anything, people throw rocks at us for doing
> everything behind closed doors, but if we share our working thoughts
> in progress, people throw rocks at us for being half-baked. We've
> chosen the latter poison, so by all means, throw your rocks, but don't
> kid yourself that you've spotted something that no one else has. (And
> please, check the attitude at the door, it's just not helpful.)
>
>> What this proposal does is introduce parametric polymorphism over
>> primitive types, while leaving it impossible to abstract over
>> primitives and reference types with subtype polymorphism. Thus, at the
>> intersection of the two systems of abstraction, namely, *variance*, we
>> get the broken behavior that a List<int> isn't a List<?>.
>
> Indeed, we've already pointed this out, and it's not pretty. All
> constructive suggestions accepted. But implicitly dropping a key
> requirement (like gradual migration compatibility), and then claiming
> there's an obvious answer, is not really helpful.
>
>> I therefore suggest a different, simpler, and much more natural
>> starting point for this work: stop pretending that there is no type
>> Any.
>
> This is a particularly funny way to put it, as it is the notion that
> there *is* an Any type which requires pretending! There is simply no
> way (without boxing) to represent this on the JVM as it currently
> stands. But if boxing were good enough, then we wouldn't need to do
> anything -- we'd just write ArrayList<Integer> and be done with it!
> But obviously boxing isn't good enough, since we're having this
> conversation. Which means you need a VM story for how we're going to
> represent a flattened array of XY-points or make ArrayList<int>
> actually be backed by an int[] array or inline value types into
> containing classes, and still play nicely with generics. Where the
> data hits the heap is where the boxing story (and therefore the Any
> story) falls apart.
>
> All in all, you paint a picture of a beautiful world, but not the one we
> find ourselves living in. If we were designing a language from scratch,
> or didn't have users, or hated our users, we would certainly be
> exploring this approach in preference to the current approach we've
> staked out. (This is so obvious I wish I didn't even have to say it.)
> But we're not ready to throw our users under the bus to the degree
> that this approach seems to entail. But if we've missed something
> obvious, by all means, point it out (but please, constructively).
>
> And, feel free to prove us wrong! Try implementing the changes you
> are envisioning in the JVM, and show how they can get us to the goal!
>
>
More information about the valhalla-dev
mailing list