Notes about Valhalla from a non-Java perspective

Wed Oct 1 18:25:12 UTC 2014

On Sep 30, 2014, at 8:10 AM, Simon Ochsenreither <simon at ochsenreither.de> wrote:

> Hi,
> 
> it seems that no one from the Scala team found enough time/interest to participate on valhalla-dev (Martin mentioned that most of the Scala team is on holidays/in a move/in recovery) yet ... I would have preferred to not get involved (bad impressions from other OpenJDK lists), but I'll just post my personal notes (which I sent to scala-internals a few weeks earlier) here directly before they fall completely out of date.
> 
> Please note that it's written from a Scala development perspective, so "we" == "arbitrary group of Scala devs/users".

Thanks for taking the time to send your notes.
Here's a partial reply...

The Scala perspective on values is valuable to us,
not only because it is known to work on the JVM,
but because you folks have lived with the implications
of your design choices.

> Here are my (slightly improved) notes:
> 
> ==============
> 
> * On the general approach of using class file attributes with tuples
>   of (bytecode index -> type)
>   It's kind of funny, because that's exactly the approach I took
>   almost 18 months earlier when I was thinking about this topic. I
>   considered that to be quite a hack at that time and thought "if the
>   JVM ever gets this, they will surely come up with a more principled
>   way of doing this".

We have similar opinions on it, and are mulling principled approaches.
The type annotations work (a similar approach) was done under the constraint
that the bytecode set was off limits.  That is not the case here, so we
can add annotation-carrying bytecodes (decorated nops) if we choose.

> * vcmp
>   This feels a bit ad-hoc currently. I think it would be considerably
>   more useful if they tried to come up with a design which would work
>   across all types, and not yet-another special case.
> 
>   Scala's == implementation for instance is around 100 lines of code
>   with dozens of branches to work around Java's/the JVM's idea of
>   equality. It would be nice if, when adding another
>   comparison operator, they didn't repeat the mistakes of equals
>   & friends.
> 
>   They could be on the right track, but hard to tell without looking at
>   it more closely.
> 
>   My suggestion would be to have a notion of "primitive" equality
>   which is defined as "do the most basic comparison available":
>   Compare the bits of the underlying value, which means ...
>     o the bits of value types, taking types into account
>     o the bits of the reference for reference types
>     o wrappers are unboxed
>     o Double.NaN is equal to itself, 0.0 and -0.0 are not equal

"Most basic comparison available" ... "except for certain undesirable,
even more basic comparisons."

Part of the reason to view values as typed bits is to provide a place
for a non-ad-hoc definition of "most basic" comparison, which is
bitwise.  That's the current thinking behind vcmp.  Anything higher
level can be a method, can't it?
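
To make "bitwise" concrete for the one primitive where it matters most,
here is a minimal Java sketch.  It is not vcmp and not taken from any
prototype; the class and method names are hypothetical, and it only uses
Double.doubleToRawLongBits to show how comparing raw bits differs from
Java's == on doubles.

```java
// Sketch only: BitwiseEquality and its methods are hypothetical names,
// not part of any Valhalla prototype.
final class BitwiseEquality {
    // Compare two doubles by their raw bit patterns, ignoring IEEE 754 rules.
    static boolean doubleBitsEqual(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    // For reference types, the "bits" being compared are the reference itself.
    static boolean referenceBitsEqual(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(Double.NaN == Double.NaN);                 // false
        System.out.println(doubleBitsEqual(Double.NaN, Double.NaN));  // true
        System.out.println(0.0 == -0.0);                              // true
        System.out.println(doubleBitsEqual(0.0, -0.0));               // false
    }
}
```

Both results match the last bullet in the quoted list: under a raw-bit
comparison NaN equals itself, and 0.0 and -0.0 are distinct.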

> 
>   This would pretty much be in line with our earlier debate about
>   supporting eq on value types. If this could be encoded as a single
>   vcmp operation, it would be a huge win.
> 
>   Additionally, one could also consider a corresponding operation for
>   "semantic" equality:
> 
>     o Use "equals" implementation for value types
>     o Use Java-style "==" for primitives
>     o Wrappers are unboxed
>     o Use "equals" implementation for references
>     o Double.NaN is not equal to itself, 0.0 and -0.0 are equal
>     o (Optionally) Compare arrays by comparing the elements
> 
> That would vastly simplify our == implementation, but that's not really the point ... it would be possible to do that, but I think priority should be to prevent vcmp from being either artificially limited to value types or ending up incoherent like the rest of equality stuff in Java.

vcmp is a morphed version of [ail]cmp.  That's why it is proposed as bitwise.
There is no fcmp, so we pull from Double.equals, since that is almost as fundamental.
This is obviously an over-constrained design problem, but I will observe
that there is a discernible "most basic" comparison, the initial element
in the category of comparable representations of a value type.
It may strike some as "wait, that's way too basic for me!", but it's fairly unique.
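
For readers who have not looked at Double.equals recently, the sketch
below shows where it sits relative to a raw-bit comparison.  The class
and method names are hypothetical.  Double.equals is specified in terms
of doubleToLongBits, which maps every NaN to one canonical pattern, so
it is close to bitwise but not identical to it.

```java
// Sketch only: the class and method names are hypothetical.
final class DoubleEqualsVsBits {
    // Double.equals compares doubleToLongBits, which canonicalizes all NaNs.
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    // A raw-bit comparison keeps whatever NaN payload the platform preserved.
    static boolean rawBitsEqual(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println(boxedEquals(Double.NaN, Double.NaN)); // true
        System.out.println(boxedEquals(0.0, -0.0));              // false

        // A quiet NaN with a non-default payload. Whether the payload
        // survives is platform-dependent, so the last line has no fixed answer.
        double oddNaN = Double.longBitsToDouble(0x7ff8000000000001L);
        System.out.println(boxedEquals(oddNaN, Double.NaN));     // true
        System.out.println(rawBitsEqual(oddNaN, Double.NaN));
    }
}
```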

> In the end, I think how to deal with primitive wrappers is still an uncharted territory. Retrofitting those wrapper classes as value boxes very likely won't work, so maybe there is more specification required on how "T=Integer" for "any T" is treated (or int in need of a box in specialized code).

We doubt existing wrappers can be upgraded to be real value boxes.
Watch for "Heisenboxes" for primitives and values.
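
One reading of why the existing wrappers are hard to retrofit (this
message does not spell the reason out) is that they have observable
object identity today.  A small Java sketch, with hypothetical class and
method names:

```java
// Sketch only: WrapperIdentity and its methods are hypothetical names.
final class WrapperIdentity {
    // Identity comparison of two separately obtained boxes for the same value.
    static boolean sameBox(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    // Value comparison of two separately obtained boxes for the same value.
    static boolean equalBox(int v) {
        return Integer.valueOf(v).equals(Integer.valueOf(v));
    }

    public static void main(String[] args) {
        System.out.println(sameBox(127));       // true: -128..127 are always cached
        System.out.println(equalBox(100_000));  // true
        // Outside the guaranteed cache range the identity result depends on
        // the runtime's caching, so no fixed answer is claimed here.
        System.out.println(sameBox(100_000));
    }
}
```

Existing code can observe the difference between the last two lines,
which is the kind of behavior a true value box would not have.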

> * No reification for reference types (Reified in the sense of "the
>   type is available at runtime", not "it gets a specialized class".)
>   I'm split on this. On the one side, this could give us the escape
>   hatch for types not expressible with the new Generics, but on the
>   other hand it would really suck because it would mean we couldn't
>   just drop ClassTags altogether, but would need to drag them around
>   for every type even if only references would actually need them.
>   Additionally, the split between value types/reference types is very
>   likely not similar to the split we would need to erase Scala's types
>   to the JVM's level of expressiveness.
>   It feels like they are again trying to cut corners here, trading
>   implementation ease for additional language complexity.
>   (As far as I remember, there were some concerns about what happens
>   with Java's .class/getClass() when reference types are reified ...
>   but imho, Java problems should stay Java problems and shouldn't make
>   the JVM approach worse than necessary.) Also note how this fits
>   with my work on making classOf[T] ClassTag-aware.

(Insert long conversation about the pros and cons, hows and whys,
of reification of Java source-level types to JVM level.  There was
a lot of this at JavaOne yesterday; sorry I can't recap here.)

> 
>   Additionally, we already have an escape hatch with the existing
>   erased Generics, so having yet-another different style of Generics
>   doesn't feel like the right way ... which brings us to the next point:
> 
> * Reified/Erased Generics interop
>   This seems to be a really dark corner. The draft is pretty silent on
>   this. It currently looks like you can't have a type hierarchy where A
>   is erased at the top but reified in a subclass (or the other way
>   around) ... I think this will make it very hard to use erased
>   Generics as an escape hatch. Combined with the earlier point, I'd
>   prefer better reified/erased interop and having reified reference
>   types with reified Generics.
>   Otherwise, it feels like tons of people will try to wrap random
>   reference types in value types to get around that limitation.

We will probably provide some sort of opt-in way to track more elaborate
representations of compile-time types.  But the JVM will always (IMO)
tilt towards simplifying runtime types, as a way of making code sharing easier.
This doesn't stop programmers from weaving Class and Type token pointers
into their data structures, but doesn't require the JVM to do so always.
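
As an illustration of weaving a Class token into a data structure, here
is a minimal Java sketch that works with today's erased generics.
TaggedList is a hypothetical name; the only library call relied on is
Class.cast.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: TaggedList is a hypothetical class that carries its own
// element type at runtime, since erasure does not keep it.
final class TaggedList<T> {
    private final Class<T> elementType;
    private final List<T> items = new ArrayList<>();

    TaggedList(Class<T> elementType) {
        this.elementType = elementType;
    }

    Class<T> elementType() {
        return elementType;
    }

    // Accepts any object and checks it against the token at runtime.
    void addChecked(Object candidate) {
        items.add(elementType.cast(candidate)); // ClassCastException on mismatch
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        TaggedList<String> names = new TaggedList<>(String.class);
        names.addChecked("valhalla");
        System.out.println(names.elementType().getSimpleName()); // String
        try {
            names.addChecked(42);
        } catch (ClassCastException e) {
            System.out.println("rejected"); // the Integer does not match the token
        }
    }
}
```

The cost is one extra pointer per instance, paid only by the structures
that ask for it.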

> * No variance for value types
>   This is a big conceptual problem with heterogeneous translation and
>   my personal conclusion 1.5 years ago was the same: It can't really
>   be done without a huge explosion in runtime complexity.

This will be something to watch.  If you don't explode the runtime complexity,
you may have to explode the static complexity (JAR file), right?
On the other hand, if you push instantiation to runtime (or link time)
you delay code splitting as long as possible, and allow for internal
unsafe code sharing (untyped inline mini-boxes, for example).

> * Poor man's typeclasses
>   There was a short mention of "conditional methods", e.g. enabling
>   methods only for a subset of some generic type. That smells a lot
>   like a crippled version of typeclasses. Might be useful to watch
>   what happens in that space.

Yes, please do.  General question:  What is the simplest JVM support
needed to make higher-kinded parameterizations efficient?

> * Members on any T
>   There seems to be a debate about whether/how "any T" should expose
>   methods by itself. While this seems to be nicely in line with what a
>   lot of people in Scala would like to do, I think Java will not be
>   able to properly add constraints to T to describe required methods.

That's what interfaces are for, on both references and value types.
I would expect to see them show up on primitives too, sooner than
new categorization mechanisms (which as you say would be wrong
from the start, for most languages).

>   They currently only have upper bounds to express those things, but
>   even in Scala where bounds are less horrible than in Java, people
>   have pretty much abandoned bounds based on subtyping in favor of
>   context bounds.
>   So even if Any/T wouldn't get any members I think we should be aware
>   that Java's options for adding constraints are very, very primitive
>   and probably not what we would want to use in Scala. Let's take that
>   into account before arguing for or against this ...

Maybe there's a way to make interfaces slightly stronger, so they can
represent more complex (structural or contextual) constraints.
General question:  What is the simplest JVM support needed to
allow interfaces to support the "next level" of constraints?
(Requires some discussion of what is the next level; my preference
is emphatically given to widely deployed languages.)
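
To pin down the two styles being contrasted, here is a small Java sketch
with hypothetical names.  The first method constrains T with an
interface bound; the second takes the capability as a separate argument,
which is roughly what a Scala context bound turns into once its implicit
parameter is made explicit.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: Constraints and its methods are hypothetical names.
final class Constraints {
    // Subtype bound: T itself must implement the interface.
    static <T extends Comparable<T>> T maxByBound(List<T> xs) {
        T best = xs.get(0);
        for (T x : xs) {
            if (x.compareTo(best) > 0) best = x;
        }
        return best;
    }

    // Separate evidence: any T works, as long as an ordering is supplied.
    static <T> T maxByEvidence(List<T> xs, Comparator<? super T> ord) {
        T best = xs.get(0);
        for (T x : xs) {
            if (ord.compare(x, best) > 0) best = x;
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("any", "value", "T", "generic");
        System.out.println(maxByBound(words)); // value
        System.out.println(
            maxByEvidence(words, Comparator.comparingInt(String::length))); // generic
    }
}
```

With the second form the same type can be ordered in more than one way
without changing the type itself.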

> 
> My conclusion regarding Scala:
> 
> * Exposing two kinds of Generics to the user is highly undesirable
>   from a language design POV. F# did the same with a much better
>   starting point than Java (well-designed runtime support for reified
>   Generics) and even for them I think the increase of complexity just
>   wasn't worth the benefit. Java might not have a choice here, but I
>   think we should make sure to not leak these "implementation details"
>   to the user. We will have to support both kinds of Generics for
>   interop anyway.

Noted.

> * We should think about how we would like to see T/Any and eq/==
>   evolve in the long-term and communicate that clearly to the Valhalla
>   people. It's probably much more painful if they e.g. decide to have
>   no members on any T, but decide on an incompatible constraint
>   mechanism (which would very likely be reified in the bytecode). That
>   way, keeping status quo and implementing member-free Any/T as a
>   scalac fiction might work better.

We'll do ad hoc constraint mechanisms if we need to, and we are
using them while prototyping, but we have both the time and inclination
to converge on a reasonably clean final design, if and when one appears.

> * We should start preparing for the variance/value type breakage.
>   People who need variance can migrate to boxed representations easily
>   (replace Vector[Int] with Vector[Integer] for instance), but trying
>   to keep things as-is would mean just not supporting JVM value types,
>   and I think that would be a terrible decision. From a type system
>   POV, instantiating a type parameter with a value type could be seen
>   as collapsing all the bounds of that type parameter to make it invariant.

Thank you for helping us watch this point.  We are mulling over options
for introducing runtime or instantiation-time variance on value types.
Same point here about us having time and inclination, and hoping for
better clarity.
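
An analogy from today's JVM, offered only as an illustration and not as
a statement about the Valhalla design: arrays already show the same
split.  Reference arrays are covariant, while flat primitive arrays are
not related to each other or to Object[].  The class and method names
below are hypothetical.

```java
// Sketch only: ArrayVariance and arrayAssignable are hypothetical names.
final class ArrayVariance {
    // True if a value of the source array type can be assigned to the target type.
    static boolean arrayAssignable(Class<?> target, Class<?> source) {
        return target.isAssignableFrom(source);
    }

    public static void main(String[] args) {
        Integer[] boxed = {1, 2, 3};
        Number[] widened = boxed;            // allowed: reference arrays are covariant
        System.out.println(widened.length);  // 3

        System.out.println(arrayAssignable(Number[].class, Integer[].class)); // true
        System.out.println(arrayAssignable(Object[].class, int[].class));     // false
        System.out.println(arrayAssignable(long[].class, int[].class));       // false
    }
}
```

This mirrors the suggested migration in the quoted text: code that needs
variance moves to the boxed representation.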

> * We should really think about having a scalac prototype which tries
>   to emit the new class file format and leverage the new semantics.
>   There is already an ASM fork out there with preliminary support, but
>   I don't know how stable/complete that code is.
>   Experience has shown (JRuby) that this is the most effective way to
>   actually influence the design.

Yes, the 292 bytecode changes were deeply influenced by JRuby's helpful
early adoption.

Thanks,
— John

> That's just some short overview, I can expand on these topics as necessary, but I expect that people have read the draft proposal and the complete mailing list already so that everyone is on roughly the same level.
> 
> ==============
> 
> I hope this is helpful.
> 
> Thanks,
> 
> Simon