Notes about Valhalla from a non-Java perspective

Tue Sep 30 15:10:22 UTC 2014

Hi,

it seems that no one from the Scala team has found enough time/interest 
to participate on valhalla-dev yet (Martin mentioned that most of the 
Scala team is on holiday, moving, or in recovery). I would have 
preferred not to get involved (bad impressions from other OpenJDK 
lists), but I'll just post my personal notes (which I sent to 
scala-internals a few weeks earlier) here directly before they fall 
completely out of date.

Please note that it's written from a Scala development perspective, so 
"we" == "arbitrary group of Scala devs/users".

Here are my (slightly improved) notes:

==============

  * On the general approach of using class file attributes with tuples
    of (bytecode index -> type)
    It's kind of funny, because that's exactly the approach I took
    almost 18 months earlier when I was thinking about this topic. I
    considered that to be quite a hack at that time and thought "if the
    JVM ever gets this, they will surely come up with a more principled
    way of doing this".

  * vcmp
    This feels a bit ad-hoc currently. I think it would be considerably
    more useful if they tried to come up with a design which would work
    across all types, and not yet-another special case.

    Scala's == implementation for instance is around 100 lines of code
    with dozens of branches to work around Java's/the JVM's idea of
    equality. It would be nice if, when adding another comparison
    operator, they didn't repeat the mistakes of equals & friends.

    They could be on the right track, but it's hard to tell without
    looking at it more closely.

    My suggestion would be to have a notion of "primitive" equality
    which is defined as "do the most basic comparison available":
    Compare the bits of the underlying value, which means ...
      o the bits of value types, taking types into account
      o the bits of the reference for reference types
      o wrappers are unboxed
      o Double.NaN is equal to itself, 0.0 and -0.0 are not equal

    This would pretty much be in line with our earlier debate about
    supporting eq on value types. If this could be encoded as a single
    vcmp operation, it would be a huge win.
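
    To make the floating-point rule concrete, here is a minimal Java
    sketch of "primitive" equality for doubles. BitEquality and
    bitsEqual are hypothetical names of mine and not part of any
    Valhalla draft; the sketch only uses the existing
    Double.doubleToRawLongBits method.

```java
// Hypothetical helper illustrating bitwise ("primitive") equality.
final class BitEquality {
    // Compare the raw IEEE 754 bit patterns of two doubles.
    // Note: NaN values with different payloads have different bits
    // and would therefore compare as unequal here.
    static boolean bitsEqual(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        // Bitwise comparison: NaN equals itself, 0.0 and -0.0 differ.
        System.out.println(bitsEqual(Double.NaN, Double.NaN)); // true
        System.out.println(bitsEqual(0.0, -0.0));              // false
        // Java's == on primitives behaves the other way around.
        System.out.println(Double.NaN == Double.NaN);          // false
        System.out.println(0.0 == -0.0);                       // true
    }
}
```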

    Additionally, one could also consider a corresponding operation for
    "semantic" equality:

      o Use "equals" implementation for value types
      o Use Java-style "==" for primitives
      o Wrappers are unboxed
      o Use "equals" implementation for references
      o Double.NaN is not equal to itself, 0.0 and -0.0 are equal
      o (Optionally) Compare arrays by comparing their elements
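
    As a rough illustration of these rules, the following Java sketch
    implements them over boxed values. SemanticEquality and semEquals
    are hypothetical names of mine; the sketch only handles Double,
    Float and arrays of references specially, and leaves everything
    else to equals. The main method also shows where today's
    Double.equals disagrees with == on primitives.

```java
// Hypothetical sketch of "semantic" equality; not a proposed API.
final class SemanticEquality {
    static boolean semEquals(Object a, Object b) {
        if (a == null || b == null) return a == b;
        // Wrappers are unboxed and compared with Java-style ==,
        // so NaN != NaN and 0.0 == -0.0.
        if (a instanceof Double && b instanceof Double)
            return ((Double) a).doubleValue() == ((Double) b).doubleValue();
        if (a instanceof Float && b instanceof Float)
            return ((Float) a).floatValue() == ((Float) b).floatValue();
        // Optional rule: arrays of references are compared element-wise.
        // (Primitive arrays are omitted here for brevity.)
        if (a instanceof Object[] && b instanceof Object[]) {
            Object[] xs = (Object[]) a, ys = (Object[]) b;
            if (xs.length != ys.length) return false;
            for (int i = 0; i < xs.length; i++)
                if (!semEquals(xs[i], ys[i])) return false;
            return true;
        }
        // Everything else uses its own equals implementation.
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Today's Double.equals follows the bitwise-style rules ...
        System.out.println(Double.valueOf(Double.NaN).equals(Double.NaN)); // true
        System.out.println(Double.valueOf(0.0).equals(-0.0));              // false
        // ... while the sketch follows the rules of == on primitives.
        System.out.println(semEquals(Double.NaN, Double.NaN));             // false
        System.out.println(semEquals(0.0, -0.0));                          // true
        System.out.println(semEquals(new String[] {"a"}, new String[] {"a"})); // true
    }
}
```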

That would vastly simplify our == implementation, but that's not really 
the point. It would be possible to do that, but I think the priority 
should be to prevent vcmp from being either artificially limited to 
value types or ending up incoherent like the rest of the equality stuff 
in Java.

In the end, I think how to deal with primitive wrappers is still 
uncharted territory. Retrofitting those wrapper classes as value boxes 
very likely won't work, so maybe more specification is required on 
how "T=Integer" for "any T" is treated (or an int in need of a box in 
specialized code).

  * No reification for reference types (Reified in the sense of "the
    type is available at runtime", not "it gets a specialized class".)
    I'm split on this. On the one hand, this could give us the escape
    hatch for types not expressible with the new Generics, but on the
    other hand it would really suck because it would mean we couldn't
    just drop ClassTags altogether, but would need to drag them around
    for every type even if only references actually needed them.
    Additionally, the split between value types/reference types is very
    likely not similar to the split we would need to erase Scala's types
    to the JVM's level of expressiveness.
    It feels like they are again trying to cut corners here, trading
    implementation ease for additional language complexity.
    (As far as I remember, there were some concerns about what happens
    with Java's .class/getClass() when reference types are reified ...
    but imho, Java problems should stay Java problems and shouldn't make
    the JVM approach worse than necessary.) Also note how this fits
    with my work on making classOf[T] ClassTag-aware.
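
    For readers less familiar with the problem, here is a small Java
    sketch of what "not reified" means for reference types today, and
    of the kind of type token that has to be passed around as a
    result (the role ClassTag plays in Scala). ErasureDemo and
    newArray are hypothetical names used only for illustration.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of erased generics and explicit type tokens.
final class ErasureDemo {
    // T is not available at run time, so the caller must supply a
    // Class<T> token to create an array of the right component type.
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(Class<T> tag, int length) {
        return (T[]) Array.newInstance(tag, length);
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        // Both lists share one runtime class: the type argument is erased.
        System.out.println(strings.getClass() == ints.getClass()); // true

        String[] arr = newArray(String.class, 3);
        System.out.println(arr.getClass().getComponentType()); // class java.lang.String
    }
}
```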

    Additionally, we already have an escape hatch with the existing
    erased Generics, so having yet-another different style of Generics
    doesn't feel like the right way ... which brings us to the next point:

  * Reified/Erased Generics interop
    This seems to be a really dark corner. The draft is pretty silent on
    this. It currently looks like you can't have a type hierarchy where A
    is erased at the top but reified in a subclass (or the other way
    around) ... I think this will make it very hard to use erased
    Generics as an escape hatch. Combined with the earlier point, I'd
    prefer better reified/erased interop and having reified reference
    types with reified Generics.
    Otherwise, it feels like tons of people will try to wrap random
    reference types in value types to get around that limitation.

  * No variance for value types
    This is a big conceptual problem with heterogeneous translation and
    my personal conclusion 1.5 years ago was the same: It can't really
    be done without a huge explosion in runtime complexity.

  * Poor man's typeclasses
    There was a short mention of "conditional methods", e.g. enabling
    methods only for a subset of some generic type. That smells a lot
    like a crippled version of typeclasses. Might be useful to watch
    what happens in that space.

  * Members on any T
    There seems to be a debate about whether/how "any T" should expose
    methods by itself. While this seems to be nicely in line with what a
    lot of people in Scala would like to do, I think Java will not be
    able to properly add constraints to T to describe required methods.
    They currently only have upper bounds to express those things, but
    even in Scala where bounds are less horrible than in Java, people
    have pretty much abandoned bounds based on subtyping in favor of
    context bounds.
    So even if Any/T didn't get any members, I think we should be aware
    that Java's options for adding constraints are very, very primitive
    and probably not what we would want to use in Scala. Let's take that
    into account before arguing for or against this ...
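
    The difference between the two constraint styles can be sketched
    in plain Java. The first method uses an upper bound, so T itself
    must implement Comparable; the second takes the ordering as a
    separate value, which is roughly what a context bound desugars to
    in Scala. ConstraintStyles, maxBounded and maxWith are
    hypothetical names, and both methods assume a non-empty list.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparison of subtype bounds and evidence passing.
final class ConstraintStyles {
    // Upper bound: only works for types that implement Comparable.
    static <T extends Comparable<T>> T maxBounded(List<T> xs) {
        T best = xs.get(0);
        for (T x : xs) if (x.compareTo(best) > 0) best = x;
        return best;
    }

    // Evidence passing: the ordering is supplied separately, so it
    // works for any T and allows several orderings per type.
    static <T> T maxWith(List<T> xs, Comparator<T> ord) {
        T best = xs.get(0);
        for (T x : xs) if (ord.compare(x, best) > 0) best = x;
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxBounded(Arrays.asList(3, 1, 2))); // 3
        System.out.println(maxWith(Arrays.asList("bb", "a", "ccc"),
                Comparator.comparingInt(String::length)));      // ccc
    }
}
```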

My conclusion regarding Scala:

  * Exposing two kinds of Generics to the user is highly undesirable
    from a language design POV. F# did the same with a much better
    starting point than Java (well-designed runtime support for reified
    Generics) and even for them I think the increase of complexity just
    wasn't worth the benefit. Java might not have a choice here, but I
    think we should make sure to not leak these "implementation details"
    to the user. We will have to support both kinds of Generics for
    interop anyway.

  * We should think about how we would like to see T/Any and eq/==
    evolve in the long-term and communicate that clearly to the Valhalla
    people. It's probably much more painful if they e.g. decide to have
    no members on any T, but then settle on an incompatible constraint
    mechanism (which would very likely be reified in the bytecode). In
    that case, keeping the status quo and implementing member-free Any/T
    as a scalac fiction might work better.

  * We should start preparing for the variance/value type breakage.
    People who need variance can migrate to boxed representations easily
    (replace Vector[Int] with Vector[Integer] for instance), but trying
    to keep things as-is would mean just not supporting JVM value types,
    and I think that would be a terrible decision. From a type system
    POV, instantiating a type parameter with a value type could be seen
    as collapsing all the bounds of that type parameter to make it
    invariant.

  * We should really think about having a scalac prototype which tries
    to emit the new class file format and leverage the new semantics.
    There is already an ASM fork out there with preliminary support, but
    I don't know how stable/complete that code is.
    Experience has shown (JRuby) that this is the most effective way to
    actually influence the design.

That's just a short overview. I can expand on these topics as 
necessary, but I expect that people have read the draft proposal and the 
complete mailing list already, so that everyone is on roughly the same 
level.

==============

I hope this is helpful.

Thanks,

Simon