Notes about Valhalla from a non-Java perspective

Simon Ochsenreither simon at ochsenreither.de
Wed Oct 1 22:17:33 UTC 2014


Hi John,
 
thanks a lot for your insightful response!
 
> The Scala perspective on values is valuable to us,
> not only because it is known to work on the JVM,
> but because you folks have lived with the implications
> of your design choices.
 
I'll try to provide you with as much information as possible. :-)
Especially for value types, I think it's extremely important to come up with a
good design, because there is a huge difference between Java and Scala.
While value types probably won't become too widespread in Java due to compatibility
issues, I can imagine that most Scala libraries will eventually end up with a
majority of value types.
 
Imho, it's very important to make sure that the value class design allows
languages/developers to choose what works best for them and not let them hit
limitations like "well, this could be a value type, but the JVM doesn't support
it because we didn't need it for Java"
A good design would support a wide range of reasonable reference-type/value-type
ratios regardless of whether it will be more like 80%/20% (Java) or 30%/70%
(Scala).
 
But so far, things look good. I haven't seen any huge deal breakers yet, and
most of the extra stuff the JVM draft allows is stuff we wanted to have in Scala,
too (e. g. value types with more than one field), but couldn't implement due to
JVM restrictions.
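
To illustrate the restriction I mean, in plain current Scala (the class names are
just made up for the example): a value class may wrap exactly one field, so anything
with two fields falls back to a regular heap-allocated class.

    // Current Scala: a value class must wrap exactly one field.
    final class Meters(val underlying: Double) extends AnyVal {
      def +(other: Meters): Meters = new Meters(underlying + other.underlying)
    }

    // A two-field "value", e. g. a complex number, cannot extend AnyVal today,
    // so it is always a regular, heap-allocated class:
    final class Complex(val re: Double, val im: Double)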
 
> We have similar opinions on it, and are mulling principled approaches.
> The type annotations (similar approach) was done under the constraint
> that the bytecode set was off limits. That is not the case here, so we
> can add annotation-carrying bytecodes (decorated nops) if we choose.
 
Yes, I found that approach slightly nicer, but in the end the question (imho) is
... if there are such huge changes to the bytecode, maybe it would just make
sense to flip the switch, go back to the drawing board and redesign the class file
format, fixing a lot of other issues, too (like the 64K limit on method bytecode size).
 
As an example, the only reason Scala hasn't adopted
https://blogs.oracle.com/jrose/entry/symbolic_freedom_in_the_vm yet and still
uses name mangling (I wrote the patch to remove the mangling, and it was a huge
readability improvement, especially in stack traces) is that the way inner
classes are handled is such a disaster and paulp wanted to keep all those
"extra" symbols reserved as a potential replacement for $ if it turned out to
improve things.
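
For anyone unfamiliar with the mangling in question, a tiny example (class name
made up):

    // Scala today encodes symbolic method names for the JVM,
    // e. g. "++" becomes "$plus$plus":
    class Bag[A](val elems: List[A]) {
      def ++(other: Bag[A]): Bag[A] = new Bag(elems ::: other.elems)
    }
    // A stack trace through this method shows Bag.$plus$plus(...)
    // instead of the source-level name "++".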
 
But I can kind of understand why people don't want to start yet-another huge
project. :-)
 
> "Most basic comparison available" ... "except for certain undesirable,
> even more basic comparisons."
 
If you're referring to the Double stuff, I added that as a more detailed description
of what I expect to happen (it is in line with "simply compare type-and-bits").
If you're referring to the handling of wrappers, yes ... I'm not sure what the right
approach is here. From my point of view, having to potentially deal with
java.lang.Double, double and a value-boxed double is not desirable. In that sense,
maybe "any T" could be specified as "wrapped primitives don't exist in here".

> Part of the reason to view values as typed bits is to provide a place
> for a non-ad-hoc definition of "most basic" comparison, which is
> bitwise. That's the current thinking behind vcmp. Anything higher
> level can be a method, can't it?
 
Yes, that's the intention. I think we are pretty much in agreement on what the
"primitive comparison" is supposed to do. I'm just unsure how wrappers can fit
into it.
In the end, we conceptually have two different ways of comparing things, but
floating-point numbers offer three (primitive ==, reference == and reference
equals), and it would be painful not to be able to abstract over collection
element types (e. g. having to write both HashMap[T] and DoubleHashMap, although
something similar was proposed earlier ...).
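
To spell out the three behaviours I mean, in plain Scala on today's JVM
(REPL-style, variable names made up):

    val a: Double = Double.NaN
    val b: Double = Double.NaN

    a == b                              // false: primitive ==, IEEE 754 says NaN is never equal to itself

    val boxedA: java.lang.Double = a    // boxed via Predef.double2Double
    val boxedB: java.lang.Double = b
    boxedA eq boxedB                    // false: reference ==, two distinct boxes
    boxedA equals boxedB                // true:  java.lang.Double.equals compares bit patterns

    0.0 == -0.0                         // true with primitive ==, but false with Double.equals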
 
> vcmp is a morphed version of [ail]cmp. That's why it is proposed as bitwise.
> There is no fcmp, so we pull from Double.equals, since that is almost as
> fundamental.
> This is obviously an over-constrained design problem, but I will observe
> that there is a discernible "most basic" comparison, the initial element
> in the category of comparable representations of a value type.
> It may strike some as "wait, that's way too basic for me!", but it's fairly
> unique.
 
I think the general question is what we expect to happen when people call
"equals" on floating-point types. On the one hand, I think it would be good not
to pull in all the reference-Double craziness, but on the other hand, it could
change how equals-based collections work (e. g. not being able to remove
Double.NaNs from a collection anymore).
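
Concretely, this is what works today because java.lang.Double.equals treats NaN
as equal to NaN (variable names made up):

    val m = new java.util.HashMap[java.lang.Double, String]()
    m.put(java.lang.Double.valueOf(Double.NaN), "not a number")
    m.remove(java.lang.Double.valueOf(Double.NaN))   // removes the entry: hashCode/equals match
    // If "equals" on a double value meant primitive ==, the NaN key could never
    // match itself again and the entry would be stuck in the map.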
 
That was basically the idea behind the "primitive vs. semantic equality" divide:
one really just compares type-and-bits and the other does more advanced things
(like following the IEEE 754 behaviour on floating-point equality).
 
In the end, one could think of it this way: reference types, value types and
primitive types all have a "primitive" and a "semantic" equality:
* for most primitive types, primitive and semantic equality are the same
(floating point being the exception)
* primitive equality is fixed and cannot be changed
* semantic equality can be implemented/overridden by the code author for
reference and value types.
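
A rough sketch of that divide for double, with made-up helper names (nothing here
is an existing or proposed API):

    // primitive equality: fixed, type-and-bits
    def primitiveEquals(a: Double, b: Double): Boolean =
      java.lang.Double.doubleToRawLongBits(a) == java.lang.Double.doubleToRawLongBits(b)

    // semantic equality: whatever the author of the type decides; for floating
    // point that would presumably be the IEEE 754 behaviour
    def semanticEquals(a: Double, b: Double): Boolean =
      a == b

    primitiveEquals(Double.NaN, Double.NaN)   // true:  same bit pattern
    semanticEquals(Double.NaN, Double.NaN)    // false: IEEE 754
    primitiveEquals(0.0, -0.0)                // false: different bit patterns
    semanticEquals(0.0, -0.0)                 // true:  IEEE 754
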
> (Insert long conversation about the pros and cons, hows and whys,
> of reification of Java source-level types to JVM level. There was
> a lot of this at JavaOne yesterday; sorry I can't recap here.)
 
Yes, I just hope people can come up with a better approach. The current one
feels a lot like translating a Java paradigm to the JVM, and I really think that
after almost 20 years it would be nice if the claim about being a
language-agnostic runtime were taken a bit more seriously.

> We will probably provide some sort of opt-in way to track more elaborate
> representations of compile-time types. But the JVM will always (IMO)
> tilt towards simplifying runtime types, as a way of making code sharing
> easier.
> This doesn't stop programmers from weaving Class and Type token pointers
> into their data structures, but doesn't require the JVM to do so always.
 
Sure. This is what scalac has been doing for many years already. The problem
is that this passing-types-along currently has an impact on method/class
signatures, so it is exposed to the user and can create compatibility issues. It
would be nice to be able to change/migrate code without breaking every user of
that code.
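
To make that concrete, here is roughly what it looks like today with ClassTag
(method name made up):

    import scala.reflect.ClassTag

    // Erases to: newArray(int, scala.reflect.ClassTag): Object
    def newArray[T: ClassTag](size: Int): Array[T] = new Array[T](size)

    // Adding or removing the ClassTag evidence later changes that erased
    // signature, so already-compiled callers break at link time.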
 
> This will be something to watch. If you don't explode the runtime complexity,
> you may have to explode the static complexity (JAR file), right?
 
I'm not sure. From my point of view, and given the time/resource constraints, I
don't think it is possible to come up with a good solution.
Consider this code, assuming specialization:

    // Vector[T] specialized to Int, i.e. backed by an Array[Int]
    // instead of an Array[Object]:
    val vector = Vector(1, 2, 3)

    vector :+ "abc"

What is expected to happen here? I don't think it is realistic that the JIT
figures out how to move and box all the values from the Array[Int] into an
Array[Object], replace the (final!) reference to the Array inside the Vector, and
add "abc" to that Array[Object], etc.
 
Imho, the only practical option (IF it can be shown to be sound from a language
type system POV) is to collapse all bounds and make everything related to a type
parameter T invariant when T is instantiated with a value type.

> On the other hand, if you push instantiation to runtime (or link time)
> you delay code splitting as long as possible, and allow for internal
> unsafe code sharing (untyped inline mini-boxes, for example).
 
Mhh, I think this is orthogonal.

> Yes, please do. General question: What is the simplest JVM support
> needed to make higher-kinded parameterizations efficient?
 
In the sense of higher-kinded types, or "higher-kinded" in the general sense?

> That's what interfaces are for, on both references and value types.
> I would expect to see them show up on primitives too, sooner than
> new categorization mechanisms (which as you say would be wrong
> from the start, for most languages).
 
I think it would be great if Java finally got a useful replacement for the useless
java.lang.Number type.
For some use-cases, like +/*/-/... on numbers, translating a
scala.math.Numeric[T]-like typeclass to a Java interface which those types
implement makes sense.
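
A rough sketch of what I mean, with a made-up interface name (this is not an
actual proposal):

    // The typeclass operations expressed as an interface that the numeric
    // types themselves would implement:
    trait Arithmetic[T] {
      def plus(that: T): T
      def times(that: T): T
      def negate: T
    }

    // Generic code can then work on any T that implements it:
    def sum[T <: Arithmetic[T]](xs: List[T], zero: T): T =
      xs.foldLeft(zero)((acc, x) => acc.plus(x))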
 
The caveat is that this won't be usable for things like equality, comparison,
hashing, toString, etc., while it would be extremely desirable to have a
solution which covers these things, too.
 
(See all those duplicated collections in the standard and third-party libraries
in Java, Scala, etc. which basically just exchange the equality function
(HashMap vs. IdentityHashMap in Java, HashMap vs. AnyRefMap in Scala, etc.).)
(As far as I remember, the only reason Scala hardcoded this stuff is that the
performance difference between hardcoding and accepting an equality function as
input was catastrophic on the JVM.)
 
> Maybe there's a way to make interfaces slightly stronger, so they can
> represent more complex (structural or contextual) constraints.
> General question: What is the simplest JVM support needed to
> allow interfaces to support the "next level" of constraints?
> (Requires some discussion of what is the next level; my preference
> is emphatically given to widely deployed languages.)
 
I think allowing some kind of dictionary-passing that doesn't impact the visible
signatures (in terms of method parameters) of methods/classes would be an
interesting thing to research. Maybe all that is required is better detection of
that pattern by the JIT compiler, because the performance drawbacks are pretty
much the only issue in Scala today. (The amount of work and careful design
happening in e. g. Erik Osheim et al.'s Spire math library to line things up
right for the JVM is pretty impressive.)
 
(Actually, Scala's way of passing-a-type-around, which I described above, is just
one use of context bounds, which are basically dictionary-passing.)
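
For reference, this is what a context bound desugars to today; the dictionary is
an extra implicit parameter and therefore part of the compiled signature (method
names made up):

    // A context bound ...
    def maxOf[T: Ordering](xs: List[T]): T = xs.max

    // ... is sugar for an extra implicit dictionary parameter, which also shows
    // up in the erased JVM signature:
    def maxOfDesugared[T](xs: List[T])(implicit ord: Ordering[T]): T = xs.max(ord)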
 
Which values are passed around could then be completely language-specific. (E. g.,
which values are in scope? Are multiple available values allowed? When are they
ambiguous? ... Different languages (Scala, Haskell, Idris, Agda, ...) will have
different opinions on that.)
 
After having used both upper bounds and context bounds, upper bounds are almost
never what one actually wants, especially in the cases mentioned above, because
equality/comparison/hashing/etc. are not properties of a type, but relationships
between a type and an equality/compare/hash function.
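
To make the contrast concrete (plain Scala, method names made up):

    // Upper bound: "T itself knows how to compare"; exactly one comparison per type.
    def maxUpper[T <: Comparable[T]](a: T, b: T): T =
      if (a.compareTo(b) >= 0) a else b

    // Context bound/dictionary: "compare T with this function"; the comparison is a
    // relationship that travels with the call site.
    def maxCtx[T](a: T, b: T)(implicit ord: Ordering[T]): T =
      if (ord.compare(a, b) >= 0) a else b

    maxCtx(3, 5)                        // 5, using the default Ordering[Int]
    maxCtx(3, 5)(Ordering.Int.reverse)  // 3, same type, different comparison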
 
This is the interesting thing here ... retrofitting generic interfaces onto numbers
would certainly be very helpful when working with numbers, but that alone won't
work for a lot of other use-cases.

> We'll do ad hoc constraint mechanisms if we need to, and we are
> using them while prototyping, but we have both the time and inclination
> to converge on a reasonably clean final design, if and when one appears.
 
Yes, I think the question really is whether it's possible to come up with a good
design which accommodates different languages.

> Thank you for helping us watch this point. We are mulling over options
> for introducing runtime or instantiation-time variance on value types.
> Same point here about us having time and inclination, and hoping for
> better clarity.
 
Thanks for the info. I'm really wondering whether there even exists a workable
option.
 
I really hope a few additional Scala people show up who know more about which
things are conceptually possible (I'm more the expert on which things don't work
:-)).
 
Thanks again for your extremely helpful answer!
 
Bye,
 
Simon

