Value types questions & comments

Mon Apr 11 23:13:56 UTC 2016

Thanks for pulling these together.  Some quick answers inline.  Don’t believe me — challenge the answers.  

> My perspective on this: Since Java has had only two "kinds of types" for 20+ years -- and since the tension between those two is already a major source of confusion and bugs for intermediate programmers -- adding a third kind now is a Very Big Deal. It needs to be as simple as possible to understand this new kind, as nothing but a "natural" hybrid of the other two. The fewer asterisks we need to put on that simple model, the better.

Total agreement.  We view values as generalizations of primitives, where primitives are “values with legacy baggage”, and hopefully as little baggage as possible.  So hopefully in the end we still have two things, references and values (with some values “more equal than others” for historical reasons.)  The biggest baggage is probably surrounding the bespoke box types that we’re probably stuck with.    

> I've gathered that it's like a reference type in that it's a named, user-defined type that can have fields (of any kind), methods and constructors (getting eq/hc/ts for free, kind of like enum classes get valueOf for free, I assume), and can implement interfaces.

Right. 

> But I think it resembles a primitive in pretty much every other way?
> No identity (so mutability isn't even a question)
> Can't be null
> Can't have subtype or supertype (exception: as above, value types can implement interfaces)
> Does not extend Object, so synchronization/wait/notify not possible
> Not heap-allocated (locals on the stack, fields and array values inlined)
> Can be boxed to an Object ... although *boxing works differently
> So first off: have I got all that right?

Yes.  And, some of these asterisks can be erased.  For example, I see no reason why `int` can’t implement Comparable or Serializable (though seeing 1.compareTo(2) might make some developers’ heads explode, so we might dial back on how much we close up this gap; TBD.)  As you say, the biggest asterisk is how we handle boxing.  The box types for values will be derived from the class file and have nice clean properties, whereas the box types for primitives will likely remain some sort of bespoke bag of smelly stuff.

We might even take this further by actually describing `int` with a source file (public native class int implements Comparable { … }), which might try to smooth out some of the differences, but I wouldn’t hold out a lot of hope for this being super successful.  Mostly this is just moving the magic around, but it’s possible this will seem like less magic overall to some.

Another asterisk: the semantics of operators are predefined on primitives, and not at all on values.  It’s possible we can close up this gap too, but I’ve been deliberately avoiding opening this Pandora’s Can Of Worms, strictly for scope-management reasons.  (But given that one motivating example for values is alternate numerics, calls for operator overloading won’t be far behind.)

Though, bottom line, I think users will be able to recognize that the primitives are special cases of this new value thingie.  They behave so similarly: they have all the same restrictions, and they can be used in all the same places.

> Conceptual question: is a user-defined value type a "class"? A "yes" and a "no" answer both seem defensible, and of course we have to choose one and defend it. And notably, whichever way we decide it, users are going to have to rethink their preconceived notions of what a "class" is no matter what. (This gets back to my statement that what we're doing here is a Very Big Deal. These are bedrock concepts we're tampering with.)

Yes, there’s gonna be some adjustment of mental models required.  (Additionally, enhanced generics also put a lot of pressure on the deeply overloaded word “class”, since we will have multiple runtime parameterizations of a given generic “class”.)  A class is used to describe a source file, a binary file, a runtime type, something you load, a type mirror ….  Early in Java’s lifetime, these entities were in strict 1:1 correspondence, but no more.  

We have classes at the source level — this will probably expand to include value types.  We have class files — this will probably similarly expand.  I don’t think these will be controversial.  But I think we need to call the runtime entities something else — like TYPE and TYPE MIRROR.  The meaning of “class” is already too overloaded.  Again, though, the game here is to frame the old reality as a lower-dimensional projection of the new reality, and this doesn’t seem impossible.  

> On the one hand, classes are things that have fields and methods, so yes, a value type is a class. On the other hand, one expects classes to have "instances"/"objects", pointed to by references, which these don't. Also, you expect to be able to call getClass() and get something useful back (that knows what methods are present, what interfaces are implemented) and that doesn't seem possible in the general case here (but could maybe(?) be faked in cases where the static type of the value is known to the compiler).

I think this one isn’t so bad.  Java has TYPES today, reference types and primitive types.  Instances of reference types are object references, and instances of primitive types are values.  So the notion of types whose members are not references is not new.  

Because value types are not polymorphic, there’s no case where you have a value but don’t know its type by the time the bytecode moving it / describing it is executed.  This means that the “general case” here doesn’t exist.  In any case, there needs to be reflection over values, but it’s not clear whether it has to be spelled “.getClass()”, nor is it clear that what is returned must be a java.lang.Class.  (But, because all values can be boxed, we may be able to get away with just returning the type mirror for the box type from .getClass() on values, and calling it good; (almost) everything in reflection is boxed anyway.)

The Scala type system has some magic types that help capture these differences (AnyRef and AnyVal are the roots for reference and value types, respectively, and both extend Any).  Not clear that we want to copy this, but I think we can put things in reasonable context here (especially if we’re willing to tolerate expressions like 1.toString() and such; then every type has members, some types are value types, some are reference types, no big mystery.)

> It's nice that a value type can implement interfaces. But I get confused when I try to think through the implications of this. I get that when referring to it as the interface type, boxing may occur. I'd expect eq/hc/ts on the box to pass through to the value itself (two different boxes of equal values are equal). But... maybe most of my confusion is just stemming from getClass() again. What would it return? Could the returned Class possibly have all the metadata a user might expect? I think not?

The story here is actually pretty straightforward.  There’s a slight mental gymnastic you have to do to generalize your notion of “implements interface.”  

Hitherto, “C implements I” meant two things together:
 - C has all the methods that I has; 
 - C is a subtype of I 

With values in the mix, we have to slightly redefine this in a backward-compatible way.

For each type T, there exists a reference type Ref[T], where Ref[T] <: Object.  
For all reference types R, Ref[R] = R.  
For all value types V, Ref[V] = V’s box type.  

Now, we redefine “C implements I” as follows:
 - C has all the methods that I has;
 - Ref[C] is a subtype of I

Note that this fully describes today’s reality; we just didn’t know that there were types for which Ref[T] was not just T.
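To make the Ref[T] idea concrete, here is a rough sketch using today's int/Integer pair as a stand-in for a value type and its box.  This is only an analogy, not the proposed design; RefDemo and isSubtype are made-up names for illustration.

```java
// Illustration only: int plays the role of a value type V, and Integer
// plays the role of Ref[V]. RefDemo and isSubtype are hypothetical names.
final class RefDemo {
    // True when the type mirror `type` is a subtype of the mirror `target`.
    static boolean isSubtype(Class<?> type, Class<?> target) {
        return target.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        // The value type itself is not a subtype of Comparable...
        System.out.println(isSubtype(int.class, Comparable.class));
        // ...but its box, standing in for Ref[int], is.
        System.out.println(isSubtype(Integer.class, Comparable.class));
        // For a reference type R, Ref[R] = R, so nothing changes.
        System.out.println(isSubtype(String.class, Comparable.class));
    }
}
```

The subtype half of "implements" is checked against the box mirror, while the "has all the methods" half would be checked against the value type itself.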

If I have a Decimal, where Decimal implements Comparable:

    Decimal d = …, e = ...
    if (d.compareTo(e) < 0) { …. }

Since I know the receiver is a Decimal, I’ll generate the bytecode:

    vload_1              // push d
    vload_2              // push e
    invokedirect Decimal.compareTo(Decimal):I

I don’t need to go through the interface, so no boxing.  On the other hand, if I convert it:

    Comparable c = d

then I will box d to Decimal’s box (which implements Comparable).  Similarly, if I do

    Object o = d

I will also box.  So boxing happens when you assign a value either to Object or to an interface.  Otherwise I can invoke the methods directly, not unlike how the compiler selects invokevirtual over invokeinterface when it has sharp enough static types.  
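The same pattern can be seen with primitives today, which may help as a mental model.  The sketch below uses int in place of Decimal (which doesn't exist yet); BoxingDemo and its method names are assumptions for illustration only.

```java
// Illustration only: int stands in for a value type such as Decimal.
// BoxingDemo and its methods are hypothetical names.
final class BoxingDemo {
    // The static type is sharp (int), so the comparison is done directly
    // on the values and no box is created.
    static int compareDirect(int d, int e) {
        return Integer.compare(d, e);
    }

    // Converting to an interface type forces a box (here, Integer).
    static Comparable<Integer> asComparable(int d) {
        return d;
    }

    // Converting to Object forces a box as well.
    static Object asObject(int d) {
        return d;
    }

    public static void main(String[] args) {
        System.out.println(compareDirect(1, 2) < 0);
        System.out.println(asComparable(1).compareTo(2) < 0);
        // Reflection on the boxed value reports the box type.
        System.out.println(asObject(1).getClass());
        // equals on the box passes through to the underlying value.
        System.out.println(asObject(1000).equals(asObject(1000)));
    }
}
```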

> Re: "Large groups of component values should usually be modeled as plain classes", I'd VERY much like to avoid putting that responsibility onto the user if at all possible. Is there a reason why the VM can't simply decide "this is past my threshold, so I'm gonna box it instead of putting it all on the stack" and not make the developer worry about it?

The VM will definitely do this, based on some internal, machine-dependent threshold.  However, because the semantics of values is different from references, the user should still pick the right tool, or might face performance consequences.  If I have an XY point, and I want to vary the X component and not the Y, I might well be happy to do:

    Point p = …
    Point q = p.withX(0)

With small values, this will all stay in registers, it’ll be fast, everything will be fine.  But if my values have been boxed onto the heap, then for each such “mutation” (unless I can prove non-aliasing) I will have to allocate a new object for the new Point.  If I plan to do something “mutation happy”, I’m probably better off with a real object that supports mutation, even though I could model it as a value.
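Here is roughly what that tradeoff looks like when emulated in today's Java, where everything below lives on the heap.  Only withX comes from the example; the rest of Point's shape, and MutablePoint entirely, are assumed for illustration.

```java
// Stand-in for a value type: an ordinary immutable class (today's Java has
// no value classes). Only withX is from the example; the rest is assumed.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // "Mutation" of a value: build a new Point that differs only in x.
    // For a heap-allocated Point this is one allocation per call.
    Point withX(int newX) {
        return new Point(newX, y);
    }
}

// Hypothetical mutable alternative for mutation-happy code: the update
// happens in place, with no new object.
final class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```

With a real value type small enough to stay in registers, the withX style would cost nothing extra; the allocation only shows up once the value has been pushed onto the heap.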

> 
> Re: "Cloning a value type does not strictly make sense," well, technically when a value includes fields of Cloneable reference types, you might want a deep-clone of that. However I lean toward thinking this is too weird to bother supporting. Users should really be dissuaded from including references to mutable types in their value definition in the first place.

Agree on clone() — best to let it rot, it’s halfway there already.  But here’s a value type I could imagine writing all the time:

    value class Cursor<T> { 
        private T[] array;
        private int offset;
    }

I can use a Cursor as a garbage-free Iterator.  But it refers into mutable objects (here, an array.)  (But note that the fields are private.)  I think the logical definition of equality here is “do they point at the same array, and at the same location.”  
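As a sketch of that equality definition, here is Cursor emulated as an ordinary final class in today's Java.  The constructor, hasNext, get and advance are assumed additions (the original shows only the two fields), and as a plain class this version is of course not garbage-free: advance allocates.

```java
// Emulation in today's Java; a real value class would not need the
// hand-written equals/hashCode. Methods beyond the two fields are assumed.
final class Cursor<T> {
    private final T[] array;
    private final int offset;

    Cursor(T[] array, int offset) {
        this.array = array;
        this.offset = offset;
    }

    boolean hasNext() {
        return offset < array.length;
    }

    T get() {
        return array[offset];
    }

    // Returns a cursor one position further along the same array.
    Cursor<T> advance() {
        return new Cursor<>(array, offset + 1);
    }

    // Equal when both cursors point at the same array object (identity,
    // not contents) and at the same location within it.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Cursor)) {
            return false;
        }
        Cursor<?> other = (Cursor<?>) o;
        Object otherArray = other.array;
        return array == otherArray && offset == other.offset;
    }

    @Override
    public int hashCode() {
        return 31 * System.identityHashCode(array) + offset;
    }
}
```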

> 
> Which actually raises another issue. If my value type has no way to include an int[] unmodifiably, that would be extremely sad, right?

Array mutability is a persistent thorn in our side.  We’re investigating an idea called /frozen arrays/ which will allow an array to be marked read-only (and, if not aliased, efficiently so.)  Which we’d love to retrofit onto varargs….

> 
> Going to stop there for now!
> 

Keep ‘em coming!