Value types questions & comments

Tue Apr 12 20:51:25 UTC 2016

> I would assume we're not actually changing anything about primitive boxing, here...?

So, this is rife with tradeoffs.

The legacy boxes are inferior to the new boxes, for a number of reasons.  The association between QComplex; and LComplex; is mechanical and simple, whereas the association between int and Integer is ad hoc and complex.  And since the new boxes are new, they can be defined from the get-go to have relaxed identity semantics, enabling optimizations and defending against possible bugs (e.g., they could throw when synchronized upon).  By contrast, it’s valid now to synchronize on a j.l.Integer, and existing code does this (shame, shame), meaning that we can’t necessarily take liberties with the identity of the box for optimization purposes.

So it would be great if we could get away with having new mechanically generated primitive box classes, and deprecate Integer, but I have deep doubts we’ll be able to get away with that.  So you’re probably right that we’re stuck with primitive boxing mostly as is.
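To make the identity problem with the legacy boxes concrete, here is a small sketch in today's Java. The class and method names (BoxIdentityDemo, sameBox, incrementUnderBoxLock) are made up for illustration. It only shows that box identity is observable and that locking on a box is legal; it says nothing about how the new boxes would be implemented.

```java
// Illustration only: how identity leaks through today's java.lang.Integer boxes.
final class BoxIdentityDemo {

    // Boxes for small values (-128..127) come from a shared cache, so two
    // independently obtained boxes can be the very same object.
    static boolean sameBox(int v) {
        Integer a = Integer.valueOf(v);
        Integer b = Integer.valueOf(v);
        return a == b; // identity comparison, not value comparison
    }

    // Legal today: using a box as a monitor. Code like this is what prevents
    // the runtime from freely discarding or merging Integer instances.
    // (Recent javac versions may emit a warning here.)
    static int incrementUnderBoxLock(Integer lock, int counter) {
        synchronized (lock) {
            return counter + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameBox(100));
        System.out.println(incrementUnderBoxLock(Integer.valueOf(7), 41));
    }
}
```

A box that was defined without identity from the start could reject the synchronized block outright, which is the "throw when synchronized upon" option mentioned above.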

> 
> We have classes at the source level — this will probably expand to include value types.  We have class files — this will probably similarly expand.  I don’t think these will be controversial.  But I think we need to call the runtime entities something else — like TYPE and TYPE MIRROR.  The meaning of “class” is already too overloaded.  Again, though, the game here is to frame the old reality as a lower-dimensional projection of the new reality, and this doesn’t seem impossible.  
> 
> "Is a class from the source/bytecode perspective, isn't a class from the runtime perspective" is worth shooting for, but it seems difficult to even get it down to something that simple. I mean, at runtime this is still a thing that gets loaded and initialized by a class loader, yes? I fear we will never find a clean way to address this.

One terminology we’ve been experimenting with is having “class” and “species” (think back to middle school: kingdom, phylum, class, order, family, genus, species.)  List is a class; List<int> and List<erased> are species of List.  Similarly, the boxed projection and the value projection of Complex are both species of class Complex.  

Not clear whether this is the right terminology, but it gives users a way to keep thinking that List is a class, while recognizing that the beasts List<int> and List<erased> are at the same time both of class List and also of different species.
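For comparison, a sketch of where today's Java stands: every parameterization of a generic class is erased to one runtime class, which is roughly the List<erased> species in the terminology above. A List<int> species cannot be written in current Java, so it does not appear here. SpeciesDemo is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: under erasure, differently parameterized lists share
// a single runtime class, so there is just one "species" today.
final class SpeciesDemo {

    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Both are plain ArrayList at runtime; the type arguments are gone.
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```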

> I think the fact that we are now talking about user-defined named types with fields, methods, constructors, and implemented interfaces makes this something very different. 

So, how about: 
 - Java has always had values
 - Primitives are the BUILT-IN values
 - Java now gets USER-DEFINED values in addition to USER-DEFINED classes
 - USER-DEFINED values and classes can have fields, methods, constructors, and implement interfaces

Does this stacking make it sound less radical?  

I agree that there’s a real pedagogical challenge here, but I think it can be made to seem like less of a hurdle.  
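As a rough picture of what the stacking above describes, here is a user-defined Complex with fields, a constructor, methods, and an implemented interface. This is only a stand-in: it is written as an ordinary final immutable class because the declaration syntax for values is not settled, and the HasMagnitude interface and ComplexDemo class are invented for the example.

```java
// Hypothetical interface, used only to show "implements" on a value-like type.
interface HasMagnitude {
    double magnitude();
}

// Stand-in for a user-defined value: final, immutable, no identity-dependent
// behavior. Today this is still an ordinary class with identity.
final class Complex implements HasMagnitude {
    private final double re;
    private final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    double re() { return re; }
    double im() { return im; }

    // Operations return new values instead of mutating this one.
    Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    @Override
    public double magnitude() {
        return Math.hypot(re, im);
    }
}

final class ComplexDemo {
    public static void main(String[] args) {
        Complex sum = new Complex(1, 2).plus(new Complex(3, 4));
        System.out.println(sum.re() + " + " + sum.im() + "i");
    }
}
```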

> 
> Now, we redefine “C implements I” as follows:
>  - C has all the methods that I has;
>  - Ref[C] is a subtype of I
> 
> 
> Ah, I think this helps some. Maybe. So a layperson explanation is: When writing a value type, you can declare interfaces, but you are actually declaring which interfaces the boxed form of the value will implement, not the value itself. But then if all you do is call myValue.myInterfaceMethod() it will just skip boxing behind the scenes. Something like that?

That’s exactly how it works, yes.  And, you could put the “skip boxing behind the scenes” part in a smaller font, since that’s just an optimization (and, even when you explicitly box and then access a box member, there’s some chance that the box will still be elided due to escape analysis.)  
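An analogy with today's primitives may help, with the caveat that it is only an analogy and not the proposed mechanism. int is not a subtype of Comparable, but its box is; going through the interface type requires a box, while calling the equivalent operation directly does not. BoxedInterfaceDemo and its method names are invented for this sketch.

```java
// Analogy only: int plays the role of the value, Integer the role of Ref[C].
final class BoxedInterfaceDemo {

    // Assigning to the interface type forces a box: int -> Integer, and
    // Integer is the thing that is a subtype of Comparable<Integer>.
    static int compareViaBox(int a, int b) {
        Comparable<Integer> boxed = a; // boxing, then widening to the interface
        return boxed.compareTo(b);     // b is boxed for the call as well
    }

    // Same comparison without any box in sight.
    static int compareDirect(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(compareViaBox(3, 7));
        System.out.println(compareDirect(3, 7));
    }
}
```

Both paths give the same answer; whether a box is actually allocated on the first path is up to the JIT.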

> Using a value type for something that isn't a value raises alarm bells for me. At the minimum I would expect this user to have to implement eq/hc by hand, because the default behavior users want 99% of the time is (deep) content-based equality.

This may be the reality-distortion field speaking, but in my view a reference *is* a kind of value, albeit a very special kind.  References are immutable, like other values.  Almost all their state is encapsulated (they can be compared by identity, and that’s it).  They can only be constructed by privileged factories (we call these constructors).  But, ultimately, they behave like values: they are passed by value, and they have no identity of their own.

For the Cursor class, the natural definition of equals *is* the componentwise one — two cursors are the same if they refer into the same source at the same position.  But yes, there are cases where we’d want to hand-override equals (which is allowed) to do a deeper comparison (generally when our components are value-like references, like strings or dates or big decimals.)  
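A sketch of what componentwise equality means for Cursor, written out by hand as an ordinary final class. The field names and types (a CharSequence source and an int position) are assumptions; the point is only that the reference component is compared as a reference and the int component by value.

```java
// Hypothetical Cursor: a reference into some source plus a position.
final class Cursor {
    private final CharSequence source; // reference component
    private final int position;        // primitive component

    Cursor(CharSequence source, int position) {
        this.source = source;
        this.position = position;
    }

    // Componentwise equality: same source object, same position.
    // The source is compared by identity, not by its contents.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Cursor)) {
            return false;
        }
        Cursor other = (Cursor) o;
        return source == other.source && position == other.position;
    }

    @Override
    public int hashCode() {
        return 31 * System.identityHashCode(source) + position;
    }
}
```

A type whose components are value-like references (strings, dates, big decimals) would instead override equals to call equals on those components.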

Another place where references to mutable objects will show up in values is if we use values as a substrate for multiple return / tuples.  Here, the value is just an ad-hoc container for multiple values — and object references are entirely reasonable to use in this context.   
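A small sketch of the multiple-return case, again using an ordinary final class as a stand-in for a value. EvensAndOdds and MultiReturnDemo are invented names. The carrier holds references to two mutable lists, which is unremarkable because it is only a container for handing both results back at once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ad-hoc tuple: carries two results out of one method.
// Both components are references to mutable objects.
final class EvensAndOdds {
    final List<Integer> evens;
    final List<Integer> odds;

    EvensAndOdds(List<Integer> evens, List<Integer> odds) {
        this.evens = evens;
        this.odds = odds;
    }
}

final class MultiReturnDemo {

    // Returns two lists at once by wrapping them in the carrier.
    static EvensAndOdds split(int[] values) {
        List<Integer> evens = new ArrayList<>();
        List<Integer> odds = new ArrayList<>();
        for (int v : values) {
            (v % 2 == 0 ? evens : odds).add(v);
        }
        return new EvensAndOdds(evens, odds);
    }

    public static void main(String[] args) {
        EvensAndOdds result = split(new int[] {1, 2, 3, 4});
        System.out.println(result.evens + " " + result.odds);
    }
}
```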

> Gratuitous aside about language syntax even though it is not actually important right now: since we write "enum Foo" not "enum class Foo", I would be quite surprised if we used "value class" here, since between the two only enums are the ones that are real classes in every sense of the word.

Sure, this is one of our tools for helping frame the correct mental model.  If we decide that the terminology falls out as “classes are the entities that have fields, methods, and constructors”, then “value class” reinforces that.  But we could go other ways too.