Value types questions & comments

Tue Apr 12 20:07:59 UTC 2016

On Mon, Apr 11, 2016 at 4:13 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Thanks for pulling these together.  Some quick answers inline.  Don’t
> believe me — challenge the answers.
>
> My perspective on this: Since Java has had only two "kinds of types" for
> 20+ years -- and since the tension between those two is already a major
> source of confusion and bugs for intermediate programmers -- adding a third
> kind now is a *Very Big Deal*. It needs to be as simple as possible to
> understand this new kind, as nothing but a "natural" hybrid of the other
> two. The fewer asterisks we need to put on that simple model, the better.
>
> Total agreement.  We view values as generalizations of primitives, where
> primitives are “values with legacy baggage”, and hopefully as little
> baggage as possible.  So hopefully in the end we still have two things,
> references and values (with some values “more equal than others” for
> historical reasons.)  The biggest baggage probably surrounds the
> bespoke box types that we’re likely stuck with.
>

Ok, there is one difference in how we are conceptualizing this, and the
difference does concern me. I'm convinced that ending up at a place where
it still feels like there are only two kinds of types is not attainable,
and we should not be under the impression that that's what we're going to
accomplish here. There WILL be three kinds of types. Developers will have
to learn all three. Value types will *not* be just generalized primitives;
however, as we both agree, we want to make the two as similar as we can.

Just the fact that they are named, user-defined aggregate types is enough
to make them different from primitives. The fact that they box differently
is of even greater concern, since a large share of the existing confusion
between the two kinds we already have centers on boxing.

> But I think it resembles a primitive in pretty much every other way?
>
>    - No identity (so mutability isn't even a question)
>    - Can't be null
>    - Can't have subtype or supertype (exception: as above, value types can
>    implement interfaces)
>    - Does not extend Object, so synchronization/wait/notify not possible
>    - Not heap-allocated (locals on the stack, fields and array values
>    inlined)
>    - Can be boxed to an Object ... although *boxing works differently*
>
> So first off: have I got all that right?
>
>
> Yes.  And, some of these asterisks can be erased.  For example, I see no
> reason why `int` can’t implement Comparable or Serializable (though seeing
> 1.compareTo(2) might make some developers’ heads explode, so we might dial
> back on how much we close up this gap — TBD.)  As you say, the biggest
> asterisk is how we handle boxing.  The box types for values will be derived
> from the class file and have nice clean properties, whereas the box types
> for primitives will likely remain some sort of bespoke bag of smelly stuff.
>

I would assume we're not actually changing anything about primitive boxing,
here...?
FWIW, `int` implementing Comparable seems like it would neither help nor
harm the situation.
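For reference, a minimal sketch of how this works in Java today (nothing Valhalla-specific is assumed): `int` has no members, so `1.compareTo(2)` does not compile, and reaching `Comparable` means boxing to `Integer` first. The class and method names are illustrative only.

```java
public class IntCompareToday {
    // Comparing ints without boxing: a static helper on the box class.
    static int compareUnboxed(int a, int b) {
        return Integer.compare(a, b);
    }

    // Comparing through the Comparable interface: the int is boxed to Integer first.
    static int compareViaComparable(int a, int b) {
        Comparable<Integer> boxed = a;   // boxing conversion, then widening to the interface type
        return boxed.compareTo(b);       // b is boxed too, to match compareTo(Integer)
    }

    public static void main(String[] args) {
        // 1.compareTo(2);  // does not compile: int has no methods
        System.out.println(compareUnboxed(1, 2));        // -1
        System.out.println(compareViaComparable(1, 2));  // -1
    }
}
```

Both paths give the same answer; the only difference is whether a box is involved, which is why letting `int` implement `Comparable` would mostly be a matter of spelling.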

> We might even take this further, by actually describing `int` with a
> source file (public native class int implements Comparable { … }) which
> might try and smooth out some of the differences, but I wouldn’t hold out a
> lot of hope for this being super successful.  Mostly this is just moving
> the magic around, but it’s possible this will seem less overall magic to
> some.
>

Yeah, I also don't see that really helping; best to leave primitives
completely alone.

> Another asterisk: the semantics of operators are predefined on primitives,
> and not at all on values.  It’s possible we can close up this gap too, but
> I’ve been deliberately avoiding opening this Pandora’s Can Of Worms,
> strictly for scope-management reasons.  (But given that one motivating
> example for values is alternate numerics, calls for operator overloading
> won’t be far behind.)
>

Cool, keep holding out against that...
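To make the operator asterisk concrete: without operator overloading, an alternate numeric type has to expose named methods where primitives get `+` and `*` for free. A sketch, using a Java `record` as a stand-in for a value type (records are still identity classes today, so this approximates only the source-level shape); `Complex`, `plus`, and `times` are hypothetical names.

```java
public record Complex(double re, double im) {
    // Named methods stand in for the + and * operators that primitives have built in.
    public Complex plus(Complex o) {
        return new Complex(re + o.re, im + o.im);
    }

    public Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    public static void main(String[] args) {
        double x = 1.5 + 2.5;                 // primitive: operator semantics are predefined
        Complex a = new Complex(1, 2);
        Complex b = new Complex(3, -1);
        // Complex c = a + b;                 // does not compile: no user-defined operators
        Complex c = a.plus(b).times(a);       // the method-call spelling of (a + b) * a
        System.out.println(x + " " + c);
    }
}
```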

> Though, bottom line, I think users will be able to recognize that the
> primitives are special cases of this new value thingie.  They behave so
> similarly, they have all the same restrictions, and they can be used in all the
> same places.
>

My rewording of this is that "bottom line, we hope users will be able to
recognize that the new hybrid type is really very very similar to
primitives (and primitives continue to be what they always were), but
resemble reference types instead in a few obviously useful ways, and there
are really only two or three asterisks they'll have to watch out for."

> Conceptual question: is a user-defined value type a "class"? A "yes" and a
> "no" answer both seem defensible, and of course we have to choose one and
> defend it. And notably, whichever way we decide it, users are going to have
> to rethink their preconceived notions of what a "class" is no matter what.
> (This gets back to my statement that what we're doing here is a Very Big
> Deal. These are bedrock concepts we're tampering with.)
>
> Yes, there’s gonna be some adjustment of mental models required.
>  (Additionally, enhanced generics also put a lot of pressure on the deeply
> overloaded word “class”, since we will have multiple runtime
> parameterizations of a given generic “class”.)  A class is used to describe
> a source file, a binary file, a runtime type, something you load, a type
> mirror ….  Early in Java’s lifetime, these entities were in strict 1:1
> correspondence, but no more.
>
> We have classes at the source level — this will probably expand to include
> value types.  We have class files — this will probably similarly expand.  I
> don’t think these will be controversial.  But I think we need to call the
> runtime entities something else — like TYPE and TYPE MIRROR.  The meaning
> of “class” is already too overloaded.  Again, though, the game here is to
> frame the old reality as a lower-dimensional projection of the new reality,
> and this doesn’t seem impossible.
>

"Is a class from the source/bytecode perspective, isn't a class from the
runtime perspective" is worth shooting for, but it seems difficult to even
get it down to something that simple. I mean, at runtime this is still a
thing that gets loaded and initialized by a class loader, yes? I fear we
will never find a clean way to address this.

> On the one hand, classes are things that have fields and methods, so yes, a
> value type is a class. On the other hand, one expects classes to have
> "instances"/"objects", pointed to by references, which these don't. Also,
> you expect to be able to call getClass() and get something useful back
> (that knows what methods are present, what interfaces are implemented) and
> that doesn't seem possible in the general case here (but could maybe(?) be
> faked in cases where the static type of the value is known to the compiler).
>
> I think this one isn’t so bad.  Java has TYPES today, reference types and
> primitive types.  Instances of reference types are object references, and
> instances of primitive types are values.  So the notion of types whose
> members are not references is not new.
>

I think the fact that we are now talking about user-defined named types
with fields, methods, constructors, and implemented interfaces makes this
something very different.

> Now, we redefine “C implements I” as follows:
>  - C has all the methods that I has;
>  - Ref[C] is a subtype of I
>
>
Ah, I think this helps some. Maybe. So a layperson explanation is: When
writing a value type, you can declare interfaces, but you are actually
declaring which interfaces the *boxed* form of the value will implement,
not the value itself. But if all you do is call myValue.myInterfaceMethod(),
it will just skip boxing behind the scenes. Something like that?

> I will also box.  So boxing happens when you assign a value either to
> Object or to an interface.  Otherwise I can invoke the methods directly,
> not unlike how the compiler selects invokevirtual over invokeinterface when
> it has sharp enough static types.
>
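Today's primitives already follow the rule described here: a direct call needs no box, while assigning to `Object` or to an interface type triggers a boxing conversion. A small sketch in current Java; the class name `BoxingToday` is illustrative, and whether value types would reuse exactly this rule is the proposal under discussion, not something this code demonstrates.

```java
public class BoxingToday {
    public static void main(String[] args) {
        int n = 42;

        // Direct use: the static type is sharp enough, so no box is needed.
        int doubled = Integer.sum(n, n);

        // Assigning to Object boxes the int into an Integer.
        Object o = n;

        // Assigning to an interface type also boxes (Integer implements Comparable<Integer>).
        Comparable<Integer> c = n;

        System.out.println(doubled);                 // 84
        System.out.println(o.getClass().getName());  // java.lang.Integer
        System.out.println(c.compareTo(41));         // 1
    }
}
```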
> Re: "Large groups of component values should usually be modeled as plain
> classes", I'd VERY much like avoid putting that responsibility onto the
> user if at all possible. Is there a reason why the VM can't simply decide
> "this is past my threshold, so I'm gonna box it instead of putting it all
> on the stack" and not make the developer worry about it?
>
>
> The VM will definitely do this, based on some internal, machine-dependent
> threshold.  However, because the semantics of values is different from
> references, the user should still pick the right tool, or might face
> performance consequences.  If I have an XY point, and I want to vary the X
> component and not the Y, I might well be happy to do:
>
>     Point p = …
>     Point q = p.withX(0);
>
> With small values, this will all stay in registers, it’ll be fast,
> everything will be fine.  But if my values have been boxed onto the heap,
> then (unless I can prove non-aliasing) each “mutation” will have to allocate
> a new object for the new Point.  If I plan to do something “mutation happy”, I’m
> probably better off with a real object that supports mutation, even though
> I can model it as a value.
>

Okay, I think I'm good here. I am fine with developers in "mutation happy"
situations deciding to avoid value types for "bigger" things (yet give
themselves a pass for smaller things). For non-mutating cases, I'd hope that
a vague size threshold doesn't have to come into it, so if the VM will make
elective boxing decisions on its own, I think that's great.
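A rough sketch of the two styles in the `Point`/`withX` exchange, in today's Java. `Point` is a `record` standing in for a value type; with real value types the copy could stay in registers, whereas here each `withX` allocates a new object, which is roughly the heap-boxed cost described above. `MutablePoint` plays the "real object that supports mutation." The names `withX`, `setX`, and `MutablePoint` are hypothetical.

```java
public class PointStyles {
    // Value style: immutable; "changing" x produces a new Point.
    record Point(int x, int y) {
        Point withX(int newX) {
            return new Point(newX, y);
        }
    }

    // Mutation-happy style: an ordinary identity object updated in place.
    static final class MutablePoint {
        int x, y;

        MutablePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        void setX(int newX) {
            x = newX;
        }
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        Point q = p.withX(0);            // p is untouched; q is a fresh value
        System.out.println(p + " " + q);

        MutablePoint m = new MutablePoint(3, 4);
        for (int i = 0; i < 1_000; i++) {
            m.setX(i);                   // no new object per update
        }
        System.out.println(m.x + "," + m.y);
    }
}
```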

> Re: "Cloning a value type does not strictly make sense," well, *technically*
> when a value includes fields of Cloneable reference types, you might want a
> deep-clone of that. However I lean toward thinking this is too weird to
> bother supporting. Users should really be dissuaded from including
> references to *mutable* types in their value definition in the first
> place.
>
>
> Agree on clone() — best to let it rot, it’s halfway there already.  But
> here’s a value type I could imagine writing all the time:
>
>     value class Cursor<T> {
>         private T[] array;
>         private int offset;
>     }
>
> I can use a Cursor as a garbage-free Iterator.  But it refers into mutable
> objects (here, an array.)  (But note that the fields are private.)  I think
> the logical definition of equality here is “do they point at the same
> array, and at the same location.”
>

Using a value type for something that *isn't a value* raises alarm bells
for me. At a minimum I would expect this user to have to implement equals/hashCode
by hand, because the default behavior users want 99% of the time is (deep)
content-based equality.
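To make the `Cursor<T>` equality question concrete, a sketch in current Java with an ordinary final class standing in for the proposed `value class` (so every `advance` here still allocates, which a real value type would hope to avoid). `equals`/`hashCode` are written by hand to spell out "same array, same offset": the array is compared by reference, not by contents. The methods `hasCurrent`, `current`, and `advance` are hypothetical.

```java
import java.util.Arrays;

public final class Cursor<T> {
    private final T[] array;   // refers into a mutable array, as in the sketch above
    private final int offset;

    public Cursor(T[] array, int offset) {
        this.array = array;
        this.offset = offset;
    }

    boolean hasCurrent() { return offset < array.length; }

    T current() { return array[offset]; }

    // Advancing yields a new cursor; the original is unchanged.
    Cursor<T> advance() { return new Cursor<>(array, offset + 1); }

    // Equality: the *same* array (by reference) at the same position.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Cursor<?>)) return false;
        Cursor<?> other = (Cursor<?>) o;
        return (Object) array == other.array && offset == other.offset;
    }

    @Override
    public int hashCode() {
        return 31 * System.identityHashCode(array) + offset;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c"};
        for (Cursor<String> c = new Cursor<>(words, 0); c.hasCurrent(); c = c.advance()) {
            System.out.print(c.current());
        }
        System.out.println();

        String[] copy = Arrays.copyOf(words, words.length);
        System.out.println(new Cursor<>(words, 1).equals(new Cursor<>(words, 1))); // true
        System.out.println(new Cursor<>(words, 1).equals(new Cursor<>(copy, 1)));  // false: same contents, different array
    }
}
```

A Java `record` with the same components would generate essentially this equality, because arrays inherit reference-based `equals` from `Object`; anyone expecting content equality would have to override it and use `Arrays.equals`, which is the trap raised in the reply.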

Gratuitous aside about language syntax even though it is not actually
important right now: since we write "enum Foo" not "enum class Foo", I
would be quite surprised if we used "value class" here, since of the two,
only enums are real classes in every sense of the word.