Updated SoV, take 3

Brian Goetz brian.goetz at oracle.com
Thu Jul 28 18:24:21 UTC 2022


>
>     Java currently has eight built-in primitive types. Primitives
>     represent pure
>     _values_; any `int` value of "3" is equivalent to, and
>     indistinguishable from,
>     any other `int` value of "3".  Because primitives are "just their
>     bits" with no
>     ancillary state such as object identity, they are _freely
>     copyable_; whether
>     there is one copy of the `int` value "3", or millions, doesn't
>     matter to the
>     execution of the program.  With the exception of the unusual
>     treatment of exotic
>     floating point values such as `NaN`, the `==` operator on
>     primitives performs a
>     _substitutability test_ -- it asks "are these two values the same
>     value".
>
>
> I've said this before, but I think both "substitutability" and 
> "sameness" just lead to more questions, and I'm not sure why we don't 
> appeal to distinguishability instead.

Fair.  Substitutability is neither a commonly understood concept, nor is 
it an official term in the spec, so happy to change this to something 
more intuitive.  That said, I'm not sure why you're down on "sameness"?
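
A minimal sketch of what "indistinguishable" means for primitives in today's Java, including the `NaN` wrinkle mentioned in the quoted text. The class and method names are invented for illustration; nothing here is Valhalla-specific.

```java
// Illustration only: class and method names are hypothetical.
class PrimitiveSameness {
    // == on ints is true exactly when the two values are the same value.
    static boolean sameInt(int a, int b) {
        return a == b;
    }

    // == on doubles follows IEEE 754, so NaN is never == to itself.
    static boolean sameDoubleByOperator(double a, double b) {
        return a == b;
    }

    // Comparing raw bit patterns gives a strict "same bits" test instead.
    static boolean sameDoubleByBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println(sameInt(3, 3));                                // true
        System.out.println(sameDoubleByOperator(Double.NaN, Double.NaN)); // false
        System.out.println(sameDoubleByBits(Double.NaN, Double.NaN));     // true
        System.out.println(sameDoubleByOperator(0.0, -0.0));              // true
        System.out.println(sameDoubleByBits(0.0, -0.0));                  // false
    }
}
```

The last two lines show the other floating-point oddity: `0.0` and `-0.0` are `==` even though a program can tell them apart.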

>     Java also has _objects_, and each object has a unique _object
>     identity_.  This
>     means that each object must live in exactly one place (at any
>     given time), and
>     this has consequences for how the JVM lays out objects in memory. 
>     Objects in
>     Java are not manipulated or accessed directly, but instead through
>     _object
>     references_.  Object references are also a kind of value -- they
>     encode the
>     identity of the object to which they refer,
>
>
> Do we really want to invoke identity here? That surprises me. That 
> suggests that a `ValueClass.ref` instance will have identity too.
> Isn't it really only about the object being addressable or locatable 
> (some term like that)?

Will adjust; this is more of an implementation detail anyway.

>
>     This says that a `Point` is a class whose instances have no
>     identity.  As a
>     consequence, it must give up the things that depend on identity;
>     the class and
>     its fields are implicitly final.  Additionally, operations that
>     depended on
>     identity must either be adjusted (`==` on value objects compares
>     state, not
>     identity) or disallowed (it is illegal to lock on a value object.)
>
>
> Just for broad understandability, you might want to address here "but 
> then how could a reference 'identify' what object it's pointing to?"

Indeed, this is a tricky new concept; a reference to a thing that is not 
necessarily unique, but for which we can't distinguish between copies.

>
>     Value classes can still have most of the affordances of classes --
>     fields,
>     methods, constructors, type parameters, superclasses (with some
>     restrictions),
>     nested classes, class literals, interfaces, etc.  The classes they
>     can extend
>     are restricted: `Object` or abstract classes with no instance
>     fields, empty
>     no-arg constructor bodies, no other constructors, no instance
>     initializers, no
>     synchronized methods, and whose superclasses all meet this same set of
>     conditions.  (`Number` is an example of such an abstract class.)
>
>     Because `Point` has value semantics, `==` compares by state rather
>     than
>     identity.  This means that value objects, like primitives, are _freely
>     copyable_; we can explode them into their fields and re-aggregate
>     them into
>     another value object, and we cannot tell the difference.
>
>
> It feels like if this wants to rest some stuff on "comparing by state" 
> it ought to explain here what that means? Or, I guess at least a 
> forward reference.
> It seems pretty important to understand that it means shallow 
> fieldwise delegation back to `==` again, meaning that fields of 
> identity types are still identity-compared.
> In many contexts "value semantics" and "comparing by state" tend to 
> only make sense if done recursively/deeply.

It's worse than that, because references to value objects get a deeper 
comparison than refs to identity objects.  I'll stay away from 
shallow/deep, but talk about fieldwise equivalence.
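
A rough emulation, in today's Java, of the fieldwise comparison being discussed. `Holder` and `fieldwiseSame` are hypothetical names, and the sketch covers only the case Kevin describes (a field of identity type is still identity-compared). It does not model the deeper comparison that fields holding value objects would get.

```java
// Emulation in today's Java; Holder and fieldwiseSame are hypothetical names.
final class Holder {
    final int count;            // primitive field: compared by value
    final StringBuilder label;  // identity-object field: compared by reference

    Holder(int count, StringBuilder label) {
        this.count = count;
        this.label = label;
    }

    // Field-by-field comparison that applies == to each field in turn.
    static boolean fieldwiseSame(Holder a, Holder b) {
        return a.count == b.count && a.label == b.label;
    }

    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("x");
        Holder h1 = new Holder(1, shared);
        Holder h2 = new Holder(1, shared);
        Holder h3 = new Holder(1, new StringBuilder("x"));
        System.out.println(fieldwiseSame(h1, h2)); // true: same int, same referenced object
        System.out.println(fieldwiseSame(h1, h3)); // false: equal contents, different object
    }
}
```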

>
>
>     ### Migration
>
>     The JDK (as well as other libraries) has many [value-based
>     classes][valuebased]
>     such as `Optional` and `LocalDateTime`.  Value-based classes
>     adhere to the
>     semantic restrictions of value classes, but are still identity
>     classes -- even
>     though they don't want to be.  Value-based classes can be migrated
>     to true value
>     classes simply by redeclaring them as value classes, which is both
>     source- and
>     binary-compatible.
>
>
> This gave me a slight "huh, then what's the catch?" reaction. It might 
> make more sense by adding the fact right away that any errant usages 
> (that don't adhere to the VBC requirements) will start failing at 
> runtime, and might cause compilation warnings?

The catch is twofold:

  - Clients that depend on that accidental identity despite the warning 
signs are in for a surprise (hello, Integer);
  - The ref companion gets the good name, which will surely annoy people

The former should be viewed as an anti-catch, but not everyone will see 
it that way.  The latter will surely be spun as "why do you guys hate 
your users."  For which we'll tell them it was Kevin's idea.
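
To make the first catch concrete, here is a small sketch of client code that leans on the accidental identity of `Integer` boxes. The class and method names are made up. The only identity behavior relied on is the boxing cache for values in -128..127, which the language specification guarantees today; the result for larger values is deliberately left unstated because it depends on the implementation.

```java
// Illustration only: class and method names are hypothetical.
class AccidentalIdentity {
    // Robust: compares the wrapped values.
    static boolean sameByEquals(Integer a, Integer b) {
        return a.equals(b);
    }

    // Fragile: compares the identity of the two boxes.
    static boolean sameByIdentity(Integer a, Integer b) {
        return a == b;
    }

    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127;
        Integer big1 = 1000, big2 = 1000;

        // Boxing shares instances for values in -128..127.
        System.out.println(sameByIdentity(small1, small2)); // true
        System.out.println(sameByEquals(big1, big2));       // true

        // Implementation-dependent today, so no expected output is given.
        System.out.println(sameByIdentity(big1, big2));
    }
}
```

Code that only ever uses `equals` should see no difference after a migration; code that branches on the last line is the kind that is "in for a surprise".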

>
>     We plan to migrate many value-based classes in the JDK to value
>     classes.
>     Additionally, the primitive wrappers can be migrated to value
>     classes as well,
>     making the conversion between `int` and `Integer` cheaper; see
>     "Migrating the
>     legacy primitives" below.  (In some cases, this may be _behaviorally_
>     incompatible for code that synchronizes on the primitive
>     wrappers.  [JEP
>     390][jep390] has supported both compile-time and runtime warnings for
>     synchronizing on primitive wrappers since Java 16.)
>
>
> Putting this in parens under the topic of the primitive wrappers feels 
> like "pulling a fast one". Like it's pretending that this 
> incompatibility problem is somehow unique to those 8 classes, hoping 
> people won't notice "wait a minute, *any* class hopeful of future 
> migration would have the same desire to opt into such warnings in 
> advance." (And for more than just synchronization.) I get that there 
> is no current plan to solve that problem, but we could be more 
> up-front about that?

I think it is just these eight classes, since in Java 8, we wrote this 
into the definition of value-based class (but couldn't back-apply that 
definition to these eight.)  But I can drop the parens if that helps :)

>
>     Value classes are generalizations of primitives.  Since primitives
>     have a
>     reference companion type, value classes actually give rise to
>     _pairs_ of types:
>     a value type and a reference type.  We've seen the reference type
>     already; for
>     the value class `ArrayCursor`, the reference type is called
>     `ArrayCursor`, just
>     as with identity classes.  The full name for the reference type is
>     `ArrayCursor.ref`; `ArrayCursor` is just a convenient alias for
>     that.  (This
>     aliasing is what allows value-based classes to be compatibly
>     migrated to value
>     classes.)
>
>
> It's more than just that: it's what unifies all classes together! They 
> all define a reference type, always with the same name as the class. 
> That's nice, unchanging solid ground under our feet while all the 
> Valhalla shifts are going on.
>
> It would make more sense to me if `ArrayCursor.ref` were the alias to 
> `ArrayCursor`, and it would be appropriate for the reader to wonder 
> "why do we even need that alias?".

Yes, and the answer is "we almost don't", except for type variables 
(T.ref).

>     The value type is called `ArrayCursor.val`, and the two types have the
>     same conversions between them as primitives do today with their
>     boxes.  The
>     default value of the value type is the one for which all fields
>     take on their
>     default value; the default value of the reference type is, like
>     all reference
>     types, null.  We will refer to the value type of a value class as
>     the _value
>     companion type_.
>
>
> ... because it acts as a companion to the reference type you've always 
> known.
> (At least, *I* still really don't want people to think that both the 
> value type and the reference types are "companions" to the class that 
> defined them.)

I am thinking they are companions to each other; we can be more explicit 
about this.
>
>
>     Both the reference and value companion types have the same members.
>
>
> Maybe worth acknowledging "(even those, like `wait()` inherited from 
> `Object`, that don't make sense and will fail at runtime, for 
> simplicity's sake)".

It is not clear how pedantic to be here.  Do they have the same members, 
or are the members all on the ref type, and we just provide a convenient 
syntax / fast implementations for vals as receivers? The latter is 
closer to reality, but does that explanation help?

>
> I think it is worth acknowledging that this does lead to 
> `5.toString()` becoming valid and functioning code, which happens just 
> for consistency and not because it was a goal in itself.

OK.  Another good thing that happens here is that we can write equals() 
methods uniformly:

     return o instanceof Foo f &&
         i.equals(f.i) && name.equals(f.name);

and not have to worry about "is this a ref or a primitive".  Just use 
equals everywhere.
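
For contrast, this is roughly what the same method has to look like in Java as it stands, where the author must know which fields are primitives and which are references. The field names `i` and `name` follow the snippet above; the rest of the class (constructor, null check, `hashCode`) is filled in as an assumption so that the example compiles on its own.

```java
import java.util.Objects;

// Sketch of the non-uniform style required today; details beyond the
// fields i and name are assumptions made for a self-contained example.
final class Foo {
    private final int i;
    private final String name;

    Foo(int i, String name) {
        this.i = i;
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Foo f
            && i == f.i               // primitive field: must use ==
            && name.equals(f.name);   // reference field: must use equals
    }

    @Override
    public int hashCode() {
        return Objects.hash(i, name);
    }

    public static void main(String[] args) {
        System.out.println(new Foo(1, "a").equals(new Foo(1, "a"))); // true
        System.out.println(new Foo(1, "a").equals(new Foo(2, "a"))); // false
    }
}
```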

>
>
>     Arrays of reference types are _covariant_; this means that if `A
>     <: B`, then
>     `A[] <: B[]`.  This allows `Object[]` to be the "top array type"
>     -- but only for
>     arrays of references.  Arrays of primitives are currently left out
>     of this
>     story.   We unify the treatment of arrays by defining array
>     covariance over the
>     new "extends" relationship; if A _extends_ B, then `A[] <: B[]`. 
>     This means
>     that for a value class P, `P.val[] <: P.ref[] <: Object[]`; when
>     we migrate the
>     primitive types to be value classes, then `Object[]` is finally
>     the top type for
>     all arrays.  (When the built-in primitives are migrated to value
>     classes, this
>     means `int[] <: Integer[] <: Object[]` too.)
>
>
> I think it's worth addressing that this does mean there will be 
> `Integer[]` and `Object[]` instances that can't store null, failing at 
> runtime, but that this is consistent with the existing quirks of array 
> covariance.

Yep, same ArrayStoreException (ASE).
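
The existing quirk being referred to can be shown in current Java. This sketch (hypothetical class and method names) stores into an `Integer[]` through an `Object[]` reference; the null-into-a-flattened-array case discussed above cannot be demonstrated today and is not modeled here.

```java
// Illustration only: class and method names are hypothetical.
class CovarianceDemo {
    // Returns true if storing `value` into slot 0 of `array` is rejected at runtime.
    static boolean storeFails(Object[] array, Object value) {
        try {
            array[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] objects = new Integer[1];   // legal because Integer[] <: Object[]
        System.out.println(storeFails(objects, 42));               // false
        System.out.println(storeFails(objects, "not an Integer")); // true
    }
}
```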

>
>
>     The base implementation of `Object::equals` delegates to `==`,
>     which is a
>     suitable default for both reference and value classes.
>
>
> This is where you could appeal to the idea that `==` has always meant 
> "strictly indistinguishable by any means" and this preserves that 
> meaning (modulo float/double weirdness).

Yep
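
As a reminder of the baseline, a class that does not override `equals` inherits the `Object` implementation, which behaves like `==` on references. The names below are hypothetical and the example uses an ordinary identity class.

```java
// Illustration only: class names are hypothetical.
class DefaultEquals {
    // An ordinary identity class that does not override equals.
    static final class Token {
        final int id;

        Token(int id) {
            this.id = id;
        }
    }

    public static void main(String[] args) {
        Token a = new Token(1);
        Token b = new Token(1);
        System.out.println(a.equals(a)); // true: same object
        System.out.println(a.equals(b)); // false: same state, different identity
    }
}
```

If `==` on value objects compares state, as the quoted draft proposes, the same inherited `equals` would presumably return true for two value objects with identical fields.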

>
>     ### Serialization
>
>     If a value class implements `Serializable`, this is also really a
>     statement
>     about the reference type.  Just as with other aspects described here,
>     serialization of value companions can be defined by converting to the
>     corresponding reference type and serializing that, and reversing
>     the process at
>     deserialization time.
>
>
> It's nonobvious to me why the reference type is being elevated as the 
> primary one here, except that of course a method like `writeObject` is 
> only going to be fed the reference type. I would have expected just 
> that serializability applies equally to both types in the same way, 
> much like invoking some method on both types.

It's a lot like members; we can define them to be the same on both, or 
we can define them to live on the ref.  A lot of things are simpler with 
the latter, but it's not clear readers of this doc need to understand all 
that.
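
The existing primitives already behave this way with respect to `writeObject`, which may serve as an analogy. The sketch below (hypothetical names) serializes an `int` by converting to `Integer`, writing the reference, and reversing the steps on the way back. It illustrates the "convert to the reference type" idea only and is not a description of how value companions will be serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustration only: class and method names are hypothetical.
class BoxedSerialization {
    // Serializes an int via its reference type and reads it back.
    static int roundTrip(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                Integer boxed = value;   // convert to the reference type
                out.writeObject(boxed);  // serialize the reference
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Integer boxed = (Integer) in.readObject();
                return boxed;            // convert back to the primitive
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // 42
    }
}
```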

>
>     The built-in primitives reflect the design assumption that zero is
>     a reasonable
>     default.  The choice to use a zero default for uninitialized
>     variables was one
>     of the central tradeoffs in the design of the built-in
>     primitives.  It gives us
>     a usable initial value (most of the time), and requires less
>     storage footprint
>     than a representation that supports null (`int` uses all 2^32 of
>     its bit
>     patterns, so a nullable `int` would have to either make some 32
>     bit signed
>     integers unrepresentable, or use a 33rd bit).  This was a
>     reasonable tradeoff
>     for the built-in primitives, and is also a reasonable tradeoff for
>     many other
>     potential value classes (such as complex numbers, 2D points,
>     half-floats, etc).
>
>
> You might not want to go into the following. But I hope that users 
> will understand that the numeric types really do clear a pretty high 
> bar here. They are fortunate that for the *two* most popular reduction 
> operations over those types, zero happens to be the correct identity 
> for one of them, and absolutely destructive to the other (i.e., making 
> it at least easy to detect the bug). If not for *both* of those facts 
> we would have more and worse bugs in the world.

Yeah, it's not obvious how much algebra is helpful here.  I mostly want 
to make the point that zero wasn't chosen at random; it's the default you 
actually want, and if you got null, you probably wouldn't like it as 
much.  Agree about the high bar; Jan 1 1970 doesn't clear that bar.
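
A small sketch of the point about reductions and about the epoch, using only today's Java. The names are invented. A forgotten array element silently takes the zero default, which leaves a sum unchanged and wipes out a product, while day zero for a date is just an arbitrary-looking day in 1970.

```java
import java.time.LocalDate;

// Illustration only: class and method names are hypothetical.
class ZeroDefaults {
    static int sum(int[] xs) {
        int acc = 0;
        for (int x : xs) {
            acc += x;
        }
        return acc;
    }

    static int product(int[] xs) {
        int acc = 1;
        for (int x : xs) {
            acc *= x;
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] readings = new int[3];  // every element starts at the default, 0
        readings[0] = 5;
        readings[1] = 7;              // readings[2] was never assigned

        System.out.println(sum(readings));          // 12: zero is the identity for +
        System.out.println(product(readings));      // 0: zero destroys the product
        System.out.println(LocalDate.ofEpochDay(0)); // 1970-01-01
    }
}
```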

>
>     But for other potential value classes, such as `LocalDate`, there
>     simply _is_ no
>     reasonable default.  If we choose to represent a date as the
>     number of days
>     since some epoch, there will invariably be bugs that stem from
>     uninitialized dates; we've all been mistakenly told by computers
>     that something
>     that never happened actually happened on or near 1 January 1970. 
>     Even if we
>     could choose a default other than the zero representation as a
>     default, an
>     uninitialized date is still likely to be an error -- there simply
>     is no good
>     default date value.
>
>     For this reason, value classes have the choice of _encapsulating_
>     their value
>     companion type.  If the class is willing to tolerate an
>     uninitialized (zero)
>     value, it can freely share its `.val` companion with the world; if
>     uninitialized
>     values are dangerous (such as for `LocalDate`), the value
>     companion can be
>     encapsulated to the class or package, and clients can use the
>     reference
>     companion.  Encapsulation is accomplished using ordinary access
>     control.  By
>     default, the value companion is `private` to the value class (it
>     need not be
>     declared explicitly); a class that wishes to share its value
>     companion more
>     broadly can do so by declaring it explicitly:
>
>     ```
>     public value record Complex(double real, double imag) {
>         public value companion Complex.val;
>     }
>     ```
>
>
> I think you should add that the name `Complex.val` can't be changed 
> here, much like you can't change the name of a constructor even though 
> it *looks* like you could.

I keep hoping that we'll come up with a brilliant replacement for X.val 
before that....
