it's a value! it's a reference!, was Substitutability, was Re: Finding the spirit of L-World

John Rose john.r.rose at oracle.com
Sat Feb 23 23:28:39 UTC 2019


On Feb 23, 2019, at 5:43 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> But, for whatever reason, people have a hard time with this notion. Every time I've said "references are values too", people look at me like I have two heads.  (Think of the old trick exam question, "Does Java pass objects by reference or by value", and the accepted answer is "neither, it passes object references by value."  The SO page for this question has like a zillion upvotes.)  So while clearly true, it's a harder sell.

And, to complete the cluster of puzzlers, in the JVM
specification all values are manipulated by reference.
So references are values too, which are references too.

Remember that C has x.y and px->y syntaxes, while
Java just has x.y.  (Remember how remarkable it seemed,
the first time you saw Java, that there was no px->y?)
This works because Java systematically hides (makes
unobservable) the same distinction that C makes
between x and px.  Valhalla builds on top of this by
hiding (making unobservable) additional semantic
differences between references and values, notably
object identity.

I think it is pedagogically difficult, in this setting,
to try to explain Java objects (new and old) by citing
fine distinctions between values and references.

(Are Java objects always passed by reference?  Yes,
except when they are scalarized under the covers.
OK, are Valhalla values always passed by value?
Well yes, except when the VM chooses to add an
indirection, which it probably *always* does in
the interpreter.  Well, at least primitives are always
passed by value, right?  Yes, except when they
are boxed, and unless that box is scalarized.
And then there's the story of what's "on the stack".
What starts as an attempt to teach semantics
turns into a seminar on implementation tactics.
There are some very good .NET blogs on this
phenomenon.)
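
For readers who want the observable semantics without the
implementation seminar, here is a minimal sketch of the
"object references are passed by value" behavior.  The class
and method names (PassingDemo, reassign, mutate, bump) are
made up for illustration; nothing here depends on how the VM
chooses to pass anything under the covers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class PassingDemo {
    // Reassigning the parameter rebinds only the callee's own copy
    // of the reference; the caller's variable is untouched.
    static void reassign(List<String> list) {
        list = new ArrayList<>();
        list.add("callee-only");
    }

    // Mutating through the parameter changes the single object that
    // both the caller's and the callee's variables refer to.
    static void mutate(List<String> list) {
        list.add("visible-to-caller");
    }

    // A primitive argument is likewise copied into the callee.
    static void bump(int n) {
        n = n + 1;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        reassign(names);
        System.out.println(names.size()); // prints 0
        mutate(names);
        System.out.println(names);        // prints [visible-to-caller]

        int k = 41;
        bump(k);
        System.out.println(k);            // prints 41
    }
}
```

Whether the reference travels in a register, on a stack, or
not at all (because the callee was inlined) is invisible from
inside the language, which is the point of the paragraph above.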

In the end we'll want to recycle some terms,
gently used by prior owners.  And I suppose
those terms will include "value", "reference",
"object", and maybe "instance".  We'll have to
insist that, in describing Java and the JVM,
those terms mean what we mean them to mean
(h/t Lewis Carroll) and help people see that
not all of their intuitions about them, derived
from previous encounters, are necessarily
valid.

So in L-world, apart from primitives (which are
just biding their time), everything is a class,
everything is an instance of a class, everything
is a reference, everything is an object,
everything is a value (except the object part
of an object, unless it's just a value).  Sigh.

One tactic that will help, I think, is to use
pairs of words (or other short phrases) to
add precision.  Based on some terms I've
heard Brian prefer, the terms "value object"
and "reference object" might pass muster
as intuitive-yet-precise descriptions of
the new distinction we are trying to make.

The bikeshed I would prefer is "value instance"
vs. "object instance", but that runs afoul of
the proposition of L-world that everything
is a java.lang.Object.  And surely that should
be a major constraint on our choice of terms.

More correct might be "value-class" for value
types and "value-class instance" for their
instantiated values.  But what's the yang to
that yin?  "reference-class"?  "object-class"?
Both of those are polluted by the "everything
is an X" proposals we have.  Something historical
like "classic-class"?  Technical like "stateful-class"
or "identifiable class"?  ("Identity-laden class"?)

The best proposal I know that puts "object" at
the neutral point ("everything is an object") forces
"reference" into the role of distinguisher (despite
the fact that the JVMS puts it into the neutral point).

So we might get the pleasing symmetry of:

   class (something with fields, methods, and supers)
   reference* class  (classic, identity-testable, stateful, with natural by-reference impl.)
   value* class  (newfangled, identity-free, pure, with natural by-value impl.)

   object (anything instantiated from a class, incl. int some day: just like C)
   reference* object (instance of a reference class; you can test its identity via ref==)
   value* object (instance of a value class; you can only test its value)

The starred terms are adjectives which modify the following nouns.
The starred terms are also candidates for replacement; get out your
scrabble dictionaries!
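
As a rough illustration of the reference*/value* split, here
is a sketch that runs on a standard JDK (16 or later, for
records).  It is an approximation, not Valhalla code: a record
is only a stand-in for a value* class, because in today's Java
a record instance still has identity.  The names IdentityDemo,
Counter and Point are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class IdentityDemo {
    // A classic reference* class: mutable, and with no equals()
    // override, so the only comparison it offers is identity.
    static final class Counter {
        int count;
        Counter(int count) { this.count = count; }
    }

    // Stand-in for a value* class: immutable and compared by state.
    // Assumption: a real value class would also make identity
    // unobservable; a record on a current JDK does not.
    record Point(int x, int y) {}

    static boolean sameIdentity(Object a, Object b) { return a == b; }

    static boolean sameState(Object a, Object b) { return a.equals(b); }

    public static void main(String[] args) {
        Counter a = new Counter(1);
        Counter b = new Counter(1);
        System.out.println(sameIdentity(a, b)); // prints false
        System.out.println(sameState(a, b));    // prints false (Object.equals is identity)

        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(sameState(p, q));    // prints true (records compare components)
        System.out.println(sameIdentity(p, q)); // prints false on a standard JDK today
    }
}
```

The last line is where the stand-in breaks down: for a true
value* object there would be no identity for ref== to observe,
so only the state comparison would remain meaningful.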

To pull this off we would have to teach ourselves to give the term
"object" connotations from C rather than from Smalltalk.  Students
who say "is it an object or a value?" would have to be gently corrected,
probably by answering "both" and saying that (apart from primitives,
for now) everything is an object *and* a value.  Then asking them if
they mean "is it a reference* object or a value* object"?

A simple way to characterize this terminology would be to
insist that "value*" and "reference*" are mainly adjectives,
not nouns.  The nouns we choose to use are "object" and
"class".  This saves those fundamental terms for ceremonial
use at the top of the Java ecosystem.

Also I like the term "instance" because (I think) everybody
knows that an "instance" always comes from a "class".
So if we want to keep "object" as a neutral noun (as in
"everything is an object"), but we don't want to smuggle
in Smalltalk connotations by accident (or is that just me?),
we can talk all day of "instances":  reference instances,
value instances, unconstrained arbitrary instances (as
found in generics and dynamically typed code).

We also need to talk of types, and in particular of abstract types
(in generics and dynamically typed code) which cover both reference*
objects and value* objects.  Something like:

   type (something that describes what you can put in a variable)
   reference* type (something that refers only to reference objects?)
   value* type (something that refers only to value objects?)
   interface* type (a both-and type like List or, yes, Object)

(The term "interface* type" is a blatant bikeshed.  Could be
"object type", "general type", "generic type", "polymorphic type",
etc.  Even the terms reference* and value* here are questionable,
since the JLS clearly says a "reference type" is any type but a
primitive type, so we are bitten again by "values are references
too" here.)
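
To make the "both-and" type concrete, here is a small sketch
in which one interface type covers an identity-laden class and
a value-like one.  Again the record is only a stand-in for a
value* class on a standard JDK (16+), and the names TypesDemo,
Shape, ResizableBox and Square are hypothetical.

```java
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class TypesDemo {
    // The "interface* type": a variable of this type may hold either kind.
    interface Shape {
        int area();
    }

    // Reference*-style implementation: mutable state, identity matters.
    static final class ResizableBox implements Shape {
        int width;
        int height;
        ResizableBox(int width, int height) {
            this.width = width;
            this.height = height;
        }
        public int area() { return width * height; }
    }

    // Value*-style implementation (record used as a stand-in).
    record Square(int side) implements Shape {
        public int area() { return side * side; }
    }

    // Code written against the abstract type does not care which kind it gets.
    static int totalArea(List<? extends Shape> shapes) {
        int sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(new ResizableBox(2, 3), new Square(4));
        System.out.println(totalArea(shapes)); // prints 22
    }
}
```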

In fact, the story of named bindings (variables as in JLS 4.12)
complicates my little story here, as you can see from looking at
the starred terms in this text from the JLS:

> A variable is a storage location and has an associated type*
> that is either a primitive type* (§4.2) or a reference* type…
> A variable's value* is changed by an assignment…
> Compatibility of the value* of a variable with its type…

So we also need to give ourselves permission to continue
to say things like "the value of x is y" or "x refers to y"
or "x is a reference to the same object as y".  Surely
some of those usages can be demoted to the status
of informal traditional talk, to be clarified as necessary
in formal discourse.

It seems we cannot jettison the term "value" from the
phrase "the value of the variable x", but perhaps we can
save the adjective "value*" from turning into a noun of
rampant ambiguity by saying "the value of x" is shorthand
for something more precise like "the object assigned to x",
or "the assigned object of x" or the like.  Then if needed
we can say things like "the assignment of x is a value* object"
or "x is assigned to a reference* object".  We want to avoid
hopeless muddles like "the value of x is a reference" or
"x refers to a value".

One way to do all of this is give up on "value*" (it's overloaded
too much) and find a new term.  We can certainly try, and the
reason I've added all the stars after value* and reference*
above is to encourage folks to have at it.  But we are not
waiting for someone to hit on the correct bikeshed color,
so that we can all have a warm flash of instant
agreement.  Rather, we need to make sure that our final
resulting terminology fits in the major specifications of our
ecosystem, starting with java.lang.Object and the rest of
the JDK, and moving to the JLS and even the JVMS.  That's
hard work, obviously.  The starred items above can help,
maybe, as a sort of Mad Libs form to fill in your own favorite
terms.

HTH
— John


More information about the valhalla-dev mailing list