object sameness, Leibniz's Law, substitutability, indistinguishability

Tue Aug 2 21:48:50 UTC 2022

On 27 Jul 2022, at 13:22, Kevin Bourrillion wrote:

>  I've said this before, but I think both "substitutability" and 
> "sameness" just lead to more questions, and I'm not sure why we don't 
> appeal to distinguishability instead.

The short answer is there is no one-magic-word-no-more-questions 
explanation for what we need to convey.  More words will surely help, 
but at some point you have to stop talking.

Except on Email threads.  So, here’s a lot more talk from me on the 
subject!

There are a number of words we could use here.  “Substitutability” 
is (AFAICR) a bow to Liskov, who uses it to talk about 
correctness-preserving type changes at compile time, specifically 
substituting an argument type S for for a parameter type C where S<:C, 
when binding a method argument.  From that POV we are bending it (just a 
little) to talk about result-preserving reference changes at runtime.  
But this term also occurs (along with the related term 
“substitutivity”) in logic, law, and economics, with related 
meanings.

In the context of software (and sometimes math and logic), “x is 
substitutable for y” means “x can replace y”, and in an absolute 
sense it captures the intuition that “x is just as good as y (for any 
purpose we are now contemplating)”.

We could certainly pick other words out of the dictionary, and/or negate 
the proposed predicate (defining x!=y instead of x==y) to allow other 
related concepts like “[in-]distinguishability”, or “sameness”, 
or “indiscernability”, or perhaps simply “unconditional 
equality”.  But choosing another word will probably not refine our 
understanding of what’s going on; it will just shuffle around the 
connotations and emphases, making some folks happier and other folks 
more confused.  Conversely, given a clearer shared understanding, we can 
then proceed to choose any of a range of reasonable words to refer to 
that understanding, including either or both of “substitutable” and 
“indistinguishable”, or some other word.

But I’d like to back up first.  Our problem with object equality, as I 
understand it, begins with the word “same”.  That little word 
“same” is hardwired in the JLS and JVMS to talk about identity of 
many kinds of things: tokens, source locations, classes, interfaces, 
types, values, exceptions, packages, and more — in both static and 
dynamic contexts (as appropriate).  The word shows up several hundred 
times, often with strongly normative force.  That is, the specification 
would be seriously broken without each such statement of “sameness”.

Occasionally sameness is defined explicitly, as in JLS 4.3.4, “When 
Reference Types Are the Same”.  But often the meaning (including 
normative meanings) is left to the reader’s common sense, as in the 
many places the spec talks about “same name” and “same 
descriptor” (both referring to character sequences).

<https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.3.4>

But for object references the meaning of “same” is left to the 
reader in JLS 15.21.3:

> At run time, the result of == is true if the operand values are both 
> null or both refer to the same object or array; otherwise, the result 
> is false.

<https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.21.3-200>

It seems to me that, in order to sharpen up the meaning of that use of 
“same” (as well as others), a very good tool is something called 
“Leibniz’s Law”, which simply says that two things are the same if 
and only if there is no predicate that can distinguish them.  That is, 
x=y ⇔ ∀F(Fx ⇔ Fy).

<https://plato.stanford.edu/entries/identity-indiscernible/>

The application to software is immediate, since (in a software system) 
any predicate can be represented as a procedure which accepts an input 
and returns a boolean.  (Obviously the procedure cannot just look at a 
random number or changing global variable, or throw something; it has to 
be immunized somehow against indeterminacy, lest our conceptual test 
harness fail to produce the kind of result we want.)  And the law’s 
force is preserved if formulated in terms of many kinds of functions, 
rather than predicates:  x=y ⇔ ∀f(fx = fy), as long as the second 
equality check is somehow more primitive; we Java folks could say it 
must return an identity class or primitive.

So, for us, Leibniz’s Law defines that two software objects (or 
values) are “the same” (identical in a purely logical sense) if and 
only if no reasonable software function can tell them apart, by 
returning different values for each of the two objects.

 From there, it is a short step (a) to recognize that pre-Valhalla 
reference equality satisfies Leibniz’s Law, and thus (b) to recognize 
that this state of affairs can (and should) be preserved by choosing a 
design much like we have today.

The law breaks into two parts, each part being its own separate 
principle:

x=y ⇒ ∀f(fx = fy)

(If x equals y then applying any reasonable f to x and y gives the same 
answer.)

This means that if two object references are evaluated (by the JLS and 
JVMS) as referring to “the same object”, there must be no subsequent 
computation which somehow extracts two different results from the two 
references.  Putting on our engineer’s hat, that means that if `acmp` 
reports equality, the two inputs must have the same class (f = 
`Object::getClass`) and pairwise “same” fields (f = various 
`getfield` operations).

Specifically, the JVM is not allowed to get away with comparing some 
pointers and returning “false” if the pointers refer to structurally 
“same” objects which happen to be duplicated in the heap.  It must 
do the work of chasing the pointers.

(This principle breaks when applied to `==` on floating point numbers, 
since +0.0 = -0.0 but f=`doubleToLongBits` returns distinct values.)

x≠y ⇒ ∃f(fx ≠ fy)

(If x doesn’t equal y then you can find a reasonable f to apply to x 
and y, which gives different answers.)

This converse means that if two object references are evaluated as *not* 
referring to the same object, then there must exist a computation which 
can derive distinct values from them.

Specifically, we don’t just hardwire the `==` operator, much less the 
“sameness” condition, to a user-defined `Foo::equal` method.  Doing 
this would require an infinite regress in the body of the method, if 
(and when) it tries to ask if its operands are “the same” using 
`==`.

(This principle breaks when applied to `!=` on floating point numbers, 
since 0/0.=0/0. but there is no function which can distinguish 0/0. from 
itself.)

Moving back to the problem at hand:  Valhalla forces the specs to define 
`acmp` and `==` in terms of same-class-and-fields for value objects.  It 
also (IMO) raises the question of whether we should try to tighten the 
existing spec for identity objects (since there’s now a clear 
contrast).  I think it works to appeal to an object’s “identity”, 
as a property of the object assigned at the first instant of the 
execution of `new` (the operator or bytecode), and differently from 
(“not the same as”) any other “identity” from any other 
execution.

The specs do not *need* to refer to Leibniz’s Law, nor even appeal to 
concepts of “substitutability” or “indistinguishability”, in 
order to be operationally complete and sound.  Email archives will hold 
the present discussion basically forever, in case someone wants to 
inquire why it’s the way it is.

But, the JVMS and JLS *should* (IMO) include some non-normative prose 
that encourages the reader to think about intuitions like “once we 
know x==y, then x is as good as y everywhere”, and also “if x==y in 
one place, it will be impossible to tell them apart somewhere else”.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/valhalla-spec-observers/attachments/20220802/7bea1298/attachment-0001.htm>