Finding the spirit of L-World
John Rose
john.r.rose at oracle.com
Sat Feb 23 04:23:35 UTC 2019
On Jan 23, 2019, at 9:51 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>> Because values have no identity, in LW1 `System::identityHashCode`
>> throws `UnsupportedOperationException`. However, this is
>> unnecessarily harsh; for values, `identityHashCode` could simply
>> return `hashCode`. This would enable classes like `IdentityHashMap`
>> (used by serialization frameworks) to accept values without
>> modification, with reasonable semantics -- two objects would be deemed
>> the same if they are `==`. (For serialization, this means that equal
>> values would be interned in the stream, which is probably what is
>> wanted.)
>>
>> By returning `hashCode`, do you mean calling a user-defined hashCode function? Would the VM enforce that all values must implement `hashCode()`? Is the intention they are stored (growing the size of the flattened values) or would calling the hashCode() method each time be sufficient?
>
> I would prefer to call the "built-in" value hashCode, the one that is deterministically derived from state. That way, we preserve the invariant that == values have equal identity hash codes.
Just as op== (acmp) is a built-in and equals is the user-coded variation on it,
System.identityHashCode is a built-in and hashCode is the user-coded variation.
In both cases, the default implementation of the latter is the former.
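The defaults mentioned above can be observed directly in today's Java. In this minimal sketch, which relies only on `java.lang`, a class that overrides neither method gets an `equals` that answers exactly like `==`. It also gets a `hashCode` equal to `System.identityHashCode`, which the `identityHashCode` Javadoc guarantees. The class names `Plain` and `DefaultsDemo` are made up for illustration.

```java
// Hypothetical class that overrides neither equals nor hashCode.
final class Plain {}

final class DefaultsDemo {
    // Object.equals defaults to reference comparison (op==, i.e. acmp).
    static boolean equalsIsAcmp(Object a, Object b) {
        return a.equals(b) == (a == b);
    }

    // Object.hashCode defaults to the identity hash code.
    static boolean hashIsIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Plain p = new Plain(), q = new Plain();
        System.out.println(equalsIsAcmp(p, p));     // true: both sides are true
        System.out.println(equalsIsAcmp(p, q));     // true: both sides are false
        System.out.println(hashIsIdentityHash(p));  // true
    }
}
```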
When we get to values, generics and dynamically typed code lead us
to consider retrofitting op== (acmp) and System.iHC from references to
values as well, which is what we are talking about.
And, at the very least, the existence of such code forces us to define
two *new* operations which extend op== (acmp) and System.iHC.
The first is a (total) substitutability test. The second is a total hash
algorithm that computes a hash code compatible with the first.
That second is *not* the same as any coder's override of Object.hashCode.
It is an intrinsic function, just like op== (acmp).
I like Remi's names for them: structuralEquals, structuralHashCode,
and also my own isSubstitutable, substitutabilityHashCode.
(Problem with "structuralEquals": When applied to references, it does
*not* look at structure. Bummer. The word "structural" works
for values and not for references. I guess I'm back to substitutability.)
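One way to picture the first of these operations is a reflective model, sketched here under loud assumptions. Java 16+ records stand in for value types, even though real records still have identity. Floating-point fields are compared by raw bits, which is a choice made for this sketch and is not necessarily what Valhalla specifies. `Subst`, `isSubstitutable`, `Point` and `Sample` are hypothetical names. Identity objects are substitutable only if they are the same reference; "values" must have the same exact type and substitutable components.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.RecordComponent;

// Hypothetical sketch: records stand in for Valhalla value types.
// Real records still have identity, so this models the idea; it is not acmp.
final class Subst {
    static boolean isSubstitutable(Object a, Object b) {
        if (a == b) return true;                 // same reference, or both null
        if (a == null || b == null) return false;
        Class<?> c = a.getClass();
        if (c != b.getClass()) return false;     // the exact type must match
        if (!c.isRecord()) return false;         // identity objects: only == counts
        for (RecordComponent rc : c.getRecordComponents()) {
            Object x = read(rc, a), y = read(rc, b);
            Class<?> t = rc.getType();
            boolean same;
            if (t == double.class) {
                // Assumption: compare raw bits, so NaN matches itself and 0.0 differs from -0.0.
                same = Double.doubleToRawLongBits((Double) x)
                        == Double.doubleToRawLongBits((Double) y);
            } else if (t == float.class) {
                same = Float.floatToRawIntBits((Float) x)
                        == Float.floatToRawIntBits((Float) y);
            } else if (t.isPrimitive()) {
                same = x.equals(y);              // boxed int, long, char, ...: by value
            } else {
                same = isSubstitutable(x, y);    // recurse into reference components
            }
            if (!same) return false;
        }
        return true;
    }

    // Reads one record component through its public accessor method.
    static Object read(RecordComponent rc, Object o) {
        try {
            return rc.getAccessor().invoke(o);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical example "value" types.
record Point(int x, int y) {}
record Sample(double v, Object tag) {}
```

The floating-point branch is what makes the test total. If the sketch used `==` there, a record holding NaN would not be substitutable for itself.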
Should op== (acmp) be bound to the first and System.identityHashCode
bound to the second? OK, if we do that then we don't need to find
new names for them. But even then, maybe we *want* the new names
so that programmers can advertise their intentions more clearly. We
can then deprecate System.identityHashCode, and/or add lint-style
warnings to op== on certain cases (as with String today), or whatever,
and let users refactor those warnings away by using the newer names.
I think where we will end up is with making op== the same as
isSubstitutable, but we will still want isSubstitutable for certain
coding tasks, such as code which works with floats and doubles
and wants to side-step the NaN behavior of op==. Today's
workaround for that boxes the values and calls Object.equals.
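The NaN behavior and the boxing workaround can both be seen in plain Java today. `==` on `double` reports that NaN is not equal to itself and that `0.0` equals `-0.0`. `Double.equals`, which compares `doubleToLongBits` values, gives the opposite answer in both cases. The helper `sameDouble` is a hypothetical name for the same comparison done without boxing.

```java
final class NanDemo {
    // Hypothetical helper: the Double.equals comparison without boxing.
    static boolean sameDouble(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(nan == nan);                        // false
        System.out.println(Double.valueOf(nan).equals(nan));   // true (boxing workaround)
        System.out.println(0.0 == -0.0);                       // true
        System.out.println(Double.valueOf(0.0).equals(-0.0));  // false
        System.out.println(sameDouble(nan, nan));              // true, no boxing
    }
}
```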
Since equals and hashCode are on parallel tracks, we can also
extend System.iHC "in place" to handle values by doing a structural
hash code on their component fields. But we are *not* obligated
to do so. We can (and should) make System.substitutabilityHashCode
a new API point (which calls System.iHC on references only) and
then decide what is the most useful thing to do for System.iHC
when applied to a value type (and, some day, a primitive).
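Here is a hedged sketch of what a separate `System.substitutabilityHashCode` could compute. It uses the same assumptions as a record-based model of substitutability: records stand in for value types, and every name here (`SubstHash`, `Pair`, `Wrapper`) is hypothetical. For identity objects it simply returns the identity hash. For "values" it combines the exact type with per-component hashes. Substitutable inputs therefore always get equal hashes. The result is a `long`, matching the 64-bit suggestion in the post.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.RecordComponent;

// Hypothetical sketch of System.substitutabilityHashCode; records stand in
// for value types. Returns a long, per the 64-bit suggestion in the post.
final class SubstHash {
    static long substitutabilityHashCode(Object o) {
        if (o == null) return 0L;
        Class<?> c = o.getClass();
        if (!c.isRecord()) {
            return System.identityHashCode(o);   // references: defer to the identity hash
        }
        long h = c.hashCode();                   // fold in the exact type
        for (RecordComponent rc : c.getRecordComponents()) {
            Object v = read(rc, o);
            Class<?> t = rc.getType();
            long part;
            if (t == double.class) {
                part = Double.doubleToRawLongBits((Double) v);   // same bits, same hash
            } else if (t == float.class) {
                part = Float.floatToRawIntBits((Float) v);
            } else if (t.isPrimitive()) {
                part = v.hashCode();             // equal boxed values hash equally
            } else {
                part = substitutabilityHashCode(v);              // recurse
            }
            h = h * 31 + part;                   // simple combine, not a strong mixer
        }
        return h;
    }

    // Reads one record component through its public accessor method.
    static Object read(RecordComponent rc, Object o) {
        try {
            return rc.getAccessor().invoke(o);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical example "value" types.
record Pair(int a, long b) {}
record Wrapper(Object ref) {}
```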
I think a thoughtful prototyping move would be to make it
throw an exception or log a warning, and then ask our users
to debug the resulting diagnostics. Maybe if the diagnostics
are useful they can be turned into JFR events or something.
A conservative *final design* (especially if we had to choose one
*today*) would be to extend System.iHC in place, so as to keep
the trains running on all tracks; we could deprecate System.iHC
as a gentle way of encouraging folks to re-evaluate their code
(as opposed to a non-gentle exception or log blather).
I guess I'm saying that we would be right to make System.iHC
throw, at first, when presented with a value instance. Then we
can react to what we learn.
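The two options discussed above can be modeled as a policy switch: throw during prototyping, or extend the identity hash "in place" with a state-derived hash. This is only a sketch. `ValueHashPolicy`, `IdentityHashPolicy` and `Reading` are hypothetical names, and records again stand in for value types. A logging policy could be added in the same way. The structural branch never calls a user-written `hashCode` on reference components.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.RecordComponent;

// Hypothetical policies for an identity hash applied to a value instance.
enum ValueHashPolicy { THROW, STRUCTURAL }

final class IdentityHashPolicy {
    // Records stand in for value types; other objects keep their identity hash.
    static int identityHashCode(Object o, ValueHashPolicy policy) {
        if (o == null || !o.getClass().isRecord()) {
            return System.identityHashCode(o);
        }
        if (policy == ValueHashPolicy.THROW) {
            // Prototyping move: fail loudly so callers find and review the call site.
            throw new UnsupportedOperationException(
                    "identityHashCode on value-like " + o.getClass().getName());
        }
        // Conservative "extend in place": hash the state, not a user override.
        int h = o.getClass().hashCode();
        for (RecordComponent rc : o.getClass().getRecordComponents()) {
            Object v = read(rc, o);
            int part = rc.getType().isPrimitive()
                    ? v.hashCode()                       // boxed primitive's own hash
                    : identityHashCode(v, policy);       // recurse with the same policy
            h = 31 * h + part;
        }
        return h;
    }

    // Reads one record component through its public accessor method.
    static Object read(RecordComponent rc, Object o) {
        try {
            return rc.getAccessor().invoke(o);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical example "value" type.
record Reading(int sensor, double value) {}
```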
The obvious reason to make System.substitutabilityHashCode
be a different API point from System.identityHashCode is that
the latter mentions "identity". So there's a clear pedagogical
problem here, if we are telling students "value = object - identity".
A final reason to have a separate API point for System.subHC
is that it's 2019 already, and 32-bit systems are long gone.
The return type of System.subHC should be a full-range int64 (a Java long),
mixed (in a few CPU cycles) without egregious funnels.
In short, new hashCode API points should be routinely upgraded
to work well with modern hardware.
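A 64-bit result only helps if its bits are well mixed. The JDK's `Long.hashCode` folds the two halves of a `long` together with XOR. That is exactly the kind of funnel described above: every value whose halves are equal hashes to 0. The sketch below uses the SplitMix64 finalizer, the mixing step behind `java.util.SplittableRandom`, as one well-known mixer that costs a few shifts and multiplies. Picking it is an assumption of this example, not something the post specifies. `Mix64` is a hypothetical name.

```java
final class Mix64 {
    // SplitMix64 finalizer: each xor-shift and each multiply by an odd
    // constant is invertible, so the whole function is a bijection on long.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        long a = 0x0000000500000005L, b = 0x0000000700000007L;
        // Long.hashCode folds the halves with XOR, so equal halves collapse to 0.
        System.out.println(Long.hashCode(a) + " " + Long.hashCode(b)); // 0 0
        // The 64-bit mix is a bijection, so distinct inputs stay distinct.
        System.out.println(mix64(a) != mix64(b));                      // true
    }
}
```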
Why do op== (acmp) and System.iHC deserve different treatments?
Because op== is used by 100% and System.iHC by 0.01% of Java
programmers. That means that having System.iHC break is a reasonable
way to force a few programmers (like Doug Lea) to go and inspect
their code for bugs. Deprecating op== for values is not an analogous
move; that would amount to saying something like, "all generic Java
code is now buggy, please debug".
— John
P.S. The word "substitutable" is exactly correct as a way of defining
equality for any two nameable terms, in any of a wide range of
formalisms, including programming languages.
In Java a term is a reference, value, or primitive, and substitutability
inspects the type, the identity of the reference, and the structure of
the value (but not the mutable parts of a mutable object).
For a discussion of the connections between equality and substitution,
see http://intrologic.stanford.edu/extras/equality.html
The short version is: Two values x, y are equal if and only if
the sentence "P(x) <=> P(y)" is true for all predicates "P" (in
some relevant domain of discourse also including x and y).
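This definition is essentially Leibniz's law, and it can be checked mechanically. Below is a small Lean 4 sketch that uses core Lean only; the theorem name is ours. It proves that equality coincides with "no predicate P can tell x and y apart". The reverse direction picks the predicate P(z) := (x = z), a special case of the trick used later with a derived relation E.

```lean
-- Leibniz equality: x = y exactly when no predicate P distinguishes them.
theorem eq_iff_substitutable {α : Type} (x y : α) :
    x = y ↔ ∀ P : α → Prop, (P x ↔ P y) := by
  constructor
  · -- Equal terms satisfy the same predicates.
    intro h P
    subst h
    exact Iff.rfl
  · -- Choose P z := (x = z); P x holds by rfl, so P y holds too.
    intro h
    exact (h (fun z => x = z)).mp rfl
```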
This logical pattern of testing for equality depends on
substituting x for y (in some P(y)) and seeing if anything changes.
If nothing changes, x and y cannot be different. (Or if they
still differ, your work is done; go home and sleep it off.)
This test is extremely robust (though also impossible to
compute directly); it makes no appeal to any intrinsic
property of x or y, other than "P can ask it any questions
it wants". This exercise demonstrates that equality, in
its deepest form, is really an appeal to substitutability.
What, then, about other forms of equality which are
defined as equivalence relations? Those are built on
top of the basic logic of relations and objects (sets,
categories, dependent types, choose one). Those
extra forms of equality can only "coarsen" the basic
equality as determined by substitutability.
If two names x, y are the same thing, as proven by
substitutability, then they cannot be unequal in a derived
equivalence relation E. (Otherwise, that relation E would
provide a hook for breaking the substitutability condition,
via P(z) := E(x,z). Then, P(x)=true and P(y)=false since
E(x,x) must be true and E(x,y) is by assumption false.
But this P breaks the substitutability proof that x==y.)
Thus, derived equivalence relations cannot distinguish x
and y if they are known to be substitutable for each other.
Substitutability is the unique "ground condition" for equality.
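The contradiction argument above can also be written as a direct proof. In this Lean 4 sketch (theorem and hypothesis names are ours), only reflexivity of E is needed. Instantiating substitutability at P(z) := E(x,z) carries E(x,x) over to E(x,y).

```lean
-- Any reflexive relation E (in particular any equivalence relation) must
-- relate two terms that no predicate can distinguish.
theorem coarsens {α : Type} (E : α → α → Prop) (hrefl : ∀ a, E a a)
    (x y : α) (hsub : ∀ P : α → Prop, (P x ↔ P y)) : E x y :=
  (hsub (fun z => E x z)).mp (hrefl x)
```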
This logic applies exactly to Java and the JVM (as soon as
we block out functions P which give non-deterministic
or global-state-dependent answers). Thus, the most
sensitive, exact, finely discriminating equality test the
JVM can provide is the substitutability test.
Although the JVM cannot run a theoretical substitutability
proof in every acmp instruction, it has enough control over
its little universe to test efficiently whether such a proof would
succeed or fail, since it "knows all the tricks" that any predicate
P could possibly play.
Recap: There can only be one most-exact equivalence
relation, and that coincides with the substitutability
test. Other equivalence relations can be derived
(as Object.equals is derived from primitive tests)
but they cannot distinguish objects which are found
to be substitutable. The JVM can implement this test.
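Today's Java offers a concrete instance of coarsening. `String.equals` merges distinct `String` objects that are not substitutable for each other, because they have different identities. Yet, as the reflexivity clause of the `Object.equals` contract requires, it never separates a reference from itself. `Coarsen` and `respectsSubstitutability` are hypothetical names. The check tests only the one-way implication that substitutable (here, identical) references must be `equals`.

```java
final class Coarsen {
    // Substitutable (same reference) must imply equals; the converse need not hold.
    static boolean respectsSubstitutability(Object x, Object y) {
        return x != y || x == null || x.equals(y);
    }

    public static void main(String[] args) {
        String a = new String("valhalla");
        String b = new String("valhalla");
        System.out.println(a == b);                          // false: distinct identities
        System.out.println(a.equals(b));                     // true: String.equals is coarser
        System.out.println(respectsSubstitutability(a, a));  // true
    }
}
```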
P.P.S. Is there such a thing as substitutabilityToString? Alas, no.
But there is something deeper than it which underlies it, and equality,
and hashCode. That is a "something" that would enumerate all the
components of a value's substitutability state. Maybe call it
System.visitSubstitutabilityState or some such. For a reference
type it would visit just that ref and indicate ==ref; for a value
of two int fields it would visit each field and indicate ==int.
Maybe we don't want something so space-age in the core JDK,
but we do need enough reflective API points to be able to build
one, or do its job for it. I think Core Reflection, as extended by
Valhalla, succeeds in this.
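The visitor idea can be prototyped with today's Core Reflection over records (`Class.getRecordComponents`, `RecordComponent.getAccessor`). This is only a rough stand-in for what Valhalla-extended reflection would offer for value types. The interface and method names are hypothetical and loosely follow the `System.visitSubstitutabilityState` name floated above. The visitor reports a reference as a single `==ref` item and each primitive component as `==type`. A toString-like dump is then one client of it.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.RecordComponent;

// Hypothetical visitor over an object's substitutability state.
interface SubstitutabilityVisitor {
    void visitPrimitive(String path, Class<?> type, Object boxed); // compare with ==type
    void visitReference(String path, Object ref);                   // compare with ==ref
}

final class SubstState {
    // Records stand in for value types; anything else is visited as one reference.
    static void visit(String path, Object o, SubstitutabilityVisitor v) {
        if (o == null || !o.getClass().isRecord()) {
            v.visitReference(path, o);
            return;
        }
        for (RecordComponent rc : o.getClass().getRecordComponents()) {
            Object value = read(rc, o);
            String sub = path + "." + rc.getName();
            if (rc.getType().isPrimitive()) {
                v.visitPrimitive(sub, rc.getType(), value);
            } else {
                visit(sub, value, v);            // flatten nested value components
            }
        }
    }

    // One possible client: a dump in the spirit of "substitutabilityToString".
    static String describe(Object o) {
        StringBuilder sb = new StringBuilder();
        visit("this", o, new SubstitutabilityVisitor() {
            @Override
            public void visitPrimitive(String path, Class<?> type, Object boxed) {
                sb.append(path).append(" ==").append(type.getName())
                  .append(' ').append(boxed).append('\n');
            }

            @Override
            public void visitReference(String path, Object ref) {
                sb.append(path).append(" ==ref\n");
            }
        });
        return sb.toString();
    }

    // Reads one record component through its public accessor method.
    static Object read(RecordComponent rc, Object o) {
        try {
            return rc.getAccessor().invoke(o);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical example "value" type with two int fields.
record IntPair(int a, int b) {}
```

For `new IntPair(1, 2)`, `describe` yields one `==int` line per field. A fuller version would also report each value's class, since substitutability checks the type as well.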