Value equality

John Rose john.r.rose at oracle.com
Tue May 17 22:48:24 UTC 2016


On May 16, 2016, at 11:20 AM, Brian Goetz <Brian.Goetz at Oracle.COM> wrote:
> 
> We have two choices in front of us regarding equality on value types:
> 
> 1.  What does `==` on values do?  Choices include a bitwise comparison, a componentwise comparision (identical to bitwise unless there are float members), or invoking the `equals()` method.

I.e., what is the semantic binding of the op== syntax?

> 2.  What is the default for the `equals()` method?  The obvious choice is a componentwise comparision, but Kevin has (strongly) suggested we consider a deep comparison, calling .equals() on any reference members.  (Note that the two choices collapse to the same thing when there are no reference members.)

I.e., what are the global expectations of the .equals method, and how can we embody them in a suitable default?

> For values whose members are value-ish objects themselves (e.g., String, Optional, LocalDateTime), the deep-equals is obviously appealing -- a tuple of (int, date) makes total sense to compare using .equals() on the date.  On the other hand, for values whose members are references to mutable things (like lists), it's not as obvious what the right default is.

I can't vote for "deep" vs. "shallow", or any syntax binding, without setting up a bunch of semantic background.  So here goes…

Let's start with semantics with no syntax.  I would like to single out two especially useful equivalence relations, to be called normal equals ("NE") and copy-wise equals ("CE").

("Normal" is a shameless play for pre-eminence of a particular theory.  But we have to define a normal, because we have to say what we expect of .equals.  We might have to tweak my notion of "normal", but here it is.)

For exposition, we need also names for legacy comparison operations for primitives and references, to be called primitive equals ("PE") and object equals ("OE").

CE is the most refined possible equality.  If two values (reference or primitive) are CE, then you can *never* execute code that detects a difference between them.  If two values are CE then each is fully substitutable for the other, in all contexts.  This is sometimes a useful property to observe in generic code.

  CE(a,b) := "a is indistinguishable from an assignment-copy of b"

Implementation note:  CE is op== today, except for floats.  For floats, CE must consult raw bits; cf. Double.doubleToRawLongBits.  CE for values has to be a component-wise distribution of CE, or else it wouldn't be CE.  But after this point we get less certain.

In most cases, CE is exactly the legacy op==, but not for floats.  So let's define a slightly coarser equivalence relation for primitive equality:

  PE(a,b) := legacy(a==b) || (floating(a) && CE(a,b))

Here the op== is the legacy primitive equality appropriate to both a and b.  (Note: These formulas will assume that a and b are either both refs, both of the same primitive type, or (in some cases) both of the same value type.)

Implementation note:  Both PE(+0.0,-0.0) and PE(Float.NaN, Float.NaN) are true, while CE(+0.0,-0.0) is true but Float.NaN==Float.NaN is false.  These are the only differences, so we can treat PE as a delta from CE:

  PE(a,b) == CE(a,b) || (floating(a) && legacy(a==0 && b==0)).

Also, since the box types Float and Double use CE internally, we can also boxed comparison to obtain CE:

  PE(a,b) == legacy(a==b) || ((Float)a).equals((Float)b)

Implementation note:  By appealing to boxes for a semantic notion like PE, we *do not* require implementations to perform boxing.  We simply require that any computation of the semantics produces the same result *as if* the formula were directly evaluated, boxes and all.

So PE is neither CE nor the legacy op==, but is an equivalence relation that closely approximates both.  It is presented here to show the interactions of general equality with primitive equality, but (spoiler alert) it probably won't make the final cut.

OE is simply an appeal to the Object.equals method, guarded first by CE.

  OE(a,b) := CE(a,b) || (reference(a) && legacy(a != null && a.equals(b)))

Implementation note:  The CE guard takes care of null==null.  For references, this relation is embodied in Objects.equals(a,b).

Implementation note:  For primitive wrappers, OE is compatible with CE (but not PE or op==) on the wrapped primitive value.  This is true even for floats (the .equals methods consult the raw bits).  Therefore we can use these relations, if we want:

  CE(a,b) == OE((Object)a,(Object)b)  where  primitive(a)
  PE(a,b) == legacy(a==b) || (primitive(a) && OE((Object)a,(Object)b))

For some objects, OE is the same as CE; this includes arrays.  For objects which favor structural equality, including most collection classes, OE recurses into substructure.

One might say that two objects which are OE but not CE are not fully substitutable, but "structurally substitutable".  The Arrays.equals method family offers an alternative form of structural substitutability by treating arrays as lists.  Structural substitutability is a time-varying property, if an object or any subpart is mutable.  Still, many objects are never-mutated, so structural substitutability can be viewed as time-invariant in many cases (though this is a fruitful source of bugs and TOCTOU exploits).

Now it's time to define "normal", as a concept that works equally well with both objects and primitives, and can extend to the new value types.  Here is a symmetric possibility:

  NE'(a,b) := PE(a,b) || OE(a,b)

In other words, a "normal equality" test compares primitives with op== (except it uses CE for NaNs), and compares objects with .equals.

We can also rewrite this formula to omit the non-legacy notion of PE, and appeal to boxes instead:

  NE'(a,b) := legacy(a==b) || OE((Object)a,(Object)b)

For reference types, this is obviously our old friend Objects.equals in another guise.  For primitive types it is something new, but perhaps desirable.  (So I'm putting it on the table, though I don't prefer it.)  It is a "naively generified" version of "a == b || a != null && a.equals(b)".  Since it includes PE, it compares +0.0 as equal to -0.0.

But I prefer a simpler version of NE:

  NE(a,b) := CE(a,b) || (reference(a) && OE(a,b))

This version drops PE from the equation.  Float zeroes are not respected; primitives comparisons are uniformly bitwise (CE).

Because primitive boxes perform CE for .equals, we can alternatively formulate NE in terms of boxes:

  NE(a,b) := OE((Object)a,(Object)b)

This finally takes me to why I want to privilege this comparison with the name "normal":  It is in use every nanosecond already, in our collection types.  When you auto-box a primitive into a collection, it participates in the form of NE.

A third candidate for NE would just use legacy comparisons:

  NE''(a,b) := legacy(a == b) || (reference(a) && OE(a,b))

But this is not an equivalence relation, in the case of floating point NaNs.  Nor is it compatible with current computations involving boxed NaNs (which *do* compare as equal if they are CE).  I believe these reasons, combined, take NE'' out of the running.

Still staying in semantics, let's consider extending these relations to values.  As noted above, CE *must* be "componentwise CE", unless we have a way to guarantee that some field will *never* be detectible by user code, in which case we should just drop the field.

PE *could* be extended to values componentwise also.  This would give a slightly weakened version of CE for values which contain floats.  I don't think that move carries its weight.

To refer to the equals method for values, we can make a simple partial definition:

  VE(a,b) := a.equals(b)  where value(a) and typeof(a)==typeof(b)

Alternatively, OE is easily extended to values, either by boxing, or directly:

  OE'(a,b) := OE(a,b) || (value(a) && OE((Object)a, (Object)b))
  OE'(a,b) := CE(a,b) || (!primitive(a) && legacy(a != null && a.equals(b)))

(To make notation easier, we can assume a!=null is identically true for any value a.)

Those two definitions will remain in alignment as long as .equals behavior commutes with boxing a value to Object.

Then we get an compatibly extended version of NE (with no prime this time) as:

  NE(a,b) := CE(a,b) || OE(a,b) || VE(a,b)

We can fold together the equals calls instead:

  NE(a,b) := (!primitive(a) && OE'(a,b))

Or, recognizing that PE is not in play, simply fold everything into boxes:

  NE(a,b) := OE((Object)a,(Object)b)

By dropping CE from the formula, we are relying on the fact that all .equals methods are reflexive, even for values.

Any of these formulas are plausible starting points for efficient implementations of NE.

An NE with values is a highly plausible extension for places in our system which execute the logic of Objects.equals.

For places in our system which execute non-CE primitive equality predicates (such as op== or PE), NE is an approximate replacement, and may need special handling.  But the differences are small, affecting only to the treatment of NaN dis-equality and signed-zero equality.  And today's normative treatment (after boxing) is simply CE, which is compatible with NE for boxed primitives.

For the record, we could also define array-traversing predicates which coarsen OE, such as:

  AE(a,b) := /*OE(a,b) ||*/ Arrays.deepEquals(new Object[]{a},new Object[]{b})
  AE'(a,b) := OE(a,b) || (array(statictypeof(a)) && array(statictypeof(b)) && Arrays.equals(a,b))
  AE''(a,b) := OE(a,b) || (array(statictypeof(a)) && array(statictypeof(b)) && Arrays.deepEquals(a,b))

I hope that is helpful to others besides me.

Now, as for syntax and bindings.

CE is obviously useful; when you need it you shouldn't have to approximate it.  So it needs its own API point, something like System.isEqualCopy.  One reason for this, rather than rely on a legacy operator, is to give programmers a definitive way to avoid binding to the floating op== which has a surprise inside.  Generic code, if it specializes to floating op==, will break substitutability in just that one place; surely this will be unwelcome sometimes.  Even if generic code uses CE uniformly for op==, System.isEqualCopy can have a pedagogical function, as well as being handy in numeric code.

PE is not obviously useful, so unless it is incorporated in NE (as NE') it does not need an API point.  It can be coded (in numeric code) as a==b || CE(a,b).

It is plausible to bind op== to CE almost everywhere.  There is one caveat (besides floating point):  For values, there will be a performance cost, because many value comparisons will *not* welcome a pre-pass with CE as an optimization to VE.  In other words, if op== is bound to CE, the NE formula for values will be the usual:

  v==w || v.equals(w)

This is nice and symmetric with current practice, *but* it is slow.  The big user-written pattern puts a burden on the JIT.  The JIT *should* be able to optimize away the various field references in "v==w", by recognizing that the same fields are being compared in the .equals method.  *But* if the equals method does anything other than CE to those fields (e.g., NE or AE or a user-defined check), then the JIT might not be able to infer that the original field comparison is subsumed by the later test.  There may be control flow in the way which the JIT does not understand.  Thus, "v==w||v.equals(w)" may optimize only partially, including machine code artifacts like "v.f==w.f; …; v.f.equals(w.f)".

Finally, even if the JIT is really smart, users are even smarter, and they endlessly produce new variations on the theme of a==b||a.equals(b).  The two comparisons, which we want to merge into each other, can be separated by arbitrary user code.  This is a classic problem with verbose languages:  The compiler has to recognize any code shape the user throws at it.

For this reason one might prefer that op== for values (and any-types) be bound to NE, not CE.  (This is P2 below.)  An underlying assumption here would be that CE is much less useful than NE, so NE should have preferential access.

Data is mixed on whether NE is more important than CE.  Object reference identity would seem to be the wrong question to answer in many cases, yet some codes use CE to detect full substitutability.  Some uses of op== are errors, where the user mean to reach for NE but forgot how to spell it (or was too lazy?).  Sometimes CE seems like an expert-only tool, a sharp needle to be kept on a high shelf.  Deciding the true utility of CE requires corpus studies.

Considered by itself, NE is very useful; we call it Objects.equals(a,b) nowadays, so it has an API point waiting for it already.

What do today's classes do when implementing .equals?  This is an empirical question, but to me it seems that they typically consult most or all of their fields, comparing each field value with the field value in the comparand.  The tendency is to use recursive structural comparison on fields.  In short, it is rare to find CE applied to fields in an .equals method, and common to find NE applied.  (Sometimes an AE or another thing is used.)

(Note:  When we say that code executes NE or CE, we really mean that it executes other code whose result is equivalent to NE or CE, as restricted to the types and inputs actually present.)

If a class or value models a *cursor* into some mutable state, it may be undesirable to traverse the state.  A cursor comparison operation should use CE on the state reference, while is can use NE (or op==) on other parts such as indexes.  Java does not have many such data types, but the option must exist for some occasions, to use CE instead of NE for a field comparison.  (C++ STL has many such types.  Project Panama defines proxies for C pointers which behave like this.)

It is simple to use NE to specify a useful default .equals method for values:  Simply distribute NE across the fields, in the same way that CE *must* distribute itself across the fields.

If a value type wants to use floating op== on a floating point field, it can specify this "manually".  Likewise for AE (arrays treated structurally).  The manual specialization can be carried out either by hand-writing the .equals method, or else giving the field a type for which NE delegates to the correct equality computation.  As Peter Levart points out, this field type can itself be a value type which wraps the "real" field value, and such a value type is probably of negligible cost.

So it appears that having NE-across-components be the default equals method is customary and reasonable.  Given wrapper types for tweaked fields, it is even (arguably) universal.

Going back to op==, there are two plausible options for binding it to new types:

(P1) Syntax of op==(val,val) and op==(any,any) binds to CE as with op==(ref,ref).  Therefore, NE is uniformly reached by today's idiom, which traverses value fields twice.

(P2) Syntax of op==(val,val) and op==(any,any) is direct access to NE.  CE is reachable by experts at System.isEqualCopy.  The old idiom for NE works also calls equals twice.

(P3) Same as P1, op== is uniform access to CE.  New op (spelled "===", ".==", "=~", etc.) is uniform, optimizable access to NE, attracting users away from legacy idiom for NE.

In all cases, floats have irregularities on NaNs and zeroes.  Basically, generic code needs to access CE, and wants to use op== for it, but floating-point code has a slightly different meaning for op==.  You can't have two meanings for one operator, so something has to give.  I recommend segregating the floating-point meaning of op== to monomorphic code, so polymorphic code can be reasoned about in terms of the clean CE.

In P1 and P2, the legacy comparison idiom (a==b || a.equals(b)) reaches NE, but also iterates over value fields twice, with unpredictable optimization results.  In P2, both passes are done by V.equals, so if there is a guaranteed that V.equals is idempotent, it can be directly optimized.  If P1, if there is a guarantee that op== is subsumed by .equals, the leading op== can be optimized.  In any case, P2 allows code to be improved to reach NE by a single expression a==b, allowing full optimization in all cases.

With P1, we can use op== for CE substitutability checks (except for non-specialized floats), and use Objects.equals for NE.  There is no need to directly call .equals, but users will do this because it is easier than calling Objects.equals.

With P2, we can use op== as abbreviation for Objects.equals, and use, System.isEqualCopy for substitutability checks.  Users will be attracted away from direct calls to .equals.

With P3, we can use op== for CE as in P1, and the new op for NE.  Users will be attracted away from direct calls to .equals.

Sadly, legacy use of op==(ref,ref) makes P2 difficult to adopt.  We could make an agreement, as with float, that op==(ref,ref) means one thing in legacy code, while op==(any,any) means something different (and better) in any-generic code.  Note that legacy code includes generics, in which op== is CE not NE.

There are good arguments that CE is so useful that we *should* allocate an operator to it, anyway.  But if this argument is accepted, it provides even more motivation for an operator for NE, since NE is extremely common.  For example, many uses of op== (CE) are written simply to support the idiom for NE (Objects.equals).  And if we default field comparison to NE not CE, how is it that users are given the less-friendly notation for NE and the more friendly one for NE?

At the cost of another bikeshed and some sugar, P3 would allow users and optimizers smoother access to NE.  Too much sugar is bad for the diet, but this sugar (op===) might just be the right way to lead Hansel and Gretel to the correct house, instead of stumbling through the woods with the Objects.equals incantation.

— John


More information about the valhalla-spec-observers mailing list