Equality (was: Re: Migrating methods in Collections)

Sat Dec 19 10:13:07 UTC 2015

On Dec 18, 2015, at 8:55 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
>> 1. It seems unresolved in the current State of Values doc
>> whether value types will have user-definable equals()
>> methods. I think that this needs to be settled soon:
>> 
>> If value types don't allow overriding equals, and if the implementation
>> is "has same type and bits", then some of the problems you note
>> almost disappear. For example c.contains(x) could be automatically
>> translated into "false" if c is a collection of a different val type
>> than x, or x is a ref type or null. Which also happens to catch all
>> the type-problematic cases. I'm not sure how a compiler would know
>> to do this though.
> 
> Here's the current thinking on the tools for equality:
> 
> - The bytecode set will provide some sort of 'vcmpeq' instruction, whose behavior is a componentwise recursive comparison (int fields with icmp, value fields with vcmp, etc).
> - The == operator in the language will correspond to vcmpeq
> - The default (whether provided by javac or VM) implementation of equals(V) for value types will do an == comparison
> - Users can override equals(V)

"Codes like a class" => You can override equals.  There's not really an open question here.  Forbidding overrides to equals would be crippling.

"Works like an int" => operator== might be logically adjusted to the value type semantics.

(Reminder "Codes like a class, works like an int" is the slogan which best captures in a few words what we are trying to do with value types.)

With that as context, I think we have two plausible, logically consistent options:

Option 1. (POR, as Brian points out) operator== is hardwired to bitwise comparison (ignoring padding, never calling equals methods)

Option 2. operator== is an alias for equals, and vcmpeq remains accessible under a different name (isSameAs).
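To make the difference between the two options concrete, here is a minimal sketch in today's Java. Fraction is a hypothetical stand-in for a value type (a final class with final fields), and its isSameAs method only imitates what vcmpeq would do; neither is a real value-type API.

```java
// Hypothetical stand-in for a value type. Real value types and the vcmpeq
// instruction are not available, so a final class with final fields is used.
final class Fraction {
    final int num;
    final int den;

    Fraction(int num, int den) {
        if (den <= 0) throw new IllegalArgumentException("den must be positive");
        this.num = num;
        this.den = den;
    }

    // Plays the role of vcmpeq: componentwise comparison of the fields,
    // never consulting equals().
    boolean isSameAs(Fraction other) {
        return num == other.num && den == other.den;
    }

    // User-defined equality: 1/2 and 2/4 denote the same number.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Fraction)) return false;
        Fraction f = (Fraction) o;
        return (long) num * f.den == (long) f.num * den;
    }

    // Hash the reduced form so that equal fractions hash alike.
    @Override
    public int hashCode() {
        int g = java.math.BigInteger.valueOf(num)
                .gcd(java.math.BigInteger.valueOf(den)).intValue();
        return 31 * (num / g) + (den / g);
    }
}
```

For new Fraction(1, 2) and new Fraction(2, 4), Option 1 would make "v == w" answer like isSameAs (false), while Option 2 would make it answer like equals (true).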

The choice here must balance two competing influences.

"Codes like a class" means that, internally within the implementation of a value type, uses of operator== must be "dumb" approximations to "true" equality.
Indeed, probably most occurrences of operator== on references other than null are of the form "p == q || p != null && p.equals(q)".  Bad language choice here, IMO.
That legacy meaning of operator== pushes us towards Option 1.
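For reference, the legacy idiom looks like this when written out. The class and method names (LegacyEquality, sameOrEqual) are made up for illustration; java.util.Objects.equals in the JDK performs the same check.

```java
final class LegacyEquality {
    // The hand-written idiom: reference identity as a cheap first test,
    // then a null-guarded call to the "true" equality in equals().
    static boolean sameOrEqual(Object p, Object q) {
        return p == q || p != null && p.equals(q);
    }
}
```

Here operator== is only the "dumb" approximation; the real answer still comes from equals.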

"Works like an int" means that, externally, people who use a value type as if it were a primitive will just say "v == w" and not even dream that "v.equals(w)" is a thing.
Exactly zero occurrences of operator== on non-references are backed up by calls to equals, and users will be surprised if a value type gives incomplete answers to v==w.
This practicality pushes us towards Option 2.

But, if you think about it, it also pushes us towards well-controlled behavior for other operators.
If I can write "v == w", what should I expect from "v < w" (if they are comparable)?
Does this roll us all the way down the slippery slope to operator overloading?  It had better not.

There are two obvious places we could stop rolling towards (uncontrolled) operator overloading.
First, only "overload" operators which are *already* common to both primitives and references.  That means == and !=, and nothing else.

Second, retroactively add interfaces to Byte, Boolean, Integer, Long, Float, etc., which reify all the relevant operators as named method calls.
And then allow value types to overload those named methods, wiring operator uses into those methods (but continuing to hardwire the primitives to the appropriate bytecodes).

I think the POR (Option 1) is reasonable, unless/until we discover evidence to the contrary as we work with generics over primitives.

Finally, note that operator overloading is not just an academic or esthetic concern, because enhanced generics demand some sort of unified view of types.

When we write a generic method over a type parameter <T>, we expect the method to operate correctly over all valid bindings of T.
Today, since T ranges only over references, we can assume that code that touches T will do the Option 1 dance of "v == w || v.equals(w)", across all T, even value types.
Tomorrow, when T ranges over primitives, references, and values, there will be a little more pressure to "rationalize" the behavior of op<=, op+, op*, etc., so that they operate correctly over all valid bindings of T.
I say "a little more" but experiment will show whether it is significant.  If so, we will want to re-interpret op<=, op+, etc., as interface calls, and write generics using bounds like <any T extends Comparable> (getting op< <= == != >= >), <any T extends Arithmetic> (getting op+ etc.), and so on.

The conservative thing to do, which might be right in the end, is to require all new code that uses <any T extends Comparable> etc. to always use method-call syntax on values of type T, and bring primitives into consistency by retroactively assigning the methods in Comparable, etc.  (Perhaps only in generic code?)
Later on we can reconsider whether rehabilitating the various infix operators (as sugar for those methods) is worth doing.
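
As a rough sketch of the conservative approach, here is generic code that uses only method-call syntax on T. It is written in today's Java, so T ranges over references only and primitives take part through their boxes; the "any T" bounds above are not expressible yet. Arithmetic, GaussianInt, and GenericOps are hypothetical names invented for this example, not proposed APIs.

```java
// Hypothetical interface reifying op+ and op* as named methods.
interface Arithmetic<T> {
    T plus(T other);
    T times(T other);
}

// Hypothetical value-like type: a complex number with integer parts.
final class GaussianInt implements Arithmetic<GaussianInt> {
    final long re;
    final long im;

    GaussianInt(long re, long im) {
        this.re = re;
        this.im = im;
    }

    @Override
    public GaussianInt plus(GaussianInt o) {
        return new GaussianInt(re + o.re, im + o.im);
    }

    @Override
    public GaussianInt times(GaussianInt o) {
        return new GaussianInt(re * o.re - im * o.im, re * o.im + im * o.re);
    }
}

final class GenericOps {
    // Uses compareTo rather than op>=, so it works for any Comparable T.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Uses plus/times rather than op+ and op*.
    static <T extends Arithmetic<T>> T sumOfSquares(T a, T b) {
        return a.times(a).plus(b.times(b));
    }
}
```

GenericOps.max already accepts Integer and String, because both implement Comparable. For sumOfSquares to accept int or Integer, Arithmetic would have to be retroactively assigned to them, which is the step described above.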

The thing we must *not* do is get to a place where primitives can *only* be operated on via operators like op< op+, but values and references can *only* be operated on via method invocation.  One of the two sides has to change so as to overlap (at least for generic code) with the other.

— John