Equality for values -- new analysis, same conclusion

Remi Forax forax at univ-mlv.fr
Mon Aug 12 14:23:00 UTC 2019


I think we should take a step back on that subject,
because you are all jumping to the conclusion too fast in my opinion.

Let's start at the beginning.
The question of supporting == on inline types should first be guided by what we would have decided if inline types had been present from the inception of Java; it's the usual trick when you want to retcon a feature.
If we had had inline types from the beginning, I believe we would never have allowed == on Object, the root type of the hierarchy, but would instead have had a special method call that only works on indirect types, as in C#.
So Object== is a kind of liability: we are not here to give Object== a nice semantics, but to deprecate Object== and provide a backward-compatible way to make inline types work with old code.

So first, we have to clearly convey that == should be deprecated except on primitive types.
I propose
- to ban V== (compile error)
- to make Object== and T== emit a compiler warning explaining that the code should be changed
- to add a method System.identityEquals(RefObject, RefObject) as a replacement
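
To make the third point concrete, here is a rough sketch of what such a replacement could look like. It is only an illustration: RefObject is modelled as a local marker interface, Node is a made-up class, and identityEquals is a local static method, not an existing System method.

```java
final class IdentityEqualsSketch {
    // Hypothetical stand-in for the RefObject type named in the proposal.
    interface RefObject {}

    // Made-up identity-bearing class used only for this example.
    static final class Node implements RefObject {
        final String label;
        Node(String label) { this.label = label; }
    }

    // Hypothetical counterpart of System.identityEquals(RefObject, RefObject):
    // identity comparison is only offered for types that opt in to RefObject.
    static boolean identityEquals(RefObject a, RefObject b) {
        return a == b;
    }

    public static void main(String[] args) {
        Node n1 = new Node("a");
        Node n2 = new Node("a");
        System.out.println(identityEquals(n1, n1)); // true: same object
        System.out.println(identityEquals(n1, n2)); // false: distinct objects
        // identityEquals(42, 42) would be rejected at compile time here,
        // because Integer does not implement this sketch's RefObject.
    }
}
```

The point of the signature is that a type which is not a RefObject cannot even be passed, so the question of what identity means for it never arises.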

Now, the second thing that disturbs me is that no email in this thread lists the two issues of the substitutability test that make it unsuitable as an implementation of Object==.
- it's not compatible with the primitive == on float and double; for example,
  inline class InlineFloat {
    float value;

    public boolean equals(Object o) {
      if (!(o instanceof InlineFloat i)) {
        return false;
      }
      return value == i.value;
    }
  }
  has the stupid property that == is true while equals() is false when value is NaN.

- it can be really slow
    1) Object== can be megamorphic
    2) Object== can do a recursive call
  so it destroys the assumption that Object== is faster than equals().
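
The float mismatch in the first issue can be reproduced in plain Java today. The sketch below assumes that a field-wise substitutability test compares float fields by their bit patterns; Float.floatToIntBits is used here as an approximation of that, and the class and method names are made up for the example.

```java
final class NaNEqualityDemo {
    // Primitive ==, as used in the equals() of InlineFloat above.
    static boolean primitiveEquals(float a, float b) {
        return a == b;
    }

    // Bitwise comparison, assumed here to approximate what a field-wise
    // substitutability test would do for a float field.
    static boolean bitwiseEquals(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Float.NaN, Float.NaN)); // false
        System.out.println(bitwiseEquals(Float.NaN, Float.NaN));   // true
        System.out.println(primitiveEquals(0.0f, -0.0f));          // true
        System.out.println(bitwiseEquals(0.0f, -0.0f));            // false
    }
}
```

So the two notions disagree in both directions: on NaN the bitwise test says equal while == says different, and on 0.0f versus -0.0f it is the other way round.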

Fortunately, we are not trying to make == work on inline types, only to give Object== a compatible semantics when one of the operands is an inline type.

So the only choice we have is to return false if the left or the right operand is an inline type.

And yes, people will find it weird but that's why we are deprecating it after all.
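
A small simulation of that rule, to show that old code using == as a fast path keeps giving the right answer. Everything here is an assumption made for the illustration: InlineLike is a marker interface standing in for "is an inline type", Point is an ordinary class pretending to be one, and legacyRefEquals plays the role of Object==.

```java
final class FalseIfInlineDemo {
    // Hypothetical marker standing in for "this is an inline type".
    interface InlineLike {}

    // Ordinary class pretending to be an inline type for this sketch.
    static final class Point implements InlineLike {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    // Simulated Object==: false as soon as either operand is inline,
    // plain reference comparison otherwise.
    static boolean legacyRefEquals(Object a, Object b) {
        if (a instanceof InlineLike || b instanceof InlineLike) return false;
        return a == b;
    }

    // The usual fast-path idiom: a == b || a.equals(b).
    static boolean fastPathEquals(Object a, Object b) {
        return legacyRefEquals(a, b) || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        String s = "abc";
        System.out.println(legacyRefEquals(p, p));                // false: weird, as announced
        System.out.println(fastPathEquals(p, new Point(1, 2)));   // true: equals() backs it up
        System.out.println(legacyRefEquals(s, s));                // true: unchanged for references
    }
}
```

The fast path is simply never taken for the inline operand, and the result still comes from equals().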

Rémi

----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Friday, August 9, 2019 17:46:19
> Subject: Equality for values -- new analysis, same conclusion

> Time to take another look at equality, now that we’ve simplified away the
> LFoo/QFoo distinction. This mail focuses on the language notion of equality
> (==) only.  Let’s start with the simplest case, the ==(V,V) operator.  There is
> a range of possible interpretations:
> 
> - Not allowed; the compiler treats == as not applicable to operands of type V.
> (Note that since V <: Object, == may still be called upon to have an opinion
> about two Vs whose static types are Object or interface.)
> - Allowed, but always false.  (This appeals to a concept of “aggressive
> reboxing”, where a value is reboxed between every pair of byte codes.)
> - Weak substitutability.  This is where we make a “good faith” attempt to treat
> equal points as equal, but there are cases (such as those where a value hides
> behind an Object/interface) where two otherwise equal objects might report not
> equal.  This would have to appeal to some notion of invisible boxing, where
> sometimes two boxes for the same value are not equal.
> - Substitutability.  This is where we extend == field wise over the fields of
> the object, potentially recursively.
> 
> As noted above, we don’t only have to define ==V, but ==Object when there may be
> a value hiding behind the object. It might be acceptable, though clearly weird,
> for two values that are ==V to not be ==Object when viewed as Objects. However,
> the only way this might make sense to users is if this were appealing to a
> boxing conversion, and it’s hard to say there’s a boxing conversion from V to
> Object when V <: Object.  There is a gravitational force that says that if two
> values are V==, then they should still be == when viewed as Object or
> Comparable.
> 
> Let’s take a look at the use cases for Object==.
> 
> - Direct identity comparison.  This is used for objects that are known to be
> interned (such as interned strings or Enum constants), as well as algorithms
> that want to compare objects by identity, such as IdentityHashMap.  (When the
> operands are of generic (T) or dynamic (Object) type, the “Interned” case is
> less credible, but the other cases are still credible.)
> - As a fast path for deeper equality comparisons (a == b || a.equals(b)), since
> the contract of equals() requires that == objects are equals().
> - In comparing references against null.
> - In comparing references against a known sentinel value, such as a value
> already observed, or a sentinel value provided by the user.
> 
> When generics are specialized, T== will specialize too, so when T specializes to
> a value, we will get V==, and when T is erased, we will get Object==.
> 
> Subtyping is a powerful constraint; it says that a value is-a Object.  While it
> is theoretically possible to say that v1==v2 does not imply that ((Object)v1 ==
> (Object) v2), I think we’ll have a very hard time suggesting this with a
> straight face.  (If, instead, the conversion from value to Object were a
> straight boxing conversion, this would become credible.)  Which says to me that
> if we define == on values at all, then it must be consistent with == on object
> or interface types.
> 
> Similarly, the fact that we want to migrate erased generics to specialized,
> where T== will degenerate to V== on specialization, suggests that having
> Object== and V== be consistent is a strong normalizing force.
> 
> 
> Having == not be allowed on values at all would surely be strange, since == is
> (mostly) a substitutability test on primitives, and values are supposed to “work
> like an int.”   And, even if we disallowed == on values, one could always cast
> the value to an Object, and compare them.  While this is not an outright
> indefensible position, it is going to be an uncomfortable one.
> 
> Having V== always be false still does not seem like something we can offer with
> a straight face, again, citing “works like an int.”
> 
> Having V== be “weak substitutability” is possible, but I don’t think it would
> make the VM people happy anyway.  Most values won’t require recursive
> comparisons (since most fields of value types will be statically typed as
> primitives, refs, or values), but much of the cost is in having the split at
> all.
> 
> Note too that treating == as substitutability means use cases such as
> IdentityHashMap will just work as expected, with no modification for a
> value-full world.
> 
> So if V <: Object, it feels we are still being “boxed” into the corner that ==
> is a substitutability test.  But, in generic / dynamically typed code, we are
> likely to discourage broad use of Object==, since the most common case (fast
> path comparison) is no longer as fast as it once was.
> 
> 
> We have a few other options to mitigate the performance concerns here:
> 
> - Live with legacy ACMP anomalies;
> - Re-explore a boxing relationship between V and Object.
> 
> If we say that == is substitutability, we still have the option to translate ==
> to something other than ACMP.  Which means that existing binaries (and likely,
> binaries recompiled with --source 8) will still use ACMP.  If we give ACMP the
> “false if value” interpretation, then existing classfiles (which mostly use ==
> as a fast-path check) will still work, as those tests should be backed up with
> .equals(), though they may suffer performance changes on recompilation.  This
> is an uncomfortable compromise, but is worth considering.  Down this route,
> ACMP has a much narrower portfolio, as we would not use it in translating most
> Object== unless we were sure we were dealing with identityful types.
> 
> The alternate route to preserving a narrower definition of == is to say that _at
> the language level_, values are not subtypes of Object.  Then, we can credibly
> say that the eclair companion type is the box, and there is a boxing conversion
> between V and I (putting the cream in the eclair is like putting it in a box.)
> This may seem like a huge step backwards, but it actually is a consistent
> world, and in this world, boxing is a super-lightweight operation.  The main
> concern here is that when a user assigns a value to an object/interface type,
> and then invokes Object.getClass(), they will see the value class — which
> perhaps we can present as “the runtime box is so light that you can’t even see
> it.”
> 
> Where this world runs into more trouble is with specialized generics; we’d like
> to treat specialized Foo<T> as being generic in “T extends Object”, which
> subsumes values.  This complicates things like bound computation and type
> inference, and also makes invoking Object methods trickier, since we have to do
> some sort of reasoning by parts (which we did in M3, but didn’t like it.)
> 
> tl;dr: if we want a unified type system where values are objects, then I think
> we have to take the obvious semantics for ==, and if we want to reduce the
> runtime impact on _old_ binaries, we should consider giving older
> binaries older semantics, and taking the discontinuity as the cost of
> unification.

