Equality for values -- new analysis, same conclusion

Tue Aug 20 20:09:25 UTC 2019

The latest proposal at the tip of this thread lands in the following place for the LANGUAGE:

 - Totalize `==` over all types to mean SAME== (as it already does for all existing types today);
 - Totalize `.equals()` over all types to mean EQ==, which means adding primitives to the `.equals()` game.

The result is two total equivalence relations, each with semantics that are useful in some situations. At the same time, by issuing warnings, we can start to discourage excessive use of Object== whenever the compiler can’t prove that both operands are identity class instances.
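A minimal sketch of the three relations in play, in Java 16+ with records standing in for value classes. This is an assumption for illustration only: records are not value types, and `idEq`, `isSame`, `eqEq`, `Point`, and `Name` are hypothetical names. `isSame` approximates SAME== as substitutability: identity for ordinary objects, and for records the same class with every component SAME==, comparing floating-point components by raw bit pattern (also an assumption of this sketch).

```java
import java.lang.reflect.RecordComponent;
import java.util.Objects;

// Records stand in for value classes (assumption): they have no meaningful
// identity of their own, so a component-wise comparison makes sense for them.
record Point(int x, int y) {}
record Name(String s) {}

final class EqualitySketch {

    // ID==: reference identity, i.e. what ACMP computes today.
    static boolean idEq(Object a, Object b) {
        return a == b;
    }

    // SAME== (substitutability), sketched: identity for ordinary objects;
    // for records, same class and every component SAME==.
    static boolean isSame(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null) return false;
        Class<?> c = a.getClass();
        if (!c.isRecord() || c != b.getClass()) return false;
        try {
            for (RecordComponent rc : c.getRecordComponents()) {
                Object x = rc.getAccessor().invoke(a);
                Object y = rc.getAccessor().invoke(b);
                Class<?> t = rc.getType();
                if (t == double.class) {
                    // Assumption: floating-point components compare by raw bits.
                    if (Double.doubleToRawLongBits((Double) x) != Double.doubleToRawLongBits((Double) y)) return false;
                } else if (t == float.class) {
                    if (Float.floatToRawIntBits((Float) x) != Float.floatToRawIntBits((Float) y)) return false;
                } else if (t.isPrimitive()) {
                    if (!x.equals(y)) return false; // boxed int, long, char, ... compare by value
                } else if (!isSame(x, y)) {
                    return false;                   // reference component: recurse
                }
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // EQ==: whatever equals() says (null-safe).
    static boolean eqEq(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2), q = new Point(1, 2);
        Name m = new Name(new String("a")), n = new Name(new String("a"));
        System.out.println(idEq(p, q) + " " + isSame(p, q) + " " + eqEq(p, q)); // false true true
        System.out.println(idEq(m, n) + " " + isSame(m, n) + " " + eqEq(m, n)); // false false true
    }
}
```

The `Name` case shows where SAME== and EQ== part ways: the two `String` components are equal but not identical, so the records are not substitutable even though `equals()` says they are.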

Now, that leaves the question of what to do in the VM, and how to bridge the two (via the translation strategy). For the purposes of this message, we’ll assume the first of the two paths, where V <: Object.

> If we say that == is substitutability, we still have the option to translate == to something other than ACMP.  Which means that existing binaries (and likely, binaries recompiled with --source 8) will still use ACMP.  If we give ACMP the “false if value” interpretation, then existing classfiles (which mostly use == as a fast-path check) will still work, as those tests should be backed up with .equals(), though they may suffer performance changes on recompilation.  This is an uncomfortable compromise, but is worth considering.  Down this route, ACMP has a much narrower portfolio, as we would not use it in translating most Object== unless we were sure we were dealing with identityful types.

Currently, Object== translates to ACMP byte codes, which have ID== semantics. The VM folks understandably want to avoid perturbing ACMP, especially for legacy code. We have the option to translate Object== differently between (say) --target 14 and --target 15: for pre-Valhalla language levels we translate to ACMP, which retains the “false if either operand is a value” semantics, and for the new target we translate to something else that means “SAME==”. This is a tradeoff between not creating performance potholes for legacy code that does not use values, and creating discontinuous behavior when migrating old code forward. It means code compiled for later JVMs will use a more refined implementation of Object==. (If we believe it’s acceptable to return false always, it should also be acceptable to return false _sometimes_ but return true when the two values are not externally distinguishable.)
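To see why the “false if value” interpretation keeps existing classfiles working, and why a substitutability-based answer is no less safe, here is a sketch of the common `a == b || a.equals(b)` idiom under both interpretations of ACMP. `ValueLike`, `Money`, `acmpFalseIfValue`, `acmpSubstitutable`, and `legacyEquals` are hypothetical stand-ins, not VM or JDK APIs; the substitutability variant leans on record `equals()`, which for a single `long` component amounts to a field-wise comparison.

```java
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical marker for "instance of a value class"; not a real JDK type.
interface ValueLike {}

// A single long component, so record equals() is a field-wise comparison.
record Money(long cents) implements ValueLike {}

final class AcmpSketch {

    // "False if value": ACMP answers false whenever either operand is a value.
    static boolean acmpFalseIfValue(Object a, Object b) {
        if (a instanceof ValueLike || b instanceof ValueLike) return false;
        return a == b;
    }

    // Substitutability: values compare field-wise, everything else by identity.
    static boolean acmpSubstitutable(Object a, Object b) {
        if (a instanceof ValueLike || b instanceof ValueLike) {
            return a != null && b != null && a.getClass() == b.getClass() && a.equals(b);
        }
        return a == b;
    }

    // Typical legacy idiom: == as a fast path, backed up by equals().
    static boolean legacyEquals(Object a, Object b, BiPredicate<Object, Object> acmp) {
        return acmp.test(a, b) || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        Object[] xs = { new Money(100), new Money(100), new Money(200), "x", null };
        for (Object a : xs) {
            for (Object b : xs) {
                boolean r1 = legacyEquals(a, b, AcmpSketch::acmpFalseIfValue);
                boolean r2 = legacyEquals(a, b, AcmpSketch::acmpSubstitutable);
                // Both ACMP interpretations give the same answer as equals() alone.
                if (r1 != r2 || r1 != Objects.equals(a, b)) throw new AssertionError(a + " vs " + b);
            }
        }
        System.out.println("legacy idiom agrees under both interpretations");
    }
}
```

Note that `acmpFalseIfValue(m, m)` is false even when both operands are the same reference, so this interpretation is only harmless where `==` is a fast path with an `equals()` fallback; code that relies on `==` alone would observe the change.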

Cue Dan to say: “OK, do we have benchmarks that differentiate between the ‘false if value’ and ‘deep == if value’ options? Do we have reason to believe that the former is better enough to risk the discontinuity?”

Assuming the benchmarks bear out the sense of doing so, where we end up is that ACMP becomes the ID== operator, and we move the SAME== operator somewhere else (either to a new byte code, or to an intrinsified static method).
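The resulting translation strategy could be sketched as a compiler-side decision for each Object== site. Everything here is hypothetical: `EqTranslation`, `ObjectEqTranslator`, and the target cutoff (15, following the “(say)” in the text) are illustrative, and `INVOKE_SAME` stands for whichever of the new byte code or the intrinsified static method is chosen.

```java
// Hypothetical: which form a compiler might emit for Object== at a given target.
enum EqTranslation { ACMP, INVOKE_SAME }

final class ObjectEqTranslator {
    // Assumed first target level that knows about values (illustrative only).
    static final int FIRST_VALUE_TARGET = 15;

    static EqTranslation translate(int target, boolean bothOperandsProvablyIdentity) {
        if (target < FIRST_VALUE_TARGET) {
            return EqTranslation.ACMP;        // legacy levels: byte code unchanged
        }
        if (bothOperandsProvablyIdentity) {
            return EqTranslation.ACMP;        // ID== and SAME== coincide for identity objects
        }
        return EqTranslation.INVOKE_SAME;     // new byte code or intrinsified static method
    }

    public static void main(String[] args) {
        System.out.println(translate(14, false)); // ACMP
        System.out.println(translate(15, false)); // INVOKE_SAME
        System.out.println(translate(15, true));  // ACMP
    }
}
```

Keeping ACMP for provably identity-typed operands is what gives ACMP its “much narrower portfolio” on the new target, while legacy targets keep emitting exactly what they emit today.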