Value object equality & floating-point values

Fri Feb 9 18:13:03 UTC 2024

---- Original Message -----
> From: "daniel smith" <daniel.smith at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Friday, February 9, 2024 3:43:34 AM
> Subject: Value object equality & floating-point values

> Remi asked about the spec change last May that switched the `==` behavior on
> value objects that wrap floating points from a `doubleToLongBits` comparison to
> a `doubleToRawLongBits` comparison. Here's my recollection of the motivation.

Hello,

> 
> First, a good summary of the different versions of floating point equality can
> be found here:
> https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Double.html#equivalenceRelation
> 
> It discusses three different concepts of equality for type 'double'.
> 
> - Numerical equality: The behavior of == acting on doubles, with special
> treatment for NaNs (never equal to themselves) and +0/-0 (distinct but
> considered equal)
> 
> - Representational equivalence: The behavior of `Double.equals` and
> `doubleToLongBits`-based comparisons, distinguishing +0 from -0, but with all
> NaN bit patterns considered equal to each other
> 
> - Bitwise equivalence: The behavior of `doubleToRawLongBits`-based comparisons,
> distinguishing +0 from -0, and with every NaN bit pattern distinguished from
> every other
> 
> -----
> 
> Now turning to value objects.
> 
> Discussing the general concept of equivalence classes, the above reference has
> this to say: "At least for some purposes, all the members of an equivalence
> class are substitutable for each other. In particular, in a numeric expression
> equivalent values can be substituted for one another without changing the
> result of the expression, meaning changing the equivalence class of the result
> of the expression."
> 
> Value classes that wrap primitive floating point values will have their own
> notion of what version of "substitutable" they wish to work with, and so what
> equivalence classes they need. But, at bottom, the JVM and other applications
> need to have some least common denominator equivalence relation that supports
> substitutability for *all* value classes. That equivalence relation is bitwise
> equivalence.
> 
> That is, consider this class:
> 
> value class C {
>    private double d;
>    C(double d) { this.d = d; }
>    long bits() { return Double.doubleToRawLongBits(d); }
> }
> 
> C c1 = new C(Double.longBitsToDouble(0x7ff0000000000001L));
> C c2 = new C(Double.longBitsToDouble(0x7ff0000000000002L));
> assert c1.bits() != c2.bits();
> 
> Will this assert ever fail? Well, it depends on whether the JVM treats c1 and c2 as
> belonging to the same equivalence class. If it does, it's allowed to
> substitute c1 for c2 at any time. I think it's pretty clear that would be a
> mistake. So the JVM internals need to be operating in terms of bitwise
> equivalence of nested floating-point values.

As you said, there are three possible equivalence relations. Numerical equality is not really an equivalence relation (NaN is not equal to itself), so let's rule it out.
That leaves the choice between bitwise equivalence and representational equivalence.
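
A minimal sketch in today's Java (no value classes involved) of why numerical equality is ruled out. The class and method names are made up for illustration; only `Double.doubleToLongBits` and the primitive `==` come from the platform.

```java
public class NumericalEqualityDemo {
    // Numerical equality: the primitive == operator on doubles.
    static boolean numericallyEqual(double a, double b) {
        return a == b;
    }

    // Representational equivalence: compare the canonicalized bit patterns.
    static boolean representationallyEqual(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        // Not reflexive: NaN is never numerically equal to itself.
        System.out.println(numericallyEqual(nan, nan));          // false
        // Not substitutable: +0.0 and -0.0 compare equal but behave differently.
        System.out.println(numericallyEqual(0.0, -0.0));         // true
        System.out.println(1.0 / 0.0 == 1.0 / -0.0);             // false
        // Representational equivalence is reflexive and keeps the zeros apart.
        System.out.println(representationallyEqual(nan, nan));   // true
        System.out.println(representationallyEqual(0.0, -0.0));  // false
    }
}
```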

Whichever equivalence relation we choose, it will be the definition of substitutability.
If we choose representational equivalence, the VM has more leeway to optimize, because it may substitute one instance of C for another regardless of how NaN is encoded.
If we choose bitwise equivalence, the VM may only substitute instances whose NaN values have exactly the same bit pattern.

> I think it's pretty clear that would be a mistake.

I do not compute that statement :)

Why do you want users to care about the bitwise representation of NaN?
Both 0x7ff0000000000001L and 0x7ff0000000000002L represent NaN. If we print c1.d and c2.d, both print NaN; if we use c1.d or c2.d in a numeric computation, both behave as NaN.
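
As a rough illustration in plain Java, the two NaN values are indistinguishable except through `doubleToRawLongBits`. The class name, helper names and constants are hypothetical. The constants are quiet-NaN patterns rather than the ones used above, on the assumption that the platform preserves their payload bits through `longBitsToDouble`, which the API documentation does not guarantee everywhere.

```java
public class NaNPayloadDemo {
    // Two NaNs with different payloads (assumed values, chosen for illustration).
    static final double NAN_A = Double.longBitsToDouble(0x7ff8000000000001L);
    static final double NAN_B = Double.longBitsToDouble(0x7ff8000000000002L);

    // Bitwise equivalence: every NaN bit pattern is distinct.
    static boolean bitwiseEquivalent(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    // Representational equivalence: all NaNs collapse to one canonical pattern.
    static boolean representationallyEquivalent(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println(NAN_A);                                        // NaN
        System.out.println(NAN_B);                                        // NaN
        System.out.println(Double.isNaN(NAN_A + 1.0));                    // true
        System.out.println(Double.valueOf(NAN_A).equals(NAN_B));          // true
        System.out.println(representationallyEquivalent(NAN_A, NAN_B));   // true
        // Expected to print false wherever the NaN payload bits are preserved.
        System.out.println(bitwiseEquivalent(NAN_A, NAN_B));
    }
}
```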

[...]

> So we know Java programmers need to be conversant in at least two versions of
> value object equality: universal substitutability (using bitwise equivalence
> for floating points), and domain equivalence (defined by 'equals' methods). And
> traditionally, '==' on objects has been understood to mean universal
> substitutability. Do we really want to complicate matters further by asking
> programmers to keep track of *three* object equivalence relations, and teaching
> them that '==' doesn't *really* mean substitutability anymore? We decided that
> wasn't worth the trouble—ultimately, we just want to continue to encourage them
> to use 'equals' in most contexts.

Your example is compatible with both bitwise equivalence and representational equivalence, because the only difference between the two equivalence relations is the treatment of NaN.
So the only case where using representational equivalence for substitutability is an issue is when you want equals() to use bitwise equivalence. In this specific case, it will not work.

If we were mathematicians, that would be the end of the discussion. But we are designing a programming language, so we have to weigh the drawbacks of not using representational equivalence against the fact that, if we choose representational equivalence for value classes, a class whose equals() uses bitwise equivalence cannot be declared as a value class.

For me, there are serious drawbacks to using bitwise equivalence, because it clashes with the other places where we already use representational equivalence:
- bitwise equivalence is pretty obscure and hard to debug, given that the string representation is compatible with representational equivalence,
- the behavior of java.lang.Double and java.lang.Float becomes different from that of the other wrapper types, for which == and equals() have the same semantics,
- the semantics of equals() in a record is based on representational equivalence, so a value record with floating-point components will also have an == and an equals() that disagree.

Using your example, but with a value record (supposing bitwise equivalence):

value record C(double d) { }
C c1 = new C(Double.longBitsToDouble(0x7ff0000000000001L));
C c2 = new C(Double.longBitsToDouble(0x7ff0000000000002L));

System.out.println(c1);  // C[d=NaN]
System.out.println(c2);  // C[d=NaN]
System.out.println(c1 == c2);  // false ??
System.out.println(c1.equals(c2));  // true
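
The toString() and equals() half of this can be checked today with an ordinary (identity) record, since record equals() compares double components the way Double.compare does. In the sketch below, `bitwiseSame` is a hypothetical stand-in for what `==` would do on a value record under bitwise equivalence; it is not an existing API. The NaN constants are again assumed quiet-NaN patterns, not the ones from the example above.

```java
public class RecordNaNDemo {
    // An ordinary record; `value record` is not available in standard Java today.
    record C(double d) { }

    // Hypothetical stand-in for `==` on a value record under bitwise equivalence.
    static boolean bitwiseSame(C a, C b) {
        return Double.doubleToRawLongBits(a.d()) == Double.doubleToRawLongBits(b.d());
    }

    public static void main(String[] args) {
        C c1 = new C(Double.longBitsToDouble(0x7ff8000000000001L));
        C c2 = new C(Double.longBitsToDouble(0x7ff8000000000002L));

        System.out.println(c1);             // C[d=NaN]
        System.out.println(c2);             // C[d=NaN]
        System.out.println(c1.equals(c2));  // true
        // Expected to print false wherever the NaN payload bits are preserved.
        System.out.println(bitwiseSame(c1, c2));
    }
}
```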

Rémi