Value object equality & floating-point values
Stephen Colebourne
scolebourne at joda.org
Sat Feb 10 12:57:47 UTC 2024
The Java SE specification in java.lang.Double says this:
| No IEEE 754 floating-point operation provided by Java can distinguish
| between two NaN values of the same type with different bit
| patterns. Distinct values of NaN are only distinguishable by
| use of the {@code Double.doubleToRawLongBits} method.
(Same text since at least Java 6 AFAICT)
Note the *only distinguishable* part. The proposal below breaks this,
as it provides a second way to observe different kinds of NaN. Not
only that, but unlike doubleToRawLongBits which very few people know
about, == on values would be a very mainstream part of the language.
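
A minimal sketch of what the quoted Javadoc describes, in today's Java
(no value classes). The class name `NanBits` and the two bit patterns
are hypothetical, chosen only for illustration. Both patterns are quiet
NaNs; whether a payload survives `longBitsToDouble` is
platform-dependent, so the last line is what common hardware is
expected to print rather than a guarantee.

```java
final class NanBits {
    // Two quiet NaNs that differ only in their payload bits (illustrative values).
    static final long BITS_A = 0x7ff8000000000001L;
    static final long BITS_B = 0x7ff8000000000002L;

    // Bitwise view: keeps whatever NaN payload the value carries.
    static long raw(double d) {
        return Double.doubleToRawLongBits(d);
    }

    // Representational view: every NaN collapses to the canonical NaN bits.
    static long canonical(double d) {
        return Double.doubleToLongBits(d);
    }

    public static void main(String[] args) {
        double a = Double.longBitsToDouble(BITS_A);
        double b = Double.longBitsToDouble(BITS_B);

        System.out.println(a == b);                       // false: NaN is never == to anything
        System.out.println(Double.valueOf(a).equals(b));  // true: Double.equals treats all NaNs alike
        System.out.println(canonical(a) == canonical(b)); // true
        System.out.println(raw(a) == raw(b));             // expected false where payloads are preserved
    }
}
```

Only the `raw` comparison tells the two values apart, which is the
single observation channel the Javadoc refers to.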
Under your proposal developers would need to handle all three kinds of
equivalence:
- Numeric - for double == double
- Representational - for a double wrapped by a record
- Bitwise - for a double wrapped by a value class
Surely it doesn't make sense for two different kinds of wrapping to
result in two different behaviours, neither of which matches the
unwrapped behaviour??!!
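
The three relations can be sketched side by side in current Java
(records need Java 16 or later). This is an illustration under stated
assumptions, not the proposal's implementation: value classes are not
available in released Java, so the bitwise case is stood in for by a
raw-bits comparison, and the names `ThreeEquivalences` and `Wrapped`
are hypothetical.

```java
final class ThreeEquivalences {
    // A record wrapping a double; its generated equals() compares the
    // component the way Double.compare does.
    record Wrapped(double d) {}

    // Numeric: what == does on primitive doubles.
    static boolean numeric(double a, double b) {
        return a == b;
    }

    // Representational: what a record wrapper's equals() gives.
    static boolean representational(double a, double b) {
        return new Wrapped(a).equals(new Wrapped(b));
    }

    // Bitwise: stand-in for == on a value class under the quoted proposal.
    static boolean bitwise(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        double nanA = Double.longBitsToDouble(0x7ff8000000000001L);
        double nanB = Double.longBitsToDouble(0x7ff8000000000002L);

        // +0.0 versus -0.0: only the numeric relation calls them equal.
        System.out.println(numeric(0.0, -0.0) + " "
                + representational(0.0, -0.0) + " " + bitwise(0.0, -0.0));

        // Two NaNs with different payloads: only the representational
        // relation is certain to call them equal.
        System.out.println(numeric(nanA, nanB) + " "
                + representational(nanA, nanB) + " " + bitwise(nanA, nanB));
    }
}
```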
It is my opinion that exposing the concept of different bit patterns
of NaN to most developers would be a significant retrograde step for
Java. The rules of Java have always been simple wrt doubles -
Representational equivalence except for math-style rules on primitive
doubles.
A proposed solution - normalization
I believe there is a simple approach that also works to explain the
behaviour of java.lang.Float and java.lang.Double equals().
* For each `float` or `double` field in a value class, the constructor
will generate normalization code
* The normalization is equivalent to `longBitsToDouble(doubleToLongBits(field))`
(or `intBitsToFloat(floatToIntBits(field))` for a `float` field)
* Normalization also applies to java.lang.Float and java.lang.Double
* == is a Bitwise implementation, but behaves like Representational
for developers
If deemed important, there could be a mechanism to opt out of
auto-generated normalization (I personally don't think the use case is
strong enough).
Note that the outcome of this is that all value types consisting only
of primitive type fields have == the same as the record-like .equals()
definition, which is a very good outcome.
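
A rough sketch of the normalization idea, assuming an ordinary final
class as a stand-in for a value class (value classes are not in
released Java, and the normalization would be compiler-generated
rather than hand-written). The names `NormalizedDouble`, `rawBits` and
`bitwiseEquals` are hypothetical.

```java
final class NormalizedDouble {
    private final double value;

    NormalizedDouble(double value) {
        // Hand-written version of the proposed generated code: every NaN
        // is replaced by the canonical NaN; all other values, including
        // +0.0 and -0.0, pass through unchanged.
        this.value = Double.longBitsToDouble(Double.doubleToLongBits(value));
    }

    long rawBits() {
        return Double.doubleToRawLongBits(value);
    }

    // Stand-in for a bitwise == on the wrapper.
    static boolean bitwiseEquals(NormalizedDouble a, NormalizedDouble b) {
        return a.rawBits() == b.rawBits();
    }

    public static void main(String[] args) {
        NormalizedDouble a = new NormalizedDouble(Double.longBitsToDouble(0x7ff8000000000001L));
        NormalizedDouble b = new NormalizedDouble(Double.longBitsToDouble(0x7ff8000000000002L));
        System.out.println(bitwiseEquals(a, b)); // NaN payloads no longer observable
        System.out.println(bitwiseEquals(new NormalizedDouble(0.0), new NormalizedDouble(-0.0)));
    }
}
```

After construction, a bitwise comparison of the stored field agrees
with `Double.equals` on the original inputs: NaNs compare equal, and
+0.0 and -0.0 stay distinct.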
Stephen
On Fri, 9 Feb 2024 at 02:43, Dan Smith <daniel.smith at oracle.com> wrote:
>
> Remi asked about the spec change last May that switched the `==` behavior on value objects that wrap floating points from a `doubleToLongBits` comparison to a `doubleToRawLongBits` comparison. Here's my recollection of the motivation.
>
> First, a good summary of the different versions of floating point equality can be found here:
> https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Double.html#equivalenceRelation
>
> It discusses three different concepts of equality for type 'double'.
>
> - Numerical equality: The behavior of == acting on doubles, with special treatment for NaNs (never equal to themselves) and +0/-0 (distinct but considered equal)
>
> - Representational equivalence: The behavior of `Double.equals` and `doubleToLongBits`-based comparisons, distinguishing +0 from -0, but with all NaN bit patterns considered equal to each other
>
> - Bitwise equivalence: The behavior of `doubleToRawLongBits`-based comparisons, distinguishing +0 from -0, and with every NaN bit pattern distinguished from every other
>
> -----
>
> Now turning to value objects.
>
> Discussing the general concept of equivalence classes, the above reference has this to say: "At least for some purposes, all the members of an equivalence class are substitutable for each other. In particular, in a numeric expression equivalent values can be substituted for one another without changing the result of the expression, meaning changing the equivalence class of the result of the expression."
>
> Value classes that wrap primitive floating point values will have their own notion of what version of "substitutable" they wish to work with, and so what equivalence classes they need. But, at bottom, the JVM and other applications need to have some least common denominator equivalence relation that supports substitutability for *all* value classes. That equivalence relation is bitwise equivalence.
>
> That is, consider this class:
>
> value class C {
>     private double d;
>     C(double d) { this.d = d; }
>     long bits() { return Double.doubleToRawLongBits(d); }
> }
>
> C c1 = new C(Double.longBitsToDouble(0x7ff0000000000001L));
> C c2 = new C(Double.longBitsToDouble(0x7ff0000000000002L));
> assert c1.bits() != c2.bits();
>
> Will this assert ever fail? Well, it depends on whether the JVM treats c1 and c2 as belonging to the same equivalence class. If it does, it's allowed to substitute c1 for c2 at any time. I think it's pretty clear that would be a mistake. So the JVM internals need to be operating in terms of bitwise equivalence of nested floating-point values.
>
> Now consider another class:
>
> value class D {
>     double d;
>     D(double d) { this.d = d; }
>     public boolean equals(Object o) {
>         return o instanceof D that && Math.abs(this.d - that.d) < 0.00001d;
>     }
> }
>
> D d1 = new D(0.3);
> D d2 = new D(0.1+0.2);
> assert d1.d != d2.d;
>
> Now we've got a class that wants to work with a much chunkier equivalence relation. (I kind of suspect this isn't an equivalence relation at all, sorry, floating-point experts. But you get the idea.) This class wouldn't mind if the VM *did* randomly swap out d1 for d2, because *in this application*, they're substitutable.
>
> So: different classes will have different needs, we can't anticipate them all, but in certain contexts that lack domain knowledge (like VM optimizations), bitwise equivalence must be used.
>
> Finally: must '==' be defined to reflect "least common denominator" substitutability, or could it be something else? Perhaps representational equivalence, which has some nice properties and can be conveniently expressed in terms of Double.equals?
>
> In theory, sure, there's no reason we couldn't use representational equivalence for '==', and provide some other path to bitwise equivalence (Objects.isSubstitutable?).
>
> But again, note that every class has its own domain-specific equivalence relation needs. This is captured by 'equals'. (Beyond floating point interpretations, don't forget that '==' will often not be the equivalence relation that value classes want for their identity object fields, so they'll need to override the default equals and make some recursive 'equals' calls.)
>
> So we know Java programmers need to be conversant in at least two versions of value object equality: universal substitutability (using bitwise equivalence for floating points), and domain equivalence (defined by 'equals' methods). And traditionally, '==' on objects has been understood to mean universal substitutability. Do we really want to complicate matters further by asking programmers to keep track of *three* object equivalence relations, and teaching them that '==' doesn't *really* mean substitutability anymore? We decided that wasn't worth the trouble—ultimately, we just want to continue to encourage them to use 'equals' in most contexts.
>
More information about the valhalla-spec-observers
mailing list