Value object equality & floating-point values

Sat Feb 10 20:57:06 UTC 2024

----- Original Message -----
> From: "Stephen Colebourne" <scolebourne at joda.org>
> To: "Valhalla Expert Group Observers" <valhalla-spec-observers at openjdk.org>
> Cc: "daniel smith" <daniel.smith at oracle.com>
> Sent: Saturday, February 10, 2024 1:57:47 PM
> Subject: Re: Value object equality & floating-point values

> The Java SE specification in java.lang.Double says this:
> 
>|  No IEEE 754 floating-point operation provided by Java can distinguish
>|  between two NaN values of the same type with different bit
>|  patterns. Distinct values of NaN are only distinguishable by
>|  use of the {@code Double.doubleToRawLongBits} method.
> (Same text since at least Java 6 AFAICT)
> 
> Note the *only distinguishable* part. The proposal below breaks this,
> as it provides a second way to observe different kinds of NaN. Not
> only that, but unlike doubleToRawLongBits which very few people know
> about, == on values would be a very mainstream part of the language.
> 
> Under your proposal developers would need to handle all three kinds of
> equivalence:
> - Numeric - for double == double
> - Representational - for a double wrapped by a record
> - Bitwise - for a double wrapped by a value class
> Surely it doesn't make sense for two different kinds of wrapping to
> result in two different behaviours, neither of which matches the
> unwrapped behaviour??!!
> 
> It is my opinion that exposing the concept of different bit patterns
> of NaN to most developers would be a significant retrograde step for
> Java. The rules of Java have always been simple wrt doubles -
> Representational equivalence except for math-style rules on primitive
> doubles.
> 

Yes !
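
To make the three kinds concrete, here is a minimal sketch on plain doubles. The class and method names (DoubleEquivalence, numeric, representational, bitwise) are made up for illustration; only the java.lang.Double calls are real API. The two NaN payloads are quiet NaNs chosen arbitrarily.

```java
public class DoubleEquivalence {
    // Numerical equality: the primitive == operator.
    static boolean numeric(double a, double b) {
        return a == b;
    }

    // Representational equivalence: the comparison Double.equals performs.
    static boolean representational(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    // Bitwise equivalence: compares the raw bit patterns, NaN payloads included.
    static boolean bitwise(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        // Two quiet NaNs with different payloads (illustrative values).
        double nan1 = Double.longBitsToDouble(0x7ff8000000000001L);
        double nan2 = Double.longBitsToDouble(0x7ff8000000000002L);

        System.out.println(numeric(nan1, nan2));          // false: NaN is never == NaN
        System.out.println(representational(nan1, nan2)); // true: all NaNs collapse to one
        System.out.println(bitwise(nan1, nan2));          // false where the payloads are preserved

        System.out.println(numeric(0.0, -0.0));           // true
        System.out.println(representational(0.0, -0.0));  // false
        System.out.println(bitwise(0.0, -0.0));           // false
    }
}
```

Representational and bitwise only disagree on NaN payloads, which is exactly the case almost nobody wants to think about.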

> 
> A proposed solution - normalization
> I believe there is a simple approach that also works to explain the
> behaviour of java.lang.Float and java.lang.Double equals().
> 
> * For each `float` or `double` field in a value class, the constructor
> will generate normalization code
> * The normalization is equivalent to `longBitsToDouble(doubleToLongBits(field))`
> * Normalization also applies to java.lang.Float and java.lang.Double
> * == is a Bitwise implementation, but behaves like Representational
> for developers
> 
> If deemed important, there could be a mechanism to opt out of
> auto-generated normalization (I personally don't think the use case is
> strong enough).
> 
> Note that the outcome of this is that all value types consisting only
> of primitive type fields have == the same as the record-like .equals()
> definition, which is a very good outcome.

yes !

And also all wrappers of primitive types have == the same as their .equals() definition.
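
A rough sketch of what the normalization buys us, written as an ordinary final class because value classes are not available outside the Valhalla builds. NormalizedWrapper, rawBits and bitwiseSame are hypothetical names; bitwiseSame only stands in for what a field-wise bitwise == on a value object would do, and the normalization is written by hand here rather than generated by the compiler.

```java
public final class NormalizedWrapper {
    private final double d;

    NormalizedWrapper(double d) {
        // The normalization from Stephen's proposal: every NaN is replaced
        // by the canonical NaN, all other values pass through unchanged.
        this.d = Double.longBitsToDouble(Double.doubleToLongBits(d));
    }

    long rawBits() {
        return Double.doubleToRawLongBits(d);
    }

    // Stand-in for a bitwise == on a value object (assumption, see above).
    static boolean bitwiseSame(NormalizedWrapper a, NormalizedWrapper b) {
        return a.rawBits() == b.rawBits();
    }

    public static void main(String[] args) {
        double nan1 = Double.longBitsToDouble(0x7ff8000000000001L);
        double nan2 = Double.longBitsToDouble(0x7ff8000000000002L);

        // Bitwise comparison after normalization agrees with Double.equals.
        System.out.println(bitwiseSame(new NormalizedWrapper(nan1),
                                       new NormalizedWrapper(nan2)));  // true
        System.out.println(Double.valueOf(nan1).equals(nan2));         // true

        // +0.0 and -0.0 stay distinct, again like Double.equals.
        System.out.println(bitwiseSame(new NormalizedWrapper(0.0),
                                       new NormalizedWrapper(-0.0)));  // false
        System.out.println(Double.valueOf(0.0).equals(-0.0));          // false
    }
}
```

Once the field is normalized, the VM can keep comparing bits and the user still sees representational equivalence.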

> 
> Stephen
> 

Rémi

> 
> On Fri, 9 Feb 2024 at 02:43, Dan Smith <daniel.smith at oracle.com> wrote:
>>
>> Remi asked about the spec change last May that switched the `==` behavior on
>> value objects that wrap floating points from a `doubleToLongBits` comparison to
>> a `doubleToRawLongBits` comparison. Here's my recollection of the motivation.
>>
>> First, a good summary of the different versions of floating point equality can
>> be found here:
>> https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Double.html#equivalenceRelation
>>
>> It discusses three different concepts of equality for type 'double'.
>>
>> - Numerical equality: The behavior of == acting on doubles, with special
>> treatment for NaNs (never equal to themselves) and +0/-0 (distinct but
>> considered equal)
>>
>> - Representational equivalence: The behavior of `Double.equals` and
>> `doubleToLongBits`-based comparisons, distinguishing +0 from -0, but with all
>> NaN bit patterns considered equal to each other
>>
>> - Bitwise equivalence: The behavior of `doubleToRawLongBits`-based comparisons,
>> distinguishing +0 from -0, and with every NaN bit pattern distinguished from
>> every other
>>
>> -----
>>
>> Now turning to value objects.
>>
>> Discussing the general concept of equivalence classes, the above reference has
>> this to say: "At least for some purposes, all the members of an equivalence
>> class are substitutable for each other. In particular, in a numeric expression
>> equivalent values can be substituted for one another without changing the
>> result of the expression, meaning changing the equivalence class of the result
>> of the expression."
>>
>> Value classes that wrap primitive floating point values will have their own
>> notion of what version of "substitutable" they wish to work with, and so what
>> equivalence classes they need. But, at bottom, the JVM and other applications
>> need to have some least common denominator equivalence relation that supports
>> substitutability for *all* value classes. That equivalence relation is bitwise
>> equivalence.
>>
>> That is, consider this class:
>>
>> value class C {
>>     private double d;
>>     C(double d) { this.d = d; }
>>     long bits() { return Double.doubleToRawLongBits(d); }
>> }
>>
>> C c1 = new C(Double.longBitsToDouble(0x7ff0000000000001L));
>> C c2 = new C(Double.longBitsToDouble(0x7ff0000000000002L));
>> assert c1.bits() != c2.bits();
>>
>> Will this assert ever fail? Well, it depends on whether the JVM treats c1 and c2 as
>> belonging to the same equivalence class. If it does, it's allowed to
>> substitute c1 for c2 at any time. I think it's pretty clear that would be a
>> mistake. So the JVM internals need to be operating in terms of bitwise
>> equivalence of nested floating-point values.
>>
>> Now consider another class:
>>
>> value class D {
>>     double d;
>>     D(double d) { this.d = d; }
>>     public boolean equals(Object o) {
>>         return o instanceof D that && Math.abs(this.d - that.d) < 0.00001d;
>>     }
>> }
>>
>> D d1 = new D(0.3);
>> D d2 = new D(0.1+0.2);
>> assert d1.d != d2.d;
>>
>> Now we've got a class that wants to work with a much chunkier equivalence
>> relation. (I kind of suspect this isn't an equivalence relation at all, sorry,
>> floating-point experts. But you get the idea.) This class wouldn't mind if the
>> VM *did* randomly swap out d1 for d2, because *in this application*, they're
>> substitutable.
>>
>> So: different classes will have different needs, we can't anticipate them all,
>> but in certain contexts that lack domain knowledge (like VM optimizations),
>> bitwise equivalence must be used.
>>
>> Finally: must '==' be defined to reflect "least common denominator"
>> substitutability, or could it be something else? Perhaps representational
>> equivalence, which has some nice properties and can be conveniently expressed
>> in terms of Double.equals?
>>
>> In theory, sure, there's no reason we couldn't use representational equivalence
>> for '==', and provide some other path to bitwise equivalence
>> (Objects.isSubstitutable?).
>>
>> But again, note that every class has its own domain-specific equivalence
>> relation needs. This is captured by 'equals'. (Beyond floating point
>> interpretations, don't forget that '==' will often not be the equivalence
>> relation that value classes want for their identity object fields, so they'll
>> need to override the default equals and make some recursive 'equals' calls.)
>>
>> So we know Java programmers need to be conversant in at least two versions of
>> value object equality: universal substitutability (using bitwise equivalence
>> for floating points), and domain equivalence (defined by 'equals' methods). And
>> traditionally, '==' on objects has been understood to mean universal
>> substitutability. Do we really want to complicate matters further by asking
>> programmers to keep track of *three* object equivalence relations, and teaching
>> them that '==' doesn't *really* mean substitutability anymore? We decided that
>> wasn't worth the trouble—ultimately, we just want to continue to encourage them
>> to use 'equals' in most contexts.
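
On the aside about class D above: the suspicion is right, a tolerance-based comparison is not an equivalence relation because it is not transitive. A minimal sketch, where closeEnough is a hypothetical helper using the same 0.00001 tolerance as D.equals, and the three sample values are chosen arbitrarily:

```java
public class ToleranceEquality {
    // Same comparison as in D.equals, lifted onto plain doubles.
    static boolean closeEnough(double a, double b) {
        return Math.abs(a - b) < 0.00001d;
    }

    public static void main(String[] args) {
        double a = 0.0;
        double b = 0.000006;
        double c = 0.000012;

        System.out.println(closeEnough(a, b)); // true
        System.out.println(closeEnough(b, c)); // true
        System.out.println(closeEnough(a, c)); // false: transitivity fails
    }
}
```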