raw floating-point bits in '==' value object comparisons (again/still)

Mon Mar 11 18:58:24 UTC 2024

On 11 Mar 2024, at 6:33, Remi Forax wrote:

> Last week, I explained at JChateau (think JCrete in France, less sun, more chateaux) how value types work from the user POV, describing, among other subjects, the semantics of ==.
>
> First, most of the attendees knew the semantic difference between == on double and Double.equals(). I suppose it's because people who attend such (un-)conferences have a more intimate knowledge of Java than an average developer. Second, no attendee knew that NaN was a prefix.

So you are using conversations with people who admittedly did not understand the problem, until you explained it to them in the moment, to motivate a design change?  That does not seem promising as a way to predict the user experience after years of learned familiarity.

In short: Sorry, I don’t buy this argument.
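
For reference, the three notions of sameness under discussion can be sketched in today's Java, without value classes. This is only an illustration: the class name, the helper names, and the two NaN payloads are made up for the example, and quiet NaNs are used because `Double.longBitsToDouble` is not guaranteed to preserve signaling NaN bit patterns on every platform.

```java
class NanSameness {
    // Two quiet NaNs that differ only in their payload bits (arbitrary example values).
    static final double NAN_A = Double.longBitsToDouble(0x7ff8000000000001L);
    static final double NAN_B = Double.longBitsToDouble(0x7ff8000000000002L);

    // IEEE 754 comparison, which is what == does on primitive doubles.
    static boolean ieeeEquals(double x, double y) {
        return x == y;
    }

    // Representational equivalence, the rule used by Double::equals:
    // doubleToLongBits collapses every NaN to one canonical bit pattern.
    static boolean representationalEquals(double x, double y) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
    }

    // Bitwise equivalence: raw bits, NaN payloads included.
    static boolean bitwiseEquals(double x, double y) {
        return Double.doubleToRawLongBits(x) == Double.doubleToRawLongBits(y);
    }

    public static void main(String[] args) {
        System.out.println("NaN == NaN (IEEE):            " + ieeeEquals(NAN_A, NAN_A));
        System.out.println("0.0 == -0.0 (IEEE):           " + ieeeEquals(0.0, -0.0));
        System.out.println("NAN_A ~ NAN_B (Double.equals): " + representationalEquals(NAN_A, NAN_B));
        System.out.println("0.0 ~ -0.0 (Double.equals):    " + representationalEquals(0.0, -0.0));
        System.out.println("NAN_A ~ NAN_B (bitwise):       " + bitwiseEquals(NAN_A, NAN_B));
    }
}
```

The only inputs on which the second and third rules disagree are NaNs with different payloads.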

>
> So it made me think again on that subject.
>
> 1) Dan's argument that we want to be able to create a class with two different NaNs does not hold, because instead of storing the values as double, the values can be stored as long.
>
>   value class C {
>       private double d;
>       C(double d) { this.d = d; }
>       long bits() { return Double.doubleToRawLongBits(d); }
>   }
>
>   C c1 = new C(Double.longBitsToDouble(0x7ff0000000000001L));
>   C c2 = new C(Double.longBitsToDouble(0x7ff0000000000002L));
>   assert c1.bits() != c2.bits();
>
> can be rewritten as
>
>   value class C {
>       private long l;
>       C(double d) { this.l = Double.doubleToRawLongBits(d); }
>       long bits() { return l; }
>   }

(Agreed there would be a workaround.  But that is a tiny corner of the issue.)
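
The workaround can be approximated in current Java with records, as a rough stand-in only. Records are identity classes, and what is compared here is their generated `equals`, not `==` on value objects. The generated `equals` compares a double component via `Double.compare`, so it behaves like representational equivalence. The class and record names are hypothetical, and quiet NaN payloads are used for portability.

```java
class NanWorkaround {
    // Stores the double directly; the generated equals() treats all NaNs as equal.
    record AsDouble(double d) {
        long bits() { return Double.doubleToRawLongBits(d); }
    }

    // Stores the raw bits as a long; the generated equals() sees every payload bit.
    record AsBits(long bits) {
        static AsBits of(double d) { return new AsBits(Double.doubleToRawLongBits(d)); }
        double value() { return Double.longBitsToDouble(bits); }
    }

    public static void main(String[] args) {
        double nanA = Double.longBitsToDouble(0x7ff8000000000001L);
        double nanB = Double.longBitsToDouble(0x7ff8000000000002L);

        AsDouble d1 = new AsDouble(nanA);
        AsDouble d2 = new AsDouble(nanB);
        System.out.println("AsDouble equal:     " + d1.equals(d2));
        System.out.println("AsDouble same bits: " + (d1.bits() == d2.bits()));

        AsBits b1 = AsBits.of(nanA);
        AsBits b2 = AsBits.of(nanB);
        System.out.println("AsBits equal:       " + b1.equals(b2));
    }
}
```

With the double-typed component, two objects can compare equal while still being distinguishable through `bits()`. Storing a long removes that mismatch, at the price of converting on every read of the numeric value.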

> 2) The de-duplication of value instances by the GC works with both the bitwise equivalence and the representational equivalence.
>
>  If the GC de-duplicates value instances based only on bitwise equivalence, it is a valid algorithm under representational equivalence.

And where in the JVMS or JLS would the GC get permission to make such decisions?  We would be smuggling a second same-ness condition back into the JLS and JVMS.  That’s what’s required to specify JVM and language behaviors like this.  We’d have an “is same” and an “is really really and truly the same” condition for value objects containing floats. That strikes me as bad VM physics, regardless of the motivation to align with some pre-existing library API.  Please, no.

There would be real performance costs of such “bad physics”.  The JVM would have to keep a “second set of books” about whether two value objects were really (and really truly) the same, separate from the effects of bytecode-issued acmp instructions.  There would still be an open question of whether to normalize NaN bits (those silly NaN sub-flavors are a root of the problem here).

If we chose to normalize NaN bits in the heap, then we would incur a performance cost for every single value object construction (where a float or double is in the picture).  If we chose to not normalize such heap bits, then we would incur a performance cost on every single comparison of such value objects.  And all to gain some supposed teachability benefit, which mostly disappears a year or two after the curriculum changes.  Sorry, no.
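
The two options can be modeled at the library level to show where the extra work lands. This is a sketch under loud assumptions: the class names and the `same` method are invented for illustration, ordinary final classes stand in for value objects, and nothing here reflects how a JVM would actually implement either choice.

```java
class NanNormalizationCost {
    // Option A: normalize on construction. Every store pays for a canonicalization;
    // afterwards a plain raw-bit comparison is enough.
    static final class NormalizedOnStore {
        private final double d;
        NormalizedOnStore(double d) {
            // doubleToLongBits maps every NaN to 0x7ff8000000000000L.
            this.d = Double.longBitsToDouble(Double.doubleToLongBits(d));
        }
        boolean same(NormalizedOnStore other) {
            return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(other.d);
        }
        long rawBits() { return Double.doubleToRawLongBits(d); }
    }

    // Option B: store the bits untouched. Construction is free, but every
    // comparison has to canonicalize both operands first.
    static final class NormalizedOnCompare {
        private final double d;
        NormalizedOnCompare(double d) { this.d = d; }
        boolean same(NormalizedOnCompare other) {
            return Double.doubleToLongBits(d) == Double.doubleToLongBits(other.d);
        }
        long rawBits() { return Double.doubleToRawLongBits(d); }
    }

    public static void main(String[] args) {
        double nanA = Double.longBitsToDouble(0x7ff8000000000001L);
        double nanB = Double.longBitsToDouble(0x7ff8000000000002L);
        System.out.println(Long.toHexString(new NormalizedOnStore(nanA).rawBits()));
        System.out.println(Long.toHexString(new NormalizedOnCompare(nanA).rawBits()));
        System.out.println(new NormalizedOnStore(nanA).same(new NormalizedOnStore(nanB)));
        System.out.println(new NormalizedOnCompare(nanA).same(new NormalizedOnCompare(nanB)));
    }
}
```

Both classes give the same answer from `same`; they differ only in whether the NaN check is paid once per construction or once per comparison. Under bitwise equivalence neither cost exists.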

I’m kind of sorry about the burden for those o(1000) expert users who know about the internals of Double::equals in Java but don’t know about multiple NaNs, as of this moment.  But I’m not sorry enough to consent to harming JVM performance to cater to their particular expectations in 2024.

> So I am not convinced that bitwise equivalence should be chosen instead of representational equivalence; for me, two semantics instead of three is a win.
>
> Rémi