raw floating-point bits in '==' value object comparisons (again/still)

Mon Mar 11 23:03:45 UTC 2024

----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Monday, March 11, 2024 7:58:24 PM
> Subject: Re: raw floating-point bits in '==' value object comparisons (again/still)

> On 11 Mar 2024, at 6:33, Remi Forax wrote:
> 
>> Last week at JChateau (think JCrete in France: less sun, more
>> chateaux), I explained how value types work from the user's point of view,
>> covering, among other subjects, the semantics of ==.
>>
>> First, most of the attendees knew the semantic difference between == on double
>> and Double.equals(). I suppose that's because people who attend such
>> (un-)conferences have a more intimate knowledge of Java than the average
>> developer. Second, no attendee knew that NaN was a prefix (a family of bit
>> patterns rather than a single value).
> 
> So you are using conversations with people who admittedly did not understand the
> problem, until you explained it to them in the moment, to motivate a design
> change?  That does not seem promising as a way to predict the user experience
> after years of learned familiarity.

I think you misunderstood me: what surprised me was that the semantics of floating-point numbers in Java are well known, while their encoding is not.
This section is not about changing the design.

> 
> In short: Sorry, I don’t buy this argument.
> 
>>
>> So it made me think about that subject again.

Now, I'm talking about design.

>>
>> 1) Dan's argument that we want to be able to create a class with two
>> different NaNs does not hold, because instead of storing the values as double,
>> the values can be stored as long.
>>
>>   value class C {
>>       private double d;
>>       C(double d) { this.d = d; }
>>       long bits() { return Double.doubleToRawLongBits(d); }
>>   }
>>
>>   C c1 = new C(Double.longBitsToDouble(0x7ff0000000000001L));
>>   C c2 = new C(Double.longBitsToDouble(0x7ff0000000000002L));
>>   assert c1.bits() != c2.bits();
>>
>> can be rewritten as
>>
>>   value class C {
>>       private long l;
>>       C(double d) { this.l = Double.doubleToRawLongBits(d); }
>>       long bits() { return l; }
>>   }
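A runnable sketch of that workaround on a stock JDK. `value class` only compiles on a Valhalla early-access build, so this assumes an ordinary final class (`BitsHolder`, a hypothetical name) is a fair stand-in for how the field is stored. It uses quiet NaN patterns (`0x7ff8...`) instead of the signaling ones above, because the `Double.longBitsToDouble` javadoc warns that a NaN's exact bit pattern, signaling NaNs in particular, may not survive the conversion on every processor.

```java
// Stand-in for the value class C above. `value class` needs a Valhalla
// early-access build, so this sketch uses an ordinary final class instead.
final class BitsHolder {
    // Raw IEEE 754 bits; no double arithmetic ever touches them.
    private final long l;

    BitsHolder(double d) {
        this.l = Double.doubleToRawLongBits(d);
    }

    long bits() {
        return l;
    }

    // Recovers the double when the numeric value is needed.
    double value() {
        return Double.longBitsToDouble(l);
    }

    public static void main(String[] args) {
        // Two quiet NaNs that differ only in their payload bits.
        BitsHolder c1 = new BitsHolder(Double.longBitsToDouble(0x7ff8000000000001L));
        BitsHolder c2 = new BitsHolder(Double.longBitsToDouble(0x7ff8000000000002L));
        System.out.println(Long.toHexString(c1.bits()) + " vs " + Long.toHexString(c2.bits()));
        System.out.println("distinct: " + (c1.bits() != c2.bits()));
    }
}
```

Since the field is a `long`, any comparison of the holder, bitwise or representational, sees the payload difference.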
> 
> (Agreed there would be a workaround.  But that is a tiny corner of the issue.)
> 
>> 2) De-duplication of value instances by the GC works with both bitwise
>> equivalence and representational equivalence.
>>
>>  If the GC de-duplicates value instances based only on bitwise
>>  equivalence, that is still a valid algorithm under representational equivalence.
> 
> And where in the JVMS or JLS would the GC get permission to make such decisions?

De-duplication is an optimization: it does not have to de-duplicate all instances that are equivalent under representational equivalence; it can de-duplicate only the ones that are bitwise equivalent.
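To make the "optimization only" point concrete, here is a minimal sketch of a hypothetical deduplicator (`Point` and `BitwiseDedup` are illustrative names, not a JVM interface). It keys canonical instances on raw bits, so it merges only bitwise-equivalent instances. Every merge it performs is also valid under representational equivalence, which is what `Record.equals` already uses for double components (it compares them with `Double.compare`). NaN sub-flavors are simply left unmerged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record standing in for a value class with a double field.
// Record equals() compares double components with Double.compare, i.e. it
// already uses representation equivalence.
record Point(double x) {}

// Hypothetical deduplicator: merges instances only when raw bits match.
final class BitwiseDedup {
    private final Map<Long, Point> canonical = new HashMap<>();

    // Returns the first instance seen with exactly the same raw bits.
    Point dedup(Point p) {
        long key = Double.doubleToRawLongBits(p.x());
        return canonical.computeIfAbsent(key, k -> p);
    }

    public static void main(String[] args) {
        BitwiseDedup d = new BitwiseDedup();
        Point a = new Point(1.5);
        Point b = new Point(1.5);
        Point n1 = new Point(Double.longBitsToDouble(0x7ff8000000000001L));
        Point n2 = new Point(Double.longBitsToDouble(0x7ff8000000000002L));

        // Bitwise-equal instances are merged; sound because a.equals(b) too.
        System.out.println(d.dedup(a) == d.dedup(b));
        // Different NaN payloads are not merged even though n1.equals(n2):
        // skipping a merge is always allowed for an optimization.
        System.out.println(d.dedup(n1) == d.dedup(n2));
        System.out.println(n1.equals(n2));
    }
}
```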

> We would be smuggling a second same-ness condition back into the JLS and JVMS.
> That’s what’s required to specify JVM and language behaviors like this.  We’d
> have an “is same” and an “is really really and truly the same” condition for
> value objects containing floats.

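No, de-duplication is an optimization, so it needs no second sameness condition: it may merge only instances it knows are bitwise identical, and skipping a merge is always allowed.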

> That strikes me as bad VM physics, regardless
> of the motivation to align with some pre-existing library API.  Please, no.

The main motivation is to avoid having three semantics where two are enough, and to make == and .equals() equivalent on the wrapper types (reducing the differences between a primitive and its wrapper type).
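The three candidate relations can be spelled out on today's Java: primitive `==` (IEEE 754 numerical equality), `Double.equals` (representation equivalence, as described in the `java.lang.Double` javadoc) and raw bitwise equality via `Double.doubleToRawLongBits`. The helper names below are illustrative only.

```java
final class ThreeEqualities {
    // IEEE 754 numerical equality: NaN != NaN, 0.0 == -0.0.
    static boolean numericEq(double a, double b) {
        return a == b;
    }

    // Representation equivalence, as used by Double.equals: all NaNs are
    // equivalent, +0.0 and -0.0 are distinct.
    static boolean representationEq(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    // Raw bitwise equivalence: NaN payloads are distinguished as well.
    static boolean bitwiseEq(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    static void show(String label, double a, double b) {
        System.out.printf("%-14s == %-5b equals %-5b raw %b%n",
            label, numericEq(a, b), representationEq(a, b), bitwiseEq(a, b));
    }

    public static void main(String[] args) {
        double nan1 = Double.longBitsToDouble(0x7ff8000000000001L);
        double nan2 = Double.longBitsToDouble(0x7ff8000000000002L);
        show("NaN, NaN", Double.NaN, Double.NaN);
        show("NaN payloads", nan1, nan2);
        show("0.0, -0.0", 0.0, -0.0);
    }
}
```

The three rows give three different answer patterns: primitive `==` and `Double.equals` already disagree on NaN and signed zero, and a raw-bit `==` on value objects would add a third relation that disagrees with both on NaN payloads.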

> 
> There would be real performance costs of such “bad physics”.  The JVM would have
> to keep a “second set of books” about whether two value objects were really
> (and really truly) the same, separate from the effects of bytecode-issued acmp
> instructions.  There would still be an open question of whether to normalize
> NaN bits (those silly NaN sub-flavors are a root of the problem here).

Yes, that's the root of the problem. 
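For reference, a double is a NaN whenever its 11 exponent bits are all ones and its 52-bit significand is nonzero, so there are 2 × (2^52 − 1) = 2^53 − 2 NaN bit patterns ("sub-flavors"). `Double.doubleToLongBits` collapses all of them to the canonical `0x7ff8000000000000L`, while `doubleToRawLongBits` keeps them apart. A small sketch (the mask names and `isNaNBits` are illustrative):

```java
final class NanFlavors {
    static final long EXP_MASK = 0x7ff0000000000000L; // 11 exponent bits
    static final long SIG_MASK = 0x000fffffffffffffL; // 52 significand bits

    // A double is NaN iff its exponent field is all ones and its significand is nonzero.
    static boolean isNaNBits(long bits) {
        return (bits & EXP_MASK) == EXP_MASK && (bits & SIG_MASK) != 0;
    }

    public static void main(String[] args) {
        long[] patterns = {
            0x7ff8000000000000L, // canonical quiet NaN (Double.NaN)
            0x7ff8000000000001L, // quiet NaN with a payload
            0xfff8000000000000L, // quiet NaN with the sign bit set
            0x7ff0000000000000L  // +Infinity: exponent all ones, significand zero
        };
        for (long p : patterns) {
            double d = Double.longBitsToDouble(p);
            System.out.printf("%016x isNaN=%b canonical=%016x%n",
                p, Double.isNaN(d), Double.doubleToLongBits(d));
        }
    }
}
```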

> 
> If we chose to normalize NaN bits in the heap, then we would incur a performance
> cost for every single value object construction (where a float or double is in
> the picture).

This is not what I'm proposing.
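For contrast only, construction-time normalization (the option John describes, which I am not proposing) might look like the following sketch. `NormalizedBox` is a hypothetical record; the point is the extra `isNaN` test paid on every construction, after which a plain raw-bit comparison agrees with `Double.equals`.

```java
// Hypothetical: canonicalize NaNs when the value is constructed, so that a
// later raw-bit comparison agrees with Double.equals.
record NormalizedBox(double d) {
    NormalizedBox {
        // The extra per-construction test: replace any NaN sub-flavor with
        // the canonical Double.NaN.
        if (Double.isNaN(d)) {
            d = Double.NaN;
        }
    }

    // Raw-bit equality; after normalization it coincides with representation
    // equivalence (+0.0 and -0.0 still differ, all NaNs are equal).
    boolean sameBits(NormalizedBox other) {
        return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(other.d);
    }

    public static void main(String[] args) {
        NormalizedBox a = new NormalizedBox(Double.longBitsToDouble(0x7ff8000000000001L));
        NormalizedBox b = new NormalizedBox(Double.longBitsToDouble(0x7ff8000000000002L));
        System.out.println(a.sameBits(b));
    }
}
```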

> If we chose to not normalize such heap bits, then we would incur
> a performance cost on every single comparison of such value objects.

Yes, but this is exactly how j.l.Float.equals() and j.l.Double.equals() already work, and how a record with a float or double component works. And you explained in another mail about Lilliput that "The key principle here is probably that there should be a slow path which is rare and easy to test." A NaN sub-flavor (i.e. not the canonical NaN) is rare and easy to test.
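A sketch of what that fast path / slow path split could look like for one `double` field, assuming `==` on value objects used representation equivalence: compare raw bits first, and only when they differ check whether both sides are NaN (the rare case). The names are hypothetical and this is not how HotSpot implements `acmp`; the result matches `Double.doubleToLongBits(a) == Double.doubleToLongBits(b)`.

```java
final class FastSlowCompare {
    // Fast path: identical raw bits. Slow path: bits differ, so the values are
    // the same only if both are NaNs with different bit patterns.
    static boolean sameDouble(double a, double b) {
        if (Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b)) {
            return true;
        }
        return Double.isNaN(a) && Double.isNaN(b);
    }

    // Hypothetical field-wise comparison for an object holding two doubles.
    static boolean samePair(double x1, double y1, double x2, double y2) {
        return sameDouble(x1, x2) && sameDouble(y1, y2);
    }

    public static void main(String[] args) {
        double nan1 = Double.longBitsToDouble(0x7ff8000000000001L);
        System.out.println(sameDouble(1.0, 1.0));
        System.out.println(sameDouble(Double.NaN, nan1));
        System.out.println(sameDouble(0.0, -0.0));
        System.out.println(samePair(1.0, Double.NaN, 1.0, nan1));
    }
}
```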

Rémi