Value object equality & floating-point values
John Rose
john.r.rose at oracle.com
Tue Feb 13 21:18:24 UTC 2024
On 9 Feb 2024, at 12:32, forax at univ-mlv.fr wrote:
>
> "We fix == but for NaN" looks like a terrible slogan.
There’s no question of fixing anything. Actually, float== (fcmp/dcmp) is permanently broken for NaN. It’s the inevitable consequence of having Java import IEEE 754. You already teach your students that, surely.
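For concreteness, here is a minimal demonstration, in plain Java as it stands today (nothing Valhalla-specific):

    public class NanIsNeverEqual {
        public static void main(String[] args) {
            double nan = Double.NaN;
            System.out.println(nan == nan);  // false: dcmp semantics, mandated by IEEE 754
            System.out.println(nan != nan);  // true: the classic "is it NaN" test
        }
    }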
We aren’t trying to fix ==. We are steering towards a semantics for values which is compatible with ref== (acmp) today. And that is substitutability, in all its details, including the unwelcome details. Two objects which store distinguishable NaNs are distinct, hence not the same, and hence not ref==. (Whether they are float== is a very different matter.)
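To see how two NaNs can be distinct without float== telling you anything useful, here is a sketch using today's primitives; the two bit patterns are just illustrative quiet-NaN encodings I picked for the example:

    public class DistinguishableNans {
        public static void main(String[] args) {
            // Two valid NaN encodings that differ only in payload bits.
            float nan1 = Float.intBitsToFloat(0x7fc00000);  // the canonical quiet NaN
            float nan2 = Float.intBitsToFloat(0x7fc00001);  // a NaN with a different payload

            // float== is useless here: a NaN is never float== to anything, even itself.
            System.out.println(nan1 == nan2);  // false
            System.out.println(nan1 == nan1);  // false too

            // Yet the two values are distinguishable, bit for bit
            // (payloads survive the round trip on typical hardware).
            System.out.println(Float.floatToRawIntBits(nan1)
                    == Float.floatToRawIntBits(nan2));  // false: not the same value
            // Substitutability says: value objects storing nan1 and nan2 in a
            // field are distinguishable, hence distinct, hence not ref==.
        }
    }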
We are not trying to align ref== with float==, because that is an impossible goal.
We could try to file off an unimportant rough edge by making some NaNs less distinct from each other, by normalizing NaNs stored in some fields in some classes. (Normalizing the fields of all value classes is the proposal of the week.) That won’t change the result of float== on such values; they are still never float==, whether distinct or not.
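Note that normalization of this kind already exists at one explicit point, Float::floatToIntBits, which washes every NaN down to the canonical one. A quick sketch of the difference, and of why float== is unaffected either way:

    public class NormalizedVsRaw {
        public static void main(String[] args) {
            float uglyNan = Float.intBitsToFloat(0x7fc00001);  // a non-canonical NaN

            // The non-raw conversion gives every NaN a bath: one canonical result.
            System.out.println(Integer.toHexString(Float.floatToIntBits(uglyNan)));     // 7fc00000
            // The raw conversion preserves whatever payload was there (on typical hardware).
            System.out.println(Integer.toHexString(Float.floatToRawIntBits(uglyNan)));  // 7fc00001

            // Normalization does not change float==: a freshly washed NaN
            // is still not == to itself.
            float washed = Float.intBitsToFloat(Float.floatToIntBits(uglyNan));
            System.out.println(washed == washed);  // false
        }
    }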
Filing off that rough edge would shift around the envelope of the sameness predicate ref==. Nobody will care! Ref== is already known (to all well-informed programmers) to be rather mysterious because of accidental object identity. We make ref== somewhat less mysterious by making value objects more likely to compare the same (under ref==). But we don’t undertake to fold in all possible kinds of equality checks — that is what Object::equals is for.
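For reference, the box classes already make that choice: Float::equals and Double::equals compare normalized bits, so they fold all NaNs together, unlike ==. A quick illustration:

    public class EqualsFoldsNans {
        public static void main(String[] args) {
            Double a = Double.NaN;
            Double b = Double.longBitsToDouble(0x7ff8000000000001L);  // a NaN with a different payload

            System.out.println(a.doubleValue() == b.doubleValue());  // false: NaN is never ==
            System.out.println(a.equals(b));  // true: equals() compares doubleToLongBits, which folds NaNs
        }
    }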
I don’t buy the claim that people will be more surprised by our changes than they already are by NaN behavior and object identity (both very surprising!).
We have Float::floatToRawIntBits for reasons, and it would be a heavy lift to get rid of it. (Would also delay Valhalla.) We might wish it were not necessary, because it necessarily brings in questions of bitwise equivalence. But the reasons are there:
A. IEEE 754 leaves open the possibility that distinct NaNs might carry interesting information. Java doesn’t have to respect that, but it is reasonable to do so.
B. Hardware platforms produce varying NaN values in varying conditions. It is simpler to let sleeping NaNs lie, rather than to give each new floating-point value a bath to make it look normal, on the chance it might have been an ugly NaN. This is why (I think) we added the “raw” (bitwise) versions of the float and double conversion methods: the non-raw versions do too much work for too little benefit.
C. Following up on B, routine normalization (say, on every data store, not just when requested explicitly via Float::floatToIntBits) would make all float-related data structures slower. It would also divert JIT optimizer work into solving a problem we made for ourselves: proving values already normal so that their normalizations can be elided.
D. Some programmers actually use the full 64-bit bandwidth of IEEE 754 double values, for exotic purposes. No Java libraries I know of do this, but I’ve heard of it in JavaScript interpreters which encode managed pointers as NaNs. If we start normalizing stored float values, we might cause a compatibility problem with some clever Java code out there. I put this last because although it is a nonzero risk, the downside is likely limited. Point C (an endemic normalization cost) is my main concern.
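Java doesn’t ship such a scheme, but a hypothetical NaN-boxing sketch makes the risk concrete; the class name, bit layout, and 48-bit payload width below are my own illustrative assumptions, loosely modeled on what some JavaScript engines do:

    public class NanBoxSketch {
        private static final long QNAN_BITS    = 0x7ff8000000000000L;  // quiet-NaN exponent + quiet bit
        private static final long PAYLOAD_MASK = 0x0000ffffffffffffL;  // low 48 bits carry the "pointer"

        // Hide a payload inside a NaN.
        static double box(long payload) {
            return Double.longBitsToDouble(QNAN_BITS | (payload & PAYLOAD_MASK));
        }

        // Recover it. This must use the *raw* conversion; the non-raw one
        // would wash the NaN and destroy the payload.
        static long unbox(double d) {
            return Double.doubleToRawLongBits(d) & PAYLOAD_MASK;
        }

        public static void main(String[] args) {
            double boxed = box(0x123456789abcL);
            System.out.println(Double.isNaN(boxed));             // true: to float==, it is just a NaN
            System.out.println(Long.toHexString(unbox(boxed)));  // 123456789abc: all bits intact
            // Any routine normalization of stored doubles would silently destroy this payload.
        }
    }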
I also agree with Chen’s characterization of the plan of record as not messing with Cartesian products. They are efficiently stored in memory, component-wise (bitwise!!), as long as you don’t try to filter out the “ugly” combinations you don’t prefer. This is equivalent to my observation about the “64-bit bandwidth” of doubles. If I can’t get 64 independent bits out of my double field, then I know somebody is probably spending extra cycles to suppress unwanted bit combinations.