[lworld+fp16] RFR: 8334432: Refine Float16.fma
Raffaello Giulietti
rgiulietti at openjdk.org
Fri Jun 21 14:19:23 UTC 2024
On Fri, 21 Jun 2024 04:36:52 GMT, Joe Darcy <darcy at openjdk.org> wrote:
> Adding comments and test cases for Float16.fma.
The analysis done [here](https://github.com/openjdk/valhalla/pull/1117#issuecomment-2174342914) seems correct, but it leaves the last case somewhat in suspense.
> That leaves possibly non-exact product-sum with a combination of product in the subnormal range of Float16 and the c term to be added in being not-small. However, if this product-sum is non-exact, the smaller term from the product, with at most 22 exponent bit positions set, and the 11 bits from c being summed in, must be separated by at least 53 - (22 + 11) = 20 bit positions otherwise the product-sum would fit in a double. I believe this implies at least one of the double-rounding scenarios cannot occur, in particular a half-way result in the smaller precision, Float16 in this case, rounding differently because sticky bit information from the higher precision was rounded away.
Here's a further analysis of this case.
Double rounding is usually harmless. It is harmful only in two situations:
- The first rounding from the exact value to the extended precision (here `double`) happens to be directed _toward_ 0 to a value exactly midway between two adjacent working precision (here `float16`) values, followed by a second rounding from there which again happens to be directed _toward_ 0 to one of these values (the one with lesser magnitude).
  A single rounding from the exact value to the working precision, in contrast, rounds to the value with larger magnitude.
- Symmetrically, the first rounding to the extended precision happens to be directed _away_ from 0 to a value exactly midway between two adjacent working precision values, followed by a second rounding from there which again happens to be directed _away_ from 0 to one of these values (the one with larger magnitude).
  However, a single rounding from the exact value to the working precision rounds to the value with lesser magnitude.
In any other case double rounding is innocuous, returning the same value as a single rounding to the working precision.
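The first harmful scenario can be reproduced with standard Java types. This is only a sketch under stated assumptions: `Float16` is not part of the standard JDK, so `float` stands in for the working precision and `double` for the extended precision, and `BigDecimal.floatValue()` and `BigDecimal.doubleValue()` are assumed to round correctly to nearest. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class DoubleRoundingDemo {

    // Exact value 1 + 2^-24 + 2^-60: slightly above the midpoint 1 + 2^-24
    // of the adjacent float values 1 and 1 + 2^-23.
    static BigDecimal exactValue() {
        return BigDecimal.ONE
                .add(new BigDecimal(Math.scalb(1.0, -24)))   // exact conversion
                .add(new BigDecimal(Math.scalb(1.0, -60)));  // exact conversion
    }

    // One rounding, straight from the exact value to the working precision.
    static float singleRounding() {
        return exactValue().floatValue();
    }

    // Two roundings: first to double (toward 0, landing exactly on the
    // float midpoint), then to float (ties-to-even, again toward 0).
    static float doubleRounding() {
        double extended = exactValue().doubleValue();
        return (float) extended;
    }

    public static void main(String[] args) {
        System.out.println("single rounding: " + Float.toHexString(singleRounding()));
        System.out.println("double rounding: " + Float.toHexString(doubleRounding()));
    }
}
```

Under these assumptions the single rounding yields `Math.nextUp(1.0f)`, while the double rounding yields `1.0f`, which is the value with lesser magnitude.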
We only need to ensure that the first rounding to `double` does not produce the midpoint of two adjacent `float16` values.
- If a·b and c have the same sign, the sum a·b + c has a significand with a large gap of 20 or more 0s between the bits of the significand of c to the left (at most 11 bits) and those of the product a·b to the right (at most 22 bits).
  The rounding bit for the final working precision of `float16` is the leftmost 0 in the gap.
  - If rounding to `double` is directed toward 0, all the 0s in the gap are preserved, thus the `float16` rounding bit is unaffected and remains 0. This means that the `double` value is _not_ the midpoint of two adjacent `float16` values, so double rounding is harmless.
  - If rounding to `double` is directed away from 0, the rightmost 0 in the gap might be replaced by a 1, but the others are unaffected, including the `float16` rounding bit. Again, this shows that the `double` value is _not_ the midpoint of two adjacent `float16` values, and double rounding is innocuous.
- If a·b and c have opposite signs, then in the sum a·b + c the long gap of 0s above is replaced by a long gap of 1s. The `float16` rounding bit is the leftmost 1 in the gap, or the second leftmost 1 iff c is a power of 2. In both cases, the rounding bit is followed by at least one more 1.
  - If rounding to `double` is directed toward 0, the `float16` rounding bit and its follower are preserved and both 1, so the `double` value is _not_ the midpoint of two adjacent `float16` values, and double rounding is harmless.
  - If rounding to `double` is directed away from 0, the `float16` rounding bit and its follower are either preserved (both 1), or both switch to 0. Either way, the `double` value is again _not_ the midpoint of two adjacent `float16` values, and double rounding is harmless.
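The midpoint condition that the cases above rule out can be checked directly on the bits of a `double`. The following is a minimal sketch, not code from the PR: the class and method names are hypothetical, and it assumes the argument is finite with magnitude in the normal range of `float16` (from 2^-14 up to, but excluding, 65504), where `float16` keeps the top 10 of the 52 `double` fraction bits.

```java
public class Float16Midpoint {

    // Number of low-order double fraction bits that float16 discards
    // (52 fraction bits in double, 10 in float16).
    static final int DROPPED_BITS = 52 - 10;

    // True iff d lies exactly midway between two adjacent float16 values.
    // Assumption: d is finite and |d| is in the float16 normal range.
    static boolean isFloat16Midpoint(double d) {
        long bits = Double.doubleToRawLongBits(d);
        long dropped = bits & ((1L << DROPPED_BITS) - 1);
        // Midpoint: the float16 rounding bit is 1 and every bit after it is 0.
        return dropped == (1L << (DROPPED_BITS - 1));
    }

    public static void main(String[] args) {
        double midpoint = 1.0 + Math.scalb(1.0, -11);  // between 1 and 1 + 2^-10
        System.out.println(isFloat16Midpoint(midpoint));
        System.out.println(isFloat16Midpoint(midpoint + Math.scalb(1.0, -52)));
    }
}
```

In terms of the argument above, a rounding bit of 0, or a rounding bit of 1 followed by another 1, makes `dropped` differ from the midpoint pattern, so the check returns `false`.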
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1143#issuecomment-2182843647