[lworld+fp16] RFR: 8334432: Refine Float16.fma
Joe Darcy
darcy at openjdk.org
Mon Aug 12 15:27:45 UTC 2024
On Fri, 21 Jun 2024 14:17:02 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:
>> Adding comments and test cases for Float16.fma.
>
> The analysis done [here](https://github.com/openjdk/valhalla/pull/1117#issuecomment-2174342914) seems correct, but there is an implicit suspension moment in the last case.
>
>> That leaves possibly non-exact product-sum with a combination of product in the subnormal range of Float16 and the c term to be added in being not-small. However, if this product-sum is non-exact, the smaller term from the product, with at most 22 exponent bit positions set, and the 11 bits from c being summed in, must be separated by at least 53 - (22 + 11) = 20 bit positions otherwise the product-sum would fit in a double. I believe this implies at least one of the double-rounding scenarios cannot occur, in particular a half-way result in the smaller precision, Float16 in this case, rounding differently because sticky bit information from the higher precision was rounded away.
>
> Here's a further analysis of this case.
>
> Double rounding is usually harmless. It is harmful only in two situations:
>
> - The first rounding from the exact value to the extended precision (here `double`) happens to be directed _toward_ 0 to a value exactly midway between two adjacent working precision (here `float16`) values, followed by a second rounding from there which again happens to be directed _toward_ 0 to one of these values (the one with lesser magnitude).
> A single rounding from the exact value to the working precision, in contrast, rounds to the value with larger magnitude.
> - Symmetrically, the first rounding to the extended precision happens to be directed _away_ from 0 to a value exactly midway between two adjacent working precision values, followed by a second rounding from there which again happens to be directed _away_ from 0 to one of these values (the one with larger magnitude).
> However, a single rounding from the exact value to the working precision rounds to the value with lesser magnitude.
>
> In any other case double rounding is innocuous, returning the same value as a single rounding to the working precision.
> We only need to ensure that the first rounding to `double` does not produce the midpoint of two adjacent `float16` values.
>
> - If a·b and c have the same sign, the sum a·b + c has a significand with a large gap of 20 or more 0s between the bits of the significand of c to the left (at most 11 bits) and those of the product a·b to the right (at most 22 bits).
> The rounding bit for the final working precision of `float16` is the leftmost 0 in the gap.
> - If rounding to `double` is dir...
Thanks for the review comments @rgiulietti ; the most recent push should address the points you raised.
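
For readers following the thread, the harmful double-rounding case described in the quoted analysis can be reproduced with plain integer arithmetic. The sketch below is only an illustration under stated assumptions: `DoubleRoundingDemo` and `roundToBits` are hypothetical names, not part of `Float16` or the JDK, and the helper rounds a positive `BigInteger` to a given number of significant bits with ties to even. The sample value 2^70 + 2^59 + 1 is chosen by hand to hit the midpoint; it is not claimed to arise from an actual `Float16.fma` call.

```java
import java.math.BigInteger;

public class DoubleRoundingDemo {

    // Hypothetical helper: rounds a positive integer to at most `bits`
    // significant bits using round-to-nearest, ties-to-even.
    static BigInteger roundToBits(BigInteger x, int bits) {
        int shift = x.bitLength() - bits;
        if (shift <= 0) {
            return x; // already representable in `bits` significant bits
        }
        BigInteger q = x.shiftRight(shift);                    // retained bits
        BigInteger rem = x.subtract(q.shiftLeft(shift));       // discarded bits
        BigInteger half = BigInteger.ONE.shiftLeft(shift - 1); // half an ulp
        int cmp = rem.compareTo(half);
        if (cmp > 0 || (cmp == 0 && q.testBit(0))) {
            q = q.add(BigInteger.ONE); // round up (above half, or tie with odd q)
        }
        return q.shiftLeft(shift);
    }

    public static void main(String[] args) {
        // 11 leading bits (1 followed by ten 0s), then a 1 in the rounding
        // position, then a long run of 0s, then a final sticky 1.
        BigInteger x = BigInteger.ONE.shiftLeft(70)
                .add(BigInteger.ONE.shiftLeft(59))
                .add(BigInteger.ONE);

        // Single rounding to 11 bits (the float16 precision): the sticky 1
        // is still visible, so the value rounds to the larger neighbor.
        BigInteger single = roundToBits(x, 11);

        // Rounding to 53 bits first (the double precision) drops the sticky 1
        // and lands exactly on the 11-bit midpoint; the tie then goes to even.
        BigInteger twice = roundToBits(roundToBits(x, 53), 11);

        System.out.println("single rounding: " + single);
        System.out.println("double rounding: " + twice);
        System.out.println("results differ:  " + !single.equals(twice));
    }
}
```

Both roundings here are directed toward 0, which is the first of the two harmful situations in the quoted list. Moving the rounding-position bit one place lower (2^58 instead of 2^59) makes the two paths agree, since the first rounding no longer produces a midpoint.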
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1143#issuecomment-2284277898