RFR: JDK-8289551: Conversions between bit representations of half precision values and floats [v6]
Raffaello Giulietti
duke at openjdk.org
Sat Jul 23 20:07:13 UTC 2022
On Sat, 23 Jul 2022 18:55:03 GMT, Raffaello Giulietti <duke at openjdk.org> wrote:
>> Joe Darcy has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Implement review feedback.
>
> src/java.base/share/classes/java/lang/Float.java line 1122:
>
>> 1120: // binary16 (when rounding is done, could still round up)
>> 1121: int exp = Math.getExponent(f);
>> 1122: assert -25 <= exp && exp <= 15;
>
> I think that both the subnormal and the normal case can be unified if we pay closer attention to the positions of the lsb, round and sticky bits in subnormals.
>
>
> // Clamp exp to the [-15, 15] range while retaining the
> // difference between the original value and -15 on clamping.
> // This is the excess shift value in addition to 13.
> int expdelta = Math.max(0, -15 - exp);
> exp += expdelta;
> assert -15 <= exp && exp <= 15;
>
> int f_signif_bits = doppel & 0x007f_ffff; // original significand
> // Significand bits as if using rounding to zero (truncation).
> short signif_bits = (short)(f_signif_bits >> (13 + expdelta));
>
> // For round to nearest even, determining whether or
> // not to round up (in magnitude) is a function of the
> // least significant bit (LSB), the next bit position
> // (the round position), and the sticky bit (whether
> // there are any nonzero bits in the exact result to
> // the right of the round digit). An increment occurs
> // in three cases:
> //
> // LSB   Round   Sticky
> // 0     1       1
> // 1     1       0
> // 1     1       1
> // See "Computer Arithmetic Algorithms," Koren, Table 4.9
>
> int lsb = f_signif_bits & (1 << 13 + expdelta);
> int round = f_signif_bits & (1 << 12 + expdelta);
> int sticky = f_signif_bits & ((1 << 12 + expdelta) - 1);
>
> if (round != 0 && ((lsb | sticky) != 0)) {
>     signif_bits++;
> }
>
> // No bits set in significand beyond the *first* exponent
> // bit, not just the significand; quantity is added to the
> // exponent to implement a carry out from rounding the
> // significand.
> assert (0xf800 & signif_bits) == 0x0;
>
> return (short)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );
I haven't tested this variant yet; I will do so tomorrow when I also review the tests themselves.
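
For anyone who wants to experiment in the meantime, here is a minimal standalone sketch of the unified subnormal/normal rounding idea. It is not the code in the PR: the class name `Float16Sketch`, the NaN handling (always a quiet NaN), and the early-return thresholds are assumptions made to keep it self-contained. It also makes the implicit leading 1 bit explicit for subnormal results and uses `-14 - exp` as the excess shift; both are choices of this sketch rather than part of the snippet above.

```java
// Hypothetical helper class, for illustration only; not part of java.lang.Float.
final class Float16Sketch {
    private Float16Sketch() {}

    // Converts a float to binary16 bits using round to nearest even.
    static short floatToFloat16(float f) {
        int doppel = Float.floatToRawIntBits(f);
        short signBit = (short) ((doppel & 0x8000_0000) >> 16);

        if (Float.isNaN(f)) {
            // Assumption: collapse every NaN to a quiet NaN, keeping the sign.
            return (short) (signBit | 0x7e00);
        }

        float abs = Math.abs(f);
        // 65520 = 65504 (largest finite binary16) + half an ulp; rounds to infinity.
        if (abs >= 65520.0f) {
            return (short) (signBit | 0x7c00);
        }
        // Half of the smallest binary16 subnormal (2^-24) and below rounds to zero.
        if (abs <= 0x1.0p-25f) {
            return signBit;
        }

        int exp = Math.getExponent(f); // here -25 <= exp <= 15

        // For binary16 subnormals, shift further and make the implicit bit explicit.
        int expdelta = 0;
        int msb = 0;
        if (exp < -14) {
            expdelta = -14 - exp;
            exp = -15; // biased exponent becomes 0
            msb = 0x0080_0000;
        }
        int fSignifBits = (doppel & 0x007f_ffff) | msb;

        // Truncated significand, then the bits that decide the rounding.
        int shift = 13 + expdelta;
        int signifBits = fSignifBits >> shift;
        int lsb = fSignifBits & (1 << shift);
        int round = fSignifBits & (1 << (shift - 1));
        int sticky = fSignifBits & ((1 << (shift - 1)) - 1);

        if (round != 0 && (lsb | sticky) != 0) {
            signifBits++;
        }

        // Adding (not OR-ing) lets a carry out of the significand bump the exponent.
        return (short) (signBit | (((exp + 15) << 10) + signifBits));
    }
}
```

With this sketch, `Float16Sketch.floatToFloat16(1.0f)` yields `0x3c00` and `Float16Sketch.floatToFloat16(0x1.0p-24f)` yields `0x0001`, so the same shift-and-round path covers both a normal and a subnormal result.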
-------------
PR: https://git.openjdk.org/jdk/pull/9422