RFR: JDK-8289551: Conversions between bit representations of half precision values and floats [v6]
Raffaello Giulietti
duke at openjdk.org
Sun Jul 24 15:48:45 UTC 2022
On Sat, 23 Jul 2022 20:03:39 GMT, Raffaello Giulietti <duke at openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/Float.java line 1122:
>>
>>> 1120: // binary16 (when rounding is done, could still round up)
>>> 1121: int exp = Math.getExponent(f);
>>> 1122: assert -25 <= exp && exp <= 15;
>>
>> I think that both the subnormal and the normal case can be unified if we pay closer attention to the positions of the lsb, round and sticky bits in subnormals.
>>
>>
>> // Clamp exp to the [-15, 15] range while retaining the
>> // difference between the original value and -15 on clamping.
>> // This is the excess shift value in addition to 13.
>> int expdelta = Math.max(0, -15 - exp);
>> exp += expdelta;
>> assert -15 <= exp && exp <= 15;
>>
>> int f_signif_bits = doppel & 0x007f_ffff; // original significand
>> // Significand bits as if using rounding to zero (truncation).
>> short signif_bits = (short)(f_signif_bits >> (13 + expdelta));
>>
>> // For round to nearest even, determining whether or
>> // not to round up (in magnitude) is a function of the
>> // least significant bit (LSB), the next bit position
>> // (the round position), and the sticky bit (whether
>> // there are any nonzero bits in the exact result to
>> // the right of the round digit). An increment occurs
>> // in three cases:
>> //
>> // LSB  Round  Sticky
>> //  0     1      1
>> //  1     1      0
>> //  1     1      1
>> // See "Computer Arithmetic Algorithms," Koren, Table 4.9
>>
>> int lsb = f_signif_bits & (1 << 13 + expdelta);
>> int round = f_signif_bits & (1 << 12 + expdelta);
>> int sticky = f_signif_bits & ((1 << 12 + expdelta) - 1);
>>
>> if (round != 0 && ((lsb | sticky) != 0 )) {
>>     signif_bits++;
>> }
>>
>> // No bits set in significand beyond the *first* exponent
>> // bit, not just the significand; quantity is added to the
>> // exponent to implement a carry out from rounding the
>> // significand.
>> assert (0xf800 & signif_bits) == 0x0;
>>
>> return (short)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );
>
> I didn't test this variant, will do tomorrow when also reviewing the tests themselves.
The correct variant below passes the tests.
// For binary16 subnormals, besides forcing exp to -15,
// retain the difference expdelta = E_min - exp.
// This is the excess shift value, in addition to 13, to be used
// in the computations below.
// Furthermore, the (hidden) msb of f, with value 1, must be included as well.
int expdelta = 0;
int msb = 0x0000_0000;
if (exp < -14) {
    expdelta = -14 - exp;
    exp = -15;
    msb = 0x0080_0000;
}
int f_signif_bits = (doppel & 0x007f_ffff) | msb;
// Significand bits as if using rounding to zero (truncation).
short signif_bits = (short)(f_signif_bits >> (13 + expdelta));
// For round to nearest even, determining whether or
// not to round up (in magnitude) is a function of the
// least significant bit (LSB), the next bit position
// (the round position), and the sticky bit (whether
// there are any nonzero bits in the exact result to
// the right of the round digit). An increment occurs
// in three cases:
//
// LSB  Round  Sticky
//  0     1      1
//  1     1      0
//  1     1      1
// See "Computer Arithmetic Algorithms," Koren, Table 4.9
int lsb    = f_signif_bits & (1 << (13 + expdelta));
int round  = f_signif_bits & (1 << (12 + expdelta));
int sticky = f_signif_bits & ((1 << (12 + expdelta)) - 1);
if (round != 0 && ((lsb | sticky) != 0 )) {
    signif_bits++;
}
// No bits set in significand beyond the *first* exponent
// bit, not just the significand; quantity is added to the
// exponent to implement a carry out from rounding the
// significand.
assert (0xf800 & signif_bits) == 0x0;
return (short)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );
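For anyone who wants to try the fragment outside the JDK sources, here is a minimal self-contained sketch. The class name Binary16Sketch is hypothetical, and the handling of NaN, overflow and tiny values before the fragment is my assumption about the surrounding method, not a copy of the PR; only the part from expdelta onwards follows the code above.

```java
/** Hypothetical wrapper class, for experimentation only. */
final class Binary16Sketch {
    private Binary16Sketch() {}

    static short floatToFloat16(float f) {
        int doppel = Float.floatToRawIntBits(f);
        short sign_bit = (short) ((doppel & 0x8000_0000) >> 16);

        // Assumption: every NaN collapses to one quiet NaN pattern (payload dropped).
        if (Float.isNaN(f)) {
            return (short) (sign_bit | 0x7e00);
        }

        float abs_f = Math.abs(f);

        // Assumption: binary16 MAX_VALUE (65504) plus half an ulp (16) overflows to infinity.
        if (abs_f >= 0x1.ffcp15f + 0x0.002p15f) {
            return (short) (sign_bit | 0x7c00);
        }

        // Assumption: half of the smallest binary16 subnormal (2^-24) or less rounds to zero.
        if (abs_f <= 0x1.0p-25f) {
            return sign_bit;
        }

        int exp = Math.getExponent(f);
        assert -25 <= exp && exp <= 15;

        // From here on, the logic follows the fragment in this mail.
        int expdelta = 0;
        int msb = 0x0000_0000;
        if (exp < -14) {            // result is a binary16 subnormal
            expdelta = -14 - exp;   // excess shift in addition to 13
            exp = -15;
            msb = 0x0080_0000;      // make the hidden bit of f explicit
        }
        int f_signif_bits = (doppel & 0x007f_ffff) | msb;

        // Significand bits as if rounding toward zero (truncation).
        short signif_bits = (short) (f_signif_bits >> (13 + expdelta));

        // Round to nearest even: increment when the round bit is set and
        // either the LSB or the sticky bits are nonzero.
        int lsb    = f_signif_bits & (1 << (13 + expdelta));
        int round  = f_signif_bits & (1 << (12 + expdelta));
        int sticky = f_signif_bits & ((1 << (12 + expdelta)) - 1);
        if (round != 0 && ((lsb | sticky) != 0)) {
            signif_bits++;
        }

        // A carry out of the significand lands in the lowest exponent bit.
        assert (0xf800 & signif_bits) == 0x0;
        return (short) (sign_bit | (((exp + 15) << 10) + signif_bits));
    }
}
```

As a worked case, take 0x1.0p-24f: exp is -24, so expdelta is 10 and the hidden bit 0x0080_0000 shifted right by 23 gives 1, the smallest binary16 subnormal 0x0001. The stored significand field of that float is all zeros, so without the hidden bit the truncated value would be 0.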
-------------
PR: https://git.openjdk.org/jdk/pull/9422