RFR: JDK-8289551: Conversions between bit representations of half precision values and floats [v6]
Raffaello Giulietti
duke at openjdk.org
Sat Jul 23 20:07:12 UTC 2022
On Fri, 22 Jul 2022 01:29:57 GMT, Joe Darcy <darcy at openjdk.org> wrote:
>> Initial implementation.
>
> Joe Darcy has updated the pull request incrementally with one additional commit since the last revision:
>
> Implement review feedback.
src/java.base/share/classes/java/lang/Float.java line 980:
> 978:
> 979: /**
> 980: * {@return the {@code float} value closest to the numerical value
`{@return` seems to be out of place and missing a closing `}`
src/java.base/share/classes/java/lang/Float.java line 1028:
> 1026: // Shift left difference in the number of significand bits in
> 1027: // the float and binary16 formats
> 1028: final int SIGNIF_SHIFT = (FloatConsts.SIGNIFICAND_WIDTH - 11);
Wouldn't it make more sense to declare this local as `private static final`? It could then even be used in the next method.
src/java.base/share/classes/java/lang/Float.java line 1036:
> 1034: // (significand width includes the implicit bit so shift one
> 1035: // less).
> 1036: int bin16Exp = (((bin16ExpBits >> 10) - 15));
Opening `((` and closing `))` are über-redundant
src/java.base/share/classes/java/lang/Float.java line 1059:
> 1057: int result = (floatExpBits |
> 1058: (bin16SignifBits << SIGNIF_SHIFT));
> 1059: return sign * Float.intBitsToFloat(result);
One could use `bin16SignBit` when preparing the `int result` bits to avoid a `float` multiplication just to generate the sign
src/java.base/share/classes/java/lang/Float.java line 1063:
> 1061:
> 1062: /**
> 1063: * {@return the floating-point binary16 value, encoded in a {@code
`{@0return` seems to be out of place and missing a closing `}`
src/java.base/share/classes/java/lang/Float.java line 1110:
> 1108:
> 1109: // The overflow threshold is binary16 MAX_VALUE + 1/2 ulp
> 1110: if (abs_f >= (65504.0f + 16.0f) ) {
What about `(0x1.ffcp15f + 0x0.002p15f)` for the overflow threshold? The bits are reflected more clearly.
src/java.base/share/classes/java/lang/Float.java line 1112:
> 1110: if (abs_f >= (65504.0f + 16.0f) ) {
> 1111: return (short)(sign_bit | 0x7c00); // Positive or negative infinity
> 1112: } else {
The ` else {` with the corresponding closing `}` could be removed and the code in this branch could be un-indented. This would make the code visually less "jagged".
src/java.base/share/classes/java/lang/Float.java line 1115:
> 1113: // Smallest magnitude nonzero representable binary16 value
> 1114: // is equal to 0x1.0p-24; half-way and smaller rounds to zero.
> 1115: if (abs_f <= 0x1.0p-25f) { // Covers float zeros and subnormals.
I think `0x0.002p-14f` or `0.5f * 0x0.004p-14f` might be clearer than `0x1.0p-25f`. The latter requires mental calculation to verify.
src/java.base/share/classes/java/lang/Float.java line 1122:
> 1120: // binary16 (when rounding is done, could still round up)
> 1121: int exp = Math.getExponent(f);
> 1122: assert -25 <= exp && exp <= 15;
I think that both the subnormal and the normal case can be unified if we pay closer attention to the positions of the lsb, round and sticky bits in subnormals.
// Clamp exp to the [-15, 15] range while retaining the
// difference between the original value and -15 on clamping.
// This is the excess shift value in addition to 13.
int expdelta = Math.max(0, -15 - exp);
exp += expdelta;
assert -15 <= exp && exp <= 15;
int f_signif_bits = doppel & 0x007f_ffff; // original significand
// Significand bits as if using rounding to zero (truncation).
short signif_bits = (short)(f_signif_bits >> (13 + expdelta));
// For round to nearest even, determining whether or
// not to round up (in magnitude) is a function of the
// least significant bit (LSB), the next bit position
// (the round position), and the sticky bit (whether
// there are any nonzero bits in the exact result to
// the right of the round digit). An increment occurs
// in three cases:
//
// LSB Round Sticky
// 0 1 1
// 1 1 0
// 1 1 1
// See "Computer Arithmetic Algorithms," Koren, Table 4.9
int lsb = f_signif_bits & (1 << 13 + expdelta);
int round = f_signif_bits & (1 << 12 + expdelta);
int sticky = f_signif_bits & ((1 << 12 + expdelta) - 1);
if (round != 0 && ((lsb | sticky) != 0 )) {
signif_bits++;
}
// No bits set in significand beyond the *first* exponent
// bit, not just the sigificand; quantity is added to the
// exponent to implement a carry out from rounding the
// significand.
assert (0xf800 & signif_bits) == 0x0;
return (short)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );
src/java.base/share/classes/java/lang/Float.java line 1186:
> 1184: return (short)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );
> 1185: }
> 1186: }
`}` should be indented by 4 spaces.
-------------
PR: https://git.openjdk.org/jdk/pull/9422
More information about the core-libs-dev
mailing list