RFR: JDK-8289551: Conversions between bit representations of half precision values and floats [v6]

Sat Jul 23 20:07:12 UTC 2022

On Fri, 22 Jul 2022 01:29:57 GMT, Joe Darcy <darcy at openjdk.org> wrote:

>> Initial implementation.
>
> Joe Darcy has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Implement review feedback.

src/java.base/share/classes/java/lang/Float.java line 980:

> 978: 
> 979:     /**
> 980:      * {@return the {@code float} value closest to the numerical value

`{@return` seems to be out of place and missing a closing `}`

src/java.base/share/classes/java/lang/Float.java line 1028:

> 1026:         // Shift left difference in the number of significand bits in
> 1027:         // the float and binary16 formats
> 1028:         final int SIGNIF_SHIFT = (FloatConsts.SIGNIFICAND_WIDTH - 11);

Wouldn't it make more sense to declare this local as `private static final`? It could then even be used in the next method.

src/java.base/share/classes/java/lang/Float.java line 1036:

> 1034:         // (significand width includes the implicit bit so shift one
> 1035:         // less).
> 1036:         int bin16Exp = (((bin16ExpBits >> 10) - 15));

Opening `((` and closing `))` are über-redundant

src/java.base/share/classes/java/lang/Float.java line 1059:

> 1057:         int result = (floatExpBits |
> 1058:                       (bin16SignifBits << SIGNIF_SHIFT));
> 1059:         return sign * Float.intBitsToFloat(result);

One could use `bin16SignBit` when preparing the `int result` bits to avoid a `float` multiplication just to generate the sign

src/java.base/share/classes/java/lang/Float.java line 1063:

> 1061: 
> 1062:     /**
> 1063:      * {@return the floating-point binary16 value, encoded in a {@code

`{@0return` seems to be out of place and missing a closing `}`

src/java.base/share/classes/java/lang/Float.java line 1110:

> 1108: 
> 1109:         // The overflow threshold is binary16 MAX_VALUE + 1/2 ulp
> 1110:         if (abs_f >= (65504.0f + 16.0f) ) {

What about `(0x1.ffcp15f + 0x0.002p15f)` for the overflow threshold? The bits are reflected more clearly.

src/java.base/share/classes/java/lang/Float.java line 1112:

> 1110:         if (abs_f >= (65504.0f + 16.0f) ) {
> 1111:             return (short)(sign_bit | 0x7c00); // Positive or negative infinity
> 1112:         } else {

The ` else {` with the corresponding closing `}` could be removed and the code in this branch could be un-indented. This would make the code visually less "jagged".

src/java.base/share/classes/java/lang/Float.java line 1115:

> 1113:             // Smallest magnitude nonzero representable binary16 value
> 1114:             // is equal to 0x1.0p-24; half-way and smaller rounds to zero.
> 1115:             if (abs_f <= 0x1.0p-25f) { // Covers float zeros and subnormals.

I think `0x0.002p-14f` or `0.5f * 0x0.004p-14f` might be clearer than `0x1.0p-25f`. The latter requires mental calculation to verify.

src/java.base/share/classes/java/lang/Float.java line 1122:

> 1120:             // binary16 (when rounding is done, could still round up)
> 1121:             int exp = Math.getExponent(f);
> 1122:             assert -25 <= exp && exp <= 15;

I think that both the subnormal and the normal case can be unified if we pay closer attention to the positions of the lsb, round and sticky bits in subnormals.

        // Clamp exp to the [-15, 15] range while retaining the
        // difference between the original value and -15 on clamping.
        // This is the excess shift value in addition to 13.
        int expdelta = Math.max(0, -15 - exp);
        exp += expdelta;
        assert -15 <= exp && exp <= 15;

        int f_signif_bits = doppel & 0x007f_ffff;  // original significand
        // Significand bits as if using rounding to zero (truncation).
        short signif_bits = (short)(f_signif_bits >> (13 + expdelta));

        // For round to nearest even, determining whether or
        // not to round up (in magnitude) is a function of the
        // least significant bit (LSB), the next bit position
        // (the round position), and the sticky bit (whether
        // there are any nonzero bits in the exact result to
        // the right of the round digit). An increment occurs
        // in three cases:
        //
        // LSB  Round Sticky
        // 0    1     1
        // 1    1     0
        // 1    1     1
        // See "Computer Arithmetic Algorithms," Koren, Table 4.9

        int lsb    = f_signif_bits & (1 << 13 + expdelta);
        int round  = f_signif_bits & (1 << 12 + expdelta);
        int sticky = f_signif_bits & ((1 << 12 + expdelta) - 1);

        if (round != 0 && ((lsb | sticky) != 0 )) {
            signif_bits++;
        }

        // No bits set in significand beyond the *first* exponent
        // bit, not just the sigificand; quantity is added to the
        // exponent to implement a carry out from rounding the
        // significand.
        assert (0xf800 & signif_bits) == 0x0;

        return (short)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );

src/java.base/share/classes/java/lang/Float.java line 1186:

> 1184:             return (short)(sign_bit | ( ((exp + 15) << 10) + signif_bits ) );
> 1185:         }
> 1186: }

`}` should be indented by 4 spaces.

-------------

PR: https://git.openjdk.org/jdk/pull/9422