JLS 3.10.2 -- Exposition of hexadecimal f.p. literals

Sat Oct 26 00:29:17 UTC 2019

To provide some additional background on this thread if not the JLS 
section in question, hexadecimal floating-point literals are a very 
useful language feature for a narrow range of situations. Those 
situations include having a straightforward way to set the exact bits of 
a floating-point value using a roughly human readable format. I commonly 
use hexadecimal floating-point literals in numerical tests and have 
added them to the narrative specs for sentinel values such as 
Double.MAX_VALUE.

A finite IEEE floating-point value is conceptually a tuple of three 
values, sign, significand, and exponent, where the significand and 
exponent have ranges that are a function of the format in question, 
float or double, etc. Depending on how one wants to formulate the 
values, the ranges of each of these values can be given in terms of a 
set of contiguous integers.
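As a rough illustration of that tuple, here is a minimal sketch that pulls the three fields out of a double's bit pattern. The class and method names are made up for this note; only Double.doubleToRawLongBits is an actual library call, and the field widths (1, 11, and 52 bits) are those of the IEEE 754 binary64 format.

```java
public class DoubleFields {
    // Sign bit: 0 for positive, 1 for negative.
    public static int sign(double d) {
        return (int) (Double.doubleToRawLongBits(d) >>> 63);
    }

    // 11-bit exponent field as stored, i.e. with the bias of 1023 included.
    public static int biasedExponent(double d) {
        return (int) ((Double.doubleToRawLongBits(d) >>> 52) & 0x7ffL);
    }

    // 52-bit significand field; the leading bit of a normal value is implicit
    // and is not stored here.
    public static long significandField(double d) {
        return Double.doubleToRawLongBits(d) & 0x000f_ffff_ffff_ffffL;
    }

    public static void main(String[] args) {
        double d = -3.0;
        System.out.printf("sign=%d biasedExponent=%d significandField=0x%013x%n",
                sign(d), biasedExponent(d), significandField(d));
    }
}
```

For a subnormal value such as Double.MIN_VALUE the stored exponent field is 0 and the significand field is 1.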

In a hex floating-point literal, the significand value is written in hex 
but the exponent value is written in *decimal*. However, the decimal 
value is used as an exponent for base 2 and in that sense is a "binary" 
exponent. This seemingly conflicting design works for the intended use 
cases.

For example, the smallest nonzero double value is numerically equal to 
2^-1074. As a hex literal this can be written in a number of ways including

     0x1.0p-1074

in decimal, 1 * 2 ^ -1074. The way to write the value corresponding to 
the representation of the floating-point value, using some underscores 
to help grouping, is

     0x0.0000_0000_0000_1p-1022

Decoding, this is a subnormal value (leading digit 0 with the lowest 
exponent value) and only the least significant bit of the significand is 
set. The double format uses 52 bits for its significand field, which is 
13 hex digits.
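A small check of the two spellings above, as a sketch rather than anything normative. The class and constant names are invented for this note; the library calls are Double.doubleToRawLongBits and Double.toHexString.

```java
public class MinValueLiterals {
    // 1 * 2^-1074, written with a normalized-looking significand.
    public static final double AS_POWER = 0x1.0p-1074;

    // The same value written to mirror the subnormal representation:
    // 13 hex digits of significand field, lowest exponent.
    public static final double AS_REPRESENTATION = 0x0.0000_0000_0000_1p-1022;

    public static void main(String[] args) {
        System.out.println(AS_POWER == Double.MIN_VALUE);
        System.out.println(AS_REPRESENTATION == Double.MIN_VALUE);
        // Only the least significant bit of the representation is set.
        System.out.println(Double.doubleToRawLongBits(AS_REPRESENTATION) == 1L);
        // The library's own hex rendering uses the representation-style form.
        System.out.println(Double.toHexString(Double.MIN_VALUE));
    }
}
```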

Other examples would include how to write "3.0" in a way corresponding 
to the representation:

     0x1.8p1

that is 1.8 as a hex value (namely 1.5 in decimal) multiplied by 2^1 = 2.
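To make the point that one value has many hex spellings, here is a short sketch with a few ways of writing 3.0. The class and constant names are hypothetical; the alternative spellings are my own examples, not text proposed for the JLS.

```java
public class ThreeAsHex {
    public static final double NORMALIZED = 0x1.8p1;   // 1.5  * 2^1
    public static final double INTEGER    = 0x3p0;     // 3    * 2^0
    public static final double SCALED_UP  = 0xcp-2;    // 12   * 2^-2
    public static final double FRACTION   = 0x0.cp2;   // 0.75 * 2^2

    public static void main(String[] args) {
        System.out.println(NORMALIZED == 3.0 && INTEGER == 3.0
                && SCALED_UP == 3.0 && FRACTION == 3.0);
        // Double.toHexString picks the spelling that corresponds to the
        // representation of a normal value: one leading 1 digit.
        System.out.println(Double.toHexString(3.0));
    }
}
```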

On 10/24/2019 5:04 PM, Alex Buckley wrote:
> On 10/24/2019 1:38 PM, Joe Darcy wrote:
>> To make an explicit statement about the value of a floating-point 
>> literal, I suggest after the sentence
>>
>> "A floating-point literal may be expressed in decimal (base 10) or 
>> hexadecimal (base 16). "
>>
>> adding something like
>>
>> "The exact numerical value of a decimal floating-point literal is
>>
>>      decimal_sequence * 10 ^ exponent
>>
>> The exact numerical value of a hexadecimal floating-point literal is
>>
>>      hex_sequence * 2 ^ exponent
>>
>> The conversion of the exact numerical value to a particular 
>> floating-point value is handled as if by Float.valueOf or 
>> Double.valueOf for literals of type float and double, respectively."
>
> This is a good start, but needs tightening up. Please consider this 
> text as if you're seeing it for the first time, bearing in mind that 
> it's defining terms which map to productions in the grammar 
> immediately after.

To be more explicit, "decimal_sequence * 10 ^ exponent" is an informal 
shorthand for "in each of the possible decimal floating-point literal 
forms below, collect together the leading digits as a digit sequence, 
treat it as a normal rational numerical value and multiply it by 10 
raised to the exponent, where the exponent is implicitly 0 if not 
syntactically present in the literal."

     Digits . [Digits] [ExponentPart] [FloatTypeSuffix]
     . Digits [ExponentPart] [FloatTypeSuffix]
     Digits ExponentPart [FloatTypeSuffix]
     Digits [ExponentPart] FloatTypeSuffix

>
> -----
> A floating-point literal may be expressed in decimal (base 10) or 
> hexadecimal (base 16).
>
> For decimal floating-point literals, at least one digit (in either the 
> whole number or the fraction part) and either a decimal point, an 
> exponent, or a float type suffix are required. All other parts are 
> optional. The exponent, if present, is indicated by the ASCII letter e 
> or E followed by an optionally signed integer.
>
> The exact numerical value of a decimal floating-point literal is:
>   decimal_sequence * 10 ^ exponent
>
> For hexadecimal floating-point literals, at least one digit is 
> required (in either the whole number or the fraction part), and the 
> exponent is mandatory, and the float type suffix is optional. The 
> exponent is indicated by the ASCII letter p or P followed by an 
> optionally signed integer.
>
> The exact numerical value of a hexadecimal floating-point literal is:
>   hex_sequence * 2 ^ exponent
>
> Underscores are allowed as separators between digits that denote the 
> whole-number part, and between digits that denote the fraction part, 
> and between digits that denote the exponent.
> -----
>
> - What is "decimal_sequence"? The answer must be in terms of the 
> artifacts mentioned in the immediately preceding paragraph -- or 
> modify the grammar to introduce new artifacts that can be described in 
> the narrative.
>
> - Similarly for "hex_sequence".
>
> - A decimal f-p literal need not include the exponent part, so the 
> definition can't just assume "exponent" is known.
>
> - For a hexadecimal f-p literal, the questioner mentioned that the 
> (mandatory) exponent is "in base 2", but there is no requirement to 
> write the exponent in binary. There's lots of potential for confusion 
> here. What are some examples of hexadecimal f-p literals?
>
> In the JLS, it is often the most fundamental descriptions and 
> operations that are the hardest to phrase. We're not there yet for f-p 
> literal values.
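On the request for examples, the following sketch shows a few literals whose values follow the "sequence times base raised to exponent" reading discussed in this thread. The class and constant names are hypothetical and the particular literals are only illustrations I picked.

```java
public class LiteralValues {
    // Decimal literals: digit sequence * 10^exponent.
    public static final double DEC_WITH_EXPONENT = 1.5e2;   // 1.5 * 10^2  = 150
    public static final double DEC_NEG_EXPONENT  = 25e-1;   // 25  * 10^-1 = 2.5
    public static final double DEC_NO_EXPONENT   = 2.5;     // exponent absent: 2.5 * 10^0

    // Hexadecimal literals: hex digit sequence * 2^exponent.
    public static final double HEX_TEN_HALVED = 0xAp-1;     // 10  * 2^-1 = 5
    public static final double HEX_FRACTION   = 0x.8p4;     // 0.5 * 2^4  = 8
    // The exponent digits are decimal: "10" here means ten, so this is 2^10,
    // not 2^16 (hex reading) and not 2^2 (binary reading).
    public static final double HEX_DECIMAL_EXPONENT = 0x1p10;

    public static void main(String[] args) {
        System.out.println(DEC_WITH_EXPONENT + " " + DEC_NEG_EXPONENT + " " + DEC_NO_EXPONENT);
        System.out.println(HEX_TEN_HALVED + " " + HEX_FRACTION + " " + HEX_DECIMAL_EXPONENT);
    }
}
```

All of these values are exactly representable as doubles, so no rounding is involved in the comparison between the literal and the formula.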

There could be value in having a highly condensed floating-point primer 
in the JLS, but it would be fine to continue to omit such information as 
well. For example, the statement in the JLS

     "The smallest positive finite non-zero literal of type double is 
4.9e-324."

is considerably more subtle than it appears at first. All finite binary 
floating-point values are exactly representable as decimal values since 
10 = 2 * 5. There is a range of the number line which converts to 
Double.MIN_VALUE and many decimal strings in that range which get 
converted to Double.MIN_VALUE. The string "4.9e-324" is not the 
numerically smallest such string nor is it the numerically largest. The 
string giving the exact value has several hundred decimal digits. The 
string used is the shortest such string, which is regarded as the 
canonical one.
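A sketch of that situation, assuming a current JDK with correctly rounded Double.parseDouble. The class and method names are made up for this note, and the sample strings "3e-324" and "7e-324" are my own picks from inside the range that rounds to Double.MIN_VALUE.

```java
import java.math.BigDecimal;

public class MinValueStrings {
    // True if the decimal string converts to Double.MIN_VALUE.
    public static boolean parsesToMinValue(String s) {
        return Double.parseDouble(s) == Double.MIN_VALUE;
    }

    // Number of significant decimal digits in the exact value of MIN_VALUE.
    // new BigDecimal(double) converts the binary value exactly.
    public static int exactDigitCount() {
        return new BigDecimal(Double.MIN_VALUE).precision();
    }

    public static void main(String[] args) {
        // Strings numerically below and above 4.9e-324 that still round to MIN_VALUE.
        System.out.println(parsesToMinValue("3e-324"));
        System.out.println(parsesToMinValue("4.9e-324"));
        System.out.println(parsesToMinValue("7e-324"));
        // Below half of MIN_VALUE the nearest double is zero instead.
        System.out.println(Double.parseDouble("2e-324") == 0.0);
        System.out.println(exactDigitCount());
        // The short canonical string is what Double.toString produces.
        System.out.println(Double.toString(Double.MIN_VALUE));
    }
}
```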

Such subtleties could be alluded to in the JLS; if there is interest, I 
could work on a few paragraphs describing the situation.

Cheers,

-Joe