RFR: 8251989: Hex formatting and parsing utility [v10]

Roger Riggs Roger.Riggs at oracle.com
Mon Nov 30 15:42:13 UTC 2020


Hi Naoto,

There are a couple of ways consistency can be achieved (and with what).

The existing hex conversions from strings to primitives all delegate to
Character.digit(ch, radix), which allows both digits and letters beyond
Latin1. (See Integer.valueOf(string, radix), Long.valueOf(string, radix),
etc.)
For conversions from primitive to string, they produce only the
Latin1 characters "0-9", "a-f".
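
To make the asymmetry concrete, here is a small sketch of how the
existing JDK methods behave. The class and helper names are made up for
illustration; only Character.digit, Integer.parseInt and
Integer.toHexString are real JDK calls.

```java
// Illustrative only: the class name and digitOf helper are hypothetical.
class ExistingHexParsing {
    // Character.digit accepts any Unicode decimal digit as well as
    // fullwidth Latin letters, not just ASCII 0-9, a-f, A-F.
    static int digitOf(int ch) {
        return Character.digit(ch, 16);
    }

    public static void main(String[] args) {
        System.out.println(digitOf('7'));       // ASCII digit
        System.out.println(digitOf('\uff10'));  // U+FF10 FULLWIDTH DIGIT ZERO
        System.out.println(digitOf('\uff21'));  // U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
        // Parsing accepts the fullwidth string "10" in radix 16 ...
        System.out.println(Integer.parseInt("\uff11\uff10", 16));
        // ... while formatting only ever produces ASCII characters.
        System.out.println(Integer.toHexString(255));
    }
}
```

Run as written, this prints 7, 0, 10, 16 and ff, one per line: parsing
is lenient about non-Latin1 input, formatting is not.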

Making the conversion of strings to and from primitives consistent
within HexFormat seems attractive, but it would diverge from the existing
conversions, and the non-Latin1 digits and letters almost never appear
in practice.

There are use cases (primarily in protocols and RFCs) where the
hexadecimal characters are
specified as "0-9", "a-f", and "A-F".  If HexFormat used
Character.digit(ch, radix), it would fail
to detect unexpected or illegal characters, rendering HexFormat
unusable for those use cases.

Though it would diverge from consistency with the existing parsing of
hexadecimal in Character, Integer, Long, etc.,
I'll post an update in which string parsing allows only Latin1
hexadecimal characters.
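
As a rough sketch of what "allowing only Latin1 hexadecimal characters"
could look like, and not the code in the PR: the class and method below
are hypothetical, and the choice of NumberFormatException for rejected
input is an assumption.

```java
// Hypothetical sketch of a strict hex digit parser; not the HexFormat code.
class StrictHexDigit {
    // Returns 0..15 for '0'-'9', 'a'-'f', 'A'-'F' and rejects everything
    // else, including non-Latin1 digits that Character.digit would accept.
    static int fromHexDigit(int ch) {
        if (ch >= '0' && ch <= '9') {
            return ch - '0';
        }
        if (ch >= 'a' && ch <= 'f') {
            return ch - 'a' + 10;
        }
        if (ch >= 'A' && ch <= 'F') {
            return ch - 'A' + 10;
        }
        // Assumption: signal bad input with NumberFormatException.
        throw new NumberFormatException(
                "not a hexadecimal digit: U+" + Integer.toHexString(ch));
    }

    public static void main(String[] args) {
        System.out.println(fromHexDigit('f'));   // accepted
        try {
            fromHexDigit('\uff10');              // U+FF10 FULLWIDTH DIGIT ZERO
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, input such as "\uff10\uff11" is reported as an
error instead of being silently parsed, which is what the protocol and
RFC use cases need.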

Comments?

Thanks, Roger



On 11/27/20 5:43 PM, Naoto Sato wrote:
> On Fri, 27 Nov 2020 16:57:07 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>
>>> src/java.base/share/classes/java/util/HexFormat.java line 853:
>>>
>>>> 851:      */
>>>> 852:     public int fromHexDigit(int ch) {
>>>> 853:         int value = Character.digit(ch, 16);
>>> Do we need to limit parsing the hex digit for only [0-9a-fA-F]? This would return `0` for other digits, say `fullwidth digit zero` (U+FF10)
>> The normal and conventional characters for hex encoding are limited to the ASCII/Latin1 range.
>> I don't know of any use case that would take advantage of non-ASCII characters.
> My point is that probably we should define `hexadecimal string` more clearly. In the class description, that exclusively means [0-9a-fA-F] in the context of formatting, but in the parsing, it allows non-ASCII digits. e.g.,
> HexFormat.of().parseHex("\uff10\uff11")
> Succeeds. I would like consistency here.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/482


