RFR: 8251989: Hex formatting and parsing utility [v10]
Roger Riggs
Roger.Riggs at oracle.com
Mon Nov 30 15:42:13 UTC 2020
Hi Naoto,
There are a couple of ways consistency can be achieved (and with what).
The existing hex conversions from strings to primitives all delegate to
Character.digit(ch, radix) which allows
both digits and letters beyond Latin1. (See Integer.valueOf(string,
radix), Long.valueOf(string, radix), etc.)
Conversions from primitives to strings, by contrast, produce only the
Latin1 characters "0-9", "a-f".
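
To make the asymmetry concrete, here is a minimal sketch using only
existing JDK calls (the class and helper names are made up for
illustration): parsing accepts fullwidth digits, while formatting only
ever emits Latin1 characters.

```java
public class HexDigitAsymmetry {
    // Parsing delegates to Character.digit, so fullwidth digits are accepted.
    static int parseHex(String s) {
        return Integer.valueOf(s, 16);
    }

    // Formatting only ever produces the Latin1 characters 0-9 and a-f.
    static String formatHex(int value) {
        return Integer.toHexString(value);
    }

    public static void main(String[] args) {
        // U+FF10 is FULLWIDTH DIGIT ZERO, U+FF11 is FULLWIDTH DIGIT ONE.
        System.out.println(Character.digit('\uff10', 16)); // prints 0
        System.out.println(parseHex("\uff11\uff10"));      // prints 16
        System.out.println(formatHex(255));                // prints ff
    }
}
```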
Making the conversion of strings to and from primitives consistent
within HexFormat seems attractive,
but it would diverge from existing conversions, and in practice the
non-Latin1 digits and letters almost never appear.
There are use cases (primarily in protocols and RFCs) where the
hexadecimal characters are
specified as "0-9", "a-f", and "A-F". If HexFormat used
Character.digit(ch, radix) it would fail
to detect unexpected or illegal characters, rendering HexFormat
unusable for those use cases.
Though it would diverge from the existing parsing of
hexadecimal in Character, Integer, Long, etc.,
I'll post an update so that string parsing allows only Latin1
hexadecimal characters.
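
As a rough sketch of what Latin1-only parsing of a single digit could
look like (the class and method names below are hypothetical, and this
is not the code in the PR):

```java
public class StrictHexDigit {
    // Hypothetical helper: accepts only 0-9, a-f, and A-F.
    // Anything else, including fullwidth digits, is rejected.
    static int strictHexDigit(int ch) {
        if (ch >= '0' && ch <= '9') {
            return ch - '0';
        }
        if (ch >= 'a' && ch <= 'f') {
            return ch - 'a' + 10;
        }
        if (ch >= 'A' && ch <= 'F') {
            return ch - 'A' + 10;
        }
        throw new NumberFormatException(
                "not a hexadecimal digit: U+" + Integer.toHexString(ch));
    }

    public static void main(String[] args) {
        System.out.println(strictHexDigit('7')); // prints 7
        System.out.println(strictHexDigit('F')); // prints 15
        try {
            strictHexDigit('\uff10'); // FULLWIDTH DIGIT ZERO
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, a fullwidth digit is reported as an error instead
of being silently read as 0, which is what the protocol and RFC use cases
need.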
Comments?
Thanks, Roger
On 11/27/20 5:43 PM, Naoto Sato wrote:
> On Fri, 27 Nov 2020 16:57:07 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>
>>> src/java.base/share/classes/java/util/HexFormat.java line 853:
>>>
>>>> 851: */
>>>> 852: public int fromHexDigit(int ch) {
>>>> 853: int value = Character.digit(ch, 16);
>>> Do we need to limit parsing the hex digit for only [0-9a-fA-F]? This would return `0` for other digits, say `fullwidth digit zero` (U+FF10)
>> The normal and conventional characters for hex encoding are limited to the ASCII/Latin1 range.
>> I don't know of any use case that would take advantage of non-ASCII characters.
> My point is that probably we should define `hexadecimal string` more clearly. In the class description, that exclusively means [0-9a-fA-F] in the context of formatting, but in the parsing, it allows non-ASCII digits. e.g.,
> HexFormat.of().parseHex("\uff10\uff11")
> Succeeds. I would like consistency here.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/482