RFR: 8251989: Hex formatting and parsing utility [v10]
Naoto Sato
naoto.sato at oracle.com
Mon Nov 30 17:32:28 UTC 2020
Hi Roger,
Thanks for your thoughts; I agree with you. Since this utility is
primarily meant for developers, not end users, limiting "hexadecimal
string/character" to Latin-1 seems reasonable.
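
To illustrate the distinction, here is a minimal sketch contrasting a lookup based on Character.digit(ch, 16) with one restricted to [0-9a-fA-F]. The class HexDigitDemo and the methods lenientHexDigit and strictHexDigit are hypothetical names used only for this example; this is not the HexFormat implementation.

```java
// Hypothetical demo class for illustration only; not part of the JDK.
final class HexDigitDemo {

    // Lenient lookup: delegates to Character.digit, which also accepts
    // digits and letters outside Latin-1 (for example, fullwidth forms).
    static int lenientHexDigit(int ch) {
        return Character.digit(ch, 16);
    }

    // Strict lookup: accepts only '0'-'9', 'a'-'f' and 'A'-'F'.
    // Returns -1 for any other character.
    static int strictHexDigit(int ch) {
        if (ch >= '0' && ch <= '9') {
            return ch - '0';
        }
        if (ch >= 'a' && ch <= 'f') {
            return ch - 'a' + 10;
        }
        if (ch >= 'A' && ch <= 'F') {
            return ch - 'A' + 10;
        }
        return -1;
    }

    public static void main(String[] args) {
        int fullwidthZero = 0xFF10; // FULLWIDTH DIGIT ZERO
        System.out.println("lenient: " + lenientHexDigit(fullwidthZero));
        System.out.println("strict:  " + strictHexDigit(fullwidthZero));
    }
}
```

With the lenient variant, U+FF10 is treated as the digit 0; the strict variant reports it as invalid, which is the behavior that protocol and RFC use cases expect.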
Naoto
On 11/30/20 7:42 AM, Roger Riggs wrote:
> Hi Naoto,
>
> There are a couple of ways consistency can be achieved (and with what).
>
> The existing hex conversions from strings to primitives all delegate to
> Character.digit(ch, radix), which allows
> both digits and letters beyond Latin1. (See Integer.valueOf(string,
> radix), Long.valueOf(string, radix), etc.)
> Conversions from primitive to string produce only the
> Latin1 characters "0-9" and "a-f".
>
> Making the conversion of strings to and from primitives consistent
> within HexFormat seems attractive
> but would diverge from existing conversions; in practice, non-Latin1
> digits and letters almost never appear.
>
> There are use cases (primarily in protocols and RFCs) where the
> hexadecimal characters are
> specified as "0-9", "a-f", and "A-F". If HexFormat used
> Character.digit(ch, radix) it would fail
> to detect unexpected or illegal characters and render HexFormat
> unusable for those use cases.
>
> Though it would diverge from consistency with existing parsing of
> hexadecimal in Character, Integer, Long, etc,
> I'll post an update to use the string parsing allowing only Latin1
> hexadecimal characters.
>
> Comments?
>
> Thanks, Roger
>
>
>
> On 11/27/20 5:43 PM, Naoto Sato wrote:
>> On Fri, 27 Nov 2020 16:57:07 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>>
>>>> src/java.base/share/classes/java/util/HexFormat.java line 853:
>>>>
>>>>> 851: */
>>>>> 852: public int fromHexDigit(int ch) {
>>>>> 853: int value = Character.digit(ch, 16);
>>>> Should we limit hex digit parsing to only [0-9a-fA-F]? As written, this
>>>> would return `0` for other digits, such as `fullwidth digit zero` (U+FF10)
>>> The normal and conventional characters for hex encoding are limited
>>> to the ASCII/Latin1 range.
>>> I don't know of any use case that would take advantage of non-ASCII
>>> characters.
>> My point is that we should probably define `hexadecimal string` more
>> clearly. In the class description, it exclusively means [0-9a-fA-F]
>> in the context of formatting, but parsing allows non-ASCII
>> digits. For example,
>>     HexFormat.of().parseHex("\uff10\uff11")
>> succeeds. I would like consistency here.
>>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/482
>