RFR: 8251989: Hex formatting and parsing utility [v10]

Mon Nov 30 17:32:28 UTC 2020

Hi Roger,

Thanks for your thoughts; I agree with you. Since this is a utility 
primarily meant for developers, not end users, limiting "hexadecimal 
string/character" to Latin-1 seems reasonable.

Naoto

On 11/30/20 7:42 AM, Roger Riggs wrote:
> Hi Naoto,
> 
> There are a couple of ways consistency can be achieved (and with what).
> 
> The existing conversions from hexadecimal strings to primitives all 
> delegate to Character.digit(ch, radix), which allows
> both digits and letters beyond Latin1. (See Integer.valueOf(string, 
> radix), Long.valueOf(string, radix), etc.)
> For conversions from primitives to strings, they produce only the 
> Latin1 characters "0-9" and "a-f".
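For illustration only (a standalone sketch, not code from the PR; the class name HexAsymmetryDemo is hypothetical), the asymmetry can be seen with standard JDK methods: parsing accepts fullwidth characters through Character.digit, while formatting emits only Latin1 lowercase hex.

```java
// Standalone illustration: parsing via Character.digit accepts non-Latin1
// digits and letters, while formatting emits only Latin1 "0-9a-f".
class HexAsymmetryDemo {
    public static void main(String[] args) {
        // FULLWIDTH DIGIT ZERO (U+FF10) is a Unicode decimal digit,
        // so Character.digit reports the value 0 for it.
        System.out.println(Character.digit('\uFF10', 16));   // 0

        // FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) counts as hex digit 10.
        System.out.println(Character.digit('\uFF21', 16));   // 10

        // Integer.valueOf(String, int) delegates to parseInt, which uses
        // Character.digit, so fullwidth "10" parses as hexadecimal 0x10.
        int parsed = Integer.valueOf("\uFF11\uFF10", 16);
        System.out.println(parsed);                          // 16

        // Formatting produces only Latin1 lowercase characters, so the
        // fullwidth input is not reproduced on the way back.
        System.out.println(Integer.toHexString(parsed));     // 10
        System.out.println(Integer.toHexString(0xCAFE));     // cafe
    }
}
```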
> 
> Making the conversions of strings to and from primitives consistent 
> within HexFormat seems attractive,
> but it would diverge from the existing conversions, and in practice the 
> non-Latin1 digits and letters almost never appear.
> 
> There are use cases (primarily in protocols and RFCs) where the 
> hexadecimal characters are
> specified as "0-9", "a-f", and "A-F". If HexFormat used 
> Character.digit(ch, radix), it would fail
> to detect unexpected or illegal characters, rendering HexFormat 
> unusable for those use cases.
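A minimal sketch of the protocol case, assuming a field whose grammar allows only "0-9", "a-f" and "A-F" (HexFieldCheck and its methods are hypothetical names): a check built on Character.digit lets fullwidth digits through, while a Latin1-only pattern rejects them.

```java
import java.util.regex.Pattern;

// Hypothetical validator for a protocol field whose grammar permits
// only "0-9", "a-f" and "A-F".
class HexFieldCheck {
    // Without CASE_INSENSITIVE/UNICODE_CASE flags, this class matches
    // exactly the Latin1 hexadecimal characters.
    private static final Pattern STRICT_HEX = Pattern.compile("[0-9a-fA-F]+");

    static boolean isStrictHex(String s) {
        return STRICT_HEX.matcher(s).matches();
    }

    // Lenient check built on Character.digit: every char that has a
    // hexadecimal digit value passes, including non-Latin1 digits and letters.
    static boolean isDigitBasedHex(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (Character.digit(s.charAt(i), 16) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String field = "\uFF10\uFF11"; // FULLWIDTH DIGIT ZERO, FULLWIDTH DIGIT ONE
        System.out.println(isDigitBasedHex(field)); // true: the illegal field slips through
        System.out.println(isStrictHex(field));     // false: the illegal field is detected
    }
}
```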
> 
> Though it diverges from the existing parsing of 
> hexadecimal in Character, Integer, Long, etc.,
> I'll post an update that parses strings allowing only the Latin1 
> hexadecimal characters.
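One way a Latin1-only digit conversion could look, sketched under the assumption that invalid characters should raise NumberFormatException; Latin1HexDigit is a hypothetical name, and this is not the code posted in the update.

```java
// Hypothetical Latin1-only hex digit conversion (not the code in the PR).
class Latin1HexDigit {
    // Returns 0-15 for '0'-'9', 'a'-'f' and 'A'-'F'; every other
    // code point, including fullwidth digits, is rejected.
    static int fromHexDigit(int ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        throw new NumberFormatException(
                "not a hexadecimal digit: U+" + String.format("%04X", ch));
    }

    public static void main(String[] args) {
        System.out.println(fromHexDigit('c'));       // 12
        try {
            fromHexDigit('\uFF10');                  // FULLWIDTH DIGIT ZERO
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());      // not a hexadecimal digit: U+FF10
        }
    }
}
```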
> 
> Comments?
> 
> Thanks, Roger
> 
> 
> 
> On 11/27/20 5:43 PM, Naoto Sato wrote:
>> On Fri, 27 Nov 2020 16:57:07 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>>
>>>> src/java.base/share/classes/java/util/HexFormat.java line 853:
>>>>
>>>>> 851:      */
>>>>> 852:     public int fromHexDigit(int ch) {
>>>>> 853:         int value = Character.digit(ch, 16);
>>>> Do we need to limit parsing of hex digits to only [0-9a-fA-F]? As 
>>>> written, this returns a value for other digits too, e.g., `0` for `fullwidth digit zero` (U+FF10).
>>> The normal and conventional characters for hex encoding are limited 
>>> to the ASCII/Latin1 range.
>>> I don't know of any use case that would take advantage of non-ASCII 
>>> characters.
>> My point is that we should probably define `hexadecimal string` more 
>> clearly. In the class description, it exclusively means [0-9a-fA-F] 
>> in the context of formatting, but parsing allows non-ASCII 
>> digits. For example,
>> HexFormat.of().parseHex("\uff10\uff11")
>> succeeds. I would like consistency here.
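A sketch of a strict byte-array parser that would reject the parseHex example in this message; StrictParseHex and its error messages are hypothetical and do not describe the actual HexFormat.parseHex. It restricts input to ASCII before delegating to Character.digit, which within ASCII returns a non-negative value for radix 16 exactly for 0-9, a-f and A-F.

```java
import java.util.Arrays;

// Hypothetical strict hex-to-bytes parser: only Latin1 hex characters accepted.
class StrictParseHex {
    // Non-ASCII characters are rejected before Character.digit is consulted,
    // so fullwidth digits and letters never yield a value.
    private static int hexValue(CharSequence s, int index) {
        char c = s.charAt(index);
        int v = (c < 0x80) ? Character.digit(c, 16) : -1;
        if (v < 0) {
            throw new IllegalArgumentException(
                    "invalid hexadecimal character at index " + index);
        }
        return v;
    }

    // Parses pairs of hex characters into bytes; the length must be even.
    static byte[] parseHex(CharSequence s) {
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd length: " + s.length());
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = hexValue(s, 2 * i);
            int lo = hexValue(s, 2 * i + 1);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseHex("0a1F"))); // [10, 31]
        try {
            parseHex("\uff10\uff11");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // invalid hexadecimal character at index 0
        }
    }
}
```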
>>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/482
>