RFR: 8251989: Hex formatting and parsing utility [v10]

Fri Nov 27 17:29:09 UTC 2020

On Fri, 27 Nov 2020 10:46:09 GMT, Chris Hegarty <chegar at openjdk.org> wrote:

>> Roger Riggs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision:
>> 
>>  - Clarified that suffix() and prefix() methods do not return null, instead the empty string is returned.
>>  - Merge branch 'master' into 8251989-hex-formatter
>>  - Merge branch 'master' into 8251989-hex-formatter
>>  - Merge branch 'master' into 8251989-hex-formatter
>>  - The HexFormat API indexing model for array and string ranges is changed
>>    to describe the range using 'fromIndex (inclusive)' and 'toIndex (exclusive)'.
>>    
>>    Initially, it was specified as 'index' and 'length'. However, both byte arrays
>>    and strings used in the HexFormat API typically use fromIndex and toIndex
>>    to describe ranges.  Using the same indexing model can prevent mistakes.
>>    
>>    The change affects the methods and corresponding tests:
>>    
>>        formatHex(byte[] bytes, int fromIndex, int toIndex)
>>        formatHex(A out, byte[] bytes, int fromIndex, int toIndex)
>>        parseHex(char[] chars, int fromIndex, int toIndex)
>>        parseHex(CharSequence string, int fromIndex, int toIndex)
>>        fromHexDigits(CharSequence string, int fromIndex, int toIndex)
>>        fromHexDigitsToLong(CharSequence string, int fromIndex, int toIndex)
>>  - - Added @see and @link references to Integer.toHexString and Long.toHexString
>>    - Clarified parsing is case insensistive in various parse and fromXXX methods
>>    - Source level cleanup based on review comments
>>    - Expanded some javadoc tag text to make it more descriptive
>>    - Consistent use of 'hexadecimal' vs 'hex'
>>  - Review comment updates to class javadoc
>>  - Review comment updates, in the example code, and to describe the characters used to convert to hexadecimal
>>  - Correct length of StringBuilder in formatHex;
>>    Correct bug in formatHex(char[], 2, 3) and add test for subranges of char[]
>>  - Merge branch 'master' into 8251989-hex-formatter
>>  - ... and 9 more: https://git.openjdk.java.net/jdk/compare/dcf4db97...b19d2827
>
> src/java.base/share/classes/java/util/HexFormat.java line 47:
> 
>> 45:  * or choice of {@link #withUpperCase()} or {@link #withLowerCase()} parameters.
>> 46:  * <p>
>> 47:  * For primitive to hexadecimal string conversions the {@code toHexDigits}
> 
> It took me a little while to grok the naming convention being used here. It might be helpful to ensure that all the terms are clearly defined and used consistently throughout. 
> 
> "hexadecimal string"  - consists of just 0-9, a-f, A-F  (i.e. no delimiter, prefix, suffix)
> "formatted hexadecimal string" - consists of hexadecimal string, and prefix, delimiter, suffix.
> 
> The `parseXXX` and `formatXXX` are logically related - and operate on "formatted hexadecimal strings"
> 
> The `toXXX` and `fromXXX` are logically related - one is kind of the inverse of the other. And these operate (even when accepting/producing chars) on "hexadecimal strings" - ignoring almost all properties of the HexFormat, except the toXXX cares about upper/lower case. Does `fromXXX` care about upper/lower ?

Yes, naming is important and there are two dimensions in the api that are significant.  
Primitives vs byte arrays, and decoding and encoding.
         Primitives   ByteArrays
encode:   toXXX         format 
decode:   fromXXX       parse
Both fromXXX or parseXXX accept either upper or lower case.

The class javadoc and each method specifies whether it uses the prefix, suffix, delimiter, and upper/lower case choice.

The choice of 'formatted byte array' was used to make a strong distinction between formatting of byte arrays vs converting primitives with limited/no formatting.

-------------

PR: https://git.openjdk.java.net/jdk/pull/482