RFR: 8251989: Hex formatting and parsing utility [v12]

Tue Dec 1 11:34:05 UTC 2020

On Mon, 30 Nov 2020 20:56:17 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> java.util.HexFormat utility:
>> 
>>  - Format and parse hexadecimal strings, with parameters for delimiter, prefix, suffix and upper/lowercase
>>  - Static factories and builder methods to create HexFormat copies with modified parameters.
>>  - Consistent naming of methods for conversion of byte arrays to formatted strings and back: formatHex and parseHex
>>  - Consistent naming of methods for conversion of primitive types: toHexDigits... and fromHexDigits...
>>  - Prefix and suffixes now apply to each formatted value, not the string as a whole
>>  - Using java.util.Appendable as a target for buffered conversions so output to Writers and PrintStreams
>>    like System.out are supported in addition to StringBuilder. (IOExceptions are converted to unchecked exceptions)
>>  - Immutable and thread safe, a "value-based" class
>> 
>> See the [HexFormat javadoc](http://cr.openjdk.java.net/~rriggs/8251989-hex-formatter/java.base/java/util/HexFormat.html) for details.
>> 
>> Review comments and suggestions welcome.
>
> Roger Riggs has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Clarified hexadecimal characters used in converting from characters to values to be strictly 0-9, a-f, and A-F.
>   Added a test to verify isHexDigit and fromHexDigit for the entire range of chars

src/java.base/share/classes/java/lang/Integer.java line 265:

> 263:      * of byte arrays and primitives to return a string or adding to an {@link Appendable}.
> 264:      * HexFormat formats and parses uppercase or lowercase hexadecimal characters,
> 265:      * with leading zeros and for byte arrays includes for each byte

Should that be:

     * {@code HexFormat} formats and parses uppercase or lowercase hexadecimal characters,

or 

     * The {@code HexFormat} class formats and parses uppercase or lowercase hexadecimal characters,

src/java.base/share/classes/java/lang/Long.java line 299:

> 297:      * {@link java.util.HexFormat} provides formatting and parsing
> 298:      * of byte arrays and primitives to return a string or adding to an {@link Appendable}.
> 299:      * HexFormat formats and parses uppercase or lowercase hexadecimal characters,

Same remark here: `{@code HexFormat}`

src/java.base/share/classes/java/util/HexFormat.java line 171:

> 169:             new HexFormat("", "", "", LOWERCASE_DIGITS);
> 170: 
> 171:     private static final byte[] emptyBytes = new byte[0];

Nit: I believe it might be clearer (at the point of use) if that constant name was all upper case - as it would be more obvious that what is used there is a static final constant. e.g. if the name was something like EMPTY_BYTES or NO_BYTES then when it's later used in the code it would be clearer that a constant is being referenced.

src/java.base/share/classes/java/util/HexFormat.java line 347:

> 345:         if (s == null) {
> 346:             StringBuilder sb = new StringBuilder((toIndex - fromIndex) *
> 347:                     (delimiter.length() + prefix.length() + 2 + suffix.length()) - delimiter.length());

There should be some guard here that this length computation does not overflow. 
And also a test to verify that this is handled properly?

src/java.base/share/classes/java/util/HexFormat.java line 362:

> 360:      *
> 361:      * @param <A> The type of Appendable
> 362:      * @param out an Appendable, non-null

should be {@code Appendable} on both lines

src/java.base/share/classes/java/util/HexFormat.java line 379:

> 377:      *
> 378:      * @param <A> The type of Appendable
> 379:      * @param out an Appendable, non-null

should be `{@code Appendable}` on both lines

src/java.base/share/classes/java/util/HexFormat.java line 436:

> 434:         if (delimiter.isEmpty()) {
> 435:             // Allocate the byte array and fill in the hex pairs for each byte
> 436:             rep = new byte[length * 2];

What if `length * 2` overflows?

src/java.base/share/classes/java/util/HexFormat.java line 445:

> 443:             // Then insert the delimiter and hexadecimal characters for each of the remaining bytes
> 444:             char sep = delimiter.charAt(0);
> 445:             rep = new byte[length * 3 - 1];

same here: what if `length * 3 - 1` overflows?

src/java.base/share/classes/java/util/HexFormat.java line 516:

> 514: 
> 515:         int valueChars = prefix.length() + 2 + suffix.length();
> 516:         int stride = valueChars + delimiter.length();

Unlikely - but there's still a possibility of overflow in these two lines?
I wonder if those should be checked at construction time - and an IAE thrown if someone tried to create a HexFormat with unreasonable prefix/suffix/delimiters?
Otherwise maybe use Math.addExact()...

src/java.base/share/classes/java/util/HexFormat.java line 660:

> 658:      *
> 659:      * @param <A> The type of Appendable
> 660:      * @param out an Appendable, non-null

Should be `{@code Appendable}` on both lines

src/java.base/share/classes/java/util/HexFormat.java line 741:

> 739:     /**
> 740:      * Returns the sixteen hexadecimal characters for the {@code long} value
> 741:      * considering it to be unsigned.

What does this means? And why is it necessary for long when it wasn't necessary for other signed types (byte, short, int). I'd vote for removing `considering it to be unsigned` there - or else add it everywhere else, but then you would have to explain what it means and what you would print if you considered it signed.

src/java.base/share/classes/java/util/HexFormat.java line 887:

> 885:      * The characters in the range from {@code index} to {@code index + 1},
> 886:      * inclusive, must be valid hex digits according to {@link #fromHexDigit(int)}.
> 887:      * The delimiter, prefix and suffix are not used.

In this method - and all similar methods that follow - then is `isUpperCase()` taken into account? If not maybe it should be made explicit - just as for delimiter, prefix, and suffix? 
Alternatively, remove this line and make the method static. Then it would be clear that upperCase, delimiter, prefix, and suffix are not used.

-------------

PR: https://git.openjdk.java.net/jdk/pull/482