RFR: 8251989: Hex formatting and parsing utility

Mon Oct 12 23:08:20 UTC 2020

On Fri, 2 Oct 2020 15:18:04 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

> java.util.HexFormat utility:
> 
>  - Format and parse hexadecimal strings, with parameters for delimiter, prefix, suffix and upper/lowercase
>  - Static factories and builder methods to create HexFormat copies with modified parameters.
>  - Consistent naming of methods for conversion of byte arrays to formatted strings and back: formatHex and parseHex
>  - Consistent naming of methods for conversion of primitive types: toHexDigits... and fromHexDigits...
>  - Prefix and suffixes now apply to each formatted value, not the string as a whole
>  - Using java.util.Appendable as a target for buffered conversions so output to Writers and PrintStreams
>    like System.out are supported in addition to StringBuilder. (IOExceptions are converted to unchecked exceptions)
>  - Immutable and thread safe, a "value-based" class
> 
> See the [HexFormat
> javadoc](http://cr.openjdk.java.net/~rriggs/8251989-hex-formatter/java.base/java/util/HexFormat.html) for details.
> Review comments and suggestions welcome.

This is a really useful addition! Utility classes for formatting bytes to hex have been re-invented so often already
that having this in the JDK would be really great.

Now that the OpenJDK project migrated to GitHub I took the opportunity and wrote this GitHub pull request review
(though I am not a OpenJDK contributor). Most things are formatting or documentation related. I hope these comments are
useful and are not too intrusive. Please let me know otherwise.

Also is it common practice to use `System.out` in JDK tests? In my opinion it often does not add much value once the
unit test implementation has been completed because the output is not checked during tests automatically anyways and
might only clutter the console output.

The tests are using `String.toUpperCase`, `toLowerCase` and `format` without `Locale` (therefore using the default
one). Does the test setup guarantee a constant default locale or would be better to include a `Locale` to make sure the
tests don't break for any unusual locale?

Has it been also considered to add support for parsing from a `Reader` to an `OutputStream` and from `InputStream` to
`Appendable` to support arbitrary length input and output?

Maybe it would also be good to mention for the method parsing and formatting `int`, `long`, ... in which byte order the
output is created.

src/java.naming/share/classes/com/sun/jndi/ldap/Filter.java line 32:

> 30:
> 31: import java.io.IOException;
> 32: import java.util.HexFormat;

This import appears to be unused

test/jdk/java/util/HexFormat/HexFormatTest.java line 406:

> 404:
> 405:             int byteVal = hex.fromHexDigits(byteStr);
> 406:             assert(byteStr.equals("7f"));

Why use regular `assert` statements in this test method? TestNG's methods likely yield more useful results.

src/java.base/share/classes/java/util/HexFormat.java line 45:

> 43:  * the {@code withXXX} methods return copies of {@code HexFormat}ters modified
> 44:  * {@link #withPrefix(String)}, {@link #withSuffix(String)}, {@link #withDelimiter(String)}
> 45:  * or choice of {@link #withUppercase()} or {@link #withLowercase()} parameters using

Would <code>withUpperCase</code> and <code>withLowerCase</code> (captial "C") be better to be consistent
with `String.toUpperCase` and `Character.toUpperCase`?

src/java.base/share/classes/java/util/HexFormat.java line 77:

> 75:  * uppercase or lowercase digits, and the {@code suffix}.
> 76:  * The {@code delimiter} is required after each formatted value, except the last.
> 77:  * <p>

Is this `` needed here? `@apiNote` below is a block tag so javadoc should put it in a separate paragraph anyways.

src/java.base/share/classes/java/util/HexFormat.java line 112:

> 110:  * <pre>{@code
> 111:  *     HexFormat formatFingerprint = HexFormat.ofDelimiter(":").withUppercase();
> 112:  *     byte[] bytes = { 0, 1, 2, 3, 124, 125, 126, 127};

Suggestion:

 *     byte[] bytes = {0, 1, 2, 3, 124, 125, 126, 127};

Space here is not consistent with other code snippets here.

src/java.base/share/classes/java/util/HexFormat.java line 148:

> 146:     private static final byte[] LOWERCASE_DIGITS = {
> 147:             '0' , '1' , '2' , '3' , '4' , '5' , '6' , '7',
> 148:             '8' , '9' , 'a' , 'b' , 'c' , 'd' , 'e' , 'f',

Suggestion:

            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f',

src/java.base/share/classes/java/util/HexFormat.java line 182:

> 180:     /**
> 181:      * Returns a hexadecimal formatter with no delimiter and lowercase characters.
> 182:      * The hex characters are lowercase and the delimiter, prefix, and suffix are empty.

> The hex characters are lowercase

This is already mentioned in the sentence before.

src/java.base/share/classes/java/util/HexFormat.java line 195:

> 193:
> 194:     /**
> 195:      * Returns a hexadecimal formatter with a {@code delimiter} and lowercase letters.

Suggestion:

     * Returns a hexadecimal formatter with a {@code delimiter} and lowercase characters.
To be consistent with documentation of other methods.

src/java.base/share/classes/java/util/HexFormat.java line 203:

> 201:      *
> 202:      * @param delimiter a {@code delimiter}, non-null, may be empty
> 203:      * @return a {@link Formatter} with the {@code delimiter} and lowercase letters

Suggestion:

     * @return a {@link Formatter} with the {@code delimiter} and lowercase characters
To be consistent with documentation of other methods.

src/java.base/share/classes/java/util/HexFormat.java line 221:

> 219:      * Returns a copy of this {@code HexFormat} with the {@code prefix}.
> 220:      *
> 221:      * @param prefix a prefix, non-null, may be empty

Suggestion:

     * @param prefix a {@code prefix}, non-null, may be empty
For consistency with other methods.

src/java.base/share/classes/java/util/HexFormat.java line 252:

> 250:      * The lowercase hexadecimal characters are {@code "0-9", "a-f"}.
> 251:      *
> 252:      * @return a copy of this {@code HexFormat} with lowercase hex characters

Suggestion:

     * @return a copy of this {@code HexFormat} with lowercase hexadecimal characters
Either write "hexadecimal" here or change the documentation for `withUppercase()` to use "hex".

src/java.base/share/classes/java/util/HexFormat.java line 261:

> 259:      * Returns the {@code delimiter} between hexadecimal values in a formatted byte array.
> 260:      *
> 261:      * @return return the {@code delimiter}, non-null, may be empty {@code ""}

Suggestion:

     * @return the {@code delimiter}, non-null, may be empty {@code ""}
"return" here and for other methods is redundant because the Standard Doclet already writes "Returns", so you end up
with "Returns return". (Also note that the other methods use "return**s**" (with "s" at the end), which is not
consistent in case you choose to keep the redundant "return")

src/java.base/share/classes/java/util/HexFormat.java line 287:

> 285:     /**
> 286:      * Returns {@code true} if the hexadecimal digits will be uppercase,
> 287:      *          otherwise {@code false}.

Suggestion:

     * otherwise {@code false}.

src/java.base/share/classes/java/util/HexFormat.java line 302:

> 300:      *
> 301:      * The behavior is equivalent to
> 302:      * {@link #formatHex(byte[], int, int) format(bytes, 0, bytes.length))}.

Suggestion:

     * {@link #formatHex(byte[], int, int) formatHex(bytes, 0, bytes.length))}.

src/java.base/share/classes/java/util/HexFormat.java line 324:

> 322:      */
> 323:     public String formatHex(byte[] bytes, int index, int length) {
> 324:         Objects.requireNonNull(bytes,"bytes");

Suggestion:

        Objects.requireNonNull(bytes, "bytes");

src/java.base/share/classes/java/util/HexFormat.java line 346:

> 344:      * A {@code delimiter} appears after each formatted value, except the last.
> 345:      * The behavior is equivalent to
> 346:      * {@link #formatHex(byte[]) out.append(format(bytes))}.

Suggestion:

     * {@link #formatHex(byte[]) out.append(hexFormat.formatHex(bytes))}.
Could possibly also omit the `hexFormat.`; though if you decide to use it, some of the other examples have to be
adjusted as well.

src/java.base/share/classes/java/util/HexFormat.java line 364:

> 362:      * A {@code delimiter} appears after each formatted value, except the last.
> 363:      * The behavior is equivalent to
> 364:      * {@link #formatHex(byte[], int, int)  out.append(format(bytes, index, length))}.

Suggestion:

     * {@link #formatHex(byte[], int, int)  out.append(hexFormat.formatHex(bytes, index, length))}.
Could possibly also omit the `hexFormat.`; though if you decide to use it, some of the other examples have to be
adjusted as well.

src/java.base/share/classes/java/util/HexFormat.java line 457:

> 455:      * A {@code delimiter} appears after each formatted value, except the last.
> 456:      * The {@code delimiter}s, {@code prefix}es, and {@code suffix}es strings must be present;
> 457:      * they may be empty strings.

This is a little bit irritating because the HexFormat dictates whether they may be empty. Having this statement here
might sound as if the caller could somehow influence it. Maybe it should mention that this depends on the HexFormat.

(Applies to the other methods as well)

src/java.base/share/classes/java/util/HexFormat.java line 461:

> 459:      *
> 460:      * @param string a string containing the byte values with {@code prefix}, hexadecimal digits, {@code suffix},
> 461:      *            and delimiters

Suggestion:

     *            and {@code delimiter}s
For consistency with the other documentation

src/java.base/share/classes/java/util/HexFormat.java line 543:

> 541:      *
> 542:      * @param chars a char array range containing an even number of hex digits,
> 543:      *          {@code delimiters}, {@code prefix}, and {@code suffix}.

Suggestion:

     *          {@code delimiter}s, {@code prefix}, and {@code suffix}.
For consistency

src/java.base/share/classes/java/util/HexFormat.java line 482:

> 480:      *
> 481:      * @param string a string range containing hex digits,
> 482:      *           {@code delimiters}, {@code prefix}, and {@code suffix}.

Suggestion:

     *           {@code delimiter}s, {@code prefix}, and {@code suffix}.
For consistency

src/java.base/share/classes/java/util/HexFormat.java line 462:

> 460:      * @param string a string containing the byte values with {@code prefix}, hexadecimal digits, {@code suffix},
> 461:      *            and delimiters
> 462:      * @return a byte array

It might be good to be a little bit more verbose here because the method signature shows that the return type is
`byte[]` so this documentation does not add that much information. Maybe something like "a byte array representing the
parsed hexadecimal string" would be better?

(Applies to the other methods as well)

src/java.base/share/classes/java/util/HexFormat.java line 488:

> 486:      * @throws IllegalArgumentException if the string length is not valid or
> 487:      *          the string contains non-hex characters,
> 488:      *          or the {@code delimiter}, {@code prefix}, or {@code suffix} are not found

Might be good to make this consistent, `parseHex(CharSequence)` says:
if the {@code prefix} or {@code suffix} is not present for each byte value,
the byte values are not hexadecimal characters, or if the {@code delimiter} is not present
after all but the last byte value.

src/java.base/share/classes/java/util/HexFormat.java line 508:

> 506:         if (string.length() < valueChars || (string.length() - valueChars) % stride != 0)
> 507:             throw new IllegalArgumentException("extra or missing delimiters " +
> 508:                     "or values consisting of prefix, two hex digits, and suffix");

> extra or missing delimiters or values consisting of prefix, two hex digits, and suffix

Sounds a little bit like "values consisting of prefix, two hex digits and suffix" is an error condition, while the
sentence probably means valid values being missing is the error condition. It might be clearer if reworded, e.g. like
this:
> extra or missing delimiters, prefix, suffx or hex digits

Or something general like "malformed formatted hex string".

src/java.base/share/classes/java/util/HexFormat.java line 614:

> 612:      *
> 613:      * @param value a value, only bits {@code 4-7} of the value are used
> 614:      * @return the hex character for the bits {@code 4-7} of the value are used

Suggestion:

     * @return the hex character for the bits {@code 4-7} of the value

src/java.base/share/classes/java/util/HexFormat.java line 650:

> 648:      * @throws UncheckedIOException if an I/O exception occurs appending to the output
> 649:      */
> 650:     public Appendable toHexDigits(Appendable out, byte value) {

Why not use a type parameter here for the `Appendable` similar to the other methods?

src/java.base/share/classes/java/util/HexFormat.java line 765:

> 763:      * The {@code delimiter}, {@code prefix} and {@code suffix} are not used.
> 764:      *
> 765:      * @param value an {@code long} value

Suggestion:

     * @param value a {@code long} value

src/java.base/share/classes/java/util/HexFormat.java line 760:

> 758:
> 759:     /**
> 760:      * Returns up to sixteen hex characters for the {@code long} value.

Might be good to clearify that the `digits` starts with the high bytes of the `long`.

src/java.base/share/classes/java/util/HexFormat.java line 836:

> 834:      *          otherwise {@code false}
> 835:      */
> 836:     public boolean isHexDigit(int ch) {

Should this method be `static`? Or otherwise should it consider the `uppercase()` setting?
(Same for `fromHexDigit` and all the other subsequent methods)

These methods being instance methods is likely pretty confusing because they are unrelated to the HexFormat instance
and makes using them cumbersome because the user has to create a HexFormat instance first.

src/java.base/share/classes/java/util/HexFormat.java line 902:

> 900:      * characters.
> 901:      * The characters in the range from {@code index} to {@code index + length - 1},
> 902:      * inclusive, must be valid hex length according to {@link #fromHexDigit(int)}.

Suggestion:

     * inclusive, must be valid hex digits according to {@link #fromHexDigit(int)}.

src/java.base/share/classes/java/util/HexFormat.java line 902:

> 900:      * characters.
> 901:      * The characters in the range from {@code index} to {@code index + length - 1},
> 902:      * inclusive, must be valid hex length according to {@link #fromHexDigit(int)}.

Maybe would be better to specify the "exclusive" value. Because providing a `length` of 0 is allowed in which case this
statement would be incorrect (`index + 0 - 1`). (same for `fromHexDigitsToLong`)

src/java.base/share/classes/java/util/HexFormat.java line 996:

> 994:
> 995:     /**
> 996:      * Returns a hashcode for this {@code HexFormat} that is consistent with

* [...] that is consistent with
* {@link #equals(Object) equals}.
This information might be redundant; the general contract of `Object.hashCode()` requires that.

src/java.base/share/classes/java/util/HexFormat.java line 1015:

> 1013:      */
> 1014:     @Override
> 1015:     public String toString() {

Might be useful to also include the class name?

src/java.base/share/classes/java/util/HexFormat.java line 521:

> 519:                 throw new IllegalArgumentException("input contains non-hex characters");
> 520:             bytes[i] = (byte) v;
> 521:             HexFormat.checkLiteral(string, offset + 2, between);

Suggestion:

            checkLiteral(string, offset + 2, between);

src/java.base/share/classes/java/util/HexFormat.java line 206:

> 204:      */
> 205:     public static HexFormat ofDelimiter(String delimiter) {
> 206:         return new HexFormat(delimiter, "", "", HexFormat.LOWERCASE_DIGITS);

Suggestion:

        return new HexFormat(delimiter, "", "", LOWERCASE_DIGITS);
Might be good to drop the type qualifier here (and for other usages) or for consistency add the type qualifier where it
is currently not used.

-------------

PR: https://git.openjdk.java.net/jdk/pull/482