RFR: 8251989: Hex formatting and parsing utility [v10]
Chris Hegarty
chegar at openjdk.java.net
Fri Nov 27 10:49:21 UTC 2020
On Wed, 25 Nov 2020 22:51:44 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>> java.util.HexFormat utility:
>>
>> - Format and parse hexadecimal strings, with parameters for delimiter, prefix, suffix and upper/lowercase
>> - Static factories and builder methods to create HexFormat copies with modified parameters.
>> - Consistent naming of methods for conversion of byte arrays to formatted strings and back: formatHex and parseHex
>> - Consistent naming of methods for conversion of primitive types: toHexDigits... and fromHexDigits...
>> - Prefix and suffixes now apply to each formatted value, not the string as a whole
>> - Using java.util.Appendable as a target for buffered conversions so output to Writers and PrintStreams
>> like System.out are supported in addition to StringBuilder. (IOExceptions are converted to unchecked exceptions)
>> - Immutable and thread safe, a "value-based" class
>>
>> See the [HexFormat javadoc](http://cr.openjdk.java.net/~rriggs/8251989-hex-formatter/java.base/java/util/HexFormat.html) for details.
>>
>> Review comments and suggestions welcome.
>
> Roger Riggs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision:
>
> - Clarified that suffix() and prefix() methods do not return null, instead the empty string is returned.
> - Merge branch 'master' into 8251989-hex-formatter
> - Merge branch 'master' into 8251989-hex-formatter
> - Merge branch 'master' into 8251989-hex-formatter
> - The HexFormat API indexing model for array and string ranges is changed
> to describe the range using 'fromIndex (inclusive)' and 'toIndex (exclusive)'.
>
> Initially, it was specified as 'index' and 'length'. However, both byte arrays
> and strings used in the HexFormat API typically use fromIndex and toIndex
> to describe ranges. Using the same indexing model can prevent mistakes.
>
> The change affects the methods and corresponding tests:
>
> formatHex(byte[] bytes, int fromIndex, int toIndex)
> formatHex(A out, byte[] bytes, int fromIndex, int toIndex)
> parseHex(char[] chars, int fromIndex, int toIndex)
> parseHex(CharSequence string, int fromIndex, int toIndex)
> fromHexDigits(CharSequence string, int fromIndex, int toIndex)
> fromHexDigitsToLong(CharSequence string, int fromIndex, int toIndex)
> - - Added @see and @link references to Integer.toHexString and Long.toHexString
> - Clarified parsing is case insensistive in various parse and fromXXX methods
> - Source level cleanup based on review comments
> - Expanded some javadoc tag text to make it more descriptive
> - Consistent use of 'hexadecimal' vs 'hex'
> - Review comment updates to class javadoc
> - Review comment updates, in the example code, and to describe the characters used to convert to hexadecimal
> - Correct length of StringBuilder in formatHex;
> Correct bug in formatHex(char[], 2, 3) and add test for subranges of char[]
> - Merge branch 'master' into 8251989-hex-formatter
> - ... and 9 more: https://git.openjdk.java.net/jdk/compare/55be0a99...b19d2827
Changes requested by chegar (Reviewer).
src/java.base/share/classes/java/util/HexFormat.java line 71:
> 69: * <p>
> 70: * For formatted hexadecimal string to byte array conversions the
> 71: * {@code parseHex} methods include {@link #parseHex(CharSequence) parseHex(string)} and
parseHex(string) -> parseHex(CharSequence)
src/java.base/share/classes/java/util/HexFormat.java line 467:
> 465:
> 466: /**
> 467: * Returns a byte array containing hexadecimal values parsed from a range of the string.
string -> charSequence. And flow through at param tag and param name.
src/java.base/share/classes/java/util/HexFormat.java line 448:
> 446:
> 447: /**
> 448: * Returns a byte array containing hexadecimal values parsed from the string.
string -> charSequence. And flow through at param tag and param name.
src/java.base/share/classes/java/util/HexFormat.java line 528:
> 526: * a range of the character array.
> 527: *
> 528: * Each byte value is parsed as the prefix, two case insensitive hexadecimal characters,
Each *char* value ...
src/java.base/share/classes/java/util/HexFormat.java line 538:
> 536: * @param toIndex the final index of the range, exclusive.
> 537: * @return a byte array with the values parsed from the character array range
> 538: * @throws IllegalArgumentException if the prefix or suffix is not present for each byte value,
... for each *char* value ... OR ... for each pair of char values?
src/java.base/share/classes/java/util/HexFormat.java line 617:
> 615: * Returns the two hexadecimal characters for the {@code byte} value.
> 616: * Each nibble (4 bits) from most significant to least significant of the value
> 617: * is formatted as if by {@link #toLowHexDigit(int) toLowHexDigit(nibble)}.
It might be more straightforward to frame this in terms of toLowHexDigit and toHighHexDigit (rather than only toLowHexDigit). `nibble`- should the param name for to(Low|High)HexDigit be named `nibble` ?
src/java.base/share/classes/java/util/HexFormat.java line 860:
> 858:
> 859: /**
> 860: * Returns a value parsed from two hexadecimal characters in a string.
sting -> charsequence ( same comment more generally, anywhere a charsequence is referred to as a string )
src/java.base/share/classes/java/util/HexFormat.java line 47:
> 45: * or choice of {@link #withUpperCase()} or {@link #withLowerCase()} parameters.
> 46: * <p>
> 47: * For primitive to hexadecimal string conversions the {@code toHexDigits}
It took me a little while to grok the naming convention being used here. It might be helpful to ensure that all the terms are clearly defined and used consistently throughout.
"hexadecimal string" - consists of just 0-9, a-f, A-F (i.e. no delimiter, prefix, suffix)
"formatted hexadecimal string" - consists of hexadecimal string, and prefix, delimiter, suffix.
The `parseXXX` and `formatXXX` are logically related - and operate on "formatted hexadecimal strings"
The `toXXX` and `fromXXX` are logically related - one is kind of the inverse of the other. And these operate (even when accepting/producing chars) on "hexadecimal strings" - ignoring almost all properties of the HexFormat, except the toXXX cares about upper/lower case. Does `fromXXX` care about upper/lower ?
-------------
PR: https://git.openjdk.java.net/jdk/pull/482
More information about the core-libs-dev
mailing list