RFR 8251989: Hex encoder and decoder utility
mark.reinhold at oracle.com
mark.reinhold at oracle.com
Thu Aug 20 19:10:12 UTC 2020
2020/8/19 14:14:56 -0700, roger.riggs at oracle.com:
> Please review a java.util.Hex API to encode and decode hexadecimal
> strings to and from byte arrays.
>
> JavaDoc:
> http://cr.openjdk.java.net/~rriggs/hex-javadoc/java.base/java/util/Hex.html
This mostly looks good -- it’ll be nice to have a standard utility for
this sort of thing.
A few comments:
- Why do the short-form `encoder` factory methods return encoders that
produce upper-case hex strings? `Integer::toHexString` and other
existing `toHex` methods return lower-case hex strings. That’s also
what you get from common Unix CLI tools (e.g., `od -tx1`).
Please consider changing these methods to return lower-case encoders.
- Is it worth having static `Hex.encode(byte[])` and
`Hex.decode(CharSequence)` convenience methods for the simplest
cases?
- [Warning: Bikeshed] The verbs “encode” and “decode” seem unfortunate.
Over in `java.nio.charsets` we have encoders that transform
characters to bytes, and decoders that transform bytes to characters.
The coded thing is the bytes; the uncoded thing is the characters.
In `java.util` we already have the `Base64` class, which provides
encoders that transform bytes to characters, and decoders that
transform characters to bytes. The coded thing is the characters;
the uncoded thing is the bytes.
The use of “encode” and “decode” in `Base64` was likely inspired by
the fact that the format has been known as “base 64 encoding” for
decades, having originated as a hack for transporting non-ASCII data
via SMTP. Developers looking to do base-64 operations will, thus,
expect this terminology.
Here you’re proposing that the `Hex` class follow the `Base64` class.
Consistency with existing nearby APIs is a worthy goal. If I were
just looking to convert a byte array into a readable hex string,
however, I’d probably want to “format” it rather than “encode” it,
something like `String.format("%x")` on steroids. Likewise, if I
were looking to convert a hex string into bytes then I’d want to
“parse” it rather than “decode” it, i.e., `Integer::parseInt` on
steroids.
If you were to rename the nested classes to `Hex.Formatter` and
`Hex.Parser`, and rename all methods accordingly, then this API would
be inconsistent with the nearby `Base64` but likely more consistent
with developer expectations.
(`Hex` is already inconsistent with `Base64` in that it doesn’t
prefix the names of its factory methods with `get`, which is a good
thing.)
- Mark
More information about the core-libs-dev
mailing list