RFR: 8170769 Provide a simple hexdump facility for binary data
Stuart Marks
stuart.marks at oracle.com
Tue Dec 11 02:11:33 UTC 2018
On 12/7/18 10:22 AM, Vincent Ryan wrote:
>> I'm not convinced that the overloads that send output to an OutputStream pull their weight. They basically wrap the OutputStream in a PrintStream, which conveniently doesn't declare IOException, making it easy to use from a lambda passed to forEachOrdered(). If an error writing the output occurs, this is recorded by the PrintStream wrapper; however, the wrapper is then thrown away, making it impossible for the caller to check its error status.
> The intent is to support a trivial convenience method call that generates the well-known hexdump format.
> Especially for users that are interested in the hexdump data rather than the low-level details of how to terminate a stream.
> The dumpAsStream methods are available to support cases that differ from that format.
>
> Do you have a suggestion to improve the dump() methods, or would you like to see them omitted?
>
>> The PrintStream wrapper also uses the platform default charset, and doesn't provide any way for the caller to override the charset.
> Is there a need for that? Originally the requirement was driven by the hexdump format which is ASCII-only.
> Recently the class has been enhanced to also support the printable characters from ISO 8859-1.
> Can a custom formatter be supplied to dumpAsStream() to cater for all other cases?
OK, let's step back from this a bit. I see this hexdump as a little subsystem
that has the following facets:
1) a source of bytes
2) a converter to hex
3) a destination
The converter is HexDump.Formatter, which converts and formats a subrange of
byte[] to a String. Since the user can supply the Formatter function, the result
String can contain any Unicode character. Thus, the destination needs to handle
any Unicode character. It can be a Writer, which accepts String data. Or if you
want it to write bytes, it can be an OutputStream, which raises the issue of
encoding (charset). I would recommend against relying on the platform default
charset, as this has been a source of subtle bugs. The preferred approach these
days is to default to UTF-8 and provide an overload that takes an explicit charset.
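As a rough illustration of the OutputStream approach, here is a minimal sketch. The class DumpToStreamSketch and its dump() overloads are hypothetical and are not part of the proposed HexDump API; the formatted lines are passed in as plain Strings to keep the example self-contained. The overload without a charset defaults to UTF-8, and I/O errors propagate to the caller as IOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class DumpToStreamSketch {
    // Hypothetical helper: writes already-formatted lines to a Writer.
    // A Writer accepts String data, so no charset decision is needed here.
    public static void dump(List<String> lines, Writer dest) throws IOException {
        for (String line : lines) {
            dest.write(line);
            dest.write('\n');
        }
        dest.flush(); // push encoded bytes through to any underlying stream
    }

    // Hypothetical overload: the caller chooses the charset explicitly.
    public static void dump(List<String> lines, OutputStream dest, Charset cs)
            throws IOException {
        dump(lines, new OutputStreamWriter(dest, cs));
    }

    // Hypothetical overload: defaults to UTF-8, not the platform default charset.
    public static void dump(List<String> lines, OutputStream dest) throws IOException {
        dump(lines, dest, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = List.of("\u00e9"); // one non-ASCII character
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        ByteArrayOutputStream latin1 = new ByteArrayOutputStream();
        dump(lines, utf8);
        dump(lines, latin1, StandardCharsets.ISO_8859_1);
        // The same String produces different byte counts under each charset.
        System.out.println(utf8.size() + " bytes vs " + latin1.size() + " bytes");
    }
}
```

The point of the sketch is only that the charset becomes an explicit, overridable choice and that write failures are visible to the caller.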
An alternative is PrintStream. (This overlaps somewhat with your recent exchange
with Roger on this topic.) PrintStream also does charset encoding, and the
charset it uses depends on how it's created. I think the same approach should be
applied as I described above with OutputStream, namely avoid the platform
default charset; default to UTF-8; and provide an overload that takes an
explicit charset.
I'm not sure which of these is the right thing. You should decide which is the
most convenient for the use cases you expect to see. However, the solution needs
to handle charset encoding. (And it should also properly deal with I/O
exceptions, per my previous message.)
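If PrintStream turns out to be the preferred destination, something along these lines might address both points. This is only a sketch: PrintStreamErrorSketch and its dump() method are hypothetical names, and the PrintStream constructor that takes a Charset assumes Java 10 or later. The wrapper is created with an explicit UTF-8 charset, and its error status is checked before it goes out of scope, so the caller can learn about a failed write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;

public class PrintStreamErrorSketch {
    // Hypothetical helper: returns true only if every line was written
    // without the PrintStream recording an I/O error.
    public static boolean dump(Stream<String> lines, OutputStream dest) {
        // Explicit charset, so the platform default is never consulted.
        PrintStream ps = new PrintStream(dest, false, StandardCharsets.UTF_8);
        // println() does not declare IOException, so it fits in a lambda
        // or method reference passed to forEachOrdered().
        lines.forEachOrdered(ps::println);
        // checkError() flushes the stream and reports any recorded failure.
        return !ps.checkError();
    }

    public static void main(String[] args) {
        // A destination that always fails, to simulate an I/O error.
        OutputStream failing = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("simulated write failure");
            }
        };
        System.out.println(dump(Stream.of("00 01 02"), failing));
        System.out.println(dump(Stream.of("00 01 02"), new ByteArrayOutputStream()));
    }
}
```

Whether a boolean result, an UncheckedIOException, or some other mechanism is the right way to surface the error is a separate design question; the sketch only shows that the status need not be lost.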
ISO 8859-1 comes up in a different place. The toPrintableString() method (used
by the default formatter) considers a byte "printable" if it encodes a valid ISO
8859-1 character. The byte is properly decoded to a String, so this is ok. Note
this is a distinct issue from the encoding of the OutputStream or PrintStream as
described above.
(As an aside I think that the encoding of ISO 8859-1 matches the corresponding
code units of UTF-16, so you don't have to do the new String(..., ISO_8859_1)
jazz. You can just cast the byte to a char and append it to the StringBuilder.)
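A small sketch of that shortcut follows; Latin1CastSketch and latin1ToString() are hypothetical names used only for illustration. One detail to watch: Java bytes are signed, so the value needs to be masked with 0xFF before the cast, otherwise bytes at or above 0x80 are sign-extended into the wrong char.

```java
import java.nio.charset.StandardCharsets;

public class Latin1CastSketch {
    // Hypothetical helper: decodes ISO 8859-1 bytes without a charset decoder.
    public static String latin1ToString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            // Mask to 0..255 first; a bare (char) cast would sign-extend
            // negative bytes to values such as 0xFFE9.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 0x41, (byte) 0xE9, (byte) 0xFF };
        String viaCast = latin1ToString(data);
        String viaDecoder = new String(data, StandardCharsets.ISO_8859_1);
        // Compare the shortcut against the regular decoder.
        System.out.println(viaCast.equals(viaDecoder));
    }
}
```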
s'marks