RFR: 8170769 Provide a simple hexdump facility for binary data
Vincent Ryan
vincent.x.ryan at oracle.com
Tue Dec 11 21:21:20 UTC 2018
Thanks for your review Stuart.
My preference is for PrintStream rather than Writer, for the same reason as Roger: it’s more convenient
for handling System.out. Does that address your concern?
I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is multi-byte charset so some 0x8X characters
Will trigger the multi-byte sequence and will end up being misinterpreted. Hence my rather awkward conversion to a String.
Is there a better way?
I’m not sure I’ve addressed your concern regarding IOExceptions - can you elaborate?
BTW updated webrev/javadoc available:
http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/ <http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/>
http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html <http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html>
> On 11 Dec 2018, at 02:11, Stuart Marks <stuart.marks at oracle.com> wrote:
>
> On 12/7/18 10:22 AM, Vincent Ryan wrote:
>>> I'm not convinced that the overloads that send output to an OutputStream pull their weight. They basically wrap the OutputStream in a PrintStream, which conveniently doesn't declare IOException, making it easy to use from a lambda passed to forEachOrdered(). If an error writing the output occurs, this is recorded by the PrintStream wrapper; however, the wrapper is then thrown away, making it impossible for the caller to check its error status.
>> The intent is to support a trivial convenience method call that generates the well-known hexdump format.
>> Especially for users that are interested in the hexdump data rather than the low-level details of how to terminate a stream.
>> The dumpAsStream methods are available to support cases that differ from that format.
>> Have you a suggestion to improve the dump() methods, or you’d like to see them omitted?
>>> The PrintStream wrapper also uses the platform default charset, and doesn't provide any way for the caller to override the charset.
>> Is there a need for that? Originally the requirement was driven by the hexdump format which is ASCII-only.
>> Recently the class has been enhanced to also support the printable characters from ISO 8859-1.
>> A custom formatter be supplied to dumpAsStream() to cater for all other cases?
>
> OK, let's step back from this a bit. I see this hexdump as a little subsystem that has the following facets:
>
> 1) a source of bytes
> 2) a converter to hex
> 3) a destination
>
> The converter is HexDump.Formatter, which converts and formats a subrange of byte[] to a String. Since the user can supply the Formatter function, the result String can contain any unicode character. Thus, the destination needs to handle any unicode character. It can be a Writer, which accepts String data. Or if you want it to write bytes, it can be an OutputStream, which raises the issue of encoding (charset). I would recommend against relying on the platform default charset, as this has been a source of subtle bugs. The preferred approach these days is to default to UTF-8 and provide an overload that takes an explicit charset.
>
> An alternative is PrintStream. (This overlaps somewhat with your recent exchange with Roger on this topic.) PrintStream also does charset encoding, and the charset it uses depends on how it's created. I think the same approach should be applied as I described above with OutputStream, namely avoid the platform default charset; default to UTF-8; and provide an overload that takes an explicit charset.
>
> I'm not sure which of these is the right thing. You should decide which is the most convenient for the use cases you expect to see. However, the solution needs to handle charset encoding. (And it should also properly deal with I/O exceptions, per my previous message.)
>
> **
>
> ISO 8859-1 comes up in a different place. The toPrintableString() method (used by the default formatter) considers a byte "printable" if it encodes a valid ISO 8859-1 character. The byte is properly decoded to a String, so this is ok. Note this is a distinct issue from the encoding of the OutputStream or PrintStream as described above.
>
> (As an aside I think that the encoding of ISO 8859-1 matches the corresponding code units of UTF-16, so you don't have to do the new String(..., ISO_8859_1) jazz. You can just cast the byte to a char and append it to the StringBuilder.)
>
> s'marks
More information about the core-libs-dev
mailing list