RFR: 8170769 Provide a simple hexdump facility for binary data
Roger Riggs
Roger.Riggs at oracle.com
Tue Dec 11 16:45:07 UTC 2018
Hi Stuart,
The APIs for streams of characters are split between PrintStream and
Writers.
Many common use cases direct the output to System.out/err, which are
PrintStreams. Hence, I lean toward a PrintStream that can be used directly.
$.02, Roger
On 12/10/2018 09:11 PM, Stuart Marks wrote:
> On 12/7/18 10:22 AM, Vincent Ryan wrote:
>>> I'm not convinced that the overloads that send output to an
>>> OutputStream pull their weight. They basically wrap the OutputStream
>>> in a PrintStream, which conveniently doesn't declare IOException,
>>> making it easy to use from a lambda passed to forEachOrdered(). If
>>> an error writing the output occurs, this is recorded by the
>>> PrintStream wrapper; however, the wrapper is then thrown away,
>>> making it impossible for the caller to check its error status.
>> The intent is to support a trivial convenience method call that
>> generates the well-known hexdump format.
>> Especially for users that are interested in the hexdump data rather
>> than the low-level details of how to terminate a stream.
>> The dumpAsStream methods are available to support cases that differ
>> from that format.
>>
>> Do you have a suggestion to improve the dump() methods, or would you
>> like to see them omitted?
>>
>>> The PrintStream wrapper also uses the platform default charset, and
>>> doesn't provide any way for the caller to override the charset.
>> Is there a need for that? Originally the requirement was driven by
>> the hexdump format which is ASCII-only.
>> Recently the class has been enhanced to also support the printable
>> characters from ISO 8859-1.
>> Could a custom formatter be supplied to dumpAsStream() to cater for all
>> other cases?
>
> OK, let's step back from this a bit. I see this hexdump as a little
> subsystem that has the following facets:
>
> 1) a source of bytes
> 2) a converter to hex
> 3) a destination
>
> The converter is HexDump.Formatter, which converts and formats a
> subrange of byte[] to a String. Since the user can supply the
> Formatter function, the result String can contain any Unicode
> character. Thus, the destination needs to handle any Unicode
> character. It can be a Writer, which accepts String data. Or if you
> want it to write bytes, it can be an OutputStream, which raises the
> issue of encoding (charset). I would recommend against relying on the
> platform default charset, as this has been a source of subtle bugs.
> The preferred approach these days is to default to UTF-8 and provide
> an overload that takes an explicit charset.
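
The three facets and the charset recommendation above can be sketched
roughly as follows. This is only an illustration, not the API under
review: the class HexDumpSketch, the Formatter signature, and the
hexLine() helper are all hypothetical names made up for the example.
The Writer overload involves no encoding at all; the OutputStream
overloads default to UTF-8 and accept an explicit Charset.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names come from the webrev.
final class HexDumpSketch {

    /** Assumed stand-in for HexDump.Formatter: formats bytes[from, to) as one line. */
    @FunctionalInterface
    interface Formatter {
        String format(long offset, byte[] bytes, int from, int to);
    }

    /** Character destination: 16 bytes per line, no charset involved. */
    static void dump(byte[] bytes, Formatter formatter, Writer out) throws IOException {
        for (int i = 0; i < bytes.length; i += 16) {
            out.write(formatter.format(i, bytes, i, Math.min(i + 16, bytes.length)));
            out.write('\n');
        }
        out.flush();
    }

    /** Byte destination with an explicit charset. */
    static void dump(byte[] bytes, Formatter formatter, OutputStream out, Charset cs)
            throws IOException {
        dump(bytes, formatter, new OutputStreamWriter(out, cs));
    }

    /** Byte destination that defaults to UTF-8 rather than the platform charset. */
    static void dump(byte[] bytes, Formatter formatter, OutputStream out) throws IOException {
        dump(bytes, formatter, out, StandardCharsets.UTF_8);
    }

    /** A simple example formatter: 8-digit hex offset, then two hex digits per byte. */
    static String hexLine(long offset, byte[] bytes, int from, int to) {
        StringBuilder sb = new StringBuilder(String.format("%08x", offset));
        for (int i = from; i < to; i++) {
            sb.append(String.format(" %02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hexdump".getBytes(StandardCharsets.US_ASCII);
        dump(data, HexDumpSketch::hexLine, System.out);
    }
}
```

Because IOException is declared on every overload, the caller sees write
failures directly instead of having them recorded in a discarded wrapper.
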
>
> An alternative is PrintStream. (This overlaps somewhat with your
> recent exchange with Roger on this topic.) PrintStream also does
> charset encoding, and the charset it uses depends on how it's created.
> I think the same approach should be applied as I described above with
> OutputStream, namely avoid the platform default charset; default to
> UTF-8; and provide an overload that takes an explicit charset.
>
> I'm not sure which of these is the right thing. You should decide
> which is the most convenient for the use cases you expect to see.
> However, the solution needs to handle charset encoding. (And it should
> also properly deal with I/O exceptions, per my previous message.)
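
A minimal sketch of the two PrintStream concerns above, assuming nothing
about the HexDump class itself: the PrintStream is created with an
explicit charset, and the caller keeps a reference to it so that
checkError() can be consulted afterwards. PrintStreamErrorSketch,
FailingStream and printAndCheck() are hypothetical names used only for
this illustration. (Java 10 and later also offer a PrintStream
constructor that takes a Charset instead of a charset name.)

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

// Hypothetical sketch; not part of the proposed API.
final class PrintStreamErrorSketch {

    /** A destination that always fails, standing in for a full disk or a closed socket. */
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    /**
     * Prints the lines through a PrintStream created with an explicit charset
     * and returns the PrintStream's error flag, which would be lost if the
     * wrapper were created and discarded inside a library method.
     */
    static boolean printAndCheck(OutputStream out, String... lines) {
        try {
            PrintStream ps = new PrintStream(out, true, "UTF-8");
            for (String line : lines) {
                ps.println(line); // never throws IOException; failures only set a flag
            }
            return ps.checkError(); // flushes, then reports whether any write failed
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean failed = printAndCheck(new FailingStream(), "00000000 68 65 78");
        System.out.println("error recorded: " + failed);
    }
}
```
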
>
> **
>
> ISO 8859-1 comes up in a different place. The toPrintableString()
> method (used by the default formatter) considers a byte "printable" if
> it encodes a valid ISO 8859-1 character. The byte is properly decoded
> to a String, so this is ok. Note this is a distinct issue from the
> encoding of the OutputStream or PrintStream as described above.
>
> (As an aside I think that the encoding of ISO 8859-1 matches the
> corresponding code units of UTF-16, so you don't have to do the new
> String(..., ISO_8859_1) jazz. You can just cast the byte to a char and
> append it to the StringBuilder.)
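
The aside about casting can be checked with a short sketch. One detail
worth noting: Java bytes are signed, so a bare (char) cast of a negative
byte sign-extends; masking with 0xff first gives the 0..255 value that
matches ISO 8859-1. Latin1CastSketch and toChar() are hypothetical names
for this example only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; compares the cast against explicit ISO 8859-1 decoding.
final class Latin1CastSketch {

    /** Decodes one byte as ISO 8859-1 by masking to 0..255 and casting to char. */
    static char toChar(byte b) {
        return (char) (b & 0xff);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            String decoded = new String(new byte[] {b}, StandardCharsets.ISO_8859_1);
            if (decoded.charAt(0) != toChar(b)) {
                throw new AssertionError("mismatch at byte value " + i);
            }
        }
        System.out.println("cast matches ISO 8859-1 decoding for all 256 byte values");
    }
}
```
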
>
> s'marks