RFR: 8170769 Provide a simple hexdump facility for binary data

Tue Dec 11 16:45:07 UTC 2018

Hi Stuart,

The APIs for character output are split between PrintStream and
Writer. Many common use cases direct the output to System.out or
System.err, which are PrintStreams. Hence, I lean toward PrintStream,
which can be used directly.

$.02, Roger

On 12/10/2018 09:11 PM, Stuart Marks wrote:
> On 12/7/18 10:22 AM, Vincent Ryan wrote:
>>> I'm not convinced that the overloads that send output to an 
>>> OutputStream pull their weight. They basically wrap the OutputStream 
>>> in a PrintStream, which conveniently doesn't declare IOException, 
>>> making it easy to use from a lambda passed to forEachOrdered(). If 
>>> an error writing the output occurs, this is recorded by the 
>>> PrintStream wrapper; however, the wrapper is then thrown away, 
>>> making it impossible for the caller to check its error status.
>> The intent is to support a trivial convenience method call that 
>> generates the well-known hexdump format, especially for users who 
>> are interested in the hexdump data rather than the low-level 
>> details of how to terminate a stream.
>> The dumpAsStream methods are available to support cases that differ 
>> from that format.
>>
>> Do you have a suggestion for improving the dump() methods, or would 
>> you prefer to see them omitted?
>>
>>> The PrintStream wrapper also uses the platform default charset, and 
>>> doesn't provide any way for the caller to override the charset.
>> Is there a need for that? Originally the requirement was driven by 
>> the hexdump format which is ASCII-only.
>> Recently the class has been enhanced to also support the printable 
>> characters from ISO 8859-1.
>> Could a custom formatter be supplied to dumpAsStream() to cater for 
>> all other cases?
>
> OK, let's step back from this a bit. I see this hexdump as a little 
> subsystem that has the following facets:
>
> 1) a source of bytes
> 2) a converter to hex
> 3) a destination
>
> The converter is HexDump.Formatter, which converts and formats a 
> subrange of byte[] to a String. Since the user can supply the 
> Formatter function, the result String can contain any Unicode 
> character. Thus, the destination needs to handle any Unicode 
> character. It can be a Writer, which accepts String data. Or if you 
> want it to write bytes, it can be an OutputStream, which raises the 
> issue of encoding (charset). I would recommend against relying on the 
> platform default charset, as this has been a source of subtle bugs. 
> The preferred approach these days is to default to UTF-8 and provide 
> an overload that takes an explicit charset.
>
> An alternative is PrintStream. (This overlaps somewhat with your 
> recent exchange with Roger on this topic.) PrintStream also does 
> charset encoding, and the charset it uses depends on how it's created. 
> I think the same approach should be applied as I described above with 
> OutputStream, namely avoid the platform default charset; default to 
> UTF-8; and provide an overload that takes an explicit charset.
>
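A minimal sketch of the overload pattern Stuart describes: one method
that defaults to UTF-8 and one that takes an explicit charset. The
class name CharsetDumpSketch and both dump() signatures are hypothetical
and are not the API under review. It assumes Java 10 or later for the
PrintStream(OutputStream, boolean, Charset) constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the HexDump API under review.
class CharsetDumpSketch {

    // Convenience overload: defaults to UTF-8, never the platform charset.
    static void dump(String formatted, OutputStream out) {
        dump(formatted, out, StandardCharsets.UTF_8);
    }

    // Explicit-charset overload for callers that need something else.
    static void dump(String formatted, OutputStream out, Charset cs) {
        // PrintStream(OutputStream, boolean, Charset) exists since Java 10.
        PrintStream ps = new PrintStream(out, true, cs);
        ps.print(formatted);
        ps.flush();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        ByteArrayOutputStream latin1 = new ByteArrayOutputStream();
        dump("\u00e9", utf8);                                // U+00E9 as UTF-8
        dump("\u00e9", latin1, StandardCharsets.ISO_8859_1); // U+00E9 as ISO 8859-1
        System.out.println(utf8.size() + " vs " + latin1.size() + " byte(s)");
    }
}
```

The same character takes two bytes in UTF-8 and one in ISO 8859-1,
which is why the choice of charset cannot be left to the platform
default if the output is meant to be reproducible.
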
> I'm not sure which of these is the right thing. You should decide 
> which is the most convenient for the use cases you expect to see. 
> However, the solution needs to handle charset encoding. (And it should 
> also properly deal with I/O exceptions, per my previous message.)
>
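On the I/O exception point, here is a rough sketch of how a PrintStream
wrapper could still report a failed write: keep the wrapper around and
consult checkError() before returning. The names CheckErrorSketch,
FailingStream and dumpLines are made up for illustration, and turning
the error flag into an IOException is only one possible design.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration only; not the HexDump API under review.
class CheckErrorSketch {

    // Stand-in destination whose writes always fail.
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    // Prints each line, then surfaces any error the PrintStream trapped.
    static void dumpLines(Iterable<String> lines, OutputStream out)
            throws IOException {
        // Java 10+ constructor; UTF-8 rather than the platform default.
        PrintStream ps = new PrintStream(out, false, StandardCharsets.UTF_8);
        // println declares no IOException, so it can be used in a lambda.
        lines.forEach(ps::println);
        // checkError() flushes and reports whether a write has failed.
        if (ps.checkError()) {
            throw new IOException("error writing hexdump output");
        }
    }

    public static void main(String[] args) {
        try {
            dumpLines(List.of("00 01 02 03"), new FailingStream());
            System.out.println("no error reported");
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The original cause is lost this way, since PrintStream only records
that some write failed. A Writer-based destination would propagate the
IOException itself, at the cost of being awkward inside a lambda.
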
> **
>
> ISO 8859-1 comes up in a different place. The toPrintableString() 
> method (used by the default formatter) considers a byte "printable" if 
> it encodes a valid ISO 8859-1 character. The byte is properly decoded 
> to a String, so this is ok. Note this is a distinct issue from the 
> encoding of the OutputStream or PrintStream as described above.
>
> (As an aside, I think that the encoding of ISO 8859-1 matches the 
> corresponding code units of UTF-16, so you don't have to do the new 
> String(..., ISO_8859_1) jazz. You can just mask the byte with 0xff, 
> cast it to a char, and append it to the StringBuilder.)
>
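A small sketch checking the aside about ISO 8859-1 and UTF-16 code
units. One caveat: Java bytes are signed, so a bare (char) cast
sign-extends values of 0x80 and above. Masking with 0xff first gives
the intended code unit. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the HexDump class.
class Latin1CastSketch {

    // Maps a byte to the char with the same unsigned value.
    static char toChar(byte b) {
        return (char) (b & 0xff);
    }

    public static void main(String[] args) {
        // Compare against the ISO 8859-1 decoder for every byte value.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            String decoded = new String(new byte[] { b }, StandardCharsets.ISO_8859_1);
            if (decoded.charAt(0) != toChar(b)) {
                throw new AssertionError("mismatch at byte value " + i);
            }
        }

        // Without the mask, the byte 0xE9 becomes 0xFFE9, not 0x00E9.
        char unmasked = (char) (byte) 0xE9;
        System.out.println(Integer.toHexString(unmasked));
        System.out.println(Integer.toHexString(toChar((byte) 0xE9)));
    }
}
```
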
> s'marks