RFR: 8170769 Provide a simple hexdump facility for binary data

Wed Dec 12 02:35:31 UTC 2018

On 12/11/18 1:21 PM, Vincent Ryan wrote:
> My preference is for PrintStream rather than Writer, for the same reason as 
> Roger: it’s more convenient
> for handling System.out. Does that address your concern?

PrintStream is fine with me.

> I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is multi-byte 
> charset so some 0x8X characters
> Will trigger the multi-byte sequence and will end up being misinterpreted. Hence 
> my rather awkward conversion to a String.
> Is there a better way?

In toPrintableString(),

  259         StringBuilder printable = new StringBuilder(toIndex - fromIndex);
  260         for (int i = fromIndex; i < toIndex; i++) {
  261             if (bytes[i] > 0x1F && bytes[i] < 0x7F) {
  262                 printable.append((char) bytes[i]);
  263             } else if (bytes[i] > (byte)0x9F && bytes[i] <= (byte)0xFF) {
  264                 printable.append(new String(new byte[]{bytes[i]}, 
ISO_8859_1));
  265
  266             } else {
  267                 printable.append('.');
  268             }
  269         }

It works to cast ASCII bytes char, because the 7-bit ASCII range overlaps the 
low 7 bits of the UTF-16 char range. The bytes values of ISO 8859-1 overlap the 
low 8 bits of UTF-16, so casts work for them too.

For any other charset, you'd need to do codeset conversion. But you're cleverly 
supporting only ISO 8859-1, so you don't have to do any conversion. :-)

> I’m not sure I’ve addressed your concern regarding IOExceptions - can you elaborate?

Taking out the OutputStream overloads addressed my concerns. In at least one 
case the code would wrap the OutputStream into a PrintStream, print stuff to it, 
and then throw away the PrintStream. If an output error occurred, any error 
state in the PrintStream would also be thrown away. The creation of the 
PrintStream wrapper would also use the system's default charset instead of 
letting the caller control it.

The dump() overloads now all take PrintStream, so it's the caller's 
responsibility to ensure that the PrintStream is using the right charset and to 
check for errors after. So this is all OK now.

Note that the internal getPrintStream(), to wrap an OutputStream in a 
PrintStream, is now obsolete and can be removed.

(Oh, I see Roger has said much the same things. Oh well, the peril of parallel 
reviews.)

**

> BTW updated webrev/javadoc available:
> http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/
> http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html

Now we have a somewhat unsatisfying asymmetry in the APIs.

There are four kinds of inputs:

1. byte[]
2. byte[] subrange
3. InputStream
4. ByteBuffer

and two kinds of outputs:

1. PrintStream
2. Stream<String>

and two variations of formatters:

1. default formatter
2. custom formatter + chunk size

This is a total of 16 combinations. But there are only eight methods: three 
PrintStream methods with choice of input, two stream-output methods using the 
default formatter, and three stream-output methods using custom chunk+formatter.

You don't have to provide ALL combinations, but what's here is an odd subset 
with some apparently arbitrary choices. For example, if I have a ByteBuffer and 
I want to dump it to System.out using default formatting, I have to go the 
Stream.forEachOrdered route AND provide the default chunk size and formatter.

     HexFormat.dumpAsStream(buf, DEFAULT_CHUNK_SIZE, HEXDUMP_FORMATTER)
              .forEachOrdered(System.out::println);

These aren't huge deals, but they're easily stumbled over.

One approach to organizing this is to have a HexFormat instance that contains 
the setup information and then to have instance methods that either update the 
setup or perform conversion and output. I'd use static factory methods instead 
of constructors. For example, you could do this:

static factories methods:
  - from(byte[])
  - from(byte[], fromIndex, toIndex)
  - from(InputStream)
  - from(ByteBuffer)

formatter setup instance methods:
  - format(chunksize, formatter)

output:
  - void dump(PrintStream)
  - Stream<String> stream()

Using this approach my example from above could be performed as follows:

    HexFormat.from(buf).dump(System.out);

Note, I'm not saying that you HAVE to do it this way. (In particular, the naming 
could use work.) This is quite possibly overkill. But it's something you might 
consider, as it gets you all 16 combinations using seven methods, compared to 
the eight static methods in the current proposal that cover only half of the 
combinations.

Alternatively, pare down the set of static methods to a bare minimum. Provide 
one that can do everything, and then provide one or two more that are 
essentially the same as the first but with some hardwired defaults. For example, 
to help minimize things, you can wrap a ByteBuffer around a byte array subrange, 
or get an InputStream from a byte array subrange. But you can't get an 
InputStream from a ByteBuffer or vice-versa, without a lot of work.

(I haven't looked at the to* or from* methods.)

s'marks