RFR: 8170769 Provide a simple hexdump facility for binary data
Stuart Marks
stuart.marks at oracle.com
Wed Dec 12 02:35:31 UTC 2018
On 12/11/18 1:21 PM, Vincent Ryan wrote:
> My preference is for PrintStream rather than Writer, for the same reason as
> Roger: it’s more convenient
> for handling System.out. Does that address your concern?
PrintStream is fine with me.
> I cannot simply cast 8859-1 characters into UTF-8 because UTF-8 is a multi-byte
> charset, so some 0x8X characters
> will trigger the multi-byte sequence and end up being misinterpreted. Hence
> my rather awkward conversion to a String.
> Is there a better way?
In toPrintableString(),
    StringBuilder printable = new StringBuilder(toIndex - fromIndex);
    for (int i = fromIndex; i < toIndex; i++) {
        if (bytes[i] > 0x1F && bytes[i] < 0x7F) {
            printable.append((char) bytes[i]);
        } else if (bytes[i] > (byte)0x9F && bytes[i] <= (byte)0xFF) {
            printable.append(new String(new byte[]{bytes[i]}, ISO_8859_1));
        } else {
            printable.append('.');
        }
    }
It works to cast ASCII bytes to char, because the 7-bit ASCII range overlaps the
low 7 bits of the UTF-16 char range. The byte values of ISO 8859-1 overlap the
low 8 bits of UTF-16, so casts work for them too (mask with 0xFF first, since
byte is signed in Java).
For any other charset, you'd need to do codeset conversion. But you're cleverly
supporting only ISO 8859-1, so you don't have to do any conversion. :-)
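
A minimal sketch of the cast-based approach, for illustration only. The class
name PrintableSketch and the sample data are made up here and are not from the
webrev. Because byte is signed in Java, the sketch masks each value with 0xFF
before comparing and casting; a bare (char) cast of a negative byte would
sign-extend to 0xFFxx.

```java
public class PrintableSketch {
    // Hypothetical helper, not the webrev's code: maps each byte in
    // [fromIndex, toIndex) to a printable char, or '.' otherwise.
    static String toPrintableString(byte[] bytes, int fromIndex, int toIndex) {
        StringBuilder printable = new StringBuilder(toIndex - fromIndex);
        for (int i = fromIndex; i < toIndex; i++) {
            int b = bytes[i] & 0xFF;            // byte is signed; mask to 0..255
            if ((b > 0x1F && b < 0x7F) || b > 0x9F) {
                // ISO 8859-1 values equal the corresponding UTF-16 code units,
                // so a plain cast is enough; no String conversion needed.
                printable.append((char) b);
            } else {
                printable.append('.');          // control characters
            }
        }
        return printable.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 'H', 'i', 0x00, (byte) 0xE9, (byte) 0x85 };
        System.out.println(toPrintableString(data, 0, data.length));
    }
}
```

The two printable ranges (0x20..0x7E and 0xA0..0xFF) are the same ones the
quoted code tests for; only the conversion step differs.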
> I’m not sure I’ve addressed your concern regarding IOExceptions - can you elaborate?
Taking out the OutputStream overloads addressed my concerns. In at least one
case the code would wrap the OutputStream into a PrintStream, print stuff to it,
and then throw away the PrintStream. If an output error occurred, any error
state in the PrintStream would also be thrown away. The creation of the
PrintStream wrapper would also use the system's default charset instead of
letting the caller control it.
The dump() overloads now all take PrintStream, so it's the caller's
responsibility to ensure that the PrintStream is using the right charset and to
check for errors after. So this is all OK now.
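
To illustrate what that caller-side responsibility might look like, here is a
small sketch. The class and method names are hypothetical, and the println call
merely stands in for a dump() call. It uses the PrintStream(OutputStream,
boolean, Charset) constructor, which exists since Java 10, and
PrintStream.checkError().

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class PrintStreamCallerSketch {
    // An OutputStream that always fails, used to trigger PrintStream's
    // internal error state.
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated failure");
        }
    }

    // The caller picks the charset explicitly and keeps the PrintStream
    // around long enough to ask whether anything went wrong.
    static boolean writeAndCheck(OutputStream out) {
        PrintStream ps = new PrintStream(out, true, StandardCharsets.UTF_8);
        ps.println("00000000  ca fe ba be");   // stand-in for a dump() call
        return ps.checkError();                // flushes, then reports errors
    }

    public static void main(String[] args) {
        System.out.println(writeAndCheck(new FailingStream()));
        System.out.println(writeAndCheck(System.out));
    }
}
```

PrintStream swallows IOExceptions, so checkError() is the only way to learn
about a failed write; a wrapper that is created and discarded inside the
library gives the caller no chance to call it.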
Note that the internal getPrintStream(), to wrap an OutputStream in a
PrintStream, is now obsolete and can be removed.
(Oh, I see Roger has said much the same things. Oh well, the peril of parallel
reviews.)
> BTW updated webrev/javadoc available:
> http://cr.openjdk.java.net/~vinnie/8170769/webrev.08/
> http://cr.openjdk.java.net/~vinnie/8170769/javadoc.08/api/java.base/java/util/HexFormat.html
Now we have a somewhat unsatisfying asymmetry in the APIs.
There are four kinds of inputs:
1. byte[]
2. byte[] subrange
3. InputStream
4. ByteBuffer
and two kinds of outputs:
1. PrintStream
2. Stream<String>
and two variations of formatters:
1. default formatter
2. custom formatter + chunk size
This is a total of 16 combinations. But there are only eight methods: three
PrintStream methods with choice of input, two stream-output methods using the
default formatter, and three stream-output methods using custom chunk+formatter.
You don't have to provide ALL combinations, but what's here is an odd subset
with some apparently arbitrary choices. For example, if I have a ByteBuffer and
I want to dump it to System.out using default formatting, I have to go the
Stream.forEachOrdered route AND provide the default chunk size and formatter.
    HexFormat.dumpAsStream(buf, DEFAULT_CHUNK_SIZE, HEXDUMP_FORMATTER)
        .forEachOrdered(System.out::println);
These aren't huge deals, but they're easily stumbled over.
One approach to organizing this is to have a HexFormat instance that contains
the setup information and then to have instance methods that either update the
setup or perform conversion and output. I'd use static factory methods instead
of constructors. For example, you could do this:
static factory methods:
- from(byte[])
- from(byte[], fromIndex, toIndex)
- from(InputStream)
- from(ByteBuffer)
formatter setup instance methods:
- format(chunksize, formatter)
output:
- void dump(PrintStream)
- Stream<String> stream()
Using this approach my example from above could be performed as follows:
    HexFormat.from(buf).dump(System.out);
Note, I'm not saying that you HAVE to do it this way. (In particular, the naming
could use work.) This is quite possibly overkill. But it's something you might
consider, as it gets you all 16 combinations using seven methods, compared to
the eight static methods in the current proposal that cover only half of the
combinations.
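
A rough sketch of how the seven-method shape could hang together, under several
assumptions: the class is named HexDumper here to keep it distinct from the
proposed HexFormat, the ChunkFormatter interface and the default line format
are invented for illustration, and all inputs are copied eagerly into a byte
array (a real implementation would probably read an InputStream lazily).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical sketch only; none of these names come from the webrev.
public final class HexDumper {
    // Hypothetical formatter type: renders one chunk located at 'offset'.
    public interface ChunkFormatter {
        String format(int offset, byte[] chunk);
    }

    private static final int DEFAULT_CHUNK_SIZE = 16;
    private static final ChunkFormatter DEFAULT_FORMATTER = (offset, chunk) -> {
        StringBuilder sb = new StringBuilder(String.format("%08x ", offset));
        for (byte b : chunk) {
            sb.append(String.format(" %02x", b & 0xFF));
        }
        return sb.toString();
    };

    private final byte[] data;
    private final int chunkSize;
    private final ChunkFormatter formatter;

    private HexDumper(byte[] data, int chunkSize, ChunkFormatter formatter) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.data = data;
        this.chunkSize = chunkSize;
        this.formatter = formatter;
    }

    // Four static factories, one per kind of input.
    public static HexDumper from(byte[] bytes) {
        return from(bytes, 0, bytes.length);
    }

    public static HexDumper from(byte[] bytes, int fromIndex, int toIndex) {
        return new HexDumper(Arrays.copyOfRange(bytes, fromIndex, toIndex),
                DEFAULT_CHUNK_SIZE, DEFAULT_FORMATTER);
    }

    public static HexDumper from(ByteBuffer buffer) {
        byte[] copy = new byte[buffer.remaining()];
        buffer.duplicate().get(copy);   // leaves the caller's position untouched
        return new HexDumper(copy, DEFAULT_CHUNK_SIZE, DEFAULT_FORMATTER);
    }

    public static HexDumper from(InputStream in) throws IOException {
        // Simplification: reads everything up front (readAllBytes, Java 9+).
        return new HexDumper(in.readAllBytes(), DEFAULT_CHUNK_SIZE, DEFAULT_FORMATTER);
    }

    // One setup method; returns a new instance rather than mutating this one.
    public HexDumper format(int chunkSize, ChunkFormatter formatter) {
        return new HexDumper(data, chunkSize, formatter);
    }

    // Two outputs.
    public Stream<String> stream() {
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        return IntStream.range(0, chunks).mapToObj(i -> {
            int start = i * chunkSize;
            int end = Math.min(start + chunkSize, data.length);
            return formatter.format(start, Arrays.copyOfRange(data, start, end));
        });
    }

    public void dump(PrintStream out) {
        stream().forEachOrdered(out::println);
    }
}
```

With this shape, any input can be paired with either output and either
formatter, e.g. HexDumper.from(buf).format(8, myFormatter).stream(), where
myFormatter is whatever the caller supplies.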
Alternatively, pare down the set of static methods to a bare minimum. Provide
one that can do everything, and then provide one or two more that are
essentially the same as the first but with some hardwired defaults. For example,
to help minimize things, you can wrap a ByteBuffer around a byte array subrange,
or get an InputStream from a byte array subrange. But you can't get an
InputStream from a ByteBuffer or vice-versa, without a lot of work.
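
For reference, the two cheap adaptations mentioned above use standard JDK
calls: ByteBuffer.wrap(byte[], int, int) and the ByteArrayInputStream(byte[],
int, int) constructor. Both take an offset and a length rather than a
fromIndex/toIndex pair. The wrapper class and method names below are
hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

public class SubrangeAdapters {
    // View bytes[fromIndex, toIndex) as a ByteBuffer without copying.
    static ByteBuffer asBuffer(byte[] bytes, int fromIndex, int toIndex) {
        return ByteBuffer.wrap(bytes, fromIndex, toIndex - fromIndex);
    }

    // View bytes[fromIndex, toIndex) as an InputStream without copying.
    static ByteArrayInputStream asStream(byte[] bytes, int fromIndex, int toIndex) {
        return new ByteArrayInputStream(bytes, fromIndex, toIndex - fromIndex);
    }

    public static void main(String[] args) {
        byte[] data = { 10, 20, 30, 40 };
        System.out.println(asBuffer(data, 1, 3).remaining());
        System.out.println(asStream(data, 1, 3).available());
    }
}
```

So a single method taking ByteBuffer, or a single one taking InputStream,
would already cover the byte[] and byte[] subrange cases.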
(I haven't looked at the to* or from* methods.)
s'marks