RFR: 8170769 Provide a simple hexdump facility for binary data

Mon Dec 3 19:44:38 UTC 2018

Hi Vincent,

On 12/02/2018 05:54 PM, Vincent Ryan wrote:
> Thanks for the thorough review.
> Responses in-line.
>
>> On 26 Nov 2018, at 19:49, Roger Riggs <Roger.Riggs at oracle.com 
>> <mailto:Roger.Riggs at oracle.com>> wrote:
>>
>> Hi,
>>
>> Thanks for updating this proposal.  Some comments on the javadoc/spec.
>>
>> The class name Hex isn't very evocative of its function.
>> HexFormat would convey a more complete idea as to its function and be 
>> similar to
>> to the existing DecimalFormat and NumberFormat classes though they do 
>> not share
>> any of the APIs or framework.
>
> OK
>
>>
>> - The 2nd example using Files.newInputStream is not recommended
>> since the stream to the file may not be closed.
>> The recommended form would use try-with-resources.
>> |try (InputStream is = |||Files.newInputStream(Paths.get("mydata"))) {| Hex.dumpAsStream(is, 
>> 64, (offset, chunk, fromIndex, toIndex) -> String.format("%d %s", 
>> offset / 64 + 1, Hex.toPrintableString(chunk, fromIndex, toIndex))) 
>> .forEachOrdered(System.out::println); } catch (IOException ioe) { ... } |
>> The 'forEachOrdered' should not be necessary and may raise questions 
>> about why.
>> if there's no good reason, use 'forEach’.
>
> OK
>
>>
>>
>> - The HEXDUMP_FORMATTER should be explicit about the format it generates.
>>   Though it may declare itself compatible with hexdump(1) it should 
>> specify
>>   the format it produces to avoid an external reference that might 
>> change unexpectedly.
>
> OK
>
>>
>> - toFormattedHexString:
>> References to the line-separator should be javadoc linked to 
>> System#lineSeparator.
>
> OK. And maybe shorten this method to simply: toFormattedString ?
Fine with me.
>
>>
>> Alternatively, should this class follow the lead of 
>> String.align/indent and use
>> the normalized line terminator "\n" consistently? In which case, it 
>> should be explicit.
>>
>> Clarify exactly the behavior:
>>
>> "If the final chunk is less than 16 +bytes+ then +the result+ is 
>> padded with spaces to match the length of the preceding chunks. "
>>
>> Only the built-in formatter asserts that every result is the same 
>> length.  A provided formatter
>> may not conform.  soften "will be" -> "may be" in "the final chunk …"
>
> OK
>
>>
>> - everywhere: "binary buffer" -> "byte buffer"  or byte array
>
> OK
>
>>
>> - toPrintableString:  The ASCII limitation seems quite dated, 8-bit 
>> clean is the norm and
>> except for control characters  and 0x7f, the rest of the range is 
>> printable in most Locales.
>> Strict conformance to hexdump is not a requirement.
>
> I can see the benefit to including the accented letters > 0x7F but is 
> it really useful to also display
> graphical and non-printable characters > 0x7F ?
Unless the character would cause the terminal to go 'haywire' or enter
a unprintable mode  I'd rather see the character than '.'.
It may a bit arbitrary but if a character is ISO 8859-1, it should be 
printed.
That omits  0..0x1F, and 0x7F..0x9F;  I'm fine with those being '.'.

>
>>
>> - dumpAsStream(InputStream)
>>   It is inconsistent (and possibly wrong) that the final chunk may be 
>> shorter since the HEXDUMP_FORMATTER
>>   uses the formatter that always pads.  (Btw, the implementation 
>> appends \n and should be using lineSeparator.)
>>    "If the input is not a multiple of 16 bytes then the final chunk 
>> will be shorter than the preceding chunks.”
>
> I’ve clarified the language. The final chunk will indeed be shorter 
> but the output 'will be' padded by HEXDUMP_FORMATTER
> and 'may be' padded when a custom formatter is supplied.
ok
>
>>
>> - dumpAsStream(bytes, from, to, chunk, formatter)
>>
>>  Can it more explicit about the unchecked exception?
>>  "If an error occurs in the |formatter| then an unchecked exception 
>> will be thrown from the underlying |Stream| method.”
>
> What do you suggest? The formatter may throw any exception and it 
> doesn’t get thrown until the stream is processed.
ok, You are right, there isn't much that can be said, exceptions from 
the stream can happen.
>
>>
>> - Another thought about the methods returning Stream<String>.
>>   They require a concrete string to be created for each chunk.
>>   There may be some efficiency and flexibility to be gained if 
>> formatters can return a CharSequence.
>>   In many cases it would be a string, but it would allow the 
>> formatter to return a StringBuilder
>>   that could be used directly without needing to construct a short 
>> lived string.
>
> Sounds OK but I’ll need to prototype this and get back to you.

This may be more trouble than its worth; it makes the API a bit harder 
to understand.
If its not quick and easy, I'm fine with it as string.

Thanks, Roger

>
>
>>
>> Thanks, Roger
>>
>>
>> p.s.
>> I wrote code for the use case of formatting bytes in the form that 
>> can be compiled with javac.
>> it took rather more code than I expected but I don't have any 
>> specific suggestions.
>>
>>
>>         byte[] b = ...;
>>         final String indent = " ".repeat(8);
>>         final StringBuilder sb = new StringBuilder(16 * 5);
>>         sb.append(indent);
>>
>>         System.out.print("    byte[] bytes = {\n");
>>         try (ByteArrayInputStream is = new ByteArrayInputStream(b)) {
>>             dumpAsStream(is, 16, (o, c, s, e) -> {
>>                 sb.setLength(indent.length());
>>                 for (int i = s; i < e; i++) {
>>                     sb.append(c[i]);
>>                     sb.append(", ");
>>                 }
>>                 sb.append(System.lineSeparator());
>>                 return sb.toString();
>>             }).forEach(s -> System.out.print(s));
>>         }
>>         System.out.print("    }\n");
>>
>>
>> On 11/23/2018 09:51 AM, Vincent Ryan wrote:
>>> Hello,
>>>
>>> Please review this proposal for a new API to conveniently generate 
>>> and display binary data using hexadecimal string representation.
>>> It supports both bulk and stream operations and it can also generate 
>>> the well-known hexdump format [1].
>>>
>>> This latest revision addresses review comments provided earlier.
>>> These include adding methods to support for data supplied in a 
>>> ByteBuffer, exposing the default formatter implementation as a public
>>> static and dropping the superfluous 'Hex' term from several method 
>>> names.
>>>
>>> Thanks
>>>
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8170769
>>> API: 
>>> http://cr.openjdk.java.net/~vinnie/8170769/javadoc.06/api/java.base/java/util/Hex.html
>>> Webrev: http://cr.openjdk.java.net/~vinnie/8170769/webrev.06/
>>>
>>>
>>> ____
>>> [1] https://docs.oracle.com/cd/E88353_01/html/E37839/hexdump-1.html
>>
>