RFR: 8170769 Provide a simple hexdump facility for binary data
Roger Riggs
Roger.Riggs at oracle.com
Mon Dec 3 19:44:38 UTC 2018
Hi Vincent,
On 12/02/2018 05:54 PM, Vincent Ryan wrote:
> Thanks for the thorough review.
> Responses in-line.
>
>> On 26 Nov 2018, at 19:49, Roger Riggs <Roger.Riggs at oracle.com
>> <mailto:Roger.Riggs at oracle.com>> wrote:
>>
>> Hi,
>>
>> Thanks for updating this proposal. Some comments on the javadoc/spec.
>>
>> The class name Hex isn't very evocative of its function.
>> HexFormat would convey a more complete idea as to its function and be
>> similar to
>> to the existing DecimalFormat and NumberFormat classes though they do
>> not share
>> any of the APIs or framework.
>
> OK
>
>>
>> - The 2nd example using Files.newInputStream is not recommended
>> since the stream to the file may not be closed.
>> The recommended form would use try-with-resources.
>> |try (InputStream is = |||Files.newInputStream(Paths.get("mydata"))) {| Hex.dumpAsStream(is,
>> 64, (offset, chunk, fromIndex, toIndex) -> String.format("%d %s",
>> offset / 64 + 1, Hex.toPrintableString(chunk, fromIndex, toIndex)))
>> .forEachOrdered(System.out::println); } catch (IOException ioe) { ... } |
>> The 'forEachOrdered' should not be necessary and may raise questions
>> about why.
>> if there's no good reason, use 'forEach’.
>
> OK
>
>>
>>
>> - The HEXDUMP_FORMATTER should be explicit about the format it generates.
>> Though it may declare itself compatible with hexdump(1) it should
>> specify
>> the format it produces to avoid an external reference that might
>> change unexpectedly.
>
> OK
>
>>
>> - toFormattedHexString:
>> References to the line-separator should be javadoc linked to
>> System#lineSeparator.
>
> OK. And maybe shorten this method to simply: toFormattedString ?
Fine with me.
>
>>
>> Alternatively, should this class follow the lead of
>> String.align/indent and use
>> the normalized line terminator "\n" consistently? In which case, it
>> should be explicit.
>>
>> Clarify exactly the behavior:
>>
>> "If the final chunk is less than 16 +bytes+ then +the result+ is
>> padded with spaces to match the length of the preceding chunks. "
>>
>> Only the built-in formatter asserts that every result is the same
>> length. A provided formatter
>> may not conform. soften "will be" -> "may be" in "the final chunk …"
>
> OK
>
>>
>> - everywhere: "binary buffer" -> "byte buffer" or byte array
>
> OK
>
>>
>> - toPrintableString: The ASCII limitation seems quite dated, 8-bit
>> clean is the norm and
>> except for control characters and 0x7f, the rest of the range is
>> printable in most Locales.
>> Strict conformance to hexdump is not a requirement.
>
> I can see the benefit to including the accented letters > 0x7F but is
> it really useful to also display
> graphical and non-printable characters > 0x7F ?
Unless the character would cause the terminal to go 'haywire' or enter
a unprintable mode I'd rather see the character than '.'.
It may a bit arbitrary but if a character is ISO 8859-1, it should be
printed.
That omits 0..0x1F, and 0x7F..0x9F; I'm fine with those being '.'.
>
>>
>> - dumpAsStream(InputStream)
>> It is inconsistent (and possibly wrong) that the final chunk may be
>> shorter since the HEXDUMP_FORMATTER
>> uses the formatter that always pads. (Btw, the implementation
>> appends \n and should be using lineSeparator.)
>> "If the input is not a multiple of 16 bytes then the final chunk
>> will be shorter than the preceding chunks.”
>
> I’ve clarified the language. The final chunk will indeed be shorter
> but the output 'will be' padded by HEXDUMP_FORMATTER
> and 'may be' padded when a custom formatter is supplied.
ok
>
>>
>> - dumpAsStream(bytes, from, to, chunk, formatter)
>>
>> Can it more explicit about the unchecked exception?
>> "If an error occurs in the |formatter| then an unchecked exception
>> will be thrown from the underlying |Stream| method.”
>
> What do you suggest? The formatter may throw any exception and it
> doesn’t get thrown until the stream is processed.
ok, You are right, there isn't much that can be said, exceptions from
the stream can happen.
>
>>
>> - Another thought about the methods returning Stream<String>.
>> They require a concrete string to be created for each chunk.
>> There may be some efficiency and flexibility to be gained if
>> formatters can return a CharSequence.
>> In many cases it would be a string, but it would allow the
>> formatter to return a StringBuilder
>> that could be used directly without needing to construct a short
>> lived string.
>
> Sounds OK but I’ll need to prototype this and get back to you.
This may be more trouble than its worth; it makes the API a bit harder
to understand.
If its not quick and easy, I'm fine with it as string.
Thanks, Roger
>
>
>>
>> Thanks, Roger
>>
>>
>> p.s.
>> I wrote code for the use case of formatting bytes in the form that
>> can be compiled with javac.
>> it took rather more code than I expected but I don't have any
>> specific suggestions.
>>
>>
>> byte[] b = ...;
>> final String indent = " ".repeat(8);
>> final StringBuilder sb = new StringBuilder(16 * 5);
>> sb.append(indent);
>>
>> System.out.print(" byte[] bytes = {\n");
>> try (ByteArrayInputStream is = new ByteArrayInputStream(b)) {
>> dumpAsStream(is, 16, (o, c, s, e) -> {
>> sb.setLength(indent.length());
>> for (int i = s; i < e; i++) {
>> sb.append(c[i]);
>> sb.append(", ");
>> }
>> sb.append(System.lineSeparator());
>> return sb.toString();
>> }).forEach(s -> System.out.print(s));
>> }
>> System.out.print(" }\n");
>>
>>
>> On 11/23/2018 09:51 AM, Vincent Ryan wrote:
>>> Hello,
>>>
>>> Please review this proposal for a new API to conveniently generate
>>> and display binary data using hexadecimal string representation.
>>> It supports both bulk and stream operations and it can also generate
>>> the well-known hexdump format [1].
>>>
>>> This latest revision addresses review comments provided earlier.
>>> These include adding methods to support for data supplied in a
>>> ByteBuffer, exposing the default formatter implementation as a public
>>> static and dropping the superfluous 'Hex' term from several method
>>> names.
>>>
>>> Thanks
>>>
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8170769
>>> API:
>>> http://cr.openjdk.java.net/~vinnie/8170769/javadoc.06/api/java.base/java/util/Hex.html
>>> Webrev: http://cr.openjdk.java.net/~vinnie/8170769/webrev.06/
>>>
>>>
>>> ____
>>> [1] https://docs.oracle.com/cd/E88353_01/html/E37839/hexdump-1.html
>>
>
More information about the core-libs-dev
mailing list