[11] RFR: 8170769 Provide a simple hexdump facility for binary data

Roger Riggs Roger.Riggs at Oracle.com
Thu May 10 20:51:58 UTC 2018


Hi,

On 5/9/2018 6:49 PM, Vincent Ryan wrote:
> Thanks Roger for your comments.
>
> The main motivator for this class is to provide a basic hex. encoder/decoder for smaller amounts of binary data
> and to provide a hexdump encoder for larger amounts of binary data, while recognising the need to cater for
> custom formats too.
>
> The class does not attempt to satisfy every custom format. Instead it provides some basic formatting methods
> and functions from which a wide variety of custom formats can be constructed. Developers can also write their
> own formatting functions to use with this class.

I'm fine with covering the simple cases if the rest can be covered by 
the extensible formatting API.
>
> More comments in-line below.
>
>
>> On 9 May 2018, at 16:20, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>>
>> Hi Vinnie,
>>
>> On the API and spec, a few comments:
>>
>>   - Expanding the printable string from ASCII to ISO-8859-1 would make it a bit more useful in more cases.
>>     That might suggest using the Charset converter to do the work (less optimized but more functional).
> Yes adding support for additional charsets could be useful but it is a tradeoff against cluttering the API.
> I think ASCII is sufficient for the common case.
Converting the control characters to "." below <space>  0-31 makes sense 
to avoid odd carriage control effects.
But the characters above 127 are printable and  there is no reason to 
blank them out.
>> - There is no API support for ByteBuffers, another common source of bytes, that would make a good addition
>>     for completeness.  John Rose suggested a ByteSequence interface in the context of file processing but
>>     that hasn't settled down.
> Support for byte buffers was requested before and I suggested wrapping in an I/O stream.
> Is that acceptable?
If it were a simple one or two method pattern, I'd be likely to agree 
but its not simple unless the ByteArray is
backed by an array and  using the position and limit it gets more 
complicated.

A convenience method would need only to allocate an array (limit - 
position) and get, the rest is as easy as byte[].
For example,

      public static Stream<String> dumpAsStream(ByteBuffer buffer, int
    fromIndex, int toIndex,
                                                   Formatter formatter) {
             int len = Math.max(0, buffer.limit() - buffer.position());
             byte[] bytes = new byte[len];
             buffer.get(bytes);
             return dumpAsStream(bytes, fromIndex, toIndex, 16, formatter);
         }

>
>
>> - The class name "Hex" might be a bit more evocative as HexDump or HexConverter.
> I tried to keep the class name short as many of its method names are long.
>
>
>> - Method names;  the "Hex" in method names might be unnecessary/redundant since, as static methods,
>>    they would frequently appear in code as "Hex.fromHexString" and a simple "Hex.fromString" would be fine.
>>    Ditto,  toHexString(bytes) -> toString(bytes)…
> I agree the repetition is ugly. Shortening to fromString() and toString() is appealing except for the possible
> confusion with Object.toString
>
There are no instances, so little room for confusion.

>> - There are not many forms that allow the formatter to be supplied, for example, dump(in, out) might be
>>    a case where a formatter would be desired.
> The 3 dump methods are just convenience functions around the stream generators. The dumpAsStream
> methods are the expected entry points for customizers.

Since those methods are writing formatted output, they should be 
directed to a PrintStream,
of which System.out/err would be common case.

OutputStream isn't going to know the correct line separator; (and it 
should not be hardcoded as '\n'  Hex:606)

Btw,  the example in the class javadoc does not compile.

>> - Hex.Formatter interface could have a default method that provides the default formatting or as
>>    a static method so it can be used with a method reference.
> Sure. What do you suggest?
Hex: 69:

   69             String.format("%08x  %s  |%s|",
   70                 offset,
   71                 Hex.toFormattedHexString(chunk, fromIndex, toIndex),
   72                 Hex.toPrintableString(chunk, fromIndex, toIndex));

>
>
>> - On the example in the class javadoc, I would use the implementation of the default formatter with both hex and ascii
>>    to show how that works.
> Can you give me an example?
Define the method with the default modifier and the method body of 
HEXDUMP_FORMATTER.format.
(Hex:68-73).
>
>
>>   - As Max observes, being able to supply the delimiters might be a good addition.  (I'm thinking IP addresses too).
> Sure. Add another toHexString method that takes a delimiter character?
I take it back, that's a customization that could be done by a user 
supplied formatter.

Thanks, Roger

>
>> It looks quite good and very useful.
>>
>> Thanks, Roger
>>
>> On 5/8/2018 10:34 PM, Weijun Wang wrote:
>>> Nice tool.
>>>
>>> However, I am not sure how toFormattedHexString() and toPrintableString() are useful, seems only for providing a customizable dump format which is, actually, not very customizable.
>>>
>>> For me, toHexString and fromHexString are of course the most useful methods. As for dump, I can only think of
>>>
>>> 1. The existing sun.security.HexDumpEncoder format, when I want to dump a lot of bytes as a block
>>> 2. "00:11:22:33:AA:BB:CC" which fits in one line and also easy to read, when I want inline debugging output
>>>
>>> If the customizable dump method is both powerful and simple enough to create 2) above, I'll be happy. Otherwise, I can live with toHexString().replaceAll("(..)(?=.)", "$1:”).
>>>
>>> Thanks
>>> Max
>>>
>>>> On May 4, 2018, at 4:22 AM, Vincent Ryan <Vincent.X.Ryan at oracle.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Please review this proposal for a new API to conveniently generate and display binary data using hex string representation.
>>>> It supports both bulk and stream operations and it can also generate the well-known hexdump format [1].
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8170769
>>>> API: http://cr.openjdk.java.net/~vinnie/8170769/javadoc.05/api/java.base/java/util/Hex.html
>>>> Webrev: http://cr.openjdk.java.net/~vinnie/8170769/webrev.05/
>>>>
>>>> ____
>>>> [1] https://docs.oracle.com/cd/E86824_01/html/E54763/hexdump-1.html



More information about the core-libs-dev mailing list