RFR 8251989: Hex encoder and decoder utility

Sun Aug 23 14:42:14 UTC 2020

> 
> In the meantime I'll nevertheless take a look at the current 
> implementation of the Decoder.
> 
> 
> Greetings
> Raffaello

Hi Roger,

here are my notes about the Decoder.

(1) The only serious issue I could find is on L.548. It should read
L.548   CharBuffer cb = CharBuffer.wrap(chars, index, index + length);
as method wrap() takes a start and an end.
(It's quite confusing and error prone to have both (start, length) and
(start, end) conventions on ranges all over the OpenJDK :-(  Nobody's
fault, just annoying.)

The rest are minor notes.

(2) While I'm pretty sure that a JIT compiler can remove the index range
check on L.465 when len is replaced by bytes.length on L.463, I'm less
sure that it can do it as currently coded, despite len and bytes.length
having the same value. Readability wouldn't suffer and len could be removed.

(3) The expressions preLen > origLen - sufLen on L.492 and sufLen >
origLen - preLen on L.496 are equivalent. The latter is thus useless on
L.496 because it is always false when execution reaches here.

(4) The code starting at L.588 assumes that delimiter.length() > 0,
which is ensured by the optimization in L.585-586.
Without this assumption, however, neither L.509 nor L.512 are correct
when delimiter.length() == 0.
To make the code to work even when, for some reason, delimiter.length()
== 0) the lines can be replaced by the slightly more involved yet more
robust
L.509   if ((subcs.length() - 2) % stride != 0)
and
L.512   final int len = (subcs.length() - 2) / stride + 1;

(5) I guess that the rationale for not failing fast on illegal
characters in the loop on L.516-521, and accumulate and postpone the
check only after the whole cs has been processed, is that the code
optimistically assumes that cs is legal, so it would be wasteful to
check at each iteration.
But since each iteration already needs a quite more expensive check for
the delimiter, I wonder if the postponing makes much economical sense.
The check could be directly on v before assignment to bytes and the var
illegal could be removed.

(6) What about moving L.517 at the end of the loop body so that memory
accesses are issued with strictly increasing and hardware predictable
addresses? This move would also better reveal the "left-to-right"
processing intent.

(7) On L.561, delimiter.isEmpty() could be removed as checkDelimiter()
is not invoked when the delimiter is empty. But it's perhaps safer to
have it.

(8) You may want to wrap escapeNL() around the message on L.567-569

(9) fromHexPair() and fromHex() can be declared static.

(10) equals() and hashCode() are missing despite the class being
value-based. This holds for Encoder as well.

I look forward for the next iteration of the code.

Greetings
Raffaello