Warnings Cleanup in java.util.<various> (more from hack day)

Sat Dec 3 03:05:23 PST 2011

Hi Sherman,

In order to keep this change within the scope of the intentions of the
exercise I'm going to revert that section to what it was (I'll re-spin a
patch).  At this stage I won't add a @SuppressWarnings as I think this
should be avoidable once it's looked at again in a little more depth.

The rest follows in-line.  If this is starting to chew up your time then
please don't feel obliged to answer, this is more for my own curiosities
sake.  This also isn't a suggestion to change the patch again, just
theorising here :-)

On Friday, 2 December 2011, Xueming Shen <xueming.shen at oracle.com> wrote:
> Martijn,
>
> The proposed change is incorrect.
>
> +  value = new String(vb, 0, 0, StandardCharsets.UTF_8);
>
> First, shouldn't it at least be
>
> value = new String(vb, StandardCharsets.UTF_8);
>
>  or
>
> value = new String(vb, 0, vb.length, StandardCharsets.UTF_8);
>
> Second, the "value" will be written out via dos.writeBytes(String) later,
which
> only takes the low-byte of a each char of the target String object.

Right, so the need for the hi-byte constructor can be avoided, but by me
ignoring the vb.length I introduced a bug, gotcha.

So the code was:

String value = e.getKey();
if (value != null) {
    byte[] vb = value.getBytes("utf8");
    value = new String(vb, 0, 0, vb.length);
}

My proposed (bad) change was:

String value = e.getKey();
if (value != null) {
    byte[] vb = value.getBytes(StandardCharsets.UTF_8);
    value = new String(vb, 0, 0, StandardCharsets.UTF_8);
}

As you say, a better change would have been:

String value = e.getKey();
if (value != null) {
    byte[] vb = value.getBytes(StandardCharsets.UTF_8);
    value = new String(vb, 0, vb.length, StandardCharsets.UTF_8);
}

Or in a more performing manner:

String value = e.getKey();
if (value != null) {
    // As at jdk8 build X - we don't use StandardCharsets.UTF_8 in getBytes
for performance reasons
    byte[] vb = value.getBytes("utf8");
    value = new String(vb, 0, vb.length, StandardCharsets.UTF_8);
}

> That "not properly" conversion is actually what we need here to make sure
the
> output will be correctly in utf8. The only "reliable" replacement might
be to use
> "iso8859-1" (not utf_8), but I would recommend keep it un-touched.

Fair enough.

Thanks for clearing that up and apologies for causing pain :(

Cheers,
Martijn