RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes

Mon Dec 3 21:43:37 UTC 2018

Hi Phillip,

The amount detail obscures the general purpose.
And there appears to be more than 1.
The Jira issue IDs mentioned are 8066619 and 6202130.

Is this functionally neutral and only fixes the deprecations?

There is a mention that a test is needed for multi-byte chars, but a test
is not included.  Is there an existing test for that?

Its probably best to identify the main functional improvement (multi-byte)
and fix the deprecation as a side effect.

Thanks for digging through the issues and the explanations;
it will take a bit of study to unravel and understand everything in this 
changeset.

Regards, Roger

On 12/01/2018 06:49 AM, Philipp Kunz wrote:
> Find the proposed patch attached. Some comments and explanations, here:
>
> There is a quite interesting implementation in Manifest and Attributes
> worth quite some explanation. The way it used to work before was:
>
> 1. put manifest header name, colon and space into a StringBuffer
> -> the buffer now contains a string of characters each high-byte of
> which is zero as explained later why this is important. the high-bytes
> are zero because the set of allowed characters is very limited to ":",
> " ", "a" - "z", "A" - "Z", "0" - "9", "_", and "-" according to
> Attributes.Name#hash(String) so far with only the name and the
> separator and yet without the values.
>
> 2. if the value is not null, encode it in UTF-8 into a byte array and
> instantiate a String with it using deprecated
> String#String(byte[],int,int,int) resulting in a String with the same
> length as the byte array before holding one byte in each character's
> low-byte. This makes a difference for characters encoded with more than
> one byte in UTF-8. The new String is potentially longer than the
> original value.
>
> 3. if the value is not null, append value to buffer. The one UTF-8
> encoded byte per character from the appended string is preserved also
> in the buffer along with the previous buffer contents.
>
> 3alt. if the value is null, add "null" to the buffer. See
> java.lang.AbstractStringBuilder#append(String). Neither of the
> characters of "null" has a non-zero high-byte encoded as UTF-16 chars.
>
> 4. make72Safe inserts line breaks with continuation spaces. Note that
> the buffer here contains only one byte per character because all high-
> bytes are still zero so that line.length() and line.insert(index, ...)
> effectively operate with byte offsets and not characters.
>
> 5. buffer.toString()
>
> 6. DataOutputStream#writeBytes(String). First of all read the JavaDoc
> comment for it, which explains it all:
>      Writes out the string to the underlying output stream as a
>      sequence of bytes. Each character in the string is written out, in
>      sequence, by discarding its high eight bits. If no exception is
>      thrown, the counter <code>written</code> is incremented by the
>      length of <code>s</code>
> This restores the earlier UTF-8 encoding correctly.
>
> The topic has been discussed and mentioned already in
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht
> ml
> https://bugs.openjdk.java.net/browse/JDK-6202130
>
> String(byte[],int,int,int) works "well" or "well enough" only together
> with DataOutputStream#writeBytes(String). When removing
> String(byte[],int,int,int) from Manifest and Attributes because
> deprecated, it makes no sense to keep using
> DataOutputStream#writeBytes(String) either.
>
> For the same reason as String#String(byte[],int,int,int) has been
> deprecated, I suggest to also deprecate
> java.io.DataOutput#writeBytes(String) as a separate issue. This might
> relate to https://bugs.openjdk.java.net/browse/JDK-6400767 but that one
> came to a different conclusion some ten years ago.
>
> I preferred to stick with the DataOutputStream even though not strictly
> necessary any more. It is and has been in the API of Attributes
> (unfortunately not private) and should better not be removed by
> changing the parameter type. Same for Manifest#make72Safe(StringBuffer)
> which I deprecated rather than having removed. Someone could have
> extended a class from Manifest and use such a method and when changing
> the signature it could no longer even compile in a far-fetched case.
>
> LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and
> LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to
> invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n " with
> the idea to slightly improve performance this way. I figured it does
> not need JavaDoc comments but would be happy to add them if desired.
>
> I removed "XXX Need to handle UTF8 values." from Manifest#read after
> adding a test for it in ValueUtf8Coding. This change and test also
> relate to bug 6202130 but does not solve that one completely.
> ValueUtf8Coding demonstrates that Manifest can read UTF-8 encoded
> values which is a necessary test case to cover for this patch here.
> ValueUtf8Coding is the same test as already submitted and suggested
> earlier. See
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa
> d.html#55848
>
> Indentation in Attributes#write(DataOutputStream) was five spaces on
> most lines. I fixed indentation only on the lines changed anyway.
>
> I replaced String#String(byte[],int,int,String) with
> String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8)
> which as a difference does not declare to throw a
> java.io.UnsupportedEncodingException. That also replaced "UTF8" as a
> charset name which I would consider not optimal regarding
> sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName().
>
> In my opinion there is still some duplicated or at least very similar
> code in Manifest#write, Attributes#writeMain, and Attributes#write but
> I preferred to change less rather than more and not to further refactor
> and re-combine it.
>
> In EmptyKeysAndValues and NullKeysAndValues tests I tried to
> demonstrate that the changed implementation does not change behaviour
> also in edge cases. I would have expected not having to test all these
> cases but then I realized it was possible to test and is therefore
> possible in a real use case as well however far-fetched. At least the
>
>      if (value != null) {
>
> lines (three times) most obviously demand to test the null value cases.
>
> I'm looking curiously forward to any kind of feedback or opinion.
>
> Philipp