RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes

Tue Dec 4 08:34:47 UTC 2018

Hi Roger,
I'm afraid the previous patch missed the tests, should be included this
time.
The intention of the patch is to solve only bug 8066619 about
deprecation. I sincerely hope the changes are neutral.
The new ValueUtf8Coding test heavily coincides/overlaps with 6202130
which is why I mentioned it. I'm however not satisfied that that test
alone also completely solves 6202130 because 6202130 has or might have
implications with breaking characters encoded in UTF-8 with more than
one bytes across a line break onto a continuation line which is not
part of the current patch proposed for 8066619. At some point I
proposed ValueUtf8Coding with only removing the comments from the
implementation in http://mail.openjdk.java.net/pipermail/core-libs-dev/
2018-October/056166.html but I have changed my mind since then and
think now 6202130 should also change the implementation not to break
lines inside of multi-byte characters which is not part of the current
patch and is probably easier after the current patch if necessary at
all. Both 6202130 and the current patch for 8066619 here touch the UTF-
8 coding of manifests and which ever is solved first should add a
corresponding test because no such test exists yet I believe. Worth to
mention are test/jdk/tools/launcher/DiacriticTest.java and
test/jdk/tools/launcher/UnicodeTest.java both of which test the JVM
launch and have a somewhat different purpose. I haven't found any other
test for the specifically changed lines of code apart from probably
many tests that use manifests indirectly in some form. 
Regards,Philipp

On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
> Hi Phillip,
> 
> The amount detail obscures the general purpose.
> And there appears to be more than 1.
> The Jira issue IDs mentioned are 8066619 and 6202130.
> 
> Is this functionally neutral and only fixes the deprecations?
> 
> There is a mention that a test is needed for multi-byte chars, but a
> test
> is not included.  Is there an existing test for that?
> 
> Its probably best to identify the main functional improvement (multi-
> byte)
> and fix the deprecation as a side effect.
> 
> Thanks for digging through the issues and the explanations;
> it will take a bit of study to unravel and understand everything in
> this 
> changeset.
> 
> Regards, Roger
> 
> 
> On 12/01/2018 06:49 AM, Philipp Kunz wrote:
> > Find the proposed patch attached. Some comments and explanations,
> > here:
> > 
> > There is a quite interesting implementation in Manifest and
> > Attributes
> > worth quite some explanation. The way it used to work before was:
> > 
> > 1. put manifest header name, colon and space into a StringBuffer
> > -> the buffer now contains a string of characters each high-byte of
> > which is zero as explained later why this is important. the high-
> > bytes
> > are zero because the set of allowed characters is very limited to
> > ":",
> > " ", "a" - "z", "A" - "Z", "0" - "9", "_", and "-" according to
> > Attributes.Name#hash(String) so far with only the name and the
> > separator and yet without the values.
> > 
> > 2. if the value is not null, encode it in UTF-8 into a byte array
> > and
> > instantiate a String with it using deprecated
> > String#String(byte[],int,int,int) resulting in a String with the
> > same
> > length as the byte array before holding one byte in each
> > character's
> > low-byte. This makes a difference for characters encoded with more
> > than
> > one byte in UTF-8. The new String is potentially longer than the
> > original value.
> > 
> > 3. if the value is not null, append value to buffer. The one UTF-8
> > encoded byte per character from the appended string is preserved
> > also
> > in the buffer along with the previous buffer contents.
> > 
> > 3alt. if the value is null, add "null" to the buffer. See
> > java.lang.AbstractStringBuilder#append(String). Neither of the
> > characters of "null" has a non-zero high-byte encoded as UTF-16
> > chars.
> > 
> > 4. make72Safe inserts line breaks with continuation spaces. Note
> > that
> > the buffer here contains only one byte per character because all
> > high-
> > bytes are still zero so that line.length() and line.insert(index,
> > ...)
> > effectively operate with byte offsets and not characters.
> > 
> > 5. buffer.toString()
> > 
> > 6. DataOutputStream#writeBytes(String). First of all read the
> > JavaDoc
> > comment for it, which explains it all:
> >      Writes out the string to the underlying output stream as a
> >      sequence of bytes. Each character in the string is written
> > out, in
> >      sequence, by discarding its high eight bits. If no exception
> > is
> >      thrown, the counter <code>written</code> is incremented by the
> >      length of <code>s</code>
> > This restores the earlier UTF-8 encoding correctly.
> > 
> > The topic has been discussed and mentioned already in
> > http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/05294
> > 6.ht
> > ml
> > https://bugs.openjdk.java.net/browse/JDK-6202130
> > 
> > String(byte[],int,int,int) works "well" or "well enough" only
> > together
> > with DataOutputStream#writeBytes(String). When removing
> > String(byte[],int,int,int) from Manifest and Attributes because
> > deprecated, it makes no sense to keep using
> > DataOutputStream#writeBytes(String) either.
> > 
> > For the same reason as String#String(byte[],int,int,int) has been
> > deprecated, I suggest to also deprecate
> > java.io.DataOutput#writeBytes(String) as a separate issue. This
> > might
> > relate to https://bugs.openjdk.java.net/browse/JDK-6400767 but that
> > one
> > came to a different conclusion some ten years ago.
> > 
> > I preferred to stick with the DataOutputStream even though not
> > strictly
> > necessary any more. It is and has been in the API of Attributes
> > (unfortunately not private) and should better not be removed by
> > changing the parameter type. Same for
> > Manifest#make72Safe(StringBuffer)
> > which I deprecated rather than having removed. Someone could have
> > extended a class from Manifest and use such a method and when
> > changing
> > the signature it could no longer even compile in a far-fetched
> > case.
> > 
> > LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and
> > LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to
> > invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n "
> > with
> > the idea to slightly improve performance this way. I figured it
> > does
> > not need JavaDoc comments but would be happy to add them if
> > desired.
> > 
> > I removed "XXX Need to handle UTF8 values." from Manifest#read
> > after
> > adding a test for it in ValueUtf8Coding. This change and test also
> > relate to bug 6202130 but does not solve that one completely.
> > ValueUtf8Coding demonstrates that Manifest can read UTF-8 encoded
> > values which is a necessary test case to cover for this patch here.
> > ValueUtf8Coding is the same test as already submitted and suggested
> > earlier. See
> > http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/t
> > hrea
> > d.html#55848
> > 
> > Indentation in Attributes#write(DataOutputStream) was five spaces
> > on
> > most lines. I fixed indentation only on the lines changed anyway.
> > 
> > I replaced String#String(byte[],int,int,String) with
> > String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_
> > 8)
> > which as a difference does not declare to throw a
> > java.io.UnsupportedEncodingException. That also replaced "UTF8" as
> > a
> > charset name which I would consider not optimal regarding
> > sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName().
> > 
> > In my opinion there is still some duplicated or at least very
> > similar
> > code in Manifest#write, Attributes#writeMain, and Attributes#write
> > but
> > I preferred to change less rather than more and not to further
> > refactor
> > and re-combine it.
> > 
> > In EmptyKeysAndValues and NullKeysAndValues tests I tried to
> > demonstrate that the changed implementation does not change
> > behaviour
> > also in edge cases. I would have expected not having to test all
> > these
> > cases but then I realized it was possible to test and is therefore
> > possible in a real use case as well however far-fetched. At least
> > the
> > 
> >      if (value != null) {
> > 
> > lines (three times) most obviously demand to test the null value
> > cases.
> > 
> > I'm looking curiously forward to any kind of feedback or opinion.
> > 
> > Philipp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8066619.patch
Type: text/x-patch
Size: 27217 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20181204/a5d7e6e3/8066619-0001.patch>