RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes

Philipp Kunz philipp.kunz at paratix.ch
Sat Dec 1 11:49:15 UTC 2018


Find the proposed patch attached. Some comments and explanations, here:

There is a quite interesting implementation in Manifest and Attributes
worth quite some explanation. The way it used to work before was:

1. put manifest header name, colon and space into a StringBuffer
-> the buffer now contains a string of characters each high-byte of
which is zero as explained later why this is important. the high-bytes
are zero because the set of allowed characters is very limited to ":",
" ", "a" - "z", "A" - "Z", "0" - "9", "_", and "-" according to
Attributes.Name#hash(String) so far with only the name and the
separator and yet without the values.

2. if the value is not null, encode it in UTF-8 into a byte array and
instantiate a String with it using deprecated
String#String(byte[],int,int,int) resulting in a String with the same
length as the byte array before holding one byte in each character's
low-byte. This makes a difference for characters encoded with more than
one byte in UTF-8. The new String is potentially longer than the
original value.

3. if the value is not null, append value to buffer. The one UTF-8
encoded byte per character from the appended string is preserved also
in the buffer along with the previous buffer contents.

3alt. if the value is null, add "null" to the buffer. See
java.lang.AbstractStringBuilder#append(String). Neither of the
characters of "null" has a non-zero high-byte encoded as UTF-16 chars.

4. make72Safe inserts line breaks with continuation spaces. Note that
the buffer here contains only one byte per character because all high-
bytes are still zero so that line.length() and line.insert(index, ...)
effectively operate with byte offsets and not characters.

5. buffer.toString()

6. DataOutputStream#writeBytes(String). First of all read the JavaDoc
comment for it, which explains it all:
    Writes out the string to the underlying output stream as a
    sequence of bytes. Each character in the string is written out, in
    sequence, by discarding its high eight bits. If no exception is
    thrown, the counter <code>written</code> is incremented by the
    length of <code>s</code>
This restores the earlier UTF-8 encoding correctly.

The topic has been discussed and mentioned already in
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht
ml
https://bugs.openjdk.java.net/browse/JDK-6202130

String(byte[],int,int,int) works "well" or "well enough" only together
with DataOutputStream#writeBytes(String). When removing
String(byte[],int,int,int) from Manifest and Attributes because
deprecated, it makes no sense to keep using
DataOutputStream#writeBytes(String) either.

For the same reason as String#String(byte[],int,int,int) has been
deprecated, I suggest to also deprecate
java.io.DataOutput#writeBytes(String) as a separate issue. This might
relate to https://bugs.openjdk.java.net/browse/JDK-6400767 but that one
came to a different conclusion some ten years ago.

I preferred to stick with the DataOutputStream even though not strictly
necessary any more. It is and has been in the API of Attributes
(unfortunately not private) and should better not be removed by
changing the parameter type. Same for Manifest#make72Safe(StringBuffer)
which I deprecated rather than having removed. Someone could have
extended a class from Manifest and use such a method and when changing
the signature it could no longer even compile in a far-fetched case.

LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and
LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to
invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n " with
the idea to slightly improve performance this way. I figured it does
not need JavaDoc comments but would be happy to add them if desired.

I removed "XXX Need to handle UTF8 values." from Manifest#read after
adding a test for it in ValueUtf8Coding. This change and test also
relate to bug 6202130 but does not solve that one completely.
ValueUtf8Coding demonstrates that Manifest can read UTF-8 encoded
values which is a necessary test case to cover for this patch here.
ValueUtf8Coding is the same test as already submitted and suggested
earlier. See
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa
d.html#55848

Indentation in Attributes#write(DataOutputStream) was five spaces on
most lines. I fixed indentation only on the lines changed anyway.

I replaced String#String(byte[],int,int,String) with
String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8)
which as a difference does not declare to throw a
java.io.UnsupportedEncodingException. That also replaced "UTF8" as a
charset name which I would consider not optimal regarding
sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName().

In my opinion there is still some duplicated or at least very similar
code in Manifest#write, Attributes#writeMain, and Attributes#write but
I preferred to change less rather than more and not to further refactor
and re-combine it.

In EmptyKeysAndValues and NullKeysAndValues tests I tried to
demonstrate that the changed implementation does not change behaviour
also in edge cases. I would have expected not having to test all these
cases but then I realized it was possible to test and is therefore
possible in a real use case as well however far-fetched. At least the

    if (value != null) {

lines (three times) most obviously demand to test the null value cases.

I'm looking curiously forward to any kind of feedback or opinion.

Philipp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8066619.patch
Type: text/x-patch
Size: 8905 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20181201/2d50ce27/8066619.patch>


More information about the core-libs-dev mailing list