RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes
Roger Riggs
Roger.Riggs at oracle.com
Wed Dec 12 15:52:24 UTC 2018
Hi Phillip,
Sorry, got busy...
Can you rebase the patch to the current repo, it did not apply cleanly.
I know you are focused on removing the deprecation, but a few localized
improvements
would be good.
In Attributes.java : 346-337, it uses StringBuffer, please change it to
StringBuilder.
Unless thread safety is an issue, StringBuilder is recommended, it is
slightly more efficient since it does no synchronization.
- And the StringBuilder should be sized when it is created, to avoid
needing to resize it later.
Using a single StringBuilder all the entries, using setLength(0),
would save allocating
for each entry.
- check the indentation @line 308-20 and 311.
In Manifest.java:
- write72 method !String.isEmpty() is preferred over the .length() > 0.
- if the line is empty, should it write the LINE_BREAK_BYTES?
A blank line in the manifest may be seen as significant.
- in the write method: Change StringBuffer to StringBuilder
- The javadoc links to MANIFEST_VERSION and SIGNATURE_VERSION should
use "#".
* {@link Attributes.Name#MANIFEST_VERSION} or
* {@link Attributes.Name#SIGNATURE_VERSION} must be set in
Thanks, Roger
On 12/04/2018 03:34 AM, Philipp Kunz wrote:
> Hi Roger,
>
> I'm afraid the previous patch missed the tests, should be included
> this time.
>
> The intention of the patch is to solve only bug 8066619 about
> deprecation. I sincerely hope the changes are neutral.
>
> The new ValueUtf8Coding test heavily coincides/overlaps with 6202130
> which is why I mentioned it. I'm however not satisfied that that test
> alone also completely solves 6202130 because 6202130 has or might have
> implications with breaking characters encoded in UTF-8 with more than
> one bytes across a line break onto a continuation line which is not
> part of the current patch proposed for 8066619. At some point I
> proposed ValueUtf8Coding with only removing the comments from the
> implementation in
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056166.html
> but I have changed my mind since then and think now 6202130 should
> also change the implementation not to break lines inside of multi-byte
> characters which is not part of the current patch and is probably
> easier after the current patch if necessary at all. Both 6202130 and
> the current patch for 8066619 here touch the UTF-8 coding of manifests
> and which ever is solved first should add a corresponding test because
> no such test exists yet I believe. Worth to mention are
> test/jdk/tools/launcher/DiacriticTest.java and
> test/jdk/tools/launcher/UnicodeTest.java both of which test the JVM
> launch and have a somewhat different purpose. I haven't found any
> other test for the specifically changed lines of code apart from
> probably many tests that use manifests indirectly in some form.
>
> Regards,
> Philipp
>
>
> On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
>> Hi Phillip,
>>
>> The amount detail obscures the general purpose.
>> And there appears to be more than 1.
>> The Jira issue IDs mentioned are 8066619 and 6202130.
>>
>> Is this functionally neutral and only fixes the deprecations?
>>
>> There is a mention that a test is needed for multi-byte chars, but a test
>> is not included. Is there an existing test for that?
>>
>> Its probably best to identify the main functional improvement (multi-byte)
>> and fix the deprecation as a side effect.
>>
>> Thanks for digging through the issues and the explanations;
>> it will take a bit of study to unravel and understand everything in this
>> changeset.
>>
>> Regards, Roger
>>
>>
>> On 12/01/2018 06:49 AM, Philipp Kunz wrote:
>>> Find the proposed patch attached. Some comments and explanations,
>>> here: There is a quite interesting implementation in Manifest and
>>> Attributes worth quite some explanation. The way it used to work
>>> before was: 1. put manifest header name, colon and space into a
>>> StringBuffer -> the buffer now contains a string of characters each
>>> high-byte of which is zero as explained later why this is important.
>>> the high-bytes are zero because the set of allowed characters is
>>> very limited to ":", " ", "a" - "z", "A" - "Z", "0" - "9", "_", and
>>> "-" according to Attributes.Name#hash(String) so far with only the
>>> name and the separator and yet without the values. 2. if the value
>>> is not null, encode it in UTF-8 into a byte array and instantiate a
>>> String with it using deprecated String#String(byte[],int,int,int)
>>> resulting in a String with the same length as the byte array before
>>> holding one byte in each character's low-byte. This makes a
>>> difference for characters encoded with more than one byte in UTF-8.
>>> The new String is potentially longer than the original value. 3. if
>>> the value is not null, append value to buffer. The one UTF-8 encoded
>>> byte per character from the appended string is preserved also in the
>>> buffer along with the previous buffer contents. 3alt. if the value
>>> is null, add "null" to the buffer. See
>>> java.lang.AbstractStringBuilder#append(String). Neither of the
>>> characters of "null" has a non-zero high-byte encoded as UTF-16
>>> chars. 4. make72Safe inserts line breaks with continuation spaces.
>>> Note that the buffer here contains only one byte per character
>>> because all high- bytes are still zero so that line.length() and
>>> line.insert(index, ...) effectively operate with byte offsets and
>>> not characters. 5. buffer.toString() 6.
>>> DataOutputStream#writeBytes(String). First of all read the JavaDoc
>>> comment for it, which explains it all: Writes out the string to the
>>> underlying output stream as a sequence of bytes. Each character
>>> in the string is written out, in sequence, by discarding its
>>> high eight bits. If no exception is thrown, the counter
>>> <code>written</code> is incremented by the length of
>>> <code>s</code> This restores the earlier UTF-8 encoding correctly.
>>> The topic has been discussed and mentioned already in
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht
>>> ml https://bugs.openjdk.java.net/browse/JDK-6202130
>>> String(byte[],int,int,int) works "well" or "well enough" only
>>> together with DataOutputStream#writeBytes(String). When removing
>>> String(byte[],int,int,int) from Manifest and Attributes because
>>> deprecated, it makes no sense to keep using
>>> DataOutputStream#writeBytes(String) either. For the same reason
>>> as String#String(byte[],int,int,int) has been deprecated, I suggest
>>> to also deprecate java.io.DataOutput#writeBytes(String) as a
>>> separate issue. This might relate to
>>> https://bugs.openjdk.java.net/browse/JDK-6400767 but that one came
>>> to a different conclusion some ten years ago. I preferred to stick
>>> with the DataOutputStream even though not strictly necessary any
>>> more. It is and has been in the API of Attributes (unfortunately not
>>> private) and should better not be removed by changing the parameter
>>> type. Same for Manifest#make72Safe(StringBuffer) which I deprecated
>>> rather than having removed. Someone could have extended a class from
>>> Manifest and use such a method and when changing the signature it
>>> could no longer even compile in a far-fetched case. LINE_BREAK,
>>> CONTINUATION_SPACE, LINE_BREAK_BYTES, and
>>> LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to
>>> invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n "
>>> with the idea to slightly improve performance this way. I figured it
>>> does not need JavaDoc comments but would be happy to add them if
>>> desired. I removed "XXX Need to handle UTF8 values." from
>>> Manifest#read after adding a test for it in ValueUtf8Coding. This
>>> change and test also relate to bug 6202130 but does not solve that
>>> one completely. ValueUtf8Coding demonstrates that Manifest can read
>>> UTF-8 encoded values which is a necessary test case to cover for
>>> this patch here. ValueUtf8Coding is the same test as already
>>> submitted and suggested earlier. See
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa
>>> d.html#55848 Indentation in Attributes#write(DataOutputStream) was
>>> five spaces on most lines. I fixed indentation only on the lines
>>> changed anyway. I replaced String#String(byte[],int,int,String) with
>>> String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8)
>>> which as a difference does not declare to throw a
>>> java.io.UnsupportedEncodingException. That also replaced "UTF8" as a
>>> charset name which I would consider not optimal regarding
>>> sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName(). In
>>> my opinion there is still some duplicated or at least very similar
>>> code in Manifest#write, Attributes#writeMain, and Attributes#write
>>> but I preferred to change less rather than more and not to further
>>> refactor and re-combine it. In EmptyKeysAndValues and
>>> NullKeysAndValues tests I tried to demonstrate that the changed
>>> implementation does not change behaviour also in edge cases. I would
>>> have expected not having to test all these cases but then I realized
>>> it was possible to test and is therefore possible in a real use case
>>> as well however far-fetched. At least the if (value != null) { lines
>>> (three times) most obviously demand to test the null value cases.
>>> I'm looking curiously forward to any kind of feedback or opinion.
>>> Philipp
>>
>>
More information about the core-libs-dev
mailing list