RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes

Roger Riggs Roger.Riggs at oracle.com
Wed Dec 12 15:52:24 UTC 2018


Hi Phillip,

Sorry, got busy...

Can you rebase the patch to the current repo, it did not apply cleanly.

I know you are focused on removing the deprecation, but a few localized 
improvements
would be good.

In Attributes.java : 346-337, it uses StringBuffer, please change it to 
StringBuilder.
   Unless thread safety is an issue, StringBuilder is recommended, it is
   slightly more efficient since it does no synchronization.

- And the StringBuilder should be sized when it is created, to avoid 
needing to resize it later.
   Using a single StringBuilder all the entries, using setLength(0), 
would save allocating
   for each entry.

  - check the indentation @line 308-20 and 311.


In Manifest.java:
  - write72 method  !String.isEmpty() is preferred over the .length() > 0.
  - if the line is empty, should it write the LINE_BREAK_BYTES?
    A blank line in the manifest may be seen as significant.

  - in the write method: Change StringBuffer to StringBuilder

  - The javadoc links to MANIFEST_VERSION and SIGNATURE_VERSION should 
use "#".
      * {@link Attributes.Name#MANIFEST_VERSION} or
      * {@link Attributes.Name#SIGNATURE_VERSION} must be set in

Thanks, Roger


On 12/04/2018 03:34 AM, Philipp Kunz wrote:
> Hi Roger,
>
> I'm afraid the previous patch missed the tests, should be included 
> this time.
>
> The intention of the patch is to solve only bug 8066619 about 
> deprecation. I sincerely hope the changes are neutral.
>
> The new ValueUtf8Coding test heavily coincides/overlaps with 6202130 
> which is why I mentioned it. I'm however not satisfied that that test 
> alone also completely solves 6202130 because 6202130 has or might have 
> implications with breaking characters encoded in UTF-8 with more than 
> one bytes across a line break onto a continuation line which is not 
> part of the current patch proposed for 8066619. At some point I 
> proposed ValueUtf8Coding with only removing the comments from the 
> implementation in 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056166.html 
> but I have changed my mind since then and think now 6202130 should 
> also change the implementation not to break lines inside of multi-byte 
> characters which is not part of the current patch and is probably 
> easier after the current patch if necessary at all. Both 6202130 and 
> the current patch for 8066619 here touch the UTF-8 coding of manifests 
> and which ever is solved first should add a corresponding test because 
> no such test exists yet I believe. Worth to mention are 
> test/jdk/tools/launcher/DiacriticTest.java and 
> test/jdk/tools/launcher/UnicodeTest.java both of which test the JVM 
> launch and have a somewhat different purpose. I haven't found any 
> other test for the specifically changed lines of code apart from 
> probably many tests that use manifests indirectly in some form.
>
> Regards,
> Philipp
>
>
> On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
>> Hi Phillip,
>>
>> The amount detail obscures the general purpose.
>> And there appears to be more than 1.
>> The Jira issue IDs mentioned are 8066619 and 6202130.
>>
>> Is this functionally neutral and only fixes the deprecations?
>>
>> There is a mention that a test is needed for multi-byte chars, but a test
>> is not included.  Is there an existing test for that?
>>
>> Its probably best to identify the main functional improvement (multi-byte)
>> and fix the deprecation as a side effect.
>>
>> Thanks for digging through the issues and the explanations;
>> it will take a bit of study to unravel and understand everything in this
>> changeset.
>>
>> Regards, Roger
>>
>>
>> On 12/01/2018 06:49 AM, Philipp Kunz wrote:
>>> Find the proposed patch attached. Some comments and explanations, 
>>> here: There is a quite interesting implementation in Manifest and 
>>> Attributes worth quite some explanation. The way it used to work 
>>> before was: 1. put manifest header name, colon and space into a 
>>> StringBuffer -> the buffer now contains a string of characters each 
>>> high-byte of which is zero as explained later why this is important. 
>>> the high-bytes are zero because the set of allowed characters is 
>>> very limited to ":", " ", "a" - "z", "A" - "Z", "0" - "9", "_", and 
>>> "-" according to Attributes.Name#hash(String) so far with only the 
>>> name and the separator and yet without the values. 2. if the value 
>>> is not null, encode it in UTF-8 into a byte array and instantiate a 
>>> String with it using deprecated String#String(byte[],int,int,int) 
>>> resulting in a String with the same length as the byte array before 
>>> holding one byte in each character's low-byte. This makes a 
>>> difference for characters encoded with more than one byte in UTF-8. 
>>> The new String is potentially longer than the original value. 3. if 
>>> the value is not null, append value to buffer. The one UTF-8 encoded 
>>> byte per character from the appended string is preserved also in the 
>>> buffer along with the previous buffer contents. 3alt. if the value 
>>> is null, add "null" to the buffer. See 
>>> java.lang.AbstractStringBuilder#append(String). Neither of the 
>>> characters of "null" has a non-zero high-byte encoded as UTF-16 
>>> chars. 4. make72Safe inserts line breaks with continuation spaces. 
>>> Note that the buffer here contains only one byte per character 
>>> because all high- bytes are still zero so that line.length() and 
>>> line.insert(index, ...) effectively operate with byte offsets and 
>>> not characters. 5. buffer.toString() 6. 
>>> DataOutputStream#writeBytes(String). First of all read the JavaDoc 
>>> comment for it, which explains it all: Writes out the string to the 
>>> underlying output stream as a     sequence of bytes. Each character 
>>> in the string is written out, in     sequence, by discarding its 
>>> high eight bits. If no exception is     thrown, the counter 
>>> <code>written</code> is incremented by the     length of 
>>> <code>s</code> This restores the earlier UTF-8 encoding correctly. 
>>> The topic has been discussed and mentioned already in 
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht 
>>> ml https://bugs.openjdk.java.net/browse/JDK-6202130 
>>> String(byte[],int,int,int) works "well" or "well enough" only 
>>> together with DataOutputStream#writeBytes(String). When removing 
>>> String(byte[],int,int,int) from Manifest and Attributes because 
>>> deprecated, it makes no sense to keep using 
>>> DataOutputStream#writeBytes(String) either. For the same reason 
>>> as String#String(byte[],int,int,int) has been deprecated, I suggest 
>>> to also deprecate java.io.DataOutput#writeBytes(String) as a 
>>> separate issue. This might relate to 
>>> https://bugs.openjdk.java.net/browse/JDK-6400767 but that one came 
>>> to a different conclusion some ten years ago. I preferred to stick 
>>> with the DataOutputStream even though not strictly necessary any 
>>> more. It is and has been in the API of Attributes (unfortunately not 
>>> private) and should better not be removed by changing the parameter 
>>> type. Same for Manifest#make72Safe(StringBuffer) which I deprecated 
>>> rather than having removed. Someone could have extended a class from 
>>> Manifest and use such a method and when changing the signature it 
>>> could no longer even compile in a far-fetched case. LINE_BREAK, 
>>> CONTINUATION_SPACE, LINE_BREAK_BYTES, and 
>>> LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to 
>>> invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n " 
>>> with the idea to slightly improve performance this way. I figured it 
>>> does not need JavaDoc comments but would be happy to add them if 
>>> desired. I removed "XXX Need to handle UTF8 values." from 
>>> Manifest#read after adding a test for it in ValueUtf8Coding. This 
>>> change and test also relate to bug 6202130 but does not solve that 
>>> one completely. ValueUtf8Coding demonstrates that Manifest can read 
>>> UTF-8 encoded values which is a necessary test case to cover for 
>>> this patch here. ValueUtf8Coding is the same test as already 
>>> submitted and suggested earlier. See 
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa 
>>> d.html#55848 Indentation in Attributes#write(DataOutputStream) was 
>>> five spaces on most lines. I fixed indentation only on the lines 
>>> changed anyway. I replaced String#String(byte[],int,int,String) with 
>>> String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8) 
>>> which as a difference does not declare to throw a 
>>> java.io.UnsupportedEncodingException. That also replaced "UTF8" as a 
>>> charset name which I would consider not optimal regarding 
>>> sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName(). In 
>>> my opinion there is still some duplicated or at least very similar 
>>> code in Manifest#write, Attributes#writeMain, and Attributes#write 
>>> but I preferred to change less rather than more and not to further 
>>> refactor and re-combine it. In EmptyKeysAndValues and 
>>> NullKeysAndValues tests I tried to demonstrate that the changed 
>>> implementation does not change behaviour also in edge cases. I would 
>>> have expected not having to test all these cases but then I realized 
>>> it was possible to test and is therefore possible in a real use case 
>>> as well however far-fetched. At least the if (value != null) { lines 
>>> (three times) most obviously demand to test the null value cases. 
>>> I'm looking curiously forward to any kind of feedback or opinion. 
>>> Philipp 
>>
>>



More information about the core-libs-dev mailing list