RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes

Mon Dec 17 17:12:20 UTC 2018

Hi Philipp,

Manifest.java:

  - Line 258: creating a new array for two characters on each call isn't 
as efficient as:
     out.write('\r');
     out.write('\n').

The new test that need internal access can gain that access by adding:
    @modules java.base/java.util.jar:+open

That instructs testng to add the correct command line switches.
Then you can remove --illegal-access=warn and the tests will work.

In the test ValueUtf8Coding, just a mention of a method to create a 
string with repeats.
      "-".repeat(80);

On 12/17/2018 01:42 AM, Philipp Kunz wrote:
> Hi Roger,
>
> Thank you very much for your review and the feedback. Please find my 
> comments below and a new patch attached.
>
> Philipp
>
>
> On Wed, 2018-12-12 at 10:52 -0500, Roger Riggs wrote:
>> Hi Phillip,
>>
>> Sorry, got busy...
>>
>> Can you rebase the patch to the current repo, it did not apply cleanly.
>>
>> I know you are focused on removing the deprecation, but a few 
>> localized improvements
>> would be good.
>>
>> In Attributes.java : 346-337, it uses StringBuffer, please change it 
>> to StringBuilder.
>>   Unless thread safety is an issue, StringBuilder is recommended, it is
>>   slightly more efficient since it does no synchronization.
> I did deliberately not touch the StringBuffer in the previous patch 
> but fully agree now I know it has a chance to be accepted. Would you 
> accept to replace StringBuffer with plain string concatenation after 
> http://openjdk.java.net/jeps/280 which was not in place at the time 
> those StringBuffers were introduced?
Using "+" concat is fine.
>>
>> - And the StringBuilder should be sized when it is created, to avoid 
>> needing to resize it later.
>>   Using a single StringBuilder all the entries, using setLength(0), 
>> would save allocating
>>   for each entry.
> Jep 280 would also avoid having to size the buffers far as I understand.
>>
>>  - check the indentation @line 308-20 and 311.
> The indentation was weird, 5 instead of 4 spaces on some lines and I 
> re-indented only the lines I touched anyway in the previous patch. 
> Now, some lines appear changed only due to the indentation. After the 
> StringBuffer removal only two of them are left in the current patch 
> and certainly don't add significantly many unrelated changed lines now 
> any more.
ok
>>
>> In Manifest.java:
>>  - write72 method  !String.isEmpty() is preferred over the .length() > 0.
> ok
>>  - if the line is empty, should it write the LINE_BREAK_BYTES?
>>    A blank line in the manifest may be seen as significant.
> Before the patch, a line break was always added to the end of the 
> StringBuffer after passing to make72Safe and before writing it. Now 
> with the previous patch, write72 added it. Altogether makes no 
> difference. But after reconsidering your point, I found a clearer 
> approach, I hope, than passing an empty string for having a line break 
> requested to be output which now also seems to me having been not the 
> most obvious way and more like a kind of a hack before. In the course 
> of that change I also renamed write72 to println and println72 and 
> also added a test for it. Hope you also like it better that way.
  - Line 257:  println() always makes me think of the system specific 
line separator.
      I'd name it writeln() and write72ln.  I think, write is clearer 
that it is bytes being written with
      no charset implications.
>>
>>  - in the write method: Change StringBuffer to StringBuilder
>>
>>  - The javadoc links to MANIFEST_VERSION and SIGNATURE_VERSION should 
>> use "#".
>>      * {@link Attributes.Name#MANIFEST_VERSION} or
>>      * {@link Attributes.Name#SIGNATURE_VERSION} must be set in
> removed it again because it applies more to bug 8196371 or 6910466
ok

Thanks, Roger
>>
>> Thanks, Roger
>>
>>
>> On 12/04/2018 03:34 AM, Philipp Kunz wrote:
>>> Hi Roger,
>>>
>>> I'm afraid the previous patch missed the tests, should be included 
>>> this time.
>>>
>>> The intention of the patch is to solve only bug 8066619 about 
>>> deprecation. I sincerely hope the changes are neutral.
>>>
>>> The new ValueUtf8Coding test heavily coincides/overlaps with 6202130 
>>> which is why I mentioned it. I'm however not satisfied that that 
>>> test alone also completely solves 6202130 because 6202130 has or 
>>> might have implications with breaking characters encoded in UTF-8 
>>> with more than one bytes across a line break onto a continuation 
>>> line which is not part of the current patch proposed for 8066619. At 
>>> some point I proposed ValueUtf8Coding with only removing the 
>>> comments from the implementation in 
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056166.html 
>>> but I have changed my mind since then and think now 6202130 should 
>>> also change the implementation not to break lines inside of 
>>> multi-byte characters which is not part of the current patch and is 
>>> probably easier after the current patch if necessary at all. Both 
>>> 6202130 and the current patch for 8066619 here touch the UTF-8 
>>> coding of manifests and which ever is solved first should add a 
>>> corresponding test because no such test exists yet I believe. Worth 
>>> to mention are test/jdk/tools/launcher/DiacriticTest.java and 
>>> test/jdk/tools/launcher/UnicodeTest.java both of which test the JVM 
>>> launch and have a somewhat different purpose. I haven't found any 
>>> other test for the specifically changed lines of code apart from 
>>> probably many tests that use manifests indirectly in some form.
>>>
>>> Regards,
>>> Philipp
>>>
>>>
>>> On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
>>>> Hi Phillip,
>>>>
>>>> The amount detail obscures the general purpose.
>>>> And there appears to be more than 1.
>>>> The Jira issue IDs mentioned are 8066619 and 6202130.
>>>>
>>>> Is this functionally neutral and only fixes the deprecations?
>>>>
>>>> There is a mention that a test is needed for multi-byte chars, but a test
>>>> is not included.  Is there an existing test for that?
>>>>
>>>> Its probably best to identify the main functional improvement (multi-byte)
>>>> and fix the deprecation as a side effect.
>>>>
>>>> Thanks for digging through the issues and the explanations;
>>>> it will take a bit of study to unravel and understand everything in this
>>>> changeset.
>>>>
>>>> Regards, Roger
>>>>
>>>>
>>>> On 12/01/2018 06:49 AM, Philipp Kunz wrote:
>>>>> Find the proposed patch attached. Some comments and explanations, 
>>>>> here: There is a quite interesting implementation in Manifest and 
>>>>> Attributes worth quite some explanation. The way it used to work 
>>>>> before was: 1. put manifest header name, colon and space into a 
>>>>> StringBuffer -> the buffer now contains a string of characters 
>>>>> each high-byte of which is zero as explained later why this is 
>>>>> important. the high-bytes are zero because the set of allowed 
>>>>> characters is very limited to ":", " ", "a" - "z", "A" - "Z", "0" 
>>>>> - "9", "_", and "-" according to Attributes.Name#hash(String) so 
>>>>> far with only the name and the separator and yet without the 
>>>>> values. 2. if the value is not null, encode it in UTF-8 into a 
>>>>> byte array and instantiate a String with it using deprecated 
>>>>> String#String(byte[],int,int,int) resulting in a String with the 
>>>>> same length as the byte array before holding one byte in each 
>>>>> character's low-byte. This makes a difference for characters 
>>>>> encoded with more than one byte in UTF-8. The new String is 
>>>>> potentially longer than the original value. 3. if the value is not 
>>>>> null, append value to buffer. The one UTF-8 encoded byte per 
>>>>> character from the appended string is preserved also in the buffer 
>>>>> along with the previous buffer contents. 3alt. if the value is 
>>>>> null, add "null" to the buffer. See 
>>>>> java.lang.AbstractStringBuilder#append(String). Neither of the 
>>>>> characters of "null" has a non-zero high-byte encoded as UTF-16 
>>>>> chars. 4. make72Safe inserts line breaks with continuation spaces. 
>>>>> Note that the buffer here contains only one byte per character 
>>>>> because all high- bytes are still zero so that line.length() and 
>>>>> line.insert(index, ...) effectively operate with byte offsets and 
>>>>> not characters. 5. buffer.toString() 6. 
>>>>> DataOutputStream#writeBytes(String). First of all read the JavaDoc 
>>>>> comment for it, which explains it all: Writes out the string to 
>>>>> the underlying output stream as a     sequence of bytes. Each 
>>>>> character in the string is written out, in     sequence, by 
>>>>> discarding its high eight bits. If no exception is     thrown, the 
>>>>> counter <code>written</code> is incremented by the     length of 
>>>>> <code>s</code> This restores the earlier UTF-8 encoding correctly. 
>>>>> The topic has been discussed and mentioned already in 
>>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht 
>>>>> ml https://bugs.openjdk.java.net/browse/JDK-6202130 
>>>>> String(byte[],int,int,int) works "well" or "well enough" only 
>>>>> together with DataOutputStream#writeBytes(String). When removing 
>>>>> String(byte[],int,int,int) from Manifest and Attributes because 
>>>>> deprecated, it makes no sense to keep using 
>>>>> DataOutputStream#writeBytes(String) either. For the same reason 
>>>>> as String#String(byte[],int,int,int) has been deprecated, I 
>>>>> suggest to also deprecate java.io.DataOutput#writeBytes(String) as 
>>>>> a separate issue. This might relate to 
>>>>> https://bugs.openjdk.java.net/browse/JDK-6400767 but that one came 
>>>>> to a different conclusion some ten years ago. I preferred to stick 
>>>>> with the DataOutputStream even though not strictly necessary any 
>>>>> more. It is and has been in the API of Attributes (unfortunately 
>>>>> not private) and should better not be removed by changing the 
>>>>> parameter type. Same for Manifest#make72Safe(StringBuffer) which I 
>>>>> deprecated rather than having removed. Someone could have extended 
>>>>> a class from Manifest and use such a method and when changing the 
>>>>> signature it could no longer even compile in a far-fetched case. 
>>>>> LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and 
>>>>> LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to 
>>>>> invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n " 
>>>>> with the idea to slightly improve performance this way. I figured 
>>>>> it does not need JavaDoc comments but would be happy to add them 
>>>>> if desired. I removed "XXX Need to handle UTF8 values." from 
>>>>> Manifest#read after adding a test for it in ValueUtf8Coding. This 
>>>>> change and test also relate to bug 6202130 but does not solve that 
>>>>> one completely. ValueUtf8Coding demonstrates that Manifest can 
>>>>> read UTF-8 encoded values which is a necessary test case to cover 
>>>>> for this patch here. ValueUtf8Coding is the same test as already 
>>>>> submitted and suggested earlier. See 
>>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa 
>>>>> d.html#55848 Indentation in Attributes#write(DataOutputStream) was 
>>>>> five spaces on most lines. I fixed indentation only on the lines 
>>>>> changed anyway. I replaced String#String(byte[],int,int,String) 
>>>>> with 
>>>>> String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8) 
>>>>> which as a difference does not declare to throw a 
>>>>> java.io.UnsupportedEncodingException. That also replaced "UTF8" as 
>>>>> a charset name which I would consider not optimal regarding 
>>>>> sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName(). In 
>>>>> my opinion there is still some duplicated or at least very similar 
>>>>> code in Manifest#write, Attributes#writeMain, and Attributes#write 
>>>>> but I preferred to change less rather than more and not to further 
>>>>> refactor and re-combine it. In EmptyKeysAndValues and 
>>>>> NullKeysAndValues tests I tried to demonstrate that the changed 
>>>>> implementation does not change behaviour also in edge cases. I 
>>>>> would have expected not having to test all these cases but then I 
>>>>> realized it was possible to test and is therefore possible in a 
>>>>> real use case as well however far-fetched. At least the if (value 
>>>>> != null) { lines (three times) most obviously demand to test the 
>>>>> null value cases. I'm looking curiously forward to any kind of 
>>>>> feedback or opinion. Philipp 
>>>>
>>>>
>>