RFR: JDK-8066619: String(byte[],int,int,int) in String has been deprecated in Manifest and Attributes
Roger Riggs
Roger.Riggs at oracle.com
Mon Dec 17 17:12:20 UTC 2018
Hi Philipp,
Manifest.java:
- Line 258: creating a new array for two characters on each call isn't
as efficient as:
out.write('\r');
out.write('\n').
The new test that need internal access can gain that access by adding:
@modules java.base/java.util.jar:+open
That instructs testng to add the correct command line switches.
Then you can remove --illegal-access=warn and the tests will work.
In the test ValueUtf8Coding, just a mention of a method to create a
string with repeats.
"-".repeat(80);
On 12/17/2018 01:42 AM, Philipp Kunz wrote:
> Hi Roger,
>
> Thank you very much for your review and the feedback. Please find my
> comments below and a new patch attached.
>
> Philipp
>
>
> On Wed, 2018-12-12 at 10:52 -0500, Roger Riggs wrote:
>> Hi Phillip,
>>
>> Sorry, got busy...
>>
>> Can you rebase the patch to the current repo, it did not apply cleanly.
>>
>> I know you are focused on removing the deprecation, but a few
>> localized improvements
>> would be good.
>>
>> In Attributes.java : 346-337, it uses StringBuffer, please change it
>> to StringBuilder.
>> Unless thread safety is an issue, StringBuilder is recommended, it is
>> slightly more efficient since it does no synchronization.
> I did deliberately not touch the StringBuffer in the previous patch
> but fully agree now I know it has a chance to be accepted. Would you
> accept to replace StringBuffer with plain string concatenation after
> http://openjdk.java.net/jeps/280 which was not in place at the time
> those StringBuffers were introduced?
Using "+" concat is fine.
>>
>> - And the StringBuilder should be sized when it is created, to avoid
>> needing to resize it later.
>> Using a single StringBuilder all the entries, using setLength(0),
>> would save allocating
>> for each entry.
> Jep 280 would also avoid having to size the buffers far as I understand.
>>
>> - check the indentation @line 308-20 and 311.
> The indentation was weird, 5 instead of 4 spaces on some lines and I
> re-indented only the lines I touched anyway in the previous patch.
> Now, some lines appear changed only due to the indentation. After the
> StringBuffer removal only two of them are left in the current patch
> and certainly don't add significantly many unrelated changed lines now
> any more.
ok
>>
>> In Manifest.java:
>> - write72 method !String.isEmpty() is preferred over the .length() > 0.
> ok
>> - if the line is empty, should it write the LINE_BREAK_BYTES?
>> A blank line in the manifest may be seen as significant.
> Before the patch, a line break was always added to the end of the
> StringBuffer after passing to make72Safe and before writing it. Now
> with the previous patch, write72 added it. Altogether makes no
> difference. But after reconsidering your point, I found a clearer
> approach, I hope, than passing an empty string for having a line break
> requested to be output which now also seems to me having been not the
> most obvious way and more like a kind of a hack before. In the course
> of that change I also renamed write72 to println and println72 and
> also added a test for it. Hope you also like it better that way.
- Line 257: println() always makes me think of the system specific
line separator.
I'd name it writeln() and write72ln. I think, write is clearer
that it is bytes being written with
no charset implications.
>>
>> - in the write method: Change StringBuffer to StringBuilder
>>
>> - The javadoc links to MANIFEST_VERSION and SIGNATURE_VERSION should
>> use "#".
>> * {@link Attributes.Name#MANIFEST_VERSION} or
>> * {@link Attributes.Name#SIGNATURE_VERSION} must be set in
> removed it again because it applies more to bug 8196371 or 6910466
ok
Thanks, Roger
>>
>> Thanks, Roger
>>
>>
>> On 12/04/2018 03:34 AM, Philipp Kunz wrote:
>>> Hi Roger,
>>>
>>> I'm afraid the previous patch missed the tests, should be included
>>> this time.
>>>
>>> The intention of the patch is to solve only bug 8066619 about
>>> deprecation. I sincerely hope the changes are neutral.
>>>
>>> The new ValueUtf8Coding test heavily coincides/overlaps with 6202130
>>> which is why I mentioned it. I'm however not satisfied that that
>>> test alone also completely solves 6202130 because 6202130 has or
>>> might have implications with breaking characters encoded in UTF-8
>>> with more than one bytes across a line break onto a continuation
>>> line which is not part of the current patch proposed for 8066619. At
>>> some point I proposed ValueUtf8Coding with only removing the
>>> comments from the implementation in
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/056166.html
>>> but I have changed my mind since then and think now 6202130 should
>>> also change the implementation not to break lines inside of
>>> multi-byte characters which is not part of the current patch and is
>>> probably easier after the current patch if necessary at all. Both
>>> 6202130 and the current patch for 8066619 here touch the UTF-8
>>> coding of manifests and which ever is solved first should add a
>>> corresponding test because no such test exists yet I believe. Worth
>>> to mention are test/jdk/tools/launcher/DiacriticTest.java and
>>> test/jdk/tools/launcher/UnicodeTest.java both of which test the JVM
>>> launch and have a somewhat different purpose. I haven't found any
>>> other test for the specifically changed lines of code apart from
>>> probably many tests that use manifests indirectly in some form.
>>>
>>> Regards,
>>> Philipp
>>>
>>>
>>> On Mon, 2018-12-03 at 16:43 -0500, Roger Riggs wrote:
>>>> Hi Phillip,
>>>>
>>>> The amount detail obscures the general purpose.
>>>> And there appears to be more than 1.
>>>> The Jira issue IDs mentioned are 8066619 and 6202130.
>>>>
>>>> Is this functionally neutral and only fixes the deprecations?
>>>>
>>>> There is a mention that a test is needed for multi-byte chars, but a test
>>>> is not included. Is there an existing test for that?
>>>>
>>>> Its probably best to identify the main functional improvement (multi-byte)
>>>> and fix the deprecation as a side effect.
>>>>
>>>> Thanks for digging through the issues and the explanations;
>>>> it will take a bit of study to unravel and understand everything in this
>>>> changeset.
>>>>
>>>> Regards, Roger
>>>>
>>>>
>>>> On 12/01/2018 06:49 AM, Philipp Kunz wrote:
>>>>> Find the proposed patch attached. Some comments and explanations,
>>>>> here: There is a quite interesting implementation in Manifest and
>>>>> Attributes worth quite some explanation. The way it used to work
>>>>> before was: 1. put manifest header name, colon and space into a
>>>>> StringBuffer -> the buffer now contains a string of characters
>>>>> each high-byte of which is zero as explained later why this is
>>>>> important. the high-bytes are zero because the set of allowed
>>>>> characters is very limited to ":", " ", "a" - "z", "A" - "Z", "0"
>>>>> - "9", "_", and "-" according to Attributes.Name#hash(String) so
>>>>> far with only the name and the separator and yet without the
>>>>> values. 2. if the value is not null, encode it in UTF-8 into a
>>>>> byte array and instantiate a String with it using deprecated
>>>>> String#String(byte[],int,int,int) resulting in a String with the
>>>>> same length as the byte array before holding one byte in each
>>>>> character's low-byte. This makes a difference for characters
>>>>> encoded with more than one byte in UTF-8. The new String is
>>>>> potentially longer than the original value. 3. if the value is not
>>>>> null, append value to buffer. The one UTF-8 encoded byte per
>>>>> character from the appended string is preserved also in the buffer
>>>>> along with the previous buffer contents. 3alt. if the value is
>>>>> null, add "null" to the buffer. See
>>>>> java.lang.AbstractStringBuilder#append(String). Neither of the
>>>>> characters of "null" has a non-zero high-byte encoded as UTF-16
>>>>> chars. 4. make72Safe inserts line breaks with continuation spaces.
>>>>> Note that the buffer here contains only one byte per character
>>>>> because all high- bytes are still zero so that line.length() and
>>>>> line.insert(index, ...) effectively operate with byte offsets and
>>>>> not characters. 5. buffer.toString() 6.
>>>>> DataOutputStream#writeBytes(String). First of all read the JavaDoc
>>>>> comment for it, which explains it all: Writes out the string to
>>>>> the underlying output stream as a sequence of bytes. Each
>>>>> character in the string is written out, in sequence, by
>>>>> discarding its high eight bits. If no exception is thrown, the
>>>>> counter <code>written</code> is incremented by the length of
>>>>> <code>s</code> This restores the earlier UTF-8 encoding correctly.
>>>>> The topic has been discussed and mentioned already in
>>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/052946.ht
>>>>> ml https://bugs.openjdk.java.net/browse/JDK-6202130
>>>>> String(byte[],int,int,int) works "well" or "well enough" only
>>>>> together with DataOutputStream#writeBytes(String). When removing
>>>>> String(byte[],int,int,int) from Manifest and Attributes because
>>>>> deprecated, it makes no sense to keep using
>>>>> DataOutputStream#writeBytes(String) either. For the same reason
>>>>> as String#String(byte[],int,int,int) has been deprecated, I
>>>>> suggest to also deprecate java.io.DataOutput#writeBytes(String) as
>>>>> a separate issue. This might relate to
>>>>> https://bugs.openjdk.java.net/browse/JDK-6400767 but that one came
>>>>> to a different conclusion some ten years ago. I preferred to stick
>>>>> with the DataOutputStream even though not strictly necessary any
>>>>> more. It is and has been in the API of Attributes (unfortunately
>>>>> not private) and should better not be removed by changing the
>>>>> parameter type. Same for Manifest#make72Safe(StringBuffer) which I
>>>>> deprecated rather than having removed. Someone could have extended
>>>>> a class from Manifest and use such a method and when changing the
>>>>> signature it could no longer even compile in a far-fetched case.
>>>>> LINE_BREAK, CONTINUATION_SPACE, LINE_BREAK_BYTES, and
>>>>> LINE_BREAK_WITH_CONTINUATION_SPACE_BYTES should prevent having to
>>>>> invoke getBytes(UTF_8) over and over again on "\r\n" and "\r\n "
>>>>> with the idea to slightly improve performance this way. I figured
>>>>> it does not need JavaDoc comments but would be happy to add them
>>>>> if desired. I removed "XXX Need to handle UTF8 values." from
>>>>> Manifest#read after adding a test for it in ValueUtf8Coding. This
>>>>> change and test also relate to bug 6202130 but does not solve that
>>>>> one completely. ValueUtf8Coding demonstrates that Manifest can
>>>>> read UTF-8 encoded values which is a necessary test case to cover
>>>>> for this patch here. ValueUtf8Coding is the same test as already
>>>>> submitted and suggested earlier. See
>>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-October/threa
>>>>> d.html#55848 Indentation in Attributes#write(DataOutputStream) was
>>>>> five spaces on most lines. I fixed indentation only on the lines
>>>>> changed anyway. I replaced String#String(byte[],int,int,String)
>>>>> with
>>>>> String#String(byte[],int,int,java.nio.charset.StandardCharsets.UTF_8)
>>>>> which as a difference does not declare to throw a
>>>>> java.io.UnsupportedEncodingException. That also replaced "UTF8" as
>>>>> a charset name which I would consider not optimal regarding
>>>>> sun.nio.cs.UTF_8#UTF_8() and sun.nio.cs.UTF_8#historicalName(). In
>>>>> my opinion there is still some duplicated or at least very similar
>>>>> code in Manifest#write, Attributes#writeMain, and Attributes#write
>>>>> but I preferred to change less rather than more and not to further
>>>>> refactor and re-combine it. In EmptyKeysAndValues and
>>>>> NullKeysAndValues tests I tried to demonstrate that the changed
>>>>> implementation does not change behaviour also in edge cases. I
>>>>> would have expected not having to test all these cases but then I
>>>>> realized it was possible to test and is therefore possible in a
>>>>> real use case as well however far-fetched. At least the if (value
>>>>> != null) { lines (three times) most obviously demand to test the
>>>>> null value cases. I'm looking curiously forward to any kind of
>>>>> feedback or opinion. Philipp
>>>>
>>>>
>>
More information about the core-libs-dev
mailing list