RFR for 6443578 and 6202130: UTF-8 in Manifests
Roger Riggs
roger.riggs at oracle.com
Fri May 4 19:51:57 UTC 2018
Hi Phillip,
Just a reminder that OpenJDK can *only* accept patches via
cr.openjdk.java.net
(as an author) or inline or attached in email.
Thanks, Roger
On 5/2/18 9:12 PM, Philipp Kunz wrote:
> Hi,
>
> Here is patch for 6443578 and 6202130 also in webrev form.
>
> http://files.paratix.ch/jdk/6372077and6443578/webrev.01/
> http://files.paratix.ch/jdk/6372077and6443578/webrev.01.zip
>
> Hope it helps. With all the patience, can I do anything to make it
> easier to get feedback or find a sponsor?
>
> Regards,
> Philipp
>
>
> On 02.05.2018 07:21, Philipp Kunz wrote:
>> Hi,
>>
>> Recently, I tried to fix only bug 6202130 with the intention to fix
>> bug 6443578 later with the intention to get some opportunity for
>> feedback, but haven't got any, and propose now a fix for both
>> together which in my opinion makes more sense.
>>
>> See attached patch.
>>
>> Some considerations, assumptions, and explanations
>>
>> * In my opinion, the code for writing manifests was distributed in
>> the two classes Attributes and Manifest in an elegant way but
>> somewhat difficult to explain the coherence. I chose to group the
>> code that writes manifests into a new class ManifestWriter. The
>> main incentive for that was to prevent or reduce duplicated code
>> I would have had to change twice otherwise. This also results in
>> a source file of a suitable size.
>> * I could not support the assumption that the write and writeMain
>> methods in Attributes couldn't be referenced anywhere so I
>> deprecated them rather than having them removed.
>> * I assumed the patch will not make it into JDK 10 and, hence, the
>> deprecated annotations are attributed with since = 11.
>> * I could not figure out any reason for the use of DataOutputStream
>> and did not use it.
>> * Performance-wise I assume that the code is approximately
>> comparable to the previous version. The biggest improvement in
>> this respect I hope comes from removing the String that contains
>> the byte array constructed with deprecated String(byte[], int,
>> int, int) and then copying it over again to a StringBuffer and
>> from there to a String again and then Characters. On the other
>> hand, keeping whole characters together when breaking lines might
>> make it slightly slower. I hope my changes are an overall
>> improvement, but I haven't measured it.
>> * For telling first from continuation bytes of utf-8 characters
>> apart I re-used a method isNotUtfContinuationByte from either
>> StringCoding or UTF_8.Decoder. Unfortunately I found no way not
>> to duplicate it.
>> * Where it said before "XXX Need to handle UTF8 values and break up
>> lines longer than 72 bytes" in Attributes#writeMain I did not
>> dare to remove the comment completely because it still does not
>> deal correctly with version headers longer than 72 bytes and the
>> set of allowed values. I changed it accordingly. Two similar
>> comments are removed in the patch.
>> * I added two tests, WriteDeprecated and NullKeysAndValues, to
>> demonstrate compatibility as good as I could. Might however not
>> be desired to keep and having to maintain.
>> * LineBrokenMultiByteCharacter for jarsigner should not be removed
>> or not so immediately because someone might attempt to sign an
>> older jarfile created without that patch with a newer jarsigner
>> that already contains it.
>>
>>
>>
>> suggested changes or additions to the bug database: (i have no
>> permissions to edit it myself)
>>
>> * Re-combine copies of isNotUtfContinuationByte (three by now).
>> Relates to 6184334. Worth to file another issue?
>> * Manifest versions have specific specifications, cannot break
>> across lines and can contain a subset of characters only. Bug
>> 6910466 relates but is not exactly the same. If someone else is
>> convinced that writing a manifest should issue a warning or any
>> other way to deal with a version that does not conform to the
>> specification, I'd suggest to create a separate bug for that.
>>
>>
>> Now, I would be glad if someone sponsored a review. This is only my
>> third attempt to submit a patch which is why I chose a lesser
>> important subject to fix in order to get familiar and now I
>> understand it's not the most attractive patch to review. Please don't
>> hesitate to suggest what I could do better or differently.
>>
>> As a bonus, with these changes, manifest files will always be
>> displayed correctly with just any utf capable viewer even if they
>> contain multi-byte utf characters that would have been broken across
>> a line break with the current/previous implementation and all
>> manifests will become also valid strings in Java.
>>
>> Regards,
>> Philipp
>>
>>
>>
>> On 20.04.2018 00:58, Philipp Kunz wrote:
>>> Hi,
>>>
>>> I tried to fix bug 6202130 about manifest utf support and come up
>>> now with a test as suggested in the bug's comments that shows that
>>> utf charset actually works before removing the comments from the code.
>>>
>>> When I wanted to remove the XXX comments about utf it occurred to me
>>> that version attributes ("Signature-Version" and "Manifest-Version")
>>> would never be broken across lines and should anyway not support the
>>> whole utf character set which sounds more like related to bugs
>>> 6910466 or 4935610 but it's not a real fit. Therefore, I could not
>>> remove one such comment of Attributes#writeMain but I changed it.
>>> The first comment in bug 6202130 mentions only two comments but
>>> there are three in Attributes. In the attached patch I removed only
>>> two of three and changed the remaining third to not mention utf
>>> anymore.
>>>
>>> At the moment, at least until 6443578 is fixed, multi-byte utf
>>> characters can be broken across lines. It might be worth a
>>> consideration to test that explicitly as well but then I guess there
>>> is not much of a point in testing the current behavior that will
>>> change with 6443578, hopefully soon. There are in my opinion enough
>>> characters broken across lines in the attached test that demonstrate
>>> that this still works like it did before.
>>>
>>> I would have preferred also to remove the calls to deprecated
>>> String(byte[], int, int, int) but then figured it relates more to
>>> bug 6443578 than 6202130 and now prefer to do that in another
>>> separate patch.
>>>
>>> Bug 6202130 also states that lines are broken by String.length not
>>> by byte length. While it looks so at first glance, I could not
>>> confirm. The combination of getBytes("UTF8"), String(byte[], int,
>>> int, int), and then DataOutputStream.writeBytes(String) in that
>>> combination does not drop high-bytes because every byte (whether a
>>> whole character or only a part of a multi-byte character) becomes a
>>> character in String(...) containing that byte in its low-byte which
>>> will be read again from writeBytes(...). Or put in a different way,
>>> every utf encoded byte becomes a character and multi-byte utf
>>> characters are converted into multiple string characters containing
>>> one byte each in their lower bytes. The current solution is not
>>> nice, but at least works. With that respect I'd like to suggest to
>>> deprecate DataOutputStream.writeBytes(String) because it does
>>> something not exactly expected when guessing from its name and that
>>> would suit a byte[] parameter better very much like it has been done
>>> with String(byte[], int, int, int). Any advice about the procedure
>>> to deprecate something?
>>>
>>> I was surprised that it was not trivial to list all valid utf
>>> characters. If someone has a better idea than isValidUtfCharacter in
>>> the attached test, let me know.
>>>
>>> Altogether, I would not consider 6202130 resolved completely, unless
>>> maybe all remaining points are copied to 6443578 and maybe another
>>> bug about valid values for "Signature-Version" and
>>> "Manifest-Version" if at all desired. But still I consider the
>>> attached patch an improvement and most of the remainder can then be
>>> solved in 6443578 and so far I am looking forward to any kind of
>>> feedback.
>>>
>>> Regards,
>>> Philipp
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/security-dev/attachments/20180504/051196ac/attachment.htm>
More information about the security-dev
mailing list