RFR for 6443578: Continuation lines in JAR manifests do not follow RFC-822

Fri May 4 08:25:18 UTC 2018

Hi Sherman,

Thanks again for pointing out that fewer changes are sufficient. I 
prepared another patch for 6443578.

http://files.paratix.ch/jdk/6443578/webrev.02/
http://files.paratix.ch/jdk/6443578/webrev.02.zip

The approach is basically, not to put a line break where a utf character 
continuation byte follows which can be tested with a bit mask.

For telling first from continuation bytes of utf characters I re-used a 
method isNotUtfContinuationByte from either StringCoding or 
UTF_8.Decoder. Unfortunately I found no way not to duplicate it which I 
consider subject of already known bug 6184334. In case this patch is 
accepted, I suggest to add a comment to 6184334 that refers to the new 
copy in Manifest#make72Safe.

LineBrokenMultiByteCharacter should not be removed or not so immediately 
at least because someone might attempt to sign an older jarfile created 
without that patch with a newer jarsigner that already contains it if 
the jar previously contained digests presumably from another signature.

Looking forward to any feedback and with kind regards,
Philipp

On 03.05.2018 08:56, Xueming Shen wrote:
> Philipp,
>
> I kinda recalled JDK-6202130 which I read long time ago and I believed 
> it's not a bug but
> just wanted to leave it open and for double check, then it's off the 
> priority list somehow...
>
> Reread the code below I think it handles utf8 and the 72-length-limit 
> appropriately, though
> a little tricky,
>
> (1) value.getByte("utf8") + deprecated String(byte[], int, int, int);
>      which creates a String that each "char" inside that String object 
> actually represents one byte
>      value of the resulted utf8 byte sequence, with its utf16 value = 
> [hi-byte= 0 and low-byte="utf8-byte"]
>
> (2) append this String to the StringBuffer (--> StringBuilder), now 
> the sb.length() actually is the
>      length of the utf8 byte sequence, make72Safe(...) is used to 
> adjust the length to 72 working
>      on "chars" inside the buffer.
>
> (3) write out the adjusted buffer via DataOutputStream.writeBytes()
>      in which the "write" cuts off the hi-byte of that utf16, so you 
> actually get the original
>      utf-8 byte sequence output to the stream.
>
> Sure the implementation looks "interesting" and uses deprecated String 
> constructor, but it was
> written 2 decades ago.
>
> Back then I think we were thinking maybe the fix is to simply remove 
> the "misleading" comment
> line below in the source code, if the implementation actually supports 
> utf-8 + 72-limit.
>
> "     * XXX Need to handle UTF8 values and break up lines longer than 
> 72 bytes"
>
> Any reason/test case to believe the mechanism does not work?
>
> BTW the logic of line below in writeMain() appears to be incorrect. I 
> do have another bug for it.
> "            if ((version != null) && 
> !(name.equalsIgnoreCase(vername))) {"
>
> ----------------------
>
>                 StringBuffer buffer = new StringBuffer(name);
>                 buffer.append(": ");
>
>                 String value = (String) e.getValue();
>                 if (value != null) {
>                     byte[] vb = value.getBytes("UTF8");
>                     value = new String(vb, 0, 0, vb.length);
>                 }
>                 buffer.append(value);
>
>                 Manifest.make72Safe(buffer);
>                 buffer.append("\r\n");
>                 out.writeBytes(buffer.toString());
>
> -------------------------
>
> sherman
>
>
>
> On 5/1/18, 10:21 PM, Philipp Kunz wrote:
>> Hi,
>>
>> Recently, I tried to fix only bug 6202130 with the intention to fix 
>> bug 6443578 later with the intention to get some opportunity for 
>> feedback, but haven't got any, and propose now a fix for both 
>> together which in my opinion makes more sense.
>>
>> See attached patch.
>>
>> Some considerations, assumptions, and explanations
>>
>>  * In my opinion, the code for writing manifests was distributed in the
>>    two classes Attributes and Manifest in an elegant way but somewhat
>>    difficult to explain the coherence. I chose to group the code that
>>    writes manifests into a new class ManifestWriter. The main incentive
>>    for that was to prevent or reduce duplicated code I would have had
>>    to change twice otherwise. This also results in a source file of a
>>    suitable size.
>>  * I could not support the assumption that the write and writeMain
>>    methods in Attributes couldn't be referenced anywhere so I
>>    deprecated them rather than having them removed.
>>  * I assumed the patch will not make it into JDK 10 and, hence, the
>>    deprecated annotations are attributed with since = 11.
>>  * I could not figure out any reason for the use of DataOutputStream
>>    and did not use it.
>>  * Performance-wise I assume that the code is approximately comparable
>>    to the previous version. The biggest improvement in this respect I
>>    hope comes from removing the String that contains the byte array
>>    constructed with deprecated String(byte[], int, int, int) and then
>>    copying it over again to a StringBuffer and from there to a String
>>    again and then Characters. On the other hand, keeping whole
>>    characters together when breaking lines might make it slightly
>>    slower. I hope my changes are an overall improvement, but I haven't
>>    measured it.
>>  * For telling first from continuation bytes of utf-8 characters apart
>>    I re-used a method isNotUtfContinuationByte from either StringCoding
>>    or UTF_8.Decoder. Unfortunately I found no way not to duplicate it.
>>  * Where it said before "XXX Need to handle UTF8 values and break up
>>    lines longer than 72 bytes" in Attributes#writeMain I did not dare
>>    to remove the comment completely because it still does not deal
>>    correctly with version headers longer than 72 bytes and the set of
>>    allowed values. I changed it accordingly. Two similar comments are
>>    removed in the patch.
>>  * I added two tests, WriteDeprecated and NullKeysAndValues, to
>>    demonstrate compatibility as good as I could. Might however not be
>>    desired to keep and having to maintain.
>>  * LineBrokenMultiByteCharacter for jarsigner should not be removed or
>>    not so immediately because someone might attempt to sign an older
>>    jarfile created without that patch with a newer jarsigner that
>>    already contains it.
>>
>>
>>
>> suggested changes or additions to the bug database: (i have no 
>> permissions to edit it myself)
>>
>>  * Re-combine copies of isNotUtfContinuationByte (three by now).
>>    Relates to 6184334. Worth to file another issue?
>>  * Manifest versions have specific specifications, cannot break across
>>    lines and can contain a subset of characters only. Bug 6910466
>>    relates but is not exactly the same. If someone else is convinced
>>    that writing a manifest should issue a warning or any other way to
>>    deal with a version that does not conform to the specification, I'd
>>    suggest to create a separate bug for that.
>>
>>
>> Now, I would be glad if someone sponsored a review. This is only my 
>> third attempt to submit a patch which is why I chose a lesser 
>> important subject to fix in order to get familiar and now I 
>> understand it's not the most attractive patch to review. Please don't 
>> hesitate to suggest what I could do better or differently.
>>
>> As a bonus, with these changes, manifest files will always be 
>> displayed correctly with just any utf capable viewer even if they 
>> contain multi-byte utf characters that would have been broken across 
>> a line break with the current/previous implementation and all 
>> manifests will become also valid strings in Java.
>>
>

-- 

Gruss Philipp

------------------------------------------------------------------------

Paratix GmbH
St Peterhofstatt 11
8001 Zürich

+41 (0)76 397 79 35
philipp.kunz at paratix.ch <mailto:philipp.kunz at paratix.ch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6443578.patch
Type: text/x-patch
Size: 43396 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20180504/c029c971/6443578-0001.patch>