RFR for 6443578: Continuation lines in JAR manifests do not follow RFC-822
Philipp Kunz
philipp.kunz at paratix.ch
Fri May 4 08:25:18 UTC 2018
Hi Sherman,
Thanks again for pointing out that fewer changes are sufficient. I
prepared another patch for 6443578.
http://files.paratix.ch/jdk/6443578/webrev.02/
http://files.paratix.ch/jdk/6443578/webrev.02.zip
The approach is basically, not to put a line break where a utf character
continuation byte follows which can be tested with a bit mask.
For telling first from continuation bytes of utf characters I re-used a
method isNotUtfContinuationByte from either StringCoding or
UTF_8.Decoder. Unfortunately I found no way not to duplicate it which I
consider subject of already known bug 6184334. In case this patch is
accepted, I suggest to add a comment to 6184334 that refers to the new
copy in Manifest#make72Safe.
LineBrokenMultiByteCharacter should not be removed or not so immediately
at least because someone might attempt to sign an older jarfile created
without that patch with a newer jarsigner that already contains it if
the jar previously contained digests presumably from another signature.
Looking forward to any feedback and with kind regards,
Philipp
On 03.05.2018 08:56, Xueming Shen wrote:
> Philipp,
>
> I kinda recalled JDK-6202130 which I read long time ago and I believed
> it's not a bug but
> just wanted to leave it open and for double check, then it's off the
> priority list somehow...
>
> Reread the code below I think it handles utf8 and the 72-length-limit
> appropriately, though
> a little tricky,
>
> (1) value.getByte("utf8") + deprecated String(byte[], int, int, int);
> which creates a String that each "char" inside that String object
> actually represents one byte
> value of the resulted utf8 byte sequence, with its utf16 value =
> [hi-byte= 0 and low-byte="utf8-byte"]
>
> (2) append this String to the StringBuffer (--> StringBuilder), now
> the sb.length() actually is the
> length of the utf8 byte sequence, make72Safe(...) is used to
> adjust the length to 72 working
> on "chars" inside the buffer.
>
> (3) write out the adjusted buffer via DataOutputStream.writeBytes()
> in which the "write" cuts off the hi-byte of that utf16, so you
> actually get the original
> utf-8 byte sequence output to the stream.
>
> Sure the implementation looks "interesting" and uses deprecated String
> constructor, but it was
> written 2 decades ago.
>
> Back then I think we were thinking maybe the fix is to simply remove
> the "misleading" comment
> line below in the source code, if the implementation actually supports
> utf-8 + 72-limit.
>
> " * XXX Need to handle UTF8 values and break up lines longer than
> 72 bytes"
>
> Any reason/test case to believe the mechanism does not work?
>
> BTW the logic of line below in writeMain() appears to be incorrect. I
> do have another bug for it.
> " if ((version != null) &&
> !(name.equalsIgnoreCase(vername))) {"
>
> ----------------------
>
> StringBuffer buffer = new StringBuffer(name);
> buffer.append(": ");
>
> String value = (String) e.getValue();
> if (value != null) {
> byte[] vb = value.getBytes("UTF8");
> value = new String(vb, 0, 0, vb.length);
> }
> buffer.append(value);
>
> Manifest.make72Safe(buffer);
> buffer.append("\r\n");
> out.writeBytes(buffer.toString());
>
> -------------------------
>
> sherman
>
>
>
> On 5/1/18, 10:21 PM, Philipp Kunz wrote:
>> Hi,
>>
>> Recently, I tried to fix only bug 6202130 with the intention to fix
>> bug 6443578 later with the intention to get some opportunity for
>> feedback, but haven't got any, and propose now a fix for both
>> together which in my opinion makes more sense.
>>
>> See attached patch.
>>
>> Some considerations, assumptions, and explanations
>>
>> * In my opinion, the code for writing manifests was distributed in the
>> two classes Attributes and Manifest in an elegant way but somewhat
>> difficult to explain the coherence. I chose to group the code that
>> writes manifests into a new class ManifestWriter. The main incentive
>> for that was to prevent or reduce duplicated code I would have had
>> to change twice otherwise. This also results in a source file of a
>> suitable size.
>> * I could not support the assumption that the write and writeMain
>> methods in Attributes couldn't be referenced anywhere so I
>> deprecated them rather than having them removed.
>> * I assumed the patch will not make it into JDK 10 and, hence, the
>> deprecated annotations are attributed with since = 11.
>> * I could not figure out any reason for the use of DataOutputStream
>> and did not use it.
>> * Performance-wise I assume that the code is approximately comparable
>> to the previous version. The biggest improvement in this respect I
>> hope comes from removing the String that contains the byte array
>> constructed with deprecated String(byte[], int, int, int) and then
>> copying it over again to a StringBuffer and from there to a String
>> again and then Characters. On the other hand, keeping whole
>> characters together when breaking lines might make it slightly
>> slower. I hope my changes are an overall improvement, but I haven't
>> measured it.
>> * For telling first from continuation bytes of utf-8 characters apart
>> I re-used a method isNotUtfContinuationByte from either StringCoding
>> or UTF_8.Decoder. Unfortunately I found no way not to duplicate it.
>> * Where it said before "XXX Need to handle UTF8 values and break up
>> lines longer than 72 bytes" in Attributes#writeMain I did not dare
>> to remove the comment completely because it still does not deal
>> correctly with version headers longer than 72 bytes and the set of
>> allowed values. I changed it accordingly. Two similar comments are
>> removed in the patch.
>> * I added two tests, WriteDeprecated and NullKeysAndValues, to
>> demonstrate compatibility as good as I could. Might however not be
>> desired to keep and having to maintain.
>> * LineBrokenMultiByteCharacter for jarsigner should not be removed or
>> not so immediately because someone might attempt to sign an older
>> jarfile created without that patch with a newer jarsigner that
>> already contains it.
>>
>>
>>
>> suggested changes or additions to the bug database: (i have no
>> permissions to edit it myself)
>>
>> * Re-combine copies of isNotUtfContinuationByte (three by now).
>> Relates to 6184334. Worth to file another issue?
>> * Manifest versions have specific specifications, cannot break across
>> lines and can contain a subset of characters only. Bug 6910466
>> relates but is not exactly the same. If someone else is convinced
>> that writing a manifest should issue a warning or any other way to
>> deal with a version that does not conform to the specification, I'd
>> suggest to create a separate bug for that.
>>
>>
>> Now, I would be glad if someone sponsored a review. This is only my
>> third attempt to submit a patch which is why I chose a lesser
>> important subject to fix in order to get familiar and now I
>> understand it's not the most attractive patch to review. Please don't
>> hesitate to suggest what I could do better or differently.
>>
>> As a bonus, with these changes, manifest files will always be
>> displayed correctly with just any utf capable viewer even if they
>> contain multi-byte utf characters that would have been broken across
>> a line break with the current/previous implementation and all
>> manifests will become also valid strings in Java.
>>
>
--
Gruss Philipp
------------------------------------------------------------------------
Paratix GmbH
St Peterhofstatt 11
8001 Zürich
+41 (0)76 397 79 35
philipp.kunz at paratix.ch <mailto:philipp.kunz at paratix.ch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6443578.patch
Type: text/x-patch
Size: 43396 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20180504/c029c971/6443578-0001.patch>
More information about the core-libs-dev
mailing list