RFR: JDK-8061777, (zipfs) IllegalArgumentException in ZipCoder.toString when using Shift_JIS

Tue May 31 05:07:25 UTC 2016

Thanks Paul,

updated accordingly.

http://cr.openjdk.java.net/~sherman/8061777/webrev

-sherman

On 5/30/16 2:31 AM, Paul Sandoz wrote:
> Hi Sherman,
>
> Would you consider modifying the new ZipPath constructor you added to accept a boolean value indicating UTF-8 encoding?
>
> If so, you could document the behaviour more clearly and avoid duplicating the operators in ZipFileSystem, e.g.:
>
>    return new ZipPath(this, first, zc.isUTF8());
>
> Paul.
>
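A minimal sketch of what such a constructor might look like, assuming a simplified path class. `SketchPath`, `normalizeBytes` and `normalizeString` are hypothetical names, not the actual ZipPath/ZipFileSystem code, and normalization is reduced to mapping '\' to '/' and collapsing repeated '/'. The point is that the boolean lets the constructor's javadoc describe both code paths in one place.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for ZipPath; NOT the JDK's actual class or constructor.
final class SketchPath {
    private final byte[] path;

    /**
     * Creates a normalized path.
     *
     * @param path   raw entry-name bytes in charset {@code cs}
     * @param cs     charset the file system uses for entry names
     * @param isUTF8 true if {@code cs} is UTF-8; the bytes are then normalized
     *               directly, which is safe because every byte of a UTF-8
     *               multi-byte sequence is >= 0x80, so 0x5C is always '\'.
     *               Otherwise the bytes are decoded, normalized as a String,
     *               and re-encoded, so a 0x5C trail byte (e.g. in Shift_JIS)
     *               is never mistaken for '\'.
     */
    SketchPath(byte[] path, Charset cs, boolean isUTF8) {
        this.path = isUTF8 ? normalizeBytes(path) : normalizeString(path, cs);
    }

    // Returns a defensive copy of the normalized bytes.
    byte[] bytes() {
        return path.clone();
    }

    // Byte-level path: map '\' to '/' and collapse runs of '/'.
    private static byte[] normalizeBytes(byte[] p) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(p.length);
        byte prev = 0;
        for (byte b : p) {
            byte c = (b == '\\') ? (byte) '/' : b;
            if (c == '/' && prev == '/') {
                continue; // skip redundant '/'
            }
            out.write(c);
            prev = c;
        }
        return out.toByteArray();
    }

    // String-level path: same rules, applied to decoded characters.
    private static byte[] normalizeString(byte[] p, Charset cs) {
        String s = new String(p, cs).replace('\\', '/').replaceAll("/{2,}", "/");
        return s.getBytes(cs);
    }
}
```

A caller would then compute the flag once, much like `zc.isUTF8()` in the quoted line, e.g. `new SketchPath(raw, cs, StandardCharsets.UTF_8.equals(cs))`.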
>> On 27 May 2016, at 22:38, Xueming Shen <xueming.shen at oracle.com> wrote:
>>
>> Hi,
>>
>> Please help review the change for JDK-8061777.
>>
>> issue: https://bugs.openjdk.java.net/browse/JDK-8061777
>> webrev: http://cr.openjdk.java.net/~sherman/8061777
>>
>> Cause: ZipPath/ZipFileSystem uses byte[] as the internal storage for entry names.
>> This is for performance: the name is stored as bytes inside the zip/jar file, so it
>> is desirable to avoid redundant String<->byte[] conversions where possible. With
>> this design it is natural to also perform the "path" operations directly on the
>> byte[], including "normalization", which mainly removes redundant "/" and converts
>> "\" to "/". This is a problem for zip files whose entry names are not UTF-8 encoded
>> (UTF-8 is the default encoding the Java jar/zip APIs use to decode/encode entry
>> names), especially those using double-byte encodings in which 0x5c ('\') can
>> appear as one of the two bytes of a double-byte character. If we normalize the
>> byte[] directly, that 0x5c byte is mistakenly converted to '/'. The proposed change
>> is to normalize on the "String" instead, which avoids this problem. Because Java
>> jar/zip is specified to use UTF-8 by default, switching completely to String-based
>> operations would add a potential performance cost for most zip/jar files, so the
>> UTF-8/byte[] code path is still used (as the default) when the encoding is UTF-8.
>> The implementation switches to the String-based code path only when a non-UTF-8
>> encoding is explicitly specified, which should be rare.
>>
>> Thanks,
>> Sherman
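To make the failure concrete, here is a small stand-alone illustration (not code from the webrev). In Shift_JIS the character 表 (U+8868) encodes as 0x95 0x5C, so a byte-level '\' to '/' rewrite damages it, while the same rewrite on the decoded String leaves it intact. The class and method names are hypothetical, and normalization is simplified to the backslash rule only.

```java
import java.nio.charset.Charset;

// Hypothetical demo: why byte-level '\' -> '/' normalization is unsafe for Shift_JIS names.
class ShiftJisBackslashDemo {
    // Naive byte-level rewrite: every 0x5C byte becomes 0x2F ('/').
    static byte[] normalizeBytes(byte[] path) {
        byte[] out = path.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == 0x5C) {
                out[i] = 0x2F;
            }
        }
        return out;
    }

    // String-level rewrite: decode first, so only real '\' characters change.
    static String normalizeString(byte[] path, Charset cs) {
        return new String(path, cs).replace('\\', '/');
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        String name = "\u8868.txt";        // "表.txt"; U+8868 encodes as 0x95 0x5C
        byte[] raw = name.getBytes(sjis);

        System.out.printf("raw bytes: %02X %02X%n", raw[0], raw[1]);
        String viaBytes = new String(normalizeBytes(raw), sjis);
        String viaString = normalizeString(raw, sjis);

        System.out.println(viaString.equals(name)); // String path keeps the name intact
        System.out.println(viaBytes.equals(name));  // byte path turns 0x95 0x5C into 0x95 0x2F
    }
}
```

The byte path fails because 0x2F is not a valid Shift_JIS trail byte, so the decoder can no longer recover the original character.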