RFR: 8243254: Examine ZipFile slash optimization for non-ASCII compatible charsets

naoto.sato at oracle.com naoto.sato at oracle.com
Wed Apr 22 16:19:43 UTC 2020


+1

Naoto

On 4/22/20 9:13 AM, Lance Andersen wrote:
> Hi Claes,
> 
> The latest version looks good.
> 
> Thank you for the patch.
> 
> Best
> Lance
> 
>> On Apr 22, 2020, at 6:26 AM, Claes Redestad <claes.redestad at oracle.com> wrote:
>>
>> Hi,
>>
>> new webrev based on discussions here and offline:
>>
>> http://cr.openjdk.java.net/~redestad/8243254/open.01/
>>
>> - creates a new testng test TestZipFileEncodings, which derives from
>>   TestZipFile rather than repurposing that existing test. There is
>>   definitely some overlap, but since TestZipFile is only run manually
>>   and this new test runs fine (and quickly) in our testing I think
>>   that's fine
>>
>> - only implements the optimization for UTF-8, removing the concept
>>   of ASCII-compatible ZipCoders.
>>
>> - a few other cleanups
>>
>> Testing: tier1+2
>>
>> Thanks!
>>
>> /Claes
>>
>> On 2020-04-21 15:52, Claes Redestad wrote:
>>> Hi,
>>> ZipFile.getEntry does optimizations to check for directory entries by
>>> adding a '/' to the encoded byte array. JDK-8242959 improved on this
>>> optimization, but a question was raised Jason Zaugg on whether the
>>> optimization is always valid[1]. Turns out it isn't, but only for
>>> non-ASCII compatible charsets.
>>> While JarFiles are always assumed to be UTF-8-encoded or compatible,
>>> ZipFiles allow arbitrary encoding. E.g., UTF-16 would encode '/' (2F) as
>>> either 2F 00 or 00 2F, which means the hash code would differ and a
>>> directory "foo/" potentially not be found when looking up "foo". Further
>>> complications arise when/if the directory name ends with a code point
>>> that might be encoded so that the final byte is 2F, e.g. \u012F.
>>> This patch only enables this low-level optimization when the charset
>>> encoding used is known to be ASCII compatible in the sense that 2F will
>>> be encoded as single-byte 2F. UTF-8 is compatible in this sense - and
>>> since this is the charset exclusively used by JarFile these changes have
>>> little to no effect on startup performance in the cases we've been
>>> looking at.
>>> Webrev: http://cr.openjdk.java.net/~redestad/8243254/open.00/
>>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8243254
>>> I've partially repurposed ZipFile/TestZipFile to test some such corner
>>> cases. Mainly expanded it to test ZipFiles created with various non-
>>> standard encodings, but also slimmed it down so that it can actually run
>>> quickly and reliably - as well as enabled it in regular testing. The
>>> updated test fails both before and after JDK-8242959, but passes with
>>> the proposed changes.
>>> Testing: tier1+2
>>> Thanks!
>>> /Claes
>>> [1] https://twitter.com/retronym/status/1252134723431747584?s=20
> 
>   <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>   <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>   <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
> 
> 
> 


More information about the core-libs-dev mailing list