RFR: 8243254: Examine ZipFile slash optimization for non-ASCII compatible charsets

Claes Redestad claes.redestad at oracle.com
Wed Apr 22 16:27:32 UTC 2020


Lance, Naoto,

thanks for reviewing!

/Claes

On 2020-04-22 18:19, naoto.sato at oracle.com wrote:
> +1
> 
> Naoto
> 
> On 4/22/20 9:13 AM, Lance Andersen wrote:
>> Hi Claes,
>>
>> The latest version looks good.
>>
>> Thank you for the patch.
>>
>> Best
>> Lance
>>
>>> On Apr 22, 2020, at 6:26 AM, Claes Redestad 
>>> <claes.redestad at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> new webrev based on discussions here and offline:
>>>
>>> http://cr.openjdk.java.net/~redestad/8243254/open.01/
>>>
>>> - creates a new testng test TestZipFileEncodings, which derives from
>>>   TestZipFile rather than repurposing that existing test. There is
>>>   definitely some overlap, but since TestZipFile is only run manually
>>>   and this new test runs fine (and quickly) in our testing I think
>>>   that's fine
>>>
>>> - only implements the optimization for UTF-8, removing the concept
>>>   of ASCII-compatible ZipCoders.
>>>
>>> - a few other cleanups
>>>
>>> Testing: tier1+2
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2020-04-21 15:52, Claes Redestad wrote:
>>>> Hi,
>>>> ZipFile.getEntry does optimizations to check for directory entries by
>>>> adding a '/' to the encoded byte array. JDK-8242959 improved on this
>>>> optimization, but a question was raised Jason Zaugg on whether the
>>>> optimization is always valid[1]. Turns out it isn't, but only for
>>>> non-ASCII compatible charsets.
>>>> While JarFiles are always assumed to be UTF-8-encoded or compatible,
>>>> ZipFiles allow arbitrary encoding. E.g., UTF-16 would encode '/' 
>>>> (2F) as
>>>> either 2F 00 or 00 2F, which means the hash code would differ and a
>>>> directory "foo/" potentially not be found when looking up "foo". 
>>>> Further
>>>> complications arise when/if the directory name ends with a code point
>>>> that might be encoded so that the final byte is 2F, e.g. \u012F.
>>>> This patch only enables this low-level optimization when the charset
>>>> encoding used is known to be ASCII compatible in the sense that 2F will
>>>> be encoded as single-byte 2F. UTF-8 is compatible in this sense - and
>>>> since this is the charset exclusively used by JarFile these changes 
>>>> have
>>>> little to no effect on startup performance in the cases we've been
>>>> looking at.
>>>> Webrev: http://cr.openjdk.java.net/~redestad/8243254/open.00/
>>>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8243254
>>>> I've partially repurposed ZipFile/TestZipFile to test some such corner
>>>> cases. Mainly expanded it to test ZipFiles created with various non-
>>>> standard encodings, but also slimmed it down so that it can actually 
>>>> run
>>>> quickly and reliably - as well as enabled it in regular testing. The
>>>> updated test fails both before and after JDK-8242959, but passes with
>>>> the proposed changes.
>>>> Testing: tier1+2
>>>> Thanks!
>>>> /Claes
>>>> [1] https://twitter.com/retronym/status/1252134723431747584?s=20
>>
>>   <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>   <http://oracle.com/us/design/oracle-email-sig-198324.gif> 
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>   <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance 
>> Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>
>>
>>


More information about the core-libs-dev mailing list