RFR: 8243254: Examine ZipFile slash optimization for non-ASCII compatible charsets
Claes Redestad
claes.redestad at oracle.com
Wed Apr 22 10:26:02 UTC 2020
Hi,
new webrev based on discussions here and offline:
http://cr.openjdk.java.net/~redestad/8243254/open.01/
- creates a new testng test TestZipFileEncodings, which derives from
TestZipFile rather than repurposing that existing test. There is
definitely some overlap, but since TestZipFile is only run manually
and this new test runs fine (and quickly) in our testing I think
that's fine
- only implements the optimization for UTF-8, removing the concept
of ASCII-compatible ZipCoders.
- a few other cleanups
Testing: tier1+2
Thanks!
/Claes
On 2020-04-21 15:52, Claes Redestad wrote:
> Hi,
>
> ZipFile.getEntry does optimizations to check for directory entries by
> adding a '/' to the encoded byte array. JDK-8242959 improved on this
> optimization, but a question was raised Jason Zaugg on whether the
> optimization is always valid[1]. Turns out it isn't, but only for
> non-ASCII compatible charsets.
>
> While JarFiles are always assumed to be UTF-8-encoded or compatible,
> ZipFiles allow arbitrary encoding. E.g., UTF-16 would encode '/' (2F) as
> either 2F 00 or 00 2F, which means the hash code would differ and a
> directory "foo/" potentially not be found when looking up "foo". Further
> complications arise when/if the directory name ends with a code point
> that might be encoded so that the final byte is 2F, e.g. \u012F.
>
> This patch only enables this low-level optimization when the charset
> encoding used is known to be ASCII compatible in the sense that 2F will
> be encoded as single-byte 2F. UTF-8 is compatible in this sense - and
> since this is the charset exclusively used by JarFile these changes have
> little to no effect on startup performance in the cases we've been
> looking at.
>
> Webrev: http://cr.openjdk.java.net/~redestad/8243254/open.00/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8243254
>
> I've partially repurposed ZipFile/TestZipFile to test some such corner
> cases. Mainly expanded it to test ZipFiles created with various non-
> standard encodings, but also slimmed it down so that it can actually run
> quickly and reliably - as well as enabled it in regular testing. The
> updated test fails both before and after JDK-8242959, but passes with
> the proposed changes.
>
> Testing: tier1+2
>
> Thanks!
>
> /Claes
>
> [1] https://twitter.com/retronym/status/1252134723431747584?s=20
More information about the core-libs-dev
mailing list