RFR: 8243254: Examine ZipFile slash optimization for non-ASCII compatible charsets

Claes Redestad claes.redestad at oracle.com
Tue Apr 21 22:28:33 UTC 2020


On 2020-04-21 20:30, Martin Buchholz wrote:
> 
>      >
>      > Are we really willing to maintain tests for UTF-8,
>     ASCII-compatible, and
>      > ASCII-incompatible code paths?
> 
>     If we aren't then I think we should terminally deprecate all the ZipFile
>     constructors that take a Charset argument. Not a call I can make.
> 
> 
> I mean more generally.  You're proposing to fix zip files where the 
> metadata is UTF-16 encoded.
> 
> But I suspect we aren't testing such zip files at all, and I suspect 
> it's broken in other ways, that we might discover by expanding our zip 
> tests to fully cover such zip files.

AFAICT there's a certain scarcity of bug reports of the "this zip file
can't be read by java" variety. I'm not sure what you're suggesting
here, though, and while there's surely other dragons in this code I'm
sure that's a strong argument against fixing issues as we find them?

At least until we decide to deprecate them.

I'll go sleep on this now, but I think we should probably go ahead
with your suggestion to optimize for UTF-8 - while providing a simpler,
slower but more easily checked implementation for every other encoding.

Depending on how far one goes down that path, quite a few cleanups can
be done to the UTF-8 implementation - which might be cause for
additional speed-ups.

> 
> 
>      >
>      > ASCII-compatibility is also deeply embedded in the original C code,
>      > which is still there
> 
>     Yep, the assumption go way back, but as long as it's functionality only
>     relevant to jar files it's fine in both impls.
> 
>     Is there any usage left of the C implementation these days?
> 
> 
> Bootclassloader likely doesn't use java code!
> So -Xbootclasspath/a is implemented using that C code.

That's true. Found the relevant places in hotspot that call into
libzip/zip_util. Since we can't prepend the bootclasspath any more it
seems like it could be rewritten to call java code rather than link into
libzip, though.

> 
> ---
> The number one reason to make a zip test "manual" is that it consumes 
> too many resources to run all the time.

That assumes there's someone actually running the tests from time to
time. Several of TestZipFile seem to fail even after overcoming the
OOMEs. I guess I should leave a version of it intact, file a bug to
investigate failures, then create a new slimmed down test that we can
add to automatic test coverage.

After slimming down TestZipFile it runs in a matter of seconds on my
setups, but if I only focus on the reproducer for this issue it should
be faster still.

/Claes


More information about the core-libs-dev mailing list