RFR: 8243469: Lazily encode name in ZipFile.getEntryPos

Claes Redestad claes.redestad at oracle.com
Fri Apr 24 19:22:55 UTC 2020


Hi Volker,

On 2020-04-24 20:37, Volker Simonis wrote:
> On Thu, Apr 23, 2020 at 2:35 PM Claes Redestad
> <claes.redestad at oracle.com> wrote:
>>
>> Hi,
>>
>> current implementation of ZipFile.getEntryPos takes the encoded byte[]
>> of the String we're looking up. Which means when looking up entries
>> across multiple jar files, we allocate the byte[] over and over for each
>> jar file searched.
>>
>> If we instead refactor the internal hash table to use a normalized
>> String-based hash value, we can almost always avoid both the repeated
>> encoding and (most of) the hash calculation when the entry is not found
>> in the jar/zip.
>>
>> This realizes a significant startup improvement on applications with
>> several or many jar files. A rather typical case. For example I see a
>> 25% speedup of ZipEnty.getEntry calls on a Spring PetClinic, which
>> reduces total startup time by ~120ms, or ~2.5% of total, on my setup. At
>> the same time remaining neutral on single-jar apps.
>>
>> Webrev: http://cr.openjdk.java.net/~redestad/8243469/open.00/
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8243469
>>
>> Testing: tier1-2
> 
> Hi Claes,
> 
> that's a little puzzling change but also a nice enhancement and cleanup.

first: thanks for reviewing!

Yes, I've tried to simplify and explain everything clearly, since it's a
rather delicate area and we're already stretching the complexity budget
thin.

> 
> I think it looks good. I have only two minor comments:
> 
> There's no check for "end > 0" here:
>    93         @Override
>    94         boolean hasTrailingSlash(byte[] a, int end) {
>    95             return a[end - 1] == '/';
>    96         }
> I think that's currently no real problem, but maybe put in a check for any case?

Will do.

> 
> And while you're on it, I think the following comment should be updated from:
> 
>   641     /* Checks ensureOpen() before invoke this method */
> 
> to something like:
> 
>   641     /* Check ensureOpen() before invoking this method */

Will do.

> 
> 
> I've also had a quick look at the microbenchmark which you've
> apparently only added today :)
> 
> It seems that 'getEntryHitUncached' is getting slightly slower with
> your change while all the other variants get significantly faster. I
> don't think that's a problem, but do you have an explanation why
> that's the case?

I've noticed it swing a bit either way, and have been asking myself the
same thing. After a little analysis I think it's actually a bug in my
microbenchmark: I'm always looking up the same entry, and thus hitting
the same bucket in the hash table. If that one has a collision, we'll do
a few extra passes. If not, we won't. This might be reflected as a
significant swing in either direction.

I'm going to try rewriting it to consider more (if not all) entries in
the zip file. That should mean the cost averages out a bit.

/Claes

> 
> Thanks for this nice improvement,
> Volker
> 
>>
>> Thanks!
>>
>> /Claes


More information about the core-libs-dev mailing list