RFR: 8243469: Lazily encode name in ZipFile.getEntryPos
Claes Redestad
claes.redestad at oracle.com
Fri Apr 24 19:22:55 UTC 2020
Hi Volker,
On 2020-04-24 20:37, Volker Simonis wrote:
> On Thu, Apr 23, 2020 at 2:35 PM Claes Redestad
> <claes.redestad at oracle.com> wrote:
>>
>> Hi,
>>
>> current implementation of ZipFile.getEntryPos takes the encoded byte[]
>> of the String we're looking up. Which means when looking up entries
>> across multiple jar files, we allocate the byte[] over and over for each
>> jar file searched.
>>
>> If we instead refactor the internal hash table to use a normalized
>> String-based hash value, we can almost always avoid both the repeated
>> encoding and (most of) the hash calculation when the entry is not found
>> in the jar/zip.
>>
>> This realizes a significant startup improvement on applications with
>> several or many jar files. A rather typical case. For example I see a
>> 25% speedup of ZipEnty.getEntry calls on a Spring PetClinic, which
>> reduces total startup time by ~120ms, or ~2.5% of total, on my setup. At
>> the same time remaining neutral on single-jar apps.
>>
>> Webrev: http://cr.openjdk.java.net/~redestad/8243469/open.00/
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8243469
>>
>> Testing: tier1-2
>
> Hi Claes,
>
> that's a little puzzling change but also a nice enhancement and cleanup.
first: thanks for reviewing!
Yes, I've tried to simplify and explain everything clearly, since it's a
rather delicate area and we're already stretching the complexity budget
thin.
>
> I think it looks good. I have only two minor comments:
>
> There's no check for "end > 0" here:
> 93 @Override
> 94 boolean hasTrailingSlash(byte[] a, int end) {
> 95 return a[end - 1] == '/';
> 96 }
> I think that's currently no real problem, but maybe put in a check for any case?
Will do.
>
> And while you're on it, I think the following comment should be updated from:
>
> 641 /* Checks ensureOpen() before invoke this method */
>
> to something like:
>
> 641 /* Check ensureOpen() before invoking this method */
Will do.
>
>
> I've also had a quick look at the microbenchmark which you've
> apparently only added today :)
>
> It seems that 'getEntryHitUncached' is getting slightly slower with
> your change while all the other variants get significantly faster. I
> don't think that's a problem, but do you have an explanation why
> that's the case?
I've noticed it swing a bit either way, and have been asking myself the
same thing. After a little analysis I think it's actually a bug in my
microbenchmark: I'm always looking up the same entry, and thus hitting
the same bucket in the hash table. If that one has a collision, we'll do
a few extra passes. If not, we won't. This might be reflected as a
significant swing in either direction.
I'm going to try rewriting it to consider more (if not all) entries in
the zip file. That should mean the cost averages out a bit.
/Claes
>
> Thanks for this nice improvement,
> Volker
>
>>
>> Thanks!
>>
>> /Claes
More information about the core-libs-dev
mailing list