RFR: 8243469: Lazily encode name in ZipFile.getEntryPos
Claes Redestad
claes.redestad at oracle.com
Mon Apr 27 10:11:38 UTC 2020
On 2020-04-27 11:49, Volker Simonis wrote:
> On Sun, Apr 26, 2020 at 11:34 PM Claes Redestad
> <claes.redestad at oracle.com> wrote:
>>
>> Hi again,
>>
>> On 2020-04-24 21:22, Claes Redestad wrote:
>>>> It seems that 'getEntryHitUncached' is getting slightly slower with
>>>> your change while all the other variants get significantly faster. I
>>>> don't think that's a problem, but do you have an explanation why
>>>> that's the case?
>>>
>>> I've noticed it swing a bit either way, and have been asking myself the
>>> same thing. After a little analysis I think it's actually a bug in my
>>> microbenchmark: I'm always looking up the same entry, and thus hitting
>>> the same bucket in the hash table. If that one has a collision, we'll do
>>> a few extra passes. If not, we won't. This might be reflected as a
>>> significant swing in either direction.
>>>
>>> I'm going to try rewriting it to consider more (if not all) entries in
>>> the zip file. That should mean the cost averages out a bit.
>>
>> after I improved my micro to root out sources of variance, the
>> performance issue for hits persisted.
>>
>> Luckily Eirik had a brilliant idea: Why not decode the bytes in the
>> cen to a String and compare that, rather than the other way around?
>> To some surprise it turns out this gives us about a ~1.2x speedup for
>> getEntryHit and getEntryHitUncached over open.00 - and comfortably
>> just ahead of the baseline on getEntryHitUncached[1]. It also leads to
>> slightly cleaner code[2].
>>
>> Webrev: http://cr.openjdk.java.net/~redestad/8243469/open.01/
>>
>> The speed-up appears to come from String.equals, which is intrinsified
>> and significantly faster than the replaced loop. I profiled allocation
>> per operation and it stays the same (EA removes the String).
>>
>
> Great! Another nice improvement. The changes look good to me.
Thanks!
> Following just two minor remarks:
>
> In ZipCoder.normalizedHashDecode() you've changed the line:
>
> if (limit > 0 && decoded[limit - 1] != '/') {
>
> to:
>
> if (limit > pos && decoded[limit - 1] != '/') {
>
> which was first a little confusing to me. But in the end it turns out
> that this is semantically the same, because the
> CharsetDecoder.decode() method called before is guaranteed to return a
> "newly-allocated character buffer" and its "position will be zero and
> its limit will follow the last character written". This also explains
> why you don't have to take the CharBuffer's "arrayOffset()" into
> account if you use the CharBuffer's backing array (because it will
> always be 0 for newly created buffers). So maybe you can put in some
> comments to make it less confusing for the ingenuous reader:
>
> CharBuffer cb = decoder().decode(ByteBuffer.wrap(a, off, end - off));
> // 'cb' is a newly allocated CharBuffer with 'pos == 0'
> int pos = cb.position();
> int limit = cb.limit();
> char[] decoded;
> if (cb.hasArray()) {
> // 'cb.arrayOffset()' is zero for newly allocated CharBuffers
> decoded = cb.array();
> } else {
> decoded = new char[limit - pos];
> cb.get(decoded);
> }
>
> I think you can also remove the "else" branch (and maybe replace it
> with an assertion) because newly allocated CharBuffers are guaranteed
> to be backed by an array with array offset zero (see
> https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/nio/CharBuffer.html#allocate(int)
> ).
Yes, it does seem the specification is pretty strong here and we can
assume that pos == 0, arrayOffset == 0 and cb.hasArray() == true.
I'll simplify to:
// 'cb' is a newly allocated CharBuffer with 'pos == 0',
// 'arrayOffset == 0', backed by an array.
CharBuffer cb = decoder().decode(ByteBuffer.wrap(a, off,
end - off));
int limit = cb.limit();
char[] decoded = cb.array();
for (int i = 0; i < limit; i++) {
h = 31 * h + decoded[i];
}
if (limit > 0 && decoded[limit - 1] != '/') {
h = 31 * h + '/';
}
An assert seems like overkill.
>
> Zipcoder.get() seems to be the only remaining if block without braces.
> Maybe you'll wnat to fix that once your on it?
>
> public static ZipCoder get(Charset charset) {
> if (charset == UTF_8.INSTANCE)
> return UTF8;
> return new ZipCoder(charset);
Sure.
/Claes
>
> Thumbs up from my side. There's no need for a new webrev from my side.
>
> Best regards,
> Volker
>
>> Testing: tier1-4
>>
>> Thanks!
>>
>> /Claes
>>
>> [1]
>> Baseline:
>> Benchmark (size) Mode Cnt Score Error Units
>> ZipFileGetEntry.getEntryHit 512 avgt 15 126.264 ± 5.297
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024 avgt 15 130.823 ± 7.212
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 152.149 ± 4.978
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 151.527 ± 4.054
>> ns/op
>>
>> open.01:
>> Benchmark (size) Mode Cnt Score Error
>> Units
>> ZipFileGetEntry.getEntryHit 512 avgt 15 84.450 ± 5.474
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024 avgt 15 85.224 ± 3.776
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 140.448 ± 4.667
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 145.046 ± 7.363
>>
>> [2] I stopped short of taking the cleanup a step further by decoding to
>> String even in initCEN, which sadly isn't performance neutral:
>>
>> http://cr.openjdk.java.net/~redestad/8243469/open.01.init_decode/
>>
>> Something for the future to consider, maybe.
More information about the core-libs-dev
mailing list