RFR: 8342040: Further improve entry lookup performance for multi-release JARs [v3]

Claes Redestad redestad at openjdk.org
Fri Oct 18 13:59:33 UTC 2024


On Fri, 18 Oct 2024 09:31:45 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> With this change to `ZipFileOpen`: 
>> 
>> diff --git a/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java b/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
>> index 4f6ae6373ec..0e22cf7f0ab 100644
>> --- a/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
>> +++ b/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
>> @@ -73,6 +73,11 @@ public void beforeRun() throws IOException {
>> 
>>                  ename += entryPrefixes[i % entryPrefixes.length] + "-" + i;
>>                  zos.putNextEntry(new ZipEntry(ename));
>> +                zos.putNextEntry(new ZipEntry("META-INF/versions/9/" + ename));
>> +                zos.putNextEntry(new ZipEntry("META-INF/versions/11/" + ename));
>> +                zos.putNextEntry(new ZipEntry("META-INF/versions/17/" + ename));
>> +                zos.putNextEntry(new ZipEntry("META-INF/versions/21/" + ename));
>> +                zos.putNextEntry(new ZipEntry("META-INF/versions/24/" + ename));
>>              }
>>          }
>>          zipFile = tempFile;
>> 
>> 
>> On that modified `ZipFileOpen` micro,
>> https://github.com/eirbjo/jdk/pull/1 improves over #21489 but still regress compared to the baseline:
>> 
>> 
>> Benchmark                       (size)  Mode  Cnt        Score       Error  Units
>> ZipFileOpen.openCloseZipFile      1024  avgt   15   677386,387 ± 15761,989  ns/op # baseline
>> ZipFileOpen.openCloseZipFile      1024  avgt   15  1121127,629 ± 16716,218  ns/op # pull/21489, 0.6x
>> ZipFileOpen.openCloseZipFile      1024  avgt   15   968501,630 ± 28761,067  ns/op # eirbjo/jdk#1, 0.7x
>> 
>> 
>> I suspect remaining regression is mostly from the added `String` creation/processing in `initCEN`.
>
>> I suspect remaining regression is mostly from the added `String` creation/processing in `initCEN`.
> 
> Yes, there is a cost with the String construction when parsing the name out of `META-INF/versions/{name}`

There are a few possible strategies for avoiding that additional parse, since the effect we're getting at here is to have a quick filter to avoid pointless lookups and not necessarily an exact mapping.

One is to store the `checkedHash` rather than the full `String`. This gets `openCloseZipFile` down to ~910000 ns/op. (`checkedHash` very hot in profiles). There's still a chance for redundant lookups on hash collisions, but this should be rare.

Another is to store a `BitSet` per name length. This gets `ZipFileOpen` down to baseline level (~670000 ns/op), but increases chance of having to do redundant lookups a lot.

Both also improves footprint (not keeping each versioned entry `String` in memory would be nice).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21489#discussion_r1806242792


More information about the security-dev mailing list