RFR: 8342040: Further improve entry lookup performance for multi-release JARs [v3]
Claes Redestad
redestad at openjdk.org
Wed Oct 23 10:28:07 UTC 2024
On Fri, 18 Oct 2024 16:02:26 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:
>> src/java.base/share/classes/java/util/zip/ZipFile.java line 1796:
>>
>>> 1794: if (metaVersions == null)
>>> 1795: metaVersions = new HashMap<>();
>>> 1796: metaVersions.computeIfAbsent(hashCode, _ -> new BitSet()).set(version);
>>
>> Does this hashCode ever collide? Tried reading ZipCoder and can't rule that out. If yes, we need a fallback mechanism if this `computeIfAbsent` encountered an existing entry.
>
> Yes it may collide. This is an overapproximation of the version set where a hash collision just results in an extra lookup being performed for a version entry which does not exist for that name. See the comment on the `metaVersions` field declaration.
>
> This is a trade-off against the initial alternative proposed in this PR which was to map using the entry name string. That would minimize lookups, but it would also require allocating the String name and storing those strings in the map. So we decided a few extra lookups on hash collisions was an okay trade-off for faster parsing and less memory used.
>
> In fact, if we a use constant hash code like '1', such that _all_ hash codes collide, we get the behavior of the current mainline, where a search for any name will result in searches being performed for each version discovered in the ZIP.
In principle the `metaVersions` map acts as a filter to reduce entry lookups, so it's ok to give false positives but not false negatives.
Other alternatives such as mapping from entry length to a `BitSet` would also be semantically fine, would carry less of a footprint and per-versioned-entry-`initCEN` overhead, but might degrade to more false positives as the number of versioned entries in a jar file grows.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21489#discussion_r1812437330
More information about the security-dev
mailing list