RFR: 8342040: Further improve entry lookup performance for multi-release JARs [v3]

Jaikiran Pai jpai at openjdk.org
Wed Oct 30 16:17:12 UTC 2024


On Fri, 18 Oct 2024 13:59:32 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> Please review this PR which speeds up `JarFile::getEntry` lookup significantly for multi-release JAR files.
>> 
>> The changes in this PR are motivated by the following insights:
>> 
>> * `META-INF/versions/` is sparsely populated.
>> * Most entries are not versioned
>> * The number of unique versions for each versioned entry is small
>> * Many JAR files are 'accidentally' multi-release; they use the feature to hide `module-info.class` from Java 8.
>> 
>> Instead of  performing one lookup for every version identified in the JAR, this PR narrows the version search down to an approximation of the number of versions found for the entry being looked up, which will often be zero. This speeds up lookup for non-versioned entries, and provides a more targeted search for versioned entries.
>> 
>> An alternative approach could be to normalize the hash code to use the none-versioned name such that versioned and non-versioned names would be resolved in the same lookup. This was quickly abandoned since the code changes were intrusive and mixed too many JAR specific concerns into `ZipFile`.
>> 
>> Testing: The existing `JarFileGetEntry` benchmark is updated to optionally test a multi-release JAR file with one versioned entry for `module-info.class` plus two other versioned class files for two distinct versions. Performance results in [first comment](#issuecomment-2410901754).
>> 
>> Running `ZipFileOpen` on a multi-release JAR did not show a significat difference between this PR and mainline. 
>> 
>> The JAR and ZIP tests are run locally. GHA results green. The `noreg-perf` label is added in JBS.
>
> Eirik Bjørsnøs has updated the pull request incrementally with four additional commits since the last revision:
> 
>  - Map versions by entry name hashcode instead of by entry name. This avoids String allocation and storage
>  - Merge pull request #1 from cl4es/bitset_versions
>    
>    Use BitSet to streamline construction
>  - Fix traversal, traverse backwards to pick latest applicable version
>  - Use BitSet to streamline construction

src/java.base/share/classes/java/util/zip/ZipFile.java line 1798:

> 1796:                                 metaVersions.computeIfAbsent(hashCode, _ -> new BitSet()).set(version);
> 1797:                             } catch (Exception e) {
> 1798:                                 throw new IllegalArgumentException(e);

Hello Eirik, I'm late to this to PR. Was throwing an unspecified `IllegalArgumentException` from here intentional? Should it have been `IOException` instead?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21489#discussion_r1822968189


More information about the core-libs-dev mailing list