JarFile.getVersionedEntry scalability with new release cadence

Eirik Bjørsnøs eirbjo at gmail.com
Sat Apr 11 20:06:49 UTC 2020


There's an added up-front cost to scanning versions which I have also tried
to test.

Since checkForSpecialAttributes is lazy, this can be tested by measuring
the time taken to read the first entry of an open JarFile. For the h2 jar
file, this seems to take ~350 microseconds without my patch, which
increases to ~850 microseconds with the patch.

This cost applies to the first getEntry call and is then amortized over all
following calls.

So this patch is probably not a win for use cases where very few entries
are read.

Eirik.

On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs <eirbjo at gmail.com> wrote:

>
> Lance,
>
> I made a small performance test. Pretty sloppy, so please don't tell
> Aleksey S :-)
>
> Results indicate there may be some performance wins to be had.
>
> The test uses the Maven artifact com.h2database:h2:1.4.200:jar. This jar
> which has 950 entries, of which the following three are versioned:
>
> META-INF/versions/10/org/h2/util/NetUtils2.class
> META-INF/versions/9/org/h2/util/Bits.class
> META-INF/versions/9/org/h2/util/CurrentTimestamp.class
>
> The performance test calls JarFile.getEntry for each of the base names
> found in the jar. It does so 2000 times for 50 iterations and calculates
> the average run time.
>
> This is done once on a JarFile opened with runtime version 15, once on a
> JarFile opened with runtime version 8 (which effectively disables versioned
> lookup so works as a baseline). Warmup runs are run first to get stable
> results.
>
> The test is run with OpenJDK 15 built from master.
>
> Results:
>
> Average time to get 950 entries 2000 times:
>
> Runtime version 15: 2903 ms
> Runtime version 8: 336 ms:
>
> This is shows the difference between testing seven versions (9, 10, 11,
> 12, 13, 14, 15) and not testing versions.
>
> I then made a change to JarFile which scans the versions up front and
> stores them in an int[] which is then looped over in getVersionedEntry.
>
> Results:
>
> Runtime version 15: 1048 ms
> Runtime version 8: 315 ms:
>
> My benchmark is of course synthetic and does not represent reality. I have
> not done any analysis on the shape of typical multi-versioned jars nor
> their access patterns.
>
> However, an improvement of 2.5 - 3x is maybe worth taking a closer look?
>
> Here's the patch for my change in JarFile.java:
>
> Index: src/java.base/share/classes/java/util/jar/JarFile.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- src/java.base/share/classes/java/util/jar/JarFile.java (revision
> 86722cb038d3030c51f3268799a2c3dc0c508638)
> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date
> 1586631626679)
> @@ -161,7 +161,7 @@
>      private final Runtime.Version version;  // current version
>      private final int versionFeature;       // version.feature()
>      private boolean isMultiRelease;         // is jar multi-release?
> -
> +    private int[] versions;                 // which versions does the
> jar contain
>      // indicates if Class-Path attribute present
>      private boolean hasClassPathAttribute;
>      // true if manifest checked for special attributes
> @@ -599,12 +599,13 @@
>      }
>
>      private JarEntry getVersionedEntry(String name, JarEntry je) {
> -        if (BASE_VERSION_FEATURE < versionFeature) {
> +        int[] versions = this.versions;
> +        if (BASE_VERSION_FEATURE < versionFeature && versions != null &&
> versions.length > 0) {
>              if (!name.startsWith(META_INF)) {
>                  // search for versioned entry
> -                int v = versionFeature;
> -                while (v > BASE_VERSION_FEATURE) {
> -                    JarFileEntry vje = getEntry0(META_INF_VERSIONS + v +
> "/" + name);
> +                int v = versions.length - 1;
> +                while (v >= 0) {
> +                    JarFileEntry vje = getEntry0(META_INF_VERSIONS +
> versions[v] + "/" + name);
>                      if (vje != null) {
>                          return vje.withBasename(name);
>                      }
> @@ -1016,9 +1017,20 @@
>                          byte[] lbuf = new byte[512];
>                          Attributes attr = new Attributes();
>                          attr.read(new Manifest.FastInputStream(
> -                            new ByteArrayInputStream(b)), lbuf);
> -                        isMultiRelease = Boolean.parseBoolean(
> -                            attr.getValue(Attributes.Name.MULTI_RELEASE));
> +                                new ByteArrayInputStream(b)), lbuf);
> +                        if(Boolean.parseBoolean(
> +
>  attr.getValue(Attributes.Name.MULTI_RELEASE))) {
> +                            isMultiRelease = true;
> +                            versions = this.stream()
> +                                    .map(ZipEntry::getName)
> +                                    .mapToInt(this::parseVersion)
> +                                    .filter(v -> v != -1 && v >=
> BASE_VERSION_FEATURE && v <= versionFeature)
> +                                    .distinct()
> +                                    .sorted()
> +                                    .toArray();
> +
> +                        }
> +
>                      }
>                  }
>              }
> @@ -1026,6 +1038,27 @@
>          }
>      }
>
> +    /**
> +     * If {@code entryName} is a a versioned entry, parse and return the
> version as an integer, otherwise return -1
> +     */
> +    private int parseVersion(String entryName) {
> +        if(!entryName.startsWith(META_INF_VERSIONS)) {
> +            return -1;
> +        }
> +
> +        int separator = entryName.indexOf("/",
> META_INF_VERSIONS.length());
> +
> +        if(separator == -1) {
> +            return -1;
> +        }
> +
> +        try {
> +            return Integer.parseInt(entryName,
> META_INF_VERSIONS.length(), separator, 10);
> +        } catch (NumberFormatException e) {
> +            return -1;
> +        }
> +    }
> +
>      synchronized void ensureInitialization() {
>          try {
>              maybeInstantiateVerifier();
>
>
>
>
>
>
>
>
>
>
>
>
> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen <lance.andersen at oracle.com>
> wrote:
>
>> Hi Eric
>>
>> Feel free to enter a feature request and better yet propose a fix :-)
>>
>> Have a good weekend!
>>
>> Best
>> Lance
>>
>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
>>
>> I recently needed to re-implement multi-release lookup logic for a
>> ModuleReader capable of reading modules from unpacked (exploded) jar files
>> [1]
>>
>> It occurred to me that JarFile.getVersionedEntry checks _every_ version
>> between 8 and the runtime version when looking up paths.
>>
>> Since META-INF/versions will probably be sparsely populated, I'm wondering
>> if something could be done to avoid checking 20 different paths in OpenJDK
>> 28.
>>
>> Perhaps scanning META-INF/versions once when opening the file could work,
>> then only check existing versions in getVersionedEntry?
>>
>> Maybe a premature optimization today, but with the new release cadence,
>> this problem is going to surface at some point in the future, right?
>>
>> [1]
>> https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html
>>
>> Eirik.
>>
>>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
>> Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com
>>
>>
>>
>>


More information about the core-libs-dev mailing list