JarFile.getVersionedEntry scalability with new release cadence

Eirik Bjørsnøs eirbjo at gmail.com
Sat Apr 11 22:03:34 UTC 2020


If Multi-Release specified a list of versions instead if just true|false,
then scanning would not be needed at all.

Would require a spec update and updates to ecosystem build tools etc, but
perhaps a better choice in the long run since any kind of scanning will be
linear to to number of entries.

Eirik.

On Sat, Apr 11, 2020 at 11:54 PM Eirik Bjørsnøs <eirbjo at gmail.com> wrote:

> Claes,
>
> That trick did occur to me before I reported the up front costs.
>
> Without this optimization, my tests shows more like ~1050 microseconds
> instead of ~850.
>
> Perhaps further improvements is possible by extracting versions from the
> binary representation instead of going via UTF for all names.
>
> But I'm not sure it would be worth it. I don't feel I have a clear
> understanding about how "bad" this up front cost really is compared to the
> later wins in the real world.
>
> Performance is hard to reason about :-)
>
> Eirik.
>
> On Sat, Apr 11, 2020 at 11:32 PM Claes Redestad <claes.redestad at oracle.com>
> wrote:
>
>> Hi Eirik,
>>
>> interesting idea.
>>
>> I think you could tune away a significant part of that up front cost by
>> using JUZFA.entryNameStream(this) instead of
>> this.stream().map(ZipEntry::getName). This will avoid expanding each
>> entry into a JarEntry internally. Perhaps this gets the up-front
>> overhead down to more acceptable levels..?
>>
>> /Claes
>>
>> On 2020-04-11 22:06, Eirik Bjørsnøs wrote:
>> > There's an added up-front cost to scanning versions which I have also
>> tried
>> > to test.
>> >
>> > Since checkForSpecialAttributes is lazy, this can be tested by measuring
>> > the time taken to read the first entry of an open JarFile. For the h2
>> jar
>> > file, this seems to take ~350 microseconds without my patch, which
>> > increases to ~850 microseconds with the patch.
>> >
>> > This cost applies to the first getEntry call and is then amortized over
>> all
>> > following calls.
>> >
>> > So this patch is probably not a win for use cases where very few entries
>> > are read.
>> >
>> > Eirik.
>> >
>> > On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs <eirbjo at gmail.com>
>> wrote:
>> >
>> >>
>> >> Lance,
>> >>
>> >> I made a small performance test. Pretty sloppy, so please don't tell
>> >> Aleksey S :-)
>> >>
>> >> Results indicate there may be some performance wins to be had.
>> >>
>> >> The test uses the Maven artifact com.h2database:h2:1.4.200:jar. This
>> jar
>> >> which has 950 entries, of which the following three are versioned:
>> >>
>> >> META-INF/versions/10/org/h2/util/NetUtils2.class
>> >> META-INF/versions/9/org/h2/util/Bits.class
>> >> META-INF/versions/9/org/h2/util/CurrentTimestamp.class
>> >>
>> >> The performance test calls JarFile.getEntry for each of the base names
>> >> found in the jar. It does so 2000 times for 50 iterations and
>> calculates
>> >> the average run time.
>> >>
>> >> This is done once on a JarFile opened with runtime version 15, once on
>> a
>> >> JarFile opened with runtime version 8 (which effectively disables
>> versioned
>> >> lookup so works as a baseline). Warmup runs are run first to get stable
>> >> results.
>> >>
>> >> The test is run with OpenJDK 15 built from master.
>> >>
>> >> Results:
>> >>
>> >> Average time to get 950 entries 2000 times:
>> >>
>> >> Runtime version 15: 2903 ms
>> >> Runtime version 8: 336 ms:
>> >>
>> >> This is shows the difference between testing seven versions (9, 10, 11,
>> >> 12, 13, 14, 15) and not testing versions.
>> >>
>> >> I then made a change to JarFile which scans the versions up front and
>> >> stores them in an int[] which is then looped over in getVersionedEntry.
>> >>
>> >> Results:
>> >>
>> >> Runtime version 15: 1048 ms
>> >> Runtime version 8: 315 ms:
>> >>
>> >> My benchmark is of course synthetic and does not represent reality. I
>> have
>> >> not done any analysis on the shape of typical multi-versioned jars nor
>> >> their access patterns.
>> >>
>> >> However, an improvement of 2.5 - 3x is maybe worth taking a closer
>> look?
>> >>
>> >> Here's the patch for my change in JarFile.java:
>> >>
>> >> Index: src/java.base/share/classes/java/util/jar/JarFile.java
>> >> IDEA additional info:
>> >> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> >> <+>UTF-8
>> >> ===================================================================
>> >> --- src/java.base/share/classes/java/util/jar/JarFile.java (revision
>> >> 86722cb038d3030c51f3268799a2c3dc0c508638)
>> >> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date
>> >> 1586631626679)
>> >> @@ -161,7 +161,7 @@
>> >>       private final Runtime.Version version;  // current version
>> >>       private final int versionFeature;       // version.feature()
>> >>       private boolean isMultiRelease;         // is jar multi-release?
>> >> -
>> >> +    private int[] versions;                 // which versions does the
>> >> jar contain
>> >>       // indicates if Class-Path attribute present
>> >>       private boolean hasClassPathAttribute;
>> >>       // true if manifest checked for special attributes
>> >> @@ -599,12 +599,13 @@
>> >>       }
>> >>
>> >>       private JarEntry getVersionedEntry(String name, JarEntry je) {
>> >> -        if (BASE_VERSION_FEATURE < versionFeature) {
>> >> +        int[] versions = this.versions;
>> >> +        if (BASE_VERSION_FEATURE < versionFeature && versions != null
>> &&
>> >> versions.length > 0) {
>> >>               if (!name.startsWith(META_INF)) {
>> >>                   // search for versioned entry
>> >> -                int v = versionFeature;
>> >> -                while (v > BASE_VERSION_FEATURE) {
>> >> -                    JarFileEntry vje = getEntry0(META_INF_VERSIONS +
>> v +
>> >> "/" + name);
>> >> +                int v = versions.length - 1;
>> >> +                while (v >= 0) {
>> >> +                    JarFileEntry vje = getEntry0(META_INF_VERSIONS +
>> >> versions[v] + "/" + name);
>> >>                       if (vje != null) {
>> >>                           return vje.withBasename(name);
>> >>                       }
>> >> @@ -1016,9 +1017,20 @@
>> >>                           byte[] lbuf = new byte[512];
>> >>                           Attributes attr = new Attributes();
>> >>                           attr.read(new Manifest.FastInputStream(
>> >> -                            new ByteArrayInputStream(b)), lbuf);
>> >> -                        isMultiRelease = Boolean.parseBoolean(
>> >> -
>> attr.getValue(Attributes.Name.MULTI_RELEASE));
>> >> +                                new ByteArrayInputStream(b)), lbuf);
>> >> +                        if(Boolean.parseBoolean(
>> >> +
>> >>   attr.getValue(Attributes.Name.MULTI_RELEASE))) {
>> >> +                            isMultiRelease = true;
>> >> +                            versions = this.stream()
>> >> +                                    .map(ZipEntry::getName)
>> >> +                                    .mapToInt(this::parseVersion)
>> >> +                                    .filter(v -> v != -1 && v >=
>> >> BASE_VERSION_FEATURE && v <= versionFeature)
>> >> +                                    .distinct()
>> >> +                                    .sorted()
>> >> +                                    .toArray();
>> >> +
>> >> +                        }
>> >> +
>> >>                       }
>> >>                   }
>> >>               }
>> >> @@ -1026,6 +1038,27 @@
>> >>           }
>> >>       }
>> >>
>> >> +    /**
>> >> +     * If {@code entryName} is a a versioned entry, parse and return
>> the
>> >> version as an integer, otherwise return -1
>> >> +     */
>> >> +    private int parseVersion(String entryName) {
>> >> +        if(!entryName.startsWith(META_INF_VERSIONS)) {
>> >> +            return -1;
>> >> +        }
>> >> +
>> >> +        int separator = entryName.indexOf("/",
>> >> META_INF_VERSIONS.length());
>> >> +
>> >> +        if(separator == -1) {
>> >> +            return -1;
>> >> +        }
>> >> +
>> >> +        try {
>> >> +            return Integer.parseInt(entryName,
>> >> META_INF_VERSIONS.length(), separator, 10);
>> >> +        } catch (NumberFormatException e) {
>> >> +            return -1;
>> >> +        }
>> >> +    }
>> >> +
>> >>       synchronized void ensureInitialization() {
>> >>           try {
>> >>               maybeInstantiateVerifier();
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen <
>> lance.andersen at oracle.com>
>> >> wrote:
>> >>
>> >>> Hi Eric
>> >>>
>> >>> Feel free to enter a feature request and better yet propose a fix :-)
>> >>>
>> >>> Have a good weekend!
>> >>>
>> >>> Best
>> >>> Lance
>> >>>
>> >>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
>> >>>
>> >>> I recently needed to re-implement multi-release lookup logic for a
>> >>> ModuleReader capable of reading modules from unpacked (exploded) jar
>> files
>> >>> [1]
>> >>>
>> >>> It occurred to me that JarFile.getVersionedEntry checks _every_
>> version
>> >>> between 8 and the runtime version when looking up paths.
>> >>>
>> >>> Since META-INF/versions will probably be sparsely populated, I'm
>> wondering
>> >>> if something could be done to avoid checking 20 different paths in
>> OpenJDK
>> >>> 28.
>> >>>
>> >>> Perhaps scanning META-INF/versions once when opening the file could
>> work,
>> >>> then only check existing versions in getVersionedEntry?
>> >>>
>> >>> Maybe a premature optimization today, but with the new release
>> cadence,
>> >>> this problem is going to surface at some point in the future, right?
>> >>>
>> >>> [1]
>> >>>
>> https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html
>> >>>
>> >>> Eirik.
>> >>>
>> >>>
>> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance
>> Andersen|
>> >>> Principal Member of Technical Staff | +1.781.442.2037
>> >>> Oracle Java Engineering
>> >>> 1 Network Drive
>> >>> Burlington, MA 01803
>> >>> Lance.Andersen at oracle.com
>> >>>
>> >>>
>> >>>
>> >>>
>>
>


More information about the core-libs-dev mailing list