JarFile.getVersionedEntry scalability with new release cadence
Eirik Bjørsnøs
eirbjo at gmail.com
Sat Apr 11 21:54:26 UTC 2020
Claes,
That trick did occur to me before I reported the up front costs.
Without this optimization, my tests shows more like ~1050 microseconds
instead of ~850.
Perhaps further improvements is possible by extracting versions from the
binary representation instead of going via UTF for all names.
But I'm not sure it would be worth it. I don't feel I have a clear
understanding about how "bad" this up front cost really is compared to the
later wins in the real world.
Performance is hard to reason about :-)
Eirik.
On Sat, Apr 11, 2020 at 11:32 PM Claes Redestad <claes.redestad at oracle.com>
wrote:
> Hi Eirik,
>
> interesting idea.
>
> I think you could tune away a significant part of that up front cost by
> using JUZFA.entryNameStream(this) instead of
> this.stream().map(ZipEntry::getName). This will avoid expanding each
> entry into a JarEntry internally. Perhaps this gets the up-front
> overhead down to more acceptable levels..?
>
> /Claes
>
> On 2020-04-11 22:06, Eirik Bjørsnøs wrote:
> > There's an added up-front cost to scanning versions which I have also
> tried
> > to test.
> >
> > Since checkForSpecialAttributes is lazy, this can be tested by measuring
> > the time taken to read the first entry of an open JarFile. For the h2 jar
> > file, this seems to take ~350 microseconds without my patch, which
> > increases to ~850 microseconds with the patch.
> >
> > This cost applies to the first getEntry call and is then amortized over
> all
> > following calls.
> >
> > So this patch is probably not a win for use cases where very few entries
> > are read.
> >
> > Eirik.
> >
> > On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
> >
> >>
> >> Lance,
> >>
> >> I made a small performance test. Pretty sloppy, so please don't tell
> >> Aleksey S :-)
> >>
> >> Results indicate there may be some performance wins to be had.
> >>
> >> The test uses the Maven artifact com.h2database:h2:1.4.200:jar. This jar
> >> which has 950 entries, of which the following three are versioned:
> >>
> >> META-INF/versions/10/org/h2/util/NetUtils2.class
> >> META-INF/versions/9/org/h2/util/Bits.class
> >> META-INF/versions/9/org/h2/util/CurrentTimestamp.class
> >>
> >> The performance test calls JarFile.getEntry for each of the base names
> >> found in the jar. It does so 2000 times for 50 iterations and calculates
> >> the average run time.
> >>
> >> This is done once on a JarFile opened with runtime version 15, once on a
> >> JarFile opened with runtime version 8 (which effectively disables
> versioned
> >> lookup so works as a baseline). Warmup runs are run first to get stable
> >> results.
> >>
> >> The test is run with OpenJDK 15 built from master.
> >>
> >> Results:
> >>
> >> Average time to get 950 entries 2000 times:
> >>
> >> Runtime version 15: 2903 ms
> >> Runtime version 8: 336 ms:
> >>
> >> This is shows the difference between testing seven versions (9, 10, 11,
> >> 12, 13, 14, 15) and not testing versions.
> >>
> >> I then made a change to JarFile which scans the versions up front and
> >> stores them in an int[] which is then looped over in getVersionedEntry.
> >>
> >> Results:
> >>
> >> Runtime version 15: 1048 ms
> >> Runtime version 8: 315 ms:
> >>
> >> My benchmark is of course synthetic and does not represent reality. I
> have
> >> not done any analysis on the shape of typical multi-versioned jars nor
> >> their access patterns.
> >>
> >> However, an improvement of 2.5 - 3x is maybe worth taking a closer look?
> >>
> >> Here's the patch for my change in JarFile.java:
> >>
> >> Index: src/java.base/share/classes/java/util/jar/JarFile.java
> >> IDEA additional info:
> >> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >> <+>UTF-8
> >> ===================================================================
> >> --- src/java.base/share/classes/java/util/jar/JarFile.java (revision
> >> 86722cb038d3030c51f3268799a2c3dc0c508638)
> >> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date
> >> 1586631626679)
> >> @@ -161,7 +161,7 @@
> >> private final Runtime.Version version; // current version
> >> private final int versionFeature; // version.feature()
> >> private boolean isMultiRelease; // is jar multi-release?
> >> -
> >> + private int[] versions; // which versions does the
> >> jar contain
> >> // indicates if Class-Path attribute present
> >> private boolean hasClassPathAttribute;
> >> // true if manifest checked for special attributes
> >> @@ -599,12 +599,13 @@
> >> }
> >>
> >> private JarEntry getVersionedEntry(String name, JarEntry je) {
> >> - if (BASE_VERSION_FEATURE < versionFeature) {
> >> + int[] versions = this.versions;
> >> + if (BASE_VERSION_FEATURE < versionFeature && versions != null
> &&
> >> versions.length > 0) {
> >> if (!name.startsWith(META_INF)) {
> >> // search for versioned entry
> >> - int v = versionFeature;
> >> - while (v > BASE_VERSION_FEATURE) {
> >> - JarFileEntry vje = getEntry0(META_INF_VERSIONS + v
> +
> >> "/" + name);
> >> + int v = versions.length - 1;
> >> + while (v >= 0) {
> >> + JarFileEntry vje = getEntry0(META_INF_VERSIONS +
> >> versions[v] + "/" + name);
> >> if (vje != null) {
> >> return vje.withBasename(name);
> >> }
> >> @@ -1016,9 +1017,20 @@
> >> byte[] lbuf = new byte[512];
> >> Attributes attr = new Attributes();
> >> attr.read(new Manifest.FastInputStream(
> >> - new ByteArrayInputStream(b)), lbuf);
> >> - isMultiRelease = Boolean.parseBoolean(
> >> -
> attr.getValue(Attributes.Name.MULTI_RELEASE));
> >> + new ByteArrayInputStream(b)), lbuf);
> >> + if(Boolean.parseBoolean(
> >> +
> >> attr.getValue(Attributes.Name.MULTI_RELEASE))) {
> >> + isMultiRelease = true;
> >> + versions = this.stream()
> >> + .map(ZipEntry::getName)
> >> + .mapToInt(this::parseVersion)
> >> + .filter(v -> v != -1 && v >=
> >> BASE_VERSION_FEATURE && v <= versionFeature)
> >> + .distinct()
> >> + .sorted()
> >> + .toArray();
> >> +
> >> + }
> >> +
> >> }
> >> }
> >> }
> >> @@ -1026,6 +1038,27 @@
> >> }
> >> }
> >>
> >> + /**
> >> + * If {@code entryName} is a a versioned entry, parse and return
> the
> >> version as an integer, otherwise return -1
> >> + */
> >> + private int parseVersion(String entryName) {
> >> + if(!entryName.startsWith(META_INF_VERSIONS)) {
> >> + return -1;
> >> + }
> >> +
> >> + int separator = entryName.indexOf("/",
> >> META_INF_VERSIONS.length());
> >> +
> >> + if(separator == -1) {
> >> + return -1;
> >> + }
> >> +
> >> + try {
> >> + return Integer.parseInt(entryName,
> >> META_INF_VERSIONS.length(), separator, 10);
> >> + } catch (NumberFormatException e) {
> >> + return -1;
> >> + }
> >> + }
> >> +
> >> synchronized void ensureInitialization() {
> >> try {
> >> maybeInstantiateVerifier();
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen <
> lance.andersen at oracle.com>
> >> wrote:
> >>
> >>> Hi Eric
> >>>
> >>> Feel free to enter a feature request and better yet propose a fix :-)
> >>>
> >>> Have a good weekend!
> >>>
> >>> Best
> >>> Lance
> >>>
> >>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
> >>>
> >>> I recently needed to re-implement multi-release lookup logic for a
> >>> ModuleReader capable of reading modules from unpacked (exploded) jar
> files
> >>> [1]
> >>>
> >>> It occurred to me that JarFile.getVersionedEntry checks _every_ version
> >>> between 8 and the runtime version when looking up paths.
> >>>
> >>> Since META-INF/versions will probably be sparsely populated, I'm
> wondering
> >>> if something could be done to avoid checking 20 different paths in
> OpenJDK
> >>> 28.
> >>>
> >>> Perhaps scanning META-INF/versions once when opening the file could
> work,
> >>> then only check existing versions in getVersionedEntry?
> >>>
> >>> Maybe a premature optimization today, but with the new release cadence,
> >>> this problem is going to surface at some point in the future, right?
> >>>
> >>> [1]
> >>>
> https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html
> >>>
> >>> Eirik.
> >>>
> >>>
> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance
> Andersen|
> >>> Principal Member of Technical Staff | +1.781.442.2037
> >>> Oracle Java Engineering
> >>> 1 Network Drive
> >>> Burlington, MA 01803
> >>> Lance.Andersen at oracle.com
> >>>
> >>>
> >>>
> >>>
>
More information about the core-libs-dev
mailing list