JarFile.getVersionedEntry scalability with new release cadence
Claes Redestad
claes.redestad at oracle.com
Sat Apr 11 22:24:39 UTC 2020
Agree that declaring which versions are available in a manifest
attribute would be the best solution from a performance point of view,
since there'd be no scanning at all. Seems like an optional attribute
that should be relatively easy to add and cause no adverse effects on
unaware JDK versions.
/Claes
On 2020-04-12 00:03, Eirik Bjørsnøs wrote:
>
> If Multi-Release specified a list of versions instead if just
> true|false, then scanning would not be needed at all.
>
> Would require a spec update and updates to ecosystem build tools etc,
> but perhaps a better choice in the long run since any kind of scanning
> will be linear to to number of entries.
>
> Eirik.
>
> On Sat, Apr 11, 2020 at 11:54 PM Eirik Bjørsnøs <eirbjo at gmail.com
> <mailto:eirbjo at gmail.com>> wrote:
>
> Claes,
>
> That trick did occur to me before I reported the up front costs.
>
> Without this optimization, my tests shows more like ~1050
> microseconds instead of ~850.
>
> Perhaps further improvements is possible by extracting versions from
> the binary representation instead of going via UTF for all names.
>
> But I'm not sure it would be worth it. I don't feel I have a clear
> understanding about how "bad" this up front cost really is compared
> to the later wins in the real world.
>
> Performance is hard to reason about :-)
>
> Eirik.
>
> On Sat, Apr 11, 2020 at 11:32 PM Claes Redestad
> <claes.redestad at oracle.com <mailto:claes.redestad at oracle.com>> wrote:
>
> Hi Eirik,
>
> interesting idea.
>
> I think you could tune away a significant part of that up front
> cost by
> using JUZFA.entryNameStream(this) instead of
> this.stream().map(ZipEntry::getName). This will avoid expanding each
> entry into a JarEntry internally. Perhaps this gets the up-front
> overhead down to more acceptable levels..?
>
> /Claes
>
> On 2020-04-11 22:06, Eirik Bjørsnøs wrote:
> > There's an added up-front cost to scanning versions which I
> have also tried
> > to test.
> >
> > Since checkForSpecialAttributes is lazy, this can be tested
> by measuring
> > the time taken to read the first entry of an open JarFile.
> For the h2 jar
> > file, this seems to take ~350 microseconds without my patch,
> which
> > increases to ~850 microseconds with the patch.
> >
> > This cost applies to the first getEntry call and is then
> amortized over all
> > following calls.
> >
> > So this patch is probably not a win for use cases where very
> few entries
> > are read.
> >
> > Eirik.
> >
> > On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs
> <eirbjo at gmail.com <mailto:eirbjo at gmail.com>> wrote:
> >
> >>
> >> Lance,
> >>
> >> I made a small performance test. Pretty sloppy, so please
> don't tell
> >> Aleksey S :-)
> >>
> >> Results indicate there may be some performance wins to be had.
> >>
> >> The test uses the Maven artifact
> com.h2database:h2:1.4.200:jar. This jar
> >> which has 950 entries, of which the following three are
> versioned:
> >>
> >> META-INF/versions/10/org/h2/util/NetUtils2.class
> >> META-INF/versions/9/org/h2/util/Bits.class
> >> META-INF/versions/9/org/h2/util/CurrentTimestamp.class
> >>
> >> The performance test calls JarFile.getEntry for each of the
> base names
> >> found in the jar. It does so 2000 times for 50 iterations
> and calculates
> >> the average run time.
> >>
> >> This is done once on a JarFile opened with runtime version
> 15, once on a
> >> JarFile opened with runtime version 8 (which effectively
> disables versioned
> >> lookup so works as a baseline). Warmup runs are run first to
> get stable
> >> results.
> >>
> >> The test is run with OpenJDK 15 built from master.
> >>
> >> Results:
> >>
> >> Average time to get 950 entries 2000 times:
> >>
> >> Runtime version 15: 2903 ms
> >> Runtime version 8: 336 ms:
> >>
> >> This is shows the difference between testing seven versions
> (9, 10, 11,
> >> 12, 13, 14, 15) and not testing versions.
> >>
> >> I then made a change to JarFile which scans the versions up
> front and
> >> stores them in an int[] which is then looped over in
> getVersionedEntry.
> >>
> >> Results:
> >>
> >> Runtime version 15: 1048 ms
> >> Runtime version 8: 315 ms:
> >>
> >> My benchmark is of course synthetic and does not represent
> reality. I have
> >> not done any analysis on the shape of typical
> multi-versioned jars nor
> >> their access patterns.
> >>
> >> However, an improvement of 2.5 - 3x is maybe worth taking a
> closer look?
> >>
> >> Here's the patch for my change in JarFile.java:
> >>
> >> Index: src/java.base/share/classes/java/util/jar/JarFile.java
> >> IDEA additional info:
> >> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >> <+>UTF-8
> >>
> ===================================================================
> >> --- src/java.base/share/classes/java/util/jar/JarFile.java
> (revision
> >> 86722cb038d3030c51f3268799a2c3dc0c508638)
> >> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date
> >> 1586631626679)
> >> @@ -161,7 +161,7 @@
> >> private final Runtime.Version version; // current version
> >> private final int versionFeature; //
> version.feature()
> >> private boolean isMultiRelease; // is jar
> multi-release?
> >> -
> >> + private int[] versions; // which
> versions does the
> >> jar contain
> >> // indicates if Class-Path attribute present
> >> private boolean hasClassPathAttribute;
> >> // true if manifest checked for special attributes
> >> @@ -599,12 +599,13 @@
> >> }
> >>
> >> private JarEntry getVersionedEntry(String name,
> JarEntry je) {
> >> - if (BASE_VERSION_FEATURE < versionFeature) {
> >> + int[] versions = this.versions;
> >> + if (BASE_VERSION_FEATURE < versionFeature &&
> versions != null &&
> >> versions.length > 0) {
> >> if (!name.startsWith(META_INF)) {
> >> // search for versioned entry
> >> - int v = versionFeature;
> >> - while (v > BASE_VERSION_FEATURE) {
> >> - JarFileEntry vje =
> getEntry0(META_INF_VERSIONS + v +
> >> "/" + name);
> >> + int v = versions.length - 1;
> >> + while (v >= 0) {
> >> + JarFileEntry vje =
> getEntry0(META_INF_VERSIONS +
> >> versions[v] + "/" + name);
> >> if (vje != null) {
> >> return vje.withBasename(name);
> >> }
> >> @@ -1016,9 +1017,20 @@
> >> byte[] lbuf = new byte[512];
> >> Attributes attr = new Attributes();
> >> attr.read(new
> Manifest.FastInputStream(
> >> - new ByteArrayInputStream(b)),
> lbuf);
> >> - isMultiRelease = Boolean.parseBoolean(
> >> -
> attr.getValue(Attributes.Name.MULTI_RELEASE));
> >> + new
> ByteArrayInputStream(b)), lbuf);
> >> + if(Boolean.parseBoolean(
> >> +
> >> attr.getValue(Attributes.Name.MULTI_RELEASE))) {
> >> + isMultiRelease = true;
> >> + versions = this.stream()
> >> + .map(ZipEntry::getName)
> >> +
> .mapToInt(this::parseVersion)
> >> + .filter(v -> v != -1 &&
> v >=
> >> BASE_VERSION_FEATURE && v <= versionFeature)
> >> + .distinct()
> >> + .sorted()
> >> + .toArray();
> >> +
> >> + }
> >> +
> >> }
> >> }
> >> }
> >> @@ -1026,6 +1038,27 @@
> >> }
> >> }
> >>
> >> + /**
> >> + * If {@code entryName} is a a versioned entry, parse
> and return the
> >> version as an integer, otherwise return -1
> >> + */
> >> + private int parseVersion(String entryName) {
> >> + if(!entryName.startsWith(META_INF_VERSIONS)) {
> >> + return -1;
> >> + }
> >> +
> >> + int separator = entryName.indexOf("/",
> >> META_INF_VERSIONS.length());
> >> +
> >> + if(separator == -1) {
> >> + return -1;
> >> + }
> >> +
> >> + try {
> >> + return Integer.parseInt(entryName,
> >> META_INF_VERSIONS.length(), separator, 10);
> >> + } catch (NumberFormatException e) {
> >> + return -1;
> >> + }
> >> + }
> >> +
> >> synchronized void ensureInitialization() {
> >> try {
> >> maybeInstantiateVerifier();
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen
> <lance.andersen at oracle.com <mailto:lance.andersen at oracle.com>>
> >> wrote:
> >>
> >>> Hi Eric
> >>>
> >>> Feel free to enter a feature request and better yet propose
> a fix :-)
> >>>
> >>> Have a good weekend!
> >>>
> >>> Best
> >>> Lance
> >>>
> >>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs
> <eirbjo at gmail.com <mailto:eirbjo at gmail.com>> wrote:
> >>>
> >>> I recently needed to re-implement multi-release lookup
> logic for a
> >>> ModuleReader capable of reading modules from unpacked
> (exploded) jar files
> >>> [1]
> >>>
> >>> It occurred to me that JarFile.getVersionedEntry checks
> _every_ version
> >>> between 8 and the runtime version when looking up paths.
> >>>
> >>> Since META-INF/versions will probably be sparsely
> populated, I'm wondering
> >>> if something could be done to avoid checking 20 different
> paths in OpenJDK
> >>> 28.
> >>>
> >>> Perhaps scanning META-INF/versions once when opening the
> file could work,
> >>> then only check existing versions in getVersionedEntry?
> >>>
> >>> Maybe a premature optimization today, but with the new
> release cadence,
> >>> this problem is going to surface at some point in the
> future, right?
> >>>
> >>> [1]
> >>>
> https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html
> >>>
> >>> Eirik.
> >>>
> >>>
> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> >>>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance
> Andersen|
> >>> Principal Member of Technical Staff | +1.781.442.2037
> >>> Oracle Java Engineering
> >>> 1 Network Drive
> >>> Burlington, MA 01803
> >>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
> >>>
> >>>
> >>>
> >>>
>
More information about the core-libs-dev
mailing list