JarFile.getVersionedEntry scalability with new release cadence

Claes Redestad claes.redestad at oracle.com
Sat Apr 11 21:36:21 UTC 2020


Hi Eirik,

interesting idea.

I think you could tune away a significant part of that up front cost by
using JUZFA.entryNameStream(this) instead of
this.stream().map(ZipEntry::getName). This will avoid expanding each
entry into a JarEntry internally. Perhaps this gets the up-front
overhead down to more acceptable levels..?

/Claes

On 2020-04-11 22:06, Eirik Bjørsnøs wrote:
> There's an added up-front cost to scanning versions which I have also tried
> to test.
> 
> Since checkForSpecialAttributes is lazy, this can be tested by measuring
> the time taken to read the first entry of an open JarFile. For the h2 jar
> file, this seems to take ~350 microseconds without my patch, which
> increases to ~850 microseconds with the patch.
> 
> This cost applies to the first getEntry call and is then amortized over all
> following calls.
> 
> So this patch is probably not a win for use cases where very few entries
> are read.
> 
> Eirik.
> 
> On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
> 
>>
>> Lance,
>>
>> I made a small performance test. Pretty sloppy, so please don't tell
>> Aleksey S :-)
>>
>> Results indicate there may be some performance wins to be had.
>>
>> The test uses the Maven artifact com.h2database:h2:1.4.200:jar. This jar
>> which has 950 entries, of which the following three are versioned:
>>
>> META-INF/versions/10/org/h2/util/NetUtils2.class
>> META-INF/versions/9/org/h2/util/Bits.class
>> META-INF/versions/9/org/h2/util/CurrentTimestamp.class
>>
>> The performance test calls JarFile.getEntry for each of the base names
>> found in the jar. It does so 2000 times for 50 iterations and calculates
>> the average run time.
>>
>> This is done once on a JarFile opened with runtime version 15, once on a
>> JarFile opened with runtime version 8 (which effectively disables versioned
>> lookup so works as a baseline). Warmup runs are run first to get stable
>> results.
>>
>> The test is run with OpenJDK 15 built from master.
>>
>> Results:
>>
>> Average time to get 950 entries 2000 times:
>>
>> Runtime version 15: 2903 ms
>> Runtime version 8: 336 ms:
>>
>> This is shows the difference between testing seven versions (9, 10, 11,
>> 12, 13, 14, 15) and not testing versions.
>>
>> I then made a change to JarFile which scans the versions up front and
>> stores them in an int[] which is then looped over in getVersionedEntry.
>>
>> Results:
>>
>> Runtime version 15: 1048 ms
>> Runtime version 8: 315 ms:
>>
>> My benchmark is of course synthetic and does not represent reality. I have
>> not done any analysis on the shape of typical multi-versioned jars nor
>> their access patterns.
>>
>> However, an improvement of 2.5 - 3x is maybe worth taking a closer look?
>>
>> Here's the patch for my change in JarFile.java:
>>
>> Index: src/java.base/share/classes/java/util/jar/JarFile.java
>> IDEA additional info:
>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> <+>UTF-8
>> ===================================================================
>> --- src/java.base/share/classes/java/util/jar/JarFile.java (revision
>> 86722cb038d3030c51f3268799a2c3dc0c508638)
>> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date
>> 1586631626679)
>> @@ -161,7 +161,7 @@
>>       private final Runtime.Version version;  // current version
>>       private final int versionFeature;       // version.feature()
>>       private boolean isMultiRelease;         // is jar multi-release?
>> -
>> +    private int[] versions;                 // which versions does the
>> jar contain
>>       // indicates if Class-Path attribute present
>>       private boolean hasClassPathAttribute;
>>       // true if manifest checked for special attributes
>> @@ -599,12 +599,13 @@
>>       }
>>
>>       private JarEntry getVersionedEntry(String name, JarEntry je) {
>> -        if (BASE_VERSION_FEATURE < versionFeature) {
>> +        int[] versions = this.versions;
>> +        if (BASE_VERSION_FEATURE < versionFeature && versions != null &&
>> versions.length > 0) {
>>               if (!name.startsWith(META_INF)) {
>>                   // search for versioned entry
>> -                int v = versionFeature;
>> -                while (v > BASE_VERSION_FEATURE) {
>> -                    JarFileEntry vje = getEntry0(META_INF_VERSIONS + v +
>> "/" + name);
>> +                int v = versions.length - 1;
>> +                while (v >= 0) {
>> +                    JarFileEntry vje = getEntry0(META_INF_VERSIONS +
>> versions[v] + "/" + name);
>>                       if (vje != null) {
>>                           return vje.withBasename(name);
>>                       }
>> @@ -1016,9 +1017,20 @@
>>                           byte[] lbuf = new byte[512];
>>                           Attributes attr = new Attributes();
>>                           attr.read(new Manifest.FastInputStream(
>> -                            new ByteArrayInputStream(b)), lbuf);
>> -                        isMultiRelease = Boolean.parseBoolean(
>> -                            attr.getValue(Attributes.Name.MULTI_RELEASE));
>> +                                new ByteArrayInputStream(b)), lbuf);
>> +                        if(Boolean.parseBoolean(
>> +
>>   attr.getValue(Attributes.Name.MULTI_RELEASE))) {
>> +                            isMultiRelease = true;
>> +                            versions = this.stream()
>> +                                    .map(ZipEntry::getName)
>> +                                    .mapToInt(this::parseVersion)
>> +                                    .filter(v -> v != -1 && v >=
>> BASE_VERSION_FEATURE && v <= versionFeature)
>> +                                    .distinct()
>> +                                    .sorted()
>> +                                    .toArray();
>> +
>> +                        }
>> +
>>                       }
>>                   }
>>               }
>> @@ -1026,6 +1038,27 @@
>>           }
>>       }
>>
>> +    /**
>> +     * If {@code entryName} is a a versioned entry, parse and return the
>> version as an integer, otherwise return -1
>> +     */
>> +    private int parseVersion(String entryName) {
>> +        if(!entryName.startsWith(META_INF_VERSIONS)) {
>> +            return -1;
>> +        }
>> +
>> +        int separator = entryName.indexOf("/",
>> META_INF_VERSIONS.length());
>> +
>> +        if(separator == -1) {
>> +            return -1;
>> +        }
>> +
>> +        try {
>> +            return Integer.parseInt(entryName,
>> META_INF_VERSIONS.length(), separator, 10);
>> +        } catch (NumberFormatException e) {
>> +            return -1;
>> +        }
>> +    }
>> +
>>       synchronized void ensureInitialization() {
>>           try {
>>               maybeInstantiateVerifier();
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen <lance.andersen at oracle.com>
>> wrote:
>>
>>> Hi Eric
>>>
>>> Feel free to enter a feature request and better yet propose a fix :-)
>>>
>>> Have a good weekend!
>>>
>>> Best
>>> Lance
>>>
>>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
>>>
>>> I recently needed to re-implement multi-release lookup logic for a
>>> ModuleReader capable of reading modules from unpacked (exploded) jar files
>>> [1]
>>>
>>> It occurred to me that JarFile.getVersionedEntry checks _every_ version
>>> between 8 and the runtime version when looking up paths.
>>>
>>> Since META-INF/versions will probably be sparsely populated, I'm wondering
>>> if something could be done to avoid checking 20 different paths in OpenJDK
>>> 28.
>>>
>>> Perhaps scanning META-INF/versions once when opening the file could work,
>>> then only check existing versions in getVersionedEntry?
>>>
>>> Maybe a premature optimization today, but with the new release cadence,
>>> this problem is going to surface at some point in the future, right?
>>>
>>> [1]
>>> https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html
>>>
>>> Eirik.
>>>
>>>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
>>> Principal Member of Technical Staff | +1.781.442.2037
>>> Oracle Java Engineering
>>> 1 Network Drive
>>> Burlington, MA 01803
>>> Lance.Andersen at oracle.com
>>>
>>>
>>>
>>>


More information about the core-libs-dev mailing list