JarFile.getVersionedEntry scalability with new release cadence
Claes Redestad
claes.redestad at oracle.com
Sat Apr 11 21:36:21 UTC 2020
Hi Eirik,
interesting idea.
I think you could tune away a significant part of that up-front cost by
using JUZFA.entryNameStream(this) instead of
this.stream().map(ZipEntry::getName). This will avoid expanding each
entry into a JarEntry internally. Perhaps this gets the up-front
overhead down to more acceptable levels..?
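
For illustration, a minimal sketch of the version scan when it is fed plain
entry names rather than JarEntry objects. The class and method names
(VersionScan, scanVersions) are made up for this example, and the names come
from a hard-coded Stream; inside JarFile they would presumably come from
JUZFA.entryNameStream(this), which is JDK-internal and not callable from
ordinary code. The filter keeps only versions above the base version, which
mirrors the existing loop in getVersionedEntry (while (v >
BASE_VERSION_FEATURE)) but differs from the patch's v >= BASE_VERSION_FEATURE,
so treat that as an assumption.

```java
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical helper, not part of the JDK: collects the versions present
// under META-INF/versions/ from a stream of entry names.
final class VersionScan {
    private static final String META_INF_VERSIONS = "META-INF/versions/";

    /** Returns the version encoded in a versioned entry name, or -1 if there is none. */
    static int parseVersion(String entryName) {
        if (!entryName.startsWith(META_INF_VERSIONS)) {
            return -1;
        }
        int separator = entryName.indexOf('/', META_INF_VERSIONS.length());
        if (separator == -1) {
            return -1;
        }
        try {
            // Parses the digits between the prefix and the next '/' without
            // allocating a substring (available since Java 9).
            return Integer.parseInt(entryName, META_INF_VERSIONS.length(), separator, 10);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns the distinct versions in (baseFeature, runtimeFeature], sorted ascending. */
    static int[] scanVersions(Stream<String> entryNames, int baseFeature, int runtimeFeature) {
        return entryNames
                .mapToInt(VersionScan::parseVersion)
                .filter(v -> v > baseFeature && v <= runtimeFeature)
                .distinct()
                .sorted()
                .toArray();
    }

    public static void main(String[] args) {
        // Entry names modelled on the versioned entries listed for the h2 jar.
        Stream<String> names = Stream.of(
                "META-INF/MANIFEST.MF",
                "org/h2/util/Bits.class",
                "META-INF/versions/10/org/h2/util/NetUtils2.class",
                "META-INF/versions/9/org/h2/util/Bits.class",
                "META-INF/versions/9/org/h2/util/CurrentTimestamp.class");
        System.out.println(Arrays.toString(scanVersions(names, 8, 15))); // prints [9, 10]
    }
}
```

With the result sorted ascending, a lookup can walk the array from the end so
that the highest applicable version is tried first, as the patch does.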
/Claes
On 2020-04-11 22:06, Eirik Bjørsnøs wrote:
> There's an added up-front cost to scanning versions which I have also tried
> to test.
>
> Since checkForSpecialAttributes is lazy, this can be tested by measuring
> the time taken to read the first entry of an open JarFile. For the h2 jar
> file, this seems to take ~350 microseconds without my patch, which
> increases to ~850 microseconds with the patch.
>
> This cost applies to the first getEntry call and is then amortized over all
> following calls.
>
> So this patch is probably not a win for use cases where very few entries
> are read.
>
> Eirik.
>
> On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
>
>>
>> Lance,
>>
>> I made a small performance test. Pretty sloppy, so please don't tell
>> Aleksey S :-)
>>
>> Results indicate there may be some performance wins to be had.
>>
>> The test uses the Maven artifact com.h2database:h2:1.4.200:jar. This jar
>> has 950 entries, of which the following three are versioned:
>>
>> META-INF/versions/10/org/h2/util/NetUtils2.class
>> META-INF/versions/9/org/h2/util/Bits.class
>> META-INF/versions/9/org/h2/util/CurrentTimestamp.class
>>
>> The performance test calls JarFile.getEntry for each of the base names
>> found in the jar. It does so 2000 times for 50 iterations and calculates
>> the average run time.
>>
>> This is done once on a JarFile opened with runtime version 15, once on a
>> JarFile opened with runtime version 8 (which effectively disables versioned
>> lookup so works as a baseline). Warmup runs are run first to get stable
>> results.
>>
>> The test is run with OpenJDK 15 built from master.
>>
>> Results:
>>
>> Average time to get 950 entries 2000 times:
>>
>> Runtime version 15: 2903 ms
>> Runtime version 8: 336 ms
>>
>> This shows the difference between testing seven versions (9, 10, 11,
>> 12, 13, 14, 15) and not testing versions.
>>
>> I then made a change to JarFile which scans the versions up front and
>> stores them in an int[] which is then looped over in getVersionedEntry.
>>
>> Results:
>>
>> Runtime version 15: 1048 ms
>> Runtime version 8: 315 ms
>>
>> My benchmark is of course synthetic and does not represent reality. I have
>> not done any analysis on the shape of typical multi-versioned jars nor
>> their access patterns.
>>
>> However, an improvement of 2.5-3x may be worth a closer look?
>>
>> Here's the patch for my change in JarFile.java:
>>
>> Index: src/java.base/share/classes/java/util/jar/JarFile.java
>> IDEA additional info:
>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> <+>UTF-8
>> ===================================================================
>> --- src/java.base/share/classes/java/util/jar/JarFile.java (revision 86722cb038d3030c51f3268799a2c3dc0c508638)
>> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date 1586631626679)
>> @@ -161,7 +161,7 @@
>> private final Runtime.Version version; // current version
>> private final int versionFeature; // version.feature()
>> private boolean isMultiRelease; // is jar multi-release?
>> -
>> + private int[] versions; // which versions does the jar contain
>> // indicates if Class-Path attribute present
>> private boolean hasClassPathAttribute;
>> // true if manifest checked for special attributes
>> @@ -599,12 +599,13 @@
>> }
>>
>> private JarEntry getVersionedEntry(String name, JarEntry je) {
>> - if (BASE_VERSION_FEATURE < versionFeature) {
>> + int[] versions = this.versions;
>> + if (BASE_VERSION_FEATURE < versionFeature && versions != null && versions.length > 0) {
>> if (!name.startsWith(META_INF)) {
>> // search for versioned entry
>> - int v = versionFeature;
>> - while (v > BASE_VERSION_FEATURE) {
>> - JarFileEntry vje = getEntry0(META_INF_VERSIONS + v + "/" + name);
>> + int v = versions.length - 1;
>> + while (v >= 0) {
>> + JarFileEntry vje = getEntry0(META_INF_VERSIONS + versions[v] + "/" + name);
>> if (vje != null) {
>> return vje.withBasename(name);
>> }
>> @@ -1016,9 +1017,20 @@
>> byte[] lbuf = new byte[512];
>> Attributes attr = new Attributes();
>> attr.read(new Manifest.FastInputStream(
>> - new ByteArrayInputStream(b)), lbuf);
>> - isMultiRelease = Boolean.parseBoolean(
>> - attr.getValue(Attributes.Name.MULTI_RELEASE));
>> + new ByteArrayInputStream(b)), lbuf);
>> + if(Boolean.parseBoolean(
>> + attr.getValue(Attributes.Name.MULTI_RELEASE))) {
>> + isMultiRelease = true;
>> + versions = this.stream()
>> + .map(ZipEntry::getName)
>> + .mapToInt(this::parseVersion)
>> + .filter(v -> v != -1 && v >= BASE_VERSION_FEATURE && v <= versionFeature)
>> + .distinct()
>> + .sorted()
>> + .toArray();
>> +
>> + }
>> +
>> }
>> }
>> }
>> @@ -1026,6 +1038,27 @@
>> }
>> }
>>
>> + /**
>> + * If {@code entryName} is a versioned entry, parse and return the version as an integer, otherwise return -1
>> + */
>> + private int parseVersion(String entryName) {
>> + if(!entryName.startsWith(META_INF_VERSIONS)) {
>> + return -1;
>> + }
>> +
>> + int separator = entryName.indexOf("/", META_INF_VERSIONS.length());
>> +
>> + if(separator == -1) {
>> + return -1;
>> + }
>> +
>> + try {
>> + return Integer.parseInt(entryName, META_INF_VERSIONS.length(), separator, 10);
>> + } catch (NumberFormatException e) {
>> + return -1;
>> + }
>> + }
>> +
>> synchronized void ensureInitialization() {
>> try {
>> maybeInstantiateVerifier();
>>
>> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen <lance.andersen at oracle.com>
>> wrote:
>>
>>> Hi Eric
>>>
>>> Feel free to enter a feature request and better yet propose a fix :-)
>>>
>>> Have a good weekend!
>>>
>>> Best
>>> Lance
>>>
>>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs <eirbjo at gmail.com> wrote:
>>>
>>> I recently needed to re-implement multi-release lookup logic for a
>>> ModuleReader capable of reading modules from unpacked (exploded) jar files
>>> [1]
>>>
>>> It occurred to me that JarFile.getVersionedEntry checks _every_ version
>>> between 8 and the runtime version when looking up paths.
>>>
>>> Since META-INF/versions will probably be sparsely populated, I'm wondering
>>> if something could be done to avoid checking 20 different paths in OpenJDK
>>> 28.
>>>
>>> Perhaps scanning META-INF/versions once when opening the file could work,
>>> then only check existing versions in getVersionedEntry?
>>>
>>> Maybe a premature optimization today, but with the new release cadence,
>>> this problem is going to surface at some point in the future, right?
>>>
>>> [1]
>>> https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html
>>>
>>> Eirik.
>>>
>>>
>>> Lance Andersen | Principal Member of Technical Staff | +1.781.442.2037
>>> Oracle Java Engineering
>>> 1 Network Drive
>>> Burlington, MA 01803
>>> Lance.Andersen at oracle.com
>>>
More information about the core-libs-dev
mailing list