RFR: 8260617: Merge ZipFile encoding check with the initial hash calculation
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code. Testing: tier1-4 ------------- Commit messages: - Refactor ZipFile.initCEN, merging checkEncoding and normalizedHash Changes: https://git.openjdk.java.net/jdk/pull/2306/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2306&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260617 Stats: 132 lines in 2 files changed: 31 ins; 68 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/2306.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2306/head:pull/2306 PR: https://git.openjdk.java.net/jdk/pull/2306
On Fri, 29 Jan 2021 00:54:46 GMT, Claes Redestad <redestad@openjdk.org> wrote:
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
Baseline: Benchmark (size) Mode Cnt Score Error Units JarFileMeta.getManifestFromJarWithManifest 512 avgt 5 246.534 ± 32.934 us/op JarFileMeta.getManifestFromJarWithManifest 1024 avgt 5 475.978 ± 70.689 us/op JarFileMeta.getStreamFromJarWithManifest 512 avgt 5 243.401 ± 33.190 us/op JarFileMeta.getStreamFromJarWithManifest 1024 avgt 5 465.899 ± 58.306 us/op JarFileMeta.getStreamFromJarWithNoManifest 512 avgt 5 243.480 ± 44.631 us/op JarFileMeta.getStreamFromJarWithNoManifest 1024 avgt 5 454.937 ± 55.268 us/op JarFileMeta.getStreamFromJarWithSignatureFiles 512 avgt 5 267.522 ± 48.914 us/op JarFileMeta.getStreamFromJarWithSignatureFiles 1024 avgt 5 509.779 ± 81.314 us/op Patched: JarFileMeta.getManifestFromJarWithManifest 512 avgt 5 186.973 ± 26.564 us/op JarFileMeta.getManifestFromJarWithManifest 1024 avgt 5 360.141 ± 33.414 us/op JarFileMeta.getStreamFromJarWithManifest 512 avgt 5 186.244 ± 30.014 us/op JarFileMeta.getStreamFromJarWithManifest 1024 avgt 5 362.870 ± 61.271 us/op JarFileMeta.getStreamFromJarWithNoManifest 512 avgt 5 182.841 ± 13.797 us/op JarFileMeta.getStreamFromJarWithNoManifest 1024 avgt 5 354.254 ± 67.377 us/op JarFileMeta.getStreamFromJarWithSignatureFiles 512 avgt 5 215.419 ± 20.463 us/op JarFileMeta.getStreamFromJarWithSignatureFiles 1024 avgt 5 403.043 ± 78.926 us/op ------------- PR: https://git.openjdk.java.net/jdk/pull/2306
On Fri, 29 Jan 2021 00:54:46 GMT, Claes Redestad <redestad@openjdk.org> wrote:
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
Hi Claes, Overall, the changes seem good. A few comments below. src/java.base/share/classes/java/util/zip/ZipCoder.java line 140:
138: // aborting the ASCII fast-path in the UTF8 implementation, so {@code h} 139: // might be a partially calculated hash code 140: int normalizedHashDecode(int h, byte[] a, int off, int end) {
Would it make sense to keep some of this comment for clarity? src/java.base/share/classes/java/util/zip/ZipFile.java line 1531:
1529: entryPos = pos + CENHDR; 1530: } 1531: this.total = idx / 3;
It took me a moment to figure out why you made the change of total = idx/3 but I understand now. I guess my question is this part of your change more performant as I think it was easier to read /understand when total was used? If you believe this is the better way to go, I think it would be helpful to add a comment as the change is not obvious on a first pass to the reader or at least me ;-) src/java.base/share/classes/java/util/zip/ZipFile.java line 1501:
1499: if (entryPos + nlen > limit) 1500: zerror("invalid CEN header (bad header size)"); 1501: idx = addEntry(idx, table, nlen, pos, entryPos);
Perhaps consider adding a comment describing addEntry. Probably similar to the line 1500 comment (or similar) would be beneficial ------------- PR: https://git.openjdk.java.net/jdk/pull/2306
On Fri, 29 Jan 2021 18:43:40 GMT, Lance Andersen <lancea@openjdk.org> wrote:
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
Refactor to do most entry validation in one place, out of line from initCEN.
src/java.base/share/classes/java/util/zip/ZipCoder.java line 140:
138: // aborting the ASCII fast-path in the UTF8 implementation, so {@code h} 139: // might be a partially calculated hash code 140: int normalizedHashDecode(int h, byte[] a, int off, int end) {
Would it make sense to keep some of this comment for clarity?
The comment is misleading after JDK-8260010 since the UTF8 implementation can't use this method, and I should have rewritten or removed it with JDK-8260010.
src/java.base/share/classes/java/util/zip/ZipFile.java line 1501:
1499: if (entryPos + nlen > limit) 1500: zerror("invalid CEN header (bad header size)"); 1501: idx = addEntry(idx, table, nlen, pos, entryPos);
Perhaps consider adding a comment describing addEntry. Probably similar to the line 1500 comment (or similar) would be beneficial
I've refactored this code a bit further, both for clarity and to outline more related things into what's now `checkAndAddEntry`, which helps startup time as measured with a spring-petclinic startup test (time spent opening 393 jar files drop from ~46 to ~41ms on my setup). ------------- PR: https://git.openjdk.java.net/jdk/pull/2306
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Refactor to do most entry validation in one place, out of line from initCEN. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2306/files - new: https://git.openjdk.java.net/jdk/pull/2306/files/ccf4d954..713e03b4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2306&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2306&range=00-01 Stats: 71 lines in 2 files changed: 31 ins; 13 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/2306.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2306/head:pull/2306 PR: https://git.openjdk.java.net/jdk/pull/2306
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Clarify comments, put normal path branch first in UTF8 checkedHash ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2306/files - new: https://git.openjdk.java.net/jdk/pull/2306/files/713e03b4..bb1659e3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2306&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2306&range=01-02 Stats: 12 lines in 1 file changed: 4 ins; 5 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2306.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2306/head:pull/2306 PR: https://git.openjdk.java.net/jdk/pull/2306
On Sat, 30 Jan 2021 18:30:59 GMT, Claes Redestad <redestad@openjdk.org> wrote:
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
Clarify comments, put normal path branch first in UTF8 checkedHash
Hi Claes I gone through your changes a couple of times and with your latest updates, the patch looks good. We should probably consider looking at Zip FS as its initCEN is similar at a future time. Thank you for your efforts to optimize the performance in this area. ------------- Marked as reviewed by lancea (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2306
On Sun, 31 Jan 2021 18:05:30 GMT, Lance Andersen <lancea@openjdk.org> wrote:
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
Clarify comments, put normal path branch first in UTF8 checkedHash
Hi Claes
I gone through your changes a couple of times and with your latest updates, the patch looks good. We should probably consider looking at Zip FS as its initCEN is similar at a future time.
Thank you for your efforts to optimize the performance in this area.
Added some comments and refactored slightly after offline feedback from Eirik. ------------- PR: https://git.openjdk.java.net/jdk/pull/2306
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Refactor out specific ZipException from checkedHash ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2306/files - new: https://git.openjdk.java.net/jdk/pull/2306/files/bb1659e3..695273fb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2306&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2306&range=02-03 Stats: 75 lines in 2 files changed: 18 ins; 14 del; 43 mod Patch: https://git.openjdk.java.net/jdk/pull/2306.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2306/head:pull/2306 PR: https://git.openjdk.java.net/jdk/pull/2306
On Tue, 2 Feb 2021 10:40:01 GMT, Claes Redestad <redestad@openjdk.org> wrote:
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
Refactor out specific ZipException from checkedHash
Thank you for the update Claes, the changes look fine ------------- Marked as reviewed by lancea (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2306
On Tue, 2 Feb 2021 22:55:59 GMT, Lance Andersen <lancea@openjdk.org> wrote:
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
Refactor out specific ZipException from checkedHash
Thank you for the update Claes, the changes look fine
Thanks @LanceAndersen and @eirbjo for reviewing ------------- PR: https://git.openjdk.java.net/jdk/pull/2306
On Fri, 29 Jan 2021 00:54:46 GMT, Claes Redestad <redestad@openjdk.org> wrote:
- Merge checkEncoding into the byte[]-based normalizedHash. The latter is only used from ZipFile.initCEN right after the checkEncoding today, so verifying this is equivalent is straightforward. - Factor out the logic to calculate hash, check encoding etc into the addEntry method to allow JITs to chew off larger chunks of the logic early on
Roughly 1.2x speedup on the JarFileMeta microbenchmarks, which include the time required to open the jar file and thus exercising the optimized code.
Testing: tier1-4
This pull request has now been integrated. Changeset: c8de943c Author: Claes Redestad <redestad@openjdk.org> URL: https://git.openjdk.java.net/jdk/commit/c8de943c Stats: 192 lines in 2 files changed: 67 ins; 83 del; 42 mod 8260617: Merge ZipFile encoding check with the initial hash calculation Reviewed-by: lancea ------------- PR: https://git.openjdk.java.net/jdk/pull/2306
participants (2)
-
Claes Redestad
-
Lance Andersen