RFR: 8341594: Use Unsafe to coalesce reads in java.util.zip.ZipUtils [v2]
Eirik Bjørsnøs
eirbjo at openjdk.org
Sun Oct 6 15:39:36 UTC 2024
On Sun, 6 Oct 2024 15:14:05 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> #14632 showed that coalescing loads in the `ZipUtils` utility methods could improve performance in zip-related microbenchmarks, but doing so would increase startup overheads. Progress was stalled as we backed out some related early use of `ByteArray(LittleEndian)` and started exploring merge store optimizations in C2.
>>
>> In this PR I instead suggest using `Unsafe` directly to coalesce `short`, `int`, and `long` reads from zip data. Even with explicit bounds checking to ensure these utilities are _always_ safe there are significant improvements both to lookup speed and speed of opening zip files (most if not all bounds checks are optimized away):
>>
>>
>> make test TEST=micro:java.util.zip.ZipFile
>>
>> Name (size) Cnt Base Error Test Error Unit Change
>> GetEntry.getEntryHit 512 15 37.999 ± 0.841 34.641 ± 0.389 ns/op 1.10x (p = 0.000*)
>> GetEntry.getEntryHit 1024 15 39.557 ± 0.523 36.959 ± 1.488 ns/op 1.07x (p = 0.000*)
>> GetEntry.getEntryHitUncached 512 15 69.250 ± 0.931 64.851 ± 0.987 ns/op 1.07x (p = 0.000*)
>> GetEntry.getEntryHitUncached 1024 15 71.628 ± 0.307 67.927 ± 0.714 ns/op 1.05x (p = 0.000*)
>> GetEntry.getEntryMiss 512 15 22.961 ± 0.336 22.825 ± 0.188 ns/op 1.01x (p = 0.158 )
>> GetEntry.getEntryMiss 1024 15 22.940 ± 0.115 23.502 ± 0.273 ns/op 0.98x (p = 0.000*)
>> GetEntry.getEntryMissUncached 512 15 35.886 ± 0.429 35.598 ± 1.296 ns/op 1.01x (p = 0.395 )
>> GetEntry.getEntryMissUncached 1024 15 38.168 ± 0.911 36.141 ± 0.356 ns/op 1.06x (p = 0.000*)
>> Open.openCloseZipFile 512 15 62425.563 ± 997.455 56263.401 ± 896.892 ns/op 1.11x (p = 0.000*)
>> Open.openCloseZipFile 1024 15 117491.250 ± 962.928 108055.491 ± 1595.577 ns/op 1.09x (p = 0.000*)
>> Open.openCloseZipFilex2 512 15 62974.575 ± 911.095 57996.388 ± 910.929 ns/op 1.09x (p = 0.000*)
>> Open.openCloseZipFilex2 1024 15 119164.769 ± 1756.065 108803.468 ± 929.483 ns/op 1.10x (p = 0.000*)
>> * = significant
>>
>>
>> This PR also address some code duplication in `ZipUtils`.
>>
>> An appealing alternative would be to implement a merge load analogue to the merge store optimizations in C2. Such optimizations would be very welcome since it would improve similar code outside of java.base (jdk.zi...
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> copyright
Changes requested by eirbjo (Committer).
src/java.base/share/classes/java/util/zip/ZipFile.java line 1644:
> 1642: // Now scan the block backwards for END header signature
> 1643: for (int i = buf.length - ENDHDR; i >= 0; i--) {
> 1644: if (get32(buf, i) == ENDSIG) {
Maybe a matter of personal preference, but I think `GETSIG(buf, i) == ENDSIG` reads better than `get32(buf, i) == ENDSIG`.
The fact that it's 32 bits is kind of a detail and it doesn't reveal as well that we intend to read a signature.
So could we keep GETSIG, but add an index? There are places in `ZipInputStream` as well which could make use of that for signature checking. (But maybe not for this PR)
Alternatively, `ENDSIG(buf, i) == ENDSIG` would be consistent with `CENSIG(buf, i)` uses.
Same applies to the other GETSIG replacements in this file.
src/java.base/share/classes/java/util/zip/ZipUtils.java line 258:
> 256: static final long CENSIG(byte[] b, int pos) { return get32(b, pos + 0); }
> 257: static final int CENVEM(byte[] b, int pos) { return get16(b, pos + 4); }
> 258: static final int CENVEM_FA(byte[] b, int pos) { return Byte.toUnsignedInt(b[pos + 5]); } // file attribute compatibility
Did you consider introducing `get8` for consistency here? As it stands, this looks like the odd one out.
test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java line 110:
> 108: zf2.close();
> 109: }
> 110:
A short comment stating the purpose of the main method would not hurt.
-------------
PR Review: https://git.openjdk.org/jdk/pull/21377#pullrequestreview-2350549495
PR Review Comment: https://git.openjdk.org/jdk/pull/21377#discussion_r1789122069
PR Review Comment: https://git.openjdk.org/jdk/pull/21377#discussion_r1789134874
PR Review Comment: https://git.openjdk.org/jdk/pull/21377#discussion_r1789134125
More information about the core-libs-dev
mailing list