RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v12]
Eirik Bjorsnos
duke at openjdk.org
Tue Feb 14 11:49:23 UTC 2023
> After finding a hash match, getEntryPos needs to compare the lookup name up to the encoded entry name in the CEN. This comparison is done by decoding the entry name into a String. The names can then be compared using the String API. This decoding step adds a significat cost to this method.
>
> This PR suggest to update the string comparison such that in the common case where both the lookup name and the entry name are encoded in ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can instead be compared direcly.
>
> ZipCoder is updated with a new method to compare a string with an encoded byte array range. The default implementation decodes to string (like the current code), while the UTF-8 implementation uses JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses Arrays.mismatch for comparison with or without matching trailing slashes.
>
> Additionally, this PR suggest to make the following updates to getEntryPos:
>
> - The try/catch for IAE is redundant and can be safely removed. (initCEN already checks this and will throws IAE for invalid UTF-8). This seems to give a 3-4% speedup on micros)
> - A new test InvalidBytesInEntryNameOrComment is a added to verify that initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I found no existing test coverage for this)
> - The recursion when looking for "name/" matches is replaced with iteration. We keep track of any "name/" match and return it at the end of the search. (I feel this is easier to follow and it also gives a ~30% speedup for addSlash lookups with no regression on regular lookups)
>
> (My though is that including these additional updates in this PR might reduce reviewer overhead given that it touches the exact same code. I might be wrong on this, please advise :)
>
> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified to use xalan.jar):
>
> Baseline:
>
> Benchmark (size) Mode Cnt Score Error Units
> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 ns/op
> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 ns/op
> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 ns/op
> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 ns/op
> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 ns/op
> ZipFileGetEntry.getEntryMiss 1024 avgt 15 29.762 ± 5.998 ns/op
> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 ns/op
> ZipFileGetEntry.getEntryMissUncached 1024 avgt 15 71.840 ± 2.455 ns/op
> ZipFileGetEntry.getEntrySlash 512 avgt 15 135.621 ± 4.341 ns/op
> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 ns/op
>
>
>
> PR:
>
>
> Benchmark (size) Mode Cnt Score Error Units
> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 ns/op
> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 ns/op
> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 ns/op
> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 ns/op
> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 ns/op
> ZipFileGetEntry.getEntryMiss 1024 avgt 15 23.236 ± 1.114 ns/op
> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 ns/op
> ZipFileGetEntry.getEntryMissUncached 1024 avgt 15 67.767 ± 1.963 ns/op
> ZipFileGetEntry.getEntrySlash 512 avgt 15 73.745 ± 2.717 ns/op
> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 ns/op
>
>
> To assess the impact on startup/warmup, I made a main method class which measures the total time of calling ZipFile.getEntry for all entries in the 109 JAR file dependenies of spring-petclinic. The shows a nice improvement (time in micros):
>
>
> Percentile Baseline Patch
> 50 % 23155 21149
> 75 % 23598 21454
> 90 % 23989 21691
> 95 % 24238 21973
> 99 % 25270 22446
> STDEV 792 549
> Count 500 500
Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
Prefer expectThrows with asserts over test annotation regex
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/12290/files
- new: https://git.openjdk.org/jdk/pull/12290/files/21cdb732..4981abd3
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=12290&range=11
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=12290&range=10-11
Stats: 18 lines in 1 file changed: 7 ins; 1 del; 10 mod
Patch: https://git.openjdk.org/jdk/pull/12290.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12290/head:pull/12290
PR: https://git.openjdk.org/jdk/pull/12290
More information about the security-dev
mailing list