RFR: JDK-8184947:,ZipCoder performance improvements
Hi, Please help review the changes for j.u.z.ZipCoder/JDK-8184947 (which also includes cleanup/improvement work in java.lang.StringCoding.java to speed up general String coding performance, especially for UTF8). issue: https://bugs.openjdk.java.net/browse/JDK-8184947 webrev: http://cr.openjdk.java.net/~sherman/8184947/webrev jmh benchmark: http://cr.openjdk.java.net/~sherman/8184947/ZipCodingBM.java http://cr.openjdk.java.net/~sherman/8184947/StringCodingBM.java Notes: (1) StringCoding.de/encode() for new String()/String.getBytes() with default charset. For historical reason the existing SC.decode(byte[], off, len)/encode(coder, val) implementation has code to handle any "possible" UnsupportedEncodingExcetion situation and turn to the slow "charset name" version of de/encode() for real work. Given the fact that the Charset.defaultCharset() now returns UTF8 as the fallback default charset if there is anything wrong to obtain a default charset (we did that in jdk7 or 8?), there is no need actually to handle the UEE. This also provides the opportunity to use fastpath for stateless UTF8/88591/ASCII de/encode(). The benchmark data for newString_xxx/ getBytes_xxx (which uses the default encoding, UTF8 in this case) suggests a big speed up fo ascii-only String. StringCodingBM size) Mode Cnt NEW Score Error OLD Score Error Units getBytes_ASCII 16 avgt 5 21.155 ± 5.586 63.777 ± 54.262 ns/op getBytes_ASCII 64 avgt 5 20.854 ± 6.237 98.988 ± 62.932 ns/op getBytes_ASCII 256 avgt 5 38.291 ± 8.494 272.306 ± 77.951 ns/op getBytes_Latin 16 avgt 5 80.968 ± 15.814 76.769 ± 38.512 ns/op getBytes_Latin 64 avgt 5 163.078 ± 51.993 219.085 ± 42.665 ns/op getBytes_Latin 256 avgt 5 759.548 ± 99.386 824.594 ± 763.735 ns/op getBytes_Unicode 16 avgt 5 94.311 ± 22.189 124.185 ± 32.751 ns/op getBytes_Unicode 64 avgt 5 289.603 ± 152.056 321.541 ± 103.703 ns/op getBytes_Unicode 256 avgt 5 1253.098 ± 216.243 1201.667 ± 512.532 ns/op newString_ASCII 16 avgt 5 33.273 ± 13.780 50.402 ± 17.574 ns/op newString_ASCII 64 avgt 5 30.420 ± 6.207 84.989 ± 43.355 ns/op newString_ASCII 256 avgt 5 54.391 ± 10.451 208.096 ± 102.716 ns/op newString_Latin 16 avgt 5 115.606 ± 7.181 114.186 ± 36.310 ns/op newString_Latin 64 avgt 5 393..710 ± 73.478 414.286 ± 176.837 ns/op newString_Latin 256 avgt 5 1618.967 ± 289.044 1551.499 ± 487.904 ns/op newString_Unicode 16 avgt 5 104.848 ± 32.694 127.558 ± 12.029 ns/op newString_Unicode 64 avgt 5 377.894 ± 147.731 374.779 ± 53.028 ns/op newString_Unicode 256 avgt 5 1557.977 ± 318.652 1457.236 ± 284.424 ns/op (2) updated to "fast path" UTF8/8859-1/ASCII in all de/coding operation, which are all implemented in static /stateless methods. (benchmark for MS932 [4] provide to make sure no regression for "other" charsets) (3) added "fast path" for "ascii-only' bytes for utf8 encoding/getBytes(). The benchmark [1] suggests a big speedup for ascii-only getBytes() with limited cost to non-ascii-only cases. (this helps big for (4), the ZipCoder situation, which mainly uses ascii only). (4) java.util.zip.ZipCoder This is where this patch actually started from. As the rfe suggested we are now using byte[] as the internal storage for the String class, the optimization we put in ZipCoder for UTF8 (which uses the byte[]/char[] interface of out UTF8 implementation to help avoid the relatively heavy ByteBuffer/CharBuffer coding interface) now appears to be not that "optimized". The to/from char[] copy/paste has become a waste. ZipCoder implementation can't use new String/String.getBytes() directly because of the the different malformed/unmappable character handing requirement. The proposed change here is to add a pair of special new String()/String.getBytes() in StrngCoding class to throw IAE instead of silent replacement, via (yet another) SharedSecrets interface. This brings us much faster de/encoding (30%-50% speed up) and much less memory usage (no more unnecessary byte[]/char[] allocation and in default mode, there is only ONE utf8 ZipCoder), on all "Jar/ZipEntry" related access operations. ZipCodeBenchMark [latest] * "New Score" is with the patch * getEntry() is mainly String.getBytes(), entries()/stream() is mainly new String(bytes)). Mode Cnt New Score Error Old Score Units jf_entries avgt 20 0.582 ± 0.036 0.953 ± 0.108 ms/op jf_getEntry avgt 20 1.506 ± 0.158 2.052 ± 0.171 ms/op jf_stream avgt 20 0.698 ± 0.060 0.940 ± 0.067 ms/op zf_entries avgt 20 0.691 ± 0.057 0.917 ± 0.080 ms/op zf_getEntry avgt 20 1.459 ± 0.180 2.081 ± 0.161 ms/op zf_stream avgt 20 0.626 ± 0.074 0.909 ± 0.075 ms/op Thanks, Sherman [1] http://cr.openjdk.java.net/~sherman/8184947/StringCoding.utf8 [2]http://cr.openjdk.java.net/~sherman/8184947/StringCoding.8859_1 [3] http://cr.openjdk.java.net/~sherman/8184947/StringCoding.ascii [4]http://cr.openjdk.java.net/~sherman/8184947/StringCoding.ms932 [5] http://cr.openjdk.java.net/~sherman/8184947/ZipCoding.bm
Hi Sherman, On 2017-12-09 00:09, Xueming Shen wrote:
Hi,
Please help review the changes for j.u.z.ZipCoder/JDK-8184947 (which also includes cleanup/improvement work in java.lang.StringCoding.java to speed up general String coding performance, especially for UTF8).
issue: https://bugs.openjdk.java.net/browse/JDK-8184947 webrev: http://cr.openjdk.java.net/~sherman/8184947/webrev
I've not fully reviewed this yet, but something struck me halfway through: As the ASCII fast-path is what's really important here, we could write that part without ever having to go via a StringCoding.Result. On four of your ZipCodingBM micros this improves performance a bit further (~10%): diff -r 848591d85052 src/java.base/share/classes/java/lang/StringCoding.java --- a/src/java.base/share/classes/java/lang/StringCoding.java Sun Dec 10 18:48:21 2017 +0100 +++ b/src/java.base/share/classes/java/lang/StringCoding.java Sun Dec 10 18:55:38 2017 +0100 @@ -937,7 +937,13 @@ * Throws iae, instead of replacing, if malformed or unmmappble. */ static String newStringUTF8NoRepl(byte[] bytes, int off, int len) { - Result ret = decodeUTF8(bytes, off, len, false); + if (COMPACT_STRINGS && !hasNegatives(bytes, off, len)) { + return new String(Arrays.copyOfRange(bytes, off, off + len), LATIN1); + } + Result ret = decodeUTF8_0(bytes, off, len, false); return new String(ret.value, ret.coder); } Benchmark Mode Cnt Score Error Units ZipCodingBM.jf_entries avgt 25 43.682 ± 0.656 us/op ZipCodingBM.jf_stream avgt 25 42.075 ± 0.444 us/op ZipCodingBM.zf_entries avgt 25 43.323 ± 0.572 us/op ZipCodingBM.zf_stream avgt 25 40.237 ± 0.604 us/op After: Benchmark Mode Cnt Score Error Units ZipCodingBM.jf_entries avgt 25 37.551 ± 1.198 us/op ZipCodingBM.jf_stream avgt 25 38.065 ± 0.628 us/op ZipCodingBM.zf_entries avgt 25 37.595 ± 0.686 us/op ZipCodingBM.zf_stream avgt 25 35.734 ± 0.442 us/op (I don't know which jar you using as test.jar, but results seems consistent across a few different ones) The gain is achieved by not going via the ThreadLocal<StringCoding.Result> resultCache, which checks out when inspecting the perfasm output. I'm a bit skeptical of ThreadLocal caching optimizations for such small objects (StringCoding.Result), and wonder if there's something else we can do to help the optimizer out here, possibly eliminating the allocation entirely. Thanks! /Claes
On 12/10/17, 3:12 PM, Claes Redestad wrote:
Hi Sherman,
On 2017-12-09 00:09, Xueming Shen wrote:
Hi,
Please help review the changes for j.u.z.ZipCoder/JDK-8184947 (which also includes cleanup/improvement work in java.lang.StringCoding.java to speed up general String coding performance, especially for UTF8).
issue: https://bugs.openjdk.java.net/browse/JDK-8184947 webrev: http://cr.openjdk.java.net/~sherman/8184947/webrev
I've not fully reviewed this yet, but something struck me halfway through: As the ASCII fast-path is what's really important here, we could write that part without ever having to go via a StringCoding.Result.
On four of your ZipCodingBM micros this improves performance a bit further (~10%):
diff -r 848591d85052 src/java.base/share/classes/java/lang/StringCoding.java --- a/src/java.base/share/classes/java/lang/StringCoding.java Sun Dec 10 18:48:21 2017 +0100 +++ b/src/java.base/share/classes/java/lang/StringCoding.java Sun Dec 10 18:55:38 2017 +0100 @@ -937,7 +937,13 @@ * Throws iae, instead of replacing, if malformed or unmmappble. */ static String newStringUTF8NoRepl(byte[] bytes, int off, int len) { - Result ret = decodeUTF8(bytes, off, len, false); + if (COMPACT_STRINGS && !hasNegatives(bytes, off, len)) { + return new String(Arrays.copyOfRange(bytes, off, off + len), LATIN1); + } + Result ret = decodeUTF8_0(bytes, off, len, false); return new String(ret.value, ret.coder); }
Hi Claes, You're definite right on this one. We dont need the Result for the ASCII case. webrev has been updated accordingly. http://cr.openjdk.java.net/~sherman/8184947/webrev Yes, understood the threadlocal might not be the best choice here. But just feel something need to be done for the temporary Result object after observed its usage with the jfr in #8184947. It is taking as many spaces as the overall String objects do. Sure, it's in young-gen, should be wiped with quickly. But my take is it might be worth the tradeoff of having each/every new String/cs) get a little slower instead of having the "global" vm has to do some extra clean up for this extra Result object, for now. thanks, sherman
Hi Sherman, On 2017-12-11 05:08, Xueming Shen wrote:
thanks for incorporating my suggestion! It looks pretty good to me. While many parts is just code that has been moved, this is still a pretty big change, so I hope we can get at least another pair of eyes on it. StringCoding.java: private static void throwMalformed(int nb) { throw new IllegalArgumentException("malformed input length : " + nb); } nb is the number of bytes of the *first* offending chunk of bytes(?); is this information generally useful? I think the error message can be improved as input length is ambiguous in this context, but I wouldn't mind if the nb parameter was dropped along with a simplification of the error message. Indentation errors at lines 321, 324, 728, 746, 913 TestStringCoding.java: Are the added System.out.println's intentional? Indentation.
Yes, understood the threadlocal might not be the best choice here. But just feel something need to be done for the temporary Result object after observed its usage with the jfr in #8184947. It is taking as many spaces as the overall String objects do. Sure, it's in young-gen, should be wiped with quickly. But my take is it might be worth the tradeoff of having each/every new String/cs) get a little slower instead of having the "global" vm has to do some extra clean up for this extra Result object, for now.
Right, as this can be cause for significant GC pressure then I see how a bit of ThreadLocal overhead is fine until we come up with something better. Thanks! /Claes
thanks, sherman
On 12/12/2017 09:58 AM, Claes Redestad wrote:
StringCoding.java:
private static void throwMalformed(int nb) { throw new IllegalArgumentException("malformed input length : " + nb); }
nb is the number of bytes of the *first* offending chunk of bytes(?); is this information generally useful? I think the error message can be improved as input length is ambiguous in this context, but I wouldn't mind if the nb parameter was dropped along with a simplification of the error message.
It's the "copy/paste" of what MalformedInputException does for the toString(). It would/might be a help if with a "offset" info. I'm adding the offset info here with the hope it might help debugging.
Indentation errors at lines 321, 324, 728, 746, 913
updated.
TestStringCoding.java:
Are the added System.out.println's intentional? Indentation.
that's leftover of the debugging. removed. webrev has been updated according. I believe Martin is also reviewing. Thanks! Sherman
Sorry, I haven't had the time I would like to review. It would be good to make jdk10. I keep wishing what we do for performance here wouldn't get so messy. I keep thinking we should add some methods to the public Charset classes, e.g. canDecode(byte[], int, int) with one general purpose implementation and high-performance implementations for UTF-8, ASCII, Latin1 ASCII checking via hasNegatives has some hotspot help and that should be available as a high performance public API somewhere. One possibility is my canDecode suggestion. + if (b >= 0) + putChar(dst, dp++, (char)b); + else + putChar(dst, dp++, repl); why not coalesce into putChar(dst, dp++, (b >= 0) ? (char)b : repl)
Fix typo: singlton Remove stray space: + ba = Arrays.copyOfRange(ba, off, off + len);
Thanks Martin! webrev has been updated accordingly. http://cr.openjdk.java.net/~sherman/8184947/webrev I will try to see if I can do something for your canDecode(...) idea. I'm still have a relative big change in repo (mainly for the Strting(ByteBuffer ...) which might move all those utf8/asci/iso8859 back to where they were in sun.nio.cs. I'm running tests locally and on our mach5 system now. If everything comes back positively, with in next 50 minutes :-) I will try to push what I have right now into the jdk10. Thanks, Sherman On 12/13/17, 2:28 AM, Martin Buchholz wrote:
Sorry, I haven't had the time I would like to review. It would be good to make jdk10. I keep wishing what we do for performance here wouldn't get so messy. I keep thinking we should add some methods to the public Charset classes, e.g. canDecode(byte[], int, int) with one general purpose implementation and high-performance implementations for UTF-8, ASCII, Latin1 ASCII checking via hasNegatives has some hotspot help and that should be available as a high performance public API somewhere. One possibility is my canDecode suggestion.
+ if (b>= 0) + putChar(dst, dp++, (char)b); + else + putChar(dst, dp++, repl); why not coalesce into putChar(dst, dp++, (b >= 0) ? (char)b : repl)
participants (3)
-
Claes Redestad
-
Martin Buchholz
-
Xueming Shen