RFR: 8215017: Improve String::equals warmup characteristics
Claes Redestad
claes.redestad at oracle.com
Mon Dec 10 17:36:05 UTC 2018
Hi,
Tobias weighed in on this in another thread[1], and while he thinks the
proposed patch is semantically correct, concerns were raised that the
UTF16 intrinsics might be superior on some platforms.
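To make the semantic argument concrete, here is a minimal sketch of why a byte-wise comparison is enough once both strings are known to share a coder: two byte[] of equal length are equal char by char exactly when they are equal byte by byte. EqualsSketch and its methods are hypothetical stand-ins, not the JDK source or the patch itself, and the -XX:-CompactStrings case is not modelled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for java.lang.String's internal layout; not the JDK source.
final class EqualsSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    final byte[] value; // one byte per char for LATIN1, two bytes per char for UTF16
    final byte coder;

    EqualsSketch(String s) {
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        coder = latin1 ? LATIN1 : UTF16;
        value = s.getBytes(latin1 ? StandardCharsets.ISO_8859_1
                                  : StandardCharsets.UTF_16LE);
    }

    // Byte-wise comparison, in the spirit of StringLatin1.equals.
    static boolean equalsBytes(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Char-wise comparison over the same bytes, in the spirit of StringUTF16.equals.
    static boolean equalsChars(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i + 1 < a.length; i += 2) {
            char ca = (char) ((a[i] & 0xFF) | ((a[i + 1] & 0xFF) << 8));
            char cb = (char) ((b[i] & 0xFF) | ((b[i + 1] & 0xFF) << 8));
            if (ca != cb) return false;
        }
        return true;
    }

    boolean sameCoder(EqualsSketch other) {
        return coder == other.coder;
    }

    // Baseline shape: check the coders, then branch to a coder-specific routine.
    boolean equalsBaseline(EqualsSketch other) {
        if (coder != other.coder) return false;
        return coder == LATIN1 ? equalsBytes(value, other.value)
                               : equalsChars(value, other.value);
    }

    // Proposed shape: one folded coder check, then always the byte-wise routine.
    boolean equalsProposed(EqualsSketch other) {
        return sameCoder(other) && equalsBytes(value, other.value);
    }

    public static void main(String[] args) {
        String[] inputs = { "abc", "abd", "ab", "ab\u20AC", "ab\u20AD", "" };
        for (String x : inputs) {
            for (String y : inputs) {
                EqualsSketch a = new EqualsSketch(x);
                EqualsSketch b = new EqualsSketch(y);
                boolean base = a.equalsBaseline(b);
                boolean prop = a.equalsProposed(b);
                if (base != prop || base != x.equals(y)) {
                    throw new AssertionError(x + " vs " + y);
                }
            }
        }
        System.out.println("baseline and proposed agree on all pairs");
    }
}
```

The proposed shape drops the second branch on the coder, which is the part that matters for the interpreter; whether the byte-wise intrinsic matches the char-wise one on a given platform is a separate, empirical question.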
I ran the microbenchmark below as well as the existing
string-density-benchmark[2] suite, noting no statistically significant
differences for peak performance on x86_64 (Windows, Linux, Mac) and
SPARC T4 through M7. Warmup improvements are similar across all platforms.
So from our point of view things look green.
However, I have no means of testing the intrinsics on other platforms
(S390, aarch64, ppc), so it'd be much appreciated if performance could
be verified on those platforms using the proposed patch and benchmark.
By inspecting code it seems the difference should be negligible or even
positive, e.g., on aarch64 there's a trailing comparison that is elided
when treating the byte[] as a char[] - overhead that is possibly offset
entirely by removing an extra branch before going into the intrinsics.
Thanks!
/Claes
[1]
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-December/057240.html
[2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
(I had to remove the hg Maven plugin and the reference to
sun.misc.Version, and update the JMH version, for this to build and run
on the latest JDK)
On 2018-12-08 01:11, Claes Redestad wrote:
> Hi,
>
> following up from discussions during review of JDK-8214971[1], I
> examined the startup and peak performance of a few different variants of
> writing String::equals.
>
> Webrev: http://cr.openjdk.java.net/~redestad/8215017/jdk.00/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215017
>
> - folding coder() == aString.coder() into sameCoder(aString) helps
> interpreter without adversely affecting higher optimization levels
>
> - Jim's proposal to use Arrays.equals is _interesting_: it improves
> peak performance on some inputs but regresses it on others. I'll defer
> that to a future RFE as it needs a more thorough examination.
>
> - what we can do is simplify to only use StringLatin1.equals. If I'm not
> completely mistaken these are all semantically neutral (and
> StringUTF16.equals effectively redundant). If I'm completely wrong here
> I'll welcome the learning opportunity. :-)
>
> This removes a branch and two method calls, and for UTF16 Strings we'll
> use a simpler algorithm early, which turns out to be beneficial at the
> interpreter and C1 levels.
>
> I added a simple microbenchmark to explore this. Results show 1.2-2.5x
> improvements in interpreter performance, while results for optimized
> code remain perfectly neutral on this simple micro[2].
>
> This could be extended to clean up and move StringLatin1.equals back
> into String and remove StringUTF16, but we'd also need to rearrange the
> intrinsics on the VM side. Let me know what you think.
>
> Thanks!
>
> /Claes
>
> [1]
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-December/057162.html
>
>
> [2]
> ========== Baseline =================
>
> -Xint
> Benchmark                            Mode  Cnt     Score     Error  Units
> StringEquals.equalsAlmostEqual       avgt    4   968.640 ±   1.337  ns/op
> StringEquals.equalsAlmostEqualUTF16  avgt    4  2082.007 ±   5.303  ns/op
> StringEquals.equalsDifferent         avgt    4   583.166 ±  29.461  ns/op
> StringEquals.equalsDifferentCoders   avgt    4   422.993 ±   1.291  ns/op
> StringEquals.equalsEqual             avgt    4   988.671 ±   1.492  ns/op
> StringEquals.equalsEqualsUTF16       avgt    4  2103.060 ±   5.705  ns/op
>
> -XX:+CompactStrings
> Benchmark                            Mode  Cnt     Score     Error  Units
> StringEquals.equalsAlmostEqual       avgt    4    23.896 ±   0.089  ns/op
> StringEquals.equalsAlmostEqualUTF16  avgt    4    23.935 ±   0.562  ns/op
> StringEquals.equalsDifferent         avgt    4    15.086 ±   0.044  ns/op
> StringEquals.equalsDifferentCoders   avgt    4    12.572 ±   0.008  ns/op
> StringEquals.equalsEqual             avgt    4    25.143 ±   0.025  ns/op
> StringEquals.equalsEqualsUTF16       avgt    4    25.148 ±   0.021  ns/op
>
> -XX:-CompactStrings
> Benchmark                            Mode  Cnt     Score     Error  Units
> StringEquals.equalsAlmostEqual       avgt    4    24.539 ±   0.127  ns/op
> StringEquals.equalsAlmostEqualUTF16  avgt    4    22.638 ±   0.047  ns/op
> StringEquals.equalsDifferent         avgt    4    13.930 ±   0.835  ns/op
> StringEquals.equalsDifferentCoders   avgt    4    13.836 ±   0.025  ns/op
> StringEquals.equalsEqual             avgt    4    26.420 ±   0.020  ns/op
> StringEquals.equalsEqualsUTF16       avgt    4    23.889 ±   0.037  ns/op
>
> ========== Fix ======================
>
> -Xint
> Benchmark                            Mode  Cnt     Score     Error  Units
> StringEquals.equalsAlmostEqual       avgt    4   811.859 ±   8.663  ns/op
> StringEquals.equalsAlmostEqualUTF16  avgt    4   802.784 ± 352.884  ns/op
> StringEquals.equalsDifferent         avgt    4   431.837 ±   1.884  ns/op
> StringEquals.equalsDifferentCoders   avgt    4   358.244 ±   1.208  ns/op
> StringEquals.equalsEqual             avgt    4   832.056 ±   3.541  ns/op
> StringEquals.equalsEqualsUTF16       avgt    4   832.434 ±   3.516  ns/op
>
> -XX:+CompactStrings
> Benchmark                            Mode  Cnt     Score     Error  Units
> StringEquals.equalsAlmostEqual       avgt    4    23.906 ±   0.151  ns/op
> StringEquals.equalsAlmostEqualUTF16  avgt    4    23.905 ±   0.123  ns/op
> StringEquals.equalsDifferent         avgt    4    15.088 ±   0.023  ns/op
> StringEquals.equalsDifferentCoders   avgt    4    12.575 ±   0.030  ns/op
> StringEquals.equalsEqual             avgt    4    25.149 ±   0.059  ns/op
> StringEquals.equalsEqualsUTF16       avgt    4    25.149 ±   0.033  ns/op
>
> -XX:-CompactStrings
> Benchmark                            Mode  Cnt     Score     Error  Units
> StringEquals.equalsAlmostEqual       avgt    4    24.521 ±   0.050  ns/op
> StringEquals.equalsAlmostEqualUTF16  avgt    4    22.639 ±   0.035  ns/op
> StringEquals.equalsDifferent         avgt    4    13.831 ±   0.020  ns/op
> StringEquals.equalsDifferentCoders   avgt    4    13.884 ±   0.345  ns/op
> StringEquals.equalsEqual             avgt    4    26.395 ±   0.066  ns/op
> StringEquals.equalsEqualsUTF16       avgt    4    23.904 ±   0.112  ns/op
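
For reference, the interpreter speedups quoted above can be recomputed from the -Xint rows as baseline score divided by fix score. The snippet below is only that arithmetic; the class name InterpreterSpeedup is made up for illustration. Note that the fix run of equalsAlmostEqualUTF16 carries a large error (± 352.884 ns/op), so its ratio should be read loosely.

```java
// Plain arithmetic over the -Xint scores listed above (ns/op, lower is better).
final class InterpreterSpeedup {
    static final String[] NAMES = {
        "equalsAlmostEqual", "equalsAlmostEqualUTF16", "equalsDifferent",
        "equalsDifferentCoders", "equalsEqual", "equalsEqualsUTF16"
    };
    static final double[] BASELINE = {
        968.640, 2082.007, 583.166, 422.993, 988.671, 2103.060
    };
    static final double[] FIX = {
        811.859, 802.784, 431.837, 358.244, 832.056, 832.434
    };

    // Speedup factor of the fix over the baseline for benchmark i.
    static double speedup(int i) {
        return BASELINE[i] / FIX[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-24s %.2fx%n", NAMES[i], speedup(i));
        }
    }
}
```

The smallest ratio comes from equalsDifferentCoders and the largest from the two UTF16 cases, which is where the baseline paid for the extra branch and the char-wise loop in the interpreter.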