String concatenation tweaks
Peter Levart
peter.levart at gmail.com
Tue Jun 9 17:20:53 UTC 2015
On 06/09/2015 09:37 AM, Remi Forax wrote:
> So basically, you bypass the use of StringBuilder and make the JIT's
> OptoStringConcat optimization useless by rewriting it in Java, that's
> nice !
>
> if you combine this idea with the fact that the constant Strings can
> be pass 'out of the band', it will be hard to have something faster.
I don't think it is realistic to "compile-in" specializations for
constant strings. Why? There are too much of them and each distinct
specialization takes time to initialize. This is what I found out
analyzing string concatenation usages of a real-world program...
I asked myself how one could measure and compare startup performance for
a real-world program. Ideally one would like to measure just string
concatenations as they occur in a real-world program when it starts up.
So I thought, why not record string concatenations as they occur and
then replay them in a JMH benchmark? Here's what I came-up with:
http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/webrev.02/
Analyzer is a generic MethodHandle interceptor that can be used to
analyze registrations (call-site linkages) and invocations of method
handles that are instrumented. I took Apache Tomcat 8 sources, compiled
it with indy-string-concat-enabled javac and started the container with
the following system property (defined in Tomcat's setenv.sh):
CATALINA_OPTS=-Djava.lang.invoke.stringConcat.analyzer=java.lang.invoke.StringConcatRecorderAgent:/tmp/concats_catalina.txt
...after Tomcat booted, I requested it's home page once and then shut it
down. What I got recorded is the following Java source:
http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/benchmarks/TomcatStartup.java
Running this as part of JMH benchmark is easy:
http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/benchmarks/ConcatStartupBench.java
The results suggest that caching should always be turned on. The most
interesting are "Cold" tests. These are one-shot invocations of a bunch
of string-concatenation invocations as they occur in Tomcat startup in a
newly started VM. "Warm" tests are one-shot invocations after 10
iterations of warm-up invocations and show how quickly some strategies
get optimized by JIT compared to others.
Just looking at TomcatStartup.java string concatenations suggests that
majority of them are actually concatenations of String(s) and minority
of them use other types of arguments. This is where OptoStringConcat
does it's best. To see statistics and distributions of string
concatenations in Tomcat startup, I fired it up using the following
system property (defined in Tomcat's setenv.sh):
CATALINA_OPTS=-Djava.lang.invoke.stringConcat.analyzer=java.lang.invoke.StringConcatStatsAgent:/tmp/concat_stats_catalina.txt
...which produces the following file after it is shut down:
http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/benchmarks/concat_stats_catalina.txt
Should we optimize for Tomcat, optimizing for two String concatenation
will cover 90% of call sites and 90% of invocations.
Any ideas which program to try next? It should be easy to compile it
from sources on JDK9 and run it on JDK9 too. For Tomcat, this was
straight-forward. I just had to strip definitions of java.endorsed.dirs
system property from startup scripts...
So to answer Rémi's proposal on specialization for constant strings: The
statistics of running Tomcat show that there are just 18 (17 when types
are truncated) distinct shapes of string concatenations in 519 distinct
call-sites that are used during Tomcat startup. I don't have evidence
currently on which strings are constant and which not, but I have a
feeling that much more than 18 distinct constant strings are being used
in Tomcat startup. Looking at startup benchmark numbers, I also have a
feeling that too-much specialization will hurt startup and overall
real-world performance, not improve it.
Regards, Peter
>
> Rémi
>
> On 06/08/2015 08:31 AM, Peter Levart wrote:
>>
>>
>> On 06/04/2015 11:21 AM, Aleksey Shipilev wrote:
>>> The patches, notes, benchmarks are moved under the appropriate bug ID:
>>> http://cr.openjdk.java.net/~shade/8085796/
>>>
>>> The patches now have a complete support in javac (implemented the "s +=
>>> <string> concat part as well), passes jtreg and other smoke tests [*],
>>> plus newly developed regression tests. I think this gives us enough
>>> confidence with going forward.
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>> [*] There are few regression tests that start to fail with the change:
>>> one seems Indify-specific, and another is a javac-specific regression
>>> test. Current patches work that around, and we will probably have to fix
>>> these properly before integration.
>>>
>>>
>>
>> Hi Aleksey,
>>
>> I played with this a bit and it's cool. I created a full-MH strategy
>> called EXACT which always calculates upfront exactly what the
>> resulting string length will be so that it never reallocates the
>> building buffer. It doesn't use StringBuilder at all, but directly
>> fills-in a char[] which it then constructs resulting String with
>> without copying the array. Here's how it compares with other
>> strategies in your quick JMH benchmark:
>>
>> -Djava.lang.invoke.stringConcat=INNER_SIZED
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ConcatBench.string_string 1 avgt 15 15.433 ± 1.430 ns/op
>> ConcatBench.string_string 10 avgt 15 19.479 ± 2.062 ns/op
>> ConcatBench.string_string 100 avgt 15 46.397 ± 2.112 ns/op
>> ConcatBench.string_string_int 1 avgt 15 73.743 ± 3.888 ns/op
>> ConcatBench.string_string_int 10 avgt 15 81.760 ± 5.398 ns/op
>> ConcatBench.string_string_int 100 avgt 15 204.479 ± 11.887 ns/op
>> ConcatBench.string_string_long 1 avgt 15 78.695 ± 6.292 ns/op
>> ConcatBench.string_string_long 10 avgt 15 81.722 ± 6.045 ns/op
>> ConcatBench.string_string_long 100 avgt 15 213.209 ± 33.311 ns/op
>>
>> -Djava.lang.invoke.stringConcat=INNER
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ConcatBench.string_string 1 avgt 15 13.990 ± 0.795 ns/op
>> ConcatBench.string_string 10 avgt 15 17.306 ± 1.348 ns/op
>> ConcatBench.string_string 100 avgt 15 46.452 ± 2.730 ns/op
>> ConcatBench.string_string_int 1 avgt 15 48.821 ± 3.855 ns/op
>> ConcatBench.string_string_int 10 avgt 15 70.947 ± 5.537 ns/op
>> ConcatBench.string_string_int 100 avgt 15 145.151 ± 9.816 ns/op
>> ConcatBench.string_string_long 1 avgt 15 47.395 ± 3.015 ns/op
>> ConcatBench.string_string_long 10 avgt 15 71.046 ± 3.842 ns/op
>> ConcatBench.string_string_long 100 avgt 15 144.379 ± 9.042 ns/op
>>
>> -Djava.lang.invoke.stringConcat=EXACT
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ConcatBench.string_string 1 avgt 15 26.940 ± 1.166 ns/op
>> ConcatBench.string_string 10 avgt 15 29.833 ± 2.053 ns/op
>> ConcatBench.string_string 100 avgt 15 54.967 ± 3.610 ns/op
>> ConcatBench.string_string_int 1 avgt 15 32.282 ± 1.868 ns/op
>> ConcatBench.string_string_int 10 avgt 15 34.342 ± 1.677 ns/op
>> ConcatBench.string_string_int 100 avgt 15 59.280 ± 2.661 ns/op
>> ConcatBench.string_string_long 1 avgt 15 35.146 ± 1.911 ns/op
>> ConcatBench.string_string_long 10 avgt 15 35.799 ± 1.862 ns/op
>> ConcatBench.string_string_long 100 avgt 15 60.160 ± 5.329 ns/op
>>
>> -Djava.lang.invoke.stringConcat=FULL_MH
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ConcatBench.string_string 1 avgt 15 36.820 ± 1.778 ns/op
>> ConcatBench.string_string 10 avgt 15 39.301 ± 1.939 ns/op
>> ConcatBench.string_string 100 avgt 15 96.144 ± 5.151 ns/op
>> ConcatBench.string_string_int 1 avgt 15 55.876 ± 2.802 ns/op
>> ConcatBench.string_string_int 10 avgt 15 63.166 ± 4.442 ns/op
>> ConcatBench.string_string_int 100 avgt 15 110.761 ± 8.146 ns/op
>> ConcatBench.string_string_long 1 avgt 15 57.672 ± 3.732 ns/op
>> ConcatBench.string_string_long 10 avgt 15 61.310 ± 4.416 ns/op
>> ConcatBench.string_string_long 100 avgt 15 109.464 ± 8.979 ns/op
>>
>>
>> It doesn't beat INNER strategies for small sizes, but does quite well
>> with bigger sizes. It is better than Remi's FULL_MH which is also
>> "full-mh". I think it is a good candidate for optimization, since it
>> seems it has a constant overhead which is not caused by sub-optimal
>> algorithm. Perhaps by encoding the algorithm in a spinned inner class?
>>
>> Here's the implementation:
>>
>> http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/webrev.01/
>>
>>
>> ...encapsulated in StringConcatFactory.ExactStrategy inner class...
>>
>> The algorithm is very unusual as it fills the char[] backwards. It
>> uses a helper package-private class java.lang.StringConcatHelper with
>> a bunch of methods used for measuring the lengths and "prepending"
>> (not appending) to the resulting char[]. The helper class is in
>> java.lang package to have access to internal methods of j.l.Integer
>> and j.l.Long classes which already contain the basic algorithms for
>> lengthing and stringing the int/long values. Their algorithms for
>> converting int/long to characters fill characters backwards, so this
>> is the main reason why the whole strategy uses this approach. byte
>> and short primitives are widened into int as well as truncated into
>> int in the caching key. float and double primitives are converted to
>> String in the strategy, since their basic conversion algorithm is
>> encapsulated in a class that already allocates a thread-local working
>> array and it would be hard to do it otherwise (i.e. the algorithm can
>> not be split into lengthing and stringing part - it is fused).
>>
>> Regards, Peter
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20150609/ba461a3d/attachment-0001.html>
More information about the compiler-dev
mailing list