String concatenation tweaks

Tue Jun 9 17:20:53 UTC 2015

On 06/09/2015 09:37 AM, Remi Forax wrote:
> So basically, you bypass the use of StringBuilder and make the JIT's 
> OptoStringConcat optimization useless by rewriting it in Java, that's 
> nice !
>
> if you combine this idea with the fact that the constant Strings can 
> be pass 'out of the band', it will be hard to have something faster.

I don't think it is realistic to "compile-in" specializations for 
constant strings. Why? There are too much of them and each distinct 
specialization takes time to initialize. This is what I found out 
analyzing string concatenation usages of a real-world program...

I asked myself how one could measure and compare startup performance for 
a real-world program. Ideally one would like to measure just string 
concatenations as they occur in a real-world program when it starts up. 
So I thought, why not record string concatenations as they occur and 
then replay them in a JMH benchmark? Here's what I came-up with:

http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/webrev.02/

Analyzer is a generic MethodHandle interceptor that can be used to 
analyze registrations (call-site linkages) and invocations of method 
handles that are instrumented. I took Apache Tomcat 8 sources, compiled 
it with indy-string-concat-enabled javac and started the container with 
the following system property (defined in Tomcat's setenv.sh):

CATALINA_OPTS=-Djava.lang.invoke.stringConcat.analyzer=java.lang.invoke.StringConcatRecorderAgent:/tmp/concats_catalina.txt

...after Tomcat booted, I requested it's home page once and then shut it 
down. What I got recorded is the following Java source:

http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/benchmarks/TomcatStartup.java

Running this as part of JMH benchmark is easy:

http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/benchmarks/ConcatStartupBench.java

The results suggest that caching should always be turned on. The most 
interesting are "Cold" tests. These are one-shot invocations of a bunch 
of string-concatenation invocations as they occur in Tomcat startup in a 
newly started VM. "Warm" tests are one-shot invocations after 10 
iterations of warm-up invocations and show how quickly some strategies 
get optimized by JIT compared to others.

Just looking at TomcatStartup.java string concatenations suggests that 
majority of them are actually concatenations of String(s) and minority 
of them use other types of arguments. This is where OptoStringConcat 
does it's best. To see statistics and distributions of string 
concatenations in Tomcat startup, I fired it up using the following 
system property (defined in Tomcat's setenv.sh):

CATALINA_OPTS=-Djava.lang.invoke.stringConcat.analyzer=java.lang.invoke.StringConcatStatsAgent:/tmp/concat_stats_catalina.txt

...which produces the following file after it is shut down:

http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/benchmarks/concat_stats_catalina.txt

Should we optimize for Tomcat, optimizing for two String concatenation 
will cover 90% of call sites and 90% of invocations.

Any ideas which program to try next? It should be easy to compile it 
from sources on JDK9 and run it on JDK9 too. For Tomcat, this was 
straight-forward. I just had to strip definitions of java.endorsed.dirs 
system property from startup scripts...

So to answer Rémi's proposal on specialization for constant strings: The 
statistics of running Tomcat show that there are just 18 (17 when types 
are truncated) distinct shapes of string concatenations in 519 distinct 
call-sites that are used during Tomcat startup. I don't have evidence 
currently on which strings are constant and which not, but I have a 
feeling that much more than 18 distinct constant strings are being used 
in Tomcat startup. Looking at startup benchmark numbers, I also have a 
feeling that too-much specialization will hurt startup and overall 
real-world performance, not improve it.

Regards, Peter

>
> Rémi
>
> On 06/08/2015 08:31 AM, Peter Levart wrote:
>>
>>
>> On 06/04/2015 11:21 AM, Aleksey Shipilev wrote:
>>> The patches, notes, benchmarks are moved under the appropriate bug ID:
>>>    http://cr.openjdk.java.net/~shade/8085796/
>>>
>>> The patches now have a complete support in javac (implemented the "s +=
>>> <string> concat part as well), passes jtreg and other smoke tests [*],
>>> plus newly developed regression tests. I think this gives us enough
>>> confidence with going forward.
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>> [*] There are few regression tests that start to fail with the change:
>>> one seems Indify-specific, and another is a javac-specific regression
>>> test. Current patches work that around, and we will probably have to fix
>>> these properly before integration.
>>>
>>>
>>
>> Hi Aleksey,
>>
>> I played with this a bit and it's cool. I created a full-MH strategy 
>> called EXACT which always calculates upfront exactly what the 
>> resulting string length will be so that it never reallocates the 
>> building buffer. It doesn't use StringBuilder at all, but directly 
>> fills-in a char[] which it then constructs resulting String with 
>> without copying the array. Here's how it compares with other 
>> strategies in your quick JMH benchmark:
>>
>> -Djava.lang.invoke.stringConcat=INNER_SIZED
>>
>> Benchmark                       (size)  Mode  Cnt Score    Error  Units
>> ConcatBench.string_string            1  avgt   15 15.433 ±  1.430  ns/op
>> ConcatBench.string_string           10  avgt   15 19.479 ±  2.062  ns/op
>> ConcatBench.string_string          100  avgt   15 46.397 ±  2.112  ns/op
>> ConcatBench.string_string_int        1  avgt   15 73.743 ±  3.888  ns/op
>> ConcatBench.string_string_int       10  avgt   15 81.760 ±  5.398  ns/op
>> ConcatBench.string_string_int      100  avgt   15 204.479 ± 11.887  ns/op
>> ConcatBench.string_string_long       1  avgt   15 78.695 ±  6.292  ns/op
>> ConcatBench.string_string_long      10  avgt   15 81.722 ±  6.045  ns/op
>> ConcatBench.string_string_long     100  avgt   15 213.209 ± 33.311  ns/op
>>
>> -Djava.lang.invoke.stringConcat=INNER
>>
>> Benchmark                       (size)  Mode  Cnt Score   Error  Units
>> ConcatBench.string_string            1  avgt   15 13.990 ± 0.795  ns/op
>> ConcatBench.string_string           10  avgt   15 17.306 ± 1.348  ns/op
>> ConcatBench.string_string          100  avgt   15 46.452 ± 2.730  ns/op
>> ConcatBench.string_string_int        1  avgt   15 48.821 ± 3.855  ns/op
>> ConcatBench.string_string_int       10  avgt   15 70.947 ± 5.537  ns/op
>> ConcatBench.string_string_int      100  avgt   15 145.151 ± 9.816  ns/op
>> ConcatBench.string_string_long       1  avgt   15 47.395 ± 3.015  ns/op
>> ConcatBench.string_string_long      10  avgt   15 71.046 ± 3.842  ns/op
>> ConcatBench.string_string_long     100  avgt   15 144.379 ± 9.042  ns/op
>>
>> -Djava.lang.invoke.stringConcat=EXACT
>>
>> Benchmark                       (size)  Mode  Cnt Score   Error  Units
>> ConcatBench.string_string            1  avgt   15 26.940 ± 1.166  ns/op
>> ConcatBench.string_string           10  avgt   15 29.833 ± 2.053  ns/op
>> ConcatBench.string_string          100  avgt   15 54.967 ± 3.610  ns/op
>> ConcatBench.string_string_int        1  avgt   15 32.282 ± 1.868  ns/op
>> ConcatBench.string_string_int       10  avgt   15 34.342 ± 1.677  ns/op
>> ConcatBench.string_string_int      100  avgt   15 59.280 ± 2.661  ns/op
>> ConcatBench.string_string_long       1  avgt   15 35.146 ± 1.911  ns/op
>> ConcatBench.string_string_long      10  avgt   15 35.799 ± 1.862  ns/op
>> ConcatBench.string_string_long     100  avgt   15 60.160 ± 5.329  ns/op
>>
>> -Djava.lang.invoke.stringConcat=FULL_MH
>>
>> Benchmark                       (size)  Mode  Cnt Score   Error  Units
>> ConcatBench.string_string            1  avgt   15 36.820 ± 1.778  ns/op
>> ConcatBench.string_string           10  avgt   15 39.301 ± 1.939  ns/op
>> ConcatBench.string_string          100  avgt   15 96.144 ± 5.151  ns/op
>> ConcatBench.string_string_int        1  avgt   15 55.876 ± 2.802  ns/op
>> ConcatBench.string_string_int       10  avgt   15 63.166 ± 4.442  ns/op
>> ConcatBench.string_string_int      100  avgt   15 110.761 ± 8.146  ns/op
>> ConcatBench.string_string_long       1  avgt   15 57.672 ± 3.732  ns/op
>> ConcatBench.string_string_long      10  avgt   15 61.310 ± 4.416  ns/op
>> ConcatBench.string_string_long     100  avgt   15 109.464 ± 8.979  ns/op
>>
>>
>> It doesn't beat INNER strategies for small sizes, but does quite well 
>> with bigger sizes. It is better than Remi's FULL_MH which is also 
>> "full-mh". I think it is a good candidate for optimization, since it 
>> seems it has a constant overhead which is not caused by sub-optimal 
>> algorithm. Perhaps by encoding the algorithm in a spinned inner class?
>>
>> Here's the implementation:
>>
>> http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/webrev.01/
>>
>>
>> ...encapsulated in StringConcatFactory.ExactStrategy inner class...
>>
>> The algorithm is very unusual as it fills the char[] backwards. It 
>> uses a helper package-private class java.lang.StringConcatHelper with 
>> a bunch of methods used for measuring the lengths and "prepending" 
>> (not appending) to the resulting char[]. The helper class is in 
>> java.lang package to have access to internal methods of j.l.Integer 
>> and j.l.Long classes which already contain the basic algorithms for 
>> lengthing and stringing the int/long values. Their algorithms for 
>> converting int/long to characters fill characters backwards, so this 
>> is the main reason why the whole strategy uses this approach. byte 
>> and short primitives are widened into int as well as truncated into 
>> int in the caching key. float and double primitives are converted to 
>> String in the strategy, since their basic conversion algorithm is 
>> encapsulated in a class that already allocates a thread-local working 
>> array and it would be hard to do it otherwise (i.e. the algorithm can 
>> not be split into lengthing and stringing part - it is fused).
>>
>> Regards, Peter
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20150609/ba461a3d/attachment-0001.html>