String concatenation tweaks

Mon Jun 8 06:31:16 UTC 2015

On 06/04/2015 11:21 AM, Aleksey Shipilev wrote:
> The patches, notes, benchmarks are moved under the appropriate bug ID:
>    http://cr.openjdk.java.net/~shade/8085796/
>
> The patches now have a complete support in javac (implemented the "s +=
> <string> concat part as well), passes jtreg and other smoke tests [*],
> plus newly developed regression tests. I think this gives us enough
> confidence with going forward.
>
> Thanks,
> -Aleksey
>
> [*] There are few regression tests that start to fail with the change:
> one seems Indify-specific, and another is a javac-specific regression
> test. Current patches work that around, and we will probably have to fix
> these properly before integration.
>
>

Hi Aleksey,

I played with this a bit and it's cool. I created a full-MH strategy 
called EXACT which always calculates upfront exactly what the resulting 
string length will be so that it never reallocates the building buffer. 
It doesn't use StringBuilder at all, but directly fills-in a char[] 
which it then constructs resulting String with without copying the 
array. Here's how it compares with other strategies in your quick JMH 
benchmark:

-Djava.lang.invoke.stringConcat=INNER_SIZED

Benchmark                       (size)  Mode  Cnt Score    Error  Units
ConcatBench.string_string            1  avgt   15   15.433 ±  1.430  ns/op
ConcatBench.string_string           10  avgt   15   19.479 ±  2.062  ns/op
ConcatBench.string_string          100  avgt   15   46.397 ±  2.112  ns/op
ConcatBench.string_string_int        1  avgt   15   73.743 ±  3.888  ns/op
ConcatBench.string_string_int       10  avgt   15   81.760 ±  5.398  ns/op
ConcatBench.string_string_int      100  avgt   15  204.479 ± 11.887  ns/op
ConcatBench.string_string_long       1  avgt   15   78.695 ±  6.292  ns/op
ConcatBench.string_string_long      10  avgt   15   81.722 ±  6.045  ns/op
ConcatBench.string_string_long     100  avgt   15  213.209 ± 33.311  ns/op

-Djava.lang.invoke.stringConcat=INNER

Benchmark                       (size)  Mode  Cnt Score   Error  Units
ConcatBench.string_string            1  avgt   15   13.990 ± 0.795  ns/op
ConcatBench.string_string           10  avgt   15   17.306 ± 1.348  ns/op
ConcatBench.string_string          100  avgt   15   46.452 ± 2.730  ns/op
ConcatBench.string_string_int        1  avgt   15   48.821 ± 3.855  ns/op
ConcatBench.string_string_int       10  avgt   15   70.947 ± 5.537  ns/op
ConcatBench.string_string_int      100  avgt   15  145.151 ± 9.816  ns/op
ConcatBench.string_string_long       1  avgt   15   47.395 ± 3.015  ns/op
ConcatBench.string_string_long      10  avgt   15   71.046 ± 3.842  ns/op
ConcatBench.string_string_long     100  avgt   15  144.379 ± 9.042  ns/op

-Djava.lang.invoke.stringConcat=EXACT

Benchmark                       (size)  Mode  Cnt   Score Error  Units
ConcatBench.string_string            1  avgt   15  26.940 ± 1.166  ns/op
ConcatBench.string_string           10  avgt   15  29.833 ± 2.053  ns/op
ConcatBench.string_string          100  avgt   15  54.967 ± 3.610  ns/op
ConcatBench.string_string_int        1  avgt   15  32.282 ± 1.868  ns/op
ConcatBench.string_string_int       10  avgt   15  34.342 ± 1.677  ns/op
ConcatBench.string_string_int      100  avgt   15  59.280 ± 2.661  ns/op
ConcatBench.string_string_long       1  avgt   15  35.146 ± 1.911  ns/op
ConcatBench.string_string_long      10  avgt   15  35.799 ± 1.862  ns/op
ConcatBench.string_string_long     100  avgt   15  60.160 ± 5.329  ns/op

-Djava.lang.invoke.stringConcat=FULL_MH

Benchmark                       (size)  Mode  Cnt Score   Error  Units
ConcatBench.string_string            1  avgt   15   36.820 ± 1.778  ns/op
ConcatBench.string_string           10  avgt   15   39.301 ± 1.939  ns/op
ConcatBench.string_string          100  avgt   15   96.144 ± 5.151  ns/op
ConcatBench.string_string_int        1  avgt   15   55.876 ± 2.802  ns/op
ConcatBench.string_string_int       10  avgt   15   63.166 ± 4.442  ns/op
ConcatBench.string_string_int      100  avgt   15  110.761 ± 8.146  ns/op
ConcatBench.string_string_long       1  avgt   15   57.672 ± 3.732  ns/op
ConcatBench.string_string_long      10  avgt   15   61.310 ± 4.416  ns/op
ConcatBench.string_string_long     100  avgt   15  109.464 ± 8.979  ns/op

It doesn't beat INNER strategies for small sizes, but does quite well 
with bigger sizes. It is better than Remi's FULL_MH which is also 
"full-mh". I think it is a good candidate for optimization, since it 
seems it has a constant overhead which is not caused by sub-optimal 
algorithm. Perhaps by encoding the algorithm in a spinned inner class?

Here's the implementation:

http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/webrev.01/

...encapsulated in StringConcatFactory.ExactStrategy inner class...

The algorithm is very unusual as it fills the char[] backwards. It uses 
a helper package-private class java.lang.StringConcatHelper with a bunch 
of methods used for measuring the lengths and "prepending" (not 
appending) to the resulting char[]. The helper class is in java.lang 
package to have access to internal methods of j.l.Integer and j.l.Long 
classes which already contain the basic algorithms for lengthing and 
stringing the int/long values. Their algorithms for converting int/long 
to characters fill characters backwards, so this is the main reason why 
the whole strategy uses this approach. byte and short primitives are 
widened into int as well as truncated into int in the caching key. float 
and double primitives are converted to String in the strategy, since 
their basic conversion algorithm is encapsulated in a class that already 
allocates a thread-local working array and it would be hard to do it 
otherwise (i.e. the algorithm can not be split into lengthing and 
stringing part - it is fused).

Regards, Peter

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20150608/5df43f39/attachment-0001.html>