String concatenation tweaks

Remi Forax forax at univ-mlv.fr
Tue Jun 9 07:37:23 UTC 2015


So basically, you bypass the use of StringBuilder and make the JIT's 
OptoStringConcat optimization useless by rewriting it in Java, that's nice !

if you combine this idea with the fact that the constant Strings can be 
pass 'out of the band', it will be hard to have something faster.

Rémi

On 06/08/2015 08:31 AM, Peter Levart wrote:
>
>
> On 06/04/2015 11:21 AM, Aleksey Shipilev wrote:
>> The patches, notes, benchmarks are moved under the appropriate bug ID:
>>    http://cr.openjdk.java.net/~shade/8085796/
>>
>> The patches now have a complete support in javac (implemented the "s +=
>> <string> concat part as well), passes jtreg and other smoke tests [*],
>> plus newly developed regression tests. I think this gives us enough
>> confidence with going forward.
>>
>> Thanks,
>> -Aleksey
>>
>> [*] There are few regression tests that start to fail with the change:
>> one seems Indify-specific, and another is a javac-specific regression
>> test. Current patches work that around, and we will probably have to fix
>> these properly before integration.
>>
>>
>
> Hi Aleksey,
>
> I played with this a bit and it's cool. I created a full-MH strategy 
> called EXACT which always calculates upfront exactly what the 
> resulting string length will be so that it never reallocates the 
> building buffer. It doesn't use StringBuilder at all, but directly 
> fills-in a char[] which it then constructs resulting String with 
> without copying the array. Here's how it compares with other 
> strategies in your quick JMH benchmark:
>
> -Djava.lang.invoke.stringConcat=INNER_SIZED
>
> Benchmark                       (size)  Mode  Cnt Score    Error  Units
> ConcatBench.string_string            1  avgt   15 15.433 ±  1.430  ns/op
> ConcatBench.string_string           10  avgt   15 19.479 ±  2.062  ns/op
> ConcatBench.string_string          100  avgt   15 46.397 ±  2.112  ns/op
> ConcatBench.string_string_int        1  avgt   15 73.743 ±  3.888  ns/op
> ConcatBench.string_string_int       10  avgt   15 81.760 ±  5.398  ns/op
> ConcatBench.string_string_int      100  avgt   15 204.479 ± 11.887  ns/op
> ConcatBench.string_string_long       1  avgt   15 78.695 ±  6.292  ns/op
> ConcatBench.string_string_long      10  avgt   15 81.722 ±  6.045  ns/op
> ConcatBench.string_string_long     100  avgt   15 213.209 ± 33.311  ns/op
>
> -Djava.lang.invoke.stringConcat=INNER
>
> Benchmark                       (size)  Mode  Cnt Score   Error  Units
> ConcatBench.string_string            1  avgt   15 13.990 ± 0.795  ns/op
> ConcatBench.string_string           10  avgt   15 17.306 ± 1.348  ns/op
> ConcatBench.string_string          100  avgt   15 46.452 ± 2.730  ns/op
> ConcatBench.string_string_int        1  avgt   15 48.821 ± 3.855  ns/op
> ConcatBench.string_string_int       10  avgt   15 70.947 ± 5.537  ns/op
> ConcatBench.string_string_int      100  avgt   15 145.151 ± 9.816  ns/op
> ConcatBench.string_string_long       1  avgt   15 47.395 ± 3.015  ns/op
> ConcatBench.string_string_long      10  avgt   15 71.046 ± 3.842  ns/op
> ConcatBench.string_string_long     100  avgt   15 144.379 ± 9.042  ns/op
>
> -Djava.lang.invoke.stringConcat=EXACT
>
> Benchmark                       (size)  Mode  Cnt Score   Error  Units
> ConcatBench.string_string            1  avgt   15  26.940 ± 1.166  ns/op
> ConcatBench.string_string           10  avgt   15  29.833 ± 2.053  ns/op
> ConcatBench.string_string          100  avgt   15  54.967 ± 3.610  ns/op
> ConcatBench.string_string_int        1  avgt   15  32.282 ± 1.868  ns/op
> ConcatBench.string_string_int       10  avgt   15  34.342 ± 1.677  ns/op
> ConcatBench.string_string_int      100  avgt   15  59.280 ± 2.661  ns/op
> ConcatBench.string_string_long       1  avgt   15  35.146 ± 1.911  ns/op
> ConcatBench.string_string_long      10  avgt   15  35.799 ± 1.862  ns/op
> ConcatBench.string_string_long     100  avgt   15  60.160 ± 5.329  ns/op
>
> -Djava.lang.invoke.stringConcat=FULL_MH
>
> Benchmark                       (size)  Mode  Cnt Score   Error  Units
> ConcatBench.string_string            1  avgt   15 36.820 ± 1.778  ns/op
> ConcatBench.string_string           10  avgt   15 39.301 ± 1.939  ns/op
> ConcatBench.string_string          100  avgt   15 96.144 ± 5.151  ns/op
> ConcatBench.string_string_int        1  avgt   15 55.876 ± 2.802  ns/op
> ConcatBench.string_string_int       10  avgt   15 63.166 ± 4.442  ns/op
> ConcatBench.string_string_int      100  avgt   15 110.761 ± 8.146  ns/op
> ConcatBench.string_string_long       1  avgt   15 57.672 ± 3.732  ns/op
> ConcatBench.string_string_long      10  avgt   15 61.310 ± 4.416  ns/op
> ConcatBench.string_string_long     100  avgt   15 109.464 ± 8.979  ns/op
>
>
> It doesn't beat INNER strategies for small sizes, but does quite well 
> with bigger sizes. It is better than Remi's FULL_MH which is also 
> "full-mh". I think it is a good candidate for optimization, since it 
> seems it has a constant overhead which is not caused by sub-optimal 
> algorithm. Perhaps by encoding the algorithm in a spinned inner class?
>
> Here's the implementation:
>
> http://cr.openjdk.java.net/~plevart/misc/StringConcatenation/webrev.01/
>
>
> ...encapsulated in StringConcatFactory.ExactStrategy inner class...
>
> The algorithm is very unusual as it fills the char[] backwards. It 
> uses a helper package-private class java.lang.StringConcatHelper with 
> a bunch of methods used for measuring the lengths and "prepending" 
> (not appending) to the resulting char[]. The helper class is in 
> java.lang package to have access to internal methods of j.l.Integer 
> and j.l.Long classes which already contain the basic algorithms for 
> lengthing and stringing the int/long values. Their algorithms for 
> converting int/long to characters fill characters backwards, so this 
> is the main reason why the whole strategy uses this approach. byte and 
> short primitives are widened into int as well as truncated into int in 
> the caching key. float and double primitives are converted to String 
> in the strategy, since their basic conversion algorithm is 
> encapsulated in a class that already allocates a thread-local working 
> array and it would be hard to do it otherwise (i.e. the algorithm can 
> not be split into lengthing and stringing part - it is fused).
>
> Regards, Peter
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20150609/36f4448a/attachment-0001.html>


More information about the compiler-dev mailing list