String concatenation tweaks

Aleksey Shipilev aleksey.shipilev at oracle.com
Mon Jun 1 11:09:05 UTC 2015


Hi Jan,

Thanks for pointing out this alternative; I should add it to the JEP's
"Alternatives" section. See below:

On 06/01/2015 12:29 PM, Jan Lahoda wrote:
> While I agree that use of invokedynamic to allow optimized String
> concatenation is very clever, using invokedynamic for such a basic
> operation seems like overkill to me. I suppose that many tools that
> read/analyze/manipulate the classfiles would need to understand what
> this new code is doing, etc.

Think about indy here as a declaration of intent. Yes, it will
require tools to recognize that (newly shaped) intent. But we've been
down that path before with lambda translation. Introducing this new
bytecode shape is one of the reasons why we had better do it in the
upcoming *major* release.

The overheads, even in the current (crude) prototype, come from the first
linkage of the indy call site, when we need to expand on the intent --
this is what the crude startup tests are trying to show. There should
not be much of a difference (if any) after the indy call site has
linked, i.e. when we reach the steady state.
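
To make the linkage-versus-steady-state split concrete, here is a
minimal sketch in plain Java. It is not the prototype's code: the class
IndyIntentSketch, its bootstrap method and the concatAll fallback are
hypothetical names. Java source cannot emit invokedynamic directly, so
the bootstrap is called by hand to mimic what the JVM would do once per
call site.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class IndyIntentSketch {
    // Hypothetical fallback strategy: a plain StringBuilder loop over boxed arguments.
    static String concatAll(Object[] args) {
        StringBuilder sb = new StringBuilder();
        for (Object arg : args) {
            sb.append(arg);
        }
        return sb.toString();
    }

    // Hypothetical bootstrap method. For a real indy call site the JVM would
    // call something like this once, at first linkage.
    static CallSite bootstrap(MethodHandles.Lookup caller, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle generic = MethodHandles.lookup().findStatic(
                IndyIntentSketch.class, "concatAll",
                MethodType.methodType(String.class, Object[].class));
        // Collect the call-site arguments into an Object[] and adapt the handle
        // to the exact call-site signature, e.g. (String,int)String.
        MethodHandle target = generic
                .asCollector(Object[].class, type.parameterCount())
                .asType(type);
        return new ConstantCallSite(target);
    }

    // Simulates one call site for the expression: s + i
    static String demo(String s, int i) {
        try {
            MethodType type = MethodType.methodType(String.class, String.class, int.class);
            // First linkage: the part that shows up in startup tests.
            CallSite site = bootstrap(MethodHandles.lookup(), "concat", type);
            // Steady state: only the linked target is invoked from here on.
            MethodHandle linked = site.dynamicInvoker();
            return (String) linked.invokeExact(s, i);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("x=", 42));
    }
}
```

The point of the sketch is only that the strategy behind the call site
(here a StringBuilder loop) is chosen at link time and can change
without touching the bytecode that declares the intent.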


> Is there any chance the JIT could detect and optimize the pattern? The
> advantage would be not only that existing classfiles would get the
> optimization, but also that (some) hand-written String concatenations
> might get the improved behavior as well.

C2 already optimizes quite a few StringBuilder append chains (see
-XX:+-OptimizeStringConcat [1]), but not all of them. In fact, Louis
told us that their "precompute the final String length" trick improves
performance beyond what OptoStringConcat does today.
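
As a rough illustration of the difference, the sketch below contrasts a
StringBuilder append chain of the kind javac emits with a hand-written
"compute the exact length first" variant. This is not Louis's code, whose
details are not given in this thread; the class and method names are
hypothetical and only the general shape of the trick is shown.

```java
final class PresizedConcatSketch {
    // Roughly the source-level equivalent of what javac emits for:
    //   prefix + value + suffix
    static String builderChain(String prefix, int value, String suffix) {
        return new StringBuilder().append(prefix).append(value).append(suffix).toString();
    }

    // Hypothetical hand-written variant: convert the arguments, sum their
    // lengths, allocate the buffer once, then copy each piece into place.
    static String presized(String prefix, int value, String suffix) {
        String p = String.valueOf(prefix);   // null becomes "null", as in concatenation
        String v = Integer.toString(value);
        String s = String.valueOf(suffix);
        char[] buf = new char[p.length() + v.length() + s.length()];
        int pos = 0;
        p.getChars(0, p.length(), buf, pos);
        pos += p.length();
        v.getChars(0, v.length(), buf, pos);
        pos += v.length();
        s.getChars(0, s.length(), buf, pos);
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(builderChain("id=", 7, ";"));
        System.out.println(presized("id=", 7, ";"));
    }
}
```

The second form never resizes its storage, which is the property the
append chain cannot guarantee when it starts from the default capacity.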

There are two kinds of trouble I see with extending OptoStringConcat to
more cases:

 a) The need to recognize the code shape emitted by javac. Minute
differences in the bytecode shape break OptoStringConcat, as we have
already seen in this work. This is somewhat alleviated if we agree to
keep the javac-generated code shape in its current form forever and ever.
(There was also a funny thread on StackOverflow about the differences
between Eclipse's compiler and Oracle's javac when it comes to String
concat, but I can't find it now.)

 b) Optimizing on the JIT compiler side requires spelling out the
toString conversions and storage management in C2 IR. See e.g. the
rewrite of the corresponding Java code in PhaseStringOpts::int_getChars [1].
JIT optimizations, in fact, make the whole business *less* observable
than plain bytecode and/or MH combinators.
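
To illustrate point (a), here are three source shapes that produce the
same String but compile to different bytecode. The names are
hypothetical, and nothing here claims which of these shapes C2 actually
matches; the constructor-seeded form is only an assumption about what
another compiler might plausibly emit.

```java
final class ConcatShapesSketch {
    // Shape A: one fluent chain starting from the no-arg constructor.
    static String chained(String a, int b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    // Shape B: the first operand is passed to the constructor instead of append.
    static String seeded(String a, int b) {
        return new StringBuilder(String.valueOf(a)).append(b).toString();
    }

    // Shape C: the builder is stored in a local and each append result is
    // discarded, so the bytecode has extra stores, loads and pops.
    static String statementByStatement(String a, int b) {
        StringBuilder sb = new StringBuilder();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chained("n=", 1));
        System.out.println(seeded("n=", 1));
        System.out.println(statementByStatement("n=", 1));
    }
}
```

A pattern matcher keyed to one of these shapes has to be taught about
the others separately, even though all three mean "concat".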

There is a careful balance between what the JIT compiler can recognize,
and what it should be forced to recognize. Clearly demarcating the intent
to "concat" helps to recognize the idiom.

The choice of indy is incidental here: it actually covers the case when
the compiler cannot recognize the idiom, and we need to end up running
almost the same bytecode sequence as javac currently generates. If that
were not an issue, and we did not care about the fallback performance, we
might as well add java.lang.StringConcat::concat(Object... args), ask the
JIT to eliminate the boxing of primitive args and the array for the
varargs call, provide a fallback implementation in Java for the cases the
JIT refuses to optimize, and be done by lunch.
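
A minimal sketch of that alternative, to show where the boxing and the
varargs array come from. StringConcatSketch is a hypothetical stand-in
for the java.lang.StringConcat class named above, which does not exist;
the body is just the plain-Java fallback.

```java
final class StringConcatSketch {
    // Hypothetical fallback implementation in Java: the path taken whenever
    // the JIT does not optimize the call.
    static String concat(Object... args) {
        StringBuilder sb = new StringBuilder();
        for (Object arg : args) {
            sb.append(arg);
        }
        return sb.toString();
    }

    // What a call site for (name + count) would look like under this scheme.
    static String demo(String name, int count) {
        // The compiler allocates an Object[2] here and boxes count into an
        // Integer; both would have to be eliminated by the JIT.
        return concat(name, count);
    }

    public static void main(String[] args) {
        System.out.println(demo("count=", 3));
    }
}
```

Without JIT help every call pays for the array and the boxes, which is
why the fallback performance matters in this design.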

On top of that, let's ask the philosophical question: if the JIT
recognizes a StringBuilder append chain with default initial capacity,
can/should it really replace it with optimized concat code? Or is that
just an accident of the inability to disambiguate javac's generated
bytecode sequence (which actually spells "don't care about the capacity
at this point, this is just the concat, go wild") from genuine user
code (which *may* spell "I am relying on default capacity rules as
described in StringBuilder Javadocs")?
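
A small sketch of what "relying on default capacity rules" can look like
in user code. The class and method names are hypothetical; the only
facts used are that StringBuilder's no-arg constructor is documented to
start with a capacity of 16 and that capacity() exposes the current
value.

```java
final class DefaultCapacitySketch {
    // The documented initial capacity of the no-arg constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Genuine user code can hold on to the builder and observe its capacity
    // after an append chain. A javac-generated concat never exposes it.
    static int capacityAfterChain(String a, String b) {
        StringBuilder sb = new StringBuilder();
        sb.append(a).append(b);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());
        System.out.println(capacityAfterChain("ab", "cd"));
    }
}
```

The append chain in capacityAfterChain has the same shape as a
javac-generated concat, yet replacing it with presized storage would
change what capacity() returns.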


Thanks,
-Aleksey.

[1]
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/ac6a7b63d701/src/share/vm/opto/stringopts.cpp


> On 1.6.2015 09:49, Aleksey Shipilev wrote:
>> Hi Peter,
>>
>> On 06/01/2015 12:33 AM, Peter Levart wrote:
>>> On 05/31/2015 11:06 PM, Peter Levart wrote:
>>>> On 05/31/2015 10:58 PM, Peter Levart wrote:
>>>>> This is a noble goal. I will just warn you about the possible
>>>>> initialization problems. String concatenation is a very rudimentary
>>>>> operation and might be used very early in the startup of the JVM. If
>>>>> it is used before the system class loader is initialized (before the
>>>>> main method is executed), you will be faced with the following issue
>>>>> at least:
>>>>>
>>>>>
>>>>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2015-March/006386.html
>>>>>
>>>>> ...so we might need to fix these early java.lang.invoke
>>>>> initialization problems 1st.
>>>>
>>>> Not to mention that java.lang.invoke infrastructure (at least the part
>>>> that is used to support invokedynamic etc.) should then *not* use
>>>> string concatenation...
>>>
>>> One way to tackle this is to have a javac option to emit classical
>>> StringBuilder-based code and then build the (java.base module at least)
>>> with this option. So only other modules and user code would use indy
>>> based concatenation.
>>
>> If you read my notes about this:
>>   http://cr.openjdk.java.net/~shade/scratch/string-concat-indy/notes.txt
>>
>> You will see the mention of "java.base is exempt from indy string
>> concat, otherwise the initialization circularity ensues". Indeed, there
>> is a patch that disables indy string concat for java.base:
>>  
>> http://cr.openjdk.java.net/~shade/scratch/string-concat-indy/patch-root-1.patch
>>
>>
>>> This will also eliminate worries about startup time.
>>
>> It would not, because, as I was saying in the notes, the significant
>> time is spent dealing with indy infrastructure for every user string
>> concat. In other words, a simple smoke test with HelloWorld concating a
>> simple string suffers quite a bit.
>>
>> Thanks,
>> -Aleksey.
>>
>>


More information about the compiler-dev mailing list