String concatenation tweaks

Peter Levart peter.levart at gmail.com
Mon Jun 1 23:04:33 UTC 2015


Hi Aleksey,

On 06/01/2015 04:52 PM, Aleksey Shipilev wrote:
>> >Passing more
>> >type info might at first seem beneficial to potential strategies that
>> >might use it to construct better specializations, but semantically we
>> >don't need all the reference types. StringBuilder.append() overloads
>> >only differentiate among 4 distinct reference types:
>> >
>> >     Object, String, StringBuffer, CharSequence
>> >
>> >But internally StringBuilder actually dispatches dynamically to 5
>> >different run-time cases and treats them differently:
>> >
>> >     null, Object, String, AbstractStringBuilder, CharSequence
>> >
>> >Javac could truncate the reference types to one of the above 5 before
>> >emitting invokedynamic. The null literal value could be passed via the
>> >Void parameter type to differentiate it from other types. Why is
>> >truncating necessary?
>> >
>> >- the key space for the cached shapes is reduced this way. There are
>> >only 8 primitive + 5 reference = 13 types possible at each argument
>> >position this way.
>> >- passing truncated reference types to invokedynamic means that the
>> >bootstrap method doesn't have to do the truncation.
>> >- the truncation has to be performed anyway to get rid of custom/user
>> >reference types which, if used in the MethodType key for caching, will
>> >cause Class[Loader] leaks.
>> >
>> >What do you think?
> I think since we need to get the javac bytecode part future-proof, we
> are better off passing the concrete type info to the indy bootstrap.
> Bootstrap can then decide if it wants to collapse the types back to
> those 4-5 variants, solving both the explosion of shapes, and the class
> leaks.

String concatenation actually needs just two reference types (maybe
three if you want to treat the literal null as a special type, which is
a very rare case in practice). The JLS says that a binary '+' operator
where at least one of the operands is of type String is the string
concatenation operator. If the other operand is not of type String, it
must be converted to String via string conversion:

https://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#jls-5.1.11

That section says that .toString() needs to be called if the argument is
a reference type other than String and is not null at runtime. I checked
javac and yes, it emits code that only ever invokes two of the
overloaded StringBuilder.append() methods for reference-typed arguments:

append(String);
append(Object);

so the following expression:

"a" + new StringBuffer("b")

...is actually translated to the equivalent of:

new StringBuilder().append("a").append((Object) new StringBuffer("b")).toString()

...so the 'new StringBuffer("b")' argument is first converted with
.toString() and then appended. If the overloaded append method for a
StringBuffer argument were used, .getChars() would be invoked on the
argument instead to copy the chars directly. The same applies if the
argument type is CharSequence. Does this subtle detail need to be
respected, or can dirty tricks be played with known types to "optimize"
their conversion to String without actually invoking .toString() or even
creating an intermediate String object? Also, the String Conversion
specification says that if the .toString() method returns null, then the
string "null" is the result of the conversion. So there have to be two
null checks for reference arguments: one before and one after the
.toString() call.
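To make the two null checks concrete, here is a minimal sketch of string
conversion for a reference-typed operand. The class and method names
(StringConversionSketch, convert) are made up for illustration; this is
not what javac or any bootstrap method actually emits.

```java
public class StringConversionSketch {

    // Hypothetical helper: string conversion of a reference-typed operand,
    // following the rules described above.
    static String convert(Object o) {
        if (o == null) {
            return "null";                  // 1st check: the operand itself is null
        }
        String s = o.toString();
        return (s == null) ? "null" : s;    // 2nd check: toString() returned null
    }

    public static void main(String[] args) {
        // An object whose toString() misbehaves and returns null.
        Object weird = new Object() {
            @Override
            public String toString() {
                return null;
            }
        };
        System.out.println(convert(null));
        System.out.println(convert(weird));
        System.out.println(convert(new StringBuffer("b")));
    }
}
```

Both the null operand and the object with a null-returning toString()
end up as the string "null"; the StringBuffer goes through .toString().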

Regarding future-proof translation: do you think that in the future the
JLS could change and say that some pre-existing reference type other
than String is to be treated differently than before? There might be
some future reference type that is to be treated differently, but such a
type cannot appear in bytecode compiled before the type is invented. So
if you only put String, Object and null into the bytecode now, it should
be future-proof, and the set of truncated reference types can be
extended later.
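The truncation itself, wherever it ends up being done, could look
roughly like the sketch below. It uses only the public
java.lang.invoke.MethodType API; the class and method names are
hypothetical, and keeping Void as the marker for a literal null is just
the idea floated earlier in this thread, not a settled design.

```java
import java.lang.invoke.MethodType;

public class ShapeCollapseSketch {

    // Hypothetical rule: primitives and String are kept, Void is kept as
    // the (assumed) marker for a literal null, everything else is Object.
    static Class<?> collapse(Class<?> type) {
        if (type.isPrimitive() || type == String.class || type == Void.class) {
            return type;
        }
        return Object.class;
    }

    // Collapse every parameter type so that user-defined classes never
    // end up in a MethodType used as a cache key.
    static MethodType collapse(MethodType shape) {
        MethodType result = shape;
        for (int i = 0; i < shape.parameterCount(); i++) {
            result = result.changeParameterType(i, collapse(shape.parameterType(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        MethodType precise = MethodType.methodType(
                String.class,                       // return type of the concatenation
                String.class, StringBuffer.class, int.class, java.util.List.class);
        System.out.println(collapse(precise));
    }
}
```

With this rule the StringBuffer and List parameters both become Object,
so two call sites that differ only in such types share one cached shape.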

Also I don't think string concatenation will ever want to know the 
precise compile-time types of the reference typed arguments apart from 
the three mentioned: String, Object, null, unless it wants to play dirty 
tricks with some types to "optimize" String conversion (like 
StringBuilder.append(StringBuffer) does for example).
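Here is a small sketch of why such tricks are observable. For
StringBuffer the copied chars and the .toString() result agree, so I use
a made-up CharSequence (Odd, purely hypothetical) whose chars and
toString() deliberately disagree. The '+' operator goes through
.toString(), while the append(CharSequence) overload copies the chars.

```java
public class AppendOverloadSketch {

    // Hypothetical CharSequence whose chars and toString() disagree on purpose.
    static final class Odd implements CharSequence {
        @Override public int length() { return 1; }
        @Override public char charAt(int index) { return 'c'; }
        @Override public CharSequence subSequence(int start, int end) {
            return "c".substring(start, end);
        }
        @Override public String toString() { return "T"; }
    }

    // String concatenation: the operand is converted via toString().
    static String viaConcat(CharSequence cs) {
        return "a" + cs;
    }

    // Explicit builder chain: the static type selects append(CharSequence),
    // which reads the chars instead of calling toString().
    static String viaCharSequenceOverload(CharSequence cs) {
        return new StringBuilder().append("a").append(cs).toString();
    }

    public static void main(String[] args) {
        Odd odd = new Odd();
        System.out.println(viaConcat(odd));               // aT
        System.out.println(viaCharSequenceOverload(odd)); // ac
    }
}
```

So a strategy that silently switched to the CharSequence overload would
change the result for types like this one.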

Regards, Peter

> (Note to self: current prototype collapses the types *after* checking
> with cache, need to fix that possible class leak, thanks!)
>
> We are not inherently limited with StringBuilder API to do the
> concatenation. This compiler improvement actually opens up the way for
> specialized implementations that span more than just current 4 reference
> types.
>
> Example case: would it make sense to null-check and unbox Integer before
> pushing it on to append() chain? This will set us up for
> OptoStringConcat for new SB().append(String).append(Integer).toString() :)

What is an Integer at compile time can still be null at runtime:

Integer i = null;

System.out.println("i=" + i);
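A runnable version of the above, plus a sketch of what "null-check and
unbox" would have to look like to keep the same result. The class and
method names are made up for illustration.

```java
public class NullableIntegerSketch {

    // Plain concatenation: a null Integer becomes the string "null".
    static String concat(Integer i) {
        return "i=" + i;
    }

    // Hypothetical "unboxing" strategy: the null check has to come first,
    // otherwise intValue() would throw a NullPointerException.
    static String concatUnboxed(Integer i) {
        if (i == null) {
            return "i=null";
        }
        return "i=" + i.intValue();
    }

    public static void main(String[] args) {
        Integer i = null;
        System.out.println(concat(i));          // i=null
        System.out.println(concatUnboxed(i));   // i=null
        System.out.println(concatUnboxed(42));  // i=42
    }
}
```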


> Thanks,
> -Aleksey.
>
>
>
>


