String concatenation tweaks
Peter Levart
peter.levart at gmail.com
Mon Jun 1 23:04:33 UTC 2015
Hi Aleksey,
On 06/01/2015 04:52 PM, Aleksey Shipilev wrote:
>> >Passing more
>> >type info might at first seem beneficial to potential strategies that
>> >might use it to construct better specializations, but semantically we
>> >don't need all the reference types. StringBuilder.append() overloads
>> >only differentiate among 4 distinct reference types:
>> >
>> > Object, String, StringBuffer, CharSequence
>> >
>> >But internally StringBuilder actually dispatches dynamically to 5
>> >different run-time cases and treats them differently:
>> >
>> > null, Object, String, AbstractStringBuilder, CharSequence
>> >
>> >Javac could truncate the reference types to one of the above 5 before
>> >emitting invokedynamic. The null literal value could be passed via the
>> >Void parameter type to differentiate it from other types. Why is
>> >truncating necessary?
>> >
>> >- the key space for the cached shapes is reduced this way. There are
>> >only 8 primitive + 5 reference = 13 types possible at each argument
>> >position this way.
>> >- passing truncated reference types to invokedynamic means that the
>> >bootstrap method doesn't have to do the truncation.
>> >- the truncation has to be performed anyway to get rid of custom/user
>> >reference types which, if used in the MethodType key for caching, will
>> >cause Class[Loader] leaks.
>> >
>> >What do you think?
> I think since we need to get the javac bytecode part future-proof, we
> are better off passing the concrete type info to the indy bootstrap.
> Bootstrap can then decide if it wants to collapse the types back to
> those 4-5 variants, solving both the explosion of shapes, and the class
> leaks.
String concatenation actually needs just two reference types (maybe
three if you want to optimize for the literal null as a special type
which is really a very rare case in practice). The JLS says that the
binary '+' operator where at least one of the operands is of type String
is the string concatenation operator. If the other operand is not of type
String, then it must be converted to String via String conversion:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#jls-5.1.11
Which says that .toString() needs to be called if the argument is a
reference type other than String and not null at runtime. I checked
javac and yes, it emits code which only ever invokes two of the
overloaded StringBuilder.append() methods for reference type arguments:
append(String);
append(Object);
so the following expression:
"a" + new StringBuffer("b")
...is actually translated to the equivalent of:
new StringBuilder().append("a").append((Object) new StringBuffer("b")).toString()
...so the 'new StringBuffer("b")' argument is first converted with
.toString() and then appended. If the overloaded append method for StringBuffer
argument was used, then .getChars() would be invoked on the argument
instead to copy chars directly. The same happens if the argument type is
CharSequence. Does this subtle detail need to be respected or can dirty
tricks be played with known types to "optimize" their conversion to
String without actually invoking .toString() or even creating an
intermediary String object? Also, the String conversion specification
says that if the .toString() method returns null, then the string "null"
is the result of the conversion. So there have to be 2 null checks for
reference arguments: before and after the .toString() call.
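A minimal sketch of the two points above, for illustration only. The names
StringConversionDemo, stringOf, Counting and NullToString are hypothetical
and not part of javac or the JDK. stringOf() spells out the two null checks,
and Counting is a CharSequence that records how often its toString() is
invoked, so the effect of "a" + c can be observed.

```java
public class StringConversionDemo {

    // Hypothetical helper: String conversion of a reference argument,
    // following the reading of JLS 5.1.11 given above.
    static String stringOf(Object o) {
        if (o == null) {
            return "null";                 // 1st null check: the argument itself
        }
        String s = o.toString();
        return (s == null) ? "null" : s;   // 2nd null check: result of toString()
    }

    // A CharSequence that counts toString() invocations.
    static final class Counting implements CharSequence {
        int toStringCalls = 0;
        private final String value;

        Counting(String value) { this.value = value; }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() {
            toStringCalls++;
            return value;
        }
    }

    // An object whose toString() returns null.
    static final class NullToString {
        @Override public String toString() { return null; }
    }

    public static void main(String[] args) {
        Counting c = new Counting("b");
        String joined = "a" + c;   // concatenation goes through c.toString()
        System.out.println(joined);
        System.out.println("toString() calls: " + c.toStringCalls);

        // Both forms are expected to end in the four characters "null".
        System.out.println("x" + new NullToString());
        System.out.println("x" + stringOf(new NullToString()));
    }
}
```

If a strategy copied chars out of a CharSequence directly, the counter in
Counting would stay at zero, which is exactly the observable difference the
question above is about.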
Regarding future-proof translation. Do you think that in the future, JLS
could change and say that some pre-existing reference type different
from String is to be treated differently than before? There might be
some future reference type that is to be treated differently, but such
type does not exist in bytecode compiled before the type is invented. So
if you only put String, Object, null into bytecode now, it should be
future-proof and the set of truncated reference types can be extended in
the future.
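To make the truncation idea concrete, here is a rough sketch of how a call
shape could be collapsed to String, Object and a null marker. It is only an
illustration under assumptions: ShapeTruncation, truncateType and
truncateShape are hypothetical names, and Void.class stands in for the
literal null as suggested earlier in the thread. It is not what javac or any
bootstrap method actually does.

```java
import java.lang.invoke.MethodType;

public class ShapeTruncation {

    // Hypothetical rule: keep primitives, String and the Void "null marker";
    // every other reference type collapses to Object.
    static Class<?> truncateType(Class<?> type) {
        if (type.isPrimitive() || type == String.class || type == Void.class) {
            return type;
        }
        return Object.class;
    }

    // Applies the rule to every parameter position of a concatenation shape.
    static MethodType truncateShape(MethodType shape) {
        MethodType result = shape;
        for (int i = 0; i < shape.parameterCount(); i++) {
            result = result.changeParameterType(i, truncateType(shape.parameterType(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        MethodType precise = MethodType.methodType(
                String.class, String.class, StringBuffer.class, int.class, Void.class);
        System.out.println(precise + " -> " + truncateShape(precise));
    }
}
```

Because the truncated shape never mentions a user-defined class, using it as
a cache key would not pin a user class or its class loader.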
Also I don't think string concatenation will ever want to know the
precise compile-time types of the reference typed arguments apart from
the three mentioned: String, Object, null, unless it wants to play dirty
tricks with some types to "optimize" String conversion (like
StringBuilder.append(StringBuffer) does for example).
Regards, Peter
> (Note to self: current prototype collapses the types *after* checking
> with cache, need to fix that possible class leak, thanks!)
>
> We are not inherently limited with StringBuilder API to do the
> concatenation. This compiler improvement actually opens up the way for
> specialized implementations that span more than just current 4 reference
> types.
>
> Example case: would it make sense to null-check and unbox Integer before
> pushing it on to append() chain? This will set us up for
> OptoStringConcat for new SB().append(String).append(Integer).toString() :)
What is Integer at compile time can always be null at runtime:
Integer i = null;
System.out.println("i=" + i);
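A small sketch of what a null-checked unboxing step might look like, just to
show that the guard cannot be dropped. BoxedConcatDemo and concat are
hypothetical names; this is not the code javac emits.

```java
public class BoxedConcatDemo {

    // Hypothetical expansion of prefix + i for an Integer operand:
    // unbox only after a null check, otherwise append the text "null".
    static String concat(String prefix, Integer i) {
        StringBuilder sb = new StringBuilder().append(prefix);
        if (i == null) {
            sb.append("null");
        } else {
            sb.append(i.intValue());   // resolves to append(int)
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Integer i = null;
        System.out.println(concat("i=", i));
        System.out.println("i=" + i);   // plain concatenation, for comparison
    }
}
```

Calling i.intValue() without the guard would throw a NullPointerException
for the example above instead of producing a string.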
> Thanks,
> -Aleksey.