String concatenation tweaks

Remi Forax forax at univ-mlv.fr
Fri Jun 5 12:56:01 UTC 2015


On 06/05/2015 11:57 AM, Maurizio Cimadamore wrote:
> Hi Alex,
> I think the bulk of my concern (which has also been expressed by other 
> members of my team in this very thread) is that what you are proposing 
> might be too extreme in practice. Do we have an issue with string 
> concat? Yes, we do - as you pointed out, the mismatch between JLS and 
> JVMS is not negligible and sometimes we pay for that performance-wise. 
> Your solution is to decouple string optimization from the code that's 
> emitted by the static compiler. Of course that's an elegant solution 
> which has proven to be effective with lambda. But, as we learned with 
> lambda, adding new indys in the bytecode is not something that comes 
> for free. 

There is no free lunch, but IMO this translation seems to be a good use 
case for the HotSpot team to optimize.
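
To make the idea of decoupling concrete, here is a minimal sketch of what 
a bootstrap method for string concatenation could look like. The class 
and method names (ConcatBootstrapSketch, concat, bootstrap, demo) are 
hypothetical and not part of any JDK API; the strategy shown (a plain 
StringBuilder loop) is only a placeholder for whatever the runtime would 
actually choose. Java source cannot emit invokedynamic directly, so the 
call site is linked by hand.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch: these names are illustrative, not a JDK API.
class ConcatBootstrapSketch {
    // Placeholder strategy chosen at run time: a simple StringBuilder loop.
    static String concat(Object[] args) {
        StringBuilder sb = new StringBuilder();
        for (Object arg : args) {
            sb.append(arg);
        }
        return sb.toString();
    }

    // Bootstrap method with the standard (Lookup, String, MethodType) shape.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = MethodHandles.lookup().findStatic(
                ConcatBootstrapSketch.class, "concat",
                MethodType.methodType(String.class, Object[].class));
        // Collect the call-site arguments into an Object[] (boxing primitives)
        // and adapt the handle to the exact call-site type.
        target = target.asCollector(Object[].class, type.parameterCount()).asType(type);
        return new ConstantCallSite(target);
    }

    // Links one call site manually, standing in for an invokedynamic instruction.
    static String demo(String s, int i, double d) {
        try {
            MethodType type = MethodType.methodType(
                    String.class, String.class, int.class, double.class);
            CallSite site = bootstrap(MethodHandles.lookup(), "concat", type);
            return (String) site.dynamicInvoker().invoke(s, i, d);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

The point of the sketch is that the body of concat can change from one 
JDK release to the next without recompiling the calling class.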

> Here's a list of potential concerns:
>
> *  maintenance cost involved (as mentioned in previous emails) in 
> terms of implementing new features
> *  platforms (ME) which don't spell indy (yet?); once we start 
> generating the code suggested here, the only viable option for those 
> platforms would be to introduce yet another hack to desugar away the 
> indys and re-generate statically that very same code that the indy 
> would emit on the fly (as they currently do for lambdas).

Yes, it has to be done. The translation will need to insert code in 
between the calculations of the arguments, so it's a little more work 
than what is already done for lambdas, but nothing really hard if 
the tool uses ASM.
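
A small sketch of why the desugaring has to interleave code: with the 
indy shape all arguments are computed first and then passed to a single 
call, whereas the StringBuilder shape appends each argument right after 
it is computed. The class DesugarSketch and its helpers are hypothetical 
names used only for illustration; the event list just records the order 
in which things happen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative only.
class DesugarSketch {
    static final List<String> events = new ArrayList<>();

    // Stands in for an arbitrary argument expression.
    static String arg(String value) {
        events.add("eval " + value);
        return value;
    }

    // Stands in for the single invokedynamic call.
    static String concatAll(String a, String b) {
        events.add("concat");
        return new StringBuilder().append(a).append(b).toString();
    }

    // indy shape: evaluate every argument, then make one call.
    static String indyShape() {
        return concatAll(arg("x"), arg("y"));
    }

    // Desugared shape: an append is inserted after each argument calculation.
    static String desugared() {
        StringBuilder sb = new StringBuilder();
        sb.append(arg("x"));
        events.add("append");
        sb.append(arg("y"));
        events.add("append");
        return sb.toString();
    }
}
```

Both methods return the same string; only the position of the generated 
code relative to the argument calculations differs, which is the extra 
work a desugaring tool has to do.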

> * what about other compilers - i.e. other than javac? Does this JEP 
> propose that all compiler implementations start spitting the new indys?
> *  bytecode manipulators/weavers would need to handle new indys in 
> places they are not used to even in absence of source code changes - 
> just by virtue of recompilation (we got this A LOT when stackmap attrs 
> were added by default in 7)

Not a real issue in my opinion, because bytecode tools already need to 
take care of the invokedynamic calls generated by the lambda translation, 
which can appear anywhere in the bytecode.

> * static footprint (i.e. increased classfile size) would definitely 
> be an issue in strings-concat heavy code (there are machine-generated 
> files out there which nearly use up all the entries in the CP and do 
> a huge number of string concats); this is also something that we can 
> see with lambdas, but in that case the alternative was inner classes, 
> which, in itself, is another big static footprint killer, so the 
> choice was easier. Again, a static footprint-oriented benchmark on one 
> of such files would be welcome.

The javac translation generates a lot of bytecode but shares a lot in 
terms of constant pool entries.
The invokedynamic translation does the opposite: it uses more entries 
in the CP (mostly due to the method descriptor of each invokedynamic) and 
far less bytecode. So in terms of global static footprint... So yes, if 
the current class already uses most of the possible CP entries, there is a 
good chance that it will not compile with the invokedynamic translation.
There is another limitation: a method call cannot have more than 255 
arguments, so if there are a lot of '+', the invokedynamic translation may 
fail too.

Maybe the compiler should fall back to the old strategy if there are too 
many arguments.
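
A rough sketch of that fallback decision, assuming the compiler knows the 
static types of the operands. The JVM limit is counted in argument slots, 
where long and double take two slots each. The class 
ConcatStrategySketch and its methods are hypothetical names, not javac 
code.

```java
// Hypothetical sketch: not javac code, names are illustrative only.
class ConcatStrategySketch {
    // JVM limit on the total argument slots of a method descriptor.
    static final int MAX_SLOTS = 255;

    // long and double occupy two slots, every other type occupies one.
    static int slotCount(Class<?>... parameterTypes) {
        int slots = 0;
        for (Class<?> type : parameterTypes) {
            slots += (type == long.class || type == double.class) ? 2 : 1;
        }
        return slots;
    }

    // true: emit one invokedynamic; false: keep the old StringBuilder strategy.
    static boolean useIndy(Class<?>... parameterTypes) {
        return slotCount(parameterTypes) <= MAX_SLOTS;
    }
}
```

A real implementation could also split a long chain into several smaller 
invokedynamic calls instead of falling back; that is a different design 
choice from the one suggested here.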

> * while indy is generally a great tool, it has its own performance 
> quirks; while microbenchmarking with JMH is certainly a good start, I 
> think we also need to think about real-world scenario benchmarks, to 
> see if they are affected in a significant way by startup or any other 
> cost.

You see the glass half empty!
There are ways to improve invokedynamic startup time: put the most 
frequent lambda forms in CDS or, the big gun, do an AOT pass on the code 
when generating the jimage (pre-generate native code, if you prefer).
These things will have to be implemented anyway, to get good startup perf 
on small devices or to allow writing command-line tools in Java.

>
> While I fully appreciate the benefits of your proposal, I'm afraid the 
> reality is a bit more complex. That, coupled with the feeling that, at 
> the end of the day, it's not like we're updating the string generation 
> code every other day (at least we never changed it in the last 8 years 
> I've been here), leaves me with a bit of mixed feelings.
>
> Maurizio

cheers,
Rémi


