String concatenation tweaks

Sat Jun 6 14:41:05 UTC 2015

I was too tired when I answered this email, so let's try again.

The whole idea of using invokedynamic here is equivalent to creating a
new bytecode without having to update the JVMS.
So this is not a lightweight operation; I think we can all agree on that.

There are several places where we know we need a 'new bytecode': String
concatenation is one example, varargs at the callsite is another [1],
and support for arrays of constants and/or collection initialization is
in the same ballpark too.

Using invokedynamic has the advantage of not requiring an update to the
JVMS, but it has several drawbacks:
1) some platforms do not support it, so either the compiler needs to
maintain two translations or those platforms need to use a bytecode
rewriting tool.
2) a translation scheme based on invokedynamic usually uses more
constant pool entries and fewer bytecodes because, instead of routing
things to a generic API, the method descriptor of invokedynamic and the
bootstrap constants tend to be specific to a callsite (so the runtime
part has more precise information).
3) invokedynamic (more exactly, the way the method handle API is
implemented) has a non-negligible impact on startup time.

In my opinion, 1 and 2 mean we need a backup plan, inside the compiler
and/or inside the bytecode rewriting tool. 3 is more complex because it
requires tuning/tweaking several parts of the VM: the number of lambda
forms must be reduced, the JIT must do better escape analysis in order
to remove primitive boxing and array boxing, and IMO the most common
lambda forms should also be pre-computed.
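
To make the backup plan concrete: a compiler fallback or a rewriting
tool would have to turn each concatenation callsite back into plain
bytecode. Here is a minimal sketch of that static form, written as Java
source. The class BackupTranslation and its method names are
hypothetical, and the StringBuilder chain shows the usual shape of what
javac emits today, not an exact dump of its output.

```java
public class BackupTranslation {
    // The source-level expression a developer writes.
    public static String viaOperator(int x, String name) {
        return "x=" + x + ", name=" + name;
    }

    // The static translation a backup plan would (re)generate:
    // one generic StringBuilder chain, no invokedynamic involved.
    public static String viaBuilder(int x, String name) {
        return new StringBuilder()
                .append("x=")
                .append(x)
                .append(", name=")
                .append(name)
                .toString();
    }

    public static void main(String[] args) {
        // Both forms must produce the same string.
        System.out.println(viaOperator(42, "foo"));
        System.out.println(viaBuilder(42, "foo"));
    }
}
```

Note that this form routes everything through the generic
StringBuilder API, which is exactly the information loss described in
point 2.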

Now, I think that the String concatenation based on invokedynamic should
be integrated in 9, behind a -XD option of javac at first, because it's
a good, simple example that exercises method handle combiners with
multiple shapes and lets us see whether the startup problem can be
solved for that case.
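
As a rough illustration of what such a translation exercises, here is a
minimal sketch of a bootstrap method for concatenation. Everything in it
is an assumption made for illustration: the class ConcatSketch, the
bootstrap signature, the recipe string and its \u0001 placeholder are
hypothetical, not the API proposed in the JEP. Since Java source cannot
express an invokedynamic callsite directly, main() calls the bootstrap
by hand, the way the VM would on first execution of the callsite.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ConcatSketch {
    // Hypothetical marker: each occurrence in the recipe is replaced
    // by the next dynamic argument of the callsite.
    private static final char ARG = '\u0001';

    // Generic runtime part: interleaves the constant pieces of the
    // recipe with the (boxed) arguments.
    static String concat(String recipe, Object[] args) {
        StringBuilder sb = new StringBuilder();
        int next = 0;
        for (int i = 0; i < recipe.length(); i++) {
            char c = recipe.charAt(i);
            if (c == ARG) {
                sb.append(args[next++]);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical bootstrap method: the callsite descriptor (type) and
    // the recipe constant are specific to one concatenation expression.
    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name,
                                     MethodType type,
                                     String recipe)
            throws ReflectiveOperationException {
        MethodHandle generic = MethodHandles.lookup().findStatic(
                ConcatSketch.class, "concat",
                MethodType.methodType(String.class, String.class, Object[].class));
        MethodHandle target = MethodHandles
                .insertArguments(generic, 0, recipe)                 // (Object[])String
                .asCollector(Object[].class, type.parameterCount())  // array boxing
                .asType(type);                                       // primitive boxing
        return new ConstantCallSite(target);
    }

    public static void main(String[] args) throws Throwable {
        // Callsite shape for: "x=" + x + ", name=" + name
        MethodType type =
                MethodType.methodType(String.class, int.class, String.class);
        CallSite site = bootstrap(MethodHandles.lookup(), "concat", type,
                "x=\u0001, name=\u0001");
        String s = (String) site.dynamicInvoker().invokeExact(42, "foo");
        System.out.println(s);
    }
}
```

Each distinct callsite descriptor yields a differently shaped chain of
combiners, and the asCollector/asType steps introduce precisely the
array boxing and primitive boxing that the JIT would have to remove.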

regards,
Rémi

[1] https://bugs.openjdk.java.net/browse/JDK-8013269

On 06/05/2015 11:57 AM, Maurizio Cimadamore wrote:
> Hi Alex,
> I think the bulk of my concern (which has also been expressed by other 
> members of my team in this very thread) is that what you are proposing 
> might be too extreme in practice. Do we have an issue with string 
> concat? Yes, we do - as you pointed out, the mismatch between JLS and 
> JVMS is not negligible and sometimes we pay for that performance-wise. 
> Your solution is to decouple string optimization from the code that's 
> emitted by the static compiler. Of course that's an elegant solution 
> which has proven to be effective with lambda. But, as we learned with 
> lambda, adding new indys in the bytecode is not something that comes 
> for free. Here's a list of potential concerns:
>
> *  maintenance cost involved (as mentioned in previous emails) in 
> terms of implementing new features
> *  platforms (ME) which don't spell indy (yet?); once we start 
> generating the code suggested here, the only viable option for those 
> platforms would be to introduce yet another hack to desugar away the 
> indys and re-generate statically that very same code that the indy 
> would emit on the fly (as they currently do for lambdas).
> *  what about other compilers - i.e. other than javac? Does this JEP 
> propose that all compiler implementations start spitting the new indys?
> *  bytecode manipulators/weavers would need to handle new indys in 
> places they are not used to even in absence of source code changes - 
> just by virtue of recompilation (we got this A LOT when stackmap attrs 
> were added by default in 7)
> *  static footprint (i.e. increased classfile size) would definitely 
> be an issue in string-concat heavy code (there are machine-generated 
> files out there which use up nearly all the entries in the CP and do 
> a huge number of string concats); this is also something that we can 
> see with lambdas, but in that case the alternative was inner classes, 
> which, in itself, is another big static footprint killer, so the 
> choice was easier. Again, a static footprint-oriented benchmark on one 
> of such files would be welcome.
> *  while indy is generally a great tool, it has its own performance 
> quirks; while microbenchmarking with JMH is certainly a good start, I 
> think we also need to think about real-world scenario benchmarks, to 
> see if they are affected in a significant way by startup or any other 
> cost.
>
> While I fully appreciate the benefits of your proposal, I'm afraid the 
> reality is a bit more complex. That, coupled with the feeling that, at 
> the end of the day, it's not like we're updating the string generation 
> code every other day (at least we never changed in the last 8 years 
> I've been here), leaves me with a bit of mixed feelings.
>
> Maurizio
>
> On 05/06/15 09:04, Aleksey Shipilev wrote:
>> First things first, I added more discussion into the JEP draft:
>>   https://bugs.openjdk.java.net/browse/JDK-8085796
>>
>>
>> On 06/04/2015 08:50 PM, Maurizio Cimadamore wrote:
>>> On 04/06/15 17:23, Aleksey Shipilev wrote:
>>>> While it may be an interesting experiment to try, the performance
>>>> engineering experience tells me to focus on the parts that are
>>>> relevant, rather than doing every experiment you can come up with,
>>>> otherwise you will never ever un-bury yourself from the
>>>> performance work:)
>>> I don't think this is *any* experiment you can come up with - it's the
>>> very foundation for all the JEP work. For the JEP to be viable you need
>>> to prove that whatever technique you come up with, it would be
>>> almost in the same ballpark as the one implemented in javac.
>> If you want to assess the technique itself, you have to compare the
>> similar concat approaches used by javac and by indy-based translation
>> strategy. It would be dumb to compare a heavily-optimized javac
>> translation vs. a dumb indy-based translation.
>>
>> This is why we have INNER, it is the (almost) direct rewrite of the code
>> emitted by vanilla javac now, to the runtime. This strategy is actually
>> used to prove the move from compile time to runtime does not experience
>> the throughput hits. *That's* the foundational performance data, already
>> available.
>>
>>
>>> If the numbers are the same, then it's mostly an argument of whether
>>> we want to open up the machinery vs. keeping it buried in javac (and
>>> the future maintenance cost for any BSM added). But what if the
>>> numbers are not the same?
>> But they are the same already! INNER performs the same as vanilla javac
>> (BASELINE) is performing. This means the new infrastructure is not
>> getting in the way, when our translation strategy emits the plain
>> bytecode. Heck, both BASELINE and INNER are recognized by
>> OptimizeStringConcat and are optimized to death.
>>
>> That's what I am trying to tell here: it _is_ the question about the
>> machinery at this point. All other strategies are motivational examples
>> how this can be used to improve the translation strategy without
>> involving javac.
>>
>>
>>> Put in other words, if the natively written javac impl was 10x faster
>>> with no startup cost (I'm obviously making things up), wouldn't that
>>> mean the very death of this JEP?
>> No, it wouldn't. Because really, whatever bytecode you can emit in
>> javac, the same bytecode can be emitted through the indy translation
>> strategy. (The reverse is not true -- JDK-internal StringConcatFactory
>> may use private APIs! -- which makes javac more limiting for this task).
>>
>> If you discover a bytecode sequence that is 10x faster than proposed
>> variants in indy translation strategies, you move it in as additional
>> strategy, and that's it (plus the beauty that you can discover such a
>> bytecode sequence much later).
>>
>> This is why we would certainly like to see Louis' patch, to see if we
>> can/need to move it in under indy translation strategy, and if/what we
>> should adjust at the indy interface.
>>
>> The only plausible concern at this point is startup time, but you can
>> see that even without going with additional javac experiments. And,
>> those costs seem to be related to the initialization of indy
>> infrastructure -- some project would have to suck up those costs,
>> whether it's String Concat, Jigsaw, or any Valhalla-related project.
>>
>> Thanks,
>> -Aleksey
>>
>