RFR: 8215371: Better nodes for concatenated string literals

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Sat Dec 15 22:02:35 UTC 2018


Hi Liam,
form an hgh level perspective, having a better tree representation for 
string concat has always looked like the natural place where the impl 
should go. Far too much time has been spent micro-optimizing the 
existing code, trying to come up with clever ways to avoid creation of 
deeply recursive trees - optimizations which work... until they don't. 
For instance, javac is not very good at removing recursion when you mix 
constants and variables, and so there are way to end up in a badly 
stack-heavy scenario.

Then there's the problem of IDEs, which do not like much javac messing 
up with the source representation (which is why we do have flags to 
avoid the eager folding, which is, I believe, used by Netbeans).

So, I think I'd be interested to see what an approach using a different 
tree would look like; but at the same time I don't think it will be easy.

One possible complication in flattening the tree representation is that 
string concat is, right now, pretty much handled by the logic which 
handles all other binary operators (see Attr::visitBinary/Operators); 
all that type-checking code is made to work on a pair of operands at a 
time, and it might be tricky to refactor it so that it works on n-ary 
inputs - and I think this will force you to remove string concat from 
the list of operators - which then would probably result in asymmetries 
with the += string concat assign operator.

Another (perhaps bigger) question mark is whether the parser has enough 
info to classify the expression as a string concat: if you see the 
expression:

a + b + 12

How can you tell (at parse time) whether is a string concat or not? It 
depends on the types of 'a' and 'b' - so there's a resolution process in 
the middle (not unlike what happens with method resolution). So I'm not 
sure we can pull this off (unless you rewrite the AST later in the 
pipeline, but that would defeat some of the purpose?) - it seems that, 
if you pull the string long enough, you end up in a place where perhaps 
you need to flatten _all_ the binary expression trees, which, while 
doable, starts looking as a much much bigger patch.

Then there's the problem of evolving the c.s.s.tree API - can we switch 
to a different, non-recursive internal representation w/o touching the 
API (or providing some sensible behavior for existing methods?). Maybe, 
but that will also require work.

Unless we are convinced that we have sound solutions for all these 
issues, I think it might be better to stick with your more pragmatic 
balancing fix in the meantime (although I'd love to aim for something 
more principled, of course).

Maurizio

On 14/12/2018 17:31, Liam Miller-Cushon wrote:
> Hi,
>
> I've been thinking a bit about the discussion of string literals 
> in JDK-8182769, and I started a separate bug for that part: 
> https://bugs.openjdk.java.net/browse/JDK-8215371
>
> The issue is that long string concatenation chains are represented as 
> unbalanced binary expression trees, and it's easy to run out of stack 
> when processing those trees recursively. As a work-around, javac 
> currently supports folding concatenated string literals during 
> parsing, but that work-around reduces the fidelity of the AST and is 
> problematic for IDE-like tooling.
>
> A possible fix is to remove the early constant folding, and instead 
> balance the binary trees for concatentation: 
> http://cr.openjdk.java.net/~cushon/8215371/webrev.00/
>
> There's some discussion of the tradeoffs in the comments 
> on JDK-8182769. Other ideas include:
>
> * Refactoring code processing string literals to avoid recursion. This 
> removes the constraint on the shape on the AST, but might reduce 
> readability. The problem also affects non-javac users of the 
> com.sun.source APIs which we don't have the option of refactoring.
>
> * Introducing a new AST node to represent concatenated string literals 
> as a flat list. This would require more refactoring to javac, and an 
> addition to the com.sun.source APIs.
>
> What do you think? My prototype fix is kind of a work-around, but I 
> think it's a modest improvement over the status quo. If there are 
> ideas of a more principled solution I'm happy to investigate alternatives.
>
> Thanks,


More information about the compiler-dev mailing list