RFR: 8215371: Better nodes for concatenated string literals

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Wed Dec 19 12:04:26 UTC 2018


Hi Liam,
while I agree that some degree of breakage in c.s.s.t API can be 
tolerated in name of better consistency, in the light of our exchange, 
it doesn't really seem that we have a credible solution for sticking 
with a more principled approach; yes, we could swap trees at later 
stages, but this would not improve the code much, and will create 
confusion for clients of the tree API which will see different tree at 
different points of the compiler pipeline.

I'd say, let's stick with your pragmatic balancing solution (which I 
still like) and keep thinking about alternatives as a separate effort?

Maurizio

On 18/12/2018 23:39, Liam Miller-Cushon wrote:
>> How can you tell (at parse time) whether is a string concat or not? It
>> depends on the types of 'a' and 'b' - so there's a resolution process in
>> the middle (not unlike what happens with method resolution). So I'm not
>> sure we can pull this off (unless you rewrite the AST later in the
>> pipeline, but that would defeat some of the purpose?) - it seems that,
>> if you pull the string long enough, you end up in a place where perhaps
>> you need to flatten _all_ the binary expression trees, which, while
>> doable, starts looking as a much much bigger patch.
> I don't see any good ways to do that disambiguation.
>
> Rewriting later in the pipeline would still have some value, but
> doesn't help tools that only want the syntax (e.g. code formatters),
> and it would be nice to keep the representation consistent from
> parsing up to lowering.
>
> Another option would be to use the improved representation only in
> cases where it's easy to identify string concatenation (e.g. one of
> the components is a string literal). But that would also make the
> representation less consistent.
>
> Applying flatting to all binary expression trees doesn't necessarily
> seem like a bad outcome, but it would definitely be a more invasive
> change.
>
>> Then there's the problem of evolving the c.s.s.tree API - can we switch
>> to a different, non-recursive internal representation w/o touching the
>> API (or providing some sensible behavior for existing methods?).
> The naive update to c.s.s.tree would be to add a new Tree and
> corresponding visit methods, which would have similar compatibility
> implications to adding support for new language features. The change
> could be tied to the language level to make it easier for users to
> migrate. Is that level of breakage tolerable? There are other changes
> that would increase the fidelity of the AST (like the nodes for enum
> declarations discussed in JDK-8182769) that seem to require similar
> breaking changes.
>
>
>
>
>
>> Unless we are convinced that we have sound solutions for all these
>> issues, I think it might be better to stick with your more pragmatic
>> balancing fix in the meantime (although I'd love to aim for something
>> more principled, of course).
>>
>> Maurizio
>>
>> On 14/12/2018 17:31, Liam Miller-Cushon wrote:
>>> Hi,
>>>
>>> I've been thinking a bit about the discussion of string literals
>>> in JDK-8182769, and I started a separate bug for that part:
>>> https://bugs.openjdk.java.net/browse/JDK-8215371
>>>
>>> The issue is that long string concatenation chains are represented as
>>> unbalanced binary expression trees, and it's easy to run out of stack
>>> when processing those trees recursively. As a work-around, javac
>>> currently supports folding concatenated string literals during
>>> parsing, but that work-around reduces the fidelity of the AST and is
>>> problematic for IDE-like tooling.
>>>
>>> A possible fix is to remove the early constant folding, and instead
>>> balance the binary trees for concatentation:
>>> http://cr.openjdk.java.net/~cushon/8215371/webrev.00/
>>>
>>> There's some discussion of the tradeoffs in the comments
>>> on JDK-8182769. Other ideas include:
>>>
>>> * Refactoring code processing string literals to avoid recursion. This
>>> removes the constraint on the shape on the AST, but might reduce
>>> readability. The problem also affects non-javac users of the
>>> com.sun.source APIs which we don't have the option of refactoring.
>>>
>>> * Introducing a new AST node to represent concatenated string literals
>>> as a flat list. This would require more refactoring to javac, and an
>>> addition to the com.sun.source APIs.
>>>
>>> What do you think? My prototype fix is kind of a work-around, but I
>>> think it's a modest improvement over the status quo. If there are
>>> ideas of a more principled solution I'm happy to investigate alternatives.
>>>
>>> Thanks,


More information about the compiler-dev mailing list