RFR: 8215371: Better nodes for concatenated string literals

Tue Dec 18 23:39:37 UTC 2018

> How can you tell (at parse time) whether is a string concat or not? It
> depends on the types of 'a' and 'b' - so there's a resolution process in
> the middle (not unlike what happens with method resolution). So I'm not
> sure we can pull this off (unless you rewrite the AST later in the
> pipeline, but that would defeat some of the purpose?) - it seems that,
> if you pull the string long enough, you end up in a place where perhaps
> you need to flatten _all_ the binary expression trees, which, while
> doable, starts looking as a much much bigger patch.

I don't see any good ways to do that disambiguation.

Rewriting later in the pipeline would still have some value, but
doesn't help tools that only want the syntax (e.g. code formatters),
and it would be nice to keep the representation consistent from
parsing up to lowering.

Another option would be to use the improved representation only in
cases where it's easy to identify string concatenation (e.g. one of
the components is a string literal). But that would also make the
representation less consistent.

Applying flatting to all binary expression trees doesn't necessarily
seem like a bad outcome, but it would definitely be a more invasive
change.

> Then there's the problem of evolving the c.s.s.tree API - can we switch
> to a different, non-recursive internal representation w/o touching the
> API (or providing some sensible behavior for existing methods?).

The naive update to c.s.s.tree would be to add a new Tree and
corresponding visit methods, which would have similar compatibility
implications to adding support for new language features. The change
could be tied to the language level to make it easier for users to
migrate. Is that level of breakage tolerable? There are other changes
that would increase the fidelity of the AST (like the nodes for enum
declarations discussed in JDK-8182769) that seem to require similar
breaking changes.

> Unless we are convinced that we have sound solutions for all these
> issues, I think it might be better to stick with your more pragmatic
> balancing fix in the meantime (although I'd love to aim for something
> more principled, of course).
>
> Maurizio
>
> On 14/12/2018 17:31, Liam Miller-Cushon wrote:
> > Hi,
> >
> > I've been thinking a bit about the discussion of string literals
> > in JDK-8182769, and I started a separate bug for that part:
> > https://bugs.openjdk.java.net/browse/JDK-8215371
> >
> > The issue is that long string concatenation chains are represented as
> > unbalanced binary expression trees, and it's easy to run out of stack
> > when processing those trees recursively. As a work-around, javac
> > currently supports folding concatenated string literals during
> > parsing, but that work-around reduces the fidelity of the AST and is
> > problematic for IDE-like tooling.
> >
> > A possible fix is to remove the early constant folding, and instead
> > balance the binary trees for concatentation:
> > http://cr.openjdk.java.net/~cushon/8215371/webrev.00/
> >
> > There's some discussion of the tradeoffs in the comments
> > on JDK-8182769. Other ideas include:
> >
> > * Refactoring code processing string literals to avoid recursion. This
> > removes the constraint on the shape on the AST, but might reduce
> > readability. The problem also affects non-javac users of the
> > com.sun.source APIs which we don't have the option of refactoring.
> >
> > * Introducing a new AST node to represent concatenated string literals
> > as a flat list. This would require more refactoring to javac, and an
> > addition to the com.sun.source APIs.
> >
> > What do you think? My prototype fix is kind of a work-around, but I
> > think it's a modest improvement over the status quo. If there are
> > ideas of a more principled solution I'm happy to investigate alternatives.
> >
> > Thanks,