Raw String Literals (RSL) - indent stripping of multi-line strings

Jim Laskey james.laskey at oracle.com
Mon Apr 23 17:04:56 UTC 2018


Let me try to summarize the discussion related to RSL "indent stripping" of multi-line strings.

- There are at least two distinct groups of use cases for RSLs: single-line raw strings and multi-line strings (raw or otherwise).

- A multi-line RSL is indicated by the presence of at least one newline in the body of the RSL.

- There is an assumption that uses of multi-line strings will be dominated by code snippets.

- There may be some circumstances where a here-document style (body aligned to the left margin) is needed or chosen.

- Most developers will likely choose to indent/format the body of their RSLs to align with neighbouring Java code.

- This incidental indentation may add whitespace that the developer does not want included in the body of the string.

- Incidental indentation may consist of spaces and tabs, and tabs are not displayed consistently (tab width varies from editor to editor).


Samples of multi-line RSL styles (periods represent incidental indentation):

    String a = `line one
................line two`;
        
    String b = `
...............line one
...............line two
...............`;
        
    String c = `
....    line one
....    line two
....`;

    String d = `
........line one
........line two
....`;

    String e = `
........line one
........line two
........`;

    String f = `
line one
line two
`;


To avoid imposing a style on developers by way of the JLS, we opted to define RSLs as raw, let developers apply their own incidental indentation stripping technique, and supply best-guess techniques as String instance methods.

As an example, the String.stripIndent method was defined to remove incidental indentation using the following rules:

- a determining line is any non-blank line that is not the first line
- the last line is also a determining line
- calculate the least amount of leading whitespace used on determining lines
- remove that least amount of leading whitespace from each determining line
- if the first line is blank, remove it
- if the last line is blank, remove it
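The six rules can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the actual String.stripIndent implementation: the class and method names (`StripIndentSketch`, `stripIndent`, `isDetermining`, `leadingWhitespace`) are hypothetical, the sample bodies are written as ordinary string literals with the incidental whitespace spelled out as spaces, and a tab counts as one character just like a space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StripIndentSketch {

    // Counts leading spaces and tabs; a tab counts as one character, whatever its display width.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    // A determining line is any non-blank line after the first, plus the last line.
    static boolean isDetermining(List<String> lines, int i) {
        return i > 0 && (!lines.get(i).isBlank() || i == lines.size() - 1);
    }

    // Hypothetical helper applying the six rules above; not the JDK's String.stripIndent.
    static String stripIndent(String body) {
        // limit -1 keeps a trailing empty line, e.g. when the closing backtick starts a line.
        List<String> lines = new ArrayList<>(Arrays.asList(body.split("\n", -1)));

        // Least leading whitespace over all determining lines.
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < lines.size(); i++) {
            if (isDetermining(lines, i)) {
                min = Math.min(min, leadingWhitespace(lines.get(i)));
            }
        }

        // Remove that amount from each determining line; safe because min is
        // at most each determining line's own leading whitespace.
        for (int i = 0; i < lines.size(); i++) {
            if (isDetermining(lines, i)) {
                lines.set(i, lines.get(i).substring(min));
            }
        }

        // Drop a blank last line, then a blank first line.
        if (lines.size() > 1 && lines.get(lines.size() - 1).isBlank()) {
            lines.remove(lines.size() - 1);
        }
        if (lines.size() > 1 && lines.get(0).isBlank()) {
            lines.remove(0);
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // Sample e: eight incidental spaces, closing delimiter aligned with the body.
        String e = "\n        line one\n        line two\n        ";
        // Sample c: four incidental spaces plus four intentional ones.
        String c = "\n        line one\n        line two\n    ";
        System.out.println(stripIndent(e)); // "line one" and "line two", no indentation
        System.out.println(stripIndent(c)); // both lines keep four leading spaces
    }
}
```

Applied to sample e the sketch removes all eight spaces; applied to sample c the closing delimiter's four spaces set the minimum, so the four intentional spaces survive.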

Two additional rules will be added, based on the e-mail discussion:

- trailing whitespace is not removed (removing it was a side effect of detecting blank lines)
- only remove leading whitespace from a determining line if that line's leading whitespace is the same sequence of spaces and tabs as that of the representative line (the line deemed to have the least leading whitespace)
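A sketch of the second new rule, assuming "the same sequence" means a line's leading whitespace must begin with the exact whitespace prefix of the representative line. `SamePrefixSketch` and `stripSamePrefix` are hypothetical names, and the input is assumed to be the determining lines only. A tab-indented line and a space-indented line may look aligned in an editor, yet they do not share a prefix, so the space-indented line is left untouched. A purely count-based strip would instead remove one character from it and leave three stray spaces.

```java
import java.util.ArrayList;
import java.util.List;

public class SamePrefixSketch {

    // Returns the run of leading spaces and tabs, e.g. "\t" or "    ".
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    // Hypothetical helper: strip the representative line's exact whitespace prefix
    // from each line that starts with that same sequence; other lines are left untouched.
    // Trailing whitespace is never touched.
    static List<String> stripSamePrefix(List<String> determiningLines) {
        // The representative line has the fewest leading whitespace characters
        // (the first such line wins a tie).
        String prefix = null;
        for (String line : determiningLines) {
            String lead = leadingWhitespace(line);
            if (prefix == null || lead.length() < prefix.length()) {
                prefix = lead;
            }
        }
        List<String> result = new ArrayList<>();
        for (String line : determiningLines) {
            if (prefix != null && line.startsWith(prefix)) {
                result.add(line.substring(prefix.length()));
            } else {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One line indented with a tab, one with four spaces (and two trailing spaces).
        List<String> mixed = List.of("\tline one", "    line two  ");
        // The tab-indented line loses its tab; the space-indented line is unchanged.
        System.out.println(stripSamePrefix(mixed));
    }
}
```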

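This works for all samples except d. For d we would have to drop the "the last line is also a determining line" rule, but then that would break c. Once the periods are read as spaces, the bodies of c and d are identical; only the intended indentation differs.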
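A quick check of why no rule that looks only at the body can satisfy both c and d: with the periods written out as spaces, the two literals below (in a hypothetical class `SamplesCAndD`) are equal, so any stripping function must return the same result for both.

```java
public class SamplesCAndD {

    // Sample c: four incidental spaces, then four intentional spaces; closing delimiter after four spaces.
    static final String C = "\n" + "    " + "    line one\n" + "    " + "    line two\n" + "    ";

    // Sample d: eight incidental spaces; closing delimiter after four spaces.
    static final String D = "\n" + "        line one\n" + "        line two\n" + "    ";

    public static void main(String[] args) {
        // Once the periods become spaces, the two bodies are the same string,
        // so a rule that sees only the body must treat them identically.
        System.out.println(C.equals(D)); // true
    }
}
```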

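The possibility of mixing spaces and tabs differently from line to line also leads to a complication: lines that look equally indented may not share the same whitespace sequence, so the rule above leaves them unstripped. Hence the need for something like String.stripMarkers, where the body of the RSL is framed by marker sequences and leading whitespace does not matter.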
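The mail only names String.stripMarkers, so the following is a hypothetical sketch of the idea rather than its proposed signature. `stripLeftMarker` removes all leading whitespace up to and including a left-margin marker (assumed here to be `|`) and leaves lines without the marker unchanged. Because the marker, not the whitespace, frames each line, mixed tabs and spaces no longer matter.

```java
import java.util.ArrayList;
import java.util.List;

public class StripMarkersSketch {

    // Hypothetical helper, not a JDK API. For each line whose first non-whitespace
    // text is `marker`, drop the leading whitespace and the marker; other lines are kept as-is.
    static String stripLeftMarker(String body, String marker) {
        List<String> out = new ArrayList<>();
        for (String line : body.split("\n", -1)) {
            String trimmed = line.stripLeading(); // Java 11+: removes leading tabs and spaces alike
            out.add(trimmed.startsWith(marker) ? trimmed.substring(marker.length()) : line);
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        // One line indented with a tab, one with eight spaces; the marker frames both.
        String body = "\t|line one\n        |    line two";
        System.out.println(stripLeftMarker(body, "|")); // prints "line one", then "    line two"
    }
}
```

Everything after the marker, including intentional indentation, is preserved.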

Just as we thought this was settled, a survey of a very large code base (many hundreds of millions of lines of code) led us to wonder whether String.stripIndent would be invoked in almost every case of multi-line RSL, with a few cases of here-documents. Note that String.stripIndent does not change the indentation of a here-document if the closing backtick is at the start of a new line. If String.stripIndent would almost always be called, why not always apply generic incidental indentation stripping at compile time? We’re not looking for a change of plan, just a discussion of pros and cons.

QUESTIONS:

- Should the javac compiler remove incidental indentation at compile time?

- What is the rule set used?

    - Should the last line be a determining line?

    - Should trailing whitespace be stripped?

    - Should the first or last line be removed if blank?



More information about the amber-dev mailing list