RFR - JDK-8223775 String::stripIndent (Preview)

Alex Buckley alex.buckley at oracle.com
Tue May 21 18:11:42 UTC 2019


On 5/21/2019 7:56 AM, Jim Laskey wrote:
> Please do a code review of the new String::stripIndent instance method.

My interest is in the API spec because the JLS will make normative 
reference to it.


"Removes horizontal white space margins from the essential body of a
Text Block originated string, while preserving relative indentation."

Lots of concepts here: margins, "essential body", a "Text Block 
originated string" (not just any string?). Looks like the removal 
happens in-place. While "essential" is a concept from the JEP, I think 
the sibling concept, "incidental", is the one to lead with. Alluding to 
multiple lines is also good. Recommend:

"Returns a string whose value is this string, with incidental white 
space removed from the beginning and end of every line."

That is, 'stripIndent' isn't quite as generic as its name suggests. It 
implements a particular policy about "incidental" white space that we 
will now explain...


New paragraph: "Incidental white space is often present in a text block 
to align the content with the opening delimiter. For example, in the 
following code, dots represent incidental white space:

String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............""";


Another new para, to link this high-falutin' incidental concept to the 
workmanlike name of this method:  "This method treats the incidental 
white space as indentation to be stripped, producing a string that 
preserves the relative indentation of the content. Using | to visualize 
the start of each line of the string:

|<html>
|    <body>
|        <p>Hello, world</p>
|    </body>
|</html>


Now, the algo. "This string is first conceptually separated into lines 
as if by {@link String#lines()}." -- phrasing: "First, this string is 
conceptually ..."


"Then, the <i>minimum indentation</i> (min) is determined as follows. 
For each non-blank line (as defined by {@link String#isBlank()}), the
leading {@link Character#isWhitespace(int) white space} characters are
counted. The leading {@link Character#isWhitespace(int) white space} 
characters on the last line are are also counted even if blank.
The <i>min</i> value is the smallest of these counts."

No more "common white space prefix" like in the JEP? OK, fair enough, 
we're now fully rotated to "indentation" rather than "white space".


"For each non-blank line, <i>min</i> leading white space characters are
removed. Each white space character is treated as a single character. In
particular, the tab character {@code "\t"} (U+0009) is considered a
single character; it is not expanded.

The trailing white space of each line is also removed."

The trailing white space characters are just as "treated as a single 
character" as the leading white space characters. Recommend:  "For each 
non-blank line, <i>min</i> leading white space characters are removed, 
and any trailing white space characters are removed."


Recommend handling escape sequences in a separate para. Showing "\t", a 
two character sequence, and then alluding to \u0009, a single character, 
is confusing.


"Each line is suffixed with a line feed character {@code "\n"} (U+000A),
except for the last line. The last line is suffixed with a line feed
character {@code "\n"} (U+000A) only if the original last line was blank."

The phrasing "except for the last line" is odd because it suggests the 
last line never gets a suffix. Except, oh, it might. And "suffixed" is 
an unusual word. If you're going to allude to the original string's last 
line, then isn't there a way to lean on it more? e.g. "A line feed 
character is appended to every line if the corresponding line in the 
original string was terminated with a line feed character."  (Not saying 
that is correct or desirable, but is easier to visualize.)


"Finally, the lines are concatenated into a single string and returned."

Recommend: "Finally, the lines are concatenated into a string, which is 
the result."

Alex


More information about the compiler-dev mailing list