RFR - JDK-8223775 String::stripIndent (Preview)

Jim Laskey james.laskey at oracle.com
Tue May 21 18:30:58 UTC 2019


Thank you. I haven't really put the API out for CSR review as yet. This is current state remnant from Stuart's Raw String Literals CSR review. He and I hope to reform this very soon. Your comments will provide a good starting point.

Cheers,

-- Jim


> On May 21, 2019, at 3:11 PM, Alex Buckley <alex.buckley at oracle.com> wrote:
> 
> On 5/21/2019 7:56 AM, Jim Laskey wrote:
>> Please do a code review of the new String::stripIndent instance method.
> 
> My interest is in the API spec because the JLS will make normative reference to it.
> 
> 
> "Removes horizontal white space margins from the essential body of a
> Text Block originated string, while preserving relative indentation."
> 
> Lots of concepts here: margins, "essential body", a "Text Block originated string" (not just any string?). Looks like the removal happens in-place. While "essential" is a concept from the JEP, I think the sibling concept, "incidental", is the one to lead with. Alluding to multiple lines is also good. Recommend:
> 
> "Returns a string whose value is this string, with incidental white space removed from the beginning and end of every line."
> 
> That is, 'stripIndent' isn't quite as generic as its name suggests. It implements a particular policy about "incidental" white space that we will now explain...
> 
> 
> New paragraph: "Incidental white space is often present in a text block to align the content with the opening delimiter. For example, in the following code, dots represent incidental white space:
> 
> String html = """
> ..............<html>
> ..............    <body>
> ..............        <p>Hello, world</p>
> ..............    </body>
> ..............</html>
> ..............""";
> 
> 
> Another new para, to link this high-falutin' incidental concept to the workmanlike name of this method:  "This method treats the incidental white space as indentation to be stripped, producing a string that preserves the relative indentation of the content. Using | to visualize the start of each line of the string:
> 
> |<html>
> |    <body>
> |        <p>Hello, world</p>
> |    </body>
> |</html>
> 
> 
> Now, the algo. "This string is first conceptually separated into lines as if by {@link String#lines()}." -- phrasing: "First, this string is conceptually ..."
> 
> 
> "Then, the <i>minimum indentation</i> (min) is determined as follows. For each non-blank line (as defined by {@link String#isBlank()}), the
> leading {@link Character#isWhitespace(int) white space} characters are
> counted. The leading {@link Character#isWhitespace(int) white space} characters on the last line are are also counted even if blank.
> The <i>min</i> value is the smallest of these counts."
> 
> No more "common white space prefix" like in the JEP? OK, fair enough, we're now fully rotated to "indentation" rather than "white space".
> 
> 
> "For each non-blank line, <i>min</i> leading white space characters are
> removed. Each white space character is treated as a single character. In
> particular, the tab character {@code "\t"} (U+0009) is considered a
> single character; it is not expanded.
> 
> The trailing white space of each line is also removed."
> 
> The trailing white space characters are just as "treated as a single character" as the leading white space characters. Recommend:  "For each non-blank line, <i>min</i> leading white space characters are removed, and any trailing white space characters are removed."
> 
> 
> Recommend handling escape sequences in a separate para. Showing "\t", a two character sequence, and then alluding to \u0009, a single character, is confusing.
> 
> 
> "Each line is suffixed with a line feed character {@code "\n"} (U+000A),
> except for the last line. The last line is suffixed with a line feed
> character {@code "\n"} (U+000A) only if the original last line was blank."
> 
> The phrasing "except for the last line" is odd because it suggests the last line never gets a suffix. Except, oh, it might. And "suffixed" is an unusual word. If you're going to allude to the original string's last line, then isn't there a way to lean on it more? e.g. "A line feed character is appended to every line if the corresponding line in the original string was terminated with a line feed character."  (Not saying that is correct or desirable, but is easier to visualize.)
> 
> 
> "Finally, the lines are concatenated into a single string and returned."
> 
> Recommend: "Finally, the lines are concatenated into a string, which is the result."
> 
> Alex



More information about the compiler-dev mailing list