RFR - JDK-8223775 String::stripIndent (Preview)
Alex Buckley
alex.buckley at oracle.com
Tue May 21 23:19:13 UTC 2019
On 5/21/2019 2:10 PM, Jim Laskey wrote:
> Updated version http://cr.openjdk.java.net/~jlaskey/8223775/webrev-02
This webrev substantially updates the API spec, which is really a topic
for amber-spec-experts (keep reading to see why). Cross-posting between
-dev and -spec-experts lists is not good, so maybe we can wrap this up
here without prolonged discussion.
API spec looks good, but it was a surprise to learn that stripIndent
performs normalization of line terminators:
"@return string with margins removed and line terminators normalized"
The processing steps in the JEP (and the JLS) are clear that
normalization happens before incidental white space removal. I realize
that stripIndent performs separation and joining in such a way as to
produce a string that looks like it was normalized prior to stripIndent,
so the @return isn't wrong, but it's still confusing to make a big deal
of normalization-first only for stripIndent to suggest normalization-last.
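A minimal sketch of the behavior in question, assuming a JDK where String::stripIndent is a standard method (15 or later; it was a preview API at the time of this thread). The class and helper names are hypothetical, chosen only for illustration. The input uses CRLF, yet the returned string contains only LF:

```java
public class StripIndentNormalizes {
    // Thin wrapper over String::stripIndent so the result is easy to inspect.
    static String strip(String content) {
        return content.stripIndent();
    }

    public static void main(String[] args) {
        // Two lines, each indented by four spaces, separated by CRLF.
        String content = "    hello\r\n    world";
        String result = strip(content);
        // Make any remaining line terminators visible when printing.
        System.out.println(result.replace("\r", "\\r").replace("\n", "\\n"));
        // prints: hello\nworld
    }
}
```

The output is indistinguishable from normalizing first and stripping second, which is why the @return wording is accurate even though it reads as normalization-last.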
I think we should leave the JEP alone, since it interleaves behavior
with motivation and examples in order to aid the reader, but we should
align the JLS with the API:
-----
The string represented by a text block is not the literal sequence of
characters in the content. Instead, the string represented by a text
block is the result of applying the following transformations to the
content, in order:
1. _Incidental white space_ is removed and line terminators are
_normalized_, as if by execution of String::stripIndent on the
characters in the content. [The emphasized terms are a hint to the API
spec to define the term, which is not currently the case for the second
term.]
2. Escape sequences are interpreted, as in a string literal.
-----
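The two steps above can be approximated at run time with String::stripIndent followed by String::translateEscapes, assuming a JDK where both are standard (15 or later). This is a sketch of the ordering only, not of how javac processes a text block; the class and method names are hypothetical. The reversed order is included to show that the order is observable:

```java
public class TextBlockSteps {
    // Step 1 (strip incidental white space, normalize), then step 2 (interpret escapes).
    static String inSpecOrder(String content) {
        return content.stripIndent().translateEscapes();
    }

    // The opposite order, for contrast only.
    static String reversed(String content) {
        return content.translateEscapes().stripIndent();
    }

    // Make line terminators visible when printing.
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Raw content: the first physical line contains a backslash-n escape
        // sequence (two characters), and the two physical lines are separated by CRLF.
        String content = "  first\\n    second\r\n  third";
        System.out.println(show(inSpecOrder(content))); // first\n    second\nthird
        System.out.println(show(reversed(content)));    // first\n  second\nthird
    }
}
```

In the specified order, the four spaces before "second" survive because the escape sequence has not yet become a line terminator when indentation is measured.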
String::indent also says "normalizes line termination characters"
without defining it. Separately, String::stripIndent is not at all like
the strip, stripLeading, and stripTrailing methods which sound related
to it -- they would pointlessly strip the first row of white space dots
from a multi-line string and leave the other rows.
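To make the contrast concrete, here is a small sketch (hypothetical class name, assuming JDK 15 or later for stripIndent) applying strip and stripIndent to the same uniformly indented multi-line string:

```java
public class StripVersusStripIndent {
    public static void main(String[] args) {
        // Three lines, each indented by two spaces.
        String s = "  a\n  b\n  c";

        // strip() only touches the two ends of the whole string.
        String stripped = s.strip();
        // stripIndent() removes the common indentation from every line.
        String unindented = s.stripIndent();

        System.out.println(stripped.replace("\n", "\\n"));   // a\n  b\n  c
        System.out.println(unindented.replace("\n", "\\n")); // a\nb\nc
    }
}
```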
Taking all this together, I think it's time to upgrade the class-level
spec of String: to advertise the new methods added in 11+, and to show
text blocks, and to get some terms defined for the benefit of multiple
methods. I realize this wasn't on your radar, but it's inevitable -- the
same thing happened for the class-level spec of Class when nestmates
were introduced. So, here goes:
-----
The String class represents character strings. ~~All~~ String literals
**and text blocks** in Java programs ~~, such as "abc",~~ are
implemented as instances of this class.
The strings represented by this class are constant; their values cannot be
changed after they are created. (For mutable strings, see StringBuffer
and StringBuilder.) Because instances of `String` are immutable, they
can be shared. For example: ...
[The example with a char[] is quite subtle for a beginner, but I'm
skipping over it right now.]
The class String includes methods for examining individual characters of
the sequence, for comparing strings, for searching strings, for
extracting substrings, and for creating a copy of a string with all
characters translated to uppercase or to lowercase. Case mapping is
based on the Unicode Standard version specified by the Character class.
Here are some examples of how strings can be used:
    System.out.println("abc");
    String cde = "cde";
    String c = "abc".substring(2,3);
    String d = cde.substring(1, 2);
Unless otherwise noted, methods for comparing Strings do not take locale
into account. The Collator class provides methods for finer-grain,
locale-sensitive String comparison.
Unless otherwise noted, passing a null argument to a constructor or
method in this class will cause a NullPointerException to be thrown.
[This doesn't fit anywhere. j.l.Character doesn't bother with it, even
though its methods throw NPEs too. Maybe time to drop. We have lots more
important stuff to say.]
### String concatenation
The Java language provides special support for the string concatenation
operator (`+`), and for conversion of other objects to strings. For
additional information on string concatenation and conversion, see The
Java™ Language Specification. [Somehow this manages to skip the valueOf
methods, which in conjunction with things like Integer::parseInt are
worthy of a paragraph by themselves. Future work.]
Here are some examples of string concatenation:
    String cde = "cde";
    System.out.println("abc" + cde);
[These examples are dull, and don't describe their output, and need to
show text blocks. Future work.]
### String processing
The strings represented by this class may span multiple lines by
including _line terminators_ among their characters. A line terminator
is one of the following: a line feed character LF (U+000A), a carriage
return character CR (U+000D), or a carriage return followed immediately
by a line feed CRLF (U+000D U+000A). [Don't want to show escape
sequences like \n yet.]
A string has _normalized_ line terminators if LF is the only line
terminator which appears in the string. Many methods of this class
_normalize_ the strings they return by ensuring that CR and CRLF are
translated to LF.
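[Illustration only, not proposed spec text: a minimal sketch of the two definitions above. The class and the helpers isNormalized and normalize are hypothetical and are not methods of String.]

```java
public class LineTerminators {
    // A string has normalized line terminators if LF is the only line
    // terminator in it, i.e. it contains no CR at all.
    static boolean isNormalized(String s) {
        return s.indexOf('\r') < 0;
    }

    // Translate CRLF to LF first, then any remaining lone CR to LF.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        System.out.println(isNormalized(mixed));            // false
        System.out.println(isNormalized(normalize(mixed))); // true
    }
}
```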
The class String also includes methods for manipulating non-alphanumeric
characters in strings, such as converting escape sequences into
non-graphic characters, and stripping white space. [This paragraph is a
jumping off point for describing stripIndent's special relationship with
text blocks.]
### Unicode
A String represents a string in the UTF-16 format in which supplementary
characters are represented by surrogate pairs (see the section Unicode
Character Representations in the Character class for more information).
Index values refer to char code units, so a supplementary character uses
two positions in a String.
The String class provides methods for dealing with Unicode code points
(i.e., characters), in addition to those for dealing with Unicode code
units (i.e., char values).
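[Illustration only, not proposed spec text: a short sketch of the difference between code units and code points, using a supplementary character written as a surrogate pair. The class name is hypothetical.]

```java
public class CodeUnitsVersusCodePoints {
    public static void main(String[] args) {
        // 'a', then U+1F600 as the surrogate pair D83D DE00, then 'b'.
        String s = "a\uD83D\uDE00b";

        // length() counts char code units, so the supplementary character counts twice.
        System.out.println(s.length());                      // 4
        // codePointCount counts Unicode code points.
        System.out.println(s.codePointCount(0, s.length())); // 3
        // codePointAt combines the surrogate pair starting at index 1.
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f600
    }
}
```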
-----
Alex