Raw String Literal Library Support
Michael Hixson
michael.hixson at gmail.com
Wed Mar 14 23:05:01 UTC 2018
Hi Jim,
Does string.lines() agree with new BufferedReader(new
StringReader(string)).lines() on what the lines are for all inputs?
For example, does ``.lines() produce an empty stream?
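One way to pin the question down is to look at what the existing Java 8 API does at the edges. The sketch below (class and method names are illustrative only) collects new BufferedReader(new StringReader(s)).lines() for a few inputs. A trailing terminator does not produce an extra empty line, so "" yields no lines at all, "\n" yields one empty line, and "abc\n\n" yields "abc" followed by "". Whether the proposed String.lines() matches this in every case is exactly what is being asked.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class ReaderLines {
    // Collects the lines BufferedReader reports for a string (Java 8 API).
    static List<String> readerLines(String s) {
        return new BufferedReader(new StringReader(s)).lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] inputs = { "", "\n", "abc", "abc\n", "abc\n\n", "a\r\nb\rc" };
        for (String in : inputs) {
            // Show terminators visibly next to the resulting list of lines.
            String shown = in.replace("\r", "\\r").replace("\n", "\\n");
            System.out.println("\"" + shown + "\" -> " + readerLines(in));
        }
    }
}
```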
-Michael
On Tue, Mar 13, 2018 at 6:47 AM, Jim Laskey <james.laskey at oracle.com> wrote:
> With the announcement of JEP 326 Raw String Literals, we would like to open up a discussion with regard to RSL library support. Below are several implemented String methods that we believe are appropriate. Please comment on them, including recommending alternate names or signatures. Additional methods can be considered if warranted, but as always, the bar for inclusion in String is high.
>
> You should keep a couple of things in mind when reviewing these methods.
>
> Methods should be applicable to all strings, not just Raw String Literals.
>
> The number of additional methods should be minimized; we are not adding every possible method.
>
> Don't put any emphasis on performance. That is a separate discussion.
>
> Cheers,
>
> -- Jim
>
> A. Line support.
>
> public Stream<String> lines()
> Returns a stream of substrings extracted from this string partitioned by line terminators. Internally, the stream is implemented using a Spliterator that extracts one line at a time. The line terminators recognized are \n, \r\n and \r. This method provides versatility for the developer working with multi-line strings.
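A minimal sketch of a Spliterator that extracts one line at a time, assuming the terminator rules above (\n, \r\n, \r) and the BufferedReader convention that a trailing terminator does not start another line. The class name and the static lines(String) helper are hypothetical stand-ins, not the implementation behind the proposed method.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LineSpliteratorSketch {
    // Hypothetical helper: a lazily evaluated stream of the lines of s.
    static Stream<String> lines(String s) {
        Spliterator<String> spliterator = new Spliterators.AbstractSpliterator<String>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            private int index = 0; // start of the next unread line

            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                if (index >= s.length()) {
                    return false; // end of input: no trailing empty line is reported
                }
                int start = index;
                while (index < s.length() && s.charAt(index) != '\n' && s.charAt(index) != '\r') {
                    index++;
                }
                int end = index;
                if (index < s.length()) {
                    boolean crlf = s.charAt(index) == '\r'
                            && index + 1 < s.length() && s.charAt(index + 1) == '\n';
                    index += crlf ? 2 : 1; // CR LF counts as a single terminator
                }
                action.accept(s.substring(start, end));
                return true;
            }
        };
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        System.out.println(lines("abc\ndef\r\nghi\rjkl").collect(Collectors.toList()));
    }
}
```

Because each tryAdvance call scans only up to the next terminator, nothing beyond the current line is examined until the stream asks for it.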
> Example:
>
> String string = "abc\ndef\nghi";
> Stream<String> stream = string.lines();
> List<String> list = stream.collect(Collectors.toList());
>
> Result:
>
> [abc, def, ghi]
>
>
> Example:
>
> String string = "abc\ndef\nghi";
> String[] array = string.lines().toArray(String[]::new);
>
> Result:
>
> [Ljava.lang.String;@33e5ccce // printing the array shows only its identity; Arrays.toString(array) gives [abc, def, ghi]
>
>
> Example:
>
> String string = "abc\ndef\r\nghi\rjkl";
> String platformString =
> string.lines().collect(joining(System.lineSeparator()));
>
> Result:
>
> abc
> def
> ghi
> jkl
>
>
> Example:
>
> String string = " abc \n def \n ghi ";
> String trimmedString =
> string.lines().map(s -> s.trim()).collect(joining("\n"));
>
> Result:
>
> abc
> def
> ghi
>
>
> Example:
>
> String table = `First Name     Surname        Phone
>                 Al             Albert         555-1111
>                 Bob            Roberts        555-2222
>                 Cal            Calvin         555-3333
> `;
>
> // Extract headers
> String firstLine = table.lines().findFirst().orElse("");
> List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
>
> // Build stream of maps
> Stream<Map<String, String>> stream =
>     table.lines().skip(1)
>         .map(line -> line.trim())
>         .filter(line -> !line.isEmpty())
>         .map(line -> line.split(`\s{2,}`))
>         .map(columns -> {
>             List<String> values = List.of(columns);
>             return IntStream.range(0, headings.size()).boxed()
>                     .collect(toMap(headings::get, values::get));
>         });
>
> // print all "First Name"
> stream.map(row -> row.get("First Name"))
>       .forEach(name -> System.out.println(name));
>
> Result:
>
> Al
> Bob
> Cal
>
> B. Additions to basic trim methods.
>
> In addition to the margin methods trimIndent and trimMarkers described below in Section C, it would be worth introducing trimLeft and trimRight to augment the longstanding trim method. A key question is how trimLeft and trimRight should detect whitespace, because different definitions of whitespace exist in the library.
>
> trim itself uses the simple test "less than or equal to the space character" (c <= ' '), which is fast but not Unicode friendly.
>
> Character.isWhitespace(codepoint) returns true if codepoint is one of the following:
>
> SPACE_SEPARATOR.
> LINE_SEPARATOR.
> PARAGRAPH_SEPARATOR.
> '\t', U+0009 HORIZONTAL TABULATION.
> '\n', U+000A LINE FEED.
> '\u000B', U+000B VERTICAL TABULATION.
> '\f', U+000C FORM FEED.
> '\r', U+000D CARRIAGE RETURN.
> '\u001C', U+001C FILE SEPARATOR.
> '\u001D', U+001D GROUP SEPARATOR.
> '\u001E', U+001E RECORD SEPARATOR.
> '\u001F', U+001F UNIT SEPARATOR.
> ' ', U+0020 SPACE.
> (Note that the non-breaking spaces \u00A0, \u2007 and \u202F are excluded.)
>
> Character.isSpaceChar(codepoint) returns true if codepoint is one of the following:
>
> SPACE_SEPARATOR.
> LINE_SEPARATOR.
> PARAGRAPH_SEPARATOR.
> ' ', U+0020 SPACE.
> '\u00A0', U+00A0 NON-BREAKING SPACE.
>
> That sets up several kinds of whitespace: trim's whitespace (TWS), Character whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a slow test. UWS is fast for Latin1 and slow-ish for UTF-16.
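The three definitions can be written as predicates over a code point. This sketch uses illustrative names and assumes CWS means Character.isWhitespace. Under that assumption, U+0000 is TWS but not CWS, U+2028 is CWS but not TWS, and U+00A0 is neither, since it only satisfies Character.isSpaceChar.

```java
public class WhitespaceKinds {
    // TWS: the test String.trim() applies.
    static boolean isTrimWhitespace(int cp) {
        return cp <= ' ';
    }

    // CWS: assumed to be Character.isWhitespace.
    static boolean isCharacterWhitespace(int cp) {
        return Character.isWhitespace(cp);
    }

    // UWS: the union; the cheap TWS comparison is evaluated first.
    static boolean isUnionWhitespace(int cp) {
        return cp <= ' ' || Character.isWhitespace(cp);
    }

    public static void main(String[] args) {
        int[] samples = { ' ', '\t', 0x0000, 0x001C, 0x00A0, 0x2028, 0x3000 };
        for (int cp : samples) {
            System.out.printf("U+%04X TWS=%b CWS=%b UWS=%b%n", cp,
                    isTrimWhitespace(cp), isCharacterWhitespace(cp), isUnionWhitespace(cp));
        }
    }
}
```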
>
> We recommend that trimLeft and trimRight use UWS, that trim be left alone to avoid breaking the world, and that trimWhitespace, which also uses UWS, possibly be introduced.
>
> public String trim()
> Removes characters less than or equal to space from the beginning and end of the string. No change, except for spec clarification and links to the new trim methods.
> Examples:
> "".trim(); // ""
> " ".trim(); // ""
> " abc ".trim(); // "abc"
> " \u2028abc ".trim(); // "\u2028abc"
>
> public String trimWhitespace()
> Removes whitespace from the beginning and end of the string.
> Examples:
>
> "".trimWhitespace(); // ""
> " ".trimWhitespace(); // ""
> " abc ".trimWhitespace(); // "abc"
> " \u2028abc ".trimWhitespace(); // "abc"
>
> public String trimLeft()
> Removes whitespace from the beginning of the string.
> Examples:
>
> "".trimLeft(); // ""
> " ".trimLeft(); // ""
> " abc ".trimLeft(); // "abc "
>
> public String trimRight()
> Removes whitespace from the end of the string.
> Examples:
>
> "".trimRight(); // ""
> " ".trimRight(); // ""
> " abc ".trimRight(); // " abc"
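A minimal sketch of the recommended behavior, assuming UWS is the union of trim's test and Character.isWhitespace. These are static helpers on a hypothetical class rather than the proposed String methods; they walk code points so that a supplementary character is never split.

```java
public class TrimSketch {
    // UWS, assumed to be trim's test OR Character.isWhitespace.
    static boolean isUws(int cp) {
        return cp <= ' ' || Character.isWhitespace(cp);
    }

    // Hypothetical stand-in for the proposed trimLeft().
    static String trimLeft(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isUws(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        return s.substring(i);
    }

    // Hypothetical stand-in for the proposed trimRight().
    static String trimRight(String s) {
        int end = s.length();
        while (end > 0) {
            int cp = s.codePointBefore(end);
            if (!isUws(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(0, end);
    }

    // Hypothetical stand-in for the proposed trimWhitespace().
    static String trimWhitespace(String s) {
        return trimLeft(trimRight(s));
    }
}
```

With these definitions the leading U+2028 that trim keeps in " \u2028abc " is removed by trimWhitespace, consistent with the examples in Section B.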
>
> C. Margin management.
>
> With the introduction of multi-line Raw String Literals, developers will have to deal with the extraneous spacing introduced by indenting and formatting string bodies.
>
> Note that for all the methods in this group, an empty first line is removed and an empty last line is removed. This lets developers who place the delimiters on separate lines bracket string bodies. Also note that all line separators are replaced with \n.
>
> public String trimIndent()
> This method determines a representative line in the string body: the one whose first non-whitespace character is closest to the left margin. The number of leading whitespace characters on that line is tallied to produce a minimal indent amount, and the result of the method is a string with that minimal indent amount removed from each line. The first line is unaffected since it is preceded by the open delimiter. The type of whitespace used (spaces or tabs) does not affect the result as long as the developer uses it consistently.
> Example:
>
> String x = `
> This is a line
> This is a line
> This is a line
> This is a line
> This is a line
> `.trimIndent();
>
> Result:
>
> This is a line
> This is a line
> This is a line
> This is a line
> This is a line
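A sketch of the trimIndent rules, written as a static helper on a hypothetical class. Several details are assumptions rather than part of the proposal: "empty" first and last lines are taken to mean whitespace-only, whitespace-only interior lines are ignored when computing the minimal indent, a surviving first line (text on the same line as the open delimiter) is left untouched and excluded from the computation, and whitespace is tested with Character.isWhitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrimIndentSketch {
    // Hypothetical helper approximating the proposed String.trimIndent().
    static String trimIndent(String s) {
        // Split on CR LF, CR or LF; the -1 limit keeps empty trailing lines.
        List<String> lines = new ArrayList<>(Arrays.asList(s.split("\\r\\n|\\r|\\n", -1)));
        boolean firstIsUnaffected = true;
        if (!lines.isEmpty() && isBlank(lines.get(0))) {
            lines.remove(0); // open delimiter was on its own line
            firstIsUnaffected = false;
        }
        if (!lines.isEmpty() && isBlank(lines.get(lines.size() - 1))) {
            lines.remove(lines.size() - 1); // close delimiter was on its own line
        }
        int from = firstIsUnaffected ? 1 : 0;
        // Minimal indent over the non-blank lines that take part.
        int indent = Integer.MAX_VALUE;
        for (int i = from; i < lines.size(); i++) {
            if (!isBlank(lines.get(i))) {
                indent = Math.min(indent, leadingWhitespace(lines.get(i)));
            }
        }
        if (indent == Integer.MAX_VALUE) {
            indent = 0;
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            out.add(i < from ? line : line.substring(Math.min(indent, leadingWhitespace(line))));
        }
        return String.join("\n", out); // every separator becomes LF
    }

    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    static boolean isBlank(String line) {
        return leadingWhitespace(line) == line.length();
    }
}
```

Relative indentation survives: only the common prefix is removed, so a line indented two spaces deeper than the representative line keeps those two spaces.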
>
> public String trimMarkers(String leftMarker, String rightMarker)
> Each line of the multi-line string is first trimmed. If the trimmed line begins with leftMarker, the marker is removed. Finally, if the line ends with rightMarker, that marker is removed.
> Example:
>
> String x = `|This is a line|
> |This is a line|
> |This is a line|`.trimMarkers("|", "|");
> Result:
>
> This is a line
> This is a line
> This is a line
>
> Example:
>
> String x = `>> This is a line
> >> This is a line
> >> This is a line`.trimMarkers(">> ", "");
> Result:
>
> This is a line
> This is a line
> This is a line
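The marker rules can be sketched the same way. This hypothetical helper applies the group-wide removal of an empty first and last line, trims each line with String.trim() (an assumption, since the text does not say which whitespace definition applies), strips leftMarker and rightMarker where present, and joins the result with \n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrimMarkersSketch {
    // Hypothetical helper approximating the proposed String.trimMarkers().
    static String trimMarkers(String s, String leftMarker, String rightMarker) {
        List<String> lines = new ArrayList<>(Arrays.asList(s.split("\\r\\n|\\r|\\n", -1)));
        // As for every margin method: drop an empty first and an empty last line.
        if (!lines.isEmpty() && lines.get(0).trim().isEmpty()) {
            lines.remove(0);
        }
        if (!lines.isEmpty() && lines.get(lines.size() - 1).trim().isEmpty()) {
            lines.remove(lines.size() - 1);
        }
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim(); // assumption: String.trim() semantics
            if (t.startsWith(leftMarker)) {
                t = t.substring(leftMarker.length());
            }
            if (t.endsWith(rightMarker)) {
                t = t.substring(0, t.length() - rightMarker.length());
            }
            out.add(t);
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String x = "|This is a line|\n    |This is a line|\n    |This is a line|";
        System.out.println(trimMarkers(x, "|", "|"));
    }
}
```

An empty rightMarker, as in the ">> " example, is harmless: every string ends with "", and removing it leaves the line unchanged.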
>
> D. Escape management.
>
> Since Raw String Literals do not interpret Unicode escapes (\unnnn) or escape sequences (\n, \b, etc.), we need to provide a scheme for developers who just want multi-line strings but still want escape sequences interpreted.
>
> public String unescape() throws MalformedEscapeException
> Translates each Unicode escape or escape sequence in the string into the character represented by the escape. @jls 3.3, 3.10.6
> Example:
>
> `abc\u2022def\nghi`.unescape();
>
> Result:
>
> abc•def
> ghi
>
> public String unescape(EscapeType... escape) throws MalformedEscapeException
> Selectively translates Unicode escapes or escape sequences based on the escape type flags provided.
> public enum EscapeType {
> /** Backslash escape sequences based on section 3.10.6 of the
> * <cite>The Java™ Language Specification</cite>.
> * This includes sequences for backspace, horizontal tab,
> * line feed, form feed, carriage return, double quote,
> * single quote, backslash and octal escape sequences.
> */
> BACKSLASH,
>
> /** Unicode sequences based on section 3.3 of the
> * <cite>The Java™ Language Specification</cite>.
> * This includes sequences in the form {@code \u005Cunnnn}.
> */
> UNICODE
> }
>
>
> Example:
>
> `abc\u2022def\nghi`.unescape(EscapeType.BACKSLASH);
>
> Result:
>
> abc\u2022def
> ghi
>
>
> Example:
>
> `abc\u2022def\nghi`.unescape(EscapeType.UNICODE);
>
> Result:
>
> abc•def\nghi
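A rough sketch of the translation, assuming the JLS 3.3 rules for Unicode escapes (including the rule that only a backslash preceded by an even number of backslashes can start one, and that the u may repeat) and the JLS 3.10.6 escape set for BACKSLASH. Unicode escapes are translated first, mirroring the order javac uses. The EscapeType enum is a local stand-in, IllegalArgumentException substitutes for the proposed MalformedEscapeException, and passing \u sequences through untouched when only BACKSLASH is requested is inferred from the example above rather than specified.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

public class UnescapeSketch {
    // Hypothetical stand-in mirroring the proposed EscapeType enum.
    enum EscapeType { BACKSLASH, UNICODE }

    // No flags means both kinds, like the proposed no-argument unescape().
    static String unescape(String s, EscapeType... types) {
        Set<EscapeType> kinds = types.length == 0
                ? EnumSet.allOf(EscapeType.class)
                : EnumSet.copyOf(Arrays.asList(types));
        String result = s;
        if (kinds.contains(EscapeType.UNICODE)) {
            result = translateUnicode(result);
        }
        if (kinds.contains(EscapeType.BACKSLASH)) {
            result = translateBackslash(result);
        }
        return result;
    }

    // JLS 3.3: an eligible backslash (preceded by an even number of backslashes),
    // one or more 'u' characters, then exactly four hex digits.
    static String translateUnicode(String s) {
        StringBuilder sb = new StringBuilder();
        int run = 0; // contiguous backslashes immediately before index i
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && run % 2 == 0 && i + 1 < s.length() && s.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < s.length() && s.charAt(j) == 'u') j++;
                if (j + 4 > s.length()) throw new IllegalArgumentException("bad Unicode escape at " + i);
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    char h = s.charAt(k);
                    int d = Character.digit(h, 16);
                    if (h > 0x7F || d < 0) throw new IllegalArgumentException("bad Unicode escape at " + i);
                    value = value * 16 + d;
                }
                sb.append((char) value);
                run = 0; // the last raw character consumed was a hex digit
                i = j + 4;
            } else {
                run = (c == '\\') ? run + 1 : 0;
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    // JLS 3.10.6: \b \t \n \f \r \" \' \\ plus octal escapes such as \0 or \101.
    static String translateBackslash(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                i++;
                continue;
            }
            if (i + 1 >= s.length()) throw new IllegalArgumentException("dangling backslash at " + i);
            char n = s.charAt(i + 1);
            int simple = "btnfr\"'\\".indexOf(n);
            if (simple >= 0) {
                sb.append("\b\t\n\f\r\"'\\".charAt(simple));
                i += 2;
            } else if (n >= '0' && n <= '7') {
                int maxDigits = (n <= '3') ? 3 : 2; // three digits only if the first is 0-3
                int j = i + 1;
                int value = 0;
                while (j < s.length() && j < i + 1 + maxDigits && s.charAt(j) >= '0' && s.charAt(j) <= '7') {
                    value = value * 8 + (s.charAt(j) - '0');
                    j++;
                }
                sb.append((char) value);
                i = j;
            } else if (n == 'u') {
                sb.append(c); // assumption: untranslated Unicode escapes pass through unchanged
                i++;
            } else {
                throw new IllegalArgumentException("bad escape at " + i);
            }
        }
        return sb.toString();
    }
}
```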
>
> Conversely, there are circumstances where the inverse is required:
>
> public String escape()
> Translates each quote, backslash, non-graphic character or non-ASCII character into a Unicode escape or escape sequence. The method is equivalent to escape(BACKSLASH, UNICODE).
> Example:
>
> `abc•def
> ghi`.escape();
>
> Result:
>
> abc\u2022def\nghi
>
> public String escape(EscapeType... escape)
> Selectively translates each quote, backslash, non-graphic character or non-ASCII character into a Unicode escape or escape sequence based on the escape type flags provided.
> Example:
>
> `abc•def
> ghi`.escape(EscapeType.BACKSLASH);
>
> Result:
>
> abc•def\nghi
>
>
> Example:
>
> `abc•def
> ghi`.escape(EscapeType.UNICODE);
>
> Result:
>
> abc\u2022def
> ghi
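The inverse can be sketched along the same lines. Where the description leaves room, this hypothetical helper makes explicit assumptions: BACKSLASH covers quotes, backslash and non-graphic ASCII (named escapes where they exist, three-digit octal escapes otherwise), UNICODE covers every UTF-16 code unit above '~' as a four-digit \uXXXX escape, and the enum is again a local stand-in. Under those assumptions it reproduces the three examples above.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

public class EscapeSketch {
    // Hypothetical stand-in mirroring the proposed EscapeType enum.
    enum EscapeType { BACKSLASH, UNICODE }

    // No flags means both kinds, like the proposed no-argument escape().
    static String escape(String s, EscapeType... types) {
        Set<EscapeType> kinds = types.length == 0
                ? EnumSet.allOf(EscapeType.class)
                : EnumSet.copyOf(Arrays.asList(types));
        boolean backslash = kinds.contains(EscapeType.BACKSLASH);
        boolean unicode = kinds.contains(EscapeType.UNICODE);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int simple = "\b\t\n\f\r\"'\\".indexOf(c);
            if (backslash && simple >= 0) {
                // Quote, backslash or a control character with a named escape.
                sb.append('\\').append("btnfr\"'\\".charAt(simple));
            } else if (backslash && (c < ' ' || c == 0x7F)) {
                // Other non-graphic ASCII: zero-padded so a following digit is not absorbed.
                sb.append(String.format("\\%03o", (int) c));
            } else if (unicode && c > '~') {
                // Non-ASCII: one four-digit escape per UTF-16 code unit.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```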
>