Raw String Literals indentation management

Jim Laskey james.laskey at oracle.com
Thu Jun 7 13:11:01 UTC 2018


This topic died down while we were off doing other things, including some more research. Consequently, Raw String Literals will miss the JDK 11 train.

We have introduced some new String methods in JDK 11 (and one Predicate method) that will help us along the way. Some of them are used in the examples below.

    String::repeat(int n)
    String::strip(), String::stripLeading(), String::stripTrailing()
    String::isBlank()
    String::lines()
    Predicate::not(Predicate<? super T> target)
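
For readers who have not tried these yet, here is a small sketch of how the JDK 11 methods listed above combine. The class name Jdk11StringMethods and the helper nonBlankLines are made up for illustration; only the String and Predicate calls are JDK API.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative only: class and helper names are not part of any proposal.
class Jdk11StringMethods {
    // Splits text into lines, drops blank lines, and strips surrounding
    // white space from the lines that remain.
    static List<String> nonBlankLines(String text) {
        return text.lines()
                   .filter(Predicate.not(String::isBlank))
                   .map(String::strip)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("ab".repeat(3));                      // ababab
        System.out.println("[" + "  x  ".stripLeading() + "]");  // [x  ]
        System.out.println("[" + "  x  ".stripTrailing() + "]"); // [  x]
        System.out.println(nonBlankLines("  abc\n\n   \n def \n")); // [abc, def]
    }
}
```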

We propose three new Raw String Literal related string methods, plus a power tool:

    String indent(int n)
    String indent()
    Stream<String> lines(LinesOptions... options)

    String transform(Function<String, String> f)


Generalized Indentation 

String::indent(int n) can be used to add or remove a fixed amount of indentation on each line. n > 0 adds, n < 0 removes, n == 0 leaves the indentation unchanged.

Example:

    String a =
`
abc
def
ghi
`.indent(4);

Result:

    abc
    def
    ghi


Example:

    String b =
`
    abc
    def
    ghi
`.indent(-4);

Result:

abc
def
ghi

Notes:
- Tabs, like all other white space characters, are treated as one space. The only viable alternative is to introduce String::detab/String::entab (which may be reasonable.)
- String::indent(int n) also works with strings without line terminators.
- String::indent(int n) does not crop blank lines. This keeps the method relevant to all strings, not just Raw String Literals.

Implementation:

/**
 * When applied to a string, modifies the indentation
 * of each line based on parameter {@code n}.
 * <p>
 * If {@code n > 0} then {@code n} spaces ({@code U+0020})
 * are inserted at the beginning of each line.
 * {@link String#isBlank() blank lines} are unaffected.
 * <p>
 * If {@code n < 0} then {@code n}
 * {@link Character#isWhitespace(int) white space characters}
 * are removed from the beginning of each line. If a given line
 * does not contain sufficient white space then all leading
 * {@link Character#isWhitespace(int) white space characters}
 * are removed.
 * <p>
 * If {@code n == 0} then indentation remains unchanged, but other
 * transformations, such as line termination, still take effect.
 * <p>
 * @apiNote All
 *          {@link Character#isWhitespace(int) white space characters},
 *          including tab, are treated as a single space.
 * <p>
 * @apiNote All line terminators in the result will be
 *          replaced with line feed {@code "\n"} ({@code U+000A}).
 *
 * @param n  number of leading white space characters
 *           to adjust
 *
 * @return string with indentation modified.
 *
 * @see Character#isWhitespace(int)
 * @see String#lines()
 *
 * @since 11
 */
public String indent(int n) {
    return isEmpty() ? "" : indent(n, false);
}

private String indent(int n, boolean skipBlanks) {
    if (isMultiline()) {
        Stream<String> stream = skipBlanks ? lines(LinesOptions.ONE_LEADING_TRAILING)
                                           : lines();
        if (n > 0) {
            final String spaces = " ".repeat(n);
            stream = stream.map(s -> s.isBlank() ? s : spaces + s);
        } else if (n == Integer.MIN_VALUE) {
            stream = stream.map(s -> s.stripLeading());
        } else if (n < 0) {
            stream = stream.map(s -> s.substring(Math.min(-n, s.indexOfNonWhitespace())));
        }
        return stream.collect(Collectors.joining("\n", "", "\n"));
    } else {
        if (n > 0) {
            return " ".repeat(n) + this;
        }
        if (n == Integer.MIN_VALUE) {
            return stripLeading();
        }
        if (n < 0) {
            return substring(Math.min(-n, indexOfNonWhitespace()));
        }
        return this;
    }
}
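
To experiment outside the JDK, the behaviour above can be approximated with public JDK 11 API only. This is a minimal sketch, not the proposed implementation: the class name IndentSketch, the static indent(String, int) signature and the local indexOfNonWhitespace helper are invented here, and the multi-line test is only a guess at what the private isMultiline() checks.

```java
import java.util.stream.Collectors;

// Illustrative stand-in for the proposed String::indent(int n).
class IndentSketch {
    // Index of the first non-white-space character, or the length if there
    // is none. Replaces the private String.indexOfNonWhitespace().
    static int indexOfNonWhitespace(String s) {
        int i = 0;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    static String indent(String text, int n) {
        if (text.isEmpty()) {
            return "";
        }
        // Assumption: "multi-line" means the string contains a line terminator.
        boolean multiline = text.indexOf('\n') >= 0 || text.indexOf('\r') >= 0;
        if (!multiline) {
            return adjust(text, n, false);
        }
        return text.lines()
                   .map(s -> adjust(s, n, true))
                   .collect(Collectors.joining("\n", "", "\n"));
    }

    // Adjusts one line. Blank lines are left unpadded only in the
    // multi-line case, matching the two branches of the code above.
    static String adjust(String line, int n, boolean keepBlank) {
        if (n > 0) {
            return keepBlank && line.isBlank() ? line : " ".repeat(n) + line;
        }
        if (n < 0) {
            // Negate as long so Integer.MIN_VALUE means "strip everything".
            int remove = (int) Math.min(-(long) n, indexOfNonWhitespace(line));
            return line.substring(remove);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.print(indent("abc\ndef\nghi\n", 4));
        System.out.print(indent("\tabc\n  def\n", -1)); // a tab counts as one space
    }
}
```

Note that a tab is removed by indent(-1) just like a single space, which is the behaviour described in the notes above.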


Raw String Literal Conventional Form and Blank Lines

Prior e-mails discussed the conventional form of multi-line Raw String Literals. I listed several examples, but I think the form in which the open and close delimiters “bracket” the string body seemed to satisfy most commenters.

Example:

    String c = `
                   abc
                   def
                   ghi
               `;

Unfortunately, doing so introduces additional leading and trailing blank lines that the developer may not have intended to be part of the string. To handle this issue, we propose String::lines(LinesOptions... options). String::lines(LinesOptions... options) returns a stream of substrings extracted from the string partitioned by line terminators, with options to remove blank lines before creating the stream. This allows Raw String Literal transformations using String::lines to remain single pass. We realize this is a little rough around the edges, but it gets us close to the goal.

Example:

        System.out.println(`
           abc
           def
           ghi
        `.lines(LinesOptions.ONE_LEADING_TRAILING).count());

Result:
3
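
The effect of ONE_LEADING_TRAILING can be imitated on JDK 11 for experimentation. The sketch below is only an approximation under stated assumptions: the class LinesOptionsSketch and the method linesOneLeadingTrailing are hypothetical names, and unlike the proposed method it buffers the lines in a list rather than staying lazy and single pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative approximation of lines(LinesOptions.ONE_LEADING_TRAILING).
class LinesOptionsSketch {
    static Stream<String> linesOneLeadingTrailing(String text) {
        List<String> lines = text.lines()
                                 .collect(Collectors.toCollection(ArrayList::new));
        // Remove at most one blank line from the front...
        if (!lines.isEmpty() && lines.get(0).isBlank()) {
            lines.remove(0);
        }
        // ...and at most one blank line from the back.
        if (!lines.isEmpty() && lines.get(lines.size() - 1).isBlank()) {
            lines.remove(lines.size() - 1);
        }
        return lines.stream();
    }

    public static void main(String[] args) {
        // Same content as the raw string literal in the example above.
        String text = "\n           abc\n           def\n           ghi\n        ";
        System.out.println(linesOneLeadingTrailing(text).count()); // 3
    }
}
```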

Implementation:

/**
 * The constants of this enumerated type provide fine
 * control over the content of the string stream produced
 * by the {@link String#lines(LinesOptions...)} method.
 */
public enum LinesOptions {
    /**
     * Remove one leading blank line from the stream if present.
     */
    ONE_LEADING,
    /**
     * Remove one trailing blank line from the stream if present.
     */
    ONE_TRAILING,
    /**
     * Remove one leading blank line and one trailing blank line
     * from the stream if present.
     */
    ONE_LEADING_TRAILING,
    /**
     * Remove all leading blank lines from the stream.
     */
    ALL_LEADING,
    /**
     * Remove all trailing blank lines from the stream.
     */
    ALL_TRAILING,
    /**
     * Remove all leading and trailing blank lines from the
     * stream.
     */
    ALL_LEADING_TRAILING,
    /**
     * Remove all blank lines from the stream.
     */
    ALL_BLANKS
}

/**
 * Returns a stream of substrings extracted from this string
 * partitioned by line terminators.
 * <p>
 * Line terminators recognized are line feed
 * {@code "\n"} ({@code U+000A}),
 * carriage return
 * {@code "\r"} ({@code U+000D})
 * and a carriage return followed immediately by a line feed
 * {@code "\r\n"} ({@code U+000D U+000A}).
 * <p>
 * The stream returned by this method contains each line of
 * this string that is terminated by a line terminator except that
 * the last line can either be terminated by a line terminator or the
 * end of the string.
 * The lines in the stream are in the order in which
 * they occur in this string and do not include the line terminators
 * partitioning the lines.
 * <p>
 * The {@code options} vararg can be used to specify filtering of
 * certain lines from the resulting stream.
 *
 * @implNote This method provides better performance than
 *           split("\R") by supplying elements lazily and
 *           by faster search of new line terminators.
 *
 * @param  options  a vararg of {@link LinesOptions} used to
 *                  control the selection of lines returned
 *                  by the stream
 *
 * @return  the stream of strings extracted from this string
 *          partitioned by line terminators
 *
 * @since 11
 */
public Stream<String> lines(LinesOptions... options) {


Raw String Literal Magic Remove Indentation

String::indent() (was String::stripIndent) determines an "n" that left justifies lines without loss of relative indentation.

Example:

    String d = `
                   abc
                     def
                       ghi
               `.indent();

Result:
    abc
      def
        ghi
    
That is, indent(-16).

Notes:
- Raw String Literals are String::indent()'s primary use case.
- String::indent() removes the first and last lines if blank, that is lines(LinesOptions.ONE_LEADING_TRAILING), as per the Raw String Literal Conventional Form and Blank Lines topic.
- All white spaces are treated equally. There was a discussion about removing leading white space only if it matched the "determinant" string's leading white space. The problem is there is often no such string. Example: String x = " \ta\n\t b\n". Both lines have an equal number of leading white spaces. Which one is determinant?
- Additional indentation can be added by chaining String::indent() with String::indent(int n).

Example:

    String e = `
                   abc
                     def
                       ghi
               `.indent().indent(4);

Result:
        abc
          def
            ghi

Implementation:

/**
 * When applied to a string, left justifies
 * lines without loss of relative indentation. This is
 * accomplished by removing an equal number of
 * {@link Character#isWhitespace(int) white space} characters
 * from each line so that at least one line has a non-white
 * space character in the left-most position. 
 * First and last blank lines introduced to allow
 * bracketing  delimiters to appear on separate source lines
 * are also removed.
 * <p>
 * @apiNote All
 *          {@link Character#isWhitespace(int) white space characters},
 *          including tab, are treated as a single space.
 * <p>
 * @apiNote All line terminators in the result will be
 *          replaced with line feed {@code "\n"} ({@code U+000A}).
 *
 * @return string left justified
 *
 * @see Character#isWhitespace(int)
 * @see String#indent(int)
 * @see String#lines()
 * @see String#lines(LinesOptions... options)
 *
 * @since 11
 */
public String indent() {
    if (isEmpty()) {
        return "";
    }
    int min = lines().skip(1)
                     .filter(not(String::isEmpty))
                     .mapToInt(String::indexOfNonWhitespace)
                     .min()
                     .orElse(0);
    return indent(-min, true);
}
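
The same logic can be tried with public JDK 11 API. This is a rough sketch under assumptions, not the proposed code: StripIndentSketch, stripIndent and the local indexOfNonWhitespace are hypothetical names, and the blank line removal is done on a buffered list instead of via LinesOptions. As in the code above, the first line is skipped when computing the minimum, and a closing delimiter line that holds only white space is not empty, so it does take part in the minimum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative stand-in for the proposed no-argument String::indent().
class StripIndentSketch {
    // Index of the first non-white-space character, or the length if none.
    static int indexOfNonWhitespace(String s) {
        int i = 0;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    static String stripIndent(String text) {
        if (text.isEmpty()) {
            return "";
        }
        // Smallest leading white space count among the non-empty lines
        // after the first one (mirrors lines().skip(1) above).
        int min = text.lines().skip(1)
                      .filter(Predicate.not(String::isEmpty))
                      .mapToInt(StripIndentSketch::indexOfNonWhitespace)
                      .min()
                      .orElse(0);
        List<String> lines = text.lines()
                                 .collect(Collectors.toCollection(ArrayList::new));
        // Drop one leading and one trailing blank line, if present.
        if (!lines.isEmpty() && lines.get(0).isBlank()) {
            lines.remove(0);
        }
        if (!lines.isEmpty() && lines.get(lines.size() - 1).isBlank()) {
            lines.remove(lines.size() - 1);
        }
        // Remove up to min leading white space characters from each line.
        return lines.stream()
                    .map(s -> s.substring(Math.min(min, indexOfNonWhitespace(s))))
                    .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        String text = "\n" + " ".repeat(8) + "abc\n"
                           + " ".repeat(10) + "def\n"
                           + " ".repeat(4);
        System.out.print(stripIndent(text));
    }
}
```

In main, the closing line carries 4 spaces, so 4 is the minimum and "abc" keeps 4 spaces of indentation while "def" keeps 6.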

I should mention that concerns about compile time versus runtime costs of transforming Raw String Literals will be mitigated by "JEP 303: Intrinsics for the LDC and INVOKEDYNAMIC Instructions" (http://openjdk.java.net/jeps/303). That is, stripping of indentation by indent() will occur at compile time.


Custom Indentation Management

In prior discussions, we included String::stripMarkers(String leftMarker, String rightMarker) as a means to manage indentation. There was a general feeling, at the time, that String::indent() (then String::stripIndent()) would dominate use and String::stripMarkers less so. We propose that we drop String::stripMarkers and provide more customizable indentation management using String::lines(LinesOptions... options) and String::transform(Function<String, String> f).

The String::transform(Function<String, String> f) method allows the application of a string transformation function “in chain”.

Example:

import java.util.stream.Collectors;

public class MyClass {
    private static final String MARGIN_MARKER = "| ";

    // Static so that it can be passed as the method reference MyClass::stripMargin.
    public static String stripMargin(String string) {
        return string.lines(LinesOptions.ONE_LEADING_TRAILING)
                     .map(String::strip)
                     .map(s -> s.startsWith(MARGIN_MARKER) ? s.substring(MARGIN_MARKER.length()) : s)
                     .collect(Collectors.joining("\n", "", "\n"));
    }

    String f = `
                | The content of
                | the string
               `.transform(MyClass::stripMargin);
}

Result:

The content of
the string


Implementation:

/**
 * This method allows the application of a function to {@code this}
 * string. The function should expect a single String argument
 * and produce a String result.
 *
 * @param f    functional interface to apply
 *
 * @return     the string result of applying the function to this string
 *
 * @see java.util.function.Function
 *
 * @since 11
 */
public String transform(Function<String, String> f) {
    return f.apply(this);
}
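
The margin-stripping idea can also be tried on JDK 11 without raw string literals or the proposed methods. The sketch below rests on several assumptions: TransformSketch and the static transform(String, Function) helper are hypothetical stand-ins for the proposed instance method, and filtering out every blank line only approximates ONE_LEADING_TRAILING for this particular input.

```java
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative stand-in for String::transform plus a margin stripper.
class TransformSketch {
    private static final String MARGIN_MARKER = "| ";

    // Hypothetical static equivalent of the proposed String::transform.
    static String transform(String s, Function<String, String> f) {
        return f.apply(s);
    }

    static String stripMargin(String string) {
        return string.lines()
                     // Approximation: drops all blank lines, not just the
                     // first and last one.
                     .filter(s -> !s.isBlank())
                     .map(String::strip)
                     .map(s -> s.startsWith(MARGIN_MARKER)
                               ? s.substring(MARGIN_MARKER.length())
                               : s)
                     .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        String f = "\n"
                 + "                | The content of\n"
                 + "                | the string\n"
                 + "               ";
        System.out.print(transform(f, TransformSketch::stripMargin));
    }
}
```

Because the function is an ordinary Function<String, String>, several such steps can be composed with Function::andThen before being applied.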
