Targeting JEP 326: Raw String Literal for JDK 12

Jim Laskey james.laskey at oracle.com
Thu Jul 5 20:11:43 UTC 2018


With your guidance, we consider that the Raw String Literal design and initial implementation have stabilized enough to target JEP 326 as a Preview Language Feature in JDK 12. Before we proceed, we’d like to review some of the recommendations made since JEP 326 was proposed as a Candidate.


Margin Management

Considerable time was spent discussing how to deal with the incidental indentation introduced when a Raw String Literal’s visible content is lined up with the surrounding code. Clearly, there is no solution that satisfies everyone. We would be remiss if we locked the language into an irreversible decision that we might regret in the future. Therefore, the sensible and responsible thing to do is to follow the "raw means raw" principle: leave the Raw String Literal content uninterpreted by the compiler (other than line terminators), and place the margin management onus on library and developer furnished methods.

To provide incidental indentation support for what we think will be the common case, the following instance method will be added to the String class:

       public String align()

which, after removing all leading and trailing blank lines, left justifies each line without loss of relative indentation, thus stripping away all incidental indentation and line spacing.

Example:

       String html = `
                          <html>
                              <body>
                                  <p>Hello World.</p>
                              </body>
                          </html>
                     `.align();
       System.out.print(html);

Output:
<html>
    <body>
        <p>Hello World.</p>
    </body>
</html>
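
Since Raw String Literals and String::align are only proposals at this point, the behaviour described above can be approximated with an ordinary helper method. The following is a minimal sketch, not the proposed implementation: the class name AlignSketch and the static align(String) helper are hypothetical, the input is built from a traditional string, and ending the result with a line terminator is an assumption. It relies on String::lines, isBlank and stripLeading from Java 11.

```java
import java.util.List;
import java.util.stream.Collectors;

class AlignSketch {
    // Hypothetical stand-in for the proposed String::align; not a JDK method.
    static String align(String s) {
        List<String> lines = s.lines().collect(Collectors.toList());

        // Drop leading and trailing blank lines.
        int first = 0;
        int last = lines.size();
        while (first < last && lines.get(first).isBlank()) first++;
        while (last > first && lines.get(last - 1).isBlank()) last--;
        List<String> body = lines.subList(first, last);
        if (body.isEmpty()) {
            return "";
        }

        // Smallest count of leading white space over the non-blank lines.
        int margin = body.stream()
                .filter(line -> !line.isBlank())
                .mapToInt(line -> line.length() - line.stripLeading().length())
                .min()
                .orElse(0);

        // Remove the common margin; blank interior lines become empty lines.
        // Assumption: the result ends with a line terminator.
        return body.stream()
                .map(line -> line.isBlank() ? "" : line.substring(margin))
                .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        String html = "\n"
                + "        <html>\n"
                + "            <body>\n"
                + "                <p>Hello World.</p>\n"
                + "            </body>\n"
                + "        </html>\n"
                + "    ";
        System.out.print(align(html));
    }
}
```

The sketch counts every leading white space character as one column, so it only behaves as expected when tabs and spaces are used consistently across lines.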

Further, generalized control of indentation will be provided with the following String instance method:

       public String indent(int n)

where `n` specifies the number of white spaces to add or remove from each line of the string; a positive `n` adds n spaces (U+0020) and negative `n` removes n white spaces.

Example:

       String html = `
                          <html>
                              <body>
                                  <p>Hello World.</p>
                              </body>
                          </html>
                     `.align().indent(4);
       System.out.print(html);

Output:
    <html>
        <body>
            <p>Hello World.</p>
        </body>
    </html>
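
A method with this signature, String::indent(int), is available in JDK 12 and later, so the adjustment can be tried on a traditional string. The sketch below assumes such a JDK; the class name IndentDemo is hypothetical. Per its Javadoc, the shipped method also normalizes line terminators to \n, and a negative n removes at most that many leading white space characters from each line.

```java
class IndentDemo {
    public static void main(String[] args) {
        String text = "<html>\n    <body>\n    </body>\n</html>\n";

        // Positive n: every line gains four leading spaces.
        String pushed = text.indent(4);
        System.out.print(pushed);

        // Negative n: the same four columns are removed again.
        String pulled = pushed.indent(-4);
        System.out.print(pulled);

        // The round trip restores the original text.
        System.out.println(pulled.equals(text));
    }
}
```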

In the cases where align() is not what the developer wants, we expect the preponderance of cases to be align().indent(n). Therefore, an additional variation of `align` will be provided:

       public String align(int n)

where `n` is the indentation applied to the string after _alignment_.

Example:

       String html = `
                          <html>
                              <body>
                                  <p>Hello World.</p>
                              </body>
                          </html>
                     `.align(4);
       System.out.print(html);

Output:
    <html>
        <body>
            <p>Hello World.</p>
        </body>
    </html>

Customizable margin management (and more) will be provided by the string instance method:

	<R> R transform(Function<String, R> f)

where the supplied function f is called with the string.

Example:

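import java.util.stream.Collectors;

public class MyClass {
   private static final String MARGIN_MARKER = "| ";

   // Static so that MyClass::stripMargin can be passed as a Function<String, String>.
   public static String stripMargin(String string) {
       // Strip each line, then drop the margin marker where it is present.
       return string.lines().map(String::strip)
                    .map(s -> s.startsWith(MARGIN_MARKER) ? s.substring(MARGIN_MARKER.length()) : s)
                    .collect(Collectors.joining("\n", "", "\n"));
   }

   String stripped = `
                         | The content of
                         | the string
                     `.transform(MyClass::stripMargin);
}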

Output:
The content of
the string
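
For experimenting without Raw String Literals, the same idea can be run against a traditional string, since String::transform is available in JDK 12 and later. This is a sketch under stated assumptions: the class name MarginDemo is hypothetical, and the extra filter step is an addition that drops the blank first and last lines left behind by the literal's layout, so that only the two content lines are printed.

```java
import java.util.stream.Collectors;

class MarginDemo {
    private static final String MARGIN_MARKER = "| ";

    static String stripMargin(String string) {
        return string.lines()
                .map(String::strip)
                // Assumption: lines that are empty after stripping are dropped.
                .filter(s -> !s.isEmpty())
                .map(s -> s.startsWith(MARGIN_MARKER) ? s.substring(MARGIN_MARKER.length()) : s)
                .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        // Traditional string laid out like the literal in the example.
        String raw = "\n"
                + "    | The content of\n"
                + "    | the string\n"
                + "  ";
        System.out.print(raw.transform(MarginDemo::stripMargin));
    }
}
```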

It should be noted that concerns about class file size and runtime impact are addressed by the _constant folding_ features of [JEP 303](http://openjdk.java.net/jeps/303).


White Space and Tabs

The use of tabs came up during the Margin Management discussion. Specifically, what do tabs represent when removing incidental indentation? As long as the source is consistent across all lines of a multi-line string (with respect to tabs), the align() method will behave as expected. For the cases where the source is not consistent, there is no consistent rule for handling it; the best thing to do here is to provide tools for getting back to consistency.

For example, we propose the introduction of String instance method:

	public String detab(int n)

which replaces tab (U+0009) characters with enough space (U+0020) characters to align to tab stops at intervals of n, and:

	public String entab(int n)

which replaces some space (U+0020) characters with tab (U+0009) characters where doing so aligns to tab stops at intervals of n.

Example:

       String html = `
                          <html>
                              <body>
                                  <p>Hello World.</p>
                              </body>
                          </html>
                     `.detab(8).align(4);
       System.out.print(html);

Output:
    <html>
        <body>
            <p>Hello World.</p>
        </body>
    </html>
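
To make the tab stop arithmetic concrete, here is a minimal sketch of the two conversions as static helpers. Neither detab nor entab exists on String today; the class name TabSketch and both helper signatures are hypothetical. The entab variant only converts the leading spaces of each line, which is an assumption, since the description above leaves open which spaces are eligible. Both helpers assume n is greater than zero and use String::repeat from Java 11.

```java
class TabSketch {
    // Hypothetical stand-in for the proposed String::detab; not a JDK method.
    static String detab(String s, int n) {
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\t') {
                // Pad with spaces up to the next multiple of n.
                int pad = n - (column % n);
                out.append(" ".repeat(pad));
                column += pad;
            } else {
                out.append(c);
                // A line terminator resets the column counter.
                column = (c == '\n' || c == '\r') ? 0 : column + 1;
            }
        }
        return out.toString();
    }

    // Hypothetical stand-in for the proposed String::entab.
    // Assumption: only the leading run of spaces on each line is converted.
    static String entab(String s, int n) {
        String[] lines = s.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int spaces = 0;
            while (spaces < line.length() && line.charAt(spaces) == ' ') {
                spaces++;
            }
            // Each full group of n spaces becomes one tab; the remainder stays.
            lines[i] = "\t".repeat(spaces / n) + " ".repeat(spaces % n) + line.substring(spaces);
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // One line indented with a tab, one with four spaces followed by a tab.
        String mixed = "\t<html>\n    \t<body>\n";
        System.out.print(detab(mixed, 8));
    }
}
```

With tab stops every 8 columns, both lines of the sample start at column 8 after detab, which is the consistency that align() needs.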


Escape Sequences

After the initial release of JEP 326 there was some discussion about the names of the escape sequence managing methods. We reversed the naming (unescape/escape), but the lack of additional feedback left us wondering, “Do we have the appropriate names?” and “Do we really need these methods?”

If these methods are needed rarely, would longer names escapeSequencesToChars and charsToEscapeSequences be more suitable?

Would putting more effort into a more generalized String::replaceAll(Pattern pattern, Function<Matcher, String> replacer) make more sense?
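
For comparison, the closest facility that already exists is Matcher::replaceAll(Function<MatchResult, String>), available since Java 9. The sketch below uses it to translate a small subset of escape sequences. The class name EscapeSketch, the unescape helper and the choice of supported sequences (\n, \t and \\) are assumptions made for illustration, not the methods proposed in JEP 326.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeSketch {
    // Matches a backslash followed by 'n', 't' or another backslash.
    private static final Pattern ESCAPE = Pattern.compile("\\\\([nt\\\\])");

    // Hypothetical helper: translates a few escape sequences to characters.
    static String unescape(String s) {
        return ESCAPE.matcher(s).replaceAll(match -> {
            switch (match.group(1)) {
                case "n":
                    return "\n";
                case "t":
                    return "\t";
                default:
                    // A literal backslash must be quoted in a replacement string.
                    return Matcher.quoteReplacement("\\");
            }
        });
    }

    public static void main(String[] args) {
        // The argument holds the characters: Hello \ n World
        System.out.print(unescape("Hello\\nWorld\\n"));
    }
}
```

Because the function's result is interpreted as a replacement string, backslashes and dollar signs in it need Matcher.quoteReplacement, which is one of the rough edges a more generalized String method could smooth over.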


Repeating Delimiters

We remain convinced that an arbitrary sequence of backticks is the best choice of delimiter. There were two main points of concern: the lack of an empty string and the distinction between single and multi-line literals. We contend that both of these points are aesthetic and that addressing them in the design would take away from its simplicity.

With regard to the empty Raw String Literal: Java already has a representation for the empty string, "". Why have two? Raw String Literals do not have to be symmetrically in sync with traditional strings. The lack of symmetry is a discerning plus for Raw String Literals.

With regard to the distinction between single and multi-line literals: if the developer chooses to use single backticks for single-line literals and triple backticks for multi-line literals, there is nothing in the design that prevents the developer from doing so. However, we would discourage the adoption of this "coding convention".


Cheers,

- Jim

