[raw-strings] Indentation problem

Remi Forax forax at univ-mlv.fr
Sat Jan 27 11:03:40 UTC 2018


There is a rule when you design a language: if you can do something in the compiler or in a library, do it in the library :)

I do not think it's a good idea to force the pipe prefix in the spec. From an IDE point of view, you have to do more analysis, but you can recognize the sequence ` ... `.trimMargin() in order to auto-indent things correctly.
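
As a rough illustration of the library route, a trimMargin()-style helper could look like the sketch below. This is not an existing JDK method: the class name MarginTrimmer and the explicit margin parameter are assumptions made for this example, loosely modelled on Kotlin's trimMargin(). Unlike the Kotlin version, this sketch keeps a blank first or last line.

```java
/** Hypothetical helper, not part of the JDK: a minimal trimMargin()-style method. */
final class MarginTrimmer {
    private MarginTrimmer() {}

    /**
     * On every line, removes leading whitespace up to and including the
     * first margin character. Lines without a margin are left unchanged.
     */
    static String trimMargin(String text, char margin) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty lines
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            String line = lines[i];
            int pos = 0;
            while (pos < line.length() && Character.isWhitespace(line.charAt(pos))) {
                pos++;
            }
            if (pos < line.length() && line.charAt(pos) == margin) {
                out.append(line, pos + 1, line.length()); // drop indent and margin
            } else {
                out.append(line); // no margin on this line: keep it as written
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = trimMargin("<html>\n      |  <head>\n      |  </head>", '|');
        System.out.println(html);
    }
}
```

Only the first pipe on each line is treated as the margin, so pipes inside the content survive untouched.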

regards,
Rémi 

----- Original Message -----
> From: "Tagir Valeev" <amaembo at gmail.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Saturday, January 27, 2018 09:23:31
> Subject: [raw-strings] Indentation problem

> Hello!
> 
> Every language that implements multiline strings has problems
> with indentation. E.g. consider something like this:
> 
> public class Multiline {
>  static String createHtml(String message) {
>    String html = `<html>
>  <head>
>    <title>Message</title>
>  </head>
>  <body>`;
>    if (message != null) {
>      html += `
>    <p>
>      Message: `+message+`
>    </p>`;
>    }
>    html += `
>  </body>
> </html>`;
>    return html;
>  }
> }
> 
> Here the indentation of the embedded snippet breaks the indentation of the
> Java program, harming its readability. The overall structure of the
> method is mixed up with the generated HTML structure. This is not just bad
> indentation which could be fixed by the auto-formatting feature of an IDE.
> You cannot fix this without either throwing away the multiline string
> syntax or changing the semantics. Some people sacrifice the
> semantics, namely the indentation of the generated output, if the output
> language is indentation agnostic. HTML is mostly so, unless you have a
> <pre> section. So one may "fix" it like this:
> 
> public class Multiline {
>  static String createHtml(String message) {
>    String html = `<html>
>        <head>
>          <title>Message</title>
>        </head>
>        <body>`;
>    if (message != null) {
>      html += `
>          <p>
>            Message: `+message+`
>          </p>`;
>    }
>    html += `
>        </body>
>      </html>`;
>    return html;
>  }
> }
> 
> Now we have broken formatting in the generated HTML, which ruins the
> idea of multiline strings (why bother to generate \n in the output HTML if
> it looks like a mess anyway?). Moreover, the structure of the Java program
> now affects the output. E.g. if you add several more nested "if" or
> "switch" statements, you will need to indent <p> even more.
> 
> Many languages provide library methods to handle this. E.g.
> trimIndent() could be provided to remove the leading spaces of every line,
> but this would kill the HTML indents altogether. Another possibility is to
> provide a method like trimMargin() in Kotlin [1], which trims all
> spaces before a special character (pipe by default), including the
> special character itself.
> 
> Assuming such method exists in Java, we can rewrite our method in a
> prettier way preserving both Java and HTML formatting:
> 
> public class Multiline {
>  static String createHtml(String message) {
>    String html = `<html>
>      |  <head>
>      |    <title>Message</title>
>      |  </head>
>      |  <body>`.trimMargin();
>    if (message != null) {
>      html += `
>        |    <p>
>        |      Message: `+message+`
>        |    </p>`.trimMargin();
>    }
>    html += `
>      |  </body>
>      |</html>`.trimMargin();
>    return html;
>  }
> }
> 
> This is almost nice. Even without syntax highlighting you can easily
> distinguish between Java code and injected HTML code, you can indent
> Java and HTML independently and HTML code does not clash with Java
> code structure. The only problem is the necessity to call the
> trimMargin() method. This means that the original string is preserved in the
> bytecode and at runtime, and the trimming is performed every time
> the method is called, costing both time and memory. This
> problem could be minimized by making trimMargin() a javac intrinsic.
> However, even in this case it would be hard to enforce usage of this
> method, and I expect that tons of hard-to-read Java code will appear in
> the wild, even though I believe that Java is about readability.
> 
> So I propose to enforce such (or similar) format on language level
> instead of adding a library method like "trimMargin()". The syntax
> could be formalized like this:
> 
> - Raw string starts with back-quote, ends with back-quote, as written
> in draft before
> - When line terminating sequence is encountered within a raw string,
> the '\n' character is included into the string, and the literal is
> interrupted
> - After the interruption any amount of whitespace or comment tokens
> are allowed and ignored
> - The next meaningful token must be a pipe '|'. It's a compilation
> error if EOF or any token other than a comment or whitespace
> appears before the '|'
> - After '|' the raw-string literal continues and may either end with
> back-quote or be interrupted again with the subsequent line
> terminating sequence.
> 
> Note that you don't need to specially escape the pipes within the literals.
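
The rules above can be sketched as a small decoder over the source text found between the two back-quotes. This is only an illustration of the proposal, not javac behaviour: the class PipeLiteral and the method decode are hypothetical names, and as a simplification only whitespace and whole-line // comments are skipped between fragments (block comments are not handled).

```java
/** Hypothetical illustration of the proposed pipe-continuation rules. */
final class PipeLiteral {
    private PipeLiteral() {}

    /** Turns the text between the back-quotes into the string value. */
    static String decode(String body) {
        String[] lines = body.split("\\R", -1); // split on any line terminator
        StringBuilder value = new StringBuilder(lines[0]);
        boolean interrupted = false;
        for (int i = 1; i < lines.length; i++) {
            if (!interrupted) {
                value.append('\n'); // the line terminator becomes '\n' in the value
                interrupted = true; // the literal is now waiting for a pipe
            }
            String line = lines[i];
            int pos = 0;
            while (pos < line.length() && Character.isWhitespace(line.charAt(pos))) {
                pos++;
            }
            if (pos == line.length() || line.startsWith("//", pos)) {
                continue; // whitespace and line comments before the pipe are ignored
            }
            if (line.charAt(pos) != '|') {
                throw new IllegalArgumentException(
                    "Multiline string must continue with a pipe token (line " + (i + 1) + ")");
            }
            value.append(line, pos + 1, line.length()); // everything after the pipe is content
            interrupted = false;
        }
        if (interrupted) {
            throw new IllegalArgumentException("Literal ended before a pipe token");
        }
        return value.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("<div>\n    |   Something wrong happened\n    |</div>"));
    }
}
```

Because only the first pipe after the interruption is consumed, later pipes on the same line stay in the value, which is why no escaping is needed.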
> 
> I see some advantages with such syntax:
> 1. You can comment (or comment out!) a part of multiline string
> without terminating it:
> 
> String sql = `SELECT * FROM table
>    // Negative entry ID = deleted entry
>    | WHERE entryID >= 0`;
> 
> If you want you can still make this comment a part of the query
> (assuming DBMS accepts // comments):
> 
> String sql = `SELECT * FROM table
>    | // Negative entry ID = deleted entry
>    | WHERE entryID >= 0`;
> 
> Commenting out code:
> 
> String html = `<div>
> /*  |   <span color='red'>
>    |       Error
>    |   </span>*/ // single-line comments would work as well
>    |   Something wrong happened
>    |</div>`;
> 
> 2. Looking into code fragment out of context (e.g. diff log) you
> understand that you are inside a multiline literal. E.g. consider
> reviewing a diff like
> 
>            | x++;
> +           | if (x == 10) break;
>            | foo(x);
> 
> Without pipes you could think that it's Java code without any further
> consideration. But now it's clear that it's part of a multiline string
> (probably JavaScript!), so this is not direct Java logic and you
> should check the broader context to understand what this literal is
> for.
> 
> 3. You cannot accidentally make a big part of the program part of a
> multiline raw string just by forgetting to close the back-quote. A
> compilation error will be issued right on the next line, like
> "Multiline string must continue with a pipe token", not some obscure
> message five screens below where the next raw string literal happens
> to start.
> 
> 4. IDEs will easily distinguish between in-literal indentation and
> Java indentation and may allow you to adjust one or the other
> independently.
> 
> In general this greatly increases readability, clearly telling you
> at every line that you're not in Java, but inside something nested.
> You can easily nest a Java snippet inside a Java snippet and use multiline
> raw strings inside and still not get lost!
> 
> String javaMethod = `public void dumpHtml() {
>  |  System.out.println(``<!DOCTYPE html>
>  |    |<html>
>  |    |  <body>
>  |    |    <h1>HelloWorld!</h1>
>  |    |  </body>
>  |    |</html>``);
>  |}`;
> 
> One pipe means one level inside, two pipes mean two levels inside.
> 
> 
> The only disadvantage I see in forcing a pipe prefix is the inability to
> just paste a big snippet from somewhere into the middle of a Java program
> in a plain text editor. However, any decent IDE would support automatic
> addition of pipes on paste. If not, a simple search-and-replace with a
> regex like s/^/   |/ through the pasted content will do the job. Even
> adding pipes manually is not that hard (I did this manually many times
> writing this letter).
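
For reference, the search-and-replace mentioned above can be written in Java roughly as follows. The class name PipePrefixer is hypothetical; the sketch prefixes every line, including the first, so a first line that sits right after the opening back-quote would need its pipe removed by hand.

```java
/** Hypothetical helper mirroring the s/^/   |/ replacement from the message. */
final class PipePrefixer {
    private PipePrefixer() {}

    /** Inserts "   |" at the start of every line of the pasted snippet. */
    static String addPipes(String snippet) {
        // (?m) turns on multiline mode, so ^ matches at the start of each line
        return snippet.replaceAll("(?m)^", "   |");
    }

    public static void main(String[] args) {
        System.out.println(addPipes("<body>\n  <p>Hi</p>\n</body>"));
    }
}
```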
> 
> What do you think?
> 
> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html

