[raw-strings] Indentation problem

Tagir Valeev amaembo at gmail.com
Sat Jan 27 11:37:01 UTC 2018


Hello!

> There is a rule when you design a language: if you can do something in the compiler or in a library, do it in the library :)

A library can indeed allow you (to some extent) to use better syntax.
What a library cannot do is disallow the worse syntax. And this is the
most important part of my suggestion: to prevent people from writing
bad code, not to allow people to write better code.

> I do not think it's a good idea to force the pipe prefix in the spec,

Why do you think it's not a good idea? What are possible
disadvantages? Please share your concerns. Thanks.

> and from an IDE point of view, you have to do more analysis but you can recognize the sequence ` ... `.trimMargin() in order to auto-indent things correctly.

True, it's possible if one uses a trimMargin() call. But if bad code is
already written, it would not be so easy to fix it automatically. We
could automatically add trimMargin(), but determining which part of
the indent should be moved to the left of the pipe cannot be done with
100% accuracy.
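
To illustrate why such an automatic fix is only a guess, here is a minimal sketch of one possible heuristic: treat the smallest indentation among the continuation lines as the Java-side margin. The class and method names are hypothetical and the heuristic itself is an assumption made for illustration, not something an existing IDE is known to implement.

```java
class MarginGuessSketch {
    // Hypothetical heuristic: guess how many leading spaces belong to the
    // Java code (left of the future pipe) rather than to the literal's content.
    static int guessMargin(String literal) {
        String[] lines = literal.split("\n", -1);
        int min = Integer.MAX_VALUE;
        // Line 0 follows the opening back-quote, so it carries no margin.
        for (int k = 1; k < lines.length; k++) {
            String line = lines[k];
            if (line.trim().isEmpty()) {
                continue; // blank lines say nothing about the margin
            }
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') {
                indent++;
            }
            min = Math.min(min, indent);
        }
        return min == Integer.MAX_VALUE ? 0 : min;
    }

    public static void main(String[] args) {
        String literal = "<html>\n        <head>\n          <title>Message</title>";
        System.out.println(guessMargin(literal));
    }
}
```

For a literal whose continuation lines are all indented by at least eight spaces, this guess puts all eight to the left of the pipe, so <head> ends up with no indentation in the output even if the author wanted two spaces there. The tool cannot tell the two intentions apart.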

With best regards,
Tagir Valeev.

>
> regards,
> Rémi
>
> ----- Original Message -----
>> From: "Tagir Valeev" <amaembo at gmail.com>
>> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
>> Sent: Saturday, 27 January 2018 09:23:31
>> Subject: [raw-strings] Indentation problem
>
>> Hello!
>>
>> Every language that implements multiline strings has problems
>> with indentation. E.g. consider something like this:
>>
>> public class Multiline {
>>  static String createHtml(String message) {
>>    String html = `<html>
>>  <head>
>>    <title>Message</title>
>>  </head>
>>  <body>`;
>>    if (message != null) {
>>      html += `
>>    <p>
>>      Message: `+message+`
>>    </p>`;
>>    }
>>    html += `
>>  </body>
>> </html>`;
>>    return html;
>>  }
>> }
>>
>> Here the indentation of the embedded snippet breaks the indentation of
>> the Java program, harming its readability. The overall structure of the
>> method is tangled up with the generated HTML structure. This is not just
>> bad indentation which could be fixed by the auto-formatting feature of
>> an IDE. You cannot fix this without either throwing away the multiline
>> string syntax or changing the semantics. Some people sacrifice the
>> semantics, namely the indentation of the generated output, if the output
>> language is indentation-agnostic. HTML is mostly so, unless you have a
>> <pre> section. So one may "fix" it like this:
>>
>> public class Multiline {
>>  static String createHtml(String message) {
>>    String html = `<html>
>>        <head>
>>          <title>Message</title>
>>        </head>
>>        <body>`;
>>    if (message != null) {
>>      html += `
>>          <p>
>>            Message: `+message+`
>>          </p>`;
>>    }
>>    html += `
>>        </body>
>>      </html>`;
>>    return html;
>>  }
>> }
>>
>> Now we have broken formatting in the generated HTML, which ruins the
>> idea of multiline strings (why bother to generate \n in output HTML if
>> it looks like a mess anyway?). Moreover, the structure of the Java
>> program now affects the output. E.g. if you add several more nested "if"
>> or "switch" statements, you will need to indent <p> even more.
>>
>> Many languages provide library methods to handle this. E.g.
>> trimIndent() could be provided to remove leading spaces of every line,
>> but this would remove the HTML indentation entirely. Another possibility
>> is to provide a method like trimMargin() in Kotlin [1], which trims all
>> spaces before a special character (pipe by default), including the
>> special character itself.
>>
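A rough sketch of what such a method might look like in Java is given here for reference. It is a simplified helper loosely modelled on the Kotlin behaviour described above: the name TrimMarginSketch.trimMargin is hypothetical (it is not a JDK method), the margin character is fixed to the pipe, and lines without a pipe prefix are left untouched.

```java
import java.util.ArrayList;
import java.util.List;

class TrimMarginSketch {
    // Hypothetical helper, not part of the JDK: for every line whose first
    // non-blank character is '|', drop the leading whitespace and the pipe.
    static String trimMargin(String text) {
        List<String> result = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            int i = 0;
            while (i < line.length()
                    && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
                i++;
            }
            if (i < line.length() && line.charAt(i) == '|') {
                result.add(line.substring(i + 1)); // keep what follows the pipe
            } else {
                result.add(line); // no margin marker: keep the line as-is
            }
        }
        return String.join("\n", result);
    }

    public static void main(String[] args) {
        String raw = "<html>\n      |  <head>\n      |  </head>\n      |</html>";
        System.out.println(trimMargin(raw));
    }
}
```

Everything to the left of the pipe belongs to the Java layout, and everything to the right belongs to the string content, which is what lets the two indentations vary independently.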
>> Assuming such a method exists in Java, we can rewrite our method in a
>> prettier way, preserving both Java and HTML formatting:
>>
>> public class Multiline {
>>  static String createHtml(String message) {
>>    String html = `<html>
>>      |  <head>
>>      |    <title>Message</title>
>>      |  </head>
>>      |  <body>`.trimMargin();
>>    if (message != null) {
>>      html += `
>>        |    <p>
>>        |      Message: `+message+`
>>        |    </p>`.trimMargin();
>>    }
>>    html += `
>>      |  </body>
>>      |</html>`.trimMargin();
>>    return html;
>>  }
>> }
>>
>> This is almost nice. Even without syntax highlighting you can easily
>> distinguish between Java code and injected HTML code, you can indent
>> Java and HTML independently, and the HTML code does not clash with the
>> Java code structure. The only problem is the necessity to call the
>> trimMargin() method. This means that the original string is preserved in
>> the bytecode and at runtime, and the trimming is performed every time
>> the method is called, incurring a performance and memory cost. This
>> problem could be minimized by making trimMargin() a javac intrinsic.
>> However, even in this case it would be hard to enforce usage of this
>> method, and I expect that tons of hard-to-read Java code will appear in
>> the wild, even though I believe that Java is about readability.
>>
>> So I propose to enforce such a (or a similar) format at the language
>> level instead of adding a library method like "trimMargin()". The syntax
>> could be formalized like this:
>>
>> - A raw string starts with a back-quote and ends with a back-quote, as
>> written in the draft
>> - When a line-terminating sequence is encountered within a raw string,
>> the '\n' character is included in the string, and the literal is
>> interrupted
>> - After the interruption, any amount of whitespace or comment tokens
>> is allowed and ignored
>> - The next meaningful token must be a pipe '|'. It's a compilation
>> error if any other token or EOF appears before '|', except comments or
>> whitespace
>> - After '|' the raw-string literal continues and may either end with a
>> back-quote or be interrupted again by the subsequent line-terminating
>> sequence.
>>
>> Note that you don't need to escape the pipes within the literals.
>>
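The continuation rules listed above can be sketched as a small decoder over the text between the opening and closing back-quotes. This is only an illustration under stated assumptions: a real implementation would live in the compiler's lexer, the class and method names are hypothetical, and an IllegalArgumentException stands in for the compilation error.

```java
class PipeContinuationSketch {
    // Decodes the body of a raw string (the text between the back-quotes)
    // according to the proposed rules. Hypothetical helper, not javac code.
    static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = body.length();
        while (i < n) {
            char c = body.charAt(i);
            if (c != '\n' && c != '\r') {
                out.append(c); // ordinary content, pipes included
                i++;
                continue;
            }
            // Line terminator: treat \r\n as one, emit '\n', interrupt literal.
            if (c == '\r' && i + 1 < n && body.charAt(i + 1) == '\n') {
                i++;
            }
            i++;
            out.append('\n');
            i = skipWhitespaceAndComments(body, i);
            if (i >= n || body.charAt(i) != '|') {
                throw new IllegalArgumentException(
                        "Multiline string must continue with a pipe token");
            }
            i++; // consume the pipe; the literal resumes right after it
        }
        return out.toString();
    }

    // Skips whitespace, // comments and /* */ comments starting at index i.
    static int skipWhitespaceAndComments(String s, int i) {
        int n = s.length();
        while (i < n) {
            if (Character.isWhitespace(s.charAt(i))) {
                i++;
            } else if (s.startsWith("//", i)) {
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') {
                    i++;
                }
            } else if (s.startsWith("/*", i)) {
                int end = s.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated comment");
                }
                i = end + 2;
            } else {
                break;
            }
        }
        return i;
    }

    public static void main(String[] args) {
        String body = "SELECT * FROM table\n   // deleted entries are negative\n   | WHERE entryID >= 0";
        System.out.println(decode(body));
    }
}
```

Only the first pipe after an interruption is consumed, so later pipes on the same line stay in the string, which is why no escaping is needed.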
>> I see some advantages with such syntax:
>> 1. You can comment (or comment out!) a part of multiline string
>> without terminating it:
>>
>> String sql = `SELECT * FROM table
>>    // Negative entry ID = deleted entry
>>    | WHERE entryID >= 0`;
>>
>> If you want, you can still make this comment a part of the query
>> (assuming the DBMS accepts // comments):
>>
>> String sql = `SELECT * FROM table
>>    | // Negative entry ID = deleted entry
>>    | WHERE entryID >= 0`;
>>
>> Commenting out code:
>>
>> String html = `<div>
>> /*  |   <span color='red'>
>>    |       Error
>>    |   </span>*/ // single-line comments would work as well
>>    |   Something wrong happened
>>    |</div>`;
>>
>> 2. Looking at a code fragment out of context (e.g. in a diff log), you
>> can tell that you are inside a multiline literal. E.g. consider
>> reviewing a diff like
>>
>>            | x++;
>> +           | if (x == 10) break;
>>            | foo(x);
>>
>> Without pipes you could think that it's Java code without any further
>> consideration. But now it's clear that it's part of a multiline string
>> (probably JavaScript!), so this is not direct Java logic and you
>> should check the broader context to understand what this literal is
>> for.
>>
>> 3. You cannot accidentally make a big part of the program a part of a
>> multiline raw string just by forgetting to close the back-quote. A
>> compilation error will be issued right on the next line, like
>> "Multiline string must continue with a pipe token", not some obscure
>> message five screens below where the next raw string literal happens
>> to start.
>>
>> 4. IDEs will easily distinguish between in-literal indentation and
>> Java indentation and may allow you to adjust either one
>> independently.
>>
>> In general this greatly increases readability by clearly telling you
>> at every line that you're not in Java, but inside something nested.
>> You can easily nest a Java snippet inside a Java snippet and use
>> multiline raw strings inside and still not get lost!
>>
>> String javaMethod = `public void dumpHtml() {
>>  |  System.out.println(``<!DOCTYPE html>
>>  |    |<html>
>>  |    |  <body>
>>  |    |    <h1>HelloWorld!</h1>
>>  |    |  </body>
>>  |    |</html>``);
>>  |}`;
>>
>> One pipe means one level inside, two pipes mean two levels inside.
>>
>>
>> The only disadvantage I see in forcing a pipe prefix is the inability to
>> just paste a big snippet from somewhere into the middle of a Java program
>> in a plain text editor. However, any decent IDE would support automatic
>> addition of pipes on paste. If not, a simple search-and-replace with a
>> regex like s/^/   |/ through the pasted content will do the job. Even
>> adding pipes manually is not that hard (I did this manually many times
>> while writing this letter).
>>
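For illustration, the search-and-replace mentioned above could be expressed in Java roughly as follows. The class and method names are hypothetical. The sketch prefixes every line, including the first, exactly as the s/^/   |/ regex would, so the first line may still need manual adjustment where it directly follows the opening back-quote.

```java
class AddPipesSketch {
    // Hypothetical helper: prefix each line of a pasted snippet with "   |".
    // (?m) enables multiline mode so that ^ matches at the start of each line.
    static String addPipes(String pasted) {
        return pasted.replaceAll("(?m)^", "   |");
    }

    public static void main(String[] args) {
        System.out.println(addPipes("<div>\n  <span>Error</span>\n</div>"));
    }
}
```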
>> What do you think?
>>
>> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html

