[raw-strings] Indentation problem
Remi Forax
forax at univ-mlv.fr
Sat Jan 27 11:03:40 UTC 2018
There is a rule when you design a language: if you can do something either in the compiler or in a library, do it in the library :)
I do not think it's a good idea to force the pipe prefix in the spec. From an IDE point of view, you have to do more analysis, but you can recognize the sequence ` ... `.trimMargin() in order to auto-indent things correctly.
regards,
Rémi
----- Original Message -----
> From: "Tagir Valeev" <amaembo at gmail.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Saturday, January 27, 2018 09:23:31
> Subject: [raw-strings] Indentation problem
> Hello!
>
> Every language that implements multiline strings has problems
> with indentation. E.g. consider something like this:
>
> public class Multiline {
>   static String createHtml(String message) {
>     String html = `<html>
>   <head>
>     <title>Message</title>
>   </head>
>   <body>`;
>     if (message != null) {
>       html += `
>     <p>
>       Message: `+message+`
>     </p>`;
>     }
>     html += `
>   </body>
> </html>`;
>     return html;
>   }
> }
>
> Here the indentation of the embedded snippet breaks the indentation
> of the Java program, harming its readability. The overall structure
> of the method is mixed up with the structure of the generated HTML.
> This is not just bad indentation that could be fixed by the
> auto-formatting feature of an IDE. You cannot fix it without giving
> up the multiline string syntax or changing the semantics. Some people
> sacrifice the semantics, namely the indentation of the generated
> output, if the output language is indentation agnostic. HTML mostly
> is, unless you have a <pre> section. So one may "fix" it like this:
>
> public class Multiline {
>   static String createHtml(String message) {
>     String html = `<html>
>                    <head>
>                      <title>Message</title>
>                    </head>
>                    <body>`;
>     if (message != null) {
>       html += `
>                <p>
>                  Message: `+message+`
>                </p>`;
>     }
>     html += `
>              </body>
>              </html>`;
>     return html;
>   }
> }
>
> Now we have broken formatting in the generated HTML, which ruins the
> idea of multiline strings (why bother to generate \n in the output
> HTML if it looks like a mess anyway?). Moreover, the structure of the
> Java program now affects the output. E.g. if you add several more
> nested "if" or "switch" statements, you will need to indent <p> even
> more.
>
> Many languages provide library methods to handle this. E.g.
> trimIndent() could be provided to remove the leading spaces of every
> line, but this would remove the HTML indentation entirely. Another
> possibility is to provide a method like trimMargin() in Kotlin [1],
> which trims all whitespace before a special character (pipe by
> default), including the special character itself.
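
For illustration, here is a minimal sketch of what such a library method might look like in Java. The class name MarginSketch and the trimMargin(String) helper are hypothetical (the JDK has no such method). Unlike Kotlin's version, this sketch does not drop blank first and last lines and supports only '|' as the margin character.

```java
// Hypothetical helper, loosely modeled on Kotlin's trimMargin(); not a JDK API.
final class MarginSketch {
    private MarginSketch() {}

    /**
     * For every line, removes leading whitespace up to and including the
     * first '|'. Lines that do not start with whitespace + '|' are kept as-is.
     */
    static String trimMargin(String text) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty lines
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(stripMargin(lines[i]));
        }
        return out.toString();
    }

    private static String stripMargin(String line) {
        int k = 0;
        while (k < line.length() && Character.isWhitespace(line.charAt(k))) {
            k++;
        }
        if (k < line.length() && line.charAt(k) == '|') {
            return line.substring(k + 1); // drop the margin and the pipe itself
        }
        return line; // no margin prefix on this line
    }

    public static void main(String[] args) {
        String html = "<html>\n"
                + "        |  <head>\n"
                + "        |  </head>\n"
                + "        |</html>";
        System.out.println(trimMargin(html));
    }
}
```

Whitespace after the pipe is preserved, so the HTML keeps its own indentation regardless of how deeply the Java code is nested.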
>
> Assuming such method exists in Java, we can rewrite our method in a
> prettier way preserving both Java and HTML formatting:
>
> public class Multiline {
>   static String createHtml(String message) {
>     String html = `<html>
>                   |  <head>
>                   |    <title>Message</title>
>                   |  </head>
>                   |  <body>`.trimMargin();
>     if (message != null) {
>       html += `
>               |    <p>
>               |      Message: `+message+`
>               |    </p>`.trimMargin();
>     }
>     html += `
>             |  </body>
>             |</html>`.trimMargin();
>     return html;
>   }
> }
>
> This is almost nice. Even without syntax highlighting you can easily
> distinguish between Java code and injected HTML code, you can indent
> Java and HTML independently, and the HTML code does not clash with
> the Java code structure. The only problem is the necessity to call
> the trimMargin() method. This means that the original string is
> preserved in the bytecode and at runtime, and the trimming is
> performed every time the method is called, costing both time and
> memory. This problem could be minimized by making trimMargin() a
> javac intrinsic. However, even in this case it would be hard to
> enforce usage of this method, and I expect that tons of hard-to-read
> Java code will appear in the wild, even though I believe that Java
> is about readability.
>
> So I propose to enforce such (or a similar) format at the language
> level instead of adding a library method like "trimMargin()". The
> syntax could be formalized like this:
>
> - A raw string starts with a back-quote and ends with a back-quote, as
> in the current draft
> - When a line terminator is encountered within a raw string, the '\n'
> character is included in the string, and the literal is interrupted
> - After the interruption, any number of whitespace or comment tokens
> is allowed and ignored
> - The next meaningful token must be a pipe '|'. It's a compilation
> error if any other token or EOF appears before the '|', except
> comments or whitespace
> - After the '|', the raw-string literal continues and may either end
> with a back-quote or be interrupted again by a subsequent line
> terminator.
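
The rules above can be sketched as a small hand-written scanner. This is only an illustration under stated assumptions, not javac code: the class PipeLiteralSketch and its read method are hypothetical, only single back-quote delimiters are handled (no `` ... `` nesting), and errors are reported with IllegalArgumentException instead of compiler diagnostics.

```java
// Hypothetical sketch of the proposed lexing rules; names are illustrative only.
final class PipeLiteralSketch {
    private PipeLiteralSketch() {}

    /** Reads the raw string literal whose opening back-quote is at src[start]. */
    static String read(String src, int start) {
        if (src.charAt(start) != '`') {
            throw new IllegalArgumentException("expected opening back-quote");
        }
        StringBuilder value = new StringBuilder();
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '`') {
                return value.toString(); // closing back-quote ends the literal
            }
            if (c == '\n' || c == '\r') {
                // Treat \r\n as a single line terminator.
                if (c == '\r' && i + 1 < src.length() && src.charAt(i + 1) == '\n') {
                    i++;
                }
                value.append('\n');              // the terminator becomes '\n'
                i = skipToPipe(src, i + 1) + 1;  // literal resumes after the '|'
                continue;
            }
            value.append(c);
            i++;
        }
        throw new IllegalArgumentException("unterminated raw string");
    }

    /** Skips whitespace and comments; returns the index of the next '|'. */
    private static int skipToPipe(String src, int i) {
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("//", i)) {
                int eol = src.indexOf('\n', i);
                i = (eol < 0) ? src.length() : eol + 1;
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated comment");
                }
                i = end + 2;
            } else if (c == '|') {
                return i;
            } else {
                break; // any other token before the pipe is an error
            }
        }
        throw new IllegalArgumentException(
                "Multiline string must continue with a pipe token");
    }
}
```

Because the literal is interrupted at each line terminator, whitespace and comments before the pipe never reach the string value, and a forgotten closing back-quote fails on the very next line.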
>
> Note that you don't need to escape pipes within the literals in any special way.
>
> I see some advantages with such syntax:
> 1. You can comment (or comment out!) a part of multiline string
> without terminating it:
>
> String sql = `SELECT * FROM table
>     // Negative entry ID = deleted entry
>              | WHERE entryID >= 0`;
>
> If you want you can still make this comment a part of the query
> (assuming DBMS accepts // comments):
>
> String sql = `SELECT * FROM table
>              | // Negative entry ID = deleted entry
>              | WHERE entryID >= 0`;
>
> Commenting out code:
>
> String html = `<div>
>            /* |  <span color='red'>
>               |    Error
>               |  </span>*/ // single-line comments would work as well
>               |  Something wrong happened
>               |</div>`;
>
> 2. Looking at a code fragment out of context (e.g. in a diff log), you
> can tell that you are inside a multiline literal. E.g. consider
> reviewing a diff like
>
> | x++;
> + | if (x == 10) break;
> | foo(x);
>
> Without pipes you could think that it's Java code without any further
> consideration. But now it's clear that it's part of a multiline string
> (probably JavaScript!), so this is not direct Java logic and you
> should check the broader context to understand what this literal is
> for.
>
> 3. You cannot accidentally make a big part of the program a part of a
> multiline raw string just by forgetting to close the back-quote. A
> compilation error will be issued right on the next line, like
> "Multiline string must continue with a pipe token", not some obscure
> message five screens below where the next raw string literal happens
> to start.
>
> 4. IDEs will easily distinguish between in-literal indentation and
> Java indentation and may allow you to adjust each independently.
>
> In general this greatly increases readability, clearly telling you
> at every line that you're not in Java, but inside something nested.
> You can easily nest a Java snippet inside a Java snippet, use
> multiline raw strings inside, and still not get lost!
>
> String javaMethod = `public void dumpHtml() {
>                     |  System.out.println(``<!DOCTYPE html>
>                     |                     |<html>
>                     |                     |  <body>
>                     |                     |    <h1>HelloWorld!</h1>
>                     |                     |  </body>
>                     |                     |</html>``);
>                     |}`;
>
> One pipe means one level inside, two pipes mean two levels inside.
>
>
> The only disadvantage I see in forcing a pipe prefix is the inability
> to just paste a big snippet from somewhere into the middle of a Java
> program in a plain text editor. However, any decent IDE would support
> automatic addition of pipes on paste. If not, a simple
> search-and-replace with a regex like s/^/ |/ through the pasted
> content will do the job. Even adding pipes manually is not that hard
> (I did this manually many times while writing this letter).
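
As a rough illustration of that search-and-replace, the same substitution can be written with java.util.regex. The class PipePrefixSketch and the method addMargin are hypothetical names, and the indent is passed in as an assumption about where the pipes should line up. Like s/^/ |/, this prefixes every line, including the first.

```java
import java.util.regex.Matcher;

// Hypothetical helper mirroring the s/^/ |/ substitution; not an existing tool.
final class PipePrefixSketch {
    private PipePrefixSketch() {}

    /** Prefixes every line of the pasted text with the given indent and a pipe. */
    static String addMargin(String pasted, String indent) {
        // (?m) makes ^ match at the start of every line, not just the input.
        return pasted.replaceAll("(?m)^", Matcher.quoteReplacement(indent + "|"));
    }

    public static void main(String[] args) {
        System.out.println(addMargin("<p>\n  Hello\n</p>", "    "));
    }
}
```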
>
> What do you think?
>
> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html