[raw-strings] Indentation problem

forax at univ-mlv.fr forax at univ-mlv.fr
Mon Jan 29 11:01:58 UTC 2018


----- Original Message -----
> From: "Tagir Valeev" <amaembo at gmail.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Saturday, January 27, 2018 12:37:01
> Subject: Re: [raw-strings] Indentation problem

> Hello!

Hi !

> 
>> There is a rule when you design a language: if you can do something in the
>> compiler or in a library, do it in the library :)
> 
> A library can indeed allow you (to some extent) to use better syntax.
> What a library cannot do is disallow the worse syntax. And this is the
> most important part of my suggestion: to prevent people from writing
> bad code, not to allow people to write better code.

Non-indented code is not always bad code;
think about minification, for example.

> 
>> I do not think it's a good idea to force the pipe prefix in the spec,
> 
> Why do you think it's not a good idea? What are possible
> disadvantages? Please share your concerns. Thanks.

Coding conventions, code formatting rules, etc. change over time as we learn as a community.
Baking this kind of consideration into a language makes the language live forever in the same decade;
the fewer defaults you bake into a language, the better it looks when you look back.

> 
>> and from an IDE point of view, you have to do more analysis but you can
>> recognize the sequence ` ... `.trimMargin() in order to auto-indent things
>> correctly.
> 
> True, it's possible if one uses a trimMargin() call. But if bad code is
> already written, it would not be so easy to fix it automatically. We
> could automatically add trimMargin(), but determining which part of
> the indent should be moved to the left of the pipe cannot be done with
> 100% accuracy.

I was thinking about using .trimMargin() as a hint that the user wants IDEs to re-format the code correctly (by detecting the pipe), not about forcing users to follow a specific convention.
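
To illustrate how little a library needs for this, a trimMargin() could be sketched roughly as below. The class name and the exact behaviour are assumptions made for this example, not a proposed API: Kotlin's version also drops a blank first and last line, which this sketch does not do.

```java
class TrimMarginSketch {
    // Hypothetical helper: for every line that starts with optional
    // whitespace followed by '|', drop that whitespace and the pipe.
    // Lines without such a margin are kept unchanged.
    static String trimMargin(String text) {
        String[] lines = text.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int j = 0;
            // Skip the leading whitespace of this line.
            while (j < line.length() && Character.isWhitespace(line.charAt(j))) {
                j++;
            }
            // Only strip when the first non-blank character is the pipe.
            if (j < line.length() && line.charAt(j) == '|') {
                line = line.substring(j + 1);
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html>\n      |  <head>\n      |  </head>\n      |</html>";
        System.out.println(trimMargin(html));
    }
}
```

An IDE that sees a literal followed by such a call could treat everything to the left of the pipe as Java indentation and everything to the right as content.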

> 
> With best regards,
> Tagir Valeev.
> 

regards,
Rémi

>>
>> regards,
>> Rémi
>>
>> ----- Original Message -----
>>> From: "Tagir Valeev" <amaembo at gmail.com>
>>> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
>>> Sent: Saturday, January 27, 2018 09:23:31
>>> Subject: [raw-strings] Indentation problem
>>
>>> Hello!
>>>
>>> Every language which implements multiline strings has problems
>>> with indentation. E.g. consider something like this:
>>>
>>> public class Multiline {
>>>  static String createHtml(String message) {
>>>    String html = `<html>
>>>  <head>
>>>    <title>Message</title>
>>>  </head>
>>>  <body>`;
>>>    if (message != null) {
>>>      html += `
>>>    <p>
>>>      Message: `+message+`
>>>    </p>`;
>>>    }
>>>    html += `
>>>  </body>
>>> </html>`;
>>>    return html;
>>>  }
>>> }
>>>
>>> Here the indentation of the embedded snippet breaks the indentation of
>>> the Java program, harming its readability. The overall structure of the
>>> method is mixed up with the generated HTML structure. This is not just
>>> bad indentation which could be fixed by the auto-formatting feature of
>>> an IDE. You cannot fix this without throwing away the multiline string
>>> syntax or changing the semantics. Some people sacrifice the
>>> semantics, namely the indentation of the generated output, if the output
>>> language is indentation-agnostic. HTML is mostly so, unless you have a
>>> <pre> section. So one may "fix" it like this:
>>>
>>> public class Multiline {
>>>  static String createHtml(String message) {
>>>    String html = `<html>
>>>        <head>
>>>          <title>Message</title>
>>>        </head>
>>>        <body>`;
>>>    if (message != null) {
>>>      html += `
>>>          <p>
>>>            Message: `+message+`
>>>          </p>`;
>>>    }
>>>    html += `
>>>        </body>
>>>      </html>`;
>>>    return html;
>>>  }
>>> }
>>>
>>> Now we have broken formatting in the generated HTML, which ruins the
>>> idea of multiline strings (why bother to generate \n in the output HTML
>>> if it looks like a mess anyway?) Moreover, the structure of the Java
>>> program now affects the output. E.g. if you add several more nested "if"
>>> or "switch" statements, you will need to indent <p> even more.
>>>
>>> Many languages provide library methods to handle this. E.g.
>>> trimIndent() could be provided to remove the leading spaces of every
>>> line, but this would kill the HTML indents altogether. Another
>>> possibility is to provide a method like trimMargin() in Kotlin [1], which
>>> trims all spaces before a special character (pipe by default), including
>>> the special character itself.
>>>
>>> Assuming such method exists in Java, we can rewrite our method in a
>>> prettier way preserving both Java and HTML formatting:
>>>
>>> public class Multiline {
>>>  static String createHtml(String message) {
>>>    String html = `<html>
>>>      |  <head>
>>>      |    <title>Message</title>
>>>      |  </head>
>>>      |  <body>`.trimMargin();
>>>    if (message != null) {
>>>      html += `
>>>        |    <p>
>>>        |      Message: `+message+`
>>>        |    </p>`.trimMargin();
>>>    }
>>>    html += `
>>>      |  </body>
>>>      |</html>`.trimMargin();
>>>    return html;
>>>  }
>>> }
>>>
>>> This is almost nice. Even without syntax highlighting you can easily
>>> distinguish between Java code and injected HTML code, you can indent
>>> Java and HTML independently, and HTML code does not clash with the Java
>>> code structure. The only problem is the necessity to call the
>>> trimMargin() method. This means that the original string is preserved in
>>> the bytecode and at runtime, and the trimming is performed every time
>>> the method is called, costing both performance and memory. This
>>> problem could be minimized by making trimMargin() a javac intrinsic.
>>> However, even in this case it would be hard to enforce usage of this
>>> method, and I expect that tons of hard-to-read Java code will appear in
>>> the wild, even though I believe that Java is about readability.
>>>
>>> So I propose to enforce such a (or a similar) format at the language
>>> level instead of adding a library method like "trimMargin()". The syntax
>>> could be formalized like this:
>>>
>>> - A raw string starts with a back-quote and ends with a back-quote, as
>>> written in the draft before
>>> - When a line terminating sequence is encountered within a raw string,
>>> the '\n' character is included into the string, and the literal is
>>> interrupted
>>> - After the interruption any amount of whitespace or comment tokens
>>> is allowed and ignored
>>> - The next meaningful token must be a pipe '|'. It's a compilation
>>> error if any other token or EOF appears before '|', except comments or
>>> whitespace
>>> - After '|' the raw-string literal continues and may either end with a
>>> back-quote or be interrupted again by the subsequent line
>>> terminating sequence.
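
The rules above can be read as a small scanner. The sketch below is only an illustration of the proposal, not javac code: the class and method names are hypothetical, the input is assumed to start just after the opening back-quote, and only single back-quote delimiters are handled.

```java
class PipeRawStringSketch {
    private static final String PIPE_ERROR =
            "Multiline string must continue with a pipe token";

    // Returns the value of a raw string whose source text starts right
    // after the opening back-quote (hypothetical entry point).
    static String decode(String src) {
        StringBuilder value = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '`') {
                return value.toString(); // closing back-quote
            }
            if (c == '\n' || c == '\r') {
                // A line terminator contributes '\n' and interrupts the literal.
                if (c == '\r' && i + 1 < src.length() && src.charAt(i + 1) == '\n') {
                    i++;
                }
                value.append('\n');
                i = skipToPipe(src, i + 1);
                continue;
            }
            value.append(c);
            i++;
        }
        throw new IllegalArgumentException("Unterminated raw string");
    }

    // Skips whitespace and comments, then requires a '|'.
    // Returns the index of the first character after the pipe.
    private static int skipToPipe(String src, int i) {
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("//", i)) {
                // Single-line comment: skip to the end of the line.
                while (i < src.length() && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated comment");
                }
                i = end + 2;
            } else if (c == '|') {
                return i + 1;
            } else {
                throw new IllegalArgumentException(PIPE_ERROR);
            }
        }
        throw new IllegalArgumentException(PIPE_ERROR);
    }
}
```

Because the check happens right after each line terminator, a forgotten closing back-quote is reported on the very next line rather than at some later literal.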
>>>
>>> Note that you don't need to escape the pipes within the literals.
>>>
>>> I see some advantages with such syntax:
>>> 1. You can comment (or comment out!) a part of multiline string
>>> without terminating it:
>>>
>>> String sql = `SELECT * FROM table
>>>    // Negative entry ID = deleted entry
>>>    | WHERE entryID >= 0`;
>>>
>>> If you want you can still make this comment a part of the query
>>> (assuming DBMS accepts // comments):
>>>
>>> String sql = `SELECT * FROM table
>>>    | // Negative entry ID = deleted entry
>>>    | WHERE entryID >= 0`;
>>>
>>> Commenting out code:
>>>
>>> String html = `<div>
>>> /*  |   <span color='red'>
>>>    |       Error
>>>    |   </span>*/ // single-line comments would work as well
>>>    |   Something wrong happened
>>>    |</div>`;
>>>
>>> 2. Looking at a code fragment out of context (e.g. in a diff log), you
>>> understand that you are inside a multiline literal. E.g. consider
>>> reviewing a diff like
>>>
>>>            | x++;
>>> +           | if (x == 10) break;
>>>            | foo(x);
>>>
>>> Without pipes you could think that it's Java code without any further
>>> consideration. But now it's clear that it's part of a multiline string
>>> (probably JavaScript!), so this is not direct Java logic and you
>>> should check the broader context to understand what this literal is
>>> for.
>>>
>>> 3. You cannot accidentally make a big part of the program a part of a
>>> multiline raw string just by forgetting to close the back-quote. A
>>> compilation error will be issued right on the next line, like
>>> "Multiline string must continue with a pipe token", not some obscure
>>> message five screens below where the next raw string literal happens
>>> to start.
>>>
>>> 4. IDEs will easily distinguish between in-literal indentation and
>>> Java indentation and may allow you to adjust one or the other
>>> independently.
>>>
>>> In general this greatly increases readability by clearly telling you
>>> at every line that you're not in Java, but inside something nested.
>>> You can easily nest a Java snippet inside a Java snippet, use multiline
>>> raw strings inside it, and still not get lost!
>>>
>>> String javaMethod = `public void dumpHtml() {
>>>  |  System.out.println(``<!DOCTYPE html>
>>>  |    |<html>
>>>  |    |  <body>
>>>  |    |    <h1>HelloWorld!</h1>
>>>  |    |  </body>
>>>  |    |</html>``);
>>>  |}`;
>>>
>>> One pipe means one level inside, two pipes mean two levels inside.
>>>
>>>
>>> The only disadvantage I see in forcing a pipe prefix is the inability to
>>> just paste a big snippet from somewhere into the middle of a Java program
>>> in a plain text editor. However, any decent IDE would support automatic
>>> addition of pipes on paste. If not, a simple search-and-replace with a
>>> regex like s/^/   |/ through the pasted content will do the thing. Even
>>> adding pipes manually is not that hard (I did this manually many times
>>> while writing this letter).
>>>
>>> What do you think?
>>>
>>> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html

