[raw-strings] Indentation problem
forax at univ-mlv.fr
Mon Jan 29 11:01:58 UTC 2018
----- Original Message -----
> From: "Tagir Valeev" <amaembo at gmail.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Saturday, January 27, 2018 12:37:01
> Subject: Re: [raw-strings] Indentation problem
> Hello!
Hi !
>
>> There is a rule when you design a language: if you can do something in the
>> compiler or in a library, do it in the library :)
>
> A library can indeed allow you (to some extent) to use better syntax.
> What a library cannot do is disallow the worse syntax. And this is the
> most important part of my suggestion: to prevent people from writing
> bad code, not to allow people to write better code.
Non-indented code is not always bad code;
think of minified code, for example.
>
>> I do not think it's a good idea to force the pipe prefix in the spec,
>
> Why do you think it's not a good idea? What are possible
> disadvantages? Please share your concerns. Thanks.
Coding conventions, code formatting rules, etc. change over time as we learn as a community.
Baking this kind of consideration into a language ties it forever to the decade in which it was designed.
The fewer built-in defaults a language has, the better it looks in hindsight.
>
>> and from an IDE point of view, you have to do more analysis but you can
>> recognize the sequence ` ... `.trimMargin() in order to auto-indent things
>> correctly.
>
> True, it's possible if one uses the trimMargin() call. But if bad code is
> already written, it would not be so easy to fix it automatically. We
> could automatically add trimMargin(), but determining which part of
> the indent should be moved to the left of the pipe cannot be done with
> 100% accuracy.
I was thinking about using .trimMargin() as a hint that the user wants IDEs to re-format the code correctly (by detecting the pipe), not about forcing users to follow a specific convention.
>
> With best regards,
> Tagir Valeev.
>
regards,
Rémi
>>
>> regards,
>> Rémi
>>
>> ----- Original Message -----
>>> From: "Tagir Valeev" <amaembo at gmail.com>
>>> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
>>> Sent: Saturday, January 27, 2018 09:23:31
>>> Subject: [raw-strings] Indentation problem
>>
>>> Hello!
>>>
>>> Every language that implements multiline strings has problems
>>> with indentation. E.g. consider something like this:
>>>
>>> public class Multiline {
>>>   static String createHtml(String message) {
>>>     String html = `<html>
>>>   <head>
>>>     <title>Message</title>
>>>   </head>
>>>   <body>`;
>>>     if (message != null) {
>>>       html += `
>>>     <p>
>>>       Message: `+message+`
>>>     </p>`;
>>>     }
>>>     html += `
>>>   </body>
>>> </html>`;
>>>     return html;
>>>   }
>>> }
>>>
>>> Here the indentation of the embedded snippet breaks the indentation of
>>> the Java program, harming its readability. The overall structure of the
>>> method gets tangled with the structure of the generated HTML. This is
>>> not just bad indentation that an IDE's auto-formatting could fix: you
>>> cannot fix it without either giving up the multiline string syntax or
>>> changing the semantics. Some people sacrifice the semantics, namely the
>>> indentation of the generated output, if the output language is
>>> indentation-agnostic. HTML mostly is, unless you have a <pre> section.
>>> So one may "fix" it like this:
>>>
>>> public class Multiline {
>>>   static String createHtml(String message) {
>>>     String html = `<html>
>>>                      <head>
>>>                        <title>Message</title>
>>>                      </head>
>>>                      <body>`;
>>>     if (message != null) {
>>>       html += `
>>>                        <p>
>>>                          Message: `+message+`
>>>                        </p>`;
>>>     }
>>>     html += `
>>>                      </body>
>>>                    </html>`;
>>>     return html;
>>>   }
>>> }
>>>
>>> Now we have broken formatting in the generated HTML, which defeats the
>>> purpose of multiline strings (why bother to generate \n in the output
>>> HTML if it looks like a mess anyway?). Moreover, the structure of the
>>> Java program now affects the output. E.g. if you add several more nested
>>> "if" or "switch" statements, you will need to indent <p> even more.
>>>
>>> Many languages provide library methods to handle this. E.g.
>>> trimIndent() could be provided to remove the leading spaces of every
>>> line, but this would remove the HTML indentation entirely. Another
>>> possibility is to provide a method like trimMargin() in Kotlin [1],
>>> which trims all spaces before a special character (pipe by default),
>>> including the special character itself.
>>>
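A minimal sketch of what such a method could look like, written as a static helper because java.lang.String has no trimMargin(). The class name, the helper and its marginChar parameter are assumptions for illustration only; unlike Kotlin's version, this sketch does not drop blank first or last lines. Ordinary string literals stand in for the proposed back-quote syntax.

```java
class TrimMarginSketch {
    /**
     * Hypothetical helper: for every line, removes the leading whitespace
     * up to and including the first margin character. Lines that do not
     * start (after whitespace) with the margin character are kept as-is.
     */
    static String trimMargin(String text, char marginChar) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty lines
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int p = 0;
            // Skip the Java-side indentation.
            while (p < line.length() && Character.isWhitespace(line.charAt(p))) {
                p++;
            }
            // Drop the margin character itself if it is the first visible one.
            if (p < line.length() && line.charAt(p) == marginChar) {
                line = line.substring(p + 1);
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = trimMargin(
                "<html>\n"
              + "        |  <head>\n"
              + "        |  </head>\n"
              + "        |</html>", '|');
        System.out.println(html);
    }
}
```

The spaces to the left of the pipe belong to the Java layout and are discarded; everything to the right of it is the HTML's own indentation and survives unchanged.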
>>> Assuming such a method exists in Java, we can rewrite our method in a
>>> prettier way, preserving both Java and HTML formatting:
>>>
>>> public class Multiline {
>>>   static String createHtml(String message) {
>>>     String html = `<html>
>>>                   |  <head>
>>>                   |    <title>Message</title>
>>>                   |  </head>
>>>                   |  <body>`.trimMargin();
>>>     if (message != null) {
>>>       html += `
>>>               |    <p>
>>>               |      Message: `+message+`
>>>               |    </p>`.trimMargin();
>>>     }
>>>     html += `
>>>             |  </body>
>>>             |</html>`.trimMargin();
>>>     return html;
>>>   }
>>> }
>>>
>>> This is almost nice. Even without syntax highlighting you can easily
>>> distinguish between Java code and injected HTML code, you can indent
>>> Java and HTML independently, and the HTML code does not clash with the
>>> Java code structure. The only problem is the necessity to call the
>>> trimMargin() method. This means that the original, untrimmed string is
>>> preserved in the bytecode and at runtime, and the trimming is performed
>>> every time the method is called, costing both time and memory. This
>>> problem could be minimized by making trimMargin() a javac intrinsic.
>>> However, even in this case it would be hard to enforce usage of this
>>> method, and I expect that tons of hard-to-read Java code will appear in
>>> the wild, even though I believe that Java is about readability.
>>>
>>> So I propose to enforce such a format (or a similar one) at the language
>>> level instead of adding a library method like "trimMargin()". The syntax
>>> could be formalized like this:
>>>
>>> - A raw string starts with a back-quote and ends with a back-quote, as
>>> written in the draft before
>>> - When a line-terminating sequence is encountered within a raw string,
>>> the '\n' character is included in the string, and the literal is
>>> interrupted
>>> - After the interruption, any amount of whitespace or comment tokens
>>> is allowed and ignored
>>> - The next meaningful token must be a pipe '|'. It's a compilation
>>> error if any token other than a comment or whitespace, or EOF, appears
>>> before the '|'
>>> - After the '|', the raw-string literal continues and may either end
>>> with a back-quote or be interrupted again by the next line-terminating
>>> sequence.
>>>
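The rules above can be sketched as a small decoder. This is only an illustration under simplifying assumptions: the input starts at the opening back-quote, only single back-quote delimiters are handled, and the class and method names are hypothetical. It is not how javac would implement it.

```java
class PipeRawStringSketch {
    /** Decodes one pipe-continued raw-string literal starting at src[0]. */
    static String decode(String src) {
        if (src.isEmpty() || src.charAt(0) != '`') {
            throw new IllegalArgumentException("literal must start with a back-quote");
        }
        StringBuilder value = new StringBuilder();
        int i = 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '`') {
                return value.toString(); // closing back-quote ends the literal
            }
            if (c == '\n' || c == '\r') {
                // A line terminator contributes '\n' and interrupts the literal.
                if (c == '\r' && i + 1 < src.length() && src.charAt(i + 1) == '\n') {
                    i++; // treat \r\n as one terminator
                }
                i++;
                value.append('\n');
                i = skipToPipe(src, i) + 1; // resume right after the '|'
                continue;
            }
            value.append(c); // pipes inside a line need no escaping
            i++;
        }
        throw new IllegalArgumentException("unterminated raw string");
    }

    /** Skips whitespace and comments; returns the index of '|' or throws. */
    static int skipToPipe(String src, int i) {
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("//", i)) {
                // Single-line comment: skip to the end of the line.
                while (i < src.length() && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated comment");
                }
                i = end + 2;
            } else if (c == '|') {
                return i;
            } else {
                break; // any other token before the pipe is an error
            }
        }
        throw new IllegalArgumentException("Multiline string must continue with a pipe token");
    }

    public static void main(String[] args) {
        System.out.println(decode("`SELECT * FROM table\n   | WHERE entryID >= 0`"));
    }
}
```

Because whitespace and comments are only skipped between a line terminator and the next pipe, a comment placed before the pipe is dropped, while the same text placed after the pipe becomes part of the string value.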
>>> Note that you don't need to specially escape the pipes within the literals.
>>>
>>> I see some advantages with such syntax:
>>> 1. You can comment (or comment out!) a part of multiline string
>>> without terminating it:
>>>
>>> String sql = `SELECT * FROM table
>>> // Negative entry ID = deleted entry
>>> | WHERE entryID >= 0`;
>>>
>>> If you want you can still make this comment a part of the query
>>> (assuming DBMS accepts // comments):
>>>
>>> String sql = `SELECT * FROM table
>>> | // Negative entry ID = deleted entry
>>> | WHERE entryID >= 0`;
>>>
>>> Commenting out code:
>>>
>>> String html = `<div>
>>> /* | <span color='red'>
>>> | Error
>>> | </span>*/ // single-line comments would work as well
>>> | Something wrong happened
>>> |</div>`;
>>>
>>> 2. Looking at a code fragment out of context (e.g. in a diff log), you
>>> can tell that you are inside a multiline literal. E.g. consider
>>> reviewing a diff like
>>>
>>> | x++;
>>> + | if (x == 10) break;
>>> | foo(x);
>>>
>>> Without the pipes you could assume that this is Java code and look no
>>> further. But now it's clear that it's part of a multiline string
>>> (probably JavaScript!), so this is not direct Java logic and you
>>> should check the broader context to understand what this literal is
>>> for.
>>>
>>> 3. You cannot accidentally turn a big part of the program into a
>>> multiline raw string just by forgetting to close the back-quote. A
>>> compilation error will be issued right on the next line, like
>>> "Multiline string must continue with a pipe token", not some obscure
>>> message five screens below where the next raw string literal happens
>>> to start.
>>>
>>> 4. IDEs will easily distinguish between in-literal indentation and
>>> Java indentation and may allow you to adjust each one independently.
>>>
>>> In general, this greatly increases readability by clearly telling you
>>> on every line that you're not in Java, but inside something nested.
>>> You can easily nest a Java snippet inside a Java snippet, use multiline
>>> raw strings inside it, and still not get lost!
>>>
>>> String javaMethod = `public void dumpHtml() {
>>> | System.out.println(``<!DOCTYPE html>
>>> | |<html>
>>> | | <body>
>>> | | <h1>HelloWorld!</h1>
>>> | | </body>
>>> | |</html>``);
>>> |}`
>>>
>>> One pipe means one level inside, two pipes mean two levels inside.
>>>
>>>
>>> The only disadvantage I see in forcing a pipe prefix is the inability to
>>> just paste a big snippet from somewhere into the middle of a Java program
>>> in a plain text editor. However, any decent IDE would support automatic
>>> addition of pipes on paste. If not, a simple search-and-replace with a
>>> regex like s/^/ |/ through the pasted content will do the job. Even
>>> adding pipes manually is not that hard (I did this manually many times
>>> while writing this letter).
>>>
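The search-and-replace mentioned above can be sketched in Java with a multiline regex. The class name, the addPipes helper and its indent parameter are hypothetical; like s/^/ |/, the sketch prefixes every line, including the first one.

```java
import java.util.regex.Matcher;

class AddPipesSketch {
    /** Hypothetical helper: prefixes every line of a pasted snippet with indent + '|'. */
    static String addPipes(String pasted, String indent) {
        // (?m) switches on multiline mode so that ^ matches at the start of each line.
        // quoteReplacement keeps '$' or '\' in the indent from being read as group references.
        return pasted.replaceAll("(?m)^", Matcher.quoteReplacement(indent + "|"));
    }

    public static void main(String[] args) {
        System.out.println(addPipes("<p>\n  Hi\n</p>", "    "));
    }
}
```

The snippet's own indentation stays to the right of the pipe, so it can later be re-indented on the Java side without touching the content.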
>>> What do you think?
>>>
>>> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html