From forax at univ-mlv.fr  Tue Jan  2 12:35:15 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 2 Jan 2018 13:35:15 +0100 (CET)
Subject: Switch translation, part 2
In-Reply-To: <73805383-7ae7-fc47-1a21-f7116b78248a@oracle.com>
References: <73805383-7ae7-fc47-1a21-f7116b78248a@oracle.com>
Message-ID: <1651774490.240302.1514896515983.JavaMail.zimbra@u-pem.fr>

Hi all, 
while the proposed translation is a good default when a switch can have fallthroughs or guards, it's not the best translation when it has neither. 

[CC John Rose because I may say something stupid] 

The problem is that the VM does not prune never-called cases in a switch, while it does so for the branches of an if, so an if ... else chain can be faster than a switch when only some of the cases are exercised at runtime. 
Also note that, if you look at the generated assembly code, a lot of tableswitches end up being compiled by the VM as if ... else chains, because jumping to a computed address is not free. 
So for an expression switch with no guards, it seems a good idea to generate just an indy rather than an indy + a tableswitch. 

So instead of lowering: 

switch (x) { 
case T t: A 
case U u: B 
case V v: C 
} 

to 

int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x) 
switch (y) { 
case 0: A 
case 1: B 
case 2: C 
} 

I propose lowering it to something similar to the lambda translation: 
var expr = indy[bootstrap=exprTypeSwich(1, T.class, U.class, V.class)](x); 
and letting the bootstrap do all the heavy lifting. 
The first bootstrap argument is the number of the switch in the code, here 1. 

With A, B and C being desugared as static methods named switch$1case$0, switch$1case$1 and switch$1case$2 respectively (i.e. "switch$" + switchNumber + "case$" + caseNumber). 

The bootstrap method exprTypeSwich can work like an inlining cache if the number of branches actually visited is small. This is interesting because the performance of an expression switch will then be in the same ballpark as the performance of the corresponding virtual call. If too many branches are used, it can revert to a new method handle combinator that does a tableswitch*. 
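
To make the inlining-cache idea a bit more concrete, here is a minimal sketch built on the existing java.lang.invoke API (MutableCallSite and MethodHandles.guardWithTest). The class name ExprSwitchCache, the MAX_DEPTH threshold and the case bodies caseString/caseInteger are all made up for illustration; a real bootstrap would receive the desugared case methods instead, and once the cache is full it would switch to a tableswitch-like combinator rather than the plain linear search used here. The sketch is also not thread-safe.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Sketch of a call site that caches the case bodies for the classes it has seen.
public class ExprSwitchCache extends MutableCallSite {
    private static final int MAX_DEPTH = 4; // arbitrary threshold (assumption)
    private static final MethodType TYPE =
            MethodType.methodType(Object.class, Object.class);
    private static final MethodHandle FALLBACK, IS_EXACTLY;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            FALLBACK = l.findVirtual(ExprSwitchCache.class, "fallback", TYPE);
            IS_EXACTLY = l.findStatic(ExprSwitchCache.class, "isExactly",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final Class<?>[] types;      // one entry per case, in source order
    private final MethodHandle[] bodies; // one (Object)Object handle per case
    private int depth;                   // number of guards installed so far

    public ExprSwitchCache(Class<?>[] types, MethodHandle[] bodies) {
        super(TYPE);
        this.types = types;
        this.bodies = bodies;
        setTarget(FALLBACK.bindTo(this)); // start with an empty cache
    }

    static boolean isExactly(Class<?> seen, Object x) {
        return x != null && x.getClass() == seen;
    }

    // Slow path: linear search, then remember the answer for this exact class.
    Object fallback(Object x) throws Throwable {
        for (int i = 0; i < types.length; i++) {
            if (types[i].isInstance(x)) {
                MethodHandle body = bodies[i].asType(TYPE);
                if (depth < MAX_DEPTH) {
                    depth++;
                    MethodHandle test = IS_EXACTLY.bindTo(x.getClass());
                    // New guard in front, previous target as its fallback.
                    setTarget(MethodHandles.guardWithTest(test, body, getTarget()));
                }
                return body.invoke(x);
            }
        }
        throw new IllegalArgumentException("no case matches " + x);
    }

    // Hypothetical case bodies, standing in for the desugared static methods.
    static Object caseString(Object x) { return "string"; }
    static Object caseInteger(Object x) { return "integer"; }

    // Builds a two-case switch and runs it on x twice (slow path, then cached path).
    public static Object demo(Object x) {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle[] bodies = {
                l.findStatic(ExprSwitchCache.class, "caseString", TYPE),
                l.findStatic(ExprSwitchCache.class, "caseInteger", TYPE) };
            MethodHandle sw = new ExprSwitchCache(
                    new Class<?>[] { String.class, Integer.class }, bodies).dynamicInvoker();
            sw.invoke(x);
            return sw.invoke(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The second call in demo() goes through the installed guard and never reaches the linear search, which is the part the JIT can inline like a monomorphic virtual call.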

cheers, 
Rémi 

* The JSR 292 expert group has talked several times about introducing such a method handle combinator. 

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Monday, 11 December 2017 22:37:57
> Subject: Switch translation, part 2

> # Switch Translation, Part 2 -- type test patterns and guards
> #### Maurizio Cimadamore and Brian Goetz
> #### December 2017

> This document examines possible translation of `switch` constructs involving
> `case` labels that include type-test patterns, potentially with guards. Part 3
> will address translation of destructuring patterns, nested patterns, and OR
> patterns.

> ## Type-test patterns

> Type-test patterns are notable because their applicability predicate is purely
> based on the type system, meaning that the compiler can directly reason about
> it both statically (using flow analysis, optimizing away dynamic type tests)
> and dynamically (with `instanceof`.) A switch involving type-tests:

> switch (x) {
> case String s: ...
> case Integer i: ...
> case Long l: ...
> }

> can (among other strategies) be translated into a chain of `if-else` using
> `instanceof` and casts:

> if (x instanceof String) { String s = (String) x; ... }
> else if (x instanceof Integer) { Integer i = (Integer) x; ... }
> else if (x instanceof Long) { Long l = (Long) x; ... }

> #### Guards

> The `if-else` desugaring can also naturally handle guards:

> switch (x) {
> case String s
> where (s.length() > 0): ...
> case Integer i
> where (i > 0): ...
> case Long l
> where (l > 0L): ...
> }

> can be translated to:

> if (x instanceof String
> && ((String) x).length() > 0) { String s = (String) x; ... }
> else if (x instanceof Integer
> && ((Integer) x) > 0) { Integer i = (Integer) x; ... }
> else if (x instanceof Long
> && ((Long) x) > 0L) { Long l = (Long) x; ... }

> #### Performance concerns

> The translation to `if-else` chains is simple (for switches without
> fallthrough), but is harder for the VM to optimize, because we've used a more
> general control flow mechanism. If the target is an empty `String`, which means
> we'd pass the first `instanceof` but fail the guard, class-hierarchy analysis
> could tell us that it can't possibly be an `Integer` or a `Long`, and so
> there's no need to perform those tests. But generating code that takes
> advantage of this information is more complex.

> In the extreme case, where a switch consists entirely of type test patterns for
> final classes, this could be performed as an O(1) operation by hashing. And
> this is a common case involving switches over alternatives in a sum (sealed)
> type. (We probably shouldn't rely on finality at compile time, as this can
> change between compile and run time, but we would like to take advantage of
> this at run time if we can.)

> Finally, the straightforward static translation may miss opportunities for
> optimization. For example:

> switch (x) {
> case Point p
> where p.x > 0 && p.y > 0: A
> case Point p
> where p.x > 0 && p.y == 0: B
> }

> Here, not only would we potentially test the target twice to see if it is a
> `Point`, but we then further extract the `x` component twice and perform the
> `p.x > 0` test twice.

> #### Optimization opportunities

> The compiler can eliminate some redundant calculations through straightforward
> techniques. The previous switch can be transformed to:

> switch (x) {
> case Point p:
> if (((Point) p).x > 0 && ((Point) p).y > 0) { A }
> else if (((Point) p).x > 0 && ((Point) p).y == 0) { B }
> }

> to eliminate the redundant `instanceof` (and could be further transformed to
> eliminate the downstream redundant computations.)

> #### Clause reordering

> The above example was easy to transform because the two `case Point` clauses
> were adjacent. But what if they are not? In some cases, it is safe to reorder
> them. For types `T` and `U`, it is safe to reorder `case T` and `case U` if the
> two types have no intersection; that is, there can be no types that are subtypes of
> them both. This is true when `T` and `U` are classes and neither extends the
> other, or when one is a final class and the other is an interface that the
> class does not implement.

> The compiler could then reorder case clauses so that all the ones whose first
> test is `case Point` are adjacent, and then coalesce them all into a single arm
> of the `if-else` chain.

> A possible spoiler here is fallthrough; if case A falls into case B, then cases
> A and B have to be moved as a group. (This is another reason to consider
> limiting fallthrough.)

> #### Summary of if-else translation

> While the if-else translation at first looks pretty bad, we are able to extract
> a fair amount of redundancy through well-understood compiler transformations.
> If an N-way switch has only M distinct types in it, in most cases we can reduce
> the cost from _O(N)_ to _O(M)_. Sometimes _M == N_, so this doesn't help, but
> sometimes _M << N_ (and sometimes `N` is small, in which case _O(N)_ is
> fine.)

> Reordering clauses involves some risk; specifically, that the class hierarchy
> will change between compile and run time. It seems eminently safe to reorder
> `String` and `Integer`, but more questionable to reorder an arbitrary class
> `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, because
> it might easily be changed to do so later. Ideally we'd like to perform
> class-hierarchy optimizations using the runtime hierarchy, not the compile-time
> hierarchy.

> ## Type classifiers

> The technique outlined in _Part 1_, where we lower the complex switch to a dense
> `int` switch, and use an indy-based classifier to select an index, is
> applicable here as well. First let's consider a switch consisting only of
> unguarded type-test patterns (and optionally a default clause.)

> We'll start with an `indy` bootstrap whose static arguments are `Class` constants
> corresponding to each arm of the switch, whose dynamic argument is the switch
> target, and whose return value is a case number (or distinguished sentinels for
> "no match" and `null`.) We can easily implement such a bootstrap with a linear
> search, but can also do better; if some subset of the classes are `final`, we
> can choose between these more quickly (such as via binary search on
> `hashCode()`, hash function, or hash table), and we need perform only a single
> operation to test all of those at once. Dynamic techniques (such as building
> a hash map of previously seen target types), which `indy` is well-suited to,
> can asymptotically approach _O(1)_ even when the classes involved are not
> final.

> So we can lower:

> switch (x) {
> case T t: A
> case U u: B
> case V v: C
> }

> to

> int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
> switch (y) {
> case 0: A
> case 1: B
> case 2: C
> }

> This has the advantages that the generated code is very similar to the source
> code, we can (in some cases) get _O(1)_ dispatch performance, and we can handle
> fallthrough with no additional complexity.

> #### Guards

> There are two approaches we could take to add support for guards into the
> process; we could try to teach the bootstrap about guards (and would have to
> pass locals that appear in guard expressions as additional arguments to the
> classifier), or we could leave guards to the generated bytecode. The latter
> seems far more attractive, but requires some tweaks to the bootstrap arguments
> and to the shape of the generated code.

> If the classifier says "you have matched case #3", but then we fail the guard
> for #3, we want to go back into the classifier and start again at #4.
> Additionally, we'd like for the classifier to use this information ("start over
> at #4") to optimize away unnecessary tests.

> We add a second argument (where to start) to the classifier invocation
> signature, and wrap the switch in a loop, lowering:

> switch (x) {
> case T t where (e1): A
> case T t where (e2): B
> case U u where (e3): C
> }

> into

> int y = -1; // start at the top
> while (true) {
> y = indy[...](x, y)
> switch (y) {
> case 0: if (!e1) continue; A
> case 1: if (!e2) continue; B
> case 2: if (!e3) continue; C
> }
> break;
> }

> For cases where the same type test is repeated in consecutive positions (at N
> and N+1), we can have the static compiler coalesce them as above, or we could
> have the bootstrap maintain a table so that if you re-enter the bootstrap where
> the previous answer was N, then it can immediately return N+1. Similarly, if N
> and N+1 are known to be mutually exclusive types (like `String` and `Integer`),
> on reentering the classifier with N, we can skip right to N+2 since if we
> matched `String`, we cannot match `Integer`. Lookup tables for such
> optimizations can be built at link time.

> #### Mixing constants and type tests

> This approach also extends to tests that are a mix of constant patterns and
> type-test patterns, such as:

> switch (x) {
> case "Foo": ...
> case 0L: ...
> case Integer i:
> }

> We can extend the bootstrap protocol to accept constants as well as types, and
> it is a straightforward optimization to combine both type matching and constant
> matching in a single pass.

From brian.goetz at oracle.com  Fri Jan 12 17:22:34 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 12 Jan 2018 12:22:34 -0500
Subject: Constable interface should die
In-Reply-To: <113776be-1bab-0358-e99e-8611e6796625@oracle.com>
References: <1058715836.2467666.1504363902094.JavaMail.zimbra@u-pem.fr>
 <113776be-1bab-0358-e99e-8611e6796625@oracle.com>
Message-ID: <202ed4cf-39b8-9702-4f17-6e66efcb05f7@oracle.com>

And the next iteration of the loop I think lands in a more comfortable 
place.

Constable did indeed die, to be renamed to SymbolicRef. A SymbolicRef 
is, well, a symbolic reference for some classfile or platform entity, 
including constants, but also possibly including annotations, classfile 
attributes, indy bootstrap specifiers, or other constructs.

Most symbolic references have a live object analogue; ClassRef 
corresponds to a Class, MethodTypeRef corresponds to a MethodType.

Some symbolic references are their own live object; this is true for 
String, Integer, Float, Long, and Double.

Constable was then reborn to mean something else -- something that is 
constant-able -- i.e., a live object that can be naturally represented 
in the constant pool. This includes String and friends from above, as 
well as Class, MethodType, and MethodHandle -- as well as anything that 
knows how to represent itself in the constant pool with condy, like var 
handles or enum constants.

So:
 - Constable -- represents a live object that also knows how to 
construct a symbolic reference for itself. Constructing a symbolic 
reference is a partial function; not all method handles can be 
represented in the constant pool directly, just the direct kind (for now).
 - SymbolicRef -- a symbolic (purely nominal) descriptor for some 
object. These can be resolved reflectively or can be intrinsified into 
the constant pool. Symbolic references also implement Constable, so you 
can explicitly store a symbolic ref in the CP (with condy).
 - SymbolicRef.OfSelf -- for the types that act as their own symbolic ref.
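
As a rough illustration of how these three roles relate, here is a small self-contained sketch. Only the names Constable, SymbolicRef, OfSelf and ClassRef come from the description above; the method names toSymbolicRef() and resolve(), their signatures, and the Text class are assumptions made up for this example and are not the draft API. The sketch also leaves out the point that symbolic references themselves implement Constable.

```java
import java.util.Optional;

public class SymbolicRefSketch {
    /** A live object that may know how to build a symbolic reference to itself. */
    public interface Constable<T> {
        // Partial function: empty when the object has no constant-pool form.
        Optional<? extends SymbolicRef<T>> toSymbolicRef();
    }

    /** A purely nominal descriptor that can be resolved reflectively. */
    public interface SymbolicRef<T> {
        T resolve() throws ReflectiveOperationException;
    }

    /** For types that act as their own symbolic reference. */
    public interface OfSelf<T extends OfSelf<T>> extends SymbolicRef<T>, Constable<T> {
        @SuppressWarnings("unchecked")
        @Override default T resolve() { return (T) this; }
        @Override default Optional<T> toSymbolicRef() { return Optional.of(resolve()); }
    }

    /** Nominal reference to a class; its live analogue is a Class object. */
    public static final class ClassRef implements SymbolicRef<Class<?>> {
        private final String name;
        public ClassRef(String name) { this.name = name; }
        @Override public Class<?> resolve() throws ClassNotFoundException {
            return Class.forName(name);
        }
    }

    /** Hypothetical stand-in for String and friends, which describe themselves. */
    public static final class Text implements OfSelf<Text> {
        public final String value;
        public Text(String value) { this.value = value; }
    }
}
```

Returning an Optional is one way to express that constructing a symbolic reference is a partial function.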


On 9/22/2017 11:14 AM, Brian Goetz wrote:
> So, to close the loop here ...
>
> Based on these comments, we went through three more rounds of API 
> design, and ... ended up in a pretty similar place to where we 
> started. First we tried a more formal separation between "Constable" 
> and "ConstantPoolEntry." Then we tried a CP-entry-centric approach. 
> And what that brought us back to was that the central abstraction 
> here is the symbolic reference -- that this isn't only about 
> intrinsification. (If it were, then the comments regarding 
> macro-systems would be spot-on.) So the current draft brings the 
> symbolic references front and center -- and leaves intrinsification 
> and constant pool entries deliberately in the background.
>
> On 9/2/2017 10:51 AM, Remi Forax wrote:
>> Brian asked me to explain my concerns about the Constable interface.
>>
>> The whole constant folding story is like a macro system; it's a 
>> limited macro system, but still a macro system.
>> I've developed several macro systems, and all have limitations: some 
>> that I introduced deliberately, some that appeared only after the 
>> system was in use. All of these macro systems evolved after their 
>> first release.
>> So the first lesson of designing a macro system seems to be: because 
>> it will evolve, it should provide a minimal API so it can be 
>> refactored easily.
>>
>> The constant-folding mechanism does not target end users but JDK 
>> maintainers, so end users should not be able to see the 
>> implementation of such a mechanism.
>> That's my main concern with the Constable interface: it's a publicly 
>> visible type with a publicly visible API.
>>
>> We have already introduced in the past a mechanism that requires a 
>> specific interaction between the user code, the JDK and the compiler: 
>> signature-polymorphic methods, and that was solved by using a 
>> private annotation.
>>
>> I think constant folding should use the same trick. Mark constant 
>> foldable types with a hidden annotation (@Constable ?) and mark 
>> the (private) methods that can be called by the compiler with another 
>> hidden annotation (@TrackableConstant ?) and I will be happy.
>>
>> Compared to using an interface, there is a loss of discoverability 
>> for the end user, but there is no loss of compiler checking because 
>> the compiler can check whether a type carries an annotation the 
>> same way it can check whether it implements an interface.
>>
>> Now, we can discuss whether @Constable should be a public annotation 
>> or not, because once a type can be constant folded, removing the 
>> annotation is a non-backward-compatible change. So making 
>> @Constable public is perhaps better than having a sentence in 
>> the middle of the javadoc saying that this is a constant-foldable type.
>>
>> Note that constant folding is also a form of serialization; 
>> the first Java serialization API made the mistake of making the 
>> implementation of the part that serializes each object too visible. I 
>> think we can do better here.
>> You could also argue that, like Serializable, Constable could be an 
>> empty interface and ldc would take a Constable. But int is 
>> constant-foldable and I do not see why it should be boxed to an 
>> Integer to become Constable (the full implication of that is that 
>> ldc should be a method with a polymorphic signature, but we are moving 
>> in that direction anyway).
>>
>> Long live @Constable!
>>
>> regards,
>> Rémi
>>
>


From amaembo at gmail.com  Sat Jan 27 04:13:58 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 27 Jan 2018 11:13:58 +0700
Subject: [raw-strings] Newline character handling
Message-ID: <CAE+3fjYLkgAgcOYEaTM7tPa276ajxRKDwz7g3CvZ0AsDEs9Z7Q@mail.gmail.com>

Hello!

I looked through the Raw String Literals JEP draft [1] and did not find
any explicit statement about CR/LF translation within multiline raw
strings. Usually in text files (and, I believe, Java source qualifies
as a text file) it's assumed that changing \n to \r\n and vice versa
does not change the semantics. Sometimes such changes are performed
automatically, e.g. on Git checkout via the core.autocrlf=true setting
[2]. If a multiline string literal is used, then such a replacement may
badly affect the semantics of the program. E.g.:

public class Hello {
  public static void main(String[] args) {
    System.out.println(`Hello
World!`.length());
  }
}

The output of this program may change if its source text is converted
from CR/LF to LF line endings or vice versa.
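
The effect can be reproduced today with ordinary string literals and explicit escapes. The small sketch below (the class and method names are just for illustration) shows that the same two-line text has a different length depending on the line terminator that ends up between the lines:

```java
public class LineEndingLength {
    // Length of a two-line text joined by the given line terminator.
    public static int lengthWith(String terminator) {
        return ("Hello" + terminator + "World!").length();
    }

    public static void main(String[] args) {
        System.out.println(lengthWith("\n"));   // LF:   prints 12
        System.out.println(lengthWith("\r\n")); // CRLF: prints 13
    }
}
```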

As far as I know, Kotlin forcibly replaces CR/LF with LF within
multiline strings, though I did not find any explicit statement about
this in the documentation. This looks like a good compromise, though it
could be annoying for people who actually want to encode CR/LF inside a
multiline string. Nevertheless, I feel that the special handling of
line terminators within multiline strings (or the absence of such
handling) should be explicitly mentioned in the JEP and the subsequent
specification.

With best regards,
Tagir Valeev.


[1] http://openjdk.java.net/jeps/8196004
[2] https://help.github.com/articles/dealing-with-line-endings/

From amaembo at gmail.com  Sat Jan 27 08:23:31 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 27 Jan 2018 15:23:31 +0700
Subject: [raw-strings] Indentation problem
Message-ID: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>

Hello!

Every language that implements multiline strings has problems
with indentation. E.g. consider something like this:

public class Multiline {
  static String createHtml(String message) {
    String html = `<html>
  <head>
    <title>Message</title>
  </head>
  <body>`;
    if (message != null) {
      html += `
    <p>
      Message: `+message+`
    </p>`;
    }
    html += `
  </body>
</html>`;
    return html;
  }
}

Here the indentation of the embedded snippet breaks the indentation of
the Java program, harming its readability. The overall structure of the
method gets mixed up with the structure of the generated HTML. This is
not just bad indentation that could be fixed by the auto-formatting
feature of an IDE: you cannot fix it without either abandoning the
multiline string syntax or changing the semantics. Some people sacrifice
the semantics, namely the indentation of the generated output, if the
output language is indentation-agnostic. HTML mostly is, unless you have
a <pre> section. So one may "fix" it like this:

public class Multiline {
  static String createHtml(String message) {
    String html = `<html>
        <head>
          <title>Message</title>
        </head>
        <body>`;
    if (message != null) {
      html += `
          <p>
            Message: `+message+`
          </p>`;
    }
    html += `
        </body>
      </html>`;
    return html;
  }
}

Now we have broken formatting in the generated HTML, which ruins the
idea of multiline strings (why bother to generate \n in the output HTML
if it looks like a mess anyway?) Moreover, the structure of the Java
program now affects the output. E.g. if you add several more nested "if"
or "switch" statements, you will need to indent <p> even more.

Many languages provide library methods to handle this. E.g.
trimIndent() could be provided to remove the leading spaces of every
line, but this would remove the HTML indentation altogether. Another
possibility is to provide a method like trimMargin() in Kotlin [1],
which trims all whitespace before a special character (pipe by default),
including the special character itself.
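
For reference, here is a minimal sketch of what such a method could do, written as a static helper since String has no such method. It is a simplification and not Kotlin's exact behaviour (for instance, it does not drop blank first and last lines), and the class name MarginTrimmer is made up:

```java
public class MarginTrimmer {
    /**
     * On every line, drops leading whitespace up to and including the first '|'.
     * Lines that do not start with optional whitespace and a pipe are kept as-is.
     */
    public static String trimMargin(String text) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty lines
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int j = 0;
            while (j < line.length() && Character.isWhitespace(line.charAt(j))) {
                j++;
            }
            if (j < line.length() && line.charAt(j) == '|') {
                line = line.substring(j + 1); // cut the margin and the pipe
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(line);
        }
        return out.toString();
    }
}
```

Only the first pipe on a line is treated as the margin marker, so pipes inside the content survive.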

Assuming such a method exists in Java, we can rewrite our method in a
prettier way, preserving both Java and HTML formatting:

public class Multiline {
  static String createHtml(String message) {
    String html = `<html>
      |  <head>
      |    <title>Message</title>
      |  </head>
      |  <body>`.trimMargin();
    if (message != null) {
      html += `
        |    <p>
        |      Message: `+message+`
        |    </p>`.trimMargin();
    }
    html += `
      |  </body>
      |</html>`.trimMargin();
    return html;
  }
}

This is almost nice. Even without syntax highlighting you can easily
distinguish between Java code and injected HTML code, you can indent
Java and HTML independently, and the HTML code does not clash with the
Java code structure. The only problem is the necessity to call the
trimMargin() method. This means that the original literal is preserved
in the bytecode and at runtime, and the trimming is performed every time
the method is called, costing both performance and memory. This
problem could be minimized by making trimMargin() a javac intrinsic.
However, even in this case it would be hard to enforce the use of this
method, and I expect that tons of hard-to-read Java code will appear in
the wild, even though I believe that Java is about readability.

So I propose enforcing such a format (or a similar one) at the language
level instead of adding a library method like "trimMargin()". The syntax
could be formalized like this:

- A raw string starts with a back-quote and ends with a back-quote, as
written in the draft
- When a line-terminating sequence is encountered within a raw string,
the '\n' character is included in the string, and the literal is
interrupted
- After the interruption, any amount of whitespace or comment tokens
is allowed and ignored
- The next meaningful token must be a pipe '|'. It's a compilation
error if any other token or EOF appears before the '|', except comments
or whitespace
- After the '|' the raw-string literal continues and may either end with
a back-quote or be interrupted again by the subsequent line-terminating
sequence.
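
The rules above can be sketched as a small decoder that takes the source text found between the opening and closing back-quotes and produces the string value. This is only an illustration under simplifying assumptions: the class name PipedRawString is made up, comments are recognized textually rather than by a real lexer, and locating the closing back-quote is left to the caller.

```java
public class PipedRawString {
    /** Decodes the text found between the opening and closing back-quotes. */
    public static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0, n = body.length();
        while (i < n) {
            char c = body.charAt(i);
            if (c != '\n' && c != '\r') {  // ordinary character: copy it
                out.append(c);
                i++;
                continue;
            }
            // Line terminator (LF, CR or CRLF): emit '\n', interrupt the literal.
            i += (c == '\r' && i + 1 < n && body.charAt(i + 1) == '\n') ? 2 : 1;
            out.append('\n');
            i = skipToPipe(body, i) + 1;   // resume right after the '|'
        }
        return out.toString();
    }

    // Skips whitespace and comments; returns the index of the next '|'.
    private static int skipToPipe(String s, int i) {
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (s.startsWith("//", i)) {
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') {
                    i++;
                }
            } else if (s.startsWith("/*", i)) {
                int end = s.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated comment");
                }
                i = end + 2;
            } else if (c == '|') {
                return i;
            } else {
                break; // any other token before the pipe is an error
            }
        }
        throw new IllegalArgumentException(
                "Multiline string must continue with a pipe token");
    }
}
```

Pipes that appear after the margin pipe are copied like any other character, so they need no escaping.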

Note that you don't need to specially escape pipes within the literals.

I see some advantages with such syntax:
1. You can comment (or comment out!) a part of multiline string
without terminating it:

String sql = `SELECT * FROM table
    // Negative entry ID = deleted entry
    | WHERE entryID >= 0`;

If you want you can still make this comment a part of the query
(assuming DBMS accepts // comments):

String sql = `SELECT * FROM table
    | // Negative entry ID = deleted entry
    | WHERE entryID >= 0`;

Commenting out code:

String html = `<div>
/*  |   <span color='red'>
    |       Error
    |   </span>*/ // single-line comments would work as well
    |   Something wrong happened
    |</div>`;

2. Looking at a code fragment out of context (e.g. a diff log), you
can tell that you are inside a multiline literal. E.g. consider
reviewing a diff like

            | x++;
+           | if (x == 10) break;
            | foo(x);

Without pipes you could take it for Java code without a second
thought. But now it's clear that it's part of a multiline string
(probably JavaScript!), so this is not direct Java logic and you
should check the broader context to understand what this literal is
for.

3. You cannot accidentally make a big part of the program a part of a
multiline raw string just by forgetting to close the back-quote. A
compilation error will be issued right on the next line, like
"Multiline string must continue with a pipe token", not some obscure
message five screens below where the next raw string literal happens
to start.

4. IDEs will easily distinguish between in-literal indentation and
Java indentation and may allow you to adjust one or the other
independently.

In general this greatly increases readability, clearly telling you
on every line that you're not in Java, but inside something nested.
You can easily nest a Java snippet in a Java snippet, use multiline
raw strings inside, and still not get lost!

String javaMethod = `public void dumpHtml() {
  |  System.out.println(``<!DOCTYPE html>
  |    |<html>
  |    |  <body>
  |    |    <h1>HelloWorld!</h1>
  |    |  </body>
  |    |</html>``);
  |}`;

One pipe means one level inside, two pipes mean two levels inside.


The only disadvantage I see in forcing a pipe prefix is the inability to
just paste a big snippet from somewhere into the middle of a Java program
in a plain text editor. However, any decent IDE would support automatic
addition of pipes on paste. If not, a simple search-and-replace with a
regex like s/^/   |/ over the pasted content will do the job. Even
adding pipes manually is not that hard (I did this manually many times
writing this letter).

What do you think?

[1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html

From forax at univ-mlv.fr  Sat Jan 27 10:59:40 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 27 Jan 2018 11:59:40 +0100 (CET)
Subject: [raw-strings] Newline character handling
In-Reply-To: <CAE+3fjYLkgAgcOYEaTM7tPa276ajxRKDwz7g3CvZ0AsDEs9Z7Q@mail.gmail.com>
References: <CAE+3fjYLkgAgcOYEaTM7tPa276ajxRKDwz7g3CvZ0AsDEs9Z7Q@mail.gmail.com>
Message-ID: <465103338.2523097.1517050780606.JavaMail.zimbra@u-pem.fr>

Hi Tagir,
you have missed this line:
  CR (\u000D) and CRLF (\u000D\u000A) sequences are always translated to LF (\u000A). This translation provides least surprise behavior across platforms.

This is also the behavior of Perl, PHP, etc.
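
In library terms, the translation quoted from the draft amounts to something like the following sketch (the class and method names are only for illustration; in the proposal the compiler does this, not user code):

```java
public class NormalizeLineEndings {
    /** Translates CR and CRLF sequences to LF, leaving existing LFs alone. */
    public static String normalize(String raw) {
        // Replace CRLF first so that a CR LF pair does not become two LFs.
        return raw.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

With this, "Hello\r\nWorld!" and "Hello\nWorld!" normalize to the same 12-character string, so the literal no longer depends on how the source file was checked out.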

As someone who has had to write too many shaders in Java recently, thanks for resurrecting this discussion. I think we should not wait another 10 years to add raw strings to Java. 

regards,
Rémi

----- Original Message -----
> From: "Tagir Valeev" <amaembo at gmail.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Saturday, 27 January 2018 05:13:58
> Subject: [raw-strings] Newline character handling

> Hello!
> 
> I looked through the Raw String Literals JEP draft [1] and did not find
> any explicit statement about CR/LF translation within multiline raw
> strings. Usually in text files (and, I believe, Java source qualifies
> as a text file) it's assumed that changing \n to \r\n and vice versa
> does not change the semantics. Sometimes such changes are performed
> automatically, e.g. on Git checkout via the core.autocrlf=true setting
> [2]. If a multiline string literal is used, then such a replacement may
> badly affect the semantics of the program. E.g.:
> 
> public class Hello {
>  public static void main(String[] args) {
>    System.out.println(`Hello
> World!`.length());
>  }
> }
> 
> The output of this program may change if its source text is converted
> from CR/LF to LF line endings or vice versa.
> 
> As far as I know, Kotlin forcibly replaces CR/LF with LF within
> multiline strings, though I did not find any explicit statement about
> this in the documentation. This looks like a good compromise, though it
> could be annoying for people who actually want to encode CR/LF inside a
> multiline string. Nevertheless, I feel that the special handling of
> line terminators within multiline strings (or the absence of such
> handling) should be explicitly mentioned in the JEP and the subsequent
> specification.
> 
> With best regards,
> Tagir Valeev.
> 
> 
> [1] http://openjdk.java.net/jeps/8196004
> [2] https://help.github.com/articles/dealing-with-line-endings/

From forax at univ-mlv.fr  Sat Jan 27 11:03:40 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 27 Jan 2018 12:03:40 +0100 (CET)
Subject: [raw-strings] Indentation problem
In-Reply-To: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>
References: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>
Message-ID: <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr>

There is a rule when you design a language: if you can do something either in the compiler or in a library, do it in the library :)

I do not think it's a good idea to force the pipe prefix in the spec. From an IDE point of view, you have to do more analysis, but you can recognize the sequence ` ... `.trimMargin() in order to auto-indent things correctly.

regards,
Rémi 

----- Original Message -----
> From: "Tagir Valeev" <amaembo at gmail.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Saturday, 27 January 2018 09:23:31
> Subject: [raw-strings] Indentation problem

> Hello!
> 
> Every language that implements multiline strings has problems
> with indentation. E.g. consider something like this:
> 
> public class Multiline {
>  static String createHtml(String message) {
>    String html = `<html>
>  <head>
>    <title>Message</title>
>  </head>
>  <body>`;
>    if (message != null) {
>      html += `
>    <p>
>      Message: `+message+`
>    </p>`;
>    }
>    html += `
>  </body>
> </html>`;
>    return html;
>  }
> }
> 
> Here the indentation of the embedded snippet breaks the indentation of
> the Java program, harming its readability. The overall structure of the
> method gets mixed up with the structure of the generated HTML. This is
> not just bad indentation that could be fixed by the auto-formatting
> feature of an IDE: you cannot fix it without either abandoning the
> multiline string syntax or changing the semantics. Some people sacrifice
> the semantics, namely the indentation of the generated output, if the
> output language is indentation-agnostic. HTML mostly is, unless you have
> a <pre> section. So one may "fix" it like this:
> 
> public class Multiline {
>  static String createHtml(String message) {
>    String html = `<html>
>        <head>
>          <title>Message</title>
>        </head>
>        <body>`;
>    if (message != null) {
>      html += `
>          <p>
>            Message: `+message+`
>          </p>`;
>    }
>    html += `
>        </body>
>      </html>`;
>    return html;
>  }
> }
> 
> Now we have broken formatting in the generated HTML, which ruins the
> idea of multiline strings (why bother to generate \n in the output HTML
> if it looks like a mess anyway?) Moreover, the structure of the Java
> program now affects the output. E.g. if you add several more nested "if"
> or "switch" statements, you will need to indent <p> even more.
> 
> Many languages provide library methods to handle this. E.g.
> trimIndent() could be provided to remove the leading spaces of every
> line, but this would remove the HTML indentation altogether. Another
> possibility is to provide a method like trimMargin() in Kotlin [1],
> which trims all whitespace before a special character (pipe by default),
> including the special character itself.
> 
> Assuming such a method exists in Java, we can rewrite our method in a
> prettier way, preserving both Java and HTML formatting:
> 
> public class Multiline {
>  static String createHtml(String message) {
>    String html = `<html>
>      |  <head>
>      |    <title>Message</title>
>      |  </head>
>      |  <body>`.trimMargin();
>    if (message != null) {
>      html += `
>        |    <p>
>        |      Message: `+message+`
>        |    </p>`.trimMargin();
>    }
>    html += `
>      |  </body>
>      |</html>`.trimMargin();
>    return html;
>  }
> }
> 
> This is almost nice. Even without syntax highlighting you can easily
> distinguish between Java code and injected HTML code, you can indent
> Java and HTML independently, and the HTML code does not clash with the
> Java code structure. The only problem is the necessity to call the
> trimMargin() method. This means that the original text is preserved in
> the bytecode and at runtime, and the trimming is performed every time
> the method is called, costing both time and memory. This problem could
> be minimized by making trimMargin() a javac intrinsic. However, even in
> this case it would be hard to enforce usage of this method, and I
> expect that tons of hard-to-read Java code will appear in the wild,
> even though I believe that Java is about readability.
> 
> So I propose to enforce such a (or a similar) format at the language
> level instead of adding a library method like "trimMargin()". The
> syntax could be formalized like this:
> 
> - A raw string starts with a back-quote and ends with a back-quote, as
> written in the draft
> - When a line-terminating sequence is encountered within a raw string,
> the '\n' character is included in the string, and the literal is
> interrupted
> - After the interruption, any amount of whitespace or comment tokens
> is allowed and ignored
> - The next meaningful token must be a pipe '|'. It is a compilation
> error if any other token or EOF appears before the '|', except
> comments or whitespace
> - After the '|', the raw-string literal continues and may either end
> with a back-quote or be interrupted again by the next line-terminating
> sequence.
> 
> Note that you don't need to escape pipes within the literals.
> 
> I see some advantages with such syntax:
> 1. You can comment (or comment out!) a part of multiline string
> without terminating it:
> 
> String sql = `SELECT * FROM table
>    // Negative entry ID = deleted entry
>    | WHERE entryID >= 0`;
> 
> If you want you can still make this comment a part of the query
> (assuming DBMS accepts // comments):
> 
> String sql = `SELECT * FROM table
>    | // Negative entry ID = deleted entry
>    | WHERE entryID >= 0`;
> 
> Commenting out code:
> 
> String html = `<div>
> /*  |   <span color='red'>
>    |       Error
>    |   </span>*/ // single-line comments would work as well
>    |   Something wrong happened
>    |</div>`;
> 
> 2. Looking at a code fragment out of context (e.g. in a diff log), you
> can tell that you are inside a multiline literal. E.g. consider
> reviewing a diff like
> 
>            | x++;
> +           | if (x == 10) break;
>            | foo(x);
> 
> Without pipes you could take it for Java code without a second
> thought. But now it's clear that it's part of a multiline string
> (probably JavaScript!), so this is not direct Java logic and you
> should check the broader context to understand what this literal is
> for.
> 
> 3. You cannot accidentally turn a big part of the program into a
> multiline raw string just by forgetting to close the back-quote. A
> compilation error will be issued on the very next line, something like
> "Multiline string must continue with a pipe token", not some obscure
> message five screens below where the next raw string literal happens
> to start.
> 
> 4. IDEs will easily distinguish between in-literal indentation and
> Java indentation and may allow you to adjust either one
> independently.
> 
> In general this greatly increases readability, clearly telling you
> at every line that you're not in Java, but inside something nested.
> You can easily nest a Java snippet inside a Java snippet, use multiline
> raw strings inside, and still not get lost!
> 
> String javaMethod = `public void dumpHtml() {
>  |  System.out.println(``<!DOCTYPE html>
>  |    |<html>
>  |    |  <body>
>  |    |    <h1>HelloWorld!</h1>
>  |    |  </body>
>  |    |</html>``);
>  |}`
> 
> One pipe means one level inside, two pipes mean two levels inside.
> 
> 
> The only disadvantage I see in forcing a pipe prefix is the inability
> to just paste a big snippet from somewhere into the middle of a Java
> program in a plain text editor. However, any decent IDE would support
> automatic addition of pipes on paste. If not, a simple
> search-and-replace with a regex like s/^/   |/ through the pasted
> content will do the job. Even adding pipes manually is not that hard
> (I did this manually many times while writing this letter).
> 
> What do you think?
> 
> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html

From amaembo at gmail.com  Sat Jan 27 11:30:16 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 27 Jan 2018 18:30:16 +0700
Subject: [raw-strings] Newline character handling
In-Reply-To: <465103338.2523097.1517050780606.JavaMail.zimbra@u-pem.fr>
References: <CAE+3fjYLkgAgcOYEaTM7tPa276ajxRKDwz7g3CvZ0AsDEs9Z7Q@mail.gmail.com>
 <465103338.2523097.1517050780606.JavaMail.zimbra@u-pem.fr>
Message-ID: <CAE+3fjYqTj70Brvfx6N1_oLf0O0tdv7aBsdxWuzkYW03jODTTw@mail.gmail.com>

Ah, indeed. Missed this part somehow. Sorry for the noise then.
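
For illustration only, the translation rule quoted below can be sketched in plain Java. This is a minimal sketch, not what javac does: the names LineEndings and normalize are hypothetical, and ordinary escaped literals stand in for raw strings, which would be normalized by the compiler itself under the draft.

```java
public class LineEndings {
    // Hypothetical helper: maps CRLF and lone CR to LF, as the quoted draft text describes.
    // CRLF is replaced first so that it yields one LF rather than two.
    static String normalize(String raw) {
        return raw.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String unix = "Hello\nWorld!";       // source file saved with LF
        String windows = "Hello\r\nWorld!";  // same source saved with CRLF
        System.out.println(unix.length());                // 12
        System.out.println(windows.length());             // 13
        System.out.println(normalize(windows).length());  // 12
    }
}
```

With the normalization applied, the length no longer depends on how the source file was checked out.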

With best regards,
Tagir Valeev.

On Sat, Jan 27, 2018 at 5:59 PM, Remi Forax <forax at univ-mlv.fr> wrote:
> Hi Tagir,
> you missed this line:
>   CR (\u000D) and CRLF (\u000D\u000A) sequences are always translated to LF (\u000A). This translation provides least surprise behavior across platforms.
>
> This is also the behavior of Perl, PHP, etc.
>
> As someone who has had to write too many shaders in Java recently, thanks for resurrecting this discussion; I think we should not wait another 10 years to add raw strings to Java.
>
> regards,
> Rémi
>
> ----- Original Message -----
>> From: "Tagir Valeev" <amaembo at gmail.com>
>> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
>> Sent: Saturday, January 27, 2018 05:13:58
>> Subject: [raw-strings] Newline character handling
>
>> Hello!
>>
>> I looked through Raw String Literals JEP draft [1] and did not find
>> any explicit statement about CR/LF translation within multiline raw
>> string. Usually in text files (and, I believe, Java source qualifies
>> as a text file) it's assumed that changing \n to \r\n and vice versa
>> would not change the semantics. Sometimes such changes are performed
>> automatically, e.g. on Git checkout via core.autocrlf=true setting
>> [2]. If multiline string literal is used, then such replacement may
>> badly affect the semantics of the program. E.g.:
>>
>> public class Hello {
>>  public static void main(String[] args) {
>>    System.out.println(`Hello
>> World!`.length());
>>  }
>> }
>>
>> The output of this program may change if its source text is converted
>> from CR/LF to LF line endings or vice versa.
>>
>> As far as I know, Kotlin forcibly replaces CR/LF with LF within
>> multiline strings, though I did not find any explicit statement about
>> this in the documentation. This looks like a good compromise, though it
>> could be annoying for people who actually want to encode CR/LF inside a
>> multiline string. Nevertheless, I feel that the special handling of
>> line terminators within multiline strings (or the absence of such
>> handling) should be explicitly mentioned in the JEP and the
>> specification that follows.
>>
>> With best regards,
>> Tagir Valeev.
>>
>>
>> [1] http://openjdk.java.net/jeps/8196004
>> [2] https://help.github.com/articles/dealing-with-line-endings/

From amaembo at gmail.com  Sat Jan 27 11:37:01 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 27 Jan 2018 18:37:01 +0700
Subject: [raw-strings] Indentation problem
In-Reply-To: <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr>
References: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>
 <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr>
Message-ID: <CAE+3fjZeDRwcfZTHAfRQOt2rA14VEhfJvSoKpDpQHiN0h3rL4w@mail.gmail.com>

Hello!

> There is a rule when you design a language: if you can do something in the compiler or in a library, do it in the library :)

A library can indeed allow you (to some extent) to use a better syntax.
What a library cannot do is disallow the worse syntax. And this is the
most important part of my suggestion: to prevent people from writing
bad code, not to allow people to write better code.

> I do not think it's a good idea to force the pipe prefix in the spec,

Why do you think it's not a good idea? What are possible
disadvantages? Please share your concerns. Thanks.

> and from an IDE point of view, you have to do more analysis but you can recognize the sequence ` ... `.trimMargin() in order to auto-indent things correctly.

True, that is possible if one uses a trimMargin() call. But if bad code
is already written, it would not be so easy to fix it automatically. We
could automatically add trimMargin(), but determining which part of the
indent should be moved to the left of the pipe cannot be done with
100% accuracy.
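
To illustrate the ambiguity, here is a minimal sketch of one heuristic a tool might try: treat the smallest indentation shared by all continuation lines as the Java margin. The class IndentGuess and the method commonIndent are hypothetical names, and the literal is written with escapes since raw strings do not exist yet.

```java
public class IndentGuess {
    // Hypothetical heuristic: the smallest number of leading spaces among
    // the non-blank continuation lines is assumed to be Java indentation.
    static int commonIndent(String literal) {
        String[] lines = literal.split("\n", -1);
        int min = Integer.MAX_VALUE;
        // The first line starts right after the opening quote, so skip it.
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.trim().isEmpty()) {
                continue; // blank lines carry no indentation information
            }
            int n = 0;
            while (n < line.length() && line.charAt(n) == ' ') {
                n++;
            }
            min = Math.min(min, n);
        }
        return min == Integer.MAX_VALUE ? 0 : min;
    }

    public static void main(String[] args) {
        String html = "<html>\n"
                + "        <head>\n"
                + "          <title>Message</title>\n"
                + "        </head>\n"
                + "        <body>";
        System.out.println(commonIndent(html)); // 8
    }
}
```

The heuristic reports 8 spaces of margin, which would put <head> flush left in the output. The author may just as well have meant 6 spaces of Java indentation plus 2 spaces of HTML nesting under <html>; nothing in the text tells the two readings apart.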

With best regards,
Tagir Valeev.


From forax at univ-mlv.fr  Mon Jan 29 11:01:58 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Mon, 29 Jan 2018 12:01:58 +0100 (CET)
Subject: [raw-strings] Indentation problem
In-Reply-To: <CAE+3fjZeDRwcfZTHAfRQOt2rA14VEhfJvSoKpDpQHiN0h3rL4w@mail.gmail.com>
References: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>
 <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr>
 <CAE+3fjZeDRwcfZTHAfRQOt2rA14VEhfJvSoKpDpQHiN0h3rL4w@mail.gmail.com>
Message-ID: <1338120311.230249.1517223718272.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Tagir Valeev" <amaembo at gmail.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Saturday, January 27, 2018 12:37:01
> Subject: Re: [raw-strings] Indentation problem

> Hello!

Hi !

> 
>> There is a rule when you design a language: if you can do something in the
>> compiler or in a library, do it in the library :)
> 
> A library can indeed allow you (to some extent) to use a better syntax.
> What a library cannot do is disallow the worse syntax. And this is the
> most important part of my suggestion: to prevent people from writing
> bad code, not to allow people to write better code.

Non-indented code is not always bad code;
think about minification, for example.

> 
>> I do not think it's a good idea to force the pipe prefix in the spec,
> 
> Why do you think it's not a good idea? What are possible
> disadvantages? Please share your concerns. Thanks.

Coding conventions, code formatting rules, etc. change over time as we learn as a community.
Baking this kind of consideration into a language makes the language live forever in the decade it was designed in;
the fewer 'defaults' a language has, the better it looks when you look back.

> 
>> and from an IDE point of view, you have to do more analysis but you can
>> recognize the sequence ` ... `.trimMargin() in order to auto-indent things
>> correctly.
> 
> True, that is possible if one uses a trimMargin() call. But if bad code
> is already written, it would not be so easy to fix it automatically. We
> could automatically add trimMargin(), but determining which part of the
> indent should be moved to the left of the pipe cannot be done with
> 100% accuracy.

I was thinking about using .trimMargin() as a hint that the user wants IDEs to re-format the code correctly (by detecting the pipe), not about forcing users to follow a specific convention.

> 
> With best regards,
> Tagir Valeev.
> 

regards,
Rémi


From john.r.rose at oracle.com  Tue Jan 30 00:19:09 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 29 Jan 2018 16:19:09 -0800
Subject: [raw-strings] Indentation problem
In-Reply-To: <1338120311.230249.1517223718272.JavaMail.zimbra@u-pem.fr>
References: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>
 <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr>
 <CAE+3fjZeDRwcfZTHAfRQOt2rA14VEhfJvSoKpDpQHiN0h3rL4w@mail.gmail.com>
 <1338120311.230249.1517223718272.JavaMail.zimbra@u-pem.fr>
Message-ID: <EAA46096-AE10-44E4-9125-22922D746452@oracle.com>

Sorry, Tagir, I'm with Remi on this one.

Baking conventions into a language is dangerous,
and must be proven either canonical in some way
or harmless.  (Floating point tokens are canonical
by reference to C, javadoc was a harmless design decision
in the context of Java.)  Tagir's proposals for indentation
control are very clever but neither canonical nor harmless.

Jim's proposal is carefully designed to be as minimal
and simple as possible.

I expect most of the feedback will be, like Tagir's,
to the effect that Jim has missed some crucial feature,
so that if only the proposal were less minimal in
some way it would be better.

But minimality itself is a feature, one to be protected
and defended, especially against committees of
people each of whom has their favorite non-negotiable.

That said, I will (in another message) argue (in a
different way) that Jim's proposal is very slightly
too simple, in a way that makes it fail to meet its
primary goal.  Spoiler:  I think I can prove that
Markdown code quoting is appropriately minimal
in its design, in a way Jim's is not.

- John

From brian.goetz at oracle.com  Tue Jan 30 20:01:15 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 30 Jan 2018 15:01:15 -0500
Subject: [raw-strings] Indentation problem
In-Reply-To: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>
References: <CAE+3fjZGiMoQ7SRgb6LEoo4eqiRS2Gs7YFgGmy4DAizmjkDjRQ@mail.gmail.com>
Message-ID: <8B9EB16F-EFCB-4014-9031-91BCE7AA304E@oracle.com>

In my opinion, this is actually the central motivation for the feature: minimizing the friction (both reading and writing) of embedding small snippets from other programming languages in Java sources. So this disadvantage is a pretty big one.

That said, a library method to remove the piping (some variant of trimIndent) makes sense, so people who want to program this way can do so easily.  
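
As a rough idea of what such a library method could look like, here is a minimal sketch modeled loosely on Kotlin's trimMargin(). It is an assumption for illustration, not an existing JDK API: the class Margins and its static trimMargin are hypothetical, and the sketch is simplified (it does not drop blank first or last lines, and the margin character is fixed to '|').

```java
public class Margins {
    // Hypothetical helper: on each line, drop leading whitespace up to and
    // including the first '|'. Lines without a pipe margin are kept unchanged.
    static String trimMargin(String text) {
        String[] lines = text.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int p = 0;
            while (p < line.length() && Character.isWhitespace(line.charAt(p))) {
                p++;
            }
            if (p < line.length() && line.charAt(p) == '|') {
                line = line.substring(p + 1);
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html>\n"
                + "      |  <body>\n"
                + "      |</html>";
        System.out.println(trimMargin(html));
    }
}
```

Only the first pipe after the leading whitespace is treated as a margin, so pipes inside the snippet itself need no escaping.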

> The only disadvantage I see in forcing a pipe prefix is inability to
> just paste a big snippet from somewhere to the middle of Java program
> in a plain text editor. 
