From forax at univ-mlv.fr  Tue Jan  2 12:35:15 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 2 Jan 2018 13:35:15 +0100 (CET)
Subject: Switch translation, part 2
In-Reply-To: <73805383-7ae7-fc47-1a21-f7116b78248a@oracle.com>
References: <73805383-7ae7-fc47-1a21-f7116b78248a@oracle.com>
Message-ID: <1651774490.240302.1514896515983.JavaMail.zimbra@u-pem.fr>

Hi all,
the proposed translation is a good default when you can have fallthroughs or guards, but if you have none of them, it's not the best translation.

[CC John Rose because i may say something stupid]

The problem is that the VM doesn't prune never-called cases in a switch, while it does prune the branches of an if, so an if ... else can be faster than a switch when only some cases are exercised at runtime. Also note that if you look at the generated assembly code, a lot of tableswitches end up being generated by the VM as if ... else chains, because jumping to a computed address is not free.

So it seems a good idea, in the case of an expression switch with no guard, to generate just an indy instead of an indy + a tableswitch.

So instead of lowering:

  switch (x) {
    case T t: A
    case U u: B
    case V v: C
  }

to

  int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
  switch (y) {
    case 0: A
    case 1: B
    case 2: C
  }

I propose to lower it to something similar to the lambda translation:

  var expr = indy[bootstrap=exprTypeSwich(1, T.class, U.class, V.class)](x);

and let the bootstrap do all the lifting. The first bootstrap argument is the number of the switch in the code, here 1. A, B and C are desugared into static methods, respectively switch$1case$0, switch$1case$1 and switch$1case$2 (i.e. "switch$" + switchNumber + "case$" + caseNumber).
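Here is a minimal, runnable sketch of the "just an indy" shape. It assumes that a plain `MethodHandles.guardWithTest` chain is an acceptable stand-in for the target that the proposed bootstrap would link. The case bodies follow the `switch$N` + `case$M` naming above. The class `ExprTypeSwitchSketch`, the fixed `(Object)String` case signature, the `noMatch` fallback and the `apply` helper are hypothetical. The chain tests the cases in source order and does not implement the inline-cache or tableswitch fallback behaviour.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ExprTypeSwitchSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    // Assumed uniform signature for every desugared case body.
    private static final MethodType CASE_TYPE = MethodType.methodType(String.class, Object.class);

    // Case bodies A, B and C of switch #1, desugared into static methods.
    static String switch$1case$0(Object x) { return "String of length " + ((String) x).length(); }
    static String switch$1case$1(Object x) { return "Integer " + x; }
    static String switch$1case$2(Object x) { return "Long " + x; }

    // Hypothetical fallback used when no case matches.
    static String noMatch(Object x) {
        throw new IllegalStateException("no case matches " + x);
    }

    // Hypothetical stand-in for the bootstrap: builds
    // if (T.isInstance(x)) case$0(x) else if (U.isInstance(x)) case$1(x) else ... noMatch(x)
    static MethodHandle exprTypeSwitch(int switchNumber, Class<?>... types) {
        try {
            MethodHandle isInstance = LOOKUP.findVirtual(Class.class, "isInstance",
                    MethodType.methodType(boolean.class, Object.class));
            MethodHandle chain = LOOKUP.findStatic(ExprTypeSwitchSketch.class, "noMatch", CASE_TYPE);
            // Build from the last case backwards so that the first case is tested first.
            for (int i = types.length - 1; i >= 0; i--) {
                MethodHandle test = isInstance.bindTo(types[i]); // (Object)boolean
                MethodHandle body = LOOKUP.findStatic(ExprTypeSwitchSketch.class,
                        "switch$" + switchNumber + "case$" + i, CASE_TYPE); // (Object)String
                chain = MethodHandles.guardWithTest(test, body, chain);
            }
            return chain;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Invokes the dispatcher without forcing callers to handle Throwable.
    static String apply(MethodHandle switchHandle, Object x) {
        try {
            return (String) switchHandle.invokeExact(x);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle sw = exprTypeSwitch(1, String.class, Integer.class, Long.class);
        System.out.println(apply(sw, "hello")); // String of length 5
        System.out.println(apply(sw, 42));      // Integer 42
        System.out.println(apply(sw, 7L));      // Long 7
    }
}
```

A real bootstrap would wrap such a handle in a `CallSite` and could relink it as the receiver profile changes. JDK 17 later added `MethodHandles.tableSwitch`, which provides the kind of combinator mentioned in the footnote, but this sketch does not use it.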
The bootstrap method exprTypeSwich can work like an inline cache when the number of branches actually visited is small. This is interesting because the performance of an expression switch will then be in the same ballpark as the performance of the corresponding virtual call. If too many branches are used, it can revert to a new method handle combinator that does a tableswitch*.

cheers,
Rémi

* the JSR 292 EG has talked several times about introducing such a method handle combinator.

> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Monday, December 11, 2017 22:37:57
> Subject: Switch translation, part 2

> # Switch Translation, Part 2 -- type test patterns and guards
> #### Maurizio Cimadamore and Brian Goetz
> #### December 2017
>
> This document examines possible translation of `switch` constructs involving
> `case` labels that include type-test patterns, potentially with guards. Part 3
> will address translation of destructuring patterns, nested patterns, and OR
> patterns.
>
> ## Type-test patterns
>
> Type-test patterns are notable because their applicability predicate is purely
> based on the type system, meaning that the compiler can directly reason about
> it both statically (using flow analysis, optimizing away dynamic type tests)
> and dynamically (with `instanceof`.) A switch involving type-tests:
>
>     switch (x) {
>         case String s: ...
>         case Integer i: ...
>         case Long l: ...
>     }
>
> can (among other strategies) be translated into a chain of `if-else` using
> `instanceof` and casts:
>
>     if (x instanceof String) { String s = (String) x; ... }
>     else if (x instanceof Integer) { Integer i = (Integer) x; ... }
>     else if (x instanceof Long) { Long l = (Long) x; ... }
>
> #### Guards
>
> The `if-else` desugaring can also naturally handle guards:
>
>     switch (x) {
>         case String s
>             where (s.length() > 0): ...
>         case Integer i
>             where (i > 0): ...
>         case Long l
>             where (l > 0L): ...
>     }
>
> can be translated to:
>
>     if (x instanceof String
>         && ((String) x).length() > 0) { String s = (String) x; ... }
>     else if (x instanceof Integer
>              && ((Integer) x) > 0) { Integer i = (Integer) x; ... }
>     else if (x instanceof Long
>              && ((Long) x) > 0L) { Long l = (Long) x; ... }
>
> #### Performance concerns
>
> The translation to `if-else` chains is simple (for switches without
> fallthrough), but is harder for the VM to optimize, because we've used a more
> general control flow mechanism. If the target is an empty `String`, which means
> we'd pass the first `instanceof` but fail the guard, class-hierarchy analysis
> could tell us that it can't possibly be an `Integer` or a `Long`, and so
> there's no need to perform those tests. But generating code that takes
> advantage of this information is more complex.
>
> In the extreme case, where a switch consists entirely of type test patterns for
> final classes, this could be performed as an O(1) operation by hashing. And
> this is a common case involving switches over alternatives in a sum (sealed)
> type. (We probably shouldn't rely on finality at compile time, as this can
> change between compile and run time, but we would like to take advantage of
> this at run time if we can.)
>
> Finally, the straightforward static translation may miss opportunities for
> optimization. For example:
>
>     switch (x) {
>         case Point p
>             where p.x > 0 && p.y > 0: A
>         case Point p
>             where p.x > 0 && p.y == 0: B
>     }
>
> Here, not only would we potentially test the target twice to see if it is a
> `Point`, but we then further extract the `x` component twice and perform the
> `p.x > 0` test twice.
>
> #### Optimization opportunities
>
> The compiler can eliminate some redundant calculations through straightforward
> techniques.
> The previous switch can be transformed to:
>
>     switch (x) {
>         case Point p:
>             if (((Point) p).x > 0 && ((Point) p).y > 0) { A }
>             else if (((Point) p).x > 0 && ((Point) p).y == 0) { B }
>     }
>
> to eliminate the redundant `instanceof` (and could be further transformed to
> eliminate the downstream redundant computations.)
>
> #### Clause reordering
>
> The above example was easy to transform because the two `case Point` clauses
> were adjacent. But what if they are not? In some cases, it is safe to reorder
> them. For types `T` and `U`, it is safe to reorder `case T` and `case U` if the
> two types have no intersection; that is, there can be no types that are
> subtypes of them both. This is true when `T` and `U` are classes and neither
> extends the other, or when one is a final class and the other is an interface
> that the class does not implement.
>
> The compiler could then reorder case clauses so that all the ones whose first
> test is `case Point` are adjacent, and then coalesce them all into a single arm
> of the `if-else` chain.
>
> A possible spoiler here is fallthrough; if case A falls into case B, then cases
> A and B have to be moved as a group. (This is another reason to consider
> limiting fallthrough.)
>
> #### Summary of if-else translation
>
> While the if-else translation at first looks pretty bad, we are able to extract
> a fair amount of redundancy through well-understood compiler transformations.
> If an N-way switch has only M distinct types in it, in most cases we can reduce
> the cost from _O(N)_ to _O(M)_. Sometimes _M == N_, so this doesn't help, but
> sometimes _M << N_ (and sometimes `N` is small, in which case _O(N)_ is
> fine.)
>
> Reordering clauses involves some risk; specifically, that the class hierarchy
> will change between compile and run time.
> It seems eminently safe to reorder
> `String` and `Integer`, but more questionable to reorder an arbitrary class
> `Foo` with `Runnable`, even if `Foo` doesn't implement `Runnable` now, because
> it might easily be changed to do so later. Ideally we'd like to perform
> class-hierarchy optimizations using the runtime hierarchy, not the compile-time
> hierarchy.
>
> ## Type classifiers
>
> The technique outlined in _Part 1_, where we lower the complex switch to a dense
> `int` switch, and use an indy-based classifier to select an index, is
> applicable here as well. First let's consider a switch consisting only of
> unguarded type-test patterns (and optionally a default clause.)
>
> We'll start with an `indy` bootstrap whose static arguments are `Class` constants
> corresponding to each arm of the switch, whose dynamic argument is the switch
> target, and whose return value is a case number (or distinguished sentinels for
> "no match" and `null`.) We can easily implement such a bootstrap with a linear
> search, but can also do better; if some subset of the classes are `final`, we
> can choose between these more quickly (such as via binary search on
> `hashCode()`, hash function, or hash table), and we need perform only a single
> operation to test all of those at once. Dynamic techniques (such as building
> a hash map of previously seen target types), which `indy` is well-suited to,
> can asymptotically approach _O(1)_ even when the classes involved are not
> final.
>
> So we can lower:
>
>     switch (x) {
>         case T t: A
>         case U u: B
>         case V v: C
>     }
>
> to
>
>     int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
>     switch (y) {
>         case 0: A
>         case 1: B
>         case 2: C
>     }
>
> This has the advantages that the generated code is very similar to the source
> code, we can (in some cases) get _O(1)_ dispatch performance, and we can handle
> fallthrough with no additional complexity.
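One possible shape for such a classifier is sketched below: a first-match linear search over the `Class` labels, memoized per receiver class with `java.lang.ClassValue` (one way to realize "a hash map of previously seen target types"). The names `TypeSwitchSketch`, `TypeClassifier`, `classify` and the sentinels (`-1` for `null`, the number of labels for "no match") are assumptions. A plain method call stands in for the `indy` call site, and guards are not handled.

```java
public class TypeSwitchSketch {

    // Hypothetical stand-in for a linked typeSwitch(T.class, U.class, V.class) call site.
    static final class TypeClassifier {
        static final int NULL_TARGET = -1; // assumed sentinel for a null target

        // Memoizes, per dynamic receiver class, the index of the first matching label
        // (or labels.length for "no match"), so each class pays for the search only once.
        private final ClassValue<Integer> firstMatch;

        TypeClassifier(Class<?>... labels) {
            Class<?>[] copy = labels.clone();
            this.firstMatch = new ClassValue<Integer>() {
                @Override
                protected Integer computeValue(Class<?> type) {
                    for (int i = 0; i < copy.length; i++) {
                        if (copy[i].isAssignableFrom(type)) {
                            return i; // first match in source order wins
                        }
                    }
                    return copy.length; // assumed "no match" sentinel
                }
            };
        }

        int classify(Object target) {
            return target == null ? NULL_TARGET : firstMatch.get(target.getClass());
        }
    }

    static final TypeClassifier CLASSIFIER =
            new TypeClassifier(String.class, Integer.class, Number.class);

    // Lowered form of:
    //   switch (x) { case String s: ... case Integer i: ... case Number n: ... }
    static String describe(Object x) {
        int y = CLASSIFIER.classify(x); // stands in for indy[bootstrap=typeSwitch(...)](x)
        switch (y) {
            case 0: return "String " + (String) x;
            case 1: return "Integer " + (Integer) x;
            case 2: return "Number " + (Number) x;
            case TypeClassifier.NULL_TARGET: return "null";
            default: return "no match";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("hi"));  // String hi
        System.out.println(describe(42));    // Integer 42
        System.out.println(describe(2.5));   // Number 2.5
        System.out.println(describe(null));  // null
        System.out.println(describe('c'));   // no match (Character is not a Number)
    }
}
```

The per-class cache is sound here only because every lookup starts at case 0. A classifier that can be re-entered at a later case, as in the guard protocol, would need the start position as part of its cache key.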
> #### Guards > There are two approaches we could take to add support for guards into the > process; we could try to teach the bootstrap about guards (and would have to > pass locals that appear in guard expressions as additional arguments to the > classifier), or we could leave guards to the generated bytecode. The latter > seems far more attractive, but requires some tweaks to the bootstrap arguments > and to the shape of the generated code. > If the classifier says "you have matched case #3", but then we fail the guard > for #3, we want to go back into the classifier and start again at #4. > Additionally, we'd like for the classifier to use this information ("start over > at #4") to optimize away unnecessary tests. > We add a second argument (where to start) to the classifier invocation > signature, and wrap the switch in a loop, lowering: > switch (x) { > case T t where (e1): A > case T t where (e2): B > case U u where (e3): C > } > into > int y = -1; // start at the top > while (true) { > y = indy[...](x, y) > switch (y) { > case 0: if (!e1) continue; A > case 1: if (!e2) continue; B > case 2: if (!e3) continue; C > } > break; > } > For cases where the same type test is repeated in consecutive positions (at N > and N+1), we can have the static compiler coalesce them as above, or we could > have the bootstrap maintain a table so that if you re-enter the bootstrap where > the previous answer was N, then it can immediately return N+1. Similarly, if N > and N+1 are known to be mutually exclusive types (like `String` and `Integer`), > on reentering the classifier with N, we can skip right to N+2 since if we > matched `String`, we cannot match `Integer`. Lookup tables for such > optimizations can be built at link time. > #### Mixing constants and type tests > This approach also extends to tests that are a mix of constant patterns and > type-test patterns, such as: > switch (x) { > case "Foo": ... > case 0L: ... 
>         case Integer i:
>     }
>
> We can extend the bootstrap protocol to accept constants as well as types, and
> it is a straightforward optimization to combine both type matching and constant
> matching in a single pass.

From brian.goetz at oracle.com  Fri Jan 12 17:22:34 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 12 Jan 2018 12:22:34 -0500
Subject: Constable interface should die
In-Reply-To: <113776be-1bab-0358-e99e-8611e6796625@oracle.com>
References: <1058715836.2467666.1504363902094.JavaMail.zimbra@u-pem.fr> <113776be-1bab-0358-e99e-8611e6796625@oracle.com>
Message-ID: <202ed4cf-39b8-9702-4f17-6e66efcb05f7@oracle.com>

And the next iteration of the loop, I think, lands in a more comfortable place.

Constable did indeed die, to be renamed to SymbolicRef. A SymbolicRef is, well, a symbolic reference for some classfile or platform entity, including constants, but also possibly including annotations, classfile attributes, indy bootstrap specifiers, or other constructs. Most symbolic references have a live object analogue; ClassRef corresponds to a Class, MethodTypeRef corresponds to a MethodType. Some symbolic references are their own live object; this is true for String, Integer, Float, Long, and Double.

Constable was then reborn to mean something else -- something that is constant-able -- i.e., a live object that can be naturally represented in the constant pool. This includes String and friends from above, as well as Class, MethodType, and MethodHandle -- as well as anything that knows how to represent itself in the constant pool with condy, like var handles or enum constants.

So:

 - Constable -- represents a live object that also knows how to construct a symbolic reference for itself. Constructing a symbolic reference is a partial function; not all method handles can be represented in the constant pool directly, just the direct kind (for now).
 - SymbolicRef -- a symbolic (purely nominal) descriptor for some object. These can be resolved reflectively or can be intrinsified into the constant pool. Symbolic references also implement Constable, so you can explicitly store a symbolic ref in the CP (with condy).
 - SymbolicRef.OfSelf -- for the types that act as their own symbolic ref.

On 9/22/2017 11:14 AM, Brian Goetz wrote:
> So, to close the loop here ...
>
> Based on these comments, we went through three more rounds of API
> design, and ... ended up in a pretty similar place to where we
> started. First we tried a more formal separation between "Constable"
> and "ConstantPoolEntry." Then we tried a CP-entry-centric approach.
> And what that brought us back to was that the central abstraction
> here is the symbolic references -- that this isn't only about
> intrinsification. (If it were, then the comments regarding
> macro-systems would be spot-on.) So the current draft brings the
> symbolic references front and center -- and leaves intrinsification
> and constant pool entries deliberately in the background.
>
> On 9/2/2017 10:51 AM, Remi Forax wrote:
>> Brian asked me to explain my concerns about the Constable interface.
>>
>> The whole constant folding story is like a macro system; it's a
>> limited macro system, but still a macro system.
>> I've developed several macro systems, and all have limitations: some
>> limitations that i introduced voluntarily, some that appeared
>> after being used. All the macro systems have always evolved
>> after the first release.
>> So the first lesson of designing a macro system seems to be that, because
>> it will evolve, it should provide the minimal API so it can be
>> refactored easily.
>>
>> In the case of the constant-folding mechanism, it's not a mechanism that
>> targets end users but JDK maintainers, so end users should not be able
>> to see the implementation of such a mechanism.
>> That's my main concern with the Constable interface: it's a publicly
>> visible type with a publicly visible API.
>>
>> We have already introduced in the past a mechanism that requires a
>> specific interaction between the user code, the JDK and the compiler:
>> the signature-polymorphic methods, and it was solved by using a
>> private annotation.
>>
>> I think constant folding should use the same trick. Mark constant-foldable
>> types with a hidden annotation (@Constable ?) and mark the (private)
>> methods that can be called by the compiler with another
>> hidden annotation (@TrackableConstant ?) and i will be happy.
>>
>> Compared to using an interface, there is a loss of discoverability
>> for the end user, but there is no loss of compiler checking, because
>> the compiler can check whether a type is annotated by an annotation the
>> same way it can check whether it implements an interface.
>>
>> Now, we can discuss whether @Constable should be a public annotation or
>> not, because once a type can be constant folded, removing the
>> annotation is a non-backward-compatible change. So having
>> @Constable public is perhaps better than having a sentence in
>> the middle of the javadoc saying that this is a constant-foldable type.
>>
>> Note that constant folding is also a form of serialization;
>> the first Java serialization API made the mistake of making the
>> implementation of the part that serializes each object too visible. I
>> think we can do better here.
>> You could also imagine that, like Serializable, Constable could be an
>> empty interface and ldc would take a Constable. But an int is
>> constant-foldable and i do not see why it should be boxed to an
>> Integer to become Constable (the full implication of that is that
>> ldc should be a method with a polymorphic signature, but we are moving
>> in that direction anyway).
>>
>> Long live @Constable!
>>
>> regards,
>> Rémi
>>
>

From amaembo at gmail.com  Sat Jan 27 04:13:58 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 27 Jan 2018 11:13:58 +0700
Subject: [raw-strings] Newline character handling
Message-ID:

Hello!

I looked through the Raw String Literals JEP draft [1] and did not find any explicit statement about CR/LF translation within multiline raw strings. Usually in text files (and, I believe, Java source qualifies as a text file) it's assumed that changing \n to \r\n and vice versa does not change the semantics. Sometimes such changes are performed automatically, e.g. on Git checkout via the core.autocrlf=true setting [2]. If a multiline string literal is used, such a replacement may badly affect the semantics of the program. E.g.:

public class Hello {
  public static void main(String[] args) {
    System.out.println(`Hello
World!`.length());
  }
}

The output of this program may change if its source text is converted from CR/LF to LF line endings or vice versa.

As far as I know, Kotlin forcibly replaces CR/LF with LF within multiline strings, though I did not find any explicit statement about this in the documentation. This looks like a good compromise, though it could be annoying for people who actually want to encode CR/LF inside a multiline string. Nevertheless, I feel that the special handling of line terminators within multiline strings (or the absence of such handling) should be explicitly mentioned in the JEP and the subsequent specification.

With best regards,
Tagir Valeev.

[1] http://openjdk.java.net/jeps/8196004
[2] https://help.github.com/articles/dealing-with-line-endings/

From amaembo at gmail.com  Sat Jan 27 08:23:31 2018
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 27 Jan 2018 15:23:31 +0700
Subject: [raw-strings] Indentation problem
Message-ID:

Hello!

Every language that implements multiline strings has problems with indentation. E.g.
consider something like this: public class Multiline { static String createHtml(String message) { String html = ` Message `; if (message != null) { html += `

Message: `+message+`

`; } html += ` `; return html; } }

Here the indentation of the embedded snippet breaks the indentation of the Java program, harming its readability. The overall structure of the method is tangled up with the structure of the generated HTML. This is not just bad indentation that could be fixed by an IDE's auto-formatting feature. You cannot fix this without throwing away the multiline string syntax or changing the semantics. Some people sacrifice the semantics, namely the indentation of the generated output, if the output language is indentation-agnostic. HTML mostly is, unless you have a <pre> section. So one may "fix" it like this:

public class Multiline {
  static String createHtml(String message) {
    String html = `
        
          Message
        
        `;
    if (message != null) {
      html += `
          

Message: `+message+`

`; } html += ` `; return html; } }

Now we have broken formatting in the generated HTML, which ruins the idea of multiline strings (why bother to generate \n in the output HTML if it looks like a mess anyway?). Moreover, the structure of the Java program now affects the output. E.g. if you add several more nested "if" or "switch" statements, you will need to indent

even more.

Many languages provide library methods to handle this. E.g. a trimIndent() method could be provided to remove the leading spaces of every line, but this would kill the HTML indentation entirely. Another possibility is to provide a method like Kotlin's trimMargin() [1], which trims all whitespace before a special character (a pipe by default), including the special character itself.

Assuming such a method exists in Java, we can rewrite our method in a prettier way, preserving both Java and HTML formatting:

public class Multiline { static String createHtml(String message) { String html = ` | | Message | | `.trimMargin(); if (message != null) { html += ` |

| Message: `+message+` |

`.trimMargin(); } html += ` | |`.trimMargin(); return html; } }

This is almost nice. Even without syntax highlighting you can easily distinguish between Java code and injected HTML code, you can indent Java and HTML independently, and the HTML code does not clash with the Java code structure. The only problem is the necessity of calling the trimMargin() method. This means that the original string is kept in the bytecode and at runtime, and the trimming is performed every time the method is called, causing a performance and memory handicap. This problem could be minimized by making trimMargin() a javac intrinsic. However, even in this case it would be hard to enforce the use of this method, and I expect that tons of hard-to-read Java code would appear in the wild, even though I believe that Java is about readability.

So I propose to enforce such (or a similar) format at the language level instead of adding a library method like "trimMargin()". The syntax could be formalized like this:

- A raw string starts with a back-quote and ends with a back-quote, as written in the draft before.
- When a line-terminating sequence is encountered within a raw string, the '\n' character is included in the string, and the literal is interrupted.
- After the interruption, any amount of whitespace or comment tokens is allowed and ignored.
- The next meaningful token must be a pipe '|'. It's a compilation error if any other token (other than comments or whitespace) or EOF appears before the '|'.
- After the '|', the raw-string literal continues and may either end with a back-quote or be interrupted again by the subsequent line-terminating sequence.

Note that you don't need to specially escape pipes within the literals.

I see some advantages with such syntax:

1. You can comment (or comment out!)
part of a multiline string without terminating it:

String sql = `SELECT * FROM table
    // Negative entry ID = deleted entry
    | WHERE entryID >= 0`;

If you want, you can still make this comment part of the query (assuming the DBMS accepts // comments):

String sql = `SELECT * FROM table
    | // Negative entry ID = deleted entry
    | WHERE entryID >= 0`;

Commenting out code:

String html = `
/* | | Error | */ // single-line comments would work as well | Something wrong happened |
`;

2. Looking at a code fragment out of context (e.g. in a diff log), you understand that you are inside a multiline literal. E.g. consider reviewing a diff like

     | x++;
+    | if (x == 10) break;
     | foo(x);

Without pipes you could think it's Java code without any further consideration. But now it's clear that it's part of a multiline string (probably JavaScript!), so this is not direct Java logic, and you should check the broader context to understand what this literal is for.

3. You cannot accidentally make a big part of the program part of a multiline raw string just by forgetting to close the back-quote. A compilation error like "Multiline string must continue with a pipe token" will be issued right on the next line, not some obscure message five screens below, where the next raw string literal happens to start.

4. IDEs will easily distinguish between in-literal indentation and Java indentation and may allow you to adjust either one independently.

In general this greatly increases readability, clearly telling you at every line that you're not in Java, but inside something nested. You can easily nest a Java snippet into a Java snippet, use multiline raw strings inside it, and still not get lost!

String javaMethod = `public void dumpHtml() { | System.out.println(`` | | | | | |

HelloWorld!

| | | |``); |}` One pipe means one level inside, two pipes mean two levels inside. The only disadvantage I see in forcing a pipe prefix is inability to just paste a big snippet from somewhere to the middle of Java program in a plain text editor. However any decent IDE would support automatic addition of pipes on paste. If not, simple search-and-replace with regex like s/^/ |/ though the pasted content will do the thing. Even adding pipes manually is not that hard (I did this manually many times writing this letter). What do you think? [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html From forax at univ-mlv.fr Sat Jan 27 10:59:40 2018 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 27 Jan 2018 11:59:40 +0100 (CET) Subject: [raw-strings] Newline character handling In-Reply-To: References: Message-ID: <465103338.2523097.1517050780606.JavaMail.zimbra@u-pem.fr> Hi Tagir, you have miss this line: CR (\u000D) and CRLF (\u000D\u000A) sequences are always translated to LF (\u000A). This translation provides least surprise behavior across platforms. this is also the behavior of Perl, PHP, etc. as a guy that had to write too many shaders in Java recently, thanks for resurrecting this discussion, i think we should not wait another 10 years to add raw strings in Java. regards, R?mi ----- Mail original ----- > De: "Tagir Valeev" > ?: "amber-spec-experts" > Envoy?: Samedi 27 Janvier 2018 05:13:58 > Objet: [raw-strings] Newline character handling > Hello! > > I looked through Raw String Literals JEP draft [1] and did not find > any explicit statement about CR/LF translation within multiline raw > string. Usually in text files (and, I believe, Java source qualifies > as a text file) it's assumed that changing \n to \r\n and vice versa > would not change the semantics. Sometimes such changes are performed > automatically, e.g. on Git checkout via core.autocrlf=true setting > [2]. 
If multiline string literal is used, then such replacement may > badly affect the semantics of the program. E.g.: > > public class Hello { > public static void main(String[] args) { > System.out.println(`Hello > World!`.length()); > } > } > > The output of this program may change if its source text is converted > from CR/LF to LF line endings or vice versa. > > As far as I know, Kotlin forcibly replaces CR/LF to LF within > multiline strings, though I did not find any explicit statement about > this in the documentation. This looks a good compromise, though could > be annoying for people who actually want to encode CR/LF inside a > multiline string. Nevertheless, I feel, that the special handling of > line terminators within multiline strings (or absence of such > handling) should be explicitly mentioned in the JEP and the following > specification. > > With best regards, > Tagir Valeev. > > > [1] http://openjdk.java.net/jeps/8196004 > [2] https://help.github.com/articles/dealing-with-line-endings/ From forax at univ-mlv.fr Sat Jan 27 11:03:40 2018 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 27 Jan 2018 12:03:40 +0100 (CET) Subject: [raw-strings] Indentation problem In-Reply-To: References: Message-ID: <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr> The is a rule when you design a language, if you can do something in the compiler or in a library, do it in the library :) I do not thing it's a good idea to force the pipe prefix in the spec, and from an IDE point of view, you have to do more analysis but you can recognize the sequence ` ... `.trimMargin() in order to auto-indent things correctly. regards, R?mi ----- Mail original ----- > De: "Tagir Valeev" > ?: "amber-spec-experts" > Envoy?: Samedi 27 Janvier 2018 09:23:31 > Objet: [raw-strings] Indentation problem > Hello! > > Every language which implements the multiline strings has problems > with indentation. E.g. 
consider something like this: > > public class Multiline { > static String createHtml(String message) { > String html = ` > > Message > > `; > if (message != null) { > html += ` >

> Message: `+message+` >

`; > } > html += ` > > `; > return html; > } > } > > Here the indentation of embedded snippet breaks the indentation of the > Java program harming its readability. The overall structure of the > method is messed with generated HTML structure. This is not just bad > indentation which could be fixed by auto-formatting feature of IDE. > You cannot fix this without throwing away a multiline string syntax > and without changing the semantics. Some people sacrifice the > semantics, namely the indentation of generated output if output > language is indentation agnostic. HTML is mostly so, unless you have a >
 section. So one may "fix" it like this:
> 
> public class Multiline {
>  static String createHtml(String message) {
>    String html = `
>        
>          Message
>        
>        `;
>    if (message != null) {
>      html += `
>          

> Message: `+message+` >

`; > } > html += ` > > `; > return html; > } > } > > Now we have broken formatting in the generated HTML, which ruins the > idea of multiline strings (why bother to generate \n in output HTML if > it looks like a mess anyways?) Moreover, the structure of Java program > now affects the output. E.g. if you add several more nested "if" or > "switch" statement, you will need to indent

even more. > > Many languages provide library methods to handle this. E.g. > trimIndent() could be provided to remove leading spaces of every line, > but this would kill the HTML indents at all. Another possibility is to > provide a method like trimMargin() on Kotlin [1] which trims all > spaces before a special character (pipe by default) including a > special character itself. > > Assuming such method exists in Java, we can rewrite our method in a > prettier way preserving both Java and HTML formatting: > > public class Multiline { > static String createHtml(String message) { > String html = ` > | > | Message > | > | `.trimMargin(); > if (message != null) { > html += ` > |

> | Message: `+message+` > |

`.trimMargin(); > } > html += ` > | > |`.trimMargin(); > return html; > } > } > > This is almost nice. Even without syntax highlighting you can easily > distinguish between Java code and injected HTML code, you can indent > Java and HTML independently and HTML code does not clash with Java > code structure. The only problem is the necesity to call the > trimMargin() method. This means that original line is preserved in the > bytecode and during runtime and the trimming is processed every time > the method is called causing performance and memory handicap. This > problem could be minimized making trimMargin() a javac intrinsic. > Hoever even in this case it would be hard to enforce usage of this > method and I expect that tons of hard-to-read Java code will appear in > the wild, despite I believe that Java is about readability. > > So I propose to enforce such (or similar) format on language level > instead of adding a library method like "trimMargin()". The syntax > could be formalized like this: > > - Raw string starts with back-quote, ends with back-quote, as written > in draft before > - When line terminating sequence is encountered within a raw string, > the '\n' character is included into the string, and the literal is > interrupted > - After the interruption any amount of whitespace or comment tokens > are allowed and ignored > - The next meaningful token must be a pipe '|'. It's a compilation > error if any other token or EOF appears before '|' except comments or > whitespaces > - After '|' the raw-string literal continues and may either end with > back-quote or be interrupted again with the subsequent line > terminating sequence. > > Note the you don't need to especially escape the pipes within the literals. > > I see some advantages with such syntax: > 1. You can comment (or comment out!) 
a part of a multiline string > without terminating it: > > String sql = `SELECT * FROM table > // Negative entry ID = deleted entry > | WHERE entryID >= 0`; > > If you want, you can still make this comment a part of the query > (assuming the DBMS accepts // comments): > > String sql = `SELECT * FROM table > | // Negative entry ID = deleted entry > | WHERE entryID >= 0`; > > Commenting out code: > > String html = `
> /* | > | Error > | */ // single-line comments would work as well > | Something wrong happened > |
`; > > 2. Looking at a code fragment out of context (e.g. a diff log), you > understand that you are inside a multiline literal. E.g. consider > reviewing a diff like > > | x++; > + | if (x == 10) break; > | foo(x); > > Without pipes you could think that it's Java code without any further > consideration. But now it's clear that it's part of a multiline string > (probably JavaScript!), so this is not direct Java logic and you > should check the broader context to understand what this literal is > for. > > 3. You cannot accidentally make a big part of the program part of a > multiline raw string just by forgetting to close the back-quote. A > compilation error will be issued right on the next line, like > "Multiline string must continue with a pipe token", not some obscure > message five screens below where the next raw string literal happens > to start. > > 4. IDEs will easily distinguish between in-literal indentation and > Java indentation and may allow you to adjust one or the other > independently. > > In general this greatly increases readability, clearly telling you > at every line that you're not in Java, but inside something nested. > You can easily nest a Java snippet into a Java snippet and use multiline > raw strings inside and still not get lost! > > String javaMethod = `public void dumpHtml() { > | System.out.println(`` > | | > | | > | |

HelloWorld!

> | | > | |``); > |}` > > One pipe means one level inside, two pipes mean two levels inside. > > > The only disadvantage I see in forcing a pipe prefix is the inability to > just paste a big snippet from somewhere into the middle of a Java program > in a plain text editor. However, any decent IDE would support automatic > addition of pipes on paste. If not, a simple search-and-replace with a > regex like s/^/ |/ through the pasted content will do the job. Even > adding pipes manually is not that hard (I did this manually many times > writing this letter). > > What do you think? > > [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html From amaembo at gmail.com Sat Jan 27 11:30:16 2018 From: amaembo at gmail.com (Tagir Valeev) Date: Sat, 27 Jan 2018 18:30:16 +0700 Subject: [raw-strings] Newline character handling In-Reply-To: <465103338.2523097.1517050780606.JavaMail.zimbra@u-pem.fr> References: <465103338.2523097.1517050780606.JavaMail.zimbra@u-pem.fr> Message-ID: Ah, indeed. I missed this part somehow. Sorry for the noise then. With best regards, Tagir Valeev. On Sat, Jan 27, 2018 at 5:59 PM, Remi Forax wrote: > Hi Tagir, > you have missed this line: > CR (\u000D) and CRLF (\u000D\u000A) sequences are always translated to LF (\u000A). This translation provides least surprise behavior across platforms. > > This is also the behavior of Perl, PHP, etc. > > As a guy who had to write too many shaders in Java recently, thanks for resurrecting this discussion; I think we should not wait another 10 years to add raw strings to Java. > > regards, > Rémi > > ----- Original Message ----- >> From: "Tagir Valeev" >> To: "amber-spec-experts" >> Sent: Saturday, 27 January 2018 05:13:58 >> Subject: [raw-strings] Newline character handling > >> Hello! >> >> I looked through the Raw String Literals JEP draft [1] and did not find >> any explicit statement about CR/LF translation within multiline raw >> strings.
Usually in text files (and, I believe, Java source qualifies >> as a text file) it's assumed that changing \n to \r\n and vice versa >> would not change the semantics. Sometimes such changes are performed >> automatically, e.g. on Git checkout via the core.autocrlf=true setting >> [2]. If a multiline string literal is used, then such a replacement may >> badly affect the semantics of the program. E.g.: >> >> public class Hello { >> public static void main(String[] args) { >> System.out.println(`Hello >> World!`.length()); >> } >> } >> >> The output of this program may change if its source text is converted >> from CR/LF to LF line endings or vice versa. >> >> As far as I know, Kotlin forcibly replaces CR/LF with LF within >> multiline strings, though I did not find any explicit statement about >> this in the documentation. This looks like a good compromise, though it could >> be annoying for people who actually want to encode CR/LF inside a >> multiline string. Nevertheless, I feel that the special handling of >> line terminators within multiline strings (or absence of such >> handling) should be explicitly mentioned in the JEP and the following >> specification. >> >> With best regards, >> Tagir Valeev. >> >> >> [1] http://openjdk.java.net/jeps/8196004 >> [2] https://help.github.com/articles/dealing-with-line-endings/ From amaembo at gmail.com Sat Jan 27 11:37:01 2018 From: amaembo at gmail.com (Tagir Valeev) Date: Sat, 27 Jan 2018 18:37:01 +0700 Subject: [raw-strings] Indentation problem In-Reply-To: <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr> References: <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr> Message-ID: Hello! > There is a rule when you design a language: if you can do something in the compiler or in a library, do it in the library :) A library can indeed allow you (to some extent) to use better syntax. What a library cannot do is to disallow the worse syntax.
And this is the most important part of my suggestion: to prevent people from writing bad code, not to allow people to write better code. > I do not think it's a good idea to force the pipe prefix in the spec, Why do you think it's not a good idea? What are the possible disadvantages? Please share your concerns. Thanks. > and from an IDE point of view, you have to do more analysis but you can recognize the sequence ` ... `.trimMargin() in order to auto-indent things correctly. True, it's possible if one uses a trimMargin() call. But if bad code is already written, it would not be so easy to fix it automatically. We could automatically add trimMargin(), but determining which part of the indent should be moved to the left of the pipe cannot be done with 100% accuracy. With best regards, Tagir Valeev. > > regards, > Rémi > > ----- Original Message ----- >> From: "Tagir Valeev" >> To: "amber-spec-experts" >> Sent: Saturday, 27 January 2018 09:23:31 >> Subject: [raw-strings] Indentation problem > >> Hello! >> >> Every language that implements multiline strings has problems >> with indentation. E.g. consider something like this: >> >> public class Multiline { >> static String createHtml(String message) { >> String html = ` >> >> Message >> >> `; >> if (message != null) { >> html += ` >>

>> Message: `+message+` >>

`; >> } >> html += ` >> >> `; >> return html; >> } >> } >> >> Here the indentation of the embedded snippet breaks the indentation of the >> Java program, harming its readability. The overall structure of the >> method is mixed up with the generated HTML structure. This is not just bad >> indentation that could be fixed by the auto-formatting feature of an IDE. >> You cannot fix this without throwing away the multiline string syntax >> or without changing the semantics. Some people sacrifice the >> semantics, namely the indentation of the generated output, if the output >> language is indentation agnostic. HTML is mostly so, unless you have a >> <pre> section. So one may "fix" it like this:
>>
>> public class Multiline {
>>  static String createHtml(String message) {
>>    String html = `
>>        
>>          Message
>>        
>>        `;
>>    if (message != null) {
>>      html += `
>>          

>> Message: `+message+` >>

`; >> } >> html += ` >> >> `; >> return html; >> } >> } >> >> Now we have broken formatting in the generated HTML, which ruins the >> idea of multiline strings (why bother to generate \n in output HTML if >> it looks like a mess anyway?) Moreover, the structure of the Java program >> now affects the output. E.g. if you add several more nested "if" or >> "switch" statements, you will need to indent

even more. >> >> Many languages provide library methods to handle this. E.g. >> trimIndent() could be provided to remove leading spaces of every line, >> but this would kill the HTML indentation entirely. Another possibility is to >> provide a method like trimMargin() in Kotlin [1], which trims all >> spaces before a special character (pipe by default), including the >> special character itself. >> >> Assuming such a method existed in Java, we could rewrite our method in a >> prettier way, preserving both Java and HTML formatting: >> >> public class Multiline { >> static String createHtml(String message) { >> String html = ` >> | >> | Message >> | >> | `.trimMargin(); >> if (message != null) { >> html += ` >> |

>> | Message: `+message+` >> |

`.trimMargin(); >> } >> html += ` >> | >> |`.trimMargin(); >> return html; >> } >> } >> >> This is almost nice. Even without syntax highlighting you can easily >> distinguish between Java code and injected HTML code, you can indent >> Java and HTML independently, and HTML code does not clash with Java >> code structure. The only problem is the necessity to call the >> trimMargin() method. This means that the original literal is preserved in the >> bytecode and at runtime, and the trimming is performed every time >> the method is called, causing performance and memory overhead. This >> problem could be minimized by making trimMargin() a javac intrinsic. >> However, even in this case it would be hard to enforce usage of this >> method, and I expect that tons of hard-to-read Java code will appear in >> the wild, even though I believe that Java is about readability. >> >> So I propose to enforce such (or a similar) format at the language level >> instead of adding a library method like "trimMargin()". The syntax >> could be formalized like this: >> >> - A raw string starts with a back-quote and ends with a back-quote, as written >> in the draft before >> - When a line terminating sequence is encountered within a raw string, >> the '\n' character is included in the string, and the literal is >> interrupted >> - After the interruption any amount of whitespace or comment tokens >> are allowed and ignored >> - The next meaningful token must be a pipe '|'. It's a compilation >> error if any other token or EOF appears before '|' except comments or >> whitespace >> - After '|' the raw-string literal continues and may either end with a >> back-quote or be interrupted again by the subsequent line >> terminating sequence. >> >> Note that you don't need to specially escape the pipes within the literals. >> >> I see some advantages of such a syntax: >> 1. You can comment (or comment out!)
a part of a multiline string >> without terminating it: >> >> String sql = `SELECT * FROM table >> // Negative entry ID = deleted entry >> | WHERE entryID >= 0`; >> >> If you want, you can still make this comment a part of the query >> (assuming the DBMS accepts // comments): >> >> String sql = `SELECT * FROM table >> | // Negative entry ID = deleted entry >> | WHERE entryID >= 0`; >> >> Commenting out code: >> >> String html = `
>> /* | >> | Error >> | */ // single-line comments would work as well >> | Something wrong happened >> |
`; >> >> 2. Looking at a code fragment out of context (e.g. a diff log), you >> understand that you are inside a multiline literal. E.g. consider >> reviewing a diff like >> >> | x++; >> + | if (x == 10) break; >> | foo(x); >> >> Without pipes you could think that it's Java code without any further >> consideration. But now it's clear that it's part of a multiline string >> (probably JavaScript!), so this is not direct Java logic and you >> should check the broader context to understand what this literal is >> for. >> >> 3. You cannot accidentally make a big part of the program part of a >> multiline raw string just by forgetting to close the back-quote. A >> compilation error will be issued right on the next line, like >> "Multiline string must continue with a pipe token", not some obscure >> message five screens below where the next raw string literal happens >> to start. >> >> 4. IDEs will easily distinguish between in-literal indentation and >> Java indentation and may allow you to adjust one or the other >> independently. >> >> In general this greatly increases readability, clearly telling you >> at every line that you're not in Java, but inside something nested. >> You can easily nest a Java snippet into a Java snippet and use multiline >> raw strings inside and still not get lost! >> >> String javaMethod = `public void dumpHtml() { >> | System.out.println(`` >> | | >> | | >> | |

HelloWorld!

>> | | >> | |``); >> |}` >> >> One pipe means one level inside, two pipes mean two levels inside. >> >> >> The only disadvantage I see in forcing a pipe prefix is the inability to >> just paste a big snippet from somewhere into the middle of a Java program >> in a plain text editor. However, any decent IDE would support automatic >> addition of pipes on paste. If not, a simple search-and-replace with a >> regex like s/^/ |/ through the pasted content will do the job. Even >> adding pipes manually is not that hard (I did this manually many times >> writing this letter). >> >> What do you think? >> >> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html From forax at univ-mlv.fr Mon Jan 29 11:01:58 2018 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Mon, 29 Jan 2018 12:01:58 +0100 (CET) Subject: [raw-strings] Indentation problem In-Reply-To: References: <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr> Message-ID: <1338120311.230249.1517223718272.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Tagir Valeev" > To: "Remi Forax" > Cc: "amber-spec-experts" > Sent: Saturday, 27 January 2018 12:37:01 > Subject: Re: [raw-strings] Indentation problem > Hello! Hi! > >> There is a rule when you design a language: if you can do something in the >> compiler or in a library, do it in the library :) > > A library can indeed allow you (to some extent) to use better syntax. > What a library cannot do is to disallow the worse syntax. And this is the > most important part of my suggestion: to prevent people from writing > bad code, not to allow people to write better code. Non-indented code doesn't always equal bad code; think about minification, for example. > >> I do not think it's a good idea to force the pipe prefix in the spec, > > Why do you think it's not a good idea? What are the possible > disadvantages? Please share your concerns. Thanks. Coding conventions, code formatting rules, etc.
change over time as we learn as a community; baking these kinds of considerations into a language will make your language forever live in the same decade. The fewer 'by defaults' you have in a language, the better it looks when you look back. > >> and from an IDE point of view, you have to do more analysis but you can >> recognize the sequence ` ... `.trimMargin() in order to auto-indent things >> correctly. > > True, it's possible if one uses a trimMargin() call. But if bad code is > already written, it would not be so easy to fix it automatically. We > could automatically add trimMargin(), but determining which part of the > indent should be moved to the left of the pipe cannot be done with > 100% accuracy. I was thinking about using .trimMargin() as a hint that the user wants IDEs to re-format the code correctly (by detecting the pipe), not forcing users to follow a specific convention. > > With best regards, > Tagir Valeev. > regards, Rémi >> >> regards, >> Rémi >> >> ----- Original Message ----- >>> From: "Tagir Valeev" >>> To: "amber-spec-experts" >>> Sent: Saturday, 27 January 2018 09:23:31 >>> Subject: [raw-strings] Indentation problem >> >>> Hello! >>> >>> Every language that implements multiline strings has problems >>> with indentation. E.g. consider something like this: >>> >>> public class Multiline { >>> static String createHtml(String message) { >>> String html = ` >>> >>> Message >>> >>> `; >>> if (message != null) { >>> html += ` >>>

>>> Message: `+message+` >>>

`; >>> } >>> html += ` >>> >>> `; >>> return html; >>> } >>> } >>> >>> Here the indentation of the embedded snippet breaks the indentation of the >>> Java program, harming its readability. The overall structure of the >>> method is mixed up with the generated HTML structure. This is not just bad >>> indentation that could be fixed by the auto-formatting feature of an IDE. >>> You cannot fix this without throwing away the multiline string syntax >>> or without changing the semantics. Some people sacrifice the >>> semantics, namely the indentation of the generated output, if the output >>> language is indentation agnostic. HTML is mostly so, unless you have a >>> <pre> section. So one may "fix" it like this:
>>>
>>> public class Multiline {
>>>  static String createHtml(String message) {
>>>    String html = `
>>>        
>>>          Message
>>>        
>>>        `;
>>>    if (message != null) {
>>>      html += `
>>>          

>>> Message: `+message+` >>>

`; >>> } >>> html += ` >>> >>> `; >>> return html; >>> } >>> } >>> >>> Now we have broken formatting in the generated HTML, which ruins the >>> idea of multiline strings (why bother to generate \n in output HTML if >>> it looks like a mess anyway?) Moreover, the structure of the Java program >>> now affects the output. E.g. if you add several more nested "if" or >>> "switch" statements, you will need to indent

even more. >>> >>> Many languages provide library methods to handle this. E.g. >>> trimIndent() could be provided to remove leading spaces of every line, >>> but this would kill the HTML indentation entirely. Another possibility is to >>> provide a method like trimMargin() in Kotlin [1], which trims all >>> spaces before a special character (pipe by default), including the >>> special character itself. >>> >>> Assuming such a method existed in Java, we could rewrite our method in a >>> prettier way, preserving both Java and HTML formatting: >>> >>> public class Multiline { >>> static String createHtml(String message) { >>> String html = ` >>> | >>> | Message >>> | >>> | `.trimMargin(); >>> if (message != null) { >>> html += ` >>> |

>>> | Message: `+message+` >>> |

`.trimMargin(); >>> } >>> html += ` >>> | >>> |`.trimMargin(); >>> return html; >>> } >>> } >>> >>> This is almost nice. Even without syntax highlighting you can easily >>> distinguish between Java code and injected HTML code, you can indent >>> Java and HTML independently, and HTML code does not clash with Java >>> code structure. The only problem is the necessity to call the >>> trimMargin() method. This means that the original literal is preserved in the >>> bytecode and at runtime, and the trimming is performed every time >>> the method is called, causing performance and memory overhead. This >>> problem could be minimized by making trimMargin() a javac intrinsic. >>> However, even in this case it would be hard to enforce usage of this >>> method, and I expect that tons of hard-to-read Java code will appear in >>> the wild, even though I believe that Java is about readability. >>> >>> So I propose to enforce such (or a similar) format at the language level >>> instead of adding a library method like "trimMargin()". The syntax >>> could be formalized like this: >>> >>> - A raw string starts with a back-quote and ends with a back-quote, as written >>> in the draft before >>> - When a line terminating sequence is encountered within a raw string, >>> the '\n' character is included in the string, and the literal is >>> interrupted >>> - After the interruption any amount of whitespace or comment tokens >>> are allowed and ignored >>> - The next meaningful token must be a pipe '|'. It's a compilation >>> error if any other token or EOF appears before '|' except comments or >>> whitespace >>> - After '|' the raw-string literal continues and may either end with a >>> back-quote or be interrupted again by the subsequent line >>> terminating sequence. >>> >>> Note that you don't need to specially escape the pipes within the literals. >>> >>> I see some advantages of such a syntax: >>> 1. You can comment (or comment out!)
a part of a multiline string >>> without terminating it: >>> >>> String sql = `SELECT * FROM table >>> // Negative entry ID = deleted entry >>> | WHERE entryID >= 0`; >>> >>> If you want, you can still make this comment a part of the query >>> (assuming the DBMS accepts // comments): >>> >>> String sql = `SELECT * FROM table >>> | // Negative entry ID = deleted entry >>> | WHERE entryID >= 0`; >>> >>> Commenting out code: >>> >>> String html = `
>>> /* | >>> | Error >>> | */ // single-line comments would work as well >>> | Something wrong happened >>> |
`; >>> >>> 2. Looking at a code fragment out of context (e.g. a diff log), you >>> understand that you are inside a multiline literal. E.g. consider >>> reviewing a diff like >>> >>> | x++; >>> + | if (x == 10) break; >>> | foo(x); >>> >>> Without pipes you could think that it's Java code without any further >>> consideration. But now it's clear that it's part of a multiline string >>> (probably JavaScript!), so this is not direct Java logic and you >>> should check the broader context to understand what this literal is >>> for. >>> >>> 3. You cannot accidentally make a big part of the program part of a >>> multiline raw string just by forgetting to close the back-quote. A >>> compilation error will be issued right on the next line, like >>> "Multiline string must continue with a pipe token", not some obscure >>> message five screens below where the next raw string literal happens >>> to start. >>> >>> 4. IDEs will easily distinguish between in-literal indentation and >>> Java indentation and may allow you to adjust one or the other >>> independently. >>> >>> In general this greatly increases readability, clearly telling you >>> at every line that you're not in Java, but inside something nested. >>> You can easily nest a Java snippet into a Java snippet and use multiline >>> raw strings inside and still not get lost! >>> >>> String javaMethod = `public void dumpHtml() { >>> | System.out.println(`` >>> | | >>> | | >>> | |

HelloWorld!

>>> | | >>> | |``); >>> |}` >>> >>> One pipe means one level inside, two pipes mean two levels inside. >>> >>> >>> The only disadvantage I see in forcing a pipe prefix is the inability to >>> just paste a big snippet from somewhere into the middle of a Java program >>> in a plain text editor. However, any decent IDE would support automatic >>> addition of pipes on paste. If not, a simple search-and-replace with a >>> regex like s/^/ |/ through the pasted content will do the job. Even >>> adding pipes manually is not that hard (I did this manually many times >>> writing this letter). >>> >>> What do you think? >>> > >> [1] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html From john.r.rose at oracle.com Tue Jan 30 00:19:09 2018 From: john.r.rose at oracle.com (John Rose) Date: Mon, 29 Jan 2018 16:19:09 -0800 Subject: [raw-strings] Indentation problem In-Reply-To: <1338120311.230249.1517223718272.JavaMail.zimbra@u-pem.fr> References: <233027356.2523249.1517051020375.JavaMail.zimbra@u-pem.fr> <1338120311.230249.1517223718272.JavaMail.zimbra@u-pem.fr> Message-ID: Sorry, Tagir, I'm with Remi on this one. Baking conventions into a language is dangerous, and must be proven either canonical in some way or harmless. (Floating point tokens are canonical by reference to C; javadoc was a harmless design decision in the context of Java.) Tagir's proposals for indentation control are very clever but neither canonical nor harmless. Jim's proposal is carefully designed to be as minimal and simple as possible. I expect most of the feedback will be, like Tagir's, to the effect that Jim has missed some crucial feature, so that if only the proposal were less minimal in some way it would be better. But minimality itself is a feature, one to be protected and defended, especially against committees of people each of whom has their favorite non-negotiable.
That said, I will (in another message) argue (in a different way) that Jim's proposal is very slightly too simple, in a way that makes it fail to meet its primary goal. Spoiler: I think I can prove that Markdown code quoting is appropriately minimal in its design, in a way Jim's is not. -- John From brian.goetz at oracle.com Tue Jan 30 20:01:15 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 30 Jan 2018 15:01:15 -0500 Subject: [raw-strings] Indentation problem In-Reply-To: References: Message-ID: <8B9EB16F-EFCB-4014-9031-91BCE7AA304E@oracle.com> In my opinion, this is actually the central motivation for the feature: minimizing the friction (both reading and writing) of embedding small snippets from other programming languages in Java sources. So this disadvantage is a pretty big one. That said, a library method to remove the piping (some variant of trimIndent) makes sense, so people who want to program this way can do so easily. > The only disadvantage I see in forcing a pipe prefix is the inability to > just paste a big snippet from somewhere into the middle of a Java program > in a plain text editor.
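Brian's suggestion of an opt-in library method that removes the piping can be illustrated with a small Java sketch. Everything here is hypothetical: `Margins` and `trimMargin` are illustrative names, not a JDK API. The behavior follows the description of Kotlin's trimMargin() quoted in this thread (strip the leading whitespace plus the margin prefix from each line), together with Kotlin's documented rule of dropping a blank first and last line. The sketch assumes line endings have already been normalized to LF, which the draft JEP's CR/CRLF translation rule quoted above would guarantee. It uses ordinary string literals because the backtick syntax was only a draft, and `String.isBlank()` requires Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, loosely modeled on Kotlin's trimMargin(); not part of the JDK. */
public final class Margins {
    private Margins() {}

    /**
     * Removes leading whitespace followed by marginPrefix from every line.
     * Lines without the prefix are kept unchanged; a blank first or last
     * line is dropped. Assumes LF line endings.
     */
    public static String trimMargin(String text, String marginPrefix) {
        if (marginPrefix.isBlank()) {
            throw new IllegalArgumentException("marginPrefix must not be blank");
        }
        String[] lines = text.split("\n", -1);      // limit -1 keeps trailing empty strings
        List<String> result = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            boolean firstOrLast = i == 0 || i == lines.length - 1;
            if (firstOrLast && line.isBlank()) {
                continue;                            // drop a blank first/last line
            }
            int start = 0;
            while (start < line.length() && Character.isWhitespace(line.charAt(start))) {
                start++;                             // skip the Java-side indentation
            }
            if (line.startsWith(marginPrefix, start)) {
                result.add(line.substring(start + marginPrefix.length()));
            } else {
                result.add(line);                    // no margin prefix: leave as-is
            }
        }
        return String.join("\n", result);
    }

    /** Uses the pipe as the default margin prefix, as in the thread. */
    public static String trimMargin(String text) {
        return trimMargin(text, "|");
    }

    public static void main(String[] args) {
        String html = "\n"
                + "        |<html>\n"
                + "        |  <body>Message</body>\n"
                + "        |</html>\n"
                + "        ";
        System.out.println(trimMargin(html));
    }
}
```

Running `main` prints the three HTML lines with their relative indentation intact and the Java-side indentation removed. Because lines without a margin prefix pass through unchanged, the helper remains a convention that users opt into, which is the position Remi and Brian take, rather than a rule enforced by the literal syntax.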