From forax at univ-mlv.fr Wed May 1 12:32:40 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 1 May 2019 14:32:40 +0200 (CEST)
Subject: Feedback on Sealed Types
In-Reply-To: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com>
References: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com>
Message-ID: <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Alan Malloy"
> Cc: "amber-spec-experts"
> Sent: Monday, 29 April 2019 23:01:17
> Subject: Re: Feedback on Sealed Types

> It would be nice if we could "just" overload enum itself to support a
> record-like option:
>
>     enum Node {
>         AddNode(Node a, Node b),
>         MulNode(Node a, Node b),
>         ...;
>         ...
>     }
>
> but unfortunately that syntax looks confusingly close to something else :( It
> also doesn't really scale to multi-level hierarchies, though that might be OK.
> It's worth thinking about, though. The `data` construct from Haskell is surely
> more direct than modeling a sum with sealed interfaces (though, the latter is
> also more flexible than `data`.)
>
> On the other hand, declaring a sum of records as:
>
>     sealed interface I {
>         record A(int a) implements I { }
>         record B(long b) implements I { }
>         record C(String c) implements I { }
>     }
>
> isn't so bad. The main redundancy here is primarily "implements I", and
> secondarily "record". We can surely compress that away like this:
>
>     enum interface I {
>         A(int a),
>         B(long b),
>         C(String c);
>         // I methods
>     }
>
> but I am not sure it carries its weight, given that it adds little additional
> concision (and no additional semantics.)

It may solve the enclosing issue because the ';' syntactically separates A, B
and C from the content of I, which is declared after the ';', so A, B and C can
be top-level.

I kind of like the intellectual separation between
- a sealed interface, which represents a closed type and requires a permits
  clause, and
- an enum interface, which represents a sum type and is sugar on top of
  sealed interface + records.

One interesting question is how to desugar an enum interface with a component
that has no parameter, like

    enum interface Option<T> { Some(T value), Empty }

If there is only one constant of type Empty and the construction is typesafe,
it can be a huge win.
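
To make the desugaring question concrete, here is a minimal sketch of one
possible answer, written against sealed types and records as they later shipped
(Java 17 or newer). It is only an illustration, not what the proposal specifies:
the wrapper class OptionSketch, the factory empty() and the helper getOrElse are
hypothetical names, and mapping the parameterless component to a one-constant
enum is an assumption.

```java
// Hypothetical desugaring of: enum interface Option<T> { Some(T value), Empty }
// Only Option, Some and Empty come from the message; the rest is assumed.
final class OptionSketch {

    // The sum type: only Some and Empty may implement it.
    sealed interface Option<T> permits Some, Empty {

        // Type-safe access to the single Empty constant. The unchecked cast is
        // harmless because Empty never holds a value of type T.
        @SuppressWarnings("unchecked")
        static <T> Option<T> empty() {
            return (Option<T>) (Option<?>) Empty.INSTANCE;
        }
    }

    // A component with parameters becomes a record.
    record Some<T>(T value) implements Option<T> { }

    // A component without parameters becomes one shared constant.
    enum Empty implements Option<Object> { INSTANCE }

    // Returns the wrapped value, or the fallback when the option is Empty.
    static <T> T getOrElse(Option<T> option, T fallback) {
        if (option instanceof Some<T> some) {
            return some.value();
        }
        return fallback;
    }

    public static void main(String[] args) {
        Option<String> present = new Some<>("hello");
        Option<String> absent = Option.empty();
        System.out.println(getOrElse(present, "none"));
        System.out.println(getOrElse(absent, "none"));
    }
}
```

The unchecked cast sits in exactly one place, the empty() factory, much as
Collections.emptyList() does it; callers never see it.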
Rémi

> On 4/29/2019 4:33 PM, Alan Malloy wrote:
>> Thanks, Brian. I indeed didn't think of some of your proposed benefits of
>> sealing non-sum types, as I was focused mostly on things you mentioned
>> explicitly in the JEP, which is somewhat light on the expected benefits.
>>
>> I think the first two items in your "challenges" solve each other: I don't
>> intend sum types to be the only kind of sealed type, but just a good way to
>> declare the simplest kind. I left out the "record" keyword from the declaration
>> with the idea that it would be implicit: if you want the convenient
>> sum-of-products declaration style, you have to use records. If you want
>> something more complicated, you declare a sealed interface (or superclass), and
>> N permitted subclasses, declared separately in whatever way you want. This
>> restriction helps by making the semantics clearer, and I had also hoped that it
>> would lead to a syntax error if you leave out the comma. Looking more closely,
>> I see this is somewhat precarious: a record declaration looks enough like a
>> method signature that they may be ambiguous in an interface, if you don't
>> require the "record" keyword, or if you use a semicolon instead of a comma. I
>> think it can still work if we require each nested record to use {...} instead
>> of ; even if it's empty. This way, your two examples look like
>>
>>     interface X {
>>         class X1 { ... }
>>         class X2 { ... }
>>     }
>>
>> and
>>
>>     enumerated interface Y {
>>         Y1 { ... },
>>         Y2 { ... }
>>     }
>>
>> The latter would become illegal if you dropped the comma, even if you also
>> forgot the "enumerated" keyword, because the braces make no sense in an
>> ordinary interface.
>>
>> On Mon, Apr 29, 2019 at 12:35 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>> Thanks Alan, for this nice exploration. There's a lot to respond to. I'll start
>>> with some general comments about sealing, and then move on to your alternate
>>> proposal for exposing it.
>>> I can think of several main reasons why you would want to seal a hierarchy.
>>>
>>> - To say something about the _interface itself_. That is, that it was not
>>> designed as a general purpose contract between arms-length entities, but that
>>> it exists only as a common super type for a fixed set of classes that are part
>>> of the library. In other words, "please don't implement me."
>>> - To say something about the semantics of the type. Several of the examples in
>>> your report fall into this category: "a DbResult is either a NoRowsFound or a
>>> Rows(List)". This tells users exactly what the reasonable range of results
>>> to expect are when doing a query. Of course, the spec could say the same thing,
>>> but that involves reading and interpreting the spec. Easier if this conclusion
>>> can be driven by types (and IDEs can help more here too.)
>>> - To strengthen typing by simulating unions. If my method is going to return
>>> either a String or a Number, the common super type is Object. (Actually, it's
>>> some variant of Serializable & Comparable.) Sums-of-products
>>> allow library authors to make a stronger statement about types in the presence
>>> of unions. Exposing a sum of StringHolder(String) and NumberHolder(Number),
>>> using records and sealed types, is not so ceremonious, so some library
>>> developers might choose to do this instead of Object.
>>> - Security. Some libraries may want to know that the code they are calling is
>>> part of their library, rather than an arbitrary implementation of some
>>> interface.
>>> - To aid in exhaustiveness. We've already discussed this at length; your point
>>> is that this one doesn't come up as often as one might hope.
>>>
>>> Not only is there an obvious synergy between sums and products (as many
>>> languages have demonstrated), but there is a third factor, which is "if you
>>> make it easy enough, people will use it more."
>>> Clearly records are pretty easy
>>> to use; your point is that if there were a more streamlined sum-of-products
>>> idiom, the third factor would be even stronger here. I think algebraic data
>>> types are one of those things that will take some time for developers to learn
>>> to appreciate; the easier we make it, of course the faster that will happen.
>>>
>>> Now, to your syntax suggestion. Overall, I like the idea, but I have some
>>> concerns. First, the good parts:
>>>
>>> - The connection with enums is powerful. Users already understand enums, so this
>>> will help them understand sums. Enums have special treatment in switch; we want
>>> the same treatment for sealed type patterns. Enums have special treatment for
>>> exhaustiveness; we want the same for sealed type patterns. So tying these
>>> together with some more general enum-ness leans on what people already know.
>>> - While sums and products are theoretically independent features,
>>> sums-of-products are expected to be quite common. So it might be reasonable to
>>> encourage this syntactically.
>>> - The current proposal has some redundancy, in that the subtypes have to say
>>> "implements Node", even if they are nested within Node. With a stronger
>>> mechanism for declaring them, as you propose, that can safely be left
>>> implicit.
>>> - I confess that I too like the simplicity of Haskell's `data` declaration, and
>>> this brings us closer.
>>>
>>> Now, the challenges:
>>>
>>> - The result is still a little busy. We need a modifier for "enumerated type",
>>> and we would also need to be able to have child types be not only records, but
>>> ordinary classes and interfaces. So we'd have to have a place for "record",
>>> "class", or "interface" with the declaration of the enumerated classes (as well
>>> as other modifiers.) That busies up the result a bit.
>>> - Once we do this, I worry that it will be hard to tell the difference between:
>>>
>>>     interface X {
>>>         class X1 { ... }
>>>         class X2 { ... }
>>>     }
>>>
>>> and
>>>
>>>     enumerated interface Y {
>>>         class Y1 { ... },
>>>         class Y2 { ... }
>>>     }
>>>
>>> and that users will forever be making mistakes like forgetting the comma, or
>>> putting it where it doesn't belong.
>>> - This mechanism addresses the very common case of sum-of-product, but leaves
>>> more esoteric sums out of the picture. (Consider the types in
>>> java.lang.constant, which really want to be sealed.) There, because they are
>>> not co-declared, we'd need something more like
>>>
>>>     sealed interface ConstantDesc
>>>         permits ClassDesc, MethodTypeDesc, ... { }
>>>
>>> It's possible that such a mechanism can be grafted on to your proposal, or there
>>> is a shuffling that supports it.
>>>
>>>> On Apr 29, 2019, at 2:28 PM, Alan Malloy <amalloy at google.com> wrote:
>>>> Hello again, amber-spec-experts. I have another report from the Google codebase,
>>>> this time focusing on sealed types. It is viewable in full Technicolor HTML at
>>>> http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks
>>>> again to Liam for hosting), and included below as plain text:
>>>>
>>>> Author: Alan Malloy (amalloy at google.com)
>>>> Published: 2019-04-29
>>>>
>>>> Feedback on Sealed Types
>>>>
>>>> Hello again, amber-spec-experts. I'm back with a second Google codebase research
>>>> project. I'm looking again at the Records & Sealed Types proposal (which has
>>>> now become JDK-8222777), but this time I'm focusing on sealed types instead of
>>>> records, as promised in my RFC of a few weeks ago. My goal was to investigate
>>>> Google's codebase to guess what developers might have done differently if they
>>>> had access to sealed types. This could help us understand what works in the
>>>> current proposal and what to consider changing.
>>>>
>>>> Unlike my previous report, this one contains more anecdotes than statistics.
>>>> It wound up being difficult to build static analysis to categorize the
>>>> interesting cases, so I mostly examined promising candidates by hand.
>>>>
>>>> Summary and Recommendations
>>>>
>>>> For those who don't care to read through all my anecdotes, I first provide a
>>>> summary of my findings, and one suggested addition.
>>>>
>>>> Sealed types, as proposed so far, are a good idea in theory: Java already has
>>>> product types and open polymorphism, and sealed types give us closed
>>>> polymorphism. However, I could not find many cases of code being written today
>>>> that would be greatly enhanced if sealed types were available. The main selling
>>>> point of sealed types for application authors is getting help from the compiler
>>>> with exhaustiveness checking, but in practice developers almost always have a
>>>> default case, because they are only interested in a subset of the possible
>>>> subclasses, and want to ignore cases they don't understand. This means that
>>>> exhaustiveness-checking for pattern matches would mostly go unused if
>>>> developers rewrote their existing code using sealed types.
>>>>
>>>> Pattern matching is great, and can replace visitors in many cases, but this does
>>>> not depend on sealed types except for exhaustiveness checks (which, again,
>>>> would go mostly unused in code written today). The class hierarchies for which
>>>> people define visitors today are just too large to write an exhaustive pattern
>>>> match, and so a default case would be very common.
>>>>
>>>> The other audience for sealed types is library authors. While in practice most
>>>> developers have no great need to forbid subclasses, perhaps it would be a boon
>>>> for authors of particularly popular libraries, who need to expose a non-final
>>>> class as an implementation detail but don't intend for consumers to create
>>>> their own subclasses.
>>>> Those authors can already include documentation saying
>>>> "you really should not extend this", but there is always some weirdo out there
>>>> who will ignore your warnings and then write an angry letter when the next
>>>> version of your library breaks his program (see: sun.misc.Unsafe). Authors of
>>>> such libraries would welcome the opportunity to make it truly impossible to
>>>> create undesirable subclasses.
>>>>
>>>> Sealed Types As a Vehicle For Sum Types
>>>>
>>>> So, sealed types as-is would be an improvement, but a niche one, used by few. I
>>>> think we can get substantially more mileage out of them if we also include a
>>>> more cohesive way to explicitly define a sum type and all its subtypes in one
>>>> place with minimal ceremony. Such a sum type could be sealed, implicitly or
>>>> explicitly. A tool like this takes what I see as the "theoretical" advantage of
>>>> sum types (closed polymorphism), and makes it "practical" by putting it front
>>>> and center. Making sums an actual language element instead of something
>>>> "implied" by sealing a type and putting its subclasses nearby could help in a
>>>> lot of ways:
>>>>
>>>> * Developers might more often realize that a sealed/sum type is a good model for
>>>> their domain. Currently it's a "pattern" external to the language instead of a
>>>> "feature", and many don't realize it could be applied to their domain. Putting
>>>> it in the language raises its profile, addressing the problem that people don't
>>>> realize they want it.
>>>> * The compiler could provide help for defining simple sums-of-products, while
>>>> making it possible to opt into more complicated subclasses, in much the way
>>>> that enums do: the typical enum just has bare constants like EAST, but you can
>>>> add constructor arguments or override methods when necessary.
>>>> * The ability to more easily model data in this way may result in developers
>>>> writing more classes that are amenable to sealing/sums, as they do in other
>>>> languages with explicit sum types (Haskell, Kotlin, Scala). Then, the
>>>> exhaustiveness-checking feature that sealed types provide would pull more
>>>> weight.
>>>>
>>>> Since enum types are "degenerate sum types", the syntax for defining sums can
>>>> borrow heavily from enums. A sketch of the syntax I imagine for such things (of
>>>> course, I am not married to it):
>>>>
>>>>     public type-enum interface BinaryTree<T> {
>>>>         Leaf {
>>>>             @Override public Stream<T> elements() {return Stream.empty();}
>>>>         },
>>>>         Node(T data, BinaryTree<T> left, BinaryTree<T> right) {
>>>>             @Override public Stream<T> elements() {
>>>>                 return Stream.concat(left.elements(),
>>>>                     Stream.concat(Stream.of(data), right.elements()));
>>>>             }
>>>>         };
>>>>         public Stream<T> elements();
>>>>     }
>>>>
>>>> Like enums, you can use a bare identifier for simple types that exist only to be
>>>> pattern-matched against, but you can add fields and/or override blocks as
>>>> necessary. The advantage over declaring a sealed type separately from its
>>>> elements is both concision (the compiler infers visible records, superclass,
>>>> and all type parameters) and clarity: you state your intention firmly. I think
>>>> a convenient syntax like this will encourage developers to use the powerful
>>>> tool of sealed types to model their data.
>>>>
>>>> Evidence in Google's Codebase
>>>>
>>>> If you are just interested in recommendations, you can stop reading now: they
>>>> are all included in the summary. What follows is a number of anecdotes, or case
>>>> studies if you prefer, that led me to the conclusions above. Each shows a type
>>>> that might have been written as a sealed type, and hopefully highlights a
>>>> different facet of the question of how sealed types can be useful.
>>>>
>>>> The first thing I looked for was classes which are often involved in instanceof
>>>> checks.
>>>> As language authors, we imagine people writing stuff like this[1] all
>>>> the time:
>>>>
>>>>     interface Expr {int eval(Scope s);}
>>>>     record Var(String name) implements Expr {
>>>>         public int eval(Scope s) {return s.get(name);}
>>>>     }
>>>>     record Sum(Expr left, Expr right) implements Expr {
>>>>         public int eval(Scope s) {return left.eval(s) + right.eval(s);}
>>>>     }
>>>>     class Analyzer {
>>>>         Stream<String> variablesUsed(Expr e) {
>>>>             if (e instanceof Var) return Stream.of(((Var)e).name);
>>>>             if (e instanceof Sum) {
>>>>                 return Stream.concat(variablesUsed(((Sum)e).left),
>>>>                     variablesUsed(((Sum)e).right));
>>>>             }
>>>>             throw new IllegalArgumentException();
>>>>         }
>>>>     }
>>>>
>>>> Here, the Expr interface captures some of the functionality shared by all
>>>> expressions, but later a client (Analyzer) came along and invented some other
>>>> polymorphic operations to perform on an Expr, which Expr did not support. So
>>>> Analyzer needed to do instanceof checks instead, externalizing the
>>>> polymorphism. The principled approach would have been for Expr to export a
>>>> visitor to begin with, but perhaps it wasn't seen as worth the trouble at the
>>>> time.
>>>>
>>>> To try to find this pattern in the wild, I searched for method bodies which
>>>> perform multiple instanceof checks against the same variable. Notably, this
>>>> excludes the typical equals(Object) method, which only performs a single check.
>>>> For each such variable, I noted:
>>>>
>>>> 1. Its declared type
>>>> 2. The set of subtypes it was checked for with instanceof
>>>> 3. The common supertype of those subtypes.
>>>>
>>>> I guessed that (3) would usually be the same as (1), but in practice 55% of the
>>>> time they were different. Often, the declared type was Object, or some generic
>>>> type variable which erases to Object, while the common supertype being tested
>>>> was something like Number, Event, or Node.
>>>> For example, a Container knows it
>>>> will be used in some context where NaN is unsuitable, so it checks whether its
>>>> contents are Float or Double, and if so ensures NaN is not stored. As a second
>>>> example, a serialize(Object) method checks whether its input is String or
>>>> ByteString, and throws an exception otherwise.
>>>>
>>>> Bad sealed types found looking at instanceof checks
>>>>
>>>> I looked through the most popular declared types of these candidates, to
>>>> investigate which types are often involved in such checks. Most of them are not
>>>> good candidates for a sealed type. Object was the most common type, followed by
>>>> Exception and Throwable.
>>>>
>>>> Next up is an internal DOMObject class, which sounds promising until I tell you
>>>> it has thousands of direct subclasses. Nobody is doing exhaustive switches on
>>>> this, of course. Instead, many uses iterate over a Collection<DOMObject>, or
>>>> receive a DOMObject in some way, and just check whether it is of one or two
>>>> specific subtypes they care about. This turned out to be a very common pattern,
>>>> not just for DOMObject, but for many candidate sealed types I found: nobody
>>>> does exhaustive case analysis. They just look for objects they understand in
>>>> some much larger hierarchy, and ignore the rest.
>>>>
>>>> Some more humorous types that are often involved in instanceof checks:
>>>> java.net.InetAddress (everyone wants to know if it's v4 or v6) and
>>>> com.sun.source.tree.Tree, in our static-analysis tools. Tree is an interesting
>>>> case: here we do exactly what I mentioned previously for DOMObject. On the
>>>> surface it seems that Tree would be a good candidate for a sealed interface
>>>> with record subtypes, but in practice I'm not sure what sealing would buy us.
>>>> We would effectively opt out of exhaustiveness-checking by having a large
>>>> default case, or by extending a visitor with default-empty methods.
>>>> Of course,
>>>> sometimes we define a new visitor to do some polymorphic operation over a Tree,
>>>> but more often we just look for one or two subtypes we care about. For example,
>>>> DataFlow inspects a Tree, but knows from context that it is either a
>>>> LambdaExpressionTree, MethodTree, or an initializer.
>>>>
>>>> Plausible sealed types found looking at instanceof checks
>>>>
>>>> The previous section notwithstanding, I did dig deep enough into the results to
>>>> find a few classes that could make good sealed types. The most prominent, and
>>>> most interesting, was another AST. There is an abstract Node class for
>>>> representing HTML documents. It has just 4 subclasses defined in the same file:
>>>> Text, Comment, Tag, and EndTag. This spartan definition suggests it's used for
>>>> something like SAX parsing, but I didn't confirm this. It does everything you
>>>> could hope for from a type like this: it exposes a Visitor, it provides an
>>>> accept(Visitor) method, and the superclass specifies abstract methods for a
>>>> couple of the most common things you would want to do, such as a String
>>>> toHtml() method.
>>>>
>>>> However, recall that I found this class by looking for classes often involved in
>>>> instanceof checks! Some people use the visitor, but why doesn't everyone? The
>>>> first reason I found is one I've mentioned several times already: clients only
>>>> care about one of the 4 cases, and may have felt creating an anonymous visitor
>>>> is too much ceremony. Would they be happy with a switch and a default clause?
>>>> Probably, but it's hard to know for sure. The second reason surprised me a bit:
>>>> I found clients doing analysis that isn't really amenable to any one visitor,
>>>> or a simple pattern-match.
>>>> They've written this:
>>>>
>>>>     if (mode1) { if (x instanceof Tag) {...} }
>>>>     else if (mode2) { if (x instanceof Text) {...} }
>>>>
>>>> The same use site cares about different subclasses at different times, depending
>>>> on some other flag(s) controlling its behavior. Even if we offered a
>>>> pattern-match on x, it's difficult to encode the flags correctly. They would
>>>> have to match on a tuple of (mode1, mode2, x), with a case for (true, _, Tag
>>>> name) and another for (false, true, Text text). Technically possible, but not
>>>> really prettier than what they already have, especially since you would need to
>>>> use a local record instead of an anonymous tuple.
>>>>
>>>> Even so, I think this would have benefited from being a sealed type. Recall that
>>>> earlier I carefully said "4 subclasses defined in the same file". This is
>>>> because some jokester in a different package altogether has defined their own
>>>> fifth subclass, Doctype. They have their own sub-interface of Visitor that
>>>> knows about Doctype nodes. I can't help but feel that the authors of Node would
>>>> have preferred to make this illegal, if they had been able to.
>>>>
>>>> The second good sealed type I found is almost an enum, except that one of the
>>>> instances has per-instance data. This is not exactly a surprise, since an enum
>>>> is a degenerate sum type, and one way to think of sealed types is as a way to
>>>> model sums. It looks something like this[2]:
>>>>
>>>>     public abstract class DbResult<T> {
>>>>         public record NoDatabase() extends DbResult;
>>>>         public record RowNotFound() extends DbResult;
>>>>         // Four more error types ...
>>>>         public record EmptySuccess() extends DbResult;
>>>>         public record SuccessWithData<T>(T data) extends DbResult<T>;
>>>>         public T getData() {
>>>>             if (!(this instanceof SuccessWithData))
>>>>                 throw new DbException();
>>>>             return ((SuccessWithData)this).data;
>>>>         }
>>>>         public DbResult transform(Function f) {
>>>>             if (!(this instanceof SuccessWithData)) {
>>>>                 return (DbResult)this;
>>>>             }
>>>>             return new SuccessWithData(f.apply(
>>>>                 ((SuccessWithData)this).data));
>>>>         }
>>>>     }
>>>>
>>>> Reading this code made me yearn for Haskell: here is someone who surely wanted
>>>> to write
>>>>
>>>>     data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t
>>>>
>>>> but had to spend 120 lines defining their sum-of-products (the extra verbosity
>>>> is because really they made the subclasses private, and defined private static
>>>> singletons for each of the error types, with a static getter to get the type
>>>> parameter right). This seems like a potential win for records and for sealed
>>>> types. Certainly my snippet was much shorter than the actual source file
>>>> because the proposed record syntax is quite concise, so that is a real win. But
>>>> what do we really gain from sealing this type? Still nobody does exhaustive
>>>> analysis even of this relatively small type: they just use functions like
>>>> getData and transform to work with the result generically, or spot-check a
>>>> couple interesting subtypes with instanceof. Forbidding subclassing from other
>>>> packages hardly matters: nobody was subclassing it anyway, and nor would they
>>>> be tempted to. Really the improvements DbResult benefits most from are records,
>>>> and pattern-matching on records. It would be much nicer to replace the
>>>> instanceof/cast pattern with a pattern-match that extracts the relevant field.
>>>>
>>>> This is the use case that inspired my idea of a type-enum, in the Summary
>>>> section above.
>>>> Rewriting it as a type-enum eliminates many of the problems: all
>>>> the instanceof checks are gone, we don't need a bunch of extra keywords for
>>>> each case, and we're explicit about the subclasses "belonging to" the sealed
>>>> parent, which means we get stuff like extends and <T> for free. We get improved
>>>> clarity by letting the definition of the class hierarchy reflect its "nature"
>>>> as a sum.
>>>>
>>>>     public abstract type-enum DbResult<T> {
>>>>         NoDatabase,
>>>>         RowNotFound,
>>>>         EmptySuccess,
>>>>         SuccessWithData(T data) {
>>>>             @Override public T getData() {
>>>>                 return data;
>>>>             }
>>>>             @Override public DbResult transform(Function f) {
>>>>                 return new SuccessWithData(f.apply(data));
>>>>             }
>>>>         };
>>>>         public T getData() {
>>>>             throw new DbException();
>>>>         }
>>>>         public DbResult transform(Function f) {
>>>>             return (DbResult)this;
>>>>         }
>>>>     }
>>>>
>>>> Visitors
>>>>
>>>> Instead of doing a bunch of instanceof checks, the "sophisticated" way to
>>>> interact with a class having a small, known set of subtypes is with a visitor.
>>>> I considered doing some complicated analysis to characterize what makes a class
>>>> a visitor, and trying to automatically cross-reference visitors to the classes
>>>> they visit...but in practice simply looking for classes with "Visitor" in their
>>>> name was a strong enough signal that a more complicated approach was not
>>>> needed. Having identified visitors, I looked at those visitors with the most
>>>> subclasses, since each distinct subclass corresponds to one "interaction" with
>>>> the sealed type that it visits, and well-used visitors suggest both popularity
>>>> and good design.
>>>>
>>>> One common theme I found: developers aren't good at applying the visitor
>>>> pattern. Many cases I found had some weird and inexplicable quirk compared to
>>>> the "standard" visitor. These developers will be relieved to get
>>>> pattern-matching syntax so they can stop writing visitors.
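
For illustration, this is roughly what "stop writing visitors" can look like
with the sealed types and pattern-matching switch that eventually shipped
(Java 21 or newer), applied to the Expr example from earlier in the report. It
is a sketch under assumptions, not code from the report: the wrapper class
ExprSketch is a hypothetical name, and the report's Scope type is replaced by a
plain Map<String, Integer>.

```java
import java.util.Map;
import java.util.stream.Stream;

// Sketch only: ExprSketch and the Map-based scope are assumptions.
final class ExprSketch {

    sealed interface Expr permits Var, Sum { }
    record Var(String name) implements Expr { }
    record Sum(Expr left, Expr right) implements Expr { }

    // A client-side operation that would otherwise need a visitor or
    // instanceof/cast chains. No default branch: the compiler verifies
    // that every permitted subtype is handled.
    static Stream<String> variablesUsed(Expr e) {
        return switch (e) {
            case Var v -> Stream.of(v.name());
            case Sum s -> Stream.concat(variablesUsed(s.left()),
                                        variablesUsed(s.right()));
        };
    }

    // A second operation added without touching Expr or its subtypes.
    // Assumes every variable is bound in the scope map.
    static int eval(Expr e, Map<String, Integer> scope) {
        return switch (e) {
            case Var v -> scope.get(v.name());
            case Sum s -> eval(s.left(), scope) + eval(s.right(), scope);
        };
    }

    public static void main(String[] args) {
        Expr e = new Sum(new Var("x"), new Sum(new Var("y"), new Var("x")));
        System.out.println(variablesUsed(e).toList());
        System.out.println(eval(e, Map.of("x", 2, "y", 3)));
    }
}
```

Adding a third permitted subtype to Expr would turn both switches into compile
errors until they handle it, which is the exhaustiveness benefit discussed
above; a client that only cares about Var would add a default branch and give
that check up.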
>>>> The Visiting Object
>>>>
>>>> The first popular visitor I found was a bit odd to me. It's another tree type,
>>>> but with a weird amalgam of several visitors, and an unusual approach to its
>>>> double dispatch. I have to include a relatively lengthy code snippet to show
>>>> all of its facets:
>>>>
>>>>     public static abstract class Node {
>>>>         public interface Visitor {
>>>>             boolean process(Node node);
>>>>         }
>>>>         public boolean visit(Object v) {
>>>>             return v instanceof Visitor
>>>>                 && ((Visitor)v).process(this);
>>>>         }
>>>>         // Other methods common to all Nodes ...
>>>>     }
>>>>     public static final class RootNode extends Node {
>>>>         public interface Visitor {
>>>>             boolean processRoot(RootNode node);
>>>>         }
>>>>         @Override
>>>>         public boolean visit(Object v) {
>>>>             return v instanceof Visitor
>>>>                 ? ((Visitor)v).processRoot(this)
>>>>                 : super.visit(v);
>>>>         }
>>>>         // Other stuff about root nodes ...
>>>>     }
>>>>     public static abstract class ValueNode extends Node {
>>>>         public interface Visitor {
>>>>             boolean processValue(ValueNode node);
>>>>         }
>>>>         @Override
>>>>         public boolean visit(Object v) {
>>>>             return v instanceof Visitor
>>>>                 ? ((Visitor)v).processValue(this)
>>>>                 : super.visit(v);
>>>>         }
>>>>     }
>>>>     public static final class BooleanNode extends ValueNode {
>>>>         public interface Visitor {
>>>>             boolean processBool(BooleanNode node);
>>>>         }
>>>>         @Override
>>>>         public boolean visit(Object v) {
>>>>             return v instanceof Visitor
>>>>                 ? ((Visitor)v).processBool(this)
>>>>                 : super.visit(v);
>>>>         }
>>>>         // Other stuff about booleans ...
>>>>     }
>>>>     public static final class StringNode extends ValueNode {
>>>>         // Much the same as BooleanNode
>>>>     }
>>>>
>>>> This goes on for some time: there is a multi-layered hierarchy of dozens of node
>>>> types, each with a boolean visit(Object) method, and their own distinct Visitor
>>>> interface, in this file.
>>>> I should note that this code is actually not written
>>>> by a human, but rather generated by some process (I didn't look into how). I
>>>> still think it is worth mentioning here for two reasons: first, whoever wrote
>>>> the code generator would probably do something similar if writing it by hand,
>>>> and second because these visitors are used often by hand-written code.
>>>>
>>>> Speaking of hand-written code, visitor subclasses now get to declare ahead of
>>>> time exactly which kinds of nodes they care about, by implementing only the
>>>> appropriate Visitor interfaces:
>>>>
>>>>     private class FooVisitor implements StringNode.Visitor,
>>>>         BooleanNode.Visitor, RootNode.Visitor {
>>>>         // ...
>>>>     }
>>>>
>>>> This isn't how I would have written things, but I can sorta see the appeal, if
>>>> you don't have to write it all by hand: a visitor can choose to handle any one
>>>> subclass of ValueNode, or all ValueNodes, or just RootNode and StringNode, et
>>>> cetera. They get to pick and choose what sub-trees of the inheritance tree they
>>>> work with.
>>>>
>>>> Would Node be a good sealed class? Maybe. It clearly intends to enumerate all
>>>> subclasses, but the benefit it gets from enforcing that is minimal. As in my
>>>> previous examples, the main advantage for Node implementors would come from
>>>> records, and the main advantage for clients would come from pattern-matching,
>>>> obviating their need for this giant visitor.
>>>>
>>>> The Enumerated Node
>>>>
>>>> Another AST, this time for some kind of query language, explicitly declares an
>>>> enum of all subclasses it can have, and uses this enum instead of using
>>>> traditional double-dispatch:
>>>>
>>>>     public interface Node {
>>>>         enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
>>>>         Kind getKind();
>>>>         Location getLocation();
>>>>     }
>>>>     public abstract record AbstractNode(Location l) implements Node {}
>>>>     public class Expr extends AbstractNode {
>>>>         public Kind getKind() {return EXPR;}
>>>>         // ...
>>>>     }
>>>>     // And so on for other Kinds ...
>>>>     public abstract class Visitor {
>>>>         // Empty default implementations, not abstract.
>>>>         public Expr visitExpr(Expr e) {}
>>>>         public Query visitQuery(Query q) {}
>>>>         public Import visitImport(Import i) {}
>>>>         public Node visit(Node n) {
>>>>             switch (n.getKind()) {
>>>>                 case EXPR: return visitExpr((Expr)n);
>>>>                 case QUERY: return visitQuery((Query)n);
>>>>                 case IMPORT: return visitImport((Import)n);
>>>>                 // ...
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>> It's not really clear to me why they do it this way, instead of putting an
>>>> accept(Visitor) method on Node. They gain the ability to return different types
>>>> for each Node subtype, but are hugely restricted in what visitors can do: they
>>>> must return a Node, instead of performing an arbitrary computation. It seems
>>>> like the idea is visitors must specialize to tree rewriting, but I still would
>>>> have preferred to parameterize the visitor by return type.
>>>>
>>>> Would this be better as a sealed type? I feel sure that if sealed types existed,
>>>> the authors of this class would have used one. We could certainly do away with
>>>> the enum, and use an expression-switch instead to pattern-match in the
>>>> implementation of visit(Node). But I think the Visitor class would still exist,
>>>> and still have separate methods for each Node subtype, because the developer
>>>> seemed to care about specializing the return type. The only place where an
>>>> exhaustiveness check helps would be in the visit(Node) method, inside the
>>>> visitor class itself. All other dispatch goes through visit(Node), or through
>>>> one of the specialized visitor methods if the type is known statically. It
>>>> seems like overall this would be an improvement, but again, the improvement
>>>> comes primarily from pattern-matching, not sealing.
>>>>
>>>> Colocated interface implementations
>>>>
>>>> Finally, I looked for interfaces having all of their implementations defined in
>>>> the same file.
On this I do have some statistical data[3]. A huge majority >>>> (98.5%) of public interfaces have at least one implementation in a different >>>> source file. Package-private interfaces also tend to have implementations in >>>> other files: 85% of them are in this category. For protected interfaces it's >>>> much closer: only 53% have external implementations. Of course, all private >>>> interfaces have all implementations in a single file. >>>> Next, I looked at interfaces that share a source file with all their >>>> implementations, to see whether they'd make good sealed types. First was this >>>> Entry class: >>>> public interface Entry { >>>> enum Status {OK, PENDING, FAILED} >>>> Status getStatus(); >>>> int size(); >>>> String render(); >>>> } >>>> public class UserEntry implements Entry { >>>> private User u; >>>> private Status s; >>>> public UserEntry(User u, Status s) { >>>> this.u = u; >>>> this.s = s; >>>> } >>>> @Override public String render() {return u.name();} >>>> @Override public int size() {return 1;} >>>> @Override public Status getStatus() {return s;} >>>> } >>>> public class AccountEntry implements Entry { >>>> private Account a; >>>> private Status s; >>>> public AccountEntry(Account a, Status s) { >>>> this.a = a; >>>> this.s = s; >>>> } >>>> @Override public String render() {return a.render();} >>>> @Override public int size() {return a.size();} >>>> @Override public Status getStatus() {return s;} >>>> } >>>> A huge majority of the clients of this Entry interface treat it polymorphically, >>>> just calling its interface methods. In only one case is there an instanceof >>>> check made on an Entry, dispatching to different methods depending on which >>>> subclass is present. >>>> Is this a good sealed type? I think not, really. There are two implementations >>>> now, but perhaps there will be a GroupEntry someday. Existing clients should >>>> continue to work in that case: the polymorphic Entry interface provides >>>> everything clients are "intended" to know.
>>>> Another candidate for sealing: >>>> public interface Request {/* Empty */} >>>> public record RequestById(int id) implements Request; >>>> public record RequestByValue(String owner, boolean historic) implements Request; >>>> public class RequestFetcher { >>>> public List fetch(Iterable requests) { >>>> List idReqs = Lists.newArrayList(); >>>> List valueReqs = Lists.newArrayList(); >>>> List queries = Lists.newArrayList(); >>>> for (Request req : requests) { >>>> if (req instanceof RequestById) { >>>> idReqs.add((RequestById)req); >>>> } else if (req instanceof RequestByValue) { >>>> valueReqs.add((RequestByValue)req); >>>> } >>>> } >>>> queries.addAll(prepareIdQueries(idReqs)); >>>> queries.addAll(prepareValueQueries(valueReqs)); >>>> return runQueries(queries); >>>> } >>>> } >>>> Interestingly, since the Request interface is empty, the only way to do anything >>>> with this class is to cast it to one implementation type. In fact, the >>>> RequestFetcher I include here is the only usage of either of these classes >>>> (plus, of course, helpers like prepareIdQueries). >>>> So, clients need to know about specific subclasses, and want to be sure they?re >>>> doing exhaustive pattern-matching. Seems like a great sealed class to me. >>>> Except...actually each of the two subclasses has been extended by a decorator >>>> adding a source[4]: >>>> public record SourcedRequestById(Source source) extends RequestById; >>>> public record SourcedRequestByValue(Source source) extends RequestByValue; >>>> Does this argue in favor of sealing, or against? I don?t really know. The owners >>>> of Request clearly intended for all four of these subclasses to exist (they?re >>>> in the same package), so they could include them all in the permitted subtype >>>> list, but it seems like a confusing API to expose to clients. 
>>>> A third candidate for sealing is another simple sum type: >>>> public interface ValueOrAggregatorException { >>>> T get(); >>>> public static ValueOrAggregatorException >>>> of(T value) { >>>> return new OfValue(value); >>>> } >>>> public static ValueOrAggregatorException >>>> ofException(AggregatorException err) { >>>> return new OfException(err); >>>> } >>>> private record OfValue(T value) >>>> implements ValueOrAggregatorException { >>>> @Override T get() {return value;} >>>> } >>>> private record OfException(AggregatorException err) >>>> implements ValueOrAggregatorException { >>>> @Override T get() {throw err;} >>>> } >>>> } >>>> It has only two subtypes, and it seems unimaginable there could ever be a third, >>>> so why not seal it? However, the subtypes are intentionally hidden: it is >>>> undesirable to let people see whether there?s an exception, except by having it >>>> thrown at you. In fact AggregatorException is documented as ?plugins may throw >>>> this, but should never catch it?: there is some higher-level thing responsible >>>> for catching all such exceptions. So, this type gains no benefit from >>>> exhaustiveness checks in pattern-matching. The type is intended to be used >>>> polymorphically, through its interface method, even though its private >>>> implementation is amenable to sealing. >>>> ________________ >>>> [1] Throughout this document I will use record syntax as if it were already in >>>> the language. This is merely for brevity, and to avoid making the reader spend >>>> a lot of time reading code that boils down to just storing a couple fields. In >>>> practice, of course the code in Google?s codebase either defines the records by >>>> hand, or uses an @AutoValue. >>>> [2] Recall that @AutoValue, Google?s ?record?, allows extending a class, which >>>> is semantically okay here: DbResult has no state, only behavior. >>>> [3]This data is imperfect. 
While the Google codebase strongly discourages having >>>> more than one version of a module checked in, there is still some amount of >>>> ?vendoring? or checking in multiple versions of some package, e.g. for >>>> supporting external clients of an old version of an API. As a result, two >>>> ?different files? which are really copies of each other may implement >>>> interfaces with the same fully-qualified name; I did not attempt to control for >>>> this case, and so such cases may look like they were in the same file, or not. >>>> [4] Of course in the record proposal it is illegal to extend records like this; >>>> in real life these plain data carriers are implemented by hand as ordinary >>>> classes, so the subtyping is legal. From brian.goetz at oracle.com Wed May 1 12:37:23 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 1 May 2019 08:37:23 -0400 Subject: Feedback on Sealed Types In-Reply-To: <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> References:

<7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> Message-ID: <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> > It may solve the enclosing issue because the ';' syntactically separate A, B and C from the content of I which is declared after the ';', so A, B and C can be top-level. Trying to make these top level has the same "how do I find the source file" problem that aux classes have. > I kind a like the intellectual separation between > - a sealed interface which represent a closed type and requires a permit clause and > - an enum interface which represent a sum type which is sugar on top of sealed interface + records. This does have a certain appeal, as each construct underscores what it is for. On the other hand, the return-on-sugar for the second is just not that big (unlike with records or enums). Basically, you get to drop the word "record" and "implements I" a bunch of times; it is not clear that it carries its weight. From brian.goetz at oracle.com Wed May 1 13:58:12 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 1 May 2019 09:58:12 -0400 Subject: Feedback on Sealed Types In-Reply-To: <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> References:

<7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> Message-ID: <3CE5AD07-CFCA-47A0-A48A-34E0C3DF0742@oracle.com> > >> I kind a like the intellectual separation between >> - a sealed interface which represent a closed type and requires a permit clause and >> - an enum interface which represent a sum type which is sugar on top of sealed interface + records. > To be clear, I think what Alan is suggesting, and what Remi is supporting, is: - Make "sealed" the primitive for defining closed types, as originally proposed, and also - Make the following enumerated interface Foo { R(X), S(Y); STUFF } sugar for sealed interface Foo permits R, S { STUFF record R(X) implements Foo { } record S(Y) implements Foo { } } Is that correct? From amalloy at google.com Wed May 1 15:34:45 2019 From: amalloy at google.com (Alan Malloy) Date: Wed, 1 May 2019 08:34:45 -0700 Subject: Feedback on Sealed Types In-Reply-To: <3CE5AD07-CFCA-47A0-A48A-34E0C3DF0742@oracle.com> References:

<7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> <3CE5AD07-CFCA-47A0-A48A-34E0C3DF0742@oracle.com> Message-ID: Yes, that is what I suggest. Two points, though. First, the sugar benefit is at least a tiny bit larger than you say. You also get to omit instances of <T> if the interface is parameterized, as I expect it will often be. I argue you should get to omit public, too: the implementations of a sum should always be public, just as the accessors for a record should be, for the same reason: they are the entire purpose of defining the type, and allowing variation here detracts from their semantic value. And second, I don't know that counting the number of saved characters/tokens is the best way to measure the benefits anyway. An enhanced for loop over an array is not that much shorter than an old-style for loop with an explicit index - in fact it probably saves fewer characters than a couple of "implements FooSum". But it's clearly a win because it communicates intent better, and leaves fewer opportunities to make a mistake, either in writing the code or in reading it. Likewise the ability to say in a single token, "this is a closed sum" has legibility benefits aside from just being shorter. On Wed, May 1, 2019, 6:58 AM Brian Goetz wrote: > > > > I kind a like the intellectual separation between > - a sealed interface which represent a closed type and requires a permit > clause and > - an enum interface which represent a sum type which is sugar on top of > sealed interface + records. > > > > To be clear, I think what Alan is suggesting, and what Remi is supporting, > is: > > - Make "sealed"
the primitive for defining closed types, as originally > proposed, and also > - Make the following > > enumerated interface Foo { > R(X), S(Y); > > STUFF > } > > sugar for > > sealed interface Foo > permits R, S { > > STUFF > > record R(X) implements Foo { } > record S(Y) implements Foo { } > } > > Is that correct? > > > > From forax at univ-mlv.fr Wed May 1 15:58:23 2019 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 1 May 2019 17:58:23 +0200 (CEST) Subject: Feedback on Sealed Types In-Reply-To: <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> References:

<7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> Message-ID: <509217760.520690.1556726303707.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Alan Malloy" , "amber-spec-experts" > > Sent: Wednesday, May 1, 2019 14:37:23 > Subject: Re: Feedback on Sealed Types >> It may solve the enclosing issue because the ';' syntactically separate A, B and >> C from the content of I which is declared after the ';', so A, B and C can be >> top-level. > Trying to make these top level has the same "how do I find the source file" > problem that aux classes have. I was thinking that those components are something new, so you can 'tweak' import for them: import akeyword Foo; // automatically import the component names too if it's an enum interface. Given that no enum interface exists now, it's compatible. Rémi From john.r.rose at oracle.com Thu May 2 02:42:08 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 1 May 2019 19:42:08 -0700 Subject: Feedback on Sealed Types In-Reply-To: <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> References:

<7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> Message-ID: <076E0A54-10A9-418B-BB9D-BD907C38F935@oracle.com> On May 1, 2019, at 5:32 AM, Remi Forax wrote: > > If there is only one constant of type Empty and the construction is typesafe, it can be a huge win. If Empty is an inline (value) type with no components, then Empty.default is the singleton, and there's nothing else to say about it. This is a use case for empty inlines. In fact they are unit types, as recognized in many languages. (There are some low-level technical reasons why Valhalla doesn't support this now, but they can be overcome with a bit of work. One problem is how to keep track of a field of size zero, if you are using relative offsets at present.) ? John P.S. Either a sum of N zero-length unit types or a classic enum of N elements, could be represented as a byte of lgN bits. (Or lg(N+1) if it's nullable.) We can't do this in the old contract of L-types, but under the new contract that allows early loading of field types, we could pull such tricks for either enums or unit-sums, in the JVM. So if Foo is a small enum, LFoo; requires 8 or 4 bytes, but GFoo; (where G is the "go and look" contract) can require just 1 byte, just like a boolean. From john.r.rose at oracle.com Fri May 3 20:21:04 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 13:21:04 -0700 Subject: String literals: some principles In-Reply-To: <0FBBF6C5-E836-43CD-88CD-BFD803AE6752@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <0FBBF6C5-E836-43CD-88CD-BFD803AE6752@oracle.com> Message-ID: <37341F23-5DC9-48A6-ADE5-CDA83EBED43A@oracle.com> On Apr 29, 2019, at 8:48 AM, Guy Steele wrote: > >> On Apr 28, 2019, at 4:32 PM, Brian Goetz wrote: >> >> . . . >> Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a ?raw? 
string, but this notion is a fantasy; no string literal is so raw that it can?t recognize its closing delimiter. So ?rawness? is really only a matter of degree. > > This is _almost_ true. If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content. > > Put another way: one cannot determine how long the raw content is by examining it. That?s a solid principle. I'm going to be nit-picky here and refer to my earlier mentions of the paradigm of strong quoting, which at its heart simply means you have an infinite set of delimiters to choose from, when wrapping a payload into a literal syntax. Adding a numeral to the open quote means that there are now an unbounded set of open quotes, so it is an instance of strong quoting. Another instance of strong quoting adds nonces, and yet another just lengthens the quote pattern until it doesn't occur (anywhere) in the raw string payload. The numeric prefix convention is different from other kinds of strong quoting conventions, in that the end-quote can be a substring of the payload. Actually, the end-quote is most naturally the empty string, which is a substring of every string. The numeric prefix convention and other strong-quote conventions all share a common property: The convention as a whole is universal for arbitrary payloads, but for any given payload there are quotes which work and others that don't work. In the case of the numeric prefix convention, once you choose an open-quote (with numeral) you are limited to payloads of that length. That's not quite a "raw string" any more, since it's suitable only for a fixed-sized character field. Likewise, once you choose a particular nonce-based or patterned quote (e.g., seven double-quotes), payloads containing the corresponding end-quote as a substring are no longer suitable. 
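The "lengthen the quote pattern until it doesn't occur in the payload" flavor of strong quoting can be sketched roughly as follows. This is only an illustration, assuming the delimiter family is a run of N double-quote characters; the names StrongQuote, longestQuoteRun and safeDelimiter are made up for this example and are not part of any proposal.

```java
final class StrongQuote {
    // Length of the longest run of consecutive '"' characters in the payload.
    static int longestQuoteRun(String payload) {
        int longest = 0;
        int run = 0;
        for (int i = 0; i < payload.length(); i++) {
            run = (payload.charAt(i) == '"') ? run + 1 : 0;
            longest = Math.max(longest, run);
        }
        return longest;
    }

    // Shortest delimiter made only of '"' characters that is not a
    // substring of the payload (one quote longer than the longest run).
    static String safeDelimiter(String payload) {
        int n = longestQuoteRun(payload) + 1;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('"');
        }
        return sb.toString();
    }
}
```

The sketch only picks a delimiter that does not occur inside the payload. A payload that itself begins or ends with a quote character would still need extra care at the boundary, which this example does not attempt to handle.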
Once you pick a particular payload string, the next question is whether you can embed that particular string into your program without inserting escape sequences. Only with a strong quote scheme of some sort is this possible. But, with any of several strong quote schemes, it is possible to dispense with escapes for any given string; it is not a fantasy. ? John From guy.steele at oracle.com Fri May 3 20:37:36 2019 From: guy.steele at oracle.com (Guy Steele) Date: Fri, 3 May 2019 16:37:36 -0400 Subject: String literals: some principles In-Reply-To: <37341F23-5DC9-48A6-ADE5-CDA83EBED43A@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <0FBBF6C5-E836-43CD-88CD-BFD803AE6752@oracle.com> <37341F23-5DC9-48A6-ADE5-CDA83EBED43A@oracle.com> Message-ID: I completely agree with what you said here, John. We both took a good look, but you squinted with your right eye, and I with my left. :-) Either point of view is correct; the two together yield depth perception. Yay! > On May 3, 2019, at 4:21 PM, John Rose wrote: > > On Apr 29, 2019, at 8:48 AM, Guy Steele wrote: >> >>> On Apr 28, 2019, at 4:32 PM, Brian Goetz wrote: >>> >>> . . . >>> Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a ?raw? string, but this notion is a fantasy; no string literal is so raw that it can?t recognize its closing delimiter. So ?rawness? is really only a matter of degree. >> >> This is _almost_ true. If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content. >> >> Put another way: one cannot determine how long the raw content is by examining it. That?s a solid principle. 
> > I'm going to be nit-picky here and refer to my earlier > mentions of the paradigm of strong quoting, which > at its heart simply means you have an infinite set of > delimiters to choose from, when wrapping a payload > into a literal syntax. > > Adding a numeral to the open quote means that there > are now an unbounded set of open quotes, so it is an > instance of strong quoting. Another instance of strong > quoting adds nonces, and yet another just lengthens > the quote pattern until it doesn't occur (anywhere) in > the raw string payload. > > The numeric prefix convention is different from other > kinds of strong quoting conventions, in that the end-quote > can be a substring of the payload. Actually, the end-quote > is most naturally the empty string, which is a substring > of every string. > > The numeric prefix convention and other strong-quote > conventions all share a common property: The convention > as a whole is universal for arbitrary payloads, but for > any given payload there are quotes which work and others > that don't work. In the case of the numeric prefix > convention, once you choose an open-quote (with > numeral) you are limited to payloads of that length. > That's not quite a "raw string" any more, since it's > suitable only for a fixed-sized character field. > Likewise, once you choose a particular nonce-based > or patterned quote (e.g., seven double-quotes), > payloads containing the corresponding end-quote > as a substring are no longer suitable. > > Once you pick a particular payload string, the next > question is whether you can embed that particular > string into your program without inserting escape > sequences. Only with a strong quote scheme of > some sort is this possible. But, with any of several > strong quote schemes, it is possible to dispense > with escapes for any given string; it is not a fantasy. > > ? 
John From john.r.rose at oracle.com Fri May 3 22:25:49 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 15:25:49 -0700 Subject: String literals: some principles In-Reply-To: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> Message-ID: <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> TL;DR: Good framework; must also account for the rectangle extraction rule (RER). A unified escape sublanguage (ESL) is highly desirable, and I propose adding <\ > and <\ LT WS*> as escapes for space and for the null string. The existing \ char is OK, and should be "fattened" as a separate feature. I note some issues with <\ u X X X X>. On Apr 28, 2019, at 1:32 PM, Brian Goetz wrote: > - Opening delimiter > - Closing delimiter > - Escape characters, if any > - Escape sublanguages, if any Yes, this is a useful way to break down the syntax. You left out padding conventions as a degree of freedom. Padding conventions give the programmer detailed control over the format of the program by associating non-payload characters with the string literal. Whitespace rectangle extraction is the only padding convention we are discussing, plus occasional suggestions that we remove horizontal space in one-line fat strings. If we denote today's escape sublanguage as ESL and the rectangle extraction rule as RER, then today's literals are: ThinString=SL[open=close=", escape=\, esl=ESL, pc=none] Tomorrow's fat strings will be something like: FatString=SL[open=close=""", escape=\, esl=ESL, pc=RER] Another aspect of defining a string literal is the *phasing* of the different features. I think we have good consensus that padding should be stripped *before* escape interpretation, so that escaped characters are not mistaken for padding characters.
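To make the phasing point concrete, here is a toy model of the two steps. It is a sketch under stated assumptions, not the proposed rules: stripIndent is a simplified stand-in for the RER (it removes only the common leading run of spaces and tabs), unescape is a simplified stand-in for the ESL (it knows only \n, \t, \\ and the proposed escaped space), and all names are hypothetical.

```java
final class Phasing {
    // Number of leading spaces and tabs on a line.
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    // Toy rectangle extraction: drop the smallest leading indentation
    // found on any non-empty line from every non-empty line.
    static String stripIndent(String body) {
        String[] lines = body.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isEmpty()) {
                min = Math.min(min, indentOf(line));
            }
        }
        if (min == Integer.MAX_VALUE) {
            min = 0;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(lines[i].isEmpty() ? lines[i] : lines[i].substring(min));
        }
        return out.toString();
    }

    // Toy escape interpreter: \n, \t, \\ and "\ " (escaped space).
    // Unknown escapes are copied through unchanged.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    case ' ': out.append(' '); break;
                    case '\\': out.append('\\'); break;
                    default: out.append('\\').append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With a body whose two lines are each "two spaces, backslash, t, letter", stripping first and unescaping second keeps the escaped tabs, while the opposite order turns the tabs into ordinary leading whitespace that the stripping step then removes.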
> I bring this up not because I want to talk about raw-ness now (getting the hint?), but because I want to keep all the variations of string literals as lightly-varying projections of the same basic feature. Understanding the variations is important. It also gives me hope that we could parlay this framework, later on, into something strong. In the future (not now) we might add a parameterized range of these schemes: StrongString=SL[open=close=F(N), escape=G(N), esl=ESL, pc=RER] for some functions F, G that enumerate quote and escape tokens. This would be a strong quoting scheme that could (with care) allow any given payload string S to be embedded without the need for escapes, by choosing an N for which F(N) and G(N) do not occur in S. Getting back to today, I want to talk about escapes. First, I'll remind us all that the RER is part of fat strings and that therefore the newline and space characters are no longer just passive string body characters, but rather play a role in the string syntax. This means that the ESL needs to be upgraded so that occurrences of spaces and newlines which otherwise would play a role in syntax can be escaped. I think this at a minimum means that the ESL needs to add support for the two-character escape sequence <\ space>. There is already an escape sequence for a line terminator; it is <\ n>. A similar point holds for <\ t>. These three escapes (one new, two old) are enough to allow a programmer to tell the RER to stay away from a particular bit of white-space. (Note that if the RER were to happen *after* escape processing, we'd be in a pickle: There'd be no way to use the existing ESL to control the RER, and we'd have to put some sort of extra control feature into the RER itself, or settle for an uncontrollable RER.)
> It has come up, for example, that we might treat \ differently in ML strings as in classic strings, My own suggestions in this vein have nothing to do with making a new ESL but with extending the old one so it works well with fat strings. > but I would prefer it we could not tinker with the escape language in nonuniform ways ? as this minimizes the variations between the various sub-features. I agree that we should have only one ESL; there's no reason to have different "dialects" of it in different types of strings. So <\ space> should be added to the ESL, not because it's particularly useful for thin strings, but because it escapes otherwise strippable padding in fat strings. Here's an interesting feature of the JLS: It defines a uniform ESL for both string and character literals. This means that <\ '> can occur in both kinds of literals, even though it is only needed for character literals. Same point in reverse for <\ ">. Since the ESL is uniform, if *one* kind of literal needs a particular escape sequence, then *all* the literals have it. (See where I'm going?) Now, the upcoming features of fat strings includes a padding convention, ergo the common ESL needs a way to escape the now-syntactic padding characters. About <\ LT> (an escaped LineTerminator), a similar point holds: Sure it's useful only in string literals with line terminators, but if there is a legitimate reason to add extra control over LTs, then <\ LT> gets bundled into the common escape sublanguage of the JLS. There are two interesting questions about positioning <\ LT> as an escape sequence: 1. What does <\ LT> mean, if it is legal and not just an alias for <\ n>? 2. Is <\ LT> allowed in a thin string, given that (currently) the thin string syntax rejects LT? For 1. I'm already on record as proposing that <\ LT WS*> is an escape sequence for the null string. (WS is horizontal whitespace.) For 2., if we say "no" then we seem to come close to forking the ESL, which Brian and I want to avoid. 
A thin string body is a sequence of regular non-LT chars plus escape sequences, except <\ LT>. A fat string body can include <\ LT> as well as other escape sequences. But that is not really a fork of the ESL. The difference between fat and thin strings is a structural constraint on their bodies, before escape processing: A fat string can contain LT in its pre-escape-processed body, and so in fact can contain <\ LT>. A thin string cannot contain LT at all, so the presence of <\ LT> in the ESL is moot for a thin string. (Also moot for a char literal.) The parsing of a string literal (either kind) consists of gathering an escaped string body while looking for the close-quote. The close-quote interrupts the body and terminates the string. For the case of a thin string, an LT also interrupts the body, but causes parsing to fail. So we could answer "no" to 2 and keep a unified ESL, simply by asserting that thin string tokens never contain LT, while fat string tokens contain LT (always? different question). We could also answer "yes" to 2, and I think it's worth a discussion. What I'm suggesting here is that the thin strings are allowed to contain *escaped* LTs in a new version of the JLS (that also contains fat strings). The pre-escape-processed body of either kind of string can contain escaped LTs, and fat strings can *also* contain *unescaped* LTs. Example: var ts = "hel\ lo\ "; assert ts == "hello"; var fs = """ hel\ lo\ """; assert fs == "hello"; In the latter case, the RER strips most or all of the whitespace. In any case <\ LT WS*> sops up the rest. The reason we are discussing <\ LT> is that there are plenty of reasons why programmers would wish to control the format of their programs by breaking up long logical lines into shorter physical lines. Such use cases are not specific to payloads with or without newlines. If your payload has newlines, use a fat string *and* break up long logical lines into shorter physical ones. 
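The proposed <\ LT WS*> escape (an escaped line terminator, plus any horizontal whitespace that follows it, standing for the null string) can be modeled with a small scanner. This is a sketch only, assuming WS means space or tab and LT means \n, \r or \r\n; the class and method names are invented for the example, and other escape sequences are copied through untouched so that an escaped backslash before a newline is not misread.

```java
final class LineContinuation {
    // Removes every <\ LT (SP|TAB)*> sequence from a pre-escape-processed
    // string body, leaving all other characters and escapes as they are.
    static String joinContinuedLines(String body) {
        StringBuilder out = new StringBuilder();
        int n = body.length();
        int i = 0;
        while (i < n) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < n) {
                char next = body.charAt(i + 1);
                if (next == '\n' || next == '\r') {
                    i += 2;
                    // Treat \r\n as a single line terminator.
                    if (next == '\r' && i < n && body.charAt(i) == '\n') {
                        i++;
                    }
                    // Gobble horizontal whitespace after the terminator.
                    while (i < n && (body.charAt(i) == ' ' || body.charAt(i) == '\t')) {
                        i++;
                    }
                    continue;
                }
                // Some other escape (including "\\"): copy both characters.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

Under these assumptions a body written as "hel", backslash, newline, indentation, "lo", backslash, newline comes out as "hello", and a programmer who wants a space at the break can place one just before the backslash.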
If your payload has no newlines (maybe it's a very long hex number), then use a thin string, and break it up. The RER of fat strings (which I like!) prompts the discussion of breaking up logical lines into physical ones, more than thin strings. After all, with thin strings, if you break one line into two lines, it's a given that you are going to write two literals, and then the + sign (for concatenation) adds no additional overhead. The break-up sequence is something like <" LT WS + ">. But if you have a large MLS with a few very long logical lines, suddenly you have an invidious choice between keeping your nice rectangle, or disrupting it totally by adding <" LT WS + ">. Breaking a long line in this case drops you off a syntax cliff. Supporting <\ LT> lets you down easy, by breaking the logical lines without disrupting the enclosing padding of the rectangle extraction rule. > Soliciting discussion on the pros and cons of keeping \ as our escape character. Well, \ makes a very fine escape character, except for particular payloads when it doesn't. Any payload which is a program in some little language that uses \ for escaping is going to get confusing very fast. Nobody wants to count a train of escapes, and layers of escaping cause escape trains to lengthen fast (doubling with each layer). Regular expressions are the poster child, and I'll just pretend that they are the key use case, since they are the worst-behaved. Fattening \ to \\\ helps a little with REs. But it would make long trains even longer, with the result that you would need even more help keeping count. The eye can only count a small number of repeated characters at a glance. var re = "\\\\\\["; //train wreck for /\\\[/ assert ('\\'+"[").matches(re); A non-repeating escape is much easier on the eye. Choosing at random, I'll suggest <\ -> as a fattened escape sequence, with the standard ESL from the JLS (as amended with <\ space> etc).
As long as that particular pair of characters is rare in REs (and other similar venues), there won't be any long trains of backslashes. var re = *"\\\["; assert ('\\'+"[").matches(re); var s6 = *"\-\- \-" \"; assert s6 == '\\'+"- \" "+'\\'; The star shows that I'm talking about some non-standard string syntax: FatEscString=SL[open=*", close=", escape=\-, esl=ESL, pc=none] I think it would be reasonable to fatten escapes as a separate feature, but not in tandem with the current multi-line string proposal. Straw man, separate from the MLS proposal. If a string literal (either fat or thin) is immediately preceded by <\ ->, the body of the string uses that sequence for its escapes instead of \. The ESL is unchanged. If stronger escapes are also desired, the feature can be extended simply by allowing any number of - characters, e.g. \--"x\-y\z" and \--"\--n" (for "x\\-y\\z" and "\n"). We are leaving \uXXXX escapes out of the accounting. This is understandable, because they are not a regular part of the ESL, and hard to treat as part of it. But we should try. In particular, we can and should find a way to treat most or all of the \uXXXX escapes *in a string body* as being expanded as part of the ESL, rather than in a pre-pass. This will make \uXXXX escapes more complicated, but it may profitably simplify their effect on the user model. One idea is simple: In the body of a string, any \uXXXX which doesn't denote a controlling part of the string syntax (quote or backslash) is collected into the string body as an unexpanded character sequence <\ u X X X X>. This sequence is then supported by the ESL. The effect is that padding removal (rectangle extraction) happens before \u replacement *in a string body*. A second idea could be adopted either with the first or separately: As a structural constraint on string bodies, unicode sequences which would expand to whitespace, quote, or backslash are forbidden.
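The second idea (a structural constraint on string bodies) could be checked by a scanner along these lines. This is a rough sketch, not a specification: it assumes the existing JLS rule that a backslash can begin a Unicode escape only when it is preceded by an even number of other backslashes, it takes "whitespace" to mean whatever Character.isWhitespace accepts, and the class and method names are hypothetical.

```java
final class UnicodeEscapeCheck {
    // Value of an ASCII hex digit, or -1 if c is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Parses four hex digits starting at index 'at'; returns -1 if absent.
    static int hex4(String s, int at) {
        if (at + 4 > s.length()) return -1;
        int value = 0;
        for (int k = 0; k < 4; k++) {
            int d = hexValue(s.charAt(at + k));
            if (d < 0) return -1;
            value = value * 16 + d;
        }
        return value;
    }

    // True if the raw body contains a \ u X X X X sequence that would
    // expand to whitespace, a double quote, or a backslash.
    static boolean hasForbiddenUnicodeEscape(String body) {
        int n = body.length();
        int i = 0;
        while (i < n) {
            if (body.charAt(i) != '\\') {
                i++;
                continue;
            }
            int start = i;
            while (i < n && body.charAt(i) == '\\') i++;
            boolean unpaired = (i - start) % 2 == 1;
            if (unpaired && i < n && body.charAt(i) == 'u') {
                int j = i;
                while (j < n && body.charAt(j) == 'u') j++; // one or more 'u'
                int cp = hex4(body, j);
                if (cp >= 0) {
                    char ch = (char) cp;
                    if (ch == '"' || ch == '\\' || Character.isWhitespace(ch)) {
                        return true;
                    }
                    i = j + 4;
                }
            }
        }
        return false;
    }
}
```

A body containing the six characters backslash, u, 0, 0, 2, 2 would be rejected by this check, while one containing backslash, u, 0, 0, e, 9 would pass.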
And here's a draconian one: Forbid <\ u X X X X> where the code point is 007F or lower. That would blow up some stupid test cases and puzzlers; user code that does this should be fixed. If we can't do this everywhere, do it inside string bodies. We may be limited by backward compatibility on the application of these ideas to thin strings, but they should be considered at least for fat strings. There are two benefits to taming \uXXXX: 1. Fewer puzzlers involving hidden syntax (\ " etc.) 2. The processing of \uXXXX for string bodies can be documented and aligned with an "unescape" method on String, which is useful in its own right. From john.r.rose at oracle.com Fri May 3 23:40:16 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 16:40:16 -0700 Subject: Wrapping up the first two courses In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>

Message-ID: <85314830-6CB0-4761-882D-24A655FC0B50@oracle.com> On Apr 26, 2019, at 8:59 AM, Kevin Bourrillion wrote: > > On Fri, Apr 26, 2019 at 8:56 AM Kevin Bourrillion wrote: > > Apparently bash's behavior is to replace with a single space character, and that at least seems like a useful behavior for us too if we're open to it. No, it replaces <\ NL> with nothing at all. Any spaces before or after that two-character sequence are bystanders. In a separate step, if not inside quotes, all sequences of whitespace are treated as if they were single spaces, as the shell breaks a line up into words. The net is that the stuff you mentioned behaves like whitespace. But also: ``` $ x=a\ b $ echo $x ab ``` However, I'm proposing that horizontal whitespace *after* the newline is "gobbled up" and thrown away with the leading <\ LT>, so the escape sequence is more like <\ LT (SP|TAB)*>. This gives the programmer more control over program layout. > I was forgetting, when I said this, that another substantial minority use case (I want to say at least 15%? These were rough estimates though) for multi-line strings is really long URLs, checksums, etc., that aren't meant to have any spaces in them at all. So the bash behavior is not necessarily what we'd want, although of course consistency with it has some amount of value in itself. The actual bash behavior, described above, *is* what we want. If the programmer *wants* a space, one can be placed just before the <\ LT> sequence. Luckily, that's reasonably readable. > Which raises another question: do we allow \ in SL strings? (I presume so, and we just eat the \ and the terminator.) If we eat the (SP|TAB)* after LT, then we have given the programmer control over indentation, in a way that is consistent with the rectangle rule, but applies only to the one escaped (partial) line. > Hmm, I can see how that could be harmless but it seems to blur the boundary between the features to me. It seems that way. 
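For concreteness, here is a minimal sketch of the <\ LT (SP|TAB)*> escape as described above: the backslash, the line terminator, and any spaces or tabs that lead the next line are all dropped, while whitespace before the backslash is left alone. The class and method names are hypothetical, and the sketch assumes it is handed the raw characters between the quotes; it is not the JEP's algorithm or any JDK API.

```java
public class LineContinuation {

    // Hypothetical helper: applies only the <\ LT (SP|TAB)*> rule and copies
    // every other escape sequence through untouched.
    static String joinContinuedLines(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        int n = raw.length();
        while (i < n) {
            char c = raw.charAt(i);
            if (c != '\\' || i + 1 == n) {
                // Ordinary character, or a lone trailing backslash.
                out.append(c);
                i++;
                continue;
            }
            char next = raw.charAt(i + 1);
            if (next != '\n' && next != '\r') {
                // Some other escape (including an escaped backslash): keep both characters.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            // Backslash + line terminator: drop both; CR LF counts as one terminator.
            i += 2;
            if (next == '\r' && i < n && raw.charAt(i) == '\n') {
                i++;
            }
            // Gobble the horizontal whitespace that leads the next line.
            while (i < n && (raw.charAt(i) == ' ' || raw.charAt(i) == '\t')) {
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // No space wanted: a long URL split across two source lines.
        System.out.println(joinContinuedLines("https://example.org/a/\\\n        b/c"));
        // Space wanted: it is placed just before the backslash.
        System.out.println(joinContinuedLines("one \\\n\ttwo"));
    }
}
```

The two calls print `https://example.org/a/b/c` and `one two`, which matches the point made above: a programmer who wants a space puts it before the <\ LT>, and indentation after the break never leaks into the string.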
I think what's happening is another iteration of "Let's do raw strings! Wait, that's not what they really are" and now we are at "Let's do multi-line strings!" Brian's comment is that the tri-quote makes a better container for payloads with single quotes. Those payloads often have multiple lines too. So it's really "fatter strings", in some sense. We might say we are making strings with *unescaped LTs*. The rectangle rule shows up as soon as we realize that programmers have strong opinions about spacing, and want to indent their code so it is readable. (Pretty too; beauty is a proxy for readability I suppose.) So if we let the programmer start putting paragraphs into string bodies, we also have to let the programmer manage indentation. And it's a short and natural step from exdenting to line-breaking, IMO. We might say we are making *more readable syntax for large strings*. Minimizing escape sequences makes them readable, and so does giving the programmer control over program layout. Such "readable strings" make some sense for one-liners also, especially if we extend the 2D rectangle rule to the 1D case and strip leading and trailing whitespace, near the triquotes. In the end, we might just dub them "fat strings". ? John From john.r.rose at oracle.com Sat May 4 00:43:45 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 17:43:45 -0700 Subject: String literals: some principles In-Reply-To: <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> Message-ID: On May 3, 2019, at 3:25 PM, John Rose wrote: > > And here's a draconian one: Forbid <\ u X X X X> > where the code point is 007F or lower. That would > blow up some stupid test cases and puzzlers; user > code that does this should be fixed. If we can't do > this everywhere, do it inside string bodies. 
> > We may be limited by backward compatibility on the > application of these ideas to thin strings, but they should > be considered at least for fat strings. Here's an example of how \uXXXX escapes could be brought into alignment with the escape sublanguage: https://docs.oracle.com/javase/specs/jls/se12/html/jls-3.html#jls-3.10.6 > 3.10.6. Escape Sequences for Character and String Literals ? > It is a compile-time error if the character following a backslash in an escape sequence is not an ASCII b, t, n, f, r, ", ', \, + [?space, LineTerminator,?] > 0, 1, 2, 3, 4, 5, 6, or 7. The Unicode escape \u is processed earlier (?3.3). +In a [?fat?] string literal, no part of the open or closing quote, or of +any escape sequence, or of any stripped whitespace, may contain +a character that was derived (in the earlier processing) from +a Unicode escape +[?, unless the first character of the literal, a ", was also derived +from a Unicode escape?] +. > Octal escapes are provided for compatibility with C, but can express only Unicode values \u0000 through \u00FF, so Unicode escapes are usually preferred. +(In a string literal we forbid Unicode escapes for characters which +steer the lexical syntax of the literal. This makes it easier to +read. [?The exception allows Java programs to be encoded with +dense use of Unicode escapes, as long as the open-quotes are +so encoded.?]) If we omit [fat] in the above, we get an incompatible change to thin strings. But I think it would actually be the right move. Here's a puzzler I just thought of: var puz = "\1\u0032"; // puz = '\1'+"0" or '\10'+""? This is a one-character string "\n". If \u escapes were a proper part of the escape sub-language, then puz would be a two-character string. Here's a place where prior-expansion of \u escapes interferes with the structure of fat strings: var fat = """ \u0020 hello """; // fat = "hello\n" or " hello\n"? 
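The thin-string puzzler can be checked with today's compiler. The sketch below (class name hypothetical) relies only on the existing JLS rules: Unicode escapes are translated before tokenization, so the lexer sees a backslash followed by the digits 1 and 2, which is a single octal escape.

```java
public class UnicodeEscapePuzzler {

    // After Unicode-escape translation the literal body is: backslash, 1, 2.
    // That is one octal escape, so the string holds exactly one character.
    static final String PUZ = "\1\u0032";

    public static void main(String[] args) {
        System.out.println(PUZ.length());      // 1
        System.out.println(PUZ.equals("\n"));  // true: octal 12 is code point 10 (LF)
    }
}
```

Under the restriction proposed above, this literal would presumably be rejected rather than reinterpreted, since the character produced by the Unicode escape ends up inside an escape sequence.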
We can stop caring about the awkward phasing of \u escapes if and only if we make a restriction that \u escapes can't mix with other parts of string syntax, as above. This goes for the new syntax as well as the old. It's easier to impose such a rule on new syntax, of course. This sort of thing makes me want to put the restriction on all string (and character) literals. It seems to me that only deliberately obfuscated code would fall afoul of it. If that's really true, this feature is completely separable from fat strings or any other menu items, as long as we are willing to apply it after the fact, incompatibly with obfuscated code. ? John From brian.goetz at oracle.com Tue May 7 22:14:59 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 7 May 2019 15:14:59 -0700 Subject: [ string literals ] Extending the escape language (was: String literals: some principles) In-Reply-To: <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> Message-ID: <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> > TL;DR: Good framework; must also account for the > rectangle extraction rule (RER). A unified escape > sublanguage (ESL) is highly desirable, and I propose > adding <\ > and <\ LT WS*> as escapes for space > and for null string. The existing \ char is OK, and > should be "fattened" as a separate feature. I note > some issues with <\ u X X X X>. Agree in general with the desire to extend ESL with some whitespace sequences, though I take some issues with the syntax on \ and \. Some alternate ideas regarding \uxxxx. First, unicode escapes. Alex pointed out offline that we had worked our way into a linear thinking trap (again). In the first round, because we were focused on raw strings, we turned off \uxxxx processing in the body of a raw string, which raised the question of ?how do we turn it back on.? 
And also that, while we use the same escape character for both, they occupy very different places in the language; the ESL is purely about string literals, whereas \uxxxx is purely a lexing concern. His recommendation, which (now that its been explained to me) I strongly agree with, is: let?s not have this feature touch unicode processing at all. Let?s just leave unicode processing as is, using \uxxxx, whether in code, SLSLs, MLSLs, and any future ?raw? SLs. The similarly between \n and \uxxxx is purely coincidental. And if we really want the characters "\u0000? in a string literal, well, we know how to escape the \. Which brings us to \ and \. My main complaint here is that I am really uncomfortable using \ for ?literal space?, because at the end of the line, one cannot differentiate between \ and \ when reading the code. Alternatives include \_, or \s, or \., or ? many others. From john.r.rose at oracle.com Tue May 7 23:36:21 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 7 May 2019 16:36:21 -0700 Subject: [ string literals ] Extending the escape language (was: String literals: some principles) In-Reply-To: <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> Message-ID: <90AC5F2B-C787-4A7A-9829-66FE94897BA9@oracle.com> On May 7, 2019, at 3:14 PM, Brian Goetz wrote: > > >> TL;DR: Good framework; must also account for the >> rectangle extraction rule (RER). A unified escape >> sublanguage (ESL) is highly desirable, and I propose >> adding <\ > and <\ LT WS*> as escapes for space >> and for null string. The existing \ char is OK, and >> should be "fattened" as a separate feature. I note >> some issues with <\ u X X X X>. > > Agree in general with the desire to extend ESL with some whitespace sequences, though I take some issues with the syntax on \ and \. Some alternate ideas regarding \uxxxx. 
> > First, unicode escapes. Alex pointed out offline that we had worked our way into a linear thinking trap (again). In the first round, because we were focused on raw strings, we turned off \uxxxx processing in the body of a raw string, which raised the question of ?how do we turn it back on.? And also that, while we use the same escape character for both, they occupy very different places in the language; the ESL is purely about string literals, whereas \uxxxx is purely a lexing concern. I don't think that's the trap we are in. The trap is the Language Experts Designing User Model trap, where LE's say "we don't need to deal with \u because it's not the part of the JLS we are working on", and the user says, "they are all just escapes, right?" The reason it's a trap is we think the user will be happy to learn and apply the geeky-fine distinctions between the two superficially similar syntaxes. One good way out of this particular trap is to carefully restrict the allowed \uxxxx patterns in strings, so that the phase order becomes irrelevant, and then move those patterns forward in the phase order along with the other escapes. We can also do as you are recommending, and ignore the problem. The only difficulty there is occasionally having to ask the user to ignore the problem also, by saying things like "yes, that's an escape sequence but \u sequence break the rule you are trying to apply". Such as using "\0040" to escape a space. How frequent is "occasionally"? I don't know; if it's very infrequent then, yes, we can ignore this problem. It will give puzzler authors some extra scope for their hobby. > His recommendation, which (now that its been explained to me) I strongly agree with, is: let?s not have this feature touch unicode processing at all. Let?s just leave unicode processing as is, using \uxxxx, whether in code, SLSLs, MLSLs, and any future ?raw? SLs. The similarly between \n and \uxxxx is purely coincidental. (That's why it's a LEDUM trap.) 
> And if we really want the characters "\u0000? in a string literal, well, we know how to escape the \. > > Which brings us to \ and \. My main complaint here is that I am really uncomfortable using \ for ?literal space?, because at the end of the line, one cannot differentiate between \ and \ when reading the code. Alternatives include \_, or \s, or \., or ? many others. Personally, I'm fine with those. By analogy with \n I suppose \s will be unsurprising; I don't care about this corner of the bikeshed, though. I certainly agree that having more than one "\ whitespace" sequence creates visual ambiguities; that's a good catch. ? John From guy.steele at oracle.com Wed May 8 20:26:33 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 8 May 2019 16:26:33 -0400 Subject: [ string literals ] Extending the escape language (was: String literals: some principles) In-Reply-To: <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> Message-ID: <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> > On May 7, 2019, at 6:14 PM, Brian Goetz wrote: > > . . . at the end of the line, one cannot differentiate between \ and \ when reading the code. This suggests a design constraint for the ESL: whatever \ means, \ ought to mean the same thing. 
From john.r.rose at oracle.com Wed May 8 20:27:43 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 8 May 2019 13:27:43 -0700 Subject: [ string literals ] Extending the escape language (was: String literals: some principles) In-Reply-To: <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> Message-ID: <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com> On May 8, 2019, at 1:26 PM, Guy Steele wrote: > >> >> On May 7, 2019, at 6:14 PM, Brian Goetz wrote: >> >> . . . at the end of the line, one cannot differentiate between \ and \ when reading the code. > > This suggests a design constraint for the ESL: whatever \ means, \ ought to mean the same thing. Or else \+ is illegal. In other words, there shouldn't be more than one non-error meaning. From guy.steele at oracle.com Wed May 8 20:31:56 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 8 May 2019 16:31:56 -0400 Subject: [ string literals ] Extending the escape language (was: String literals: some principles) In-Reply-To: <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com> Message-ID: > On May 8, 2019, at 4:27 PM, John Rose wrote: > > On May 8, 2019, at 1:26 PM, Guy Steele wrote: >> >>> >>> On May 7, 2019, at 6:14 PM, Brian Goetz wrote: >>> >>> . . . at the end of the line, one cannot differentiate between \ and \ when reading the code. >> >> This suggests a design constraint for the ESL: whatever \ means, \ ought to mean the same thing. > > Or else \+ is illegal. > In other words, there shouldn't be > more than one non-error meaning. 
True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \ but reject \+, or to make \+ "just work", and (b) what are costs of making \+ "just work".

From james.laskey at oracle.com Wed May 8 20:35:23 2019
From: james.laskey at oracle.com (James Laskey)
Date: Wed, 8 May 2019 17:35:23 -0300
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To:
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com>
Message-ID: <5579BE1E-435C-4735-9402-49634AF7A86E@oracle.com>

Sent from my iPhone

> On May 8, 2019, at 5:31 PM, Guy Steele wrote:
>
>
>> On May 8, 2019, at 4:27 PM, John Rose wrote:
>>
>> On May 8, 2019, at 1:26 PM, Guy Steele wrote:
>>>
>>>>
>>>> On May 7, 2019, at 6:14 PM, Brian Goetz wrote:
>>>>
>>>> . . . at the end of the line, one cannot differentiate between \ and \ when reading the code.
>>>
>>> This suggests a design constraint for the ESL: whatever \ means, \ ought to mean the same thing.
>>
>> Or else \+ is illegal.
>> In other words, there shouldn't be
>> more than one non-error meaning.
>
> True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \ but reject \+, or to make \+ "just work", and (b) what are costs of making \+ "just work".
>

Explaining to Joe Programmer might be the main cost.
From guy.steele at oracle.com Wed May 8 20:37:35 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Wed, 8 May 2019 16:37:35 -0400
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To: <5579BE1E-435C-4735-9402-49634AF7A86E@oracle.com>
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com> <5579BE1E-435C-4735-9402-49634AF7A86E@oracle.com>
Message-ID:

> On May 8, 2019, at 4:35 PM, James Laskey wrote:
>
>
>
> Sent from my iPhone
>
>> On May 8, 2019, at 5:31 PM, Guy Steele wrote:
>>
>>
>>> On May 8, 2019, at 4:27 PM, John Rose wrote:
>>>
>>> On May 8, 2019, at 1:26 PM, Guy Steele wrote:
>>>>
>>>>>
>>>>> On May 7, 2019, at 6:14 PM, Brian Goetz wrote:
>>>>>
>>>>> . . . at the end of the line, one cannot differentiate between \ and \ when reading the code.
>>>>
>>>> This suggests a design constraint for the ESL: whatever \ means, \ ought to mean the same thing.
>>>
>>> Or else \+ is illegal.
>>> In other words, there shouldn't be
>>> more than one non-error meaning.
>>
>> True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \ but reject \+, or to make \+ "just work", and (b) what are costs of making \+ "just work".
>>
>
> Explaining to Joe Programmer might be the main cost.

True dat.
From john.r.rose at oracle.com Wed May 8 22:31:35 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 8 May 2019 15:31:35 -0700 Subject: [ string literals ] Extending the escape language (was: String literals: some principles) In-Reply-To: References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com> Message-ID: On May 8, 2019, at 1:31 PM, Guy Steele wrote: > > True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \ but reject \+, or to make \+ ?just work?, and (b) what are costs of making \+ ?just work?. Deprecating invisible whitespace before is a common practice. The OpenJDK repos reject this along with leading tabs and other visual ambiguities. Come to think of it, this common practice is... - evidence that Joe P. already knows isn't quite kosher. - evidence that *shouldn't* be a *significant* part of a new Java syntax! - *not* necessarily a candidate for enforcement at the language level. (The middle point supports <\ s> against <\ space> as a candidate escape sequence!) From james.laskey at oracle.com Thu May 9 12:06:41 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Thu, 9 May 2019 09:06:41 -0300 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] Message-ID: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> At this point I think the only outstanding issue is long line continuation. While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few. Meanwhile, please review the JEP and comment back here. 
Cheers, -- Jim html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md From james.laskey at oracle.com Thu May 9 14:34:48 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Thu, 9 May 2019 11:34:48 -0300 Subject: Long line string literals Message-ID: <454D3E30-C49B-49F0-9BC6-CA9E44117051@oracle.com> How does a Java developer express a very long string? Note that this is not just a multi-line string literal question. The issue relates to all string literals. Example, String ls = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc est libero, vehicula nec molestie in, semper aliquam magna."; Current solution, String ls = "Lorem ipsum dolor sit amet, consectetur " + "adipiscing elit. Nunc est libero, vehicula " + "nec molestie in, semper aliquam magna."; This works and will continue to work, but I think there is concern that this pattern won't work when multi-line string literals are added to the equation. There has been some debate about various machinations that could be be used. Some of the parameters; - The solution needs to be an escape sequence(s). This is the only mechanism we can introduce (now) and be backward compatible with traditional string literals. Other mechanisms, such as literal prefixing, are not open for discussion at this point in time. (+1) - A Multi-line String Literal JEP goal is to make all escape sequences equally meaningful for traditional string literals and multi-line string literals. (+1) - \, \ and \ (white space includes LF and CR) have been proposed with various semantics for each. There is a concern that the lack of visibility for what comes after the \. Is it a space, tab, unicode white space, LF or CR? How do you tell? (?1) - When the new escape sequence(s) is in a traditional string literal the compiler scanner needs to treat the traditional string literal as multi-line. 
(-1) The escape sequences suggested differ, but they are all variations of consuming the escape and zero to N characters after (or before). A) \ or \ Just consume the (single) line terminator/white space. Sample, String tsl = "Lorem ipsum dolor sit amet, consectetur \ adipiscing elit. Nunc est libero, vehicula \ nec molestie in, semper aliquam magna."; String msl = """ Lorem ipsum dolor sit amet, consectetur \ adipiscing elit. Nunc est libero, vehicula \ nec molestie in, semper aliquam magna."""; This works if the line terminator follows immediately after the \ . (+1) Can not tell if it is a white space or line terminator after the \ . (-1) This does not work if there is one or more intervening white space characters. (-1) This works for multi-line string literals because of stripTrailing. (+1) This does not work for traditional string literals because there is no notion of auto alignment to strip the leading white space on the next line. (-2) B) \ Consume all white space up to and including the line terminator. Same sample as A). Works in more cases than A). (+2) Still does not work for traditional string literals because there is no notion of auto alignment to strip the leading white space on the next line. (-2) C) \ Consume all white space (including LF and CR) up to a non-white space or end of string. Same sample as A). This works for both traditional and multi-line strings. (+1) Note that in A), B) and C) the next line may influence multi-line indentation. I.E., escapes are translated after auto alignment. (?1) D) \, (something other that white space) but otherwise the same as C) String tsl = "Lorem ipsum dolor sit amet, consectetur \, adipiscing elit. Nunc est libero, vehicula \, nec molestie in, semper aliquam magna."; String msl = """ Lorem ipsum dolor sit amet, consectetur \, adipiscing elit. Nunc est libero, vehicula \, nec molestie in, semper aliquam magna."""; Works but trading " + for \, . 
(?1) E) \> (something other that white space) Consume all white space up to and including the line terminator. \< (something other that white space) Consume all white space back to beginning of line. String tsl = "Lorem ipsum dolor sit amet, consectetur \> \ \ \ \ References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> Message-ID: <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> I've withdrawn the posting for some additional changes. Will keep you posted. -- Jim > On May 9, 2019, at 9:06 AM, Jim Laskey wrote: > > At this point I think the only outstanding issue is long line continuation. While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few. > > Meanwhile, please review the JEP and comment back here. > > Cheers, > > -- Jim > > > html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html > markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md > > From amaembo at gmail.com Thu May 9 15:59:11 2019 From: amaembo at gmail.com (Tagir Valeev) Date: Thu, 9 May 2019 22:59:11 +0700 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> Message-ID: Hello! Great draft, thank you. I'm especially happy that the expert group came to the conclusion that automatic builtin processing of the indentation is important. I proposed to do this in January, 2018 [1]. While the solution proposed in JEP draft is not as radical as proposed by me, I still like it better than the previous RSL proposal. One thing which seems missing is dealing with tabs. What if user file is invented with tabs? Should they be also processed? More specifically, what is a "white space" in strip indent algorithm description? Only \u0020 symbol or \u0020 & \u0009? 
Or anything for which Character.isWhitespace() returns true? Also, if tabs are included, does a single tab cost the same as a single space? You can imagine somebody pasting part of a multi-line string from StackOverflow where tabs were used for indentation and seeing unexpected results (e.g. the indentation changes for untouched lines, even though in the editor the pasted lines appear to have the same indentation as the surrounding ones). I admit that defining the "expected result" is hard, especially given that most editors provide a setting for the tab size and different users may use different tab sizes. Nevertheless, I feel that tab handling should be explicitly spelled out (even if it's "tab is not considered a white-space character").

With best regards,
Tagir Valeev.

[1] http://mail.openjdk.java.net/pipermail/amber-spec-experts/2018-January/000251.html

On Thu, 9 May 2019 at 19:07, Jim Laskey wrote:

> At this point I think the only outstanding issue is long line
> continuation. While we can postpone continuation until a later release, I
> think we should at least lay out the details to see if we need to do
> anything now. I'll follow up with a (long line continuation) synopsis
> e-mail in a few.
>
> Meanwhile, please review the JEP and comment back here.
>
> Cheers,
>
> -- Jim
>
>
> html:
> http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html
> markdown:
> http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md
>
>
>

From james.laskey at oracle.com Thu May 9 16:40:19 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Thu, 9 May 2019 13:40:19 -0300
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To:
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>
Message-ID:

The proposed solution does not solve the mixed leading white space problem. As long as the white space is consistent across lines with the tabs, it works fine.

Otherwise, the developer has two options:

1) Use your editor to detab or detab + entab.
Either makes the white space consistent. 2) Write a custom String::transform method that does the trimMargins thing. String string = """ |> line 1 |> line 2 """.transform(s -> s.replaceAll("\\w*\\|> ", "")); The lone line discussion could fall out an automatic trimMargins solution, but it gets messy. -- Jim > On May 9, 2019, at 12:59 PM, Tagir Valeev wrote: > > Hello! > > Great draft, thank you. I'm especially happy that the expert group came to the conclusion that automatic builtin processing of the indentation is important. I proposed to do this in January, 2018 [1]. While the solution proposed in JEP draft is not as radical as proposed by me, I still like it better than the previous RSL proposal. > > One thing which seems missing is dealing with tabs. What if user file is invented with tabs? Should they be also processed? More specifically, what is a "white space" in strip indent algorithm description? Only \u0020 symbol or \u0020 & \u0009? Or anything for which Character.isWhiteSpace() returns true? Also if tabs are included do single tab costs the same as single space? You may imagine that somebody pastes part of multiline string from StackOverflow where tabs were used for indent and see some unexpected results (e.g. indent changes for the untouched lines while visually in the editor pasted lines look having the same indent as surrounding ones). I admit that defining what is "expected result" is hard, especially taking into account that most editors provide a setting for the tab size and different users may have different tab size. Nevertheless I feel that tab handling should be explicitly spelled out (even if it's "tab is not considered as a white-space character"). > > With best regards, > Tagir Valeev. > > [1] http://mail.openjdk.java.net/pipermail/amber-spec-experts/2018-January/000251.html > ??, 9 ??? 2019 ?., 19:07 Jim Laskey >: > At this point I think the only outstanding issue is long line continuation. 
While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few. > > Meanwhile, please review the JEP and comment back here. > > Cheers, > > -- Jim > > > html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html > markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md > > From guy.steele at oracle.com Thu May 9 16:43:35 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 9 May 2019 12:43:35 -0400 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>

Message-ID:

One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this.

> On May 9, 2019, at 12:40 PM, Jim Laskey wrote:
>
> The proposed solution does not solve the mixed leading white space problem. As long as the white space is consistent across lines with the tabs, it works fine.
>
> Otherwise, the developer has two solutions:
>
> 1) Use your editor to detab or detab + entab. Either makes the white space consistent.
>
> 2) Write a custom String::transform method that does the trimMargins thing.
>
> String string = """
>     |> line 1
>     |> line 2
>     """.transform(s -> s.replaceAll("\\w*\\|> ", ""));
>
> The lone line discussion could fall out of an automatic trimMargins solution, but it gets messy.
>
> -- Jim
>
>> On May 9, 2019, at 12:59 PM, Tagir Valeev wrote:
>>
>> Hello!
>>
>> Great draft, thank you. I'm especially happy that the expert group came to the conclusion that automatic builtin processing of the indentation is important. I proposed to do this in January, 2018 [1]. While the solution proposed in the JEP draft is not as radical as the one I proposed, I still like it better than the previous RSL proposal.
>>
>> One thing which seems to be missing is dealing with tabs. What if the user's file is indented with tabs? Should they also be processed? More specifically, what is "white space" in the strip-indent algorithm description? Only the \u0020 symbol, or \u0020 and \u0009? Or anything for which Character.isWhitespace() returns true? Also, if tabs are included, does a single tab cost the same as a single space? You may imagine that somebody pastes part of a multiline string from StackOverflow where tabs were used for indentation and sees some unexpected results (e.g. the indent changes for the untouched lines, while visually in the editor the pasted lines look as if they have the same indent as the surrounding ones). I admit that defining the "expected result" is hard, especially taking into account that most editors provide a setting for the tab size and different users may have different tab sizes. Nevertheless, I feel that tab handling should be explicitly spelled out (even if it's "tab is not considered a white-space character").
>>
>> With best regards,
>> Tagir Valeev.
>>
>> [1] http://mail.openjdk.java.net/pipermail/amber-spec-experts/2018-January/000251.html
>>
>> On Thu, 9 May 2019 at 19:07, Jim Laskey wrote:
>> At this point I think the only outstanding issue is long line continuation. While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few.
>>
>> Meanwhile, please review the JEP and comment back here.
>>
>> Cheers,
>>
>> -- Jim
>>
>> html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html
>> markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md

From james.laskey at oracle.com Thu May 9 16:53:34 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Thu, 9 May 2019 13:53:34 -0300
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To:
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>

Message-ID: <67095905-B377-4E4E-B0F9-2D0879F3680C@oracle.com>

Reasonable. This is a learning step for new users of MLS.

> On May 9, 2019, at 1:43 PM, Guy Steele wrote:
>
> One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this.

From guy.steele at oracle.com Thu May 9 17:14:15 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Thu, 9 May 2019 13:14:15 -0400
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To: <67095905-B377-4E4E-B0F9-2D0879F3680C@oracle.com>
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>

 <67095905-B377-4E4E-B0F9-2D0879F3680C@oracle.com>
Message-ID:

The nice thing about this rule is that having an editor/IDE _either_ detab _or_ retab should fix the problem.

> On May 9, 2019, at 12:53 PM, Jim Laskey wrote:
>
> Reasonable. This is a learning step for new users of MLS.

From brian.goetz at oracle.com Thu May 9 22:21:37 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 9 May 2019 15:21:37 -0700
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To:
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>

Message-ID:

> One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this.

I see the logic here, but it also makes me a bit uncomfortable. Our story is that indentation-stripping is done by a JDK method (String::stripIndent), and that the language behavior is specified in terms of the library behavior. (This is essential if we want to allow users to opt out, do some manipulation on the un-aligned form, and then perform alignment -- the language and library behavior must be, er, aligned.)

We could surely make String::stripIndent throw when you present it a mixed-whitespace string, but do we really want this? I would prefer that stripIndent be a total function on strings, even if it has to produce ugly output when given ugly input.

From guy.steele at oracle.com Thu May 9 22:41:27 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Thu, 9 May 2019 18:41:27 -0400
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To:
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>

Message-ID:

We have already discussed other conditions that can cause string literals to be rejected before the indentation stripper gets a crack at them. I see nothing wrong with stripIndent being a total function but also having the compiler filter string literals that appear in source code before the function is applied. (The filter predicate could also be in the library if we want.)

From john.r.rose at oracle.com Fri May 10 04:46:35 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 9 May 2019 21:46:35 -0700
Subject: Long line string literals
In-Reply-To: <454D3E30-C49B-49F0-9BC6-CA9E44117051@oracle.com>
References: <454D3E30-C49B-49F0-9BC6-CA9E44117051@oracle.com>
Message-ID: <43AB2BC3-95E3-4006-A10D-0DFCF778A858@oracle.com>

On May 9, 2019, at 7:34 AM, Jim Laskey wrote:
>
> How does a Java developer express a very long string?
> ...
>
> Some of the parameters:
>
> - The solution needs to be an escape sequence(s). This is the only
>   mechanism we can introduce (now) and be backward compatible with
>   traditional string literals. Other mechanisms, such as literal
>   prefixing, are not open for discussion at this point in time. (+1)

+1 from me

> - A Multi-line String Literal JEP goal is to make all escape sequences
>   equally meaningful for traditional string literals and multi-line
>   string literals. (+1)

+1

> - \, \ and \ (white space includes LF
>   and CR) have been proposed with various semantics for each. There is a
>   concern about the lack of visibility of what comes after the \. Is it a
>   space, tab, unicode white space, LF or CR? How do you tell? (±1)

Yep. Also note that some source control systems (ours!) forbid trailing spaces before EOL in code, precisely because they are invisible. IMO this consideration immediately disqualifies <\ space> as a candidate for an escape sequence. <\ LT> is still just fine, and maybe <\ LT space*> is tolerable, but not if it means something different from <\ LT>.

> - When the new escape sequence(s) is in a traditional string literal the
>   compiler scanner needs to treat the traditional string literal as
>   multi-line. (-1)

Yes: If you use a <\ LT> escape sequence in a thin string, it becomes a ML string. If you thought that only fat strings could be ML strings, I've got a nice puzzler for you. The reality about fat strings is that they are nicely formatted multi-line strings (with the rectangle extraction feature).

> The escape sequences suggested differ, but they are all variations of
> consuming the escape and zero to N characters after (or before).

I'll say up front that greedily gobbling whitespace characters either before or after an escape is a powerful idea, IMO, because it allows the user to designate an ad hoc run of whitespace as "program format only, but not payload". If we make the ad hoc run easy to use, to make the program more readable, we win, as with the rectangle rule.

But there has to be a way to "fence" the whitespace gobbler so it doesn't gobble nearby whitespace which is intended as payload. You can do this today as <\ 0 4 0>, and I would prefer to add a more memorable optional <\ s>. To protect a tab, today's <\ t> works just fine. I think either <\ 0 4 0> or <\ s> is adequate to "fence the gobbler", in either direction.

>
> A) \ or \ Just consume the (single)
>    line terminator/white space.
>
> Sample,
>
> String tsl = "Lorem ipsum dolor sit amet, consectetur \
>     adipiscing elit. Nunc est libero, vehicula \
>     nec molestie in, semper aliquam magna.";
>
> String msl = """
>     Lorem ipsum dolor sit amet, consectetur \
>     adipiscing elit. Nunc est libero, vehicula \
>     nec molestie in, semper aliquam magna.""";
>
> This works if the line terminator follows immediately after the \ . (+1)
>
> Can not tell if it is a white space or line terminator after the \ . (-1)
>
> This does not work if there is one or more intervening white space
> characters. (-1)
>
> This works for multi-line string literals because of stripTrailing. (+1)
>
> This does not work for traditional string literals because there is no
> notion of auto alignment to strip the leading white space on the next
> line. (-2)

-1 from me. It lets you break the long line, but then you have to place it flush against the left margin. To me, breaking a long line inherently involves two decisions: 1. break the line, 2. decide where to place the second part on the next line, using spaces and tabs. So I want the same mechanism that gobbles the LT to also gobble the succeeding whitespace. Thus <\ LT WS*> expands to the null string.

>
> B) \ Consume all white space up to and including the line
>    terminator.
>
> Same sample as A).
>
> Works in more cases than A). (+2)
>
> Still does not work for traditional string literals because there is no
> notion of auto alignment to strip the leading white space on the next
> line. (-2)

Same objection (and proposal) as for A.

>
> C) \ Consume all white space (including LF and CR) up to a
>    non-white space or end of string.
>
> Same sample as A).
>
> This works for both traditional and multi-line strings. (+1)
>
> Note that in A), B) and C) the next line may influence multi-line
> indentation. I.e., escapes are translated after auto alignment. (±1)

+1 This is the one I like! I accept that, for a fat string with rectangle extraction, I am required to indent the second line fragment *after* the left margin of the extracted rectangle. It's a fine compromise.

String msl = """
    First.
     Lorem ipsum dolor sit amet, consectetur \
        adipiscing elit. Nunc est libero, vehicula \
        nec molestie in, semper aliquam magna.
    Last.
    """;
=> "First.\n Lorem...magna.\nLast."

In this example, the continuation lines (second and third after Lorem...) can be exdented to align with First and Last, but not further. Any extra indentation, after that of First and Last, is gobbled by <\ LT WS*>.

> D) \, (something other than white space) but otherwise the same as C)
>
> String tsl = "Lorem ipsum dolor sit amet, consectetur \,
>     adipiscing elit. Nunc est libero, vehicula \,
>     nec molestie in, semper aliquam magna.";
>
> String msl = """
>     Lorem ipsum dolor sit amet, consectetur \,
>     adipiscing elit. Nunc est libero, vehicula \,
>     nec molestie in, semper aliquam magna.""";
>
> Works but trading " + for \, . (±1)

-1 (Not sure what D buys?)

>
> E) \> (something other than white space)
>    Consume all white space up to and including the line terminator.
>    \< (something other than white space)
>    Consume all white space back to beginning of line.
>
> String tsl = "Lorem ipsum dolor sit amet, consectetur \>
>     \<adipiscing elit. Nunc est libero, vehicula \>
>     \<nec molestie in, semper aliquam magna.";
>
> String msl = """
>     Lorem ipsum dolor sit amet, consectetur \>
>     \<adipiscing elit. Nunc est libero, vehicula \>
>     \<nec molestie in, semper aliquam magna.""";
>
> A goal of the multi-line JEP was to make the string more readable, less
> error prone and maintainable. (-10)

Yep.

> Note for D) and E), is it an error if a non-white space is encountered
> or just stop? (±1)

D/K.

-- John

From daniel.smith at oracle.com Fri May 10 21:04:08 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 10 May 2019 15:04:08 -0600
Subject: Wrapping up the first two courses
In-Reply-To: <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com>
Message-ID: <9B1B3DB5-53FF-41B7-8B42-82837A1B39BE@oracle.com>

I generally like where this has landed. I've been an uninvolved observer, and can't possibly process all the discussions on this mailing list over the past few months, so sorry if I've missed some of the core arguments on certain points. But I scanned through the various threads, and wanted to point out a couple of things in this conclusion that strike me as odd/unmotivated.

> On Apr 22, 2019, at 7:15 AM, Brian Goetz wrote:
>
> So, I posit, we have consensus over the following things:
>
> - Multi-line strings are a useful feature on their own
> - Using "fat" delimiters for multi-line strings is practical and intuitive

There's an argument that "fat" delimiters are important because lots of use cases contain single quotes. Two thoughts on that:

- Okay, but that doesn't mean we have to prohibit "thin" delimiters, right? I have a weak preference for wanting to write multi-line strings using the standard " characters when I can get away with it. Seems more readable to me, especially for multi-line strings that aren't big chunks of marked-up text.

- What's the solution for single-line string literals that contain quotes? Fat delimiters are pretty hard to read when they're both on a single line, and I don't think the current story supports that anyway. If the solution is some "turn off escapes" mechanism, wouldn't the same mechanism work for multi-line strings?

> - There exists a reasonable alignment algorithm, which users can learn easily enough, and can be captured as a library method on String (some finer points to be hammered out)

Practically, the programming style I would want to use is Jim's example (h):

String h = """+--------+
              |  text  |
              +--------+""";

Occasionally -- when the line is wide -- I might want to fall back to one of the other styles (like (d)), but (h) would be my go-to.

Unfortunately, it seems like we've landed in a place where (h) is disallowed, because it can't be handled by a library method.

There have been various discussions about whether multi-line string literals are one-dimensional (open quote + payload + close quote) or two-dimensional (the contents of a rectangle in the editor). I think the two-dimensional model is the right abstraction -- that is, drawing a rectangle should be an inherent part of parsing a multi-line string literal. The "implicitly apply a library method to this string" view is based on the one-dimensional model, where after the fact we try to approximate the context of the literal and re-interpret the payload. Why tie our hands? (Strawman: "We want a pluggable string processor." Me: "Since when is parsing supposed to be pluggable?")

As a pretty-simple definition of the 2D rectangle, I'd be happy with "all columns to the right of the opening delimiter, on all lines until the closing delimiter". Indents in between must use whitespace to align with the opening delimiter; if they don't, that's a parse error.

I realize that some people prefer a different style, and that this story is complicated by tab characters and variable-width fonts. So maybe there's another rule (or two) for the 2D rectangle when the first line is blank, based on the placement of the closing delimiter, or based on the leftmost non-whitespace character. But my high-level point is that I'd rather not force the algorithm to be defined on a context-free String.

> - To the extent the language performs alignment, it should be consistent with what the library-based version does, so that users can opt out and opt back in again
> - There needs to be an opt-out, for the cases where alignment is not the default the user wants

I want to say that, again relying on the 2D program text the parser is working with, the algorithm should be designed so that delimiters can be placed in a way to naturally indicate no trimming should occur. E.g., end delimiter in column 0 (sorry, case (d)). Others have suggested something along those lines. I don't know if you'd call that an "opt out", but the best opt-outs are the ones that don't need special syntax or rules.

(That's another reason my preferred style (h) doesn't work for everybody, because it requires at least 3 characters of indentation.)
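For readers following along, the "context-free String" approach that the message above argues against can be sketched roughly as follows. This is only an illustration, not the JDK's String::stripIndent: the class name IndentSketch and both methods are made up for this example, only spaces and tabs are treated as indentation, and the strict "spelled the same" check proposed earlier in the thread is folded in as an exception.

```java
public class IndentSketch {
    // Returns the leading run of spaces and tabs on a line.
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    // Strips the shortest leading-whitespace prefix found on any non-blank line
    // from every non-blank line. Throws if some line does not start with exactly
    // that prefix, i.e. the indentation is not "spelled the same" everywhere.
    static String stripCommonIndent(String text) {
        String[] lines = text.split("\n", -1);
        String prefix = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines do not constrain the margin
            }
            String ws = leadingWhitespace(line);
            if (prefix == null || ws.length() < prefix.length()) {
                prefix = ws;
            }
        }
        if (prefix == null) {
            prefix = "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (!line.trim().isEmpty()) {
                if (!line.startsWith(prefix)) {
                    throw new IllegalArgumentException(
                            "inconsistent leading whitespace on line " + (i + 1));
                }
                out.append(line.substring(prefix.length()));
            }
            // blank lines are emitted as empty lines
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripCommonIndent("    +--------+\n    |  text  |\n    +--------+"));
    }
}
```

Note that this function sees only the string's own lines; it has no access to the column of the opening or closing delimiter, which is exactly the information the two-dimensional rule would rely on.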
From daniel.smith at oracle.com Fri May 10 23:39:48 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 10 May 2019 17:39:48 -0600
Subject: Wrapping up the first two courses
In-Reply-To: <9B1B3DB5-53FF-41B7-8B42-82837A1B39BE@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <9B1B3DB5-53FF-41B7-8B42-82837A1B39BE@oracle.com>
Message-ID:

> On May 10, 2019, at 3:04 PM, Dan Smith wrote:
>
> Practically, the programming style I would want to use is Jim's example (h):
>
> String h = """+--------+
>               |  text  |
>               +--------+""";

Thinking about this a bit more, I could also be happy uniformly adopting example (a):

String a = """
           +--------+
           |  text  |
           +--------+
           """;

Or, where needed, (e):

String e =
    """
    +--------+
    |  text  |
    +--------+
    """;

I think the key to this style for me is to stop thinking about this as a "string literal with newlines" and start thinking about it as a different entity. (Which is a good argument for fat delimiters.)

> As a pretty-simple definition of the 2D rectangle, I'd be happy with "all columns to the right of the opening delimiter, on all lines until the closing delimiter". Indents in between must use whitespace to align with the opening delimiter; if they don't, that's a parse error.
>
> I realize that some people prefer a different style, and that this story is complicated by tab characters and variable-width fonts. So maybe there's another rule (or two) for the 2D rectangle when the first line is blank, based on the placement of the closing delimiter, or based on the leftmost non-whitespace character. But my high-level point is that I'd rather not force the algorithm to be defined on a context-free String.

Reframing this to support things like (a) and (e), but still take context into account, I really think we could cut down on the degrees of freedom significantly, and just say this: the left margin of the rectangle aligns with the left side of the opening or closing delimiter, whichever is leftmost*; the top of the rectangle is the line after the opening delimiter. All indents must match the leftmost delimiter's prefix (where non-whitespace prefix text is replaced with spaces), and the line after the opening delimiter must be blank.

This is a very opinionated rule: the space to the left of the leftmost delimiter is simply off-limits. And any indentation to the right of the leftmost delimiter is preserved. That's just How It's Done. If you need a different left margin, move your delimiters (e.g., add a newline, like (e)). I think programmers would appreciate a simple, strict, easy-to-see rule, rather than a best-effort trimming algorithm.

(* I'd almost be willing to say that the opening delimiter always determines the indent, but I'm backing off for tab-lovers who won't like how prefix text gets replaced with spaces; though maybe tab-lovers will want to keep things tidy with a newline before the opening delimiter. Anyway, in most cases the opening and closing delimiter will start in the same column.)

From james.laskey at oracle.com Mon May 13 14:05:17 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Mon, 13 May 2019 11:05:17 -0300
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To: <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com>
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com>
Message-ID: <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com>

After some significant tweaks, reopening the JEP for review.

https://bugs.openjdk.java.net/browse/JDK-8222530

The most significant change is the renaming to Text Blocks (I'm sure it will devolve over time to Text Literals or just Texts.)
This is primarily to reflect the two-dimensionality of the new literal, whereas String literals are one-dimensional. Comment back here. Cheers, -- Jim > On May 9, 2019, at 12:44 PM, Jim Laskey wrote: > > I've withdrawn the posting for some additional changes. Will keep you posted. > > -- Jim > > >> On May 9, 2019, at 9:06 AM, Jim Laskey wrote: >> >> At this point I think the only outstanding issue is long line continuation. While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few. >> >> Meanwhile, please review the JEP and comment back here. >> >> Cheers, >> >> -- Jim >> >> >> html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html >> markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md >> >> > From forax at univ-mlv.fr Mon May 13 14:12:36 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 13 May 2019 16:12:36 +0200 (CEST) Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> Message-ID: <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr> > De: "Brian Goetz" > ?: "amber-spec-experts" > Envoy?: Dimanche 12 Mai 2019 21:38:38 > Objet: Call for bikeshed -- break replacement in expression switch > As mentioned in the preview mail, we have one more decision to make: the new > spelling of ?break value? in expression switches. We have previously discussed > ?break-with value?, which everyone seems to like better than ?break value?, but > I think we can, and should, do better. > (Despite the call-for-bikeshed, this is not to reopen every sub-decision ? the > 2x2 semantics, the use of ->, the name of the construct ? this bikeshed only > has room for one bike.) > There are two primary reasons why we prefer break-with to break. 
We originally > chose ?break value" when we had a more limited palette of options to choose > from (the keyword-resupply ship hadn?t yet docked.) The overloading of break > creates uncomfortable interactions. There is the obvious ambiguity between > ?break value? and ?break label?; there is also the slightly less obvious > interaction where we cannot permit ?break value? inside a loop or statement > switch inside an expression switch. While both of these can be ?specified > around?, they create distortions in the spec, which in turn creates complexity > in the user model; these are a sign that we may be pushing something a bit too > far. Further, historically ?break? has been a straight transfer of control; > this muddies up what ?break? means. > Once we alit on the idea of break-* as a keyword, it seemed immediately more > comfortable to make a new break-derived keyword; this allowed us to undo the > distortions that ?break value? introduced, and it immediately felt better. But > I think we can do better still. Here?s what?s making me uncomfortable. > We?ve actually been here before: lambda expressions were the first time we > allowed an expression to contain statements, and while the streamlined case of > ?x -> e? didn?t require any control statements, and many lambdas could be > expressed with this form, statement lambdas needed a way to say ?stop executing > the body of this lambda, and yield a value.? We settled ? somewhat > uncomfortably ? on ?return value" for this. > Fast-forward to today, when we?re introducing the second expression form that > can contain statements, and we face the same question: how to indicate ?I?m > done, I?m completing normally, here?s my value.? Lambdas provide no help here; > we can?t use ?return? here. (Well, we could, but that would be terrible, so > we?re not going to.) Which means we have to solve the problem again, but > differently. That?s already not so great. 
> Digression: What?s so terrible about ?return?, any why is it OK for lambdas but > not OK for switches? > While we could of course define ?return? to mean whatever we want, But, in > imperative languages with the concept of ?methods? or ?procedures?, including > Java, return has always had a clear meaning: unwind the current call frame, and > yield the designated value to the caller. Lambda expressions are effectively > method bodies (lambdas are literals for functional interfaces, which are single > method interfaces), and so return (barely) fits. But switch expressions are > most definitely not methods, and are not associated with call frames. Asking > users to look at the enclosing context when they see a ?return? in the middle > of a method, to know whether it returns from the method or merely transfers > control within the method, is a lot to ask. (Yes, I know lambdas ask this as > well; this is why this was an uncomfortable choice, and having made this hole, > I?m not anxious to expand it dramatically. If anything I?d prefer to close it, > but that?s another bikeshed.). > (end digression) > We could surely take ?break-with? and move on; it feels sufficiently ?switchy?. > But let?s look ahead a little bit. We?ve now confronted the same problem twice: > an expression form that, in a minority use case, needed a way to express ?stop > computing this expression, because I?m done, and here?s its value.? (And, > unfortunately, we have two different syntactic ways to express the same basic > concept.) Let?s call these ?structured expressions.? > We have two structured expression forms, and of the three numbers in computer > science, ?two? is not one of them. Which suggests we are going to face this > problem again some day ? whether it be ?block expressions?, or ?if > expressions?, or ?let expressions?, or ?try expressions?, or whatever. 
> (NB: this call-for-bikeshed most definitely does not extend to "why not just do generalized block expressions", so please don't go there. That said, you could treat this discussion as "if Java had block expressions, what might they look like?" But we're focusing on the content of the block, not how the block is framed.)
>
> Let's say for sake of argument that we might someday want to extend ternary expressions to support the same kind of "restricted block expressions" as expression switches. (This is just an example for purposes of illustration, let's not get derailed on "but you should use an 'if' statement for that").
>
> String s = (foo != null)
>     ? s
>     : {
>         println("null again at line" + __LINE__);
>         break-with "null";
>     };
>
> Such an expression needs a way to say "I'm done, here's my value", just as lambda and switch did before it. Clearly "return" is not the right thing here any more than it is for switches. And I don't think "break-with" is all that great here either! It's not terrible, but outside of a loop or switch, it starts to feel kind of forced. And it would be terrible to solve this problem twice with one-time solutions, and have no general story, and then have to come up with YET ANOTHER way of expressing the same basic concept. So regardless of what we expect for future expression forms, let's examine what our options are that are not tied to call frames (return) or direct transfer of control (switches and loops).
>
> Looking at what other languages have done here, there are a few broad directions:
>
> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
> - An operator that serves the same purpose, such as "-> e".
> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
> - Treating the last expression in the block as the result.
>
> I think we can dispatch all but the first relatively easily:
>
> - We don't use operators for "return", we use a keyword; this would be both a gratuitous departure, as well as too easy to miss.
> - Switch expressions don't have names, and even if we assigned to "switch", it wouldn't be obvious that we were actually terminating execution of the block.
> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>
> So, we want a keyword (or contextual keyword). In some hallway brainstorming, candidates that emerged include yield, produce, offer, offer-up, result, value-break, yield-value, provide, resulting-in, break-with, resulting, yielding, put, give, giving, ...
>
> (Also to keep in mind: remember we're dealing with a minority case; most of the time, there'll just be an expression on the RHS.)
>
> TL;DR: I think we might come to regret break-* just as we did with return -- because it won't scale to future demands we place on it, and having *three* ways to say basically the same thing in three different contexts would be embarrassing. I would like to see if we can do better.
>
> Of the options listed here, I have a favorite: yield. (This is one of the terms we've actually been using all along when describing this feature in English.)
>
> There is one obvious objection to "yield", which I'd like to preemptively address: that in some languages (though not in Java, except for the infrequently-used Thread.yield()), it is associated with concurrency primitives, such as generators.
> (This was the objection raised when yield was proposed in the context of lambdas.) But, these associations are not grounded in existing Java constructs (and, the progress of Loom suggests that constructs like async/await are not coming to Java, and even if we wanted language support for generators, there are ample other ways to say it.)

I kind of like the simplicity of the keyword yield, but I don't think it's a good idea to use it.
- as you said, yield in other languages has a different meaning, so even if Java doesn't use yield in a generator it will be confusing for people discovering Java after Python, for example.
- currently for Loom the way to yield from a continuation is to use Continuation.yield(scope) with scope being a continuation scope, so it might be confusing if there is a static import because "yield scope;" and "yield(scope);" have two different meanings.

I kind of like relinquish too, so I will stop there.

Rémi

From brian.goetz at oracle.com Mon May 13 14:20:13 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 13 May 2019 10:20:13 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr>
Message-ID:

> I kind of like the simplicity of the keyword yield, but I don't think it's a good idea to use it.
> - as you said, yield in other languages has a different meaning, so even if Java doesn't use yield in a generator it will be confusing for people discovering Java after Python, for example.

Everything is a tradeoff. There are two dimensions here to consider:
- What percentage of the user base has a polluted perspective?
- How badly are they polluted, and how hard is it to get over?

My suspicion is that the first number is actually pretty small, and for most of them, they can get over it.
And also: the percentage of people _on this list_ that are polluted is probably dramatically higher than for the ambient Java developer population (those that take an active interest in language evolution are probably familiar with more languages.)

So, do we want to pick something that is clear for most people, but polluted for a minority, or something that is crappy for everyone, but unpolluted? It depends, of course, but my main point is that I think the "pollution" angle is overblown, and we shouldn't over-rotate to it.

> - currently for Loom the way to yield from a continuation is to use Continuation.yield(scope) with scope being a continuation scope, so it might be confusing if there is a static import because "yield scope;" and "yield(scope);" have two different meanings.

Yes, but of course these can be changed, and if we went with yield in the language, we would of course update Loom APIs accordingly.

From dl at cs.oswego.edu Mon May 13 14:28:52 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 13 May 2019 10:28:52 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID: <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>

Having lost (nearly) this argument years ago, I'm not sure why I bother, but ...

On 5/12/19 3:38 PM, Brian Goetz wrote:
>
> Looking at what other languages have done here, there are a few broad directions:
>
> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
> - An operator that serves the same purpose, such as "-> e".
> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
> - Treating the last expression in the block as the result.

(The last one being "progn", the earliest and arguably still best of these.)
>
> I think we can dispatch all but the first relatively easily: ...
>
>
> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.

Last time around, the last point about subtlety and odd-lookingness of progn seemed to bother people the most. It is possible to make it less subtle by additionally requiring some symbol. Prefix "^" is still available. Allowing for example:

    String s = (foo != null)
        ? s
        : { println("null again at line" + __LINE__); ^ "null"; };

Which still lgtm....

-Doug

From guy.steele at oracle.com Mon May 13 19:08:48 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 13 May 2019 15:08:48 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>
Message-ID:

> On May 13, 2019, at 10:28 AM, Doug Lea

wrote:
>
>
> Having lost (nearly) this argument years ago, I'm not sure why I bother, but ...
>
> On 5/12/19 3:38 PM, Brian Goetz wrote:
>>
>> Looking at what other languages have done here, there are a few broad directions:
>>
>> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
>> - An operator that serves the same purpose, such as "-> e".
>> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
>> - Treating the last expression in the block as the result.
>
> (The last one being "progn", the earliest and arguably still best of these.)
>
>>
>> I think we can dispatch all but the first relatively easily: ...
>>
>>
>> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>
> Last time around, the last point about subtlety and odd-lookingness of progn seemed to bother people the most. It is possible to make it less subtle by additionally requiring some symbol. Prefix "^" is still available. Allowing for example:
>
>
> String s = (foo != null)
>     ? s
>     : { println("null again at line" + __LINE__); ^ "null"; };
>
> Which still lgtm....

Could be worse, but looks to me like Java with a Smalltalk accent -- just as

    { foo(); bar }

is Java with a Lisp (or ECL) accent. I would prefer to adapt a bit of syntax from ECL: the statement

    b => e;

evaluates b as a boolean expression, and if it is true, then e is evaluated and its value becomes the value of the block.
This gives you a syntax very similar to that of Lisp COND:

    { x > y => 1; x < y => -1; true => 0; }

If you then want to further abbreviate "true =>", well, that's another story, but I wouldn't blame you.

--Guy

From guy.steele at oracle.com Mon May 13 19:13:09 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 13 May 2019 15:13:09 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr>
Message-ID: <2131355F-93D9-4346-A973-F513DD33AC29@oracle.com>

> On May 13, 2019, at 10:20 AM, Brian Goetz wrote:
>
>> I kind of like the simplicity of the keyword yield, but I don't think it's a good idea to use it.
>> - as you said, yield in other languages has a different meaning, so even if Java doesn't use yield in a generator it will be confusing for people discovering Java after Python, for example.
>
> Everything is a tradeoff. There are two dimensions here to consider:
> - What percentage of the user base has a polluted perspective?
> - How badly are they polluted, and how hard is it to get over?
>
> My suspicion is that the first number is actually pretty small, and for most of them, they can get over it. And also: the percentage of people _on this list_ that are polluted is probably dramatically higher than for the ambient Java developer population (those that take an active interest in language evolution are probably familiar with more languages.)

It's true; I have been polluted for "yield" for a long, long time. I think I would still prefer "produce".

> So, do we want to pick something that is clear for most people, but polluted for a minority, or something that is crappy for everyone, but unpolluted? It depends, of course, but my main point is that I think the "pollution" angle is overblown, and we shouldn't over-rotate to it.
>
>> - currently for Loom the way to yield from a continuation is to use Continuation.yield(scope) with scope being a continuation scope, so it might be confusing if there is a static import because "yield scope;" and "yield(scope);" have two different meanings.
>
> Yes, but of course these can be changed, and if we went with yield in the language, we would of course update Loom APIs accordingly.
>
>

From john.r.rose at oracle.com Mon May 13 19:48:39 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 May 2019 12:48:39 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>
Message-ID: <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com>

On May 13, 2019, at 12:08 PM, Guy Steele wrote:
>
>> On May 13, 2019, at 10:28 AM, Doug Lea

wrote:
>>
>>
>> Having lost (nearly) this argument years ago, I'm not sure why I bother, but ...
>>
>> On 5/12/19 3:38 PM, Brian Goetz wrote:
>>>
>>> Looking at what other languages have done here, there are a few broad directions:
>>>
>>> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
>>> - An operator that serves the same purpose, such as "-> e".
>>> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
>>> - Treating the last expression in the block as the result.
>>
>> (The last one being "progn", the earliest and arguably still best of these.)
>>
>>>
>>> I think we can dispatch all but the first relatively easily: ...
>>>
>>>
>>> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>>
>> Last time around, the last point about subtlety and odd-lookingness of progn seemed to bother people the most. It is possible to make it less subtle by additionally requiring some symbol. Prefix "^" is still available. Allowing for example:
>>
>>
>> String s = (foo != null)
>>     ? s
>>     : { println("null again at line" + __LINE__); ^ "null"; };
>>
>> Which still lgtm....
>
> Could be worse, but looks to me like Java with a Smalltalk accent -- just as
>
> { foo(); bar }
>
> is Java with a Lisp (or ECL) accent. I would prefer to adapt a bit of syntax from ECL: the statement
>
> b => e;
>
> evaluates b as a boolean expression, and if it is true, then e is evaluated and its value becomes the value of the block.
> This gives you a syntax very similar to that of Lisp COND:
>
> { x > y => 1; x < y => -1; true => 0; }
>
> If you then want to further abbreviate "true =>", well, that's another story, but I wouldn't blame you.

OK, I can't resist putting some spray paint here.

If we are contemplating operator-like syntaxes (instead of the keyword-like ones that seem most reasonable, and which Brian is guiding us towards), then let's note that the operator-like syntax that *Java already has* for producing a value from a structured expression is "->". So perhaps the Java-native idiom for ECL's "true=>" is just "->". Or (more likely for me) it is a break-like keyword *with an arrow*.

So under that observation:

    switch (x) { case Y -> z; }

is short for something like:

    switch (x) { case Y -> { ... break -> z; } }

and (what's more) the "..." could contain side effects and let-bindings. The rule for developers is that if you needed to put a {...} block after your arrow ->, then you can still use an arrow to return a value, but it must be an extra arrow, marked with a keyword (or syntax context) that means "here is the rest of the arrow you wanted to write a moment ago".

This could work inside of lambdas also:

    f( (x,y) -> z )

is short for something like:

    f( (x,y) -> { ... return -> z; } )

(Why do such a thing? To give users the option of a uniform style which answers every "->{" with a finishing arrow; they *can* use unadorned "return" but their colleagues might frown on the faux pas.)

One reason I'm pushing on the "interrupted arrow" idea here is a fundamental design prejudice I have. I very much like the Lisp syntax (block foo ... (return-from foo x) ...). Although "return" is damaged goods for us, what I'd like to salvage from this example is the *very clear correspondence* between the "starter syntax" of the structured expression ("block foo") and the "stopper syntax" in the middle ("return-from foo"). The shared tag "foo" makes it very easy for the eye to match up the stopper with the starter.
You don't have to consult a complex matrix of "what matches with what". ("I shot an arrow into the air, and where it landed only the author of the break permeability matrix knows here.")

OK, one more spritz of spray paint and I'm done for now. If we like the idea of an "interrupted arrow", then we could think about going the whole way with it. If the "stopper" is the sharp end of the arrow (anchored to a keyword like return or break) then the "starter" of the structured expression could be the dull end of the arrow (without an arrowhead). Like this:

    switch (x) { case Y -{... break -> z; } }

Here, the rule is if you intend to use an arrow to return a value, you put half of the arrow where the return will go to, and the other half when you have a value. (Note that the syntax "break LABEL" could be added easily, later on, if there were any value for that, which probably there isn't.)

This conflicts with a bit of precedent with lambdas, where we might expect to break the arrows the new way:

    f( (x,y) -{ ... return -> z; } )

If we don't want broken arrows then set up a duel between the starter and stopper with opposing arrows:

    switch (x) { case Y -> {... break <- z; } }

Or let the author propose a target at the starter:

    switch (x) { case Y @< {... break -> z; } }

Surely that would be a spritz too far.

-- John

From john.r.rose at oracle.com Mon May 13 20:40:07 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 May 2019 13:40:07 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID: <1FCD1BB2-8B81-4D97-A29D-FE44DE6E1764@oracle.com>

On May 12, 2019, at 12:38 PM, Brian Goetz wrote:
>
> Digression: What's so terrible about "return", and why is it OK for lambdas but not OK for switches?

Although I like Lisp (block foo ...
(return-from foo x) ...), I buy your argument that "unwind the current call frame" is different enough from "transfer control within the current call frame", enough to merit syntactic difference.

There's a very subtle difference in Java (but not in Lisp) between call frames and blocks which you didn't mention directly but which tips the balance for me: A Java call frame has side-effectable locals (because Java is an imperative language, as you said). Thus, a block which exits to the current call frame can also push side effects to that call frame, while a block that unwinds to a different call frame cannot push side effects to enclosing variable bindings, because they will be in a different call frame.

That's a difference that is usually ignored when reasoning about Java programs, thanks to the implicitly final rule. Using distinct syntaxes for same-frame block exits and different-frame unwinds adds a little extra help to programmers to keep track of the difference. It doesn't matter that programmers usually don't care about the difference; having a syntax difference might help them avoid surprise errors, and allow them to keep the semantic differences at a non-distracting subliminal level.

Thus, if I have to say "return" I know I can't return an extra value by side effects, since all my up-level variables will be final (implicitly or not). And if I say "yield" I know I can return an extra value, if I need to, by punching it into some visible local.

From kevinb at google.com Mon May 13 21:55:34 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 13 May 2019 14:55:34 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID:

Moving away from "break": I'm interested....
So in colon-form switch (whether statement or expression) you are responsible for your own control flow, and in arrow-form switch (whether statement or expression) you are not. "break" is synonymous in users' minds with that control flow they don't want to have to do. So in theory it's arrow-form that should make the concept of "breaking" obsolete. Unfortunately, that doesn't seem like the distinction we'll be making; do I have the following right?

1. A colon-form case in a switch statement stays absolutely the same as always - keep `break`ing
2. An arrow-form case in a switch statement usually doesn't need to `break`... but can, just as an early-out from a block, right?
3. A colon-form case in a switch expression cannot `break` at all; it either yields, throws, or falls through
4. An arrow-form case in a switch expression: cannot `break` or fall through; must be a single expression, or it must always `__yield` or throw

So using break or not isn't about whether you are doing your own control flow or not. So it's not a nice conceptual clean break that way, but in practice we think most switches will be all #1 or all #4, do we not?

(side note: I had to think about these as four different kinds of switches. As I think users will much of the time; it would be very optimistic to think they will see it the way language designers do: two orthogonal features that they can simply compose together or use apart. Actually they won't see four kinds; they will think there are two kinds and then be very surprised when they come across a hybrid like 2 or 3.)

Anyway, I don't dislike yield even though I know it has other connotations. I think it communicates "I am done and I give forth this value", and what happens from there can be context-dependent and that seems fine....

*From: *Brian Goetz
*Date: *Mon, May 13, 2019 at 6:33 AM
*To: *amber-spec-experts

> As mentioned in the preview mail, we have one more decision to make: the new spelling of "break value" in expression switches.
> We have previously discussed "break-with value", which everyone seems to like better than "break value", but I think we can, and should, do better.
>
> (Despite the call-for-bikeshed, this is not to reopen every sub-decision -- the 2x2 semantics, the use of ->, the name of the construct -- this bikeshed only has room for one bike.)
>
> There are two primary reasons why we prefer break-with to break. We originally chose "break value" when we had a more limited palette of options to choose from (the keyword-resupply ship hadn't yet docked.) The overloading of break creates uncomfortable interactions. There is the obvious ambiguity between "break value" and "break label"; there is also the slightly less obvious interaction where we cannot permit "break value" inside a loop or statement switch inside an expression switch. While both of these can be "specified around", they create distortions in the spec, which in turn creates complexity in the user model; these are a sign that we may be pushing something a bit too far. Further, historically "break" has been a straight transfer of control; this muddies up what "break" means.
>
> Once we alit on the idea of break-* as a keyword, it seemed immediately more comfortable to make a new break-derived keyword; this allowed us to undo the distortions that "break value" introduced, and it immediately felt better. But I think we can do better still. Here's what's making me uncomfortable.
>
> We've actually been here before: lambda expressions were the first time we allowed an expression to contain statements, and while the streamlined case of "x -> e" didn't require any control statements, and many lambdas could be expressed with this form, statement lambdas needed a way to say "stop executing the body of this lambda, and yield a value." We settled -- somewhat uncomfortably -- on "return value" for this.
>
> Fast-forward to today, when we're introducing the second expression form that can contain statements, and we face the same question: how to indicate "I'm done, I'm completing normally, here's my value." Lambdas provide no help here; we can't use "return" here. (Well, we could, but that would be terrible, so we're not going to.) Which means we have to solve the problem again, but differently. That's already not so great.
>
> Digression: What's so terrible about "return", and why is it OK for lambdas but not OK for switches?
>
> While we could of course define "return" to mean whatever we want, in imperative languages with the concept of "methods" or "procedures", including Java, return has always had a clear meaning: unwind the current call frame, and yield the designated value to the caller. Lambda expressions are effectively method bodies (lambdas are literals for functional interfaces, which are single method interfaces), and so return (barely) fits. But switch expressions are most definitely not methods, and are not associated with call frames. Asking users to look at the enclosing context when they see a "return" in the middle of a method, to know whether it returns from the method or merely transfers control within the method, is a lot to ask. (Yes, I know lambdas ask this as well; this is why this was an uncomfortable choice, and having made this hole, I'm not anxious to expand it dramatically. If anything I'd prefer to close it, but that's another bikeshed.)
>
> (end digression)
>
>
> We could surely take "break-with" and move on; it feels sufficiently "switchy". But let's look ahead a little bit. We've now confronted the same problem twice: an expression form that, in a minority use case, needed a way to express "stop computing this expression, because I'm done, and here's its value." (And, unfortunately, we have two different syntactic ways to express the same basic concept.)
> Let's call these "structured expressions."
>
> We have two structured expression forms, and of the three numbers in computer science, "two" is not one of them. Which suggests we are going to face this problem again some day -- whether it be "block expressions", or "if expressions", or "let expressions", or "try expressions", or whatever. (NB: this call-for-bikeshed most definitely does not extend to "why not just do generalized block expressions", so please don't go there. That said, you could treat this discussion as "if Java had block expressions, what might they look like?" But we're focusing on the content of the block, not how the block is framed.)
>
> Let's say for sake of argument that we might someday want to extend ternary expressions to support the same kind of "restricted block expressions" as expression switches. (This is just an example for purposes of illustration, let's not get derailed on "but you should use an 'if' statement for that").
>
> String s = (foo != null)
>     ? s
>     : {
>         println("null again at line" + __LINE__);
>         break-with "null";
>     };
>
> Such an expression needs a way to say "I'm done, here's my value", just as lambda and switch did before it. Clearly "return" is not the right thing here any more than it is for switches. And I don't think "break-with" is all that great here either! It's not terrible, but outside of a loop or switch, it starts to feel kind of forced. And it would be terrible to solve this problem twice with one-time solutions, and have no general story, and then have to come up with YET ANOTHER way of expressing the same basic concept. So regardless of what we expect for future expression forms, let's examine what our options are that are not tied to call frames (return) or direct transfer of control (switches and loops).
>
> Looking at what other languages have done here, there are a few broad directions:
>
> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
> - An operator that serves the same purpose, such as "-> e".
> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
> - Treating the last expression in the block as the result.
>
> I think we can dispatch all but the first relatively easily:
>
> - We don't use operators for "return", we use a keyword; this would be both a gratuitous departure, as well as too easy to miss.
> - Switch expressions don't have names, and even if we assigned to "switch", it wouldn't be obvious that we were actually terminating execution of the block.
> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>
> So, we want a keyword (or contextual keyword). In some hallway brainstorming, candidates that emerged include yield, produce, offer, offer-up, result, value-break, yield-value, provide, resulting-in, break-with, resulting, yielding, put, give, giving, ...
>
> (Also to keep in mind: remember we're dealing with a minority case; most of the time, there'll just be an expression on the RHS.)
>
> TL;DR: I think we might come to regret break-* just as we did with return -- because it won't scale to future demands we place on it, and having *three* ways to say basically the same thing in three different contexts would be embarrassing.
> I would like to see if we can do better.
>
>
> Of the options listed here, I have a favorite: yield. (This is one of the terms we've actually been using all along when describing this feature in English.)
>
> There is one obvious objection to "yield", which I'd like to preemptively address: that in some languages (though not in Java, except for the infrequently-used Thread.yield()), it is associated with concurrency primitives, such as generators. (This was the objection raised when yield was proposed in the context of lambdas.) But, these associations are not grounded in existing Java constructs (and, the progress of Loom suggests that constructs like async/await are not coming to Java, and even if we wanted language support for generators, there are ample other ways to say it.)
>
> Dictionary.com lists the following meanings for yield:
>
> verb (used with object)
> - to give forth or produce by a natural process or in return for cultivation:
> - to produce or furnish (payment, profit, or interest):
> - to give up, as to superior power or authority:
> - to give up or over; relinquish or resign:
> - to give as due or required:
> - to cause; give rise to:
>
> verb (used without object)
> - to give a return, as for labor expended; produce; bear.
> - to surrender or submit, as to superior power:
> - to give way to influence, entreaty, argument, or the like:
> - to give place or precedence (usually followed by to):
> - to give way to force, pressure, etc., so as to move, bend, collapse, or the like:
>
> These are mostly consistent with the use of "yield" as proposed here.
>
> One more thing to bear in mind: there is an ordering to abrupt completion
> mechanisms, as to how far away they can transfer control:
>
> - yield: can unwind only the innermost yieldable expression
> - break/continue: can unwind multiple control constructs (for, while,
> switch), but stays within the method
> - return: unwinds exactly one method
> - throw: unwinds one or more methods
> - System.exit: unwinds the whole VM
>
>
> Bikeshed is open (but remember the bounds of this bikeshed are limited;
> we're talking purely about the syntax of a "stop executing this block and
> yield a value to the enclosing context" -- and time is ticking.)
>

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Mon May 13 22:48:09 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 13 May 2019 18:48:09 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID:

> So in colon-form switch (whether statement or expression) you are responsible for your own control flow, and in arrow-form switch (whether statement or expression) you are not. "break" is synonymous in users' minds with that control flow they don't want to have to do. So in theory it's arrow-form that should make the concept of "breaking" obsolete. Unfortunately, that doesn't seem like the distinction we'll be making; do I have the following right?

In colon-form, you are always responsible for your control flow. In arrow-form, you are generally not, except that if you have a block on the RHS of the arrow, you are responsible for control flow _out of the block_. In:

    y = switch (x) {
        case 1 -> { foo(); yield 3; }
    };

there is a pleasant ambiguity as to whether the "yield 3" is yielding a value to the _block_, in which case the switch just completes normally, or whether it is yielding the value to the _switch case_.
And it doesn't really matter, so whichever intuition users are attracted to is fine.

> 1. A colon-form case in a switch statement stays absolutely the same as always - keep `break`ing
> 2. An arrow-form case in a switch statement usually doesn't need to `break`... but can, just as an early-out from a block, right?
> 3. A colon-form case in a switch expression cannot `break` at all; it either yields, throws, or falls through
> 4. An arrow-form case in a switch expression: cannot `break` or fall through; must be a single expression, or it must always `__yield` or throw

Right. There are several rules interacting here:

- An expression must either yield a value or throw; control statements like break, continue, or return are not allowed in a "structured expression."
- You break out of a switch statement; you yield values from a switch expression
- In arrow form, neither break nor yield is needed if the RHS is not a block
- In arrow form, break/yield/throw *is* needed if the RHS is a block

> So using break or not isn't about whether you are doing your own control flow or not. So it's not a nice conceptual clean break that way, but in practice we think most switches will be all #1 or all #4, do we not?

I would expect 1/4 to be the most common, followed by 2, with 3 bringing up the rear.

> Anyway, I don't dislike yield even though I know it has other connotations. I think it communicates "I am done and I give forth this value", and what happens from there can be context-dependent and that seems fine....
>

Yep. And that context dependency is:

- Yield yields to the immediately enclosing structured expression; if there is none, it is an error
- Unlabeled break/continue breaks to the immediately enclosing "breaky" statement; if there is none, it's an error. It also cannot "break through" a structured expression.
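
The four combinations discussed above (colon or arrow form, statement or expression) can be sketched in Java as follows. This is only an illustration under assumptions: it spells the keyword `yield`, which the thread had not yet settled on (it is the spelling that later shipped with switch expressions in Java 14), and the class and method names are invented for the example.

```java
// Hypothetical illustration of the 2x2 switch forms; names are not from the thread.
class YieldForms {

    // Arrow form, plain expression on the RHS: neither break nor yield is needed.
    static int arrowExpr(int x) {
        return switch (x) {
            case 1 -> 10;
            case 2 -> 20;
            default -> 0;
        };
    }

    // Arrow form, block on the RHS: the block must end by yielding or throwing.
    static int arrowBlock(int x) {
        return switch (x) {
            case 1 -> {
                int doubled = x * 2;
                yield doubled + 1;
            }
            default -> throw new IllegalArgumentException("unexpected: " + x);
        };
    }

    // Colon form in a switch expression: fallthrough is allowed, break is not;
    // every path has to yield a value or throw.
    static String colonExpr(int x) {
        return switch (x) {
            case 1:
            case 2:
                yield "small";
            default:
                yield "large";
        };
    }

    // Colon form in a switch statement: unchanged, you still break yourself.
    static String colonStmt(int x) {
        String result;
        switch (x) {
            case 1:
                result = "one";
                break;
            default:
                result = "other";
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(arrowExpr(2));
        System.out.println(arrowBlock(1));
        System.out.println(colonExpr(9));
        System.out.println(colonStmt(1));
    }
}
```

Replacing any `yield` above with `break` or `return` is rejected by the compiler, which is the "cannot break through a structured expression" rule in practice.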
From daniel.smith at oracle.com Tue May 14 23:15:55 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 14 May 2019 17:15:55 -0600 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> Message-ID: <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> > On May 13, 2019, at 8:05 AM, Jim Laskey wrote: > > After some significant tweaks, reopening the JEP for review. https://bugs.openjdk.java.net/browse/JDK-8222530 Something really clicks for me in calling these "text blocks". The delimiter syntax and conventions for line breaks/whitespace, which seemed somewhat arbitrary before, feel right. Nice psychological trick. Let me weigh in with some design feedback, in a refined form of some comments I made in a previous thread: Finding the right indentation trimming algorithm has been a struggle. We've come up with something, but it sure seems complex, and I'll bet most programmers will never fully internalize it. The struggle arises primarily because the feature has an ambitious goal of getting it "right" for a wide variety of indentation conventions, and also because the feature is constrained to be a post-processing step, independent of program context. I suggest rethinking both of those requirements. Instead, the language should be strongly opinionated about how text blocks should be indented, and should take the enclosing context into account. Specifically, the opening """ delimiter should mark the left margin of the text block, and it should be a compiler error to put content to the left of that margin. This results in a really simple, readable approach to indenting: the delimiter marks the rectangle. 
Detailed rules:

- The *prefix* of a text block is the program text after the immediately preceding \n or \r, up to the opening """, with every non-whitespace character replaced with a space (\u0020).

- The form of a text block is """ <whitespace>* ( <newline> <prefix> <content>* )+ """ (that is, opening delimiter, ignored whitespace, then one or more lines of content, each prefixed by a newline and the *prefix*; all prefixes must be identical).

- The string denoted by a text block is its <content>* strings after escape processing, concatenated together with '\n'.

Most of the examples in the JEP follow these rules as a convention already. The concatenation examples would benefit from following it.

Discussion:

What if I want to shift my content left? Just put a line break before the opening delimiter, and align it wherever you want to set your left margin. (If you don't want to strip anything, put the opening delimiter in column 0.) Your n-line text block now takes n+1 lines; nbd.

What if I want to shift my content right, beyond the delimiters? Don't do that. That's not how text blocks work. (I mean, you can do it, but your extra whitespace will be included in the denoted string.)

What about tabs? Tabs that come before the opening delimiter are recognized, and all prefixes must use the same pattern of tabs/spaces/[other exotic whitespace]. What if you want to have program text on the same line as the opening delimiter, but then want to use tabs underneath?:

\t \t System.out.println("""
\t \t \t \t \t \t \t Hello world!
\t \t \t \t \t \t \t """);

Well, then you're doing tabs wrong: different tab widths will make "Hello world!" appear to the left or right of the delimiters. So this is an error. Either use spaces after the first two tabs, or put the opening delimiter on a new line.

What about variable-width fonts? If you expect your code to be read in a variable-width font, by convention you should start all text blocks on a (possibly-indented) blank line.

What about Unicode escapes?
It's an orthogonal question, but I think it's fine to continue pre-processing all Unicode escapes. If you want to obfuscate prefixes and line breaks using \u0020 and \u000a, go for it.

From brian.goetz at oracle.com Tue May 14 23:25:17 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 14 May 2019 19:25:17 -0400
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To: <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com>
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com>
Message-ID: <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com>

> Most of the examples in the JEP follow these rules as a convention already. The concatenation examples would benefit from following it.

Sorry, not seeing it -- how would the concatenation examples benefit? Example?

From gavin.bierman at oracle.com Wed May 15 13:47:23 2019
From: gavin.bierman at oracle.com (Gavin Bierman)
Date: Wed, 15 May 2019 15:47:23 +0200
Subject: Draft language spec for JEP354: Switch Expressions
Message-ID:

Dear experts:

A draft language spec for JEP 354: Switch Expressions can be found here:

http://cr.openjdk.java.net/~gbierman/jep354-jls-201905.html

[Note: This spec uses the break-with statement. There is a discussion elsewhere on alternatives for a different syntax. The spec will be updated as soon as this discussion has been finalised.]

Comments welcome!
Gavin From daniel.smith at oracle.com Wed May 15 17:17:31 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 15 May 2019 11:17:31 -0600 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> Message-ID: <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> > On May 14, 2019, at 5:25 PM, Brian Goetz wrote: > >> Most of the examples in the JEP follow these rules as a convention already. The concatenation examples would benefit from following it. > > Sorry, not seeing it ? how would the concatenation examples benefit? Example? > Sure, let me elaborate. I think this: ~~~ String code = """ public void print(""" + type + """ o) { System.out.println(Objects.toString(o)); } """; ~~~ should be presented like this: ~~~ String code = """ public void print(""" + type + """ o) { System.out.println(Objects.toString(o)); } """; ~~~ It's not great, and replace/format is the "right" solution, but if somebody wants to do concatenation, this style does a better job of indicating where the indent prefix ends and the content begins. The delimiter gives a visual indication of where the "block" is located. Further illustrations: Things like this are following the convention I'm proposing we enforce: ~~~ String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
"""; ~~~ As is this: ~~~ """ line 1 line 2 line 3""" ~~~ This one doesn't, but it's a simple matter of putting some spaces before the closing delimiter to fix it: ~~~ String empty = """ """; ~~~ This concatenation example follows the convention (although note that there's no newline between '{' and 'System'): ~~~ String code = "public void print(Object o) {" + """ System.out.println(Objects.toString(o)); } """; ~~~ From john.r.rose at oracle.com Wed May 15 17:35:08 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 15 May 2019 10:35:08 -0700 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> Message-ID: 1/4. FTR, an escape <\ LT> could clean that up a bit more, if the goal is to get the interpolation cruft on a separate line: ~~~ String code = """ public void print(\ """ + type + """ o) { System.out.println(Objects.toString(o)); } """; ~~~ 2/4. Dan, I'm having trouble seeing your idea of "prefix" in this example. Is it that `String code = ` has the same number of chars as there are spaces before `public` (start of the first payload line)? This is hard to read, I'm afraid. 3/4. Dan, isn't it true that programmers can use this idiom under the existing proposal, without appealing to your "prefix" rule? All they do is (a) keep the close-quotes (in a single ""+x+"" expression) aligned, and also (b) don't exdent before the close quotes. 4/4. I guess you are proposing two adjustments, the "prefix" rule and the "no exdent rule". The "prefix" rule allows open-quote to set indentation, by counting arbitrary characters before the open-quote as setting a target column. 
The "no exdent rule" disallows payload chars in columns before the target column, as set by the close-quote. If I'm reading that right, I'm much happier with the "no exdent rule" than the "prefix" rule. ? John P.S. In one example you say something about a missing newline before a close-quote. Those can always be introduced explicitly by <\ n>. One reason I like <\ LT> is that it pairs very well with <\ n>: You can put in <\ LT> to control a line break, and then if you really want a payload LT also, you add <\ n> either before or after the <\ LT>. On May 15, 2019, at 10:17 AM, Dan Smith wrote: > > ~~~ > String code = """ > public void print(""" + > type + > """ > o) { > System.out.println(Objects.toString(o)); > } > """; > ~~~ > From daniel.smith at oracle.com Wed May 15 18:01:25 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 15 May 2019 12:01:25 -0600 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> Message-ID: <391EBA8C-36F6-4103-B153-8E38A5A8C9F0@oracle.com> > On May 15, 2019, at 11:35 AM, John Rose wrote: > > 2/4. Dan, I'm having trouble seeing your > idea of "prefix" in this example. Is it that > `String code = ` has the same number of > chars as there are spaces before `public` > (start of the first payload line)? This is hard > to read, I'm afraid. Yes. "Same number of characters" is the idea (with extra constraints to handle tabs and other exotic whitespace, but most people won't care about those). Is it hard to read because of a variable-width font? In a normal editing environment, I'm just saying the opening delimiter should be visually aligned with the content. > 3/4. 
Dan, isn't it true that programmers can > use this idiom under the existing proposal, > without appealing to your "prefix" rule? > All they do is (a) keep the close-quotes > (in a single ""+x+"" expression) aligned, > and also (b) don't exdent before the close > quotes. Sure. I'm claiming that it would be helpful to put some additional constraints on what constitutes a valid text block, in order to ensure some harder-to-read cases never come up. > 4/4. I guess you are proposing two adjustments, the > "prefix" rule and the "no exdent rule". The "prefix" > rule allows open-quote to set indentation, by counting > arbitrary characters before the open-quote as setting > a target column. The "no exdent rule" disallows payload > chars in columns before the target column, as set by > the close-quote. You could say that. Your "no exdent" rule prevents any lines but the (necessarily blank) closing-delimiter line from setting the target column. Your "prefix" rule transfers this responsibility to the opening-delimiter line. I think using the opening delimiter is helpful because 1) readers see the opening delimiter first, and 2) it frees the closing delimiter to be a marker for trailing whitespace/newlines. 
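
For readers trying to picture the "prefix" rule debated in this exchange, here is a rough Java sketch of it. It is an illustration under stated assumptions, not an implementation of the JEP or of any compiler: the names `PrefixRule`, `prefixOf` and `denote` are hypothetical, escape processing is skipped, and the caller is assumed to pass the content lines with the closing delimiter already removed (so a closing delimiter alone on its line contributes one line containing only the prefix).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed "prefix" rule; not part of any JDK API.
final class PrefixRule {

    // The prefix is everything before the opening delimiter on its line,
    // with each non-whitespace character replaced by a space (tabs are kept).
    static String prefixOf(String lineWithOpening) {
        int open = lineWithOpening.indexOf("\"\"\"");
        if (open < 0) {
            throw new IllegalArgumentException("no opening delimiter on this line");
        }
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < open; i++) {
            char c = lineWithOpening.charAt(i);
            prefix.append(Character.isWhitespace(c) ? c : ' ');
        }
        return prefix.toString();
    }

    // Every content line must start with exactly the prefix; anything else is
    // treated as content to the left of the margin and rejected. The remaining
    // text of each line is joined with '\n'.
    static String denote(String lineWithOpening, List<String> contentLines) {
        String prefix = prefixOf(lineWithOpening);
        List<String> stripped = new ArrayList<>();
        for (String line : contentLines) {
            if (!line.startsWith(prefix)) {
                throw new IllegalArgumentException("line does not start with the prefix: " + line);
            }
            stripped.add(line.substring(prefix.length()));
        }
        return String.join("\n", stripped);
    }

    public static void main(String[] args) {
        String opening = "String s = \"\"\"";
        String margin = " ".repeat(11); // width of "String s = "
        String value = denote(opening, List.of(margin + "Hello", margin + "  world", margin));
        System.out.println(value);
    }
}
```

In this sketch the tab example from earlier in the thread fails as described: the prefix for two tabs followed by `System.out.println(` is two tabs plus spaces, so a content line indented with seven tabs does not start with it and is rejected.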
From alex.buckley at oracle.com Wed May 15 18:11:44 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Wed, 15 May 2019 11:11:44 -0700 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> Message-ID: <5CDC5660.9010802@oracle.com> On 5/15/2019 10:17 AM, Dan Smith wrote: > I think this: > > ~~~ > String code = """ > public void print(""" + type + """ > o) { > System.out.println(Objects.toString(o)); > } > """; > ~~~ > > should be presented like this: > > ~~~ > String code = """ > public void print(""" + > type + > """ > o) { > System.out.println(Objects.toString(o)); > } > """; > ~~~ > > It's not great, and replace/format is the "right" solution, but if > somebody wants to do concatenation, this style does a better job of > indicating where the indent prefix ends and the content begins. The > delimiter gives a visual indication of where the "block" is located. I appreciate that you want to position an opening delimiter to the left of its content, but can you say why you want `type +` on its own line? What's the big deal with `...""" + type +\n` and then the next text block? (You don't seem to object to the closing delimiter sharing a line with content, since you have ` + ` after the first closing delimiter.) 
Alex From daniel.smith at oracle.com Wed May 15 18:25:14 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 15 May 2019 12:25:14 -0600 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <5CDC5660.9010802@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> <5CDC5660.9010802@oracle.com> Message-ID: > On May 15, 2019, at 12:11 PM, Alex Buckley wrote: > > On 5/15/2019 10:17 AM, Dan Smith wrote: >> I think this: >> >> ~~~ >> String code = """ >> public void print(""" + type + """ >> o) { >> System.out.println(Objects.toString(o)); >> } >> """; >> ~~~ >> >> should be presented like this: >> >> ~~~ >> String code = """ >> public void print(""" + >> type + >> """ >> o) { >> System.out.println(Objects.toString(o)); >> } >> """; >> ~~~ >> >> It's not great, and replace/format is the "right" solution, but if >> somebody wants to do concatenation, this style does a better job of >> indicating where the indent prefix ends and the content begins. The >> delimiter gives a visual indication of where the "block" is located. > > I appreciate that you want to position an opening delimiter to the left of its content, but can you say why you want `type +` on its own line? What's the big deal with `...""" + type +\n` and then the next text block? (You don't seem to object to the closing delimiter sharing a line with content, since you have ` + ` after the first closing delimiter.) Just a feeling that it might read better with every piece on a separate line. I don't have a strong preference about that, though. 
In retrospect, here's how I'd really write it in a program of mine, assuming I was opposed to the replace/format approach for some reason:

String code = "public void print(" + type + " o) {\n" +
              """
                  System.out.println(Objects.toString(o));
              }
              """;

But that doesn't do such a good job of illustrating how the re-indentation algorithm impacts whitespace before the 'o'. :-)

From dl at cs.oswego.edu Wed May 15 23:35:07 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 15 May 2019 19:35:07 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu> <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com>
Message-ID: <14a66fd2-62c7-7a1b-19b6-a07422a47ff2@cs.oswego.edu>

(With continuing deja vu...)

On 5/13/19 3:48 PM, John Rose wrote:

> The rule for developers is that if you
> needed to put a {...} block after your
> arrow ->, then you can still use an
> arrow to return a value, but it must be
> an extra arrow, marked with a keyword
> (or syntax context) that means "here is
> the rest of the arrow you wanted to
> write a moment ago".

Yes, but this arrow should not point right, but up (which was the thought underlying Smalltalk's choice). Maybe finally use unicode "↑". Or more conservatively, "^".

I still think a symbol is better than a keyword, because there is no single common word that applies across the contexts this may be applied in, except possibly "yield", which already means something else in Java (Thread.yield), and several something else's in other languages.

(Meta: What do you call a bikeshed thread in which no one likes anyone else's suggestions?)
-Doug From guy.steele at oracle.com Thu May 16 00:31:51 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 15 May 2019 20:31:51 -0400 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <14a66fd2-62c7-7a1b-19b6-a07422a47ff2@cs.oswego.edu> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu> <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com> <14a66fd2-62c7-7a1b-19b6-a07422a47ff2@cs.oswego.edu> Message-ID: <817DF6B5-CA82-4CAE-83D1-088F1B90223D@oracle.com> > On May 15, 2019, at 7:35 PM, Doug Lea

wrote:
> . . .
> (Meta: What do you call a bikeshed thread in which no one likes anyone
> else's suggestions?)

Cliqueless?

From forax at univ-mlv.fr Thu May 16 11:41:33 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 16 May 2019 13:41:33 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID: <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>

Another possible keyword is 'pass'.

Rémi

> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Sunday, May 12, 2019 21:38:38
> Subject: Call for bikeshed -- break replacement in expression switch

> As mentioned in the preview mail, we have one more decision to make: the new
> spelling of "break value" in expression switches. We have previously discussed
> "break-with value", which everyone seems to like better than "break value", but
> I think we can, and should, do better.

> (Despite the call-for-bikeshed, this is not to reopen every sub-decision -- the
> 2x2 semantics, the use of ->, the name of the construct -- this bikeshed only
> has room for one bike.)

> There are two primary reasons why we prefer break-with to break. We originally
> chose "break value" when we had a more limited palette of options to choose
> from (the keyword-resupply ship hadn't yet docked.) The overloading of break
> creates uncomfortable interactions. There is the obvious ambiguity between
> "break value" and "break label"; there is also the slightly less obvious
> interaction where we cannot permit "break value" inside a loop or statement
> switch inside an expression switch. While both of these can be "specified
> around", they create distortions in the spec, which in turn creates complexity
> in the user model; these are a sign that we may be pushing something a bit too
> far. Further, historically "break"
has been a straight transfer of control; > this muddies up what ?break? means. > Once we alit on the idea of break-* as a keyword, it seemed immediately more > comfortable to make a new break-derived keyword; this allowed us to undo the > distortions that ?break value? introduced, and it immediately felt better. But > I think we can do better still. Here?s what?s making me uncomfortable. > We?ve actually been here before: lambda expressions were the first time we > allowed an expression to contain statements, and while the streamlined case of > ?x -> e? didn?t require any control statements, and many lambdas could be > expressed with this form, statement lambdas needed a way to say ?stop executing > the body of this lambda, and yield a value.? We settled ? somewhat > uncomfortably ? on ?return value" for this. > Fast-forward to today, when we?re introducing the second expression form that > can contain statements, and we face the same question: how to indicate ?I?m > done, I?m completing normally, here?s my value.? Lambdas provide no help here; > we can?t use ?return? here. (Well, we could, but that would be terrible, so > we?re not going to.) Which means we have to solve the problem again, but > differently. That?s already not so great. > Digression: What?s so terrible about ?return?, any why is it OK for lambdas but > not OK for switches? > While we could of course define ?return? to mean whatever we want, But, in > imperative languages with the concept of ?methods? or ?procedures?, including > Java, return has always had a clear meaning: unwind the current call frame, and > yield the designated value to the caller. Lambda expressions are effectively > method bodies (lambdas are literals for functional interfaces, which are single > method interfaces), and so return (barely) fits. But switch expressions are > most definitely not methods, and are not associated with call frames. Asking > users to look at the enclosing context when they see a ?return? 
in the middle > of a method, to know whether it returns from the method or merely transfers > control within the method, is a lot to ask. (Yes, I know lambdas ask this as > well; this is why this was an uncomfortable choice, and having made this hole, > I?m not anxious to expand it dramatically. If anything I?d prefer to close it, > but that?s another bikeshed.). > (end digression) > We could surely take ?break-with? and move on; it feels sufficiently ?switchy?. > But let?s look ahead a little bit. We?ve now confronted the same problem twice: > an expression form that, in a minority use case, needed a way to express ?stop > computing this expression, because I?m done, and here?s its value.? (And, > unfortunately, we have two different syntactic ways to express the same basic > concept.) Let?s call these ?structured expressions.? > We have two structured expression forms, and of the three numbers in computer > science, ?two? is not one of them. Which suggests we are going to face this > problem again some day ? whether it be ?block expressions?, or ?if > expressions?, or ?let expressions?, or ?try expressions?, or whatever. (NB: > this call-for-bikeshed most definitely does not extend to ?why not just do > generalized block expressions?, so please don?t go there. That said, you could > treat this discussion as ?if Java had block expressions, what might they look > like?? But we?re focusing on the content of the block, not how the block is > framed.) > Let?s say for sake of argument that we might someway want to extend ternary > expressions to support the same kind of ?restricted block expressions? as > expression switches. (This is just an example for purposes of illustration, > let?s not get derailed on ?but you should use an ?if? statement for that"). > String s = (foo != null) > ? s > : { > println(?null again at line? 
+ __LINE__); > break-with ?null?; > }; > Such an expression needs a way to say ?I?m done, here?s my value?, just as > lambda and switch did before it. Clearly ?return? is not the right thing here > any more than it is for switches. And I don?t think ?break-with? is all that > great here either! It?s not terrible, but outside of a loop or switch, it > starts to feel kind of forced. And it would be terrible to solve this problem > twice with one-time solutions, and have no general story, and then have to come > up with YET ANOTHER way of expressing the same basic concept. So regardless of > what we expect for future expression forms, let?s examine what our options are > that are not tied to call frames (return) or direct transfer of control > (switches and loops.). > Looking at what other languages have done here, there are a few broad > directions: > - A statement like ?break-with v?, indicating that the enclosing structured > expression is completing normally with the provided value. > - An operator that serves the same purpose, such as ?-> e?. > - Assigning to some magic variable (this is how Pascal indicates the return > value of a function). > - Treating the last expression in the block as the result. > I think we can dispatch all but the first relatively easily: > - We don?t use operators for ?return?, we use a keyword; this would be both a > gratuitous departure, as well as too easy to miss. > - Switch expressions don?t have names, and even if we assigned to ?switch?, it > wouldn?t be obvious that we were actually terminating execution of the block. > - Everywhere else in the language (such as method bodies), you are free to yield > up a value from the middle of the block, perhaps from within a control > construct like a loop; restricting the RHS of case blocks to put their result > last would be a significant new restriction, and would limit the ability to > refactor to/from methods. 
And further, the convention of putting the result > last, while a fine one for a language that is ?expressions all the way down?, > would likely be too subtle a cue in Java. > So, we want a keyword (or contextual keyword.). In some hallway brainstorming, > candidates that emerged include yield, produce, offer, offer-up, result, > value-break, yield-value, provide, resulting-in, break-with, resulting, > yielding, put, give, giving, ... > (Also to keep in mind: remember we?re dealing with a minority case; most of the > time, there?ll just be an expression on the RHS.) > TL;DR: I think we might come to regret break-* just as we did with return ? > because it won?t scale to future demands we place on it, and having *three* > ways to say basically the same thing in three different contexts would be > embarrassing. I would like to see if we can do better. > Of the options listed here, I have a favorite: yield. (This is one of the terms > we?ve actually be using all along when describing this feature in english.) > There is one obvious objection to ?yield?, which I?d like to preemptively > address: that in some languages (though not in Java, except for the > infrequently-used Thread.yield()), it is associated with concurrency > primitives, such as generators. (This was the objection raised when yield was > proposed in the context of lambdas.). But, these association are not grounded > in existing Java constructs (and, the progress of Loom suggests that constructs > like async/await are not coming to Java, and even if we wanted language support > for generators, there are ample other ways to say it.) 
> [ http://dictionary.com/ | Dictionary.com ] lists the following meanings for > yield: > verb (used with object) > - to give forth or produce by a natural process or in return for cultivation: > - to produce or furnish (payment, profit, or interest): > - to give up, as to superior power or authority: > - to give up or over; relinquish or resign: > - to give as due or required: > - to cause; give rise to: > verb (used without object) > - to give a return, as for labor expended; produce; bear. > - to surrender or submit, as to superior power: > - to give way to influence, entreaty, argument, or the like: > - to give place or precedence (usually followed by to): > - to give way to force, pressure, etc., so as to move, bend, collapse, or the > like: > These are mostly consistent with the use of ?yield? as proposed here. > One more thing to bear in mind: there is an ordering to abrupt completion > mechanisms, as to how far away they can transfer control: > - yield: can unwind only the innermost yieldable expression > - break/continue: can unwind multiple control constructs (for, while, switch), > but stays within the method > - return: unwinds exactly one method > - throw: unwinds one or more methods > - System.exit: unwinds the whole VM > Bikeshed is open (but remember the bounds of this bikeshed are limited; we?re > talking purely about the syntax of a ?stop executing this block and yield a > value to the enclosing context? ? and time is ticking.) 
From brian.goetz at oracle.com Thu May 16 15:24:36 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 16 May 2019 11:24:36 -0400 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> Message-ID: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> We?ve probably pretty much explored the options at this point; time to converge around one of the choices... > > De: "Brian Goetz" > ?: "amber-spec-experts" > Envoy?: Dimanche 12 Mai 2019 21:38:38 > Objet: Call for bikeshed -- break replacement in expression switch > As mentioned in the preview mail, we have one more decision to make: the new spelling of ?break value? in expression switches. We have previously discussed ?break-with value?, which everyone seems to like better than ?break value?, but I think we can, and should, do better. > > (Despite the call-for-bikeshed, this is not to reopen every sub-decision ? the 2x2 semantics, the use of ->, the name of the construct ? this bikeshed only has room for one bike.) > > There are two primary reasons why we prefer break-with to break. We originally chose ?break value" when we had a more limited palette of options to choose from (the keyword-resupply ship hadn?t yet docked.) The overloading of break creates uncomfortable interactions. There is the obvious ambiguity between ?break value? and ?break label?; there is also the slightly less obvious interaction where we cannot permit ?break value? inside a loop or statement switch inside an expression switch. While both of these can be ?specified around?, they create distortions in the spec, which in turn creates complexity in the user model; these are a sign that we may be pushing something a bit too far. Further, historically ?break? has been a straight transfer of control; this muddies up what ?break? means. 
> > Once we alit on the idea of break-* as a keyword, it seemed immediately more comfortable to make a new break-derived keyword; this allowed us to undo the distortions that ?break value? introduced, and it immediately felt better. But I think we can do better still. Here?s what?s making me uncomfortable. > > We?ve actually been here before: lambda expressions were the first time we allowed an expression to contain statements, and while the streamlined case of ?x -> e? didn?t require any control statements, and many lambdas could be expressed with this form, statement lambdas needed a way to say ?stop executing the body of this lambda, and yield a value.? We settled ? somewhat uncomfortably ? on ?return value" for this. > > Fast-forward to today, when we?re introducing the second expression form that can contain statements, and we face the same question: how to indicate ?I?m done, I?m completing normally, here?s my value.? Lambdas provide no help here; we can?t use ?return? here. (Well, we could, but that would be terrible, so we?re not going to.) Which means we have to solve the problem again, but differently. That?s already not so great. > > Digression: What?s so terrible about ?return?, any why is it OK for lambdas but not OK for switches? > > While we could of course define ?return? to mean whatever we want, But, in imperative languages with the concept of ?methods? or ?procedures?, including Java, return has always had a clear meaning: unwind the current call frame, and yield the designated value to the caller. Lambda expressions are effectively method bodies (lambdas are literals for functional interfaces, which are single method interfaces), and so return (barely) fits. But switch expressions are most definitely not methods, and are not associated with call frames. Asking users to look at the enclosing context when they see a ?return? 
in the middle of a method, to know whether it returns from the method or merely transfers control within the method, is a lot to ask. (Yes, I know lambdas ask this as well; this is why this was an uncomfortable choice, and having made this hole, I?m not anxious to expand it dramatically. If anything I?d prefer to close it, but that?s another bikeshed.). > > (end digression) > > > We could surely take ?break-with? and move on; it feels sufficiently ?switchy?. But let?s look ahead a little bit. We?ve now confronted the same problem twice: an expression form that, in a minority use case, needed a way to express ?stop computing this expression, because I?m done, and here?s its value.? (And, unfortunately, we have two different syntactic ways to express the same basic concept.) Let?s call these ?structured expressions.? > > We have two structured expression forms, and of the three numbers in computer science, ?two? is not one of them. Which suggests we are going to face this problem again some day ? whether it be ?block expressions?, or ?if expressions?, or ?let expressions?, or ?try expressions?, or whatever. (NB: this call-for-bikeshed most definitely does not extend to ?why not just do generalized block expressions?, so please don?t go there. That said, you could treat this discussion as ?if Java had block expressions, what might they look like?? But we?re focusing on the content of the block, not how the block is framed.) > > Let?s say for sake of argument that we might someway want to extend ternary expressions to support the same kind of ?restricted block expressions? as expression switches. (This is just an example for purposes of illustration, let?s not get derailed on ?but you should use an ?if? statement for that"). > > String s = (foo != null) > ? s > : { > println(?null again at line? + __LINE__); > break-with ?null?; > }; > > Such an expression needs a way to say ?I?m done, here?s my value?, just as lambda and switch did before it. Clearly ?return? 
is not the right thing here any more than it is for switches. And I don?t think ?break-with? is all that great here either! It?s not terrible, but outside of a loop or switch, it starts to feel kind of forced. And it would be terrible to solve this problem twice with one-time solutions, and have no general story, and then have to come up with YET ANOTHER way of expressing the same basic concept. So regardless of what we expect for future expression forms, let?s examine what our options are that are not tied to call frames (return) or direct transfer of control (switches and loops.). > > Looking at what other languages have done here, there are a few broad directions: > > - A statement like ?break-with v?, indicating that the enclosing structured expression is completing normally with the provided value. > - An operator that serves the same purpose, such as ?-> e?. > - Assigning to some magic variable (this is how Pascal indicates the return value of a function). > - Treating the last expression in the block as the result. > > I think we can dispatch all but the first relatively easily: > > - We don?t use operators for ?return?, we use a keyword; this would be both a gratuitous departure, as well as too easy to miss. > - Switch expressions don?t have names, and even if we assigned to ?switch?, it wouldn?t be obvious that we were actually terminating execution of the block. > - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is ?expressions all the way down?, would likely be too subtle a cue in Java. > > So, we want a keyword (or contextual keyword.). 
In some hallway brainstorming, candidates that emerged include yield, produce, offer, offer-up, result, value-break, yield-value, provide, resulting-in, break-with, resulting, yielding, put, give, giving, ... > > (Also to keep in mind: remember we?re dealing with a minority case; most of the time, there?ll just be an expression on the RHS.) > > TL;DR: I think we might come to regret break-* just as we did with return ? because it won?t scale to future demands we place on it, and having *three* ways to say basically the same thing in three different contexts would be embarrassing. I would like to see if we can do better. > > > Of the options listed here, I have a favorite: yield. (This is one of the terms we?ve actually be using all along when describing this feature in english.) > > There is one obvious objection to ?yield?, which I?d like to preemptively address: that in some languages (though not in Java, except for the infrequently-used Thread.yield()), it is associated with concurrency primitives, such as generators. (This was the objection raised when yield was proposed in the context of lambdas.). But, these association are not grounded in existing Java constructs (and, the progress of Loom suggests that constructs like async/await are not coming to Java, and even if we wanted language support for generators, there are ample other ways to say it.) > > Dictionary.com lists the following meanings for yield: > > verb (used with object) > - to give forth or produce by a natural process or in return for cultivation: > - to produce or furnish (payment, profit, or interest): > - to give up, as to superior power or authority: > - to give up or over; relinquish or resign: > - to give as due or required: > - to cause; give rise to: > > verb (used without object) > - to give a return, as for labor expended; produce; bear. 
> - to surrender or submit, as to superior power: > - to give way to influence, entreaty, argument, or the like: > - to give place or precedence (usually followed by to): > - to give way to force, pressure, etc., so as to move, bend, collapse, or the like: > > These are mostly consistent with the use of ?yield? as proposed here. > > One more thing to bear in mind: there is an ordering to abrupt completion mechanisms, as to how far away they can transfer control: > > - yield: can unwind only the innermost yieldable expression > - break/continue: can unwind multiple control constructs (for, while, switch), but stays within the method > - return: unwinds exactly one method > - throw: unwinds one or more methods > - System.exit: unwinds the whole VM > > > Bikeshed is open (but remember the bounds of this bikeshed are limited; we?re talking purely about the syntax of a ?stop executing this block and yield a value to the enclosing context? ? and time is ticking.) > > > > > From alex.buckley at oracle.com Thu May 16 19:36:38 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Thu, 16 May 2019 12:36:38 -0700 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> Message-ID: <5CDDBBC6.6010107@oracle.com> On 5/16/2019 8:24 AM, Brian Goetz wrote: > We?ve probably pretty much explored the options at this point; time to > converge around one of the choices... I am very happy with `yield` as the new construct for concluding the evaluation of a switch expression and leaving a value on the stack for consumption within the method. I think a statement form for the new construct is ideal. 
The purpose of the new construct is to complete abruptly in an attempt to transfer control back to the switch expression, which then completes normally with a value. Abrupt completion and an attempt to transfer control are the hallmarks of `break`, `continue`, and `return`; having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference:

-----
A `yield` statement attempts to transfer control to the innermost enclosing switch expression; this expression ... then immediately completes normally and the value of the _Expression_ becomes the value of the switch expression.

A `return` statement attempts to transfer control to the invoker of the innermost enclosing constructor, method, or lambda expression ... In the case of a return statement with value _Expression_, the value of the _Expression_ becomes the value of the invocation.
-----

Note that the aspect of _attempting_ to transfer control applies to `yield` just as much as to `break`, `continue`, and `return`. Below, the `finally` block "intercepts" the transfer of control started by `yield`. The `finally` block then completes normally, so the transfer of control proceeds and the switch expression completes normally, leaving 5 or 6 on the stack.

```
int result = switch (x) {
    case 0 -> {
        try {
            ...
            if (...) yield 5;
            ...
            yield 6;
        }
        finally {
            cleanUp();
        }
    }

    default -> 42;
};
```

Abrupt completion and transfer of control are not the hallmarks of operators. The purpose of an operator is to indicate the kind of expression to be evaluated (numeric addition, method invocation, etc), so an operator-like syntax such as `^` would suggest the imminent evaluation of a NEW expression.
However, we are ALREADY in the process of evaluating a switch expression; in fact we would like to finish it up by transferring control from the {...} block (which has been happily executing statements sequentially) to the switch expression itself (so it can complete normally). So, I think an operator-like syntax is inappropriate. Alex From guy.steele at oracle.com Thu May 16 19:47:38 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 16 May 2019 15:47:38 -0400 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <5CDDBBC6.6010107@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> Message-ID: > On May 16, 2019, at 3:36 PM, Alex Buckley wrote: > > On 5/16/2019 8:24 AM, Brian Goetz wrote: >> We?ve probably pretty much explored the options at this point; time to >> converge around one of the choices... > > I am very happy with `yield` as the new construct for concluding the evaluation of a switch expression and leaving a value on the stack for consumption within the method. > Yah, okay, I now admit that ?yield? is growing on me. I no longer object to it. And your other points below are well taken. > I think a statement form for the new construct is ideal. The purpose of the new construct is to complete abruptly in an attempt to transfer control back to the switch expression, which then completes normally with a value. Abrupt completion and an attempt to transfer control are the hallmarks of `break`, `continue`, and `return`; having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference: > > ----- > A `yield` statement attempts to transfer control to the innermost enclosing switch expression; this expression ... 
then immediately completes normally and the value of the _Expression_ becomes the value of the switch expression. > > A `return` statement attempts to transfer control to the invoker of the innermost enclosing constructor, method, or lambda expression ... In the case of a return statement with value _Expression_, the value of the _Expression_ becomes the value of the invocation. > ----- > > Note that the aspect of _attempting_ to transfer control applies to `yield` just as much as to `break`, `continue`, and `return`. Below, the `finally` block "intercepts" the transfer of control started by `yield`. The `finally` block then completes normally, so the transfer of control proceeds and the switch expression completes normally, leaving 5 or 6 on the stack. > > ``` > int result = switch (x) { > case 0 -> { > try { > ... > if (...) yield 5; > ... > yield 6; > } > finally { > cleanUp(); > } > } > > default -> 42; > }; > ``` > > Abrupt completion and transfer of control are not the hallmarks of operators. The purpose of an operator is to indicate the kind of expression to be evaluated (numeric addition, method invocation, etc), so an operator-like syntax such as `^` would suggest the imminent evaluation of a NEW expression. However, we are ALREADY in the process of evaluating a switch expression; in fact we would like to finish it up by transferring control from the {...} block (which has been happily executing statements sequentially) to the switch expression itself (so it can complete normally). So, I think an operator-like syntax is inappropriate. 
>
> Alex

From john.r.rose at oracle.com Thu May 16 19:53:27 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 12:53:27 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
Message-ID:

FTR I'm OK with "yield". (I yield the floor?)

(And I'm OK with "pass", but we'll probably pass on that option?)

The rule, I take it, is that `yield x;` would deliver a value to the innermost enclosing `->` operator. If it could be that simple, that would be a win; we could teach our eyes and IDEs to make that match-up.

As I think you've said, such a rule that keys on `->` would allow us to apply yield retroactively to lambdas, *and* to switches, *and* to hypothetical expression-blocks in the future (if they have a `->` at their head, the rule applies uniformly), *and* to concise method bodies, as an alternative (as with lambda) to return.

What about return-vs-yield? Well, yield is OK when there's a matching `->`. And return is OK when you're in a method body (and not also in a `->`). So sometimes both rules apply; pick a keyword; tastes may vary. That's not too different an experience from equivalent break vs. return (when the break falls out of the method body).

On May 16, 2019, at 8:24 AM, Brian Goetz wrote:
> One more thing to bear in mind: there is an ordering to abrupt completion mechanisms, as to how far away they can transfer control:
>
> - yield: can unwind only the innermost yieldable expression
> - break/continue: can unwind multiple control constructs (for, while, switch), but stays within the method
> - return: unwinds exactly one method
> - throw: unwinds one or more methods
> - System.exit: unwinds the whole VM

Let's be careful of how we apply this ordering.
A yield (like a lambda-return) can unwind any number of control constructs, up to the innermost yieldable expression. Because yields don't take labels, they cannot even express a multi-expression exit. But they *naturally* entail multi-block exit. Searches involving loops, catches, and ifs are common in Java and therefore essential to support with yield:

    L0: for (var q : qstuff) {
      L1: f(q, ()->{ STARTER -> { //B0
        //break L0; continue L0;  => BAD JUMP
        B1: try {
          B2: for (var x : stuff) {
            B3: if (x.stopHere()) yield x;
          }
        } catch (MyEx ex) {
          yield ex.getStuff();
        }
        B4: if (lastChance) yield DEFAULT_STUFF;
        throw new ComplainingEx();
      } });
    }

Here, any of the blocks Bi could be unwound by a yield. The yield only goes back to the STARTER (which could be a lambda, switch or futuristic thing). A yield cannot reach the outer lambda at L1. Moreover, the break L0 would be a bad jump, since it cannot break out of the -> of the STARTER expression.

Going back to your list of "unwind strength", I think *breaks* are therefore more limited than yields:

- break/continue: can unwind multiple control constructs (for, while, switch), but stays within *both* the method and the innermost `->`
- yield: can unwind multiple control constructs (for, while, switch), but stays within the innermost `->`
- return: unwinds exactly one method frame (implicit after `->` method body)
- throw: unwinds one or more methods
- System.exit: unwinds the whole VM

One more side note: Yield in a lambda can be viewed as jumping to the very outside of the lambda body, with a value, at which point "return off the end" takes over. So every yield can be considered a frame-local operation (perhaps followed by an implicit "return off the end, but with a value"). The reason I'm making this distinction is that it lets us say that yield always stays *inside* a method activation frame (even if the next step is to return the yielded value).
This "yields" a uniform rule: If a `->` is immediately inside a block which defines local variables, those variables are available to code around the yield *for mutation* as well as reading. This is a different rule than with lambda uplevels. It allows code which yields an expression to *also* yield additional values by assigning to up-level variables. This too is a common pattern in Java. For example, a loop might return both an array element and the index of that element, to set up later searches starting after that index.

    int res2;
    var res1 = STARTER -> {
      ...
      res2 = 42;
      yield myRes1Value;
    };
    System.out.println("got em: "+asList(res1, res2));

So why can't lambdas side-effect out? Simple, because they are -> blocks invisibly and immediately nested inside of method bodies. There are no vars declared which will survive the implicit return operation, so there's nothing to share (writably) with an enclosing block. But you can say "x = 1; yield 2;" usefully if the enclosing -> block is not also a method body.

-- John

From emcmanus at google.com Thu May 16 19:56:32 2019
From: emcmanus at google.com (Éamonn McManus)
Date: Thu, 16 May 2019 12:56:32 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDDBBC6.6010107@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com>
Message-ID:

"yield" isn't a reserved word, is it? Doesn't that mean that `yield(5);` is ambiguous?

On Thu, 16 May 2019 at 12:36, Alex Buckley wrote:
>
> On 5/16/2019 8:24 AM, Brian Goetz wrote:
> > We've probably pretty much explored the options at this point; time to
> > converge around one of the choices...
>
> I am very happy with `yield` as the new construct for concluding the
> evaluation of a switch expression and leaving a value on the stack for
> consumption within the method.
> > I think a statement form for the new construct is ideal. The purpose of > the new construct is to complete abruptly in an attempt to transfer > control back to the switch expression, which then completes normally > with a value. Abrupt completion and an attempt to transfer control are > the hallmarks of `break`, `continue`, and `return`; having `yield` as > the junior member of that club is quite natural. Putting the junior and > senior members side by side shows both similarity and difference: > > ----- > A `yield` statement attempts to transfer control to the innermost > enclosing switch expression; this expression ... then immediately > completes normally and the value of the _Expression_ becomes the value > of the switch expression. > > A `return` statement attempts to transfer control to the invoker of the > innermost enclosing constructor, method, or lambda expression ... In the > case of a return statement with value _Expression_, the value of the > _Expression_ becomes the value of the invocation. > ----- > > Note that the aspect of _attempting_ to transfer control applies to > `yield` just as much as to `break`, `continue`, and `return`. Below, the > `finally` block "intercepts" the transfer of control started by `yield`. > The `finally` block then completes normally, so the transfer of control > proceeds and the switch expression completes normally, leaving 5 or 6 on > the stack. > > ``` > int result = switch (x) { > case 0 -> { > try { > ... > if (...) yield 5; > ... > yield 6; > } > finally { > cleanUp(); > } > } > > default -> 42; > }; > ``` > > Abrupt completion and transfer of control are not the hallmarks of > operators. The purpose of an operator is to indicate the kind of > expression to be evaluated (numeric addition, method invocation, etc), > so an operator-like syntax such as `^` would suggest the imminent > evaluation of a NEW expression. 
However, we are ALREADY in the process > of evaluating a switch expression; in fact we would like to finish it up > by transferring control from the {...} block (which has been happily > executing statements sequentially) to the switch expression itself (so > it can complete normally). So, I think an operator-like syntax is > inappropriate. > > Alex From john.r.rose at oracle.com Thu May 16 19:58:42 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 16 May 2019 12:58:42 -0700 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <5CDDBBC6.6010107@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> Message-ID: <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com> On May 16, 2019, at 12:36 PM, Alex Buckley wrote: > having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference: If junior yield is allowed to help senior return with his job, we have a more uniform rule: yield always matches an arrow. If junior yield should stay off of senior return's grass, we have a somewhat less uniform rule: yield always matches an arrow, unless the arrow is coterminous with a method body, in which case return must be used. Either way is OK with me, but the more uniform rule seems to give me more insight into what's really happening. ? 
John

From john.r.rose at oracle.com Thu May 16 19:59:52 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 12:59:52 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com>
Message-ID:

On May 16, 2019, at 12:56 PM, Éamonn McManus wrote:
>
> "yield" isn't a reserved word, is it? Doesn't that mean that
> `yield(5);` is ambiguous?

Yes, and the plan of record is to finesse such ambiguities,
as we did with `var`.

From john.r.rose at oracle.com Thu May 16 20:01:46 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 13:01:46 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com>

Message-ID: <313B6A52-9DE9-44E2-BBBA-0E1C8CC1A4B7@oracle.com>

On May 16, 2019, at 12:59 PM, John Rose wrote:
>
> On May 16, 2019, at 12:56 PM, Éamonn McManus wrote:
>>
>> "yield" isn't a reserved word, is it? Doesn't that mean that
>> `yield(5);` is ambiguous?
>
> Yes, and the plan of record is to finesse such ambiguities,
> as we did with `var`.

Q: But we cannot know if that will really work.
A: Yes, it's an ambiguous plan of record. Worked once, though.

From brian.goetz at oracle.com Thu May 16 20:04:41 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 16 May 2019 16:04:41 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com>
Message-ID: <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>

The notion of "reserved word" is insufficiently precise. More precisely, yield is a _reserved type identifier_, like `var`. That means that you cannot have a class called `yield`, but you can have local variables, or methods, or fields, or type variables, with that name.

See

https://openjdk.java.net/jeps/8223002

for further guidance on the fine degrees of shading between keywords, context-sensitive keywords, reserved identifiers, and reserved type names.

> On May 16, 2019, at 3:56 PM, Éamonn McManus wrote:
>
> "yield" isn't a reserved word, is it? Doesn't that mean that
> `yield(5);` is ambiguous?
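Archive note: the `yield(5);` question can be illustrated with Java as it later shipped (Java 14 and later), where a method named `yield` may still be declared but has to be invoked through a qualifier such as the class name, `this`, or `Thread`. The sketch below assumes that released behavior, not the plan of record at the time of this thread; `YieldNameDemo` and `pick` are hypothetical names.

```java
class YieldNameDemo {
    // Declaring a method named yield remains legal.
    static int yield(int n) {
        return n + 1;
    }

    static int pick(int x) {
        return switch (x) {
            case 0 -> 0;
            default -> {
                // Qualified call: unambiguously a method invocation.
                int bumped = YieldNameDemo.yield(x);
                // An unqualified `yield(x);` on its own line would not be
                // accepted as a method call by the released compiler.
                yield bumped * 2;   // the yield statement proper
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(pick(0));
        System.out.println(pick(4));
    }
}
```

The same qualification keeps existing `Thread.yield()` calls unaffected.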
From brian.goetz at oracle.com Thu May 16 20:10:24 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 16 May 2019 16:10:24 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com>
Message-ID:

While dodging the arrow, I'll point out that there is a pleasant ambiguity in the following:

    x = switch (y) {
        case L -> {
            foo();
            yield 7;
        }
    };

Does the `yield` yield a value to the _block_, or to the _switch_?

Answer: IT DOESN'T MATTER! Whichever intuition feels comfortable to you, yields the right answer.

If we think of it as yielding to the block, then the block terminates normally with 7, and therefore the case label does, and therefore the switch does. If we think of it as yielding to the switch, then the switch completes normally with 7.

And if we later want to expand block expressions to more places, maybe with some new syntax, then in a future Java

    case L -> { ... }

becomes sugar for

    case L -> BLOCK_COMING { ... }

at which point the yield is retconned to yield to the block.

> On May 16, 2019, at 3:58 PM, John Rose wrote:
>
> On May 16, 2019, at 12:36 PM, Alex Buckley wrote:
>> having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference:
>
> If junior yield is allowed to help senior return with
> his job, we have a more uniform rule: yield always
> matches an arrow.
>
> If junior yield should stay off of senior return's grass,
> we have a somewhat less uniform rule: yield always
> matches an arrow, unless the arrow is coterminous
> with a method body, in which case return must be
> used.
> > Either way is OK with me, but the more uniform rule > seems to give me more insight into what's really > happening. > > ? John From john.r.rose at oracle.com Thu May 16 20:28:58 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 16 May 2019 13:28:58 -0700 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com> Message-ID: <197AA4CD-F757-480C-8DB6-49860FB342E8@oracle.com> On May 16, 2019, at 1:10 PM, Brian Goetz wrote: > > While dodging the arrow, I?ll point out that there is a pleasant ambiguity in the following: > > x = switch (y) { > case L -> { > foo(); > yield 7; > } > }; Yes, it is pleasant, and it applies (potentially) to lambdas also. I'm saying it's extra-pleasant (for me) to divide the story into two chapters: Chapter 1. Some constructs have arrows. They define when the arrow bodies are executed, and, if the the arrow gets tossed a value, what is done with that value (method return? switch result? block result? depends on where the arrow is). Chapter 2. Every yield matches an innermost arrow, and every arrow (in a non-void T context) accepts a yielded value (of type T). It's pleasant this way because when you get to Chapter 2, you can forget all the gnarly context outside the arrow. Your yield passes to the innermost arrow, period. And if there's an arrow in sight (in the same stack frame) you can yield to it. Again, period. 
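Archive note: the "innermost" matching described above can be sketched with Java as it later shipped (Java 14 and later), where `yield` targets the innermost enclosing switch expression and lambda bodies keep using `return`. That is narrower than the arrow-based rule proposed in this message, so the example below is only an illustration under that assumption; the class and method names are hypothetical.

```java
import java.util.List;

class InnermostYieldDemo {
    // A yield may sit inside nested statements; it exits the if, the loop,
    // and the block in one step, completing the enclosing switch expression.
    static String firstLongWord(int mode, List<String> words) {
        return switch (mode) {
            case 0 -> "none";
            default -> {
                for (String w : words) {
                    if (w.length() > 3) {
                        yield w;
                    }
                }
                yield "no long word";
            }
        };
    }

    // With nested switch expressions, each yield completes only the
    // innermost switch expression that encloses it.
    static int nested(int outer, int inner) {
        return switch (outer) {
            case 1 -> {
                int v = switch (inner) {
                    case 1 -> 10;
                    default -> {
                        yield 20;      // completes only the inner switch
                    }
                };
                yield v + 1;           // completes the outer switch
            }
            default -> 0;
        };
    }

    public static void main(String[] args) {
        System.out.println(firstLongWord(1, List.of("a", "tree", "b")));
        System.out.println(nested(1, 2));
    }
}
```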
From maurizio.cimadamore at oracle.com Thu May 16 20:34:03 2019
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Thu, 16 May 2019 21:34:03 +0100
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
Message-ID: <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>

On 16/05/2019 21:04, Brian Goetz wrote:
> The notion of "reserved word" is insufficiently precise. More
> precisely, yield is a _reserved type identifier_, like `var`. That
> means that you cannot have a class called `yield`, but you can have
> local variables, or methods, or fields, or type variables, with that
> name.

Yep - but it's also different from 'var' in the sense that 'var' never had to fight with ambiguities with method names, because it only applied to the 'type' part of a variable declaration, which is a (possibly qualified) identifier (possibly followed by '<'). Parentheses were never allowed where 'var' as a type was expected. For yield, Éamonn is right - there's a new kind of ambiguity.

On the other hand, it is a trivial one to resolve, given what we're discussing now is something like

"yields" EXPRESSION

so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement".

Maurizio

>
> See
>
> https://openjdk.java.net/jeps/8223002
>
> for further guidance on the fine degrees of shading between keywords,
> context-sensitive keywords, reserved identifiers, and reserved type
> names.
>
>> On May 16, 2019, at 3:56 PM, Éamonn McManus wrote:
>>
>> "yield" isn't a reserved word, is it? Doesn't that mean that
>> `yield(5);` is ambiguous?
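For what it is worth, here is a sketch of how the question plays out under the rules as I understand them from the Java 14 and later language specification, which postdates this thread: `yield` remains usable as a variable or method name, but an unqualified call such as `yield(5)` is not accepted, so a call to a method of that name has to be qualified. The class `YieldName` and its members are invented purely for illustration.

```java
public class YieldName {
    // A method that happens to be named "yield" (hypothetical, for illustration).
    static int yield(int v) {
        return v * 2;
    }

    static int demo(int k) {
        int yield = 5;  // still legal as a local variable name
        return switch (k) {
            case 0 -> {
                // Qualified call: unambiguously invokes the method above.
                // Writing just yield(...) here would be rejected by the
                // compiler under the assumed Java 14+ rules.
                int doubled = YieldName.yield(yield);
                yield doubled;  // the yield statement proper
            }
            default -> -1;
        };
    }

    public static void main(String[] args) {
        System.out.println(demo(0));  // 10
        System.out.println(demo(1));  // -1
    }
}
```

The qualification requirement is the design choice that sidesteps the `yield(5);` ambiguity raised above: the bare spelling is left to the statement form.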
From john.r.rose at oracle.com Thu May 16 20:46:42 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 13:46:42 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
Message-ID: <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>

On May 16, 2019, at 1:34 PM, Maurizio Cimadamore wrote:
>
> On the other hand is a trivial one to resolve, given what we're discussing now is something like
>
> "yields" EXPRESSION
>
> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement".

The tricky bit with that is the user experience. What if the user needs a parenthesized expression:

    yield ("answer is "+x).trim();

There are some sharp edges here.

Oh, look, it's a workaround bikeshed:

    yield false? 0: ("answer is "+x).trim();
    yield (String)("answer is "+x);
    yield new String[]{ "answer is "+x }[0];
    yield Arrays.asList("answer is "+x).get(0);
    yield Objects.id("answer is "+x);

And my own little favorite, a bespoke use of arrow:

    yield -> ("answer is "+x);

Maybe then also:

    `yield -> { block of stuff to do before I go; YepDone: yield s; };`

-- John

From maurizio.cimadamore at oracle.com Thu May 16 21:05:17 2019
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Thu, 16 May 2019 22:05:17 +0100
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
Message-ID:

On 16/05/2019 21:46, John Rose wrote:
> On May 16, 2019, at 1:34 PM, Maurizio Cimadamore wrote:
>> On the other hand is a trivial one to resolve, given what we're discussing now is something like
>>
>> "yields" EXPRESSION
>>
>> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement".
> The tricky bit with that is the user experience. What if the
> user needs a parenthesized expression:
>
> yield ("answer is "+x).trim();
>
> There are some sharp edges here.

I was hoping we didn't need to go there :-)

There are other contexts in which we limit what can be done w/r/t/ parenthesized expressions (since these are ambiguous with cast to generic types). So this looks like another case where the grammar has to say - sorry no parens here.

Maurizio

>
> Oh, look, it's a workaround bikeshed:
>
> yield false? 0: ("answer is "+x).trim();
> yield (String)("answer is "+x);
> yield new String[]{ "answer is "+x }[0];
> yield Arrays.asList("answer is "+x).get(0);
> yield Objects.id("answer is "+x);
>
> And my own little favorite, a bespoke
> use of arrow:
>
> yield -> ("answer is "+x);
>
> Maybe then also:
>
> `yield -> { block of stuff to do before I go; YepDone: yield s; };`
>
> -- John
>

From guy.steele at oracle.com Thu May 16 21:41:05 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Thu, 16 May 2019 17:41:05 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
Message-ID: <6C53E425-3BED-4227-93F0-5279B868D3BE@oracle.com>

> On May 16, 2019, at 5:05 PM, Maurizio Cimadamore wrote:
>
>
> On 16/05/2019 21:46, John Rose wrote:
>> On May 16, 2019, at 1:34 PM, Maurizio Cimadamore wrote:
>>> On the other hand is a trivial one to resolve, given what we're discussing now is something like
>>>
>>> "yields" EXPRESSION
>>>
>>> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement".
>> The tricky bit with that is the user experience. What if the
>> user needs a parenthesized expression:
>>
>> yield ("answer is "+x).trim();
>>
>> There are some sharp edges here.
>
> I was hoping we didn't need to go there :-)
>
> There are other contexts in which we limit what can be done w/r/t/ parenthesized expressions (since these are ambiguous with cast to generic types). So this looks like another case where the grammar has to say - sorry no parens here.

And _that_ would very much give me pause. I would find it quite wrenching to have a place in the language where an expression cannot be parenthesized and have it mean exactly the same thing.

Maybe we should go back to a hyphenated keyword.
--Guy

From alex.buckley at oracle.com Thu May 16 21:43:29 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Thu, 16 May 2019 14:43:29 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
Message-ID: <5CDDD981.3010707@oracle.com>

On 5/16/2019 2:05 PM, Maurizio Cimadamore wrote:
> There are other contexts in which we limit what can be done w/r/t/
> parenthesized expressions (since these are ambiguous with cast to
> generic types). So this looks like another case where the grammar has to
> say - sorry no parens here.

If you're proposing to disallow a cast expression or a parenthesized expression after a `yield` token, then I think that's not right. The parsing of a `(` token has triggered potentially unbounded lookahead for some time [1][2], and everything worked out, so I don't see why the language should disallow any of John's examples:

    yield (String)("answer is "+x);
    yield ("answer is "+x).trim();
    yield new String[]{ "answer is "+x }[0];
    yield Arrays.asList("answer is "+x).get(0);
    yield false ? 0 : ("answer is "+x).trim();

Alex

[1] See slides 9-11 from https://www.eclipsecon.org/na2014/session/jdt-embraces-lambda-expressions.html

[2] JLS 15.27 on the choice of `(...)` for lambda parameters:

The syntax has some parsing challenges. The Java programming language has always required arbitrary lookahead to distinguish between types and expressions after a '(' token: what follows may be a cast or a parenthesized expression. This was made worse when generics reused the binary operators '<' and '>' in types.
Lambda expressions introduce a new possibility: the tokens following '(' may describe a type, an expression, or a lambda parameter list. Some tokens immediately indicate a parameter list (annotations, final); in other cases there are certain patterns that must be interpreted as parameter lists (two names in a row, a ',' not nested inside of '<' and '>'); and sometimes, the decision cannot be made until a '->' is encountered after a ')'. The simplest way to think of how this might be efficiently parsed is with a state machine: each state represents a subset of possible interpretations (type, expression, or parameters), and when the machine transitions to a state in which the set is a singleton, the parser knows which case it is. This does not map very elegantly to a fixed-lookahead grammar, however.

From james.laskey at oracle.com Thu May 16 21:45:59 2019
From: james.laskey at oracle.com (James Laskey)
Date: Thu, 16 May 2019 18:45:59 -0300
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
Message-ID:

Yield +1

Sent from my iPhone

> On May 16, 2019, at 12:24 PM, Brian Goetz wrote:
>
> We've probably pretty much explored the options at this point; time to converge around one of the choices...
>
>>
>> From: "Brian Goetz"
>> To: "amber-spec-experts"
>> Sent: Sunday, May 12, 2019 21:38:38
>> Subject: Call for bikeshed -- break replacement in expression switch
>> As mentioned in the preview mail, we have one more decision to make: the new spelling of "break value" in expression switches. We have previously discussed "break-with value", which everyone seems to like better than "break value", but I think we can, and should, do better.
>>
>> (Despite the call-for-bikeshed, this is not to reopen every sub-decision --
>> the 2x2 semantics, the use of ->, the name of the construct -- this bikeshed only has room for one bike.)
>>
>> There are two primary reasons why we prefer break-with to break. We originally chose "break value" when we had a more limited palette of options to choose from (the keyword-resupply ship hadn't yet docked.) The overloading of break creates uncomfortable interactions. There is the obvious ambiguity between "break value" and "break label"; there is also the slightly less obvious interaction where we cannot permit "break value" inside a loop or statement switch inside an expression switch. While both of these can be "specified around", they create distortions in the spec, which in turn creates complexity in the user model; these are a sign that we may be pushing something a bit too far. Further, historically "break" has been a straight transfer of control; this muddies up what "break" means.
>>
>> Once we alit on the idea of break-* as a keyword, it seemed immediately more comfortable to make a new break-derived keyword; this allowed us to undo the distortions that "break value" introduced, and it immediately felt better. But I think we can do better still. Here's what's making me uncomfortable.
>>
>> We've actually been here before: lambda expressions were the first time we allowed an expression to contain statements, and while the streamlined case of "x -> e" didn't require any control statements, and many lambdas could be expressed with this form, statement lambdas needed a way to say "stop executing the body of this lambda, and yield a value." We settled -- somewhat uncomfortably -- on "return value" for this.
>>
>> Fast-forward to today, when we're introducing the second expression form that can contain statements, and we face the same question: how to indicate "I'm done, I'm completing normally, here's my value." Lambdas provide no help here; we can't use "return" here. (Well, we could, but that would be terrible, so we're not going to.)
>> Which means we have to solve the problem again, but differently. That's already not so great.
>>
>> Digression: What's so terrible about "return", and why is it OK for lambdas but not OK for switches?
>>
>> While we could of course define "return" to mean whatever we want, in imperative languages with the concept of "methods" or "procedures", including Java, return has always had a clear meaning: unwind the current call frame, and yield the designated value to the caller. Lambda expressions are effectively method bodies (lambdas are literals for functional interfaces, which are single method interfaces), and so return (barely) fits. But switch expressions are most definitely not methods, and are not associated with call frames. Asking users to look at the enclosing context when they see a "return" in the middle of a method, to know whether it returns from the method or merely transfers control within the method, is a lot to ask. (Yes, I know lambdas ask this as well; this is why this was an uncomfortable choice, and having made this hole, I'm not anxious to expand it dramatically. If anything I'd prefer to close it, but that's another bikeshed.)
>>
>> (end digression)
>>
>>
>> We could surely take "break-with" and move on; it feels sufficiently "switchy". But let's look ahead a little bit. We've now confronted the same problem twice: an expression form that, in a minority use case, needed a way to express "stop computing this expression, because I'm done, and here's its value." (And, unfortunately, we have two different syntactic ways to express the same basic concept.) Let's call these "structured expressions."
>>
>> We have two structured expression forms, and of the three numbers in computer science, "two" is not one of them. Which suggests we are going to face this problem again some day -- whether it be "block expressions", or "if expressions", or "let expressions", or "try expressions", or whatever.
>> (NB: this call-for-bikeshed most definitely does not extend to "why not just do generalized block expressions", so please don't go there. That said, you could treat this discussion as "if Java had block expressions, what might they look like?" But we're focusing on the content of the block, not how the block is framed.)
>>
>> Let's say for sake of argument that we might someday want to extend ternary expressions to support the same kind of "restricted block expressions" as expression switches. (This is just an example for purposes of illustration, let's not get derailed on "but you should use an 'if' statement for that").
>>
>> String s = (foo != null)
>>     ? s
>>     : {
>>         println("null again at line" + __LINE__);
>>         break-with "null";
>>     };
>>
>> Such an expression needs a way to say "I'm done, here's my value", just as lambda and switch did before it. Clearly "return" is not the right thing here any more than it is for switches. And I don't think "break-with" is all that great here either! It's not terrible, but outside of a loop or switch, it starts to feel kind of forced. And it would be terrible to solve this problem twice with one-time solutions, and have no general story, and then have to come up with YET ANOTHER way of expressing the same basic concept. So regardless of what we expect for future expression forms, let's examine what our options are that are not tied to call frames (return) or direct transfer of control (switches and loops).
>>
>> Looking at what other languages have done here, there are a few broad directions:
>>
>> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
>> - An operator that serves the same purpose, such as "-> e".
>> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
>> - Treating the last expression in the block as the result.
>>
>> I think we can dispatch all but the first relatively easily:
>>
>> - We don't use operators for "return", we use a keyword; this would be both a gratuitous departure, as well as too easy to miss.
>> - Switch expressions don't have names, and even if we assigned to "switch", it wouldn't be obvious that we were actually terminating execution of the block.
>> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>>
>> So, we want a keyword (or contextual keyword). In some hallway brainstorming, candidates that emerged include yield, produce, offer, offer-up, result, value-break, yield-value, provide, resulting-in, break-with, resulting, yielding, put, give, giving, ...
>>
>> (Also to keep in mind: remember we're dealing with a minority case; most of the time, there'll just be an expression on the RHS.)
>>
>> TL;DR: I think we might come to regret break-* just as we did with return -- because it won't scale to future demands we place on it, and having *three* ways to say basically the same thing in three different contexts would be embarrassing. I would like to see if we can do better.
>>
>>
>> Of the options listed here, I have a favorite: yield. (This is one of the terms we've actually been using all along when describing this feature in English.)
>>
>> There is one obvious objection to "yield", which I'd like to preemptively address: that in some languages (though not in Java, except for the infrequently-used Thread.yield()), it is associated with concurrency primitives, such as generators.
>> (This was the objection raised when yield was proposed in the context of lambdas.) But these associations are not grounded in existing Java constructs (and, the progress of Loom suggests that constructs like async/await are not coming to Java, and even if we wanted language support for generators, there are ample other ways to say it.)
>>
>> Dictionary.com lists the following meanings for yield:
>>
>> verb (used with object)
>> - to give forth or produce by a natural process or in return for cultivation:
>> - to produce or furnish (payment, profit, or interest):
>> - to give up, as to superior power or authority:
>> - to give up or over; relinquish or resign:
>> - to give as due or required:
>> - to cause; give rise to:
>>
>> verb (used without object)
>> - to give a return, as for labor expended; produce; bear.
>> - to surrender or submit, as to superior power:
>> - to give way to influence, entreaty, argument, or the like:
>> - to give place or precedence (usually followed by to):
>> - to give way to force, pressure, etc., so as to move, bend, collapse, or the like:
>>
>> These are mostly consistent with the use of "yield" as proposed here.
>>
>> One more thing to bear in mind: there is an ordering to abrupt completion mechanisms, as to how far away they can transfer control:
>>
>> - yield: can unwind only the innermost yieldable expression
>> - break/continue: can unwind multiple control constructs (for, while, switch), but stays within the method
>> - return: unwinds exactly one method
>> - throw: unwinds one or more methods
>> - System.exit: unwinds the whole VM
>>
>>
>> Bikeshed is open (but remember the bounds of this bikeshed are limited; we're talking purely about the syntax of a "stop executing this block and yield a value to the enclosing context" -- and time is ticking.)
>>
>

From john.r.rose at oracle.com Fri May 17 01:15:36 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 18:15:36 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDDD981.3010707@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com> <5CDDD981.3010707@oracle.com>
Message-ID:

On May 16, 2019, at 2:43 PM, Alex Buckley wrote:
>
> If you're proposing to disallow a cast expression or a parenthesized expression after a `yield` token, then I think that's not right. The parsing of a `(` token has triggered potentially unbounded lookahead for some time [1][2], and everything worked out, so I don't see why the language should disallow any of John's examples:
>
> yield (String)("answer is "+x);
> yield ("answer is "+x).trim();
> yield new String[]{ "answer is "+x }[0];
> yield Arrays.asList("answer is "+x).get(0);
> yield false ? 0 : ("answer is "+x).trim();

Here's what's tricky: If there is a method called "yield" in scope, then one of those examples is a valid method call expression statement.

    import static MyFavYielder.yield;
    class Client extends MaybeHasYieldMethod {
      void m(int x) {
        var res = switch (x) {
          case 42 -> {
            yield ("answer is "+x).trim();
          }
          default -> -1;
        }}

Here's one way to slice it (very thin):

The name "yield" is placed in scope in "->" blocks as if it were an inherited or imported static method. It acts like an arity-1 signature-poly method returning void. When "yield" is followed by a paren, an appeal to this method, and any other ambient methods named "yield" is made, and overloading and ambiguity analysis is done.
If after all the special sig-poly method is matched, then the compiler edits the statement into a control flow construct.

(This is circular: A control flow construct affects ambient DA/DU rules which might also indirectly affect types IIRC. So the type of the yield call maybe could circularly depend on the surrounding control flow.)

If the built-in "yield" quasi-method conflicts with a real "yield" method that is in scope and matches, then we report an ambiguity to the user. (Ambiguity? Ya think??) The user has to fix it by using a fully qualified call to the intended yield or some similar dodge. If the yield statement is desired, at worst case the user makes a temporary variable, and yields *that*.

This trickiness does tend to support a less ambiguous syntax, such as "yield -> x;" or (per Doug) "yield ^x;".

-- John

P.S. I find "yield -> x" charming partly because the arrow seems to have additional possibilities:

    if (foo) {
      yield -> {
        var x = waitASec();
        var y = OK;
        yield -> f(x, OK);
      };
    }

instead of:

    if (foo) {
      var x = waitASec();
      var y = OK;
      yield -> f(x, OK);
    }

From forax at univ-mlv.fr Fri May 17 06:30:21 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 17 May 2019 08:30:21 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com> <5CDDD981.3010707@oracle.com>
Message-ID: <841602160.1715579.1558074621687.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Alex Buckley"
> Cc: "amber-spec-experts"
> Sent: Friday, May 17, 2019 03:15:36
> Subject: Re: Call for bikeshed -- break replacement in expression switch

> On May 16, 2019, at 2:43 PM, Alex Buckley wrote:
>>
>> If you're proposing to disallow a cast expression or a parenthesized expression
>> after a `yield` token, then I think that's not right. The parsing of a `(`
>> token has triggered potentially unbounded lookahead for some time [1][2], and
>> everything worked out, so I don't see why the language should disallow any of
>> John's examples:
>>
>> yield (String)("answer is "+x);
>> yield ("answer is "+x).trim();
>> yield new String[]{ "answer is "+x }[0];
>> yield Arrays.asList("answer is "+x).get(0);
>> yield false ? 0 : ("answer is "+x).trim();
>
> Here's what's tricky: If there is a method called "yield"
> in scope, then one of those examples is a valid method
> call expression statement.
>
> import static MyFavYielder.yield;
> class Client extends MaybeHasYieldMethod {
>   void m(int x) {
>     var res = switch (x) {
>       case 42 -> {
>         yield ("answer is "+x).trim();
>       }
>       default -> -1;
>     }}
>
> Here's one way to slice it (very thin):
>
> The name "yield" is placed in scope in "->"
> blocks as if it were an inherited or imported
> static method. It acts like an arity-1
> signature-poly method returning void.
> When "yield" is followed by a paren,
> an appeal to this method, and any
> other ambient methods named "yield"
> is made, and overloading and ambiguity
> analysis is done. If after all the special
> sig-poly method is matched, then the
> compiler edits the statement into a
> control flow construct.
>
> (This is circular: A control flow construct
> affects ambient DA/DU rules which might
> also indirectly affect types IIRC. So the type
> of the yield call maybe could circularly depend
> on the surrounding control flow.)

I would prefer a more "brutal approach" for the sake of my brain; I would like these rules to be true:
- inside a -> block, the "yield" text always means yield from that block.
- if there is no -> block (no switch expression), the compiler will not emit an error.

It works this way: at the beginning of a -> block, the compiler checks whether a method named "yield" is available in scope (whatever the number of parameters); if so, the compiler reports an error.
This rule is deliberately simple, so a human can understand it :)

And if there is an unqualified access to a method yield anywhere in the compilation unit, the compiler emits a warning to help users change their code to make it more readable.

[...]

>
> -- John

Rémi

From forax at univ-mlv.fr Fri May 17 06:30:48 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 17 May 2019 08:30:48 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <6C53E425-3BED-4227-93F0-5279B868D3BE@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <5CDDBBC6.6010107@oracle.com>