From forax at univ-mlv.fr Wed May 1 12:32:40 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 1 May 2019 14:32:40 +0200 (CEST)
Subject: Feedback on Sealed Types
In-Reply-To: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com>
References: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com>
Message-ID: <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Alan Malloy"
> Cc: "amber-spec-experts"
> Sent: Monday, 29 April 2019 23:01:17
> Subject: Re: Feedback on Sealed Types

> It would be nice if we could "just" overload enum itself to support a record-like option:

> enum Node {
>   AddNode(Node a, Node b),
>   MulNode(Node a, Node b),
>   ...;
>   ...
> }

> but unfortunately that syntax looks confusingly close to something else :( It also doesn't really scale to multi-level hierarchies, though that might be OK. It's worth thinking about, though. The `data` construct from Haskell is surely more direct than modeling a sum with sealed interfaces (though the latter is also more flexible than `data`).

> On the other hand, declaring a sum of records as:

> sealed interface I {
>   record A(int a) implements I { }
>   record B(long b) implements I { }
>   record C(String c) implements I { }
> }

> isn't so bad. The main redundancy here is primarily "implements I", and secondarily "record". We can surely compress that away like this:

> enum interface I {
>   A(int a),
>   B(long b),
>   C(String c);
>   // I methods
> }

> but I am not sure it carries its weight, given that it adds little additional concision (and no additional semantics).

It may solve the enclosing issue, because the ';' syntactically separates A, B and C from the content of I, which is declared after the ';', so A, B and C can be top-level.

I kind of like the intellectual separation between
- a sealed interface, which represents a closed type and requires a permits clause, and
- an enum interface, which represents a sum type and is sugar on top of sealed interface + records.
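To make the "sugar on top of sealed interface + records" reading concrete, here is a rough sketch of what `enum interface I { A(int a), B(long b), C(String c); }` might desugar to, with A, B and C as top-level types. It uses the sealed/record syntax as it later shipped in Java 17, which is an assumption on my part since nothing in this thread fixes the final syntax. `EnumInterfaceDemo` and `render` are hypothetical names invented for illustration.

```java
// Hypothetical desugaring: the cases become top-level records, and the
// sealed interface lists them in its permits clause.
sealed interface I permits A, B, C {
    // "I methods" from the quoted sketch would be declared here.
}

record A(int a) implements I { }
record B(long b) implements I { }
record C(String c) implements I { }

// Invented helper that distinguishes the three cases.
class EnumInterfaceDemo {
    static String render(I i) {
        if (i instanceof A a) return "A(" + a.a() + ")";
        if (i instanceof B b) return "B(" + b.b() + ")";
        if (i instanceof C c) return "C(" + c.c() + ")";
        // Cannot happen: I permits only A, B and C.
        throw new AssertionError("unexpected case: " + i);
    }

    public static void main(String[] args) {
        System.out.println(render(new A(1)));
        System.out.println(render(new C("x")));
    }
}
```

Whether the cases should really be top-level or nested inside I is exactly the open question above; the sketch only shows that the top-level reading is expressible.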
One interesting question is how to desugar an enum interface with a component that has no parameter, like

enum interface Option<T> { Some(T value), Empty }

If there is only one constant of type Empty and the construction is typesafe, it can be a huge win.

Rémi

> On 4/29/2019 4:33 PM, Alan Malloy wrote:
>> Thanks, Brian. I indeed didn't think of some of your proposed benefits of sealing non-sum types, as I was focused mostly on things you mentioned explicitly in the JEP, which is somewhat light on the expected benefits.

>> I think the first two items in your "challenges" solve each other: I don't intend sum types to be the only kind of sealed type, but just a good way to declare the simplest kind. I left out the "record" keyword from the declaration with the idea that it would be implicit: if you want the convenient sum-of-products declaration style, you have to use records. If you want something more complicated, you declare a sealed interface (or superclass), and N permitted subclasses, declared separately in whatever way you want. This restriction helps by making the semantics clearer, and I had also hoped that it would lead to a syntax error if you leave out the comma. Looking more closely, I see this is somewhat precarious: a record declaration looks enough like a method signature that they may be ambiguous in an interface, if you don't require the "record" keyword, or if you use a semicolon instead of a comma. I think it can still work if we require each nested record to use {...} instead of ; even if it's empty. This way, your two examples look like

>> interface X {
>>   class X1 { ... }
>>   class X2 { ... }
>> }

>> and

>> enumerated interface Y {
>>   Y1 { ... },
>>   Y2 { ... }
>> }

>> The latter would become illegal if you dropped the comma, even if you also forgot the "enumerated" keyword, because the braces make no sense in an ordinary interface.
>> On Mon, Apr 29, 2019 at 12:35 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>> Thanks Alan, for this nice exploration. There's a lot to respond to. I'll start with some general comments about sealing, and then move on to your alternate proposal for exposing it.

>>> I can think of several main reasons why you would want to seal a hierarchy.

>>> - To say something about the _interface itself_. That is, that it was not designed as a general purpose contract between arms-length entities, but that it exists only as a common super type for a fixed set of classes that are part of the library. In other words, "please don't implement me."

>>> - To say something about the semantics of the type. Several of the examples in your report fall into this category: "a DbResult is either a NoRowsFound or a Rows(List)". This tells users exactly what the reasonable range of results to expect are when doing a query. Of course, the spec could say the same thing, but that involves reading and interpreting the spec. Easier if this conclusion can be driven by types (and IDEs can help more here too).

>>> - To strengthen typing by simulating unions. If my method is going to return either a String or a Number, the common super type is Object. (Actually, it's some variant of Serializable & Comparable.) Sums-of-products allow library authors to make a stronger statement about types in the presence of unions. Exposing a sum of StringHolder(String) and NumberHolder(Number), using records and sealed types, is not so ceremonious, so some library developers might choose to do this instead of Object.

>>> - Security. Some libraries may want to know that the code they are calling is part of their library, rather than an arbitrary implementation of some interface.

>>> - To aid in exhaustiveness.
>>>   We've already discussed this at length; your point is that this one doesn't come up as often as one might hope.

>>> Not only is there an obvious synergy between sums and products (as many languages have demonstrated), but there is a third factor, which is "if you make it easy enough, people will use it more." Clearly records are pretty easy to use; your point is that if there were a more streamlined sum-of-products idiom, the third factor would be even stronger here. I think algebraic data types are one of those things that will take some time for developers to learn to appreciate; the easier we make it, of course the faster that will happen.

>>> Now, to your syntax suggestion. Overall, I like the idea, but I have some concerns. First, the good parts:

>>> - The connection with enums is powerful. Users already understand enums, so this will help them understand sums. Enums have special treatment in switch; we want the same treatment for sealed type patterns. Enums have special treatment for exhaustiveness; we want the same for sealed type patterns. So tying these together with some more general enum-ness leans on what people already know.

>>> - While sums and products are theoretically independent features, sums-of-products are expected to be quite common. So it might be reasonable to encourage this syntactically.

>>> - The current proposal has some redundancy, in that the subtypes have to say "implements Node", even if they are nested within Node. With a stronger mechanism for declaring them, as you propose, that can safely be left implicit.

>>> - I confess that I too like the simplicity of Haskell's `data` declaration, and this brings us closer.

>>> Now, the challenges:

>>> - The result is still a little busy. We need a modifier for "enumerated type", and we would also need to be able to have child types be not only records, but ordinary classes and interfaces.
>>>   So we'd have to have a place for "record", "class", or "interface" with the declaration of the enumerated classes (as well as other modifiers). That busies up the result a bit.

>>> - Once we do this, I worry that it will be hard to tell the difference between:

>>> interface X {
>>>   class X1 { ... }
>>>   class X2 { ... }
>>> }

>>> and

>>> enumerated interface Y {
>>>   class Y1 { ... },
>>>   class Y2 { ... }
>>> }

>>> and that users will forever be making mistakes like forgetting the comma, or putting it where it doesn't belong.

>>> - This mechanism addresses the very common case of sum-of-product, but leaves more esoteric sums out of the picture. (Consider the types in java.lang.constant, which really want to be sealed.) There, because they are not co-declared, we'd need something more like

>>> sealed interface ConstantDesc
>>>   permits ClassDesc, MethodTypeDesc, ... { }

>>> It's possible that such a mechanism can be grafted on to your proposal, or there is a shuffling that supports it.

>>>> On Apr 29, 2019, at 2:28 PM, Alan Malloy <amalloy at google.com> wrote:
>>>> Hello again, amber-spec-experts. I have another report from the Google codebase, this time focusing on sealed types. It is viewable in full Technicolor HTML at http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks again to Liam for hosting), and included below as plain text:

>>>> Author: Alan Malloy (amalloy at google.com)
>>>> Published: 2019-04-29

>>>> Feedback on Sealed Types

>>>> Hello again, amber-spec-experts. I'm back with a second Google codebase research project. I'm looking again at the Records & Sealed Types proposal (which has now become JDK-8222777), but this time I'm focusing on sealed types instead of records, as promised in my RFC of a few weeks ago.
>>>> My goal was to investigate Google's codebase to guess what developers might have done differently if they had access to sealed types. This could help us understand what works in the current proposal and what to consider changing.

>>>> Unlike my previous report, this one contains more anecdotes than statistics. It wound up being difficult to build static analysis to categorize the interesting cases, so I mostly examined promising candidates by hand.

>>>> Summary and Recommendations

>>>> For those who don't care to read through all my anecdotes, I first provide a summary of my findings, and one suggested addition.

>>>> Sealed types, as proposed so far, are a good idea in theory: Java already has product types and open polymorphism, and sealed types give us closed polymorphism. However, I could not find many cases of code being written today that would be greatly enhanced if sealed types were available. The main selling point of sealed types for application authors is getting help from the compiler with exhaustiveness checking, but in practice developers almost always have a default case, because they are only interested in a subset of the possible subclasses, and want to ignore cases they don't understand. This means that exhaustiveness-checking for pattern matches would mostly go unused if developers rewrote their existing code using sealed types.

>>>> Pattern matching is great, and can replace visitors in many cases, but this does not depend on sealed types except for exhaustiveness checks (which, again, would go mostly unused in code written today). The class hierarchies for which people define visitors today are just too large to write an exhaustive pattern match, and so a default case would be very common.

>>>> The other audience for sealed types is library authors.
>>>> While in practice most developers have no great need to forbid subclasses, perhaps it would be a boon for authors of particularly popular libraries, who need to expose a non-final class as an implementation detail but don't intend for consumers to create their own subclasses. Those authors can already include documentation saying "you really should not extend this", but there is always some weirdo out there who will ignore your warnings and then write an angry letter when the next version of your library breaks his program (see: sun.misc.Unsafe). Authors of such libraries would welcome the opportunity to make it truly impossible to create undesirable subclasses.

>>>> Sealed Types As a Vehicle For Sum Types

>>>> So, sealed types as-is would be an improvement, but a niche one, used by few. I think we can get substantially more mileage out of them if we also include a more cohesive way to explicitly define a sum type and all its subtypes in one place with minimal ceremony. Such a sum type could be sealed, implicitly or explicitly. A tool like this takes what I see as the "theoretical" advantage of sum types (closed polymorphism), and makes it "practical" by putting it front and center. Making sums an actual language element instead of something "implied" by sealing a type and putting its subclasses nearby could help in a lot of ways:

>>>> * Developers might more often realize that a sealed/sum type is a good model for their domain. Currently it's a "pattern" external to the language instead of a "feature", and many don't realize it could be applied to their domain. Putting it in the language raises its profile, addressing the problem that people don't realize they want it.
>>>> * The compiler could provide help for defining simple sums-of-products, while making it possible to opt into more complicated subclasses, in much the way that enums do: the typical enum just has bare constants like EAST, but you can add constructor arguments or override methods when necessary.

>>>> * The ability to more easily model data in this way may result in developers writing more classes that are amenable to sealing/sums, as they do in other languages with explicit sum types (Haskell, Kotlin, Scala). Then, the exhaustiveness-checking feature that sealed types provide would pull more weight.

>>>> Since enum types are "degenerate sum types", the syntax for defining sums can borrow heavily from enums. A sketch of the syntax I imagine for such things (of course, I am not married to it):

>>>> public type-enum interface BinaryTree<T> {
>>>>   Leaf {
>>>>     @Override public Stream<T> elements() {return Stream.empty();}
>>>>   },
>>>>   Node(T data, BinaryTree<T> left, BinaryTree<T> right) {
>>>>     @Override public Stream<T> elements() {
>>>>       return Stream.concat(left.elements(),
>>>>         Stream.concat(Stream.of(data), right.elements()));
>>>>     }
>>>>   };
>>>>   public Stream<T> elements();
>>>> }

>>>> Like enums, you can use a bare identifier for simple types that exist only to be pattern-matched against, but you can add fields and/or override blocks as necessary. The advantage over declaring a sealed type separately from its elements is both concision (the compiler infers visible records, superclass, and all type parameters) and clarity: you state your intention firmly. I think a convenient syntax like this will encourage developers to use the powerful tool of sealed types to model their data.

>>>> Evidence in Google's Codebase

>>>> If you are just interested in recommendations, you can stop reading now: they are all included in the summary.
>>>> What follows is a number of anecdotes, or case studies if you prefer, that led me to the conclusions above. Each shows a type that might have been written as a sealed type, and hopefully highlights a different facet of the question of how sealed types can be useful.

>>>> The first thing I looked for was classes which are often involved in instanceof checks. As language authors, we imagine people writing stuff like this[1] all the time:

>>>> interface Expr {int eval(Scope s);}
>>>> record Var(String name) implements Expr {
>>>>   public int eval(Scope s) {return s.get(name);}
>>>> }
>>>> record Sum(Expr left, Expr right) implements Expr {
>>>>   public int eval(Scope s) {return left.eval(s) + right.eval(s);}
>>>> }
>>>> class Analyzer {
>>>>   Stream<String> variablesUsed(Expr e) {
>>>>     if (e instanceof Var) return Stream.of(((Var)e).name());
>>>>     if (e instanceof Sum) {
>>>>       // Stream.concat is a static method taking both streams.
>>>>       return Stream.concat(variablesUsed(((Sum)e).left()),
>>>>         variablesUsed(((Sum)e).right()));
>>>>     }
>>>>     throw new IllegalArgumentException();
>>>>   }
>>>> }

>>>> Here, the Expr interface captures some of the functionality shared by all expressions, but later a client (Analyzer) came along and invented some other polymorphic operations to perform on an Expr, which Expr did not support. So Analyzer needed to do instanceof checks instead, externalizing the polymorphism. The principled approach would have been for Expr to export a visitor to begin with, but perhaps it wasn't seen as worth the trouble at the time.

>>>> To try to find this pattern in the wild, I searched for method bodies which perform multiple instanceof checks against the same variable. Notably, this excludes the typical equals(Object) method, which only performs a single check. For each such variable, I noted:

>>>> 1. Its declared type
>>>> 2. The set of subtypes it was checked for with instanceof
>>>> 3. The common supertype of those subtypes.
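For comparison, the Expr/Analyzer example quoted above can be written with a sealed interface and a pattern-matching switch. This is only a sketch, assuming the syntax that eventually shipped (sealed types in Java 17, pattern switch in Java 21) rather than anything settled in this thread. The `Scope` interface is a hypothetical stand-in, since the report does not define it.

```java
import java.util.stream.Stream;

// Hypothetical stand-in for the Scope type used in the report.
interface Scope {
    int get(String name);
}

sealed interface Expr permits Var, Sum {
    int eval(Scope s);
}

record Var(String name) implements Expr {
    public int eval(Scope s) { return s.get(name); }
}

record Sum(Expr left, Expr right) implements Expr {
    public int eval(Scope s) { return left.eval(s) + right.eval(s); }
}

class Analyzer {
    // No default branch and no trailing throw: the compiler checks that
    // every permitted subtype of Expr is covered.
    static Stream<String> variablesUsed(Expr e) {
        return switch (e) {
            case Var v -> Stream.of(v.name());
            case Sum s -> Stream.concat(variablesUsed(s.left()),
                                        variablesUsed(s.right()));
        };
    }
}
```

Adding a third Expr subtype would turn the switch into a compile error until a case is added, which is the exhaustiveness benefit the report is weighing. A client that only cares about Var would still write a default branch and give that benefit up.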
>>>> I guessed that (3) would usually be the same as (1), but in practice 55% of the time they were different. Often, the declared type was Object, or some generic type variable which erases to Object, while the common supertype being tested was something like Number, Event, or Node. For example, a Container knows it will be used in some context where NaN is unsuitable, so it checks whether its contents are Float or Double, and if so ensures NaN is not stored. As a second example, a serialize(Object) method checks whether its input is String or ByteString, and throws an exception otherwise.

>>>> Bad sealed types found looking at instanceof checks

>>>> I looked through the most popular declared types of these candidates, to investigate which types are often involved in such checks. Most of them are not good candidates for a sealed type. Object was the most common type, followed by Exception and Throwable.

>>>> Next up is an internal DOMObject class, which sounds promising until I tell you it has thousands of direct subclasses. Nobody is doing exhaustive switches on this, of course. Instead, many uses iterate over a Collection, or receive a DOMObject in some way, and just check whether it is of one or two specific subtypes they care about. This turned out to be a very common pattern, not just for DOMObject, but for many candidate sealed types I found: nobody does exhaustive case analysis. They just look for objects they understand in some much larger hierarchy, and ignore the rest.

>>>> Some more humorous types that are often involved in instanceof checks: java.net.InetAddress (everyone wants to know if it's v4 or v6) and com.sun.source.tree.Tree, in our static-analysis tools. Tree is an interesting case: here we do exactly what I mentioned previously for DOMObject.
>>>> On the surface it seems that Tree would be a good candidate for a sealed interface with record subtypes, but in practice I'm not sure what sealing would buy us. We would effectively opt out of exhaustiveness-checking by having a large default case, or by extending a visitor with default-empty methods. Of course, sometimes we define a new visitor to do some polymorphic operation over a Tree, but more often we just look for one or two subtypes we care about. For example, DataFlow inspects a Tree, but knows from context that it is either a LambdaExpressionTree, MethodTree, or an initializer.

>>>> Plausible sealed types found looking at instanceof checks

>>>> The previous section notwithstanding, I did dig deep enough into the results to find a few classes that could make good sealed types. The most prominent, and most interesting, was another AST. There is an abstract Node class for representing HTML documents. It has just 4 subclasses defined in the same file: Text, Comment, Tag, and EndTag. This spartan definition suggests it's used for something like SAX parsing, but I didn't confirm this. It does everything you could hope for from a type like this: it exposes a Visitor, it provides an accept(Visitor) method, and the superclass specifies abstract methods for a couple of the most common things you would want to do, such as a String toHtml() method.

>>>> However, recall that I found this class by looking for classes often involved in instanceof checks! Some people use the visitor, but why doesn't everyone? The first reason I found is one I've mentioned several times already: clients only care about one of the 4 cases, and may have felt creating an anonymous visitor is too much ceremony. Would they be happy with a switch and a default clause? Probably, but it's hard to know for sure.
>>>> The second reason surprised me a bit: I found clients doing analysis that isn't really amenable to any one visitor, or a simple pattern-match. They've written this:

>>>> if (mode1) { if (x instanceof Tag) {...} }
>>>> else if (mode2) { if (x instanceof Text) {...}}

>>>> The same use site cares about different subclasses at different times, depending on some other flag(s) controlling its behavior. Even if we offered a pattern-match on x, it's difficult to encode the flags correctly. They would have to match on a tuple of (mode1, mode2, x), with a case for (true, _, Tag name) and another for (false, true, Text text). Technically possible, but not really prettier than what they already have, especially since you would need to use a local record instead of an anonymous tuple.

>>>> Even so, I think this would have benefited from being a sealed type. Recall that earlier I carefully said "4 subclasses defined in the same file". This is because some jokester in a different package altogether has defined their own fifth subclass, Doctype. They have their own sub-interface of Visitor that knows about Doctype nodes. I can't help but feel that the authors of Node would have preferred to make this illegal, if they had been able to.

>>>> The second good sealed type I found is almost an enum, except that one of the instances has per-instance data. This is not exactly a surprise, since an enum is a degenerate sum type, and one way to think of sealed types is as a way to model sums. It looks something like this[2]:

>>>> public abstract class DbResult<T> {
>>>>   public record NoDatabase<T>() extends DbResult<T>;
>>>>   public record RowNotFound<T>() extends DbResult<T>;
>>>>   // Four more error types ...
>>>>   public record EmptySuccess<T>() extends DbResult<T>;
>>>>   public record SuccessWithData<T>(T data) extends DbResult<T>;

>>>>   public T getData() {
>>>>     if (!(this instanceof SuccessWithData))
>>>>       throw new DbException();
>>>>     return ((SuccessWithData<T>)this).data;
>>>>   }
>>>>   public <U> DbResult<U> transform(Function<T, U> f) {
>>>>     if (!(this instanceof SuccessWithData)) {
>>>>       return (DbResult<U>)this;
>>>>     }
>>>>     return new SuccessWithData<U>(f.apply(
>>>>       ((SuccessWithData<T>)this).data));
>>>>   }
>>>> }

>>>> Reading this code made me yearn for Haskell: here is someone who surely wanted to write

>>>> data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t

>>>> but had to spend 120 lines defining their sum-of-products (the extra verbosity is because really they made the subclasses private, and defined private static singletons for each of the error types, with a static getter to get the type parameter right). This seems like a potential win for records and for sealed types. Certainly my snippet was much shorter than the actual source file because the proposed record syntax is quite concise, so that is a real win. But what do we really gain from sealing this type? Still nobody does exhaustive analysis even of this relatively small type: they just use functions like getData and transform to work with the result generically, or spot-check a couple of interesting subtypes with instanceof. Forbidding subclassing from other packages hardly matters: nobody was subclassing it anyway, nor would they be tempted to. Really the improvements DbResult benefits most from are records, and pattern-matching on records. It would be much nicer to replace the instanceof/cast pattern with a pattern-match that extracts the relevant field. This is the use case that inspired my idea of a type-enum, in the Summary section above.
>>>> Rewriting it as a type-enum eliminates many of the problems: all the instanceof checks are gone, we don't need a bunch of extra keywords for each case, and we're explicit about the subclasses "belonging to" the sealed parent, which means we get stuff like extends and <T> for free. We get improved clarity by letting the definition of the class hierarchy reflect its "nature" as a sum.

>>>> public abstract type-enum DbResult<T> {
>>>>   NoDatabase,
>>>>   RowNotFound,
>>>>   EmptySuccess,
>>>>   SuccessWithData(T data) {
>>>>     @Override public T getData() {
>>>>       return data;
>>>>     }
>>>>     @Override public <U> DbResult<U> transform(Function<T, U> f) {
>>>>       return new SuccessWithData<U>(f.apply(data));
>>>>     }
>>>>   }
>>>>   public T getData() {
>>>>     throw new DbException();
>>>>   }
>>>>   public <U> DbResult<U> transform(Function<T, U> f) {
>>>>     return (DbResult<U>)this;
>>>>   }
>>>> }

>>>> Visitors

>>>> Instead of doing a bunch of instanceof checks, the "sophisticated" way to interact with a class having a small, known set of subtypes is with a visitor. I considered doing some complicated analysis to characterize what makes a class a visitor, and trying to automatically cross-reference visitors to the classes they visit...but in practice simply looking for classes with "Visitor" in their name was a strong enough signal that a more complicated approach was not needed. Having identified visitors, I looked at those visitors with the most subclasses, since each distinct subclass corresponds to one "interaction" with the sealed type that it visits, and well-used visitors suggest both popularity and good design.

>>>> One common theme I found: developers aren't good at applying the visitor pattern. Many cases I found had some weird and inexplicable quirk compared to the "standard" visitor. These developers will be relieved to get pattern-matching syntax so they can stop writing visitors.
>>>> The Visiting Object

>>>> The first popular visitor I found was a bit odd to me. It's another tree type, but with a weird amalgam of several visitors, and an unusual approach to its double dispatch. I have to include a relatively lengthy code snippet to show all of its facets:

>>>> public static abstract class Node {
>>>>   public interface Visitor {
>>>>     boolean process(Node node);
>>>>   }
>>>>   public boolean visit(Object v) {
>>>>     return v instanceof Visitor
>>>>       && ((Visitor)v).process(this);
>>>>   }
>>>>   // Other methods common to all Nodes ...
>>>> }
>>>> public static final class RootNode extends Node {
>>>>   public interface Visitor {
>>>>     boolean processRoot(RootNode node);
>>>>   }
>>>>   @Override
>>>>   public boolean visit(Object v) {
>>>>     return v instanceof Visitor
>>>>       ? ((Visitor)v).processRoot(this)
>>>>       : super.visit(v);
>>>>   }
>>>>   // Other stuff about root nodes ...
>>>> }
>>>> public static abstract class ValueNode extends Node {
>>>>   public interface Visitor {
>>>>     boolean processValue(ValueNode node);
>>>>   }
>>>>   @Override
>>>>   public boolean visit(Object v) {
>>>>     return v instanceof Visitor
>>>>       ? ((Visitor)v).processValue(this)
>>>>       : super.visit(v);
>>>>   }
>>>> }
>>>> public static final class BooleanNode extends ValueNode {
>>>>   public interface Visitor {
>>>>     boolean processBool(BooleanNode node);
>>>>   }
>>>>   @Override
>>>>   public boolean visit(Object v) {
>>>>     return v instanceof Visitor
>>>>       ? ((Visitor)v).processBool(this)
>>>>       : super.visit(v);
>>>>   }
>>>>   // Other stuff about booleans ...
>>>> }
>>>> public static final class StringNode extends ValueNode {
>>>>   // Much the same as BooleanNode
>>>> }

>>>> This goes on for some time: there is a multi-layered hierarchy of dozens of node types, each with a boolean visit(Object) method, and their own distinct Visitor interface, in this file.
>>>> I should note that this code is actually not written by a human, but rather generated by some process (I didn't look into how). I still think it is worth mentioning here for two reasons: first, whoever wrote the code generator would probably do something similar if writing it by hand, and second because these visitors are used often by hand-written code.

>>>> Speaking of hand-written code, visitor subclasses now get to declare ahead of time exactly which kinds of nodes they care about, by implementing only the appropriate Visitor interfaces:

>>>> private class FooVisitor implements StringNode.Visitor,
>>>>     BooleanNode.Visitor, RootNode.Visitor {
>>>>   // ...
>>>> }

>>>> This isn't how I would have written things, but I can sorta see the appeal, if you don't have to write it all by hand: a visitor can choose to handle any one subclass of ValueNode, or all ValueNodes, or just RootNode and StringNode, et cetera. They get to pick and choose what sub-trees of the inheritance tree they work with.

>>>> Would Node be a good sealed class? Maybe. It clearly intends to enumerate all subclasses, but the benefit it gets from enforcing that is minimal. As in my previous examples, the main advantage for Node implementors would come from records, and the main advantage for clients would come from pattern-matching, obviating their need for this giant visitor.

>>>> The Enumerated Node

>>>> Another AST, this time for some kind of query language, explicitly declares an enum of all subclasses it can have, and uses this enum instead of using traditional double-dispatch:

>>>> public interface Node {
>>>>   enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
>>>>   Kind getKind();
>>>>   Location getLocation();
>>>> }
>>>> public abstract record AbstractNode(Location l) implements Node {}
>>>> public class Expr extends AbstractNode {
>>>>   public Kind getKind() {return EXPR;}
>>>>   // ...
>>>> }
>>>> // And so on for other Kinds ...
>>>> public abstract class Visitor {
>>>>   // Empty default implementations, not abstract.
>>>>   public Expr visitExpr(Expr e) {}
>>>>   public Query visitQuery(Query q) {}
>>>>   public Import visitImport(Import i) {}
>>>>   public Node visit(Node n) {
>>>>     switch (n.getKind()) {
>>>>       case EXPR: return visitExpr((Expr)n);
>>>>       case QUERY: return visitQuery((Query)n);
>>>>       case IMPORT: return visitImport((Import)n);
>>>>       // ...
>>>>     }
>>>>   }
>>>> }

>>>> It's not really clear to me why they do it this way, instead of putting an accept(Visitor) method on Node. They gain the ability to return different types for each Node subtype, but are hugely restricted in what visitors can do: they must return a Node, instead of performing an arbitrary computation. It seems like the idea is that visitors must specialize to tree rewriting, but I still would have preferred to parameterize the visitor by return type.

>>>> Would this be better as a sealed type? I feel sure that if sealed types existed, the authors of this class would have used one. We could certainly do away with the enum, and use an expression-switch instead to pattern-match in the implementation of visit(Node). But I think the Visitor class would still exist, and still have separate methods for each Node subtype, because the developer seemed to care about specializing the return type. The only place where an exhaustiveness check helps would be in the visit(Node) method, inside the visitor class itself. All other dispatch goes through visit(Node), or through one of the specialized visitor methods if the type is known statically. It seems like overall this would be an improvement, but again, the improvement comes primarily from pattern-matching, not sealing.

>>>> Colocated interface implementations

>>>> Finally, I looked for interfaces having all of their implementations defined in the same file.
On this I do have some statistical data[3]. A huge majority >>>> (98.5%) of public interfaces have at least one implementation in a different >>>> source file. Package-private interfaces also tend to have implementations in >>>> other files: 85% of them are in this category. For protected interfaces it's >>>> much closer: only 53% have external implementations. Of course, all private >>>> interfaces have all implementations in a single file. >>>> Next, I looked at interfaces that share a source file with all their >>>> implementations, to see whether they'd make good sealed types. First was this >>>> Entry class: >>>> public interface Entry { >>>> enum Status {OK, PENDING, FAILED} >>>> Status getStatus(); >>>> int size(); >>>> String render(); >>>> } >>>> public class UserEntry implements Entry { >>>> private User u; >>>> private Status s; >>>> public UserEntry(User u, Status s) { >>>> this.u = u; >>>> this.s = s; >>>> } >>>> @Override String render() {return u.name();} >>>> @Override int size() {return 1;} >>>> @Override Status getStatus() {return s;} >>>> } >>>> public class AccountEntry implements Entry { >>>> private Account a; >>>> private Status s; >>>> public AccountEntry(Account a, Status s) { >>>> this.a = a; >>>> this.s = s; >>>> } >>>> @Override String render() {return a.render();} >>>> @Override int size() {return a.size();} >>>> @Override Status getStatus() {return s;} >>>> } >>>> A huge majority of the clients of this Entry interface treat it polymorphically, >>>> just calling its interface methods. In only one case is there an instanceof >>>> check made on an Entry, dispatching to different methods depending on which >>>> subclass is present. >>>> Is this a good sealed type? I think not, really. There are two implementations >>>> now, but perhaps there will be a GroupEntry someday. Existing clients should >>>> continue to work in that case: the polymorphic Entry interface provides >>>> everything clients are "intended" to know.
>>>> Another candidate for sealing: >>>> public interface Request {/* Empty */} >>>> public record RequestById(int id) implements Request; >>>> public record RequestByValue(String owner, boolean historic) implements Request; >>>> public class RequestFetcher { >>>> public List fetch(Iterable requests) { >>>> List idReqs = Lists.newArrayList(); >>>> List valueReqs = Lists.newArrayList(); >>>> List queries = Lists.newArrayList(); >>>> for (Request req : requests) { >>>> if (req instanceof RequestById) { >>>> idReqs.add((RequestById)req); >>>> } else if (req instanceof RequestByValue) { >>>> valueReqs.add((RequestByValue)req); >>>> } >>>> } >>>> queries.addAll(prepareIdQueries(idReqs)); >>>> queries.addAll(prepareValueQueries(valueReqs)); >>>> return runQueries(queries); >>>> } >>>> } >>>> Interestingly, since the Request interface is empty, the only way to do anything >>>> with this class is to cast it to one implementation type. In fact, the >>>> RequestFetcher I include here is the only usage of either of these classes >>>> (plus, of course, helpers like prepareIdQueries). >>>> So, clients need to know about specific subclasses, and want to be sure they're >>>> doing exhaustive pattern-matching. Seems like a great sealed class to me. >>>> Except...actually each of the two subclasses has been extended by a decorator >>>> adding a source[4]: >>>> public record SourcedRequestById(Source source) extends RequestById; >>>> public record SourcedRequestByValue(Source source) extends RequestByValue; >>>> Does this argue in favor of sealing, or against? I don't really know. The owners >>>> of Request clearly intended for all four of these subclasses to exist (they're >>>> in the same package), so they could include them all in the permitted subtype >>>> list, but it seems like a confusing API to expose to clients.
>>>> A third candidate for sealing is another simple sum type: >>>> public interface ValueOrAggregatorException<T> { >>>> T get(); >>>> public static <T> ValueOrAggregatorException<T> >>>> of(T value) { >>>> return new OfValue<>(value); >>>> } >>>> public static <T> ValueOrAggregatorException<T> >>>> ofException(AggregatorException err) { >>>> return new OfException<>(err); >>>> } >>>> private record OfValue<T>(T value) >>>> implements ValueOrAggregatorException<T> { >>>> @Override T get() {return value;} >>>> } >>>> private record OfException<T>(AggregatorException err) >>>> implements ValueOrAggregatorException<T> { >>>> @Override T get() {throw err;} >>>> } >>>> } >>>> It has only two subtypes, and it seems unimaginable there could ever be a third, >>>> so why not seal it? However, the subtypes are intentionally hidden: it is >>>> undesirable to let people see whether there's an exception, except by having it >>>> thrown at you. In fact AggregatorException is documented as "plugins may throw >>>> this, but should never catch it": there is some higher-level thing responsible >>>> for catching all such exceptions. So, this type gains no benefit from >>>> exhaustiveness checks in pattern-matching. The type is intended to be used >>>> polymorphically, through its interface method, even though its private >>>> implementation is amenable to sealing. >>>> ________________ >>>> [1] Throughout this document I will use record syntax as if it were already in >>>> the language. This is merely for brevity, and to avoid making the reader spend >>>> a lot of time reading code that boils down to just storing a couple fields. In >>>> practice, of course the code in Google's codebase either defines the records by >>>> hand, or uses an @AutoValue. >>>> [2] Recall that @AutoValue, Google's "record", allows extending a class, which >>>> is semantically okay here: DbResult has no state, only behavior. >>>> [3] This data is imperfect.
While the Google codebase strongly discourages having >>>> more than one version of a module checked in, there is still some amount of >>>> "vendoring" or checking in multiple versions of some package, e.g. for >>>> supporting external clients of an old version of an API. As a result, two >>>> "different files" which are really copies of each other may implement >>>> interfaces with the same fully-qualified name; I did not attempt to control for >>>> this case, and so such cases may look like they were in the same file, or not. >>>> [4] Of course in the record proposal it is illegal to extend records like this; >>>> in real life these plain data carriers are implemented by hand as ordinary >>>> classes, so the subtyping is legal. From brian.goetz at oracle.com Wed May 1 12:37:23 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 1 May 2019 08:37:23 -0400 Subject: Feedback on Sealed Types In-Reply-To: <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> References: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> Message-ID: <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> > It may solve the enclosing issue because the ';' syntactically separate A, B and C from the content of I which is declared after the ';', so A, B and C can be top-level. Trying to make these top level has the same "how do I find the source file" problem that aux classes have. > I kind a like the intellectual separation between > - a sealed interface which represent a closed type and requires a permit clause and > - an enum interface which represent a sum type which is sugar on top of sealed interface + records. This does have a certain appeal, as each construct underscores what it is for. On the other hand, the return-on-sugar for the second is just not that big (unlike with records or enums). Basically, you get to drop the word "record" and "implements I" a bunch of times; it is not clear that this carries its weight.
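For concreteness, the long-hand form under discussion can be sketched as below. This is an editorial illustration rather than part of the original message: it assumes sealed-type and record syntax roughly as proposed in this thread (both were still being designed at the time), and the wrapper class `SumOfRecords` and the method `describe` are hypothetical names added only so the example stands alone.

```java
// Sketch only: a three-way sum written with a sealed interface plus records.
// SumOfRecords and describe are hypothetical names, not from the thread.
final class SumOfRecords {
    // The closed type: only A, B and C may implement I.
    sealed interface I permits A, B, C { }

    // Each alternative repeats "record" and "implements I"; these are the
    // tokens the proposed "enum interface" sugar would let you drop.
    record A(int a) implements I { }
    record B(long b) implements I { }
    record C(String c) implements I { }

    // A client that inspects which alternative it was given.
    static String describe(I i) {
        if (i instanceof A a) return "A(" + a.a() + ")";
        if (i instanceof B b) return "B(" + b.b() + ")";
        if (i instanceof C c) return "C(" + c.c() + ")";
        // Not reachable while I stays sealed to exactly A, B and C.
        throw new AssertionError("unexpected subtype of I");
    }
}
```

In this small case the sugar would remove three uses of `record`, three of `implements I`, and the `permits` clause; whether that saving justifies a separate construct is the question being debated here.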
From brian.goetz at oracle.com Wed May 1 13:58:12 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 1 May 2019 09:58:12 -0400 Subject: Feedback on Sealed Types In-Reply-To: <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> References: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> Message-ID: <3CE5AD07-CFCA-47A0-A48A-34E0C3DF0742@oracle.com> > >> I kind a like the intellectual separation between >> - a sealed interface which represent a closed type and requires a permit clause and >> - an enum interface which represent a sum type which is sugar on top of sealed interface + records. > To be clear, I think what Alan is suggesting, and what Remi is supporting, is: - Make "sealed" the primitive for defining closed types, as originally proposed, and also - Make the following enumerated interface Foo { R(X), S(Y); STUFF } sugar for sealed interface Foo permits R, S { STUFF record R(X) implements Foo { } record S(Y) implements Foo { } } Is that correct? From amalloy at google.com Wed May 1 15:34:45 2019 From: amalloy at google.com (Alan Malloy) Date: Wed, 1 May 2019 08:34:45 -0700 Subject: Feedback on Sealed Types In-Reply-To: <3CE5AD07-CFCA-47A0-A48A-34E0C3DF0742@oracle.com> References: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> <3CE5AD07-CFCA-47A0-A48A-34E0C3DF0742@oracle.com> Message-ID: Yes, that is what I suggest. Two points, though. First, the sugar benefit is at least a tiny bit larger than you say. You also get to omit the repeated type arguments if the interface is parameterized, as I expect it will often be.
I argue you should get to omit public, too: the implementations of a sum should always be public, just as the accessors for a record should be, for the same reason: they are the entire purpose of defining the type, and allowing variation here detracts from their semantic value. And second, I don't know that counting the number of saved characters/tokens is the best way to measure the benefits anyway. An enhanced for loop over an array is not that much shorter than an old-style for loop with an explicit index - in fact it probably saves fewer characters than a couple "implements FooSum". But it's clearly a win because it communicates intent better, and leaves fewer opportunities to make a mistake, either in writing the code or in reading it. Likewise the ability to say in a single token, "this is a closed sum" has legibility benefits aside from just being shorter. On Wed, May 1, 2019, 6:58 AM Brian Goetz wrote: > > > > I kind a like the intellectual separation between > - a sealed interface which represent a closed type and requires a permit > clause and > - an enum interface which represent a sum type which is sugar on top of > sealed interface + records. > > > > To be clear, I think what Alan is suggesting, and what Remi is supporting, > is: > > - Make "sealed" the primitive for defining closed types, as originally > proposed, and also > - Make the following > > enumerated interface Foo { > R(X), S(Y); > > STUFF > } > > sugar for > > sealed interface Foo > permits R, S { > > STUFF > > record R(X) implements Foo { } > record S(Y) implements Foo { } > } > > Is that correct?
> > > > From forax at univ-mlv.fr Wed May 1 15:58:23 2019 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 1 May 2019 17:58:23 +0200 (CEST) Subject: Feedback on Sealed Types In-Reply-To: <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> References: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> <8B333F6F-77A8-4571-BA5E-A131A5DD8368@oracle.com> Message-ID: <509217760.520690.1556726303707.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Remi Forax" > Cc: "Alan Malloy" , "amber-spec-experts" > > Sent: Wednesday, May 1, 2019 14:37:23 > Subject: Re: Feedback on Sealed Types >> It may solve the enclosing issue because the ';' syntactically separate A, B and >> C from the content of I which is declared after the ';', so A, B and C can be >> top-level. > Trying to make these top level has the same "how do I find the source file" > problem that aux classes have. I was thinking that those components are something new so you can 'tweak' import for them import akeyword Foo; // automatically import the component names too if it's an enum interface. given that no enum interface exists now, it's compatible. Rémi From john.r.rose at oracle.com Thu May 2 02:42:08 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 1 May 2019 19:42:08 -0700 Subject: Feedback on Sealed Types In-Reply-To: <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> References: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com> <2005472563.509550.1556713960688.JavaMail.zimbra@u-pem.fr> Message-ID: <076E0A54-10A9-418B-BB9D-BD907C38F935@oracle.com> On May 1, 2019, at 5:32 AM, Remi Forax wrote: > > If there is only one constant of type Empty and the construction is typesafe, it can be a huge win. If Empty is an inline (value) type with no components, then Empty.default is the singleton, and there's nothing else to say about it. This is a use case for empty inlines. In fact they are unit types, as recognized in many languages.
(There are some low-level technical reasons why Valhalla doesn't support this now, but they can be overcome with a bit of work. One problem is how to keep track of a field of size zero, if you are using relative offsets at present.) -- John P.S. Either a sum of N zero-length unit types or a classic enum of N elements, could be represented as a byte of lgN bits. (Or lg(N+1) if it's nullable.) We can't do this in the old contract of L-types, but under the new contract that allows early loading of field types, we could pull such tricks for either enums or unit-sums, in the JVM. So if Foo is a small enum, LFoo; requires 8 or 4 bytes, but GFoo; (where G is the "go and look" contract) can require just 1 byte, just like a boolean. From john.r.rose at oracle.com Fri May 3 20:21:04 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 13:21:04 -0700 Subject: String literals: some principles In-Reply-To: <0FBBF6C5-E836-43CD-88CD-BFD803AE6752@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <0FBBF6C5-E836-43CD-88CD-BFD803AE6752@oracle.com> Message-ID: <37341F23-5DC9-48A6-ADE5-CDA83EBED43A@oracle.com> On Apr 29, 2019, at 8:48 AM, Guy Steele wrote: > >> On Apr 28, 2019, at 4:32 PM, Brian Goetz wrote: >> >> . . . >> Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a "raw" string, but this notion is a fantasy; no string literal is so raw that it can't recognize its closing delimiter. So "rawness" is really only a matter of degree. > > This is _almost_ true. If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content. > > Put another way: one cannot determine how long the raw content is by examining it. That's a solid principle.
I'm going to be nit-picky here and refer to my earlier mentions of the paradigm of strong quoting, which at its heart simply means you have an infinite set of delimiters to choose from, when wrapping a payload into a literal syntax. Adding a numeral to the open quote means that there are now an unbounded set of open quotes, so it is an instance of strong quoting. Another instance of strong quoting adds nonces, and yet another just lengthens the quote pattern until it doesn't occur (anywhere) in the raw string payload. The numeric prefix convention is different from other kinds of strong quoting conventions, in that the end-quote can be a substring of the payload. Actually, the end-quote is most naturally the empty string, which is a substring of every string. The numeric prefix convention and other strong-quote conventions all share a common property: The convention as a whole is universal for arbitrary payloads, but for any given payload there are quotes which work and others that don't work. In the case of the numeric prefix convention, once you choose an open-quote (with numeral) you are limited to payloads of that length. That's not quite a "raw string" any more, since it's suitable only for a fixed-sized character field. Likewise, once you choose a particular nonce-based or patterned quote (e.g., seven double-quotes), payloads containing the corresponding end-quote as a substring are no longer suitable. Once you pick a particular payload string, the next question is whether you can embed that particular string into your program without inserting escape sequences. Only with a strong quote scheme of some sort is this possible. But, with any of several strong quote schemes, it is possible to dispense with escapes for any given string; it is not a fantasy. ? 
John From guy.steele at oracle.com Fri May 3 20:37:36 2019 From: guy.steele at oracle.com (Guy Steele) Date: Fri, 3 May 2019 16:37:36 -0400 Subject: String literals: some principles In-Reply-To: <37341F23-5DC9-48A6-ADE5-CDA83EBED43A@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <0FBBF6C5-E836-43CD-88CD-BFD803AE6752@oracle.com> <37341F23-5DC9-48A6-ADE5-CDA83EBED43A@oracle.com> Message-ID: I completely agree with what you said here, John. We both took a good look, but you squinted with your right eye, and I with my left. :-) Either point of view is correct; the two together yield depth perception. Yay! > On May 3, 2019, at 4:21 PM, John Rose wrote: > > On Apr 29, 2019, at 8:48 AM, Guy Steele wrote: >> >>> On Apr 28, 2019, at 4:32 PM, Brian Goetz wrote: >>> >>> . . . >>> Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a ?raw? string, but this notion is a fantasy; no string literal is so raw that it can?t recognize its closing delimiter. So ?rawness? is really only a matter of degree. >> >> This is _almost_ true. If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content. >> >> Put another way: one cannot determine how long the raw content is by examining it. That?s a solid principle. > > I'm going to be nit-picky here and refer to my earlier > mentions of the paradigm of strong quoting, which > at its heart simply means you have an infinite set of > delimiters to choose from, when wrapping a payload > into a literal syntax. > > Adding a numeral to the open quote means that there > are now an unbounded set of open quotes, so it is an > instance of strong quoting. 
Another instance of strong > quoting adds nonces, and yet another just lengthens > the quote pattern until it doesn't occur (anywhere) in > the raw string payload. > > The numeric prefix convention is different from other > kinds of strong quoting conventions, in that the end-quote > can be a substring of the payload. Actually, the end-quote > is most naturally the empty string, which is a substring > of every string. > > The numeric prefix convention and other strong-quote > conventions all share a common property: The convention > as a whole is universal for arbitrary payloads, but for > any given payload there are quotes which work and others > that don't work. In the case of the numeric prefix > convention, once you choose an open-quote (with > numeral) you are limited to payloads of that length. > That's not quite a "raw string" any more, since it's > suitable only for a fixed-sized character field. > Likewise, once you choose a particular nonce-based > or patterned quote (e.g., seven double-quotes), > payloads containing the corresponding end-quote > as a substring are no longer suitable. > > Once you pick a particular payload string, the next > question is whether you can embed that particular > string into your program without inserting escape > sequences. Only with a strong quote scheme of > some sort is this possible. But, with any of several > strong quote schemes, it is possible to dispense > with escapes for any given string; it is not a fantasy. > > ? John From john.r.rose at oracle.com Fri May 3 22:25:49 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 15:25:49 -0700 Subject: String literals: some principles In-Reply-To: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> Message-ID: <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> TL;DR: Good framework; must also account for the rectangle extraction rule (RER). 
A unified escape sublanguage (ESL) is highly desirable, and I propose adding <\ > and <\ LT WS*> as escapes for space and for null string. The existing \ char is OK, and should be "fattened" as a separate feature. I note some issues with <\ u X X X X>. On Apr 28, 2019, at 1:32 PM, Brian Goetz wrote: > - Opening delimiter > - Closing delimiter > - Escape characters, if any > - Escape sublanguages, if any Yes, this is a useful way to break down the syntax. You left out padding conventions as a degree of freedom. Padding conventions give the programmer detailed control over the format of the program by associating non-payload characters with the string literal. Whitespace rectangle extraction is the only padding convention we are discussing, plus occasional suggestions that we remove horizontal space in one-line fat strings. If we denote today's escape sublanguage as ESL and the rectangle extraction rule as RER, then today's literals are: ThinString=SL[open=close=", escape=\, esl=ESL, pc=none] Tomorrow's fat strings will be something like: FatString=SL[open=close=""", escape=\, esl=ESL, pc=RER] Another aspect of defining a string literal is the *phasing* of the different features. I think we have good consensus that padding should be stripped *before* escape interpretation, so that escaped characters are not mistaken for padding characters. > I bring this up not because I want to talk about raw-ness now (getting the hint?), but because I want to keep all the variations of string literals as lightly-varying projections of the same basic feature. Understanding the variations is important. It also gives me hope that we could parlay this framework, later on, into something strong. In the future (not now) we might add a parameterized range of these schemes: StrongString=SL[open=close=F(N), escape=G(N), esl=ESL, pc=RER] for some functions F, G that enumerate quote and escape tokens.
This would be a strong quoting scheme that could (with care) allow any given payload string S to be embedded without the need for escapes, by choosing an N for which F(N) and G(N) do not occur in S. Getting back to today, I want to talk about escapes. First, I'll remind us all that the RER is part of fat strings and that therefore the newline and space characters are no longer just passive string body characters, but rather play a role in the string syntax. This means that the ESL needs to be upgraded so that occurrences of spaces and newlines which otherwise would play a role in syntax can be escaped. I think this at a minimum means that the ESL needs to add support for the two-character escape sequence <\ space>. There is already an escape sequence for a line terminator; it is <\ n>. A similar point holds for <\ t>. These three escapes (one new, two old) are enough to allow a programmer to tell the RER to stay away from a particular bit of white-space. (Note that if the RER were to happen *after* escape processing, we'd be in a pickle: There'd be no way to use the existing ESL to control the RER, and we'd have to put some sort of extra control feature into the RER itself, or settle for an uncontrollable RER.) > It has come up, for example, that we might treat \ differently in ML strings as in classic strings, My own suggestions in this vein have nothing to do with making a new ESL but with extending the old one so it works well with fat strings. > but I would prefer if we could not tinker with the escape language in nonuniform ways -- as this minimizes the variations between the various sub-features. I agree that we should have only one ESL; there's no reason to have different "dialects" of it in different types of strings. So <\ space> should be added to the ESL, not because it's particularly useful for thin strings, but because it escapes otherwise strippable padding in fat strings.
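The phasing argument above (strip padding first, then interpret escapes) can be illustrated with a small sketch. This is an editorial illustration under stated assumptions, not a specification: `extractRectangle` is a deliberately simplified stand-in for the RER (it only removes the common leading spaces and tabs), `unescape` handles just a handful of escapes including the proposed <\ space>, and both names, like the class `FatStringPhases`, are hypothetical.

```java
// Sketch only: shows why padding removal must precede escape processing.
// All names here are hypothetical; the real RER is more involved.
final class FatStringPhases {

    // Simplified rectangle extraction: remove the common leading
    // spaces/tabs shared by all non-blank lines.
    static String extractRectangle(String raw) {
        String[] lines = raw.split("\n", -1);
        int indent = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isBlank()) indent = Math.min(indent, leadingWhitespace(line));
        }
        if (indent == Integer.MAX_VALUE) indent = 0;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (!line.isBlank()) out.append(line.substring(indent));
            if (i < lines.length - 1) out.append('\n');
        }
        return out.toString();
    }

    private static int leadingWhitespace(String line) {
        int n = 0;
        while (n < line.length() && (line.charAt(n) == ' ' || line.charAt(n) == '\t')) n++;
        return n;
    }

    // Minimal escape processing, including the proposed <\ space>.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) { out.append(c); continue; }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case ' ': out.append(' '); break;   // proposed escape for a literal space
                case '\\': out.append('\\'); break;
                case '"': out.append('"'); break;
                default: out.append('\\').append(next); // left untouched in this sketch
            }
        }
        return out.toString();
    }
}
```

With the body `"  \\ x\n   y"` (two spaces, an escaped space, then `x`; second line indented by three), stripping first and unescaping second keeps the escaped space as payload, whereas unescaping first turns it into ordinary indentation that the stripping step then removes. That is the "pickle" described above.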
Here's an interesting feature of the JLS: It defines a uniform ESL for both string and character literals. This means that <\ '> can occur in both kinds of literals, even though it is only needed for character literals. Same point in reverse for <\ ">. Since the ESL is uniform, if *one* kind of literal needs a particular escape sequence, then *all* the literals have it. (See where I'm going?) Now, the upcoming features of fat strings includes a padding convention, ergo the common ESL needs a way to escape the now-syntactic padding characters. About <\ LT> (an escaped LineTerminator), a similar point holds: Sure it's useful only in string literals with line terminators, but if there is a legitimate reason to add extra control over LTs, then <\ LT> gets bundled into the common escape sublanguage of the JLS. There are two interesting questions about positioning <\ LT> as an escape sequence: 1. What does <\ LT> mean, if it is legal and not just an alias for <\ n>? 2. Is <\ LT> allowed in a thin string, given that (currently) the thin string syntax rejects LT? For 1. I'm already on record as proposing that <\ LT WS*> is an escape sequence for the null string. (WS is horizontal whitespace.) For 2., if we say "no" then we seem to come close to forking the ESL, which Brian and I want to avoid. A thin string body is a sequence of regular non-LT chars plus escape sequences, except <\ LT>. A fat string body can include <\ LT> as well as other escape sequences. But that is not really a fork of the ESL. The difference between fat and thin strings is a structural constraint on their bodies, before escape processing: A fat string can contain LT in its pre-escape-processed body, and so in fact can contain <\ LT>. A thin string cannot contain LT at all, so the presence of <\ LT> in the ESL is moot for a thin string. (Also moot for a char literal.) The parsing of a string literal (either kind) consists of gathering an escaped string body while looking for the close-quote. 
The close-quote interrupts the body and terminates the string. For the case of a thin string, an LT also interrupts the body, but causes parsing to fail. So we could answer "no" to 2 and keep a unified ESL, simply by asserting that thin string tokens never contain LT, while fat string tokens contain LT (always? different question). We could also answer "yes" to 2, and I think it's worth a discussion. What I'm suggesting here is that the thin strings are allowed to contain *escaped* LTs in a new version of the JLS (that also contains fat strings). The pre-escape-processed body of either kind of string can contain escaped LTs, and fat strings can *also* contain *unescaped* LTs. Example: var ts = "hel\ lo\ "; assert ts == "hello"; var fs = """ hel\ lo\ """; assert fs == "hello"; In the latter case, the RER strips most or all of the whitespace. In any case <\ LT WS*> sops up the rest. The reason we are discussing <\ LT> is that there are plenty of reasons why programmers would wish to control the format of their programs by breaking up long logical lines into shorter physical lines. Such use cases are not specific to payloads with or without newlines. If your payload has newlines, use a fat string *and* break up long logical lines into shorter physical ones. If your payload has no newlines (maybe it's a very long hex number), then use a thin string, and break it up. The RER of fat strings (which I like!) prompts the discussion of breaking up logical lines into physical ones, more than thin strings. After all, with thin strings, you break one line into two lines, it's a given that you are going to write two literals, and then the + sign (for concatenation) adds no additional overhead. The break-up sequence is something like <" LT WS + "> But if you have a large MLS with a few very long logical lines, suddenly you have an invidious choice between keeping your nice rectangle, or disrupting it totally by adding <" LT WS + ">.
Breaking a long line in this case drops you off a syntax cliff. Supporting an escaped LT lets you down easy, by breaking the logical lines without disrupting the enclosing padding of the rectangle extraction rule. > Soliciting discussion on the pros and cons of keeping \ as our escape character. Well, \ makes a very fine escape character, except for particular payloads when it doesn't. Any payload which is a program in some little language that uses \ for escaping is going to get confusing very fast. Nobody wants to count a train of escapes, and layers of escaping cause escape trains to lengthen fast (doubling with each layer). Regular expressions are the poster child, and I'll just pretend that they are the key use case, since they are the worst-behaved. Fattening \ to \\\ helps a little with REs. But it would make long trains even longer, with the result that you would need even more help keeping count. The eye can only count a small number of repeated characters at a glance. var re = "\\\\\\["; //train wreck for /\\\[/ assert ('\\'+"[").matches(re); A non-repeating escape is much easier on the eye. Choosing at random, I'll suggest <\ -> as a fattened escape sequence, with the standard ESL from the JLS (as amended with <\ space> etc). As long as that particular pair of characters is rare in REs (and other similar venues), there won't be any long trains of backslashes. var re = *"\\\["; assert ('\\'+"[").matches(re); var s6 = *"\-\- \-" \"; assert s6 == '\\'+"- \" "+'\\'; The star shows that I'm talking about some non-standard string syntax: FatEscString=SL[open=*", close=", escape=\-, esl=ESL, pc=none] I think it would be reasonable to fatten escapes as a separate feature, but not in tandem with the current multi-line string proposal. Straw man, separate from the MLS proposal. If a string literal (either fat or thin) is immediately preceded by <\ ->, the body of the string uses that sequence for its escapes instead of \. The ESL is unchanged.
If stronger escapes are also desired, the feature can be extended simply by allowing any number of - characters, e.g. \--"x\-y\z" and \--"\--n" (for "x\\-y\\z" and "\n"). We are leaving \uXXXX escapes out of the accounting. This is understandable, because they are not a regular part of the ESL, and hard to treat as part of it. But we should try. In particular, we can and should find a way to treat most or all of the \uXXXX escapes *in a string body* as being expanded as part of the ESL, rather than a pre-pass. This will make \uXXXX escapes more complicated, but it may profitably simplify their effect on the user model. One idea is simple: In the body of a string, any \uXXXX which doesn't denote a controlling part of the string syntax (quote or backslash) is collected into the string body as an unexpanded character sequence <\ u X X X X>. This sequence is then supported by the ESL. The effect is that padding removal (rectangle extraction) happens before \u replacement *in a string body*. A second idea could be adopted either with the first or separately: As a structural constraint on string bodies, unicode sequences which would expand to whitespace, quote, or backslash are forbidden. And here's a draconian one: Forbid <\ u X X X X> where the code point is 007F or lower. That would blow up some stupid test cases and puzzlers; user code that does this should be fixed. If we can't do this everywhere, do it inside string bodies. We may be limited by backward compatibility on the application of these ideas to thin strings, but they should be considered at least for fat strings. There are two benefits to taming \uXXXX: 1. Fewer puzzlers involving hidden syntax (\ " etc.) 2. The processing of \uXXXX for string bodies can be documented and aligned with an "unescape" method on String, which is useful in its own right. 
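The <\ LT WS*> escape proposed earlier in this message (a backslash, a line terminator, and any following horizontal whitespace, together standing for the null string) can be sketched as a small pre-pass. This is an editorial sketch under stated assumptions, not the proposed specification: the class `LineContinuation` and the method `joinEscapedLines` are hypothetical names, and every other escape is simply copied through for a later pass to interpret.

```java
// Sketch only: implements the proposed <\ LT WS*> escape as the null string.
// LineContinuation and joinEscapedLines are hypothetical names.
final class LineContinuation {

    static String joinEscapedLines(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(i + 1);
                if (next == '\n' || next == '\r') {
                    i += 2;
                    // Treat \r\n as a single line terminator.
                    if (next == '\r' && i < body.length() && body.charAt(i) == '\n') i++;
                    // Gobble the horizontal whitespace after the terminator (the WS* part).
                    while (i < body.length()
                            && (body.charAt(i) == ' ' || body.charAt(i) == '\t')) i++;
                    continue;
                }
                // Any other escape: copy both characters unchanged, so that an
                // escaped backslash followed by a real newline is not misread.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

Applied to a body consisting of `hel`, backslash, newline, some indentation, `lo`, backslash, newline and trailing indentation, this yields `hello`, matching the thin-string and fat-string examples given earlier in the message.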
From john.r.rose at oracle.com Fri May 3 23:40:16 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 16:40:16 -0700 Subject: Wrapping up the first two courses In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com> Message-ID: <85314830-6CB0-4761-882D-24A655FC0B50@oracle.com> On Apr 26, 2019, at 8:59 AM, Kevin Bourrillion wrote: > > On Fri, Apr 26, 2019 at 8:56 AM Kevin Bourrillion wrote: > > Apparently bash's behavior is to replace with a single space character, and that at least seems like a useful behavior for us too if we're open to it. No, it replaces <\ NL> with nothing at all. Any spaces before or after that two-character sequence are bystanders. In a separate step, if not inside quotes, all sequences of whitespace are treated as if they were single spaces, as the shell breaks a line up into words. The net is that the stuff you mentioned behaves like whitespace. But also: ``` $ x=a\ b $ echo $x ab ``` However, I'm proposing that horizontal whitespace *after* the newline is "gobbled up" and thrown away with the leading <\ LT>, so the escape sequence is more like <\ LT (SP|TAB)*>. This gives the programmer more control over program layout. > I was forgetting, when I said this, that another substantial minority use case (I want to say at least 15%? These were rough estimates though) for multi-line strings is really long URLs, checksums, etc., that aren't meant to have any spaces in them at all. So the bash behavior is not necessarily what we'd want, although of course consistency with it has some amount of value in itself. The actual bash behavior, described above, *is* what we want. 
If the programmer *wants* a space, one can be placed just before the <\ LT> sequence. Luckily, that's reasonably readable.

> Which raises another question: do we allow \<newline> in SL strings? (I presume so, and we just eat the \ and the terminator.)

If we eat the (SP|TAB)* after LT, then we have given the programmer control over indentation, in a way that is consistent with the rectangle rule, but applies only to the one escaped (partial) line.

> Hmm, I can see how that could be harmless but it seems to blur the boundary between the features to me.

It seems that way. I think what's happening is another iteration of "Let's do raw strings! Wait, that's not what they really are" and now we are at "Let's do multi-line strings!" Brian's comment is that the tri-quote makes a better container for payloads with single quotes. Those payloads often have multiple lines too. So it's really "fatter strings", in some sense.

We might say we are making strings with *unescaped LTs*. The rectangle rule shows up as soon as we realize that programmers have strong opinions about spacing, and want to indent their code so it is readable. (Pretty too; beauty is a proxy for readability I suppose.) So if we let the programmer start putting paragraphs into string bodies, we also have to let the programmer manage indentation. And it's a short and natural step from exdenting to line-breaking, IMO.

We might say we are making *more readable syntax for large strings*. Minimizing escape sequences makes them readable, and so does giving the programmer control over program layout. Such "readable strings" make some sense for one-liners also, especially if we extend the 2D rectangle rule to the 1D case and strip leading and trailing whitespace, near the triquotes.

In the end, we might just dub them "fat strings".

--
John From john.r.rose at oracle.com Sat May 4 00:43:45 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 3 May 2019 17:43:45 -0700 Subject: String literals: some principles In-Reply-To: <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> Message-ID: On May 3, 2019, at 3:25 PM, John Rose wrote: > > And here's a draconian one: Forbid <\ u X X X X> > where the code point is 007F or lower. That would > blow up some stupid test cases and puzzlers; user > code that does this should be fixed. If we can't do > this everywhere, do it inside string bodies. > > We may be limited by backward compatibility on the > application of these ideas to thin strings, but they should > be considered at least for fat strings. Here's an example of how \uXXXX escapes could be brought into alignment with the escape sublanguage: https://docs.oracle.com/javase/specs/jls/se12/html/jls-3.html#jls-3.10.6 > 3.10.6. Escape Sequences for Character and String Literals ? > It is a compile-time error if the character following a backslash in an escape sequence is not an ASCII b, t, n, f, r, ", ', \, + [?space, LineTerminator,?] > 0, 1, 2, 3, 4, 5, 6, or 7. The Unicode escape \u is processed earlier (?3.3). +In a [?fat?] string literal, no part of the open or closing quote, or of +any escape sequence, or of any stripped whitespace, may contain +a character that was derived (in the earlier processing) from +a Unicode escape +[?, unless the first character of the literal, a ", was also derived +from a Unicode escape?] +. > Octal escapes are provided for compatibility with C, but can express only Unicode values \u0000 through \u00FF, so Unicode escapes are usually preferred. +(In a string literal we forbid Unicode escapes for characters which +steer the lexical syntax of the literal. This makes it easier to +read. 
[?The exception allows Java programs to be encoded with +dense use of Unicode escapes, as long as the open-quotes are +so encoded.?]) If we omit [fat] in the above, we get an incompatible change to thin strings. But I think it would actually be the right move. Here's a puzzler I just thought of: var puz = "\1\u0032"; // puz = '\1'+"0" or '\10'+""? This is a one-character string "\n". If \u escapes were a proper part of the escape sub-language, then puz would be a two-character string. Here's a place where prior-expansion of \u escapes interferes with the structure of fat strings: var fat = """ \u0020 hello """; // fat = "hello\n" or " hello\n"? We can stop caring about the awkward phasing of \u escapes if and only if we make a restriction that \u escapes can't mix with other parts of string syntax, as above. This goes for the new syntax as well as the old. It's easier to impose such a rule on new syntax, of course. This sort of thing makes me want to put the restriction on all string (and character) literals. It seems to me that only deliberately obfuscated code would fall afoul of it. If that's really true, this feature is completely separable from fat strings or any other menu items, as long as we are willing to apply it after the fact, incompatibly with obfuscated code. ? John From brian.goetz at oracle.com Tue May 7 22:14:59 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 7 May 2019 15:14:59 -0700 Subject: [ string literals ] Extending the escape language (was: String literals: some principles) In-Reply-To: <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> Message-ID: <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> > TL;DR: Good framework; must also account for the > rectangle extraction rule (RER). 
A unified escape
> sublanguage (ESL) is highly desirable, and I propose
> adding <\ > and <\ LT WS*> as escapes for space
> and for null string. The existing \ char is OK, and
> should be "fattened" as a separate feature. I note
> some issues with <\ u X X X X>.

Agree in general with the desire to extend ESL with some whitespace sequences, though I take some issues with the syntax on \<space> and \<newline>. Some alternate ideas regarding \uxxxx.

First, Unicode escapes. Alex pointed out offline that we had worked our way into a linear thinking trap (again). In the first round, because we were focused on raw strings, we turned off \uxxxx processing in the body of a raw string, which raised the question of "how do we turn it back on." And also that, while we use the same escape character for both, they occupy very different places in the language; the ESL is purely about string literals, whereas \uxxxx is purely a lexing concern.

His recommendation, which (now that it's been explained to me) I strongly agree with, is: let's not have this feature touch Unicode processing at all. Let's just leave Unicode processing as is, using \uxxxx, whether in code, SLSLs, MLSLs, and any future "raw" SLs. The similarity between \n and \uxxxx is purely coincidental.

And if we really want the characters "\u0000" in a string literal, well, we know how to escape the \.

Which brings us to \<space> and \<newline>. My main complaint here is that I am really uncomfortable using \<space> for "literal space", because at the end of the line, one cannot differentiate between \<space> and \<newline> when reading the code. Alternatives include \_, or \s, or \., or ... many others.
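To make the two whitespace escapes under discussion concrete, here is a small sketch, assuming the \s spelling suggested above for a literal space and the proposed "backslash, line terminator, then gobbled spaces and tabs" form for the null string. The class name EscapeSpellingSketch and its method are hypothetical; every other escape is passed through untouched for a later pass.

```java
// Hypothetical illustration only; names and exact semantics are assumptions.
final class EscapeSpellingSketch {

    /**
     * Translates backslash-s to one space, and drops a backslash that is
     * immediately followed by a line terminator, together with any spaces
     * or tabs that start the next line.
     */
    public static String translate(String body) {
        StringBuilder out = new StringBuilder(body.length());
        int i = 0;
        int n = body.length();
        while (i < n) {
            char c = body.charAt(i++);
            if (c != '\\' || i >= n) {
                out.append(c);
                continue;
            }
            char e = body.charAt(i);
            if (e == 's') {
                out.append(' ');          // visible spelling of a literal space
                i++;
            } else if (e == '\n' || e == '\r') {
                i++;                      // consume the line terminator
                if (e == '\r' && i < n && body.charAt(i) == '\n') {
                    i++;                  // treat CR LF as one terminator
                }
                while (i < n && (body.charAt(i) == ' ' || body.charAt(i) == '\t')) {
                    i++;                  // gobble indentation of the continued line
                }
            } else {
                // Some other escape: copy both characters so that an escaped
                // backslash is never re-read as the start of a new escape.
                out.append(c).append(e);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this spelling a reader can always see the escaped space, and a backslash at the very end of a line has only one possible meaning.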
From john.r.rose at oracle.com Tue May 7 23:36:21 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 7 May 2019 16:36:21 -0700
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To: <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com>
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com>
Message-ID: <90AC5F2B-C787-4A7A-9829-66FE94897BA9@oracle.com>

On May 7, 2019, at 3:14 PM, Brian Goetz wrote:
>
>> TL;DR: Good framework; must also account for the
>> rectangle extraction rule (RER). A unified escape
>> sublanguage (ESL) is highly desirable, and I propose
>> adding <\ > and <\ LT WS*> as escapes for space
>> and for null string. The existing \ char is OK, and
>> should be "fattened" as a separate feature. I note
>> some issues with <\ u X X X X>.
>
> Agree in general with the desire to extend ESL with some whitespace sequences, though I take some issues with the syntax on \<space> and \<newline>. Some alternate ideas regarding \uxxxx.
>
> First, Unicode escapes. Alex pointed out offline that we had worked our way into a linear thinking trap (again). In the first round, because we were focused on raw strings, we turned off \uxxxx processing in the body of a raw string, which raised the question of "how do we turn it back on." And also that, while we use the same escape character for both, they occupy very different places in the language; the ESL is purely about string literals, whereas \uxxxx is purely a lexing concern.

I don't think that's the trap we are in. The trap is the Language Experts Designing User Model trap, where LEs say "we don't need to deal with \u because it's not the part of the JLS we are working on", and the user says, "they are all just escapes, right?"
The reason it's a trap is we think the user will be happy to learn and apply the geeky-fine distinctions between the two superficially similar syntaxes.

One good way out of this particular trap is to carefully restrict the allowed \uxxxx patterns in strings, so that the phase order becomes irrelevant, and then move those patterns forward in the phase order along with the other escapes.

We can also do as you are recommending, and ignore the problem. The only difficulty there is occasionally having to ask the user to ignore the problem also, by saying things like "yes, that's an escape sequence but \u sequences break the rule you are trying to apply". Such as using "\0040" to escape a space.

How frequent is "occasionally"? I don't know; if it's very infrequent then, yes, we can ignore this problem. It will give puzzler authors some extra scope for their hobby.

> His recommendation, which (now that it's been explained to me) I strongly agree with, is: let's not have this feature touch Unicode processing at all. Let's just leave Unicode processing as is, using \uxxxx, whether in code, SLSLs, MLSLs, and any future "raw" SLs. The similarity between \n and \uxxxx is purely coincidental.

(That's why it's a LEDUM trap.)

> And if we really want the characters "\u0000" in a string literal, well, we know how to escape the \.
>
> Which brings us to \<space> and \<newline>. My main complaint here is that I am really uncomfortable using \<space> for "literal space", because at the end of the line, one cannot differentiate between \<space> and \<newline> when reading the code. Alternatives include \_, or \s, or \., or ... many others.

Personally, I'm fine with those. By analogy with \n I suppose \s will be unsurprising; I don't care about this corner of the bikeshed, though. I certainly agree that having more than one "\ whitespace" sequence creates visual ambiguities; that's a good catch.

--
John

From guy.steele at oracle.com Wed May 8 20:26:33 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Wed, 8 May 2019 16:26:33 -0400
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To: <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com>
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com>
Message-ID: <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com>

> On May 7, 2019, at 6:14 PM, Brian Goetz wrote:
>
> . . . at the end of the line, one cannot differentiate between \<space> and \<newline> when reading the code.

This suggests a design constraint for the ESL: whatever \<newline> means, \<space><newline> ought to mean the same thing.

From john.r.rose at oracle.com Wed May 8 20:27:43 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 8 May 2019 13:27:43 -0700
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To: <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com>
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com>
Message-ID: <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com>

On May 8, 2019, at 1:26 PM, Guy Steele wrote:
>
>> On May 7, 2019, at 6:14 PM, Brian Goetz wrote:
>>
>> . . . at the end of the line, one cannot differentiate between \<space> and \<newline> when reading the code.
>
> This suggests a design constraint for the ESL: whatever \<newline> means, \<space><newline> ought to mean the same thing.

Or else \<space>+<newline> is illegal.
In other words, there shouldn't be
more than one non-error meaning.
From guy.steele at oracle.com Wed May 8 20:31:56 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Wed, 8 May 2019 16:31:56 -0400
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To: <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com>
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com>
Message-ID:

> On May 8, 2019, at 4:27 PM, John Rose wrote:
>
> On May 8, 2019, at 1:26 PM, Guy Steele wrote:
>>
>>> On May 7, 2019, at 6:14 PM, Brian Goetz wrote:
>>>
>>> . . . at the end of the line, one cannot differentiate between \<space> and \<newline> when reading the code.
>>
>> This suggests a design constraint for the ESL: whatever \<newline> means, \<space><newline> ought to mean the same thing.
>
> Or else \<space>+<newline> is illegal.
> In other words, there shouldn't be
> more than one non-error meaning.

True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \<newline> but reject \<space>+<newline>, or to make \<space>+<newline> "just work", and (b) what are the costs of making \<space>+<newline> "just work".
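The two alternatives in (a) can be sketched side by side. This is only an illustration under stated assumptions: it reads the choice as "backslash, then optional spaces or tabs, then a line feed", handles only LF as the line terminator, and uses a hypothetical class TrailingSpacePolicySketch with a hypothetical Policy enum. Any other escape is copied through unchanged.

```java
// Hypothetical illustration only; the names and the exact rule are assumptions.
final class TrailingSpacePolicySketch {

    enum Policy {
        REJECT,           // invisible whitespace after the backslash is an error
        SAME_AS_NEWLINE   // it "just works" and means the same as backslash + LF
    }

    /** Joins continued lines, treating "backslash, blanks, LF" per the chosen policy. */
    public static String joinLines(String body, Policy policy) {
        StringBuilder out = new StringBuilder(body.length());
        int i = 0;
        int n = body.length();
        while (i < n) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            // Look past any spaces or tabs that follow the backslash.
            int j = i + 1;
            while (j < n && (body.charAt(j) == ' ' || body.charAt(j) == '\t')) {
                j++;
            }
            boolean atLineEnd = j < n && body.charAt(j) == '\n';
            boolean hasGap = j > i + 1;
            if (atLineEnd && (!hasGap || policy == Policy.SAME_AS_NEWLINE)) {
                i = j + 1;    // drop the backslash, the blanks, and the LF
                continue;
            }
            if (atLineEnd) {
                throw new IllegalArgumentException(
                        "whitespace between backslash and line end at index " + i);
            }
            // Not a continuation: copy the backslash and the escaped character.
            out.append(c);
            if (i + 1 < n) {
                out.append(body.charAt(i + 1));
            }
            i += 2;
        }
        return out.toString();
    }
}
```

Under either policy there is at most one non-error meaning; the difference is only whether the programmer gets a diagnostic or a silent fix.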
From james.laskey at oracle.com Wed May 8 20:35:23 2019
From: james.laskey at oracle.com (James Laskey)
Date: Wed, 8 May 2019 17:35:23 -0300
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To:
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com>
Message-ID: <5579BE1E-435C-4735-9402-49634AF7A86E@oracle.com>

Sent from my iPhone

> On May 8, 2019, at 5:31 PM, Guy Steele wrote:
>
>> On May 8, 2019, at 4:27 PM, John Rose wrote:
>>
>> On May 8, 2019, at 1:26 PM, Guy Steele wrote:
>>>
>>>> On May 7, 2019, at 6:14 PM, Brian Goetz wrote:
>>>>
>>>> . . . at the end of the line, one cannot differentiate between \<space> and \<newline> when reading the code.
>>>
>>> This suggests a design constraint for the ESL: whatever \<newline> means, \<space><newline> ought to mean the same thing.
>>
>> Or else \<space>+<newline> is illegal.
>> In other words, there shouldn't be
>> more than one non-error meaning.
>
> True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \<newline> but reject \<space>+<newline>, or to make \<space>+<newline> "just work", and (b) what are the costs of making \<space>+<newline> "just work".
>

Explaining to Joe Programmer might be the main cost.
From guy.steele at oracle.com Wed May 8 20:37:35 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Wed, 8 May 2019 16:37:35 -0400
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To: <5579BE1E-435C-4735-9402-49634AF7A86E@oracle.com>
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com> <5579BE1E-435C-4735-9402-49634AF7A86E@oracle.com>
Message-ID:

> On May 8, 2019, at 4:35 PM, James Laskey wrote:
>
> Sent from my iPhone
>
>> On May 8, 2019, at 5:31 PM, Guy Steele wrote:
>>
>>> On May 8, 2019, at 4:27 PM, John Rose wrote:
>>>
>>> On May 8, 2019, at 1:26 PM, Guy Steele wrote:
>>>>
>>>>> On May 7, 2019, at 6:14 PM, Brian Goetz wrote:
>>>>>
>>>>> . . . at the end of the line, one cannot differentiate between \<space> and \<newline> when reading the code.
>>>>
>>>> This suggests a design constraint for the ESL: whatever \<newline> means, \<space><newline> ought to mean the same thing.
>>>
>>> Or else \<space>+<newline> is illegal.
>>> In other words, there shouldn't be
>>> more than one non-error meaning.
>>
>> True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \<newline> but reject \<space>+<newline>, or to make \<space>+<newline> "just work", and (b) what are the costs of making \<space>+<newline> "just work".
>>
>
> Explaining to Joe Programmer might be the main cost.

True dat.
From john.r.rose at oracle.com Wed May 8 22:31:35 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 8 May 2019 15:31:35 -0700
Subject: [ string literals ] Extending the escape language (was: String literals: some principles)
In-Reply-To:
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com> <3C1B2843-E3E2-4FCB-B24D-B4965285A5A3@oracle.com> <8AFDFDF5-0743-48AE-833E-879A75F635FC@oracle.com> <972A3698-ABC9-4D7F-B016-C8C0A8980B15@oracle.com> <10E54C71-D384-4C3E-B406-6BCCD380A082@oracle.com>
Message-ID:

On May 8, 2019, at 1:31 PM, Guy Steele wrote:
>
> True. Then there are the separate questions of (a) whether it is less confusing to Joe Programmer to accept \<newline> but reject \<space>+<newline>, or to make \<space>+<newline> "just work", and (b) what are the costs of making \<space>+<newline> "just work".

Deprecating invisible whitespace before <newline> is a common practice. The OpenJDK repos reject this along with leading tabs and other visual ambiguities.

Come to think of it, this common practice is...
- evidence that Joe P. already knows <space><newline> isn't quite kosher.
- evidence that <space><newline> *shouldn't* be a *significant* part of a new Java syntax!
- *not* necessarily a candidate for enforcement at the language level.

(The middle point supports <\ s> against <\ space> as a candidate escape sequence!)

From james.laskey at oracle.com Thu May 9 12:06:41 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Thu, 9 May 2019 09:06:41 -0300
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
Message-ID: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>

At this point I think the only outstanding issue is long line continuation. While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few.

Meanwhile, please review the JEP and comment back here.
Cheers,

-- Jim

html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html
markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md

From james.laskey at oracle.com Thu May 9 14:34:48 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Thu, 9 May 2019 11:34:48 -0300
Subject: Long line string literals
Message-ID: <454D3E30-C49B-49F0-9BC6-CA9E44117051@oracle.com>

How does a Java developer express a very long string? Note that this is not just a multi-line string literal question. The issue relates to all string literals.

Example,

    String ls = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc est libero, vehicula nec molestie in, semper aliquam magna.";

Current solution,

    String ls = "Lorem ipsum dolor sit amet, consectetur " +
                "adipiscing elit. Nunc est libero, vehicula " +
                "nec molestie in, semper aliquam magna.";

This works and will continue to work, but I think there is concern that this pattern won't work when multi-line string literals are added to the equation.

There has been some debate about various machinations that could be used. Some of the parameters:

- The solution needs to be an escape sequence(s). This is the only mechanism we can introduce (now) and be backward compatible with traditional string literals. Other mechanisms, such as literal prefixing, are not open for discussion at this point in time. (+1)

- A Multi-line String Literal JEP goal is to make all escape sequences equally meaningful for traditional string literals and multi-line string literals. (+1)

- \, \ and \ (white space includes LF and CR) have been proposed with various semantics for each. There is a concern about the lack of visibility of what comes after the \. Is it a space, tab, Unicode white space, LF or CR? How do you tell? (±1)

- When the new escape sequence(s) is in a traditional string literal the compiler scanner needs to treat the traditional string literal as multi-line.
(-1)

The escape sequences suggested differ, but they are all variations of consuming the escape and zero to N characters after (or before).

A) \ or \
Just consume the (single) line terminator/white space.

Sample,

    String tsl = "Lorem ipsum dolor sit amet, consectetur \
                  adipiscing elit. Nunc est libero, vehicula \
                  nec molestie in, semper aliquam magna.";

    String msl = """
                 Lorem ipsum dolor sit amet, consectetur \
                 adipiscing elit. Nunc est libero, vehicula \
                 nec molestie in, semper aliquam magna.""";

This works if the line terminator follows immediately after the \ . (+1)
Cannot tell if it is a white space or line terminator after the \ . (-1)
This does not work if there is one or more intervening white space characters. (-1)
This works for multi-line string literals because of stripTrailing. (+1)
This does not work for traditional string literals because there is no notion of auto alignment to strip the leading white space on the next line. (-2)

B) \
Consume all white space up to and including the line terminator.

Same sample as A).

Works in more cases than A). (+2)
Still does not work for traditional string literals because there is no notion of auto alignment to strip the leading white space on the next line. (-2)

C) \
Consume all white space (including LF and CR) up to a non-white space or end of string.

Same sample as A).

This works for both traditional and multi-line strings. (+1)

Note that in A), B) and C) the next line may influence multi-line indentation. I.e., escapes are translated after auto alignment. (±1)

D) \, (something other than white space) but otherwise the same as C)

    String tsl = "Lorem ipsum dolor sit amet, consectetur \,
                  adipiscing elit. Nunc est libero, vehicula \,
                  nec molestie in, semper aliquam magna.";

    String msl = """
                 Lorem ipsum dolor sit amet, consectetur \,
                 adipiscing elit. Nunc est libero, vehicula \,
                 nec molestie in, semper aliquam magna.""";

Works but trading " + for \, .
(±1)

E) \> (something other than white space)
Consume all white space up to and including the line terminator.
\< (something other than white space)
Consume all white space back to beginning of line.

    String tsl = "Lorem ipsum dolor sit amet, consectetur \> \ \ \ \

References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>
Message-ID: <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com>

I've withdrawn the posting for some additional changes. Will keep you posted.

-- Jim

> On May 9, 2019, at 9:06 AM, Jim Laskey wrote:
>
> At this point I think the only outstanding issue is long line continuation. While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few.
>
> Meanwhile, please review the JEP and comment back here.
>
> Cheers,
>
> -- Jim
>
> html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html
> markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md
>

From amaembo at gmail.com Thu May 9 15:59:11 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Thu, 9 May 2019 22:59:11 +0700
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>
Message-ID:

Hello!

Great draft, thank you. I'm especially happy that the expert group came to the conclusion that automatic builtin processing of the indentation is important. I proposed to do this in January, 2018 [1]. While the solution proposed in the JEP draft is not as radical as the one I proposed, I still like it better than the previous RSL proposal.

One thing which seems missing is dealing with tabs. What if the user's file is indented with tabs? Should they also be processed? More specifically, what is a "white space" in the strip indent algorithm description? Only the \u0020 symbol or \u0020 & \u0009?
Or anything for which Character.isWhitespace() returns true? Also, if tabs are included, does a single tab cost the same as a single space? You may imagine that somebody pastes part of a multiline string from StackOverflow where tabs were used for indent and sees some unexpected results (e.g. the indent changes for the untouched lines, while visually in the editor the pasted lines appear to have the same indent as the surrounding ones). I admit that defining what is the "expected result" is hard, especially taking into account that most editors provide a setting for the tab size and different users may have different tab sizes. Nevertheless I feel that tab handling should be explicitly spelled out (even if it's "tab is not considered as a white-space character").

With best regards,
Tagir Valeev.

[1] http://mail.openjdk.java.net/pipermail/amber-spec-experts/2018-January/000251.html

On Thu, 9 May 2019, 19:07 Jim Laskey wrote:

> At this point I think the only outstanding issue is long line
> continuation. While we can postpone continuation until a later release, I
> think we should at least lay out the details to see if we need to do
> anything now. I'll follow up with a (long line continuation) synopsis
> e-mail in a few.
>
> Meanwhile, please review the JEP and comment back here.
>
> Cheers,
>
> -- Jim
>
> html:
> http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html
> markdown:
> http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md
>

From james.laskey at oracle.com Thu May 9 16:40:19 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Thu, 9 May 2019 13:40:19 -0300
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To:
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>
Message-ID:

The proposed solution does not solve the mixed leading white space problem. As long as the white space is consistent across lines with the tabs, it works fine.

Otherwise, the developer has two solutions:

1) Use your editor to detab or detab + entab.
Either makes the white space consistent.

2) Write a custom String::transform method that does the trimMargins thing.

    String string = """
        |> line 1
        |> line 2
        """.transform(s -> s.replaceAll("\\w*\\|> ", ""));

The long line discussion could fall out of an automatic trimMargins solution, but it gets messy.

-- Jim

> On May 9, 2019, at 12:59 PM, Tagir Valeev wrote:
>
> Hello!
>
> Great draft, thank you. I'm especially happy that the expert group came to the conclusion that automatic builtin processing of the indentation is important. I proposed to do this in January, 2018 [1]. While the solution proposed in the JEP draft is not as radical as the one I proposed, I still like it better than the previous RSL proposal.
>
> One thing which seems missing is dealing with tabs. What if the user's file is indented with tabs? Should they also be processed? More specifically, what is a "white space" in the strip indent algorithm description? Only the \u0020 symbol or \u0020 & \u0009? Or anything for which Character.isWhitespace() returns true? Also, if tabs are included, does a single tab cost the same as a single space? You may imagine that somebody pastes part of a multiline string from StackOverflow where tabs were used for indent and sees some unexpected results (e.g. the indent changes for the untouched lines, while visually in the editor the pasted lines appear to have the same indent as the surrounding ones). I admit that defining what is the "expected result" is hard, especially taking into account that most editors provide a setting for the tab size and different users may have different tab sizes. Nevertheless I feel that tab handling should be explicitly spelled out (even if it's "tab is not considered as a white-space character").
>
> With best regards,
> Tagir Valeev.
>
> [1] http://mail.openjdk.java.net/pipermail/amber-spec-experts/2018-January/000251.html
>
> On Thu, 9 May 2019, 19:07 Jim Laskey wrote:
> At this point I think the only outstanding issue is long line continuation.
While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few.
>
> Meanwhile, please review the JEP and comment back here.
>
> Cheers,
>
> -- Jim
>
> html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html
> markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md
>

From guy.steele at oracle.com Thu May 9 16:43:35 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Thu, 9 May 2019 12:43:35 -0400
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To:
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com>
Message-ID:

One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this.

> On May 9, 2019, at 12:40 PM, Jim Laskey wrote:
>
> The proposed solution does not solve the mixed leading white space problem. As long as the white space is consistent across lines with the tabs, it works fine.
>
> Otherwise, the developer has two solutions:
>
> 1) Use your editor to detab or detab + entab. Either makes the white space consistent.
>
> 2) Write a custom String::transform method that does the trimMargins thing.
>
>     String string = """
>         |> line 1
>         |> line 2
>         """.transform(s -> s.replaceAll("\\w*\\|> ", ""));
>
> The long line discussion could fall out of an automatic trimMargins solution, but it gets messy.
>
> -- Jim
>
>> On May 9, 2019, at 12:59 PM, Tagir Valeev wrote:
>>
>> Hello!
>>
>> Great draft, thank you. I'm especially happy that the expert group came to the conclusion that automatic builtin processing of the indentation is important. I proposed to do this in January, 2018 [1].
From james.laskey at oracle.com Thu May 9 16:53:34 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Thu, 9 May 2019 13:53:34 -0300 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> Message-ID: <67095905-B377-4E4E-B0F9-2D0879F3680C@oracle.com> Reasonable. This is a learning step for new users of MLS. > On May 9, 2019, at 1:43 PM, Guy Steele wrote: > > One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this. > >> On May 9, 2019, at 12:40 PM, Jim Laskey wrote: >> >> The proposed solution does not solve the mixed leading white space problem. As long as the white space is consistent across lines with the tabs, it works fine. >> >> Otherwise, the developer has two solutions: >> >> 1) Use your editor to detab or detab + entab. Either makes the white space consistent. >> >> 2) Write a custom String::transform method that does the trimMargins thing. >> >> String string = """ >> |> line 1 >> |> line 2 >> """.transform(s -> s.replaceAll("(?m)^[ \\t]*\\|> ", "")); >> The long line discussion could fall out of an automatic trimMargins solution, but it gets messy. >> -- Jim >> >> >> >> >> >>> On May 9, 2019, at 12:59 PM, Tagir Valeev wrote: >>> >>> Hello! >>> >>> Great draft, thank you. I'm especially happy that the expert group came to the conclusion that automatic built-in processing of the indentation is important. I proposed to do this in January 2018 [1].
From guy.steele at oracle.com Thu May 9 17:14:15 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 9 May 2019 13:14:15 -0400 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <67095905-B377-4E4E-B0F9-2D0879F3680C@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <67095905-B377-4E4E-B0F9-2D0879F3680C@oracle.com> Message-ID: The nice thing about this rule is that having an editor/IDE _either_ detab _or_ retab should fix the problem. > On May 9, 2019, at 12:53 PM, Jim Laskey wrote: > > Reasonable. This is a learning step for new users of MLS. > > > >> On May 9, 2019, at 1:43 PM, Guy Steele wrote: >> >> One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this. >> >>> On May 9, 2019, at 12:40 PM, Jim Laskey wrote: >>> >>> The proposed solution does not solve the mixed leading white space problem. As long as the white space is consistent across lines with the tabs, it works fine. >>> >>> Otherwise, the developer has two solutions: >>> >>> 1) Use your editor to detab or detab + entab. Either makes the white space consistent. >>> >>> 2) Write a custom String::transform method that does the trimMargins thing. >>> >>> String string = """ >>> |> line 1 >>> |> line 2 >>> """.transform(s -> s.replaceAll("(?m)^[ \\t]*\\|> ", "")); >>> The long line discussion could fall out of an automatic trimMargins solution, but it gets messy. >>> -- Jim >>> >>> >>> >>> >>> >>>> On May 9, 2019, at 12:59 PM, Tagir Valeev wrote: >>>> >>>> Hello! >>>> >>>> Great draft, thank you.
From brian.goetz at oracle.com Thu May 9 22:21:37 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 9 May 2019 15:21:37 -0700 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> Message-ID: > One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this. I see the logic here, but it also makes me a bit uncomfortable. Our story is that indentation-stripping is done by a JDK method (String::stripIndent), and that the language behavior is specified in terms of the library behavior. (This is essential if we want to allow users to opt out, do some manipulation on the un-aligned form, and then perform alignment -- the language and library behavior must be, er, aligned.) We could surely make String::stripIndent throw when you present it a mixed-whitespace string, but do we really want this? I would prefer that stripIndent be a total function on strings, even if it has to produce ugly output when given ugly input. From guy.steele at oracle.com Thu May 9 22:41:27 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 9 May 2019 18:41:27 -0400 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> Message-ID: We have already discussed other conditions that can cause string literals to be rejected before the indentation stripper gets a crack at them.
I see nothing wrong with stripIndent being a total function but also having the compiler filter string literals that appear in source code before the function is applied. (The filter predicate could also be in the library if we want.) Sent from my iPhone > On May 9, 2019, at 6:21 PM, Brian Goetz wrote: > > >> One possibility is language enforcement: at the part of the processing of a multiline string where whitespace is stripped off to the left of the rectangle, it could be an error if that leading whitespace is not "spelled the same" on all lines from which it is being stripped. I lean toward recommending this. > > I see the logic here, but it also makes me a bit uncomfortable. Our story is that indentation-stripping is done by a JDK method (String::stripIndent), and that the language behavior is specified in terms of the library behavior. (This is essential if we want to allow users to opt out, do some manipulation on the un-aligned form, and then perform alignment -- the language and library behavior must be, er, aligned.) > > We could surely make String::stripIndent throw when you present it a mixed-whitespace string, but do we really want this? I would prefer that stripIndent be a total function on strings, even if it has to produce ugly output when given ugly input. > > From john.r.rose at oracle.com Fri May 10 04:46:35 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 May 2019 21:46:35 -0700 Subject: Long line string literals In-Reply-To: <454D3E30-C49B-49F0-9BC6-CA9E44117051@oracle.com> References: <454D3E30-C49B-49F0-9BC6-CA9E44117051@oracle.com> Message-ID: <43AB2BC3-95E3-4006-A10D-0DFCF778A858@oracle.com> On May 9, 2019, at 7:34 AM, Jim Laskey wrote: > > How does a Java developer express a very long string? > ... > > Some of the parameters: > > - The solution needs to be an escape sequence(s). This is the only > mechanism we can introduce (now) and be backward compatible with > traditional string literals.
Other mechanisms, such as literal > prefixing, are not open for discussion at this point in time. (+1) +1 from me > > - A Multi-line String Literal JEP goal is to make all escape sequences > equally meaningful for traditional string literals and multi-line > string literals. (+1) +1 > - \, \ and \ (white space includes LF > and CR) have been proposed with various semantics for each. There is a > concern about the lack of visibility of what comes after the \. Is it a > space, tab, Unicode white space, LF or CR? How do you tell? (±1) Yep. Also note that some source control systems (ours!) forbid trailing spaces before EOL in code, precisely because they are invisible. IMO this consideration immediately disqualifies <\ space> as a candidate for an escape sequence. <\ LT> is still just fine, and maybe <\ LT space*> is tolerable, but not if it means something different from <\ LT>. > - When the new escape sequence(s) is in a traditional string literal the > compiler scanner needs to treat the traditional string literal as > multi-line. (-1) Yes: If you use a <\ LT> escape sequence in a thin string, it becomes a ML string. If you thought that only fat strings could be ML strings, I've got a nice puzzler for you. The reality about fat strings is that they are nicely formatted multi-line strings (with the rectangle extraction feature). > The escape sequences suggested differ, but they are all variations of > consuming the escape and zero to N characters after (or before). I'll say up front that greedily gobbling whitespace characters either before or after an escape is a powerful idea, IMO, because it allows the user to designate an ad hoc run of whitespace as "program format only, but not payload". If we make the ad hoc run easy to use, to make the program more readable, we win, as with the rectangle rule. But there has to be a way to "fence" the whitespace gobbler so it doesn't gobble nearby whitespace which is intended as payload.
You can do this today as <\ 0 4 0>, and I would prefer to add a more memorable optional <\ s>. To protect a tab, today's <\ t> works just fine. I think either <\ 0 4 0> or <\ s> is adequate to "fence the gobbler", in either direction. > > A) \ or \ Just consume the (single) > line terminator/white space. > > Sample, > > String tsl = "Lorem ipsum dolor sit amet, consectetur \ > adipiscing elit. Nunc est libero, vehicula \ > nec molestie in, semper aliquam magna."; > > String msl = """ > Lorem ipsum dolor sit amet, consectetur \ > adipiscing elit. Nunc est libero, vehicula \ > nec molestie in, semper aliquam magna."""; > > This works if the line terminator follows immediately after the \ . (+1) > > Can not tell if it is a white space or line terminator after the \ . (-1) > > This does not work if there is one or more intervening white space > characters. (-1) > > This works for multi-line string literals because of stripTrailing. (+1) > > This does not work for traditional string literals because there is no > notion of auto alignment to strip the leading white space on the next > line. (-2) -1 from me. It lets you break the long line, but then you have to place it flush against the left margin. To me breaking a long line inherently has two decisions: 1. break the line, 2. decide where to place the second part on the next line, using spaces and tabs. So I want the same mechanism that gobbles the LT to also gobble the succeeding whitespace. Thus <\ LT WS*> expands to the null string. > > B) \ Consume all white space up to and including the line > terminator. > > Same sample as A). > > Works in more cases than A). (+2) > > Still does not work for traditional string literals because there is no > notion of auto alignment to strip the leading white space on the next > line. (-2) Same objection (and proposal) as for A. > > C) \ Consume all white space (including LF and CR) up to a > non-white space or end of string. > > Same sample as A). 
> > This works for both traditional and multi-line strings. (+1) > > Note that in A), B) and C) the next line may influence multi-line > indentation. I.e., escapes are translated after auto alignment. (±1) +1 This is the one I like! I accept that, for a fat string with rectangle extraction, I am required to indent the second line fragment *after* the left margin of the extracted rectangle. It's a fine compromise. String msl = """ First. Lorem ipsum dolor sit amet, consectetur \ adipiscing elit. Nunc est libero, vehicula \ nec molestie in, semper aliquam magna. Last. """; => "First.\n Lorem...magna.\nLast." In this example, the continuation lines (second and third after Lorem...) can be exdented to align with First and Last, but not further. Any extra indentation, after that of First and Last, is gobbled by <\ LT WS*>. > D) \, (something other than white space) but otherwise the same as C) > > String tsl = "Lorem ipsum dolor sit amet, consectetur \, > adipiscing elit. Nunc est libero, vehicula \, > nec molestie in, semper aliquam magna."; > > String msl = """ > Lorem ipsum dolor sit amet, consectetur \, > adipiscing elit. Nunc est libero, vehicula \, > nec molestie in, semper aliquam magna."""; > > Works but trading " + for \, . (±1) -1 (Not sure what D buys?) > > E) \> (something other than white space) > Consume all white space up to and including the line terminator. > \< (something other than white space) > Consume all white space back to beginning of line. > > String tsl = "Lorem ipsum dolor sit amet, consectetur \> > \ > \ > String msl = """ > Lorem ipsum dolor sit amet, consectetur \> > \ > \ > A goal of the multi-line JEP was to make the string more readable, less > error prone and maintainable. (-10) Yep. > Note for D) and E), is it an error if a non-white space is encountered > or just stop? (±1) D/K. --
John From daniel.smith at oracle.com Fri May 10 21:04:08 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 10 May 2019 15:04:08 -0600 Subject: Wrapping up the first two courses In-Reply-To: <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> Message-ID: <9B1B3DB5-53FF-41B7-8B42-82837A1B39BE@oracle.com> I generally like where this has landed. I've been an uninvolved observer, and can't possibly process all the discussions on this mailing list over the past few months, so sorry if I've missed some of the core arguments on certain points. But I scanned through the various threads, and wanted to point out a couple of things in this conclusion that strike me as odd/unmotivated. > On Apr 22, 2019, at 7:15 AM, Brian Goetz wrote: > > So, I posit, we have consensus over the following things: > > - Multi-line strings are a useful feature on their own > - Using "fat" delimiters for multi-line strings is practical and intuitive There's an argument that "fat" delimiters are important because lots of use cases contain single quotes. Two thoughts on that: - Okay, but that doesn't mean we have to prohibit "thin" delimiters, right? I have a weak preference for wanting to write multi-line strings using the standard " characters when I can get away with it. Seems more readable to me, especially for multi-line strings that aren't big chunks of marked-up text. - What's the solution for single-line string literals that contain quotes? Fat delimiters are pretty hard to read when they're both on a single line, and I don't think the current story supports that anyway. If the solution is some "turn off escapes" mechanism, wouldn't the same mechanism work for multi-line strings?
> - There exists a reasonable alignment algorithm, which users can learn easily enough, and can be captured as a library method on String (some finer points to be hammered out) Practically, the programming style I would want to use is Jim's example (h): String h = """+--------+ | text | +--------+"""; Occasionally (when the line is wide) I might want to fall back to one of the other styles (like (d)), but (h) would be my go-to. Unfortunately, it seems like we've landed in a place where (h) is disallowed, because it can't be handled by a library method. There have been various discussions about whether multi-line string literals are one-dimensional (open quote + payload + close quote) or two-dimensional (the contents of a rectangle in the editor). I think the two-dimensional model is the right abstraction; that is, drawing a rectangle should be an inherent part of parsing a multi-line string literal. The "implicitly apply a library method to this string" view is based on the one-dimensional model, where after the fact we try to approximate context of the literal and re-interpret the payload. Why tie our hands? (Strawman: "We want a pluggable string processor." Me: "Since when is parsing supposed to be pluggable?") As a pretty-simple definition of the 2D rectangle, I'd be happy with "all columns to the right of the opening delimiter, on all lines until the closing delimiter". Indents in between must use whitespace to align with the opening delimiter; if they don't, that's a parse error. I realize that some people prefer a different style, and that this story is complicated by tab characters and variable-width fonts. So maybe there's another rule (or two) for the 2D rectangle when the first line is blank, based on the placement of the closing delimiter, or based on the leftmost non-whitespace character. But my high-level point is that I'd rather not force the algorithm to be defined on a context-free String.
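A minimal sketch of the "pretty-simple" rectangle definition above, in Java. It assumes the literal's source lines are available as a list, that the delimiter is `"""`, and that every character (a tab included) occupies exactly one column, which sidesteps the tab question rather than answering it. The names `RectangleSketch` and `extract` are hypothetical; this is not the JEP's algorithm and not a JDK API.

```java
import java.util.List;

// Hypothetical illustration only: not the JEP's algorithm and not a JDK API.
final class RectangleSketch {
    private static final String DELIM = "\"\"\"";

    /**
     * Extracts the rectangle of a multi-line literal from its source lines,
     * starting at the line holding the opening delimiter and ending at the
     * line holding the closing delimiter.
     * Assumption: one character is one column, so tabs are not expanded.
     */
    static String extract(List<String> sourceLines) {
        int open = sourceLines.get(0).indexOf(DELIM);
        if (open < 0) {
            throw new IllegalArgumentException("no opening delimiter on the first line");
        }
        // Left margin: the first column to the right of the opening delimiter.
        int margin = open + DELIM.length();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sourceLines.size(); i++) {
            String line = sourceLines.get(i);
            String content;
            if (i == 0) {
                content = line.substring(margin);
            } else {
                out.append('\n');
                if (line.length() < margin) {
                    // A line too short to reach the margin is accepted only if blank.
                    if (!line.isBlank()) {
                        throw new IllegalArgumentException("text left of the margin on line " + i);
                    }
                    content = "";
                } else {
                    // The indent must be pure white space up to the margin ("parse error" otherwise).
                    if (!line.substring(0, margin).isBlank()) {
                        throw new IllegalArgumentException("text left of the margin on line " + i);
                    }
                    content = line.substring(margin);
                }
            }
            if (i == sourceLines.size() - 1) {
                // The closing delimiter ends the rectangle on the last line.
                int close = content.lastIndexOf(DELIM);
                if (close < 0) {
                    throw new IllegalArgumentException("no closing delimiter on the last line");
                }
                content = content.substring(0, close);
            }
            out.append(content);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pad = " ".repeat(14); // aligns with the column after the opening delimiter
        System.out.println(extract(List.of(
                "String h = \"\"\"+--------+",
                pad + "|  text  |",
                pad + "+--------+\"\"\";")));
    }
}
```

Under these assumptions, style (h) is accepted because every continuation line is indented with white space up to the margin, while a continuation line that starts left of the margin is rejected instead of being trimmed on a best-effort basis.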
> - To the extent the language performs alignment, it should be consistent with what the library-based version does, so that users can opt out and opt back in again > - There needs to be an opt-out, for the cases where alignment is not the default the user wants I want to say that, again relying on the 2D program text the parser is working with, the algorithm should be designed so that delimiters can be placed in a way to naturally indicate no trimming should occur. E.g., end delimiter in column 0 (sorry, case (d)). Others have suggested something along those lines. I don't know if you'd call that an "opt out", but the best opt-outs are the ones that don't need special syntax or rules. (That's another reason my preferred style (h) doesn't work for everybody, because it requires at least 3 characters of indentation.) From daniel.smith at oracle.com Fri May 10 23:39:48 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Fri, 10 May 2019 17:39:48 -0600 Subject: Wrapping up the first two courses In-Reply-To: <9B1B3DB5-53FF-41B7-8B42-82837A1B39BE@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <9B1B3DB5-53FF-41B7-8B42-82837A1B39BE@oracle.com> Message-ID: > On May 10, 2019, at 3:04 PM, Dan Smith wrote: > > Practically, the programming style I would want to use is Jim's example (h): > > String h = """+--------+ > | text | > +--------+"""; Thinking about this a bit more, I could also be happy uniformly adopting example (a): String a = """ +--------+ | text | +--------+ """; Or, where needed, (e): String e = """ +--------+ | text | +--------+ """; I think the key to this style for me is to stop thinking about this as a "string literal with newlines" and start thinking about it as a different entity. (Which is a good argument for fat delimiters.) 
> As a pretty-simple definition of the 2D rectangle, I'd be happy with "all columns to the right of the opening delimiter, on all lines until the closing delimiter". Indents in between must use whitespace to align with the opening delimiter; if they don't, that's a parse error. > > I realize that some people prefer a different style, and that this story is complicated by tab characters and variable-width fonts. So maybe there's another rule (or two) for the 2D rectangle when the first line is blank, based on the placement of the closing delimiter, or based on the leftmost non-whitespace character. But my high-level point is that I'd rather not force the algorithm to be defined on a context-free String. Reframing this to support things like (a) and (e), but still take context into account, I really think we could cut down on the degrees of freedom significantly, and just say this: the left margin of the rectangle aligns with the left side of the opening or closing delimiter, whichever is leftmost*; the top of the rectangle is the line after the opening delimiter. All indents must match the leftmost delimiter's prefix (where non-whitespace prefix text is replaced with spaces), and the line after the opening delimiter must be blank. This is a very opinionated rule: the space to the left of the leftmost delimiter is simply off-limits. And any indentation to the right of the leftmost delimiter is preserved. That's just How It's Done. If you need a different left margin, move your delimiters (e.g., add a newline, like (e)). I think programmers would appreciate a simple, strict, easy-to-see rule, rather than a best-effort trimming algorithm. (* I'd almost be willing to say that the opening delimiter always determines the indent, but I'm backing off for tab-lovers who won't like how prefix text gets replaced with spaces; though maybe tab-lovers will want to keep things tidy with a newline before the opening delimiter. 
Anyway, in most cases the opening and closing delimiter will start in the same column.) From james.laskey at oracle.com Mon May 13 14:05:17 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Mon, 13 May 2019 11:05:17 -0300 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> Message-ID: <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> After some significant tweaks, reopening the JEP for review. https://bugs.openjdk.java.net/browse/JDK-8222530 The most significant change is the renaming to Text Blocks (I'm sure it will devolve over time to Text Literals or just Texts). This is primarily to reflect the two-dimensionality of the new literal, whereas String literals are one-dimensional. Comment back here. Cheers, -- Jim > On May 9, 2019, at 12:44 PM, Jim Laskey wrote: > > I've withdrawn the posting for some additional changes. Will keep you posted. > > -- Jim > > >> On May 9, 2019, at 9:06 AM, Jim Laskey wrote: >> >> At this point I think the only outstanding issue is long line continuation. While we can postpone continuation until a later release, I think we should at least lay out the details to see if we need to do anything now. I'll follow up with a (long line continuation) synopsis e-mail in a few. >> >> Meanwhile, please review the JEP and comment back here.
>> >> Cheers, >> >> -- Jim >> >> >> html: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.html >> markdown: http://cr.openjdk.java.net/~jlaskey/Strings/MLS/MultilineStrings.md >> >> > From forax at univ-mlv.fr Mon May 13 14:12:36 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 13 May 2019 16:12:36 +0200 (CEST) Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> Message-ID: <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "amber-spec-experts" > Sent: Sunday, 12 May 2019 21:38:38 > Subject: Call for bikeshed -- break replacement in expression switch > As mentioned in the preview mail, we have one more decision to make: the new > spelling of "break value" in expression switches. We have previously discussed > "break-with value", which everyone seems to like better than "break value", but > I think we can, and should, do better. > (Despite the call-for-bikeshed, this is not to reopen every sub-decision -- the > 2x2 semantics, the use of ->, the name of the construct -- this bikeshed only > has room for one bike.) > There are two primary reasons why we prefer break-with to break. We originally > chose "break value" when we had a more limited palette of options to choose > from (the keyword-resupply ship hadn't yet docked.) The overloading of break > creates uncomfortable interactions. There is the obvious ambiguity between > "break value" and "break label"; there is also the slightly less obvious > interaction where we cannot permit "break value" inside a loop or statement > switch inside an expression switch. While both of these can be "specified > around", they create distortions in the spec, which in turn creates complexity > in the user model; these are a sign that we may be pushing something a bit too > far. Further, historically "break"
has been a straight transfer of control; > this muddies up what "break" means. > Once we alit on the idea of break-* as a keyword, it seemed immediately more > comfortable to make a new break-derived keyword; this allowed us to undo the > distortions that "break value" introduced, and it immediately felt better. But > I think we can do better still. Here's what's making me uncomfortable. > We've actually been here before: lambda expressions were the first time we > allowed an expression to contain statements, and while the streamlined case of > "x -> e" didn't require any control statements, and many lambdas could be > expressed with this form, statement lambdas needed a way to say "stop executing > the body of this lambda, and yield a value." We settled -- somewhat > uncomfortably -- on "return value" for this. > Fast-forward to today, when we're introducing the second expression form that > can contain statements, and we face the same question: how to indicate "I'm > done, I'm completing normally, here's my value." Lambdas provide no help here; > we can't use "return" here. (Well, we could, but that would be terrible, so > we're not going to.) Which means we have to solve the problem again, but > differently. That's already not so great. > Digression: What's so terrible about "return", and why is it OK for lambdas but > not OK for switches? > While we could of course define "return" to mean whatever we want, in > imperative languages with the concept of "methods" or "procedures", including > Java, return has always had a clear meaning: unwind the current call frame, and > yield the designated value to the caller. Lambda expressions are effectively > method bodies (lambdas are literals for functional interfaces, which are single > method interfaces), and so return (barely) fits. But switch expressions are > most definitely not methods, and are not associated with call frames. Asking > users to look at the enclosing context when they see a "return"
in the middle > of a method, to know whether it returns from the method or merely transfers > control within the method, is a lot to ask. (Yes, I know lambdas ask this as > well; this is why this was an uncomfortable choice, and having made this hole, > I'm not anxious to expand it dramatically. If anything I'd prefer to close it, > but that's another bikeshed.) > (end digression) > We could surely take "break-with" and move on; it feels sufficiently "switchy". > But let's look ahead a little bit. We've now confronted the same problem twice: > an expression form that, in a minority use case, needed a way to express "stop > computing this expression, because I'm done, and here's its value." (And, > unfortunately, we have two different syntactic ways to express the same basic > concept.) Let's call these "structured expressions." > We have two structured expression forms, and of the three numbers in computer > science, "two" is not one of them. Which suggests we are going to face this > problem again some day -- whether it be "block expressions", or "if > expressions", or "let expressions", or "try expressions", or whatever. (NB: > this call-for-bikeshed most definitely does not extend to "why not just do > generalized block expressions", so please don't go there. That said, you could > treat this discussion as "if Java had block expressions, what might they look > like?" But we're focusing on the content of the block, not how the block is > framed.) > Let's say for sake of argument that we might someday want to extend ternary > expressions to support the same kind of "restricted block expressions" as > expression switches. (This is just an example for purposes of illustration, > let's not get derailed on "but you should use an 'if' statement for that".) > String s = (foo != null) > ? s > : { > println("null again at line"
>                 + __LINE__);
>             break-with "null";
>         };
>
> Such an expression needs a way to say "I'm done, here's my value", just
> as lambda and switch did before it. Clearly "return" is not the right
> thing here any more than it is for switches. And I don't think
> "break-with" is all that great here either! It's not terrible, but
> outside of a loop or switch, it starts to feel kind of forced. And it
> would be terrible to solve this problem twice with one-time solutions,
> and have no general story, and then have to come up with YET ANOTHER way
> of expressing the same basic concept. So regardless of what we expect for
> future expression forms, let's examine what our options are that are not
> tied to call frames (return) or direct transfer of control (switches and
> loops).
>
> Looking at what other languages have done here, there are a few broad
> directions:
>
>  - A statement like "break-with v", indicating that the enclosing
>    structured expression is completing normally with the provided value.
>  - An operator that serves the same purpose, such as "-> e".
>  - Assigning to some magic variable (this is how Pascal indicates the
>    return value of a function).
>  - Treating the last expression in the block as the result.
>
> I think we can dispatch all but the first relatively easily:
>
>  - We don't use operators for "return", we use a keyword; this would be
>    both a gratuitous departure, as well as too easy to miss.
>  - Switch expressions don't have names, and even if we assigned to
>    "switch", it wouldn't be obvious that we were actually terminating
>    execution of the block.
>  - Everywhere else in the language (such as method bodies), you are free
>    to yield up a value from the middle of the block, perhaps from within
>    a control construct like a loop; restricting the RHS of case blocks to
>    put their result last would be a significant new restriction, and
>    would limit the ability to refactor to/from methods.
>    And further, the convention of putting the result last, while a fine
>    one for a language that is "expressions all the way down", would
>    likely be too subtle a cue in Java.
>
> So, we want a keyword (or contextual keyword). In some hallway
> brainstorming, candidates that emerged include yield, produce, offer,
> offer-up, result, value-break, yield-value, provide, resulting-in,
> break-with, resulting, yielding, put, give, giving, ...
>
> (Also to keep in mind: remember we're dealing with a minority case; most
> of the time, there'll just be an expression on the RHS.)
>
> TL;DR: I think we might come to regret break-* just as we did with
> return, because it won't scale to future demands we place on it, and
> having *three* ways to say basically the same thing in three different
> contexts would be embarrassing. I would like to see if we can do better.
>
> Of the options listed here, I have a favorite: yield. (This is one of the
> terms we've actually been using all along when describing this feature in
> English.)
>
> There is one obvious objection to "yield", which I'd like to preemptively
> address: that in some languages (though not in Java, except for the
> infrequently-used Thread.yield()), it is associated with concurrency
> primitives, such as generators. (This was the objection raised when yield
> was proposed in the context of lambdas.) But these associations are not
> grounded in existing Java constructs (and the progress of Loom suggests
> that constructs like async/await are not coming to Java, and even if we
> wanted language support for generators, there are ample other ways to say
> it.)

I kind of like the simplicity of the keyword yield, but I don't think it's a good idea to use it.
- As you said, yield in other languages has a different meaning, so even if Java doesn't use yield in a generator, it will be confusing for people discovering Java after Python, for example.
- Currently for Loom, the way to yield from a continuation is to use Continuation.yield(scope), with scope being a continuation scope, so it might be confusing if there is a static import, because "yield scope;" and "yield(scope);" have two different meanings.

I kind of like relinquish too, so I will stop there.

Rémi

From brian.goetz at oracle.com Mon May 13 14:20:13 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 13 May 2019 10:20:13 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr>
Message-ID:

> I kind of like the simplicity of the keyword yield, but I don't think it's a good idea to use it.
> - As you said, yield in other languages has a different meaning, so even if Java doesn't use yield in a generator, it will be confusing for people discovering Java after Python, for example.

Everything is a tradeoff. There are two dimensions here to consider:

 - What percentage of the user base has a polluted perspective?
 - How badly are they polluted, and how hard is it to get over?

My suspicion is that the first number is actually pretty small, and for most of them, they can get over it. And also: the percentage of people _on this list_ that are polluted is probably dramatically higher than for the ambient Java developer population (those that take an active interest in language evolution are probably familiar with more languages).

So, do we want to pick something that is clear for most people, but polluted for a minority, or something that is crappy for everyone, but unpolluted? It depends, of course, but my main point is that I think the "pollution" angle is overblown, and we shouldn't over-rotate to it.
> - Currently for Loom, the way to yield from a continuation is to use Continuation.yield(scope), with scope being a continuation scope, so it might be confusing if there is a static import, because "yield scope;" and "yield(scope);" have two different meanings.

Yes, but of course these can be changed, and if we went with yield in the language, we would of course update Loom APIs accordingly.

From dl at cs.oswego.edu Mon May 13 14:28:52 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 13 May 2019 10:28:52 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID: <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>

Having lost (nearly) this argument years ago, I'm not sure why I bother, but ...

On 5/12/19 3:38 PM, Brian Goetz wrote:
>
> Looking at what other languages have done here, there are a few broad
> directions:
>
>  - A statement like "break-with v", indicating that the enclosing
> structured expression is completing normally with the provided value.
>  - An operator that serves the same purpose, such as "-> e".
>  - Assigning to some magic variable (this is how Pascal indicates the
> return value of a function).
>  - Treating the last expression in the block as the result.

(The last one being "progn", the earliest and arguably still best of these.)

>
> I think we can dispatch all but the first relatively easily: ...
>
>
>  - Everywhere else in the language (such as method bodies), you are
> free to yield up a value from the middle of the block, perhaps from
> within a control construct like a loop; restricting the RHS of case
> blocks to put their result last would be a significant new
> restriction, and would limit the ability to refactor to/from methods.
> And further, the convention of putting the result last, while a fine
> one for a language that is "expressions all the way down", would
> likely be too subtle a cue in Java.

Last time around, the last point about subtlety and odd-lookingness of progn seemed to bother people the most. It is possible to make it less subtle by additionally requiring some symbol. Prefix "^" is still available. Allowing for example:

    String s = (foo != null)
        ? s
        : { println("null again at line" + __LINE__); ^ "null"; };

Which still lgtm....

-Doug

From guy.steele at oracle.com Mon May 13 19:08:48 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 13 May 2019 15:08:48 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>
Message-ID:

> On May 13, 2019, at 10:28 AM, Doug Lea
> wrote:
>
>
> Having lost (nearly) this argument years ago, I'm not sure why I bother, but ...
>
> On 5/12/19 3:38 PM, Brian Goetz wrote:
>>
>> Looking at what other languages have done here, there are a few broad directions:
>>
>>  - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
>>  - An operator that serves the same purpose, such as "-> e".
>>  - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
>>  - Treating the last expression in the block as the result.
>
> (The last one being "progn", the earliest and arguably still best of these.)
>
>>
>> I think we can dispatch all but the first relatively easily: ...
>>
>>
>>  - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>
> Last time around, the last point about subtlety and odd-lookingness of progn seemed to bother people the most. It is possible to make it less subtle by additionally requiring some symbol. Prefix "^" is still available. Allowing for example:
>
>
>     String s = (foo != null)
>         ? s
>         : { println("null again at line" + __LINE__); ^ "null"; };
>
> Which still lgtm....

Could be worse, but looks to me like Java with a Smalltalk accent, just as

    { foo(); bar }

is Java with a Lisp (or ECL) accent. I would prefer to adapt a bit of syntax from ECL: the statement

    b => e;

evaluates b as a boolean expression, and if it is true, then e is evaluated and its value becomes the value of the block.
This gives you a syntax very similar to that of Lisp COND:

    { x > y => 1; x < y => -1; true => 0; }

If you then want to further abbreviate "true =>", well, that's another story, but I wouldn't blame you.

-Guy

From guy.steele at oracle.com Mon May 13 19:13:09 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 13 May 2019 15:13:09 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <229271493.298931.1557756756018.JavaMail.zimbra@u-pem.fr>
Message-ID: <2131355F-93D9-4346-A973-F513DD33AC29@oracle.com>

> On May 13, 2019, at 10:20 AM, Brian Goetz wrote:
>
>> I kind of like the simplicity of the keyword yield, but I don't think it's a good idea to use it.
>> - As you said, yield in other languages has a different meaning, so even if Java doesn't use yield in a generator, it will be confusing for people discovering Java after Python, for example.
>
> Everything is a tradeoff. There are two dimensions here to consider:
>  - What percentage of the user base has a polluted perspective?
>  - How badly are they polluted, and how hard is it to get over?
>
> My suspicion is that the first number is actually pretty small, and for most of them, they can get over it. And also: the percentage of people _on this list_ that are polluted is probably dramatically higher than for the ambient Java developer population (those that take an active interest in language evolution are probably familiar with more languages).

It's true; I have been polluted for "yield" for a long, long time. I think I would still prefer "produce".

> So, do we want to pick something that is clear for most people, but polluted for a minority, or something that is crappy for everyone, but unpolluted? It depends, of course, but my main point is that I think the "pollution" angle is overblown, and we shouldn't over-rotate to it.
>
>> - Currently for Loom, the way to yield from a continuation is to use Continuation.yield(scope), with scope being a continuation scope, so it might be confusing if there is a static import, because "yield scope;" and "yield(scope);" have two different meanings.
>
> Yes, but of course these can be changed, and if we went with yield in the language, we would of course update Loom APIs accordingly.
>
>
>

From john.r.rose at oracle.com Mon May 13 19:48:39 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 May 2019 12:48:39 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu>
Message-ID: <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com>

On May 13, 2019, at 12:08 PM, Guy Steele wrote:
>
>> On May 13, 2019, at 10:28 AM, Doug Lea
>> wrote:
>>
>>
>> Having lost (nearly) this argument years ago, I'm not sure why I bother, but ...
>>
>> On 5/12/19 3:38 PM, Brian Goetz wrote:
>>>
>>> Looking at what other languages have done here, there are a few broad directions:
>>>
>>>  - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
>>>  - An operator that serves the same purpose, such as "-> e".
>>>  - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
>>>  - Treating the last expression in the block as the result.
>>
>> (The last one being "progn", the earliest and arguably still best of these.)
>>
>>>
>>> I think we can dispatch all but the first relatively easily: ...
>>>
>>>
>>>  - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>>
>> Last time around, the last point about subtlety and odd-lookingness of progn seemed to bother people the most. It is possible to make it less subtle by additionally requiring some symbol. Prefix "^" is still available. Allowing for example:
>>
>>
>>     String s = (foo != null)
>>         ? s
>>         : { println("null again at line" + __LINE__); ^ "null"; };
>>
>> Which still lgtm....
>
> Could be worse, but looks to me like Java with a Smalltalk accent, just as
>
>     { foo(); bar }
>
> is Java with a Lisp (or ECL) accent. I would prefer to adapt a bit of syntax from ECL: the statement
>
>     b => e;
>
> evaluates b as a boolean expression, and if it is true, then e is evaluated and its value becomes the value of the block.
> This gives you a syntax very similar to that of Lisp COND:
>
>     { x > y => 1; x < y => -1; true => 0; }
>
> If you then want to further abbreviate "true =>", well, that's another story, but I wouldn't blame you.

OK, I can't resist putting some spray paint here.

If we are contemplating operator-like syntaxes (instead of the keyword-like ones that seem most reasonable, and which Brian is guiding us towards), then let's note that the operator-like syntax that *Java already has* for producing a value from a structured expression is "->".

So perhaps the Java-native idiom for ECL's "true=>" is just "->". Or (more likely for me) it is a break-like keyword *with an arrow*.

So under that observation:

    switch (x) { case Y -> z; }

is short for something like:

    switch (x) { case Y -> { ... break -> z; } }

and (what's more) the "..." could contain side effects and let-bindings.

The rule for developers is that if you needed to put a {...} block after your arrow ->, then you can still use an arrow to return a value, but it must be an extra arrow, marked with a keyword (or syntax context) that means "here is the rest of the arrow you wanted to write a moment ago".

This could work inside of lambdas also:

    f( (x,y) -> z )

is short for something like:

    f( (x,y) -> { ... return -> z; } )

(Why do such a thing? To give users the option of a uniform style which answers every "->{" with a finishing arrow; they *can* use unadorned "return" but their colleagues might frown on the faux pas.)

One reason I'm pushing on the "interrupted arrow" idea here is a fundamental design prejudice I have. I very much like the Lisp syntax (block foo ... (return-from foo x) ...). Although "return" is damaged goods for us, what I'd like to salvage from this example is the *very clear correspondence* between the "starter syntax" of the structured expression ("block foo") and the "stopper syntax" in the middle ("return-from foo"). The shared tag "foo" makes it very easy for the eye to match up the stopper with the starter.
You don't have to consult a complex matrix of "what matches with what". ("I shot an arrow into the air, and where it landed only the author of the break permeability matrix knows where.")

OK, one more spritz of spray paint and I'm done for now. If we like the idea of an "interrupted arrow", then we could think about going the whole way with it. If the "stopper" is the sharp end of the arrow (anchored to a keyword like return or break) then the "starter" of the structured expression could be the dull end of the arrow (without an arrowhead). Like this:

    switch (x) { case Y -{ ... break -> z; } }

Here, the rule is if you intend to use an arrow to return a value, you put half of the arrow where the return will go to, and the other half when you have a value.

(Note that the syntax "break LABEL" could be added easily, later on, if there were any value for that, which probably there isn't.)

This conflicts with a bit of precedent with lambdas, where we might expect to break the arrows the new way:

    f( (x,y) -{ ... return -> z; } )

If we don't want broken arrows then set up a duel between the starter and stopper with opposing arrows:

    switch (x) { case Y -> { ... break <- z; } }

Or let the author propose a target at the starter:

    switch (x) { case Y @< { ... break -> z; } }

Surely that would be a spritz too far.

- John

From john.r.rose at oracle.com Mon May 13 20:40:07 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 May 2019 13:40:07 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID: <1FCD1BB2-8B81-4D97-A29D-FE44DE6E1764@oracle.com>

On May 12, 2019, at 12:38 PM, Brian Goetz wrote:
>
> Digression: What's so terrible about "return", and why is it OK for lambdas but not OK for switches?

Although I like Lisp (block foo ...
(return-from foo x) ...), I buy your argument that "unwind the current call frame" is different enough from "transfer control within the current call frame" to merit a syntactic difference.

There's a very subtle difference in Java (but not in Lisp) between call frames and blocks which you didn't mention directly but which tips the balance for me: A Java call frame has side-effectable locals (because Java is an imperative language, as you said). Thus, a block which exits to the current call frame can also push side effects to that call frame, while a block that unwinds to a different call frame cannot push side effects to enclosing variable bindings, because they will be in a different call frame.

That's a difference that is usually ignored when reasoning about Java programs, thanks to the implicitly final rule. Using distinct syntaxes for same-frame block exits and different-frame unwinds adds a little extra help to programmers to keep track of the difference. It doesn't matter that programmers usually don't care about the difference; having a syntax difference might help them avoid surprise errors, and allow them to keep the semantic differences at a non-distracting subliminal level.

Thus, if I have to say "return" I know I can't return an extra value by side effects, since all my up-level variables will be final (implicitly or not). And if I say "yield" I know I can return an extra value, if I need to, by punching it into some visible local.

From kevinb at google.com Mon May 13 21:55:34 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 13 May 2019 14:55:34 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID:

Moving away from "break": I'm interested....
So in colon-form switch (whether statement or expression) you are responsible for your own control flow, and in arrow-form switch (whether statement or expression) you are not. "break" is synonymous in users' minds with that control flow they don't want to have to do. So in theory it's arrow-form that should make the concept of "breaking" obsolete.

Unfortunately, that doesn't seem like the distinction we'll be making; do I have the following right?

1. A colon-form case in a switch statement stays absolutely the same as always - keep `break`ing
2. An arrow-form case in a switch statement usually doesn't need to `break`... but can, just as an early-out from a block, right?
3. A colon-form case in a switch expression cannot `break` at all; it either yields, throws, or falls through
4. An arrow-form case in a switch expression: cannot `break` or fall through; must be a single expression, or it must always `__yield` or throw

So using break or not isn't about whether you are doing your own control flow or not. So it's not a nice conceptual clean break that way, but in practice we think most switches will be all #1 or all #4, do we not?

(Side note: I had to think about these as four different kinds of switches, as I think users will much of the time; it would be very optimistic to think they will see it the way language designers do: two orthogonal features that they can simply compose together or use apart. Actually they won't see four kinds; they will think there are two kinds and then be very surprised when they come across a hybrid like 2 or 3.)

Anyway, I don't dislike yield even though I know it has other connotations. I think it communicates "I am done and I give forth this value", and what happens from there can be context-dependent and that seems fine....

From: Brian Goetz
Date: Mon, May 13, 2019 at 6:33 AM
To: amber-spec-experts

> As mentioned in the preview mail, we have one more decision to make: the
> new spelling of "break value" in expression switches.
> We have previously discussed "break-with value", which everyone seems to
> like better than "break value", but I think we can, and should, do
> better.
>
> (Despite the call-for-bikeshed, this is not to reopen every sub-decision
> -- the 2x2 semantics, the use of ->, the name of the construct -- this
> bikeshed only has room for one bike.)
>
> There are two primary reasons why we prefer break-with to break. We
> originally chose "break value" when we had a more limited palette of
> options to choose from (the keyword-resupply ship hadn't yet docked.) The
> overloading of break creates uncomfortable interactions. There is the
> obvious ambiguity between "break value" and "break label"; there is also
> the slightly less obvious interaction where we cannot permit "break value"
> inside a loop or statement switch inside an expression switch. While both
> of these can be "specified around", they create distortions in the spec,
> which in turn creates complexity in the user model; these are a sign that
> we may be pushing something a bit too far. Further, historically "break"
> has been a straight transfer of control; this muddies up what "break"
> means.
>
> Once we alit on the idea of break-* as a keyword, it seemed immediately
> more comfortable to make a new break-derived keyword; this allowed us to
> undo the distortions that "break value" introduced, and it immediately
> felt better. But I think we can do better still. Here's what's making me
> uncomfortable.
>
> We've actually been here before: lambda expressions were the first time
> we allowed an expression to contain statements, and while the streamlined
> case of "x -> e" didn't require any control statements, and many lambdas
> could be expressed with this form, statement lambdas needed a way to say
> "stop executing the body of this lambda, and yield a value." We settled,
> somewhat uncomfortably, on "return value" for this.
>
> Fast-forward to today, when we're introducing the second expression form
> that can contain statements, and we face the same question: how to
> indicate "I'm done, I'm completing normally, here's my value." Lambdas
> provide no help here; we can't use "return" here. (Well, we could, but
> that would be terrible, so we're not going to.) Which means we have to
> solve the problem again, but differently. That's already not so great.
>
> Digression: What's so terrible about "return", and why is it OK for
> lambdas but not OK for switches?
>
> While we could of course define "return" to mean whatever we want, in
> imperative languages with the concept of "methods" or "procedures",
> including Java, return has always had a clear meaning: unwind the current
> call frame, and yield the designated value to the caller. Lambda
> expressions are effectively method bodies (lambdas are literals for
> functional interfaces, which are single-method interfaces), and so return
> (barely) fits. But switch expressions are most definitely not methods,
> and are not associated with call frames. Asking users to look at the
> enclosing context when they see a "return" in the middle of a method, to
> know whether it returns from the method or merely transfers control
> within the method, is a lot to ask. (Yes, I know lambdas ask this as
> well; this is why this was an uncomfortable choice, and having made this
> hole, I'm not anxious to expand it dramatically. If anything I'd prefer
> to close it, but that's another bikeshed.)
>
> (end digression)
>
>
> We could surely take "break-with" and move on; it feels sufficiently
> "switchy". But let's look ahead a little bit. We've now confronted the
> same problem twice: an expression form that, in a minority use case,
> needed a way to express "stop computing this expression, because I'm
> done, and here's its value." (And, unfortunately, we have two different
> syntactic ways to express the same basic concept.)
> Let's call these "structured expressions."
>
> We have two structured expression forms, and of the three numbers in
> computer science, "two" is not one of them. Which suggests we are going
> to face this problem again some day, whether it be "block expressions",
> or "if expressions", or "let expressions", or "try expressions", or
> whatever. (NB: this call-for-bikeshed most definitely does not extend to
> "why not just do generalized block expressions", so please don't go
> there. That said, you could treat this discussion as "if Java had block
> expressions, what might they look like?" But we're focusing on the
> content of the block, not how the block is framed.)
>
> Let's say for sake of argument that we might someday want to extend
> ternary expressions to support the same kind of "restricted block
> expressions" as expression switches. (This is just an example for
> purposes of illustration, let's not get derailed on "but you should use
> an 'if' statement for that".)
>
>     String s = (foo != null)
>         ? s
>         : {
>             println("null again at line" + __LINE__);
>             break-with "null";
>         };
>
> Such an expression needs a way to say "I'm done, here's my value", just
> as lambda and switch did before it. Clearly "return" is not the right
> thing here any more than it is for switches. And I don't think
> "break-with" is all that great here either! It's not terrible, but
> outside of a loop or switch, it starts to feel kind of forced. And it
> would be terrible to solve this problem twice with one-time solutions,
> and have no general story, and then have to come up with YET ANOTHER way
> of expressing the same basic concept. So regardless of what we expect for
> future expression forms, let's examine what our options are that are not
> tied to call frames (return) or direct transfer of control (switches and
> loops).
>
> Looking at what other languages have done here, there are a few broad
> directions:
>
>  - A statement like "break-with v", indicating that the enclosing
>    structured expression is completing normally with the provided value.
>  - An operator that serves the same purpose, such as "-> e".
>  - Assigning to some magic variable (this is how Pascal indicates the
>    return value of a function).
>  - Treating the last expression in the block as the result.
>
> I think we can dispatch all but the first relatively easily:
>
>  - We don't use operators for "return", we use a keyword; this would be
>    both a gratuitous departure, as well as too easy to miss.
>  - Switch expressions don't have names, and even if we assigned to
>    "switch", it wouldn't be obvious that we were actually terminating
>    execution of the block.
>  - Everywhere else in the language (such as method bodies), you are free
>    to yield up a value from the middle of the block, perhaps from within
>    a control construct like a loop; restricting the RHS of case blocks to
>    put their result last would be a significant new restriction, and
>    would limit the ability to refactor to/from methods. And further, the
>    convention of putting the result last, while a fine one for a language
>    that is "expressions all the way down", would likely be too subtle a
>    cue in Java.
>
> So, we want a keyword (or contextual keyword). In some hallway
> brainstorming, candidates that emerged include yield, produce, offer,
> offer-up, result, value-break, yield-value, provide, resulting-in,
> break-with, resulting, yielding, put, give, giving, ...
>
> (Also to keep in mind: remember we're dealing with a minority case; most
> of the time, there'll just be an expression on the RHS.)
>
> TL;DR: I think we might come to regret break-* just as we did with
> return, because it won't scale to future demands we place on it, and
> having *three* ways to say basically the same thing in three different
> contexts would be embarrassing.
> I would like to see if we can do better.
>
>
> Of the options listed here, I have a favorite: yield. (This is one of the
> terms we've actually been using all along when describing this feature in
> English.)
>
> There is one obvious objection to "yield", which I'd like to preemptively
> address: that in some languages (though not in Java, except for the
> infrequently-used Thread.yield()), it is associated with concurrency
> primitives, such as generators. (This was the objection raised when yield
> was proposed in the context of lambdas.) But these associations are not
> grounded in existing Java constructs (and the progress of Loom suggests
> that constructs like async/await are not coming to Java, and even if we
> wanted language support for generators, there are ample other ways to say
> it.)
>
> Dictionary.com lists the following meanings for yield:
>
> verb (used with object)
>  - to give forth or produce by a natural process or in return for
>    cultivation:
>  - to produce or furnish (payment, profit, or interest):
>  - to give up, as to superior power or authority:
>  - to give up or over; relinquish or resign:
>  - to give as due or required:
>  - to cause; give rise to:
>
> verb (used without object)
>  - to give a return, as for labor expended; produce; bear.
>  - to surrender or submit, as to superior power:
>  - to give way to influence, entreaty, argument, or the like:
>  - to give place or precedence (usually followed by to):
>  - to give way to force, pressure, etc., so as to move, bend, collapse,
>    or the like:
>
> These are mostly consistent with the use of "yield" as proposed here.
> > One more thing to bear in mind: there is an ordering to abrupt completion > mechanisms, as to how far away they can transfer control: > > - yield: can unwind only the innermost yieldable expression > - break/continue: can unwind multiple control constructs (for, while, > switch), but stays within the method > - return: unwinds exactly one method > - throw: unwinds one or more methods > - System.exit: unwinds the whole VM > > > Bikeshed is open (but remember the bounds of this bikeshed are limited; > we?re talking purely about the syntax of a ?stop executing this block and > yield a value to the enclosing context? ? and time is ticking.) > > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon May 13 22:48:09 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 13 May 2019 18:48:09 -0400 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> Message-ID: > So in colon-form switch (whether statement or expression) you are responsible for your own control flow, and in arrow-form switch (whether statement or expression) you are not. "break" is synonymous in users' minds with that control flow they don't want to have to do. So in theory it's arrow-form that should make the concept of "breaking" obsolete. Unfortunately, that doesn't seem like the distinction we'll making; do I have the following right? In colon-form, you are always responsible for your control flow. In arrow-form, you are generally not, except that if you have a block on the RHS of the arrow, you are responsible for control flow _out of the block_. In: y = switch (x) { case 1 -> { foo(); yield 3; } }; there is a pleasant ambiguity as to whether the ?yield 3? is yielding a value to the _block_, in which case the switch just completes normally, or whether it is yielding the value to the _switch case_. 
And it doesn't really matter, so whichever intuition users are attracted to is fine.

> 1. A colon-form case in a switch statement stays absolutely the same as always - keep `break`ing
> 2. An arrow-form case in a switch statement usually doesn't need to `break`... but can, just as an early-out from a block, right?
> 3. A colon-form case in a switch expression cannot `break` at all; it either yields, throws, or falls through
> 4. An arrow-form case in a switch expression: cannot `break` or fall through; must be a single expression, or it must always `__yield` or throw

Right. There are several rules interacting here:

 - An expression must either yield a value or throw; control statements like break, continue, or return are not allowed in a "structured expression."
 - You break out of a switch statement; you yield values from a switch expression
 - In arrow form, neither break/yield is needed if the RHS is not a block
 - In arrow form, break/yield/throw *is* needed if the RHS is a block

> So using break or not isn't about whether you are doing your own control flow or not. So it's not a nice conceptual clean break that way, but in practice we think most switches will be all #1 or all #4, do we not?

I would expect 1/4 to be the most common, followed by 2, with 3 bringing up the rear.

> Anyway, I don't dislike yield even though I know it has other connotations. I think it communicates "I am done and I give forth this value", and what happens from there can be context-dependent and that seems fine....
>

Yep. And that context dependency is:

 - Yield yields to the immediately enclosing structured expression; if there is none, it is an error
 - Unlabeled break/continue breaks to the immediately enclosing "breaky" statement; if there is none, it's an error. It cannot "break through" a structured expression.
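
To make the four cases concrete, here is a minimal sketch of the 2x2 (colon/arrow, statement/expression). It assumes the `yield` spelling, which is the form that later shipped with switch expressions (standard since Java 14), so it needs a Java 14 or newer compiler. The class name `SwitchForms` and its method names are hypothetical, chosen only for illustration.

```java
// Illustrative only: SwitchForms and its methods are made-up names.
class SwitchForms {

    // 1. Colon-form statement: you manage control flow yourself with break.
    static String colonStatement(int x) {
        String s = "other";
        switch (x) {
            case 1:
                s = "one";
                break;          // without this, control falls through to case 2
            case 2:
                s = "two";
                break;
        }
        return s;
    }

    // 2. Arrow-form statement: no fallthrough, so break is usually unnecessary,
    //    but it may still be used as an early exit from a block.
    static String arrowStatement(int x) {
        StringBuilder sb = new StringBuilder();
        switch (x) {
            case 1 -> sb.append("one");
            case 2 -> {
                sb.append("two");
                if (sb.length() > 0) break;   // early-out from the block
                sb.append(" (skipped at run time)");
            }
            default -> sb.append("other");
        }
        return sb.toString();
    }

    // 3. Colon-form expression: break is not allowed; each path yields,
    //    throws, or falls through to a group that does.
    static int colonExpression(int x) {
        return switch (x) {
            case 1:             // falls through to case 2
            case 2:
                yield 10;
            default:
                yield 0;
        };
    }

    // 4. Arrow-form expression: a bare expression needs no keyword;
    //    a block on the right-hand side must yield or throw.
    static int arrowExpression(int x) {
        return switch (x) {
            case 1 -> 3;
            case 2 -> {
                int doubled = x * 2;
                yield doubled;
            }
            default -> throw new IllegalArgumentException("unexpected: " + x);
        };
    }
}
```

Calling `SwitchForms.arrowExpression(2)` exercises the block case: the `yield` completes only the enclosing switch expression, and the method's own `return` then hands the value to the caller.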
From daniel.smith at oracle.com Tue May 14 23:15:55 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 14 May 2019 17:15:55 -0600
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To: <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com>
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com>
Message-ID: <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com>

> On May 13, 2019, at 8:05 AM, Jim Laskey wrote:
>
> After some significant tweaks, reopening the JEP for review. https://bugs.openjdk.java.net/browse/JDK-8222530

Something really clicks for me in calling these "text blocks". The delimiter syntax and conventions for line breaks/whitespace, which seemed somewhat arbitrary before, feel right. Nice psychological trick.

Let me weigh in with some design feedback, in a refined form of some comments I made in a previous thread:

Finding the right indentation trimming algorithm has been a struggle. We've come up with something, but it sure seems complex, and I'll bet most programmers will never fully internalize it.

The struggle arises primarily because the feature has an ambitious goal of getting it "right" for a wide variety of indentation conventions, and also because the feature is constrained to be a post-processing step, independent of program context.

I suggest rethinking both of those requirements. Instead, the language should be strongly opinionated about how text blocks should be indented, and should take the enclosing context into account.

Specifically, the opening """ delimiter should mark the left margin of the text block, and it should be a compiler error to put content to the left of that margin. This results in a really simple, readable approach to indenting: the delimiter marks the rectangle.

Detailed rules:

- The *prefix* of a text block is the program text after the immediately preceding \n or \r, up to the opening """, with every non-whitespace character replaced with a space (\u0020).

- The form of a text block is """ * ( * )+ """ (that is, opening delimiter, ignored whitespace, then one or more lines of content, each prefixed by a newline and the *prefix*; all prefixes must be identical).

- The string denoted by a text block is its * strings after escape processing, concatenated together with '\n'.

Most of the examples in the JEP follow these rules as a convention already. The concatenation examples would benefit from following it.

Discussion:

What if I want to shift my content left? Just put a line break before the opening delimiter, and align it wherever you want to set your left margin. (If you don't want to strip anything, put the opening delimiter in column 0.) Your n-line text block now takes n+1 lines -- nbd.

What if I want to shift my content right, beyond the delimiters? Don't do that. That's not how text blocks work. (I mean, you can do it, but your extra whitespace will be included in the denoted string.)

What about tabs? Tabs that come before the opening delimiter are recognized, and all prefixes must use the same pattern of tabs/spaces/[other exotic whitespace].

What if you want to have program text on the same line as the opening delimiter, but then want to use tabs underneath?:

\t \t System.out.println("""
\t \t \t \t \t \t \t Hello world!
\t \t \t \t \t \t \t """);

Well, then you're doing tabs wrong -- different tab widths will make "Hello world!" appear to the left or right of the delimiters. So this is an error. Either use spaces after the first two tabs, or put the opening delimiter on a new line.

What about variable-width fonts? If you expect your code to be read in a variable-width font, by convention you should start all text blocks on a (possibly-indented) blank line.

What about Unicode escapes? It's an orthogonal question, but I think it's fine to continue pre-processing all Unicode escapes. If you want to obfuscate prefixes and line breaks using \u0020 and \u000a, go for it.

From brian.goetz at oracle.com Tue May 14 23:25:17 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 14 May 2019 19:25:17 -0400
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To: <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com>
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com>
Message-ID: <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com>

> Most of the examples in the JEP follow these rules as a convention already. The concatenation examples would benefit from following it.

Sorry, not seeing it -- how would the concatenation examples benefit? Example?

From gavin.bierman at oracle.com Wed May 15 13:47:23 2019
From: gavin.bierman at oracle.com (Gavin Bierman)
Date: Wed, 15 May 2019 15:47:23 +0200
Subject: Draft language spec for JEP354: Switch Expressions
Message-ID:

Dear experts:

A draft language spec for JEP 354: Switch Expressions can be found here:

http://cr.openjdk.java.net/~gbierman/jep354-jls-201905.html

[Note: This spec uses the break-with statement. There is a discussion elsewhere on alternatives for a different syntax. The spec will be updated as soon as this discussion has been finalised.]

Comments welcome!
Gavin

From daniel.smith at oracle.com Wed May 15 17:17:31 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 15 May 2019 11:17:31 -0600
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To: <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com>
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com>
Message-ID: <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com>

> On May 14, 2019, at 5:25 PM, Brian Goetz wrote:
>
>> Most of the examples in the JEP follow these rules as a convention already. The concatenation examples would benefit from following it.
>
> Sorry, not seeing it -- how would the concatenation examples benefit? Example?
>

Sure, let me elaborate.

I think this:

~~~
String code = """
              public void print(""" + type + """
                                                 o) {
                  System.out.println(Objects.toString(o));
              }
              """;
~~~

should be presented like this:

~~~
String code = """
              public void print(""" +
              type +
              """
               o) {
                  System.out.println(Objects.toString(o));
              }
              """;
~~~

It's not great, and replace/format is the "right" solution, but if somebody wants to do concatenation, this style does a better job of indicating where the indent prefix ends and the content begins. The delimiter gives a visual indication of where the "block" is located.

Further illustrations:

Things like this are following the convention I'm proposing we enforce:

~~~
String html = """

Hello, world

"""; ~~~ As is this: ~~~ """ line 1 line 2 line 3""" ~~~ This one doesn't, but it's a simple matter of putting some spaces before the closing delimiter to fix it: ~~~ String empty = """ """; ~~~ This concatenation example follows the convention (although note that there's no newline between '{' and 'System'): ~~~ String code = "public void print(Object o) {" + """ System.out.println(Objects.toString(o)); } """; ~~~ From john.r.rose at oracle.com Wed May 15 17:35:08 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 15 May 2019 10:35:08 -0700 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> Message-ID: 1/4. FTR, an escape <\ LT> could clean that up a bit more, if the goal is to get the interpolation cruft on a separate line: ~~~ String code = """ public void print(\ """ + type + """ o) { System.out.println(Objects.toString(o)); } """; ~~~ 2/4. Dan, I'm having trouble seeing your idea of "prefix" in this example. Is it that `String code = ` has the same number of chars as there are spaces before `public` (start of the first payload line)? This is hard to read, I'm afraid. 3/4. Dan, isn't it true that programmers can use this idiom under the existing proposal, without appealing to your "prefix" rule? All they do is (a) keep the close-quotes (in a single ""+x+"" expression) aligned, and also (b) don't exdent before the close quotes. 4/4. I guess you are proposing two adjustments, the "prefix" rule and the "no exdent rule". The "prefix" rule allows open-quote to set indentation, by counting arbitrary characters before the open-quote as setting a target column. 
The "no exdent rule" disallows payload chars in columns before the target column, as set by the close-quote.

If I'm reading that right, I'm much happier with the "no exdent rule" than the "prefix" rule.

-- John

P.S. In one example you say something about a missing newline before a close-quote. Those can always be introduced explicitly by <\ n>. One reason I like <\ LT> is that it pairs very well with <\ n>: You can put in <\ LT> to control a line break, and then if you really want a payload LT also, you add <\ n> either before or after the <\ LT>.

On May 15, 2019, at 10:17 AM, Dan Smith wrote:
>
> ~~~
> String code = """
>               public void print(""" +
>               type +
>               """
>                o) {
>                   System.out.println(Objects.toString(o));
>               }
>               """;
> ~~~
>

From daniel.smith at oracle.com Wed May 15 18:01:25 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 15 May 2019 12:01:25 -0600
Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft]
In-Reply-To:
References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com>
Message-ID: <391EBA8C-36F6-4103-B153-8E38A5A8C9F0@oracle.com>

> On May 15, 2019, at 11:35 AM, John Rose wrote:
>
> 2/4. Dan, I'm having trouble seeing your
> idea of "prefix" in this example. Is it that
> `String code = ` has the same number of
> chars as there are spaces before `public`
> (start of the first payload line)? This is hard
> to read, I'm afraid.

Yes. "Same number of characters" is the idea (with extra constraints to handle tabs and other exotic whitespace, but most people won't care about those).

Is it hard to read because of a variable-width font? In a normal editing environment, I'm just saying the opening delimiter should be visually aligned with the content.

> 3/4. Dan, isn't it true that programmers can
> use this idiom under the existing proposal,
> without appealing to your "prefix" rule?
> All they do is (a) keep the close-quotes
> (in a single ""+x+"" expression) aligned,
> and also (b) don't exdent before the close
> quotes.

Sure. I'm claiming that it would be helpful to put some additional constraints on what constitutes a valid text block, in order to ensure some harder-to-read cases never come up.

> 4/4. I guess you are proposing two adjustments, the
> "prefix" rule and the "no exdent rule". The "prefix"
> rule allows open-quote to set indentation, by counting
> arbitrary characters before the open-quote as setting
> a target column. The "no exdent rule" disallows payload
> chars in columns before the target column, as set by
> the close-quote.

You could say that. Your "no exdent" rule prevents any lines but the (necessarily blank) closing-delimiter line from setting the target column. Your "prefix" rule transfers this responsibility to the opening-delimiter line.

I think using the opening delimiter is helpful because 1) readers see the opening delimiter first, and 2) it frees the closing delimiter to be a marker for trailing whitespace/newlines.
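
For comparison, here is a minimal sketch of how the closing delimiter behaves in text blocks as they eventually shipped (final in Java 15). This is the shipped behavior, not the opening-delimiter "prefix" rule proposed above: the closing delimiter's line takes part in setting the left margin, and it also decides whether there is a trailing newline. The class name `TextBlockMargins` and its method names are hypothetical.

```java
// Illustrative only: TextBlockMargins and its methods are made-up names.
// Requires Java 15 or newer.
class TextBlockMargins {

    // Closing delimiter aligned with the content: nothing is left of the
    // margin, and the block ends with a newline.
    static String flush() {
        return """
               Hello
               world
               """;
    }

    // Closing delimiter four columns to the left of the content: it sets the
    // margin, so each content line keeps four leading spaces.
    static String indented() {
        return """
                   Hello
                   world
               """;
    }

    // Closing delimiter on the last content line: no trailing newline.
    static String noTrailingNewline() {
        return """
               Hello
               world""";
    }
}
```

Comparing `flush()` with `indented()` shows the closing delimiter acting as the margin marker, which is the job the proposal above would hand to the opening delimiter instead.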
From alex.buckley at oracle.com Wed May 15 18:11:44 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Wed, 15 May 2019 11:11:44 -0700 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> Message-ID: <5CDC5660.9010802@oracle.com> On 5/15/2019 10:17 AM, Dan Smith wrote: > I think this: > > ~~~ > String code = """ > public void print(""" + type + """ > o) { > System.out.println(Objects.toString(o)); > } > """; > ~~~ > > should be presented like this: > > ~~~ > String code = """ > public void print(""" + > type + > """ > o) { > System.out.println(Objects.toString(o)); > } > """; > ~~~ > > It's not great, and replace/format is the "right" solution, but if > somebody wants to do concatenation, this style does a better job of > indicating where the indent prefix ends and the content begins. The > delimiter gives a visual indication of where the "block" is located. I appreciate that you want to position an opening delimiter to the left of its content, but can you say why you want `type +` on its own line? What's the big deal with `...""" + type +\n` and then the next text block? (You don't seem to object to the closing delimiter sharing a line with content, since you have ` + ` after the first closing delimiter.) 
Alex From daniel.smith at oracle.com Wed May 15 18:25:14 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 15 May 2019 12:25:14 -0600 Subject: RFR: Multi-line String Literal (Preview) JEP [EG Draft] In-Reply-To: <5CDC5660.9010802@oracle.com> References: <017DBC7A-FCFA-4D6C-BA0F-4594D42F0C1E@oracle.com> <92CC33C3-6AA1-4639-ABC0-1DD87CD33C59@oracle.com> <950B7480-0CC3-4970-9209-553AFE2FD603@oracle.com> <135B8E95-7D8D-4035-902A-CC9B8BF0A044@oracle.com> <26D5F108-EA91-4AA7-9464-5DE7ED81CB15@oracle.com> <50775FC6-35CE-4B82-8137-6A7AEBBEC6E5@oracle.com> <5CDC5660.9010802@oracle.com> Message-ID: > On May 15, 2019, at 12:11 PM, Alex Buckley wrote: > > On 5/15/2019 10:17 AM, Dan Smith wrote: >> I think this: >> >> ~~~ >> String code = """ >> public void print(""" + type + """ >> o) { >> System.out.println(Objects.toString(o)); >> } >> """; >> ~~~ >> >> should be presented like this: >> >> ~~~ >> String code = """ >> public void print(""" + >> type + >> """ >> o) { >> System.out.println(Objects.toString(o)); >> } >> """; >> ~~~ >> >> It's not great, and replace/format is the "right" solution, but if >> somebody wants to do concatenation, this style does a better job of >> indicating where the indent prefix ends and the content begins. The >> delimiter gives a visual indication of where the "block" is located. > > I appreciate that you want to position an opening delimiter to the left of its content, but can you say why you want `type +` on its own line? What's the big deal with `...""" + type +\n` and then the next text block? (You don't seem to object to the closing delimiter sharing a line with content, since you have ` + ` after the first closing delimiter.) Just a feeling that it might read better with every piece on a separate line. I don't have a strong preference about that, though. 
In retrospect, here's how I'd really write it in a program of mine, assuming I was opposed to the replace/format approach for some reason:

String code = "public void print(" + type + " o) {\n" +
              """
                  System.out.println(Objects.toString(o));
              }
              """;

But that doesn't do such a good job of illustrating how the re-indentation algorithm impacts whitespace before the 'o'. :-)

From dl at cs.oswego.edu Wed May 15 23:35:07 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 15 May 2019 19:35:07 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu> <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com>
Message-ID: <14a66fd2-62c7-7a1b-19b6-a07422a47ff2@cs.oswego.edu>

(With continuing deja vu...)

On 5/13/19 3:48 PM, John Rose wrote:
> The rule for developers is that if you
> needed to put a {...} block after your
> arrow ->, then you can still use an
> arrow to return a value, but it must be
> an extra arrow, marked with a keyword
> (or syntax context) that means "here is
> the rest of the arrow you wanted to
> write a moment ago".

Yes, but this arrow should not point right, but up (which was the thought underlying Smalltalk's choice). Maybe finally use unicode "↑". Or more conservatively, "^".

I still think a symbol is better than keyword, because there is no single common word that applies across contexts this may be applied in, except possibly "yield", that already means something else in Java (Thread.yield), and several something else's in other languages.

(Meta: What do you call a bikeshed thread in which no one likes anyone else's suggestions?)
-Doug From guy.steele at oracle.com Thu May 16 00:31:51 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 15 May 2019 20:31:51 -0400 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <14a66fd2-62c7-7a1b-19b6-a07422a47ff2@cs.oswego.edu> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <9aaff291-fb96-9cbd-432b-206cd215ed06@cs.oswego.edu> <135B5F1D-5DF3-4C54-9B2D-8D2CD716FFC9@oracle.com> <14a66fd2-62c7-7a1b-19b6-a07422a47ff2@cs.oswego.edu> Message-ID: <817DF6B5-CA82-4CAE-83D1-088F1B90223D@oracle.com> > On May 15, 2019, at 7:35 PM, Doug Lea
wrote:
> . . .
> (Meta: What do you call a bikeshed thread in which no one likes anyone
> else's suggestions?)

Cliqueless?

From forax at univ-mlv.fr Thu May 16 11:41:33 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 16 May 2019 13:41:33 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
Message-ID: <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>

Another possible keyword is 'pass'.

Rémi

> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Sunday, May 12, 2019 21:38:38
> Subject: Call for bikeshed -- break replacement in expression switch

> As mentioned in the preview mail, we have one more decision to make: the new
> spelling of "break value" in expression switches. We have previously discussed
> "break-with value", which everyone seems to like better than "break value", but
> I think we can, and should, do better.

> (Despite the call-for-bikeshed, this is not to reopen every sub-decision -- the
> 2x2 semantics, the use of ->, the name of the construct -- this bikeshed only
> has room for one bike.)

> There are two primary reasons why we prefer break-with to break. We originally
> chose "break value" when we had a more limited palette of options to choose
> from (the keyword-resupply ship hadn't yet docked.) The overloading of break
> creates uncomfortable interactions. There is the obvious ambiguity between
> "break value" and "break label"; there is also the slightly less obvious
> interaction where we cannot permit "break value" inside a loop or statement
> switch inside an expression switch. While both of these can be "specified
> around", they create distortions in the spec, which in turn creates complexity
> in the user model; these are a sign that we may be pushing something a bit too
> far. Further, historically "break" has been a straight transfer of control;
> this muddies up what "break" means.

> Once we alit on the idea of break-* as a keyword, it seemed immediately more
> comfortable to make a new break-derived keyword; this allowed us to undo the
> distortions that "break value" introduced, and it immediately felt better. But
> I think we can do better still. Here's what's making me uncomfortable.

> We've actually been here before: lambda expressions were the first time we
> allowed an expression to contain statements, and while the streamlined case of
> "x -> e" didn't require any control statements, and many lambdas could be
> expressed with this form, statement lambdas needed a way to say "stop executing
> the body of this lambda, and yield a value." We settled -- somewhat
> uncomfortably -- on "return value" for this.

> Fast-forward to today, when we're introducing the second expression form that
> can contain statements, and we face the same question: how to indicate "I'm
> done, I'm completing normally, here's my value." Lambdas provide no help here;
> we can't use "return" here. (Well, we could, but that would be terrible, so
> we're not going to.) Which means we have to solve the problem again, but
> differently. That's already not so great.

> Digression: What's so terrible about "return", and why is it OK for lambdas but
> not OK for switches?

> While we could of course define "return" to mean whatever we want, in
> imperative languages with the concept of "methods" or "procedures", including
> Java, return has always had a clear meaning: unwind the current call frame, and
> yield the designated value to the caller. Lambda expressions are effectively
> method bodies (lambdas are literals for functional interfaces, which are single
> method interfaces), and so return (barely) fits. But switch expressions are
> most definitely not methods, and are not associated with call frames. Asking
> users to look at the enclosing context when they see a "return" in the middle
> of a method, to know whether it returns from the method or merely transfers
> control within the method, is a lot to ask. (Yes, I know lambdas ask this as
> well; this is why this was an uncomfortable choice, and having made this hole,
> I'm not anxious to expand it dramatically. If anything I'd prefer to close it,
> but that's another bikeshed.)

> (end digression)

> We could surely take "break-with" and move on; it feels sufficiently "switchy".
> But let's look ahead a little bit. We've now confronted the same problem twice:
> an expression form that, in a minority use case, needed a way to express "stop
> computing this expression, because I'm done, and here's its value." (And,
> unfortunately, we have two different syntactic ways to express the same basic
> concept.) Let's call these "structured expressions."

> We have two structured expression forms, and of the three numbers in computer
> science, "two" is not one of them. Which suggests we are going to face this
> problem again some day -- whether it be "block expressions", or "if
> expressions", or "let expressions", or "try expressions", or whatever. (NB:
> this call-for-bikeshed most definitely does not extend to "why not just do
> generalized block expressions", so please don't go there. That said, you could
> treat this discussion as "if Java had block expressions, what might they look
> like?" But we're focusing on the content of the block, not how the block is
> framed.)

> Let's say for sake of argument that we might someday want to extend ternary
> expressions to support the same kind of "restricted block expressions" as
> expression switches. (This is just an example for purposes of illustration,
> let's not get derailed on "but you should use an 'if' statement for that").

> String s = (foo != null)
> ? s
> : {
> println("null again at line" + __LINE__);
> break-with "null";
> };

> Such an expression needs a way to say "I'm done, here's my value", just as
> lambda and switch did before it. Clearly "return" is not the right thing here
> any more than it is for switches. And I don't think "break-with" is all that
> great here either! It's not terrible, but outside of a loop or switch, it
> starts to feel kind of forced. And it would be terrible to solve this problem
> twice with one-time solutions, and have no general story, and then have to come
> up with YET ANOTHER way of expressing the same basic concept. So regardless of
> what we expect for future expression forms, let's examine what our options are
> that are not tied to call frames (return) or direct transfer of control
> (switches and loops).

> Looking at what other languages have done here, there are a few broad
> directions:

> - A statement like "break-with v", indicating that the enclosing structured
> expression is completing normally with the provided value.
> - An operator that serves the same purpose, such as "-> e".
> - Assigning to some magic variable (this is how Pascal indicates the return
> value of a function).
> - Treating the last expression in the block as the result.

> I think we can dispatch all but the first relatively easily:

> - We don't use operators for "return", we use a keyword; this would be both a
> gratuitous departure, as well as too easy to miss.
> - Switch expressions don't have names, and even if we assigned to "switch", it
> wouldn't be obvious that we were actually terminating execution of the block.
> - Everywhere else in the language (such as method bodies), you are free to yield
> up a value from the middle of the block, perhaps from within a control
> construct like a loop; restricting the RHS of case blocks to put their result
> last would be a significant new restriction, and would limit the ability to
> refactor to/from methods. And further, the convention of putting the result
> last, while a fine one for a language that is "expressions all the way down",
> would likely be too subtle a cue in Java.

> So, we want a keyword (or contextual keyword). In some hallway brainstorming,
> candidates that emerged include yield, produce, offer, offer-up, result,
> value-break, yield-value, provide, resulting-in, break-with, resulting,
> yielding, put, give, giving, ...

> (Also to keep in mind: remember we're dealing with a minority case; most of the
> time, there'll just be an expression on the RHS.)

> TL;DR: I think we might come to regret break-* just as we did with return --
> because it won't scale to future demands we place on it, and having *three*
> ways to say basically the same thing in three different contexts would be
> embarrassing. I would like to see if we can do better.

> Of the options listed here, I have a favorite: yield. (This is one of the terms
> we've actually been using all along when describing this feature in English.)

> There is one obvious objection to "yield", which I'd like to preemptively
> address: that in some languages (though not in Java, except for the
> infrequently-used Thread.yield()), it is associated with concurrency
> primitives, such as generators. (This was the objection raised when yield was
> proposed in the context of lambdas.) But these associations are not grounded
> in existing Java constructs (and, the progress of Loom suggests that constructs
> like async/await are not coming to Java, and even if we wanted language support
> for generators, there are ample other ways to say it.)

> [ http://dictionary.com/ | Dictionary.com ] lists the following meanings for
> yield:

> verb (used with object)
> - to give forth or produce by a natural process or in return for cultivation:
> - to produce or furnish (payment, profit, or interest):
> - to give up, as to superior power or authority:
> - to give up or over; relinquish or resign:
> - to give as due or required:
> - to cause; give rise to:

> verb (used without object)
> - to give a return, as for labor expended; produce; bear.
> - to surrender or submit, as to superior power:
> - to give way to influence, entreaty, argument, or the like:
> - to give place or precedence (usually followed by to):
> - to give way to force, pressure, etc., so as to move, bend, collapse, or the
> like:

> These are mostly consistent with the use of "yield" as proposed here.

> One more thing to bear in mind: there is an ordering to abrupt completion
> mechanisms, as to how far away they can transfer control:

> - yield: can unwind only the innermost yieldable expression
> - break/continue: can unwind multiple control constructs (for, while, switch),
> but stays within the method
> - return: unwinds exactly one method
> - throw: unwinds one or more methods
> - System.exit: unwinds the whole VM

> Bikeshed is open (but remember the bounds of this bikeshed are limited; we're
> talking purely about the syntax of a "stop executing this block and yield a
> value to the enclosing context" -- and time is ticking.)
From brian.goetz at oracle.com Thu May 16 15:24:36 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 16 May 2019 11:24:36 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
Message-ID: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>

We've probably pretty much explored the options at this point; time to converge around one of the choices...

>
> From: "Brian Goetz"
> To: "amber-spec-experts"
> Sent: Sunday, May 12, 2019 21:38:38
> Subject: Call for bikeshed -- break replacement in expression switch
> As mentioned in the preview mail, we have one more decision to make: the new spelling of "break value" in expression switches. We have previously discussed "break-with value", which everyone seems to like better than "break value", but I think we can, and should, do better.
>
> (Despite the call-for-bikeshed, this is not to reopen every sub-decision -- the 2x2 semantics, the use of ->, the name of the construct -- this bikeshed only has room for one bike.)
>
> There are two primary reasons why we prefer break-with to break. We originally chose "break value" when we had a more limited palette of options to choose from (the keyword-resupply ship hadn't yet docked.) The overloading of break creates uncomfortable interactions. There is the obvious ambiguity between "break value" and "break label"; there is also the slightly less obvious interaction where we cannot permit "break value" inside a loop or statement switch inside an expression switch. While both of these can be "specified around", they create distortions in the spec, which in turn creates complexity in the user model; these are a sign that we may be pushing something a bit too far. Further, historically "break" has been a straight transfer of control; this muddies up what "break" means.
>
> Once we alit on the idea of break-* as a keyword, it seemed immediately more comfortable to make a new break-derived keyword; this allowed us to undo the distortions that "break value" introduced, and it immediately felt better. But I think we can do better still. Here's what's making me uncomfortable.
>
> We've actually been here before: lambda expressions were the first time we allowed an expression to contain statements, and while the streamlined case of "x -> e" didn't require any control statements, and many lambdas could be expressed with this form, statement lambdas needed a way to say "stop executing the body of this lambda, and yield a value." We settled -- somewhat uncomfortably -- on "return value" for this.
>
> Fast-forward to today, when we're introducing the second expression form that can contain statements, and we face the same question: how to indicate "I'm done, I'm completing normally, here's my value." Lambdas provide no help here; we can't use "return" here. (Well, we could, but that would be terrible, so we're not going to.) Which means we have to solve the problem again, but differently. That's already not so great.
>
> Digression: What's so terrible about "return", and why is it OK for lambdas but not OK for switches?
>
> While we could of course define "return" to mean whatever we want, in imperative languages with the concept of "methods" or "procedures", including Java, return has always had a clear meaning: unwind the current call frame, and yield the designated value to the caller. Lambda expressions are effectively method bodies (lambdas are literals for functional interfaces, which are single method interfaces), and so return (barely) fits. But switch expressions are most definitely not methods, and are not associated with call frames. Asking users to look at the enclosing context when they see a "return"
in the middle of a method, to know whether it returns from the method or merely transfers control within the method, is a lot to ask. (Yes, I know lambdas ask this as well; this is why this was an uncomfortable choice, and having made this hole, I'm not anxious to expand it dramatically. If anything I'd prefer to close it, but that's another bikeshed.)
>
> (end digression)
>
>
> We could surely take "break-with" and move on; it feels sufficiently "switchy". But let's look ahead a little bit. We've now confronted the same problem twice: an expression form that, in a minority use case, needed a way to express "stop computing this expression, because I'm done, and here's its value." (And, unfortunately, we have two different syntactic ways to express the same basic concept.) Let's call these "structured expressions."
>
> We have two structured expression forms, and of the three numbers in computer science, "two" is not one of them. Which suggests we are going to face this problem again some day -- whether it be "block expressions", or "if expressions", or "let expressions", or "try expressions", or whatever. (NB: this call-for-bikeshed most definitely does not extend to "why not just do generalized block expressions", so please don't go there. That said, you could treat this discussion as "if Java had block expressions, what might they look like?" But we're focusing on the content of the block, not how the block is framed.)
>
> Let's say for sake of argument that we might someday want to extend ternary expressions to support the same kind of "restricted block expressions" as expression switches. (This is just an example for purposes of illustration, let's not get derailed on "but you should use an 'if' statement for that").
>
>     String s = (foo != null)
>         ? s
>         : {
>             println("null again at line" + __LINE__);
>             break-with "null";
>         };
>
> Such an expression needs a way to say "I'm done, here's my value", just as lambda and switch did before it. Clearly "return"
is not the right thing here any more than it is for switches. And I don't think "break-with" is all that great here either! It's not terrible, but outside of a loop or switch, it starts to feel kind of forced. And it would be terrible to solve this problem twice with one-time solutions, and have no general story, and then have to come up with YET ANOTHER way of expressing the same basic concept. So regardless of what we expect for future expression forms, let's examine what our options are that are not tied to call frames (return) or direct transfer of control (switches and loops).
>
> Looking at what other languages have done here, there are a few broad directions:
>
> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
> - An operator that serves the same purpose, such as "-> e".
> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
> - Treating the last expression in the block as the result.
>
> I think we can dispatch all but the first relatively easily:
>
> - We don't use operators for "return", we use a keyword; this would be both a gratuitous departure, as well as too easy to miss.
> - Switch expressions don't have names, and even if we assigned to "switch", it wouldn't be obvious that we were actually terminating execution of the block.
> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>
> So, we want a keyword (or contextual keyword).
In some hallway brainstorming, candidates that emerged include yield, produce, offer, offer-up, result, value-break, yield-value, provide, resulting-in, break-with, resulting, yielding, put, give, giving, ...
>
> (Also to keep in mind: remember we're dealing with a minority case; most of the time, there'll just be an expression on the RHS.)
>
> TL;DR: I think we might come to regret break-* just as we did with return -- because it won't scale to future demands we place on it, and having *three* ways to say basically the same thing in three different contexts would be embarrassing. I would like to see if we can do better.
>
>
> Of the options listed here, I have a favorite: yield. (This is one of the terms we've actually been using all along when describing this feature in English.)
>
> There is one obvious objection to "yield", which I'd like to preemptively address: that in some languages (though not in Java, except for the infrequently-used Thread.yield()), it is associated with concurrency primitives, such as generators. (This was the objection raised when yield was proposed in the context of lambdas.) But these associations are not grounded in existing Java constructs (and the progress of Loom suggests that constructs like async/await are not coming to Java, and even if we wanted language support for generators, there are ample other ways to say it.)
>
> Dictionary.com lists the following meanings for yield:
>
> verb (used with object)
> - to give forth or produce by a natural process or in return for cultivation:
> - to produce or furnish (payment, profit, or interest):
> - to give up, as to superior power or authority:
> - to give up or over; relinquish or resign:
> - to give as due or required:
> - to cause; give rise to:
>
> verb (used without object)
> - to give a return, as for labor expended; produce; bear.
> - to surrender or submit, as to superior power:
> - to give way to influence, entreaty, argument, or the like:
> - to give place or precedence (usually followed by to):
> - to give way to force, pressure, etc., so as to move, bend, collapse, or the like:
>
> These are mostly consistent with the use of "yield" as proposed here.
>
> One more thing to bear in mind: there is an ordering to abrupt completion mechanisms, as to how far away they can transfer control:
>
> - yield: can unwind only the innermost yieldable expression
> - break/continue: can unwind multiple control constructs (for, while, switch), but stays within the method
> - return: unwinds exactly one method
> - throw: unwinds one or more methods
> - System.exit: unwinds the whole VM
>
>
> Bikeshed is open (but remember the bounds of this bikeshed are limited; we're talking purely about the syntax of a "stop executing this block and yield a value to the enclosing context" -- and time is ticking.)
>

From alex.buckley at oracle.com Thu May 16 19:36:38 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Thu, 16 May 2019 12:36:38 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
Message-ID: <5CDDBBC6.6010107@oracle.com>

On 5/16/2019 8:24 AM, Brian Goetz wrote:
> We've probably pretty much explored the options at this point; time to
> converge around one of the choices...

I am very happy with `yield` as the new construct for concluding the evaluation of a switch expression and leaving a value on the stack for consumption within the method.

I think a statement form for the new construct is ideal.
The purpose of the new construct is to complete abruptly in an attempt to transfer control back to the switch expression, which then completes normally with a value. Abrupt completion and an attempt to transfer control are the hallmarks of `break`, `continue`, and `return`; having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference:

-----
A `yield` statement attempts to transfer control to the innermost enclosing switch expression; this expression ... then immediately completes normally and the value of the _Expression_ becomes the value of the switch expression.

A `return` statement attempts to transfer control to the invoker of the innermost enclosing constructor, method, or lambda expression ... In the case of a return statement with value _Expression_, the value of the _Expression_ becomes the value of the invocation.
-----

Note that the aspect of _attempting_ to transfer control applies to `yield` just as much as to `break`, `continue`, and `return`. Below, the `finally` block "intercepts" the transfer of control started by `yield`. The `finally` block then completes normally, so the transfer of control proceeds and the switch expression completes normally, leaving 5 or 6 on the stack.

```
int result = switch (x) {
    case 0 -> {
        try {
            ...
            if (...) yield 5;
            ...
            yield 6;
        }
        finally {
            cleanUp();
        }
    }

    default -> 42;
};
```

Abrupt completion and transfer of control are not the hallmarks of operators. The purpose of an operator is to indicate the kind of expression to be evaluated (numeric addition, method invocation, etc), so an operator-like syntax such as `^` would suggest the imminent evaluation of a NEW expression.
However, we are ALREADY in the process of evaluating a switch expression; in fact we would like to finish it up by transferring control from the {...} block (which has been happily executing statements sequentially) to the switch expression itself (so it can complete normally). So, I think an operator-like syntax is inappropriate.

Alex

From guy.steele at oracle.com Thu May 16 19:47:38 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Thu, 16 May 2019 15:47:38 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDDBBC6.6010107@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
Message-ID:

> On May 16, 2019, at 3:36 PM, Alex Buckley wrote:
>
> On 5/16/2019 8:24 AM, Brian Goetz wrote:
>> We've probably pretty much explored the options at this point; time to
>> converge around one of the choices...
>
> I am very happy with `yield` as the new construct for concluding the evaluation of a switch expression and leaving a value on the stack for consumption within the method.
>

Yah, okay, I now admit that "yield" is growing on me. I no longer object to it. And your other points below are well taken.

> I think a statement form for the new construct is ideal. The purpose of the new construct is to complete abruptly in an attempt to transfer control back to the switch expression, which then completes normally with a value. Abrupt completion and an attempt to transfer control are the hallmarks of `break`, `continue`, and `return`; having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference:
>
> -----
> A `yield` statement attempts to transfer control to the innermost enclosing switch expression; this expression ...
then immediately completes normally and the value of the _Expression_ becomes the value of the switch expression.
>
> A `return` statement attempts to transfer control to the invoker of the innermost enclosing constructor, method, or lambda expression ... In the case of a return statement with value _Expression_, the value of the _Expression_ becomes the value of the invocation.
> -----
>
> Note that the aspect of _attempting_ to transfer control applies to `yield` just as much as to `break`, `continue`, and `return`. Below, the `finally` block "intercepts" the transfer of control started by `yield`. The `finally` block then completes normally, so the transfer of control proceeds and the switch expression completes normally, leaving 5 or 6 on the stack.
>
> ```
> int result = switch (x) {
>     case 0 -> {
>         try {
>             ...
>             if (...) yield 5;
>             ...
>             yield 6;
>         }
>         finally {
>             cleanUp();
>         }
>     }
>
>     default -> 42;
> };
> ```
>
> Abrupt completion and transfer of control are not the hallmarks of operators. The purpose of an operator is to indicate the kind of expression to be evaluated (numeric addition, method invocation, etc), so an operator-like syntax such as `^` would suggest the imminent evaluation of a NEW expression. However, we are ALREADY in the process of evaluating a switch expression; in fact we would like to finish it up by transferring control from the {...} block (which has been happily executing statements sequentially) to the switch expression itself (so it can complete normally). So, I think an operator-like syntax is inappropriate.
>
> Alex

From john.r.rose at oracle.com Thu May 16 19:53:27 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 12:53:27 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
Message-ID:

FTR I'm OK with "yield". (I yield the floor?)

(And I'm OK with "pass", but we'll probably pass on that option?)

The rule, I take it, is that `yield x;` would deliver a value to the innermost enclosing `->` operator. If it could be that simple, that would be a win; we could teach our eyes and IDEs to make that match-up.

As I think you've said, such a rule that keys on `->` would allow us to apply yield retroactively to lambdas, *and* to switches, *and* to hypothetical expression-blocks in the future (if they have a `->` at their head, the rule applies uniformly), *and* to concise method bodies, as an alternative (as with lambda) to return.

What about return-vs-yield? Well, yield is OK when there's a matching `->`. And return is OK when you're in a method body (and not also in a `->`). So sometimes both rules apply; pick a keyword; tastes may vary. That's not too different an experience from equivalent break vs. return (when the break falls out of the method body).

On May 16, 2019, at 8:24 AM, Brian Goetz wrote:
> One more thing to bear in mind: there is an ordering to abrupt completion mechanisms, as to how far away they can transfer control:
>
> - yield: can unwind only the innermost yieldable expression
> - break/continue: can unwind multiple control constructs (for, while, switch), but stays within the method
> - return: unwinds exactly one method
> - throw: unwinds one or more methods
> - System.exit: unwinds the whole VM

Let's be careful of how we apply this ordering.
A yield (like a lambda-return) can unwind any number of control constructs, up to the innermost yieldable expression. Because yields don't take labels, they cannot even express a multi-expression exit. But they *naturally* entail multi-block exit. Searches involving loops, catches, and ifs are common in Java and therefore essential to support with yield:

  L0: for (var q : qstuff) {
    L1: f(q, ()->{
      STARTER -> {  //B0
        //break L0; continue L0; => BAD JUMP
        B1: try {
          B2: for (var x : stuff) {
            B3: if (x.stopHere())  yield x;
          }
        } catch (MyEx ex) {
          yield ex.getStuff();
        }
        B4: if (lastChance)  yield DEFAULT_STUFF;
        throw new ComplainingEx();
      }
    });
  }

Here, any of the blocks Bi could be unwound by a yield. The yield only goes back to the STARTER (which could be a lambda, switch or futuristic thing). A yield cannot reach the outer lambda at L1. Moreover, the break L0 would be a bad jump, since it cannot break out of the -> of the STARTER expression.

Going back to your list of "unwind strength", I think *breaks* are therefore more limited than yields:

- break/continue: can unwind multiple control constructs (for, while, switch), but stays within *both* the method and the innermost `->`
- yield: can unwind multiple control constructs (for, while, switch), but stays within the innermost `->`
- return: unwinds exactly one method frame (implicit after `->` method body)
- throw: unwinds one or more methods
- System.exit: unwinds the whole VM

One more side note: Yield in a lambda can be viewed as jumping to the very outside of the lambda body, with a value, at which point "return off the end" takes over. So every yield can be considered a frame-local operation (perhaps followed by an implicit "return off the end, but with a value"). The reason I'm making this distinction is that it lets us say that yield always stays *inside* a method activation frame (even if the next step is to return the yielded value).
This "yields" a uniform rule: If a `->` is immediately inside a block which defines local variables, those variables are available to code around the yield *for mutation* as well as reading. This is a different rule than with lambda uplevels. It allows code which yields an expression to *also* yield additional values by assigning to up-level variables. This too is a common pattern in Java. For example, a loop might return both an array element and the index of that element, to set up later searches starting after that index.

  int res2;
  var res1 = STARTER -> {
    ...
    res2 = 42;
    yield myRes1Value;
  };
  System.out.println("got em: "+asList(res1, res2));

So why can't lambdas side-effect out? Simple, because they are -> blocks invisibly and immediately nested inside of method bodies. There are no vars declared which will survive the implicit return operation, so there's nothing to share (writably) with an enclosing block. But you can say "x = 1; yield 2;" usefully if the enclosing -> block is not also a method body.

-- John

From emcmanus at google.com Thu May 16 19:56:32 2019
From: emcmanus at google.com (Éamonn McManus)
Date: Thu, 16 May 2019 12:56:32 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDDBBC6.6010107@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
Message-ID:

"yield" isn't a reserved word, is it? Doesn't that mean that `yield(5);` is ambiguous?

On Thu, 16 May 2019 at 12:36, Alex Buckley wrote:
>
> On 5/16/2019 8:24 AM, Brian Goetz wrote:
> > We've probably pretty much explored the options at this point; time to
> > converge around one of the choices...
>
> I am very happy with `yield` as the new construct for concluding the
> evaluation of a switch expression and leaving a value on the stack for
> consumption within the method.
>
> I think a statement form for the new construct is ideal. The purpose of
> the new construct is to complete abruptly in an attempt to transfer
> control back to the switch expression, which then completes normally
> with a value. Abrupt completion and an attempt to transfer control are
> the hallmarks of `break`, `continue`, and `return`; having `yield` as
> the junior member of that club is quite natural. Putting the junior and
> senior members side by side shows both similarity and difference:
>
> -----
> A `yield` statement attempts to transfer control to the innermost
> enclosing switch expression; this expression ... then immediately
> completes normally and the value of the _Expression_ becomes the value
> of the switch expression.
>
> A `return` statement attempts to transfer control to the invoker of the
> innermost enclosing constructor, method, or lambda expression ... In the
> case of a return statement with value _Expression_, the value of the
> _Expression_ becomes the value of the invocation.
> -----
>
> Note that the aspect of _attempting_ to transfer control applies to
> `yield` just as much as to `break`, `continue`, and `return`. Below, the
> `finally` block "intercepts" the transfer of control started by `yield`.
> The `finally` block then completes normally, so the transfer of control
> proceeds and the switch expression completes normally, leaving 5 or 6 on
> the stack.
>
> ```
> int result = switch (x) {
>     case 0 -> {
>         try {
>             ...
>             if (...) yield 5;
>             ...
>             yield 6;
>         }
>         finally {
>             cleanUp();
>         }
>     }
>
>     default -> 42;
> };
> ```
>
> Abrupt completion and transfer of control are not the hallmarks of
> operators. The purpose of an operator is to indicate the kind of
> expression to be evaluated (numeric addition, method invocation, etc),
> so an operator-like syntax such as `^` would suggest the imminent
> evaluation of a NEW expression.
However, we are ALREADY in the process
> of evaluating a switch expression; in fact we would like to finish it up
> by transferring control from the {...} block (which has been happily
> executing statements sequentially) to the switch expression itself (so
> it can complete normally). So, I think an operator-like syntax is
> inappropriate.
>
> Alex

From john.r.rose at oracle.com Thu May 16 19:58:42 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 12:58:42 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDDBBC6.6010107@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
Message-ID: <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com>

On May 16, 2019, at 12:36 PM, Alex Buckley wrote:
> having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference:

If junior yield is allowed to help senior return with his job, we have a more uniform rule: yield always matches an arrow.

If junior yield should stay off of senior return's grass, we have a somewhat less uniform rule: yield always matches an arrow, unless the arrow is coterminous with a method body, in which case return must be used.

Either way is OK with me, but the more uniform rule seems to give me more insight into what's really happening.

-- John

From john.r.rose at oracle.com Thu May 16 19:59:52 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 12:59:52 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
Message-ID:

On May 16, 2019, at 12:56 PM, Éamonn McManus wrote:
>
> "yield" isn't a reserved word, is it? Doesn't that mean that
> `yield(5);` is ambiguous?

Yes, and the plan of record is to finesse such ambiguities, as we did with `var`.

From john.r.rose at oracle.com Thu May 16 20:01:46 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 13:01:46 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
Message-ID: <313B6A52-9DE9-44E2-BBBA-0E1C8CC1A4B7@oracle.com>

On May 16, 2019, at 12:59 PM, John Rose wrote:
>
> On May 16, 2019, at 12:56 PM, Éamonn McManus wrote:
>>
>> "yield" isn't a reserved word, is it? Doesn't that mean that
>> `yield(5);` is ambiguous?
>
> Yes, and the plan of record is to finesse such ambiguities,
> as we did with `var`.

Q: But we cannot know if that will really work.
A: Yes, it's an ambiguous plan of record. Worked once, though.
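
For illustration, here is a minimal sketch of how the `yield(5);` question plays out under the contextual-keyword treatment that later shipped with switch expressions in Java 14; that outcome goes beyond what this thread had decided. The class `YieldNames` and its methods are hypothetical names made up for the example. A method may still be named `yield`, but javac rejects an unqualified call to it, so the call has to be qualified.

```java
// Hypothetical example; assumes Java 14 or later.
class YieldNames {
    // A method may still be called "yield" ...
    static int yield(int v) {
        return v + 1;
    }

    static int describe(int x) {
        return switch (x) {
            case 0 -> {
                // ... but it has to be invoked with a qualifier;
                // a bare `yield(5);` at this point does not compile.
                int bumped = YieldNames.yield(5);
                yield bumped; // yield statement: the value of this arm
            }
            default -> 42;
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // 6
        System.out.println(describe(1)); // 42
    }
}
```

Qualifying the call (with a type name or a receiver such as `this`) is what keeps existing methods like `Thread.yield()` usable.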
From brian.goetz at oracle.com Thu May 16 20:04:41 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 16 May 2019 16:04:41 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
Message-ID: <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>

The notion of "reserved word" is insufficiently precise. More precisely, yield is a _reserved type identifier_, like `var`. That means that you cannot have a class called `yield`, but you can have local variables, or methods, or fields, or type variables, with that name.

See

    https://openjdk.java.net/jeps/8223002

for further guidance on the fine degrees of shading between keywords, context-sensitive keywords, reserved identifiers, and reserved type names.

> On May 16, 2019, at 3:56 PM, Éamonn McManus wrote:
>
> "yield" isn't a reserved word, is it? Doesn't that mean that
> `yield(5);` is ambiguous?

From brian.goetz at oracle.com Thu May 16 20:10:24 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 16 May 2019 16:10:24 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
 <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com>
Message-ID:

While dodging the arrow, I'll point out that there is a pleasant ambiguity in the following:

    x = switch (y) {
        case L -> {
            foo();
            yield 7;
        }
    };

Does the `yield` yield a value to the _block_, or to the _switch_? Answer: IT DOESN'T MATTER! Whichever intuition feels comfortable to you, yields the right answer.
If we think of it as yielding to the block, then the block terminates normally with 7, and therefore the case label does, and therefore the switch does. If we think of it as yielding to the switch, then the switch completes normally with 7.

And if we later want to expand block expressions to more places, maybe with some new syntax, then in a future Java

    case L -> { ... }

becomes sugar for

    case L -> BLOCK_COMING { ... }

at which point the yield is retconned to yield to the block.

> On May 16, 2019, at 3:58 PM, John Rose wrote:
>
> On May 16, 2019, at 12:36 PM, Alex Buckley wrote:
>> having `yield` as the junior member of that club is quite natural. Putting the junior and senior members side by side shows both similarity and difference:
>
> If junior yield is allowed to help senior return with
> his job, we have a more uniform rule: yield always
> matches an arrow.
>
> If junior yield should stay off of senior return's grass,
> we have a somewhat less uniform rule: yield always
> matches an arrow, unless the arrow is coterminous
> with a method body, in which case return must be
> used.
>
> Either way is OK with me, but the more uniform rule
> seems to give me more insight into what's really
> happening.
>
> -- John

From john.r.rose at oracle.com Thu May 16 20:28:58 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 13:28:58 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
 <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com>
Message-ID: <197AA4CD-F757-480C-8DB6-49860FB342E8@oracle.com>

On May 16, 2019, at 1:10 PM, Brian Goetz wrote:
>
> While dodging the arrow, I'll point out that there is a pleasant ambiguity in the following:
>
>     x = switch (y) {
>         case L -> {
>             foo();
>             yield 7;
>         }
>     };

Yes, it is pleasant, and it applies (potentially) to lambdas also.

I'm saying it's extra-pleasant (for me) to divide the story into two chapters:

Chapter 1. Some constructs have arrows. They define when the arrow bodies are executed, and, if the arrow gets tossed a value, what is done with that value (method return? switch result? block result? depends on where the arrow is).

Chapter 2. Every yield matches an innermost arrow, and every arrow (in a non-void T context) accepts a yielded value (of type T).

It's pleasant this way because when you get to Chapter 2, you can forget all the gnarly context outside the arrow. Your yield passes to the innermost arrow, period. And if there's an arrow in sight (in the same stack frame) you can yield to it. Again, period.
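
As a rough sketch of the "yield goes to the innermost target" reading, here is how it looks with switch expressions as they later shipped in Java 14. In that design the target is the innermost enclosing switch expression, and lambda bodies keep using `return`, so the arrow-uniform rule described above is not what this code demonstrates. `InnermostYield` and `nested` are hypothetical names.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical example; assumes Java 14 or later.
class InnermostYield {
    static int nested(int a, int b) {
        return switch (a) {
            case 1 -> {
                int inner = switch (b) {
                    case 1 -> { yield 10; } // completes only the inner switch
                    default -> 20;
                };
                yield inner + 1;            // completes the outer switch
            }
            default -> {
                // A lambda body inside a switch arm still uses return, not yield.
                IntUnaryOperator twice = v -> { return v * 2; };
                yield twice.applyAsInt(b);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(nested(1, 1)); // 11
        System.out.println(nested(1, 5)); // 21
        System.out.println(nested(2, 4)); // 8
    }
}
```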
From maurizio.cimadamore at oracle.com Thu May 16 20:34:03 2019
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Thu, 16 May 2019 21:34:03 +0100
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
 <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
Message-ID: <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>

On 16/05/2019 21:04, Brian Goetz wrote:
> The notion of "reserved word" is insufficiently precise. More
> precisely, yield is a _reserved type identifier_, like `var`. That
> means that you cannot have a class called `yield`, but you can have
> local variables, or methods, or fields, or type variables, with that
> name.

Yep - but it's also different from 'var' in the sense that 'var' never had to fight with ambiguities with method names because it only applied to the 'type' part of a variable declaration, which is either a (possibly qualified) identifier (possibly followed by '<'). Parentheses were never allowed where 'var' as a type was expected.

For yield Éamonn is right - there's a new kind of ambiguity. On the other hand is a trivial one to resolve, given what we're discussing now is something like

"yields" EXPRESSION

so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement".

Maurizio

>
> See
>
> https://openjdk.java.net/jeps/8223002
>
> for further guidance on the fine degrees of shading between keywords,
> context-sensitive keywords, reserved identifiers, and reserved type
> names.
>
>> On May 16, 2019, at 3:56 PM, Éamonn McManus
>> wrote:
>>
>> "yield" isn't a reserved word, is it? Doesn't that mean that
>> `yield(5);` is ambiguous?
>

From john.r.rose at oracle.com Thu May 16 20:46:42 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 13:46:42 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
 <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
 <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
Message-ID: <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>

On May 16, 2019, at 1:34 PM, Maurizio Cimadamore wrote:
>
> On the other hand is a trivial one to resolve, given what we're discussing now is something like
>
> "yields" EXPRESSION
>
> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement".

The tricky bit with that is the user experience. What if the user needs a parenthesized expression:

    yield ("answer is "+x).trim();

There are some sharp edges here.

Oh, look, it's a workaround bikeshed:

    yield false? 0: ("answer is "+x).trim();
    yield (String)("answer is "+x);
    yield new String[]{ "answer is "+x }[0];
    yield Arrays.asList("answer is "+x).get(0);
    yield Objects.id("answer is "+x);

And my own little favorite, a bespoke use of arrow:

    yield -> ("answer is "+x);

Maybe then also:

    `yield -> { block of stuff to do before I go; YepDone: yield s; };`

--
John From maurizio.cimadamore at oracle.com Thu May 16 21:05:17 2019 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 16 May 2019 22:05:17 +0100 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com> Message-ID: On 16/05/2019 21:46, John Rose wrote: > On May 16, 2019, at 1:34 PM, Maurizio Cimadamore wrote: >> On the other hand is a trivial one to resolve, given what we're discussing now is something like >> >> "yields" EXPRESSION >> >> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement". > The tricky bit with that is the user experience. What if the > user needs a parenthesized expression: > > yield ("answer is "+x).trim(); > > There are some sharp edges here. I was hoping we didn't need to go there :-) There are other contexts in which we limit what can be done w/r/t/ parenthesized expressions (since these are ambiguous with cast to generic types). So this looks like another case where the grammar has to say - sorry no parens here. Maurizio > > Oh, look, it's a workaround bikeshed: > > yield false? 0: ("answer is "+x).trim(); > yield (String)("answer is "+x); > yield new String[]{ "answer is "+x }[0]; > yield Arrays.asList("answer is "+x).get(0); > yield Objects.id("answer is "+x); > > And my own little favorite, a bespoke > use of arrow: > > yield -> ("answer is "+x); > > Maybe then also: > > `yield -> { block of stuff to do before I go; YepDone: yield s; };` > > ? 
John > From guy.steele at oracle.com Thu May 16 21:41:05 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 16 May 2019 17:41:05 -0400 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com> Message-ID: <6C53E425-3BED-4227-93F0-5279B868D3BE@oracle.com> > On May 16, 2019, at 5:05 PM, Maurizio Cimadamore wrote: > > > On 16/05/2019 21:46, John Rose wrote: >> On May 16, 2019, at 1:34 PM, Maurizio Cimadamore wrote: >>> On the other hand is a trivial one to resolve, given what we're discussing now is something like >>> >>> "yields" EXPRESSION >>> >>> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield statement". >> The tricky bit with that is the user experience. What if the >> user needs a parenthesized expression: >> >> yield ("answer is "+x).trim(); >> >> There are some sharp edges here. > > I was hoping we didn't need to go there :-) > > There are other contexts in which we limit what can be done w/r/t/ parenthesized expressions (since these are ambiguous with cast to generic types). So this looks like another case where the grammar has to say - sorry no parens here. And _that_ would very much give me pause. I would find it quite wrenching to have a place in the language where an expression cannot be parenthesized and have it mean exactly the same thing. Maybe we should go back to a hyphenated keyword. 
--Guy

From alex.buckley at oracle.com Thu May 16 21:43:29 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Thu, 16 May 2019 14:43:29 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
 <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
 <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
 <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
Message-ID: <5CDDD981.3010707@oracle.com>

On 5/16/2019 2:05 PM, Maurizio Cimadamore wrote:
> There are other contexts in which we limit what can be done w/r/t/
> parenthesized expressions (since these are ambiguous with cast to
> generic types). So this looks like another case where the grammar has to
> say - sorry no parens here.

If you're proposing to disallow a cast expression or a parenthesized expression after a `yield` token, then I think that's not right. The parsing of a `(` token has triggered potentially unbounded lookahead for some time [1][2], and everything worked out, so I don't see why the language should disallow any of John's examples:

    yield (String)("answer is "+x);
    yield ("answer is "+x).trim();
    yield new String[]{ "answer is "+x }[0];
    yield Arrays.asList("answer is "+x).get(0);
    yield false ? 0 : ("answer is "+x).trim();

Alex

[1] See slides 9-11 from https://www.eclipsecon.org/na2014/session/jdt-embraces-lambda-expressions.html

[2] JLS 15.27 on the choice of `(...)` for lambda parameters:

The syntax has some parsing challenges. The Java programming language has always required arbitrary lookahead to distinguish between types and expressions after a '(' token: what follows may be a cast or a parenthesized expression. This was made worse when generics reused the binary operators '<' and '>' in types.
Lambda expressions introduce a new possibility: the tokens following '(' may describe a type, an expression, or a lambda parameter list. Some tokens immediately indicate a parameter list (annotations, final); in other cases there are certain patterns that must be interpreted as parameter lists (two names in a row, a ',' not nested inside of '<' and '>'); and sometimes, the decision cannot be made until a '->' is encountered after a ')'. The simplest way to think of how this might be efficiently parsed is with a state machine: each state represents a subset of possible interpretations (type, expression, or parameters), and when the machine transitions to a state in which the set is a singleton, the parser knows which case it is. This does not map very elegantly to a fixed-lookahead grammar, however.

From james.laskey at oracle.com Thu May 16 21:45:59 2019
From: james.laskey at oracle.com (James Laskey)
Date: Thu, 16 May 2019 18:45:59 -0300
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
Message-ID:

Yield +1

Sent from my iPhone

> On May 16, 2019, at 12:24 PM, Brian Goetz wrote:
>
> We've probably pretty much explored the options at this point; time to converge around one of the choices...
>
>>
>> From: "Brian Goetz"
>> To: "amber-spec-experts"
>> Sent: Sunday, May 12, 2019 21:38:38
>> Subject: Call for bikeshed -- break replacement in expression switch
>> As mentioned in the preview mail, we have one more decision to make: the new spelling of "break value" in expression switches. We have previously discussed "break-with value", which everyone seems to like better than "break value", but I think we can, and should, do better.
>>
>> (Despite the call-for-bikeshed, this is not to reopen every sub-decision --
>> the 2x2 semantics, the use of ->, the name of the construct -- this bikeshed only has room for one bike.)
>>
>> There are two primary reasons why we prefer break-with to break. We originally chose "break value" when we had a more limited palette of options to choose from (the keyword-resupply ship hadn't yet docked.) The overloading of break creates uncomfortable interactions. There is the obvious ambiguity between "break value" and "break label"; there is also the slightly less obvious interaction where we cannot permit "break value" inside a loop or statement switch inside an expression switch. While both of these can be "specified around", they create distortions in the spec, which in turn creates complexity in the user model; these are a sign that we may be pushing something a bit too far. Further, historically "break" has been a straight transfer of control; this muddies up what "break" means.
>>
>> Once we alit on the idea of break-* as a keyword, it seemed immediately more comfortable to make a new break-derived keyword; this allowed us to undo the distortions that "break value" introduced, and it immediately felt better. But I think we can do better still. Here's what's making me uncomfortable.
>>
>> We've actually been here before: lambda expressions were the first time we allowed an expression to contain statements, and while the streamlined case of "x -> e" didn't require any control statements, and many lambdas could be expressed with this form, statement lambdas needed a way to say "stop executing the body of this lambda, and yield a value." We settled -- somewhat uncomfortably -- on "return value" for this.
>>
>> Fast-forward to today, when we're introducing the second expression form that can contain statements, and we face the same question: how to indicate "I'm done, I'm completing normally, here's my value." Lambdas provide no help here; we can't use "return" here. (Well, we could, but that would be terrible, so we're not going to.)
>> Which means we have to solve the problem again, but differently. That's already not so great.
>>
>> Digression: What's so terrible about "return", and why is it OK for lambdas but not OK for switches?
>>
>> While we could of course define "return" to mean whatever we want, in imperative languages with the concept of "methods" or "procedures", including Java, return has always had a clear meaning: unwind the current call frame, and yield the designated value to the caller. Lambda expressions are effectively method bodies (lambdas are literals for functional interfaces, which are single method interfaces), and so return (barely) fits. But switch expressions are most definitely not methods, and are not associated with call frames. Asking users to look at the enclosing context when they see a "return" in the middle of a method, to know whether it returns from the method or merely transfers control within the method, is a lot to ask. (Yes, I know lambdas ask this as well; this is why this was an uncomfortable choice, and having made this hole, I'm not anxious to expand it dramatically. If anything I'd prefer to close it, but that's another bikeshed.)
>>
>> (end digression)
>>
>>
>> We could surely take "break-with" and move on; it feels sufficiently "switchy". But let's look ahead a little bit. We've now confronted the same problem twice: an expression form that, in a minority use case, needed a way to express "stop computing this expression, because I'm done, and here's its value." (And, unfortunately, we have two different syntactic ways to express the same basic concept.) Let's call these "structured expressions."
>>
>> We have two structured expression forms, and of the three numbers in computer science, "two" is not one of them. Which suggests we are going to face this problem again some day -- whether it be "block expressions", or "if expressions", or "let expressions", or "try expressions", or whatever.
>> (NB: this call-for-bikeshed most definitely does not extend to "why not just do generalized block expressions", so please don't go there. That said, you could treat this discussion as "if Java had block expressions, what might they look like?" But we're focusing on the content of the block, not how the block is framed.)
>>
>> Let's say for sake of argument that we might someday want to extend ternary expressions to support the same kind of "restricted block expressions" as expression switches. (This is just an example for purposes of illustration, let's not get derailed on "but you should use an 'if' statement for that").
>>
>> String s = (foo != null)
>>     ? s
>>     : {
>>         println("null again at line" + __LINE__);
>>         break-with "null";
>>     };
>>
>> Such an expression needs a way to say "I'm done, here's my value", just as lambda and switch did before it. Clearly "return" is not the right thing here any more than it is for switches. And I don't think "break-with" is all that great here either! It's not terrible, but outside of a loop or switch, it starts to feel kind of forced. And it would be terrible to solve this problem twice with one-time solutions, and have no general story, and then have to come up with YET ANOTHER way of expressing the same basic concept. So regardless of what we expect for future expression forms, let's examine what our options are that are not tied to call frames (return) or direct transfer of control (switches and loops).
>>
>> Looking at what other languages have done here, there are a few broad directions:
>>
>> - A statement like "break-with v", indicating that the enclosing structured expression is completing normally with the provided value.
>> - An operator that serves the same purpose, such as "-> e".
>> - Assigning to some magic variable (this is how Pascal indicates the return value of a function).
>> - Treating the last expression in the block as the result.
>>
>> I think we can dispatch all but the first relatively easily:
>>
>> - We don't use operators for "return", we use a keyword; this would be both a gratuitous departure, as well as too easy to miss.
>> - Switch expressions don't have names, and even if we assigned to "switch", it wouldn't be obvious that we were actually terminating execution of the block.
>> - Everywhere else in the language (such as method bodies), you are free to yield up a value from the middle of the block, perhaps from within a control construct like a loop; restricting the RHS of case blocks to put their result last would be a significant new restriction, and would limit the ability to refactor to/from methods. And further, the convention of putting the result last, while a fine one for a language that is "expressions all the way down", would likely be too subtle a cue in Java.
>>
>> So, we want a keyword (or contextual keyword). In some hallway brainstorming, candidates that emerged include yield, produce, offer, offer-up, result, value-break, yield-value, provide, resulting-in, break-with, resulting, yielding, put, give, giving, ...
>>
>> (Also to keep in mind: remember we're dealing with a minority case; most of the time, there'll just be an expression on the RHS.)
>>
>> TL;DR: I think we might come to regret break-* just as we did with return -- because it won't scale to future demands we place on it, and having *three* ways to say basically the same thing in three different contexts would be embarrassing. I would like to see if we can do better.
>>
>>
>> Of the options listed here, I have a favorite: yield. (This is one of the terms we've actually been using all along when describing this feature in English.)
>>
>> There is one obvious objection to "yield", which I'd like to preemptively address: that in some languages (though not in Java, except for the infrequently-used Thread.yield()), it is associated with concurrency primitives, such as generators.
>> (This was the objection raised when yield was proposed in the context of lambdas.) But these associations are not grounded in existing Java constructs (and, the progress of Loom suggests that constructs like async/await are not coming to Java, and even if we wanted language support for generators, there are ample other ways to say it.)
>>
>> Dictionary.com lists the following meanings for yield:
>>
>> verb (used with object)
>> - to give forth or produce by a natural process or in return for cultivation:
>> - to produce or furnish (payment, profit, or interest):
>> - to give up, as to superior power or authority:
>> - to give up or over; relinquish or resign:
>> - to give as due or required:
>> - to cause; give rise to:
>>
>> verb (used without object)
>> - to give a return, as for labor expended; produce; bear.
>> - to surrender or submit, as to superior power:
>> - to give way to influence, entreaty, argument, or the like:
>> - to give place or precedence (usually followed by to):
>> - to give way to force, pressure, etc., so as to move, bend, collapse, or the like:
>>
>> These are mostly consistent with the use of "yield" as proposed here.
>>
>> One more thing to bear in mind: there is an ordering to abrupt completion mechanisms, as to how far away they can transfer control:
>>
>> - yield: can unwind only the innermost yieldable expression
>> - break/continue: can unwind multiple control constructs (for, while, switch), but stays within the method
>> - return: unwinds exactly one method
>> - throw: unwinds one or more methods
>> - System.exit: unwinds the whole VM
>>
>>
>> Bikeshed is open (but remember the bounds of this bikeshed are limited; we're talking purely about the syntax of a "stop executing this block and yield a value to the enclosing context" -- and time is ticking.)
>>
>

From john.r.rose at oracle.com Fri May 17 01:15:36 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 May 2019 18:15:36 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDDD981.3010707@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr>
 <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
 <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
 <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
 <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
 <5CDDD981.3010707@oracle.com>
Message-ID:

On May 16, 2019, at 2:43 PM, Alex Buckley wrote:
>
> If you're proposing to disallow a cast expression or a parenthesized expression after a `yield` token, then I think that's not right. The parsing of a `(` token has triggered potentially unbounded lookahead for some time [1][2], and everything worked out, so I don't see why the language should disallow any of John's examples:
>
> yield (String)("answer is "+x);
> yield ("answer is "+x).trim();
> yield new String[]{ "answer is "+x }[0];
> yield Arrays.asList("answer is "+x).get(0);
> yield false ? 0 : ("answer is "+x).trim();

Here's what's tricky: If there is a method called "yield" in scope, then one of those examples is a valid method call expression statement.

    import static MyFavYielder.yield;
    class Client extends MaybeHasYieldMethod {
      void m(int x) {
        var res = switch (x) {
          case 42 -> {
            yield ("answer is "+x).trim();
          }
          default -> -1;
        };
      }
    }

Here's one way to slice it (very thin):

The name "yield" is placed in scope in "->" blocks as if it were an inherited or imported static method. It acts like an arity-1 signature-poly method returning void. When "yield" is followed by a paren, an appeal to this method, and any other ambient methods named "yield" is made, and overloading and ambiguity analysis is done.
If after all the special sig-poly method is matched, then the compiler edits the statement into a control flow construct.

(This is circular: A control flow construct affects ambient DA/DU rules which might also indirectly affect types IIRC. So the type of the yield call maybe could circularly depend on the surrounding control flow.)

If the built-in "yield" quasi-method conflicts with a real "yield" method that is in scope and matches, then we report an ambiguity to the user. (Ambiguity? Ya think??) The user has to fix it by using a fully qualified call to the intended yield or some similar dodge. If the yield statement is desired, at worst case the user makes a temporary variable, and yields *that*.

This trickiness does tend to support a less ambiguous syntax, such as "yield -> x;" or (per Doug) "yield ^x;".

-- John

P.S. I find "yield -> x" charming partly because the arrow seems to have additional possibilities:

    if (foo) {
      yield -> {
        var x = waitASec();
        var y = OK;
        yield -> f(x, OK);
      };
    }

instead of:

    if (foo) {
      var x = waitASec();
      var y = OK;
      yield -> f(x, OK);
    }

From forax at univ-mlv.fr Fri May 17 06:30:21 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 17 May 2019 08:30:21 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
 <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
 <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
 <5CDDD981.3010707@oracle.com>
Message-ID: <841602160.1715579.1558074621687.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Alex Buckley"
> Cc: "amber-spec-experts"
> Sent: Friday, May 17, 2019 03:15:36
> Subject: Re: Call for bikeshed -- break replacement in expression switch

> On May 16, 2019, at 2:43 PM, Alex Buckley wrote:
>>
>> If you're proposing to disallow a cast expression or a parenthesized expression
>> after a `yield` token, then I think that's
>> not right. The parsing of a `(`
>> token has triggered potentially unbounded lookahead for some time [1][2], and
>> everything worked out, so I don't see why the language should disallow any of
>> John's examples:
>>
>> yield (String)("answer is "+x);
>> yield ("answer is "+x).trim();
>> yield new String[]{ "answer is "+x }[0];
>> yield Arrays.asList("answer is "+x).get(0);
>> yield false ? 0 : ("answer is "+x).trim();
>
> Here's what's tricky: If there is a method called "yield"
> in scope, then one of those examples is a valid method
> call expression statement.
>
> import static MyFavYielder.yield;
> class Client extends MaybeHasYieldMethod {
>   void m(int x) {
>     var res = switch (x) {
>       case 42 -> {
>         yield ("answer is "+x).trim();
>       }
>       default -> -1;
>     };
>   }
> }
>
> Here's one way to slice it (very thin):
>
> The name "yield" is placed in scope in "->"
> blocks as if it were an inherited or imported
> static method. It acts like an arity-1
> signature-poly method returning void.
> When "yield" is followed by a paren,
> an appeal to this method, and any
> other ambient methods named "yield"
> is made, and overloading and ambiguity
> analysis is done. If after all the special
> sig-poly method is matched, then the
> compiler edits the statement into a
> control flow construct.
>
> (This is circular: A control flow construct
> affects ambient DA/DU rules which might
> also indirectly affect types IIRC. So the type
> of the yield call maybe could circularly depend
> on the surrounding control flow.)

I would prefer a more "brutal approach" for the sake of my brain; I would like these rules to be true:
- inside a -> block, the "yield" text always means yield from that block.
- if there is no -> block (no switch expression), the compiler will not emit an error.

It works this way: at the beginning of a -> block, the compiler checks in the scope if there is a method named "yield" available (whatever the number of parameters); if there is, the compiler reports an error.
This rule is deliberately simple, so a human can understand it :)

And if there is an unqualified access to a method yield anywhere in the compilation unit, the compiler emits a warning to help users change their code to make it more readable.

[...]

>
> -- John

Rémi

From forax at univ-mlv.fr Fri May 17 06:30:48 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 17 May 2019 08:30:48 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <6C53E425-3BED-4227-93F0-5279B868D3BE@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <5CDDBBC6.6010107@oracle.com>
 <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com>
 <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com>
 <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com>
 <6C53E425-3BED-4227-93F0-5279B868D3BE@oracle.com>
Message-ID: <1134821580.1715647.1558074648213.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Guy Steele"
> To: "Maurizio Cimadamore"
> Cc: "amber-spec-experts" , "Éamonn McManus"
> Sent: Thursday, May 16, 2019 23:41:05
> Subject: Re: Call for bikeshed -- break replacement in expression switch

>> On May 16, 2019, at 5:05 PM, Maurizio Cimadamore
>> wrote:
>>
>>
>> On 16/05/2019 21:46, John Rose wrote:
>>> On May 16, 2019, at 1:34 PM, Maurizio Cimadamore
>>> wrote:
>>>> On the other hand is a trivial one to resolve, given what we're discussing now
>>>> is something like
>>>>
>>>> "yields" EXPRESSION
>>>>
>>>> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield
>>>> statement".
>>> The tricky bit with that is the user experience. What if the
>>> user needs a parenthesized expression:
>>>
>>> yield ("answer is "+x).trim();
>>>
>>> There are some sharp edges here.
>>
>> I was hoping we didn't need to go there :-)
>>
>> There are other contexts in which we limit what can be done w/r/t/ parenthesized
>> expressions (since these are ambiguous with cast to generic types).
So this >> looks like another case where the grammar has to say - sorry no parens here. > > And _that_ would very much give me pause. I would find it quite wrenching to > have a place in the language where an expression cannot be parenthesized and > have it mean exactly the same thing. > > Maybe we should go back to a hyphenated keyword. goto-with ? > > ?Guy R?mi From john.r.rose at oracle.com Fri May 17 06:41:20 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 16 May 2019 23:41:20 -0700 Subject: Call for bikeshed -- break replacement in expression switch In-Reply-To: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> Message-ID: <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> (Going back to the start of this thread.) On May 12, 2019, at 12:38 PM, Brian Goetz wrote: > > We could surely take ?break-with? and move on; it feels sufficiently ?switchy?. If "break L" breaks out of a statement introduced with "L"? Then? "break ->" could break out of a statement introduced with "->". So, consider this a plea for that bikeshed color. It's better than "break-with" because instead of "if" there's a little arrow in there. See, you're breaking from the arrow!?! 
    STARTER -> {
      stuff;
      if (early) break -> 42;
      stuff; stuff;
      break -> -1;
    }

(The ECL "true => 42" is then in Java "if (true) break -> 42;")

From forax at univ-mlv.fr Fri May 17 07:29:49 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 17 May 2019 09:29:49 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com>
Message-ID: <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Brian Goetz"
> Cc: "amber-spec-experts"
> Sent: Friday, May 17, 2019 08:41:20
> Subject: Re: Call for bikeshed -- break replacement in expression switch

> (Going back to the start of this thread.)
>
> On May 12, 2019, at 12:38 PM, Brian Goetz wrote:
>>
>> We could surely take "break-with" and move on; it feels sufficiently "switchy".
>
> If "break L" breaks out of a statement introduced with "L"...
>
> Then...
>
> "break ->" could break out of a statement introduced with "->".

It's not logical to me: it's not "L", it's "L:". If it were "break :L", I would agree.

Rémi

From dl at cs.oswego.edu Fri May 17 10:43:21 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 17 May 2019 06:43:21 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com>
 <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com>
Message-ID:

On 5/17/19 2:41 AM, John Rose wrote:
>
> "break ->" could break out of a statement introduced with "->".
>

Not bad. This is now my second choice vote. (Behind unary "^".)
-Doug

From guy.steele at oracle.com Fri May 17 12:50:21 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Fri, 17 May 2019 08:50:21 -0400
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <1134821580.1715647.1558074648213.JavaMail.zimbra@u-pem.fr>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <5CDDBBC6.6010107@oracle.com> <074BE8CF-D875-40B5-B640-E14E49A6ED18@oracle.com> <4037c6d3-e4df-9d61-5436-0aaae23ae5ab@oracle.com> <5D69C670-7320-4F00-831F-C97B2BC3C6B3@oracle.com> <6C53E425-3BED-4227-93F0-5279B868D3BE@oracle.com> <1134821580.1715647.1558074648213.JavaMail.zimbra@u-pem.fr>
Message-ID: <53F0FBFA-A31F-4B2D-8885-C90E671DE437@oracle.com>

Sent from my iPhone

> On May 17, 2019, at 2:30 AM, Remi Forax wrote:
>
> ----- Original Message -----
>> From: "Guy Steele"
>> To: "Maurizio Cimadamore"
>> Cc: "amber-spec-experts", "Éamonn McManus"
>> Sent: Thursday, May 16, 2019 23:41:05
>> Subject: Re: Call for bikeshed -- break replacement in expression switch
>
>>> On May 16, 2019, at 5:05 PM, Maurizio Cimadamore wrote:
>>>
>>>> On 16/05/2019 21:46, John Rose wrote:
>>>> On May 16, 2019, at 1:34 PM, Maurizio Cimadamore wrote:
>>>>> On the other hand is a trivial one to resolve, given what we're discussing now
>>>>> is something like
>>>>>
>>>>> "yields" EXPRESSION
>>>>>
>>>>> so, as soon as the compiler sees a "(" it will say: "ok, that's not a new yield
>>>>> statement".
>>>> The tricky bit with that is the user experience. What if the
>>>> user needs a parenthesized expression:
>>>>
>>>> yield ("answer is "+x).trim();
>>>>
>>>> There are some sharp edges here.
>>>
>>> I was hoping we didn't need to go there :-)
>>>
>>> There are other contexts in which we limit what can be done w/r/t/ parenthesized
>>> expressions (since these are ambiguous with cast to generic types). So this
>>> looks like another case where the grammar has to say - sorry no parens here.
>>
>> And _that_ would very much give me pause. I would find it quite wrenching to
>> have a place in the language where an expression cannot be parenthesized and
>> have it mean exactly the same thing.
>>
>> Maybe we should go back to a hyphenated keyword.
>
> goto-with ? throw-yield ?

From manoj.palat at in.ibm.com Fri May 17 14:55:14 2019
From: manoj.palat at in.ibm.com (Manoj Palat)
Date: Fri, 17 May 2019 20:25:14 +0530
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr>
Message-ID:

Hi,
I have a few points regarding this - since there was a flurry of mails last night/day, I have given references below to specific threads:

- As Maurizio pointed out in https://mail.openjdk.java.net/pipermail/amber-spec-experts/2019-May/001334.html, "yield" is not really a _reserved_type_identifier_ like "var" - "var" is correct only at places (at some places actually) where a type can occur.
Our view point: At parsing time "var" is just taken as a type and hence, from a compiler implementation point of view, "var" is less of a challenge than the proposed "yield". If "yield" value is used instead of "break" value, then again, the compiler needs to disambiguate - the disambiguation problem just manifests in a different avatar.

- Alex, in the discussion here https://mail.openjdk.java.net/pipermail/amber-spec-experts/2019-May/001338.html, has pointed out that "The parsing of a `(` token has triggered potentially unbounded lookahead for some time [1][2], and everything worked out, so I don't see why the language should disallow any of John's examples", where the reference [1] is "[1] See slides 9-11 from https://www.eclipsecon.org/na2014/session/jdt-embraces-lambda-expressions.html".
Our view point: However, though the problem was finally resolved for lambda, additions of new context-sensitive keywords would make our parsing more complicated, with additional logic in lookaheads. Although the problem was solved from a pure compiler perspective, we are far from winning the battle as an IDE, where one major value-add is code completion, which works on incomplete code. Due to these hacks, code completion for lambdas still has unresolved issues for us.

- An additional input to this discussion is the proposal for hyphenated keywords as described in https://openjdk.java.net/jeps/8223002. "break-with", which was the earlier proposed one, was one among these hyphenated keywords.
Our view point: We are fine with that, as mentioned on the mailing list some time earlier in the context of switch expressions and break-with, the hyphenated keyword. The more context-sensitive keywords are introduced, causing more hacks, the more difficult it would be to sustain and scale the Eclipse IDE.

- Based on the above, I believe "break-with" was a better candidate, with less or no disambiguation, and it goes along with the future direction of keywords. Here the assumption is that break-with is not context sensitive at any point in time. Given that "break-with" had opposition and "yield" was the more popular candidate, I am planning to reply with a new suggestion of the hyphenated keyword "yield-value" or any other hyphenated keyword.

Regards,
Manoj.
Eclipse Java Dev.

From forax at univ-mlv.fr Fri May 17 15:40:46 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 17 May 2019 15:40:46 +0000
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr>
Message-ID: <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>

Hi Manoj,
yield-value is not a hyphenated keyword; the left part or the right part has to be an existing keyword.

Remi

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From brian.goetz at oracle.com Fri May 17 16:57:28 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 17 May 2019 12:57:28 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

As was pointed out in Keyword Management for the Java Language (https://openjdk.java.net/jeps/8223002), contextual keywords are a compromise, and their compromises vary by lexical position. Let's take a more organized look at the costs and options for doing `yield` as a contextual keyword.

But, before we do, let's put this in context (heh): methods called yield() are rare (there's only one in the JDK), and blocks on the RHS of an arrow-switch are rare, so we're talking about the interaction of two corner cases.

Let's take the following example.

class C {
    /* 1 */ void yield(int x) { }

    void m(int y) {
        /* 2 */ yield (1);
        /* 3 */ yield 1;

        int z = switch (y) {
            case 0 -> {
                /* 4 */ yield (1);
            }
            case 1 -> {
                /* 5 */ yield 1;
            }
            default -> 42;
        };
    }
}

First, requirements:

For usage (1), this has to be a valid method declaration.

For usage (2), this has to be a method invocation.

For usage (3), this has to be some sort of compilation error.

For usage (4), there is some discussion to be had.

For usage (5), this has to be a yield statement.


(1) is not problematic, as the yield-statement production is not in play at all when parsing method declarations.

(3) is not problematic, as there is no ambiguity between method-invocation and yield-statement, and yield-statement is not allowed here. (Even if the operand were an identifier, not a numeric literal, it would not be ambiguous with a local variable declaration, because `yield` will not be permitted as a type identifier.)

(5) is not problematic, as there is no ambiguity between method invocation and yield-statement.

Let's talk about (2) and (4).

Let's assume the parser production only allows yield statement inside of a block on the RHS of an arrow-switch (and maybe some other contexts in the future, but not all blocks). Let's call these "switchy blocks" for clarity. That means that (2) is similarly unambiguous to (3), and will be parsed as a method invocation. So this is really all about (4).

OPTION A: DISALLOW YIELD (E)
----------------------------

In this option, we disallow yield statements whose argument is a parenthesized expression, instead parsing them as method invocations. Most such invocations will fail as there is unlikely to be a yield() method in scope.

From a parser perspective, this is straightforward enough; we need an alternate Expression production which omits "parenthesized expression."

From a user perspective, I think this is likely to be a sharp edge, as I would expect it to be more common to want to use a parenthesized operand than there will be a yield method in scope.

OPTION B: DISALLOW UNQUALIFIED INVOCATION
-----------------------------------------

From a parser perspective, this is similarly straightforward: inside a switchy block, give the `yield` statement rule a higher priority than method invocation. The compiler can warn on this ambiguity, if we like.

From a user perspective, users wanting to invoke yield() methods inside switchy blocks will need to qualify the receiver (Foo.yield(), this.yield(), etc).

The cost is that a statement "yield (e)" parses to different things in different contexts; in a switchy block, it is a yield statement, the rest of the time, it is a method invocation.

I think this is much less likely to cause user distress than Option A, because it is rare that there is an unqualified yield(x) method in scope. (And, given every yield() method I can think of, you'd likely never call one from a switchy block anyway, as they are side-effectful and blocking.) And in the case of collision, there is a clear workaround if the user really wanted a method invocation, and the compiler can deliver a warning when there is actual ambiguity.

OPTION C: SYMBOL-DRIVEN PARSING
-------------------------------

In this option, the context-sensitivity of parsing includes a check for whether a `yield()` method is in scope. I think we can rule this out as overly heroic; constraining parsing to be aware of the symbol table is asking a lot of compilers.

OPTION D: BOTH WAYS
-------------------

In this option, we proceed as with Option A, but when we get to symbol analysis, if we are in a switchy block and there is no yield() method in scope, we rewrite the tree to be a yield statement instead.
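
To make usages (1), (4) and (5) under Option B concrete, here is a minimal sketch, assuming a Java 14 or later compiler, where (as far as I can tell) the standardized `yield` statement behaves the way Option B describes inside switch expressions. The class `YieldDemo`, the `calls` counter and the `pick` method are hypothetical names chosen for illustration; they do not come from this thread.

```java
// Hypothetical demo class; all names here are illustrative assumptions.
class YieldDemo {
    int calls = 0;

    // Usage (1): declaring a method named yield remains legal.
    void yield(int x) {
        calls += x;
    }

    int pick(int y) {
        return switch (y) {
            case 0 -> {
                // Invoking the yield(int) method from a switchy block
                // requires an explicit receiver under Option B.
                this.yield(1);
                // Usage (4): a yield statement with a parenthesized operand.
                yield (1);
            }
            case 1 -> {
                // Usage (5): a plain yield statement.
                yield 2;
            }
            default -> 42;
        };
    }

    public static void main(String[] args) {
        YieldDemo d = new YieldDemo();
        System.out.println(d.pick(0) + " " + d.pick(1) + " " + d.pick(7));
        System.out.println("yield(int) was invoked " + d.calls + " time(s)");
    }
}
```

The qualified call `this.yield(1)` is the workaround mentioned above for users who really want the method invocation; the unqualified `yield (1);` in the same block produces the value of the switch expression.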

OPTION E: A REAL KEYWORD
------------------------

The pain above is an artifact of choosing a contextual keyword; on the scale of contextual pain, this rates a "mild", largely because true collisions are likely to be quite rare, and there is no backward compatibility concern. So while choosing a real keyword (break-with) would be cleaner, I don't think the users will like it as much.


My opinions: I think C is pretty much a non-starter, and IMO B is measurably more attractive than A. Option D is not as terrible as C but seems overly heroic, as we try to avoid tree-rewriting in attribution. I don't think the pain of either A or B merits grabbing for E.

From forax at univ-mlv.fr Fri May 17 17:35:02 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 17 May 2019 17:35:02 +0000
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID: <1F45AF9F-4CDA-45FE-8CDA-50BEAF5B5A5A@univ-mlv.fr>

Thanks for providing a clear view of our options.
I vote for B.

I will add that, obviously, no existing code has a switchy block that contains an unqualified yield, so the compiler should emit an error instead of a warning if there is an unqualified yield in the scope of the switchy block.

Remi

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From daniel.smith at oracle.com Fri May 17 18:21:12 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 17 May 2019 12:21:12 -0600
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <197AA4CD-F757-480C-8DB6-49860FB342E8@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <1088337931.1496052.1558006893704.JavaMail.zimbra@u-pem.fr> <65F7DBFE-4215-4F6A-81B8-70D8F5B402D0@oracle.com> <5CDDBBC6.6010107@oracle.com> <811BEAF1-3F9C-4119-AC0F-2A13DEC23574@oracle.com> <197AA4CD-F757-480C-8DB6-49860FB342E8@oracle.com>
Message-ID: <3B2D6582-0508-4332-9857-1B709CF43C2A@oracle.com>

> On May 16, 2019, at 2:28 PM, John Rose wrote:
>
> Chapter 1. Some constructs have arrows.
> They define when the arrow bodies are executed,
> and, if the arrow gets tossed a value, what
> is done with that value (method return?
> switch result? block result? depends on
> where the arrow is).
>
> Chapter 2.
> Every yield matches an innermost
> arrow, and every arrow (in a non-void T context)
> accepts a yielded value (of type T).

Reminder: the statement can be used in two contexts: arrow-based switch rules and label-based switch blocks.

int i = switch (foo) {
    case "a" -> { yield 1; }
    case "b" -> { yield 2; }
    default -> { yield 0; }
};

int j = switch (foo) {
    case "a": yield 1;
    case "b": yield 2;
    default: yield 0;
};

From guy.steele at oracle.com Fri May 17 18:41:41 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Fri, 17 May 2019 14:41:41 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

I (somewhat reluctantly, but with an appreciation for the pragmatics of the situation) support option B.

-Guy

From alex.buckley at oracle.com Fri May 17 21:56:14 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Fri, 17 May 2019 14:56:14 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID: <5CDF2DFE.6030901@oracle.com>

Correction: `yield-value` is a hyphenated keyword. Specifically, a hyphenated contextual keyword, where each term is itself a unitary contextual keyword. This is discussed, with examples, in the JEP (https://openjdk.java.net/jeps/8223002).

Introducing `yield-value` as a hyphenated contextual keyword doesn't buy you much. Both `yield` and `value` would tokenize as identifiers everywhere, so that you can keep on subtracting your `value` variable and the result of your `value` method:

`int x = yield-value +y;`
`int x = yield -value(x);`

So, recognizing a hyphenated contextual keyword `yield-value` would still require careful reasoning about context, about as much as we're doing to recognize a unitary contextual keyword `yield`.

Alex

On 5/17/2019 8:40 AM, Remi Forax wrote:
> Hi Manoj,
> yield-value is not a hyphenated keyword; the left part or the right part
> has to be an existing keyword.
>
> Remi

From john.r.rose at oracle.com Fri May 17 22:13:15 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 17 May 2019 15:13:15 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDF2DFE.6030901@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <5CDF2DFE.6030901@oracle.com>
Message-ID: <2BF3C2F1-BFD2-4D1A-AE39-D7AE053FD91B@oracle.com>

On May 17, 2019, at 2:56 PM, Alex Buckley wrote:
>
> So, recognizing a hyphenated contextual keyword `yield-value` would still require careful reasoning about context, about as much as we're doing to recognize a unitary contextual keyword `yield`.

Much less so than the rules either Brian or I sketched. It's a statement, not an expression. And no expression statement begins with ID - ID, right? It's not as ambiguous as ID (.
From forax at univ-mlv.fr Fri May 17 22:38:40 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 18 May 2019 00:38:40 +0200 (CEST)
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDF2DFE.6030901@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <5CDF2DFE.6030901@oracle.com>
Message-ID: <2012306974.2107888.1558132720266.JavaMail.zimbra@u-pem.fr>

Thanks, Alex,
I stand corrected.

Rémi

----- Original Message -----
> From: "Alex Buckley"
> To: "amber-spec-experts"
> Sent: Friday, May 17, 2019 23:56:14
> Subject: Re: Call for bikeshed -- break replacement in expression switch

> Correction: `yield-value` is a hyphenated keyword. Specifically, a
> hyphenated contextual keyword, where each term is itself a unitary
> contextual keyword. This is discussed, with examples, in the JEP
> (https://openjdk.java.net/jeps/8223002).
>
> Introducing `yield-value` as a hyphenated contextual keyword doesn't buy
> you much. Both `yield` and `value` would tokenize as identifiers
> everywhere, so that you can keep on subtracting your `value` variable
> and the result of your `value` method:
>
> `int x = yield-value +y;`
> `int x = yield -value(x);`
>
> So, recognizing a hyphenated contextual keyword `yield-value` would
> still require careful reasoning about context, about as much as we're
> doing to recognize a unitary contextual keyword `yield`.
>
> Alex
>
> On 5/17/2019 8:40 AM, Remi Forax wrote:
>> Hi Manoj,
>> yield-value is not a hyphenated keyword; the left part or the right
>> part has to be an existing keyword.
>>
>> Remi

From alex.buckley at oracle.com Fri May 17 23:06:06 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Fri, 17 May 2019 16:06:06 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <2BF3C2F1-BFD2-4D1A-AE39-D7AE053FD91B@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <5CDF2DFE.6030901@oracle.com> <2BF3C2F1-BFD2-4D1A-AE39-D7AE053FD91B@oracle.com>
Message-ID: <5CDF3E5E.4060306@oracle.com>

On 5/17/2019 3:13 PM, John Rose wrote:
> On May 17, 2019, at 2:56 PM, Alex Buckley wrote:
>>
>> So, recognizing a hyphenated contextual keyword `yield-value` would
>> still require careful reasoning about context, about as much as
>> we're doing to recognize a unitary contextual keyword `yield`.
>
> Much less so than the rules either Brian or I sketched. It's a
> statement, not an expression. And no expression statement begins with
> ID - ID right? It's not as ambiguous as ID (.

A single-expression lambda body has the flavor of a statement form,
though yes, it has no `;` of its own and is not parsed as
ExpressionStatement.

`map( y -> yield-value +y );`

So, I agree that parsing `yield-value (1);` as a YieldStatement in a
SwitchLabeledBlock does not have the ambiguity of parsing `yield (1);`
as a YieldStatement|ExpressionStatement in a SwitchLabeledBlock ... but
the decision still has to be taken about whether "in a
SwitchLabeledBlock" or "in " is the proper context to recognize
something new.

Alex

From manoj.palat at in.ibm.com Mon May 20 14:47:57 2019
From: manoj.palat at in.ibm.com (Manoj Palat)
Date: Mon, 20 May 2019 20:17:57 +0530
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

I would vote for option E - a real keyword: break-with.

Regards,
Manoj

From: Guy Steele
To: Brian Goetz
Cc: amber-spec-experts
Date: 05/18/2019 12:11 AM
Subject: [EXTERNAL] Re: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
Sent by: "amber-spec-experts"

I (somewhat reluctantly, but with an appreciation for the pragmatics of
the situation) support option B.

--Guy

On May 17, 2019, at 12:57 PM, Brian Goetz wrote:

As was pointed out in Keyword Management for the Java Language
(https://openjdk.java.net/jeps/8223002), contextual keywords are a
compromise, and their compromises vary by lexical position. Let's take
a more organized look at the costs and options for doing `yield` as a
contextual keyword.

But, before we do, let's put this in context (heh): methods called
yield() are rare (there's only one in the JDK), and blocks on the RHS
of an arrow-switch are rare, so we're talking about the interaction of
two corner cases.

Let's take the following example.

class C {
    /* 1 */ void yield(int x) { }

    void m(int y) {
        /* 2 */ yield (1);
        /* 3 */ yield 1;

        int z = switch (y) {
            case 0 -> {
                /* 4 */ yield (1);
            }
            case 1 -> {
                /* 5 */ yield 1;
            }
            default -> 42;
        };
    }
}

First, requirements:

For usage (1), this has to be a valid method declaration.

For usage (2), this has to be a method invocation.

For usage (3), this has to be some sort of compilation error.

For usage (4), there is some discussion to be had.

For usage (5), this has to be a yield statement.

(1) is not problematic, as the yield-statement production is not in
play at all when parsing method declarations.

(3) is not problematic, as there is no ambiguity between
method-invocation and yield-statement, and yield-statement is not
allowed here. (Even if the operand were an identifier, not a numeric
literal, it would not be ambiguous with a local variable declaration,
because `yield` will not be permitted as a type identifier.)

(5) is not problematic, as there is no ambiguity between method
invocation and yield-statement.

Let's talk about (2) and (4).

Let's assume the parser production only allows a yield statement inside
a block on the RHS of an arrow-switch (and maybe some other contexts in
the future, but not all blocks). Let's call these "switchy blocks" for
clarity. That means that (2) is similarly unambiguous to (3), and will
be parsed as a method invocation. So this is really all about (4).

OPTION A: DISALLOW YIELD (E)
----------------------------

In this option, we disallow yield statements whose argument is a
parenthesized expression, instead parsing them as method invocations.
Most such invocations will fail, as there is unlikely to be a yield()
method in scope.

From a parser perspective, this is straightforward enough; we need an
alternate Expression production which omits "parenthesized expression."

From a user perspective, I think this is likely to be a sharp edge, as
I would expect it to be more common to want to use a parenthesized
operand than there will be a yield method in scope.

OPTION B: DISALLOW UNQUALIFIED INVOCATION
-----------------------------------------

From a parser perspective, this is similarly straightforward: inside a
switchy block, give the rule `yield ` a higher priority than method
invocation. The compiler can warn on this ambiguity, if we like.

From a user perspective, users wanting to invoke yield() methods inside
switchy blocks will need to qualify the receiver (Foo.yield(),
this.yield(), etc).

The cost is that a statement "yield (e)" parses to different things in
different contexts; in a switchy block, it is a yield statement, the
rest of the time, it is a method invocation.

I think this is much less likely to cause user distress than Option A,
because it is rare that there is an unqualified yield(x) method in
scope. (And, given every yield() method I can think of, you'd likely
never call one from a switchy block anyway, as they are side-effectful
and blocking.) And in the case of collision, there is a clear
workaround if the user really wanted a method invocation, and the
compiler can deliver a warning when there is actual ambiguity.

OPTION C: SYMBOL-DRIVEN PARSING
-------------------------------

In this option, the context-sensitivity of parsing includes a check for
whether a `yield()` method is in scope. I think we can rule this out as
overly heroic; constraining parsing to be aware of the symbol table is
asking a lot of compilers.

OPTION D: BOTH WAYS
-------------------

In this option, we proceed as with Option A, but when we get to symbol
analysis, if we are in a switchy block and there is no yield() method
in scope, we rewrite the tree to be a yield statement instead.

OPTION E: A REAL KEYWORD
------------------------

The pain above is an artifact of choosing a contextual keyword; on the
scale of contextual pain, this rates a "mild", largely because true
collisions are likely to be quite rare, and there is no backward
compatibility concern. So while choosing a real keyword (break-with)
would be cleaner, I don't think the users will like it as much.

My opinions: I think C is pretty much a non-starter, and IMO B is
measurably more attractive than A. Option D is not as terrible as C but
seems overly heroic, as we try to avoid tree-rewriting in attribution.
I don't think the pain of either A or B merits grabbing for E.

From amaembo at gmail.com Mon May 20 15:24:25 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Mon, 20 May 2019 22:24:25 +0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

Hello!

Assuming that we have agreed on 'yield', option B seems the most
attractive. A big No to a context-specific parse tree. It's a complete
pain for IDEs. Don't forget that an IDE often deals with incomplete
code, missing dependencies, etc., and still needs to provide reasonable
highlighting and completion. Imagine that a 'yield' method is available
via import static Foo.* or a superclass. In this case we don't want to
look into other files to build a correct parse tree.

With best regards,
Tagir Valeev.

From guy.steele at oracle.com Mon May 20 16:06:34 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 20 May 2019 12:06:34 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

I am becoming more worried again about the consequences of using a
contextual keyword such as "yield", especially for IDEs.

It seems to me that this whole discussion kicked off with a discussion
of `yield` as being desirable because, by analogy with `return` being
the One True Way to return a value from a stack frame for an
invocation, it might be desirable to have One True Way to return a
value from a statement-like expression that did not create a stack
frame.
So if in future we might have a number of other expressions from which
we want to return a value, we might want to pick a keyword not
specifically associated with `switch`.

But I now think that this argument is weak, because I believe it is in
fact unlikely that there will be a large number of other statement
types that we will want to turn into expressions.

* Yes, we may want block expressions.
* We would want `if`-statement expressions _except_ that we already
  have the ternary operator ? :, so I don't really think there is a
  pressing need.
* As a Common Lisp programmer, I can see the value of having a `for`
  loop produce a value. But I very much doubt that Joe Java-programmer
  wants that.
* And if we don't want values from `for` loops, I doubt there will be
  much demand for values from `while` loops or `try` statements.

So I really do think that ? : and `switch` expressions and blocks
really are it. Sometimes there really *are* only two or three things of
real interest, even if one could theoretically dream up more.

And if I am wrong, there is still a way out for the rare cases: if we
have a way to do a non-local yield from a block, you can wrap that kind
of block around any statement you like, just as you can `break` from
any kind of statement by wrapping a labeled-statement around it (that
is, by giving it a label).

So I am now leaning toward either `break-with` (or `switch-yield`, but
I think `break-with` is better). And if we ever do expression blocks,
then we could have a special thing for returning from them. In fact, I
have a suggestion:

    block expression            ({ statements; expression })
    labeled block expression    (label: { statements; expression })
    nonlocal block yield        break label: expression;

In fact, if we had those, we wouldn't need a special way to yield a
value from a switch expression; we could just write

    (label: switch { . . . ; case 43: { printf("foo"); break label: 96; } . . . })

which in turn suggests that we could omit the label from the `switch`
statement and the corresponding `break` statement if we wanted -- in
other words, under this theory a plausible spelling of `break-with` is
`break:`. :-)

--Guy

From brian.goetz at oracle.com Mon May 20 16:16:23 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 20 May 2019 12:16:23 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

> But I now think that this argument is weak, because I believe it is in
> fact unlikely that there will be a large number of other statement
> types that we will want to turn into expressions.
>
> * Yes, we may want block expressions.
> * We would want `if`-statement expressions _except_ that we already
>   have the ternary operator ? :, so I don't really think there is a
>   pressing need.
> * As a Common Lisp programmer, I can see the value of having a `for`
>   loop produce a value. But I very much doubt that Joe Java-programmer
>   wants that.
> * And if we don't want values from `for` loops, I doubt there will be
>   much demand for values from `while` loops or `try` statements.

While I agree about if / for / while, I'm not completely convinced on
`try`; the notion of "produce a value or throw" is a common one, and
has all the same advantages over a try statement that a switch
expression has over a switch statement -- that initialization of
variables with side-effectful statements is messier and more
error-prone than with expressions. So, for example, the
not-so-uncommon idiom:

    static final Foo foo;
    static {
        try {
            foo = new Foo();
        }
        catch (FooException fe) {
            throw new OtherKindOfException(fe);
        }
    }

leaves, well, room for improvement.
Let's just register that as something we might want to do someday.

Another form of expression that might have statements in it is some
sort of "let" or "with" expression; it is not uncommon to have to
execute statements to produce a result:

    Foo f = new Foo();
    f.setBar(1);
    return f;

which similarly would like to be replaced with an expression. (Let's
not design this here; the point is that this is a road we might well
want to walk again some day.)

From guy.steele at oracle.com Mon May 20 16:33:28 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 20 May 2019 12:33:28 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

> On May 20, 2019, at 12:16 PM, Brian Goetz wrote:
>
>> But I now think that this argument is weak, because I believe it is
>> in fact unlikely that there will be a large number of other statement
>> types that we will want to turn into expressions.
>>
>> * Yes, we may want block expressions.
>> * We would want `if`-statement expressions _except_ that we already
>>   have the ternary operator ? :, so I don't really think there is a
>>   pressing need.
>> * As a Common Lisp programmer, I can see the value of having a `for`
>>   loop produce a value. But I very much doubt that Joe
>>   Java-programmer wants that.
>> * And if we don't want values from `for` loops, I doubt there will be
>>   much demand for values from `while` loops or `try` statements.
>
> While I agree about if / for / while, I'm not completely convinced on
> `try`; the notion of "produce a value or throw" is a common one, and
> has all the same advantages over a try statement that a switch
> expression has over a switch statement -- that initialization of
> variables with side-effectful statements is messier and more
> error-prone than with expressions. So, for example, the
> not-so-uncommon idiom:
>
>     static final Foo foo;
>     static {
>         try {
>             foo = new Foo();
>         }
>         catch (FooException fe) {
>             throw new OtherKindOfException(fe);
>         }
>     }
>
> leaves, well, room for improvement. Let's just register that as
> something we might want to do someday.

Sure, I concede that I was hasty on this one.

> Another form of expression that might have statements in it is some
> sort of "let" or "with" expression; it is not uncommon to have to
> execute statements to produce a result:
>
>     Foo f = new Foo();
>     f.setBar(1);
>     return f;
>
> which similarly would like to be replaced with an expression. (Let's
> not design this here; the point is that this is a road we might well
> want to walk again some day.)

But that's easily handled by a block expression. (Java doesn't already
have a `let` or `with` statement precisely because being able to
intersperse declarations with statements in a block does that job quite
nicely.)

So, for fun, I'm going to taste-test these two excellent examples.
Offhand I see no problem with using the `try` keyword in the middle of
an expression, just as we did for `switch`, so:

    static final Foo foo =
        try { break: new Foo(); }
        catch (FooException fe) { throw new OtherKindOfException(fe); };

Perhaps more interesting is returning a default value if a FooException
occurs -- and maybe we can tighten up the main body by using
parentheses:

    static final Foo foo =
        try (new Foo())
        catch (FooException fe) { break: myDefaultFoo; };

and maybe even the `catch` body could be tightened up in the same way:

    static final Foo foo =
        try (new Foo())
        catch (FooException fe) (myDefaultFoo);

As for the other one:

    ({ Foo f = new Foo(); f.setBar(1); f })

looks pretty good to me.

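For comparison, here is a minimal sketch of how the same two intents can be expressed without any new syntax, by moving the statements into a helper method so that the field is still initialized by a single expression. The type names `Foo`, `FooException`, `OtherKindOfException` and the field `myDefaultFoo` are taken from the examples above, but their bodies, the `int` constructor argument, and the helper names `makeOrWrap`, `makeOrDefault` and `configured` are assumptions made up purely for illustration.

```java
public class TryValueSketch {
    // Hypothetical stand-ins for the types named in the examples above.
    static class FooException extends Exception { }

    static class OtherKindOfException extends RuntimeException {
        OtherKindOfException(Throwable cause) { super(cause); }
    }

    static class Foo {
        private int bar;

        // Fails for a negative argument so that both outcomes can be exercised.
        Foo(int initial) throws FooException {
            if (initial < 0) throw new FooException();
            this.bar = initial;
        }

        void setBar(int bar) { this.bar = bar; }
        int getBar() { return bar; }
    }

    // "Produce a value or throw": the method body plays the role of a try expression.
    static Foo makeOrWrap(int initial) {
        try {
            return new Foo(initial);
        } catch (FooException fe) {
            throw new OtherKindOfException(fe);
        }
    }

    // "Produce a value or fall back to a default."
    static Foo makeOrDefault(int initial, Foo fallback) {
        try {
            return new Foo(initial);
        } catch (FooException fe) {
            return fallback;
        }
    }

    // The block-expression example: statements first, then the result.
    static Foo configured() {
        Foo f = makeOrWrap(0);
        f.setBar(1);
        return f;
    }

    // Static fields are initialized in textual order, so the default exists first.
    static final Foo myDefaultFoo = makeOrWrap(7);
    static final Foo foo = makeOrDefault(-1, myDefaultFoo);

    public static void main(String[] args) {
        System.out.println(foo == myDefaultFoo);    // the fallback was used
        System.out.println(configured().getBar());
    }
}
```

The helper-method form keeps `foo` definitely assigned in one place, at the cost of a named method per initializer, which is the overhead an expression form would remove.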
None of this is to make an actual proposal that this is best or even
good, or that we should do it soon; it's just to see whether there is
something plausible available.

From john.r.rose at oracle.com Mon May 20 17:02:52 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 20 May 2019 10:02:52 -0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

On May 20, 2019, at 9:06 AM, Guy Steele wrote:
>
> under this theory a plausible spelling of `break-with` is `break:`

FWIW I'd be very very happy with the look/feel of `break: x;`.

I'd be a little happier with `break->x` because I'm used to reading
expressions `x` in `->x` but not so much `:x`. Though there is `p?y:x`,
for that matter.

Dan, I hear you about block-switches having `:` instead of `->`. I
guess that's a bit of support for Guy's proposal, then. I still think
`->` will be a bit more readable, though I may be falling for the "make
new syntax STaNd *oUT*" syndrome.

I've always thought `break` serviceable for the purpose we are
discussing, and even applicable to new statement types, especially if
strengthened with a label, as you point out, Guy. I'm very happy to go
back to `break`.

-- John

From dl at cs.oswego.edu Mon May 20 17:36:30 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 20 May 2019 13:36:30 -0400
Subject: Yield as contextual keyword
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID: <94ee8d85-d597-6f81-cee1-c8e68e7f98fc@cs.oswego.edu>

On 5/20/19 1:02 PM, John Rose wrote:
> On May 20, 2019, at 9:06 AM, Guy Steele wrote:
>>
>> under this theory a plausible spelling of `break-with` is `break:`
>
> FWIW I'd be very very happy with the look/feel of `break: x;`.
>

Almost the same here. Subtract a few verys, keep the happy.

-Doug

From brian.goetz at oracle.com Mon May 20 19:51:37 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 20 May 2019 15:51:37 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

Clarification on Option (B): the rational thing to do is probably not
to restrict this behavior to switchy blocks, because there are other
kinds of statement contexts that will be embedded in switchy blocks,
such as:

    case L -> {
        if (cond)
            yield e;
    }

And surely we want to treat this as a yield (in fact, the inability to
do this in loops was one of the reasons we rejected `break` in the
first place). Which means we always parse `yield ` as a yield
statement, and if someone happens to call an _unqualified_ unary method
called yield, they get an error (with a helpful suggestion that they
might want to try a qualified yield).

So, to quantify the cost of a contextual keyword here: invocations of
the form yield(e) will be parsed as statements, though qualified
invocations (this.yield(e), Thread.yield(e), etc) will be parsed as
method invocations. The cost-benefit analysis rests on the assumption
that this will bite exceedingly rarely, and when it does, the
workaround will be clear and easy.

> OPTION B: DISALLOW UNQUALIFIED INVOCATION
> -----------------------------------------
>
> From a parser perspective, this is similarly straightforward: inside a
> switchy block, give the rule `yield ` a higher priority than method
> invocation. The compiler can warn on this ambiguity, if we like.
>
> From a user perspective, users wanting to invoke yield() methods
> inside switchy blocks will need to qualify the receiver (Foo.yield(),
> this.yield(), etc).
>
> The cost is that a statement "yield (e)" parses to different things in
> different contexts; in a switchy block, it is a yield statement, the
> rest of the time, it is a method invocation.
>
> I think this is much less likely to cause user distress than Option A,
> because it is rare that there is an unqualified yield(x) method in
> scope. (And, given every yield() method I can think of, you'd likely
> never call one from a switchy block anyway, as they are side-effectful
> and blocking.) And in the case of collision, there is a clear
> workaround if the user really wanted a method invocation, and the
> compiler can deliver a warning when there is actual ambiguity.

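The rule in the clarification above can be illustrated with a short sketch. It assumes a compiler that implements switch expressions with a `yield` statement (Java 14 or later); the class name `YieldSketch` and the method `classify` are made up for illustration and do not come from the thread. It shows three things: a method named `yield` can still be declared, a `yield` nested inside an `if` still completes the enclosing switch expression, and a method named `yield` is reached by qualifying the call.

```java
public class YieldSketch {
    // Declaring a method named yield remains legal (usage 1 in the example).
    static int yield(int x) {
        return x + 100;
    }

    static int classify(int y, boolean cond) {
        return switch (y) {
            case 0 -> {
                // Plain yield statement (usage 5 in the example).
                yield 1;
            }
            case 1 -> {
                // A yield nested in another statement still targets the switch.
                if (cond)
                    yield 2;
                yield 3;
            }
            case 2 -> {
                // Qualifying the name makes this a method invocation.
                int viaMethod = YieldSketch.yield(1);
                yield viaMethod;
            }
            default -> 42;
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(0, false)); // 1
        System.out.println(classify(1, true));  // 2
        System.out.println(classify(1, false)); // 3
        System.out.println(classify(2, false)); // 101
        System.out.println(classify(9, false)); // 42
    }
}
```

The qualified call in `case 2` is the workaround the message describes: the receiver or type name in front of `yield` is what keeps it out of the yield-statement production.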
From john.r.rose at oracle.com Mon May 20 19:52:23 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 20 May 2019 12:52:23 -0700 Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> Message-ID: <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> On May 20, 2019, at 8:24 AM, Tagir Valeev wrote: > > Assuming that we agreed on 'yield' the option B seems the most attractive. A big No to context-specific parse tree. It's a complete pain to IDEs. Don't forget that IDE often deals with incomplete code, missing dependencies, etc., and still needs to provide reasonable highlighting and completion. Imagine that 'yield' method is available via import static Foo.* or superclass. In this case we don't want to look into other files to build a correct parse tree. So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("? I suppose that would work. It's hard to predict what that would feel like, but it's logical. ? 
John

From john.r.rose at oracle.com Mon May 20 20:00:43 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 20 May 2019 13:00:43 -0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID: <687FA694-B044-4BF0-8D37-028159B3FCF3@oracle.com>

On May 20, 2019, at 12:51 PM, Brian Goetz wrote:
>
> The cost-benefit analysis rests on the assumption that this will bite exceedingly rarely, and when it does, the workaround will be clear and easy.

OK, so let's road-test "yield".

While "break: x" is syntactically safer than "yield x", I buy your argument. And "yield x" is easier to explain to users. We all know why ":" is necessary in "break: x" but users will need explanations about the colon where "yield x" will just work for them.

There's always a trade-off between precision and concision. Concise formulations are inherently ambiguous, just because there are a limited number of length-N strings and each can be given only one semantic pigeonhole. Adding a new keyword lets us occupy new pigeonholes with the same number of tokens.

So, while "break: x" would please me as a syntax geek, "yield x" (if it really works) will please me as a user. We should try it if we think we can make it work.

--
John

From guy.steele at oracle.com Mon May 20 20:10:00 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 20 May 2019 16:10:00 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

Okay, with this more detailed explanation and understanding, I am back in the `yield` camp. But I'm glad to have explored that side path just a bit.

From john.r.rose at oracle.com Mon May 20 20:33:01 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 20 May 2019 13:33:01 -0700
Subject: Call for bikeshed -- break replacement in expression switch
In-Reply-To: <5CDF3E5E.4060306@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <5CDF2DFE.6030901@oracle.com> <2BF3C2F1-BFD2-4D1A-AE39-D7AE053FD91B@oracle.com> <5CDF3E5E.4060306@oracle.com>
Message-ID: <5B92992E-8666-4477-9908-2F2E1C60FAE3@oracle.com>

On May 17, 2019, at 4:06 PM, Alex Buckley wrote:
>
> the decision still has to be taken about whether "in a SwitchLabeledBlock" or "in " is the proper context to recognize something new.

From Tagir's note it seems we want to use less context than "in a SLB", because IDEs (and human eyes) won't always have enough enclosing context to classify the token.
It also seems like it would be workable to classify "yield" as a keyword if it occurs at the beginning of a statement. This is almost exactly the same place that "var" is classified as a conditional keyword: The place in a block where we might be parsing either an expression-statement or a declaration or a block-structured sub-statement.

So the rule for "yield" might be that it is classified as a keyword exactly where "var" would be. If it were that easy, that would be nice, since we have a precedent.

-- John

From alex.buckley at oracle.com Mon May 20 23:08:02 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Mon, 20 May 2019 16:08:02 -0700
Subject: Draft language spec for JEP 355: Text Blocks
Message-ID: <5CE33352.4030008@oracle.com>

Please see http://cr.openjdk.java.net/~abuckley/jep355/text-blocks-jls.html for JLS changes that align with the JEP.

Text blocks compile to the same class file construct as string literals, namely CONSTANT_String_info entries in the constant pool. Helpfully, the JVMS is already agnostic about the origin of a CONSTANT_String_info, making no reference to "string literals". Therefore, there are no JVMS changes for text blocks, save for a tiny clarification w.r.t. annotation elements.

Alex

From brian.goetz at oracle.com Mon May 20 23:46:06 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 20 May 2019 19:46:06 -0400
Subject: Draft language spec for JEP 355: Text Blocks
In-Reply-To: <5CE33352.4030008@oracle.com>
References: <5CE33352.4030008@oracle.com>
Message-ID:

I wonder if we want to be cagey about committing to interning, which is another way to say we must translate to a constant string info. In the future, alternate condy-based representations may seem desirable and we don't want to be painted into a translation by overspecification.
Sent from my iPad

From alex.buckley at oracle.com Tue May 21 00:57:21 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Mon, 20 May 2019 17:57:21 -0700
Subject: Draft language spec for JEP 355: Text Blocks
In-Reply-To:
References: <5CE33352.4030008@oracle.com>
Message-ID: <5CE34CF1.7000005@oracle.com>

We already know the migration incompatibility of how:

  "SELECT ..." +
  "FROM ..." +
  "WHERE ..."

is not ever equals() to:

  """
  SELECT ...
  FROM ...
  WHERE ..."""

because of the extra line terminators in the string derived from the text block. There will be a further migration incompatibility if:

  """
  Hello world"""

is not always == to:

  "Hello world"

because of the lack of guaranteed string interning.

Are you saying that the freedom to compile text blocks as dynamically-computed constants (rather than as static constants; see JVMS12 5.1) is more important than the space savings and identity guarantees from interning? I understand that starting off loose allows tightening later, but the loose behavior is significant.

Alex

From amaembo at gmail.com Tue May 21 03:36:08 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Tue, 21 May 2019 10:36:08 +0700
Subject: Trailing white-space in text blocks
Message-ID:

Hello!

JEP 355 [1] says the following:

> Remove all trailing white space from all lines in the modified list of individual lines from step 5. (Hidden white space at the end of lines is unintentional, so it is overwhelmingly likely that the developer does not want it in the string.) Note that this step collapses wholly-whitespace lines in the modified list so that they are empty, but does not discard them.

I'm not sure this is a good idea. Consider the following quite possible user story:

Suppose I wrote a method like this (just to illustrate the problem, exact method details don't matter):

  public static String formatList(List<String> list, int maxWidth) {
    var result = new StringBuilder();
    int currentCount = 0;
    for (String s : list) {
      if (currentCount > 0 && currentCount + s.length() + 1 > maxWidth) {
        result.append("\n");
        currentCount = 0;
      }
      result.append(s).append(" ");
      currentCount += s.length() + 1;
    }
    return result.toString();
  }

The method joins list of strings wrapping the result if it exceeds the supplied maxWidth. Note that it adds a whitespace before the linebreak.
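To make the trailing space concrete, here is a small sketch that runs the method on the list used in the unit test later in this message and compares the result with a copy whose line-final spaces were removed. `stripTrailing` is a hypothetical stand-in for the stripping step the JEP describes, written with a regular expression; it is not the compiler's actual algorithm, and `TrailingSpaceDemo` is only an illustrative class name.

```java
import java.util.List;

public class TrailingSpaceDemo {
    // The method from the message, with the element type written out.
    static String formatList(List<String> list, int maxWidth) {
        var result = new StringBuilder();
        int currentCount = 0;
        for (String s : list) {
            if (currentCount > 0 && currentCount + s.length() + 1 > maxWidth) {
                result.append("\n");
                currentCount = 0;
            }
            result.append(s).append(" ");
            currentCount += s.length() + 1;
        }
        return result.toString();
    }

    // Hypothetical stand-in for trailing white space removal:
    // drops spaces and tabs that sit directly before a newline or the end of the string.
    static String stripTrailing(String s) {
        return s.replaceAll("[ \\t]+(?=\\n|$)", "");
    }

    public static void main(String[] args) {
        String actual = formatList(
                List.of("One", "Two", "Three", "Four", "Five", "Six", "Seven"), 10);
        String stripped = stripTrailing(actual);
        // Print with '.' in place of spaces so the line-final ones become visible.
        System.out.println(actual.replace(' ', '.'));
        System.out.println("equal after stripping: " + actual.equals(stripped));
    }
}
```

The two strings look identical in a diff view that does not mark white space, yet `equals` reports them as different, which is the situation the rest of this message walks through.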
Probably it's accidental and could be removed, but it's also possible that the method exists for a long time and somebody already relies on these trailing whitespaces, so probably I'm not in the position to modify the method.

Ok, we have a method, time to unit-test it. I lazily write:

  public void testFormatList() {
    String actual = StringUtil.formatList(List.of("One", "Two", "Three", "Four", "Five", "Six", "Seven"), 10);
    assertEquals("", actual);
  }

Run the unit-test, it predictably fails and shows the difference between expected and actual. Using IDE diff view I examine the actual text, it looks like this:

One Two
Three
Four Five
Six Seven

I confirm that I like it, so I paste it to the expected parameter into my unit-test changing the assertion to

  assertEquals("""
    One Two
    Three
    Four Five
    Six Seven
    """, actual);

It doesn't matter whether we add indent on the left or not. Looks pretty, but the test still fails. Closer examination of the output shows that expected code doesn't contain the trailing white-space while the actual output contains. Usually people are not very attentive to small details. Assume that I've missed the part about trailing whitespace stripping (or I learned this feature from online tutorial where this detail wasn't mentioned). Now I'm in complete confusion. I just copied the actual to the expected, but lost my whitespaces. What happened?

- Probably my diff viewer was too smart at the copy action and stripped the whitespaces?
- Probably my editor was too smart at the paste action and stripped the whitespaces?
- Probably my editor was too smart during the file save operation and did additional cleanup which resulted in whitespace stripping? IDEA can actually do this, and it's a question whether this will affect multiline literals as well.
- Probably compiler somehow stripped it (that's actually the case)
- Probably I don't know something about assertEquals method? Probably it's buggy and strips whitespaces in one case but keeps them in another?
  I saw similar problem before with some custom assertion method.
- Probably I misinterpreted the output of test failure and problem is not in whitespaces? The diff viewer of my IDE just highlights the single whitespace at the end of the every line, but I'm not sure, probably this highlighting means that something wrong with line separators?

Things could be much worse if I was too lazy to rerun the test locally (what for? I just copied actual to expected! What could go wrong? Come on, just commit this and move to the next task). Then CI build fails and I have more possibilities:

- Probably Git commit does something special with trailing whitespaces (e.g. we have pre-commit hook)
- Probably CI pulls changes in some special way
- Probably CI has different environment (JVM version, OS, filesystem) and this causes failure? Also CI output could be not very clear when the difference is in whitespaces only, and again I'm not sure what's going on.

So I have many things to check. If I'm experienced, I would probably open the .java file in hex viewer and check the actual bytes to ensure that the whitespaces are there, thus first three cases are ruled out, then probably read the spec. Less experienced developer would be completely lost.

Nevertheless after an hour or two, probably asking a colleague to help I will find the cause: the problem is really caused by the compiler. So I will spend another hour to find how to work-around it. I will see that escapes are handled after whitespace removal, thus will try to replace trailing spaces with \u0020 (very few developers are aware that it's not an actual string literal escape sequence). I will probably consult StackOverflow and find nothing. At the end I will give up and return to good old plain string literal. Not very productive day.

I'm not sure how to avoid such scenario if trailing whitespaces are actually stripped.
I can think up some solutions how IDE could help pointing at the actual problem cause, but not every IDE could be smart enough to help user in such scenario.

I think that trailing whitespace stripping should be reconsidered.

With best regards,
Tagir Valeev

[1] https://openjdk.java.net/jeps/355

From amaembo at gmail.com Tue May 21 05:31:06 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Tue, 21 May 2019 12:31:06 +0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr>
Message-ID:

For the record: I checked IDEA Ultimate source code (>100000 of Java files, total size is about 400Mb). It appears that an unqualified call of the method called 'yield' as an expression statement is used only once across our sources:
https://github.com/JetBrains/intellij-community/blob/34e0721dca3700bd14081cfb0da115b367e84a84/platform/lang-impl/src/com/intellij/ide/projectView/BaseProjectTreeBuilder.java#L305

With best regards,
Tagir Valeev.
From amaembo at gmail.com Tue May 21 05:38:48 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Tue, 21 May 2019 12:38:48 +0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com>
Message-ID:

> So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("?

My "No" was mostly against options C and D where symbol resolution affects the parse tree. Sorry if it wasn't clear from my message. When the context for the parsing is available inside the same Java file, it's usually ok. See the 'var' restricted keyword:

  var var = 10; // first is highlighted as type, second as local variable
  var = 20; // var is highlighted as local variable, despite it's at the beginning of a statement.
  var(1); // var is highlighted as a method call, despite it's at the beginning of a statement.

We have no very big problems parsing this.

With best regards,
Tagir Valeev.

On Tue, May 21, 2019 at 2:52 AM John Rose wrote:
>
> On May 20, 2019, at 8:24 AM, Tagir Valeev wrote:
> >
> > Assuming that we agreed on 'yield' the option B seems the most attractive. A big No to context-specific parse tree. It's a complete pain to IDEs. Don't forget that IDE often deals with incomplete code, missing dependencies, etc., and still needs to provide reasonable highlighting and completion. Imagine that 'yield' method is available via import static Foo.* or superclass. In this case we don't want to look into other files to build a correct parse tree.
>
> So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("?
>
> I suppose that would work. It's hard to predict what that would feel like, but it's logical.
>
> -- John

From forax at univ-mlv.fr Tue May 21 06:05:42 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 21 May 2019 08:05:42 +0200 (CEST)
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <687FA694-B044-4BF0-8D37-028159B3FCF3@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <687FA694-B044-4BF0-8D37-028159B3FCF3@oracle.com>
Message-ID: <1934325333.519923.1558418742002.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Brian Goetz"
> Cc: "amber-spec-experts"
> Sent: Monday, May 20, 2019 22:00:43
> Subject: Re: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)

> On May 20, 2019, at 12:51 PM, Brian Goetz wrote:
>>
>> The cost-benefit analysis rests on the assumption that this will bite exceedingly rarely, and when it does, the workaround will be clear and easy.
>
> OK, so let's road-test "yield".
>
> While "break: x" is syntactically safer than "yield x", I buy your argument. And "yield x" is easier to explain to users. We all know why ":" is necessary in "break: x" but users will need explanations about the colon where "yield x" will just work for them.
>
> There's always a trade-off between precision and concision. Concise formulations are inherently ambiguous, just because there are a limited number of length-N strings and each can be given only one semantic pigeonhole. Adding a new keyword lets us occupy new pigeonholes with the same number of tokens.
>
> So, while "break: x" would please me as a syntax geek, "yield x" (if it really works) will please me as a user. We should try it if we think we can make it work.

I think we should not neglect the fact that a lot of Java users don't like the keyword break because for them it embodies the fallthrough problem of the switch statement syntax. So reusing break even with a colon after it goes in the wrong direction in term of a syntax signalling the semantics we want.

This argument is also why yield is better keyword than break-with.

>
> -- John

Rémi

From james.laskey at oracle.com Tue May 21 09:17:09 2019
From: james.laskey at oracle.com (James Laskey)
Date: Tue, 21 May 2019 06:17:09 -0300
Subject: Trailing white-space in text blocks
In-Reply-To:
References:
Message-ID:

Tagir,

These scenarios were considered in the decision, and I guess will be part of the rationale section of some doc TBA.

The argument for removal of trailing white space is pretty close to the argument for normalization of line terminators. The developer needs to be able to rely on what they can not see as much as what they can see.

As you mention, many editors strip trailing spaces on save, which would frustrate developers to no end. "Rats (expletive of choice), I need to add those (next expletive) spaces back again..."

A better option is to have some visible indication that the intent is to keep spaces at the end. Guy recommended boxing the string with long delimiters to possibly provide fixed length strings. I think something more flexible is required, say if you wanted 4 spaces at the end of each line.

If we move the line continuation discussion along, I think we have a possible candidate for keep the space indicator. If we go with \ for line continuation, and

  """
  First \
  Second \
  Third
  """;

represents the string "FirstSecondThird", then

  """
  First \n\
  Second \n\
  Third
  """;

represents "First\nSecond\nThird".
Elegant or ugly, this visible indication makes it clear that the space is intentional.

Of course, it's easy enough to do the same with any sequence.

  """
  First $
  Second $
  Third
  """.replace("$\n", "\n")

And, this works because we can rely on the fact that there is no incidental spacing after the $ and the line terminator is exactly \n.

Cheers,

-- Jim

Sent from my iPhone

From james.laskey at oracle.com Tue May 21 12:30:46 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Tue, 21 May 2019 09:30:46 -0300
Subject: Draft language spec for JEP 355: Text Blocks
In-Reply-To: <5CE33352.4030008@oracle.com>
References: <5CE33352.4030008@oracle.com>
Message-ID: <31C335AD-2A27-4CB8-8913-C00FA2162B93@oracle.com>

Looks pretty good.

___________________________________________________________________________________________________

TextBlock:
    " " " { the ASCII SP character } LineTerminator { TextBlockCharacter } " " "

"the ASCII SP character" in the open delimiter is currently implemented as any "white space" but not a line terminator. Later on you state "zero or more white spaces".

___________________________________________________________________________________________________

The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of processing the content, as follows:

I think this could be reworded so that the importance of order is made clear. Later on you state "Interpreting escape sequences last allows", but it's still not clear the order of 1 & 2 is important. In the JEP we described them as "steps". Stages might work as well.

___________________________________________________________________________________________________

@Precondition("""
    rate > 0 && rate <= MAX_REFRESH_RATE
    """)
public void setRefreshRate(int rate) { ... }

You went there. :-)

___________________________________________________________________________________________________

Cheers,

- Jim

> On May 20, 2019, at 8:08 PM, Alex Buckley wrote:
>
> Please see http://cr.openjdk.java.net/~abuckley/jep355/text-blocks-jls.html for JLS changes that align with the JEP.
> > Text blocks compile to the same class file construct as string literals, namely CONSTANT_String_info entries in the constant pool. Helpfully, the JVMS is already agnostic about the origin of a CONSTANT_String_info, making no reference to "string literals". Therefore, there are no JVMS changes for text blocks, save for a tiny clarification w.r.t. annotation elements. > > Alex From brian.goetz at oracle.com Tue May 21 12:51:39 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 21 May 2019 08:51:39 -0400 Subject: Draft language spec for JEP 355: Text Blocks In-Reply-To: <5CE34CF1.7000005@oracle.com> References: <5CE33352.4030008@oracle.com> <5CE34CF1.7000005@oracle.com> Message-ID: <16C84E21-26F7-4527-99B5-CD2434501E36@oracle.com> I think I'm saying something slightly stronger than that. Interning provides two benefits: - performance/footprint: not filling the heap up with a million instances of "" or "yes" - guarantees about String==: that it is safe to use == on String instances I believe the value of the second is rooted in a time when method invocation was so expensive that encouraging == in situations where the developer "just knows" seemed a pragmatic compromise. We're nearly 25 years away from that world now, and we generally strongly discourage relying on == over equals for equality comparisons. (It is common to do `a == b || a.equals(b)`, in which case interning might push us through the fast path more often, for strings that are actually common. But this is more in the "can we intern" rather than the "must we intern" camp). As string literals get longer, the cost-benefit of interning gets worse, and eventually turns negative; it is super-unlikely that two compilation units will use the same 14-line snippet of JSON (no benefit), and at the same time, we're taking up much more space in the intern table (more cost).
Surely today we'll use CONSTANT_String_info because that's the sensible translation target, and if the same string appears twice in a single class, it'll automatically get merged by the constant pool writer. But committing forever to interning seems likely to be something we'll eventually regret, without buying us very much. Even the migration benefit seems questionable. > Are you saying that the freedom to compile text blocks as dynamically-computed constants (rather than as static constants; see JVMS12 5.1) is more important than the space savings and identity guarantees from interning? I understand that starting off loose allows tightening later, but the loose behavior is significant. > > Alex > > On 5/20/2019 4:46 PM, Brian Goetz wrote: >> I wonder if we want to be cagey about committing to interning, which >> is another way to say we must translate to a constant string info. >> In the future, alternate condy-based representations may seem >> desirable and we don't want to be painted into a translation by >> overspecification. >> >> >> >> Sent from my iPad >> >>> On May 20, 2019, at 7:08 PM, Alex Buckley >>> wrote: >>> >>> Please see >>> http://cr.openjdk.java.net/~abuckley/jep355/text-blocks-jls.html >>> for JLS changes that align with the JEP. >>> >>> Text blocks compile to the same class file construct as string >>> literals, namely CONSTANT_String_info entries in the constant pool. >>> Helpfully, the JVMS is already agnostic about the origin of a >>> CONSTANT_String_info, making no reference to "string literals". >>> Therefore, there are no JVMS changes for text blocks, save for a >>> tiny clarification w.r.t. annotation elements.
>>> >>> Alex >> From amaembo at gmail.com Tue May 21 14:50:25 2019 From: amaembo at gmail.com (Tagir Valeev) Date: Tue, 21 May 2019 21:50:25 +0700 Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> Message-ID: I discussed this with colleagues and can confirm that for IntelliJ IDEA parser it will be no problem to always consider yield as a statement. At least it's much easier than to consider it as a statement inside switchy blocks only. With best regards, Tagir Valeev. On Tue, May 21, 2019 at 12:38 PM Tagir Valeev wrote: > > > So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("? > > My "No" was mostly against options C and D where symbol resolution > affects the parse tree. Sorry if it wasn't clear from my message. When > the context for the parsing is available inside the same Java file, > it's usually ok. See the 'var' restricted keyword: > > var var = 10; // first is highlighted as type, second as local variable > var = 20; // var is highlighted as local variable, despite it's at the > beginning of a statement. > var(1); // var is highlighted as a method call, despite it's at the > beginning of a statement. > > We have no very big problems parsing this. > > With best regards, > Tagir Valeev. > > On Tue, May 21, 2019 at 2:52 AM John Rose wrote: > > > > On May 20, 2019, at 8:24 AM, Tagir Valeev wrote: > > > > > > Assuming that we agreed on 'yield' the option B seems the most attractive. A big No to context-specific parse tree. It's a complete pain to IDEs. 
Don't forget that IDE often deals with incomplete code, missing dependencies, etc., and still needs to provide reasonable highlighting and completion. Imagine that 'yield' method is available via import static Foo.* or superclass. In this case we don't want to look into other files to build a correct parse tree. > > > > So does this (option B plus your No) mean that IDEs would > > tend to color "yield" as a keyword (at the beginning of a > > statement) even if followed by "("? > > > > I suppose that would work. It's hard to predict what that > > would feel like, but it's logical. > > > > -- John From alex.buckley at oracle.com Tue May 21 19:41:39 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Tue, 21 May 2019 12:41:39 -0700 Subject: Draft language spec for JEP 355: Text Blocks In-Reply-To: <31C335AD-2A27-4CB8-8913-C00FA2162B93@oracle.com> References: <5CE33352.4030008@oracle.com> <31C335AD-2A27-4CB8-8913-C00FA2162B93@oracle.com> Message-ID: <5CE45473.2050301@oracle.com> On 5/21/2019 5:30 AM, Jim Laskey wrote: > TextBlock: > > " " " { the ASCII SP character } LineTerminator { TextBlockCharacter } " " " > > "the ASCII SP character" in the open delimiter is currently > implemented as any "white space" but not a line terminator. Later on you > state "zero or more white spaces". Thank you for this clarification. I agree that a text-block-only form of "white space" -- spaces, tabs, form feeds -- can legitimately appear after the """. I have added a production and narrative to capture this. > The string represented by a text block is /not/ the literal sequence of > characters in the content. Instead, the string represented by a text > block is the result of processing the content, as follows: > > I think this could be reworded so that the importance of order is made > clear. Later on you state "Interpreting escape sequences last allows", > but it's still not clear the order of 1 & 2 is important. In the JEP we > described them as "steps".
Stages might work as well. A numbered list in the JLS traditionally means in-order processing, but for the avoidance of doubt I have said "... the result of applying the following transformations to the content, in order:" Alex From alex.buckley at oracle.com Tue May 21 19:45:40 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Tue, 21 May 2019 12:45:40 -0700 Subject: Draft language spec for JEP 355: Text Blocks In-Reply-To: <16C84E21-26F7-4527-99B5-CD2434501E36@oracle.com> References: <5CE33352.4030008@oracle.com> <5CE34CF1.7000005@oracle.com> <16C84E21-26F7-4527-99B5-CD2434501E36@oracle.com> Message-ID: <5CE45564.5090405@oracle.com> On 5/21/2019 5:51 AM, Brian Goetz wrote: > As string literals get longer, the cost-benefit of interning gets > worse, and eventually turns negative; it is super-unlikely that two > compilation units will use the same 14-line snippet of JSON (no > benefit), and at the same time, we're taking up much more space in > the intern table (more cost). > > Surely today we'll use CONSTANT_String_info because that's the > sensible translation target, and if the same string appears twice in > a single class, it'll automatically get merged by the constant pool > writer. But committing forever to interning seems likely to be > something we'll eventually regret, without buying us very much. Even > the migration benefit seems questionable. OK, I have walked back the requirement to intern text blocks in 3.10.6 and 12.5. Spec updated in place (http://cr.openjdk.java.net/~abuckley/jep355/text-blocks-jls.html), old version available (http://cr.openjdk.java.net/~abuckley/jep355/text-blocks-jls-20190520.html).
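[Editor's illustration, not part of the original message.] The identity-versus-equality distinction behind the interning discussion can be sketched in a few lines of Java. The class name `InternDemo` and the helper `sameText` are hypothetical names introduced only for this sketch; it uses ordinary string literals, on the assumption that a text block compiled to a CONSTANT_String_info entry would behave the same way.

```java
public class InternDemo {
    // The fast-path idiom quoted in the thread: try identity first,
    // then fall back to a content comparison.
    static boolean sameText(String a, String b) {
        return a == b || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        String literal = "yes";            // string literals are interned
        String built = new String("yes");  // a distinct object with the same characters

        System.out.println(literal == built);           // false: different objects
        System.out.println(literal.equals(built));      // true: same content
        System.out.println(literal == built.intern());  // true: intern() returns the canonical instance
        System.out.println(sameText(literal, built));   // true: equals() rescues the slow path
    }
}
```

Code that compares with `equals` (or with the fast-path idiom) keeps working whether or not a given constant is interned, which is why dropping the interning guarantee costs such code nothing.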
Alex From james.laskey at oracle.com Tue May 21 19:44:48 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Tue, 21 May 2019 16:44:48 -0300 Subject: Draft language spec for JEP 355: Text Blocks In-Reply-To: <5CE45473.2050301@oracle.com> References: <5CE33352.4030008@oracle.com> <31C335AD-2A27-4CB8-8913-C00FA2162B93@oracle.com> <5CE45473.2050301@oracle.com> Message-ID: <4AC42BC7-6BD7-4628-8103-809B00AB766A@oracle.com> Thank you. > On May 21, 2019, at 4:41 PM, Alex Buckley wrote: > > On 5/21/2019 5:30 AM, Jim Laskey wrote: >> TextBlock: >> >> " " " { the ASCII SP character } LineTerminator { TextBlockCharacter } " " " >> >> "the ASCII SP character" in the open delimiter is currently >> implemented as any "white space" but not a line terminator. Later on you >> state "zero or more white spaces". > > Thank you for this clarification. I agree that a text-block-only form of "white space" -- spaces, tabs, form feeds -- can legitimately appear after the """. I have added a production and narrative to capture this. > >> The string represented by a text block is /not/ the literal sequence of >> characters in the content. Instead, the string represented by a text >> block is the result of processing the content, as follows: >> >> I think this could be reworded so that the importance of order is made >> clear. Later on you state "Interpreting escape sequences last allows", >> but it's still not clear the order of 1 & 2 is important. In the JEP we >> described them as "steps". Stages might work as well. > > A numbered list in the JLS traditionally means in-order processing, but for the avoidance of doubt I have said "... the result of applying the following transformations to the content, in order:" > > Alex From john_kozlov at axmor.com Wed May 22 04:06:59 2019 From: john_kozlov at axmor.com (John Kozlov) Date: Wed, 22 May 2019 11:06:59 +0700 Subject: Trailing white-space in text blocks In-Reply-To: References: Message-ID: I support Tagir here. 
The compiler cannot know for sure if trailing whitespaces are intentional or unintentional. It's the user who should decide whether they want to keep them or not. Also, the spec does not say anything about mixed whitespace prefixes. What happens to text blocks where spaces are interleaved with tabs? As far as I understand, a tab is counted as one space, right? For example: """ (tab)(space)line1 (space)(space)line2 (space)(space)""" What is the resulting string here? On Tue, 21 May 2019 at 10:36, Tagir Valeev wrote: > Hello! > > JEP 355 [1] says the following: > > > Remove all trailing white space from all lines in the modified list of > individual lines from step 5. (Hidden white space at the end of lines is > unintentional, so it is overwhelmingly likely that the developer does > not want it in the string.) Note that this step collapses wholly-whitespace > lines in the modified list so that they are empty, but does not discard > them. > > I'm not sure this is a good idea. Consider the following quite > possible user story: > > Suppose I wrote a method like this (just to illustrate the problem, > exact method details don't matter): > > public static String formatList(List<String> list, int maxWidth) { > var result = new StringBuilder(); > int currentCount = 0; > for (String s : list) { > if (currentCount > 0 && currentCount + s.length() + 1 > maxWidth) { > result.append("\n"); > currentCount = 0; > } > result.append(s).append(" "); > currentCount += s.length() + 1; > } > return result.toString(); > } > > The method joins list of strings wrapping the result if it exceeds the > supplied maxWidth. Note that it adds a whitespace before the > linebreak. Probably it's accidental and could be removed, but it's > also possible that the method exists for a long time and somebody > already relies on these trailing whitespaces, so probably I'm not in > the position to modify the method. > > Ok, we have a method, time to unit-test it.
I lazily write: > > public void testFormatList() { > String actual = StringUtil.formatList(List.of("One", "Two", "Three", > "Four", "Five", "Six", "Seven"), 10); > assertEquals("", actual); > } > > Run the unit-test, it predictably fails and shows the difference > between expected and actual. Using IDE diff view I examine the actual > text, it looks like this: > > One Two > Three > Four Five > Six Seven > > I confirm that I like it, so I paste it to the expected parameter into > my unit-test changing the assertion to > > assertEquals(""" > One Two > Three > Four Five > Six Seven > """, actual); > > It doesn't matter whether we add indent on the left or not. Looks > pretty, but the test still fails. Closer examination of the output > shows that expected code doesn't contain the trailing white-space > while the actual output contains. Usually people are not very > attentive to small details. Assume that I've missed the part about > trailing whitespace stripping (or I learned this feature from online > tutorial where this detail wasn't mentioned). Now I'm in complete > confusion. I just copied the actual to the expected, but lost my > whitespaces. What happened? > > - Probably my diff viewer was too smart at the copy action and > stripped the whitespaces? > - Probably my editor was too smart at the paste action and stripped > the whitespaces? > - Probably my editor was too smart during the file save operation and > did additional cleanup which resulted in whitespace stripping? IDEA > can actually do this, and it's a question whether this will affect > multiline literals as well. > - Probably compiler somehow stripped it (that's actually the case) > - Probably I don't know something about assertEquals method? Probably > it's buggy and strips whitespaces in one case but keeps them in > another? I saw similar problem before with some custom assertion > method. > - Probably I misinterpreted the output of test failure and problem is > not in whitespaces? 
The diff viewer of my IDE just highlights the > single whitespace at the end of the every line, but I'm not sure, > probably this highlighting means that something wrong with line > separators? > > Things could be much worse if I was too lazy to rerun the test locally > (what for? I just copied actual to expected! What could go wrong? Come > on, just commit this and move to the next task). Then CI build fails > and I have more possiblities: > > - Probably Git commit does something special with trailing whitespaces > (e.g. we have pre-commit hook) > - Probably CI pulls changes in some special way > - Probably CI has different environment (JVM version, OS, filesystem) > and this causes failure? Also CI output could be not very clear when > the difference is in whitespaces only, and again I'm not sure what's > going on. > > So I have many things to check. If I'm experienced, I would probably > open the .java file in hex viewer and check the actual bytes to ensure > that the whitespaces are there, thus first three cases are ruled out, > then probably read the spec. Less experienced developer would be > completely lost. > > Nevertheless after a hour or two, probably asking a colleague to help > I will find the cause: the problem is really caused by the compiler. > So I will spend another hour to find how to work-around it. I will see > that escapes are handled after whitespace removal, thus will try to > replace trailing spaces with \u0020 (very few developers are aware > that it's not an actual string literal escape sequence). I will > probably consult StackOverflow and find nothing. At the end I will > give up and return to good old plain string literal. Not very > productive day. > > I'm not sure how to avoid such scenario if trailing whitespaces are > actually stripped. I can think up some solutions how IDE could help > pointing at the actual problem cause, but not every IDE could be smart > enough to help user in such scenario. 
> > I think that trailing whitespace stripping should be reconsidered. > > With best regards, > Tagir Valeev > > [1] https://openjdk.java.net/jeps/355 > From brian.goetz at oracle.com Wed May 22 15:45:47 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 22 May 2019 11:45:47 -0400 Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> Message-ID: <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> We've been drilling into the spec and implementation of yield as a contextual keyword. We have three possible strategies, all of which are specifiable and implementable, but with tradeoffs. The "dumb strategy" would be to say that `yield` is a keyword when it appears in the first position of a statement production (e.g., after an open brace or a semicolon). This is simple to spec, and simple to implement, but it doesn't do so well with variables named `yield`: yield++; yield = 3; if (foo) yield += 3; yield[3] = 4; The "smart strategy" says that `yield` is a keyword only within the context of the YieldStatement production; the rest of the time it is an identifier. This is also simple to spec, and does the right thing in all unambiguous cases, but requires unbounded lookahead, which compiler implementations may not like. The one ambiguous case is yield(e) which would match both YieldStatement and ExpressionStatement, and here we bias towards YieldStatement. Naked yield() invocations can qualify the invocation: this.yield(3) Thread.yield(4) The "compromise" strategy is like the smart strategy, except that it trades fixed lookahead for missing a few more method invocation cases.
Here, we look at the tokens that follow the identifier yield, and use those to determine whether to classify yield as a keyword or identifier. (We'd choose identifier if it is an assignment op (=, +=, etc), left-bracket, dot, and a few others, plus a few two-token sequences (e.g., ++ and then semicolon), which is lookahead(2).) The main difference between the compromise strategy and the smart strategy is the handling of method invocations that are not unary: yield(3, 4) In the smart strategy, we'd figure out that this is a method call; in the compromise strategy, we'd require qualification just as we do with the unary method. The compromise strategy misses some cases we could parse unambiguously, but also offers a simpler user model: always qualify invocations of methods called yield when used as expression statements. And it offers the better lookahead behavior, which will make life easier for IDEs. So my recommendation here is the compromise strategy. > On May 21, 2019, at 10:50 AM, Tagir Valeev wrote: > > I discussed this with colleagues and can confirm that for IntelliJ > IDEA parser it will be no problem to always consider yield as a > statement. At least it's much easier than to consider it as a > statement inside switchy blocks only. > > With best regards, > Tagir Valeev. > > On Tue, May 21, 2019 at 12:38 PM Tagir Valeev wrote: >> >>> So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("? >> >> My "No" was mostly against options C and D where symbol resolution >> affects the parse tree. Sorry if it wasn't clear from my message. When >> the context for the parsing is available inside the same Java file, >> it's usually ok. See the 'var' restricted keyword: >> >> var var = 10; // first is highlighted as type, second as local variable >> var = 20; // var is highlighted as local variable, despite it's at the >> beginning of a statement.
>> var(1); // var is highlighted as a method call, despite it's at the >> beginning of a statement. >> >> We have no very big problems parsing this. >> >> With best regards, >> Tagir Valeev. >> >> On Tue, May 21, 2019 at 2:52 AM John Rose wrote: >>> >>> On May 20, 2019, at 8:24 AM, Tagir Valeev wrote: >>>> >>>> Assuming that we agreed on 'yield' the option B seems the most attractive. A big No to context-specific parse tree. It's a complete pain to IDEs. Don't forget that IDE often deals with incomplete code, missing dependencies, etc., and still needs to provide reasonable highlighting and completion. Imagine that 'yield' method is available via import static Foo.* or superclass. In this case we don't want to look into other files to build a correct parse tree. >>> >>> So does this (option B plus your No) mean that IDEs would >>> tend to color "yield" as a keyword (at the beginning of a >>> statement) even if followed by "("? >>> >>> I suppose that would work. It's hard to predict what that >>> would feel like, but it's logical. >>> >>> -- John From john.r.rose at oracle.com Wed May 22 16:12:03 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 22 May 2019 09:12:03 -0700 Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) In-Reply-To: <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> Message-ID: <8DBBDB25-4800-4FAB-B393-853E4369CD1E@oracle.com> On May 22, 2019, at 8:45 AM, Brian Goetz wrote: > > So my recommendation here is the compromise strategy. +10 Let's do it.
Even though it fails to recognize some yield method calls, it is easy to learn and explain and therefore more predictable for end users. The failure to recognize is going to be a rare problem; this can be proven during the preview period. From guy.steele at oracle.com Wed May 22 16:16:07 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 22 May 2019 12:16:07 -0400 Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) In-Reply-To: <8DBBDB25-4800-4FAB-B393-853E4369CD1E@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <8DBBDB25-4800-4FAB-B393-853E4369CD1E@oracle.com> Message-ID: <7938F045-ED7B-480F-8004-C1C69E1020F0@oracle.com> +3.7 > On May 22, 2019, at 12:12 PM, John Rose wrote: > > On May 22, 2019, at 8:45 AM, Brian Goetz wrote: >> >> So my recommendation here is the compromise strategy. > > +10 Let's do it. > > Even though it fails to recognize some yield method > calls, it is easy to learn and explain and therefore more > predictable for end users. The failure to recognize is > going to be a rare problem; this can be proven during > the preview period. 
> > From forax at univ-mlv.fr Wed May 22 17:20:04 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 22 May 2019 19:20:04 +0200 (CEST) Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) In-Reply-To: <8DBBDB25-4800-4FAB-B393-853E4369CD1E@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <8DBBDB25-4800-4FAB-B393-853E4369CD1E@oracle.com> Message-ID: <112292072.1359780.1558545604915.JavaMail.zimbra@u-pem.fr> Yes, I fully agree. Rémi ----- Original Message ----- > From: "John Rose" > To: "Brian Goetz" > Cc: "amber-spec-experts" > Sent: Wednesday, 22 May 2019 18:12:03 > Subject: Re: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) > On May 22, 2019, at 8:45 AM, Brian Goetz wrote: >> >> So my recommendation here is the compromise strategy. > > +10 Let's do it. > > Even though it fails to recognize some yield method > calls, it is easy to learn and explain and therefore more > predictable for end users. The failure to recognize is > going to be a rare problem; this can be proven during > the preview period. From john.r.rose at oracle.com Wed May 22 19:26:43 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 22 May 2019 12:26:43 -0700 Subject: Trailing white-space in text blocks In-Reply-To: References: Message-ID: <2D5BFFF2-9C32-4089-ACBE-061A42526B1D@oracle.com> On May 20, 2019, at 8:36 PM, Tagir Valeev wrote: > > Nevertheless after a hour or two, probably asking a colleague to help > I will find the cause: the problem is really caused by the compiler. > So I will spend another hour to find how to work-around it. I will see > that escapes are handled after whitespace removal, thus will try to > replace trailing spaces with \u0020 (very few developers are aware > that it's not an actual string literal escape sequence).
I will > probably consult StackOverflow and find nothing. At the end I will > give up and return to good old plain string literal. Not very > productive day. > > I'm not sure how to avoid such scenario if trailing whitespaces are > actually stripped. I can think up some solutions how IDE could help > pointing at the actual problem cause, but not every IDE could be smart > enough to help user in such scenario. > Yes, this is why we want <\ s> as a new escape sequence. <\ 0 4 0> is not clear enough, but it is better than nothing. (And <\ 0 0 2 0> is a disaster, which nobody but me seems to want to address... Moving on...) Significant trailing space at line ends is a language design bug. Visible escapes are the necessary fix. In either case, the IDE which does the pasting into a text block *must* detect trailing space and escape it. -- John From john.r.rose at oracle.com Wed May 22 19:31:09 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 22 May 2019 12:31:09 -0700 Subject: Trailing white-space in text blocks In-Reply-To: References: Message-ID: On May 21, 2019, at 2:17 AM, James Laskey wrote: > > If we go with \ for line continuation, and > > """ > First \ > Second \ > Third > """; > > represents the string "FirstSecondThird", then > > """ > First \n\ > Second \n\ > Third > """; > Yes! FTR I would prefer that we get both <\ LT> and <\ s>, as a way to manage the new issues involving incidental white space in text blocks. I'm greedy. Having <\ n \ LT> as a visible line end is a nice thing, and reduces pressure for <\ s> as a workaround for invisible space at line ends. But that's not the only use case for <\ s>. To be clear, <\ s> (and less so <\ 0 4 0>) is also useful for visibly marking non-incidental space at the beginning of a line, as well as at the end of lines. --
John From john.r.rose at oracle.com Wed May 22 19:33:43 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 22 May 2019 12:33:43 -0700 Subject: Trailing white-space in text blocks In-Reply-To: <2D5BFFF2-9C32-4089-ACBE-061A42526B1D@oracle.com> References: <2D5BFFF2-9C32-4089-ACBE-061A42526B1D@oracle.com> Message-ID: <4E903124-0C3A-4937-8CF4-14B5BBCC2DB6@oracle.com> On May 22, 2019, at 12:26 PM, John Rose wrote: > > (And <\ 0 0 2 0> is a disaster, which nobody but me seems > to want to address... Moving on...) (That was supposed to be <\ u 0 0 2 0>. The claimed disaster is that some end users will reach for \u0020 and other forms of \u00XX and be mystified by their ineffectiveness. I wish we could fix this for text blocks, but I understand it's an extra cost.) From amaembo at gmail.com Thu May 23 00:46:19 2019 From: amaembo at gmail.com (Tagir Valeev) Date: Thu, 23 May 2019 07:46:19 +0700 Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch) In-Reply-To: <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> Message-ID: Hello. I agree: compromise strategy is the best option. It should be less surprising to users. E.g. consider that yield is a vararg method to produce several values at once. Users would be pretty surprised when removing an argument makes the code fail to compile. With best regards, Tagir Valeev. On Wed, 22 May 2019 at 22:45, Brian Goetz wrote: > We've been drilling into the spec and implementation of yield as a > contextual keyword. We have three possible strategies, all of which are > specifiable and implementable, but with tradeoffs. > > The "dumb strategy"
would be to say that `yield` is a keyword when it > appears in the first position of a statement production (e.g., after an > open brace or a semicolon). This is simple to spec, and simple to > implement, but it doesn't do so well with variables named `yield`: > > yield++; > yield = 3; > if (foo) > yield += 3; > yield[3] = 4; > > The "smart strategy" says that `yield` is a keyword only within the > context of the YieldStatement production; the rest of the time it is an > identifier. This is also simple to spec, and does the right thing in all > unambiguous cases, but requires unbounded lookahead, which compiler > implementations may not like. The one ambiguous case is > > yield(e) > > which would match both YieldStatement and ExpressionStatement, and here we > bias towards YieldStatement. Naked yield() invocations can qualify the > invocation: > > this.yield(3) > Thread.yield(4) > > The "compromise" strategy is like the smart strategy, except that it > trades fixed lookahead for missing a few more method invocation cases. > Here, we look at the tokens that follow the identifier yield, and use those > to determine whether to classify yield as a keyword or identifier. (We'd > choose identifier if it is an assignment op (=, +=, etc), left-bracket, > dot, and a few others, plus a few two-token sequences (e.g., ++ and then > semicolon), which is lookahead(2).) > > The main difference between the compromise strategy and the smart strategy > is the handling of method invocations that are not unary: > > yield(3, 4) > > In the smart strategy, we'd figure out that this is a method call; in the > compromise strategy, we'd require qualification just as we do with the > unary method. > > The compromise strategy misses some cases we could parse unambiguously, > but also offers a simpler user model: always qualify invocations of methods > called yield when used as expression statements. And it offers the better > lookahead behavior, which will make life easier for IDEs.
> > So my recommendation here is the compromise strategy. > > > On May 21, 2019, at 10:50 AM, Tagir Valeev wrote: > > > > I discussed this with colleagues and can confirm that for IntelliJ > > IDEA parser it will be no problem to always consider yield as a > > statement. At least it's much easier than to consider it as a > > statement inside switchy blocks only. > > > > With best regards, > > Tagir Valeev. > > > > On Tue, May 21, 2019 at 12:38 PM Tagir Valeev wrote: > >> > >>> So does this (option B plus your No) mean that IDEs would tend to > color "yield" as a keyword (at the beginning of a statement) even if > followed by "("? > >> > >> My "No" was mostly against options C and D where symbol resolution > >> affects the parse tree. Sorry if it wasn't clear from my message. When > >> the context for the parsing is available inside the same Java file, > >> it's usually ok. See the 'var' restricted keyword: > >> > >> var var = 10; // first is highlighted as type, second as local variable > >> var = 20; // var is highlighted as local variable, despite it's at the > >> beginning of a statement. > >> var(1); // var is highlighted as a method call, despite it's at the > >> beginning of a statement. > >> > >> We have no very big problems parsing this. > >> > >> With best regards, > >> Tagir Valeev. > >> > >> On Tue, May 21, 2019 at 2:52 AM John Rose > wrote: > >>> > >>> On May 20, 2019, at 8:24 AM, Tagir Valeev wrote: > >>>> > >>>> Assuming that we agreed on 'yield' the option B seems the most > attractive. A big No to context-specific parse tree. It's a complete pain > to IDEs. Don't forget that IDE often deals with incomplete code, missing > dependencies, etc., and still needs to provide reasonable highlighting and > completion. Imagine that 'yield' method is available via import static > Foo.* or superclass. In this case we don't want to look into other files to > build a correct parse tree. 
> >>>
> >>> So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("?
> >>>
> >>> I suppose that would work. It's hard to predict what that would feel like, but it's logical.
> >>>
> >>> -- John

From brian.goetz at oracle.com Thu May 23 03:31:29 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 22 May 2019 23:31:29 -0400
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com>
Message-ID: <139D593F-9F15-4D37-A6C8-6CB2712EA32C@oracle.com>

Right. Having a simple rule ("always qualify methods") has a simpler surface, even if it isn't maximally discriminating. That it is lookahead(2) also makes life easier for tools.

> On May 22, 2019, at 8:46 PM, Tagir Valeev wrote:
>
> Hello.
>
> I agree: the compromise strategy is the best option. It should be less surprising to users. E.g., suppose that yield is a varargs method that produces several values at once. Users would be pretty surprised when removing an argument makes the code fail to compile.
>
> With best regards,
> Tagir Valeev.
>
> On Wed, 22 May 2019 at 22:45, Brian Goetz wrote:
> We've been drilling into the spec and implementation of yield as a contextual keyword. We have three possible strategies, all of which are specifiable and implementable, but with tradeoffs.
>
> The "dumb strategy" would be to say that `yield` is a keyword when it appears in the first position of a statement production (e.g., after an open brace or a semicolon).
From daniel.smith at oracle.com Thu May 23 21:29:58 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 23 May 2019 15:29:58 -0600
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com>
Message-ID: <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>

> On May 22, 2019, at 9:45 AM, Brian Goetz wrote:
>
> The "compromise" strategy is like the smart strategy, except that it trades fixed lookahead for missing a few more method invocation cases. Here, we look at the tokens that follow the identifier yield, and use those to determine whether to classify yield as a keyword or identifier. (We'd choose identifier if it is an assignment op (=, +=, etc.), left-bracket, dot, and a few others, plus a few two-token sequences (e.g., ++ and then semicolon), which is lookahead(2).)

> The compromise strategy misses some cases we could parse unambiguously, but also offers a simpler user model: always qualify invocations of methods called yield when used as expression statements. And it offers the better lookup behavior, which will make life easier for IDEs.

There's still some space for different design choices within the compromise strategy: what happens to names in contexts *other than* the start of a statement?
I think it's really helpful to split the question into three parts: variable names, type names, and method names.

1) Variable names: we've established that, with a fixed lookahead, every legal use of the variable name 'yield' can be properly interpreted. Great.

2) Type names: 'yield' might be used as the name of a class, type of a method parameter, type of a field, array component type, type of a 'final' local variable, etc. Or we can prohibit it entirely as a type name.

We went through this when designing 'var', and settled on the more restrictive position: you can't declare classes/interfaces/type vars or make reference to types with name 'var', regardless of context. That way, there's no risk of confusion between subtly different programs: wherever you see 'var' used as a type, you know it can only mean the keyword.

I think it's best to treat 'yield' like 'var' in this case.

3) Method names: 'yield(' at the start of a statement means YieldStatement, but what about other contexts in which method invocations can appear?

Example:

var v = switch (x) {
    case 1 -> yield(x); // method call?
    default -> { yield(x); } // no-op, produces x (oops!)
};

Fortunately, the different normal-completion behavior of a method call and a yield statement will probably catch most errors of this form: when I type the braces above, I'll probably also try adding a statement after the attempted 'yield' call, and the compiler will complain that the statement is unreachable. But it's all very subtle (not to mention painful for IDEs).

Taking inspiration from the treatment of type names, my preference here is to make a blanket restriction that's easy to visualize: an *unqualified* method invocation must not use the name 'yield'. Context is irrelevant. The workaround is always to add a qualifier.

(If, in the future, we introduce local methods or something similar that can't be qualified, we should not allow such methods to be named 'yield'.)
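A minimal sketch of the qualification workaround from point 3, assuming Java 14 or later (where switch expressions and yield statements are a permanent feature). The class `YieldNames` and its `yield(int)` helper are hypothetical names invented for this illustration; they do not come from the draft spec.

```java
// Illustration only: assumes Java 14 or later.
final class YieldNames {
    // Hypothetical helper that happens to be named yield.
    // Declaring such a method is legal; only unqualified calls are restricted.
    static int yield(int x) {
        return x * 100;
    }

    static int pick(int x) {
        return switch (x) {
            // Qualified with the class name, so this is an ordinary method invocation.
            case 1 -> YieldNames.yield(x);
            default -> {
                int doubled = x * 2;
                // A yield statement: the switch expression produces doubled.
                yield doubled;
            }
        };
    }
}
```

With these definitions, `pick(1)` goes through the qualified method call and `pick(3)` goes through the yield statement, so the two uses of the name never collide.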
---

Are people generally good with my preferred restrictions, or do you think it's better to be more permissive?

From john.r.rose at oracle.com Thu May 23 22:00:36 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 23 May 2019 15:00:36 -0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
Message-ID: <3B4ACB56-BFE6-4865-8856-F438BC1611ED@oracle.com>

On May 23, 2019, at 2:29 PM, Dan Smith wrote:
>
> Are people generally good with my preferred restrictions, or do you think it's better to be more permissive?

In this case I prefer the restrictions because, again, it's a simpler user experience. The loss of the method name "yield" for an *unqualified* method call is, as we have said all along, likely to be a minor cost.
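The token-based classification behind the compromise strategy discussed in this thread can be sketched roughly as follows. This is an illustration only, not javac's parser: the class `YieldLookahead`, the use of plain strings as tokens, and the exact set of follower tokens are assumptions, since the thread names only "an assignment op, left-bracket, dot, and a few others" plus `++` followed by a semicolon. Treating `--` like `++` is likewise an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a lookahead(2) classifier for a statement-initial "yield".
final class YieldLookahead {
    enum Kind { KEYWORD, IDENTIFIER }

    // Assumed, partial set of tokens that mark "yield" as an identifier
    // when they come directly after it (assignment ops, '[' and '.').
    private static final Set<String> IDENTIFIER_FOLLOWERS = new HashSet<>(
            Arrays.asList("=", "+=", "-=", "*=", "/=", "[", "."));

    // next is the token right after "yield"; afterNext is the one after that.
    static Kind classify(String next, String afterNext) {
        if (IDENTIFIER_FOLLOWERS.contains(next)) {
            return Kind.IDENTIFIER;
        }
        // Two-token case: "yield++;" is an expression statement on a variable.
        boolean incOrDec = "++".equals(next) || "--".equals(next);
        if (incOrDec && ";".equals(afterNext)) {
            return Kind.IDENTIFIER;
        }
        // Everything else, including "(", is treated as a yield statement.
        return Kind.KEYWORD;
    }
}
```

Under this sketch a statement beginning `yield(3, 4)` is classified as a keyword use, which is why the thread's user model is to always qualify calls to methods named yield.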
From alex.buckley at oracle.com Thu May 23 23:51:48 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Thu, 23 May 2019 16:51:48 -0700
Subject: Yield as contextual keyword
In-Reply-To: <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
Message-ID: <5CE73214.8010801@oracle.com>

On 5/23/2019 2:29 PM, Dan Smith wrote:
> 2) Type names: 'yield' might be used as the name of a class, type of a method parameter, type of a field, array component type, type of a 'final' local variable, etc. Or we can prohibit it entirely as a type name.
>
> We went through this when designing 'var', and settled on the more restrictive position: you can't declare classes/interfaces/type vars or make reference to types with name 'var', regardless of context. That way, there's no risk of confusion between subtly different programs: wherever you see 'var' used as a type, you know it can only mean the keyword.
>
> I think it's best to treat 'yield' like 'var' in this case.
>
> 3) Method names: 'yield(' at the start of a statement means YieldStatement, but what about other contexts in which method invocations can appear?

> Taking inspiration from the treatment of type names, my preference here is to make a blanket restriction that's easy to visualize: an *unqualified* method invocation must not use the name 'yield'. Context is irrelevant. The workaround is always to add a qualifier.

This policy is "You can declare a method called `yield`, but you can only invoke the method by using qualified invocation syntax." OK, great.

Could the policy in SE 10 have been similar? -- "You can declare a type called `var`, but you can only declare a variable of that type by using a qualified name." -- `var x = ...` to always indicate LVTI, `com.example.api.var x = ...` to still be possible. TypeIdentifier would then not need to kick `var` out of type names (such as the type name used in a LocalVariableDeclarationStatement), as the rules of 14.4.1 would special-case the `var` identifier like they do today.

OTOH, no-one has noticed that types called `var` can't be declared anymore, so maybe no-one will notice if types called `yield` can't be declared anymore.

Alex

From amaembo at gmail.com Fri May 24 03:31:28 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Fri, 24 May 2019 10:31:28 +0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
Message-ID:

Also an interesting case:

var yield = 5;
var res = switch(yield) { default -> yield + yield; }
// are we returning the result of binary plus (10) or yielding the result of unary plus (5)? Seems the first one, yet confusing.

Tagir.

On Fri, May 24, 2019 at 4:30 AM Dan Smith wrote:
>
> > On May 22, 2019, at 9:45 AM, Brian Goetz wrote:
> >
> > The "compromise" strategy is like the smart strategy, except that it trades fixed lookahead for missing a few more method invocation cases. Here, we look at the tokens that follow the identifier yield, and use those to determine whether to classify yield as a keyword or identifier.
From gavin.bierman at oracle.com Fri May 24 12:25:47 2019
From: gavin.bierman at oracle.com (Gavin Bierman)
Date: Fri, 24 May 2019 14:25:47 +0200
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com>
Message-ID: <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>

A draft spec including the compromise strategy below is available at:

http://cr.openjdk.java.net/~gbierman/jep354-jls-20190524.html

Comments welcomed!
Gavin

> On 22 May 2019, at 17:45, Brian Goetz wrote:
>
> We've been drilling into the spec and implementation of yield as a contextual keyword.
From brian.goetz at oracle.com Fri May 24 14:34:33 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 24 May 2019 10:34:33 -0400
Subject: Yield as contextual keyword
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
Message-ID:

var yield = 5;

yield is lexed as an identifier, so this is a valid variable declaration

var res = switch(yield) {

yield is lexed as an identifier, so this is a valid switch operand

    default -> yield + yield;

RHS of a single-consequence case is expression | block statement | throw. We're not in a YieldStatement production, so this is an expression.

Like lambdas, we can think of this as shorthand for

    default -> {
        yield yield + yield;
    }

which parses as a yield statement whose operand is the expression yield+yield.

> // are we returning result of binary plus (10) or yielding result of unary plus (5)? Seems the first one, yet confusing.
>
> Tagir.
>
> On Fri, May 24, 2019 at 4:30 AM Dan Smith wrote:
>>> On May 22, 2019, at 9:45 AM, Brian Goetz wrote:
>>>
>>> The "compromise" strategy is like the smart strategy, except that it trades fixed lookahead for missing a few more method invocation cases. Here, we look at the tokens that follow the identifier yield, and use those to determine whether to classify yield as a keyword or identifier.
From guy.steele at oracle.com Fri May 24 14:53:08 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Fri, 24 May 2019 10:53:08 -0400
Subject: Yield as contextual keyword
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com>
Message-ID: <445959B0-B140-48C1-8AAB-5C77F1E28032@oracle.com>

> On May 24, 2019, at 10:34 AM, Brian Goetz wrote:
>
> var yield = 5;
>
> yield is lexed as an identifier, so this is a valid variable declaration
>
> var res = switch(yield) {
>
> yield is lexed as an identifier, so this is a valid switch operand
>
>     default -> yield + yield;
>
> RHS of a single-consequence case is expression | block statement | throw. We're not in a YieldStatement production, so this is an expression.
>
> Like lambdas, we can think of this as shorthand for
>
>     default -> {
>         yield yield + yield;
>     }
>
> which parses as a yield statement whose operand is the expression yield+yield.

And, yay (thanks, Tagir),

    default -> {
        yield + yield;
    }

is a genuine, bona fide puzzler!

(But as long as we are doing special-case parsing and checking the next token right after an occurrence of yield, it is a puzzler we can easily defend against if we want to.)

--Guy

From brian.goetz at oracle.com Fri May 24 15:24:32 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 24 May 2019 11:24:32 -0400
Subject: Yield as contextual keyword
In-Reply-To: <445959B0-B140-48C1-8AAB-5C77F1E28032@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com> <445959B0-B140-48C1-8AAB-5C77F1E28032@oracle.com>
Message-ID: <8DFDFEA1-4C43-405D-8EAA-D6D756A6DE96@oracle.com>

Right. So far we have only talked about the language spec and the meaning of programs. There is lots more that compilers and IDEs can do to help users steer away from the (actually quite small) puzzler zones.

Sent from my iPad

> On May 24, 2019, at 10:53 AM, Guy Steele wrote:
>
>> On May 24, 2019, at 10:34 AM, Brian Goetz wrote:
>>
>> var yield = 5;
>>
>> yield is lexed as an identifier, so this is a valid variable declaration
>>
>> var res = switch(yield) {
>>
>> yield is lexed as an identifier, so this is a valid switch operand
>>
>>     default -> yield + yield;
>>
>> RHS of a single-consequence case is expression | block statement | throw. We're not in a YieldStatement production, so this is an expression.
>>
>> Like lambdas, we can think of this as shorthand for
>>
>>     default -> {
>>         yield yield + yield;
>>     }
>>
>> which parses as a yield statement whose operand is the expression yield+yield.
>
> And, yay (thanks, Tagir),
>
>     default -> {
>         yield + yield;
>     }
>
> is a genuine, bona fide puzzler!
>
> (But as long as we are doing special-case parsing and checking the next token right after an occurrence of yield, it is a puzzler we can easily defend against if we want to.)
>
> --Guy

From daniel.smith at oracle.com Fri May 24 16:07:06 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 24 May 2019 10:07:06 -0600
Subject: Yield as contextual keyword
In-Reply-To: <5CE73214.8010801@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <82D3CEAA-31F9-4E21-B38E-1D42D3507313@oracle.com> <5CE73214.8010801@oracle.com>
Message-ID:

> On May 23, 2019, at 5:51 PM, Alex Buckley wrote:
>
> This policy is "You can declare a method called `yield`, but you can only invoke the method by using qualified invocation syntax." OK, great.
>
> Could the policy in SE 10 have been similar? -- "You can declare a type called `var`, but you can only declare a variable of that type by using a qualified name." -- `var x = ...` to always indicate LVTI, `com.example.api.var x = ...` to still be possible. TypeIdentifier would then not need to kick `var` out of type names (such as the type name used in a LocalVariableDeclarationStatement), as the rules of 14.4.1 would special-case the `var` identifier like they do today.

Yes, that's a fair point. A uniform strategy would suggest allowing qualified references to types 'var' and 'yield'.
(Specifically: a class or interface declared *in a named package* or *as a member* may be named 'var' or 'yield'; a qualified type name may use 'var' or 'yield' after the dot.)

In 10, that would have seemed like way too much complexity: why bother to allow class declarations named 'var' that can only be referenced with a qualified name? Nobody cares about classes named 'var' anyway.

With 'yield' and method invocations, the story is different: we have useful 'yield' methods in the wild, and need to find a way to continue supporting them.

So, while I don't think that would have been a compelling story in 10, now it may make sense, in order to make our policies more consistent.

From amaembo at gmail.com Fri May 24 20:14:50 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 25 May 2019 03:14:50 +0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>
Message-ID:

Hello!

> The first token in a YieldStatement production is always preceded by one of these separator tokens: ;, {, }, ), or ->.

Seems I'm missing something. Could you please illustrate in which case YieldStatement could be preceded by ')'? Also what about '->'? In a lambda, '->' is followed by an expression or block, but not a statement. In a switch, '->' is followed by a block, a throw, or an expression plus semicolon. Also, could YieldStatement be preceded by ':' in the old switch format? E.g.

System.out.println(switch(0) { default: yield 1; }); // seems legit

Also sections 16.1.7 and 16.1.8 are named identically.
Probably there's some mistake.

With best regards,
Tagir Valeev.

On Fri, May 24, 2019 at 7:25 PM Gavin Bierman wrote:
>
> A draft spec including the compromise strategy below is available at:
>
> http://cr.openjdk.java.net/~gbierman/jep354-jls-20190524.html
>
> Comments welcomed!
> Gavin
>
> On 22 May 2019, at 17:45, Brian Goetz wrote:
>
> We've been drilling into the spec and implementation of yield as a contextual keyword. We have three possible strategies, all of which are specifiable and implementable, but with tradeoffs.
>
> The "dumb strategy" would be to say that `yield` is a keyword when it appears in the first position of a statement production (e.g., after an open brace or a semicolon.). This is simple to spec, and simple to implement, but it doesn't do so well with variables named `yield`:
>
>     yield++;
>     yield = 3;
>     if (foo)
>         yield += 3;
>     yield[3] = 4;
>
> The "smart strategy" says that `yield` is a keyword only within the context of the YieldStatement production; the rest of the time it is an identifier. This is also simple to spec, and does the right thing in all unambiguous cases, but requires unbounded lookahead, which compiler implementations may not like. The one ambiguous case is
>
>     yield(e)
>
> which would match both YieldStatement and ExpressionStatement, and here we bias towards YieldStatement. Naked yield() invocations can qualify the invocation:
>
>     this.yield(3)
>     Thread.yield(4)
>
> The "compromise" strategy is like the smart strategy, except that it trades fixed lookahead for missing a few more method invocation cases. Here, we look at the tokens that follow the identifier yield, and use those to determine whether to classify yield as a keyword or identifier. (We'd choose identifier if it is an assignment op (=, +=, etc), left-bracket, dot, and a few others, plus a few two-token sequences (e.g., ++ and then semicolon), which is lookahead(2).)
>
> The main difference between the compromise strategy and the smart strategy is the handling of method invocations that are not unary:
>
>     yield(3, 4)
>
> In the smart strategy, we'd figure out that this is a method call; in the compromise strategy, we'd require qualification just as we do with the unary method.
>
> The compromise strategy misses some cases we could parse unambiguously, but also offers a simpler user model: always qualify invocations of methods called yield when used as expression statements. And it offers the better lookup behavior, which will make life easier for IDEs.
>
> So my recommendation here is the compromise strategy.
>
> On May 21, 2019, at 10:50 AM, Tagir Valeev wrote:
>
> I discussed this with colleagues and can confirm that for IntelliJ
> IDEA parser it will be no problem to always consider yield as a
> statement. At least it's much easier than to consider it as a
> statement inside switchy blocks only.
>
> With best regards,
> Tagir Valeev.
>
> On Tue, May 21, 2019 at 12:38 PM Tagir Valeev wrote:
>
>
> So does this (option B plus your No) mean that IDEs would tend to color "yield" as a keyword (at the beginning of a statement) even if followed by "("?
>
>
> My "No" was mostly against options C and D where symbol resolution
> affects the parse tree. Sorry if it wasn't clear from my message. When
> the context for the parsing is available inside the same Java file,
> it's usually ok. See the 'var' restricted keyword:
>
> var var = 10; // first is highlighted as type, second as local variable
> var = 20; // var is highlighted as local variable, despite it's at the
> beginning of a statement.
> var(1); // var is highlighted as a method call, despite it's at the
> beginning of a statement.
>
> We have no very big problems parsing this.
>
> With best regards,
> Tagir Valeev.
>
> On Tue, May 21, 2019 at 2:52 AM John Rose wrote:
>
>
> On May 20, 2019, at 8:24 AM, Tagir Valeev wrote:
>
>
> Assuming that we agreed on 'yield' the option B seems the most attractive. A big No to context-specific parse tree. It's a complete pain to IDEs. Don't forget that IDE often deals with incomplete code, missing dependencies, etc., and still needs to provide reasonable highlighting and completion. Imagine that 'yield' method is available via import static Foo.* or superclass. In this case we don't want to look into other files to build a correct parse tree.
>
>
> So does this (option B plus your No) mean that IDEs would
> tend to color "yield" as a keyword (at the beginning of a
> statement) even if followed by "("?
>
> I suppose that would work. It's hard to predict what that
> would feel like, but it's logical.
>
> -- John
>
>
>

From amaembo at gmail.com Fri May 24 20:19:06 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 25 May 2019 03:19:06 +0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>
Message-ID:

Hello! Answering myself

> > The first token in a YieldStatement production is always preceded by one of these separator tokens: ;, {, }, ), or ->.
>
> Seems I'm missing something. Could you please illustrate in which case
> YieldStatement could be preceded by ')'?

Nevermind. if(foo) yield bar; is a good example. Other my points still apply.

> Also what about '->'? In
> lambda '->' is followed by an expression or block, but not a
> statement. In switch '->' is followed by block, throw or expression
> plus semicolon. Also could YieldStatement be preceded by ':' in old
> switch format? E.g.
>
> System.out.println(switch(0) { default: yield 1; }); // seems legit
>
> Also sections 16.1.7 and 16.1.8 are named identically. Probably
> there's some mistake.
>
> With best regards,
> Tagir Valeev.

From amaembo at gmail.com Fri May 24 20:22:25 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 25 May 2019 03:22:25 +0700
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To: <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>
Message-ID:

One more exotic case:

System.out.println(switch(0) { default -> { do yield 1; while(false); } });

Looks legit as well, though incredibly strange. Here yield is preceded by the 'do' token.

With best regards,
Tagir Valeev.

From alex.buckley at oracle.com Fri May 24 21:44:05 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Fri, 24 May 2019 14:44:05 -0700
Subject: Yield as contextual keyword
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>
Message-ID: <5CE865A5.1060301@oracle.com>

On 5/24/2019 1:19 PM, Tagir Valeev wrote:
> Hello! Answering myself
>
>>> The first token in a YieldStatement production is always preceded
>>> by one of these separator tokens: ;, {, }, ), or ->.
>>
>> Seems I'm missing something. Could you please illustrate in which
>> case YieldStatement could be preceded by ')'?
>
> Nevermind. if(foo) yield bar; is a good example. Other my points
> still apply.
>
>> Also what about '->'? In lambda '->' is followed by an expression
>> or block, but not a statement. In switch '->' is followed by block,
>> throw or expression plus semicolon. Also could YieldStatement be
>> preceded by ':' in old switch format? E.g.
>>
>> System.out.println(switch(0) { default: yield 1; }); // seems
>> legit

You're right that `->` should not appear in the list. Any `yield` which follows `->` is necessarily the start of an expression, so `yield` should be tokenized as an identifier there.

`:` is tricky. On the one hand, the space after `:` is sometimes desirous of a statement, so tokenize `yield` as a keyword:

- `default : yield (1);` in a switch expression (also `case ... :`)

- `L1 : yield (1);` in a switch expression (labeled statements are legitimate in a switch-labeled block!
If there was no label, we would quickly say that this `yield` is a YieldStatement not an ExpressionStatement, and that if you want an ExpressionStatement which invokes a method, then qualify the invocation.)

On the other hand, the space after `:` is sometimes desirous of an expression, so tokenize `yield` as an identifier: (and it might be the name of a local variable, so no way to qualify)

- `for (String s : yield . f) ...`

- `m(a ? yield . f : yield . g)`

Alex

From amaembo at gmail.com Sat May 25 06:05:15 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 25 May 2019 13:05:15 +0700
Subject: Yield as contextual keyword
In-Reply-To: <5CE865A5.1060301@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com>
Message-ID:

Hello!

On Sat, May 25, 2019 at 4:44 AM Alex Buckley wrote:
> On the other hand, the space after `:` is sometimes desirous of an
> expression, so tokenize `yield` as an identifier: (and it might be the
> name of a local variable, so no way to qualify)
>
> - `for (String s : yield . f) ...`
>
> - `m(a ? yield . f : yield . g)`

Well to me it seems that `:` is no more special here than other listed tokens (probably except `}`). E.g.:

- After `)`: ((Foo)yield).bar();
- After `{`: Object[] arr = {yield};
- After `;`: for(int i=0; i<10; yield(i++)) {}

That the `yield` keyword is always preceded by one of the X, Y, Z tokens doesn't mean that any `yield` character sequence which follows an X, Y or Z token is necessarily a keyword. Only where a YieldStatement production is expected do we treat it as a keyword.
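
[Editorial illustration, not part of the original message.] A minimal sketch of the distinction being drawn here, assuming the `yield` statement as it was eventually finalized in Java 14. The class name `YieldContexts` and its methods are hypothetical and chosen only for this example. The same preceding tokens (`{`, `)`, `;`, `:`) show up both before `yield` used as a plain identifier and before `yield` used as a statement, so the preceding token alone probably cannot decide the classification:

```java
public class YieldContexts {

    // `yield` as an ordinary identifier, following `{`, `)` and `;`.
    static int asIdentifier() {
        int yield = 10;                    // a local variable may still be named `yield`
        int[] arr = {yield};               // after `{`: array initializer element
        long widened = (long) yield;       // after `)`: operand of a cast
        int hits = 0;
        for (int i = 0; yield > i; i++) {  // after `;`: loop condition
            hits++;
        }
        return arr[0] + (int) widened + hits;
    }

    // `yield` as a statement, following `)`, `else` and `:`.
    static int asStatement(boolean flag) {
        int arrowForm = switch (0) {
            default -> {
                if (flag) yield 1;         // after `)`: yield statement
                else yield 2;              // after `else`: yield statement
            }
        };
        int colonForm = switch (0) {
            default: yield 40;             // after `:` in an old-style switch block
        };
        return arrowForm + colonForm;
    }

    public static void main(String[] args) {
        System.out.println(asIdentifier());
        System.out.println(asStatement(true));
        System.out.println(asStatement(false));
    }
}
```

Run on a Java 14 or later toolchain, `main` should print 30, 41 and 42. The sketch deliberately avoids any statement that starts with `yield(`, since that is exactly the ambiguous form the thread is debating.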
Btw `yield` keyword can also appear after `else` token: if(foo) yield bar; else yield baz;

With best regards,
Tagir Valeev.

From amaembo at gmail.com Sat May 25 06:15:18 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Sat, 25 May 2019 13:15:18 +0700
Subject: Yield as contextual keyword
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com>
Message-ID:

A small remark: when we advise to qualify the `yield` call to distinguish it from YieldStatement, there's one corner case when qualification is impossible. I mean a nested class inside an anonymous class

var x = new Runnable() {
  public void run() {
    new Object() {
      void foo() {
        yield(1); // this.yield(1) or Runnable.this.yield(1) are incorrect.
      }
    }.foo();
  }
  void yield(int x) { System.out.println(x); }
};

I admit that such code is completely insane and probably doesn't exist anywhere on the planet. Yet, the spec draft says:

> Any statement that begins with a reference to a field or method called yield can use its qualified name to ensure that the character sequence is not tokenized as a keyword.

Here 'Any' is not entirely correct.

With best regards,
Tagir Valeev.

From john.r.rose at oracle.com Sat May 25 20:22:45 2019
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 25 May 2019 13:22:45 -0700
Subject: Yield as contextual keyword
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com>
Message-ID: <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com>

This one is fixable with a local-variable rename. I think that's a robust solution as long as the anonymous class is in a context where the local can be introduced nearby.

We can perhaps find awkward contexts where the local cannot be introduced. I'm thinking of 'super(new Object() {...})' and field initializers. In those cases, the outer object can supply a field or method to provide access to the shadowed 'this'.

On May 24, 2019, at 11:15 PM, Tagir Valeev wrote:
>
>
> var x = new Runnable() {
>   public void run() {
+     var this1 = this;
>     new Object() {
>       void foo() {
+         this1.
>         yield(1); // this.yield(1) or Runnable.this.yield(1) are incorrect.
>       }
>     }.foo();
>   }
>   void yield(int x) { System.out.println(x); }
> };

From amaembo at gmail.com Sun May 26 04:47:23 2019
From: amaembo at gmail.com (Tagir Valeev)
Date: Sun, 26 May 2019 11:47:23 +0700
Subject: Yield as contextual keyword
In-Reply-To: <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com>
Message-ID:

Sure the problem could be fixed in a number of ways, it's just not as straightforward as adding a qualifier. I can also think up several ways to fix this locally (modifying just a single line):

((Runnable)() -> yield(1)).run();

or

switch(0) { default -> yield(1); }

or even

for(int i=0; i<1; yield(1), i++) {}

With best regards,
Tagir Valeev

From gavin.bierman at oracle.com Mon May 27 09:13:05 2019
From: gavin.bierman at oracle.com (Gavin Bierman)
Date: Mon, 27 May 2019 11:13:05 +0200
Subject: Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com>
Message-ID: <268378C2-45B1-4A76-A7B6-EE0CACA977A2@oracle.com>

> On 24 May 2019, at 22:14, Tagir Valeev wrote:
>
> Also sections 16.1.7 and 16.1.8 are named identically. Probably
> there's some mistake.

Weird as it seems, it was deliberate, in the sense that 16.1.5 and 16.1.6 are currently named identically (they are the n-ary versions of the conditional expressions).
Gavin

From dl at cs.oswego.edu Mon May 27 14:06:43 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 27 May 2019 10:06:43 -0400
Subject: Yield as contextual keyword
In-Reply-To: <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com>
Message-ID: <4e96ff80-872a-1694-0bc1-6b0bf2dc5f09@cs.oswego.edu>

I don't enjoy being the token curmudgeon here, but I find it increasingly hard to appreciate why a non-ambiguous choice (prefix "^") with precedence in related languages should be rejected in favor of one requiring context-sensitive grammar mangling with some known odd consequences. At the very least, could someone help check as-yet-unknown impact by using candidate parsers on large source corpuses (for example http://groups.inf.ed.ac.uk/cup/javaGithub/, google-internal, etc)?

-Doug

From brian.goetz at oracle.com Mon May 27 15:24:36 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 27 May 2019 17:24:36 +0200
Subject: Yield as contextual keyword
In-Reply-To: <4e96ff80-872a-1694-0bc1-6b0bf2dc5f09@cs.oswego.edu>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com> <4e96ff80-872a-1694-0bc1-6b0bf2dc5f09@cs.oswego.edu>
Message-ID:

This is really two questions. We could have a non-ambiguous keyword (eg break-from-expression-switch); that's separate from the keyword vs operator story.
To the latter, I think the simple answer is: all existing control flow operations (return, throw, break, etc) are words. This does not seem sufficiently different to change paradigms by creating an operator.

To the former, this is a trade off between spec complexity and reading clarity. To this, the question of whether this is a good trade off is a reasonable one. If the complexity can be reasonably bounded, I think most people prefer a new verb to the set of things that can be constructed with real unambiguous keywords, but this is surely subjective.

Sent from my MacBook Wheel

From dl at cs.oswego.edu Mon May 27 15:54:28 2019
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 27 May 2019 11:54:28 -0400
Subject: Yield as contextual keyword
In-Reply-To:
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com> <4e96ff80-872a-1694-0bc1-6b0bf2dc5f09@cs.oswego.edu>
Message-ID: <9f175ec1-2e3d-7cbe-35c7-8352b1a5ac41@cs.oswego.edu>

On 5/27/19 11:24 AM, Brian Goetz wrote:
> This is really two questions. We could have a non-ambiguous keyword (eg break-from-expression-switch); that's separate from the keyword vs operator story.
>
> To the latter, I think the simple answer is: all existing control flow operations (return, throw, break, etc) are words. This does not seem sufficiently different to change paradigms by creating an operator.

Well, there's the main control flow operator ";", plus "?...:" and "->".

>
> To the former, this is a trade off between spec complexity and reading clarity. To this, the question of whether this is a good trade off is a reasonable one. If the complexity can be reasonably bounded, I think most people prefer a new verb to the set of things that can be constructed with real unambiguous keywords, but this is surely subjective.
>

(At the risk of concurrent programmers subjectively factionalizing into an Anyone But Yield movement for value-producing blocks to avoid misreading code.)

Anyway, the main point of writing was a CSR-member-style plea for empirical checks of impact as a part of due diligence.

-Doug

From jan.lahoda at oracle.com Mon May 27 16:18:41 2019
From: jan.lahoda at oracle.com (Jan Lahoda)
Date: Mon, 27 May 2019 18:18:41 +0200
Subject: Yield as contextual keyword
In-Reply-To: <4e96ff80-872a-1694-0bc1-6b0bf2dc5f09@cs.oswego.edu>
References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> <48F830B2-B2E4-4041-AEF0-421A0D8D9FEB@oracle.com> <4e96ff80-872a-1694-0bc1-6b0bf2dc5f09@cs.oswego.edu>
Message-ID:

On 27. 05. 19 16:06, Doug Lea wrote:
>
> I don't enjoy being the token curmudgeon here, but I find it
> increasingly hard to appreciate why a non-ambiguous choice (prefix "^")
> with precedence in related languages should be rejected in favor of one
> requiring context-sensitive grammar mangling with some known odd
> consequences. At the very least, could someone help check as-yet-unknown
> impact by using candidate parsers on large source corpuses (for example
> http://groups.inf.ed.ac.uk/cup/javaGithub/, google-internal, etc)?

So, I've run the current parser over a corpus that we sometimes use - there does not appear to be a conflicting use of yield there.
There is a handful of uses of yield as a variable; some invocations of Thread.yield; and (the included version of) JRuby appears to define and invoke methods called "yield", but the invocations appear to be in an expression context. Jan > > -Doug > > From brian.goetz at oracle.com Mon May 27 16:24:55 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 27 May 2019 18:24:55 +0200 Subject: Fwd: Yield as contextual keyword References: Message-ID: Sent from my MacBook Wheel Begin forwarded message: > From: Jan Lahoda > Date: May 27, 2019 at 6:18:41 PM GMT+2 > To: Amber Expert Group Observers , Doug Lea
, Brian Goetz > Subject: Re: Yield as contextual keyword > >> On 27. 05. 19 16:06, Doug Lea wrote: >> I don't enjoy being the token curmudgeon here, but I find it >> increasingly hard to appreciate why a non-ambiguous choice (prefix "^") >> with precedence in related languages should be rejected in favor of one >> requiring context-sensitive grammar mangling with some known odd >> consequences. At the very least, could someone help check as-yet-unknown >> impact by using candidate parsers on large source corpuses (for example >> http://groups.inf.ed.ac.uk/cup/javaGithub/, google-internal, etc)? > > So, I've run the current parser over a corpus that we sometimes use - there does not appear to be a conflicting use of yield there. There is a handful of uses of yield as a variable; some invocations of Thread.yield; and (the included version of) JRuby appears to define and invoke methods called "yield", but the invocations appear to be in an expression context. > > Jan > >> -Doug From gavin.bierman at oracle.com Wed May 29 14:21:06 2019 From: gavin.bierman at oracle.com (Gavin Bierman) Date: Wed, 29 May 2019 16:21:06 +0200 Subject: Yield as contextual keyword In-Reply-To: <5CE865A5.1060301@oracle.com> References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> Message-ID: Upon reflection, the simplest way out of this is to not go down the path of trying to identify tokens so that the lexer knows something about parsing, but rather follow the suggestion made by Dan earlier in this thread. To wit, we treat `yield` much like we treat `var`.
It's a "restricted identifier", which means that it can't be used as a *TypeIdentifier* nor as a *MethodName*. Thus any unqualified method invocation needs to be qualified (or in the extreme corner case involving an anonymous class spotted by Tagir, may need (local) renaming). Without qualification, `yield (42);` will be *parsed* as a `yield` statement and not an expression statement. Our corpus analysis, as reported by Brian, shows this not to be a problem. Tagir's analysis of the Idea Ultimate sources suggests the same. The revised JLS is available at: http://cr.openjdk.java.net/~gbierman/jep354-jls-20190528.html Thanks, Gavin > On 24 May 2019, at 23:44, Alex Buckley wrote: > > On 5/24/2019 1:19 PM, Tagir Valeev wrote: >> Hello! Answering myself >> >>>> The first token in a YieldStatement production is always preceded >>>> by one of these separator tokens: ;, {, }, ), or ->. >>> >>> Seems I'm missing something. Could you please illustrate in which >>> case YieldStatement could be preceded by ')'? >> >> Never mind. if(foo) yield bar; is a good example. My other points >> still apply. >> >>> Also what about '->'? In lambda '->' is followed by an expression >>> or block, but not a statement. In switch '->' is followed by block, >>> throw or expression plus semicolon. Also could YieldStatement be >>> preceded by ':' in old switch format? E.g. >>> >>> System.out.println(switch(0) { default: yield 1; }); // seems >>> legit > > You're right that `->` should not appear in the list. Any `yield` which follows `->` is necessarily the start of an expression, so `yield` should be tokenized as an identifier there. > > `:` is tricky. On the one hand, the space after `:` is sometimes desirous of a statement, so tokenize `yield` as a keyword: > > - `default : yield (1);` in a switch expression (also `case ... :`) > > - `L1 : yield (1);` in a switch expression (labeled statements are legitimate in a switch-labeled block!
If there was no label, we would quickly say that this `yield` is a YieldStatement not an ExpressionStatement, and that if you want an ExpressionStatement which invokes a method, then qualify the invocation.) > > On the other hand, the space after `:` is sometimes desirous of an expression, so tokenize `yield` as an identifier: (and it might be the name of a local variable, so no way to qualify) > > - `for (String s : yield . f) ...` > > - `m(a ? yield . f : yield . g)` > > Alex From peter.levart at gmail.com Wed May 29 16:15:34 2019 From: peter.levart at gmail.com (Peter Levart) Date: Wed, 29 May 2019 18:15:34 +0200 Subject: Yield as contextual keyword In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> Message-ID: Even in expression context, unqualified yield could be tokenized as a keyword (and hence produce a compile-time error). What do we lose? If it is a field, it can be qualified. If it is a local variable, it only presents a source incompatibility, which can easily be fixed at the next recompile. The treatment would be more regular (not dependent on expression vs. statement context) this way. Regards, Peter On 5/29/19 4:21 PM, Gavin Bierman wrote: > Upon reflection, the simplest way out of this is to not go down the > path of trying to identify tokens so that the lexer knows something > about parsing, but rather follow the suggestion made by Dan earlier in > this thread. To wit, we treat `yield` much like we treat `var`. It's a > "restricted identifier", which means that it can't be used as a > *TypeIdentifier* nor as a *MethodName*.
Thus any unqualified method > invocation needs to be qualified (or in the extreme corner case > involving an anonymous class spotted by Tagir, may need (local) > renaming). Without qualification, `yield (42);` will be *parsed* as a > `yield` statement and not an expression statement. Our corpus > analysis, as reported by Brian, shows this not to be a problem. > Tagir's analysis of the Idea Ultimate sources suggests the same. > > The revised JLS is available at: > http://cr.openjdk.java.net/~gbierman/jep354-jls-20190528.html > > Thanks, > Gavin > > > >> On 24 May 2019, at 23:44, Alex Buckley wrote: >> >> On 5/24/2019 1:19 PM, Tagir Valeev wrote: >>> Hello! Answering myself >>> >>>>> The first token in a YieldStatement production is always preceded >>>>> by one of these separator tokens: ;, {, }, ), or ->. >>>> >>>> Seems I'm missing something. Could you please illustrate in which >>>> case YieldStatement could be preceded by ')'? >>> >>> Never mind. if(foo) yield bar; is a good example. My other points >>> still apply. >>> >>>> Also what about '->'? In lambda '->' is followed by an expression >>>> or block, but not a statement. In switch '->' is followed by block, >>>> throw or expression plus semicolon. Also could YieldStatement be >>>> preceded by ':' in old switch format? E.g. >>>> >>>> System.out.println(switch(0) { default: yield 1; }); // seems >>>> legit >> >> You're right that `->` should not appear in the list. Any `yield` >> which follows `->` is necessarily the start of an expression, so >> `yield` should be tokenized as an identifier there. >> >> `:` is tricky. On the one hand, the space after `:` is sometimes >> desirous of a statement, so tokenize `yield` as a keyword: >> >> - `default : yield (1);` in a switch expression (also `case ... :`) >> >> - `L1 : yield (1);` in a switch expression (labeled statements are >> legitimate in a switch-labeled block!
If there was no label, we would >> quickly say that this `yield` is a YieldStatement not an >> ExpressionStatement, and that if you want an ExpressionStatement >> which invokes a method, then qualify the invocation.) >> >> On the other hand, the space after `:` is sometimes desirous of an >> expression, so tokenize `yield` as an identifier: (and it might be the >> name of a local variable, so no way to qualify) >> >> - `for (String s : yield . f) ...` >> >> - `m(a ? yield . f : yield . g)` >> >> Alex > From alex.buckley at oracle.com Thu May 30 18:20:46 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Thu, 30 May 2019 11:20:46 -0700 Subject: Yield as contextual keyword In-Reply-To: References: <5788F02D-65C4-44A6-BB2D-E8CEF92E4DC7@oracle.com> <3B574577-2510-48C7-82E3-0571A013FB90@oracle.com> <1884638192.1737818.1558078189851.JavaMail.zimbra@u-pem.fr> <10D1F041-E9D6-46F7-A9DF-D130F10791E3@univ-mlv.fr> <61BF195D-0BED-444F-AF37-B815825986A5@oracle.com> <7A552B7F-1A49-46E4-9C82-AD8421707C9D@oracle.com> <565C5B9C-F884-4FD4-9BFC-2692327B86FC@oracle.com> <5CE865A5.1060301@oracle.com> Message-ID: <5CF01EFE.6060601@oracle.com> To be clear, in the new approach, the lexeme `yield` is always tokenized as an identifier, and never as a keyword. Gavin has already changed the MethodName production so that it uses UnqualifiedMethodIdentifier rather than Identifier. And since MethodName is used by MethodInvocation (15.12), ALL unqualified method invocations are now constrained by the "can't call `yield`" policy -- whether an invocation is top level (an expression statement) or nested (an expression). If you write `f(g(yield(1)))` then you will get a compile-time error due to g's argument not parsing as an Expression. Alex On 5/29/2019 9:15 AM, Peter Levart wrote: > Even in expression context, unqualified yield could be tokenized as a > keyword (and hence produce a compile-time error). What do we lose? If > it is a field, it can be qualified.
If it is a local variable, it only > presents a source incompatibility, which can easily be fixed at the next > recompile. > > The treatment would be more regular (not dependent on expression vs. > statement context) this way. > > Regards, Peter > > On 5/29/19 4:21 PM, Gavin Bierman wrote: >> Upon reflection, the simplest way out of this is to not go down the >> path of trying to identify tokens so that the lexer knows something >> about parsing, but rather follow the suggestion made by Dan earlier in >> this thread. To wit, we treat `yield` much like we treat `var`. It's a >> "restricted identifier", which means that it can't be used as a >> *TypeIdentifier* nor as a *MethodName*. Thus any unqualified method >> invocation needs to be qualified (or in the extreme corner case >> involving an anonymous class spotted by Tagir, may need (local) >> renaming). Without qualification, `yield (42);` will be *parsed* as a >> `yield` statement and not an expression statement. Our corpus >> analysis, as reported by Brian, shows this not to be a problem. >> Tagir's analysis of the Idea Ultimate sources suggests the same. >> >> The revised JLS is available at: >> http://cr.openjdk.java.net/~gbierman/jep354-jls-20190528.html >> >> Thanks, >> Gavin >> >> >> >>> On 24 May 2019, at 23:44, Alex Buckley wrote: >>> >>> On 5/24/2019 1:19 PM, Tagir Valeev wrote: >>>> Hello! Answering myself >>>> >>>>>> The first token in a YieldStatement production is always preceded >>>>>> by one of these separator tokens: ;, {, }, ), or ->. >>>>> >>>>> Seems I'm missing something. Could you please illustrate in which >>>>> case YieldStatement could be preceded by ')'? >>>> >>>> Never mind. if(foo) yield bar; is a good example. My other points >>>> still apply. >>>> >>>>> Also what about '->'? In lambda '->' is followed by an expression >>>>> or block, but not a statement. In switch '->' is followed by block, >>>>> throw or expression plus semicolon.
Also could YieldStatement be >>>>> preceded by ':' in old switch format? E.g. >>>>> >>>>> System.out.println(switch(0) { default: yield 1; }); // seems >>>>> legit >>> >>> You're right that `->` should not appear in the list. Any `yield` >>> which follows `->` is necessarily the start of an expression, so >>> `yield` should be tokenized as an identifier there. >>> >>> `:` is tricky. On the one hand, the space after `:` is sometimes >>> desirous of a statement, so tokenize `yield` as a keyword: >>> >>> - `default : yield (1);` in a switch expression (also `case ... :`) >>> >>> - `L1 : yield (1);` in a switch expression (labeled statements are >>> legitimate in a switch-labeled block! If there was no label, we would >>> quickly say that this `yield` is a YieldStatement not an >>> ExpressionStatement, and that if you want an ExpressionStatement >>> which invokes a method, then qualify the invocation.) >>> >>> On the other hand, the space after `:` is sometimes desirous of an >>> expression, so tokenize `yield` as an identifier: (and it might be the >>> name of a local variable, so no way to qualify) >>> >>> - `for (String s : yield . f) ...` >>> >>> - `m(a ? yield . f : yield . g)` >>> >>> Alex >> > From alex.buckley at oracle.com Thu May 30 19:57:40 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Thu, 30 May 2019 12:57:40 -0700 Subject: Draft language spec for JEP 355: Text Blocks In-Reply-To: References: Message-ID: <5CF035B4.1030501@oracle.com> On 5/29/2019 9:57 AM, Arthur Neufeld wrote: > String season = """ > winter > """; // the six characters w i n t e r > > Doesn't "season" actually contain 7 characters? > > w i n t e r \n Good catch, thanks. Yes, seven characters.
The final character is a line terminator per step 7 of the reindentation algorithm: "If the final line in the list from step 6 is empty [because the final line was all white space prior to stripping], then the joining LF from the previous line will be the last character in the result string." Other spec examples also had the closing delimiter on its own line, yet omitted the final LF from the result string. I have corrected the spec at http://cr.openjdk.java.net/~abuckley/jep355/text-blocks-jls.html Alex
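The seven-versus-six character count discussed above can be checked with a small sketch. It assumes a JDK where text blocks are a standard feature (Java 15 or later; on 13 or 14 they need --enable-preview). The class and method names are hypothetical and appear in neither the JEP nor the spec draft.

```java
public class TextBlockLength {
    // Closing delimiter on its own line: the last line is all white space,
    // so the joining LF after "winter" ends up as the final character.
    static String closingDelimiterOnOwnLine() {
        return """
            winter
            """;
    }

    // Closing delimiter directly after the content: no trailing LF.
    static String closingDelimiterOnContentLine() {
        return """
            winter""";
    }

    public static void main(String[] args) {
        System.out.println(closingDelimiterOnOwnLine().length());     // prints 7
        System.out.println(closingDelimiterOnContentLine().length()); // prints 6
    }
}
```

Moving the closing delimiter onto the content line is thus one way to get the six-character string that the original example comment described.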