From amalloy at google.com Wed Apr 3 17:27:54 2019
From: amalloy at google.com (Alan Malloy)
Date: Wed, 3 Apr 2019 10:27:54 -0700
Subject: How records would fit into Google's codebase
Message-ID:

Hello, amber-spec-experts. I have done some analysis of Google's codebase, to see if it can give us any concrete ideas about how developers might use records, and about how we might design records to make them more useful.

A plain-text version of my report follows below - I had prepared a lovely version with formatting and inline links, but was advised that plain text is better for this forum. I've enclosed the original HTML document as an attachment to this email, in case that works better.

Record Proposal Feedback Based On @AutoValue

The most recent version of the records proposal puts many restrictions on records, with the aim of producing a more focused, opinionated tool. Most notably: record fields must be final; record classes must be final; field accessors must be public. There has broadly been support for these ideas, from Google and from other JDK contributors: it appeals to our sensibilities. However, there have also been some questions asked about whether the restrictions imposed will hinder adoption, and it's hard to estimate that in the abstract.

Google would like to ensure Java gets the best possible version of the record feature, and so in addition to thinking abstractly about what features sound good for records, I have spent some time collecting concrete data from the Google codebase, to determine how well a version of records would fit in there, and what changes to the records proposal might make it better.

A natural starting point is to compare with @AutoValue, an annotation processor Google publishes and uses to auto-generate implementations for simple data classes (for details, see https://github.com/google/auto/blob/master/value/userguide/index.md).
We use these internally for roughly the same kinds of things records might be used for, and so data about how they are used will help when evaluating the impact of various facets of the records proposal. Accordingly, I have collected some statistics about all the @AutoValue classes in Google's codebase.

Below I share this data, and present some recommendations based on it. When it is useful and possible, I also include simplified and fictionalized code samples based on the real code in Google's codebase, to clarify what patterns I am talking about.

One additional thing I would have liked to do is to somehow find classes which were "almost" records, but which ended up not using @AutoValue. A survey of these might help us decide what changes we could make to increase adoption, or simply confirm for us, "Yes, it's a good thing we included restriction X, because this class is a bad candidate for a record, but without restriction X it might have become a record". Alas, unsurprisingly, it is much easier to find actual @AutoValue classes than to design a heuristic for "almost an @AutoValue", and so I have not done this.

Summary and Recommendations (TL;DR)

For those who don't care to slog through the whole document, I present here a summary of recommendations. Many of these recommendations simply echo what is already in the proposal, because the data I found supports the proposal.

* Records can expect to be about as common as enums
* Records should be allowed to implement interfaces, and perhaps to extend a superclass, but should not allow their own subclasses (except perhaps abstract records)
* Records should have a very lightweight syntax for the common case, preferably one line. @AutoValue's clunkier syntax may be reducing adoption for some use cases.
* Records should be immutable. The language should enforce shallow immutability (by making all fields final), and style guides should recommend deep immutability.
* We should consider adding an alternative way to construct records besides a constructor with positional parameters. Builders are popular for @AutoValue, but perhaps something better could be done for records as a language feature.
* Records do not need language-level support for withFoo methods, or toBuilder, even if builder support is included.
* Records do not need a way to include private/hidden state, or to memoize derived properties
* Records should allow implementing Object methods by hand, rejecting the auto-generated implementation, but expect this to be done rarely

Popularity

One simple question to ask is: how often would records be used? If records end up being used very rarely, we may regret spending too much time designing and implementing them, or may wish we had made them more flexible instead of more restrictive. To measure popularity, I simply asked "what percentage of all named, non-local classes are @AutoValue classes?" I limit to named, non-local classes because these are the scopes in which @AutoValue is able to operate; a record integrated more tightly with the language could replace local classes, but @AutoValue can't. The answer is: 3.0% of Google's named, concrete classes are @AutoValue classes. Is that a lot, or a little? In isolation it's hard to say. For comparison, I asked how many of Google's classes fall into other interesting categories. At two opposite ends of the spectrum, 8.9% of classes are anonymous, while just 0.05% of classes are method-local named classes. A particularly promising comparison is to enums, a feature which similarly gives up some flexibility for increased expressive power, and to which Brian refers in the records proposal. Of Google's named, concrete classes, 3.7% of them are enums.

So, it seems we are past the first hurdle: there is a healthy appetite for "simple immutable data carriers", if we make a compelling offering.
We can expect them to be less popular than anonymous classes, but roughly as popular as enums. Or perhaps more popular: quite possibly, developers would be more keen to use records if they were a language feature instead of a Google library. It's hard to know for sure.

Superclasses

How often do @AutoValue classes make use of inheritance? 77% of @AutoValue classes are "islands" in the inheritance graph: they extend Object, and implement no interfaces. 15% of @AutoValue classes extend Object and implement exactly one interface. A mere 4% extend some class other than Object, and implement no interfaces. Very few do anything else (implement 2+ interfaces, or implement interface(s) while also extending a class).

Just like @AutoValue, the current proposal plans to allow implementing interfaces and/or extending a class. That seems reasonable, but extending a class is rare enough that we could consider forbidding it if that fits better onto the semantic goals of "simple data carriers": it won't harm too large a percentage of the usages. Perhaps, for example, we could reserve the extends syntax for the future possibility of extending "abstract records" that Brian suggested. Still, it would be reasonable to allow normal subclassing, if we judge that it helps achieve the semantic goals of records.

One possible source of skew in this data: since @AutoValue determines its fields by identifying abstract 0-argument methods, rather than by explicitly listing them, some @AutoValue classes "inherit their API" by extending an abstract class, or implementing an interface, containing some abstract accessor methods. I don't have a good estimate for how common this is. Such uses are perhaps arguments in favor of the "extending abstract records" feature. However, it is also interesting for a non-record class to conform to the same interface.
Consider, for example, this pattern I have seen a few times:

    public interface JobInfo {
        String session();
        boolean privileged();
        Instant startTime();
    }

    @AutoValue public abstract class FakeJobInfo implements JobInfo {
        // Note no fields specified: they are inherited from JobInfo!
        // The builder factory must be static so callers can reach it without an instance.
        static Builder builder() {return new AutoValue_FakeJobInfo.Builder();}
        @AutoValue.Builder public interface Builder {
            Builder setSession(String session);
            Builder setPrivileged(boolean privileged);
            Builder setStartTime(Instant startTime);
            FakeJobInfo build();
        }
    }

    public class JobRegistry {
        private Database database;
        // ...
        private class JobInfoImpl implements JobInfo {
            private JobId id;
            public JobInfoImpl(JobId id) {this.id = id;}
            public String session() {return database.lookupSession(id);}
            public boolean privileged() {return database.isPrivileged(id);}
            public Instant startTime() {
                return Instant.ofEpochSecond(database.startTime(id));
            }
        }
    }

Of course I have simplified the logic in JobInfoImpl, but the idea is that there is an interface for looking things up, a fake implementation (used in tests) that is a simple record, and a non-record implementation for production use that gets its data from some other source.

I think the records proposal as written already supports this use case semantically: we can define the interface first, then define a record that implements it. However, we don't get any code for free: we have to repeat the list of properties, defining them once in the interface and then again in the record's list of fields. One interesting possibility would be some connection between records and interfaces. Perhaps a record definition could, upon request, also produce an interface that can be conformed to by some non-record implementation.
Alternatively, a record could inherit fields based on the interfaces it implements, as @AutoValue does; however, since records do not normally have their fields defined via abstract methods, I think this approach fits less well for records than it does for @AutoValue.

Subclasses

Another inheritance-related question: should subclasses of records be allowed? Google's @AutoValue documentation strongly discourages this, but we cannot make it illegal because @AutoValue uses subclassing as an implementation detail (it generates a subclass of your abstract class). So, we can ask how many authors decided to write additional subclasses despite the warnings. Just 0.26% of @AutoValue classes have subclasses (aside from the auto-generated one that we expect). An inspection of some of these instances doesn't suggest a compelling case for subclassing of records. These subclasses are mostly stubs for testing: for example, overriding accessors to throw an exception, or to return a value that could have just been specified in the constructor. The authors do not seem to be intentionally working around some restriction of @AutoValue.

One example:

    @AutoValue public abstract class Document {
        public abstract String text();
        public abstract Language language();
        // and 4 more fields...
    }

    public class DocumentTestHelper {
        public static Document instance() {return INSTANCE;}
        private static final Document INSTANCE = new ThrowingDocument();
        private static class ThrowingDocument extends Document {
            private UnsupportedOperationException cannotDoThis() {
                return new UnsupportedOperationException("Cannot use this!");
            }
            @Override public String text() {throw cannotDoThis();}
            @Override public Language language() {throw cannotDoThis();}
            // and 4 more fields doing the same thing...
        }
    }

The rarity of subclasses (and lack of any convincing subclasses) argues in favor of the restriction that all records must be final (except, perhaps, some kind of abstract record).
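
To make the two inheritance points above concrete, here is a minimal sketch that is not part of the surveyed code. It assumes record syntax along the lines of the proposal (as written it compiles on Java 16 or later, where records are implicitly final and may implement interfaces). JobInfo and FakeJobInfo mirror the Superclasses example; Document is cut down to two String components for brevity, and RecordInheritanceSketch and placeholderDocument are hypothetical names chosen only for illustration.

```java
import java.lang.reflect.Modifier;
import java.time.Instant;

public class RecordInheritanceSketch {
    // Same shape as the JobInfo interface in the Superclasses section.
    interface JobInfo {
        String session();
        boolean privileged();
        Instant startTime();
    }

    // The fake has to repeat the property list from the interface;
    // the generated public accessors then satisfy JobInfo.
    record FakeJobInfo(String session, boolean privileged, Instant startTime)
            implements JobInfo {}

    // Hypothetical stand-in for Document, reduced to two components.
    record Document(String text, String language) {}

    // A record cannot be subclassed, so a test helper supplies a
    // placeholder value through the constructor instead of a stub subclass.
    static Document placeholderDocument() {
        return new Document("", "und");
    }

    public static void main(String[] args) {
        JobInfo job = new FakeJobInfo("test-session", false, Instant.EPOCH);
        System.out.println(job.session());
        // Reports whether the record class carries the final modifier.
        System.out.println(Modifier.isFinal(Document.class.getModifiers()));
        // Value-based equals comes for free.
        System.out.println(placeholderDocument().equals(new Document("", "und")));
    }
}
```

With a final record, a ThrowingDocument-style stub cannot be written at all; the test helper has to pass ordinary values to the constructor, which is what most of the surveyed subclasses could have done anyway.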

Visibility and Scoping

I wanted to know whether people are using @AutoValue mostly for public API "contracts", or for internal implementation details. To answer this question, I asked of each @AutoValue class whether it is a nested class or top-level, and what visibility modifier it declares. This misses some subtlety: I didn't pay attention to effective visibility, so a public @AutoValue nested inside a package-private class would look public in this analysis.

62% of @AutoValue classes are top-level classes, of which 84% are public; the rest are necessarily package-private. The remaining 38% of @AutoValue classes are nested classes, divided evenly between public and package-private (none are protected or private, because @AutoValue does not support such visibilities).

Thus, almost 75% of @AutoValue classes are public, suggesting they are used as part of some contract between two or more classes. This is a bit higher than I would have guessed. I suspect it is because while @AutoValue does a good job of meeting the "semantic goals" of records, it still has a fair amount of boilerplate, and developers do not like to go to the trouble of defining one for one-off data types used within a method. Consider Brian's topThreePeople example.
I have reproduced it below for convenience, and included an alternate implementation using @AutoValue:

    public class PersonDatabase {
        List<Person> topThreePeopleUsingRecord(List<Person> list) {
            record PersonX(Person p, int hash) {
                PersonX(Person p) {
                    this(p, p.name().toUpperCase().hashCode());
                }
            }
            return list.stream()
                .map(PersonX::new)
                .sorted(Comparator.comparingInt(PersonX::hash))
                .limit(3)
                .map(PersonX::p)
                .collect(toList());
        }

        @AutoValue abstract static class HashedPerson {
            public abstract Person p();
            public abstract int hash();
            public static HashedPerson create(Person p) {
                return new AutoValue_PersonDatabase_HashedPerson(p, p.name().toUpperCase().hashCode());
            }
        }
        List<Person> topThreePeopleUsingAutoValue(List<Person> list) {
            return list.stream()
                .map(HashedPerson::create)
                .sorted(Comparator.comparingInt(HashedPerson::hash))
                .limit(3)
                .map(HashedPerson::p)
                .collect(toList());
        }
    }

The @AutoValue "record" takes 7 lines to declare instead of 5, requires you to look up or remember the special naming scheme it uses, and also "leaks" into the enclosing class from the method where it really belongs. If we want records to be more useful as implementation details, we should ensure there is a very low-overhead way of defining them. Brian's current proposal is promising in this regard, allowing a simple one-line definition for lightweight records.

Mutability

The most recent restriction proposed for records is that all fields must be final. Google broadly encourages immutability, and so we support this idea, but can we prove that developers agree? It's hard to collect unbiased data on this: since @AutoValue doesn't define fields, but rather defines named accessors and hides the fields from you, there is no way for a developer to say "hey wait, I wanted a mutable field", except by defining the field themselves...but even this is hard to do!
@AutoValue allows you to define fields independently, but it will only call the no-argument constructor of your class, so there's no way to initialize those fields except by relying on side effects. So, developers who really want a mutable field won't be using @AutoValue, and won't appear in the data I collected.

However, we have static analysis tools that issue compiler warnings if you put an array or other mutable object (e.g. collection) in an @AutoValue. So instead of looking at the current state of the codebase, I looked at data about how developers reacted to the static analysis warnings. I sampled 304 instances of warnings where someone felt strongly enough to point them out during code review: 272 of these actions were to say "this is a good warning, and you should fix your class", and 32 were to say "This warning is not useful in this case." I do not have data for developers who saw the warning and fixed it on their own before getting to code review.

This warning is a relatively recent addition to our static analysis tooling, so there are some committed instances of @AutoValue classes with array fields from before that time, and additionally some cases where developers have reacted to the new warning by simply adding @SuppressWarnings to their existing @AutoValue class. 0.49% of @AutoValue classes have an array member, and 0.06% of @AutoValue classes contain a @SuppressWarnings annotation for this warning.

So, broadly it seems that developers agree that it's better for records to be deeply immutable, but a small percentage of rebels yearn to mutate their data carriers, or at any rate don't want to refactor their legacy code to hew closer to the semantics of records.

Construction

When defining an @AutoValue, you don't get a public constructor for free, the way a proposed record would.
Instead, you get a private generated constructor for free, and must either define a static factory method that delegates to the constructor, or define an abstract class to act as a builder for you; in the latter case, @AutoValue implements the builder, but there is still a fair bit more code to write in defining the methods that the builder class should have.

Half of @AutoValue classes do the simplest thing: they define a single static factory which delegates to the generated constructor. 10% define two different factories. These groups, totaling 60% of @AutoValue classes, map well onto records as currently proposed. However, a third of @AutoValue classes think it's worth the trouble to define a builder instead of, or in addition to, the static factory, even though they have to write a bunch more code to support it. This is an area where records could serve developers' needs better, by offering some kind of opt-in support for generating a builder to go with your data carrier.

But why do people want a builder? How do they use it? Perhaps what they really want is named arguments, or default arguments. A builder may just be the best @AutoValue can do with the language as-is, but a new feature can try something bolder. In some use cases I see, every use of the builder sets every field explicitly, so the main advantage of a builder is that the field names are associated with their values at the construction call site. Such classes would probably be happy with a constructor with named arguments. On the other hand, I also see use cases where the builder is used to avoid specifying values for Optional fields. Consider:

    public record Response(Optional provider,
                           Optional responseType,
                           Optional action,
                           Optional referenceUrl) {
        // Empty. This is a very bland record.
    }
    // ...
    public Response respond(Action action) {
        return new Response(Optional.empty(),
                            Optional.of(ResponseType.ACTION),
                            Optional.of(action),
                            Optional.empty());
    }
    // ...
    public Response redirect(String url) {
        return new Response(Optional.empty(),
                            Optional.empty(),
                            Optional.empty(),
                            Optional.of(url));
    }

All the Optional wrappers have muddied up the call sites a lot, and the use of positional constructor parameters makes it hard to tell what is being specified in each usage. The @AutoValue version of this record uses a builder, and so replaces redirect with:

    public Response redirect(String url) {
        return Response.builder().referenceUrl(url).build();
    }

"Modification"

Of course with records being immutable, you can't modify an existing record. But is it common to ask for a "modified version" of a record, copying a subset of fields but changing others? An often-suggested feature for records is support for "wither methods": methods like

    MyRecord withFoo(int newFoo) {return new MyRecord(newFoo, this.bar);}

As it turns out, defining methods like these is not very common. 3% of @AutoValue classes C have at least one instance method returning C - probably not all of these are "wither" methods, but many of them are. This is a small enough percentage of classes that we could reasonably exclude this feature from records: "if you want it that badly, you can do it yourself".

@AutoValue supports another kind of "modification" that I expected to be more popular: toBuilder(). If your data carrier uses a builder for its construction, you can ask @AutoValue to generate a toBuilder() method, which converts an existing value into a builder, so that you can ask for a subset of fields to be changed before solidifying back down into an immutable value. But it turns out this feature is used very rarely: only 1.5% of @AutoValue classes with builders use this feature, which is less than 1% of all @AutoValue classes. So even considering wither methods and toBuilder together, less than 5% of @AutoValue classes use this feature.
Perhaps if records could define builders and withers for you automatically and with very little boilerplate, these features would be used more often, but they don't seem to fill a need so common that developers feel compelled to write them by hand. It doesn't seem like a high priority to support wither methods, or toBuilder(), even if support for builders is added.

Hidden State

How will developers feel about the restriction that each field corresponds to a constructor parameter and a public accessor? Will they wish they could have some local state? We can look at two things in @AutoValue classes to identify developers who fit into these categories. First, they may define private fields which do not participate in @AutoValue generation. This turns out to be quite rare: less than 1% of @AutoValue classes have such properties. It makes sense to not support this, as hidden state both goes against the semantic goals of records and would go unused by most developers.

However, there is a more restricted notion of private "state" that may be more suitable, and which @AutoValue supports directly: memoization of derived properties. Developers can tag any nullary method with @Memoized, and the generated @AutoValue class will cache the return value of that method in a private field. This seems reasonably compatible with the semantic goals of records, and could be worth supporting if it is used regularly.

However, despite being very easy to use, @Memoized is not very popular. Only 1.4% of @AutoValue classes memoize any properties. The most obvious things to memoize are hashCode and toString, and those are indeed the two most-memoized methods, but in total it is still pretty rare. Of @AutoValue classes which memoize something, only 14% memoize these methods: most have some other derived property that they want to cache. So, while it might be nice to offer support for lazy/cached methods, leaving it out will likely not have a significant impact on record adoption.
If lazy instance fields ever make it into the language, we can retrofit them into records at that time. If memoization support is included, it should cover all properties, not just Object overrides.

Manually Written Methods

Both records and @AutoValue will automatically provide correct implementations of equals(Object) and hashCode(), as well as a reasonable toString(). How often do developers feel the need to override these methods?

toString(), it turns out, is most common by a landslide, but still rare: 3% of @AutoValue classes have a manual implementation of toString(). Some examples:

    @AutoValue public abstract class Constraint {
        // ...
        @Override public final String toString() {
            return String.format(
                "Constraint_%s_%s_%s_%s_%s",
                cluster().name(), machine().name(),
                machineIntent(), subinterval(), constraint());
        }
    }

    @AutoValue public abstract class SensitiveString {
        public abstract String getValue();
        public static SensitiveString of(String value) {
            return new AutoValue_SensitiveString(value);
        }
        // Prevents sensitive strings accidentally being rendered.
        @Override public final String toString() {
            return "*";
        }
    }

equals(Object) and hashCode() are only overridden around 0.5% of the time. Developers are generally happy with auto-generated value semantics for their simple data carriers. I looked at some of the overriding implementations of these methods - they often just wanted a hashCode that was faster, at the expense of having more collisions. In one case I found, one of the fields being wrapped was of a class with an incorrect hashCode implementation, and so the @AutoValue author hashed it externally. To allow workarounds like this, allowing overrides is a good idea, but we can expect it to be used rarely if the automatic implementations of Object methods are generally suitable.

In addition to overriding Object methods, there are other method signatures that crop up multiple times.
Most common, although still less common than toString(), are conversion functions like toJson(), toProto(), or toBuilder() (see Construction section). Much more rare, at around 0.1%, are methods like iterator() and size(): some @AutoValue classes wrap a single ImmutableCollection of some kind, and implement methods that delegate to this field. This could be an argument in favor of the recent method-delegation proposal, but it is a pretty rare thing to do, and many of these cases are really not a great idea: they should just call foo.coll().iterator() instead of foo.iterator(), and having Foo implement Iterable brings relatively little benefit.

Footnote: Google's Codebase

A brief reminder about the value of using Google's codebase to answer questions like these. Our codebase is large, easy to analyze, and highly cultivated, through static-analysis tools, enforced code-review, etc. In some ways, it does represent "what good Java code looks like", but it also has some peculiarities, such as a weird fascination with protobufs. So, keep in mind that when I make claims about how code looks, I am talking specifically about Google's codebase, and not about all Java code in the universe.

From amalloy at google.com Wed Apr 3 17:46:15 2019
From: amalloy at google.com (Alan Malloy)
Date: Wed, 3 Apr 2019 10:46:15 -0700
Subject: How records would fit into Google's codebase
In-Reply-To:
References:
Message-ID:

Ah, and Liam has kindly hosted that HTML document on cr.openjdk for me, since I don't have an account there yet. So, anyone looking through the archives can find a fancier version of the report at:
http://cr.openjdk.java.net/~cushon/amalloy/JDKRecordsProposalReport.html

On Wed, Apr 3, 2019 at 10:27 AM Alan Malloy wrote:

> Hello, amber-spec-experts. I have done some analysis of Google's codebase,
> to see if it can give us any concrete ideas about how developers might use
> records, and about how we might design records to make them more useful.
> > A plain-text version of my report follows below - I had prepared a lovely > version with formatting and inline links, but was advised that plain text > is better for this forum. I've enclosed the original HTML document as an > attachment to this email, in case that works better. > > Record Proposal Feedback Based On @AutoValue > > The most recent version of the records proposal puts many restrictions on > records, with the aim of producing a more focused, opinionated tool. Most > notably: record fields must be final; record classes must be final; field > accessors must be public. There has broadly been support for these ideas, > from Google and from other JDK contributors: it appeals to our > sensibilities. However, there have also been some questions asked about > whether the restrictions imposed will hinder adoption, and it?s hard to > estimate that in the abstract. > > Google would like to ensure Java gets the best possible version of the > record feature, and so in addition to thinking abstractly about what > features sound good for records, I have spent some time collecting concrete > data from the Google codebase, to determine how well a version of records > would fit in there, and what changes to the records proposal might make it > better. > > A natural starting point is to compare with @AutoValue, an annotation > processor Google publishes and uses to auto-generate implementations for > simple data classes (for details, see > https://github.com/google/auto/blob/master/value/userguide/index.md). We > use these internally for roughly the same kinds of things records might be > used for, and so data about how they are used will help when evaluating the > impact of various facets of the records proposal. Accordingly, I have > collected some statistics about all the @AutoValue classes in Google?s > codebase. > > Below I share this data, and present some recommendations based on it. 
> When it is useful and possible, I also include simplified and fictionalized > code samples based on the real code in Google?s codebase, to clarify what > patterns I am talking about. > > One additional thing I would have liked to do is to somehow find classes > which were ?almost? records, but which ended up not using @AutoValue. A > survey of these might help us decide what changes we could make to increase > adoption, or simply confirm for us, ?Yes, it?s a good thing we included > restriction X, because this class is a bad candidate for a record, but > without restriction X it might have become a record?. Alas, unsurprisingly, > it is much easier to find actual @AutoValue classes than to design a > heuristic for ?almost an @AutoValue?, and so I have not done this. > > Summary and Recommendations (TL;DR) > > For those who don?t care to slog through the whole document, I present > here a summary of recommendations. Many of these recommendations simply > echo what is already in the proposal, because the data I found supports the > proposal. > > * Records can expect to be about as common as enums > * Records should be allowed to implement interfaces, and perhaps to extend > a superclass, but should not allow their own subclasses (except perhaps > abstract records) > * Records should have a very lightweight syntax for the common case, > preferably one line. @AutoValue?s clunkier syntax may be reducing adoption > for some use cases. > * Records should be immutable. The language should enforce shallow > mutability (by making all fields final), and style guides should recommend > that deep immutability. > * We should consider adding an alternative way to construct records > besides a constructor with positional parameters. Builders are popular for > @AutoValue, but perhaps something better could be done for records as a > language feature. > * Records do not need language-level support for withFoo methods, or > toBuilder, even if builder support is included. 
> * Records do not need a way to include private/hidden state, or to memoize > derived properties > * Records should allow implementing Object methods by hand, rejecting the > auto-generated implementation, but expect this to be done rarely > > Popularity > > One simple question to ask is: how often would records be used? If records > end up being used very rarely, we may regret spending too much time > designing and implementing them, or may wish we had made them more flexible > instead of more restrictive. To measure popularity, I simply asked ?what > percentage of all named, non-local classes are @AutoValue classes?? I limit > to named, non-local classes because these are the scopes in which > @AutoValue is able to operate; a record integrated more tightly with the > language could replace local classes, but @AutoValue can?t. The answer is: > 3.0% of Google?s named, concrete classes are @AutoValue classes. Is that a > lot, or a little? In isolation it?s hard to say. For comparison, I asked > how many of Google?s classes fall into other interesting categories. At two > opposite ends of the spectrum, 8.9% of classes are anonymous, while just > 0.05% of classes are method-local named classes. A particularly promising > comparison is to enums, a feature which similarly gives up some flexibility > for increased expressive power, and to which Brian refers in the records > proposal. Of Google?s named, concrete classes, 3.7% of them are enums. > > So, it seems we are past the first hurdle: there is a healthy appetite for > ?simple immutable data carriers?, if we make a compelling offering. We can > expect them to be less popular than anonymous classes, but roughly as > popular as enums. Or perhaps more popular: quite possibly, developers would > be more keen to use records if they were a language feature instead of a > Google library. It?s hard to know for sure. > > Superclasses > > How often do @AutoValue classes make use of inheritance? 
77% of @AutoValue
> classes are "islands" in the inheritance graph: they extend Object, and
> implement no interfaces. 15% of @AutoValue classes extend Object and
> implement exactly one interface. A mere 4% extend some class other than
> Object, and implement no interfaces. Very few do anything else (implement
> 2+ interfaces, or implement interface(s) while also extending a class).
>
> Just like @AutoValue, the current proposal plans to allow implementing
> interfaces and/or extending a class. That seems reasonable, but extending a
> class is rare enough that we could consider forbidding it if that fits
> better with the semantic goals of "simple data carriers": it won't harm too
> large a percentage of the usages. Perhaps, for example, we could reserve
> the extends syntax for the future possibility of extending "abstract
> records" that Brian suggested. Still, it would be reasonable to allow
> normal subclassing, if we judge that it helps achieve the semantic goals of
> records.
>
> One possible source of skew in this data: since @AutoValue determines its
> fields by identifying abstract 0-argument methods, rather than by
> explicitly listing them, some @AutoValue classes "inherit their API" by
> extending an abstract class, or implementing an interface, containing some
> abstract accessor methods. I don't have a good estimate for how common this
> is. Such uses are perhaps arguments in favor of the "extending abstract
> records" feature. However, it is also interesting for a non-record class to
> conform to the same interface. Consider, for example, this pattern I have
> seen a few times:
>
> public interface JobInfo {
>   String session();
>   boolean privileged();
>   Instant startTime();
> }
>
> @AutoValue public abstract class FakeJobInfo implements JobInfo {
>   // Note no fields specified: they are inherited from JobInfo!
>   static Builder builder() {return new AutoValue_FakeJobInfo.Builder();}
>   @AutoValue.Builder public interface Builder {
>     Builder setSession(String session);
>     Builder setPrivileged(boolean privileged);
>     Builder setStartTime(Instant startTime);
>     FakeJobInfo build();
>   }
> }
>
> public class JobRegistry {
>   private Database database;
>   // ...
>   private class JobInfoImpl implements JobInfo {
>     private JobId id;
>     public JobInfoImpl(JobId id) {this.id = id;}
>     public String session() {return database.lookupSession(id);}
>     public boolean privileged() {return database.isPrivileged(id);}
>     public Instant startTime() {
>       return Instant.ofEpochSecond(database.startTime(id));
>     }
>   }
> }
>
> Of course I have simplified the logic in JobInfoImpl, but the idea is that
> there is an interface for looking things up, a fake implementation (used in
> tests) that is a simple record, and a non-record implementation for
> production use that gets its data from some other source.
>
> I think the records proposal as written already supports this use case
> semantically: we can define the interface first, then define a record that
> implements it. However, we don't get any code for free: we have to repeat
> the list of properties, defining them once in the interface and then again
> in the record's list of fields. One interesting possibility would be some
> connection between records and interfaces. Perhaps a record definition
> could, upon request, also produce an interface that can be conformed to by
> some non-record implementation. Alternatively, a record could inherit
> fields based on the interfaces it implements, as @AutoValue does; however,
> since records do not normally have their fields defined via abstract
> methods, I think this approach fits less well for records than it does for
> @AutoValue.
>
> Subclasses
>
> Another inheritance-related question: should subclasses of records be
> allowed?
Google's @AutoValue documentation strongly discourages this, but
> we cannot make it illegal because @AutoValue uses subclassing as an
> implementation detail (it generates a subclass of your abstract class). So,
> we can ask how many authors decided to write additional subclasses despite
> the warnings. Just 0.26% of @AutoValue classes have subclasses (aside from
> the auto-generated one that we expect). An inspection of some of these
> instances doesn't suggest a compelling case for subclassing of records.
> These subclasses are mostly stubs for testing: for example, overriding
> accessors to throw an exception, or to return a value that could have just
> been specified in the constructor. The authors do not seem to be
> intentionally working around some restriction of @AutoValue.
>
> One example:
>
> @AutoValue public abstract class Document {
>   public abstract String text();
>   public abstract Language language();
>   // and 4 more fields...
> }
>
> public class DocumentTestHelper {
>   public static Document instance() {return INSTANCE;}
>   private static final Document INSTANCE = new ThrowingDocument();
>   private static class ThrowingDocument extends Document {
>     private UnsupportedOperationException cannotDoThis() {
>       return new UnsupportedOperationException("Cannot use this!");
>     }
>     @Override public String text() {throw cannotDoThis();}
>     @Override public Language language() {throw cannotDoThis();}
>     // and 4 more fields doing the same thing...
>   }
> }
>
> The rarity of subclasses (and lack of any convincing subclasses) argues in
> favor of the restriction that all records must be final (except, perhaps,
> some kind of abstract record).
>
> Visibility and Scoping
>
> I wanted to know whether people are using @AutoValue mostly for public API
> "contracts", or for internal implementation details. To answer this
> question, I asked of each @AutoValue class whether it is a nested class or
> top-level, and what visibility modifier it declares.
This misses some
> subtlety: I didn't pay attention to effective visibility, so a public
> @AutoValue nested inside a package-private class would look public in this
> analysis.
>
> 62% of @AutoValue classes are top-level classes, of which 84% are public;
> the rest are necessarily package-private. The remaining 38% of @AutoValue
> classes are nested classes, divided evenly between public and
> package-private (none are protected or private, because @AutoValue does not
> support such visibilities). Thus, almost 75% of @AutoValue classes are
> public, suggesting they are used as part of some contract between two or
> more classes.
>
> This is a bit higher than I would have guessed. I suspect it is because
> while @AutoValue does a good job of meeting the "semantic goals" of
> records, it still has a fair amount of boilerplate, and developers do not
> like to go to the trouble of defining one for one-off data types used
> within a method. Consider Brian's topThreePeople example. I have reproduced
> it below for convenience, and included an alternate implementation using
> @AutoValue:
>
> public class PersonDatabase {
>   List<Person> topThreePeopleUsingRecord(List<Person> list) {
>     record PersonX(Person p, int hash) {
>       PersonX(Person p) {
>         this(p, p.name().toUpperCase().hashCode());
>       }
>     }
>
>     return list.stream()
>         .map(PersonX::new)
>         .sorted(Comparator.comparingInt(PersonX::hash))
>         .limit(3)
>         .map(PersonX::p)
>         .collect(toList());
>   }
>
>   @AutoValue abstract static class HashedPerson {
>     public abstract Person p();
>     public abstract int hash();
>     public static HashedPerson create(Person p) {
>       return new AutoValue_PersonDatabase_HashedPerson(p, p.name().toUpperCase().hashCode());
>     }
>   }
>
>   List<Person> topThreePeopleUsingAutoValue(List<Person> list) {
>     return list.stream()
>         .map(HashedPerson::create)
>         .sorted(Comparator.comparingInt(HashedPerson::hash))
>         .limit(3)
>         .map(HashedPerson::p)
>         .collect(toList());
>   }
> }
>
> The @AutoValue "record"
takes 7 lines to declare instead of 5, requires
> you to look up or remember the special naming scheme it uses, and also
> "leaks" into the enclosing class from the method where it really belongs.
> If we want records to be more useful as implementation details, we should
> ensure there is a very low-overhead way of defining them. Brian's current
> proposal is promising in this regard, allowing a simple one-line definition
> for lightweight records.
>
> Mutability
>
> The most recent restriction proposed for records is that all fields must
> be final. Google broadly encourages immutability, and so we support this
> idea, but can we prove that developers agree? It's hard to collect unbiased
> data on this: since @AutoValue doesn't define fields, but rather defines
> named accessors and hides the fields from you, there is no way for a
> developer to say "hey wait, I wanted a mutable field", except by defining
> the field themselves...but even this is hard to do! @AutoValue allows you
> to define fields independently, but it will only call the no-argument
> constructor of your class, so there's no way to initialize those fields
> except by relying on side effects. So, developers who really want a mutable
> field won't be using @AutoValue, and won't appear in the data I collected.
>
> However, we have static analysis tools that issue compiler warnings if you
> put an array or other mutable object (e.g. collection) in an @AutoValue. So
> instead of looking at the current state of the codebase, I looked at data
> about how developers reacted to the static analysis warnings. I sampled
> 304 instances of warnings where someone felt strongly enough to point them
> out during code review: 272 of these actions were to say "this is a good
> warning, and you should fix your class", and 32 were to say "This warning
> is not useful in this case." I do not have data for developers who saw the
> warning and fixed it on their own before getting to code review.
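
To make the shallow/deep distinction behind that warning concrete, here is a minimal sketch. It is not code from the thread or from Google's codebase: the class names ShallowScores and DeepScores are hypothetical, and plain final classes stand in for records or @AutoValue classes so that the example compiles on older Java versions. It assumes only that a final field holding an array is still mutable through the array itself, while a defensive copy wrapped in an unmodifiable list is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ImmutabilityDemo {

  // Shallowly immutable: the field is final, but the array it refers to is not.
  static final class ShallowScores {
    private final int[] scores;
    ShallowScores(int[] scores) { this.scores = scores; }
    int[] scores() { return scores; }
  }

  // Deeply immutable: the constructor copies its input and exposes a read-only view.
  static final class DeepScores {
    private final List<Integer> scores;
    DeepScores(List<Integer> scores) {
      this.scores = Collections.unmodifiableList(new ArrayList<>(scores));
    }
    List<Integer> scores() { return scores; }
  }

  // Returns true if the carrier's contents changed after construction.
  static boolean shallowCanBeMutated() {
    int[] raw = {1, 2, 3};
    ShallowScores s = new ShallowScores(raw);
    raw[0] = 99; // the carrier shares this array, so its state changes too
    return s.scores()[0] == 99;
  }

  // Returns true if the carrier's contents changed after construction.
  static boolean deepCanBeMutated() {
    List<Integer> raw = new ArrayList<>(Arrays.asList(1, 2, 3));
    DeepScores d = new DeepScores(raw);
    raw.set(0, 99); // only the caller's list changes; the carrier holds a copy
    try {
      d.scores().set(0, 99);
    } catch (UnsupportedOperationException expected) {
      // the unmodifiable view rejects writes
    }
    return d.scores().get(0) == 99;
  }

  public static void main(String[] args) {
    System.out.println("shallow mutated: " + shallowCanBeMutated());
    System.out.println("deep mutated: " + deepCanBeMutated());
  }
}
```

Making fields final is enough to get the first class; the second additionally needs the kind of discipline that a style guide or a static analysis warning can encourage but the language does not enforce.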
>
> This warning is a relatively recent addition to our static analysis
> tooling, so there are some committed instances of @AutoValue classes with
> array fields from before that time, and additionally some cases where
> developers have reacted to the new warning by simply adding
> @SuppressWarnings to their existing @AutoValue class. 0.49% of @AutoValue
> classes have an array member, and 0.06% of @AutoValue classes contain a
> @SuppressWarnings annotation for this warning.
>
> So, broadly it seems that developers agree that it's better for records to
> be deeply immutable, but a small percentage of rebels yearn to mutate their
> data carriers, or at any rate don't want to refactor their legacy code to
> hew closer to the semantics of records.
>
> Construction
>
> When defining an @AutoValue, you don't get a public constructor for free,
> the way a proposed record would. Instead, you get a private generated
> constructor for free, and must either define a static factory method that
> delegates to the constructor, or define an abstract class to act as a
> builder for you; in the latter case, @AutoValue implements the builder, but
> there is still a fair bit more code to write in defining the methods that
> the builder class should have.
>
> Half of @AutoValue classes do the simplest thing: they define a single
> static factory which delegates to the generated constructor. 10% define two
> different factories. These groups, totaling 60% of @AutoValue classes, map
> well onto records as currently proposed. However, a third of @AutoValue
> classes think it's worth the trouble to define a builder instead of, or in
> addition to, the static factory, even though they have to write a bunch
> more code to support it. This is an area where records could serve
> developers' needs better, by offering some kind of opt-in support for
> generating a builder to go with your data carrier.
>
> But why do people want a builder? How do they use it?
Perhaps what they
> really want is named arguments, or default arguments. A builder may just be
> the best @AutoValue can do with the language as-is, but a new feature can
> try something bolder. In some use cases I see, every use of the builder
> sets every field explicitly, so the main advantage of a builder is that the
> field names are associated with their values at the construction call site.
> Such classes would probably be happy with a constructor with named
> arguments.
>
> On the other hand, I also see use cases where the builder is used to avoid
> specifying values for Optional fields. Consider:
>
> public record Response(Optional provider,
>                        Optional responseType,
>                        Optional action,
>                        Optional referenceUrl) {
>   // Empty. This is a very bland record.
> }
>
> // ...
>
> public Response respond(Action action) {
>   return new Response(Optional.empty(), Optional.of(ResponseType.ACTION),
>                       Optional.of(action), Optional.empty());
> }
>
> // ...
>
> public Response redirect(String url) {
>   return new Response(Optional.empty(), Optional.empty(),
>                       Optional.empty(), Optional.of(url));
> }
>
> All the Optional wrappers have muddied up the call sites a lot, and the
> use of positional constructor parameters makes it hard to tell what is
> being specified in each usage. The @AutoValue version of this record uses a
> builder, and so replaces redirect with:
>
> public Response redirect(String url) {
>   return Response.builder().referenceUrl(url).build();
> }
>
> "Modification"
>
> Of course with records being immutable, you can't modify an existing
> record. But is it common to ask for a "modified version" of a record,
> copying a subset of fields but changing others? An often-suggested feature
> for records is support for "wither methods": methods like
>
> MyRecord withFoo(int newFoo) {return new MyRecord(newFoo, this.bar);}
>
> As it turns out, defining methods like these is not very common.
3% of > @AutoValue classes C have at least one instance method returning C - > probably not all of these are ?wither? methods, but many of them are. This > is a small enough percentage of classes that we could reasonably exclude > this feature from records: ?if you want it that badly, you can do it > yourself?. > > @AutoValue supports another kind of ?modification? that I expected to be > more popular: toBuilder(). If your data carrier uses a builder for its > construction, you can ask @AutoValue to generate a toBuilder() method, > which converts an existing value into a builder, so that you can ask for a > subset of fields to be changed before solidifying back down into an > immutable value. But it turns out this feature is used very rarely: only > 1.5% of @AutoValue classes with builders use this feature, which is less > than 1% of all @AutoValue classes. So even considering wither methods and > toBuilder together, less than 5% of @AutoValue classes use this feature. > > Perhaps if records could define builders and withers for you automatically > and with very little boilerplate, these features would be used more often, > but they don?t seem to fill a need so common that developers feel compelled > to write them by hand. It doesn?t seem like a high priority to support > wither methods, or toBuilder(), even if support for builders is added. > > Hidden State > > How will developers feel about the restriction that each field corresponds > to a constructor parameter and a public accessor? Will they wish they could > have some local state? We can look at two things in @AutoValue classes to > identify developers who fit into these categories. First, they may define > private fields which do not participate in @AutoValue generation. This > turns out to be quite rare: less than 1% of @AutoValue classes have such > properties. It makes sense to not support this, as hidden state both goes > against the semantic goals of records and would go unused by most > developers. 
> > However, there is a more restricted notion of private ?state? that may be > more suitable, and which @AutoValue supports directly: memoization of > derived properties. Developers can tag any nullary method with @Memoize, > and the generated @AutoValue class will cache the return value of that > method in a private field. This seems reasonably compatible with the > semantic goals of records, and could be worth supporting if it is used > regularly. > > However, despite being very easy to use, @Memoize is not very popular. > Only 1.4% of @AutoValue classes memoize any properties. The most obvious > things to memoize are hashCode and toString, and those are indeed the two > most-memoized methods, but in total it is still pretty rare. Of @AutoValue > classes which memoize something, only 14% memoize these methods: most have > some other derived property that they want to cache. > > So, while it might be nice to offer support for lazy/cached methods, > leaving it out will likely not have a significant impact on record > adoption. If lazy instance fields ever make it into the language, we can > retrofit them into records at that time. If memoization support is > included, it should cover all properties, not just Object overrides. > Manually Written Methods > Both records and @AutoValue will automatically provide correct > implementations of equals(Object) and hashCode(), as well as a reasonable > toString(). How often do developers feel the need to override these methods? > > toString(), it turns out, is most common by a landslide, but still rare: > 3% of @AutoValue classes have a manual implementation of toString(). Some > examples: > > @AutoValue public abstract class Constraint { > // ... 
> @Override public final String toString() { > return String.format( > "Constraint_%s_%s_%s_%s_%s", > cluster().name(), machine().name(), > machineIntent(), subinterval(), constraint()); > } > } > > @AutoValue public abstract class SensitiveString { > public abstract String getValue(); > > public static SensitiveString of(String value) { > return new AutoValue_SensitiveString(value); > } > > // Prevents sensitive strings accidentally being rendered. > @Override public final String toString() { > return "*"; > } > } > > equals(Object) and hashCode() are only overridden around 0.5% of the time. > Developers are generally happy with auto-generated value semantics for > their simple data carriers. I looked at some of the overriding > implementations of these methods - they often just wanted a hashCode that > was faster, at the expense of having more collisions. In one case I found, > one of the fields being wrapped was of a class with an incorrect hashCode > implementation, and so the @AutoValue author hashed it externally. To allow > workarounds like this, allowing overrides is a good idea, but we can expect > it to be used rarely if the automatic implementations of Object methods are > generally suitable. > > In addition to overriding Object methods, there are other method > signatures that crop up multiple times. Most common, although still less > common than toString(), are conversion functions like toJson, toProto(), or > toBuilder() (see Construction section). Much more rare, at around 0.1%, are > methods like iterator() and size(): some @AutoValue classes wrap a single > ImmutableCollection of some kind, and implement methods that delegate to > this field. 
This could be an argument in favor of the recent > method-delegation proposal, but it is a pretty rare thing to do, and many > of these cases are really not a great idea: they should just call > foo.coll().iterator() instead of foo.iterator(), and having Foo implement > Iterable brings relatively little benefit. > > Footnote: Google?s Codebase > > A brief reminder about the value of using Google?s codebase to answer > questions like these. Our codebase is large, easy to analyze, and highly > cultivated, through static-analysis tools, enforced code-review, etc. In > some ways, it does represent ?what good Java code looks like?, but it also > has some peculiarities, such as a weird fascination with protobufs. So, > keep in mind that when I make claims about how code looks, I am talking > specifically about Google?s codebase, and not about all Java code in the > universe. > From john.r.rose at oracle.com Wed Apr 3 18:22:30 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 3 Apr 2019 11:22:30 -0700 Subject: How records would fit into Google's codebase In-Reply-To: References: Message-ID: This is very welcome work. Reading the tl;dr I was struck again how terrible the term ?immutable? is, because the first syllable so often gets lost, in both speech and text. I suppose you must have meant ?shallow mutability? as short for ?shallow mutability status? but the bit gets flipped so very easily with that term. I?m grumbling about the tools we seemingly must work with. No reflection on your excellent work. Thanks for that! On Apr 3, 2019, at 10:46 AM, Alan Malloy wrote: > > ?The language should enforce shallow mutability (by making all fields final), and style guides should recommend that deep immutability. From amalloy at google.com Wed Apr 3 18:33:45 2019 From: amalloy at google.com (Alan Malloy) Date: Wed, 3 Apr 2019 11:33:45 -0700 Subject: How records would fit into Google's codebase In-Reply-To: References: Message-ID: Hah, you got me. 
I had written it correctly in a previous version, but a reviewer thought that "records should be shallowly immutable by law" was unclear / unnecessarily poetic, so I tried to rewrite it as "the language should enforce shallow immutability". Little did I know I rewrote it to recommend the exact opposite! I agree immutable is a bit of a bummer, especially to say out loud. It's too bad "persistent" already means something stronger, or we could try to convince everyone to use that. "Changeless" is too prosaic. Oh well, I guess we're stuck with it. On Wed, Apr 3, 2019 at 11:22 AM John Rose wrote: > This is very welcome work. Reading the tl;dr I was struck again how > terrible the term ?immutable? is, because the first syllable so often gets > lost, in both speech and text. > > I suppose you must have meant ?shallow mutability? as short for ?shallow > mutability status? but the bit gets flipped so very easily with that term. > > I?m grumbling about the tools we seemingly must work with. No reflection > on your excellent work. Thanks for that! > > On Apr 3, 2019, at 10:46 AM, Alan Malloy wrote: > > > > ?The language should enforce shallow mutability (by making all fields > final), and style guides should recommend that deep immutability. > > From brian.goetz at oracle.com Wed Apr 3 18:53:22 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 3 Apr 2019 14:53:22 -0400 Subject: How records would fit into Google's codebase In-Reply-To: References: Message-ID: <8986CD81-C221-408E-B19D-DBEF7D55519E@oracle.com> Thanks Alan for this good work; grounding the analysis in real codebases is a valuable tool for validating our theories. Some comments inline. > One additional thing I would have liked to do is to somehow find classes which were ?almost? records, but which ended up not using @AutoValue. 
A survey of these might help us decide what changes we could make to increase adoption, or simply confirm for us, "Yes, it's a good thing we included restriction X, because this class is a bad candidate for a record, but without restriction X it might have become a record". Alas, unsurprisingly, it is much easier to find actual @AutoValue classes than to design a heuristic for "almost an @AutoValue", and so I have not done this.

This is desirable, but as you point out, hard to do - you can't grep for "@WishIWasAnAutoValue". Often the best we can do is recall specific instances of coding in the past when we tumbled off the cliff, and try to reconstruct what was going on.

> * Records can expect to be about as common as enums

If this was the outcome, I'd call it winning, though I hope for more. The concision of records hopefully makes people more interested in using them as _local_ classes, as pure implementation details (such as the stream example in my writeup).

> * Records should be allowed to implement interfaces, and perhaps to extend a superclass, but should not allow their own subclasses (except perhaps abstract records)

The current design says yes on interfaces and no on superclasses, for good reason - that if a superclass has state, then the record descriptor is not "the state, the whole state, and nothing but the state." But, it's possible this distinction is too coarse; it would probably be fine for a record to extend a class _with no instance fields_. (In fact, in various versions of the prototype, records extended a base class, analogous to j.l.Enum, which is like this.) So it's conceivable the "no superclass" rule could be relaxed to "no superclasses with state." And value types (which have the same restriction) could conceivably relax in the same way. That said, what would it buy us? Such an abstract class could well be an interface - so is the problem that we have no-state, abstract classes in APIs that should have been interfaces, but were not?
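
As a small illustration of that last question, the sketch below shows one stateless contract written twice: once as an abstract class with no instance fields, and once as an interface with a default method. All names here (AbstractShape, Shape, Square) are hypothetical and not from the thread, and a plain final class stands in for a record. It only demonstrates that a final data carrier can pick up the shared behavior through the interface form without extending anything.

```java
public class StatelessBaseDemo {

  // A stateless abstract class: no instance fields, only behavior.
  abstract static class AbstractShape {
    abstract double area();
    String describe() { return "shape with area " + area(); }
  }

  // The same contract expressed as an interface with a default method.
  interface Shape {
    double area();
    default String describe() { return "shape with area " + area(); }
  }

  // A final data carrier (a stand-in for a record) implementing the interface.
  static final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    double side() { return side; }
    @Override public double area() { return side * side; }
  }

  public static void main(String[] args) {
    System.out.println(new Square(2.0).describe());
  }
}
```

The abstract-class form only becomes a problem for a carrier like Square when an existing API was published that way and cannot be changed to an interface.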
> * Records should have a very lightweight syntax for the common case, preferably one line. @AutoValue?s clunkier syntax may be reducing adoption for some use cases. Especially when used encapsulated, such as in local classes. > * Records should be immutable. The language should enforce shallow mutability (by making all fields final), and style guides should recommend that deep immutability. > * We should consider adding an alternative way to construct records besides a constructor with positional parameters. Builders are popular for @AutoValue, but perhaps something better could be done for records as a language feature. Named invocation parameters is a feature that is frequently requested. As with most such features, it is more complicated than it initially seems, but the restrictions of records manage to avoid most of the nastier issues here. So in theory, we could support named invocation of constructor arguments for records ? but we have a concern that this would very quickly be viewed as ?glass half empty?, and an impediment to freely refactoring from records to classes once you exceed the profile for which records were intended. So I would prefer to come up with (not now, later) a more comprehensive way to address named invocation, and fit records into that. The other ?alternative to positional constructors? option is factory methods. Here, I think we?re stuck in an uncomfortable spot; on the one hand, factories are a common practice and have well-documented benefits over constructors; on the other hand, they?re not part of the language, so its even weirder for another language feature to rest on them. (We have the same discomfort with accessors.). > * Records do not need language-level support for withFoo methods, or toBuilder, even if builder support is included. 
> * Records do not need a way to include private/hidden state, or to memoize derived properties > * Records should allow implementing Object methods by hand, rejecting the auto-generated implementation, but expect this to be done rarely Overall, I take this as a validation that we?ve landed at just about the right place. Good! > How often do @AutoValue classes make use of inheritance? 77% of @AutoValue classes are ?islands? in the inheritance graph: they extend Object, and implement no interfaces. 15% of @AutoValue classes extend Object and implement exactly one interface. A mere 4% extend some class other than Object, and implement no interfaces. Very few do anything else (implement 2+ interfaces, or implement interface(s) while also extending a class). Of those that extend a class other than Object, do you spot any common cases, either specific classes or classes with specific characteristics? > Just like @AutoValue, the current proposal plans to allow implementing interfaces and/or extending a class. That seems reasonable, but extending a class is rare enough that we could consider forbidding it if that fits better onto the semantic goals of ?simple data carriers?: it won?t harm too large a percentage of the usages. Perhaps, for example, we could reserve the extends syntax for the future possibility of extending ?abstract records? that Brian suggested. Still, it would be reasonable to allow normal subclassing, if we judge that it helps achieve the semantic goals of records. The current proposal prohibits extending classes at all, but anticipates we might relax eventually to ?abstract records?, as suggested. > Consider, for example, this pattern I have seen a few times: > > public interface JobInfo { > String session(); > boolean privileged(); > Instant startTime(); > } Someone else also suggested at one point we support some sort of record interface JobInfo(String session, boolean privileged, Instant startTime) option. 
But, what I?m having a hard time seeing is: is this an interface that multiple classes would want to implement? It seems mostly an extraction of the API of a single class. > Half of @AutoValue classes do the simplest thing: they define a single static factory which delegates to the generated constructor. 10% define two different factories. These groups, totaling 60% of @AutoValue classes, map well onto records as currently proposed. However, a third of @AutoValue classes think it?s worth the trouble to define a builder instead of, or in addition to, the static factory, even though they have to write a bunch more code to support it. This is an area where records could serve developers? needs better, by offering some kind of opt-in support for generating a builder to go with your data carrier. > > But why do people want a builder? How do they use it? Perhaps what they really want is named arguments, or default arguments. A builder may just be the best @AutoValue can do with the language as-is, but a new feature can try something bolder. In some use cases I see, every use of the builder sets every field explicitly, so the main advantage of a builder is that the field names are associated with their values at the construction call site. Such classes would probably be happy with a constructor with named arguments. This is my theory; that named/default arguments will obviate 90+% of builders. > ?Modification? > > Of course with records being immutable, you can?t modify an existing record. But is it common to ask for a ?modified version? of a record, copying a subset of fields but changing others? An often-suggested feature for records is support for ?wither methods?: methods like > > MyRecord withFoo(int newFoo) {return new MyRecord(newFoo, this.bar);} I think such things will be more common with value types than with records ? especially with values that encapsulate some state, such as a Cursor into a data structure. 
A value cursor would be used like this: for (Cursor c = source.cursor(); c.hasNext(); c = c.next()) { ? } which is like an iterator, but doesn?t require mutation of a heap-based object. With-like behavior (encapsulated within the next() method) will be a regular feature of such values. > @AutoValue supports another kind of ?modification? that I expected to be more popular: toBuilder(). If your data carrier uses a builder for its construction, you can ask @AutoValue to generate a toBuilder() method, which converts an existing value into a builder, so that you can ask for a subset of fields to be changed before solidifying back down into an immutable value. But it turns out this feature is used very rarely: only 1.5% of @AutoValue classes with builders use this feature, which is less than 1% of all @AutoValue classes. So even considering wither methods and toBuilder together, less than 5% of @AutoValue classes use this feature. This idiom might well be replaced with a pattern match, since records will (eventually) come with built-in pattern matching. > In addition to overriding Object methods, there are other method signatures that crop up multiple times. Most common, although still less common than toString(), are conversion functions like toJson, toProto(), or toBuilder() (see Construction section). Much more rare, at around 0.1%, are methods like iterator() and size(): some @AutoValue classes wrap a single ImmutableCollection of some kind, and implement methods that delegate to this field. This could be an argument in favor of the recent method-delegation proposal, but it is a pretty rare thing to do, and many of these cases are really not a great idea: they should just call foo.coll().iterator() instead of foo.iterator(), and having Foo implement Iterable brings relatively little benefit. And is well handled by the current proposal; just declare the interfaces and implement the methods. > Thanks again for the great data. 
I was a bit surprised, but pleased, to see that many of these ?what about X? issues that came up in the design turn out to be infrequently used in practice. From vicente.romero at oracle.com Wed Apr 3 19:54:19 2019 From: vicente.romero at oracle.com (Vicente Romero) Date: Wed, 3 Apr 2019 15:54:19 -0400 Subject: How records would fit into Google's codebase In-Reply-To: <8986CD81-C221-408E-B19D-DBEF7D55519E@oracle.com> References: <8986CD81-C221-408E-B19D-DBEF7D55519E@oracle.com> Message-ID: <10f14519-7aef-033e-6984-831413176cae@oracle.com> Hi Alan, Thanks for sharing this doc, On 4/3/19 2:53 PM, Brian Goetz wrote: > Thanks Alan for this good work; grounding the analysis in real codebases is a valuable tool for validating our theories. > > Some comments inline. > >> One additional thing I would have liked to do is to somehow find classes which were ?almost? records, but which ended up not using @AutoValue. A survey of these might help us decide what changes we could make to increase adoption, or simply confirm for us, ?Yes, it?s a good thing we included restriction X, because this class is a bad candidate for a record, but without restriction X it might have become a record?. Alas, unsurprisingly, it is much easier to find actual @AutoValue classes than to design a heuristic for ?almost an @AutoValue?, and so I have not done this. > This is desirable, but as you point out, hard to do ? you can?t grep for ?@WishIWasAnAutoValue?. Often the best we can do is recall specific instances of coding in the past when we tumbled off the cliff, and try to reconstruct what was going on. on this respect you can probably find useful class: com.sun.tools.javac.comp.Analyzer, we have used to find things like what initializations could be diamonds or lambdas, etc. Most recently it was used to find out what explicit variable declarations could be substituted by `var`. This analyzer is very powerful. I can help providing an analyzer that you can test on your code base. 
Could you please share some of the patterns of those could-have-been-AutoValue classes?

>
>> * Records can expect to be about as common as enums
>

Thanks,
Vicente

From amalloy at google.com Wed Apr 3 20:03:51 2019
From: amalloy at google.com (Alan Malloy)
Date: Wed, 3 Apr 2019 13:03:51 -0700
Subject: How records would fit into Google's codebase
In-Reply-To: <10f14519-7aef-033e-6984-831413176cae@oracle.com>
References: <8986CD81-C221-408E-B19D-DBEF7D55519E@oracle.com> <10f14519-7aef-033e-6984-831413176cae@oracle.com>
Message-ID:

I have already been using a compiler plugin to do this analysis. The specific analysis tooling is Google-internal, but it uses APIs similar to those in Error Prone matchers to consume javac Tree nodes. The work that would need to be done is more in deciding what patterns to look for than in the actual mechanics of how to look for them.

From amalloy at google.com Wed Apr 3 20:35:47 2019
From: amalloy at google.com (Alan Malloy)
Date: Wed, 3 Apr 2019 13:35:47 -0700
Subject: How records would fit into Google's codebase
In-Reply-To: <8986CD81-C221-408E-B19D-DBEF7D55519E@oracle.com>
References: <8986CD81-C221-408E-B19D-DBEF7D55519E@oracle.com>
Message-ID:

On Wed, Apr 3, 2019 at 11:53 AM Brian Goetz wrote:

> Of those that extend a class other than Object, do you spot any common
> cases, either specific classes or classes with specific characteristics?

Good question! I'm surprised I didn't already look into this - I guess since I had misread your proposal as allowing subclassing, I didn't think too hard about the superclasses.

The most frequent specific superclass turns out to be Guice's AbstractModule, at 6.5%. I'm not familiar with Guice, but my understanding is that a record R extends AbstractModule in order to put a factory for R objects in some kind of generic factory registry. I imagine Guice is a lot more popular here than in the rest of the world, so I would not expect that to represent records in general.
Perhaps more interesting, the three next-most-common superclasses are "convergent evolution" of an identical class (but with a different name):

    public abstract class ToStringless {
        @Override
        public final String toString() {
            return super.toString();
        }
    }

This class prevents @AutoValue from synthesizing a toString() method, so Object.toString() gets locked in. Often this is used for a string which may be "sensitive", but perhaps there are other use cases. Of records with superclasses, 9% have something like this as their superclass. No state, just behavior. There's another superclass that also fixes identity-based hashCode and equals(). If we include this in the count, we get over 10%.

After that, many of the common superclasses seem to be "abstract records": an abstract class that defines a number of fields for @AutoValue to implement, plus a method or two that uses those fields to conform to some interface.

Two funny ones stuck out to me: java.lang.Exception and java.lang.RuntimeException! Makes sense, right? People want to bundle data up in their exceptions, and what better way to bundle data than as a record? Still, this makes up only about 1% of records with superclasses. These classes look about like you'd expect: they just hold a couple fields, nothing fancy.

You also mentioned factory methods as an alternative to constructors. I don't see this as helping much: it's still positional, and wouldn't get names to the call sites.
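
A minimal sketch of how the ToStringless pattern above behaves. Only ToStringless mirrors the class quoted in the message; ApiKey and ToStringlessDemo are hypothetical names, and ApiKey is a hand-written stand-in for what would normally be a generated value class.

```java
// Mirrors the class quoted above: the final override pins Object.toString().
abstract class ToStringless {
    @Override
    public final String toString() {
        return super.toString();
    }
}

// Hypothetical hand-written value class holding a sensitive string.
// Because toString() is final in the superclass, nothing here (and no
// code generator) can add a toString() that prints the field.
final class ApiKey extends ToStringless {
    private final String secret;

    ApiKey(String secret) {
        this.secret = secret;
    }

    String secret() {
        return secret;
    }
}

final class ToStringlessDemo {
    public static void main(String[] args) {
        ApiKey key = new ApiKey("hunter2");
        // Prints the identity-based form (class name, '@', hash); the secret never appears.
        System.out.println(key);
    }
}
```

The same effect could be had by overriding toString() directly in each value class; the shared superclass just makes the choice impossible to undo by accident.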

From brian.goetz at oracle.com Wed Apr 3 20:46:08 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Apr 2019 16:46:08 -0400
Subject: How records would fit into Google's codebase
In-Reply-To:
References: <8986CD81-C221-408E-B19D-DBEF7D55519E@oracle.com>
Message-ID: <0DEA04B5-9582-4CE6-85EB-DCBC9417C557@oracle.com>

> Perhaps more interesting, the three next-most-common superclasses are "convergent evolution" of an identical class (but with a different name)
>
> public abstract class ToStringless {
>     @Override
>     public final String toString() {
>         return super.toString();
>     }
> }
>
> This class prevents @AutoValue from synthesizing a toString() method, so Object.toString() gets locked in. Often this is used for a string which may be "sensitive", but perhaps there are other use cases. Of records with superclasses, 9% have something like this as their superclass. No state, just behavior. There's another superclass that also fixes identity-based hashCode and equals(). If we include this in the count, we get over 10%.

OK, so these fall into the narrow gap between what interfaces can do and what abstract classes can - implement Object methods. That's interesting, and worth remembering. Of course, it's also easy to implement toString() in the record (and would be even lower overhead with something like CMB).

From james.laskey at oracle.com Fri Apr 5 14:15:32 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Fri, 5 Apr 2019 11:15:32 -0300
Subject: String reboot (plain text)
In-Reply-To: <1842314062.1606725.1553177947475.JavaMail.zimbra@u-pem.fr>
References: <7591899A-FB5F-4277-936D-937B7DDBF1E6@oracle.com> <58E6523D-8951-4927-85A7-0BAB30234EC3@oracle.com> <1565954477.1595365.1553176052877.JavaMail.zimbra@u-pem.fr> <1842314062.1606725.1553177947475.JavaMail.zimbra@u-pem.fr>
Message-ID: <400D5C99-D878-42DF-8BE9-0633696E2A76@oracle.com>

String Tapas - serving (1)

I created a new amber branch "string-tapas" for delivery of Multiline String Literal.
This repo now contains the implementation of Brian's (1), which is a multiline string literal using triple double quotes as a delimiter. I also added a String::align method until we have a chance to discuss (1a).

The following example works as expected:

    public class Test {
        public static void main(String... args) {
            String result = """
                            public class Main {
                                public static void main(String... args) {
                                    System.out.println("Hello World!");
                                }
                            }
                            """.align();
            System.out.println(result);
        }
    }

Empty string is both "" and """""". Escape sequences and unicode escapes are always translated. CRLF and CR are translated to LF unless expressed as escape sequences.

Triple double quote can be introduced using \""":

    public class Test {
        public static void main(String... args) {
            String result = """
                            public class Main {
                                public static void main(String... args) {
                                    System.out.println(\"""Hello World!\""");
                                }
                            }
                            """.align();
            System.out.println(result);
        }
    }

Sated: multi-line embedded snippets of many languages (JSON, HTML, XML) without any (or minimal) escaping
Sated: fat strings get fat delimiters
Sated: One rule for escaping across all strings
Still Hungry: Regex
Still Hungry: incidental whitespace (align)

Fairly minimal but handles a large set of use cases.

Cheers,

-- Jim

> On Mar 21, 2019, at 11:19 AM, forax at univ-mlv.fr wrote:
>
> ----- Original Message -----
>> From: "Brian Goetz"
>> To: "Remi Forax" , "John Rose"
>> Cc: "Jim Laskey" , "amber-spec-experts"
>> Sent: Thursday, March 21, 2019 14:56:58
>> Subject: Re: String reboot (plain text)
>
>>> I really like in the syntax proposed by Jim the fact that the single quote " is
>>> retconned to allow several lines,
>>> it seems the easiest thing to do if we just want to introduce a multi-lines
>>> literal string.
>>
>> This has already been rejected, because it doesn't address the main use
>> cases -- most multi-line snippets still want to have quotes in them
>> (SQL, JSON, XML, etc), and thus would still have to be escaped.
>
> ok, i never expect to have a lot of codes using a multi-lines single quote but it's fairly common to have DSLs like JSP, velocity marker, mustache template etc to be compiled to Java source and they contains unicode escapes so using raw strings is not an option.
>
>>
>>> I disagree with Brian that we should try to have an intelligent algorithm to
>>> remove the blank spaces
>>
>> Thought that's not actually what I proposed. What I've proposed is to
>> start with a choice of "1" or "1a", the latter being "1 with intelligent
>> reflow." So your preference for 1 over 1a is recorded!
>
> Rémi

From brian.goetz at oracle.com Fri Apr 5 14:24:06 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 5 Apr 2019 10:24:06 -0400
Subject: String reboot (plain text)
In-Reply-To: <400D5C99-D878-42DF-8BE9-0633696E2A76@oracle.com>
References: <7591899A-FB5F-4277-936D-937B7DDBF1E6@oracle.com> <58E6523D-8951-4927-85A7-0BAB30234EC3@oracle.com> <1565954477.1595365.1553176052877.JavaMail.zimbra@u-pem.fr> <1842314062.1606725.1553177947475.JavaMail.zimbra@u-pem.fr> <400D5C99-D878-42DF-8BE9-0633696E2A76@oracle.com>
Message-ID:

> String Tapas - serving (1)

I'm hungry already!

> Sated: multi-line embedded snippets of many languages (JSON, HTML, XML) without any (or minimal) escaping
> Sated: fat strings get fat delimiters
> Sated: One rule for escaping across all strings

The "one rule" thing is a pretty big deal. What this means is that the only difference between a "classical" string and a "fat" string is the treatment of how it spans lines in the source - the escaping rules are the same. These new strings are clearly just the more stout sibling of classical strings - nothing new to learn (yet). I like that a lot.
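
A minimal sketch of the "one rule for escaping" point above. It is written with the text-block syntax that later shipped in Java 15 (opening delimiter on its own line, indentation stripped automatically) rather than the prototype `.align()` form used in this thread; that substitution is an assumption made so the example compiles on a current JDK. The class name OneEscapeRuleDemo is hypothetical.

```java
final class OneEscapeRuleDemo {
    // Classical literal: the line break has to be spelled out as \n.
    static String classical() {
        return "name:\t\"Duke\"\n";
    }

    // Fat literal: the line break comes from the source layout, while
    // \t and \" are processed exactly as in the classical literal.
    static String fat() {
        return """
            name:\t\"Duke\"
            """;
    }

    public static void main(String[] args) {
        // Prints true: both literals denote the same string.
        System.out.println(classical().equals(fat()));
    }
}
```

Inside the fat literal the \" escapes are optional; plain " characters would give the same result, which is the incremental migration path described later in the thread.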

From alex.buckley at oracle.com Fri Apr 5 18:12:50 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Fri, 05 Apr 2019 11:12:50 -0700
Subject: String reboot (plain text)
In-Reply-To: <400D5C99-D878-42DF-8BE9-0633696E2A76@oracle.com>
References: <7591899A-FB5F-4277-936D-937B7DDBF1E6@oracle.com> <58E6523D-8951-4927-85A7-0BAB30234EC3@oracle.com> <1565954477.1595365.1553176052877.JavaMail.zimbra@u-pem.fr> <1842314062.1606725.1553177947475.JavaMail.zimbra@u-pem.fr> <400D5C99-D878-42DF-8BE9-0633696E2A76@oracle.com>
Message-ID: <5CA79AA2.9080508@oracle.com>

On 4/5/2019 7:15 AM, Jim Laskey wrote:
> Following example works as expected:
>
> public class Test {
>     public static void main(String... args) {
>         String result = """
>                         public class Main {
>                             public static void main(String... args) {
>                                 System.out.println("Hello World!");
>                             }
>                         }
>                         """.align();
>         System.out.println(result);
>     }
> }
>
> Empty string is both "" and """""". Escape sequences and unicode escapes are always translated.

As someone who was nervous about how raw string literals effectively sidelined Unicode, I'm pleased that \uXXXX escapes are back. It's also great that the traditional escape sequence \" will be interpreted as a single " like it would be in a traditional string literal. Because, as we all know, the code above started life as this painful noisy code:

    String result = "public class Main {\n" +
                    "  public static void main(String... args) {\n" +
                    "    System.out.println(\"Hello World!\");\n" +
                    "  }\n" +
                    "}\n";

Now a developer can move forward in steps: today remove all the end-of-line cruft involving \n and + that multi-line strings do for free, and don't worry about mid-line escape sequences such as \" -- convert them to " tomorrow, or next week, or not at all, your choice.

Alex

From brian.goetz at oracle.com Fri Apr 5 19:52:39 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 5 Apr 2019 15:52:39 -0400
Subject: String reboot (plain text)
In-Reply-To: <5CA79AA2.9080508@oracle.com>
References: <7591899A-FB5F-4277-936D-937B7DDBF1E6@oracle.com> <58E6523D-8951-4927-85A7-0BAB30234EC3@oracle.com> <1565954477.1595365.1553176052877.JavaMail.zimbra@u-pem.fr> <1842314062.1606725.1553177947475.JavaMail.zimbra@u-pem.fr> <400D5C99-D878-42DF-8BE9-0633696E2A76@oracle.com> <5CA79AA2.9080508@oracle.com>
Message-ID:

Indeed so. While many languages have been content to treat ML as a special case of raw, there's something somewhat odd and forced about lumping them together. Of course, splitting has its overheads too -- but I think there is a path here to identifying the right dimensions of parameterization where we can have one feature, simply parameterized. The first dimension here is what to do about line spanning; the thin-vs-fat delimiter seems a good enough way to split that out, but there is still lots to talk about regarding accidental indentation and terminator normalization (1A) -- before we even get to raw-ness (2). But so far, teasing out ML as its own axis (which isn't fully finished) looks promising, and I like that these string literals are recognized as merely another flavor of the same basic feature. Happy so far.

On 4/5/2019 2:12 PM, Alex Buckley wrote:
> As someone who was nervous about how raw string literals effectively
> sidelined Unicode, I'm pleased that \uXXXX escapes are back. It's also
> great that the traditional escape sequence \" will be interpreted as a
> single " like it would be in a traditional string literal. Because, as
> we all know, the code above started life as this painful noisy code:
>
> String result = "public class Main {\n" +
>                 "  public static void main(String... args) {\n" +
>                 "    System.out.println(\"Hello World!\");\n" +
>                 "  }\n" +
>                 "}\n";
>
> Now a developer can move forward in steps: today remove all the
> end-of-line cruft involving \n and + that multi-line strings do for
> free, and don't worry about mid-line escape sequences such as \" --
> convert them to " tomorrow, or next week, or not at all, your choice.
>
> Alex

From amalloy at google.com Fri Apr 5 21:49:09 2019
From: amalloy at google.com (Alan Malloy)
Date: Fri, 5 Apr 2019 14:49:09 -0700
Subject: Seeking suggestions for a sealed-types analysis
Message-ID:

Hello again, amber-spec-experts. I plan a follow-up of my recent records report, this time looking at sealed types (from the same proposal). This time, I am seeking suggestions before I get started. In particular, for records, it was easy to find @AutoValue classes: they have a well-defined annotation that I can search for. Regrettably, we did not have the foresight to publish a @TODOMigrateToSealedType annotation, and so I will have to use heuristics to search for code that could have been written "better" if sealed types were available.

So, I am looking for suggestions on two main axes (though feel free to pipe in if there is a third axis you think I've missed):

1. Code patterns that you think people would write today, where tomorrow they might instead use sealed types
2. What questions we can ask about occurrences of these code patterns, having identified a list of them

We have brainstormed a few ideas for (1) already:

- if (x instanceof Foo) {} else if (x instanceof Bar) {} ..., especially when the set of classes tested for covers every known subclass of x's declared type
- Visitors. A visitor class is not itself sealed, but if a class or interface C contains methods which accept a visitor as an argument, we guess C probably wants to be sealed. Some heuristics for identifying a visitor are obvious (it has Visitor in the name, it has methods named visitXXX()...), but I am open to others
- A DIY "sealed" type might be an abstract class with only package-private constructors, and a number of subclasses in the same package each with public constructors

And some questions to ask in (2):

- How common is each of these things?
- Are implementations usually declared in the same package? Same source file? Nested in the same class? Or scattered all over?
- Are sealed type hierarchies typically exactly 2 levels deep (one interface, many direct subclasses), or is there an intricate tree?

In case it helps you refine your suggestions, here is a brief description of my methodology and the tools available to me, stating explicitly some stuff I glossed over in the records report earlier this week.

We have a compiler plugin that accepts as input a visitor for javac Tree objects. I declare what kinds of Trees I am interested in, and the structure of data that I will produce as output (specified as a protobuf file). Then, I write my visitor, and a batch job calls its visit function for each Tree of interest in the whole of Google's codebase. The output records it produces get stored in (an internal version of) Google BigQuery. Finally, I learn about the results by writing SQL-ish queries against the BigQuery database. The quickest way to iterate is to just refine my queries, but the feedback loop on improving my visitor is not too bad either - I think I went through 12 versions of my @AutoValue visitor before I had everything I wanted.

So, I hope that background helps, or that some of you just found it interesting for its own sake. I am looking forward to any suggestions you may have, and to finding out how badly (or not!) we have been missing sealed types.
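
The three candidate patterns listed above for (1) can be sketched in one small hierarchy. This is an illustration under assumptions, not code from the analysis: Shape, Circle, Square, ShapeVisitor and Areas are all hypothetical names, and the classes are left non-public only so that the sketch fits in a single source file (in real code the subclasses would typically be public).

```java
// DIY "sealed" type: the only constructor is package-private, so every
// subclass has to live in this package.
abstract class Shape {
    Shape() {}

    // Visitor heuristic: the type has a method that accepts a visitor.
    abstract <R> R accept(ShapeVisitor<R> visitor);
}

// Recognizable as a visitor by its name and its visitXXX() methods.
interface ShapeVisitor<R> {
    R visitCircle(Circle circle);
    R visitSquare(Square square);
}

final class Circle extends Shape {
    final double radius;

    public Circle(double radius) { this.radius = radius; }

    @Override
    <R> R accept(ShapeVisitor<R> visitor) { return visitor.visitCircle(this); }
}

final class Square extends Shape {
    final double side;

    public Square(double side) { this.side = side; }

    @Override
    <R> R accept(ShapeVisitor<R> visitor) { return visitor.visitSquare(this); }
}

final class Areas {
    // instanceof chain that tests for every known subclass of Shape.
    static double byInstanceof(Shape shape) {
        if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return Math.PI * c.radius * c.radius;
        } else if (shape instanceof Square) {
            Square s = (Square) shape;
            return s.side * s.side;
        }
        // Without sealing, the compiler cannot know the chain is exhaustive.
        throw new IllegalStateException("unknown shape: " + shape);
    }

    // The same computation expressed through the visitor.
    static double byVisitor(Shape shape) {
        return shape.accept(new ShapeVisitor<Double>() {
            @Override
            public Double visitCircle(Circle c) { return Math.PI * c.radius * c.radius; }

            @Override
            public Double visitSquare(Square s) { return s.side * s.side; }
        });
    }
}
```

Each of the three shapes of code leaves a different syntactic trace (a cast-after-instanceof chain, a visitXXX interface, a non-public constructor on an abstract class), which is what a Tree-based heuristic would have to key on.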

From brian.goetz at oracle.com Fri Apr 5 21:54:33 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 5 Apr 2019 17:54:33 -0400
Subject: Seeking suggestions for a sealed-types analysis
In-Reply-To:
References:
Message-ID: <86ba8cfc-3b6b-f648-a2d6-64f69ba4f186@oracle.com>

Thanks Alan! This sounds great.

From brian.goetz at oracle.com Sat Apr 6 15:47:25 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 6 Apr 2019 11:47:25 -0400
Subject: Fwd: String reboot (plain text)
References:
Message-ID: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com>

Received on amber-spec-comments. These are mostly comments on syntax options that are for later parts of the discussion, so I'm reading them into the record now, and we're going to leave them to sit until then.

High-order summary:
- Please consider prefixes a""" for auto-align and r""" for raw.

> Begin forwarded message:
>
> From: Stephen Colebourne
> Subject: Re: String reboot (plain text)
> Date: March 21, 2019 at 11:14:53 AM EDT
> To: amber-spec-comments at openjdk.java.net
> Cc: Brian Goetz
>
> On Wed, 13 Mar 2019 at 17:52, Brian Goetz wrote:
>> So, in the spirit of "keep ordering until sated, but stop there", here are some reasonable choices.
>>
>> 1. Do multi-line (escaped) strings with a """ fixed delimiter. Large benefit, small cost. Most embedded snippets don't need any escaping. Low cost, big payoff.
>>
>> 1a. Do 1, but automatically reflow multi-line strings using the equivalent of String::align. There have been reasonable proposals on how to do this; where they fell apart is the interaction with raw-ness, but if we separate ML and raw, these become reasonable again. Higher cost, but higher payoff; having separated the interaction with raw strings, this is more defensible.
>>
>> 2. Do (1) or (1a), and add: single-line raw string literals delimited by \"..."\.
>>
>> 2a. Do (1) or (1a), and also support multi-line raw string literals (where we _don't_ automatically apply String::align; this can be done manually). Note that this creates anomalies for multi-line raw string literals starting with quotes (this can be handled with concatenation, and having separated ML and raw, this is less of a problem than before).
>>
>> 3. Do (2) and (2a), and also support a repeating compound delimiter with multiple backslashes and a quote.
>
> My views have not changed dramatically from my last mail [1]. As per
> these options, I think the language would benefit from 1, 1a, 2 and 2a
> type changes. I think the choice between 1 and 1a is a false one. I'm
> not convinced 3 is worth pursuing.
>
> I agree that triple double-quote is the right mechanism for multi-line
> strings - an obvious direction for Java. I'm happy to accept 1 on its
> own *providing that 1a can be added later*. My preference is for """
> to be the delimiter for non-aligned multi-line strings, and for a
> single letter prefix 'a' to be used for aligned strings, eg. a""" ...
> """
>
> For raw strings I personally find the syntax /" ... "/ or /""" ...
> """/ unpleasant. While the argument of "distributing the escape over
> the string" makes some sense in the abstract, the result is not
> appealing to read. Given that I believe aligned and non-aligned
> strings should be separated by a single letter prefix, I believe that
> raw strings and non-raw strings should also be separated by a single
> letter prefix:
>
> """   - multi-line with-escapes & non-aligned
> a"""  - multi-line with-escapes & aligned
> r"""  - multi-line raw & non-aligned
> ra""" - multi-line raw & aligned
> "     - single-line with-escapes
> r"    - single-line raw
>
> And yes, I do think you can have raw and aligned as a combination. I
> think using prefix letters is more extensible, more orthogonal and
> clearer than using /""" ... """/.
>
> thanks
> Stephen
>
> [1] https://mail.openjdk.java.net/pipermail/amber-dev/2019-January/003850.html

From forax at univ-mlv.fr Sat Apr 6 15:52:41 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 6 Apr 2019 17:52:41 +0200 (CEST)
Subject: String reboot (plain text)
In-Reply-To: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com>
Message-ID: <1857125401.2344824.1554565961574.JavaMail.zimbra@u-pem.fr>

I like the r prefix because most people think the r prefix means regular expression.

Rémi

From brian.goetz at oracle.com Sat Apr 6 16:14:00 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 6 Apr 2019 12:14:00 -0400
Subject: String reboot (plain text)
In-Reply-To: <1857125401.2344824.1554565961574.JavaMail.zimbra@u-pem.fr>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <1857125401.2344824.1554565961574.JavaMail.zimbra@u-pem.fr>
Message-ID: <722D08EB-DC24-40EA-B9C5-9E4822D6F37E@oracle.com>

Yes, because when I said "these are for the later part of the discussion, we're going to leave it to sit until then", what I meant of course was "please post your opinions on it now".

From forax at univ-mlv.fr Sat Apr 6 16:36:50 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Sat, 6 Apr 2019 18:36:50 +0200 (CEST)
Subject: String reboot (plain text)
In-Reply-To: <722D08EB-DC24-40EA-B9C5-9E4822D6F37E@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <1857125401.2344824.1554565961574.JavaMail.zimbra@u-pem.fr> <722D08EB-DC24-40EA-B9C5-9E4822D6F37E@oracle.com>
Message-ID: <1746250705.2347171.1554568610835.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "amber-spec-experts"
> Sent: Saturday, April 6, 2019 18:14:00
> Subject: Re: String reboot (plain text)

> Yes, because when I said "these are for the later part of the discussion, we're
> going to leave it to sit until then", what I meant of course was "please post
> your opinions on it now".
ok, ok, no syntax,
what i should have said is that any prefix/suffix will do the job.

I understand that you want to separate the steps and build a consensus,
i think we are spending a lot of time on the raw part, we should decide a syntax and move to the real question, do we offer support for alignment directly in the language or not ?

Rémi

From brian.goetz at oracle.com Sat Apr 6 17:05:43 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 6 Apr 2019 13:05:43 -0400
Subject: String reboot (plain text)
In-Reply-To: <1746250705.2347171.1554568610835.JavaMail.zimbra@u-pem.fr>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <1857125401.2344824.1554565961574.JavaMail.zimbra@u-pem.fr> <722D08EB-DC24-40EA-B9C5-9E4822D6F37E@oracle.com> <1746250705.2347171.1554568610835.JavaMail.zimbra@u-pem.fr>
Message-ID: <54B1AEE5-F8D2-4088-A102-829DCA6B4DC8@oracle.com>

> I understand that you want to separate the steps and build a consensus,
> i think we are spending a lot of time on the raw part, we should decide a syntax and move to the real question, do we offer support for alignment directly in the language or not ?

I prefer to move directly on to the real question, and then discuss syntax :) Among other reasons, picking a syntax often implicitly constrains the solution, before you know what the right questions are.
Of course, there is the ongoing challenge that, when discussing questions and possible answers, one has to use _something_ to illustrate what you mean, and of course then people will want to discuss the syntax. And we're going to resist that urge, even though this feature has a higher syntax quotient than most.

I agree that raw-ness was a distraction in the first round, which is why I placed it at (2) in my list of steps. And right now, we're still at 1 / 1a. In the first round, we got wrapped around the axle with raw-ness so early, we didn't even stop to think about the bigger problem, multi-line. (It is tempting, as I mentioned before, to consider multi-line to be "just" a special case of raw, and while that's a possible outcome, there are some really good reasons to consider it on its own first.)

Jim is working on some organized thoughts for 1a, stay tuned.

From forax at univ-mlv.fr Sat Apr 6 19:17:34 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 6 Apr 2019 21:17:34 +0200 (CEST)
Subject: switch statement and lambda
Message-ID: <335949504.2353074.1554578254082.JavaMail.zimbra@u-pem.fr>

Currently this code doesn't compile
IntConsumer c = x -> switch(x) { default -> System.out.println(x); };

I believe it should because this is the basic pattern for supporting the actor model,
you consume a message and do a side effect* depending on the type of the message,
translated in Java, you want a lambda that takes a message as parameter, calls a switch to do the pattern matching and returns void.

The other reason is that it's not rare to move from a switch expression to a switch statement and vice-versa when developing the code, and adding/removing a pair of curly braces around the code you are writing because it's the body/expression of a lambda is not very user friendly.

regards,
Rémi

* it's cleaner in Erlang because you have tail calls, so the side effects are hidden.
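As an illustration of the "pair of curly braces" mentioned in the message above, here is a minimal sketch of the form that does compile: the switch is wrapped in a block, so it is treated as a switch statement inside a void-compatible lambda body. The class name, the LOG sink and the case labels are hypothetical and only serve to make the side effects observable; arrow-form switch statements are assumed to be available (standard since Java 14).

```java
import java.util.function.IntConsumer;

final class SwitchInLambdaSketch {
    // Hypothetical sink that records side effects so they can be inspected.
    static final StringBuilder LOG = new StringBuilder();

    // The braces make the lambda body a void-compatible block, so the
    // switch inside it is a statement rather than an expression.
    static final IntConsumer HANDLER = x -> {
        switch (x) {
            case 0 -> LOG.append("zero;");
            default -> LOG.append("other:").append(x).append(';');
        }
    };

    public static void main(String[] args) {
        HANDLER.accept(0);
        HANDLER.accept(7);
        System.out.println(LOG); // prints "zero;other:7;"
    }
}
```

The cost of this form is exactly the one the message complains about: the braces have to be added or removed whenever the body moves between expression and statement form.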

From gavin.bierman at oracle.com Tue Apr 9 17:28:57 2019
From: gavin.bierman at oracle.com (Gavin Bierman)
Date: Tue, 9 Apr 2019 19:28:57 +0200
Subject: switch statement and lambda
In-Reply-To: <335949504.2353074.1554578254082.JavaMail.zimbra@u-pem.fr>
References: <335949504.2353074.1554578254082.JavaMail.zimbra@u-pem.fr>
Message-ID: <6FA89498-078B-4C81-B96C-F09B4AF72E46@oracle.com>

> On 6 Apr 2019, at 21:17, Remi Forax wrote:
>
> Currently this code doesn't compile
> IntConsumer c = x -> switch(x) { default -> System.out.println(x); };
>
> I believe it should because this is the basic pattern for supporting the actor model,
> you consume a message and do a side effect* depending on the type of the message,
> translated in Java, you want a lambda that takes a message as parameter, calls a switch to do the pattern matching and return void.

I understand, although this is actually to do with the way lambda expressions are typed, rather than the switch expression. In JLS 15.27.3 "Type of a Lambda Expression", there is a special case:

- If the function type's result is void, the lambda body is either a statement expression (§14.8) or a void-compatible block.

Which means that the following code typechecks:

IntConsumer ic = x -> System.out.println(x);

but it breaks as soon as we nest the statement expression, e.g.

IntConsumer ic2 = x -> true ? System.out.println(x) : System.out.println(x); // Compilation error: target-type for conditional expression cannot be void

This is what is happening in your example. So to deal with this we'd either have to make typechecking of lambdas smarter - either by replacing the typing rule for lambdas above with something more compositional, or by making void a first-class type, or we could perhaps add a pattern-matching form of lambda, which has a void-aware typing rule. I'm not sure about any of these options for now.
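The special case quoted from JLS 15.27.3 can be seen in a small sketch. The class and field names below are hypothetical, and `StringBuilder.append` stands in for `System.out.println` only so that the same body can be shown against both a void and a non-void function type.

```java
import java.util.function.IntConsumer;
import java.util.function.IntFunction;

final class VoidLambdaSketch {
    // Hypothetical sink used instead of System.out so effects can be checked.
    static final StringBuilder OUT = new StringBuilder();

    // A method invocation is a statement expression, so it is accepted as the
    // body of a lambda whose function type returns void; the value is discarded.
    static final IntConsumer AS_STATEMENT = x -> OUT.append(x);

    // The same body is also an ordinary expression, so it fits a function
    // type that returns a value.
    static final IntFunction<StringBuilder> AS_VALUE = x -> OUT.append(x);

    // A conditional expression is not a statement expression, so
    //   IntConsumer bad = x -> x > 0 ? OUT.append(x) : OUT.append(-x);
    // is rejected. A void-compatible block expresses the same logic.
    static final IntConsumer AS_BLOCK = x -> {
        if (x > 0) {
            OUT.append(x);
        } else {
            OUT.append(-x);
        }
    };

    public static void main(String[] args) {
        AS_STATEMENT.accept(1);
        AS_VALUE.apply(2);
        AS_BLOCK.accept(-3);
        System.out.println(OUT); // prints "123"
    }
}
```

The rule is not compositional: nesting the invocation inside another expression form loses the void compatibility, which is the same effect the switch example runs into.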

Cheers,
Gavin

From forax at univ-mlv.fr Tue Apr 9 19:10:16 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 9 Apr 2019 21:10:16 +0200 (CEST)
Subject: switch statement and lambda
In-Reply-To: <6FA89498-078B-4C81-B96C-F09B4AF72E46@oracle.com>
References: <335949504.2353074.1554578254082.JavaMail.zimbra@u-pem.fr> <6FA89498-078B-4C81-B96C-F09B4AF72E46@oracle.com>
Message-ID: <1063688087.754980.1554837016753.JavaMail.zimbra@u-pem.fr>

> From: "Gavin Bierman"
> To: "Remi Forax"
> Cc: "amber-spec-experts"
> Sent: Tuesday, April 9, 2019 19:28:57
> Subject: Re: switch statement and lambda

>> On 6 Apr 2019, at 21:17, Remi Forax <forax at univ-mlv.fr> wrote:

>> Currently this code doesn't compile
>> IntConsumer c = x -> switch(x) { default -> System.out.println(x); };

>> I believe it should because this is the basic pattern for supporting the actor
>> model,
>> you consume a message and do a side effect* depending on the type of the
>> message,
>> translated in Java, you want a lambda that takes a message as parameter, calls a
>> switch to do the pattern matching and return void.

> I understand, although this is actually to do with the way lambda expressions
> are typed, rather than the switch expression. In JLS 15.27.3 "Type of a Lambda
> Expression", there is a special case:

> - If the function type's result is void, the lambda body is either a statement
> expression (§14.8) or a void-compatible block.

> Which means that the following code typechecks:

> IntConsumer ic = x -> System.out.println(x);

> but it breaks as soon as we nest the statement expression, e.g.

> IntConsumer ic2 = x -> true ? System.out.println(x) : System.out.println(x); //
> Compilation error: target-type for conditional expression cannot be void

> This is what is happening in your example. So to deal with this we'd either have
> to make typechecking of lambdas smarter - either by replacing the typing rule
> for lambdas above with something more compositional, or by making void a
> first-class type, or we could perhaps add a pattern-matching form of lambda,
> which has a void-aware typing rule. I'm not sure about any of these options for
> now.

yes,
i'm proposing to create a special case for a switch inside a lambda expression for the same reason we have a special treatment for methods.

For example, this does not compile
https://github.com/forax/loom-fiber/blob/cea7b86c26c2e86b00fb72e5098a37983e8b6441/src/main/java/fr.umlv.loom/fr/umlv/loom/actor/CounterStringActorExprSwitchDemo.java#L17
while this compiles
https://github.com/forax/loom-fiber/blob/6d8a0a6ba870580d43988cd0d507d19d57653d62/src/main/java/fr.umlv.loom/fr/umlv/loom/actor/CounterStringActorExprSwitchDemo.java#L17

> Cheers,
> Gavin

cheers,
Rémi

From brian.goetz at oracle.com Tue Apr 9 19:14:25 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 9 Apr 2019 15:14:25 -0400
Subject: switch statement and lambda
In-Reply-To: <1063688087.754980.1554837016753.JavaMail.zimbra@u-pem.fr>
References: <335949504.2353074.1554578254082.JavaMail.zimbra@u-pem.fr> <6FA89498-078B-4C81-B96C-F09B4AF72E46@oracle.com> <1063688087.754980.1554837016753.JavaMail.zimbra@u-pem.fr>
Message-ID:

I see why this is tempting, but I am going to suggest we wait. As part of Valhalla, we would like for `void` to become a real type some day; that will require evaluating all the places in the JLS where we treat statements and expressions differently, or make exceptions like this. Until we've completed this analysis, I'm reluctant to add more special cases.

From forax at univ-mlv.fr Tue Apr 9 19:39:58 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 9 Apr 2019 21:39:58 +0200 (CEST)
Subject: switch statement and lambda
In-Reply-To:
References: <335949504.2353074.1554578254082.JavaMail.zimbra@u-pem.fr> <6FA89498-078B-4C81-B96C-F09B4AF72E46@oracle.com> <1063688087.754980.1554837016753.JavaMail.zimbra@u-pem.fr>
Message-ID: <764083731.758533.1554838798792.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "Gavin Bierman", "amber-spec-experts"
> Sent: Tuesday, April 9, 2019 21:14:25
> Subject: Re: switch statement and lambda

> I see why this is tempting, but I am going to suggest we wait. As part of
> Valhalla, we would like for `void` to become a real type some day; that will
> require evaluating all the places in the JLS where we treat statements and
> expressions differently, or make exceptions like this. Until we've completed
> this analysis, I'm reluctant to add more special cases.

seems wise, ok !

Rémi

From james.laskey at oracle.com Wed Apr 10 15:22:59 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Wed, 10 Apr 2019 12:22:59 -0300
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com>
Message-ID: <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>

Next plate is (1a) incidental whitespace.

Having decided that we are content with "fat" delimiters (""") for multi-line strings, we have some more choices to make regarding multi-line strings. (We're not going to talk about "raw" strings yet; let's finish the multi-line course first.)

Multi-line strings are different from single-line strings in a number of ways, so let's get clear on what we want "multi-line" to mean.

Line terminators: When strings span lines, they do so using the line terminators present in the source file, which may vary depending on the operating system on which the file was authored. Should this be an aspect of multi-line-ness, or should we normalize these to a standard line terminator? It seems a little weird to treat string literals quite so literally; the choice of line terminator is surely an incidental one. I think we're all comfortable saying "these should be normalized", but it's worth bringing this up because it is merely one way in which incidental artifacts of how the string is embedded in the source program force us to interpret what the user meant. Which brings us to the next incidental aspect...

Whitespace: A multi-line string is nestled in the context of a Java source program. It is likely (though not guaranteed) that the indentation of lines has been distorted by the desire to make the embedded snippet align with the enclosing lines. Most of the time, there is some combination of incidental whitespace and intended whitespace. There are a number of algorithms by which we could try to intuit which the user intended. Which brings us to ask:

- Assuming the existence of a reasonable algorithm for re-aligning text, what should the _default_ be for the language? Should it assume the user wants re-alignment, or make the user explicitly opt in?
- If the choice is "automatically align", how would we indicate the desire to opt out?
- Should we limit what we do automatically to only what can be done by an equivalent library routine?

(Again, let's focus on the requirements and semantics and defaults first, before we bikeshed the syntax.)

It's hard to answer the above without a clear understanding of the use cases. So, here's a partial catalog of examples; let's play "what was the user thinking", and see if we can agree on that.

Examples;

String a = """ +--------+ | text | +--------+ """; // first characters in first column?
String b = """ +--------+ | text | +--------+ """; // first characters in first column or indented four spaces?
String c = """ +--------+ | text | +--------+ """; // first characters in first column or indented several?
String d = """ +--------+ | text | +--------+ """; // first characters in first column or indented four?
String e = """ +--------+ | text | +--------+ """; // heredoc?
String f = """ +--------+ | text | +--------+ """; // one or all leading or trailing blank lines stripped?
String g = """ +--------+ | text | +--------+"""; // Last \n dropped
String h = """+--------+ | text | +--------+"""; // determine indent of first line using scanner knowledge?
String i = """ "nested" """; // strip leading/trailing space?
String j = (""" public static void """ + name + """(String... args) { System.out.println(String.join(args)); } """).align(); // how do we handle expressions with multi-line strings?
String k = """ public static void %s(String... args) { System.out.println(String.join(args)); } """.format(name); // is this the answer to multi-line string expressions?

As we can see, there were a lot of cases where the user _probably_ wanted one thing, but _might have_ wanted another. What control knobs do we have, that we could assign meaning to, that would let the user choose either way? Candidates include:

- The opening line (is it blanks followed by a newline, or are there non-whitespace characters?)
- The position of the close delimiter (is it on its own line, or not?)

Similarly, we have a number of policy choices:

- Do we allow content on the same lines as the delimiters?
- Should we always add a final newline?
- Should we strip blank lines? Only on the first and last? All leading and trailing?
- How do we interpret auto-alignment on single-line strings? Strip?
- Should we right strip lines?

And some syntax choices (not to be discussed now):

- How do we indicate opt-out?

Comments?

Examples narrative. Don't peek yet. Stop and comment first.
Unlike most other Java constructs, multi-line strings force us to look at coding style "square on". Keep in mind that we are often guilty of making assumptions about developer coding style. For instance, we may assume that multi-line strings tend to be large elements. We may also assume that developers will declare static final String variables to keep multi-line strings from messing up their code. All very neat and tidy, but... we know from experience that developers will use multi-line strings everywhere, as they have with array initialization and large lambda bodies. From this, we recommend that multi-line string fat delimiters should follow the brace pattern used in array initialization, lambdas and other Java constructs. The open delimiter should end the current line. Content follows on separate lines, indented one level. The close delimiter starts a new line, back indented one level, followed by the continuation of enclosing expression. So as in this brace pattern; int[] ia = new int[] { 1, 2, 3 }; we have the fat delimiter pattern; String d = """ +--------+ | text | +--------+ """; and; String.format(""" public static void %s(String... args) { System.out.println(String.join(args)); } """, name); The fat delimiter pattern also significantly helps with future editing in and around the multi-line string. For example, changing the length of the variable name in the above "String d =" example doesn't affect the positioning of the string content or the close delimiter. If we adopt this style, some of the answers to the incidentals questions become easier or even moot. Other styles are still valid, but the result of automatic incidental handling may be surprising. Note that fat delimiters can be used on single lines. What are the semantics for auto-alignment in that case? The question of stripping whitespace and newlines is not really about alignment. It's about what are the rules for handling incidental characters in a fat delimiter string. 
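To make the questions about incidental characters concrete, here is a minimal sketch of what such handling could look like as a plain library routine. It is not the proposed String::align; the helper name `alignSketch` and the specific choices (normalize line terminators to \n, right-strip every line, drop an empty first line, remove the indentation common to all non-empty lines) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class IncidentalWhitespaceSketch {

    // Hypothetical helper: takes the raw content between the delimiters.
    static String alignSketch(String content) {
        // Normalize \r\n and lone \r to \n.
        String normalized = content.replace("\r\n", "\n").replace('\r', '\n');

        // Limit -1 keeps trailing empty segments, so a final \n survives.
        List<String> lines = new ArrayList<>();
        for (String line : normalized.split("\n", -1)) {
            lines.add(stripTrailing(line));
        }

        // Nothing after the open delimiter: discard that first end of line.
        if (lines.size() > 1 && lines.get(0).isEmpty()) {
            lines.remove(0);
        }

        // Smallest indentation among non-empty lines.
        int indent = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isEmpty()) {
                indent = Math.min(indent, leadingWhitespace(line));
            }
        }

        // Left justify while preserving relative indentation.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                out.append('\n');
            }
            String line = lines.get(i);
            if (!line.isEmpty()) {
                out.append(line.substring(indent));
            }
        }
        return out.toString();
    }

    private static String stripTrailing(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    private static int leadingWhitespace(String s) {
        int n = 0;
        while (n < s.length() && Character.isWhitespace(s.charAt(n))) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String raw = "\r\n    +--------+\r\n    |  text  |   \r\n    +--------+\r\n    ";
        System.out.print(alignSketch(raw));
    }
}
```

With this sketch, a close delimiter on its own line leaves a final \n in the result, while a close delimiter directly after the last content line does not. Whether those are the right defaults is exactly what the thread is debating.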

Continuing with the examples, let's assume some (negotiable) auto-alignment basic rules;

1. All content lines are uniformly right stripped. Whitespace at the end of lines is not something that is consistently managed by IDEs/editors.
2. End of lines are always translated to \n.
3. If the content after the open delimiter is empty then the first end of line is discarded.
4. Content is left justified while preserving relative indentation.

And as a reminder, in the last round we introduced or attempted to introduce the following String methods;

- String::indent(n) - used to change indentation, line by line (in JDK 12)
- String::align() and String::align(n) - used to manage incidental indentation (didn't make it)
- String::format as an instance method (resolution issues YTBD)

__________________________________________________________________________________________________

String a = """ +--------+ | text | +--------+ """; // first characters in first column?

RESULT: +--------+\n | text |\n +--------+\n

The problem with this example is that it is not following the fat delimiter pattern. Let's change the variable name "a" to "something".

String something = """ .......... +--------+ .......... | text | .......... +--------+ .......... """; // first characters in first column?

The "." indicate all the places where we had to add whitespace to maintain the pattern used.

__________________________________________________________________________________________________

String b = """ +--------+ | text | +--------+ """; // first characters in first column or indented four?

RESULT: +--------+\n | text |\n +--------+\n

Same maintenance problem as example (a). Still works, but the question here is, do we give meaning to indentation relative to the close delimiter? Did we want...;

+--------+\n | text |\n +--------+\n

It's a nice trick but we sabotage the fat delimiter pattern. We would always get at least one level of indentation, whether we wanted it or not.
Maybe better to code as; String b = """ +--------+ | text | +--------+ """.indent(4); So the question here is: should it be possible to specify "extra" indentation through the positioning of quotes, or are we better off saying that any extra indentation should be done through library calls? Also noting that the library calls might be subject to compile time folding. __________________________________________________________________________________________________ String c = """ +--------+ | text | +--------+ """; // first characters in first column or indented several? RESULT: +--------+\n | text |\n +--------+\n The amount of indentation is not a problem, just an aesthetic issue. __________________________________________________________________________________________________ String d = """ +--------+ | text | +--------+ """; // first characters in first column or indented four? RESULT: +--------+\n | text |\n +--------+\n Text book fat delimiter pattern. __________________________________________________________________________________________________ String e = """ +--------+ | text | +--------+ """; // heredoc? RESULT: +--------+\n | text |\n +--------+\n Just an aesthetic issue. __________________________________________________________________________________________________ String f = """ +--------+ | text | +--------+ """; // one or all leading or trailing blank lines stripped? As-is would generate; \n \n +--------+\n | text |\n +--------+\n \n \n \n If we stripped away all leading or trailing blank lines, we would then have code as; String f = "\n".repeat(2) + """ +--------+ | text | +--------+ """ + "\n".repeat(2); __________________________________________________________________________________________________ String g = """ +--------+ | text | +--------+"""; // Last \n dropped RESULT: +--------+\n | text |\n +--------+ This one is likely okay. 
It's not the fat delimiter pattern, but the oddity makes it clear we mean something different; we want to drop the last \n.

__________________________________________________________________________________________________

String h = """+--------+ | text | +--------+"""; // determine indent of first line using scanner knowledge?

RESULT: +--------+\n | text |\n +--------+

We can do this because the compiler's scanner can determine the indentation on the open delimiter line. However, this one is problematic if we require a String method to duplicate the compiler's algorithm (String::align). Tool vendors may also find this one problematic.

__________________________________________________________________________________________________

String i = """ "nested" """; // strip leading/trailing space?

RESULT: "nested"

This one still follows the rules; left and right stripped.

__________________________________________________________________________________________________

String j = (""" public static void """ + name + """(String... args) { System.out.println(String.join(args)); } """).align(); // how do we handle expressions with multi-line strings?

Mid-string substitution gets messy fast. Let's break the example down to the following (without align).

String j = """ public static void """ + name + """(String... args) { System.out.println(String.join(args)); } """;

This is the same as

String j = """ public static void """ + name + """(String... args) { System.out.println(String.join(args)); } """;

Which works fine if we say no \n when the close delimiter is on the same line. The other requirement is that each multi-line string component ends up with a common indentation. The odds of that happening are poor. Guess we're stuck with parentheses String::align. Unless...

__________________________________________________________________________________________________

String k = """ public static void %s(String...
args) { System.out.println(String.join(args)); } """.format(name); // is this the answer to multi-line string expressions?

RESULT: public static void methodName(String... args) { System.out.println(String.join(args)); }

Maybe a better substitution solution.
__________________________________________________________________________________________________

From brian.goetz at oracle.com Wed Apr 10 20:54:24 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 10 Apr 2019 16:54:24 -0400
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID:

This is a plateful!

Stripping "incidental" whitespace is an attractive target of opportunity; the real question is, can we do it right enough of the time, and when we get it wrong, is there an easy way for the user to recover and get what they want?

Kevin described this as: "find the magic rectangle"; that there should be a rectangle enclosing the snippet that sets apart the incidental whitespace from the essential. In your examples, most of the time, the magic rectangle is, well, the actual rectangle in your text.

>
> Examples;
>
> String a = """
> +--------+
> | text |
> +--------+
> """; // first characters in first column?
>

From guy.steele at oracle.com Wed Apr 10 21:36:34 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Wed, 10 Apr 2019 17:36:34 -0400
Subject: String reboot - (1a) incidental whitespace
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID:

> On Apr 10, 2019, at 4:54 PM, Brian Goetz wrote:
>
> This is a plateful!
> > Stripping "incidental" whitespace is an attractive target of opportunity; the real question is, can we do it right enough of the time, and when we get it wrong, is there an easy way for the user to recover and get what they want? > > Kevin described this as: "find the magic rectangle"; that there should be a rectangle enclosing the snippet that sets apart the incidental whitespace from the essential. In your examples, most of the time, the magic rectangle is, well, the actual rectangle in your text. > > >> >> Examples; >> >> String a = """ >> +--------+ >> | text | >> +--------+ >> """; // first characters in first column? Which suggests yet another approach to multiline string literals: String a = ??????????????????????????????????????? ?A rectangle of double quotes " ? can enclose any arbitrary text ? ? with any desired indentation, ? ? and you can assume any trailing ? ? whitespace on each line will be ? ? removed and that each line will ? ? end with a \\n. ? ? ? ?So all you need is IDE support for ? ? making nice rectangles. ? ???????????????????????????????????????; String result = ???????????????????????????????????????????????? ?public class Main { ? ? public static void main(String... args) { ? ? System.out.println("Hello World!?); ? ? } ? ?} ? ????????????????????????????????????????????????; String html = ??????????????????????????????????????????????? ? ? ? ? ?

Hello World.

? ? ? ? ? ? ? ???????????????????????????????????????????????; From guy.steele at oracle.com Wed Apr 10 21:48:05 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 10 Apr 2019 17:48:05 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> Message-ID: <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> > On Apr 10, 2019, at 5:36 PM, Guy Steele wrote: > > >> On Apr 10, 2019, at 4:54 PM, Brian Goetz > wrote: >> >> This is a plateful! >> >> Stripping "incidental" whitespace is an attractive target of opportunity; the real question is, can we do it right enough of the time, and when we get it wrong, is there an easy way for the user to recover and get what they want? >> >> Kevin described this as: "find the magic rectangle"; that there should be a rectangle enclosing the snippet that sets apart the incidental whitespace from the essential. In your examples, most of the time, the magic rectangle is, well, the actual rectangle in your text. >> >> >>> >>> Examples; >>> >>> String a = """ >>> +--------+ >>> | text | >>> +--------+ >>> """; // first characters in first column? > > Which suggests yet another approach to multiline string literals: > > String a = ??????????????????????????????????????? > ?A rectangle of double quotes " > ? can enclose any arbitrary text ? > ? with any desired indentation, ? > ? and you can assume any trailing ? > ? whitespace on each line will be ? > ? removed and that each line will ? > ? end with a \\n . ? > ? ? > ?So all you need is IDE support for ? > ? making nice rectangles. ? > ???????????????????????????????????????; > > String result = ???????????????????????????????????????????????? > ?public class Main { ? > ? public static void main(String... args) { ? > ? System.out.println("Hello World!?); ? > ? } ? > ?} ? 
> ????????????????????????????????????????????????; > > String html = ??????????????????????????????????????????????? > ? ? > ? ? > ?

Hello World.

? > ? ? > ? ? > ? ? > ???????????????????????????????????????????????; >

Note that I was inconsistent with my use of escapes. I reckon you should be able to use escapes, but perhaps one need not escape included double quotes, because you can tell from the length of the initial line of double quotes how much text to skip before you expect to see the double quote that marks the right-hand edge of the rectangle.

From forax at univ-mlv.fr Wed Apr 10 21:52:39 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 10 Apr 2019 23:52:39 +0200 (CEST)
Subject: String reboot - (1a) incidental whitespace
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID: <866740314.1162793.1554933159644.JavaMail.zimbra@u-pem.fr>

It's more or less the javadoc approach, no?

Rémi

> From: "Guy Steele"
> To: "Brian Goetz"
> Cc: "amber-spec-experts"
> Sent: Wednesday, 10 April 2019 23:36:34
> Subject: Re: String reboot - (1a) incidental whitespace

>> On Apr 10, 2019, at 4:54 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> This is a plateful!
>> Stripping "incidental" whitespace is an attractive target of opportunity; the
>> real question is, can we do it right enough of the time, and when we get it
>> wrong, is there an easy way for the user to recover and get what they want?
>> Kevin described this as: "find the magic rectangle"; that there should be a
>> rectangle enclosing the snippet that sets apart the incidental whitespace from
>> the essential. In your examples, most of the time, the magic rectangle is,
>> well, the actual rectangle in your text.
>>> Examples;
>>> String a = """
>>> +--------+
>>> | text |
>>> +--------+
>>> """; // first characters in first column?
> Which suggests yet another approach to multiline string literals:
> String a = ???????????????????????????????????????
> ?A rectangle of double quotes "
> ? can enclose any arbitrary text ?
> ?
with any desired indentation, ?
> ? and you can assume any trailing ?
> ? whitespace on each line will be ?
> ? removed and that each line will ?
> ? end with a \\n. ?
> ? ?
> ?So all you need is IDE support for ?
> ? making nice rectangles. ?
> ???????????????????????????????????????;
> String result = ????????????????????????????????????????????????
> ?public class Main { ?
> ? public static void main(String... args) { ?
> ? System.out.println("Hello World!?); ?
> ? } ?
> ?} ?
> ????????????????????????????????????????????????;
> String html = ???????????????????????????????????????????????
> ? ?
> ? ?
> ?

Hello World.

? > ? ? > ? ? > ? ? > ???????????????????????????????????????????????;

From forax at univ-mlv.fr Fri Apr 12 07:33:19 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 12 Apr 2019 09:33:19 +0200 (CEST)
Subject: records are dead long live to ...
Message-ID: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr>

I've re-read the current state of the record (ex datum) document, http://cr.openjdk.java.net/~briangoetz/amber/datum.html, trying to explain to myself how it works.

At the end of the section "Why not "just" do tuples?", you have this gem: "A good starting point for thinking about records is that they are nominal tuples."

I believe that since we are exploring the idea that records are immutable, the name "record" doesn't suit the feature well anymore. I propose to rename it to tuple, given that this is what the feature is: named tuples.

You will say that they are not "real" tuples, i.e. they are not structural types. Yes, they are not "real" tuples, in the same way that Java lambdas are not "real" lambdas; they are the Java flavor of named tuples.

You may think this is bikeshedding on the name and that it's not important, but I think it will help us to shape this feature the right way.

Rémi

From brian.goetz at oracle.com Fri Apr 12 14:45:15 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 12 Apr 2019 10:45:15 -0400
Subject: records are dead long live to ...
In-Reply-To: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr>
References: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr>
Message-ID:

> I believe that since we are exploring the fact that record are immutable, the name "record" doesn't suit well to the feature anymore.
> I propose to rename it as tuple given that this is what this feature is, named tuples.

It should come as no surprise that we thought about this idea quite a bit.

You have anticipated the main objection: "You idiots, these aren't real tuples." And the main response: "These are what tuples are in Java".
(just as lambdas in Java are literals for functional interface instances.) It's a believable story.

On the other hand, there's a huge difference between this and lambdas -- the word "closure" doesn't appear in the source text. If it did, the volume of "you idiots, these aren't real closures" would have been much greater.

The real question, in my mind, is how it drives users to the right mental model. For those who have no preconceived notions of what tuples are, no problem. But for those who do -- and I think that's a lot -- the question is whether it is more work to unlearn their preconceptions first, or to associate what records are with a relatively unpolluted name.

There are two categories of preconceived notions that we're working against. One is the larger audience, who has a vague clue about what a tuple is, but doesn't have a real axe to grind -- it will be some discomfort, but they'll get over it. The other is the smaller but far^3 more vocal audience -- the Tommy Tuple clan -- who will see it as a door permanently slammed on their favorite feature, and it will be Optional all over again. And this group may infect the main group.

I would rather avoid picking that fight -- I don't really see the point in it. If instead we say "records are just nominal tuples", two good things happen:

 - Those who vaguely know what tuples are will get it;
 - Those who wish we had done structural tuples will have nothing to argue with.

That's winning^2.
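
To make the "nominal tuples" phrase concrete, here is a minimal sketch. It assumes the record syntax as it was later finalized in Java 16, which was still under discussion when this thread was written; the names `Point`, `Size` and `NominalTupleDemo` are invented for illustration and do not come from the thread. Both records carry the same (int, int) shape, yet they stay unrelated types, which is what separates them from a structural tuple:

```java
// Hypothetical example types; not taken from the thread.
record Point(int x, int y) {}
record Size(int width, int height) {}

class NominalTupleDemo {
    // Accepts only Point. Passing a Size would not compile,
    // even though its components have the same types.
    static int sum(Point p) {
        return p.x() + p.y(); // accessors are named after the components
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Size s = new Size(1, 2);

        System.out.println(sum(p));                    // component access by name
        System.out.println(p.equals(new Point(1, 2))); // equality is based on state
        System.out.println(p.equals(s));               // same shape, different nominal type
        // sum(s); // does not compile: Size is not a Point
    }
}
```

With a structural tuple, `Point` and `Size` would presumably be interchangeable wherever an (int, int) pair is expected; here the declared name is part of the type.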
From james.laskey at oracle.com Fri Apr 12 15:38:43 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Fri, 12 Apr 2019 12:38:43 -0300 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> Message-ID: Kevin & Alan, Do you have numbers from your RSL survey for, of all string expressions that are candidates for translation to a multi-line string literal, what percentage contain no escapes other than quotes and newline? Thank you, -- Jim > On Apr 10, 2019, at 12:22 PM, Jim Laskey wrote: > > Next plate is (1a) incidental whitespace. > > Having decided that we are content with "fat" delimiters (""") for multi-line strings, we have some more choices to make regarding multi-line strings. (We're not going to talk about "raw" strings yet; let's finish the multi-line course first.) > > Multi-line strings are different from single-line strings in a number of ways, so let's get clear on what we want "multi-line" to mean. > > Line terminators: When strings span lines, they do so using the line terminators present in the source file, which may vary depending on what operating system the file was authored. Should this be an aspect of multi-line-ness, or should we normalize these to a standard line terminator? It seems a little weird to treat string literals quite so literally; the choice of line terminator is surely an incidental one. I think we're all comfortable saying "these should be normalized", but its worth bringing this up because it is merely one way in which incidental artifacts of how the string is embedded in the source program force us to interpret what the user meant. Which brings us to the next incidental aspect... > > Whitespace: A multi-line string is nestled in the context of a Java source program. 
It is likely (though not guaranteed) that the indentation of lines has been distorted by the desire to make the embedded snippet align with the enclosing lines. Most of the time, there is some combination of incidental whitespace and intended whitespace. There are a number of algorithms by which we could try to intuit which the user intended. Which brings us to ask: > > - Assuming the existence of a reasonable algorithm for re-aligning text, what should the _default_ be for the language? Should it assume the user wants re-alignment, or make the user explicitly opt in? > - If the choice is "automatically align", how would we indicate the desire to opt out? > - Should we limit what we do automatically to only what can be done by an equivalent library routine? > > (Again, let's focus on the requirements and semantics and defaults first, before we bikeshed the syntax.) > > Its hard to answer the above without a clear understanding of the use cases. So, here's a partial catalog of examples; let's play "what was the user thinking", and see if we can agree on that. > > Examples; > > String a = """ > +--------+ > | text | > +--------+ > """; // first characters in first column? > > String b = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four spaces? > > String c = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented several? > > String d = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four? > > String e = > """ > +--------+ > | text | > +--------+ > """; // heredoc? > > String f = """ > > > +--------+ > | text | > +--------+ > > > """; // one or all leading or trailing blank lines stripped? > > String g = """ > +--------+ > | text | > +--------+"""; // Last \n dropped > > String h = """+--------+ > | text | > +--------+"""; // determine indent of first line using scanner knowledge? 
> > String i = """ "nested" """; // strip leading/trailing space? > > String j = (""" > public static void """ + name + """(String... args) { > System.out.println(String.join(args)); > } > """).align(); // how do we handle expressions with multi-line strings? > > String k = """ > public static void %s(String... args) { > System.out.println(String.join(args)); > } > """.format(name); // is this the answer to multi-line string expressions? > > As we can see, there were a lot of cases where the user _probably_ wanted one thing, but _might have_ wanted another. What control knobs do we have, that we could assign meaning to, that would let the user choose either way? Candidates include: > > - The opening line (is it blanks followed by a newline, or are there non-whitespace characters?) > - The position of the close delimiter (is it on its own line, or not?) > > Similarly, we have a number of policy choices: > > - Do we allow content on the same lines as the delimiters? > - Should we always add a final newline? > - Should we strip blanks lines? Only on the first and last? All leading and trailing? > - How do we interpret auto-alignment on single-line strings? Strip? > - Should we right strip lines? > > And some syntax choices (not to be discussed now): > > - How do we indicate opt-out? > > Comments? > > > Examples narrative. Don?t peek yet. Stop and comment first. > > > Unlike most other Java constructs, multi-line strings force us to look at coding style "square on". Keep in mind that we are often guilty of making assumptions about developer coding style. For instance, we may assume that multi-line strings tend to be large elements. We may also assume that developers will declare static final String variables to keep multi-line strings from messing up their code. All very neat and tidy, but... we know from experience that developers will use multi-line strings everywhere, as they have with array initialization and large lambda bodies. 
> > From this, we recommend that multi-line string fat delimiters should follow the brace pattern used in array initialization, lambdas and other Java constructs. The open delimiter should end the current line. Content follows on separate lines, indented one level. The close delimiter starts a new line, back indented one level, followed by the continuation of enclosing expression. > > So as in this brace pattern; > > int[] ia = new int[] { > 1, > 2, > 3 > }; > > we have the fat delimiter pattern; > > String d = """ > +--------+ > | text | > +--------+ > """; > > and; > > String.format(""" > public static void %s(String... args) { > System.out.println(String.join(args)); > } > """, name); > > The fat delimiter pattern also significantly helps with future editing in and around the multi-line string. For example, changing the length of the variable name in the above "String d =" example doesn't affect the positioning of the string content or the close delimiter. > > If we adopt this style, some of the answers to the incidentals questions become easier or even moot. Other styles are still valid, but the result of automatic incidental handling may be surprising. > > Note that fat delimiters can be used on single lines. What are the semantics for auto-alignment in that case? The question of stripping whitespace and newlines is not really about alignment. It's about what are the rules for handling incidental characters in a fat delimiter string. > > > Continuing with the examples, let's assume some (negotiable) auto-alignment basic rules; > > 1. All content lines are uniformly right stripped. Whitespace at the end of lines is not something that is consistently managed by IDEs/editors. > 2. End of lines are always translated to \n. > 3. If the content after the open delimiter is empty then the first end of line is discarded. > 4. Content is left justified while preserving relative indentation. 
> > And as a reminder, in the last round we introduced or attempted to introduce the following String methods; > > - String::indent(n) - used to change indentation, line by line (in JDK 11) > - String::align() and String::align(n) - used to manage incidental indentation (didn't make it) > - String::format as an instance method (resolution issues YTBD) > > __________________________________________________________________________________________________ > String a = """ > +--------+ > | text | > +--------+ > """; // first characters in first column? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > The problem with this example is that it is not following the fat delimiter pattern. Let's change the variable name "a" to "something". > > String something = """ > .......... +--------+ > .......... | text | > .......... +--------+ > .......... """; // first characters in first column? > > The "." indicate all the places where we had to add whitespace to maintain the pattern used. > __________________________________________________________________________________________________ > String b = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > Same maintenence problem as example (a). > > Still works, but the question here is, do we give meaning to indentation relative to the close delimiter? Did we want?; > > +--------+\n > | text |\n > +--------+\n > > It's a nice trick but we sabotage the fat delimiter pattern. We would always get at least one level of indentation, whether we wanted it or not. Maybe better to code as; > > String b = """ > +--------+ > | text | > +--------+ > """.indent(4); > > So the question here is: should it be possible to specify "extra" indentation through the positioning of quotes, or are we better off saying that any extra indentation should be done through library calls? 
Also noting that the library calls might be subject to compile time folding. > __________________________________________________________________________________________________ > String c = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented several? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > The amount of indentation is not a problem, just an aesthetic issue. > > __________________________________________________________________________________________________ > String d = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > Text book fat delimiter pattern. > __________________________________________________________________________________________________ > String e = > """ > +--------+ > | text | > +--------+ > """; // heredoc? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > Just an aesthetic issue. > __________________________________________________________________________________________________ > String f = """ > > > +--------+ > | text | > +--------+ > > > """; // one or all leading or trailing blank lines stripped? > > As-is would generate; > \n > \n > +--------+\n > | text |\n > +--------+\n > \n > \n > \n > > If we stripped away all leading or trailing blank lines, we would then have code as; > > String f = "\n".repeat(2) + """ > +--------+ > | text | > +--------+ > """ + "\n".repeat(2); > __________________________________________________________________________________________________ > String g = """ > +--------+ > | text | > +--------+"""; // Last \n dropped > > RESULT: > +--------+\n > | text |\n > +--------+ > > This one is likely okay. It's not the fat delimiter pattern, but the oddity makes it clear we mean something different; we want to drop the last \n. 
> __________________________________________________________________________________________________ > String h = """+--------+ > | text | > +--------+"""; // determine indent of first line using scanner knowledge? > > RESULT: > +--------+\n > | text |\n > +--------+ > > We can do this because the compiler's scanner can determine the indentation on the open delimiter line. However, this one is problematic if we require a String method to duplicate the compiler's algorithm (String::align). Tool vendors may also find this one problematic. > __________________________________________________________________________________________________ > String i = """ "nested" """; // strip leading/trailing space? > > RESULT: > "nested" > > This one still follows the rules; left and right stripped. > __________________________________________________________________________________________________ > String j = (""" > public static void """ + name + """(String... args) { > System.out.println(String.join(args)); > } > """).align(); // how do we handle expressions with multi-line strings? > > Mid-string substitution gets messy fast. Let's break the example down to the following (without align.) > > String j = """ > public static void """ + name + """(String... args) { > System.out.println(String.join(args)); > } > """; > > This is the same as > > String j = > """ > public static void """ > + name + > """(String... args) { > System.out.println(String.join(args)); > } > """; > > Which works fine if we say no \n when close delimiter is on the same line. The other requirement is there is that each multi-line string componment ends up with a common indentation. The odds of that happening are poor. > > Guess we're stuck with parentheses String::align. Unless... > __________________________________________________________________________________________________ > String k = """ > public static void %s(String... 
args) { > System.out.println(String.join(args)); > } > """.format(name); // is this the answer to multi-line string expressions? > > RESULT: > public static void methodName(String... args) { > System.out.println(String.join(args)); > } > > Maybe a better substitution solution. > __________________________________________________________________________________________________ > From forax at univ-mlv.fr Fri Apr 12 16:01:50 2019 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 12 Apr 2019 18:01:50 +0200 (CEST) Subject: records are dead long live to ... In-Reply-To: References: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr> Message-ID: <1273675629.1701046.1555084910529.JavaMail.zimbra@u-pem.fr> ----- Mail original ----- > De: "Brian Goetz" > ?: "Remi Forax" , "amber-spec-experts" > Envoy?: Vendredi 12 Avril 2019 16:45:15 > Objet: Re: records are dead long live to ... >> I believe that since we are exploring the fact that record are immutable, the >> name "record" doesn't suit well to the feature anymore. >> I propose to rename it as tuple given that this is what this feature is, named >> tuples. > > It should come as no surprise that we thought about this idea quite a bit. > > You have anticipated the main object: "You idiots, these aren't real > tuples."? And the main response: "These are what tuples are in Java". > (just as lambdas in Java are literals for functional interface > instances.)? It's a believable story. > > On the other hand, there's a huge difference between this and lambdas -- > the word "closure" doesn't appear in the source text.? if it did, the > volume of "you idiots, these aren't real closures" would have been much > greater. 
The very same issue exists with the keyword 'record': these are not real records, because they are immutable and records are clearly not :)

And I agree that these are not real tuples, they are "named tuples", exactly like the named tuples of Python; that's why you have to provide a name after the keyword 'tuple'.

One way to solve that is to replace 'record' by 'named tuple' in the syntax and wait for people to tweet that the keyword 'named' is useless.

> The real question, in my mind, is how it drives users to the right
> mental model. For those who have no preconceived notions of what tuples
> are, no problem. But for those who do -- and I think that's a lot --
> the question is whether it is more work to unlearn their preconceptions
> first, or to associate what records are with a relatively unpolluted name.
>
> There are two categories of preconceived notions that we're working
> against. One is the larger audience, who has a vague clue about what a
> tuple is, but doesn't have a real axe to grind -- it will be some
> discomfort, but they'll get over it. The other is the smaller but far^3
> more vocal audience -- the Tommy Tuple clan -- who will see it as a door
> permanently slammed on their favorite feature, and it will be Optional
> all over again. And this group may infect the main group.

I think it's exactly the opposite: 'tuple' is the right name for those who hope to have their favourite structural tuples in the language. Java's type system is nominal; if one wants a structural construct in such a type system, the construct has to provide a name to please the type system. For lambdas, the name you provide to the type system is the name of the functional interface, found using type inference. So if we provide named tuples, we are a step in the right direction if the goal is to provide structural tuples, because now you have a name.
Rémi

From brian.goetz at oracle.com Fri Apr 12 17:08:28 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 12 Apr 2019 13:08:28 -0400
Subject: records are dead long live to ...
In-Reply-To: <1273675629.1701046.1555084910529.JavaMail.zimbra@u-pem.fr>
References: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr> <1273675629.1701046.1555084910529.JavaMail.zimbra@u-pem.fr>
Message-ID: <38a88434-6e16-5ea1-41bf-214bcbf92a3d@oracle.com>

> The very same issue exist with the keyword 'record', these are not real records because there are immutable and record are clearly not:)

I was hoping you would resist trying to make this argument, because it is superficially true in a qualitative way, but in a quantitative way, it is in a totally different world than the story with tuples.

Developers who think they understand what tuples are have a very, very clear idea of what they would expect tuples to be in Java. There is no single model of what "records" are, and so arguments of the form "these are not records" are really "these are not the (one of fifty possible notions of) records I had in mind." And the set of developers that have a preconceived notion of what "records" should be in Java is at least 100x smaller than for tuples.

Nice try :)

From kevinb at google.com Fri Apr 12 18:17:12 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 12 Apr 2019 11:17:12 -0700
Subject: records are dead long live to ...
In-Reply-To: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr>
References: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Fri, Apr 12, 2019 at 12:34 AM Remi Forax wrote:

> At the end of section "Why not "just" do tuples ?", you have this gem,
> "A good starting point for thinking about records is that they are nominal
> tuples."

That is *a* starting point, but I think a barely useful one. Records have semantics, which makes them *worlds* different from tuples. Methods, supertypes, validation, specification...
I think it's fair to say that all a record *holds* is a "tuple", but it's so much more. Record is to tuple as enum is to int.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From forax at univ-mlv.fr Fri Apr 12 18:40:39 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 12 Apr 2019 20:40:39 +0200 (CEST)
Subject: records are dead long live to ...
In-Reply-To:
References: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr>
Message-ID: <919160636.1722788.1555094439512.JavaMail.zimbra@u-pem.fr>

> From: "Kevin Bourrillion"
> To: "Remi Forax"
> Cc: "amber-spec-experts"
> Sent: Friday, 12 April 2019 20:17:12
> Subject: Re: records are dead long live to ...

> On Fri, Apr 12, 2019 at 12:34 AM Remi Forax <forax at univ-mlv.fr> wrote:
>> At the end of section "Why not "just" do tuples ?", you have this gem,
>> "A good starting point for thinking about records is that they are nominal
>> tuples."
> That is *a* starting point, but I think a barely useful one. Records have
> semantics, which makes them *worlds* different from tuples. Methods,
> supertypes, validation, specification... I think it's fair to say that all a
> record holds is a "tuple", but it's so much more. Record is to tuple as enum is
> to int.

Hi Kevin,
I find the example you have chosen interesting, because you can interpret it as enum in C vs enum in Java: an enum in C has no methods, no supertype, etc., but we have still kept the name "enum" for the equivalent in Java.

In my opinion, the feature we have is more similar to a named tuple than to a record. I fully agree that it's a named tuple on steroids, but as I said, it's not dissimilar to the relation between an enum in C and an enum in Java.

Rémi

From guy.steele at oracle.com Fri Apr 12 18:43:12 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Fri, 12 Apr 2019 14:43:12 -0400
Subject: records are dead long live to ...
In-Reply-To: References: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr> Message-ID: <1EA08573-A317-451B-ADDA-92F02A54379A@oracle.com> > On Apr 12, 2019, at 2:17 PM, Kevin Bourrillion wrote: > > On Fri, Apr 12, 2019 at 12:34 AM Remi Forax > wrote: > > At the end of section "Why not "just" do tuples ?", you have this gem, > "A good starting point for thinking about records is that they are nominal tuples." > > That is *a* starting point, but I think a barely useful one. Records have semantics, which makes them *worlds* different from tuples. Methods, supertypes, validation, specification... I think it's fair to say that all a record holds is a "tuple", but it's so much more. Record is to tuple as enum is to int. Good observation. And also note that Java `record` is to C `struct` as Java `enum` is to C `enum`. From forax at univ-mlv.fr Fri Apr 12 18:54:34 2019 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 12 Apr 2019 20:54:34 +0200 (CEST) Subject: records are dead long live to ... In-Reply-To: <1EA08573-A317-451B-ADDA-92F02A54379A@oracle.com> References: <1647509074.1500836.1555054399458.JavaMail.zimbra@u-pem.fr> <1EA08573-A317-451B-ADDA-92F02A54379A@oracle.com> Message-ID: <1367483046.1723657.1555095274995.JavaMail.zimbra@u-pem.fr> > De: "Guy Steele" > ?: "Kevin Bourrillion" > Cc: "Remi Forax" , "amber-spec-experts" > > Envoy?: Vendredi 12 Avril 2019 20:43:12 > Objet: Re: records are dead long live to ... >> On Apr 12, 2019, at 2:17 PM, Kevin Bourrillion < [ mailto:kevinb at google.com | >> kevinb at google.com ] > wrote: >> On Fri, Apr 12, 2019 at 12:34 AM Remi Forax < [ mailto:forax at univ-mlv.fr | >> forax at univ-mlv.fr ] > wrote: >>> At the end of section "Why not "just" do tuples ?", you have this gem, >>> "A good starting point for thinking about records is that they are nominal >>> tuples." >> That is *a* starting point, but I think a barely useful one. Records have >> semantics, which makes them *worlds* different from tuples. 
Methods, >> supertypes, validation, specification... I think it's fair to say that all a >> record holds is a "tuple", but it's so much more. Record is to tuple as enum is >> to int. > Good observation. And also note that Java `record` is to C `struct` as Java > `enum` is to C `enum`. Apart from the fact that records are now immutable. I had no issue with the previous incarnation of record to be named record, it was even a great name, record is the Pascal equivalent of the C struct, i've seen it as a kind of homage given that i believe the only other thing that Java has from Pascal is its method calling convention. R?mi From forax at univ-mlv.fr Fri Apr 12 19:38:13 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 12 Apr 2019 21:38:13 +0200 (CEST) Subject: Make Primary Constructor an independant feature Message-ID: <139198437.1725714.1555097893675.JavaMail.zimbra@u-pem.fr> I think there is a merit to separate the primary constructor feature from other features of a record. This afternoon while fixing a bug, i took a look to the classes around to see if it was possible to transform them to records. But i've found that in more than half of the cases, the classes were not representing data but just carrier so having the getters, hashCode and equals automatically generated was the factor that hamper me to transform the classes into records. Given that a record is a data carrier, i'm wondering if it makes sense to say that a record = data + carrier i.e. to separate the data part (getters + equals + hashCode) from the carrier part (the initialization using a primary constructor). The idea is that a primary constructor can be applied not only to records, but also to classes, enums and later value classes. 
Here are some examples,

For a result of a function, having an equals/hashCode can be harmful (maybe the error code is not stable)

  class Result(int value, int errorCode);

For most of the structural patterns, having getters is not helpful

  class UserProxy(User user) implements User {
    UserProxy {
      requireNonNull(user);
    }
    ...
  }

An enum that stores an integer, all constants have to initialize the value

  enum Flag(int flag) {
    READ(8), WRITE(16)
    ;

    public boolean isAllowedBy(int flags) {
      return (flags & flag) != 0;
    }
  }

Rémi

From brian.goetz at oracle.com Fri Apr 12 19:52:59 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 12 Apr 2019 15:52:59 -0400
Subject: Make Primary Constructor an independant feature
In-Reply-To: <139198437.1725714.1555097893675.JavaMail.zimbra@u-pem.fr>
References: <139198437.1725714.1555097893675.JavaMail.zimbra@u-pem.fr>
Message-ID:

> I think there is a merit to separate the primary constructor feature from other features of a record.

You could be saying one of two things here:

 - Let's streamline generation of constructors, as record constructors do, by allowing some way of indicating that parameters are bound to fields, that all classes can use.
 - Let's define a different top-level streamlined idiom, `class X(fields)`.

For the first, I agree, and already have most of a design story for this, but am deliberately holding it back because we are already deeply in danger of drowning in "how about this (mostly-syntax-oriented) feature" and not ever delivering anything. In fact, if you go back to my early talks about Amber several years ago, you'll note that I said "Records are almost surely a 'macro' for a set of finer-grained features that can be applied more generally." Of the various pieces, "field-bound constructors" are the simplest, and they carry over almost directly from records.

As to the second: this is a bottomless pit of worms (I went down that route, didn't like it.)
Various problems:

 - The fields don't belong up in the header, if they are not part of the class's API.
 - Once you admit mutable state, you quickly get mired in "but I want setters for these fields", "but I want my getters to be protected", "but I want only these fields to participate in equals/hashCode", etc.

What you've basically said here is "Let's throw records in the garbage and design something new, more like "case classes"." (I know you probably didn't mean to say that, but that's really what you're suggesting.)

Records work because all the protocols (representation, construction, deconstruction, equals, hashCode, toString) are derived from the _same_ description. Once one departs, the chances that another wants to depart are very high. At which point, you either end up at a macro generator with too many knobs (more than zero is probably too many), or you end up with an assortment of lower-level features like "make me a constructor", "make me an equals/hashCode", etc.

Records also work because they have clear semantics. Once you start to get into macro-generation land, syntax takes over, and you lose the guiding light of semantics.

I think the latter path -- let's find the low-level features that records are really a bundle of -- is a much more sound one. And I'm willing to walk it, but not right now.

> This afternoon while fixing a bug, i took a look to the classes around to see if it was possible to transform them to records. But i've found that in more than half of the cases, the classes were not representing data but just carrier so having the getters, hashCode and equals automatically generated was the factor that hamper me to transform the classes into records.
>
> Given that a record is a data carrier, i'm wondering if it makes sense to say that a record = data + carrier i.e. to separate the data part (getters + equals + hashCode) from the carrier part (the initialization using a primary constructor).
> The idea is that a primary constructor can be applied not only to records, but also to classes, enums and later value classes.
>
> Here are some examples,
> For a result of a function, having an equals/hashCode can be harmful (maybe the error code is not stable)
>   class Result(int value, int errorCode);
>
> For most of the structural patterns, having getters is not helpful
>   class UserProxy(User user) implements User {
>     UserProxy {
>       requireNonNull(user);
>     }
>     ...
>   }
>
> An enum that stores an integer, all constants have to initialize the value
>   enum Flag(int flag) {
>     READ(8), WRITE(16)
>     ;
>
>     public boolean isAllowedBy(int flags) {
>       return (flags & flag) != 0;
>     }
>   }
>
> Rémi
>
>

From brian.goetz at oracle.com Tue Apr 16 18:21:12 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 16 Apr 2019 14:21:12 -0400
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID: <346119DB-6C21-4C34-912D-5D10DD487C35@oracle.com>

Let me extract some observations, and questions, from Jim's mail here.

I think the alignment algorithm Jim is driving towards is something like this:

 - Strip up to one leading and trailing blank line
 - Align the remaining lines by removing the largest common whitespace prefix from each [1]
 - If the trailing line was blank, add back an ending newline

In Jim's examples, all of a, b, c, d, e, f, g would produce a left-justified text rectangle (textangle?); g would not have a trailing newline. For h, we would get

    +--------+
                  | text |
                  +--------+

because the text on the first line counts as text. (In this way, any ML string with text in the first column will have effectively opted out of alignment, since there is no leading common whitespace prefix -- though I would not intend for this to be the only opt-out mechanism.)
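
The three steps listed above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the proposed `String::align`: the names `AlignSketch` and `align` are hypothetical, the input is taken to be the raw characters between the delimiters, the whitespace match is the inexact variant from footnote [1] (each whitespace character counts as one), and blank interior lines are assumed not to constrain the common prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the three-step algorithm; not a JDK API.
final class AlignSketch {

    static String align(String raw) {
        // Normalize line terminators to \n, then split, keeping trailing empty lines.
        String normalized = raw.replace("\r\n", "\n").replace('\r', '\n');
        List<String> lines = new ArrayList<>(Arrays.asList(normalized.split("\n", -1)));

        // Step 1: strip up to one leading and one trailing blank line.
        if (!lines.isEmpty() && isBlank(lines.get(0))) {
            lines.remove(0);
        }
        boolean trailingBlank = !lines.isEmpty() && isBlank(lines.get(lines.size() - 1));
        if (trailingBlank) {
            lines.remove(lines.size() - 1);
        }

        // Step 2: find the largest common whitespace prefix of the non-blank lines
        // (assumption: inexact match, one whitespace character counts as one column).
        int common = Integer.MAX_VALUE;
        for (String line : lines) {
            if (isBlank(line)) {
                continue; // assumption: blank lines do not limit the prefix
            }
            int indent = 0;
            while (indent < line.length() && Character.isWhitespace(line.charAt(indent))) {
                indent++;
            }
            common = Math.min(common, indent);
        }
        if (common == Integer.MAX_VALUE) {
            common = 0; // no non-blank lines at all
        }

        // Remove the common prefix from each line and rejoin with \n.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            out.append(line.length() > common ? line.substring(common) : "");
            if (i < lines.size() - 1) {
                out.append('\n');
            }
        }

        // Step 3: if the trailing line was blank, add back an ending newline.
        if (trailingBlank) {
            out.append('\n');
        }
        return out.toString();
    }

    private static boolean isBlank(String s) {
        return s.trim().isEmpty();
    }
}
```

With this sketch, content shaped like example g (no blank trailing line) comes back left-justified without a final newline, and content shaped like example h (text on the first line) comes back unchanged, since its common whitespace prefix is empty.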
[1] We can require an exact match (tabs for tabs), or an inexact one (each WS character counts as one whitespace)

With respect to using text column position as part of the algorithm, I don't really think we can justify this; it's too weird and different. Secondarily, there is a lot of value in having the language interpretation of alignment (if any) and the library interpretation (String::align) be the same; this allows complex cases that are built up via concatenation to opt out on the component strings, build a composition string, and then explicitly align with the library mechanism. I think this is largely a forced move.

The examples with concatenation are ugly no matter how we slice them (at least not without interpolation, which is most definitely not on the menu right now.) I don't think any auto-align mechanism is going to work well on them, but there are two ways to get the desired result:

 - Use concatenation on opted-out strings, and then wrap the whole thing with (...).align()
 - Use String.format (we can provide an instance version) on a single, aligned format string, such as:

    String s = """
               Name: %s
               Age: %d
               """.formatted(name, age);

The high-order question is:

 - Do we think that this alignment algorithm produces the desired answer enough of the time, and is easy enough to reason about, to be the default (assuming an opt-out mechanism), or should we prefer explicit alignment all the time?

> On Apr 10, 2019, at 11:22 AM, Jim Laskey wrote:
>
> Next plate is (1a) incidental whitespace.
>
> Having decided that we are content with "fat" delimiters (""") for multi-line strings, we have some more choices to make regarding multi-line strings. (We're not going to talk about "raw" strings yet; let's finish the multi-line course first.)
>
> Multi-line strings are different from single-line strings in a number of ways, so let's get clear on what we want "multi-line" to mean.
> > Line terminators: When strings span lines, they do so using the line terminators present in the source file, which may vary depending on what operating system the file was authored. Should this be an aspect of multi-line-ness, or should we normalize these to a standard line terminator? It seems a little weird to treat string literals quite so literally; the choice of line terminator is surely an incidental one. I think we're all comfortable saying "these should be normalized", but its worth bringing this up because it is merely one way in which incidental artifacts of how the string is embedded in the source program force us to interpret what the user meant. Which brings us to the next incidental aspect... > > Whitespace: A multi-line string is nestled in the context of a Java source program. It is likely (though not guaranteed) that the indentation of lines has been distorted by the desire to make the embedded snippet align with the enclosing lines. Most of the time, there is some combination of incidental whitespace and intended whitespace. There are a number of algorithms by which we could try to intuit which the user intended. Which brings us to ask: > > - Assuming the existence of a reasonable algorithm for re-aligning text, what should the _default_ be for the language? Should it assume the user wants re-alignment, or make the user explicitly opt in? > - If the choice is "automatically align", how would we indicate the desire to opt out? > - Should we limit what we do automatically to only what can be done by an equivalent library routine? > > (Again, let's focus on the requirements and semantics and defaults first, before we bikeshed the syntax.) > > Its hard to answer the above without a clear understanding of the use cases. So, here's a partial catalog of examples; let's play "what was the user thinking", and see if we can agree on that. > > Examples; > > String a = """ > +--------+ > | text | > +--------+ > """; // first characters in first column? 
> > String b = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four spaces? > > String c = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented several? > > String d = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four? > > String e = > """ > +--------+ > | text | > +--------+ > """; // heredoc? > > String f = """ > > > +--------+ > | text | > +--------+ > > > """; // one or all leading or trailing blank lines stripped? > > String g = """ > +--------+ > | text | > +--------+"""; // Last \n dropped > > String h = """+--------+ > | text | > +--------+"""; // determine indent of first line using scanner knowledge? > > String i = """ "nested" """; // strip leading/trailing space? > > String j = (""" > public static void """ + name + """(String... args) { > System.out.println(String.join(args)); > } > """).align(); // how do we handle expressions with multi-line strings? > > String k = """ > public static void %s(String... args) { > System.out.println(String.join(args)); > } > """.format(name); // is this the answer to multi-line string expressions? > > As we can see, there were a lot of cases where the user _probably_ wanted one thing, but _might have_ wanted another. What control knobs do we have, that we could assign meaning to, that would let the user choose either way? Candidates include: > > - The opening line (is it blanks followed by a newline, or are there non-whitespace characters?) > - The position of the close delimiter (is it on its own line, or not?) > > Similarly, we have a number of policy choices: > > - Do we allow content on the same lines as the delimiters? > - Should we always add a final newline? > - Should we strip blanks lines? Only on the first and last? All leading and trailing? > - How do we interpret auto-alignment on single-line strings? Strip? > - Should we right strip lines? 
> > And some syntax choices (not to be discussed now): > > - How do we indicate opt-out? > > Comments? > > > Examples narrative. Don?t peek yet. Stop and comment first. > > > Unlike most other Java constructs, multi-line strings force us to look at coding style "square on". Keep in mind that we are often guilty of making assumptions about developer coding style. For instance, we may assume that multi-line strings tend to be large elements. We may also assume that developers will declare static final String variables to keep multi-line strings from messing up their code. All very neat and tidy, but... we know from experience that developers will use multi-line strings everywhere, as they have with array initialization and large lambda bodies. > > From this, we recommend that multi-line string fat delimiters should follow the brace pattern used in array initialization, lambdas and other Java constructs. The open delimiter should end the current line. Content follows on separate lines, indented one level. The close delimiter starts a new line, back indented one level, followed by the continuation of enclosing expression. > > So as in this brace pattern; > > int[] ia = new int[] { > 1, > 2, > 3 > }; > > we have the fat delimiter pattern; > > String d = """ > +--------+ > | text | > +--------+ > """; > > and; > > String.format(""" > public static void %s(String... args) { > System.out.println(String.join(args)); > } > """, name); > > The fat delimiter pattern also significantly helps with future editing in and around the multi-line string. For example, changing the length of the variable name in the above "String d =" example doesn't affect the positioning of the string content or the close delimiter. > > If we adopt this style, some of the answers to the incidentals questions become easier or even moot. Other styles are still valid, but the result of automatic incidental handling may be surprising. > > Note that fat delimiters can be used on single lines. 
What are the semantics for auto-alignment in that case? The question of stripping whitespace and newlines is not really about alignment. It's about what are the rules for handling incidental characters in a fat delimiter string. > > > Continuing with the examples, let's assume some (negotiable) auto-alignment basic rules; > > 1. All content lines are uniformly right stripped. Whitespace at the end of lines is not something that is consistently managed by IDEs/editors. > 2. End of lines are always translated to \n. > 3. If the content after the open delimiter is empty then the first end of line is discarded. > 4. Content is left justified while preserving relative indentation. > > And as a reminder, in the last round we introduced or attempted to introduce the following String methods; > > - String::indent(n) - used to change indentation, line by line (in JDK 11) > - String::align() and String::align(n) - used to manage incidental indentation (didn't make it) > - String::format as an instance method (resolution issues YTBD) > > __________________________________________________________________________________________________ > String a = """ > +--------+ > | text | > +--------+ > """; // first characters in first column? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > The problem with this example is that it is not following the fat delimiter pattern. Let's change the variable name "a" to "something". > > String something = """ > .......... +--------+ > .......... | text | > .......... +--------+ > .......... """; // first characters in first column? > > The "." indicate all the places where we had to add whitespace to maintain the pattern used. > __________________________________________________________________________________________________ > String b = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > Same maintenence problem as example (a). 
> > Still works, but the question here is, do we give meaning to indentation relative to the close delimiter? Did we want?; > > +--------+\n > | text |\n > +--------+\n > > It's a nice trick but we sabotage the fat delimiter pattern. We would always get at least one level of indentation, whether we wanted it or not. Maybe better to code as; > > String b = """ > +--------+ > | text | > +--------+ > """.indent(4); > > So the question here is: should it be possible to specify "extra" indentation through the positioning of quotes, or are we better off saying that any extra indentation should be done through library calls? Also noting that the library calls might be subject to compile time folding. > __________________________________________________________________________________________________ > String c = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented several? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > The amount of indentation is not a problem, just an aesthetic issue. > > __________________________________________________________________________________________________ > String d = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > Text book fat delimiter pattern. > __________________________________________________________________________________________________ > String e = > """ > +--------+ > | text | > +--------+ > """; // heredoc? > > RESULT: > +--------+\n > | text |\n > +--------+\n > > Just an aesthetic issue. > __________________________________________________________________________________________________ > String f = """ > > > +--------+ > | text | > +--------+ > > > """; // one or all leading or trailing blank lines stripped? 
> > As-is would generate; > \n > \n > +--------+\n > | text |\n > +--------+\n > \n > \n > \n > > If we stripped away all leading or trailing blank lines, we would then have code as; > > String f = "\n".repeat(2) + """ > +--------+ > | text | > +--------+ > """ + "\n".repeat(2); > __________________________________________________________________________________________________ > String g = """ > +--------+ > | text | > +--------+"""; // Last \n dropped > > RESULT: > +--------+\n > | text |\n > +--------+ > > This one is likely okay. It's not the fat delimiter pattern, but the oddity makes it clear we mean something different; we want to drop the last \n. > __________________________________________________________________________________________________ > String h = """+--------+ > | text | > +--------+"""; // determine indent of first line using scanner knowledge? > > RESULT: > +--------+\n > | text |\n > +--------+ > > We can do this because the compiler's scanner can determine the indentation on the open delimiter line. However, this one is problematic if we require a String method to duplicate the compiler's algorithm (String::align). Tool vendors may also find this one problematic. > __________________________________________________________________________________________________ > String i = """ "nested" """; // strip leading/trailing space? > > RESULT: > "nested" > > This one still follows the rules; left and right stripped. > __________________________________________________________________________________________________ > String j = (""" > public static void """ + name + """(String... args) { > System.out.println(String.join(args)); > } > """).align(); // how do we handle expressions with multi-line strings? > > Mid-string substitution gets messy fast. Let's break the example down to the following (without align.) > > String j = """ > public static void """ + name + """(String... 
args) { > System.out.println(String.join(args)); > } > """; > > This is the same as > > String j = > """ > public static void """ > + name + > """(String... args) { > System.out.println(String.join(args)); > } > """; > > Which works fine if we say no \n when close delimiter is on the same line. The other requirement is there is that each multi-line string componment ends up with a common indentation. The odds of that happening are poor. > > Guess we're stuck with parentheses String::align. Unless... > __________________________________________________________________________________________________ > String k = """ > public static void %s(String... args) { > System.out.println(String.join(args)); > } > """.format(name); // is this the answer to multi-line string expressions? > > RESULT: > public static void methodName(String... args) { > System.out.println(String.join(args)); > } > > Maybe a better substitution solution. > __________________________________________________________________________________________________ > From guy.steele at oracle.com Tue Apr 16 19:37:17 2019 From: guy.steele at oracle.com (Guy Steele) Date: Tue, 16 Apr 2019 15:37:17 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> Message-ID: <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> Just in case it wasn?t clear, I want to emphasize that my suggestion of using a rectangle-of-double-quotes was intended as a serious proposal?admittedly one that looks unusual and would require good cooperation from each IDE to be practical, but one that solves a number of the problems about that we are facing about indentation and whether to strip whitespace. 
?Guy > On Apr 10, 2019, at 5:48 PM, Guy Steele wrote: > > >> On Apr 10, 2019, at 5:36 PM, Guy Steele > wrote: >> >> >>> On Apr 10, 2019, at 4:54 PM, Brian Goetz > wrote: >>> >>> This is a plateful! >>> >>> Stripping "incidental" whitespace is an attractive target of opportunity; the real question is, can we do it right enough of the time, and when we get it wrong, is there an easy way for the user to recover and get what they want? >>> >>> Kevin described this as: "find the magic rectangle"; that there should be a rectangle enclosing the snippet that sets apart the incidental whitespace from the essential. In your examples, most of the time, the magic rectangle is, well, the actual rectangle in your text. >>> >>> >>>> >>>> Examples; >>>> >>>> String a = """ >>>> +--------+ >>>> | text | >>>> +--------+ >>>> """; // first characters in first column? >> >> Which suggests yet another approach to multiline string literals: >> >> String a = ??????????????????????????????????????? >> ?A rectangle of double quotes " >> ? can enclose any arbitrary text ? >> ? with any desired indentation, ? >> ? and you can assume any trailing ? >> ? whitespace on each line will be ? >> ? removed and that each line will ? >> ? end with a \\n . ? >> ? ? >> ?So all you need is IDE support for ? >> ? making nice rectangles. ? >> ???????????????????????????????????????; >> >> String result = ???????????????????????????????????????????????? >> ?public class Main { ? >> ? public static void main(String... args) { ? >> ? System.out.println("Hello World!?); ? >> ? } ? >> ?} ? >> ????????????????????????????????????????????????; >> >> String html = ??????????????????????????????????????????????? >> ? ? >> ? ? >> ?

Hello World.

? >> ? ? >> ? ? >> ? ? >> ???????????????????????????????????????????????; >> > > Note that I was inconsistent with my use of escapes. I reckon you should be able to use escapes, but perhaps one need not escape included double quotes because you can tell from the length of the initial line of double-quotes how much text to skip before you expect see the double quote that marks the right-hand edge of the rectangle. > From guy.steele at oracle.com Tue Apr 16 19:57:52 2019 From: guy.steele at oracle.com (Guy Steele) Date: Tue, 16 Apr 2019 15:57:52 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> Message-ID: <8E4A57FA-33AC-4A06-99BC-EE17B3629524@oracle.com> Actually, let me point out that the rectangle-of-double-quotes syntax is entirely compatible with Jim Laskey?s suggestions; the two can coexist. Define the rectangle-of-double-quotes syntax to be: ::= * * * where ::= * * and the meaning of such a literal is the concatenation of one string expression for each intermediate line calculated as .trimRight() + ?\n? where the method trimRight is the obvious method that trims only on the right side of the string. The only reason for requiring a string-rectangle to begin with at least 7 double quotes in a row is because 6 in a row would presumably be an empty string using """ as delimiters. In addition one could impose constraints on the lengths of the delimiters, the lengths of the strings, and/or the amount of whitespace at the start of each intermediate line. For example, one might require the two occurrences of to be the same length.) 
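
The meaning described above (one piece per intermediate line, trimmed on the right, plus a newline) can be sketched as follows. This is a hedged illustration of a proposal, not of any existing Java feature: `QuoteRectangleSketch`, `decode`, `isDelimiter` and `trimRight` are hypothetical names, the input is assumed to be the source lines of one literal (top delimiter, intermediate lines, bottom delimiter), each intermediate line is assumed to start and end with one edge quote, and escape processing is left out entirely.

```java
import java.util.List;

// Hypothetical decoder for the rectangle-of-double-quotes idea; not a JDK API.
final class QuoteRectangleSketch {

    static String decode(List<String> sourceLines) {
        if (sourceLines.size() < 2) {
            throw new IllegalArgumentException("need a top and a bottom delimiter line");
        }
        String top = sourceLines.get(0).trim();
        String bottom = sourceLines.get(sourceLines.size() - 1).trim();
        if (!isDelimiter(top) || !isDelimiter(bottom)) {
            throw new IllegalArgumentException("delimiter lines must be 7 or more double quotes");
        }

        StringBuilder out = new StringBuilder();
        for (String line : sourceLines.subList(1, sourceLines.size() - 1)) {
            // Whitespace outside the rectangle is ignored.
            String boxed = line.trim();
            if (boxed.length() < 2
                    || boxed.charAt(0) != '"'
                    || boxed.charAt(boxed.length() - 1) != '"') {
                throw new IllegalArgumentException("intermediate line lacks edge quotes: " + line);
            }
            // Only the outermost quotes are edges, so quotes inside the content stay as they are.
            String content = boxed.substring(1, boxed.length() - 1);
            out.append(trimRight(content)).append('\n');
        }
        return out.toString();
    }

    // At least 7 quotes, because 6 in a row would read as an empty """-delimited string.
    static boolean isDelimiter(String s) {
        if (s.length() < 7) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '"') {
                return false;
            }
        }
        return true;
    }

    // Trims whitespace on the right side only.
    static String trimRight(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }
}
```

The optional constraints mentioned above, such as requiring both delimiter lines to have the same length, are deliberately not enforced here; they would be extra checks before the loop.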
So by all means consider Jim?s proposal separately, then please consider the rectangle-of-quotes as one possible way to address the management of indentation and whitespace stripping. This would give users a choice of styles for multiline strings. ?Guy > On Apr 16, 2019, at 3:37 PM, Guy Steele wrote: > > Just in case it wasn?t clear, I want to emphasize that my suggestion of using a rectangle-of-double-quotes was intended as a serious proposal?admittedly one that looks unusual and would require good cooperation from each IDE to be practical, but one that solves a number of the problems about that we are facing about indentation and whether to strip whitespace. > > ?Guy From brian.goetz at oracle.com Tue Apr 16 20:02:48 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 16 Apr 2019 16:02:48 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> Message-ID: <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> Indeed, it does solve a number of "figure out what was the user thinking" questions. You hit the nail on the head regarding IDE support.? Our original thinking was that it should be easy to cut and paste between a (say) JSON document and a Java source program, without having to mangle it up in an annoying and error-prone way. I think what you're saying is, in the age of IDEs, that this is not such a problem, and we should focus on what yields the most _readable_ code, on the theory that writing is the job of the IDE? 
On 4/16/2019 3:37 PM, Guy Steele wrote:
> Just in case it wasn't clear, I want to emphasize that my suggestion of using a rectangle-of-double-quotes was intended as a serious proposal, admittedly one that looks unusual and would require good cooperation from each IDE to be practical, but one that solves a number of the problems that we are facing about indentation and whether to strip whitespace.
>
> --Guy
>
>> On Apr 10, 2019, at 5:48 PM, Guy Steele wrote:
>>
>>> On Apr 10, 2019, at 5:36 PM, Guy Steele wrote:
>>>
>>>> On Apr 10, 2019, at 4:54 PM, Brian Goetz wrote:
>>>>
>>>> This is a plateful!
>>>>
>>>> Stripping "incidental" whitespace is an attractive target of opportunity; the real question is, can we do it right enough of the time, and when we get it wrong, is there an easy way for the user to recover and get what they want?
>>>>
>>>> Kevin described this as: "find the magic rectangle"; that there should be a rectangle enclosing the snippet that sets apart the incidental whitespace from the essential. In your examples, most of the time, the magic rectangle is, well, the actual rectangle in your text.
>>>>
>>>>> Examples;
>>>>>
>>>>> String a = """
>>>>>            +--------+
>>>>>            |  text  |
>>>>>            +--------+
>>>>>            """; // first characters in first column?
>>>
>>> Which suggests yet another approach to multiline string literals:
>>>
>>> String a = """"""""""""""""""""""""""""""""""""""""
>>>            " A rectangle of double quotes         "
>>>            "   can enclose any arbitrary text     "
>>>            "   with any desired indentation,      "
>>>            "   and you can assume any trailing    "
>>>            "   whitespace on each line will be    "
>>>            "   removed and that each line will    "
>>>            "   end with a \\n .                   "
>>>            "                                      "
>>>            " So all you need is IDE support for   "
>>>            "   making nice rectangles.            "
>>>            """""""""""""""""""""""""""""""""""""""";
>>>
>>> String result = """"""""""""""""""""""""""""""""""""""""""""""""""
>>>                 " public class Main {                            "
>>>                 "     public static void main(String... args) { "
>>>                 "         System.out.println("Hello World!");    "
>>>                 "     }                                          "
>>>                 " }                                              "
>>>                 """""""""""""""""""""""""""""""""""""""""""""""""";
>>>
>>> String html = """"""""""""""""""""""""""""""""
>>>               " <html>                       "
>>>               "     <body>                   "
>>>               "         <p>Hello World.</p>  "
>>>               "     </body>                  "
>>>               " </html>                      "
>>>               """""""""""""""""""""""""""""""";
>>
>> Note that I was inconsistent with my use of escapes. I reckon you should be able to use escapes, but perhaps one need not escape included double quotes because you can tell from the length of the initial line of double-quotes how much text to skip before you expect to see the double quote that marks the right-hand edge of the rectangle.

From guy.steele at oracle.com Tue Apr 16 20:14:41 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Tue, 16 Apr 2019 16:14:41 -0400
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com>
Message-ID: <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com>

> On Apr 16, 2019, at 4:02 PM, Brian Goetz wrote:
>
> Indeed, it does solve a number of "figure out what was the user thinking" questions.
>
> You hit the nail on the head regarding IDE support. Our original thinking was that it should be easy to cut and paste between a (say) JSON document and a Java source program, without having to mangle it up in an annoying and error-prone way.
>
> I think what you're saying is, in the age of IDEs, that this is not such a problem, and we should focus on what yields the most _readable_ code, on the theory that writing is the job of the IDE?

Yes, exactly.
Though, as I pointed out in my subsequent message, it's easy to give the programmer a choice between a quick-and-easy way to cut-and-paste that is easy to write, and a more labor-intensive or IDE-intensive version that may be easier to read (at least in some situations).

At least we should explore such options.

--Guy

From cushon at google.com Tue Apr 16 20:50:52 2019
From: cushon at google.com (Liam Miller-Cushon)
Date: Tue, 16 Apr 2019 13:50:52 -0700
Subject: String reboot - (1a) incidental whitespace
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID:

On Fri, Apr 12, 2019 at 8:39 AM Jim Laskey wrote:

> Do you have numbers from your RSL survey for, of all string expressions that are candidates for translation to a multi-line string literal, what percentage contain no escapes other than quotes and newline?

Based on our data roughly 1 in 85 candidates for multi-line string literals contain non-trivial escapes.

From brian.goetz at oracle.com Tue Apr 16 21:24:58 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 16 Apr 2019 17:24:58 -0400
Subject: String reboot - (1a) incidental whitespace
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID:

That's pretty encouraging; 84 out of 85 would have no escapes, and the other 1 would have the same escapes it has now.

On 4/16/2019 4:50 PM, Liam Miller-Cushon wrote:
> On Fri, Apr 12, 2019 at 8:39 AM Jim Laskey wrote:
>
>     Do you have numbers from your RSL survey for, of all string expressions that are candidates for translation to a multi-line string literal, what percentage contain no escapes other than quotes and newline?
>
> Based on our data roughly 1 in 85 candidates for multi-line string literals contain non-trivial escapes.

From brian.goetz at oracle.com Tue Apr 16 21:28:21 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 16 Apr 2019 17:28:21 -0400
Subject: String reboot - (1a) incidental whitespace
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID: <5cfc1aca-2267-ff0a-2a17-2c8c8c63f345@oracle.com>

So let's move on to the follow up question. If you replace concatenated strings -- which can have exactly the right amount of indentation:

    String s =
        "public class Foo { \n" +
        "    void m() { }\n" +
        "}";

with multi-line strings:

    String s = """
        public class Foo {
            void m() { }
        }
        """;

you've gained one thing (dropping the concat and escapes), but have inherited a bunch of additional incidental leading whitespace. And we've talked about two paths for that, the implicit one (where we auto-align, in the absence of reasons not to), and the explicit one:

    String s = """
        public class Foo {
            void m() { }
        }
        """.align();

What can your data tell us about how often you care about incidental whitespace, and how often you'd be saying ".align()"?

On 4/16/2019 4:50 PM, Liam Miller-Cushon wrote:
> On Fri, Apr 12, 2019 at 8:39 AM Jim Laskey wrote:
>
>     Do you have numbers from your RSL survey for, of all string expressions that are candidates for translation to a multi-line string literal, what percentage contain no escapes other than quotes and newline?
>
> Based on our data roughly 1 in 85 candidates for multi-line string literals contain non-trivial escapes.

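As a rough illustration of the explicit ".align()" path discussed above, here is a minimal Java sketch of what such an operation might do: find the smallest leading-whitespace count across the non-blank lines and strip that many characters from every line. The class name IndentSketch and its align method are hypothetical names chosen for this sketch; it is not the String::align implementation under discussion, and the treatment of tabs and blank lines is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch only; not the String::align discussed on this list. */
final class IndentSketch {

    /** Counts the leading space and tab characters of a line. */
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    /**
     * Strips the common leading whitespace from every non-blank line.
     * Assumption: blank lines do not take part in the minimum and are
     * emitted as empty lines; every output line ends with '\n'.
     */
    static String align(String text) {
        List<String> lines = text.lines().collect(Collectors.toList());
        int common = lines.stream()
                .filter(line -> !line.isBlank())
                .mapToInt(IndentSketch::leadingWhitespace)
                .min()
                .orElse(0);
        return lines.stream()
                .map(line -> line.isBlank() ? "" : line.substring(common))
                .collect(Collectors.joining("\n", "", "\n"));
    }
}
```

Applied to the body of the multi-line example above (eight spaces of incidental indentation), this would give back the same text as the concatenated version, with the relative indentation of "void m() { }" preserved.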
From james.laskey at oracle.com Tue Apr 16 21:37:04 2019
From: james.laskey at oracle.com (James Laskey)
Date: Tue, 16 Apr 2019 18:37:04 -0300
Subject: String reboot - (1a) incidental whitespace
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID: <3A8482CC-4FE3-4321-BB9B-7D511B4D47B5@oracle.com>

Thank you.

Sent from my iPhone

> On Apr 16, 2019, at 5:50 PM, Liam Miller-Cushon wrote:
>
>> On Fri, Apr 12, 2019 at 8:39 AM Jim Laskey wrote:
>>
>> Do you have numbers from your RSL survey for, of all string expressions that are candidates for translation to a multi-line string literal, what percentage contain no escapes other than quotes and newline?
>
> Based on our data roughly 1 in 85 candidates for multi-line string literals contain non-trivial escapes.

From cushon at google.com Tue Apr 16 23:03:02 2019
From: cushon at google.com (Liam Miller-Cushon)
Date: Tue, 16 Apr 2019 16:03:02 -0700
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com>
Message-ID:

On Wed, Apr 10, 2019 at 8:25 AM Jim Laskey wrote:

> Don't peek yet. Stop and comment first.

Here's a variation on the narrative you provided. This is more thinking aloud than a proposal I'm advocating for, but I think it's a potentially interesting point in the design space.

Idea: multi-line strings should be formatted like other expressions, rather than a block-like construct (this differs from the 'fat delimiter pattern' in the original message), i.e.:

String a = """
        ...
        """; // +8 expression continuation indent

Policy: both delimiters must be on a separate line from the contents of the string. This means we can't have 'multi-line' strings that do not span multiple lines of source code.

Since the closing delimiter's indentation is independent of the position of the opening delimiter, and cannot be on the same line as the string contents, this opens the door to basing the relative indentation off the closing delimiter. (It also avoids needing information from the scanner to manage indentation.) > What control knobs do we have, that we could assign meaning to, that would let the user choose either way? The position of the closing delimiter. > Do we allow content on the same lines as the delimiters? No > How do we interpret auto-alignment on single-line strings? Strip? Single-line strings cannot use `"""`, since we require the delimiters to be on their own line. > Should we always add a final newline? > Should we strip blanks lines? Only on the first and last? All leading and trailing? > Should we right strip lines? Maybe (read on for some discussion of trailing newlines). For the most part the other choices don't affect the following examples. Examples: String a = """ +--------+ | text | +--------+ """; // first characters in first column? Actual: +--------+\n | text |\n +--------+ String b = """ +--------+ | text | +--------+ """; // first characters in first column or indented four spaces? Actual: +4 indentation relative to closing delimiter: +--------+\n | text |\n +--------+ String c = """ +--------+ | text | +--------+ """; // first characters in first column or indented several? Actual: indented several, relative to closing delimiter: +--------+\n | text |\n +--------+ String d = """ +--------+ | text | +--------+ """; // first characters in first column or indented four? Actual: +4 indentation relative to closing delimiter: +--------+\n | text |\n +--------+ String e = """ +--------+ | text | +--------+ """; // heredoc? Actual: +--------+\n | text |\n +--------+ String f = """ +--------+ | text | +--------+ """; // one or all leading or trailing blank lines stripped? 
Actual: \n \n +--------+\n | text |\n +--------+\n \n Examples g, h, and i are disallowed, since delimiters must be on their own line. For j, since delimiters must be on their own line, we'd need something like: String j = """ public static void """ + " " + name + """ (String... args) { System.out.println(String.join(args)); } """; Using a multi-line string for "public static void" is unnecessary, but I kept that to illustrate what concatenation looks like with multi-line strings. This example suggests we'd usually prefer `String.format` to concatenating multi-line strings under this approach. Conclusions: Requiring delimiters to be on their own line rules out some potentially surprising edge-cases, and allows indentation management to work without special knowledge from the scanner. It also provides a convenient opt-out mechanism if leading indentation is desired. One disadvantage is the handling of the trailing newline. Requiring the closing delimiter to be on its own line means there's always a trailing newline in source. If we want to allow expressing multi-line strings that don't have a trailing newline we could automatically trim one trailing newline character, but then it would be necessary to leave an extra blank line after multi-line strings in cases where a trailing newline is actually desired. 
Example:

String message = """
        hello
        world

        """;

Actual:

hello\n
world\n

From james.laskey at oracle.com Wed Apr 17 12:55:58 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Wed, 17 Apr 2019 09:55:58 -0300
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com>
Message-ID: <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com>

To paraphrase McLuhan, I think that we need to keep the focus on the rider and not the horse they rode in on. This was the main reason I pushed for backtick over triple double quote; making it about the content and not the syntax around it.

In the Raw String Literal round, I suggested that

String query = ```````````````````````````````````````````````
               SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB`
               WHERE `CITY` = 'INDIANAPOLIS'
               ORDER BY `EMP_ID`, `LAST_NAME`;
               ```````````````````````````````````````````````;

might freak out some developers.

If you break this example down, the multiple quotes are not really for the language, but instead for the eye of the beholder. Alignment wise, all the language cares about is the first character of each of the delimiters. The last character can be derived from the longest line based on end of line. So this reduces down to

String query = ``
               SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB`
               WHERE `CITY` = 'INDIANAPOLIS'
               ORDER BY `EMP_ID`, `LAST_NAME`;
               ``; // need `` because of the enclosed backticks

or in the fat delimiter world

String query = """
               SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB`
               WHERE `CITY` = 'INDIANAPOLIS'
               ORDER BY `EMP_ID`, `LAST_NAME`;
               """;

In the incidental whitespace discussion, this aligns with closing delimiter influencing indentation. Brian and I have been of four minds on this, where I tend to favour closing delimiter influence and Brian tends otherwise. My strongest argument thus far has been, "Swift does it."

As you point out, management of literal boxing is nightmarish and heavily reliant on IDE support, and I feel strongly that is a problem. I'm in and out of BBEdit, VSC, Atom, and Coda as much as I am in IntelliJ. The IDE answer doesn't work for me, and may not for (many?) others. What I do see is an IDE putting a light gray box or a different background colour to highlight the "auto aligned" string content.

Bottom line, closing delimiter influencing indentation provides the same information as boxing with less hassle.

Cheers,

-- Jim

> On Apr 16, 2019, at 5:14 PM, Guy Steele wrote:
>
>> On Apr 16, 2019, at 4:02 PM, Brian Goetz wrote:
>>
>> Indeed, it does solve a number of "figure out what was the user thinking" questions.
>>
>> You hit the nail on the head regarding IDE support. Our original thinking was that it should be easy to cut and paste between a (say) JSON document and a Java source program, without having to mangle it up in an annoying and error-prone way.
>>
>> I think what you're saying is, in the age of IDEs, that this is not such a problem, and we should focus on what yields the most _readable_ code, on the theory that writing is the job of the IDE?
>
> Yes, exactly. Though, as I pointed out in my subsequent message, it's easy to give the programmer a choice between a quick-and-easy way to cut-and-paste that is easy to write, and a more labor-intensive or IDE-intensive version that may be easier to read (at least in some situations).
>
> At least we should explore such options.
>
> --Guy

From james.laskey at oracle.com Wed Apr 17 13:13:45 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Wed, 17 Apr 2019 10:13:45 -0300
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com>
Message-ID:

Brian points out that we under-highlighted "closing delimiter influence". Liam alludes to it in his discussion, but I'll break the discussion out here.

In the following example I'll use "." to indicate whitespace;

String s = """
...............line 1
...................line 2
...........""";

The outcome without closing delimiter influence;

line 1
....line 2

represents the result of only including line 1 and line 2 in the determination of the amount of whitespace that needs to be removed.

The outcome with closing delimiter influence;

....line 1
........line 2

represents the result including line 1, line 2 AND the line containing the closing delimiter.

The negative aspect of including the closing delimiter line is that the fat delimiter pattern would have to deviate from the brace pattern.

Cheers,

-- Jim

> On Apr 17, 2019, at 9:55 AM, Jim Laskey wrote:
>
> To paraphrase McLuhan, I think that we need to keep the focus on the rider and not the horse they rode in on.

From james.laskey at oracle.com Wed Apr 17 13:18:04 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Wed, 17 Apr 2019 10:18:04 -0300
Subject: String reboot - (1a) incidental whitespace
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com>
Message-ID: <1FC3BD2E-0B16-47DB-A1AF-FC1C1499F635@oracle.com>

The positive aspect; having control of indentation at the literal level.

> On Apr 17, 2019, at 10:13 AM, Jim Laskey wrote:
>
> Brian points out that we under-highlighted "closing delimiter influence". Liam alludes to it in his discussion, but I'll break the discussion out here.
>
> In the following example I'll use "." to indicate whitespace;
>
> String s = """
> ...............line 1
> ...................line 2
> ...........""";
>
> The outcome without closing delimiter influence;
>
> line 1
> ....line 2
>
> represents the result of only including line 1 and line 2 in the determination of the amount of whitespace that needs to be removed.
>
> The outcome with closing delimiter influence;
>
> ....line 1
> ........line 2
>
> represents the result including line 1, line 2 AND the line containing the closing delimiter.
>
> The negative aspect of including the closing delimiter line is that the fat delimiter pattern would have to deviate from the brace pattern.
>
> Cheers,
>
> -- Jim
>
>>> On Apr 16, 2019, at 5:14 PM, Guy Steele wrote:
>>>
>>>> On Apr 16, 2019, at 4:02 PM, Brian Goetz wrote:
>>>>
>>>> Indeed, it does solve a number of "figure out what was the user thinking" questions.
>>>>
>>>> You hit the nail on the head regarding IDE support. Our original thinking was that it should be easy to cut and paste between a (say) JSON document and a Java source program, without having to mangle it up in an annoying and error-prone way.
>>>>
>>>> I think what you're saying is, in the age of IDEs, that this is not such a problem, and we should focus on what yields the most _readable_ code, on the theory that writing is the job of the IDE?
>>>
>>> Yes, exactly.
Though, as I pointed out in my subsequent message, it?s easy to give the programmer a choice between a quick-and-easy way to cut-and-paste that is easy to write, and a more labor-intensive or IDE-intensive version that may be easier to read (at least in some situations). >>> >>> At least we should explore such options. >>> >>> ?Guy >>> >> > From brian.goetz at oracle.com Wed Apr 17 13:49:06 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 17 Apr 2019 09:49:06 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> Message-ID: <7B8FD516-08E1-42FA-A83E-EEA13CCAF317@oracle.com> To be clear: ?closing delimiter influence? means that, before we strip off that blank line, we look at how much whitespace it has, and we include that in our calculation of ?common leading whitespace prefix?. (This has been called a ?determining line? in prior discussions of alignment.). The degree of control goes in one direction only; you can use this to _decrease_ the amount of whitespace stripped by moving the closing delimiter to the left. I think we can assume that most of the time, users would prefer that the opening delimiter line up with the closing one, especially if we are essentially forcing them to put them on their own lines. Saying String s = ??? stuff ???; feels more natural than String s = ??? stuff ???; or String s = ??? stuff ???; If we assume this to be the user?s default inclination, then ?closing delimiter control? is, in this common case, also ?opening delimiter control?; that the indentation from the opening quote is significant. 
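
To make the "determining line" rule concrete, here is a minimal Java sketch under stated assumptions: the content lines and the closing delimiter's indentation are passed in separately, indentation is counted in leading spaces or tabs, and blank lines are ignored when computing the common prefix. ClosingDelimiterSketch and its method names are hypothetical, invented for this illustration; this is not javac's or the JDK's algorithm.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch only; not the javac or JDK implementation. */
final class ClosingDelimiterSketch {

    /** Number of leading spaces or tabs on a line. */
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    /** Smallest indentation among the non-blank content lines (0 if there are none). */
    static int commonIndent(List<String> contentLines) {
        return contentLines.stream()
                .filter(line -> !line.isBlank())
                .mapToInt(ClosingDelimiterSketch::indentOf)
                .min()
                .orElse(0);
    }

    /** Drops {@code count} leading characters from each non-blank line; blank lines become empty. */
    static List<String> strip(List<String> contentLines, int count) {
        return contentLines.stream()
                .map(line -> line.isBlank() ? "" : line.substring(count))
                .collect(Collectors.toList());
    }

    /** Without closing delimiter influence: only the content lines decide. */
    static List<String> withoutInfluence(List<String> contentLines) {
        return strip(contentLines, commonIndent(contentLines));
    }

    /**
     * With closing delimiter influence: the delimiter line's indentation
     * also takes part in the minimum, so it can only reduce what is stripped.
     */
    static List<String> withInfluence(List<String> contentLines, int closingIndent) {
        return strip(contentLines, Math.min(commonIndent(contentLines), closingIndent));
    }
}
```

With Jim's example (content indented 15 and 19, closing delimiter indented 11), withoutInfluence strips 15 columns and withInfluence strips 11, leaving four columns of indentation. A closing delimiter indented further than the content changes nothing, which is the one-directional control described above.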
That means that to get left-aligned strings under this approach, the user would have to do this: String s = ??? stuff more stuff ???; or this String s = ??? stuff more stuff ???; or this: String s = ??? stuff more stuff???; // no trailing NL And to get indented strings, they would do: String s = ??? stuff more stuff ???; So the tradeoff here is, as Jim suggests: - More control over indentation - Less ability to lean on an existing convention Note that we seem to be leaning already towards using the position of the closing delimiter to determine whether there is a trailing NL or not. I think going all the way with this is something users could get their heads around. So it seems a reasonable option. What do we think? > On Apr 17, 2019, at 9:13 AM, Jim Laskey wrote: > > Brian points out that we under highlighted "closing delimiter influence". Liam eludes to it in his discussion, but I'll break the discussion out here. > > In the following example I'll use "." to indicate whitespace; > > String s = """ > ...............line 1 > ...................line 2 > ..........."""; > > The outcome without closing delimiter influence; > > line 1 > ....line 2 > > represents the result of only including line 1 and line 2 in the determination of the amount of whitespace needs to be removed. > > > The outcome with closing delimiter influence; > > ....line 1 > ........line 2 > > represents the result including line 1, line 2 AND the line containing the closing delimiter. > > The negative aspect of the including the closing delimiter line is that the fat delimiter pattern would have to deviate from the brace pattern. > > Cheers, > > -- Jim > > >> On Apr 17, 2019, at 9:55 AM, Jim Laskey > wrote: >> >> To paraphrase McLuhan, I think that we need to keep the focus on the rider and not the horse they rode in on. This was the main reason I pushed for backtick over triple double quote; making it about the content and not the syntax around it. 
>> >> Cheers, >> >> -- Jim >> >> >> >>> On Apr 16, 2019, at 5:14 PM, Guy Steele > wrote: >>> >>> >>>> On Apr 16, 2019, at 4:02 PM, Brian Goetz > wrote: >>>> >>>> Indeed, it does solve a number of "figure out what was the user thinking" questions. >>>> >>>> You hit the nail on the head regarding IDE support. Our original thinking was that it should be easy to cut and paste between a (say) JSON document and a Java source program, without having to mangle it up in an annoying and error-prone way. >>>> >>>> I think what you're saying is, in the age of IDEs, that this is not such a problem, and we should focus on what yields the most _readable_ code, on the theory that writing is the job of the IDE? >>> >>> Yes, exactly. Though, as I pointed out in my subsequent message, it?s easy to give the programmer a choice between a quick-and-easy way to cut-and-paste that is easy to write, and a more labor-intensive or IDE-intensive version that may be easier to read (at least in some situations). >>> >>> At least we should explore such options. >>> >>> ?Guy >>> >> > From james.laskey at oracle.com Wed Apr 17 14:22:24 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Wed, 17 Apr 2019 11:22:24 -0300 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> Message-ID: <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> (inline and at the bottom) > On Apr 16, 2019, at 8:03 PM, Liam Miller-Cushon wrote: > > On Wed, Apr 10, 2019 at 8:25 AM Jim Laskey > wrote: > Don?t peek yet. Stop and comment first. > > Here's a variation on the narrative you provided. This is more thinking aloud than a proposal I'm advocating for, but I think it's a potentially interesting point in the design space. 
> > Idea: multi-line strings should be formatted like other expressions, rather than a block-like construct (this differs from the 'fat delimiter pattern' in the original message), i.e.: > > String a = """ > ... > """; // +8 expression continuation indent > > Policy: both delimiters must be on a separate line from the contents of the string. This means we can't have 'multi-line' strings that do not span multiple lines of source code. Stephen also suggests this. While not unreasonable, it's more of a convention than a rule. We need a strong argument for why to not allow content on the same lines. > > Since the closing delimiter's indentation is independent of the position of the opening delimiter, and cannot be on the same line as the string contents, this opens the door to basing the relative indentation off the closing delimiter. (It also avoids needing information from the scanner to manage indentation.) This has been mentioned before and is "how Swift does it". I tend to favour this approach and what String::align did until CSR discussions. > > > What control knobs do we have, that we could assign meaning to, that would let the user choose either way? > > The position of the closing delimiter. > > > Do we allow content on the same lines as the delimiters? > > No > > > How do we interpret auto-alignment on single-line strings? Strip? > > Single-line strings cannot use `"""`, since we require the delimiters to be on their own line. > > > Should we always add a final newline? > > Should we strip blanks lines? Only on the first and last? All leading and trailing? > > Should we right strip lines? > > Maybe (read on for some discussion of trailing newlines). For the most part the other choices don't affect the following examples. > > Examples: > > String a = """ > +--------+ > | text | > +--------+ > """; // first characters in first column? 
> > Actual: > > +--------+\n > | text |\n > +--------+ > > String b = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four spaces? > > Actual: +4 indentation relative to closing delimiter: > > +--------+\n > | text |\n > +--------+ > > String c = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented several? > > Actual: indented several, relative to closing delimiter: > > +--------+\n > | text |\n > +--------+ > > String d = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four? > > Actual: +4 indentation relative to closing delimiter: > > +--------+\n > | text |\n > +--------+ > > String e = > """ > +--------+ > | text | > +--------+ > """; // heredoc? > > Actual: > > +--------+\n > | text |\n > +--------+ > > String f = """ > > > +--------+ > | text | > +--------+ > > > """; // one or all leading or trailing blank lines stripped? > > Actual: > > \n > \n > +--------+\n > | text |\n > +--------+\n > \n > > > Examples g, h, and i are disallowed, since delimiters must be on their own line. > > For j, since delimiters must be on their own line, we'd need something like: > > String j = """ > public static void > """ > + " " + name > + """ > (String... args) { > System.out.println(String.join(args)); > } > """; > > Using a multi-line string for "public static void" is unnecessary, but I kept that to illustrate what concatenation looks like with multi-line strings. This example suggests we'd usually prefer `String.format` to concatenating multi-line strings under this approach. > > Conclusions: > > Requiring delimiters to be on their own line rules out some potentially surprising edge-cases, and allows indentation management to work without special knowledge from the scanner. It also provides a convenient opt-out mechanism if leading indentation is desired. > > One disadvantage is the handling of the trailing newline. 
Requiring the closing delimiter to be on its own line means there's always a trailing newline in source. If we want to allow expressing multi-line strings that don't have a trailing newline we could automatically trim one trailing newline character, but then it would be necessary to leave an extra blank line after multi-line strings in cases where a trailing newline is actually desired. Example: > > String message = """ > hello > world > > """; > > Actual: > > hello\n > world\n > Not sure that having or not having a new line at the end really matters much. Both String::split('\n') and String::lines() are agnostic whether the last \n is present. Of course, it's easier to add than to remove, so not having it might be preferable. But, that is all conditional on the "both delimiters must be on a separate line from the contents of the string" rule. As stated, this is more of a convention than a syntax rule. Let's go back to having close delimiter influencing indentation and the original close delimiter influencing presence of trailing \n. Can we have both? Do they conflict? If so, how do we counteract the default action? So the default alignment would play out as this; String s = """ line 1 line 2 """; Actual: line 1\n line 2\n Let's look at the two contrary examples. // We want to indent 4 but no \n at end String noNewline = """ line 1 line 2 """; Actual fail: line 1\n line 2\n In this case, we get the indentation we want, but get a final \n we don't want. A workaround here is String noNewline = """ line 1 line 2""". indent(4); Actual: line 1\n line 2 Next case. // We want no \n at the end but we want to indent 4 String wrongIndent = """ line 1 line 2"""; Actual fails: line 1\n line 2 In this case, we drop the \n we want to drop, but get the wrong indentation. Interestingly, the workaround here is the same String wrongIndent = """ line 1 line 2""". 
indent(4); Actual: line 1\n line 2 So we can have both close delimiter influences, with workarounds for the contrary cases. From guy.steele at oracle.com Wed Apr 17 15:41:36 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 17 Apr 2019 11:41:36 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> Message-ID: > On Apr 17, 2019, at 8:55 AM, Jim Laskey wrote: > > To paraphrase McLuhan, I think that we need to keep the focus on the rider and not the horse they rode in on. This was the main reason I pushed for backtick over triple double quote; making it about the content and not the syntax around it. > > In the Raw String Literal round, I suggested that > > String query = ``````````````````````````````````````````````` > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > WHERE `CITY` = 'INDIANAPOLIS' > ORDER BY `EMP_ID`, `LAST_NAME`; > ```````````````````````````````````````````````; > > might freak out some developers. > > If you break this example down, the multiple quotes are not really for the language, but instead for the eye of the beholder. Exactly. Which is why I think it's an advantage that it can be supported as an optional and orthogonal extension. When the programmer reaches the point where he cares more about readability of a reasonably final result than the ability to make changes easily, then a potentially easier-for-the-eye-of-the-beholder notation can be available. I envision a programmer debugging code using Swift-style multiline strings, and then later asking the IDE to turn it into a pretty box.
(If we do it right, this transformation is reversible so that the string content can easily be edited or replaced.) > Alignment wise, all the language cares about is the first character of each of the delimiters. The last character can be derived from the longest line based on end of line. So this reduces down to > > String query = `` > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > WHERE `CITY` = 'INDIANAPOLIS' > ORDER BY `EMP_ID`, `LAST_NAME`; > ``; // need `` because of the enclosed backticks > > or in the fat delimiter world > > String query = """ > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > WHERE `CITY` = 'INDIANAPOLIS' > ORDER BY `EMP_ID`, `LAST_NAME`; > """; > > In the incidental whitespace discussion, this aligns with closing delimiter influencing indentation. Brian and I have been of four minds on this, where I tend to favour closing delimiter influence and Brian tends otherwise. My strongest argument thus far has been, "Swift does it." > > As you point out, management of literal boxing is nightmarish and heavily reliant on IDE support, and I feel strongly that is a problem. I'm in and out of BBEdit, VSC, Atom, and Coda as much as I am in IntelliJ. The IDE answer doesn't work for me, and may not for (many?) others. What I do see is an IDE putting a light gray box or a different background colour to highlight the "auto aligned" string content. > > Bottom line, closing delimiter influencing indentation provides the same information as boxing with less hassle. I am sympathetic to the hassle caused by lack of support from IDEs. On the other hand, it's a pity if sticking purely to current IDE capabilities causes us to be stuck in a lowest-common-denominator situation forever. I well remember when no IDE supported automatic indentation. (In fact, I helped write one of the earliest automatic-indentation routines, for Lisp in Emacs).
Early programming languages such as Fortran 66 and BASIC did not have block structure and did not use indentation as part of their customary programming style. Even when a kind of block structure was adopted in Fortran 77, programmers did not always use indentation to highlight it. Languages such as Lisp and Algol and Pascal and C did have block structure (or parenthetical structure), and a lot of programmers wore out a lot of space bars on their keyboards doing their indentation manually. It was a pain, but well worth the resulting readability. It took many years before IDEs caught up, but eventually they all did, and now we take automatic indentation for granted; we would laugh at any program editor that did not provide it. With that history, I bet if we added rectangle-strings to Java, it would be a pain for about a year, maybe less, but by then all IDEs would support it. On the other hand, if the only thing that the rectangle-string proposal accomplishes is to get everyone to agree that trailing-delimiter conventions are acceptable, then it will have served a good purpose. 
:-) --Guy From kevinb at google.com Wed Apr 17 17:23:12 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 17 Apr 2019 10:23:12 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> Message-ID: On Wed, Apr 17, 2019 at 5:56 AM Jim Laskey wrote: String query = """ > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > WHERE `CITY` = 'INDIANAPOLIS' > ORDER BY `EMP_ID`, `LAST_NAME`; > """; > > In the incidental whitespace discussion, this aligns with closing > delimiter influencing indentation. Brian and I have been of four minds on > this, where I tend to favour closing delimiter influence and Brian tends > otherwise. My strongest argument thus far has been, "Swift does it." > I'm sorry I'm not completely caught up on this discussion yet and may have missed something. But I'm confused what the alternative to using the closing delimiter position is. You certainly don't want to magically use the column position of the *opening* delimiter in this example! That is *definitely* incidental, as it depends on what the `query` variable got renamed to later by some refactoring tool. Variable renames shouldn't change program behavior. -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From elias at vasylenko.uk Wed Apr 17 17:56:59 2019 From: elias at vasylenko.uk (Elias N Vasylenko) Date: Wed, 17 Apr 2019 18:56:59 +0100 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> Message-ID: <8a6d63dc3257473cbd185f250e0ca1b813aacbc9.camel@vasylenko.uk> If we design with our preferred IDE support in mind then Guy's box-of-quotes proposal appears to be functionally equivalent to Jim's IDE-highlighted-box suggestion. Given two equivalent proposals I'd go with the one that delegates to the IDE rather than complicating the language spec. And if we do care about fallback behaviour without IDE support, even just a little, it becomes a choice between: - Box-of-quotes: Easy to read, horrible to edit. - IDE-highlighted-box: Possibly tricky to read, easy to edit. One more very minor point against the box-of-quotes: believe it or not there is a small-but-non-empty set of people who prefer programming with proportional fonts, no matter how perverse that may seem to the rest of us. They wouldn't appreciate this feature! On Wed, 2019-04-17 at 11:41 -0400, Guy Steele wrote: > > On Apr 17, 2019, at 8:55 AM, Jim Laskey > > wrote: > > > > To paraphrase McLuhan, I think that we need to keep the focus on > > the rider and not the horse they rode in on. This was the main > > reason I pushed for backtick over triple double quote; making it > > about the content and not the syntax around it.
> > > > In the Raw String Literal round, I suggested that > > > > String query = ``````````````````````````````````````````````` > > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > > WHERE `CITY` = 'INDIANAPOLIS' > > ORDER BY `EMP_ID`, `LAST_NAME`; > > ```````````````````````````````````````````````; > > > > might freak out some developers. > > > > If you break this example down, the multiple quotes are not really > > for the language, but instead for the eye of the beholder. > > Exactly. Which is why I think it's an advantage that it can be > supported as an optional and orthogonal extension. When the > programmer reaches the point where he cares more about readability of > a reasonably final result than the ability to make changes easily, > then a potentially easier-for-the-eye-of-the-beholder notation can be > available. I envision a programmer debugging code using Swift-style > multiline strings, and then later asking the IDE to turn it into a > pretty box. (If we do it right, this transformation is reversible so > that the string content can easily be edited or replaced.) > > > Alignment wise, all the language cares about is the first character > > of each of the delimiters. The last character can be derived from > > the longest line based on end of line. So this reduces down to > > > > String query = `` > > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > > WHERE `CITY` = 'INDIANAPOLIS' > > ORDER BY `EMP_ID`, `LAST_NAME`; > > ``; // need `` because of the enclosed backticks > > > > or in the fat delimiter world > > > > String query = """ > > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > > WHERE `CITY` = 'INDIANAPOLIS' > > ORDER BY `EMP_ID`, `LAST_NAME`; > > """; > > > > In the incidental whitespace discussion, this aligns with closing > > delimiter influencing indentation. Brian and I have been of four > > minds on this, where I tend to favour closing delimiter influence > > and Brian tends otherwise.
My strongest argument thus far has been, > > "Swift does it." > > > > As you point out, management of literal boxing is nightmarish and > > heavily reliant on IDE support, and I feel strongly that is a > > problem. I'm in and out of BBEdit, VSC, Atom, and Coda as much as > > I am in IntelliJ. The IDE answer doesn't work for me, and may not > > for (many?) others. What I do see is an IDE putting a light gray > > box or a different background colour to highlight the "auto > > aligned" string content. > > > > Bottom line, closing delimiter influencing indentation provides the > > same information as boxing with less hassle. > > I am sympathetic to the hassle caused by lack of support from > IDEs. On the other hand, it's a pity if sticking purely to current > IDE capabilities causes us to be stuck in a lowest-common-denominator > situation forever. > > I well remember when no IDE supported automatic indentation. (In > fact, I helped write one of the earliest automatic-indentation > routines, for Lisp in Emacs). Early programming languages such as > Fortran 66 and BASIC did not have block structure and did not use > indentation as part of their customary programming style. Even when > a kind of block structure was adopted in Fortran 77, programmers did > not always use indentation to highlight it. > > Languages such as Lisp and Algol and Pascal and C did have block > structure (or parenthetical structure), and a lot of programmers wore > out a lot of space bars on their keyboards doing their indentation > manually. It was a pain, but well worth the resulting > readability. It took many years before IDEs caught up, but > eventually they all did, and now we take automatic indentation for > granted; we would laugh at any program editor that did not provide > it. > > With that history, I bet if we added rectangle-strings to Java, it > would be a pain for about a year, maybe less, but by then all IDEs > would support it.
> > On the other hand, if the only thing that the rectangle-string > proposal accomplishes is to get everyone to agree that trailing-delimiter > conventions are acceptable, then it will have served a good > purpose. :-) > > --Guy > > > > From guy.steele at oracle.com Wed Apr 17 18:20:53 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 17 Apr 2019 14:20:53 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <8a6d63dc3257473cbd185f250e0ca1b813aacbc9.camel@vasylenko.uk> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <8a6d63dc3257473cbd185f250e0ca1b813aacbc9.camel@vasylenko.uk> Message-ID: <306DAB47-737F-4EA4-B6F8-4661855ADAD5@oracle.com> > On Apr 17, 2019, at 1:56 PM, Elias N Vasylenko wrote: > . . . > One more very minor point against the box-of-quotes: believe it or not > there is a small-but-non-empty set of people who prefer programming > with proportional fonts, no matter how perverse that may seem to the > rest of us. They wouldn't appreciate this feature! Touché.
From kevinb at google.com Wed Apr 17 18:22:54 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 17 Apr 2019 11:22:54 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <8a6d63dc3257473cbd185f250e0ca1b813aacbc9.camel@vasylenko.uk> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <8a6d63dc3257473cbd185f250e0ca1b813aacbc9.camel@vasylenko.uk> Message-ID: On Wed, Apr 17, 2019 at 11:12 AM Elias N Vasylenko wrote: If we design with our preferred IDE support in mind then Guy's box-of- > quotes proposal appears to be functionally equivalent to Jim's IDE- > highlighted-box suggestion. > > Given two equivalent proposals I'd go with the one that delegates to > the IDE rather than complicating the language spec. > I don't even feel the proposals are equivalent. We've already had ample opportunity to discover that boxes of stars around *comments* are a pain and almost no one wants to use them. Talking about the IDE doesn't paint the complete picture. Even if it maintains the width of that box for you as your needs change, the fact that this creates large diffs for small changes -- a blast radius -- is still bad for code reviews, bad for merges, and so on. If we really feel that users will find it too hard to understand where the rectangle is, that would be different, but I really don't think this is going to be hard. And if we do care about fallback behaviour without IDE support, even > just a little, it becomes a choice between: > > - Box-of-quotes: Easy to read, horrible to edit. > > - IDE-highlighted-box: Possibly tricky to read, easy to edit. 
> > One more very minor point against the box-of-quotes: believe it or not > there is a small-but-non-empty set of people who prefer programming > with proportional fonts, no matter how perverse that may seem to the > rest of us. They wouldn't appreciate this feature! > > On Wed, 2019-04-17 at 11:41 -0400, Guy Steele wrote: > > > On Apr 17, 2019, at 8:55 AM, Jim Laskey > > > wrote: > > > > > > To paraphrase McLuhan, I think that we need to keep the focus on > > > the rider and not the horse they rode in on. This was the main > > > reason I pushed for backtick over triple double quote; making it > > > about the content and not the syntax around it. > > > > > > In the Raw String Literal round, I suggested that > > > > > > String query = ``````````````````````````````````````````````` > > > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > > > WHERE `CITY` = 'INDIANAPOLIS' > > > ORDER BY `EMP_ID`, `LAST_NAME`; > > > ```````````````````````````````````````````````; > > > > > > might freak out some developers. > > > > > > If you break this example down, the multiple quotes are not really > > > for the language, but instead for the eye of the beholder. > > > > Exactly. Which is why I think it's an advantage that it can be > > supported as an optional and orthogonal extension. When the > > programmer reaches the point where he cares more about readability of > > a reasonably final result than the ability to make changes easily, > > then a potentially easier-for-the-eye-of-the-beholder notation can be > > available. I envision a programmer debugging code using Swift-style > > multiline strings, and then later asking the IDE to turn it into a > > pretty box. (If we do it right, this transformation is reversible so > > that the string content can easily be edited or replaced.) > > > > > Alignment wise, all the language cares about is the first character > > > of each of the delimiters. The last character can be derived from > > > the longest line based on end of line.
So this reduces down to > > > > > > String query = `` > > > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > > > WHERE `CITY` = 'INDIANAPOLIS' > > > ORDER BY `EMP_ID`, `LAST_NAME`; > > > ``; // need `` because of the enclosed backticks > > > > > > or in the fat delimiter world > > > > > > String query = """ > > > SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB` > > > WHERE `CITY` = 'INDIANAPOLIS' > > > ORDER BY `EMP_ID`, `LAST_NAME`; > > > """; > > > > > > In the incidental whitespace discussion, this aligns with closing > > > delimiter influencing indentation. Brian and I have been of four > > > minds on this, where I tend to favour closing delimiter influence > > > and Brian tends otherwise. My strongest argument thus far has been, > > > "Swift does it." > > > > > > As you point out, management of literal boxing is nightmarish and > > > heavily reliant on IDE support, and I feel strongly that is a > > > problem. I'm in and out of BBEdit, VSC, Atom, and Coda as much as > > > I am in IntelliJ. The IDE answer doesn't work for me, and may not > > > for (many?) others. What I do see is an IDE putting a light gray > > > box or a different background colour to highlight the "auto > > > aligned" string content. > > > > > > Bottom line, closing delimiter influencing indentation provides the > > > same information as boxing with less hassle. > > > > I am sympathetic to the hassle caused by lack of support from > > IDEs. On the other hand, it's a pity if sticking purely to current > > IDE capabilities causes us to be stuck in a lowest-common-denominator > > situation forever. > > > > I well remember when no IDE supported automatic indentation. (In > > fact, I helped write one of the earliest automatic-indentation > > routines, for Lisp in Emacs). Early programming languages such as > > Fortran 66 and BASIC did not have block structure and did not use > > indentation as part of their customary programming style.
Even when > > a kind of block structure was adopted in Fortran 77, programmers did > > not always use indentation to highlight it. > > > > Languages such as Lisp and Algol and Pascal and C did have block > > structure (or parenthetical structure), and a lot of programmers wore > > out a lot of space bars on their keyboards doing their indentation > > manually. It was a pain, but well worth the resulting > > readability. It took many years before IDEs caught up, but > > eventually they all did, and now we take automatic indentation for > > granted; we would laugh at any program editor that did not provide > > it. > > > > With that history, I bet if we added rectangle-strings to Java, it > > would be a pain for about a year, maybe less, but by then all IDEs > > would support it. > > > > On the other hand, if the only thing that the rectangle-string > > proposal accomplishes is to get everyone to agree that trailing-delimiter > > conventions are acceptable, then it will have served a good > > purpose. :-) > > > > --Guy > > > > > > > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Wed Apr 17 18:33:31 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 17 Apr 2019 14:33:31 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> Message-ID: <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> > > I'm sorry I'm not completely caught up on this discussion yet and may have missed something. But I'm confused what the alternative to using the closing delimiter position is.
You certainly don't want to magically use the column position of the *opening* delimiter in this example! That is *definitely* incidental, as it depends on what the `query` variable got renamed to later by some refactoring tool. Variable renames shouldn't change program behavior.
>

Version 1 of the algorithm is:
 - Strip leading and trailing blanks (perhaps limited to one each only)
 - Compute the maximal common whitespace prefix of the remaining lines, and trim that off each line
 - If a blank last line was trimmed, add back a newline

Version 2 of the algorithm (the "significant closing delimiter" version) is:
 - Strip leading and trailing blanks (perhaps limited to one each only)
 - Compute the maximal common whitespace prefix of the remaining lines, _including the stripped trailing blank line from above, if any_, and trim that off
 - If a blank last line was trimmed, add back a newline

In other words, the difference is whether we strip off the trailing blank before or after we compute the leading whitespace prefix. In neither case do we use scanner position.

From brian.goetz at oracle.com Wed Apr 17 18:40:13 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 17 Apr 2019 14:40:13 -0400
Subject: String reboot - (1a) incidental whitespace
In-Reply-To: <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com>
Message-ID: <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com>

Where this makes a difference is:

String s = """
    I am
    a block of text
""";

Under Version 1, we would get:

I am
a block of text

Under Version 2, we would get

    I am
    a block of text

Under Version 1, if we wanted the latter answer, we would have to do .indent(4) or something like that.

> On Apr 17, 2019, at 2:33 PM, Brian Goetz wrote:
>
>>
>> I'm sorry I'm not completely caught up on this discussion yet and may have missed something. But I'm confused what the alternative to using the closing delimiter position is. You certainly don't want to magically use the column position of the *opening* delimiter in this example! That is *definitely* incidental, as it depends on what the `query` variable got renamed to later by some refactoring tool. Variable renames shouldn't change program behavior.
>>
>
> Version 1 of the algorithm is:
> - Strip leading and trailing blanks (perhaps limited to one each only)
> - Compute the maximal common whitespace prefix of the remaining lines, and trim that off each line
> - If a blank last line was trimmed, add back a newline
>
> Version 2 of the algorithm (the "significant closing delimiter" version) is:
> - Strip leading and trailing blanks (perhaps limited to one each only)
> - Compute the maximal common whitespace prefix of the remaining lines, _including the stripped trailing blank line from above, if any_, and trim that off
> - If a blank last line was trimmed, add back a newline
>
> In other words, the difference is whether we strip off the trailing blank before or after we compute the leading whitespace prefix. In neither case do we use scanner position.
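The two versions of the algorithm quoted above can be sketched in Java roughly as follows. This is only an illustrative sketch, not the javac or String::align implementation: the class and method names (IncidentalWhitespace, alignV1, alignV2) are hypothetical, the input is assumed to be the raw text between the delimiters, indentation is assumed to be spaces only, and at most one leading and one trailing blank line are stripped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative and not part of any JDK API.
final class IncidentalWhitespace {

    /** Version 1: the closing-delimiter line is dropped before the common prefix is computed. */
    static String alignV1(String raw) {
        return align(raw, false);
    }

    /** Version 2: the closing-delimiter line takes part in the common-prefix computation. */
    static String alignV2(String raw) {
        return align(raw, true);
    }

    private static String align(String raw, boolean closingLineCounts) {
        List<String> lines = new ArrayList<>(Arrays.asList(raw.split("\n", -1)));

        // Strip one leading blank line (the remainder of the opening-delimiter line).
        if (!lines.isEmpty() && lines.get(0).isBlank()) {
            lines.remove(0);
        }
        // Strip one trailing blank line (the whitespace in front of the closing delimiter).
        String trailing = null;
        if (!lines.isEmpty() && lines.get(lines.size() - 1).isBlank()) {
            trailing = lines.remove(lines.size() - 1);
        }

        // Compute the common whitespace prefix over the non-blank content lines.
        int prefix = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isBlank()) {
                prefix = Math.min(prefix, leadingWhitespace(line));
            }
        }
        // Version 2 only: the stripped trailing line can shrink the prefix.
        if (closingLineCounts && trailing != null) {
            prefix = Math.min(prefix, trailing.length());
        }
        if (prefix == Integer.MAX_VALUE) {
            prefix = 0;
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            out.append(line.isBlank() ? "" : line.substring(prefix));
            if (i < lines.size() - 1) {
                out.append('\n');
            }
        }
        // If a blank last line was trimmed, add back a newline.
        if (trailing != null) {
            out.append('\n');
        }
        return out.toString();
    }

    private static int leadingWhitespace(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Content indented four spaces, closing delimiter in column 0.
        String raw = "\n    I am\n    a block of text\n";
        System.out.println(alignV1(raw).replace("\n", "\\n")); // I am\na block of text\n
        System.out.println(alignV2(raw).replace("\n", "\\n")); //     I am\n    a block of text\n
    }
}
```

With the closing delimiter in column 0, Version 2 keeps the four-space indentation, while Version 1 strips it and would need a follow-up call such as .indent(4) to restore it.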
From kevinb at google.com Wed Apr 17 18:53:43 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 17 Apr 2019 11:53:43 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: Really sorry for misreading that and seeing a "version 3" you weren't talking about. In my defense Liam and I had recently been talking about version 3 so I just accidentally mapped it on to this discussion. :-) Thanks as ever for the patient explanation. I may not have a *strong* position on this debate then. But here's something to think about. A *lot* of our multiline use cases are an embedded language like sql/xml/etc.; I think even a small *majority*. These get parsed. Parsing sometimes throws an exception. Remedying such an exception is of course a key developer workflow. At some point soon I think we should propose a new interface to be implemented by such exceptions, that defines the methods for getting either line-and-column or character offset of the parse error (potentially a list of such). When an IDE displays such an exception to a user, that user will want to click on it and be taken to the correct position in their Java source file that this location represents. This is a bit of a disadvantage on anything that requires users to postprocess their multiline strings e.g. with `.indent(4)`. It's not impossible that an IDE might grok and compensate for that call, it just starts to get a little weird. 
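As a rough illustration of the idea above, the sketch below shows what such an interface and the mapping back to a source position might look like. Everything here is hypothetical: SourceLocatable, QueryParseException and LiteralLocationMapper are made-up names, no such interface exists in the JDK, and the mapping assumes 1-based lines and columns, space-only indentation, and that the tool knows both how much indentation the language stripped and how much a later call such as .indent(4) added.

```java
// Hypothetical interface for parse exceptions; not part of any JDK or library.
interface SourceLocatable {
    int line();    // 1-based line within the parsed text
    int column();  // 1-based column within the parsed text
}

// Hypothetical exception type that an embedded-language parser might throw.
final class QueryParseException extends RuntimeException implements SourceLocatable {
    private final int line;
    private final int column;

    QueryParseException(String message, int line, int column) {
        super(message);
        this.line = line;
        this.column = column;
    }

    @Override public int line() { return line; }
    @Override public int column() { return column; }
}

final class LiteralLocationMapper {
    /**
     * Maps an error position in the processed string back to the Java source file.
     *
     * @param openingDelimiterLine source line holding the opening delimiter
     * @param strippedIndent       columns of incidental whitespace the language removed
     * @param addedIndent          columns added afterwards, e.g. 4 for .indent(4)
     * @return {sourceLine, sourceColumn}
     */
    static int[] toSourcePosition(SourceLocatable error, int openingDelimiterLine,
                                  int strippedIndent, int addedIndent) {
        // Content line 1 sits on the line after the opening delimiter.
        int sourceLine = openingDelimiterLine + error.line();
        // Undo the post-processing indent, then restore the stripped indentation.
        int sourceColumn = error.column() - addedIndent + strippedIndent;
        return new int[] {sourceLine, sourceColumn};
    }

    public static void main(String[] args) {
        // The same source position, reported once without and once with .indent(4).
        int[] plain = toSourcePosition(new QueryParseException("bad token", 2, 7), 10, 8, 0);
        int[] indented = toSourcePosition(new QueryParseException("bad token", 2, 11), 10, 8, 4);
        System.out.println(plain[0] + ":" + plain[1]);
        System.out.println(indented[0] + ":" + indented[1]);
    }
}
```

The addedIndent parameter is the awkward part: a tool can only supply it if it understands the post-processing call, which is the compensation described above.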
On Wed, Apr 17, 2019 at 11:40 AM Brian Goetz wrote: > Where this makes a difference is: > > String s = """ > I am > a block of text > """; > > Under Version 1, we would get: > > I am > a block of text > > Under version 2, we would get > > I am > a block of text > > Under version 1, if we wanted the latter answer, we would have to do > .indent(4) or something like that. > > > On Apr 17, 2019, at 2:33 PM, Brian Goetz wrote: > > > I'm sorry I'm not completely caught up on this discussion yet and may have > missed something. But I'm confused what the alternative to using the > closing delimiter position is. You certainly don't want to magically use > the column position of the *opening* delimiter in this example! That is > *definitely* incidental, as it depends on what the `query` variable got > renamed to later by some refactoring tool. Variable renames shouldn't > change program behavior. > > > Version 1 of the algorithm is: > - Strip leading and trailing blanks (perhaps limited to one each only) > - Compute the maximal common whitespace prefix of the remaining lines, > and trim that off each line > - If a blank last line was trimmed, add back a newline > > Version 2 of the algorithm (the "significant closing delimiter" version) > is: > - Strip leading and trailing blanks (perhaps limited to one each only) > - Compute the maximal common whitespace prefix of the remaining lines, > _including the stripped trailing blank line from above, if any_, and trim > that off > - If a blank last line was trimmed, add back a newline > > In other words, the difference is whether we strip off the trailing blank > before or after we compute the leading whitespace prefix. In neither case > do we use scanner position. > > > -- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From brian.goetz at oracle.com Wed Apr 17 19:03:11 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 17 Apr 2019 15:03:11 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: Interesting idea. This pushes towards the "let the language do as much as it can", so that such indexes are maximally useful. But I wonder if this approach is doomed to run out of gas anyway? Because, even if we do auto-alignment, so there is a single ML string literal, there is likely to be substitution of some kind: String s = """ SELECT * from %s where (condition) """.formatted(tableName); Now, suppose there is an error in the condition; the column number will still be wrong unless the table name happens to have two letters in it. And the same is true if we did the substitution via linguistic interpolation rather than by a call to a method like formatted(). Coming back to the main question - and one that we hope could be answered by data - is whether it is common that embedded snippets will want some actual leading common whitespace (indentation) rather than being pushed as far to the left as they can go. I suspect so - which is an argument in favor of allowing this to be controlled linguistically (assuming we're doing our alignment linguistically in the first place). > On Apr 17, 2019, at 2:53 PM, Kevin Bourrillion wrote: > > Really sorry for misreading that and seeing a "version 3" you weren't talking about.
In my defense Liam and I had recently been talking about version 3 so I just accidentally mapped it on to this discussion. :-) Thanks as ever for the patient explanation. > > I may not have a *strong* position on this debate then. > > But here's something to think about. A lot of our multiline use cases are an embedded language like sql/xml/etc.; I think even a small majority. These get parsed. Parsing sometimes throws an exception. Remedying such an exception is of course a key developer workflow. > > At some point soon I think we should propose a new interface to be implemented by such exceptions, that defines the methods for getting either line-and-column or character offset of the parse error (potentially a list of such). When an IDE displays such an exception to a user, that user will want to click on it and be taken to the correct position in their Java source file that this location represents. > > This is a bit of a disadvantage on anything that requires users to postprocess their multiline strings e.g. with `.indent(4)`. It's not impossible that an IDE might grok and compensate for that call, it just starts to get a little weird. > > > > On Wed, Apr 17, 2019 at 11:40 AM Brian Goetz > wrote: > Where this makes a difference is: > > String s = ??? > I am > a block of text > ???; > > Under Version 1, we would get: > > I am > a block of text > > Under version 2, we would get > > I am > a block of text > > Under version 1, if we wanted the latter answer, we would have to do .indent(4) or something like that. > > >> On Apr 17, 2019, at 2:33 PM, Brian Goetz > wrote: >> >>> >>> I'm sorry I'm not completely caught up on this discussion yet and may have missed something. But I'm confused what the alternative to using the closing delimiter position is. You certainly don't want to magically use the column position of the *opening* delimiter in this example! 
That is *definitely* incidental, as it depends on what the `query` variable got renamed to later by some refactoring tool. Variable renames shouldn't change program behavior. >>> >> >> Version 1 of the algorithm is: >> - Strip leading and trailing blanks (perhaps limited to one each only) >> - Compute the maximal common whitespace prefix of the remaining lines, and trim that off each line >> - If a blank last line was trimmed, add back a newline >> >> Version 2 of the algorithm ? the ?significant closing delimiter? version ? is: >> - Strip leading and trailing blanks (perhaps limited to one each only) >> - Compute the maximal common whitespace prefix of the remaining lines, _including the stripped trailing blank line from above, if any_, and trim that off >> - If a blank last line was trimmed, add back a newline >> >> In other words, the difference is wether we strip off the trailing blank before or after we compute the leading whitespace prefix. In neither case do we use scanner position. > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From kevinb at google.com Wed Apr 17 19:12:05 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 17 Apr 2019 12:12:05 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: We'll try to get some numbers on the occurrence of common-leading-indentation. We can also look at how common substitute-before-parse appears to be. 
However, we have heavily moved toward libraries that can parse first and substitute later, because they don't suffer from injection attacks. On Wed, Apr 17, 2019 at 12:03 PM Brian Goetz wrote: > Interesting idea. This pushes towards the ?let the language do as much as > it can?, so that such indexes are maximally useful. But I wonder if this > approach is doomed to run out of gas anyway? Because, even if we do > auto-alignment, so there is a single ML string literal, there is likely to > be substitution of some kind: > > String s = ??? > SELECT * from %s where (condition) > ???.formatted(tableName); > > Now, suppose there is an error in the condition; the column number will > still be wrong unless the table name happens to have two letters in it. > And the same is true if we did the substitution via linguistic > interpolation rather than by a call to a method like formatted(). > > > Coming back to the main question ? and one that we hope could be answered > by data ? is whether it is common that embedded snippets will want some > actual leading common whitespace (indentation) rather than being pushed as > far to the left as they can go. I suspect so ? which is an argument in > favor of allowing this to be controlled linguistically (assuming we?re > doing our alignment linguistically in the first place.). > > On Apr 17, 2019, at 2:53 PM, Kevin Bourrillion wrote: > > Really sorry for misreading that and seeing a "version 3" you weren't > talking about. In my defense Liam and I had recently been talking about > version 3 so I just accidentally mapped it on to this discussion. :-) > Thanks as ever for the patient explanation. > > I may not have a *strong* position on this debate then. > > But here's something to think about. A *lot* of our multiline use cases > are an embedded language like sql/xml/etc.; I think even a small > *majority*. These get parsed. Parsing sometimes throws an exception. > Remedying such an exception is of course a key developer workflow. 
> > At some point soon I think we should propose a new interface to be > implemented by such exceptions, that defines the methods for getting either > line-and-column or character offset of the parse error (potentially a list > of such). When an IDE displays such an exception to a user, that user will > want to click on it and be taken to the correct position in their Java > source file that this location represents. > > This is a bit of a disadvantage on anything that requires users to > postprocess their multiline strings e.g. with `.indent(4)`. It's not > impossible that an IDE might grok and compensate for that call, it just > starts to get a little weird. > > > > On Wed, Apr 17, 2019 at 11:40 AM Brian Goetz > wrote: > >> Where this makes a difference is: >> >> String s = ??? >> I am >> a block of text >> ???; >> >> Under Version 1, we would get: >> >> I am >> a block of text >> >> Under version 2, we would get >> >> I am >> a block of text >> >> Under version 1, if we wanted the latter answer, we would have to do >> .indent(4) or something like that. >> >> >> On Apr 17, 2019, at 2:33 PM, Brian Goetz wrote: >> >> >> I'm sorry I'm not completely caught up on this discussion yet and may >> have missed something. But I'm confused what the alternative to using the >> closing delimiter position is. You certainly don't want to magically use >> the column position of the *opening* delimiter in this example! That is >> *definitely* incidental, as it depends on what the `query` variable got >> renamed to later by some refactoring tool. Variable renames shouldn't >> change program behavior. >> >> >> Version 1 of the algorithm is: >> - Strip leading and trailing blanks (perhaps limited to one each only) >> - Compute the maximal common whitespace prefix of the remaining lines, >> and trim that off each line >> - If a blank last line was trimmed, add back a newline >> >> Version 2 of the algorithm ? the ?significant closing delimiter? version >> ? 
is: >> - Strip leading and trailing blanks (perhaps limited to one each only) >> - Compute the maximal common whitespace prefix of the remaining lines, >> _including the stripped trailing blank line from above, if any_, and trim >> that off >> - If a blank last line was trimmed, add back a newline >> >> In other words, the difference is whether we strip off the trailing blank >> before or after we compute the leading whitespace prefix. In neither case >> do we use scanner position. >> >> >> > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From james.laskey at oracle.com Wed Apr 17 19:58:27 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Wed, 17 Apr 2019 16:58:27 -0300 Subject: String reboot - (1a) incidental whitespace Message-ID: <17A2E5F6-958A-4BDC-A8C3-4675D9DA9193@oracle.com> I pushed changes to http://hg.openjdk.java.net/amber/amber string-tapas branch to reflect the incidental whitespace discussion. What is implemented is what Brian described as Version 2 of the algorithm (the "significant closing delimiter" version): - Strip leading and trailing blanks (perhaps limited to one each only) - Compute the maximal common whitespace prefix of the remaining lines, _including the stripped trailing blank line from above, if any_, and trim that off - If a blank last line was trimmed, add back a newline Stripping of leading and trailing blank lines is limited to one each. I also added a "super escape" \~ to opt out of auto align. Example: String l = """\~ +--------+ | text | +--------+ """; // what opt out might look like Actual: \n .......................+--------+\n .......................| text |\n .......................+--------+\n ................... Note that the String#align needs tweaking to reflect the algorithm (needs to add closing delimiter influence).
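For readers who want to see the two variants side by side, here is a minimal sketch of Version 1 and Version 2 as listed above. It is not the string-tapas implementation or the actual `String::align`; the class and method names are made up, it operates on the raw text between the delimiters, and it assumes that blank interior lines do not constrain the common prefix (the description above does not say either way).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: not the prototype's code. Names are hypothetical.
class IncidentalWhitespaceSketch {
    /** raw is the text between the delimiters, line breaks included. */
    static String align(String raw, boolean closingDelimiterSignificant) {
        List<String> lines = new ArrayList<>(Arrays.asList(raw.split("\n", -1)));

        // Step 1: strip at most one leading and one trailing blank line.
        if (!lines.isEmpty() && isBlank(lines.get(0))) {
            lines.remove(0);
        }
        String trailingBlank = null;
        if (!lines.isEmpty() && isBlank(lines.get(lines.size() - 1))) {
            trailingBlank = lines.remove(lines.size() - 1);
        }

        // Step 2: maximal common whitespace prefix of the remaining lines.
        // Assumption: blank interior lines are ignored here.
        String prefix = null;
        for (String line : lines) {
            if (isBlank(line)) {
                continue;
            }
            String ws = leadingWhitespace(line);
            prefix = (prefix == null) ? ws : commonPrefix(prefix, ws);
        }
        // Version 2 only: the stripped trailing blank line (the indentation
        // of the closing delimiter) also takes part in the computation.
        if (closingDelimiterSignificant && trailingBlank != null) {
            prefix = (prefix == null) ? trailingBlank : commonPrefix(prefix, trailingBlank);
        }
        if (prefix == null) {
            prefix = "";
        }

        // Step 3: trim the prefix; add back a newline if a blank last line was stripped.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i > 0) {
                out.append('\n');
            }
            out.append(line.startsWith(prefix) ? line.substring(prefix.length()) : "");
        }
        if (trailingBlank != null) {
            out.append('\n');
        }
        return out.toString();
    }

    static boolean isBlank(String s) {
        return s.trim().isEmpty();
    }

    static String leadingWhitespace(String s) {
        int i = 0;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return s.substring(0, i);
    }

    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    public static void main(String[] args) {
        // Content indented by eight, closing delimiter indented by four.
        String raw = "\n        I am\n        a block of text\n    ";
        System.out.print(align(raw, false).replace(' ', '.')); // Version 1
        System.out.print(align(raw, true).replace(' ', '.'));  // Version 2
    }
}
```

With content indented by eight and the closing delimiter by four, Version 1 removes all eight spaces while Version 2 keeps four, which is the difference the thread is discussing.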
There is a sampler at http://cr.openjdk.java.net/~jlaskey/Strings/AutoAlign.java (included below as well) that shows the examples from my original (1a) incidental whitespace e-mail. Cheers, -- Jim _____________________________________________________________________________________________________________ public class AutoAlign { public static void report(String label, String result) { System.out.format("Result of variable %s%n", label); String formatted = result.replaceAll(" ", ".") .replaceAll("\n", "\\\\n\n"); System.out.format("%s%n%n", formatted); } public static void main(String... args) throws Exception { String a = """ +--------+ | text | +--------+ """; // first characters in first column? String b = """ +--------+ | text | +--------+ """; // first characters in first column or indented four spaces? String c = """ +--------+ | text | +--------+ """; // first characters in first column or indented several? String d = """ +--------+ | text | +--------+ """; // first characters in first column or indented four? String e = """ +--------+ | text | +--------+ """; // heredoc? String f = """ +--------+ | text | +--------+ """; // one or all leading or trailing blank lines stripped? String g = """ +--------+ | text | +--------+"""; // Last \n dropped String h = """+--------+ | text | +--------+"""; // determine indent of first line using scanner knowledge? String i = """ "nested" """; // strip leading/trailing space? String name = " methodName"; String j = (""" public static void """ + name + """(String... args) { System.out.println(String.join(args)); } """).align(); // how do we handle expressions with multi-line strings? String k = String.format(""" public static void %s(String... args) { System.out.println(String.join(args)); } """, name); // is this the answer to multi-line string expressions? 
String l = """\~ +--------+ | text | +--------+ """; // what opt out might look like report("a", a); report("b", b); report("c", c); report("d", d); report("e", e); report("f", f); report("g", g); report("h", h); report("i", i); report("j", j); report("k", k); report("l", l); } } _____________________________________________________________________________________________________________ Result of variable a +--------+\n |..text..|\n +--------+\n Result of variable b ....+--------+\n ....|..text..|\n ....+--------+\n Result of variable c ...............+--------+\n ...............|..text..|\n ...............+--------+\n Result of variable d ....+--------+\n ....|..text..|\n ....+--------+\n Result of variable e +--------+\n |..text..|\n +--------+\n Result of variable f \n \n ....+--------+\n ....|..text..|\n ....+--------+\n \n \n Result of variable g +--------+\n |..text..|\n +--------+ Result of variable h +--------+\n |..text..|\n +--------+ Result of variable i "nested" Result of variable j public.static.void.methodName(String....args).{\n ..........System.out.println(String.join(args));\n ......}\n Result of variable k ......public.static.void..methodName(String....args).{\n ..........System.out.println(String.join(args));\n ......}\n Result of variable l \n .......................+--------+\n .......................|..text..|\n .......................+--------+\n ................... 
From guy.steele at oracle.com Wed Apr 17 20:07:58 2019 From: guy.steele at oracle.com (Guy Steele) Date: Wed, 17 Apr 2019 16:07:58 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: Okay, for this message I abandon rectangles and adopt Version 2 of the algorithm, as described by Brian (see below): a trailing blank line before the final delimiter is included in computing the leading whitespace prefix. Now I return to an idea we've seen before: > From: Tagir Valeev > Date: January 27, 2018 at 3:23:31 AM EST > > String html = ` > | > | Message > | > | `.trimMargin(); Here Tagir used pipe characters to indicate the intended left-hand margin of the embedded material, and a trimMargin() method to remove them (and preceding whitespace on the line). So: what if we allow use of that format, but instead of pipes, use double quotes? String s = """ " I am " a block of text """; This gives a sort of "partial rectangle" that looks nice, is semi-easy to edit, and works okay with proportional fonts (you may not get good alignment with the leading delimiter, but you can always get perfect alignment with the trailing delimiter). However, functionally it really buys you nothing over what the indentation of the trailing delimiter tells you. So you can use it or not. Where this option shines is when you really don't want that trailing newline, but do want specific indentation and don't want to use a method call .indent(4): String s = """ " I am " 
a block of text"""; So for this design I would suggest adding a rule that until you hit that trailing delimiter, either every line DOES have a leading ", or every line does NOT have a leading " (error if the two styles are mixed). If you want to use the style without leading " but some line of content begins with ", then you have to escape it: String s = """ \"I am a block of text" """; If you really don't like using an escape (for one thing, it makes the relative indentation of content lines less clear), then use the other style: String s = """ " "I am " a block of text" """; You have choices. I'm not entirely sure I like such a design, but I'm putting it out there for contemplation. > On Apr 17, 2019, at 2:40 PM, Brian Goetz wrote: > > Where this makes a difference is: > > String s = """ > I am > a block of text > """; > > Under Version 1, we would get: > > I am > a block of text > > Under version 2, we would get > > I am > a block of text > > Under version 1, if we wanted the latter answer, we would have to do .indent(4) or something like that. > > >> On Apr 17, 2019, at 2:33 PM, Brian Goetz > wrote: >> >>> >>> I'm sorry I'm not completely caught up on this discussion yet and may have missed something. But I'm confused what the alternative to using the closing delimiter position is. You certainly don't want to magically use the column position of the *opening* delimiter in this example! That is *definitely* incidental, as it depends on what the `query` variable got renamed to later by some refactoring tool. Variable renames shouldn't change program behavior. >>> >> >> Version 1 of the algorithm is: >> - Strip leading and trailing blanks (perhaps limited to one each only) >> - Compute the maximal common whitespace prefix of the remaining lines, and trim that off each line >> - If a blank last line was trimmed, add back a newline >> >> Version 2 of the algorithm - the "significant closing delimiter" version - 
is: >> - Strip leading and trailing blanks (perhaps limited to one each only) >> - Compute the maximal common whitespace prefix of the remaining lines, _including the stripped trailing blank line from above, if any_, and trim that off >> - If a blank last line was trimmed, add back a newline >> >> In other words, the difference is wether we strip off the trailing blank before or after we compute the leading whitespace prefix. In neither case do we use scanner position. > From cushon at google.com Wed Apr 17 22:36:26 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Wed, 17 Apr 2019 15:36:26 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <7B8FD516-08E1-42FA-A83E-EEA13CCAF317@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <7B8FD516-08E1-42FA-A83E-EEA13CCAF317@oracle.com> Message-ID: On Wed, Apr 17, 2019 at 6:51 AM Brian Goetz wrote: > I think we can assume that most of the time, users would prefer that the > opening delimiter line up with the closing one, especially if we are > essentially forcing them to put them on their own lines. Saying > > So the tradeoff here is, as Jim suggests: > > - More control over indentation > - Less ability to lean on an existing convention > > Note that we seem to be leaning already towards using the position of the > closing delimiter to determine whether there is a trailing NL or not. I > think going all the way with this is something users could get their heads > around. So it seems a reasonable option. What do we think? 
> Yet another stylistic choice is to break before the opening delimiter too: String m = """ +--------+ | text | +--------+ """; While that formatting may be surprising at first glance, this isn't a straw-person. There are a few benefits to that style: * the delimiters are horizontally aligned * renaming the variable doesn't create churn from having to re-indent the string * there's an arguable readability benefit: when embedding large chunks of structured text into the middle of a Java program, it may be helpful for the distinction between the two parts of the file to be very clear, even at the cost of some extra vertical space Regarding formatting multi-line strings similar to braces, I'm still thinking through why that's the best existing convention to borrow from. In the code bases I've seen it's unusual to format other constructs in that style, e.g. it's rare to see: someMethod( arg1, arg2 ); Why are multi-line strings closer to constructs with braces than to other expressions? From cushon at google.com Wed Apr 17 22:37:16 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Wed, 17 Apr 2019 15:37:16 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> Message-ID: On Wed, Apr 17, 2019 at 7:22 AM Jim Laskey wrote: > We need a strong argument for why to not allow content on the same lines. > ... > Let's go back to having close delimiter influencing indentation and the > original close delimiter influencing presence of trailing \n. Can we have > both? Do they conflict? If so, how do we counteract the default action? > ... > So we can have both close delimiter influences, with workarounds for the contrary > cases. > I found the examples you worked through of why the restriction is unnecessary for closing delimiters convincing.
Thanks for the explanation! The 'failure mode' of needing a call to `.indent(4)` doesn't really feel like losing: arguably it helps readability, since reading '4' is easier than visually counting whitespace characters. I still find the restriction appealing for the opening delimiter, though. The argument is that having contents on the opening line seems likely to cause confusion, e.g.: String m = """ +--------+ | text | +--------+"""; Result of variable m under the current string-tapas prototype: ....+--------+\n |..text..|\n +--------+ Unlike the issue with trailing newlines and the closing delimiter, disallowing contents on the same line as the opening delimiter doesn't limit the set of strings that can be expressed using triple quotes. Are there important use-cases that require allowing contents on the same line as the opening delimiter? (The other way to "fix" this example would be using information from the scanner, but there seems to be tentative consensus that the cure is worse than the disease.) From james.laskey at oracle.com Thu Apr 18 12:24:23 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Thu, 18 Apr 2019 09:24:23 -0300 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: I see value in this notion. Early on in the design of RSL we had a trimMargin equivalent defined in String. John suggested some modifications which led to a Swiss Army trimMargin which could handle everything you suggest and much more.
In the end, what changed our minds about providing trimMargin in the JDK was the result of the RSL SurveyMonkey, indicating that most developers wanted a clutter free, low maintenance literal. Realizing that we would never please everyone, we fell back on the assumption that most were good with the basic literal and that customized align/trim could be handled by String::transform. Cheers, -- Jim > On Apr 17, 2019, at 5:07 PM, Guy Steele wrote: > > Okay, for this message I abandon rectangles and adopt Version 2 of the algorithm, as described by Brian (see below): a trailing blank line before the final delimiter is included in computing the leading whitespace prefix. > > Now I return to an idea we?ve seen before: > >> From: Tagir Valeev > >> Date: January 27, 2018 at 3:23:31 AM EST >> >> String html = ` >> | >> | Message >> | >> | `.trimMargin(); > > > Here Tagir used pipe characters to indicate the intended left-hand margin of the embedded material, and a trimMargin() method to remove them (and preceding whitespace on the line). > > So: what if we allow use of that format, but instead of pipes, use double quotes? > > String s = ??? > ? I am > ? a block of text > ???; > > This gives a sort of ?partial rectangle? that looks nice, is semi-easy to edit, and works okay with proportional fonts (you may not get good alignment with the leading delimiter, but you can always get perfect alignment with the trailing delimiter). However, functionally it really buys you nothing over what the indentation of the trailing delimiter tells you. So you can use it or not. > > Where this option shines is when you really don?t want that trailing newline, but do want specific indentation and don?t want to use a method call .indent(4): > > String s = ??? > ? I am > ? a block of text???; > > So for this design I would suggest adding a rule that until you hit that trailing delimiter, either every line DOES have a leading ?, or every line does NOT have a leading ? 
(error if the two styles are mixed). > > If you want to use the style without leading ? but some line of content begins with ?, then you have to escape it: > > String s = ??? > \"I am > a block of text" > ???; > > If you really don?t like using an escape (for one thing, it makes the relative indentation of content lines less clear), then use the other style: > > String s = ??? > " "I am > " a block of text" > ???; > > You have choices. I?m not entirely sure I like such a design, but I?m putting it out there for contemplation. > > >> On Apr 17, 2019, at 2:40 PM, Brian Goetz > wrote: >> >> Where this makes a difference is: >> >> String s = ??? >> I am >> a block of text >> ???; >> >> Under Version 1, we would get: >> >> I am >> a block of text >> >> Under version 2, we would get >> >> I am >> a block of text >> >> Under version 1, if we wanted the latter answer, we would have to do .indent(4) or something like that. >> >> >>> On Apr 17, 2019, at 2:33 PM, Brian Goetz > wrote: >>> >>>> >>>> I'm sorry I'm not completely caught up on this discussion yet and may have missed something. But I'm confused what the alternative to using the closing delimiter position is. You certainly don't want to magically use the column position of the *opening* delimiter in this example! That is *definitely* incidental, as it depends on what the `query` variable got renamed to later by some refactoring tool. Variable renames shouldn't change program behavior. >>>> >>> >>> Version 1 of the algorithm is: >>> - Strip leading and trailing blanks (perhaps limited to one each only) >>> - Compute the maximal common whitespace prefix of the remaining lines, and trim that off each line >>> - If a blank last line was trimmed, add back a newline >>> >>> Version 2 of the algorithm ? the ?significant closing delimiter? version ? 
is: >>> - Strip leading and trailing blanks (perhaps limited to one each only) >>> - Compute the maximal common whitespace prefix of the remaining lines, _including the stripped trailing blank line from above, if any_, and trim that off >>> - If a blank last line was trimmed, add back a newline >>> >>> In other words, the difference is wether we strip off the trailing blank before or after we compute the leading whitespace prefix. In neither case do we use scanner position. >> > From james.laskey at oracle.com Thu Apr 18 12:43:52 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Thu, 18 Apr 2019 09:43:52 -0300 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> Message-ID: <337F1C29-2223-4EB2-A083-483802F6434C@oracle.com> I had a prototype ready with opening delimiter influence, which worked quite well. What kills the concept is that ODI is not reproducible in a library method. How ODI works in the compiler is that it counts the number of characters in the scan buffer, starting at the start of line, leading up to the literal content after the open delimiter. Once scanned, the literal contains no information that could help us regenerate the count. The only options I can see to support the idea in full are, 1) always trim lines to only include characters right of the open delimiter // Assuming auto alignment turned off String m = """line 1 line 2 line 3 """; Result: line 1 line 2 ine 3 or 2) add equivalent whitespace to the head of the string to compensate for the lead up to the first line of content. // Assuming auto alignment turned off String m = """line 1 line 2 line 3 """; Result: line 1 line 2 line 3 Not sure either are forward thinking. 
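To make option 2 concrete, here is a small sketch under stated assumptions: the helper names are hypothetical, every character before the content counts as one column (tabs included), and the caller somehow has the source line in hand. That last assumption is the catch described above: the compiler has the line, a library method working on the finished literal does not.

```java
// Hypothetical helpers illustrating "opening delimiter influence", option 2.
class OpeningDelimiterSketch {
    static final String DELIMITER = "\"\"\"";

    /** Characters on the source line before the content that follows the opening delimiter. */
    static int contentColumn(String sourceLine) {
        int at = sourceLine.indexOf(DELIMITER);
        if (at < 0) {
            throw new IllegalArgumentException("no opening delimiter on this line");
        }
        return at + DELIMITER.length();
    }

    static String spaces(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Option 2: pad the head of the literal so the first line carries its source column. */
    static String padHead(String rawContent, int column) {
        return spaces(column) + rawContent;
    }

    public static void main(String[] args) {
        String sourceLine = "String m = \"\"\"line 1";
        int column = contentColumn(sourceLine);
        // The second line sits directly under "line 1" in the source.
        String raw = "line 1\n" + spaces(column) + "line 2\n";

        System.out.println("content starts at column " + column);
        System.out.print(padHead(raw, column).replace(' ', '.'));

        // Renaming the variable moves the column, which is the earlier objection.
        System.out.println(contentColumn("String message = \"\"\"line 1"));
    }
}
```

After padding, both lines share the same whitespace prefix, so an ordinary common-prefix trim could align them. The last line of `main` shows the column changing when the variable is renamed.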
Cheers, -- Jim > On Apr 17, 2019, at 7:37 PM, Liam Miller-Cushon wrote: > > On Wed, Apr 17, 2019 at 7:22 AM Jim Laskey > wrote: > We need a strong argument for why to not allow content on the same lines. > ... > Let's go back to having close delimiter influencing indentation and the original close delimiter influencing presence of trailing \n. Can we have both? Do they conflict? If so, how do we counteract the default action? > ... > So we can have both close delimiter influences, with workarounds for the contrary cases. > > I found the examples you worked through of why the restriction is unnecessary for closing delimiters convincing. Thanks for the explanation! > > The 'failure mode' of needing a call to `.indent(4)` doesn't really feel like losing: arguably it helps readability, since reading '4' is easier than visually counting whitespace characters. > > I still find the restriction appealing for the opening delimiter, though. The argument is that having contents on the opening line seems likely to cause confusion, e.g.: > > String m = """ +--------+ > | text | > +--------+"""; > > Result of variable m under the current string-tapas prototype: > > ....+--------+\n > |..text..|\n > +--------+ > > Unlike the issue with trailing newlines and the closing delimiter, disallowing contents on the same line as the opening delimiter doesn't limit the set of strings that can be expressed using triple quotes. > > Are there important use-cases that require allowing contents on the same line as the opening delimiter? > > (The other way to "fix" this example would be using information from the scanner, but there seems to be tentative consensus that cure is worse then the disease.) 
From brian.goetz at oracle.com Thu Apr 18 15:52:32 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Apr 2019 11:52:32 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> Message-ID: <39408A04-43D4-4E1F-A51B-B47F9E933CD8@oracle.com> > I still find the restriction appealing for the opening delimiter, though. The argument is that having contents on the opening line seems likely to cause confusion, e.g.: > > String m = """ +--------+ > | text | > +--------+"""; > > Result of variable m under the current string-tapas prototype: > > ....+--------+\n > |..text..|\n > +--------+ I think this is a restriction that is much more suitable to a _style guide_ than the language. Yes, users can get it wrong, but they'll learn quickly. And, sometimes putting text on that first line is exactly what you want, such as in the case where you _don't_ want alignment to muck with your indentation. Putting non-blank text on that first line is effectively an opt-out: String m = """I won't get any alignment (except maybe NL normalization) """; From james.laskey at oracle.com Thu Apr 18 16:14:03 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Thu, 18 Apr 2019 13:14:03 -0300 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <17A2E5F6-958A-4BDC-A8C3-4675D9DA9193@oracle.com> References: <17A2E5F6-958A-4BDC-A8C3-4675D9DA9193@oracle.com> Message-ID: I've updated the repo with an updated String::align and switched the compiler to use the String::align method instead of its own version. I also updated the sample. > On Apr 17, 2019, at 4:58 PM, Jim Laskey wrote: > > I pushed changes to http://hg.openjdk.java.net/amber/amber string-tapas branch to reflect the incidental whitespace discussion. What is implemented is what Brian described as > > Version 2 of the algorithm - 
the "significant closing delimiter" version - is: > - Strip leading and trailing blanks (perhaps limited to one each only) > - Compute the maximal common whitespace prefix of the remaining lines, _including the stripped trailing blank line from above, if any_, and trim that off > - If a blank last line was trimmed, add back a newline > > With strip leading and trailing blanks limited to one each. > > I also added a "super escape" \~ to opt out of auto align. > > > Example: > > String l = """\~ > +--------+ > | text | > +--------+ > """; // what opt out might look like > > Actual: > > \n > .......................+--------+\n > .......................| text |\n > .......................+--------+\n > ................... > > > Note that the String#align needs tweaking to reflect the algorithm (needs to add closing delimiter influence). > > There is a sampler at http://cr.openjdk.java.net/~jlaskey/Strings/AutoAlign.java (included below as well) that shows the examples from my original (1a) incidental whitespace e-mail. > > Cheers, > > -- Jim > > _____________________________________________________________________________________________________________ > > public class AutoAlign { > public static void report(String label, String result) { > System.out.format("Result of variable %s%n", label); > String formatted = result.replaceAll(" ", ".") > .replaceAll("\n", "\\\\n\n"); > System.out.format("%s%n%n", formatted); > } > > public static void main(String... args) throws Exception { > String a = """ > +--------+ > | text | > +--------+ > """; // first characters in first column? > > String b = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four spaces? > > String c = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented several? > > String d = """ > +--------+ > | text | > +--------+ > """; // first characters in first column or indented four?
> > String e = > """ > +--------+ > | text | > +--------+ > """; // heredoc? > > String f = """ > > > +--------+ > | text | > +--------+ > > > """; // one or all leading or trailing blank lines stripped? > > String g = """ > +--------+ > | text | > +--------+"""; // Last \n dropped > > String h = """+--------+ > | text | > +--------+"""; // determine indent of first line using scanner knowledge? > > String i = """ "nested" """; // strip leading/trailing space? > > String name = " methodName"; > String j = (""" > public static void """ + name + """(String... args) { > System.out.println(String.join(args)); > } > """).align(); // how do we handle expressions with multi-line strings? > > String k = String.format(""" > public static void %s(String... args) { > System.out.println(String.join(args)); > } > """, name); // is this the answer to multi-line string expressions? > > String l = """\~ > +--------+ > | text | > +--------+ > """; // what opt out might look like > > report("a", a); > report("b", b); > report("c", c); > report("d", d); > report("e", e); > report("f", f); > report("g", g); > report("h", h); > report("i", i); > report("j", j); > report("k", k); > report("l", l); > } > } > > _____________________________________________________________________________________________________________ > > Result of variable a > +--------+\n > |..text..|\n > +--------+\n > > > Result of variable b > ....+--------+\n > ....|..text..|\n > ....+--------+\n > > > Result of variable c > ...............+--------+\n > ...............|..text..|\n > ...............+--------+\n > > > Result of variable d > ....+--------+\n > ....|..text..|\n > ....+--------+\n > > > Result of variable e > +--------+\n > |..text..|\n > +--------+\n > > > Result of variable f > \n > \n > ....+--------+\n > ....|..text..|\n > ....+--------+\n > \n > \n > > > Result of variable g > +--------+\n > |..text..|\n > +--------+ > > Result of variable h > +--------+\n > |..text..|\n > +--------+ > > Result of 
variable i > "nested" > > Result of variable j > public.static.void.methodName(String....args).{\n > ..........System.out.println(String.join(args));\n > ......}\n > > > Result of variable k > ......public.static.void..methodName(String....args).{\n > ..........System.out.println(String.join(args));\n > ......}\n > > > Result of variable l > \n > .......................+--------+\n > .......................|..text..|\n > .......................+--------+\n > ................... > > > > >
From james.laskey at oracle.com Thu Apr 18 16:20:30 2019 From: james.laskey at oracle.com (James Laskey) Date: Thu, 18 Apr 2019 13:20:30 -0300 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <17A2E5F6-958A-4BDC-A8C3-4675D9DA9193@oracle.com> Message-ID: I'll update the comments in String::align when we settle. > On Apr 18, 2019, at 1:14 PM, Jim Laskey wrote: > > I've updated the repo with an updated String::align and switched the compiler to use the String::align method instead of its own version. I also updated the sample.
From brian.goetz at oracle.com Thu Apr 18 16:39:44 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Apr 2019 12:39:44 -0400 Subject: To align, or not to align? Message-ID: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> I think we're at a point where we're ready to make the next big decision. So far, we seem to be converging on a reasonable definition of "alignment" for multi-line strings, modulo a few small choices (e.g., what to do about single-line strings, etc.) Jim has posted a prototype. This could be exposed as either a language feature, or a library feature, or both. The question we're now ready to confront is: how should a user indicate that they do, or do not, want alignment? Options include: - Align by default, with some sort of opt-out - Do not align by default, with some sort of opt-in - Opt in could just be a library invocation, such as `String::align` - Opt in could be a linguistic marker (Note: we are not ready to discuss syntax yet, we're still discussing how the language works.) Arguments for align-by-default: - General feeling (hopefully to be bolstered by data soon) that _most_ embedded ML strings are "programs" in languages like HTML, XML, JSON, YAML, SQL, etc, which naturally use indentation and users will generally want to strip off the incidental indentation caused by embedding in Java source. - Incidental indentation is a natural, but accidental, consequence of embedding ML strings in a program. - Early feedback included plenty of complaints about "why do I have to say `.align()` all the time?"
- We don't want to leave users with a perceived bad choice: either say `.align()` explicitly, or mangle your source code to not have incidental indentation (making it look bad), or live with extra indentation. Arguments against align-by-default: - The language should provide a _simple_ facility for string literals; the complexity of alignment does not belong in the language. - Alignment is not always what the user is going to want; even if it is, the built-in alignment algorithm may not be exactly what the user is going to want. - It is not an orthogonal decomposition; we're tying ML-ness to alignment. The language should expose primitives that the user can combine compositionally. - It interacts badly with string concatenation. Note that the "for" arguments are mostly pragmatic, and the "against" arguments are mostly principled. (That is not to say we shouldn't decide for "for"; this feature is, after all, purely about user convenience, since these strings can be expressed already in the existing language.) Now accepting arguments one way or the other. (Not yet accepting comments on syntax.)
From guy.steele at oracle.com Thu Apr 18 17:00:41 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 18 Apr 2019 13:00:41 -0400 Subject: To align, or not to align? In-Reply-To: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> Message-ID: Sorry to make trouble, Brian, but I am going to argue that the question posed may be a false dichotomy, because it makes an unstated prior assumption that I think needs examining one more time: it seems to make an assumption that the syntax of a multiline string begins with a delimiter, that the syntax ends with a delimiter, that ** the _content_ of the multiline string consists of the _entire_ sequence of characters between the delimiters **, and that ** the job of align-by-default is to adjust that content (according to some rules) **.
There is an alternate approach that views the _syntax_ of a multiline string (whatever it may be) as having a somewhat richer structure that includes both "content" and possibly also explicit indications of the user's intent as to alignment. (In such a design "align-by-default" is not necessary.) I worry that framing the question as "align-by-default" versus "no align-by-default" implicitly discards consideration of this alternate approach. Are we in fact already firmly committed to the idea that the syntax of a multiline string, at least as initially implemented, necessarily consists of an unbroken sequence of characters, all of which constitute string content, sandwiched between exactly two delimiters? If the answer is "yes", then my concern is addressed and I will withdraw my objection. I just want to be reassured that the question has been considered and that the answer is "yes". -Guy
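As a reference point for the align-or-not question, the "significant closing delimiter" rule quoted earlier in this thread (strip one leading and one trailing blank line, trim the common whitespace prefix including that of the stripped closing line, add back a newline) can be sketched as an ordinary library method. This is not the prototype's `String::align`; `AlignSketch` and `align` are hypothetical names. It also makes assumptions the thread leaves open: single-line strings are returned unchanged, interior blank lines are ignored when computing the prefix and come out empty, and only "\n" line terminators are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AlignSketch {
    // Number of leading whitespace characters on a line.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) i++;
        return i;
    }

    static boolean isBlank(String line) {
        return leadingWhitespace(line) == line.length();
    }

    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return a.substring(0, i);
    }

    // raw is everything between the opening and closing delimiters.
    static String align(String raw) {
        if (!raw.contains("\n")) return raw;           // assumption: leave single-line strings alone
        List<String> lines = new ArrayList<>(Arrays.asList(raw.split("\n", -1)));

        // Step 1: strip at most one leading and one trailing blank line.
        if (isBlank(lines.get(0))) lines.remove(0);
        String closing = null;                          // the blank closing-delimiter line, if any
        int last = lines.size() - 1;
        if (isBlank(lines.get(last))) closing = lines.remove(last);

        // Step 2: common whitespace prefix of the remaining lines plus the closing line.
        String prefix = closing;
        for (String line : lines) {
            if (isBlank(line)) continue;                // assumption: interior blank lines do not count
            String ws = line.substring(0, leadingWhitespace(line));
            prefix = (prefix == null) ? ws : commonPrefix(prefix, ws);
        }
        if (prefix == null) prefix = "";

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) out.append('\n');
            String line = lines.get(i);
            out.append(isBlank(line) ? "" : line.substring(prefix.length()));
        }

        // Step 3: a stripped blank last line means the payload ends with a newline.
        if (closing != null) out.append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        String shallow = align("\n    a\n      b\n    ");
        String deep = align("\n            a\n              b\n            ");
        System.out.println(shallow.equals(deep));
    }
}
```

Under these assumptions the same payload pasted at two different nesting depths yields the same string, and moving the closing delimiter to the left of the payload keeps the difference as indentation.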
From cushon at google.com Thu Apr 18 17:08:49 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Thu, 18 Apr 2019 10:08:49 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <39408A04-43D4-4E1F-A51B-B47F9E933CD8@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> <39408A04-43D4-4E1F-A51B-B47F9E933CD8@oracle.com> Message-ID: On Thu, Apr 18, 2019 at 8:52 AM Brian Goetz wrote: > I still find the restriction appealing for the opening delimiter, though. > The argument is that having contents on the opening line seems likely to > cause confusion, e.g.: > > String m = """ +--------+ > | text | > +--------+"""; > > Result of variable m under the current string-tapas prototype: > > ....+--------+\n > |..text..|\n > +--------+ > > > I think this is a restriction that is much more suitable to a _style > guide_ than the language. Yes, users can get it wrong, but they'll learn > quickly. And, sometimes putting text on that first line is exactly what > you want, such as in the case where you _don't_ want alignment to muck with > your indentation. Putting non-blank text on that first line is effectively > an opt-out: > > String m = """I won't > get any alignment > (except maybe NL normalization) > > """; > I'm not sure that matches the behaviour of the current prototype; it doesn't seem to be considering the first line: String m = """I won't get any alignment (except maybe NL normalization) """; Result of variable m I.won't\n ....get.any.alignment\n ....(except.maybe.NL.normalization)\n \n
From kevinb at google.com Thu Apr 18 17:30:35 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 18 Apr 2019 10:30:35 -0700 Subject: To align, or not to align?
In-Reply-To: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> Message-ID: On Thu, Apr 18, 2019 at 9:42 AM Brian Goetz wrote: > I think we're at a point where we're ready to make the next big decision. > > So far, we seem to be converging on a reasonable definition of > "alignment" for multi-line strings, modulo a few small choices (e.g., > what to do about single-line strings, etc.) Jim has posted a > prototype. This could be exposed as either a language feature, or a > library feature, or both. > > The question we're now ready to confront is: how should a user indicate > that they do, or do not, want alignment? Options include: > > - Align by default, with some sort of opt-out > - Do not align by default, with some sort of opt-in > - Opt in could just be a library invocation, such as `String::align` > - Opt in could be a linguistic marker > > (Note: we are not ready to discuss syntax yet, we're still discussing > how the language works.) > > Arguments for align-by-default: > > - General feeling (hopefully to be bolstered by data soon) that _most_ > embedded ML strings are "programs" in languages like HTML, XML, JSON, > YAML, SQL, etc, which naturally use indentation and users will generally > want to strip off the incidental indentation caused by embedding in Java > source. > I wouldn't have used this particular argument, as these parsers usually ignore extra indentation anyway. > - Incidental indentation is a natural, but accidental, consequence of > embedding ML strings in a program. > It is exactly that. That rectangular zone of incidental indentation "belongs" to the high-level structuring of the file. It shouldn't be viewed as belonging to the specific string literal being defined in that location. 
- We don't want to leave users with a perceived bad choice: either say > `.align()` explicitly, or mangle your source code to not have incidental > indentation (making it look bad), or live with extra indentation. > To drill in on "making it look bad", what's happening is that a local choice to use one kind of string or another is doing damage to the perceived *high-level* organization of the file. > Arguments against align-by-default: > > - The language should provide a _simple_ facility for string literals; > the complexity of alignment does not belong in the language. > This argument lacks meaning unless we delve into what is really meant by "simple". We all know that simple to use, simple to read, simple to explain, simple to write up in a language spec, etc., all mean very different things. - Alignment is not always what the user is going to want; even if it > is, the built-in alignment algorithm may not be exactly what the user is > going to want. > This simply says there should be a way to opt out, which I don't think is controversial. So in other words it doesn't seem to say anything. - It is not an orthogonal decomposition; we're tying ML-ness to > alignment. The language should expose primitives that the user can > combine compositionally. > Interesting! I think this is exactly opposite to what's really going on. Here's how people think of their program indentation. When I open a block, I increase it by N. When I close a block, I decrease it by N. Continuation line, maybe +2N. I move in and out based on what's happening locally. However, I have no care at all for what the current absolute value of that indentation is. Maybe it's 10, maybe it's 14, whatever; that value is irrelevant to me, it simply emerges from how nested I happen to be. Indentation stripping is precisely what *preserves* that independence. -- Kevin Bourrillion | Java Librarian | Google, Inc. 
| kevinb at google.com
From brian.goetz at oracle.com Thu Apr 18 18:32:47 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Apr 2019 14:32:47 -0400 Subject: To align, or not to align? In-Reply-To: References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> Message-ID: <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> > > - It is not an orthogonal decomposition; we're tying ML-ness to > alignment. The language should expose primitives that the user can > combine compositionally. > > > Interesting! I think this is exactly opposite to what's really going on. > > Here's how people think of their program indentation. When I open a > block, I increase it by N. When I close a block, I decrease it by N. > Continuation line, maybe +2N. I move in and out based on what's > happening locally. However, I have no care at all for what the > current absolute value of that indentation is. Maybe it's 10, maybe > it's 14, whatever; that value is irrelevant to me, it simply emerges > from how nested I happen to be. > > Indentation stripping is precisely what *preserves* that independence. I don't disagree, as much as see two different ways of looking at it, and I want to call those out explicitly so we can be clear on what we think the language actually wants. And I think that duality of perspective is exactly the question that we need to come to terms with. One view is that a string literal is the sequence of characters between the delimiters, and a multi-line string literal is just a string literal that happens to be able to span lines. This is also the simplest extension of existing string literals to multi-line; adding only the ability to span lines. In this view, implicit alignment can feel like conflating two things.
An alternate view is that a multi-line string is a literal that is embedded spatially in the Java source code; therefore it inherently has some 2D structure to it, which gives us permission to muck with it in certain ways that are consistent with that structure. Guy further observes that these two views are both extremes, and there is another option in the middle: that a multi-line string literal is neither merely a sequence of characters, nor a 2D text block to be trimmed according to an algorithm, but actually a small program in "spatial string literal language" that can be expressive enough to talk about its structure, and therefore can be more explicit about its boundaries. So I think the question really comes down to: what _is_ a multi-line string literal.
From kevinb at google.com Thu Apr 18 19:00:04 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 18 Apr 2019 12:00:04 -0700 Subject: To align, or not to align? In-Reply-To: <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: On Thu, Apr 18, 2019 at 11:32 AM Brian Goetz wrote: So I think the question really comes down to: what _is_ a multi-line string > literal. > I think that question is so abstract and philosophical as to not be useful. I sent the previous message because I believe it is a better way to frame the issue. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
From kevinb at google.com Thu Apr 18 19:02:55 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 18 Apr 2019 12:02:55 -0700 Subject: To align, or not to align? In-Reply-To: References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: Gonna requote it: "When I open a block, I increase [indentation level] by N. When I close a block, I decrease it by N. Continuation line, maybe +2N.
I move in and out based on what's happening locally. However, I have no care at all for what the current absolute value of that indentation is. Maybe it's 10, maybe it's 14, whatever; that value is irrelevant to me, it simply emerges from how nested I happen to be." I'd like to know if anyone is disputing that this is indeed how developers look at indentation. Because if it is, then it's a very general property, which no specific local language feature has ever violated before, and I think this is how we should look at it. On Thu, Apr 18, 2019 at 12:00 PM Kevin Bourrillion wrote: > On Thu, Apr 18, 2019 at 11:32 AM Brian Goetz > wrote: > > So I think the question really comes down to: what _is_ a multi-line >> string literal. >> > > I think that question is so abstract and philosophical as to not be > useful. I sent the previous message because I believe it is a better way to > frame the issue. > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
From brian.goetz at oracle.com Thu Apr 18 19:06:00 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Apr 2019 15:06:00 -0400 Subject: To align, or not to align? In-Reply-To: References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: I think you may have misunderstood my framing of the question. I don't want to have a philosophical debate; I'm saying that we're _already_ in elephant-land, in that different parties are coming with their own implicit, fuzzy ideas of what a string literal is (or is supposed to be). And by making the potential assumptions explicit, we have a better chance of getting past arguments that involve things like "but an elephant is like a tree."
On 4/18/2019 3:00 PM, Kevin Bourrillion wrote: > On Thu, Apr 18, 2019 at 11:32 AM Brian Goetz > wrote: > > So I think the question really comes down to: what _is_ a > multi-line string literal. > > > I think that question is so abstract and philosophical as to not be > useful. I sent the previous message because I believe it is a better > way to frame the issue. > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com >
From guy.steele at oracle.com Thu Apr 18 19:05:22 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 18 Apr 2019 15:05:22 -0400 Subject: To align, or not to align? In-Reply-To: References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> Message-ID: <1193AEB9-6678-43E5-AAE9-9399872D082F@oracle.com> > On Apr 18, 2019, at 1:30 PM, Kevin Bourrillion wrote: > . . . > Interesting! I think this is exactly opposite to what's really going on. > > Here's how people think of their program indentation. When I open a block, I increase it by N. When I close a block, I decrease it by N. Continuation line, maybe +2N. I move in and out based on what's happening locally. However, I have no care at all for what the current absolute value of that indentation is. Maybe it's 10, maybe it's 14, whatever; that value is irrelevant to me, it simply emerges from how nested I happen to be. > > Indentation stripping is precisely what *preserves* that independence. I agree with this analysis. Much of what we are debating is how best to define and support such indentation stripping (if at all), and what controls the user should have, and how "readable" and "intuitive" those controls are.
From kevinb at google.com Thu Apr 18 19:13:21 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 18 Apr 2019 12:13:21 -0700 Subject: To align, or not to align?
In-Reply-To: References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: Sorry for being eternally confused, but, going with your framing, then I would say *of course* Java source code constructs are embedded spatially in Java source files. And, as I've tried to highlight, their *absolute* vertical and horizontal positions are not properties relating to the constructs themselves, but are both artifacts of what came before them. On Thu, Apr 18, 2019 at 12:06 PM Brian Goetz wrote: > I think you may have misunderstood my framing of the question. I don't > want to have a philosophical debate; I'm saying, that we're _already_ in > elephant-land, in that different parties are coming with their own > implicit, fuzzy ideas of what a string literal is (or is supposed to be). > And by making the potential assumptions explicit, we have a better chance > of getting past arguments that involve things like "but an elephant is like > a tree." > > > > On 4/18/2019 3:00 PM, Kevin Bourrillion wrote: > > On Thu, Apr 18, 2019 at 11:32 AM Brian Goetz > wrote: > > So I think the question really comes down to: what _is_ a multi-line >> string literal. >> > > I think that question is so abstract and philosophical as to not be > useful. I sent the previous message because I believe it is a better way to > frame the issue. > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From john.r.rose at oracle.com Thu Apr 18 19:31:13 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 12:31:13 -0700 Subject: To align, or not to align? 
In-Reply-To: <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: On Apr 18, 2019, at 11:32 AM, Brian Goetz wrote: > > So I think the question really comes down to: what _is_ a multi-line string literal. As an aficionado of philosophy, I'll take a stab at this. You can judge whether it's useful or not. A string literal is a *convenient* and *natural* programmatic notation for a constant string payload. A multi-line string literal conveniently and naturally is a notation for a constant string payload of multiple lines. *Convenient* means easy to read, maintain (re-write), and originate (write), with IDE support or without. *Natural* means most or all of the payload shows up visually in the program source, as if it had simply been pasted there. Base64 would be highly unnatural because it obscures all payload characters. Also unnatural (but limited in impact) are \n and + notations. Extra contextual indents are slightly unnatural but tolerable because they are easy to disregard using a rectangle rule. (By the way, a *notation* is a way to visually encode programmer intentions. It must be unambiguous. Overly terse or overly convenient notation designs sometimes smuggle in ambiguities, so unambiguity can't always be taken for granted.) For multi-line payloads, a *natural* notation will tend to put each line of the payload on its own line in the source code of the literal. And a *convenient* notation will make clear all distinctions between payload and enclosing program structure, including any extra indentation (imposed by enclosing context) on the payload lines. "Clear distinction" cuts two ways: We need enough delimiters to visually separate the payload from context, but if we have too many delimiters the notation becomes hard to read and makes the payload look confusing (less natural). 
A rectangle rule could be part of a sweet spot in the design space, since it naturally respects both the 2D format of the program and the 2D format of the payload. In this framing of the problem, we could turn the design knob towards more visually explicit 2D framing of the payload, by somehow adding a delimiter or escape which marks the boundary between the enclosing indentation and the payload indentation. For example (this is just an example) the white space at the *very beginning* of a literal could accept an extra escape of some sort which signals the transition between the enclosing source and the payload. Such extra syntax would be noisy and harder to write, but it would (as extra syntax tends to do) reduce ambiguity about the programmer's wishes. Treading on the very edge of syntax design, but refusing to jump all the way in, I'll suggest that the northwest corner of the rectangle could be marked with a "blob" of explicit syntax: String mls = """ __NWC_BLOB__ xx xx yy yy """; assert mls == " xx xx\n y yy\n"; Or, a left margin blob could mark the whole western edge: assert mls == """ __WWE_BLOB__ xx xx __WWE_BLOB__ y yy """; I think it's hard to make these be more convenient (readable, editable) than Jim and Brian's rules for stripping. They definitely have an "opt-in" feel to them, because of their extra overheads. But maybe allowing a single whitespace character to be escaped would somehow assist the user in distinguishing payload from non-payload. I can think of several different ways to formulate such a rule, but that's going down into syntax again. HTH -- John From john.r.rose at oracle.com Thu Apr 18 19:35:35 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 12:35:35 -0700 Subject: To align, or not to align?
In-Reply-To: References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: On Apr 18, 2019, at 12:02 PM, Kevin Bourrillion wrote: > > Gonna requote it: > > "When I open a block, I increase [indentation level] by N. When I close a block, I decrease it by N. Continuation line, maybe +2N. I move in and out based on what's happening locally. However, I have no care at all for what the current absolute value of that indentation is. Maybe it's 10, maybe it's 14, whatever; that value is irrelevant to me, it simply emerges from how nested I happen to be." > > I'd like to know if anyone is disputing that this is indeed how developers look at indentation. No dispute from me! This is what I wanted to support by mentioning the "rectangle rule" in the previous message. Source code is read in two dimensions. (The vertical alignment dimension is weakened if you are so unfortunate as to be viewing program source in a variable-width font. Which is why 2D code examples in E-mail are tricky, and programmers like fixed-width fonts.) From john.r.rose at oracle.com Thu Apr 18 20:59:53 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 13:59:53 -0700 Subject: To align, or not to align? In-Reply-To: References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: <40AA2C7D-7615-47D4-B78E-D19B2AF22F3E@oracle.com> On Apr 18, 2019, at 12:31 PM, John Rose wrote: > > I think it's hard to make these be more convenient (readable, > editable) than Jim and Brian's rules for stripping. They definitely > have an "opt-in" feel to them, because of their extra overheads. To clarify this paragraph: I mean that a syntax like the __NWC_BLOB__ has an "opt-in" feel. Jim and Brian's rules for stripping do *not* have such a strong "opt-in" feel, and they have less overhead. With less overhead comes more ambiguity; that's always the trade-off.
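For readers who want the stripping rule in concrete form, the following is a minimal Java sketch of "remove the largest amount of leading whitespace shared by every non-blank line". It is an illustration under stated assumptions, not the prototype's algorithm: the names IndentStripper and stripCommonIndent are hypothetical, a tab counts as one character, blank lines do not constrain the margin, and the position of the closing delimiter is ignored.

```java
// Hypothetical sketch of a "rectangle rule" stripper; not the prototype's code.
final class IndentStripper {

    // Counts leading spaces and tabs (assumption: a tab is one column).
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    // Strips the same amount of leading whitespace from every line:
    // the minimum found on any non-blank line.
    static String stripCommonIndent(String body) {
        String[] lines = body.split("\n", -1); // -1 keeps a trailing empty line
        int margin = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines do not constrain the margin
            }
            margin = Math.min(margin, leadingWhitespace(line));
        }
        if (margin == Integer.MAX_VALUE) {
            margin = 0; // every line was blank
        }
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            String line = lines[i];
            // Blank lines become empty; other lines lose exactly 'margin' characters.
            out.append(line.trim().isEmpty() ? "" : line.substring(margin));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String stripped = stripCommonIndent("    hello\n      world\n");
        System.out.print(stripped);
    }
}
```

Note that relative indentation inside the rectangle survives: only the margin common to all lines is treated as incidental.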
But the stripping rule which says "strip the most whitespace possible, but equally from all lines" makes for a notation that is very convenient and natural, perhaps enough so to be "opt-out" instead of "opt-in". I suggest that an easy "opt-out" for stripping would be to throw in an extra escape somewhere that disables the stripping machinery, either partially or totally. The *simplest* way to do this is to replace the first payload space on the first line with the escape \040, which denotes an ASCII space, but cannot be confused with stripped indentation. (Or can it? It shouldn't, although I suspect it does in the present implementation. Let's check the spec to make sure. And what about \u0020??? Yikes. That's a real space under a unicode escape, which is processed differently from an octal escape.) Another easy "opt-out" is to transfer the first line (with retained indentation) to the line containing the open-quote. Another not-so-easy "opt-out" would be allowing the sequence BACKSLASH NEWLINE to be elided, as /bin/sh does. This is not in Java. If added, it would create (a) a way to break very long lines without resorting to "+" operators, and (b) a way to opt out of stripping, by placing BACKSLASH NEWLINE immediately after the open-quote. And so on... there are lots of ways to skin the cat if we allow ourselves to give him new appendages. (And consider a toroidal cat...) -- 
John From john.r.rose at oracle.com Thu Apr 18 23:33:25 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 16:33:25 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> <39408A04-43D4-4E1F-A51B-B47F9E933CD8@oracle.com> Message-ID: <057EC0DF-EEBA-49A0-9AA9-5B29FFE35C22@oracle.com> On Apr 18, 2019, at 10:08 AM, Liam Miller-Cushon wrote: > > Putting non-blank text on that first line is effectively an opt-out +1 I think that would be a reasonable design move. It's a bit ugly but it's learnable. (Putting in \040 for a not-to-be-stripped space is also ugly but learnable.) From john.r.rose at oracle.com Thu Apr 18 23:39:45 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 16:39:45 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <337F1C29-2223-4EB2-A083-483802F6434C@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> <337F1C29-2223-4EB2-A083-483802F6434C@oracle.com> Message-ID: <82F8EAC5-379E-4814-8F83-EE8E4A0D2259@oracle.com> On Apr 18, 2019, at 5:43 AM, Jim Laskey wrote: > > What kills the concept is that ODI is not reproducible in a library method. That also impacts the concept of using escapes to opt out, since the library method cannot see escapes, while the compiler can see them. I suggest the following rule: If the string contains quoted whitespace (especially \n and \040, maybe others), then the compiler does *not* call the library routine to strip leading spaces. Thus, visibly taking control of at least one element of whitespace in the body of the string constitutes an opt-out from any edits to the remaining whitespace.
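The suggested rule can be sketched as a check the compiler would run on the raw (not yet escape-processed) body of the literal. This is only an illustration: StripOptOut, hasQuotedWhitespace and maybeStrip are hypothetical names, only \n and \040 are recognized (the "maybe others" are left out), and the stripping routine is passed in as a parameter rather than assumed.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch of the "quoted whitespace opts out of stripping" rule.
final class StripOptOut {

    // Scans the raw source text of the literal body for the escapes \n or \040.
    static boolean hasQuotedWhitespace(String raw) {
        for (int i = 0; i < raw.length() - 1; i++) {
            if (raw.charAt(i) != '\\') {
                continue;
            }
            if (raw.charAt(i + 1) == 'n' || raw.startsWith("040", i + 1)) {
                return true;
            }
            // Skip the escaped character, so that an escaped backslash
            // followed by 'n' is not misread as the escape \n.
            i++;
        }
        return false;
    }

    // Calls the library stripper only when the author has not taken
    // visible control of any whitespace in the body.
    static String maybeStrip(String raw, UnaryOperator<String> stripper) {
        return hasQuotedWhitespace(raw) ? raw : stripper.apply(raw);
    }

    public static void main(String[] args) {
        // The Java literal "  a\\040b" is the raw source text:   a\040b
        System.out.println(hasQuotedWhitespace("  a\\040b"));
    }
}
```

Because the check sees raw source text, a real newline character in the body does not count as an opt-out; only the escape sequence does, which is the distinction a library method alone cannot make.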
Thus, *if* javac hands the string to the library method, we already know that the library method won't be meddling with escaped spaces or newlines. From john.r.rose at oracle.com Thu Apr 18 23:41:22 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 16:41:22 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: <49E56598-F9A5-4915-BA59-B2C9D21025F7@oracle.com> On Apr 18, 2019, at 5:24 AM, Jim Laskey wrote: > > John suggested some modifications which led to a Swiss Army trimMargin which could handle everything you suggest and much more. In the end, what changed our minds (Hmmm, could have been Loki the Trickster taking on my appearance. Blowing up a design process by over-generalizing is a Loki move.) From john.r.rose at oracle.com Thu Apr 18 23:55:22 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 16:55:22 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: On Apr 17, 2019, at 1:07 PM, Guy Steele wrote: > String s = """
> " "I am > " a block of text" > """; > > You have choices. I'm not entirely sure I like such a design, but I'm putting it out there for contemplation. Using the single quotes along the left margin looks intuitive to me, more intuitive than putting in a separate margin character (like '|'). Would it be an error if the left-margin quotes were indented at different levels? Or would it be a feature (to be used sparingly if ever)? Same question if the left-margin quotes are indented differently from the final close-quote. Also, is a left-margin quote allowed immediately before the close-quote? Is it required? If allowed and not required, this basically means such a string can end with either three or four close-quotes. If we were to allow this noisy-and-explicit marker for the left margin of the ML string, would we *also* want the less-noisy-and-less-explicit mechanism of trimming as much leading space as possible (the Jim and Brian alignment)? Would having two ways to express the same thing be stylistic freedom, or just the awkward product of a committee design? If the latter, which of the two designs do we prefer, slick and ambiguous or clear and crusty? -- 
John From john.r.rose at oracle.com Fri Apr 19 00:04:09 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 17:04:09 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <8a6d63dc3257473cbd185f250e0ca1b813aacbc9.camel@vasylenko.uk> Message-ID: <6DAFDF5F-C934-4A10-95E3-16A2C72CDEA8@oracle.com> I think your comments below are mostly applicable to Guy's lightweight "box of quotes" proposal, where the open and close quotes are small (""") and single quotes appear (like stars in a comment) along the left margin. The "blast radius" for making edits is much smaller for the lightweight proposal, since the open and close quotes are invariant across many more edits. One reason people hate those extra stars in comments is that they are using an editor (like Emacs) which is pretty good at generic indentation, but doesn't know about language-specific left margin markup, like stars or (in Guy's second proposal) single quotes. On Apr 17, 2019, at 11:22 AM, Kevin Bourrillion wrote: > > I don't even feel the proposals are equivalent. We've already had ample opportunity to discover that boxes of stars around *comments* are a pain and almost no one wants to use them. > > Talking about the IDE doesn't paint the complete picture. Even if it maintains the width of that box for you as your needs change, the fact that this creates large diffs for small changes -- a blast radius -- is still bad for code reviews, bad for merges, and so on.
> > If we really feel that users will find it too hard to understand where the rectangle is, that would be different, but I really don't think this is going to be hard. > > > And if we do care about fallback behaviour without IDE support, even > just a little, it becomes a choice between: > > - Box-of-quotes: Easy to read, horrible to edit. > > - IDE-highlighted-box: Possibly tricky to read, easy to edit. > From john.r.rose at oracle.com Fri Apr 19 00:29:29 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Apr 2019 17:29:29 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> Message-ID: <1B798D58-0820-4D72-8BC2-3E0F85C358B2@oracle.com> On Apr 16, 2019, at 4:03 PM, Liam Miller-Cushon wrote: > > One disadvantage is the handling of the trailing newline. Requiring the closing delimiter to be on its own line means there's always a trailing newline in source. If we want to allow expressing multi-line strings that don't have a trailing newline we could automatically trim one trailing newline character, but then it would be necessary to leave an extra blank line after multi-line strings in cases where a trailing newline is actually desired. I'm not sure it would be nicer, but here's a move to ponder: If the trailing newline is always stripped, allow an explicit \n (or maybe any number of them?) before the close-quote to opt out of the stripping. String message = """ hello world \n"""; Actual: hello\n world\n It seems to me, though, that stripping the last \n by default tends to create more surprising use cases. Retaining a final \n is usually less surprising, AFAICS. This puts the burden back on the uncommon stripping case. We might want a way to say "delete that last newline, even though I used it to make my pretty box". 
One way to do this would be a new escape sequence for ML strings only: String message = """ hello world\ """; Actual: hello\n world But if we don't care so much about brace-style layout, this works just as well, without a new escape sequence: String message = """ hello world"""; Actual: hello\n world From guy.steele at oracle.com Fri Apr 19 01:12:18 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 18 Apr 2019 21:12:18 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <6DAFDF5F-C934-4A10-95E3-16A2C72CDEA8@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <8a6d63dc3257473cbd185f250e0ca1b813aacbc9.camel@vasylenko.uk> <6DAFDF5F-C934-4A10-95E3-16A2C72CDEA8@oracle.com> Message-ID: <22BA8E3E-4B47-4DC6-9FAC-5D3F719FBD0E@oracle.com> > On Apr 18, 2019, at 8:04 PM, John Rose wrote: > . . . > One reason people hate those extra stars in comments is > that they are using an editor (like Emacs) which is pretty > good at generic indentation, but doesn't know about > language-specific left margin markup, like stars or > (in Guy's second proposal) single quotes. Actually, Emacs in Java mode does know quite a bit about doc comments. If you are in the middle of a doc comment: (1) If you open up a blank line and type *, the indentation of the star automatically matches that of the leading star on the preceding line. (2) If you're within a doc comment and type Meta-J to insert a line break (a routine thing to do in Emacs), a correctly indented star (and a space) are supplied on the new line after the break.
(3) If you use Meta-Q to fill (that is, rejustify) a paragraph, all the right things happen to maintain the stars even as the line breaks and number of lines in the paragraph change. I do this all the time. (4) If you set these variables as follows: comment-style extra-line comment-start "/* " comment-end " */" then the command Meta-; will create a new doc comment, properly indented stars and all, around the text you have selected. (If you don't set those variables as indicated, the default for Java is to use // comments instead.) From guy.steele at oracle.com Fri Apr 19 01:23:56 2019 From: guy.steele at oracle.com (Guy Steele) Date: Thu, 18 Apr 2019 21:23:56 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <784292B1-5C5A-464A-97BE-3CA2F26FB521@oracle.com> <50E1D872-8E5F-4CA0-85BC-0F0C2411D74B@oracle.com> <97501bc0-c18a-53e8-bf49-f90187a89bfb@oracle.com> <694C93FB-3CB9-4A8D-8153-86E7F9F95194@oracle.com> <3C2FF400-8082-4B34-A118-EF3D507C1D86@oracle.com> <0E024C32-8682-4972-B356-57C24CAD444E@oracle.com> <2F003D6A-4E83-46D3-A116-9D1DD9FBEFE4@oracle.com> Message-ID: > On Apr 18, 2019, at 7:55 PM, John Rose wrote: > . . . > Also, is a left-margin quote allowed immediately before the > close-quote? Is it required? If allowed and not required, > this basically means such a string can end with either three > or four close-quotes. I would say "not allowed". > On Apr 18, 2019, at 8:29 PM, John Rose wrote: > . . . > This puts the burden back on the uncommon stripping > case. We might want a way to say "delete that last newline, > even though I used it to make my pretty box".
One way > to do this would be a new escape sequence for ML strings > only: > > String message = """ > hello > world\ > """; Consider an escape sequence (I'll use \@ as the example) that has the property that it contributes nothing to the content of the string, and furthermore cancels all non-escape whitespace (including newlines) on both sides. To strip the last newline: String message = """ hello world \@"""; To omit all newlines and just run lines together with one space in between: String message = """ \@\ hello \@\ world \@"""; To omit all newlines and just run lines together with one space in between and one newline at the end: String message = """ \@\ hello \@\ world \@\n"""; From james.laskey at oracle.com Fri Apr 19 13:41:40 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Fri, 19 Apr 2019 10:41:40 -0300 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> <39408A04-43D4-4E1F-A51B-B47F9E933CD8@oracle.com> Message-ID: Not sure I understand the issue here. The first line has no influence and is not affected by the indentation provided by the close delimiter influence. The example is a bit groggy. Maybe more interesting is that both of the following work as an alternative pattern: String m = """I won't get any alignment (except maybe NL normalization) """; Result: I won't\n get any alignment\n (except maybe NL normalization)\n String n = """I won't get any alignment (except maybe NL normalization)"""; Result: I won't\n get any alignment\n (except maybe NL normalization) > On Apr 18, 2019, at 2:08 PM, Liam Miller-Cushon wrote: > > On Thu, Apr 18, 2019 at 8:52 AM Brian Goetz > wrote: >> I still find the restriction appealing for the opening delimiter, though.
The argument is that having contents on the opening line seems likely to cause confusion, e.g.: >> >> String m = """ +--------+ >> | text | >> +--------+"""; >> >> Result of variable m under the current string-tapas prototype: >> >> ....+--------+\n >> |..text..|\n >> +--------+ > > I think this is a restriction that is much more suitable to a _style guide_ than the language. Yes, users can get it wrong, but they'll learn quickly. And, sometimes putting text on that first line is exactly what you want, such as in the case where you _don't_ want alignment to muck with your indentation. Putting non-blank text on that first line is effectively an opt-out: > > String m = """I won't > get any alignment > (except maybe NL normalization) > > """; > > I'm not sure that matches the behaviour of the current prototype, it doesn't seem to be considering the first line: > > String m = """I won't > get any alignment > (except maybe NL normalization) > > """; > > Result of variable m > I.won't\n > ....get.any.alignment\n > ....(except.maybe.NL.normalization)\n > \n > From cushon at google.com Fri Apr 19 18:09:47 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Fri, 19 Apr 2019 11:09:47 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> <39408A04-43D4-4E1F-A51B-B47F9E933CD8@oracle.com> Message-ID: On Fri, Apr 19, 2019 at 6:41 AM Jim Laskey wrote: > The first line has no influence and is not affected by the indentation > provided by the close delimiter influence. > Sorry, I may have gotten confused. That seems to be a different approach than the one Brian described in: sometimes putting text on that first line is exactly what you want, such as > in the case where you _don't_ want alignment to muck with your indentation.
> Putting non-blank text on that first line is effectively an opt-out I still see room for confusion in either approach to handling the first line, and disallowing string contents on the same line as the opening delimiter has the potential to file a sharp edge off the feature. As Brian mentioned that restriction could also be accomplished with a style rule, but I don't see much downside to including the restriction in the language. Maybe more interesting is that both of the following work as an alternative > pattern: String m = """I won't get any alignment (except maybe NL normalization) """; My concern is that you can't tell from reading the code whether that works because the implementation ignores the first line, or because it uses the column position for re-indentation, so you might expect it to be equivalent to: String m = """ I won't get any alignment (except maybe NL normalization) """; From guy.steele at oracle.com Fri Apr 19 18:16:09 2019 From: guy.steele at oracle.com (Guy Steele) Date: Fri, 19 Apr 2019 14:16:09 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <431BC862-12BE-4BBC-9B09-0C2206DFCB55@oracle.com> <39408A04-43D4-4E1F-A51B-B47F9E933CD8@oracle.com> Message-ID: > On Apr 19, 2019, at 2:09 PM, Liam Miller-Cushon wrote: > > . . . > Maybe more interesting is that both of the following work as an alternative pattern: > > String m = """I won't > get any alignment > (except maybe NL normalization) > """; > > My concern is that you can't tell from reading the code whether that works because the implementation ignores the first line, or because it uses the column position for re-indentation, so you might expect it to be equivalent to: > > String m = """ I won't > get any alignment > (except maybe NL normalization) > """; Well, that's right. I would regard the second example as misleading.
I don't think we can make the rules idiot-proof against such misleading examples. I do think it's a reasonable goal that the intent of a properly written string literal should be fairly clear even to a reader who does not know the precise rules. And it's not unreasonable to expect that when you're writing code, you need to know the rules about how things work. That said, we do want the rules to be simple and easy to remember. From alex.buckley at oracle.com Sat Apr 20 00:16:31 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Fri, 19 Apr 2019 17:16:31 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> Message-ID: <5CBA64DF.305@oracle.com> On 4/10/2019 8:22 AM, Jim Laskey wrote: > Line terminators: When strings span lines, they do so using the line > terminators present in the source file, which may vary depending on what > operating system the file was authored. Should this be an aspect of > multi-line-ness, or should we normalize these to a standard line > terminator? It seems a little weird to treat string literals quite so > literally; the choice of line terminator is surely an incidental one. I > think we're all comfortable saying "these should be normalized", but it's > worth bringing this up because it is merely one way in which incidental > artifacts of how the string is embedded in the source program force us > to interpret what the user meant. No-one has commented on this, but it's important because some libraries are going to be surprised by the presence of line terminators, of any kind, in strings denoted by multi-line string literals. To be clear, I agree with normalizing line terminators. And, I understand that any string could have contained line terminators thanks to escape sequences in traditional string literals.
But, it was not common to see a \n except where multi-line-ness was expected or harmless. Going forward, who can guarantee that refactoring the argument of `prepareStatement` from a sequence of concatenations: try (PreparedStatement s = connection.prepareStatement( "SELECT * " + "FROM my_table " + "WHERE a = b " )) { ... } to a multi-line string literal: try (PreparedStatement s = connection.prepareStatement( """SELECT * FROM my_table WHERE a = b""" )) { ... } is behaviorally compatible for `prepareStatement`? It had no reason to expect \n in its string argument before. (Hat tip: https://blog.jooq.org/2015/12/29/please-java-do-finally-support-multiline-strings/) Maybe `prepareStatement` will work fine. But someone somewhere is going to take a program with a sequence of 2000 concatenations and turn them into a huge multi-line string literal, and the inserted line terminators are going to cause memory pressure, and GC is going to take a little longer, and eventually this bug will be filed: "My system runs 5% slower because the source code changed a teeny tiny bit." In reality, a few libraries will need fixing, and that will happen quickly because developers are very keen to use multi-line string literals. But it's fair to point out that while everyone is worrying about whitespace on the left of the literal, the line terminators to the right are a novel artifact too. Alex From alex.buckley at oracle.com Sat Apr 20 00:53:26 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Fri, 19 Apr 2019 17:53:26 -0700 Subject: To align, or not to align? 
In-Reply-To: <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> References: <2b3f00c2-6557-c998-f910-33d2f58110e8@oracle.com> <02bbe97e-5c78-486d-c9b2-f023ec7ab400@oracle.com> Message-ID: <5CBA6D86.1020002@oracle.com> On 4/18/2019 11:32 AM, Brian Goetz wrote: > One view is that a string literal is the sequence of characters between > the delimiters, and a multi-line string literal is just a string literal > that happens to be able to span lines. This is also the simplest > extension of existing string literals to multi-line; adding only the > ability to span lines. In this view, implicit alignment can feel like > conflating two things. > > An alternate view is that a multi-line string is a literal that is > embedded spatially in the Java source code; therefore it inherently has > some 2D structure to it, which gives us permission to muck with it in > certain ways that are consistent with that structure. ... > So I think the question really comes down to: what _is_ a multi-line > string literal. I have a lot of time for the "alternate" view. Multi-line string literals are not meant to be raw; some inference about the developer's intent for the sea of whitespace on the left is fine (such as, "the developer is not interested in it at all"). I do, however, think that a box-of-quotes (or even a lighter-weight marker for margins) makes the 2D denotation of a string overwhelming. Alex From guy.steele at oracle.com Sat Apr 20 02:42:54 2019 From: guy.steele at oracle.com (Guy Steele) Date: Fri, 19 Apr 2019 22:42:54 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <5CBA64DF.305@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> Message-ID: <65DE2FE7-446F-4FD5-A378-FB7792E2C533@oracle.com> So is your point that multiline string literals may be an "attractive nuisance"
in that they may make it too convenient for inattentive programmers to perform _incorrect_ refactoring? > On Apr 19, 2019, at 8:16 PM, Alex Buckley wrote: > >> On 4/10/2019 8:22 AM, Jim Laskey wrote: >> Line terminators: When strings span lines, they do so using the line >> terminators present in the source file, which may vary depending on what >> operating system the file was authored. Should this be an aspect of >> multi-line-ness, or should we normalize these to a standard line >> terminator? It seems a little weird to treat string literals quite so >> literally; the choice of line terminator is surely an incidental one. I >> think we're all comfortable saying "these should be normalized", but it's >> worth bringing this up because it is merely one way in which incidental >> artifacts of how the string is embedded in the source program force us >> to interpret what the user meant. > > No-one has commented on this, but it's important because some libraries are going to be surprised by the presence of line terminators, of any kind, in strings denoted by multi-line string literals. > > To be clear, I agree with normalizing line terminators. And, I understand that any string could have contained line terminators thanks to escape sequences in traditional string literals. But, it was not common to see a \n except where multi-line-ness was expected or harmless. Going forward, who can guarantee that refactoring the argument of `prepareStatement` from a sequence of concatenations: > > try (PreparedStatement s = connection.prepareStatement( > "SELECT * " > + "FROM my_table " > + "WHERE a = b " > )) { > ... > } > > to a multi-line string literal: > > try (PreparedStatement s = connection.prepareStatement( > """SELECT * > FROM my_table > WHERE a = b""" > )) { > ... > } > > is behaviorally compatible for `prepareStatement`? It had no reason to expect \n in its string argument before.
> > (Hat tip: https://blog.jooq.org/2015/12/29/please-java-do-finally-support-multiline-strings/) > > Maybe `prepareStatement` will work fine. But someone somewhere is going to take a program with a sequence of 2000 concatenations and turn them into a huge multi-line string literal, and the inserted line terminators are going to cause memory pressure, and GC is going to take a little longer, and eventually this bug will be filed: "My system runs 5% slower because the source code changed a teeny tiny bit." > > In reality, a few libraries will need fixing, and that will happen quickly because developers are very keen to use multi-line string literals. But it's fair to point out that while everyone is worrying about whitespace on the left of the literal, the line terminators to the right are a novel artifact too. > > Alex From john.r.rose at oracle.com Sat Apr 20 04:43:33 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 19 Apr 2019 21:43:33 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <5CBA64DF.305@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> Message-ID: <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> That's an interesting use case: The original string expression is multi-line, but all of the line terminators are in the envelope, not the payload. If I were writing this as a single literal in Bourne shell I might consider using a formulation that breaks the long line into several smaller lines: PREP_STMT="\ SELECT * \ FROM my_table \ WHERE a = b \ " Supporting that in Java MLS would require a notation (I won't speculate what ATM) to mark newlines in the MLS as "syntax only, not payload". (By payload I mean the actual characters requested in the resulting string. By syntax I mean the program source which denotes the literal making the request for a string.)
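To illustrate what a "syntax only, not payload" newline could mean in practice, here is a minimal Java sketch that borrows the Bourne shell convention quoted above. The choice of BACKSLASH NEWLINE as the marker is an assumption made purely for illustration (the message deliberately does not pick a notation), and LineContinuation and elideContinuations are hypothetical names.

```java
// Hypothetical sketch: treat BACKSLASH NEWLINE in the raw literal body as
// a syntax-only newline, as /bin/sh does. Not a proposal from the thread.
final class LineContinuation {

    // Removes every BACKSLASH NEWLINE pair from the raw body; all other
    // escape sequences are copied through untouched for later processing.
    static String elideContinuations(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next != '\n') {
                    // Some other escape (including an escaped backslash):
                    // keep both characters so they are not re-examined.
                    out.append(c).append(next);
                }
                // For BACKSLASH NEWLINE, both characters are dropped.
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw body corresponding to the shell example: each source line
        // ends with a backslash immediately followed by a newline.
        String raw = "SELECT * \\\nFROM my_table \\\nWHERE a = b \\\n";
        System.out.println(elideContinuations(raw));
    }
}
```

Under this assumed notation, the multi-line form denotes the same single-line string as the original sequence of concatenations, so a callee such as `prepareStatement` would see no new \n characters.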
By making a distinction between payload and syntax, for newlines in a MLS, we have the following possibilities: syntax element | meaning LineTerminator | newline is both payload and (post-normalized) syntax \ n | newline is payload only, not syntax (syntax is an esc. seq.) TBD? | newline is syntax only, not payload (newline is suppressed somehow) The TBD category is currently occupied by the strippable newlines just after the open-quote and just before the close-quote. Might there be a use for a strippable newline in the middle of the MLS? It would cover Alex's use case. Maybe Alex was just describing an attractive nuisance, not a real use case. But I know that breaking long lines into shorter ones is something that programmers do all the time. MLS is sort of the opposite: Joining short strings into long multi-line ones. Yet both tasks involve fine control of program layout. So it's not an accident that MLS leads us to discuss syntax-not-payload newlines, even if the main driver for MLS is to introduce both-syntax-and-payload newlines. The main thing Brian is waiting for, though, is not lots of new ideas, but rather a consensus that (a) we can treat leading whitespace outside of a given rectangle as syntax-not-payload (thus stripped), and (b) that we should provide a way for programmers to opt out of the stripping (making all space into syntax-and-payload). It feels to me like we have arrived there and are driving around the parking lot, checking out all the parking spots, worrying that we will miss the best one. ? John On Apr 19, 2019, at 5:16 PM, Alex Buckley wrote: > > Going forward, who can guarantee that refactoring the argument of `prepareStatement` from a sequence of concatenations: > > try (PreparedStatement s = connection.prepareStatement( > "SELECT * " > + "FROM my_table " > + "WHERE a = b " > )) { > ... 
> }
>
> to a multi-line string literal:
>
> try (PreparedStatement s = connection.prepareStatement(
>     """SELECT *
>        FROM my_table
>        WHERE a = b"""
> )) {
>     ...
> }
>
> is behaviorally compatible for `prepareStatement`? It had no reason to expect \n in its string argument before.
>
> (

From brian.goetz at oracle.com Mon Apr 22 13:15:19 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Apr 2019 09:15:19 -0400
Subject: Wrapping up the first two courses
In-Reply-To: <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com>
Message-ID: <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com>

> The main thing Brian is waiting for, though, is not lots of new ideas,
> but rather a consensus that (a) we can treat leading whitespace outside
> of a given rectangle as syntax-not-payload (thus stripped), and (b) that
> we should provide a way for programmers to opt out of the stripping
> (making all space into syntax-and-payload). It feels to me like we
> have arrived there and are driving around the parking lot, checking
> out all the parking spots, worrying that we will miss the best one.

Glad to hear it :)

So, I posit, we have consensus over the following things:

 - Multi-line strings are a useful feature on their own
 - Using "fat" delimiters for multi-line strings is practical and intuitive
 - Multi-line string literals share the same escape language as single-line string literals
 - Newlines in MLSLs should be normalized to \n
 - There exists a reasonable alignment algorithm, which users can learn easily enough, and can be captured as a library method on String (some finer points to be hammered out)
 - To the extent the language performs alignment, it should be consistent with what the library-based version does, so that users can opt out and opt back in again
 - In the common case, a MLSL will be a combination of some intended and some incidental indentation, and it is reasonable for the default to be that the language attempts to normalize away the incidental indentation
 - There needs to be an opt-out, for the cases where alignment is not the default the user wants

(A useful way to frame the discussion we had regarding linguistic alignment is: whether a string literal is "one dimensional" or "two dimensional." The 1D interpretation says a string literal is just a sequence of characters between two delimiters; the 2D interpretation says that it has an inherent line structure that could be manipulated directly.)

What I like about this proposal (much more than with the previous round) is that the two flavors of string literal (thin and fat) are clearly projections of the same feature, and their differences pertain solely to their essential difference: multi-line-ness.

I will leave it to Jim to summarize the current state of the alignment algorithm, and any open questions (e.g., closing delimiter influence, treatment of single-line strings, etc) that may still be lingering, but these are not blockers to placing our order for the first two courses.

I am still having a hard time getting comfortable with Guy's proposal to use more "envelope" here; I think others have expressed similar discomfort. If I had to put my finger on it, it is that being able to cut and paste in and out is such a big part of what is currently missing, and there is insufficient trust that there would be ubiquitous IDE support in all the various ways that people edit Java code. But given that this is framed as "let's carve out some extra envelope space", we can keep discussing this even as we move forward.

We still need to make some decisions on syntax; the main one that is currently relevant being opt-out. (For any syntax issues, please create another thread.) Jim hinted at this earlier: use an escape sequence that is stripped out of the string but means "no alignment." Something like:

    String s = """\-
    Leave me just the way
    you found me"""

Obviously there is room to argue over the specific escape sequence, so let's put this in the "open questions" bucket.

There was another proposal, which was to use a prefix character:

    String s = a"..." // opt into alignment
    String s = r"..." // raw string

I'd like to put this one to bed quickly, because I see it as having a number of issues.

Having a set of prefix characters is one of those features that starts off weak and scales badly from there :). With only two prefixes, as suggested above, it has a feel of overgeneralization, but with a large number of candidate prefixes, it gets worse, because invariably as such a feature gets more complicated, there are interactions. One need look only at a Perl regex that uses multiple modifiers:

    /foo*/egimosx

to realize that what started as a simple feature (I think initially just `g`) had grown out of control.

More importantly, of the two prefixes suggested, one doesn't really make sense. And that is: while the notion of "raw" string is attractive, one of the things that tripped us up the first time around is the belief that "raw" is a binary thing. In reality, raw-ness comes in degrees: how hard you have to work to break out of the "string of uninterpreted characters" mode. (Note: please let's not start a discussion on raw strings; we're wrapping up our orders for the first courses now. I raise this only to put to bed a syntax choice predicated on the assumption that raw-ness is a binary characteristic.)

If we're pursuing align-by-default, we should consider a different name for the align() method; the name was originally chosen as a compromise when there was no align-by-default, and most of the other names were too long to ask people to type routinely. If alignment is the default, the explicit name can be more descriptive.

So, next steps:

 - Jim to write up current details of alignment algorithm, with current open issues;
 - Remaining bike sheds on opt-out and naming of align()

Once 1/1a are in the pipe, we can consider whether we want to move ahead to raw strings.

From brian.goetz at oracle.com Mon Apr 22 15:23:57 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Apr 2019 11:23:57 -0400
Subject: Draft JEP on records and sealed types
Message-ID:

For review.
https://bugs.openjdk.java.net/browse/JDK-8222777

From james.laskey at oracle.com Mon Apr 22 15:26:52 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Mon, 22 Apr 2019 12:26:52 -0300
Subject: Wrapping up the first two courses
In-Reply-To: <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com>
Message-ID: <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com>

Current "strip incidentals" algorithm as captured in String::align (string-tapas branch)

    public String align(int n) {
        if (isEmpty()) {
            return "";
        }
        long count = lines().count();
        if (count == 1) {
            return strip();
        }
        int outdent = lines().skip(1)
                             .filter(not(String::isBlank))
                             .mapToInt(String::indexOfNonWhitespace)
                             .min()
                             .orElse(0);
        String last = lines().skip(count - 1).findFirst().orElse("");
        boolean lastIsBlank = last.isBlank();
        if (lastIsBlank) {
            outdent = Integer.min(outdent, last.length());
        }
        return indentStream(lines(1, 1), n - outdent).map(s -> s.stripTrailing())
            .collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : ""));
    }

1.  if (isEmpty()) {
        return "";
    }

Empty strings returned as empty.

    """""" ==> ""

2.  long count = lines().count();
    if (count == 1) {
        return strip();
    }

Single line strings (no line terminators) are simply stripped.

    """  single line  """ ==> "single line"

3.  int outdent = lines().skip(1) ...

Ignoring first line, determine least number of leading whitespaces for all non-blank lines.

    String s = """
    ................line 1..
    ....................line 2.
    """;  ==> 16

4.  boolean lastIsBlank = last.isBlank();

Detect if last line is blank.

5.  if (lastIsBlank) {
        outdent = Integer.min(outdent, last.length());
    }

If last line is blank, then check if it has least number of leading whitespaces.

    String s = """
    line 1
    line 2
    ............""";  ==> 12

* Breaking down the return statement

6.  Stream<String> stream1 = lines(1, 1);

Break string into a stream of lines, stripping line terminators, stripping first line if blank and stripping last line if blank.

    ................line 1..
    ....................line 2.

7.  Stream<String> stream2 = indentStream(stream1, n - outdent);

Remove indentation from each line in stream. It's possible that whitespace gets added if n is larger than outdent.

    ....line 1..
    ........line 2.

8.  Stream<String> stream3 = stream2.map(s -> s.stripTrailing());

Remove incidental trailing whitespace.

    ....line 1
    ........line 2

9.  return stream3.collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : ""));

Join lines with \n and add a \n at the end if the last line was blank.

    ....line 1\n
    ........line 2\n

Options:

2. a) Single line strings would be just stripLeading, but should be consistent with multi-line and stripTrailing.

    """  single line  """ ==> "single line  "

   b) We could do nothing for single line.

    """  single line  """ ==> "  single line  "

3. a) If we include open delimiter influence, the equivalent library method cannot duplicate the influence unless the user supplied the first line indentation.

5. a) If we omit close delimiter influence, only the content influences the indentation. Loss of control by the user.

    String s = """
    line 1
    line 2
    """;  ==>
    line 1\n
    ....line 2\n

6. a) Could strip all leading/trailing blank lines, but awkward to recover the LOI. Not recommending.

    String s = """
    line 1
    line 2
    """;  ==>
    String s = "\n".repeat(3) + """
    line 1
    line 2
    """ + "\n".repeat(3);

8. a) Not stripping trailing space might leave debris the user didn't expect, but still a choice.

9. a) Always add a last \n. Loss of control by the user.
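The steps above can be sketched as a standalone method that uses only public java.lang.String methods (Java 11). This is a sketch under stated assumptions, not the string-tapas code: the class name AlignSketch and the helper leadingWhitespace are hypothetical, leadingWhitespace stands in for String::indexOfNonWhitespace, and the behavior of lines(1, 1) and indentStream is inferred from the descriptions in steps 6 and 7.

```java
import java.util.List;
import java.util.stream.Collectors;

class AlignSketch {

    // Assumed stand-in for the branch-internal String::indexOfNonWhitespace:
    // counts the leading whitespace characters of a line.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    static String align(String s, int n) {
        if (s.isEmpty()) {                       // step 1
            return "";
        }
        List<String> lines = s.lines().collect(Collectors.toList());
        if (lines.size() == 1) {                 // step 2
            return s.strip();
        }
        int outdent = lines.stream().skip(1)     // step 3: ignore the first line
                .filter(line -> !line.isBlank())
                .mapToInt(AlignSketch::leadingWhitespace)
                .min()
                .orElse(0);
        String last = lines.get(lines.size() - 1);
        boolean lastIsBlank = last.isBlank();    // step 4
        if (lastIsBlank) {                       // step 5: close delimiter influence
            outdent = Math.min(outdent, last.length());
        }
        // Step 6 (assumed meaning of lines(1, 1)): drop a blank first and a blank last line.
        int from = lines.get(0).isBlank() ? 1 : 0;
        int to = lastIsBlank ? lines.size() - 1 : lines.size();
        // Step 7 (assumed meaning of indentStream): a negative shift removes leading
        // whitespace, a positive shift adds spaces.
        int shift = n - outdent;
        return lines.subList(from, to).stream()
                .map(line -> shift >= 0
                        ? " ".repeat(shift) + line
                        : line.substring(Math.min(-shift, leadingWhitespace(line))))
                .map(String::stripTrailing)      // step 8
                .collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : "")); // step 9
    }
}
```

With the content "\n    line 1  \n        line 2 \n    " and n = 0, this sketch returns "line 1\n    line 2\n": the common four-column margin and the trailing spaces are removed, and the blank last line contributes the final \n.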
From brian.goetz at oracle.com Mon Apr 22 16:04:35 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Apr 2019 12:04:35 -0400
Subject: Alignment algorithm (was: Wrapping up the first two courses)
In-Reply-To: <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com>
Message-ID: <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com>

On 4/22/2019 11:26 AM, Jim Laskey wrote:
> Current "strip incidentals" algorithm as captured in String::align
> (string-tapas branch)
>
>     public String align(int n) {
>         if (isEmpty()) {
>             return "";
>         }
>         long count = lines().count();
>         if (count == 1) {
>             return strip();
>         }
>         int outdent = lines().skip(1)
>              .filter(not(String::isBlank))
>              .mapToInt(String::indexOfNonWhitespace)
>              .min()
>              .orElse(0);
>         String last = lines().skip(count - 1).findFirst().orElse("");
>         boolean lastIsBlank = last.isBlank();
>         if (lastIsBlank) {
>             outdent = Integer.min(outdent, last.length());
>         }
>         return indentStream(lines(1, 1), n - outdent).map(s -> s.stripTrailing())
>             .collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : ""));
>     }
>
>
> 2. long count = lines().count();
>    if (count == 1) {
>        return strip();
>    }
>
> Single line strings (no line terminators) are simply stripped.
>
>     """  single line  """ ==> "single line"

I think we should reconsider this one. The interpretation we settled on is: we're willing to treat a multi-line string as being a sequence of lines, not just of characters, and we're willing to strip incidental whitespace that arises from accidents of how the string is embedded in the program. But a single-line string doesn't have any of that; I think it should be left alone, regardless of quotes.

>
> 3.  int outdent = lines().skip(1) ...
>
> Ignoring first line, determine least number of leading whitespaces for all
> non-blank lines.
>
>     String s = """
> ................line 1..
> ....................line 2.
> """;  ==> 16

I think we should reconsider whether a non-blank first line means that we should consider any indentation on the first line too. This has the likely-beneficial side-effect that having a non-blank character immediately following the """ effectively means "no stripping."

Considering the indentation of the _last_ blank line gives the user more control while not requiring the user to distort indentation for common cases. So +1 to "CDI".

>
> Options:
>
> 2. a) Single line strings would be just stripLeading, but should be
> consistent with multi-line and stripTrailing.
>
>     """  single line  """ ==> "single line  "
>     b) We could do nothing for single line.
>
>     """  single line  """ ==> "  single line  "

I vote (b).

> 5. a) If we omit close delimiter influence, only the content influences the
> indentation.  Loss of control by the user.

I think CDI is fine.

> 6. a) Could strip all leading/trailing blank lines, but awkward to recover the
> LOI. Not recommending.

Agreed.

> 9. a) Always add a last \n. Loss of control by the user.

The current behavior pairs nicely with CDI.
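The effect of letting a non-blank first line take part in the margin computation can be illustrated with a small sketch. It is one possible reading of the suggestion above, not code from the string-tapas branch; the class name OutdentVariants, the method outdent, and the includeFirstLine flag are all hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class OutdentVariants {

    // Hypothetical helper: number of leading whitespace characters in a line.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // includeFirstLine == false mirrors lines().skip(1) in the posted algorithm;
    // includeFirstLine == true also counts a non-blank first line (assumed variant).
    static int outdent(String s, boolean includeFirstLine) {
        List<String> lines = s.lines().collect(Collectors.toList());
        if (lines.isEmpty()) {
            return 0;
        }
        int min = lines.stream()
                .skip(includeFirstLine ? 0 : 1)
                .filter(line -> !line.isBlank())
                .mapToInt(OutdentVariants::leadingWhitespace)
                .min()
                .orElse(0);
        // Close delimiter influence: a blank last line can lower the margin.
        String last = lines.get(lines.size() - 1);
        if (lines.size() > 1 && last.isBlank()) {
            min = Math.min(min, last.length());
        }
        return min;
    }
}
```

For the content "opt-out\n        line 2\n    ", the skip-first-line rule gives a margin of 4 (set by the closing line), while counting the first line gives 0, since text that starts right after the opening delimiter has no leading whitespace. That is the "effectively no stripping" side effect described above.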
From james.laskey at oracle.com Mon Apr 22 16:23:34 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Mon, 22 Apr 2019 13:23:34 -0300
Subject: Alignment algorithm (was: Wrapping up the first two courses)
In-Reply-To: <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com> <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com>
Message-ID:

> On Apr 22, 2019, at 1:04 PM, Brian Goetz wrote:
>
> On 4/22/2019 11:26 AM, Jim Laskey wrote:
>>
>> 2. long count = lines().count();
>>    if (count == 1) {
>>        return strip();
>>    }
>>
>> Single line strings (no line terminators) are simply stripped.
>>
>>     """  single line  """ ==> "single line"
>
> I think we should reconsider this one. The interpretation we settled on is: we're willing to treat a multi-line string as being a sequence of lines, not just of characters, and we're willing to strip incidental whitespace that arises from accidents of how the string is embedded in the program. But a single-line string doesn't have any of that; I think it should be left alone, regardless of quotes.

Why didn't they write it as a single-quote string in the first place? Having a """ removing incidentals works in our favour for examples like:

    """"in quotes"""" vs """ "in quotes" """

>
>>
>> 3. int outdent = lines().skip(1) ...
>>
>> Ignoring first line, determine least number of leading whitespaces for all
>> non-blank lines.
>>
>>     String s = """
>> ................line 1..
>> ....................line 2.
>> """;  ==> 16
>
> I think we should reconsider whether a non-blank first line means that we should consider any indentation on the first line too. This has the likely-beneficial side-effect that having a non-blank character immediately following the """ effectively means "no stripping."

Not sure this is a workable perspective. Opting out this way forces the user into some weird configurations that they have to unmangle to get a useful result.

    String s = """opt-out
    line 2
    """;

Result:

    opt-out
    .....................line 2
    ..................

From brian.goetz at oracle.com Mon Apr 22 16:31:01 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 22 Apr 2019 12:31:01 -0400
Subject: Alignment algorithm (was: Wrapping up the first two courses)
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com> <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com>
Message-ID: <64FF3DFD-FC7A-4933-AD57-550DCE5CD81F@oracle.com>

>
> Why didn't they write as single quote string in the first place? Having a """ removing incidentals works in our favour for examples like;
>
> """"in quotes"""" vs """ "in quotes" """

I don't think we know (or much care) whether they did or didn't. My point is, if our justification for stripping has to do with the 2D embedding of a ML string in the source code (which I think is where we are), there is no 2D embedding here. So stripping should have nothing to say about this case.

From cushon at google.com Mon Apr 22 18:29:51 2019
From: cushon at google.com (Liam Miller-Cushon)
Date: Mon, 22 Apr 2019 11:29:51 -0700
Subject: A library for implementing equals and hashCode
Message-ID:

Please consider this proposal for a library to help implement equals and hashCode.
The doc includes a discussion of the motivation for adding such an API to the JDK, a map of the design space, and some thoughts on the subset of that space which might be most interesting: http://cr.openjdk.java.net/~cushon/amber/equivalence.html From alex.buckley at oracle.com Mon Apr 22 19:04:13 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Mon, 22 Apr 2019 12:04:13 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <65DE2FE7-446F-4FD5-A378-FB7792E2C533@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <65DE2FE7-446F-4FD5-A378-FB7792E2C533@oracle.com> Message-ID: <5CBE102D.8000106@oracle.com> Nope, I don't think multi-line string literals are an attractive nuisance in any way. We should NOT deem it incorrect to refactor a sequence of concatenations into a single multi-line string literal. Developers are chomping at the bit to do it, and if we cast doubt on the ability then we're wasting everyone's time. We should deem it correct, and 99% of the time no-one will care that newline characters exist in the string. The rare library that subtly misbehaves or (and this is the better option) actually blow ups when seeing newlines will feel great pressure to become more liberal in what it accepts, and that is a good thing. Alex On 4/19/2019 7:42 PM, Guy Steele wrote: > So is your point that multiline string literals may be an ?attractive nuisance? in that they may make it too convenient for inattentive programmers to perform _incorrect_ refactoring? > > >> On Apr 19, 2019, at 8:16 PM, Alex Buckley wrote: >> >>> On 4/10/2019 8:22 AM, Jim Laskey wrote: >>> Line terminators: When strings span lines, they do so using the line >>> terminators present in the source file, which may vary depending on what >>> operating system the file was authored. 
Should this be an aspect of >>> multi-line-ness, or should we normalize these to a standard line >>> terminator? It seems a little weird to treat string literals quite so >>> literally; the choice of line terminator is surely an incidental one. I >>> think we're all comfortable saying "these should be normalized", but its >>> worth bringing this up because it is merely one way in which incidental >>> artifacts of how the string is embedded in the source program force us >>> to interpret what the user meant. >> >> No-one has commented on this, but it's important because some libraries are going to be surprised by the presence of line terminators, of any kind, in strings denoted by multi-line string literals. >> >> To be clear, I agree with normalizing line terminators. And, I understand that any string could have contained line terminators thanks to escape sequences in traditional string literals. But, it was not common to see a \n except where multi-line-ness was expected or harmless. Going forward, who can guarantee that refactoring the argument of `prepareStatement` from a sequence of concatenations: >> >> try (PreparedStatement s = connection.prepareStatement( >> "SELECT * " >> + "FROM my_table " >> + "WHERE a = b " >> )) { >> ... >> } >> >> to a multi-line string literal: >> >> try (PreparedStatement s = connection.prepareStatement( >> """SELECT * >> FROM my_table >> WHERE a = b""" >> )) { >> ... >> } >> >> is behaviorally compatible for `prepareStatement`? It had no reason to expect \n in its string argument before. >> >> (Hat tip: https://blog.jooq.org/2015/12/29/please-java-do-finally-support-multiline-strings/) >> >> Maybe `prepareStatement` will work fine. 
But someone somewhere is going to take a program with a sequence of 2000 concatenations and turn them into a huge multi-line string literal, and the inserted line terminators are going to cause memory pressure, and GC is going to take a little longer, and eventually this bug will be filed: "My system runs 5% slower because the source code changed a teeny tiny bit." >> >> In reality, a few libraries will need fixing, and that will happen quickly because developers are very keen to use multi-line string literals. But it's fair to point out that while everyone is worrying about whitespace on the left of the literal, the line terminators to the right are a novel artifact too. >> >> Alex > From guy.steele at oracle.com Mon Apr 22 19:16:56 2019 From: guy.steele at oracle.com (Guy Steele) Date: Mon, 22 Apr 2019 15:16:56 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <5CBE102D.8000106@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <65DE2FE7-446F-4FD5-A378-FB7792E2C533@oracle.com> <5CBE102D.8000106@oracle.com> Message-ID: <9DAF8E08-E950-4632-8B66-074B4AD65797@oracle.com> I think we actually are in ?violent agreement? here, Alex, but just to be sure, see comments below. > On Apr 22, 2019, at 3:04 PM, Alex Buckley wrote: > > Nope, I don't think multi-line string literals are an attractive nuisance in any way. We should NOT deem it incorrect to refactor a sequence of concatenations into a single multi-line string literal. I didn?t say (or mean to imply that). I think it?s a great thing to refactor concatenations into a single multi-line string literal WHEN IT IS DONE CORRECTLY. However, if you blindly pull out the concatenations and thereby introduce newlines into the string when they were not there before and doing so violates some contract downstream, THAT IS AN INCORRECT TRANSFORMATION. 
We certainly agree that it would be a good thing if everything that might be downstream were in fact reasonably tolerant of newlines.

BUT IF YOU DON'T KNOW FOR SURE THAT WHAT IS DOWNSTREAM IS TOLERANT OF NEWLINES, AND YOU BLINDLY TRANSFORM A STRING CONCATENATION INTO A MULTI-LINE STRING LITERAL THAT INCLUDES NEWLINES WHERE THERE WERE NONE BEFORE, THAT IS A BAD THING.

And a feature that makes it too easy to accidentally do a bad thing _might_ be considered an attractive nuisance, AS OPPOSED TO MY SCREAMING ALL-CAPS, WHICH ARE A REPULSIVE NUISANCE. :-)

> Developers are chomping at the bit to do it, and if we cast doubt on the ability then we're wasting everyone's time. We should deem it correct, and 99% of the time no-one will care that newline characters exist in the string. The rare library that subtly misbehaves or (and this is the better option) actually blow ups when seeing newlines will feel great pressure to become more liberal in what it accepts, and that is a good thing.

And it would probably also be a good thing to have a way to say that a newline in the string literal should not be part of the string content. C programmers are certainly quite used to sticking a backslash in front of a newline to mean "not really a newline here".
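The difference between the two forms, and the C-style idea of a suppressed newline, can be made concrete with ordinary string literals. This is only an illustration: the class NewlineSketch and the helper joinContinuations are hypothetical, the MULTI_LINE constant assumes that line terminators are normalized to \n and that no indentation survives, and nothing here reflects an agreed syntax for multi-line literals.

```java
class NewlineSketch {

    // The original form: every line break lives in the source layout, none in the string.
    static final String CONCATENATED =
            "SELECT * "
          + "FROM my_table "
          + "WHERE a = b ";

    // What a naive refactoring to a multi-line literal would denote (assumption:
    // terminators normalized to \n, indentation fully stripped).
    static final String MULTI_LINE = "SELECT *\nFROM my_table\nWHERE a = b";

    // The same text with a backslash before each newline, in the C/shell style.
    static final String CONTINUED = "SELECT * \\\nFROM my_table \\\nWHERE a = b \\\n";

    // Hypothetical helper: treat backslash-newline as "syntax only, not payload"
    // by deleting each such pair from the string.
    static String joinContinuations(String s) {
        return s.replace("\\\n", "");
    }
}
```

Here CONCATENATED contains no \n while MULTI_LINE contains two, which is exactly the change a downstream callee might or might not tolerate. Applying joinContinuations to CONTINUED gives back a string equal to CONCATENATED.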
From alex.buckley at oracle.com Tue Apr 23 00:00:29 2019 From: alex.buckley at oracle.com (Alex Buckley) Date: Mon, 22 Apr 2019 17:00:29 -0700 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <9DAF8E08-E950-4632-8B66-074B4AD65797@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <65DE2FE7-446F-4FD5-A378-FB7792E2C533@oracle.com> <5CBE102D.8000106@oracle.com> <9DAF8E08-E950-4632-8B66-074B4AD65797@oracle.com> Message-ID: <5CBE559D.7060603@oracle.com> On 4/22/2019 12:16 PM, Guy Steele wrote: >> On Apr 22, 2019, at 3:04 PM, Alex Buckley >> wrote: >> >> Nope, I don't think multi-line string literals are an attractive >> nuisance in any way. We should NOT deem it incorrect to refactor a >> sequence of concatenations into a single multi-line string >> literal. > > I didn?t say (or mean to imply that). I think it?s a great thing to > refactor concatenations into a single multi-line string literal WHEN > IT IS DONE CORRECTLY. > > However, if you blindly pull out the concatenations and thereby > introduce newlines into the string when they were not there before > and doing so violates some contract downstream, THAT IS AN INCORRECT > TRANSFORMATION. Literally, yes, it's an incorrect transformation for the caller to perform if it violates the contract offered by the callee. > We certainly agree that it would be a good thing if everything that > might be downstream were in fact reasonably tolerant of newlines. Yes. > BUT IF YOU DON?T KNOW FOR SURE THAT WHAT IS DOWNSTREAM IS TOLERANT OF > NEWLINES, AND YOU BLINDLY TRANSFORM A STRING CONCATENATION INTO A > MULTI-LINE STRING LITERAL THAT INCLUDES NEWLINES WHERE THERE WERE > NONE BEFORE, THAT IS A BAD THING. If the callee's contract says "No newlines in the string argument to Customer::setName", then the caller would be doing a bad thing. 
But the reason this topic is interesting(ish) is because we're dealing with something that the vast majority of callees never thought to specify. (Well, maybe not "never". I browsed the Java SE API Specification to find a method that takes a String, and randomly clicked on something in JNDI -- https://docs.oracle.com/en/java/javase/12/docs/api/java.naming/javax/naming/Name.html#add(java.lang.String) -- which happens to be strict about the string passed to it, so perhaps someone is about to get an InvalidNameException when they try to lay out a long LDAP query string over multiple lines.) > And a feature that makes it too easy to accidentally do a bad thing > _might_ be considered an attractive nuisance, AS OPPOSED TO MY > SCREAMING ALL-CAPS, WHICH ARE A REPULSIVE NUISANCE. I can get 90% of the way to saying "OK, multi-line string literals _might_ be considered an attractive nuisance", but I can't get 100% of the way there because it's such a callee-centric view to take when the purpose of the feature is to simplify the life of the caller. If you crack open the door to give callees a hearing, you'll get requests to statically reject multi-line string literals (such as via a java.* annotation that programmatically indicates "not multi-line safe", or a java.lang.MultilineString type that's a sibling of String) and we don't want to go anywhere near there. (I recall a library that took Runnable or somesuch, and fell over when the argument was a lambda expression; the library expected an anonymous inner class instance in order to do some peculiar introspection, which failed on the opaque object reifying a lambda expression. The library developer _might_ have considered lambda expressions an attractive nuisance for a few minutes, but who would have sympathy?) 
Alex From guy.steele at oracle.com Tue Apr 23 03:31:09 2019 From: guy.steele at oracle.com (Guy Steele) Date: Mon, 22 Apr 2019 23:31:09 -0400 Subject: String reboot - (1a) incidental whitespace In-Reply-To: <5CBE559D.7060603@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <65DE2FE7-446F-4FD5-A378-FB7792E2C533@oracle.com> <5CBE102D.8000106@oracle.com> <9DAF8E08-E950-4632-8B66-074B4AD65797@oracle.com> <5CBE559D.7060603@oracle.com> Message-ID: <68F1D052-8235-49EB-935F-2AFFFBD801ED@oracle.com> Good points. Like I said, I think we are in agreement! --Guy > On Apr 22, 2019, at 8:00 PM, Alex Buckley wrote: > > On 4/22/2019 12:16 PM, Guy Steele wrote: >>> On Apr 22, 2019, at 3:04 PM, Alex Buckley >>> wrote: >>> >>> Nope, I don't think multi-line string literals are an attractive >>> nuisance in any way. We should NOT deem it incorrect to refactor a >>> sequence of concatenations into a single multi-line string >>> literal. >> >> I didn't say (or mean to imply that). I think it's a great thing to >> refactor concatenations into a single multi-line string literal WHEN >> IT IS DONE CORRECTLY. >> >> However, if you blindly pull out the concatenations and thereby >> introduce newlines into the string when they were not there before >> and doing so violates some contract downstream, THAT IS AN INCORRECT >> TRANSFORMATION. > > Literally, yes, it's an incorrect transformation for the caller to perform if it violates the contract offered by the callee. > >> We certainly agree that it would be a good thing if everything that >> might be downstream were in fact reasonably tolerant of newlines. > > Yes. > >> BUT IF YOU DON'T KNOW FOR SURE THAT WHAT IS DOWNSTREAM IS TOLERANT OF >> NEWLINES, AND YOU BLINDLY TRANSFORM A STRING CONCATENATION INTO A >> MULTI-LINE STRING LITERAL THAT INCLUDES NEWLINES WHERE THERE WERE >> NONE BEFORE, THAT IS A BAD THING.
> > If the callee's contract says "No newlines in the string argument to Customer::setName", then the caller would be doing a bad thing. > > But the reason this topic is interesting(ish) is because we're dealing with something that the vast majority of callees never thought to specify. > > (Well, maybe not "never". I browsed the Java SE API Specification to find a method that takes a String, and randomly clicked on something in JNDI -- https://docs.oracle.com/en/java/javase/12/docs/api/java.naming/javax/naming/Name.html#add(java.lang.String) -- which happens to be strict about the string passed to it, so perhaps someone is about to get an InvalidNameException when they try to lay out a long LDAP query string over multiple lines.) > >> And a feature that makes it too easy to accidentally do a bad thing >> _might_ be considered an attractive nuisance, AS OPPOSED TO MY >> SCREAMING ALL-CAPS, WHICH ARE A REPULSIVE NUISANCE. > > I can get 90% of the way to saying "OK, multi-line string literals _might_ be considered an attractive nuisance", but I can't get 100% of the way there because it's such a callee-centric view to take when the purpose of the feature is to simplify the life of the caller. If you crack open the door to give callees a hearing, you'll get requests to statically reject multi-line string literals (such as via a java.* annotation that programmatically indicates "not multi-line safe", or a java.lang.MultilineString type that's a sibling of String) and we don't want to go anywhere near there. > > (I recall a library that took Runnable or somesuch, and fell over when the argument was a lambda expression; the library expected an anonymous inner class instance in order to do some peculiar introspection, which failed on the opaque object reifying a lambda expression. The library developer _might_ have considered lambda expressions an attractive nuisance for a few minutes, but who would have sympathy?) 
> > Alex From elias at vasylenko.uk Tue Apr 23 09:41:54 2019 From: elias at vasylenko.uk (Elias N Vasylenko) Date: Tue, 23 Apr 2019 10:41:54 +0100 Subject: Alignment algorithm (was: Wrapping up the first two courses) In-Reply-To: <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com> <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com> Message-ID: <79fe66f4f541c34abda2148d7a2efa9ae755eb97.camel@vasylenko.uk> On Mon, 2019-04-22 at 12:04 -0400, Brian Goetz wrote: > > > > Ignoring first line, determine least number of leading whitespaces > > for all > > non-blank lines. > > > > String s = """ > > ................line 1.. > > ....................line 2. > > """; ==> 16 > > I think we should reconsider whether a non-blank first line means > that we should consider any indentation on the first line too. This > has the likely-beneficial side-effect that having a non-blank > character immediately following the """ effectively means "no > stripping." I would go the other way and say that any non-blank first line should mean no stripping at all. A non-blank first line will always have at least an extra three characters of indentation, so won't line up with the rest of the string. (Ignoring some questionable (ab)use of 4+ width tabs.) I think it's a useful principle that automatic indentation stripping deals with the positioning of a string as a whole in 2D space in the source file, and if the relative positioning of each line appears to be inconsistently modified then this principle is violated. Telling users that they can encode this result: ....line 1 ....line 2 As either this: ` String s = """ line 1 line 2 """; ` Or this: ` String s = """ line 1 line 2 """; ` Would be hugely unintuitive imo.
Non-empty first lines would still be useful as an opt-out. And I'd suggest a convention of using this opt-out by escaping the leading newline, thus making the first line effectively non-empty without messing up alignment in the source: ` String s = """\ line 1 line 2 """; ` gives: ......................line 1 ......................line 2 We can pull a similar trick to opt-out of any closing delimiter indentation influence: ` String s = """ line 1 line 2\ """; ` gives: line 1 line 2 From brian.goetz at oracle.com Tue Apr 23 14:57:24 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 23 Apr 2019 10:57:24 -0400 Subject: Alignment algorithm (was: Wrapping up the first two courses) In-Reply-To: <79fe66f4f541c34abda2148d7a2efa9ae755eb97.camel@vasylenko.uk> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com> <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com> <79fe66f4f541c34abda2148d7a2efa9ae755eb97.camel@vasylenko.uk> Message-ID: >> I think we should reconsider whether a non-blank first line means >> that we should consider any indentation on the first line too. This >> has the likely-beneficial side-effect that having a non-blank >> character immediately following the """ effectively means "no >> stripping." > > I would go the other way and say that any non-blank first line should > mean no stripping at all. A non-blank first line will always have at > least an extra three characters of indentation, so won't line up with > the rest of the string. (Ignoring some questionable (ab)use of 4+ width > tabs.) This is worth considering, but as I've said before, this can't be the only opt-out. One way to unify these is some sort of auto-stripped escape that does not count as whitespace.
> Non-empty first lines would still be useful as an opt-out. And I'd > suggest a convention of using this opt-out by escaping the leading > newline, thus making the first line effectively non-empty without > messing up alignment in the source: > ` > String s = """\ > line 1 > line 2 > """; > ` I would prefer to treat \ as an escape that means "eat the newline", as it is in other languages, as in the example John posted last week. But in that interpretation, you get the same effect, because the above is equivalent to > String s = """line 1 > line 2 > """; Or maybe that's what you were suggesting? From elias at vasylenko.uk Tue Apr 23 20:19:17 2019 From: elias at vasylenko.uk (Elias N Vasylenko) Date: Tue, 23 Apr 2019 21:19:17 +0100 Subject: Alignment algorithm (was: Wrapping up the first two courses) In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com> <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com> <79fe66f4f541c34abda2148d7a2efa9ae755eb97.camel@vasylenko.uk> Message-ID: <407ded13cf9ab45b06f19a9772632961f9377b6a.camel@vasylenko.uk> On Tue, 2019-04-23 at 10:57 -0400, Brian Goetz wrote: > > > I think we should reconsider whether a non-blank first line means > > > that we should consider any indentation on the first line > > > too. This > > > has the likely-beneficial side-effect that having a non-blank > > > character immediately following the """ effectively means "no > > > stripping." > > > > I would go the other way and say that any non-blank first line > > should > > mean no stripping at all. A non-blank first line will always have > > at > > least an extra three characters of indentation, so won't line up > > with > > the rest of the string. (Ignoring some questionable (ab)use of 4+ > > width > > tabs.)
> > This is worth considering, but as I've said before, this can't be > the only opt-out. I agree! There needs to be a way to opt out without losing the leading newline. > One way to unify these is some sort of auto-stripped escape that does > not count as whitespace. Ah, well I had assumed that the auto-alignment would be applied to the string *after* processing escape sequences, in which case a self-deleting escape sequence wouldn't be visible to it. But on reflection I think that was probably the wrong assumption. Applying auto-alignment before processing escapes is a more faithful expression of the principle that we're dealing with the embedding of the string in 2D space in the source file. > > Non-empty first lines would still be useful as an opt-out. And I'd > > suggest a convention of using this opt-out by escaping the leading > > newline, thus making the first line effectively non-empty without > > messing up alignment in the source: > > ` > > String s = """\ > > line 1 > > line 2 > > """; > > ` > > I would prefer to treat \ as an escape that means "eat the > newline", as it is in other languages, as in the example John posted > last week. But in that interpretation, you get the same effect, > because the above is equivalent to > > > String s = """line 1 > > line 2 > > """; > > > Or maybe that's what you were suggesting? > Yes that was the idea! Or rather, that it would be equivalent to: String s = """ line 1 line 2 """; I didn't notice it had already been suggested. But if indentation stripping is applied before escaping is processed then it doesn't work like that anyway. More exactly, it does still work, but for a different reason. As does the following ... String s = """\n\ line 1 line 2 """; ... for when we want to opt out without losing the leading newline. In this case I don't see the need for a new auto-stripped escape.
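The ordering question above (strip indentation first, or process escapes first) can be illustrated with a small model. This is only a sketch under stated assumptions: `stripIndent` and `joinContinuations` are hypothetical helpers, indentation is counted in spaces only, and the sole escape modelled is backslash-newline meaning "eat the newline". It is not the algorithm under discussion for the language, just a way to see that the two orders give different results.

```java
public class AlignThenEscape {

    /** Counts leading space characters (tabs are ignored in this sketch). */
    static int leadingSpaces(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') {
            n++;
        }
        return n;
    }

    /** Removes the smallest indentation shared by all non-blank lines. */
    public static String stripIndent(String raw) {
        String[] lines = raw.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                min = Math.min(min, leadingSpaces(line));
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            // Blank lines carry no content, so they become empty.
            boolean blank = lines[i].trim().isEmpty();
            out.append(blank ? "" : lines[i].substring(min));
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    /** Processes the one escape modelled here: backslash-newline is deleted. */
    public static String joinContinuations(String s) {
        return s.replace("\\\n", "");
    }

    public static void main(String[] args) {
        // Raw source text of the literal body: line 1 ends with a backslash.
        String raw = "    line 1\\\n    line 2";

        // Alignment first: both source lines lose their 4-space indent.
        System.out.println("[" + joinContinuations(stripIndent(raw)) + "]");

        // Escapes first: line 2's indent has already been glued into line 1.
        System.out.println("[" + stripIndent(joinContinuations(raw)) + "]");
    }
}
```

In the first order, the indentation of every source line is judged as it appears in the file, which matches the "string embedded in 2D space" principle; in the second, the continuation line's indentation survives into the content.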
But in any case I believe the justification for disabling auto-alignment when there is a non-empty first line stands on its own, regardless of how auto-alignment interacts with escapes. From forax at univ-mlv.fr Tue Apr 23 21:54:19 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 23 Apr 2019 23:54:19 +0200 (CEST) Subject: A library for implementing equals and hashCode In-Reply-To: References: Message-ID: <65756369.405751.1556056459260.JavaMail.zimbra@u-pem.fr> Hi Liam, interesting proposal. Here is my implementation: https://github.com/forax/exotic/blob/master/src/main/java/com.github.forax.exotic/com/github/forax/exotic/ObjectSupport.java I disagree that it has to be included in the JDK, because as you said there is no typesafe way to represent a field at compile time, so any API will be either typesafe but slow, or less typesafe and faster*. Rémi * It doesn't seem logical that if an API is less typesafe, it can be faster at runtime. It's because if you have an API based on strings you have less type info at compile time but more type info at runtime, thanks to reflection. By contrast, an API based on lambdas will be more typesafe but, because you cannot do reflection on lambdas, you have less type info at runtime. > From: "Liam Miller-Cushon" > To: "amber-spec-experts" > Sent: Monday, 22 April 2019 20:29:51 > Subject: A library for implementing equals and hashCode > Please consider this proposal for a library to help implement equals and > hashCode.
> The doc includes a discussion of the motivation for adding such an API to the > JDK, a map of the design space, and some thoughts on the subset of that space > which might be most interesting: > http://cr.openjdk.java.net/~cushon/amber/equivalence.html From forax at univ-mlv.fr Tue Apr 23 22:09:29 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 24 Apr 2019 00:09:29 +0200 (CEST) Subject: Draft JEP on records and sealed types In-Reply-To: References: Message-ID: <603985572.407748.1556057369320.JavaMail.zimbra@u-pem.fr> Reviewed! There are two gray areas: how to have several public records in one compilation unit, and what exactly is an extractor? Rémi > From: "Brian Goetz" > To: "amber-spec-experts" > Sent: Monday, 22 April 2019 17:23:57 > Subject: Draft JEP on records and sealed types > For review. > https://bugs.openjdk.java.net/browse/JDK-8222777 From cushon at google.com Tue Apr 23 22:39:48 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Tue, 23 Apr 2019 15:39:48 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: References: Message-ID: [received off-list] There's some related discussion under "What's the relationship to Comparator?". There are at least four options: Equivalence extends Comparator, Comparator extends Equivalence, there's no relationship between them, or they both extend some new common super type. There are trade-offs here, but none of those options seem like a slam-dunk. The handling of subtypes also affects the choice between instanceof and getClass() (see "Does equals use instanceof or getClass()?"), but we may have found a way to side-step part of that debate. On Mon, Apr 22, 2019 at 12:20 PM Fred Toussi wrote: > Since 2014 HSQLDB has been using an ObjectComparator that extends > Comparator for its hash sets and maps.
These are sets and maps for > combinations of int, long and Object, as well as order preserving sets and > maps. > > > https://sourceforge.net/p/hsqldb/svn/HEAD/tree/base/trunk/src/org/hsqldb/lib/ObjectComparator.java > > https://sourceforge.net/p/hsqldb/svn/HEAD/tree/base/trunk/src/org/hsqldb/map/BaseHashMap.java > > We made a shortcut to extend Comparator, but if this is going to be added > to Java, your Equivalence should be the super interface of Comparator > > You may also consider the problems of correctly implementing equals in > subclasses, which took years to be clarified (by Martin Odersky AFAIR) by > calling super.equals(other) before performing the test. Example below from > HSQLDB code. > > public class RowType extends Type { > public boolean equals(Object other) { > > if (other == this) { > return true; > } > > if (other instanceof RowType) { > if (super.equals(other)) { > .... > > Regards > > Fred Toussi > HSQLDB Project > From cushon at google.com Tue Apr 23 22:41:09 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Tue, 23 Apr 2019 15:41:09 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: <65756369.405751.1556056459260.JavaMail.zimbra@u-pem.fr> References: <65756369.405751.1556056459260.JavaMail.zimbra@u-pem.fr> Message-ID: Hi Remi, On Tue, Apr 23, 2019 at 2:54 PM Remi Forax wrote: > Here is my implementation > > https://github.com/forax/exotic/blob/master/src/main/java/com.github.forax.exotic/com/github/forax/exotic/ObjectSupport.java > Thanks for the pointer! I'll add a note to the prior art section. > I disagree that it has to be included in the JDK, because as you said > there is no typesafe way to represent a field at compile time so any APIs > will be either typesafe but slow or less typesafe and faster*. > > * It doesn't seems logical that if an API is less typesafe, it can be > faster at runtime. 
It's because if you have an API based on strings you > have less type info at compile time but more type info at runtime thanks to > the reflection. By contrast, an API based on lambdas will be more typesafe > but because you can not do reflection on lambdas, so you have less type > info at runtime. > To maybe clarify the mention of performance in the 'non-goals' section, it's more that it's a long term goal than it is a non-goal. Ultimately we want the performance to be competitive with hand-written implementations, perhaps through some combination of intensification, field references, and lambda cracking. For the initial discussion I wanted to consider what the best (e.g. most readable and ergonomic) library version of the feature would look like. From stuart.marks at oracle.com Wed Apr 24 01:06:50 2019 From: stuart.marks at oracle.com (Stuart Marks) Date: Tue, 23 Apr 2019 18:06:50 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: References: Message-ID: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> On 4/22/19 11:29 AM, Liam Miller-Cushon wrote: > Please consider this proposal for a library to help implement equals and > hashCode. > > The doc includes a discussion of the motivation for adding such an API to > the JDK, a map of the design space, and some thoughts on the subset of that > space which might be most interesting: > > http://cr.openjdk.java.net/~cushon/amber/equivalence.html Hi Liam, Thanks for the discussion here. I don't have many specific points to add at the moment, but as you probably know, this has been discussed before. I think it would be useful to add links to those previous discussions to help comparison and analysis. 
There are a couple RFEs in JIRA that cover this and related topics: JDK-4771660 (coll) Comparator, Comparable, Identity, and Equivalence JDK-6270657 (coll) remove/contains and "Equators" other than .equals() Note these are filed under the collections subcomponent since that's where most of the use cases seem to fall, e.g., for set.contains(obj) [unless the set is a SortedSet, blah blah blah]. I think the Odersky article about Java equals() methods that was referred to elsewhere is this one: https://www.artima.com/lejava/articles/equality.html On hashing, the base-31 polynomial is certainly "traditional" but it's not without faults. John Rose wrote up a bunch of notes in the form of a draft JEP; JDK-8201462 Better hash codes I don't know if this is going anywhere as a JEP at the moment, but it certainly points out that we can do better by default than the base-31 formula. Finally, I agree that the hash function should be left unspecified, but if it has a well-known and predictable implementation, it will shortly become the "de facto" specification and will be very difficult to change in the future. I'm therefore in favor of more aggressive approaches to create a defensible space for future changes. This might include randomization, perhaps on by default, perhaps opt-out, or possibly something else. s'marks From cushon at google.com Wed Apr 24 22:52:10 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Wed, 24 Apr 2019 15:52:10 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> References: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> Message-ID: Hi Stuart, Thanks for the background! I added your links to the doc. 
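For readers who have not looked at it recently, the "traditional" base-31 polynomial mentioned above can be written out in a few lines. The sketch below hand-rolls the formula and checks it against the JDK's documented behaviour for `String.hashCode()` and `List.hashCode()`; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class Base31Demo {

    /** Same formula as String.hashCode(): s[0]*31^(n-1) + ... + s[n-1]. */
    public static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    /** Same formula as List.hashCode(): start at 1, fold in each element. */
    public static int listHash(List<?> items) {
        int h = 1;
        for (Object e : items) {
            h = 31 * h + (e == null ? 0 : e.hashCode());
        }
        return h;
    }

    public static void main(String[] args) {
        List<Object> fields = Arrays.asList("a", 1, null);
        System.out.println(stringHash("amber") == "amber".hashCode());
        System.out.println(listHash(fields) == fields.hashCode());
    }
}
```

Because both formulas are written into the specifications of `String` and `List`, they are exactly the kind of "de facto" contract that cannot be changed later, which is the concern raised about leaving a new hash function predictable.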
On Tue, Apr 23, 2019 at 6:06 PM Stuart Marks wrote: > I think the Odersky article about Java equals() methods that was referred > to > elsewhere is this one: > > https://www.artima.com/lejava/articles/equality.html I think the crux of this is the effect on substitutability. I'm not sure we're going to be able to provide a single opinionated solution to writing equals methods on subclasses. Winning here might be making it possible (and reasonably ergonomic) to express the common approaches to this problem. > Finally, I agree that the hash function should be left unspecified, but if > it > has a well-known and predictable implementation, it will shortly become > the "de > facto" specification and will be very difficult to change in the future. > I'm > therefore in favor of more aggressive approaches to create a defensible > space > for future changes. This might include randomization, perhaps on by > default, > perhaps opt-out, or possibly something else. > As mentioned in the doc we've had good results randomizing the iteration order of hash-based containers at Google, but we're only doing it in tests. Other languages have had success with being more aggressive, for example go always randomizes the iteration order of maps. From stuart.marks at oracle.com Wed Apr 24 23:31:12 2019 From: stuart.marks at oracle.com (Stuart Marks) Date: Wed, 24 Apr 2019 16:31:12 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: References: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> Message-ID: On 4/24/19 3:52 PM, Liam Miller-Cushon wrote: > Finally, I agree that the hash function should be left unspecified, but if it > has a well-known and predictable implementation, it will shortly become the "de > facto" specification and will be very difficult to change in the future. I'm > therefore in favor of more aggressive approaches to create a defensible space > for future changes. 
This might include randomization, perhaps on by default, > perhaps opt-out, or possibly something else. > > As mentioned in the doc we've had good results randomizing the iteration order of > hash-based containers at Google, but we're only doing it in tests. > > Other languages have had success with being more aggressive, for example go > always randomizes the iteration order of maps. Heh, I guess I was being a bit too oblique. :-) The Java 9+ unmodifiable collections (Set.of, Map.of) have randomized iteration order. There's been a bunch of discussion about this, which I don't think we need to repeat here. What's relevant here, though, is that it has enabled us to make internal reorganizations to the data structures (and also to the randomization scheme) with impunity, as we're pretty well assured that applications cannot have any dependency on the iteration order. This is what I meant by "defensible space for future changes." Similarly, if the hashcode library is going to be supplying a function that we'll want to change in the future, we'll need to take steps to ensure that applications don't depend on the initial implementation. However, I can see that some applications might want greater control over the actual hash function, so that capability probably will need to be provided through the API somehow. s'marks From kevinb at google.com Wed Apr 24 23:45:16 2019 From: kevinb at google.com (Kevin Bourrillion) Date: Wed, 24 Apr 2019 16:45:16 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: References: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> Message-ID: On Wed, Apr 24, 2019 at 4:31 PM Stuart Marks wrote: However, I can see that > some applications might want greater control over the actual hash > function, so > that capability probably will need to be provided through the API somehow. > I would disagree with this strongly. 
Use cases that require a *quality* hash code really need to use a proper hashing library like (bias alert) Guava's. It can produce quality results because its API is designed for that. Object.hashCode() doesn't *need* to be a quality hash code! It only needs to be good enough for in-memory hash tables, which is about as low as one's standards can possibly get. One should certainly avoid easily constructable collisions... and then that's about all there is to worry about. And, you *can't* reasonably get much better than that anyway, because you will continually find yourself depending on Integer.hashCode() and String.hashCode() etc., which will never get better than basic. -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From cushon at google.com Thu Apr 25 00:18:41 2019 From: cushon at google.com (Liam Miller-Cushon) Date: Wed, 24 Apr 2019 17:18:41 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: References: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> Message-ID: On Wed, Apr 24, 2019 at 4:31 PM Stuart Marks wrote: > Heh, I guess I was being a bit too oblique. :-) > (Whoops. {Set,Map}.of are also good examples of randomization :) ) From stuart.marks at oracle.com Thu Apr 25 00:39:23 2019 From: stuart.marks at oracle.com (Stuart Marks) Date: Wed, 24 Apr 2019 17:39:23 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: References: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> Message-ID: <968749ad-f590-b152-7df7-030bee77df24@oracle.com> >> However, I can see that >> some applications might want greater control over the actual hash >> function, so >> that capability probably will need to be provided through the API somehow. > > I would disagree with this strongly. Use cases that require a *quality* hash > code really need to use a proper hashing library like (bias alert) Guava's. > It can produce quality results because its API is designed for that.
By "quality" hash code, do you mean a cryptographic hash? If so, then yes, that's really quite separate, and I'm not proposing supporting such a thing here. Here's what I meant by "greater control". Liam's proposal talks about providing a hash reduce function implementation that is unspecified, allowing for the possibility of it changing in the future. Some applications will be fine with this. Other applications might want to use this API for convenience, or cleaner code, or something, but which might want to use their own hash reduce function in order to preserve compatibility. They might want to replicate the hash function used in the past, or they might want to ensure that the hash function doesn't change out from under them in a future release. These applications might require "greater control" in the API of the hash reduce function. On the other hand, maybe we don't need to support this use case, in which case the need for greater control goes away. I'm not taking a position on this; I'm just mapping out the design space. s'marks From john.r.rose at oracle.com Thu Apr 25 00:41:49 2019 From: john.r.rose at oracle.com (John Rose) Date: Wed, 24 Apr 2019 17:41:49 -0700 Subject: A library for implementing equals and hashCode In-Reply-To: References: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> Message-ID: On Apr 24, 2019, at 4:45 PM, Kevin Bourrillion wrote: > > On Wed, Apr 24, 2019 at 4:31 PM Stuart Marks wrote: > >> However, I can see that >> some applications might want greater control over the actual hash function, so >> that capability probably will need to be provided through the API somehow. > I would disagree with this strongly. Use cases that require a quality hash code really need to use a proper hashing library like (bias alert) Guava's. It can produce quality results because its API is designed for that. Kevin, are you saying that people who are concerned about hash quality should just stay away from Object.hashCode? 
Or are you saying that people who want greater control over actual hash functions should use an external library like Guava? (I'm guessing the former, but not sure here. I see the "strongly" but I can't process it without some more clarity.) > Object.hashCode() doesn't need to be a quality hash code! It only needs to be good enough for in-memory hash tables, which is about as low as one's standards can possibly get. One should certainly avoid easily constructable collisions... and then that's about all there is to worry about. And, you can't reasonably get much better than that anyway, because you will continually find yourself depending on Integer.hashCode() and String.hashCode() etc., which will never get better than basic. This paragraph proves that Object.hashCode is pretty poor except for some limited use cases. It's actually worse than you claim: hashCode was designed when everybody ran Java on systems with less than a gigabyte of memory. Nowadays you can build in-memory structures which are *guaranteed* to experience collisions because they contain more than 2^32 keys. You point out that Integer and String have "basic" hash codes; I would add List to that, since it uses the same "basic" mixing function as String. That was appropriate in 1995 when 32-bit multiply was slow, but now it's just an obsolete function seeking a sinecure for a barely-working retirement. These "basic" hash codes used to be reasonable, and now they need warning labels. Something along the lines of, "these so-called hash codes were reasonable in the 90's". But the Object.hashCode API can't be deprecated or ignored, because it is centrally located, and that makes it frankly useful, sometimes despite its shortcomings. This is why I'm hoping to find a path to replace it with something better, something equally useful (which includes ubiquitous) but not so basic. Or moldy.
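As a concrete illustration of how "basic" the String hash is, the well-known pair "Aa" and "BB" share a hash code, and because the hash is a base-31 polynomial, any concatenation of those two blocks collides with every other concatenation of the same length. The sketch below builds such a family; `collidingStrings` is a hypothetical helper written for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EasyCollisions {

    /** Returns 2^blocks distinct strings that all share one String.hashCode(). */
    public static List<String> collidingStrings(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                // "Aa" and "BB" hash identically, so either choice keeps
                // the hash of the whole string the same.
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> family = collidingStrings(4);
        Set<Integer> hashes = new HashSet<>();
        for (String s : family) {
            hashes.add(s.hashCode());
        }
        System.out.println(family.size() + " strings, "
                + hashes.size() + " distinct hash code(s)");
    }
}
```

This is the "easily constructable collisions" case from the quoted text: no amount of care in a class's own hashCode helps if it folds in String.hashCode() values that an adversary can choose.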
What excites me, in all this, is a possible convergence of a need (hashCode is increasingly antiquated), an opportunity (fast mixing primitives for wide values on stock hardware), some well-understood frameworks for hashing (like Guava's funnels and mixers IIRC), and some upcoming new capabilities for migrating old classes to new APIs (Valhalla bridges, coming soon I hope). Out of this convergence I think we might be able to obtain some good things: - A higher-precision hash code *in the JDK* that scales to all in-memory structures. (working title "longHashCode") - A migration path that allows classes to lean on old hash codes as a crutch until they can be upgraded. - Some easy ways to upgrade them (such as Liam's library, plus maybe some delegation stuff Brian is cooking up). - Good upgrades for all commonly used JDK types. And also a strong version of the randomization Stuart was talking about: - A contract that allows the JVM to switch hashes, by modifying salts and even algorithms, depending on hardware available. - (Optionally) A way to pass a dynamic salt to a hash code, for container-specific perturbations. But it might be OK to handle this as post-processing. Ultimately it probably can't be a one-size-fits all thing, so I think it might be reasonable to build something like the funnel and mixer framework into a JDK template function (when we have those) so that applications can ask for customized stronger or weaker, wider or narrower hash functions, maybe to cover data sets which are too large for memory and/or which need to be reproducible across JVMs. Then the standard hash code ("longHashCode" or the like) can be defined as a JVM-controlled instance of that framework. By then, if we start to roll out 128-bit integers in Java (using value types) we can roll out some larger hash codes at the same time, using the same migration tricks. 
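To give the wish list above some shape, here is a rough sketch of what a salted 64-bit hash might look like. Everything here is an assumption made for illustration: the names `longHash` and `salt` are invented, and the mixing step is the 64-bit finalizer from MurmurHash3, used purely as a stand-in for whatever fast mixing primitive the JVM might actually choose.

```java
public class SaltedLongHash {

    /** 64-bit finalizer from MurmurHash3, used here only as a stand-in mixer. */
    public static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /**
     * Hypothetical salted hash: folds each 64-bit field into the state.
     * The salt could be chosen per JVM run or per container.
     */
    public static long longHash(long salt, long... fields) {
        long h = salt;
        for (long f : fields) {
            h = mix(h ^ f);
        }
        return h;
    }

    public static void main(String[] args) {
        long saltA = 0x1234L;   // arbitrary example salts
        long saltB = 0x5678L;
        System.out.println(Long.toHexString(longHash(saltA, 42L, 7L)));
        System.out.println(Long.toHexString(longHash(saltB, 42L, 7L)));
    }
}
```

Because each step is invertible, two different salts always give different results for the same fields, so code that accidentally depends on a particular hash value breaks quickly instead of turning the implementation into a de facto specification.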
Bonus point: If the JVM gets ownership of a pervasively used hash code, then hardening that hash code, even at the expense of JVM performance, becomes an allowed move. Kind of like compiling in side-channel remediations: You do it sometimes because you need more DiD, but not always because of a performance hit.

HTH
-- John

From gavin.bierman at oracle.com Thu Apr 25 09:11:11 2019
From: gavin.bierman at oracle.com (Gavin Bierman)
Date: Thu, 25 Apr 2019 11:11:11 +0200
Subject: Revised semantics for break within switch expression
Message-ID:

Dear experts:

We are currently preparing a new JEP to make switch expressions a permanent feature. As stated on this list, we propose to replace the break with value form (e.g. break 42;) from the preview, with a new break-with statement (thus, break-with 42;), that is used to produce a value from an enclosing switch expression.

This gives us an opportunity to reconsider some earlier design decisions.
One I'd like your opinion on regards the control statements and which statements can handle them. Brian sent a summary table to this list a while back that is very useful:

                  break-e   break   break-l   continue   return
    switch-e      H         X       X         X          X
    switch-s      X         H       P         P          P
    for/while/do  X         H       P         H          P
    block         P         P       P         P          P
    labeled       P         P       H*        P          P
    lambda        X         X       X         X          H
    method        X         X       X         X          H

where:

 + X - not allowed
 + P - let the parent handle it
 + H - handle it and complete normally
 + H* - handle it and complete normally if the labels match, otherwise P

The bit I'd like to reconsider is the break-e column. We initially decided that if a break-e occurred in a for/while/do or switch-statement, it would be super confusing if the break target was not this statement but an outer switch expression, even though it was a new form of break. So we proposed to ban it. For example:

    int i = switch (a) {
        default -> {
            switch (b) {
                case 0: break 0; // where does this transfer control?
            };
            break 1;
        }
    };

fails to compile:

    error: value break not supported in 'switch'
                case 0: break 0; // where does this transfer control?

However, now we are proposing a completely new break-with statement, perhaps the potential for confusion is reduced? The break-with statement can only be used to produce a value for a switch expression. So there should not be a confusion if that switch expression contains a switch-statement or for/while/do statement.

In other words, I would like to propose that we generalise the table to the following (changes in **-**):

                  break-with   break   break-l   continue   return
    switch-e      H            X       X         X          X
    switch-s      **P**        H       P         P          P
    for/while/do  **P**        H       P         H          P
    block         P            P       P         P          P
    labeled       P            P       H*        P          P
    lambda        X            X       X         X          H
    method        X            X       X         X          H

[It is useful to compare with the column for the return statement. This can return a value to the outer method/lambda regardless of any enclosing switch-statement or for/while/do statements.]

What do you think?
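For comparison with the generalised table: in the switch-expression feature as later finalized (Java 14), break-with is spelled yield, and it passes through an enclosing switch statement or loop to the nearest switch expression, which is the "P" behaviour proposed here. A minimal sketch of the example above in that syntax; the class and method names are illustrative, and a JDK 14 or later compiler is assumed.

```java
public class YieldDemo {
    static int pick(int a, int b) {
        return switch (a) {
            default -> {
                switch (b) {                  // inner switch *statement*: "P" for a value yield
                    case 0: yield 0;          // completes the outer switch expression with 0
                }
                for (int k = 0; k < 3; k++) { // loops are "P" as well
                    if (k == b) yield 10 + k;
                }
                yield 1;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(pick(7, 0)); // 0
        System.out.println(pick(7, 2)); // 12
        System.out.println(pick(7, 9)); // 1
    }
}
```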
Thanks,
Gavin

From brian.goetz at oracle.com Thu Apr 25 13:32:43 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 25 Apr 2019 09:32:43 -0400
Subject: Draft JEP on records and sealed types
In-Reply-To: <603985572.407748.1556057369320.JavaMail.zimbra@u-pem.fr>
References: <603985572.407748.1556057369320.JavaMail.zimbra@u-pem.fr>
Message-ID:

> reviewed !
> there are two gray areas, how to have several public records in one compilation unit and what is exactly an extractor ?

On the first, we went around a few times, and I think I'm convinced we don't need to do anything special here, at least until we get to pattern matching in switch. Nesting is fine; static import helps; we can consider a similar thing for type patterns of sealed types as we do for enums, where we allow the outer class to be omitted when switching over a sealed type with inner subtypes.

An extractor is the implementation of a pattern. We don't yet have hard terminology for this, so this can be rewritten to "support for deconstruction patterns" until then.

From brian.goetz at oracle.com Thu Apr 25 14:20:17 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 25 Apr 2019 10:20:17 -0400
Subject: A library for implementing equals and hashCode
In-Reply-To:
References:
Message-ID: <0FC772DB-5085-42F1-BEA9-44F1AFF73AA1@oracle.com>

(late to the party)

Thanks for pulling this together. That people routinely implement equals/hashCode explicitly is something we would like to put in the past.

Let me make a few points regarding connection with future potential features, not because I want to say that these will obsolete Equivalence before it starts, or that we should delay working on it until we have these features -- just to point out areas of potential overlap so we can pay attention to how these things might converge.

1. Pattern matching.
Pattern matching offers a path to a better way to write equals() methods, without the complex control flow that is typical of template-expanded implementations (whether expanded by a human or an IDE). For a class like Point, we can implement equals via:

    boolean equals(Object o) {
        return o instanceof Point p && p.x == x && p.y == y;
    }

This is no less explicit than the status quo, but more readable and less error-prone (no short circuit tests; one big &&'ed expression). However, pattern matching offers little help for implementing hashCode in a comparable way.

2. Pattern matching, again. The implementation of a nontrivial pattern is a class member. For sake of exposition, imagine it is declared like this (please, we're not discussing the syntax of this now):

    class Point {
        public Point(int x, int y) { this.x = x; this.y = y; }
        public pattern Point(int x, int y) { x = this.x; y = this.y; }
    }

(and of course records will automate this). Given that a pattern is a way of extracting a bundle of state from an object, assuming there were some way to refer symbolically to a pattern, one could derive an Equivalence from a "pattern reference":

    Equivalence.of();

3. Field references. We like the Comparator factories and combinators well enough, which take lambdas that extract single fields. It's a short hop to accepting field references, which would be both more pleasant to use, and more optimizable (field references are more amenable to constant-folding optimizations, for example, by virtue of their transparency.)

4. Specialized generics. The rift between objects and primitives was the major pain-generator for all of the Java-8-era APIs, including the Comparator APIs. Not only did we need hand-specialized comparingInt() methods, but the lack of a common super type without boxing meant that we could not use the more appealing approach of varargs, but instead had to have a method call for each component in the comparison.
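To make two of the code-level points above concrete: the instanceof-pattern form of equals() from point 1, and the one-call-per-component style of the Java 8 Comparator factories from point 4. This is only a sketch; the Point class is illustrative, and it assumes a JDK where pattern matching for instanceof is available (final in Java 16).

```java
import java.util.Comparator;
import java.util.Objects;

public class PointDemo {
    static final class Point {
        final int x, y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            // One &&'ed expression; p is only in scope where the type test succeeded.
            return o instanceof Point p && p.x == x && p.y == y;
        }

        @Override
        public int hashCode() {
            // Patterns do not help here; hashCode is still written out by hand.
            return Objects.hash(x, y);
        }
    }

    // Hand-specialized int factories, one method call per component.
    static final Comparator<Point> BY_X_THEN_Y =
            Comparator.comparingInt((Point p) -> p.x).thenComparingInt(p -> p.y);

    public static void main(String[] args) {
        System.out.println(new Point(1, 2).equals(new Point(1, 2)));               // true
        System.out.println(new Point(1, 2).equals("not a point"));                 // false
        System.out.println(BY_X_THEN_Y.compare(new Point(1, 2), new Point(1, 3)) < 0); // true
    }
}
```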
You are in the same boat now, until Valhalla delivers specialized generics. It's worth thinking a bit about what we would like the long-term API to look like, so we can steer clear of getting in its way between now and then. With specialized generics, we'd probably want something like

    static Equivalence of(Class clazz, Function... components)

Which suggests we probably want to steer away from having a varargs option, so that we are not buying ourselves one more migration headache.

> On Apr 22, 2019, at 2:29 PM, Liam Miller-Cushon wrote:
>
> Please consider this proposal for a library to help implement equals and hashCode.
>
> The doc includes a discussion of the motivation for adding such an API to the JDK, a map of the design space, and some thoughts on the subset of that space which might be most interesting:
>
> http://cr.openjdk.java.net/~cushon/amber/equivalence.html

From brian.goetz at oracle.com Thu Apr 25 15:55:46 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 25 Apr 2019 11:55:46 -0400
Subject: Wrapping up the first two courses
In-Reply-To: <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com>
Message-ID: <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>

A few more questions have been raised:

 - Do we do alignment before, or after, escape processing?
 - What about single-line "fat" strings?
 - What is the effect of text on the first line on alignment?
 - What about opt-out?
 - What about \<newline>?

Suggested answers:

1. Escape processing. If alignment is about removing _incidental indentation_, it seems hard to believe that a \t escape is intended to be incidental; this feels like payload, not envelope. Which suggests to me that we should be doing alignment on the escaped string, and then doing escape processing.
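The order of the two steps is observable. A small sketch using String.stripIndent() and String.translateEscapes(), the library methods that later shipped alongside text blocks in JDK 15, as stand-ins for "alignment" and "escape processing"; the input string and the class name are made up for illustration.

```java
public class AlignThenEscapeDemo {
    // Two source lines: the first starts with two spaces and then a \t escape
    // (a backslash followed by 't'); the second is indented by four spaces.
    static final String RAW = "  \\tx\n    y";

    // Alignment on the still-escaped text, then escape processing.
    static String alignThenEscape(String raw) {
        return raw.stripIndent().translateEscapes();
    }

    // The other order: the tab produced by the escape is counted as indentation.
    static String escapeThenAlign(String raw) {
        return raw.translateEscapes().stripIndent();
    }

    public static void main(String[] args) {
        // The tab survives as payload at the start of the first line.
        System.out.println(alignThenEscape(RAW).equals("\tx\n  y")); // true
        // The tab is treated as envelope and stripped.
        System.out.println(escapeThenAlign(RAW).equals("x\n y"));    // true
    }
}
```

With alignment first, the backslash is a non-whitespace character, so the common indentation is two columns and the escaped tab is kept. With escape processing first, the first line has three leading whitespace characters, the common indentation becomes three, and the tab disappears.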
For 2/3, here's a radical suggestion. Our theory is, a "fat" string is one that is co-mingled with the indentation of the surrounding code, and one which we usually wish the compiler to disentangle for us. By this interpretation, fat single-line strings make no sense, so let's ban them, and similarly, text on the first line similarly makes little sense, so let's ban that too. In other words, fat strings (with the possible exception of the trailing delimiter) must exist within a "Kevin Rectangle."

For 4 (opt out), I think it is OK to allow a self-stripping escape on the first line (e.g., \-), which expands to nothing, but suppresses stripping. This effectively becomes a "here doc".

For 5, since \<newline> is not valid today, we don't have to decide this now, we can add it later if desired.

> So, I posit, we have consensus over the following things:
>
> - Multi-line strings are a useful feature on their own
> - Using "fat" delimiters for multi-line strings is practical and intuitive
> - Multi-line string literals share the same escape language as single-line string literals
> - Newlines in MLSLs should be normalized to \n
> - There exists a reasonable alignment algorithm, which users can learn easily enough, and can be captured as a library method on String (some finer points to be hammered out)
> - To the extent the language performs alignment, it should be consistent with what the library-based version does, so that users can opt out and opt back in again
> - In the common case, a MLSL will be a combination of some intended and some incidental indentation, and it is reasonable for the default to be that the language attempts to normalize away the incidental indentation
> - There needs to be an opt-out, for the cases where alignment is not the default the user wants

From john.r.rose at oracle.com Thu Apr 25 21:37:59 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Apr 2019 14:37:59 -0700
Subject: Wrapping up the first two courses
In-Reply-To: <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>
Message-ID: <3788BEC6-74B8-4EA1-AF6E-0AB3EC4688AE@oracle.com>

On Apr 25, 2019, at 8:55 AM, Brian Goetz wrote:
>
> A few more questions have been raised:
>
> - Do we do alignment before, or after, escape processing?
> - What about single-line "fat" strings?
> - What is the effect of text on the first line on alignment?
> - What about opt-out?
> - What about \<newline>?

TL;DR:

 - Before, and watch out for \u00XX,
 - Disallow,
 - Disallow,
 - (see next)
 - Support \LineTerminator as both explicit layout control and opt-out (no \-).

> Suggested answers:
>
> 1. Escape processing. If alignment is about removing _incidental indentation_, it seems hard to believe that a \t escape is intended to be incidental; this feels like payload, not envelope. Which suggests to me that we should be doing alignment on the escaped string, and then doing escape processing.

I agree with this; I think it is much more intuitive to make sure that any escaped thing is classified as payload. Basically, if it doesn't look like whitespace, it won't be treated as the envelope of the rectangle but rather as payload inside the rectangle. What could be simpler?

At first I thought this might make implementation and specification more complex, but actually it makes it simpler. Here's why: If you treat rectangle extraction as a process of grabbing a bunch of escape-sequence-laden payload, you can treat the expansion of escape sequences as a pure library function, a mapping from String to String, where LineTerminator shows up as \n (\u000A). I think Jim may already favor this approach?
Making a clean separation between rectangle extraction (first) and escape sequence expansion (second) may also clarify the opt-out question; see below.

One confounding factor we've hesitated to touch is the status of \uXXXX escapes, which look the same as \OOO escapes to most users but are completely different in order of processing. We could make our lives simpler with respect to \uXXXX escapes if we were to modify the rules for them inside of fat strings, so that (somehow) they were always interpreted as payload, and not as envelope. (We can't modify the rules inside of plain strings, sadly.)

The JLS warns about \uXXXX escapes aliasing to surprising syntax characters, in 3.3 (\u005c = \), 3.10.4 (\u0027 = '), and 3.10.5 (\u0022 = "). The net result is that you can obfuscate your Java program horribly if you use any of those unicode escapes. With the rectangle extraction feature of fat strings, the list grows to include \u0020 and other whitespace. As a matter of style programmers should scrupulously avoid unicode escapes for lexically significant code points.

(Some puzzlers: What role can the unicode escape \u000A play, in Java program text today? Hint: It's not a LineTerminator. What could it mean in a multi-line string? Same questions about \u000D? How should those characters interact with rectangle extraction?)

At this point we could consider going farther, and make a mechanically checked guarantee against puzzlers in fat strings. I'm not sure about this, but I want to put a proposal out there FTR:

Limitation on \uXXXX escapes: Inside of fat strings, any unicode escape sequence (which is necessarily of the form \u*XXXX, repeated u followed by four hex digits) is forbidden to specify a hexadecimal number in the range of 0000 to 001F inclusive. (Reduced limitation: U-escapes must not alias to characters significant to the envelope, which are those in "\"\\ \t", quote+backslash+space+tab.)
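The proposed limitation is mechanical enough to sketch as a checker. The following is an illustration only and does not describe javac: firstForbiddenEscape is a made-up helper that scans the raw source text of a fat string body and reports the first unicode escape denoting a code point in 0000..001F. It relies on the JLS 3.3 rule that a backslash can start a unicode escape only when it is preceded by an even number of contiguous backslashes, and it ignores the corner case of backslashes that are themselves written as unicode escapes.

```java
public class UnicodeEscapeCheck {
    /** Returns the index of the first forbidden unicode escape in raw source text, or -1. */
    static int firstForbiddenEscape(String raw) {
        int backslashes = 0; // length of the backslash run that ends just before position i
        for (int i = 0; i < raw.length(); i++) {
            if (raw.charAt(i) != '\\') {
                backslashes = 0;
                continue;
            }
            // Eligible only when preceded by an even number of contiguous backslashes.
            boolean eligible = backslashes % 2 == 0;
            backslashes++;
            if (!eligible) continue;

            int j = i + 1;
            while (j < raw.length() && raw.charAt(j) == 'u') j++; // one or more 'u' markers
            if (j == i + 1 || j + 4 > raw.length()) continue;     // no 'u', or too short

            int value = 0;
            boolean hex = true;
            for (int k = j; k < j + 4; k++) {
                int d = Character.digit(raw.charAt(k), 16);
                if (d < 0) { hex = false; break; }
                value = value * 16 + d;
            }
            if (hex && value <= 0x001F) return i; // control range 0000..001F is forbidden
        }
        return -1;
    }

    public static void main(String[] args) {
        // Doubled backslashes keep these literals from being unicode escapes themselves.
        System.out.println(firstForbiddenEscape("abc\\u0041"));   // -1
        System.out.println(firstForbiddenEscape("ab\\u000Acd"));  // 2
        System.out.println(firstForbiddenEscape("\\\\u000A"));    // -1
    }
}
```

The third call returns -1 because the scanned text is two backslashes followed by u000A: the backslash next to the u is preceded by an odd number of backslashes, so it is an ordinary escaped backslash and not the start of a unicode escape.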
Effect: All remaining \uXXXX escapes are safe to retain during rectangle extraction and can be interpreted along with other string escapes in the same post-pass. In particular, a String library method can handle such escapes along with other C-like string escapes. Processing \u at the same time and in the same method as other escapes seems like a win to me, independently of the exclusion of puzzlers. This extra win made me speak up, in fact.

Also, a coordinated limitation on fat string delimiters: The opening triple-quote of a fat string must not be derived from a unicode escape, which would have been of the form \u0022, \uu0022, etc.

> For 2/3, here's a radical suggestion. Our theory is, a "fat" string is one that is co-mingled with the indentation of the surrounding code, and one which we usually wish the compiler to disentangle for us. By this interpretation, fat single-line strings make no sense, so let's ban them, and similarly, text on the first line similarly makes little sense, so let's ban that too. In other words, fat strings (with the possible exception of the trailing delimiter) must exist within a "Kevin Rectangle."

Yep. Put rectangle extraction front and center.

Alternative theory for 2: Allow single-line fat strings. Perform analogous "line extraction" on them, by removing all unescaped whitespace after the open quote and before the close quote. This is like rectangle extraction, but in one dimension.

> For 4 (opt out), I think it is OK to allow a self-stripping escape on the first line (e.g., \-), which expands to nothing, but suppresses stripping. This effectively becomes a "here doc".

I agree with the desire for a clear opt-out.

Here's a question we should answer: When a user opts out of 2D layout with rectangle extraction, what should we call the alternative? Surely it comes with more intensive control from the user.
Maybe that leads to odder-looking code, but maybe also it leads to code which the user has "beautified" in some way apart from rectangle extraction. I'd like to think of this opt-out scenario not just negatively ("don't auto-strip that white space") but positively ("I want to organize the form of my program more freely"). Not sure if that's possible, but read on.

Anyway, I think an ad hoc escape \- at the front of the string is not such a clear win, and if we tweak the rules we can gain more than just a single dead-end quasi-escape.

> For 5, since \<newline> is not valid today, we don't have to decide this now, we can add it later if desired.

(This is more accurately called \LineTerminator, since escapes are processed after <newline> has been tokenized.) It's true we can defer this, but let's look at combining it with the opt-out feature and see if we like what we get.

Thesis: The opt-out feature, which asks for all leading blanks (and bracketing newlines) to be retained, is a special case of intensified user control over 2D program layout. Such intensified user control over 2D layout very often (in languages we all know, like makefiles and shell) includes breaking of long lines, using escape sequences or other special control over the envelope (as opposed to payload). The user is taking more control over a complex payload, not just giving up on the rectangle rule.

Proposal: Allow newlines to be marked (somehow) as non-payload, so users can have more intense control over program layout without "leaking" newlines used for layout into their payloads (string body characters).

If we frame this feature as an escape sequence, which marks newlines for elision, then it can be rolled into the escape processing pass. If (see above) escape processing comes *after* rectangle extraction, then newline control could potentially co-exist with rectangle extraction, depending on the presence or absence of an opt-out condition. I think that could be a bonus, although that could be misused also.
There are a range of possible rules for the opt-out from rectangle extraction, all with slightly different outcomes:

 - Opt out if the string body contains \LineTerminator anywhere.
 - Opt out if the string body contains \n or \r anywhere.
 - Opt out if the string body contains \n or \r or \LineTerminator anywhere.
 - Any of the previous rules, applied only between the open triple-quote and first LineTerminator.
 - Opt out if any visible character (not whitespace) occurs between the open triple-quote and first LineTerminator.
 - Allow any single escape sequence, possibly accompanied by whitespace, between the open triple-quote and first LineTerminator, and opt out if that occurs.

(As you can see, the opt-out rule can be more or less specific, and can either co-exist with arbitrary "stuff" appearing after the open-quote, or with restrictions that allow only an opt-out to occur in the privileged position.)

Specific proposal: The sequence \ LineTerminator followed by any amount of unescaped spaces and tabs is elided. This happens during escape processing, which means after rectangle extraction. Rectangle extraction is inhibited (opted out) by the presence of any escape sequence between the open triple-quote and the first following LineTerminator. Optionally: Other than whitespace and escape sequences, nothing is allowed between the open triple-quote and the first following LineTerminator.

If rectangle extraction occurs, and escape processing encounters \ LineTerminator sequences, then additional leading whitespace is stripped. The escape sequence is ignorant of whether any leading whitespace (or none) was removed during rectangle extraction (if it occurred). Such two-step removal seems complicated but is easy to justify: The rectangle extraction isolates a visible block of source code from the containing context, and then the escape sequences do their work. If rectangle extraction is opted out of, the escape sequences would do the same work anyway.
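A sketch of the elision step in the "specific proposal" as a pure String-to-String function, to be run after rectangle extraction, or on its own when extraction is opted out. The method name elideEscapedLineTerminators is made up for illustration, and all other escape sequences are deliberately passed through untouched, so a real escape-processing pass would do more than this.

```java
public class LineContinuationSketch {
    static String elideEscapedLineTerminators(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(i + 1);
                if (next == '\n' || next == '\r') {
                    i += 2;
                    // Treat CR LF as a single LineTerminator.
                    if (next == '\r' && i < body.length() && body.charAt(i) == '\n') i++;
                    // Gobble the unescaped spaces and tabs that follow.
                    while (i < body.length()
                            && (body.charAt(i) == ' ' || body.charAt(i) == '\t')) i++;
                    continue;
                }
                // Any other escape, including an escaped backslash, is copied as a pair.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Backslash, newline, then six spaces of layout: all three are elided.
        System.out.println(elideEscapedLineTerminators("one \\\n      two")); // one two
    }
}
```

Because an escaped backslash is consumed as a pair, a line that ends in two backslashes keeps its newline, and an escaped space after the line terminator stops the gobbling, which matches the "you just escape it" reasoning for protecting payload whitespace.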
I think a set of decisions like this would hang together nicely and give users very good control over the layout of their programs. The resulting programs would (barring intentional obfuscation) read clearly, in both rectangular layouts and more ad hoc free-flowing formats.

From kevinb at google.com Thu Apr 25 21:42:11 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 25 Apr 2019 14:42:11 -0700
Subject: A library for implementing equals and hashCode
In-Reply-To:
References:
Message-ID:

On Tue, Apr 23, 2019 at 3:40 PM Liam Miller-Cushon wrote:

> There's some related discussion under "What's the relationship to Comparator?". There are at least four options: Equivalence extends Comparator, Comparator extends Equivalence, there's no relationship between them, or they both extend some new common super type. There are trade-offs here, but none of those options seem like a slam-dunk.

As I'm looking at it, 1 and 2 basically just don't work; neither generalizes the other. If it's option 3, no relationship, there is still the possibility that a common set of factory methods will produce a thing that implements both, or that a common builder could have methods for producing each.

And yeah, *some* way to unify the two is at least *somewhat* desirable, because we really, really like it when people using TreeSets follow TreeSet's advice to keep their comparison logic consistent with equals, and this would help them do that. However, this has the potential to raise the perceived complexity of the resulting beast and I think it's far from clear that it'll end up worth it. I also think that just being able to construct your comparison and equality logic in consistent and concise ways that you can review against each other gets us at least 51% of the way there.

> On Mon, Apr 22, 2019 at 12:20 PM Fred Toussi wrote:
>
>> Since 2014 HSQLDB has been using an ObjectComparator that extends Comparator for its hash sets and maps.
>> These are sets and maps for combinations of int, long and Object, as well as order preserving sets and maps.
>>
>> https://sourceforge.net/p/hsqldb/svn/HEAD/tree/base/trunk/src/org/hsqldb/lib/ObjectComparator.java
>>
>> https://sourceforge.net/p/hsqldb/svn/HEAD/tree/base/trunk/src/org/hsqldb/map/BaseHashMap.java
>>
>> We made a shortcut to extend Comparator, but if this is going to be added to Java, your Equivalence should be the super interface of Comparator
>>
>> You may also consider the problems of correctly implementing equals in subclasses, which took years to be clarified (by Martin Odersky AFAIR) by calling super.equals(other) before performing the test. Example below from HSQLDB code.
>>
>> public class RowType extends Type {
>>     public boolean equals(Object other) {
>>
>>         if (other == this) {
>>             return true;
>>         }
>>
>>         if (other instanceof RowType) {
>>             if (super.equals(other)) {
>>                 ....
>>
>> Regards
>>
>> Fred Toussi
>> HSQLDB Project

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Thu Apr 25 21:45:39 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Apr 2019 14:45:39 -0700
Subject: Wrapping up the first two courses
In-Reply-To: <3788BEC6-74B8-4EA1-AF6E-0AB3EC4688AE@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <3788BEC6-74B8-4EA1-AF6E-0AB3EC4688AE@oracle.com>
Message-ID:

On Apr 25, 2019, at 2:37 PM, John Rose wrote:
>
> TL;DR: - Before, and watch out for \u00XX, - Disallow,
> - Disallow, - (see next) - Support \LineTerminator as
> both explicit layout control and opt-out (no \-).

P.S.
One more argument in favor of doing \LineTerminator now, which is even better than "it's more economical than \-":

2D MLS's, with rectangle extraction, endow newlines, spaces and tabs with lexical significance. They are no longer passive payload characters, but instead they join quote " and backslash \ as active delimiting characters. It would be, now that I think of it, crazy *not* to allow *all three* of them to be escaped.

(Yes, the mostly-invisible tab character should be escapable, if it is semantically significant to rectangle boundaries.)

(And, yes, these extra escapes will sometimes cause columns not to line up. That's life with escapes.)

Whether \LineTerminator also gobbles up following horizontal space is a separate question. But if space or tab can be escaped, then it's trivial to indicate a space or tab that should not be gobbled by \LT. You just escape it. And since LT is a 2D feature, it is not wrong to consider allowing it to gobble in 2 dimensions.

From kevinb at google.com Thu Apr 25 21:54:20 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 25 Apr 2019 14:54:20 -0700
Subject: A library for implementing equals and hashCode
In-Reply-To: <968749ad-f590-b152-7df7-030bee77df24@oracle.com>
References: <60baf16a-37ee-4e82-475c-af45287f02de@oracle.com> <968749ad-f590-b152-7df7-030bee77df24@oracle.com>
Message-ID:

On Wed, Apr 24, 2019 at 5:39 PM Stuart Marks wrote:

> By "quality" hash code, do you mean a cryptographic hash?

No, definitely not. I just mean things like bit dispersion/avalanche and better (sub-crypto) collision resistance.
The root problems I'm aware of with Object.hashCode() being pulled beyond its core purpose (providing bits for HashMaps to stir up and use) are:

* It can't be seeded
* Compositing a hash code from a tree of data has to keep collaring down to 32 bits over and over, putting the burden on "everyone" to know how to do that well
* You can't get away from the preponderance of mediocre functions we're already stuck with, as noted

I'm all for implementing something better than times-31-plus. AutoValue happens to have been using times-1000003-xor for most of its lifespan, and I wouldn't be at all surprised if there's a better default choice. The only thing I'm arguing against is letting users customize the function; I think that would be a mistake.

> Other applications might want to use this API for convenience, or cleaner code, or something, but which might want to use their own hash reduce function in order to preserve compatibility.

Of course, by all that is sensible, either you should never have specified your hashCode() behavior or your users should never have depended on it. Someone did something wrong -- which isn't enough reason to tell those people "you're screwed!", but it seems totally reasonable to me that if you're stuck with your old hash function definition, then you're just stuck with your old hash function implementation too. That's not being screwed, that's just keeping on doing what you were doing.

--
Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com From john.r.rose at oracle.com Thu Apr 25 22:05:39 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 25 Apr 2019 15:05:39 -0700 Subject: Wrapping up the first two courses In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <3788BEC6-74B8-4EA1-AF6E-0AB3EC4688AE@oracle.com> Message-ID: <41C1B3EA-04A1-4B92-89E0-29299ADE5C52@oracle.com> On Apr 25, 2019, at 2:45 PM, John Rose wrote: > > Whether \LineTerminator also gobbles up following > horizontal space is a separate question. But if space > or tab can be escaped, then it's trivial to indicate > a space or tab that should not be gobbled by \LT. > You just escape it. And since LT is a 2D feature, > it is not wrong to consider allowing it to gobble > in 2 dimensions. P.P.S. I skipped a step here. If treated *exactly* like the lexically significant characters " \ then LineTerminator should escape to *itself*. It's a second move to have it escape to something that provides control over program layout, by gobbling non-payload space used only to control format. Such a move is not forced, but it is very likely, given that "\n" is already an escape sequence for a newline, and LineTerminator also (I assume) is translated to a newline. Thus, \LineTerminator is (a) a candidate for escaping given its new status in MLSs, and (b) a further candidate for use in layout control, given pre-existing coverage by \n and common precedent in other languages. 
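For readers comparing this argument with what later shipped, here is a minimal sketch, assuming a Java 15 or later compiler (where text blocks were finalized with a backslash-line-terminator escape and a `\s` escape). The thread had not settled on these semantics at the time, and the shipped escapes do not gobble following horizontal space. The class and method names are illustrative only.

```java
public class EscapedLayoutDemo {
    // A backslash directly before the line terminator removes that newline
    // from the string value; the space before the backslash is kept.
    static String joined() {
        return """
            one \
            two
            """;
    }

    // Trailing spaces on a text-block line are stripped by the compiler;
    // the \s escape is one way to keep a single trailing space.
    static String trailingSpaces() {
        return """
            red\s
            green\s
            """;
    }

    // \t was already an escape, so a significant tab can stay visible in source.
    static String tabbed() {
        return """
            key\tvalue
            """;
    }

    public static void main(String[] args) {
        System.out.println(joined().equals("one two\n"));
        System.out.println(trailingSpaces().equals("red \ngreen \n"));
        System.out.println(tabbed().equals("key\tvalue\n"));
    }
}
```

Each comparison in `main` is expected to print `true` under the stated assumption.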
From cushon at google.com Thu Apr 25 23:29:35 2019
From: cushon at google.com (Liam Miller-Cushon)
Date: Thu, 25 Apr 2019 16:29:35 -0700
Subject: Wrapping up the first two courses
In-Reply-To: <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>
Message-ID:

On Thu, Apr 25, 2019 at 8:56 AM Brian Goetz wrote:

> For 2/3, here's a radical suggestion. Our theory is, a "fat" string is
> one that is co-mingled with the indentation of the surrounding code, and
> one which we usually wish the compiler to disentangle for us. By this
> interpretation, fat single-line strings make no sense, so let's ban them,
> and similarly, text on the first line similarly makes little sense, so
> let's ban that too. In other words, fat strings (with the possible
> exception of the trailing delimiter) must exist within a "Kevin
> Rectangle."

+1

I thought Jim presented a good case for an exception for the trailing delimiter, but otherwise disallowing single-line 'fat' strings (single-line multi-line strings?) seems to mostly have upside.

> For 4 (opt out), I think it is OK to allow a self-stripping escape on the
> first line (e.g., \-), which expands to nothing, but suppresses stripping.
> This effectively becomes a "here doc".

This seems OK to me too, but is there good return on complexity? Closing delimiter influence can also be used to opt out of stripping. Are there enough use-cases to justify a second opt-out mechanism? And does it have to be decided now, or could it be added later?
From kevinb at google.com Fri Apr 26 00:19:02 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 25 Apr 2019 17:19:02 -0700
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>
Message-ID:

I'm sure I'm not saying anything totally new in the following, but this is my summary of why I don't see the necessity of any explicit opt-out like \-.

Suppose I write this:

    String s = """
        some
        lines go here
        """;

And suppose I have learned to picture a rectangle whose left edge is the left edge of the ending delimiter.

Well, once I'm already picturing that rectangle based on the delimiter, then clearly if I leave the delimiter alone, that leaves the rectangle alone. I can change to

    String s = """
          some
        lines go here
        """;

... to insert two spaces before `some`, and I can further change to

    String s = """
          some
          lines go here
        """;

... to also insert two spaces before `lines`.

What is notable to me is that at no point did I ever change from *one kind* of string literal to *another*. There is no feature that I opted in or out of -- because there just doesn't need to be. That to me is a clear and compelling win for simplicity.

It's entirely possible this was all 100% clear already, in which case sorry!

-- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From john.r.rose at oracle.com Fri Apr 26 00:30:23 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Apr 2019 17:30:23 -0700
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>
Message-ID:

On Apr 25, 2019, at 5:19 PM, Kevin Bourrillion wrote:
>
> What is notable to me is that at no point did I ever change from one kind of string literal to another. There is no feature that I opted in or out of -- because there just doesn't need to be. That to me is a clear and compelling win for simplicity.
>
> It's entirely possible this was all 100% clear already, in which case sorry!

It could be this way.
It's logical and simple to explain.

The simple rule for rectangle extraction requires that you read through to the end of the MLS in order to see where the rectangle is. We've discussed alternatives (like a left-hand gutter of single quotes) but none of them appeal to us.

The """\- rule for opting out has the marginal advantage that you can see up front that it's a special literal, where you don't need to read all the way to the end to find out that the rectangle has disappeared. A second marginal advantage is that it also allows the close-quote to be indented along with its enclosing context, instead of hard against the left margin.

From brian.goetz at oracle.com Fri Apr 26 00:42:08 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 25 Apr 2019 20:42:08 -0400
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com>
Message-ID: <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>

So what you're saying is: with CDI, the opt out is: bring the closing delimiter to the left margin, done.

From john.r.rose at oracle.com Fri Apr 26 00:51:43 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Apr 2019 17:51:43 -0700
Subject: Revised semantics for break within switch expression
In-Reply-To:
References:
Message-ID: <62BD2499-30EF-4440-82CC-3D006AEC6967@oracle.com>

On Apr 25, 2019, at 2:11 AM, Gavin Bierman wrote:
>
> However, now we are proposing a completely new break-with statement, perhaps the potential for confusion is reduced? The break-with statement can only be used to produce a value for a switch expression. So there should not be a confusion if that switch expression contains a switch-statement or for/while/do statement. In other words, I would like to propose that we generalise the table to the following (changes in **-**):
>
>                 break-with  break  break-l  continue  return
> switch-e        H           X      X        X         X
> switch-s        **P**       H      P        P         P
> for/while/do    **P**       H      P        H         P
> block           P           P      P        P         P
> labeled         P           P      H*       P         P
> lambda          X           X      X        X         H
> method          X           X      X        X         H
>
> [It is useful to compare with the column for the return statement. This can return a value to the outer method/lambda regardless of any enclosing switch-statement or for/while/do statements.]
>
> What do you think?

I think return : lambda :: break-with : switch-e. And your new table reflects that.

The current design, as you say, makes breakable control flow a no-go zone for break-e, on the grounds that break-e looks too much like break.

In any case, break-with very clearly matches only to expression switches, so it is even less ambiguous than return (which matches both lambdas and methods).

Let 'er fly!

--
John

From john.r.rose at oracle.com Fri Apr 26 00:55:21 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Apr 2019 17:55:21 -0700
Subject: Alignment algorithm (was: Wrapping up the first two courses)
In-Reply-To: <407ded13cf9ab45b06f19a9772632961f9377b6a.camel@vasylenko.uk>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com> <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com> <79fe66f4f541c34abda2148d7a2efa9ae755eb97.camel@vasylenko.uk> <407ded13cf9ab45b06f19a9772632961f9377b6a.camel@vasylenko.uk>
Message-ID: <6ED9FBCF-BDE5-4645-B99C-759E80592DC4@oracle.com>

On Apr 23, 2019, at 1:19 PM, Elias N Vasylenko wrote:
>
>> This is worth considering, but as I've said before, this can't be
>> the only opt-out.
>
> I agree! There needs to be a way to opt out without losing the leading
> newline.

Don't forget that you can ask for a leading newline explicitly using \n. You're not forced to use a layout newline; you can use an escaped one.

    """\nfirst non-blank line
    second non-blank and last line
    """

(More options are possible with a proper \LineTerminator escape.)
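The point about escaped versus layout newlines can be sketched as follows, assuming a Java 15 or later compiler. One difference from the example in the message: the shipped feature requires a line terminator right after the opening delimiter, so this sketch puts `\n` on the first content line instead. The class and method names are illustrative only.

```java
public class LeadingNewlineDemo {
    // Leading newline requested with an explicit \n escape.
    static String escaped() {
        return """
            \nfirst non-blank line
            second non-blank and last line
            """;
    }

    // The same leading newline requested through layout: an empty first line.
    static String layout() {
        return """

            first non-blank line
            second non-blank and last line
            """;
    }

    public static void main(String[] args) {
        // Both spellings are expected to produce the same value.
        System.out.println(escaped().equals(layout()));
        System.out.println(escaped().startsWith("\nfirst"));
    }
}
```

The escaped form keeps the leading newline visible even if a formatter or editor later removes blank lines from the source.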
From john.r.rose at oracle.com Fri Apr 26 00:59:34 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 25 Apr 2019 17:59:34 -0700 Subject: Alignment algorithm (was: Wrapping up the first two courses) In-Reply-To: <407ded13cf9ab45b06f19a9772632961f9377b6a.camel@vasylenko.uk> References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <61B62FDA-D4EB-443C-9A8F-6D39D93F656F@oracle.com> <5ac4a633-a560-c945-86b1-8121a35b60f9@oracle.com> <79fe66f4f541c34abda2148d7a2efa9ae755eb97.camel@vasylenko.uk> <407ded13cf9ab45b06f19a9772632961f9377b6a.camel@vasylenko.uk> Message-ID: On Apr 23, 2019, at 1:19 PM, Elias N Vasylenko wrote: > > But in any case I believe the justification for disabling auto- > alignment when there is a non-empty first line stands for itself, > regardless of how auto-alignment interacts with escapes. FTR I like this approach also. If there's something right after the open-triquote, then there's no rectangle to extract. Separately, we might choose to limit what can go right after the open-triquote, on grounds of readability. (See variations in my long message of earlier today.) 
From james.laskey at oracle.com Fri Apr 26 13:25:12 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Fri, 26 Apr 2019 10:25:12 -0300
Subject: Wrapping up the first two courses
In-Reply-To: <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID:

The example that Kevin left to the imagination was;

    String s = """ some lines go here """;

Which, while awkward, remains a natural progression of CDI, can be interpreted as heredoc, and no other indicator required.

Yes, done.

So we have;

- Must have a line terminator after open delimiter.
    - This allows the addition of directives (super escapes?) after the open delimiter in the future
- Single line fat delimiter strings are illegal
- Close delimiter influence is used to control indentation.
    - It is also used for degrees of opt out. Hard to the left margin is 100% opt out.

Other notion mentioned;

- \ eats the terminator and continues on the next line.
    - I suppose we could have \ to mean continue here
    - This could effectively provide margin control

-- Jim

From brian.goetz at oracle.com Fri Apr 26 13:31:24 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 26 Apr 2019 09:31:24 -0400
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID:

I like this scheme; it is clear and the user can control it reasonably well.

For the record, let me register one problem, not because I think this problem _must_ be solved, but because we should be aware of the choice we're making. And that is: the use of CDI to indicate "opt out of alignment" takes away the user's ability to use CDI to say "no trailing newline." That is, if I want the following string to be indented to the max:

> String s = """ some lines go here """;

the use of the closing delimiter in this way is fine, but then I can't get the equivalent string with no trailing newline. So I can use CDI to influence trailing newline, OR alignment, but not both.

FTR I think this is probably OK.
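The trade-off described above can be checked against the feature as it eventually shipped. This is a minimal sketch, assuming a Java 15 or later compiler; the rules under discussion in this thread were not final, and the last method relies on the backslash-line-terminator escape, which was only a proposal at this point. Class and method names are illustrative only.

```java
public class ClosingDelimiterDemo {
    // Delimiter aligned with the content: no leading spaces survive.
    static String aligned() {
        return """
            some
            lines go here
            """;
    }

    // Delimiter four columns left of the content: four spaces survive per line.
    static String indented() {
        return """
                some
                lines go here
            """;
    }

    // Delimiter on the last content line: no trailing newline, but the
    // delimiter no longer has a column of its own to set the left edge.
    static String noTrailingNewline() {
        return """
            some
            lines go here""";
    }

    // Both at once: the delimiter line sets the left edge, and the trailing
    // backslash removes the final newline.
    static String indentedNoTrailingNewline() {
        return """
                some
                lines go here\
            """;
    }

    public static void main(String[] args) {
        System.out.println(aligned().equals("some\nlines go here\n"));
        System.out.println(indented().equals("    some\n    lines go here\n"));
        System.out.println(noTrailingNewline().equals("some\nlines go here"));
        System.out.println(indentedNoTrailingNewline().equals("    some\n    lines go here"));
    }
}
```

Without an escape, the first three methods show the "trailing newline OR alignment, but not both" limitation; the fourth shows one way an end-of-line escape can lift it.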
From kevinb at google.com Fri Apr 26 15:31:32 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 26 Apr 2019 08:31:32 -0700
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID:

On Thu, Apr 25, 2019 at 5:42 PM Brian Goetz wrote:

> So what you're saying is: with CDI, the opt out is: bring the closing
> delimiter to the left margin, done.

derp, thanks for completing my thought. Basically: some say "opt out" while I say "I choose to locate the left edge of my rectangle at the left edge of the actual source file". "Done" indeed.

On Fri, Apr 26, 2019 at 6:25 AM Jim Laskey wrote:

> The example that Kevin left to the imagination was;
>
> String s = """ some lines go here """;
>
> Which, while awkward, remains a natural progression of CDI, can be
> interpreted as heredoc, and no other indicator required.

I persistently forget that people will sometimes want to do that, to keep their content within their customary source file column limit. (I want to downplay the notion of pasteability because that's precisely a feature of raw strings, which this is not.)

> Other notion mentioned;
>
> - \ eats the terminator and continues on the next line.

Eats the terminator plus all leading whitespace of the next line, yes? I had forgotten that the reintroduction of escapes opened up this possibility, and I think it's pretty great -- quite a substantial fraction (25%+ I think) of all our multiline use cases that we've found are things like long exception messages where they don't actually want the newlines, they just want to be free of dealing with the damn quote-plus-quote.

Oh, and quite a few of *those* use cases are in annotations like @FlagSpec({"--foo", "long help text about --foo"}), and I'm very happy that these are no longer excluded from indentation stripping.

> - I suppose we could have \ to mean continue here
> - This could effectively provide margin control

I have to catch up on this thread to figure out what this means.

I also need to catch up on the issue of what to do with the trailing newline. We can get data on how often our string literals seem to want interior newlines but no trailing one. It would be a bit surprising if the trailing newline is automatically chomped, but at least you have two very simple and obvious ways to restore it (add another line or add `\n`), whereas chomping via library is sad for several reasons (including excluding those @FlagSpecs I mentioned above).

-- Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From brian.goetz at oracle.com Fri Apr 26 15:38:56 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 26 Apr 2019 11:38:56 -0400
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID:

>> Other notion mentioned;
>>
>> - \ eats the terminator and continues on the next line.
>
> Eats the terminator plus all leading whitespace of the next line,
> yes? I had forgotten that the reintroduction of escapes opened up
> this possibility, and I think it's pretty great -- quite a substantial
> fraction (25%+ I think) of all our multiline use cases that we've
> found are things like long exception messages where they don't
> actually want the newlines, they just want to be free of dealing with
> the damn quote-plus-quote.

There are two interpretations here, related to escape-then-align vs align-then-escape. Since everything else is align-then-escape, what this would mean is we'd consider the leading space on the continuation line for purposes of determining a common prefix, and strip the common prefix from that, THEN eat the newline. Example:

    String s = """
        Imagine this line\
        was very long""";

which would result in:

    Imagine this linewas very long

(lack of space between "line" and "was" is not a typo.)

Which raises another question: do we allow \ in SL strings? (I presume so, and we just eat the \ and the terminator.)

> Oh, and quite a few of /those/ use cases are in annotations like
> @FlagSpec({"--foo", "long help text about --foo"}), and I'm very happy
> that these are no longer excluded from indentation stripping.

Can you expand this point?
Not sure what you mean by "no longer excluded from indentation stripping", or why it makes you happy. Can you just give a before/after example for what you mean? From james.laskey at oracle.com Fri Apr 26 15:54:09 2019 From: james.laskey at oracle.com (Jim Laskey) Date: Fri, 26 Apr 2019 12:54:09 -0300 Subject: Wrapping up the first two courses In-Reply-To: References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com> Message-ID: <0E99DDEC-F6F1-43BA-80AE-C92D7C51B433@oracle.com> I was interpreting \ as "the line is to be continued" and \ as "this is where the line continues". This interpretion provides the margin control that Guy was looking for. String s = """ \ This is the first bit \ and the second bit \ and the third bit """; Result: This is the first bit and the second bit and the third bit The string continues after the \, overriding the indentation control of that line. Revisiting the other cases; String s = """ This is the first bit and the second bit and the third bit """; Result: This is the first bit and the second bit and the third bit Default case. String s = """ This is the first bit \ and the second bit \ and the third bit """; Result: This is the first bit and the second bit and the third bit. Not because the next line's white space was skipped after the \, but because of default align. String s = """ This is the first bit \ \ and the second bit \ \ and the third bit """; Result: This is the first bit and the second bit and the third bit. Combining the two escapes. 
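The align-then-escape ordering and the plain continuation cases above can be tried directly. This is a minimal sketch, assuming a Java 15 or later compiler, where a backslash before the line terminator removes the newline after indentation has been stripped. It does not model the second "continue here" escape proposed in this message, which has no direct counterpart in the sketch. Class and method names are illustrative only.

```java
public class ContinuationDemo {
    // Brian's example: the common indentation is stripped first, then the
    // backslash and the newline are removed, so the words run together.
    static String alignThenEscape() {
        return """
            Imagine this line\
            was very long""";
    }

    // Extra indentation on the continuation line is payload under
    // align-then-escape; it is not gobbled along with the newline.
    static String extraIndentKept() {
        return """
            first\
              second""";
    }

    // Plain continuation with a space kept before each backslash.
    static String joinedBits() {
        return """
            This is the first bit \
            and the second bit \
            and the third bit
            """;
    }

    public static void main(String[] args) {
        System.out.println(alignThenEscape().equals("Imagine this linewas very long"));
        System.out.println(extraIndentKept().equals("first  second"));
        System.out.println(joinedBits().equals(
            "This is the first bit and the second bit and the third bit\n"));
    }
}
```

Each comparison in `main` is expected to print `true` under the stated assumption; a design that gobbled leading whitespace after the escape would instead yield `firstsecond` for the second method.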
> On Apr 26, 2019, at 12:31 PM, Kevin Bourrillion wrote:
>
> On Thu, Apr 25, 2019 at 5:42 PM Brian Goetz wrote:
>
> > So what you're saying is: with CDI, the opt out is: bring the closing delimiter to the left margin, done.
>
> derp, thanks for completing my thought. Basically: some say "opt out" while I say "I choose to locate the left edge of my rectangle at the left edge of the actual source file". "Done" indeed.
>
> On Fri, Apr 26, 2019 at 6:25 AM Jim Laskey wrote:
>
> > The example that Kevin left to the imagination was;
> >
> >     String s = """
> >         some
> >         lines go here
> > """;
> >
> > Which, while awkward, remains a natural progression of CDI, can be interpreted as heredoc, and no other indicator required.
>
> I persistently forget that people will sometimes want to do that, to keep their content within their customary source file column limit. (I want to downplay the notion of pasteability because that's precisely a feature of raw strings, which this is not.)
>
> > Other notion mentioned;
> >
> > - \ eats the terminator and continues on the next line.
>
> Eats the terminator plus all leading whitespace of the next line, yes? I had forgotten that the reintroduction of escapes opened up this possibility, and I think it's pretty great -- quite a substantial fraction (25%+ I think) of all our multiline use cases that we've found are things like long exception messages where they don't actually want the newlines, they just want to be free of dealing with the damn quote-plus-quote.
>
> Oh, and quite a few of those use cases are in annotations like @FlagSpec({"--foo", "long help text about --foo"}), and I'm very happy that these are no longer excluded from indentation stripping.
>
> > - I suppose we could have \ to mean continue here
> > - This could effectively provide margin control
>
> I have to catch up on this thread to figure out what this means.
>
> I also need to catch up on the issue of what to do with the trailing newline. We can get data on how often our string literals seem to want interior newlines but no trailing one. It would be a bit surprising if the trailing newline is automatically chomped, but at least you have two very simple and obvious ways to restore it (add another line or add `\n`), whereas chomping via library is sad for several reasons (including excluding those @FlagSpecs I mentioned above).
>
>>> On Apr 25, 2019, at 8:19 PM, Kevin Bourrillion wrote:
>>>
>>> I'm sure I'm not saying anything totally new in the following, but this is my summary of why I don't see the necessity of any explicit opt-out like \-.
>>>
>>> Suppose I write this:
>>>
>>>     String s = """
>>>         some
>>>         lines go here
>>>         """;
>>>
>>> And suppose I have learned to picture a rectangle whose left edge is the left edge of the ending delimiter.
>>>
>>> Well, once I'm already picturing that rectangle based on the delimiter, then clearly if I leave the delimiter alone, that leaves the rectangle alone. I can change to
>>>
>>>     String s = """
>>>           some
>>>         lines go here
>>>         """;
>>>
>>> ... to insert two spaces before `some`, and I can further change to
>>>
>>>     String s = """
>>>           some
>>>           lines go here
>>>         """;
>>>
>>> ... to also insert two spaces before `lines`.
>>>
>>> What is notable to me is that at no point did I ever change from one kind of string literal to another. There is no feature that I opted in or out of -- because there just doesn't need to be. That to me is a clear and compelling win for simplicity.
>>>
>>> It's entirely possible this was all 100% clear already, in which case sorry!
>>>
>>> On Thu, Apr 25, 2019 at 4:30 PM Liam Miller-Cushon wrote:
>>> On Thu, Apr 25, 2019 at 8:56 AM Brian Goetz wrote:
>>>
>>> > For 2/3, here's a radical suggestion. Our theory is, a "fat" string is
>>> > one that is co-mingled with the indentation of the surrounding code, and
>>> > one which we usually wish the compiler to disentangle for us. By this
>>> > interpretation, fat single-line strings make no sense, so let's ban them,
>>> > and similarly, text on the first line makes little sense, so
>>> > let's ban that too. In other words, fat strings (with the possible
>>> > exception of the trailing delimiter) must exist within a "Kevin
>>> > Rectangle."
>>>
>>> +1
>>>
>>> I thought Jim presented a good case for an exception for the trailing
>>> delimiter, but otherwise disallowing single-line 'fat' strings (single-line
>>> multi-line strings?) seems to mostly have upside.
>>>
>>> > For 4 (opt out), I think it is OK to allow a self-stripping escape on the
>>> > first line (e.g., \-), which expands to nothing, but suppresses stripping.
>>> > This effectively becomes a "here doc".
>>>
>>> This seems OK to me too, but is there good return on complexity? Closing
>>> delimiter influence can also be used to opt out of stripping. Are there
>>> enough use-cases to justify a second opt-out mechanism? And does it have to
>>> be decided now, or could it be added later?
>>>
>>> --
>>> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com Fri Apr 26 15:56:48 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 26 Apr 2019 08:56:48 -0700
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID:

On Fri, Apr 26, 2019 at 8:39 AM Brian Goetz wrote:

> There are two interpretations here, related to escape-then-align vs align-then-escape.
> Since everything else is align-then-escape, what this would mean is we'd consider the leading space on the continuation line for purposes of determining a common prefix, and strip the common prefix from that, THEN eat the newline. Example:
>
>     String s = """
>         Imagine this line\
>         was very long""";
>
> which would result in:
>
>     Imagine this linewas very long
>
> (lack of space between "line" and "was" is not a typo.)

Apparently bash's behavior is to replace <backslash, newline, any amount of whitespace> with a single space character, and that at least seems like a *useful* behavior for us too if we're open to it.

> Which raises another question: do we allow \ in SL strings? (I presume so, and we just eat the \ and the terminator.)

Hmm, I can see how that could be harmless but it seems to blur the boundary between the features to me. But I've lost track of why we need triple-quote to be different from single-quote in the first place. *Could* the notion just be that if you newline immediately after opening quote then you are asking for MLS with everything that comes along with that?

> > Oh, and quite a few of *those* use cases are in annotations like @FlagSpec({"--foo", "long help text about --foo"}), and I'm very happy that these are no longer excluded from indentation stripping.
>
> Can you expand this point? Not sure what you mean by "no longer excluded from indentation stripping", or why it makes you happy. Can you just give a before/after example for what you mean?

So sorry: I meant vs. 6 months ago, not that this is new.

I know I complained about our going back to the drawing board then, but where this thread is going now is making us much happier than before.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From kevinb at google.com Fri Apr 26 15:59:37 2019
From: kevinb at google.com (Kevin Bourrillion)
Date: Fri, 26 Apr 2019 08:59:37 -0700
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID:

On Fri, Apr 26, 2019 at 8:56 AM Kevin Bourrillion wrote:

> Apparently bash's behavior is to replace <backslash, newline, any amount of whitespace> with a single space character, and that at least seems like a *useful* behavior for us too if we're open to it.

I was forgetting, when I said this, that *another* substantial minority use case (I want to say at least 15%? These were rough estimates though) for multi-line strings is really long URLs, checksums, etc., that aren't meant to have any spaces in them at all. So the bash behavior is not necessarily what we'd want, although of course consistency with it has some amount of value in itself.

--
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

From brian.goetz at oracle.com Fri Apr 26 16:52:32 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 26 Apr 2019 12:52:32 -0400
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID: <9c2769d5-ce7b-de53-9c8e-b156135047e6@oracle.com>

On 4/26/2019 11:56 AM, Kevin Bourrillion wrote:

> > Which raises another question: do we allow \ in SL strings? (I presume so, and we just eat the \ and the terminator.)
>
> Hmm, I can see how that could be harmless but it seems to blur the boundary between the features to me.

I know what you mean. But, on the other hand, one of the real values of the new approach is that the 'escape language' supported by both kinds of string literals is _identical_; the only differences are the out-of-band characteristics (delimiter) and things that are directly related to 2D-embedding. Having the two diverge gratuitously is accidental complexity. (This issue will come back again when we talk about raw-ness too.)

> But I've lost track of why we need triple-quote to be different from single-quote in the first place. *Could* the notion just be that if you newline immediately after opening quote then you are asking for MLS with everything that comes along with that?

Among other reasons, quotes. Nearly 100% of the ML candidates have embedded quote characters in them; having to escape them would not be very satisfying.

From john.r.rose at oracle.com Fri Apr 26 17:45:06 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 26 Apr 2019 10:45:06 -0700
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID:

On Apr 26, 2019, at 6:31 AM, Brian Goetz wrote:

> the use of the closing delimiter in this way is fine, but then I can't get the equivalent string with no trailing newline. So I can use CDI to influence trailing newline, OR alignment, but not both.

That's a use case for \LineTerminator at the very end.

From james.laskey at oracle.com Fri Apr 26 17:53:33 2019
From: james.laskey at oracle.com (Jim Laskey)
Date: Fri, 26 Apr 2019 14:53:33 -0300
Subject: Wrapping up the first two courses
In-Reply-To:
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com>
Message-ID: <4692F028-3970-4DB0-AAAE-E084EBE88F6D@oracle.com>

Always on the ball John. Clever. :-)

> On Apr 26, 2019, at 2:45 PM, John Rose wrote:
>
> On Apr 26, 2019, at 6:31 AM, Brian Goetz wrote:
>
>> the use of the closing delimiter in this way is fine, but then I can't get the equivalent string with no trailing newline. So I can use CDI to influence trailing newline, OR alignment, but not both.
>
> That's a use case for \LineTerminator at the very end.

From john.r.rose at oracle.com Fri Apr 26 19:51:15 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 26 Apr 2019 12:51:15 -0700
Subject: Wrapping up the first two courses
In-Reply-To: <0E99DDEC-F6F1-43BA-80AE-C92D7C51B433@oracle.com>
References: <846281FA-7231-45DE-9704-0C90DD1B5EE5@oracle.com> <9B0081D0-5C17-4C74-8B79-0F6B3B5597A6@oracle.com> <5CBA64DF.305@oracle.com> <962EB2D4-49E9-4272-826C-FE174DDE5E24@oracle.com> <3985B567-0A79-4C6E-B400-CB1EFA9C24F7@oracle.com> <6B70604F-9AC7-410A-89AA-2E665FC2E141@oracle.com> <8F0DF8BC-B4CE-445E-808A-B4A7B74F7B3E@oracle.com> <0E99DDEC-F6F1-43BA-80AE-C92D7C51B433@oracle.com>
Message-ID:

Yes, that's the sort of thing I have in mind. The sequence \ + LT + (blanks)* would be replaced by the null string when escapes are processed and *after* rectangle extraction. Any space before the \ would be left untouched, since it's not part of the escape sequence. This gives a hook for breaking up long lines and rejoining them as payload either with or without intervening spaces.

The +(spaces)* part above is not strictly necessary but it plays well with the other parts of the syntax, allowing users additional freedom to lay out their program readably. Readability being, of course, more important than writability.

> On Apr 26, 2019, at 8:54 AM, Jim Laskey wrote:
>
> I was interpreting \ as "the line is to be continued" and \ as "this is where the line continues". This interpretation provides the margin control that Guy was looking for.
>
>     String s = """
>         \ This is the first bit
>         \ and the second bit
>         \ and the third bit
>         """;
>
> Result:
>     This is the first bit
>     and the second bit
>     and the third bit
>
> The string continues after the \, overriding the indentation control of that line.
>
> Revisiting the other cases;
>
>     String s = """
>         This is the first bit
>         and the second bit
>         and the third bit
>         """;
>
> Result:
>     This is the first bit
>     and the second bit
>     and the third bit
>
> Default case.
>
>     String s = """
>         This is the first bit \
>         and the second bit \
>         and the third bit
>         """;
>
> Result:
>     This is the first bit and the second bit and the third bit.
>
> Not because the next line's white space was skipped after the \, but because of default align.
>
>     String s = """
>         This is the first bit \
>         \ and the second bit \
>         \ and the third bit
>         """;
>
> Result:
>     This is the first bit and the second bit and the third bit.
>
> Combining the two escapes.

From brian.goetz at oracle.com Sat Apr 27 01:17:54 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 26 Apr 2019 21:17:54 -0400
Subject: Fwd: String reboot proposal
References:
Message-ID: <56DFD402-071D-4FCC-A451-E510EFD5B15D@oracle.com>

Received on the -comments list.

> Begin forwarded message:
>
> From: Victor Nazarov
> Subject: String reboot proposal
> Date: April 16, 2019 at 10:19:06 PM EDT
> To: amber-spec-comments at openjdk.java.net
>
> I want to propose one more design vector for enhanced string literals that
> I haven't seen before.
>
> Basically, I propose two extensions to get to multiline literals with
> incidental white space handling.
>
> Extension1: allow sequences of string literals to denote a single
> concatenated literal
> Extension2: allow string literals that span from an opening token till the
> end of a line
>
> Let's start with an example:
>
>     String j = ""
>         + "public static void " + name + "(String... args) {\n"
>         + " System.out.println(\"Hello,\\t\" + String.join(args));\n"
>         + "}\n";
>
> Extension1: allow sequence of string literals to denote concatenated literal
>
> With this extension we can get rid of plus operators, like this:
>
>     String j = ""
>         "public static void " + name + "(String... args) {\n"
>         " System.out.println(\"Hello,\\t\" + String.join(args));\n"
>         "}\n";
>
> This feature is not so alien: it is present in the C language.
> It may not seem worth it on its own, but I think it can be worth it in combination
> with the second extension.
>
> Extension2: allow string literals that span from an opening token till the
> end of a line.
>
>     String s = "hello\n";
>
> to be written as
>
>     String s =
>         """hello
>         ;
>
> Triple quotes in this case start a string literal which ends at the end of
> the line.
>
> Having these two extensions, we can rewrite our example as:
>
>     String j =
>         "public static void " + name + """(String... args) {
>         """ System.out.println("Hello,\\t" + String.join(args));
>         """}
>         ;
>
> Other examples:
>
>     String sql =
>         """SELECT name
>         """FROM user
>         """WHERE id = ? and role = 'ADMIN'
>         ;
>
>     String json =
>         """{
>         """ "login": "john",
>         """ "id": 123.
>         """}
>         ;
>
>     String html =
>         """
>         """ Hello, World
>         """
>         ;
>
> With this style we can't just copy and paste snippets of foreign code into a
> string literal, but all we need is to prefix every line with triple quotes.
> This style is analogous to how programmers handle source code comments,
> so, I think, it's familiar enough.
>
>     // TODO: rewrite this method
>     // using new guava API
>
> Automatic prefixing of comments is already present in every IDE and is
> simple enough to implement by hand, like
>
>     | sed 's/^/"""/g'
>
> Rawness can be added to such string literals independently, in an orthogonal
> way:
>
>     String regex = \"a(bc)+\s+(de|kg)?"\;
>
>     String j =
>         \"public static void "\ + name + \"""(String... args) {
>         \""" System.out.println("Hello,\t" + String.join(args));
>         \"""}
>         ;
>
> The problem I see is interoperability between simple string literals and
> "line string literals": we can't easily align them vertically.
> But maybe this problem is more easily solvable than magical incidental
> whitespace handling.
>
> --
> Victor Nazarov

From forax at univ-mlv.fr Sat Apr 27 14:59:59 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 27 Apr 2019 16:59:59 +0200 (CEST)
Subject: A library for implementing equals and hashCode
In-Reply-To:
References:
Message-ID: <451200183.2030918.1556377199211.JavaMail.zimbra@u-pem.fr>

After a hacking session, I've got your API and nice performance.
https://github.com/forax/exotic/blob/master/src/test/java/com.github.forax.exotic/com/github/forax/exotic/ObjectSupportExampleTests.java#L12

It works like this: the lambdas are declared Serializable, and calling writeReplace on them gives me the corresponding SerializedLambda object (I stole that idea from Tagir), which gives me the class and the method that contain the lambda body. Then I scan the bytecode to extract the corresponding field name (I look for the pattern aload_0 + getfield).
Once I have the field name, I reconstruct a method handle tree around it to implement either hashCode or equals, and ensure that from a JIT POV everything is a constant.

Rémi

> From: "Liam Miller-Cushon"
> To: "amber-spec-experts"
> Sent: Monday, 22 April 2019 20:29:51
> Subject: A library for implementing equals and hashCode
>
> Please consider this proposal for a library to help implement equals and
> hashCode.
> The doc includes a discussion of the motivation for adding such an API to the
> JDK, a map of the design space, and some thoughts on the subset of that space
> which might be most interesting:
> http://cr.openjdk.java.net/~cushon/amber/equivalence.html

From brian.goetz at oracle.com Sun Apr 28 20:32:05 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 28 Apr 2019 16:32:05 -0400
Subject: String literals: some principles
Message-ID: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com>

I would like to point out a key principle that has guided this second round of exploration on string literals, and mention how it might guide the next round (without actually diving into that round).

Classic string literals, and the new "fat" string literals, are now recognizable as variations on the same feature, each adapted to their niche (single vs multi-line). The "escape language" supported by both is identical (and should stay that way); the only difference is the delimiter, and the handling of artifacts of embedding a snippet of foreign text in a traditionally-indented Java program. (Even their delimiters are similar.) This is a good thing.

Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a "raw" string, but this notion is a fantasy; no string literal is so raw that it can't recognize its closing delimiter. So "rawness" is really only a matter of degree.
We can characterize a string literal language as:

 - Opening delimiter
 - Closing delimiter
 - Escape characters, if any
 - Escape sublanguages, if any

That is, we process ordinary characters until we encounter either the closing delimiter, or one of the escape characters. When we encounter an escape character, we process a "program" from the escape language, and then go back to processing ordinary characters.

Classic string literals have opening and closing delimiters of ", an escape character of \, and an escape language that includes "programs" like:

  n - newline
  t - tab
  0nnn - octal literal
  " - quote character

Fat string literals are the same, except that the opening and closing delimiter are """. But we keep the same escape language. This is valuable.

It is worth asking explicitly: do we want to keep the same escape character too? Guy has suggested offline that we might consider \\\ as the escape character for fat strings. Looking ahead (but please, let's not open this discussion now), one of the tools we have at hand for representing degrees of "raw-ness" is that, as we "strengthen" the delimiter, we also strengthen the escape character at the same rate, but keep the escape language intact. This would allow raw strings to be yet another projection of the same basic string literal feature, while requiring increasingly explicit action on the part of the user to access the escape language.

I bring this up not because I want to talk about raw-ness now (getting the hint?), but because I want to keep all the variations of string literals as lightly-varying projections of the same basic feature. It has come up, for example, that we might treat \ differently in ML strings than in classic strings, but I would prefer that we not tinker with the escape language in nonuniform ways, as this minimizes the variations between the various sub-features.

So I offer this peek down the road as a means of soliciting discussion on the pros and cons of keeping \ as our escape character.
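
The four-part characterization above (opening delimiter, closing delimiter, escape character, escape language) can be sketched as a tiny scanner. This is only an illustration, not anything from the proposal: the class name LiteralModel, the method names, and the choice to support just four escape "programs" are assumptions made for the example, and indentation stripping is deliberately left out.

```java
// Hypothetical sketch: one scanner, parameterized by delimiters and escape
// character, sharing a single escape language across literal flavors.
final class LiteralModel {

    /** Decodes one literal that starts at the beginning of src. */
    static String decode(String src, String open, String close, char escape) {
        if (!src.startsWith(open)) {
            throw new IllegalArgumentException("missing opening delimiter");
        }
        StringBuilder out = new StringBuilder();
        int i = open.length();
        while (i < src.length()) {
            if (src.startsWith(close, i)) {
                return out.toString();           // closing delimiter ends the literal
            }
            char c = src.charAt(i);
            if (c == escape) {
                if (i + 1 >= src.length()) {
                    throw new IllegalArgumentException("dangling escape character");
                }
                out.append(runEscape(src.charAt(i + 1)));  // run one escape "program"
                i += 2;
            } else {
                out.append(c);                   // ordinary character
                i++;
            }
        }
        throw new IllegalArgumentException("missing closing delimiter");
    }

    /** The shared escape language (a small assumed subset: n, t, quote, backslash). */
    static char runEscape(char program) {
        switch (program) {
            case 'n':  return '\n';
            case 't':  return '\t';
            case '"':  return '"';
            case '\\': return '\\';
            default:
                throw new IllegalArgumentException("unknown escape: " + program);
        }
    }
}
```

Calling decode with a one-quote delimiter models a classic literal, and calling it with a three-quote delimiter models a fat one; in both cases runEscape is untouched, which is the point being argued for. Strengthening the escape character would only change the escape argument, not the escape language.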
From guy.steele at oracle.com Mon Apr 29 15:48:05 2019
From: guy.steele at oracle.com (Guy Steele)
Date: Mon, 29 Apr 2019 11:48:05 -0400
Subject: String literals: some principles
In-Reply-To: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com>
References: <21B07F8B-F73A-4CCD-851E-6AC0FD8DCCAF@oracle.com>
Message-ID: <0FBBF6C5-E836-43CD-88CD-BFD803AE6752@oracle.com>

> On Apr 28, 2019, at 4:32 PM, Brian Goetz wrote:
>
> . . .
> Looking ahead to the next round, we can build on this. In the first round, we mistakenly thought that there was something that could reasonably be called a "raw" string, but this notion is a fantasy; no string literal is so raw that it can't recognize its closing delimiter. So "rawness" is really only a matter of degree.

This is _almost_ true.

If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content. Put another way: one cannot determine how long the raw content is by examining it. That's a solid principle.

But there are other ways of determining how long it is. All have this property in common: you have to know how long the content is before you begin to scan it. And this leads to an obvious solution: you need a count of bytes up front.

The design currently under consideration can easily accommodate this, now or in the future: a raw string is an opening delimiter, then a byte count (say, expressed as a decimal integer), then a LineTerminator, then as many bytes as the count indicated, then a LineTerminator, then a closing delimiter (the last two are not really needed, but they look nice, satisfy user expectations, and provide some redundancy to help make sure the byte count was correct).
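
A minimal sketch of how such a count-prefixed literal could be read, under stated assumptions: the class CountedLiteral and its parse method are hypothetical names, the count is taken in Java chars rather than bytes, and only "\n" is treated as a LineTerminator. It is meant to show why the payload never needs to be examined, not to suggest an actual lexer design.

```java
// Hypothetical reader for: """ COUNT \n PAYLOAD \n """
final class CountedLiteral {
    private static final String DELIM = "\"\"\"";

    static String parse(String src) {
        if (!src.startsWith(DELIM)) {
            throw new IllegalArgumentException("missing opening delimiter");
        }
        int eol = src.indexOf('\n', DELIM.length());
        if (eol < 0) {
            throw new IllegalArgumentException("missing line terminator after count");
        }
        // The length is known before any payload character is looked at.
        int count = Integer.parseInt(src.substring(DELIM.length(), eol).trim());
        int start = eol + 1;
        int end = start + count;
        if (count < 0 || end > src.length()) {
            throw new IllegalArgumentException("payload shorter than declared count");
        }
        String payload = src.substring(start, end);   // copied blindly, never scanned
        // Redundancy check: the trailing terminator and delimiter must follow exactly.
        if (!src.startsWith("\n" + DELIM, end)) {
            throw new IllegalArgumentException("count does not match closing delimiter");
        }
        return payload;
    }
}
```

Because the payload is sliced by length, it may itself contain three quote characters or backslashes; the closing delimiter is only used to cross-check the count.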
Examples:

    String PrintableAscii = """95
     !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
    """; // no need to worry about that embedded backslash

    String LotsaQuotes = """50
    """"""""""""""""""""""""""""""""""""""""""""""""""
    """; // the payload cannot be confused with the closing delimiter

    String LineNoise = """16  """; // I pasted in ^H^I^J^K^L^M^N^O^P^Q^R^S^T^U^V^W here; not sure how it will render in your mail reader

The syntax could be further adjusted in arbitrary ways for added clarity: for example

    String PrintableAscii = """RAW DATA (95 bytes):
     !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
    """;

Presumably an IDE could help you make sure the byte count is correct.

--Guy

From amalloy at google.com Mon Apr 29 18:28:20 2019
From: amalloy at google.com (Alan Malloy)
Date: Mon, 29 Apr 2019 11:28:20 -0700
Subject: Feedback on Sealed Types
Message-ID:

Hello again, amber-spec-experts. I have another report from the Google codebase, this time focusing on sealed types. It is viewable in full Technicolor HTML at http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks again to Liam for hosting), and included below as plain text:

Author: Alan Malloy (amalloy at google.com)
Published: 2019-04-29

Feedback on Sealed Types

Hello again, amber-spec-experts. I'm back with a second Google codebase research project. I'm looking again at the Records & Sealed Types proposal (which has now become JDK-8222777), but this time I'm focusing on sealed types instead of records, as promised in my RFC of a few weeks ago. My goal was to investigate Google's codebase to guess what developers might have done differently if they had access to sealed types. This could help us understand what works in the current proposal and what to consider changing. Unlike my previous report, this one contains more anecdotes than statistics.
It wound up being difficult to build static analysis to categorize the interesting cases, so I mostly examined promising candidates by hand.

Summary and Recommendations

For those who don't care to read through all my anecdotes, I first provide a summary of my findings, and one suggested addition.

Sealed types, as proposed so far, are a good idea in theory: Java already has product types and open polymorphism, and sealed types give us closed polymorphism. However, I could not find many cases of code being written today that would be greatly enhanced if sealed types were available. The main selling point of sealed types for application authors is getting help from the compiler with exhaustiveness checking, but in practice developers almost always have a default case, because they are only interested in a subset of the possible subclasses, and want to ignore cases they don't understand. This means that exhaustiveness-checking for pattern matches would mostly go unused if developers rewrote their existing code using sealed types.

Pattern matching is great, and can replace visitors in many cases, but this does not depend on sealed types except for exhaustiveness checks (which, again, would go mostly unused in code written today). The class hierarchies for which people define visitors today are just too large to write an exhaustive pattern match, and so a default case would be very common.

The other audience for sealed types is library authors. While in practice most developers have no great need to forbid subclasses, perhaps it would be a boon for authors of particularly popular libraries, who need to expose a non-final class as an implementation detail but don't intend for consumers to create their own subclasses.
Those authors can already include documentation saying ?you really should not extend this?, but there is always some weirdo out there who will ignore your warnings and then write an angry letter when the next version of your library breaks his program (see: sun.misc.Unsafe). Authors of such libraries would welcome the opportunity to make it truly impossible to create undesirable subclasses. Sealed Types As a Vehicle For Sum Types So, sealed types as-is would be an improvement, but a niche one, used by few. I think we can get substantially more mileage out of them if we also include a more cohesive way to explicitly define a sum type and all its subtypes in one place with minimal ceremony. Such a sum type could be sealed, implicitly or explicitly. A tool like this takes what I see as the ?theoretical? advantage of sum types (closed polymorphism), and makes it ?practical? by putting it front and center. Making sums an actual language element instead of something ?implied? by sealing a type and putting its subclasses nearby could help in a lot of ways: * Developers might more often realize that a sealed/sum type is a good model for their domain. Currently it?s a ?pattern? external to the language instead of a ?feature?, and many don?t realize it could be applied to their domain. Putting it in the language raises its profile, addressing the problem that people don?t realize they want it. * The compiler could provide help for defining simple sums-of-products, while making it possible to opt into more complicated subclasses, in much the way that enums do: the typical enum just has bare constants like EAST, but you can add constructor arguments or override methods when necessary. * The ability to more easily model data in this way may result in developers writing more classes that are amenable to sealing/sums, as they do in other languages with explicit sum types (Haskell, Kotlin, Scala). 
  Then, the exhaustiveness-checking feature that sealed types provide would pull more weight.

Since enum types are "degenerate sum types", the syntax for defining sums can borrow heavily from enums. A sketch of the syntax I imagine for such things (of course, I am not married to it):

    public type-enum interface BinaryTree<T> {
      Leaf {
        @Override public Stream<T> elements() {return Stream.empty();}
      },
      Node(T data, BinaryTree<T> left, BinaryTree<T> right) {
        @Override public Stream<T> elements() {
          return Stream.concat(left.elements(),
            Stream.concat(Stream.of(data), right.elements()));
        }
      };

      public Stream<T> elements();
    }

Like enums, you can use a bare identifier for simple types that exist only to be pattern-matched against, but you can add fields and/or override blocks as necessary. The advantage over declaring a sealed type separately from its elements is both concision (the compiler infers visible records, superclass, and all type parameters) and clarity: you state your intention firmly. I think a convenient syntax like this will encourage developers to use the powerful tool of sealed types to model their data.

Evidence in Google's Codebase

If you are just interested in recommendations, you can stop reading now: they are all included in the summary. What follows is a number of anecdotes, or case studies if you prefer, that led me to the conclusions above. Each shows a type that might have been written as a sealed type, and hopefully highlights a different facet of the question of how sealed types can be useful.

The first thing I looked for was classes which are often involved in instanceof checks.
As language authors, we imagine people writing stuff like this[1] all the time:

    interface Expr {int eval(Scope s);}
    record Var(String name) implements Expr {
      public int eval(Scope s) {return s.get(name);}
    }
    record Sum(Expr left, Expr right) implements Expr {
      public int eval(Scope s) {return left.eval(s) + right.eval(s);}
    }
    class Analyzer {
      Stream<String> variablesUsed(Expr e) {
        if (e instanceof Var) return Stream.of(((Var)e).name);
        if (e instanceof Sum) {
          // Stream.concat is a static method, not an instance method.
          return Stream.concat(variablesUsed(((Sum)e).left),
                               variablesUsed(((Sum)e).right));
        }
        throw new IllegalArgumentException();
      }
    }

Here, the Expr interface captures some of the functionality shared by all expressions, but later a client (Analyzer) came along and invented some other polymorphic operations to perform on an Expr, which Expr did not support. So Analyzer needed to do instanceof checks instead, externalizing the polymorphism. The principled approach would have been for Expr to export a visitor to begin with, but perhaps it wasn't seen as worth the trouble at the time.

To try to find this pattern in the wild, I searched for method bodies which perform multiple instanceof checks against the same variable. Notably, this excludes the typical equals(Object) method, which only performs a single check. For each such variable, I noted:

1. Its declared type
2. The set of subtypes it was checked for with instanceof
3. The common supertype of those subtypes.

I guessed that (3) would usually be the same as (1), but in practice 55% of the time they were different. Often, the declared type was Object, or some generic type variable which erases to Object, while the common supertype being tested was something like Number, Event, or Node. For example, a Container knows it will be used in some context where NaN is unsuitable, so it checks whether its contents are Float or Double, and if so ensures NaN is not stored.
As a second example, a serialize(Object) method checks whether its input is String or ByteString, and throws an exception otherwise.

Bad sealed types found looking at instanceof checks

I looked through the most popular declared types of these candidates, to investigate which types are often involved in such checks. Most of them are not good candidates for a sealed type. Object was the most common type, followed by Exception and Throwable.

Next up is an internal DOMObject class, which sounds promising until I tell you it has thousands of direct subclasses. Nobody is doing exhaustive switches on this, of course. Instead, many uses iterate over a Collection, or receive a DOMObject in some way, and just check whether it is of one or two specific subtypes they care about. This turned out to be a very common pattern, not just for DOMObject, but for many candidate sealed types I found: nobody does exhaustive case analysis. They just look for objects they understand in some much larger hierarchy, and ignore the rest.

Some more humorous types that are often involved in instanceof checks: java.net.InetAddress (everyone wants to know if it's v4 or v6) and com.sun.source.tree.Tree, in our static-analysis tools. Tree is an interesting case: here we do exactly what I mentioned previously for DOMObject. On the surface it seems that Tree would be a good candidate for a sealed interface with record subtypes, but in practice I'm not sure what sealing would buy us. We would effectively opt out of exhaustiveness-checking by having a large default case, or by extending a visitor with default-empty methods. Of course, sometimes we define a new visitor to do some polymorphic operation over a Tree, but more often we just look for one or two subtypes we care about. For example, DataFlow inspects a Tree, but knows from context that it is either a LambdaExpressionTree, MethodTree, or an initializer.
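To make the contrast between exhaustive analysis and the "spot check plus default" habit concrete, here is a minimal sketch of the Expr example. It assumes the sealed-interface, record, and pattern-switch syntax that later shipped in Java 17 and 21, not the syntax under discussion in this thread. The class name SealedExprSketch and the method isVariable are hypothetical, and eval(Scope) is left out to keep the block self-contained.

```java
import java.util.stream.Stream;

// Hypothetical wrapper class so the sketch compiles as a single file.
class SealedExprSketch {
    // Closed hierarchy: only Var and Sum may implement Expr.
    sealed interface Expr permits Var, Sum {}
    record Var(String name) implements Expr {}
    record Sum(Expr left, Expr right) implements Expr {}

    // Exhaustive switch: there is no default branch, so adding a third
    // Expr subtype would make this method fail to compile.
    static Stream<String> variablesUsed(Expr e) {
        return switch (e) {
            case Var v -> Stream.of(v.name());
            case Sum s -> Stream.concat(variablesUsed(s.left()), variablesUsed(s.right()));
        };
    }

    // Spot-check client: cares about one subtype and ignores the rest.
    // The default branch opts out of exhaustiveness checking entirely.
    static boolean isVariable(Expr e) {
        return switch (e) {
            case Var v -> true;
            default -> false;
        };
    }

    public static void main(String[] args) {
        Expr e = new Sum(new Var("x"), new Sum(new Var("y"), new Var("z")));
        System.out.println(variablesUsed(e).toList()); // prints [x, y, z]
        System.out.println(isVariable(e));             // prints false
    }
}
```

Only variablesUsed gets any help from sealing. A client shaped like isVariable compiles the same way whether or not Expr is sealed, which is the situation described above for DOMObject and Tree.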
Plausible sealed types found looking at instanceof checks

The previous section notwithstanding, I did dig deep enough into the results to find a few classes that could make good sealed types. The most prominent, and most interesting, was another AST. There is an abstract Node class for representing HTML documents. It has just 4 subclasses defined in the same file: Text, Comment, Tag, and EndTag. This spartan definition suggests it's used for something like SAX parsing, but I didn't confirm this. It does everything you could hope for from a type like this: it exposes a Visitor, it provides an accept(Visitor) method, and the superclass specifies abstract methods for a couple of the most common things you would want to do, such as a String toHtml() method.

However, recall that I found this class by looking for classes often involved in instanceof checks! Some people use the visitor, but why doesn't everyone? The first reason I found is one I've mentioned several times already: clients only care about one of the 4 cases, and may have felt creating an anonymous visitor is too much ceremony. Would they be happy with a switch and a default clause? Probably, but it's hard to know for sure.

The second reason surprised me a bit: I found clients doing analysis that isn't really amenable to any one visitor, or a simple pattern-match. They've written this:

    if (mode1) {
      if (x instanceof Tag) {...}
    } else if (mode2) {
      if (x instanceof Text) {...}
    }

The same use site cares about different subclasses at different times, depending on some other flag(s) controlling its behavior. Even if we offered a pattern-match on x, it's difficult to encode the flags correctly. They would have to match on a tuple of (mode1, mode2, x), with a case for (true, _, Tag name) and another for (false, true, Text text). Technically possible, but not really prettier than what they already have, especially since you would need to use a local record instead of an anonymous tuple.
Even so, I think this would have benefited from being a sealed type. Recall that earlier I carefully said "4 subclasses defined in the same file". This is because some jokester in a different package altogether has defined their own fifth subclass, Doctype. They have their own sub-interface of Visitor that knows about Doctype nodes. I can't help but feel that the authors of Node would have preferred to make this illegal, if they had been able to.

The second good sealed type I found is almost an enum, except that one of the instances has per-instance data. This is not exactly a surprise, since an enum is a degenerate sum type, and one way to think of sealed types is as a way to model sums. It looks something like this[2]:

    public abstract class DbResult<T> {
      public record NoDatabase<T>() extends DbResult<T>;
      public record RowNotFound<T>() extends DbResult<T>;
      // Four more error types ...
      public record EmptySuccess<T>() extends DbResult<T>;
      public record SuccessWithData<T>(T data) extends DbResult<T>;

      public T getData() {
        if (!(this instanceof SuccessWithData))
          throw new DbException();
        return ((SuccessWithData<T>)this).data;
      }
      public <U> DbResult<U> transform(Function<T, U> f) {
        if (!(this instanceof SuccessWithData)) {
          return (DbResult<U>)this;
        }
        return new SuccessWithData<U>(f.apply(
            ((SuccessWithData<T>)this).data));
      }
    }

Reading this code made me yearn for Haskell: here is someone who surely wanted to write

    data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t

but had to spend 120 lines defining their sum-of-products (the extra verbosity is because really they made the subclasses private, and defined private static singletons for each of the error types, with a static getter to get the type parameter right). This seems like a potential win for records and for sealed types. Certainly my snippet was much shorter than the actual source file because the proposed record syntax is quite concise, so that is a real win. But what do we really gain from sealing this type?
Still nobody does exhaustive analysis even of this relatively small type: they just use functions like getData and transform to work with the result generically, or spot-check a couple of interesting subtypes with instanceof. Forbidding subclassing from other packages hardly matters: nobody was subclassing it anyway, nor would they be tempted to. Really the improvements DbResult benefits most from are records, and pattern-matching on records. It would be much nicer to replace the instanceof/cast pattern with a pattern-match that extracts the relevant field.

This is the use case that inspired my idea of a type-enum, in the Summary section above. Rewriting it as a type-enum eliminates many of the problems: all the instanceof checks are gone, we don't need a bunch of extra keywords for each case, and we're explicit about the subclasses "belonging to" the sealed parent, which means we get stuff like extends and <T> for free. We get improved clarity by letting the definition of the class hierarchy reflect its "nature" as a sum.

    public abstract type-enum DbResult<T> {
      NoDatabase,
      RowNotFound,
      EmptySuccess,
      SuccessWithData(T data) {
        @Override public T getData() {
          return data;
        }
        @Override public <U> DbResult<U> transform(Function<T, U> f) {
          return new SuccessWithData<U>(f.apply(data));
        }
      }

      public T getData() {
        throw new DbException();
      }
      public <U> DbResult<U> transform(Function<T, U> f) {
        return (DbResult<U>)this;
      }
    }

Visitors

Instead of doing a bunch of instanceof checks, the "sophisticated" way to interact with a class having a small, known set of subtypes is with a visitor. I considered doing some complicated analysis to characterize what makes a class a visitor, and trying to automatically cross-reference visitors to the classes they visit...but in practice simply looking for classes with "Visitor" in their name was a strong enough signal that a more complicated approach was not needed.
Having identified visitors, I looked at those visitors with the most subclasses, since each distinct subclass corresponds to one "interaction" with the sealed type that it visits, and well-used visitors suggest both popularity and good design.

One common theme I found: developers aren't good at applying the visitor pattern. Many cases I found had some weird and inexplicable quirk compared to the "standard" visitor. These developers will be relieved to get pattern-matching syntax so they can stop writing visitors.

The Visiting Object

The first popular visitor I found was a bit odd to me. It's another tree type, but with a weird amalgam of several visitors, and an unusual approach to its double dispatch. I have to include a relatively lengthy code snippet to show all of its facets:

    public static abstract class Node {
      public interface Visitor {
        boolean process(Node node);
      }
      public boolean visit(Object v) {
        return v instanceof Visitor
            && ((Visitor)v).process(this);
      }
      // Other methods common to all Nodes ...
    }
    public static final class RootNode extends Node {
      public interface Visitor {
        boolean processRoot(RootNode node);
      }
      @Override public boolean visit(Object v) {
        return v instanceof Visitor
            ? ((Visitor)v).processRoot(this)
            : super.visit(v);
      }
      // Other stuff about root nodes ...
    }
    public static abstract class ValueNode extends Node {
      public interface Visitor {
        boolean processValue(ValueNode node);
      }
      @Override public boolean visit(Object v) {
        return v instanceof Visitor
            ? ((Visitor)v).processValue(this)
            : super.visit(v);
      }
    }
    public static final class BooleanNode extends ValueNode {
      public interface Visitor {
        boolean processBool(BooleanNode node);
      }
      @Override public boolean visit(Object v) {
        return v instanceof Visitor
            ? ((Visitor)v).processBool(this)
            : super.visit(v);
      }
      // Other stuff about booleans ...
    }
    public static final class StringNode extends ValueNode {
      // Much the same as BooleanNode
    }

This goes on for some time: there is a multi-layered hierarchy of dozens of node types, each with a boolean visit(Object) method, and their own distinct Visitor interface, in this file. I should note that this code is actually not written by a human, but rather generated by some process (I didn't look into how). I still think it is worth mentioning here for two reasons: first, whoever wrote the code generator would probably do something similar if writing it by hand, and second because these visitors are used often by hand-written code.

Speaking of hand-written code, visitor subclasses now get to declare ahead of time exactly which kinds of nodes they care about, by implementing only the appropriate Visitor interfaces:

    private class FooVisitor implements StringNode.Visitor,
        BooleanNode.Visitor, RootNode.Visitor {
      // ...
    }

This isn't how I would have written things, but I can sorta see the appeal, if you don't have to write it all by hand: a visitor can choose to handle any one subclass of ValueNode, or all ValueNodes, or just RootNode and StringNode, et cetera. They get to pick and choose what sub-trees of the inheritance tree they work with.

Would Node be a good sealed class? Maybe. It clearly intends to enumerate all subclasses, but the benefit it gets from enforcing that is minimal. As in my previous examples, the main advantage for Node implementors would come from records, and the main advantage for clients would come from pattern-matching, obviating their need for this giant visitor.
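As a rough illustration of how pattern-matching could stand in for the per-node Visitor interfaces, the following sketch assumes Java 21 syntax (sealed interfaces, records, pattern switch), which postdates this thread. It models the hierarchy with interfaces and records rather than the abstract classes of the real code, because records cannot extend classes. The record components (children, value) and the describe method are hypothetical.

```java
import java.util.List;

// Hypothetical wrapper class so the sketch compiles as a single file.
class NodeSwitchSketch {
    // Two-level closed hierarchy mirroring Node > ValueNode > {BooleanNode, StringNode}.
    sealed interface Node permits RootNode, ValueNode {}
    record RootNode(List<Node> children) implements Node {}
    sealed interface ValueNode extends Node permits BooleanNode, StringNode {}
    record BooleanNode(boolean value) implements ValueNode {}
    record StringNode(String value) implements ValueNode {}

    // The client picks which part of the inheritance tree it cares about:
    // StringNode specifically, any other ValueNode generically, and RootNode.
    // The switch is exhaustive without a default branch.
    static String describe(Node node) {
        return switch (node) {
            case StringNode s -> "string " + s.value();
            case ValueNode v -> "other value";
            case RootNode r -> "root with " + r.children().size() + " children";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new StringNode("a")));
        System.out.println(describe(new BooleanNode(true)));
        System.out.println(describe(new RootNode(List.of())));
    }
}
```

The case ordering plays the role of FooVisitor's choice of interfaces: a more specific case (StringNode) must precede the broader one (ValueNode), and a client that only cares about a few node types would add a default branch instead of listing the rest.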
The Enumerated Node

Another AST, this time for some kind of query language, explicitly declares an enum of all subclasses it can have, and uses this enum instead of using traditional double-dispatch:

    public interface Node {
      enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
      Kind getKind();
      Location getLocation();
    }
    public abstract record AbstractNode(Location l) implements Node {}
    public class Expr extends AbstractNode {
      public Kind getKind() {return EXPR;}
      // ...
    }
    // And so on for other Kinds ...
    public abstract class Visitor {
      // Empty default implementations, not abstract.
      public Expr visitExpr(Expr e) {}
      public Query visitQuery(Query q) {}
      public Import visitImport(Import i) {}
      public Node visit(Node n) {
        switch (n.getKind()) {
          case EXPR: return visitExpr((Expr)n);
          case QUERY: return visitQuery((Query)n);
          case IMPORT: return visitImport((Import)n);
          // ...
        }
      }
    }

It's not really clear to me why they do it this way, instead of putting an accept(Visitor) method on Node. They gain the ability to return different types for each Node subtype, but are hugely restricted in what visitors can do: they must return a Node, instead of performing an arbitrary computation. It seems like the idea is visitors must specialize to tree rewriting, but I still would have preferred to parameterize the visitor by return type.

Would this be better as a sealed type? I feel sure that if sealed types existed, the authors of this class would have used one. We could certainly do away with the enum, and use an expression-switch instead to pattern-match in the implementation of visit(Node). But I think the Visitor class would still exist, and still have separate methods for each Node subtype, because the developer seemed to care about specializing the return type. The only place where an exhaustiveness check helps would be in the visit(Node) method, inside the visitor class itself.
All other dispatch goes through visit(Node), or through one of the specialized visitor methods if the type is known statically. It seems like overall this would be an improvement, but again, the improvement comes primarily from pattern-matching, not sealing.

Colocated interface implementations

Finally, I looked for interfaces having all of their implementations defined in the same file. On this I do have some statistical data[3]. A huge majority (98.5%) of public interfaces have at least one implementation in a different source file. Package-private interfaces also tend to have implementations in other files: 85% of them are in this category. For protected interfaces it's much closer: only 53% have external implementations. Of course, all private interfaces have all implementations in a single file.

Next, I looked at interfaces that share a source file with all their implementations, to see whether they'd make good sealed types. First was this Entry class:

    public interface Entry {
      enum Status {OK, PENDING, FAILED}
      Status getStatus();
      int size();
      String render();
    }
    public class UserEntry implements Entry {
      private User u;
      private Status s;
      public UserEntry(User u, Status s) {
        this.u = u;
        this.s = s;
      }
      @Override public String render() {return u.name();}
      @Override public int size() {return 1;}
      @Override public Status getStatus() {return s;}
    }
    public class AccountEntry implements Entry {
      private Account a;
      private Status s;
      public AccountEntry(Account a, Status s) {
        this.a = a;
        this.s = s;
      }
      @Override public String render() {return a.render();}
      @Override public int size() {return a.size();}
      @Override public Status getStatus() {return s;}
    }

A huge majority of the clients of this Entry interface treat it polymorphically, just calling its interface methods. In only one case is there an instanceof check made on an Entry, dispatching to different methods depending on which subclass is present. Is this a good sealed type? I think not, really.
There are two implementations now, but perhaps there will be a GroupEntry someday. Existing clients should continue to work in that case: the polymorphic Entry interface provides everything clients are "intended" to know.

Another candidate for sealing:

    public interface Request {/* Empty */}
    public record RequestById(int id) implements Request;
    public record RequestByValue(String owner, boolean historic)
        implements Request;
    public class RequestFetcher {
      public List fetch(Iterable requests) {
        List idReqs = Lists.newArrayList();
        List valueReqs = Lists.newArrayList();
        List queries = Lists.newArrayList();
        for (Request req : requests) {
          if (req instanceof RequestById) {
            idReqs.add((RequestById)req);
          } else if (req instanceof RequestByValue) {
            valueReqs.add((RequestByValue)req);
          }
        }
        queries.addAll(prepareIdQueries(idReqs));
        queries.addAll(prepareValueQueries(valueReqs));
        return runQueries(queries);
      }
    }

Interestingly, since the Request interface is empty, the only way to do anything with this class is to cast it to one implementation type. In fact, the RequestFetcher I include here is the only usage of either of these classes (plus, of course, helpers like prepareIdQueries). So, clients need to know about specific subclasses, and want to be sure they're doing exhaustive pattern-matching. Seems like a great sealed class to me. Except...actually each of the two subclasses has been extended by a decorator adding a source[4]:

    public record SourcedRequestById(Source source)
        extends RequestById;
    public record SourcedRequestByValue(Source source)
        extends RequestByValue;

Does this argue in favor of sealing, or against? I don't really know. The owners of Request clearly intended for all four of these subclasses to exist (they're in the same package), so they could include them all in the permitted subtype list, but it seems like a confusing API to expose to clients.
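For the two undecorated request types, an exhaustive match might look roughly like the sketch below. It assumes Java 21 (sealed interfaces and record patterns in switch), which is newer than the proposal discussed in this thread. The describe method and its string output are hypothetical stand-ins for the real query preparation, and the Sourced decorators are deliberately left out, since records cannot be extended.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class so the sketch compiles as a single file.
class RequestSketch {
    // The empty marker interface becomes a closed sum of two records.
    sealed interface Request permits RequestById, RequestByValue {}
    record RequestById(int id) implements Request {}
    record RequestByValue(String owner, boolean historic) implements Request {}

    // Stand-in for RequestFetcher.fetch: builds one descriptive string per request.
    // No default branch: a third Request subtype would be a compile error here.
    static List<String> describe(Iterable<Request> requests) {
        List<String> out = new ArrayList<>();
        for (Request req : requests) {
            out.add(switch (req) {
                case RequestById(int id) -> "by-id:" + id;
                case RequestByValue(String owner, boolean historic) ->
                    "by-value:" + owner + (historic ? ":historic" : "");
            });
        }
        return out;
    }

    public static void main(String[] args) {
        List<Request> reqs = List.of(new RequestById(7), new RequestByValue("ann", true));
        System.out.println(describe(reqs));
    }
}
```

Unlike the instanceof chain in the original fetch method, which silently skips any unrecognized Request, this switch cannot compile if a new permitted subtype is added without a matching case.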
A third candidate for sealing is another simple sum type:

    public interface ValueOrAggregatorException<T> {
      T get();
      public static <T> ValueOrAggregatorException<T> of(T value) {
        return new OfValue(value);
      }
      public static <T> ValueOrAggregatorException<T>
          ofException(AggregatorException err) {
        return new OfException(err);
      }
      private record OfValue<T>(T value)
          implements ValueOrAggregatorException<T> {
        @Override public T get() {return value;}
      }
      private record OfException<T>(AggregatorException err)
          implements ValueOrAggregatorException<T> {
        @Override public T get() {throw err;}
      }
    }

It has only two subtypes, and it seems unimaginable there could ever be a third, so why not seal it? However, the subtypes are intentionally hidden: it is undesirable to let people see whether there's an exception, except by having it thrown at you. In fact AggregatorException is documented as "plugins may throw this, but should never catch it": there is some higher-level thing responsible for catching all such exceptions. So, this type gains no benefit from exhaustiveness checks in pattern-matching. The type is intended to be used polymorphically, through its interface method, even though its private implementation is amenable to sealing.

________________

[1] Throughout this document I will use record syntax as if it were already in the language. This is merely for brevity, and to avoid making the reader spend a lot of time reading code that boils down to just storing a couple fields. In practice, of course the code in Google's codebase either defines the records by hand, or uses an @AutoValue.

[2] Recall that @AutoValue, Google's "record", allows extending a class, which is semantically okay here: DbResult has no state, only behavior.

[3] This data is imperfect. While the Google codebase strongly discourages having more than one version of a module checked in, there is still some amount of "vendoring" or checking in multiple versions of some package, e.g. for supporting external clients of an old version of an API.
As a result, two "different files" which are really copies of each other may implement interfaces with the same fully-qualified name; I did not attempt to control for this case, and so such cases may look like they were in the same file, or not.

[4] Of course in the record proposal it is illegal to extend records like this; in real life these plain data carriers are implemented by hand as ordinary classes, so the subtyping is legal.

From brian.goetz at oracle.com Mon Apr 29 19:34:54 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 29 Apr 2019 15:34:54 -0400
Subject: Feedback on Sealed Types
In-Reply-To:
References:
Message-ID:

Thanks Alan, for this nice exploration. There's a lot to respond to. I'll start with some general comments about sealing, and then move on to your alternate proposal for exposing it.

I can think of several main reasons why you would want to seal a hierarchy.

- To say something about the _interface itself_. That is, that it was not designed as a general purpose contract between arms-length entities, but that it exists only as a common super type for a fixed set of classes that are part of the library. In other words, "please don't implement me."
- To say something about the semantics of the type. Several of the examples in your report fall into this category: "a DbResult is either a NoRowsFound or a Rows(List)". This tells users exactly what the reasonable range of results to expect are when doing a query. Of course, the spec could say the same thing, but that involves reading and interpreting the spec. Easier if this conclusion can be driven by types (and IDEs can help more here too.)
- To strengthen typing by simulating unions. If my method is going to return either a String or a Number, the common super type is Object. (Actually, it's some variant of Serializable & Comparable.) Sums-of-products allow library authors to make a stronger statement about types in the presence of unions.
  Exposing a sum of StringHolder(String) and NumberHolder(Number), using records and sealed types, is not so ceremonious, so some library developers might choose to do this instead of Object.
- Security. Some libraries may want to know that the code they are calling is part of their library, rather than an arbitrary implementation of some interface.
- To aid in exhaustiveness. We've already discussed this at length; your point is that this one doesn't come up as often as one might hope.

Not only is there an obvious synergy between sums and products (as many languages have demonstrated), but there is a third factor, which is "if you make it easy enough, people will use it more." Clearly records are pretty easy to use; your point is that if there were a more streamlined sum-of-products idiom, the third factor would be even stronger here. I think algebraic data types are one of those things that will take some time for developers to learn to appreciate; the easier we make it, of course the faster that will happen.

Now, to your syntax suggestion. Overall, I like the idea, but I have some concerns. First, the good parts:

- The connection with enums is powerful. Users already understand enums, so this will help them understand sums. Enums have special treatment in switch; we want the same treatment for sealed type patterns. Enums have special treatment for exhaustiveness; we want the same for sealed type patterns. So tying these together with some more general enum-ness leans on what people already know.
- While sums and products are theoretically independent features, sums-of-products are expected to be quite common. So it might be reasonable to encourage this syntactically.
- The current proposal has some redundancy, in that the subtypes have to say "implements Node", even if they are nested within Node. With a stronger mechanism for declaring them, as you propose, then that can safely be left implicit.
- I confess that I too like the simplicity of Haskell's `data` declaration, and this brings us closer.

Now, the challenges:

- The result is still a little busy. We need a modifier for "enumerated type", and we would also need to be able to have child types be not only records, but ordinary classes and interfaces. So we'd have to have a place for "record", "class", or "interface" with the declaration of the enumerated classes (as well as other modifiers.) That busies up the result a bit.
- Once we do this, I worry that it will be hard to tell the difference between:

    interface X {
      class X1 { ... }
      class X2 { ... }
    }

  and

    enumerated interface Y {
      class Y1 { ... },

On Apr 29, 2019, at 2:28 PM, Alan Malloy wrote:
>
> Hello again, amber-spec-experts. I have another report from the Google codebase, this time focusing on sealed types. It is viewable in full Technicolor HTML at http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks again to Liam for hosting), and included below as plain text:
>
> Author: Alan Malloy (amalloy at google.com )
> Published: 2019-04-29
>
> Feedback on Sealed Types
>
> Hello again, amber-spec-experts. I'm back with a second Google codebase research project. I'm looking again at the Records & Sealed Types proposal (which has now become JDK-8222777), but this time I'm focusing on sealed types instead of records, as promised in my RFC of a few weeks ago. My goal was to investigate Google's codebase to guess what developers might have done differently if they had access to sealed types. This could help us understand what works in the current proposal and what to consider changing.
>
> Unlike my previous report, this one contains more anecdotes than statistics. It wound up being difficult to build static analysis to categorize the interesting cases, so I mostly examined promising candidates by hand.
> > Summary and Recommendations > > For those who don?t care to read through all my anecdotes, I first provide a summary of my findings, and one suggested addition. > > Sealed types, as proposed so far, are a good idea in theory: Java already has product types and open polymorphism, and sealed types give us closed polymorphism. However, I could not find many cases of code being written today that would be greatly enhanced if sealed types were available. The main selling point of sealed types for application authors is getting help from the compiler with exhaustiveness checking, but in practice developers almost always have a default case, because they are only interested in a subset of the possible subclasses, and want to ignore cases they don?t understand. This means that exhaustiveness-checking for pattern matches would mostly go unused if developers rewrote their existing code using sealed types. > > Pattern matching is great, and can replace visitors in many cases, but this does not depend on sealed types except for exhaustiveness checks (which, again, would go mostly unused in code written today). The class hierarchies for which people define visitors today are just too large to write an exhaustive pattern match, and so a default case would be very common. > > The other audience for sealed types is library authors. While in practice most developers have no great need to forbid subclasses, perhaps it would be a boon for authors of particularly popular libraries, who need to expose a non-final class as an implementation detail but don?t intend for consumers to create their own subclasses. Those authors can already include documentation saying ?you really should not extend this?, but there is always some weirdo out there who will ignore your warnings and then write an angry letter when the next version of your library breaks his program (see: sun.misc.Unsafe). 
Authors of such libraries would welcome the opportunity to make it truly impossible to create undesirable subclasses. > > Sealed Types As a Vehicle For Sum Types > > So, sealed types as-is would be an improvement, but a niche one, used by few. I think we can get substantially more mileage out of them if we also include a more cohesive way to explicitly define a sum type and all its subtypes in one place with minimal ceremony. Such a sum type could be sealed, implicitly or explicitly. A tool like this takes what I see as the ?theoretical? advantage of sum types (closed polymorphism), and makes it ?practical? by putting it front and center. Making sums an actual language element instead of something ?implied? by sealing a type and putting its subclasses nearby could help in a lot of ways: > > * Developers might more often realize that a sealed/sum type is a good model for their domain. Currently it?s a ?pattern? external to the language instead of a ?feature?, and many don?t realize it could be applied to their domain. Putting it in the language raises its profile, addressing the problem that people don?t realize they want it. > * The compiler could provide help for defining simple sums-of-products, while making it possible to opt into more complicated subclasses, in much the way that enums do: the typical enum just has bare constants like EAST, but you can add constructor arguments or override methods when necessary. > * The ability to more easily model data in this way may result in developers writing more classes that are amenable to sealing/sums, as they do in other languages with explicit sum types (Haskell, Kotlin, Scala). Then, the exhaustiveness-checking feature that sealed types provide would pull more weight. > > Since enum types are ?degenerate sum types?, the syntax for defining sums can borrow heavily from enums. 
A sketch of the syntax I imagine for such things (of course, I am not married to it): > public type-enum interface BinaryTree { > Leaf { > @Override public Stream elements() {return Stream.empty();} > }, > Node(T data, BinaryTree left, BinaryTree right) { > @Override public Stream elements() { > return Stream.concat(left.elements(), > Stream.concat(Stream.of(data), right.elements())); > } > }; > > > public Stream elements(); > } > > Like enums, you can use a bare identifier for simple types that exist only to be pattern-matched against, but you can add fields and/or override blocks as necessary. The advantage over declaring a sealed type separately from its elements is both concision (the compiler infers visible records, superclass, and all type parameters) and clarity: you state your intention firmly. I think a convenient syntax like this will encourage developers to use the powerful tool of sealed types to model their data. > > Evidence in Google?s Codebase > > If you are just interested in recommendations, you can stop reading now: they are all included in the summary. What follows is a number of anecdotes, or case studies if you prefer, that led me to the conclusions above. Each shows a type that might have been written as a sealed type, and hopefully highlights a different facet of the question of how sealed types can be useful. > > The first thing I looked for was classes which are often involved in instanceof checks. 
As language authors, we imagine people writing stuff like this[1] all the time:
>
> interface Expr {int eval(Scope s);}
> record Var(String name) implements Expr {
>   public int eval(Scope s) {return s.get(name);}
> }
> record Sum(Expr left, Expr right) implements Expr {
>   public int eval(Scope s) {return left.eval(s) + right.eval(s);}
> }
> class Analyzer {
>   Stream<String> variablesUsed(Expr e) {
>     if (e instanceof Var) return Stream.of(((Var)e).name);
>     if (e instanceof Sum) {
>       return Stream.concat(variablesUsed(((Sum)e).left),
>           variablesUsed(((Sum)e).right));
>     }
>     throw new IllegalArgumentException();
>   }
> }
>
> Here, the Expr interface captures some of the functionality shared by all expressions, but later a client (Analyzer) came along and invented some other polymorphic operations to perform on an Expr, which Expr did not support. So Analyzer needed to do instanceof checks instead, externalizing the polymorphism. The principled approach would have been for Expr to export a visitor to begin with, but perhaps it wasn't seen as worth the trouble at the time.
>
> To try to find this pattern in the wild, I searched for method bodies which perform multiple instanceof checks against the same variable. Notably, this excludes the typical equals(Object) method, which only performs a single check. For each such variable, I noted:
>
> 1. Its declared type
> 2. The set of subtypes it was checked for with instanceof
> 3. The common supertype of those subtypes.
>
> I guessed that (3) would usually be the same as (1), but in practice 55% of the time they were different. Often, the declared type was Object, or some generic type variable which erases to Object, while the common supertype being tested was something like Number, Event, or Node. For example, a Container knows it will be used in some context where NaN is unsuitable, so it checks whether its contents are Float or Double, and if so ensures NaN is not stored.
As a second example, a serialize(Object) method checks whether its input is String or ByteString, and throws an exception otherwise.
>
> Bad sealed types found looking at instanceof checks
>
> I looked through the most popular declared types of these candidates, to investigate which types are often involved in such checks. Most of them are not good candidates for a sealed type. Object was the most common type, followed by Exception and Throwable.
>
> Next up is an internal DOMObject class, which sounds promising until I tell you it has thousands of direct subclasses. Nobody is doing exhaustive switches on this, of course. Instead, many uses iterate over a Collection, or receive a DOMObject in some way, and just check whether it is of one or two specific subtypes they care about. This turned out to be a very common pattern, not just for DOMObject, but for many candidate sealed types I found: nobody does exhaustive case analysis. They just look for objects they understand in some much larger hierarchy, and ignore the rest.
>
> Some more humorous types that are often involved in instanceof checks: java.net.InetAddress (everyone wants to know if it's v4 or v6) and com.sun.source.tree.Tree, in our static-analysis tools. Tree is an interesting case: here we do exactly what I mentioned previously for DOMObject. On the surface it seems that Tree would be a good candidate for a sealed interface with record subtypes, but in practice I'm not sure what sealing would buy us. We would effectively opt out of exhaustiveness-checking by having a large default case, or by extending a visitor with default-empty methods. Of course, sometimes we define a new visitor to do some polymorphic operation over a Tree, but more often we just look for one or two subtypes we care about. For example, DataFlow inspects a Tree, but knows from context that it is either a LambdaExpressionTree, MethodTree, or an initializer.
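
The "large default case" point can be made concrete with a small sketch. It assumes sealed interfaces and pattern-matching switch in the form that later shipped in Java 21, which postdates this thread. Tree, LambdaTree, MethodTree, LiteralTree, BlockTree and SpotCheck are hypothetical stand-ins invented for illustration; they are not the real com.sun.source.tree types.

```java
import java.util.List;

// Hypothetical miniature of a large AST hierarchy (not com.sun.source.tree).
sealed interface Tree permits LambdaTree, MethodTree, LiteralTree, BlockTree {}
record LambdaTree(String body) implements Tree {}
record MethodTree(String name) implements Tree {}
record LiteralTree(Object value) implements Tree {}
record BlockTree(List<Tree> statements) implements Tree {}

final class SpotCheck {
    // A client that only cares about two of the subtypes.
    static String entryPoint(Tree t) {
        return switch (t) {
            case LambdaTree l -> "lambda";
            case MethodTree m -> "method " + m.name();
            // The default branch covers every other subtype, present or future,
            // so the compiler has nothing left to check for exhaustiveness.
            default -> "not interesting";
        };
    }
}
```

Adding a fifth permitted subtype to Tree would not produce a compile error in entryPoint, because default already covers it. That is the sense in which such a client has opted out of exhaustiveness checking.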
>
> Plausible sealed types found looking at instanceof checks
>
> The previous section notwithstanding, I did dig deep enough into the results to find a few classes that could make good sealed types. The most prominent, and most interesting, was another AST. There is an abstract Node class for representing HTML documents. It has just 4 subclasses defined in the same file: Text, Comment, Tag, and EndTag. This spartan definition suggests it's used for something like SAX parsing, but I didn't confirm this. It does everything you could hope for from a type like this: it exposes a Visitor, it provides an accept(Visitor) method, and the superclass specifies abstract methods for a couple of the most common things you would want to do, such as a String toHtml() method.
>
> However, recall that I found this class by looking for classes often involved in instanceof checks! Some people use the visitor, but why doesn't everyone? The first reason I found is one I've mentioned several times already: clients only care about one of the 4 cases, and may have felt creating an anonymous visitor is too much ceremony. Would they be happy with a switch and a default clause? Probably, but it's hard to know for sure. The second reason surprised me a bit: I found clients doing analysis that isn't really amenable to any one visitor, or a simple pattern-match. They've written this:
>
> if (mode1) { if (x instanceof Tag) {...} }
> else if (mode2) { if (x instanceof Text) {...}}
>
> The same use site cares about different subclasses at different times, depending on some other flag(s) controlling its behavior. Even if we offered a pattern-match on x, it's difficult to encode the flags correctly. They would have to match on a tuple of (mode1, mode2, x), with a case for (true, _, Tag name) and another for (false, true, Text text). Technically possible, but not really prettier than what they already have, especially since you would need to use a local record instead of an anonymous tuple.
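
For illustration, here is roughly what matching on the (mode1, mode2, x) tuple could look like. This is only a sketch: it assumes record patterns and guarded switch cases as they later shipped in Java 21, which did not exist when this was written. The Node subtypes, the local Ctx record and the ModeMatch class are hypothetical names, and because that Java version has no wildcard or constant sub-patterns, the flags are tested in `when` guards.

```java
// Hypothetical stand-ins for the HTML Node hierarchy described above.
sealed interface Node permits Text, Comment, Tag, EndTag {}
record Text(String text) implements Node {}
record Comment(String text) implements Node {}
record Tag(String name) implements Node {}
record EndTag(String name) implements Node {}

final class ModeMatch {
    static String describe(boolean mode1, boolean mode2, Node x) {
        // Java has no anonymous tuples, so a local record carries the flags and the node.
        record Ctx(boolean mode1, boolean mode2, Node x) {}

        return switch (new Ctx(mode1, mode2, x)) {
            // (true, _, Tag name)
            case Ctx(var m1, var m2, Tag t) when m1 -> "tag " + t.name();
            // (false, true, Text text)
            case Ctx(var m1, var m2, Text t) when !m1 && m2 -> "text " + t.text();
            // Everything else is ignored, as in the original if/else chain.
            default -> "ignored";
        };
    }
}
```

The switch behaves like the nested if/else it replaces, but it is arguably no easier to read, which is consistent with the "technically possible, but not really prettier" assessment.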
>
> Even so, I think this would have benefited from being a sealed type. Recall that earlier I carefully said "4 subclasses defined in the same file". This is because some jokester in a different package altogether has defined their own fifth subclass, Doctype. They have their own sub-interface of Visitor that knows about Doctype nodes. I can't help but feel that the authors of Node would have preferred to make this illegal, if they had been able to.
>
> The second good sealed type I found is almost an enum, except that one of the instances has per-instance data. This is not exactly a surprise, since an enum is a degenerate sum type, and one way to think of sealed types is as a way to model sums. It looks something like this[2]:
>
> public abstract class DbResult<T> {
>   public record NoDatabase<T>() extends DbResult<T>;
>   public record RowNotFound<T>() extends DbResult<T>;
>   // Four more error types ...
>   public record EmptySuccess<T>() extends DbResult<T>;
>   public record SuccessWithData<T>(T data) extends DbResult<T>;
>
>   public T getData() {
>     if (!(this instanceof SuccessWithData))
>       throw new DbException();
>     return ((SuccessWithData<T>)this).data;
>   }
>   public <U> DbResult<U> transform(Function<T, U> f) {
>     if (!(this instanceof SuccessWithData)) {
>       return (DbResult<U>)this;
>     }
>     return new SuccessWithData<>(f.apply(
>         ((SuccessWithData<T>)this).data));
>   }
> }
>
> Reading this code made me yearn for Haskell: here is someone who surely wanted to write
>
> data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t
>
> but had to spend 120 lines defining their sum-of-products (the extra verbosity is because really they made the subclasses private, and defined private static singletons for each of the error types, with a static getter to get the type parameter right). This seems like a potential win for records and for sealed types. Certainly my snippet was much shorter than the actual source file because the proposed record syntax is quite concise, so that is a real win.
But what do we really gain from sealing this type? Still nobody does exhaustive analysis even of this relatively small type: they just use functions like getData and transform to work with the result generically, or spot-check a couple of interesting subtypes with instanceof. Forbidding subclassing from other packages hardly matters: nobody was subclassing it anyway, nor would they be tempted to. Really the improvements DbResult benefits most from are records, and pattern-matching on records. It would be much nicer to replace the instanceof/cast pattern with a pattern-match that extracts the relevant field.
>
> This is the use case that inspired my idea of a type-enum, in the Summary section above. Rewriting it as a type-enum eliminates many of the problems: all the instanceof checks are gone, we don't need a bunch of extra keywords for each case, and we're explicit about the subclasses "belonging to" the sealed parent, which means we get stuff like extends and <T> for free. We get improved clarity by letting the definition of the class hierarchy reflect its "nature" as a sum.
>
> public abstract type-enum DbResult<T> {
>   NoDatabase,
>   RowNotFound,
>   EmptySuccess,
>   SuccessWithData(T data) {
>     @Override public T getData() {
>       return data;
>     }
>     @Override public <U> DbResult<U> transform(Function<T, U> f) {
>       return new SuccessWithData<>(f.apply(data));
>     }
>   };
>
>   public T getData() {
>     throw new DbException();
>   }
>   public <U> DbResult<U> transform(Function<T, U> f) {
>     return (DbResult<U>)this;
>   }
> }
>
> Visitors
>
> Instead of doing a bunch of instanceof checks, the "sophisticated" way to interact with a class having a small, known set of subtypes is with a visitor. I considered doing some complicated analysis to characterize what makes a class a visitor, and trying to automatically cross-reference visitors to the classes they visit...but in practice simply looking for classes with "Visitor" in their name was a strong enough signal that a more complicated approach was not needed.
Having identified visitors, I looked at those visitors with the most subclasses, since each distinct subclass corresponds to one "interaction" with the sealed type that it visits, and well-used visitors suggest both popularity and good design.
>
> One common theme I found: developers aren't good at applying the visitor pattern. Many cases I found had some weird and inexplicable quirk compared to the "standard" visitor. These developers will be relieved to get pattern-matching syntax so they can stop writing visitors.
>
> The Visiting Object
>
> The first popular visitor I found was a bit odd to me. It's another tree type, but with a weird amalgam of several visitors, and an unusual approach to its double dispatch. I have to include a relatively lengthy code snippet to show all of its facets:
>
> public static abstract class Node {
>   public interface Visitor {
>     boolean process(Node node);
>   }
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         && ((Visitor)v).process(this);
>   }
>   // Other methods common to all Nodes ...
> }
>
> public static final class RootNode extends Node {
>   public interface Visitor {
>     boolean processRoot(RootNode node);
>   }
>   @Override
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         ? ((Visitor)v).processRoot(this)
>         : super.visit(v);
>   }
>   // Other stuff about root nodes ...
> }
>
> public static abstract class ValueNode extends Node {
>   public interface Visitor {
>     boolean processValue(ValueNode node);
>   }
>   @Override
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         ? ((Visitor)v).processValue(this)
>         : super.visit(v);
>   }
> }
>
> public static final class BooleanNode extends ValueNode {
>   public interface Visitor {
>     boolean processBool(BooleanNode node);
>   }
>   @Override
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         ? ((Visitor)v).processBool(this)
>         : super.visit(v);
>   }
>   // Other stuff about booleans ...
> }
>
> public static final class StringNode extends ValueNode {
>   // Much the same as BooleanNode
> }
>
> This goes on for some time: there is a multi-layered hierarchy of dozens of node types, each with a boolean visit(Object) method, and their own distinct Visitor interface, in this file. I should note that this code is actually not written by a human, but rather generated by some process (I didn't look into how). I still think it is worth mentioning here for two reasons: first, whoever wrote the code generator would probably do something similar if writing it by hand, and second because these visitors are used often by hand-written code.
>
> Speaking of hand-written code, visitor subclasses now get to declare ahead of time exactly which kinds of nodes they care about, by implementing only the appropriate Visitor interfaces:
>
> private class FooVisitor implements StringNode.Visitor,
>     BooleanNode.Visitor, RootNode.Visitor {
>   // ...
> }
>
> This isn't how I would have written things, but I can sorta see the appeal, if you don't have to write it all by hand: a visitor can choose to handle any one subclass of ValueNode, or all ValueNodes, or just RootNode and StringNode, et cetera. They get to pick and choose what sub-trees of the inheritance tree they work with.
>
> Would Node be a good sealed class? Maybe. It clearly intends to enumerate all subclasses, but the benefit it gets from enforcing that is minimal. As in my previous examples, the main advantage for Node implementors would come from records, and the main advantage for clients would come from pattern-matching, obviating their need for this giant visitor.
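
As a rough illustration of how pattern-matching could stand in for the per-type Visitor interfaces, here is a sketch assuming records, sealed interfaces and switch patterns as they later shipped in Java 21. The field names and the NodeClients class are hypothetical; the generated code above is only summarized in this report, so nothing here reflects its real contents.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical two-level hierarchy mirroring Node / RootNode / ValueNode above.
sealed interface Node permits RootNode, ValueNode {}
record RootNode(List<Node> children) implements Node {}
sealed interface ValueNode extends Node permits BooleanNode, StringNode {}
record BooleanNode(boolean value) implements ValueNode {}
record StringNode(String value) implements ValueNode {}

final class NodeClients {
    // A client that handles every leaf type individually.
    static String render(Node n) {
        return switch (n) {
            case RootNode r -> r.children().stream()
                    .map(NodeClients::render)
                    .collect(Collectors.joining(", ", "[", "]"));
            case BooleanNode b -> Boolean.toString(b.value());
            case StringNode s -> "\"" + s.value() + "\"";
        };
    }

    // A client that treats all ValueNodes alike, the way a visitor
    // implementing only ValueNode.Visitor would.
    static int countValues(Node n) {
        return switch (n) {
            case RootNode r -> r.children().stream()
                    .mapToInt(NodeClients::countValues)
                    .sum();
            case ValueNode v -> 1;
        };
    }
}
```

Each client picks the level of the hierarchy it cares about by choosing which types to name in its cases, which is the same freedom the separate Visitor interfaces were providing.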
>
> The Enumerated Node
>
> Another AST, this time for some kind of query language, explicitly declares an enum of all subclasses it can have, and uses this enum instead of using traditional double-dispatch:
>
> public interface Node {
>   enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
>   Kind getKind();
>   Location getLocation();
> }
>
> public abstract record AbstractNode(Location l) implements Node {}
>
> public class Expr extends AbstractNode {
>   public Kind getKind() {return Kind.EXPR;}
>   // ...
> }
> // And so on for other Kinds ...
>
> public abstract class Visitor {
>   // Default implementations (not abstract); each returns its argument unchanged.
>   public Expr visitExpr(Expr e) {return e;}
>   public Query visitQuery(Query q) {return q;}
>   public Import visitImport(Import i) {return i;}
>   public Node visit(Node n) {
>     switch (n.getKind()) {
>       case EXPR: return visitExpr((Expr)n);
>       case QUERY: return visitQuery((Query)n);
>       case IMPORT: return visitImport((Import)n);
>       // ...
>     }
>   }
> }
>
> It's not really clear to me why they do it this way, instead of putting an accept(Visitor) method on Node. They gain the ability to return different types for each Node subtype, but are hugely restricted in what visitors can do: they must return a Node, instead of performing an arbitrary computation. It seems like the idea is visitors must specialize to tree rewriting, but I still would have preferred to parameterize the visitor by return type.
>
> Would this be better as a sealed type? I feel sure that if sealed types existed, the authors of this class would have used one. We could certainly do away with the enum, and use an expression-switch instead to pattern-match in the implementation of visit(Node). But I think the Visitor class would still exist, and still have separate methods for each Node subtype, because the developer seemed to care about specializing the return type. The only place where an exhaustiveness check helps would be in the visit(Node) method, inside the visitor class itself.
All other dispatch goes through visit(Node), or through one of the specialized visitor methods if the type is known statically. It seems like overall this would be an improvement, but again, the improvement comes primarily from pattern-matching, not sealing.
>
> Colocated interface implementations
>
> Finally, I looked for interfaces having all of their implementations defined in the same file. On this I do have some statistical data[3]. A huge majority (98.5%) of public interfaces have at least one implementation in a different source file. Package-private interfaces also tend to have implementations in other files: 85% of them are in this category. For protected interfaces it's much closer: only 53% have external implementations. Of course, all private interfaces have all implementations in a single file.
>
> Next, I looked at interfaces that share a source file with all their implementations, to see whether they'd make good sealed types. First was this Entry interface:
>
> public interface Entry {
>   enum Status {OK, PENDING, FAILED}
>   Status getStatus();
>   int size();
>   String render();
> }
>
> public class UserEntry implements Entry {
>   private User u;
>   private Status s;
>   public UserEntry(User u, Status s) {
>     this.u = u;
>     this.s = s;
>   }
>   @Override public String render() {return u.name();}
>   @Override public int size() {return 1;}
>   @Override public Status getStatus() {return s;}
> }
>
> public class AccountEntry implements Entry {
>   private Account a;
>   private Status s;
>   public AccountEntry(Account a, Status s) {
>     this.a = a;
>     this.s = s;
>   }
>   @Override public String render() {return a.render();}
>   @Override public int size() {return a.size();}
>   @Override public Status getStatus() {return s;}
> }
>
> A huge majority of the clients of this Entry interface treat it polymorphically, just calling its interface methods. In only one case is there an instanceof check made on an Entry, dispatching to different methods depending on which subclass is present.
>
> Is this a good sealed type?
I think not, really. There are two implementations now, but perhaps there will be a GroupEntry someday. Existing clients should continue to work in that case: the polymorphic Entry interface provides everything clients are "intended" to know.
>
> Another candidate for sealing:
>
> public interface Request {/* Empty */}
> public record RequestById(int id) implements Request;
> public record RequestByValue(String owner, boolean historic) implements Request;
>
> public class RequestFetcher {
>   public List fetch(Iterable requests) {
>     List idReqs = Lists.newArrayList();
>     List valueReqs = Lists.newArrayList();
>     List queries = Lists.newArrayList();
>     for (Request req : requests) {
>       if (req instanceof RequestById) {
>         idReqs.add((RequestById)req);
>       } else if (req instanceof RequestByValue) {
>         valueReqs.add((RequestByValue)req);
>       }
>     }
>     queries.addAll(prepareIdQueries(idReqs));
>     queries.addAll(prepareValueQueries(valueReqs));
>     return runQueries(queries);
>   }
> }
>
> Interestingly, since the Request interface is empty, the only way to do anything with this class is to cast it to one implementation type. In fact, the RequestFetcher I include here is the only usage of either of these classes (plus, of course, helpers like prepareIdQueries).
>
> So, clients need to know about specific subclasses, and want to be sure they're doing exhaustive pattern-matching. Seems like a great sealed class to me. Except...actually each of the two subclasses has been extended by a decorator adding a source[4]:
>
> public record SourcedRequestById(Source source) extends RequestById;
> public record SourcedRequestByValue(Source source) extends RequestByValue;
>
> Does this argue in favor of sealing, or against? I don't really know. The owners of Request clearly intended for all four of these subclasses to exist (they're in the same package), so they could include them all in the permitted subtype list, but it seems like a confusing API to expose to clients.
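
To make the exhaustiveness argument concrete, here is a sketch of the two-subtype version of Request as a sealed interface, assuming the sealed/record/switch-pattern features as they later shipped in Java 21. RequestPartitioner and Partition are hypothetical names, the query-building helpers are left out, and the Sourced decorators are deliberately not modeled, since how they would fit is exactly the open question above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the two-subtype Request hierarchy, sealed.
sealed interface Request permits RequestById, RequestByValue {}
record RequestById(int id) implements Request {}
record RequestByValue(String owner, boolean historic) implements Request {}

final class RequestPartitioner {
    // Hypothetical holder for the two buckets that fetch() builds.
    record Partition(List<RequestById> byId, List<RequestByValue> byValue) {}

    static Partition partition(Iterable<? extends Request> requests) {
        List<RequestById> idReqs = new ArrayList<>();
        List<RequestByValue> valueReqs = new ArrayList<>();
        for (Request req : requests) {
            // No default branch: if a third permitted subtype is added,
            // this switch stops compiling until it is handled.
            switch (req) {
                case RequestById r -> idReqs.add(r);
                case RequestByValue r -> valueReqs.add(r);
            }
        }
        return new Partition(idReqs, valueReqs);
    }
}
```

Unlike the instanceof chain, which silently drops any request of an unknown type, this version turns an unhandled subtype into a compile-time error.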
>
> A third candidate for sealing is another simple sum type:
>
> public interface ValueOrAggregatorException<T> {
>   T get();
>   public static <T> ValueOrAggregatorException<T>
>       of(T value) {
>     return new OfValue<>(value);
>   }
>   public static <T> ValueOrAggregatorException<T>
>       ofException(AggregatorException err) {
>     return new OfException<>(err);
>   }
>   private record OfValue<T>(T value)
>       implements ValueOrAggregatorException<T> {
>     @Override public T get() {return value;}
>   }
>   private record OfException<T>(AggregatorException err)
>       implements ValueOrAggregatorException<T> {
>     @Override public T get() {throw err;}
>   }
> }
>
> It has only two subtypes, and it seems unimaginable there could ever be a third, so why not seal it? However, the subtypes are intentionally hidden: it is undesirable to let people see whether there's an exception, except by having it thrown at you. In fact AggregatorException is documented as "plugins may throw this, but should never catch it": there is some higher-level thing responsible for catching all such exceptions. So, this type gains no benefit from exhaustiveness checks in pattern-matching. The type is intended to be used polymorphically, through its interface method, even though its private implementation is amenable to sealing.
> ________________
> [1] Throughout this document I will use record syntax as if it were already in the language. This is merely for brevity, and to avoid making the reader spend a lot of time reading code that boils down to just storing a couple of fields. In practice, of course the code in Google's codebase either defines the records by hand, or uses an @AutoValue.
> [2] Recall that @AutoValue, Google's "record", allows extending a class, which is semantically okay here: DbResult has no state, only behavior.
> [3] This data is imperfect. While the Google codebase strongly discourages having more than one version of a module checked in, there is still some amount of "vendoring" or checking in multiple versions of some package, e.g.
for supporting external clients of an old version of an API. As a result, two "different files" which are really copies of each other may implement interfaces with the same fully-qualified name; I did not attempt to control for this case, and so such cases may look like they were in the same file, or not.
> [4] Of course in the record proposal it is illegal to extend records like this; in real life these plain data carriers are implemented by hand as ordinary classes, so the subtyping is legal.

From amalloy at google.com Mon Apr 29 20:33:19 2019
From: amalloy at google.com (Alan Malloy)
Date: Mon, 29 Apr 2019 13:33:19 -0700
Subject: Feedback on Sealed Types
In-Reply-To:
References:
Message-ID:

Thanks, Brian. I indeed didn't think of some of your proposed benefits of sealing non-sum types, as I was focused mostly on things you mentioned explicitly in the JEP, which is somewhat light on the expected benefits.

I think the first two items in your "challenges" solve each other: I don't intend sum types to be the only kind of sealed type, but just a good way to declare the simplest kind. I left out the "record" keyword from the declaration with the idea that it would be implicit: if you want the convenient sum-of-products declaration style, you have to use records. If you want something more complicated, you declare a sealed interface (or superclass), and N permitted subclasses, declared separately in whatever way you want. This restriction helps by making the semantics clearer, and I had also hoped that it would lead to a syntax error if you leave out the comma.

Looking more closely, I see this is somewhat precarious: a record declaration looks enough like a method signature that they may be ambiguous in an interface, if you don't require the "record" keyword, or if you use a semicolon instead of a comma. I think it can still work if we require each nested record to use {...} instead of ; even if it's empty.
This way, your two examples look like

interface X {
  class X1 { ... }
  class X2 { ... }
}

and

enumerated interface Y {
  Y1 { ... },
wrote:
> Thanks Alan, for this nice exploration. There's a lot to respond to.
> I'll start with some general comments about sealing, and then move on to
> your alternate proposal for exposing it.
>
> I can think of several main reasons why you would want to seal a
> hierarchy.
>
> - To say something about the _interface itself_. That is, that it was
> not designed as a general purpose contract between arms-length entities,
> but that it exists only as a common super type for a fixed set of classes
> that are part of the library. In other words, "please don't implement me."
>
> - To say something about the semantics of the type. Several of the
> examples in your report fall into this category: "a DbResult is either a
> NoRowsFound or a Rows(List)". This tells users exactly what the
> reasonable range of results to expect are when doing a query. Of course,
> the spec could say the same thing, but that involves reading and
> interpreting the spec. Easier if this conclusion can be driven by types
> (and IDEs can help more here too.)
>
> - To strengthen typing by simulating unions. If my method is going to
> return either a String or a Number, the common super type is Object.
> (Actually, it's some variant of Serializable & Comparable.).
> Sums-of-products allow library authors to make a stronger statement about
> types in the presence of unions. Exposing a sum of StringHolder(String)
> and NumberHolder(Number), using records and sealed types, is not so
> ceremonious, so some library developers might choose to do this instead of
> Object.
>
> - Security. Some libraries may want to know that the code they are
> calling is part of their library, rather than an arbitrary implementation
> of some interface.
>
> - To aid in exhaustiveness. We've already discussed this at length; your
> point is that this one doesn't come up as often as one might hope.
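
The "simulating unions" item in the list above can be sketched as follows, assuming sealed interfaces, records and switch patterns as they later shipped in Java 21. StringHolder and NumberHolder are the names used in the message; StringOrNumber, Unions, parse and describe are hypothetical names added for illustration.

```java
// A sum of StringHolder(String) and NumberHolder(Number) in place of Object.
sealed interface StringOrNumber permits StringHolder, NumberHolder {}
record StringHolder(String value) implements StringOrNumber {}
record NumberHolder(Number value) implements StringOrNumber {}

final class Unions {
    // Hypothetical API that would otherwise have to return Object.
    static StringOrNumber parse(String raw) {
        try {
            return new NumberHolder(Integer.valueOf(raw));
        } catch (NumberFormatException e) {
            return new StringHolder(raw);
        }
    }

    // Callers see exactly two possibilities, and the compiler checks both are handled.
    static String describe(StringOrNumber v) {
        return switch (v) {
            case StringHolder s -> "string of length " + s.value().length();
            case NumberHolder n -> "number " + n.value();
        };
    }
}
```

The return type of parse now documents the full range of results, which a plain Object return type cannot do.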
>
> Not only is there an obvious synergy between sums and products (as many
> languages have demonstrated), but there is a third factor, which is "if you
> make it easy enough, people will use it more." Clearly records are pretty
> easy to use; your point is that if there were a more streamlined
> sum-of-products idiom, the third factor would be even stronger here. I
> think algebraic data types are one of those things that will take some time
> for developers to learn to appreciate; the easier we make it, of course the
> faster that will happen.
>
>
> Now, to your syntax suggestion. Overall, I like the idea, but I have some
> concerns. First, the good parts:
>
> - The connection with enums is powerful. Users already understand enums,
> so this will help them understand sums. Enums have special treatment in
> switch; we want the same treatment for sealed type patterns. Enums have
> special treatment for exhaustiveness; we want the same for sealed type
> patterns. So tying these together with some more general enum-ness leans
> on what people already know.
>
> - While sums and products are theoretically independent features,
> sums-of-products are expected to be quite common. So it might be
> reasonable to encourage this syntactically.
>
> - The current proposal has some redundancy, in that the subtypes have to
> say "implements Node", even if they are nested within Node. With a
> stronger mechanism for declaring them, as you propose, then that can safely
> be left implicit.
>
> - I confess that I too like the simplicity of Haskell's `data`
> declaration, and this brings us closer.
>
> Now, the challenges:
>
> - The result is still a little busy. We need a modifier for "enumerated
> type", and we would also need to be able to have child types be not only
> records, but ordinary classes and interfaces. So we'd have to have a place
> for "record", "class", or "interface" with the declaration of the
> enumerated classes (as well as other modifiers.).
That busies up the result
> a bit.
>
> - Once we do this, I worry that it will be hard to tell the difference
> between:
>
> interface X {
>   class X1 { ... }
>   class X2 { ... }
> }
>
> and
>
> enumerated interface Y {
>   class Y1 { ... }, class Y2 { ... }
> }
>
> and that users will forever be making mistakes like forgetting the comma,
> or putting it where it doesn't belong.
>
> - This mechanism addresses the very common case of sum-of-product, but
> leaves more esoteric sums out of the picture. (Consider the types in
> java.lang.constant, which really want to be sealed.). There, because they
> are not co-declared, we'd need something more like
>
> sealed interface ConstantDesc
>     permits ClassDesc, MethodTypeDesc, .... { }
>
> It's possible that such a mechanism can be grafted on to your proposal, or
> there is a shuffling that supports it.
>
>
> On Apr 29, 2019, at 2:28 PM, Alan Malloy wrote:
>
> Hello again, amber-spec-experts. I have another report from the Google
> codebase, this time focusing on sealed types. It is viewable in full
> Technicolor HTML at
> http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks
> again to Liam for hosting), and included below as plain text:
>
> Author: Alan Malloy (amalloy at google.com)
> Published: 2019-04-29
>
> Feedback on Sealed Types
>
> Hello again, amber-spec-experts. I'm back with a second Google codebase
> research project. I'm looking again at the Records & Sealed Types proposal
> (which has now become JDK-8222777), but this time I'm focusing on sealed
> types instead of records, as promised in my RFC of a few weeks ago. My goal
> was to investigate Google's codebase to guess what developers might have
> done differently if they had access to sealed types. This could help us
> understand what works in the current proposal and what to consider changing.
>
> Unlike my previous report, this one contains more anecdotes than
> statistics.
It wound up being difficult to build static analysis to
> categorize the interesting cases, so I mostly examined promising candidates
> by hand.
>
> Summary and Recommendations
>
> For those who don't care to read through all my anecdotes, I first provide
> a summary of my findings, and one suggested addition.
>
> Sealed types, as proposed so far, are a good idea in theory: Java already
> has product types and open polymorphism, and sealed types give us closed
> polymorphism. However, I could not find many cases of code being written
> today that would be greatly enhanced if sealed types were available. The
> main selling point of sealed types for application authors is getting help
> from the compiler with exhaustiveness checking, but in practice developers
> almost always have a default case, because they are only interested in a
> subset of the possible subclasses, and want to ignore cases they don't
> understand. This means that exhaustiveness-checking for pattern matches
> would mostly go unused if developers rewrote their existing code using
> sealed types.
>
> Pattern matching is great, and can replace visitors in many cases, but
> this does not depend on sealed types except for exhaustiveness checks
> (which, again, would go mostly unused in code written today). The class
> hierarchies for which people define visitors today are just too large to
> write an exhaustive pattern match, and so a default case would be very
> common.
>
> The other audience for sealed types is library authors. While in practice
> most developers have no great need to forbid subclasses, perhaps it would
> be a boon for authors of particularly popular libraries, who need to expose
> a non-final class as an implementation detail but don't intend for
> consumers to create their own subclasses.
Those authors can already include
> documentation saying "you really should not extend this", but there is
> always some weirdo out there who will ignore your warnings and then write
> an angry letter when the next version of your library breaks his program
> (see: sun.misc.Unsafe). Authors of such libraries would welcome the
> opportunity to make it truly impossible to create undesirable subclasses.
>
> Sealed Types As a Vehicle For Sum Types
>
> So, sealed types as-is would be an improvement, but a niche one, used by
> few. I think we can get substantially more mileage out of them if we also
> include a more cohesive way to explicitly define a sum type and all its
> subtypes in one place with minimal ceremony. Such a sum type could be
> sealed, implicitly or explicitly. A tool like this takes what I see as the
> "theoretical" advantage of sum types (closed polymorphism), and makes it
> "practical" by putting it front and center. Making sums an actual language
> element instead of something "implied" by sealing a type and putting its
> subclasses nearby could help in a lot of ways:
>
> * Developers might more often realize that a sealed/sum type is a good
> model for their domain. Currently it's a "pattern" external to the language
> instead of a "feature", and many don't realize it could be applied to their
> domain. Putting it in the language raises its profile, addressing the
> problem that people don't realize they want it.
> * The compiler could provide help for defining simple sums-of-products,
> while making it possible to opt into more complicated subclasses, in much
> the way that enums do: the typical enum just has bare constants like EAST,
> but you can add constructor arguments or override methods when necessary.
> * The ability to more easily model data in this way may result in
> developers writing more classes that are amenable to sealing/sums, as they
> do in other languages with explicit sum types (Haskell, Kotlin, Scala).
> Then, the exhaustiveness-checking feature that sealed types provide would > pull more weight. > > Since enum types are ?degenerate sum types?, the syntax for defining sums > can borrow heavily from enums. A sketch of the syntax I imagine for such > things (of course, I am not married to it): > public type-enum interface BinaryTree { > Leaf { > @Override public Stream elements() {return Stream.empty();} > }, > Node(T data, BinaryTree left, BinaryTree right) { > @Override public Stream elements() { > return Stream.concat(left.elements(), > Stream.concat(Stream.of(data), right.elements())); > } > }; > > > public Stream elements(); > } > > Like enums, you can use a bare identifier for simple types that exist only > to be pattern-matched against, but you can add fields and/or override > blocks as necessary. The advantage over declaring a sealed type separately > from its elements is both concision (the compiler infers visible records, > superclass, and all type parameters) and clarity: you state your intention > firmly. I think a convenient syntax like this will encourage developers to > use the powerful tool of sealed types to model their data. > > Evidence in Google?s Codebase > > If you are just interested in recommendations, you can stop reading now: > they are all included in the summary. What follows is a number of > anecdotes, or case studies if you prefer, that led me to the conclusions > above. Each shows a type that might have been written as a sealed type, and > hopefully highlights a different facet of the question of how sealed types > can be useful. > > The first thing I looked for was classes which are often involved in > instanceof checks. 
> As language authors, we imagine people writing stuff like this[1] all the time:
>
> interface Expr {int eval(Scope s);}
> record Var(String name) implements Expr {
>   public int eval(Scope s) {return s.get(name);}
> }
> record Sum(Expr left, Expr right) implements Expr {
>   public int eval(Scope s) {return left.eval(s) + right.eval(s);}
> }
> class Analyzer {
>   Stream<String> variablesUsed(Expr e) {
>     if (e instanceof Var) return Stream.of(((Var)e).name);
>     if (e instanceof Sum) {
>       return Stream.concat(variablesUsed(((Sum)e).left),
>           variablesUsed(((Sum)e).right));
>     }
>     throw new IllegalArgumentException();
>   }
> }
>
> Here, the Expr interface captures some of the functionality shared by all expressions, but later a client (Analyzer) came along and invented some other polymorphic operations to perform on an Expr, which Expr did not support. So Analyzer needed to do instanceof checks instead, externalizing the polymorphism. The principled approach would have been for Expr to export a visitor to begin with, but perhaps it wasn't seen as worth the trouble at the time.
>
> To try to find this pattern in the wild, I searched for method bodies which perform multiple instanceof checks against the same variable. Notably, this excludes the typical equals(Object) method, which only performs a single check. For each such variable, I noted:
>
> 1. Its declared type
> 2. The set of subtypes it was checked for with instanceof
> 3. The common supertype of those subtypes.
>
> I guessed that (3) would usually be the same as (1), but in practice 55% of the time they were different. Often, the declared type was Object, or some generic type variable which erases to Object, while the common supertype being tested was something like Number, Event, or Node. For example, a Container knows it will be used in some context where NaN is unsuitable, so it checks whether its contents are Float or Double, and if so ensures NaN is not stored.
> As a second example, a serialize(Object) method checks whether its input is String or ByteString, and throws an exception otherwise.
>
> Bad sealed types found looking at instanceof checks
>
> I looked through the most popular declared types of these candidates, to investigate which types are often involved in such checks. Most of them are not good candidates for a sealed type. Object was the most common type, followed by Exception and Throwable.
>
> Next up is an internal DOMObject class, which sounds promising until I tell you it has thousands of direct subclasses. Nobody is doing exhaustive switches on this, of course. Instead, many uses iterate over a Collection<DOMObject>, or receive a DOMObject in some way, and just check whether it is of one or two specific subtypes they care about. This turned out to be a very common pattern, not just for DOMObject, but for many candidate sealed types I found: nobody does exhaustive case analysis. They just look for objects they understand in some much larger hierarchy, and ignore the rest.
>
> Some more humorous types that are often involved in instanceof checks: java.net.InetAddress (everyone wants to know if it's v4 or v6) and com.sun.source.tree.Tree, in our static-analysis tools. Tree is an interesting case: here we do exactly what I mentioned previously for DOMObject. On the surface it seems that Tree would be a good candidate for a sealed interface with record subtypes, but in practice I'm not sure what sealing would buy us. We would effectively opt out of exhaustiveness-checking by having a large default case, or by extending a visitor with default-empty methods. Of course, sometimes we define a new visitor to do some polymorphic operation over a Tree, but more often we just look for one or two subtypes we care about. For example, DataFlow inspects a Tree, but knows from context that it is either a LambdaExpressionTree, MethodTree, or an initializer.
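
As a rough illustration of the spot-check pattern described in this section, here is a minimal sketch. DomNode, Link, Image, Paragraph, and UrlCollector are hypothetical stand-ins invented for this example; they are not the internal DOMObject hierarchy, which is not shown in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for a very large hierarchy; only three of the
// (imagined) thousands of subclasses are declared here.
abstract class DomNode {}

final class Link extends DomNode {
    final String href;
    Link(String href) { this.href = href; }
}

final class Image extends DomNode {
    final String src;
    Image(String src) { this.src = src; }
}

final class Paragraph extends DomNode {}

final class UrlCollector {
    // Checks for the two subtypes this client understands and skips
    // everything else, which acts as an implicit default case.
    static List<String> urls(Iterable<? extends DomNode> nodes) {
        List<String> out = new ArrayList<>();
        for (DomNode n : nodes) {
            if (n instanceof Link) {
                out.add(((Link) n).href);
            } else if (n instanceof Image) {
                out.add(((Image) n).src);
            }
            // No else branch: unknown node types are ignored on purpose.
        }
        return out;
    }
}
```

A client written this way would presumably still want a default branch even if DomNode were sealed, so an exhaustiveness check would have nothing to report.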
>
> Plausible sealed types found looking at instanceof checks
>
> The previous section notwithstanding, I did dig deep enough into the results to find a few classes that could make good sealed types. The most prominent, and most interesting, was another AST. There is an abstract Node class for representing HTML documents. It has just 4 subclasses defined in the same file: Text, Comment, Tag, and EndTag. This spartan definition suggests it's used for something like SAX parsing, but I didn't confirm this. It does everything you could hope for from a type like this: it exposes a Visitor, it provides an accept(Visitor) method, and the superclass specifies abstract methods for a couple of the most common things you would want to do, such as a String toHtml() method.
>
> However, recall that I found this class by looking for classes often involved in instanceof checks! Some people use the visitor, but why doesn't everyone? The first reason I found is one I've mentioned several times already: clients only care about one of the 4 cases, and may have felt creating an anonymous visitor is too much ceremony. Would they be happy with a switch and a default clause? Probably, but it's hard to know for sure. The second reason surprised me a bit: I found clients doing analysis that isn't really amenable to any one visitor, or a simple pattern-match. They've written this:
>
> if (mode1) { if (x instanceof Tag) {...} }
> else if (mode2) { if (x instanceof Text) {...}}
>
> The same use site cares about different subclasses at different times, depending on some other flag(s) controlling its behavior. Even if we offered a pattern-match on x, it's difficult to encode the flags correctly. They would have to match on a tuple of (mode1, mode2, x), with a case for (true, _, Tag name) and another for (false, true, Text text).
> Technically possible, but not really prettier than what they already have, especially since you would need to use a local record instead of an anonymous tuple.
>
> Even so, I think this would have benefited from being a sealed type. Recall that earlier I carefully said "4 subclasses defined in the same file". This is because some jokester in a different package altogether has defined their own fifth subclass, Doctype. They have their own sub-interface of Visitor that knows about Doctype nodes. I can't help but feel that the authors of Node would have preferred to make this illegal, if they had been able to.
>
> The second good sealed type I found is almost an enum, except that one of the instances has per-instance data. This is not exactly a surprise, since an enum is a degenerate sum type, and one way to think of sealed types is as a way to model sums. It looks something like this[2]:
>
> public abstract class DbResult<T> {
>   public record NoDatabase<T>() extends DbResult<T>;
>   public record RowNotFound<T>() extends DbResult<T>;
>   // Four more error types ...
>   public record EmptySuccess<T>() extends DbResult<T>;
>   public record SuccessWithData<T>(T data) extends DbResult<T>;
>
>   public T getData() {
>     if (!(this instanceof SuccessWithData))
>       throw new DbException();
>     return ((SuccessWithData<T>)this).data;
>   }
>   public <U> DbResult<U> transform(Function<T, U> f) {
>     if (!(this instanceof SuccessWithData)) {
>       return (DbResult<U>)this;
>     }
>     return new SuccessWithData<>(f.apply(
>         ((SuccessWithData<T>)this).data));
>   }
> }
>
> Reading this code made me yearn for Haskell: here is someone who surely wanted to write
>
> data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t
>
> but had to spend 120 lines defining their sum-of-products (the extra verbosity is because really they made the subclasses private, and defined private static singletons for each of the error types, with a static getter to get the type parameter right).
> This seems like a potential win for records and for sealed types. Certainly my snippet was much shorter than the actual source file because the proposed record syntax is quite concise, so that is a real win. But what do we really gain from sealing this type? Still nobody does exhaustive analysis even of this relatively small type: they just use functions like getData and transform to work with the result generically, or spot-check a couple of interesting subtypes with instanceof. Forbidding subclassing from other packages hardly matters: nobody was subclassing it anyway, nor would they be tempted to. Really the improvements DbResult benefits most from are records, and pattern-matching on records. It would be much nicer to replace the instanceof/cast pattern with a pattern-match that extracts the relevant field.
>
> This is the use case that inspired my idea of a type-enum, in the Summary section above. Rewriting it as a type-enum eliminates many of the problems: all the instanceof checks are gone, we don't need a bunch of extra keywords for each case, and we're explicit about the subclasses "belonging to" the sealed parent, which means we get stuff like extends and <T> for free. We get improved clarity by letting the definition of the class hierarchy reflect its "nature" as a sum.
>
> public abstract type-enum DbResult<T> {
>   NoDatabase,
>   RowNotFound,
>   EmptySuccess,
>   SuccessWithData(T data) {
>     @Override public T getData() {
>       return data;
>     }
>     @Override public <U> DbResult<U> transform(Function<T, U> f) {
>       return new SuccessWithData<>(f.apply(data));
>     }
>   };
>
>   public T getData() {
>     throw new DbException();
>   }
>   public <U> DbResult<U> transform(Function<T, U> f) {
>     return (DbResult<U>)this;
>   }
> }
>
> Visitors
>
> Instead of doing a bunch of instanceof checks, the "sophisticated" way to interact with a class having a small, known set of subtypes is with a visitor.
> I considered doing some complicated analysis to characterize what makes a class a visitor, and trying to automatically cross-reference visitors to the classes they visit...but in practice simply looking for classes with "Visitor" in their name was a strong enough signal that a more complicated approach was not needed. Having identified visitors, I looked at those visitors with the most subclasses, since each distinct subclass corresponds to one "interaction" with the sealed type that it visits, and well-used visitors suggest both popularity and good design.
>
> One common theme I found: developers aren't good at applying the visitor pattern. Many cases I found had some weird and inexplicable quirk compared to the "standard" visitor. These developers will be relieved to get pattern-matching syntax so they can stop writing visitors.
>
> The Visiting Object
>
> The first popular visitor I found was a bit odd to me. It's another tree type, but with a weird amalgam of several visitors, and an unusual approach to its double dispatch. I have to include a relatively lengthy code snippet to show all of its facets:
>
> public static abstract class Node {
>   public interface Visitor {
>     boolean process(Node node);
>   }
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         && ((Visitor)v).process(this);
>   }
>   // Other methods common to all Nodes ...
> }
>
> public static final class RootNode extends Node {
>   public interface Visitor {
>     boolean processRoot(RootNode node);
>   }
>   @Override
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         ? ((Visitor)v).processRoot(this)
>         : super.visit(v);
>   }
>   // Other stuff about root nodes ...
> }
>
> public static abstract class ValueNode extends Node {
>   public interface Visitor {
>     boolean processValue(ValueNode node);
>   }
>   @Override
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         ? ((Visitor)v).processValue(this)
>         : super.visit(v);
>   }
> }
>
> public static final class BooleanNode extends ValueNode {
>   public interface Visitor {
>     boolean processBool(BooleanNode node);
>   }
>   @Override
>   public boolean visit(Object v) {
>     return v instanceof Visitor
>         ? ((Visitor)v).processBool(this)
>         : super.visit(v);
>   }
>   // Other stuff about booleans ...
> }
>
> public static final class StringNode extends ValueNode {
>   // Much the same as BooleanNode
> }
>
> This goes on for some time: there is a multi-layered hierarchy of dozens of node types, each with a boolean visit(Object) method, and their own distinct Visitor interface, in this file. I should note that this code is actually not written by a human, but rather generated by some process (I didn't look into how). I still think it is worth mentioning here for two reasons: first, whoever wrote the code generator would probably do something similar if writing it by hand, and second because these visitors are used often by hand-written code.
>
> Speaking of hand-written code, visitor subclasses now get to declare ahead of time exactly which kinds of nodes they care about, by implementing only the appropriate Visitor interfaces:
>
> private class FooVisitor implements StringNode.Visitor,
>     BooleanNode.Visitor, RootNode.Visitor {
>   // ...
> }
>
> This isn't how I would have written things, but I can sorta see the appeal, if you don't have to write it all by hand: a visitor can choose to handle any one subclass of ValueNode, or all ValueNodes, or just RootNode and StringNode, et cetera. They get to pick and choose what sub-trees of the inheritance tree they work with.
>
> Would Node be a good sealed class? Maybe. It clearly intends to enumerate all subclasses, but the benefit it gets from enforcing that is minimal.
> As in my previous examples, the main advantage for Node implementors would come from records, and the main advantage for clients would come from pattern-matching, obviating their need for this giant visitor.
>
> The Enumerated Node
>
> Another AST, this time for some kind of query language, explicitly declares an enum of all subclasses it can have, and uses this enum instead of using traditional double-dispatch:
>
> public interface Node {
>   enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
>   Kind getKind();
>   Location getLocation();
> }
>
> public abstract record AbstractNode(Location l) implements Node {}
>
> public class Expr extends AbstractNode {
>   public Kind getKind() {return EXPR;}
>   // ...
> }
> // And so on for other Kinds ...
>
> public abstract class Visitor {
>   // Empty default implementations, not abstract.
>   public Expr visitExpr(Expr e) {}
>   public Query visitQuery(Query q) {}
>   public Import visitImport(Import i) {}
>   public Node visit(Node n) {
>     switch (n.getKind()) {
>       case EXPR: return visitExpr((Expr)n);
>       case QUERY: return visitQuery((Query)n);
>       case IMPORT: return visitImport((Import)n);
>       // ...
>     }
>   }
> }
>
> It's not really clear to me why they do it this way, instead of putting an accept(Visitor) method on Node. They gain the ability to return different types for each Node subtype, but are hugely restricted in what visitors can do: they must return a Node, instead of performing an arbitrary computation. It seems like the idea is visitors must specialize to tree rewriting, but I still would have preferred to parameterize the visitor by return type.
>
> Would this be better as a sealed type? I feel sure that if sealed types existed, the authors of this class would have used one. We could certainly do away with the enum, and use an expression-switch instead to pattern-match in the implementation of visit(Node).
> But I think the Visitor class would still exist, and still have separate methods for each Node subtype, because the developer seemed to care about specializing the return type. The only place where an exhaustiveness check helps would be in the visit(Node) method, inside the visitor class itself. All other dispatch goes through visit(Node), or through one of the specialized visitor methods if the type is known statically. It seems like overall this would be an improvement, but again, the improvement comes primarily from pattern-matching, not sealing.
>
> Colocated interface implementations
>
> Finally, I looked for interfaces having all of their implementations defined in the same file. On this I do have some statistical data[3]. A huge majority (98.5%) of public interfaces have at least one implementation in a different source file. Package-private interfaces also tend to have implementations in other files: 85% of them are in this category. For protected interfaces it's much closer: only 53% have external implementations. Of course, all private interfaces have all implementations in a single file.
>
> Next, I looked at interfaces that share a source file with all their implementations, to see whether they'd make good sealed types.
> First was this Entry interface:
>
> public interface Entry {
>   enum Status {OK, PENDING, FAILED}
>   Status getStatus();
>   int size();
>   String render();
> }
>
> public class UserEntry implements Entry {
>   private User u;
>   private Status s;
>   public UserEntry(User u, Status s) {
>     this.u = u;
>     this.s = s;
>   }
>   @Override public String render() {return u.name();}
>   @Override public int size() {return 1;}
>   @Override public Status getStatus() {return s;}
> }
>
> public class AccountEntry implements Entry {
>   private Account a;
>   private Status s;
>   public AccountEntry(Account a, Status s) {
>     this.a = a;
>     this.s = s;
>   }
>   @Override public String render() {return a.render();}
>   @Override public int size() {return a.size();}
>   @Override public Status getStatus() {return s;}
> }
>
> A huge majority of the clients of this Entry interface treat it polymorphically, just calling its interface methods. In only one case is there an instanceof check made on an Entry, dispatching to different methods depending on which subclass is present.
>
> Is this a good sealed type? I think not, really. There are two implementations now, but perhaps there will be a GroupEntry someday. Existing clients should continue to work in that case: the polymorphic Entry interface provides everything clients are "intended" to know.
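
To make the argument about open polymorphism concrete, here is a minimal sketch of a client that keeps working when a new implementation appears. It simplifies the snippet above: UserEntry holds a plain String instead of a User, and GroupEntry and EntryReport are hypothetical additions invented for this example, not code from the report.

```java
import java.util.Arrays;
import java.util.List;

interface Entry {
    enum Status { OK, PENDING, FAILED }
    Status getStatus();
    int size();
    String render();
}

final class UserEntry implements Entry {
    private final String userName;  // assumption: stands in for the User type
    private final Status status;
    UserEntry(String userName, Status status) {
        this.userName = userName;
        this.status = status;
    }
    @Override public Status getStatus() { return status; }
    @Override public int size() { return 1; }
    @Override public String render() { return userName; }
}

// Hypothetical later addition; no existing client has to change.
final class GroupEntry implements Entry {
    private final List<Entry> members;
    GroupEntry(Entry... members) { this.members = Arrays.asList(members); }
    @Override public Status getStatus() { return Status.OK; }
    @Override public int size() {
        int total = 0;
        for (Entry e : members) total += e.size();
        return total;
    }
    @Override public String render() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < members.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(members.get(i).render());
        }
        return sb.append("]").toString();
    }
}

final class EntryReport {
    // Uses only the interface methods, so new subtypes do not affect it.
    static String summarize(List<? extends Entry> entries) {
        int total = 0;
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            total += e.size();
            sb.append(e.render()).append(' ');
        }
        return sb.append("(size ").append(total).append(')').toString();
    }
}
```

Sealing Entry would forbid exactly this kind of extension while giving EntryReport nothing in return, since it never inspects the concrete type.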
>
> Another candidate for sealing:
>
> public interface Request {/* Empty */}
> public record RequestById(int id) implements Request;
> public record RequestByValue(String owner, boolean historic) implements Request;
>
> public class RequestFetcher {
>   public List fetch(Iterable<Request> requests) {
>     List<RequestById> idReqs = Lists.newArrayList();
>     List<RequestByValue> valueReqs = Lists.newArrayList();
>     List queries = Lists.newArrayList();
>     for (Request req : requests) {
>       if (req instanceof RequestById) {
>         idReqs.add((RequestById)req);
>       } else if (req instanceof RequestByValue) {
>         valueReqs.add((RequestByValue)req);
>       }
>     }
>     queries.addAll(prepareIdQueries(idReqs));
>     queries.addAll(prepareValueQueries(valueReqs));
>     return runQueries(queries);
>   }
> }
>
> Interestingly, since the Request interface is empty, the only way to do anything with a Request is to cast it to one implementation type. In fact, the RequestFetcher I include here is the only usage of either of these classes (plus, of course, helpers like prepareIdQueries).
>
> So, clients need to know about specific subclasses, and want to be sure they're doing exhaustive pattern-matching. Seems like a great sealed class to me. Except...actually each of the two subclasses has been extended by a decorator adding a source[4]:
>
> public record SourcedRequestById(Source source) extends RequestById;
> public record SourcedRequestByValue(Source source) extends RequestByValue;
>
> Does this argue in favor of sealing, or against? I don't really know. The owners of Request clearly intended for all four of these subclasses to exist (they're in the same package), so they could include them all in the permitted subtype list, but it seems like a confusing API to expose to clients.
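
For readers who want to see how such a four-class hierarchy could be spelled out, here is a sketch under several assumptions: it uses the sealed-class syntax that eventually shipped (Java 17 or later), which postdates this report; it uses ordinary classes because records cannot be extended; the Source type is replaced by a String; and RequestDescriber is a hypothetical client invented for this example.

```java
// Assumes Java 17 or later, where sealed types are available.
sealed interface Request permits RequestById, RequestByValue {}

// Ordinary classes rather than records, because records cannot be extended.
sealed class RequestById implements Request permits SourcedRequestById {
    final int id;
    RequestById(int id) { this.id = id; }
}

final class SourcedRequestById extends RequestById {
    final String source;  // assumption: stands in for the Source type
    SourcedRequestById(int id, String source) {
        super(id);
        this.source = source;
    }
}

sealed class RequestByValue implements Request permits SourcedRequestByValue {
    final String owner;
    final boolean historic;
    RequestByValue(String owner, boolean historic) {
        this.owner = owner;
        this.historic = historic;
    }
}

final class SourcedRequestByValue extends RequestByValue {
    final String source;  // assumption: stands in for the Source type
    SourcedRequestByValue(String owner, boolean historic, String source) {
        super(owner, historic);
        this.source = source;
    }
}

final class RequestDescriber {
    // A check against RequestById also matches SourcedRequestById, so a
    // client sees two cases even though four classes exist.
    static String describe(Request req) {
        if (req instanceof RequestById byId) {
            return "id:" + byId.id;
        }
        if (req instanceof RequestByValue byValue) {
            return "owner:" + byValue.owner;
        }
        // Cannot happen while Request stays sealed, but an if-chain is not
        // checked for exhaustiveness, so the compiler still requires this.
        throw new AssertionError("unexpected Request subtype");
    }
}
```

In this spelling each permits clause names only direct subtypes, so the decorators hang off the two middle classes rather than appearing in the Request list itself. Whether that reads as clearer or more confusing to clients is the open question raised above.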
>
> A third candidate for sealing is another simple sum type:
>
> public interface ValueOrAggregatorException<T> {
>   T get();
>   public static <T> ValueOrAggregatorException<T>
>       of(T value) {
>     return new OfValue<>(value);
>   }
>   public static <T> ValueOrAggregatorException<T>
>       ofException(AggregatorException err) {
>     return new OfException<>(err);
>   }
>   private record OfValue<T>(T value)
>       implements ValueOrAggregatorException<T> {
>     @Override public T get() {return value;}
>   }
>   private record OfException<T>(AggregatorException err)
>       implements ValueOrAggregatorException<T> {
>     @Override public T get() {throw err;}
>   }
> }
>
> It has only two subtypes, and it seems unimaginable there could ever be a third, so why not seal it? However, the subtypes are intentionally hidden: it is undesirable to let people see whether there's an exception, except by having it thrown at you. In fact AggregatorException is documented as "plugins may throw this, but should never catch it": there is some higher-level thing responsible for catching all such exceptions. So, this type gains no benefit from exhaustiveness checks in pattern-matching. The type is intended to be used polymorphically, through its interface method, even though its private implementation is amenable to sealing.
> ________________
> [1] Throughout this document I will use record syntax as if it were already in the language. This is merely for brevity, and to avoid making the reader spend a lot of time reading code that boils down to just storing a couple of fields. In practice, of course the code in Google's codebase either defines the records by hand, or uses an @AutoValue.
> [2] Recall that @AutoValue, Google's "record", allows extending a class, which is semantically okay here: DbResult has no state, only behavior.
> [3] This data is imperfect. While the Google codebase strongly discourages having more than one version of a module checked in, there is still some amount of "vendoring"
> or checking in multiple versions of some package, e.g. for supporting external clients of an old version of an API. As a result, two "different files" which are really copies of each other may implement interfaces with the same fully-qualified name; I did not attempt to control for this case, and so such cases may look like they were in the same file, or not.
> [4] Of course in the record proposal it is illegal to extend records like this; in real life these plain data carriers are implemented by hand as ordinary classes, so the subtyping is legal.


From amalloy at google.com Mon Apr 29 20:38:32 2019
From: amalloy at google.com (Alan Malloy)
Date: Mon, 29 Apr 2019 13:38:32 -0700
Subject: Feedback on Sealed Types
In-Reply-To:
References:
Message-ID:

Thinking about it again, though, you're probably right: it would be nice if there were just one way to declare a sealed type, and if so that way should countenance both records-only and fancier stuff, including declarations in separate files. If we do allow non-record subtypes, it could get difficult to tell enumerated subtypes apart from nested classes in an ordinary interface. As I said, I'm not too attached to the syntax. I'm mostly interested in whether some version of this could be useful, and it seems to me like it should be evocative of enums, but the details of how to do all that I should leave to the people on this list who are more versed in the specifics of language design.

On Mon, Apr 29, 2019 at 1:33 PM Alan Malloy wrote:

> Thanks, Brian. I indeed didn't think of some of your proposed benefits of sealing non-sum types, as I was focused mostly on things you mentioned explicitly in the JEP, which is somewhat light on the expected benefits.
>
> I think the first two items in your "challenges" solve each other: I don't intend sum types to be the only kind of sealed type, but just a good way to declare the simplest kind.
> I left out the "record" keyword from the declaration with the idea that it would be implicit: if you want the convenient sum-of-products declaration style, you have to use records. If you want something more complicated, you declare a sealed interface (or superclass), and N permitted subclasses, declared separately in whatever way you want. This restriction helps by making the semantics clearer, and I had also hoped that it would lead to a syntax error if you leave out the comma. Looking more closely, I see this is somewhat precarious: a record declaration looks enough like a method signature that they may be ambiguous in an interface, if you don't require the "record" keyword, or if you use a semicolon instead of a comma. I think it can still work if we require each nested record to use {...} instead of ; even if it's empty. This way, your two examples look like
>
> interface X {
>   class X1 { ... }
>   class X2 { ... }
> }
>
> and
>
> enumerated interface Y {
>   Y1 { ... }, Y2 { ... }
> }
>
> The latter would become illegal if you dropped the comma, even if you also forgot the "enumerated" keyword, because the braces make no sense in an ordinary interface.
>
> On Mon, Apr 29, 2019 at 12:35 PM Brian Goetz wrote:
>
>> Thanks Alan, for this nice exploration. There's a lot to respond to. I'll start with some general comments about sealing, and then move on to your alternate proposal for exposing it.
>>
>> I can think of several main reasons why you would want to seal a hierarchy.
>>
>> - To say something about the _interface itself_. That is, that it was not designed as a general purpose contract between arms-length entities, but that it exists only as a common super type for a fixed set of classes that are part of the library. In other words, "please don't implement me."
>>
>> - To say something about the semantics of the type.
>> Several of the examples in your report fall into this category: "a DbResult is either a NoRowsFound or a Rows(List)". This tells users exactly what the reasonable range of results to expect are when doing a query. Of course, the spec could say the same thing, but that involves reading and interpreting the spec. Easier if this conclusion can be driven by types (and IDEs can help more here too.)
>>
>> - To strengthen typing by simulating unions. If my method is going to return either a String or a Number, the common super type is Object. (Actually, it's some variant of Serializable & Comparable.) Sums-of-products allow library authors to make a stronger statement about types in the presence of unions. Exposing a sum of StringHolder(String) and NumberHolder(Number), using records and sealed types, is not so ceremonious, so some library developers might choose to do this instead of Object.
>>
>> - Security. Some libraries may want to know that the code they are calling is part of their library, rather than an arbitrary implementation of some interface.
>>
>> - To aid in exhaustiveness. We've already discussed this at length; your point is that this one doesn't come up as often as one might hope.
>>
>> Not only is there an obvious synergy between sums and products (as many languages have demonstrated), but there is a third factor, which is "if you make it easy enough, people will use it more." Clearly records are pretty easy to use; your point is that if there were a more streamlined sum-of-products idiom, the third factor would be even stronger here. I think algebraic data types is one of those things that will take some time for developers to learn to appreciate; the easier we make it, of course the faster that will happen.
>>
>> Now, to your syntax suggestion. Overall, I like the idea, but I have some concerns. First, the good parts:
>>
>> - The connection with enums is powerful.
>> Users already understand enums, so this will help them understand sums. Enums have special treatment in switch; we want the same treatment for sealed type patterns. Enums have special treatment for exhaustiveness; we want the same for sealed type patterns. So tying these together with some more general enum-ness leans on what people already know.
>>
>> - While sums and products are theoretically independent features, sums-of-products are expected to be quite common. So it might be reasonable to encourage this syntactically.
>>
>> - The current proposal has some redundancy, in that the subtypes have to say "implements Node", even if they are nested within Node. With a stronger mechanism for declaring them, as you propose, then that can safely be left implicit.
>>
>> - I confess that I too like the simplicity of Haskell's `data` declaration, and this brings us closer.
>>
>> Now, the challenges:
>>
>> - The result is still a little busy. We need a modifier for "enumerated type", and we would also need to be able to have child types be not only records, but ordinary classes and interfaces. So we'd have to have a place for "record", "class", or "interface" with the declaration of the enumerated classes (as well as other modifiers.) That busies up the result a bit.
>>
>> - Once we do this, I worry that it will be hard to tell the difference between:
>>
>> interface X {
>>   class X1 { ... }
>>   class X2 { ... }
>> }
>>
>> and
>>
>> enumerated interface Y {
>>   class Y1 { ... },
>>   class Y2 { ... }
>> }
>>
>> and that users will forever be making mistakes like forgetting the comma, or putting it where it doesn't belong.
>>
>> - This mechanism addresses the very common case of sum-of-product, but leaves more esoteric sums out of the picture. (Consider the types in java.lang.constant, which really want to be sealed.)
>> There, because they are not co-declared, we'd need something more like
>>
>> sealed interface ConstantDesc
>>     permits ClassDesc, MethodTypeDesc, ... { }
>>
>> It's possible that such a mechanism can be grafted on to your proposal, or there is a shuffling that supports it.
>>
>> On Apr 29, 2019, at 2:28 PM, Alan Malloy wrote:
>>
>> Hello again, amber-spec-experts. I have another report from the Google codebase, this time focusing on sealed types. It is viewable in full Technicolor HTML at http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks again to Liam for hosting), and included below as plain text:
>>
>> Author: Alan Malloy (amalloy at google.com)
>> Published: 2019-04-29
>>
>> Feedback on Sealed Types
>>
>> Hello again, amber-spec-experts. I'm back with a second Google codebase research project. I'm looking again at the Records & Sealed Types proposal (which has now become JDK-8222777), but this time I'm focusing on sealed types instead of records, as promised in my RFC of a few weeks ago. My goal was to investigate Google's codebase to guess what developers might have done differently if they had access to sealed types. This could help us understand what works in the current proposal and what to consider changing.
>>
>> Unlike my previous report, this one contains more anecdotes than statistics. It wound up being difficult to build static analysis to categorize the interesting cases, so I mostly examined promising candidates by hand.
>>
>> Summary and Recommendations
>>
>> For those who don't care to read through all my anecdotes, I first provide a summary of my findings, and one suggested addition.
>>
>> Sealed types, as proposed so far, are a good idea in theory: Java already has product types and open polymorphism, and sealed types give us closed polymorphism.
>> However, I could not find many cases of code being written today that
>> would be greatly enhanced if sealed types were available. The main selling
>> point of sealed types for application authors is getting help from the
>> compiler with exhaustiveness checking, but in practice developers almost
>> always have a default case, because they are only interested in a subset
>> of the possible subclasses, and want to ignore cases they don't
>> understand. This means that exhaustiveness-checking for pattern matches
>> would mostly go unused if developers rewrote their existing code using
>> sealed types.
>>
>> Pattern matching is great, and can replace visitors in many cases, but
>> this does not depend on sealed types except for exhaustiveness checks
>> (which, again, would go mostly unused in code written today). The class
>> hierarchies for which people define visitors today are just too large to
>> write an exhaustive pattern match, and so a default case would be very
>> common.
>>
>> The other audience for sealed types is library authors. While in practice
>> most developers have no great need to forbid subclasses, perhaps it would
>> be a boon for authors of particularly popular libraries, who need to expose
>> a non-final class as an implementation detail but don't intend for
>> consumers to create their own subclasses. Those authors can already include
>> documentation saying "you really should not extend this", but there is
>> always some weirdo out there who will ignore your warnings and then write
>> an angry letter when the next version of your library breaks his program
>> (see: sun.misc.Unsafe). Authors of such libraries would welcome the
>> opportunity to make it truly impossible to create undesirable subclasses.
>>
>> Sealed Types As a Vehicle For Sum Types
>>
>> So, sealed types as-is would be an improvement, but a niche one, used by
>> few.
>> I think we can get substantially more mileage out of them if we also
>> include a more cohesive way to explicitly define a sum type and all its
>> subtypes in one place with minimal ceremony. Such a sum type could be
>> sealed, implicitly or explicitly. A tool like this takes what I see as the
>> "theoretical" advantage of sum types (closed polymorphism), and makes it
>> "practical" by putting it front and center. Making sums an actual language
>> element instead of something "implied" by sealing a type and putting its
>> subclasses nearby could help in a lot of ways:
>>
>> * Developers might more often realize that a sealed/sum type is a good
>>   model for their domain. Currently it's a "pattern" external to the
>>   language instead of a "feature", and many don't realize it could be
>>   applied to their domain. Putting it in the language raises its profile,
>>   addressing the problem that people don't realize they want it.
>> * The compiler could provide help for defining simple sums-of-products,
>>   while making it possible to opt into more complicated subclasses, in much
>>   the way that enums do: the typical enum just has bare constants like
>>   EAST, but you can add constructor arguments or override methods when
>>   necessary.
>> * The ability to more easily model data in this way may result in
>>   developers writing more classes that are amenable to sealing/sums, as
>>   they do in other languages with explicit sum types (Haskell, Kotlin,
>>   Scala). Then, the exhaustiveness-checking feature that sealed types
>>   provide would pull more weight.
>>
>> Since enum types are "degenerate sum types", the syntax for defining sums
>> can borrow heavily from enums.
>> A sketch of the syntax I imagine for such things (of course, I am not
>> married to it):
>>
>>     public type-enum interface BinaryTree<T> {
>>       Leaf {
>>         @Override public Stream<T> elements() {return Stream.empty();}
>>       },
>>       Node(T data, BinaryTree<T> left, BinaryTree<T> right) {
>>         @Override public Stream<T> elements() {
>>           return Stream.concat(left.elements(),
>>             Stream.concat(Stream.of(data), right.elements()));
>>         }
>>       };
>>
>>       public Stream<T> elements();
>>     }
>>
>> Like enums, you can use a bare identifier for simple types that exist
>> only to be pattern-matched against, but you can add fields and/or override
>> blocks as necessary. The advantage over declaring a sealed type separately
>> from its elements is both concision (the compiler infers visible records,
>> superclass, and all type parameters) and clarity: you state your intention
>> firmly. I think a convenient syntax like this will encourage developers to
>> use the powerful tool of sealed types to model their data.
>>
>> Evidence in Google's Codebase
>>
>> If you are just interested in recommendations, you can stop reading now:
>> they are all included in the summary. What follows is a number of
>> anecdotes, or case studies if you prefer, that led me to the conclusions
>> above. Each shows a type that might have been written as a sealed type, and
>> hopefully highlights a different facet of the question of how sealed types
>> can be useful.
>>
>> The first thing I looked for was classes which are often involved in
>> instanceof checks.
>> As language authors, we imagine people writing stuff like this[1] all the
>> time:
>>
>>     interface Expr {int eval(Scope s);}
>>     record Var(String name) implements Expr {
>>       public int eval(Scope s) {return s.get(name);}
>>     }
>>     record Sum(Expr left, Expr right) implements Expr {
>>       public int eval(Scope s) {return left.eval(s) + right.eval(s);}
>>     }
>>     class Analyzer {
>>       Stream<String> variablesUsed(Expr e) {
>>         if (e instanceof Var) return Stream.of(((Var)e).name);
>>         if (e instanceof Sum) {
>>           return Stream.concat(variablesUsed(((Sum)e).left),
>>                                variablesUsed(((Sum)e).right));
>>         }
>>         throw new IllegalArgumentException();
>>       }
>>     }
>>
>> Here, the Expr interface captures some of the functionality shared by all
>> expressions, but later a client (Analyzer) came along and invented some
>> other polymorphic operations to perform on an Expr, which Expr did not
>> support. So Analyzer needed to do instanceof checks instead, externalizing
>> the polymorphism. The principled approach would have been for Expr to
>> export a visitor to begin with, but perhaps it wasn't seen as worth the
>> trouble at the time.
>>
>> To try to find this pattern in the wild, I searched for method bodies
>> which perform multiple instanceof checks against the same variable.
>> Notably, this excludes the typical equals(Object) method, which only
>> performs a single check. For each such variable, I noted:
>>
>> 1. Its declared type
>> 2. The set of subtypes it was checked for with instanceof
>> 3. The common supertype of those subtypes.
>>
>> I guessed that (3) would usually be the same as (1), but in practice 55%
>> of the time they were different. Often, the declared type was Object, or
>> some generic type variable which erases to Object, while the common
>> supertype being tested was something like Number, Event, or Node.
>> For example, a Container knows it will be used in some context where NaN
>> is unsuitable, so it checks whether its contents are Float or Double, and
>> if so ensures NaN is not stored. As a second example, a serialize(Object)
>> method checks whether its input is String or ByteString, and throws an
>> exception otherwise.
>>
>> Bad sealed types found looking at instanceof checks
>>
>> I looked through the most popular declared types of these candidates, to
>> investigate which types are often involved in such checks. Most of them are
>> not good candidates for a sealed type. Object was the most common type,
>> followed by Exception and Throwable.
>>
>> Next up is an internal DOMObject class, which sounds promising until I
>> tell you it has thousands of direct subclasses. Nobody is doing exhaustive
>> switches on this, of course. Instead, many uses iterate over a
>> Collection<DOMObject>, or receive a DOMObject in some way, and just check
>> whether it is of one or two specific subtypes they care about. This turned
>> out to be a very common pattern, not just for DOMObject, but for many
>> candidate sealed types I found: nobody does exhaustive case analysis. They
>> just look for objects they understand in some much larger hierarchy, and
>> ignore the rest.
>>
>> Some more humorous types that are often involved in instanceof checks:
>> java.net.InetAddress (everyone wants to know if it's v4 or v6) and
>> com.sun.source.tree.Tree, in our static-analysis tools. Tree is an
>> interesting case: here we do exactly what I mentioned previously for
>> DOMObject. On the surface it seems that Tree would be a good candidate for
>> a sealed interface with record subtypes, but in practice I'm not sure what
>> sealing would buy us. We would effectively opt out of
>> exhaustiveness-checking by having a large default case, or by extending a
>> visitor with default-empty methods.
>> Of course, sometimes we define a new visitor to do some polymorphic
>> operation over a Tree, but more often we just look for one or two subtypes
>> we care about. For example, DataFlow inspects a Tree, but knows from
>> context that it is either a LambdaExpressionTree, MethodTree, or an
>> initializer.
>>
>> Plausible sealed types found looking at instanceof checks
>>
>> The previous section notwithstanding, I did dig deep enough into the
>> results to find a few classes that could make good sealed types. The most
>> prominent, and most interesting, was another AST. There is an abstract Node
>> class for representing HTML documents. It has just 4 subclasses defined in
>> the same file: Text, Comment, Tag, and EndTag. This spartan definition
>> suggests it's used for something like SAX parsing, but I didn't confirm
>> this. It does everything you could hope for from a type like this: it
>> exposes a Visitor, it provides an accept(Visitor) method, and the
>> superclass specifies abstract methods for a couple of the most common
>> things you would want to do, such as a String toHtml() method.
>>
>> However, recall that I found this class by looking for classes often
>> involved in instanceof checks! Some people use the visitor, but why doesn't
>> everyone? The first reason I found is one I've mentioned several times
>> already: clients only care about one of the 4 cases, and may have felt
>> creating an anonymous visitor is too much ceremony. Would they be happy
>> with a switch and a default clause? Probably, but it's hard to know for
>> sure. The second reason surprised me a bit: I found clients doing analysis
>> that isn't really amenable to any one visitor, or a simple pattern-match.
>> They've written this:
>>
>>     if (mode1) { if (x instanceof Tag) {...} }
>>     else if (mode2) { if (x instanceof Text) {...}}
>>
>> The same use site cares about different subclasses at different times,
>> depending on some other flag(s) controlling its behavior.
>> Even if we offered a pattern-match on x, it's difficult to encode the
>> flags correctly. They would have to match on a tuple of (mode1, mode2, x),
>> with a case for (true, _, Tag name) and another for (false, true, Text
>> text). Technically possible, but not really prettier than what they
>> already have, especially since you would need to use a local record
>> instead of an anonymous tuple.
>>
>> Even so, I think this would have benefited from being a sealed type.
>> Recall that earlier I carefully said "4 subclasses defined in the same
>> file". This is because some jokester in a different package altogether has
>> defined their own fifth subclass, Doctype. They have their own
>> sub-interface of Visitor that knows about Doctype nodes. I can't help but
>> feel that the authors of Node would have preferred to make this illegal, if
>> they had been able to.
>>
>> The second good sealed type I found is almost an enum, except that one of
>> the instances has per-instance data. This is not exactly a surprise, since
>> an enum is a degenerate sum type, and one way to think of sealed types is
>> as a way to model sums. It looks something like this[2]:
>>
>>     public abstract class DbResult<T> {
>>       public record NoDatabase<T>() extends DbResult<T>;
>>       public record RowNotFound<T>() extends DbResult<T>;
>>       // Four more error types ...
>>       public record EmptySuccess<T>() extends DbResult<T>;
>>       public record SuccessWithData<T>(T data) extends DbResult<T>;
>>
>>       public T getData() {
>>         if (!(this instanceof SuccessWithData))
>>           throw new DbException();
>>         return ((SuccessWithData<T>)this).data;
>>       }
>>       public <U> DbResult<U> transform(Function<T, U> f) {
>>         if (!(this instanceof SuccessWithData)) {
>>           return (DbResult<U>)this;
>>         }
>>         return new SuccessWithData<U>(f.apply(
>>             ((SuccessWithData<T>)this).data));
>>       }
>>     }
>>
>> Reading this code made me yearn for Haskell: here is someone who surely
>> wanted to write
>>
>>     data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t
>>
>> but had to spend 120 lines defining their sum-of-products (the extra
>> verbosity is because really they made the subclasses private, and defined
>> private static singletons for each of the error types, with a static getter
>> to get the type parameter right). This seems like a potential win for
>> records and for sealed types. Certainly my snippet was much shorter than
>> the actual source file because the proposed record syntax is quite concise,
>> so that is a real win. But what do we really gain from sealing this type?
>> Still nobody does exhaustive analysis even of this relatively small type:
>> they just use functions like getData and transform to work with the result
>> generically, or spot-check a couple interesting subtypes with instanceof.
>> Forbidding subclassing from other packages hardly matters: nobody was
>> subclassing it anyway, and nor would they be tempted to. Really the
>> improvements DbResult benefits most from are records, and pattern-matching
>> on records. It would be much nicer to replace the instanceof/cast pattern
>> with a pattern-match that extracts the relevant field.
>>
>> This is the use case that inspired my idea of a type-enum, in the Summary
>> section above.
>> Rewriting it as a type-enum eliminates many of the problems: all the
>> instanceof checks are gone, we don't need a bunch of extra keywords for
>> each case, and we're explicit about the subclasses "belonging to" the
>> sealed parent, which means we get stuff like extends and <T> for free. We
>> get improved clarity by letting the definition of the class hierarchy
>> reflect its "nature" as a sum.
>>
>>     public abstract type-enum DbResult<T> {
>>       NoDatabase,
>>       RowNotFound,
>>       EmptySuccess,
>>       SuccessWithData(T data) {
>>         @Override public T getData() {
>>           return data;
>>         }
>>         @Override public <U> DbResult<U> transform(Function<T, U> f) {
>>           return new SuccessWithData<U>(f.apply(data));
>>         }
>>       }
>>
>>       public T getData() {
>>         throw new DbException();
>>       }
>>       public <U> DbResult<U> transform(Function<T, U> f) {
>>         return (DbResult<U>)this;
>>       }
>>     }
>>
>> Visitors
>>
>> Instead of doing a bunch of instanceof checks, the "sophisticated" way to
>> interact with a class having a small, known set of subtypes is with a
>> visitor. I considered doing some complicated analysis to characterize what
>> makes a class a visitor, and trying to automatically cross-reference
>> visitors to the classes they visit...but in practice simply looking for
>> classes with "Visitor" in their name was a strong enough signal that a more
>> complicated approach was not needed. Having identified visitors, I looked
>> at those visitors with the most subclasses, since each distinct subclass
>> corresponds to one "interaction" with the sealed type that it visits, and
>> well-used visitors suggest both popularity and good design.
>>
>> One common theme I found: developers aren't good at applying the visitor
>> pattern. Many cases I found had some weird and inexplicable quirk compared
>> to the "standard" visitor. These developers will be relieved to get
>> pattern-matching syntax so they can stop writing visitors.
>>
>> The Visiting Object
>>
>> The first popular visitor I found was a bit odd to me.
>> It's another tree type, but with a weird amalgam of several visitors, and
>> an unusual approach to its double dispatch. I have to include a relatively
>> lengthy code snippet to show all of its facets:
>>
>>     public static abstract class Node {
>>       public interface Visitor {
>>         boolean process(Node node);
>>       }
>>       public boolean visit(Object v) {
>>         return v instanceof Visitor
>>             && ((Visitor)v).process(this);
>>       }
>>       // Other methods common to all Nodes ...
>>     }
>>
>>     public static final class RootNode extends Node {
>>       public interface Visitor {
>>         boolean processRoot(RootNode node);
>>       }
>>       @Override
>>       public boolean visit(Object v) {
>>         return v instanceof Visitor
>>             ? ((Visitor)v).processRoot(this)
>>             : super.visit(v);
>>       }
>>       // Other stuff about root nodes ...
>>     }
>>
>>     public static abstract class ValueNode extends Node {
>>       public interface Visitor {
>>         boolean processValue(ValueNode node);
>>       }
>>       @Override
>>       public boolean visit(Object v) {
>>         return v instanceof Visitor
>>             ? ((Visitor)v).processValue(this)
>>             : super.visit(v);
>>       }
>>     }
>>
>>     public static final class BooleanNode extends ValueNode {
>>       public interface Visitor {
>>         boolean processBool(BooleanNode node);
>>       }
>>       @Override
>>       public boolean visit(Object v) {
>>         return v instanceof Visitor
>>             ? ((Visitor)v).processBool(this)
>>             : super.visit(v);
>>       }
>>       // Other stuff about booleans ...
>>     }
>>
>>     public static final class StringNode extends ValueNode {
>>       // Much the same as BooleanNode
>>     }
>>
>> This goes on for some time: there is a multi-layered hierarchy of dozens
>> of node types, each with a boolean visit(Object) method, and their own
>> distinct Visitor interface, in this file. I should note that this code is
>> actually not written by a human, but rather generated by some process (I
>> didn't look into how).
>> I still think it is worth mentioning here for two reasons: first, whoever
>> wrote the code generator would probably do something similar if writing it
>> by hand, and second because these visitors are used often by hand-written
>> code.
>>
>> Speaking of hand-written code, visitor subclasses now get to declare
>> ahead of time exactly which kinds of nodes they care about, by implementing
>> only the appropriate Visitor interfaces:
>>
>>     private class FooVisitor implements StringNode.Visitor,
>>         BooleanNode.Visitor, RootNode.Visitor {
>>       // ...
>>     }
>>
>> This isn't how I would have written things, but I can sorta see the
>> appeal, if you don't have to write it all by hand: a visitor can choose to
>> handle any one subclass of ValueNode, or all ValueNodes, or just RootNode
>> and StringNode, et cetera. They get to pick and choose what sub-trees of
>> the inheritance tree they work with.
>>
>> Would Node be a good sealed class? Maybe. It clearly intends to enumerate
>> all subclasses, but the benefit it gets from enforcing that is minimal. As
>> in my previous examples, the main advantage for Node implementors would
>> come from records, and the main advantage for clients would come from
>> pattern-matching, obviating their need for this giant visitor.
>>
>> The Enumerated Node
>>
>> Another AST, this time for some kind of query language, explicitly
>> declares an enum of all subclasses it can have, and uses this enum instead
>> of using traditional double-dispatch:
>>
>>     public interface Node {
>>       enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
>>       Kind getKind();
>>       Location getLocation();
>>     }
>>
>>     public abstract record AbstractNode(Location l) implements Node {}
>>
>>     public class Expr extends AbstractNode {
>>       public Kind getKind() {return EXPR;}
>>       // ...
>>     }
>>     // And so on for other Kinds ...
>>
>>     public abstract class Visitor {
>>       // Empty default implementations, not abstract.
>>       public Expr visitExpr(Expr e) {}
>>       public Query visitQuery(Query q) {}
>>       public Import visitImport(Import i) {}
>>       public Node visit(Node n) {
>>         switch (n.getKind()) {
>>           case EXPR: return visitExpr((Expr)n);
>>           case QUERY: return visitQuery((Query)n);
>>           case IMPORT: return visitImport((Import)n);
>>           // ...
>>         }
>>       }
>>     }
>>
>> It's not really clear to me why they do it this way, instead of putting
>> an accept(Visitor) method on Node. They gain the ability to return
>> different types for each Node subtype, but are hugely restricted in what
>> visitors can do: they must return a Node, instead of performing an
>> arbitrary computation. It seems like the idea is visitors must specialize
>> to tree rewriting, but I still would have preferred to parameterize the
>> visitor by return type.
>>
>> Would this be better as a sealed type? I feel sure that if sealed types
>> existed, the authors of this class would have used one. We could certainly
>> do away with the enum, and use an expression-switch instead to
>> pattern-match in the implementation of visit(Node). But I think the Visitor
>> class would still exist, and still have separate methods for each Node
>> subtype, because the developer seemed to care about specializing the
>> return type. The only place where an exhaustiveness check helps would be in
>> the visit(Node) method, inside the visitor class itself. All other dispatch
>> goes through visit(Node), or through one of the specialized visitor methods
>> if the type is known statically. It seems like overall this would be an
>> improvement, but again, the improvement comes primarily from
>> pattern-matching, not sealing.
>>
>> Colocated interface implementations
>>
>> Finally, I looked for interfaces having all of their implementations
>> defined in the same file. On this I do have some statistical data[3]. A
>> huge majority (98.5%) of public interfaces have at least one implementation
>> in a different source file.
>> Package-private interfaces also tend to have implementations in other
>> files: 85% of them are in this category. For protected interfaces it's
>> much closer: only 53% have external implementations. Of course, all
>> private interfaces have all implementations in a single file.
>>
>> Next, I looked at interfaces that share a source file with all their
>> implementations, to see whether they'd make good sealed types. First was
>> this Entry interface:
>>
>>     public interface Entry {
>>       enum Status {OK, PENDING, FAILED}
>>       Status getStatus();
>>       int size();
>>       String render();
>>     }
>>
>>     public class UserEntry implements Entry {
>>       private User u;
>>       private Status s;
>>       public UserEntry(User u, Status s) {
>>         this.u = u;
>>         this.s = s;
>>       }
>>       @Override public String render() {return u.name();}
>>       @Override public int size() {return 1;}
>>       @Override public Status getStatus() {return s;}
>>     }
>>
>>     public class AccountEntry implements Entry {
>>       private Account a;
>>       private Status s;
>>       public AccountEntry(Account a, Status s) {
>>         this.a = a;
>>         this.s = s;
>>       }
>>       @Override public String render() {return a.render();}
>>       @Override public int size() {return a.size();}
>>       @Override public Status getStatus() {return s;}
>>     }
>>
>> A huge majority of the clients of this Entry interface treat it
>> polymorphically, just calling its interface methods. In only one case is
>> there an instanceof check made on an Entry, dispatching to different
>> methods depending on which subclass is present.
>>
>> Is this a good sealed type? I think not, really. There are two
>> implementations now, but perhaps there will be a GroupEntry someday.
>> Existing clients should continue to work in that case: the polymorphic
>> Entry interface provides everything clients are "intended" to know.
>>
>> Another candidate for sealing:
>>
>>     public interface Request {/* Empty */}
>>     public record RequestById(int id) implements Request;
>>     public record RequestByValue(String owner, boolean historic)
>>         implements Request;
>>
>>     public class RequestFetcher {
>>       public List fetch(Iterable<Request> requests) {
>>         List<RequestById> idReqs = Lists.newArrayList();
>>         List<RequestByValue> valueReqs = Lists.newArrayList();
>>         List queries = Lists.newArrayList();
>>         for (Request req : requests) {
>>           if (req instanceof RequestById) {
>>             idReqs.add((RequestById)req);
>>           } else if (req instanceof RequestByValue) {
>>             valueReqs.add((RequestByValue)req);
>>           }
>>         }
>>         queries.addAll(prepareIdQueries(idReqs));
>>         queries.addAll(prepareValueQueries(valueReqs));
>>         return runQueries(queries);
>>       }
>>     }
>>
>> Interestingly, since the Request interface is empty, the only way to do
>> anything with this class is to cast it to one implementation type. In fact,
>> the RequestFetcher I include here is the only usage of either of these
>> classes (plus, of course, helpers like prepareIdQueries).
>>
>> So, clients need to know about specific subclasses, and want to be sure
>> they're doing exhaustive pattern-matching. Seems like a great sealed class
>> to me. Except...actually each of the two subclasses has been extended by a
>> decorator adding a source[4]:
>>
>>     public record SourcedRequestById(Source source) extends RequestById;
>>     public record SourcedRequestByValue(Source source)
>>         extends RequestByValue;
>>
>> Does this argue in favor of sealing, or against? I don't really know. The
>> owners of Request clearly intended for all four of these subclasses to
>> exist (they're in the same package), so they could include them all in the
>> permitted subtype list, but it seems like a confusing API to expose to
>> clients.
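
[Editorial aside, not part of the quoted report.] For readers with a current JDK, the exhaustive dispatch discussed above can be tried out. The following is only a sketch under stated assumptions: it needs Java 21 or later (sealed types and pattern matching for switch were finalized after this thread), and the SealedRequestDemo wrapper class and the describe helper are hypothetical names standing in for the report's prepareIdQueries/prepareValueQueries plumbing, which is not shown in the original.

```java
import java.util.ArrayList;
import java.util.List;

class SealedRequestDemo {
    // Sealed: the compiler knows these two records are the only implementations.
    sealed interface Request permits RequestById, RequestByValue {}
    record RequestById(int id) implements Request {}
    record RequestByValue(String owner, boolean historic) implements Request {}

    // Hypothetical helper: renders each request as text instead of building queries.
    static List<String> describe(Iterable<? extends Request> requests) {
        List<String> out = new ArrayList<>();
        for (Request req : requests) {
            // No default branch: if a third permitted subtype is added later,
            // this switch stops compiling until the new case is handled.
            String s = switch (req) {
                case RequestById r -> "id=" + r.id();
                case RequestByValue r -> "owner=" + r.owner() + ", historic=" + r.historic();
            };
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of(
                new RequestById(7), new RequestByValue("alice", true))));
    }
}
```

Because records are implicitly final, the Sourced* decorators from the report could not extend these records; they would have to become separate permitted subtypes or wrap a Request, which is the API question the paragraph above leaves open.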
>>
>> A third candidate for sealing is another simple sum type:
>>
>>     public interface ValueOrAggregatorException<T> {
>>       T get();
>>       public static <T> ValueOrAggregatorException<T>
>>           of(T value) {
>>         return new OfValue<T>(value);
>>       }
>>       public static <T> ValueOrAggregatorException<T>
>>           ofException(AggregatorException err) {
>>         return new OfException<T>(err);
>>       }
>>       private record OfValue<T>(T value)
>>           implements ValueOrAggregatorException<T> {
>>         @Override public T get() {return value;}
>>       }
>>       private record OfException<T>(AggregatorException err)
>>           implements ValueOrAggregatorException<T> {
>>         @Override public T get() {throw err;}
>>       }
>>     }
>>
>> It has only two subtypes, and it seems unimaginable there could ever be a
>> third, so why not seal it? However, the subtypes are intentionally hidden:
>> it is undesirable to let people see whether there's an exception, except by
>> having it thrown at you. In fact AggregatorException is documented as
>> "plugins may throw this, but should never catch it": there is some
>> higher-level thing responsible for catching all such exceptions. So, this
>> type gains no benefit from exhaustiveness checks in pattern-matching. The
>> type is intended to be used polymorphically, through its interface method,
>> even though its private implementation is amenable to sealing.
>> ________________
>> [1] Throughout this document I will use record syntax as if it were
>> already in the language. This is merely for brevity, and to avoid making
>> the reader spend a lot of time reading code that boils down to just storing
>> a couple fields. In practice, of course the code in Google's codebase
>> either defines the records by hand, or uses an @AutoValue.
>> [2] Recall that @AutoValue, Google's "record", allows extending a class,
>> which is semantically okay here: DbResult has no state, only behavior.
>> [3] This data is imperfect. While the Google codebase strongly discourages
>> having more than one version of a module checked in, there is still some
>> amount of "vendoring"
>> or checking in multiple versions of some package, e.g. for supporting
>> external clients of an old version of an API. As a result, two "different
>> files" which are really copies of each other may implement interfaces with
>> the same fully-qualified name; I did not attempt to control for this case,
>> and so such cases may look like they were in the same file, or not.
>> [4] Of course in the record proposal it is illegal to extend records like
>> this; in real life these plain data carriers are implemented by hand as
>> ordinary classes, so the subtyping is legal.
>>
>>

From brian.goetz at oracle.com Mon Apr 29 21:01:17 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 29 Apr 2019 17:01:17 -0400
Subject: Feedback on Sealed Types
In-Reply-To:
References:
Message-ID: <7a3cd179-1ccc-f3d5-b438-3146e87fa4eb@oracle.com>

It would be nice if we could "just" overload enum itself to support a
record-like option:

    enum Node {
        AddNode(Node a, Node b),
        MulNode(Node a, Node b),
        ...;
        ...
    }

but unfortunately that syntax looks confusingly close to something else :(
It also doesn't really scale to multi-level hierarchies, though that might
be OK.

It's worth thinking about, though. The `data` construct from Haskell is
surely more direct than modeling a sum with sealed interfaces (though, the
latter is also more flexible than `data`.)

On the other hand, declaring a sum of records as:

    sealed interface I {
        record A(int a) implements I { }
        record B(long b) implements I { }
        record C(String c) implements I { }
    }

isn't so bad. The main redundancy here is primarily "implements I", and
secondarily "record". We can surely compress that away like this:

    enum interface I {
        A(int a),
        B(long b),
        C(String c);
        // I methods
    }

but I am not sure it carries its weight, given that it adds little
additional concision (and no additional semantics.)
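
[Editorial aside, not part of Brian's message.] The "sealed interface with nested records" shape can be exercised on a current JDK. What follows is a minimal sketch, assuming Java 21 or later (records, sealed types, and record patterns in switch were all finalized after this thread). The Const record, the eval method, and the SealedNodeDemo wrapper class are hypothetical additions so the example runs; only AddNode and MulNode come from the message.

```java
class SealedNodeDemo {
    // A sealed type with no permits clause permits exactly the subtypes
    // declared in the same source file, here the three nested records.
    sealed interface Node {
        record Const(int value) implements Node {}          // hypothetical leaf
        record AddNode(Node a, Node b) implements Node {}
        record MulNode(Node a, Node b) implements Node {}
    }

    // Exhaustive switch over the sum: no default branch is required, and the
    // record patterns bind the components without casts.
    static int eval(Node n) {
        return switch (n) {
            case Node.Const c -> c.value();
            case Node.AddNode(Node a, Node b) -> eval(a) + eval(b);
            case Node.MulNode(Node a, Node b) -> eval(a) * eval(b);
        };
    }

    public static void main(String[] args) {
        Node expr = new Node.AddNode(new Node.Const(2),
                new Node.MulNode(new Node.Const(3), new Node.Const(4)));
        System.out.println(eval(expr)); // 2 + 3 * 4
    }
}
```

The redundancy Brian points out is visible here: each nested declaration repeats "record" and "implements Node", which is exactly what the enum-like shorthand would have elided.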

On 4/29/2019 4:33 PM, Alan Malloy wrote:
> Thanks, Brian. I indeed didn't think of some of your proposed benefits
> of sealing non-sum types, as I was focused mostly on things you
> mentioned explicitly in the JEP, which is somewhat light on the
> expected benefits.
>
> I think the first two items in your "challenges" solve each other: I
> don't intend sum types to be the only kind of sealed type, but just a
> good way to declare the simplest kind. I left out the "record" keyword
> from the declaration with the idea that it would be implicit: if you
> want the convenient sum-of-products declaration style, you have to use
> records. If you want something more complicated, you declare a sealed
> interface (or superclass), and N permitted subclasses, declared
> separately in whatever way you want. This restriction helps by making
> the semantics clearer, and I had also hoped that it would lead to a
> syntax error if you leave out the comma. Looking more closely, I see
> this is somewhat precarious: a record declaration looks enough like a
> method signature that they may be ambiguous in an interface, if you
> don't require the "record" keyword, or if you use a semicolon instead
> of a comma. I think it can still work if we require each nested record
> to use {...} instead of ; even if it's empty. This way, your two
> examples look like
>
>     interface X {
>         class X1 { ... }
>         class X2 { ... }
>     }
>
> and
>
>     enumerated interface Y {
>         Y1 { ... },
>         Y2 { ... }
>     }
>
> The latter would become illegal if you dropped the comma, even if you
> also forgot the "enumerated" keyword, because the braces make no sense
> in an ordinary interface.
>
> On Mon, Apr 29, 2019 at 12:35 PM Brian Goetz wrote:
>
>     Thanks Alan, for this nice exploration. There's a lot to respond
>     to. I'll start with some general comments about sealing, and then
>     move on to your alternate proposal for exposing it.
> > I can think of several main reasons why you would want to seal a > hierarchy. > > ?- To say something about the _interface itself_. That is, that it > was not designed as a general purpose contract between arms-length > entities, but that it exists only as a common super type for a > fixed set of classes that are part of the library.? In other > words, ?please don?t implement me.? > > ?- To say something about the semantics of the type. Several of > the examples in your report fall into this category: ?a DbResult > is either a NoRowsFound or a Rows(List)?.? This tells users > exactly what the reasonable range of results to expect are when > doing a query.? Of course, the spec could say the same thing, but > that involves reading and interpreting the spec. Easier if this > conclusion can be driven by types (and IDEs can help more here too.) > > ?- To strengthen typing by simulating unions.? If my method is > going to return either a String or a Number, the common super type > is Object. ?(Actually, it?s some variant of Serializable & > Comparable.). Sums-of-products allow library authors > to make a stronger statement about types in the presence of > unions.? Exposing a sum of StringHolder(String) and > NumberHolder(Number), using records and sealed types, is not so > ceremonious, so some library developers might choose to do this > instead of Object. > > ?- Security.? Some libraries may want to know that the code they > are calling is part of their library, rather than an arbitrary > implementation of some interface. > > ?- To aid in exhaustiveness.? We?ve already discussed this at > length; your point is that this one doesn?t come up as often as > one might hope. > > Not only is there an obvious synergy between sums and products (as > many languages have demonstrated), but there is a third factor, > which is ?if you make it easy enough, people will use it more.? 
> ?Clearly records are pretty easy to use; your point is that if > there were a more streamlined sum-of-products idiom, the third > factor would be even stronger here.? I think algebraic data types > is one of those things that will take some time for developers to > learn to appreciate; the easier we make it, of course the faster > that will happen. > > > Now, to your syntax suggestion.? Overall, I like the idea, but I > have some concerns.? First, the good parts: > > ?- The connection with enums is powerful.? Users already > understand enums, so this will help them understand sums. Enums > have special treatment in switch; we want the same treatment for > sealed type patterns. Enums have special treatment for > exhaustiveness; we want the same for sealed type patterns.? So > tying these together with some more general enum-ness leans on > what people already know. > > ?- While sums and products are theoretically independent features, > sums-of-products are expected to be quite common.? So it might be > reasonable to encourage this syntactically. > > ?- The current proposal has some redundancy, in that the subtypes > have to say ?implements Node?, even if they are nested within > Node.? With a stronger mechanism for declaring them, as you > propose, then that can safely be left implicit. > > ?- I confess that I too like the simplicity of Haskell?s `data` > declaration, and this brings us closer. > > Now, the challenges: > > ?- The result is still a little busy.? We need a modifier for > ?enumerated type?, and we would also need to be able to have child > types be not only records, but ordinary classes and interfaces.? > So we?d have to have a place for ?record?, ?class?, or ?interface? > with the declaration of the enumerated classes (as well as other > modifiers.). That busies up the result a bit. > > ?- Once we do this, I worry that it will be hard to tell the > difference between: > > ? ? interface X { > ? ? ? ? class X1 { ? } > ? ? ? ? class X2 { ? } > ? ? 
} > > and > >     enumerated interface Y { >         class Y1 { ... }, class Y2 { ... } >     } > > and that users will forever be making mistakes like forgetting the > comma, or putting it where it doesn't belong. > > - This mechanism addresses the very common case of > sum-of-product, but leaves more esoteric sums out of the picture. > (Consider the types in java.lang.constant, which really want to > be sealed.) There, because they are not co-declared, we'd need > something more like > >     sealed interface ConstantDesc >         permits ClassDesc, MethodTypeDesc, ... { } > > It's possible that such a mechanism can be grafted on to your > proposal, or there is a shuffling that supports it. > > > > > >> On Apr 29, 2019, at 2:28 PM, Alan Malloy wrote: >> >> Hello again, amber-spec-experts. I have another report from the >> Google codebase, this time focusing on sealed types. It is >> viewable in full Technicolor HTML at >> http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html (thanks >> again to Liam for hosting), and included below as plain text: >> >> Author: Alan Malloy (amalloy at google.com) >> Published: 2019-04-29 >> >> Feedback on Sealed Types >> >> Hello again, amber-spec-experts. I'm back with a second Google >> codebase research project. I'm looking again at the Records & >> Sealed Types proposal (which has now become JDK-8222777), but >> this time I'm focusing on sealed types instead of records, as >> promised in my RFC of a few weeks ago. My goal was to investigate >> Google's codebase to guess what developers might have done >> differently if they had access to sealed types. This could help >> us understand what works in the current proposal and what to >> consider changing. >> >> Unlike my previous report, this one contains more anecdotes than >> statistics. It wound up being difficult to build static analysis >> to categorize the interesting cases, so I mostly examined >> promising candidates by hand.
>> >> Summary and Recommendations >> >> For those who don?t care to read through all my anecdotes, I >> first provide a summary of my findings, and one suggested addition. >> >> Sealed types, as proposed so far, are a good idea in theory: Java >> already has product types and open polymorphism, and sealed types >> give us closed polymorphism. However, I could not find many cases >> of code being written today that would be greatly enhanced if >> sealed types were available. The main selling point of sealed >> types for application authors is getting help from the compiler >> with exhaustiveness checking, but in practice developers almost >> always have a default case, because they are only interested in a >> subset of the possible subclasses, and want to ignore cases they >> don?t understand. This means that exhaustiveness-checking for >> pattern matches would mostly go unused if developers rewrote >> their existing code using sealed types. >> >> Pattern matching is great, and can replace visitors in many >> cases, but this does not depend on sealed types except for >> exhaustiveness checks (which, again, would go mostly unused in >> code written today). The class hierarchies for which people >> define visitors today are just too large to write an exhaustive >> pattern match, and so a default case would be very common. >> >> The other audience for sealed types is library authors. While in >> practice most developers have no great need to forbid subclasses, >> perhaps it would be a boon for authors of particularly popular >> libraries, who need to expose a non-final class as an >> implementation detail but don?t intend for consumers to create >> their own subclasses. Those authors can already include >> documentation saying ?you really should not extend this?, but >> there is always some weirdo out there who will ignore your >> warnings and then write an angry letter when the next version of >> your library breaks his program (see: sun.misc.Unsafe). 
Authors >> of such libraries would welcome the opportunity to make it truly >> impossible to create undesirable subclasses. >> >> Sealed Types As a Vehicle For Sum Types >> >> So, sealed types as-is would be an improvement, but a niche one, >> used by few. I think we can get substantially more mileage out of >> them if we also include a more cohesive way to explicitly define >> a sum type and all its subtypes in one place with minimal >> ceremony. Such a sum type could be sealed, implicitly or >> explicitly. A tool like this takes what I see as the >> ?theoretical? advantage of sum types (closed polymorphism), and >> makes it ?practical? by putting it front and center. Making sums >> an actual language element instead of something ?implied? by >> sealing a type and putting its subclasses nearby could help in a >> lot of ways: >> >> * Developers might more often realize that a sealed/sum type is a >> good model for their domain. Currently it?s a ?pattern? external >> to the language instead of a ?feature?, and many don?t realize it >> could be applied to their domain. Putting it in the language >> raises its profile, addressing the problem that people don?t >> realize they want it. >> * The compiler could provide help for defining simple >> sums-of-products, while making it possible to opt into more >> complicated subclasses, in much the way that enums do: the >> typical enum just has bare constants like EAST, but you can add >> constructor arguments or override methods when necessary. >> * The ability to more easily model data in this way may result in >> developers writing more classes that are amenable to >> sealing/sums, as they do in other languages with explicit sum >> types (Haskell, Kotlin, Scala). Then, the exhaustiveness-checking >> feature that sealed types provide would pull more weight. >> >> Since enum types are ?degenerate sum types?, the syntax for >> defining sums can borrow heavily from enums. 
A sketch of the >> syntax I imagine for such things (of course, I am not married to it): >> public type-enum interface BinaryTree { >> ? Leaf { >> ? ? @Override public Stream elements() {return Stream.empty();} >> ? }, >> ? Node(T data, BinaryTree left, BinaryTree right) { >> ? ? @Override public Stream elements() { >> ? ? ? return Stream.concat(left.elements(), >> Stream.concat(Stream.of(data), right.elements())); >> ? ? } >> ? }; >> >> >> ? public Stream elements(); >> } >> >> Like enums, you can use a bare identifier for simple types that >> exist only to be pattern-matched against, but you can add fields >> and/or override blocks as necessary. The advantage over declaring >> a sealed type separately from its elements is both concision (the >> compiler infers visible records, superclass, and all type >> parameters) and clarity: you state your intention firmly. I think >> a convenient syntax like this will encourage developers to use >> the powerful tool of sealed types to model their data. >> >> Evidence in Google?s Codebase >> >> If you are just interested in recommendations, you can stop >> reading now: they are all included in the summary. What follows >> is a number of anecdotes, or case studies if you prefer, that led >> me to the conclusions above. Each shows a type that might have >> been written as a sealed type, and hopefully highlights a >> different facet of the question of how sealed types can be useful. >> >> The first thing I looked for was classes which are often involved >> in instanceof checks. As language authors, we imagine people >> writing stuff like this[1] all the time: >> >> interface Expr {int eval(Scope s);} >> record Var(String name) implements Expr { >> ? public int eval(Scope s) {return s.get(name);} >> } >> record Sum(Expr left, Expr right) implements Expr { >> ? public int eval(Scope s) {return left.eval(s) + right.eval(s);} >> } >> class Analyzer { >> ? Stream variablesUsed(Expr e) { >> ? ? 
if (e instanceof Var) return Stream.of(((Var)e).name); >> ? ? if (e instanceof Sum) { >> ? ? ? return variablesUsed(((Sum)e).left) >> .concat(variablesUsed(((Sum)e).right)); >> ? ?} >> ? ? throw new IllegalArgumentException(); >> ? } >> } >> >> Here, the Expr interface captures some of the functionality >> shared by all expressions, but later a client (Analyzer) came >> along and invented some other polymorphic operations to perform >> on an Expr, which Expr did not support. So Analyzer needed to do >> instanceof checks instead, externalizing the polymorphism. The >> principled approach would have been for Expr to export a visitor >> to begin with, but perhaps it wasn?t seen as worth the trouble at >> the time. >> >> To try to find this pattern in the wild, I searched for method >> bodies which perform multiple instanceof checks against the same >> variable. Notably, this excludes the typical equals(Object) >> method, which only performs a single check. For each such >> variable, I noted: >> >> 1. Its declared type >> 2. The set of subtypes it was checked for with instanceof >> 3. The common supertype of those subtypes. >> >> I guessed that (3) would usually be the same as (1), but in >> practice 55% of the time they were different. Often, the declared >> type was Object, or some generic type variable which erases to >> Object, while the common supertype being tested was something >> like Number, Event, or Node. For example, a Container knows it >> will be used in some context where NaN is unsuitable, so it >> checks whether its contents are Float or Double, and if so >> ensures NaN is not stored. As a second example, a >> serialize(Object) method checks whether its input is String or >> ByteString, and throws an exception otherwise. >> >> Bad sealed types found looking at instanceof checks >> >> I looked through the most popular declared types of these >> candidates, to investigate which types are often involved in such >> checks. 
Most of them are not good candidates for a sealed type. >> Object was the most common type, followed by Exception and Throwable. >> >> Next up is an internal DOMObject class, which sounds promising >> until I tell you it has thousands of direct subclasses. Nobody is >> doing exhaustive switches on this, of course. Instead, many uses >> iterate over a Collection, or receive a DOMObject in >> some way, and just check whether it is of one or two specific >> subtypes they care about. This turned out to be a very common >> pattern, not just for DOMObject, but for many candidate sealed >> types I found: nobody does exhaustive case analysis. They just >> look for objects they understand in some much larger hierarchy, >> and ignore the rest. >> >> Some more humorous types that are often involved in instanceof >> checks: java.net.InetAddress (everyone wants to know if it's v4 >> or v6) and com.sun.source.tree.Tree, in our static-analysis >> tools. Tree is an interesting case: here we do exactly what I >> mentioned previously for DOMObject. On the surface it seems that >> Tree would be a good candidate for a sealed interface with record >> subtypes, but in practice I'm not sure what sealing would buy us. >> We would effectively opt out of exhaustiveness-checking by having >> a large default case, or by extending a visitor with >> default-empty methods. Of course, sometimes we define a new >> visitor to do some polymorphic operation over a Tree, but more >> often we just look for one or two subtypes we care about. For >> example, DataFlow inspects a Tree, but knows from context that it >> is a LambdaExpressionTree, a MethodTree, or an initializer. >> >> Plausible sealed types found looking at instanceof checks >> >> The previous section notwithstanding, I did dig deep enough into >> the results to find a few classes that could make good sealed >> types. The most prominent, and most interesting, was another AST.
>> There is an abstract Node class for representing HTML documents. >> It has just 4 subclasses defined in the same file: Text, Comment, >> Tag, and EndTag. This spartan definition suggests it?s used for >> something like SAX parsing, but I didn?t confirm this. It does >> everything you could hope for from a type like this: it exposes a >> Visitor, it provides an accept(Visitor) method, and the >> superclass specifies abstract methods for a couple of the most >> common things you would want to do, such as a String toHtml() method. >> >> However, recall that I found this class by looking for classes >> often involved in instanceof checks! Some people use the visitor, >> but why doesn?t everyone? The first reason I found is one I?ve >> mentioned several times already: clients only care about one of >> the 4 cases, and may have felt creating an anonymous visitor is >> too much ceremony. Would they be happy with a switch and a >> default clause? Probably, but it?s hard to know for sure. The >> second reason surprised me a bit: I found clients doing analysis >> that isn?t really amenable to any one visitor, or a simple >> pattern-match. They?ve written this: >> >> if (mode1) { if (x instanceof Tag) {...} } >> else if (mode2) { if (x instanceof Text) {...}} >> >> The same use site cares about different subclasses at different >> times, depending on some other flag(s) controlling its behavior. >> Even if we offered a pattern-match on x, it?s difficult to encode >> the flags correctly. They would have to match on a tuple of >> (mode1, mode2, x), with a case for (true, _, Tag name) and >> another for (false, true, Text text). Technically possible, but >> not really prettier than what they already have, especially since >> you would need to use a local record instead of an anonymous tuple. >> >> Even so, I think this would have benefited from being a sealed >> type. Recall that earlier I carefully said ?4 subclasses defined >> in the same file?. 
This is because some jokester in a different >> package altogether has defined their own fifth subclass, Doctype. >> They have their own sub-interface of Visitor that knows about >> Doctype nodes. I can?t help but feel that the authors of Node >> would have preferred to make this illegal, if they had been able to. >> >> The second good sealed type I found is almost an enum, except >> that one of the instances has per-instance data. This is not >> exactly a surprise, since an enum is a degenerate sum type, and >> one way to think of sealed types is as a way to model sums. It >> looks something like this[2]: >> >> public abstract class DbResult { >> ? public record NoDatabase() extends DbResult; >> ? public record RowNotFound() extends DbResult; >> ? // Four more error types ... >> ? public record EmptySuccess() extends DbResult; >> ? public record SuccessWithData(T data) extends DbResult; >> >> ? public T getData() { >> ? ? if (!(this instanceof SuccessWithData)) >> ? ? ? throw new DbException(); >> ? ? return ((SuccessWithData)this).data; >> ? } >> ? public DbResult transform(Function f) { >> ? ? if (!(this instanceof SuccessWithData)) { >> ? ? ? return (DbResult)this; >> ? ? } >> ? ? return new SuccessWithData(f.apply( >> ((SuccessWithData)this).data)); >> } >> >> Reading this code made me yearn for Haskell: here is someone who >> surely wanted to write >> >> data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t >> >> but had to spend 120 lines defining their sum-of-products (the >> extra verbosity is because really they made the subclasses >> private, and defined private static singletons for each of the >> error types, with a static getter to get the type parameter >> right). This seems like a potential win for records and for >> sealed types. Certainly my snippet was much shorter than the >> actual source file because the proposed record syntax is quite >> concise, so that is a real win. But what do we really gain from >> sealing this type? 
Still nobody does exhaustive analysis even of >> this relatively small type: they just use functions like getData >> and transform to work with the result generically, or spot-check >> a couple of interesting subtypes with instanceof. Forbidding >> subclassing from other packages hardly matters: nobody was >> subclassing it anyway, nor would they be tempted to. Really >> the improvements DbResult benefits most from are records, and >> pattern-matching on records. It would be much nicer to replace >> the instanceof/cast pattern with a pattern-match that extracts >> the relevant field. >> >> This is the use case that inspired my idea of a type-enum, in the >> Summary section above. Rewriting it as a type-enum eliminates >> many of the problems: all the instanceof checks are gone, we >> don't need a bunch of extra keywords for each case, and we're >> explicit about the subclasses "belonging to" the sealed parent, >> which means we get stuff like extends and <T> for free. We get >> improved clarity by letting the definition of the class hierarchy >> reflect its "nature" as a sum. >> >> public abstract type-enum DbResult { >>   NoDatabase, >>   RowNotFound, >>   EmptySuccess, >>   SuccessWithData(T data) { >>     @Override public T getData() { >>       return data; >>     } >>     @Override public DbResult transform(Function f) { >>       return new SuccessWithData(f.apply(data)); >>     } >>   } >> >>   public T getData() { >>     throw new DbException(); >>   } >>   public DbResult transform(Function f) { >>     return (DbResult)this; >>   } >> } >> >> Visitors >> >> Instead of doing a bunch of instanceof checks, the >> "sophisticated" way to interact with a class having a small, >> known set of subtypes is with a visitor. I considered doing some >> complicated analysis to characterize what makes a class a >> visitor, and trying to automatically cross-reference visitors to >> the classes they visit... but in practice simply looking for >> classes with "Visitor"
in their name was a strong enough signal >> that a more complicated approach was not needed. Having >> identified visitors, I looked at those visitors with the most >> subclasses, since each distinct subclass corresponds to one >> ?interaction? with the sealed type that it visits, and well-used >> visitors suggest both popularity and good design. >> >> One common theme I found: developers aren?t good at applying the >> visitor pattern. Many cases I found had some weird and >> inexplicable quirk compared to the ?standard? visitor. These >> developers will be relieved to get pattern-matching syntax so >> they can stop writing visitors. >> >> The Visiting Object >> >> The first popular visitor I found was a bit odd to me. It?s >> another tree type, but with a weird amalgam of several visitors, >> and an unusual approach to its double dispatch. I have to include >> a relatively lengthy code snippet to show all of its facets: >> >> public static abstract class Node { >> ? public interface Visitor { >> ? ? boolean process(Node node); >> ? } >> ? public boolean visit(Object v) { >> ? ? return v instanceof Visitor >> ? ? ? ? && ((Visitor)v).process(this); >> ? } >> ? // Other methods common to all Nodes ... >> } >> >> public static final class RootNode extends Node { >> ? public interface Visitor { >> ? ? boolean processRoot(RootNode node); >> ? } >> ? @Override >> ? public boolean visit(Object v) { >> ? ? return v instanceof Visitor >> ? ? ? ? ? ((Visitor)v).processRoot(this) >> ? ? ? ? : super.visit(v); >> ? } >> ? // Other stuff about root nodes ... >> } >> >> public static abstract class ValueNode extends Node { >> ? public interface Visitor { >> ? ? boolean processValue(ValueNode node); >> ? } >> ? @Override >> ? public boolean visit(Object v) { >> ? ? return v instanceof Visitor >> ? ? ? ? ? ((Visitor)v).processValue(this) >> ? ? ? ? : super.visit(v); >> ? } >> } >> >> public static final class BooleanNode extends ValueNode { >> ? public interface Visitor { >> ? ? 
boolean processBool(BooleanNode node); >> ? } >> ? @Override >> ? public boolean visit(Object v) { >> ? ? return v instanceof Visitor >> ? ? ? ? ? ((Visitor)v).processBool(this) >> ? ? ? ? : super.visit(v); >> ? } >> ? // Other stuff about booleans ... >> } >> >> public static final class StringNode extends ValueNode { >> ? // Much the same as BooleanNode >> } >> >> This goes on for some time: there is a multi-layered hierarchy of >> dozens of node types, each with a boolean visit(Object) method, >> and their own distinct Visitor interface, in this file. I should >> note that this code is actually not written by a human, but >> rather generated by some process (I didn?t look into how). I >> still think it is worth mentioning here for two reasons: first, >> whoever wrote the code generator would probably do something >> similar if writing it by hand, and second because these visitors >> are used often by hand-written code. >> >> Speaking of hand-written code, visitor subclasses now get to >> declare ahead of time exactly which kinds of nodes they care >> about, by implementing only the appropriate Visitor interfaces: >> >> private class FooVisitor implements StringNode.Visitor, >> ?BooleanNode.Visitor, RootNode.Visitor { >> ? // ... >> } >> >> This isn?t how I would have written things, but I can sorta see >> the appeal, if you don?t have to write it all by hand: a visitor >> can choose to handle any one subclass of ValueNode, or all >> ValueNodes, or just RootNode and StringNode, et cetera. They get >> to pick and choose what sub-trees of the inheritance tree they >> work with. >> >> Would Node be a good sealed class? Maybe. It clearly intends to >> enumerate all subclasses, but the benefit it gets from enforcing >> that is minimal. As in my previous examples, the main advantage >> for Node implementors would come from records, and the main >> advantage for clients would come from pattern-matching, obviating >> their need for this giant visitor. 
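For comparison, here is a minimal sketch of how a hierarchy like this might look with sealing and type patterns in place of the per-class Visitor interfaces. It assumes the sealed classes and instanceof patterns that later shipped in Java 17, which may differ from the proposal under discussion here. The node names follow the snippet above; the value fields, the describe method and the SealedNodeSketch class are hypothetical.

```java
// Hypothetical sketch: the Node hierarchy above, sealed, with no Visitor interfaces.
final class SealedNodeSketch {
  // A client names only the subtypes it cares about, much as FooVisitor did
  // by choosing which Visitor interfaces to implement.
  static String describe(Node node) {
    if (node instanceof RootNode) return "root";
    if (node instanceof StringNode s) return "string:" + s.value;
    if (node instanceof BooleanNode b) return "bool:" + b.value;
    // Default case: the real hierarchy has dozens of other node types.
    return "ignored";
  }

  public static void main(String[] args) {
    System.out.println(describe(new RootNode()));
    System.out.println(describe(new StringNode("x")));
    System.out.println(describe(new BooleanNode(true)));
  }
}

// The permits clauses spell out the whole hierarchy; no other subclass compiles.
abstract sealed class Node permits RootNode, ValueNode {}

final class RootNode extends Node {}

abstract sealed class ValueNode extends Node permits BooleanNode, StringNode {}

final class BooleanNode extends ValueNode {
  final boolean value; // assumed field, for illustration only
  BooleanNode(boolean value) { this.value = value; }
}

final class StringNode extends ValueNode {
  final String value; // assumed field, for illustration only
  StringNode(String value) { this.value = value; }
}
```

Note that the client still ends in a default case, so sealing adds little here beyond forbidding stray subclasses, which matches the report's conclusion that the gain comes mostly from pattern-matching.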
>> >> The Enumerated Node >> >> Another AST, this time for some kind of query language, >> explicitly declares an enum of all subclasses it can have, and >> uses this enum instead of traditional double-dispatch: >> >> public interface Node { >>   enum Kind {EXPR, QUERY, IMPORT /* and 9 more */} >>   Kind getKind(); >>   Location getLocation(); >> } >> >> public abstract record AbstractNode(Location l) implements Node {} >> >> public class Expr extends AbstractNode { >>   public Kind getKind() {return Kind.EXPR;} >>   // ... >> } >> // And so on for other Kinds ... >> >> public abstract class Visitor { >>   // Empty default implementations, not abstract. >>   public Expr visitExpr(Expr e) {} >>   public Query visitQuery(Query q) {} >>   public Import visitImport(Import i) {} >>   public Node visit(Node n) { >>     switch (n.getKind()) { >>       case EXPR: return visitExpr((Expr)n); >>       case QUERY: return visitQuery((Query)n); >>       case IMPORT: return visitImport((Import)n); >>       // ... >>     } >>   } >> } >> >> It's not really clear to me why they do it this way, instead of >> putting an accept(Visitor) method on Node. They gain the ability >> to return different types for each Node subtype, but are hugely >> restricted in what visitors can do: they must return a Node, >> instead of performing an arbitrary computation. It seems like the >> idea is that visitors must specialize in tree rewriting, but I still >> would have preferred to parameterize the visitor by return type. >> >> Would this be better as a sealed type? I feel sure that if sealed >> types existed, the authors of this class would have used one. We >> could certainly do away with the enum, and use an >> expression-switch instead to pattern-match in the implementation >> of visit(Node). But I think the Visitor class would still exist, >> and still have separate methods for each Node subtype, because >> the developer seemed to care about specializing the return type.
>> The only place where an exhaustiveness check helps would be in >> the visit(Node) method, inside the visitor class itself. All >> other dispatch goes through visit(Node), or through one of the >> specialized visitor methods if the type is known statically. It >> seems like overall this would be an improvement, but again, the >> improvement comes primarily from pattern-matching, not sealing. >> >> Colocated interface implementations >> >> Finally, I looked for interfaces having all of their >> implementations defined in the same file. On this I do have some >> statistical data[3]. A huge majority (98.5%) of public interfaces >> have at least one implementation in a different source file. >> Package-private interfaces also tend to have implementations in >> other files: 85% of them are in this category. For protected >> interfaces it's much closer: only 53% have external >> implementations. Of course, all private interfaces have all >> implementations in a single file. >> >> Next, I looked at interfaces that share a source file with all >> their implementations, to see whether they'd make good sealed >> types. First was this Entry interface: >> >> public interface Entry { >>   enum Status {OK, PENDING, FAILED} >>   Status getStatus(); >>   int size(); >>   String render(); >> } >> >> public class UserEntry implements Entry { >>   private User u; >>   private Status s; >>   public UserEntry(User u, Status s) { >>     this.u = u; >>     this.s = s; >>   } >>   @Override String render() {return u.name();} >>   @Override int size() {return 1;} >>   @Override Status getStatus() {return s;} >> } >> >> public class AccountEntry implements Entry { >>   private Account a; >>   private Status s; >>   public AccountEntry(Account a, Status s) { >>     this.a = a; >>     this.s = s; >>   } >>   @Override String render() {return a.render();} >>   @Override int size() {return a.size();} >>
@Override Status getStatus() {return s;} >> } >> >> A huge majority of the clients of this Entry interface treat it >> polymorphically, just calling its interface methods. In only one >> case is there an instanceof check made on an Entry, dispatching >> to different methods depending on which subclass is present. >> >> Is this a good sealed type? I think not, really. There are two >> implementations now, but perhaps there will be a GroupEntry >> someday. Existing clients should continue to work in that case: >> the polymorphic Entry interface provides everything clients are >> ?intended? to know. >> >> Another candidate for sealing: >> >> public interface Request {/* Empty */} >> public record RequestById(int id) implements Request; >> public record RequestByValue(String owner, boolean historic) >> implements Request; >> >> public class RequestFetcher { >> ? public List fetch(Iterable requests) { >> ? ? List idReqs = Lists.newArrayList(); >> ? ? List valueReqs = Lists.newArrayList(); >> ? ? List queries = Lists.newArrayList(); >> ? ? for (Request req : requests) { >> ? ? ? if (req instanceof RequestById) { >> ? ? ? ? idReqs.add((RequestById)req); >> ? ? ? } else if (req instanceof RequestByValue) { >> valueReqs.add((RequestByValue)req); >> ? ? ? } >> ? ? } >> queries.addAll(prepareIdQueries(idReqs)); >> queries.addAll(prepareValueQueries(valueReqs)); >> ? ? return runQueries(queries); >> ? } >> } >> >> Interestingly, since the Request interface is empty, the only way >> to do anything with this class is to cast it to one >> implementation type. In fact, the RequestFetcher I include here >> is the only usage of either of these classes (plus, of course, >> helpers like prepareIdQueries). >> >> So, clients need to know about specific subclasses, and want to >> be sure they?re doing exhaustive pattern-matching. Seems like a >> great sealed class to me. 
Except...actually each of the two >> subclasses has been extended by a decorator adding a source[4]: >> >> public record SourcedRequestById(Source source) extends RequestById; >> public record SourcedRequestByValue(Source source) extends >> RequestByValue; >> >> Does this argue in favor of sealing, or against? I don?t really >> know. The owners of Request clearly intended for all four of >> these subclasses to exist (they?re in the same package), so they >> could include them all in the permitted subtype list, but it >> seems like a confusing API to expose to clients. >> >> A third candidate for sealing is another simple sum type: >> >> public interface ValueOrAggregatorException { >> ? T get(); >> ? public static ValueOrAggregatorException >> ? ? of(T value) { >> ? ? return new OfValue(value); >> ? } >> ? public static ValueOrAggregatorException >> ? ? ? ofException(AggregatorException err) { >> ? ? return new OfException(err); >> ? } >> ? private record OfValue(T value) >> ? ? ? implements ValueOrAggregatorException { >> ? ? @Override T get() {return value;} >> ? } >> ? private record OfException(AggregatorException err) >> ? ? ? implements ValueOrAggregatorException { >> ? ? @Override T get() {throw err;} >> ? } >> } >> >> It has only two subtypes, and it seems unimaginable there could >> ever be a third, so why not seal it? However, the subtypes are >> intentionally hidden: it is undesirable to let people see whether >> there?s an exception, except by having it thrown at you. In fact >> AggregatorException is documented as ?plugins may throw this, but >> should never catch it?: there is some higher-level thing >> responsible for catching all such exceptions. So, this type gains >> no benefit from exhaustiveness checks in pattern-matching. The >> type is intended to be used polymorphically, through its >> interface method, even though its private implementation is >> amenable to sealing. 
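A sketch of this last case, written with the sealed interfaces and records that later shipped in Java 17 (an assumption; the proposal discussed here predates that syntax). The type parameter, the unchecked AggregatorException and the demo class are guesses for illustration. Types nested in an interface cannot be private, so the sketch uses package-private top-level records instead.

```java
final class ValueOrExceptionSketch {
  public static void main(String[] args) {
    ValueOrAggregatorException<String> ok = ValueOrAggregatorException.of("row");
    System.out.println(ok.get());

    ValueOrAggregatorException<String> bad =
        ValueOrAggregatorException.ofException(new AggregatorException("boom"));
    try {
      bad.get();
    } catch (AggregatorException e) {
      // Plugins are not meant to catch this; it is caught here only for the demo.
      System.out.println("caught: " + e.getMessage());
    }
  }
}

// Assumed to be unchecked so that get() can throw it without a throws clause.
class AggregatorException extends RuntimeException {
  AggregatorException(String message) { super(message); }
}

// Sealed: exactly two implementations, both package-private records.
sealed interface ValueOrAggregatorException<T> permits OfValue, OfException {
  T get();

  static <T> ValueOrAggregatorException<T> of(T value) {
    return new OfValue<>(value);
  }

  static <T> ValueOrAggregatorException<T> ofException(AggregatorException err) {
    return new OfException<>(err);
  }
}

record OfValue<T>(T value) implements ValueOrAggregatorException<T> {
  @Override public T get() { return value; }
}

record OfException<T>(AggregatorException err) implements ValueOrAggregatorException<T> {
  @Override public T get() { throw err; }
}
```

Callers only ever touch of, ofException and get, so sealing here documents intent rather than enabling any exhaustive match, in line with the paragraph above.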
>> ________________ >> [1] Throughout this document I will use record syntax as if it >> were already in the language. This is merely for brevity, and to >> avoid making the reader spend a lot of time reading code that >> boils down to just storing a couple fields. In practice, of >> course the code in Google's codebase either defines the records >> by hand, or uses an @AutoValue. >> [2] Recall that @AutoValue, Google's "record", allows extending a >> class, which is semantically okay here: DbResult has no state, >> only behavior. >> [3] This data is imperfect. While the Google codebase strongly >> discourages having more than one version of a module checked in, >> there is still some amount of "vendoring" or checking in multiple >> versions of some package, e.g. for supporting external clients of >> an old version of an API. As a result, two "different files" >> which are really copies of each other may implement interfaces >> with the same fully-qualified name; I did not attempt to control >> for this case, and so such cases may look like they were in the >> same file, or not. >> [4] Of course in the record proposal it is illegal to extend >> records like this; in real life these plain data carriers are >> implemented by hand as ordinary classes, so the subtyping is legal. >
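The DbResult case study in the report can also be sketched with the sealed interfaces, records and instanceof type patterns that eventually shipped in Java 17, which is an assumption relative to the proposal discussed in this thread. The sketch shows a type pattern replacing the instanceof-and-cast pairs in getData and transform. DbException, the generic signatures and the demo class are hypothetical stand-ins, and the "four more error types" are omitted.

```java
import java.util.function.Function;

final class DbResultSketch {
  public static void main(String[] args) {
    DbResult<Integer> found = new SuccessWithData<>(20);
    System.out.println(found.transform(n -> n + 1).getData());

    DbResult<Integer> missing = new RowNotFound<>();
    // Error results pass through transform untouched.
    System.out.println(missing.transform(n -> n + 1));
  }
}

// Hypothetical unchecked exception standing in for the real one.
class DbException extends RuntimeException {}

sealed interface DbResult<T>
    permits NoDatabase, RowNotFound, EmptySuccess, SuccessWithData {

  default T getData() {
    // The type pattern binds s, so no separate cast is needed.
    if (this instanceof SuccessWithData<T> s) return s.data();
    throw new DbException();
  }

  default <U> DbResult<U> transform(Function<T, U> f) {
    if (this instanceof SuccessWithData<T> s) {
      return new SuccessWithData<>(f.apply(s.data()));
    }
    @SuppressWarnings("unchecked") // safe: the remaining cases hold no T
    DbResult<U> unchanged = (DbResult<U>) this;
    return unchanged;
  }
}

record NoDatabase<T>() implements DbResult<T> {}
record RowNotFound<T>() implements DbResult<T> {}
record EmptySuccess<T>() implements DbResult<T> {}
record SuccessWithData<T>(T data) implements DbResult<T> {}
```

As the report observes, most of the improvement comes from records and the pattern that extracts the field; the permits clause mainly documents that the set of results is closed.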