Feedback on Sealed Types

Wed May 1 12:32:40 UTC 2019

> De: "Brian Goetz" <brian.goetz at oracle.com>
> À: "Alan Malloy" <amalloy at google.com>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Envoyé: Lundi 29 Avril 2019 23:01:17
> Objet: Re: Feedback on Sealed Types

> It would be nice if we could "just" overload enum itself to support a
> record-like option:

> enum Node {
> AddNode(Node a, Node b),
> MulNode(Node a, Node b),
> ...;

> ...
> }

> but unfortunately that syntax looks confusingly close to something else :( It
> also doesn't really scale to multi-level hierarchies, though that might be OK.

> Its worth thinking about, though. The `data` construct from Haskell is surely
> more direct than modeling a sum with sealed interfaces (though, the latter is
> also more flexible than `data`.)

> On the other hand, declaring a sum of records as:

> sealed interface I {
> record A(int a) implements I { }
> record B(long b) implements I { }
> record C(String c) implements I { }
> }

> isn't so bad. The main redundancy here is primarily "implements I", and
> secondarily "record". We can surely compress that away like this:

> enum interface I {
> A(int a),
> B(long b),
> C(String c);

> // I methods
> }

> but I am not sure it carries its weight, given as it adds little additional
> concision (and no additional semantics.)
It may solve the enclosing issue because the ';' syntactically separate A, B and C from the content of I which is declared after the ';', so A, B and C can be top-level. 

I kind a like the intellectual separation between 
- a sealed interface which represent a closed type and requires a permit clause and 
- an enum interface which represent a sum type which is sugar on top of sealed interface + records. 

one interesting question is how to desugar an enum interface with a component that has no parameter, like 
enum interface Option<T> { 
Some(T value), 
Empty 
} 

If there is only one constant of type Empty and the construction is typesafe, it can be a huge win. 

Rémi 

> On 4/29/2019 4:33 PM, Alan Malloy wrote:

>> Thanks, Brian. I indeed didn't think of some of your proposed benefits of
>> sealing non-sum types, as I was focused mostly on things you mentioned
>> explicitly in the JEP, which is somewhat light on the expected benefits.

>> I think the first two items in your "challenges" solve each other: I don't
>> intend sum types to be the only kind of sealed type, but just a good way to
>> declare the simplest kind. I left out the "record" keyword from the declaration
>> with the idea that it would be implicit: if you want the convenient
>> sum-of-products declaration style, you have to use records. If you want
>> something more complicated, you declare a sealed interface (or superclass), and
>> N permitted subclasses, declared separately in whatever way you want. This
>> restriction helps by making the semantics clearer, and I had also hoped that it
>> would lead to a syntax error if you leave out the comma. Looking more closely,
>> I see this is somewhat precarious: a record declaration looks enough like a
>> method signature that they may be ambiguous in an interface, if you don't
>> require the "record" keyword, or if you use a semicolon instead of a comma. I
>> think it can still work if we require each nested record to use {...} instead
>> of ; even if it's empty. This way, your two examples look like

>> interface X {
>> class X1 { … }
>> class X2 { … }
>> }

>> and

>> enumerated interface Y {
>> Y1 { … }, <— don’t forget this comma!
>> Y2 { … }
>> }

>> The latter would become illegal if you dropped the comma, even if you also
>> forgot the "enumerated" keyword, because the braces make no sense in an
>> ordinary interface.

>> On Mon, Apr 29, 2019 at 12:35 PM Brian Goetz < [ mailto:brian.goetz at oracle.com |
>> brian.goetz at oracle.com ] > wrote:

>>> Thanks Alan, for this nice exploration. There’s a lot to respond to. I’ll start
>>> with some general comments about sealing, and then move on to your alternate
>>> proposal for exposing it.

>>> I can think of several main reasons why you would want to seal a hierarchy.

>>> - To say something about the _interface itself_. That is, that it was not
>>> designed as a general purpose contract between arms-length entities, but that
>>> it exists only as a common super type for a fixed set of classes that are part
>>> of the library. In other words, “please don’t implement me.”

>>> - To say something about the semantics of the type. Several of the examples in
>>> your report fall into this category: “a DbResult is either a NoRowsFound or a
>>> Rows(List<Row>)”. This tells users exactly what the reasonable range of results
>>> to expect are when doing a query. Of course, the spec could say the same thing,
>>> but that involves reading and interpreting the spec. Easier if this conclusion
>>> can be driven by types (and IDEs can help more here too.)

>>> - To strengthen typing by simulating unions. If my method is going to return
>>> either a String or a Number, the common super type is Object. (Actually, it’s
>>> some variant of Serializable & Comparable<? extends….>.). Sums-of-products
>>> allow library authors to make a stronger statement about types in the presence
>>> of unions. Exposing a sum of StringHolder(String) and NumberHolder(Number),
>>> using records and sealed types, is not so ceremonious, so some library
>>> developers might choose to do this instead of Object.

>>> - Security. Some libraries may want to know that the code they are calling is
>>> part of their library, rather than an arbitrary implementation of some
>>> interface.

>>> - To aid in exhaustiveness. We’ve already discussed this at length; your point
>>> is that this one doesn’t come up as often as one might hope.

>>> Not only is there an obvious synergy between sums and products (as many
>>> languages have demonstrated), but there is a third factor, which is “if you
>>> make it easy enough, people will use it more.” Clearly records are pretty easy
>>> to use; your point is that if there were a more streamlined sum-of-products
>>> idiom, the third factor would be even stronger here. I think algebraic data
>>> types is one of those things that will take some time for developers to learn
>>> to appreciate; the easier we make it, of course the faster that will happen.

>>> Now, to your syntax suggestion. Overall, I like the idea, but I have some
>>> concerns. First, the good parts:

>>> - The connection with enums is powerful. Users already understand enums, so this
>>> will help them understand sums. Enums have special treatment in switch; we want
>>> the same treatment for sealed type patterns. Enums have special treatment for
>>> exhaustiveness; we want the same for sealed type patterns. So tying these
>>> together with some more general enum-ness leans on what people already know.

>>> - While sums and products are theoretically independent features,
>>> sums-of-products are expected to be quite common. So it might be reasonable to
>>> encourage this syntactically.

>>> - The current proposal has some redundancy, in that the subtypes have to say
>>> “implements Node”, even if they are nested within Node. With a stronger
>>> mechanism for declaring them, as you propose, then that can safely be left
>>> implicit.

>>> - I confess that I too like the simplicity of Haskell’s `data` declaration, and
>>> this brings us closer.

>>> Now, the challenges:

>>> - The result is still a little busy. We need a modifier for “enumerated type”,
>>> and we would also need to be able to have child types be not only records, but
>>> ordinary classes and interfaces. So we’d have to have a place for “record”,
>>> “class”, or “interface” with the declaration of the enumerated classes (as well
>>> as other modifiers.). That busies up the result a bit.

>>> - Once we do this, I worry that it will be hard to tell the difference between:

>>> interface X {
>>> class X1 { … }
>>> class X2 { … }
>>> }

>>> and

>>> enumerated interface Y {
>>> class Y1 { … }, <— don’t forget this comma!
>>> class Y2 { … }
>>> }

>>> and that users will forever be making mistakes like forgetting the comma, or
>>> putting it where it doesn’t belong.

>>> - This mechanism addresses the very common case of sum-of-product, but leaves
>>> more esoteric sums out of the picture. (Consider the types in
>>> java.lang.constant, which really want to be sealed.). There, because they are
>>> not co-declared, we’d need something more like

>>> sealed interface ConstantDesc
>>> permits ClassDesc, MethodTypeDesc, …. { }

>>> It's possible that such a mechanism can be grafted on to your proposal, or there
>>> is a shuffling that supports it.

>>>> On Apr 29, 2019, at 2:28 PM, Alan Malloy < [ mailto:amalloy at google.com |
>>>> amalloy at google.com ] > wrote:

>>>> Hello again, amber-spec-experts. I have another report from the Google codebase,
>>>> this time focusing on sealed types. It is viewable in full Technicolor HTML at
>>>> [ http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html |
>>>> http://cr.openjdk.java.net/~cushon/amalloy/sealed-types-report.html ] (thanks
>>>> again to Liam for hosting), and included below as plain text:

>>>> Author: Alan Malloy ( [ mailto:amalloy at google.com | amalloy at google.com ] )
>>>> Published: 2019-04-29

>>>> Feedback on Sealed Types

>>>> Hello again, amber-spec-experts. I’m back with a second Google codebase research
>>>> project. I’m looking again at the Records & Sealed Types proposal (which has
>>>> now become JDK-8222777), but this time I’m focusing on sealed types instead of
>>>> records, as promised in my RFC of a few weeks ago. My goal was to investigate
>>>> Google’s codebase to guess what developers might have done differently if they
>>>> had access to sealed types. This could help us understand what works in the
>>>> current proposal and what to consider changing.

>>>> Unlike my previous report, this one contains more anecdotes than statistics. It
>>>> wound up being difficult to build static analysis to categorize the interesting
>>>> cases, so I mostly examined promising candidates by hand.

>>>> Summary and Recommendations

>>>> For those who don’t care to read through all my anecdotes, I first provide a
>>>> summary of my findings, and one suggested addition.

>>>> Sealed types, as proposed so far, are a good idea in theory: Java already has
>>>> product types and open polymorphism, and sealed types give us closed
>>>> polymorphism. However, I could not find many cases of code being written today
>>>> that would be greatly enhanced if sealed types were available. The main selling
>>>> point of sealed types for application authors is getting help from the compiler
>>>> with exhaustiveness checking, but in practice developers almost always have a
>>>> default case, because they are only interested in a subset of the possible
>>>> subclasses, and want to ignore cases they don’t understand. This means that
>>>> exhaustiveness-checking for pattern matches would mostly go unused if
>>>> developers rewrote their existing code using sealed types.

>>>> Pattern matching is great, and can replace visitors in many cases, but this does
>>>> not depend on sealed types except for exhaustiveness checks (which, again,
>>>> would go mostly unused in code written today). The class hierarchies for which
>>>> people define visitors today are just too large to write an exhaustive pattern
>>>> match, and so a default case would be very common.

>>>> The other audience for sealed types is library authors. While in practice most
>>>> developers have no great need to forbid subclasses, perhaps it would be a boon
>>>> for authors of particularly popular libraries, who need to expose a non-final
>>>> class as an implementation detail but don’t intend for consumers to create
>>>> their own subclasses. Those authors can already include documentation saying
>>>> “you really should not extend this”, but there is always some weirdo out there
>>>> who will ignore your warnings and then write an angry letter when the next
>>>> version of your library breaks his program (see: sun.misc.Unsafe). Authors of
>>>> such libraries would welcome the opportunity to make it truly impossible to
>>>> create undesirable subclasses.

>>>> Sealed Types As a Vehicle For Sum Types

>>>> So, sealed types as-is would be an improvement, but a niche one, used by few. I
>>>> think we can get substantially more mileage out of them if we also include a
>>>> more cohesive way to explicitly define a sum type and all its subtypes in one
>>>> place with minimal ceremony. Such a sum type could be sealed, implicitly or
>>>> explicitly. A tool like this takes what I see as the “theoretical” advantage of
>>>> sum types (closed polymorphism), and makes it “practical” by putting it front
>>>> and center. Making sums an actual language element instead of something
>>>> “implied” by sealing a type and putting its subclasses nearby could help in a
>>>> lot of ways:

>>>> * Developers might more often realize that a sealed/sum type is a good model for
>>>> their domain. Currently it’s a “pattern” external to the language instead of a
>>>> “feature”, and many don’t realize it could be applied to their domain. Putting
>>>> it in the language raises its profile, addressing the problem that people don’t
>>>> realize they want it.
>>>> * The compiler could provide help for defining simple sums-of-products, while
>>>> making it possible to opt into more complicated subclasses, in much the way
>>>> that enums do: the typical enum just has bare constants like EAST, but you can
>>>> add constructor arguments or override methods when necessary.
>>>> * The ability to more easily model data in this way may result in developers
>>>> writing more classes that are amenable to sealing/sums, as they do in other
>>>> languages with explicit sum types (Haskell, Kotlin, Scala). Then, the
>>>> exhaustiveness-checking feature that sealed types provide would pull more
>>>> weight.

>>>> Since enum types are “degenerate sum types”, the syntax for defining sums can
>>>> borrow heavily from enums. A sketch of the syntax I imagine for such things (of
>>>> course, I am not married to it):
>>>> public type-enum interface BinaryTree<T> {
>>>> Leaf {
>>>> @Override public Stream<T> elements() {return Stream.empty();}
>>>> },
>>>> Node(T data, BinaryTree<T> left, BinaryTree<T> right) {
>>>> @Override public Stream<T> elements() {
>>>> return Stream.concat(left.elements(),
>>>> Stream.concat(Stream.of(data), right.elements()));
>>>> }
>>>> };

>>>> public Stream<T> elements();
>>>> }

>>>> Like enums, you can use a bare identifier for simple types that exist only to be
>>>> pattern-matched against, but you can add fields and/or override blocks as
>>>> necessary. The advantage over declaring a sealed type separately from its
>>>> elements is both concision (the compiler infers visible records, superclass,
>>>> and all type parameters) and clarity: you state your intention firmly. I think
>>>> a convenient syntax like this will encourage developers to use the powerful
>>>> tool of sealed types to model their data.

>>>> Evidence in Google’s Codebase

>>>> If you are just interested in recommendations, you can stop reading now: they
>>>> are all included in the summary. What follows is a number of anecdotes, or case
>>>> studies if you prefer, that led me to the conclusions above. Each shows a type
>>>> that might have been written as a sealed type, and hopefully highlights a
>>>> different facet of the question of how sealed types can be useful.

>>>> The first thing I looked for was classes which are often involved in instanceof
>>>> checks. As language authors, we imagine people writing stuff like this[1] all
>>>> the time:

>>>> interface Expr {int eval(Scope s);}
>>>> record Var(String name) implements Expr {
>>>> public int eval(Scope s) {return s.get(name);}
>>>> }
>>>> record Sum(Expr left, Expr right) implements Expr {
>>>> public int eval(Scope s) {return left.eval(s) + right.eval(s);}
>>>> }
>>>> class Analyzer {
>>>> Stream<String> variablesUsed(Expr e) {
>>>> if (e instanceof Var) return Stream.of(((Var)e).name);
>>>> if (e instanceof Sum) {
>>>> return variablesUsed(((Sum)e).left)
>>>> .concat(variablesUsed(((Sum)e).right));
>>>> }
>>>> throw new IllegalArgumentException();
>>>> }
>>>> }

>>>> Here, the Expr interface captures some of the functionality shared by all
>>>> expressions, but later a client (Analyzer) came along and invented some other
>>>> polymorphic operations to perform on an Expr, which Expr did not support. So
>>>> Analyzer needed to do instanceof checks instead, externalizing the
>>>> polymorphism. The principled approach would have been for Expr to export a
>>>> visitor to begin with, but perhaps it wasn’t seen as worth the trouble at the
>>>> time.

>>>> To try to find this pattern in the wild, I searched for method bodies which
>>>> perform multiple instanceof checks against the same variable. Notably, this
>>>> excludes the typical equals(Object) method, which only performs a single check.
>>>> For each such variable, I noted:

>>>> 1. Its declared type
>>>> 2. The set of subtypes it was checked for with instanceof
>>>> 3. The common supertype of those subtypes.

>>>> I guessed that (3) would usually be the same as (1), but in practice 55% of the
>>>> time they were different. Often, the declared type was Object, or some generic
>>>> type variable which erases to Object, while the common supertype being tested
>>>> was something like Number, Event, or Node. For example, a Container<T> knows it
>>>> will be used in some context where NaN is unsuitable, so it checks whether its
>>>> contents are Float or Double, and if so ensures NaN is not stored. As a second
>>>> example, a serialize(Object) method checks whether its input is String or
>>>> ByteString, and throws an exception otherwise.

>>>> Bad sealed types found looking at instanceof checks

>>>> I looked through the most popular declared types of these candidates, to
>>>> investigate which types are often involved in such checks. Most of them are not
>>>> good candidates for a sealed type. Object was the most common type, followed by
>>>> Exception and Throwabe.

>>>> Next up is an internal DOMObject class, which sounds promising until I tell you
>>>> it has thousands of direct subclasses. Nobody is doing exhaustive switches on
>>>> this, of course. Instead, many uses iterate over a Collection<DOMObject>, or
>>>> receive a DOMObject in some way, and just check whether it is of one or two
>>>> specific subtypes they care about. This turned out to be a very common pattern,
>>>> not just for DOMObject, but for many candidate sealed types I found: nobody
>>>> does exhaustive case analysis. They just look for objects they understand in
>>>> some much larger hierarchy, and ignore the rest.

>>>> Some more humorous types that are often involved in instanceof checks:
>>>> java.net.InetAddress (everyone wants to know if it’s v4 or v6) and
>>>> com.sun.source.tree.Tree, in our static-analysis tools. Tree is an interesting
>>>> case: here we do exactly what I mentioned previously for DOMObject. On the
>>>> surface it seems that Tree would be a good candidate for a sealed interface
>>>> with record subtypes, but in practice I’m not sure what sealing would buy us.
>>>> We would effectively opt out of exhaustiveness-checking by having a large
>>>> default case, or by extending a visitor with default-empty methods. Of course,
>>>> sometimes we define a new visitor to do some polymorphic operation over a Tree,
>>>> but more often we just look for one or two subtypes we care about. For example,
>>>> DataFlow inspects a Tree, but knows from context that it is either a
>>>> LambdaExpressionTree, MethodTree, or an initializer.

>>>> Plausible sealed types found looking at instanceof checks

>>>> The previous section notwithstanding, I did dig deep enough into the results to
>>>> find a few classes that could make good sealed types. The most prominent, and
>>>> most interesting, was another AST. There is an abstract Node class for
>>>> representing HTML documents. It has just 4 subclasses defined in the same file:
>>>> Text, Comment, Tag, and EndTag. This spartan definition suggests it’s used for
>>>> something like SAX parsing, but I didn’t confirm this. It does everything you
>>>> could hope for from a type like this: it exposes a Visitor, it provides an
>>>> accept(Visitor) method, and the superclass specifies abstract methods for a
>>>> couple of the most common things you would want to do, such as a String
>>>> toHtml() method.

>>>> However, recall that I found this class by looking for classes often involved in
>>>> instanceof checks! Some people use the visitor, but why doesn’t everyone? The
>>>> first reason I found is one I’ve mentioned several times already: clients only
>>>> care about one of the 4 cases, and may have felt creating an anonymous visitor
>>>> is too much ceremony. Would they be happy with a switch and a default clause?
>>>> Probably, but it’s hard to know for sure. The second reason surprised me a bit:
>>>> I found clients doing analysis that isn’t really amenable to any one visitor,
>>>> or a simple pattern-match. They’ve written this:

>>>> if (mode1) { if (x instanceof Tag) {...} }
>>>> else if (mode2) { if (x instanceof Text) {...}}

>>>> The same use site cares about different subclasses at different times, depending
>>>> on some other flag(s) controlling its behavior. Even if we offered a
>>>> pattern-match on x, it’s difficult to encode the flags correctly. They would
>>>> have to match on a tuple of (mode1, mode2, x), with a case for (true, _, Tag
>>>> name) and another for (false, true, Text text). Technically possible, but not
>>>> really prettier than what they already have, especially since you would need to
>>>> use a local record instead of an anonymous tuple.

>>>> Even so, I think this would have benefited from being a sealed type. Recall that
>>>> earlier I carefully said “4 subclasses defined in the same file”. This is
>>>> because some jokester in a different package altogether has defined their own
>>>> fifth subclass, Doctype. They have their own sub-interface of Visitor that
>>>> knows about Doctype nodes. I can’t help but feel that the authors of Node would
>>>> have preferred to make this illegal, if they had been able to.

>>>> The second good sealed type I found is almost an enum, except that one of the
>>>> instances has per-instance data. This is not exactly a surprise, since an enum
>>>> is a degenerate sum type, and one way to think of sealed types is as a way to
>>>> model sums. It looks something like this[2]:

>>>> public abstract class DbResult<T> {
>>>> public record NoDatabase<T>() extends DbResult<T>;
>>>> public record RowNotFound<T>() extends DbResult<T>;
>>>> // Four more error types ...
>>>> public record EmptySuccess<T>() extends DbResult<T>;
>>>> public record SuccessWithData<T>(T data) extends DbResult<T>;

>>>> public T getData() {
>>>> if (!(this instanceof SuccessWithData))
>>>> throw new DbException();
>>>> return ((SuccessWithData<T>)this).data;
>>>> }
>>>> public <U> DbResult<U> transform(Function<T, U> f) {
>>>> if (!(this instanceof SuccessWithData)) {
>>>> return (DbResult<U>)this;
>>>> }
>>>> return new SuccessWithData<U>(f.apply(
>>>> ((SuccessWithData<T>)this).data));
>>>> }

>>>> Reading this code made me yearn for Haskell: here is someone who surely wanted
>>>> to write

>>>> data DbResult t = NoDatabase | NoRow | EmptySuccess | Success t

>>>> but had to spend 120 lines defining their sum-of-products (the extra verbosity
>>>> is because really they made the subclasses private, and defined private static
>>>> singletons for each of the error types, with a static getter to get the type
>>>> parameter right). This seems like a potential win for records and for sealed
>>>> types. Certainly my snippet was much shorter than the actual source file
>>>> because the proposed record syntax is quite concise, so that is a real win. But
>>>> what do we really gain from sealing this type? Still nobody does exhaustive
>>>> analysis even of this relatively small type: they just use functions like
>>>> getData and transform to work with the result generically, or spot-check a
>>>> couple interesting subtypes with instanceof. Forbidding subclassing from other
>>>> packages hardly matters: nobody was subclassing it anyway, and nor would they
>>>> be tempted to. Really the improvements DbResult benefits most from are records,
>>>> and pattern-matching on records. It would be much nicer to replace the
>>>> instanceof/cast pattern with a pattern-match that extracts the relevant field.

>>>> This is the use case that inspired my idea of a type-enum, in the Summary
>>>> section above. Rewriting it as a type-enum eliminates many of the problems: all
>>>> the instanceof checks are gone, we don’t need a bunch of extra keywords for
>>>> each case, and we’re explicit about the subclasses “belonging to” the sealed
>>>> parent, which means we get stuff like extends and <T> for free. We get improved
>>>> clarity by letting the definition of the class hierarchy reflect its “nature”
>>>> as a sum.

>>>> public abstract type-enum DbResult<T> {
>>>> NoDatabase,
>>>> RowNotFound,
>>>> EmptySuccess,
>>>> SuccessWithData(T data) {
>>>> @Override public T getData() {
>>>> return data;
>>>> }
>>>> @Override public <U> DbResult<U> transform(Function<T, U> f) {
>>>> return new SuccessWithData<U>(f.apply(data));
>>>> }
>>>> }

>>>> public T getData() {
>>>> throw new DbException();
>>>> }
>>>> public <U> DbResult<U> transform(Function<T, U> f) {
>>>> return (DbResult<U>)this;
>>>> }
>>>> }

>>>> Visitors

>>>> Instead of doing a bunch of instanceof checks, the “sophisticated” way to
>>>> interact with a class having a small, known set of subtypes is with a visitor.
>>>> I considered doing some complicated analysis to characterize what makes a class
>>>> a visitor, and trying to automatically cross-reference visitors to the classes
>>>> they visit...but in practice simply looking for classes with “Visitor” in their
>>>> name was a strong enough signal that a more complicated approach was not
>>>> needed. Having identified visitors, I looked at those visitors with the most
>>>> subclasses, since each distinct subclass corresponds to one “interaction” with
>>>> the sealed type that it visits, and well-used visitors suggest both popularity
>>>> and good design.

>>>> One common theme I found: developers aren’t good at applying the visitor
>>>> pattern. Many cases I found had some weird and inexplicable quirk compared to
>>>> the “standard” visitor. These developers will be relieved to get
>>>> pattern-matching syntax so they can stop writing visitors.

>>>> The Visiting Object

>>>> The first popular visitor I found was a bit odd to me. It’s another tree type,
>>>> but with a weird amalgam of several visitors, and an unusual approach to its
>>>> double dispatch. I have to include a relatively lengthy code snippet to show
>>>> all of its facets:

>>>> public static abstract class Node {
>>>> public interface Visitor {
>>>> boolean process(Node node);
>>>> }
>>>> public boolean visit(Object v) {
>>>> return v instanceof Visitor
>>>> && ((Visitor)v).process(this);
>>>> }
>>>> // Other methods common to all Nodes ...
>>>> }

>>>> public static final class RootNode extends Node {
>>>> public interface Visitor {
>>>> boolean processRoot(RootNode node);
>>>> }
>>>> @Override
>>>> public boolean visit(Object v) {
>>>> return v instanceof Visitor
>>>> ? ((Visitor)v).processRoot(this)
>>>> : super.visit(v);
>>>> }
>>>> // Other stuff about root nodes ...
>>>> }

>>>> public static abstract class ValueNode extends Node {
>>>> public interface Visitor {
>>>> boolean processValue(ValueNode node);
>>>> }
>>>> @Override
>>>> public boolean visit(Object v) {
>>>> return v instanceof Visitor
>>>> ? ((Visitor)v).processValue(this)
>>>> : super.visit(v);
>>>> }
>>>> }

>>>> public static final class BooleanNode extends ValueNode {
>>>> public interface Visitor {
>>>> boolean processBool(BooleanNode node);
>>>> }
>>>> @Override
>>>> public boolean visit(Object v) {
>>>> return v instanceof Visitor
>>>> ? ((Visitor)v).processBool(this)
>>>> : super.visit(v);
>>>> }
>>>> // Other stuff about booleans ...
>>>> }

>>>> public static final class StringNode extends ValueNode {
>>>> // Much the same as BooleanNode
>>>> }

>>>> This goes on for some time: there is a multi-layered hierarchy of dozens of node
>>>> types, each with a boolean visit(Object) method, and their own distinct Visitor
>>>> interface, in this file. I should note that this code is actually not written
>>>> by a human, but rather generated by some process (I didn’t look into how). I
>>>> still think it is worth mentioning here for two reasons: first, whoever wrote
>>>> the code generator would probably do something similar if writing it by hand,
>>>> and second because these visitors are used often by hand-written code.

>>>> Speaking of hand-written code, visitor subclasses now get to declare ahead of
>>>> time exactly which kinds of nodes they care about, by implementing only the
>>>> appropriate Visitor interfaces:

>>>> private class FooVisitor implements StringNode.Visitor,
>>>> BooleanNode.Visitor, RootNode.Visitor {
>>>> // ...
>>>> }

>>>> This isn’t how I would have written things, but I can sorta see the appeal, if
>>>> you don’t have to write it all by hand: a visitor can choose to handle any one
>>>> subclass of ValueNode, or all ValueNodes, or just RootNode and StringNode, et
>>>> cetera. They get to pick and choose what sub-trees of the inheritance tree they
>>>> work with.

>>>> Would Node be a good sealed class? Maybe. It clearly intends to enumerate all
>>>> subclasses, but the benefit it gets from enforcing that is minimal. As in my
>>>> previous examples, the main advantage for Node implementors would come from
>>>> records, and the main advantage for clients would come from pattern-matching,
>>>> obviating their need for this giant visitor.

>>>> The Enumerated Node

>>>> Another AST, this time for some kind of query language, explicitly declares an
>>>> enum of all subclasses it can have, and uses this enum instead of using
>>>> traditional double-dispatch:

>>>> public interface Node {
>>>> enum Kind {EXPR, QUERY, IMPORT /* and 9 more */}
>>>> Kind getKind();
>>>> Location getLocation();
>>>> }

>>>> public abstract record AbstractNode(Location l) implements Node {}

>>>> public class Expr extends AbstractNode {
>>>> public Kind getKind() {return EXPR;}
>>>> // ...
>>>> }
>>>> // And so on for other Kinds ...

>>>> public abstract class Visitor {
>>>> // Empty default implementations, not abstract.
>>>> public Expr visitExpr(Expr e) {}
>>>> public Query visitQuery(Query q) {}
>>>> public Import visitImport(Import i) {}
>>>> public Node visit(Node n) {
>>>> switch (n.getKind()) {
>>>> case EXPR: return visitExpr((Expr)n);
>>>> case QUERY: return visitQuery((Query)n);
>>>> case IMPORT: return visitImport((Import)n);
>>>> // ...
>>>> }
>>>> }
>>>> }

>>>> It’s not really clear to me why they do it this way, instead of putting an
>>>> accept(Visitor) method on Node. They gain the ability to return different types
>>>> for each Node subtype, but are hugely restricted in what visitors can do: they
>>>> must return a Node, instead of performing an arbitrary computation. It seems
>>>> like the idea is visitors must specialize to tree rewriting, but I still would
>>>> have preferred to parameterize the visitor by return type.

>>>> Would this be better as a sealed type? I feel sure that if sealed types existed,
>>>> the authors of this class would have used one. We could certainly do away with
>>>> the enum, and use an expression-switch instead to pattern-match in the
>>>> implementation of visit(Node). But I think the Visitor class would still exist,
>>>> and still have separate methods for each Node subtype, because they developer
>>>> seemed to care about specializing the return type. The only place where an
>>>> exhaustiveness check helps would be in the visit(Node) method, inside the
>>>> visitor class itself. All other dispatch goes through visit(Node), or through
>>>> one of the specialized visitor methods if the type is known statically. It
>>>> seems like overall this would be an improvement, but again, the improvement
>>>> comes primarily from pattern-matching, not sealing.

>>>> Colocated interface implementations

>>>> Finally, I looked for interfaces having all of their implementations defined in
>>>> the same file. On this I do have some statistical data[3]. A huge majority
>>>> (98.5%) of public interfaces have at least one implementation in a different
>>>> source file. Package-private interfaces also tend to have implementations in
>>>> other files: 85% of them are in this category. For protected interfaces it’s
>>>> much closer: only 53% have external implementations. Of course, all private
>>>> interfaces have all implementations in a single file.

>>>> Next, I looked at interfaces that share a source file with all their
>>>> implementations, to see whether they’d make good sealed types. First was this
>>>> Entry class:

>>>> public interface Entry {
>>>> enum Status {OK, PENDING, FAILED}
>>>> Status getStatus();
>>>> int size();
>>>> String render();
>>>> }

>>>> public class UserEntry implements Entry {
>>>> private User u;
>>>> private Status s;
>>>> public UserEntry(User u, Status s) {
>>>> this.u = u;
>>>> this.s = s;
>>>> }
>>>> @Override String render() {return [ http://u.name/ | u.name ] ();}
>>>> @Override int size() {return 1;}
>>>> @Override Status getStatus() {return s;}
>>>> }

>>>> public class AccountEntry implements Entry {
>>>> private Account a;
>>>> private Status s;
>>>> public UserEntry(Account a, Status s) {
>>>> this.a = a;
>>>> this.s = s;
>>>> }
>>>> @Override String render() {return a.render();}
>>>> @Override int size() {return a.size();}
>>>> @Override Status getStatus() {return s;}
>>>> }

>>>> A huge majority of the clients of this Entry interface treat it polymorphically,
>>>> just calling its interface methods. In only one case is there an instanceof
>>>> check made on an Entry, dispatching to different methods depending on which
>>>> subclass is present.

>>>> Is this a good sealed type? I think not, really. There are two implementations
>>>> now, but perhaps there will be a GroupEntry someday. Existing clients should
>>>> continue to work in that case: the polymorphic Entry interface provides
>>>> everything clients are “intended” to know.

>>>> Another candidate for sealing:

>>>> public interface Request {/* Empty */}
>>>> public record RequestById(int id) implements Request;
>>>> public record RequestByValue(String owner, boolean historic) implements Request;

>>>> public class RequestFetcher {
>>>> public List<Item> fetch(Iterable<Request> requests) {
>>>> List<RequestById> idReqs = Lists.newArrayList();
>>>> List<RequestByValue> valueReqs = Lists.newArrayList();
>>>> List<Query> queries = Lists.newArrayList();
>>>> for (Request req : requests) {
>>>> if (req instanceof RequestById) {
>>>> idReqs.add((RequestById)req);
>>>> } else if (req instanceof RequestByValue) {
>>>> valueReqs.add((RequestByValue)req);
>>>> }
>>>> }
>>>> queries.addAll(prepareIdQueries(idReqs));
>>>> queries.addAll(prepareValueQueries(valueReqs));
>>>> return runQueries(queries);
>>>> }
>>>> }

>>>> Interestingly, since the Request interface is empty, the only way to do anything
>>>> with this class is to cast it to one implementation type. In fact, the
>>>> RequestFetcher I include here is the only usage of either of these classes
>>>> (plus, of course, helpers like prepareIdQueries).

>>>> So, clients need to know about specific subclasses, and want to be sure they’re
>>>> doing exhaustive pattern-matching. Seems like a great sealed class to me.
>>>> Except...actually each of the two subclasses has been extended by a decorator
>>>> adding a source[4]:

>>>> public record SourcedRequestById(Source source) extends RequestById;
>>>> public record SourcedRequestByValue(Source source) extends RequestByValue;

>>>> Does this argue in favor of sealing, or against? I don’t really know. The owners
>>>> of Request clearly intended for all four of these subclasses to exist (they’re
>>>> in the same package), so they could include them all in the permitted subtype
>>>> list, but it seems like a confusing API to expose to clients.

>>>> A third candidate for sealing is another simple sum type:

>>>> public interface ValueOrAggregatorException<T> {
>>>> T get();
>>>> public static <T> ValueOrAggregatorException<T>
>>>> of(T value) {
>>>> return new OfValue<T>(value);
>>>> }
>>>> public static <T> ValueOrAggregatorException<T>
>>>> ofException(AggregatorException err) {
>>>> return new OfException<T>(err);
>>>> }
>>>> private record OfValue<T>(T value)
>>>> implements ValueOrAggregatorException<T> {
>>>> @Override T get() {return value;}
>>>> }
>>>> private record OfException<T>(AggregatorException err)
>>>> implements ValueOrAggregatorException<T> {
>>>> @Override T get() {throw err;}
>>>> }
>>>> }

>>>> It has only two subtypes, and it seems unimaginable there could ever be a third,
>>>> so why not seal it? However, the subtypes are intentionally hidden: it is
>>>> undesirable to let people see whether there’s an exception, except by having it
>>>> thrown at you. In fact AggregatorException is documented as “plugins may throw
>>>> this, but should never catch it”: there is some higher-level thing responsible
>>>> for catching all such exceptions. So, this type gains no benefit from
>>>> exhaustiveness checks in pattern-matching. The type is intended to be used
>>>> polymorphically, through its interface method, even though its private
>>>> implementation is amenable to sealing.
>>>> ________________
>>>> [1] Throughout this document I will use record syntax as if it were already in
>>>> the language. This is merely for brevity, and to avoid making the reader spend
>>>> a lot of time reading code that boils down to just storing a couple fields. In
>>>> practice, of course the code in Google’s codebase either defines the records by
>>>> hand, or uses an @AutoValue.
>>>> [2] Recall that @AutoValue, Google’s “record”, allows extending a class, which
>>>> is semantically okay here: DbResult has no state, only behavior.
>>>> [3]This data is imperfect. While the Google codebase strongly discourages having
>>>> more than one version of a module checked in, there is still some amount of
>>>> “vendoring” or checking in multiple versions of some package, e.g. for
>>>> supporting external clients of an old version of an API. As a result, two
>>>> “different files” which are really copies of each other may implement
>>>> interfaces with the same fully-qualified name; I did not attempt to control for
>>>> this case, and so such cases may look like they were in the same file, or not.
>>>> [4] Of course in the record proposal it is illegal to extend records like this;
>>>> in real life these plain data carriers are implemented by hand as ordinary
>>>> classes, so the subtyping is legal.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20190501/2fa4dfe1/attachment-0001.html>