Pattern matching -- background and design goals

Brian Goetz brian.goetz at oracle.com
Thu Apr 20 18:46:31 UTC 2017


I would like to start the discussion surrounding pattern matching with 
some motivation and goals.  The design space here is enormous and 
connected to many other features, so bear with me as I try to 
linearize the story.  I've posted a general introductory document on 
possibilities for pattern matching in Java here:

http://cr.openjdk.java.net/~briangoetz/amber/pattern-match.html

The first question is why, as in: Why is this feature needed in Java at 
all?

I'll start with the obvious reasons (explicated in greater detail in the 
document): the common pattern of testing an input against multiple 
candidates not only suffers from boilerplate, but, more importantly, the 
tools we have for this give bugs too many places to hide, obfuscate the 
essential business logic, and introduce accidental ordering constraints 
that make programs slower than necessary.
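
To make the boilerplate concrete, here is the sort of test-and-cast 
chain we write today (a minimal sketch; the describe() helper and the 
particular types are illustrative):

     // Each arm repeats the test, the cast, and the binding, and the
     // chain is silently order-sensitive -- many places for bugs to hide.
     static String describe(Object obj) {
         if (obj instanceof Integer) {
             int i = (Integer) obj;
             return String.format("int %d", i);
         }
         else if (obj instanceof Byte) {
             byte b = (Byte) obj;
             return String.format("byte %d", b);
         }
         else if (obj instanceof String) {
             String s = (String) obj;
             return String.format("String %s", s);
         }
         return "unknown";
     }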

There are also some less obvious reasons why this makes sense.  What's 
being proposed here are really two interrelated features: better 
support for compositional test-and-extract (destructuring), and better 
support for handling do-exactly-one-of-these constructs.  While either 
could stand on its own, the two together are much stronger.  Of the two, 
destructuring is the richer concept.

To illustrate why destructuring is so important, allow me to make a 
comparison to Lambda.  Adding the ability to pass behavior as data meant 
that we could build richer, more powerful libraries. We could move 
responsibility for control flow from client code (external iteration) to 
libraries (internal iteration), enabling libraries to expose richer, 
higher-level operations. Additionally, by exposing higher-level 
operations, libraries might not have to expose fiddly, state-dependent 
low-level operations like Iterator.hasNext() / next(), and therefore 
don't need to code iteration logic against being called in the wrong 
order or in the wrong state.
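
To recall the contrast (a sketch, assuming java.util imports and a 
hypothetical Person class with getAge()/getName()):

     // External iteration: the client drives the Iterator, and must get
     // the hasNext()/next() protocol right.
     List<String> adults = new ArrayList<>();
     for (Iterator<Person> it = people.iterator(); it.hasNext(); ) {
         Person p = it.next();
         if (p.getAge() >= 18)
             adults.add(p.getName());
     }

     // Internal iteration: the library owns the control flow; the fiddly
     // low-level protocol never appears in client code.
     List<String> adults2 = people.stream()
                                  .filter(p -> p.getAge() >= 18)
                                  .map(Person::getName)
                                  .collect(Collectors.toList());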

Destructuring has a similar abstraction-raising characteristic. Consider 
the methods Class.isArray() and Class.getComponentType(); the latter 
only returns a value if the former returns true. Method pairs like this 
offer the worst of both worlds; the client has to make two calls, and 
has a chance to do it wrong -- and the library has to defend against the 
client getting it wrong.  These two methods really should be one 
operation, which simplifies both the client invocation and the library 
implementation:

     if (c matches Class.arrayClass(Class<?> componentType)) { ... }
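
For contrast, here is the two-call protocol clients must write today; 
nothing but discipline keeps the test and the extraction together:

     // The client must remember to test before extracting;
     // getComponentType() silently returns null for non-array classes.
     if (c.isArray()) {
         Class<?> componentType = c.getComponentType();
         ...
     }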

Further, decomposition, when done right, is compositional, enabling us 
to express a compound test+extract operation as a single operation, 
rather than as a sequence of operations, each of which can individually 
fail.  This brings client code more in line with how the problem 
statement is typically stated.

Finally, this is a feature with depth; we can start with simple patterns 
and simple pattern-aware language constructs, and add more sophisticated 
kinds of patterns and more constructs that can use patterns as we go. 
This in turn means a greater return on conceptual complexity.


If one looked only at the first few examples in the document, one might 
be tempted to ask "Why not 'just' do flow typing?"  Flow typing would 
eliminate the (irritating) need to cast something to X right after 
having done an instanceof test against X.  It seems like a cheap win 
(though it would invariably put pressure on our handling of intersection 
types), but it just doesn't go very far -- essentially it eliminates a 
single increment of boilerplate and a single place where error can creep 
in.  This is not nothing, and it's a defensible choice, but it seems 
like it would be a missed opportunity.
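
Concretely, flow typing would remove only the redundant cast in the 
familiar idiom:

     if (obj instanceof String) {
         String s = (String) obj;   // the cast flow typing would eliminate
         ...
     }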

Similarly, one might ask "Why not 'just' do a type switch?"  This, too, 
is a feature that seems merely "additive" rather than multiplicative; 
it eliminates some boilerplate in repeated type tests, but it doesn't 
go much farther than that.
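
In some hypothetical syntax, a bare type switch would fold the 
test-and-cast chain above into a single construct, but it stops at 
type tests -- there is no destructuring:

     switch (obj) {
         case Integer i: return String.format("int %d", i);
         case Byte b:    return String.format("byte %d", b);
         case String s:  return String.format("String %s", s);
         default:        return "unknown";
     }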


Which brings me around to a deeper observation, which we think is at the 
design center of this feature: destructuring is the natural dual of 
aggregation, and an obvious (and yet missing) component of 
object-oriented modeling.

People often think of pattern matching as a "functional 
programming" feature.  But any language that supports aggregation (that 
is, all of them) also has to address how aggregates are going to be 
decomposed.  Scala moved the ball forward here by showing that having a 
destructuring operator ("unapply") on objects allows us to match and 
destructure objects in terms of what their constructor invocation looks 
like (regardless of their representation.)  This is not "OO borrows FP 
idea", this is "OO applies sensible programming concept in a natural OO 
way."

That said, I think Scala didn't go far enough.  (No criticism intended; 
they moved the ball forward dramatically, there's just farther to go.)  
For what are likely mostly accidental reasons, Scala only allows a 
single constructor pattern per class, even though constructors 
themselves can be overloaded.

Again, without criticism: I'm guessing most of the motivation for 
Scala's pattern matching was drawn from the distinguished and 
well-behaved case of algebraic data types, not general objects. So the 
limitation of a single unapply was not really a problem; algebraic data 
types don't go out of their way to hide or evolve their representation.  
But we can apply destructuring to a wider range of targets, in a more OO 
way.

So, to put a stake in the ground, we believe that:
  - destructuring is the dual of construction, and should be treated as 
a first-class construct;
  - destructuring an object should be syntactically similar to how that 
object is constructed.

From these principles, it follows that:
  - if you can declare a ctor, you should be able to declare an instance 
dtor;
  - instance dtors should support the same overloading and inheritance 
behaviors as ctors;
  - if you can declare a static factory, you should be able to declare a 
static dtor.

(The syntax of how dtors are declared is a topic for another day.)

If an object is created with a constructor:

     x = new Foo(a, b)

ideally, it should be destructurable with an instance dtor:

     if (x matches Foo(var a, var b))

Note the syntactic similarity; this is not accidental.  Acquiring dtors 
isn't automatic (though data classes can automatically acquire both 
ctors and dtors), but it should be easy and straightforward to declare 
such a dtor.

If dtors are overloaded, type information at the use site may be needed 
to disambiguate.  So if we provide dtors Foo(int y) and Foo(String y), then

     if (x matches Foo(var y))

is ambiguous, but by replacing the var-pattern with an explicit 
type-test pattern, the compiler can properly select:

     if (x matches Foo(int y))

Similarly, if an object is created with a static factory:

     x = Foo.of(a, b)

it should be destructurable with static dtors:

     if (x matches Foo.of(var a, var b))

So, for example, since Optional has two factories:

     x = Optional.of(y)
     x = Optional.empty()

one can discriminate via:

     case Optional.of(var y):
     case Optional.empty():
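
For contrast, today the same discrimination requires an explicit test 
followed by a separate (and failable) extraction -- another 
test-then-extract method pair:

     if (x.isPresent()) {
         ... x.get() ...
     }
     else {
         ...
     }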


Why does this matter?  By way of example, we often prefer factory 
methods to constructors.  A factory has more flexibility than a ctor; 
it can return different subtypes under different conditions, and it 
can change over time which implementation types it returns.  That a 
factory commits only to an abstract type:

     static Foo redFoo() {
         return new MyPrivateFooImplementation();
     }

is great for the implementor, but less so for the client -- once the 
caller gets a Foo from the factory, it has no way of asking "is this a 
red Foo?", unless we clutter the API with ad-hoc methods like 
isRedFoo().  This means that APIs with multiple factories face some bad 
choices: live with the opacity, which can limit what the client can do 
(sometimes this is good, but sometimes it's not); add lots of new API 
points for destructuring; or expose the intermediate types.
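
For instance (sketching with the names above), the workarounds look 
like:

     // Choice: clutter the API with an ad-hoc query per factory...
     interface Foo {
         boolean isRedFoo();
     }

     // ...or expose the implementation type, leaking exactly what the
     // factory was hiding:
     if (x instanceof MyPrivateFooImplementation) { ... }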

Having a complementary dtor allows the implementation to hide its 
details while still providing a degree of reversibility. Without 
venturing near the toxic syntax bikeshed:

    static ... matcher ... Foo ... redFoo(...) { ... }

the library author can render the factory reversible without exposing 
the implementation details.

     x = redFoo();
     ...

     if (x matches redFoo()) { ... }

The implementation of redFoo() remains hidden, but because the dtor is 
part of the same library as the factory, it can still help a client 
reconstruct the object's state (or the parts that it wants to), without 
compromising the encapsulation.  (In just the last few weeks of 
thinking about this, I've found that once you start to see this 
pattern, you can't look at an API and not see it.)


Again, this is mostly something to file in the background and think 
about the consequences of -- until we start to circulate a JEP for 
pattern matching, this is still technically outside the Amber charter.

