New [draft] JEPs for Pattern Matching
Brian Goetz
brian.goetz at oracle.com
Fri Dec 14 15:14:38 UTC 2018
> I can see plenty of examples and motivating use cases for a single
> pattern test in an if statement, with optional when and optional else.
OK, now we're getting somewhere. What you're really saying is "I can
imagine wanting to do things like...
if (target matches pattern && result-refinement) {
...
}
else {
...
}
...but I'm having trouble seeing examples that go much past that. Please
help me see what you see."
Recall that pattern matching fuses three things:
- A test
- Zero or more conditional extractions, if the test succeeds
- Binding the extracted quantities into fresh variables (which are in
scope in the code dominated by a successful match)
The reason pattern matching makes sense as a linguistic abstraction is
that we tend to do these things together _all the time_. What do you do
after a successful instanceof test? Cast the target -- almost 100% of
the time. What do you do next? Put the result in a variable. What do
you do next? Use the variable, because it has a refined type. Why make
those three things, when they can be one?
The benefit of transforming
if (x instanceof Foo) {
Foo f = (Foo) x;
// use f
}
into
if (x instanceof Foo f) {
// use f
}
is obvious; it's not only more direct, but eliminates an opportunity to
make a silly error. But that fits into what you've already
imagined; what's outside that?
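To make the before/after concrete, here is a minimal compilable sketch. It assumes Java 16 or later (where type patterns for instanceof are a final feature); the class and method names are made up for illustration and use String in place of Foo.

```java
class InstanceofDemo {
    // Old style: the test, the cast, and the binding are three separate steps.
    static int lengthOld(Object x) {
        if (x instanceof String) {
            String s = (String) x;
            return s.length();
        }
        return -1;
    }

    // Pattern style: the test, the cast, and the binding are one construct.
    static int lengthNew(Object x) {
        if (x instanceof String s) {
            return s.length();
        }
        return -1;
    }
}
```

Both methods behave the same; the second simply has no place to get the cast wrong.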
I think part of the trouble you're having is that the patterns defined in
the first round are relatively weak (instanceof), so they don't get
combined in terribly interesting ways -- yet.
So first, let's digress into where we're going. Another place where we
needlessly decompose the test-extract-bind triplet is in APIs like
Optional. We have `Optional::isPresent` and `Optional::get`; it is a
very common mistake to use the latter without checking the former
first. The root of this user mistake is in the API design; it is an
accident waiting to happen, because these two API points really want to
be _one pattern_. (Without patterns, we design the APIs with the
language we've got, and we get this.) If, instead of having an
unconditional Optional::get method, we had patterns for
`Optional.empty()` and `Optional.of(var contents)`, it would be much
harder to make this mistake, because instead you'd say:
if (o instanceof Optional.of(var contents)) { /* use contents */ }
The two API points we have -- which must always be used together -- can
be fused into a single pattern, which is really what the user needs
anyway, since 99+% of the time we should be calling them together.
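java.util.Optional has no such patterns, so the following is only a sketch of the idea using a hypothetical stand-in type. It assumes Java 21 or later (record patterns); Opt, Some, and None are invented names, not part of any real API.

```java
class OptDemo {
    // Hypothetical Optional-like type: exactly two cases, each a record.
    sealed interface Opt<T> permits Some, None {}
    record Some<T>(T contents) implements Opt<T> {}
    record None<T>() implements Opt<T> {}

    static String describe(Opt<String> o) {
        // Test, extraction, and binding happen together; there is no
        // separate "get" to call without having checked first.
        if (o instanceof Some<String>(var contents)) {
            return "value: " + contents;
        }
        return "empty";
    }
}
```

The variable contents simply does not exist on the path where the match failed, so the "forgot to check isPresent" mistake cannot be written.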
OK, now, back to your main question -- why not just do something special
for `if`? Boolean expressions can be used in more than `if`; they can
be used in conditional expressions, while loops, for loops, they can be
returned from methods, etc. All of these are useful with patterns, but
we're not done with "if" yet.
Maybe you have some code that looks like this:
if (x instanceof OptionHolder oh) {
// process options
}
else {
throw new InvalidOptionsException();
}
And suppose the body of the `if` were long. Very often, people prefer
(reasonably so) to invert the `if`, so that you first test your
precondition, fail-fast if it is not met, and then go on to the real work:
if (!(x instanceof OptionHolder oh)) {
throw new InvalidOptionsException();
}
else {
// process options
}
But that only works if I can invert the test. If `instanceof` is a
boolean expression, I can do so trivially. If it's some special form
for if, I can't. So now we've invented a language feature that is
resistant to the most basic of refactorings -- inverting an if-else
block. (Which would surely lead to calls for "please add
if-not-matches".) So clearly, we want to be able to invert a pattern
match.
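A minimal sketch of the inverted form, assuming Java 16 or later. OptionHolder here is a hypothetical record standing in for the type in the example, and IllegalArgumentException stands in for InvalidOptionsException. Because the `if` body always throws, the `else` can be dropped and the binding is still in scope afterwards.

```java
import java.util.List;

class InvertDemo {
    // Hypothetical holder type, for illustration only.
    record OptionHolder(List<String> options) {}

    static int countOptions(Object x) {
        // Fail fast when the precondition does not hold.
        if (!(x instanceof OptionHolder oh)) {
            throw new IllegalArgumentException("not an OptionHolder: " + x);
        }
        // oh is in scope here: this point is reachable only if the match succeeded.
        return oh.options().size();
    }
}
```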
What about preconditions? Suppose we only want to process the options
when we're in debug mode.
if (debugMode && x instanceof OptionHolder oh) {
// process options
}
That seems a pretty reasonable thing to want to do. So not only do we
want to refine the result of a match, we want to be able to precede it
with a condition.
What about ORing? Same deal. Sticking with the "options processing"
example, imagine we have a command line option processor that returns an
Optional describing the argument of a switch. Imagine further we have
two sources of options; a prefs file we've already parsed, and command
line switches.
if (prefsFile.debug()
|| (getOpt("--debug") instanceof Optional(String s) &&
isBooleanTrue(s))) {
...
}
Could I write this without the pattern match? Sure I could, but it
would be messier and more error-prone.
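The `Optional(String s)` pattern above is not something Java offers, so here is a rough approximation with a plain type pattern, assuming Java 16 or later. The command-line switches are modeled as a hypothetical Map<String, Object>, and isBooleanTrue is an invented helper.

```java
import java.util.Map;

class DebugFlagDemo {
    // Hypothetical helper: accepts a few spellings of "true".
    static boolean isBooleanTrue(String s) {
        return s.equalsIgnoreCase("true")
            || s.equalsIgnoreCase("yes")
            || s.equalsIgnoreCase("t");
    }

    static boolean debugEnabled(boolean prefsDebug, Map<String, Object> cmdLine) {
        // s is in scope only to the right of &&, where the match must have succeeded.
        // A missing switch yields null, and null never matches a type pattern.
        return prefsDebug
            || (cmdLine.get("--debug") instanceof String s && isBooleanTrue(s));
    }
}
```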
(Digression: if we wrote an options-processing framework for this,
rather than capture "is this string the boolean true" as a static method
from String to boolean, we'd write it as a pattern, and nest the
patterns, exposing the boolean result we want, and hiding the raw
string, which is only useful for further parsing:
if (getOpt("--debug") instanceof Optional(BooleanString(boolean b)))
BooleanString could fail to match if the string were not a valid boolean
(i.e., not "true", "false", "t", "f", "yes", "no", etc) and otherwise
convert the string to a boolean.
End digression.)
OK, so we've seen that even for if, being able to combine it with other
conditionals with && and || is valuable, as is inversion.
If something is useful as the operand of `if`, it's useful in a
conditional expression:
int maxThreads = getOpt("--max-threads") instanceof Optional(String s)
? Integer.parseInt(s)
: DEFAULT_MAX_THREADS;
Why would we tell people they can't refactor their if statements to
conditionals?
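The same shape compiles today with a type pattern, assuming Java 16 or later. This is only a sketch: the options are modeled as a hypothetical Map<String, Object>, and the default value is an arbitrary choice.

```java
import java.util.Map;

class MaxThreadsDemo {
    static final int DEFAULT_MAX_THREADS = 4; // assumed default, for illustration

    static int maxThreads(Map<String, Object> opts) {
        // s is in scope in the "true" arm only; the "false" arm cannot see it.
        return opts.get("--max-threads") instanceof String s
            ? Integer.parseInt(s)
            : DEFAULT_MAX_THREADS;
    }
}
```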
If something is useful as the operand of an if, it's probably useful on
its own:
boolean debugValue = getOpt("--debug") instanceof Optional(String s)
&& isBooleanTrue(s);
Sometimes we return booleans from methods. Like, `Object::equals`.
Currently, we write `equals()` in a convoluted way (here's some
IDE-generated code):
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Name)) return false;
Name name = (Name) o;
if (!first.equals(name.first)) return false;
return last.equals(name.last);
}
Such control flow. Much goto.
Instead, I can write this as one expression with a pattern match:
public boolean equals(Object o) {
return (o instanceof Name name)
&& first.equals(name.first)
&& last.equals(name.last);
}
This works because `instanceof` is a boolean expression.
OK, what about other control constructs? We use booleans in do, while,
and for loops. And yes, they're useful there too, though these are
likely more advanced use cases. (The more computation that can be
expressed as a pattern match, such as matching a string against a
regular expression (which is a perfect application for pattern
matching), the more likely you are to want to do this in a loop header.)
To answer your concrete questions:
> I assume the following is either pointless or a compile error:
> boolean isString = obj instanceof String s;
It's valid code, but it's silly, in exactly the same way that we can do
silly stuff today, like declaring a variable and not using it. In fact,
that's exactly what this is -- declaring a variable and not using it.
Legal, but silly.
> What about this - is s in scope in the braces?:
> while (obj instanceof String s) { ... }
Yes. Follow the control flow, and ask yourself: when I get inside the
braces, did the instanceof expression always match? If so, then yes.
(Scoping for binding variables is just definite assignment, which is
just "follow the control flow, was it assigned in all paths that could
lead here?")
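As a small sketch of a binding being used inside a loop body (assuming Java 16 or later; Wrapper is a hypothetical record invented for the example):

```java
class UnwrapDemo {
    // Hypothetical wrapper type that may be nested arbitrarily deep.
    record Wrapper(Object inner) {}

    static Object unwrap(Object obj) {
        // Inside the braces the match must have succeeded, so w is in scope.
        while (obj instanceof Wrapper w) {
            obj = w.inner();
        }
        // After the loop the match has failed, so w is not in scope here.
        return obj;
    }
}
```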
> Or this - pointless, or error?:
> list.stream().filter(obj -> obj instanceof String s);
Same as the first one; you've declared a variable that is never used.
It is in scope in invisibly small places where you could usefully put
code if you wanted to, though:
list.stream().filter(o -> o instanceof String s && !s.isBlank())
Again, follow the control flow -- when I get to the s.isBlank(), must
the pattern have matched? If so, then s is in scope.
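A self-contained version of that filter, as a sketch assuming Java 16 or later (the class and method names are invented):

```java
import java.util.List;

class FilterDemo {
    static long countNonBlankStrings(List<Object> list) {
        return list.stream()
                   // s is in scope to the right of &&, where the match must have succeeded.
                   .filter(o -> o instanceof String s && !s.isBlank())
                   .count();
    }
}
```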
Finally, as an example of what people might do if we didn't have an
instanceof expression -- they might instead synthesize one with a switch
expression:
String castedToString = null;
boolean b = switch (o) {
case String s -> { castedToString = s; break true; }
default -> false;
};
Yuck. This code is worse than `o instanceof String s` in so many ways.
Would you want to encourage this?