Pattern Matching
Remi Forax
forax at univ-mlv.fr
Sat Mar 18 17:22:02 UTC 2017
Hi guys,
i've already implemented a kind of pattern matching in a language that run on the JVM (still closed source :( ),
so i've said to Brian that i will send a summary of how it is done.
please do not care about the syntax, it's just for explanation :)
regards,
Rémi
---
Pattern Matching
1. expression problem
polymorphism:
interface I { String m(); }
class A implements I {
String m() { return "A::m"; }
}
class B implements I {
String m() { return "B::m"; }
}
pattern matching:
static String m(I i) {
switch(i) {
case A a:
return "A::m";
case B b:
return "B::m";
default:
throw new AssertionError();
}
}
polymorphism => easy to add a new subtype of I, can not add a new operation
pattern matching => easy to add a new opertion, can not add a new subtype
said differently, if the hierarchy is open => use polymorphism, if the hierarchy is closed => use pattern matching
The expression problem, Philip Wadler => you can not have both and type safety.
Other ways:
- The visitor pattern uses the double dispatch technique to implement the pattern matching,
the visitor interface has to list all the methods so it's equivalent to closed the hierarchy.
- A hashmap of lambda [1], extensible in both direction but it required an unsafe cast.
- multi dispatch like in CLOS, Dylan or Clojure, not typesafe.
2. smart cast / flow type inference
class A {
int value;
}
class B {
String s;
}
Introducing a new name/local variable in the case allows to avoid unnecessary casts,
so instead of
switch(val) {
case A:
return ((A)a).value;
case B:
return ((B)b).s.length();
}
one can write:
switch(val) {
case A a:
return a.value;
case B b:
return b.s.length();
}
the other solution is to specify a kind of flow inference, i.e. inside the case A, val is now typed as 'A', and inside of case B, val is now typed as 'B'.
switch(val) {
case A: // here val is now a A
return val.value;
case B: // here val is now a B
return val.s.length();
}
This doesn't work well if you can fallthough from case A to case B (more on this later) and in term of implementation, it's better for the debugger
to create a new variable inside the case, so the generated code is more like:
switch(val) {
case A:
A $val = (A)val;
return $val.value;
case B:
B $val = (B)val;
return $val.s.length();
}
2. block vs expression
When switching on type, being able to fallthrough from a case to another is less useful, because it's only safe if the type of the followup case is a super type.
switch(val) {
case Foo:
case Bar: // here Bar as to be a supertype of Foo.
}
Moreover, because of the separate compilation, the relationship between Foo and Bar can change after the code that containing the switch is compiled,
do Bar being a supertype of Foo as to be verified (once) at runtime or by the bytecode verifier.
So instead of using a C like switch based on statements, it's better a construction based on expression like the match in Scala or the when in Kotlin,
something like
match(val) {
case A a -> a.value;
case B b -> b.s.length();
}
3. generalized cases
Instead of a switch on type, we can go a step further and allow any arbitrary tests, like by example allowing comparison by value,
by example in Java, because unlike in Scala there is no hierachy to distinguish if an optional is present or not,
with a match that allows comparison on value, we can write
match(opt) {
case Optional.empty() -> "empty"
case Optional opt -> opt.get()
}
One problem with this construct is that unlike the match on type, here, there is an implicit order between the cases, e.g. at runtime the cases
as to be tested in the order like a cascade of if/else. From the performance point of view, the drawback is that if you have a big hierachy,
and want to do a switch on type, having a linear test is slow. From my own experience, it's a serious issue when you try to use a switch on type
with that perf property in a compiler.
A solution i've used in one language is to force users to define all the case on values first and then the case on types,
at runtime, the case on values were executed with a cascade of if/else and the cases on types were executed using a if/else or an hashmap
depending on the number of cases.
4. structural matching
One problem of the pattern matching, is that because it's not expressed inside a hierarchy, it ruins the encapsulation, so either your user
has to write getters or you have to expose the structure of the class to the pattern matching.
Case class in Scala or record in C# 7 [2] are special constructs that expose the structure of a class, so the case of a match can specified in term
of destructured local variable
interface I { }
case class A(int value) implements I;
case class B(String s, String s2) implements I;
match(i) {
case A(val) -> val
case B(s, _) -> s
}
Both C# and Scala allow user to define how the matching is done, C# relies on the is operator which uses out parameters to extract the values
and Scala uses an extractor method (unapply) which uses tuples to extract the values. Java has none of this mechanism, so either we introduce
a mechanism that provides a structural definition + getters or we wait the introduction of value types (and tuples).
A simple proposal for a structural definition of a class:
structural class B(final String s, final String s2) implements I;
which is equivalent to
/*structural*/ class B {
private final String s;
private final String s2;
public B(String s, String s2) {
this.s = s;
this.s2 = s2;
}
public String getS() { return s; }
public String getS2() { return s2; }
+ equals and hashCode
+ a StructuralAttribute that describe the attributes s and s2.
}
if a body is declared, then nothing is generated by the compiler apart the StructuralAttribute
and it's an error to not provide the getters, equals and hashCode.
structural class B(final String s, final String s2) implements I {
// need a getS and getS2 and equals and hashCode
}
Again, it's maybe better to wait value types instead of relying on getters.
5. Compilation of a pattern matching
As said previously, a switch on types should not be compiled to a cascade of if/else.
- It's dog slow if you have more than a dozen of cases, because it's a linear scan and each instanceof is in the best case also translated to an if/else by the JIT
- It doesn't work well with separate compilation, because if/else cascade is a sequence so defines an order.
The solution is to use invokedynamic, obviously :)
The recipe of a pattern matching can be describe as an array of couple of pattern/action,
the pattern is:
a test to a constant (static final field)
a test to an argument, argument are like captured argument of a lambda pass as argument of invokedynamic
a test of a type (a class)
a destructured definition (a string encoding the matching tree indicating the extracted values)
an action is:
a constant method handle that reference a static method that takes as parameter the local variable defines by the pattern.
Note: in my language, i've encoded the receipe as a JSON like string uuencoded seen as a constant String because there is no way to definition a tree like constant in the bytecode (yet !).
Examples:
match(opt) {
case Optional.empty() -> "empty"
case Optional opt -> opt.get()
}
can be encoded as
invokedynamic(opt) (Ljava/util/Optional;)Ljava/lang/String;
with the recipe:
constant: method: Ljava/lang/Optional;.empty()Ljava/lang/Optional;
type: Optional.class
and
match(i) {
case A(val) -> val
case B(s, _) -> s
}
can be encoded as
invokedynamic(i) (LI;)Ljava/lang/Object;
with the receipe:
structural: LA;(?)
structural: LB;(?_)
(where ? means capture and _ forget)
At runtime, in the bootstrap method, the type of the constant, the type of the argument, the type of the test type and the type of the structural matching are checked to verify that
they are subtype of the first parameter of invokedynamic, otherwise an Error is raised.
The test to a constant and the test to an argument are created as guardWithTest eagerly. The test on types are created lazily when a new class is found, if more that one pattern match,
an error is reported otherwise, a guardWithTest is created and if there are two many guardWithTest (12 currently), a ClassValue is used instead.
GWT are inserted the one after the others, a first prototype was trying to profile the branch taken to re-balance the tree is necessary but sometimes c1 was able to compile the code
that was doing the profiling before the code that was doing the re-balance was called leading to a stupid code invalidation.
A specific MethodHandle could do a better job here !
[1] https://github.com/forax/design-pattern-reloaded/blob/master/src/main/java/visitor/visitor6.java#L28
[2] https://github.com/dotnet/roslyn/blob/features/records/docs/features/records.md
More information about the amber-spec-experts
mailing list