Pattern Matching

Sat Mar 18 22:32:04 UTC 2017

----- Mail original -----
> De: "John Rose" <john.r.rose at oracle.com>
> À: "Rémi Forax" <forax at univ-mlv.fr>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Envoyé: Samedi 18 Mars 2017 19:50:08
> Objet: Re: Pattern Matching

> On Mar 18, 2017, at 10:22 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> 
>> Hi guys,
>> i've already implemented a kind of pattern matching in a language that run on
>> the JVM (still closed source :( ),
>> so i've said to Brian that i will send a summary of how it is done.
> 
> Thanks, Remi. This is useful.  At many points it parallels thinking that we have been doing.
> 
> I'm glad to see you use indy.  As with the string-concat work, there is an opportunity to perform
> run-time optimizations that way, which an eager bytecode spinning approach can't touch.

yes !

> For example, the static compiler can inspect type relations among different case branches,
> and it can make requirements such as no dead cases, but it's only at runtime
> that the final type relations are available.  And even if the static compiler ignores some type
> relations, the BSM for the switch can use such information to flatten the decision tree
> semi-statically.

yes

> 
> (My thought ATM is to forbid dead code at compile time but allow it at runtime if
> the type relations have changed.  I think that's in keeping with the way we manage
> other binary compatibility questions.

yes,
we have decided to throw an error at runtime in case of dead code mostly to report a compilation bug because we do not have a real IDE/incremental compilation.

>  The runtime optimization can ignore the
> static presuppositions.  This means that the semantics of switch need to be
> "as if" the switch is "really just" a linear decision chain.  We save performance
> by using a declarative formulation to the BSM which allows the BSM code
> generator to reorganize the chain as a true switch, or shallow cascade of
> switches.  BTW, we forgot the switch combinator.  Maybe we can fix this in
> the next release?  More indy leftovers!  It should be used to compile switching
> on strings and enums, as well as any future pattern matches on constants.
> It should be optionally lazy and open, which is what I think you are also
> calling for at the end of your message.)

yes, 
switching on enums should be done using the indy too, currently swapping two constants is not backward compatible because they is maybe a switch somewhere.
and +1 for adding the switch combinator in 10. I would really like to be able to tell the JIT that the tests has no side effect (maybe it can be another combinator too) even if some part of the test are not inlinable. 

> 
> The theory of type safety of multiple dispatch has moved forward with Fortress.
> Alternatively, if you can sugar up a visitor deployment as if it were multimethods
> added as extensions, you could prove type-safety and still define apparent methods,
> outside the type capsule.  When we get value types visitors will become cheaper.
> Maybe at that point we can think about doing multi-method sugar that compiles
> to invisible visitor classes.  (I'm not suggesting we do this now!)

The main issue of the visitor is that you have to add the accept methods on the class of the hierarchy before being able to use the visitor, this is equivalent to be able to insert a method or a constant into a class, which is also equivalent to be able to inject the implementation of a not yet existing interface.

> 
> Maybe one of our lambda leftovers will help with the problem of flow-typing.
>   switch (val) { case A val: … }  // relax shadowing rule for some bindings??
> It's a fraught question, because the anti-shadowing rule prevents confusions
> which a relaxation re-introduces, if users misuse the freedom of expression.
> The goal would be to guide users to shadow variables only when the shadowing
> binding is somehow a logical repeat of the original binding.  This cannot be
> done automatically in all cases we care about.  Tricky tradeoffs.

For lambdas, a lambda is an anonymous function, and even in JavaScript, opening a function open a new scope.  
If the switch as a syntax close to the lambda one (and less close to the C) it may not be a big deal.

 switch(val) {
   A val -> ...
 }

but given that usually you have more that one case, it will rapidly become unreadable

  switch(val) {
   A val -> foo(val)
   B val -> foo(val)
   ...
 }

> 
> We are also struggling with unapply.  There are several hard parts, including
> the encapsulation model and the API for delivering multiple components.
> The low-level parts probably require more indy-level combinators with
> special JVM optimizations.  I wish we had full value types today so we
> could just do tuples for multiple-value packages.  But even that is a
> simulation of the true shape of the problem, and simulation has overheads
> even if we stay out of the heap.  A lower-level solution which requires
> no companion types at all (not even tuples) would be to reify argument
> lists per se, at the same level of abstraction as method handle types.

Will it work if the pattern support nested types ?
  case A(B(x y), _): ...

You will have to have a kind of growable arguments list very similar to the one you need to implement the method handle combinators.
Technically, for the mh combinators you also need to shrink the arguments list but you can cheat if you are able to extract a sub part when doing the calls to the direct method handle.
Jerome has used that trick when implementing invokedynamic on android.

> That's a building block I am currently experimenting with, as a value-based
> class which can be upgraded to a value type.  The type information is
> encoded as (wait for it) a MethodType, interpreted with the arrows reversed.

The arrow reversed MethodType is a way to represent a flatten tuple of types to a named tuple (the value type) with the same component types.

So in order to represent a tuple of multiple values which are the structural value of a class, you need a reified tuple of types and you use a MethodType as a hack.

Another way to see unapply is to see it as a kind 'maybe tailcall' in a continuation,
if the pattern doesn't match, it tailcalls and checks the next pattern otherwise, it drops the unnecessary parameters and calls the action.
In case of embedded component, the action itself is to do a 'maybe tailcall' after the component values are on stack but if the pattern doesn't match it now has to pop two stack frames.

class A {
  int value;
  B b;

  void unapply(MethodHandle[int, B] match, MethodHandle reject) {  // act as a continuation
     if (value == 0) {
       reject.invokeExact(); // tail call to the next pattern
     }
     match.invokeExact(value, b);    // insert the argument into the previous call, calls unapply on b if nested, calls the action otherwise
  }
}

> 
> Thanks for the brain dump!
> 
> — John

Rémi