combinators for pattern matching

forax at univ-mlv.fr forax at univ-mlv.fr
Sun Mar 19 18:22:27 UTC 2017


> De: "John Rose" <john.r.rose at oracle.com>
> À: "Rémi Forax" <forax at univ-mlv.fr>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Envoyé: Dimanche 19 Mars 2017 01:05:20
> Objet: combinators for pattern matching

> On Mar 18, 2017, at 3:32 PM, forax at univ-mlv.fr wrote:

>> switching on enums should be done using the indy too, currently swapping two
>> constants is not backward compatible because they is maybe a switch somewhere.

> Actually it's OK to permute enums because each switch uses a local mapping
> table.
> But that causes overheads (partly because arrays are not constant-foldable,
> another
> place we need frozen arrays). A switch combinator for enums would optimize the
> case where caller and callee agreed on enum order, while covering the corner
> cases
> with a @Stable indirection table. That's better in two ways from what we do now.
> In general, the static compiler would "guess" at some sort of enumeration (or
> perfect
> hash function) of the inputs and trust the runtime to correct any flaws.

right, 
It just remember me something so before i forget, one reason why i come with a string recipe and not something that uses an array of method handles is that you have to support the stupid corner where there are more than 253 cases in a switch. 
Given i hit that peculiar case with a language which is not used by more that a hundred of people, i suppose Java will have the same issue. 

[... discussion moved to mlvm ...] 

>>> The theory of type safety of multiple dispatch has moved forward with Fortress.
>>> Alternatively, if you can sugar up a visitor deployment as if it were
>>> multimethods
>>> added as extensions, you could prove type-safety and still define apparent
>>> methods,
>>> outside the type capsule. When we get value types visitors will become cheaper.
>>> Maybe at that point we can think about doing multi-method sugar that compiles
>>> to invisible visitor classes. (I'm not suggesting we do this now!)

>> The main issue of the visitor is that you have to add the accept methods on the
>> class of the hierarchy before being able to use the visitor, this is equivalent
>> to be able to insert a method or a constant into a class, which is also
>> equivalent to be able to inject the implementation of a not yet existing
>> interface.

> But you can usually write the accept methods once, universally, right?

if you are able to abstract over an unknown number of parameters, yes. 

> The multi-method part is often just the specific visit* methods of the visitor.
> The parts of the multi-methods which switch on the acceptor object types
> can be factored out just once.

> Here's the pattern I'm thinking of in multimethods:

> multimethod WalkY.Result walkY(NodeTypeX node, WalkY walker);
> ==> value class WalkY$Walker implements NodeTypeTop.Visitor { …
> visit(NodeTypeX node) { return walkY(node, this.walker); } … }

> The hooks for all the NodeTypeX things would go once into the node hierarchy,
> assuming such a hierarchy can be pre-configured in a uniform manner, while
> the various ad hoc methods would go into an ad hoc WalkY walker value class.
> At this point I would need to study the "expr problem" literature to see how
> this
> ties in (I'm sure it's not a new idea), so I'll just say it seems useful as a
> FUTURE
> (not NOW) option for sugary multi-methods, and leave it there. No way are we
> doing multi-methods any time soon.

Implementing multimethods with a visitor is a pain, because you have to dispatch on each parameters, so if you are in the middle of a dispatch you need a way to extract an argument, call accept on it with a new extractor that will extract the next argument, so you need tuple and array access to tuple. And this is not the fastest way to implements multimethod, it's better to analyse the signature of all implementations because dispatching on the 3rd parameter and checking the type of the other arguments (not unlike in a unverify entry point) may be a lot of faster than dispatch on the arguments in order. 

> But: The combined pattern of visitor + factory is sometimes called a
> metamorphism.
> I think that what we are after, in the end, is a full set of hooks for building
> ad hoc
> metamorphisms, without having to add more logic to the participating data
> classes.
> A matcher hook is really a co-constructor, which provides a mirror-image
> computation
> to the constructor. One takes some values and produces an object from the value,
> while the other takes an object and produces some values from the object.

yes, if you forget the '_', the matcher hook is the dual of a constructor, an extractor. 
It's even more true, if you compare with a vnew which load the arguments into the fields directly. A default extractor is something that takes the field from an object and put them on stack. 
But for any objecst, vnew is not custumizable enough, no null check, defensive copy, etc. that why we have constructor, unapply should be able to let user to specify which fields should be extracted the same way. 

> Both halves are important, and it is also important to make the two halves
> work together gracefully. IMO that's the deep reason why notations for patterns
> look like notations for object construction expressions. I am hoping we can
> exploit this mirror structure in Amber designs.

yes, i fully agree. 

>>> The goal would be to guide users to shadow variables only when the shadowing
>>> binding is somehow a logical repeat of the original binding. This cannot be
>>> done automatically in all cases we care about. Tricky tradeoffs.

>> For lambdas, a lambda is an anonymous function, and even in JavaScript, opening
>> a function open a new scope.
>> If the switch as a syntax close to the lambda one (and less close to the C) it
>> may not be a big deal.

>> switch(val) {
>> A val -> ...
>> }

>> but given that usually you have more that one case, it will rapidly become
>> unreadable

>> switch(val) {
>> A val -> foo(val)
>> B val -> foo(val)
>>>> }

> Actually, I find that a reasonable compromise between pure flow typing and
> explicit rebinding.

> Yes, the names are stuttery, but there's a logical reason for it; we are
> explicitly instructing the
> compiler to rebind the name after each case label. Compare similar stuttering in
> "this.foo = foo"
> or "map.computeIfAbsent(k, k -> k.toString())".

Maybe, let's see what the other in the EG think. 

>>> A lower-level solution which requires
>>> no companion types at all (not even tuples) would be to reify argument
>>> lists per se, at the same level of abstraction as method handle types.

>> Will it work if the pattern support nested types ?
>> case A(B(x y), _): …

> Yes. You need a composite MH which takes a possible A (with possible B)
> and returns an argument list of x,y. The type of the argument list is totally
> ad hoc. Consider:
> case A(B(x,_),C(y)): …
> In that case, the argument list of (x,y) might have types that never appear
> together, except at this one place in the source code.

yes, 
A.unapply returns { k, l } 
B.unapply retruns { m, n } 
C.unapply return { o } 

so 
a.unapply() -> { k, l } 
k.unapply() -> { m, n } 
l.unapply() -> { o } 
and then box(m, o) -> { m, o } 

in term of implementation, you can either have one argument list by calls or consider that what you want is just an object with stable field that will represent the whole computation. 
Here we need 5 slots, which are empty by default, 
z = { A k, B l, int m, int n, int o } 
a.unapply() is equivalent to z.k = k, z.l = l; 
z.k.unapply() is equivalent to z.m = m, z.n = n 
z.l.unapply() is equivalent to z.o = o 
then you need to be able to call with all z slots on stack, and use permute arguments or drop arguments to call the action. 

This is really if you are inside a constructor of the z object, sending itself to ask all unapply() to initialize a part of it. 

>> You will have to have a kind of growable arguments list very similar to the one
>> you need to implement the method handle combinators.
>> Technically, for the mh combinators you also need to shrink the arguments list
>> but you can cheat if you are able to extract a sub part when doing the calls to
>> the direct method handle.

> I think that means you need argument list utilities (are they combinators?)
> which perform for argument lists what dropArgs, insertArgs, collectArgs,
> spreadArgs, etc., do for MHs. Actually, the existing MH combinators
> will do all of this, given two more MH hooks which allow method handles
> to map between the two forms of argument lists: normal, and "collected
> into a single slug". With a little forcing, collectArguments and spreadArguments
> could be trained to look for the argument list type as well as array types,
> et voila.

>> Jerome has used that trick when implementing invokedynamic on android.

> Ignoring arguments is a good trick. If you have an argument list pointer,
> you can ignore both a prefix and a suffix.

or you copy them on top of the stack and the do the method call. 

>>> That's a building block I am currently experimenting with, as a value-based
>>> class which can be upgraded to a value type. The type information is
>>> encoded as (wait for it) a MethodType, interpreted with the arrows reversed.

>> The arrow reversed MethodType is a way to represent a flatten tuple of types to
>> a named tuple (the value type) with the same component types.

> Hold on, there are no names here, just positions. That's important! We call
> Math.max on (DD) not on (Dx;Dy;).

for me the name was the name of value type to upgrade to. 

>> So in order to represent a tuple of multiple values which are the structural
>> value of a class, you need a reified tuple of types and you use a MethodType as
>> a hack.

> The MT just provides the dynamic type framework. (The return type is always
> void.class.)
> The List API provides weakly-typed access, and there also has to be a way to get
> strongly-typed MH-based access if you know the MT in advance.

for me (see above) everything is always strongly typed but maybe un-initialized. 

>> Another way to see unapply is to see it as a kind 'maybe tailcall' in a
>> continuation,

> Yes, but the JVM is very bad at tailcalls, so that's not practical. And in any
> case,
> even if tailcalls were great, you'd leave behind your local state, which will
> make
> compilers unhappy and lead to the boxing of frame-state (in closures or
> whatnot).

about forgetting the local variable, 
first, you can avoid local variable (i.e. only allow effectively final likes with lambdas) 
then you can always transform a switch with mutable local variable to two switches not unlike how a switch on string is done. 

int foo; 
switch(val) { 
A a -> { foo = a.value; } 
B b -> { foo = b.s.length(); } 
… 
} 
can be translated into 
int index = switch(val) { 
A a -> { return 0; } 
B b -> { return 1; } 
} 
int foo; 
switch(index) { 
case 0: 
A a = (A) val; 
foo = a.value; 
break; 
case 1: 
B b = (B) val; 
foo = b.s.length(); 
break; 
} 

But for the unapply/structural matching switch, we should limit ourselves to actions that have the same restriction as lambdas, i.e. expression switch. 
For the plain switch on type, we can support the C switch, but even that, i find it evil. 

BTW, one of my most painful debugging session i had with a Scala code was because an unapply() doing side effects. 

> It's the difference between internal and external iterators all over again.
> I think we need both options. And in this case I think we want to build the
> internal (CPS flavor) on top of the external (accessor flavor), and not vice
> versa.

> The CPU flavor suffers from bad support for tailcall, loss of local state, and
> one
> more thing: It's a little too eager. It assumes that the client wants *all* of
> the
> component values from the match. It would be nice (though it's not required)
> if you could say "_" for a component pattern, and avoid the cost of reifying
> the component. Why is this important? Well, the component might actually
> cost something to reify; standard examples are defensive copying of array
> components, or creation of invariant-enforcing view objects. We can't handwave
> these away by saying that pattern-matchable objects will always be so simple
> that component extraction will always be cheap. (This is a limitation of
> the argument-list approach also, so we might want something more here.)

if you drop the argument after some point, either the JIT will see the whole picture and remove this useless defensive copy, etc. or the JIT will think something escape, and you will pay for it (not Mexico) 
if you want a more fine grained solution, it's like providing a lambda for each extracted field and in that case, instead of having an unapply function, i think it's better and simpler to in the code make an association between a field, its getter and its order in the pattern. 

> None of this bears on the surface syntax of the language, but it's
> important for some of us to think about since classfiles containing
> compiled Amber code need to be compact and fully optimizable
> by the Java runtime. In particular, if we didn't have indy to perform
> boilerplate generation for us at runtime, our class files would become
> terribly bloated by the boilerplate features we are adding. A one
> line source file could expand by 100x if all the boilerplate had
> to be statically elaborated into the classfile. With indy-style
> bootstrapping we can avoid paying the cost for boilerplate
> functions until they are actually used.

> — John

Rémi 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20170319/381efb1e/attachment-0001.html>


More information about the amber-spec-experts mailing list