Fwd: Record pattern and side effects
Brian Goetz
brian.goetz at oracle.com
Tue Apr 19 19:29:28 UTC 2022
This came in on the amber-spec-comments list, but it's a useful
discussion to bring here.
> While it's pretty easy to say that record deconstruction should never have
> side effects (or generally be stateful beyond the record), would you also
> extend that to all custom patterns?
Yes, though not all side effects are created equal.
Pattern matching is about fusing asking a question with conditional
extraction, in a way that is composable (so that patterns can be
composed, just as method calls can be composed). Let me address
exceptions separately from "ordinary" side effects, but the answer is
mostly the same for both.
First, let's get exceptions out of the way. Pattern matching is about
asking a question, like "If I cast you to Foo, would you throw?"
Having the "if I did this, would you throw" answer by throwing is not
... helpful. The whole point of pattern matching is that you can easily
express "is it this? is it that?" logic; if "is it this" prevented you
from asking "is it that", that would be rude.
This doesn't mean that language constructs that use pattern matching
can't fail. If a switch is supposed to be exhaustive but somehow is not
(e.g., because of separate compilation), the process of trying to match
each of the N cases should not throw; but if we get to the end and none
of the cases has matched, then the _switch_ itself is entitled to throw.
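As a concrete sketch (Shape, Circle, and Square are made-up names, and
the exact exception thrown on the fallback path has varied across
previews):

```
sealed interface Shape permits Circle, Square {}
record Circle(double radius) implements Shape {}
record Square(double side) implements Shape {}

static double area(Shape s) {
    return switch (s) {
        case Circle(var r) -> Math.PI * r * r;
        case Square(var d) -> d * d;
        // No default needed: the switch is exhaustive today. If a new Shape
        // subtype appears later without this code being recompiled, neither
        // case above throws while being tried; it is the synthetic fallback
        // the compiler inserts that throws.
    };
}
```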
> It seems to me that stateful, effectful
> patterns could be useful if explicit enough.
Yes, but mere usefulness is not the measure of whether a language
feature is wise, or even beneficial as a whole. We routinely give up
flexibility in order to obtain global benefits. The real question is,
is the language better with the incremental flexibility, or worse? Very
often, the answer is worse, because it undermines safety properties or
optimizations for comparatively little benefit.
> The behaviour I would expect here is essentially "mimicking an equivalent
> if/else chain", ensuring that I can always refactor between a switch and
> ifs without new behaviour, always evaluating top-to-bottom left-to-right.
> But that's also bad for the majority of patterns that are expected to be
> pure.
This is indeed one of the tradeoffs. Let's imagine 99% of patterns are
pure; I think in a sensible world, the number is much higher. But let's
imagine further that we constrained execution to work as you describe,
which forces O(n) dispatch, even though many switches could be executed
in O(1) without this constraint. This is terrible; for the sake of a
tiny fraction of (questionable) patterns, we cripple the performance of
every switch. That seems a manifestly bad tradeoff.
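For instance (the Json* names are invented), because every pattern in
this switch is pure, nothing observable depends on trying the cases in
source order, so the compiler is free to lower it to a direct type
dispatch rather than being forced to walk the cases one by one:

```
sealed interface Json permits JsonNull, JsonBool, JsonNumber, JsonString {}
record JsonNull() implements Json {}
record JsonBool(boolean value) implements Json {}
record JsonNumber(double value) implements Json {}
record JsonString(String value) implements Json {}

static String kind(Json j) {
    return switch (j) {
        case JsonNull()        -> "null";
        case JsonBool(var b)   -> "boolean " + b;
        case JsonNumber(var n) -> "number " + n;
        case JsonString(var s) -> "string " + s;
    };
}
```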
But performance is not the only motivation here. Suppose we have a
pattern which matches if the current second on the clock is odd and, as
its extraction, reads a byte from an InputStream. Now, we have no
clue what
case Foo(var theByte): ...
means; the question has a random answer, and the extraction has not only
a random value, but may affect the result of other computations
(including later pattern matches). Does this really make sense as a
pattern? That seems pretty far outside of any reasonable description of
a pattern to me. A pattern is not just a combination of a predicate
with an arbitrary set of code thunks to produce values; it is asking a
coherent question.
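Written out as an ordinary method, the incoherence is plain
(oddSecondByte is invented for illustration; it is roughly what such a
"pattern" would be hiding):

```
import java.io.IOException;
import java.io.InputStream;
import java.time.LocalTime;
import java.util.Optional;

// The "question" depends on the wall clock and the "extraction" consumes
// input: calling this twice may answer differently, and each call changes
// what later reads of the stream will see.
static Optional<Byte> oddSecondByte(InputStream in) throws IOException {
    if (LocalTime.now().getSecond() % 2 != 0) {
        return Optional.of((byte) in.read());
    }
    return Optional.empty();
}
```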
But you hit upon another reason not to like this: composition. If
patterns are pure, then they are freely composable, and the order in
which we evaluate, say, the subpatterns of a given record pattern
doesn't matter. Not only does this give us the flexibility to enable
optimization, but it seems awful that
case Foo(Bar(var x), Baz(var y)):
would behave differently if `Baz(var y)` were evaluated first. If
there's a data flow dependency, it should be explicit; arbitrary
side-effects undermine composition.
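A minimal sketch of how the problem shows up even with record patterns
(Counted and Pair are invented names; the side effect lives in an
explicitly declared, deliberately impure accessor):

```
import java.util.ArrayList;
import java.util.List;

record Counted(int value) {
    static final List<Integer> reads = new ArrayList<>();
    // Deliberately impure accessor: every destructuring is now observable.
    public int value() { reads.add(value); return value; }
}
record Pair(Counted left, Counted right) {}

static int sum(Object o) {
    return switch (o) {
        // With pure accessors it would not matter whether the left or right
        // subpattern is evaluated first; with impure ones, 'reads' exposes
        // (and depends on) whatever order the compiler happens to choose.
        case Pair(Counted(var x), Counted(var y)) -> x + y;
        default -> 0;
    };
}
```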
I guess it comes down to a philosophical question: is a pattern just an
arbitrary bag of imperative code, with switch as a weird syntax for
invoking it, or is there a higher-level concept here? I think we give
up a lot and gain very little by sticking to the "it's just a weird
syntax for calling a method" interpretation.
> I'd suggest providing an annotation for impure patterns
Every bit of flexibility has costs, both in terms of development
bandwidth and the complexity of the language. In a world where we had
an infinite development budget and users had infinite complexity
tolerance, I guess I could imagine this, but even then I'm skeptical,
because simply putting the indicator at the declaration site of the
pattern doesn't help people read a switch and discover if it has hidden
landmines in it. And having something at the use site would surely be
clutter. The benefit seems really tiny, and the cost seems huge, so I
have a hard time imagining how that could balance.
If you want to provide mutative methods on your APIs, provide them as
methods, and call them on the RHS of the case.
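For example (hypothetical types, echoing the AST example in the quoted
message below), the patterns stay pure and the stateful work lands in
the case bodies:

```
interface Ast {}
sealed interface Ref permits DirectRef, KeyRef {}
record DirectRef(Ast ast) implements Ref {}
record KeyRef(String key) implements Ref {}

static Ast resolve(Ref ref, java.util.Map<String, Ast> cache) {
    return switch (ref) {
        case DirectRef(var ast) -> ast;
        // The stateful step (populating the cache) is an ordinary method
        // call on the RHS, not something smuggled into the pattern.
        case KeyRef(var key)    -> cache.computeIfAbsent(key, k -> parse(k));
    };
}

static Ast parse(String key) { throw new UnsupportedOperationException("hypothetical"); }
```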
(And by the way, there's no such thing as annotations that affect
language semantics. You're asking for a language feature.)
> While it's pretty easy to say that record deconstruction should never have
> side effects (or generally be stateful beyond the record), would you also
> extend that to all custom patterns? It seems to me that stateful, effectful
> patterns could be useful if explicit enough.
>
> Given some IntelliJ-like AST system, we could have code like
>
> ```
> AstElementReference elem = ...;
> switch(elem){
>     case DirectRef(var ast) -> ...
>     case AstCache.cacheOf(var ast) -> ...
>     case Stubs.stubOf(var ast) -> ...
>     case FileIndex.refToUnparsed(var ast) -> ...
>     default -> throw ...
> }
> ```
> where accessing (or creating) an underlying AST element might be stateful,
> and where proper polymorphism may not be appropriate (e.g. I don't control
> these types, or what I'm doing with them is not meaningful for all
> subtypes, or...).
>
> An equivalent if/else chain might look like
>
> ```
> if(elem instanceof DirectReference(var ast)){
>     ...
> }else if(elem.isCache()){
>     var ast = AstCache.get(elem.key());
>     ...
> }else if(elem.isStub()){
>     var ast = Stubs.createMirror(elem.key());
>     ...
> }else if(elem.isFileRef()){
>     var ast = FileIndex.parse(elem.key());
>     ...
> }
> ```
> or something. An enum might be more appropriate for representing types, but
> that's irrelevant to what the patterns are doing; they're moving out the
> "obvious" step of data extraction into the conditional, making it clearer
> what the "actual" logic is, similar to type patterns but more domain
> specific.
>
> In this example, it's clear that side effects are only appropriate on a
> successful match. Stateful failures *may* be required if a stateful pattern
> is nested within another pattern, or guarded by a when clause, though;
>
> ```
> switch(elem){
>     case Stubs.stubOf(AstClass.classAst(var clss))
>         -> ...
>     case Stubs.stubOf(var ast)
>         when ast.isPhysical()
>         -> ...
>     default
>         -> throw new IllegalArgumentException();
> }
> ```
>
> Factoring out a common head would be the "correct"/more efficient behaviour
> in this case, but as pointed out already, it's not possible to do that for
> all duplicate occurrences of a pattern.
>
> The behaviour I would expect here is essentially "mimicking an equivalent
> if/else chain", ensuring that I can always refactor between a switch and
> ifs without new behaviour, always evaluating top-to-bottom left-to-right.
> But that's also bad for the majority of patterns that are expected to be
> pure.
>
> I'd suggest providing an annotation for impure patterns, then, which
> prevents the compiler from optimizing the switch in "unexpected" ways, and
> allows warning when an impure pattern is repeated (in the compiler or by an
> IDE), alongside making it clearly documented and explicit.
>
> If the annotation is not present in a switch, the compiler gets to reorder
> and factor out any part it wants.
>
> For the purposes of JDK 19? record patterns, where the dtor cannot be
> explicitly written out, the annotation would have to be applied to the
> whole type, or a particular accessor.