Some thoughts on Member Patterns as parser developer

Mon Apr 29 12:09:25 UTC 2024

On 4/26/2024 3:00 PM, Olexandr Rotan wrote:
> I read through some messages about member patterns design and 
> accumulated some thoughts to share.
>
> It happened so that all of my recent projects were linked to parsing 
> some data:equations, java statements etc., so I have been on close 
> terms with switch statements during tokenization and syntax tree 
> construction.
>
> It is common practice in parser development to build either type 
> hierarchies. For older solutions, it is common to just informally 
> "imply" that one type of tokens is a subtype of another and use some 
> field like "kind" as a discriminator. The thing in common in both 
> approaches, is that one "state" of token or ex[ression could be not 
> mutually exclusive to another. Consider following:
>
> Token
> | - KeywordToken
> | | - ExtendsToken
> | | - SuperToken
> | | ....
> | - LiteralToken
> | | - NumberToken
> | | - StringToken
> ...
> Also, we have some factory that receives String and returns token:
> class Tokens {
>          Token parse(String s) { .. }
>          static case pattern parseKeyword(String s) { .. } // hope I 
> understood syntax correctly
>          static case pattern parseLiteral(String s) { .. }
>          static case pattern parseNumberLiteral(String s) { .. }
> }

I am not sure what semantics you have in mind, but given the name 
(`parseXxx`), you might be on very dangerous territory here.  If these 
parse methods are intended to actually mutate the internal state of the 
parser, then declaring them as patterns is going to be pretty wrong.  
Patterns are not merely "conditional methods with multiple return".  So 
its quite possible this whole question rests on a faulty base.

Your patterns are missing the declaration of what the match candidate 
is, so I will assume that these are patterns whose match candidate is 
Token.  You also don't show whether Token actually has a hierarchy like 
you describe, or whether that is only on the whiteboard.

> Now, somewhere in the processing pipeline, I decided to use member 
> pattern matching:
> switch (str) {
>           case Tokens.parseLiteral(String literalStr) -> ...
>           case Tokens.parseKeyword(String kwdStr) -> ...
>           case null -> ...
> }
>
> At this point, all possible states are already exhausted. However, If 
> i got everything correctly, this won't compile, as the compiler still 
> thinks parseNumberLiteral is not exhausted, while effectively it is.

Why is it effectively exhausted here?  What hidden source of 
exhaustiveness information is there in your hiearchy?  And, could it be 
made explicit?

There are many questions here, but my strong impression here is that you 
are using the wrong tool in a few places, and so not surprisingly are at 
a dead end.

One way to make this all explicit would be:

     sealed interface Token { ... }
     sealed interface KeywordToken extends Token { ... }
     sealed interface LiteralToken extends Token { ... }
     // records for each token type

Now, you would get record patterns for free (case LiteralToken(...)) and 
exhaustiveness for free also.  And if you were switching only on a 
LiteralToken, you would only have to cover the subtypes of LIteralToken.

> I think such situations could become pretty common. One solution I can 
> think of is to add an option to specify a supercase like  static case 
> pattern parseNumberLiteral(String s) extends parseLiteral { .. }.

I think we're pretty far from identifying the problem, so probably best 
to hold off suggesting solutions at this point.Let's back up and expose 
the assumptions here first.