Some thoughts on Member Patterns as parser developer

Mon Apr 29 14:02:57 UTC 2024

I think I did a really poor job expressing my thoughts in the first
message. I will try to be more clear now, along with some situations I have
encountered myself.

Assume we have a stateless math expressions parser. For simplicity, let's
assume that we split expressions into several types: "computation-heavy",
"computation-lightweight", "erroneous" (permits null as input) and
"computation-remote" (delegate parsing to another service via some
communication protocol), and types can be assigned heuristically
(speculatively). For some tasks, we need to perform some pre- and
postprocessing around core parsing logic, like result caching, wrapping
parsing into virtual thread and registering it in phaser etc., for others -
log warning or error, or fail parsing with exception.

These types could be considered "abstract types", because they are "bins"
that real token types fall into, exclusively into one. Lets omit class
hierarchies this time, we will be good just under assumption that null
value is considered erroneous token, and polynomial considered
computation-heavy.

Now, in parser class, we define following patterns:

static case pattern  matchesHeavy(String s) { .. }
static case pattern  matchesLight(String s) { .. }
static case pattern  matchesErroneous(String s) { .. } // covers null and
unresolved
static case pattern  matchesRemote(String s) { .. }
static case pattern  matchesPolynomial(String s) { .. } // considered heavy
// other patterns that are either subcases of mentioned or aren't for
String type...

Note: last time I have been kind of misled by an example I have found in
jdk mails with Integer.parseInt pattern, when effectively what I meant was
matching pattern, not parsing.

Now, we decide to use switch pattern matching when building a syntax tree
like so:

switch (expression /*expression is of type String*/) {
        case matchesHeavy(String heavy) -> ...
        case matchesLight(String heavy) -> ...
        case matchesRemote(String heavy) -> ...
        case matchesErroneous(String heavy) -> ... // covers null and any
unresolved (default brach)
}

Now, what we see is that effectively, every possible pattern for String
type (I make an emphasis here and will return to this later) is covered,
including null and default. But, as I understand, the following won't
compile.

As I see, there are two layers for this problem, one of which was not even
mentioned by me in the previous letter,

Layer 1 (not mentioned in first letter): patterns are exhausted ONLY for
String type (and types assignable to it if there were such). If we are
trying to match Object-typed value, patterns are not exhausted. You were
really right to note that there weren't any real sources of exhaustiveness
in my initial letter. This leads me to a thought that to declare a finite
states list for some type, there will have to be introduced some kind of
companion object that lists all possible states for some type, or at least
I haven't come up with something better. This kind of state object would be
a really complex concept to introduce (complex for users in the first
place), however, this can also be a very powerful tool in the right hands.

Layer 2: even if there are some "patterns object" or any workaround, it
will contain matchesPolynomial pattern. This pattern is a more specific
version of matchesHeavy pattern, and therefore, covered by it. However, if
we leave it as it is now, the compiler will think this branch is not
implemented, and, therefore, will demand the default branch in switch, even
if it is effectively unreachable. This is what I have tried to propose fix
for with this matchesPolynomial extends matchesHeavy syntax. Also,
matchesErroneous covers null (that is what i referred to in "covers null"
syntax section), and default is covered too (but I don't think
matchesErroneous covers default should be a thing). Also, there could be
patterns that cover any non-null value (like potential Objects.nonNull),
and the example I have provided in the first letter demonstrates that
Objects.nonNull and null patterns are also exhausting for any non-primitive
object.

I hope now my thoughts are more clear. Summarizing, what I want to tell by
this thoughts is that there are potential to make member patterns much more
powerful if there would be a way to a) assert that there are a finite
amount of patterns for type and each value falls at least in one and b)
declare that one pattern is subcase of another pattern. This is for sure
would be complex for understanding of users, but could potentially offer a
very powerful instrument in return. I'm sure that you could think of
countless ways to apply this in various projects.

Regards

On Mon, Apr 29, 2024 at 3:09 PM Brian Goetz <brian.goetz at oracle.com> wrote:

>
>
> On 4/26/2024 3:00 PM, Olexandr Rotan wrote:
> > I read through some messages about member patterns design and
> > accumulated some thoughts to share.
> >
> > It happened so that all of my recent projects were linked to parsing
> > some data:equations, java statements etc., so I have been on close
> > terms with switch statements during tokenization and syntax tree
> > construction.
> >
> > It is common practice in parser development to build either type
> > hierarchies. For older solutions, it is common to just informally
> > "imply" that one type of tokens is a subtype of another and use some
> > field like "kind" as a discriminator. The thing in common in both
> > approaches, is that one "state" of token or ex[ression could be not
> > mutually exclusive to another. Consider following:
> >
> > Token
> > | - KeywordToken
> > | | - ExtendsToken
> > | | - SuperToken
> > | | ....
> > | - LiteralToken
> > | | - NumberToken
> > | | - StringToken
> > ...
> > Also, we have some factory that receives String and returns token:
> > class Tokens {
> >          Token parse(String s) { .. }
> >          static case pattern parseKeyword(String s) { .. } // hope I
> > understood syntax correctly
> >          static case pattern parseLiteral(String s) { .. }
> >          static case pattern parseNumberLiteral(String s) { .. }
> > }
>
> I am not sure what semantics you have in mind, but given the name
> (`parseXxx`), you might be on very dangerous territory here.  If these
> parse methods are intended to actually mutate the internal state of the
> parser, then declaring them as patterns is going to be pretty wrong.
> Patterns are not merely "conditional methods with multiple return".  So
> its quite possible this whole question rests on a faulty base.
>
> Your patterns are missing the declaration of what the match candidate
> is, so I will assume that these are patterns whose match candidate is
> Token.  You also don't show whether Token actually has a hierarchy like
> you describe, or whether that is only on the whiteboard.
>
> > Now, somewhere in the processing pipeline, I decided to use member
> > pattern matching:
> > switch (str) {
> >           case Tokens.parseLiteral(String literalStr) -> ...
> >           case Tokens.parseKeyword(String kwdStr) -> ...
> >           case null -> ...
> > }
> >
> > At this point, all possible states are already exhausted. However, If
> > i got everything correctly, this won't compile, as the compiler still
> > thinks parseNumberLiteral is not exhausted, while effectively it is.
>
> Why is it effectively exhausted here?  What hidden source of
> exhaustiveness information is there in your hiearchy?  And, could it be
> made explicit?
>
> There are many questions here, but my strong impression here is that you
> are using the wrong tool in a few places, and so not surprisingly are at
> a dead end.
>
> One way to make this all explicit would be:
>
>      sealed interface Token { ... }
>      sealed interface KeywordToken extends Token { ... }
>      sealed interface LiteralToken extends Token { ... }
>      // records for each token type
>
> Now, you would get record patterns for free (case LiteralToken(...)) and
> exhaustiveness for free also.  And if you were switching only on a
> LiteralToken, you would only have to cover the subtypes of LIteralToken.
>
> > I think such situations could become pretty common. One solution I can
> > think of is to add an option to specify a supercase like  static case
> > pattern parseNumberLiteral(String s) extends parseLiteral { .. }.
>
> I think we're pretty far from identifying the problem, so probably best
> to hold off suggesting solutions at this point.Let's back up and expose
> the assumptions here first.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-dev/attachments/20240429/5053b732/attachment.htm>