Some thoughts on Member Patterns as parser developer

Mon Apr 29 14:51:17 UTC 2024

I also would like to add some clarification on what problem I am trying to
point out. Many could think I am just fighting null and default branches,
but this is just a part of the whole picture.

!, Null and default branches: yes, the most obvious. I think I had talked
about this part enough to not repeat once again. Here I just want to add
that such redundant branches could be much more damaging for readability
than it seems to be at first glance. Besides that this dead code is just
annoying (if I'm not mistaken, I'm not the first one here to say that),
person could spend hours debugging trying to figure out why null value
doesn't reach null branch, not knowing that some pattern actually accepts
null and makes null branch unreachable. Not every code reader is familiar
with code internals to know that there is no way for the default branch to
be reached because all possible patterns are listed, or that one pattern is
a subcase of another.

2. Else-if branches: without a way to tell the compiler there is a finite
number of patterns for value, it may think that not all possible if
branches lead to a either return or throw statements. I think it could be
common to want "processed by pattern" value in each if branch, however body
of if statement with pattern as condition will be recognized as only one of
two possible branches unless there is a way to assert that after trying to
match value with all possible patterns, at least one of each statements
bodies will be executed.

3. JIT optimizations: I am not really familiar with bytecode-level
implementation of switch with patterns, but creating a way to declare
hierarchy of patterns or waying that one pattern accepts null would for
sure help JIT optimize unreachable branches

On Mon, Apr 29, 2024 at 5:02 PM Olexandr Rotan <rotanolexandr842 at gmail.com>
wrote:

> I think I did a really poor job expressing my thoughts in the first
> message. I will try to be more clear now, along with some situations I have
> encountered myself.
>
> Assume we have a stateless math expressions parser. For simplicity, let's
> assume that we split expressions into several types: "computation-heavy",
> "computation-lightweight", "erroneous" (permits null as input) and
> "computation-remote" (delegate parsing to another service via some
> communication protocol), and types can be assigned heuristically
> (speculatively). For some tasks, we need to perform some pre- and
> postprocessing around core parsing logic, like result caching, wrapping
> parsing into virtual thread and registering it in phaser etc., for others -
> log warning or error, or fail parsing with exception.
>
> These types could be considered "abstract types", because they are "bins"
> that real token types fall into, exclusively into one. Lets omit class
> hierarchies this time, we will be good just under assumption that null
> value is considered erroneous token, and polynomial considered
> computation-heavy.
>
> Now, in parser class, we define following patterns:
>
> static case pattern  matchesHeavy(String s) { .. }
> static case pattern  matchesLight(String s) { .. }
> static case pattern  matchesErroneous(String s) { .. } // covers null and
> unresolved
> static case pattern  matchesRemote(String s) { .. }
> static case pattern  matchesPolynomial(String s) { .. } // considered heavy
> // other patterns that are either subcases of mentioned or aren't for
> String type...
>
> Note: last time I have been kind of misled by an example I have found in
> jdk mails with Integer.parseInt pattern, when effectively what I meant was
> matching pattern, not parsing.
>
> Now, we decide to use switch pattern matching when building a syntax tree
> like so:
>
> switch (expression /*expression is of type String*/) {
>         case matchesHeavy(String heavy) -> ...
>         case matchesLight(String heavy) -> ...
>         case matchesRemote(String heavy) -> ...
>         case matchesErroneous(String heavy) -> ... // covers null and any
> unresolved (default brach)
> }
>
> Now, what we see is that effectively, every possible pattern for String
> type (I make an emphasis here and will return to this later) is covered,
> including null and default. But, as I understand, the following won't
> compile.
>
> As I see, there are two layers for this problem, one of which was not even
> mentioned by me in the previous letter,
>
> Layer 1 (not mentioned in first letter): patterns are exhausted ONLY for
> String type (and types assignable to it if there were such). If we are
> trying to match Object-typed value, patterns are not exhausted. You were
> really right to note that there weren't any real sources of exhaustiveness
> in my initial letter. This leads me to a thought that to declare a finite
> states list for some type, there will have to be introduced some kind of
> companion object that lists all possible states for some type, or at least
> I haven't come up with something better. This kind of state object would be
> a really complex concept to introduce (complex for users in the first
> place), however, this can also be a very powerful tool in the right hands.
>
> Layer 2: even if there are some "patterns object" or any workaround, it
> will contain matchesPolynomial pattern. This pattern is a more specific
> version of matchesHeavy pattern, and therefore, covered by it. However, if
> we leave it as it is now, the compiler will think this branch is not
> implemented, and, therefore, will demand the default branch in switch, even
> if it is effectively unreachable. This is what I have tried to propose fix
> for with this matchesPolynomial extends matchesHeavy syntax. Also,
> matchesErroneous covers null (that is what i referred to in "covers null"
> syntax section), and default is covered too (but I don't think
> matchesErroneous covers default should be a thing). Also, there could be
> patterns that cover any non-null value (like potential Objects.nonNull),
> and the example I have provided in the first letter demonstrates that
> Objects.nonNull and null patterns are also exhausting for any non-primitive
> object.
>
> I hope now my thoughts are more clear. Summarizing, what I want to tell by
> this thoughts is that there are potential to make member patterns much more
> powerful if there would be a way to a) assert that there are a finite
> amount of patterns for type and each value falls at least in one and b)
> declare that one pattern is subcase of another pattern. This is for sure
> would be complex for understanding of users, but could potentially offer a
> very powerful instrument in return. I'm sure that you could think of
> countless ways to apply this in various projects.
>
> Regards
>
>
> On Mon, Apr 29, 2024 at 3:09 PM Brian Goetz <brian.goetz at oracle.com>
> wrote:
>
>>
>>
>> On 4/26/2024 3:00 PM, Olexandr Rotan wrote:
>> > I read through some messages about member patterns design and
>> > accumulated some thoughts to share.
>> >
>> > It happened so that all of my recent projects were linked to parsing
>> > some data:equations, java statements etc., so I have been on close
>> > terms with switch statements during tokenization and syntax tree
>> > construction.
>> >
>> > It is common practice in parser development to build either type
>> > hierarchies. For older solutions, it is common to just informally
>> > "imply" that one type of tokens is a subtype of another and use some
>> > field like "kind" as a discriminator. The thing in common in both
>> > approaches, is that one "state" of token or ex[ression could be not
>> > mutually exclusive to another. Consider following:
>> >
>> > Token
>> > | - KeywordToken
>> > | | - ExtendsToken
>> > | | - SuperToken
>> > | | ....
>> > | - LiteralToken
>> > | | - NumberToken
>> > | | - StringToken
>> > ...
>> > Also, we have some factory that receives String and returns token:
>> > class Tokens {
>> >          Token parse(String s) { .. }
>> >          static case pattern parseKeyword(String s) { .. } // hope I
>> > understood syntax correctly
>> >          static case pattern parseLiteral(String s) { .. }
>> >          static case pattern parseNumberLiteral(String s) { .. }
>> > }
>>
>> I am not sure what semantics you have in mind, but given the name
>> (`parseXxx`), you might be on very dangerous territory here.  If these
>> parse methods are intended to actually mutate the internal state of the
>> parser, then declaring them as patterns is going to be pretty wrong.
>> Patterns are not merely "conditional methods with multiple return".  So
>> its quite possible this whole question rests on a faulty base.
>>
>> Your patterns are missing the declaration of what the match candidate
>> is, so I will assume that these are patterns whose match candidate is
>> Token.  You also don't show whether Token actually has a hierarchy like
>> you describe, or whether that is only on the whiteboard.
>>
>> > Now, somewhere in the processing pipeline, I decided to use member
>> > pattern matching:
>> > switch (str) {
>> >           case Tokens.parseLiteral(String literalStr) -> ...
>> >           case Tokens.parseKeyword(String kwdStr) -> ...
>> >           case null -> ...
>> > }
>> >
>> > At this point, all possible states are already exhausted. However, If
>> > i got everything correctly, this won't compile, as the compiler still
>> > thinks parseNumberLiteral is not exhausted, while effectively it is.
>>
>> Why is it effectively exhausted here?  What hidden source of
>> exhaustiveness information is there in your hiearchy?  And, could it be
>> made explicit?
>>
>> There are many questions here, but my strong impression here is that you
>> are using the wrong tool in a few places, and so not surprisingly are at
>> a dead end.
>>
>> One way to make this all explicit would be:
>>
>>      sealed interface Token { ... }
>>      sealed interface KeywordToken extends Token { ... }
>>      sealed interface LiteralToken extends Token { ... }
>>      // records for each token type
>>
>> Now, you would get record patterns for free (case LiteralToken(...)) and
>> exhaustiveness for free also.  And if you were switching only on a
>> LiteralToken, you would only have to cover the subtypes of LIteralToken.
>>
>> > I think such situations could become pretty common. One solution I can
>> > think of is to add an option to specify a supercase like  static case
>> > pattern parseNumberLiteral(String s) extends parseLiteral { .. }.
>>
>> I think we're pretty far from identifying the problem, so probably best
>> to hold off suggesting solutions at this point.Let's back up and expose
>> the assumptions here first.
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-dev/attachments/20240429/d8478e9c/attachment.htm>