[External] : Re: RFR: 8291769: Translation of switch with record patterns could be improved

forax at univ-mlv.fr forax at univ-mlv.fr
Fri Aug 5 17:24:01 UTC 2022



----- Original Message -----
> From: "jan lahoda" <jan.lahoda at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Jan Lahoda" <jlahoda at openjdk.org>, "Brian Goetz" <brian.goetz at oracle.com>, "compiler-dev"
> <compiler-dev at openjdk.org>
> Sent: Thursday, August 4, 2022 10:10:06 PM
> Subject: Re: [External] : Re: RFR: 8291769: Translation of switch with record patterns could be improved

> On 04. 08. 22 19:23, forax at univ-mlv.fr wrote:
> 
>>
>> ----- Original Message -----
>>> From: "jan lahoda" <jan.lahoda at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>, "Jan Lahoda" <jlahoda at openjdk.org>, "Brian
>>> Goetz" <brian.goetz at oracle.com>
>>> Cc: "compiler-dev" <compiler-dev at openjdk.org>
>>> Sent: Thursday, August 4, 2022 6:35:18 PM
>>> Subject: Re: RFR: 8291769: Translation of switch with record patterns could be
>>> improved
>>> On 04. 08. 22 17:43, Remi Forax wrote:
>>>
>>>> ----- Original Message -----
>>>>> From: "Jan Lahoda" <jlahoda at openjdk.org>
>>>>> To: compiler-dev at openjdk.org
>>>>> Sent: Thursday, August 4, 2022 5:04:52 PM
>>>>> Subject: RFR: 8291769: Translation of switch with record patterns could be
>>>>> improved
>>>>> This is an attempt to improve the performance and scalability of switches with
>>>>> record patterns.
>>>>>
>>>>> There are two main parts of this patch:
>>>>> 1. for cases of consecutive runs of cases with the same record pattern, these
>>>>> are replaced with a single case and a nested switch. E.g.:
>>>>>
>>>>> switch (obj) {
>>>>>       case Box(String s) -> {}
>>>>>       case Box(Integer i) -> {}
>>>>>       case Box(Number n) -> {}
>>>>>       ...
>>>>> }
>>>>> =>
>>>>> switch (obj) {
>>>>>       case Box b ->
>>>>>            switch (b.o()) {
>>>>>                case String s -> {}
>>>>>                case Integer i -> {}
>>>>>                case Number n -> {}
>>>>>                default -> continue-with-outer-switch;
>>>>>            };
>>>>>      ...
>>>>> }
>>>>>
>>>>>
>>>>> This is done by first unrolling the record patterns into simple binding patterns
>>>>> a guards, and then by finding cases with the common binding pattern as a label
>>>>> and a viable guard. Binding patterns are reused as much as possibly,
>>>>> eliminating specialized handling of record patterns as much as possible.
>>>> How it works when you have a record pattern followed by a type pattern with the
>>>> same type ?
>>>> Something like:
>>>>
>>>>     switch (obj) {
>>>>       case Box(String s) -> {}
>>>>       case Box(Integer i) -> {}
>>>>       
>>>>       case Box b -> {}
>>>>     }
>>>>
>>>> because you now have twice the same type pattern on the outer switch but the
>>>> first one should not NPE.
>>>
>>> Currently the `case Box b -> {}` is kept separate. It would be possible
>>> to merge (it would simply be a default in the nested switch), but it is
>>> an additional complexity in a fairly complex code already.
>>>
>>>
>>>>
>>>>> 2. When a record accessor method fails with an exception, we need to wrap this
>>>>> exception with a `MatchException`. Currently this is being done by introducing
>>>>> a special proxy method which catches the exception and re-throws. The proposed
>>>>> patch here eliminates the need for the accessor methods by producing an
>>>>> appropriate `ExceptionTable` in the classfile, and a separate catch handler.
>>>>> This handler is attached to the innermost usable block, which is either the
>>>>> method block, lambda body, (static or non-static) initializer or a try block.
>>>>> This should ensure correct semantics, while not producing too many unnecessary
>>>>> catch handlers.
>>>> Yes !
>>>>
>>>> As said above, the semantics for null is slightly changed, because now you may
>>>> throw a NPE instead of a MatchException.
>>>
>>> The idea here is that the semantics should not be changed, and changes
>>> to semantics are bugs. But thanks to your comment, I realized null
>>> record components are not handled properly, so I've attempted to fix that:
>>>
>>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/9746/commits/7ddeafab1d1ea23285e60765ec0e61d7a509e442__;!!ACWV5N9M2RV99hQ!ILgm-CykKXhL7fJ8n8zqHYDrHshMWU9tnE836sT6jAqJVoQZnExXgSxUax5NhJvl3XzSqgJzPPzm_1SYQYk$
>>>
>>>
>>>> You can add explicit nullchecks to throw a MatchException instead but i think we
>>>> can do better for our users.
>>>>
>>>> I believe that with this transformation, we can now make the difference between
>>>> the 3 kinds of errors,
>>>> - the call to the accessor / destructor fails throwing an error similar to
>>>> ExceptionInTheInitializerError (ExceptionInTheDestructorError ?)
>>>> - the tested value is null throwing a NPE because most of them are implicit when
>>>> b.o() is called. This means that the compiler has to insert a requireNonNull if
>>>> the nullcheck is not implicit.
>>>> - when there are more permitted subtypes at runtime throwing an
>>>> IncompatibleClassChangeError (like with an switch expression on an enum)
>>>
>>> (FWIW, I believe the spec is saying what should happen in specific
>>> cases, so I don't think this is something javac should decide on its
>>> own. javac can decide on how the behavior is implemented, but not the
>>> behavior.)
>> yes, i want to change the spec because as a user it's hard to know how to fix
>> the code when a MatchException is thrown.
> 
> Sure - but I am not quite sure if this PR is a place to discuss that.
> 
> 
>>
>>>
>>>> for the last one, the idea is that if there is a sealed type, instead of
>>>> generating an if instanceof for all cases, the last one should not be an
>>>> instanceof but just a cast with a cacth handler that catch the
>>>> ClassCastException and throw an IncompatibleClassChangeError wrapping the CCE
>>>> instead.
>>>
>>> Note that when switching over a type we don't generate instanceof for
>>> the subtypes of that type. The typeSwitch bootstrap/indy does the
>>> checks, and the bytecode produced by javac only produces checkcasts. (It
>>> may produce instanceofs for the nested patterns, though.) It would be
>>> possible to use a default instead of the last case, and add catch
>>> handler for the checkcast, but generating cases for all the checked
>>> types, and adding a default which throws (which I believe is the current
>>> implementation) feels much easier and cleaner to implement, and I don't
>>> really see a significant disadvantage in that.
>> It's a performance issue because you need to recheck the same type pattern
>> multiple times if the switch alternate a record pattern followed by a type
>> pattern followed by a record pattern ...
> 
> 
> I assume you are referring to the following case, not to a case where we
> would be catching ClassCastExceptions:
> 
> case Box(String s) -> {}
> 
> case Box b -> {}
> 
> 
> It is true there will (likely) be one more instanceof at runtime, and we
> could avoid this at the cost of an additional complexity in javac.

yes, i think that in the end we will need to build a decision tree, in can be done in javac or inside a bootstrap method (like the lambda proxy is generated),
the later having the advantage to not requirering recompilation to enjoy more performance.

> 
> 
> If the record pattern and type pattern would alternate multiple times,
> that would (I think) require the type patterns to have guards. Like:
> 
> case Box(String s) -> {}
> 
> case Box b when guard1(b) -> {}
> 
> case Box(Integer i) -> {}
> 
> case Box b when guard2(b) -> {}
> 
> ...
> 
> 
> Optimizing this is even more challenging, as we would need to preserve
> the order.

You only need to preserve the order of the "when", not the order of the full pattern matching.

and i was thinking to a pattern matching like that:
switch(o) {
  case A(String s) -> ...
  case A a -> ...
  case B(String s) -> ...
  case B b ->
}

which alternate a record pattern and a type pattern.

> 
> 
> In any case, I'm a bit afraid that we could enhance a translation patch
> forever, always adding new cases that can be optimized. So, I think we
> need to draw a line somewhere where the patch provides a benefit while
> still being testable and reviewable. With a possible separate
> improvements, if they prove to be useful. It is possible I haven't drawn
> the line far enough, but looking at the patch, it seems complex enough.

I think the issue is that a list of pattern is not the right intermediary structure to try to optimize the pattern matching, a decision tree is a better intermediary representation (this is how pattern matching is usually optimized, i've not invented something new :) )

Here is an example of such decision tree:
https://github.com/forax/switch-pattern-combinator

regards,
Rémi



More information about the compiler-dev mailing list