[External] : Re: RFR: 8291769: Translation of switch with record patterns could be improved

Fri Aug 5 20:28:21 UTC 2022

----- Original Message -----
> From: "jan lahoda" <jan.lahoda at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Jan Lahoda" <jlahoda at openjdk.org>, "Brian Goetz" <brian.goetz at oracle.com>, "compiler-dev"
> <compiler-dev at openjdk.org>
> Sent: Friday, August 5, 2022 9:35:14 PM
> Subject: Re: [External] : Re: RFR: 8291769: Translation of switch with record patterns could be improved

> On 05. 08. 22 19:24, forax at univ-mlv.fr wrote:
> 
>>
>> ----- Original Message -----
>>> From: "jan lahoda" <jan.lahoda at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>
>>> Cc: "Jan Lahoda" <jlahoda at openjdk.org>, "Brian Goetz" <brian.goetz at oracle.com>,
>>> "compiler-dev"
>>> <compiler-dev at openjdk.org>
>>> Sent: Thursday, August 4, 2022 10:10:06 PM
>>> Subject: Re: [External] : Re: RFR: 8291769: Translation of switch with record
>>> patterns could be improved
>>> On 04. 08. 22 19:23, forax at univ-mlv.fr wrote:
>>>
>>>> ----- Original Message -----
>>>>> From: "jan lahoda" <jan.lahoda at oracle.com>
>>>>> To: "Remi Forax" <forax at univ-mlv.fr>, "Jan Lahoda" <jlahoda at openjdk.org>, "Brian
>>>>> Goetz" <brian.goetz at oracle.com>
>>>>> Cc: "compiler-dev" <compiler-dev at openjdk.org>
>>>>> Sent: Thursday, August 4, 2022 6:35:18 PM
>>>>> Subject: Re: RFR: 8291769: Translation of switch with record patterns could be
>>>>> improved
>>>>> On 04. 08. 22 17:43, Remi Forax wrote:
>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Jan Lahoda" <jlahoda at openjdk.org>
>>>>>>> To: compiler-dev at openjdk.org
>>>>>>> Sent: Thursday, August 4, 2022 5:04:52 PM
>>>>>>> Subject: RFR: 8291769: Translation of switch with record patterns could be
>>>>>>> improved
>>>>>>> This is an attempt to improve the performance and scalability of switches with
>>>>>>> record patterns.
>>>>>>>
>>>>>>> There are two main parts of this patch:
>>>>>>> 1. for cases of consecutive runs of cases with the same record pattern, these
>>>>>>> are replaced with a single case and a nested switch. E.g.:
>>>>>>>
>>>>>>> switch (obj) {
>>>>>>>        case Box(String s) -> {}
>>>>>>>        case Box(Integer i) -> {}
>>>>>>>        case Box(Number n) -> {}
>>>>>>>        ...
>>>>>>> }
>>>>>>> =>
>>>>>>> switch (obj) {
>>>>>>>        case Box b ->
>>>>>>>             switch (b.o()) {
>>>>>>>                 case String s -> {}
>>>>>>>                 case Integer i -> {}
>>>>>>>                 case Number n -> {}
>>>>>>>                 default -> continue-with-outer-switch;
>>>>>>>             };
>>>>>>>       ...
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> This is done by first unrolling the record patterns into simple binding patterns
>>>>>>> a guards, and then by finding cases with the common binding pattern as a label
>>>>>>> and a viable guard. Binding patterns are reused as much as possibly,
>>>>>>> eliminating specialized handling of record patterns as much as possible.
>>>>>> How it works when you have a record pattern followed by a type pattern with the
>>>>>> same type ?
>>>>>> Something like:
>>>>>>
>>>>>>      switch (obj) {
>>>>>>        case Box(String s) -> {}
>>>>>>        case Box(Integer i) -> {}
>>>>>>        
>>>>>>        case Box b -> {}
>>>>>>      }
>>>>>>
>>>>>> because you now have twice the same type pattern on the outer switch but the
>>>>>> first one should not NPE.
>>>>> Currently the `case Box b -> {}` is kept separate. It would be possible
>>>>> to merge (it would simply be a default in the nested switch), but it is
>>>>> an additional complexity in a fairly complex code already.
>>>>>
>>>>>
>>>>>>> 2. When a record accessor method fails with an exception, we need to wrap this
>>>>>>> exception with a `MatchException`. Currently this is being done by introducing
>>>>>>> a special proxy method which catches the exception and re-throws. The proposed
>>>>>>> patch here eliminates the need for the accessor methods by producing an
>>>>>>> appropriate `ExceptionTable` in the classfile, and a separate catch handler.
> 
>>>>>>> This handler is attached to the innermost usable block, which is either the
>>>>>>> method block, lambda body, (static or non-static) initializer or a try block.
>>>>>>> This should ensure correct semantics, while not producing too many unnecessary
>>>>>>> catch handlers.
>>>>>> Yes !
>>>>>>
>>>>>> As said above, the semantics for null is slightly changed, because now you may
>>>>>> throw a NPE instead of a MatchException.
>>>>> The idea here is that the semantics should not be changed, and changes
>>>>> to semantics are bugs. But thanks to your comment, I realized null
>>>>> record components are not handled properly, so I've attempted to fix that:
>>>>>
>>>>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/9746/commits/7ddeafab1d1ea23285e60765ec0e61d7a509e442__;!!ACWV5N9M2RV99hQ!ILgm-CykKXhL7fJ8n8zqHYDrHshMWU9tnE836sT6jAqJVoQZnExXgSxUax5NhJvl3XzSqgJzPPzm_1SYQYk$
>>>>>
>>>>>
>>>>>> You can add explicit nullchecks to throw a MatchException instead but i think we
>>>>>> can do better for our users.
>>>>>>
>>>>>> I believe that with this transformation, we can now make the difference between
>>>>>> the 3 kinds of errors,
>>>>>> - the call to the accessor / destructor fails throwing an error similar to
>>>>>> ExceptionInTheInitializerError (ExceptionInTheDestructorError ?)
>>>>>> - the tested value is null throwing a NPE because most of them are implicit when
>>>>>> b.o() is called. This means that the compiler has to insert a requireNonNull if
>>>>>> the nullcheck is not implicit.
>>>>>> - when there are more permitted subtypes at runtime throwing an
>>>>>> IncompatibleClassChangeError (like with an switch expression on an enum)
>>>>> (FWIW, I believe the spec is saying what should happen in specific
>>>>> cases, so I don't think this is something javac should decide on its
>>>>> own. javac can decide on how the behavior is implemented, but not the
>>>>> behavior.)
>>>> yes, i want to change the spec because as a user it's hard to know how to fix
>>>> the code when a MatchException is thrown.
>>> Sure - but I am not quite sure if this PR is a place to discuss that.
>>>
>>>
>>>>>> for the last one, the idea is that if there is a sealed type, instead of
>>>>>> generating an if instanceof for all cases, the last one should not be an
>>>>>> instanceof but just a cast with a cacth handler that catch the
>>>>>> ClassCastException and throw an IncompatibleClassChangeError wrapping the CCE
>>>>>> instead.
>>>>> Note that when switching over a type we don't generate instanceof for
>>>>> the subtypes of that type. The typeSwitch bootstrap/indy does the
>>>>> checks, and the bytecode produced by javac only produces checkcasts. (It
>>>>> may produce instanceofs for the nested patterns, though.) It would be
>>>>> possible to use a default instead of the last case, and add catch
>>>>> handler for the checkcast, but generating cases for all the checked
>>>>> types, and adding a default which throws (which I believe is the current
>>>>> implementation) feels much easier and cleaner to implement, and I don't
>>>>> really see a significant disadvantage in that.
>>>> It's a performance issue because you need to recheck the same type pattern
>>>> multiple times if the switch alternate a record pattern followed by a type
>>>> pattern followed by a record pattern ...
>>>
>>> I assume you are referring to the following case, not to a case where we
>>> would be catching ClassCastExceptions:
>>>
>>> case Box(String s) -> {}
>>>
>>> case Box b -> {}
>>>
>>>
>>> It is true there will (likely) be one more instanceof at runtime, and we
>>> could avoid this at the cost of an additional complexity in javac.
>> yes, i think that in the end we will need to build a decision tree, in can be
>> done in javac or inside a bootstrap method (like the lambda proxy is
>> generated),
>> the later having the advantage to not requirering recompilation to enjoy more
>> performance.
>>
>>>
>>> If the record pattern and type pattern would alternate multiple times,
>>> that would (I think) require the type patterns to have guards. Like:
>>>
>>> case Box(String s) -> {}
>>>
>>> case Box b when guard1(b) -> {}
>>>
>>> case Box(Integer i) -> {}
>>>
>>> case Box b when guard2(b) -> {}
>>>
>>> ...
>>>
>>>
>>> Optimizing this is even more challenging, as we would need to preserve
>>> the order.
>> You only need to preserve the order of the "when", not the order of the full
>> pattern matching.
> 
> 
> Yes, some re-orderings may be possible, but the guard will very severely
> limit the reorderings possible. In the above case, I doubt we could
> reorder the cases/patterns in any way.

I wonder if this example shows that there is a missing rule when computing the dominance relationships,
if we want to make those patterns easy to read, i believe that having the guard at the end is more readable

case Box(String s) -> {}
case Box(Integer i) -> {}
case Box b when guard1(b) -> {}
case Box b when guard2(b) -> {}

Having a guard should not a be a free from the dominance relationships card.

The dominance algorithm should be more something like we checks first without the guards and then we add the guards.

We already have a dominance between "case Box b when ..." and "case Box b" and between case Box(..) and case Box b so it seems quite logical to have one between case Box(...) and case Box b when ...

Rémi