RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles

Fri Apr 9 17:39:32 UTC 2021

On 09/04/2021 18:54, Remi Forax wrote:
> ----- Mail original -----
>> De: "Jorn Vernee" <jvernee at openjdk.java.net>
>> À: "core-libs-dev" <core-libs-dev at openjdk.java.net>
>> Envoyé: Vendredi 9 Avril 2021 12:51:53
>> Objet: RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles
>> This patch adds a `tableSwitch` combinator that can be used to switch over a set
>> of method handles given an index, with a fallback in case the index is out of
>> bounds, much like the `tableswitch` bytecode.
>>
>> The combinator does not support specifying the starting index, so the switch
>> cases always run from 0 to however many target handles are specified. A
>> starting index can be added manually with another combination step that filters
>> the input index by adding or subtracting a constant from it, which does not
>> affect performance. One of the reasons for not supporting a starting index is
>> that it allows for more lambda form sharing, but also simplifies the
>> implementation somewhat. I guess an open question is if a convenience overload
>> should be added for that case?
> I think the combinator should be lookupswitch which is more general than tableswitch with a special case when generating the bytecode to generate a tableswitch instead of a lookupswitch if the indexes are subsequent.
One of the bigger downsides I see in supporting lookupswitch directly is 
that the lambda form and intrinsified bytecode become dependent on the 
key set, which allows for less sharing. Something that is not/less of a 
problem with tableswitch + filter function, because the filter function 
could potentially be the same for any key set (where the key set is 
bound to the filter function instead).
>
>> Lookup switch can also be simulated by filtering the input through an injection
>> function that translates it into a case index, which has also proven to have
>> the ability to have comparable performance to, or even better performance than,
>> a bytecode-native `lookupswitch` instruction. I plan to add such an injection
>> function to the runtime libraries in the future as well. Maybe at that point it
>> could be evaluated if it's worth it to add a lookup switch combinator as well,
>> but I don't see an immediate need to include it in this patch.
>>
> As i said in the bug when we discuss about that the filtering function,
> i believe that the filtering function for emulating lookupswitch is lookupswitch itself.
Right, but lookupswitch also ties us into C2's optimization strategy for 
lookupswitch. Having the ability to specify the filter function allows 
picking a better one for the particular use-case. For instance for 
switches with a large-ish number of cases (15+) it's faster to use a 
HashMap lookup as a filtering function (according to my benchmarking), 
with comparinble results to native lookupswitch if the filter function 
uses a tree of if/else.

Though, I'm not saying that it's not worth it to add a lookupswitch 
combinator as well, to me it seems like tableswitch is the more 
flexible/minimal primitive, because it doesn't force the use of a 
particular lookup strategy.

WRT picking the translation strategy based on the set of keys; I'm note 
super keen on that. Since the MethodHandle combinators are a low-level 
API, I ended up adopting a simple 'what you see is what you get' 
philosophy as much as possible, with the possibility of building other 
use-cases on top. i.e. a tableSwitch combinator that reliably translates 
into the tableswitch bytecode, a lookupSwitch combinator that reliably 
translates into the lookupswitch bytecode, and an exception if I get the 
key set wrong, rather than silently switching strategies to one or the 
other.
>
>> The current bytecode intrinsification generates a call for each switch case,
>> which guarantees full inlining of the target method handles. Alternatively we
>> could only have 1 callsite at the end of the switch, where each case just loads
>> the target method handle, but currently this does not allow for inlining of the
>> handles, since they are not constant.
> This scheme also allows to never JIT compile a branch which is never used.

Yes, that's a good point, thanks.

Thanks for the input,
Jorn

>
>> Maybe a future C2 optimization could look at the receiver input for invokeBasic
>> call sites, and if the input is a phi node, clone the call for each constant
>> input of the phi. I believe that would allow simplifying the bytecode without
>> giving up on inlining.
>>
>> Some numbers from the added benchmarks:
>> Benchmark                                        (numCases)  (offset)  (sorted)
>> Mode  Cnt   Score   Error  Units
>> MethodHandlesTableSwitchConstant.testSwitch               5         0       N/A
>> avgt   30   4.186 � 0.054  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch               5       150       N/A
>> avgt   30   4.164 � 0.057  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              10         0       N/A
>> avgt   30   4.124 � 0.023  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              10       150       N/A
>> avgt   30   4.126 � 0.025  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              25         0       N/A
>> avgt   30   4.137 � 0.042  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              25       150       N/A
>> avgt   30   4.113 � 0.016  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              50         0       N/A
>> avgt   30   4.118 � 0.028  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch              50       150       N/A
>> avgt   30   4.127 � 0.019  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch             100         0       N/A
>> avgt   30   4.116 � 0.013  ms/op
>> MethodHandlesTableSwitchConstant.testSwitch             100       150       N/A
>> avgt   30   4.121 � 0.020  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch           5         0       N/A
>> avgt   30   4.113 � 0.009  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch           5       150       N/A
>> avgt   30   4.149 � 0.041  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          10         0       N/A
>> avgt   30   4.121 � 0.026  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          10       150       N/A
>> avgt   30   4.113 � 0.021  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          25         0       N/A
>> avgt   30   4.129 � 0.028  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          25       150       N/A
>> avgt   30   4.105 � 0.019  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          50         0       N/A
>> avgt   30   4.097 � 0.021  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch          50       150       N/A
>> avgt   30   4.131 � 0.037  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch         100         0       N/A
>> avgt   30   4.135 � 0.025  ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch         100       150       N/A
>> avgt   30   4.139 � 0.145  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5         0      true
>> avgt   30   4.894 � 0.028  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5         0     false
>> avgt   30  11.526 � 0.194  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5       150      true
>> avgt   30   4.882 � 0.025  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                 5       150     false
>> avgt   30  11.532 � 0.034  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10         0      true
>> avgt   30   5.065 � 0.076  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10         0     false
>> avgt   30  13.016 � 0.020  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10       150      true
>> avgt   30   5.103 � 0.051  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                10       150     false
>> avgt   30  12.984 � 0.102  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25         0      true
>> avgt   30   8.441 � 0.165  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25         0     false
>> avgt   30  13.371 � 0.060  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25       150      true
>> avgt   30   8.628 � 0.032  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                25       150     false
>> avgt   30  13.542 � 0.020  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50         0      true
>> avgt   30   4.701 � 0.015  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50         0     false
>> avgt   30  13.562 � 0.063  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50       150      true
>> avgt   30   7.991 � 3.111  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch                50       150     false
>> avgt   30  13.543 � 0.088  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100         0      true
>> avgt   30   4.712 � 0.020  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100         0     false
>> avgt   30  13.600 � 0.085  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100       150      true
>> avgt   30   4.676 � 0.011  ms/op
>> MethodHandlesTableSwitchRandom.testSwitch               100       150     false
>> avgt   30  13.476 � 0.043  ms/op
>>
>> Testing:
>> - [x] Running of included benchmarks
>> - [x] Inspecting inlining trace and verifying method handle targets are inlined
>> - [x] Running TestTableSwitch test (currently the only user of the new code)
>> - [x] Running java/lang/invoke tests (just in case)
>> - [x] Some manual testing
>>
>> Thanks,
>> Jorn
>>
>> -------------
>>
>> Commit messages:
>> - Improve test
>> - Touchup
>> - Use cases array + holder
>> - WIP - implement tableSwitch combinator in lambda form interpreter
>>
>> Changes: https://git.openjdk.java.net/jdk/pull/3401/files
>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3401&range=00
>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8263087
>>   Stats: 959 lines in 8 files changed: 955 ins; 0 del; 4 mod
>>   Patch: https://git.openjdk.java.net/jdk/pull/3401.diff
>>   Fetch: git fetch https://git.openjdk.java.net/jdk pull/3401/head:pull/3401
>>
>> PR: https://git.openjdk.java.net/jdk/pull/3401