RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles
Jorn Vernee
jorn.vernee at oracle.com
Fri Apr 9 17:39:32 UTC 2021
On 09/04/2021 18:54, Remi Forax wrote:
> ----- Mail original -----
>> De: "Jorn Vernee" <jvernee at openjdk.java.net>
>> À: "core-libs-dev" <core-libs-dev at openjdk.java.net>
>> Envoyé: Vendredi 9 Avril 2021 12:51:53
>> Objet: RFR: 8263087: Add a MethodHandle combinator that switches over a set of MethodHandles
>> This patch adds a `tableSwitch` combinator that can be used to switch over a set
>> of method handles given an index, with a fallback in case the index is out of
>> bounds, much like the `tableswitch` bytecode.
>>
>> The combinator does not support specifying the starting index, so the switch
>> cases always run from 0 to however many target handles are specified. A
>> starting index can be added manually with another combination step that filters
>> the input index by adding or subtracting a constant from it, which does not
>> affect performance. One of the reasons for not supporting a starting index is
>> that it allows for more lambda form sharing, but also simplifies the
>> implementation somewhat. I guess an open question is if a convenience overload
>> should be added for that case?
> I think the combinator should be lookupswitch which is more general than tableswitch with a special case when generating the bytecode to generate a tableswitch instead of a lookupswitch if the indexes are subsequent.
One of the bigger downsides I see in supporting lookupswitch directly is
that the lambda form and intrinsified bytecode become dependent on the
key set, which allows for less sharing. Something that is not/less of a
problem with tableswitch + filter function, because the filter function
could potentially be the same for any key set (where the key set is
bound to the filter function instead).
>
>> Lookup switch can also be simulated by filtering the input through an injection
>> function that translates it into a case index, which has also proven to have
>> the ability to have comparable performance to, or even better performance than,
>> a bytecode-native `lookupswitch` instruction. I plan to add such an injection
>> function to the runtime libraries in the future as well. Maybe at that point it
>> could be evaluated if it's worth it to add a lookup switch combinator as well,
>> but I don't see an immediate need to include it in this patch.
>>
> As i said in the bug when we discuss about that the filtering function,
> i believe that the filtering function for emulating lookupswitch is lookupswitch itself.
Right, but lookupswitch also ties us into C2's optimization strategy for
lookupswitch. Having the ability to specify the filter function allows
picking a better one for the particular use-case. For instance for
switches with a large-ish number of cases (15+) it's faster to use a
HashMap lookup as a filtering function (according to my benchmarking),
with comparinble results to native lookupswitch if the filter function
uses a tree of if/else.
Though, I'm not saying that it's not worth it to add a lookupswitch
combinator as well, to me it seems like tableswitch is the more
flexible/minimal primitive, because it doesn't force the use of a
particular lookup strategy.
WRT picking the translation strategy based on the set of keys; I'm note
super keen on that. Since the MethodHandle combinators are a low-level
API, I ended up adopting a simple 'what you see is what you get'
philosophy as much as possible, with the possibility of building other
use-cases on top. i.e. a tableSwitch combinator that reliably translates
into the tableswitch bytecode, a lookupSwitch combinator that reliably
translates into the lookupswitch bytecode, and an exception if I get the
key set wrong, rather than silently switching strategies to one or the
other.
>
>> The current bytecode intrinsification generates a call for each switch case,
>> which guarantees full inlining of the target method handles. Alternatively we
>> could only have 1 callsite at the end of the switch, where each case just loads
>> the target method handle, but currently this does not allow for inlining of the
>> handles, since they are not constant.
> This scheme also allows to never JIT compile a branch which is never used.
Yes, that's a good point, thanks.
Thanks for the input,
Jorn
>
>> Maybe a future C2 optimization could look at the receiver input for invokeBasic
>> call sites, and if the input is a phi node, clone the call for each constant
>> input of the phi. I believe that would allow simplifying the bytecode without
>> giving up on inlining.
>>
>> Some numbers from the added benchmarks:
>> Benchmark (numCases) (offset) (sorted)
>> Mode Cnt Score Error Units
>> MethodHandlesTableSwitchConstant.testSwitch 5 0 N/A
>> avgt 30 4.186 � 0.054 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 5 150 N/A
>> avgt 30 4.164 � 0.057 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 10 0 N/A
>> avgt 30 4.124 � 0.023 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 10 150 N/A
>> avgt 30 4.126 � 0.025 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 25 0 N/A
>> avgt 30 4.137 � 0.042 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 25 150 N/A
>> avgt 30 4.113 � 0.016 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 50 0 N/A
>> avgt 30 4.118 � 0.028 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 50 150 N/A
>> avgt 30 4.127 � 0.019 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 100 0 N/A
>> avgt 30 4.116 � 0.013 ms/op
>> MethodHandlesTableSwitchConstant.testSwitch 100 150 N/A
>> avgt 30 4.121 � 0.020 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 0 N/A
>> avgt 30 4.113 � 0.009 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 5 150 N/A
>> avgt 30 4.149 � 0.041 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 0 N/A
>> avgt 30 4.121 � 0.026 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 10 150 N/A
>> avgt 30 4.113 � 0.021 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 0 N/A
>> avgt 30 4.129 � 0.028 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 25 150 N/A
>> avgt 30 4.105 � 0.019 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 0 N/A
>> avgt 30 4.097 � 0.021 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 50 150 N/A
>> avgt 30 4.131 � 0.037 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 0 N/A
>> avgt 30 4.135 � 0.025 ms/op
>> MethodHandlesTableSwitchOpaqueSingle.testSwitch 100 150 N/A
>> avgt 30 4.139 � 0.145 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 5 0 true
>> avgt 30 4.894 � 0.028 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 5 0 false
>> avgt 30 11.526 � 0.194 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 5 150 true
>> avgt 30 4.882 � 0.025 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 5 150 false
>> avgt 30 11.532 � 0.034 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 10 0 true
>> avgt 30 5.065 � 0.076 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 10 0 false
>> avgt 30 13.016 � 0.020 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 10 150 true
>> avgt 30 5.103 � 0.051 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 10 150 false
>> avgt 30 12.984 � 0.102 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 25 0 true
>> avgt 30 8.441 � 0.165 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 25 0 false
>> avgt 30 13.371 � 0.060 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 25 150 true
>> avgt 30 8.628 � 0.032 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 25 150 false
>> avgt 30 13.542 � 0.020 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 50 0 true
>> avgt 30 4.701 � 0.015 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 50 0 false
>> avgt 30 13.562 � 0.063 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 50 150 true
>> avgt 30 7.991 � 3.111 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 50 150 false
>> avgt 30 13.543 � 0.088 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 100 0 true
>> avgt 30 4.712 � 0.020 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 100 0 false
>> avgt 30 13.600 � 0.085 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 100 150 true
>> avgt 30 4.676 � 0.011 ms/op
>> MethodHandlesTableSwitchRandom.testSwitch 100 150 false
>> avgt 30 13.476 � 0.043 ms/op
>>
>> Testing:
>> - [x] Running of included benchmarks
>> - [x] Inspecting inlining trace and verifying method handle targets are inlined
>> - [x] Running TestTableSwitch test (currently the only user of the new code)
>> - [x] Running java/lang/invoke tests (just in case)
>> - [x] Some manual testing
>>
>> Thanks,
>> Jorn
>>
>> -------------
>>
>> Commit messages:
>> - Improve test
>> - Touchup
>> - Use cases array + holder
>> - WIP - implement tableSwitch combinator in lambda form interpreter
>>
>> Changes: https://git.openjdk.java.net/jdk/pull/3401/files
>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3401&range=00
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8263087
>> Stats: 959 lines in 8 files changed: 955 ins; 0 del; 4 mod
>> Patch: https://git.openjdk.java.net/jdk/pull/3401.diff
>> Fetch: git fetch https://git.openjdk.java.net/jdk pull/3401/head:pull/3401
>>
>> PR: https://git.openjdk.java.net/jdk/pull/3401
More information about the core-libs-dev
mailing list