RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

Vladimir Ivanov vlivanov at openjdk.java.net
Fri Apr 2 14:07:30 UTC 2021


On Fri, 2 Apr 2021 13:17:55 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type is not Object[], as well as speeding up all collectors where the arity is greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded methods for collecting arguments into an Object[], up to a maximum of ten elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because it should be replaced by existing bytecode intrinsification which does the right thing. But, through investigation it turns out that the intrinsification is only being applied in a very limited amount of cases: Object[] with max 10 elements only, only for the intermediate collector handles. Every other collector shape incurs overhead because it essentially ends up calling the ineffecient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be very tricky to untangle the existing code), the new code replaces the existing implementation with a collector handle implemented using a LambdaForm, which removes the need for intrinsification, and also greatly reduces code-complexity of the implementation. (I plan to expose this collector using a public API in the future as well, so users don't have to go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting arguments into an array. I believe this was done to take advantage of the existing-but-limited bytecode intrinsification, at least for Object[] with less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the newly added collector handle as filter, and this works just as well for the existing case, but as a bonus is also much simpler, since no separate transform is needed. Using the collect arguments transform should also improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), as well as another variant of the benchmark that used a declared static identity method instead of MHs::identity (not included). Before/after comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 �  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 � 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 �  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 �  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 �  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 �  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 � 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 � 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 � 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 � 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 � 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 � 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
> 
>   - Address review comments
>   - Use cached version of store func getter
>   - Use ARRAY_STORE intrinsic for array stores
>   - Generate direct call to Array.newInstance instead of using an array constructor handle
>   - Intrinsify call to Array.newInstance if the component type is constant

src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 874:

> 872:                 case NEW_ARRAY:
> 873:                     Class<?> rtype = name.function.methodType().returnType();
> 874:                     if (isStaticallyNameable(rtype)) {

Why do you remote `isStaticallyNameable(rtype)` check? 

It is there for a reason: only a very limited set of classes can be directly referenced from LF bytecode.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:

> 1895:         // field of the receiver handle.
> 1896:         // This is left as a followup enhancement, as it needs to be investigated if this causes
> 1897:         // profile pollution.

Profile pollution shouldn't be a problem here: both types (element type + array type) will be constants attached to the "receiver" BMH instance and during inlining will be fetched from there.  

I'm fine with handling it as a separate RFE.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3306


More information about the core-libs-dev mailing list