RFR: 8274983: C1 optimizes the invocation of private interface methods [v3]
Xin Liu
xliu at openjdk.java.net
Mon Nov 29 21:29:46 UTC 2021
On Wed, 24 Nov 2021 23:42:27 GMT, Xin Liu <xliu at openjdk.org> wrote:
>> The root cause of the C1 regression is that some regexes generate multiple classes that all implement
>> the same interface. In SlowStartupTest.java, the following **invokeinterface** executes frequently with different receivers; its target is a private interface method.
>>
>>
>> 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z
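A minimal Java sketch of this shape, modeled loosely on `Pattern.BmpCharPredicate.union` (the names `CharPred`, `union`, `Digit`, `Upper` and `UnionDemo` are hypothetical). The lambda in the default method captures `this`, so javac compiles its body into a private instance method of the interface (a synthetic `lambda$union$N`). Each `union()` result then calls that private method with a receiver of a different class. Whether the generated lambda class reaches it via invokeinterface or invokespecial depends on the JDK version, as explained in the next paragraph.

```java
// Hypothetical predicate interface, modeled loosely on java.util.regex.Pattern.CharPredicate.
// The lambda in union() captures `this`, so javac compiles its body into a private
// instance method of the interface (a synthetic lambda$union$N).
interface CharPred {
    boolean is(int ch);

    default CharPred union(CharPred p) {
        return ch -> is(ch) || p.is(ch);
    }
}

public class UnionDemo {
    // Two distinct implementing classes: each union() result invokes the private
    // lambda method with a receiver of a different concrete class.
    static final class Digit implements CharPred {
        public boolean is(int ch) { return ch >= '0' && ch <= '9'; }
    }

    static final class Upper implements CharPred {
        public boolean is(int ch) { return ch >= 'A' && ch <= 'Z'; }
    }

    // Counts the characters of s accepted by pred.
    static int count(CharPred pred, String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (pred.is(s.charAt(i))) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        CharPred a = new Digit().union(new Upper()); // captured receiver: Digit
        CharPred b = new Upper().union(new Digit()); // captured receiver: Upper
        System.out.println(count(a, "ab12CD") + " " + count(b, "ab12CD"));
    }
}
```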
>>
>>
>> This patch allows C1 to generate an optimized virtual call for an invokeinterface
>> whose target is a private interface method.
>>
>> Before JDK-823835, LambdaMetaFactory generated invokespecial in this case. Because private
>> interface methods cannot be overridden, C1 generated an optimized virtual call. After JDK-823835,
>> LambdaMetaFactory generates invokeinterface instead, and C1 generates a regular virtual call because
>> it does not recognize the new pattern. If multiple classes all implement the same interface,
>> they can thrash the IC stub at runtime, each one re-binding it to its own concrete klass.
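The "cannot be overridden" property is what makes static binding safe. A small Java illustration (class and method names are hypothetical): a class that declares its own `bar()` does not override the interface's private `bar()`, so a call to `bar()` from inside the interface always reaches the interface's body, whatever the receiver class is.

```java
// A private interface method is never overridden: a call to it from within the
// interface always reaches the interface's own body, regardless of receiver class.
public class PrivateNotOverridden {
    interface I {
        private String bar() { return "I.bar"; }

        // Calls the private bar() on `this`; no subclass method can be selected.
        default String callBar() { return bar(); }
    }

    // Declares its own public bar(); this does NOT override I's private bar().
    static class A implements I {
        public String bar() { return "A.bar"; }
    }

    static class B implements I { }

    public static void main(String[] args) {
        System.out.println(new A().callBar()); // I.bar
        System.out.println(new B().callBar()); // I.bar
        System.out.println(new A().bar());     // A.bar
    }
}
```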
>>
>> An optimized virtual call uses relocInfo::opt_virtual_call_type (3). It calls into the VM via
>> 'resolve_opt_virtual_call_C' once and resolves the target to the verified entry point (VEP) of the nmethod.
>> Therefore, this patch prevents the call site from thrashing.
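To make the cost difference concrete, here is a toy Java model that only counts how often a call site goes back to the resolver when two receiver classes alternate. It assumes, as described above, that the virtual call site is re-bound to each new receiver class, and it deliberately ignores HotSpot's real inline-cache transitions (for example, to a megamorphic stub). `CallSiteModel`, `RebindingSite` and `OptimizedSite` are hypothetical names.

```java
import java.util.List;

// Toy model of call-site resolution cost; it is NOT HotSpot's inline-cache implementation.
public class CallSiteModel {
    // Virtual call site as described above: bound to one receiver class and
    // re-resolved whenever a receiver of a different class arrives.
    static final class RebindingSite {
        Class<?> cached;
        int resolves;

        void call(Object receiver) {
            if (receiver.getClass() != cached) {
                resolves++;                  // stand-in for a trip into the VM resolver
                cached = receiver.getClass();
            }
        }
    }

    // Optimized virtual call: the target does not depend on the receiver class,
    // so the site is resolved on first use and never again.
    static final class OptimizedSite {
        boolean resolved;
        int resolves;

        void call(Object receiver) {
            if (!resolved) {
                resolves++;
                resolved = true;
            }
        }
    }

    // Returns {rebinding-site resolves, optimized-site resolves}.
    static int[] run(List<Object> receivers, int iterations) {
        RebindingSite v = new RebindingSite();
        OptimizedSite o = new OptimizedSite();
        for (int i = 0; i < iterations; i++) {
            for (Object r : receivers) {
                v.call(r);
                o.call(r);
            }
        }
        return new int[] { v.resolves, o.resolves };
    }

    public static void main(String[] args) {
        // Two receiver classes alternating at the same call site.
        int[] r = run(List.<Object>of("a string", Integer.valueOf(1)), 10_000);
        System.out.println("rebinding: " + r[0] + ", optimized: " + r[1]);
    }
}
```

With alternating receivers the rebinding site resolves on every call, while the optimized site resolves exactly once, which is the pattern the counters below show.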
>>
>> Before this patch, SlowStartupTest incurred 38770 `_resolve_invoke_virtual_cnt` and 38695 `_handle_wrong_method_cnt` events per 10k iterations. Because `-XX:+PrintC1Statistics` is a non-product flag, we use a fastdebug build to dump the C1 statistics for comparison.
>>
>>
>> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1
>> Executed 10000 iterations in 736ms
>> C1 Runtime statistics:
>> _resolve_invoke_virtual_cnt: 38770
>> _resolve_invoke_opt_virtual_cnt: 186
>> _resolve_invoke_static_cnt: 44
>> _handle_wrong_method_cnt: 38695
>> _ic_miss_cnt: 35
>>
>>
>> With this patch, only 1 `_handle_wrong_method_cnt` event is triggered, at the cost of 3 more `_resolve_invoke_opt_virtual_cnt` events. The total runtime drops from 736ms to 9ms.
>>
>>
>> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1
>> Executed 10000 iterations in 9ms
>> C1 Runtime statistics:
>> _resolve_invoke_virtual_cnt: 77
>> _resolve_invoke_opt_virtual_cnt: 189
>> _resolve_invoke_static_cnt: 45
>> _handle_wrong_method_cnt: 1
>> _ic_miss_cnt: 39
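The trade-off between the two dumps can be recomputed mechanically. The Java program below copies the counter values verbatim from the two `-XX:+PrintC1Statistics` outputs above and prints the per-counter change plus the wall-clock ratio; it measures nothing itself, and `C1StatsDiff` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Recomputes the deltas between the two -XX:+PrintC1Statistics dumps quoted above.
// The numbers are copied from those dumps; nothing here runs or measures C1.
public class C1StatsDiff {
    static Map<String, Integer> before() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("_resolve_invoke_virtual_cnt", 38770);
        m.put("_resolve_invoke_opt_virtual_cnt", 186);
        m.put("_resolve_invoke_static_cnt", 44);
        m.put("_handle_wrong_method_cnt", 38695);
        m.put("_ic_miss_cnt", 35);
        return m;
    }

    static Map<String, Integer> after() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("_resolve_invoke_virtual_cnt", 77);
        m.put("_resolve_invoke_opt_virtual_cnt", 189);
        m.put("_resolve_invoke_static_cnt", 45);
        m.put("_handle_wrong_method_cnt", 1);
        m.put("_ic_miss_cnt", 39);
        return m;
    }

    // (after - before) for each counter, in the order the dump lists them.
    static Map<String, Integer> delta(Map<String, Integer> b, Map<String, Integer> a) {
        Map<String, Integer> d = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            d.put(e.getKey(), a.get(e.getKey()) - e.getValue());
        }
        return d;
    }

    public static void main(String[] args) {
        delta(before(), after()).forEach((k, v) -> System.out.printf("%-34s %+d%n", k, v));
        // Wall-clock ratio for 10000 iterations: 736 ms before, 9 ms after.
        System.out.printf("speedup: %.1fx%n", 736.0 / 9.0);
    }
}
```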
>>
>>
>> Codegen-wise, before the patch, C1 generates the following IR for an invokeinterface whose target is a private interface method:
>>
>> __bci__use__tid____instr____________________________________
>> . 1 0 v2 a1.invokeinterface()
>> InvokePrivateInterfaceMethod$I.bar()V
>> . 6 0 v3 return
>>
>>
>> With this patch, C1 generates the following IR. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`; if so, an optimized virtual call is generated. The call site is fixed up exactly once at runtime.
>>
>> __bci__use__tid____instr____________________________________
>> . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I
>> stack [0:a1]
>> . 1 0 v3 a2.invokeinterface()
>> InvokePrivateInterfaceMethod$I.bar()V
>> . 6 0 v4 return
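At the source level, the checked call amounts to "type-check the receiver, then call the single `bar()` body". The Java sketch below mirrors that two-step shape; it is not the code C1 emits. `InvokePrivateInterfaceMethod`, `I` and `bar()` are taken from the IR listing, while `callBar()`, `Counter`, `A` and `B` are hypothetical. The exception thrown for a non-implementing receiver follows the JVMS rule for invokeinterface; this sketch does not claim which exception C1's own checkcast raises.

```java
// Source-level sketch of a receiver-checked, statically bound call; not C1's actual code.
public class InvokePrivateInterfaceMethod {
    interface I {
        private void bar() { Counter.calls++; }

        // 1. Type-check the receiver (the checkcast in the IR).
        // 2. Call the one and only bar() body; no per-receiver dispatch is needed.
        static void callBar(Object receiver) {
            if (!(receiver instanceof I)) {
                // JVMS: invokeinterface throws IncompatibleClassChangeError when the
                // receiver's class does not implement the resolved interface.
                throw new IncompatibleClassChangeError(receiver.getClass().getName());
            }
            ((I) receiver).bar();
        }
    }

    // Hypothetical counter so the effect of bar() is observable.
    static final class Counter { static int calls; }

    static final class A implements I { }
    static final class B implements I { }

    public static void main(String[] args) {
        I.callBar(new A());
        I.callBar(new B());
        System.out.println(Counter.calls); // 2
        try {
            I.callBar("not an I");
        } catch (IncompatibleClassChangeError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```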
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> Call set_invokespecial_receiver_check() so invokespecial throws IllegalAccessError.
>
> We need a checkcast for invokespecial even if target->can_be_statically_bound() is false.
> eg. https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/invoke/SpecialInterfaceCallI4.jasm#L33
I added a JMH benchmark for this change.
$make test TEST='micro:org.openjdk.bench.vm.compiler.InterfacePrivateCalls' CONF=linux-x86_64-server-release
// master (7b2d823e)
Benchmark                                             Mode  Cnt      Score      Error  Units
InterfacePrivateCalls.invokePrivateInterfaceMethodC1  avgt    5  11399.058 ± 1086.745  ns/op
InterfacePrivateCalls.invokePrivateInterfaceMethodC2  avgt    5     23.741 ±    0.189  ns/op

// with this patch
Benchmark                                             Mode  Cnt      Score      Error  Units
InterfacePrivateCalls.invokePrivateInterfaceMethodC1  avgt    5     24.534 ±    0.130  ns/op
InterfacePrivateCalls.invokePrivateInterfaceMethodC2  avgt    5     23.800 ±    0.384  ns/op
The average cost under C1 improved from 11399.058 ns/op to 24.534 ns/op, roughly 464x faster. C1 is now comparable to C2 (23.800 ns/op).
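The ratios follow directly from the scores in the tables above (error bars ignored):

$$
\frac{11399.058\ \text{ns/op}}{24.534\ \text{ns/op}} \approx 464.6,
\qquad
\frac{24.534\ \text{ns/op}}{23.800\ \text{ns/op}} \approx 1.031
$$

so the patched C1 path is within about 3% of C2 on this benchmark.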
-------------
PR: https://git.openjdk.java.net/jdk/pull/6445