RFR: 8274983: Pattern.matcher performance regression after JDK-8238358

Xin Liu xliu at openjdk.java.net
Sat Nov 20 04:44:12 UTC 2021


On Sat, 20 Nov 2021 01:26:23 GMT, Dean Long <dlong at openjdk.org> wrote:

>> The root cause of the C1 regression is that some regexes generate multiple classes which all implement
>> an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method.
>> 
>> 
>> 9: invokeinterface #25,  3           // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z
>> 
>> 
>> This patch allows C1 to generate an optimized virtual call for an invokeinterface
>> whose target is a private interface method.
>> 
>> Before JDK-8238358, LambdaMetaFactory generated invokespecial in this case. Because private
>> interface methods cannot be overridden, C1 generated the optimized virtual call. After JDK-8238358,
>> LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because
>> it cannot recognize the new pattern. If multiple classes all implement the same interface,
>> they can thrash the IC stub at runtime, each with its own concrete klass.
>> 
>> An optimized virtual call uses relocInfo::opt_virtual_call_type(3). It calls the VM entry
>> 'resolve_opt_virtual_call_C' once and resolves the target to the VEP (verified entry point) of the nmethod.
>> Therefore, this patch prevents the callsite from thrashing.
>> 
>> Before this patch, SlowStartupTest recorded 38770 _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt events per 10k iterations. To dump `C1Statistics`, we use a fastdebug build for comparison.
>> 
>> 
>> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1
>> Executed 10000 iterations in 736ms
>> C1 Runtime statistics:
>>  _resolve_invoke_virtual_cnt:     38770
>>  _resolve_invoke_opt_virtual_cnt: 186
>>  _resolve_invoke_static_cnt:      44
>>  _handle_wrong_method_cnt:        38695
>>  _ic_miss_cnt:                    35
>> 
>> 
>> With this patch, only 1 _handle_wrong_method_cnt event is triggered, at the cost of 3 more `_resolve_invoke_opt_virtual_cnt` events. The total runtime drops from 736ms to 9ms.
>> 
>> 
>> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1
>> Executed 10000 iterations in 9ms
>> C1 Runtime statistics:
>>  _resolve_invoke_virtual_cnt:     77
>>  _resolve_invoke_opt_virtual_cnt: 189
>>  _resolve_invoke_static_cnt:      45
>>  _handle_wrong_method_cnt:        1
>>  _ic_miss_cnt:                    39
>> 
>> 
>> Codegen-wise, before the patch, C1 generates LIR for an invokeinterface whose target is a private interface method as follows.
>> 
>> __bci__use__tid____instr____________________________________
>> . 1    0    v2     a1.invokeinterface()
>>                    InvokePrivateInterfaceMethod$I.bar()V
>> . 6    0    v3     return
>> 
>> 
>> With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite is fixed up exactly once at runtime.
>> 
>> __bci__use__tid____instr____________________________________
>> . 1    1    a2     checkcast(a1) InvokePrivateInterfaceMethod$I
>>                    stack [0:a1]
>> . 1    0    v3     a2.invokeinterface()
>>                    InvokePrivateInterfaceMethod$I.bar()V
>> . 6    0    v4     return
>
> It looks like this does what we want for private interface methods, but I'm wondering if we handle all combinations of private/final and invokevirtual/invokeinterface, or are we missing some cases where can_be_statically_bound() would return true?

hi, @dean-long,

I think C1 covers all cases as long as the target method is loaded. I have seen cases in which the target methods haven't been loaded at startup time, but they are rare.

ciMethod::can_be_statically_bound() returns true if the method is private or final. The matrix below shows the modifiers of the target methods against the invoke bytecodes.

|                 | final | private |
|-----------------|-------|---------|
| invokevirtual   | 1     | 2       |
| invokespecial   | N/A-1 | 3       |
| invokeinterface | N/A-2 | 4       |

1. generates the optimized virtual call because [x->target_is_final()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_LIRGenerator.cpp#L2799) is true.
2. is transformed to `invokespecial` [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1885), then it becomes case 3.
3. generates the optimized virtual call because `x->code() == Bytecodes::_invokespecial` is true.
4. is what this patch covers.

N/A-1: I think it's impossible for javac to generate this. Even if it existed, it would be an optimized virtual call like case 3.
N/A-2: `final` is an illegal modifier for an interface method.
 https://docs.oracle.com/javase/specs/jls/se17/html/jls-9.html#jls-InterfaceMethodModifier
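
To make the matrix concrete, here is a minimal Java sketch of the kinds of target methods involved. All names (`StaticBindingCases`, `I`, `C`, `canBeStaticallyBound`) are made up for illustration; the helper only mirrors the "private or final" rule as stated above using reflection, and is a simplification rather than HotSpot's actual `ciMethod::can_be_statically_bound()` logic.

```java
import java.lang.reflect.Modifier;

public class StaticBindingCases {
    interface I {
        // Case 4 target: a private interface method. It cannot be overridden,
        // so a call to it always lands on this one implementation.
        private void bar() { }

        // Calls the private interface method on `this`.
        default void callBar() { bar(); }
    }

    static class C implements I {
        final void fin() { }    // case 1 target: final method
        private void priv() { } // case 2/3 target: private class method
        void virt() { }         // ordinary virtual method, not statically bindable by this rule
    }

    // Hypothetical helper: true if the declared no-arg method is private or final.
    // This is only the rule quoted in the message, not the VM's full check.
    static boolean canBeStaticallyBound(Class<?> holder, String name) {
        try {
            int mods = holder.getDeclaredMethod(name).getModifiers();
            return Modifier.isPrivate(mods) || Modifier.isFinal(mods);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    public static void main(String[] args) {
        new C().callBar(); // exercises the call to the private interface method
        System.out.println("I.bar  : " + canBeStaticallyBound(I.class, "bar"));
        System.out.println("C.fin  : " + canBeStaticallyBound(C.class, "fin"));
        System.out.println("C.priv : " + canBeStaticallyBound(C.class, "priv"));
        System.out.println("C.virt : " + canBeStaticallyBound(C.class, "virt"));
    }
}
```

Running `javap -c -p` on the compiled classes should show which invoke bytecode javac picked for each callsite; that choice depends on the javac version and target release, so the sketch does not assert it.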

-------------

PR: https://git.openjdk.java.net/jdk/pull/6445

