Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64

Kuai Wei kuaiwei.kw at alibaba-inc.com
Tue Sep 15 10:04:10 UTC 2020


Thanks for your quick reply.

>    I updated my test cases in test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java . My tests will not inline interface methods and most cpu are used by itable_stub.
> every test will run 10 warmup iterations and 5 measure iterations for one score. I took 3 score for every test.
>    Below is test result on my machines, it looks slow loop has more improvement than origin one.

Good, thanks for the numbers. I'm curious have you observed any 
improvements on larger scale benchmarks or real world apps?

I'm asking because linear scan is already far from optimal when there 
are many superinterfaces present.

Kevin: itable_stub was found hot on several online applications. So I started to work on this. Now I don't have chance to verify it online. So I uses microbenchmarks to verify. I will
test with some benchmarks. 

>    I think lookup_interface_method can be reused as fast path. And it is also used by templateTable::invoke_interface and generate_method_handle_dispatch.
> My implementation in slow path need more registers (6 registers so far), I need to check if there's register conflict in these methods. I'd like to keep a separate
> slow path implementation. How do you think about it?

Frankly speaking, I'd like to avoid the duplication.

Kevin: Ok, I will try to merge them.

Also, absence of guarantees about order of interfaces in the itable 
complicates things: REFC and DECC can be encountered in arbitrary order 
and the pass should take that into account. For example, I don't see 
early exit on success in slow variant, so every lookup has to go through 
the whole itable irrespective of whether it succeeds or fails. I 
attribute that to the complications induced by aforementioned aspect.

Kevin: I use a counter for matching. If it reaches zero, the iteration can exit early.

And speaking of the overall approach (as it is implemented now), IMO 
increased complexity doesn't worth it. If interface calls become a 
bottleneck, the problem lies not in itable stub, but the overall design 
which requires linear scan over itables. It's better to put the effort 
there than micro-optimizing the stub.

Kevin: I agree we can improve itable design. My initial think is jvm may reorder itable at safepoint. I can take it as a follow up optimization.

But I'm happy to change my mind if the rewritten implementation makes it 
easier to reason about the code.

(FTR subtype checks suffer from a similar problem: unless 
Klass::_secondary_super_cache catches it, subtype check for an interface 
does a linear scan over _secondary_supers array.)


Regards,
Kevin



------------------------------------------------------------------
From:Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
Send Time:2020年9月15日(星期二) 17:03
To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; hotspot-dev <hotspot-dev-retn at openjdk.java.net>; kuaiwei <github.com+1981974+kuaiwei at openjdk.java.net>; hotspot-dev <hotspot-dev at openjdk.java.net>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
Subject:Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64


>    I updated my test cases in test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java . My tests will not inline interface methods and most cpu are used by itable_stub.
> every test will run 10 warmup iterations and 5 measure iterations for one score. I took 3 score for every test.
>    Below is test result on my machines, it looks slow loop has more improvement than origin one.

Good, thanks for the numbers. I'm curious have you observed any 
improvements on larger scale benchmarks or real world apps?

I'm asking because linear scan is already far from optimal when there 
are many superinterfaces present.

>    I think lookup_interface_method can be reused as fast path. And it is also used by templateTable::invoke_interface and generate_method_handle_dispatch.
> My implementation in slow path need more registers (6 registers so far), I need to check if there's register conflict in these methods. I'd like to keep a separate
> slow path implementation. How do you think about it?

Frankly speaking, I'd like to avoid the duplication.

Also, absence of guarantees about order of interfaces in the itable 
complicates things: REFC and DECC can be encountered in arbitrary order 
and the pass should take that into account. For example, I don't see 
early exit on success in slow variant, so every lookup has to go through 
the whole itable irrespective of whether it succeeds or fails. I 
attribute that to the complications induced by aforementioned aspect.

And speaking of the overall approach (as it is implemented now), IMO 
increased complexity doesn't worth it. If interface calls become a 
bottleneck, the problem lies not in itable stub, but the overall design 
which requires linear scan over itables. It's better to put the effort 
there than micro-optimizing the stub.

But I'm happy to change my mind if the rewritten implementation makes it 
easier to reason about the code.

(FTR subtype checks suffer from a similar problem: unless 
Klass::_secondary_super_cache catches it, subtype check for an interface 
does a linear scan over _secondary_supers array.)

Best regards,
Vladimir Ivanov


> ------------------------------------------------------------------
> From:Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Send Time:2020年9月14日(星期一) 22:10
> To:kuaiwei <github.com+1981974+kuaiwei at openjdk.java.net>; hotspot-dev <hotspot-dev at openjdk.java.net>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
> Subject:Re: RFR: 8253049: Enhance itable_stub for AArch64 and x86_64
> 
> Hi Kevin,
> 
> Very interesting observations. I like the idea to optimize for the case
> when REFC == DECC.
> 
> Fusing 2 passes over the itable into one does look attractive, but I'm
> not sure the proposed variant is correct. I suggest to split the patch
> into 2 enhancements and handle them separately.
> 
> I'm curious what kind of benchmarks you used and what are the
> improvements observed with the patch.
> 
> One suggestion about the implementation:
> 
> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
> 
> +void MacroAssembler::lookup_interface_method_in_stub(Register recv_klass,
> 
> I'd like to avoid having 2 independent implementations of itable lookup
> (MacroAssembler::lookup_interface_method_in_stub() and
> MacroAssembler::lookup_interface_method()). It would be nice to keep the
> implementation unified between itable and MethodHandle linkToInterface
> linker stubs.
> 
> What MacroAssembler::lookup_interface_method(..., true
> /*return_method*/) does is interface method lookup w/o proper subtype
> check and it is equivalent to fast loop in
> MacroAssembler::lookup_interface_method_in_stub().
> 
> As a possible path forward, you could introduce the fast path check
> first by moving the fast path check into
> VtableStubs::create_itable_stub() and guard the first path over the
> itable. It would make the type checking pass over itable optional based
> on runtime check.
> 
> Then you could refactor MacroAssembler::lookup_interface_method() to
> optionally do REFC and DECC checks on every iteration and migrate
> VtableStubs::create_itable_stub()  and
> MethodHandles::generate_method_handle_dispatch() to it.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 14.09.2020 13:52, kuaiwei wrote:
>> Now itable_stub will go through instanceKlass's itable twice to look up a method entry. resolved klass is used for type
>> checking and method holder klass is used to find method entry. In many cases , we observed resolved klass is as same as
>> holder klass. So we can improve itable stub based on it. If they are same klass, stub uses a fast loop to check only
>> one klass. If not, a slow loop is used to checking both klasses.
>>
>> Even entering in slow loop, new implementation can be better than old one in some cases. Because new stub just need go
>> through itable once and reduce memory operations.
>>
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8253049
>>
>> -------------
>>
>> Commit messages:
>>    - 8253049: Enhance itable_stub for AArch64 and x86_64
>>
>> Changes: https://git.openjdk.java.net/jdk/pull/128/files
>>    Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=128&range=00
>>     Issue: https://bugs.openjdk.java.net/browse/JDK-8253049
>>     Stats: 220 lines in 7 files changed: 172 ins; 35 del; 13 mod
>>     Patch: https://git.openjdk.java.net/jdk/pull/128.diff
>>     Fetch: git fetch https://git.openjdk.java.net/jdk pull/128/head:pull/128
>>
>> PR: https://git.openjdk.java.net/jdk/pull/128
>>
> 



More information about the hotspot-compiler-dev mailing list