[foreign-abi] RFR: 8248331: Intrinsify downcall handles in C2
Jorn Vernee
jorn.vernee at oracle.com
Wed Jul 1 12:27:25 UTC 2020
Hi Vladimir,
Thanks for the review, some replies inline...
On 30/06/2020 19:59, Vladimir Ivanov wrote:
> Looks good.
>
> Some minor comments:
>
> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp:
>
> + void NativeInvokerGenerator::generate() {
>
> Though there's need_transition bit (both on JDK and JVM sides) and it
> affects code generation (e.g., in CallNode::guaranteed_safepoint()),
> NativeInvokerGenerator::generate() does perform thread state
> transition unconditionally. Is it intentional? As it is now,
> need_transition=false looks broken.
Yes, this is intentional.
NativeInvokerGenerator::generate() adds the thread state transition
unconditionally, but it is only used when need_transition is actually
set. See the code that creates the call_addr in graphKit:
address call_addr = nep->entry_point();
if (nep->need_transition()) {
call_addr = SharedRuntime::make_native_invoker(call_addr,
nep->shadow_space(),
arg_regs, ret_regs);
C->add_native_stub(call_addr);
}
In the need_transition=false case, we don't call
NativeInvokerGenerator::generate() at all and instead use the target
address directly.
> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
>
> + void MacroAssembler::spill_register(VMReg reg) {
> + assert(reg->is_reg(), "must be a register");
> + if (reg->is_Register()) {
> + push(reg->as_Register());
> + } else if (reg->is_XMMRegister()) {
> + subptr(rsp, 16); // 16 bytes
> + movdqu(Address(rsp, 0), reg->as_XMMRegister());
> + } else {
> + ShouldNotReachHere();
> + }
> + }
>
> spill_register/fill_register assume that XMMRegister is 16 bytes in
> size. Though it may be the case for existing system ABIs, it's not
> necessarily the case for custom calling conventions (e.g., vector).
> I'd either conservatively operate on the largest vector register size
> supported or explicitly record the size of the value and issue moves
> of the right size.
Ok, I've added handling of larger register sizes based on the value of
UseAVX. Though, there's probably a lot more surgery needed to fully
support other carriers. I've also moved the spill_register/fill_register
functions to the NativeInvokerGenerator class as helper functions to
make it clear that they should only be used for this special case.
> src/hotspot/share/opto/callGenerator.cpp:
>
> + } else {
> + // can this happen?
> + print_inlining_failure(C, callee, jvms->depth() - 1,
> jvms->bci(),
> + "NativeEntryPoint not constant");
> + }
>
> It does happen in practice when bytecode for a lambda form is compiled
> stand-alone. In such case LambdaForm is not a compile-time constant,
> so NEP can't be a constant as well.
Ok, thanks. I'll remove the comment.
Jorn
>
> Best regards,
> Vladimir Ivanov
>
> On 29.06.2020 14:59, Jorn Vernee wrote:
>> Hi,
>>
>> This patch adds intrinsification of down call handles.
>>
>> This is done through a new method handle intrinsic called
>> linkToNative. We create a NativeMethodHandle that calls this
>> intrinsic, which then replaces the leaf method handle in
>> ProgrammableInvoker::getBoundMethodHandle, basically replacing
>> the call to invokeMoves. Before C2 kicks in, this intrinsic will call
>> a fallback method handle that we pass to it. The
>> handle that is passed is a handle that points to
>> ProgrammableInvoker::invokeMoves, thus simulating the current
>> behaviour.
>>
>> However, when a call to linkToNative is inlined, C2 will instead
>> generate either a direct call to the target function,
>> or a call to a wrapper stub (which is generated on demand) that also
>> does the thread state transitions needed for long
>> running native functions. Information about ABI, and which registers
>> to use are captured in a so-called 'appendix
>> argument' of the type NativeEntryPoint, which is passed as the last
>> argument. This captures all the information needed
>> to generate the call in C2 (note that previously in the patch shared
>> in the discussion thread this information was
>> split over several classes, but I've condensed the info into just
>> NativeEntryPoint in order to reduce the amount of
>> code needed in the vm to be able to access the information).
>>
>> With this, the overhead for downcalls is on par or slightly lower
>> than with JNI for calls that need to do thread state
>> transitions, and it is even lower when the thread state transitions
>> are omitted (see that *_trivial runs).
>>
>> Benchmark Mode Cnt
>> Score Error Units
>> CallOverhead.jni_blank avgt 30 8.461
>> □ 0.892 ns/op
>> CallOverhead.jni_identity avgt 30 12.585
>> □ 0.066 ns/op
>> CallOverhead.panama_blank avgt 30 8.562
>> □ 0.029 ns/op
>> CallOverhead.panama_blank_trivial avgt 30 1.372
>> □ 0.008 ns/op
>> CallOverhead.panama_identity avgt 30 11.813
>> □ 0.073 ns/op
>> CallOverhead.panama_identity_trivial avgt 30 6.042
>> □ 0.024 ns/op
>> Finished running test 'micro:CallOverhead'
>>
>> Thanks,
>> Jorn
>>
>> -------------
>>
>> Commit messages:
>> - Rebase fixes
>> - Remove RegionPtr
>> - re-write RegionPtr without using std::equal
>> - Add missing debug.hpp includes for files using assert
>> - Patch cleanup mostly updating copyright headers
>> - Reduce VM interface
>> - Implement down call intrinsics
>>
>> Changes: https://git.openjdk.java.net/panama-foreign/pull/219/files
>> Webrev: https://webrevs.openjdk.java.net/panama-foreign/219/webrev.00
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8248331
>> Stats: 1848 lines in 73 files changed: 1741 ins; 28 del; 79 mod
>> Patch: https://git.openjdk.java.net/panama-foreign/pull/219.diff
>> Fetch: git fetch https://git.openjdk.java.net/panama-foreign
>> pull/219/head:pull/219
>>
>> PR: https://git.openjdk.java.net/panama-foreign/pull/219
>>
More information about the panama-dev
mailing list