Questions about oop handling for Panama upcalls.

Jorn Vernee jorn.vernee at oracle.com
Mon Nov 22 19:19:31 UTC 2021


One more comment on this thread for future readers:

As mentioned before, I had noticed that deoptimization code would 
reconstitute the receiver oop in the upcall stub's frame, so I added an 
extra stack word for that in the upcall frame (thinking at that time 
that the caller was supposed to make room for the receiver on the 
stack). But upon recent inspection of c2i adapter code, I noticed that 
the c2i adapter should already be making room for the receiver as well, 
so there should theoretically be no need for those extra stack words.

It turns out that the deopt code will not recreate the c2i adapter when 
doing a deopt. For that to work for compiled callers, the stack needs to 
be adjusted to make room for the parameters (as the 2ci adapter does), 
as well as extra locals. The space that is needed is calculated in 
Deoptimization::fetch_unroll_info_helper by the following code:

   // Compute the amount the oldest interpreter frame will have to adjust
   // its caller's stack by. If the caller is a compiled frame then
   // we pretend that the callee has no parameters so that the
   // extension counts for the full amount of locals and not just
   // locals-parms. This is because without a c2i adapter the parm
   // area as created by the compiled frame will not be usable by
   // the interpreter. (Depending on the calling convention there
   // may not even be enough space).

   // QQQ I'd rather see this pushed down into last_frame_adjust
   // and have it take the sender (aka caller).

   if (deopt_sender.is_compiled_frame() || caller_was_method_handle) {
     caller_adjustment = last_frame_adjust(0, callee_locals);
   } else if (callee_locals > callee_parameters) {
     // The caller frame may need extending to accommodate
     // non-parameter locals of the first unpacked interpreted frame.
     // Compute that adjustment.
     caller_adjustment = last_frame_adjust(callee_parameters, 
callee_locals);
   }

I think you can probably spot the problem from this: we are doing a 
compiled call in the upcall stub, but the if-statement is not catching 
that case, so we don't make enough space on the stack. (in the case of 
method handles a pessimization seems to be used, since it's not known 
how much room the caller has on the stack for the parameters).

Jorn

On 17/11/2021 16:35, Jorn Vernee wrote:
> On 17/11/2021 16:14, Erik Osterlund wrote:
>> Hi Jorn,
>>
>> In the interpreter world, the expression stack at the call site 
>> becomes the locals
>> of the callee. So everything is passed through the stack. So the 
>> upcall stub sets
>> things up like an interpreter method would have (quack quack), and 
>> calls the
>> i2c adapter if there is an nmethod (quack quack), which will 
>> transform the
>> arguments to the compiled convention of the callee. The argument 
>> ownership
>> then switches from the caller to the callee, once the callee can 
>> manifest on the
>> stack. But if there are safepoints inbetween, then the caller owns 
>> the arguments
>> until its callee manifests.
> Okay, thanks, that makes sense. This probably explains why not 
> implementing preserve_callee_argument_oops for the upcall stubs didn't 
> cause any problems so far. There probably just weren't any safepoints 
> in between the call from the stub and the callee setting up it's 
> frame. (although I'm still a bit confused here why the callee doesn't 
> make space for the receiver in it's frame as well).
>> Do you want to avoid the pretend to be the interpreter step because 
>> it is costly
>> in the Panama world to spill arguments to the stack?
> I think either one could "work", although it seems like interpreter 
> calls require more setup of meta data around calls (which would be 
> unneeded if we called into an nmethod I think?). Also, we generate an 
> argument shuffle from the native convention to the Java calling 
> convention (this is unavoidable). If the native convention passes 
> arguments in the same registers that the Java convention expects them 
> in we don't have to generate code for that in the shuffle. 
> Theoretically we could also do a pass to minimize the needed shuffle 
> by reordering parameters on the MethodHandle. If we went with an 
> interpreted calling convention, we would always have to copy across 
> arguments to the stack, in a shuffle-ish manner (right now we rely on 
> SharedRuntime::java_calling_convention to compute the target 
> registers. Would have to implement something similar for the 
> interpreter convention).
>
> It seems to me that in the long run, going with the Java compiled 
> calling convention for the upcall is the right choice if we want to be 
> able to squeeze out as much speed as possible.
>
> Jorn
>>
>> /Erik
>>
>>> -----Original Message-----
>>> From: Jorn Vernee <jorn.vernee at oracle.com>
>>> Sent: Wednesday, 17 November 2021 15:49
>>> To: Erik Osterlund <erik.osterlund at oracle.com>; hotspot-
>>> dev at openjdk.java.net
>>> Subject: Re: Questions about oop handling for Panama upcalls.
>>>
>>> Hi Erik,
>>>
>>> Thanks for the suggestion.
>>>
>>> The callee is a mix of JDK internal and user code. The user gives us 
>>> a method
>>> handle that they want to turn into a native function pointer [1], 
>>> and we adapt
>>> that using method handle combinators [2] to take only primitve 
>>> arguments
>>> according to the registers in which the native calling convention 
>>> passes
>>> arguments (essentially each primitive argument is a register value). 
>>> The
>>> register values are then reconstructed into high-level arguments 
>>> (through
>>> our MH adaptation), and passed to the user code. It's this adapted 
>>> method
>>> handle that we call from the upcall stub.
>>>
>>> I guess what you're suggesting is that we have some internal Java 
>>> method
>>> like this:
>>>
>>>       static ... invoke(long methodHandle, ...) {
>>>           MethodHandle mh = resolveJObject(methodHandle);
>>>           return (...) mh.invokeExact(...);
>>>       }
>>>
>>> Which is then called from the upcall stub instead.
>>>
>>> I think it could work maybe (would have to see how the performance 
>>> works
>>> out), but we have to deal with different signatures, so would have 
>>> to use
>>> bytecode spinning to generate these 'invoke' methods on demand, which
>>> seems like maybe it's a worse medicine (in terms of complexity) than 
>>> adding
>>> the correct oop handling in the VM.
>>>
>>> I would also just like to get a better understanding of how this is 
>>> supposed to
>>> work in the first place (or how it works e.g. in the case of 
>>> nmethods), since I
>>> had to implement the correct oop handling in the past as well when
>>> implementing the intrinsics for down calls, and it's probably not 
>>> the last time I
>>> have to deal with something like this...
>>>
>>>   > Our current upcall stubs try to quack like an interpreter in 
>>> many ways, so
>>> that it will look like an i-2-something call. I think you can either 
>>> try to do the
>>> same quacking dance, to pass the oop to the callee
>>>
>>> So, I suppose interpreter argument oops are handled through another
>>> mechanism than OopMaps, maybe something similar to
>>> CompiledMethod::preserve_callee_argument_oops?
>>>
>>> Thanks,
>>> Jorn
>>>
>>> [1] :
>>> https://github.com/openjdk/panama-foreign/blob/foreign-
>>> jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLink 
>>>
>>> er.java#L224
>>> [2] :
>>> https://github.com/openjdk/panama-foreign/blob/foreign-
>>> jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/Pr 
>>>
>>> ogrammableUpcallHandler.java#L157
>>>
>>> On 17/11/2021 10:42, Erik Osterlund wrote:
>>>> Hi Jorn,
>>>>
>>>> So you have a jobject in the caller, resolve it, and then need to 
>>>> pass the
>>> oop around as an argument to the callee. Our current upcall stubs 
>>> try to
>>> quack like an interpreter in many ways, so that it will look like an 
>>> i-2-
>>> something call. I think you can either try to do the same quacking 
>>> dance, to
>>> pass the oop to the callee, or alternatively the primary question 
>>> for me
>>> seems to be who is the callee? You have a very fixed format for the 
>>> call,
>>> which makes me suspect the callee is some kind of JDK internal code.
>>> Another way of dealing with this would be to pass the jobject as a 
>>> long and
>>> just resolve it in the callee instead, if this is indeed JDK 
>>> internal code. Then
>>> this becomes a problem that doesn't need to be solved at all. Just 
>>> sanity
>>> checking.
>>>> /Erik
>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-dev<hotspot-dev-retn at openjdk.java.net>  On Behalf Of
>>>>> Jorn Vernee
>>>>> Sent: Tuesday, 16 November 2021 18:51 To:hotspot-
>>> dev at openjdk.java.net
>>>>> Subject: Questions about oop handling for Panama upcalls.
>>>>>
>>>>> Hi,
>>>>>
>>>>> For panama-foreign upcalls we spin our own upcall stubs that wrap a
>>>>> method handle VM entry for the actual upcall. I want to make sure I
>>>>> have the oop handling correct on this.
>>>>>
>>>>> We receive a list of arguments from native code (all primitives, so
>>>>> no oops to handle there), and then prefix that list with a
>>>>> MethodHandle oop, before calling into the MH's VM entry. The MH oop
>>>>> can be stored in three different
>>>>> places:
>>>>>
>>>>> 1. The MH oop is stored in a global JNI handle, and then resolved
>>>>> right before the upcall [1].
>>>>> 2. The MH oop is then stored in the first argument register j_rarg0
>>>>> for the call.
>>>>> 3. During a deopt of the callee, the deoptimization code spills the
>>>>> receiver (MH oop) into the frame of the upcall stub. (looks like the
>>>>> extending of the frame that happens for instance in c2i adapters
>>>>> doesn't make room for the receiver?).
>>>>>
>>>>> I don't think I need to do anything else for 1., but for 2. and 3.
>>>>> there is currently no handling. I wanted to ask how those cases
>>>>> should be handled, if at all.
>>>>>
>>>>> I think 2. could in theory be addressed by implementing
>>>>> CodeBlob::preserve_callee_argument_oops. Though, it has been working
>>>>> fine so far without this, so I'm wondering if this is even needed. Is
>>>>> the caller or callee responsible for handling argument oops (seems to
>>>>> be caller, from looking at
>>> CompiledMethod::preserve_callee_argument_oops)?
>>>>> Or does the caller just handle the receiver if there is one (since
>>>>> deopt spills that into the callers frame)? The oop offset is passed
>>>>> to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as
>>>>> an oop* [2]. Does the argument register get spilled somewhere and the
>>>>> oop needs to be patched in place at that address (by the OopClosure)?
>>>>> Or is this just used to mark the oop as alive? (in the latter 
>>>>> case, the JNI
>>> global should be enough I think).
>>>>> I think 3. could be handled with an OopMap entry at the frame offset
>>>>> where the receiver is spilled during a deopt of the callee? Should it
>>>>> be an oop or a narrowOop, or does it depend on VM settings? FWIW, the
>>>>> deopt code always seems to need a machine word (64-bits) to do the
>>>>> spilling, so I think it's an oop? Do I need to zero out that part of
>>>>> the frame when allocating the frame so that the GC doesn't mistake
>>>>> some garbage that's in there for an oop?
>>>>>
>>>>> I have a POC patch here for reference [3], that implements the 2
>>>>> things above. This passes our test suite, but I'm not sure about the
>>> correctness.
>>>>> Looking at what JNI does for upcalls [4], I don't see how e.g. the
>>>>> receiver argument that is put on the stack is handled, or what
>>>>> happens when the callee deopts (though I think it would just
>>>>> overwrite the value on the stack that's there already, since JNI
>>>>> always seems to do interpreted calls, where we do compiled calls).
>>>>> But, JNI/the call stub might be special cased elsewhere...
>>>>>
>>>>> Also, the oop is briefly stored in rscratch1 when resolving. I'm
>>>>> interested to know when the GC can look at the frame and register
>>>>> state, especially with concurrent GCs in mind. I'm assuming it's only
>>>>> during the call to the MH VM entry (but the existence of
>>> frame::safe_for_sender makes me less sure)?
>>>>> AFAIK the call counts as a safepoint (with oop map for it typically
>>>>> stored at the return offset). At this safepoint, the oop can only be
>>>>> stored at one of the
>>>>> 3 places listed at the start.
>>>>>
>>>>> Thanks,
>>>>> Jorn
>>>>>
>>>>> [1] :
>>>>> https://github.com/openjdk/panama-foreign/blob/foreign-
>>>>> jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L
>>>>> 416
>>>>> [2] :
>>>>>
>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/
>>>>> fr
>>>>> ame.cpp#L939-L946
>>>>> [3] :
>>>>> https://github.com/openjdk/panama-foreign/compare/foreign-
>>>>> memaccess+abi...JornVernee:Deopt_Crash
>>>>> [4] :
>>>>>
>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe
>>>>> nerator_x86_64.cpp#L339


More information about the hotspot-dev mailing list