[foreign-abi] Intrinsify down calls

Thu Jun 18 11:29:18 UTC 2020

>> Having a stub associated with a nmethod have some drawbacks/limitations.
>>
>> For example, if you look at other MethodHandle linkers (linkToVirtual 
>> et al), there are 2 variants: used by compiled code and used by 
>> interpreter. If interpreter needs a stub, then a different solution 
>> for resource management will be needed for it.
> 
> This is not a problem, since the linkToNative linker stub always calls 
> the fallback MethodHandle. Only the intrinsified version needs the 
> transition stub.

Yes, but that's a decision we may reconsider later if we decide it's 
important to have better performance during startup/warmup (with 
interpreter/C1).

>> Sharing stubs between nmethods/native functions also becomes more 
>> complicated: it requires some sort of reference counting to track usages.
>>
>> Also, if sharing is not required (and all the native stubs have unique 
>> nmethod users), then the stubs can be embedded into the nmethod 
>> itself. Then the stubs will be freed along with the nmethod. (It can 
>> also improve situation with code cache fragmentation unless the stub 
>> is allocated right next to the user nmethod.)
> 
> 
> This is what we started out with, but I think there were some concerns 
> with regards to the slow path calls being emitted inline. IIRC there was 
> also a concern of safepoints not being handled correctly in that case 
> (don't quite remember the exact concern though). I also think it's 
> annoying that we have to do manual register spill/fill of the return 
> value around the slow path calls. Maybe We could have separate nodes for 
> the slow path calls and use labels to jump back and forth?

What we started at was the stub code placed inline around the call to 
native code.

What I meant is to keep the current shape of the call (call to 
transition stub), but put the stub inside the nmethod with a separate 
entry point. So, the stub code is separate, but it is managed as part of 
nmethod it is used from.

Regarding handling slow path, I'm not concerned about additional work 
involved since it's not performance critical.

And I don't see any benefit in making the slow path checks explicit in 
Ideal IR considering the opportunities to optimize them are very limited.

>> At the moment, the most promising solution to me is to associate the 
>> JVM stubs with native library and free them all at once when the 
>> library is unloaded.
> 
> 
> I think we'd run into the same problem again with the code cache running 
> out of space in the tests. There is only one library being used in that 
> case as well. Maybe the size of the stubs can be reduced somewhat. I've 
> used MethodHandles::adapter_code_size as the size of the stub which 
> works, but I don't know how that number is determined. So, maybe it can 
> still be made a lot smaller.

It sounds more like a stress scenario than a real world use case. Stub 
size reduction will just push the boundary further away.

Please, remind me what were the problems caused by running out of space 
in code cache. If there's no space left in code cache to allocate a stub 
then intrinsification attempt can just bail out.

>> Quick question: can you remind me why is the following line is needed 
>> (from NativeMethodHandle.make()):
>>
>>   fallback.customize(); // needed
> 
> 
> This is a left-over from earlier experimentation. It is no longer 
> needed. (I thought I had already removed it before actually).

Got it, thanks.

Best regards,
Vladimir Ivanov

>>> On 29/04/2020 13:58, Maurizio Cimadamore wrote:
>>>>
>>>> On 29/04/2020 12:43, Jorn Vernee wrote:
>>>>> There might be some options for attaching the stub to the nmethod 
>>>>> itself, and then freeing it when the nmethod is freed. But maybe 
>>>>> for now we should go back to emitting the thread state transition 
>>>>> inline, and then improve upon this later. 
>>>>
>>>> +1
>>>>
>>>> After chatting with Jorn I'm not overly convinced of the wider 
>>>> advantages to split the nmethod from the thread transition code. 
>>>> They will have to be co-generated anyway, so I don't see an 
>>>> advantage in terms of reduce the amount of generated code; I also 
>>>> don't see an (obvious) performance edge, since in the common case 
>>>> the nmethod will have to call the transition stub code.
>>>>
>>>> Using a Cleaner to do this seems like the wrong tool for the job - 
>>>> since the Cleaner is tuned for heap allocations - and here we have 
>>>> to rely on it to manage the code cache, whose size is typically much 
>>>> smaller, and totally opaque to the cleaner anyway. So this presents 
>>>> all the thorny problems that direct ByteBuffer deallocation via 
>>>> Cleaner also presents: objects that are not deemed "very important" 
>>>> by the cleaner are left around hanging, and you eventually run out 
>>>> of space. ByteBuffers did some heroic attempts to try to kick 
>>>> Cleaner, and add retries with exponential backoff - but I don't 
>>>> think we wanna go there for such a low level API as this one?
>>>>
>>>> Cheers
>>>> Maurizio
>>>>