[foreign-abi] Intrinsify down calls

Fri Jun 19 13:07:15 UTC 2020

On 18/06/2020 13:29, Vladimir Ivanov wrote:
>
>>> Having a stub associated with a nmethod have some 
>>> drawbacks/limitations.
>>>
>>> For example, if you look at other MethodHandle linkers 
>>> (linkToVirtual et al), there are 2 variants: used by compiled code 
>>> and used by interpreter. If interpreter needs a stub, then a 
>>> different solution for resource management will be needed for it.
>>
>> This is not a problem, since the linkToNative linker stub always 
>> calls the fallback MethodHandle. Only the intrinsified version needs 
>> the transition stub.
>
> Yes, but that's a decision we may reconsider later if we decide it's 
> important to have better performance during startup/warmup (with 
> interpreter/C1).

Ok, so we can revisit once we get to that point.

>
>>> Sharing stubs between nmethods/native functions also becomes more 
>>> complicated: it requires some sort of reference counting to track 
>>> usages.
>>>
>>> Also, if sharing is not required (and all the native stubs have 
>>> unique nmethod users), then the stubs can be embedded into the 
>>> nmethod itself. Then the stubs will be freed along with the nmethod. 
>>> (It can also improve situation with code cache fragmentation unless 
>>> the stub is allocated right next to the user nmethod.)
>>
>>
>> This is what we started out with, but I think there were some 
>> concerns with regards to the slow path calls being emitted inline. 
>> IIRC there was also a concern of safepoints not being handled 
>> correctly in that case (don't quite remember the exact concern 
>> though). I also think it's annoying that we have to do manual 
>> register spill/fill of the return value around the slow path calls. 
>> Maybe We could have separate nodes for the slow path calls and use 
>> labels to jump back and forth?
>
> What we started at was the stub code placed inline around the call to 
> native code.
>
> What I meant is to keep the current shape of the call (call to 
> transition stub), but put the stub inside the nmethod with a separate 
> entry point. So, the stub code is separate, but it is managed as part 
> of nmethod it is used from.

Ah, I see. It sounds like a good idea, but also like it would require a 
lot more surgery.

>
> Regarding handling slow path, I'm not concerned about additional work 
> involved since it's not performance critical.
>
> And I don't see any benefit in making the slow path checks explicit in 
> Ideal IR considering the opportunities to optimize them are very limited.
>
>>> At the moment, the most promising solution to me is to associate the 
>>> JVM stubs with native library and free them all at once when the 
>>> library is unloaded.
>>
>>
>> I think we'd run into the same problem again with the code cache 
>> running out of space in the tests. There is only one library being 
>> used in that case as well. Maybe the size of the stubs can be reduced 
>> somewhat. I've used MethodHandles::adapter_code_size as the size of 
>> the stub which works, but I don't know how that number is determined. 
>> So, maybe it can still be made a lot smaller.
>
> It sounds more like a stress scenario than a real world use case. Stub 
> size reduction will just push the boundary further away.
>
> Please, remind me what were the problems caused by running out of 
> space in code cache. If there's no space left in code cache to 
> allocate a stub then intrinsification attempt can just bail out.

By default if there is no more space all compilation will be disabled. 
Or if -XX:+ExitOnFullCodeCache is set the VM will exit.

It sounds like we can improve on the stub management strategy later, so 
for now I'll go with the current implementation since it works. The 
largest part of the code would stay the same any ways.

Thanks,
Jorn

>>> Quick question: can you remind me why is the following line is 
>>> needed (from NativeMethodHandle.make()):
>>>
>>>   fallback.customize(); // needed
>>
>>
>> This is a left-over from earlier experimentation. It is no longer 
>> needed. (I thought I had already removed it before actually).
>
> Got it, thanks.
>
> Best regards,
> Vladimir Ivanov
>
>>>> On 29/04/2020 13:58, Maurizio Cimadamore wrote:
>>>>>
>>>>> On 29/04/2020 12:43, Jorn Vernee wrote:
>>>>>> There might be some options for attaching the stub to the nmethod 
>>>>>> itself, and then freeing it when the nmethod is freed. But maybe 
>>>>>> for now we should go back to emitting the thread state transition 
>>>>>> inline, and then improve upon this later. 
>>>>>
>>>>> +1
>>>>>
>>>>> After chatting with Jorn I'm not overly convinced of the wider 
>>>>> advantages to split the nmethod from the thread transition code. 
>>>>> They will have to be co-generated anyway, so I don't see an 
>>>>> advantage in terms of reduce the amount of generated code; I also 
>>>>> don't see an (obvious) performance edge, since in the common case 
>>>>> the nmethod will have to call the transition stub code.
>>>>>
>>>>> Using a Cleaner to do this seems like the wrong tool for the job - 
>>>>> since the Cleaner is tuned for heap allocations - and here we have 
>>>>> to rely on it to manage the code cache, whose size is typically 
>>>>> much smaller, and totally opaque to the cleaner anyway. So this 
>>>>> presents all the thorny problems that direct ByteBuffer 
>>>>> deallocation via Cleaner also presents: objects that are not 
>>>>> deemed "very important" by the cleaner are left around hanging, 
>>>>> and you eventually run out of space. ByteBuffers did some heroic 
>>>>> attempts to try to kick Cleaner, and add retries with exponential 
>>>>> backoff - but I don't think we wanna go there for such a low level 
>>>>> API as this one?
>>>>>
>>>>> Cheers
>>>>> Maurizio
>>>>>