status of VM long loop optimizations - call for action
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Dec 16 16:57:39 UTC 2021
On 16/12/2021 16:45, Rado Smogura wrote:
> Hi,
>
>
> Here's the signature from the man pages:
> ssize_t read(int fd, void *buf, size_t count);
>
> And the one generated by jextract:
> public static long read(int __fd, Addressable __buf, long __nbytes)
>
>
> The buf is from a pooled allocator, so it's a previously allocated memory
> segment. I tried with buf passed as both MemorySegment and MemoryAddress.
Such a signature should never cause an allocator to be called - it's all
primitives.
Which OS/platform are you on?
Maurizio
>
>
> Kind regards,
>
> Rado
>
> On 16.12.2021 17:35, Maurizio Cimadamore wrote:
>>
>>
>> On 16/12/2021 16:24, Rado Smogura wrote:
>>>
>>> Hi,
>>>
>>> I don't know the details of the underlying ABI, but I think there
>>> should be no need to allocate additional structs.
>>>
>>> For POSIX read we pass 3 arguments, which should fit in registers,
>>> and all are 32/64-bit values
>>>
>>> (long)mh$.invokeExact(__fd, __buf, __nbytes)
>>>
>>> This happens in both cases, where buf is a MemorySegment and where it
>>> is a MemoryAddress; the rest are primitives.
>>>
>> What is the signature of the native function? Is the argument
>> corresponding to __buf a struct or a pointer?
>>
>> Maurizio
>>
>>> Kind regards,
>>>
>>> Rado
>>>
>>> On 16.12.2021 12:12, Maurizio Cimadamore wrote:
>>>>
>>>> On 13/12/2021 22:10, Maurizio Cimadamore wrote:
>>>>> That's odd - I mean, the BindingContext is used when setting up
>>>>> downcall method handles, or upcall stubs. But it should not be
>>>>> invoked in the hot path.
>>>>
>>>> Correction: the ofAllocator call you see might in fact even be in a
>>>> hot path. A downcall method handle sometimes has to allocate
>>>> memory for the temp buffers it uses. When that happens, the
>>>> invocation is wrapped with a try-with-resources (well, a MH chain
>>>> equivalent to that is generated) and a new "binding context" with a
>>>> SegmentAllocator is created. This should happen only when structs
>>>> that are too big are passed by reference by the ABI (I think that
>>>> happens on Windows) - so we have to create a temp segment holding
>>>> the struct, and pass the segment pointer to the underlying native
>>>> function. The temp struct is then destroyed after the call.
>>>>
>>>> Upcalls also need an allocator, in case they receive structs by
>>>> value (again, a temp segment might need to be allocated for the
>>>> duration of the upcall).
>>>>
>>>> So, even if your downcall is fully intrinsified, you might still
>>>> see calls to BindingContext::ofAllocator, depending on the shape of
>>>> the called function. It is possible that C2 might have issues
>>>> scalarizing the Binding.Context allocation - but that's a separate
>>>> problem from the one we were discussing (the impact of long loop
>>>> optimizations).
>>>>
>>>> On that topic, I see that Roland has submitted a PR for the
>>>> remaining perf issue we have seen in our micro benchmarks:
>>>>
>>>> https://github.com/openjdk/jdk18/pull/35
>>>>
>>>> I expect that, once integrated, we should then have full
>>>> performance parity with current workarounds.
>>>>
>>>> Maurizio
>>>>
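
For readers following the thread, the temp-buffer behaviour described in the correction above (allocate a temporary copy of the struct, pass a pointer to it, free it after the call) can be pictured roughly as below. This is only a sketch in plain Java and is not the JDK implementation: CallContext, nativeSum and callWithLargeStruct are hypothetical names made up for illustration, a direct ByteBuffer stands in for the temp segment, and the 24-byte struct layout is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class TempBufferSketch {

    // Hypothetical stand-in for the per-call "binding context":
    // it hands out temp buffers and drops them all when closed.
    static final class CallContext implements AutoCloseable {
        private final List<ByteBuffer> live = new ArrayList<>();
        private boolean closed;

        ByteBuffer allocate(int size) {
            if (closed) {
                throw new IllegalStateException("context already closed");
            }
            ByteBuffer buffer = ByteBuffer.allocateDirect(size);
            live.add(buffer);
            return buffer;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            // Release the references; the temp buffers do not outlive the call.
            live.clear();
            closed = true;
        }
    }

    // Hypothetical "native" function that receives a pointer to a
    // struct of three 64-bit fields (assumed layout, 24 bytes).
    static long nativeSum(ByteBuffer structPointer) {
        return structPointer.getLong(0)
                + structPointer.getLong(8)
                + structPointer.getLong(16);
    }

    // Shape of a call where the struct is passed by reference:
    // the invocation is wrapped in try-with-resources, so a context
    // (and its allocator) is created on every call.
    static long callWithLargeStruct(long x, long y, long z) {
        try (CallContext context = new CallContext()) {
            ByteBuffer temp = context.allocate(24);
            temp.putLong(0, x).putLong(8, y).putLong(16, z);
            return nativeSum(temp);
        }
    }
}
```

A call such as TempBufferSketch.callWithLargeStruct(1, 2, 3) creates and closes one context per invocation, which is the kind of per-call allocation discussed above. A function whose arguments are all primitives or pointers, like read, would not need the wrapping at all.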
More information about the panama-dev
mailing list