status of VM long loop optimizations - call for action
Rado Smogura
mail at smogura.eu
Thu Dec 16 16:45:07 UTC 2021
Hi,
Here's the signature from the man pages:

ssize_t read(int fd, void *buf, size_t count);

And the one generated by jextract:

public static long read(int __fd, Addressable __buf, long __nbytes)
The buf comes from a pooled allocator, so it is a previously allocated
memory segment. I tried passing buf both as a MemorySegment and as a
MemoryAddress.
Kind regards,
Rado
On 16.12.2021 17:35, Maurizio Cimadamore wrote:
>
>
> On 16/12/2021 16:24, Rado Smogura wrote:
>>
>> Hi,
>>
>> I don't know the details of the underlying ABI, but I think there
>> should be no need to allocate additional structs.
>>
>> For POSIX read we pass 3 arguments, which should fit in registers, and
>> all are 32/64-bit values
>>
>> (long)mh$.invokeExact(__fd, __buf, __nbytes)
>>
>> This happens in both cases, whether buf is a MemorySegment or a
>> MemoryAddress; the rest are primitives.
>>
> What is the signature of the native function? Is the argument
> corresponding to __buf a struct or a pointer?
>
> Maurizio
>
>> Kind regards,
>>
>> Rado
>>
>> On 16.12.2021 12:12, Maurizio Cimadamore wrote:
>>>
>>> On 13/12/2021 22:10, Maurizio Cimadamore wrote:
>>>> That's odd - I mean, the BindingContext is used when setting up
>>>> downcall method handles, or upcall stubs. But should not be invoked
>>>> in the hot path.
>>>
>>> Correction: the ofAllocator call you see might in fact be in a hot
>>> path. A downcall method handle sometimes has to allocate memory
>>> for the temp buffers it uses. When that happens, the invocation is
>>> wrapped with a try-with-resources (well, a MH chain equivalent to
>>> that is generated) and a new "binding context" with a
>>> SegmentAllocator is created. This should happen only when structs
>>> that are too big are passed by reference by the ABI (I think that
>>> happens on Windows) - so we have to create a temp segment holding
>>> the struct, and pass the segment pointer to the underlying native
>>> function. The temp struct is then destroyed after the call.
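
A minimal sketch of the wrapping described above, in plain Java rather
than the actual linker internals. TempContext, sumStruct and
callWithTempCopy are hypothetical names made up for illustration; a
direct ByteBuffer stands in for the temp segment, and an ordinary Java
method stands in for the native function that receives the struct by
reference:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class TempContextSketch {

    // Hypothetical stand-in for the "binding context": it hands out
    // temp buffers and drops them when the context is closed.
    static final class TempContext implements AutoCloseable {
        private final List<ByteBuffer> live = new ArrayList<>();
        private boolean closed;

        ByteBuffer allocate(int bytes) {
            if (closed) {
                throw new IllegalStateException("context closed");
            }
            ByteBuffer buf = ByteBuffer.allocateDirect(bytes);
            live.add(buf);
            return buf;
        }

        int liveCount() { return live.size(); }

        boolean isClosed() { return closed; }

        @Override
        public void close() {
            live.clear();
            closed = true;
        }
    }

    // Hypothetical "native" function taking a pointer to
    // struct { long a; long b; long c; }.
    static long sumStruct(ByteBuffer structPtr) {
        return structPtr.getLong(0) + structPtr.getLong(8) + structPtr.getLong(16);
    }

    // Wrapper: copy the by-value struct into a temp buffer, pass the
    // buffer to the callee, and release the buffer after the call.
    static long callWithTempCopy(long a, long b, long c) {
        try (TempContext ctx = new TempContext()) {
            ByteBuffer tmp = ctx.allocate(24);
            tmp.putLong(0, a).putLong(8, b).putLong(16, c);
            return sumStruct(tmp);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTempCopy(1, 2, 3)); // prints 6
    }
}
```

The point of the model is only that a fresh context object is created
on every call, which is why its allocation can show up in a hot path
unless the JIT manages to eliminate it.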
>>>
>>> Upcalls also need an allocator, in case they receive structs by
>>> value (again, a temp segment might need to be allocated for the
>>> duration of the upcall).
>>>
>>> So, even if your downcall is fully intrinsified, you might still see
>>> calls to BindingContext::ofAllocator, depending on the shape of the
>>> called function. It is possible that C2 might have issues
>>> scalarizing the Binding.Context allocation - but that's a separate
>>> problem from the one we were discussing (the impact of long loop
>>> optimizations).
>>>
>>> On that topic, I see that Roland has submitted a PR for the
>>> remaining perf issue we have seen in our micro benchmarks:
>>>
>>> https://github.com/openjdk/jdk18/pull/35
>>>
>>> I expect that, once integrated, we should then have full performance
>>> parity with current workarounds.
>>>
>>> Maurizio
>>>
More information about the panama-dev
mailing list