status of VM long loop optimizations - call for action
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Dec 16 16:35:33 UTC 2021
On 16/12/2021 16:24, Rado Smogura wrote:
>
> Hi,
>
> I don't know the details of the underlying ABI, but I think there
> should be no need to allocate additional structs.
>
> For POSIX read we pass 3 arguments, which should fit in registers,
> and all are 32/64-bit values
>
> (long)mh$.invokeExact(__fd, __buf, __nbytes)
>
> This happens both when buf is a MemorySegment and when it is a
> MemoryAddress; the rest are primitives.
>
What is the signature of the native function? Is the argument
corresponding to __buf a struct or a pointer?
Maurizio
> Kind regards,
>
> Rado
>
> On 16.12.2021 12:12, Maurizio Cimadamore wrote:
>>
>> On 13/12/2021 22:10, Maurizio Cimadamore wrote:
>>> That's odd - I mean, the BindingContext is used when setting up
>>> downcall method handles, or upcall stubs. But should not be invoked
>>> in the hot path.
>>
>> Correction: the ofAllocator call you see might in fact even be in a
>> hot path. A downcall method handle sometimes has to allocate memory
>> for the temp buffers it uses. When that happens, the invocation is
>> wrapped with a try-with-resources (well, a MH chain equivalent to
>> that is generated) and a new "binding context" with a
>> SegmentAllocator is created. This should happen only when structs
>> that are too big are passed by reference by the ABI (I think that
>> happens on Windows) - so we have to create a temp segment holding
>> the struct, and pass the segment pointer to the underlying native
>> function. The temp struct is then destroyed after the call.
>>
>> Upcalls also need an allocator, in case they receive structs by
>> value (again, a temp segment might need to be allocated for the
>> duration of the upcall).
>>
>> So, even if your downcall is fully intrinsified, you might still see
>> calls to BindingContext::ofAllocator, depending on the shape of the
>> called function. It is possible that C2 might have issues
>> scalarizing the Binding.Context allocation - but that's a separate
>> problem from the one we were discussing (the impact of long loop
>> optimizations).
>>
>> On that topic, I see that Roland has submitted a PR for the remaining
>> perf issue we have seen in our micro benchmarks:
>>
>> https://github.com/openjdk/jdk18/pull/35
>>
>> I expect that, once integrated, we should then have full performance
>> parity with current workarounds.
>>
>> Maurizio
>>
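
For illustration only, the temp-buffer behaviour described in the quoted
message above (allocate a temporary segment for a struct passed by
reference, make the call, release the segment) can be modelled roughly as
follows. This is a plain-Java sketch, not the actual linker code:
TempContext, sumFields and callWithStructByReference are hypothetical
names, and a ByteBuffer stands in for the temp segment.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class TempBufferSketch {

    // Hypothetical stand-in for the internal binding context: it hands out
    // temporary buffers and refuses further allocation once closed.
    static final class TempContext implements AutoCloseable {
        private final List<ByteBuffer> live = new ArrayList<>();
        private boolean closed;

        ByteBuffer allocate(int bytes) {
            if (closed) {
                throw new IllegalStateException("context closed");
            }
            ByteBuffer buf = ByteBuffer.allocate(bytes);
            live.add(buf);
            return buf;
        }

        @Override
        public void close() {
            live.clear();   // drop all temp buffers
            closed = true;
        }
    }

    // Hypothetical "native" target that receives a struct by reference
    // and sums its 64-bit fields.
    static long sumFields(ByteBuffer structPtr, int fieldCount) {
        long sum = 0;
        for (int i = 0; i < fieldCount; i++) {
            sum += structPtr.getLong(i * Long.BYTES);
        }
        return sum;
    }

    // Models the wrapped invocation: copy the struct into a temp buffer,
    // pass the buffer to the target, release it when the call returns.
    static long callWithStructByReference(long[] structFields) {
        try (TempContext ctx = new TempContext()) {
            ByteBuffer tmp = ctx.allocate(structFields.length * Long.BYTES);
            for (int i = 0; i < structFields.length; i++) {
                tmp.putLong(i * Long.BYTES, structFields[i]);
            }
            return sumFields(tmp, structFields.length);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithStructByReference(new long[] {1, 2, 3, 4}));
    }
}
```

The point of the sketch is that a context object is created on every
such call, which is why its allocation can show up in a hot path unless
the JIT manages to scalarize it.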
More information about the panama-dev
mailing list