status of VM long loop optimizations - call for action
Rado Smogura
mail at smogura.eu
Thu Dec 16 16:24:18 UTC 2021
Hi,
I don't know the details of the underlying ABI, but I don't think there
should be any need to allocate additional structs.
POSIX read takes 3 arguments, which should fit in registers, and all
of them are 32- or 64-bit values:
(long)mh$.invokeExact(__fd, __buf, __nbytes)
This happens in both cases, whether buf is a MemorySegment or a
MemoryAddress; the rest are primitives.
Kind regards,
Rado
On 16.12.2021 12:12, Maurizio Cimadamore wrote:
>
> On 13/12/2021 22:10, Maurizio Cimadamore wrote:
>> That's odd - I mean, the BindingContext is used when setting up
>> downcall method handles, or upcall stubs. But should not be invoked
>> in the hot path.
>
> Correction: the ofAllocator call you see might in fact even be in a
> hot path. A downcall method handle sometimes has to allocate memory
> for the temp buffers it uses. When that happens, the invocation is
> wrapped with a try-with-resources (well a MH chain equivalent to that
> is generated) and a new "binding context" with a SegmentAllocator is
> created. This should happen only when structs that are too big are
> passed by reference by the ABI (I think that happens on Windows) - so
> we have to create a temp segment holding the struct, and pass the
> segment pointer to the underlying native function. The temp struct is
> then destroyed after the call.
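
The "MH chain equivalent to try-with-resources" described above can be
sketched with the public java.lang.invoke API. This is only an
illustration and not the JDK's internal code: TempContext, target,
cleanup, wrapped and demo are hypothetical names, and the "native call"
is a plain Java method. It assumes MethodHandles.tryFinally supplies the
try/finally shape and MethodHandles.collectArguments supplies a fresh
context for every invocation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

class TryFinallySketch {
    // Hypothetical stand-in for a binding context that owns temp buffers.
    static final class TempContext implements AutoCloseable {
        static final List<String> LOG = new ArrayList<>();
        TempContext() { LOG.add("open"); }
        @Override public void close() { LOG.add("close"); }
    }

    // Hypothetical stand-in for the native call: context first, then one argument.
    static long target(TempContext ctx, long arg) {
        TempContext.LOG.add("call");
        return arg * 2;
    }

    // Cleanup in the shape tryFinally expects: (Throwable, result, leading target args).
    static long cleanup(Throwable t, long result, TempContext ctx) {
        ctx.close();   // runs whether target returned normally or threw
        return result; // pass the original result through
    }

    // Builds a (long)long handle that creates, uses and closes a context per call.
    static MethodHandle wrapped() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(TryFinallySketch.class, "target",
                MethodType.methodType(long.class, TempContext.class, long.class));
        MethodHandle cleanup = lookup.findStatic(TryFinallySketch.class, "cleanup",
                MethodType.methodType(long.class, Throwable.class, long.class, TempContext.class));
        MethodHandle factory = lookup.findConstructor(TempContext.class,
                MethodType.methodType(void.class));
        // try { target(ctx, arg) } finally { cleanup(...) }
        MethodHandle guarded = MethodHandles.tryFinally(target, cleanup);
        // Fill the leading context parameter with a newly constructed context.
        return MethodHandles.collectArguments(guarded, 0, factory);
    }

    // Convenience wrapper that hides the checked Throwable from callers.
    static long demo(long arg) {
        try {
            return (long) wrapped().invokeExact(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Calling TryFinallySketch.demo(21L) returns 42 and records "open",
"call", "close" in TempContext.LOG. The context object created on every
call in this sketch plays the role of the allocation that C2 would have
to scalarize in the real downcall path.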
>
> Upcalls also need an allocator, in case they receive structs by values
> (again, a temp segment might need to be allocated for the duration of
> the upcall).
>
> So, even if your downcall is fully intrinsified, you might still see
> calls to BindingContext::ofAllocator, depending on the shape of the
> called function. It is possible that C2 might have issues
> scalarizing the Binding.Context allocation - but that's a separate
> problem from the one we were discussing (the impact of long loop
> optimizations).
>
> On that topic, I see that Roland has submitted a PR for the remaining
> perf issue we have seen in our micro benchmarks:
>
> https://github.com/openjdk/jdk18/pull/35
>
> I expect that, once integrated, we should then have full performance
> parity with current workarounds.
>
> Maurizio
>