status of VM long loop optimizations - call for action
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Dec 16 11:12:07 UTC 2021
On 13/12/2021 22:10, Maurizio Cimadamore wrote:
> That's odd - I mean, the BindingContext is used when setting up
> downcall method handles, or upcall stubs. But should not be invoked in
> the hot path.
Correction: the ofAllocator call you see might in fact even be in a hot
path. A downcall method handle sometimes has to allocate memory for the
temp buffers it uses. When that happens, the invocation is wrapped with
a try-with-resources (well, an MH chain equivalent to that is generated)
and a new "binding context" with a SegmentAllocator is created. This
should happen only when structs that are too big are passed by
reference by the ABI (I think that happens on Windows) - so we have to
create a temp segment holding the struct, and pass the segment pointer
to the underlying native function. The temp struct is then destroyed
after the call.
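
As a rough illustration of the shape described above, here is a minimal
sketch in plain Java. It is not the actual linker code: CallContext,
sumFields and callWithStructByReference are hypothetical names made up
for this example, a direct ByteBuffer stands in for the temp segment,
and an ordinary Java method stands in for the native function.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class TempBufferSketch {

    /** Hypothetical stand-in for the per-call context; NOT the JDK-internal class. */
    static final class CallContext implements AutoCloseable {
        private ByteBuffer buffer;   // temp storage owned by this context
        private boolean closed;

        /** Allocates temp storage that is only valid while the context is open. */
        ByteBuffer allocate(int bytes) {
            if (closed) {
                throw new IllegalStateException("context closed");
            }
            buffer = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
            return buffer;
        }

        /** Drops the temp storage; models "the temp struct is destroyed after the call". */
        @Override
        public void close() {
            closed = true;
            buffer = null;
        }
    }

    /** Hypothetical "native" function that receives a large struct by reference. */
    static long sumFields(ByteBuffer structRef, int fieldCount) {
        long sum = 0;
        for (int i = 0; i < fieldCount; i++) {
            sum += structRef.getLong(i * Long.BYTES);
        }
        return sum;
    }

    /** Copies the struct into a temp buffer, passes the buffer, then releases it. */
    static long callWithStructByReference(long[] fields) {
        try (CallContext ctx = new CallContext()) {
            ByteBuffer temp = ctx.allocate(fields.length * Long.BYTES);
            for (int i = 0; i < fields.length; i++) {
                temp.putLong(i * Long.BYTES, fields[i]);
            }
            return sumFields(temp, fields.length);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithStructByReference(new long[] {1, 2, 3, 4}));
    }
}
```

The point is only that a fresh context object is created on every such
call, which is why an allocation can show up even on a hot path.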
Upcalls also need an allocator, in case they receive structs by value
(again, a temp segment might need to be allocated for the duration of
the upcall).
So, even if your downcall is fully intrinsified, you might still see
calls to BindingContext::ofAllocator, depending on the shape of the
called function. It is possible that C2 might have issues scalarizing
the Binding.Context allocation - but that's a separate problem from the
one we were discussing (the impact of long loop optimizations).
On that topic, I see that Roland has submitted a PR for the remaining
perf issue we have seen in our micro benchmarks:
https://github.com/openjdk/jdk18/pull/35
I expect that, once integrated, we should then have full performance
parity with current workarounds.
Maurizio