status of VM long loop optimizations - call for action

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Dec 16 11:12:07 UTC 2021


On 13/12/2021 22:10, Maurizio Cimadamore wrote:
> That's odd - I mean, the BindingContext is used when setting up 
> downcall method handles, or upcall stubs. But should not be invoked in 
> the hot path. 

Correction: the ofAllocator call you see might in fact even be in a hot 
path. A downcall method handle sometimes has to allocate memory for the 
temp buffers it uses. When that happens, the invocation is wrapped with 
a try-with-resources (well a MH chain equivalent to that is generated) 
and a new "binding context" with a SegmentAllocator is created. This 
should happen only when structs that are too big are passed by 
reference by the ABI (I think that happens on Windows) - so we have to 
create a temp segment holding the struct, and pass the segment pointer 
to the underlying native function. The temp struct is then destroyed 
after the call.
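
To illustrate what "a MH chain equivalent to try-with-resources" can 
look like, here is a minimal sketch built only on the public 
java.lang.invoke API. It is not the actual linker code: TempContext, 
target, cleanup and run are hypothetical names standing in for the 
internal binding context, the native call and the generated cleanup 
step. MethodHandles.tryFinally plays the role of the try/finally block, 
and MethodHandles.collectArguments creates a fresh context on every 
invocation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

final class TempContextSketch {

    // Hypothetical stand-in for the linker's internal binding context:
    // a per-call resource that must be released after the call.
    static final class TempContext implements AutoCloseable {
        final List<String> log;

        TempContext(List<String> log) {
            this.log = log;
            log.add("open");
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    // Stand-in for the native call that needs temporary storage.
    static int target(TempContext ctx, int x) {
        ctx.log.add("call");
        return x * 2;
    }

    // Cleanup shape required by tryFinally: (Throwable, result, leading args...).
    // It runs whether or not target threw; a pending exception is rethrown afterwards.
    static int cleanup(Throwable t, int result, TempContext ctx) {
        ctx.close();
        return result;
    }

    // Builds a handle of type (List, int)int that opens a context,
    // runs the call, and always closes the context.
    static MethodHandle wrapped() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(TempContextSketch.class, "target",
                MethodType.methodType(int.class, TempContext.class, int.class));
        MethodHandle cleanup = lookup.findStatic(TempContextSketch.class, "cleanup",
                MethodType.methodType(int.class, Throwable.class, int.class, TempContext.class));
        MethodHandle newContext = lookup.findConstructor(TempContext.class,
                MethodType.methodType(void.class, List.class));

        // (TempContext, int)int with try/finally semantics
        MethodHandle guarded = MethodHandles.tryFinally(target, cleanup);
        // Replace the TempContext parameter with the constructor's parameter,
        // so a new context is created on every invocation.
        return MethodHandles.collectArguments(guarded, 0, newContext);
    }

    // Convenience wrapper so callers do not have to deal with Throwable.
    static int run(List<String> log, int x) {
        try {
            return (int) wrapped().invoke(log, x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(run(log, 21) + " " + log);
    }
}
```

The point of the sketch is that the context object is allocated on each 
call, even though the surrounding handle chain is constant.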

Upcalls also need an allocator, in case they receive structs by value 
(again, a temp segment might need to be allocated for the duration of 
the upcall).

So, even if your downcall is fully intrinsified, you might still see 
calls to BindingContext::ofAllocator, depending on the shape of the 
called function. It is possible that C2 might have issues scalarizing 
the Binding.Context allocation - but that's a separate problem from the 
one we were discussing (the impact of long loop optimizations).

On that topic, I see that Roland has submitted a PR for the remaining 
perf issue we have seen in our microbenchmarks:

https://github.com/openjdk/jdk18/pull/35

I expect that, once integrated, we should then have full performance 
parity with current workarounds.

Maurizio



More information about the panama-dev mailing list