status of VM long loop optimizations - call for action

Thu Dec 16 16:24:18 UTC 2021

Hi,

I don't know the details of the underlying ABI, but I think there should 
be no need to allocate additional structs.

For POSIX read we pass 3 arguments, which should fit in registers, and 
all are 32- or 64-bit values:

(long)mh$.invokeExact(__fd, __buf, __nbytes)

This happens in both cases, whether buf is a MemorySegment or a 
MemoryAddress; the rest are primitives.
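
To make the call shape concrete, here is a minimal sketch using only 
java.lang.invoke. It is not the real downcall handle: fakeRead is a 
hypothetical stand-in for the native read, and the buffer pointer is 
modelled as a raw long instead of a MemorySegment/MemoryAddress. It only 
shows that the handle takes three scalar arguments and returns a long, 
so nothing in the signature itself calls for a struct.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ReadCallShape {
    // Hypothetical stand-in for native read(2): reports nbytes as "bytes read".
    static long fakeRead(int fd, long buf, long nbytes) {
        return nbytes;
    }

    // Handle with the same shape as the downcall: (int, long, long) -> long.
    // Assumption: the pointer argument is represented as a plain long here.
    static MethodHandle readHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                ReadCallShape.class, "fakeRead",
                MethodType.methodType(long.class, int.class, long.class, long.class));
    }

    static long callRead(int fd, long buf, long nbytes) throws Throwable {
        MethodHandle mh = readHandle();
        // invokeExact needs the call site to match the handle type exactly,
        // hence the (long) cast on the result.
        return (long) mh.invokeExact(fd, buf, nbytes);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(callRead(0, 0L, 128L)); // prints 128
    }
}
```

All three arguments and the return value are primitives at the call 
site, which is why I would not expect any temp allocation for this 
particular function.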

Kind regards,

Rado

On 16.12.2021 12:12, Maurizio Cimadamore wrote:
>
> On 13/12/2021 22:10, Maurizio Cimadamore wrote:
>> That's odd - I mean, the BindingContext is used when setting up 
>> downcall method handles, or upcall stubs. But should not be invoked 
>> in the hot path. 
>
> Correction: the ofAllocator call you see might in fact even be in a 
> hot path. A downcall method handle sometimes has to allocate memory 
> for the temp buffers it uses. When that happens, the invocation is 
> wrapped with a try-with-resources (well a MH chain equivalent to that 
> is generated) and a new "binding context" with a SegmentAllocator is 
> created. This should happen only when structs that are too big are 
> passed by reference by the ABI (I think that happens on Windows) - so 
> we have to create a temp segment holding the struct, and pass the 
> segment pointer to the underlying native function. The temp struct is 
> then destroyed after the call.
>
> Upcalls also need an allocator, in case they receive structs by value 
> (again, a temp segment might need to be allocated for the duration of 
> the upcall).
>
> So, even if your downcall is fully intrinsified, you might still see 
> calls to BindingContext::ofAllocator, depending on the shape of the 
> called function. It is possible that C2 might have issues in 
> scalarizing the Binding.Context allocation - but that's a separate 
> problem from the one we were discussing (the impact of long loop 
> optimizations).
>
> On that topic, I see that Roland has submitted a PR for the remaining 
> perf issue we have seen in our micro benchmarks:
>
> https://github.com/openjdk/jdk18/pull/35
>
> I expect that, once integrated, we should then have full performance 
> parity with current workarounds.
>
> Maurizio
>