Re: MemorySegment.ofAddress(...).reinterpret(...)

Mon Jul 10 14:33:01 UTC 2023

I decided to run the benchmark against my application twice in the same 
JVM, to allow for more "warmup". The entire test takes about 9 minutes 
to run for each iteration. With the original unsafe version, I see about 
a 2% improvement the second time. With the Panama version, there's no 
improvement the second time. The overall performance gap widens to 
about 3.5%.
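
A minimal sketch of this kind of two-pass, same-JVM measurement follows. It is an illustration only, not the harness actually used: the class name `RepeatedRun`, the helpers `timePasses` and `improvementPercent`, and the stand-in `runWorkload` are all hypothetical, and the real application benchmark would take the place of `runWorkload`.

```java
class RepeatedRun {
    // Runs the workload `passes` times in this JVM and returns the
    // elapsed wall-clock time of each pass, in nanoseconds.
    static long[] timePasses(Runnable workload, int passes) {
        long[] elapsed = new long[passes];
        for (int i = 0; i < passes; i++) {
            long start = System.nanoTime();
            workload.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    // Percentage improvement of a later pass over an earlier one;
    // positive means the later pass was faster.
    static double improvementPercent(long earlierNanos, long laterNanos) {
        return 100.0 * (earlierNanos - laterNanos) / earlierNanos;
    }

    // Written to by the stand-in workload so the JIT cannot discard the loop.
    static volatile long sink;

    // Hypothetical stand-in for the real application benchmark.
    static void runWorkload() {
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sum += i ^ (sum >>> 3);
        }
        sink = sum;
    }

    public static void main(String[] args) {
        long[] t = timePasses(RepeatedRun::runWorkload, 2);
        System.out.printf("pass 1: %d ms%n", t[0] / 1_000_000);
        System.out.printf("pass 2: %d ms%n", t[1] / 1_000_000);
        System.out.printf("second-pass improvement: %.2f%%%n",
                improvementPercent(t[0], t[1]));
    }
}
```

Running the Unsafe and Panama variants through the same loop, each in a fresh JVM, gives per-pass numbers that can be compared directly. A single pair of passes is still a noisy measurement, so the percentages are only as reliable as the run-to-run variance allows.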

I question the effectiveness of inlining. In a microbenchmark, full 
inlining is easy to observe. But in something more complex, how can I be 
certain that inlining is working? I don't have the luxury of using the 
ForceInline annotation myself. I can enable logging to observe inlining 
actions, but the application is quite complex, and this generates an 
unending stream of noise.

On 2023-07-10 02:31 AM, Maurizio Cimadamore wrote:

> AFAIK, all the work that went into hoisting bounds check with long 
> induction variables should already have taken care of eliminating bounds 
> checks in the vast majority of cases. I'm skeptical that the difference 
> you see is caused by a bounds check (especially one against 
> Long.MAX_VALUE, effectively a constant). I think a more detailed 
> benchmark is required here in order to assess exactly where the 
> performance is being lost, as there can be several factors.
> 
> I'm very very skeptical that the restricted method check is playing a 
> part in all of this. We have taken extra care to make the check fast, 
> and to cache the results of such check in a VM @Stable field, which is 
> treated as a true constant. We have benchmarks to show that no peak 
> performance is lost due to the restricted method check.
>