Unsafe vs MemorySegments / Bounds checking...
Brian S O'Neill
bronee at gmail.com
Wed Oct 30 22:11:35 UTC 2024
I see some improvement when I force inline a few of my methods, but I
think they would eventually inline anyway because they're "hot". That
can take a while, though. There's also incomplete inlining within the
VarHandle method chain. In many other cases the inlining is complete,
due to forced inlining within the JDK methods.
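To be concrete, the "VarHandle method chain" I mean is a plain layout
var handle access, roughly like the sketch below (simplified, not the
actual library code; it assumes the JDK 22+ layout var handles which
take a segment plus a long byte offset):

    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.lang.invoke.VarHandle;

    class VarHandleAccessSketch {
        // Unaligned int access at an arbitrary byte offset.
        private static final VarHandle INT_HANDLE =
            ValueLayout.JAVA_INT_UNALIGNED.varHandle();

        static int readInt(MemorySegment segment, long offset) {
            // Each call runs through the var handle invoker chain:
            // liveness check, bounds check, then the raw access. All
            // of those steps need to inline for this to collapse into
            // a single load.
            return (int) INT_HANDLE.get(segment, offset);
        }
    }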
If the VarHandle approach is preferred (I don't think it is), then the
chain needs to shrink, or else more aggressive inlining might be
needed. I don't think aggressive inlining is a good idea.
The VarHandle approach actually works quite well in a micro benchmark,
and that's why I think it looks like a solved problem. Within a more
complex codebase, the compiler is doing much more work: recompiling,
making new inlining decisions, giving up, etc.
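For reference, the shape of the micro benchmark I mean is roughly the
following JMH sketch (made-up names, random 4-byte aligned offsets
into an off-heap segment; not my actual benchmark code):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.lang.invoke.VarHandle;
    import java.util.Random;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    public class RandomAccessBench {
        private static final int SIZE = 1 << 20; // 1 MiB segment

        private static final VarHandle INT_HANDLE =
            ValueLayout.JAVA_INT.varHandle();

        private MemorySegment segment;
        private long[] offsets;

        @Setup
        public void setup() {
            segment = Arena.ofAuto().allocate(SIZE, 4);
            Random rnd = new Random(42);
            offsets = new long[1024];
            for (int i = 0; i < offsets.length; i++) {
                // random 4-byte aligned offsets within bounds
                offsets[i] = rnd.nextInt(SIZE / 4) * 4L;
            }
        }

        @Benchmark
        public int sumRandomInts() {
            int sum = 0;
            for (long offset : offsets) {
                sum += (int) INT_HANDLE.get(segment, offset);
            }
            return sum;
        }
    }

In isolation like this, everything inlines and the bounds checks are
cheap; the problems only show up in the larger codebase.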
I don't know the details of how HotSpot works, but it seems to me that
if I have a large method (there are a few of them) with code which is
usually dead, then the method can be compiled with all sorts of
inlining optimizations because the live portion is relatively small.
When the dead code becomes "undead", the code needs to be deoptimized
and recompiled. Some of the failed inlining decisions appear to occur
after the successful ones, so perhaps this is because the method got
much bigger? The compiler output is noisy, so I don't know exactly
what's happening.
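Roughly the shape I have in mind, as a contrived sketch (not the
actual code, and the names are made up):

    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    final class LargeMethodSketch {
        int process(MemorySegment segment, long offset, boolean rare) {
            // Hot path: while the branch below has never been taken,
            // the profile treats it as dead, so the compiled method is
            // small and the segment access inlines nicely.
            int value = segment.get(ValueLayout.JAVA_INT_UNALIGNED, offset);

            if (rare) {
                // When this path finally runs, the compiled code hits
                // an uncommon trap, deoptimizes, and the method is
                // later recompiled with the much larger branch
                // included, possibly with different inlining decisions.
                value = repair(segment, offset, value);
            }
            return value;
        }

        private int repair(MemorySegment segment, long offset, int value) {
            // stand-in for a large, rarely executed body
            return value;
        }
    }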
The obvious workaround would be to make the methods smaller, but they're
not structured in a way that makes this easy. It's also very delicate
and critical code, and any changes could introduce severe database
corruption.
On 2024-10-30 02:49 PM, Maurizio Cimadamore wrote:
>
> On 30/10/2024 20:28, Brian S O'Neill wrote:
>> The "smoking gun" actually has been identified. The call chain is too
>> large and inlining gives up. Things improve a bit when I can force
>> inlining, but I shouldn't have to do that.
> Which method fails to inline? Is that a FFM method or a method in your
> library? How much do things improve when you force inlining? And, I
> assume this is with the var handle combinator approach?
>> Should I just go back to square one and write the code in what looks
>> to be the most straightforward, and report the results? As I recall,
>> the performance regression was easier to spot even in a micro benchmark.
>
> I think that might be a good idea. It would be good if we could isolate
> a part of your library where the regression is coming from -- while it's
> possible this is a "death by a thousand cuts" situation, it is typically
> not the case.
>
> What I would like to do in the meantime is (when I have some cycles) to
> write a benchmark which tests memory segment in a loop with random
> access (e.g. using random offsets). And then evaluate the costs
> associated with the various approaches in isolation and see if anything
> pops up. I have a couple of ideas on how to possibly improve the story
> for the unbounded MAX_VALUE access -- but I'd like to see where we are
> w.r.t. random access in vanilla Java 24 first.
>
> Cheers
> Maurizio
>
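(For context, the "unbounded MAX_VALUE access" refers to reinterpreting
a raw address as a Long.MAX_VALUE sized segment, so that the bounds
check effectively never fails. A minimal sketch; note that
reinterpret() is a restricted method:

    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    final class UnboundedAccessSketch {
        static int readIntAt(long address, long offset) {
            // Zero-length segment at the raw address, reinterpreted
            // with an "unbounded" size.
            MemorySegment all = MemorySegment.ofAddress(address)
                    .reinterpret(Long.MAX_VALUE);
            return all.get(ValueLayout.JAVA_INT_UNALIGNED, offset);
        }
    }
)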