Unsafe vs MemorySegments / Bounds checking...
Brian S O'Neill
bronee at gmail.com
Wed Oct 30 22:11:35 UTC 2024
I see some improvement when I force inline a few of my methods, but I
think they would eventually inline anyway because they're "hot". That
can take a while, though. There's also incomplete inlining within the
VarHandle method chain. In many other cases the inlining is complete,
due to forced inlining within the JDK methods.
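To be concrete, the "VarHandle method chain" I mean is a plain layout
var handle access, roughly like the sketch below (simplified, not the
actual library code; it assumes the JDK 22+ layout var handles which
take a segment plus a long byte offset):

    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.lang.invoke.VarHandle;

    class VarHandleAccessSketch {
        // Unaligned int access at an arbitrary byte offset.
        private static final VarHandle INT_HANDLE =
            ValueLayout.JAVA_INT_UNALIGNED.varHandle();

        static int readInt(MemorySegment segment, long offset) {
            // Each call runs through the var handle invoker chain:
            // liveness check, bounds check, then the raw access. All
            // of those steps need to inline for this to collapse into
            // a single load.
            return (int) INT_HANDLE.get(segment, offset);
        }
    }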
If the VarHandle approach is preferred (I don't think it is), then the
chain needs to shrink, or else more aggressive inlining might be
needed. I don't think aggressive inlining is a good idea.
The VarHandle approach actually works quite well in a micro benchmark,
and that's why I think it looks like a solved problem. Within a more
complex codebase, the compiler is doing much more work: recompiling,
making new inlining decisions, giving up, etc.
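For reference, the shape of the micro benchmark I mean is roughly the
following JMH sketch (made-up names, random 4-byte aligned offsets
into an off-heap segment; not my actual benchmark code):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.lang.invoke.VarHandle;
    import java.util.Random;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    public class RandomAccessBench {
        private static final int SIZE = 1 << 20; // 1 MiB segment

        private static final VarHandle INT_HANDLE =
            ValueLayout.JAVA_INT.varHandle();

        private MemorySegment segment;
        private long[] offsets;

        @Setup
        public void setup() {
            segment = Arena.ofAuto().allocate(SIZE, 4);
            Random rnd = new Random(42);
            offsets = new long[1024];
            for (int i = 0; i < offsets.length; i++) {
                // random 4-byte aligned offsets within bounds
                offsets[i] = rnd.nextInt(SIZE / 4) * 4L;
            }
        }

        @Benchmark
        public int sumRandomInts() {
            int sum = 0;
            for (long offset : offsets) {
                sum += (int) INT_HANDLE.get(segment, offset);
            }
            return sum;
        }
    }

In isolation like this, everything inlines and the bounds checks are
cheap; the problems only show up in the larger codebase.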
I don't know the details of how HotSpot works, but it seems to me that
if I have a large method (there are a few of them) with code which is
usually dead, then the method can be compiled with all sorts of
inlining optimizations because the live portion is relatively small.
When the dead code becomes "undead", the code needs to be deoptimized
and recompiled. Some of the failed inlining decisions appear to occur
after the successful ones, so perhaps this is because the method got
much bigger? The compiler output is noisy, so I don't know exactly
what's happening.
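Roughly the shape I have in mind, as a contrived sketch (not the
actual code, and the names are made up):

    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    final class LargeMethodSketch {
        int process(MemorySegment segment, long offset, boolean rare) {
            // Hot path: while the branch below has never been taken,
            // the profile treats it as dead, so the compiled method is
            // small and the segment access inlines nicely.
            int value = segment.get(ValueLayout.JAVA_INT_UNALIGNED, offset);

            if (rare) {
                // When this path finally runs, the compiled code hits
                // an uncommon trap, deoptimizes, and the method is
                // later recompiled with the much larger branch
                // included, possibly with different inlining decisions.
                value = repair(segment, offset, value);
            }
            return value;
        }

        private int repair(MemorySegment segment, long offset, int value) {
            // stand-in for a large, rarely executed body
            return value;
        }
    }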
The obvious workaround would be to make the methods smaller, but they're
not structured in a way that makes this easy. It's also very delicate
and critical code, and any changes could introduce severe database
corruption.
On 2024-10-30 02:49 PM, Maurizio Cimadamore wrote:
>
> On 30/10/2024 20:28, Brian S O'Neill wrote:
>> The "smoking gun" actually has been identified. The call chain is too
>> large and inlining gives up. Things improve a bit when I can force
>> inlining, but I shouldn't have to do that.
> Which method fails to inline? Is that a FFM method or a method in your
> library? How much do things improve when you force inlining? And, I
> assume this is with the var handle combinator approach?
>> Should I just go back to square one and write the code in what looks
>> to be the most straightforward, and report the results? As I recall,
>> the performance regression was easier to spot even in a micro benchmark.
>
> I think that might be a good idea. It would be good if we could isolate
> a part of your library where the regression is coming from -- while it's
> possible this is a "death by a thousand cuts" situation, it is typically
> not the case.
>
> What I would like to do in the meantime is (when I have some cycles) to
> write a benchmark which tests memory segment in a loop with random
> access (e.g. using random offsets). And then evaluate the costs
> associated with the various approaches in isolation and see if anything
> pops up. I have a couple of ideas on how to possibly improve the story
> for the unbounded MAX_VALUE access -- but I'd like to see where we are
> w.r.t. random access in vanilla Java 24 first.
>
> Cheers
> Maurizio
>
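(For context, the "unbounded MAX_VALUE access" refers to reinterpreting
a raw address as a Long.MAX_VALUE sized segment, so that the bounds
check effectively never fails. A minimal sketch; note that
reinterpret() is a restricted method:

    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    final class UnboundedAccessSketch {
        static int readIntAt(long address, long offset) {
            // Zero-length segment at the raw address, reinterpreted
            // with an "unbounded" size.
            MemorySegment all = MemorySegment.ofAddress(address)
                    .reinterpret(Long.MAX_VALUE);
            return all.get(ValueLayout.JAVA_INT_UNALIGNED, offset);
        }
    }
)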