Unsafe vs MemorySegments / Bounds checking...
Brian S O'Neill
bronee at gmail.com
Wed Oct 30 20:28:01 UTC 2024
On 2024-10-30 12:32 PM, Maurizio Cimadamore wrote:
>
>>
>
> In this case there is no "smoking gun" that links any of the mentioned
> inlining failures to the actual performance regression you are
> experiencing. They might be related - they might not be.
>
> We have tried to be quite responsive here and elsewhere when new issues
> were presented to us. If the problem is well understood and there's
> something we can do about it, we typically tend to fix things quickly.
> Case in point:
>
> https://git.openjdk.org/jdk/pull/21764
>
> In this case, things are less clear and it will take more work to
> identify the "real" root cause of what you are seeing.
>
The "smoking gun" actually has been identified. The call chain is too
large and inlining gives up. Things improve a bit when I can force
inlining, but I shouldn't have to do that.
Do I have to write a large method to prove this in isolation? I don't
understand why inlining gives up in the first place, and so I don't know
how to write a standalone method that breaks it.
>
> Well, once you have "more direct" memory access var handles, you might
> want "more direct" copy methods...
That's a slippery slope argument, but more direct copy methods would
help too.
>
> We have restricted API points - but a very limited number of them, to do
> things that cannot be done otherwise (such as retroactively attach
> spatial and temporal bounds on a raw memory segment obtained from native
> code). And we should keep it that way.
If the features and performance are possible without new API points,
that's fine. Having fewer locations where restricted checks must be made
is a good thing.
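For concreteness, the restricted API point mentioned above is MemorySegment.reinterpret; a minimal sketch of attaching spatial bounds to a raw address (here the "raw" address comes from an arena allocation rather than real native code):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ReinterpretDemo {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // A properly bounded segment: 16 bytes, confined to the arena.
            MemorySegment seg = arena.allocate(16);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);

            // Simulate a raw pointer returned from native code:
            // a zero-length segment wrapping only the address.
            MemorySegment raw = MemorySegment.ofAddress(seg.address());

            // Restricted API point: retroactively attach a 16-byte bound.
            MemorySegment bounded = raw.reinterpret(16);
            System.out.println(bounded.get(ValueLayout.JAVA_LONG, 0)); // 42
        }
    }
}
```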
>
>>
>>>
>>> IMHO the most maintainable solution for your code is to use a segment
>>> whose bound is Long.MAX_VALUE. That won't be _exactly_ like Unsafe
>>> (because of the sign check), but it will be pretty close -- and at
>>> least you won't have to maintain long chains of adapted var handle
>>> and cross your fingers that everything optimizes correctly everywhere
>>> (which I suspect will make inlining of the methods in your library a
>>> bit more predictable).
>>
>> I tried this originally and again just now, but the performance is
>> worse, by about 1.5x. The sign check cannot be the primary issue.
>
> This is IMHO the main thing that needs to be investigated. I tend to
> agree that this strategy should perform better than it does today.
>
Should I just go back to square one, write the code in what looks to be
the most straightforward way, and report the results? As I recall, the
performance regression was easy to spot even in a micro benchmark.
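For reference, the Long.MAX_VALUE-bound strategy under discussion can be sketched roughly as follows (an "everything" segment over the whole address space, accessed at absolute offsets; this is my reading of the suggestion, not code from either of our libraries):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class EverythingSegment {
    // Base address 0, size Long.MAX_VALUE: Unsafe-style absolute addressing,
    // leaving effectively only the sign check on the offset.
    static final MemorySegment EVERYTHING =
            MemorySegment.ofAddress(0).reinterpret(Long.MAX_VALUE);

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(8);
            // Write through the everything-segment at an absolute address...
            EVERYTHING.set(ValueLayout.JAVA_LONG, seg.address(), 7L);
            // ...and read it back through the bounded segment.
            System.out.println(seg.get(ValueLayout.JAVA_LONG, 0)); // 7
        }
    }
}
```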