Unsafe vs MemorySegments / Bounds checking...
Brian S O'Neill
bronee at gmail.com
Wed Oct 30 20:28:01 UTC 2024
On 2024-10-30 12:32 PM, Maurizio Cimadamore wrote:
>
>>
>
> In this case there is no "smoking gun" that links any of the mentioned
> inlining failures to the actual performance regression you are
> experiencing. They might be related - they might not be.
>
> We have tried to be quite responsive here and elsewhere when new issues
> were presented to us. If the problem is well understood and there's
> something we can do about it, we typically tend to fix things quickly.
> Case in point:
>
> https://git.openjdk.org/jdk/pull/21764
>
> In this case, things are less clear and it will take more work to
> identify the "real" root cause of what you are seeing.
>
The "smoking gun" actually has been identified. The call chain is too
large and inlining gives up. Things improve a bit when I can force
inlining, but I shouldn't have to do that.
Do I have to write a large method to prove this in isolation? I don't
understand why inlining gives up in the first place, and so I don't know
how to write a standalone method that breaks it.
>
> Well, once you have "more direct" memory access var handles, you might
> want "more direct" copy methods...
That's a slippery slope argument, but more direct copy methods would
help too.
>
> We have restricted API points - but a very limited number of them, to do
> things that cannot be done otherwise (such as retroactively attach
> spatial and temporal bounds on a raw memory segment obtained from native
> code). And we should keep it that way.
If the features and performance are possible without new API points,
that's fine. Having fewer locations where restricted checks must be made
is a good thing.
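For concreteness, the restricted API point mentioned above is MemorySegment.reinterpret; a minimal sketch of attaching spatial bounds to a raw address (here the "raw" address comes from an arena allocation rather than real native code):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ReinterpretDemo {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // A properly bounded segment: 16 bytes, confined to the arena.
            MemorySegment seg = arena.allocate(16);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);

            // Simulate a raw pointer returned from native code:
            // a zero-length segment wrapping only the address.
            MemorySegment raw = MemorySegment.ofAddress(seg.address());

            // Restricted API point: retroactively attach a 16-byte bound.
            MemorySegment bounded = raw.reinterpret(16);
            System.out.println(bounded.get(ValueLayout.JAVA_LONG, 0)); // 42
        }
    }
}
```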
>
>>
>>>
>>> IMHO the most maintainable solution for your code is to use a segment
>>> whose bound is Long.MAX_VALUE. That won't be _exactly_ like Unsafe
>>> (because of the sign check), but it will be pretty close -- and at
>>> least you won't have to maintain long chains of adapted var handle
>>> and cross your fingers that everything optimizes correctly everywhere
>>> (which I suspect will make inlining of the methods in your library a
>>> bit more predictable).
>>
>> I tried this originally and again just now, but the performance is
>> worse, by about 1.5x. The sign check cannot be the primary issue.
>
> This is IMHO the main thing that needs to be investigated. I tend to
> agree that this strategy should perform better than it does today.
>
Should I just go back to square one, write the code in what looks to be
the most straightforward way, and report the results? As I recall, the
performance regression was easy to spot even in a micro benchmark.
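For reference, the Long.MAX_VALUE-bound strategy under discussion can be sketched roughly as follows (an "everything" segment over the whole address space, accessed at absolute offsets; this is my reading of the suggestion, not code from either of our libraries):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class EverythingSegment {
    // Base address 0, size Long.MAX_VALUE: Unsafe-style absolute addressing,
    // leaving effectively only the sign check on the offset.
    static final MemorySegment EVERYTHING =
            MemorySegment.ofAddress(0).reinterpret(Long.MAX_VALUE);

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(8);
            // Write through the everything-segment at an absolute address...
            EVERYTHING.set(ValueLayout.JAVA_LONG, seg.address(), 7L);
            // ...and read it back through the bounded segment.
            System.out.println(seg.get(ValueLayout.JAVA_LONG, 0)); // 7
        }
    }
}
```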