Unsafe vs MemorySegments / Bounds checking...
Brian S O'Neill
bronee at gmail.com
Tue Oct 29 18:35:42 UTC 2024
On 2024-10-29 10:44 AM, Maurizio Cimadamore wrote:
> Unfortunately we have never been able to come up with a reproducer for
> the slow down you are experiencing.
>
> If/when you have a standalone benchmark which shows the issue we will
> obviously take a look at it.
>
If you recall, I did send you a reproducer, and you did verify the
regression. This is what led you to come up with a strategy to define a
derived VarHandle. This helped somewhat, but you observed the inliner
giving up when the code was embedded in a very large method. JDK 23
appears to have introduced a regression that has made this worse.
Why is the inliner giving up? Is it because the VarHandle transform
introduced too many layers of indirection? What if an equivalent
VarHandle could be produced (or be built in) which skipped a few steps?
Would this allow the inlining to proceed as expected? Is there a
better strategy? As it stands, there's no official path forward for
replacing the Unsafe API in a way that retains its efficiency.
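For readers following along, here is a rough sketch of what a derived VarHandle of this kind can look like. It assumes JDK 22 or later (the finalized FFM API). The class, field, and method names are made up for illustration, and this is not necessarily the exact code from the earlier exchange. The idea is to bind a zero-based, maximum-length segment as the first coordinate, so that the remaining coordinate behaves like a raw address:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class DirectAccessSketch {
    // Hypothetical name: a zero-based segment spanning the addressable range.
    // reinterpret is a restricted method, so the JVM may warn unless native
    // access is enabled for the calling module.
    static final MemorySegment ALL = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Bind ALL as the segment coordinate. The one remaining coordinate is the
    // byte offset, which for a zero-based segment equals the raw address.
    static final VarHandle LONG_AT = MethodHandles.insertCoordinates(
            ValueLayout.JAVA_LONG_UNALIGNED.varHandle(), 0, ALL);

    static long getLong(long address) {
        return (long) LONG_AT.get(address);
    }

    static void putLong(long address, long value) {
        LONG_AT.set(address, value);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Allocate 8 bytes off-heap and access them by raw address only.
            long address = arena.allocate(ValueLayout.JAVA_LONG).address();
            putLong(address, 42L);
            System.out.println(getLong(address));
        }
    }
}
```

Each call still goes through the layout's VarHandle plus the coordinate-insertion adapter, which is the chain of transforms the inliner has to collapse.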
> On 29/10/2024 15:39, Brian S O'Neill wrote:
>>
>> It looks like (again) the HotSpot inliner isn't doing enough to
>> transform the code into the plain internal unsafe code. At the very
>> least, I think there should be a convenience API which doesn't require
>> me to apply a special transform step. The implementation could then at
>> least employ the magic "force inline" annotations to ensure that
>> there are no performance regressions.
>
> Well, if it was that easy :-)
>
> 99.99% of the work associated with such issues is to understand _where_
> adding ForceInline might be beneficial. Just blanket-adding that
> everywhere will likely make your code slower, not faster.
In this particular case, I'd like the magic transformed VarHandle to
somehow be collapsed into its simpler form such that HotSpot doesn't
see it as being too big. I don't care if the implementation relies on
(ab)using the inliner or not. I think there needs to be a much simpler
and more direct way to "just access memory".
>
>
> In general 99% of the cost associated with bound checks can be disabled
> by using segments whose length is Long.MAX_VALUE. But, as we have
> learned when looking at some of your examples, this doesn't fully
> eliminate all the costs because there's still a sign check involved:
> e.g. the offset into the memory segment must be >= 0 - and there's not
> much C2 can do to eliminate that at the moment.
>
> Of course, when a segment is accessed in a loop, none of this matters -
> checks will be hoisted out of the loops, and any added cost will be
> amortized.
>
> But if code accesses a memory segment (or a byte buffer, or...) in a
> "random" fashion, then some of these additional costs might show up (and
> some are, I think, unavoidable).
I don't think the bounds check is causing much of an issue. When I add
explicit bounds checks of my own I don't see any issue either. I think
it really is just the case of a long chain of transforms not being
collapsed down into a much simpler operation.
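To make the quoted point about maximum-length segments concrete, here is a minimal sketch, again assuming JDK 22 or later and using hypothetical names. It shows that a segment of length Long.MAX_VALUE still rejects a negative offset, which is the remaining sign check being described:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SignCheckSketch {
    // Hypothetical name: zero-based segment of maximum length, so the upper
    // bound check can never fail for a non-negative offset.
    static final MemorySegment ALL = MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    // Returns true if an access at offset -1 is rejected by the bounds check.
    // Only a negative offset is probed here: a non-negative one would read
    // arbitrary memory and could crash the JVM.
    static boolean negativeOffsetRejected() {
        try {
            ALL.get(ValueLayout.JAVA_BYTE, -1L);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(negativeOffsetRejected());
    }
}
```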
>
> Popping back up 100 levels: your message seems to imply that a
> requirement for deprecating unsafe is that we should have a replacement
> API which offers 100% of Unsafe performance _and_ it is safe. I think
> this angle is rather unworkable. That said, I don't want to fully close
> the door to investigate whether there's a better "escape hatch" we can
> express within FFM (e.g. using restricted methods) to support corner
> cases where existing optimizations might not work too well. But to do
> that, we need some benchmark to look at (preferably one that doesn't
> pull in an entire project).
No, I don't want a replacement for Unsafe which is also safe. I want
something which offers the same performance, with the same potential
risks, and therefore should remain restricted. The simplest approach
would be to keep the Unsafe API, or something like it. I understand why
this is undesirable, and I'm not advocating for it.
I feel perfectly comfortable using a VarHandle instead, and I do like
the fact that it offers more features than Unsafe with respect to
concurrency and byte ordering. The alternative would be to bloat the
Unsafe class with a gazillion permutations, which is of course a mess.
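As an illustration of those extra features, the following sketch (assuming JDK 22 or later, with made-up class and method names) uses a single VarHandle for both an explicit byte order and atomic access modes. An aligned layout is used because the non-plain access modes require aligned addresses:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandleFeaturesSketch {
    // Big-endian int accessor; coordinates are (MemorySegment, long byteOffset).
    static final VarHandle INT_BE =
            ValueLayout.JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN).varHandle();

    // Stores 0x01020304 in big-endian order and returns the first byte in memory.
    static byte firstByteOfBigEndianInt() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT);
            INT_BE.set(seg, 0L, 0x01020304);
            return seg.get(ValueLayout.JAVA_BYTE, 0);
        }
    }

    // Exercises volatile and compare-and-set access through the same handle.
    static boolean casRoundTrip() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT);
            INT_BE.setVolatile(seg, 0L, 7);
            boolean swapped = INT_BE.compareAndSet(seg, 0L, 7, 8);
            return swapped && (int) INT_BE.getVolatile(seg, 0L) == 8;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstByteOfBigEndianInt());
        System.out.println(casRoundTrip());
    }
}
```

Covering the same ground with Unsafe would need a separate method per access mode and manual byte swapping, which is the bloat referred to above.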
I don't know what the "correct" FFM API should look like, but if it
allows me to obtain a restricted VarHandle which "just accesses memory",
then this has two benefits: 1) It makes it easier for applications to
migrate off the Unsafe API, and 2) It makes it easier to optimize
because the implementation can be more direct.
>
> Maurizio
>
>>
>>