Unsafe vs MemorySegments / Bounds checking...
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Oct 31 10:00:14 UTC 2024
On 31/10/2024 09:45, Mike Hearn wrote:
>
> Hence my suggestion to go back a little, and see what we can do to
> speed up access for a segment created with:
>
> MemorySegment.NULL.reinterpret(Long.MAX_VALUE)
>
> (which, as Ron correctly points out, might not mean /exactly as
> fast as Unsafe/)
>
> If a sign check is genuinely causing a meaningful slowdown, you could
> potentially re-spec such a NULL->MAX memory segment to not do it. In
> that case a negative number would be treated as unsigned.
> Alternatively, the sign bit could be masked out of the address which
> should be faster than a compare and branch. Given that such a memory
> segment is already requesting that safety checks be disabled, maybe
> the check for negative addresses isn't that important as there are
> already so many ways to segfault the VM with such a segment.
>
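Spelled out, the suggested pattern looks something like the sketch below
(just an illustration: the raw address is a placeholder for something
returned by malloc/mmap, and reinterpret is a restricted method, so
native access must be enabled, e.g. --enable-native-access=ALL-UNNAMED):

import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// A minimal sketch of the "everything" segment: reinterpret NULL so it
// spans all addressable memory, then use it for raw access.
public class EverythingSegment {
    static final MemorySegment EVERYTHING =
            MemorySegment.NULL.reinterpret(Long.MAX_VALUE);

    static int readIntAt(long rawAddress) {
        // Any offset in [0, Long.MAX_VALUE) passes the bounds check, so
        // this can read from any native address, much like
        // Unsafe.getInt(address) -- though, as discussed below, the
        // checks themselves are still executed.
        return EVERYTHING.get(ValueLayout.JAVA_INT, rawAddress);
    }
}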
That is a tempting path, and one we have considered in the past. The
drawback is that you would have obtained a new segment which doesn't
behave like other segments. For example, all the memory access
operations, bulk copy operations, and even slicing would need to specify
that some of the checks apply to all segments _but_ this weird one.
Heck, even the size of this segment would be negative, since a full
64-bit span doesn't fit in a signed long...
To be precise: the sign check itself is not causing the slowdown;
rather, the presence of a sign check is the reason bounds checks _in
loops_ cannot be completely eliminated, as C2 has to be mindful of
overflows. But if memory access does not follow a pattern (which seems
to be the case here), then bounds check elimination wouldn't kick in
anyway.
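To illustrate the two shapes (a sketch; the sizes are made up, the
access loops are the point): in the first loop below the offset is a
linear function of the induction variable, so C2 can in principle hoist
the check out of the loop -- provided it can rule out overflow of the
offset computation, which is where the sign check gets in the way. In
the second loop the offsets come from data, so there is no pattern for
bounds check elimination to exploit:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Random;

public class AccessPatterns {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(1024 * 4);

            // Linear access: offsets are i * 4, a simple function of
            // the induction variable; the bounds check is a candidate
            // for elimination (modulo overflow concerns).
            long sum = 0;
            for (int i = 0; i < 1024; i++) {
                sum += seg.get(ValueLayout.JAVA_INT, (long) i * 4);
            }

            // Random access: offsets come from data, not from the
            // induction variable; every access pays the full check.
            int[] indices = new Random(42).ints(1024, 0, 1024).toArray();
            long sum2 = 0;
            for (int idx : indices) {
                sum2 += seg.get(ValueLayout.JAVA_INT, (long) idx * 4);
            }
            System.out.println(sum + " " + sum2);
        }
    }
}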
I've re-read some exchanges I had with Roland on this last year. The
reason random access is slower has to do, fundamentally, with the fact
that FFM has to do more work than Unsafe -- there's no way around that.
It used to be the case that C2 would sometimes speculatively remove
bounds checks, causing regressions when doing so (because the loop
didn't run for long enough to amortize the cost). But this has since
been fixed:
https://bugs.openjdk.org/browse/JDK-8311932
The workaround I came up with in the past:
https://mail.openjdk.org/pipermail/panama-dev/2023-July/019478.html
worked because it effectively changed the shape of the code, causing C2
to back off and not introduce an optimization whose cost outweighed its
benefit. That should no longer be a problem today, as C2 should now
only optimize loops whose trip count exceeds a certain threshold.
Stepping back... there are two ways to approach this problem. One is to
add more heroics so that C2 can do more of what it's already doing.
That's what we tried in the past; it works, but only up to a point. A
more robust solution, IMHO, would be to find ways to reduce the
complexity of the implementation when accessing a segment whose span is
0..MAX_VALUE. Maybe we can't eliminate _all_ checks (e.g. alignment and
offset sign), but it seems to me that we can eliminate most of them.
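Schematically, the logical steps on each access look something like the
sketch below -- just an illustration, not the actual implementation.
For a global segment spanning 0..MAX_VALUE, the liveness and upper
bound steps become trivial, leaving only the offset sign and alignment
checks:

import java.lang.foreign.MemorySegment;

// Illustrative sketch of the logical checks on each segment access
// (not the real implementation). For a segment spanning
// 0..Long.MAX_VALUE: liveness is trivially true (the global scope
// never closes) and the upper bound check is nearly vacuous; what
// remains is the offset sign check and the alignment check.
final class AccessChecks {
    static void checkAccess(MemorySegment segment, long offset,
                            long accessSize, long alignmentMask) {
        // liveness: fails if the backing arena has been closed
        if (!segment.scope().isAlive()) {
            throw new IllegalStateException("already closed");
        }
        // offset sign + upper bound
        if (offset < 0 || offset > segment.byteSize() - accessSize) {
            throw new IndexOutOfBoundsException("out of bounds");
        }
        // alignment: the absolute address must be suitably aligned
        // for the access size
        if (((segment.address() + offset) & alignmentMask) != 0) {
            throw new IllegalArgumentException("misaligned access");
        }
    }
}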
Maurizio