Unsafe vs MemorySegments / Bounds checking...

Tue Nov 5 00:42:01 UTC 2024

Hey all,

I've written a simple benchmark to explore the issues discussed in this
thread and maybe quantify the relevant sources of overhead compared to
equivalent memory access via Unsafe.

https://gist.github.com/Spasi/b5de16a30bd9d12436c6bad6cfc3f742

My findings:

1. The alignment check appears to be just as expensive as (if not more
   expensive than) bounds checking.
2. The VarHandle / new-segment-per-access trick indeed works great when
   EA scalar replacement succeeds. All checks are eliminated and the
   entire memory address space is available.
3. The Scope::isAlive check is almost always hidden in the noise, but
   it does become noticeable for single accesses or small loops (set a
   small offsetCount to test). I believe the fix for this is simple:
   add an override to the GlobalSession class that always returns true.
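
For readers who haven't followed the whole thread, here is a minimal
sketch of the new-segment-per-access idea from point 2, written without
a VarHandle. The class and helper names (SegmentPerAccess, getInt,
putInt) are made up for illustration and are not taken from the gists
above. It assumes JDK 22 or later and uses the restricted
MemorySegment::reinterpret method, so expect a native-access warning
unless --enable-native-access is set. The unaligned layout is used so
that a misaligned address is accepted.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentPerAccess {

    // Hypothetical helper: wraps the raw address in a fresh 4-byte segment
    // for every read. The segment is short-lived, so whether it costs an
    // allocation depends on escape analysis scalar-replacing it.
    static int getInt(long address) {
        return MemorySegment.ofAddress(address)   // zero-length segment at address
                .reinterpret(Integer.BYTES)       // restricted: resize to 4 bytes
                .get(ValueLayout.JAVA_INT_UNALIGNED, 0L);
    }

    // Hypothetical helper: same pattern for writes.
    static void putInt(long address, int value) {
        MemorySegment.ofAddress(address)
                .reinterpret(Integer.BYTES)
                .set(ValueLayout.JAVA_INT_UNALIGNED, 0L, value);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // 16 bytes, 8-byte aligned, freed when the arena closes.
            MemorySegment block = arena.allocate(16, 8);
            long base = block.address();

            // base + 1 is deliberately misaligned for an int.
            putInt(base + 1, 0xCAFEBABE);
            System.out.println(Integer.toHexString(getInt(base + 1)));
        }
    }
}
```

This only shows the shape of the access pattern. It says nothing about
whether the checks are actually eliminated; that still has to be
confirmed with a benchmark and the generated code.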

I also tested how VarHandle access behaves in deeply nested code and
I'm not seeing any evidence that VHs & MHs have extended inlining
budgets or any kind of special "inlining bubble/horizon" within which
EA is guaranteed to scalar-replace allocations.

https://gist.github.com/Spasi/513d57fc608d9b91bcd73751e71491d1

There are of course many @ForceInline methods getting compiled for
each VH but, afaict, any method not part of the VH machinery is being
compiled using the standard inlining budgets. The maximum inline level
at least does not appear to be affected, there's no "reset" within the
VH boundaries.

Based on my results, there's no real difference (maybe 1 level?)
between the VarHandle trick and doing the equivalent without a
VarHandle, and both approaches will suffer from extreme allocation
overhead in deep call sites.

- Ioannis

On Wed, 30 Oct 2024 at 11:58, Maurizio Cimadamore
<maurizio.cimadamore at oracle.com> wrote:
>
> Glad to hear that JDK 24 helped. This particular fix has also been
> backported and will be in 23.0.2.
>
> Maurizio
>
> On 29/10/2024 21:18, Brian S O'Neill wrote:
> > With JDK 24, the overall performance regression drops down to 1.9%,
> > which is similar to what I saw before JDK 23.
> >
> > On 2024-10-29 12:20 PM, Brian S O'Neill wrote:
> >> I'll try the latest JDK 24 build and report back. One thing that
> >> concerns me with the latest build is this error message when I enable
> >> large pages (which helps with performance):
> >>
> >> [0.027s][error][cds] Failed to commit static  region #3 (Heap)
> >> [0.027s][error][cds] Failed to read archived heap region into
> >> 0x00000007ffe00000
> >>
> >>
> >> On 2024-10-29 12:07 PM, Maurizio Cimadamore wrote:
> >>>
> >>> On 29/10/2024 18:35, Brian S O'Neill wrote:
> >>>> If you recall, I did send you a reproducer, and you did verify the
> >>>> regression. This is what led you to come up with a strategy to
> >>>> define a derived VarHandle. This helped somewhat, but you observed
> >>>> the inliner giving up when the code was embedded in a very large
> >>>> method. JDK 23 appears to have introduced a regression that has
> >>>> made this worse.
> >>>
> >>> The fact that you mention that 23 made it worse reminds me of [1],
> >>> where a fix in 23 created an issue for adapted memory access var
> >>> handles.
> >>>
> >>> If I recall correctly, the workaround we suggested was _also_ using
> >>> adapted var handles.
> >>>
> >>> So I wonder if (a) you were already running into the issue in [1]
> >>> and (b) because of that JDK 23 made it worse for you.
> >>>
> >>> Did you have a chance to try your project with the latest JDK 24
> >>> build? Is the regression gone there? That would be useful to know
> >>> regardless of the wider discussion.
> >>>
> >>> Maurizio
> >>>
> >>> [1] - https://mail.openjdk.org/pipermail/panama-dev/2024-September/020643.html
> >>>
> >>
> >