MemorySegment off-heap usage and GC

Johannes Lichtenberger lichtenberger.johannes at gmail.com
Mon Sep 16 15:23:13 UTC 2024


Hi Maurizio,

thanks for all the input. AFAICT I'm not using any JNI...

So, the problem is that I'm creating too many allocations in the main branch
(I think in one test it was 2.7 GB/s), and it gets much worse with > 1 trx;
by far most of the allocated objects were/are byte arrays. Thus, I had the
idea to replace the slots byte array of byte arrays with a single
MemorySegment. For now it might even be optimal to use a single on-heap
byte array.
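
Just to make the idea concrete, here is a minimal sketch (the names are mine,
not the actual SirixDB types) of the two backing options, both usable through
the same MemorySegment API:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical: one backing store per page instead of a byte[1024][].
class BackingStoreSketch {
    static final int CAPACITY = 64 * 1024;

    // Option 1: a single on-heap byte array, wrapped as a heap segment.
    static MemorySegment onHeap() {
        return MemorySegment.ofArray(new byte[CAPACITY]);
    }

    // Option 2: off-heap memory, freed when the (shared) arena is closed.
    static MemorySegment offHeap(Arena arena) {
        return arena.allocate(CAPACITY);
    }
}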

The setSlot method is currently mainly called once per slot during
serialization of the DataRecords when syncing / committing to disk. It's also
called during deserialization; even though slots may be added in random index
order, they are appended to the MemorySegment. I think records are usually
added/deleted rather than updated (apart from the long "pointers" to neighbour
nodes/records).
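
To make the append semantics concrete, here is a rough sketch of what I mean
(simplified, made-up names, not the actual page implementation): an
append-only setSlot over a single segment, with a per-page offset/length
table so slots can arrive in any index order:

import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;

// Hypothetical append-only slot store over one MemorySegment (no growth
// handling in this sketch; the segment is assumed to be large enough).
final class SlotStore {
    private final MemorySegment data;   // backing store (heap or native)
    private final int[] offsets;        // -1 == slot not set
    private final int[] lengths;
    private int used;

    SlotStore(MemorySegment data, int slotCount) {
        this.data = data;
        this.offsets = new int[slotCount];
        this.lengths = new int[slotCount];
        Arrays.fill(offsets, -1);
    }

    // Slots may be set in random index order, but the bytes are always appended.
    void setSlot(int index, byte[] recordData) {
        MemorySegment.copy(recordData, 0, data, ValueLayout.JAVA_BYTE, used, recordData.length);
        offsets[index] = used;
        lengths[index] = recordData.length;
        used += recordData.length;
    }

    byte[] getSlot(int index) {
        return offsets[index] < 0
                ? null
                : data.asSlice(offsets[index], lengths[index]).toArray(ValueLayout.JAVA_BYTE);
    }

    boolean isSet(int index) {
        return offsets[index] >= 0;
    }
}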

It's basically a binary encoding for tree-structured data with fine-grained
nodes (firstChild/rightSibling/leftSibling/parent/lastChild), and the nodes
are stored in a dense trie where the leaf pages mostly hold 1024 nodes.
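
Purely as an illustration of those pointer fields (the actual on-disk encoding
is of course different, this is just an assumed fixed header layout),
something like:

import java.lang.foreign.MemoryLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

// Hypothetical fixed-size header with the five neighbour pointers (40 bytes).
class NodeLayoutSketch {
    static final MemoryLayout NODE_HEADER = MemoryLayout.structLayout(
            ValueLayout.JAVA_LONG.withName("firstChild"),
            ValueLayout.JAVA_LONG.withName("rightSibling"),
            ValueLayout.JAVA_LONG.withName("leftSibling"),
            ValueLayout.JAVA_LONG.withName("parent"),
            ValueLayout.JAVA_LONG.withName("lastChild"));

    // Var handle to read/write the parent pointer at a given base offset.
    static final VarHandle PARENT = NODE_HEADER.varHandle(
            MemoryLayout.PathElement.groupElement("parent"));
}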

Up to a predefined, very small threshold N, page fragments are fetched in
parallel from disk if there's no in-memory reference and the page is not found
in a Caffeine cache. The fragments are then combined into a full page: during
reconstruction, setSlot is called once for each slot that is not yet set in
the full page but is set in the current page fragment.
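
The reconstruction could then look roughly like this, building on the
SlotStore sketch above (again just a sketch with my own names, assuming the
fragments are ordered newest to oldest):

import java.lang.foreign.MemorySegment;
import java.util.List;

// Hypothetical: combine up to N page fragments into a full page. A slot is
// taken from the first (newest) fragment that has it set and is then skipped
// in the older fragments.
final class PageReconstructionSketch {
    static SlotStore combine(List<SlotStore> newestToOldest, MemorySegment target, int slotCount) {
        SlotStore full = new SlotStore(target, slotCount);
        for (SlotStore fragment : newestToOldest) {
            for (int slot = 0; slot < slotCount; slot++) {
                if (!full.isSet(slot) && fragment.isSet(slot)) {
                    full.setSlot(slot, fragment.getSlot(slot));
                }
            }
        }
        return full;
    }
}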

So, I assume that afterwards slots are only ever set in a single read-write
trx per resource, and that variable-length data is only seldom adapted. If
that's not the case, I could also try to leave some space after each slot so
that it can grow without having to shift other data, or something like that.
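
If in-place updates turn out to matter after all, the padding could be as
simple as rounding each slot up (hedged sketch, the 16-byte granularity is an
arbitrary assumption):

// Hypothetical: reserve some slack per slot so a record can grow in place.
final class SlotPaddingSketch {
    static int paddedLength(int recordLength) {
        // round up to the next multiple of 16 bytes and add one spare chunk
        return ((recordLength + 15) & ~15) + 16;
    }
}

In setSlot the write offset would then advance by paddedLength(recordData.length)
instead of the exact length, so an update that still fits in the padded region
can overwrite the bytes without moving the other slots.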

At least I think the much worse runtime when traversing roughly 310_000_000
nodes in a preorder traversal (remember that they are stored in pages) after
switching from 1 to 5 trxs in parallel is currently due to the objects
allocated (without the MemorySegment):

https://github.com/sirixdb/sirix/blob/main/analysis-single-trx.jfr

vs.

https://github.com/sirixdb/sirix/blob/main/analysis-5-trxs.jfr

Andrei Pangin helped a bit with analyzing the async-profiler snapshots: the
runtime with 5 trxs in parallel is almost exactly 4x slower than with a single
trx, and it's most probably due to the amount of allocations (even though GC
seems OK).

So, all in all, I've had a specific runtime performance problem (and the
system is also paging a lot), so I think it makes sense that it may be due to
the allocation rate.

I hope the nodes can simply get a MemorySegment constructor param in the
future instead of a couple of object delegates... so that I can directly use
MemorySegments instead of having to convert back and forth between byte arrays
during serialization/deserialization. Maybe we can even get (almost) rid of
that whole step and gain better data locality.
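
What I have in mind for the node classes is roughly this (a sketch with
made-up names; the offsets match the hypothetical 5 x 8-byte header above, and
the unaligned long layout is used so it also works on a heap segment wrapping
a byte[]):

import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical node reading its pointers straight from a segment slice,
// instead of delegating to on-heap objects built during deserialization.
record SegmentBackedNode(MemorySegment slice) {
    long firstChildKey()   { return slice.get(ValueLayout.JAVA_LONG_UNALIGNED, 0); }
    long rightSiblingKey() { return slice.get(ValueLayout.JAVA_LONG_UNALIGNED, 8); }
    long leftSiblingKey()  { return slice.get(ValueLayout.JAVA_LONG_UNALIGNED, 16); }
    long parentKey()       { return slice.get(ValueLayout.JAVA_LONG_UNALIGNED, 24); }
    long lastChildKey()    { return slice.get(ValueLayout.JAVA_LONG_UNALIGNED, 32); }
}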

I hope it makes some sense now. It may also be worth looking into a single
bigger byte array instead of a MemorySegment (even though I think that
off-heap memory usage might not be a bad idea for a database system).

Maybe you can have a quick look at the two profiles I provided...

Kind regards and thanks a lot for your input. If it helps, I can provide the
bigger JSON file I used for the import / the test.

Johannes

Maurizio Cimadamore <maurizio.cimadamore at oracle.com> schrieb am Mo., 16.
Sept. 2024, 12:31:

>
> On 16/09/2024 11:26, Maurizio Cimadamore wrote:
> > I've rarely had these "Evacuation Failure: Pinned" log entries
> > regarding the current "master" branch on Github, but now it's even worse.
>
> Zooming in on this aspect: this would suggest that your heap memory is
> being kept "pinned" somewhere.
>
> Are you, by any chance, using downcall method handles with the
> "critical" Linker option? Or any form of critical JNI?
>
> It would be interesting (separately from the "architectural" angle
> discussed in my previous reply) to see which method call(s) is causing
> this exactly...
>
> Cheers
> Maurizio
>
>