MemorySegment off-heap usage and GC

Johannes Lichtenberger lichtenberger.johannes at gmail.com
Sat Sep 14 14:17:11 UTC 2024


Hello,

I'm currently refactoring my little database project in my spare time,
moving from a very simple byte[][] slots array (an array of byte arrays) to
a single MemorySegment per page (or two, depending on whether DeweyIDs are
stored, which they usually are not):

from

https://github.com/sirixdb/sirix/blob/main/bundles/sirix-core/src/main/java/io/sirix/page/KeyValueLeafPage.java

to

https://github.com/sirixdb/sirix/blob/1aaafd13693c0cf7e073d400766525eed7a24ad6/bundles/sirix-core/src/main/java/io/sirix/page/KeyValueLeafPage.java

However, I have now had to introduce reference counting and
pinning/unpinning of the pages, and they have to be closed, for instance,
once they are evicted from the cache(s).
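
To make the pinning/closing requirement concrete, here is a minimal sketch of
one way such a page could be reference counted. It is not the actual
KeyValueLeafPage code: the class name PinnedPageSketch and its methods are
made up for illustration. The assumption is that the cache holds one
reference, every reader pins/unpins, and the shared arena is closed once the
last reference is gone.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical page wrapper for illustration; not the sirix implementation.
final class PinnedPageSketch implements AutoCloseable {
    private final Arena arena = Arena.ofShared();
    private final MemorySegment slots;
    // Starts at 1: the reference held by the cache itself.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final AtomicBoolean evicted = new AtomicBoolean(false);

    PinnedPageSketch(long byteSize) {
        // 8-byte alignment so that int/long accesses at aligned offsets are legal.
        this.slots = arena.allocate(byteSize, 8);
    }

    /** Pins the page for a reader; returns false if the memory is already released. */
    boolean pin() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                return false;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Drops a reader's pin; must be called exactly once per successful pin(). */
    void unpin() {
        release();
    }

    /** Called on cache eviction: drops the cache's reference (idempotent). */
    @Override
    public void close() {
        if (evicted.compareAndSet(false, true)) {
            release();
        }
    }

    private void release() {
        // Whoever drops the last reference frees the off-heap memory.
        if (refCount.decrementAndGet() == 0) {
            arena.close();
        }
    }

    MemorySegment slots() {
        return slots;
    }

    boolean isReleased() {
        return !arena.scope().isAlive();
    }
}
```

With this scheme an evicted page stays readable for readers that pinned it
before the eviction, and the memory is released by whichever side lets go
last.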

Implementing a "real" slotted page with shifting and resizing has gotten
much more complicated. Besides pinning/unpinning and deterministic closing
being tricky ;-), I'm also facing much worse GC performance (attached).

Of course, I'm in the middle of the refactoring, and I plan to give the
nodes/records in the page a slice of the page's MemorySegment. Currently, I
have to convert back and forth between byte arrays and MemorySegments for
serialization/deserialization, and then copy these into the page
MemorySegment. That is one issue at the moment, but I'm not sure if that's
all.
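
As a rough sketch of the slice idea (the class RecordSliceSketch and its
method names are hypothetical, and the record layout is assumed to be plain
bytes at a known offset): a record can be handed a zero-copy view via
asSlice, and a serialized byte array can be copied straight into the page
segment without wrapping it in an intermediate MemorySegment first.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical helper for illustration; offsets and lengths are assumptions.
final class RecordSliceSketch {

    /** Returns a zero-copy view of one record inside the page segment. */
    static MemorySegment recordView(MemorySegment page, long offset, long length) {
        return page.asSlice(offset, length);
    }

    /** Copies serialized bytes directly from a byte[] into the page segment. */
    static void writeRecord(byte[] serialized, MemorySegment page, long offset) {
        MemorySegment.copy(serialized, 0, page, ValueLayout.JAVA_BYTE, offset, serialized.length);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment page = arena.allocate(64);
            writeRecord(new byte[] {1, 2, 3, 4}, page, 16);

            MemorySegment record = recordView(page, 16, 4);
            // The slice shares memory with the page: no bytes are copied here.
            System.out.println(record.get(ValueLayout.JAVA_BYTE, 2));
        }
    }
}
```

The slice shares the lifetime of the page's arena, so a record view must not
be used after the page has been closed.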

All in all, I'm not sure if there's other stuff I'm missing, because I'm now
using `Arena.ofShared()`, and I think this looks a bit strange:

[3,127s][info   ][gc      ] GC(7) Pause Young (Normal) (G1 Evacuation Pause) (Evacuation Failure: Pinned) 645M->455M(5124M) 9,563ms
[3,253s][info   ][gc      ] GC(8) Pause Young (Normal) (G1 Evacuation Pause) 783M->460M(5124M) 4,580ms
[5,094s][info   ][gc      ] GC(9) Pause Young (Normal) (G1 Evacuation Pause) 3524M->897M(5124M) 40,103ms
[5,200s][info   ][gc      ] GC(10) Pause Young (Normal) (G1 Evacuation Pause) (Evacuation Failure: Pinned) 1381M->947M(5124M) 29,005ms
[5,696s][info   ][gc      ] GC(11) Pause Young (Normal) (G1 Evacuation Pause) 1499M->1191M(5124M) 25,405ms
[5,942s][info   ][gc      ] GC(12) Pause Young (Normal) (G1 Evacuation Pause) (Evacuation Failure: Pinned) 1647M->1379M(5124M) 22,006ms
[5,979s][info   ][gc      ] GC(13) Pause Young (Normal) (G1 Evacuation Pause) (Evacuation Failure: Pinned) 1899M->1411M(5124M) 7,634ms
[6,628s][info   ][gc      ] GC(14) Pause Young (Normal) (G1 Evacuation Pause) 2243M->1801M(5124M) 36,093ms
[6,725s][info   ][gc      ] GC(15) Pause Young (Normal) (G1 Evacuation Pause) (Evacuation Failure: Pinned) 2469M->1873M(5124M) 13,836ms
[7,436s][info   ][gc      ] GC(16) Pause Young (Normal) (G1 Evacuation Pause) 2857M->2283M(5740M) 64,219ms
[7,525s][info   ][gc      ] GC(17) Pause Young (Normal) (G1 Evacuation Pause) (Evacuation Failure: Pinned) 3115M->2343M(5740M) 14,110ms
[8,274s][info   ][gc      ] GC(18) Pause Young (Normal) (G1 Evacuation Pause) 3659M->2783M(5740M) 42,159ms
[9,011s][info   ][gc      ] GC(19) Pause Young (Concurrent Start) (G1 Evacuation Pause) (Evacuation Failure: Pinned) 4027M->3239M(5740M) 51,686ms
[9,011s][info   ][gc      ] GC(20) Concurrent Mark Cycle
[9,165s][info   ][gc      ] GC(20) Pause Remark 4171M->2535M(5360M) 3,315ms
[9,446s][info   ][gc      ] GC(20) Pause Cleanup 2759M->2759M(5360M) 0,253ms
[9,448s][info   ][gc      ] GC(20) Concurrent Mark Cycle 436,601ms
[9,500s][info   ][gc      ] GC(21) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 2783M->1789M(5360M) 30,267ms
[10,575s][info   ][gc      ] GC(22) Pause Young (Mixed) (G1 Evacuation Pause) 3745M->2419M(5360M) 73,025ms
[11,266s][info   ][gc      ] GC(23) Pause Young (Normal) (G1 Evacuation Pause) 3987M->2829M(5360M) 55,028ms
[11,762s][info   ][gc      ] GC(24) Pause Young (Concurrent Start) (G1 Evacuation Pause) 4149M->3051M(6012M) 65,550ms
[11,762s][info   ][gc      ] GC(25) Concurrent Mark Cycle
[11,869s][info   ][gc      ] GC(25) Pause Remark 3143M->1393M(5120M) 4,415ms
[12,076s][info   ][gc      ] GC(25) Pause Cleanup 1593M->1593M(5120M) 0,240ms
[12,078s][info   ][gc      ] GC(25) Concurrent Mark Cycle 316,410ms

I've rarely seen these "Evacuation Failure: Pinned" log entries on the
current "master" branch on GitHub, but now it's even worse. Plus, I think
I'm still failing to close/clear pages in all cases (to close the arenas),
which turned out to be tricky. I'm also storing the two most recently
accessed pages in fields; sometimes pages are not read from or put into a
cache; and there are page "fragments" that must be recombined into a full
page...

So maybe you know why the GC is much worse now? I guess that even if I fail
to close a page, I'd get an OutOfMemoryError or something like that, as the
segments are off-heap (apart from my array-based memory segments created
via ofArray, which may be a problem, hmm).
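
For reference, the difference between the two kinds of segments mentioned
above can be checked at runtime. This is only a small sketch (the class name
SegmentKindSketch is made up): a segment created with MemorySegment.ofArray
is a view over an on-heap array that the GC still manages, whereas a segment
allocated from an arena is native, off-heap memory. Whether the heap segments
are related to the "Pinned" entries is not something this sketch can show.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical helper for illustration only.
final class SegmentKindSketch {

    /** Describes whether a segment is backed by native memory or a heap array. */
    static String kindOf(MemorySegment segment) {
        return segment.isNative() ? "native (off-heap)" : "heap (array-backed)";
    }

    public static void main(String[] args) {
        // Heap segment: a view over a byte[] that lives on the Java heap.
        MemorySegment heapSegment = MemorySegment.ofArray(new byte[16]);
        System.out.println(kindOf(heapSegment));

        try (Arena arena = Arena.ofConfined()) {
            // Native segment: memory outside the Java heap, freed on arena.close().
            MemorySegment nativeSegment = arena.allocate(16);
            System.out.println(kindOf(nativeSegment));
        }
    }
}
```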

All in all, I faced much worse performance with N read-only trxs traversing
a large file in parallel, likely due to an object allocation rate of
~2.7 GB for a single trx already (and maybe not that much being read from
the page caches). That's why I thought I'd have to try the approach of a
single MemorySegment for each page.

The G1 log:

https://raw.githubusercontent.com/sirixdb/sirix/main/g1.log.4

kind regards
Johannes