MemorySegment off-heap usage and GC
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Sep 16 10:26:06 UTC 2024
Hi Johannes,
without knowing more high-level information about what the usage patterns
for the methods in your code are, it’s hard for me to provide guidance, or
to try and explain why heap allocation is more pronounced than before. I
will try nevertheless :-)
Eyeballing your code, I only see four places where segments are created:
* KeyValueLeafPage constructor (because of allocate)
* resize (because of allocate)
* setSlot (because of MemorySegment::ofArray)
* setDeweyId (because of MemorySegment::ofArray)
So, it doesn’t seem like you are creating tons of heap objects. But all
the setXYZ methods do seem to create a new heap object (a new memory
segment) on each call, whereas before you just stored the incoming byte
array into some field of the class. If your application is sensitive to
that kind of thing (e.g. these set methods are called a lot), that could
explain the increase in heap pressure.
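To make that concrete, here is a minimal sketch (the method shape and names are hypothetical, not your actual code) of what each set call now does, using java.lang.foreign:

    import java.lang.foreign.MemorySegment;

    class SlotSetSketch {
        // Each call wraps the incoming byte[] in a fresh heap MemorySegment view
        // before copying - one extra small heap object per set, on top of the copy itself.
        static void setSlot(MemorySegment pageData, long slotOffset, byte[] recordData) {
            MemorySegment heapView = MemorySegment.ofArray(recordData);               // new heap object per call
            MemorySegment.copy(heapView, 0, pageData, slotOffset, recordData.length); // bulk copy into the page
        }
    }

(There is also a MemorySegment.copy overload that takes the byte[] directly, which would avoid the intermediate ofArray wrapper; whether that matters depends on how hot these methods are.)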
In general, looking at the before/after code, there is now more code in
the set/get code path, as you have to go from byte[] to segment and then
back. If your public-facing API takes byte[], you might not have many
options other than to have your internal representation match that.
(That said, I’d note that the “before” implementation doesn’t perform
defensive copies of the incoming byte arrays, and that might also affect
performance.)
Another place where performance might be lost is the resize method.
Every now and then, when you call setData, the memory segment might not
be big enough, so the code allocates a new one and copies all the data
over. This is not going to be fast (even ignoring the heap pressure
problem, though the two could be related).
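What that grow-and-copy boils down to is roughly the following (just a sketch, assuming the new segment comes from the same arena as the old one):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;

    class ResizeSketch {
        // Allocate a bigger segment and copy everything that is already there.
        static MemorySegment resize(Arena arena, MemorySegment old, long newByteSize) {
            MemorySegment bigger = arena.allocate(newByteSize);      // fresh off-heap allocation
            MemorySegment.copy(old, 0, bigger, 0, old.byteSize());   // full copy of the existing data
            return bigger;  // the old segment is only reclaimed when the arena itself is closed
        }
    }

Note that with an arena the old segment isn’t freed by the resize; it stays around until the arena is closed, so frequent resizing also grows the page’s overall footprint.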
The reality here is that it is going to be hard to match the performance
of something like this:

    public void setSlot(byte[] recordData, int offset) {
        slots[offset] = recordData;
    }

There’s no moving of the data off-heap, no need for resizing ever, and
this sets you up for avoiding deserialization altogether:

    public byte[] getSlot(int slotNumber) {
        return slots[slotNumber];
    }
Anything you change here is added cost. MemorySegment is just the tip of
the iceberg here, I think (you’d face exactly the same problems trying to
use ByteBuffer, or any other such API).
Something that would be more “apples to apples” would be if you could
avoid resizing - after all, in the old code you always allocate a number
of slots/records/deweyIds equal to some known constant
(Constants.NDP_NODE_COUNT). Maybe this constant is super high (it seems to
be 1024), so you cannot afford to pre-allocate the equivalent bytes when
you allocate a page (but that’s another “shortcut” you can take because
you’re exploiting an on-heap representation). But even if you
pre-allocated big-enough segments, you’d still have to convert the input
data into a segment, and then extract the output data into a byte array
(instead of simply setting a pointer/getting a pointer - another
on-heap-oriented assumption).
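For illustration, a pre-allocated page might look roughly like this (a sketch only; the class name and the fixed per-slot capacity SLOT_SIZE are assumptions on my part, not something from your code):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    class PreallocatedPage {
        static final int SLOT_COUNT = 1024;  // stand-in for Constants.NDP_NODE_COUNT
        static final int SLOT_SIZE  = 256;   // hypothetical fixed capacity per slot

        private final MemorySegment data;

        PreallocatedPage(Arena arena) {
            data = arena.allocate((long) SLOT_COUNT * SLOT_SIZE);  // no resizing ever needed
        }

        void setSlot(byte[] recordData, int slot) {
            // copy-in: heap array -> off-heap segment
            MemorySegment.copy(recordData, 0, data, ValueLayout.JAVA_BYTE,
                               (long) slot * SLOT_SIZE, recordData.length);
        }

        byte[] getSlot(int slot, int length) {
            // copy-out: off-heap segment -> fresh heap array
            return data.asSlice((long) slot * SLOT_SIZE, length)
                       .toArray(ValueLayout.JAVA_BYTE);
        }
    }

The resizing is gone, but every set and get still pays for a copy across the heap boundary - that’s the point I’m trying to make.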
To sum up, I think that for the switch to off-heap/segments to be
beneficial, there has to be some use case you want to address that cannot
be addressed in the current setup. Your setup is currently highly
optimized under the assumption that data lives on-heap (and that you
don’t care too much about data being manipulated via the array instances
being fed to the API /after/ said arrays have been stored in the page).
If those assumptions work well for you, given they also provide the best
possible performance, why change? Of course, if you want to pass the
page data to some native function you will run into a road-block, as you
will then have to collect all your page data by chasing pointers and
copying it into some contiguous memory region. But if you don’t need
that, why bother? Sometimes the simplest design is also the best.

That’s not to say that there aren’t ways to perhaps use memory segments
more effectively to do what you want to do. For instance, if you “squint”,
your old code is really creating a page with a fixed number
(Constants.NDP_NODE_COUNT) of data pointers. So, you could create a page
with an array of N memory segments, to be filled later. Each page would be
associated with an _arena_, so that all the segment data you create
effectively shares the same lifetime. This would allow you to avoid
resizing, while still only “allocating as you need”. But you would need to
copy data in/out of the heap, so this would add serialization cost
compared to your original solution.
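A sketch of what I have in mind (hypothetical names; whether you need a confined or a shared arena depends on how your pages are accessed):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    class SegmentSlotPage implements AutoCloseable {
        static final int SLOT_COUNT = 1024;  // stand-in for Constants.NDP_NODE_COUNT

        private final Arena arena = Arena.ofConfined();  // or Arena.ofShared() for cross-thread pages
        private final MemorySegment[] slots = new MemorySegment[SLOT_COUNT];

        void setSlot(byte[] recordData, int slot) {
            // allocate exactly as much as needed, only when a slot is actually set;
            // no resizing, but the bytes are still copied off-heap
            MemorySegment seg = arena.allocate(recordData.length);
            MemorySegment.copy(recordData, 0, seg, ValueLayout.JAVA_BYTE, 0, recordData.length);
            slots[slot] = seg;
        }

        byte[] getSlot(int slot) {
            MemorySegment seg = slots[slot];
            // copy-out on every read - the serialization cost mentioned above
            return seg == null ? null : seg.toArray(ValueLayout.JAVA_BYTE);
        }

        @Override
        public void close() {
            arena.close();  // all slot segments of this page become inaccessible at once
        }
    }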
Just trying to be honest here, and set the right expectations. Going
off-heap is not a “make me go fast” kind of toggle. It often requires
compromises to make the Java side and the off-heap side of the world
“align” somehow.
Maurizio
On 14/09/2024 15:17, Johannes Lichtenberger wrote:
> Hello,
>
> I'm currently refactoring my little database project in my spare time
> from using a very simple byte[][] slots array of byte arrays to a
> single MemorySegment (or two, depending on whether DeweyIDs are stored
> or not (usually not)):
>
> from
>
> https://github.com/sirixdb/sirix/blob/main/bundles/sirix-core/src/main/java/io/sirix/page/KeyValueLeafPage.java
>
> to
>
> https://github.com/sirixdb/sirix/blob/1aaafd13693c0cf7e073d400766525eed7a24ad6/bundles/sirix-core/src/main/java/io/sirix/page/KeyValueLeafPage.java
>
> However, now I had to introduce reference counting / pinning/unpinning
> of the pages, and they have to be closed, for instance, once they are
> evicted from cache(s).
>
> Implementing a "real" slotted page with shifting and resizing... has
> gotten much more complicated. Furthermore (besides that,
> pinning/unpinning and deterministic closing are tricky ;-)), I'm also
> facing much worse GC performance (attached).
>
> Of course, I'm in the middle of refactoring, and I'd give the
> nodes/records in the page a slice from the MemorySegment of the page.
> Currently, I have to convert back and forth for
> serialization/deserialization from byte-arrays to MemorySegments and
> then copy these into the page MemorySegment... which is currently one
> issue, but I'm not sure if that's all.
>
> All in all, I'm not sure if there's other stuff I'm missing, because I'm
> now using `Arena.ofShared()`, and I think this stuff is a bit strange:
>
> [3,127s][info ][gc ] GC(7) Pause Young (Normal) (G1 Evacuation
> Pause) (Evacuation Failure: Pinned) 645M->455M(5124M) 9,563ms
> [3,253s][info ][gc ] GC(8) Pause Young (Normal) (G1 Evacuation
> Pause) 783M->460M(5124M) 4,580ms
> [5,094s][info ][gc ] GC(9) Pause Young (Normal) (G1 Evacuation
> Pause) 3524M->897M(5124M) 40,103ms
> [5,200s][info ][gc ] GC(10) Pause Young (Normal) (G1 Evacuation
> Pause) (Evacuation Failure: Pinned) 1381M->947M(5124M) 29,005ms
> [5,696s][info ][gc ] GC(11) Pause Young (Normal) (G1 Evacuation
> Pause) 1499M->1191M(5124M) 25,405ms
> [5,942s][info ][gc ] GC(12) Pause Young (Normal) (G1 Evacuation
> Pause) (Evacuation Failure: Pinned) 1647M->1379M(5124M) 22,006ms
> [5,979s][info ][gc ] GC(13) Pause Young (Normal) (G1 Evacuation
> Pause) (Evacuation Failure: Pinned) 1899M->1411M(5124M) 7,634ms
> [6,628s][info ][gc ] GC(14) Pause Young (Normal) (G1 Evacuation
> Pause) 2243M->1801M(5124M) 36,093ms
> [6,725s][info ][gc ] GC(15) Pause Young (Normal) (G1 Evacuation
> Pause) (Evacuation Failure: Pinned) 2469M->1873M(5124M) 13,836ms
> [7,436s][info ][gc ] GC(16) Pause Young (Normal) (G1 Evacuation
> Pause) 2857M->2283M(5740M) 64,219ms
> [7,525s][info ][gc ] GC(17) Pause Young (Normal) (G1 Evacuation
> Pause) (Evacuation Failure: Pinned) 3115M->2343M(5740M) 14,110ms
> [8,274s][info ][gc ] GC(18) Pause Young (Normal) (G1 Evacuation
> Pause) 3659M->2783M(5740M) 42,159ms
> [9,011s][info ][gc ] GC(19) Pause Young (Concurrent Start) (G1
> Evacuation Pause) (Evacuation Failure: Pinned) 4027M->3239M(5740M)
> 51,686ms
> [9,011s][info ][gc ] GC(20) Concurrent Mark Cycle
> [9,165s][info ][gc ] GC(20) Pause Remark 4171M->2535M(5360M)
> 3,315ms
> [9,446s][info ][gc ] GC(20) Pause Cleanup 2759M->2759M(5360M)
> 0,253ms
> [9,448s][info ][gc ] GC(20) Concurrent Mark Cycle 436,601ms
> [9,500s][info ][gc ] GC(21) Pause Young (Prepare Mixed) (G1
> Evacuation Pause) 2783M->1789M(5360M) 30,267ms
> [10,575s][info ][gc ] GC(22) Pause Young (Mixed) (G1 Evacuation
> Pause) 3745M->2419M(5360M) 73,025ms
> [11,266s][info ][gc ] GC(23) Pause Young (Normal) (G1
> Evacuation Pause) 3987M->2829M(5360M) 55,028ms
> [11,762s][info ][gc ] GC(24) Pause Young (Concurrent Start) (G1
> Evacuation Pause) 4149M->3051M(6012M) 65,550ms
> [11,762s][info ][gc ] GC(25) Concurrent Mark Cycle
> [11,869s][info ][gc ] GC(25) Pause Remark 3143M->1393M(5120M)
> 4,415ms
> [12,076s][info ][gc ] GC(25) Pause Cleanup 1593M->1593M(5120M)
> 0,240ms
> [12,078s][info ][gc ] GC(25) Concurrent Mark Cycle 316,410ms
>
> I've rarely had these "Evacuation Failure: Pinned" log entries
> with the current "master" branch on GitHub, but now it's even
> worse. Plus, I think I'm still failing to close/clear pages in all
> cases (to close the arenas), which turned out to be tricky. I'm also
> storing the two most recently accessed pages in fields; sometimes,
> they are not read/put into a cache; there are page "fragments" that
> must be recombined for a full page...
>
> So maybe you know why the GC is much worse now (I guess even if I fail
> to close a page, I'd get an OutOfMemoryError or something like that,
> as the segments are off-heap (despite my array-based memory segments
> (ofArray), which may be a problem, hmm)).
>
> All in all, I faced much worse performance with N read-only trxs
> traversing a large file in parallel, likely due to a ~2,7Gb object
> allocation rate for a single trx already (and maybe not that much read
> from the page caches); that's why I thought I'd have to try the single
> MemorySegment approach for each page.
>
> The G1 log:
>
> https://raw.githubusercontent.com/sirixdb/sirix/main/g1.log.4
>
> kind regards
> Johannes
>
>