MemorySegment off-heap usage and GC

Johannes Lichtenberger lichtenberger.johannes at gmail.com
Mon Sep 16 16:49:14 UTC 2024


Hi Maurizio,

the first three points are right. I'll get rid of the method which takes
a byte array as a parameter and the method which returns the byte array
in the future; it's currently work in progress. But I thought the runtime
for shredding a resource (importing JSON data) shouldn't suddenly be 3x
worse, even now, in the middle of a bigger refactoring done in my spare
time. I'm currently feeling a bit ill, but I thought the switch would be
a good idea, as the memory would be managed manually, too (closing the
arena, thus also reducing GC pressure), serialization/deserialization of
pages from/to disk would be almost gone in the future, and we'd get
better data locality, at least until Valhalla ships value classes.
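
To illustrate what I mean by managing the memory manually (a minimal
sketch using the plain FFM API, not the actual SirixDB code):

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;

// One confined arena per page: the backing memory lives off-heap and is
// freed deterministically when the arena is closed -- no GC involvement.
try (Arena arena = Arena.ofConfined()) {
    MemorySegment page = arena.allocate(64 * 1024); // one chunk, not many small byte[]s
    page.set(JAVA_BYTE, 0L, (byte) 42);             // write in place, no copies
    // ... serialize the page to disk straight from the segment ...
} // closed here: the off-heap memory is freed immediately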

Well, I must say that I also thought one big chunk of memory would
somehow be better than a lot of small arrays, but maybe not ;-) even
though the small byte arrays are currently scattered throughout the
Java heap.

This may shed some light, I hope; most data is read from disk, and the
pages are only cached in memory:

https://sirix.io/docs/concepts.html

Regarding the switch to MemorySegments I was also inspired by Gavin Ray:
https://gavinray97.github.io/blog/panama-not-so-foreign-memory

Kind regards

Johannes

Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote on Mon,
Sep 16, 2024, 17:59:

> Hi Johannes,
> I'm trying to uplevel as much as possible here. Is this correct:
>
> 1. your application, even when using a byte[][] backing storage, already
> had an allocation issue
> 2. it is not clear from the information you shared where this allocation
> issue is coming from (it predates memory segments)
> 3. when you made the switch to use memory segments instead of byte[][],
> things got worse, not better.
>
> Does that accurately reflect your case? IMHO, the crux of the issue is
> (1)/(2). If there was already some allocation issue in your
> application/framework, then adopting memory segments is unlikely to make
> that disappear (esp. with the kind of code we're staring at right now,
> which I think is allocating _more_ temp objects in the heap).
>
> You referred to these big traversals several times. What does a traversal
> do? In principle, if your data is already in memory, I'd expect a traversal
> not to allocate any memory (regardless of the backing storage being used).
>
> So I feel like I probably don't understand what's going on :-)
>
> It would be beneficial for the discussion to come up with some simplified
> model of how the code used to work before (using some mock pseudo-code and
> data structures), which problems you identified, and why and how you
> thought using a memory segment would improve over that. This might also
> allow people (other than me!) to provide more feedback.
>
> Maurizio
>
>
>
> On 16/09/2024 16:23, Johannes Lichtenberger wrote:
>
> Hi Maurizio,
>
> thanks for all the input. AFAICT I'm not using any JNI...
>
> So, the problem is that I'm creating too many allocations (I think in
> one test it was 2.7 GB/s), and it's much worse with > 1 trx (by far the
> most allocated objects were/are byte arrays), in the main branch. Thus,
> I had the idea to replace the slots byte array of byte arrays with a
> single MemorySegment. For now it might even be optimal to use a single
> on-heap byte array.
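>
> Roughly, the storage change I have in mind (hypothetical names and a
> simplified shape; the real page class is more involved):
>
> import java.lang.foreign.Arena;
> import java.lang.foreign.MemorySegment;
> import java.util.Arrays;
>
> final class SlotPage {                   // hypothetical stand-in for the page class
>     static final int SLOT_COUNT = 1024;  // leaf pages hold mostly 1024 nodes
>
>     // Before: byte[][] slots = new byte[SLOT_COUNT][]; -- many small arrays.
>     // After: one contiguous chunk plus per-slot (offset, length) bookkeeping.
>     final MemorySegment slotMemory;
>     final int[] slotOffsets = new int[SLOT_COUNT]; // -1 => slot not set
>     final int[] slotLengths = new int[SLOT_COUNT];
>     int appendOffset;                              // next free byte in slotMemory
>
>     SlotPage(Arena arena, long capacity) {
>         slotMemory = arena.allocate(capacity);
>         Arrays.fill(slotOffsets, -1);
>     }
> }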
>
> The setSlot method is currently mainly called once during serialization
> of the DataRecords during a sync/commit to disk. It's also called during
> deserialization, but even though slots may be added in random order, they
> are simply appended to the MemorySegment, as sketched below. I think that
> usually records are added/deleted rather than updated (besides the long
> "pointers" to neighbour nodes/records).
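>
> Building on the SlotPage sketch above, setSlot boils down to roughly
> this (again hypothetical and simplified):
>
> // Slots may arrive in any order, but their bytes are appended; the
> // offsets array maps slot index -> position inside the segment.
> void setSlot(SlotPage page, int index, MemorySegment recordData) {
>     long len = recordData.byteSize();
>     MemorySegment.copy(recordData, 0, page.slotMemory, page.appendOffset, len);
>     page.slotOffsets[index] = page.appendOffset;
>     page.slotLengths[index] = (int) len;
>     page.appendOffset += (int) len;
> }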
>
> It's basically a binary encoding for tree-structured data with
> fine-grained nodes (firstChild/rightSibling/leftSibling/parent/lastChild),
> and the nodes are stored in a dense trie where the leaf pages hold mostly
> 1024 nodes.
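>
> For the fixed part of a node that's essentially five long "pointers";
> with the FFM API that could be described declaratively (a sketch,
> assuming this field order -- not the actual SirixDB encoding):
>
> import java.lang.foreign.MemoryLayout;
> import java.lang.invoke.VarHandle;
> import static java.lang.foreign.ValueLayout.JAVA_LONG;
>
> // Five 8-byte neighbour pointers per node; variable-length data follows.
> static final MemoryLayout NODE_LAYOUT = MemoryLayout.structLayout(
>     JAVA_LONG.withName("parent"),
>     JAVA_LONG.withName("firstChild"),
>     JAVA_LONG.withName("lastChild"),
>     JAVA_LONG.withName("leftSibling"),
>     JAVA_LONG.withName("rightSibling"));
>
> static final VarHandle FIRST_CHILD =
>     NODE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("firstChild"));
>
> // Reading: long firstChild = (long) FIRST_CHILD.get(nodeSegment, 0L);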
>
> Up to a predefined, very small threshold N, page fragments are fetched
> in parallel from disk if there's no in-memory reference and the page is
> not found in a Caffeine cache; the fragments are then combined into a
> full page. Thus, setSlot is called once during reconstruction of the
> full page for each slot which is not currently set but is set in the
> current page fragment, as sketched below.
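>
> In pseudo-code the reconstruction looks roughly like this (PageFragment,
> readFragment, forEachSetSlot and fragmentKeys are hypothetical names):
>
> import java.util.List;
> import java.util.concurrent.CompletableFuture;
>
> // Fetch up to N page fragments concurrently, then fold them into one
> // page; setSlot only fires for slots not yet set in the combined page.
> List<CompletableFuture<PageFragment>> futures =
>     fragmentKeys.stream()
>                 .map(key -> CompletableFuture.supplyAsync(() -> readFragment(key)))
>                 .toList();
>
> for (CompletableFuture<PageFragment> future : futures) {
>     PageFragment fragment = future.join();
>     fragment.forEachSetSlot((index, data) -> {
>         if (page.slotOffsets[index] == -1) {  // slot not set yet
>             setSlot(page, index, data);
>         }
>     });
> }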
>
> So, I assume that afterwards slots are only ever set in a single
> read-write trx per resource, and only seldom does variable-length data
> need to be adapted. If that's not the case, I could also try to leave
> some space after each slot, so that a record can grow without having to
> shift other data, or something like that.
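>
> Leaving slack space would look something like this (a sketch; the
> alignment constant is an assumption):
>
> static final int ALIGN = 8;  // pad each slot to a multiple of 8 bytes
>
> void setSlotPadded(SlotPage page, int index, MemorySegment recordData) {
>     int len = (int) recordData.byteSize();
>     int padded = (len + ALIGN - 1) & -ALIGN;  // round up to next multiple
>     MemorySegment.copy(recordData, 0, page.slotMemory, page.appendOffset, len);
>     page.slotOffsets[index] = page.appendOffset;
>     page.slotLengths[index] = len;            // logical length
>     page.appendOffset += padded;              // physical (padded) length
> }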
>
> At least I think the much worse runtime when traversing roughly
> 310_000_000 nodes in a preorder traversal (remember that they are stored
> in pages), currently seen when switching from 1 to 5 trxs in parallel,
> is due to the objects allocated (without the MemorySegment):
>
> https://github.com/sirixdb/sirix/blob/main/analysis-single-trx.jfr
>
> vs.
>
> https://github.com/sirixdb/sirix/blob/main/analysis-5-trxs.jfr
>
> Andrei Pangin helped a bit with analyzing the async-profiler snapshots;
> the runtime with 5 trxs in parallel is almost exactly 4x slower than
> with a single trx, and it's most probably due to the amount of
> allocations (even though GC seems OK).
>
> So, all in all, I've had a specific runtime performance problem (and the
> system was also paging a lot, so I think it makes sense that it may be
> due to the allocation rate).
>
> I hope the nodes can simply get a MemorySegment constructor param in the
> future instead of a couple of object delegates, so that I can use
> MemorySegments directly instead of converting back and forth between
> byte arrays during serialization/deserialization. We might even get
> (almost) rid of the whole step, and we'd gain better data locality.
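>
> Concretely, something like this (a hypothetical class, assuming the
> NODE_LAYOUT field order sketched earlier):
>
> import java.lang.foreign.MemorySegment;
> import static java.lang.foreign.ValueLayout.JAVA_LONG;
>
> // A node reading its fields straight out of the page's segment instead
> // of delegating to intermediate objects -- no per-node byte[] copies.
> final class SegmentBackedNode {
>     private final MemorySegment segment;  // slice of the page's slotMemory
>     private final long offset;            // this node's position in it
>
>     SegmentBackedNode(MemorySegment segment, long offset) {
>         this.segment = segment;
>         this.offset = offset;
>     }
>
>     long firstChildKey() {
>         return segment.get(JAVA_LONG, offset + 8); // parent is at offset 0
>     }
> }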
>
> Hope it makes some sense now, but it may also be worth looking into a
> single bigger byte array instead of a MemorySegment (even though I think
> that off-heap memory usage might not be a bad idea for a database system).
>
> You may want to take a quick look at the two profiles I provided...
>
> Kind regards and thanks a lot for your input. If it helps, I can provide
> the bigger JSON file I used for importing / the test.
>
> Johannes
>
> Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote on Mon,
> Sep 16, 2024, 12:31:
>
>>
>> On 16/09/2024 11:26, Maurizio Cimadamore wrote:
>> > I've rarely had these "Evacuation Failure: Pinned" log entries
>> > regarding the current "master" branch on Github, but now it's even
>> worse.
>>
>> Zooming in on this aspect: this would suggest that your heap memory is
>> being kept "pinned" somewhere.
>>
>> Are you, by any chance, using downcall method handles with the
>> "critical" Linker option? Or any form of critical JNI?
>>
>> It would be interesting (separately from the "architectural" angle
>> discussed in my previous reply) to see which method call(s) is causing
>> this exactly...
>>
>> Cheers
>> Maurizio
>>
>>