<div dir="auto">Well, I also need to be able to change the contents of the byte arrays after they are first created (for instance, changing node references such as leftSiblingNodeKey/rightSiblingNodeKey/firstChildKey/lastChildKey..., string values, booleans, numbers, null values...). </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Johannes Lichtenberger &lt;<a href="mailto:lichtenberger.johannes@gmail.com">lichtenberger.johannes@gmail.com</a>&gt; wrote on Fri, 28 Jul 2023, 23:45:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto">I think the main issue is not even allocation or GC, but rather the serialization of, in this case, more than 300_000_000 nodes in total.<div dir="auto"><br></div><div dir="auto">I've attached a screenshot showing the flame graph of the profiler output I posted the link to...</div><div dir="auto"><br></div><div dir="auto">What do you think? For starters, the page already has a slot array to store byte arrays for the nodes. The whole page should probably be something like a growable MemorySegment, but I could probably first try to simply write byte arrays directly and read from them using something like Chronicle Bytes or even MemorySegments, and then convert them to byte arrays?</div><div dir="auto"><br></div><div dir="auto">Kind regards</div><div dir="auto">Johannes</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" rel="noreferrer">maurizio.cimadamore@oracle.com</a>&gt; wrote on Fri, 28 Jul 2023, 11:38:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

On 28/07/2023 08:15, Johannes Lichtenberger wrote:<br>

> Hello,<br>

><br>

> I think I mentioned it already, but currently I'm thinking about it again.<br>

><br>

> Regarding the index trie in my spare-time project, I'm wondering whether it <br>

> makes sense, as currently I'm creating fine-grained on-heap nodes <br>

> during insertions/updates/deletes (1024 per page). Once a page is read <br>

> again from storage, I'm storing these nodes in a byte array of byte <br>

> arrays until they are read for the first time. One complication, though, is that the <br>

> nodes may store strings inline and are thus of variable size (and <br>

> so the pages are of variable size, too, padded to word alignment IIRC).<br>
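<div>A minimal sketch of the word-alignment padding described above, assuming an 8-byte word and a page that packs its variable-size node records back to back and then pads the total; the class and method names are hypothetical and not taken from SirixDB.</div>
<pre>
```java
public class PagePaddingSketch {
    static final int WORD = 8; // assumed word size in bytes

    // Rounds 'size' up to the next multiple of 'alignment' (must be a power of two).
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) & -alignment;
    }

    // Total bytes for a page whose variable-size node records are packed
    // back to back, with the page as a whole padded to a word boundary.
    static int paddedPageSize(int[] nodeSizes) {
        int total = 0;
        for (int size : nodeSizes) {
            total += size;
        }
        return alignUp(total, WORD);
    }

    public static void main(String[] args) {
        // 37 + 12 + 5 = 54 bytes of records, padded to the next multiple of 8
        System.out.println(paddedPageSize(new int[] {37, 12, 5})); // prints 56
    }
}
```
</pre>
<div>Because the alignment is a power of two, the mask -alignment clears the low bits, so the rounding costs one add and one AND.</div>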

><br>

> I'm currently auto-committing after approx. 500_000 nodes have been <br>

> created (afterwards they can be garbage collected), and in total there <br>

> are more than 320 million nodes in one test.<br>

><br>

> I think I could store the nodes in MemorySegments instead of using <br>

> on-heap classes/instances and dynamically reallocate memory if a node <br>

> value is changed.<br>
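<div>One possible shape for node records kept as byte arrays rather than on-heap objects, sketched under assumptions: the field layout (four 8-byte node keys, a 4-byte length, then the UTF-8 string bytes) and all names are hypothetical, and the code uses MethodHandles.byteArrayViewVarHandle (JDK 9+) to read and write primitives at arbitrary byte offsets. Fixed-size fields such as sibling or child keys are overwritten in place, while a string value that changes length forces a copy into a new array, which is the "reallocate if a node value is changed" case.</div>
<pre>
```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NodeBytesSketch {
    // Views a byte[] as little-endian longs/ints at arbitrary byte offsets.
    static final VarHandle LONG =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Hypothetical layout: four 8-byte keys, a 4-byte string length, then UTF-8 bytes.
    static final int LEFT_SIBLING = 0, RIGHT_SIBLING = 8, FIRST_CHILD = 16, LAST_CHILD = 24;
    static final int VALUE_LENGTH = 32, VALUE_BYTES = 36;

    static byte[] newNode(long left, long right, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        byte[] node = new byte[VALUE_BYTES + utf8.length];
        LONG.set(node, LEFT_SIBLING, left);
        LONG.set(node, RIGHT_SIBLING, right);
        LONG.set(node, FIRST_CHILD, -1L); // -1 = "no child" (assumption)
        LONG.set(node, LAST_CHILD, -1L);
        INT.set(node, VALUE_LENGTH, utf8.length);
        System.arraycopy(utf8, 0, node, VALUE_BYTES, utf8.length);
        return node;
    }

    // Fixed-size fields can be overwritten in place: no allocation needed.
    static void setKey(byte[] node, int fieldOffset, long key) {
        LONG.set(node, fieldOffset, key);
    }

    static long getKey(byte[] node, int fieldOffset) {
        return (long) LONG.get(node, fieldOffset);
    }

    static String getValue(byte[] node) {
        int len = (int) INT.get(node, VALUE_LENGTH);
        return new String(node, VALUE_BYTES, len, StandardCharsets.UTF_8);
    }

    // A variable-size value may change the record size, so copy the header
    // into a new array of the right length and write the new value after it.
    static byte[] withValue(byte[] node, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        byte[] copy = Arrays.copyOf(node, VALUE_BYTES + utf8.length);
        INT.set(copy, VALUE_LENGTH, utf8.length);
        System.arraycopy(utf8, 0, copy, VALUE_BYTES, utf8.length);
        return copy;
    }
}
```
</pre>
<div>Plain get/set through a byte-array view VarHandle is permitted at unaligned offsets, so the record needs no internal padding; the same layout could also be addressed through a MemorySegment using unaligned value layouts.</div>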

><br>

> However, I'm not sure, as it means a lot of work, and maybe off-heap <br>

> memory access is always slightly worse than on-heap access!?<br>

<br>

I don't think that's necessarily the case. I mean, array access is the <br>

best: there are more optimizations for it, and the access is more <br>

scrutable to the optimizing compiler.<br>

<br>

If you start using APIs, such as ByteBuffer or MemorySegment, they take <br>

a bit of a hit, depending on usage, as each access has to verify certain <br>

access properties. That said, if your access is "well-behaved" and <br>

SIMD-friendly (e.g. counted loops and such), you can expect the performance <br>

of MS/BB to be very good, as all the relevant checks will be hoisted out <br>

of loops.<br>
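<div>To illustrate the "well-behaved" access pattern, here is a minimal counted loop over a direct ByteBuffer using absolute reads (class and method names are hypothetical). Whether the JIT actually hoists the bounds checks depends on the JVM and the surrounding code; the shape (int induction variable, loop-invariant bound, unit stride) is simply the one that gives it the best chance. The same loop over a MemorySegment would call get(ValueLayout.JAVA_INT, offset) instead.</div>
<pre>
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class CountedLoopSketch {
    // Sums 'count' ints stored back to back in 'buf', starting at byte 0.
    // Absolute getInt(index) does not touch the buffer's position, so each
    // iteration is an independent read the JIT can reason about.
    static long sumInts(ByteBuffer buf, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 1_000;
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            buf.putInt(i * Integer.BYTES, i);
        }
        System.out.println(sumInts(buf, count)); // prints 499500
    }
}
```
</pre>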

<br>

With memory segments, since you can also create unsafe segments on the <br>

fly, we're investigating approaches where you can get (at least in <br>

synthetic benchmarks) the same assembly and performance as raw Unsafe calls:<br>

<br>

<a href="https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html" rel="noreferrer noreferrer noreferrer" target="_blank">https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html</a><br>

<br>

I think one area that requires a lot of thought when it comes to <br>

off-heap is allocation. The GC is mighty fast at allocating objects, <br>

especially small ones that might die soon. The default implementation of <br>

Arena::allocate uses malloc under the hood, so it's not going to be <br>

anywhere near as fast.<br>

<br>

That said, starting from Java 20 you can define a custom arena with a <br>

better allocation scheme. For instance, if you are allocating in a tight <br>

loop, you can write an "allocator" which just recycles memory (see <br>

SegmentAllocator::slicingAllocator). With malloc out of the way, the <br>

situation should improve significantly.<br>
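<div>A minimal sketch of the recycling idea, assuming JDK 22 or later (where java.lang.foreign is final): one block is allocated from a confined Arena up front, and each batch creates a fresh SegmentAllocator.slicingAllocator over that same block, so each allocation just returns the next slice and the memory is reused batch after batch instead of going through malloc again. The 24-byte node record and all names are hypothetical.</div>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;

public class RecyclingAllocatorSketch {
    // Hypothetical fixed-size node record: nodeKey, leftSiblingKey, rightSiblingKey.
    static final long NODE_KEY = 0, LEFT_SIBLING = 8, RIGHT_SIBLING = 16;
    static final long NODE_SIZE = 3 * Long.BYTES;

    // Writes 'batches' batches of 'nodesPerBatch' nodes into one reused block
    // and returns the sum of all node keys written.
    static long run(long nodesPerBatch, int batches) {
        try (Arena arena = Arena.ofConfined()) {
            // A single native allocation for the backing block, done once.
            MemorySegment block = arena.allocate(NODE_SIZE * nodesPerBatch, Long.BYTES);
            long checksum = 0;
            for (int batch = 0; batch < batches; batch++) {
                // A new slicing allocator starts again at offset 0 of the same block,
                // so each batch recycles the memory used by the previous one.
                SegmentAllocator slicer = SegmentAllocator.slicingAllocator(block);
                for (long i = 0; i < nodesPerBatch; i++) {
                    MemorySegment node = slicer.allocate(NODE_SIZE, Long.BYTES);
                    node.set(ValueLayout.JAVA_LONG, NODE_KEY, i);
                    node.set(ValueLayout.JAVA_LONG, LEFT_SIBLING, i - 1);
                    node.set(ValueLayout.JAVA_LONG, RIGHT_SIBLING, i + 1);
                    checksum += node.get(ValueLayout.JAVA_LONG, NODE_KEY);
                }
            }
            return checksum;
        } // closing the arena frees the block
    }

    public static void main(String[] args) {
        System.out.println(run(1024, 3));
    }
}
```
</pre>
<div>Slices handed out in one batch alias the slices of the next, so this only fits data that is dead (or already serialized) by the time the next batch starts; a slicing allocator throws IndexOutOfBoundsException once its block is exhausted.</div>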

<br>

Ultimately, picking the right allocation scheme depends on your workload; <br>

there is no one-size-fits-all solution (as I'm sure you know). But there should <br>

be enough building blocks in the API to allow you to do what you need.<br>

<br>

<br>

><br>

> I'll check GC pressure again by logging it, but an IntelliJ profiler <br>

> (async-profiler/JFR) output of a run that stores a big JSON file in <br>

> SirixDB can be seen here: <br>

> <a href="https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr" rel="noreferrer noreferrer noreferrer" target="_blank">https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr</a><br>

><br>

> I think I had better performance/latency with Shenandoah (not <br>

> generational), but ZGC was worse in other workloads due to Caffeine <br>

> caches and not being generational (but that's changing of course).<br>

><br>

> So, looking at the profiler output and the flame graph, where <br>

> G1 work seems to be prominent, do you think a refactoring using <br>

> MemorySegments would be appropriate, or is it maybe an ideal "big data" <br>

> use case for the generational low-latency GCs, where the number of <br>

> objects is not an issue at all!?<br>

<br>

Hard to say. Generational GCs are very, very good. And object allocation <br>

might be cheaper than you think. Where off-heap becomes advantageous is <br>

(typically) if you need to work with memory mapped files (and/or native <br>

calls), which is common for database-like use cases.<br>
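<div>For the memory-mapped case, a minimal sketch assuming JDK 22 or later: FileChannel.map with an Arena argument returns a MemorySegment whose mapping is released when the arena is closed. The class name, file name, page size, and offset are arbitrary illustration values.</div>
<pre>
```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedPageSketch {
    // Maps 'size' bytes of 'file' read-write (growing the file if needed),
    // writes 'value' at byte 'offset', flushes, and reads the value back.
    static long writeAndReadBack(Path file, long size, long offset, long value)
            throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
             Arena arena = Arena.ofConfined()) {
            // The mapping's lifetime is tied to the arena; closing it unmaps the region.
            MemorySegment page = ch.map(FileChannel.MapMode.READ_WRITE, 0, size, arena);
            page.set(ValueLayout.JAVA_LONG_UNALIGNED, offset, value);
            page.force(); // write the dirty pages back to the file
            return page.get(ValueLayout.JAVA_LONG_UNALIGNED, offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sirix-page", ".bin");
        System.out.println(writeAndReadBack(file, 4096, 128, 42L)); // prints 42
        Files.delete(file);
    }
}
```
</pre>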

<br>

Maurizio<br>

<br>

><br>

> Kind regards<br>

> Johannes<br>

</blockquote></div>

</blockquote></div>