<div dir="auto">Well, I also need to be able to change the contents in the byte arrays after the first creation (for instance change node references as leftSiblingNodeKey/rightSiblingNodeKey/firstChildKey/lastChildKey... string values, boolean, numbers, null values...). </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Johannes Lichtenberger <<a href="mailto:lichtenberger.johannes@gmail.com">lichtenberger.johannes@gmail.com</a>> schrieb am Fr., 28. Juli 2023, 23:45:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto">I think the main issue is not even allocation or GC, but instead the serialization of in this case over 300_000_000 nodes in total.<div dir="auto"><br></div><div dir="auto">I've attached a screenshot showing the flamegraph of the profiler output I posted the link to...</div><div dir="auto"><br></div><div dir="auto">What do you think? For starters the page already has a slot array to store byte arrays for the nodes. The whole page should probably be something like a growable MemorySegment, but I could probably first try to simply write byte arrays directly and read from them using something as Chronicle bytes or even MemorySegments and the converting it to byte arrays?</div><div dir="auto"><br></div><div dir="auto">Kind regards</div><div dir="auto">Johannes</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" rel="noreferrer">maurizio.cimadamore@oracle.com</a>> schrieb am Fr., 28. Juli 2023, 11:38:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
On 28/07/2023 08:15, Johannes Lichtenberger wrote:<br>
> Hello,<br>
><br>
> I think I mentioned it already, but currently I'm thinking about it again.<br>
><br>
> Regarding the index trie in my spare time project I'm thinking if it <br>
> makes sense, as currently I'm creating fine granular on heap nodes <br>
> during insertions/updates/deletes (1024 per page). Once a page is read <br>
> again from storage I'm storing these nodes in a byte array of byte <br>
> arrays until read for the first time. One thing though is, that the <br>
> nodes may store strings inline and thus are of variable size (and <br>
> thus, the pages are of variable size, too, padded to word aligned IIRC).<br>
><br>
> I'm currently auto-committing after approx 500_000 nodes have been <br>
> created (afterwards they can be garbage collected) and in total there <br>
> are more than 320 million nodes in one test.<br>
><br>
> I think I could store the nodes in MemorySegments instead of using on <br>
> heap classes / instances and dynamically reallocate memory if a node <br>
> value is changed.<br>
><br>
> However, I'm not sure as it means a lot of work and maybe off heap <br>
> memory access is always slightly worse than on heap!?<br>
<br>
I don't think that's necessarily the case. I mean, array access is the <br>
best, there's more optimizations for it, and the access is more <br>
scrutable to the optimizing compiler.<br>
<br>
If you start using APIs, such as ByteBuffer or MemorySegment, they take <br>
a bit of a hit, depending on usage, as each access has to verify certain <br>
access properties. That said, if your access is "well-behaved" and <br>
SIMD-friendly (e.g. counted loop and such), you can expect performance <br>
of MS/BB to be very good, as all the relevant checks will be hoisted out <br>
of loops.<br>
<br>
With memory segments, since you can also create unsafe segments on the <br>
fly, we're investigating approaches where you can get (at least in <br>
synthetic benchmarks) the same assembly and performance of raw Unsafe calls:<br>
<br>
<a href="https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html" rel="noreferrer noreferrer noreferrer" target="_blank">https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html</a><br>
<br>
I think one area that requires a lot of thought when it comes to <br>
off-heap is allocation. The GC is mightly fast at allocating objects, <br>
especially small ones that might die soon. The default implementation of <br>
Arena::allocate uses malloc under the hood, so it's not going to be <br>
anywhere as fast.<br>
<br>
That said, starting from Java 20 you can define a custom arena with a <br>
better allocation scheme. For instance, if you are allocating in a tight <br>
loop you can write an "allocator" which just recycles memory (see <br>
SegmentAllocator::slicingAllocator). With malloc out of the way the <br>
situation should improve significantly.<br>
<br>
Ultimately picking the right allocation scheme depends on your workload, <br>
there is no one size-fits-all (as I'm sure you know). But there should <br>
be enough building blocks in the API to allow you to do what you need.<br>
<br>
<br>
><br>
> I'll check GC pressure again by logging it, but an IntelliJ profiler <br>
> (async profiler JFR) output of a run to store a big JSON file in <br>
> SirixDB can be seen here: <br>
> <a href="https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr" rel="noreferrer noreferrer noreferrer" target="_blank">https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr</a><br>
><br>
> I think I had better performance/latency with Shenandoah (not <br>
> generational), but ZGC was worse in other workloads due to caffeine <br>
> caches and not being generational (but that's changing of course).<br>
><br>
> So, by looking at the profiler output and probably the flame graph <br>
> where G1 work seems to be prominent do you think a refactoring would <br>
> be appropriate using MemorySegments or maybe it's an ideal "big data" <br>
> use case for the generational low latency GCs and the amount of <br>
> objects is not an issue at all!?<br>
<br>
Hard to say. Generational GCs are very very good. And object allocation <br>
might be cheaper than you think. Where off-heap becomes advantageous is <br>
(typically) if you need to work with memory mapped files (and/or native <br>
calls), which is common for database-like use cases.<br>
<br>
Maurizio<br>
<br>
><br>
> Kind regards<br>
> Johannes<br>
</blockquote></div>
</blockquote></div>