<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>If you need to serialize big chunks of memory to files, I think
you are in the use case Brian S. was describing, i.e. you need
some kind of memory-mapped solution.</p>
<p>That is, you probably want the in-memory layout to match the serialized
layout, so that your memory operations can be persisted directly
to disk (e.g. by calling MemorySegment::force).</p>
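<p>For instance (a minimal sketch, assuming the Java 22 FFM API; the file
name and the 4 KiB mapping size are made up for illustration), writes to a
mapped segment go straight to the mapped region and can be flushed to the
backing file with force():</p>

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWrite {
    // Map `file`, write one long at offset 0, flush it to disk, read it back.
    static long writeAndRead(Path file) throws IOException {
        try (Arena arena = Arena.ofConfined();
             FileChannel ch = FileChannel.open(file,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.READ,
                     StandardOpenOption.WRITE)) {
            MemorySegment seg = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096, arena);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);  // a plain memory write...
            seg.force();                             // ...persisted to the file
            return seg.get(ValueLayout.JAVA_LONG, 0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndRead(Path.of("nodes.bin")));
    }
}
```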
<p>That said, if you have the requirement to go back and forth
between byte arrays and off-heap memory, you might be in trouble,
because there's no way to "wrap an array" over a piece of off-heap
memory. You would have to allocate and then copy, which is very
expensive if you have a lot of data.</p>
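<p>A small sketch of what that round trip looks like with the Java 22 FFM
API (the helper names are mine): copying out goes through
MemorySegment::toArray, and copying back in needs a fresh native allocation
plus a bulk copy. MemorySegment::ofArray only gives a heap-backed view of
the array, so it can't make a byte[] alias off-heap memory:</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class CopyRoundTrip {
    // Off-heap -> byte[]: allocates a new array and copies every byte.
    static byte[] toBytes(MemorySegment nativeSeg) {
        return nativeSeg.toArray(ValueLayout.JAVA_BYTE);
    }

    // byte[] -> off-heap: allocate native memory, then bulk-copy into it.
    // MemorySegment.ofArray is just a heap-backed *view* of the array.
    static MemorySegment fromBytes(byte[] bytes, Arena arena) {
        MemorySegment seg = arena.allocate(bytes.length);
        MemorySegment.copy(MemorySegment.ofArray(bytes), 0, seg, 0, bytes.length);
        return seg;
    }
}
```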
<p>Maurizio<br>
</p>
<div class="moz-cite-prefix">On 28/07/2023 22:45, Johannes
Lichtenberger wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAGXNUvYv5gRneo-0C9REVE6TGOoboEizPnVPShE1Zwe87UL-OA@mail.gmail.com">
<div dir="auto">I think the main issue is not even allocation or
GC, but rather the serialization of (in this case) over
300_000_000 nodes in total.
<div dir="auto"><br>
</div>
<div dir="auto">I've attached a screenshot showing the
flamegraph of the profiler output I posted the link to...</div>
<div dir="auto"><br>
</div>
<div dir="auto">What do you think? For starters, the page already
has a slot array to store byte arrays for the nodes. The whole
page should probably be something like a growable
MemorySegment, but I could first try to simply write the
byte arrays directly and read from them using something like
Chronicle Bytes or even MemorySegments, then convert them
back to byte arrays?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Kind regards</div>
<div dir="auto">Johannes</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
wrote on Fri, 28 Jul 2023, 11:38:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
On 28/07/2023 08:15, Johannes Lichtenberger wrote:<br>
> Hello,<br>
><br>
> I think I mentioned it already, but currently I'm
thinking about it again.<br>
><br>
> Regarding the index trie in my spare-time project, I'm wondering if it <br>
> makes sense, as currently I'm creating fine-grained on-heap nodes <br>
> during insertions/updates/deletes (1024 per page). Once a page is read <br>
> again from storage, I'm storing these nodes in a byte array of byte <br>
> arrays until they are read for the first time. One thing, though, is that the <br>
> nodes may store strings inline and thus are of variable size (and <br>
> thus the pages are of variable size, too, padded to word alignment IIRC).<br>
><br>
> I'm currently auto-committing after approx 500_000 nodes
have been <br>
> created (afterwards they can be garbage collected) and in
total there <br>
> are more than 320 million nodes in one test.<br>
><br>
> I think I could store the nodes in MemorySegments instead
of using on <br>
> heap classes / instances and dynamically reallocate
memory if a node <br>
> value is changed.<br>
><br>
> However, I'm not sure, as it means a lot of work, and maybe off-heap <br>
> memory access is always slightly worse than on-heap!?<br>
<br>
I don't think that's necessarily the case. I mean, array access is the <br>
best: there are more optimizations for it, and the access is more <br>
scrutable to the optimizing compiler.<br>
<br>
If you start using APIs, such as ByteBuffer or MemorySegment,
they take <br>
a bit of a hit, depending on usage, as each access has to
verify certain <br>
access properties. That said, if your access is "well-behaved"
and <br>
SIMD-friendly (e.g. counted loops and such), you can expect
performance <br>
of MS/BB to be very good, as all the relevant checks will be
hoisted out <br>
of loops.<br>
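<p>A "well-behaved" loop of that shape might look like this (a sketch; the
summing task is just for illustration): the stride is constant and the
bound is derived from the segment size, so the bounds and liveness checks
can be hoisted out of the loop:</p>

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class CountedLoop {
    // Counted loop over a segment of longs; after check hoisting the body
    // is a plain strided read, which is also SIMD-friendly.
    static long sum(MemorySegment seg) {
        long total = 0;
        long n = seg.byteSize() / Long.BYTES;
        for (long i = 0; i < n; i++) {
            total += seg.getAtIndex(ValueLayout.JAVA_LONG, i);
        }
        return total;
    }
}
```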
<br>
With memory segments, since you can also create unsafe
segments on the <br>
fly, we're investigating approaches where you can get (at
least in <br>
synthetic benchmarks) the same assembly and performance as raw
Unsafe calls:<br>
<br>
<a href="https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html</a><br>
<br>
I think one area that requires a lot of thought when it comes
to <br>
off-heap is allocation. The GC is mighty fast at allocating
objects, <br>
especially small ones that might die soon. The default
implementation of <br>
Arena::allocate uses malloc under the hood, so it's not going
to be <br>
anywhere near as fast.<br>
<br>
That said, starting from Java 20 you can define a custom arena
with a <br>
better allocation scheme. For instance, if you are allocating
in a tight <br>
loop you can write an "allocator" which just recycles memory
(see <br>
SegmentAllocator::slicingAllocator). With malloc out of the
way the <br>
situation should improve significantly.<br>
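<p>A minimal sketch of that scheme (the class and method names are mine):
one upfront allocation, after which the allocator hands out consecutive
slices, so each per-node "allocation" is an offset bump rather than a
malloc call:</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;

public class SlicingAlloc {
    // Allocate two 64-byte "nodes" from a slicing allocator and return the
    // distance between them: consecutive slices, no malloc per allocation.
    static long sliceGap(MemorySegment block) {
        SegmentAllocator allocator = SegmentAllocator.slicingAllocator(block);
        MemorySegment a = allocator.allocate(64);
        MemorySegment b = allocator.allocate(64);
        return b.address() - a.address();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment block = arena.allocate(1024 * 1024); // malloc happens once
            System.out.println(sliceGap(block));               // slicing after that
        }
    }
}
```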
<br>
Ultimately, picking the right allocation scheme depends on your
workload; <br>
there is no one-size-fits-all (as I'm sure you know). But
there should <br>
be enough building blocks in the API to allow you to do what
you need.<br>
<br>
<br>
><br>
> I'll check GC pressure again by logging it, but an
IntelliJ profiler <br>
> (async profiler JFR) output of a run to store a big JSON
file in <br>
> SirixDB can be seen here: <br>
> <a href="https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true">https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr</a><br>
><br>
> I think I had better performance/latency with Shenandoah
(not <br>
> generational), but ZGC was worse in other workloads due
to Caffeine <br>
> caches and not being generational (but that's changing of
course).<br>
><br>
> So, looking at the profiler output, and probably the flame graph <br>
> where G1 work seems to be prominent: do you think a refactoring <br>
> using MemorySegments would be appropriate, or maybe it's an <br>
> ideal "big data" use case for the generational low-latency GCs, and <br>
> the number of objects is not an issue at all!?<br>
<br>
Hard to say. Generational GCs are very, very good, and object <br>
allocation might be cheaper than you think. Where off-heap becomes <br>
advantageous is (typically) when you need to work with memory-mapped <br>
files (and/or native calls), which is common for database-like use cases.<br>
<br>
Maurizio<br>
<br>
><br>
> Kind regards<br>
> Johannes<br>
</blockquote>
</div>
</blockquote>
</body>
</html>