<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>If you need to serialize big chunks of memory to files, I think
you are in the use case Brian S. was describing, i.e. you need
some kind of memory-mapped solution.</p>
<p>That is, you probably want the in-memory layout to match the serialized
layout, so that your memory operations can be persisted directly
to disk (e.g. by calling MemorySegment::force).</p>
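<p>For instance (a minimal sketch, assuming the Java 22 FFM API; the file
name and the 4 KiB mapping size are made up for illustration), writes to a
mapped segment go straight to the mapped region and can be flushed to the
backing file with force():</p>

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWrite {
    // Map `file`, write one long at offset 0, flush it to disk, read it back.
    static long writeAndRead(Path file) throws IOException {
        try (Arena arena = Arena.ofConfined();
             FileChannel ch = FileChannel.open(file,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.READ,
                     StandardOpenOption.WRITE)) {
            MemorySegment seg = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096, arena);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);  // a plain memory write...
            seg.force();                             // ...persisted to the file
            return seg.get(ValueLayout.JAVA_LONG, 0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndRead(Path.of("nodes.bin")));
    }
}
```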
<p>That said, if you have the requirement to go back and forth
between byte arrays and off-heap memory, you might be in trouble,
because there's no way to "wrap an array" over a piece of off-heap
memory. You would have to allocate and then copy, which is very
expensive if you have a lot of data.</p>
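<p>A small sketch of what that round trip looks like with the Java 22 FFM
API (the helper names are mine): copying out goes through
MemorySegment::toArray, and copying back in needs a fresh native allocation
plus a bulk copy. MemorySegment::ofArray only gives a heap-backed view of
the array, so it can't make a byte[] alias off-heap memory:</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class CopyRoundTrip {
    // Off-heap -> byte[]: allocates a new array and copies every byte.
    static byte[] toBytes(MemorySegment nativeSeg) {
        return nativeSeg.toArray(ValueLayout.JAVA_BYTE);
    }

    // byte[] -> off-heap: allocate native memory, then bulk-copy into it.
    // MemorySegment.ofArray is just a heap-backed *view* of the array.
    static MemorySegment fromBytes(byte[] bytes, Arena arena) {
        MemorySegment seg = arena.allocate(bytes.length);
        MemorySegment.copy(MemorySegment.ofArray(bytes), 0, seg, 0, bytes.length);
        return seg;
    }
}
```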
<p>Maurizio<br>
</p>
<div class="moz-cite-prefix">On 28/07/2023 22:45, Johannes
Lichtenberger wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAGXNUvYv5gRneo-0C9REVE6TGOoboEizPnVPShE1Zwe87UL-OA@mail.gmail.com">
<div dir="auto">I think the main issue is not even allocation or
GC, but rather the serialization of (in this case) over
300_000_000 nodes in total.
<div dir="auto"><br>
</div>
<div dir="auto">I've attached a screenshot showing the
flamegraph of the profiler output I posted the link to...</div>
<div dir="auto"><br>
</div>
<div dir="auto">What do you think? For starters, the page already
has a slot array to store byte arrays for the nodes. The whole
page should probably be something like a growable
MemorySegment, but I could first try to simply write the
byte arrays directly and read from them using something like
Chronicle Bytes or even MemorySegments, then convert them
back to byte arrays?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Kind regards</div>
<div dir="auto">Johannes</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>
wrote on Fri, 28 Jul 2023, 11:38:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
On 28/07/2023 08:15, Johannes Lichtenberger wrote:<br>
> Hello,<br>
><br>
> I think I mentioned it already, but currently I'm
thinking about it again.<br>
><br>
> Regarding the index trie in my spare-time project, I'm wondering if it <br>
> makes sense, as currently I'm creating fine-grained on-heap nodes <br>
> during insertions/updates/deletes (1024 per page). Once a page is read <br>
> again from storage, I'm storing these nodes in a byte array of byte <br>
> arrays until they are read for the first time. One thing, though, is that the <br>
> nodes may store strings inline and thus are of variable size (and <br>
> thus the pages are of variable size, too, padded to word alignment IIRC).<br>
><br>
> I'm currently auto-committing after approx 500_000 nodes
have been <br>
> created (afterwards they can be garbage collected) and in
total there <br>
> are more than 320 million nodes in one test.<br>
><br>
> I think I could store the nodes in MemorySegments instead
of using on <br>
> heap classes / instances and dynamically reallocate
memory if a node <br>
> value is changed.<br>
><br>
> However, I'm not sure, as it means a lot of work, and maybe off-heap <br>
> memory access is always slightly worse than on-heap!?<br>
<br>
I don't think that's necessarily the case. I mean, array access is the <br>
best: there are more optimizations for it, and the access is more <br>
scrutable to the optimizing compiler.<br>
<br>
If you start using APIs, such as ByteBuffer or MemorySegment,
they take <br>
a bit of a hit, depending on usage, as each access has to
verify certain <br>
access properties. That said, if your access is "well-behaved"
and <br>
SIMD-friendly (e.g. counted loops and such), you can expect
performance <br>
of MS/BB to be very good, as all the relevant checks will be
hoisted out <br>
of loops.<br>
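<p>A "well-behaved" loop of that shape might look like this (a sketch; the
summing task is just for illustration): the stride is constant and the
bound is derived from the segment size, so the bounds and liveness checks
can be hoisted out of the loop:</p>

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class CountedLoop {
    // Counted loop over a segment of longs; after check hoisting the body
    // is a plain strided read, which is also SIMD-friendly.
    static long sum(MemorySegment seg) {
        long total = 0;
        long n = seg.byteSize() / Long.BYTES;
        for (long i = 0; i < n; i++) {
            total += seg.getAtIndex(ValueLayout.JAVA_LONG, i);
        }
        return total;
    }
}
```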
<br>
With memory segments, since you can also create unsafe
segments on the <br>
fly, we're investigating approaches where you can get (at
least in <br>
synthetic benchmarks) the same assembly and performance as raw
Unsafe calls:<br>
<br>
<a href="https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html</a><br>
<br>
I think one area that requires a lot of thought when it comes
to <br>
off-heap is allocation. The GC is mighty fast at allocating
objects, <br>
especially small ones that might die soon. The default
implementation of <br>
Arena::allocate uses malloc under the hood, so it's not going
to be <br>
anywhere near as fast.<br>
<br>
That said, starting from Java 20 you can define a custom arena
with a <br>
better allocation scheme. For instance, if you are allocating
in a tight <br>
loop you can write an "allocator" which just recycles memory
(see <br>
SegmentAllocator::slicingAllocator). With malloc out of the
way the <br>
situation should improve significantly.<br>
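<p>A minimal sketch of that scheme (the class and method names are mine):
one upfront allocation, after which the allocator hands out consecutive
slices, so each per-node "allocation" is an offset bump rather than a
malloc call:</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;

public class SlicingAlloc {
    // Allocate two 64-byte "nodes" from a slicing allocator and return the
    // distance between them: consecutive slices, no malloc per allocation.
    static long sliceGap(MemorySegment block) {
        SegmentAllocator allocator = SegmentAllocator.slicingAllocator(block);
        MemorySegment a = allocator.allocate(64);
        MemorySegment b = allocator.allocate(64);
        return b.address() - a.address();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment block = arena.allocate(1024 * 1024); // malloc happens once
            System.out.println(sliceGap(block));               // slicing after that
        }
    }
}
```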
<br>
Ultimately, picking the right allocation scheme depends on your
workload; <br>
there is no one-size-fits-all (as I'm sure you know). But
there should <br>
be enough building blocks in the API to allow you to do what
you need.<br>
<br>
<br>
><br>
> I'll check GC pressure again by logging it, but an
IntelliJ profiler <br>
> (async profiler JFR) output of a run to store a big JSON
file in <br>
> SirixDB can be seen here: <br>
> <a href="https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true">https://github.com/sirixdb/sirix/blob/refactoring-serialization/JsonShredderTest_testChicago_2023_07_27_131637.jfr</a><br>
><br>
> I think I had better performance/latency with Shenandoah
(not <br>
> generational), but ZGC was worse in other workloads due
to Caffeine <br>
> caches and not being generational (but that's changing of
course).<br>
><br>
> So, looking at the profiler output, and probably the flame graph <br>
> where G1 work seems to be prominent: do you think a refactoring <br>
> using MemorySegments would be appropriate, or maybe it's an <br>
> ideal "big data" use case for the generational low-latency GCs, and <br>
> the number of objects is not an issue at all!?<br>
<br>
Hard to say. Generational GCs are very, very good, and object <br>
allocation might be cheaper than you think. Where off-heap becomes <br>
advantageous is (typically) when you need to work with memory-mapped <br>
files (and/or native calls), which is common for database-like use cases.<br>
<br>
Maurizio<br>
<br>
><br>
> Kind regards<br>
> Johannes<br>
</blockquote>
</div>
</blockquote>
</body>
</html>