Generational ZGC issue

Johannes Lichtenberger lichtenberger.johannes at gmail.com
Fri Feb 16 23:36:09 UTC 2024


I just removed "-XX+DisableExplizitGC", increased max direct memory size to
5g (-XX:MaxDirectMemorySize=5g), but also changed Bytes::elasticByteBuffer
to Bytes.elasticHeapByteBuffer(60_000);
to use on heap ByteBuffers.
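
For reference, this is roughly the change (a minimal sketch against the
Chronicle Bytes API; the 60_000 initial capacity mirrors the allocation
size from the stack trace further down in this thread):

    import net.openhft.chronicle.bytes.Bytes;
    import java.nio.ByteBuffer;

    // Before: backed by a DirectByteBuffer. Freeing the native memory
    // relies on reference processing, and thus on System.gc() once the
    // MaxDirectMemorySize limit is approached.
    Bytes<ByteBuffer> direct = Bytes.elasticByteBuffer(60_000);

    // After: backed by an on-heap ByteBuffer, reclaimed like any other
    // object; no direct-memory limit involved.
    Bytes<ByteBuffer> onHeap = Bytes.elasticHeapByteBuffer(60_000);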

However, the performance seems to be way worse. I've repeated the test
several times: with both G1 and non-generational ZGC it's ~50s for
importing the JSON file, vs ~100s using generational ZGC (Temurin
21.0.2), with similar values for the actual traversals.

From the log on STDOUT, I can see this (do the 0,319s and 0,440s values
denote pause times?):

[35,718s][info   ][gc      ] GC(9) Minor Collection (Allocation Rate) 12462M(81%)->1556M(10%) 0,319s
[40,871s][info   ][gc      ] GC(10) Minor Collection (Allocation Rate)
[41,311s][info   ][gc      ] GC(10) Minor Collection (Allocation Rate) 13088M(85%)->1432M(9%) 0,440s
[46,236s][info   ][gc      ] GC(11) Minor Collection (Allocation Rate)
[46,603s][info   ][gc      ] GC(11) Minor Collection (Allocation Rate) 12406M(81%)->1676M(11%) 0,367s
[51,445s][info   ][gc      ] GC(12) Minor Collection (Allocation Rate)
[51,846s][info   ][gc      ] GC(12) Minor Collection (Allocation Rate) 12848M(84%)->1556M(10%) 0,401s
[56,203s][info   ][gc      ] GC(13) Major Collection (Proactive)
[56,368s][info   ][gc      ] GC(13) Major Collection (Proactive) 11684M(76%)->484M(3%) 0,166s

kind regards
Johannes

On Fri, Feb 16, 2024 at 22:39, Erik Osterlund <erik.osterlund at oracle.com>
wrote:

> It’s worth noting that when using ZGC, calling System.gc() does not invoke
> a classic, disastrously long GC pause. Instead, a concurrent GC is
> triggered, which should be barely noticeable to the application. The
> thread calling System.gc() is blocked until the GC is done, but the other
> threads can run freely.
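>
> A minimal sketch of what that means in practice (illustrative only, not
> code from this thread):
>
>     // Run with -XX:+UseZGC. Worker threads keep making progress while
>     // one thread blocks inside System.gc() for the concurrent cycle.
>     Thread worker = Thread.ofPlatform().start(() -> {
>         while (!Thread.currentThread().isInterrupted()) {
>             // application work continues concurrently with the GC
>         }
>     });
>     long start = System.nanoTime();
>     System.gc(); // blocks only the calling thread until the cycle ends
>     System.out.printf("System.gc() returned after %d ms%n",
>             (System.nanoTime() - start) / 1_000_000);
>     worker.interrupt();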
>
> /Erik
>
> > > On 16 Feb 2024, at 21:55, Stefan Johansson <stefan.johansson at oracle.com> wrote:
> >
> > 
> >
> >> On 2024-02-16 18:04, Johannes Lichtenberger wrote:
> >> Thanks a lot, I wasn't even aware of the fact that DirectByteBuffers
> >> use System.gc(), and I always had in mind that calling System.gc(), at
> >> least in application code, is bad practice (or at least that we
> >> shouldn't rely on it). I think I read somewhere a while ago that it's
> >> even recommended to disable it, but I may be completely wrong, of course.
> > In most cases calling System.gc() is bad practice; in some special
> > cases it might be needed.
> >
> >> I'll change it to on-heap byte buffers tomorrow :-)
> >> I think your GC log entries were from G1, right? It seems ZGC always
> >> tries to use the full heap :-)
> >
> > Yes, the snippet was G1; it was mostly to show that the pressure isn't
> > high. You are correct that ZGC uses more of the given heap, but the
> > collections are pretty far apart and I'm certain it would function well
> > with a smaller heap as well. Maybe in that case some Major collections
> > would be triggered.
> >
> >> Kind regards and thanks for sharing your insights.
> >
> > No problem. We appreciate the feedback,
> > StefanJ
> >
> >> Have a nice weekend as well
> >> Johannes
> >> Stefan Johansson <stefan.johansson at oracle.com> wrote on Fri,
> >> Feb 16, 2024, 17:38:
> >>    Hi,
> >>    Some comments inline.
> >>    On 2024-02-16 16:47, Johannes Lichtenberger wrote:
> >>     > Thanks a lot for looking into it. I've added
> >>     > `-XX:MaxDirectMemorySize=2g` only recently, but without it, it
> >>     > failed as well, so I'm not sure what the default is. Will
> >>     > definitely check your suggestions :-)
> >>     >
> >>    If you don't set a limit it will be set to
> >>    Runtime.getRuntime().maxMemory(), so it's likely a good idea to set a
> >>    reasonable limit. But the smaller the limit is, the more frequently we
> >>    need to run reference processing to allow memory to be freed up.
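> >>
> >>    For example (a tiny sketch; without -XX:MaxDirectMemorySize the
> >>    direct-memory limit equals the max heap size reported here):
> >>
> >>        // Prints the max heap size, which is also the default
> >>        // direct-memory limit when no explicit limit is set.
> >>        System.out.printf("Default MaxDirectMemorySize: %d bytes%n",
> >>                Runtime.getRuntime().maxMemory());
> >>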
> >>     > Sadly I'm currently working alone on the project in my spare time
> >>     > (besides professionally having switched from Java/Kotlin stuff to
> >>     > the embedded software world) and I'm not sure if the current
> >>     > architecture of Sirix is limited by too much GC pressure. I'd
> >>     > probably have to check Cassandra at some point and look into flame
> >>     > graphs and stuff for their integration tests, but maybe you can
> >>     > give some general insights/advice...
> >>     >
> >>     > Yesterday evening I switched to other JDKs (I also want to test
> >>     > with Shenandoah in particular), but I think especially the better
> >>     > escape analysis of GraalVM is a huge plus in the case of SirixDB
> >>     > (for insertion on my laptop it's ~90s vs ~60s). Still, I think it
> >>     > should be faster, and currently my suspicion is that garbage is a
> >>     > major performance issue.
> >>     >
> >>     > Maybe the GC pressure in general is a major issue: in the CPU
> >>     > flame graph, IIRC, G1 had about 20% of stack frames in allocation,
> >>     > and non-generational ZGC even around 40%, taking all threads into
> >>     > account.
> >>     >
> >>    From what I/we see, the GC pressure in the given test is not high.
> >>    The allocation rate is below 1GB/s and since most of it dies young,
> >>    the GCs are fairly cheap. In this log snippet G1 shows a GC every 5s
> >>    and the pause times are below 50ms:
> >>    [296,016s][info   ][gc      ] GC(90) Pause Young (Normal) (G1 Evacuation Pause) 5413M->1849M(6456M) 35,577ms
> >>    [301,103s][info   ][gc      ] GC(91) Pause Young (Normal) (G1 Evacuation Pause) 5417M->1848M(6456M) 33,357ms
> >>    [306,041s][info   ][gc      ] GC(92) Pause Young (Normal) (G1 Evacuation Pause) 5416M->1848M(6456M) 32,763ms
> >>    [310,849s][info   ][gc      ] GC(93) Pause Young (Normal) (G1 Evacuation Pause) 5416M->1847M(6456M) 33,086ms
> >>    I also see that the heap never expands to more than ~6.5GB even
> >>    though it is allowed to be 15GB, and this also suggests that the GC
> >>    is not under much pressure. As I said in the previous mail, the
> >>    reason Generational ZGC doesn't free up the direct memory without the
> >>    System.gc() calls is that the GC pressure is not high enough to
> >>    trigger any Major cycles. So I would strongly recommend not running
> >>    with -XX:+DisableExplicitGC unless you really have to, since you are
> >>    using DirectByteBuffers and they use System.gc() to help free memory
> >>    when the limit is reached.
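> >>
> >>    To illustrate the mechanism (a standalone sketch, not taken from the
> >>    SirixDB code): a loop like this only survives because java.nio.Bits
> >>    triggers System.gc() when the direct-memory limit is approached; with
> >>    -XX:+DisableExplicitGC that safety valve is gone and it throws an
> >>    OutOfMemoryError instead:
> >>
> >>        import java.nio.ByteBuffer;
> >>
> >>        public class DirectLimitDemo {
> >>            // Run with: -XX:MaxDirectMemorySize=64m and optionally
> >>            // -XX:+DisableExplicitGC to see the failure mode.
> >>            public static void main(String[] args) {
> >>                for (int i = 0; i < 10_000; i++) {
> >>                    // Each buffer becomes garbage right away, but its
> >>                    // native memory is only freed after reference
> >>                    // processing has run for it.
> >>                    ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
> >>                    buffer.put(0, (byte) 1);
> >>                }
> >>                System.out.println("Completed without OutOfMemoryError");
> >>            }
> >>        }
> >>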
> >>     > So in general I'm thinking about backing the KeyValueLeafPages
> >>     > with MemorySegments, but I think due to variable-sized pages it's
> >>     > getting tricky. Plus, I currently don't have the time to change
> >>     > fundamental stuff and I'm not even sure it'll bring a performance
> >>     > boost, as I have to adapt neighbour relationships of the nodes
> >>     > often, and off-heap memory access might be slightly worse
> >>     > performance-wise. Roughly, the idea would be something like the
> >>     > sketch below.
> >>     >
> >>     > What do you think?
> >>     >
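> >>     > (A minimal sketch of that idea, using the FFM API, which is still
> >>     > a preview feature in JDK 21; the names are just placeholders:)
> >>     >
> >>     >     import java.lang.foreign.Arena;
> >>     >     import java.lang.foreign.MemorySegment;
> >>     >     import java.lang.foreign.ValueLayout;
> >>     >
> >>     >     // Hypothetical: one confined arena per variable-sized page.
> >>     >     long pageSizeInBytes = 1024 * 64; // varies per page
> >>     >     try (Arena arena = Arena.ofConfined()) {
> >>     >         MemorySegment page = arena.allocate(pageSizeInBytes);
> >>     >         page.set(ValueLayout.JAVA_BYTE, 0, (byte) 42); // node data
> >>     >         byte b = page.get(ValueLayout.JAVA_BYTE, 0);
> >>     >     } // native memory is freed deterministically, no GC involved
> >>     >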
> >>    I know too little about the application to be able to give advice
> >>    here, but I would first start with having most memory on heap, and
> >>    only put large, long-lived stuff off-heap, if really needed. Looking
> >>    at the test at hand, it really doesn't look like it is long-lived
> >>    stuff that is placed off-heap.
> >>     > I've attached a memory flame graph, and there it seems the byte
> >>     > array from deserializing each page is prominent, but that might be
> >>     > something I can't even avoid (after decompression via Snappy or
> >>     > another lib, and maybe also decryption in the future).
> >>     >
> >>     > As of now, G1 with GraalVM seems to perform best (a little bit
> >>     > better than with non-generational ZGC, but I thought ZGC or maybe
> >>     > Shenandoah would improve the situation). But as said, maybe I have
> >>     > to generate way less garbage in general for good performance!?
> >>     >
> >>     > All in all, since most objects die young, maybe the generational
> >>     > GCs are not even needed (that said, if enough RAM is available and
> >>     > the Caffeine caches are sized accordingly, most objects may die
> >>     > old). But apparently the byte arrays holding the page data still
> >>     > die young (in AbstractReader::deserialize). In fact I'm not even
> >>     > sure why they escape, but currently I'm on my phone.
> >>     >
> >>    It's when most objects die young that the Generational GC really
> >>    shines, because it can handle the short-lived objects without having
> >>    to look at the long-lived objects. So I would say Generational ZGC is
> >>    a good fit here, but we need to let the System.gc() calls run to
> >>    allow reference processing, or slightly re-design and use
> >>    HeapByteBuffers.
> >>    Have a nice weekend,
> >>    Stefan
> >>     > Kind regards
> >>     > Johannes
> >>     >
> >>     > Stefan Johansson <stefan.johansson at oracle.com> wrote on Fri,
> >>     > Feb 16, 2024, 13:43:
> >>     >
> >>     >     Hi Johannes,
> >>     >
> >>     >     We've spent some more time looking at this, and getting the
> >>     >     json-file to reproduce it made it easy to verify our
> >>     >     suspicions. Thanks for uploading it.
> >>     >
> >>     >     There are a few things playing together here. The test is
> >>     >     making quite heavy use of DirectByteBuffers and you limit the
> >>     >     usage to 2G (-XX:MaxDirectMemorySize=2g). The life cycle and
> >>     >     freeing of the native memory part of a DirectByteBuffer rely
> >>     >     on reference processing. In generational ZGC, reference
> >>     >     processing is only done during Major collections, and since
> >>     >     the general GC pressure in this benchmark is very low (most
> >>     >     objects die young), we do not trigger that many Major
> >>     >     collections.
> >>     >
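> >>     >     (As an aside, a sketch of why freeing is tied to reference
> >>     >     processing; this is illustrative, not the actual
> >>     >     DirectByteBuffer code, which uses an equivalent internal
> >>     >     mechanism:)
> >>     >
> >>     >         import java.lang.ref.Cleaner;
> >>     >
> >>     >         Cleaner cleaner = Cleaner.create();
> >>     >         Object holder = new Object();
> >>     >         // The cleanup action runs only after the GC has noticed
> >>     >         // that 'holder' is unreachable, i.e. after reference
> >>     >         // processing (a Major collection in generational ZGC).
> >>     >         cleaner.register(holder, () -> System.out.println("freed"));
> >>     >         holder = null;
> >>     >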
> >>     >     Normally this would not be a problem. To avoid throwing an
> >>     >     out of memory error (due to hitting the direct buffer memory
> >>     >     limit) too early, the JDK triggers a System.gc(). This should
> >>     >     trigger reference processing, and all buffers that are no
> >>     >     longer in use would be freed. Since you specify the option
> >>     >     -XX:+DisableExplicitGC, all these calls to trigger GCs are
> >>     >     ignored and no direct memory will be freed. So in our testing,
> >>     >     just removing this flag makes the test pass.
> >>     >
> >>     >     Another solution is to look at using HeapByteBuffers instead,
> >>     >     so you don't have to worry about the direct memory usage at
> >>     >     all. The OpenHFT lib seems to have support for this by just
> >>     >     using elasticHeapByteBuffer(...) instead of
> >>     >     elasticByteBuffer().
> >>     >
> >>     >     Lastly, the reason for this working with non-generational ZGC
> >>     >     is that it does reference processing for every GC.
> >>     >
> >>     >     Hope this helps,
> >>     >     StefanJ
> >>     >
> >>     >
> >>     >     On 2024-02-15 21:53, Johannes Lichtenberger wrote:
> >>     >      > It's a laptop, I've attached some details.
> >>     >      >
> >>     >      > Furthermore, if it seems worth digging deeper into the
> >>     >      > issue, the JSON file is here for one week:
> >>     >      > https://www.transfernow.net/dl/20240215j9NaPTc0
> >>     >      >
> >>     >      > You'd have to unzip it into
> >>     >      > bundles/sirix-core/src/test/resources/json, remove the
> >>     >      > @Disabled annotation and run the test
> >>     >      > JsonShredderTest::testChicagoDescendantAxis.
> >>     >      >
> >>     >      > The test JVM parameters are specified in the parent
> >>     >      > build.gradle in the project root folder.
> >>     >      >
> >>     >      > The GitHub repo: https://github.com/sirixdb/sirix
> >>     >      >
> >>     >      > Screenshot from 2024-02-15 21-43-33.png
> >>     >      >
> >>     >      > kind regards
> >>     >      > Johannes
> >>     >      >
> >>     >      > On Thu, Feb 15, 2024 at 20:01, Peter Booth
> >>     >      > <peter_booth at me.com> wrote:
> >>     >      >
> >>     >      >     Just curious - what CPU, physical memory and OS are
> >>     >      >     you using?
> >>     >      >     Sent from my iPhone
> >>     >      >
> >>     >      >>     On Feb 15, 2024, at 12:23 PM, Johannes Lichtenberger
> >>     >      >>     <lichtenberger.johannes at gmail.com> wrote:
> >>     >      >>
> >>     >      >>
> >>     >      >>     I guess I don't know which JDK it picks for the
> >>     >      >>     tests, but I assume it's OpenJDK.
> >>     >      >>
> >>     >      >>     Johannes Lichtenberger
> >>     >      >>     <lichtenberger.johannes at gmail.com> wrote on Thu,
> >>     >      >>     Feb 15, 2024, 17:58:
> >>     >      >>
> >>     >      >>         However, it's the same with: ./gradlew
> >>     >      >>         -Dorg.gradle.java.home=/home/johannes/.jdks/openjdk-21.0.2
> >>     >      >>         :sirix-core:test --tests
> >>     >      >>         io.sirix.service.json.shredder.JsonShredderTest.testChicagoDescendantAxis
> >>     >      >>         (so hopefully using OpenJDK).
> >>     >      >>
> >>     >      >>         On Thu, Feb 15, 2024 at 17:54, Johannes
> >>     >      >>         Lichtenberger <lichtenberger.johannes at gmail.com>
> >>     >      >>         wrote:
> >>     >      >>
> >>     >      >>             I've attached two logs, the first one without
> >>     >      >>             -XX:+ZGenerational, the second one with the
> >>     >      >>             option set, even though I also saw that
> >>     >      >>             generational ZGC is going to be supported in
> >>     >      >>             GraalVM 24.1 in September... so I'm not sure
> >>     >      >>             what this does :)
> >>     >      >>
> >>     >      >>             On Thu, Feb 15, 2024 at 17:52, Johannes
> >>     >      >>             Lichtenberger
> >>     >      >>             <lichtenberger.johannes at gmail.com> wrote:
> >>     >      >>
> >>     >      >>                 Strange, so does it simply ignore the
> >>     >      >>                 option? The following is the beginning of
> >>     >      >>                 the output from _non_ generational ZGC:
> >>     >      >>
> >>     >      >>                 johannes at luna:~/IdeaProjects/sirix$ ./gradlew -Dorg.gradle.java.home=/home/johannes/.sdkman/candidates/java/21.0.2-graal :sirix-core:test --tests io.sirix.service.json.shredder.JsonShredderTest.testChicagoDescendantAxis
> >>     >      >>
> >>     >      >>                 > Configure project :
> >>     >      >>                 The 'sonarqube' task depends on compile tasks. This behavior is now deprecated and will be removed in version 5.x. To avoid implicit compilation, set property 'sonar.gradle.skipCompile' to 'true' and make sure your project is compiled, before analysis has started.
> >>     >      >>                 The 'sonar' task depends on compile tasks. This behavior is now deprecated and will be removed in version 5.x. To avoid implicit compilation, set property 'sonar.gradle.skipCompile' to 'true' and make sure your project is compiled, before analysis has started.
> >>     >      >>                 [1,627s][info   ][gc      ] GC(0) Garbage Collection (Metadata GC Threshold) 84M(1%)->56M(0%)
> >>     >      >>
> >>     >      >>                 > Task :sirix-core:test
> >>     >      >>                 [0.001s][warning][pagesize] UseLargePages disabled, no large pages configured and available on the system.
> >>     >      >>                 [1.253s][info   ][gc      ] Using The Z Garbage Collector
> >>     >      >>
> >>     >      >>                 [2,930s][info   ][gc      ] GC(1) Garbage Collection (Warmup) 1616M(11%)->746M(5%)
> >>     >      >>                 [4,445s][info   ][gc      ] GC(2) Garbage Collection (Warmup) 3232M(21%)->750M(5%)
> >>     >      >>                 [5,751s][info   ][gc      ] GC(3) Garbage Collection (Warmup) 4644M(30%)->1356M(9%)
> >>     >      >>                 [9,886s][info   ][gc      ] GC(4) Garbage Collection (Allocation Rate) 10668M(69%)->612M(4%)
> >>     >      >>                 [10,406s][info   ][gc      ] GC(5) Garbage Collection (Allocation Rate) 2648M(17%)->216M(1%)
> >>     >      >>                 [13,931s][info   ][gc      ] GC(6) Garbage Collection (Allocation Rate) 11164M(73%)->1562M(10%)
> >>     >      >>                 [16,908s][info   ][gc      ] GC(7) Garbage Collection (Allocation Rate) 11750M(76%)->460M(3%)
> >>     >      >>                 [20,690s][info   ][gc      ] GC(8) Garbage Collection (Allocation Rate) 12670M(82%)->726M(5%)
> >>     >      >>                 [24,376s][info   ][gc      ] GC(9) Garbage Collection (Allocation Rate) 13422M(87%)->224M(1%)
> >>     >      >>                 [28,152s][info   ][gc      ] GC(10) Garbage Collection (Proactive) 13474M(88%)->650M(4%)
> >>     >      >>                 [31,526s][info   ][gc      ] GC(11) Garbage Collection (Allocation Rate) 12072M(79%)->1472M(10%)
> >>     >      >>                 [34,754s][info   ][gc      ] GC(12) Garbage Collection (Allocation Rate) 13050M(85%)->330M(2%)
> >>     >      >>                 [38,478s][info   ][gc      ] GC(13) Garbage Collection (Allocation Rate) 13288M(87%)->762M(5%)
> >>     >      >>                 [41,936s][info   ][gc      ] GC(14) Garbage Collection (Proactive) 13294M(87%)->504M(3%)
> >>     >      >>                 [45,353s][info   ][gc      ] GC(15) Garbage Collection (Allocation Rate) 12984M(85%)->268M(2%)
> >>     >      >>                 [48,861s][info   ][gc      ] GC(16) Garbage Collection (Allocation Rate) 13008M(85%)->306M(2%)
> >>     >      >>                 [52,133s][info   ][gc      ] GC(17) Garbage Collection (Proactive) 12042M(78%)->538M(4%)
> >>     >      >>                 [55,705s][info   ][gc      ] GC(18) Garbage Collection (Allocation Rate) 12420M(81%)->1842M(12%)
> >>     >      >>                 [59,000s][info   ][gc      ] GC(19) Garbage Collection (Allocation Rate) 12458M(81%)->1422M(9%)
> >>     >      >>                 [64,501s][info   ][gc      ] Allocation Stall (Test worker) 59,673ms
> >>     >      >>                 [64,742s][info   ][gc      ] Allocation Stall (Test worker) 240,077ms
> >>     >      >>                 [65,806s][info   ][gc      ] GC(20) Garbage Collection (Allocation Rate) 13808M(90%)->6936M(45%)
> >>     >      >>                 [66,476s][info   ][gc      ] GC(21) Garbage Collection (Allocation Stall) 7100M(46%)->4478M(29%)
> >>     >      >>                 [69,471s][info   ][gc      ] GC(22) Garbage Collection (Allocation Rate) 10098M(66%)->5888M(38%)
> >>     >      >>                 [72,252s][info   ][gc      ] GC(23) Garbage Collection (Allocation Rate) 11226M(73%)->5816M(38%)
> >>     >      >>
> >>     >      >>                 ...
> >>     >      >>
> >>     >      >>                 So even here I can see some allocation
> >>     >      >>                 stalls.
> >>     >      >>
> >>     >      >>                 Running the same with -XX:+ZGenerational
> >>     >      >>                 in build.gradle, probably using GraalVM,
> >>     >      >>                 does something different, but I don't know
> >>     >      >>                 what... at least off-heap memory is
> >>     >      >>                 exhausted at some point due to direct byte
> >>     >      >>                 buffer usage!?
> >>     >      >>
> >>     >      >>                 So, I'm not sure what the difference is,
> >>     >      >>                 though.
> >>     >      >>
> >>     >      >>                 With this:
> >>     >      >>
> >>     >      >>                 "-XX:+UseZGC",
> >>     >      >>                 "-Xlog:gc*=debug:file=zgc-generational-detailed.log",
> >>     >      >>                 "-XX:+ZGenerational",
> >>     >      >>                 "-verbose:gc",
> >>     >      >>                 "-XX:+HeapDumpOnOutOfMemoryError",
> >>     >      >>                 "-XX:HeapDumpPath=heapdump.hprof",
> >>     >      >>                 "-XX:MaxDirectMemorySize=2g",
> >>     >      >>
> >>     >      >>
> >>     >      >>                 Caused by: java.lang.OutOfMemoryError: Cannot reserve 60000 bytes of direct buffer memory (allocated: 2147446560, limit: 2147483648)
> >>     >      >>                     at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
> >>     >      >>                     at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:111)
> >>     >      >>                     at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
> >>     >      >>                     at net.openhft.chronicle.bytes.internal.NativeBytesStore.elasticByteBuffer(NativeBytesStore.java:191)
> >>     >      >>                     at net.openhft.chronicle.bytes.BytesStore.elasticByteBuffer(BytesStore.java:192)
> >>     >      >>                     at net.openhft.chronicle.bytes.Bytes.elasticByteBuffer(Bytes.java:176)
> >>     >      >>                     at net.openhft.chronicle.bytes.Bytes.elasticByteBuffer(Bytes.java:148)
> >>     >      >>                     at io.sirix.access.trx.page.NodePageTrx.lambda$parallelSerializationOfKeyValuePages$1(NodePageTrx.java:443)
> >>     >      >>
> >>     >      >>
> >>     >      >>
> >>     >      >>                 On Thu, Feb 15, 2024 at 12:05, Stefan
> >>     >      >>                 Karlsson <stefan.karlsson at oracle.com>
> >>     >      >>                 wrote:
> >>     >      >>
> >>     >      >>                     Hi Johannes,
> >>     >      >>
> >>     >      >>                     We tried to look at the log files and
> >>     >      >>                     the jfr files, but couldn't find an
> >>     >      >>                     OutOfMemoryError in any of them. Do you
> >>     >      >>                     think you could try to rerun and
> >>     >      >>                     capture the entire GC log from the
> >>     >      >>                     OutOfMemoryError run?
> >>     >      >>
> >>     >      >>                     A few things to note:
> >>     >      >>
> >>     >      >>                     1) You seem to be running the Graal
> >>     >      >>                     compiler. Graal doesn't support
> >>     >      >>                     Generational ZGC, so you are going to
> >>     >      >>                     run different compilers when you
> >>     >      >>                     compare Singlegen ZGC with Generational
> >>     >      >>                     ZGC.
> >>     >      >>
> >>     >      >>                     2) It's not clear to me that the
> >>     >      >>                     provided JFR files match the provided
> >>     >      >>                     log files.
> >>     >      >>
> >>     >      >>                     3) The JFR files show that
> >>     >      >>                     -XX:+UseLargePages is used, but the
> >>     >      >>                     gc+init logs show 'Large Page Support:
> >>     >      >>                     Disabled'; you might want to look into
> >>     >      >>                     why that is the case.
> >>     >      >>
> >>     >      >>                     4) The singlegen JFR file has a
> >>     >      >>                     -Xlog:gc:g1-chicago.log line. It should
> >>     >      >>                     probably be named zgc-chicago.log.
> >>     >      >>
> >>     >      >>                     Cheers,
> >>     >      >>                     StefanK
> >>     >      >>
> >>     >      >>                     On 2024-02-14 17:36, Johannes
> >>     >      >>                     Lichtenberger wrote:
> >>     >      >>                     > Hello,
> >>     >      >>                     >
> >>     >      >>                     > a test of my little DB project fails
> >>     >      >>                     > using generational ZGC, but not with
> >>     >      >>                     > ZGC and G1 (out of memory error).
> >>     >      >>                     >
> >>     >      >>                     > To be honest, I guess the allocation
> >>     >      >>                     > rate and thus the GC pressure when
> >>     >      >>                     > traversing a resource in SirixDB is
> >>     >      >>                     > unacceptable. The strategy is to
> >>     >      >>                     > create fine-grained nodes from JSON
> >>     >      >>                     > input and store these in a trie.
> >>     >      >>                     > First, a 3.8Gb JSON file is shredded
> >>     >      >>                     > and imported. Next, a preorder
> >>     >      >>                     > traversal of the generated trie is
> >>     >      >>                     > done (with leaf pages storing 1024
> >>     >      >>                     > nodes each, ~300_000_000 nodes in
> >>     >      >>                     > total, and these are going to be
> >>     >      >>                     > deserialized one by one). The pages
> >>     >      >>                     > are furthermore referenced in memory
> >>     >      >>                     > through PageReference::setPage.
> >>     >      >>                     > Furthermore, a Caffeine page cache
> >>     >      >>                     > caches the PageReferences (keys) and
> >>     >      >>                     > the pages (values) and sets the
> >>     >      >>                     > reference back to null once entries
> >>     >      >>                     > are going to be evicted
> >>     >      >>                     > (PageReference.setPage(null)), as in
> >>     >      >>                     > the sketch below.
> >>     >      >>                     >
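> >>     >      >>                     > (A minimal sketch of that eviction
> >>     >      >>                     > pattern against Caffeine's builder
> >>     >      >>                     > API; the type names are from SirixDB,
> >>     >      >>                     > the sizing is illustrative:)
> >>     >      >>                     >
> >>     >      >>                     >     import com.github.benmanes.caffeine.cache.Cache;
> >>     >      >>                     >     import com.github.benmanes.caffeine.cache.Caffeine;
> >>     >      >>                     >     import com.github.benmanes.caffeine.cache.RemovalCause;
> >>     >      >>                     >
> >>     >      >>                     >     Cache<PageReference, Page> pageCache = Caffeine.newBuilder()
> >>     >      >>                     >         .maximumSize(10_000)
> >>     >      >>                     >         // drop the in-memory link when an entry is evicted
> >>     >      >>                     >         .removalListener((PageReference ref, Page page, RemovalCause cause) ->
> >>     >      >>                     >             ref.setPage(null))
> >>     >      >>                     >         .build();
> >>     >      >>                     >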
> >>     >      >>                     > However, I think the whole strategy
> >>     >      >>                     > of having to have in-memory nodes
> >>     >      >>                     > might not be best. Maybe it's better
> >>     >      >>                     > to use off-heap memory for the pages
> >>     >      >>                     > themselves with MemorySegments, but
> >>     >      >>                     > the pages are not of a fixed size,
> >>     >      >>                     > thus it may get tricky.
> >>     >      >>                     >
> >>     >      >>                     > The test mentioned is this:
> >>     >      >>                     > https://github.com/sirixdb/sirix/blob/248ab141632c94c6484a3069a056550516afb1d2/bundles/sirix-core/src/test/java/io/sirix/service/json/shredder/JsonShredderTest.java#L69
> >>     >      >>                     >
> >>     >      >>                     > I can upload the JSON file somewhere
> >>     >      >>                     > for a couple of days if needed.
> >>     >      >>                     >
> >>     >      >>                     > Caused by: java.lang.OutOfMemoryError
> >>     >      >>                     >     at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
> >>     >      >>                     >     at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
> >>     >      >>                     >     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
> >>     >      >>                     >     at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
> >>     >      >>                     >     at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
> >>     >      >>                     >     at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
> >>     >      >>                     >     at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
> >>     >      >>                     >     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
> >>     >      >>                     >     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
> >>     >      >>                     >     at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
> >>     >      >>                     >     at io.sirix.access.trx.page.NodePageTrx.parallelSerializationOfKeyValuePages(NodePageTrx.java:442)
> >>     >      >>                     >
> >>     >      >>                     > I've uploaded several JFR recordings
> >>     >      >>                     > and logs over here (maybe besides the
> >>     >      >>                     > async profiler JFR files the
> >>     >      >>                     > zgc-detailed log is most
> >>     >      >>                     > interesting):
> >>     >      >>                     >
> >>     >      >>                     > https://github.com/sirixdb/sirix/tree/main/bundles/sirix-core
> >>     >      >>                     >
> >>     >      >>                     > kind regards
> >>     >      >>                     > Johannes
> >>     >      >>
> >>     >
>