Generational ZGC issue

Johannes Lichtenberger lichtenberger.johannes at gmail.com
Thu Feb 15 17:22:32 UTC 2024


I don't know which JDK it picks for the tests, but I assume it's OpenJDK.

On Thu, Feb 15, 2024 at 17:58 Johannes Lichtenberger <
lichtenberger.johannes at gmail.com> wrote:

> However, it's the same with: ./gradlew
> -Dorg.gradle.java.home=/home/johannes/.jdks/openjdk-21.0.2 :sirix-core:test
> --tests
> io.sirix.service.json.shredder.JsonShredderTest.testChicagoDescendantAxis
>  (hopefully using OpenJDK this time)
>
> On Thu, Feb 15, 2024 at 17:54 Johannes Lichtenberger <
> lichtenberger.johannes at gmail.com> wrote:
>
>> I've attached two logs, the first one without -XX:+ZGenerational, the
>> second one with the option set, even though I also saw that generational
>> ZGC is going to be supported in GraalVM 24.1 in September... so I'm not
>> sure what this does :)
>>
>> On Thu, Feb 15, 2024 at 17:52 Johannes Lichtenberger <
>> lichtenberger.johannes at gmail.com> wrote:
>>
>>> Strange, so does it simply ignore the option? The following is the
>>> beginning of the output from _non-generational_ ZGC:
>>>
>>> johannes at luna:~/IdeaProjects/sirix$ ./gradlew
>>> -Dorg.gradle.java.home=/home/johannes/.sdkman/candidates/java/21.0.2-graal
>>> :sirix-core:test --tests
>>> io.sirix.service.json.shredder.JsonShredderTest.testChicagoDescendantAxis
>>>
>>> > Configure project :
>>> The 'sonarqube' task depends on compile tasks. This behavior is now
>>> deprecated and will be removed in version 5.x. To avoid implicit
>>> compilation, set property 'sonar.gradle.skipCompile' to 'true' and make
>>> sure your project is compiled, before analysis has started.
>>> The 'sonar' task depends on compile tasks. This behavior is now
>>> deprecated and will be removed in version 5.x. To avoid implicit
>>> compilation, set property 'sonar.gradle.skipCompile' to 'true' and make
>>> sure your project is compiled, before analysis has started.
>>> [1,627s][info   ][gc      ] GC(0) Garbage Collection (Metadata GC
>>> Threshold) 84M(1%)->56M(0%)
>>>
>>> > Task :sirix-core:test
>>> [0.001s][warning][pagesize] UseLargePages disabled, no large pages
>>> configured and available on the system.
>>> [1.253s][info   ][gc      ] Using The Z Garbage Collector
>>>
>>> [2,930s][info   ][gc      ] GC(1) Garbage Collection (Warmup)
>>> 1616M(11%)->746M(5%)
>>> [4,445s][info   ][gc      ] GC(2) Garbage Collection (Warmup)
>>> 3232M(21%)->750M(5%)
>>> [5,751s][info   ][gc      ] GC(3) Garbage Collection (Warmup)
>>> 4644M(30%)->1356M(9%)
>>> [9,886s][info   ][gc      ] GC(4) Garbage Collection (Allocation Rate)
>>> 10668M(69%)->612M(4%)
>>> [10,406s][info   ][gc      ] GC(5) Garbage Collection (Allocation Rate)
>>> 2648M(17%)->216M(1%)
>>> [13,931s][info   ][gc      ] GC(6) Garbage Collection (Allocation Rate)
>>> 11164M(73%)->1562M(10%)
>>> [16,908s][info   ][gc      ] GC(7) Garbage Collection (Allocation Rate)
>>> 11750M(76%)->460M(3%)
>>> [20,690s][info   ][gc      ] GC(8) Garbage Collection (Allocation Rate)
>>> 12670M(82%)->726M(5%)
>>> [24,376s][info   ][gc      ] GC(9) Garbage Collection (Allocation Rate)
>>> 13422M(87%)->224M(1%)
>>> [28,152s][info   ][gc      ] GC(10) Garbage Collection (Proactive)
>>> 13474M(88%)->650M(4%)
>>> [31,526s][info   ][gc      ] GC(11) Garbage Collection (Allocation Rate)
>>> 12072M(79%)->1472M(10%)
>>> [34,754s][info   ][gc      ] GC(12) Garbage Collection (Allocation Rate)
>>> 13050M(85%)->330M(2%)
>>> [38,478s][info   ][gc      ] GC(13) Garbage Collection (Allocation Rate)
>>> 13288M(87%)->762M(5%)
>>> [41,936s][info   ][gc      ] GC(14) Garbage Collection (Proactive)
>>> 13294M(87%)->504M(3%)
>>> [45,353s][info   ][gc      ] GC(15) Garbage Collection (Allocation Rate)
>>> 12984M(85%)->268M(2%)
>>> [48,861s][info   ][gc      ] GC(16) Garbage Collection (Allocation Rate)
>>> 13008M(85%)->306M(2%)
>>> [52,133s][info   ][gc      ] GC(17) Garbage Collection (Proactive)
>>> 12042M(78%)->538M(4%)
>>> [55,705s][info   ][gc      ] GC(18) Garbage Collection (Allocation Rate)
>>> 12420M(81%)->1842M(12%)
>>> [59,000s][info   ][gc      ] GC(19) Garbage Collection (Allocation Rate)
>>> 12458M(81%)->1422M(9%)
>>> [64,501s][info   ][gc      ] Allocation Stall (Test worker) 59,673ms
>>> [64,742s][info   ][gc      ] Allocation Stall (Test worker) 240,077ms
>>> [65,806s][info   ][gc      ] GC(20) Garbage Collection (Allocation Rate)
>>> 13808M(90%)->6936M(45%)
>>> [66,476s][info   ][gc      ] GC(21) Garbage Collection (Allocation
>>> Stall) 7100M(46%)->4478M(29%)
>>> [69,471s][info   ][gc      ] GC(22) Garbage Collection (Allocation Rate)
>>> 10098M(66%)->5888M(38%)
>>> [72,252s][info   ][gc      ] GC(23) Garbage Collection (Allocation Rate)
>>> 11226M(73%)->5816M(38%)
>>>
>>> ...
>>>
>>> So even here I can see some allocation stalls.
>>>
>>> Running the same with -XX:+ZGenerational in build.gradle (probably using
>>> GraalVM) does something different, but I don't know what... at least
>>> off-heap memory is exhausted at some point due to direct byte buffer
>>> usage!?
>>>
>>> So I'm not sure what the difference is, though.
>>>
>>> With this:
>>>
>>> "-XX:+UseZGC",
>>> "-Xlog:gc*=debug:file=zgc-generational-detailed.log",
>>> "-XX:+ZGenerational",
>>> "-verbose:gc",
>>> "-XX:+HeapDumpOnOutOfMemoryError",
>>> "-XX:HeapDumpPath=heapdump.hprof",
>>> "-XX:MaxDirectMemorySize=2g",
>>>
>>>
>>> Caused by: java.lang.OutOfMemoryError: Cannot reserve 60000 bytes of direct buffer memory (allocated: 2147446560, limit: 2147483648)
>>> 	at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
>>> 	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:111)
>>> 	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
>>> 	at net.openhft.chronicle.bytes.internal.NativeBytesStore.elasticByteBuffer(NativeBytesStore.java:191)
>>> 	at net.openhft.chronicle.bytes.BytesStore.elasticByteBuffer(BytesStore.java:192)
>>> 	at net.openhft.chronicle.bytes.Bytes.elasticByteBuffer(Bytes.java:176)
>>> 	at net.openhft.chronicle.bytes.Bytes.elasticByteBuffer(Bytes.java:148)
>>> 	at io.sirix.access.trx.page.NodePageTrx.lambda$parallelSerializationOfKeyValuePages$1(NodePageTrx.java:443)
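>>>
>>> By the way, the limit in the message is exactly -XX:MaxDirectMemorySize=2g
>>> (2 GiB = 2147483648 bytes), so the direct byte buffers alone exhaust the
>>> off-heap budget. A minimal sketch of that failure mode with plain NIO (a
>>> hypothetical reproducer, not the SirixDB code path):
>>>
>>> import java.nio.ByteBuffer;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> // Run with -XX:MaxDirectMemorySize=2g: every direct buffer counts against
>>> // the off-heap budget until it is garbage collected, so retaining them
>>> // eventually fails with "Cannot reserve ... bytes of direct buffer memory".
>>> public class DirectMemoryExhaustion {
>>>     public static void main(String[] args) {
>>>         List<ByteBuffer> retained = new ArrayList<>();
>>>         while (true) {
>>>             // 60_000 bytes mirrors the allocation size from the stack trace.
>>>             retained.add(ByteBuffer.allocateDirect(60_000));
>>>         }
>>>     }
>>> }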
>>>
>>>
>>>
>>>
>>> On Thu, Feb 15, 2024 at 12:05 Stefan Karlsson <
>>> stefan.karlsson at oracle.com> wrote:
>>>
>>>> Hi Johannes,
>>>>
>>>> We tried to look at the log files and the JFR files, but couldn't find
>>>> an OutOfMemoryError in any of them. Do you think you could try to rerun
>>>> and capture the entire GC log from the OutOfMemoryError run?
>>>>
>>>> A few things to note:
>>>>
>>>> 1) You seem to be running the Graal compiler. Graal doesn't support
>>>> Generational ZGC, so you are going to run different compilers when you
>>>> compare Singlegen ZGC with Generational ZGC.
>>>>
>>>> 2) It's not clear to me that the provided JFR files match the provided
>>>> log files.
>>>>
>>>> 3) The JFR files show that -XX:+UseLargePages is used, but the gc+init
>>>> log shows 'Large Page Support: Disabled'; you might want to look into
>>>> why that is the case.
>>>>
>>>> 4) The singlegen JFR file has a -Xlog:gc:g1-chicago.log line. It should
>>>> probably be named zgc-chicago.log.
>>>>
>>>> Cheers,
>>>> StefanK
>>>>
>>>> On 2024-02-14 17:36, Johannes Lichtenberger wrote:
>>>> > Hello,
>>>> >
>>>> > a test of my little DB project fails with an OutOfMemoryError using
>>>> > generational ZGC, but not with single-gen ZGC or G1.
>>>> >
>>>> > To be honest, I guess the allocation rate, and thus the GC pressure,
>>>> > when traversing a resource in SirixDB is unacceptable. The strategy is
>>>> > to create fine-grained nodes from JSON input and store these in a trie.
>>>> > First, a 3.8 GB JSON file is shredded and imported. Next, a preorder
>>>> > traversal visits the generated trie (leaf pages store 1024 nodes each,
>>>> > ~300,000,000 nodes in total, and these are deserialized one by one).
>>>> > The pages are referenced in memory through PageReference::setPage.
>>>> > Furthermore, a Caffeine page cache maps the PageReferences (keys) to
>>>> > the pages (values) and sets the reference back to null once entries
>>>> > are evicted (PageReference.setPage(null)).
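>>>> >
>>>> > A sketch of that eviction wiring with Caffeine's removal listener
>>>> > (illustrative values; PageReference/Page stand in for the SirixDB
>>>> > types, not the actual cache configuration):
>>>> >
>>>> > import com.github.benmanes.caffeine.cache.Cache;
>>>> > import com.github.benmanes.caffeine.cache.Caffeine;
>>>> > import com.github.benmanes.caffeine.cache.RemovalCause;
>>>> >
>>>> > // On eviction, null out the strong in-memory reference so the page
>>>> > // becomes collectible; the cache holds it only while the entry lives.
>>>> > Cache<PageReference, Page> pageCache = Caffeine.newBuilder()
>>>> >     .maximumSize(100_000)
>>>> >     .removalListener((PageReference ref, Page page, RemovalCause cause) -> {
>>>> >       if (ref != null) {
>>>> >         ref.setPage(null);
>>>> >       }
>>>> >     })
>>>> >     .build();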
>>>> >
>>>> > However, I think the whole strategy of keeping all nodes in memory
>>>> > might not be best. Maybe it's better to use off-heap memory for the
>>>> > pages themselves with MemorySegments, but the pages are not of a fixed
>>>> > size, so it may get tricky.
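>>>> >
>>>> > For the off-heap idea, a minimal sketch with the foreign memory API (a
>>>> > preview feature in JDK 21; serializedPage is a placeholder for the
>>>> > variable-sized page bytes):
>>>> >
>>>> > import java.lang.foreign.Arena;
>>>> > import java.lang.foreign.MemorySegment;
>>>> > import java.lang.foreign.ValueLayout;
>>>> >
>>>> > // A confined arena bounds the segment's lifetime: closing it frees the
>>>> > // memory deterministically instead of adding GC pressure, and variable
>>>> > // page sizes are fine since each segment gets its exact length.
>>>> > try (Arena arena = Arena.ofConfined()) {
>>>> >     byte[] serializedPage = new byte[] {1, 2, 3}; // page bytes
>>>> >     MemorySegment segment = arena.allocate(serializedPage.length);
>>>> >     MemorySegment.copy(serializedPage, 0, segment, ValueLayout.JAVA_BYTE,
>>>> >         0, serializedPage.length);
>>>> >     // ... read the page back while the arena is open ...
>>>> > }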
>>>> >
>>>> > The test mentioned is this:
>>>> >
>>>> > https://github.com/sirixdb/sirix/blob/248ab141632c94c6484a3069a056550516afb1d2/bundles/sirix-core/src/test/java/io/sirix/service/json/shredder/JsonShredderTest.java#L69
>>>> >
>>>> > I can upload the JSON file somewhere for a couple of days if needed.
>>>> >
>>>> > Caused by: java.lang.OutOfMemoryError
>>>> >     at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
>>>> >     at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
>>>> >     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
>>>> >     at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
>>>> >     at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
>>>> >     at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
>>>> >     at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
>>>> >     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
>>>> >     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>>>> >     at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
>>>> >     at io.sirix.access.trx.page.NodePageTrx.parallelSerializationOfKeyValuePages(NodePageTrx.java:442)
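>>>> >
>>>> > The failing frame suggests each parallel worker serializes pages into
>>>> > its own elastic direct buffer, roughly like this (a hypothetical
>>>> > sketch, not the actual NodePageTrx code):
>>>> >
>>>> > import java.nio.ByteBuffer;
>>>> > import java.util.List;
>>>> > import net.openhft.chronicle.bytes.Bytes;
>>>> >
>>>> > // With N workers, up to N elastic direct buffers count against
>>>> > // -XX:MaxDirectMemorySize concurrently; buffers that are not released
>>>> > // stay reserved until a GC clears them, which can exhaust the budget.
>>>> > void serializePagesInParallel(List<byte[]> pages) {
>>>> >     pages.parallelStream().forEach(page -> {
>>>> >         Bytes<ByteBuffer> bytes = Bytes.elasticByteBuffer();
>>>> >         try {
>>>> >             bytes.write(page); // stand-in for the real serialization
>>>> >         } finally {
>>>> >             bytes.releaseLast(); // return the direct memory right away
>>>> >         }
>>>> >     });
>>>> > }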
>>>> >
>>>> > I've uploaded several JFR recordings and logs over here (besides the
>>>> > async profiler JFR files, the zgc-detailed log is maybe the most
>>>> > interesting):
>>>> >
>>>> > https://github.com/sirixdb/sirix/tree/main/bundles/sirix-core
>>>> >
>>>> > kind regards
>>>> > Johannes
>>>>
>>>>