Generational ZGC issue
Johannes Lichtenberger
lichtenberger.johannes at gmail.com
Thu Feb 15 16:58:28 UTC 2024
However, it's the same with: ./gradlew
-Dorg.gradle.java.home=/home/johannes/.jdks/openjdk-21.0.2 :sirix-core:test
--tests
io.sirix.service.json.shredder.JsonShredderTest.testChicagoDescendantAxis
hopefully using OpenJDK this time.
On Thu, Feb 15, 2024 at 17:54, Johannes Lichtenberger <
lichtenberger.johannes at gmail.com> wrote:
> I've attached two logs, the first one without -XX:+ZGenerational, the
> second one with the option set, even though I also saw that generational
> ZGC is going to be supported in GraalVM 24.1 in September... so I'm not
> sure what this does :)
>
> On Thu, Feb 15, 2024 at 17:52, Johannes Lichtenberger <
> lichtenberger.johannes at gmail.com> wrote:
>
>> Strange, so does it simply ignore the option? The following is the
>> beginning of the output from _non_ generational ZGC:
>>
>> johannes at luna:~/IdeaProjects/sirix$ ./gradlew
>> -Dorg.gradle.java.home=/home/johannes/.sdkman/candidates/java/21.0.2-graal
>> :sirix-core:test --tests
>> io.sirix.service.json.shredder.JsonShredderTest.testChicagoDescendantAxis
>>
>> > Configure project :
>> The 'sonarqube' task depends on compile tasks. This behavior is now
>> deprecated and will be removed in version 5.x. To avoid implicit
>> compilation, set property 'sonar.gradle.skipCompile' to 'true' and make
>> sure your project is compiled, before analysis has started.
>> The 'sonar' task depends on compile tasks. This behavior is now
>> deprecated and will be removed in version 5.x. To avoid implicit
>> compilation, set property 'sonar.gradle.skipCompile' to 'true' and make
>> sure your project is compiled, before analysis has started.
>> [1,627s][info ][gc ] GC(0) Garbage Collection (Metadata GC Threshold) 84M(1%)->56M(0%)
>>
>> > Task :sirix-core:test
>> [0.001s][warning][pagesize] UseLargePages disabled, no large pages configured and available on the system.
>> [1.253s][info ][gc ] Using The Z Garbage Collector
>>
>> [2,930s][info ][gc ] GC(1) Garbage Collection (Warmup) 1616M(11%)->746M(5%)
>> [4,445s][info ][gc ] GC(2) Garbage Collection (Warmup) 3232M(21%)->750M(5%)
>> [5,751s][info ][gc ] GC(3) Garbage Collection (Warmup) 4644M(30%)->1356M(9%)
>> [9,886s][info ][gc ] GC(4) Garbage Collection (Allocation Rate) 10668M(69%)->612M(4%)
>> [10,406s][info ][gc ] GC(5) Garbage Collection (Allocation Rate) 2648M(17%)->216M(1%)
>> [13,931s][info ][gc ] GC(6) Garbage Collection (Allocation Rate) 11164M(73%)->1562M(10%)
>> [16,908s][info ][gc ] GC(7) Garbage Collection (Allocation Rate) 11750M(76%)->460M(3%)
>> [20,690s][info ][gc ] GC(8) Garbage Collection (Allocation Rate) 12670M(82%)->726M(5%)
>> [24,376s][info ][gc ] GC(9) Garbage Collection (Allocation Rate) 13422M(87%)->224M(1%)
>> [28,152s][info ][gc ] GC(10) Garbage Collection (Proactive) 13474M(88%)->650M(4%)
>> [31,526s][info ][gc ] GC(11) Garbage Collection (Allocation Rate) 12072M(79%)->1472M(10%)
>> [34,754s][info ][gc ] GC(12) Garbage Collection (Allocation Rate) 13050M(85%)->330M(2%)
>> [38,478s][info ][gc ] GC(13) Garbage Collection (Allocation Rate) 13288M(87%)->762M(5%)
>> [41,936s][info ][gc ] GC(14) Garbage Collection (Proactive) 13294M(87%)->504M(3%)
>> [45,353s][info ][gc ] GC(15) Garbage Collection (Allocation Rate) 12984M(85%)->268M(2%)
>> [48,861s][info ][gc ] GC(16) Garbage Collection (Allocation Rate) 13008M(85%)->306M(2%)
>> [52,133s][info ][gc ] GC(17) Garbage Collection (Proactive) 12042M(78%)->538M(4%)
>> [55,705s][info ][gc ] GC(18) Garbage Collection (Allocation Rate) 12420M(81%)->1842M(12%)
>> [59,000s][info ][gc ] GC(19) Garbage Collection (Allocation Rate) 12458M(81%)->1422M(9%)
>> [64,501s][info ][gc ] Allocation Stall (Test worker) 59,673ms
>> [64,742s][info ][gc ] Allocation Stall (Test worker) 240,077ms
>> [65,806s][info ][gc ] GC(20) Garbage Collection (Allocation Rate) 13808M(90%)->6936M(45%)
>> [66,476s][info ][gc ] GC(21) Garbage Collection (Allocation Stall) 7100M(46%)->4478M(29%)
>> [69,471s][info ][gc ] GC(22) Garbage Collection (Allocation Rate) 10098M(66%)->5888M(38%)
>> [72,252s][info ][gc ] GC(23) Garbage Collection (Allocation Rate) 11226M(73%)->5816M(38%)
>>
>> ...
>>
>> So even here I can see some allocation stalls.
>>
>> Running the same with -XX:+ZGenerational in build.gradle (probably using
>> GraalVM) does something different, but I don't know what... at least off-heap
>> memory is exhausted at some point due to direct byte buffer usage!?
>>
>> So, I'm not sure what's the difference, though.
>>
>> With this:
>>
>> "-XX:+UseZGC",
>> "-Xlog:gc*=debug:file=zgc-generational-detailed.log",
>> "-XX:+ZGenerational",
>> "-verbose:gc",
>> "-XX:+HeapDumpOnOutOfMemoryError",
>> "-XX:HeapDumpPath=heapdump.hprof",
>> "-XX:MaxDirectMemorySize=2g",
>>
>>
>> Caused by: java.lang.OutOfMemoryError: Cannot reserve 60000 bytes of direct buffer memory (allocated: 2147446560, limit: 2147483648)
>> at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
>> at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:111)
>> at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
>> at net.openhft.chronicle.bytes.internal.NativeBytesStore.elasticByteBuffer(NativeBytesStore.java:191)
>> at net.openhft.chronicle.bytes.BytesStore.elasticByteBuffer(BytesStore.java:192)
>> at net.openhft.chronicle.bytes.Bytes.elasticByteBuffer(Bytes.java:176)
>> at net.openhft.chronicle.bytes.Bytes.elasticByteBuffer(Bytes.java:148)
>> at io.sirix.access.trx.page.NodePageTrx.lambda$parallelSerializationOfKeyValuePages$1(NodePageTrx.java:443)
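>>
>> For what it's worth, the direct-memory limit itself is easy to reproduce in
>> isolation. The following is just a minimal sketch (not SirixDB code; the
>> DirectMemoryDemo class name and the 64m limit are made up for the example):
>> every ByteBuffer.allocateDirect call is charged against
>> -XX:MaxDirectMemorySize rather than the heap, and, unless it is freed
>> explicitly, that memory is only released once the owning ByteBuffer objects
>> have been garbage collected.
>>
>> import java.nio.ByteBuffer;
>> import java.util.ArrayList;
>> import java.util.List;
>>
>> // Run with e.g.: java -XX:MaxDirectMemorySize=64m DirectMemoryDemo.java
>> public class DirectMemoryDemo {
>>     public static void main(String[] args) {
>>         List<ByteBuffer> buffers = new ArrayList<>();
>>         try {
>>             while (true) {
>>                 // Each direct buffer counts against -XX:MaxDirectMemorySize,
>>                 // not against the Java heap.
>>                 buffers.add(ByteBuffer.allocateDirect(1 << 20)); // 1 MiB
>>             }
>>         } catch (OutOfMemoryError e) {
>>             // Same failure mode as the stack trace above:
>>             // "Cannot reserve ... bytes of direct buffer memory"
>>             System.out.println("Direct memory exhausted after ~"
>>                 + buffers.size() + " MiB: " + e.getMessage());
>>         }
>>     }
>> }
>>
>> So if the elastic byte buffers created in
>> parallelSerializationOfKeyValuePages stay reachable longer (or are simply
>> reclaimed later) under the generational collector, the 2g limit could be
>> hit sooner.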
>>
>>
>>
>>
>> On Thu, Feb 15, 2024 at 12:05, Stefan Karlsson <
>> stefan.karlsson at oracle.com> wrote:
>>
>>> Hi Johannes,
>>>
>>> We tried to look at the log files and the JFR files, but couldn't find
>>> an OutOfMemoryError in any of them. Do you think you could try to rerun
>>> and capture the entire GC log from the OutOfMemoryError run?
>>>
>>> A few things to note:
>>>
>>> 1) You seem to be running the Graal compiler. Graal doesn't support
>>> Generational ZGC, so you are going to run different compilers when you
>>> compare Singlegen ZGC with Generational ZGC.
>>>
>>> 2) It's not clear to me that the provided JFR files match the provided
>>> log files.
>>>
>>> 3) The JFR files show that -XX:+UseLargePages is used, but the gc+init
>>> log shows 'Large Page Support: Disabled'; you might want to look into
>>> why that is the case.
>>>
>>> 4) The singlegen JFR file has a -Xlog:gc:g1-chicago.log line. It should
>>> probably be named zgc-chicago.log.
>>>
>>> Cheers,
>>> StefanK
>>>
>>> On 2024-02-14 17:36, Johannes Lichtenberger wrote:
>>> > Hello,
>>> >
>>> > a test of my little DB project fails with an OutOfMemoryError when
>>> > using generational ZGC, but not with single-generation ZGC or G1.
>>> >
>>> > To be honest, I guess the allocation rate, and thus the GC pressure,
>>> > when traversing a resource in SirixDB is unacceptable. The strategy is
>>> > to create fine-grained nodes from JSON input and store these in a trie.
>>> > First, a 3.8 GB JSON file is shredded and imported. Next, a preorder
>>> > traversal of the generated trie is done (leaf pages store 1024 nodes
>>> > each, roughly 300_000_000 nodes in total, and these are deserialized
>>> > one by one). The pages are referenced in memory through
>>> > PageReference::setPage. Furthermore, a Caffeine page cache caches the
>>> > PageReferences (keys) and the pages (values) and sets the reference
>>> > back to null once entries are evicted (PageReference.setPage(null)).
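>>> >
>>> > A minimal sketch of that caching scheme, just for illustration (the
>>> > types are simplified stand-ins rather than the actual SirixDB classes,
>>> > and the maximum size is an assumption):
>>> >
>>> > import com.github.benmanes.caffeine.cache.Cache;
>>> > import com.github.benmanes.caffeine.cache.Caffeine;
>>> > import com.github.benmanes.caffeine.cache.RemovalCause;
>>> >
>>> > final class Page { /* a deserialized node page */ }
>>> >
>>> > final class PageReference {
>>> >     private volatile Page page;
>>> >     void setPage(Page page) { this.page = page; }
>>> >     Page getPage() { return page; }
>>> > }
>>> >
>>> > public class PageCacheSketch {
>>> >     // PageReference -> Page; when an entry is evicted, the in-memory
>>> >     // link is cleared so the GC can reclaim the page.
>>> >     static final Cache<PageReference, Page> PAGE_CACHE = Caffeine.newBuilder()
>>> >         .maximumSize(100_000) // assumed bound, not the real configuration
>>> >         .removalListener((PageReference ref, Page page, RemovalCause cause) -> {
>>> >             if (ref != null) {
>>> >                 ref.setPage(null);
>>> >             }
>>> >         })
>>> >         .build();
>>> >
>>> >     public static void main(String[] args) {
>>> >         PageReference ref = new PageReference();
>>> >         Page page = new Page();
>>> >         ref.setPage(page);
>>> >         PAGE_CACHE.put(ref, page);
>>> >     }
>>> > }
>>> >
>>> > As long as a reference is cached (or otherwise reachable), the page and
>>> > its up to 1024 nodes stay strongly reachable on the heap, which is
>>> > presumably where most of the GC pressure comes from.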
>>> >
>>> > However, I think the whole strategy of having to keep nodes in memory
>>> > might not be best. Maybe it's better to use off-heap memory for the
>>> > pages themselves with MemorySegments, but the pages are not of a fixed
>>> > size, so it may get tricky.
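>>> >
>>> > Something like the following would be the rough idea (a sketch only,
>>> > assuming the Java 21 foreign memory API; the 64 KiB page size is just
>>> > an example value):
>>> >
>>> > import java.lang.foreign.Arena;
>>> > import java.lang.foreign.MemorySegment;
>>> > import java.lang.foreign.ValueLayout;
>>> >
>>> > public class OffHeapPageSketch {
>>> >     public static void main(String[] args) {
>>> >         long pageSizeInBytes = 64 * 1024; // sizes can vary per page
>>> >         try (Arena arena = Arena.ofConfined()) {
>>> >             // The segment's memory lives outside the Java heap, so it
>>> >             // adds no GC pressure and no direct-buffer accounting.
>>> >             MemorySegment page = arena.allocate(pageSizeInBytes);
>>> >             page.set(ValueLayout.JAVA_LONG, 0, 42L);
>>> >             long value = page.get(ValueLayout.JAVA_LONG, 0);
>>> >             System.out.println(value);
>>> >         } // closing the arena frees the memory deterministically
>>> >     }
>>> > }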
>>> >
>>> > The test mentioned is this:
>>> >
>>> https://github.com/sirixdb/sirix/blob/248ab141632c94c6484a3069a056550516afb1d2/bundles/sirix-core/src/test/java/io/sirix/service/json/shredder/JsonShredderTest.java#L69
>>> >
>>> > I can upload the JSON file somewhere for a couple of days if needed.
>>> >
>>> > Caused by: java.lang.OutOfMemoryError
>>> >     at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
>>> >     at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
>>> >     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
>>> >     at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
>>> >     at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
>>> >     at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
>>> >     at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
>>> >     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
>>> >     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
>>> >     at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
>>> >     at io.sirix.access.trx.page.NodePageTrx.parallelSerializationOfKeyValuePages(NodePageTrx.java:442)
>>> >
>>> > I've uploaded several JFR recordings and logs over here (besides the
>>> > async profiler JFR files, the zgc-detailed log is probably the most
>>> > interesting):
>>> >
>>> > https://github.com/sirixdb/sirix/tree/main/bundles/sirix-core
>>> >
>>> > kind regards
>>> > Johannes
>>>
>>>