RFD: 8252768: Fast, asynchronous heap dumps
Volker Simonis
volker.simonis at gmail.com
Thu Sep 3 16:03:08 UTC 2020
Hi,
I'd like to get your opinion on a POC I've done in order to speed up
heap dumps on Linux:
https://bugs.openjdk.java.net/browse/JDK-8252768
http://cr.openjdk.java.net/~simonis/webrevs/2020/8252768/
Currently, heap dumps can be taken by the SA tools from a frozen
process or core file or directly from a running process with jcmd,
jconsole & JMX, jmap, etc. If the heap of a running process is dumped,
this happens at a safepoint (see VM_HeapDumper). Because the time to
produce a heap dump is roughly proportional to the size and fill ratio
of the heap, this leads to safepoint times which range from ~100ms
for a 100 MB heap to ~1s for a 1 GB heap, up to 15s and more for an
8 GB heap (measured on my Core i7 laptop with SSD).
One possibility to decrease the safepoint time is to offload the
dumping work to an asynchronous process. On Linux (and probably any
other OS which supports fork()) this can be achieved by forking and
offloading the heap dumping to the child process. Forking still needs
to happen at a safepoint, but forking is considerably faster than the
dump itself. The cost of fork() is still proportional to the size of
the original Java process because, although fork() doesn't copy any
memory pages, the kernel still needs to duplicate the page table
entries of the process.
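The fork-and-offload pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the HotSpot code from the webrev: the function name `dump_async`, the `records` argument standing in for the heap, and the one-record-per-line output format are all assumptions made up for the example.

```python
import os

def dump_async(records, path):
    """Fork and let the child write `records` to `path`.

    Hypothetical helper: the parent returns right after fork(), so its
    pause only covers the fork itself, not the writing.
    """
    pid = os.fork()
    if pid == 0:
        # Child: works on its own snapshot of the parent's memory.
        status = 1
        try:
            with open(path, "w") as out:
                for record in records:
                    out.write(f"{record}\n")
            status = 0
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    # Parent: continue immediately; the caller can reap the child later.
    return pid
```

The caller would typically continue its work and later call `os.waitpid(pid, 0)` to reap the child and check its exit status.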
Linux uses a “copy-on-write” technique for the creation of a forked
child process. This means that right after creation, the child process
has exactly the same memory image as its parent process. But at the
same time, the child process won't use any additional physical memory
as long as it doesn't change (i.e. write to) its memory. Since heap
dumping only reads the child process's memory and then exits
immediately, this technique can be applied even if the Java process
already uses almost all of the available physical memory.
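The snapshot behaviour can be observed with a small Python experiment: the parent writes to its data after the fork, and the child still sees the state from the moment of the fork. The function name `snapshot_seen_by_child` and the `bytearray` standing in for the heap are assumptions for this sketch. It demonstrates the isolation only, not memory accounting, and note that pages the parent writes to while the child is still alive are copied as well, so a busy parent does cost some extra physical memory during the dump.

```python
import os

def snapshot_seen_by_child(heap):
    """Fork, change `heap` in the parent, return the bytes the child still sees."""
    go_r, go_w = os.pipe()     # parent -> child: "I have written to my copy"
    out_r, out_w = os.pipe()   # child -> parent: the child's view of `heap`
    pid = os.fork()
    if pid == 0:
        # Child: wait for the parent's write, then report our own copy.
        try:
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)
            os.write(out_w, bytes(heap))
        finally:
            os._exit(0)
    # Parent: this write happens after the fork and is private to the parent.
    os.close(go_r)
    os.close(out_w)
    heap[0] = 0xFF
    os.write(go_w, b"x")
    os.close(go_w)
    with os.fdopen(out_r, "rb") as f:
        seen = f.read()
    os.waitpid(pid, 0)
    return seen
```

Called with `bytearray(b"\x01\x02\x03")`, the function returns the original three bytes, while the parent's own `heap[0]` is `0xFF` afterwards.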
The POC I've created (see
http://cr.openjdk.java.net/~simonis/webrevs/2020/8252768/) decreases
the aforementioned ~100ms, ~1s and 15s for a 100 MB, 1 GB and 8 GB heap
to ~3ms, ~15ms and ~60ms on my laptop, which I think is significant.
You can try it out by using the new "-async" or "-async=true" option
of the "GC.heap_dump" jcmd command.
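For a rough sense of scale, the safepoint times quoted above work out to the following reduction factors. The numbers are the approximate laptop measurements from this mail, and the 8 GB entry uses the "15s and more" figure, so that factor is a lower bound.

```python
# Safepoint pause in milliseconds as quoted above: (synchronous, asynchronous).
pauses_ms = {
    "100 MB": (100, 3),
    "1 GB": (1000, 15),
    "8 GB": (15000, 60),  # "15s and more", so this is a lower bound
}

def reduction_factors(pauses):
    """Return how many times shorter the asynchronous pause is, per heap size."""
    return {heap: round(sync / async_) for heap, (sync, async_) in pauses.items()}

print(reduction_factors(pauses_ms))
# {'100 MB': 33, '1 GB': 67, '8 GB': 250}
```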
Of course this change will require a CSR for the additional jcmd
GC.heap_dump "-async" option which I'll be happy to create if there's
any interest in this enhancement. Also, logging in the child process
might potentially interfere with logging in the parent VM and probably
will have to be removed in the final version, but I've left it in for
now to better illustrate what's happening. Finally, we can't output
the size of the created dump any more when dumping asynchronously,
but from my point of view that's not such a big problem. Apart
from that, the POC works surprisingly well :)
Please let me know what you think and whether there's something I've overlooked.
Best regards,
Volker
PS: by the way, asynchronous dumping combines just fine with
compressed dumps. So you can easily use "GC.heap_dump -async=true
-gz=6"