RFD: 8252768: Fast, asynchronous heap dumps

Thomas Stüfe thomas.stuefe at gmail.com
Thu Sep 3 16:58:25 UTC 2020


Hi Volker,

hah, that is a cool idea :)

Let's see what could go wrong:

So, the child process is forked, but in the child only the forking thread
survives, right? The forking thread then proceeds to dump. What happens
when, during dumping, it needs a lock owned by one of the non-running
threads? Could also be something non-obvious, like a lock taken by one of
the lower level subsystems, e.g. memory allocation (and then NMT), UL etc.
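
A minimal sketch of that hazard, assuming Linux with glibc pthreads. All
names (demo_lock, holder_thread, child_sees_orphaned_lock) are made up for
illustration and are not taken from the POC. A helper thread holds a mutex
while the main thread forks; in the child the owner no longer exists, so
the lock can never be released there. The child uses trylock so that it
reports the problem instead of hanging.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock  = PTHREAD_MUTEX_INITIALIZER; /* stands in for any VM-internal lock */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  state_cond = PTHREAD_COND_INITIALIZER;
static int holder_ready   = 0;
static int holder_release = 0;

/* Takes demo_lock and keeps it until the main thread says otherwise. */
static void *holder_thread(void *arg) {
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    pthread_mutex_lock(&state_lock);
    holder_ready = 1;
    pthread_cond_broadcast(&state_cond);
    while (!holder_release)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child found demo_lock held by a thread that does not
 * exist in the child, 0 if the child could take it, -1 on error. */
int child_sees_orphaned_lock(void) {
    pthread_t t;
    holder_ready = 0;
    holder_release = 0;
    if (pthread_create(&t, NULL, holder_thread, NULL) != 0)
        return -1;

    /* Wait until the helper thread really owns demo_lock. */
    pthread_mutex_lock(&state_lock);
    while (!holder_ready)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only this thread survived the fork. trylock is used for
         * the demo only; a plain lock() would block forever here. */
        int rc = pthread_mutex_trylock(&demo_lock);
        _exit(rc == EBUSY ? 1 : 0);
    }

    int result = -1;
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }

    /* Let the helper thread finish in the parent. */
    pthread_mutex_lock(&state_lock);
    holder_release = 1;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_join(t, NULL);
    return result;
}
```

With a blocking pthread_mutex_lock() in place of the trylock, the child
would simply hang, which is the failure mode to worry about for locks taken
inside malloc or logging code.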

--

In the child, I would be afraid of running any kind of cleanup code when
exiting, since that may somehow modify state in the parent (e.g. via
explicitly shared memory, or whatever third party native code may be up
to). So I would use _exit(), not exit(), to avoid running any stray
on_exit()/atexit() handlers.

Of course, then you need to make sure the dump is flushed and the file
handle is closed before exiting.
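
Roughly what I have in mind for the child's exit path, as a sketch only.
dump_in_child, write_fully and wait_for_dump are hypothetical names, and
the byte buffer stands in for the real dump writer. Since it uses raw
write(), there is no user-space buffer to flush; with stdio one would need
an explicit fflush() before _exit(), because _exit() does not flush stdio
buffers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write all bytes, retrying on short writes and EINTR. 0 on success. */
static int write_fully(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Forks; the child writes `data` to `path` and terminates via _exit().
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t dump_in_child(const char *path, const char *data, size_t len) {
    assert(path != NULL && data != NULL);
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or fork() error */

    /* Child from here on. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        _exit(1);
    int ok = (write_fully(fd, data, len) == 0);
    if (fsync(fd) != 0)             /* push the data to stable storage */
        ok = 0;
    if (close(fd) != 0)
        ok = 0;
    /* _exit(), not exit(): skips atexit()/on_exit() handlers and stdio
     * cleanup inherited from the parent. */
    _exit(ok ? 0 : 1);
}

/* Reaps the child; returns its exit code, or -1 if it did not exit normally. */
int wait_for_dump(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The exit code is the only channel back to the parent here, so the parent
learns whether the dump succeeded only once it reaps the child.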

--

Depending on the overcommit settings (vm.overcommit_memory), fork() may
fail with ENOMEM, regardless of copy-on-write.
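
One possible way to deal with that, sketched under my own assumptions (the
POC may handle it differently): if fork() fails with ENOMEM or EAGAIN, fall
back to dumping in-process. dump_fn, dump_async_or_fallback, placeholder_dump
and reap_child are hypothetical names.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*dump_fn)(void *ctx);   /* returns 0 on success */

enum dump_mode { DUMP_ASYNC, DUMP_SYNC_FALLBACK, DUMP_FAILED };

/* Placeholder dumper so the sketch is self-contained; does nothing. */
static int placeholder_dump(void *ctx) {
    (void)ctx;
    return 0;
}

/* Tries to run `dump` in a forked child. If fork() fails for lack of
 * memory or process slots, runs `dump` in the calling process instead.
 * On DUMP_ASYNC, *child_out holds the child's pid; otherwise it is -1. */
enum dump_mode dump_async_or_fallback(dump_fn dump, void *ctx, pid_t *child_out) {
    *child_out = -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit(dump(ctx) == 0 ? 0 : 1);   /* child: dump, then leave quietly */
    if (pid > 0) {
        *child_out = pid;
        return DUMP_ASYNC;
    }
    /* fork() failed; errno says why. */
    if (errno == ENOMEM || errno == EAGAIN)
        return dump(ctx) == 0 ? DUMP_SYNC_FALLBACK : DUMP_FAILED;
    return DUMP_FAILED;
}

/* Reaps the child; returns its exit code, or -1 if it did not exit normally. */
int reap_child(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Whether a silent fallback to the slow path is acceptable, or whether the
command should rather fail and report the error, is a policy question.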

--

If the parent process is writing to a lot of pages after the fork, and the
child takes its sweet time writing the dump, every page the parent modifies
gets copied, so total memory usage will go up, right? Compared to the
original, non-async variant.

--

We will now have a second java process popping up, existing for some
seconds, then vanishing. Outside tooling might be confused. OTOH the same
happens when forking via Runtime.exec, but there this state only persists
for some microseconds, until the first exec() call.

--

UL in child: this log output now gets mixed in asynchronously with the
parent's log? I would probably avoid logging in the child process. Also, as
stated above, I am not sure if UL uses locks internally, which may hang.

--

Just some quick first remarks. I find this idea cool, but I am not yet sure
it is practical.

Cheers, Thomas

On Thu, Sep 3, 2020 at 6:03 PM Volker Simonis <volker.simonis at gmail.com>
wrote:

> Hi,
>
> I'd like to get your opinion on a POC I've done in order to speed up
> heap dumps on Linux:
>
> https://bugs.openjdk.java.net/browse/JDK-8252768
> http://cr.openjdk.java.net/~simonis/webrevs/2020/8252768/
>
> Currently, heap dumps can be taken by the SA tools from a frozen
> process or core file or directly from a running process with jcmd,
> jconsole & JMX, jmap, etc. If the heap of a running process is dumped,
> this happens at a safepoint (see VM_HeapDumper). Because the time to
> produce a heap dump is roughly proportional to the size and fill ratio
> of the heap, this leads to safepoint times which can range from ~100 ms
> for a 100 MB heap to ~1 s for a 1 GB heap, up to 15 s and more for an
> 8 GB heap (measured on my Core i7 laptop with SSD).
>
> One possibility to decrease the safepoint time is to offload the
> dumping work to an asynchronous process. On Linux (and probably any
> other OS which supports fork()) this can be achieved by forking and
> offloading the heap dumping to the child process. Forking still needs
> to happen at a safepoint, but forking is considerably faster compared
> to the dumping process itself. The fork performance is still
> proportional to the size of the original Java process because although
> fork won't copy any memory pages, the kernel still needs to duplicate
> the page table entries of the process.
>
> Linux uses a “copy-on-write” technique for the creation of a forked
> child process. This means that right after creation, the child process
> will have exactly the same memory image as its parent process. But
> at the same time, the child process won’t use any additional physical
> memory, as long as it doesn’t change (i.e. write into) its memory.
> Since heap dumping only reads the child process's memory and then
> exits immediately, this technique can be applied even if the Java
> process already uses almost the whole free physical memory.
>
> The POC I've created (see
> http://cr.openjdk.java.net/~simonis/webrevs/2020/8252768/) decreases
> the aforementioned ~100 ms, ~1 s and 15 s for a 100 MB, 1 GB and 8 GB
> heap to ~3 ms, ~15 ms and ~60 ms on my laptop, which I think is significant.
> You can try it out by using the new "-async" or "-async=true" option
> of the "GC.heap_dump" jcmd command.
>
> Of course this change will require a CSR for the additional jcmd
> GC.heap_dump "-async" option which I'll be happy to create if there's
> any interest in this enhancement. Also, logging in the child process
> might potentially interfere with logging in the parent VM and probably
> will have to be removed in the final version, but I've left it in for
> now to better illustrate what's happening. Finally, we can't output
> the size of the created dump any more if we are using asynchronous
> dumping but from my point of view that's not such a big problem. Apart
> from that, the POC works surprisingly well :)
>
> Please let me know what you think and whether there's something I've
> overlooked.
>
> Best regards,
> Volker
>
> PS: by the way, asynchronous dumping combines just fine with
> compressed dumps. So you can easily use "GC.heap_dump -async=true
> -gz=6"
>