RFR: JDK-8306441: Segmented heap dump [v3]

Yi Yang yyang at openjdk.org
Fri Apr 28 13:34:53 UTC 2023


> Hi, a heap dump pauses application execution (STW), which is a well-known pain point. JDK-8252842 added parallel support to heap dump in an attempt to alleviate this issue. However, all concurrent threads compete to write heap data to the same file, and more memory is required to maintain the concurrent buffer queue. In our experiments, we did not observe a significant performance improvement from it.
> 
> The minor-pause solution, which is presented in this PR, is a two-stage segmented heap dump:
> 
> 1. Stage One (STW): Concurrent threads write data directly to multiple heap files.
> 2. Stage Two (Non-STW): Merge the multiple heap files into one complete heap dump file.
> 
> Concurrent worker threads no longer need to maintain a buffer queue, which would add memory overhead, nor do they need to compete for locks. This reduces application pause time by 73~80%.
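
To make the two stages concrete, here is a minimal Java sketch of the idea, not the HotSpot implementation (which is C++ inside the VM). The class name `SegmentedDumpSketch`, the `heap.seg<N>` file naming and the byte-array "slices" are assumptions made up for illustration; a real dump would also need a proper HPROF header and record ordering.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; names are not taken from the patch.
final class SegmentedDumpSketch {

    // Stage one: every worker writes its slice to a private segment file,
    // so workers share no buffer queue and no output lock.
    static List<Path> writeSegments(Path dir, List<byte[]> slices) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, slices.size()));
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (int i = 0; i < slices.size(); i++) {
                final int index = i;
                Callable<Path> task =
                    () -> Files.write(dir.resolve("heap.seg" + index), slices.get(index));
                pending.add(pool.submit(task));
            }
            List<Path> segments = new ArrayList<>();
            for (Future<Path> f : pending) {
                segments.add(f.get()); // rethrows a worker failure, if any
            }
            return segments;
        } finally {
            pool.shutdown();
        }
    }

    // Stage two: append the segments in order to the final file using a
    // plain read+write loop, then delete each segment.
    static void merge(List<Path> segments, Path target) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path segment : segments) {
                try (InputStream in = Files.newInputStream(segment)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                Files.delete(segment);
            }
        }
    }
}
```

Only `writeSegments` would have to run while the application is paused; `merge` touches nothing but files, which is why it can run after the pause ends.
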
> 
> | Memory | Threads | STW (secs) | Total (secs) |
> | --- | --- | --- | --- |
> | 8g | 1 | 15.612 | 15.612 |
> | 8g | 32 | 2.5617250 | 14.498 |
> | 8g | 96 | 2.6790452 | 14.012 |
> | 16g | 1 | 26.278 | 26.278 |
> | 16g | 32 | 5.2313740 | 26.417 |
> | 16g | 96 | 6.2445556 | 27.141 |
> | 32g | 1 | 48.149 | 48.149 |
> | 32g | 32 | 10.7734677 | 61.643 |
> | 32g | 96 | 13.1522042 | 61.432 |
> | 64g | 1 | 100.583 | 100.583 |
> | 64g | 32 | 20.9233744 | 134.701 |
> | 64g | 96 | 26.7374116 | 126.080 |
> | 128g | 1 | 233.843 | 233.843 |
> | 128g | 32 | 72.9945768 | 207.060 |
> | 128g | 96 | 67.6815929 | 336.345 |
> 
>> **Total** means the total heap dump time, including both phases.
>> **STW** means the first phase only.
>> For parallel dump, **Total** = **STW** + **Merge**. For serial dump, **Total** = **STW**.
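
As a worked example of the relation above, the small Java helper below derives the merge time and the pause reduction for one row of the table (16g, 32 threads, compared with the 16g single-thread row). The class and method names are made up for this illustration; only the input numbers come from the table.

```java
// Hypothetical helper for reading the table; not part of the patch.
final class DumpTimings {

    // Merge time implied by Total = STW + Merge (parallel dump only).
    static double mergeSeconds(double stwSeconds, double totalSeconds) {
        return totalSeconds - stwSeconds;
    }

    // Pause reduction of a parallel dump relative to the serial dump,
    // expressed as a percentage of the serial STW time.
    static double pauseReductionPercent(double serialStwSeconds, double parallelStwSeconds) {
        return 100.0 * (1.0 - parallelStwSeconds / serialStwSeconds);
    }

    public static void main(String[] args) {
        // 16g heap: serial STW 26.278 secs; 32 threads: STW 5.2313740, Total 26.417
        System.out.println("merge secs: " + mergeSeconds(5.2313740, 26.417));
        System.out.println("pause reduction %: " + pauseReductionPercent(26.278, 5.2313740));
    }
}
```

For that row the merge phase accounts for roughly 21.2 secs of the total, and the pause shrinks by roughly 80% relative to the serial dump.
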
> 
> ![image](https://user-images.githubusercontent.com/5010047/234534654-6f29a3af-dad5-46bc-830b-7449c80b4dec.png)
> 
> In actual testing, the two-stage solution can increase the overall heap dump time (see the table above). However, considering the reduction in STW time, I think it is an acceptable trade-off. Furthermore, there is still room for optimization in the second (merge) stage, e.g. using sendfile/splice/copy_file_range instead of the read+write combination. Since the number of parallel dump threads has a considerable impact on total dump time, I added a parameter that allows users to specify the number of parallel dump threads they wish to run.
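
To illustrate the kind of merge optimization mentioned above, here is a hedged Java analogue: `FileChannel.transferTo` lets the runtime hand the copy to the operating system where the platform supports it, instead of moving bytes through a user-space buffer with read+write. This is a sketch under that assumption and says nothing about how the VM's own C++ merge would call sendfile, splice or copy_file_range; the class name `ChannelMergeSketch` is made up.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical illustration only; not the merge code from the patch.
final class ChannelMergeSketch {

    // Appends each segment to the target, in list order, via channel-to-channel transfer.
    static void merge(List<Path> segments, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path segment : segments) {
                try (FileChannel in = FileChannel.open(segment, StandardOpenOption.READ)) {
                    long position = 0;
                    long size = in.size();
                    // transferTo may move fewer bytes than requested, so loop
                    // until the whole segment has been appended.
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                }
            }
        }
    }
}
```

Whether this is actually faster than a buffered read+write loop depends on the platform and file system, so it would need measuring.
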
> 
> ##### Open discussion
> 
> - Pauseless heap dump solution?
> An alternative pauseless solution is to fork a child process, set the parent process heap to read-only, and dump the heap in the child process. Once a write happens in the parent process, the child process observes it via userfaultfd and the corresponding pages are prioritized for dumping. I'm also looking forward to hearing comments and discussions about this solution.
> 
> - Client parser support for segmented heap dump
> This patch raises the question of whether the heap dump needs to be complete at all: can the VM directly generate a segmented heap dump and let the client parser complete the merge process? Looking forward to hearing comments from the Eclipse MAT community.

Yi Yang has updated the pull request incrementally with one additional commit since the last revision:

  refactor VM_HeapDumpMerge

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13667/files
  - new: https://git.openjdk.org/jdk/pull/13667/files/00b49e4e..9e563ca7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13667&range=01-02

  Stats: 83 lines in 1 file changed: 43 ins; 35 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13667/head:pull/13667

PR: https://git.openjdk.org/jdk/pull/13667


More information about the serviceability-dev mailing list