RFR: JDK-8306441: Two phase segmented heap dump [v20]

Kevin Walls kevinw at openjdk.org
Mon Jul 31 22:03:57 UTC 2023


On Wed, 19 Jul 2023 12:42:27 GMT, Yi Yang <yyang at openjdk.org> wrote:

>> ### Motivation and proposal
>> Hi, heap dump brings about pauses for application's execution(STW), this is a well-known pain. JDK-8252842 have added parallel support to heapdump in an attempt to alleviate this issue. However, all concurrent threads competitively write heap data to the same file, and more memory is required to maintain the concurrent buffer queue. In experiments, we did not feel a significant performance improvement from that.
>> 
>> The minor-pause solution, which is presented in this PR, is a two-phase segmented heap dump:
>> 
>> - Phase 1(STW): Concurrent threads directly write data to multiple heap files.
>> - Phase 2(Non-STW): Merge multiple heap files into one complete heap dump file. This process can happen outside safepoint.
>> 
>> Now concurrent worker threads are not required to maintain a buffer queue, which would result in more memory overhead, nor do they need to compete for locks. The changes in the overall design are as follows:
>> 
>> ![image](https://github.com/openjdk/jdk/assets/5010047/77e4764a-62b5-4336-8b45-fc880ba14c4a)
>> <p align="center">Fig1. Before</p>
>> 
>> ![image](https://github.com/openjdk/jdk/assets/5010047/931ab874-64d1-4337-ae32-3066eed809fc)
>> <p align="center">Fig2. After this patch</p>
>> 
>> ### Performance evaluation
>> | memory | numOfThread | CompressionMode | STW | Total |
>> | -------| ----------- | --------------- | --- | ---- |
>> | 8g | 1 T | N | 15.612 | 15.612 |
>> | 8g | 32 T | N | 2.561725 | 14.498 |
>> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
>> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
>> | 8g | 96 T | N | 2.6790452 | 14.012 |
>> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
>> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
>> | 16g | 1 T | N | 26.278 | 26.278 |
>> | 16g | 32 T | N | 5.231374 | 26.417 |
>> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
>> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
>> | 16g | 96 T | N | 6.2445556 | 27.141 |
>> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
>> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
>> | 32g | 1 T | N | 48.149 | 48.149 |
>> | 32g | 32 T | N | 10.7734677 | 61.643 |
>> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
>> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
>> | 32g | 96 T | N | 13.1522042 | 61.432 |
>> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
>> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
>> | 64g | 1 T | N | 100.583 | 100.583 |
>> | 64g | 32 T | N | 20.9233744 | 134.701 |
>> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
>> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
>> | 64g | 96 T | N | 26.7374116 | 126.08 |
>> | 64g | ...
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   test failure on Windows

Hi, a few more points today, but I'm getting there. 8-)

1
The previous parallel dump change:
8252842: Extend jmap to support parallel heap dump
..the notes suggest it was going to have a CSR for the new option, but was then decided not to add the option to specify the number of threads.
That CSR was https://bugs.openjdk.org/browse/JDK-8260424 which was withdrawn, but looks like we should do one for this change.
If we had a reasonable expectation of what the best number to use was, we could do the same thing here.  If we might get that wrong, we will need the user to be able to specify.
Actually are we changing the default behaviour: from "parallel if you can", to "serial unless you specify".


2
Running the changes, I got a test failure on macos debug (x64 and aarch64):

test/hotspot/jtreg/serviceability/HeapDump/IntegrityHeapDumpTest.java

On different machines, one x64 and one aarch64, got the same kind of error:
----------stderr:(6/609)----------
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/.java_pid36008: target process 36008 doesn't respond within 10500ms or HotSpot VM not loaded
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:95)
	at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
	at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
	at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
	at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
	
..so the test fails with: java.lang.RuntimeException: Expected to get exit value of [0], exit value is: [1]

Other tests e.g. test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpTest.java run OK.

The failing test IntegrityHeapDumpTest.java extends LingeredApp.
test/hotspot/jtreg/serviceability/HeapDump/DuplicateArrayClassesTest.java does this, and runs and passes.

I don't yet understand the failure.  Could things have gone wrong in macosx debug builds in the attach listener.. I didn't get a stack of the LingeredApp under test, but will try again.


3
Docs
https://openjdk.org/guide/#release-notes
This enhancement should probably have a release note, and we could briefly explain what parallel means.  
We probably need to still recommend it is not used on a production server unless you have tested the impact and are happy it does not disturb the application.
Maybe we can offer a recommended usage, a guide number of threads as a starting point.
We also need to explain when this can or cannot be used (SerialGC, any other restrictions? It's not a problem that SerialGC does not use this, just that we should make it clear).  
I can help with the wording if needed, if you'd like to start something (this might make it into other docs also.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1659249670


More information about the serviceability-dev mailing list