RFR: JDK-8306441: Two phase segmented heap dump [v21]
Kevin Walls
kevinw at openjdk.org
Tue Aug 1 14:44:56 UTC 2023
On Tue, 1 Aug 2023 05:50:42 GMT, Yi Yang <yyang at openjdk.org> wrote:
>> ### Motivation and proposal
>> Hi. A heap dump pauses the application's execution (stop-the-world, STW), which is a well-known pain point. JDK-8252842 added parallel support to the heap dumper in an attempt to alleviate this issue. However, all concurrent threads competitively write heap data to the same file, and extra memory is required to maintain the concurrent buffer queue. In our experiments, we did not see a significant performance improvement from that.
>>
>> The minimal-pause solution presented in this PR is a two-phase segmented heap dump:
>>
>> - Phase 1 (STW): Concurrent threads directly write data to multiple heap files.
>> - Phase 2 (Non-STW): Merge the multiple heap files into one complete heap dump file. This process can happen outside the safepoint.
>>
>> With this change, concurrent worker threads no longer need to maintain a buffer queue, which would cost extra memory, nor do they need to compete for locks. The changes to the overall design are as follows (a rough merge-phase sketch follows the figures):
>>
>> Fig 1. Before (diagram not preserved in the archive)
>>
>> Fig 2. After this patch (diagram not preserved in the archive)
>>
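>> As a rough illustration of phase 2 (a standalone sketch with illustrative names, not the actual heapDumper.cpp code), merging the per-thread segment files into the final .hprof outside the safepoint could look like this; the real code also has to handle the HPROF header and buffering, which this ignores:
>>
>> ```c++
>> #include <cstdio>
>> #include <string>
>> #include <vector>
>>
>> // Phase 2 sketch: concatenate the per-thread segment files produced at the
>> // safepoint into one complete .hprof file. The application has already been
>> // resumed, so this work happens outside the pause.
>> static bool merge_segments(const std::vector<std::string>& segments,
>>                            const std::string& out_path) {
>>   FILE* out = fopen(out_path.c_str(), "wb");
>>   if (out == nullptr) return false;
>>   std::vector<char> buf(1024 * 1024);  // 1 MB copy buffer
>>   for (const std::string& seg : segments) {
>>     FILE* in = fopen(seg.c_str(), "rb");
>>     if (in == nullptr) { fclose(out); return false; }
>>     size_t n;
>>     while ((n = fread(buf.data(), 1, buf.size(), in)) > 0) {
>>       if (fwrite(buf.data(), 1, n, out) != n) {
>>         fclose(in); fclose(out); return false;
>>       }
>>     }
>>     fclose(in);
>>     remove(seg.c_str());  // the segment files are temporary
>>   }
>>   return fclose(out) == 0;
>> }
>> ```
>>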
>> ### Performance evaluation
>> | Memory | Dump threads | Compression mode | STW (s) | Total (s) |
>> | ------ | ------------ | ---------------- | ------- | --------- |
>> | 8g | 1 T | N | 15.612 | 15.612 |
>> | 8g | 32 T | N | 2.561725 | 14.498 |
>> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
>> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
>> | 8g | 96 T | N | 2.6790452 | 14.012 |
>> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
>> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
>> | 16g | 1 T | N | 26.278 | 26.278 |
>> | 16g | 32 T | N | 5.231374 | 26.417 |
>> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
>> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
>> | 16g | 96 T | N | 6.2445556 | 27.141 |
>> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
>> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
>> | 32g | 1 T | N | 48.149 | 48.149 |
>> | 32g | 32 T | N | 10.7734677 | 61.643 |
>> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
>> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
>> | 32g | 96 T | N | 13.1522042 | 61.432 |
>> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
>> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
>> | 64g | 1 T | N | 100.583 | 100.583 |
>> | 64g | 32 T | N | 20.9233744 | 134.701 |
>> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
>> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
>> | 64g | 96 T | N | 26.7374116 | 126.08 |
>> | 64g | ...
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
> test failure on mac
Thanks for the update -
Right, on the CSR I may have followed the links too far and been comparing with jmap features, not jcmd features. 8-)
On the default behaviour: running this change with no -parallel= option, I had been seeing it default to a non-parallel dump, but now I see why.
I see num_dump_thread being set (e.g. 12 on my test system), but num_active_workers = 1 (while 12 were requested).
I just saw a run with no -parallel option that got num_active_workers = 2, so it did a parallel dump. So the default is being set as intended, but the availability of worker threads is not always predictable!
-Xlog:gc shows the GC was using fewer workers than expected.
That suggested trying -XX:-UseDynamicNumberOfGCThreads, and yes, with that I see e.g. num_active_workers = 23 and req_num_dump_thread = 12 for my test, and all 12 threads work on the heap dump.
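For example, something like the following (the pid, path, and MyApp are placeholders; -parallel is the new option from this PR):

  java -XX:-UseDynamicNumberOfGCThreads -Xlog:gc MyApp     # target JVM
  jcmd <pid> GC.heap_dump /tmp/heap.hprof -parallel=12

With dynamic GC thread scaling disabled, the requested number of dump workers should then actually be available.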
-------------------
test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpTest.java:
We can't see whether parallel dumping is actually happening.
Can we add, as I think another test does: @run testng/othervm -Xlog:heapdump HeapDumpTest
I would also suggest -XX:-UseDynamicNumberOfGCThreads; otherwise, as above, the GC can scale back its workers and you don't always get a parallel dump when you expect one.
-------------------
Do we really want -parallel=-1 to be accepted as valid input, and to use the workers, when it's clearly an error?
HeapDumpDCmd::execute casts it to a uint, so we lose the fact that it was a request for -1.
Can we get the signed value in HeapDumpDCmd::execute, print a message and return, like we do when the compression level is out of range?
(Passing a non-number when an integer is expected will cause the jcmd to fail, and that seems correct.)
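Something like this standalone sketch of the check (the names are illustrative, not the actual HeapDumpDCmd fields):

  #include <cstdio>
  #include <cstdint>

  // Illustrative only: read the option as a signed value and reject negatives
  // with a message before any cast to an unsigned thread count, mirroring the
  // existing out-of-range check for the compression level.
  static bool validate_parallel_option(int64_t requested, uint32_t* num_threads) {
    if (requested < 0) {
      fprintf(stderr, "Invalid number of parallel dump threads.\n");
      return false;
    }
    *num_threads = (uint32_t)requested;
    return true;
  }

  int main() {
    uint32_t n;
    return validate_parallel_option(-1, &n) ? 0 : 1;  // -1 is rejected
  }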
-------------------
So I think I'm done, looks good. 8-)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1660469424