RFR: JDK-8306441: Two phase segmented heap dump [v25]
Kevin Walls
kevinw at openjdk.org
Thu Aug 3 13:27:38 UTC 2023
On Wed, 2 Aug 2023 14:33:28 GMT, Yi Yang <yyang at openjdk.org> wrote:
>> ### Motivation and proposal
>> Hi, heap dumps pause the application's execution (stop-the-world, STW), which is a well-known pain point. JDK-8252842 added parallel support to heap dumping in an attempt to alleviate this issue. However, all concurrent threads competitively write heap data to the same file, and extra memory is required to maintain the concurrent buffer queue. In our experiments, we did not see a significant performance improvement from that.
>>
>> The minor-pause solution presented in this PR is a two-phase segmented heap dump:
>>
>> - Phase 1 (STW): concurrent worker threads write data directly to multiple heap files.
>> - Phase 2 (non-STW): merge the multiple heap files into one complete heap dump file. This can happen outside the safepoint (sketched below).
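>>
>> Conceptually, the phase-2 merge just appends each per-thread segment file to the final .hprof in order, outside the safepoint. Here is a minimal Java sketch of the merge idea (the SegmentMerge class and its file handling are illustrative only; the actual merge runs in C++ inside the VM):
>>
>>     import java.io.IOException;
>>     import java.nio.channels.FileChannel;
>>     import java.nio.file.Files;
>>     import java.nio.file.Path;
>>     import java.nio.file.StandardOpenOption;
>>     import java.util.List;
>>
>>     class SegmentMerge {
>>         // Append each segment to the final dump, deleting segments as we go.
>>         static void merge(Path finalDump, List<Path> segments) throws IOException {
>>             try (FileChannel out = FileChannel.open(finalDump, StandardOpenOption.CREATE,
>>                     StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
>>                 for (Path seg : segments) {
>>                     try (FileChannel in = FileChannel.open(seg, StandardOpenOption.READ)) {
>>                         long pos = 0, size = in.size();
>>                         while (pos < size) {   // transferTo may copy less than requested
>>                             pos += in.transferTo(pos, size - pos, out);
>>                         }
>>                     }
>>                     Files.delete(seg);  // segment is no longer needed once merged
>>                 }
>>             }
>>         }
>>     }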
>>
>> Concurrent worker threads no longer need to maintain a buffer queue (which costs extra memory) or compete for locks. The changes in the overall design are as follows:
>>
>> [figure: Fig 1. Before]
>>
>> [figure: Fig 2. After this patch]
>>
>> ### Performance evaluation
>> | Memory | Threads | Compression mode | STW (s) | Total (s) |
>> | ------ | ------- | ---------------- | ------- | --------- |
>> | 8g | 1 T | N | 15.612 | 15.612 |
>> | 8g | 32 T | N | 2.561725 | 14.498 |
>> | 8g | 32 T | C1 | 2.3084878 | 14.198 |
>> | 8g | 32 T | C2 | 10.9355128 | 21.882 |
>> | 8g | 96 T | N | 2.6790452 | 14.012 |
>> | 8g | 96 T | C1 | 2.3044796 | 3.589 |
>> | 8g | 96 T | C2 | 9.7585151 | 20.219 |
>> | 16g | 1 T | N | 26.278 | 26.278 |
>> | 16g | 32 T | N | 5.231374 | 26.417 |
>> | 16g | 32 T | C1 | 5.6946983 | 6.538 |
>> | 16g | 32 T | C2 | 21.8211105 | 41.133 |
>> | 16g | 96 T | N | 6.2445556 | 27.141 |
>> | 16g | 96 T | C1 | 4.6007096 | 6.259 |
>> | 16g | 96 T | C2 | 19.2965783 | 39.007 |
>> | 32g | 1 T | N | 48.149 | 48.149 |
>> | 32g | 32 T | N | 10.7734677 | 61.643 |
>> | 32g | 32 T | C1 | 10.1642097 | 10.903 |
>> | 32g | 32 T | C2 | 43.8407607 | 88.152 |
>> | 32g | 96 T | N | 13.1522042 | 61.432 |
>> | 32g | 96 T | C1 | 9.0954641 | 9.885 |
>> | 32g | 96 T | C2 | 38.9900931 | 80.574 |
>> | 64g | 1 T | N | 100.583 | 100.583 |
>> | 64g | 32 T | N | 20.9233744 | 134.701 |
>> | 64g | 32 T | C1 | 18.5023784 | 19.358 |
>> | 64g | 32 T | C2 | 86.4748377 | 172.707 |
>> | 64g | 96 T | N | 26.7374116 | 126.08 |
>> | 64g | ...
>
> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>
> whitespace
OK, nearly done!
We always log "parallelism true", which looks odd. It just means the dump path has space for up to 9,999 dump-file parts.
It would simplify things, I think, if VM_HeapDumper::doit() logged whether it will actually attempt a parallel dump, based on all the checks.
Also, is the variable num_requested_dump_thread needed? It's just a copy of _num_dumper_threads, which we don't change.
Here is a suggestion for VM_HeapDumper::doit():
  ...
  WorkerThreads* workers = ch->safepoint_workers();
  uint num_active_workers = workers != nullptr ? workers->active_workers() : 0;
  bool is_parallel = num_active_workers > 0 && can_parallel_dump() && is_parallel_dump();
  log_info(heapdump)("Requested dump threads %u, active workers %u, parallelism %s",
                     _num_dumper_threads, num_active_workers, is_parallel ? "true" : "false");
  if (!is_parallel) {
    // Serial dump: the VM thread does all the work.
    _num_dumper_threads = 1;
    work(VMDumperWorkerId);
  } else {
    // Parallel dump: clamp to the available workers, keeping one slot for the VMDumper.
    _num_dumper_threads = clamp(_num_dumper_threads, 2U, num_active_workers);
    uint heap_only_dumper_threads = _num_dumper_threads - 1 /* VMDumper thread */;
    _dumper_controller = new (std::nothrow) DumperController(heap_only_dumper_threads);
    ParallelObjectIterator poi(_num_dumper_threads);
    _poi = &poi;
    workers->run_task(this, _num_dumper_threads);
    _poi = nullptr;
  }
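With that, the log line tells us directly whether the parallel path was taken, e.g. (output shape assuming default unified logging decorators; the values are illustrative):

  [0.512s][info][heapdump] Requested dump threads 1, active workers 0, parallelism false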
-----------------
HeapDumpParallelTest.java
We need to change all of the:
  Asserts.assertTrue(out.getStdout().contains(...
to:
  out.shouldContain(...
...then we get the output included in the log if it fails, so we should see what went wrong.
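For example, a sketch of the change (the expected string here is a placeholder, not the literal text the test checks for):

  // Before: a failure only tells us the boolean was false.
  Asserts.assertTrue(out.getStdout().contains("expected text"));

  // After: OutputAnalyzer.shouldContain() includes the full
  // stdout/stderr of the process in the failure message.
  out.shouldContain("expected text");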
----------------------
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13667#issuecomment-1663979913