RFR: 8252842: Extend jmap to support parallel heap dump [v10]

Lin Zang lzang at openjdk.java.net
Tue Feb 23 08:26:40 UTC 2021


On Tue, 23 Feb 2021 08:06:14 GMT, Ralf Schmelter <rschmelter at openjdk.org> wrote:

>> Hi @schmelter-sap,
>> Thanks a lot for reviewing and benchmarking. 
>> 
>>> I've benchmarked the code on my machine (128GB memory, 56 logical CPUs) with an example creating a 32 GB heap dump. I only saw a 10 percent reduction in time, both using uncompressed and compressed dumps. Have you seen better numbers in your benchmarks?
>>>
>>> And it seems to potentially use a lot more temporary memory. In my example I had a 4 GB array in the heap and the new code allocated 4 GB of additional memory to write this array. This could happen in more threads in parallel, increasing the memory consumption even more.
>> 
>> I have done some preliminary tests on my machine (16GB, 8 cores); the data are shown as follows:
>> `$ jmap -dump:file=dump4.bin,parallel=4 127420`
>> `Dumping heap to /home/lzang1/Source/jdk/dump4.bin ...`
>> `Heap dump file created [932950649 bytes in 0.591 secs]`
>> `$ jmap -dump:file=dump1.bin,parallel=1 127420`
>> `Dumping heap to /home/lzang1/Source/jdk/dump1.bin ...`
>> `Heap dump file created [932950739 bytes in 2.957 secs]`
>> 
>> But I have observed unstable results on a machine with more cores and larger RAM, plus a workload with higher heap usage. I suspect that may be related to the memory consumption you mentioned, and I am investigating ways to optimize it.
>> 
>>> If the above problems could be fixed, I would suggest to just use the parallel code in all cases.
>> 
>> Thanks a lot! I will let you know when I make some progress on optimization.
>> 
>> BRs,
>> Lin
>
> Hi @linzang,
> 
> I've done more benchmarking using different numbers of threads for parallel heap iteration and have found values which give at least a factor of 2 speedup (for gzipped dumps) or 1.6 (for unzipped dumps). For my scenario using gzip compression about 10 percent of the available CPUs for parallel iteration gave the best speedup, for the uncompressed one it was about 7 percent. 
> 
> Note that the baseline I compared against was not the parallel=1 case, but the old code. The parallel=1 case was always 10 to 20 percent slower than the old code.
> 
> Best regards,
> Ralf

Dear @ralf,
Thanks a lot for benchmarking it!
It is a little surprising to me that "parallel=1" is 10~20 percent slower than before. I believe this can be avoided with some revision of the code. I also found a potential memory leak in the implementation and am working on a fix.

> I've done more benchmarking using different numbers of threads for parallel heap iteration and have found values which give at least a factor of 2 speedup (for gzipped dumps) or 1.6 (for unzipped dumps). For my scenario using gzip compression about 10 percent of the available CPUs for parallel iteration gave the best speedup, for the uncompressed one it was about 7 percent.

This data is really interesting to me: it seems that a gzipped dump is faster than an uncompressed one. Is that because of disk writes, or something else? I will investigate it further.
Thanks a lot!
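As a sketch of the comparison I have in mind (assuming the `gz=<level>` dump suboption introduced in JDK 15 together with the `parallel=` suboption from this PR; `<pid>` is a placeholder):
`$ jmap -dump:file=plain.bin,parallel=4 <pid>`
`$ jmap -dump:gz=1,file=zipped.bin.gz,parallel=4 <pid>`
If the bottleneck is disk writing, the smaller gzipped file could plausibly win despite the extra CPU cost of compression.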

BRs,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/2261
