RFR: 8252842: Extend jmap to support parallel heap dump [v10]

Lin Zang lzang at openjdk.java.net
Mon Feb 22 07:38:56 UTC 2021


On Mon, 22 Feb 2021 05:53:19 GMT, Ralf Schmelter <rschmelter at openjdk.org> wrote:

>> Lin Zang has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix build fail issue on windows
>
> Hi,
> 
> I've benchmarked the code on my machine (128GB memory, 56 logical CPUs) with an example creating a 32 GB heap dump. I only saw a 10 percent reduction in time, both using uncompressed and compressed dumps. Have you seen better numbers in your benchmarks?
> 
> And it seems to potentially use a lot more temporary memory. In my example I had a 4 GB array in the heap and the new code allocated 4 GB of additional memory to write this array. This could happen in more threads in parallel, increasing the memory consumption even more.
> 
> If the above problems could be fixed, I would suggest to just use the parallel code in all cases.

Hi @schmelter-sap,
Thanks a lot for reviewing and benchmarking. 

> I've benchmarked the code on my machine (128GB memory, 56 logical CPUs) with an example creating a 32 GB heap dump. I only saw a 10 percent reduction in time, both using uncompressed and compressed dumps. Have you seen better numbers in your benchmarks?
>
> And it seems to potentially use a lot more temporary memory. In my example I had a 4 GB array in the heap and the new code allocated 4 GB of additional memory to write this array. This could happen in more threads in parallel, increasing the memory consumption even more.

I have done some preliminary tests on my machine (16 GB RAM, 8 cores); the data are shown below:
`$ jmap -dump:file=dump4.bin,parallel=4 127420`
`Dumping heap to /home/lzang1/Source/jdk/dump4.bin ...`
`Heap dump file created [932950649 bytes in 0.591 secs]`
`$ jmap -dump:file=dump1.bin,parallel=1 127420`
`Dumping heap to /home/lzang1/Source/jdk/dump1.bin ...`
`Heap dump file created [932950739 bytes in 2.957 secs]`

However, I have observed unstable results on a machine with more cores and larger RAM, running a workload with higher heap usage. I suspect that may be related to the memory consumption you mentioned, and I am investigating ways to optimize it.
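For illustration only, here is a minimal sketch (not the PR's actual HotSpot code) of the general direction for bounding temporary memory: each writer thread reuses a fixed-size chunk buffer instead of allocating a buffer as large as the object being dumped. The class name, chunk size, and file name are all hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: bound per-thread temporary memory to CHUNK_SIZE
// rather than allocating a buffer as large as the object (e.g. 4 GB).
public class ChunkedSegmentWriter {
    private static final int CHUNK_SIZE = 1024 * 1024; // assumed 1 MB per writer thread

    // Copy the source in fixed-size chunks into a reused buffer and write each chunk out.
    static void writeLargeArray(byte[] source, OutputStream out) throws IOException {
        byte[] chunk = new byte[CHUNK_SIZE];
        int offset = 0;
        while (offset < source.length) {
            int len = Math.min(CHUNK_SIZE, source.length - offset);
            System.arraycopy(source, offset, chunk, 0, len);
            out.write(chunk, 0, len);
            offset += len;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16 * 1024 * 1024]; // stand-in for a large heap object
        try (OutputStream out = new FileOutputStream("segment.bin")) {
            writeLargeArray(data, out);
        }
    }
}
```

With this approach the extra memory per writer thread stays constant regardless of the size of the array being dumped, at the cost of one extra copy per chunk.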

> If the above problems could be fixed, I would suggest to just use the parallel code in all cases.

Thanks a lot! I will let you know when I make some progress on the optimization.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2261

