RFR: 8252842: Extend jmap to support parallel heap dump [v10]

Thu Mar 18 14:43:45 UTC 2021

On Thu, 18 Mar 2021 12:20:25 GMT, Ralf Schmelter <rschmelter at openjdk.org> wrote:

>> Dear Chris and Ralf,
>> May I ask your help to review the related CSR first? both JDK-8260424  and JDK-8261943? 
>> https://bugs.openjdk.java.net/browse/JDK-8260424 is for adding `parallel=<N>` option, which IMO also block the merge of #2379. 
>> https://bugs.openjdk.java.net/browse/JDK-8261943 is for adding new attach api comment `dumpheapext`
>> 
>> Thanks
>> Lin
>
> Hi Lin,
> 
> to be honest I would prefer if we would not add the parallel option but instead do it automatically depending on the number of GC threads available. I doubt that the user has a good idea about number of threads to use, so I think it is better to just use a sensible default derived from the number of threads available and if compression is enabled or not.

Dear Ralf and Chris,

> to be honest I would prefer if we would not add the parallel option but instead do it automatically depending on the number of GC threads available. I doubt that the user has a good idea about number of threads to use, so I think it is better to just use a sensible default derived from the number of threads available and if compression is enabled or not.

Thanks Ralf for your comments!

@plummercj and I have discussed the `parallel=` option several rounds, also together with Serguei @sspitsyn  and Ham-Lin @Hamlin-Li . And I recently also re-considered this option for dumping. here I list my opinion and hope we could made the final decision this time :-)

- The `parallel=<N>` option for `jmap -histo` has been discussed and finalized, I think we could leave it as it is. Because user may get general info about their system, such as core number and multi-process workload, these could help them tuning.  And also because the `parallel=<N>` option only affects how many threads would be used for object iteration (this is different with `jmap -dump`, I will explain in following comments), it is easy to observe the output or time and then adjust the number.

- The `parallel=<N>` option for `jmap -dump` is complicated, it has to consider the file writer threads and the disk writing speed. let me try to explain the implementation briefly:

In current implementation, the heap dump actaully has two kind of worker threads - dumper and writter. The dumper iterates over the heap and send data to the writer; the writer does compression if required and writes data to the file. There is buffer queue between dumper and writer: dumper grab empty buffer and fill data into the queue , and the writer threads grab the filled buffer from the queue, write data to file and then return empty buffer for futurn use. 
In summary it looks like following graph:

` Dumper  --> BufferQueue (size == #writers) --> writer(multiple) ==> disk(file)`

Before this patch, there is only one dumper (the VMThread) and all other worker threads are writer, that is, all GC threads work as writer. The only dumper iterate heap and the multiple writers write data to file. So the bottomneck is more likely to be heap iteration (dumper side).

With this patch,  there are multiple dumpers and multiple writers,  they all divided the gc thread number (plus 1 vmthread). And since there are multiple dumpers, the buffer queue is more likely to be filled.  So the bottomneck move from heap iteration to two aspect (writer side):
- drain of buffer queue 
- speed of disk writing

Since the buffer queue size is determined by number of writers, and the disk writing speed determines how fast the buffer can be empty again, it is important to keep the writer number as much as it can be to satisfy the dumper's requirement.

The `parallel=<N>` option only control the dumper thread number, and it indirectly affects the writer's thread number (as they all divide the total thread number). it is really hard for user to understand these internal mechanism and do tuning.  

And by default, the number of dumpers and writers are half : half of GC threads +1(vmThread), my testing shown this configuration is likly to be optimal. ( But maybe more evalution is required to confirm)

In summary, I think it is very hard for user to tune the parallel thread number for heap dump. So I suggest to remove it for dump and enable the parallel by default. 

Again, For `jmap -histo`, I suggest to keep this since it is more easy to be understood.

IMO, to make final decision, we need to consider following:
- Is it acceptable that `jmap -dump` and `jmap -histo` have different behavior about the option of `parallel`, that is, `heapdump` does not use it and `heaphisto` does.
- If we decide to not introduce the `parallel` option, we don't need to introduce the `heapdumpext` attach api command, so the 2 CSRs related to this PR could be closed. But this may leave the argument count problem to future, when more argumebts is introduced, is this acceptable? 
- if we decide to introduce `parallel=<N>` option, do we need to add explaination in the document? or docs like tuning guide? 

Welcome for any of your comments!

BRs,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/2261