RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Tue Feb 11 07:44:17 UTC 2020

Hi again Ralf, :)

A few more comments after taking a closer look at the thread code.

On the surface it seems to me this is a case where it would be okay to 
introduce a subclass of Thread that is not JavaThread nor NonJavaThread. 
I see little point in subclassing NonJavaThread (via NamedThread) but 
then overriding pre_run() and post_run() so that you don't do anything 
that NonJavaThread is supposed to do regarding the NJT iterator 
capabilities. But we currently expect all threads to fit into one 
category or another, so this is problematic. :( I thinking disabling the 
NJT functionality is also problematic. So not sure what to suggest yet.

BTW you extended NamedThread but you never actually set a name AFAICS. ??

For your monitor operations, you should use a MonitorLocker and then 
call ml->wait() which will do the right thing with respect to "no 
safepoint checks" without you needing to specify it directly.

Cheers,
David

On 11/02/2020 8:19 am, David Holmes wrote:
> Hi Ralf,
> 
> One part of this caught my eye and now I look at the webrev I have some 
> concerns. Introducing new threads to the VM is not something that should 
> be done lightly and it has to be done very carefully - I need to look 
> closer at this aspect. Further when using Mutexes/Monitors in such code 
> you have to be extremely careful about how (or even if) those 
> Mutex/Monitor get deleted. The code you have at present is not safe 
> because you cannot know when other threads have completely exited the 
> Monitor/Mutex code. The last thread to terminate will signal the 
> destructing thread (blocked in wait) then release the monitor, allowing 
> the destructing thread to acquire the monitor and then delete the _lock. 
> But at the point at which the monitor becomes free and the destructor 
> thread is unparked, the terminating thread may be context switched out 
> and remain inside the Monitor code. The destructor thread then deletes 
> the monitor and frees it. When the terminating thread resumes, if it 
> touches any memory associated with the Monitor it could SEGV.
> 
> To safely delete a Monitor/Mutex you have to know for certain that all 
> threads using it have completely ceased to use it. You cannot use that 
> Monitor/Mutex as the means for determining that. It is a non-trivial 
> problem to solve.
> 
> Cheers,
> David
> -----
> 
> On 11/02/2020 1:33 am, Schmelter, Ralf wrote:
>> Hi Yasumasa,
>>
>>>    You can use `DCmdArgument<jlong>` for -gz option.
>>
>> That is what I originally tried. But then you always have to supply a 
>> compression level (just specifying -gz doesn't work). Since I would 
>> expect most users never caring about the compression level, I switched 
>> to a string option, which can handle this pattern.
>>
>>> _nr_of_threads, _id_to_write, _current in CompressionBackend should 
>>> be added `volatile` at least.
>>
>> I don't think that is needed. Apart from the initialization, they are 
>> only changed under lock protection.
>>
>>> BTW how much processing time is different between single threaded and 
>>> multi threaded?
>>
>> I've benchmarked an example, which creates a ~31 GB uncompressed hprof 
>> file, with a VM which doesn't use any background threads. Here are the 
>> size of the create files, the compression level and the time spend:
>>
>> Uncompressed, 31.6 G, 71 sec
>> gzipped level 1, 7.57 G, 463 sec (x6.5)
>> gzipped level 3, 7.10 G, 609 sec (x8.6)
>> gzipped level 6, 6.49 G, 1415 sec (x19.9)
>>
>> So even the fastest gzip compression makes writing the dump at least 5 
>> times as slow.
>>
>>> Also I want to know what number is set to ParallelGCThreads.
>>> ParallelGCThreads seems to affect to thread num for GZip compression.
>>
>> Originally, I've tried to use the WorkGang (CollectedHeap:: 
>> get_safepoint_workers()) of the GC to do the work. But this wouldn't 
>> work because Shenandoah could not iterate the heap from a worker 
>> thread. So I've opted to start the needed threads itself for the time 
>> of the heap dump. I've used ParallelGCThreads as the maximum number of 
>> threads, since this is what would be used for a GC too. So it should 
>> not clog up the machine more than a GC. Maybe it would be even better 
>> to additionally limit the threads by the compression level.
>>
>> Best regards,
>> Ralf Schmelter
>>
>> -----Original Message-----
>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>> Sent: Samstag, 8. Februar 2020 14:46
>> To: Schmelter, Ralf <ralf.schmelter at sap.com>; OpenJDK Serviceability 
>> <serviceability-dev at openjdk.java.net>
>> Cc: yasuenag at gmail.com
>> Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped 
>> heap dump
>>
>> Hi Ralf,
>>
>>
>> - diagnosticCommand.cpp
>>    You can use `DCmdArgument<jlong>` for -gz option.
>>    If you want to use lesser type (e.g. int, unsigned char), I guess 
>> you need to modify GenDCmdArgument class.
>>
>> - heapDumper.cpp
>>    _nr_of_threads, _id_to_write, _current in CompressionBackend should 
>> be added `volatile` at least.
>>    (Other values need to be checked)
>>
>>
>> BTW how much processing time is different between single threaded and 
>> multi threaded?
>> Also I want to know what number is set to ParallelGCThreads.
>> ParallelGCThreads seems to affect to thread num for GZip compression.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>