RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Wed Feb 19 12:30:39 UTC 2020

Hi David,

> If the lifecycle of your new NonJavaThread does not fit the existing 
> NonJavaThreads then yes you will need to override pre_run() and 
> post_run(), but you shouldn't just delete all the NJT iteration support 
> - at least it isn't obvious to me that it is valid to do so.

The only thing I want to do is to be able to delete the thread after it is finished. 

I first thought I could copy the JfrThreadSampler thread, which deletes itself in the post_run() method. But it turned out, that the run() method of this thread never stops, so the post_run() method is never called. And when I implement the same post_run() method, I hit asserts in the debug build when calling the destructor, since os::free_thread() asserts using Thread::current(), which does not work since the post_run() method of NonJavaThread already wiped out the thread local storage.

So it seems to me now that currently deleting a non-java thread is not supported and nobody does it. But the code itself and the comments in Thread::call_run() seem to indicate, it should work. So maybe it's just as simple as adjusting the asserts in os::free_thread().

> That begs the question for me exactly what it is that your new NJT 
> worker thread will touch in the VM because that will determine where it 
> needs to fit in the Thread hierarchy and what actions it needs to 
> perform in pre_run() and post_run(). I'm unclear what state the VM will 
> be in when this heap dump is performed and these worker threads are 
> doing the compression.

Currently the worker threads only use the Monitor from the VM. Otherwise they just compress the buffers and write them to the file using the os interface. The heap dump itself is performed in a VM operation. They don't accesses objects, classes or other 'VM objects'.

Best regards,
Ralf 

-----Original Message-----
From: David Holmes <david.holmes at oracle.com> 
Sent: Donnerstag, 13. Februar 2020 05:28
To: Schmelter, Ralf <ralf.schmelter at sap.com>; Yasumasa Suenaga <suenaga at oss.nttdata.com>; OpenJDK Serviceability <serviceability-dev at openjdk.java.net>
Cc: yasuenag at gmail.com
Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Ralf,

On 12/02/2020 6:41 pm, Schmelter, Ralf wrote:
> Hi David,
> 
>> I see little point in subclassing NonJavaThread (via NamedThread) but
>> then overriding pre_run() and post_run() so that you don't do anything
>> that NonJavaThread is supposed to do regarding the NJT iterator
>> capabilities.
> 
> The problem is  the post_run() method of NamedThread calls Thread::clear_thread_current(), which then makes it impossible to delete the thread at least in a debug build, since the code in ~Thread calls os::free_thread() which calls Thread::current()->->osthread() in an assert, which obviously will crash.

If the lifecycle of your new NonJavaThread does not fit the existing 
NonJavaThreads then yes you will need to override pre_run() and 
post_run(), but you shouldn't just delete all the NJT iteration support 
- at least it isn't obvious to me that it is valid to do so.

> Originally I tried not use my own threads at all and instead use the WorkGang from CollectedHeap:: get_safepoint_workers(). But this ultimately failed because I'm not allowed to iterate the heap in a worker thread on Shenandoah. Additionally ParallelGC did not implement get_safepoint_workers(), but that should have not been a problem.

That begs the question for me exactly what it is that your new NJT 
worker thread will touch in the VM because that will determine where it 
needs to fit in the Thread hierarchy and what actions it needs to 
perform in pre_run() and post_run(). I'm unclear what state the VM will 
be in when this heap dump is performed and these worker threads are 
doing the compression.

Thanks,
David

> Maybe it is better to try to get this to work (e.g. if I could specify a foreground task when calling run_task(), the problem could be avoid by doing the iteration in the foreground task). But I'm not sure how changes in this area are seen.
> 
>> For your monitor operations, you should use a MonitorLocker and then
>> call ml->wait() which will do the right thing with respect to "no
>> safepoint checks" without you needing to specify it directly.
> 
> Thanks, will do.
> 
> Best regards,
> Ralf