RFR: JDK-8202884: SA: Attach/detach might fail on Linux if debugee application create/destroy threads during attaching

Jini George jini.george at oracle.com
Tue Dec 11 17:29:31 UTC 2018


Hello !

Requesting reviews for:

https://bugs.openjdk.java.net/browse/JDK-8202884
Webrev: http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html

Details:
To attach to the threads in a process, we first do a ptrace attach to 
the main thread. Later, we use the libthread_db library (or, when 
running within a container, iterate through the /proc/<pid>/task files) 
to discover the threads of the process, and add them to the threads list 
(within SA) for this process. Once we have discovered all the threads 
and added them to the list, we invoke ptrace attach individually on each 
of them. When we deal with an application whose threads are exiting 
continuously, some of these threads might no longer exist by the time we 
try to ptrace attach to them. The proposed fix includes the following 
modifications to solve this.
  1. Check the state of the threads in the thread_db callback routine, 
and skip if the state of the thread is TD_THR_UNKNOWN or TD_THR_ZOMBIE. 
SA does not try to ptrace attach to these threads and does not include 
these threads in the threads list.
  2. While ptrace attaching to the thread, if ptrace(PTRACE_ATTACH, ...) 
fails with either ESRCH or EPERM, check the state of the thread by 
checking if the /proc/<pid>/status file corresponding to that thread 
exists and if so, reading in the 'State:' line of that file. If the 
thread is dead (State: X) or is a zombie (State: Z), skip attaching to 
this thread and delete it from the SA list of threads. 
 From the /proc man page, "Current state of the process. One of "R 
(running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T (tracing 
stop)", "Z (zombie)", or "X (dead)"."
  3. If waitpid() on the thread fails because the thread has exited or 
terminated, again skip this thread (delete it from SA's list of threads) 
instead of bailing out.
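
As a rough illustration of item 2 (this is a sketch, not the code in the 
webrev), the state check could look something like the following in C. 
The helper names parse_state_line, read_thread_state, state_means_gone 
and should_skip_thread are hypothetical, and treating a missing or 
unreadable status file as "thread is gone" is an assumption made here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the state character from a "State:" line
 * of a /proc status file, or '\0' if 'line' is not a usable State line. */
static char parse_state_line(const char *line) {
  assert(line != NULL);
  if (strncmp(line, "State:", 6) != 0) {
    return '\0';
  }
  const char *p = line + 6;
  while (*p == ' ' || *p == '\t') {
    p++;  /* skip the whitespace between the tag and the state letter */
  }
  return (*p == '\n') ? '\0' : *p;
}

/* Hypothetical helper: read the state of the thread with the given id
 * from /proc/<id>/status. Returns '\0' if the file cannot be opened or
 * has no State line. */
static char read_thread_state(pid_t id) {
  char path[64];
  char line[256];
  char state = '\0';

  snprintf(path, sizeof(path), "/proc/%d/status", (int)id);
  FILE *fp = fopen(path, "r");
  if (fp == NULL) {
    return '\0';
  }
  while (fgets(line, sizeof(line), fp) != NULL) {
    state = parse_state_line(line);
    if (state != '\0') {
      break;
    }
  }
  fclose(fp);
  return state;
}

/* Dead (X) and zombie (Z) threads are skipped. Assumption in this sketch:
 * a missing status file ('\0') is treated the same way. */
static bool state_means_gone(char state) {
  return state == '\0' || state == 'X' || state == 'Z';
}

/* Decide whether a failed ptrace(PTRACE_ATTACH, ...) should be ignored:
 * only for ESRCH or EPERM, and only if the thread is already gone. */
static bool should_skip_thread(int attach_errno, pid_t id) {
  if (attach_errno != ESRCH && attach_errno != EPERM) {
    return false;
  }
  return state_means_gone(read_thread_state(id));
}
```

A caller would pass the errno saved right after the failed ptrace call, 
for example should_skip_thread(errno, lwp_id), and drop the thread from 
its list when the result is true. Any other failure would still be 
reported as an attach error.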

Testing:
1. Tested by attaching and detaching multiple times to a test program 
spawning numerous short-lived threads.
2. The SA tests (under test/hotspot/jtreg/serviceability/sa) passed with 
100 repeats on Mach5.
3. No new failures and no occurrences of JDK-8202884 seen with testing 
the SA tests (tiers 1 to 5) on Mach5.

More details in the bug comments section.

Thank you,
Jini.
