RFR: JDK-8202884: SA: Attach/detach might fail on Linux if debugee application create/destroy threads during attaching
Jini George
jini.george at oracle.com
Tue Dec 11 17:29:31 UTC 2018
Hello!
Requesting reviews for:
https://bugs.openjdk.java.net/browse/JDK-8202884
Webrev: http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html
Details:
To attach to the threads of a process, SA first does a ptrace attach to
the main thread. It then uses the libthread_db library (or, in the case
of being within a container, iterates through the /proc/<pid>/task
entries) to discover the threads of the process, and adds them to the
threads list (within SA) for this process. Once all the threads have
been discovered and added to the list, SA invokes ptrace attach on each
of them individually. When the application's threads are exiting
continuously, some of these threads might no longer exist by the time
SA tries to ptrace attach to them. The proposed fix includes the
following modifications to address this.
1. Check the state of each thread in the thread_db callback routine,
and skip the thread if its state is TD_THR_UNKNOWN or TD_THR_ZOMBIE.
SA does not try to ptrace attach to such threads and does not include
them in the threads list.
2. While ptrace attaching to a thread, if ptrace(PTRACE_ATTACH, ...)
fails with either ESRCH or EPERM, check the state of the thread by
checking whether the /proc/<pid>/status file corresponding to that
thread exists and, if so, reading the 'State:' line of that file. If the
thread is dead (State: X) or is a zombie (State: Z), skip attaching to
it and delete it from the SA list of threads.
From the /proc man page, "Current state of the process. One of "R
(running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T (tracing
stop)", "Z (zombie)", or "X (dead)"."
3. If waitpid() on the thread fails because the thread has exited or
terminated, again skip this thread (deleting it from SA's list of
threads) instead of bailing out.
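As a rough illustration of the check described in item 2, here is a
minimal C sketch. It is not the code in the webrev: the helper names
(parse_state_line, thread_is_gone, should_skip_thread) are hypothetical,
and treating a missing or unreadable status file as "thread is gone" is
an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the state letter from a "State:" line of a
 * /proc status file, or '\0' if the line is not a usable "State:" line. */
static char parse_state_line(const char *line) {
    assert(line != NULL);
    if (strncmp(line, "State:", 6) != 0) {
        return '\0';
    }
    const char *p = line + 6;
    while (*p == ' ' || *p == '\t') {
        p++;                      /* skip the separator after "State:" */
    }
    return (*p == '\n') ? '\0' : *p;
}

/* Hypothetical helper: returns 1 if the thread looks dead ('X'), is a
 * zombie ('Z'), or its status file cannot be read; 0 otherwise.
 * Assumption: an unreadable status file means the thread no longer exists. */
static int thread_is_gone(pid_t tid) {
    char path[64];
    char line[256];
    char state = '\0';

    snprintf(path, sizeof(path), "/proc/%d/status", (int)tid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return 1;
    }
    while (fgets(line, sizeof(line), fp) != NULL) {
        state = parse_state_line(line);
        if (state != '\0') {
            break;                /* found the "State:" line */
        }
    }
    fclose(fp);
    return state == '\0' || state == 'X' || state == 'Z';
}

/* Hypothetical helper: given the errno left by a failed
 * ptrace(PTRACE_ATTACH, ...), decide whether to skip the thread
 * rather than fail the whole attach. */
static int should_skip_thread(int attach_errno, pid_t tid) {
    if (attach_errno != ESRCH && attach_errno != EPERM) {
        return 0;                 /* some other failure: do not skip */
    }
    return thread_is_gone(tid);
}
```

A caller would invoke should_skip_thread(errno, tid) right after the
failed ptrace call and, on a nonzero result, drop the thread from its
list instead of aborting the attach.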
Testing:
1. Tested by attaching and detaching multiple times to a test program
spawning numerous short-lived threads.
2. The SA tests (under test/hotspot/jtreg/serviceability/sa) passed with
100 repeats on Mach5.
3. No new failures and no occurrences of JDK-8202884 seen when running
the SA tests (tiers 1 to 5) on Mach5.
More details are in the comments section of the bug.
Thank you,
Jini.