RFR: JDK-8202884: SA: Attach/detach might fail on Linux if debugee application create/destroy threads during attaching
JC Beyler
jcbeyler at google.com
Wed Dec 12 17:17:51 UTC 2018
Hi Jini,
Should your return not be return !found_state instead of false here:
257 if (!found_state) {
258 // Assuming the thread exists.
259 print_error("Could not find the 'State:' string in the
/proc/%d/status file\n", pid);
260 }
261 fclose (fp);
262 return false;
In your webrev.00 it was the case but now, you always return the process
does exist even if you have not found it.
cf:
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
vs
http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
Tiny nit: no need to check *space if you are using isspace(*space) right
after :)
Apart from the return question, the webrev looks good to me :-)
Jc
On Wed, Dec 12, 2018 at 4:15 AM Jini George <jini.george at oracle.com> wrote:
> Thank you very much for looking into this, JC!
>
> I have a revised webrev addressing your comments at:
>
> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/index.html
>
> Requesting one more review for this. My comments inline:
>
> On 12/12/2018 2:53 AM, JC Beyler wrote:
> > Hi Jini,
> >
> > I saw a few nits:
> >
> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.h.udiff.html
> > -> The comments are in the third person normally it seems so it would
> > become (I also removed the s from threads):
> >
> > +// deletes a thread from the thread list
> Done.
> >
> >
> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c.udiff.html
> > -> You added two empty lines it seems that could be removed
> Done.
> >
> >
> >
> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
> > -> Is there a real reason to have both enums? We could have a single
> > enum it seems and not lose too much
>
> You are right. I have done away with the WAITPID* enum.
>
> > -> you have a switch "
> > switch (errno) {"
> > -> Where really you could simplify the reading by moving the
> > EINTR case outside with its continue
> > -> The switch could then remain as it was (though you move
> > print_debug to print_error)
> > -> and just return in each cases
> I have changed this to:
>
> 206 } else {
> 207 switch (errno) {
> 208 case EINTR:
> 209 continue;
> 210 break;
> 211 case ECHILD:
> 212 print_debug("waitpid() failed. Child process pid (%d) does
> not exist \n", pid);
> 213 return ATTACH_THREAD_DEAD;
> 214 case EINVAL:
> 215 print_error("waitpid() failed. Invalid options argument.\n");
> 216 return ATTACH_FAIL;
> 217 default:
> 218 print_error("waitpid() failed. Unexpected error %d\n",
> errno);
> 219 return ATTACH_FAIL;
> 220 }
> 221 } // else
>
>
> >
> > -> if (strncmp (buf, "State:", 6) == 0) {
> > -> You use sizeof("State:") right below; perhaps you could just
> > use " const char const state[] = "State:";" and use sizeof(state) and
> > for the string, it seems less error prone
> >
> > -> A minor "bug" is here:
> > + state = buf + sizeof ("State:");
> > -> You did a strncmp above but that only assures the start of
> > the string is "State:", technically the character after the ':' is the
> > but it could only be that; sizeof("State:") is 7 and not 6. So you miss
> > one character when you are skipping spaces
> > -> It was probably ok because you always had at least one
> > space, ie "State: "
>
> Thanks! I have made some changes here to use a const char string and a
> variable to store the calculated length using strlen(). And I am using
> isspace() now to skip spaces since tabs could also be used as a delimiter.
>
> > -> Extra space here before the '(': "sizeof (buf)"
> Done.
> >
> > Finally your return sequence for that method could be simplified to:
> >
> > + if (!found_state) {
> > + print_error(" Could not find the State: string in the status file
> > for pid %d\n", pid);
> > + }
> > + fclose (fp);
> > + return !found_state;
>
> I have modified this to:
>
> 257 if (!found_state) {
> 258 // Assuming the thread exists.
> 259 print_error("Could not find the 'State:' string in the
> /proc/%d/status file\n", pid);
> 260 }
> 261 fclose (fp);
> 262 return false;
>
> Thank you,
> Jini.
>
> >
> > Thanks!
> > Jc
> >
> > On Tue, Dec 11, 2018 at 9:30 AM Jini George <jini.george at oracle.com
> > <mailto:jini.george at oracle.com>> wrote:
> >
> > Hello !
> >
> > Requesting reviews for:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8202884
> > Webrev:
> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html
> >
> > Details:
> > For attaching to the threads in a process, we first go ahead and do a
> > ptrace attach to the main thread. Later, we use the libthread_db
> > library
> > (or, in the case of being within a container, iterate through the
> > /proc/<pid>/task files) to discover the threads of the process, and
> add
> > them to the threads list (within SA) for this process. Once, we have
> > discovered all the threads and added these to the list of threads, we
> > then invoke ptrace attach individually on all these threads to
> > attach to
> > these. When we deal with an application where the threads are exiting
> > continuously, some of these threads might not exist by the time we
> try
> > to ptrace attach to these threads. The proposed fix includes the
> > following modifications to solve this.
> > 1. Check the state of the threads in the thread_db callback
> routine,
> > and skip if the state of the thread is TD_THR_UNKNOWN or
> TD_THR_ZOMBIE.
> > SA does not try to ptrace attach to these threads and does not
> include
> > these threads in the threads list.
> > 2. While ptrace attaching to the thread, if ptrace(PTRACE_ATTACH,
> > ...)
> > fails with either ESCRH or EPERM, check the state of the thread by
> > checking if the /proc/<pid>/status file corresponding to that thread
> > exists and if so, reading in the 'State:' line of that file. Skip
> > attaching to this thread and delete this thread from the SA list of
> > threads, if the thread is dead (State: X) or is a zombie (State: Z).
> > From the /proc man page, "Current state of the process. One of "R
> > (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T
> > (tracing
> > stop)", "Z (zombie)", or "X (dead)"."
> > 3. If waitpid() on the thread is a failure, again skip this thread
> > (delete this from SA's list of threads) instead of bailing out if the
> > thread has exited or terminated.
> >
> > Testing:
> > 1. Tested by attaching and detaching multiple times to a test program
> > spawning numerous short lived threads.
> > 2. The SA tests (under test/hotspot/jtreg/serviceability/sa) passed
> > with
> > 100 repeats on Mach5.
> > 3. No new failures and no occurrences of JDK-8202884 seen with
> testing
> > the SA tests (tiers 1 to 5) on Mach5.
> >
> > More details in the bug comments section.
> >
> > Thank you,
> > Jini.
> >
> >
> >
> > --
> >
> > Thanks,
> > Jc
>
--
Thanks,
Jc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20181212/39624a61/attachment.html>
More information about the serviceability-dev
mailing list