RFR: JDK-8202884: SA: Attach/detach might fail on Linux if debugee application create/destroy threads during attaching

Jini George jini.george at oracle.com
Wed Dec 12 18:27:39 UTC 2018


On 12/12/2018 10:47 PM, JC Beyler wrote:
> Hi Jini,
> 
> Should your return not be return !found_state instead of false here:
> 
> 257   if (!found_state) {
> 258     // Assuming the thread exists.
> 259     print_error("Could not find the 'State:' string in the
> /proc/%d/status file\n", pid);
> 260   }
> 261   fclose (fp);
> 262   return false;
> 
> In your webrev.00 it was the case but now, you always return the process 
> does exist even if you have not found it.

I referred to the gdb sources to check what is done under this scenario, 
and in gdb, it is assumed that if the line beginning with 'State:' is 
not found, the thread is alive. But to be frank, I don't know under what 
circumstances will we ever encounter such a scenario. Let me know if you 
don't agree with this.

> cf:
> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
> vs
> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
> 
> Tiny nit: no need to check *space if you are using isspace(*space) right 
> after :)

Will change this. Thanks!

> 
> Apart from the return question, the webrev looks good to me :-)
> Jc

Thank you again!
Jini

> 
> 
> On Wed, Dec 12, 2018 at 4:15 AM Jini George <jini.george at oracle.com 
> <mailto:jini.george at oracle.com>> wrote:
> 
>     Thank you very much for looking into this, JC!
> 
>     I have a revised webrev addressing your comments at:
> 
>     http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/index.html
> 
>     Requesting one more review for this. My comments inline:
> 
>     On 12/12/2018 2:53 AM, JC Beyler wrote:
>      > Hi Jini,
>      >
>      > I saw a few nits:
>      >
>     http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.h.udiff.html
>      >    -> The comments are in the third person normally it seems so
>     it would
>      > become (I also removed the s from threads):
>      >
>      > +// deletes a thread from the thread list
>     Done.
>      >
>      >
>     http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c.udiff.html
>      >    -> You added two empty lines it seems that could be removed
>     Done.
>      >
>      >
>      >
>     http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
>      >    -> Is there a real reason to have both enums? We could have a
>     single
>      > enum it seems and not lose too much
> 
>     You are right. I have done away with the WAITPID* enum.
> 
>      >    -> you have a switch "
>      >         switch (errno) {"
>      >          -> Where really you could simplify the reading by moving
>     the
>      > EINTR case outside with its continue
>      >          -> The switch could then remain as it was (though you move
>      > print_debug to print_error)
>      >          -> and just return in each cases
>     I have changed this to:
> 
>     206     } else {
>     207       switch (errno) {
>     208         case EINTR:
>     209           continue;
>     210           break;
>     211         case ECHILD:
>     212           print_debug("waitpid() failed. Child process pid (%d)
>     does
>     not exist \n", pid);
>     213           return ATTACH_THREAD_DEAD;
>     214         case EINVAL:
>     215           print_error("waitpid() failed. Invalid options
>     argument.\n");
>     216           return ATTACH_FAIL;
>     217         default:
>     218           print_error("waitpid() failed. Unexpected error %d\n",
>     errno);
>     219           return ATTACH_FAIL;
>     220       }
>     221     } // else
> 
> 
>      >
>      >     -> if (strncmp (buf, "State:", 6) == 0) {
>      >        -> You use sizeof("State:") right below; perhaps you could
>     just
>      > use "  const char const state[] = "State:";" and use
>     sizeof(state) and
>      > for the string, it seems less error prone
>      >
>      >    -> A minor "bug" is here:
>      > +      state = buf + sizeof ("State:");
>      >          -> You did a strncmp above but that only assures the
>     start of
>      > the string is "State:", technically the character after the ':'
>     is the
>      > but it could only be that; sizeof("State:") is 7 and not 6. So
>     you miss
>      > one character when you are skipping spaces
>      >          -> It was probably ok because you always had at least one
>      > space, ie "State: "
> 
>     Thanks! I have made some changes here to use a const char string and a
>     variable to store the calculated length using strlen(). And I am using
>     isspace() now to skip spaces since tabs could also be used as a
>     delimiter.
> 
>      >    -> Extra space here before the '(': "sizeof (buf)"
>     Done.
>      >
>      > Finally your return sequence for that method could be simplified to:
>      >
>      > +  if (!found_state) {
>      > +    print_error(" Could not find the State: string in the status
>     file
>      > for pid %d\n", pid);
>      > +  }
>      > +  fclose (fp);
>      > +  return !found_state;
> 
>     I have modified this to:
> 
>     257   if (!found_state) {
>     258     // Assuming the thread exists.
>     259     print_error("Could not find the 'State:' string in the
>     /proc/%d/status file\n", pid);
>     260   }
>     261   fclose (fp);
>     262   return false;
> 
>     Thank you,
>     Jini.
> 
>      >
>      > Thanks!
>      > Jc
>      >
>      > On Tue, Dec 11, 2018 at 9:30 AM Jini George
>     <jini.george at oracle.com <mailto:jini.george at oracle.com>
>      > <mailto:jini.george at oracle.com <mailto:jini.george at oracle.com>>>
>     wrote:
>      >
>      >     Hello !
>      >
>      >     Requesting reviews for:
>      >
>      > https://bugs.openjdk.java.net/browse/JDK-8202884
>      >     Webrev:
>     http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html
>      >
>      >     Details:
>      >     For attaching to the threads in a process, we first go ahead
>     and do a
>      >     ptrace attach to the main thread. Later, we use the libthread_db
>      >     library
>      >     (or, in the case of being within a container, iterate through the
>      >     /proc/<pid>/task files) to discover the threads of the
>     process, and add
>      >     them to the threads list (within SA) for this process. Once,
>     we have
>      >     discovered all the threads and added these to the list of
>     threads, we
>      >     then invoke ptrace attach individually on all these threads to
>      >     attach to
>      >     these. When we deal with an application where the threads are
>     exiting
>      >     continuously, some of these threads might not exist by the
>     time we try
>      >     to ptrace attach to these threads. The proposed fix includes the
>      >     following modifications to solve this.
>      >        1. Check the state of the threads in the thread_db
>     callback routine,
>      >     and skip if the state of the thread is TD_THR_UNKNOWN or
>     TD_THR_ZOMBIE.
>      >     SA does not try to ptrace attach to these threads and does
>     not include
>      >     these threads in the threads list.
>      >        2. While ptrace attaching to the thread, if
>     ptrace(PTRACE_ATTACH,
>      >     ...)
>      >     fails with either ESCRH or EPERM, check the state of the
>     thread by
>      >     checking if the /proc/<pid>/status file corresponding to that
>     thread
>      >     exists and if so, reading in the 'State:' line of that file. Skip
>      >     attaching to this thread and delete this thread from the SA
>     list of
>      >     threads, if the thread is dead (State: X) or is a zombie
>     (State: Z).
>      >       From the /proc man page, "Current state of the process. One
>     of "R
>      >     (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T
>      >     (tracing
>      >     stop)", "Z (zombie)", or "X (dead)"."
>      >        3. If waitpid() on the thread is a failure, again skip
>     this thread
>      >     (delete this from SA's list of threads) instead of bailing
>     out if the
>      >     thread has exited or terminated.
>      >
>      >     Testing:
>      >     1. Tested by attaching and detaching multiple times to a test
>     program
>      >     spawning numerous short lived threads.
>      >     2. The SA tests (under test/hotspot/jtreg/serviceability/sa)
>     passed
>      >     with
>      >     100 repeats on Mach5.
>      >     3. No new failures and no occurrences of JDK-8202884 seen
>     with testing
>      >     the SA tests (tiers 1 to 5) on Mach5.
>      >
>      >     More details in the bug comments section.
>      >
>      >     Thank you,
>      >     Jini.
>      >
>      >
>      >
>      > --
>      >
>      > Thanks,
>      > Jc
> 
> 
> 
> -- 
> 
> Thanks,
> Jc


More information about the serviceability-dev mailing list