8220175: serviceability/dcmd/framework/VMVersionTest.java fails with a timeout

Chris Plummer chris.plummer at oracle.com
Fri Jun 21 02:39:06 UTC 2019


Thanks for looking into this.

Chris

On 6/20/19 6:42 PM, Daniil Titov wrote:
> Thank you, Chris and Serguei, for reviewing this change.
>
> I did more testing with a test program on Linux that repeats get_namespace_pid() method and reads from the real file ( the copy of /proc/<pid>/status).
> I paused the program after the first several lines ( but not "NSpid:" line)  where processed and deleted all lines in the status file before " NSpid: ..." line in order
> to this line became the first in the edited file. After that the program continues in the while loop but it seems as the original file content was
> already buffered and the program just continues over the original unedited lines and successfully found the match.
>
> Best regards,
> Daniil
>
> On 6/19/19, 9:13 PM, "Chris Plummer" <chris.plummer at oracle.com> wrote:
>
>      Hi Daniil,
>      
>      I think your fix is good, although I wonder about the robustness of this
>      function given that the status file can change while it is reading it. I
>      wonder if it can return a false negative because it never saw the
>      matching line. This could happen if a line gets deleted, causing the
>      matching line to suddenly be earlier in the file, possibly before the
>      current location being read. Anyway, that's not really related to the
>      current issue or fix, but if you think it might be an issue maybe a bug
>      should be filed for it.
>      
>      thanks,
>      
>      Chris
>      
>      On 6/19/19 9:02 PM, Daniil Titov wrote:
>      > Please review the change that fixes an intermittent failure of serviceability/dcmd/framework/* tests on Linux platform.
>      >
>      > The problem here is that get_namespace_pid() method, that is called by mmap_attach_shared () that in turn is called by PerfMemory::attach(),
>      > tries to read the namespace pid information from /proc/<pid>/status file. However, it doesn't check that the error indicator associated with
>      > stream is set that results in the endless  loop (lines 664-677) if the process terminates after /proc/<pid>/status was opened (line 659)
>      > and checked for null (line 661).
>      >
>      >    658	  snprintf(fname, sizeof(fname), "/proc/%d/status", vmid);
>      >     659	  FILE *fp = fopen(fname, "r");
>      >     660	
>      >     661	  if (fp) {
>      >     662	    int pid, nspid;
>      >     663	    int ret;
>      >     664	    while (!feof(fp)) {
>      >     665	      ret = fscanf(fp, "NSpid: %d %d", &pid, &nspid);
>      >     666	      if (ret == 1) {
>      >     667	        break;
>      >     668	      }
>      >     669	      if (ret == 2) {
>      >     670	        retpid = nspid;
>      >     671	        break;
>      >     672	      }
>      >     673	      for (;;) {
>      >     674	        int ch = fgetc(fp);
>      >     675	        if (ch == EOF || ch == (int)'\n') break;
>      >     676	      }
>      >     677	    }
>      >     678	    fclose(fp);
>      >     679	  }
>      >
>      > The fix adds the check for the error indicator to ensure that the "while" loop terminates properly if the file no longer exists.
>      >
>      > Issues [3] and [4] have the same cause and will be closed as duplicates of this issue.
>      >
>      > Testing: Mach5 hotspot_serviceability tests succeeded, tier1,tier2, and tier3 tests are in progress.
>      >
>      > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8220175/webrev.01/
>      > [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8220175
>      > [3] https://bugs.openjdk.java.net/browse/JDK-8223600
>      > [4] https://bugs.openjdk.java.net/browse/JDK-8217351
>      >
>      > Thanks!
>      > -Daniil
>      >
>      >
>      
>      
>
>




More information about the serviceability-dev mailing list