8220175: serviceability/dcmd/framework/VMVersionTest.java fails with a timeout
Chris Plummer
chris.plummer at oracle.com
Fri Jun 21 02:39:06 UTC 2019
Thanks for looking into this.
Chris
On 6/20/19 6:42 PM, Daniil Titov wrote:
> Thank you, Chris and Serguei, for reviewing this change.
>
> I did more testing with a test program on Linux that repeats get_namespace_pid() method and reads from the real file ( the copy of /proc/<pid>/status).
> I paused the program after the first several lines ( but not "NSpid:" line) where processed and deleted all lines in the status file before " NSpid: ..." line in order
> to this line became the first in the edited file. After that the program continues in the while loop but it seems as the original file content was
> already buffered and the program just continues over the original unedited lines and successfully found the match.
>
> Best regards,
> Daniil
>
> On 6/19/19, 9:13 PM, "Chris Plummer" <chris.plummer at oracle.com> wrote:
>
> Hi Daniil,
>
> I think your fix is good, although I wonder about the robustness of this
> function given that the status file can change while it is reading it. I
> wonder if it can return a false negative because it never saw the
> matching line. This could happen if a line gets deleted, causing the
> matching line to suddenly be earlier in the file, possibly before the
> current location being read. Anyway, that's not really related to the
> current issue or fix, but if you think it might be an issue maybe a bug
> should be filed for it.
>
> thanks,
>
> Chris
>
> On 6/19/19 9:02 PM, Daniil Titov wrote:
> > Please review the change that fixes an intermittent failure of serviceability/dcmd/framework/* tests on Linux platform.
> >
> > The problem here is that get_namespace_pid() method, that is called by mmap_attach_shared () that in turn is called by PerfMemory::attach(),
> > tries to read the namespace pid information from /proc/<pid>/status file. However, it doesn't check that the error indicator associated with
> > stream is set that results in the endless loop (lines 664-677) if the process terminates after /proc/<pid>/status was opened (line 659)
> > and checked for null (line 661).
> >
> > 658 snprintf(fname, sizeof(fname), "/proc/%d/status", vmid);
> > 659 FILE *fp = fopen(fname, "r");
> > 660
> > 661 if (fp) {
> > 662 int pid, nspid;
> > 663 int ret;
> > 664 while (!feof(fp)) {
> > 665 ret = fscanf(fp, "NSpid: %d %d", &pid, &nspid);
> > 666 if (ret == 1) {
> > 667 break;
> > 668 }
> > 669 if (ret == 2) {
> > 670 retpid = nspid;
> > 671 break;
> > 672 }
> > 673 for (;;) {
> > 674 int ch = fgetc(fp);
> > 675 if (ch == EOF || ch == (int)'\n') break;
> > 676 }
> > 677 }
> > 678 fclose(fp);
> > 679 }
> >
> > The fix adds the check for the error indicator to ensure that the "while" loop terminates properly if the file no longer exists.
> >
> > Issues [3] and [4] have the same cause and will be closed as duplicates of this issue.
> >
> > Testing: Mach5 hotspot_serviceability tests succeeded, tier1,tier2, and tier3 tests are in progress.
> >
> > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8220175/webrev.01/
> > [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8220175
> > [3] https://bugs.openjdk.java.net/browse/JDK-8223600
> > [4] https://bugs.openjdk.java.net/browse/JDK-8217351
> >
> > Thanks!
> > -Daniil
> >
> >
>
>
>
>
More information about the serviceability-dev
mailing list