RFR 7162400: Intermittent java.io.IOException: Bad file number during HotSpotVirtualMachine.executeCommand

Mikael Gerdin mikael.gerdin at oracle.com
Tue Jul 9 05:48:34 PDT 2013


Peter,

On 2013-07-09 14:25, Peter Allwin wrote:
> Hello!
>
> It is reproducible by letting the test create .java_pid* files for all
> possible process IDs on the system, setting the correct access flags,
> launching the target VM and attempting to connect. There are some
> caveats, but it should be doable.
>
> I’ll convert the repro script to JTREG and add it to the webrev.

It's probably not a good idea to have a test that taints the system
with stale .java_pid* files.
If the test execution times out and the script isn't allowed to clean up,
I imagine that subsequent executions could fail.
Is there a way to tell the Attach API to use a specific directory so you
won't need to taint /tmp?

/Mikael

>
> Thanks for the reviews!
>
> /peter
>
> *From:*serguei.spitsyn at oracle.com [mailto:serguei.spitsyn at oracle.com]
> *Sent:* Tuesday, July 9, 2013 1:26 AM
> *To:* daniel.daugherty at oracle.com
> *Cc:* Peter Allwin; serviceability-dev at openjdk.java.net;
> hotspot-runtime-dev at openjdk.java.net
> *Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad file
> number during HotSpotVirtualMachine.executeCommand
>
> Ok, thanks!
>
> Peter, did you manage to reproduce this issue with your script?
> If so, then please include it in the bug report and remove the
> "noreg-sqe" label.
>
> It is Ok if you did not reproduce it, though.
>
> Thanks,
> Serguei
>
>
> On 7/8/13 4:20 PM, Daniel D. Daugherty wrote:
>
>     I definitely don't insist... :-)
>
>     BTW, I noticed this in Peter's e-mail:
>
>     > Testing:
>     > JPRT, reproducing script on Solaris, Linux.
>
>     so maybe Peter already has this covered with "reproducing script"...
>
>     Dan
>
>     On 7/8/13 5:07 PM, serguei.spitsyn at oracle.com wrote:
>
>         Dan,
>
>         Thank you for the recommendation.
>         But I'm still not sure it is the right thing to do.
>         Even though there are multiple test cases associated with this
>         bug, they cannot be used to verify the fix because an additional
>         condition must be present as well.
>         This condition is the presence of a stale door file, which is not
>         that easy to reproduce.
>
>         However, if you insist, then I can change the label to
>         "noreg-sqe" with the corresponding comment.
>
>         Thanks,
>         Serguei
>
>
>         On 7/8/13 3:46 PM, Daniel D. Daugherty wrote:
>
>             Serguei,
>
>             There are a number of existing tests associated with this
>             bug. I don't think that 'noreg-hard' is the right label.
>             I think 'noreg-sqe' is the right one:
>
>             noreg-sqe
>                  Change can be verified by running an existing SQE test
>                  suite; the bug should identify the suite and the
>                  specific test case(s).
>
>             Dan
>
>             On 7/8/13 12:59 PM, serguei.spitsyn at oracle.com wrote:
>
>                 Peter,
>
>                 I've added the label "noreg-hard" with a comment to
>                 the report.
>                 It is not easy to reproduce the issue and demonstrate
>                 the fix in a regression test.
>
>                 Thanks,
>                 Serguei
>
>
>                 On 7/8/13 11:36 AM, serguei.spitsyn at oracle.com wrote:
>
>                     Hi Peter,
>
>                     The fix looks good.
>
>                     Thanks,
>                     Serguei
>
>                     On 7/8/13 6:54 AM, Peter Allwin wrote:
>
>                         Hello!
>
>                         Looking for reviews of this change:
>
>                         http://cr.openjdk.java.net/~allwin/7162400/webrev.01/
>
>                         For CR:
>
>                         http://bugs.sun.com/view_bug.do?bug_id=7162400
>
>                         https://jbs.oracle.com/bugs/browse/JDK-7162400
>
>                         Summary:
>
>                         This change addresses an issue in the Attach API
>                         on Solaris, Linux and BSD where an attaching
>                         application can receive IOExceptions such as
>                         “Bad file number” (Solaris), “Connection
>                         refused” (Linux/BSD), or “well-known file is not
>                         secure”.
>
>                         The attach process uses a file in the temporary
>                         directory as a door (Solaris) or domain socket
>                         (Linux, BSD) to communicate with the VM. In
>                         certain circumstances stale files can be left in
>                         the file system which can cause the attaching
>                         application to believe that the VM is ready to
>                         receive a connection when it’s not. With this
>                         change the stale file will be removed during VM
>                         startup.
>
>                         Note that there is still an issue if we don’t
>                         have permission to remove the stale file: in that
>                         case the attaching process will fail to connect.
>
>                         Testing:
>
>                         JPRT, reproducing script on Solaris, Linux.
>
>                         Credits:
>
>                         Thanks to Staffan Larsen who worked on this
>                         issue with me.
>
>                         Regards,
>
>
>                         Peter
>
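
The startup cleanup described in Peter's summary above can be sketched in
Java as follows. This is only an illustration under stated assumptions: the
actual change is the one in the webrev, inside the VM, and is not written in
Java. The class name StaleAttachFileSketch, the Result enum, the directory
parameter and the pid 4242 are all hypothetical. The sketch takes the
directory as an argument so that it can run against a scratch directory
instead of /tmp; whether the Attach API itself can be pointed at another
directory is not established in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleAttachFileSketch {

    /** Outcome of the cleanup attempt (hypothetical names). */
    public enum Result { NOTHING_TO_REMOVE, REMOVED, COULD_NOT_REMOVE }

    /** Builds the well-known file path for a pid, e.g. <dir>/.java_pid4242. */
    public static Path wellKnownFile(Path dir, long pid) {
        return dir.resolve(".java_pid" + pid);
    }

    /**
     * Removes a leftover well-known file for this pid, if one exists.
     * Intended to model "remove the stale file during VM startup", before
     * the real door or socket is created.
     */
    public static Result removeStaleFile(Path dir, long pid) {
        Path file = wellKnownFile(dir, pid);
        try {
            return Files.deleteIfExists(file)
                    ? Result.REMOVED
                    : Result.NOTHING_TO_REMOVE;
        } catch (IOException e) {
            // For example, no permission to delete: the stale file stays
            // in place and an attaching process would still fail.
            return Result.COULD_NOT_REMOVE;
        }
    }

    public static void main(String[] args) throws IOException {
        // Use a scratch directory so nothing is left behind in /tmp.
        Path dir = Files.createTempDirectory("attach-sketch");
        long pid = 4242L; // hypothetical pid

        Files.createFile(wellKnownFile(dir, pid)); // simulate a stale file
        System.out.println(removeStaleFile(dir, pid)); // prints REMOVED
        System.out.println(removeStaleFile(dir, pid)); // prints NOTHING_TO_REMOVE

        Files.delete(dir);
    }
}
```

The COULD_NOT_REMOVE result corresponds to the remaining issue noted in the
summary, where the stale file cannot be removed for lack of permission.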

