RFR 7162400: Intermittent java.io.IOException: Bad file number during HotSpotVirtualMachine.executeCommand

Peter Allwin peter.allwin at oracle.com
Tue Jul 9 08:25:48 PDT 2013


Mikael,

That's a good point. Unfortunately, attach uses os::get_temp_directory, which
is hardcoded to /tmp. We could add a WhiteBox API to let us override this,
but then we're on the border of noreg-hard land again, IMO.

Any other opinions on this?


Thanks!

/peter
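
For illustration, the reproduction idea quoted below (pre-creating .java_pid* files for possible process IDs before launching the target VM) and Mikael's concern about leaving those files behind can be sketched as a small Python helper. This is a hypothetical sketch, not the JTREG test from the webrev: the helper name stale_attach_files, its directory parameter and the 0600 mode are assumptions. Because attach always uses /tmp via os::get_temp_directory, the directory argument would only matter with an override such as the WhiteBox API mentioned above.

```python
import contextlib
import os


@contextlib.contextmanager
def stale_attach_files(directory, pids, mode=0o600):
    """Hypothetical repro helper: create empty .java_pid<pid> files in
    `directory` for each pid, then remove exactly those files on exit."""
    created = []
    try:
        for pid in pids:
            path = os.path.join(directory, f".java_pid{pid}")
            try:
                # O_EXCL: never take over a file that already exists (it may
                # belong to a live VM). The mode is still filtered by umask.
                fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
            except FileExistsError:
                continue
            os.close(fd)
            created.append(path)
        yield created
    finally:
        # Runs on normal exit and on exceptions, but NOT if the process is
        # killed outright, e.g. by a harness timeout.
        for path in created:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(path)
```

The try/finally cleanup covers exceptions and normal exit, but not a test that is killed on timeout, which is exactly the failure Mikael describes; a private directory avoids that risk entirely. On Linux, the upper bound for process IDs can be read from /proc/sys/kernel/pid_max.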

> -----Original Message-----
> From: Mikael Gerdin [mailto:mikael.gerdin at oracle.com]
> Sent: Tuesday, July 9, 2013 2:49 PM
> To: Peter Allwin
> Cc: serguei.spitsyn at oracle.com; daniel.daugherty at oracle.com;
> serviceability-dev at openjdk.java.net;
> hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file
> number during HotSpotVirtualMachine.executeCommand
> 
> Peter,
> 
> On 2013-07-09 14:25, Peter Allwin wrote:
> > Hello!
> >
> > It is reproducible by letting the test create .java_pid* files for all
> > possible process IDs on the system, setting the correct access flags,
> > launching the target VM, and attempting to connect. There are some
> > caveats, but it should be doable.
> >
> > I'll convert the repro script to JTREG and add it to the webrev.
> 
> It's probably not a good idea to have a test that taints the system with
> stale .java_pid* files.
> If the test execution times out and the script isn't allowed to clean up, I
> imagine that subsequent executions could fail.
> Is there a way to tell the Attach API to use a specific directory so you
> won't need to taint /tmp?
> 
> /Mikael
> 
> >
> > Thanks for the reviews!
> >
> > /peter
> >
> > From: serguei.spitsyn at oracle.com [mailto:serguei.spitsyn at oracle.com]
> > Sent: Tuesday, July 9, 2013 1:26 AM
> > To: daniel.daugherty at oracle.com
> > Cc: Peter Allwin; serviceability-dev at openjdk.java.net;
> > hotspot-runtime-dev at openjdk.java.net
> > Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file
> > number during HotSpotVirtualMachine.executeCommand
> >
> > Ok, thanks!
> >
> > Peter, did you manage to reproduce this issue with your script?
> > If so, please include it in the bug report and remove the
> > "noreg-sqe" label.
> >
> > It is OK if you did not reproduce it, though.
> >
> > Thanks,
> > Serguei
> >
> >
> > On 7/8/13 4:20 PM, Daniel D. Daugherty wrote:
> >
> >     I definitely don't insist... :-)
> >
> >     BTW, I noticed this in Peter's e-mail:
> >
> >     > Testing:
> >     > JPRT, reproducing script on Solaris, Linux.
> >
> >     so maybe Peter already has this covered with "reproducing script"...
> >
> >     Dan
> >
> >     On 7/8/13 5:07 PM, serguei.spitsyn at oracle.com
> >     <mailto:serguei.spitsyn at oracle.com> wrote:
> >
> >         Dan,
> >
> >         Dan, thank you for the recommendation.
> >         But I'm still not sure it is the right thing to do.
> >         Even though there are multiple test cases associated with this
> >         bug, they cannot be used to verify the fix because an additional
> >         condition must also be present: a stale door file, which is not
> >         easy to reproduce.
> >
> >         However, if you insist, I can change the label to "noreg-sqe"
> >         with a corresponding comment.
> >
> >         Thanks,
> >         Serguei
> >
> >
> >         On 7/8/13 3:46 PM, Daniel D. Daugherty wrote:
> >
> >             Serguei,
> >
> >             There are a number of existing tests associated with this
> >             bug. I don't
> >             think that 'noreg-hard' is the right label. I think
> >             'noreg-sqe' is
> >             the right one:
> >
> >             noreg-sqe
> >                  Change can be verified by running an existing SQE test
> >                  suite; the bug should identify the suite and the specific
> >                  test case(s).
> >
> >             Dan
> >
> >             On 7/8/13 12:59 PM, serguei.spitsyn at oracle.com
> >             <mailto:serguei.spitsyn at oracle.com> wrote:
> >
> >                 Peter,
> >
> >                 I've added the label "noreg-hard" to the report, with a
> >                 comment.
> >                 It is not easy to reproduce the issue and demonstrate
> >                 the fix in a regression test.
> >
> >                 Thanks,
> >                 Serguei
> >
> >
> >                 On 7/8/13 11:36 AM, serguei.spitsyn at oracle.com
> >                 <mailto:serguei.spitsyn at oracle.com> wrote:
> >
> >                     Hi Peter,
> >
> >                     The fix looks good.
> >
> >                     Thanks,
> >                     Serguei
> >
> >                     On 7/8/13 6:54 AM, Peter Allwin wrote:
> >
> >                         Hello!
> >
> >                         Looking for reviews of this change:
> >
> >
> >                         http://cr.openjdk.java.net/~allwin/7162400/webrev.01/
> >
> >                         For CR:
> >
> >                         http://bugs.sun.com/view_bug.do?bug_id=7162400
> >
> >                         https://jbs.oracle.com/bugs/browse/JDK-7162400
> >
> >                         Summary:
> >
> >                         This change addresses an issue in the Attach API
> >                         on Solaris, Linux and BSD where an attaching
> >                         application can receive IOExceptions such as
> >                         "Bad file number" (Solaris), "Connection
> >                         refused" (Linux/BSD), or "well-known file is not
> >                         secure".
> >
> >                         The attach process uses a file in the temporary
> >                         directory as a door (Solaris) or domain socket
> >                         (Linux, BSD) to communicate with the VM. Under
> >                         certain circumstances, stale files can be left in
> >                         the file system, which can cause the attaching
> >                         application to believe that the VM is ready to
> >                         receive a connection when it's not. With this
> >                         change, the stale file is removed during VM
> >                         startup.
> >
> >                         Note that an issue remains if we don't have
> >                         permission to remove the stale file: in that
> >                         case, the attaching process will fail to connect.
> >
> >                         Testing:
> >
> >                         JPRT, reproducing script on Solaris, Linux.
> >
> >                         Credits:
> >
> >                         Thanks to Staffan Larsen who worked on this
> >                         issue with me.
> >
> >                         Regards,
> >
> >
> >                         Peter
> >
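
The failure mode and fix described in the review summary can be modeled with a short Python sketch using a Unix domain socket (the Linux/BSD case; Solaris doors are not modeled). It is a hypothetical illustration, not the HotSpot code in the webrev: the function names, the demo process ID, and the use of a private temporary directory in place of /tmp are assumptions. It shows that a leftover socket file makes the attacher connect too early and get refused, that a new listener cannot bind over the leftover file, and that unlinking it at startup lets the connection succeed.

```python
import os
import socket
import tempfile


def attach_file_path(tmpdir: str, pid: int) -> str:
    # ".java_pid<pid>" mirrors the file names in the thread; tmpdir stands
    # in for the hardcoded /tmp (an assumption made for the sketch).
    return os.path.join(tmpdir, f".java_pid{pid}")


def remove_stale_attach_file(path: str) -> bool:
    """Hypothetical VM-startup step: delete a leftover attach file before
    the listener is created. Returns True if a file was removed.

    Only FileNotFoundError is swallowed; PermissionError and other OSErrors
    propagate, since without permission to remove the file attach still fails."""
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        return False


def start_listener(path: str) -> socket.socket:
    """'VM side': bind and listen on a Unix domain socket at path."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        srv.bind(path)  # OSError (EADDRINUSE) if any file already exists at path
        srv.listen(1)
    except OSError:
        srv.close()
        raise
    return srv


def try_connect(path: str) -> bool:
    """'Attacher side': connect, as an attacher does once it sees the file."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        try:
            cli.connect(path)
            return True
        except ConnectionRefusedError:
            return False


def demo(pid: int = 4242) -> dict:
    """Walk through the failure and the fix in a private temporary directory."""
    tmpdir = tempfile.mkdtemp()
    path = attach_file_path(tmpdir, pid)

    # Simulate a VM that died without cleaning up: bind, then close without unlink.
    dead = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    dead.bind(path)
    dead.close()

    result = {
        "stale_exists": os.path.exists(path),
        "connect_to_stale": try_connect(path),
    }
    try:
        start_listener(path).close()
        result["bind_over_stale"] = True
    except OSError:
        result["bind_over_stale"] = False

    # The fix: remove the stale file at startup, then create the listener.
    result["removed"] = remove_stale_attach_file(path)
    srv = start_listener(path)
    try:
        result["connect_after_fix"] = try_connect(path)
    finally:
        srv.close()
        os.unlink(path)
        os.rmdir(tmpdir)
    return result
```

On Linux, demo() should report the stale file as present, the connection to it refused ("Connection refused", as in the summary), the bind over it failing, and a successful connection once the file has been removed at startup. The sketch deliberately lets PermissionError escape from the removal step, matching the remaining issue the summary notes.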



More information about the serviceability-dev mailing list