RFR 7162400: Intermittent java.io.IOException: Bad file number during HotSpotVirtualMachine.executeCommand
Peter Allwin
peter.allwin at oracle.com
Tue Jul 9 08:25:48 PDT 2013
Mikael,
That's a good point, unfortunately attach uses os::get_temp_directory which
is hardcoded to use /tmp. We could add a whitebox API to allow us to
override this but now we're on the border to noreg-hard land again IMO.
Any other opinions on this?
Thanks!
/peter
> -----Original Message-----
> From: Mikael Gerdin [mailto:mikael.gerdin at oracle.com]
> Sent: Tuesday, July 9, 2013 2:49 PM
> To: Peter Allwin
> Cc: serguei.spitsyn at oracle.com; daniel.daugherty at oracle.com;
> serviceability-dev at openjdk.java.net; hotspot-runtime-
> dev at openjdk.java.net
> Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file
number
> during HotSpotVirtualMachine.executeCommand
>
> Peter,
>
> On 2013-07-09 14:25, Peter Allwin wrote:
> > Hello!
> >
> > It is reproducible by letting the test create .java_pid* files for all
> > possible process id's on the system, setting correct access flags,
> > launching the target VM and attempting to connect. There are some
> > caveats though but it should be doable.
> >
> > I'll convert the repro script to JTREG and add it to the webrev.
>
> It's probably not a good idea to have a test which taints the system with
stale
> .java_pid* files.
> If the test execution times out and the script isn't allowed to clean up I
> imagine that other subsequent executions could fail.
> Is there a way to tell the attach api to use a specific directory so you
won't
> need to taint /tmp?
>
> /Mikael
>
> >
> > Thanks for the reviews!
> >
> > /peter
> >
> > *From:*serguei.spitsyn at oracle.com [mailto:serguei.spitsyn at oracle.com]
> > *Sent:* Tuesday, July 9, 2013 1:26 AM
> > *To:* daniel.daugherty at oracle.com
> > *Cc:* Peter Allwin; serviceability-dev at openjdk.java.net;
> > hotspot-runtime-dev at openjdk.java.net
> > *Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad file
> > number during HotSpotVirtualMachine.executeCommand
> >
> > Ok, thanks!
> >
> > Peter, did you manage to reproduce this issue with your script?
> > If so, then, please, include it into the bug report and remove the
> > "noreg-sqe" label.
> >
> > It is Ok if you did not reproduce it, though.
> >
> > Thanks,
> > Serguei
> >
> >
> > On 7/8/13 4:20 PM, Daniel D. Daugherty wrote:
> >
> > I definitely don't insist... :-)
> >
> > BTW, I noticed this in Peter's e-mail:
> >
> > > Testing:
> > > JPRT, reproducing script on Solaris, Linux.
> >
> > so maybe Peter already has this covered with "reproducing script"...
> >
> > Dan
> >
> > On 7/8/13 5:07 PM, serguei.spitsyn at oracle.com
> > <mailto:serguei.spitsyn at oracle.com> wrote:
> >
> > Dan,
> >
> > Dan, thank you for the recommendation.
> > But I'm still not sure it is a right thing to do.
> > Even though, there are multiple test cases associated with this
> > bug they
> > can not be used to verify that fix because an additional
condition
> > must be present as well.
> > This condition is a presence of stale door file which is not
> > that easy to reproduce.
> >
> > However, if you insist then I can change the lable to the
> > "noreg-sqe"
> > with the corresponding comment.
> >
> > Thanks,
> > Serguei
> >
> >
> > On 7/8/13 3:46 PM, Daniel D. Daugherty wrote:
> >
> > Serguei,
> >
> > There are a number of existing tests associated with this
> > bug. I don't
> > think that 'noreg-hard' is the right label. I think
> > 'noreg-sqe' is
> > the right one:
> >
> > noreg-sqe
> > Change can be verified by running an existing SQE test
> > suite; the bug
> > should identify the suite and the specific test
case(s).
> >
> > Dan
> >
> > On 7/8/13 12:59 PM, serguei.spitsyn at oracle.com
> > <mailto:serguei.spitsyn at oracle.com> wrote:
> >
> > Peter,
> >
> > I've added the label "noreg-hard" with the comment to
> > the report.
> > It is not easy to reproduce the issue and demonstrate
> > the fix in a regression test.
> >
> > Thanks,
> > Serguei
> >
> >
> > On 7/8/13 11:36 AM, serguei.spitsyn at oracle.com
> > <mailto:serguei.spitsyn at oracle.com> wrote:
> >
> > Hi Peter,
> >
> > The fix looks good.
> >
> > Thanks,
> > Serguei
> >
> > On 7/8/13 6:54 AM, Peter Allwin wrote:
> >
> > Hello!
> >
> > Looking for reviews of this change:
> >
> >
http://cr.openjdk.java.net/~allwin/7162400/webrev.01/
> >
> > <http://cr.openjdk.java.net/%7Eallwin/7162400/webrev.01/>
> >
> > For CR:
> >
> > http://bugs.sun.com/view_bug.do?bug_id=7162400
> >
> > https://jbs.oracle.com/bugs/browse/JDK-7162400
> >
> > Summary:
> >
> > This change addresses an issue in the Attach API
> > on Solaris, Linux and BSD where an attaching
> > application can receive IOExceptions such as
> > "Bad file number" (Solaris), "Connection
> > refused" (Linux/BSD), or "well-known file is not
> > secure".
> >
> > The attach process uses a file in the temporary
> > directory as a door (Solaris) or domain socket
> > (Linux,BSD) to communicate with the VM. In
> > certain circumstances stale files can be left in
> > the file system which can cause the attaching
> > application to believe that the VM is ready to
> > receive a connection when it's not. With this
> > change the stale file will be removed during VM
> > startup.
> >
> > Note that there is still an issue if we don't
> > have permission to remove the stale file, the
> > attaching process will fail to connect.
> >
> > Testing:
> >
> > JPRT, reproducing script on Solaris, Linux.
> >
> > Credits:
> >
> > Thanks to Staffan Larsen who worked on this
> > issue with me.
> >
> > Regards,
> >
> >
> > Peter
> >
More information about the serviceability-dev
mailing list