RFR 7162400: Intermittent java.io.IOException: Bad file number during HotSpotVirtualMachine.executeCommand

serguei.spitsyn at oracle.com
Tue Jul 9 10:15:32 PDT 2013


On 7/9/13 5:48 AM, Mikael Gerdin wrote:
> Peter,
>
> On 2013-07-09 14:25, Peter Allwin wrote:
>> Hello!
>>
>> It is reproducible by letting the test create .java_pid* files for all
>> possible process IDs on the system, setting the correct access flags,
>> launching the target VM and attempting to connect. There are some
>> caveats, but it should be doable.
>>
>> I’ll convert the repro script to JTREG and add it to the webrev.
>
> It's probably not a good idea to have a test which taints the system 
> with stale .java_pid* files.
> If the test execution times out and the script isn't allowed to clean 
> up I imagine that other subsequent executions could fail.

I have the same concern.
Such a test can create more problems than it solves.

> Is there a way to tell the attach api to use a specific directory so 
> you won't need to taint /tmp?
I doubt the Attach API has an option to do that.
But even if it does, it still seems too much for a test to create so many
files just to check one condition.

I think the script should be enough for SQE to verify that the
issue has been fixed.
We don't need to run it on a regular basis.
This is my personal opinion, not sure everyone will agree with it. :)

Thanks,
Serguei

>
> /Mikael
>
>>
>> Thanks for the reviews!
>>
>> /peter
>>
>> *From:*serguei.spitsyn at oracle.com [mailto:serguei.spitsyn at oracle.com]
>> *Sent:* Tuesday, July 9, 2013 1:26 AM
>> *To:* daniel.daugherty at oracle.com
>> *Cc:* Peter Allwin; serviceability-dev at openjdk.java.net;
>> hotspot-runtime-dev at openjdk.java.net
>> *Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad file
>> number during HotSpotVirtualMachine.executeCommand
>>
>> Ok, thanks!
>>
>> Peter, did you manage to reproduce this issue with your script?
>> If so, then, please, include it into the bug report and remove the
>> "noreg-sqe" label.
>>
>> It is Ok if you did not reproduce it, though.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 7/8/13 4:20 PM, Daniel D. Daugherty wrote:
>>
>>     I definitely don't insist... :-)
>>
>>     BTW, I noticed this in Peter's e-mail:
>>
>>     > Testing:
>>     > JPRT, reproducing script on Solaris, Linux.
>>
>>     so maybe Peter already has this covered with "reproducing script"...
>>
>>     Dan
>>
>>     On 7/8/13 5:07 PM, serguei.spitsyn at oracle.com wrote:
>>
>>         Dan,
>>
>>         Thank you for the recommendation.
>>         But I'm still not sure it is the right thing to do.
>>         Even though there are multiple test cases associated with this
>>         bug, they cannot be used to verify the fix because an additional
>>         condition must be present as well.
>>         This condition is the presence of a stale door file, which is not
>>         that easy to reproduce.
>>
>>         However, if you insist, I can change the label to
>>         "noreg-sqe" with the corresponding comment.
>>
>>         Thanks,
>>         Serguei
>>
>>
>>         On 7/8/13 3:46 PM, Daniel D. Daugherty wrote:
>>
>>             Serguei,
>>
>>             There are a number of existing tests associated with this
>>             bug. I don't think that 'noreg-hard' is the right label.
>>             I think 'noreg-sqe' is the right one:
>>
>>             noreg-sqe
>>                  Change can be verified by running an existing SQE test
>>                  suite; the bug should identify the suite and the
>>                  specific test case(s).
>>
>>             Dan
>>
>>             On 7/8/13 12:59 PM, serguei.spitsyn at oracle.com wrote:
>>
>>                 Peter,
>>
>>                 I've added the label "noreg-hard" with the comment to
>>                 the report.
>>                 It is not easy to reproduce the issue and demonstrate
>>                 the fix in a regression test.
>>
>>                 Thanks,
>>                 Serguei
>>
>>
>>                 On 7/8/13 11:36 AM, serguei.spitsyn at oracle.com wrote:
>>
>>                     Hi Peter,
>>
>>                     The fix looks good.
>>
>>                     Thanks,
>>                     Serguei
>>
>>                     On 7/8/13 6:54 AM, Peter Allwin wrote:
>>
>>                         Hello!
>>
>>                         Looking for reviews of this change:
>>
>> http://cr.openjdk.java.net/~allwin/7162400/webrev.01/
>>
>>                         For CR:
>>
>> http://bugs.sun.com/view_bug.do?bug_id=7162400
>>
>> https://jbs.oracle.com/bugs/browse/JDK-7162400
>>
>>                         Summary:
>>
>>                         This change addresses an issue in the Attach API
>>                         on Solaris, Linux and BSD where an attaching
>>                         application can receive IOExceptions such as
>>                         “Bad file number” (Solaris), “Connection
>>                         refused” (Linux/BSD), or “well-known file is not
>>                         secure”.
>>
>>                         The attach process uses a file in the temporary
>>                         directory as a door (Solaris) or domain socket
>>                         (Linux, BSD) to communicate with the VM. In
>>                         certain circumstances stale files can be left in
>>                         the file system which can cause the attaching
>>                         application to believe that the VM is ready to
>>                         receive a connection when it’s not. With this
>>                         change the stale file will be removed during VM
>>                         startup.
>>
>>                         Note that there is still an issue if we don’t
>>                         have permission to remove the stale file: in
>>                         that case the attaching process will still fail
>>                         to connect.
>>
>>                         Testing:
>>
>>                         JPRT, reproducing script on Solaris, Linux.
>>
>>                         Credits:
>>
>>                         Thanks to Staffan Larsen who worked on this
>>                         issue with me.
>>
>>                         Regards,
>>
>>
>>                         Peter
>>
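
For readers following the thread, the stale-file condition in Peter's summary can be illustrated with a small Java sketch. It is not the change under review, which presumably lives in the VM itself and is not shown in this thread. The class and method names (StaleAttachFile, attachFile, removeStaleFile) and the pid value are hypothetical; only the ".java_pid*" naming comes from the discussion above. The sketch works in a private temporary directory rather than /tmp, so it leaves nothing behind.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleAttachFile {
    // Well-known file for a given pid, following the ".java_pid*"
    // pattern mentioned in the thread. The directory is a parameter
    // here purely for illustration.
    static Path attachFile(Path tmpDir, long pid) {
        return tmpDir.resolve(".java_pid" + pid);
    }

    // Illustrative startup step: remove a leftover file for this pid.
    // Returns true if a stale file was found and deleted. An IOException
    // (for example, on missing permissions) propagates to the caller,
    // mirroring the remaining issue noted in the summary.
    static boolean removeStaleFile(Path tmpDir, long pid) throws IOException {
        return Files.deleteIfExists(attachFile(tmpDir, pid));
    }

    public static void main(String[] args) throws IOException {
        // Private scratch directory so the demo does not touch /tmp.
        Path tmpDir = Files.createTempDirectory("attach-sketch");
        try {
            long pid = 12345L; // hypothetical pid, not a real process
            // Simulate a file left behind by an earlier process.
            Files.createFile(attachFile(tmpDir, pid));
            System.out.println("stale file removed: " + removeStaleFile(tmpDir, pid));
            System.out.println("second attempt: " + removeStaleFile(tmpDir, pid));
        } finally {
            // The directory is empty again at this point.
            Files.deleteIfExists(tmpDir);
        }
    }
}
```

Taking the directory as a parameter is only a convenience for the sketch; as noted in the reply above, it is doubtful that the Attach API itself offers such an option.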


