RFR 9: 8133552 : java/lang/ProcessHandle/InfoTest.java fails intermittently - incorrect user

Chris Hegarty chris.hegarty at oracle.com
Thu Sep 10 13:59:29 UTC 2015


On 10 Sep 2015, at 14:49, Roger Riggs <Roger.Riggs at Oracle.com> wrote:

> Hi Chris,
> 
> ok, updated the webrev with the 30 sec timeouts.  

Thanks Roger.

I remember going many rounds on false timeouts from tests in other areas a few years back. We came to a consensus that 30 secs was a reasonable value for a timeout that should never be triggered.

> I also expect that the timeoutFactor on slow systems would be applied by jtreg.

Yes, but this does not cater for swamped systems.  I think we can err on the side of caution here, without any real cost.

Thanks,
-Chris.
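
A minimal sketch of the pattern discussed above: a generous timeout that exists only to stop the test blocking forever, scaled by the jtreg timeout factor. The class and method names are hypothetical, and the local adjustTimeout is only a stand-in for the test library's Utils.adjustTimeout, whose implementation is not shown in this thread. It assumes jtreg exposes the factor through the test.timeout.factor system property.

```java
import java.util.concurrent.TimeUnit;

class WaitWithTimeout {
    // Hypothetical stand-in for the test library's Utils.adjustTimeout.
    // Assumption: the factor arrives via the "test.timeout.factor" property.
    static long adjustTimeout(long seconds) {
        double factor = Double.parseDouble(
                System.getProperty("test.timeout.factor", "1.0"));
        return Math.round(seconds * factor);
    }

    // Starts the command and waits up to the adjusted timeout.
    // Returns true if the process exited in time, false if the wait timed out.
    static boolean runAndWait(long timeoutSeconds, String... command)
            throws Exception {
        Process p = new ProcessBuilder(command).inheritIO().start();
        boolean exited = p.waitFor(adjustTimeout(timeoutSeconds), TimeUnit.SECONDS);
        if (!exited) {
            // A timeout here is treated as a failure, not as a slow machine.
            p.destroyForcibly();
        }
        return exited;
    }

    public static void main(String[] args) throws Exception {
        String java = System.getProperty("java.home") + "/bin/java";
        boolean exited = runAndWait(30, java, "-version");
        System.out.println("exited in time: " + exited);
    }
}
```

Because waitFor returns as soon as the process exits, the 30 second ceiling costs nothing on a healthy run; it is only reached when something is actually stuck.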

> Roger
> 
> 
> On 9/10/2015 9:43 AM, Chris Hegarty wrote:
>> Roger,
>> 
>> The timeouts, in this test, are just to ensure that the test does not block indefinitely, if it encounters a bug in the JDK, right?  If a timeout is ever triggered then there is a bug, right?
>> 
> correct
>> 
>> If this is the case, then we have used larger timeouts in other areas (net, concurrency) to cover running on slooooow, or busy, machines, typically 30 secs, to ensure no false failures. The large value doesn’t really matter because the test is never expected to actually wait that long. If it does time out, then there is definitely a JDK bug.  Does it make sense to bump these to 30 secs also?
>> 
>> -Chris.
>> 
>> On 10 Sep 2015, at 14:30, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>> 
>> 
>>> Hi Joe,
>>> 
>>> I think adjusting the timeouts is already covered.
>>> The test uses Process.waitFor(timeout) to wait for the process to exit, but only up to the timeout value.
>>> The "Utils.adjustTimeout(5)" call performs the desired adjustment based on the jtreg timeoutFactor.
>>> Utils is in the testlibrary.
>>> 
>>> Roger
>>> 
>>> 
>>> On 9/9/2015 8:08 PM, Joseph D. Darcy wrote:
>>> 
>>>> Hi Roger,
>>>> 
>>>> If timeouts need to be used, I suggest rather than fixed values they be adjusted according to the timeout factor being used in the test run.
>>>> 
>>>> Can some sort of repeated testing with exponential backoff to a longer timeout be used? If the system is actually ready in a fraction of a second, it is preferable for the test to be able to complete without waiting the full timeout value. (Perhaps that is already encapsulated in the existing code.)
>>>> 
>>>> Thanks,
>>>> 
>>>> -Joe
>>>> 
>>>> On 9/9/2015 2:49 PM, Roger Riggs wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Please review this update to extract the uid from the owner of the /proc/<pid> file.
>>>>> It should be more reliable than using the owner of the /proc/<pid>/cmdline file.
>>>>> 
>>>>> Webrev:
>>>>>    
>>>>> http://cr.openjdk.java.net/~rriggs/webrev-info-8133552/
>>>>> 
>>>>> 
>>>>> Thanks, Roger
>>>>> 
>>>>> 
>>>>> On 9/9/2015 12:56 PM, Roger Riggs wrote:
>>>>> 
>>>>>> Hi Volker,
>>>>>> 
>>>>>> Thanks for the review and diagnosis.
>>>>>> 
>>>>>> Can opening /proc/pid be used as a fallback if the st_uid is zero or
>>>>>> is it worth the overhead of stat'ing /proc/pid always?
>>>>>> 
>>>>>> Thanks, Roger
>>>>>> 
>>>>>> 
>>>>>> On 9/9/2015 11:46 AM, Volker Simonis wrote:
>>>>>> 
>>>>>>> Hi Roger,
>>>>>>> 
>>>>>>> I think your change looks good and it surely improves the test
>>>>>>> stability but I don't think it solves the problem in all cases.
>>>>>>> 
>>>>>>> I think this problem is caused by a <defunct> (i.e. "zombie") process
>>>>>>> (the spawned process lived too short and was already a zombie when the
>>>>>>> info object was created). If you look at the proc-file system entry of
>>>>>>> a <defunct> process you can see that its 'cmdline' file has zero size
>>>>>>> and the file is owned by root. This is exactly what is reported by the
>>>>>>> corresponding info object in the bug report (user=root and no cmd
>>>>>>> field).
>>>>>>> 
>>>>>>> We may need to improve the way we get the uid of a pid on Linux.
>>>>>>> The current way of querying the owner of /proc/<pid>/cmdline seems to
>>>>>>> be unreliable. We may instead take the owner of /proc/<pid> which
>>>>>>> seems to be still the initial user of the process.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Volker
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Sep 8, 2015 at 11:35 PM, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>>>>>>> 
>>>>>>>> With link to webrev corrected:
>>>>>>>> 
>>>>>>>> On 9/8/2015 5:08 PM, Roger Riggs wrote:
>>>>>>>> 
>>>>>>>>> Please review an intermittent test bug fix.
>>>>>>>>> The test setup time is very short and the user may be returned as 0 which
>>>>>>>>> is reported as root.
>>>>>>>>> The correction lengthens the time allowed for the process to start.
>>>>>>>>> 
>>>>>>>>> The test is removed from the ProblemList.
>>>>>>>>> 
>>>>>>>>> Webrev:
>>>>>>>>> 
>>>>>>>>> http://cr.openjdk.java.net/~rriggs//webrev-info-8133552
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Bug:
>>>>>>>>>   
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133552
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks, Roger
>>>>>>>>> 
>>>>>>>>> 
> 
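
For readers following the thread, the difference between the two ownership lookups discussed above (/proc/<pid> versus /proc/<pid>/cmdline) can be observed from Java with a small sketch like the one below. This is an illustration only, not the change in the webrev, which is not reproduced in this thread; the class and method names are hypothetical, and it assumes a Linux-style /proc file system and JDK 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ProcOwner {
    // Returns the name of the file owner as reported by the file system.
    static String ownerOf(Path path) throws IOException {
        return Files.getOwner(path).getName();
    }

    public static void main(String[] args) throws IOException {
        long pid = ProcessHandle.current().pid();
        Path procDir = Paths.get("/proc", Long.toString(pid));
        if (!Files.isDirectory(procDir)) {
            // Assumption not met: no Linux-style procfs on this system.
            System.out.println("No /proc entry for pid " + pid);
            return;
        }
        // For a live process both owners are normally the same user. The
        // thread reports that for a <defunct> process cmdline is owned by root.
        System.out.println("owner of " + procDir + ": " + ownerOf(procDir));
        Path cmdline = procDir.resolve("cmdline");
        System.out.println("owner of " + cmdline + ": " + ownerOf(cmdline));
    }
}
```

Run against the current process this only shows the normal case; reproducing the zombie case would need a child that has exited but has not yet been waited for.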



