RFR(S): 8217744: [TESTBUG] JFR TestShutdownEvent fails on some systems due to process surviving SIGINT

mikhailo.seledtsov at oracle.com
Tue Jan 29 22:58:16 UTC 2019


On 1/29/19 2:48 PM, Erik Gahlin wrote:
> On 2019-01-29 22:51, mikhailo.seledtsov at oracle.com wrote:
>> Hi Erik,
>>
>>   Thank you for the review.
>>
>>
>> On 1/29/19 1:26 PM, Erik Gahlin wrote:
>>> Hi Misha,
>>>
>>> I noticed the "60_1000" when I reviewed your change the first time, 
>>> but isn't it better to just let the process hang "forever" (i.e. 600 
>>> s) if it can't be killed than an arbitrary 10 s, which may or may 
>>> not be sufficient?
>>>
>>> Determinism is nice when analyzing test failures. Remove the sleep, 
>>> perhaps adding a System.out, or just let it sleep indefinitely, i.e. 
>>> Thread.sleep(1_000_000).
>>>
>>> Or would that not work?
>> I did experiment with the value of sleep. Original is 60_1000, which 
>> is 60 sec. 
>
> 60_1000 is 601 seconds.
>
> 60 seconds is 60_000.
That's right, thank you for the correction. I missed that.
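
For reference, a minimal Java sketch of how the two literals above are read. Underscores in numeric literals are only visual separators, so 60_1000 is the number 601000. The class and member names here are made up for illustration and are not taken from the test.

```java
public class SleepLiteralDemo {
    // Underscores do not change the value of a numeric literal.
    static final long AS_WRITTEN = 60_1000; // 601000 ms
    static final long INTENDED = 60_000;    // 60000 ms

    // Converts a millisecond count to whole seconds.
    static long toSeconds(long millis) {
        return millis / 1000;
    }

    public static void main(String[] args) {
        System.out.println(toSeconds(AS_WRITTEN)); // 601
        System.out.println(toSeconds(INTENDED));   // 60
    }
}
```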
>
>> I removed the sleep, and that led to the process surviving; I figured 
>> it may take some time for the signal to work its way through, and for 
>> the process to properly handle the signal. And the JVM takes extra 
>> time, of course, to wrap things up and create the hs_err log and the 
>> JFR recording.
>>
>> I guess I can change it back to 60 sec. In most cases, the process 
>> will be killed shortly, so it will not matter. In some cases where it 
>> takes that long, something is clearly wrong, which will be seen in 
>> the logs. A really long timeout (e.g. 1000 sec) is unnecessary IMO; 
>> it will most likely result in the test itself timing out.
>
> I think a timeout is fine.  The test is not supposed to fail, so it's 
> better to be 100% certain it was not slow hardware.
>>
>> If you are OK with it, I will revert back to what it was before, 
>> 60_000 ? It worked in the past.
>>
> If you are going to revert, I prefer 60_1000 :)
OK, will do.

Thank you for the review,
Misha
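
For readers following the thread, here is a rough sketch of the "record that the child survived and skip verification" handling described in the original review request quoted below. It is not the code from the webrev: the class name, the method name, and the polling interval are assumptions chosen for illustration, and the liveness check is passed in as a BooleanSupplier so the sketch runs without starting a real child process.

```java
import java.util.function.BooleanSupplier;

public class SurvivalCheck {
    // Polls until the child is gone or the deadline passes.
    // Returns true if the child exited in time, false if it survived.
    static boolean exitedWithin(BooleanSupplier isAlive, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (isAlive.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still alive after the timeout
            }
            try {
                Thread.sleep(10); // hypothetical polling interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return !isAlive.getAsBoolean();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a child process that ignores the signal.
        boolean exited = exitedWithin(() -> true, 50);
        if (!exited) {
            System.out.println("Child survived the signal; skipping verification");
        }
    }
}
```

With a real java.lang.Process, the first argument could be process::isAlive. Whether to skip or to let the test time out is exactly the trade-off discussed in this thread.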
>
> Erik
>
>>
>> Thank you,
>> Misha
>>
>>
>>
>>> Thanks
>>> Erik
>>>
>>>> Please review: this change updates the handling of cases where 
>>>> the child process survives the signal. If it does,
>>>> we record this, skip verification, and continue with the rest of 
>>>> the test.
>>>>
>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8217744
>>>>     Webrev: http://cr.openjdk.java.net/~mseledtsov/8217744.00/
>>>>     Testing:
>>>>         1. Locally: Mac OSX, both as is and simulating the child 
>>>> process surviving the SIGINT - PASS
>>>>         2. Multi-platform automated system: Linux-x64, Win-x64, 
>>>> Mac, Sol-Spc - All PASS
>>>>         3. SAP engineer tested the patch on the SAP systems where 
>>>> it originally failed - Pass
>>>>             Goetz, many thanks for testing the patch.
>>>>
>>>> Thank you,
>>>> Misha
>>>>
>>>
>>
>


