RFR(S): 8217744: [TESTBUG] JFR TestShutdownEvent fails on some systems due to process surviving SIGINT

Erik Gahlin erik.gahlin at oracle.com
Wed Jan 30 00:23:40 UTC 2019


Thanks. No need to repost the webrev.

Erik


> On 29 Jan 2019, at 23:58, mikhailo.seledtsov at oracle.com wrote:
> 
> 
> 
> On 1/29/19 2:48 PM, Erik Gahlin wrote:
>> On 2019-01-29 22:51, mikhailo.seledtsov at oracle.com wrote:
>>> Hi Erik,
>>> 
>>>   Thank you for the review.
>>> 
>>> 
>>> On 1/29/19 1:26 PM, Erik Gahlin wrote:
>>>> Hi Misha,
>>>> 
>>>> I noticed the "60_1000" when I reviewed your change the first time, but isn't it better to just let the process hang "forever" (i.e. 600 s) if it can't be killed than to use an arbitrary 10 s, which may or may not be sufficient?
>>>> 
>>>> Determinism is nice when analyzing test failures. Remove the sleep, perhaps adding a System.out, or just let it sleep indefinitely, i.e. Thread.sleep(1_000_000).
>>>> 
>>>> Or would that not work?
>>> I did experiment with the value of sleep. Original is 60_1000, which is 60 sec. 
>> 
>> 60_1000 is 601 seconds.
>> 
>> 60 seconds is 60_000.
> That's right, thank you for the correction. I missed that.
>> 
>>> I removed the sleep, and that led to the process surviving; I figured it may take some time for the signal to work its way through, and for the process to handle it properly. And the JVM takes extra time, of course, to wrap things up and create the hs_err log and the JFR recording.
>>> 
>>> I guess I can change it back to 60 sec. In most cases, the process will be killed shortly, so it will not matter. In the cases where it takes that long, something is clearly wrong, which will be seen in the logs. A really long timeout (e.g. 1000 sec) is unnecessary IMO; it will most likely cause the test itself to time out.
>> 
>> I think a timeout is fine.  The test is not supposed to fail, so it's better to be 100% certain it was not slow hardware.
>>> 
>>> If you are OK with it, I will revert to what it was before, 60_000? It worked in the past.
>>> 
>> If you are going to revert, I prefer 60_1000 :)
> OK, will do.
> 
> Thank you for the review,
> Misha
>> 
>> Erik
>> 
>>> 
>>> Thank you,
>>> Misha
>>> 
>>> 
>>> 
>>>> Thanks
>>>> Erik
>>>> 
>>>>> Please review: this change updates the handling of cases where the child process survives the signal. If it does,
>>>>> we record this and skip verification, and continue with the rest of the test.
>>>>> 
>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8217744
>>>>>     Webrev: http://cr.openjdk.java.net/~mseledtsov/8217744.00/
>>>>>     Testing:
>>>>>         1. Locally: Mac OSX, both as is and simulating the child process surviving the SIGINT - PASS
>>>>>         2. Multi-platform automated system: Linux-x64, Win-x64, Mac, Sol-Spc - All PASS
>>>>>         3. SAP engineer tested the patch on the SAP systems where it originally failed - Pass
>>>>>             Goetz, many thanks for testing the patch.
>>>>> 
>>>>> Thank you,
>>>>> Misha
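
The 60_1000 versus 60_000 confusion in the thread comes from Java's underscore digit separators, which are purely visual and do not have to fall on thousands boundaries. The sketch below shows how the two literals evaluate; the class name SleepLiteralDemo and its helper are illustrative only and are not part of the test under review.

```java
import java.util.concurrent.TimeUnit;

public class SleepLiteralDemo {
    // Underscores are ignored by the compiler, so this is 601000, not 60000.
    public static final long AS_WRITTEN_MS = 60_1000;

    // Grouped on a thousands boundary: 60000 ms, i.e. 60 seconds.
    public static final long SIXTY_SECONDS_MS = 60_000;

    // Converts milliseconds to whole seconds (truncating any remainder).
    public static long toWholeSeconds(long millis) {
        return TimeUnit.MILLISECONDS.toSeconds(millis);
    }

    public static void main(String[] args) {
        System.out.println(AS_WRITTEN_MS + " ms = " + toWholeSeconds(AS_WRITTEN_MS) + " s");
        System.out.println(SIXTY_SECONDS_MS + " ms = " + toWholeSeconds(SIXTY_SECONDS_MS) + " s");
        // Deriving the value from a unit avoids counting digits by hand.
        System.out.println("60 s = " + TimeUnit.SECONDS.toMillis(60) + " ms");
    }
}
```

Writing the sleep as Thread.sleep(TimeUnit.SECONDS.toMillis(60)) is one way to make the intended duration explicit, though whether that suits this test is a matter for the author and reviewer.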


