RFR(S): 8217744: [TESTBUG] JFR TestShutdownEvent fails on some systems due to process surviving SIGINT
mikhailo.seledtsov at oracle.com
Tue Jan 29 22:58:16 UTC 2019
On 1/29/19 2:48 PM, Erik Gahlin wrote:
> On 2019-01-29 22:51, mikhailo.seledtsov at oracle.com wrote:
>> Hi Erik,
>>
>> Thank you for the review.
>>
>>
>> On 1/29/19 1:26 PM, Erik Gahlin wrote:
>>> Hi Misha,
>>>
>>> I noticed the "60_1000" when I reviewed your change the first time,
>>> but isn't it better to just let the process hang "forever" (i.e. 600
>>> s) if it can't be killed than an arbitrary 10 s, which may or may
>>> not be sufficient?
>>>
>>> Determinism is nice when analyzing test failures. Remove the sleep,
>>> perhaps adding a System.out, or just let it sleep indefinitely, i.e.
>>> Thread.sleep(1_000_000).
>>>
>>> Or would that not work?
>> I did experiment with the value of sleep. Original is 60_1000, which
>> is 60 sec.
>
> 60_1000 is 601 seconds.
>
> 60 seconds is 60_000.
That's right, thank you for the correction. I missed that.
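
For reference, the arithmetic behind the values mentioned in this thread can be checked with a small sketch. The class and method names below are hypothetical and only illustrate how Java reads underscored int literals: the underscores are ignored, so 60_1000 is 601000 ms.

```java
public class SleepLiterals {
    // Convert a Thread.sleep argument (milliseconds) to whole seconds.
    // Hypothetical helper, not part of the test under review.
    public static long toSeconds(long millis) {
        return millis / 1000;
    }

    public static void main(String[] args) {
        // Underscores in numeric literals are purely visual separators.
        System.out.println(toSeconds(60_1000));   // 601000 ms
        System.out.println(toSeconds(60_000));    // 60000 ms
        System.out.println(toSeconds(1_000_000)); // 1000000 ms
    }
}
```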
>
>> I removed the sleep, and that led to the process surviving; I figured
>> it may take some time for the signal to work its way through, and for
>> the process
>> to properly handle the signal. And JVM takes extra time, of course,
>> to wrap things up, create hs_err log and jfr recording.
>>
>> I guess I can change it back to 60 sec. In most cases, the process
>> will be killed shortly, so it will not matter. In some cases where it
>> takes that long, something is clearly wrong, which will be seen in
>> the logs. Having a really long timeout (e.g. 1000 sec) is
>> unnecessarily long IMO; it will most likely result in the test timing out.
>
> I think a timeout is fine. The test is not supposed to fail, so it's
> better to be 100% certain it was not slow hardware.
>>
>> If you are OK with it, I will revert back to what it was before,
>> 60_000 ? It worked in the past.
>>
> If you are going to revert, I prefer 60_1000 :)
OK, will do.
Thank you for the review,
Misha
>
> Erik
>
>>
>> Thank you,
>> Misha
>>
>>
>>
>>> Thanks
>>> Erik
>>>
>>>> Please review: this change updates the handling of cases where
>>>> the child process survives the signal. If it does,
>>>> we record this and skip verification, and continue with the rest of
>>>> the test.
>>>>
>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8217744
>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8217744.00/
>>>> Testing:
>>>> 1. Locally: Mac OSX, both as is and simulating the child
>>>> process surviving the SIGINT - PASS
>>>> 2. Multi-platform automated system: Linux-x64, Win-x64,
>>>> Mac, Sol-Spc - All PASS
>>>> 3. SAP engineer tested the patch on the SAP systems where
>>>> it originally failed - Pass
>>>> Goetz, many thanks for testing the patch.
>>>>
>>>> Thank you,
>>>> Misha
>>>>
>>>
>>
>
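
The handling described in the original review request (record that the child survived, skip verification, continue) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the webrev: the class ChildSurvivalCheck and its methods are hypothetical, the child here is just "java -version" standing in for the real test process, and the step that delivers SIGINT is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ChildSurvivalCheck {
    // Start a child process and discard its output (hypothetical helper).
    public static Process start(List<String> command) {
        try {
            return new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return true if the child exited within the timeout, false if it is
    // still alive, i.e. it "survived" whatever was supposed to stop it.
    public static boolean exitedInTime(Process child, long timeoutSeconds) {
        try {
            return child.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a trivial child stands in for the real test process.
        String java = System.getProperty("java.home") + "/bin/java";
        Process child = start(List.of(java, "-version"));

        // In the real test the signal would be sent to the child here.

        if (exitedInTime(child, 60)) {
            System.out.println("Child exited; proceeding with verification");
        } else {
            // Record the fact, skip verification, and clean up.
            System.out.println("Child survived; skipping verification");
            child.destroyForcibly();
        }
    }
}
```

Waiting with a bounded timeout keeps the parent from hanging, while the printed message leaves a trace in the logs for anyone analyzing a run where the child did not go away.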
More information about the hotspot-jfr-dev
mailing list