RFR(S): 8217744: [TESTBUG] JFR TestShutdownEvent fails on some systems due to process surviving SIGINT
Erik Gahlin
erik.gahlin at oracle.com
Tue Jan 29 22:48:47 UTC 2019
On 2019-01-29 22:51, mikhailo.seledtsov at oracle.com wrote:
> Hi Erik,
>
> Thank you for review.
>
>
> On 1/29/19 1:26 PM, Erik Gahlin wrote:
>> Hi Misha,
>>
>> I noticed the "60_1000" when I reviewed your change the first time,
>> but isn't it better to just let the process hang "forever" (i.e. 600
>> s) if it can't be killed, rather than use an arbitrary 10 s, which may
>> or may not be sufficient?
>>
>> Determinism is nice when analyzing test failures. Remove the sleep,
>> perhaps adding a System.out message, or just let it sleep
>> indefinitely, i.e. Thread.sleep(1_000_000).
>>
>> Or would that not work?
> I did experiment with the value of sleep. Original is 60_1000, which
> is 60 sec.
60_1000 is 601 seconds.
60 seconds is 60_000.
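
As a quick check of that arithmetic, here is a minimal Java sketch (the class and method names are hypothetical, not taken from the test). Underscores in Java numeric literals are only visual separators, so 60_1000 is the same value as 601000:

```java
public class SleepLiterals {
    // Convert a millisecond count to whole seconds.
    static long toSeconds(long millis) {
        return millis / 1000;
    }

    public static void main(String[] args) {
        long typo = 60_1000;     // same value as 601000
        long intended = 60_000;  // same value as 60000
        System.out.println(typo + " ms = " + toSeconds(typo) + " s");
        System.out.println(intended + " ms = " + toSeconds(intended) + " s");
    }
}
```

Since Thread.sleep takes milliseconds, the first literal sleeps for about ten minutes and the second for one minute.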
> I removed the sleep, and that led to the process surviving; I figured
> it may take some time for the signal to work its way through, and for
> the process to handle it properly. And the JVM takes extra time, of
> course, to wrap things up and create the hs_err log and JFR recording.
>
> I guess I can change it back to 60 sec. In most cases, the process
> will be killed shortly, so it will not matter. In some cases where it
> takes that long, something is clearly wrong, which will be seen in the
> logs. Having a really long timeout (e.g. 1000 sec) is unnecessary
> IMO; it will most likely result in the test timing out.
I think a test timeout is fine. The test is not supposed to fail, so it's
better to be 100% certain it was not slow hardware.
>
> If you are OK with it, I will revert to what it was before,
> 60_000? It worked in the past.
>
If you are going to revert, I prefer 60_1000 :)
Erik
>
> Thank you,
> Misha
>
>
>
>> Thanks
>> Erik
>>
>>> Please review: this change updates the handling of cases where the
>>> child process survives the signal. If it does, we record this, skip
>>> verification, and continue with the rest of the test.
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8217744
>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8217744.00/
>>> Testing:
>>> 1. Locally: Mac OSX, both as is and simulating the child
>>> process surviving the SIGINT - PASS
>>> 2. Multi-platform automated system: Linux-x64, Win-x64, Mac,
>>> Sol-Spc - All PASS
>>> 3. SAP engineer tested the patch on the SAP systems where it
>>> originally failed - Pass
>>> Goetz, many thanks for testing the patch.
>>>
>>> Thank you,
>>> Misha
>>>
>>
>