RFR(S): 8217744: [TESTBUG] JFR TestShutdownEvent fails on some systems due to process surviving SIGINT

Erik Gahlin erik.gahlin at oracle.com
Tue Jan 29 22:48:47 UTC 2019


On 2019-01-29 22:51, mikhailo.seledtsov at oracle.com wrote:
> Hi Erik,
>
>   Thank you for review.
>
>
> On 1/29/19 1:26 PM, Erik Gahlin wrote:
>> Hi Misha,
>>
>> I noticed the "60_1000" when I reviewed your change the first time, 
>> but isn't it better to just let the process hang "forever" (i.e. 600 
>> s) if it can't be killed than an arbitrary 10 s, which may or may not 
>> be sufficient?
>>
>> Determinism is nice when analyzing test failures. Remove the sleep, 
>> perhaps adding a System.out, or just let it sleep indefinitely, i.e. 
>> Thread.sleep(1_000_000).
>>
>> Or would that not work?
> I did experiment with the value of sleep. Original is 60_1000, which 
> is 60 sec. 

60_1000 is 601 seconds.

60 seconds is 60_000.
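
A minimal Java sketch of the arithmetic, assuming the value is passed to
Thread.sleep() as milliseconds. The class and method names are made up for
illustration and do not come from the test. Underscores in a numeric literal
are ignored by the compiler, so 60_1000 is simply 601000.

```java
import java.util.concurrent.TimeUnit;

public class SleepLiteralDemo {
    // Converts a millisecond count to whole seconds.
    static long asSeconds(long millis) {
        return TimeUnit.MILLISECONDS.toSeconds(millis);
    }

    public static void main(String[] args) {
        // The underscore is only a visual separator: 60_1000 == 601000.
        System.out.println(asSeconds(60_1000)); // 601
        System.out.println(asSeconds(60_000));  // 60
        // Spelling out the unit avoids the misplaced-underscore problem.
        System.out.println(TimeUnit.SECONDS.toMillis(60)); // 60000
    }
}
```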

> I removed the sleep, and that led to the process surviving; I figured it 
> may take some time for signal to work its way thru, and for process to 
> properly handle the signal. And JVM takes extra time, of course, to 
> wrap things up, create hs_err log and jfr recording.
>
> I guess I can change it back to 60 sec. In most cases, the process 
> will be killed shortly, so it will not matter. In some cases where it 
> takes that long, something is clearly wrong, which will be seen in the 
> logs. Having a really long timeout (e.g. 1000 sec) is unnecessarily 
> long IMO; it will most likely result in the test timing out.

I think timeout is fine.  The test is not supposed to fail, so it's 
better to be 100% certain it was not slow hardware.
>
> If you are OK with it, I will revert back to what it was before, 
> 60_000 ? It worked in the past.
>
If you are going to revert, I prefer 60_1000 :)

Erik

>
> Thank you,
> Misha
>
>
>
>> Thanks
>> Erik
>>
>>> Please review: this change updates the handling of cases where the 
>>> child process survives the signal. If it does,
>>> we record this and skip verification, and continue with the rest of 
>>> the test.
>>>
>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8217744
>>>     Webrev: http://cr.openjdk.java.net/~mseledtsov/8217744.00/
>>>     Testing:
>>>         1. Locally: Mac OSX, both as is and simulating the child 
>>> process surviving the SIGINT - PASS
>>>         2. Multi-platform automated system: Linux-x64, Win-x64, Mac, 
>>> Sol-Spc - All PASS
>>>         3. SAP engineer tested the patch on the SAP systems where it 
>>> originally failed - Pass
>>>             Goetz, many thanks for testing the patch.
>>>
>>> Thank you,
>>> Misha
>>>
>>
>
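
For reference, the "record that the child survived and skip verification"
handling described in the original review request could be sketched roughly
as below. This is not the code from the webrev: the class name, the helper
method, the 60 second limit, and the use of "java -version" as a stand-in
child are all assumptions made for illustration. It relies only on
Process.waitFor(long, TimeUnit), which returns false if the process is still
alive when the timeout expires.

```java
import java.util.concurrent.TimeUnit;

public class ChildExitCheck {
    // Returns true if the child exited within the timeout,
    // false if it is still alive when the timeout expires.
    static boolean exitedWithin(Process child, long timeout, TimeUnit unit)
            throws InterruptedException {
        return child.waitFor(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in child process; the real test starts a JVM and signals it.
        String java = System.getProperty("java.home") + "/bin/java";
        Process child = new ProcessBuilder(java, "-version").inheritIO().start();

        if (exitedWithin(child, 60, TimeUnit.SECONDS)) {
            System.out.println("child exited with code " + child.exitValue());
        } else {
            // Child survived: record it, skip verification, clean up.
            System.out.println("child survived; skipping verification");
            child.destroyForcibly();
        }
    }
}
```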