RFR: 7903217: jtreg could try killing descendants of stuck test, before timing out the test [v4]

Jonathan Gibbons jonathan.gibbons at oracle.com
Thu Nov 17 21:23:50 UTC 2022


On 11/17/22 12:50 PM, Gerard Ziemski wrote:
> On Mon, 22 Aug 2022 20:54:36 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:
>
>>> This is an enhancement that aims to improve the robustness of the testing by attempting to quit any child processes (that are possibly stuck and are blocking the parent process from terminating) before timing out the target parent process.
>>>
>>> Aborting a process will flush its stdout/stderr streams, which will hopefully get captured in the test's log and provide additional clues as to why a test was timing out.
>>>
>>> This enhancement was locally tested with a handcrafted test that itself launched a child process that would get stuck on purpose and worked as intended.
>>>
>>> Hopefully, this will help debug issues such as [JDK-8286345](https://bugs.openjdk.org/browse/JDK-8286345)
>> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
>>
>>    return exit code if we had to cancel any child processes
>> _Mailing list message from [Jonathan Gibbons](mailto:jonathan.gibbons at oracle.com) on [jtreg-dev](mailto:jtreg-dev at mail.openjdk.org):_
>>
>> On 8/22/22 10:21 AM, Gerard Ziemski wrote:
>>
>>> I see the point, but I also had a case where after force quitting the child process, the parent test process is able to retrieve everything it needed from the child (which finished its task required for the parent before getting itself locked up) and be able to succeed just fine.
>>> If we were to return error code in such cases, we would be generating failures in cases where the parent process was able to finish, but its children were not. Not necessarily always directly related to the case under test, more of a tangential failure, but I do see the value in reporting such cases. I will tweak it then.
>> This sort of detailed analysis is beyond what is reasonable for `jtreg` itself. When you want complex decision logic for "did it pass", you need to bake that into the test code.
>>
>> -- Jon
> I was trying to help debug a family of issues with the same problem - timeouts with missing timeout info, in hopes that killing/flushing subprocesses of a test that is about to fail (by timeout) anyhow might get us some extra debug info that could help.
>
> It's an issue that affects a wide area of tests from all different components. For example a search of semi recent (2020-2022) bugs with:
>
>
> Timeout information:
> --- Timeout information end.
>
> in common output, gave me this list of P1-P3 (I spent 10 minutes searching, but there are many many more):
>
> https://bugs.openjdk.org/browse/JDK-8184445 JShell tests: fail intermittently if tests are run in high concurrent mode.
> https://bugs.openjdk.org/browse/JDK-8286554 gc/stress/TestStressG1Humongous.java timed out
> https://bugs.openjdk.org/browse/JDK-8288279 gc/z/TestHighUsage.java timed out
> https://bugs.openjdk.org/browse/JDK-8251969 java/lang/invoke/RicochetTest.java timed out
> https://bugs.openjdk.org/browse/JDK-8293289 gc/cslocker/TestCSLocker.java timed out
> https://bugs.openjdk.org/browse/JDK-8270799 vmTestbase/nsk/jvmti/ tests timing out with JFR
> https://bugs.openjdk.org/browse/JDK-8268379 java/util/Locale/LocaleProvidersRun.java and sun/util/locale/provider/CalendarDataRegression.java timed out
> https://bugs.openjdk.org/browse/JDK-8265037 serviceability/sa/ClhsdbPmap.java#id1 failed with "RuntimeException: Process is still alive. Can't get its output."
> https://bugs.openjdk.org/browse/JDK-8289918 serviceability/attach/AttachWithStalePidFile.java timed out with "IOException: Premature EOF"
> https://bugs.openjdk.org/browse/JDK-8278369 java/nio/channels/Channels/TransferTo.java hangs in testStreamContents
> https://bugs.openjdk.org/browse/JDK-8258648 vmTestbase/vm/mlvm/indy/stress/jdi/breakpointInCompiledCode/Test.java timed out
> https://bugs.openjdk.org/browse/JDK-8249684 java/foreign/TestMismatch.java timed out
>
> I was hoping that this enhancement would help with such issues to get child processes to flush their output to help debug them further.
>
> It's not just one test, it's many.
>
> -------------
>
> PR: https://git.openjdk.org/jtreg/pull/97


If it's not one test but many, that suggests putting the functionality
in a test library, where it can be reused by any tests that need it.

Also, note the possibility of using `@run driver`, which is intended as
an extension mechanism for customized execution models, such as the one
you are describing. In conjunction with test library code, it allows you
to have complex execution models, which can even be different for
different kinds of tests. You described one particular model where a
process might time out but the test should still be deemed to have
passed; that may be reasonable for the tests you have in mind, but it
does not sound general enough or standard enough to be baked into
mainline jtreg.
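
As a rough illustration of the driver approach, here is a minimal sketch,
not a definitive implementation. The class name `StuckChildDriver`, the
child class `ChildMain`, the `RESULT: OK` marker and the timeout values
are all hypothetical; the pass/fail policy shown is just one possible
choice. It assumes JDK 9 or later for `ProcessHandle`.

```java
/*
 * @test
 * @summary Hypothetical sketch: a driver that tolerates a stuck descendant process
 * @run driver StuckChildDriver
 */
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class StuckChildDriver {

    /** Outcome of one child run. */
    static final class Result {
        final boolean finished;       // true if the child exited before the timeout
        final int killedDescendants;  // descendants for which termination was requested
        final String output;          // combined stdout/stderr of the child
        Result(boolean finished, int killedDescendants, String output) {
            this.finished = finished;
            this.killedDescendants = killedDescendants;
            this.output = output;
        }
    }

    /** Requests forcible termination of all live descendants of the given process. */
    static int killDescendants(ProcessHandle parent) {
        // Take a snapshot first: the process tree can change while we walk it.
        List<ProcessHandle> snapshot = parent.descendants().collect(Collectors.toList());
        int requested = 0;
        for (ProcessHandle ph : snapshot) {
            if (ph.isAlive() && ph.destroyForcibly()) {
                requested++;
            }
        }
        return requested;
    }

    /** Runs a command; on timeout kills its descendants, then the child itself if needed. */
    static Result run(List<String> command, long timeoutSeconds, long graceSeconds)
            throws IOException, InterruptedException {
        // Redirect to a file so the child never blocks on a full pipe.
        File log = File.createTempFile("child", ".log");
        try {
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(log)
                    .start();
            boolean finished = child.waitFor(timeoutSeconds, TimeUnit.SECONDS);
            int killed = 0;
            if (!finished) {
                // Kill descendants before the child, while they can still be found.
                killed = killDescendants(child.toHandle());
                // Give the child a chance to finish on its own once unblocked.
                if (!child.waitFor(graceSeconds, TimeUnit.SECONDS)) {
                    child.destroyForcibly();
                    child.waitFor();
                }
            }
            String output = new String(Files.readAllBytes(log.toPath()));
            return new Result(finished, killed, output);
        } finally {
            log.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        // test.jdk and test.class.path are set by jtreg; fall back for standalone runs.
        String jdk = System.getProperty("test.jdk", System.getProperty("java.home"));
        String cp = System.getProperty("test.class.path", System.getProperty("java.class.path"));
        String java = jdk + File.separator + "bin" + File.separator + "java";

        // "ChildMain" is a hypothetical test class that may spawn its own subprocesses.
        Result r = run(List.of(java, "-cp", cp, "ChildMain"), 60, 10);
        System.out.println(r.output);
        if (!r.finished) {
            System.out.println("child timed out; descendants killed: " + r.killedDescendants);
        }
        // Test-specific policy (an assumption): pass only if the child reported success.
        if (!r.output.contains("RESULT: OK")) {
            throw new RuntimeException("child did not report RESULT: OK");
        }
    }
}
```

The helper methods could equally live in a shared test-library class so
that several drivers reuse them. The driver's own timeout would need to
be shorter than the timeout jtreg applies to the test, so that the
driver gets to act first.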

If you have trouble modelling the behavior you want in a test driver 
class, then that is a reason to come back here and propose or ask for 
any necessary enhancements.

-- Jon


