RFR: 7903217: jtreg could try killing descendants of stuck test, before timing out the test [v4]

Gerard Ziemski gziemski at openjdk.org
Thu Nov 17 20:50:37 UTC 2022


On Mon, 22 Aug 2022 20:54:36 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

>> This is an enhancement that aims to improve the robustness of the testing by attempting to quit any child processes (that are possibly stuck and are blocking the parent process from terminating) before timing out the target parent process.
>> 
>> Aborting a process will flush its stdout/stderr streams, which will hopefully get captured in the test's log and provide additional clues as to why a test was timing out.
>> 
>> This enhancement was locally tested with a handcrafted test that itself launched a child process that would get stuck on purpose and worked as intended.
>> 
>> Hopefully, this will help debug issues such as [JDK-8286345](https://bugs.openjdk.org/browse/JDK-8286345)
>
> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   return exit code if we had to cancel any child processes

> _Mailing list message from [Jonathan Gibbons](mailto:jonathan.gibbons at oracle.com) on [jtreg-dev](mailto:jtreg-dev at mail.openjdk.org):_
> 
> On 8/22/22 10:21 AM, Gerard Ziemski wrote:
> 
> > I see the point, but I also had a case where after force quitting the child process, the parent test process is able to retrieve everything it needed from the child (which finished its task required for the parent before getting itself locked up) and be able to succeed just fine.
> > If we were to return error code in such cases, we would be generating failures in cases where the parent process was able to finish, but its children were not. Not necessarily always directly related to the case under test, more of a tangential failure, but I do see the value in reporting such cases. I will tweak it then.
> 
> This sort of detailed analysis is beyond what is reasonable for `jtreg` itself. When you want complex decision logic for "did it pass", you need to bake that into the test code.
> 
> -- Jon

I was trying to help debug a family of issues that share the same problem: timeouts with missing timeout information. The hope was that killing the subprocesses of a test that is about to fail by timeout anyway, and thereby flushing their output, might give us some extra debug information.

It's an issue that affects a wide range of tests across different components. For example, a search of semi-recent (2020-2022) bugs with:


Timeout information: 
--- Timeout information end. 

in their output gave me this list of P1-P3 bugs (I spent 10 minutes searching, but there are many, many more):

https://bugs.openjdk.org/browse/JDK-8184445 JShell tests: fail intermittently if tests are run in high concurrent mode.
https://bugs.openjdk.org/browse/JDK-8286554 gc/stress/TestStressG1Humongous.java timed out
https://bugs.openjdk.org/browse/JDK-8288279 gc/z/TestHighUsage.java timed out
https://bugs.openjdk.org/browse/JDK-8251969 java/lang/invoke/RicochetTest.java timed out
https://bugs.openjdk.org/browse/JDK-8293289 gc/cslocker/TestCSLocker.java timed out
https://bugs.openjdk.org/browse/JDK-8270799 vmTestbase/nsk/jvmti/ tests timing out with JFR
https://bugs.openjdk.org/browse/JDK-8268379 java/util/Locale/LocaleProvidersRun.java and sun/util/locale/provider/CalendarDataRegression.java timed out
https://bugs.openjdk.org/browse/JDK-8265037 serviceability/sa/ClhsdbPmap.java#id1 failed with "RuntimeException: Process is still alive. Can't get its output."
https://bugs.openjdk.org/browse/JDK-8289918 serviceability/attach/AttachWithStalePidFile.java timed out with "IOException: Premature EOF"
https://bugs.openjdk.org/browse/JDK-8278369 java/nio/channels/Channels/TransferTo.java hangs in testStreamContents
https://bugs.openjdk.org/browse/JDK-8258648 vmTestbase/vm/mlvm/indy/stress/jdi/breakpointInCompiledCode/Test.java timed out
https://bugs.openjdk.org/browse/JDK-8249684 java/foreign/TestMismatch.java timed out

I was hoping that this enhancement would help with such issues by getting child processes to flush their output, giving us more to work with when debugging them.

It's not just one test, it's many.
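
For readers who have not looked at the PR, the general idea can be sketched roughly as below. This is only an illustration built on the standard `ProcessHandle` API (Java 9+), not the code in the pull request: the class name `DescendantKiller`, the method `killDescendants`, and the grace-period parameter are all made up for this example, and how jtreg actually terminates the children may differ.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

// Hypothetical helper, not part of jtreg: names and behavior are assumptions.
public class DescendantKiller {

    /**
     * Asks every live descendant of {@code parent} to terminate, waits up to
     * {@code grace} for them to exit, then forcibly kills any that remain.
     *
     * @return the number of descendants that were alive when this was called
     */
    public static int killDescendants(ProcessHandle parent, Duration grace) {
        // Snapshot the live descendants (children, grandchildren, ...).
        List<ProcessHandle> descendants = parent.descendants()
                .filter(ProcessHandle::isAlive)
                .collect(Collectors.toList());

        // Request normal termination first. Whether the child gets a chance
        // to flush its output depends on the platform and on the child.
        for (ProcessHandle ph : descendants) {
            ph.destroy();
        }

        // Wait (bounded) for all of them to exit.
        CompletableFuture<?>[] exits = descendants.stream()
                .map(ProcessHandle::onExit)
                .toArray(CompletableFuture[]::new);
        try {
            CompletableFuture.allOf(exits).get(grace.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        } catch (ExecutionException | TimeoutException e) {
            // Fall through: stragglers are handled below.
        }

        // Forcibly kill anything that ignored the polite request.
        for (ProcessHandle ph : descendants) {
            if (ph.isAlive()) {
                ph.destroyForcibly();
            }
        }
        return descendants.size();
    }
}
```

A caller about to time out a test could invoke this on the test's process handle and, if the returned count is greater than zero, report a non-zero exit code, in the spirit of the "return exit code if we had to cancel any child processes" commit.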

-------------

PR: https://git.openjdk.org/jtreg/pull/97

