RFR: 7903217: jtreg could try killing descendants of stuck test, before timing out the test [v4]
Thomas Stuefe
stuefe at openjdk.org
Fri Nov 18 06:44:09 UTC 2022
On Mon, 22 Aug 2022 20:54:36 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:
>> This is an enhancement that aims to improve the robustness of the testing by attempting to quit any child processes (that are possibly stuck and are blocking the parent process from terminating) before timing out the target parent process.
>>
>> Aborting a process will flush its stdout/stderr streams, which will hopefully get captured in the test's log and provide additional clues as to why a test was timing out.
>>
>> This enhancement was locally tested with a handcrafted test that itself launched a child process that would get stuck on purpose and worked as intended.
>>
>> Hopefully, this will help debug issues such as [JDK-8286345](https://bugs.openjdk.org/browse/JDK-8286345)
>
> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
>
> return exit code if we had to cancel any child processes
Hi Gerard,
What you are trying to do is useful and appreciated; I have often missed having more information in these cases. I am also unsure whether jtreg is the best level at which to handle this, but I can see both sides and will therefore stay out of that discussion.
One other thing: for this to be useful, we would also need thread dumps from the hanging children, if the children happen to be JVMs. Just hoping that abort(3) will nudge the children into producing some output will often not work. For example, if jcmd hangs, it is usually innocent: it is waiting for an answer from the attachee, and the attachee is the one that is stuck. jcmd itself would be perfectly able to respond to a thread dump request and tell me as much.
So, before killing them, send each of them a SIGQUIT to get thread dumps, and give them a bit of time to respond. That raises more questions, though: if you do this, especially wholesale for all children, you could flood the jtr files with thread dumps from children, and analyzing them would get really confusing.
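To make the idea concrete, here is a rough sketch of the sequence I have in mind, not a proposal for the actual jtreg code. Everything named here is made up for illustration: the class DescendantDumper, the launcher-name heuristic in looksLikeJvm, the maxDumps cap (one crude way to limit the flood) and the grace period. It assumes a POSIX system with a kill(1) utility on the PATH, since Java has no direct API for sending SIGQUIT, and it uses destroyForcibly() as a stand-in for whatever the PR does to end the children.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

final class DescendantDumper {

    // Illustrative list only (an assumption): executable names we treat as JVMs.
    static final Set<String> JVM_LAUNCHERS = Set.of("java", "jcmd", "jstack", "jmap");

    // Heuristic: a process counts as a JVM if its executable name is in the list.
    // SIGQUIT terminates a non-JVM process by default, so we must not send it blindly.
    static boolean looksLikeJvm(Optional<String> command) {
        return command
                .map(c -> Paths.get(c).getFileName())
                .map(Path::toString)
                .map(JVM_LAUNCHERS::contains)
                .orElse(false);
    }

    // Sends SIGQUIT via the external kill(1) utility; POSIX only.
    static boolean sendSigquit(long pid) throws IOException, InterruptedException {
        Process kill = new ProcessBuilder("kill", "-QUIT", Long.toString(pid))
                .inheritIO()
                .start();
        return kill.waitFor(5, TimeUnit.SECONDS) && kill.exitValue() == 0;
    }

    // Asks up to maxDumps JVM descendants for a thread dump, waits graceMillis,
    // then forcibly kills all descendants. Returns the number of dumps requested.
    static int dumpThenKill(ProcessHandle root, int maxDumps, long graceMillis)
            throws IOException, InterruptedException {
        List<ProcessHandle> descendants =
                root.descendants().collect(Collectors.toList());
        int dumped = 0;
        for (ProcessHandle p : descendants) {
            if (dumped >= maxDumps) {
                break;
            }
            if (p.isAlive() && looksLikeJvm(p.info().command())
                    && sendSigquit(p.pid())) {
                dumped++;
            }
        }
        if (dumped > 0) {
            Thread.sleep(graceMillis); // give the JVMs time to print their dumps
        }
        for (ProcessHandle p : descendants) {
            p.destroyForcibly();
        }
        return dumped;
    }
}
```

A caller would do something like DescendantDumper.dumpThenKill(testProcess.toHandle(), 3, 2000) just before timing out the test. Note that the thread dumps go to the stdout of each child, so they only land in the jtr file if that stream is captured there.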
Not sure what a good solution could be. Let's see what others think.
Cheers, Thomas
-------------
PR: https://git.openjdk.org/jtreg/pull/97