RFR: 7903217: jtreg could try killing descendants of stuck test, before timing out the test [v4]

Thu Dec 1 19:26:01 UTC 2022

On Mon, 22 Aug 2022 20:54:36 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

>> This is an enhancement that aims to improve the robustness of the testing by attempting to quit any child processes (that are possibly stuck and are blocking the parent process from terminating) before timing out the target parent process.
>> 
>> Aborting a process will flush its stdout/stderr streams, which will hopefully get captured in the test's log and provide additional clues as to why a test was timing out.
>> 
>> This enhancement was locally tested with a handcrafted test that itself launched a child process that would get stuck on purpose and worked as intended.
>> 
>> Hopefully, this will help debug issues such as [JDK-8286345](https://bugs.openjdk.org/browse/JDK-8286345)
>
> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   return exit code if we had to cancel any child processes

> _Mailing list message from [Jonathan Gibbons](mailto:jonathan.gibbons at oracle.com) on [jtreg-dev](mailto:jtreg-dev at mail.openjdk.org):_
> 
> On 11/17/22 12:50 PM, Gerard Ziemski wrote:
> 
> If it's not one test but many, that suggests putting the functionality in code in a test-library, that can be reused by any necessary tests.
> 
> Also, note the possibility of using `@run driver` which is intended as an extension mechanism for customized execution models, such as that you are describing. In conjunction with test library code, it allows you to have complex execution models, which can even be different for different kinds of tests. You described one particular model where a process might time out but the test should still be deemed to have passed; that may be reasonable for the tests you have in mind, but it does not sound general enough or standard enough to be baked into mainline jtreg.
> 
> If you have trouble modelling the behavior you want in a test driver class, then that is a reason to come back here and propose or ask for any necessary enhancements.

The goal here was to "flush" any possible outstanding output, not to make the test finish (either as pass or fail).

And in order to flush any outstanding output, we need to quit any lingering child processes. The other consequence of doing that is that the main test might now be able to continue and possibly finish. For that scenario, you earlier raised the question of whether we should consider the result a pass or a fail. I originally said that it should pass (assuming all of the test's asserts pass). However, pass/fail was never the issue here.

I really just wanted to unblock the error/output pipelines of hanging processes that were about to time out. So the effects of this fix would only apply to those tests that were about to time out anyhow.

I was also trying to make the point that this kind of timeout, where we are missing possible output, affects random tests in a variety of components, so it's not possible to anticipate which ones would need this new feature. I was arguing that we need it system-wide, and I still think that.

We can introduce a switch that would make this optional, but that switch should affect ALL the tests.

On my part, I don't understand why you both think that such an approach should not be part of jtreg. The timeout mechanism itself is part of jtreg. I think it makes perfect sense to extend it to terminate any hanging child processes (jtreg has the knowledge and a built-in mechanism for doing that, which would otherwise need to be re-implemented in any client using jtreg).
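
To make the idea concrete, here is a minimal sketch of what "terminate any hanging child processes" could look like using only the standard `java.lang.ProcessHandle` API. This is not the code from the PR and not jtreg's internal mechanism; the class name `DescendantTerminator`, the method name, and the grace-period parameter are all hypothetical, chosen just for illustration. It asks every live descendant to terminate, waits up to a grace period for them to exit, and forcibly destroys any that remain.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of jtreg or the PR.
final class DescendantTerminator {

    /**
     * Requests termination of all live descendants of {@code parent},
     * waits up to {@code grace} (shared across all of them) for them to exit,
     * and forcibly destroys any that are still running afterwards.
     *
     * @return the number of descendants that were alive when the call started
     */
    static int terminateDescendants(ProcessHandle parent, Duration grace) {
        // Snapshot the live descendants (children, grandchildren, ...).
        List<ProcessHandle> descendants = parent.descendants()
                .filter(ProcessHandle::isAlive)
                .collect(Collectors.toList());

        // First pass: request normal termination of every descendant.
        for (ProcessHandle d : descendants) {
            d.destroy();
        }

        // Second pass: wait for each one until the shared deadline expires.
        long deadline = System.nanoTime() + grace.toNanos();
        for (ProcessHandle d : descendants) {
            long remaining = Math.max(deadline - System.nanoTime(), 0L);
            try {
                d.onExit().get(remaining, TimeUnit.NANOSECONDS);
            } catch (TimeoutException e) {
                // Still running after the grace period: escalate.
                d.destroyForcibly();
            } catch (InterruptedException e) {
                // Preserve the interrupt status and do not wait any longer.
                Thread.currentThread().interrupt();
                d.destroyForcibly();
            } catch (ExecutionException e) {
                // Not expected for onExit(); escalate to be safe.
                d.destroyForcibly();
            }
        }
        return descendants.size();
    }
}
```

A caller that is about to time out a test process could invoke something like `DescendantTerminator.terminateDescendants(testProcess.toHandle(), Duration.ofSeconds(5))` first, and only then proceed with the existing timeout handling. Whether a normal termination request actually gives a child the chance to flush its output depends on the platform and on the child itself, so this is a sketch of the shape of the logic rather than a guarantee.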

Can we continue discussing this?

On my part, I will look into the `@run driver` feature...

-------------

PR: https://git.openjdk.org/jtreg/pull/97