RFR: 8318839: Update test thread factory to catch all exceptions
Jaikiran Pai
jpai at openjdk.org
Tue Oct 31 05:59:31 UTC 2023
On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:
> The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526.
> Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory.
>
> A few tests start failing.
>
> The test
> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java
> has testcases for platform and virtual threads. So, there is there's no need to run it with the thread factory.
>
> The test
> java/lang/Thread/virtual/ThreadAPI.java
> tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory.
>
> Test
> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler.
>
> Probably, we need some common approach about dealing with the UncaughtExceptionHandler in jtreg.
Hello Leonid,
> So this pr is just temporary fix until jtreg is updated.
> ...
> I could withdraw this PR, but not sure what are the risks/issues if I integrate it. We are going just to have a ugly error reporting for the uncaught threads when test thread factory is used or missed something?
I think there is more than one problem with the proposed approach and those problems outweigh any usefulness this change might bring in.
Specifically, this PR introduces a call to `System.exit()` which terminates a JVM. When that happens, it's not the ugly error reporting which is a problem, but the fact that tests which get abruptly terminated like this will have no way to determine what went wrong. For example, you will see:
> result: Error. Agent communication error: java.io.EOFException; check console log for any additional details
>
>
> test result: Error. Agent communication error: java.io.EOFException; check console log for any additional details
Whether or not there will be any additional logs anywhere is uncertain because jtreg and other infrastructure which does any kind of buffering of logs and other state wouldn't have had a chance to properly report the state/logs. It's almost as if the (remote) JVM has crashed. In fact even genuine failures from the test may not get reported and instead might get replaced by this generic JVM termination reporting.
Furthermore, in its current form the System.exit() gets called when an uncaught exception gets thrown from some thread. When this happens, the test itself might not have completed (unlike in the case of the platform threads, where jtreg just keeps track of the uncaught exception and when the test is finally complete, it uses that state to do the decision of failing or passing the test).
Then there is the case where the JVM level uncaught exception handler is being changed in this proposal. What that means is, when run with a virtual thread, this will now impact tests which (rightly) expect that the default uncaught exception handler would be null. One such example is the `test/jdk/java/util/concurrent/tck/ThreadTest.java` which is being updated in this PR. Not even using `/othervm` in those tests will get them to a state where they can expect the default uncaught exception handler to be null because even in `/othervm` scenarios, this proposed change will have set the default uncaught exception handler.
If we leave out setting the default uncaught exception handler and the call to System.exit() from this PR, then what remains is just the exception logging and stacktrace printing. That part is currently anyway available even without this proposed change, because the "system" `ThreadGroup` does exactly that https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ThreadGroup.java#L696. In fact, if we currently run the tests using this virtual thread factory and then once the tests complete, if we run a search for this error mesage from the "system" ThreaGroup, we should be able to identify all those tests which had a thread throw some uncaught exception.
Given all this, I think this PR isn't introducing anything new that will help us even in the short term.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1786498722
More information about the core-libs-dev
mailing list