Can continuation support finally solve the "How do I stop this thread" problem?

Ron Pressler ron.pressler at oracle.com
Tue Sep 6 19:24:51 UTC 2022


The hidden assumption in what you’re saying is that the process can actually recover and behave normally after an OOME, which, perhaps sadly, is not really the case. OOME is a special case of VirtualMachineError, and the specification ventures awfully close to undefined behaviour when it comes to VirtualMachineErrors, and with OOME in particular I can say that even the weak guarantees offered by the specification are not always met (there are open bugs). The reason why it’s not a problem people commonly complain about, and the reason why the restrictions might look ridiculous, is that very few programs even try to recover from an OOME. If they did, they might well find their attempts to be unsuccessful.

Thread.stop has the same issues, only it was hypothetically intended to be used in the normal course of normal programs. It can’t work. To give one simple example, we cannot guarantee the execution of finally blocks in the presence of Thread.stop: imagine the program has just entered a finally block when ThreadDeath is thrown. The rest of the finally block would be aborted. This is really, really bad.
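
To make the finally-block hazard concrete, here is a minimal sketch in Java. It does not call Thread.stop; instead it simulates the asynchronous error by throwing from the middle of the finally block, at the point where a ThreadDeath could have been delivered. The class name FinallyAbortDemo, the SimulatedAsyncDeath error type and the log entries are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class FinallyAbortDemo {

    /** Hypothetical stand-in for ThreadDeath; not a JDK class. */
    static final class SimulatedAsyncDeath extends Error {
        private static final long serialVersionUID = 1L;
    }

    /**
     * Runs a try/finally with two cleanup steps and records what executed.
     * When deliverAsyncError is true, an error is thrown between the two
     * cleanup steps, mimicking an asynchronous ThreadDeath arriving there.
     */
    static List<String> run(boolean deliverAsyncError) {
        List<String> log = new ArrayList<>();
        try {
            try {
                log.add("acquire");
            } finally {
                log.add("cleanup-step-1");
                if (deliverAsyncError) {
                    // Assumption: this is where the asynchronous error lands.
                    throw new SimulatedAsyncDeath();
                }
                log.add("cleanup-step-2");
            }
        } catch (SimulatedAsyncDeath e) {
            // The thread would normally die here; we only record the fact.
            log.add("thread-dying");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("normal:  " + run(false));
        System.out.println("aborted: " + run(true));
    }
}
```

In the aborted run the second cleanup step never executes, so whatever it was meant to release (a lock, a file handle, an invariant) stays in its intermediate state. With a real asynchronous error the interruption point would not be chosen by the program, which is what makes it impossible to defend against.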

Stack overflow is less of a problem in practice because the OpenJDK VM (HotSpot) happens to only emit SOEs at method entry, but that’s not the case for OOME or ThreadDeath. There is just no way, with Java’s current semantics (or in virtually all other programming languages for that matter), to allow this kind of asynchronous, forceful, thread termination while still keeping the program in a consistent state. It is a *very* hard problem.
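
As a rough illustration of why stack overflow tends to be survivable in practice, the sketch below recurses until a StackOverflowError is thrown, catches it in the caller, and then carries on with ordinary work. It does not prove where HotSpot raises the error; it only shows the commonly observed behaviour that a program can keep running after one. The class and method names are hypothetical.

```java
public class StackOverflowRecoveryDemo {

    // Number of frames entered before the stack ran out.
    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // unbounded recursion, guaranteed to exhaust the stack
    }

    /** Recurses until the stack overflows and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here the deep frames have been unwound.
        }
        return depth;
    }

    /** Ordinary work used to check that the thread is still usable. */
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int reached = measureDepth();
        System.out.println("overflowed after " + reached + " frames");
        System.out.println("still working: 2 + 3 = " + add(2, 3));
    }
}
```

The depth reached depends on the thread's stack size and on the JIT, so it varies between runs. Nothing comparable can be written for OOME or ThreadDeath, because those can surface at arbitrary points rather than at a call boundary.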

— Ron

On 5 Sep 2022, at 07:44, Sam Pullara <spullara at gmail.com> wrote:

I've always thought that the restrictions were somewhat ridiculous considering that an interrupt could happen at almost any time with an OOM. Basically no one protects against that. Perhaps you have the same rule where folks can say that the entire process should die on a thread kill (like can be set for OOM)? Unblocking blocked calls should be the easiest part of fixing this and not even that interesting relative to worrying about losing synchronization.

On Fri, Sep 2, 2022 at 2:34 PM Ron Pressler <ron.pressler at oracle.com> wrote:
I don’t think all OS operations have non-blocking alternatives (at least before io_uring), but Alan will know more.

Also, for a problem to be worth fixing, it needs to be sufficiently troublesome. I don’t think many servers will run into such issues. The most common operations in servers are network operations and those don’t block OS threads, and uncommon operations are smoothed over by the scheduler.

— Ron

On 2 Sep 2022, at 19:28, Archie Cobbs <archie.cobbs at gmail.com> wrote:

On Fri, Sep 2, 2022 at 11:52 AM Ron Pressler <ron.pressler at oracle.com> wrote:
> So what happens when a virtual thread invokes X? Isn't that going to "lock up" the underlying platform thread (or whatever) while X is blocked?

Yes, and that’s what happens for most filesystem operations. We will employ io_uring, where available, to use non-blocking filesystem operations, but until then (or where io_uring is not available) we compensate by adding more OS threads to the scheduler because there’s nothing the user can do to avoid it (see JEP 425). User-mode threads/coroutine implementations in other languages also suffer from this limitation. User-mode code can only work within the confines of the APIs provided by the OS.

OK thanks, now I get it. This limitation inherited from the OS is not going to be eliminated or worked around by the new code. So if I create 1,000,000 virtual threads and they all call some blocking operation then I'm probably in trouble :)

On UNIX at least, AFAIK all blocking operations have a non-blocking alternative, so in theory it would be possible to make everything unblockable, but of course all internal code - including any JNI native code - would have to play along (i.e., be rewritten to use some official system call wrapper API). This would be similar to what the Pth user-mode threading library does, where they wrap all of the blocking system calls with non-blocking versions (https://www.gnu.org/software/pth/pth-manual.html#standard_posix_replacement_api) and use setcontext/getcontext to context switch.

There are lots of languages (e.g., Lua) that have the same issue - everything is coroutines, rainbows, and unicorns until some native code somewhere calls read(2) or waitpid(2) or whatever. It would be cool if someday Java was the one language platform that was able to finally fix this, but that's obviously a lot easier said than done. I'm not suggesting doing this, just pointing out that it's possible.

-Archie

--
Archie L. Cobbs


More information about the loom-dev mailing list