Can continuation support finally solve the "How do I stop this thread" problem?

Alex Otenko oleksandr.otenko at gmail.com
Wed Sep 7 08:05:34 UTC 2022


On a different but somewhat related note: what do we get when we can't
create a new thread? I think we get an OOME.

Is there a way to limit the number of virtual threads platform-wide, so that
we get an error that can be handled in some other way than trying to catch
and analyze an OOME?
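
To make the question concrete, here is a minimal sketch of the kind of
application-level cap I have in mind, assuming a plain Semaphore is an
acceptable way to count live threads. BoundedThreadStarter and
ThreadLimitExceededException are hypothetical names for illustration, not
JDK APIs, and this does nothing about threads created elsewhere in the
process.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadFactory;

/** Hypothetical helper: caps the number of live threads started through it. */
final class BoundedThreadStarter {

    /** Hypothetical exception: raised when the cap is reached, instead of an OOME. */
    static final class ThreadLimitExceededException extends RuntimeException {
        ThreadLimitExceededException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;
    private final ThreadFactory factory;

    BoundedThreadStarter(int maxThreads, ThreadFactory factory) {
        this.permits = new Semaphore(maxThreads);
        this.factory = factory;
    }

    /** Starts the task on a new thread, or fails fast if the cap is reached. */
    Thread start(Runnable task) {
        if (!permits.tryAcquire()) {
            throw new ThreadLimitExceededException("thread limit reached");
        }
        try {
            Thread thread = factory.newThread(() -> {
                try {
                    task.run();
                } finally {
                    // Give the permit back when the task ends, normally or not.
                    permits.release();
                }
            });
            thread.start();
            return thread;
        } catch (RuntimeException | Error e) {
            // The thread never ran, so its finally block will not release the permit.
            permits.release();
            throw e;
        }
    }
}
```

On a JDK with virtual threads, the factory passed in could be
Thread.ofVirtual().factory(); any other ThreadFactory works the same way.
The caller then handles ThreadLimitExceededException as an ordinary
exception rather than trying to recover from an OOME.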

On Tue, 6 Sep 2022, 20:25 Ron Pressler, <ron.pressler at oracle.com> wrote:

> The hidden assumption in what you’re saying is that the process can
> actually recover and behave normally after an OOME, which, perhaps sadly,
> is not really the case. OOME is a special case of VirtualMachineError, and
> the specification ventures awfully close to undefined behaviour when it
> comes to VirtualMachineErrors, and with OOME in particular I can say that
> even the weak guarantees offered by the specification are not always met
> (there are open bugs). The reason why it’s not a problem people commonly
> complain about, and the reason why the restrictions might look ridiculous,
> is that very few programs even try to recover from an OOME. If they did,
> they might well find their attempts to be unsuccessful.
>
> Thread.stop has the same issues, only it was hypothetically intended to be
> used in the normal course of normal programs. It can’t work. To give one
> simple example, we cannot guarantee the execution of finally blocks in the
> presence of Thread.stop — imagine the program just enters a finally block
> and then ThreadDeath occurs: the finally block would be aborted. This is
> really, really bad.
>
> Stack overflow is less of a problem in practice because the OpenJDK VM
> (HotSpot) happens to only emit SOEs at method entry, but that’s not the
> case for OOME or ThreadDeath. There is just no way, with Java’s current
> semantics (or in virtually all other programming languages for that
> matter), to allow this kind of asynchronous, forceful, thread termination
> while still keeping the program in a consistent state. It is a *very* hard
> problem.
>
> — Ron
>
> On 5 Sep 2022, at 07:44, Sam Pullara <spullara at gmail.com> wrote:
>
> I've always thought that the restrictions were somewhat ridiculous
> considering that an interrupt could happen at almost any time with an OOM.
> Basically no one protects against that. Perhaps you could have the same
> rule, where folks can say that the entire process should die on a thread
> kill (as can be set for OOM)? Unblocking blocked calls should be the easiest
> part of fixing this and not even that interesting relative to worrying
> about losing synchronization.
>
> On Fri, Sep 2, 2022 at 2:34 PM Ron Pressler <ron.pressler at oracle.com>
> wrote:
>
>> I don’t think all OS operations have non-blocking alternatives (at least
>> before io_uring), but Alan will know more.
>>
>> Also, for a problem to be worth fixing, it needs to be sufficiently
>> troublesome. I don’t think many servers will run into such issues. The most
>> common operations in servers are network operations and those don’t block
>> OS threads, and uncommon operations are smoothed over by the scheduler.
>>
>> — Ron
>>
>> On 2 Sep 2022, at 19:28, Archie Cobbs <archie.cobbs at gmail.com> wrote:
>>
>> On Fri, Sep 2, 2022 at 11:52 AM Ron Pressler <ron.pressler at oracle.com>
>> wrote:
>>
>>> > So what happens when a virtual thread invokes X? Isn't that going to
>>> > "lock up" the underlying platform thread (or whatever) while X is blocked?
>>>
>>> Yes, and that’s what happens for most filesystem operations. We will
>>> employ io_uring, where available, to use non-blocking filesystem
>>> operations, but until then (or where io_uring is not available) we
>>> compensate by adding more OS threads to the scheduler because there’s
>>> nothing the user can do to avoid it (see JEP 425). User-mode
>>> threads/coroutine implementations in other languages also suffer from this
>>> limitation. User-mode code can only work within the confines of the APIs
>>> provided by the OS.
>>>
>>
>> OK thanks, now I get it. This limitation inherited from the OS is not
>> going to be eliminated or worked around by the new code. So if I create
>> 1,000,000 virtual threads and they all call some blocking operation then
>> I'm probably in trouble :)
>>
>> On UNIX at least, AFAIK all blocking operations have a non-blocking
>> alternative, so *in theory* it would be possible to make everything
>> unblockable, but of course all internal code - including any JNI native
>> code - would have to play along (i.e., be rewritten to use some official
>> system call wrapper API). This would be similar to what the Pth user-mode
>> threading library does, where they wrap all of the blocking system calls
>> with non-blocking versions (link
>> <https://www.gnu.org/software/pth/pth-manual.html#standard_posix_replacement_api>)
>> and use setcontext/getcontext to context switch.
>>
>> There are lots of languages (e.g., Lua) that have the same issue -
>> everything is coroutines, rainbows, and unicorns until some native code
>> somewhere calls read(2) or waitpid(2) or whatever. It would be cool if
>> someday Java was the one language platform that was able to finally fix
>> this, but that's obviously a lot easier said than done. I'm not suggesting
>> doing this, just pointing out that it's possible.
>>
>> -Archie
>>
>> --
>> Archie L. Cobbs
>>
>>
>>
>


More information about the loom-dev mailing list