RFR: 8268773: Improvements related to: Failed to start thread - pthread_create failed (EAGAIN)
Thomas Stuefe
stuefe at openjdk.java.net
Thu Jul 1 07:00:03 UTC 2021
On Thu, 1 Jul 2021 06:24:02 GMT, David Holmes <dholmes at openjdk.org> wrote:
> Please review this simple enhancement that:
>
> 1. Retries OS thread creation if it fails due to EAGAIN
>
> This is potentially of limited use as you would need some resources to be released for subsequent calls to succeed.
>
> 2. Prints the name of the thread being started in the warning/log messages
>
> This is also of limited use as JavaThreads do not have their correct name at this stage, nor do some system threads. But others do, so it can be informative.
>
> I looked at trying to (separately) unify this code into a Posix version, but the platform differences are such that it is very difficult to share code. So this simply updates the existing code in place on each platform.
>
> Testing:
> - builds from tiers 1-3 and GHA
> - manual inspection of output from a simple thread exhaustion test (runtime/Thread/ThreadCountLimit.java)
> - manual inspection of os+thread logging on java -version
>
> Thanks,
> David
Hi David,
I am not sure this is useful.
If you cannot start threads, you have run into one of several possible limits (max number of tasks, out of memory or address space for stacks, out of VMAs to establish guard pages, ...). In my experience these exhaustion cases don't resolve themselves quickly. And even in the unlikely case that a subsequent thread creation succeeds, the next fork/thread creation error is right around the corner. It would be better to fail here and let the user or admin fix the setup instead of stumbling along.
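For reference, the kind of retry under discussion looks roughly like the sketch below. This is not the code from the PR; the helper name `create_thread_with_retry`, the attempt count, and the pause length are all made up for illustration. It only retries on EAGAIN and gives up immediately on any other error.

```cpp
#include <pthread.h>
#include <time.h>
#include <cerrno>
#include <cassert>

// Hypothetical helper (not HotSpot code): call pthread_create and retry a
// bounded number of times if it fails with EAGAIN. Returns 0 on success or
// the last error number reported by pthread_create.
static int create_thread_with_retry(pthread_t* tid, const pthread_attr_t* attr,
                                    void* (*entry)(void*), void* arg,
                                    int max_attempts) {
  assert(max_attempts > 0);
  int rc = EAGAIN;
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    // pthread_create returns the error number directly; it does not set errno.
    rc = pthread_create(tid, attr, entry, arg);
    if (rc != EAGAIN) {
      break;  // success, or an error that retrying will not fix
    }
    if (attempt < max_attempts) {
      // Assumed back-off: pause for 'attempt' milliseconds before retrying.
      struct timespec delay;
      delay.tv_sec = 0;
      delay.tv_nsec = attempt * 1000000L;
      nanosleep(&delay, nullptr);
    }
  }
  return rc;
}
```

Note that the retry only helps if some other thread or process releases resources during the pause, which is exactly the case I consider unlikely.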
I also don't find printing the name of the particular thread that helpful (though it probably does no harm), and I am not sure it is safe to call `JavaThread::get_thread_name()` at this point. In these situations it is seldom interesting which particular thread could not be started; what matters is why we hit a limit at all.
So what would be good is to log information about the limit we hit. In its simplest form this could be a printout of the limits (though there are many, e.g. at the cgroup level). In a more sophisticated form the VM could do some platform-specific analysis of which limit we hit. But that is not that easy.
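To illustrate the simplest form, here is a rough sketch that prints a few per-process resource limits via `getrlimit`. The helper names `describe_rlimit` and `print_thread_related_limits` are hypothetical and not part of HotSpot, and the choice of limits is my assumption about what is relevant to thread creation. `RLIMIT_NPROC` is not available on every platform, hence the `#ifdef`.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Render a single limit value, using "unlimited" for RLIM_INFINITY.
static void format_limit_value(rlim_t v, char* out, size_t outlen) {
  if (v == RLIM_INFINITY) {
    snprintf(out, outlen, "unlimited");
  } else {
    snprintf(out, outlen, "%llu", (unsigned long long)v);
  }
}

// Hypothetical helper: write "<name>: soft=<x> hard=<y>" into buf.
// Returns false (and writes a short diagnostic) if getrlimit fails.
static bool describe_rlimit(int resource, const char* name,
                            char* buf, size_t buflen) {
  assert(buf != nullptr && buflen > 0);
  struct rlimit rl;
  if (getrlimit(resource, &rl) != 0) {
    snprintf(buf, buflen, "%s: <unavailable: %s>", name, strerror(errno));
    return false;
  }
  char soft[32];
  char hard[32];
  format_limit_value(rl.rlim_cur, soft, sizeof(soft));
  format_limit_value(rl.rlim_max, hard, sizeof(hard));
  snprintf(buf, buflen, "%s: soft=%s hard=%s", name, soft, hard);
  return true;
}

// Print the limits that are assumed here to matter for thread creation.
static void print_thread_related_limits(FILE* out) {
  char line[128];
  describe_rlimit(RLIMIT_STACK, "RLIMIT_STACK", line, sizeof(line));
  fprintf(out, "%s\n", line);
  describe_rlimit(RLIMIT_AS, "RLIMIT_AS", line, sizeof(line));
  fprintf(out, "%s\n", line);
#ifdef RLIMIT_NPROC
  describe_rlimit(RLIMIT_NPROC, "RLIMIT_NPROC", line, sizeof(line));
  fprintf(out, "%s\n", line);
#endif
}
```

This only covers rlimits. On Linux, system-wide settings such as `/proc/sys/kernel/threads-max` and `/proc/sys/vm/max_map_count`, or a cgroup `pids.max`, can also be the limit that was hit, and none of them show up here.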
Cheers, Thomas
-------------
PR: https://git.openjdk.java.net/jdk/pull/4648