RFR: 8268773: Improvements related to: Failed to start thread - pthread_create failed (EAGAIN)
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Jul 1 08:47:39 UTC 2021
On Thu, Jul 1, 2021 at 9:24 AM David Holmes <david.holmes at oracle.com> wrote:
> Hi Thomas,
>
> Thanks for looking at this.
>
> On 1/07/2021 5:00 pm, Thomas Stuefe wrote:
> > On Thu, 1 Jul 2021 06:24:02 GMT, David Holmes <dholmes at openjdk.org>
> wrote:
> >
> >> Please review this simple enhancement that:
> >>
> >> 1. Retries OS thread creation if it fails due to EAGAIN
> >>
> >> This is potentially of limited use as you would need some resources to
> be released for subsequent calls to succeed.
> >>
> >> 2. Prints the name of the thread being started in the warning/log
> messages
> >>
> >> This is also of limited use as JavaThreads do not have their correct
> name at this stage, nor do some system threads. But others do, so it can be
> informative.
> >>
> >> I looked at trying to (separately) unify this code into a Posix
> version, but the platform differences are such that it makes it very
difficult to try and share code. So this simply updates the existing
> code in place.
> >>
> >> Testing:
> >> - builds from tiers 1-3 and GHA
> >> - manual inspection of output from a simple thread exhaustion test
> (runtime/Thread/ThreadCountLimit.java)
> >> - manual inspection of os+thread logging on java -version
> >>
> >> Thanks,
> >> David
> >
> > Hi David,
> >
> > I am not sure this is useful.
>
> Makes two of us - see comments in JBS. But if it is also not harmful ...
>
> >
> > If you cannot start threads you ran into one of multiple possible
> limits (max number of tasks or out of memory/address space for stacks or
> out of number of vmas to establish guard pages...). In my experience these
> exhaustion cases don't solve themselves quickly. And even in the unlikely
> case a subsequent thread creation succeeds: the next fork/thread creation
> error is right around the corner. It would be better to fail here and let
> the user or admin fix the setup instead of stumbling along.
>
> Yep as I stated very unlikely a retry will help. Bit like redoing a
> failed malloc() hoping someone else might have done a free(). You might
> get lucky but really you've already run off the edge of the cliff.
>
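For reference, a minimal sketch of what a bounded retry on EAGAIN amounts to. This is not the code from the PR; `create_thread_with_retry`, its parameters, and the back-off values are hypothetical and chosen only for illustration. It assumes a POSIX platform. Note that `pthread_create()` returns the error number directly rather than setting `errno`.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <ctime>

// Hypothetical helper (not HotSpot code): call pthread_create() up to
// max_attempts times, retrying only when it reports EAGAIN, and log the
// name of the thread that could not be started on each failed attempt.
static int create_thread_with_retry(pthread_t* tid, const pthread_attr_t* attr,
                                    void* (*entry)(void*), void* arg,
                                    const char* name, int max_attempts) {
  assert(max_attempts > 0);
  int rc = EAGAIN;
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    rc = pthread_create(tid, attr, entry, arg);
    if (rc != EAGAIN) {
      break;  // success (0) or an error that a retry cannot fix
    }
    std::fprintf(stderr, "Failed to start thread \"%s\" (attempt %d of %d): %s\n",
                 name, attempt, max_attempts, std::strerror(rc));
    if (attempt < max_attempts) {
      // Illustrative linear back-off: 1 ms, 2 ms, ...
      struct timespec pause = {0, 1000000L * attempt};
      nanosleep(&pause, nullptr);
    }
  }
  return rc;  // 0 on success, otherwise the last pthread_create() error
}
```

A caller would pass the entry function and a display name and treat a nonzero result as final. As discussed above, a retry only helps if some other party releases resources in the meantime, so the attempt count should stay small.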
> > Printing the name of the particular thread I also don't find that
> helpful (though it does no harm, probably), but I am not sure it is safe to
> call `JavaThread::get_thread_name()` at this point. In these situations it
> is seldom interesting which particular thread could not be started but why
> we hit a limit at all.
>
> It is safe to call and will just delegate to Thread::name() due to the
> new JavaThread not being a "protected" JavaThread. Though there is a
> fair bit of overhead for that call chain ...
>
> I also thought the name was of limited use, but from testing I was
> surprised to see that the thread that failed to be started was not the
> thread I was expecting, so it can have some use. :)
>
> > So what would be good is if we were to log information about the limit
> we hit. In its very simplest form this could be a printout of limits
> (though there are many, e.g. on cgroups level). In a more sophisticated
> form the VM could do some platform specific analysis as to which limit hit
> us. But that is not that easy.
>
> We already print various bits of system information to help in that
> regard. But in any case that would be a different RFE.
>
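To make the "simplest form" concrete, here is a rough sketch of a limits printout using `getrlimit()`. The helper names are hypothetical and this is not what the VM does today. It assumes a platform that defines `RLIMIT_NPROC` (Linux, BSD, macOS). It covers only per-process rlimits; system-wide settings such as `/proc/sys/kernel/threads-max` or `/proc/sys/vm/max_map_count` on Linux, and cgroup limits, would need separate handling.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render one resource limit as
// "NAME: soft=<value>, hard=<value>", or "NAME: unavailable" on error.
static std::string describe_limit(const char* name, int resource) {
  assert(name != nullptr);
  struct rlimit lim;
  if (getrlimit(resource, &lim) != 0) {
    return std::string(name) + ": unavailable";
  }
  auto fmt = [](rlim_t v) {
    return v == RLIM_INFINITY ? std::string("unlimited") : std::to_string(v);
  };
  return std::string(name) + ": soft=" + fmt(lim.rlim_cur) +
         ", hard=" + fmt(lim.rlim_max);
}

// Hypothetical helper: print the rlimits most likely to matter when
// thread creation fails (task count, address space, stack size).
static void print_thread_related_limits(FILE* out) {
  std::fprintf(out, "%s\n", describe_limit("RLIMIT_NPROC", RLIMIT_NPROC).c_str());
  std::fprintf(out, "%s\n", describe_limit("RLIMIT_AS", RLIMIT_AS).c_str());
  std::fprintf(out, "%s\n", describe_limit("RLIMIT_STACK", RLIMIT_STACK).c_str());
}
```

Calling `print_thread_related_limits(stderr)` right after a failed thread start would show the values in effect, though it still does not tell you which limit was actually hit.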
The more I think about this the more I think the right way would be to
enable (some form of) CrashOnOutOfMemoryError for thread resource
exhaustion. You'd get the hs-err file (which can also be streamed to stdout
with ErrorFileToStdout).
Remember? :) We do that downstream now, btw:
https://github.com/SAP/SapMachine/wiki/Handling-of-OnOutOfMemoryError-switches-in-the-SapMachine
- and our support is _very_ happy with that behavior, not only the fact
that we can stop on thread exhaustion, but also e.g. that we suppress cores
on OOM crashes by default.
Cheers, Thomas
> > -------------
> >
> > PR: https://git.openjdk.java.net/jdk/pull/4648
> >
>