RFR: 8268773: Improvements related to: Failed to start thread - pthread_create failed (EAGAIN)

David Holmes david.holmes at oracle.com
Thu Jul 1 07:24:13 UTC 2021


Hi Thomas,

Thanks for looking at this.

On 1/07/2021 5:00 pm, Thomas Stuefe wrote:
> On Thu, 1 Jul 2021 06:24:02 GMT, David Holmes <dholmes at openjdk.org> wrote:
> 
>> Please review this simple enhancement that:
>>
>> 1. Retries OS thread creation if it fails due to EAGAIN
>>
>> This is potentially of limited use as you would need some resources to be released for subsequent calls to succeed.
>>
>> 2. Prints the name of the thread being started in the warning/log messages
>>
>> This is also of limited use as JavaThreads do not have their correct name at this stage, nor do some system threads. But others do, so it can be informative.
>>
>> I looked at trying to (separately) unify this code into a POSIX version, but the platform differences make it very difficult to share code. So this simply updates each platform's existing code in place.
>>
>> Testing:
>>    - builds from tiers 1-3 and GHA
>>    - manual inspection of output from a simple thread exhaustion test (runtime/Thread/ThreadCountLimit.java)
>>    - manual inspection of os+thread logging on java -version
>>
>> Thanks,
>> David
> 
> Hi David,
> 
> I am not sure this is useful.

That makes two of us; see my comments in JBS. But if it is also not harmful ...

> 
> If you cannot start threads, you have run into one of multiple possible limits (max number of tasks, out of memory/address space for stacks, or out of VMAs to establish guard pages...). In my experience these exhaustion cases don't resolve themselves quickly. And even in the unlikely case that a subsequent thread creation succeeds, the next fork/thread creation error is right around the corner. It would be better to fail here and let the user or admin fix the setup instead of stumbling along.

Yep, as I stated, it is very unlikely a retry will help. It is a bit like
redoing a failed malloc() in the hope that someone else has done a free()
in the meantime. You might get lucky, but really you've already run off
the edge of the cliff.

> I also don't find printing the name of the particular thread that helpful (though it probably does no harm), and I am not sure it is safe to call `JavaThread::get_thread_name()` at this point. In these situations it is seldom interesting which particular thread could not be started; what matters is why we hit a limit at all.

It is safe to call; it will just delegate to Thread::name() because the
new JavaThread is not a "protected" JavaThread. There is a fair bit of
overhead in that call chain, though ...

I also thought the name was of limited use, but from testing I was 
surprised to see that the thread that failed to be started was not the 
thread I was expecting, so it can have some use. :)

> What would be good is to log information about the limit we hit. In its very simplest form this could be a printout of limits (though there are many, e.g. at the cgroups level). In a more sophisticated form the VM could do some platform-specific analysis as to which limit hit us. But that is not that easy.

We already print various bits of system information to help in that 
regard. But in any case that would be a different RFE.

Cheers,
David

> Cheers, Thomas
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/4648
> 


More information about the hotspot-runtime-dev mailing list