Exponentially delay subsequent native thread creation in case of EAGAIN

David Holmes david.holmes at oracle.com
Tue Apr 15 03:07:30 UTC 2025


Hi Yannik,

On 15/04/2025 2:22 am, Yannik Stradmann wrote:
> Hello everyone,
> 
> I'd like to propose a change to hotspot's error handling when spawning native threads in os::create_thread().
> 
> Currently, if EAGAIN is encountered, we retry three times back-to-back.
> 
> During recent years, I've experienced instabilities on certain systems, where back-to-back (re-)requests of native threads kept hitting the depleted resource pool and, eventually, failed.
> 
> I therefore propose to introduce an exponential backoff when hitting EAGAIN during native thread creation. Hotspot will thereby be more kind to an already depleted resource, reduce stress on the kernel and become more robust on systems under high load.
> 
> For reference, I am attaching a patch against os_linux.cpp, which has been running in production on a mid-scale Jenkins cluster over the past three years. If you approve the modification, I'm happy to create a pull request that includes the other platforms (where applicable).
> The current choice of constants is arbitrary and I'd welcome any suggestions here.

This is not an unreasonable idea. But it is very hard to evaluate the 
effectiveness of such a change. Do you have any actual data on how many 
retries you have had to wait to succeed?

When the retries were added in:

https://bugs.openjdk.org/browse/JDK-8268773

there was some discussion across a number of bug reports and two PRs 
about the potential usefulness of even doing a basic retry as the error 
condition was considered unlikely to be self-correcting. But as per 
that original change, adding a delay between retries does no harm other 
than delaying the ultimate reporting of an error, so it may be okay to 
put in place if it will do some good.
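
To put a number on that delay, here is a minimal sketch that replays the
control flow of the proposed loop under the assumption that every
attempt fails with EAGAIN. It only adds up the delays; it does not call
pthread_create or usleep. The names BackoffSummary and worst_case are
made up for this illustration and are not HotSpot code.

```cpp
#include <cstdint>

// Worst-case summary: every attempt is assumed to fail with EAGAIN.
struct BackoffSummary {
  int attempts;            // number of (simulated) pthread_create calls
  int sleeps;              // number of (simulated) usleep calls
  std::uint64_t total_us;  // sum of all delays, in microseconds
  std::uint64_t last_us;   // longest single delay, in microseconds
};

// Hypothetical helper mirroring the loop in the patch: try, give up once
// the retry budget is spent, otherwise wait and double the delay up to
// max_delay_us.
inline BackoffSummary worst_case(int limit,
                                 std::uint64_t delay_us,
                                 std::uint64_t max_delay_us) {
  BackoffSummary s{0, 0, 0, 0};
  while (true) {
    ++s.attempts;            // pthread_create would return EAGAIN here
    if (limit-- <= 0) {
      break;                 // retry budget exhausted, report the error
    }
    ++s.sleeps;              // usleep(delay_us) would happen here
    s.total_us += delay_us;
    s.last_us = delay_us;
    delay_us *= 2;
    if (delay_us > max_delay_us) {
      delay_us = max_delay_us;
    }
  }
  return s;
}
```

With the constants from the patch, worst_case(5, 1000, 1000000) gives 6
attempts and 5 sleeps (1, 2, 4, 8 and 16 ms), so the error is reported
about 31 ms later than without the delays, and the 1 s cap is never
reached.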

I've filed an enhancement request on your behalf:

https://bugs.openjdk.org/browse/JDK-8354560

> Please note that this is my first time contributing to OpenJDK; please excuse any unfamiliarity with the process.

Please see the following:

https://openjdk.org/guide/

Thanks,
David

> 
> Yannik
> 
> 
> diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp
> index 4e26797cd5b..2858fbba247 100644
> --- a/src/hotspot/os/linux/os_linux.cpp
> +++ b/src/hotspot/os/linux/os_linux.cpp
> @@ -1064,10 +1064,28 @@ bool os::create_thread(Thread* thread, ThreadType thr_type,
>       ResourceMark rm;
>       pthread_t tid;
>       int ret = 0;
> -    int limit = 3;
> -    do {
> +    int limit = 5;
> +    useconds_t delay = 1'000;
> +    constexpr useconds_t max_delay = 1'000'000;
> +
> +    while (true) {
>         ret = pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);
> -    } while (ret == EAGAIN && limit-- > 0);
> +
> +      if (ret != EAGAIN) {
> +          break;
> +      }
> +
> +      if (limit-- <= 0) {
> +          break;
> +      }
> +
> +      log_warning(os, thread)("Failed to start native thread (%s), retrying after %uus.", os::errno_name(ret), delay);
> +      ::usleep(delay);
> +      delay *= 2;
> +      if (delay > max_delay) {
> +          delay = max_delay;
> +      }
> +    }
>   
>       char buf[64];
>       if (ret == 0) {



More information about the hotspot-runtime-dev mailing list