Exponentially delay subsequent native thread creation in case of EAGAIN

Yannik Stradmann yjs at stradmann.name
Mon Apr 14 16:22:40 UTC 2025


Hello everyone,

I'd like to propose a change to HotSpot's error handling when spawning native threads in os::create_thread().

Currently, if EAGAIN is encountered, we retry three times back-to-back.

In recent years, I've seen instabilities on certain systems where these back-to-back retries of native thread creation kept hitting the same depleted resource pool and eventually failed.

I therefore propose introducing an exponential backoff when hitting EAGAIN during native thread creation. HotSpot would thereby be kinder to an already depleted resource, reduce stress on the kernel, and become more robust on systems under high load.
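
To illustrate the idea independently of HotSpot, here is a minimal standalone sketch. The helper name retry_with_backoff, its parameters, and the injectable sleep callback are made up for this example and are not part of the patch or of any HotSpot API; the default values simply mirror the constants used in the patch below.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical helper (illustration only, not HotSpot code): calls `attempt`
// and, while it returns EAGAIN, waits via `sleep_us` before trying again.
// The wait starts at `initial_delay_us`, doubles after every sleep, and is
// capped at `max_delay_us`. At most `max_retries` retries are made.
int retry_with_backoff(const std::function<int()>& attempt,
                       const std::function<void(unsigned)>& sleep_us,
                       int max_retries = 5,
                       unsigned initial_delay_us = 1000,
                       unsigned max_delay_us = 1000000) {
  unsigned delay = initial_delay_us;
  int ret = attempt();
  for (int retry = 0; ret == EAGAIN && retry < max_retries; ++retry) {
    sleep_us(delay);
    // Double the delay without overflowing, then clamp to the maximum.
    delay = (delay > max_delay_us / 2) ? max_delay_us : delay * 2;
    ret = attempt();
  }
  return ret;  // 0 on success, EAGAIN if all retries failed, or another error
}
```

Passing the sleep function as a parameter is only a convenience for testing the sketch; in the actual patch the loop calls ::usleep() directly.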

For reference, I am attaching a patch against os_linux.cpp, which has been running in production on a mid-scale Jenkins cluster over the past three years. If you approve the modification, I'm happy to create a pull request that includes the other platforms (where applicable).
The current choice of constants is arbitrary and I'd welcome any suggestions here.
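
To put the constants into perspective, the following small sketch mirrors the arithmetic of the loop in the patch. The function name backoff_schedule is invented for this illustration and does not appear in the patch.

```cpp
#include <vector>

// Hypothetical helper (illustration only): returns the delays, in
// microseconds, that the patched loop would sleep between attempts.
// The delay starts at `initial_us`, doubles after each sleep, and is
// clamped to `max_us`; `limit` is the number of sleeps before giving up.
std::vector<unsigned> backoff_schedule(int limit, unsigned initial_us,
                                       unsigned max_us) {
  std::vector<unsigned> delays;
  unsigned delay = initial_us;
  for (int i = 0; i < limit; ++i) {
    delays.push_back(delay);
    delay *= 2;
    if (delay > max_us) {
      delay = max_us;
    }
  }
  return delays;
}
```

With the values from the patch (limit = 5, initial delay 1000 us, maximum 1000000 us) this yields 1000, 2000, 4000, 8000 and 16000 us, so a thread creation that ultimately fails is delayed by about 31 ms in total. The one-second cap is never reached with these values; it would only take effect from the eleventh sleep onwards.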


Please note that this is my first time contributing to OpenJDK, so please excuse any unfamiliarity with the process.

Yannik


diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp
index 4e26797cd5b..2858fbba247 100644
--- a/src/hotspot/os/linux/os_linux.cpp
+++ b/src/hotspot/os/linux/os_linux.cpp
@@ -1064,10 +1064,28 @@ bool os::create_thread(Thread* thread, ThreadType thr_type,
     ResourceMark rm;
     pthread_t tid;
     int ret = 0;
-    int limit = 3;
-    do {
+    int limit = 5;
+    useconds_t delay = 1'000;
+    constexpr useconds_t max_delay = 1'000'000;
+
+    while (true) {
       ret = pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);
-    } while (ret == EAGAIN && limit-- > 0);
+
+      if (ret != EAGAIN) {
+        break;
+      }
+
+      if (limit-- <= 0) {
+        break;
+      }
+
+      log_warning(os, thread)("Failed to start native thread (%s), retrying after %u us.", os::errno_name(ret), delay);
+      ::usleep(delay);
+      delay *= 2;
+      if (delay > max_delay) {
+        delay = max_delay;
+      }
+    }
 
     char buf[64];
     if (ret == 0) {


More information about the hotspot-runtime-dev mailing list