RFR: 8354560: Exponentially delay subsequent native thread creation in case of EAGAIN

Wed Apr 16 10:43:41 UTC 2025

On Wed, 16 Apr 2025 10:34:18 GMT, Yannik Stradmann <duke at openjdk.org> wrote:

> This change introduces an exponential backoff when hitting `EAGAIN` during native thread creation in hotspot.
> 
> In contrast to the current solution, where we retry creating a native thread up to three times in a tight loop, hotspot will thereby be kinder to an already depleted resource, reduce stress on the kernel and become more robust on systems under high load.
> 
> The proposed modifications to `os_linux.cpp` have substantially improved system stability in a mid-sized Jenkins cluster and have been in production within our systems over the past three years. I have ported these verbatim to the other platforms, which previously also relied on identical logic.

@dholmes-ora Thanks a lot for filing the enhancement request!
I don't have permissions to comment in JBS, so I'll reply via this PR – where we can also discuss the specific implementation.

> Do you have any actual data on how many retries you have had to wait to succeed?

I just spent some time scraping all log files I could find from times before we deployed a custom hotspot build including the exponential backoff to production. Unfortunately, this data is rather sparse – but I have similarly sparse data from right after deployment of the modified JVM. The sparsity will reduce the overall amount of logs I see, but should not alter their distribution, so the numbers below will just be a factor of 10-100 smaller than reality.

For our specific cluster, I have found:
1. with upstream (3 retries, back-to-back): > 4/day fatal errors due to failed native thread starts
2. with this change (exponential backoff): 0 fatal errors due to failed native thread starts (over all times)
3. with this change, up to 9/day retries were logged
4. with this change, the first retry after 1000us was always successful – I did not find a single occasion where we've hit a second sleep.

Based on these observations, I have kept the number of retries unchanged (3) for now.
The last point indicates that my previous parametrization of 1000us for the initial delay was too conservative. I've therefore reduced it to 256us, which will result in a maximum inter-trial delay of 1024us.
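To make the timing concrete, here is a minimal sketch of the retry schedule described above. It is not the code from the PR: the names `kMaxRetries`, `kInitialDelayUs`, `backoff_delay_us` and `retry_on_eagain` are hypothetical, and it assumes one initial attempt followed by up to three retries with the sleep doubling each time (256us, 512us, 1024us).

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical constants mirroring the numbers in this comment.
constexpr int kMaxRetries = 3;             // retries after the initial attempt
constexpr unsigned kInitialDelayUs = 256;  // sleep before the first retry

// Sleep before retry number `retry` (0-based): 256us, 512us, 1024us.
constexpr unsigned backoff_delay_us(int retry) {
  return kInitialDelayUs << retry;
}

// Calls `attempt` once, then retries with exponentially growing sleeps
// for as long as it keeps returning EAGAIN. Returns the last result and
// optionally reports how many retries were needed.
template <typename Attempt>
int retry_on_eagain(Attempt attempt, int* retries_used = nullptr) {
  int ret = attempt();
  int retry = 0;
  while (ret == EAGAIN && retry < kMaxRetries) {
    usleep(backoff_delay_us(retry));  // give the system time to recover
    ret = attempt();
    retry++;
  }
  if (retries_used != nullptr) {
    *retries_used = retry;
  }
  return ret;
}
```

A caller would wrap the actual thread creation, for example `retry_on_eagain([&] { return pthread_create(&tid, &attr, entry, arg); })`. Any result other than `EAGAIN`, success included, ends the loop immediately, so the common case pays no extra delay.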

> When the retries were added in:  
>  https://bugs.openjdk.org/browse/JDK-8268773  
> there was some discussion across a number of bug reports and two PRs about the potential usefulness of even doing a basic retry, as the error condition was considered unlikely to be self-correcting. But as per that original change, adding a delay between retries does no harm other than delaying the ultimate reporting of an error, so it may be okay to put in place if it will do some good.

I agree – if anything, this change would counteract the concerns raised in the original discussion by giving the system time to potentially leave the overloaded state. Besides the slightly delayed error reporting you've mentioned, harm could also come from delayed successful thread starts in applications that rely on low jitter and would prefer instantaneous failure over delayed success. Since I'm not aware of any timing guarantees from `pthread_create()` itself, this feels unlikely to me – but I didn't want to leave it unmentioned.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24682#issuecomment-2809176854