RFR: 8354560: Exponentially delay subsequent native thread creation in case of EAGAIN [v3]
David Holmes
dholmes at openjdk.org
Thu May 8 07:23:56 UTC 2025
On Tue, 6 May 2025 22:52:59 GMT, Yannik Stradmann <duke at openjdk.org> wrote:
>> This change introduces an exponential backoff when hitting `EAGAIN` during native thread creation in hotspot.
>>
>> In contrast to the current solution, where we retry to create a native thread up to three times in a tight loop, hotspot will thereby be more kind to an already depleted resource, reduce stress on the kernel and become more robust on systems under high load.
>>
>> The proposed modifications to `os_linux.cpp` have substantially improved system stability in a mid-sized Jenkins cluster and have been in production within our systems over the past three years. I have verbatim ported these to the other platforms, which previously also relied on identical logic.
>
> Yannik Stradmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Merge remote-tracking branch 'upstream/master' into robust_pthread
> - Fix build on Windows: Sleep() only accepts milliseconds
> - Exponentially delay native thread creation retries
Sorry for the delay but I have been on vacation (and no one else picked this up).
Seems okay in principle but a couple of small issues below.
Thanks
src/hotspot/os/bsd/os_bsd.cpp line 647:
> 645: pthread_t tid;
> 646: int ret = 0;
> 647: {
Why the extra block scope?
src/hotspot/os/bsd/os_bsd.cpp line 661:
> 659: }
> 660:
> 661: log_warning(os, thread)("Failed to start native thread (%s), retrying after %dus.", os::errno_name(ret), next_delay);
I don't think we want to issue a warning unless we completely fail to start the native thread. For debugging purposes this may be better as a log_debug.
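To make the logging suggestion concrete, here is a minimal standalone sketch of the retry pattern under discussion: the delay doubles after each `EAGAIN`, retries are reported at debug level, and a warning appears only once every attempt has failed. It is not the code from the PR. `create_with_backoff`, its parameters, the default of four attempts and the 1000us starting delay are all assumptions made for illustration, and plain `fprintf` stands in for HotSpot's `log_debug`/`log_warning` macros.

```cpp
#include <cerrno>
#include <cstdio>
#include <functional>

// Hypothetical helper (not HotSpot code): calls `create` until it returns
// something other than EAGAIN or until `max_attempts` is reached, doubling
// the delay between attempts. `sleep_us` is injected so that callers (and
// tests) decide how the delay is actually carried out.
int create_with_backoff(const std::function<int()>& create,
                        const std::function<void(unsigned)>& sleep_us,
                        int max_attempts = 4,               // assumed value
                        unsigned initial_delay_us = 1000) { // assumed value
  unsigned next_delay = initial_delay_us;
  int ret = 0;
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    ret = create();
    if (ret != EAGAIN) {
      break;  // success, or an error that retrying will not fix
    }
    if (attempt == max_attempts) {
      // Warn only when we have given up completely.
      std::fprintf(stderr,
                   "[warning] Failed to start native thread (EAGAIN) after %d attempts.\n",
                   attempt);
      break;
    }
    // Intermediate failures are only interesting when debugging.
    std::fprintf(stderr,
                 "[debug] Failed to start native thread (EAGAIN), retrying after %uus.\n",
                 next_delay);
    sleep_us(next_delay);
    next_delay *= 2;  // exponential backoff
  }
  return ret;
}
```

A caller on a POSIX platform could pass a lambda wrapping `pthread_create` as `create` and a lambda that sleeps for the given number of microseconds as `sleep_us`; on Windows the delay would have to be rounded to milliseconds, as the second commit above notes.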
-------------
Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24682#pullrequestreview-2824072686
PR Review Comment: https://git.openjdk.org/jdk/pull/24682#discussion_r2079056634
PR Review Comment: https://git.openjdk.org/jdk/pull/24682#discussion_r2079059986