RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests
Y. Srinivas Ramakrishna
ysr at openjdk.org
Thu Dec 12 02:29:35 UTC 2024
On Wed, 11 Dec 2024 22:32:00 GMT, William Kemper <wkemper at openjdk.org> wrote:
> I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible.
src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 51:
> 49: ThreadTimeAccumulator() : total_time(0) {}
> 50: void do_thread(Thread* thread) override {
> 51: if (!thread->has_terminated()) {
There's an inherent race here at destruction time because the target thread may terminate between the check and the CPU time call. So you've narrowed the race window but not closed it.
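To make the check-then-use window concrete, here is a minimal sketch. It does not use HotSpot code: `FakeThread`, `sample_hits_dead_thread`, and the `between_check_and_use` hook are hypothetical names invented for illustration, and the hook merely stands in for whatever the scheduler lets happen between the two steps.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical stand-in for a thread object; NOT the real HotSpot Thread class.
struct FakeThread {
  std::atomic<bool> terminated{false};
  bool has_terminated() const { return terminated.load(); }
};

// Returns true if the "use" step ran against a thread that had already
// terminated, i.e. the situation the guard was meant to prevent.
// between_check_and_use models anything that can happen in the gap.
inline bool sample_hits_dead_thread(FakeThread& t,
                                    const std::function<void()>& between_check_and_use) {
  if (!t.has_terminated()) {      // check: thread looks alive
    between_check_and_use();      // window: the thread may terminate here
    return t.has_terminated();    // use: the real code would query CPU time here
  }
  return false;                   // guard skipped the thread entirely
}

// Usage: the guard passes, yet the thread is gone by the time it is used.
inline void demo_narrowed_but_open_window() {
  FakeThread t;
  bool hit = sample_hits_dead_thread(t, [&t] { t.terminated.store(true); });
  assert(hit);
}
```

The guard only helps when termination happens before the check. Anything that lands inside the window still reaches the CPU time call.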
Note that today this is called only on GC-worker-like threads (including the controller, regulator, and worker threads).
I agree that the crashes are likely occurring during shutdown, just as you surmised. I'd suggest looking at the constructor and destructor (enroll and disenroll) of the MMU Tracker Task, and disenrolling it before the GC workers et al. are shut down. That would be the cleanest and most surgical fix, and it would close the race.
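The ordering argument can be sketched as follows. This is a toy model, not the actual Shenandoah code: `Workers`, `MmuTask`, `tick`, and the two `shutdown_*` functions are hypothetical stand-ins, and only the `enroll`/`disenroll` names echo the discussion above.

```cpp
#include <cassert>

// Hypothetical stand-in for the GC worker threads.
struct Workers {
  bool alive = true;
  void shutdown() { alive = false; }
};

// Hypothetical stand-in for the periodic MMU sampling task.
struct MmuTask {
  Workers& workers;
  bool enrolled = false;
  int unsafe_samples = 0;  // samples taken after the workers terminated

  explicit MmuTask(Workers& w) : workers(w) {}
  void enroll()    { enrolled = true; }
  void disenroll() { enrolled = false; }

  // Invoked by the (imagined) periodic-task machinery on every tick.
  void tick() {
    if (!enrolled) return;                   // a disenrolled task never samples
    if (!workers.alive) ++unsafe_samples;    // would touch a dead thread
  }
};

// Order suggested in the review: stop sampling first, then stop the workers.
inline int shutdown_disenroll_first() {
  Workers w;
  MmuTask task(w);
  task.enroll();
  task.tick();        // normal sample while workers are alive
  task.disenroll();
  assert(!task.enrolled);
  w.shutdown();
  task.tick();        // a late tick is now a no-op
  return task.unsafe_samples;
}

// Problematic order: the workers go away while the task is still enrolled.
inline int shutdown_workers_first() {
  Workers w;
  MmuTask task(w);
  task.enroll();
  w.shutdown();
  task.tick();        // tick lands after the workers are gone
  task.disenroll();
  return task.unsafe_samples;
}
```

In this model the first order can never sample a terminated worker, no matter when a tick arrives, whereas the second leaves a window between worker shutdown and disenrollment.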
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1881283986