RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests

Y. Srinivas Ramakrishna ysr at openjdk.org
Thu Dec 12 02:32:34 UTC 2024


On Thu, 12 Dec 2024 02:26:17 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

>> I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible.
>
> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 51:
> 
>> 49:   ThreadTimeAccumulator() : total_time(0) {}
>> 50:   void do_thread(Thread* thread) override {
>> 51:     if (!thread->has_terminated()) {
> 
> There's an inherent race here at destruction time because the target thread may be terminated between the check and the cpu time call -- thus you've narrowed the race window but not closed it.
> 
> Note that today this is called only on GC-worker-like threads (including the controller, regulator, and worker threads).
> 
> I agree that the crashes are likely occurring during shutdown, just as you surmised. I'd suggest looking at the constructor and destructor (enroll and disenroll) of the MMU Tracker Task, and disenrolling it before the GC workers et al. are shut down. That would be the cleanest and most surgical fix, and it closes the race.

Right now the disenroll happens a bit too late: the task is disenrolled in its own destructor, which doesn't run until the heap is destructed. At a minimum, the disenroll should be done before we start shutting down the GC worker threads etc.
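
To make the ordering concrete, here is a minimal standalone sketch of the idea, not the actual HotSpot code. `Worker`, `Sampler`, and `Heap` are hypothetical stand-ins for the GC worker threads, the MMU tracker task, and the heap, and a plain `std::thread` stands in for the periodic task machinery. The point it illustrates is that `stop()` disenrolls the sampler, and waits for it to finish, before any worker is torn down, rather than leaving the disenroll to a destructor that runs later.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GC worker thread whose CPU time gets sampled.
struct Worker {
  std::atomic<long> cpu_ticks{0};
};

// Hypothetical stand-in for the periodic MMU tracker task.
class Sampler {
 public:
  explicit Sampler(const std::vector<std::unique_ptr<Worker>>& workers)
      : _workers(workers) {}

  // Safety net only: relying on the destructor is the "too late" case.
  ~Sampler() { disenroll(); }

  void enroll() {
    if (_thread.joinable()) return;  // already enrolled
    _enrolled.store(true);
    _thread = std::thread([this] {
      while (_enrolled.load()) {
        long total = 0;
        for (const auto& w : _workers) {
          // Unsafe if the workers were being torn down concurrently.
          total += w->cpu_ticks.load();
        }
        _last_total.store(total);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }

  // Returns only after the sampling thread has exited, so no sample
  // can be in flight once this call completes.
  void disenroll() {
    _enrolled.store(false);
    if (_thread.joinable()) _thread.join();
  }

  bool is_enrolled() const { return _enrolled.load(); }

 private:
  const std::vector<std::unique_ptr<Worker>>& _workers;
  std::atomic<bool> _enrolled{false};
  std::atomic<long> _last_total{0};
  std::thread _thread;
};

// Hypothetical stand-in for the heap that owns both workers and sampler.
class Heap {
 public:
  void start(int n) {
    for (int i = 0; i < n; ++i) _workers.push_back(std::make_unique<Worker>());
    _sampler.enroll();
  }

  void stop() {
    _sampler.disenroll();             // 1. stop sampling first
    assert(!_sampler.is_enrolled());
    _workers.clear();                 // 2. only then tear down the workers
  }

  bool sampler_enrolled() const { return _sampler.is_enrolled(); }
  std::size_t worker_count() const { return _workers.size(); }

 private:
  std::vector<std::unique_ptr<Worker>> _workers;
  Sampler _sampler{_workers};  // declared last, so destroyed first
};
```

With this shape, a caller would do `Heap heap; heap.start(4); ... heap.stop();`, and the sampler can never observe a worker that is already gone, because `disenroll()` joins the sampling thread before `_workers.clear()` runs. A per-thread "has it terminated?" check inside the sampling loop would still leave a window between the check and the read.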

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1881289328


More information about the shenandoah-dev mailing list