RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v5]
William Kemper
wkemper at openjdk.org
Fri Dec 13 00:30:40 UTC 2024
On Thu, 12 Dec 2024 23:51:52 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:
>> It wasn't actually the periodic task causing the crash (though that issue has been fixed here as well). The crash was caused by the control thread trying to call `ShenandoahHeap::do_gc_threads`, which included the regulator thread that had already been stopped by `ShenandoahHeap::stop`. The fix here was to stop the control thread before stopping the regulator thread, thereby preventing the control thread from trying to access stopped threads.
>>
>> This was fairly easy to reproduce (and verify) once I had an Alpine Linux environment set up.
>
> In light of the new findings, should the `if` test now be converted into an `assert` of some sort that the threads have not been terminated during any test? (I know the assert is still "racy" -- it doesn't cover the entire window -- but it seems like a sound call here. I am also wondering whether the original, when run with a fastdebug build, might have asserted down in the `os::` method because of finding a null `osthread`. Should the `os::` methods assert on non-nullness of the associated `osthread`? Worth checking now that you have an Alpine Linux box to test on?)
I don't think we can readily test the validity of the `osthread`'s native thread handle. I'm sure it _could_ be done, but it's platform specific. In this case, for example, the [glibc version](https://github.com/lattera/glibc/blob/master/nptl/pthread_getcpuclockid.c) of `pthread_getcpuclockid` returns an error code if the handle is `INVALID_TD_P`. The [musl version](https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_getcpuclockid.c) (used for Alpine Linux), on the other hand, has no such check.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1883069731