RFR: 8240588: _threadObj cannot be used on an exiting JavaThread
David Holmes
david.holmes at oracle.com
Wed May 13 01:27:05 UTC 2020
webrev: http://cr.openjdk.java.net/~dholmes/8240588/webrev.v3/
bug: https://bugs.openjdk.java.net/browse/JDK-8240588
When a thread starts to terminate and removes itself from the main
ThreadsList it is no longer visited by GC (through the oops_do
mechanism). But that thread can still be found in secondary ThreadsLists
(via ThreadsListHandles), by code that then tries to access its
threadObj() oop, which can be invalid due to the fact it has not been
visited by the GC.
As per the bug report I looked into a range of ways of addressing this:
- make all ThreadsLists visible to GC
- make the threadObj() a global handle of some form
- fortify the call-sites to try to guard against a bad oop
but I ended up with a very simple and clean solution that maintains an
auxiliary list of exiting threads (guarded by Threads_lock within
existing ThreadSMR code) which is walked via Universe::oops_do such that
all the threadObj() oops are visited and kept valid.
Thanks to Erik, Dan, and Robbin for pre-review of this code and
suggested improvements.
Thanks to Kim for explaining why handle approaches failed and the
limitations of oop access by a terminating thread. As a result of that
there is an additional small fix in thread.cpp to ensure the existing
thread doesn't try to access its own threadObj() oop when the thread is
not permitted to do so.
Testing:
I managed to devise a regression test which may not be future-proof (in
that the test may trivially pass because no oop relocation occurs) but
with which I was able to observe failures today with all GCs without the
fix, and success with the fix.
The regression test was tested locally on Linux with each of Serial, G1,
Z and Shenadoah GCs, with product bits and fastdebug bits, and with the
fix disabled and enabled. With the fix disabled the test reported an
error in all configurations except product with ZGC. With the fix
enabled it passed on all configurations.
The regression test was also tested in the CI:
- linux, macOS x product,fastdebug x serial, G1, Z
- windows x product, fastdebug, x serial, G1
With the fix disabled the test reported an error in all configurations
(including product with ZGC!). With the fix enabled it passed on all
configurations.
General testing: tiers 1-3
Thanks,
David
More information about the hotspot-runtime-dev
mailing list