RFR: 8240588: _threadObj cannot be used on an exiting JavaThread

David Holmes david.holmes at oracle.com
Wed May 13 01:27:05 UTC 2020


webrev: http://cr.openjdk.java.net/~dholmes/8240588/webrev.v3/
bug: https://bugs.openjdk.java.net/browse/JDK-8240588

When a thread starts to terminate and removes itself from the main 
ThreadsList, it is no longer visited by GC (through the oops_do 
mechanism). But that thread can still be found in secondary ThreadsLists 
(via ThreadsListHandles) by code that then tries to access its 
threadObj() oop, which may be invalid because the GC has not visited 
(and so has not updated) it.

As per the bug report I looked into a range of ways of addressing this:
- make all ThreadsLists visible to GC
- make the threadObj() a global handle of some form
- fortify the call-sites to try to guard against a bad oop

but I ended up with a very simple and clean solution that maintains an 
auxiliary list of exiting threads (guarded by Threads_lock within 
existing ThreadSMR code) which is walked via Universe::oops_do such that 
all the threadObj() oops are visited and kept valid.
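
For illustration only, the idea can be sketched as a toy model. This is 
not the code in the webrev: ToyThread, ExitingThreadsList and 
read_after_relocation are hypothetical names, a std::mutex stands in for 
Threads_lock, and a raw pointer stands in for an oop that a moving 
collector may relocate.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <vector>

// Toy stand-ins; none of these are HotSpot types.
using Oop = int*;                      // a reference the "GC" may relocate
struct ToyThread { Oop thread_obj; };  // loosely models the _threadObj field

// Hypothetical auxiliary list of exiting threads.
class ExitingThreadsList {
  std::mutex lock_;                    // plays the role of Threads_lock
  std::vector<ToyThread*> threads_;
 public:
  void add(ToyThread* t) {
    std::lock_guard<std::mutex> g(lock_);
    threads_.push_back(t);
  }
  void remove(ToyThread* t) {
    std::lock_guard<std::mutex> g(lock_);
    threads_.erase(std::remove(threads_.begin(), threads_.end(), t),
                   threads_.end());
  }
  // Root-scanning hook: passes the address of each thread_obj slot to the
  // visitor so that the visitor can update the slot in place.
  void oops_do(const std::function<void(Oop*)>& visit) {
    std::lock_guard<std::mutex> g(lock_);
    for (ToyThread* t : threads_) visit(&t->thread_obj);
  }
};

// Simulates one relocation of the thread object and returns what a reader
// sees through thread_obj afterwards.
int read_after_relocation(bool registered) {
  int old_loc = 42;                    // object before the move
  int new_loc = 42;                    // object after the move
  ToyThread t{&old_loc};
  ExitingThreadsList exiting;
  if (registered) exiting.add(&t);

  // "GC": fix up every slot it is shown, then invalidate the old copy.
  exiting.oops_do([&](Oop* slot) {
    if (*slot == &old_loc) *slot = &new_loc;
  });
  old_loc = -1;                        // old location now holds garbage

  int seen = *t.thread_obj;
  if (registered) {
    assert(t.thread_obj == &new_loc);  // slot was visited and updated
    exiting.remove(&t);
  }
  return seen;
}
```

In this model a registered thread still reads 42 after the move, while 
an unregistered one reads the stale value -1, which loosely corresponds 
to the bad oop described above.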

Thanks to Erik, Dan, and Robbin for pre-review of this code and 
suggested improvements.

Thanks to Kim for explaining why handle approaches failed and the 
limitations of oop access by a terminating thread. As a result of that 
there is an additional small fix in thread.cpp to ensure the exiting 
thread doesn't try to access its own threadObj() oop when the thread is 
not permitted to do so.

Testing:

I managed to devise a regression test which may not be future-proof (in 
that the test may trivially pass because no oop relocation occurs) but 
with which I was able to observe failures today with all GCs without the 
fix, and success with the fix.

The regression test was tested locally on Linux with each of Serial, G1, 
Z and Shenandoah GCs, with product bits and fastdebug bits, and with the 
fix disabled and enabled. With the fix disabled the test reported an 
error in all configurations except product with ZGC. With the fix 
enabled it passed on all configurations.

The regression test was also tested in the CI:
- linux, macOS x product, fastdebug x serial, G1, Z
- windows x product, fastdebug x serial, G1

With the fix disabled the test reported an error in all configurations 
(including product with ZGC!). With the fix enabled it passed on all 
configurations.

General testing: tiers 1-3

Thanks,
David

