RFR: 8240588: _threadObj cannot be used on an exiting JavaThread

Robbin Ehn robbin.ehn at oracle.com
Wed May 13 14:15:16 UTC 2020


Thanks David, looks good!

Great that you added a test!

/Robbin

On 2020-05-13 03:27, David Holmes wrote:
> webrev: http://cr.openjdk.java.net/~dholmes/8240588/webrev.v3/
> bug: https://bugs.openjdk.java.net/browse/JDK-8240588
> 
> When a thread starts to terminate and removes itself from the main 
> ThreadsList it is no longer visited by GC (through the oops_do 
> mechanism). But that thread can still be found in secondary ThreadsLists 
> (via ThreadsListHandles), by code that then tries to access its 
> threadObj() oop, which can be invalid due to the fact it has not been 
> visited by the GC.
> 
> As per the bug report I looked into a range of ways of addressing this:
> - make all ThreadsLists visible to GC
> - make the threadObj() a global handle of some form
> - fortify the call-sites to try to guard against a bad oop
> 
> but I ended up with a very simple and clean solution that maintains an 
> auxiliary list of exiting threads (guarded by Threads_lock within 
> existing ThreadSMR code) which is walked via Universe::oops_do such that 
> all the threadObj() oops are visited and kept valid.
> 
> Thanks to Erik, Dan, and Robbin for pre-review of this code and 
> suggested improvements.
> 
> Thanks to Kim for explaining why handle approaches failed and the 
> limitations of oop access by a terminating thread. As a result of that 
> there is an additional small fix in thread.cpp to ensure the existing 
> thread doesn't try to access its own threadObj() oop when the thread is 
> not permitted to do so.
> 
> Testing:
> 
> I managed to devise a regression test which may not be future-proof (in 
> that the test may trivially pass because no oop relocation occurs) but 
> with which I was able to observe failures today with all GCs without the 
> fix, and success with the fix.
> 
> The regression test was tested locally on Linux with each of Serial, G1, 
> Z and Shenadoah GCs, with product bits and fastdebug bits, and with the 
> fix disabled and enabled. With the fix disabled the test reported an 
> error in all configurations except product with ZGC. With the fix 
> enabled it passed on all configurations.
> 
> The regression test was also tested in the CI:
> - linux, macOS x product,fastdebug x serial, G1, Z
> - windows x product, fastdebug, x serial, G1
> 
> With the fix disabled the test reported an error in all configurations 
> (including product with ZGC!). With the fix enabled it passed on all 
> configurations.
> 
> General testing: tiers 1-3
> 
> Thanks,
> David


More information about the hotspot-runtime-dev mailing list