RFR: 8240588: _threadObj cannot be used on an exiting JavaThread

David Holmes david.holmes at oracle.com
Wed May 13 22:24:01 UTC 2020


Hi Robbin,

On 14/05/2020 12:15 am, Robbin Ehn wrote:
> Thanks David, looks good!

Thanks. I did overlook one thing in the remove logic, I forgot to 
actually delete the Holder:

   for (Holder* current = _exiting_threads, **prev_next = &_exiting_threads;
        current != NULL;
        prev_next = &current->_next, current = current->_next) {
     if (current->_thread == thread) {
         *prev_next = current->_next;
+       delete current;
         break;
     }
   }

> Great that you added a test!

Yes it was essential, though tricky. :)

Thanks for the review.

David
-----

> 
> /Robbin
> 
> On 2020-05-13 03:27, David Holmes wrote:
>> webrev: http://cr.openjdk.java.net/~dholmes/8240588/webrev.v3/
>> bug: https://bugs.openjdk.java.net/browse/JDK-8240588
>>
>> When a thread starts to terminate and removes itself from the main 
>> ThreadsList it is no longer visited by GC (through the oops_do 
>> mechanism). But that thread can still be found in secondary 
>> ThreadsLists (via ThreadsListHandles), by code that then tries to 
>> access its threadObj() oop, which can be invalid due to the fact it 
>> has not been visited by the GC.
>>
>> As per the bug report I looked into a range of ways of addressing this:
>> - make all ThreadsLists visible to GC
>> - make the threadObj() a global handle of some form
>> - fortify the call-sites to try to guard against a bad oop
>>
>> but I ended up with a very simple and clean solution that maintains an 
>> auxiliary list of exiting threads (guarded by Threads_lock within 
>> existing ThreadSMR code) which is walked via Universe::oops_do such 
>> that all the threadObj() oops are visited and kept valid.
>>
>> Thanks to Erik, Dan, and Robbin for pre-review of this code and 
>> suggested improvements.
>>
>> Thanks to Kim for explaining why handle approaches failed and the 
>> limitations of oop access by a terminating thread. As a result of that 
>> there is an additional small fix in thread.cpp to ensure the existing 
>> thread doesn't try to access its own threadObj() oop when the thread 
>> is not permitted to do so.
>>
>> Testing:
>>
>> I managed to devise a regression test which may not be future-proof 
>> (in that the test may trivially pass because no oop relocation occurs) 
>> but with which I was able to observe failures today with all GCs 
>> without the fix, and success with the fix.
>>
>> The regression test was tested locally on Linux with each of Serial, 
>> G1, Z and Shenadoah GCs, with product bits and fastdebug bits, and 
>> with the fix disabled and enabled. With the fix disabled the test 
>> reported an error in all configurations except product with ZGC. With 
>> the fix enabled it passed on all configurations.
>>
>> The regression test was also tested in the CI:
>> - linux, macOS x product,fastdebug x serial, G1, Z
>> - windows x product, fastdebug, x serial, G1
>>
>> With the fix disabled the test reported an error in all configurations 
>> (including product with ZGC!). With the fix enabled it passed on all 
>> configurations.
>>
>> General testing: tiers 1-3
>>
>> Thanks,
>> David


More information about the hotspot-runtime-dev mailing list