RFR: 8240588: _threadObj cannot be used on an exiting JavaThread

David Holmes david.holmes at oracle.com
Wed May 13 23:08:48 UTC 2020


Hi Dan,

On 14/05/2020 1:41 am, Daniel D. Daugherty wrote:
> Hi David,
> 
> On 5/12/20 9:27 PM, David Holmes wrote:
>> webrev: http://cr.openjdk.java.net/~dholmes/8240588/webrev.v3/
> 
> src/hotspot/share/memory/universe.cpp
>      No comments.
> 
> src/hotspot/share/prims/whitebox.cpp
>      L2238:     THROW_MSG(vmSymbols::java_lang_RuntimeException(), 
> "Target thread not found in ThreadsListHandle!");
>          Please consider: s/ThreadsListHandle/ThreadsList/
> 
>          The ThreadsListHandle is just a wrapper around the underlying
>          ThreadsList. find_JavaThread_from_java_tid() searched the
>          ThreadsList.

Changed.

>      L2248:     os::naked_short_sleep(0);
>          I didn't realize/remember that '0' means minimum time supported 
> by the OS.

I only noticed that by accident :)

>      L2253:   // Now release the GC inducing thread - we have to 
> re-resolve the external oop that
>      L2254:   // was passed in as GC may have occurred and we don't know 
> if we can trust t->threadObj() now.
>      L2255:   oop original = JNIHandles::resolve_non_null(target_handle);
>      L2256:   java_lang_Thread::set_priority(original, 
> ThreadPriority(NormPriority + 2));
>          So target_handle refers to the target thread that we're tracking
>          and have allowed to exit. What good does it do to bump the 
> priority
>          on the target thread when it has (mostly) exited? By mostly, I 
> mean
>          that the JavaThread is still around because there is a ThreadsList
>          that stills contains the target thread, but it is past executing
>          code...
> 
>          Update: Now that I've read the test, I understand that you're 
> using
>          the priority field for test state transitions.
> 
>          Please add after this line (I also added a period at the end of 
> L2225):
> 
>          L2225: // See 
> test/hotspot/jtreg/runtime/Thread/ThreadObjAccessAtExit.java.
>                 // It explains how the thread's priority field is used 
> for test
>                 // state coordination.

Added.

> src/hotspot/share/runtime/thread.cpp
>      No comments.
> 
> src/hotspot/share/runtime/thread.hpp
>      No comments.
> 
> src/hotspot/share/runtime/threadSMR.cpp
>      L1196:   Holder * h = new Holder(thread, _exiting_threads);
>          nit - please delete space before '*'.

Fixed.

> 
>      L1210:     if (current->_thread == thread) {
>      L1211:         *prev_next = current->_next;
>      L1212:         break;
>      L1213:     }
>          nit - indent too much by two spaces on L1211-2.

Fixed. And as I mentioned in my reply to Robbin I also added the missing

delete current;

before the break.

>          I like the rewrite to use prev_next. It's slightly more
>          complicated than your original, but not overly so.

Courtesy of Robbin. I took a little while to grok it, but it avoids the 
need to special-case removing the first entry.

> src/hotspot/share/runtime/threadSMR.hpp
>      L93: //   Threads_lock) and visits the _threadObj oop of each 
> JavaThread
>          nit - needs a period at the end.

Fixed.

>      L175:   DEBUG_ONLY(static bool contains_exiting_thread(JavaThread* 
> thread);)
>          This should be "#ifdef ASSERT ... #endif" to match the .cpp 
> definition.

DEBUG_ONLY is itself defined under "#ifdef ASSERT" so this usage is correct.

>          nit - Also swap L175 and L176 to match the declaration order with
>          the definition order in the .cpp.

Changed.

> 
> test/lib/sun/hotspot/WhiteBox.java
>      No comments.
> 
> test/hotspot/jtreg/runtime/Thread/ThreadObjAccessAtExit.java
>      L29:  *          removed itself from the ThreadsList.
>          s/the ThreadsList/the main ThreadsList/

Changed.

>      L39: /*
>      L56: */
>          nit - It's odd to see this style of comment block without a
>              leading ' * ' before each line. Better still is the '//' 
> style.

Chganged.

>          This comment clarifies the SetPriority() calls in whitebox.cpp
>          so now I know what's going on there.
> 
>      I like this test!!
> 
> Thumbs up! I don't need to see a new webrev if you decide to make
> any tweaks based on my comments above.

Thanks for the review. For good measure updated webrev:

http://cr.openjdk.java.net/~dholmes/8240588/webrev.v4/

with a simple diff of changes:

http://cr.openjdk.java.net/~dholmes/8240588/webrev.v4/v4-incr.patch

I've rebased and will re-test before pushing.

Thanks,
David
-----

> Dan
> 
> 
>> bug: https://bugs.openjdk.java.net/browse/JDK-8240588
>>
>> When a thread starts to terminate and removes itself from the main 
>> ThreadsList it is no longer visited by GC (through the oops_do 
>> mechanism). But that thread can still be found in secondary 
>> ThreadsLists (via ThreadsListHandles), by code that then tries to 
>> access its threadObj() oop, which can be invalid due to the fact it 
>> has not been visited by the GC.
>>
>> As per the bug report I looked into a range of ways of addressing this:
>> - make all ThreadsLists visible to GC
>> - make the threadObj() a global handle of some form
>> - fortify the call-sites to try to guard against a bad oop
>>
>> but I ended up with a very simple and clean solution that maintains an 
>> auxiliary list of exiting threads (guarded by Threads_lock within 
>> existing ThreadSMR code) which is walked via Universe::oops_do such 
>> that all the threadObj() oops are visited and kept valid.
>>
>> Thanks to Erik, Dan, and Robbin for pre-review of this code and 
>> suggested improvements.
>>
>> Thanks to Kim for explaining why handle approaches failed and the 
>> limitations of oop access by a terminating thread. As a result of that 
>> there is an additional small fix in thread.cpp to ensure the existing 
>> thread doesn't try to access its own threadObj() oop when the thread 
>> is not permitted to do so.
>>
>> Testing:
>>
>> I managed to devise a regression test which may not be future-proof 
>> (in that the test may trivially pass because no oop relocation occurs) 
>> but with which I was able to observe failures today with all GCs 
>> without the fix, and success with the fix.
>>
>> The regression test was tested locally on Linux with each of Serial, 
>> G1, Z and Shenadoah GCs, with product bits and fastdebug bits, and 
>> with the fix disabled and enabled. With the fix disabled the test 
>> reported an error in all configurations except product with ZGC. With 
>> the fix enabled it passed on all configurations.
>>
>> The regression test was also tested in the CI:
>> - linux, macOS x product,fastdebug x serial, G1, Z
>> - windows x product, fastdebug, x serial, G1
>>
>> With the fix disabled the test reported an error in all configurations 
>> (including product with ZGC!). With the fix enabled it passed on all 
>> configurations.
>>
>> General testing: tiers 1-3
>>
>> Thanks,
>> David
> 


More information about the hotspot-runtime-dev mailing list