RFR: 8213231: ThreadSnapshot::_threadObj can become stale

David Holmes david.holmes at oracle.com
Tue Jan 22 22:03:42 UTC 2019


Hi Erik,

Good find! This bug made an interesting read. Makes me wonder how we may 
have caught this kind of error sooner? Some kind of unhandled oop check? 
NoSafepointVerifier?

On 23/01/2019 12:59 am, Erik Helin wrote:
> Hi all,
> 
> this patch fixes a problem when the oop in ThreadSnapshot::_threadObj 
> can become stale. The issue is that the ThreadSnapshot::oops_do method 
> only gets called when a ThreadSnapshot instance has been registered in a 
> ThreadDumpResult (via the ThreadDumpResult::add_thread_snapshot method). 
> But, in order to register a ThreadSnapshot instance, you must first 
> create it. The problem is that the ThreadSnapshot constructor first sets 
> _threadObj to thread->threadObj() and then further down might call 
> ObjectSynchronizer:: get_lock_owner. The call to ObjectSynchronizer:: 
> get_lock_owner can result in a VM_RevokeBias VM operation being 
> executed. If a GC VM operation already is enqueued, then that GC VM 
> operation will run when the VM_RevokeBias VM operation is executed. That 
> GC VM operation will not update the oop in ThreadSafepoint::_threadObj, 
> because that ThreadSnapshot instance has not yet been registered in any 
> ThreadDumpResult (recall that the ThreadSafepoint is being constructed), 
> so the GC has no way to find it. The oop in ThreadSafepoint::_threadObj 
> will then become dangling which most likely will cause the JVM to get a 
> SIGSEGV some time later.

_blocker_object could suffer the same fate, and possibly 
_blocker_object_owner if there could be other paths leading to a safepoint.
> 
> The issue was found when debugging why an instance of 
> java/lang/management/ThreadInfo on the Java heap had a stale pointer in 
> its threadName field. Turns out that the java.lang.Thread instance 
> passed to the ThreadInfo was stale most likely for the reason outlined 
> in the paragraph above.
> 
> This patch fixes the issue by ensuring that a ThreadSnapshot is always 
> registered in a ThreadDumpResult before the initialization of the 
> ThreadSnapshot is done. This ensures that the GC will always be able to 
> find the oop ThreadSnapshot::_threadObj via ThreadDumpResult::oops_do.
> 
> Webrev:
> http://cr.openjdk.java.net/~ehelin/8213231/00/

Explanation sounded simple enough - actual change was a little harder to 
follow :)

src/hotspot/share/services/threadService.cpp

Initially I was concerned about ensuring all oop fields were NULL but I 
see the ThreadSnapshot constructor handles that. That means some of the 
NULL setting in initialize is redundant:

  854   _stack_trace = NULL;
  855   _concurrent_locks = NULL;
  856   _next = NULL;
  866   _blocker_object = NULL;
  867   _blocker_object_owner = NULL;

---

src/hotspot/share/services/threadService.hpp

The new initialize function doesn't need to be public.

+   void link_thread_snapshot(ThreadSnapshot* ts);

Please follow the existing layout style in that code and shift the 
method name across to align with others.

Thanks,
David


> 
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8213231
> 
> Testing:
> - Tier 1, 2 and 3 on Windows, Mac, Linux (all x86-64)
> - RunThese30M (multiple runs) and RunThese24h on Linux x86-64
>    (please note that I never managed to reproduce the issue, all 
> analysis was done based on a core file)
> 
> Thanks,
> Erik


More information about the hotspot-runtime-dev mailing list