RFR: 8213231: ThreadSnapshot::_threadObj can become stale
Erik Helin
erik.helin at oracle.com
Tue Jan 22 14:59:40 UTC 2019
Hi all,
this patch fixes a problem where the oop in ThreadSnapshot::_threadObj
can become stale. The issue is that the ThreadSnapshot::oops_do method
only gets called when a ThreadSnapshot instance has been registered in a
ThreadDumpResult (via the ThreadDumpResult::add_thread_snapshot method).
But, in order to register a ThreadSnapshot instance, you must first
create it. The problem is that the ThreadSnapshot constructor first sets
_threadObj to thread->threadObj() and then, further down, might call
ObjectSynchronizer::get_lock_owner. That call can result in a
VM_RevokeBias VM operation being executed. If a GC VM operation is
already enqueued, then that GC VM operation will run when the
VM_RevokeBias VM operation is executed. That GC VM operation will not
update the oop in ThreadSnapshot::_threadObj, because that
ThreadSnapshot instance has not yet been registered in any
ThreadDumpResult (recall that the ThreadSnapshot is still being
constructed), so the GC has no way to find it. The oop in
ThreadSnapshot::_threadObj will then become dangling, which most likely
will cause the JVM to get a SIGSEGV some time later.
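To make the ordering problem concrete, here is a small self-contained toy
model. It is not HotSpot code: ToyObject, ToyHeap and ToySnapshot are
made-up stand-ins, collect() merely imitates a moving GC that only fixes
up the root slots it has been told about, and the call to collect()
inside the constructor stands in for the safepoint that the
get_lock_owner call may trigger.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a heap object such as a java.lang.Thread.
struct ToyObject {
    int  id;
    bool valid;   // false once the "GC" has moved the object away
};

// Hypothetical moving collector with a single object in its heap.
struct ToyHeap {
    std::vector<std::unique_ptr<ToyObject>> storage; // owns every copy
    std::vector<ToyObject**> roots;                  // extra slots the GC knows about
    ToyObject* thread_obj = nullptr;                 // plays the role of thread->threadObj()

    ToyObject* allocate(int id) {
        storage.push_back(std::make_unique<ToyObject>(ToyObject{id, true}));
        return storage.back().get();
    }

    // Imitates a moving GC: copy the object, poison the old location,
    // and update only the slots that were registered as roots.
    void collect() {
        ToyObject* old_loc = thread_obj;
        ToyObject* new_loc = allocate(old_loc->id);
        old_loc->valid = false;
        thread_obj = new_loc;
        for (ToyObject** slot : roots) {
            if (*slot == old_loc) *slot = new_loc;
        }
    }
};

// Mirrors the problematic ordering: capture the pointer first, then do
// work during which a GC may run, all before anyone has registered us.
struct ToySnapshot {
    ToyObject* thread_obj;

    explicit ToySnapshot(ToyHeap& heap) : thread_obj(heap.thread_obj) {
        heap.collect();  // assumption: stands in for the safepoint/GC
    }
};

// Returns true if the snapshot ended up pointing at the old location.
bool snapshot_is_stale_when_unregistered() {
    ToyHeap heap;
    heap.thread_obj = heap.allocate(42);
    ToySnapshot snap(heap);        // never added to heap.roots
    return !snap.thread_obj->valid && snap.thread_obj != heap.thread_obj;
}
```

In this toy the old copy is kept alive so that the stale pointer can be
inspected safely; in a real VM the old location would simply be reused,
which is why the failure tends to show up only later.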
The issue was found when debugging why an instance of
java/lang/management/ThreadInfo on the Java heap had a stale pointer in
its threadName field. It turns out that the java.lang.Thread instance
passed to the ThreadInfo was stale, most likely for the reason outlined
in the paragraph above.
This patch fixes the issue by ensuring that a ThreadSnapshot is always
registered in a ThreadDumpResult before the initialization of the
ThreadSnapshot is done. This ensures that the GC will always be able to
find the oop ThreadSnapshot::_threadObj via ThreadDumpResult::oops_do.
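The register-before-initialize ordering can be sketched with a small
self-contained toy model. Again, this is not the code in the webrev:
ToyHeap, ToySnapshot and ToyDumpResult are hypothetical stand-ins, and
collect() only imitates a moving GC that updates the root slots it
knows about.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a heap object such as a java.lang.Thread.
struct ToyObject {
    int  id;
    bool valid;   // false once the "GC" has moved the object away
};

// Hypothetical moving collector with a single object in its heap.
struct ToyHeap {
    std::vector<std::unique_ptr<ToyObject>> storage; // owns every copy
    std::vector<ToyObject**> roots;                  // extra slots the GC knows about
    ToyObject* thread_obj = nullptr;                 // plays the role of thread->threadObj()

    ToyObject* allocate(int id) {
        storage.push_back(std::make_unique<ToyObject>(ToyObject{id, true}));
        return storage.back().get();
    }

    // Imitates a moving GC: copy, poison the old copy, fix registered slots.
    void collect() {
        ToyObject* old_loc = thread_obj;
        ToyObject* new_loc = allocate(old_loc->id);
        old_loc->valid = false;
        thread_obj = new_loc;
        for (ToyObject** slot : roots) {
            if (*slot == old_loc) *slot = new_loc;
        }
    }
};

// Construction does nothing risky; the work moves to initialize().
struct ToySnapshot {
    ToyObject* thread_obj = nullptr;

    void initialize(ToyHeap& heap) {
        thread_obj = heap.thread_obj;
        heap.collect();  // assumption: stands in for the safepoint/GC
    }
};

// Creates the snapshot and registers its slot before it is initialized.
struct ToyDumpResult {
    ToyHeap& heap;
    std::vector<std::unique_ptr<ToySnapshot>> snapshots;

    explicit ToyDumpResult(ToyHeap& h) : heap(h) {}

    ToySnapshot* add_thread_snapshot() {
        snapshots.push_back(std::make_unique<ToySnapshot>());
        ToySnapshot* snap = snapshots.back().get();
        heap.roots.push_back(&snap->thread_obj);  // GC can now find the slot
        return snap;
    }
};

// Returns true if the snapshot still points at the live copy afterwards.
bool snapshot_survives_when_registered_first() {
    ToyHeap heap;
    heap.thread_obj = heap.allocate(42);
    ToyDumpResult result(heap);
    ToySnapshot* snap = result.add_thread_snapshot();  // register first
    snap->initialize(heap);                            // then initialize
    return snap->thread_obj == heap.thread_obj
        && snap->thread_obj->valid
        && snap->thread_obj->id == 42;
}
```

The point of the sketch is only the ordering: because the slot is known
to the collector before any step that can reach a safepoint, a GC in the
middle of initialization updates it like any other root.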
Webrev:
http://cr.openjdk.java.net/~ehelin/8213231/00/
Issue:
https://bugs.openjdk.java.net/browse/JDK-8213231
Testing:
- Tier 1, 2 and 3 on Windows, Mac, Linux (all x86-64)
- RunThese30M (multiple runs) and RunThese24h on Linux x86-64
(please note that I never managed to reproduce the issue, all
analysis was done based on a core file)
Thanks,
Erik