RFR: 8213231: ThreadSnapshot::_threadObj can become stale
Robbin Ehn
robbin.ehn at oracle.com
Tue Jan 22 16:01:49 UTC 2019
Hi Erik, thanks for fixing, looks good!
Thanks, Robbin
On 1/22/19 3:59 PM, Erik Helin wrote:
> Hi all,
>
> this patch fixes a problem where the oop in ThreadSnapshot::_threadObj can become
> stale. The issue is that the ThreadSnapshot::oops_do method only gets called
> when a ThreadSnapshot instance has been registered in a ThreadDumpResult (via
> the ThreadDumpResult::add_thread_snapshot method). But, in order to register a
> ThreadSnapshot instance, you must first create it. The problem is that the
> ThreadSnapshot constructor first sets _threadObj to thread->threadObj() and then
> further down might call ObjectSynchronizer::get_lock_owner. The call to
> ObjectSynchronizer::get_lock_owner can result in a VM_RevokeBias VM operation
> being executed. If a GC VM operation is already enqueued, then that GC VM
> operation will run when the VM_RevokeBias VM operation is executed. That GC VM
> operation will not update the oop in ThreadSnapshot::_threadObj, because that
> ThreadSnapshot instance has not yet been registered in any ThreadDumpResult
> (recall that the ThreadSnapshot is being constructed), so the GC has no way to
> find it. The oop in ThreadSnapshot::_threadObj will then become dangling, which
> most likely will cause the JVM to get a SIGSEGV some time later.
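
As an illustration only (this is not HotSpot code and not taken from the
webrev), the ordering problem described above can be sketched with a toy
moving collector. Every name here (ToyOop, ToyHeap, ToySnapshot, root_slots,
collect) is a hypothetical stand-in: an "oop" is modelled as an integer
address, and the collector only fixes up slots that were registered with it.

```cpp
#include <cassert>
#include <vector>

namespace stale_demo {

// Toy "address" of a heap object. A stand-in for an oop, not the real type.
using ToyOop = int;

// Hypothetical heap holding one live object that a collection may relocate.
struct ToyHeap {
    ToyOop thread_obj_addr = 100;     // current address of the live object
    std::vector<ToyOop*> root_slots;  // slots the toy collector knows about

    // Moving collection: relocate the object and update registered slots only.
    void collect() {
        const ToyOop old_addr = thread_obj_addr;
        thread_obj_addr = old_addr + 1000;
        for (ToyOop* slot : root_slots) {
            if (*slot == old_addr) {
                *slot = thread_obj_addr;
            }
        }
    }
};

// Hypothetical snapshot that caches the object's address.
struct ToySnapshot {
    ToyOop thread_obj = 0;
};

// Problematic order: cache the address, let a collection run during the rest
// of the initialisation, and register the snapshot only afterwards.
bool snapshot_is_stale_when_built_unregistered() {
    ToyHeap heap;
    ToySnapshot snap;

    snap.thread_obj = heap.thread_obj_addr;        // cache the address
    assert(snap.thread_obj == heap.thread_obj_addr);

    heap.collect();  // stands in for a GC running during initialisation

    heap.root_slots.push_back(&snap.thread_obj);   // registered too late

    // The cached address no longer matches where the object lives.
    return snap.thread_obj != heap.thread_obj_addr;
}

}  // namespace stale_demo
```

In this sketch the collection runs while the snapshot's slot is still
invisible to the collector, so the cached address is left pointing at the old
location.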
>
> The issue was found when debugging why an instance of
> java/lang/management/ThreadInfo on the Java heap had a stale pointer in its
> threadName field. It turns out that the java.lang.Thread instance passed to the
> ThreadInfo was stale, most likely for the reason outlined in the paragraph above.
>
> This patch fixes the issue by ensuring that a ThreadSnapshot is always
> registered in a ThreadDumpResult before the initialization of the ThreadSnapshot
> is done. This ensures that the GC will always be able to find the oop
> ThreadSnapshot::_threadObj via ThreadDumpResult::oops_do.
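
A minimal sketch of the register-before-initialise ordering, again with toy
types rather than the real HotSpot classes. ToyDumpResult and add_snapshot are
hypothetical names chosen for this example; the actual change is in the webrev
linked below and may be structured differently.

```cpp
#include <cassert>
#include <deque>
#include <vector>

namespace fix_demo {

// Toy "address" of a heap object. A stand-in for an oop, not the real type.
using ToyOop = int;

// Hypothetical snapshot that caches the object's address.
struct ToySnapshot {
    ToyOop thread_obj = 0;
};

// Hypothetical heap holding one live object that a collection may relocate.
struct ToyHeap {
    ToyOop thread_obj_addr = 100;     // current address of the live object
    std::vector<ToyOop*> root_slots;  // slots the toy collector knows about

    // Moving collection: relocate the object and update registered slots only.
    void collect() {
        const ToyOop old_addr = thread_obj_addr;
        thread_obj_addr = old_addr + 1000;
        for (ToyOop* slot : root_slots) {
            if (*slot == old_addr) {
                *slot = thread_obj_addr;
            }
        }
    }
};

// Hypothetical owner of snapshots that exposes their slots to the collector.
struct ToyDumpResult {
    // std::deque keeps references to existing elements valid on push_back.
    std::deque<ToySnapshot> snapshots;

    // Create an empty snapshot and register its slot before it is filled in.
    ToySnapshot& add_snapshot(ToyHeap& heap) {
        snapshots.emplace_back();
        ToySnapshot& snap = snapshots.back();
        heap.root_slots.push_back(&snap.thread_obj);
        return snap;
    }
};

// Fixed order: register first, then initialise.
bool snapshot_stays_valid_when_registered_first() {
    ToyHeap heap;
    ToyDumpResult result;

    ToySnapshot& snap = result.add_snapshot(heap);  // visible to the collector
    snap.thread_obj = heap.thread_obj_addr;         // now cache the address
    assert(snap.thread_obj == heap.thread_obj_addr);

    heap.collect();  // a collection during initialisation updates the slot

    return snap.thread_obj == heap.thread_obj_addr;
}

}  // namespace fix_demo
```

Because the slot is registered before anything that might trigger a collection
runs, the toy collector can always find and update it.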
>
> Webrev:
> http://cr.openjdk.java.net/~ehelin/8213231/00/
>
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8213231
>
> Testing:
> - Tier 1, 2 and 3 on Windows, Mac, Linux (all x86-64)
> - RunThese30M (multiple runs) and RunThese24h on Linux x86-64
> (please note that I never managed to reproduce the issue, all analysis was
> done based on a core file)
>
> Thanks,
> Erik