RFR: 8213231: ThreadSnapshot::_threadObj can become stale

Robbin Ehn robbin.ehn at oracle.com
Tue Jan 22 16:01:49 UTC 2019


Hi Erik, thanks for fixing, looks good!

Thanks, Robbin

On 1/22/19 3:59 PM, Erik Helin wrote:
> Hi all,
> 
> this patch fixes a problem where the oop in ThreadSnapshot::_threadObj can
> become stale. The issue is that the ThreadSnapshot::oops_do method only gets
> called when a ThreadSnapshot instance has been registered in a
> ThreadDumpResult (via the ThreadDumpResult::add_thread_snapshot method). But,
> in order to register a ThreadSnapshot instance, you must first create it. The
> problem is that the ThreadSnapshot constructor first sets _threadObj to
> thread->threadObj() and then, further down, might call
> ObjectSynchronizer::get_lock_owner. The call to
> ObjectSynchronizer::get_lock_owner can result in a VM_RevokeBias VM operation
> being executed. If a GC VM operation is already enqueued, then that GC VM
> operation will run when the VM_RevokeBias VM operation is executed. That GC VM
> operation will not update the oop in ThreadSnapshot::_threadObj, because that
> ThreadSnapshot instance has not yet been registered in any ThreadDumpResult
> (recall that the ThreadSnapshot is still being constructed), so the GC has no
> way to find it. The oop in ThreadSnapshot::_threadObj will then be left
> dangling, which will most likely cause the JVM to crash with a SIGSEGV some
> time later.
> 
> The issue was found when debugging why an instance of 
> java/lang/management/ThreadInfo on the Java heap had a stale pointer in its 
> threadName field. It turns out that the java.lang.Thread instance passed to
> the ThreadInfo was stale, most likely for the reason outlined in the paragraph
> above.
> 
> This patch fixes the issue by ensuring that a ThreadSnapshot is always
> registered in a ThreadDumpResult before the ThreadSnapshot is initialized.
> This ensures that the GC will always be able to find the oop in
> ThreadSnapshot::_threadObj via ThreadDumpResult::oops_do.
> 
> Webrev:
> http://cr.openjdk.java.net/~ehelin/8213231/00/
> 
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8213231
> 
> Testing:
> - Tier 1, 2 and 3 on Windows, Mac, Linux (all x86-64)
> - RunThese30M (multiple runs) and RunThese24h on Linux x86-64
>    (please note that I never managed to reproduce the issue; all analysis was
>    done based on a core file)
> 
> Thanks,
> Erik
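
The ordering problem described in the patch can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not HotSpot code: `ToyHeap`, `Snapshot`, `DumpResult`, `init_then_register` and `register_then_init` are hypothetical names. An oop is modelled as an index into a vector, `collect()` stands in for a queued GC VM operation that happens to run at the safepoint reached during snapshot initialisation, and only fields registered with the heap get updated when objects move (mirroring how ThreadSnapshot::_threadObj is only visited through ThreadDumpResult::oops_do).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model, NOT HotSpot code: an "oop" is an index into ToyHeap::slots,
// and collect() is a moving GC that copies every live object to a new slot.
using oop = std::size_t;
constexpr oop kNoObj = static_cast<oop>(-1);

struct ToyHeap {
  std::vector<std::string> slots;  // object payloads
  std::vector<bool> live;          // false once an object has been moved away
  std::vector<oop*> roots;         // fields the GC knows about and will update

  oop allocate(const std::string& payload) {
    slots.push_back(payload);
    live.push_back(true);
    return slots.size() - 1;
  }

  bool is_valid(oop o) const { return o < live.size() && live[o]; }

  const std::string& payload(oop o) const {
    assert(is_valid(o));  // reading through a stale oop is the bug we model
    return slots[o];
  }

  // Move every live object, then fix up registered roots via forwarding.
  // Any oop stored outside `roots` keeps pointing at a vacated slot.
  void collect() {
    const std::size_t old_size = slots.size();
    std::vector<oop> forward(old_size, kNoObj);
    for (oop o = 0; o < old_size; ++o) {
      if (live[o]) {
        std::string copy = slots[o];
        forward[o] = allocate(copy);
        live[o] = false;
      }
    }
    for (oop* root : roots) {
      if (*root < old_size && forward[*root] != kNoObj) *root = forward[*root];
    }
  }
};

// Hypothetical stand-ins for ThreadSnapshot and ThreadDumpResult.
struct Snapshot {
  oop thread_obj = kNoObj;  // plays the role of ThreadSnapshot::_threadObj
};

struct DumpResult {
  ToyHeap& heap;
  std::vector<std::unique_ptr<Snapshot>> snapshots;

  explicit DumpResult(ToyHeap& h) : heap(h) {}

  // Registering makes the snapshot's field visible to the GC, loosely
  // analogous to ThreadDumpResult::oops_do covering added snapshots.
  Snapshot* add(std::unique_ptr<Snapshot> s) {
    Snapshot* raw = s.get();
    heap.roots.push_back(&raw->thread_obj);
    snapshots.push_back(std::move(s));
    return raw;
  }
};

// Old order: initialise first; a GC during initialisation misses the field.
oop init_then_register(ToyHeap& heap, DumpResult& result, oop thread_obj) {
  auto snap = std::make_unique<Snapshot>();
  snap->thread_obj = thread_obj;  // like _threadObj = thread->threadObj()
  heap.collect();                 // like a queued GC running at the revoke-bias safepoint
  return result.add(std::move(snap))->thread_obj;
}

// Fixed order: register first, so the GC updates the field during init.
oop register_then_init(ToyHeap& heap, DumpResult& result, oop thread_obj) {
  Snapshot* snap = result.add(std::make_unique<Snapshot>());
  snap->thread_obj = thread_obj;
  heap.collect();
  return snap->thread_obj;
}
```

In this model, the old order returns an index to a slot the GC has already vacated, while registering first lets the GC forward the field together with every other root. Registration before initialisation is the only difference between the two functions.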


More information about the hotspot-runtime-dev mailing list