RFR: 8213231: ThreadSnapshot::_threadObj can become stale

Erik Helin erik.helin at oracle.com
Wed Jan 23 13:28:42 UTC 2019


On 1/22/19 10:32 PM, Daniel D. Daugherty wrote:
> On 1/22/19 9:59 AM, Erik Helin wrote:
>> Hi all,
>>
>> this patch fixes a problem where the oop in ThreadSnapshot::_threadObj 
>> can become stale. The issue is that the ThreadSnapshot::oops_do method 
>> only gets called when a ThreadSnapshot instance has been registered in 
>> a ThreadDumpResult (via the ThreadDumpResult::add_thread_snapshot 
>> method). But, in order to register a ThreadSnapshot instance, you must 
>> first create it. The problem is that the ThreadSnapshot constructor 
>> first sets _threadObj to thread->threadObj() and then further down 
>> might call ObjectSynchronizer::get_lock_owner. The call to 
>> ObjectSynchronizer::get_lock_owner can result in a VM_RevokeBias VM 
>> operation being executed. If a GC VM operation already is enqueued, 
>> then that GC VM operation will run when the VM_RevokeBias VM operation 
>> is executed. That GC VM operation will not update the oop in 
>> ThreadSnapshot::_threadObj, because that ThreadSnapshot instance has 
>> not yet been registered in any ThreadDumpResult (recall that the 
>> ThreadSnapshot is still being constructed), so the GC has no way to 
>> find it. The oop in ThreadSnapshot::_threadObj will then be dangling, 
>> which most likely will cause the JVM to get a SIGSEGV some time later.
>>
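
To make the ordering problem above concrete, here is a minimal toy sketch. It is not HotSpot code: ToyHeap, ToySnapshot, the integer "addresses" and snapshot_is_stale() are all made up for illustration, and the collect() call merely stands in for the GC VM operation that runs alongside VM_RevokeBias.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins only; none of these are the real HotSpot types.
struct ToyHeap {
    int thread_obj_addr = 100;   // current "address" of the Thread object
    std::vector<int*> roots;     // slots the toy GC knows how to update

    // A moving collection: relocate the object, then fix up every known root.
    void collect() {
        int old_addr = thread_obj_addr;
        thread_obj_addr += 1000;
        for (int* slot : roots) {
            if (*slot == old_addr) *slot = thread_obj_addr;
        }
    }
};

struct ToySnapshot {
    int thread_obj;              // plays the role of _threadObj

    // Problematic ordering: read the reference first, then do work during
    // which a GC may run, all before the snapshot has been registered.
    explicit ToySnapshot(ToyHeap& heap) : thread_obj(heap.thread_obj_addr) {
        heap.collect();          // assumed stand-in for the GC triggered mid-construction
    }
};

// Returns true if the snapshot's reference is stale after construction.
bool snapshot_is_stale() {
    ToyHeap heap;
    ToySnapshot snap(heap);
    heap.roots.push_back(&snap.thread_obj);  // registration happens too late
    return snap.thread_obj != heap.thread_obj_addr;
}
```

In this model snapshot_is_stale() returns true: the object moved while the snapshot was invisible to the collector, so the stored reference still points at the old location.
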
>> The issue was found when debugging why an instance of 
>> java/lang/management/ThreadInfo on the Java heap had a stale pointer 
>> in its threadName field. Turns out that the java.lang.Thread instance 
>> passed to the ThreadInfo was stale, most likely for the reason outlined 
>> in the paragraph above.
>>
>> This patch fixes the issue by ensuring that a ThreadSnapshot is always 
>> registered in a ThreadDumpResult before the initialization of the 
>> ThreadSnapshot is done. This ensures that the GC will always be able 
>> to find the oop ThreadSnapshot::_threadObj via ThreadDumpResult::oops_do.
>>
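
The register-before-initialize ordering described above can be sketched roughly as follows. This is again a toy model under stated assumptions, not the patch itself: ToyDumpResult, ToySnapshot, ToyHeap and registered_snapshot_stays_valid() are hypothetical names, and the toy add_thread_snapshot()/oops_do() only borrow the names of the real methods to show the idea.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-ins only; none of these are the real HotSpot types.
struct ToySnapshot {
    int thread_obj = 0;          // plays the role of _threadObj, initially empty
};

struct ToyDumpResult {
    std::vector<std::unique_ptr<ToySnapshot>> snapshots;

    // Create an empty snapshot and register it before any field is set.
    ToySnapshot* add_thread_snapshot() {
        snapshots.push_back(std::make_unique<ToySnapshot>());
        return snapshots.back().get();
    }

    // Toy analogue of oops_do: visit the reference in every registered snapshot.
    template <typename F>
    void oops_do(F visit) {
        for (auto& s : snapshots) visit(s->thread_obj);
    }
};

struct ToyHeap {
    int thread_obj_addr = 100;   // current "address" of the Thread object

    // A moving collection that updates all references reachable via the result.
    void collect(ToyDumpResult& roots) {
        int old_addr = thread_obj_addr;
        thread_obj_addr += 1000;
        roots.oops_do([&](int& slot) {
            if (slot == old_addr) slot = thread_obj_addr;
        });
    }
};

// Returns true if the snapshot still refers to the object after a GC.
bool registered_snapshot_stays_valid() {
    ToyHeap heap;
    ToyDumpResult result;
    ToySnapshot* snap = result.add_thread_snapshot();  // 1. register first
    snap->thread_obj = heap.thread_obj_addr;           // 2. then initialize
    heap.collect(result);                              // a GC during initialization
    return snap->thread_obj == heap.thread_obj_addr;
}
```

Because the snapshot is reachable from the result before its reference is filled in, the toy collector finds and updates the slot, and registered_snapshot_stays_valid() returns true.
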
>> Webrev:
>> http://cr.openjdk.java.net/~ehelin/8213231/00/
> 
> This one caught my eye since I touched the ThreadSnapshot code in the
> Thread-SMR project...
> 
> src/hotspot/share/runtime/vmOperations.cpp
>      No comments.
> 
> src/hotspot/share/runtime/vmOperations.hpp
>      No comments.
> 
> src/hotspot/share/services/management.cpp
>      No comments.
> 
> src/hotspot/share/services/threadService.cpp
>      No comments.
> 
> src/hotspot/share/services/threadService.hpp
>      No comments.
> 
> Thumbs up!

Thanks Dan, appreciate you taking the time to review this!

Erik

> Dan
> 
> 
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8213231
>>
>> Testing:
>> - Tier 1, 2 and 3 on Windows, Mac, Linux (all x86-64)
>> - RunThese30M (multiple runs) and RunThese24h on Linux x86-64
>>   (please note that I never managed to reproduce the issue; all 
>>   analysis was done based on a core file)
>>
>> Thanks,
>> Erik
>>
> 


More information about the hotspot-runtime-dev mailing list