RFR: 8213231: ThreadSnapshot::_threadObj can become stale
Erik Helin
erik.helin at oracle.com
Wed Jan 23 13:28:42 UTC 2019
On 1/22/19 10:32 PM, Daniel D. Daugherty wrote:
> On 1/22/19 9:59 AM, Erik Helin wrote:
>> Hi all,
>>
>> this patch fixes a problem where the oop in ThreadSnapshot::_threadObj
>> can become stale. The issue is that the ThreadSnapshot::oops_do method
>> only gets called when a ThreadSnapshot instance has been registered in
>> a ThreadDumpResult (via the ThreadDumpResult::add_thread_snapshot
>> method). But, in order to register a ThreadSnapshot instance, you must
>> first create it. The problem is that the ThreadSnapshot constructor
>> first sets _threadObj to thread->threadObj() and then further down
>> might call ObjectSynchronizer::get_lock_owner. The call to
>> ObjectSynchronizer::get_lock_owner can result in a VM_RevokeBias VM
>> operation being executed. If a GC VM operation is already enqueued,
>> then that GC VM operation will run when the VM_RevokeBias VM operation
>> is executed. That GC VM operation will not update the oop in
>> ThreadSnapshot::_threadObj, because that ThreadSnapshot instance has
>> not yet been registered in any ThreadDumpResult (recall that the
>> ThreadSnapshot is being constructed), so the GC has no way to find
>> it. The oop in ThreadSnapshot::_threadObj will then become dangling,
>> which most likely will cause the JVM to get a SIGSEGV some time later.
>>
>> The issue was found when debugging why an instance of
>> java/lang/management/ThreadInfo on the Java heap had a stale pointer
>> in its threadName field. Turns out that the java.lang.Thread instance
>> passed to the ThreadInfo was stale, most likely for the reason outlined
>> in the paragraph above.
>>
>> This patch fixes the issue by ensuring that a ThreadSnapshot is always
>> registered in a ThreadDumpResult before the initialization of the
>> ThreadSnapshot is done. This ensures that the GC will always be able
>> to find the oop ThreadSnapshot::_threadObj via ThreadDumpResult::oops_do.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~ehelin/8213231/00/
>
> This one caught my eye since I touched the ThreadSnapshot code in the
> Thread-SMR project...
>
> src/hotspot/share/runtime/vmOperations.cpp
> No comments.
>
> src/hotspot/share/runtime/vmOperations.hpp
> No comments.
>
> src/hotspot/share/services/management.cpp
> No comments.
>
> src/hotspot/share/services/threadService.cpp
> No comments.
>
> src/hotspot/share/services/threadService.hpp
> No comments.
>
> Thumbs up!
Thanks Dan, appreciate you taking the time to review this!
Erik
> Dan
>
>
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8213231
>>
>> Testing:
>> - Tier 1, 2 and 3 on Windows, Mac, Linux (all x86-64)
>> - RunThese30M (multiple runs) and RunThese24h on Linux x86-64
>> (please note that I never managed to reproduce the issue, all
>> analysis was done based on a core file)
>>
>> Thanks,
>> Erik
>>
>
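The ordering problem described in the quoted message can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyHeap, ToySnapshot, the integer "addresses" and both helper functions are hypothetical names invented for illustration and are not HotSpot code. A moving collection relocates the object and fixes up only the slots it has been told about, so a slot that is filled in before it is registered keeps the old address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these types are hypothetical and are not HotSpot classes.
using Address = std::size_t;

struct ToyHeap {
    Address thread_obj_addr = 100;   // current location of the "thread object"
    std::vector<Address*> roots;     // slots the collector knows how to update

    // A moving collection: relocate the object, then fix up every known root.
    void collect() {
        const Address old_addr = thread_obj_addr;
        thread_obj_addr = old_addr + 1000;
        for (Address* slot : roots) {
            assert(slot != nullptr);
            if (*slot == old_addr) {
                *slot = thread_obj_addr;
            }
        }
    }
};

struct ToySnapshot {
    Address thread_obj = 0;

    // Same shape as the constructor described in the message: store the
    // reference first, then do work that may let a collection run.
    void initialize(ToyHeap& heap) {
        thread_obj = heap.thread_obj_addr;
        heap.collect();  // stands in for the pending GC VM operation
    }
};

// Problematic order: initialize first, register afterwards.
// The collection runs while the slot is still unknown to the collector.
bool stale_when_initialized_before_registration() {
    ToyHeap heap;
    ToySnapshot snap;
    snap.initialize(heap);
    heap.roots.push_back(&snap.thread_obj);
    return snap.thread_obj != heap.thread_obj_addr;
}

// Order used by the fix as described: register first, initialize afterwards.
// The collector sees the slot and updates it when the object moves.
bool current_when_registered_before_initialization() {
    ToyHeap heap;
    ToySnapshot snap;
    heap.roots.push_back(&snap.thread_obj);
    snap.initialize(heap);
    return snap.thread_obj == heap.thread_obj_addr;
}
```

Both helpers return true in this model: the first because the snapshot still holds the pre-collection address, the second because the registered slot was updated. The real patch achieves the second ordering by registering the ThreadSnapshot in a ThreadDumpResult before its initialization is done; the details are in the webrev.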