RFR: 8252627: Make it safe for JFR thread to read threadObj

David Holmes david.holmes at oracle.com
Wed Sep 2 22:05:50 UTC 2020


Hi Erik,

Thanks for clarifying the details around the Threads_lock.

David

On 2/09/2020 6:43 pm, Erik Österlund wrote:
> Hi David,
> 
> Thanks for reviewing this.
> 
> On 2020-09-02 07:14, David Holmes wrote:
>> Hi Erik,
>>
>> On 1/09/2020 8:55 pm, Erik Österlund wrote:
>>> Hi,
>>>
>>> Since the thread oop moved to OopStorage, a bug was introduced. The 
>>> oop is now read with load barriers.
>>> If the JFR thread shoots a signal at a thread that holds a load 
>>> barrier lock, the target thread is stuck
>>> holding that lock. Then the JFR thread tries to read the threadObj(), 
>>> which might grab the same lock.
>>> This causes a deadlock. There are also some ZGC asserts that fire, 
>>> not expecting relocation to happen
>>> from an unknown non-Java thread. I had fixes for said problems in my 
>>> own concurrent stack scanning branch,
>>> but forgot to point it out in the review thread for JDK-8244997.
>>>
>>> My solution is to read the threadObj before the target thread is shot 
>>> down, and simply pass in the already
>>> sampled thread oop to the context where the events are created. 
>>
>> That makes sense given the target thread's threadObj() can't change.
> 
> Exactly.
> 
>>> All of this is done with the Threads_lock
>>> taken (today) to lock out safepoints from destroying the oop.
>>
>> Not clear to me (being unfamiliar with the general structure of the 
>> JFR code) where this taking of the Threads_lock is occurring relative 
>> to these raw oop uses?
> 
> The JfrThreadSampler::task_stacktrace function grabs the Threads_lock, 
> and selects target threads to sample. Said sampling is done with a call 
> to JfrThreadSampleClosure::do_sample_thread (which starts by asserting 
> the lock is owned). In there, it has two modes of sampling: one for 
> in_native and one for in_java threads. The native case constructs a 
> JfrNativeSamplerCallback, which caches the oop in the constructor, and 
> the java case constructs a OSThreadSampler, which caches the oop in the 
> constructor. Both of them act as corresponding context for the native 
> and java sampling correspondingly, and the cached oop is used in the 
> subsequent callbacks, when the threads have been stopped, instead of 
> reading the oop fresh from the target threads.
> 
>> Also what kind of thread is executing this code?
> 
> The JFR sampling thread is a non-Java thread. And that is indeed why it 
> is taking the Threads_lock in the first place. It is currently the only 
> way for a non-Java and non-GC thread to block out safepoints.
> 
> Thanks,
> /Erik
> 
>> Thanks,
>> David
>> -----
>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8252627/webrev.00/
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8252627
>>>
>>> Thanks,
>>> /Erik
> 


More information about the hotspot-dev mailing list