RFR: 8252627: Make it safe for JFR thread to read threadObj
David Holmes
david.holmes at oracle.com
Wed Sep 2 22:05:50 UTC 2020
Hi Erik,
Thanks for clarifying the details around the Threads_lock.
David
On 2/09/2020 6:43 pm, Erik Österlund wrote:
> Hi David,
>
> Thanks for reviewing this.
>
> On 2020-09-02 07:14, David Holmes wrote:
>> Hi Erik,
>>
>> On 1/09/2020 8:55 pm, Erik Österlund wrote:
>>> Hi,
>>>
>>> Since the thread oop moved to OopStorage, a bug was introduced. The
>>> oop is now read with load barriers.
>>> If the JFR thread shoots a signal at a thread that holds a load
>>> barrier lock, the target thread is stuck
>>> holding that lock. Then the JFR thread tries to read the threadObj(),
>>> which might grab the same lock.
>>> This causes a deadlock. There are also some ZGC asserts that fire,
>>> not expecting relocation to happen
>>> from an unknown non-Java thread. I had fixes for said problems in my
>>> own concurrent stack scanning branch,
>>> but forgot to point it out in the review thread for JDK-8244997.
>>>
>>> My solution is to read the threadObj before the target thread is shot
>>> down, and simply pass in the already
>>> sampled thread oop to the context where the events are created.
>>
>> That makes sense given the target thread's threadObj() can't change.
>
> Exactly.
>
>>> All of this is done with the Threads_lock
>>> taken (today) to lock out safepoints from destroying the oop.
>>
>> Not clear to me (being unfamiliar with the general structure of the
>> JFR code) where this taking of the Threads_lock is occurring relative
>> to these raw oop uses?
>
> The JfrThreadSampler::task_stacktrace function grabs the Threads_lock,
> and selects target threads to sample. Said sampling is done with a call
> to JfrThreadSampleClosure::do_sample_thread (which starts by asserting
> the lock is owned). In there, it has two modes of sampling: one for
> in_native and one for in_java threads. The native case constructs a
> JfrNativeSamplerCallback, which caches the oop in the constructor, and
> the java case constructs a OSThreadSampler, which caches the oop in the
> constructor. Both of them act as corresponding context for the native
> and java sampling correspondingly, and the cached oop is used in the
> subsequent callbacks, when the threads have been stopped, instead of
> reading the oop fresh from the target threads.
>
>> Also what kind of thread is executing this code?
>
> The JFR sampling thread is a non-Java thread. And that is indeed why it
> is taking the Threads_lock in the first place. It is currently the only
> way for a non-Java and non-GC thread to block out safepoints.
>
> Thanks,
> /Erik
>
>> Thanks,
>> David
>> -----
>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8252627/webrev.00/
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8252627
>>>
>>> Thanks,
>>> /Erik
>
More information about the hotspot-dev
mailing list