RFR: 8252627: Make it safe for JFR thread to read threadObj
Erik Österlund
erik.osterlund at oracle.com
Wed Sep 2 08:43:39 UTC 2020
Hi David,
Thanks for reviewing this.
On 2020-09-02 07:14, David Holmes wrote:
> Hi Erik,
>
> On 1/09/2020 8:55 pm, Erik Österlund wrote:
>> Hi,
>>
>> Since the thread oop moved to OopStorage, a bug was introduced. The
>> oop is now read with load barriers.
>> If the JFR thread shoots a signal at a thread that holds a load
>> barrier lock, the target thread is stuck
>> holding that lock. Then the JFR thread tries to read the threadObj(),
>> which might grab the same lock.
>> This causes a deadlock. There are also some ZGC asserts that fire,
>> not expecting relocation to happen
>> from an unknown non-Java thread. I had fixes for said problems in my
>> own concurrent stack scanning branch,
>> but forgot to point it out in the review thread for JDK-8244997.
>>
>> My solution is to read the threadObj before the target thread is shot
>> down, and simply pass in the already
>> sampled thread oop to the context where the events are created.
>
> That makes sense given the target thread's threadObj() can't change.
Exactly.
>> All of this is done with the Threads_lock
>> taken (today) to lock out safepoints from destroying the oop.
>
> Not clear to me (being unfamiliar with the general structure of the
> JFR code) where this taking of the Threads_lock is occurring relative
> to these raw oop uses?
The JfrThreadSampler::task_stacktrace function grabs the Threads_lock,
and selects target threads to sample. Said sampling is done with a call
to JfrThreadSampleClosure::do_sample_thread (which starts by asserting
the lock is owned). In there, it has two modes of sampling: one for
in_native and one for in_java threads. The native case constructs a
JfrNativeSamplerCallback, which caches the oop in the constructor, and
the java case constructs a OSThreadSampler, which caches the oop in the
constructor. Both of them act as corresponding context for the native
and java sampling correspondingly, and the cached oop is used in the
subsequent callbacks, when the threads have been stopped, instead of
reading the oop fresh from the target threads.
> Also what kind of thread is executing this code?
The JFR sampling thread is a non-Java thread. And that is indeed why it
is taking the Threads_lock in the first place. It is currently the only
way for a non-Java and non-GC thread to block out safepoints.
Thanks,
/Erik
> Thanks,
> David
> -----
>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8252627/webrev.00/
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8252627
>>
>> Thanks,
>> /Erik
More information about the hotspot-dev
mailing list