build openjdk with fsanitizer=thread

Jie He Jie.He at arm.com
Thu Feb 27 09:01:15 UTC 2020


So  the basic thinking about data race detection I got from these papers before is that little false positives, little real races,
Is it still correct for TSAN2?

And as I understand, the false positive case "message queue" in old paper 6.4.2 is not considered as a false positive any more by TSAN2 even using atomic operations, right?

thanks

-----Original Message-----
From: Dmitry Vyukov <dvyukov at google.com> 
Sent: Thursday, February 27, 2020 4:41 PM
To: Jie He <Jie.He at arm.com>
Cc: Arthur Eubanks <aeubanks at google.com>; tsan-dev at openjdk.java.net; nd <nd at arm.com>; thread-sanitizer <thread-sanitizer at googlegroups.com>
Subject: Re: build openjdk with fsanitizer=thread

On Thu, Feb 27, 2020 at 7:56 AM Jie He <Jie.He at arm.com> wrote:
>
> > Tsan is not supposed to produce any false positives according to C/C++ rules (if used correctly and understand synchronization, which should be the case here as JVM seem to use pthread synchronization primitives). Suppressions are mostly to temporarily suppress known bugs for example to make CI green while somebody works on the bug fix.
>
> You said tsan is not supposed to produce any false positives according to c/c++ rules, but the papers about dynamic data detection like djit+, lockset, they all said false positive is unavoidable, even using pure happens-before mode because the algorithm doesn't understand the synchronization semantic indeed. And also the paper "ThreadSanitizer – data race detection in practice" lists some false positive cases, even a simple producer-consumer model.
>
> And thanks your explanation and patience again.

Lockset, yes, it inherently does not understand lock-free synchronization and atomic operations.
The ThreadSanitizer paper is about (very) old version of ThreadSanitizer that was valgrind-based and is long dead. That old TSan could work in 2 modes: lock-set and happens-before. Lock-set mode indeed had lots of false positives, that's probably what the paper refers to. The happens-before was better but had a limitation related to Valgrind binary instrumentation -- it's not possible to see/intercept atomic operations on binary level. So it requires annotations for atomic operations.
The current TSan is only happens-before-based and is based on compiler instrumentation, so it understands std::atomic and __atomic_ builtins (and C atomics as consequence I think). So it does not have these classes of false positives.

I lied a bit about the complete absence of false positives. TSan still does not handle stand-alone memory barriers (std::atomic_thread_fence), there are no fundamental limitations but it's not implemented. It still understands atomic loads and stores with fine-grained memory ordering constraints (acquire/release/relaxed).
But what I meant is: don't write off everything you see as a false positive right away and suppress it. Then there is also little point in applying TSan in the first place. It is capable of finding very tricky race conditions in tricky code, e.g. it's used extensively on
V8 and to some degree enabled switch a concurrent GC by catching a number of races between runtime/mutators.




> -----Original Message-----
> From: Dmitry Vyukov <dvyukov at google.com>
> Sent: Thursday, February 27, 2020 2:22 PM
> To: Jie He <Jie.He at arm.com>
> Cc: Arthur Eubanks <aeubanks at google.com>; tsan-dev at openjdk.java.net; 
> nd <nd at arm.com>; thread-sanitizer <thread-sanitizer at googlegroups.com>
> Subject: Re: build openjdk with fsanitizer=thread
>
> On Thu, Feb 27, 2020 at 4:45 AM Jie He <Jie.He at arm.com> wrote:
> >
> > Yes, you remind me, I ever added a suppression in supp file 
> > "call_from_lib:libjvm.so" to try to get a clear report, but seems it 
> > didn't succeed. There are still a lot of reports produced,  some 
> > threads have stack trace and recognize pthread function well, but 
> > others not,
>
> Oh, if the JVM is compiled to libjvm.so, this is not going to work, this literally tells tsan to ignore all pthread interceptors coming from libjvm.so.
> call_from_lib is meant only for non-instrumented libraries, because compiler instrumentation is not ignored. So what we have with this suppression is that all synchronization (pthread) is ignored but memory accesses are not. This is intended that this will produce tons of nonsensical reports.
> And even for non-instrumented libraries call_from_lib may lead to false positives if the non-instrumented library synchronizes any of instrumented code.
> call_from_lib is meant for very special contexts, generally don't use it.
>
>
> > is it correct behavior? I think no reports should be produced if I add this strong suppression.
>
> Yes. And it's not strong in all respects, only in some.
>
>
> > Then I removed the supp item,  seems Mutex could be handled well, but still a lot of reports exist, I will check and suppress them if it's not a race.
>
> Tsan is not supposed to produce any false positives according to C/C++ rules (if used correctly and understand synchronization, which should be the case here as JVM seem to use pthread synchronization primitives). Suppressions are mostly to temporarily suppress known bugs for example to make CI green while somebody works on the bug fix.
>
>
>
> > Thank u.
> >
> > -----Original Message-----
> > From: Dmitry Vyukov <dvyukov at google.com>
> > Sent: Thursday, February 27, 2020 12:12 AM
> > To: Jie He <Jie.He at arm.com>
> > Cc: Arthur Eubanks <aeubanks at google.com>; tsan-dev at openjdk.java.net; 
> > nd <nd at arm.com>; thread-sanitizer 
> > <thread-sanitizer at googlegroups.com>
> > Subject: Re: build openjdk with fsanitizer=thread
> >
> > This is puzzling. It seems to be protected by the PlatformMutex, which is pthread_mutex_t underneath:
> > https://github.com/openjdk/jdk/blob/0da0333a06aef32ce7af3448dfa38c1f
> > 40 c32826/src/hotspot/os/posix/os_posix.inline.hpp
> >
> > Tsan does understand pthread_mutex_t, that's a very widely used functionality. So there is something fishy going on.
> > Do we know that it intercepts/understands at least some of pthread mutexes? Or there is some systematic problem which results in tsan missing just all of pthread? By any chance don't you have something like "called_from_lib:pthread" in your tsan.supp?
> >
> > There is also something fishy in the reports  you posted. First one happens inside of MutexLocker::~MutexLocker(). The question is: why the race wasn't reported in the MutexLocker constructor where we write to owner_ as well? I don't see any reasonable execution where write in ctor does not race, but write in dtor races...
> > The same in the second report: the race is in notify method, but we also called lock before which write to owner_. Why that did not race?...
> >
> >
> >
> >
> > On Wed, Feb 26, 2020 at 3:11 PM Jie He <Jie.He at arm.com> wrote:
> > >
> > > Oh, I misunderstand the race object, but _owner also appears to be 
> > > protected by the member var _lock within class Mutex.
> > >
> > > -----Original Message-----
> > > From: Dmitry Vyukov <dvyukov at google.com>
> > > Sent: Wednesday, February 26, 2020 9:56 PM
> > > To: Jie He <Jie.He at arm.com>
> > > Cc: Arthur Eubanks <aeubanks at google.com>; 
> > > tsan-dev at openjdk.java.net; nd <nd at arm.com>; thread-sanitizer 
> > > <thread-sanitizer at googlegroups.com>
> > > Subject: Re: build openjdk with fsanitizer=thread
> > >
> > > On Wed, Feb 26, 2020 at 2:46 PM Jie He <Jie.He at arm.com> wrote:
> > > >
> > > > Is this case you mentioned "_owned_by_self"
> > > >
> > > > >     #0 Mutex::owned_by_self() const /home/wave/workspace/jdk_master/src/hotspot/share/runtime/mutex.cpp:301:10 (libjvm.so+0x1925966)
> > > > >     #1 assert_lock_strong(Mutex const*) /home/wave/workspace/jdk_master/src/hotspot/share/runtime/mutexLocker.cpp:187:13 (libjvm.so+0x19272fa)
> > > > >     #2 MutexLocker::~MutexLocker() /home/wave/workspace/jdk_master/src/hotspot/share/runtime/mutexLocker.hpp:237:7 (libjvm.so+0x33da8a)
> > > > >     #3 G1YoungRemSetSamplingThread::stop_service() /home/wave/workspace/jdk_master/src/hotspot/share/gc/g1/g1YoungRemSetSamplingThread.cpp:129:1 (libjvm.so+0x10c50e0)
> > > > >     #4 ConcurrentGCThread::stop() /home/wave/workspace/jdk_master/src/hotspot/share/gc/shared/concurrentGCThread.cpp:65:3 (libjvm.so+0xc9fa11)
> > > > >     #5 G1CollectedHeap::stop() /home/wave/workspace/jdk_master/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:1867:31 (libjvm.so+0xfae058)
> > > > >     #6 before_exit(JavaThread*) /home/wave/workspace/jdk_master/src/hotspot/share/runtime/java.cpp:461:21 (libjvm.so+0x1215b84)
> > > > >     #7 Threads::destroy_vm() /home/wave/workspace/jdk_master/src/hotspot/share/runtime/thread.cpp:4418:3 (libjvm.so+0x1e326ee)
> > > > >     #8 jni_DestroyJavaVM_inner(JavaVM_*) /home/wave/workspace/jdk_master/src/hotspot/share/prims/jni.cpp:3989:7 (libjvm.so+0x137a5eb)
> > > > >     #9 jni_DestroyJavaVM /home/wave/workspace/jdk_master/src/hotspot/share/prims/jni.cpp:4007:14 (libjvm.so+0x137a3ef)
> > > > >     #10 JavaMain
> > > >
> > > > Seems thread::current() is a TLS variable, refer to the code:
> > > >
> > > > inline Thread* Thread::current_or_null() { #ifndef 
> > > > USE_LIBRARY_BASED_TLS_ONLY
> > > >   return _thr_current;
> > > > #else
> > > >   if (ThreadLocalStorage::is_initialized()) {
> > > >     return ThreadLocalStorage::thread();
> > > >   }
> > > >   return NULL;
> > > > #endif
> > > > }
> > >
> > > The race is on Mutex::owner_ based on the fact that another thread does a write, and on allocation stack, and the race just seems to be visible in the code and is a common race.


More information about the tsan-dev mailing list