build openjdk with fsanitizer=thread

Jie He Jie.He at arm.com
Thu Feb 27 06:56:43 UTC 2020


> Tsan is not supposed to produce any false positives according to C/C++ rules (if used correctly and understand synchronization, which should be the case here as JVM seem to use pthread synchronization primitives). Suppressions are mostly to temporarily suppress known bugs for example to make CI green while somebody works on the bug fix.

You said tsan is not supposed to produce any false positives according to c/c++ rules, but the papers about dynamic data detection like djit+, lockset, they all said false positive is unavoidable, even using pure happens-before mode because the algorithm doesn't understand the synchronization semantic indeed. And also the paper "ThreadSanitizer – data race detection in practice" lists some false positive cases, even a simple producer-consumer model.

And thanks your explanation and patience again.

-----Original Message-----
From: Dmitry Vyukov <dvyukov at google.com> 
Sent: Thursday, February 27, 2020 2:22 PM
To: Jie He <Jie.He at arm.com>
Cc: Arthur Eubanks <aeubanks at google.com>; tsan-dev at openjdk.java.net; nd <nd at arm.com>; thread-sanitizer <thread-sanitizer at googlegroups.com>
Subject: Re: build openjdk with fsanitizer=thread

On Thu, Feb 27, 2020 at 4:45 AM Jie He <Jie.He at arm.com> wrote:
>
> Yes, you remind me, I ever added a suppression in supp file 
> "call_from_lib:libjvm.so" to try to get a clear report, but seems it 
> didn't succeed. There are still a lot of reports produced,  some 
> threads have stack trace and recognize pthread function well, but 
> others not,

Oh, if the JVM is compiled to libjvm.so, this is not going to work, this literally tells tsan to ignore all pthread interceptors coming from libjvm.so.
call_from_lib is meant only for non-instrumented libraries, because compiler instrumentation is not ignored. So what we have with this suppression is that all synchronization (pthread) is ignored but memory accesses are not. This is intended that this will produce tons of nonsensical reports.
And even for non-instrumented libraries call_from_lib may lead to false positives if the non-instrumented library synchronizes any of instrumented code.
call_from_lib is meant for very special contexts, generally don't use it.


> is it correct behavior? I think no reports should be produced if I add this strong suppression.

Yes. And it's not strong in all respects, only in some.


> Then I removed the supp item,  seems Mutex could be handled well, but still a lot of reports exist, I will check and suppress them if it's not a race.

Tsan is not supposed to produce any false positives according to C/C++ rules (if used correctly and understand synchronization, which should be the case here as JVM seem to use pthread synchronization primitives). Suppressions are mostly to temporarily suppress known bugs for example to make CI green while somebody works on the bug fix.



> Thank u.
>
> -----Original Message-----
> From: Dmitry Vyukov <dvyukov at google.com>
> Sent: Thursday, February 27, 2020 12:12 AM
> To: Jie He <Jie.He at arm.com>
> Cc: Arthur Eubanks <aeubanks at google.com>; tsan-dev at openjdk.java.net; 
> nd <nd at arm.com>; thread-sanitizer <thread-sanitizer at googlegroups.com>
> Subject: Re: build openjdk with fsanitizer=thread
>
> This is puzzling. It seems to be protected by the PlatformMutex, which is pthread_mutex_t underneath:
> https://github.com/openjdk/jdk/blob/0da0333a06aef32ce7af3448dfa38c1f40
> c32826/src/hotspot/os/posix/os_posix.inline.hpp
>
> Tsan does understand pthread_mutex_t, that's a very widely used functionality. So there is something fishy going on.
> Do we know that it intercepts/understands at least some of pthread mutexes? Or there is some systematic problem which results in tsan missing just all of pthread? By any chance don't you have something like "called_from_lib:pthread" in your tsan.supp?
>
> There is also something fishy in the reports  you posted. First one happens inside of MutexLocker::~MutexLocker(). The question is: why the race wasn't reported in the MutexLocker constructor where we write to owner_ as well? I don't see any reasonable execution where write in ctor does not race, but write in dtor races...
> The same in the second report: the race is in notify method, but we also called lock before which write to owner_. Why that did not race?...
>
>
>
>
> On Wed, Feb 26, 2020 at 3:11 PM Jie He <Jie.He at arm.com> wrote:
> >
> > Oh, I misunderstand the race object, but _owner also appears to be 
> > protected by the member var _lock within class Mutex.
> >
> > -----Original Message-----
> > From: Dmitry Vyukov <dvyukov at google.com>
> > Sent: Wednesday, February 26, 2020 9:56 PM
> > To: Jie He <Jie.He at arm.com>
> > Cc: Arthur Eubanks <aeubanks at google.com>; tsan-dev at openjdk.java.net; 
> > nd <nd at arm.com>; thread-sanitizer 
> > <thread-sanitizer at googlegroups.com>
> > Subject: Re: build openjdk with fsanitizer=thread
> >
> > On Wed, Feb 26, 2020 at 2:46 PM Jie He <Jie.He at arm.com> wrote:
> > >
> > > Is this case you mentioned "_owned_by_self"
> > >
> > > >     #0 Mutex::owned_by_self() const /home/wave/workspace/jdk_master/src/hotspot/share/runtime/mutex.cpp:301:10 (libjvm.so+0x1925966)
> > > >     #1 assert_lock_strong(Mutex const*) /home/wave/workspace/jdk_master/src/hotspot/share/runtime/mutexLocker.cpp:187:13 (libjvm.so+0x19272fa)
> > > >     #2 MutexLocker::~MutexLocker() /home/wave/workspace/jdk_master/src/hotspot/share/runtime/mutexLocker.hpp:237:7 (libjvm.so+0x33da8a)
> > > >     #3 G1YoungRemSetSamplingThread::stop_service() /home/wave/workspace/jdk_master/src/hotspot/share/gc/g1/g1YoungRemSetSamplingThread.cpp:129:1 (libjvm.so+0x10c50e0)
> > > >     #4 ConcurrentGCThread::stop() /home/wave/workspace/jdk_master/src/hotspot/share/gc/shared/concurrentGCThread.cpp:65:3 (libjvm.so+0xc9fa11)
> > > >     #5 G1CollectedHeap::stop() /home/wave/workspace/jdk_master/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:1867:31 (libjvm.so+0xfae058)
> > > >     #6 before_exit(JavaThread*) /home/wave/workspace/jdk_master/src/hotspot/share/runtime/java.cpp:461:21 (libjvm.so+0x1215b84)
> > > >     #7 Threads::destroy_vm() /home/wave/workspace/jdk_master/src/hotspot/share/runtime/thread.cpp:4418:3 (libjvm.so+0x1e326ee)
> > > >     #8 jni_DestroyJavaVM_inner(JavaVM_*) /home/wave/workspace/jdk_master/src/hotspot/share/prims/jni.cpp:3989:7 (libjvm.so+0x137a5eb)
> > > >     #9 jni_DestroyJavaVM /home/wave/workspace/jdk_master/src/hotspot/share/prims/jni.cpp:4007:14 (libjvm.so+0x137a3ef)
> > > >     #10 JavaMain
> > >
> > > Seems thread::current() is a TLS variable, refer to the code:
> > >
> > > inline Thread* Thread::current_or_null() { #ifndef 
> > > USE_LIBRARY_BASED_TLS_ONLY
> > >   return _thr_current;
> > > #else
> > >   if (ThreadLocalStorage::is_initialized()) {
> > >     return ThreadLocalStorage::thread();
> > >   }
> > >   return NULL;
> > > #endif
> > > }
> >
> > The race is on Mutex::owner_ based on the fact that another thread does a write, and on allocation stack, and the race just seems to be visible in the code and is a common race.


More information about the tsan-dev mailing list