Running Apache Artemis with jdk-tsan; some ready-to-use TSAN .supp files?

Man Cao manc at google.com
Fri Apr 24 18:47:26 UTC 2020


Yes. We do have an internal list of suppressed races.
We will publish those suppressed races from JDK library and 3rd party code
to the Github repository.

-Man


On Fri, Apr 24, 2020 at 4:55 AM Jiri Daněk <jdanek at redhat.com> wrote:

> Hello folks,
>
> I moved from trivial hello-world programs to running something real under
> jdk-tsan. I got a lot of races reported, and I am trying to sort through
> them now. I used clang 9 to compile jdk-tsan.
>
> The software under test is https://github.com/apache/activemq-artemis
>
> I went through the reports, and I sorted it into six categories. (I did not
> go through everything that was reported yet. Depending how much
> functionality I exercise, the reported races are different.)
>
> 1) races in C++ code of the JDK, not my concern
> 2) races in classes under java.util.concurrent in the JDK, probably OK
> because Java memory model is defined for racey programs too, and the
> authors hopefully knew what they were doing
> 3) races in 3rd party code (jetty), which I am going to ignore for now
> 4) races in 1st party code in counters and timers, which I think are
> harmless, because something will synchronize the memory eventually, and we
> will learn of the timeout, or the counters will be somewhat off, but we
> don't lose performance, so it is a worthwhile tradeoff.
> 5) races in 1st party custom concurrent structures (there is a custom hash
> map) which are hopefully harmless as in 2)
> 6) races in 1st party regular code, which are worth investigating, and I
> think that it is best to prioritize races with write in constructor and
> read or write somewhere else, because that can lead to
> partially-constructed objects, which are supposed to be a big problem
>
> Is my thinking here correct? Has anyone already started writing TSan
> suppression files for things in the JDK and standard library that can be
> safely ignored by application programmers?
>
> I hoped to be able to search for the TSan warnings and see how other people
> were resolving them, but it looks like jdk-tsan does not have that many
> users yet.
>
> Thanks!
>
> My categorization and analysis is the following. The section numbers are
> just to order things, they are not related to the list above.
>
> # 01.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-01-java
>
> Reports from JDK itself, coming from
> jdk-tsan/src/java.base/share/native/libzip/zip_util.c. I am not going to
> worry about what JDK itself is doing.
>
> # 02.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-02-java
>
> Races reported on JDK's synchronization objects, here it is in
>
> java.util.concurrent.locks.AbstractOwnableSynchronizer.setExclusiveOwnerThread
> / getExclusiveOwnerThread
>
> # 03.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-03-java
>
> Races from org.eclipse.jetty. Some of them look potentially harmful, for
> example between java.util.ArrayDeque.pollFirst and
> java.util.ArrayDeque.<init>. I am on x86_64, so I need not worry too much
> (I hope), but AFAIK if I do not synchronize between initializing an object
> and using it, it may happen that the thread using it sees a partially
> initialized object. Meaning that any races with .<init> should be probably
> the first to investigate?
>
> Anyways, Jetty is just a 3rd party dependency, not a priority overall for
> me.
>
> # 10.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-10-java
>
> Some races in the 1st party code, including races with writes in
> constructor. If I remember correctly, the code is (in some places) using
> Executors that are running only one thread at a time. If TSan reports a
> race from such code, that would mean that even though there is no
> parallelism present, there may be memory synchronization missing. So even
> if I find some situations like this, there can be real bugs.
>
> At the end of this listing, there are some races on java.util.HashMap and
> such. The class is not thread safe, so this should not be happening and
> that could be a possible bug too.
>
> # 11.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-11-java
>
> Races on java.util.concurrent.SynchronousQueue,
> java.util.concurrent.locks.ReentrantReadWriteLock,
> java.util.concurrent.ScheduledThreadPoolExecutor and other synchronization
> objects in the JDK, similar to 02.java. AFAIK it is ok to have races here,
> because people writing this knew what they were doing, racey code can be
> faster, the code is widely used and tested, therefore should not be cause
> of bugs. These reports could be safely suppressed in TSan suppression file.
>
> # 12.java, # 14.java, # 18.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-12-java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-14-java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-18-java
>
> Same as 02.java and 11.java before. This time, it is
> java.util.concurrent.locks.AbstractQueuedSynchronizer and
> java.util.concurrent.locks.ReentrantReadWriteLock.
>
> # 13.java, # 17.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-13-java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-17-java
>
> Races in 1st party code, probably same treatment as in 10.java.
>
> # 15.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-15-java
>
> Race in 1st party code on
>
> org.apache.activemq.artemis.core.journal.impl.JournalTransaction.internalgetCounter.
> This is supposed to use atomic ints, so should be fine, but maybe there is
> something off there. Could be that the software is fine with not having
> accurate counters and trade that for avoiding synchronization cost.
>
> # 16.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-16-java
>
> Race in a custom implementation of a concurrent hash map, in
> org.apache.activemq.artemis.utils.collections.ConcurrentLongHashMap. Could
> be fine, some races can allow for more performant code, and Java memory
> model is supposed to be defined even for racy programs. I may try to
> investigate, but I don't expect to find a problem. The code is in heavy use
> and seems to work.
>
> # 19.java
>
> https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-19-java
>
> Race in 1st party code,
> org.apache.activemq.artemis.utils.critical.CriticalComponentImpl.isExpired.
> This is probably again a case of trying to avoid synchronization, same as
> the counters before, given that something else will eventually synchronize
> the memory and we will learn about the timer expiration eventually. I think
> it is fine, and cannot imagine "fixing" this, without making things more
> complicated, or potentially slower due to more synchronization. It would be
> hard to prove that this is worth fixing and that the fixes are not too
> expensive.
> --
> Mit freundlichen Grüßen / Kind regards
> Jiri Daněk
>


More information about the tsan-dev mailing list