Running Apache Artemis with jdk-tsan; some ready-to-use TSAN .supp files?
Jiri Daněk
jdanek at redhat.com
Fri Apr 24 11:52:14 UTC 2020
Hello folks,
I moved from trivial hello-world programs to running something real under
jdk-tsan. I got a lot of races reported, and I am trying to sort through
them now. I used clang 9 to compile jdk-tsan.
The software under test is https://github.com/apache/activemq-artemis
I went through the reports, and I sorted it into six categories. (I did not
go through everything that was reported yet. Depending how much
functionality I exercise, the reported races are different.)
1) races in C++ code of the JDK, not my concern
2) races in classes under java.util.concurrent in the JDK, probably OK
because Java memory model is defined for racey programs too, and the
authors hopefully knew what they were doing
3) races in 3rd party code (jetty), which I am going to ignore for now
4) races in 1st party code in counters and timers, which I think are
harmless, because something will synchronize the memory eventually, and we
will learn of the timeout, or the counters will be somewhat off, but we
don't lose performance, so it is a worthwhile tradeoff.
5) races in 1st party custom concurrent structures (there is a custom hash
map) which are hopefully harmless as in 2)
6) races in 1st party regular code, which are worth investigating, and I
think that it is best to prioritize races with write in constructor and
read or write somewhere else, because that can lead to
partially-constructed objects, which are supposed to be a big problem
Is my thinking here correct? Has anyone already started writing TSan
suppression files for things in the JDK and standard library that can be
safely ignored by application programmers?
I hoped to be able to search for the TSan warnings and see how other people
were resolving them, but it looks like jdk-tsan does not have that many
users yet.
Thanks!
My categorization and analysis is the following. The section numbers are
just to order things, they are not related to the list above.
# 01.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-01-java
Reports from JDK itself, coming from
jdk-tsan/src/java.base/share/native/libzip/zip_util.c. I am not going to
worry about what JDK itself is doing.
# 02.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-02-java
Races reported on JDK's synchronization objects, here it is in
java.util.concurrent.locks.AbstractOwnableSynchronizer.setExclusiveOwnerThread
/ getExclusiveOwnerThread
# 03.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-03-java
Races from org.eclipse.jetty. Some of them look potentially harmful, for
example between java.util.ArrayDeque.pollFirst and
java.util.ArrayDeque.<init>. I am on x86_64, so I need not worry too much
(I hope), but AFAIK if I do not synchronize between initializing an object
and using it, it may happen that the thread using it sees a partially
initialized object. Meaning that any races with .<init> should be probably
the first to investigate?
Anyways, Jetty is just a 3rd party dependency, not a priority overall for
me.
# 10.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-10-java
Some races in the 1st party code, including races with writes in
constructor. If I remember correctly, the code is (in some places) using
Executors that are running only one thread at a time. If TSan reports a
race from such code, that would mean that even though there is no
parallelism present, there may be memory synchronization missing. So even
if I find some situations like this, there can be real bugs.
At the end of this listing, there are some races on java.util.HashMap and
such. The class is not thread safe, so this should not be happening and
that could be a possible bug too.
# 11.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-11-java
Races on java.util.concurrent.SynchronousQueue,
java.util.concurrent.locks.ReentrantReadWriteLock,
java.util.concurrent.ScheduledThreadPoolExecutor and other synchronization
objects in the JDK, similar to 02.java. AFAIK it is ok to have races here,
because people writing this knew what they were doing, racey code can be
faster, the code is widely used and tested, therefore should not be cause
of bugs. These reports could be safely suppressed in TSan suppression file.
# 12.java, # 14.java, # 18.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-12-java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-14-java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-18-java
Same as 02.java and 11.java before. This time, it is
java.util.concurrent.locks.AbstractQueuedSynchronizer and
java.util.concurrent.locks.ReentrantReadWriteLock.
# 13.java, # 17.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-13-java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-17-java
Races in 1st party code, probably same treatment as in 10.java.
# 15.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-15-java
Race in 1st party code on
org.apache.activemq.artemis.core.journal.impl.JournalTransaction.internalgetCounter.
This is supposed to use atomic ints, so should be fine, but maybe there is
something off there. Could be that the software is fine with not having
accurate counters and trade that for avoiding synchronization cost.
# 16.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-16-java
Race in a custom implementation of a concurrent hash map, in
org.apache.activemq.artemis.utils.collections.ConcurrentLongHashMap. Could
be fine, some races can allow for more performant code, and Java memory
model is supposed to be defined even for racy programs. I may try to
investigate, but I don't expect to find a problem. The code is in heavy use
and seems to work.
# 19.java
https://gist.github.com/jiridanek/279988855ca9d29d99c7218d8ca681f4#file-19-java
Race in 1st party code,
org.apache.activemq.artemis.utils.critical.CriticalComponentImpl.isExpired.
This is probably again a case of trying to avoid synchronization, same as
the counters before, given that something else will eventually synchronize
the memory and we will learn about the timer expiration eventually. I think
it is fine, and cannot imagine "fixing" this, without making things more
complicated, or potentially slower due to more synchronization. It would be
hard to prove that this is worth fixing and that the fixes are not too
expensive.
--
Mit freundlichen Grüßen / Kind regards
Jiri Daněk
More information about the tsan-dev
mailing list