RFR: 8229169: False failure of GenericTaskQueue::pop_local on architectures with weak memory model
Jie Fu
fujie at loongson.cn
Tue Aug 6 08:21:53 UTC 2019
Hi all,
JBS: https://bugs.openjdk.java.net/browse/JDK-8229169
Webrev: http://cr.openjdk.java.net/~jiefu/8229169/webrev.00/
*Background*
Various GC crashes were observed on our Loongson CPUs which support a
weak memory model.
These crashes can be reproduced with the following test.
---------------------------------------------------------
make test
TEST="hotspot/jtreg/gc/g1/humongousObjects/TestHumongousClassLoader.java"
CONF=release
---------------------------------------------------------
*Analysis*
Crashes were caused by the false failure of GenericTaskQueue::pop_local [1].
A corner case had been observed on architectures allowing loads
reordering, which led to the various GC crashes.
With weak memory architectures, the load of '_age.get()' in line 187 may
float up before the load of '_age.top' in line 179.
However, for some corner case, the work stealing algorithm may become
incorrect if this reordering occurs.
---------------------------------------------------------
153
154 template<class E, MEMFLAGS F, unsigned int N> inline bool
155 GenericTaskQueue<E, F, N>::pop_local(volatile E& t, uint threshold) {
156 uint localBot = _bottom;
157 // This value cannot be N-1. That can only occur as a result of
158 // the assignment to bottom in this method. If it does, this method
159 // resets the size to 0 before the next call (which is sequential,
160 // since this is pop_local.)
161 uint dirty_n_elems = dirty_size(localBot, _age.top());
162 assert(dirty_n_elems != N - 1, "Shouldn't be possible...");
163 if (dirty_n_elems <= threshold) return false;
164 localBot = decrement_index(localBot);
165 _bottom = localBot;
166 // This is necessary to prevent any read below from being reordered
167 // before the store just above.
168 OrderAccess::fence();
169 // g++ complains if the volatile result of the assignment is
170 // unused, so we cast the volatile away. We cannot cast directly
171 // to void, because gcc treats that as not using the result of the
172 // assignment. However, casting to E& means that we trigger an
173 // unused-value warning. So, we cast the E& to void.
174 (void) const_cast<E&>(t = _elems[localBot]);
175 // This is a second read of "age"; the "size()" above is the first.
176 // If there's still at least one element in the queue, based on the
177 // "_bottom" and "age" we've read, then there can be no
interference with
178 // a "pop_global" operation, and we're done.
179 idx_t tp = _age.top(); // XXX
180 if (size(localBot, tp) > 0) {
181 assert(dirty_size(localBot, tp) != N - 1, "sanity");
182 TASKQUEUE_STATS_ONLY(stats.record_pop());
183 return true;
184 } else {
185 // Otherwise, the queue contained exactly one element; we take
the slow
186 // path.
187 return pop_local_slow(localBot, _age.get());
188 }
189 }
190
---------------------------------------------------------
Assume that after the execution of line 168, the status of the task
queue is:
_top = 8825; _bottom = 8826; N = 131072
Just imagine the following corner case:
-1) The load of '_age.get()' in line 187 is floating up before the
load of '_age.top' in line 179, and gets _age._top = 8825
Concurrently, another thread steals a task from the queue, and
changes the task queue status to:
_top = 8826; _bottom = 8826; N = 131072
This status means that there is still one task (indexed by _top =
8826) in the queue
-2) Then the load of '_age.top' in line 179 gets tp = _age._top = 8826
-3) Then the if-condition in line 180 is false, and pop_local_slow is
called with parameters localBot = 8826 and _age.get()._top = 8825
-4) pop_local_slow will return false since localBot !=
_age.get().top() [2], and the task queue will be set empty [3]
It's obvious incorrect to empty the task queue if the remaining
task in step 1) hasn't been processed yet.
-5) Then pop_local returns false, which is wrong again if the
remaining task hasn't been stolen by another thread
*Fix*
For weak memory architectures, a memory fence before line 187 is
required to prevent the load of _age from floating up.
Could you please review it?
Thanks a lot.
Best regards,
Jie
[1]
http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/shared/taskqueue.inline.hpp#l155
[2]
http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/shared/taskqueue.inline.hpp#l135
[3]
http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/shared/taskqueue.inline.hpp#l149
More information about the hotspot-gc-dev
mailing list