RFR: 8229169: False failure of GenericTaskQueue::pop_local on architectures with weak memory model

Jie Fu fujie at loongson.cn
Tue Aug 6 08:21:53 UTC 2019


Hi all,

JBS:    https://bugs.openjdk.java.net/browse/JDK-8229169
Webrev: http://cr.openjdk.java.net/~jiefu/8229169/webrev.00/

*Background*
Various GC crashes were observed on our Loongson CPUs which support a 
weak memory model.
These crashes can be reproduced with the following test.
---------------------------------------------------------
make test 
TEST="hotspot/jtreg/gc/g1/humongousObjects/TestHumongousClassLoader.java" 
CONF=release
---------------------------------------------------------

*Analysis*
Crashes were caused by the false failure of GenericTaskQueue::pop_local [1].
A corner case had been observed on architectures allowing loads 
reordering, which led to the various GC crashes.

With weak memory architectures, the load of '_age.get()' in line 187 may 
float up before the load of '_age.top' in line 179.
However, for some corner case, the work stealing algorithm may become 
incorrect if this reordering occurs.
---------------------------------------------------------
153
154 template<class E, MEMFLAGS F, unsigned int N> inline bool
155 GenericTaskQueue<E, F, N>::pop_local(volatile E& t, uint threshold) {
156   uint localBot = _bottom;
157   // This value cannot be N-1.  That can only occur as a result of
158   // the assignment to bottom in this method.  If it does, this method
159   // resets the size to 0 before the next call (which is sequential,
160   // since this is pop_local.)
161   uint dirty_n_elems = dirty_size(localBot, _age.top());
162   assert(dirty_n_elems != N - 1, "Shouldn't be possible...");
163   if (dirty_n_elems <= threshold) return false;
164   localBot = decrement_index(localBot);
165   _bottom = localBot;
166   // This is necessary to prevent any read below from being reordered
167   // before the store just above.
168   OrderAccess::fence();
169   // g++ complains if the volatile result of the assignment is
170   // unused, so we cast the volatile away.  We cannot cast directly
171   // to void, because gcc treats that as not using the result of the
172   // assignment.  However, casting to E& means that we trigger an
173   // unused-value warning.  So, we cast the E& to void.
174   (void) const_cast<E&>(t = _elems[localBot]);
175   // This is a second read of "age"; the "size()" above is the first.
176   // If there's still at least one element in the queue, based on the
177   // "_bottom" and "age" we've read, then there can be no 
interference with
178   // a "pop_global" operation, and we're done.
179   idx_t tp = _age.top();    // XXX
180   if (size(localBot, tp) > 0) {
181     assert(dirty_size(localBot, tp) != N - 1, "sanity");
182     TASKQUEUE_STATS_ONLY(stats.record_pop());
183     return true;
184   } else {
185     // Otherwise, the queue contained exactly one element; we take 
the slow
186     // path.
187     return pop_local_slow(localBot, _age.get());
188   }
189 }
190
---------------------------------------------------------

Assume that after the execution of line 168, the status of the task 
queue is:
         _top = 8825; _bottom = 8826; N = 131072

Just imagine the following corner case:
  -1) The load of '_age.get()' in line 187 is floating up before the 
load of '_age.top' in line 179, and gets _age._top = 8825
      Concurrently, another thread steals a task from the queue, and 
changes the task queue status to:
         _top = 8826; _bottom = 8826; N = 131072
      This status means that there is still one task (indexed by _top = 
8826) in the queue
  -2) Then the load of '_age.top' in line 179 gets tp = _age._top = 8826
  -3) Then the if-condition in line 180 is false, and pop_local_slow is 
called with parameters localBot = 8826 and _age.get()._top = 8825
  -4) pop_local_slow will return false since localBot != 
_age.get().top() [2], and the task queue will be set empty [3]
      It's obvious incorrect to empty the task queue if the remaining 
task in step 1) hasn't been processed yet.
  -5) Then pop_local returns false, which is wrong again if the 
remaining task hasn't been stolen by another thread


*Fix*
For weak memory architectures, a memory fence before line 187 is 
required to prevent the load of _age from floating up.

Could you please review it?

Thanks a lot.
Best regards,
Jie

[1] 
http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/shared/taskqueue.inline.hpp#l155
[2] 
http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/shared/taskqueue.inline.hpp#l135
[3] 
http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/shared/taskqueue.inline.hpp#l149





More information about the hotspot-gc-dev mailing list