RFR: 8229169: False failure of GenericTaskQueue::pop_local on architectures with weak memory model
Doerr, Martin
martin.doerr at sap.com
Tue Aug 6 09:29:13 UTC 2019
Hi Jie,
thanks for reporting and analyzing this issue.
From your description, OrderAccess ::loadload() seems to be the appropriate barrier.
It should be used on all platforms because it contains a compiler barrier for TSO platforms which prevent compilers from reordering the load accesses.
We should also check that the writer of _age._top uses at least a release (or storestore) barrier in your scenario.
I wonder why I've never seen this issue on PPC64. The test "TestHumongousClassLoader" seems to work stable. But could be that we just never hit this corner case by chance.
If I missed anything please let me know.
Best regards,
Martin
> -----Original Message-----
> From: hotspot-gc-dev <hotspot-gc-dev-bounces at openjdk.java.net> On
> Behalf Of Jie Fu
> Sent: Dienstag, 6. August 2019 10:22
> To: Hotspot-Gc-Dev <hotspot-gc-dev at openjdk.java.net>
> Subject: RFR: 8229169: False failure of GenericTaskQueue::pop_local on
> architectures with weak memory model
>
> Hi all,
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8229169
> Webrev: http://cr.openjdk.java.net/~jiefu/8229169/webrev.00/
>
> *Background*
> Various GC crashes were observed on our Loongson CPUs which support a
> weak memory model.
> These crashes can be reproduced with the following test.
> ---------------------------------------------------------
> make test
> TEST="hotspot/jtreg/gc/g1/humongousObjects/TestHumongousClassLoader
> .java"
> CONF=release
> ---------------------------------------------------------
>
> *Analysis*
> Crashes were caused by the false failure of GenericTaskQueue::pop_local
> [1].
> A corner case had been observed on architectures allowing loads
> reordering, which led to the various GC crashes.
>
> With weak memory architectures, the load of '_age.get()' in line 187 may
> float up before the load of '_age.top' in line 179.
> However, for some corner case, the work stealing algorithm may become
> incorrect if this reordering occurs.
> ---------------------------------------------------------
> 153
> 154 template<class E, MEMFLAGS F, unsigned int N> inline bool
> 155 GenericTaskQueue<E, F, N>::pop_local(volatile E& t, uint threshold) {
> 156 uint localBot = _bottom;
> 157 // This value cannot be N-1. That can only occur as a result of
> 158 // the assignment to bottom in this method. If it does, this method
> 159 // resets the size to 0 before the next call (which is sequential,
> 160 // since this is pop_local.)
> 161 uint dirty_n_elems = dirty_size(localBot, _age.top());
> 162 assert(dirty_n_elems != N - 1, "Shouldn't be possible...");
> 163 if (dirty_n_elems <= threshold) return false;
> 164 localBot = decrement_index(localBot);
> 165 _bottom = localBot;
> 166 // This is necessary to prevent any read below from being reordered
> 167 // before the store just above.
> 168 OrderAccess::fence();
> 169 // g++ complains if the volatile result of the assignment is
> 170 // unused, so we cast the volatile away. We cannot cast directly
> 171 // to void, because gcc treats that as not using the result of the
> 172 // assignment. However, casting to E& means that we trigger an
> 173 // unused-value warning. So, we cast the E& to void.
> 174 (void) const_cast<E&>(t = _elems[localBot]);
> 175 // This is a second read of "age"; the "size()" above is the first.
> 176 // If there's still at least one element in the queue, based on the
> 177 // "_bottom" and "age" we've read, then there can be no
> interference with
> 178 // a "pop_global" operation, and we're done.
> 179 idx_t tp = _age.top(); // XXX
> 180 if (size(localBot, tp) > 0) {
> 181 assert(dirty_size(localBot, tp) != N - 1, "sanity");
> 182 TASKQUEUE_STATS_ONLY(stats.record_pop());
> 183 return true;
> 184 } else {
> 185 // Otherwise, the queue contained exactly one element; we take
> the slow
> 186 // path.
> 187 return pop_local_slow(localBot, _age.get());
> 188 }
> 189 }
> 190
> ---------------------------------------------------------
>
> Assume that after the execution of line 168, the status of the task
> queue is:
> _top = 8825; _bottom = 8826; N = 131072
>
> Just imagine the following corner case:
> -1) The load of '_age.get()' in line 187 is floating up before the
> load of '_age.top' in line 179, and gets _age._top = 8825
> Concurrently, another thread steals a task from the queue, and
> changes the task queue status to:
> _top = 8826; _bottom = 8826; N = 131072
> This status means that there is still one task (indexed by _top =
> 8826) in the queue
> -2) Then the load of '_age.top' in line 179 gets tp = _age._top = 8826
> -3) Then the if-condition in line 180 is false, and pop_local_slow is
> called with parameters localBot = 8826 and _age.get()._top = 8825
> -4) pop_local_slow will return false since localBot !=
> _age.get().top() [2], and the task queue will be set empty [3]
> It's obvious incorrect to empty the task queue if the remaining
> task in step 1) hasn't been processed yet.
> -5) Then pop_local returns false, which is wrong again if the
> remaining task hasn't been stolen by another thread
>
>
> *Fix*
> For weak memory architectures, a memory fence before line 187 is
> required to prevent the load of _age from floating up.
>
> Could you please review it?
>
> Thanks a lot.
> Best regards,
> Jie
>
> [1]
> http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/
> shared/taskqueue.inline.hpp#l155
> [2]
> http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/
> shared/taskqueue.inline.hpp#l135
> [3]
> http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/
> shared/taskqueue.inline.hpp#l149
>
More information about the hotspot-gc-dev
mailing list