RFR: 8229169: False failure of GenericTaskQueue::pop_local on architectures with weak memory model
Kim Barrett
kim.barrett at oracle.com
Wed Aug 7 02:17:21 UTC 2019
> On Aug 6, 2019, at 7:03 AM, Jie Fu <fujie at loongson.cn> wrote:
>
> Hi Martin,
>
> Thanks for your review and valuable comments.
>
> Updated: http://cr.openjdk.java.net/~jiefu/8229169/webrev.01/
>
> Please see comments inline.
>
> On 2019/8/6 下午5:29, Doerr, Martin wrote
>> From your description, OrderAccess ::loadload() seems to be the appropriate barrier.
>> It should be used on all platforms because it contains a compiler barrier for TSO platforms which prevent compilers from reordering the load accesses.
> Done. Thanks.
>>
>> We should also check that the writer of _age._top uses at least a release (or storestore) barrier in your scenario.
> The writer calls GenericTaskQueue<E, F, N>::pop_global to update _age._top with _age.cmpxchg(newAge, oldAge) [1] .
> I think it already contains the release (or storestore) barrier semantic.
> What do you think?
>> I wonder why I've never seen this issue on PPC64. The test "TestHumongousClassLoader" seems to work stable. But could be that we just never hit this corner case by chance.
> Here is my reproducer to debug this issue: http://cr.openjdk.java.net/~jiefu/8229169/reproducer/
> And this is my patch used to catch the corner case: http://cr.openjdk.java.net/~jiefu/8229169/gc-debug.diff
> HTH.
>
> Any comments?
>
> Thanks a lot.
> Best regards,
> Jie
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/8f067351c370/src/hotspot/share/gc/shared/taskqueue.inline.hpp#l222
The additional loadload barrier looks good. I can sponsor this change.
I'd like the comment to be explicit that the barrier is to prevent
reordering the two reads of _age. Perhaps note that if size == 0 then
_age cannot change after the read used for the size check.
I think a possible alternative to adding the barrier would be to
ensure the two uses of _age really are using the same data, as they
appear to be (incorrectly) assuming. That is, read _age once and use
the result of that read in both places, e.g. something like
Age curAge = _age.get(); // read once for consistent value below
idx_t tp = curAge.top();
...
} else {
return pop_local_slow(localBot, curAge);
}
I think I prefer the additional loadload barrier though. Maybe wait a
couple of days to see if anyone else wants to chime in though; these
sorts of things can be hard to review. I'll ask if anyone else here
has time to take a look.
Martin, I assume you also saw JDK-8229020, also from Jie? I'm
somewhat surprised neither of these has been previously reported.
Maybe these Loongson CPUs are more aggressively re-ordering reads than
are other platforms?
More information about the hotspot-gc-dev
mailing list