[jdk13u-dev] RFR: 8264640: CMS ParScanClosure misses a barrier
John Cuthbertson
johnc at azul.com
Thu Apr 8 18:33:55 UTC 2021
Hi Anton,
This looks good to me. I think I’m still a reviewer for the jdk-updates project.
For the benefit of everyone else...
We were seeing this as a crash when obtaining the size of an object to be copied. The klass was observed to be transiently NULL. We found that the object, reached through another reference path, had already been copied and the from-space oop placed on the task queue for subsequent reference field scanning. The task queue, however, had overflowed, so the from-space oop was placed on the shared overflow queue, where objects are chained together through their klass field.
If the reads are ordered as they are in the code then everything is OK, as per the comment at line 105 (in ParScanClosure::do_oop_work), but we found that gcc had reordered the reads in the non-compressed oops case. So the mark word is read first and the object is observed to be not (yet) forwarded. Then, via another reference path, the object is copied, forwarded, and placed on the overflow task queue, overwriting the from-space object's klass. Then in the original path the klass is read and observed to be NULL or the next overflow entry, leading to the crash. When the from-space oop is dequeued, its klass is restored, which is what was observed in the core file.
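For readers who do not have the CMS sources at hand, the ordering requirement can be sketched in standalone C++. This is only an illustration under stated assumptions, not the HotSpot code or the actual patch: ObjHeader, Klass, kForwardedBits, size_if_not_forwarded and forward_and_chain are hypothetical names, and std::atomic_thread_fence stands in for whatever barrier the real fix uses. The reader loads the klass first, then the mark; the fence keeps the compiler (and, on weaker memory models, the CPU) from swapping the two loads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the object header; not the real oopDesc layout.
struct Klass { std::size_t instance_size; };

struct ObjHeader {
    std::atomic<std::uintptr_t> mark{0x1};  // assumption: low bits 0b11 mean "forwarded"
    std::atomic<Klass*> klass{nullptr};     // reused as the overflow-list link after forwarding
};

constexpr std::uintptr_t kForwardedBits = 0x3;

inline bool is_forwarded(std::uintptr_t m) {
    return (m & kForwardedBits) == kForwardedBits;
}

// Reader side: returns the object's size, or 0 if it is already forwarded.
std::size_t size_if_not_forwarded(const ObjHeader& obj) {
    // 1. Read the klass first.
    Klass* k = obj.klass.load(std::memory_order_relaxed);
    // 2. Keep the mark load below from being hoisted above the klass load.
    std::atomic_thread_fence(std::memory_order_acquire);
    // 3. Read the mark second.
    std::uintptr_t m = obj.mark.load(std::memory_order_relaxed);
    if (is_forwarded(m)) {
        return 0;  // k may already be an overflow link; do not touch it
    }
    // Not forwarded when the mark was read, so the earlier klass read is valid.
    assert(k != nullptr);
    return k->instance_size;
}

// Writer side: forward the object, then overwrite its klass with the list link.
void forward_and_chain(ObjHeader& obj, std::uintptr_t new_addr, Klass* overflow_link) {
    obj.mark.store(new_addr | kForwardedBits, std::memory_order_relaxed);
    // The forwarding store must become visible before the klass is overwritten.
    std::atomic_thread_fence(std::memory_order_release);
    obj.klass.store(overflow_link, std::memory_order_relaxed);
}
```

With the loads in this order, a reader that sees an unforwarded mark is guaranteed to have read the klass before it was overwritten. Without the fence in size_if_not_forwarded, a compiler is free to load the mark first, which reopens the window described above.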
Using worker-thread-local overflow queues, -XX:+ParGCUseLocalOverflow, seems to work around the problem.
Thanks,
John Cuthbertson
> On Apr 2, 2021, at 2:02 AM, Anton Kozlov <akozlov at azul.com> wrote:
>
> Adding hotspot-gc-dev. It would be great to receive comments from GC experts, even though the fix does not make sense for mainline JDK.
>
> Thanks,
> Anton
>
> On 4/2/21 11:51 AM, Anton Kozlov wrote:
>> Hi, please review an original fix for a GC crash. jdk13u is the latest supported version that still has the buggy code; it was deleted in jdk14 as part of JEP 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector. So I'm proposing it here.
>> The fix is low-risk: on x86-64 it just introduces a compiler barrier to prevent two reads from being reordered, as intended by the surrounding comments. On CPUs with weaker memory models it introduces CPU barriers as well.
>> -------------
>> Commit messages:
>> - Add missing barriers
>> Changes: https://git.openjdk.java.net/jdk13u-dev/pull/165/files
>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk13u-dev&pr=165&range=00
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8264640
>> Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
>> Patch: https://git.openjdk.java.net/jdk13u-dev/pull/165.diff
>> Fetch: git fetch https://git.openjdk.java.net/jdk13u-dev pull/165/head:pull/165
>> PR: https://git.openjdk.java.net/jdk13u-dev/pull/165