RFR: 8342848: Shenandoah: Marking bitmap may not be completely cleared in generational mode

Xiaolong Peng xpeng at openjdk.org
Wed Oct 23 06:29:24 UTC 2024


While investigating the crash I saw in PR https://github.com/openjdk/shenandoah/pull/516, I happened to reproduce the crash on GenShen tip as well. It reproduced multiple times on both AWS r7g-4xlarge and r7i-4xlarge instances when running the test below repeatedly:


CONF=linux-aarch64-server-fastdebug make clean test TEST=gc/stress/gcold/TestGCOldWithShenandoah.java#generational JTREG="REPEAT_COUNT=100"

Crash:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp:642), pid=24134, tid=24158
#  assert(_generation->is_bitmap_clear()) failed: need clear marking bitmap
#
# JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.xlpeng.jdk-xlpeng)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.xlpeng.jdk-xlpeng, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x15eadc4]  ShenandoahConcurrentGC::op_init_mark()+0x358
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /local/home/xlpeng/repos/jdk-xlpeng/build/linux-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_gc_stress_gcold_TestGCOldWithShenandoah_java_generational/scratch/0/hs_err_pid24134.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#


With logging and instrumentation, the cause appears to be this one line of code: `bool needs_reset = _generation->contains(region) || !region->is_affiliated();`. Bitmap reset is a concurrent operation, so a mutator thread can change a region's affiliation from FREE to YOUNG while the reset is running. If that change lands between the two tests, `_generation->contains(region)` is evaluated while the region is still FREE and returns false, and `!region->is_affiliated()` is evaluated after the region has become YOUNG and also returns false, so the region's bitmap is not reset.

Logs from instrumentation:

[32.793s][info][gc          ] GC(19) Not reseting bitmap for YOUNG region (0x0000ffff8c1a6100)(affiliation before test: FREE)



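The fix is simple: swap the two tests so that `!region->is_affiliated()` is evaluated before `_generation->contains(region)`. With that order, a region that is still FREE at the first test is selected for reset immediately, and a region that has already become YOUNG fails the first test but passes the second.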
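
To illustrate why the evaluation order matters, here is a minimal single-threaded sketch that replays the FREE to YOUNG transition at every possible point relative to the two reads. It is not the HotSpot code: `Region`, `YoungGeneration`, `flip_before_read`, `needs_reset_original`, `needs_reset_swapped` and `always_reset` are hypothetical stand-ins, and the mutator is simulated by a counter rather than a real thread, so memory-ordering effects are not modelled.

```cpp
#include <cassert>

// Hypothetical stand-in for a region's affiliation; not the Shenandoah type.
enum class Affiliation { FREE, YOUNG, OLD };

// A region whose affiliation is flipped from FREE to YOUNG by a simulated
// mutator just before the read with index flip_before_read (0-based).
struct Region {
  Affiliation affiliation = Affiliation::FREE;
  int reads = 0;
  int flip_before_read;

  explicit Region(int flip) : flip_before_read(flip) {}

  Affiliation load() {
    if (reads == flip_before_read && affiliation == Affiliation::FREE) {
      affiliation = Affiliation::YOUNG;  // simulated concurrent allocation
    }
    ++reads;
    return affiliation;
  }

  bool is_affiliated() { return load() != Affiliation::FREE; }
};

// Simplified young generation: it contains exactly the YOUNG regions.
struct YoungGeneration {
  bool contains(Region& r) const { return r.load() == Affiliation::YOUNG; }
};

// Order of the tests before the fix.
bool needs_reset_original(const YoungGeneration& gen, Region& r) {
  return gen.contains(r) || !r.is_affiliated();
}

// Order of the tests after the fix.
bool needs_reset_swapped(const YoungGeneration& gen, Region& r) {
  return !r.is_affiliated() || gen.contains(r);
}

// Returns true if the region is selected for reset wherever the flip lands:
// before the first read (0), between the reads (1), or after both (2).
bool always_reset(bool (*check)(const YoungGeneration&, Region&)) {
  YoungGeneration young;
  for (int flip = 0; flip <= 2; ++flip) {
    Region r(flip);
    bool selected = check(young, r);
    assert(r.reads <= 2);  // each check reads the affiliation at most twice
    if (!selected) {
      return false;
    }
  }
  return true;
}
```

In this model `always_reset(needs_reset_original)` is false, because the flip between the two reads makes both tests fail, while `always_reset(needs_reset_swapped)` is true. The sketch only considers the FREE to YOUNG transition described above.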

-------------

Commit messages:
 - 8342848: Shenandoah: Marking bitmap may not be completely cleared in generational mode

Changes: https://git.openjdk.org/shenandoah/pull/523/files
  Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=523&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8342848
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/shenandoah/pull/523.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/523/head:pull/523

PR: https://git.openjdk.org/shenandoah/pull/523

