RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v14]

Xiaolong Peng xpeng at openjdk.org
Fri Mar 28 00:54:25 UTC 2025


On Wed, 26 Mar 2025 20:37:59 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> There are some scenarios in which GenShen may have improper remembered set verification logic:
>> 
>> 1. Concurrent young cycles following a Full GC:
>> 
>> At the end of ShenandoahFullGC, the bitmaps for the entire heap are reset without resetting the marking context to incomplete, but ShenandoahVerifier uses the code below to get a complete old marking context for remembered set verification:
>> 
>> 
>> ShenandoahVerifier  
>> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() {
>>   shenandoah_assert_generations_reconciled();
>>   if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) {
>>     return _heap->complete_marking_context();
>>   }
>>   return nullptr;
>> }
>> 
>> 
>> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale and may cause unexpected results.
>> 
>> 2. The implementation of `ShenandoahVerifier::get_marking_context_for_old` shown above always returns a marking context for a global GC, but the marking bitmaps have already been reset before init-mark, so `ShenandoahVerifier::help_verify_region_rem_set` always skips verification in this case.
>> 
>> 3. ShenandoahConcurrentGC always cleans the remembered set read table, but only swaps the read/write tables when the GC generation is young. This causes remembered set verification before init-mark to use a completely clean remembered set, but the problem is masked by issue 2.
>> 
>> 4. After a concurrent young cycle evacuates objects from a young region, it updates refs using the marking bitmaps from the marking context, so it won't update references held by dead old objects (is_marked(obj) is false: obj is not marked strong/weak and is below TAMS). In this case, if the next cycle is a global concurrent GC, the remembered set can't be verified before init-mark because of the dead pointers.
>> 
>> ### Solution
>> * After a full GC, always set the marking completeness flag to false after resetting the marking bitmaps.
>> * Because dead pointers in old gen may not have been updated to point to the new addresses after evacuation and refs update, we should disable rem-set validation before init-mark and update-refs if the old marking context is incomplete.
>> 
>> ### Test
>> - [x] `make test TEST=hotspot_gc_shenandoah`
>> - [x] GHA
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add comments
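
Scenario 1 in the description above (a stale old marking context after a Full GC) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyOldMarking`, `full_gc_reset_buggy`, `full_gc_reset_fixed`, `context_for_verification` and `verifier_sees_stale_context` are hypothetical names invented here, not HotSpot code, and the global-generation branch of `get_marking_context_for_old` is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the old-generation marking state. All names here are
// hypothetical; this is not the HotSpot ShenandoahMarkingContext.
struct ToyOldMarking {
  std::vector<bool> bitmap;    // one mark bit per object slot
  bool mark_complete = false;  // plays the role of is_mark_complete()

  explicit ToyOldMarking(std::size_t slots) : bitmap(slots, false) {}

  // A finished old mark: the given slots are marked and the flag is set.
  void finish_marking(const std::vector<std::size_t>& live) {
    for (std::size_t i : live) {
      assert(i < bitmap.size());
      bitmap[i] = true;
    }
    mark_complete = true;
  }

  // Full GC epilog as described in scenario 1: the bitmaps are cleared,
  // but the completeness flag is left untouched.
  void full_gc_reset_buggy() {
    bitmap.assign(bitmap.size(), false);
  }

  // Proposed behaviour: clearing the bitmaps also marks the context incomplete.
  void full_gc_reset_fixed() {
    bitmap.assign(bitmap.size(), false);
    mark_complete = false;
  }
};

// Mirrors the shape of get_marking_context_for_old(): hand out the context
// only when it claims to be complete.
const ToyOldMarking* context_for_verification(const ToyOldMarking& m) {
  return m.mark_complete ? &m : nullptr;
}

// True when the verifier would receive a context that claims completeness
// but has no mark bits set, i.e. a stale context.
bool verifier_sees_stale_context(const ToyOldMarking& m) {
  const ToyOldMarking* ctx = context_for_verification(m);
  if (ctx == nullptr) return false;
  for (bool bit : ctx->bitmap) {
    if (bit) return false;
  }
  return true;
}
```

In this model, the buggy reset leaves a context that still looks complete while every mark bit is clear, whereas the fixed reset makes `context_for_verification` return `nullptr`, so verification against the old marking is skipped.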

I have reproduced the bug https://bugs.openjdk.org/browse/JDK-8345399 on ppc64le hardware with tip. The crash happens in a young cycle after a full GC, which is one of the problems I'm trying to fix in this PR:

[13.990s][info][gc,start       ] GC(101) Pause Full                                                                                                                                                                                                
[13.990s][info][gc,task        ] GC(101) Using 4 of 4 workers for full gc                                                                                                                                                                          
[13.990s][info][gc,start       ] GC(101) Verify Before Full GC, Level 4                                                                                                                                                                            
[13.998s][info][gc             ] GC(101) Verify Before Full GC, Level 4 (22772 reachable, 0 marked)                                                                                                                                                
[13.998s][info][gc,phases,start] GC(101) Phase 1: Mark live objects                                                                                                                                                                                
[14.003s][info][gc,ref         ] GC(101) Clearing All SoftReferences                                                                                                                                                                               
[14.003s][info][gc,ref         ] GC(101) Clearing All SoftReferences                                                                                                                                                                               
[14.009s][info][gc,ref         ] GC(101) Encountered references: Soft: 49, Weak: 101, Final: 0, Phantom: 8                                                                                                                                         
[14.009s][info][gc,ref         ] GC(101) Discovered  references: Soft: 31, Weak: 39, Final: 0, Phantom: 8                                                                                                                                          
[14.009s][info][gc,ref         ] GC(101) Enqueued    references: Soft: 0, Weak: 0, Final: 0, Phantom: 0                                                                                                                                            
[14.012s][info][gc,phases      ] GC(101) Phase 1: Mark live objects 13.674ms                                                                                                                                                                       
[14.012s][info][gc,phases,start] GC(101) Phase 2: Compute new object addresses                                                                                                                                                                     
[14.026s][info][gc,phases      ] GC(101) Phase 2: Compute new object addresses 14.166ms                                                                                                                                                            
[14.026s][info][gc,phases,start] GC(101) Phase 3: Adjust pointers                                                                                                                                                                                  
[14.030s][info][gc,phases      ] GC(101) Phase 3: Adjust pointers 3.626ms                                                                                                                                                                          
[14.030s][info][gc,phases,start] GC(101) Phase 4: Move objects                                                                                                                                                                                     
[14.128s][info][gc,phases      ] GC(101) Phase 4: Move objects 98.264ms                                                                                                                                                                            
[14.128s][info][gc,phases,start] GC(101) Phase 5: Full GC epilog                                                                                                                                                                                   
[14.146s][info][gc,ergo        ] GC(101) Transfer 234 region(s) from Old to Young, yielding increased size: 790M                                                                                                                                   
[14.146s][info][gc,ergo        ] GC(101) FullGC done: young usage: 450M, old usage: 231M                                                                                                                                                           
[14.146s][info][gc,free        ] Free: 296M, Max: 512K regular, 296M humongous, Frag: 0% external, 0% internal; Used: 0B, Mutator Free: 592 Collector Reserve: 40959K, Max: 512K; Used: 16B Old Collector Reserve: 1307K, Max: 511K; Used: 740K    
[14.146s][info][gc,ergo        ] GC(101) After Full GC, successfully transferred 0 regions to none to prepare for next gc, old available: 1307K, young_available: 296M                                                                             
[14.146s][info][gc,barrier     ] GC(101) Cleaned read_table from 0x0000754a50290000 to 0x0000754a5048ffff                                                                                                                                          
[14.146s][info][gc,barrier     ] GC(101) Current write_card_table: 0x0000754a4fc90000                                                                                                                                                              
[14.148s][info][gc,phases      ] GC(101) Phase 5: Full GC epilog 20.265ms                                                                                                                                                                          
[14.148s][info][gc,start       ] GC(101) Verify After Full GC, Level 4                                                                                                                                                                             
[14.182s][info][gc             ] GC(101) Verify After Full GC, Level 4 (22664 reachable, 125 marked)                                                                                                                                               
[14.182s][info][gc,ergo        ] GC(101) At end of Full GC: GCU: 6.9%, MU: 9.9% during period of 0.261s                                                                                                                                            
[14.182s][info][gc,ergo        ] GC(101) At end of Full GC: Young generation used: 450M, used regions: 454M, humongous waste: 3532K, soft capacity: 1024M, max capacity: 790M, available: 296M                                                     
[14.182s][info][gc,ergo        ] GC(101) At end of Full GC: Old generation used: 231M, used regions: 234M, humongous waste: 1654K, soft capacity: 0B, max capacity: 234M, available: 1307K                                                         
[14.182s][info][gc,ergo        ] GC(101) Good progress for free space: 296M, need 10485K                                                                                                                                                           
[14.182s][info][gc,ergo        ] GC(101) Good progress for used space: 148M, need 512K                                                                                                                                                             
[14.182s][info][gc             ] GC(101) Pause Full 829M->681M(1024M) 192.311ms                                                                                                                                                                    
...
[14.196s][info][gc             ] Trigger (Young): Free (65536K) is below minimum threshold (80895K)
[14.196s][info][gc,free        ] Free: 65536K, Max: 512K regular, 65536K humongous, Frag: 0% external, 0% internal; Used: 0B, Mutator Free: 128 Collector Reserve: 40959K, Max: 512K; Used: 16B Old Collector Reserve: 1307K, Max: 511K; Used: 740K
[14.196s][info][gc,ergo        ] GC(102) Start GC cycle (Young)
[14.196s][info][gc,start       ] GC(102) Concurrent reset (Young)
[14.196s][info][gc,task        ] GC(102) Using 2 of 4 workers for Concurrent reset (Young)
[14.196s][info][gc,ergo        ] GC(102) Pacer for Reset. Non-Taxable: 1024M
Allocated: 732 Mb
Allocated: 699 Mb
Allocated: 715 Mb
[14.200s][info][gc,thread      ] Cancelling GC: unknown GCCause
[14.200s][info][gc             ] Failed to allocate Shared, 61709K
[14.202s][info][gc             ] GC(102) Concurrent reset (Young) 6.371ms
[14.203s][info][gc,barrier     ] GC(102) Cleaned read_table from 0x0000754a50080000 to 0x0000754a5027ffff
[14.203s][info][gc,start       ] GC(102) Pause Init Mark (Young)
[14.203s][info][gc,task        ] GC(102) Using 4 of 4 workers for init marking
[14.205s][info][gc,barrier     ] GC(102) Current write_card_table: 0x0000754a4fa80000
[14.205s][info][gc,start       ] GC(102) Verify Before Mark, Level 4
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/xlpeng/repos/jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:1270), pid=2167519, tid=2167538
#  Error: Verify init-mark remembered set violation; clean card, it should be dirty.

Referenced from:
  interior location: 0x00000000c00c2bfc
  inside Java heap
    not in collection set
  region: |    1|R  |O|BTE     c0080000,     c00c2c78,     c0100000|TAMS     c0080000|UWM     c00c2c78|U   267K|T     0B|G     0B|P     0B|S   267K|L   267K|CP   0

Object:
  0x00000000e8c00000 - klass 0x000001df001abfa0 [I
    not allocated after mark start
    not after update watermark
    not marked strong
    not marked weak
    not in collection set
  age: 0
  mark: mark(is_unlocked no_hash age=0)
  region: | 1304|H  |Y|BTE     e8c00000,     e8c80000,     e8c80000|TAMS     e8c80000|UWM     e8c80000|U   512K|T     0B|G     0B|P     0B|S   512K|L     0B|CP   0

Forwardee:
  (the object itself)
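
For readers less familiar with the check behind "clean card, it should be dirty", here is a minimal sketch of the invariant, written as a toy model rather than the actual verifier code. The names (`ToyCardTable`, `rem_set_ok`, the `k...` constants) are hypothetical. The heap base of 0xc0000000 and the 512-byte card size are assumptions inferred from the region addresses and the read_table range in the log above, not values read from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a card-marking remembered set. The constants are
// assumptions for illustration, inferred from the log, not taken from HotSpot.
constexpr std::uint64_t kHeapBase  = 0xc0000000;  // assumed heap start
constexpr std::uint64_t kHeapSize  = 1ULL << 30;  // 1024M heap, as in the log
constexpr std::size_t   kCardShift = 9;           // assumed 512-byte cards
constexpr std::size_t   kCardCount = kHeapSize >> kCardShift;

// Young humongous region 1304 from the crash report.
constexpr std::uint64_t kYoungBase = 0xe8c00000;
constexpr std::uint64_t kYoungEnd  = 0xe8c80000;

struct ToyCardTable {
  std::vector<bool> dirty;  // true = dirty card, false = clean card

  explicit ToyCardTable(std::size_t cards) : dirty(cards, false) {}

  // Card index covering a heap address.
  static std::size_t index_for(std::uint64_t addr) {
    return static_cast<std::size_t>((addr - kHeapBase) >> kCardShift);
  }

  void mark_dirty(std::uint64_t addr) {
    assert(index_for(addr) < dirty.size());
    dirty[index_for(addr)] = true;
  }

  bool is_dirty(std::uint64_t addr) const {
    assert(index_for(addr) < dirty.size());
    return dirty[index_for(addr)];
  }
};

bool points_into_young(std::uint64_t target) {
  return target >= kYoungBase && target < kYoungEnd;
}

// Invariant for one reference slot inside an old object: if the slot points
// into young, the card covering the slot must be dirty.
bool rem_set_ok(const ToyCardTable& cards, std::uint64_t slot, std::uint64_t target) {
  if (!points_into_young(target)) return true;  // old-to-old needs no card
  return cards.is_dirty(slot);
}
```

With the addresses from the report, `rem_set_ok(cards, 0xc00c2bfc, 0xe8c00000)` is false on a freshly cleaned table and becomes true once `mark_dirty(0xc00c2bfc)` has been called. In this toy model, that false result corresponds to the violation reported above.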



I'll run the same test to confirm whether this PR fixes the bug.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24092#issuecomment-2759904598


More information about the shenandoah-dev mailing list