RFR(S): G1: assert(_finger == _heap_end) failed, concurrentMark.cpp:809

Wed Mar 13 18:12:26 UTC 2013

Hi Everyone,

Small error....

On 3/13/2013 10:21 AM, John Cuthbertson wrote:
> Hi Bengt,
>
> Thanks for looking at the code. I'm happy to answer questions.
>
> On 3/13/2013 6:05 AM, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> Thanks for splitting this up into two webrevs! :)
>>
>> I understand that we only have to reset the marking phase in 
>> enter_first_sync_barrier() if we are during concurrent mark since we 
>> are anyway doing it in checkpointRootsFinal(). And I think you are 
>> correct about the fact that we don't have to drain the discovered 
>> references if we have hit an overflow.
>>
>> So, basically I think I understand the changes for 1. and 3. that you 
>> explained below.
>>
>> I'm struggling a bit to grasp the change motivated by 2. Are these 
>> the calls to set_phase() that triggered the assert before? Do we 
>> still need to remove them considering that we no longer reset the 
>> marking phase during remark?
>
> Yes - these are the calls that triggered the asserts. Here's the stack 
> trace:
>
> Stack: [0xd817f000,0xd81ff000], sp=0xd81fe34c, free space=508k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, 
> C=native code)
> V [libjvm.so+0xdcaeac] void VMError::report(outputStream*)+0xdec
> V [libjvm.so+0xdcc205] void VMError::report_and_die()+0x735
> V [libjvm.so+0x6af46b] void report_vm_error(const char*,int,const 
> char*,const char*)+0x8b
> V [libjvm.so+0x65e6a3] void 
> ConcurrentMark::set_phase(unsigned,bool)+0x1a3
> V [libjvm.so+0x660534] void 
> G1CMRefProcTaskExecutor::execute(AbstractRefProcTaskExecutor::ProcessTask&)+0xc4
> V [libjvm.so+0xc5275c] void 
> ReferenceProcessor::process_discovered_reflist(DiscoveredList*,ReferencePolicy*,bool,BoolObjectClosure*,OopClosure*,VoidClosure*,AbstractRefProcTaskExecutor*)+0x25c
> V [libjvm.so+0xc50af6] void 
> ReferenceProcessor::process_discovered_references(BoolObjectClosure*,OopClosure*,VoidClosure*,AbstractRefProcTaskExecutor*)+0x266
> V [libjvm.so+0x660975] void ConcurrentMark::weakRefsWork(bool)+0x2d5
> V [libjvm.so+0x65f593] void 
> ConcurrentMark::checkpointRootsFinal(bool)+0x153
> V [libjvm.so+0x6983e8] void CMCheckpointRootsFinalClosure::do_void()+0x18
> V [libjvm.so+0xe0ecd1] void VM_CGC_Operation::doit()+0xe1
> V [libjvm.so+0xe0b82d] void VM_Operation::evaluate()+0x7d
> V [libjvm.so+0xe0938c] void 
> VMThread::evaluate_operation(VM_Operation*)+0x9c
> V [libjvm.so+0xe09960] void VMThread::loop()+0x4d0
> V [libjvm.so+0xe08fc1] void VMThread::run()+0x111
> V [libjvm.so+0xbb1490] java_start+0x210
> C [libc.so.1+0xa73b7] _thr_setup+0x4e
> C [libc.so.1+0xa76a0] __moddi3+0x60
>
> If a stack overflow occurred when executing one ref processing task, 
> set the phase in the next would assert.
>
>>
>> Also, you replaced the calls to set_phase() with calls to 
>> _cm->terminator()->reset_for_reuse(_active_workers). The comment says 
>> that this is necessary for the termination protocol in 
>> do_marking_step() to work properly. But that method also uses 
>> _first_overflow_barrier_sync and _second_overflow_barrier_sync. Don't 
>> we have to reset those too? Similarly to what set_phase() does?
>
> AFAICT we don't need to reset the first and second barrier syncs.
>
> ConcurrentMark::set_phase(active_workers,...._) initializes the 
> _terminator with a new ParallelTaskTerminator instance with the given 
> number of workers; the value of the _offered_termination field is 
> zeroed. When a worker offers termination, _offered_terminatation is 
> incremented. The only ways to reset the value of _offered_termination 
> is to create a new instance (as set_phase does) or call 
> reset_for_reuse():
>
>> void ParallelTaskTerminator::reset_for_reuse() {
>>   if (_offered_termination != 0) {
>>     assert(_offered_termination == _n_threads,
>>            "Terminator may still be in use");
>>     _offered_termination = 0;
>>   }
>> }
>
> to reuse the existing terminator but with new values in the 
> _offered_termination and _n_threads fields. Note: reset_for_reuse() is 
> not called when the number of threads offering termination reaches the 
> number of threads to wait for. We have to do it explicitly.
>
> Compare this with WorkGangBarrierSync. Set_phase() calls 
> set_n_threads() for the first and second barriers. This routine sets 
> _n_workers to give number of threads and clears _n_completed, 
> _should_reset:
>
>> void WorkGangBarrierSync::set_n_workers(uint n_workers) {
>>   _n_workers   = n_workers;
>>   _n_completed = 0;
>>   _should_reset = false;
>> }
>
> When the last thread to enter the barrier sync does so, the value of 
> _should_reset is set to true. When do_marking_step is called for the 
> next ref processing task, the first to enter the barrier resets the 
> value of _n_completed:
>
>> void WorkGangBarrierSync::enter() {
>>   MutexLockerEx x(monitor(), Mutex::_no_safepoint_check_flag);
>>   if (should_reset()) {
>>     // The should_reset() was set and we are the first worker to enter
>>     // the sync barrier. We will zero the n_completed() count which
>>     // effectively resets the barrier.
>>     zero_completed();
>>     set_should_reset(false);
>>   }
>
> Hence we don't need to explicitly reset the barrier syncs. Make sense?
>
> After answering your question I realize I should be calling the 
> version of reset_for_reuse() that takes no parameters. We want to 
> change the value of _n_threads in the terminator because it could then 
> be changed to a value different from that in the barrier syncs.

The sentence above should be "We *don't* want to change the value of 
_n_threads...."

Apologies,

JohnC