RFR(S): G1: assert(_finger == _heap_end) failed, concurrentMark.cpp:809
John Cuthbertson
john.cuthbertson at oracle.com
Wed Mar 13 18:12:26 UTC 2013
Hi Everyone,
Small error....
On 3/13/2013 10:21 AM, John Cuthbertson wrote:
> Hi Bengt,
>
> Thanks for looking at the code. I'm happy to answer questions.
>
> On 3/13/2013 6:05 AM, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> Thanks for splitting this up into two webrevs! :)
>>
>> I understand that we only have to reset the marking phase in
>> enter_first_sync_barrier() if we are during concurrent mark since we
>> are anyway doing it in checkpointRootsFinal(). And I think you are
>> correct about the fact that we don't have to drain the discovered
>> references if we have hit an overflow.
>>
>> So, basically I think I understand the changes for 1. and 3. that you
>> explained below.
>>
>> I'm struggling a bit to grasp the change motivated by 2. Are these
>> the calls to set_phase() that triggered the assert before? Do we
>> still need to remove them considering that we no longer reset the
>> marking phase during remark?
>
> Yes - these are the calls that triggered the asserts. Here's the stack
> trace:
>
> Stack: [0xd817f000,0xd81ff000], sp=0xd81fe34c, free space=508k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
> C=native code)
> V [libjvm.so+0xdcaeac] void VMError::report(outputStream*)+0xdec
> V [libjvm.so+0xdcc205] void VMError::report_and_die()+0x735
> V [libjvm.so+0x6af46b] void report_vm_error(const char*,int,const
> char*,const char*)+0x8b
> V [libjvm.so+0x65e6a3] void
> ConcurrentMark::set_phase(unsigned,bool)+0x1a3
> V [libjvm.so+0x660534] void
> G1CMRefProcTaskExecutor::execute(AbstractRefProcTaskExecutor::ProcessTask&)+0xc4
> V [libjvm.so+0xc5275c] void
> ReferenceProcessor::process_discovered_reflist(DiscoveredList*,ReferencePolicy*,bool,BoolObjectClosure*,OopClosure*,VoidClosure*,AbstractRefProcTaskExecutor*)+0x25c
> V [libjvm.so+0xc50af6] void
> ReferenceProcessor::process_discovered_references(BoolObjectClosure*,OopClosure*,VoidClosure*,AbstractRefProcTaskExecutor*)+0x266
> V [libjvm.so+0x660975] void ConcurrentMark::weakRefsWork(bool)+0x2d5
> V [libjvm.so+0x65f593] void
> ConcurrentMark::checkpointRootsFinal(bool)+0x153
> V [libjvm.so+0x6983e8] void CMCheckpointRootsFinalClosure::do_void()+0x18
> V [libjvm.so+0xe0ecd1] void VM_CGC_Operation::doit()+0xe1
> V [libjvm.so+0xe0b82d] void VM_Operation::evaluate()+0x7d
> V [libjvm.so+0xe0938c] void
> VMThread::evaluate_operation(VM_Operation*)+0x9c
> V [libjvm.so+0xe09960] void VMThread::loop()+0x4d0
> V [libjvm.so+0xe08fc1] void VMThread::run()+0x111
> V [libjvm.so+0xbb1490] java_start+0x210
> C [libc.so.1+0xa73b7] _thr_setup+0x4e
> C [libc.so.1+0xa76a0] __moddi3+0x60
>
> If a stack overflow occurred when executing one ref processing task,
> set the phase in the next would assert.
>
>>
>> Also, you replaced the calls to set_phase() with calls to
>> _cm->terminator()->reset_for_reuse(_active_workers). The comment says
>> that this is necessary for the termination protocol in
>> do_marking_step() to work properly. But that method also uses
>> _first_overflow_barrier_sync and _second_overflow_barrier_sync. Don't
>> we have to reset those too? Similarly to what set_phase() does?
>
> AFAICT we don't need to reset the first and second barrier syncs.
>
> ConcurrentMark::set_phase(active_workers,...._) initializes the
> _terminator with a new ParallelTaskTerminator instance with the given
> number of workers; the value of the _offered_termination field is
> zeroed. When a worker offers termination, _offered_terminatation is
> incremented. The only ways to reset the value of _offered_termination
> is to create a new instance (as set_phase does) or call
> reset_for_reuse():
>
>> void ParallelTaskTerminator::reset_for_reuse() {
>> if (_offered_termination != 0) {
>> assert(_offered_termination == _n_threads,
>> "Terminator may still be in use");
>> _offered_termination = 0;
>> }
>> }
>
> to reuse the existing terminator but with new values in the
> _offered_termination and _n_threads fields. Note: reset_for_reuse() is
> not called when the number of threads offering termination reaches the
> number of threads to wait for. We have to do it explicitly.
>
> Compare this with WorkGangBarrierSync. Set_phase() calls
> set_n_threads() for the first and second barriers. This routine sets
> _n_workers to give number of threads and clears _n_completed,
> _should_reset:
>
>> void WorkGangBarrierSync::set_n_workers(uint n_workers) {
>> _n_workers = n_workers;
>> _n_completed = 0;
>> _should_reset = false;
>> }
>
> When the last thread to enter the barrier sync does so, the value of
> _should_reset is set to true. When do_marking_step is called for the
> next ref processing task, the first to enter the barrier resets the
> value of _n_completed:
>
>> void WorkGangBarrierSync::enter() {
>> MutexLockerEx x(monitor(), Mutex::_no_safepoint_check_flag);
>> if (should_reset()) {
>> // The should_reset() was set and we are the first worker to enter
>> // the sync barrier. We will zero the n_completed() count which
>> // effectively resets the barrier.
>> zero_completed();
>> set_should_reset(false);
>> }
>
> Hence we don't need to explicitly reset the barrier syncs. Make sense?
>
> After answering your question I realize I should be calling the
> version of reset_for_reuse() that takes no parameters. We want to
> change the value of _n_threads in the terminator because it could then
> be changed to a value different from that in the barrier syncs.
The sentence above should be "We *don't* want to change the value of
_n_threads...."
Apologies,
JohnC
More information about the hotspot-gc-dev
mailing list