RFR(S): G1: assert(_finger == _heap_end) failed, concurrentMark.cpp:809
John Cuthbertson
john.cuthbertson at oracle.com
Wed Mar 13 17:21:07 UTC 2013
Hi Bengt,
Thanks for looking at the code. I'm happy to answer questions.
On 3/13/2013 6:05 AM, Bengt Rutisson wrote:
>
> Hi John,
>
> Thanks for splitting this up into two webrevs! :)
>
> I understand that we only have to reset the marking phase in
> enter_first_sync_barrier() if we are in concurrent mark, since we are
> doing it anyway in checkpointRootsFinal(). And I think you are correct
> that we don't have to drain the discovered references if we have hit
> an overflow.
>
> So, basically I think I understand the changes for 1. and 3. that you
> explained below.
>
> I'm struggling a bit to grasp the change motivated by 2. Are these the
> calls to set_phase() that triggered the assert before? Do we still
> need to remove them considering that we no longer reset the marking
> phase during remark?
Yes - these are the calls that triggered the asserts. Here's the stack
trace:
Stack: [0xd817f000,0xd81ff000], sp=0xd81fe34c, free space=508k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xdcaeac] void VMError::report(outputStream*)+0xdec
V [libjvm.so+0xdcc205] void VMError::report_and_die()+0x735
V [libjvm.so+0x6af46b] void report_vm_error(const char*,int,const char*,const char*)+0x8b
V [libjvm.so+0x65e6a3] void ConcurrentMark::set_phase(unsigned,bool)+0x1a3
V [libjvm.so+0x660534] void G1CMRefProcTaskExecutor::execute(AbstractRefProcTaskExecutor::ProcessTask&)+0xc4
V [libjvm.so+0xc5275c] void ReferenceProcessor::process_discovered_reflist(DiscoveredList*,ReferencePolicy*,bool,BoolObjectClosure*,OopClosure*,VoidClosure*,AbstractRefProcTaskExecutor*)+0x25c
V [libjvm.so+0xc50af6] void ReferenceProcessor::process_discovered_references(BoolObjectClosure*,OopClosure*,VoidClosure*,AbstractRefProcTaskExecutor*)+0x266
V [libjvm.so+0x660975] void ConcurrentMark::weakRefsWork(bool)+0x2d5
V [libjvm.so+0x65f593] void ConcurrentMark::checkpointRootsFinal(bool)+0x153
V [libjvm.so+0x6983e8] void CMCheckpointRootsFinalClosure::do_void()+0x18
V [libjvm.so+0xe0ecd1] void VM_CGC_Operation::doit()+0xe1
V [libjvm.so+0xe0b82d] void VM_Operation::evaluate()+0x7d
V [libjvm.so+0xe0938c] void VMThread::evaluate_operation(VM_Operation*)+0x9c
V [libjvm.so+0xe09960] void VMThread::loop()+0x4d0
V [libjvm.so+0xe08fc1] void VMThread::run()+0x111
V [libjvm.so+0xbb1490] java_start+0x210
C [libc.so.1+0xa73b7] _thr_setup+0x4e
C [libc.so.1+0xa76a0] __moddi3+0x60
If an overflow occurred while executing one ref processing task, the
call to set_phase() in the next would trigger the assert.
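To make the failure mode concrete, here is a minimal single-threaded
model of that sequence. It is an illustration only, heavily simplified
from ConcurrentMark; CMModel, finish_marking() and overflow_restart()
are invented names, and the assumption (consistent with the trace above)
is that an overflow restart moves the global finger back to the start of
the heap:

  // Minimal model of why back-to-back execute() calls assert after an
  // overflow. CMModel and its methods are illustrative only.
  #include <cassert>

  struct CMModel {
    int heap_start = 0;
    int heap_end   = 100;
    int finger     = 0;  // global finger swept across the heap by marking

    void set_phase() {
      // mirrors the failing check at concurrentMark.cpp:809
      assert(finger == heap_end && "assert(_finger == _heap_end) failed");
    }
    void finish_marking()   { finger = heap_end; }    // normal completion
    void overflow_restart() { finger = heap_start; }  // overflow resets state
  };

  int main() {
    CMModel cm;
    cm.finish_marking();
    cm.set_phase();        // OK: marking ran to completion
    cm.overflow_restart(); // task 1 overflowed and restarted marking
    // cm.set_phase();     // task 2 would fire the assert here
    return 0;
  }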
>
> Also, you replaced the calls to set_phase() with calls to
> _cm->terminator()->reset_for_reuse(_active_workers). The comment says
> that this is necessary for the termination protocol in
> do_marking_step() to work properly. But that method also uses
> _first_overflow_barrier_sync and _second_overflow_barrier_sync. Don't
> we have to reset those too? Similarly to what set_phase() does?
AFAICT we don't need to reset the first and second barrier syncs.
ConcurrentMark::set_phase(active_workers, ...) initializes _terminator
with a new ParallelTaskTerminator instance for the given number of
workers; the value of the _offered_termination field is zeroed. When a
worker offers termination, _offered_termination is incremented. The only
ways to reset the value of _offered_termination are to create a new
instance (as set_phase() does) or to call reset_for_reuse():
> void ParallelTaskTerminator::reset_for_reuse() {
>   if (_offered_termination != 0) {
>     assert(_offered_termination == _n_threads,
>            "Terminator may still be in use");
>     _offered_termination = 0;
>   }
> }
which reuses the existing terminator, but with new values in the
_offered_termination and _n_threads fields. Note: reset_for_reuse() is
not called automatically when the number of threads offering termination
reaches the number of threads to wait for; we have to call it
explicitly.
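As a sanity check, here is a tiny single-threaded model of that
bookkeeping (simplified from ParallelTaskTerminator, locking elided, the
model names are invented): running two consecutive ref processing tasks
only works because reset_for_reuse() is called in between:

  // Model of why _offered_termination must be reset explicitly
  // between tasks. TerminatorModel is illustrative only.
  #include <cassert>

  struct TerminatorModel {
    unsigned n_threads;
    unsigned offered_termination = 0;
    explicit TerminatorModel(unsigned n) : n_threads(n) {}
    void offer_termination() { ++offered_termination; }
    // mirrors the quoted reset_for_reuse(): legal only once all
    // n_threads workers have offered termination
    void reset_for_reuse() {
      if (offered_termination != 0) {
        assert(offered_termination == n_threads &&
               "Terminator may still be in use");
        offered_termination = 0;
      }
    }
  };

  int main() {
    TerminatorModel t(4);
    for (int task = 0; task < 2; ++task) {  // two ref processing tasks
      for (unsigned w = 0; w < 4; ++w) t.offer_termination();
      t.reset_for_reuse();  // without this, task 2 would start with
                            // offered_termination already at 4
    }
    return 0;
  }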
Compare this with WorkGangBarrierSync: set_phase() calls set_n_workers()
on the first and second barriers. This routine sets _n_workers to the
given number of threads and clears _n_completed and _should_reset:
> void WorkGangBarrierSync::set_n_workers(uint n_workers) {
>   _n_workers = n_workers;
>   _n_completed = 0;
>   _should_reset = false;
> }
When the last thread enters the barrier sync, the value of _should_reset
is set to true. When do_marking_step() is called for the next ref
processing task, the first worker to enter the barrier resets the value
of _n_completed:
> void WorkGangBarrierSync::enter() {
>   MutexLockerEx x(monitor(), Mutex::_no_safepoint_check_flag);
>   if (should_reset()) {
>     // The should_reset() was set and we are the first worker to enter
>     // the sync barrier. We will zero the n_completed() count which
>     // effectively resets the barrier.
>     zero_completed();
>     set_should_reset(false);
>   }
>   ...
Hence we don't need to explicitly reset the barrier syncs. Make sense?
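A companion model of the barrier side (same caveats: single-threaded,
locking elided, invented names) shows the lazy self-reset that makes an
explicit reset between tasks unnecessary:

  // Model of the WorkGangBarrierSync lazy reset: the first worker of
  // the next round clears the count itself. Illustrative only.
  #include <cassert>

  struct BarrierModel {
    unsigned n_workers;
    unsigned n_completed = 0;
    bool should_reset = false;
    explicit BarrierModel(unsigned n) : n_workers(n) {}
    void enter() {                  // one worker arriving
      if (should_reset) {           // first arrival of the next round
        n_completed = 0;            // lazily reset the barrier
        should_reset = false;
      }
      if (++n_completed == n_workers) {
        should_reset = true;        // last arrival arms the lazy reset
      }
    }
  };

  int main() {
    BarrierModel b(4);
    for (int task = 0; task < 2; ++task) {  // reused across two tasks
      for (unsigned w = 0; w < 4; ++w) b.enter();
      assert(b.should_reset);  // barrier will self-reset next round
    }
    return 0;
  }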
After answering your question I realize I should be calling the version
of reset_for_reuse() that takes no parameters. We don't want to change
the value of _n_threads in the terminator, because it could then end up
different from the worker count in the barrier syncs.
> Finally, why does it say "// Not strictly necessary but..." in
> G1CMRefProcTaskExecutor::execute() ?
It's not strictly necessary because the enqueue task doesn't interact
with the parallel task terminator (i.e. it doesn't call
do_marking_step()); it just turns around and calls
RefProcEnqueueTask::work(). Resetting the terminator there just
increases the similarity and symmetry of the two execute() routines and
adds a little bit of "just in case".
Cheers,
JohnC