<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"><br>
Hi John,<br>
<br>
Thanks for fixing this so quickly!<br>
<br>
I have only had a quick look at the change. I'll make sure to look
closer tomorrow. However, I have two questions. If you have time
to answer them it would be good. If you don't have time I hope
they become clear when I look more in detail at the change
tomorrow...<br>
<br>
First, in the constructor for
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
G1CMDrainMarkingStackClosure() we do:<br>
<br>
2285Â Â Â Â _do_stealing = !_is_serial;<br>
2286Â Â Â Â _do_termination = true;<br>
<br>
Just from looking at this it seems like we could get away with
only having one instance variable instead of three. Is that the
case or can any of _do_stealing, _is_serial or _do_termination
change during the life time of a G1CMDrainMarkingStackClosure
instance?<br>
<br>
Second, as you describe in the text below you are actually fixing
two issues with this patch. Would it make sense to split the
changes up into two changesets?<br>
<br>
Thanks,<br>
Bengt<br>
<br>
On 3/11/13 10:35 PM, John Cuthbertson wrote:<br>
</div>
<blockquote cite="mid:513E4E3E.4020405@oracle.com" type="cite">Hi
Everyone,
<br>
<br>
Can I have a couple of volunteers review these changes? The webrev
can be found at:
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~johnc/8009536/webrev.0/">http://cr.openjdk.java.net/~johnc/8009536/webrev.0/</a>.
<br>
<br>
First of all - many thanks to Uwe Schindler for discovering an
reporting the problem and providing very clear instructions on how
to reproduce the issue. Many thanks also Dawid Weiss for also
stepping in with a self-contained reproducer.
<br>
<br>
I also wish to thank Bengt for his help. It was Bengt who gave me
the magic proxy formula that allowed Ivy to satisfy and download
all the dependencies for the test case. Bengt also diagnosed the
problem and gave an initial fix (which the changes in the webrev
are based upon).
<br>
<br>
Summary:
<br>
During the remark pause, the execution of the parallel RemarkTask
set the number of workers thread in the ParallelTaskTerminator and
the first and second barrier syncs. During serial reference
processing, the marking stack overflowed causing the single
(VMThread) thread to enter the overflow handling code in
CMTask::do_marking_step(). This overflow code using two
WorkBarrierSyncs to synchronize the threads before resetting the
marking state for restarting marking. The barrier syncs were
waiting for the number of threads that participated in the
RemarkTask) but, since only the VM thread was executing, only a
single thread entered the barrier - resulting in the barrier
indefinitely waiting for the other (non existent) threads.
<br>
<br>
A proposed solution was to call set_phase to reset the number of
threads in the parallel task terminator and the barriers to the
number of active threads for the reference processing. This
solution ran into a similar hang while processing the JNI
references with parallel reference processing enabled. (In
parallel reference processing, the JNI references are processed
serially by the calling thread). Resetting the phase to
single-threaded before processing the JNI refs solved the second
hang but resulted in an assertion failure: only a concurrentGC
thread can enter a barrier sync and the calling thread was the VM
thread.
<br>
<br>
Furthermore another problem was discovered. If the marking state
is reset, a subsequent call to set_phase() will assert as the
global finger has been set to start of the heap. This was a
discovered by the marking stack overflowing during the RemarkTask
and parallel reference processing calling set_phase() to
reinitialize number of workers in the parallel task terminator. It
was also discovered when trying out another proposed solution:
adding a start_gc closure to reference processing which would
call set_phase() before each processing phase. As a result the
marking state is only reset by worker 0 if an overflow occurs
during the concurrent phase of marking; if an overflow occurs
during remark, reference processing is skipped, and the marking
state is reset by the VM thread. Resetting the marking state
before reference processing was a benign error (objects would be
marked but not pushed on to the stack as they were no longer below
the finger; the objects would then be traced, in the normal
fashion, when marking restarted) but it's better to safe than
sorry. The other part of the fix for this secondary problem is
that the parallel reference processing task executor now calls the
terminator's reset_for_reuse() routine instead of set_phase().
<br>
<br>
The resulting solution for the hang is based upon the patch sent
out by Bengt - namely we do not enter the sync barriers when
CMTask::do_marking_step() is being called serially. The difference
is that I added an extra parameter to CMTask::do_marking_step()
instead of piggy-backing on the existing parameter list.
Additionally, if this new parameter indicates serial operation,Â
the current thread will skip offering termination. This allows the
serial drain closure to enter the termination protocol and execute
the guarantees contained therein.
<br>
<br>
The other changes are for the secondary issues, described above,
that were discovered while out trying other possible solutions.
<br>
<br>
Testing:
<br>
The lucene test case with serial reference processing (with and
without verification);Â the lucene test case with parallel
reference processing (with and without verification).
<br>
GC test suite with a mark stack size of 1K and 4K, with both
serial and parallel reference processing (with and without
verification).
<br>
</blockquote>
<br>
</body>
</html>