<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Kim,<br>
<br>
Thanks for the review and sorry for replying late.<br>
I had to spend some time to figure out complete_gc closure stuff and
re-running all tests again.<br>
<br>
<div class="moz-cite-prefix">On 6/4/18 5:11 PM, Kim Barrett wrote:<br>
</div>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<blockquote type="cite">
<pre wrap="">On Jun 1, 2018, at 5:48 PM, <a class="moz-txt-link-abbreviated" href="mailto:sangheon.kim@oracle.com">sangheon.kim@oracle.com</a> wrote:
Hi all,
As webrev.0 is conflicting with webrev.0 of "8203319: JDK-8201487 disabled too much queue balancing"(out for review, but not yet pushed), I'm posting webrev.1.
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/%7Esangheki/8043575/webrev.1">http://cr.openjdk.java.net/~sangheki/8043575/webrev.1</a>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/%7Esangheki/8043575/webrev.1_to_0">http://cr.openjdk.java.net/~sangheki/8043575/webrev.1_to_0</a>
</pre>
</blockquote>
<pre wrap="">The hookup of the changes into the various collectors seems okay. My
comments are mostly focused on the ReferenceProcessor changes.
------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessorPhaseTimes.hpp
140 void set_phase_number(RefProcPhaseNumbers phase_number) { _phase_number = phase_number; }
141 RefProcPhaseNumbers phase_number() const { return _phase_number; }
I think I dislike these and how they are used. And see other
discussion below, suggesting they might not be needed.
(Though they are similar to other related states. But maybe I dislike
those too.)</pre>
</blockquote>
I can avoid using it as you suggested below.<br>
But it is still needed at other locations in
referenceProcessorPhaseTimes.cpp. If you are saying we need to avoid
using this stuffs, I think it is out of scope for this patch. So,
please file a CR.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.hpp
641 class RefProcMTDegreeAdjuster : public StackObj {
...
649 RefProcMTDegreeAdjuster(ReferenceProcessor* rp, ReferenceProcessorPhaseTimes* times, size_t ref_count);
The times argument is here to provide information about what phase
we're in. That seems really indirect, and I'd really prefer that
information to be provided directly as arguments here.</pre>
</blockquote>
Webrev.2 uses ReferenceType and RefProcPhaseNumbers directly, please
have a chance to compare with previous version.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">I think that might also remove the need for the new phase_number and
set_phase_number functions for the phase times class.</pre>
</blockquote>
As I noted above, those are still used in other locations.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">This also would flow through to ergo_proc_thread_count and use_max_threads.</pre>
</blockquote>
Done.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.cpp
727 bool must_balance = mt_processing && need_balance_queues(refs_lists);
need_balance_queues returns the right answer for an initial balancing,
but not for a re-balance after some processing. Specifically, further
reducing the mt-degree may require rebalancing, even if it wasn't
previously needed.
(This comment is assuming the updated version of need_balance_queues
that I sent out today. The situation is worse without that.)</pre>
</blockquote>
Done.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.cpp
787 // For discovery_is_atomic() is true, phase 3 should be processed to call do_void() from VoidClosure.
788 // For discovery_is_atomic() is false, phase 3 can be skipped if there are no references because do_void() is
789 // already called at phase 2.
790 if (!discovery_is_atomic()) {
791 if (total_count(refs_lists) == 0) {
792 return;
793 }
794 }
The discovery_is_atomic stuff is left-over from before JDK-8203028.
Phase2 now never calls the complete_gc closure, because it is never
needed, regardless of the value of discovery_is_atomic. So checking
that can be removed here.</pre>
</blockquote>
Here are 2 things.<br>
1) If we call complete_gc closure before exit, is it okay?<br>
- The short answer is no, as we face an assertion failure at CMS
ParNew. The long answer is: For ParallelRefProcEnabled=true case,
CMS ParNew is relying on to do some extra work by AbstractExecutor.
So eventually ParScanThreadState::_young_old_boundary is set and
then ParEvacuateFollowersClosure is used as a complete_gc closure.
Just calling complete_gc closure that passed as a parameter doesn't
help as it is EvacuateFollowersClosureGeneral<ScanClosure,
ScanClosureWithParBarrier>. i.e. complete_gc closure is different
at process_discovered_references() vs. phase3, ProcessTask::work().<br>
2) Why 'ref_count' is not updated and then line 800 is just using
old 'ref_count'?<br>
- If we use different number of workers(not exit at zero ref
cases), we face a few types of assertion failures. IIRC, faced
SIGSEGV too at copy_to_survivor_space().<br>
- Each collector has own per worker state information, so I
think it will not properly updated if we use different number of
workers between phase2 and phase3. <br>
- This could be applied between phase1 and phase2, but I
couldn't see any problem so far. So I left as is the exit path
between phase1 and phase2. An alternative would be excluding exit
among phases and address it from the CR. Previously there were
another CR of JDK-8181214.<br>
<br>
* My previous version allowed only !discovery_is_atomic() cases
would exit. With that condition, most cases are filtered out so CMS
old gc and G1 CM would get the benefit. I never faced any crashes or
assertion failures with it. But for the reason of 2) I'm removing
exit path between phase2 and phase3 with some explanation why we
can't exit.<br>
Please give me a better wording. <br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">At least, that's what's expected by the ReferenceProcessing code. The
atomic discovery case for phase2 has "always" (since at least the
beginning of the mercurial age) ignored the complete_gc argument, and
only used it when dealing with the concurrent discovery case.
But G1CopyingKeepAliveClosure appears to have always passed the buck
to the complete_gc closure, contrary to the expectations of the RP
phase2 code. So G1 young/mixed collections only work because phase3
will eventually call the complete_gc closure (by each of the threads).
(G1 concurrent and full gc's have a cuttoff where no work is left for
the complete_gc closure if the object is already marked. I spot
checked other collectors, and at a quick glance it looks like the
others meet the expectations of the ReferenceProcessor code; it's just
G1 young/mixed collections that don't.)
I'm not entirely sure what happens if the number of threads in phase3
is less than the number in phase2 in that case. I think any pending
phase2 work associated with the missing threads will get stolen by
phase3 working threads, but haven't verified that.
I think the simplest fix for this is to change phase2 to always call
the complete_gc closure after all. For some collections that's a
(perhaps somewhat expensive) nop, but in the overall context of
reference processing that's probably in the noise. And also update
the relevant comment in phase2.
The alternative would be to make G1CopyingKeepAliveClosure meet the
(not explicitly stated in the API) ReferenceProcessor expectations,
perhaps bypassing some of the G1ParScanThreadState machinery.</pre>
</blockquote>
I think this is already covered above.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.cpp
800 RefProcMTDegreeAdjuster a(this, phase_times, ref_count);
This is using ref_count, which was last set on line 759, and not
updated after phase2. It should have been updated as part of the new
code block around line 790.</pre>
</blockquote>
I think this is already answered above.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.cpp
1193 uint ReferenceProcessor::ergo_proc_thread_count(ReferenceProcessorPhaseTimes* times) const {
1194 size_t ref_count = total_count(_discovered_refs);
1195
1196 return ergo_proc_thread_count(ref_count, num_queues(), times);
1197 }
I don't think this overload should exist. The ref_count used here is
the SoftReference count, which isn't particularly interesting
here. (It's not the total number of discovered references, which is
also not an interesting number for this purpose.)
It seems to only exist as part of the workarounds for making ParNew
kind of limp along with these changes. But wouldn't it be simpler to
leave ParNewRefProceTaskExecutor::execute alone, using the
active_workers as before? (And recall that I suggested above that
execute should just use the currently configured mt-processing degree,
rather than adding an ergo_workers argument.) It will be called in the
context of RefProcMTDegreeAdjuster, that won't do anything because of
ParNew disables mt-degree adjustment. So I think by leaving things
alone here we retain the current behavior, e.g. CMS ParNew uses
ParallelRefProcEnabled as an all-or-nothing flag, and doesn't pay
attention to the new ReferencesPerThread. And that seems reasonable
to me for a deprecated collector.</pre>
</blockquote>
Done.<br>
I agree for not supporting CMS so we don't need to add the overload
version for CMS.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/cms/parNewGeneration.cpp
1455 false); // disable adjusting queue size when processing references
It's not the queue size that adjusted (or not), it's the number of
queues. And really, it's the MT processing degree.
That last probably should also apply to the name of the variable in
the reference processor, and related names. Although given that this
is all a bit of a kludge to work around (deprecated) CMS deficiencies,
maybe I shouldn't be too concerned. And you did file JDK-8203951.
Though I see Thomas also requested a name change...
If you change the name, remember to update JDK-8203951.</pre>
</blockquote>
Sure, I will update the name at JDK-8203951.<br>
webrev.2 is proposing "bool _adjust_no_of_processing_threads".<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.cpp
1199 uint ReferenceProcessor::ergo_proc_thread_count(size_t ref_count,
1200 uint max_threads,
1201 ReferenceProcessorPhaseTimes* times) const {
...
1204 if (ReferencesPerThread == 0) {
1205 return _num_queues;
1206 }
Why is _num_queues used here, but max_threads used elsewhere in this
function. I *think* this ought to also be max_threads.</pre>
</blockquote>
Agreed, so changed to return max_threads.<br>
Basically _num_queues and max_threads are same value here but for
consistency, max_threads is better. :)<br>
<br>
As you may know, the intent of exiting here is to avoid SIGFPE at
<br>
size_t thread_count = 1 + (ref_count / ReferencesPerThread);<br>
<br>
Basically this method will not be called if ReferencesPerThread == 0
because it will be filtered out at line 1235. i.e. ctor of
RefProcMTDegreeAdjuster.<br>
But I wanted to have something to avoid SIGFPE within this method
too.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.cpp
1229 RefProcMTDegreeAdjuster::RefProcMTDegreeAdjuster(ReferenceProcessor* rp,
...
1235 if (!_rp->has_adjustable_queue() || (ReferencesPerThread == 0)) {
This checks ReferencesPerThread for 0 to do nothing. And then
ergo_proc_thread_count similarly checks for 0 to (effectively) do
nothing. If the earlier suggestion to kill the 1-arg overload is
taken, then ergo_proc_thread_count is really just a helper for
RefProcMTDegreeAdjuster, and we don't need that special case twice.</pre>
</blockquote>
The check for ReferencesPerThread is to disable this feature.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">And perhaps it should be a private helper in RefProcMTDegreeAdjuster?
Similarly for use_max_threads. Or maybe just inline them into that
class's constructor.</pre>
</blockquote>
Good idea and it was one of my original patch before considering to
support CMS.<br>
Done.<br>
<br>
<blockquote type="cite"
cite="mid:A956AB83-BB37-4167-8FB0-24312EE2952C@oracle.com">
<pre wrap="">------------------------------------------------------------------------------
src/hotspot/share/gc/shared/referenceProcessor.cpp
1199 uint ReferenceProcessor::ergo_proc_thread_count(size_t ref_count,
...
1213 return (uint)MIN3(thread_count,
1214 static_cast<size_t>(max_threads),
1215 (size_t)os::initial_active_processor_count());
I don't see why this should look at initial_active_processor_count?
------------------------------------------------------------------------------
</pre>
</blockquote>
As I noted in pre-review email thread, it showed better results if I
avoid using more than active cpus.<br>
This is a kind of enhancement for better performance, so even though
the command-line option was set to use more than cpus(via
ParalelGCThreads), I think it is good to limit. This is different
from command-line option processing which lets users decide even
though the given options would results in worse performance.<br>
<br>
Or if you are saying I should use "active_processor_count()"? If it
is the case, I updated to use 'active_processor_count()' instead of
'initial_active_processor_count()'.<br>
<br>
I would prefer to have this unless there's any reason of not having
it.<br>
<br>
----<br>
In addition:<br>
1) I removed condition check of (num_queues() >1) because it
would affect to a caller side expectation for per worker state
information etc.. I never faced any problem though.<br>
2) Minor changes such as renaming. e.g.
RefProcMTDegreeAdjuster::_saved_num_queue to _saved_num_queues.<br>
<br>
Webrev:<br>
<a class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Esangheki/8043575/webrev.2">http://cr.openjdk.java.net/~sangheki/8043575/webrev.2</a><br>
<a class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Esangheki/8043575/webrev.2_to_1">http://cr.openjdk.java.net/~sangheki/8043575/webrev.2_to_1</a><br>
Testing: hs-tier1~5 with/without ParallelRefProcEnabled<br>
<br>
Thanks,<br>
Sangheon<br>
</body>
</html>