RFR: 8247928: Refactor G1ConcurrentMarkThread for mark abort

Mon Sep 7 11:48:00 UTC 2020

On Mon, 7 Sep 2020 08:59:45 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   this is a continuation of the review of the same CR at
>   https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030794.html; the current change is "Version 3"
>   above, the relevant email is https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030812.html copied
>   verbatim here again:
> 
> Hi Kim,
> 
> On 03.09.20 19:17, Kim Barrett wrote:
>>> On Sep 3, 2020, at 6:07 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>> On 03.09.20 09:24, Kim Barrett wrote:
>>>> It seems backward that the phase functions return true to indicate aborted,
>>>> overflow, or whatever, and return false for successful completion. And
>>>> there's no mention of what the return values are in the description in the
>>>> header file.
>>>
>>> I think the code is easier to read without the need to always negate the result. Ie. in the following example
>>>
>>> if (!phase1) return;
>>> if (!phase2) return;
>>>
>>> the "!" detracts imho from reading the steps.
>>>
>>> Another option I can see could be a macro hiding all this, e.g.
>>>
>>> EXEC(phase1);
>>> EXEC(phase2);
>>>
>>> but I kind of dislike that even more.
>>>
>>> Added documentation.
>> 
>> I agree a macro is not the way to go here. I'm less bothered by the need for
>> lots of preceeding "!" than you are, but I certainly see your point. So I'm
>> okay with the code as is, now that there's a comment to set expectations.
>> 
>>> http://cr.openjdk.java.net/~tschatzl/8247928/webrev.1_to_2 (diff)
>>> http://cr.openjdk.java.net/~tschatzl/8247928/webrev.2 (full)
>> 
>> I like the overall approach a lot.  Thanks for doing this cleanup.
>> 
>> Although I have a number of comments, there's nothing major here.
>> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>>   190   uint iter = 1;
>>   191   while (true) {
>> ...
>>   209     log_info(gc, marking)("Concurrent Mark Restart for Mark Stack Overflow (iteration #%u)",
>>   210                           iter++);
>> 
>> I think that's the only use of iter, and that it's not used after the loop.
>> How about instead
>> 
>>    for (uint iter = 1; true; ++iter) {
>> 
>> and remove the increment in the logging statement.
> 
> Done.
> 
>> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>>   191   while (true) {
>> ...
>>   211   }
>>   212
>>   213   Ticks mark_end = Ticks::now();
>>   214   log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms",
>>   215                         mark_start.seconds(), mark_end.seconds(),
>>   216                         (mark_end - mark_start).seconds() * 1000.0);
>> 
>> This is a change to the logging. Some consequences I like, others I'm not so
>> sure about.  You talked about some of this in the RFR description, but I
>> want to go through it carefully here.
>> 
>> - This now includes the remark pause in the time, where previously that
>> seemed to be intentionally excluded. I don't know the rationale for that
>> exclusion, but it was pretty clearly intentional since it intentionally (per
>> comment) contorted the code somewhat. So it's hard to decide whether this is
>> okay. The remark time could be excluded though by capturing the mark_end
>> time just before calling subphase_remark().
> 
> Since within the "Concurrent Mark" time span there can be any number of
> other (gc) stw pauses, I am sceptical how subtracting only the Remark
> pause this could be anything but misleading.
> 
> It could be certainly interesting to provide some concurrent duration
> for the whole marking, but I think this out of scope here.
> 
> I do not know what the old "Concurrent Mark" end message actually
> measured: if you look, the first message will not contain remark pause,
> the next ones after a mark restart do contain them (so all time
> excluding the last Remark pause). It is neither just the concurrent time
> nor the full marking time. Not sure how useful that is.
> 
> This code simplifies the message to something like: from start of
> marking related activity until all marking activity fully completed.
> 
> Actually there are still a few inconsistencies in the
> (gc=info,gc+marking=info) log output. Let me give an example:
> 
> [23.160s][info][gc        ] GC(225) Concurrent Cycle
> [23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks
> [23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks 0.009ms
> [23.160s][info][gc,marking] GC(225) Concurrent Scan Root Regions
> [23.209s][info][gc,marking] GC(225) Concurrent Scan Root Regions 48.811ms
> [23.209s][info][gc,marking] GC(225) Concurrent Mark (23.209s)
> 
> ^-- only the "Concurrent Mark" messages has start time - I think it is
> superfluous, just repeating the output of UL.
> 
> [23.209s][info][gc,marking] GC(225) Concurrent Mark From Roots
> [...]
> [24.792s][info][gc,marking] GC(225) Concurrent Mark From Roots 1583.035ms
> [24.792s][info][gc,marking] GC(225) Concurrent Preclean
> [24.793s][info][gc,marking] GC(225) Concurrent Preclean 0.379ms
> [24.886s][info][gc        ] GC(225) Pause Remark 552M->225M(760M) 93.108ms
> [24.886s][info][gc,marking] GC(225) Concurrent Mark (23.209s, 24.886s)
> 1676.831ms
> 
> ^-- same here with start/end times.
> 
> [24.886s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
> [...]
> [31.592s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
> 6706.292ms
> [31.594s][info][gc        ] GC(225) Pause Cleanup 451M->451M(1024M) 0.774ms
> [31.594s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark
> [31.600s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark 6.298ms
> [31.600s][info][gc        ] GC(225) Concurrent Cycle 8439.504ms
> 
> I fixed those two in the next patch next week I will publish using a
> github PR...
> 
> Another inconsistency is that the Concurrent Mark Abort message does not
> contain any duration information (but UL captures the "end" time stamp).
> But I would like to defer that (existing) issue.
> 
>> 
>> - It only does this logging once, rather than on each iteration.  That seems
>> good; as you pointed out in the RFR, multiples are pretty likely handled
>> wrong by some uses.
>> 
>> - This logging no longer happens if marking is aborted by remark.  That's
>> now consistent with other abort cases, so probably an improvement.
> 
> Agree to both.
> 
>> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>>   137   while (!should_terminate()) {
>>   138     if (wait_for_next_cycle()) {
>>   139       break;
>>   140     }
>> 
>> This could instead just be
>> 
>>    while (!wait_for_next_cycle()) { ...
>> 
>> But I already thought the sense of the wait_for_next_cycle result was
>> backward, and this makes an even stronger case for that. I think better
>> would be to reverse the sense of the result and use
>> 
>>    while (wait_for_next_cycle()) { ...
> 
> +1. Done.
> 
>> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>    65   bool phase_concurrent_cycle_start();
>> 
>> Unused and undefined.
>> 
> 
> Done.
> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>    63   // ConcurrentGCBreakpoints must not be placed before the the root
>>    64   // region scan phase too for this reason.
>> 
>> This comment seems misplaced here. I think it belongs in the implementation
>> of full_concurrent_cycle_do, in conjunction with the comment about not
>> returning before the scan root regions phase.
>> 
>> I think the rest of the comment about STS &etc also seems misplaced here and
>> should be similarly moved (more or less where it used to be).
>> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>    66   bool phase_clear_cld_claimed_marks();
>> 
>> The result of this function is unused, because of the restriction against
>> returning before the scan root regions phase. I think I'd prefer it return
>> void, to make it clear there's nothing to be done with it. Even though that
>> makes its signature different from all the phase functions.
>> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>    81   // Perform a full concurrent cycle.
>> 
>> This comment seems kind of vacuous. The function also in some sense doesn't
>> even do that, since it doesn't include the start/end parts.
>> 
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>    70   bool mark_loop_needs_restart() const;
>> 
>> This one seems out of order or otherwise misplaced here.  Seems like it
>> should be after subphase_remark.  Or maybe it shouldn't be in the phase
>> function section at all.
>> 
>> ------------------------------------------------------------------------------
>> 
> 
> All fixed.
> 
> Testing: tier1 currently running.
> 
> Thanks,
>    Thomas

Looks good on GitHub as well 👍

-------------

Marked as reviewed by sjohanss (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/44