RFR: 8247928: Refactor G1ConcurrentMarkThread for mark abort

Thu Sep 10 08:56:21 UTC 2020

On Mon, 7 Sep 2020 11:45:09 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> Hi all,
>> 
>>   this is a continuation of the review of the same CR at
>>   https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030794.html; the current change is "Version 3"
>>   above, the relevant email is https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030812.html copied
>>   verbatim here again:
>> 
>> Hi Kim,
>> 
>> On 03.09.20 19:17, Kim Barrett wrote:
>>>> On Sep 3, 2020, at 6:07 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>>> On 03.09.20 09:24, Kim Barrett wrote:
>>>>> It seems backward that the phase functions return true to indicate aborted,
>>>>> overflow, or whatever, and return false for successful completion. And
>>>>> there's no mention of what the return values are in the description in the
>>>>> header file.
>>>>
>>>> I think the code is easier to read without the need to always negate the result. Ie. in the following example
>>>>
>>>> if (!phase1) return;
>>>> if (!phase2) return;
>>>>
>>>> the "!" detracts imho from reading the steps.
>>>>
>>>> Another option I can see could be a macro hiding all this, e.g.
>>>>
>>>> EXEC(phase1);
>>>> EXEC(phase2);
>>>>
>>>> but I kind of dislike that even more.
>>>>
>>>> Added documentation.
>>> 
>>> I agree a macro is not the way to go here. I'm less bothered by the need for
>>> lots of preceeding "!" than you are, but I certainly see your point. So I'm
>>> okay with the code as is, now that there's a comment to set expectations.
>>> 
>>>> http://cr.openjdk.java.net/~tschatzl/8247928/webrev.1_to_2 (diff)
>>>> http://cr.openjdk.java.net/~tschatzl/8247928/webrev.2 (full)
>>> 
>>> I like the overall approach a lot.  Thanks for doing this cleanup.
>>> 
>>> Although I have a number of comments, there's nothing major here.
>>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>>>   190   uint iter = 1;
>>>   191   while (true) {
>>> ...
>>>   209     log_info(gc, marking)("Concurrent Mark Restart for Mark Stack Overflow (iteration #%u)",
>>>   210                           iter++);
>>> 
>>> I think that's the only use of iter, and that it's not used after the loop.
>>> How about instead
>>> 
>>>    for (uint iter = 1; true; ++iter) {
>>> 
>>> and remove the increment in the logging statement.
>> 
>> Done.
>> 
>>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>>>   191   while (true) {
>>> ...
>>>   211   }
>>>   212
>>>   213   Ticks mark_end = Ticks::now();
>>>   214   log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms",
>>>   215                         mark_start.seconds(), mark_end.seconds(),
>>>   216                         (mark_end - mark_start).seconds() * 1000.0);
>>> 
>>> This is a change to the logging. Some consequences I like, others I'm not so
>>> sure about.  You talked about some of this in the RFR description, but I
>>> want to go through it carefully here.
>>> 
>>> - This now includes the remark pause in the time, where previously that
>>> seemed to be intentionally excluded. I don't know the rationale for that
>>> exclusion, but it was pretty clearly intentional since it intentionally (per
>>> comment) contorted the code somewhat. So it's hard to decide whether this is
>>> okay. The remark time could be excluded though by capturing the mark_end
>>> time just before calling subphase_remark().
>> 
>> Since within the "Concurrent Mark" time span there can be any number of
>> other (gc) stw pauses, I am sceptical how subtracting only the Remark
>> pause this could be anything but misleading.
>> 
>> It could be certainly interesting to provide some concurrent duration
>> for the whole marking, but I think this out of scope here.
>> 
>> I do not know what the old "Concurrent Mark" end message actually
>> measured: if you look, the first message will not contain remark pause,
>> the next ones after a mark restart do contain them (so all time
>> excluding the last Remark pause). It is neither just the concurrent time
>> nor the full marking time. Not sure how useful that is.
>> 
>> This code simplifies the message to something like: from start of
>> marking related activity until all marking activity fully completed.
>> 
>> Actually there are still a few inconsistencies in the
>> (gc=info,gc+marking=info) log output. Let me give an example:
>> 
>> [23.160s][info][gc        ] GC(225) Concurrent Cycle
>> [23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks
>> [23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks 0.009ms
>> [23.160s][info][gc,marking] GC(225) Concurrent Scan Root Regions
>> [23.209s][info][gc,marking] GC(225) Concurrent Scan Root Regions 48.811ms
>> [23.209s][info][gc,marking] GC(225) Concurrent Mark (23.209s)
>> 
>> ^-- only the "Concurrent Mark" messages has start time - I think it is
>> superfluous, just repeating the output of UL.
>> 
>> [23.209s][info][gc,marking] GC(225) Concurrent Mark From Roots
>> [...]
>> [24.792s][info][gc,marking] GC(225) Concurrent Mark From Roots 1583.035ms
>> [24.792s][info][gc,marking] GC(225) Concurrent Preclean
>> [24.793s][info][gc,marking] GC(225) Concurrent Preclean 0.379ms
>> [24.886s][info][gc        ] GC(225) Pause Remark 552M->225M(760M) 93.108ms
>> [24.886s][info][gc,marking] GC(225) Concurrent Mark (23.209s, 24.886s)
>> 1676.831ms
>> 
>> ^-- same here with start/end times.
>> 
>> [24.886s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
>> [...]
>> [31.592s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
>> 6706.292ms
>> [31.594s][info][gc        ] GC(225) Pause Cleanup 451M->451M(1024M) 0.774ms
>> [31.594s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark
>> [31.600s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark 6.298ms
>> [31.600s][info][gc        ] GC(225) Concurrent Cycle 8439.504ms
>> 
>> I fixed those two in the next patch next week I will publish using a
>> github PR...
>> 
>> Another inconsistency is that the Concurrent Mark Abort message does not
>> contain any duration information (but UL captures the "end" time stamp).
>> But I would like to defer that (existing) issue.
>> 
>>> 
>>> - It only does this logging once, rather than on each iteration.  That seems
>>> good; as you pointed out in the RFR, multiples are pretty likely handled
>>> wrong by some uses.
>>> 
>>> - This logging no longer happens if marking is aborted by remark.  That's
>>> now consistent with other abort cases, so probably an improvement.
>> 
>> Agree to both.
>> 
>>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>>>   137   while (!should_terminate()) {
>>>   138     if (wait_for_next_cycle()) {
>>>   139       break;
>>>   140     }
>>> 
>>> This could instead just be
>>> 
>>>    while (!wait_for_next_cycle()) { ...
>>> 
>>> But I already thought the sense of the wait_for_next_cycle result was
>>> backward, and this makes an even stronger case for that. I think better
>>> would be to reverse the sense of the result and use
>>> 
>>>    while (wait_for_next_cycle()) { ...
>> 
>> +1. Done.
>> 
>>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>>    65   bool phase_concurrent_cycle_start();
>>> 
>>> Unused and undefined.
>>> 
>> 
>> Done.
>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>>    63   // ConcurrentGCBreakpoints must not be placed before the the root
>>>    64   // region scan phase too for this reason.
>>> 
>>> This comment seems misplaced here. I think it belongs in the implementation
>>> of full_concurrent_cycle_do, in conjunction with the comment about not
>>> returning before the scan root regions phase.
>>> 
>>> I think the rest of the comment about STS &etc also seems misplaced here and
>>> should be similarly moved (more or less where it used to be).
>>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>>    66   bool phase_clear_cld_claimed_marks();
>>> 
>>> The result of this function is unused, because of the restriction against
>>> returning before the scan root regions phase. I think I'd prefer it return
>>> void, to make it clear there's nothing to be done with it. Even though that
>>> makes its signature different from all the phase functions.
>>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>>    81   // Perform a full concurrent cycle.
>>> 
>>> This comment seems kind of vacuous. The function also in some sense doesn't
>>> even do that, since it doesn't include the start/end parts.
>>> 
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>>>    70   bool mark_loop_needs_restart() const;
>>> 
>>> This one seems out of order or otherwise misplaced here.  Seems like it
>>> should be after subphase_remark.  Or maybe it shouldn't be in the phase
>>> function section at all.
>>> 
>>> ------------------------------------------------------------------------------
>>> 
>> 
>> All fixed.
>> 
>> Testing: tier1 currently running.
>> 
>> Thanks,
>>    Thomas
>
> Looks good on GitHub as well 👍

@kimbarrett : could you have a look as the latest change is due to your comments? Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/44