RFR: 8247928: Refactor G1ConcurrentMarkThread for mark abort
Stefan Johansson
sjohanss at openjdk.java.net
Mon Sep 7 11:48:00 UTC 2020
On Mon, 7 Sep 2020 08:59:45 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
> Hi all,
>
> this is a continuation of the review of the same CR at
> https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030794.html; the current change is "Version 3"
> above, the relevant email is https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030812.html copied
> verbatim here again:
>
> Hi Kim,
>
> On 03.09.20 19:17, Kim Barrett wrote:
>>> On Sep 3, 2020, at 6:07 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>> On 03.09.20 09:24, Kim Barrett wrote:
>>>> It seems backward that the phase functions return true to indicate aborted,
>>>> overflow, or whatever, and return false for successful completion. And
>>>> there's no mention of what the return values are in the description in the
>>>> header file.
>>>
>>> I think the code is easier to read without the need to always negate the result. Ie. in the following example
>>>
>>> if (!phase1) return;
>>> if (!phase2) return;
>>>
>>> the "!" detracts imho from reading the steps.
>>>
>>> Another option I can see could be a macro hiding all this, e.g.
>>>
>>> EXEC(phase1);
>>> EXEC(phase2);
>>>
>>> but I kind of dislike that even more.
>>>
>>> Added documentation.
>>
>> I agree a macro is not the way to go here. I'm less bothered by the need for
>> lots of preceeding "!" than you are, but I certainly see your point. So I'm
>> okay with the code as is, now that there's a comment to set expectations.
>>
>>> http://cr.openjdk.java.net/~tschatzl/8247928/webrev.1_to_2 (diff)
>>> http://cr.openjdk.java.net/~tschatzl/8247928/webrev.2 (full)
>>
>> I like the overall approach a lot. Thanks for doing this cleanup.
>>
>> Although I have a number of comments, there's nothing major here.
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>> 190 uint iter = 1;
>> 191 while (true) {
>> ...
>> 209 log_info(gc, marking)("Concurrent Mark Restart for Mark Stack Overflow (iteration #%u)",
>> 210 iter++);
>>
>> I think that's the only use of iter, and that it's not used after the loop.
>> How about instead
>>
>> for (uint iter = 1; true; ++iter) {
>>
>> and remove the increment in the logging statement.
>
> Done.
>
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>> 191 while (true) {
>> ...
>> 211 }
>> 212
>> 213 Ticks mark_end = Ticks::now();
>> 214 log_info(gc, marking)("Concurrent Mark (%.3fs, %.3fs) %.3fms",
>> 215 mark_start.seconds(), mark_end.seconds(),
>> 216 (mark_end - mark_start).seconds() * 1000.0);
>>
>> This is a change to the logging. Some consequences I like, others I'm not so
>> sure about. You talked about some of this in the RFR description, but I
>> want to go through it carefully here.
>>
>> - This now includes the remark pause in the time, where previously that
>> seemed to be intentionally excluded. I don't know the rationale for that
>> exclusion, but it was pretty clearly intentional since it intentionally (per
>> comment) contorted the code somewhat. So it's hard to decide whether this is
>> okay. The remark time could be excluded though by capturing the mark_end
>> time just before calling subphase_remark().
>
> Since within the "Concurrent Mark" time span there can be any number of
> other (gc) stw pauses, I am sceptical how subtracting only the Remark
> pause this could be anything but misleading.
>
> It could be certainly interesting to provide some concurrent duration
> for the whole marking, but I think this out of scope here.
>
> I do not know what the old "Concurrent Mark" end message actually
> measured: if you look, the first message will not contain remark pause,
> the next ones after a mark restart do contain them (so all time
> excluding the last Remark pause). It is neither just the concurrent time
> nor the full marking time. Not sure how useful that is.
>
> This code simplifies the message to something like: from start of
> marking related activity until all marking activity fully completed.
>
> Actually there are still a few inconsistencies in the
> (gc=info,gc+marking=info) log output. Let me give an example:
>
> [23.160s][info][gc ] GC(225) Concurrent Cycle
> [23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks
> [23.160s][info][gc,marking] GC(225) Concurrent Clear Claimed Marks 0.009ms
> [23.160s][info][gc,marking] GC(225) Concurrent Scan Root Regions
> [23.209s][info][gc,marking] GC(225) Concurrent Scan Root Regions 48.811ms
> [23.209s][info][gc,marking] GC(225) Concurrent Mark (23.209s)
>
> ^-- only the "Concurrent Mark" messages has start time - I think it is
> superfluous, just repeating the output of UL.
>
> [23.209s][info][gc,marking] GC(225) Concurrent Mark From Roots
> [...]
> [24.792s][info][gc,marking] GC(225) Concurrent Mark From Roots 1583.035ms
> [24.792s][info][gc,marking] GC(225) Concurrent Preclean
> [24.793s][info][gc,marking] GC(225) Concurrent Preclean 0.379ms
> [24.886s][info][gc ] GC(225) Pause Remark 552M->225M(760M) 93.108ms
> [24.886s][info][gc,marking] GC(225) Concurrent Mark (23.209s, 24.886s)
> 1676.831ms
>
> ^-- same here with start/end times.
>
> [24.886s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
> [...]
> [31.592s][info][gc,marking] GC(225) Concurrent Rebuild Remembered Sets
> 6706.292ms
> [31.594s][info][gc ] GC(225) Pause Cleanup 451M->451M(1024M) 0.774ms
> [31.594s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark
> [31.600s][info][gc,marking] GC(225) Concurrent Cleanup for Next Mark 6.298ms
> [31.600s][info][gc ] GC(225) Concurrent Cycle 8439.504ms
>
> I fixed those two in the next patch next week I will publish using a
> github PR...
>
> Another inconsistency is that the Concurrent Mark Abort message does not
> contain any duration information (but UL captures the "end" time stamp).
> But I would like to defer that (existing) issue.
>
>>
>> - It only does this logging once, rather than on each iteration. That seems
>> good; as you pointed out in the RFR, multiples are pretty likely handled
>> wrong by some uses.
>>
>> - This logging no longer happens if marking is aborted by remark. That's
>> now consistent with other abort cases, so probably an improvement.
>
> Agree to both.
>
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp
>> 137 while (!should_terminate()) {
>> 138 if (wait_for_next_cycle()) {
>> 139 break;
>> 140 }
>>
>> This could instead just be
>>
>> while (!wait_for_next_cycle()) { ...
>>
>> But I already thought the sense of the wait_for_next_cycle result was
>> backward, and this makes an even stronger case for that. I think better
>> would be to reverse the sense of the result and use
>>
>> while (wait_for_next_cycle()) { ...
>
> +1. Done.
>
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>> 65 bool phase_concurrent_cycle_start();
>>
>> Unused and undefined.
>>
>
> Done.
>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>> 63 // ConcurrentGCBreakpoints must not be placed before the the root
>> 64 // region scan phase too for this reason.
>>
>> This comment seems misplaced here. I think it belongs in the implementation
>> of full_concurrent_cycle_do, in conjunction with the comment about not
>> returning before the scan root regions phase.
>>
>> I think the rest of the comment about STS &etc also seems misplaced here and
>> should be similarly moved (more or less where it used to be).
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>> 66 bool phase_clear_cld_claimed_marks();
>>
>> The result of this function is unused, because of the restriction against
>> returning before the scan root regions phase. I think I'd prefer it return
>> void, to make it clear there's nothing to be done with it. Even though that
>> makes its signature different from all the phase functions.
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>> 81 // Perform a full concurrent cycle.
>>
>> This comment seems kind of vacuous. The function also in some sense doesn't
>> even do that, since it doesn't include the start/end parts.
>>
>> ------------------------------------------------------------------------------
>> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.hpp
>> 70 bool mark_loop_needs_restart() const;
>>
>> This one seems out of order or otherwise misplaced here. Seems like it
>> should be after subphase_remark. Or maybe it shouldn't be in the phase
>> function section at all.
>>
>> ------------------------------------------------------------------------------
>>
>
> All fixed.
>
> Testing: tier1 currently running.
>
> Thanks,
> Thomas
Looks good on GitHub as well 👍
-------------
Marked as reviewed by sjohanss (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/44
More information about the hotspot-gc-dev
mailing list