RFR: 8224660: Parallel GC: Use WorkGang (2: MarksFromRootsTask)

Sun Jun 9 06:08:46 UTC 2019

> On Jun 3, 2019, at 1:59 PM, Leo Korinth <leo.korinth at oracle.com> wrote:
> On 29/05/2019 20:08, Kim Barrett wrote:
>>>> src/hotspot/share/gc/parallel/psRootType.inline.hpp
>>>> Is EnumClaimer enough better than the existing mechanism of an
>>>> if-sequence on SubTasksDone::try_claim_task to be worth having?
>>>> One thing I dislike is the need for a sentinal enumerator.  Of course,
>>>> I don't much like the SubTasksDone::all_tasks_completed protocol
>>>> either; I have some notes about it that should be turned into an RFE.
>>> 
>>> I prefer it because it tries to be typesafe and does not have all_tasks_completed which in this case is not needed. I would prefer having a separate barrier for when it is needed.
>> all_tasks_completed exists to support reuse of a SubTasksDone object.
>> It would be better to just disallow such reuse; it's so not worth the
>> added usage complexity.  My notes say the only place with a reused
>> SubTasksDone is GenCollectedHeap, which I think therefore means CMS
>> (since Serial doesn't have parallelism, but is code-sharing with CMS
>> here).  I haven't looked at how hard it would be to kill that re-use.
> 
> Sorry, I was answering your older question a bit too fast as I thought you were talking about *Sequential*SubTasksDone. I would _not_ like to use SubTasksDone as it lacks the same features SequentialSubTasksDone lacks plus it _also_ allocates memory.
> 
> SequentialSubTasksDone has all_tasks_completed (in addition to object reuse as you noted) to signal that cleanup can be made on resources by the last thread (outside the class). It returns a boolean that is checked in one place in CMS. Reuse of SequentialSubTasksDone just seems like bad design as it does not even allocate memory (and thus is free to create). However it is re-used and located in Space for (to me) no apparent reason, especially when it is a StackObj.
> 
> 
>>>> Assuming we're going to use it (and replace existing SubTasksDone
>>>> usages with it?), the class really should be in gc/shared, probably in
>>>> workgroup.hpp (having the class declaration in a .inline.hpp prevents
>>>> embedding an instance in some other class that is declared in a .hpp),
>>>> with the implementation in (new) workgroup.inline.hpp >
>>>> And the Parallel::RootType::Value enum should be in psRootType.hpp.
>>> 
>>> I did put it in workgroup to begin with and moved it out (after suggestions from colleagues) as I were the only user of it. Do you think it would be okay to leave it in the .inline.hpp file (as is) and if we find more users of the class in the future maybe move it to some shared class (and then separate it into hpp and cpp file)? I feel splitting it is overkill and hurts readability when the interface is never needed without the implementation.
>> What I really want to avoid is having two facilities for solving the
>> same problem.  That's why I suggested a different location.  If the
>> plan is to replace SubTasksDone with EnumClaimer, it should be in
>> shared code.  If that's not the plan, then I don't want EnumClaimer at
>> all.
> 
> I think I do understand what you want to avoid (that was why I did initially place it in shared hoping to remove SequentialSubTasksDone eventually). However I also understand why Per and StefanK do not want to place every new class "that could be used in other places" in shared code. I also feel it is no good to "fix" SequentialSubTasksDone, fix its usage in other GCs, and do all that work in this -- already -- quite big patch series. My guess is that _that_ would not be acceptable to quite a few reviewers.
> 
> As my proposed solution is not accepted, I guess the way forward is to kill my EnumClaimer, use SequentialSubTasksDone, manually cast to enum values and after that call all_tasks_completed(). all_tasks_completed() is of course not needed in my case (it will just eat cpu cycles), but it is documented that it must be called, and I do not want to break the contract. Are you okay with such a solution? I do not like it myself, but it might be a not-so-good solution that everyone is able to accept. Do you have a better proposal?
> 
> Sometimes it feels that we all want the perfect solution and by avoiding a non-perfect solution that is, IMO, quite a bit better than before, we end up reverting to using the the old inferior one and in the process loosing an incremental improvement. I will now probably add another use of what is in my opinion a bad class. And later in this patch series I will add another usage to that class. SequentialSubTasksDone will thus have two more usages and be harder to remove after this. People will also more likely find the class and see that this is a class that is used in many places in the code base and thus use it.

SubTasksDone is the comparable for what EnumClaimer is being used for.
It is intended for claiming named (via enum) tasks.
SequentialSubTasksDone is intended for claiming anonymous sequential
chunks of some larger task. Those are different use-cases.

Using SequentialSubTasksDone in the places where EnumClaimer is being
used would be a mis-use; it could be made to work, but that's not the
intended usage. The possible need to cast the claimed subtasks to the
corresponding enumerators is an indication of that. (Note that I
hadn't looked at the reuse problem for SequentialSubTasksDone; it
looks similar to that for SubTasksDone, but perhaps with more
occurrences and more wide-spread use. I'm not trying to deal with that
in this discussion.) (I noticed that EnumClaimer is basically
SequentialSubTasksDone with an enum type and some casts wrapped around
it. That doesn't change my opinion on the intended usage. Also see
below.)

I know that SubTasksDone does an allocation, and agree that's a
demerit against it. I think in the grand scheme of things, it doesn't
really matter; there are lots of temporary allocations during a
collection.  (It could also be removed by having an array member that
is sufficiently large for all uses.  Unfortunately, C++ still doesn't
have variable-length arrays.)

However, there is a feature of SubTasksDone that makes it
significantly superior to EnumClaimer. SubTasksDone allows (indeed,
requires) an order be specified for the subtask processing, via the
order in which the subtasks are claimed. This is independent of the
enumerator value order, which can be in logical groupings. This
ability to specify the order of subtask processing is important for
optimal scheduling. The standard heuristic for minimizing the duration
of multiple parallel tasks when the number of tasks exceeds the number
of execution units is to do the tasks in descending (expected)
duration. (I don't know how much attention has been paid to the
ordering in existing uses of SubTasksDone. I don't recall seeing the
potential importance of the ordering mentioned anywhere, but lack of
such helpful comments is unfortunately the norm in this code base.)

So I think EnumClaimer is not an improvement. I think SubTasksDone
should be used instead of EnumClaimer, with some thought given to the
claim ordering. (Ideally it would be dynamic, based on some measured
estimate of the costs, but I don't think any existing code does that,
so certainly wouldn't fault this set of changes for not doing so
either.) Is there any ordering (implied or explicit) in the task
creation order of the old mechanism? (Asking because I don't remember
and don't want to spend time studying code that is about to be
deleted.)

I think it would be really nice to get rid of the (I think) one re-use
of a SubTasksDone instance and get rid of all_tasks_completed. That
could be done as a separate precursor to this set of changes, or as a
later cleanup, whichever seems more convenient.