Request for Review (xs) - 8159073: Error handling incomplete when creating GC threads lazily

Tue Jul 19 17:27:30 UTC 2016

On 7/19/2016 2:03 AM, Thomas Schatzl wrote:
> Hi Jon,
>
> On Fri, 2016-07-15 at 07:28 -0700, Jon Masamitsu wrote:
>> On 7/14/2016 1:47 AM, Thomas Schatzl wrote:
>>> Hi,
>>>
>>> On Wed, 2016-07-13 at 16:46 -0700, Jon Masamitsu wrote:
>>>> On 7/13/2016 4:11 AM, Thomas Schatzl wrote:
>>>>> Hi Jon,
>>>>>
>>>>> On Thu, 2016-06-23 at 13:32 -0700, Jon Masamitsu wrote:
>>>>>> The change has grown some (small now, not extra small).
>>>>>>
>>>>>> Added logging for the case where thread creation fails.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~jmasa/8159073/webrev_delta_00_01a
>>>>>> /
>>>>>>
>>>>>> Added fixes for failed thread creation.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~jmasa/8159073/webrev_delta_00_01b
>>>>>> /
>>>>>>
>>>>>> Complete fix with test added.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~jmasa/8159073/webrev.01/
>>>>>>
>>>>>> Tested with gc_test_suite with new diagnostic flag on and
>>>>>> off.
>>>>>      some comments:
>>>>>
>>>>> - in a few files (concurrentMarkSweepGeneration.cpp,
>>>>> parnewGeneration.cpp) there is a comment like:
>>>>>
>>>>> 2891       // If the total workers is greater than 1, then
>>>>> multiple
>>>>> workers
>>>>> 2892       // may be used at some time and the initialization
>>>>> has
>>>>> been
>>>>> set
>>>>> 2893       // such that the single threaded path cannot be
>>>>> used.
>>>>> 2894       if (workers->total_workers() > 1) {
>>>>>
>>>>> Wouldn't it be easier to just remove the special case for
>>>>> running
>>>>> the task in the VM thread?
>>>> I've seen cases with G1 where using the VM thread compared to
>>>> using
>>>> 1 GC worker thread is about .5ms faster.   For a long time I
>>>> would
>>>> have blown off .5ms but I'm less inclined to do that now.  It is
>>>> after all 500us :-).  We hear talk about sub ms pauses being
>>>> important to someone somewhere.
>>> Not trying to convince you, and it is a good argument, but:
>>>
> [...]
>>> There are also significant gains to be made by just reducing the
>>> number
>>> of times we start up threads (e.g. reference processing, G1's
>>> "other"
>>> tasks). Not that you couldn't address any of these issues in
>>> addition,
>>> but they decrease the impact :)
>> Are you saying that if I delete the option to use the VM thread, then
>> I don't need that code that checks the special case of 1 thread
>> (more deleted code) so it simplifies my change?
> Yes - but this has been more of a side remark, so I do not think it
> would be a good change for this CR.

Ok.  No change to this patch.

>
>>> Further, if we consider doing that, I would prefer to have that
>>> behavior centralized in the run_task() methods. Not have to
>>> consider every time we create and run a gang task whether it might
>>> be useful to run it single-threaded (and add the boiler plate code
>>> to do it everywhere).
>> I can see that point although I don't think the centralized point
>> should be run_task().
> Sure, this is open for discussion. Just a suggestion.

OK.  No change here either for this patch.

>
> [...]
>>>>> So I tried a vm.gc test run with the patch at
>>>>> http://cr.openjdk.java.net/~tschatzl/8159073/webrev/hotspot.pat
>>>>> ch (basically your patch, with changes in WorkGang::run_task(),
>>>>> and removing the nasty assert in update_active_workers()), and
>>>>> it passed.
>>>> I can believe that patch workers but I think it is going in the
>>>> wrong direction.
>>>>
>>>> calc_active_workers() picks the number of GC threads based on
>>>> heap size. I think this is appropriate for many parallel GC
>>>> tasks.  But there are exceptions and  run_task(AbstractGangTask*
>>>> task, uint num_workers) is for those exceptions.  For example,
>>>> Reference processing  does not scale with the size of the heap so
>>>> we should use some other policy to choose the number of parallel
>>>> workers for Reference processing and use run_task(task,
>>>> num_workers) to say I want to override the number of workers
>>>> active in the workgroup.
>>> Yes, but the existing code does not support that. It strictly
>>> bounds the actual number of "workers" (and actually, in terms of
>>> the framework this is more a number of tasks!) by the number of
>>> active threads available at the moment! :)
>> Do you mean this code that is bounding the num_workers by the
>> number of created threads?
>>
>> 278   uint active_workers = MIN2(_created_workers, num_workers);
>> 279   _dispatcher->coordinator_execute_on_workers(task,
>> active_workers);
> Yes. Sorry for creating confusion.
>
>>> E.g. if active_workers = 3 (for whatever reason), but somebody
>>> calls run_task(task, 10), the method silently only performs three
>>> "tasks".
>> The limiting to active_workers is the kind of thing I see callers of
>> run_task(task, num_worker) doing.  From
>> "share/vm/gc/g1/g1CardLiveData.cpp"
>> (line 427)
>>
>> 427   uint const num_workers = (uint)MIN2(num_chunks,
>> (size_t)workers->active_workers());
>> 428
>> 429   G1ClearCardLiveDataTask cl(live_cards_bm(), num_chunks);
>> 430
>> 431   log_debug(gc, ergo)("Running %s using %u workers for "
>> SIZE_FORMAT
>> " work units.", cl.n    ame(), num_workers, num_chunks);
>> 432   workers->run_task(&cl, num_workers);
>>
> Yes, the caller limits the number of threads. Which means that the
> caller (hopefully) properly prepares the AbstractGangTask to work with
> that number of threads :)
>
> The problem is the run_task(, uint) method potentially further
> restricting the number of "workers". For that one the caller might not
> have prepared the AbstractGangTask for.

It seems to me that we don't want the caller of run_task() to have to know
that not all the threads could be created (i.e., that _created_threads <
_available_threads).   As you say, some central authority knows that.
Here the central place is the AbstractWorkGang.  That seems the right
place to me.  The STW parallel WorkGang's and the concurrent parallel
WorkGang's have a different number of threads that they wants to
use.  It could be argued that the task (AbstractGangTask) know more
about the work than the work gang so should decide how many
threads to use, but there are/will be more global constraints on the
number of parallel threads than the work to be done by the task.
When the JVM becomes aware of the resource usage on the platform,
it may want to limit the number of threads to use (so as not to
hog the CPU).

If you still want the patch you proposed to save and restore available
threads, I'll accept it as part of this change.

>
>>> [...]
>>> Overall, and as a future improvement, I think it might be better to
>>> move the management of "global" active_workers variable out from
>>> WorkGang. And maybe rename the parameters more appropriately. ;)
>> In my mind WorkGang is a place to store the number of active workers
>> (and now created workers).  WorkGang knows the maximum number of
>> workers that it can use so active_workers seemed to fit.  Are you
>> suggesting removing the run_task(task) method and only have
>> run_task(task, num_workers) so that
>> the number of active workers is always feed to WorkGang?  I would not
>> be opposed to that.
>>
> Yes, that is the idea - as some future change.

Ok.  Future change.

Thanks for the interesting discussion.  I think the only change
still in question is the save/restore of available threads.  Let
me know what you want me to do with that.  Also, let me
know if there is something I've missed.

Jon
>
> Thanks,
>    Thomas
>