Request for Review (xs) - 8159073: Error handling incomplete when creating GC threads lazily

Wed Jul 13 23:46:16 UTC 2016

On 7/13/2016 4:11 AM, Thomas Schatzl wrote:
> Hi Jon,
>
> On Thu, 2016-06-23 at 13:32 -0700, Jon Masamitsu wrote:
>> The change has grown some (small now, not extra small).
>>
>> Added logging for the case where thread creation fails.
>>
>> http://cr.openjdk.java.net/~jmasa/8159073/webrev_delta_00_01a/
>>
>> Added fixes for failed thread creation.
>>
>> http://cr.openjdk.java.net/~jmasa/8159073/webrev_delta_00_01b/
>>
>> Complete fix with test added.
>>
>> http://cr.openjdk.java.net/~jmasa/8159073/webrev.01/
>>
>> Tested with gc_test_suite with new diagnostic flag on and off.
>    some comments:
>
> - in a few files (concurrentMarkSweepGeneration.cpp,
> parnewGeneration.cpp) there is a comment like:
>
> 2891       // If the total workers is greater than 1, then multiple
> workers
> 2892       // may be used at some time and the initialization has been
> set
> 2893       // such that the single threaded path cannot be used.
> 2894       if (workers->total_workers() > 1) {
>
> Wouldn't it be easier to just remove the special case for running the
> task in the VM thread?

I've seen cases with G1 where using the VM thread compared to using
1 GC worker thread is about .5ms faster.   For a long time I would have
blown off .5ms but I'm less inclined to do that now.  It is after all
500us :-).  We hear talk about sub ms pauses being important to someone
somewhere.  Beyond that, one of the instances where this comment occurs
is for the CMS marking thread where 1 concurrent worker may be common.

And if that's not convincing I'll bring out my last resort excuse: 
"Beyond the
scope of this change" :-).  Yes, it would simplify things, I'd rather 
not have that
fight over this change.

>
> That would simplify this code, and probably the initialization code a
> bit.
> Not sure if we are concerned about this code now though.
>
> - I like the rename from set_active_workers() to
> update_active_workers().
>
> - sorry to always be the one complaining about this, but in
> workerManager.hpp indentation is off in several places:

No problem.
>
>    63         log_trace(gc, task)("WorkerManager::add_workers() : "
>    64            "creation failed due to failed allocation of native
> %s",
>    65            new_worker == NULL ?  "memory" : "thread");
>
>    66         if (new_worker != NULL) {
>    67            delete new_worker;
>    68         }
>
>    78     log_trace(gc, task)("WorkerManager::add_workers() : "
>    79        "created_workers: %u", created_workers);
>
> (Also I think at least the last one conflicts with the recent changes
> to the log messages).

Fixed.

>
> - WorkGang::run_task(): from my understanding that method has always
> been intended to be a shortcut for
>
> old_num_workers = num_workers;
> update_workers(num_workers);
> run_task(cl);
> update_workers(old_num_workers);
>
> This is not the case here any more. The code silently changes the
> number of dispatched workers, which may lead to confusion. I.e. the
> caller may expect that the given do_work() closure method will be
> called with arguments from 0..num_workers-1 eventually.

I think you're referring to

265 void WorkGang::run_task(AbstractGangTask* task) {
266   run_task(task, active_workers());
267 }

I think the idiom is

1) decide if you want to recalculate the number of threads to use 
(either call
calc_active_workers() or use the current value of active_workers()).
2) call run_task() passing the task and it uses the active_workers() that
is part of the workgang.

Repeat at the next parallel task (no dependency on the number of
active workers used in the last parallel task).

I know there was some unfortunate cases with G1 where the reporting of 
statistics
required that the number of active_workers at the parallel task had to 
be the
number of when the statistics were reported and required the restoring the
previous active_workers value.  I had hoped that the new run_task()

void run_task(AbstractGangTask* task, uint num_workers);

would help fix that unfortunate case but I am no longer holding out
hope for that.

>
> That is not the case here any more.
>
> I am not sure whether this is a problem, I think that all currently
> implemented closures are okay with that. But I am not sure what the
> problem is with the original code, so I would tend to try to avoid such
> a change in behavior. Again, the worst thing that may happen is that a
> single worker thread may get his do_work() method called multiple
> times.
>
> This is something that has been possible, and the code had to prepared
> for that, since forever.
>
> [Additionally, currently, all existing users of that run_task() method
> pass (or should pass) num_workers <= active_workers() that call this
> method, so no extra threads should be created.]

I don't recall thinking that the parameter (num_workers) should be less than
active_workers.  If num_workers should be less than active_workers, that's
to be decided by the caller of run_task() and run_task() should use 
num_workers
if possible (where possible is num_workers <= total_workers).

>
> The only problem calling update_active_workers() in that method right
> now is that it only allows less active workers than total workers if
> UseDynamicNumberOfGCThreads is true as far as I can see.

I think UseDynamicNumberOfGCThreads being true allows use to use less
than total number of threads (which is the default behavior forever).  G1
is more parallel and more concurrent and perhaps needs more flexibility.
That's a change we could consciously make and advertise to the world.

>
> So I tried a vm.gc test run with the patch at http://cr.openjdk.java.ne
> t/~tschatzl/8159073/webrev/hotspot.patch (basically your patch, with
> changes in WorkGang::run_task(), and removing the nasty assert in
> update_active_workers()), and it passed.

I can believe that patch workers but I think it is going in the wrong
direction.

calc_active_workers() picks the number of GC threads based on heap size.
I think this is appropriate for many parallel GC tasks.  But there are
exceptions and  run_task(AbstractGangTask* task, uint num_workers)
is for those exceptions.  For example, Reference processing  does not scale
with the size of the heap so we should use some other policy to choose
the number of parallel workers for Reference processing and use
run_task(task, num_workers) to say I want to override the number
of workers active in the workgroup.

Alternatively, we could figure out another way to handle the cases
where the number of active workers needs to be the same between
tasks.  I'm pretty sure it had to do with reporting information in
G1.  As I recall it, it went like

- pick N active workers to do workA
- pick M active workers to do workB
- report workA with current active workers
M and either under report (N > M) or report
garbage (N < M)

Jon
>
> I think that would be exactly the intention of the original change.
>
> What do you think?
>
> Thanks,
>    Thomas
>