[Bug] Load imbalance of Tasks among GCTaskThreads during the Young GC
Thomas Schatzl
thomas.schatzl at oracle.com
Mon Nov 21 09:22:06 UTC 2016
Hi,
On Fri, 2016-11-18 at 22:54 -0600, Tony S wrote:
> Hi,
>
> [Description]
> src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
>
> PSPromotionManager enqueues various tasks (ThreadRootsTask /
> StealTask / ScavengeRootsTask / OldToYoungRootsTask) into
> GCTaskQueue.
> GCTaskThread asked GCTaskManager to call get_task() to obtain tasks
> from the GCTaskQueue. However, I found the tasks are not distributed
> evenly among the GCTaskThreads. Some GCTaskThreads get more tasks
> than others.
This imbalance could have several reasons:
- startup of some tasks is delayed, e.g. due to scheduling, thread(s) X
get activated significantly later than other threads.
- the work unit per task is so small, and some of these tasks complete
so quickly, that a thread immediately competes for more work with the
other threads. The thread running that task (e.g. OldToYoungRootsTask)
might then get favorable treatment from the scheduler (like: it is
already running and has not used up its time slice).
- there is no 1:1 correspondence between threads and tasks (e.g.
ScavengeRootsTask), as the work split is done across tasks and not
threads for these. So they can't be evenly distributed (with larger
thread counts).
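To illustrate the first two points, here is a minimal sketch of a greedy
shared-queue model. It is not the GCTaskManager code; the function name
simulate(), the wakeup times and the task durations are all made up for
illustration. Whichever worker is free first takes the next task from the
queue.

```python
import heapq


def simulate(wakeup_times, task_durations):
    """Greedy shared-queue model (hypothetical, not HotSpot code).

    wakeup_times[i] is when worker i first becomes runnable.
    Each task goes to whichever worker is free earliest.
    Returns (tasks taken per worker, time the last task ends).
    """
    # Min-heap of (time the worker becomes free, worker id).
    free_at = [(t, wid) for wid, t in enumerate(wakeup_times)]
    heapq.heapify(free_at)
    counts = {wid: 0 for wid in range(len(wakeup_times))}
    makespan = 0
    for duration in task_durations:
        now, wid = heapq.heappop(free_at)
        counts[wid] += 1
        end = now + duration
        makespan = max(makespan, end)
        heapq.heappush(free_at, (end, wid))
    return counts, makespan


# Assumed numbers: two workers start at once, two wake up 50 time units late.
WAKEUPS = [0, 0, 50, 50]
small_counts, small_makespan = simulate(WAKEUPS, [5] * 12)     # tiny tasks
large_counts, large_makespan = simulate(WAKEUPS, [100] * 12)   # long tasks

print("tiny tasks:", small_counts, "done at", small_makespan)
print("long tasks:", large_counts, "done at", large_makespan)
```

With the tiny tasks the two early workers drain the whole queue before the
late ones even wake up, so the task counts look very uneven although nobody
waited for work that existed. With the long tasks the same queue spreads
evenly across all four workers.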
> Here are my trace log:
>
>
> [Tracing Log]
>
> OldToYoungRootsTask GCTaskThread ID:4
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:10
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:10
> OldToYoungRootsTask GCTaskThread ID:3
> OldToYoungRootsTask GCTaskThread ID:10
> OldToYoungRootsTask GCTaskThread ID:4
> OldToYoungRootsTask GCTaskThread ID:10
> ScavengeRootsTask GCTaskThread ID:4
> ScavengeRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:10
> OldToYoungRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:10
> ThreadRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:3
> ScavengeRootsTask GCTaskThread ID:3
> ScavengeRootsTask GCTaskThread ID:3
> ScavengeRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:10
> ScavengeRootsTask GCTaskThread ID:3
> ScavengeRootsTask GCTaskThread ID:3
> ScavengeRootsTask GCTaskThread ID:3
> ThreadRootsTask GCTaskThread ID:4
> ScavengeRootsTask GCTaskThread ID:10
> StealTask GCTaskThread ID:6
> StealTask GCTaskThread ID:9
> StealTask GCTaskThread ID:3
> StealTask GCTaskThread ID:14
> StealTask GCTaskThread ID:7
> StealTask GCTaskThread ID:5
> StealTask GCTaskThread ID:8
> StealTask GCTaskThread ID:0
> StealTask GCTaskThread ID:12
> StealTask GCTaskThread ID:13
> StealTask GCTaskThread ID:2
> StealTask GCTaskThread ID:4
> StealTask GCTaskThread ID:11
> StealTask GCTaskThread ID:1
> StealTask GCTaskThread ID:10
>
>
> I use dacapo benchmark, running lusearch on JDK 1.8 on ubuntu 16.10.
> I have totally 15 GCTaskThreads on the above test. We can see that
AFAIR (many of) the DaCapo benchmarks are a bit on the small side for
modern machines. Anyway, due to various reasons you might even get
better GC pauses with fewer threads.
> only StealTasks are evenly executed on different GCTaskThread. The
> other tasks are executed on few GCTaskThreads, not evenly
> distributed.
>
> When I increase the number of mutator threads and thus have more
> tasks (ThreadRootsTask / StealTask / ScavengeRootsTask /
> OldToYoungRootsTask), however, those tasks are still not distributed
> evenly among the GCTaskThreads, which causes the load imbalance and
> thus prolongs the Young GC time.
The trace above does not include start/end times of the threads and
tasks, so it is hard to diagnose any actual problem, its source, or
whether there is one at all. E.g. StealTasks of different threads might
take different amounts of time, so the threads may actually finish at
approximately the same time anyway. Did you measure how long the
threads are actually waiting for work (or completion of other threads)?
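As a rough sketch of the kind of measurement meant here: if each trace line
also carried a start and end timestamp, per-thread busy and idle time could
be summed up as below. The record layout (thread id, task name, start, end)
and the sample values are assumptions for illustration only; they are not
taken from your log or from any existing HotSpot tracing output.

```python
from collections import defaultdict


def per_thread_utilization(records):
    """Sum busy time per thread and derive idle time within the phase.

    records: iterable of (thread_id, task_name, start, end), with all
    times in the same unit (microseconds in the sample below).
    The phase spans from the earliest start to the latest end.
    """
    records = list(records)
    phase_start = min(start for _tid, _task, start, _end in records)
    phase_end = max(end for _tid, _task, _start, end in records)
    span = phase_end - phase_start

    busy = defaultdict(int)
    for thread_id, _task, start, end in records:
        busy[thread_id] += end - start

    return {tid: {"busy": b, "idle": span - b} for tid, b in busy.items()}


# Made-up sample trace: thread 3 runs three tasks, thread 6 only one.
SAMPLE = [
    (3, "OldToYoungRootsTask", 0, 2000),
    (3, "ThreadRootsTask", 2000, 3000),
    (3, "StealTask", 3000, 10000),
    (4, "OldToYoungRootsTask", 500, 1500),
    (4, "StealTask", 1500, 10000),
    (6, "StealTask", 1000, 10000),
]

utilization = per_thread_utilization(SAMPLE)
for tid in sorted(utilization):
    print(tid, utilization[tid])
```

In this made-up sample the task counts per thread differ by a factor of
three, yet every thread is busy for at least 90% of the phase. Numbers like
these, taken from a real run, would show whether the uneven task counts
actually cost any pause time.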
> Did anyone know the reason?
> Thanks.
Thanks,
Thomas