[Bug] Load imbalance of Tasks among GCTaskThreads during the Young GC
Tony S
suokunstar at gmail.com
Mon Nov 21 17:15:47 UTC 2016
Hi Thomas,
Thank you for your reply.
On Mon, Nov 21, 2016 at 3:22 AM, Thomas Schatzl
<thomas.schatzl at oracle.com> wrote:
> Hi,
>
> On Fri, 2016-11-18 at 22:54 -0600, Tony S wrote:
>> Hi,
>>
>> [Description]
>> src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
>>
>> PSPromotionManager enqueues various tasks (ThreadRootsTask /
>> StealTask / ScavengeRootsTask / OldToYoungRootsTask) into
>> GCTaskQueue.
>> GCTaskThread asked GCTaskManager to call get_task() to obtain tasks
>> from the GCTaskQueue. However, I found the tasks are not distributed
>> evenly among the GCTaskThreads. Some GCTaskThreads get more tasks
>> than others.
>
> This imbalance could have several reasons:
>
> - startup of some tasks is delayed, e.g. due to scheduling threads(s) X
> gets activated a significant amount of time later than other threads.
>
I am not sure. I tried to bind all the GCTaskThread to different cores
to avoid these threads interfered with each other (like below).
GCTaskManager::initialize() {
...
for (uint t = 0; t < workers(); t += 1) {
set_thread(t, GCTaskThread::create(this, t, processor_assignment[t]));
}
...
}
However, the issue still exists there.
> - the work unit per task is so small, completing some of these tasks is
> so quick that a thread immediately competes for more work with the
> other tasks. Due to scheduling that task might somehow get favorable
> treatment (e.g. OldToYoungRootsTask) by the scheduler (like: it is
> already running and has not used up its time slice).
>
The time execution time for some task is small. That is right. For
example, the below tasks only need hundreds to thousands of
nanoseconds.
OldToYoungRootsTask time:3961 GCTaskThread ID:13
ScavengeRootsTask time:1492 GCTaskThread ID:14
ThreadRootsTask time:995 GCTaskThread ID:14
I do not think it is due to the scheduler. I changed the linux
scheduler to FIFO instead of CFS. Here are the results. Under FIFO,
the OldToYoungRootsTasks still are not distributed evenly. See,
GCTaskThread 0,3,6 execute them.
OldToYoungRootsTask time:26748 GCTaskThread ID:6
OldToYoungRootsTask time:9327 GCTaskThread ID:3
OldToYoungRootsTask time:20358 GCTaskThread ID:0
OldToYoungRootsTask time:3413 GCTaskThread ID:6
OldToYoungRootsTask time:609 GCTaskThread ID:0
OldToYoungRootsTask time:3354 GCTaskThread ID:3
OldToYoungRootsTask time:947 GCTaskThread ID:6
OldToYoungRootsTask time:2007 GCTaskThread ID:0
OldToYoungRootsTask time:861 GCTaskThread ID:6
OldToYoungRootsTask time:3364 GCTaskThread ID:3
OldToYoungRootsTask time:1493 GCTaskThread ID:0
OldToYoungRootsTask time:740 GCTaskThread ID:6
OldToYoungRootsTask time:349 GCTaskThread ID:3
OldToYoungRootsTask time:73 GCTaskThread ID:0
> - there is no 1:1 correspondence between threads and tasks (e.g.
> ScavengeRootsTask), as the work split is done across tasks and not
> threads for these. So they can't be evenly distributed (with larger
> thread counts).
Yes. For ScavengeRootsTask, there is no 1:1 correspondence. But how
about OldToYoungRootsTask and ThreadRootsTask? Their number are either
equal or greater than the number of threads, still not distributed
evenly.
Another potential reason might be the lock on the GCTaskQueue, but I
am not sure right now.
>
>> Here are my trace log:
>>
>>
>> [Tracing Log]
>>
>> OldToYoungRootsTask GCTaskThread ID:4
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:10
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:10
>> OldToYoungRootsTask GCTaskThread ID:3
>> OldToYoungRootsTask GCTaskThread ID:10
>> OldToYoungRootsTask GCTaskThread ID:4
>> OldToYoungRootsTask GCTaskThread ID:10
>> ScavengeRootsTask GCTaskThread ID:4
>> ScavengeRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:10
>> OldToYoungRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:10
>> ThreadRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:3
>> ScavengeRootsTask GCTaskThread ID:3
>> ScavengeRootsTask GCTaskThread ID:3
>> ScavengeRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:10
>> ScavengeRootsTask GCTaskThread ID:3
>> ScavengeRootsTask GCTaskThread ID:3
>> ScavengeRootsTask GCTaskThread ID:3
>> ThreadRootsTask GCTaskThread ID:4
>> ScavengeRootsTask GCTaskThread ID:10
>> StealTask GCTaskThread ID:6
>> StealTask GCTaskThread ID:9
>> StealTask GCTaskThread ID:3
>> StealTask GCTaskThread ID:14
>> StealTask GCTaskThread ID:7
>> StealTask GCTaskThread ID:5
>> StealTask GCTaskThread ID:8
>> StealTask GCTaskThread ID:0
>> StealTask GCTaskThread ID:12
>> StealTask GCTaskThread ID:13
>> StealTask GCTaskThread ID:2
>> StealTask GCTaskThread ID:4
>> StealTask GCTaskThread ID:11
>> StealTask GCTaskThread ID:1
>> StealTask GCTaskThread ID:10
>>
>>
>> I use dacapo benchmark, running lusearch on JDK 1.8 on ubuntu 16.10.
>> I have totally 15 GCTaskThreads on the above test. We can see that
>
> Afair (many of) the DaCapo benchmarks are a bit on the small side for
> modern machines. Anyway, due to various reasons you might even get
> better gc pauses with less threads.
>
>> only StealTasks are evenly executed on different GCTaskThread. The
>> other tasks are executed on few GCTaskThreads, not evenly
>> distributed.
>>
>> When I increase the number of mutator threads and thus have more
>> tasks (ThreadRootsTask / StealTask / ScavengeRootsTask /
>> OldToYoungRootsTask), however, those tasks are still not distributed
>> evenly among the GCTaskThreads, which causes the load imbalance and
>> thus prolongs the Young GC time.
>
> The trace above does not include start/end times of the threads and
> tasks so it is hard to diagnose any actual problem, its source, or if
> there is actually one. I.e. StealTasks of different threads might take
> a different amount of time, so they may actually finish at
> approximately the same time anyway? Did you measure how long the
> threads are actually waiting for work (or completion of other threads)?
>
I add the start and end timestamp in the JDK to measure the execution
time of different task, here are the results(using ubuntu 16.10,
scheduler: default CFS, machine: dell T430, benchmark dapaco
lusearch):
OldToYoungRootsTask time:19213 GCTaskThread ID:10
OldToYoungRootsTask time:5680 GCTaskThread ID:13
OldToYoungRootsTask time:13489 GCTaskThread ID:14
OldToYoungRootsTask time:944 GCTaskThread ID:14
OldToYoungRootsTask time:821 GCTaskThread ID:14
OldToYoungRootsTask time:6227 GCTaskThread ID:13
OldToYoungRootsTask time:768 GCTaskThread ID:14
OldToYoungRootsTask time:2729 GCTaskThread ID:13
OldToYoungRootsTask time:745 GCTaskThread ID:13
OldToYoungRootsTask time:4095 GCTaskThread ID:14
OldToYoungRootsTask time:774 GCTaskThread ID:13
OldToYoungRootsTask time:828 GCTaskThread ID:14
OldToYoungRootsTask time:384 GCTaskThread ID:14
ScavengeRootsTask time:1248 GCTaskThread ID:14
OldToYoungRootsTask time:3961 GCTaskThread ID:13
ScavengeRootsTask time:1492 GCTaskThread ID:14
ThreadRootsTask time:995 GCTaskThread ID:14
ThreadRootsTask time:590 GCTaskThread ID:14
ThreadRootsTask time:437 GCTaskThread ID:14
ThreadRootsTask time:1341 GCTaskThread ID:2
ThreadRootsTask time:377 GCTaskThread ID:14
ThreadRootsTask time:310 GCTaskThread ID:2
ThreadRootsTask time:431 GCTaskThread ID:14
ThreadRootsTask time:376 GCTaskThread ID:2
ThreadRootsTask time:277 GCTaskThread ID:2
ThreadRootsTask time:400 GCTaskThread ID:2
ThreadRootsTask time:3354 GCTaskThread ID:14
ThreadRootsTask time:951 GCTaskThread ID:2
ThreadRootsTask time:584 GCTaskThread ID:14
ThreadRootsTask time:1592 GCTaskThread ID:8
OldToYoungRootsTask time:50418 GCTaskThread ID:10
ThreadRootsTask time:1255 GCTaskThread ID:8
ScavengeRootsTask time:1836 GCTaskThread ID:8
ScavengeRootsTask time:1735 GCTaskThread ID:8
ScavengeRootsTask time:1128 GCTaskThread ID:8
ThreadRootsTask time:14132 GCTaskThread ID:2
ThreadRootsTask time:16539 GCTaskThread ID:14
ScavengeRootsTask time:510 GCTaskThread ID:14
ScavengeRootsTask time:6407 GCTaskThread ID:8
StealTask time:3160638 GCTaskThread ID:1
StealTask time:3929547 GCTaskThread ID:13
StealTask time:3359946 GCTaskThread ID:4
StealTask time:3469719 GCTaskThread ID:2
StealTask time:3216089 GCTaskThread ID:11
StealTask time:3863544 GCTaskThread ID:14
StealTask time:3465927 GCTaskThread ID:9
StealTask time:3507085 GCTaskThread ID:3
StealTask time:3624188 GCTaskThread ID:12
StealTask time:3704221 GCTaskThread ID:7
StealTask time:3565777 GCTaskThread ID:6
StealTask time:3204696 GCTaskThread ID:0
StealTask time:4006233 GCTaskThread ID:8
StealTask time:3997742 GCTaskThread ID:10
StealTask time:3530225 GCTaskThread ID:5
Also, I found the StealTask dominates the total time, contributes the
most part of the young GC. However, not all the StealTask really needs
so much time. If one of the StealTask takes much time, many other
StealTask will be prolonged. This is another issue.
>> Did anyone know the reason?
>> Thanks.
>
> Thanks,
> Thomas
--
More information about the hotspot-gc-dev
mailing list