RFR(M): 8205921: Optimizing best-of-2 work stealing queue selection

Thomas Schatzl thomas.schatzl at oracle.com
Wed Jul 25 15:09:19 UTC 2018


Hi all,

  could I have reviews for this change to work stealing, initially
worked on by Zhengyu from Red Hat [0], that significantly decreases the
number of unsuccessful steal attempts?

After some initial reviews we agreed that I would continue working on
it, mostly cleaning up the code and doing some more testing.

While the change looks a lot bigger now, most of the changes are due
to removing the need to pass in the "seed" parameter from all
collectors.

The change is based on one of the ideas presented in a recent paper
[1], where it has been shown that work stealing in the task queues
should be biased towards queues a thread already successfully stole
from, as this would speed up GC pauses (and program execution)
significantly.
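
The biased selection described above can be sketched roughly as follows.
This is only an illustrative Python model, not the code in the webrev:
the names Worker, last_stolen_from and steal_best_of_2 are made up for
this sketch, and it assumes at least three queues and a per-worker
random number generator (so no "seed" has to be passed in by the
caller).

```python
import random
from collections import deque


class Worker:
    """One worker with its own task queue and a remembered victim (hypothetical model)."""

    def __init__(self, worker_id, seed):
        self.worker_id = worker_id
        self.queue = deque()
        self.last_stolen_from = None      # None means "no valid hint"
        self.rng = random.Random(seed)    # per-worker RNG, owned by the worker


def steal_best_of_2(workers, thief_id):
    """Try to steal one task for workers[thief_id]; return the task or None."""
    n = len(workers)
    if n < 3:
        raise ValueError("this sketch only covers three or more queues")
    thief = workers[thief_id]

    # First candidate: the queue we last stole from successfully,
    # otherwise a random queue other than our own.
    k1 = thief.last_stolen_from
    if k1 is None:
        k1 = thief_id
        while k1 == thief_id:
            k1 = thief.rng.randrange(n)

    # Second candidate: a random queue different from our own and from k1.
    k2 = thief_id
    while k2 == thief_id or k2 == k1:
        k2 = thief.rng.randrange(n)

    # Sample both sizes and try the longer queue (ties go to k1).
    size1, size2 = len(workers[k1].queue), len(workers[k2].queue)
    victim = k2 if size2 > size1 else k1
    if not workers[victim].queue:
        thief.last_stolen_from = None     # failed attempt invalidates the hint
        return None

    task = workers[victim].queue.popleft()  # thieves take from the left end
    thief.last_stolen_from = victim         # bias the next attempt to this queue
    return task
```

A successful steal keeps the victim as the first candidate for the next
attempt, and an unsuccessful one drops the hint so that the next attempt
starts from two random candidates again.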

My measurements showed that the change significantly decreases the
number of overall steal attempts (not necessarily termination time) and
increases the relative number of successful ones.

Contrary to the paper, there is no clearly observable (statistically
significant) pause time improvement. There may be several reasons for
this:

- the paper mostly measures total execution time, not pause times which
I am mostly interested in. :)

- while the number of steal attempts decreases significantly with that
change, this number is typically dwarfed by the actual pushes and pops
on all applications that actually do work - and on others you might
want to simply use fewer GC threads.
At least in steady state, the work in the majority of applications I
tried seems to be fairly well balanced in default configurations, so
there is not much stealing going on compared to other work.

Dacapo is an outlier because there is typically not much work to do at
all, e.g. I measured about 1000(!) objects in total pushed on the task
queue per GC in Parallel GC on lusearch (with 40 worker threads, which
is higher than used in the paper).

- the GC statistics implementation enabled by TASKQUEUE_STATS may be
buggy or incomplete.

- the test setup in the paper may not have been specified well enough.
I used -XX:ParallelGCThreads=15 -Xmx<paper-value> -Xms<paper-value>.

- the Dacapo benchmarks may be of limited use for measuring GC pause
times: pauses take 1-2ms on average on roughly the same machine as in
the paper with the suggested heap sizes, and a significant part of that
time seems to be spent on work completely unrelated to task queues.
I.e. with G1 you can actually measure "Object Copy", which includes
actual copying work next to stealing. I get about 10% of total pause
time spent there for e.g. lusearch (in the baseline).
The paper suggests ~10% smaller total _execution_ time with only these
task queue improvements (Fig 10a), which seems very hard to achieve
knowing that.

- while I did most of my measurements with JDK11 and G1, some very
brief tests on Parallel GC and JDK8 did not show much difference.
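
The arithmetic behind the "Object Copy" argument above can be written
down as a simple upper bound. This is a back-of-the-envelope sketch
only: the function name and the pause share of total runtime are
assumptions made for illustration, not measured values; only the ~10%
"Object Copy" share comes from the numbers above.

```python
def max_execution_time_reduction(pause_share_of_runtime, phase_share_of_pause):
    """Upper bound on the relative total runtime saved if one pause phase
    (here: everything counted as "Object Copy") took no time at all."""
    for share in (pause_share_of_runtime, phase_share_of_pause):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must be fractions between 0 and 1")
    return pause_share_of_runtime * phase_share_of_pause


# Hypothetical: GC pauses make up half of the runtime, and "Object Copy"
# is 10% of the pause time.
bound = max_execution_time_reduction(0.5, 0.10)
```

Under this model, even if pauses were the whole runtime and all of
"Object Copy" were stealing, the saving could not exceed the 10% share
of that phase.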

However, overall I think this change is useful. :)

I also spent some time on the second enhancement, i.e. limiting the
number of steal attempts per steal round based on the number of active
queues, which is attached to the CR.

That one did not show any further measurable improvement in the number
of steal attempts (within statistical significance), which makes sense
if you consider that we do not spend a lot of time in stealing anyway,
and the reduction of the number of threads due to that is very rare (in
my tests).
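
For illustration, limiting the attempts per steal round by the number of
active queues could look like the sketch below. It is not the patch
attached to the CR: the function names and the factor of two attempts
per active queue are assumptions for this example, and try_steal stands
in for whatever single steal attempt the collector performs.

```python
def steal_attempt_limit(active_queues, attempts_per_queue=2):
    """Attempts allowed in one steal round (factor of 2 is an assumption)."""
    return attempts_per_queue * max(active_queues, 0)


def steal_round(try_steal, active_queues):
    """Call try_steal() until it returns a task or the limit is reached."""
    for _ in range(steal_attempt_limit(active_queues)):
        task = try_steal()
        if task is not None:
            return task
    return None  # give up for this round, e.g. move on towards termination
```

With few active queues a round ends after only a few failed attempts
instead of a number proportional to the total number of queues.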

The paper did not give a breakdown either.

The changeset also credits Zhengyu for his work.

CR:
https://bugs.openjdk.java.net/browse/JDK-8205921
Webrev:
http://cr.openjdk.java.net/~tschatzl/8205921/webrev/
Testing:
hs-tier1-4

Thanks,
  Thomas

P.S: sorry for taking so long. Actually the changes were lying around
for some time locally...

[0] http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2018-June/022556.html
[1] Characterizing and Optimizing Hotspot Parallel Garbage Collection
on Multicore Systems http://ranger.uta.edu/~jrao/papers/EuroSys18.pdf