New subject: RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space

9 Sep 2020

      On Wed, 9 Sep 2020 07:24:27 GMT, Kim Barrett <kbarrett@openjdk.org> wrote:
...
This is rework of initial change from before the transition to git.
The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in
the queues for any given array.  The original change just split up an
array into N tasks that were all enqueued at once.  But for a large
array this could be a lot of tasks, leading to significant and
unnecessary queue expansion.  Instead we now limit the number of tasks
for a given array based on the number of workers, and only gradually
add tasks up to that limit.  This gives other threads an opportunity
to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so
this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed.  It now
affirmatively breaks Project Valhalla, rather than quietly doing
something that probably isn't what is actually wanted.  I'll discuss
with them so there is a plan for dealing with it when they take this
update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able
to reproduce the small specjbb2015 critical-jops improvement
previously seen on one platform.  My suspicion is that improvement was
a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation
RFR: 8027761: Investigate fast-path for scanning only objects with references during gc
RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's
evacuation copying, in order to improve the hot paths and improve array
handling. This change addresses several closely co-located enhancement
requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied
once, up front, for use in multiple places. This avoids fetching (including
re-decoding when compressed) the klass multiple times. This addresses part
of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it
after the special array cases have been dealt with, since strings are not
arrays.  (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they
contain no oops that need to be processed.  This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up
front, rather than processing the current chunk after pushing a single task
for the remaining work.  This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in
G1ParScanThreadState member variables. This addresses part of part of
JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling
of arrays should be updated for that project, which introduces new array
types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was
deleted long ago (JDK-8035330, JDK 9).
CR:
https://bugs.openjdk.java.net/browse/JDK-8158045
https://bugs.openjdk.java.net/browse/JDK-8027761
https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev:
https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing:
tier1-3
performance testing - seems to be at worst performance neutral, with a
statistically significant 3% improvement in specjbb2015 critical-jops seen
on one platform.
Overall a very nice improvement. Some comments and ideas below.

src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 84:
...
82:   bool _old_gen_is_full;
83:   // Size (in elements) of a partial objArray task chunk.
84:   int _partial_objarray_chunk_size;
I would prefer to skip the "obj"-part here to have more consistent naming or, as mentioned above, include it in the
stepper instead.

src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 165:
...
163: private:
164:   inline void do_partial_array(PartialArrayScanTask task);
165:   inline void start_partial_objArray(G1HeapRegionAttr dest_dir, oop from, oop to);
Same here, drop "obj" for consistent naming and avoiding the camel-case.

src/hotspot/share/gc/shared/partialArrayTaskStepper.cpp line 34:
...
32:   // that maximizes available parallelism.
33:   return n_workers;
34: }
In preparation for a more advanced logic for the limit, or why not just use the input at the call-site?

src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 71:
...
69:   // The from-space object contains the real length.
70:   int length = from->length();
71:   assert(start < length, "invariant: start %d, length %d", start, length);
Just so I'm not missing anything. There will never be more tasks than chunks left, so there is no risk that we have a
race for the last chunk and thus hit this assertion?

src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 68:
...
66:     _old_gen_is_full(false),
67:     _partial_objarray_chunk_size(ParGCArrayScanChunk),
68:     _partial_array_stepper(n_workers),
What do you think about saving the chunk size in the stepper instead? Then we don't need to pass it in to `start()` and
`next()`. To avoid needing it for the call to `oop_iterate_range()` we could instead have the `Step` include both the
start and end index.

src/hotspot/share/gc/shared/partialArrayTaskStepper.hpp line 57:
...
55:     int _index;                 // Array index for the step.
56:     uint _ncreate;              // Number of new tasks to create.
57:   };
I wouldn't mind having getters for those, but it's not a hard request :)

src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 103:
...
101:   uint ncreate = MIN2(_task_fannout, MIN2(remaining_tasks, _task_limit + 1) - pending);
102:   Step result = { start, ncreate };
103:   return result;
Similar to my comment above, what do you think about trying to write some test for this to verify we never get to many
"tasks".

-------------

Changes requested by sjohanss (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/90

Re: RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space

Stefan Johansson

Kim Barrett

Kim Barrett

Kim Barrett

Kim Barrett

Kim Barrett

Stefan Johansson

Kim Barrett

Kim Barrett

Kim Barrett

tags

participants (3)