RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space

Stefan Johansson sjohanss at openjdk.java.net
Wed Sep 9 18:37:45 UTC 2020


On Wed, 9 Sep 2020 07:24:27 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> This is rework of initial change from before the transition to git.
> The initial RFR email is attached below.
> 
> The primary change is to limit the number of partial array tasks in
> the queues for any given array.  The original change just split up an
> array into N tasks that were all enqueued at once.  But for a large
> array this could be a lot of tasks, leading to significant and
> unnecessary queue expansion.  Instead we now limit the number of tasks
> for a given array based on the number of workers, and only gradually
> add tasks up to that limit.  This gives other threads an opportunity
> to steal such tasks, while still keeping queue growth under control.
> 
> Most of the calculation for this is handled by a new helper class, so
> this can later be shared with ParallelGC.
> 
> The dispatch on array klass type for has also been changed.  It now
> affirmatively breaks Project Valhalla, rather than quietly doing
> something that probably isn't what is actually wanted.  I'll discuss
> with them so there is a plan for dealing with it when they take this
> update.
> 
> Ran tier1-6 in mach5 and some local stress tests.
> 
> Performance testing was unchanged from previous, except I wasn't able
> to reproduce the small specjbb2015 critical-jops improvement
> previously seen on one platform.  My suspicion is that improvement was
> a statistical abberation.
> 
> --- Initial RFR email ---
> 
> RFR: 8158045: Improve large object handling during evacuation
> RFR: 8027761: Investigate fast-path for scanning only objects with references during gc
> RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
> 
> Please review this change to type dispatching and handling in G1's
> evacuation copying, in order to improve the hot paths and improve array
> handling. This change addresses several closely co-located enhancement
> requests; it seemed difficult to split them up in a sensible way.
> 
> do_copy_to_survivor_space now gets the klass of the object being copied
> once, up front, for use in multiple places. This avoids fetching (including
> re-decoding when compressed) the klass multiple times. This addresses part
> of JDK-8027545.
> 
> Moved check for and handling of string deduplication later, only applying it
> after the special array cases have been dealt with, since strings are not
> arrays.  (They are header objects pointing to an array of character values.)
> 
> Special case typeArray, doing nothing other than the copy, since they
> contain no oops that need to be processed.  This addresses JDK-8027761.
> 
> Changed handling of objArray, pushing all of the partial array tasks up
> front, rather than processing the current chunk after pushing a single task
> for the remaining work.  This addresses JDK-8158045.
> 
> As part of these, cached some more frequently accessed values in
> G1ParScanThreadState member variables. This addresses part of part of
> JDK-8027545.
> 
> While both the old and new code will work for Project Valhalla, the handling
> of arrays should be updated for that project, which introduces new array
> types.
> 
> Deleted a lingering reference to G1ParScanPartialArrayClosure that was
> deleted long ago (JDK-8035330, JDK 9).
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8158045
> https://bugs.openjdk.java.net/browse/JDK-8027761
> https://bugs.openjdk.java.net/browse/JDK-8027545
> 
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
> 
> Testing:
> tier1-3
> 
> performance testing - seems to be at worst performance neutral, with a
> statistically significant 3% improvement in specjbb2015 critical-jops seen
> on one platform.

Overall a very nice improvement. Some comments and ideas below.

src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 84:

> 82:   bool _old_gen_is_full;
> 83:   // Size (in elements) of a partial objArray task chunk.
> 84:   int _partial_objarray_chunk_size;

I would prefer to skip the "obj"-part here to have more consistent naming or, as mentioned above, include it in the
stepper instead.

src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 165:

> 163: private:
> 164:   inline void do_partial_array(PartialArrayScanTask task);
> 165:   inline void start_partial_objArray(G1HeapRegionAttr dest_dir, oop from, oop to);

Same here, drop "obj" for consistent naming and avoiding the camel-case.

src/hotspot/share/gc/shared/partialArrayTaskStepper.cpp line 34:

> 32:   // that maximizes available parallelism.
> 33:   return n_workers;
> 34: }

In preparation for a more advanced logic for the limit, or why not just use the input at the call-site?

src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 71:

> 69:   // The from-space object contains the real length.
> 70:   int length = from->length();
> 71:   assert(start < length, "invariant: start %d, length %d", start, length);

Just so I'm not missing anything. There will never be more tasks than chunks left, so there is no risk that we have a
race for the last chunk and thus hit this assertion?

src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 68:

> 66:     _old_gen_is_full(false),
> 67:     _partial_objarray_chunk_size(ParGCArrayScanChunk),
> 68:     _partial_array_stepper(n_workers),

What do you think about saving the chunk size in the stepper instead? Then we don't need to pass it in to `start()` and
`next()`. To avoid needing it for the call to `oop_iterate_range()` we could instead have the `Step` include both the
start and end index.

src/hotspot/share/gc/shared/partialArrayTaskStepper.hpp line 57:

> 55:     int _index;                 // Array index for the step.
> 56:     uint _ncreate;              // Number of new tasks to create.
> 57:   };

I wouldn't mind having getters for those, but it's not a hard request :)

src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 103:

> 101:   uint ncreate = MIN2(_task_fannout, MIN2(remaining_tasks, _task_limit + 1) - pending);
> 102:   Step result = { start, ncreate };
> 103:   return result;

Similar to my comment above, what do you think about trying to write some test for this to verify we never get to many
"tasks".

-------------

Changes requested by sjohanss (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/90



More information about the hotspot-gc-dev mailing list