On Wed, 9 Sep 2020 18:34:45 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 103:
101: uint ncreate = MIN2(_task_fannout, MIN2(remaining_tasks, _task_limit + 1) - pending); 102: Step result = { start, ncreate }; 103: return result;
Similar to my comment above, what do you think about trying to write some test for this to verify we never get to many "tasks".
The asserts in the stepper's next() after the increment of to's length verify that we haven't overrun. I haven't thought of a way to verify the algorithm doesn't generate too few tasks though. Well, other than getting crashes because some array elements didn't get processed. But maybe you mean unit tests? I will try to write some; I should have done so already. ------------- PR: https://git.openjdk.java.net/jdk/pull/90