Re: RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
On Wed, 9 Sep 2020 07:24:27 GMT, Kim Barrett <kbarrett@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
Overall a very nice improvement. Some comments and ideas below. src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 84:
82: bool _old_gen_is_full; 83: // Size (in elements) of a partial objArray task chunk. 84: int _partial_objarray_chunk_size;
I would prefer to skip the "obj"-part here to have more consistent naming or, as mentioned above, include it in the stepper instead. src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 165:
163: private: 164: inline void do_partial_array(PartialArrayScanTask task); 165: inline void start_partial_objArray(G1HeapRegionAttr dest_dir, oop from, oop to);
Same here, drop "obj" for consistent naming and avoiding the camel-case. src/hotspot/share/gc/shared/partialArrayTaskStepper.cpp line 34:
32: // that maximizes available parallelism. 33: return n_workers; 34: }
In preparation for a more advanced logic for the limit, or why not just use the input at the call-site? src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 71:
69: // The from-space object contains the real length. 70: int length = from->length(); 71: assert(start < length, "invariant: start %d, length %d", start, length);
Just so I'm not missing anything. There will never be more tasks than chunks left, so there is no risk that we have a race for the last chunk and thus hit this assertion? src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 68:
66: _old_gen_is_full(false), 67: _partial_objarray_chunk_size(ParGCArrayScanChunk), 68: _partial_array_stepper(n_workers),
What do you think about saving the chunk size in the stepper instead? Then we don't need to pass it in to `start()` and `next()`. To avoid needing it for the call to `oop_iterate_range()` we could instead have the `Step` include both the start and end index. src/hotspot/share/gc/shared/partialArrayTaskStepper.hpp line 57:
55: int _index; // Array index for the step. 56: uint _ncreate; // Number of new tasks to create. 57: };
I wouldn't mind having getters for those, but it's not a hard request :) src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 103:
101: uint ncreate = MIN2(_task_fannout, MIN2(remaining_tasks, _task_limit + 1) - pending); 102: Step result = { start, ncreate }; 103: return result;
Similar to my comment above, what do you think about trying to write some test for this to verify we never get to many "tasks". ------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/90
On Wed, 9 Sep 2020 12:07:38 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 84:
82: bool _old_gen_is_full; 83: // Size (in elements) of a partial objArray task chunk. 84: int _partial_objarray_chunk_size;
I would prefer to skip the "obj"-part here to have more consistent naming or, as mentioned above, include it in the stepper instead.
Some of the naming and factoring I've done is forward looking. I think we should consider splitting the copying part as well as the scanning, and chunking the copy of typeArrays. For example, JDK-8031565 suggests copying large typeArrays as part of termination waiting; I think splitting them into partial array tasks to use the normal parallelism in the framework is better than some new side channel. The chunk size for that should probably be substantially larger (and depend on the element size) than for objArray. Also, Project Valhalla is going to add new kinds of arrays that are neither objArray nor typeArray. We'll want to split them too. The same splitting calculations can apply, even though the chunk size may be different (and probably depends on the element klass). ------------- PR: https://git.openjdk.java.net/jdk/pull/90
On Wed, 9 Sep 2020 12:14:31 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 165:
163: private: 164: inline void do_partial_array(PartialArrayScanTask task); 165: inline void start_partial_objArray(G1HeapRegionAttr dest_dir, oop from, oop to);
Same here, drop "obj" for consistent naming and avoiding the camel-case.
See response above about the name of _partial_objarray_chunk_size. But I should probably be consistent about objarray vs objArray. Since objArray is what it is over in runtime oop-land I'm going to go with that. ------------- PR: https://git.openjdk.java.net/jdk/pull/90
On Wed, 9 Sep 2020 12:19:34 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
src/hotspot/share/gc/shared/partialArrayTaskStepper.cpp line 34:
32: // that maximizes available parallelism. 33: return n_workers; 34: }
In preparation for a more advanced logic for the limit, or why not just use the input at the call-site?
Being noncommittal about whether something more "clever" could or should be done.
src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 71:
69: // The from-space object contains the real length. 70: int length = from->length(); 71: assert(start < length, "invariant: start %d, length %d", start, length);
Just so I'm not missing anything. There will never be more tasks than chunks left, so there is no risk that we have a race for the last chunk and thus hit this assertion?
That's correct. Unless there's a bug in the analysis or implementation I haven't found. That's why it's an invariant. ------------- PR: https://git.openjdk.java.net/jdk/pull/90
On Wed, 9 Sep 2020 12:49:56 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 68:
66: _old_gen_is_full(false), 67: _partial_objarray_chunk_size(ParGCArrayScanChunk), 68: _partial_array_stepper(n_workers),
What do you think about saving the chunk size in the stepper instead? Then we don't need to pass it in to `start()` and `next()`. To avoid needing it for the call to `oop_iterate_range()` we could instead have the `Step` include both the start and end index.
See above discussion about naming and factoring. The same stepper can support multiple array types if the chunk size is external to the stepper. ------------- PR: https://git.openjdk.java.net/jdk/pull/90
On Sep 10, 2020, at 10:08 AM, Kim Barrett <kbarrett@openjdk.java.net> wrote:
On Wed, 9 Sep 2020 12:49:56 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 68:
66: _old_gen_is_full(false), 67: _partial_objarray_chunk_size(ParGCArrayScanChunk), 68: _partial_array_stepper(n_workers),
What do you think about saving the chunk size in the stepper instead? Then we don't need to pass it in to `start()` and `next()`. To avoid needing it for the call to `oop_iterate_range()` we could instead have the `Step` include both the start and end index.
See above discussion about naming and factoring. The same stepper can support multiple array types if the chunk size is external to the stepper.
Well, that discussion is only “above" in the github UI tool. Now that I see what happens when I reply there, maybe I won’t do that so much. That’s seriously chatty and massively quoting redundant.
On Thu, 10 Sep 2020 14:05:25 GMT, Kim Barrett <kbarrett@openjdk.org> wrote:
src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 68:
66: _old_gen_is_full(false), 67: _partial_objarray_chunk_size(ParGCArrayScanChunk), 68: _partial_array_stepper(n_workers),
What do you think about saving the chunk size in the stepper instead? Then we don't need to pass it in to `start()` and `next()`. To avoid needing it for the call to `oop_iterate_range()` we could instead have the `Step` include both the start and end index.
See above discussion about naming and factoring. The same stepper can support multiple array types if the chunk size is external to the stepper.
It can, but I think I would prefer multiple "steppers" for that case. This is of course just a matter of taste and I'm fine with leaving the chunk size external.
src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 84:
82: bool _old_gen_is_full; 83: // Size (in elements) of a partial objArray task chunk. 84: int _partial_objarray_chunk_size;
I would prefer to skip the "obj"-part here to have more consistent naming or, as mentioned above, include it in the stepper instead.
Some of the naming and factoring I've done is forward looking.
I think we should consider splitting the copying part as well as the scanning, and chunking the copy of typeArrays. For example, JDK-8031565 suggests copying large typeArrays as part of termination waiting; I think splitting them into partial array tasks to use the normal parallelism in the framework is better than some new side channel. The chunk size for that should probably be substantially larger (and depend on the element size) than for objArray. Also, Project Valhalla is going to add new kinds of arrays that are neither objArray nor typeArray. We'll want to split them too. The same splitting calculations can apply, even though the chunk size may be different (and probably depends on the element klass).
Ok, keeping the "obj"-part sounds reasonable.
src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 165:
163: private: 164: inline void do_partial_array(PartialArrayScanTask task); 165: inline void start_partial_objArray(G1HeapRegionAttr dest_dir, oop from, oop to);
Same here, drop "obj" for consistent naming and avoiding the camel-case.
See response above about the name of _partial_objarray_chunk_size. But I should probably be consistent about objarray vs objArray. Since objArray is what it is over in runtime oop-land I'm going to go with that.
Again, we have different taste. We very seldom go with camel-case in members and especially in the middle it looks strange. Looking through the GC-code I find mostly `objarray` or `obj_array`
src/hotspot/share/gc/shared/partialArrayTaskStepper.cpp line 34:
32: // that maximizes available parallelism. 33: return n_workers; 34: }
In preparation for a more advanced logic for the limit, or why not just use the input at the call-site?
Being noncommittal about whether something more "clever" could or should be done.
👍
src/hotspot/share/gc/shared/partialArrayTaskStepper.hpp line 57:
55: int _index; // Array index for the step. 56: uint _ncreate; // Number of new tasks to create. 57: };
I wouldn't mind having getters for those, but it's not a hard request :)
Step is intended to be a trivial data carrier. A (HotSpot) Pair or std::pair (if we used stdlib) would do, except I like named data items.
I prefer them named as well 👍
src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 103:
101: uint ncreate = MIN2(_task_fannout, MIN2(remaining_tasks, _task_limit + 1) - pending); 102: Step result = { start, ncreate }; 103: return result;
Similar to my comment above, what do you think about trying to write some test for this to verify we never get to many "tasks".
The asserts in the stepper's next() after the increment of to's length verify that we haven't overrun. I haven't thought of a way to verify the algorithm doesn't generate too few tasks though. Well, other than getting crashes because some array elements didn't get processed. But maybe you mean unit tests? I will try to write some; I should have done so already.
Should have been a bit more clear. I'm talking about unit tests, so if you plan writing some that's great. ------------- PR: https://git.openjdk.java.net/jdk/pull/90
On Thu, 10 Sep 2020 21:42:01 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
See response above about the name of _partial_objarray_chunk_size. But I should probably be consistent about objarray vs objArray. Since objArray is what it is over in runtime oop-land I'm going to go with that.
Again, we have different taste. We very seldom go with camel-case in members and especially in the middle it looks strange. Looking through the GC-code I find mostly `objarray` or `obj_array`
You are right; in gc code, it seems "objarray" is used almost exclusively. The one exception I found seems to be around the G1CMObjArrayProcessor, where some camel-case has snuck in, such as G1ConcurrentMark::_objArray_processor. So I'll use "objarray". ------------- PR: https://git.openjdk.java.net/jdk/pull/90
On Wed, 9 Sep 2020 12:55:12 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
src/hotspot/share/gc/shared/partialArrayTaskStepper.hpp line 57:
55: int _index; // Array index for the step. 56: uint _ncreate; // Number of new tasks to create. 57: };
I wouldn't mind having getters for those, but it's not a hard request :)
Step is intended to be a trivial data carrier. A (HotSpot) Pair or std::pair (if we used stdlib) would do, except I like named data items. ------------- PR: https://git.openjdk.java.net/jdk/pull/90
On Wed, 9 Sep 2020 18:34:45 GMT, Stefan Johansson <sjohanss@openjdk.org> wrote:
This is rework of initial change from before the transition to git. The initial RFR email is attached below.
The primary change is to limit the number of partial array tasks in the queues for any given array. The original change just split up an array into N tasks that were all enqueued at once. But for a large array this could be a lot of tasks, leading to significant and unnecessary queue expansion. Instead we now limit the number of tasks for a given array based on the number of workers, and only gradually add tasks up to that limit. This gives other threads an opportunity to steal such tasks, while still keeping queue growth under control.
Most of the calculation for this is handled by a new helper class, so this can later be shared with ParallelGC.
The dispatch on array klass type for has also been changed. It now affirmatively breaks Project Valhalla, rather than quietly doing something that probably isn't what is actually wanted. I'll discuss with them so there is a plan for dealing with it when they take this update.
Ran tier1-6 in mach5 and some local stress tests.
Performance testing was unchanged from previous, except I wasn't able to reproduce the small specjbb2015 critical-jops improvement previously seen on one platform. My suspicion is that improvement was a statistical abberation.
--- Initial RFR email ---
RFR: 8158045: Improve large object handling during evacuation RFR: 8027761: Investigate fast-path for scanning only objects with references during gc RFR: 8027545: Improve object array chunking test in G1's copy_to_survivor_space
Please review this change to type dispatching and handling in G1's evacuation copying, in order to improve the hot paths and improve array handling. This change addresses several closely co-located enhancement requests; it seemed difficult to split them up in a sensible way.
do_copy_to_survivor_space now gets the klass of the object being copied once, up front, for use in multiple places. This avoids fetching (including re-decoding when compressed) the klass multiple times. This addresses part of JDK-8027545.
Moved check for and handling of string deduplication later, only applying it after the special array cases have been dealt with, since strings are not arrays. (They are header objects pointing to an array of character values.)
Special case typeArray, doing nothing other than the copy, since they contain no oops that need to be processed. This addresses JDK-8027761.
Changed handling of objArray, pushing all of the partial array tasks up front, rather than processing the current chunk after pushing a single task for the remaining work. This addresses JDK-8158045.
As part of these, cached some more frequently accessed values in G1ParScanThreadState member variables. This addresses part of part of JDK-8027545.
While both the old and new code will work for Project Valhalla, the handling of arrays should be updated for that project, which introduces new array types.
Deleted a lingering reference to G1ParScanPartialArrayClosure that was deleted long ago (JDK-8035330, JDK 9).
CR: https://bugs.openjdk.java.net/browse/JDK-8158045 https://bugs.openjdk.java.net/browse/JDK-8027761 https://bugs.openjdk.java.net/browse/JDK-8027545
Webrev: https://cr.openjdk.java.net/~kbarrett/8158045/open.00/
Testing: tier1-3
performance testing - seems to be at worst performance neutral, with a statistically significant 3% improvement in specjbb2015 critical-jops seen on one platform.
src/hotspot/share/gc/shared/partialArrayTaskStepper.inline.hpp line 103:
101: uint ncreate = MIN2(_task_fannout, MIN2(remaining_tasks, _task_limit + 1) - pending); 102: Step result = { start, ncreate }; 103: return result;
Similar to my comment above, what do you think about trying to write some test for this to verify we never get to many "tasks".
The asserts in the stepper's next() after the increment of to's length verify that we haven't overrun. I haven't thought of a way to verify the algorithm doesn't generate too few tasks though. Well, other than getting crashes because some array elements didn't get processed. But maybe you mean unit tests? I will try to write some; I should have done so already. ------------- PR: https://git.openjdk.java.net/jdk/pull/90
participants (3)
-
Kim Barrett
-
Kim Barrett
-
Stefan Johansson