RFR: 8299703: GenShen: improvements in card scanning [v7]
Y. Srinivas Ramakrishna
ysr at openjdk.org
Thu Feb 2 01:01:55 UTC 2023
> **Main changes:**
> 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs.
> 2. The ShenandoahCardCluster class has been extended with a `block_start()` method, which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above.
> 3. The ShenandoahCardCluster class's `has_object()` method has been renamed to `starts_object()`, which more closely reflects the API.
> 4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket).
> 5. Added some const annotations.
>
> **Testing & Implementation Notes:**
> 6. Tested with Extremem and SPECjbb, in fastdebug, release, and product builds, with and without verification enabled.
> 7. Preliminary performance data with an Extremem workload showed a roughly 17-18% reduction in wall-clock duration of concurrent remembered set scanning across most of the distribution (p0, p25, p50, p75); p100 (max) was down only marginally, by 2%. This trend is as expected, since the gains are lost when dirty and clean cards alternate frequently in short runs.
> 8. More performance data was gathered with SPECjbb and several different Extremem workloads, covering both phases that use the `process_clusters()` code. See https://github.com/openjdk/shenandoah/pull/193#issuecomment-1405191124 below.
>
> **Acknowledgments**:
> 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me.
>
> **Epilogue**:
> 10. Further performance improvements are possible, but are deferred for follow-up.
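To illustrate item 1 of the main changes, here is a minimal standalone sketch of the idea of finding maximal runs of dirty cards while skipping runs of clean cards. It is not the code from the PR: the byte encoding (`kClean`, `kDirty`) and the helper name `dirty_runs` are hypothetical, and the real `process_clusters()` additionally has to deal with object boundaries, for example via `block_start()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical card states; the real card table encoding may differ.
constexpr uint8_t kClean = 0;
constexpr uint8_t kDirty = 1;

// Returns the half-open index ranges [first, last) of every maximal run
// of dirty cards. Clean runs are skipped without being recorded.
std::vector<std::pair<size_t, size_t>> dirty_runs(const std::vector<uint8_t>& cards) {
  std::vector<std::pair<size_t, size_t>> runs;
  const size_t n = cards.size();
  size_t i = 0;
  while (i < n) {
    // Skip over a contiguous range of clean cards.
    while (i < n && cards[i] != kDirty) {
      ++i;
    }
    if (i == n) {
      break;
    }
    // Extend over the contiguous range of dirty cards starting here.
    const size_t start = i;
    while (i < n && cards[i] == kDirty) {
      ++i;
    }
    runs.emplace_back(start, i);
  }
  return runs;
}
```

Each returned range can then be processed in one step instead of card by card, which is where any saving would come from when dirty runs are long.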
Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits:
- Merge branch 'master' into rs_scan
- guarantees to asserts (non-production), and warnings (production);
review feedback.
- jcheck: tab
- Merge branch 'master' into rs_scan
- a const, some assertions, and avoid redundant scans for on-objArrays
that straddle across card clusters (sic).
- A couple of guarantees to catch a pesky assert that's occasionally
triggering.
- Merge branch 'master' into rs_scan
- More const safety, some asserts, some comments.
- Change type of loop variable to signed to allow correct termination for the case when start_card_index is 0. Nominal check for overflow when using signed type for card index.
- Fix the direction of an address comparison, add a couple of assertions,
and elaborate some comments. Passes heap verification handily now.
- ... and 106 more: https://git.openjdk.org/shenandoah/compare/4ec2cd90...4b718fd4
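The commit above about changing the loop variable to a signed type can be illustrated with a small standalone sketch. It assumes the loop in question counts downward to `start_card_index`, which the commit message implies but does not state; the function name `visit_descending` is hypothetical and the code is not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Visits card indices from end_card_index down to start_card_index, inclusive.
// With an unsigned counter, the condition `i >= start_card_index` is always
// true when start_card_index is 0, and decrementing past 0 wraps around.
// A signed counter lets the loop terminate once i reaches -1.
std::vector<size_t> visit_descending(size_t start_card_index, size_t end_card_index) {
  // Nominal overflow check: the indices must fit in the signed type.
  assert(end_card_index <= static_cast<size_t>(PTRDIFF_MAX));
  assert(start_card_index <= end_card_index);

  std::vector<size_t> visited;
  for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(end_card_index);
       i >= static_cast<std::ptrdiff_t>(start_card_index); --i) {
    visited.push_back(static_cast<size_t>(i));
  }
  return visited;
}
```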
-------------
Changes: https://git.openjdk.org/shenandoah/pull/193/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=06
Stats: 853 lines in 12 files changed: 346 ins; 268 del; 239 mod
Patch: https://git.openjdk.org/shenandoah/pull/193.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/193/head:pull/193
PR: https://git.openjdk.org/shenandoah/pull/193