RFR: 8299703: GenShen: improvements in card scanning

Thu Jan 26 15:37:12 UTC 2023

On Thu, 5 Jan 2023 20:45:00 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

> **Main changes:**
> 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. When reviewing, it may be easier to read the new code directly than to view the side-by-side diffs.
> 2. The ShenandoahCardCluster class has been extended with a `block_start()` method, which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above.
> 3. The ShenandoahCardCluster class's `has_object()` method has been renamed to `starts_object()`, which more closely reflects what the method reports.
> 4. The ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also reduce the overhead of gathering the statistics and might (subject to measurement) allow statistics gathering to be made available in product/release builds (if so, that will be done in a separate follow-up ticket).
> 5. Added some const annotations.
> 
> **Testing & Implementation Notes:**
> 6. Tested with Extremem and SPECjbb on fastdebug, release, and product builds, with and without verification enabled.
> 7. Preliminary performance data with an Extremem workload show a roughly 17-18% reduction in the wall-clock duration of concurrent remembered set scanning across most of the distribution (p0, p25, p50, p75); p100 (max) was only marginally lower, at 2%. This trend is as expected, since the gains are lost when dirty and clean cards alternate frequently in short runs.
> 8. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores.
> 
> **Acknowledgments**:
> 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me.
> 
> **Epilogue**:
> 10. Further performance improvements are possible, but are deferred for follow-up.
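
For readers unfamiliar with the approach in item 1 of the quoted description, the idea of visiting maximal runs of dirty cards while skipping runs of clean cards can be sketched roughly as follows. This is only an illustrative sketch and not the code in the PR: the names `find_dirty_runs`, `CardRange`, and `kClean`, and the byte-per-card encoding, are hypothetical and chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical encoding for this sketch: 0 means clean, anything else is dirty.
// The real card table encoding may differ.
constexpr std::uint8_t kClean = 0;

// A half-open range [first, second) of card indices.
using CardRange = std::pair<std::size_t, std::size_t>;

// Returns every maximal run of contiguous dirty cards, in address order.
std::vector<CardRange> find_dirty_runs(const std::vector<std::uint8_t>& cards) {
  std::vector<CardRange> runs;
  const std::size_t n = cards.size();
  std::size_t i = 0;
  while (i < n) {
    // Skip over a contiguous range of clean cards.
    while (i < n && cards[i] == kClean) {
      ++i;
    }
    if (i == n) {
      break;
    }
    // Extend over a contiguous range of dirty cards.
    const std::size_t start = i;
    while (i < n && cards[i] != kClean) {
      ++i;
    }
    runs.emplace_back(start, i);
  }
  return runs;
}
```

In this sketch, a caller would process each returned range as a unit. When dirty and clean cards alternate in short runs, the number of ranges approaches the number of dirty cards, which is consistent with the observation in item 7 that the gains shrink in that case.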

**SPECjbb**

Conc Scan Rem:
-----------------
Before:  av=72.3  lvls=(0.83, 57.03, 68.24, 83.38, 848.8)  ms
After:   av=53.9  lvls=(0.66, 42.62, 53.48, 66.09, 253.8)  ms
----------------------------------------------------------------
Delta:       -25         -20    -25    -21    -21    -70   %

Conc Upd Ref:
--------------
Before:  av=278.7  lvls=(4.91, 113.63, 197.46, 457.42, 892.35)  ms
After:   av=145.3  lvls=(2.21,  66.52, 111.62, 185.35, 881.24)  ms
-------------------------------------------------------------------------
Delta:        -48         -55     -41     -43     -59    -1.2   %
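
The Delta rows in the SPECjbb tables above appear consistent with the relative change from Before to After, expressed as a percentage. The post does not state the formula, so the following is an assumption; the function names are made up for this sketch.

```cpp
#include <cmath>

// Relative change from `before` to `after`, in percent.
// Assumption: this is how the Delta rows were derived; negative means faster.
double percent_change(double before, double after) {
  return (after - before) / before * 100.0;
}

// The same change rounded to the nearest whole percent.
long rounded_percent_change(double before, double after) {
  return std::lround(percent_change(before, after));
}
```

For example, `rounded_percent_change(278.7, 145.3)` evaluates to -48, which matches the average column of the SPECjbb Conc Upd Ref table.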

**Extremem Config1**

Conc Scan Rem:
-----------------
Before:  av=8.7  lvls=(0.94, 8.18, 8.63, 9.05, 22.99)  ms
After:   av=7.2  lvls=(0.78, 6.71, 7.10, 7.49, 22.61)  ms
-----------------------------------------------------------------------
Delta:      -16         -17   -18   -18   -17   -1.7   %

Conc Upd Ref:
--------------
Before:  av=12.49  lvls=(4.95, 12.11, 12.59, 13.22, 18.28)  ms
After:   av=11.00  lvls=(3.13, 10.53, 11.02, 11.50, 18.24)  ms
-----------------------------------------------------------------------
Delta:        -12         -37    -13    -13    -13   -0.2   %

**Extremem Config2**

Conc Scan Rem:
-----------------
Before:  av=123  lvls=(3.53, 22.45, 49.79, 172.95, 779.97)  ms
After:   av=129  lvls=(3.59, 17.68, 41.60, 218.83, 774.75)  ms
--------------------------------------------------------------------
Delta:       +5           0    -27    -20     +21      -7   %

Conc Upd Ref:
--------------
Before:  av=257  lvls=(21.6, 144.1, 241.9, 318.9, 762.8)  ms
After:   av=257  lvls=(26.8, 105.5, 244.7, 330.5, 751.3)  ms
--------------------------------------------------------------------
Delta:        0         +24    -27     +1     +4     -2   %

The results for the following Extremem configuration are not useful for comparison, because the generation sizes were subject to rapid changes and many degenerated collections occurred at random times:

**Extremem Config3**

Conc Scan Rem:
-----------------
Before:  av=9.45  lvls=(0.003, 0.369, 0.742,  19.79, 24.59)  s
After:   av=0.37  lvls=(0.004, 0.151, 0.272,  0.417,  6.07)  s
---------------------------------------------------------------------
Delta:       -96          +10    -59    -63     -98    -75   %

Conc Upd Ref:
--------------
Before:  av=235.7  lvls=(45.0,  62.5, 265.0, 301.4,  679.3)  ms
After:   av=274.5  lvls=(19.2, 153.9, 242.2, 370.9, 1213.4)  ms
-------------------------------------------------------------------
Delta:        +17         -57   +147     -9    +23     +79   %

-------------

PR: https://git.openjdk.org/shenandoah/pull/193