Integrated: Load balance remset scan

Kelvin Nilsen kdnilsen at openjdk.org
Fri Nov 18 23:26:06 UTC 2022


On Fri, 18 Nov 2022 18:49:08 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> Prior to this change, the initial group of remembered set assignments was given to worker threads one entire region at a time.  We found that with large region sizes (e.g. 16 MiB and above), this resulted in too much imbalance in the work performed by individual threads.  A few threads assigned to scan 16 MiB regions with a high density of "interesting pointers" were still scanning after all other worker threads had finished their scanning.
> 
> This change caps the maximum assignment size for worker threads at 4 MiB.  This results in a better distribution of work across the concurrent worker threads.  With 13 worker threads and 16 MiB heap regions, we observe the following benefits on an Extremem workload (46_064 MiB heap size, 27_648 MiB new size):
> 
> Latency for customer preparation processing improved by 0.79% for p50, 2.26% for p95, 8.21% for p99, 28.17% for p99.9, 86.59% for p99.99, and 86.77% for p99.999.  The p100 response improved only slightly, by 1.99%.
> 
> Average time for the concurrent remembered set marking scan improved by 1.92%.  The average time for concurrent update refs, which includes remembered set scanning, improved by 1.72%.
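
To illustrate the idea in the description above, here is a minimal sketch in Python (not the HotSpot C++ code from the changeset). It splits each region into assignments of at most 4 MiB and simulates workers that claim the next assignment as soon as they are free. The function names, the tuple layout, and the cost numbers are hypothetical, and the sketch assumes scanning cost is spread evenly within a region, which the actual collector does not guarantee.

```python
import heapq

MIB = 1024 * 1024


def split_assignments(num_regions, region_size, max_chunk):
    """Return (region_index, offset, length) assignments, each at most max_chunk bytes.

    Hypothetical helper: the real Shenandoah data structures differ.
    """
    assignments = []
    for region in range(num_regions):
        offset = 0
        while offset < region_size:
            length = min(max_chunk, region_size - offset)
            assignments.append((region, offset, length))
            offset += length
    return assignments


def finish_time(costs, num_workers):
    """Simulate workers claiming assignments in order; return when the last one finishes.

    Each entry in costs is the (assumed) time needed to scan one assignment.
    """
    free_at = [0] * num_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest idle worker takes the next assignment
        heapq.heappush(free_at, start + cost)
    return max(free_at)


if __name__ == "__main__":
    # One 16 MiB region capped at 4 MiB yields four assignments.
    print(len(split_assignments(1, 16 * MIB, 4 * MIB)))

    # Four regions, the last one dense with interesting pointers (made-up costs).
    whole_regions = [4, 4, 4, 36]
    # Same total work, each region split into four equal-cost assignments.
    capped = [1] * 12 + [9] * 4
    print(finish_time(whole_regions, 2), finish_time(capped, 2))
```

In this toy setup, two workers need 40 time units when the dense region is handed out whole, and 24 (the ideal for 48 units of work) when every region is split four ways. The numbers only illustrate the imbalance effect; they say nothing about the measured results quoted above.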

This pull request has now been integrated.

Changeset: 264f9c23
Author:    Kelvin Nilsen <kdnilsen at openjdk.org>
URL:       https://git.openjdk.org/shenandoah/commit/264f9c2343244bdbb6f6e6d1d290f18495409da4
Stats:     161 lines in 4 files changed: 98 ins; 6 del; 57 mod

Load balance remset scan

Reviewed-by: wkemper

-------------

PR: https://git.openjdk.org/shenandoah/pull/173

