[8] RFR: Shenandoah: specialize String Table scans for better pause performance
Zhengyu Gu
zgu at redhat.com
Tue May 19 11:34:37 UTC 2020
  assert(UseShenandoahGC, "Only for Shenandoah");
  const int chunk_size = limit / (ParallelGCThreads * 10);

Maybe we need to guard against a minimum chunk_size? Even though it is
unlikely, chunk_size == 0 is possible: the integer division truncates to
zero whenever limit < ParallelGCThreads * 10.
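
A minimal sketch of such a guard, for illustration only. The helper name
compute_chunk_size, its parameters, and the floor MIN_CHUNK = 1 are
assumptions and are not taken from the webrev; the actual fix may clamp
differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the webrev): computes the per-claim chunk
// size and clamps it so that it can never be zero.
static int compute_chunk_size(int limit, int parallel_gc_threads) {
  assert(parallel_gc_threads > 0 && "need at least one worker");
  // Same formula as the quoted chunk_size computation. Integer division
  // truncates, so this is 0 whenever limit < parallel_gc_threads * 10.
  const int raw = limit / (parallel_gc_threads * 10);
  // MIN_CHUNK is an assumed floor, chosen here only for illustration.
  const int MIN_CHUNK = 1;
  return std::max(raw, MIN_CHUNK);
}
```

With a zero chunk size the claim cursor would never advance, so a floor of
at least one bucket keeps the scan loop making progress.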
Otherwise, looks good.
-Zhengyu
On 5/19/20 5:23 AM, Aleksey Shipilev wrote:
> This is an sh/jdk8-specific fix. String Table scans dominate some workloads, notably Cassandra. But
> they are also visible in simple hello world scenarios. The underlying reason is that parallel
> StringTable scans use a very small chunk size, so on decently-sized tables we hit the contended
> atomic very often. Larger chunk size is needed to improve performance.
>
> Since that chunk size is hard-coded and entrenched in the 8u upstream code, I chose to copy-paste
> the relevant method and specialize it for Shenandoah. 11u and later code handles it better with
> OopStorage and concurrent hash tables, so this specialization is only needed for 8u.
>
> 8u webrev:
> https://cr.openjdk.java.net/~shade/shenandoah/8u-stringtable-scan/webrev.01/
>
> This yields impressive improvements on Cassandra perf tests; see the example GC cycles:
>
> === Baseline:
>
> Pause Init Mark (N) 3646 us
> Accumulate Stats 18 us
> Make Parsable 4 us
> Update Region States 86 us
> Scan Roots 3478 us
> S: <total> 26681 us
> S: Thread Roots 1521 us, workers (us): 468, 392, 111, 122, 113, 106, 111, 98,
> S: Universe Roots 3 us, workers (us): 3, ---, ---, ---, ---, ---, ---, ---,
> S: JNI Handles Roots 3 us, workers (us): 3, ---, ---, ---, ---, ---, ---, ---,
> S: JNI Weak Roots 25 us, workers (us): ---, ---, ---, 25, ---, ---, ---, ---,
> S: String Table Roots 24644 us, workers (us): 2857, 2935, 3141, 3122, 3148, 3148, 3148, 3145,
> S: Synchronizer Roots 3 us, workers (us): 2, 0, 0, 0, 0, 0, 0, 0,
> S: Flat Profiler Roots 2 us, workers (us): ---, ---, 2, ---, ---, ---, ---, ---,
> S: Management Roots 2 us, workers (us): 2, ---, ---, ---, ---, ---, ---, ---,
> S: System Dict Roots 10 us, workers (us): ---, ---, 10, ---, ---, ---, ---, ---,
> S: CLDG Roots 458 us, workers (us): 12, 13, 75, 72, 74, 75, 65, 72,
> S: JVMTI Roots 10 us, workers (us): ---, 10, ---, ---, ---, ---, ---, ---,
> Resize TLABs 11 us
>
> === Patched:
> Pause Init Mark (N) 999 us
> Accumulate Stats 17 us
> Make Parsable 4 us
> Update Region States 92 us
> Scan Roots 824 us
> S: <total> 5494 us
> S: Thread Roots 1584 us, workers (us): 121, 613, 114, 131, 301, 110, 102, 93,
> S: Universe Roots 2 us, workers (us): 2, ---, ---, ---, ---, ---, ---, ---,
> S: JNI Handles Roots 4 us, workers (us): ---, 4, ---, ---, ---, ---, ---, ---,
> S: JNI Weak Roots 20 us, workers (us): ---, ---, ---, ---, ---, 20, ---, ---,
> S: String Table Roots 3374 us, workers (us): 506, 81, 479, 487, 372, 482, 473, 494,
> S: Synchronizer Roots 4 us, workers (us): 0, 0, 3, 0, 0, 0, 0, 0,
> S: Flat Profiler Roots 5 us, workers (us): ---, 5, ---, ---, ---, ---, ---, ---,
> S: Management Roots 1 us, workers (us): 1, ---, ---, ---, ---, ---, ---, ---,
> S: System Dict Roots 10 us, workers (us): ---, ---, 10, ---, ---, ---, ---, ---,
> S: CLDG Roots 478 us, workers (us): 79, 12, 77, 69, 13, 75, 75, 78,
> S: JVMTI Roots 10 us, workers (us): 10, ---, ---, ---, ---, ---, ---, ---,
> Resize TLABs 10 us
>
>
> Testing: hotspot_gc_shenandoah, Cassandra benches, eyeballing gc logs
>
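
For reference, the chunk-claiming cost described in the quoted message can
be sketched roughly as below. This is a simplified, single-threaded model
and not the HotSpot code: the function scan_with_chunks and its parameters
are hypothetical, and std::atomic stands in for the shared claim index. It
only counts how many atomic claims one full pass over the table needs for
a given chunk size.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>

// Hypothetical model of a chunked table scan. Returns the number of atomic
// claim operations needed to cover `limit` buckets, and reports the number
// of buckets actually visited through `buckets_visited`.
static int scan_with_chunks(int limit, int chunk_size, long* buckets_visited) {
  assert(chunk_size > 0 && "a zero chunk size would never make progress");
  std::atomic<int> claimed_idx(0);  // stand-in for the shared claim cursor
  int claims = 0;
  long visited = 0;
  for (;;) {
    // Each claim is one atomic read-modify-write on the shared cursor;
    // under real parallelism this is the contended operation.
    const int start = claimed_idx.fetch_add(chunk_size);
    ++claims;  // the final, unsuccessful claim is counted as well
    if (start >= limit) break;
    const int end = std::min(start + chunk_size, limit);
    visited += end - start;  // a real scan would process buckets [start, end)
  }
  *buckets_visited = visited;
  return claims;
}
```

In this model a table of 1000 buckets needs 33 claims with a chunk size of
32 but only 9 claims with a chunk size of 125, while visiting the same
1000 buckets. Fewer claims means fewer hits on the shared atomic.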