RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v2]

Wed Jun 19 16:40:22 UTC 2024

> ### Notes
> While trying to acquire the lock with CAS, the original implementation sleeps/yields once after spinning 0xFFF times and repeats this until it acquires the lock; in effect it is a ```(N spins + sleep/yield) loop```. Test results suggest that more spinning leads to worse performance, so we decided to change the algorithm to ```(N spins) + (yield loop)``` and to block the thread immediately if a safepoint is pending. We still needed to determine the best value of N, so we tested multiple candidates (0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF) and compared the results with the baseline data (the original implementation).
> 
> We also noticed a regression in the DaCapo h2 benchmark. After a deep dive and debugging, we decided to let non-Java threads only spin, which favors GC threads slightly over Java threads on a contended lock. Follow-up tasks will be taken to reduce lock contention in Shenandoah GC, e.g. https://bugs.openjdk.org/browse/JDK-8334147
> 
> #### Test code
> 
> public class Alloc {
>     static final int THREADS = 1280; // 32 threads per CPU core, 40 cores
>     static final Object[] sinks = new Object[64 * THREADS];
>     static volatile boolean start;
>     static volatile boolean stop;
> 
>     public static void main(String... args) throws Throwable {
>         for (int t = 0; t < THREADS; t++) {
>             int ft = t;
>             new Thread(() -> work(ft * 64)).start();
>         }
> 
>         Thread.sleep(1000);
>         start = true;
>         Thread.sleep(30_000);
>         stop = true;
>     }
> 
>     public static void work(int idx) {
>         while (!start) { Thread.onSpinWait(); }
>         while (!stop) {
>             sinks[idx] = new byte[128];
>         }
>     }
> }
> 
> 
> Run it like this and observe TTSP times:
> 
> 
> java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java
> 
> 
> #### Metrics from tests (TTSP, allocation rate)
> ##### Heavy contention (1280 threads, 32 per CPU core)
> | Test       | Count | AVG    | TRIMMEAN 2% | MAX      | MIN   | AVG allocation rate(M/s) |
> | ---------- | ----- | ------ | ----------- | -------- | ----- | ------------------------ |
> | Baseline   | 19    | 940270 | 940270      | 5956928  | 75562 | 23.34                    |
> | No spin    | 164   | 222958 | 204822      | 3330053  | 53819 | 238.9                    |
> | 0x01       | 172   | 189173 | 186601      | 750715   | 64864 | 244.1                    |
> | 0x03       | 174   | 286892 | 217739      | 12412225 | 55891 | 239.5                    |
> | 0x07       | 187   | 194440 | 183894      | 2284615  | 55256 | 235.9                    |
> | 0x0F       | 1...
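
For readers unfamiliar with the two-phase scheme, the `(N spins) + (yield loop)` idea from the quoted notes can be sketched in plain Java. This is only an illustration under stated assumptions, not the HotSpot `ShenandoahLock` code: the class name `SpinThenYieldLock`, the `MAX_SPINS` value, and the `demo` helper are hypothetical, and the "block immediately if a safepoint is pending" step is omitted because it has no direct Java-level equivalent.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinThenYieldLock {
    // Hypothetical spin budget N; the quoted notes evaluate values from 0 to 0xFF.
    private static final int MAX_SPINS = 0x1F;

    private final AtomicBoolean locked = new AtomicBoolean(false);

    // Single CAS attempt: succeeds only if the lock was free.
    public boolean tryLock() {
        return locked.compareAndSet(false, true);
    }

    public void lock() {
        // Phase 1: bounded spinning, at most MAX_SPINS CAS attempts.
        for (int i = 0; i < MAX_SPINS; i++) {
            if (tryLock()) {
                return;
            }
            Thread.onSpinWait();
        }
        // Phase 2: no further spinning; yield between CAS attempts until acquired.
        while (!tryLock()) {
            Thread.yield();
        }
    }

    public void unlock() {
        locked.set(false);
    }

    // Illustrative helper: several threads increment a shared counter under the lock.
    static int demo(int threads, int iterations) {
        SpinThenYieldLock lock = new SpinThenYieldLock();
        int[] counter = new int[1]; // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 10_000));
    }
}
```

The difference from the original `(N spins + sleep/yield) loop` is that the spin phase runs once per acquisition attempt rather than being re-entered after every yield, so a thread that loses the initial race stops burning CPU on the contended lock word.
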

Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision:

 - Wait on STS_lock to suspend Java thread acquiring heap lock when SP is synchronizing
 - Test code

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19570/files
  - new: https://git.openjdk.org/jdk/pull/19570/files/d5d8b65f..f388b344

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19570&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19570&range=00-01

  Stats: 35 lines in 2 files changed: 20 ins; 6 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/19570.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19570/head:pull/19570

PR: https://git.openjdk.org/jdk/pull/19570