RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v7]

William Kemper wkemper at openjdk.org
Wed Jun 26 17:07:14 UTC 2024


On Tue, 25 Jun 2024 08:20:43 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> ### Notes
>> While acquiring the lock via CAS, the original implementation spins 0xFFF times, then sleeps/yields once, and repeats this until the lock is acquired; effectively a ```(N spins + sleep/yield) loop```. Based on test results, doing more spins produces worse performance, so we decided to change the algorithm to ```(N spins) + (yield loop)```, and additionally block the thread immediately if a safepoint is pending. We still need to determine the best value of N for the spin phase, so we tested multiple candidates (0, 0x01, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF) and compared the results with the baseline data (original implementation).
>> 
>> #### Test code
>> 
>> ```java
>> public class Alloc {
>> 	static final int THREADS = 1280; // 32 threads per CPU core, 40 cores
>> 	static final Object[] sinks = new Object[64*THREADS];
>> 	static volatile boolean start;
>> 	static volatile boolean stop;
>> 
>> 	public static void main(String... args) throws Throwable {
>> 			for (int t = 0; t < THREADS; t++) {
>> 					int ft = t;
>> 					new Thread(() -> work(ft * 64)).start();
>> 			}
>> 
>> 			Thread.sleep(1000);
>> 			start = true;
>> 			Thread.sleep(30_000);
>> 			stop = true;
>> 	}
>> 
>> 	public static void work(int idx) {
>> 			while (!start) { Thread.onSpinWait(); }
>> 			while (!stop) {
>> 					sinks[idx] = new byte[128];
>> 			}
>> 	}
>> }
>> ```
>> 
>> Run it like this and observe TTSP times:
>> 
>> ```
>> java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java
>> ```
>> #### Metrics from tests (TTSP, allocation rate)
>> ##### Heavy contention (1280 threads, 32 per CPU core)
>> | Test     | SP polls | Average TTSP | 2% TRIMMEAN | MAX      | MIN   |
>> | -------- | -------- | ------------ | ----------- | -------- | ----- |
>> | baseline | 18       | 3882361      | 3882361     | 43310117 | 49197 |
>> | 0x00     | 168      | 861677       | 589036      | 46937732 | 44005 |
>> | 0x01     | 164      | 627056       | 572697      | 10004767 | 55472 |
>> | 0x07     | 163      | 650578       | 625329      | 5312631  | 53734 |
>> | 0x0F     | 164      | 590398       | 557325      | 6481761  | 56794 |
>> | 0x1F     | 144      | 814400       | 790089      | 5024881  | 56041 |
>> | 0x3F     | 137      | 830288       | 801192      | 5533538  | 54982 |
>> | 0x7F     | 132      | 1101625      | 845626      | 35425614 | 57492 |
>> | 0xFF     | 125      | 1005433      | 970988      | 6193342  | 54362 |
>> 
>> 
>> ##### Light contention (40 threads, 1 per CPU core)
>> | Spins    | SP polls | Average TTSP | 2% T...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify code with less stacks

Are we trying to minimize TTSP or SP polls? Does SP mean safepoint or SpinPause here? If we're trying to minimize TTSP, the result table suggests 0x0F would be better?
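For reference, the ```(N spins) + (yield loop)``` strategy described in the quoted notes can be sketched roughly as below. This is an illustrative standalone sketch, not the actual ShenandoahLock code: the class name, `SPIN_LIMIT`, and `safepointPending()` are all hypothetical stand-ins (in the VM, the pending-safepoint check and blocking path are internal runtime mechanisms).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of a "(N spins) + (yield loop)" lock acquire:
// a bounded spin phase with CAS attempts, then a yield loop that
// backs off immediately when a safepoint is pending.
public class SpinThenYieldLock {
    private static final int SPIN_LIMIT = 0x0F; // one of the candidate N values tested
    private final AtomicBoolean locked = new AtomicBoolean(false);

    // Stand-in for the VM's pending-safepoint check; always false here.
    private boolean safepointPending() { return false; }

    public void lock() {
        // Phase 1: a fixed number of spins with CAS attempts.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (locked.compareAndSet(false, true)) return;
            Thread.onSpinWait();
        }
        // Phase 2: yield loop; park (as a stand-in for blocking at the
        // safepoint) instead of spinning when a safepoint is pending.
        while (!locked.compareAndSet(false, true)) {
            if (safepointPending()) {
                LockSupport.parkNanos(1_000_000L);
            } else {
                Thread.yield();
            }
        }
    }

    public void unlock() { locked.set(false); }

    public boolean isLocked() { return locked.get(); }
}
```

The key difference from a ```(N spins + sleep/yield) loop``` is that the bounded spin phase runs only once; after that, contended threads only yield or block rather than returning to the spin phase.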

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19570#issuecomment-2192226292


More information about the hotspot-gc-dev mailing list