RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v6]
Xiaolong Peng
xpeng at openjdk.org
Tue Jun 25 08:11:13 UTC 2024
On Tue, 25 Jun 2024 08:06:49 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:
>> ### Notes
>> While doing CAS to acquire the lock, the original implementation sleeps/yields once after spinning 0xFFF times and repeats this until it acquires the lock, i.e. it behaves like `(N spins + sleep/yield) loop`. Based on test results, more spins seem to result in worse performance, so we decided to change the algorithm to `(N spins) + (yield loop)` and to block the thread immediately if a safepoint is pending. The best value of N still has to be determined, so we tested multiple candidate values (0x00, 0x01, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF) and compared the results with the baseline data (original implementation).
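For illustration only, here is a minimal Java sketch of the new `(N spins) + (yield loop)` acquisition strategy described above. `SpinYieldLock` is a hypothetical class, not the HotSpot `ShenandoahLock` (which is C++). `Thread.onSpinWait()` stands in for a CPU spin-pause hint, and the "block immediately if a safepoint is pending" step is omitted because plain Java has no equivalent. The difference from the original scheme is that, once the N spins are exhausted, the thread only yields and never returns to spinning.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of "(N spins) + (yield loop)"; not the HotSpot ShenandoahLock.
// The safepoint-pending check of the real change is intentionally omitted.
public class SpinYieldLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final int spins; // N: bounded number of attempts before switching to yielding

    public SpinYieldLock(int spins) {
        this.spins = spins;
    }

    private boolean tryLock() {
        // Test-and-test-and-set: only attempt the CAS when the lock looks free.
        return !locked.get() && locked.compareAndSet(false, true);
    }

    public void lock() {
        // Phase 1: spin at most N times (N = 0 goes straight to phase 2).
        for (int i = 0; i < spins; i++) {
            if (tryLock()) {
                return;
            }
            Thread.onSpinWait();
        }
        // Phase 2: yield between attempts until acquired; never spin again.
        while (!tryLock()) {
            Thread.yield();
        }
    }

    public void unlock() {
        locked.set(false);
    }

    public boolean isLocked() {
        return locked.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SpinYieldLock lock = new SpinYieldLock(0xFF);
        int[] counter = {0};
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    lock.lock();
                    try {
                        counter[0]++; // protected by the lock
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println(counter[0]); // 80000
    }
}
```

Bounding phase 1 keeps a waiting thread from burning CPU for long under heavy contention, while a small N still lets it grab a briefly held lock without paying for a yield.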
>>
>> #### Test code
>>
>> public class Alloc {
>>     static final int THREADS = 1280; // 32 threads per CPU core, 40 cores
>>     static final Object[] sinks = new Object[64 * THREADS];
>>     static volatile boolean start;
>>     static volatile boolean stop;
>>
>>     public static void main(String... args) throws Throwable {
>>         for (int t = 0; t < THREADS; t++) {
>>             int ft = t;
>>             new Thread(() -> work(ft * 64)).start();
>>         }
>>
>>         Thread.sleep(1000);   // give all threads time to start
>>         start = true;
>>         Thread.sleep(30_000); // allocate for 30 seconds
>>         stop = true;
>>     }
>>
>>     public static void work(int idx) {
>>         while (!start) { Thread.onSpinWait(); }
>>         while (!stop) {
>>             // 128-byte allocation per iteration; with -XX:-UseTLAB there is no thread-local fast path
>>             sinks[idx] = new byte[128];
>>         }
>>     }
>> }
>>
>>
>> Run it as follows and observe the time-to-safepoint (TTSP):
>>
>>
>> java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java
>>
>>
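The light-contention numbers use 40 threads instead of 1280, so a parameterized harness is convenient. The following is a hypothetical variant (`AllocParam` and `run` are illustrative names, not part of the PR) that takes the thread count and run duration from the command line and also counts allocations to give a rough allocation rate. It is meant to be launched with the same flags, e.g. `java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:-UseTLAB -Xlog:gc -Xlog:safepoint AllocParam.java 40 30000`.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical parameterized variant of Alloc; thread count and duration come from args.
public class AllocParam {
    static volatile boolean start;
    static volatile boolean stop;
    static Object[] sinks;

    // Runs `threads` allocating workers for `durationMs` and returns the total allocation count.
    public static long run(int threads, long durationMs) throws InterruptedException {
        sinks = new Object[64 * threads];
        start = false;
        stop = false;
        LongAdder allocations = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            int idx = t * 64; // slots 64 entries apart, as in Alloc (presumably to avoid false sharing)
            workers[t] = new Thread(() -> {
                while (!start) { Thread.onSpinWait(); }
                long local = 0;
                while (!stop) {
                    sinks[idx] = new byte[128];
                    local++;
                }
                allocations.add(local); // publish the per-thread count once at the end
            });
            workers[t].start();
        }
        Thread.sleep(1000); // same warm-up as Alloc
        start = true;
        Thread.sleep(durationMs);
        stop = true;
        for (Thread w : workers) {
            w.join();
        }
        return allocations.sum();
    }

    public static void main(String... args) throws InterruptedException {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 1280;
        long durationMs = args.length > 1 ? Long.parseLong(args[1]) : 30_000;
        long total = run(threads, durationMs);
        System.out.printf("%d threads: %.0f allocations/s%n", threads, total * 1000.0 / durationMs);
    }
}
```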
>> #### Metrics from tests (TTSP, allocation rate)
>> ##### Heavy contention (1280 threads, 32 per CPU core)
>> | Test | SP polls | Average TTSP | 2% TRIMMEAN | MAX | MIN |
>> | -------- | -------- | ------------ | ----------- | -------- | ----- |
>> | baseline | 18 | 3882361 | 3882361 | 43310117 | 49197 |
>> | 0x00 | 168 | 861677 | 589036 | 46937732 | 44005 |
>> | 0x01 | 164 | 627056 | 572697 | 10004767 | 55472 |
>> | 0x07 | 163 | 650578 | 625329 | 5312631 | 53734 |
>> | 0x0F | 164 | 590398 | 557325 | 6481761 | 56794 |
>> | 0x1F | 144 | 814400 | 790089 | 5024881 | 56041 |
>> | 0x3F | 137 | 830288 | 801192 | 5533538 | 54982 |
>> | 0x7F | 132 | 1101625 | 845626 | 35425614 | 57492 |
>> | 0xFF | 125 | 1005433 | 970988 | 6193342 | 54362 |
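Assuming the "2% TRIMMEAN" column uses Excel `TRIMMEAN` semantics (an assumption, not stated in the PR), it excludes `floor(n * 0.02)` samples, rounded down to an even number and split equally between the lowest and highest values. If "SP polls" is the sample count, this would explain why the baseline row (18 samples, 18 × 0.02 = 0.36) has an average equal to its trimmed mean, while rows with about 160 samples drop one sample from each end. A small sketch (`TrimMean` is a hypothetical helper):

```java
import java.util.Arrays;

// Hypothetical helper mirroring Excel TRIMMEAN semantics (assumed, not confirmed by the PR).
public class TrimMean {
    // Excludes floor(n * percent) samples, rounded down to an even count,
    // half from each end of the sorted data, then averages the rest.
    public static double trimMean(long[] samples, double percent) {
        if (samples.length == 0 || percent < 0 || percent >= 1) {
            throw new IllegalArgumentException("need samples and 0 <= percent < 1");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int excluded = (int) Math.floor(sorted.length * percent);
        excluded -= excluded % 2; // keep the trim symmetric
        int perSide = excluded / 2;
        double sum = 0;
        for (int i = perSide; i < sorted.length - perSide; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - excluded);
    }

    public static void main(String[] args) {
        // 18 samples: 18 * 0.02 = 0.36 -> nothing trimmed, result equals the plain mean.
        long[] few = new long[18];
        Arrays.fill(few, 100);
        few[17] = 1_000;
        System.out.println(trimMean(few, 0.02)); // 150.0

        // 168 samples: 168 * 0.02 = 3.36 -> 3 -> 2 trimmed (one outlier from each end).
        long[] many = new long[168];
        Arrays.fill(many, 100);
        many[0] = 10;
        many[167] = 10_000;
        System.out.println(trimMean(many, 0.02)); // 100.0
    }
}
```

With so few trimmed samples, a single extreme pause (such as the ~47 ms MAX of the 0x00 row) still moves the average noticeably but is removed from the trimmed mean.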
>>
>>
>> ##### Light contention (40 threads, 1 per CPU core)
>> | Spins | SP polls | Average TTSP | 2% T...
>
> Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision:
>
> - Add parentheses for better alignment
> - Increase spin pauses to 0xFF to address performance regression in lightly contended scenarios
I also increased the number of spin pauses to 255 (0xFF) based on the test results, which suggest it is better balanced across all cases: we should not sacrifice the performance of regular cases to get the best performance in extreme cases.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19570#issuecomment-2188253223
More information about the hotspot-gc-dev mailing list