RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v7]
William Kemper
wkemper at openjdk.org
Wed Jun 26 17:29:12 UTC 2024
On Tue, 25 Jun 2024 08:20:43 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:
>> ### Notes
>> While doing CAS to acquire the lock, the original implementation sleeps/yields once after spinning 0xFFF times, and repeats this over and over until the lock is acquired, i.e. a ```(N spins + sleep/yield)``` loop. Based on test results, more spinning seems to give worse performance, so we decided to change the algorithm to ```(N spins) + (yield loop)```, and additionally to block the thread immediately if a safepoint is pending; a sketch of the revised strategy follows below. We still need to determine the best value of N for the spins, so we tested multiple candidates (0, 0x01, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF) and compared the results against the baseline data (original implementation).
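>>
>> As a sketch of the revised strategy, a minimal Java analogue is shown below (the real ShenandoahLock is C++ inside HotSpot; the `SpinThenYieldLock` class and its `AtomicBoolean` flag are illustrative assumptions, not the actual code):
>>
>> import java.util.concurrent.atomic.AtomicBoolean;
>>
>> class SpinThenYieldLock {
>>     static final int SPINS = 0xFF; // the "N" being tuned here
>>     private final AtomicBoolean locked = new AtomicBoolean(false);
>>
>>     void lock() {
>>         // Phase 1: bounded CAS spinning, at most N attempts
>>         for (int i = 0; i < SPINS; i++) {
>>             if (locked.compareAndSet(false, true)) return;
>>             Thread.onSpinWait();
>>         }
>>         // Phase 2: yield between CAS attempts until acquired
>>         // (the real code also blocks the thread immediately
>>         // if a safepoint is pending)
>>         while (!locked.compareAndSet(false, true)) {
>>             Thread.yield();
>>         }
>>     }
>>
>>     void unlock() {
>>         locked.set(false);
>>     }
>> }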
>>
>> #### Test code
>>
>> public class Alloc {
>>     static final int THREADS = 1280; // 32 threads per CPU core, 40 cores
>>     static final Object[] sinks = new Object[64 * THREADS];
>>     static volatile boolean start;
>>     static volatile boolean stop;
>>
>>     public static void main(String... args) throws Throwable {
>>         for (int t = 0; t < THREADS; t++) {
>>             int ft = t;
>>             new Thread(() -> work(ft * 64)).start();
>>         }
>>
>>         Thread.sleep(1000);
>>         start = true;
>>         Thread.sleep(30_000);
>>         stop = true;
>>     }
>>
>>     public static void work(int idx) {
>>         while (!start) { Thread.onSpinWait(); }
>>         while (!stop) {
>>             sinks[idx] = new byte[128];
>>         }
>>     }
>> }
>>
>>
>> Run it like this and observe the TTSP (time-to-safepoint) times; with `-XX:-UseTLAB`, every allocation goes through the Shenandoah heap lock, maximizing contention:
>>
>>
>> java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java
>>
>>
>> #### Metrics from tests (TTSP, allocation rate)
>> ##### Heavy contention (1280 threads, 32 per CPU core)
>> | Test | SP polls | Average TTSP (ns) | 2% TRIMMEAN (ns) | MAX (ns) | MIN (ns) |
>> | -------- | -------- | ------------ | ----------- | -------- | ----- |
>> | baseline | 18 | 3882361 | 3882361 | 43310117 | 49197 |
>> | 0x00 | 168 | 861677 | 589036 | 46937732 | 44005 |
>> | 0x01 | 164 | 627056 | 572697 | 10004767 | 55472 |
>> | 0x07 | 163 | 650578 | 625329 | 5312631 | 53734 |
>> | 0x0F | 164 | 590398 | 557325 | 6481761 | 56794 |
>> | 0x1F | 144 | 814400 | 790089 | 5024881 | 56041 |
>> | 0x3F | 137 | 830288 | 801192 | 5533538 | 54982 |
>> | 0x7F | 132 | 1101625 | 845626 | 35425614 | 57492 |
>> | 0xFF | 125 | 1005433 | 970988 | 6193342 | 54362 |
>>
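>> For reference, the "2% TRIMMEAN" column above is assumed to be an Excel-style trimmed mean: drop the most extreme 2% of TTSP samples (half from each tail), then average the rest. A hypothetical Java sketch (the sample values are illustrative, not the measured data):
>>
>> import java.util.Arrays;
>>
>> public class TrimMean {
>>     // Excel-style TRIMMEAN(array, percent): discard `percent` of the
>>     // samples in total, split between the two tails, then average.
>>     static double trimMean(long[] samples, double percent) {
>>         long[] sorted = samples.clone();
>>         Arrays.sort(sorted);
>>         int dropPerTail = (int) (sorted.length * percent / 2);
>>         double sum = 0;
>>         int count = 0;
>>         for (int i = dropPerTail; i < sorted.length - dropPerTail; i++) {
>>             sum += sorted[i];
>>             count++;
>>         }
>>         return sum / count;
>>     }
>>
>>     public static void main(String[] args) {
>>         long[] ttsp = {49197, 55472, 627056, 3882361, 43310117}; // ns, made up
>>         System.out.println(trimMean(ttsp, 0.02));
>>     }
>> }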
>>
>> ##### Light contention (40 threads, 1 per CPU core)
>> | Spins | SP polls | Average TTSP | 2% T...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
>
> Simplify code with less stacks
Okay. I don't understand the data in the third table for: `java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:+UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java`. Are those values also SP polls, or are they GC counts? At any rate, isn't lower better? Or did you choose 0xFF because it showed the best performance in DaCapo testing?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19570#issuecomment-2192260401