safepointing. read vs write

Wed May 3 08:53:58 UTC 2023

Hello

Just wanted to share some results of my analysis.

I changed the safepoint poll from reading the polling page (lwu to x0) to writing to the polling page (sw from x0).

Here are the results:

polling page read
Benchmark                                                          Mode  Cnt   Score    Error  Units
InterfaceCalls.testInterfaceCastAndCall                            avgt   25  32.192 ±  0.364  ns/op
InterfaceCalls.testInterfaceCastAndCall:L1-dcache-load-misses:u    avgt    5   0.001 ±  0.001   #/op
InterfaceCalls.testInterfaceCastAndCall:L1-dcache-loads:u          avgt    5   9.036 ±  0.072   #/op
InterfaceCalls.testInterfaceCastAndCall:L1-dcache-stores:u         avgt    5   0.019 ±  0.025   #/op
InterfaceCalls.testInterfaceCastAndCall:LLC-loads:u                avgt    5   0.005 ±  0.007   #/op
InterfaceCalls.testInterfaceCastAndCall:LLC-stores:u               avgt    5   0.002 ±  0.002   #/op
InterfaceCalls.testInterfaceCastAndCall:branch-misses:u            avgt    5   0.001 ±  0.001   #/op
InterfaceCalls.testInterfaceCastAndCall:branches:u                 avgt    5   4.014 ±  0.029   #/op
InterfaceCalls.testInterfaceCastAndCall:cycles:u                   avgt    5  38.777 ±  2.444   #/op
InterfaceCalls.testInterfaceCastAndCall:instructions:u             avgt    5  15.121 ±  0.200   #/op
InterfaceCalls.testInterfaceCastAndCall:stalled-cycles-frontend:u  avgt    5   0.001 ±  0.002   #/op

polling page write
Benchmark                                                          Mode  Cnt   Score    Error  Units
InterfaceCalls.testInterfaceCastAndCall                            avgt   25  30.326 ±  0.257  ns/op
InterfaceCalls.testInterfaceCastAndCall:L1-dcache-load-misses:u    avgt    5   0.001 ±  0.001   #/op
InterfaceCalls.testInterfaceCastAndCall:L1-dcache-loads:u          avgt    5   8.035 ±  0.016   #/op
InterfaceCalls.testInterfaceCastAndCall:L1-dcache-stores:u         avgt    5   1.016 ±  0.012   #/op
InterfaceCalls.testInterfaceCastAndCall:LLC-loads:u                avgt    5   0.004 ±  0.004   #/op
InterfaceCalls.testInterfaceCastAndCall:LLC-stores:u               avgt    5   0.001 ±  0.001   #/op
InterfaceCalls.testInterfaceCastAndCall:branch-misses:u            avgt    5   0.001 ±  0.001   #/op
InterfaceCalls.testInterfaceCastAndCall:branches:u                 avgt    5   4.014 ±  0.007   #/op
InterfaceCalls.testInterfaceCastAndCall:cycles:u                   avgt    5  36.552 ±  1.662   #/op
InterfaceCalls.testInterfaceCastAndCall:instructions:u             avgt    5  15.110 ±  0.062   #/op
InterfaceCalls.testInterfaceCastAndCall:stalled-cycles-frontend:u  avgt    5   0.001 ±  0.001   #/op

The counters change as expected: one fewer L1 load and one more L1 store per operation. Since a store is cheaper than a load for the CPU core (though not for the caches), the total cycle count went down.
On the other hand, a storm of stores should generate some traffic from L1D to the LLC and then to RAM, which may slow down other threads or applications.
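
For anyone who wants to check the arithmetic, here is a small Python sketch that computes the deltas between the two runs. The numbers are copied from the tables above; the helper names (rel_change, intervals_overlap) are made up for this illustration, and treating Score ± Error as a plain interval is a simplifying assumption, not a proper significance test.

```python
# Score and Error values copied from the two JMH tables (read poll vs write poll).
read = {
    "ns/op": (32.192, 0.364),
    "cycles": (38.777, 2.444),
    "L1-dcache-loads": (9.036, 0.072),
    "L1-dcache-stores": (0.019, 0.025),
}
write = {
    "ns/op": (30.326, 0.257),
    "cycles": (36.552, 1.662),
    "L1-dcache-loads": (8.035, 0.016),
    "L1-dcache-stores": (1.016, 0.012),
}


def rel_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def intervals_overlap(a, b):
    """True if the ranges score_a +/- err_a and score_b +/- err_b intersect."""
    (score_a, err_a), (score_b, err_b) = a, b
    return score_a - err_a <= score_b + err_b and score_b - err_b <= score_a + err_a


for name in read:
    r, w = read[name], write[name]
    print(
        f"{name:18s} delta={w[0] - r[0]:+.3f} "
        f"({rel_change(r[0], w[0]):+.1f}%) "
        f"overlap={intervals_overlap(r, w)}"
    )
```

With these inputs the ns/op intervals do not overlap, while the cycles intervals (5 samples each) do, so the ns/op score is the stronger evidence for the improvement.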

Regards, Vladimir