RFR: 8272138: ZGC: Adopt release ordering for self-healing [v5]

Xiaowei Lu github.com+39413832+weixlu at openjdk.java.net
Tue Aug 10 14:33:56 UTC 2021


> ZGC uses self-healing in the load barrier to fix bad references. Currently, this fix-up (ZBarrier::self_heal) uses memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before the self-heal, and (2) any other thread that accesses the same reference can access the healed address.
> Let us consider memory_order_release for ZBarrier::self_heal instead. Suppose Thread 1 (T1) is fixing a reference and Thread 2 (T2) attempts to access the same reference. T2 has a data dependency: it loads the pointer before it accesses the object's contents, which is equivalent to acquire semantics. Paired with the release semantics in self-healing, this forms an inter-thread acquire-release ordering. As a result, both guarantees mentioned above are provided by the acquire-release ordering.
> We ran the corretto/heapothesys benchmark on AArch64. The optimized version shows both (1) a shorter average concurrent mark time and (2) a shorter average concurrent relocation time. Furthermore, we observe a shorter average latency in almost all test cases.
> 
> 
> [root@localhost corretto]# grep "00.*Phase: Concurrent Mark           " *.log
> baseline.log:[100.412s][info][gc,stats    ]       Phase: Concurrent Mark                             960.359 / 960.359     587.203 / 1248.362    587.203 / 1248.362    587.203 / 1248.362    ms
> baseline.log:[200.411s][info][gc,stats    ]       Phase: Concurrent Mark                             116.748 / 116.748     656.777 / 1736.469    656.777 / 1736.469    656.777 / 1736.469    ms
> baseline.log:[300.411s][info][gc,stats    ]       Phase: Concurrent Mark                             125.460 / 125.460     620.440 / 1736.469    620.440 / 1736.469    620.440 / 1736.469    ms
> baseline.log:[400.411s][info][gc,stats    ]       Phase: Concurrent Mark                             935.295 / 935.295     673.080 / 1736.469    673.080 / 1736.469    673.080 / 1736.469    ms
> baseline.log:[500.411s][info][gc,stats    ]       Phase: Concurrent Mark                            1448.705 / 1448.705    723.484 / 1814.849    723.484 / 1814.849    723.484 / 1814.849    ms
> baseline.log:[600.411s][info][gc,stats    ]       Phase: Concurrent Mark                            1490.123 / 1490.123    796.960 / 1842.794    796.960 / 1842.794    796.960 / 1842.794    ms
> baseline.log:[700.411s][info][gc,stats    ]       Phase: Concurrent Mark                               0.000 / 0.000       912.439 / 2183.065    867.799 / 2183.065    867.799 / 2183.065    ms
> baseline.log:[800.412s][info][gc,stats    ]       Phase: Concurrent Mark                            1468.594 / 1468.594    990.044 / 2183.065    912.281 / 2183.065    912.281 / 2183.065    ms
> baseline.log:[900.411s][info][gc,stats    ]       Phase: Concurrent Mark                             137.435 / 137.435    1109.116 / 2276.535    967.470 / 2276.535    967.470 / 2276.535    ms
> baseline.log:[1000.411s][info][gc,stats    ]       Phase: Concurrent Mark                             184.093 / 184.093    1172.446 / 2276.535    997.343 / 2276.535    997.343 / 2276.535    ms
> baseline.log:[1100.411s][info][gc,stats    ]       Phase: Concurrent Mark                            1537.673 / 1537.673   1211.815 / 2276.535   1013.076 / 2276.535   1013.076 / 2276.535    ms
> baseline.log:[1200.412s][info][gc,stats    ]       Phase: Concurrent Mark                               0.000 / 0.000      1218.085 / 2276.535   1025.443 / 2276.535   1025.443 / 2276.535    ms
> optimized.log:[100.423s][info][gc,stats    ]       Phase: Concurrent Mark                            1053.065 / 1053.065    581.646 / 1249.822    581.646 / 1249.822    581.646 / 1249.822    ms
> optimized.log:[200.423s][info][gc,stats    ]       Phase: Concurrent Mark                             885.795 / 885.795     573.650 / 1277.782    573.650 / 1277.782    573.650 / 1277.782    ms
> optimized.log:[300.423s][info][gc,stats    ]       Phase: Concurrent Mark                             124.236 / 124.236     641.028 / 1828.124    641.028 / 1828.124    641.028 / 1828.124    ms
> optimized.log:[400.423s][info][gc,stats    ]       Phase: Concurrent Mark                             875.383 / 875.383     666.465 / 1828.124    666.465 / 1828.124    666.465 / 1828.124    ms
> optimized.log:[500.423s][info][gc,stats    ]       Phase: Concurrent Mark                            1937.305 / 1937.305    754.228 / 1937.305    754.228 / 1937.305    754.228 / 1937.305    ms
> optimized.log:[600.423s][info][gc,stats    ]       Phase: Concurrent Mark                             173.064 / 173.064     771.387 / 1937.305    771.387 / 1937.305    771.387 / 1937.305    ms
> optimized.log:[700.423s][info][gc,stats    ]       Phase: Concurrent Mark                            1832.584 / 1832.584    899.646 / 2048.471    856.838 / 2048.471    856.838 / 2048.471    ms
> optimized.log:[800.423s][info][gc,stats    ]       Phase: Concurrent Mark                            1510.755 / 1510.755    981.807 / 2048.471    893.373 / 2048.471    893.373 / 2048.471    ms
> optimized.log:[900.423s][info][gc,stats    ]       Phase: Concurrent Mark                            1472.737 / 1472.737   1044.755 / 2089.539    927.733 / 2089.539    927.733 / 2089.539    ms
> optimized.log:[1000.423s][info][gc,stats    ]       Phase: Concurrent Mark                            1513.077 / 1513.077   1095.827 / 2089.539    947.202 / 2089.539    947.202 / 2089.539    ms
> optimized.log:[1100.423s][info][gc,stats    ]       Phase: Concurrent Mark                               0.000 / 0.000      1073.703 / 2089.539    943.684 / 2089.539    943.684 / 2089.539    ms
> optimized.log:[1200.423s][info][gc,stats    ]       Phase: Concurrent Mark                            1337.865 / 1337.865   1119.936 / 2113.895    962.172 / 2113.895    962.172 / 2113.895    ms
> 
> [root@localhost corretto]# grep "00.*Phase: Concurrent Relocate           " *.log
> baseline.log:[100.412s][info][gc,stats    ]       Phase: Concurrent Relocate                         196.522 / 196.522     114.318 / 245.371     114.318 / 245.371     114.318 / 245.371     ms
> baseline.log:[200.411s][info][gc,stats    ]       Phase: Concurrent Relocate                          47.748 / 47.748      130.861 / 331.948     130.861 / 331.948     130.861 / 331.948     ms
> baseline.log:[300.411s][info][gc,stats    ]       Phase: Concurrent Relocate                          56.922 / 56.922      129.174 / 331.948     129.174 / 331.948     129.174 / 331.948     ms
> baseline.log:[400.411s][info][gc,stats    ]       Phase: Concurrent Relocate                         218.707 / 218.707     137.495 / 331.948     137.495 / 331.948     137.495 / 331.948     ms
> baseline.log:[500.411s][info][gc,stats    ]       Phase: Concurrent Relocate                         197.166 / 197.166     144.216 / 359.644     144.216 / 359.644     144.216 / 359.644     ms
> baseline.log:[600.411s][info][gc,stats    ]       Phase: Concurrent Relocate                         202.118 / 202.118     153.507 / 373.447     153.507 / 373.447     153.507 / 373.447     ms
> baseline.log:[700.411s][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       172.241 / 395.113     164.291 / 395.113     164.291 / 395.113     ms
> baseline.log:[800.412s][info][gc,stats    ]       Phase: Concurrent Relocate                         215.121 / 215.121     186.007 / 421.039     173.139 / 421.039     173.139 / 421.039     ms
> baseline.log:[900.411s][info][gc,stats    ]       Phase: Concurrent Relocate                          48.550 / 48.550      203.420 / 421.982     181.899 / 421.982     181.899 / 421.982     ms
> baseline.log:[1000.411s][info][gc,stats    ]       Phase: Concurrent Relocate                          53.847 / 53.847      211.774 / 421.982     185.728 / 421.982     185.728 / 421.982     ms
> baseline.log:[1100.411s][info][gc,stats    ]       Phase: Concurrent Relocate                         224.489 / 224.489     218.195 / 431.088     188.087 / 431.088     188.087 / 431.088     ms
> baseline.log:[1200.412s][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       222.852 / 431.088     191.130 / 431.088     191.130 / 431.088     ms
> optimized.log:[100.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         193.811 / 193.811     113.043 / 248.471     113.043 / 248.471     113.043 / 248.471     ms
> optimized.log:[200.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         196.220 / 196.220     117.810 / 248.471     117.810 / 248.471     117.810 / 248.471     ms
> optimized.log:[300.423s][info][gc,stats    ]       Phase: Concurrent Relocate                          48.786 / 48.786      131.753 / 351.890     131.753 / 351.890     131.753 / 351.890     ms
> optimized.log:[400.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         195.302 / 195.302     139.115 / 351.890     139.115 / 351.890     139.115 / 351.890     ms
> optimized.log:[500.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         374.022 / 374.022     155.204 / 374.022     155.204 / 374.022     155.204 / 374.022     ms
> optimized.log:[600.423s][info][gc,stats    ]       Phase: Concurrent Relocate                          49.222 / 49.222      159.444 / 400.795     159.444 / 400.795     159.444 / 400.795     ms
> optimized.log:[700.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         381.072 / 381.072     182.488 / 409.086     173.140 / 409.086     173.140 / 409.086     ms
> optimized.log:[800.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         223.399 / 223.399     191.774 / 409.086     175.748 / 409.086     175.748 / 409.086     ms
> optimized.log:[900.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         214.184 / 214.184     201.526 / 409.086     181.302 / 409.086     181.302 / 409.086     ms
> optimized.log:[1000.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         208.600 / 208.600     207.389 / 420.479     183.756 / 420.479     183.756 / 420.479     ms
> optimized.log:[1100.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         209.444 / 209.444     202.367 / 420.479     183.173 / 420.479     183.173 / 420.479     ms
> optimized.log:[1200.423s][info][gc,stats    ]       Phase: Concurrent Relocate                         223.841 / 223.841     206.268 / 420.479     185.074 / 420.479     185.074 / 420.479     ms
> 
> [root@localhost corretto]# grep "average latency:" nohup_baseline.out
>         average latency: 2ms:40us
>         average latency: 6ms:550us
>         average latency: 6ms:543us
>         average latency: 6ms:493us
>         average latency: 928us
>         average latency: 794us
>         average latency: 1ms:403us
>         average latency: 23ms:216us
>         average latency: 775us
> [root@localhost corretto]# grep "average latency:" nohup_optimized.out
>         average latency: 2ms:48us
>         average latency: 5ms:948us
>         average latency: 5ms:940us
>         average latency: 5ms:875us
>         average latency: 850us
>         average latency: 723us
>         average latency: 1ms:221us
>         average latency: 22ms:653us
>         average latency: 693us
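
The release/acquire pairing described in the quoted PR text can be sketched in standard C++ as follows. This is only an illustration under stated assumptions, not the HotSpot code: it uses std::atomic rather than HotSpot's Atomic API, the names HeapObject, Slot, self_heal and read_payload are hypothetical, and the reader uses an explicit memory_order_acquire load where the PR text relies on the data dependency between loading the pointer and dereferencing it.

```cpp
#include <atomic>

// Hypothetical stand-in for an object in the heap (not a HotSpot type).
struct HeapObject {
  int payload;
};

// Hypothetical stand-in for a reference field that the barrier heals.
using Slot = std::atomic<HeapObject*>;

// Healer side (T1 in the description). The caller has already done the
// slow-path work (e.g. initialised the object at `good`). The CAS then
// publishes the good address.
// Returns true if this thread installed `good`, false if the slot no
// longer held `bad` (some other thread changed it first).
inline bool self_heal(Slot& slot, HeapObject* bad, HeapObject* good) {
  HeapObject* expected = bad;
  // Release on success: every store made before this CAS becomes visible
  // to a thread that acquires the healed pointer.
  // Relaxed on failure: nothing was published by this thread.
  return slot.compare_exchange_strong(expected, good,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}

// Reader side (T2 in the description): load the pointer, then access the
// object's contents through it.
inline int read_payload(const Slot& slot) {
  // Assumption: acquire is used here as the portable way to express the
  // "pointer load before content access" ordering in standard C++.
  HeapObject* obj = slot.load(std::memory_order_acquire);
  return obj->payload;
}
```

If T1 writes the new object's contents and then calls self_heal, a T2 that observes the healed pointer in read_payload is also guaranteed to observe those contents. A conservative (full two-way fence) ordering on the CAS is stronger than this pairing requires.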

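To check the "almost all test cases" statement against the quoted latency output, the values can be converted to microseconds and compared pairwise. The sketch below is a hypothetical helper, not part of the benchmark; the "Nms:Mus" format is inferred from the output quoted above, and the two lists are copied from that output in order.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Values copied from the quoted "average latency:" output, in order.
const std::vector<std::string> kBaseline = {
    "2ms:40us", "6ms:550us", "6ms:543us", "6ms:493us", "928us",
    "794us",    "1ms:403us", "23ms:216us", "775us"};
const std::vector<std::string> kOptimized = {
    "2ms:48us", "5ms:948us", "5ms:940us", "5ms:875us", "850us",
    "723us",    "1ms:221us", "22ms:653us", "693us"};

// Parse strings such as "6ms:550us" or "928us" into microseconds.
// Throws std::invalid_argument on a missing number or unknown unit.
inline long parse_latency_us(const std::string& s) {
  long total = 0;
  std::size_t pos = 0;
  while (pos < s.size()) {
    std::size_t used = 0;
    const long value = std::stol(s.substr(pos), &used);
    pos += used;
    if (s.compare(pos, 2, "ms") == 0) {
      total += value * 1000;
    } else if (s.compare(pos, 2, "us") == 0) {
      total += value;
    } else {
      throw std::invalid_argument("unknown unit in: " + s);
    }
    pos += 2;                                   // skip the unit
    if (pos < s.size() && s[pos] == ':') ++pos; // skip the separator
  }
  return total;
}

// Count the entries where the optimized latency is strictly lower.
inline int count_improved(const std::vector<std::string>& base,
                          const std::vector<std::string>& opt) {
  int improved = 0;
  for (std::size_t i = 0; i < base.size() && i < opt.size(); ++i) {
    if (parse_latency_us(opt[i]) < parse_latency_us(base[i])) ++improved;
  }
  return improved;
}
```

Calling count_improved(kBaseline, kOptimized) on the quoted values reports eight of the nine entries as improved; the first entry is 8 us slower (2ms:40us versus 2ms:48us).
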
Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision:

  replace release() to a single place in insert()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5046/files
  - new: https://git.openjdk.java.net/jdk/pull/5046/files/3f685dee..3b8ed5f5

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=03-04

  Stats: 12 lines in 2 files changed: 4 ins; 8 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5046.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5046/head:pull/5046

PR: https://git.openjdk.java.net/jdk/pull/5046



More information about the hotspot-gc-dev mailing list