RFR: 8272138: ZGC: Adopt release ordering for self-healing [v4]
Xiaowei Lu
github.com+39413832+weixlu at openjdk.java.net
Tue Aug 10 14:09:48 UTC 2021
> ZGC uses self-healing in the load barrier to fix bad references. Currently, this fix-up (ZBarrier::self_heal) uses memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self-healing, and (2) any other thread that accesses the same reference can see the healed address.
> Let us consider memory_order_release for ZBarrier::self_heal. Suppose Thread 1 (T1) is fixing a reference and Thread 2 (T2) attempts to access the same reference. T2 has a data dependency: it loads the pointer before it accesses the object’s contents, which is equivalent to acquire semantics. Paired with the release semantics in self-healing, this forms an inter-thread acquire-release ordering, which is sufficient to guarantee both properties stated above.
> We ran the corretto/heapothesys benchmark on AArch64. The optimized version shows both (1) a shorter average concurrent mark time and (2) a shorter average concurrent relocation time. Furthermore, we observe a shorter average latency in almost all test cases.
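The pairing described in the quoted paragraph can be sketched in portable C++. This is only an illustration under stated assumptions, not the HotSpot code: `Object`, `field`, `heal`, `read_payload` and `run_demo` are hypothetical names, `nullptr` stands in for a "bad" reference, and an explicit `memory_order_acquire` load stands in for the data dependency that the PR relies on in T2.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for an object reached through a healed reference.
struct Object {
    int payload;
};

// Hypothetical stand-in for a reference field; nullptr plays the "bad" value.
std::atomic<Object*> field{nullptr};

// T1: perform the slow-path work, then publish the good address with a
// release CAS. Returns true if this thread installed the healed pointer.
bool heal(Object* expected_bad, Object* good) {
    good->payload = 42;  // stands in for relocate/mark work
    return field.compare_exchange_strong(
        expected_bad, good,
        std::memory_order_release,   // success: earlier writes are published
        std::memory_order_relaxed);  // failure: nothing is published
}

// T2: load the pointer, then access the object's contents through it.
// The acquire load is an assumption made for portability; the PR text
// argues that the pointer-to-contents data dependency plays this role.
int read_payload() {
    Object* p;
    while ((p = field.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return p->payload;
}

// Runs one healer and one reader and returns what the reader observed.
int run_demo() {
    static Object healed{0};
    field.store(nullptr, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&] { seen = read_payload(); });
    std::thread healer([] { heal(nullptr, &healed); });
    healer.join();
    reader.join();
    return seen;
}
```

Because the write to `payload` is sequenced before the release CAS, and the reader only dereferences the pointer after observing it, the reader cannot see the pointer without also seeing the slow-path write.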
>
>
> [root@localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log
> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms
> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms
> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms
> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms
> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms
> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms
> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms
> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms
> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms
> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms
> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms
> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms
> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms
> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms
> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms
> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms
> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms
> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms
> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms
> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms
> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms
> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms
> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms
> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms
>
> [root@localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log
> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms
> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms
> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms
> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms
> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms
> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms
> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms
> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms
> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms
> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms
> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms
> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms
> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms
> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms
> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms
> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms
> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms
> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms
> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms
> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms
> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms
> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms
> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms
> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms
>
> [root@localhost corretto]# grep "average latency:" nohup_baseline.out
> average latency: 2ms:40us
> average latency: 6ms:550us
> average latency: 6ms:543us
> average latency: 6ms:493us
> average latency: 928us
> average latency: 794us
> average latency: 1ms:403us
> average latency: 23ms:216us
> average latency: 775us
> [root@localhost corretto]# grep "average latency:" nohup_optimized.out
> average latency: 2ms:48us
> average latency: 5ms:948us
> average latency: 5ms:940us
> average latency: 5ms:875us
> average latency: 850us
> average latency: 723us
> average latency: 1ms:221us
> average latency: 22ms:653us
> average latency: 693us
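To compare the quoted latency pairs numerically, a small helper along these lines could be used. It is a sketch that assumes the `Nms:Mus` strings mean N milliseconds plus M microseconds (so `2ms:40us` is read as 2040 us); `to_micros` and `percent_change` are hypothetical names and are not part of heapothesys or the PR.

```cpp
#include <cassert>
#include <string>

// Convert a latency string such as "6ms:550us" or "928us" to microseconds.
// Assumes the optional millisecond part is separated from the microsecond
// part by "ms:" and that the two parts are simply added together.
long to_micros(const std::string& s) {
    long ms = 0;
    std::size_t pos = 0;
    const std::size_t sep = s.find("ms:");
    if (sep != std::string::npos) {
        ms = std::stol(s.substr(0, sep));
        pos = sep + 3;  // skip past "ms:"
    }
    const long us = std::stol(s.substr(pos));  // stol stops at the "us" suffix
    return ms * 1000 + us;
}

// Relative change in percent; a negative value means the optimized run
// had lower average latency than the baseline run.
double percent_change(const std::string& baseline, const std::string& optimized) {
    const double b = static_cast<double>(to_micros(baseline));
    const double o = static_cast<double>(to_micros(optimized));
    return (o - b) / b * 100.0;
}
```

Applied pairwise to the two listings above, `percent_change("2ms:40us", "2ms:48us")` is the only positive result; the other eight pairs come out negative, which matches the "almost all test cases" wording.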
Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision:
add annotation
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/5046/files
- new: https://git.openjdk.java.net/jdk/pull/5046/files/348c1c04..3f685dee
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=03
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=02-03
Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/5046.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5046/head:pull/5046
PR: https://git.openjdk.java.net/jdk/pull/5046