Long pause time of Final Mark in latest 11u shenandoah

Liang Mao maoliang.ml at alibaba-inc.com
Mon Aug 31 04:31:32 UTC 2020


Hi Roman,

Thanks very much for your quick reply! I tried the flush interval but it
didn't work. And then I looked into the code and reduced the number of
ShenandoahSATBBufferSize which resolved the problem. But now I don't have
the testing environment and cannot do more verificatoins. Will let you
 know once I have a double check.

BTW, do you have any experience of performance impact if we reduce the
 default value of ShenandoahSATBBufferSize? I guess is it possible that
the "snapshots" in Shenandoah would be more than G1 because G1 satb
only handles old objects?

Thanks,
Liang


----------------------------------------------------------------------
Hello Liang,

It sticks out that this is a really short concurrent-mark phase (69ms)
but a really long final-mark phase (680ms). It seems likely that most
of the reachable graph gets 'hidden' in SATB queues, which are only
processed at final-mark. Also, the SATB flush interval is 100ms, so we
don't get to flush the SATB queues at all, before hitting final-mark.
It may be worth experimenting with lower
ShenandoahSATBBufferFlushInterval, e.g.
-XX:ShenandoahSATBBufferFlushInterval=30 or so.

We are considering to add machinery to force-flush SATB queues during
conc-mark, before hitting final-mark, so that we decrease the
likelyhood of unmarked subgraphs hiding in SATB buffers.

Let me know if decreasing the SATB flush interval helps!

Thanks for the report!

Cheers,
Roman





------------------------------------------------------------------
From:MAO, Liang <maoliang.ml at alibaba-inc.com>
Send Time:2020 Aug. 27 (Thu.) 14:56
To:shenandoah-dev at openjdk.java.net <shenandoah-dev at openjdk.java.net>
Subject:Long pause time of Final Mark in latest 11u shenandoah

Hi Shenandoah developers,

We are investigating the product version of Shenandoah GC in 11u and it showed
a good result except an unexpected behavior of long pause time in final mark
(finish queues actually). I didn't find any open discussions of this issue yet
and cannot see any suspicious points in code. Could you please help?

Below is the problematic GC information in log. Usually the final mark only took
~2 ms. The long pause time of final mark happened in a rate of ~1% in this application.
The java option is "-Xmx24g -Xms24g -XX:+UseShenandoahGC".

[2020-08-27T13:42:26.940+0800][gc            ] Trigger: Free (2451M) is below minimum threshold (2457M)
[2020-08-27T13:42:26.940+0800][gc,ergo       ] Free: 2451M, Max: 8192K regular, 1288M humongous, Frag: 47% external, 0% internal; Reserve: 1232M, Max: 8192K
[2020-08-27T13:42:26.940+0800][gc,start      ] GC(2167) Concurrent reset
[2020-08-27T13:42:26.940+0800][gc,task       ] GC(2167) Using 4 of 8 workers for concurrent reset
[2020-08-27T13:42:26.940+0800][gc,ergo       ] GC(2167) Pacer for Reset. Non-Taxable: 24576M
[2020-08-27T13:42:26.946+0800][gc            ] GC(2167) Concurrent reset 5.489ms
[2020-08-27T13:42:26.947+0800][gc,start      ] GC(2167) Pause Init Mark
[2020-08-27T13:42:26.947+0800][gc,task       ] GC(2167) Using 8 of 8 workers for init marking
[2020-08-27T13:42:26.948+0800][gc,ergo       ] GC(2167) Pacer for Mark. Expected Live: 3577M, Free: 2451M, Non-Taxable: 245M, Alloc Tax Rate: 1.8x
[2020-08-27T13:42:26.948+0800][gc            ] GC(2167) Pause Init Mark 1.093ms
[2020-08-27T13:42:26.948+0800][gc,start      ] GC(2167) Concurrent marking
[2020-08-27T13:42:26.948+0800][gc,task       ] GC(2167) Using 4 of 8 workers for concurrent marking
[2020-08-27T13:42:27.018+0800][gc            ] GC(2167) Concurrent marking 69.710ms
[2020-08-27T13:42:27.019+0800][gc,start      ] GC(2167) Pause Final Mark
[2020-08-27T13:42:27.019+0800][gc,task       ] GC(2167) Using 8 of 8 workers for final marking
[2020-08-27T13:42:27.698+0800][gc,ergo       ] GC(2167) Adaptive CSet Selection. Target Free: 3481M, Actual Free: 15270M, Max CSet: 1024M, Min Garbage: 0B
[2020-08-27T13:42:27.698+0800][gc,ergo       ] GC(2167) Collectable Garbage: 16415M (96%), Immediate: 11966M (70%), CSet: 4448M (26%)
[2020-08-27T13:42:27.699+0800][gc,ergo       ] GC(2167) Pacer for Evacuation. Used CSet: 5469M, Free: 14101M, Non-Taxable: 1410M, Alloc Tax Rate: 1.1x
[2020-08-27T13:42:27.699+0800][gc            ] GC(2167) Pause Final Mark 679.962ms
[2020-08-27T13:42:27.699+0800][gc,start      ] GC(2167) Concurrent cleanup
[2020-08-27T13:42:27.699+0800][gc            ] GC(2167) Concurrent cleanup 20942M->8984M(24576M) 0.288ms
[2020-08-27T13:42:27.699+0800][gc,ergo       ] GC(2167) Free: 14100M, Max: 8192K regular, 1288M humongous, Frag: 91% external, 0% internal; Reserve: 1231M, Max: 8192K
[2020-08-27T13:42:27.699+0800][gc,start      ] GC(2167) Concurrent evacuation
[2020-08-27T13:42:27.699+0800][gc,task       ] GC(2167) Using 4 of 8 workers for concurrent evacuation
[2020-08-27T13:42:27.879+0800][gc            ] GC(2167) Concurrent evacuation 180.091ms
[2020-08-27T13:42:27.881+0800][gc,start      ] GC(2167) Pause Init Update Refs
[2020-08-27T13:42:27.881+0800][gc,ergo       ] GC(2167) Pacer for Update Refs. Used: 10377M, Free: 13756M, Non-Taxable: 1375M, Alloc Tax Rate: 1.1x
[2020-08-27T13:42:27.881+0800][gc            ] GC(2167) Pause Init Update Refs 0.154ms
[2020-08-27T13:42:27.881+0800][gc,start      ] GC(2167) Concurrent update references
[2020-08-27T13:42:27.881+0800][gc,task       ] GC(2167) Using 4 of 8 workers for concurrent reference update
[2020-08-27T13:42:28.570+0800][gc            ] GC(2167) Concurrent update references 688.702ms[2020-08-27T13:42:28.571+0800][gc,start      ] GC(2167) Pause Final Update Refs
[2020-08-27T13:42:28.571+0800][gc,task       ] GC(2167) Using 8 of 8 workers for final reference update
[2020-08-27T13:42:28.571+0800][gc            ] GC(2167) Pause Final Update Refs 0.402ms
[2020-08-27T13:42:28.571+0800][gc,start      ] GC(2167) Concurrent cleanup
[2020-08-27T13:42:28.571+0800][gc            ] GC(2167) Concurrent cleanup 10886M->5416M(24576M) 0.156ms


[2020-08-27T13:42:28.572+0800][gc,stats      ] Pause Final Mark (G)             681245 us
[2020-08-27T13:42:28.572+0800][gc,stats      ] Pause Final Mark (N)             679951 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]   Finish Queues                  678515 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]   System Purge                      130 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]     Weak Roots                      129 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]   Update Region States               30 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]   Retire TLABs                      102 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]   Choose Collection Set             197 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]   Rebuild Free Set                   26 us
[2020-08-27T13:42:28.572+0800][gc,stats      ]   Initial Evacuation                893 us, parallelism: 5.36x

Thanks,
Liang



More information about the shenandoah-dev mailing list