CMS-concurrent-sweep spent extremly long time with 8u
Liang Mao
maoliang.ml at alibaba-inc.com
Tue Feb 18 11:03:31 UTC 2020
Hi,
I saw an very unusual scenario that concurrent sweep cost extremly long time.
After a long time concurrent mode failure, the GC was recovered.
In the previous CMS cycle at 05:32, sweep worked fine that we could see
old gen occupancy continued to decrease until sweep completed. After the
problematic sweep started, the old gen occupancy dropped very slowly in
the early stage and then increased to promotion failure. The young GC
slowed 30x at that time as well.
2020-02-18T05:32:12.740+0800: 14277958.735: [CMS-concurrent-sweep-start]
2020-02-18T05:32:13.178+0800: 14277959.173: [GC (Allocation Failure) 2020-02-18T05:32:13.178+0800: 14277959.173: [ParNew 247829K->126882K(3145728K), 0.0259766 secs] 57679525K->55581737K(82837504K), 0.0266680 secs] [Times: user=0.31 sys=0.00, real=0.03 secs]
2020-02-18T05:32:13.764+0800: 14277959.759: [GC (Allocation Failure) 2020-02-18T05:32:13.764+0800: 14277959.759: [ParNew 2224034K->100365K(3145728K), 0.0138495 secs] 56901617K->54782226K(82837504K), 0.0145248 secs] [Times: user=0.30 sys=0.00, real=0.01 secs]
2020-02-18T05:32:14.357+0800: 14277960.353: [GC (Allocation Failure) 2020-02-18T05:32:14.358+0800: 14277960.353: [ParNew 2197517K->107850K(3145728K), 0.0129100 secs] 56173691K->54088043K(82837504K), 0.0135664 secs] [Times: user=0.28 sys=0.00, real=0.01 secs]
...
2020-02-18T05:32:24.655+0800: 14277970.650: [GC (Allocation Failure) 2020-02-18T05:32:24.655+0800: 14277970.650: [ParNew 2218555K->137181K(3145728K), 0.0127018 secs] 45316426K->43236019K(82837504K), 0.0133486 secs] [Times: user=0.26 sys=0.00, real=0.01 secs]
...
2020-02-18T05:32:31.642+0800: 14277977.637: [GC (Allocation Failure) 2020-02-18T05:32:31.642+0800: 14277977.637: [ParNew 2257551K->159124K(3145728K), 0.0135751 secs] 38026885K->35932714K(82837504K), 0.0142222 secs] [Times: user=0.28 sys=0.00, real=0.02 secs]
...
2020-02-18T05:32:36.630+0800: 14277982.625: [GC (Allocation Failure) 2020-02-18T05:32:36.630+0800: 14277982.625: [ParNew 2232867K->134026K(3145728K), 0.0132394 secs] 32928275K->30837706K(82837504K), 0.0138968 secs] [Times: user=0.27 sys=0.00, real=0.02 secs]
2020-02-18T05:32:36.812+0800: 14277982.807: [CMS-concurrent-sweep: 23.250/24.072 secs] [Times: user=137.69 sys=0.00, real=24.07 secs]
...
2020-02-18T06:31:11.378+0800: 14281497.373: [CMS-concurrent-sweep-start]
2020-02-18T06:31:14.862+0800: 14281500.857: [GC (Allocation Failure) 2020-02-18T06:31:14.862+0800: 14281500.857: [ParNew 2925648K->871894K(3145728K), 0.3871002 secs] 58747945K->56735151K(82837504K), 0.3877653 secs] [Times: user=0.84 sys=0.00, real=0.39 secs]
2020-02-18T06:31:18.103+0800: 14281504.098: [GC (Allocation Failure) 2020-02-18T06:31:18.103+0800: 14281504.098: [ParNew 2969046K->828180K(3145728K), 0.4009765 secs] 58768864K->56670502K(82837504K), 0.4016504 secs] [Times: user=0.86 sys=0.00, real=0.40 secs]
...
2020-02-18T06:31:53.530+0800: 14281539.525: [GC (Allocation Failure) 2020-02-18T06:31:53.530+0800: 14281539.525: [ParNew 2884806K->790526K(3145728K), 0.4082761 secs] 58450942K->56399496K(82837504K), 0.4089053 secs] [Times: user=0.86 sys=0.00, real=0.41 secs]
2020-02-18T06:31:56.913+0800: 14281542.908: [GC (Allocation Failure) 2020-02-18T06:31:56.914+0800: 14281542.909: [ParNew 2887678K->830520K(3145728K), 0.3762305 secs] 58449269K->56431129K(82837504K), 0.3768546 secs] [Times: user=0.80 sys=0.00, real=0.37 secs]
...
2020-02-18T06:39:10.412+0800: 14281976.407: [GC (Allocation Failure) 2020-02-18T06:39:10.412+0800: 14281976.407: [ParNew 2912121K->831832K(3145728K), 0.3765129 secs] 55456571K->53415654K(82837504K), 0.3771153 secs] [Times: user=0.90 sys=0.00, real=0.38 secs]
...
2020-02-18T06:47:16.462+0800: 14282462.457: [GC (Allocation Failure) 2020-02-18T06:47:16.463+0800: 14282462.458: [ParNew 2931678K->872899K(3145728K), 0.3223769 secs] 56740808K->54718072K(82837504K), 0.3229753 secs] [Times: user=0.76 sys=0.00, real=0.32 secs]
...
2020-02-18T06:55:36.434+0800: 14282962.429: [GC (Allocation Failure) 2020-02-18T06:55:36.434+0800: 14282962.429: [ParNew 2932041K->837434K(3145728K), 0.3941144 secs] 59917770K->57864311K(82837504K), 0.3947332 secs] [Times: user=0.86 sys=0.00, real=0.39 secs]
...
2020-02-18T07:02:02.007+0800: 14283348.002: [GC (Allocation Failure) 2020-02-18T07:02:02.008+0800: 14283348.003: [ParNew 2871413K->847657K(3145728K), 0.3542050 secs] 68308980K->66321780K(82837504K), 0.3549298 secs] [Times: user=0.79 sys=0.00, real=0.36 secs]
...
2020-02-18T07:05:35.166+0800: 14283561.161: [GC (Allocation Failure) 2020-02-18T07:05:35.166+0800: 14283561.161: [ParNew 2859132K->870624K(3145728K), 0.3499013 secs] 75159088K->73210284K(82837504K), 0.3505328 secs] [Times: user=0.79 sys=0.00, real=0.35 secs]
2020-02-18T07:05:36.418+0800: 14283562.413: [GC (Allocation Failure) 2020-02-18T07:05:36.419+0800: 14283562.414: [ParNew (0: promotion failure size = 3286) (1: promotion failure size = 3288) (2: promotion failure size = 3285) (3: promotion failure size = 3286) (4: promotion failure size = 3286) (5: promotion failure size = 3288) (6: promotion failure size = 3290) (7: promotion failure size = 3286) (8: promotion failure size = 3286) (9: promotion failure size = 3287) (10: promotion failure size = 3292) (11: promotion failure size = 3286) (12: promotion failure size = 3285) (13: promotion failure size = 3288) (14: promotion failure size = 3286) (15: promotion failure size = 3286) (16: promotion failure size = 3287) (17: promotion failure size = 3287) (18: promotion failure size = 3285) (19: promotion failure size = 3286) (20: promotion failure size = 3285) (21: promotion failure size = 3289) (22: promotion failure size = 3289) (23: promotion failure size = 3286)
(promotion failed): 2967776K->3077198K(3145728K), 0.8107304 secs]2020-02-18T07:05:37.229+0800: 14283563.224: [CMS2020-02-18T07:06:57.395+0800: 14283643.390: [CMS-concurrent-sweep: 1752.431/2146.017 secs] [Times: user=8483.92 sys=0.00, real=2146.02 secs]
(concurrent mode failure) 72370103K->31883050K(79691776K), 95.8957007 secs] 75293380K->31883050K(82837504K), [Metaspace: 59347K->59341K(61440K)], 96.7139359 secs] [Times: user=97.25 sys=0.00, real=96.72 secs]
2020-02-18T07:08:04.605+0800: 14283710.600: [GC (Allocation Failure) 2020-02-18T07:08:04.605+0800: 14283710.600: [ParNew
2097152K->361723K(3145728K), 0.0195181 secs] 33980202K->32244773K(82837504K), 0.0198080 secs] [Times: user=0.36 sys=0.00, real=0.02 secs]
There're no any suspicious GC options:
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSScavengeBeforeRemark -XX:GCLogFileSize=209715200 -XX:InitialHeapSize=85899345920 -XX:MaxHeapSize=85899345920 -XX:MaxNewSize=4294967296 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=4294967296 -XX:NumberOfGCLogFiles=5 -XX:OldPLABSize=16 -XX:OldSize=81604378624 -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError=kill -9 %p -XX:ParGCCardsPerStrideChunk=4096 -XX:ParallelGCThreads=24 -XX:-ParallelRefProcEnabled -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCRootsTraceTime -XX:+PrintGCTimeStamps -XX:+PrintPromotionFailure -XX:+PrintReferenceGC -XX:SurvivorRatio=2 -XX:+UnlockDiagnosticVMOptions -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC
The instance has been running for months.
Does anybody know if there is a specific bug? Or it's just because of the fragment issue of CMS?
Thanks,
Liang
More information about the hotspot-gc-dev
mailing list