From prabhashrathore at gmail.com  Mon Feb 3 05:52:00 2020
From: prabhashrathore at gmail.com (Prabhash Rathore)
Date: Sun, 2 Feb 2020 21:52:00 -0800
Subject: Information on how to parse/interpret ZGC Logs
Message-ID: 

Hello,

We have decided to use the ZGC garbage collector for our Java application
running on Java 11. I was wondering if there are any tools or any
documentation on how to interpret ZGC logs.

I found the following statistics in the ZGC log, which as per my
understanding show a very large allocation stall of 3902042.342
milliseconds. It would be really great if I could get some help
understanding this further.

[2020-02-02T22:37:36.883+0000] === Garbage Collection Statistics =======================================================================================================================
[2020-02-02T22:37:36.883+0000]                                                        Last 10s                Last 10m                Last 10h                Total
[2020-02-02T22:37:36.883+0000]                                                        Avg / Max               Avg / Max               Avg / Max               Avg / Max
[2020-02-02T22:37:36.883+0000]   Collector: Garbage Collection Cycle                  0.000 / 0.000           7789.187 / 7789.187     12727.424 / 3903938.012 1265.033 / 3903938.012  ms
[2020-02-02T22:37:36.883+0000]  Contention: Mark Segment Reset Contention             0 / 0                   10 / 1084               176 / 15122             42 / 15122              ops/s
[2020-02-02T22:37:36.883+0000]  Contention: Mark SeqNum Reset Contention              0 / 0                   0 / 5                   0 / 31                  0 / 31                  ops/s
[2020-02-02T22:37:36.883+0000]  Contention: Relocation Contention                     0 / 0                   0 / 3                   1 / 708                 7 / 890                 ops/s
[2020-02-02T22:37:36.883+0000]    Critical: Allocation Stall                          0.000 / 0.000           0.000 / 0.000           6714.722 / 3902042.342  6714.722 / 3902042.342  ms
[2020-02-02T22:37:36.883+0000]    Critical: Allocation Stall                          0 / 0                   0 / 0                   12 / 4115               2 / 4115                ops/s
[2020-02-02T22:37:36.883+0000]    Critical: GC Locker Stall                           0.000 / 0.000           0.000 / 0.000           3.979 / 6.561           1.251 / 6.561           ms
[2020-02-02T22:37:36.883+0000]    Critical: GC Locker Stall                           0 / 0                   0 / 0                   0 / 1                   0 / 1                   ops/s
[2020-02-02T22:37:36.883+0000]      Memory: Allocation Rate                           0 / 0                   6 / 822                 762 / 25306             1548 / 25306            MB/s
[2020-02-02T22:37:36.883+0000]      Memory: Heap Used After Mark                      0 / 0                   92170 / 92170           89632 / 132896          30301 / 132896          MB
[2020-02-02T22:37:36.883+0000]      Memory: Heap Used After Relocation                0 / 0                   76376 / 76376           67490 / 132928          8047 / 132928           MB
[2020-02-02T22:37:36.883+0000]      Memory: Heap Used Before Mark                     0 / 0                   92128 / 92128           84429 / 132896          29452 / 132896          MB
[2020-02-02T22:37:36.883+0000]      Memory: Heap Used Before Relocation               0 / 0                   86340 / 86340           76995 / 132896          15862 / 132896          MB
[2020-02-02T22:37:36.883+0000]      Memory: Out Of Memory                             0 / 0                   0 / 0                   0 / 0                   0 / 0                   ops/s
[2020-02-02T22:37:36.883+0000]      Memory: Page Cache Flush                          0 / 0                   0 / 0                   62 / 2868               16 / 2868               MB/s
[2020-02-02T22:37:36.883+0000]      Memory: Page Cache Hit L1                         0 / 0                   7 / 2233                277 / 11553             583 / 11553             ops/s
[2020-02-02T22:37:36.883+0000]      Memory: Page Cache Hit L2                         0 / 0                   0 / 0                   20 / 4619               59 / 4619               ops/s
[2020-02-02T22:37:36.883+0000]      Memory: Page Cache Miss                           0 / 0                   0 / 0                   15 / 1039               3 / 1297                ops/s
[2020-02-02T22:37:36.883+0000]      Memory: Undo Object Allocation Failed             0 / 0                   0 / 0                   0 / 24                  0 / 24                  ops/s
[2020-02-02T22:37:36.883+0000]      Memory: Undo Object Allocation Succeeded          0 / 0                   0 / 3                   1 / 708                 7 / 890                 ops/s
[2020-02-02T22:37:36.883+0000]      Memory: Undo Page Allocation                      0 / 0                   0 / 12                  30 / 3464               7 / 3464                ops/s
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Destroy Detached Pages         0.000 / 0.000           0.004 / 0.004           11.675 / 1484.886       1.155 / 1484.886        ms
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Mark                           0.000 / 0.000           7016.569 / 7016.569     11758.365 / 3901893.544 1103.558 / 3901893.544  ms
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Mark Continue                  0.000 / 0.000           0.000 / 0.000           1968.844 / 3674.454     1968.844 / 3674.454     ms
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Prepare Relocation Set         0.000 / 0.000           453.732 / 453.732       364.535 / 7103.720      39.453 / 7103.720       ms
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Process Non-Strong References  0.000 / 0.000           2.003 / 2.003           2.738 / 34.406          2.253 / 34.406          ms
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Relocate                       0.000 / 0.000           261.822 / 261.822       335.954 / 2207.669      45.868 / 2207.669       ms
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Reset Relocation Set           0.000 / 0.000           6.083 / 6.083           13.489 / 1128.678       3.574 / 1128.678        ms
[2020-02-02T22:37:36.883+0000]       Phase: Concurrent Select Relocation Set          0.000 / 0.000           6.379 / 6.379           97.530 / 1460.679       18.439 / 1460.679       ms
[2020-02-02T22:37:36.883+0000]       Phase: Pause Mark End                            0.000 / 0.000           4.420 / 4.420           6.219 / 26.498          6.474 / 40.883          ms
[2020-02-02T22:37:36.883+0000]       Phase: Pause Mark Start                          0.000 / 0.000           14.836 / 14.836         11.893 / 28.350         11.664 / 41.767         ms
[2020-02-02T22:37:36.884+0000]       Phase: Pause Relocate Start                      0.000 / 0.000           13.411 / 13.411         30.849 / 697.344        11.995 / 697.344        ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark                           0.000 / 0.000           7015.793 / 7016.276     18497.265 / 3901893.075 1690.497 / 3901893.075  ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark Idle                      0.000 / 0.000           1.127 / 13.510          1.292 / 219.999         1.280 / 219.999         ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark Try Flush                 0.000 / 0.000           1.295 / 2.029           47.094 / 34869.359      4.797 / 34869.359       ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Mark Try Terminate             0.000 / 0.000           1.212 / 14.847          1.760 / 3799.238        1.724 / 3799.238        ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent References Enqueue             0.000 / 0.000           0.009 / 0.009           0.022 / 1.930           0.017 / 2.350           ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent References Process             0.000 / 0.000           0.599 / 0.599           0.768 / 23.966          0.495 / 23.966          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots                     0.000 / 0.000           0.882 / 1.253           1.155 / 21.699          1.077 / 23.602          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots JNIWeakHandles      0.000 / 0.000           0.301 / 0.943           0.308 / 10.868          0.310 / 23.219          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots StringTable         0.000 / 0.000           0.289 / 0.496           0.390 / 12.794          0.363 / 22.907          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Concurrent Weak Roots VMWeakHandles       0.000 / 0.000           0.230 / 0.469           0.329 / 21.267          0.331 / 23.135          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Pause Mark Try Complete                   0.000 / 0.000           0.000 / 0.000           0.501 / 4.801           0.480 / 17.208          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Pause Remap TLABS                         0.000 / 0.000           0.252 / 0.252           0.195 / 0.528           0.226 / 3.451           ms
[2020-02-02T22:37:36.884+0000]    Subphase: Pause Retire TLABS                        0.000 / 0.000           1.195 / 1.195           1.324 / 5.082           1.408 / 11.219          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots                               0.000 / 0.000           6.968 / 10.865          12.329 / 693.701        6.431 / 1300.994        ms
[2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots ClassLoaderDataGraph          0.000 / 0.000           4.819 / 8.232           9.635 / 693.405         3.476 / 693.405         ms
[2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots CodeCache                     0.000 / 0.000           0.842 / 2.731           0.996 / 83.553          0.780 / 83.553          ms
[2020-02-02T22:37:36.884+0000]    Subphase: Pause Roots JNIHandles                    0.000 / 0.000           1.171 / 6.314           0.866 / 17.875          0.837 / 25.708          ms

Thank you!
Prabhash Rathore

From per.liden at oracle.com  Mon Feb 3 10:35:13 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 3 Feb 2020 11:35:13 +0100
Subject: Information on how to parse/interpret ZGC Logs
In-Reply-To: 
References: 
Message-ID: <365169bd-1d39-d0e5-eb21-2a7130973b03@oracle.com>

Hi,

On 2020-02-03 06:52, Prabhash Rathore wrote:
> Hello,
>
> We have decided to use the ZGC garbage collector for our Java application
> running on Java 11. I was wondering if there are any tools or any
> documentation on how to interpret ZGC logs.

Is there something in particular in the logs you're wondering about?

> I found the following statistics in the ZGC log, which as per my
> understanding show a very large allocation stall of 3902042.342
> milliseconds. It would be really great if I could get some help
> understanding this further.

I can see that you've had marking times of more than an hour, which
suggests that something in your system is seriously wrong (like being
extremely overloaded, or an extremely slow disk that you log to... just
guessing here).
I think you need to have a broader look at the health of the system
before we can draw any conclusions from the GC logs.

cheers,
Per

> [Quoted GC statistics table snipped; see the original message above.]

From prabhashrathore at gmail.com  Sun Feb 9 06:55:33 2020
From: prabhashrathore at gmail.com (Prabhash Rathore)
Date: Sat, 8 Feb 2020 22:55:33 -0800
Subject: Information on how to parse/interpret ZGC Logs
In-Reply-To: <365169bd-1d39-d0e5-eb21-2a7130973b03@oracle.com>
References: <365169bd-1d39-d0e5-eb21-2a7130973b03@oracle.com>
Message-ID: 

Hi Per,

Thanks for your reply!

About ZGC logs, in general I am trying to understand the following:

- What are the full pause times?
- How many such pauses are there per unit of time?
- Anything else which helps me eliminate GC as the cause of high
  application latency.

This is how I have configured ZGC logging at the JVM level; I am
wondering if I should add other tags, like safepoint, to get more
details about GC stats:

-Xlog:gc*=debug:file=gc.log

All JVM flags used in my application:

-Xms130G -Xmx130G
-Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
-XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
-XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow

It's a large machine with 96 threads and 196 GB RAM.

I have -XX:+AlwaysPreTouch configured as another option.
With the AlwaysPreTouch option, the Linux top command shows very high
shared and resident memory. My max heap size is configured as 130 GB,
but I see shared memory shown as 388 GB and resident memory as 436 GB.
On the other hand, the total virtual memory for this process in top is
shown as 17.1 terabytes. How is this possible? My whole machine is
196 GB (is this accounting for things swapped out to disk?). Without
AlwaysPreTouch, the numbers look close to the heap size. I am trying to
understand why, with pre-touch, the process memory is shown as higher
than the configured size. I understand shared memory has all the shared
libs mapped, but how can it be such a large size?

Regarding the high GC pause times, I did notice that my machine was low
on memory and it was swapping, hence slowing down everything. For now I
have disabled swappiness completely with a kernel VM tunable, but I am
still trying to find the actual cause of why swapping kicked in. This
machine only runs this particular Java application, which has a 130 GB
heap size. Other than the heap, I still have 66 GB of memory available
on the host. I am trying to figure out if there is a native memory
leak. If you have any inputs on this then please share.

Thanks!
Prabhash Rathore

On Mon, Feb 3, 2020 at 2:35 AM Per Liden wrote:

> Hi,
>
> [Earlier reply and quoted GC statistics table snipped.]

From per.liden at oracle.com  Mon Feb 10 14:07:53 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 10 Feb 2020 15:07:53 +0100
Subject: Information on how to parse/interpret ZGC Logs
In-Reply-To: 
References: <365169bd-1d39-d0e5-eb21-2a7130973b03@oracle.com>
Message-ID: <89bc7951-8e1f-50c6-f29c-d03233e65e6b@oracle.com>

Hi,

On 2/9/20 7:55 AM, Prabhash Rathore wrote:
> Hi Per,
>
> Thanks for your reply!
>
> About ZGC logs, in general I am trying to understand the following:
>
> * What are the full pause times?
The statistics table that is printed has the following rows:

Phase: Pause Mark End
Phase: Pause Mark Start
Phase: Pause Relocate Start

These show you the pause time average and max for different time
windows. These times are also printed for each GC (if gc+phases or gc*
logging is enabled).

You should keep an eye on the "Critical: Allocation Stall" row. It
should always be zero. If it's not, it's an indication that the GC
can't keep up with the allocation rate of your application, and you
should consider reconfiguring your system, e.g. increasing the max heap
size if possible.

> * How many such pauses per unit time?

The log outputs the time when things happen, so just look for the
"Pause ..." log lines and do the math. However, you can also just look
at the MMU log line to get an overview of the worst case that has
happened so far (google "Minimum Mutator Utilization" if you're
unfamiliar with the term).

> * Anything else which helps me eliminate GC as the cause of high
>   application latency.
>
> This is how I have configured ZGC logging at the JVM level; I am
> wondering if I should add other tags, like safepoint, to get more
> details about GC stats:
> -Xlog:gc*=debug:file=gc.log

It's not recommended to use -Xlog:gc*=debug in any kind of production
setting, since the amount of logging done in sensitive paths can affect
pause times, etc. Instead use just -Xlog:gc*; it will log the most
relevant information you need to understand what's going on.

> All JVM flags used in my application:
> -Xms130G -Xmx130G
> -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
> -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
> -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow
>
> It's a large machine with 96 threads and 196 GB RAM.

Generally speaking, -XX:ConcGCThreads=48 and -XX:ParallelGCThreads=96
look way too high for the given machine.
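[Editor's note: the "look for the 'Pause ...' log lines and do the math" suggestion can be sketched with a short script. This is an illustration, not part of the original thread; it assumes per-GC pause lines of the shape `[<timestamp>] GC(<n>) Pause <Phase> <duration>ms`, which is roughly what `-Xlog:gc*` with a `time` decorator produces. Decorators vary with the `-Xlog` configuration, so the pattern may need adjusting.]

```python
import re

# Assumed ZGC pause-line shape (varies with -Xlog decorators), e.g.:
#   [2020-02-02T22:37:36.883+0000] GC(13) Pause Mark Start 14.836ms
PAUSE_RE = re.compile(
    r"Pause (?:Mark Start|Mark End|Relocate Start) (\d+(?:\.\d+)?)ms")

def pause_summary(lines):
    """Return (count, max_ms, total_ms) over all ZGC pause phases found."""
    durations = [float(m.group(1))
                 for line in lines
                 if (m := PAUSE_RE.search(line)) is not None]
    if not durations:
        return 0, 0.0, 0.0
    return len(durations), max(durations), sum(durations)

# Hypothetical sample lines for illustration.
sample = [
    "[2020-02-02T22:37:36.883+0000] GC(13) Pause Mark Start 14.836ms",
    "[2020-02-02T22:37:40.120+0000] GC(13) Pause Mark End 4.420ms",
    "[2020-02-02T22:37:41.002+0000] GC(13) Pause Relocate Start 13.411ms",
]
count, worst_ms, total_ms = pause_summary(sample)
print(count, worst_ms)  # pause count and worst single pause in ms
```

On a real log you would pass `open("gc.log")` to `pause_summary`; counting matches per wall-clock interval (using the timestamp decorator) then gives pauses per unit time.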
I would start by not touching those at all and just using the default
settings.

> I have -XX:+AlwaysPreTouch configured as another option. With the
> AlwaysPreTouch option, the Linux top command shows very high shared
> and resident memory. My max heap size is configured as 130 GB, but I
> see shared memory shown as 388 GB and resident memory as 436 GB. On
> the other hand, the total virtual memory for this process in top is
> shown as 17.1 terabytes. How is this possible? My whole machine is
> 196 GB (is this accounting for things swapped out to disk?). Without
> AlwaysPreTouch, the numbers look close to the heap size. I am trying
> to understand why, with pre-touch, the process memory is shown as
> higher than the configured size. I understand shared memory has all
> the shared libs mapped, but how can it be such a large size?

Please see this thread:
https://mail.openjdk.java.net/pipermail/zgc-dev/2019-September/000731.html
for how to interpret the resident size. The virtual memory size is
unrelated to how much physical memory is actually used, as it's just an
address space reservation. The -XX:+AlwaysPreTouch option is unrelated
to all of this; it just means the whole heap will be touched at startup
of the JVM, which means the RSS/VIRT numbers will show up and stabilize
immediately.

> Regarding the high GC pause times, I did notice that my machine was
> low on memory and it was swapping, hence slowing down everything. For
> now I have disabled swappiness completely with a kernel VM tunable,
> but I am still

Just do "swapoff -a" to disable all swapping.

cheers,
Per

> trying to find the actual cause of why swapping kicked in. This
> machine only runs this particular Java application, which has a
> 130 GB heap size. Other than the heap, I still have 66 GB of memory
> available on the host. I am trying to figure out if there is a native
> memory leak. If you have any inputs on this then please share.
>
> Thanks!
> Prabhash Rathore
>
> On Mon, Feb 3, 2020 at 2:35 AM Per Liden wrote:
>
> [Earlier reply and quoted GC statistics table snipped.]
?453.732 / 453.732? ? ?364.535 / > > 7103.720? ? ?39.453 / 7103.720? ? ms > > [2020-02-02T22:37:36.883+0000]? ? ? ?Phase: Concurrent Process > Non-Strong > > References? ? ? 0.000 / 0.000? ? ? ? ?2.003 / 2.003? ? ? ? ?2.738 > / 34.406 > >? ? ? ? ?2.253 / 34.406? ? ? ms > > [2020-02-02T22:37:36.883+0000]? ? ? ?Phase: Concurrent Relocate > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ?261.822 / 261.822? ? ?335.954 > / 2207.669 > >? ? ? 45.868 / 2207.669? ? ms > > [2020-02-02T22:37:36.883+0000]? ? ? ?Phase: Concurrent Reset > Relocation Set > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?6.083 / 6.083? ? ? ? 13.489 > / 1128.678 > >? ? ? ?3.574 / 1128.678? ? ms > > [2020-02-02T22:37:36.883+0000]? ? ? ?Phase: Concurrent Select > Relocation > > Set? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?6.379 / 6.379? ? ? ? 97.530 / > > 1460.679? ? ?18.439 / 1460.679? ? ms > > [2020-02-02T22:37:36.883+0000]? ? ? ?Phase: Pause Mark End > >? ? ? ? ? ? ? ? ?0.000 / 0.000? ? ? ? ?4.420 / 4.420? ? ? ? ?6.219 > / 26.498 > >? ? ? ?6.474 / 40.883? ? ? ms > > [2020-02-02T22:37:36.883+0000]? ? ? ?Phase: Pause Mark Start > >? ? ? ? ? ? ? ? ?0.000 / 0.000? ? ? ? 14.836 / 14.836? ? ? ?11.893 > / 28.350 > >? ? ? 11.664 / 41.767? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? ? ?Phase: Pause Relocate Start > >? ? ? ? ? ? ? ? ?0.000 / 0.000? ? ? ? 13.411 / 13.411? ? ? ?30.849 > / 697.344 > >? ? ? ?11.995 / 697.344? ? ?ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent Mark > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? 7015.793 / 7016.276? 18497.265 / > > 3901893.075? 1690.497 / 3901893.075? ?ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent Mark Idle > >? ? ? ? ? ? ? ? ?0.000 / 0.000? ? ? ? ?1.127 / 13.510? ? ? ? 1.292 > / 219.999 > >? ? ? ? 1.280 / 219.999? ? ?ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent Mark Try Flush > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?1.295 / 2.029? ? ? ? 47.094 > / 34869.359 > >? ? ? 4.797 / 34869.359? ?ms > > [2020-02-02T22:37:36.884+0000]? ? 
Subphase: Concurrent Mark Try > Terminate > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?1.212 / 14.847? ? ? ? 1.760 > / 3799.238 > >? ? ? ?1.724 / 3799.238? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent References > Enqueue > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?0.009 / 0.009? ? ? ? ?0.022 > / 1.930 > >? ? ? 0.017 / 2.350? ? ? ?ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent References > Process > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?0.599 / 0.599? ? ? ? ?0.768 > / 23.966 > >? ? ? ?0.495 / 23.966? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent Weak Roots > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?0.882 / 1.253? ? ? ? ?1.155 > / 21.699 > >? ? ? ?1.077 / 23.602? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent Weak Roots > > JNIWeakHandles? ? ? ? ? 0.000 / 0.000? ? ? ? ?0.301 / 0.943 > ? ?0.308 / > > 10.868? ? ? ? 0.310 / 23.219? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent Weak Roots > > StringTable? ? ? ? ? ? ?0.000 / 0.000? ? ? ? ?0.289 / 0.496 > ? ?0.390 / > > 12.794? ? ? ? 0.363 / 22.907? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Concurrent Weak Roots > > VMWeakHandles? ? ? ? ? ?0.000 / 0.000? ? ? ? ?0.230 / 0.469 > ? ?0.329 / > > 21.267? ? ? ? 0.331 / 23.135? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Pause Mark Try Complete > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?0.000 / 0.000? ? ? ? ?0.501 > / 4.801 > >? ? ? 0.480 / 17.208? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Pause Remap TLABS > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?0.252 / 0.252? ? ? ? ?0.195 > / 0.528 > >? ? ? 0.226 / 3.451? ? ? ?ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Pause Retire TLABS > >? ? ? ? ? ? ? ? ?0.000 / 0.000? ? ? ? ?1.195 / 1.195? ? ? ? ?1.324 > / 5.082 > >? ? ? ? 1.408 / 11.219? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Pause Roots > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?6.968 / 10.865? ? ? ?12.329 > / 693.701 > >? ? ? 6.431 / 1300.994? ? 
ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Pause Roots > > ClassLoaderDataGraph? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?4.819 / 8.232 > >? ? 9.635 / 693.405? ? ? ?3.476 / 693.405? ? ?ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Pause Roots CodeCache > >? ? ? ? ? ? ? ? 0.000 / 0.000? ? ? ? ?0.842 / 2.731? ? ? ? ?0.996 > / 83.553 > >? ? ? ?0.780 / 83.553? ? ? ms > > [2020-02-02T22:37:36.884+0000]? ? Subphase: Pause Roots JNIHandles > >? ? ? ? ? ? ? ? ?0.000 / 0.000? ? ? ? ?1.171 / 6.314? ? ? ? ?0.866 > / 17.875 > >? ? ? ?0.837 / 25.708? ? ? ms > > > > > > Thank you! > > Prabhash Rathore > > > From peter_booth at me.com Mon Feb 10 17:27:59 2020 From: peter_booth at me.com (Peter Booth) Date: Mon, 10 Feb 2020 12:27:59 -0500 Subject: Information on how to parse/interpret ZGC Logs In-Reply-To: References: Message-ID: <48002F4A-20D2-4DFB-91BE-0E0E346DFCD5@me.com> Prabhash, What OS version? Is it a vanilla OS install? Can you print the output of the following? (Assuming Linux) egrep ?thp|trans? /proc/vmstat tail -28 /proc/cpuinfo Peter Sent from my iPhone > On Feb 9, 2020, at 1:56 AM, Prabhash Rathore wrote: > > ?Hi Per, > > Thanks for your reply! > > About ZGC logs, in general I am trying to understand following: > > - What are the full pause times? > - How many such pause per unit time? > - Anything else which helps me eliminate GC as cause for high > application latency. > > This is how I have configured ZGC logging at JVM level, wondering if I > should add other tags like Safepoint to get more details about GC stats: > -Xlog:gc*=debug:file=gc.log > > All JVM flas used in my application: > -Xms130G -Xmx130G > -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864 > -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError > -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48 > -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow > > It's a large machine with 96 threads and 196 GB RAM. 
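The pause-time questions above can be answered mechanically from a `-Xlog:gc*` log. A rough sketch in Python; the pause line format shown is an assumption based on the JDK 11 ZGC logs quoted in this thread, and the exact decorations depend on the `-Xlog` selectors in use, so treat the regex as a starting point:

```python
import re

# Matches ZGC pause phase lines, e.g.
#   [2020-02-02T22:37:36.883+0000] GC(5) Pause Mark Start 14.836ms
PAUSE_RE = re.compile(r"\]\s+GC\(\d+\)\s+(Pause [A-Za-z ]+?)\s+([0-9.]+)ms")

def summarize_pauses(lines):
    """Return {phase: (count, total_ms, max_ms)} over all pause lines."""
    stats = {}
    for line in lines:
        m = PAUSE_RE.search(line)
        if not m:
            continue
        phase, ms = m.group(1), float(m.group(2))
        count, total, peak = stats.get(phase, (0, 0.0, 0.0))
        stats[phase] = (count + 1, total + ms, max(peak, ms))
    return stats

# Sample lines modeled on the statistics quoted in this thread.
log = [
    "[2020-02-02T22:37:36.883+0000] GC(5) Pause Mark Start 14.836ms",
    "[2020-02-02T22:37:36.883+0000] GC(5) Pause Mark End 4.420ms",
    "[2020-02-02T22:37:36.884+0000] GC(5) Pause Relocate Start 13.411ms",
]
print(summarize_pauses(log))
```

Dividing each phase's count by the log's time span gives pauses per unit time, which is the second question above.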
> I have -XX:+AlwaysPreTouch configured as another option. With
> AlwaysPreTouch, the Linux top command shows very high shared and
> resident memory. My max heap size is configured as 130 GB, but shared
> memory is shown as 388 GB and resident memory as 436 GB. On the other
> hand, total virtual memory for this process in top is shown as 17.1
> terabytes. How is this possible? My whole machine is 196 GB (is this
> accounting for things swapped out to disk?). Without AlwaysPreTouch,
> the numbers look close to the heap size. I am trying to understand why,
> with pre-touch, process memory is shown as higher than the configured
> size. I understand shared memory has all shared libs mapped, but how
> can it be such a large size?
>
> Regarding the high GC pause times, I did notice that my machine was low
> on memory and was swapping, hence slowing everything down. For now I
> have disabled swappiness completely with a kernel VM tunable, but I am
> still trying to find the actual cause of why swapping kicked in. This
> machine only runs this particular Java application, which has a 130 GB
> heap. Other than the heap, I still have 66 GB of memory available on
> the host. I am trying to figure out if there is a native memory leak.
> If you have any inputs on this then please share.
>
> Thanks!
> Prabhash Rathore
>
>> On Mon, Feb 3, 2020 at 2:35 AM Per Liden wrote:
>>
>> Hi,
>>
>>> On 2020-02-03 06:52, Prabhash Rathore wrote:
>>> Hello,
>>>
>>> We have decided to use the ZGC garbage collector for our Java
>>> application running on Java 11. I was wondering if there are any
>>> tools or any documentation on how to interpret ZGC logs.
>>
>> Is there something in particular in the logs you're wondering about?
>>
>>> I found the following statistics in the ZGC log, which as per my
>>> understanding show a very large allocation stall of 3902042.342
>>> milliseconds. It would be really great if I could get some help to
>>> understand this further.
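The inflated top numbers above are consistent with ZGC's heap multi-mapping: on Linux, JDK 11's ZGC maps the same physical heap at several virtual addresses (the marked0, marked1, and remapped views), and per-mapping accounting in tools like top counts the heap up to three times. A back-of-envelope check; the 3x factor is an assumption about ZGC's mapping scheme, and the heap size is the one from this thread:

```python
# Rough sanity check of the "shared memory" figure reported by top.
heap_gb = 130      # -Xmx130G, from the JVM flags in this thread
views = 3          # assumed number of ZGC heap views (marked0/marked1/remapped)
apparent_gb = heap_gb * views
print(apparent_gb)  # 390, close to the ~388 GB "shared" figure from top
```

The multi-terabyte virtual size is similarly an artifact of reserved (not committed) address space, so neither number implies that more physical memory than the heap is in use.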
>> I can see that you've had marking times that are more than an hour
>> long, which suggests that something in your system is seriously wrong
>> (like extremely overloaded, or an extremely slow disk that you log
>> to... just guessing here). I think you need to have a broader look at
>> the health of the system before we can draw any conclusions from the
>> GC logs.
>>
>> cheers,
>> Per
>>
>>> [GC statistics table from the original message trimmed.]
>>>
>>> Thank you!
>>> Prabhash Rathore

From prabhashrathore at gmail.com  Thu Feb 13 07:20:40 2020
From: prabhashrathore at gmail.com (Prabhash Rathore)
Date: Wed, 12 Feb 2020 23:20:40 -0800
Subject: Information on how to parse/interpret ZGC Logs
In-Reply-To: <48002F4A-20D2-4DFB-91BE-0E0E346DFCD5@me.com>
References: <48002F4A-20D2-4DFB-91BE-0E0E346DFCD5@me.com>
Message-ID:

Thank you Per for your help! It's very helpful.

I started my application's GC configuration with the default settings; I
just had Xmx set. But because of memory allocation stalls and GC pauses,
I tuned the concurrent and parallel thread counts, as the default
options didn't seem to be enough.
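An allocation stall means the application exhausted free heap before a concurrent GC cycle finished, so beyond threads and heap size, the levers are the allocation rate and the cycle duration. A back-of-envelope check in Python; the values are taken from the "Last 10m" and "Last 10h" columns of the statistics quoted in this thread, and the heap capacity is approximated from -Xmx130G:

```python
# Headroom needed to survive one GC cycle ~= allocation rate * cycle time.
alloc_rate_mb_s = 822          # Memory: Allocation Rate, max over last 10m
cycle_s = 7789.187 / 1000.0    # Collector: Garbage Collection Cycle, last 10m, ms
heap_mb = 130 * 1024           # approximate capacity from -Xmx130G
live_after_mark_mb = 92170     # Memory: Heap Used After Mark, last 10m

needed_headroom_mb = alloc_rate_mb_s * cycle_s
free_mb = heap_mb - live_after_mark_mb
# Over the last 10m window needed < free, matching the zero stalls recorded
# there. The 10h maxima (25306 MB/s allocation, ~3904 s cycles) exceed any
# possible headroom, matching the enormous stalls in that column.
print(needed_headroom_mb, free_mb)
```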
My reasoning for the high parallel thread configuration
(-XX:ParallelGCThreads=78) is that application threads are stalled anyway
during a full pause, so having more threads (for now, 80% of OS threads)
can work on collection and keep the GC pause time lower. Similarly, I
increased the concurrent thread count from its default value to keep the
collection rate on par with the allocation rate.

You mentioned that when I see an allocation stall, I should increase the
heap size. I already have the heap configured at 80% of RAM. For such
allocation stalls, is there anything else I can tune other than the heap
size and the concurrent and parallel thread counts?

Hi Peter,

This application runs on Linux RHEL 7.7. The kernel version is
3.10.0-1062.

Output of *egrep "thp|trans" /proc/vmstat*:

nr_anon_transparent_hugepages 4722
thp_fault_alloc 51664
thp_fault_fallback 620147
thp_collapse_alloc 11462
thp_collapse_alloc_failed 20085
thp_split 9350
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0

Output of *tail -28 /proc/cpuinfo*:

power management:

processor       : 95
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6263CY CPU @ 2.60GHz
stepping        : 7
microcode       : 0x5000021
cpu MHz         : 3304.431
cache size      : 33792 KB
physical id     : 1
siblings        : 48
core id         : 29
cpu cores       : 24
apicid          : 123
initial apicid  : 123
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a
avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
bogomips        : 5206.07
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

On Mon, Feb 10, 2020 at 9:28 AM Peter Booth wrote:

> Prabhash,
>
> What OS version?
> Is it a vanilla OS install?
> Can you print the output of the following?
> (Assuming Linux)
> egrep "thp|trans" /proc/vmstat
> tail -28 /proc/cpuinfo
>
> Peter
>
> Sent from my iPhone
>
> [Earlier quoted messages and the GC statistics table trimmed.]

From pme at activeviam.com  Thu Feb 13 13:58:32 2020
From: pme at activeviam.com (Pierre Mevel)
Date: Thu, 13 Feb 2020 14:58:32 +0100
Subject: zgc-dev Digest, Vol 26, Issue 4
In-Reply-To:
References:
Message-ID:

Good morning,

Following on "Information on how to parse/interpret ZGC Logs", I did get
the same issues back in October. (
http://mail.openjdk.java.net/pipermail/zgc-dev/2019-October/000779.html
for the curious).
Basically, our application runs on relatively big servers, and allocates memory at a very high pace. We get enormous allocation stalls with ZGC, and increasing the amount of threads running will simply delay the first allocation stalls, not resolve the issue. Because ZGC is almost entirely concurrent, the application still allocates memory during the Concurrent Relocation phase. We have two root issues that clash against each other: 1. The allocation rate can be much higher than the recollection rate (which makes us want to give more OS resources to the GC) 2. The allocation rate can vary greatly (and when it's at a low, we do not want to have many threads running Concurrent phases) (and when it's at a high, it's because clients want some answers from the system, and we can't afford a long Allocation Stall) As I tried to suggest in my previous mail in October, I suggest that the workers count be boosted to a higher value (Max(ParallelGcThreads, ConcGcThreads) maybe?) as soon as an Allocation Stall is triggered, and restored to normal at the end of the phase (maybe?). Apologies for having taken up your time, Best Regards, Pierre M?vel On Thu, Feb 13, 2020 at 1:03 PM wrote: > Send zgc-dev mailing list submissions to > zgc-dev at openjdk.java.net > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.openjdk.java.net/mailman/listinfo/zgc-dev > or, via email, send a message with subject or body 'help' to > zgc-dev-request at openjdk.java.net > > You can reach the person managing the list at > zgc-dev-owner at openjdk.java.net > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of zgc-dev digest..." > > > Today's Topics: > > 1. 
Re: Information on how to parse/interpret ZGC Logs
>       (Prabhash Rathore)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 12 Feb 2020 23:20:40 -0800
> From: Prabhash Rathore
> To: Peter Booth
> Cc: zgc-dev at openjdk.java.net
> Subject: Re: Information on how to parse/interpret ZGC Logs
> Message-ID:
>         <CAFw09gKVw4514CLDUdtvbCd0UvgE75XUGLa-CCVSdKXn4Hhcxg at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Thank you Per for your help! It's very helpful.
>
> I started my application's GC configuration with default settings; I just had
> Xmx set, but because of memory allocation stalls and GC pauses, I tuned
> the concurrent and parallel thread counts, as the default options didn't seem
> enough.
>
> My reasoning for the high parallel thread configuration
> (-XX:ParallelGCThreads=78) is that application threads are stalled anyway
> during a full pause, so having more threads (for now 80% of OS threads)
> lets them work on collection and keeps the GC pause time lower. Likewise, I
> increased the concurrent thread count from its default value to keep the
> collection rate on par with the allocation rate.
>
> You mentioned that when I see an Allocation Stall, I should increase the
> heap size. I already have the heap configured at 80% of RAM. For such
> allocation stalls, is there anything else I can tune other than heap size
> and the concurrent and parallel thread counts?
>
>
> Hi Peter,
>
> This application runs on the Linux RHEL 7.7 OS.
Kernel version is 3.10.0-1062 > > Output of *egrep "thp|trans" /proc/vmstat:* > nr_anon_transparent_hugepages 4722 > thp_fault_alloc 51664 > thp_fault_fallback 620147 > thp_collapse_alloc 11462 > thp_collapse_alloc_failed 20085 > thp_split 9350 > thp_zero_page_alloc 1 > thp_zero_page_alloc_failed 0 > > > > Output of *tail -28 /proc/cpuinfo* > power management: > > processor : 95 > vendor_id : GenuineIntel > cpu family : 6 > model : 85 > model name : Intel(R) Xeon(R) Gold 6263CY CPU @ 2.60GHz > stepping : 7 > microcode : 0x5000021 > cpu MHz : 3304.431 > cache size : 33792 KB > physical id : 1 > siblings : 48 > core id : 29 > cpu cores : 24 > apicid : 123 > initial apicid : 123 > fpu : yes > fpu_exception : yes > cpuid level : 22 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat > pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb > rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology > nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx > est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe > popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm > 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba > ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid > fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a > avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl > xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local > dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke > avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities > bogomips : 5206.07 > clflush size : 64 > cache_alignment : 64 > address sizes : 46 bits physical, 48 bits virtual > power management: > > > > > > On Mon, Feb 10, 2020 at 9:28 AM Peter Booth wrote: > > > Prabhash, > > > > What OS version? > > Is it a vanilla OS install? 
> > Can you print the output of the following?
> > (Assuming Linux)
> > egrep "thp|trans" /proc/vmstat
> > tail -28 /proc/cpuinfo
> >
> > Peter
> >
> > Sent from my iPhone
> >
> > > On Feb 9, 2020, at 1:56 AM, Prabhash Rathore <
> prabhashrathore at gmail.com>
> > wrote:
> > >
> > > Hi Per,
> > >
> > > Thanks for your reply!
> > >
> > > About ZGC logs, in general I am trying to understand the following:
> > >
> > > - What are the full pause times?
> > > - How many such pauses per unit time?
> > > - Anything else which helps me eliminate GC as the cause for high
> > > application latency.
> > >
> > > This is how I have configured ZGC logging at the JVM level, wondering if I
> > > should add other tags like safepoint to get more details about GC stats:
> > > -Xlog:gc*=debug:file=gc.log
> > >
> > > All JVM flags used in my application:
> > > -Xms130G -Xmx130G
> > > -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
> > > -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
> > > -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
> > > -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow
> > >
> > > It's a large machine with 96 threads and 196 GB RAM.
> > >
> > > I have -XX:+AlwaysPreTouch configured as another option. With the
> > > AlwaysPreTouch option, the Linux top command shows very high shared
> > > and resident memory. My max heap size is configured as 130 GB, but
> > > shared memory is shown as 388 GB and resident memory as 436 GB. On the
> > > other hand, total virtual memory for this process in top is shown as 17.1
> > > terabytes. How is this possible? My whole machine size is 196 GB (is this
> > > accounting for things swapped out to disk?). Without AlwaysPreTouch,
> > > the numbers look close to the heap size. Trying to understand why, with
> > > PreTouch, process memory is shown as higher than the configured size?
> > > I understand shared memory has all shared libs mapped out, but how can
> > > it be such a large size?
> > >
> > > Regarding the high GC pause time, I did notice that my machine was low on
> > > memory and it was swapping, hence slowing down everything. For now I have
> > > disabled swappiness completely with a kernel VM tunable, but I am still
> > > trying to find the actual cause of why swapping kicked in. This machine
> > > only runs this particular Java application, which has a 130 GB heap size.
> > > Other than heap, I still have 66 GB memory available on the host. Trying
> > > to figure out if there is a native memory leak. If you have any inputs on
> > > this, then please share.
> > >
> > > Thanks!
> > > Prabhash Rathore
> > >
> > >> On Mon, Feb 3, 2020 at 2:35 AM Per Liden wrote:
> > >>
> > >> Hi,
> > >>
> > >>> On 2020-02-03 06:52, Prabhash Rathore wrote:
> > >>> Hello,
> > >>>
> > >>> We have decided to use the ZGC Garbage Collector for our Java application
> > >>> running on Java 11. I was wondering if there are any tools or any
> > >>> documentation on how to interpret ZGC logs.
> > >>
> > >> Is there something in particular in the logs you're wondering about?
> > >>
> > >>>
> > >>> I found the following statistics in the ZGC log which, as per my
> > >> understanding,
> > >>> show a very large allocation stall of 3902042.342 milliseconds. It would
> > >>> be really great if I could get some help to understand this further.
> > >>
> > >> I can see that you've had marking times that are more than an hour long,
> > >> which suggests that something in your system is seriously wrong (like
> > >> extremely overloaded or an extremely slow disk that you log to... just
> > >> guessing here). I think you need to have a broader look at the health of
> > >> the system before we can draw any conclusion from the GC logs.
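The inflated top numbers asked about above have a likely benign explanation: ZGC on Linux (as of JDK 11) multi-maps the heap, placing the same physical memory at three virtual address views (marked0, marked1, remapped), so tools that sum a process's mappings can count heap pages up to three times. A small arithmetic sketch of this, treating the three-view count as the only contributor (it is not — metaspace, code cache, and shared libraries also appear — so the model is approximate):

```python
# Rough model of why `top` over-reports memory for a ZGC process on
# JDK 11: the heap is mapped at three virtual views, so accounting
# that sums mappings multiply-counts the heap's physical pages.

def apparent_mapped_gb(heap_gb, views=3):
    """Heap contribution seen by tools that sum all mappings."""
    return heap_gb * views

# The thread's numbers: a 130 GB heap, with top showing 388 GB SHR.
print(apparent_mapped_gb(130))  # 390 -- close to the observed 388 GB
```

This would also explain why the figures drop back to roughly the heap size without -XX:+AlwaysPreTouch: until pages are touched, the mappings are not backed by resident memory.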
> > >> > > >> cheers, > > >> Per > > >> > > >>> > > >>> [2020-02-02T22:37:36.883+0000] === Garbage Collection Statistics > > >>> > > >> > > > ======================================================================================================================= > > >>> [2020-02-02T22:37:36.883+0000] > > >>> Last 10s Last 10m Last 10h > > >>> Total > > >>> [2020-02-02T22:37:36.883+0000] > > >>> Avg / Max Avg / Max Avg / > Max > > >>> Avg / Max > > >>> [2020-02-02T22:37:36.883+0000] Collector: Garbage Collection Cycle > > >>> 0.000 / 0.000 7789.187 / 7789.187 12727.424 / > > >>> 3903938.012 1265.033 / 3903938.012 ms > > >>> [2020-02-02T22:37:36.883+0000] Contention: Mark Segment Reset > > Contention > > >>> 0 / 0 10 / 1084 176 / > 15122 > > >>> 42 / 15122 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Contention: Mark SeqNum Reset > > Contention > > >>> 0 / 0 0 / 5 0 / 31 > > >>> 0 / 31 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Contention: Relocation Contention > > >>> 0 / 0 0 / 3 1 / 708 > > >>> 7 / 890 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Critical: Allocation Stall > > >>> 0.000 / 0.000 0.000 / 0.000 6714.722 / > > >>> 3902042.342 6714.722 / 3902042.342 ms > > >>> [2020-02-02T22:37:36.883+0000] Critical: Allocation Stall > > >>> 0 / 0 0 / 0 12 / > 4115 > > >>> 2 / 4115 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Critical: GC Locker Stall > > >>> 0.000 / 0.000 0.000 / 0.000 3.979 / > 6.561 > > >>> 1.251 / 6.561 ms > > >>> [2020-02-02T22:37:36.883+0000] Critical: GC Locker Stall > > >>> 0 / 0 0 / 0 0 / 1 > > >>> 0 / 1 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Allocation Rate > > >>> 0 / 0 6 / 822 762 / > 25306 > > >>> 1548 / 25306 MB/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used After Mark > > >>> 0 / 0 92170 / 92170 89632 / > > >> 132896 > > >>> 30301 / 132896 MB > > >>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used After > Relocation > > >>> 0 / 0 76376 / 76376 67490 / > > >> 132928 > > >>> 8047 / 132928 MB > > >>> 
[2020-02-02T22:37:36.883+0000] Memory: Heap Used Before Mark > > >>> 0 / 0 92128 / 92128 84429 / > > 132896 > > >>> 29452 / 132896 MB > > >>> [2020-02-02T22:37:36.883+0000] Memory: Heap Used Before > Relocation > > >>> 0 / 0 86340 / 86340 76995 / > > 132896 > > >>> 15862 / 132896 MB > > >>> [2020-02-02T22:37:36.883+0000] Memory: Out Of Memory > > >>> 0 / 0 0 / 0 0 / 0 > > >>> 0 / 0 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Flush > > >>> 0 / 0 0 / 0 62 / > 2868 > > >>> 16 / 2868 MB/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Hit L1 > > >>> 0 / 0 7 / 2233 277 / > 11553 > > >>> 583 / 11553 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Hit L2 > > >>> 0 / 0 0 / 0 20 / > 4619 > > >>> 59 / 4619 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Page Cache Miss > > >>> 0 / 0 0 / 0 15 / > 1039 > > >>> 3 / 1297 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Undo Object Allocation > > Failed > > >>> 0 / 0 0 / 0 0 / 24 > > >>> 0 / 24 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Undo Object Allocation > > >>> Succeeded 0 / 0 0 / 3 > > 1 > > >> / > > >>> 708 7 / 890 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Memory: Undo Page Allocation > > >>> 0 / 0 0 / 12 30 / > 3464 > > >>> 7 / 3464 ops/s > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Destroy > Detached > > >>> Pages 0.000 / 0.000 0.004 / 0.004 11.675 / > > >>> 1484.886 1.155 / 1484.886 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Mark > > >>> 0.000 / 0.000 7016.569 / 7016.569 11758.365 / > > >>> 3901893.544 1103.558 / 3901893.544 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Mark Continue > > >>> 0.000 / 0.000 0.000 / 0.000 1968.844 / > > >> 3674.454 > > >>> 1968.844 / 3674.454 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Prepare > > Relocation > > >>> Set 0.000 / 0.000 453.732 / 453.732 364.535 / > > >>> 7103.720 39.453 / 7103.720 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Process 
> > Non-Strong > > >>> References 0.000 / 0.000 2.003 / 2.003 2.738 / > > >> 34.406 > > >>> 2.253 / 34.406 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Relocate > > >>> 0.000 / 0.000 261.822 / 261.822 335.954 / > > >> 2207.669 > > >>> 45.868 / 2207.669 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Reset > Relocation > > >> Set > > >>> 0.000 / 0.000 6.083 / 6.083 13.489 / > > >> 1128.678 > > >>> 3.574 / 1128.678 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Concurrent Select > > Relocation > > >>> Set 0.000 / 0.000 6.379 / 6.379 97.530 / > > >>> 1460.679 18.439 / 1460.679 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Pause Mark End > > >>> 0.000 / 0.000 4.420 / 4.420 6.219 / > > >> 26.498 > > >>> 6.474 / 40.883 ms > > >>> [2020-02-02T22:37:36.883+0000] Phase: Pause Mark Start > > >>> 0.000 / 0.000 14.836 / 14.836 11.893 / > > >> 28.350 > > >>> 11.664 / 41.767 ms > > >>> [2020-02-02T22:37:36.884+0000] Phase: Pause Relocate Start > > >>> 0.000 / 0.000 13.411 / 13.411 30.849 / > > >> 697.344 > > >>> 11.995 / 697.344 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark > > >>> 0.000 / 0.000 7015.793 / 7016.276 18497.265 / > > >>> 3901893.075 1690.497 / 3901893.075 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Idle > > >>> 0.000 / 0.000 1.127 / 13.510 1.292 / > > >> 219.999 > > >>> 1.280 / 219.999 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Try Flush > > >>> 0.000 / 0.000 1.295 / 2.029 47.094 / > > >> 34869.359 > > >>> 4.797 / 34869.359 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Mark Try > > Terminate > > >>> 0.000 / 0.000 1.212 / 14.847 1.760 / > > >> 3799.238 > > >>> 1.724 / 3799.238 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent References > > Enqueue > > >>> 0.000 / 0.000 0.009 / 0.009 0.022 / > 1.930 > > >>> 0.017 / 2.350 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent References > > Process > > >>> 0.000 / 0.000 0.599 / 
0.599 0.768 / > > 23.966 > > >>> 0.495 / 23.966 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots > > >>> 0.000 / 0.000 0.882 / 1.253 1.155 / > > 21.699 > > >>> 1.077 / 23.602 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots > > >>> JNIWeakHandles 0.000 / 0.000 0.301 / 0.943 > > >> 0.308 / > > >>> 10.868 0.310 / 23.219 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots > > >>> StringTable 0.000 / 0.000 0.289 / 0.496 > > >> 0.390 / > > >>> 12.794 0.363 / 22.907 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Concurrent Weak Roots > > >>> VMWeakHandles 0.000 / 0.000 0.230 / 0.469 > > >> 0.329 / > > >>> 21.267 0.331 / 23.135 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Mark Try Complete > > >>> 0.000 / 0.000 0.000 / 0.000 0.501 / > 4.801 > > >>> 0.480 / 17.208 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Remap TLABS > > >>> 0.000 / 0.000 0.252 / 0.252 0.195 / > 0.528 > > >>> 0.226 / 3.451 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Retire TLABS > > >>> 0.000 / 0.000 1.195 / 1.195 1.324 / > > 5.082 > > >>> 1.408 / 11.219 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots > > >>> 0.000 / 0.000 6.968 / 10.865 12.329 / > > >> 693.701 > > >>> 6.431 / 1300.994 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots > > >>> ClassLoaderDataGraph 0.000 / 0.000 4.819 / 8.232 > > >>> 9.635 / 693.405 3.476 / 693.405 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots CodeCache > > >>> 0.000 / 0.000 0.842 / 2.731 0.996 / > > 83.553 > > >>> 0.780 / 83.553 ms > > >>> [2020-02-02T22:37:36.884+0000] Subphase: Pause Roots JNIHandles > > >>> 0.000 / 0.000 1.171 / 6.314 0.866 / > > >> 17.875 > > >>> 0.837 / 25.708 ms > > >>> > > >>> > > >>> Thank you! 
> > >>> Prabhash Rathore > > >>> > > >> > > > > > > > End of zgc-dev Digest, Vol 26, Issue 4 > ************************************** > From per.liden at oracle.com Thu Feb 20 10:52:17 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 20 Feb 2020 11:52:17 +0100 Subject: RFC: JEP: ZGC: Production Ready Message-ID: Hi all, I've created a JEP draft to make ZGC a product (non-experimental) feature. https://bugs.openjdk.java.net/browse/JDK-8209683 Comments and feedback welcome. cheers, Per From per.liden at oracle.com Mon Feb 24 12:32:46 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 24 Feb 2020 13:32:46 +0100 Subject: Information on how to parse/interpret ZGC Logs In-Reply-To: References: <48002F4A-20D2-4DFB-91BE-0E0E346DFCD5@me.com> Message-ID: <089b6068-adb0-e40f-f1ac-11c6356c2779@oracle.com> Hi, On 2/13/20 8:20 AM, Prabhash Rathore wrote: > Thank you Per for your help! It's very helpful. > > I started my application GC configuration with default settings, I just > had Xmx set but because of memory allocation stalls and Gc pauses, I > tunes Concurrent threads and Parallel threads as default options didn't > see enough. > > My reasoning for high parallel? thread configuration > (-XX:ParallelGCThreads=78) is that application threads are anyway > stalled during full pause so having higher threads (for now 80% of OS > threads) can work on collection. and keep the GC pause time lower. Again > I increased Concurrent threads from default value to keep collection > rate on par with the allocation rate. > > You mentioned when I see Allocation Stall, increase heap size. I think I > have already heap configured at 80% of RAM size. For such allocation > stall, is there anything else I can tune other than heap size, > concurrent and parallel thread counts. > > > Hi Peter, > > This application runs on Linux RHEL 7.7. OS. 
Kernel version is 3.10.0-1062 > > Output of *egrep "thp|trans" /proc/vmstat:* > nr_anon_transparent_hugepages 4722 > thp_fault_alloc 51664 > thp_fault_fallback 620147 > thp_collapse_alloc 11462 > thp_collapse_alloc_failed 20085 > thp_split 9350 > thp_zero_page_alloc 1 > thp_zero_page_alloc_failed 0 For best throughput and latency, I'd suggest you disable all use of transparent hugepages and instead configure the kernel hugepage pool and use the -XX:+UseLargePages JVM option. cheers, Per > > > > Output of *tail -28 /proc/cpuinfo* > power management: > > processor : 95 > vendor_id : GenuineIntel > cpu family : 6 > model : 85 > model name : Intel(R) Xeon(R) Gold 6263CY CPU @ 2.60GHz > stepping : 7 > microcode : 0x5000021 > cpu MHz : 3304.431 > cache size : 33792 KB > physical id : 1 > siblings : 48 > core id : 29 > cpu cores : 24 > apicid : 123 > initial apicid : 123 > fpu : yes > fpu_exception : yes > cpuid level : 22 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx > pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl > xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor > ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand > lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin > intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi > flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms > invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc > cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp > hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear > spec_ctrl intel_stibp flush_l1d arch_capabilities > bogomips : 5206.07 > clflush size : 64 > cache_alignment : 64 > address sizes : 
46 bits physical, 48 bits virtual > power management: > > > > > > On Mon, Feb 10, 2020 at 9:28 AM Peter Booth > wrote: > > Prabhash, > > What OS version? > Is it a vanilla OS install? > Can you print the output of the following? > (Assuming Linux) > egrep ?thp|trans? /proc/vmstat > tail -28 /proc/cpuinfo > > Peter > > Sent from my iPhone > > > On Feb 9, 2020, at 1:56 AM, Prabhash Rathore > > wrote: > > > > ?Hi Per, > > > > Thanks for your reply! > > > > About ZGC logs, in general I am trying to understand following: > > > >? ?- What are the full pause times? > >? ?- How many such pause per unit time? > >? ?- Anything else which helps me eliminate GC as cause for high > >? ?application latency. > > > > This is how I have configured ZGC logging at JVM level, wondering > if I > > should add other tags like Safepoint to get more details about GC > stats: > > -Xlog:gc*=debug:file=gc.log > > > > All JVM flas used in my application: > > -Xms130G -Xmx130G > > > -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864 > > -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError > > -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48 > > -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow > > > > It's a large machine with 96 threads and 196 GB RAM. > > > > I have -XX:+AlwaysPreTouch configured as one another option. With > > AlwaysPreTouch option, I see in Linux top command shows a very > high shared > > and resident memory. My max heap size is configured as 130 GB but > I see > > shared memory is shown as 388 GB and Resident memory as 436 GB. > On the > > other hand, total virtual memory for this process in top is shown > as 17.1 > > tera byte. How is this possible? My whole machine size is 196 GB > (is this > > accounting for things swapped out to disk). I did see without > > AlwaysPretouch, numbers look close to the heap size. Trying to > understand > > why with PreTouch, process memory is shown was higher than > configured size? 
> > I understand shared memory has all shared libs mapped out but how > can it be > > such. a large size? > > > > Regarding high GC pause time, I did notice that my machine was low on > > memory and it was swapping, hence slowing down everything. For > now I have > > disabled Swappines completely with Kernel VM tunable but I am > still trying > > to find the actual cause of why swapping kicked in. This machine > only runs > > this particular Java applicaion which has 130 GB heap size. Other > than > > heap, I still have 66 GB memory available on host. Trying to > figure out if > > there is a native memory leak. If you have any inputs on this > then please > > share. > > > > Thanks! > > Prabhash Rathore > > > >> On Mon, Feb 3, 2020 at 2:35 AM Per Liden > wrote: > >> > >> Hi, > >> > >>> On 2020-02-03 06:52, Prabhash Rathore wrote: > >>> Hello, > >>> > >>> We have decided to use ZGC Garbage Collector for our Java > application > >>> running on Java 11. I was wondering if there are any tools or any > >>> documenation on how to interpret ZGC logs. > >> > >> Is there something in particular in the logs you're wondering about? > >> > >>> > >>> I found following statistics in ZGC log which as per my > understanding > >> shows > >>> a very large allocation stall of 3902042.342 milliseconds. It > will be > >>> really great if I can get some help to understand this further. > >> > >> I can see that you've had making times that is more than an hour > long, > >> which suggests that something in your system is seriously wrong > (like > >> extremely overloaded or an extremely slow disk that you log > to... just > >> guessing here). I think you need to have a broader look at the > health of > >> the system before we can draw any conclusion from the GC logs. 
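Per's advice to move from transparent hugepages to an explicit hugepage pool is supported by the /proc/vmstat numbers quoted above: thp_fault_fallback (620147) dwarfs thp_fault_alloc (51664), meaning most transparent-hugepage faults fell back to ordinary 4 KiB pages, a sign of memory fragmentation. A sketch of that check (the function name and the idea of a single ratio are illustrative, not a standard tool):

```python
def thp_fallback_ratio(vmstat_text):
    """Fraction of transparent-hugepage faults that fell back to
    regular 4 KiB pages. A value near 1.0 suggests the kernel rarely
    managed to hand out huge pages, which argues for an explicit
    hugepage pool plus -XX:+UseLargePages instead of THP."""
    stats = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("thp_"):
            stats[parts[0]] = int(parts[1])
    alloc = stats.get("thp_fault_alloc", 0)
    fallback = stats.get("thp_fault_fallback", 0)
    total = alloc + fallback
    return fallback / total if total else 0.0

# The counters reported earlier in this thread:
sample = """thp_fault_alloc 51664
thp_fault_fallback 620147"""
print(round(thp_fallback_ratio(sample), 2))  # 0.92
```

In other words, on this host roughly nine out of ten THP faults could not get a huge page, so THP was delivering little benefit while still costing compaction work.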
> >> cheers,
> >> Per
> >>
> >>> Thank you!
> >>> Prabhash Rathore

From raell at web.de  Mon Feb 24 18:12:31 2020
From: raell at web.de (raell at web.de)
Date: Mon, 24 Feb 2020 19:12:31 +0100
Subject: How does ZGC process objects newly created during concurrent marking?
Message-ID: 

Hi,

I'm just curious how ZGC processes objects that are newly created during concurrent marking:

Suppose concurrent marking is running, and after object x has been marked, the mutator executes

    x.f = new Object();

How does ZGC process this case? Are the new objects implicitly considered live by using a TAMS (as Shenandoah does)?

Thank you very much!

Ralph

From stefan.karlsson at oracle.com  Mon Feb 24 19:07:53 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 24 Feb 2020 20:07:53 +0100
Subject: How does ZGC process objects newly created during concurrent marking?
In-Reply-To: 
References: 
Message-ID: <7aca93c6-7b19-b46d-9af5-dafdf1ae0d67@oracle.com>

Hi Ralph,

On 2020-02-24 19:12, raell at web.de wrote:
> Hi,
>
> I'm just curious how ZGC processes objects that are newly created during concurrent marking:
>
> Suppose concurrent marking is running, and after object x has been marked, the mutator executes
> x.f = new Object();
>
> How does ZGC process this case?
> Are the new objects implicitly considered live by using a TAMS (as Shenandoah does)?

We don't use a TAMS; instead we track whether ZPages (heap regions) are allocated/reused before or after the last mark start. Objects in pages allocated after mark start are considered live.
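That rule can be modeled in a few lines of Java. The sketch below is a toy illustration with invented names, not the HotSpot implementation:

```java
// Toy model of the rule described above: pages allocated/reused after the
// last mark start are stamped with the current global sequence number, and
// objects on such pages are implicitly considered live. All names invented.
class SeqNumModel {
    static int globalSeqNum = 0;              // bumped at every mark start

    static class Page {
        final int seqnum;                     // stamped at page allocation/reuse
        Page() { this.seqnum = globalSeqNum; }
        boolean isAllocating() { return seqnum == globalSeqNum; }
    }

    public static void main(String[] args) {
        Page before = new Page();             // allocated before marking starts
        globalSeqNum++;                       // a new marking cycle begins
        Page during = new Page();             // allocated during concurrent marking

        // The page from before the cycle needs marking work; the new one does not.
        System.out.println("needs marking: " + !before.isAllocating());     // true
        System.out.println("implicitly live: " + during.isAllocating());    // true
    }
}
```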
You can see the code for that here:

https://hg.openjdk.java.net/jdk/jdk/file/d33754052039/src/hotspot/share/gc/z/zMark.cpp#l318

bool ZMark::try_mark_object(ZMarkCache* cache, uintptr_t addr, bool finalizable) {
  ZPage* const page = _page_table->get(addr);
  if (page->is_allocating()) {
    // Newly allocated objects are implicitly marked
    return false;
  }

https://hg.openjdk.java.net/jdk/jdk/file/d33754052039/src/hotspot/share/gc/z/zPage.inline.hpp

inline bool ZPage::is_allocating() const {
  return _seqnum == ZGlobalSeqNum;
}

When marking starts we update ZGlobalSeqNum:

https://hg.openjdk.java.net/jdk/jdk/file/d33754052039/src/hotspot/share/gc/z/zMark.cpp

void ZMark::prepare_mark() {
  // Increment global sequence number to invalidate
  // marking information for all pages.
  ZGlobalSeqNum++;

And when pages are allocated/reused we set their _seqnum:

https://hg.openjdk.java.net/jdk/jdk/file/d33754052039/src/hotspot/share/gc/z/zPage.cpp

void ZPage::reset() {
  _seqnum = ZGlobalSeqNum;

HTH,
StefanK

> Thank you very much!
>
> Ralph

From peter_booth at me.com  Mon Feb 24 21:02:56 2020
From: peter_booth at me.com (Peter Booth)
Date: Mon, 24 Feb 2020 16:02:56 -0500
Subject: Information on how to parse/interpret ZGC Logs
In-Reply-To: <089b6068-adb0-e40f-f1ac-11c6356c2779@oracle.com>
References: <089b6068-adb0-e40f-f1ac-11c6356c2779@oracle.com>
Message-ID: 

I asked about THP because I imagined it was probably enabled.

Now, to be frank, I think that right now your host/app is sick, and that it would be easy to waste time trying alternate configurations/tunings. It looks as though there are currently too many "smart ideas" interacting.

I think you might want to consider reverting to a simple, unoptimized, transparent configuration and confirming that it works. Then, step by step, add settings and measure their impact.

By "simple" I mean removing the non-default settings for GC threads, removing PreTouch, removing the GC debug log. By "transparent"
I mean setting a much lower Xms value - say 10G - so that, as your application runs, you will see from the logs when the heap resizes. This isn't about tuning but about stability and increasing visibility into the behavior of the app.

Hope this helps

Peter

Sent from my iPhone

> On Feb 24, 2020, at 7:32 AM, Per Liden wrote:
>
> Hi,
>
>> On 2/13/20 8:20 AM, Prabhash Rathore wrote:
>> Thank you Per for your help! It's very helpful.
>>
>> I started my application GC configuration with default settings, I just had Xmx set, but because of memory allocation stalls and GC pauses, I tuned concurrent threads and parallel threads, as the default options didn't seem enough.
>>
>> My reasoning for the high parallel thread configuration (-XX:ParallelGCThreads=78) is that application threads are anyway stalled during a full pause, so having more threads (for now 80% of OS threads) can work on collection and keep the GC pause time lower. Again, I increased concurrent threads from the default value to keep the collection rate on par with the allocation rate.
>>
>> You mentioned that when I see an Allocation Stall, I should increase the heap size. I think I already have the heap configured at 80% of RAM size. For such allocation stalls, is there anything else I can tune other than heap size and the concurrent and parallel thread counts?
>>
>> Hi Peter,
>>
>> This application runs on Linux RHEL 7.7. Kernel version is 3.10.0-1062.
>>
>> Output of *egrep "thp|trans" /proc/vmstat:*
>> nr_anon_transparent_hugepages 4722
>> thp_fault_alloc 51664
>> thp_fault_fallback 620147
>> thp_collapse_alloc 11462
>> thp_collapse_alloc_failed 20085
>> thp_split 9350
>> thp_zero_page_alloc 1
>> thp_zero_page_alloc_failed 0
>
> For best throughput and latency, I'd suggest you disable all use of transparent hugepages and instead configure the kernel hugepage pool and use the -XX:+UseLargePages JVM option.
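As a quick sanity check before and after making that change, the current THP mode can be read from sysfs. The sketch below is an illustrative example (the sysfs path is the standard one on Linux, but treat the snippet as a sketch, not a supported tool); it needs Java 11+ for Files.readString:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Reads the kernel's transparent-hugepage mode, e.g. "always [madvise] never",
// and extracts the bracketed (currently active) setting.
class ThpCheck {
    static String activeSetting(String sysfsLine) {
        int open = sysfsLine.indexOf('[');
        int close = sysfsLine.indexOf(']');
        if (open < 0 || close < open) return "unknown";
        return sysfsLine.substring(open + 1, close);
    }

    public static void main(String[] args) throws Exception {
        Path p = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        if (Files.exists(p)) {
            System.out.println("THP mode: " + activeSetting(Files.readString(p).trim()));
        } else {
            System.out.println("THP sysfs entry not present on this kernel");
        }
    }
}
```

After disabling THP as Per suggests, the active setting should read "never".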
>
> cheers,
> Per
>
>> Output of *tail -28 /proc/cpuinfo*:
>> power management:
>> processor       : 95
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 85
>> model name      : Intel(R) Xeon(R) Gold 6263CY CPU @ 2.60GHz
>> stepping        : 7
>> microcode       : 0x5000021
>> cpu MHz         : 3304.431
>> cache size      : 33792 KB
>> physical id     : 1
>> siblings        : 48
>> core id         : 29
>> cpu cores       : 24
>> apicid          : 123
>> initial apicid  : 123
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 22
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
>> bogomips        : 5206.07
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 46 bits physical, 48 bits virtual
>> power management:
>>
>> On Mon, Feb 10, 2020 at 9:28 AM Peter Booth wrote:
>>
>> Prabhash,
>>
>> What OS version?
>> Is it a vanilla OS install?
>>
>> Can you print the output of the following?
>> (Assuming Linux)
>>
>> egrep "thp|trans" /proc/vmstat
>> tail -28 /proc/cpuinfo
>>
>> Peter
>>
>> Sent from my iPhone
>>
>> > On Feb 9, 2020, at 1:56 AM, Prabhash Rathore wrote:
>> >
>> > Hi Per,
>> >
>> > Thanks for your reply!
>> >
>> > About the ZGC logs, in general I am trying to understand the following:
>> >
>> > - What are the full pause times?
>> > - How many such pauses per unit time?
>> > - Anything else which helps me eliminate GC as the cause of high
>> > application latency.
>> >
>> > This is how I have configured ZGC logging at the JVM level, wondering if I
>> > should add other tags like safepoint to get more details about GC stats:
>> > -Xlog:gc*=debug:file=gc.log
>> >
>> > All JVM flags used in my application:
>> > -Xms130G -Xmx130G
>> > -Xlog:gc*=debug:file=/somedir/gc.log:time:filecount=10,filesize=67108864
>> > -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError
>> > -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ConcGCThreads=48
>> > -XX:ParallelGCThreads=96 -XX:-OmitStackTraceInFastThrow
>> >
>> > It's a large machine with 96 threads and 196 GB RAM.
>> >
>> > I have -XX:+AlwaysPreTouch configured as another option. With the
>> > AlwaysPreTouch option, the Linux top command shows very high shared
>> > and resident memory. My max heap size is configured as 130 GB, but I see
>> > shared memory shown as 388 GB and resident memory as 436 GB. On the
>> > other hand, total virtual memory for this process in top is shown as 17.1
>> > terabytes. How is this possible? My whole machine size is 196 GB (is this
>> > accounting for things swapped out to disk?). I did see that without
>> > AlwaysPreTouch, the numbers look close to the heap size. Trying to understand
>> > why, with PreTouch, process memory is shown as higher than the configured size.
>> > I understand shared memory has all shared libs mapped, but how can it be
>> > such a large size?
>> >
>> > Regarding the high GC pause time, I did notice that my machine was low on
>> > memory and it was swapping, hence slowing down everything. For now I have
>> > disabled swappiness completely with a kernel VM tunable, but I am still trying
>> > to find the actual cause of why swapping kicked in.
>> > This machine only runs
>> > this particular Java application, which has a 130 GB heap size. Other than the
>> > heap, I still have 66 GB of memory available on the host. Trying to figure out if
>> > there is a native memory leak. If you have any inputs on this, then please
>> > share.
>> >
>> > Thanks!
>> > Prabhash Rathore
>> >
>> >> On Mon, Feb 3, 2020 at 2:35 AM Per Liden wrote:
>> >>
>> >> Hi,
>> >>
>> >>> On 2020-02-03 06:52, Prabhash Rathore wrote:
>> >>> Hello,
>> >>>
>> >>> We have decided to use the ZGC garbage collector for our Java application
>> >>> running on Java 11. I was wondering if there are any tools or any
>> >>> documentation on how to interpret ZGC logs.
>> >>
>> >> Is there something in particular in the logs you're wondering about?
>> >>
>> >>>
>> >>> I found the following statistics in the ZGC log, which as per my
>> >>> understanding show a very large allocation stall of 3902042.342
>> >>> milliseconds. It would be really great if I could get some help to
>> >>> understand this further.
>> >>
>> >> I can see that you've had marking times that are more than an hour long,
>> >> which suggests that something in your system is seriously wrong (like
>> >> extremely overloaded, or an extremely slow disk that you log to... just
>> >> guessing here). I think you need to have a broader look at the health of
>> >> the system before we can draw any conclusion from the GC logs.
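On the tooling side of the original question: there is no official parser for these statistics blocks, but the columns are regular enough to scrape. A small hypothetical Java sketch (class name and helper invented for illustration) that pulls the four Avg / Max pairs out of one such line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Scrapes the four "Avg / Max" pairs (Last 10s, 10m, 10h, Total) from one
// line of ZGC's "Garbage Collection Statistics" table. Illustrative only.
class ZgcStatLine {
    private static final Pattern PAIR = Pattern.compile("([\\d.]+) / ([\\d.]+)");

    // Returns {avg10s, max10s, avg10m, max10m, avg10h, max10h, avgTotal, maxTotal}
    static double[] avgMaxPairs(String line) {
        Matcher m = PAIR.matcher(line);
        double[] out = new double[8];
        int i = 0;
        while (m.find() && i < 8) {
            out[i++] = Double.parseDouble(m.group(1));
            out[i++] = Double.parseDouble(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // The Allocation Stall line quoted earlier in this thread:
        String line = "[2020-02-02T22:37:36.883+0000] Critical: Allocation Stall "
                + "0.000 / 0.000   0.000 / 0.000   6714.722 / 3902042.342   "
                + "6714.722 / 3902042.342   ms";
        double[] v = avgMaxPairs(line);
        System.out.println("max stall over last 10h: " + v[5] + " ms");
    }
}
```

Running the same extraction over every line of the table makes it easy to sort phases by their Max column and spot outliers like the hour-long marking times Per mentions.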
>> >> cheers,
>> >> Per
>> >>
>> >>> [snip: full "Garbage Collection Statistics" table, quoted in full earlier in this thread]
>> >>>
>> >>> Thank you!
>> >>> Prabhash Rathore

From Barry.Galster at imc.com  Fri Feb 28 20:50:44 2020
From: Barry.Galster at imc.com (Barry Galster)
Date: Fri, 28 Feb 2020 20:50:44 +0000
Subject: ThreadDump safepoint ever invoked by ZGC?
Message-ID: <9B2C95F1-4366-459A-90F6-0349D28BF341@imc.com>

ZGC experts,

I have a trio of questions for you:

1. Does ZGC ever invoke the ThreadDump safepoint?
2.
   If so, why?
3. Is there any way to disable this?

Regards,
Barry

Barry Galster
Performance Engineer - Team Lead
T +13122047574
E Barry.Galster at imc.com
233 South Wacker Drive # 4300,
Chicago, Illinois 60606, US
imc.com

________________________________

The information in this e-mail is intended only for the person or entity to which it is addressed. It may contain confidential and/or privileged material, the disclosure of which is prohibited. Any unauthorized copying, disclosure or distribution of the information in this email outside your company is strictly forbidden. If you are not the intended recipient (or have received this email in error), please contact the sender immediately and permanently delete all copies of this email and any attachments from your computer system and destroy any hard copies. Although the information in this email has been compiled with great care, neither IMC nor any of its related entities shall accept any responsibility for any errors, omissions or other inaccuracies in this information or for the consequences thereof, nor shall it be bound in any way by the contents of this e-mail or its attachments. Messages and attachments are scanned for all known viruses. Always scan attachments before opening them.

From stefan.karlsson at oracle.com  Fri Feb 28 21:21:49 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Fri, 28 Feb 2020 22:21:49 +0100
Subject: ThreadDump safepoint ever invoked by ZGC?
In-Reply-To: <9B2C95F1-4366-459A-90F6-0349D28BF341@imc.com>
References: <9B2C95F1-4366-459A-90F6-0349D28BF341@imc.com>
Message-ID: <9b5cfce9-1a92-4190-14ef-fae9d2e87bfa@oracle.com>

Hi Barry,

On 2020-02-28 21:50, Barry Galster wrote:
> ZGC experts,
> I have a trio of questions for you:
>
> 1. Does ZGC ever invoke the ThreadDump safepoint?

ZGC in itself does not invoke any ThreadDump safepoints. However, there are different serviceability and management features/APIs that do.
They are independent of what GC is used. A few of the APIs can be found here:

https://docs.oracle.com/en/java/javase/13/docs/api/java.management/java/lang/management/ThreadMXBean.html

and in the Thread class:

https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Thread.html

> 2. If so, why?
> 3. Is there any way to disable this?

There's no way that I'm aware of, except trying to figure out what management APIs are used and disable them somehow.

StefanK

>
> Regards,
> Barry
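The ThreadMXBean API linked in Stefan's reply is the usual trigger for that safepoint; a minimal illustration (hypothetical demo class, not from the thread):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Demonstrates the kind of management call that commonly causes ThreadDump
// safepoints: dumpAllThreads() brings the VM to a safepoint regardless of
// which GC is in use.
class ThreadDumpDemo {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // false, false: skip locked monitors and ownable synchronizers
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        System.out.println("live threads dumped: " + infos.length);
        // Running with -Xlog:safepoint shows the corresponding safepoint
        // operations in the VM log, which helps identify the caller.
    }
}
```

Monitoring agents (APMs, profilers, jstack) are typical sources of such calls, so checking which agents are attached is a practical way to act on Stefan's suggestion.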