From prasanna.gopal at blackrock.com  Thu Oct  6 10:48:09 2016
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Thu, 6 Oct 2016 10:48:09 +0000
Subject: G1 GC - [Ref Enq] taking lot of time
Message-ID:

Hi All

We are experimenting with G1 GC for one of our applications. Please find our application settings below.

GC Settings

-XX:MaxPermSize=512m
-XX:+UseG1GC
-XX:G1ReservePercent=40
-XX:ConcGCThreads=14
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

JVM: jdk_7u40_x64

While scanning for events that caused application threads to be stopped, we found the following instance in our GC logs.

GC Logs
=======

{Heap before GC invocations=57832 (full 1):
 garbage-first heap  total 5242880K, used 3111240K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000)
  region size 2048K, 672 young (1376256K), 1 survivors (2048K)
 compacting perm gen  total 98304K, used 96772K [0x00000007e0000000, 0x00000007e6000000, 0x0000000800000000)
   the space 98304K, 98% used [0x00000007e0000000, 0x00000007e5e81338, 0x00000007e5e81400, 0x00000007e6000000)
No shared spaces configured.
2016-10-05T12:13:19.835-0400: 80080.725: [GC pause (young)
Desired survivor size 88080384 bytes, new threshold 15 (max 15)
- age   1:     136824 bytes,     136824 total
- age   2:      11120 bytes,     147944 total
- age   3:      11408 bytes,     159352 total
- age   4:       9248 bytes,     168600 total
- age   5:       8632 bytes,     177232 total
- age   6:       8224 bytes,     185456 total
- age   7:       8784 bytes,     194240 total
- age   8:      87856 bytes,     282096 total
- age   9:      25080 bytes,     307176 total
- age  10:       8272 bytes,     315448 total
- age  11:       7984 bytes,     323432 total
- age  12:      14120 bytes,     337552 total
- age  13:       9824 bytes,     347376 total
- age  14:      11616 bytes,     358992 total
- age  15:       8032 bytes,     367024 total
, 51.2453720 secs]
   [Parallel Time: 11.3 ms, GC Workers: 16]
      [GC Worker Start (ms): Min: 80080725.7, Avg: 80080725.8, Max: 80080725.9, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 4.4, Avg: 5.2, Max: 10.7, Diff: 6.3, Sum: 83.5]
      [Update RS (ms): Min: 0.0, Avg: 0.8, Max: 1.3, Diff: 1.3, Sum: 13.4]
         [Processed Buffers: Min: 0, Avg: 3.0, Max: 7, Diff: 7, Sum: 48]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.2, Sum: 3.7]
      [Object Copy (ms): Min: 0.1, Avg: 0.4, Max: 0.6, Diff: 0.5, Sum: 6.0]
      [Termination (ms): Min: 0.0, Avg: 4.3, Max: 4.6, Diff: 4.6, Sum: 68.6]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.0]
      [GC Worker Total (ms): Min: 10.9, Avg: 11.0, Max: 11.2, Diff: 0.3, Sum: 176.3]
      [GC Worker End (ms): Min: 80080736.7, Avg: 80080736.8, Max: 80080736.8, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 51233.7 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 17.2 ms]
      [Ref Enq: 50800.9 ms]
      [Free CSet: 381.2 ms]
   [Eden: 1342.0M(1342.0M)->0.0B(254.0M) Survivors: 2048.0K->2048.0K Heap: 3038.3M(5120.0M)->1696.2M(5120.0M)]
Heap after GC invocations=57833 (full 1):
 garbage-first heap  total 5242880K, used 1736897K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000)
  region size 2048K, 1 young (2048K), 1 survivors (2048K)
 compacting perm gen  total 98304K, used 96772K [0x00000007e0000000, 0x00000007e6000000, 0x0000000800000000)
   the space 98304K, 98% used [0x00000007e0000000, 0x00000007e5e81338, 0x00000007e5e81400, 0x00000007e6000000)
No shared spaces configured.
}
 [Times: user=0.00 sys=17.51, real=51.28 secs]
2016-10-05T12:14:11.208-0400: 80132.098: Total time for which application threads were stopped: 51.3791600 seconds

It looks like the Reference Enqueue (Ref Enq) phase took nearly 51 seconds to complete. Could you please help me understand why it might take so much time? Do I need to add any diagnostic flags to get more information? Apologies if a similar question has already been answered on this mailing list. Any help is really appreciated.

Thanks and Regards
Prasanna

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock's Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc. All rights reserved.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From prasanna.gopal at blackrock.com  Fri Oct  7 12:00:04 2016
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Fri, 7 Oct 2016 12:00:04 +0000
Subject: G1-GC - Full GC [humongous allocation request failed]
Message-ID:

Hi All

We have one of our applications running with the following settings.

JVM: jdk_7u40_x64 (we are in the process of migrating to the latest JDK 7 release)

-XX:MaxPermSize=512m
-XX:+UseG1GC
-XX:G1ReservePercent=40
-XX:ConcGCThreads=14
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintHeapAtGC
-XX:+PrintReferenceGC
-Xmx5120M
-Xms5120M

From our GC logs, we can see our application is going into Full GC due to humongous allocation failures.
But from the logs we can see GC logs ======= 2016-10-07T02:37:14.978-0400: 71150.009: Total time for which application threads were stopped: 0.0137870 seconds 71150.399: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.399: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 75497472 bytes, attempted expansion amount: 75497472 bytes] 71150.399: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 2016-10-07T02:37:15.367-0400: 71150.399: Application time: 0.3898050 seconds 71150.401: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.401: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 75497472 bytes, attempted expansion amount: 75497472 bytes] 71150.401: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=55428 (full 4): garbage-first heap total 5242880K, used 1900903K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 197 young (403456K), 14 survivors (28672K) compacting perm gen total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) No shared spaces configured. 2016-10-07T02:37:15.369-0400: 71150.401: [GC pause (young) Desired survivor size 108003328 bytes, new threshold 15 (max 15) - age 1: 2362400 bytes, 2362400 total - age 2: 393128 bytes, 2755528 total - age 3: 1086824 bytes, 3842352 total - age 4: 1086528 bytes, 4928880 total - age 5: 1075480 bytes, 6004360 total - age 6: 1126736 bytes, 7131096 total - age 7: 1153072 bytes, 8284168 total - age 8: 1145832 bytes, 9430000 total - age 9: 1217904 bytes, 10647904 total - age 10: 1188384 bytes, 11836288 total - age 11: 1212456 bytes, 13048744 total - age 12: 1263960 bytes, 14312704 total - age 13: 4816 bytes, 14317520 total - age 14: 88952 bytes, 14406472 total - age 15: 7408 bytes, 14413880 total 71150.401: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 149101, predicted base time: 16.35 ms, remaining time: 183.65 ms, target pause time: 200.00 ms] 71150.401: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 183 regions, survivors: 14 regions, predicted young region time: 3.45 ms] 71150.401: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 183 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 19.80 ms, target pause time: 200.00 ms] 2016-10-07T02:37:15.410-0400: 71150.442: [SoftReference, 0 refs, 0.0000460 secs]2016-10-07T02:37:15.410-0400: 71150.442: [WeakReference, 1 refs, 0.0000050 secs]2016-10-07T02:37:15.410-0400: 71150.442: [FinalReference, 4 refs, 0.0000210 secs]2016-10-07T02:37:15.410-0400: 71150.442: [PhantomReference, 0 refs, 0.0000040 secs]2016-10-07T02:37:15.410-0400: 71150.442: [JNI Weak Reference, 0.0000050 secs], 0.0428440 secs] [Parallel Time: 40.0 ms, GC Workers: 16] [GC Worker Start (ms): Min: 71150401.6, Avg: 71150401.8, Max: 71150401.9, Diff: 0.4] [Ext Root Scanning (ms): Min: 3.1, Avg: 3.9, Max: 8.4, Diff: 5.3, Sum: 62.1] [Update RS (ms): Min: 14.8, Avg: 19.0, Max: 19.8, Diff: 5.0, Sum: 304.3] [Processed Buffers: Min: 21, Avg: 37.6, Max: 86, Diff: 65, Sum: 601] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.9] [Object Copy (ms): Min: 
16.5, Avg: 16.6, Max: 16.8, Diff: 0.3, Sum: 266.3] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.7] [GC Worker Total (ms): Min: 39.5, Avg: 39.6, Max: 39.9, Diff: 0.4, Sum: 634.4] [GC Worker End (ms): Min: 71150441.4, Avg: 71150441.4, Max: 71150441.5, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 0.2 ms] [Other: 2.7 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.3 ms] [Ref Enq: 0.0 ms] [Free CSet: 1.3 ms] [Eden: 366.0M(1616.0M)->0.0B(1384.0M) Survivors: 28.0M->104.0M Heap: 1856.6M(5120.0M)->1568.8M(5120.0M)] Heap after GC invocations=55429 (full 4): garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 52 young (106496K), 52 survivors (106496K) compacting perm gen total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) No shared spaces configured. } [Times: user=0.64 sys=0.00, real=0.04 secs] 71150.444: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.444: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 69206016 bytes, attempted expansion amount: 69206016 bytes] 71150.444: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 2016-10-07T02:37:15.412-0400: 71150.444: Total time for which application threads were stopped: 0.0448480 seconds 2016-10-07T02:37:15.412-0400: 71150.444: Application time: 0.0000500 seconds 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 69206016 bytes, attempted expansion amount: 69206016 bytes] 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 75497488 bytes] 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 75497488 bytes, attempted expansion amount: 77594624 bytes] 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=55429 (full 4): garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 53 young (108544K), 52 survivors (106496K) compacting perm gen total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) No shared spaces configured. 
2016-10-07T02:37:15.414-0400: 71150.445: [Full GC2016-10-07T02:37:16.337-0400: 71151.368: [SoftReference, 86 refs, 0.0000720 secs]2016-10-07T02:37:16.337-0400: 71151.368: [WeakReference, 1760 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: [FinalReference, 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: 71151.369: [PhantomReference, 0 refs, 0.0000030 secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs]
   [Eden: 2048.0K(1384.0M)->0.0B(2112.0M) Survivors: 104.0M->0.0B Heap: 1568.8M(5120.0M)->915.2M(5120.0M)]
Heap after GC invocations=55430 (full 5):
 garbage-first heap  total 5242880K, used 937168K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000)
  region size 2048K, 0 young (0K), 0 survivors (0K)
 compacting perm gen  total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000)
   the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000)
No shared spaces configured.

Could you please give us your views on the following queries:

1) A humongous allocation request for 72 MB failed, yet from the logs we can also see around 3 GB of free space. Does this mean our application is encountering a high amount of fragmentation?
2) Will tuning the GC params so that mixed GCs happen more often help in resolving such Full GCs?
3) Is there a -XX:Print* flag which can tell us how many old gen and humongous regions we have (other than looking at the [G1 Ergonomics] output, which sometimes gives the old gen region count)?

Please do let me know if you need any more information. I appreciate your help.

Thanks and Regards
Prasanna

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock's Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc. All rights reserved.
-------------- next part --------------
An HTML attachment was scrubbed...
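[As a back-of-the-envelope check on question 1 above: with the 2048K regions shown in the logs, G1 treats any single allocation larger than half a region (1 MB) as humongous, and a humongous object must be placed in contiguous free regions. The following is a minimal sketch of that arithmetic, not part of the original thread; the constants are taken from the log above, and the half-region threshold is G1's documented rule.]

    public class HumongousMath {
        public static void main(String[] args) {
            long regionSize = 2048L * 1024;              // "region size 2048K" from the log
            long humongousThreshold = regionSize / 2;    // G1: objects larger than half a region are humongous
            long request = 75_497_488L;                  // "allocation request: 75497488 bytes" from the log

            System.out.println("humongous allocation: " + (request > humongousThreshold));

            // Number of *contiguous* free regions G1 has to find for this one object.
            // Matches the 77594624-byte (37-region) expansion attempt reported in the log.
            long regionsNeeded = (request + regionSize - 1) / regionSize;
            System.out.println("contiguous 2 MB regions needed: " + regionsNeeded);

            // Roughly 3.5 GB of the 5 GB heap is free after the young pause above, yet the
            // request still fails: the free regions are not contiguous, i.e. fragmentation.
        }
    }

[Since -XX:+PrintAdaptiveSizePolicy is already enabled, every "humongous allocation request failed" ergonomics line reports the request size, so the same arithmetic can be applied to each failure.]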
URL: From vitalyd at gmail.com Fri Oct 7 12:48:49 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 08:48:49 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: Message-ID: Hi Prasanna, First suggestion - move to latest Java 8. G1 saw a lot of improvements in 8, and 7 is EOL of course. Humongous allocations require contiguous regions to satisfy the allocation, and are done directly out of old gen. You're reserving 40% of heap to handle overflow(G1ReservePercent) - why? I believe that reserve is only for mitigating to-space exhaustion, which is during evacuation only - they won't be available for humongous allocations (someone can correct me if that's wrong). Heap expansion fails because you're already at the limit given 40% is reserved. Again, I think you'll get more help here if you move to one of the latest Java 8 releases. On Friday, October 7, 2016, Gopal, Prasanna CWK < prasanna.gopal at blackrock.com > wrote: > Hi All > > > > We have one of our application with the following settings > > > > JVM : jdk_7u40_x64 ( we are in process of migrating to latest Jdk 7 > family ) > > > > -XX:MaxPermSize=512m > > -XX:+UseG1GC > > -XX:G1ReservePercent=40 > > -XX:ConcGCThreads=14 > > -XX:+PrintGCDateStamps > > -XX:+PrintTenuringDistribution > > -XX:+PrintGCApplicationConcurrentTime > > -XX:+PrintGCApplicationStoppedTime > > -XX:+PrintAdaptiveSizePolicy > > -XX:+PrintHeapAtGC > > -XX:+PrintReferenceGC > > -Xmx5120M > > -Xms5120M > > > > > > From our GC logs , we can see our application is going Full GC due to > humongous allocation failure. But from the logs we can see > > > > > > GC logs > > ======= > > > > 2016-10-07T02:37:14.978-0400: 71150.009: Total time for which application > threads were stopped: 0.0137870 seconds > > 71150.399: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: *75497472* bytes, attempted expansion amount: 75497472 > bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > 2016-10-07T02:37:15.367-0400: 71150.399: Application time: 0.3898050 > seconds > > 71150.401: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497472 bytes, attempted expansion amount: 75497472 > bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > {Heap before GC invocations=55428 (full 4): > > garbage-first heap total 5242880K, used 1900903K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 197 young (403456K), 14 survivors (28672K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > 2016-10-07T02:37:15.369-0400: 71150.401: [GC pause (young) > > Desired survivor size 108003328 bytes, new threshold 15 (max 15) > > - age 1: 2362400 bytes, 2362400 total > > - age 2: 393128 bytes, 2755528 total > > - age 3: 1086824 bytes, 3842352 total > > - age 4: 1086528 bytes, 4928880 total > > - age 5: 1075480 bytes, 6004360 total > > - age 6: 1126736 bytes, 7131096 total > > - age 7: 1153072 bytes, 8284168 total > > - age 8: 1145832 bytes, 9430000 total > > - age 9: 1217904 bytes, 10647904 total > > - age 10: 1188384 bytes, 11836288 total > > - age 11: 1212456 bytes, 13048744 total > > - age 12: 1263960 bytes, 14312704 total > > - age 13: 4816 bytes, 14317520 total > > - age 14: 88952 bytes, 14406472 total > > - age 15: 7408 bytes, 14413880 total > > 71150.401: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 149101, predicted base time: 16.35 ms, remaining time: > 183.65 ms, target pause time: 200.00 ms] > > 71150.401: [G1Ergonomics (CSet Construction) add young regions to CSet, > eden: 183 regions, survivors: 14 regions, predicted young region time: 3.45 > ms] > > 71150.401: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: > 183 regions, survivors: 14 regions, old: 0 regions, predicted pause time: > 19.80 ms, target pause time: 200.00 ms] > > 2016-10-07T02:37:15.410-0400: 71150.442: [SoftReference, 0 refs, 0.0000460 > secs]2016-10-07T02:37:15.410-0400: 71150.442: [WeakReference, 1 refs, > 0.0000050 secs]2016-10-07T02:37:15.410-0400: 71150.442: [FinalReference, > 4 refs, 0.0000210 secs]2016-10-07T02:37:15.410-0400: 71150.442: > [PhantomReference, 0 refs, 0.0000040 secs]2016-10-07T02:37:15.410-0400: > 71150.442: [JNI Weak Reference, 0.0000050 secs], 0.0428440 secs] > > [Parallel Time: 40.0 ms, GC Workers: 16] > > [GC Worker Start (ms): Min: 71150401.6, Avg: 71150401.8, Max: > 71150401.9, Diff: 0.4] > > [Ext Root Scanning (ms): Min: 3.1, Avg: 3.9, Max: 8.4, Diff: 5.3, > Sum: 62.1] > > [Update RS (ms): Min: 14.8, Avg: 19.0, Max: 19.8, Diff: 5.0, Sum: > 304.3] > > [Processed Buffers: Min: 21, Avg: 37.6, Max: 86, Diff: > 65, Sum: 601] > > [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.9] > > [Object Copy (ms): Min: 16.5, Avg: 16.6, Max: 16.8, Diff: 0.3, Sum: > 266.3] > > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.7] > > [GC Worker Total (ms): Min: 39.5, Avg: 39.6, Max: 39.9, Diff: 0.4, > Sum: 634.4] > > [GC Worker End (ms): Min: 71150441.4, Avg: 71150441.4, Max: > 71150441.5, Diff: 0.1] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.2 ms] > > [Other: 2.7 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 0.3 ms] > > [Ref Enq: 0.0 ms] > > [Free CSet: 1.3 ms] > > [Eden: 366.0M(1616.0M)->0.0B(1384.0M) Survivors: 28.0M->104.0M Heap: > 1856.6M(5120.0M)->1568.8M(5120.0M)] > > Heap after GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 52 young (106496K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > } > > [Times: user=0.64 sys=0.00, real=0.04 secs] > > 71150.444: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: 69206016 > bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > 2016-10-07T02:37:15.412-0400: 71150.444: Total time for which application > threads were stopped: 0.0448480 seconds > > 2016-10-07T02:37:15.412-0400: 71150.444: Application time: 0.0000500 > seconds > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: 69206016 > bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > allocation request failed, allocation request: 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497488 bytes, attempted expansion amount: 77594624 > bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > {Heap before GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 53 young (108544K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. > > 2016-10-07T02:37:15.414-0400: 71150.445: [Full > GC2016-10-07T02:37:16.337-0400: 71151.368: [SoftReference, 86 refs, > 0.0000720 secs]2016-10-07T02:37:16.337-0400: 71151.368: [WeakReference, > 1760 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [FinalReference, 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: > 71151.369: [PhantomReference, 0 refs, 0.0000030 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, > 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > 60 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [FinalReference, 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: > 71151.369: [PhantomReference, 0 refs, 0.0000030 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, > 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > [Eden: 2048.0K(1384.0M)->0.0B(2112.0M) Survivors: 104.0M->0.0B Heap: > 1568.8M(5120.0M)->915.2M(5120.0M)] > > Heap after GC invocations=55430 (full 5): > > garbage-first heap total 5242880K, used 937168K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 0 young (0K), 0 survivors (0K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > > > Could you please help me in giving your views on the following queries > > > > 1) Humongus allocation request for 72 mb failed, from the logs we > can also see we have free space of around 3 GB. Does this means , our > application is encountering high amount of fragmentation ?. > > 2) Does tunning the gc params to make sure Mixed GC happens more , > will help in resolving such Full GC?s ? > > 3) Is there ?XX:Print* flag which can tell us how many old gen and > humongous regions we have (other than looking at [G1 Ergonomics] output > , which sometimes gives old gen region count) ? > > > > Please do let me know , if you need any more information. Appreciate your > help. > > > > Thanks and Regards > > Prasanna > > > > This message may contain information that is confidential or privileged. > If you are not the intended recipient, please advise the sender immediately > and delete this message. See http://www.blackrock.com/corpo > rate/en-us/compliance/email-disclaimers for further information. Please > refer to http://www.blackrock.com/corporate/en-us/compliance/privacy- > policy for more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) > Limited are authorised and regulated by the Financial Conduct Authority. > Registered in England No. 796793 and No. 2020394 respectively. BlackRock > Life Limited is authorised by the Prudential Regulation Authority and > regulated by the Financial Conduct Authority and the Prudential Regulation > Authority. Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is > authorised and regulated by the Financial Conduct Authority and is a > registered investment adviser with the Securities and Exchange Commission > (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange > Place One, 1 Semple Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. > > ? 2016 BlackRock, Inc. All rights reserved. > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Fri Oct 7 15:52:01 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Fri, 7 Oct 2016 08:52:01 -0700 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: Message-ID: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> Prasanna, In addition to what Vitaly said, I have some comments about your question: 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. 2)2) Does tunning the gc params to make sure Mixed GC happens more , will help in resolving such Full GC?s ? If you can not move to jdk8, in jdk7 the default for parameter G1MixedGCLiveThresholdPercent is 65 (changed to 85 in jdk8). That is too low for most workloads. You can increase that so that more old regions will be treat as candidate for mixed gc. 
3) Is there ?XX:Print* flag which can tell us how many old gen and humongous regions we have (other than looking at [G1 Ergonomics] output , which sometimes gives old gen region count) ? You get that count in jdk9, not jdk7. One basic requirement for using G1GC, you need to give the pause time goal. G1 uses that for sizing young/old gen. The default is 200ms. Thanks Jenny On 10/07/2016 05:48 AM, Vitaly Davidovich wrote: > Hi Prasanna, > > First suggestion - move to latest Java 8. G1 saw a lot of > improvements in 8, and 7 is EOL of course. > > Humongous allocations require contiguous regions to satisfy the > allocation, and are done directly out of old gen. You're reserving > 40% of heap to handle overflow(G1ReservePercent) - why? I believe that > reserve is only for mitigating to-space exhaustion, which is during > evacuation only - they won't be available for humongous allocations > (someone can correct me if that's wrong). > > Heap expansion fails because you're already at the limit given 40% is > reserved. > > Again, I think you'll get more help here if you move to one of the > latest Java 8 releases. > > On Friday, October 7, 2016, Gopal, Prasanna CWK > > wrote: > > Hi All > > We have one of our application with the following settings > > JVM : jdk_7u40_x64 ( we are in process of migrating to latest > Jdk 7 family ) > > -XX:MaxPermSize=512m > > -XX:+UseG1GC > > -XX:G1ReservePercent=40 > > -XX:ConcGCThreads=14 > > -XX:+PrintGCDateStamps > > -XX:+PrintTenuringDistribution > > -XX:+PrintGCApplicationConcurrentTime > > -XX:+PrintGCApplicationStoppedTime > > -XX:+PrintAdaptiveSizePolicy > > -XX:+PrintHeapAtGC > > -XX:+PrintReferenceGC > > -Xmx5120M > > -Xms5120M > > From our GC logs , we can see our application is going Full GC due > to humongous allocation failure. But from the logs we can see > > GC logs > > ======= > > 2016-10-07T02:37:14.978-0400: 71150.009: Total time for which > application threads were stopped: 0.0137870 seconds > > 71150.399: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: *75497472* bytes, attempted expansion amount: > 75497472 bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > 2016-10-07T02:37:15.367-0400: 71150.399: Application time: > 0.3898050 seconds > > 71150.401: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497472 bytes, attempted expansion amount: > 75497472 bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > {Heap before GC invocations=55428 (full 4): > > garbage-first heap total 5242880K, used 1900903K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 197 young (403456K), 14 survivors (28672K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > 2016-10-07T02:37:15.369-0400: 71150.401: [GC pause (young) > > Desired survivor size 108003328 bytes, new threshold 15 (max 15) > > - age 1: 2362400 bytes, 2362400 total > > - age 2: 393128 bytes, 2755528 total > > - age 3: 1086824 bytes, 3842352 total > > - age 4: 1086528 bytes, 4928880 total > > - age 5: 1075480 bytes, 6004360 total > > - age 6: 1126736 bytes, 7131096 total > > - age 7: 1153072 bytes, 8284168 total > > - age 8: 1145832 bytes, 9430000 total > > - age 9: 1217904 bytes, 10647904 total > > - age 10: 1188384 bytes, 11836288 total > > - age 11: 1212456 bytes, 13048744 total > > - age 12: 1263960 bytes, 14312704 total > > - age 13: 4816 bytes, 14317520 total > > - age 14: 88952 bytes, 14406472 total > > - age 15: 7408 bytes, 14413880 total > > 71150.401: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 149101, predicted base time: 16.35 ms, remaining > time: 183.65 ms, target pause time: 200.00 ms] > > 71150.401: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 183 regions, survivors: 14 regions, predicted young > region time: 3.45 ms] > > 71150.401: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 183 regions, survivors: 14 regions, old: 0 regions, > predicted pause time: 19.80 ms, target pause time: 200.00 ms] > > 2016-10-07T02:37:15.410-0400: 71150.442: [SoftReference, 0 refs, > 0.0000460 secs]2016-10-07T02:37:15.410-0400: 71150.442: > [WeakReference, 1 refs, 0.0000050 > secs]2016-10-07T02:37:15.410-0400: 71150.442: [FinalReference, 4 > refs, 0.0000210 secs]2016-10-07T02:37:15.410-0400: 71150.442: > [PhantomReference, 0 refs, 0.0000040 > secs]2016-10-07T02:37:15.410-0400: 71150.442: [JNI Weak Reference, > 0.0000050 secs], 0.0428440 secs] > > [Parallel Time: 40.0 ms, GC Workers: 16] > > [GC Worker Start (ms): Min: 71150401.6, Avg: 71150401.8, Max: > 71150401.9, Diff: 0.4] > > [Ext Root Scanning (ms): Min: 3.1, Avg: 3.9, Max: 8.4, Diff: 5.3, > Sum: 62.1] > > [Update RS (ms): Min: 14.8, Avg: 19.0, Max: 19.8, Diff: 5.0, Sum: > 304.3] > > [Processed Buffers: Min: 21, Avg: 37.6, Max: 86, Diff: 65, > Sum: 601] > > [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.9] > > [Object Copy (ms): Min: 16.5, Avg: 16.6, Max: 16.8, Diff: 0.3, > Sum: 266.3] > > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.7] > > [GC Worker Total (ms): Min: 39.5, Avg: 39.6, Max: 39.9, Diff: 0.4, > Sum: 634.4] > > [GC Worker End (ms): Min: 71150441.4, Avg: 71150441.4, Max: > 71150441.5, Diff: 0.1] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.2 ms] > > [Other: 2.7 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 0.3 ms] > > [Ref Enq: 0.0 ms] > > [Free CSet: 1.3 ms] > > [Eden: 366.0M(1616.0M)->0.0B(1384.0M) Survivors: 28.0M->104.0M > Heap: 1856.6M(5120.0M)->1568.8M(5120.0M)] > > Heap after GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 52 young (106496K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > } > > [Times: user=0.64 sys=0.00, real=0.04 secs] > > 71150.444: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: > 69206016 bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > 2016-10-07T02:37:15.412-0400: 71150.444: Total time for which > application threads were stopped: 0.0448480 seconds > > 2016-10-07T02:37:15.412-0400: 71150.444: Application time: > 0.0000500 seconds > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: > 69206016 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: allocation request failed, allocation request: 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497488 bytes, attempted expansion amount: > 77594624 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > {Heap before GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 53 young (108544K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. > > 2016-10-07T02:37:15.414-0400: 71150.445: [Full > GC2016-10-07T02:37:16.337-0400: 71151.368: [SoftReference, 86 > refs, 0.0000720 secs]2016-10-07T02:37:16.337-0400: 71151.368: > [WeakReference, 1760 refs, 0.0002980 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [FinalReference, > 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [PhantomReference, 0 refs, 0.0000030 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, > 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > 60 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [FinalReference, 1201 refs, 0.0002080 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [PhantomReference, 0 > refs, 0.0000030 secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI > Weak Reference, 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > [Eden: 2048.0K(1384.0M)->0.0B(2112.0M) Survivors: 104.0M->0.0B > Heap: 1568.8M(5120.0M)->915.2M(5120.0M)] > > Heap after GC invocations=55430 (full 5): > > garbage-first heap total 5242880K, used 937168K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 0 young (0K), 0 survivors (0K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > Could you please help me in giving your views on the following queries > > 1) Humongus allocation request for 72 mb failed, from the logs we > can also see we have free space of around 3 GB. Does this means , > our application is encountering high amount of fragmentation ?. > > 2) Does tunning the gc params to make sure Mixed GC happens more , > will help in resolving such Full GC?s ? > > 3) Is there ?XX:Print* flag which can tell us how many old gen and > humongous regions we have (other than looking at [G1 Ergonomics] > output , which sometimes gives old gen region count) ? > > Please do let me know , if you need any more information. > Appreciate your help. > > Thanks and Regards > > Prasanna > > This message may contain information that is confidential or > privileged. If you are not the intended recipient, please advise > the sender immediately and delete this message. See > http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers > > for further information. Please refer to > http://www.blackrock.com/corporate/en-us/compliance/privacy-policy > for > more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment > Management (UK) Limited are authorised and regulated by the > Financial Conduct Authority. Registered in England No. 796793 and > No. 2020394 respectively. BlackRock Life Limited is authorised by > the Prudential Regulation Authority and regulated by the Financial > Conduct Authority and the Prudential Regulation Authority. > Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International > Limited is authorised and regulated by the Financial Conduct > Authority and is a registered investment adviser with the > Securities and Exchange Commission (SEC). Registered in Scotland > No. SC160821. Registered Office: Exchange Place One, 1 Semple > Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations > . > > ? 2016 BlackRock, Inc. All rights reserved. > > > > -- > Sent from my phone > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 16:00:00 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 12:00:00 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> Message-ID: Hi Jenny, On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com wrote: > Prasanna, > > In addition to what Vitaly said, I have some comments about your question: > > 1) Humongus allocation request for 72 mb failed, from the logs we > can also see we have free space of around 3 GB. Does this means , our > application is encountering high amount of fragmentation ?. > > It is possible. What it means is g1 can not find 36 consecutive regions > for that 72 mb object. > > I agree the ReservePercent=40 is too high, but that should not prevent > allocating to the old gen. G1 tries to honor ReservePercent. > So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? 
All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Oct 7 16:46:00 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 7 Oct 2016 11:46:00 -0500 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> Message-ID: <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Hi Vitaly, Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is evacuating objects too. So a ?to-space exhausted? means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. charlie > On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: > > Hi Jenny, > > On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: > Prasanna, > > In addition to what Vitaly said, I have some comments about your question: > > 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. > > It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. > I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. > > So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? > > Thanks > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 16:51:47 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 12:51:47 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: Hi Charlie, On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt wrote: > Hi Vitaly, > > Just to clarify things in case there might be some confusion ? one of the > terms in G1 can be a little confusing with a term used in Parallel GC, > Serial GC and CMS GC, and that is ?to-space?. In the latter case, > ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is > evacuating objects too. So a ?to-space exhausted? 
means that during an > evacuation of live objects from a G1 region (which could be an eden region, > survivor region or old region), and there is not an available region to > evacuate those live objects, this constitutes a ?to-space failure?. > > I may be wrong, but my understanding is that once a humongous object is > allocated, it is not evacuated. It stays in the same allocated region(s) > until it is marked as being unreachable and can be reclaimed. > Right, I understand the distinction in terminology. What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. Thanks > > charlie > > On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: > > Hi Jenny, > > On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: > >> Prasanna, >> >> In addition to what Vitaly said, I have some comments about your question: >> >> 1) Humongus allocation request for 72 mb failed, from the logs we >> can also see we have free space of around 3 GB. Does this means , our >> application is encountering high amount of fragmentation ?. >> >> It is possible. What it means is g1 can not find 36 consecutive regions >> for that 72 mb object. >> >> I agree the ReservePercent=40 is too high, but that should not prevent >> allocating to the old gen. G1 tries to honor ReservePercent. >> > So just to clarify - is the space (i.e. regions) reserved by > G1ReservePercent allocatable to humongous object allocations? All > docs/webpages I found talk about this space being for holding survivors > (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're > saying these reserved regions should also be used to satisfy HO allocs? > > Thanks > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Oct 7 17:00:06 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 7 Oct 2016 12:00:06 -0500 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: Glad to hear you?re not confused with the terminology. :-) On ReservePercent, my understanding is that the ReservePercent applies to the number of regions that will not used for young generation, eden regions or survivor regions. The intent is avoid to-space exhausted by ensuring a ?reserved percentage? of regions are available for evacuation. This implies that those reserved regions could be used for old regions or humongous regions. charlie > On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > Hi Vitaly, > > Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? is a survivor space. In G1, ?to-space? 
is any space that a G1 is evacuating objects too. So a ?to-space exhausted? means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. > > I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. > > Thanks > > charlie > >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: >> Prasanna, >> >> In addition to what Vitaly said, I have some comments about your question: >> >> 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. >> >> It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. >> I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. >> >> So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? >> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 17:09:13 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 13:09:13 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: On Fri, Oct 7, 2016 at 1:00 PM, charlie hunt wrote: > Glad to hear you?re not confused with the terminology. :-) > > On ReservePercent, my understanding is that the ReservePercent applies to > the number of regions that will not used for young generation, eden regions > or survivor regions. The intent is avoid to-space exhausted by ensuring a > ?reserved percentage? of regions are available for evacuation. This implies > that those reserved regions could be used for old regions or humongous > regions. > Ok, so then the more explicit wording would be "The intent is to avoid to-space exhausted by ensuring a reserved percentage of regions are available for evacuation or humongous object allocation", right? 
Perhaps the "for evacuation" is throwing it off a bit for me, since the HO allocation isn't an "evacuation" obviously. Thanks Charlie P.S. I realize I'm hijacking Prasanna's thread quite a bit, but hopefully the discussed info is useful anyway. > > charlie > > On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion ? one of the >> terms in G1 can be a little confusing with a term used in Parallel GC, >> Serial GC and CMS GC, and that is ?to-space?. In the latter case, >> ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is >> evacuating objects too. So a ?to-space exhausted? means that during an >> evacuation of live objects from a G1 region (which could be an eden region, >> survivor region or old region), and there is not an available region to >> evacuate those live objects, this constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous object is >> allocated, it is not evacuated. It stays in the same allocated region(s) >> until it is marked as being unreachable and can be reclaimed. >> > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating to > the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 > tries to honor ReservePercent". It wasn't clear to me whether that implies > humongous allocations can look for contiguous regions in the reserve, or > not. That's what I'm hoping to get clarification on since other sources > online don't mention G1ReservePercent playing a role for HO specifically. > > Thanks > >> >> charlie >> >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > > wrote: >> >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments about your >>> question: >>> >>> 1) Humongus allocation request for 72 mb failed, from the logs we >>> can also see we have free space of around 3 GB. Does this means , our >>> application is encountering high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 consecutive regions >>> for that 72 mb object. >>> >>> I agree the ReservePercent=40 is too high, but that should not prevent >>> allocating to the old gen. G1 tries to honor ReservePercent. >>> >> So just to clarify - is the space (i.e. regions) reserved by >> G1ReservePercent allocatable to humongous object allocations? All >> docs/webpages I found talk about this space being for holding survivors >> (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're >> saying these reserved regions should also be used to satisfy HO allocs? >> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
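[To put a number on the reserve being debated here: the message that follows quotes the JDK 9 sizing code, where the reserve is simply a fraction of the total region count. The sketch below applies that formula to the heap in this thread (5120 MB heap, 2048K regions, G1ReservePercent=40); it is an illustration only, and HotSpot's exact rounding may differ.]

    public class ReserveMath {
        public static void main(String[] args) {
            long heapBytes = 5120L * 1024 * 1024;       // -Xms5120M / -Xmx5120M
            long regionSize = 2048L * 1024;             // 2048K regions, from the logs
            long totalRegions = heapBytes / regionSize; // 2560 regions
            int reservePercent = 40;                    // -XX:G1ReservePercent=40

            // _reserve_regions = reserve percent * regions of the heap (formula quoted below)
            long reserveRegions = (totalRegions * reservePercent + 99) / 100;  // ceiling, ~1024 regions
            System.out.println(reserveRegions + " regions (~"
                    + (reserveRegions * regionSize) / (1024 * 1024)
                    + " MB) held back from young-gen sizing");
        }
    }

[That is roughly 2 GB of the 5 GB heap withheld from young-gen sizing, which is why 40% is flagged as unusually high earlier in this thread.]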
URL: From yu.zhang at oracle.com Fri Oct 7 17:15:54 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Fri, 7 Oct 2016 10:15:54 -0700 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Hi, Vitaly, Here is what happens in jdk9(I think the logic is the same as in jdk8). _reserve_regions = reserve percent*regions of the heap when trying to decide regions for young gen, we look at the free regions at the end of the collection, and try to honor the reserve_regions if (available_free_regions > _reserve_regions) { base_free_regions = available_free_regions - _reserve_regions; } And there are other constrains to consider: user defined constrains and pause time goal. This is what I meant by 'try to honor' the reserved. If there is enough available_free_regions, it will reserve those regions. Those regions can be used as old or young. Jenny On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > > Hi Vitaly, > > Just to clarify things in case there might be some confusion ? one > of the terms in G1 can be a little confusing with a term used in > Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the > latter case, ?to-space? is a survivor space. In G1, ?to-space? is > any space that a G1 is evacuating objects too. So a ?to-space > exhausted? means that during an evacuation of live objects from a > G1 region (which could be an eden region, survivor region or old > region), and there is not an available region to evacuate those > live objects, this constitutes a ?to-space failure?. > > I may be wrong, but my understanding is that once a humongous > object is allocated, it is not evacuated. It stays in the same > allocated region(s) until it is marked as being unreachable and > can be reclaimed. > > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating > to the old gen. G1 tries to honor ReservePercent." Specifically, the > "G1 tries to honor ReservePercent". It wasn't clear to me whether that > implies humongous allocations can look for contiguous regions in the > reserve, or not. That's what I'm hoping to get clarification on since > other sources online don't mention G1ReservePercent playing a role for > HO specifically. > > Thanks > > > charlie > >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > > wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com >> > > wrote: >> >> Prasanna, >> >> In addition to what Vitaly said, I have some comments about >> your question: >> >> 1) Humongus allocation request for 72 mb failed, from the >> logs we can also see we have free space of around 3 GB. Does >> this means , our application is encountering high amount of >> fragmentation ?. >> >> It is possible. What it means is g1 can not find 36 >> consecutive regions for that 72 mb object. >> >> I agree the ReservePercent=40 is too high, but that should >> not prevent allocating to the old gen. G1 tries to honor >> ReservePercent. >> >> So just to clarify - is the space (i.e. regions) reserved by >> G1ReservePercent allocatable to humongous object allocations? All >> docs/webpages I found talk about this space being for holding >> survivors (i.e. 
evac failure/to-space exhaustion mitigation). It >> sounds like you're saying these reserved regions should also be >> used to satisfy HO allocs? >> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Oct 7 17:24:57 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 7 Oct 2016 12:24:57 -0500 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> I think others are benefiting from your question(s) ? and it?s helping refresh my memory of things too. ;-) Actually, I just looked at what we documented in Java Performance Companion for G1ReservePercent, this wording may imply a very slightly subtle different definition, ?To reduce the risk of getting a promotion failure, G1 reserves some memory for promotions. This memory will not be used for the young generation.? Perhaps one of the G1 engineers can clarify this? Based on what we documented for G1ReservePercent, it implies that regions are reserved for promotions, which implies old generation regions. Note that on a young GC, some objects will be evacuated to survivor regions, and if G1 decides to grow the number of eden regions, then both those evacuated ?to survivor regions? and ?additional eden regions? will not come from that G1ReservePercent. And, since humongous objects are allocated from old regions, it is not clear to me that G1ReservePercent regions could be allocated into as humongous objects if the intent for G1ReservePercent is for promotions. Humongous objects are not promoted. They are allocated directly into humongous regions which get allocated from old generation. Again, hopefully one of the G1 engineers can jump in and clarify. Thanks for the question(s)! charlie > On Oct 7, 2016, at 12:09 PM, Vitaly Davidovich wrote: > > > > On Fri, Oct 7, 2016 at 1:00 PM, charlie hunt > wrote: > Glad to hear you?re not confused with the terminology. :-) > > On ReservePercent, my understanding is that the ReservePercent applies to the number of regions that will not used for young generation, eden regions or survivor regions. The intent is avoid to-space exhausted by ensuring a ?reserved percentage? of regions are available for evacuation. This implies that those reserved regions could be used for old regions or humongous regions. > Ok, so then the more explicit wording would be "The intent is to avoid to-space exhausted by ensuring a reserved percentage of regions are available for evacuation or humongous object allocation", right? Perhaps the "for evacuation" is throwing it off a bit for me, since the HO allocation isn't an "evacuation" obviously. > > Thanks Charlie > > P.S. I realize I'm hijacking Prasanna's thread quite a bit, but hopefully the discussed info is useful anyway. > > charlie > >> On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich > wrote: >> >> Hi Charlie, >> >> On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? 
is a survivor space. In G1, ?to-space? is any space that a G1 is evacuating objects too. So a ?to-space exhausted? means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. >> Right, I understand the distinction in terminology. >> >> What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. >> >> Thanks >> >> charlie >> >>> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > wrote: >>> >>> Hi Jenny, >>> >>> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments about your question: >>> >>> 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. >>> I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. >>> >>> So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? >>> >>> Thanks >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 17:27:13 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 13:27:13 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Message-ID: Hi Jenny, On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com wrote: > Hi, Vitaly, > > Here is what happens in jdk9(I think the logic is the same as in jdk8). > _reserve_regions = reserve percent*regions of the heap > when trying to decide regions for young gen, we look at the free regions > at the end of the collection, and try to honor the reserve_regions > if (available_free_regions > _reserve_regions) { > base_free_regions = available_free_regions - _reserve_regions; > } > > And there are other constrains to consider: user defined constrains and > pause time goal. 
> > This is what I meant by 'try to honor' the reserved. > If there is enough available_free_regions, it will reserve those regions. > Those regions can be used as old or young. > Ok, thanks. As you say, G1 *tries* to honor it, but may not. The docs I've come across online make it sound like this reservation is a guarantee, or at least they don't stipulate the reservation may not work. I don't know if it's worth clarifying that point or not, but my vote would be to make the docs err on the side of "more info" than less. The second part is what I mentioned to Charlie in my last reply - can humongous *allocations* be satisfied out of the reserve, or are the reserved regions only used to hold evacuees (when base_free_regions are not available). Thanks > > Jenny > > On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion ? one of the >> terms in G1 can be a little confusing with a term used in Parallel GC, >> Serial GC and CMS GC, and that is ?to-space?. In the latter case, >> ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is >> evacuating objects too. So a ?to-space exhausted? means that during an >> evacuation of live objects from a G1 region (which could be an eden region, >> survivor region or old region), and there is not an available region to >> evacuate those live objects, this constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous object is >> allocated, it is not evacuated. It stays in the same allocated region(s) >> until it is marked as being unreachable and can be reclaimed. >> > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating to > the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 > tries to honor ReservePercent". It wasn't clear to me whether that implies > humongous allocations can look for contiguous regions in the reserve, or > not. That's what I'm hoping to get clarification on since other sources > online don't mention G1ReservePercent playing a role for HO specifically. > > Thanks > >> >> charlie >> >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > > wrote: >> >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments about your >>> question: >>> >>> 1) Humongus allocation request for 72 mb failed, from the logs we >>> can also see we have free space of around 3 GB. Does this means , our >>> application is encountering high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 consecutive regions >>> for that 72 mb object. >>> >>> I agree the ReservePercent=40 is too high, but that should not prevent >>> allocating to the old gen. G1 tries to honor ReservePercent. >>> >> So just to clarify - is the space (i.e. regions) reserved by >> G1ReservePercent allocatable to humongous object allocations? All >> docs/webpages I found talk about this space being for holding survivors >> (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're >> saying these reserved regions should also be used to satisfy HO allocs? 
>> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasanna.gopal at blackrock.com Fri Oct 7 17:29:44 2016 From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK) Date: Fri, 7 Oct 2016 17:29:44 +0000 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Message-ID: <1102f59681054bbdba0107e076e98303@UKPMSEXD202N02.na.blkint.com> Hi All Thanks for all your reply. These discussions certainly help to get good insight ?. So just to summarize 1) G1ReservePercent will not affect Humongus allocation , so the full GC we are encountering is due to fragmentation 2) I will try chaging G1MixedGCLiveThresholdPercent to 85 to see the mixed GC?s can be increased. 3) Due to some other dependencies , we were unable to move to latest Jdk?s ( Jdk 8). Our application is currently running with CMS and we are seeing long GC pause , that why we wanted to explore G1.As we can?t move Jdk 8 soon , Is it good idea to migrate to G1 with Jdk 7 Thanks and regards Prasanna From: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Vitaly Davidovich Sent: 07 October 2016 18:27 To: yu.zhang at oracle.com Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1-GC - Full GC [humongous allocation request failed] Hi Jenny, On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com > wrote: Hi, Vitaly, Here is what happens in jdk9(I think the logic is the same as in jdk8). _reserve_regions = reserve percent*regions of the heap when trying to decide regions for young gen, we look at the free regions at the end of the collection, and try to honor the reserve_regions if (available_free_regions > _reserve_regions) { base_free_regions = available_free_regions - _reserve_regions; } And there are other constrains to consider: user defined constrains and pause time goal. This is what I meant by 'try to honor' the reserved. If there is enough available_free_regions, it will reserve those regions. Those regions can be used as old or young. Ok, thanks. As you say, G1 *tries* to honor it, but may not. The docs I've come across online make it sound like this reservation is a guarantee, or at least they don't stipulate the reservation may not work. I don't know if it's worth clarifying that point or not, but my vote would be to make the docs err on the side of "more info" than less. The second part is what I mentioned to Charlie in my last reply - can humongous *allocations* be satisfied out of the reserve, or are the reserved regions only used to hold evacuees (when base_free_regions are not available). Thanks Jenny On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: Hi Charlie, On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: Hi Vitaly, Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is evacuating objects too. So a ?to-space exhausted? 
means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. Right, I understand the distinction in terminology. What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. Thanks charlie On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > wrote: Hi Jenny, On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: Prasanna, In addition to what Vitaly said, I have some comments about your question: 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? Thanks _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. ? 2016 BlackRock, Inc. All rights reserved. 
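A minimal sketch of the region arithmetic behind the "base_free_regions = available_free_regions - _reserve_regions" snippet Jenny quoted earlier (hypothetical Java, not HotSpot source; the 5 GB heap and 2 MB region size come from the logs at the start of the thread, the free-region count is made up):

    // Hypothetical illustration of how G1ReservePercent only caps young gen sizing.
    public class ReserveSketch {
        public static void main(String[] args) {
            long heapRegions = (5L * 1024) / 2;       // 5120 MB heap, 2 MB regions -> 2560 regions
            int reservePercent = 40;                  // -XX:G1ReservePercent=40 from the posted flags
            long reserveRegions = heapRegions * reservePercent / 100;   // 1024 regions held back
            long availableFreeRegions = 1500;         // made-up "free regions at the end of a collection"
            long baseFreeRegions = availableFreeRegions > reserveRegions
                    ? availableFreeRegions - reserveRegions
                    : 0;
            // Young gen is then sized within baseFreeRegions, further limited by the pause-time
            // goal and G1NewSizePercent/G1MaxNewSizePercent. Per Jenny's reply, the reserved
            // regions stay ordinary free regions that can end up young or old; whether they can
            // satisfy humongous allocations is what the rest of the thread tries to pin down.
            System.out.println("reserve=" + reserveRegions + " regions, young-eligible=" + baseFreeRegions);
        }
    }

With ReservePercent=40 the reserve is 1024 of the 2560 regions, so young gen sizing is constrained well before the heap is actually full.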
-------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 17:44:29 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 13:44:29 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> Message-ID: On Friday, October 7, 2016, charlie hunt wrote: > I think others are benefiting from your question(s) ? and it?s helping > refresh my memory of things too. ;-) > > Actually, I just looked at what we documented in Java Performance > Companion for G1ReservePercent, this wording may imply a very slightly > subtle different definition, ?To reduce the risk of getting a promotion > failure, G1 reserves some memory for promotions. This memory will not be > used for the young generation.? > > Perhaps one of the G1 engineers can clarify this? > Yeah, would be good to clarify. The above wording in the performance companion is at odds of other definitions. If the regions can be used to hold Eden survivors or other survivors from existing survivor regions, then it's not really accurate to say it's not used for the young generation. I'm guessing what's meant is it won't be used for Eden regions, i.e. to hold normal non HO allocations. > > Based on what we documented for G1ReservePercent, it implies that regions > are reserved for promotions, which implies old generation regions. Note > that on a young GC, some objects will be evacuated to survivor regions, and > if G1 decides to grow the number of eden regions, then both those evacuated > ?to survivor regions? and ?additional eden regions? will not come from that > G1ReservePercent. And, since humongous objects are allocated from old > regions, it is not clear to me that G1ReservePercent regions could be > allocated into as humongous objects if the intent for G1ReservePercent is > for promotions. Humongous objects are not promoted. They are allocated > directly into humongous regions which get allocated from old generation. > It seems they're reserved for evacuees, not promotions, which can come from any region modulo humongous (since those aren't copied). > > Again, hopefully one of the G1 engineers can jump in and clarify. > Yes, please. > > Thanks for the question(s)! > > charlie > > On Oct 7, 2016, at 12:09 PM, Vitaly Davidovich > wrote: > > > > On Fri, Oct 7, 2016 at 1:00 PM, charlie hunt > wrote: > >> Glad to hear you?re not confused with the terminology. :-) >> >> On ReservePercent, my understanding is that the ReservePercent applies to >> the number of regions that will not used for young generation, eden regions >> or survivor regions. The intent is avoid to-space exhausted by ensuring a >> ?reserved percentage? of regions are available for evacuation. This implies >> that those reserved regions could be used for old regions or humongous >> regions. >> > Ok, so then the more explicit wording would be "The intent is to avoid > to-space exhausted by ensuring a reserved percentage of regions are > available for evacuation or humongous object allocation", right? Perhaps > the "for evacuation" is throwing it off a bit for me, since the HO > allocation isn't an "evacuation" obviously. > > Thanks Charlie > > P.S. I realize I'm hijacking Prasanna's thread quite a bit, but hopefully > the discussed info is useful anyway. 
> >> >> charlie >> >> On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich > > wrote: >> >> Hi Charlie, >> >> On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > > wrote: >> >>> Hi Vitaly, >>> >>> Just to clarify things in case there might be some confusion ? one of >>> the terms in G1 can be a little confusing with a term used in Parallel GC, >>> Serial GC and CMS GC, and that is ?to-space?. In the latter case, >>> ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is >>> evacuating objects too. So a ?to-space exhausted? means that during an >>> evacuation of live objects from a G1 region (which could be an eden region, >>> survivor region or old region), and there is not an available region to >>> evacuate those live objects, this constitutes a ?to-space failure?. >>> >>> I may be wrong, but my understanding is that once a humongous object is >>> allocated, it is not evacuated. It stays in the same allocated region(s) >>> until it is marked as being unreachable and can be reclaimed. >>> >> Right, I understand the distinction in terminology. >> >> What I'm a bit confused by is when Jenny said "I agree the >> ReservePercent=40 is too high, but that should not prevent allocating to >> the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 >> tries to honor ReservePercent". It wasn't clear to me whether that implies >> humongous allocations can look for contiguous regions in the reserve, or >> not. That's what I'm hoping to get clarification on since other sources >> online don't mention G1ReservePercent playing a role for HO specifically. >> >> Thanks >> >>> >>> charlie >>> >>> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich >> > wrote: >>> >>> Hi Jenny, >>> >>> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com >>> < >>> yu.zhang at oracle.com >>> > wrote: >>> >>>> Prasanna, >>>> >>>> In addition to what Vitaly said, I have some comments about your >>>> question: >>>> >>>> 1) Humongus allocation request for 72 mb failed, from the logs we >>>> can also see we have free space of around 3 GB. Does this means , our >>>> application is encountering high amount of fragmentation ?. >>>> >>>> It is possible. What it means is g1 can not find 36 consecutive regions >>>> for that 72 mb object. >>>> >>>> I agree the ReservePercent=40 is too high, but that should not prevent >>>> allocating to the old gen. G1 tries to honor ReservePercent. >>>> >>> So just to clarify - is the space (i.e. regions) reserved by >>> G1ReservePercent allocatable to humongous object allocations? All >>> docs/webpages I found talk about this space being for holding survivors >>> (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're >>> saying these reserved regions should also be used to satisfy HO allocs? >>> >>> Thanks >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >> >> > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yu.zhang at oracle.com Fri Oct 7 18:21:00 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Fri, 7 Oct 2016 11:21:00 -0700 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Message-ID: <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> Vitaly, I am cc this to the dev list. My comments in line. On 10/07/2016 10:27 AM, Vitaly Davidovich wrote: > Hi Jenny, > > On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com > > wrote: > > Hi, Vitaly, > > Here is what happens in jdk9(I think the logic is the same as in > jdk8). > > _reserve_regions = reserve percent*regions of the heap > when trying to decide regions for young gen, we look at the free > regions at the end of the collection, and try to honor the > reserve_regions > if (available_free_regions > _reserve_regions) { > base_free_regions = available_free_regions - _reserve_regions; > } > > And there are other constrains to consider: user defined > constrains and pause time goal. > > This is what I meant by 'try to honor' the reserved. > If there is enough available_free_regions, it will reserve those > regions. Those regions can be used as old or young. > > Ok, thanks. As you say, G1 *tries* to honor it, but may not. The > docs I've come across online make it sound like this reservation is a > guarantee, or at least they don't stipulate the reservation may not > work. I don't know if it's worth clarifying that point or not, but my > vote would be to make the docs err on the side of "more info" than less. Agree. > > The second part is what I mentioned to Charlie in my last reply - can > humongous *allocations* be satisfied out of the reserve, or are the > reserved regions only used to hold evacuees (when base_free_regions > are not available). That is a good question. Here is my understanding, which need to be confirmed by G1 developer. In this code HeapWord* G1CollectedHeap::humongous_obj_allocate(size_t word_size, AllocationContext_t context) G1 tries to find regions from _free_list that can hold the humongous objects. The reserved regions are also on the _free_list (again need to be confirmed by developer). So my understanding is those reserved regions can be used as humongous allocation. But I might be missing something. > > Thanks > > > Jenny > > On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: >> Hi Charlie, >> >> On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt >> > wrote: >> >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion >> ? one of the terms in G1 can be a little confusing with a >> term used in Parallel GC, Serial GC and CMS GC, and that is >> ?to-space?. In the latter case, ?to-space? is a survivor >> space. In G1, ?to-space? is any space that a G1 is evacuating >> objects too. So a ?to-space exhausted? means that during an >> evacuation of live objects from a G1 region (which could be >> an eden region, survivor region or old region), and there is >> not an available region to evacuate those live objects, this >> constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous >> object is allocated, it is not evacuated. It stays in the >> same allocated region(s) until it is marked as being >> unreachable and can be reclaimed. >> >> Right, I understand the distinction in terminology. 
>> >> What I'm a bit confused by is when Jenny said "I agree the >> ReservePercent=40 is too high, but that should not prevent >> allocating to the old gen. G1 tries to honor ReservePercent." >> Specifically, the "G1 tries to honor ReservePercent". It wasn't >> clear to me whether that implies humongous allocations can look >> for contiguous regions in the reserve, or not. That's what I'm >> hoping to get clarification on since other sources online don't >> mention G1ReservePercent playing a role for HO specifically. >> >> Thanks >> >> >> charlie >> >>> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich >>> > wrote: >>> >>> Hi Jenny, >>> >>> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com >>> >> > wrote: >>> >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments >>> about your question: >>> >>> 1) Humongus allocation request for 72 mb failed, from >>> the logs we can also see we have free space of around 3 >>> GB. Does this means , our application is encountering >>> high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 >>> consecutive regions for that 72 mb object. >>> >>> I agree the ReservePercent=40 is too high, but that >>> should not prevent allocating to the old gen. G1 tries >>> to honor ReservePercent. >>> >>> So just to clarify - is the space (i.e. regions) reserved by >>> G1ReservePercent allocatable to humongous object >>> allocations? All docs/webpages I found talk about this space >>> being for holding survivors (i.e. evac failure/to-space >>> exhaustion mitigation). It sounds like you're saying these >>> reserved regions should also be used to satisfy HO allocs? >>> >>> Thanks >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Sat Oct 8 18:30:53 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Sat, 8 Oct 2016 14:30:53 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> Message-ID: On Friday, October 7, 2016, Gopal, Prasanna CWK < prasanna.gopal at blackrock.com > wrote: > Hi All > > > > > > Thanks for all your reply. These discussions certainly help to get good > insight J. > > > > So just to summarize > > > > 1) G1ReservePercent will not affect Humongus allocation , so the full > GC we are encountering is due to fragmentation > > It may or may not - let's see what a G1 dev says (as Jenny mentioned). Either way, you don't have enough contiguous regions to satisfy the allocation, so it's fragmentation one way or the other. You should drop G1ReservePercent for now unless you have good reason for setting it to 40 (that's a large value, btw). > 2) I will try chaging G1MixedGCLiveThresholdPercent to 85 to see the > mixed GC?s can be increased. > Yes, that's a good idea. Do you see any mixed GCs at all now? If so, how long are the concurrent marking phases taking (look for concurrent-mark-end in the gc log). > 3) Due to some other dependencies , we were unable to move to latest > Jdk?s ( Jdk 8). 
Our application is currently running with CMS and we are > seeing long GC pause , that why we wanted to explore G1.As we can?t move > Jdk 8 soon , Is it good idea to migrate to G1 with Jdk 7 > Have you tried just setting Xmx and a reasonable pause time goal? As a rule of thumb, setting Xmx to 3x your live set works well (the more headroom you give to G1, the better). Giving it a reasonable pause time goal allows it to adjust young gen dynamically and possibly raising it high enough such that there's either no promotion or very little - young gen collection efficiency is a function of how many survivors you have when the collection kicks in, so the fewer survivors the better (that applies to all generational copying collectors, not just G1 of course). > > > Thanks and regards > > Prasanna > > > *From:* hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] *On > Behalf Of *Vitaly Davidovich > *Sent:* 07 October 2016 18:27 > *To:* yu.zhang at oracle.com > *Cc:* hotspot-gc-use at openjdk.java.net > *Subject:* Re: G1-GC - Full GC [humongous allocation request failed] > > > > Hi Jenny, > > > > On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com > wrote: > > Hi, Vitaly, > > Here is what happens in jdk9(I think the logic is the same as in jdk8). > > _reserve_regions = reserve percent*regions of the heap > when trying to decide regions for young gen, we look at the free regions > at the end of the collection, and try to honor the reserve_regions > if (available_free_regions > _reserve_regions) { > base_free_regions = available_free_regions - _reserve_regions; > } > > And there are other constrains to consider: user defined constrains and > pause time goal. > > This is what I meant by 'try to honor' the reserved. > If there is enough available_free_regions, it will reserve those regions. > Those regions can be used as old or young. > > Ok, thanks. As you say, G1 *tries* to honor it, but may not. The docs > I've come across online make it sound like this reservation is a guarantee, > or at least they don't stipulate the reservation may not work. I don't > know if it's worth clarifying that point or not, but my vote would be to > make the docs err on the side of "more info" than less. > > > > The second part is what I mentioned to Charlie in my last reply - can > humongous *allocations* be satisfied out of the reserve, or are the > reserved regions only used to hold evacuees (when base_free_regions are not > available). > > > > Thanks > > > Jenny > > > > On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > > Hi Vitaly, > > > > Just to clarify things in case there might be some confusion ? one of the > terms in G1 can be a little confusing with a term used in Parallel GC, > Serial GC and CMS GC, and that is ?to-space?. In the latter case, > ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is > evacuating objects too. So a ?to-space exhausted? means that during an > evacuation of live objects from a G1 region (which could be an eden region, > survivor region or old region), and there is not an available region to > evacuate those live objects, this constitutes a ?to-space failure?. > > > > I may be wrong, but my understanding is that once a humongous object is > allocated, it is not evacuated. It stays in the same allocated region(s) > until it is marked as being unreachable and can be reclaimed. > > Right, I understand the distinction in terminology. 
> > > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating to > the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 > tries to honor ReservePercent". It wasn't clear to me whether that implies > humongous allocations can look for contiguous regions in the reserve, or > not. That's what I'm hoping to get clarification on since other sources > online don't mention G1ReservePercent playing a role for HO specifically. > > > > Thanks > > > > charlie > > > > On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: > > > > Hi Jenny, > > > > On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: > > Prasanna, > > In addition to what Vitaly said, I have some comments about your question: > > 1) Humongus allocation request for 72 mb failed, from the logs we can > also see we have free space of around 3 GB. Does this means , our > application is encountering high amount of fragmentation ?. > > It is possible. What it means is g1 can not find 36 consecutive regions > for that 72 mb object. > > I agree the ReservePercent=40 is too high, but that should not prevent > allocating to the old gen. G1 tries to honor ReservePercent. > > So just to clarify - is the space (i.e. regions) reserved by > G1ReservePercent allocatable to humongous object allocations? All > docs/webpages I found talk about this space being for holding survivors > (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're > saying these reserved regions should also be used to satisfy HO allocs? > > > > Thanks > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > > > > > > > > > This message may contain information that is confidential or privileged. > If you are not the intended recipient, please advise the sender immediately > and delete this message. See http://www.blackrock.com/corpo > rate/en-us/compliance/email-disclaimers for further information. Please > refer to http://www.blackrock.com/corporate/en-us/compliance/privacy- > policy for more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) > Limited are authorised and regulated by the Financial Conduct Authority. > Registered in England No. 796793 and No. 2020394 respectively. BlackRock > Life Limited is authorised by the Prudential Regulation Authority and > regulated by the Financial Conduct Authority and the Prudential Regulation > Authority. Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is > authorised and regulated by the Financial Conduct Authority and is a > registered investment adviser with the Securities and Exchange Commission > (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange > Place One, 1 Semple Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. > > ? 2016 BlackRock, Inc. All rights reserved. > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
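Vitaly's fragmentation point can be made concrete with a little arithmetic. The sketch below is hypothetical Java, not HotSpot code; the 72 MB request and the 2048K region size come from the posted logs, and the free-region pattern is invented to show how roughly 3 GB of free space can still contain no run of 36 consecutive free regions:

    // Hypothetical sketch: contiguous-region requirement for a humongous allocation.
    public class HumongousSketch {
        static final long REGION = 2L * 1024 * 1024;   // "region size 2048K" from the logs

        public static void main(String[] args) {
            long objectSize = 72L * 1024 * 1024;       // the failing 72 MB allocation request
            boolean humongous = objectSize > REGION / 2;              // larger than half a region
            long regionsNeeded = (objectSize + REGION - 1) / REGION;  // 36 contiguous regions

            boolean[] free = new boolean[2560];        // 2560 regions in the 5 GB heap
            for (int i = 0; i < free.length; i++) {
                free[i] = (i % 3 != 0);                // two thirds free, but never more than 2 in a row
            }

            int longestRun = 0, run = 0, totalFree = 0;
            for (boolean f : free) {
                run = f ? run + 1 : 0;
                longestRun = Math.max(longestRun, run);
                if (f) totalFree++;
            }
            System.out.println("humongous=" + humongous + ", need " + regionsNeeded
                    + " contiguous regions; free=" + (totalFree * REGION / (1024 * 1024))
                    + " MB in " + totalFree + " regions, longest free run=" + longestRun);
            // ~3.3 GB is free, yet no run of 36 consecutive regions exists, so the humongous
            // allocation fails and (on 7u40) falls back to a full GC to compact the old generation.
        }
    }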
URL: 

From thomas.schatzl at oracle.com  Mon Oct 10 08:12:51 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 10 Oct 2016 10:12:51 +0200
Subject: G1-GC - Full GC [humongous allocation request failed]
In-Reply-To: 
References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com>
Message-ID: <1476087171.2652.37.camel@oracle.com>

Hi all,

On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote:
> On Friday, October 7, 2016, charlie hunt wrote:
> > I think others are benefiting from your question(s) - and it's
> > helping refresh my memory of things too. ;-)
> >
> > Actually, I just looked at what we documented in Java Performance
> > Companion for G1ReservePercent, this wording may imply a very
> > slightly subtle different definition, "To reduce the risk of
> > getting a promotion failure, G1 reserves some memory for
> > promotions. This memory will not be used for the young
> > generation."
> >
> > Perhaps one of the G1 engineers can clarify this?

The area covered by G1ReservePercent is regular space available for any allocation, whether young or old or humongous.

The only difference is that while the heap occupancy is beyond the reserve percent threshold, young gen will be minimal (like bounded by G1NewSizePercent). I.e. G1 will run in some kind of "degraded throughput" mode. "Degraded" as in young gen size is typically somehow correlated with allocation throughput, so if you bound young gen size, you also bound throughput.

The thinking for the reserve is to cover for extraneous large allocations (either humongous or just a case where due to application behavior changes lots of young gen objects survive) while G1 is getting liveness information for the reclamation phase (i.e. mixed gc phase). The collector just can't know what is the "maximum" promotion or humongous object allocation rate as it is heavily application dependent.

Just assuming the worst case, i.e. G1ReservePercent equals young gen, would be way too wasteful, and at odds with other settings actually - G1 can and will expand young gen to up to 70% if possible. Further, such a heuristic would not capture humongous allocation by the application anyway.

Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned so that reclamation starts when occupancy reaches the G1ReservePercent threshold. I.e., some ASCII art:

   +--------------------+  <-- heap full
^  |                    |
|  | 1)G1ReservePercent |
|  |                    |
   +--------------------+  <-- first mixed gc
H  |                    |
e  | 2)Allocation into  |
a  | old gen during     |
p  | marking            |
   |                    |
o  +--------------------+ <-- InitiatingHeapOccupancyPercent
c  |                    |
c  . 3)"Unconstrained"  .
u  . young gen sizing   .
p  . operation          .
a  .                    .
n  .                    .
c  .                    .
y  .                    .
   +--------------------+  <-- heap empty

(I am probably forgetting one or the other edge case here, but that's the general idea; also please consider that for G1, except for humongous allocations, the heap does not need to )

So when current young gen size + old gen occupancy is somewhere in areas 2)/3), G1 will expand young gen as it sees fit to meet pause time, i.e. is "unconstrained".
If young gen size + old gen occupancy starts eating into area 1), G1 minimizes young gen to try to keep as much memory left for these "extraneous allocations" that G1ReservePercent indicates, in the hope that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the user gave some sane settings according to (roughly) this model. With jdk9 onwards, the IHOP is determined automatically according to this model and so far seems to work quite nicely - at least it will typically give you a decent starting point for setting it on your own. As for the default value of G1ReservePercent (=10), well, consider it some default for the "typical" application, trying to strike some balance between throughput and safety to prevent running out of memory. For very large heaps, it might typically be set a bit too large as the young gen will most of the times be smaller than 10% of the heap due to pause time constraints (or e.g. G1MaxNewSizePercent) and application specific boundaries like "useful" allocation rates. Setting it to 40% seems a bit too cautious, but may be warranted in some cases. Before JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. For very small heaps?G1ReservePercent?may be too small. (jdk9 specific tip: you can use?G1ReservePercent?to set a maximum IHOP value). Thanks, ? Thomas From thomas.schatzl at oracle.com Mon Oct 10 08:32:39 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Oct 2016 10:32:39 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> Message-ID: <1476088359.2652.48.camel@oracle.com> Hi, On Sat, 2016-10-08 at 14:30 -0400, Vitaly Davidovich wrote: > > > On Friday, October 7, 2016, Gopal, Prasanna CWK rock.com> wrote: > > Hi All > > ? > > ? > > Thanks for all your reply. These discussions certainly help to get > > good insight J. > > ? > > So just to summarize > > ? > > 1)???? G1ReservePercent will not affect Humongus allocation , so > > the full GC we are encountering is due to fragmentation > > > It may or may not - let's see what a G1 dev says (as Jenny > mentioned).? Either way, you don't have enough contiguous regions to > satisfy the allocation, so it's fragmentation one way or the other. > > You should drop G1ReservePercent for now unless you have good reason > for setting it to 40 (that's a large value, btw).? See the other email in this thread. > > 2)???? I will try chaging G1MixedGCLiveThresholdPercent to 85 to > > see the mixed GC?s can be increased. > > > Yes, that's a good idea.? Do you see any mixed GCs at all now? If so, > how long are the concurrent marking phases taking (look for > concurrent-mark-end in the gc log). Agree. Making G1 more aggressive with reclaiming old gen regions may help. With 8u40+, some simple, quite effective heuristics were added that in many cases decrease fragmentation a lot. With 7, you can only either give G1 more memory, try to make it more aggressively reclaim regions, or minimize old gen allocations that cause fragmentation. > > 3)???? Due to some other dependencies , we were unable to move to > > latest Jdk?s ( Jdk 8).? Our application is currently running with > > CMS and we are seeing long GC pause , that why we wanted to explore > > G1.As we can?t move Jdk 8 soon , Is it good idea to migrate to G1 > > with Jdk 7?? 
Please at least move to latest 7u (I remember you mentioning 7u40). There were a few very useful patches for G1 in 7u60 iirc. > Have you tried just setting Xmx and a reasonable pause time goal? > As a rule of thumb, setting Xmx to 3x your live set works? > well(the more headroom you give to G1, the better).? Giving it? > a reasonable pause time goal allows it to adjust young gen? > dynamically and possibly raising it high enough such that there's? > either no promotion or very little - young gen collection efficiency? > is a function of how many survivors you have when the collection? > kicks in, so the fewer survivors the better (that applies? > to all generational?copying collectors, not just G1 of course).? Thanks, ? Thomas From thomas.schatzl at oracle.com Mon Oct 10 08:38:42 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Oct 2016 10:38:42 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476087171.2652.37.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: <1476088722.2652.50.camel@oracle.com> On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > Hi all, > [...] > > (I am probably forgetting one or the other edge case here, but that's > the general idea; also please consider that for G1, except for > humongous allocations, the heap does not need to ) ... the actually occupied heap area does not need to be contiguous. It's just easier to draw as such :) Thanks, ? Thomas From vitalyd at gmail.com Mon Oct 10 10:42:20 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 06:42:20 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476087171.2652.37.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: Hi Thomas, Thanks for the clarification and insights. A few comments below ... On Monday, October 10, 2016, Thomas Schatzl wrote: > Hi all, > > On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > > > On Friday, October 7, 2016, charlie hunt > > > wrote: > > > I think others are benefiting from your question(s) ? and it?s > > > helping refresh my memory of things too. ;-) > > > > > > Actually, I just looked at what we documented in Java Performance > > > Companion for G1ReservePercent, this wording may imply a very > > > slightly subtle different definition, ?To reduce the risk of > > > getting a promotion failure, G1 reserves some memory for > > > promotions. This memory will not be used for the young > > > generation.? > > > > > > Perhaps one of the G1 engineers can clarify this? > the area covered by G1ReservePercent is regular space available for > any allocation, whether young or old or humongous. > > The only difference is that while the heap occupancy is beyond the > reserve percent threshold, young gen will be minimal (like bounded by > G1NewSizePercent). I.e. G1 will run in some kind of "degraded > throughput" mode. "Degraded" as in young gen size is typically somehow > correlated with allocation throughput, so if you bound young gen size, > you also bound throughput. Ok, so that's a quite different definition of the reserve than pretty much all sources that I've seen :). 
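Plugging the numbers from this thread into the model Thomas describes above gives a feel for why G1ReservePercent=40 is so cautious. This is a hypothetical illustration only; the heap size and reserve come from the posted flags, and 45 is the JDK 7/8 default for InitiatingHeapOccupancyPercent since no explicit value was set:

    // Hypothetical sketch of the occupancy thresholds in Thomas's diagram.
    public class ReserveModelSketch {
        public static void main(String[] args) {
            long heapMb = 5120;          // -Xmx5120M
            int reservePercent = 40;     // -XX:G1ReservePercent=40
            int ihopPercent = 45;        // InitiatingHeapOccupancyPercent default in JDK 7/8

            long reserveAreaStartsMb = heapMb * (100 - reservePercent) / 100;  // area 1) begins here
            long markingStartsMb = heapMb * ihopPercent / 100;                 // concurrent marking trigger

            System.out.println("marking starts around " + markingStartsMb + " MB; "
                    + "young gen is minimized once young + old occupancy passes "
                    + reserveAreaStartsMb + " MB of the " + heapMb + " MB heap");
            // With ReservePercent=40, area 1) already starts at 3072 MB, i.e. the collector keeps
            // 2 GB of headroom for "extraneous" allocations - hence the advice to lower it and,
            // pre-JDK 9, to tune InitiatingHeapOccupancyPercent so mixed GCs start before that point.
        }
    }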
Your explanation makes it sound like a "yellow zone" for G1, or a throttle/watermark for the young gen sizing. > > The thinking for the reserve is to cover for extraneous large > allocations (either humongous or just a case where due to application > behavior changes lots of young gen objects survive) while G1 is getting > liveness information for the reclamation phase (i.e. mixed gc phase). > The collector just can't know what is the "maximum" promotion or > humongous object allocation rate as it is heavily application > dependent. > Just assuming the worst case, i.e. G1ReservePercent equals young gen, > would be way too wasteful, and at odds with other settings actually - > G1 can and will expand young gen to up to 70% if possible. Further, > such a heuristic would not capture humongous allocation by the > application anyway. > > Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned > so that reclamation starts when occupancy reaches the G1ReservePercent > threshold. I.e., some ASCII art: > > +--------------------+ <-- heap full > ^ | | > | | 1)G1ReservePercent | > | | | > +--------------------+ <-- first mixed gc > H | | > e | 2)Allocation into | > a | old gen during | > p | marking | > | | > o +--------------------+ <-- InitiatingHeapOccupancyPercent > c | | > c . 3)"Unconstrained" . > u . young gen sizing . > p . operation . > a . . > n . . > c . . > y . . > +--------------------+ <-- heap empty > > (I am probably forgetting one or the other edge case here, but that's > the general idea; also please consider that for G1, except for > humongous allocations, the heap does not need to ) > > So when current young gen size + old gen occupancy is somewhere in > areas 2)/3), G1 will expand young gen as it sees fit to meet pause > time, i.e. is "unconstrained". > > If young gen size + old gen occupancy starts eating into area 1), G1 > minimizes young gen to try to keep as much memory left for these > "extraneous allocations" that G1ReservePercent indicates, in the hope > that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the > user gave some sane settings according to (roughly) this model. > With jdk9 onwards, the IHOP is determined automatically according to > this model and so far seems to work quite nicely - at least it will > typically give you a decent starting point for setting it on your own. Ok, so the reserve acts like a high watermark in 9, used to adjust IHOP dynamically. It sounds like it's an IHOP++ setting :). I'm also not sure winding the young gen down helps in cases where old gen occupancy is growing. Intuitively, that ought to make things worse actually. Young evacs will occur more frequently, with higher likelihood that more objects are still live, and need to be kept alive, possibly causing further promotion. One way that it helps is there's more frequent feedback to G1 about heap occupancy (since young evacs occur more frequently), and so it may notice that things aren't looking so peachy earlier. Is that the idea? > As for the default value of G1ReservePercent (=10), well, consider it > some default for the "typical" application, trying to strike some > balance between throughput and safety to prevent running out of memory. > > For very large heaps, it might typically be set a bit too large as the > young gen will most of the times be smaller than 10% of the heap due to > pause time constraints (or e.g. G1MaxNewSizePercent) and application > specific boundaries like "useful" allocation rates. 
Setting it to 40% > seems a bit too cautious, but may be warranted in some cases. Before > JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. > > For very small heaps G1ReservePercent may be too small. > > (jdk9 specific tip: you can use G1ReservePercent to set a maximum IHOP > value). > > Thanks, > Thomas > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Mon Oct 10 10:45:35 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 06:45:35 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476088722.2652.50.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> <1476088722.2652.50.camel@oracle.com> Message-ID: On Monday, October 10, 2016, Thomas Schatzl wrote: > > On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > > Hi all, > > > [...] > > > > (I am probably forgetting one or the other edge case here, but that's > > the general idea; also please consider that for G1, except for > > humongous allocations, the heap does not need to ) > > ... the actually occupied heap area does not need to be contiguous. > It's just easier to draw as such :) Are the reserved regions contiguous or no? Thanks > > Thanks, > Thomas > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Mon Oct 10 11:07:27 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Oct 2016 13:07:27 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> <1476088722.2652.50.camel@oracle.com> Message-ID: <1476097647.2652.63.camel@oracle.com> Hi, On Mon, 2016-10-10 at 06:45 -0400, Vitaly Davidovich wrote: > > > On Monday, October 10, 2016, Thomas Schatzl m> wrote: > > > > On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > > > Hi all, > > > > > [...] > > > > > > (I am probably forgetting one or the other edge case here, but > > that's > > > the general idea; also please consider that for G1, except for > > > humongous allocations, the heap does not need to ) > > > > ... the actually occupied heap area does not need to be contiguous. > > It's just easier to draw as such :) > Are the reserved regions contiguous or no? ? no. You can't guarantee that. Consider long-living allocations into this area. At the moment G1 never moves humongous objects. 
Thomas From vitalyd at gmail.com Mon Oct 10 11:17:48 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 07:17:48 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476097647.2652.63.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> <1476088722.2652.50.camel@oracle.com> <1476097647.2652.63.camel@oracle.com> Message-ID: On Monday, October 10, 2016, Thomas Schatzl wrote: > Hi, > > On Mon, 2016-10-10 at 06:45 -0400, Vitaly Davidovich wrote: > > > > > > On Monday, October 10, 2016, Thomas Schatzl > > m> wrote: > > > > > > On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > > > > Hi all, > > > > > > > [...] > > > > > > > > (I am probably forgetting one or the other edge case here, but > > > that's > > > > the general idea; also please consider that for G1, except for > > > > humongous allocations, the heap does not need to ) > > > > > > ... the actually occupied heap area does not need to be contiguous. > > > It's just easier to draw as such :) > > Are the reserved regions contiguous or no? > > no. You can't guarantee that. Consider long-living allocations into > this area. At the moment G1 never moves humongous objects. Yeah, I didn't expect them to be but wanted to clarify/confirm that. This makes them less useful for covering humongous allocations, but I understand the constraints. Also, by considering the reserve as a watermark value, rather than space you really expect to use, I think that's fine. Thanks > > Thomas > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasanna.gopal at blackrock.com Mon Oct 10 14:01:22 2016 From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK) Date: Mon, 10 Oct 2016 14:01:22 +0000 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: Hi Thomas Thanks for this wonderful explanation for G1ReservePercent parameter. As noted by Vitaly , it is activing as watermark for young generation heap size. So in our case where G1ReservePercent=40, We are effectively asking G1 not to resize (increase) young generation , when we reach 60% of heap occupancy. @All -thanks for your comments. I will adjust the parameters we discussed and publish the outcome. I am going to tune with the following parameters 1) G1ReservePercent -To reduce this value to a reasonable value. As a result , we are allowing G1 to resize young generation size more which can reduce the object promotion rate. 2) G1MixedGCLiveThresholdPercent ? To increase this percent to 85. This will G1 more aggressive by having more mixed GC?s Could someone please explain me how increasing it from 65 ( which is default in JDK 7) to 85 makes G1 to collect more old regions. I would have thought keeping it 65 means , we asking G1 to consider regions above 65% of occupancy which will include regions with 85% as well. Am I missing some thing here ? 3) To override MaxGCPauseMillis to a higher value , to make G1 less aggressive about GC pause time. 4) To move to latest version of JDK , as suggested by everyone. Thanks again for your comments. Really appreciate it. 
Thanks and Regards Prasanna From: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Vitaly Davidovich Sent: 10 October 2016 11:42 To: Thomas Schatzl Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1-GC - Full GC [humongous allocation request failed] Hi Thomas, Thanks for the clarification and insights. A few comments below ... On Monday, October 10, 2016, Thomas Schatzl > wrote: Hi all, On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > On Friday, October 7, 2016, charlie hunt > > wrote: > > I think others are benefiting from your question(s) ? and it?s > > helping refresh my memory of things too. ;-) > > > > Actually, I just looked at what we documented in Java Performance > > Companion for G1ReservePercent, this wording may imply a very > > slightly subtle different definition, ?To reduce the risk of > > getting a promotion failure, G1 reserves some memory for > > promotions. This memory will not be used for the young > > generation.? > > > > Perhaps one of the G1 engineers can clarify this? the area covered by G1ReservePercent is regular space available for any allocation, whether young or old or humongous. The only difference is that while the heap occupancy is beyond the reserve percent threshold, young gen will be minimal (like bounded by G1NewSizePercent). I.e. G1 will run in some kind of "degraded throughput" mode. "Degraded" as in young gen size is typically somehow correlated with allocation throughput, so if you bound young gen size, you also bound throughput. Ok, so that's a quite different definition of the reserve than pretty much all sources that I've seen :). Your explanation makes it sound like a "yellow zone" for G1, or a throttle/watermark for the young gen sizing. The thinking for the reserve is to cover for extraneous large allocations (either humongous or just a case where due to application behavior changes lots of young gen objects survive) while G1 is getting liveness information for the reclamation phase (i.e. mixed gc phase). The collector just can't know what is the "maximum" promotion or humongous object allocation rate as it is heavily application dependent. Just assuming the worst case, i.e. G1ReservePercent equals young gen, would be way too wasteful, and at odds with other settings actually - G1 can and will expand young gen to up to 70% if possible. Further, such a heuristic would not capture humongous allocation by the application anyway. Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned so that reclamation starts when occupancy reaches the G1ReservePercent threshold. I.e., some ASCII art: +--------------------+ <-- heap full ^ | | | | 1)G1ReservePercent | | | | +--------------------+ <-- first mixed gc H | | e | 2)Allocation into | a | old gen during | p | marking | | | o +--------------------+ <-- InitiatingHeapOccupancyPercent c | | c . 3)"Unconstrained" . u . young gen sizing . p . operation . a . . n . . c . . y . . +--------------------+ <-- heap empty (I am probably forgetting one or the other edge case here, but that's the general idea; also please consider that for G1, except for humongous allocations, the heap does not need to ) So when current young gen size + old gen occupancy is somewhere in areas 2)/3), G1 will expand young gen as it sees fit to meet pause time, i.e. is "unconstrained". 
If young gen size + old gen occupancy starts eating into area 1), G1 minimizes young gen to try to keep as much memory left for these "extraneous allocations" that G1ReservePercent indicates, in the hope that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the user gave some sane settings according to (roughly) this model. With jdk9 onwards, the IHOP is determined automatically according to this model and so far seems to work quite nicely - at least it will typically give you a decent starting point for setting it on your own. Ok, so the reserve acts like a high watermark in 9, used to adjust IHOP dynamically. It sounds like it's an IHOP++ setting :). I'm also not sure winding the young gen down helps in cases where old gen occupancy is growing. Intuitively, that ought to make things worse actually. Young evacs will occur more frequently, with higher likelihood that more objects are still live, and need to be kept alive, possibly causing further promotion. One way that it helps is there's more frequent feedback to G1 about heap occupancy (since young evacs occur more frequently), and so it may notice that things aren't looking so peachy earlier. Is that the idea? As for the default value of G1ReservePercent (=10), well, consider it some default for the "typical" application, trying to strike some balance between throughput and safety to prevent running out of memory. For very large heaps, it might typically be set a bit too large as the young gen will most of the times be smaller than 10% of the heap due to pause time constraints (or e.g. G1MaxNewSizePercent) and application specific boundaries like "useful" allocation rates. Setting it to 40% seems a bit too cautious, but may be warranted in some cases. Before JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. For very small heaps G1ReservePercent may be too small. (jdk9 specific tip: you can use G1ReservePercent to set a maximum IHOP value). Thanks, Thomas -- Sent from my phone This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. ? 2016 BlackRock, Inc. All rights reserved. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vitalyd at gmail.com Mon Oct 10 22:28:07 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 18:28:07 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: On Monday, October 10, 2016, Gopal, Prasanna CWK < prasanna.gopal at blackrock.com> wrote: > Hi Thomas > > > > Thanks for this wonderful explanation for G1ReservePercent parameter. As > noted by Vitaly , it is activing as watermark for young generation heap > size. So in our case where G1ReservePercent=40, We are effectively > asking G1 not to resize (increase) young generation , when we reach 60% of > heap occupancy. > > > > @All -thanks for your comments. I will adjust the parameters we discussed > and publish the outcome. I am going to tune with the following parameters > > > > 1) G1ReservePercent -To reduce this value to a reasonable value. As > a result , we are allowing G1 to resize young generation size more which > can reduce the object promotion rate. > > > > 2) G1MixedGCLiveThresholdPercent ? To increase this percent to 85. > This will G1 more aggressive by having more mixed GC?s > > Could someone please explain me how increasing it from 65 ( > which is default in JDK 7) to 85 makes G1 to collect more old regions. I > would have thought keeping it 65 means , we asking G1 to consider > > regions above 65% of occupancy which will include regions with 85% as > well. Am I missing some thing here ? > > This value says "what max liveness does an old region need to have to be considered for mixed collections". In other words, a value of 65 means a region must have liveness of 65 or less to be considered. Put another way, if garbage is 35%+ it's a candidate. When you set it to 85%, a region can be "more live"/less garbage (i.e. 15%+ garbage) and still be eligible. > > > 3) To override MaxGCPauseMillis to a higher value , to make G1 less > aggressive about GC pause time. > > > > 4) To move to latest version of JDK , as suggested by everyone. > > > > Thanks again for your comments. Really appreciate it. > > > > Thanks and Regards > > Prasanna > > > > > > *From:* hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net > ] > *On Behalf Of *Vitaly Davidovich > *Sent:* 10 October 2016 11:42 > *To:* Thomas Schatzl > > *Cc:* hotspot-gc-use at openjdk.java.net > > *Subject:* Re: G1-GC - Full GC [humongous allocation request failed] > > > > Hi Thomas, > > > > Thanks for the clarification and insights. A few comments below ... > > On Monday, October 10, 2016, Thomas Schatzl > wrote: > > Hi all, > > On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > > > On Friday, October 7, 2016, charlie hunt > > wrote: > > > I think others are benefiting from your question(s) ? and it?s > > > helping refresh my memory of things too. ;-) > > > > > > Actually, I just looked at what we documented in Java Performance > > > Companion for G1ReservePercent, this wording may imply a very > > > slightly subtle different definition, ?To reduce the risk of > > > getting a promotion failure, G1 reserves some memory for > > > promotions. This memory will not be used for the young > > > generation.? > > > > > > Perhaps one of the G1 engineers can clarify this? > the area covered by G1ReservePercent is regular space available for > any allocation, whether young or old or humongous. 
> > The only difference is that while the heap occupancy is beyond the > reserve percent threshold, young gen will be minimal (like bounded by > G1NewSizePercent). I.e. G1 will run in some kind of "degraded > throughput" mode. "Degraded" as in young gen size is typically somehow > correlated with allocation throughput, so if you bound young gen size, > you also bound throughput. > > Ok, so that's a quite different definition of the reserve than pretty much > all sources that I've seen :). Your explanation makes it sound like a > "yellow zone" for G1, or a throttle/watermark for the young gen sizing. > > > The thinking for the reserve is to cover for extraneous large > allocations (either humongous or just a case where due to application > behavior changes lots of young gen objects survive) while G1 is getting > liveness information for the reclamation phase (i.e. mixed gc phase). > > > The collector just can't know what is the "maximum" promotion or > humongous object allocation rate as it is heavily application > dependent. > Just assuming the worst case, i.e. G1ReservePercent equals young gen, > would be way too wasteful, and at odds with other settings actually - > G1 can and will expand young gen to up to 70% if possible. Further, > such a heuristic would not capture humongous allocation by the > application anyway. > > Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned > so that reclamation starts when occupancy reaches the G1ReservePercent > threshold. I.e., some ASCII art: > > +--------------------+ <-- heap full > ^ | | > | | 1)G1ReservePercent | > | | | > +--------------------+ <-- first mixed gc > H | | > e | 2)Allocation into | > a | old gen during | > p | marking | > | | > o +--------------------+ <-- InitiatingHeapOccupancyPercent > c | | > c . 3)"Unconstrained" . > u . young gen sizing . > p . operation . > a . . > n . . > c . . > y . . > +--------------------+ <-- heap empty > > (I am probably forgetting one or the other edge case here, but that's > the general idea; also please consider that for G1, except for > humongous allocations, the heap does not need to ) > > So when current young gen size + old gen occupancy is somewhere in > areas 2)/3), G1 will expand young gen as it sees fit to meet pause > time, i.e. is "unconstrained". > > If young gen size + old gen occupancy starts eating into area 1), G1 > minimizes young gen to try to keep as much memory left for these > "extraneous allocations" that G1ReservePercent indicates, in the hope > that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the > user gave some sane settings according to (roughly) this model. > With jdk9 onwards, the IHOP is determined automatically according to > this model and so far seems to work quite nicely - at least it will > typically give you a decent starting point for setting it on your own. > > Ok, so the reserve acts like a high watermark in 9, used to adjust IHOP > dynamically. It sounds like it's an IHOP++ setting :). > > > > I'm also not sure winding the young gen down helps in cases where old gen > occupancy is growing. Intuitively, that ought to make things worse > actually. Young evacs will occur more frequently, with higher likelihood > that more objects are still live, and need to be kept alive, possibly > causing further promotion. > > > > One way that it helps is there's more frequent feedback to G1 about heap > occupancy (since young evacs occur more frequently), and so it may notice > that things aren't looking so peachy earlier. Is that the idea? 
> > > > > As for the default value of G1ReservePercent (=10), well, consider it > some default for the "typical" application, trying to strike some > balance between throughput and safety to prevent running out of memory. > > For very large heaps, it might typically be set a bit too large as the > young gen will most of the times be smaller than 10% of the heap due to > pause time constraints (or e.g. G1MaxNewSizePercent) and application > specific boundaries like "useful" allocation rates. Setting it to 40% > seems a bit too cautious, but may be warranted in some cases. Before > JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. > > For very small heaps G1ReservePercent may be too small. > > (jdk9 specific tip: you can use G1ReservePercent to set a maximum IHOP > value). > > Thanks, > Thomas > > > > -- > Sent from my phone > > > > This message may contain information that is confidential or privileged. > If you are not the intended recipient, please advise the sender immediately > and delete this message. See http://www.blackrock.com/ > corporate/en-us/compliance/email-disclaimers for further information. > Please refer to http://www.blackrock.com/corporate/en-us/compliance/ > privacy-policy for more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) > Limited are authorised and regulated by the Financial Conduct Authority. > Registered in England No. 796793 and No. 2020394 respectively. BlackRock > Life Limited is authorised by the Prudential Regulation Authority and > regulated by the Financial Conduct Authority and the Prudential Regulation > Authority. Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is > authorised and regulated by the Financial Conduct Authority and is a > registered investment adviser with the Securities and Exchange Commission > (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange > Place One, 1 Semple Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. > > ? 2016 BlackRock, Inc. All rights reserved. > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Tue Oct 11 06:55:26 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Oct 2016 08:55:26 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: <1476168926.2502.3.camel@oracle.com> Hi Vitaly, On Mon, 2016-10-10 at 06:42 -0400, Vitaly Davidovich wrote: > Hi Thomas, > > Thanks for the clarification and insights.? A few comments below ... > > On Monday, October 10, 2016, Thomas Schatzl m> wrote: > > Hi all, > > > > On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > >? > > >?On Friday, October 7, 2016, charlie hunt > > > > >?wrote: > > >?>?I think others are benefiting from your question(s) ? and it?s > > >?>?helping refresh my memory of things too. ;-) > > >?>? 
> > >?>?Actually, I just looked at what we documented in Java > > > > Performance > > >?>?Companion for G1ReservePercent, this wording may imply a very > > >?>?slightly subtle different definition, ?To reduce the risk of > > >?>?getting a promotion failure, G1 reserves some memory for > > >?>?promotions. This memory will not?be used for the young > > >?>?generation.?? > > >?>? > > >?>?Perhaps one of the G1 engineers can clarify this? > > > > ? the area covered by G1ReservePercent is regular space available > > for any allocation, whether young or old or humongous. > > > > The only difference is that while the heap occupancy is beyond the > > reserve percent threshold, young gen will be minimal (like bounded > > by G1NewSizePercent). I.e. G1 will run in some kind of "degraded > > throughput" mode. "Degraded" as in young gen size is typically > > somehow correlated with allocation throughput, so if you bound > > young gen size, you also bound throughput. > > Ok, so that's a quite different definition of the reserve than pretty > much all sources that I've seen :).? Your explanation makes it sound > like a "yellow zone" for G1, or a throttle/watermark?for the young > gen sizing. I described the effect it has. It should be considered a reserve for unexpected promotions/allocations only, and in general is an area to not allocate into. [...] > > > If young gen size + old gen occupancy starts eating into area 1), > > G1 minimizes young gen to try to keep as much memory left for these > > "extraneous allocations" that G1ReservePercent indicates, in the? > > hope that the IHOP is "soon" kicking in. Until jdk9, G1 assumes? > > that the user gave some sane settings according to (roughly) this? > > model. > > With jdk9 onwards, the IHOP is determined automatically according? > > to this model and so far seems to work quite nicely - at least it? > > will typically give you a decent starting point for setting it on? > > your own. > Ok, so the reserve acts like a high watermark in 9, used to adjust > IHOP dynamically.? It sounds like it's an IHOP++ setting :). ;) In the cases I have seen so far, the adaptive IHOP mechanism makes sure that you don't get into this situation to actually use the reserve at all - unless the application behavior changes a lot over time to avoid full gc. I am sure you can find situations where it fails of course. It is just another heuristic. > I'm also not sure winding the young gen down helps in cases where old > gen occupancy is growing.? Intuitively, that ought to make things > worse actually.? Young evacs will occur more frequently, with higher > likelihood that more objects are still live, and need to be kept > alive, possibly causing further promotion. That depends on the application. Some applications work this way, many others don't, at least beyond a certain threshold of young gen size. There is the option to set G1ReservePercent to zero and set IHOP manually, then the upper bound for eden is just the remaining memory in this situation. You could set the "confidence" the adaptive IHOP has too to get some extra slack. > One way that it helps is there's more frequent feedback to G1 about > heap occupancy (since young evacs occur more frequently), and so it > may notice that things aren't looking so peachy earlier.? Is that the > idea? There is the reason you suggest, i.e. to make sure that the IHOP is checked more frequently to start marking as soon as its threshold is crossed (if it has not been). There may be others. 
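As a concrete sketch of the two options (the 45 below is simply the default value of InitiatingHeapOccupancyPercent, not a recommendation):

    pre-JDK9:  -XX:G1ReservePercent=0 -XX:InitiatingHeapOccupancyPercent=45
    JDK9+:     -XX:+G1UseAdaptiveIHOP   (the default; InitiatingHeapOccupancyPercent then only
                                         seeds the threshold until enough samples are gathered)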
As for the impact, I am not sure, considering that the main problem at this point is that when getting close to G1ReserverPercent of remaining space, you are close to getting to the end of available space! I.e. consider looking at the defaults of G1ReservePercent of 10, this is not a lot compared to defaults for G1NewSizePercent and G1MaxNewSizePercent of 5 and 60 respectively. To use all remaining memory for eden at this point is for obvious reasons not an excellent idea... Now you might argue that maybe one should start marking to make sure that you can use maximum eden at all times. That can be done (set IHOP manually), but the problem is that this likely causes more frequent concurrent marking, that also need CPU resources. Somewhat shorter?gc pause intervals typically do not affect total throughput as much. Additionally, allowing old gen objects to die for longer typically pays off a lot, i.e. makes the mixed gc phase shorter. If you feel that above paragraph is a bit too hand-wavy here: basically, GC heuristics interact in sometimes counter-intuitive ways with the application.? Thanks, ? Thomas From jun.zhuang at hobsons.com Tue Oct 11 18:25:29 2016 From: jun.zhuang at hobsons.com (Jun Zhuang) Date: Tue, 11 Oct 2016 18:25:29 +0000 Subject: About the finalization queue and reference queue Message-ID: Hi, While reading about re-defining the finalize() method explicitly in a class I came across some statements and like to get some clarification from the experts. On http://www.fasterj.com/articles/finalizer2.shtml, the author states that "the GC adds each of those Finalizer objects to the reference queue at java.lang.ref.Finalizer.ReferenceQueue.". Based on this the Finalizer object associated with the finalizeable object goes on the reference queue. On page 311 of book Service-Oriented Computing - ICSOC 2011 Workshops "... all those objects that have a finalize () method and are found to be unreachable(dead) by garbage collector, are pushed into a finalization queue.". So the finalizeable object goes on the finalization queue. Then this site, https://yourkit.com/forum/viewtopic.php?f=3&t=4672, states that "Objects of all classes with redefined finalize() method are added to a queue at the moment of creation. The queue head is referenced from a static field in java.lang.ref.Finalizer. An instance of Finalizer is created for each "finalizeable" object and is stored in that queue, which is in fact a linked list of Finalizers.", so both the finalizeable object and the associated Finalizer object are stored in the same queue? So my questions are: Are there one or two queues involved? Exactly how object finalization works? Appreciate your input, Jun Jun Zhuang Sr. Performance QA Engineer | Hobsons T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite 300 | Cincinnati, OH 45241 | USA Upgraded by Hobsons - Subscribe Today -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image755000.png Type: image/png Size: 13602 bytes Desc: image755000.png URL: From ecki at zusammenkunft.net Tue Oct 11 19:05:49 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Tue, 11 Oct 2016 21:05:49 +0200 Subject: About the finalization queue and reference queue In-Reply-To: References: Message-ID: <20161011210549.00006b33.ecki@zusammenkunft.net> Hello, what is interesting to know is, that each finalizeable object which is tracked is wrapped/tracked with an instance of j.lang.ref.Finalizer (which is a FinalizerReference subclass i.e. a final reference). Generally for references to work you need to keep alive the Reference instance. The finalizer does this with a built-in linked list in the Finalizer instance (static unfinalized points to the head of the list and each Finalizer object has a next/prev pointer. When the VM tracks a finalizeable object it calls the Finalize(Object) constructor which makes sure to add it. So if you have thousands of finalized objects there are all indirectly referenced from this long linear list. When the GC does its thing and a instance becomes unreachable, it will add it to the ReferenceQueue of the finalizer. The FinalizerThread will consume it from there and make sure to finally also remove the Finalizer reference for that object from the double linked list. If you work with heap-dumps and analyse memeory leak informations you need to ignore the memeory consumotion under "Finalizer.unfinalized" with the (possibly long if the Finalizer thread is blocked and unwanted) ReferenceQueue Finalizer#queue#head single linked list. Gruss Bernd Am Tue, 11 Oct 2016 18:25:29 +0000 schrieb Jun Zhuang : > Hi, > > While reading about re-defining the finalize() method explicitly in a > class I came across some statements and like to get some > clarification from the experts. > > On http://www.fasterj.com/articles/finalizer2.shtml, the author > states that "the GC adds each of those Finalizer objects to the > reference queue at java.lang.ref.Finalizer.ReferenceQueue.". Based on > this the Finalizer object associated with the finalizeable object > goes on the reference queue. > > On page 311 of book Service-Oriented Computing - ICSOC 2011 > Workshops > "... all those objects that have a finalize () method and are found > to be unreachable(dead) by garbage collector, are pushed into a > finalization queue.". So the finalizeable object goes on the > finalization queue. > > Then this site, https://yourkit.com/forum/viewtopic.php?f=3&t=4672, > states that "Objects of all classes with redefined finalize() method > are added to a queue at the moment of creation. The queue head is > referenced from a static field in java.lang.ref.Finalizer. An > instance of Finalizer is created for each "finalizeable" object and > is stored in that queue, which is in fact a linked list of > Finalizers.", so both the finalizeable object and the associated > Finalizer object are stored in the same queue? > > So my questions are: Are there one or two queues involved? Exactly > how object finalization works? > > > Appreciate your input, > Jun > > Jun Zhuang > Sr. 
Performance QA Engineer | > Hobsons > T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite > 300 | Cincinnati, OH 45241 | USA > > > Upgraded by Hobsons - Subscribe Today > From jun.zhuang at hobsons.com Tue Oct 11 20:55:54 2016 From: jun.zhuang at hobsons.com (Jun Zhuang) Date: Tue, 11 Oct 2016 20:55:54 +0000 Subject: About the finalization queue and reference queue In-Reply-To: <20161011210549.00006b33.ecki@zusammenkunft.net> References: <20161011210549.00006b33.ecki@zusammenkunft.net> Message-ID: Hi Bernd, Appreciate your quick response. I understand following: * A Finalizer instance is created for every finalizeable object * All the Finalizer instances are linked together using a double linked list * All the Finalizer instances are tracked by the java.lang.ref.Finalizer class. Or is it only the first one by the unfinalized field? What I am still not clear are: 1. Is there a finalization queue at all? If so, what does it do? 2. What goes into the ReferenceQueue? The finalizeable objects? Thanks a lot, Jun -----Original Message----- From: Bernd Eckenfels [mailto:ecki at zusammenkunft.net] Sent: Tuesday, October 11, 2016 3:06 PM To: hotspot-gc-use at openjdk.java.net Cc: Jun Zhuang Subject: Re: About the finalization queue and reference queue Hello, what is interesting to know is, that each finalizeable object which is tracked is wrapped/tracked with an instance of j.lang.ref.Finalizer (which is a FinalizerReference subclass i.e. a final reference). Generally for references to work you need to keep alive the Reference instance. The finalizer does this with a built-in linked list in the Finalizer instance (static unfinalized points to the head of the list and each Finalizer object has a next/prev pointer. When the VM tracks a finalizeable object it calls the Finalize(Object) constructor which makes sure to add it. So if you have thousands of finalized objects there are all indirectly referenced from this long linear list. When the GC does its thing and a instance becomes unreachable, it will add it to the ReferenceQueue of the finalizer. The FinalizerThread will consume it from there and make sure to finally also remove the Finalizer reference for that object from the double linked list. If you work with heap-dumps and analyse memeory leak informations you need to ignore the memeory consumotion under "Finalizer.unfinalized" with the (possibly long if the Finalizer thread is blocked and unwanted) ReferenceQueue Finalizer#queue#head single linked list. Gruss Bernd Am Tue, 11 Oct 2016 18:25:29 +0000 schrieb Jun Zhuang >: > Hi, > > While reading about re-defining the finalize() method explicitly in a > class I came across some statements and like to get some clarification > from the experts. > > On http://www.fasterj.com/articles/finalizer2.shtml, the author states > that "the GC adds each of those Finalizer objects to the reference > queue at java.lang.ref.Finalizer.ReferenceQueue.". Based on this the > Finalizer object associated with the finalizeable object goes on the > reference queue. > > On page 311 of book Service-Oriented Computing - ICSOC 2011 > Workshops PA311&dq=java,+finalization+queue&source=bl&ots=LLGiYGWh0L&sig=Glvf0kn > 0zKHrdWoPzM6y6wtsr_M&hl=en&sa=X&ved=0ahUKEwjHrsL2pdPPAhUk3IMKHdXRCtc4C > hDoAQgbMAA#v=onepage&q=finalization%20queue&f=false> > "... all those objects that have a finalize () method and are found to > be unreachable(dead) by garbage collector, are pushed into a > finalization queue.". 
So the finalizeable object goes on the > finalization queue. > > Then this site, https://yourkit.com/forum/viewtopic.php?f=3&t=4672, > states that "Objects of all classes with redefined finalize() method > are added to a queue at the moment of creation. The queue head is > referenced from a static field in java.lang.ref.Finalizer. An instance > of Finalizer is created for each "finalizeable" object and is stored > in that queue, which is in fact a linked list of Finalizers.", so both > the finalizeable object and the associated Finalizer object are stored > in the same queue? > > So my questions are: Are there one or two queues involved? Exactly how > object finalization works? > > > Appreciate your input, > Jun > > Jun Zhuang > Sr. Performance QA Engineer | > Hobsons tm_campaign=banner_02.12.16_general> > T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite > 300 | Cincinnati, OH 45241 | USA > > > Upgraded by Hobsons - Subscribe Today > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecki at zusammenkunft.net Tue Oct 11 21:25:51 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Tue, 11 Oct 2016 23:25:51 +0200 Subject: About the finalization queue and reference queue In-Reply-To: References: <20161011210549.00006b33.ecki@zusammenkunft.net> Message-ID: <20161011232551.00005ac8.ecki@zusammenkunft.net> Hello, Am Tue, 11 Oct 2016 20:55:54 +0000 schrieb Jun Zhuang : > * A Finalizer instance is created for every finalizeable > object Yes, the java.lang.ref.Finalizer.Finalizer(Object) constructor will put the referent in the referent` field of Finalizer (declared in parent class Referent) and then link this instance with add() method at the head of the list. The head is kept alive by Finalizer#unfinalized (which is static). This constructor is called (via Finalizer#register(Object)) by the JVM when it creates a new finalizeable object. > * All the Finalizer instances are linked together using a > double linked list Yes, they dont use a LikedList class but implement the next/prev fields themself (so this needs no additional instances). > * All the Finalizer instances are tracked by the > java.lang.ref.Finalizer class. Or is it only the first one by the > unfinalized field? The head of the linked list is referenced by `unfinalized field. It points to a Finalizer instance, which points which the next field to the next Finalized and so on (and each of them has a referee). > What I am still not clear are: > > 1. Is there a finalization queue at all? If so, what does it do? There is a ReferenceQueue in the static field queue. This queue is given to all Finalize instances (the field "queue" in Reference holds thatuntil needed). The objets are enqueued to this queue by the GC. The Finalizer thread reads them from this queue and knows "this one is now not reachable anymore" and does its work (and removes it from the queue and also remove the Finalize wrapper from its linked list). > 2. What goes into the ReferenceQueue? The finalizeable objects? References go to the ReferenceQueue, the wrapper Finalizer instance is also a (Final)Reference. In case of FinalReference the queue can look at the Finalize#get() to get the referent and call finalize() on this. Same mechanism asdone for allother Reference types (with the exception of phantom references which do not allow this get). 
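A minimal sketch that makes this visible: the class below overrides finalize(), drops its only reference and asks for a GC. System.gc() is only a hint, so the output is not guaranteed, but when the collector does discover the object, the message is printed from the "Finalizer" daemon thread rather than from main:

    public class FinalizeDemo {
        @Override
        protected void finalize() {
            // Invoked by the FinalizerThread after the GC has enqueued the
            // Finalizer reference that was registered at construction time.
            System.out.println("finalize() on thread: " + Thread.currentThread().getName());
        }

        public static void main(String[] args) throws InterruptedException {
            new FinalizeDemo();   // instance is immediately unreachable
            System.gc();          // hint only; discovery may or may not happen right away
            Thread.sleep(1000);   // give the Finalizer thread a chance to run
        }
    }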
The Finalize class is a real beast: - instances are a Reference wrapping the finalizeable objects - instances form a double linked list of all Finalize instances - the class itself holds the head of the list and the queue alive in statics - the FinalizerThread (which removes Finalizer instances from the ReferenceQueue and invokes the finalizer method on it (once) is a inner class: Finalizer$FinalizerThread - the static initializer of Finalizer actually starts the FinalizerThread (as a daemon with MAX-2 prio). - the static field lock in Finalizer is used to synchronize the daemon thread with secondary finalizer threads (from runAllFinalizers() on shutdown or Runtime.runFinalizazion()) Lots of the finalization logic is done in Java, only the register() and Reference discovery is done by the runtime/gc. You can see that here, note especially the "Invoked by" comments (and parents): http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/2bf254421854/src/java.base/share/classes/java/lang/ref/Finalizer.java Bernd From inurislamov at getintent.com Wed Oct 12 08:39:04 2016 From: inurislamov at getintent.com (Ildar Nurislamov) Date: Wed, 12 Oct 2016 11:39:04 +0300 Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses In-Reply-To: References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com> <1475048227.4430.4.camel@oracle.com> Message-ID: <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> Hi Thomas, It was too early to make conclusions. After some prolonged testing i've noticed that more thorough tuning may be required to avoid this issue completely. And -XX:-G1UseAdaptiveIHOP not always enough too. What bothers me is the steep jump in time required between the last Mixed GC and the previous: In 9th it took 129.8ms to evacuate 104 old region: [64394.771s][info][gc,phases ] GC(38781) Evacuate Collection Set: 129.8ms [64394.771s][info][gc,phases ] GC(38781) Code Roots: 0.0ms [64394.771s][info][gc,phases ] GC(38781) Clear Card Table: 3.4ms [64394.771s][info][gc,phases ] GC(38781) Expand Heap After Collection: 0.0ms [64394.771s][info][gc,phases ] GC(38781) Free Collection Set: 3.9ms [64394.771s][info][gc,phases ] GC(38781) Merge Per-Thread State: 0.1ms [64394.771s][info][gc,phases ] GC(38781) Other: 13.8ms [64394.771s][info][gc,heap ] GC(38781) Eden regions: 37->0(37) [64394.771s][info][gc,heap ] GC(38781) Survivor regions: 3->3(5) [64394.771s][info][gc,heap ] GC(38781) Old regions: 457->353 [64394.771s][info][gc,heap ] GC(38781) Humongous regions: 3->3 [64394.771s][info][gc,metaspace ] GC(38781) Metaspace: 70587K->70587K(83968K) [64394.771s][info][gc ] GC(38781) Pause Mixed (G1 Evacuation Pause) 15972M->11457M(65536M) (64394.620s, 64394.771s) 150.931ms While in 10th (the last) it took 3401.3ms to evacuate 87: [64398.393s][info][gc,phases ] GC(38782) Evacuate Collection Set: 3401.3ms [64398.393s][info][gc,phases ] GC(38782) Code Roots: 0.0ms [64398.393s][info][gc,phases ] GC(38782) Clear Card Table: 2.8ms [64398.393s][info][gc,phases ] GC(38782) Expand Heap After Collection: 0.0ms [64398.393s][info][gc,phases ] GC(38782) Free Collection Set: 4.3ms [64398.393s][info][gc,phases ] GC(38782) Merge Per-Thread State: 0.1ms [64398.393s][info][gc,phases ] GC(38782) Other: 12.2ms [64398.393s][info][gc,heap ] GC(38782) Eden regions: 37->0(37) [64398.393s][info][gc,heap ] GC(38782) Survivor regions: 3->3(5) [64398.393s][info][gc,heap ] GC(38782) Old regions: 353->266 [64398.393s][info][gc,heap ] GC(38782) Humongous regions: 3->3 [64398.393s][info][gc,metaspace ] GC(38782) Metaspace: 
70587K->70587K(83968K) [64398.393s][info][gc ] GC(38782) Pause Mixed (G1 Evacuation Pause) 12641M->8678M(65536M) (64394.973s, 64398.393s) 3420.666ms It looks like at average old regions in 10th Mixed GC were 31.5 times more expensive than in 9th and it took 39ms to collect just one region. Does it make sense? To what extent one old region may be more expensive than another? I wish G1Ergonomics similar to "reason: predicted time is too high" but for order of magnitude jump cases worked here even when min old regions number has not been reached. We didn't spend all XX:G1MixedGCCountTarget=12 yet here. Log file: https://www.dropbox.com/s/ubpkosh0a8tomss/jdk9_135_tuned_11_10_16.log.zip?dl=0 Sadly with no ergonomics. Next thing i'm going to try is adjusting XX:G1MixedGCLiveThresholdPercent. Thank you! -- Ildar Nurislamov GetIntent, AdServer Team Leader > On Sep 29, 2016, at 13:42, Ildar Nurislamov wrote: > > Hi Thomas, > > Thank you for really helpful advices. > > I have performed 8-hour testing with: > -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=2 -XX:G1HeapWastePercent=10 -XX:G1MixedGCCountTarget=12 > and they improved situation for both 8u and 9ea. > Longest pause on 9ea now is 400ms with Adaptive sizing for IHOP > > I will continue testing and report if anything interesting pops out. > > -- > Ildar Nurislamov > GetIntent, AdServer Team Leader > >> On Sep 28, 2016, at 10:37, Thomas Schatzl > wrote: >> >> Hi Ildar, >> >> On Fri, 2016-09-23 at 12:40 +0300, Ildar Nurislamov wrote: >>> Hi Thomas Schatzl! >>> >>> Thank you for such prompt responses. >>> I'm going to try you advices and send results next week. >>> >>> Here are log files you have asked about: >>> https://www.dropbox.com/s/i9o4nuuh5gpsf1y/9noaihop_07_09_16.log.zip?d >>> l=0 >>> https://www.dropbox.com/s/xa3cfezvlqwwh6v/8u_log_07_09_16.log.zip?dl= >>> 0 >>> >> >> thanks a lot for the logs. As you may have noticed I closed JDK- >> 8166500 as duplicate of the existing JDK-8159697 issue. They are the >> same after all. >> We will continue working on improving out-of-box experience of G1. :) >> >> As hypothesized in the text for JDK-8166500, the 8u and 9-without-aihop >> show the same general issue. The suggested tunings should improve mixed >> gc times for now. >> >> Thanks, >> Thomas >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Wed Oct 12 12:51:46 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 12 Oct 2016 14:51:46 +0200 Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses In-Reply-To: <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com> <1475048227.4430.4.camel@oracle.com> <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> Message-ID: <1476276706.2632.92.camel@oracle.com> Hi, On Wed, 2016-10-12 at 11:39 +0300, Ildar Nurislamov wrote: > Hi Thomas, > > It was too early to make conclusions. > After some prolonged testing i've noticed that more thorough tuning > may be required to avoid this issue completely.? > And -XX:-G1UseAdaptiveIHOP not always enough too.? > > What bothers me is the steep jump in time required between the last > Mixed GC and the previous: > In 9th it took 129.8ms to evacuate 104 old region: > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Evacuate Collection > Set: 129.8ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Code Roots: 0.0ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? 
Clear Card Table: > 3.4ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Expand Heap After > Collection: 0.0ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Free Collection Set: > 3.9ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Merge Per-Thread > State: 0.1ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Other: 13.8ms > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Eden regions: 37->0(37) > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Survivor regions: 3- > >3(5) > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Old regions: 457->353 > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Humongous regions: 3->3 > [64394.771s][info][gc,metaspace? ] GC(38781) Metaspace: 70587K- > >70587K(83968K) > [64394.771s][info][gc? ? ? ? ? ? ] GC(38781) Pause Mixed (G1 > Evacuation Pause) 15972M->11457M(65536M) (64394.620s, 64394.771s) > 150.931ms > > While in 10th (the last) it took?3401.3ms to evacuate 87: > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Evacuate Collection > Set: 3401.3ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Code Roots: 0.0ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Clear Card Table: > 2.8ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Expand Heap After > Collection: 0.0ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Free Collection Set: > 4.3ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Merge Per-Thread > State: 0.1ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Other: 12.2ms > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Eden regions: 37->0(37) > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Survivor regions: 3- > >3(5) > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Old regions: 353->266 > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Humongous regions: 3->3 > [64398.393s][info][gc,metaspace? ] GC(38782) Metaspace: 70587K- > >70587K(83968K) > [64398.393s][info][gc? ? ? ? ? ? ] GC(38782) Pause Mixed (G1 > Evacuation Pause) 12641M->8678M(65536M) (64394.973s, 64398.393s) > 3420.666ms > > It looks like at average old regions in 10th Mixed GC were 31.5 times > more expensive than in 9th and it took 39ms to collect just one > region. Does it make sense? To what extent one old region may be more > expensive than another? Mostly remembered set operations. > I wish G1Ergonomics similar to "reason: predicted time is too high" > but for order of magnitude jump cases worked here even when min old > regions number has not been reached. We didn't spend all > XX:G1MixedGCCountTarget=12 yet here. > > Log file:?https://www.dropbox.com/s/ubpkosh0a8tomss/jdk9_135_tuned_11 > _10_16.log.zip?dl=0? > Sadly with no ergonomics. > > Next thing i'm going to try is adjusting > XX:G1MixedGCLiveThresholdPercent. ? I did not have time for a look at the logs yet, but you can try to avoid this by either increasing MixedGCCountTarget further - as you noticed this is a hint for G1 only anyway - or trying to get rid of these expensive regions. One way is to decrease G1MixedGCLiveThresholdPercent (default 85), as regions with lots of occupancy also often have a large remembered set that is expensive to reclaim. Another way to explore is looking at statistics for remembered set sizes directly. There is -XX:G1SummarizeRSetStatsPeriod which takes a number that tells G1 to collect and print these statistics every G1SummarizeRSetStatsPeriod'th GC. Note that this is an expensive operation, so you might only want to do this every 10th or so GC (needs -XX:+UnlockDiagnosticVMOptions). Thanks, ? 
Thomas From yu.zhang at oracle.com Wed Oct 12 15:31:07 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Wed, 12 Oct 2016 08:31:07 -0700 Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses In-Reply-To: <1476276706.2632.92.camel@oracle.com> References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com> <1475048227.4430.4.camel@oracle.com> <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> <1476276706.2632.92.camel@oracle.com> Message-ID: <356ce0a6-580e-075b-f355-d064ed1529b5@oracle.com> Ildar, Another thing you can try is to increase G1HeapWastePercent to get rid of the expensive mixed gcs. From the log snip, the heap is not tight. Thanks Jenny On 10/12/2016 05:51 AM, Thomas Schatzl wrote: > Hi, > > On Wed, 2016-10-12 at 11:39 +0300, Ildar Nurislamov wrote: >> Hi Thomas, >> >> It was too early to make conclusions. >> After some prolonged testing i've noticed that more thorough tuning >> may be required to avoid this issue completely. >> And -XX:-G1UseAdaptiveIHOP not always enough too. >> >> What bothers me is the steep jump in time required between the last >> Mixed GC and the previous: >> In 9th it took 129.8ms to evacuate 104 old region: >> [64394.771s][info][gc,phases ] GC(38781) Evacuate Collection >> Set: 129.8ms >> [64394.771s][info][gc,phases ] GC(38781) Code Roots: 0.0ms >> [64394.771s][info][gc,phases ] GC(38781) Clear Card Table: >> 3.4ms >> [64394.771s][info][gc,phases ] GC(38781) Expand Heap After >> Collection: 0.0ms >> [64394.771s][info][gc,phases ] GC(38781) Free Collection Set: >> 3.9ms >> [64394.771s][info][gc,phases ] GC(38781) Merge Per-Thread >> State: 0.1ms >> [64394.771s][info][gc,phases ] GC(38781) Other: 13.8ms >> [64394.771s][info][gc,heap ] GC(38781) Eden regions: 37->0(37) >> [64394.771s][info][gc,heap ] GC(38781) Survivor regions: 3- >>> 3(5) >> [64394.771s][info][gc,heap ] GC(38781) Old regions: 457->353 >> [64394.771s][info][gc,heap ] GC(38781) Humongous regions: 3->3 >> [64394.771s][info][gc,metaspace ] GC(38781) Metaspace: 70587K- >>> 70587K(83968K) >> [64394.771s][info][gc ] GC(38781) Pause Mixed (G1 >> Evacuation Pause) 15972M->11457M(65536M) (64394.620s, 64394.771s) >> 150.931ms >> >> While in 10th (the last) it took 3401.3ms to evacuate 87: >> [64398.393s][info][gc,phases ] GC(38782) Evacuate Collection >> Set: 3401.3ms >> [64398.393s][info][gc,phases ] GC(38782) Code Roots: 0.0ms >> [64398.393s][info][gc,phases ] GC(38782) Clear Card Table: >> 2.8ms >> [64398.393s][info][gc,phases ] GC(38782) Expand Heap After >> Collection: 0.0ms >> [64398.393s][info][gc,phases ] GC(38782) Free Collection Set: >> 4.3ms >> [64398.393s][info][gc,phases ] GC(38782) Merge Per-Thread >> State: 0.1ms >> [64398.393s][info][gc,phases ] GC(38782) Other: 12.2ms >> [64398.393s][info][gc,heap ] GC(38782) Eden regions: 37->0(37) >> [64398.393s][info][gc,heap ] GC(38782) Survivor regions: 3- >>> 3(5) >> [64398.393s][info][gc,heap ] GC(38782) Old regions: 353->266 >> [64398.393s][info][gc,heap ] GC(38782) Humongous regions: 3->3 >> [64398.393s][info][gc,metaspace ] GC(38782) Metaspace: 70587K- >>> 70587K(83968K) >> [64398.393s][info][gc ] GC(38782) Pause Mixed (G1 >> Evacuation Pause) 12641M->8678M(65536M) (64394.973s, 64398.393s) >> 3420.666ms >> >> It looks like at average old regions in 10th Mixed GC were 31.5 times >> more expensive than in 9th and it took 39ms to collect just one >> region. Does it make sense? To what extent one old region may be more >> expensive than another? > Mostly remembered set operations. 
> >> I wish G1Ergonomics similar to "reason: predicted time is too high" >> but for order of magnitude jump cases worked here even when min old >> regions number has not been reached. We didn't spend all >> XX:G1MixedGCCountTarget=12 yet here. >> >> Log file: https://www.dropbox.com/s/ubpkosh0a8tomss/jdk9_135_tuned_11 >> _10_16.log.zip?dl=0 >> Sadly with no ergonomics. >> >> Next thing i'm going to try is adjusting >> XX:G1MixedGCLiveThresholdPercent. > I did not have time for a look at the logs yet, but you can try to > avoid this by either increasing MixedGCCountTarget further - as you > noticed this is a hint for G1 only anyway - or trying to get rid of > these expensive regions. One way is to decrease > G1MixedGCLiveThresholdPercent (default 85), as regions with lots of > occupancy also often have a large remembered set that is expensive to > reclaim. > > Another way to explore is looking at statistics for remembered set > sizes directly. There is -XX:G1SummarizeRSetStatsPeriod which takes a > number that tells G1 to collect and print these statistics every > G1SummarizeRSetStatsPeriod'th GC. Note that this is an expensive > operation, so you might only want to do this every 10th or so GC (needs > -XX:+UnlockDiagnosticVMOptions). > > Thanks, > Thomas > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jun.zhuang at hobsons.com Thu Oct 13 15:21:56 2016 From: jun.zhuang at hobsons.com (Jun Zhuang) Date: Thu, 13 Oct 2016 15:21:56 +0000 Subject: Questions regarding Java string literal pool Message-ID: Hi, I have a few questions related to the Java String pool, I wonder if I can get some clarification from the experts? 1. Location of the String pool Following are from some of the posts I read but with conflicting information: ? http://java-performance.info/string-intern-in-java-6-7-8/ ?In those good old days [before java 7] all interned strings were stored in the PermGen ? the fixed size part of heap mainly used for storing loaded classes and string pool.? ? ?in Java 7 ? the string pool was relocated to the heap. ... All strings are now located in the heap, as most of other ordinary objects? Above statement suggests that both the interned strings and the string pool are in the PermGen prior to java 7 but being relocated to the heap in 7. ? https://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html ?Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.? This post suggests that string literals are created on the heap as other objects but did not tie that to any java version. ? http://www.javamadesoeasy.com/2015/05/string-pool-string-literal-pool-string.html ?From java 7 String pool is a storage area in java heap memory, where all the other objects are created. Prior to Java 7 String pool was created in permgen space of heap.? So prior to java 7 the string pool was in the PermGen; beginning with 7 it?s in the heap. Same as the 1st post. My questions are: 1. Where is the string pool located prior and after java 7 2. Are the string literals & interned strings objects created in the PermGen prior to java 7 then being created on the heap after? 2. Can string literals be garbage collected? 
The post @https://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html says ?Unlike most objects, String literals always have a reference to them from the String Literal Pool. That means that they always have a reference to them and are, therefore, not eligible for garbage collection.? But this one @http://java-performance.info/string-intern-in-java-6-7-8/ says ?Yes, all strings in the JVM string pool are eligible for garbage collection if there are no references to them from your program roots.? Are they both true under certain conditions? Appreciate your help, Jun Jun Zhuang Sr. Performance QA Engineer | Hobsons T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite 300 | Cincinnati, OH 45241 | USA Upgraded by Hobsons - Subscribe Today -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image857000.png Type: image/png Size: 13602 bytes Desc: image857000.png URL: From hongkai.liu at ericsson.com Wed Oct 19 20:16:01 2016 From: hongkai.liu at ericsson.com (Hongkai Liu) Date: Wed, 19 Oct 2016 20:16:01 +0000 Subject: G1GC and finalizer queue Message-ID: Hi, our application (Gerrit) consumes more and more memory and Yourkit showed up with 18M Jdbc4PreparedStatement objects in "pending finalization" which uses up 21G of mem. The heap is taken immediately after two GCs. I wonder why those objects survived of GCs. According to Yourkit doc, the objects in "pending finalization" are from the class with an implemenation of finalize() method while Jdbc4PreparedStatement is without it. Is it about G1GC? Any hint is appreciated. BR, Hongkai ================================ Here are the screenshots of Yourkit and App info. [cid:82499a3c-9de5-4ae9-b0f3-c9f2e173aa40] [cid:5dd8cdb6-7736-4294-9bc5-698ad47ecf29] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2016-10-19 15:51:04.png Type: image/png Size: 54687 bytes Desc: Screenshot from 2016-10-19 15:51:04.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2016-10-19 15:53:51.png Type: image/png Size: 27127 bytes Desc: Screenshot from 2016-10-19 15:53:51.png URL: From ecki at zusammenkunft.net Wed Oct 19 20:39:55 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Wed, 19 Oct 2016 22:39:55 +0200 Subject: G1GC and finalizer queue In-Reply-To: References: Message-ID: <20161019223955.0000207d.ecki@zusammenkunft.net> Hello, the finalize() is in one of the parent classes. http://grepcode.com/file/repo1.maven.org/maven2/postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc2/AbstractJdbc2Statement.java#803 I am not sure if youkit shows unreferenced or referenced objects in "pending finalization". If it is referenced objects, the statements might hang around in a prepared statement cache. If they are unreferenced the finalizer thread might be slow or blocked. I would try to do an heapdump to investigate. When properly using datasource pools and tomcat facilities a leak is unlikely. If you have some hardcoded jdbc code, that might also be a possible explanation for the number. Gruss Bernd Am Wed, 19 Oct 2016 20:16:01 +0000 schrieb Hongkai Liu : > Hi, > > > our application (Gerrit) consumes more and more memory and Yourkit > showed up with 18M > Jdbc4PreparedStatement > objects in "pending finalization" which uses up 21G of mem. 
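A quick way to tell the two cases apart from the command line (<pid> is a placeholder for the JVM's process id):

    jmap -finalizerinfo <pid>     (number of objects waiting in the finalization queue)
    jstack <pid>                  (check what the "Finalizer" daemon thread is doing)

If the pending count keeps growing and the Finalizer thread is stuck inside a finalize() call (for example waiting on a lock or on I/O), the queue is backing up; if the count stays small, the instances are more likely still strongly referenced, e.g. from a statement cache. From inside the process, ManagementFactory.getMemoryMXBean().getObjectPendingFinalizationCount() reports the same approximate pending count.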
> > The heap is taken immediately after two GCs. > > > I wonder why those objects survived of GCs. > > According to Yourkit > doc, the > objects in "pending finalization" are from the class with an > implemenation of finalize() method while > Jdbc4PreparedStatement > is without it. > > Is it about G1GC? > > > Any hint is appreciated. > > > BR, > > Hongkai > > > ================================ > > > Here are the screenshots of Yourkit and App info. > > [cid:82499a3c-9de5-4ae9-b0f3-c9f2e173aa40] > > [cid:5dd8cdb6-7736-4294-9bc5-698ad47ecf29] > > From vitalyd at gmail.com Wed Oct 19 21:52:09 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 19 Oct 2016 17:52:09 -0400 Subject: G1GC and finalizer queue In-Reply-To: <20161019223955.0000207d.ecki@zusammenkunft.net> References: <20161019223955.0000207d.ecki@zusammenkunft.net> Message-ID: On Wednesday, October 19, 2016, Bernd Eckenfels wrote: > Hello, > > the finalize() is in one of the parent classes. > > http://grepcode.com/file/repo1.maven.org/maven2/ > postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc2/ > AbstractJdbc2Statement.java#803 > > I am not sure if youkit shows unreferenced or referenced objects in > "pending finalization". If it is referenced objects, the statements > might hang around in a prepared statement cache. If they are > unreferenced the finalizer thread might be slow or blocked. YK docs indicate it's showing (strongly) unreachable objects that are sitting on the finalization queue. This implies (unless YK is broken) these instances have already been discovered by G1 to be unreachable, and thus got enqueued for finalization. I'd jstack/sigquit the Java process to get a thread dump and see what the Finalizer thread is up to. > > I would try to do an heapdump to investigate. > > When properly using datasource pools and tomcat facilities a leak is > unlikely. If you have some hardcoded jdbc code, that might also be a > possible explanation for the number. > > Gruss > Bernd > > Am Wed, 19 Oct 2016 20:16:01 +0000 > schrieb Hongkai Liu >: > > > Hi, > > > > > > our application (Gerrit) consumes more and more memory and Yourkit > > showed up with 18M > > Jdbc4PreparedStatement org/maven2/postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc4/ > Jdbc4PreparedStatement.java> > > objects in "pending finalization" which uses up 21G of mem. > > > > The heap is taken immediately after two GCs. > > > > > > I wonder why those objects survived of GCs. > > > > According to Yourkit > > doc, the > > objects in "pending finalization" are from the class with an > > implemenation of finalize() method while > > Jdbc4PreparedStatement org/maven2/postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc4/ > Jdbc4PreparedStatement.java> > > is without it. > > > > Is it about G1GC? > > > > > > Any hint is appreciated. > > > > > > BR, > > > > Hongkai > > > > > > ================================ > > > > > > Here are the screenshots of Yourkit and App info. > > > > [cid:82499a3c-9de5-4ae9-b0f3-c9f2e173aa40] > > > > [cid:5dd8cdb6-7736-4294-9bc5-698ad47ecf29] > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From brian.toal at gmail.com Fri Oct 21 23:55:39 2016
From: brian.toal at gmail.com (Brian Toal)
Date: Fri, 21 Oct 2016 16:55:39 -0700
Subject: metaspace proportion of fragmentation
Message-ID: 

Good evening. In an application that I'm responsible for, Metaspace is set to 1.1GB. Specifically the following flags are set:

-XX:MetaspaceSize=1152m
-XX:MaxMetaspaceSize=1152m
-XX:MinMetaspaceFreeRatio=0
-XX:MaxMetaspaceFreeRatio=100

However we are getting an OOME when metaspace size hits 80% of 1.1GB.

Doing a bit of research, it seems that Metaspace is known to fragment the memory: when a loader needs to acquire memory from the current chunk and the current chunk can't accommodate the request, the pointer is bumped to the next available chunk, meaning any free memory in the previous chunk's block is gone with the wind. More than happy if someone corrects my understanding here or can point me to a good reference that explains this in detail.

My question is, how do I monitor current usage + fragmentation so the proportion of free space can be monitored?

Also, is there any tuning that can take place to reduce the proportion of fragmentation?

Does the compressed class space acquire memory from the memory set aside via MaxMetaspaceSize?

Thanks in advance.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From prasanna.gopal at blackrock.com Tue Oct 25 08:41:53 2016
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Tue, 25 Oct 2016 08:41:53 +0000
Subject: G1 GC Humongous Objects - Garbage collection
Message-ID: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com>

Hi All

I have the following questions about garbage collection of humongous objects.

1) When will humongous objects get reclaimed?
2) Is there any behaviour difference between the JDK 7 and JDK 8 runtimes?
3) I understand that in pre-JDK 8 G1 GC, humongous objects get collected only through a Full GC. In my application, I couldn't see a Full GC happening for a long time (running on jdk_7u40_x64); does this mean the humongous objects stay in memory till we have a Full GC?

Appreciate your help in answering these questions.

Thanks and Regards
Prasanna

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock's Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc.
All rights reserved.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From thomas.schatzl at oracle.com Tue Oct 25 09:24:20 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 25 Oct 2016 11:24:20 +0200
Subject: G1 GC Humongous Objects - Garbage collection
In-Reply-To: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com>
References: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com>
Message-ID: <1477387460.2969.14.camel@oracle.com>

Hi,

On Tue, 2016-10-25 at 08:41 +0000, Gopal, Prasanna CWK wrote:
> Hi All
>
> I have the following questions about garbage collection of humongous
> objects.
>
> 1) When will humongous objects get reclaimed?
> 2) Is there any behaviour difference between the JDK 7 and JDK 8
> runtimes?
> 3) I understand that in pre-JDK 8 G1 GC, humongous objects get
> collected only through a Full GC. In my application, I couldn't see
> a Full GC happening for a long time (running on jdk_7u40_x64); does this
> mean the humongous objects stay in memory till we have a Full GC?

G1 can reclaim humongous objects...

* at the end of marking in the GC Cleanup pause.
* during full gc.
* JDK8u60+ can also reclaim particular types of humongous objects (arrays that do _not_ consist of references to objects) at every young GC.

See the release notes for 8u60 at http://www.oracle.com/technetwork/java/javase/8u60-relnotes-2620227.html under "New Features and Changes" for how to control this. (It works for any array of primitive type, not limited to the examples given there - just in case you wonder).

Thanks,
  Thomas

From thomas.stuefe at gmail.com Tue Oct 25 09:37:40 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 25 Oct 2016 11:37:40 +0200
Subject: metaspace proportion of fragmentation
In-Reply-To: 
References: 
Message-ID: 

Hi Brian,

On Sat, Oct 22, 2016 at 1:55 AM, Brian Toal wrote:

> Good evening. In an application that I'm responsible for, Metaspace is set
> to 1.1GB. Specifically the following flags are set:
>
> -XX:MetaspaceSize=1152m
> -XX:MaxMetaspaceSize=1152m
> -XX:MinMetaspaceFreeRatio=0
> -XX:MaxMetaspaceFreeRatio=100
>
> However we are getting an OOME when metaspace size hits 80% of 1.1GB.
>

Out of metaspace or out of compressed class space? If the latter, have you set CompressedClassSpaceSize?

> Doing a bit of research, it seems that Metaspace is known to fragment the
> memory: when a loader needs to acquire memory from the current chunk and
> the current chunk can't accommodate the request, the pointer is bumped to
> the next available chunk, meaning any free memory in the previous chunk's
> block is gone with the wind.
>

No. The remaining space is put into freelists (both on chunk and on block level) and used for follow-up requests, should the size fit. In our experience, we see very little wastage due to "half-eaten" blocks/chunks.

There are other possible waste scenarios:

1) you have a lot of class loaders living in parallel. Each one will take a chunk of memory (its current chunk) and satisfy memory requests from there. This means that the current chunk always contains a portion of still-unused memory which may be used by this class loader in the future but already counts against MaxMetaspaceSize. However, to make this hurt, you really have to have very many class loaders in parallel, as the maximum possible overhead for this scenario cannot exceed the size of a medium chunk per classloader (64k?)
2) The scenario described in this JEP here: https://bugs.openjdk.java.net/browse/JDK-8166690 . 3) real fragmentation (i.e. a mixture of in-use and free chunks). In my practice, I keep seeing (2). Hence the JEP, which will hopefully help. Kind Regards, Thomas > More than happy if someone corrects my understand here or can point me to > a good reference that explains this in detail. > > My question is, how to do I monitor current usage + fragmentation so the > proportion of free space can be monitored? > > Also is there any tuning that can take place to reduce the proportion of > fragmentation? > > Does compressed cache acquire memory from memory set aside from memory > allocated via MaxMetaspaceSize? > > Thanks in advance. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasanna.gopal at blackrock.com Tue Oct 25 09:53:34 2016 From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK) Date: Tue, 25 Oct 2016 09:53:34 +0000 Subject: G1 GC Humongous Objects - Garbage collection In-Reply-To: <1477387460.2969.14.camel@oracle.com> References: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com> <1477387460.2969.14.camel@oracle.com> Message-ID: <17856615e3ec4df3a0559a7ab31122e8@UKPMSEXD202N02.na.blkint.com> ?Hi Thomas Thanks for your explanation. Appreciate your help. Thanks and Regards Prasanna -----Original Message----- From: Thomas Schatzl [mailto:thomas.schatzl at oracle.com] Sent: 25 October 2016 10:24 To: Gopal, Prasanna CWK ; hotspot-gc-use at openjdk.java.net Subject: Re: G1 GC Humongous Objects - Garbage collection Hi, On Tue, 2016-10-25 at 08:41 +0000, Gopal, Prasanna CWK wrote: > Hi All > ? > I have the following question about Garbage collection of? Humongous > objects. > ? > 1)???? When will the humongous objects will get reclaimed ? > 2)???? Is there is any behaviour difference between Jdk 7 and Jdk 8 > run time ? > 3)???? I understand, in pre-jdk 8 G1 GC , the humongous objects gets > collected only through Full GC. In my application , I couldn?t see > Full GC happening for long time (running on jdk_7u40_x64) , does this > means the humongous objects stay in memory , till we have a full GC ? G1 can reclaim humongous objects... * at the end of marking in the GC Cleanup pause. * during full gc. * JDK8u60+ can also reclaim particular types of humongous objects (arrays that do _not_ consist of references to objects) at every young GC. See the release notes for 8u60 at?https://urldefense.proofpoint.com/v2/url?u=http-3A__www.oracle.com_technetwork&d=DQIFaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=zRhnqN6xuCQh8NZ-MtoiYBMlItU6r8UBO9AjZ3c3DEY&m=5pQkGSufUB_aL1XJUcW86zVuBn5xYh1XrUD5N2zcu1M&s=OKbYPqGNR3NGiLzOFh6tXk2cXLnbhFxrp8H4Svff20A&e= /java/javase/8u60-relnotes-2620227.html?under "New Features and Changes" for how to control this. (It works for any array of primitive type, not limited to the examples given there - just in case you wonder). Thanks, ? Thomas This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy. 
BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc. All rights reserved.

From brian.toal at gmail.com Tue Oct 25 17:05:02 2016
From: brian.toal at gmail.com (Brian Toal)
Date: Tue, 25 Oct 2016 10:05:02 -0700
Subject: metaspace proportion of fragmentation
In-Reply-To: 
References: 
Message-ID: 

Thanks for the reply, Thomas.

We are getting an OOME on metaspace and do not set CompressedClassSpaceSize. It seems our used compressed class space is ~100MB. If this isn't set, will it use as much of the specified -XX:MaxMetaspaceSize as needed?

We do have a lot of class loaders, after looking closely at the heap. Looks like ~11k loaders, where the majority of them are sun.reflect.DelegatingClassLoader's corresponding to reflective method instances that are being strongly referenced. I'm not sure if the chunk size is 64k, because that would lead to ~687MB of Metaspace going to the initial allocation for each loader; however, looking at the output of "jcmd GC.class_stats" and summing up the total for all DelegatingClassLoader's, it shows only ~46MB is used and the remaining classes account for ~840MB. Maybe the total accumulated chunk memory is not part of the output of "jcmd GC.class_stats"; do you know whether space allocated but unused by the loader is reported here? If not, is there any way to get this info on a production JVM?

Looking at the PDF referenced in (2), it seems like the DelegatingClassLoader's are consuming 4 small chunks of 512 words, which is 4 x 512 words x 8 bytes/word, so as a lower bound I would expect the cost of all those loaders to be ~171MB. The metaspace usage in the JVM is ~940MB and compressed class space is ~100MB, so adding the ~171MB seems to bring us to a total of ~1.18G, which is roughly the value of -XX:MaxMetaspaceSize.

I've found a few ways to limit the number of DelegatingClassLoader's, by either changing the inflation threshold or possibly just breaking the path back to the GC root of the class instance that is owned by the corresponding DelegatingClassLoader. It's a shame that the same DelegatingClassLoader isn't reused, but I suppose the finer granularity of method-to-classloader is to increase the chances that the loader can be unloaded when the method is no longer referenced.

On Tue, Oct 25, 2016 at 2:37 AM, Thomas Stüfe wrote:

> Hi Brian,
>
> On Sat, Oct 22, 2016 at 1:55 AM, Brian Toal wrote:
>
>> Good evening. In an application that I'm responsible for, Metaspace is
>> set to 1.1GB.
Specifically the following flags are set: >> >> -XX:MetaspaceSize=1152m >> -XX:MaxMetaspaceSize=1152m >> -XX:MinMetaspaceFreeRatio=0 >> -XX:MaxMetaspaceFreeRatio=100 >> >> However we are getting a OOME when metaspace size hits 80% of 1.1GB. >> > > Out of metaspace or out of compressed class space? If the latter, have you > set CompressedClassSpaceSize? > > >> Doing a bit or research it seems that Metaspace is known to fragement the >> memory when a loader needs to acquire memory from the current chunk, and >> the current chuck can't accomodate the request, the pointer is bumped to >> the next available chunk, meaning any free memory in the previous chunks >> block is gone with the wind. >> > > No. The remaining space is put into freelists (both on chunk and on block > level) and used for follow-up requests, should the size fit. In our > experience, we see very little wastage due to "half-eaten blocks/chunks. > > There are other possible waste scenarios: > > 1) you have a lot of class loaders living in parallel. Each one will take > a chunk of memory (its current chunk) and satisfy memory requests from > there. This means that the current chunk always contains a portion of > still-unused memory which may be used by this class loader in the future > but already counts against MaxMetaspaceSize. However, to make this hurt, > you really have to have very many class loaders in parallel, as the maximum > possible overhead for this scenario cannot exceed the size of a medium > chunk per classloader (64k?) > > 2) The scenario described in this JEP here: https://bugs.openjdk. > java.net/browse/JDK-8166690 . > > 3) real fragmentation (i.e. a mixture of in-use and free chunks). > > In my practice, I keep seeing (2). Hence the JEP, which will hopefully > help. > > Kind Regards, Thomas > > > >> More than happy if someone corrects my understand here or can point me to >> a good reference that explains this in detail. >> >> My question is, how to do I monitor current usage + fragmentation so the >> proportion of free space can be monitored? >> >> Also is there any tuning that can take place to reduce the proportion of >> fragmentation? >> >> Does compressed cache acquire memory from memory set aside from memory >> allocated via MaxMetaspaceSize? >> >> Thanks in advance. >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Fri Oct 28 22:43:01 2016 From: david.ely at unboundid.com (David Ely) Date: Fri, 28 Oct 2016 17:43:01 -0500 Subject: occasional ParNew times of 15+ seconds Message-ID: While typical ParNew GC times are 50ms, our application is occasionally hitting ParNew times that are over 15 seconds for one of our customers, and we have no idea why. 
Looking at the full GC log file: 382250 ParNew GCs are < 1 second 9303 are 100ms to 1 second 1267 are 1 second to 2 seconds 99 are 2 seconds to 10 seconds 24 are > 10 seconds, 48 seconds being the max The long ones are somewhat bursty as you can see from looking at the line numbers in the GC log: $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] 78987:2016-10-22T05:49:25.623+0000: 123843.312: [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] 79104:2016-10-22T05:59:26.382+0000: 124444.071: [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] 79504:2016-10-22T06:09:36.983+0000: 125054.672: [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] 79772:2016-10-22T06:30:36.130+0000: 126313.819: [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] 80087:2016-10-22T06:37:07.202+0000: 126704.891: [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] 89969:2016-10-22T13:54:27.978+0000: 152945.667: [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] 90200:2016-10-22T14:05:02.717+0000: 153580.407: [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] 90299:2016-10-22T14:14:30.521+0000: 154148.210: [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] 261329:2016-10-26T00:06:44.499+0000: 
448882.189: [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] 261935:2016-10-26T00:13:34.277+0000: 449291.967: [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] 262143:2016-10-26T00:20:09.397+0000: 449687.087: [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] 262275:2016-10-26T00:27:02.196+0000: 450099.886: [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] 262282:2016-10-26T00:27:29.448+0000: 450127.138: [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] 262631:2016-10-26T00:34:17.632+0000: 450535.321: [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] 262844:2016-10-26T00:41:08.118+0000: 450945.808: [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] 345421:2016-10-27T04:17:59.617+0000: 550357.306: [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] 345510:2016-10-27T04:24:11.721+0000: 550729.411: [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] 345514:2016-10-27T04:24:36.695+0000: 550754.385: [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] Context around a single instance is fairly normal: 345773-2016-10-27T04:31:28.032+0000: 551165.721: [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] 345774-2016-10-27T04:31:28.635+0000: 551166.324: [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] 345775-2016-10-27T04:31:29.205+0000: 551166.894: [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] 345776-2016-10-27T04:31:29.798+0000: 551167.487: [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: 
1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] 345778-2016-10-27T04:32:08.449+0000: 551206.139: [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] 345779-2016-10-27T04:32:09.090+0000: 551206.779: [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] 345780-2016-10-27T04:32:09.802+0000: 551207.491: [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] 345781-2016-10-27T04:32:10.536+0000: 551208.226: [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] 345782-2016-10-27T04:32:11.137+0000: 551208.826: [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] 345783-2016-10-27T04:32:11.642+0000: 551209.332: [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] Since the user times are high as well, I don't think this could be swapping. Here are the hard-earned set of JVM arguments that we're using: -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseBiasedLocking \ -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ -XX:+UseGCLogFileRotation \ -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log This is on Linux with Java 1.7.0_72. Does this look familiar to anyone? Alternatively, are there some more JVM options that we could include to get more information? One of the first things that we'll try is to move to a later JVM, but it will be easier to get the customer to do that if we can point to a specific issue that has been addressed. Thanks for your help. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From Peter.B.Kessler at Oracle.COM Fri Oct 28 23:30:24 2016 From: Peter.B.Kessler at Oracle.COM (Peter B. 
Kessler)
Date: Fri, 28 Oct 2016 16:30:24 -0700
Subject: occasional ParNew times of 15+ seconds
In-Reply-To: 
References: 
Message-ID: <3f2b0f43-d5ae-ce6b-f525-80d61022406e@Oracle.COM>

Look at the promotion rates (sort of) in the "context around a single instance": subtract the "after" size of one line from the "after" size of the next line. I see

   Size                              Duration   PromotedK
   -------------------------------   ---------  ---------
   49545909K->47870050K(84724992K)    0.049020
   49547874K->47872545K(84724992K)    0.047341       2495
   49550369K->47876404K(84724992K)    0.049631       3859
   49554228K->47892320K(84724992K)    0.048118      15916
   49570144K->48422333K(84724992K)   38.098495     530013
   50100157K->48528020K(84724992K)    0.067286     105687
   50205844K->48541030K(84724992K)    0.069611      13010
   50218854K->48542651K(84724992K)    0.051600       1621
   50220475K->48545932K(84724992K)    0.067547       3281
   50223756K->48540797K(84724992K)    0.063965      -5135
   50218621K->48545033K(84724992K)    0.051678       4236

(I hope that survives mail reformatting, but you get the idea.)

The interval around the long collection is not like the rest of the context. It promotes 100x the amount of memory, with some ramp up in the collection before and some ramp down in the 2 collections after. What's up with that? Even 100x longer wouldn't be 38 seconds. And the collections before and after copy more data but don't take longer. So there's something that doesn't scale about the objects being copied, too. Chasing a long list? Copying a big array of references that then also have to be promoted? Those would both happen all in one collection, not smeared out over 4 collections. But it seems like application behavior is involved, not just a failure of the collector.

Do you know what the application is doing at that time? Is it doing that at the other times with long pauses? Do the contexts for the other long pauses look like that?

Or it could be something else. It often is. :-) I'm just looking under the streetlight.

    ... peter

On 10/28/16 03:43 PM, David Ely wrote:
> While typical ParNew GC times are 50ms, our application is occasionally hitting ParNew times that are over 15 seconds for one of our customers, and we have no idea why.
Looking at the full GC log file: > > 382250 ParNew GCs are < 1 second > 9303 are 100ms to 1 second > 1267 are 1 second to 2 seconds > 99 are 2 seconds to 10 seconds > 24 are > 10 seconds, 48 seconds being the max > > The long ones are somewhat bursty as you can see from looking at the line numbers in the GC log: > > $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 > > 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] > 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] > 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] > 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] > 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] > 78987:2016-10-22T05:49:25.623+0000: 123843.312: [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] > 79104:2016-10-22T05:59:26.382+0000: 124444.071: [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] > 79504:2016-10-22T06:09:36.983+0000: 125054.672: [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] > 79772:2016-10-22T06:30:36.130+0000: 126313.819: [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] > 80087:2016-10-22T06:37:07.202+0000: 126704.891: [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] > 89969:2016-10-22T13:54:27.978+0000: 152945.667: [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] > 90200:2016-10-22T14:05:02.717+0000: 153580.407: [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] > 90299:2016-10-22T14:14:30.521+0000: 154148.210: [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 sys=11.54, 
real=13.17 secs] > 261329:2016-10-26T00:06:44.499+0000: 448882.189: [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] > 261935:2016-10-26T00:13:34.277+0000: 449291.967: [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] > 262143:2016-10-26T00:20:09.397+0000: 449687.087: [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] > 262275:2016-10-26T00:27:02.196+0000: 450099.886: [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] > 262282:2016-10-26T00:27:29.448+0000: 450127.138: [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] > 262631:2016-10-26T00:34:17.632+0000: 450535.321: [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] > 262844:2016-10-26T00:41:08.118+0000: 450945.808: [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] > 345421:2016-10-27T04:17:59.617+0000: 550357.306: [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] > 345510:2016-10-27T04:24:11.721+0000: 550729.411: [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] > 345514:2016-10-27T04:24:36.695+0000: 550754.385: [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] > > Context around a single instance is fairly normal: > > 345773-2016-10-27T04:31:28.032+0000: 551165.721: [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] > 345774-2016-10-27T04:31:28.635+0000: 551166.324: [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] > 345775-2016-10-27T04:31:29.205+0000: 551166.894: [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] > 
345776-2016-10-27T04:31:29.798+0000: 551167.487: [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] > 345778-2016-10-27T04:32:08.449+0000: 551206.139: [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] > 345779-2016-10-27T04:32:09.090+0000: 551206.779: [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] > 345780-2016-10-27T04:32:09.802+0000: 551207.491: [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] > 345781-2016-10-27T04:32:10.536+0000: 551208.226: [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] > 345782-2016-10-27T04:32:11.137+0000: 551208.826: [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] > 345783-2016-10-27T04:32:11.642+0000: 551209.332: [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] > > Since the user times are high as well, I don't think this could be swapping. > > Here are the hard-earned set of JVM arguments that we're using: > > -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ > -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ > -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ > -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ > -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ > -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseBiasedLocking \ > -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ > -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ > -XX:+UseGCLogFileRotation \ > -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ > -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log > > This is on Linux with Java 1.7.0_72. > > Does this look familiar to anyone? Alternatively, are there some more JVM options that we could include to get more information? > > One of the first things that we'll try is to move to a later JVM, but it will be easier to get the customer to do that if we can point to a specific issue that has been addressed. > > Thanks for your help. 
> > David > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From vitalyd at gmail.com Sat Oct 29 01:04:54 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 28 Oct 2016 21:04:54 -0400 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: On Friday, October 28, 2016, David Ely wrote: > While typical ParNew GC times are 50ms, our application is occasionally > hitting ParNew times that are over 15 seconds for one of our customers, and > we have no idea why. Looking at the full GC log file: > > 382250 ParNew GCs are < 1 second > 9303 are 100ms to 1 second > 1267 are 1 second to 2 seconds > 99 are 2 seconds to 10 seconds > 24 are > 10 seconds, 48 seconds being the max > > The long ones are somewhat bursty as you can see from looking at the line > numbers in the GC log: > > $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 > > 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: > 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] > 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 > sys=14.37, real=16.99 secs] > 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: > 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] > 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 > sys=11.05, real=12.75 secs] > 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: > 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] > 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 > sys=11.29, real=12.73 secs] > 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: > 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] > 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 > sys=17.45, real=18.66 secs] > 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: > 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] > 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 > sys=16.84, real=16.09 secs] > 78987:2016-10-22T05:49:25.623+0000: 123843.312: > [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: > 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), > 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] > 79104:2016-10-22T05:59:26.382+0000: 124444.071: > [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: > 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), > 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] > 79504:2016-10-22T06:09:36.983+0000: 125054.672: > [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: > 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), > 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] > 79772:2016-10-22T06:30:36.130+0000: 126313.819: > [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: > 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), > 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] > 80087:2016-10-22T06:37:07.202+0000: 126704.891: > [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: > 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), > 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] > 
89969:2016-10-22T13:54:27.978+0000: 152945.667: > [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: > 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), > 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] > 90200:2016-10-22T14:05:02.717+0000: 153580.407: > [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: > 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), > 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] > 90299:2016-10-22T14:14:30.521+0000: 154148.210: > [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: > 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), > 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] > 261329:2016-10-26T00:06:44.499+0000: 448882.189: > [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: > 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), > 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] > 261935:2016-10-26T00:13:34.277+0000: 449291.967: > [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: > 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), > 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] > 262143:2016-10-26T00:20:09.397+0000: 449687.087: > [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: > 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), > 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] > 262275:2016-10-26T00:27:02.196+0000: 450099.886: > [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: > 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), > 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] > 262282:2016-10-26T00:27:29.448+0000: 450127.138: > [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: > 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), > 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] > 262631:2016-10-26T00:34:17.632+0000: 450535.321: > [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: > 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), > 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] > 262844:2016-10-26T00:41:08.118+0000: 450945.808: > [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: > 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), > 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] > 345421:2016-10-27T04:17:59.617+0000: 550357.306: > [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: > 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), > 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] > 345510:2016-10-27T04:24:11.721+0000: 550729.411: > [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: > 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), > 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] > 345514:2016-10-27T04:24:36.695+0000: 550754.385: > [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: > 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), > 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: > [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: > 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), > 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] 
> > Context around a single instance is fairly normal: > > 345773-2016-10-27T04:31:28.032+0000: 551165.721: > [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: > 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), > 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] > 345774-2016-10-27T04:31:28.635+0000: 551166.324: > [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: > 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), > 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] > 345775-2016-10-27T04:31:29.205+0000: 551166.894: > [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: > 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), > 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] > 345776-2016-10-27T04:31:29.798+0000: 551167.487: > [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: > 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), > 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: > [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: > 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), > 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] > 345778-2016-10-27T04:32:08.449+0000: 551206.139: > [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: > 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), > 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] > 345779-2016-10-27T04:32:09.090+0000: 551206.779: > [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: > 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), > 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] > 345780-2016-10-27T04:32:09.802+0000: 551207.491: > [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: > 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), > 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] > 345781-2016-10-27T04:32:10.536+0000: 551208.226: > [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: > 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), > 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] > 345782-2016-10-27T04:32:11.137+0000: 551208.826: > [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: > 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), > 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] > 345783-2016-10-27T04:32:11.642+0000: 551209.332: > [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: > 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), > 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] > > Since the user times are high as well, I don't think this could be > swapping. > Can you ask the customer if they're using transparent hugepages (THP)? 
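For reference, whether THP is active can be checked either from a shell on the box or from inside the JVM. Below is a small Java sketch; the class name is made up for illustration, the two sysfs paths are the usual locations on mainline kernels and on RHEL/CentOS 6 respectively, and the bracketed word in the file is the active mode:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ThpCheck {
        public static void main(String[] args) throws Exception {
            // Mainline kernels expose the setting here; RHEL/CentOS 6 uses the
            // redhat_transparent_hugepage directory instead.
            Path[] candidates = {
                    Paths.get("/sys/kernel/mm/transparent_hugepage/enabled"),
                    Paths.get("/sys/kernel/mm/redhat_transparent_hugepage/enabled")
            };
            for (Path p : candidates) {
                if (Files.exists(p)) {
                    // Typical content is "[always] madvise never"; the bracketed
                    // entry is the active mode, so "[always]" means THP is on.
                    System.out.println(p + ": " + new String(Files.readAllBytes(p)).trim());
                }
            }
        }
    }

If it reports [always], disabling THP (for example with transparent_hugepage=never on the kernel command line) and then watching whether the multi-second ParNew pauses disappear is a cheap way to confirm or rule this out.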
> > Here are the hard-earned set of JVM arguments that we're using: > > -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ > -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ > -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ > -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ > -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ > -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC > -XX:+UseBiasedLocking \ > -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ > -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ > -XX:+UseGCLogFileRotation \ > -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ > -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log > > This is on Linux with Java 1.7.0_72. > > Does this look familiar to anyone? Alternatively, are there some more JVM > options that we could include to get more information? > > One of the first things that we'll try is to move to a later JVM, but it > will be easier to get the customer to do that if we can point to a specific > issue that has been addressed. > > Thanks for your help. > > David > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Sat Oct 29 13:40:02 2016 From: david.ely at unboundid.com (David Ely) Date: Sat, 29 Oct 2016 08:40:02 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: Thank you for the response. Yes. meminfo (see full output below) shows ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full output below). Looking back through previous information that we have from this customer, transparent huge pages have been turned on for years. We've asked them for anything else that might have changed in this environment. Are there any other JVM options that we could enable that would shed light on what's going on within the ParNew? Would -XX:+PrintTLAB -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? David MemTotal: 264396572 kB MemFree: 2401576 kB Buffers: 381564 kB Cached: 172673120 kB SwapCached: 0 kB Active: 163439836 kB Inactive: 90737452 kB Active(anon): 76910848 kB Inactive(anon): 4212580 kB Active(file): 86528988 kB Inactive(file): 86524872 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 16236540 kB SwapFree: 16236540 kB Dirty: 14552 kB Writeback: 0 kB AnonPages: 81111768 kB Mapped: 31312 kB Shmem: 212 kB Slab: 6078732 kB SReclaimable: 5956052 kB SUnreclaim: 122680 kB KernelStack: 41296 kB PageTables: 171324 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 148434824 kB Committed_AS: 93124984 kB VmallocTotal: 34359738367 kB VmallocUsed: 686780 kB VmallocChunk: 34225639420 kB HardwareCorrupted: 0 kB *AnonHugePages: 80519168 kB* HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 5132 kB DirectMap2M: 1957888 kB DirectMap1G: 266338304 kB On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich wrote: > > > On Friday, October 28, 2016, David Ely wrote: > >> While typical ParNew GC times are 50ms, our application is occasionally >> hitting ParNew times that are over 15 seconds for one of our customers, and >> we have no idea why. 
Looking at the full GC log file: >> >> 382250 ParNew GCs are < 1 second >> 9303 are 100ms to 1 second >> 1267 are 1 second to 2 seconds >> 99 are 2 seconds to 10 seconds >> 24 are > 10 seconds, 48 seconds being the max >> >> The long ones are somewhat bursty as you can see from looking at the line >> numbers in the GC log: >> >> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >> >> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >> [GC2016-10-22T14:14:30.521+0000: 154148.211: 
[ParNew: >> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >> >> Context around a single instance is fairly normal: >> >> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >> 0.0473410 secs] [Times: user=1.41 sys=0.04, 
real=0.05 secs] >> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >> >> Since the user times are high as well, I don't think this could be >> swapping. >> > Can you ask the customer if they're using transparent hugepages (THP)? > >> >> Here are the hard-earned set of JVM arguments that we're using: >> >> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >> -XX:+UseBiasedLocking \ >> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ >> -XX:+UseGCLogFileRotation \ >> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >> >> This is on Linux with Java 1.7.0_72. >> >> Does this look familiar to anyone? Alternatively, are there some more JVM >> options that we could include to get more information? 
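A note on additional diagnostics: on JDK 7, safepoint statistics can show whether a long stop is spent bringing threads to a safepoint or inside the ParNew itself. A minimal set of flags that could be added for this - they are standard HotSpot product flags, though the per-safepoint output goes to stdout rather than the GC log:

  -XX:+PrintGCApplicationStoppedTime \
  -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

The "sync" column of the safepoint output is effectively time-to-safepoint; if it stays small during the 15+ second events, the time is going into the collection itself rather than into reaching the safepoint.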
>> >> One of the first things that we'll try is to move to a later JVM, but it >> will be easier to get the customer to do that if we can point to a specific >> issue that has been addressed. >> >> Thanks for your help. >> >> David >> > > > -- > Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From david.ely at unboundid.com Sat Oct 29 14:56:57 2016 From: david.ely at unboundid.com (David Ely) Date: Sat, 29 Oct 2016 09:56:57 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: <3f2b0f43-d5ae-ce6b-f525-80d61022406e@Oracle.COM> References: <3f2b0f43-d5ae-ce6b-f525-80d61022406e@Oracle.COM> Message-ID: Thanks for the response. It does seem to be related to the amount of data promoted, but that isn't the only factor at play. Here's a plot of the amount of data promoted per ParNew above the ParNew duration for a two hour window: [image: Inline image 2] As you can see, long ParNews imply a large promotion but not the reverse. What second factor might be involved? We're looking into what is different in the application at this time. The majority of the heap, and hence of the promoted data, is part of a Berkeley DB Java Edition database cache. The database cache holds all of the data and is otherwise stable. There are other activities like database checkpointing and cleaning that happen in the background, but those are going on all of the time. Are there any more JVM options that could shed light on what's happening during the ParNew collections? David -------------- next part -------------- An HTML attachment was scrubbed... URL:
From vitalyd at gmail.com Sat Oct 29 15:07:58 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Sat, 29 Oct 2016 11:07:58 -0400 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: David, Ask them to turn off THP - it's a known source of large latency due to the kernel doing page defragmentation; your app takes a page fault, and boom - the kernel may start doing defragmentation to make a huge page available. You can search online for THP issues. The symptoms are similar to yours - very high sys time. If they turn it off and still get the same lengthy parnew pauses, then it's clearly something else, but at least we'll eliminate THP as the culprit. On Saturday, October 29, 2016, David Ely wrote: > Thank you for the response. Yes. meminfo (see full output below) shows > ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full > output below). Looking back through previous information that we have from > this customer, transparent huge pages have been turned on for years. > We've asked them for anything else that might have changed in this > environment. > > Are there any other JVM options that we could enable that would shed light > on what's going on within the ParNew? Would -XX:+PrintTLAB -XX:+PrintPLAB > -XX:PrintFLSStatistics=1 show anything useful?
> > David > > > MemTotal: 264396572 kB > MemFree: 2401576 kB > Buffers: 381564 kB > Cached: 172673120 kB > SwapCached: 0 kB > Active: 163439836 kB > Inactive: 90737452 kB > Active(anon): 76910848 kB > Inactive(anon): 4212580 kB > Active(file): 86528988 kB > Inactive(file): 86524872 kB > Unevictable: 0 kB > Mlocked: 0 kB > SwapTotal: 16236540 kB > SwapFree: 16236540 kB > Dirty: 14552 kB > Writeback: 0 kB > AnonPages: 81111768 kB > Mapped: 31312 kB > Shmem: 212 kB > Slab: 6078732 kB > SReclaimable: 5956052 kB > SUnreclaim: 122680 kB > KernelStack: 41296 kB > PageTables: 171324 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > WritebackTmp: 0 kB > CommitLimit: 148434824 kB > Committed_AS: 93124984 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 686780 kB > VmallocChunk: 34225639420 kB > HardwareCorrupted: 0 kB > *AnonHugePages: 80519168 kB* > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > DirectMap4k: 5132 kB > DirectMap2M: 1957888 kB > DirectMap1G: 266338304 kB > > > On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich > wrote: > >> >> >> On Friday, October 28, 2016, David Ely > > wrote: >> >>> While typical ParNew GC times are 50ms, our application is occasionally >>> hitting ParNew times that are over 15 seconds for one of our customers, and >>> we have no idea why. Looking at the full GC log file: >>> >>> 382250 ParNew GCs are < 1 second >>> 9303 are 100ms to 1 second >>> 1267 are 1 second to 2 seconds >>> 99 are 2 seconds to 10 seconds >>> 24 are > 10 seconds, 48 seconds being the max >>> >>> The long ones are somewhat bursty as you can see from looking at the >>> line numbers in the GC log: >>> >>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>> >>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>> 
79504:2016-10-22T06:09:36.983+0000: 125054.672: >>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>> 1695052K->22991K(1887488K), 33.8707510 
secs] 46334738K->45187822K(84724992K), >>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>> >>> Context around a single instance is fairly normal: >>> >>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>> 
345783-2016-10-27T04:32:11.642+0000: 551209.332: >>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>> >>> Since the user times are high as well, I don't think this could be >>> swapping. >>> >> Can you ask the customer if they're using transparent hugepages (THP)? >> >>> >>> Here are the hard-earned set of JVM arguments that we're using: >>> >>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>> -XX:+UseBiasedLocking \ >>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ >>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ >>> -XX:+UseGCLogFileRotation \ >>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>> >>> This is on Linux with Java 1.7.0_72. >>> >>> Does this look familiar to anyone? Alternatively, are there some more >>> JVM options that we could include to get more information? >>> >>> One of the first things that we'll try is to move to a later JVM, but it >>> will be easier to get the customer to do that if we can point to a specific >>> issue that has been addressed. >>> >>> Thanks for your help. >>> >>> David >>> >> >> >> -- >> Sent from my phone >> > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Sat Oct 29 23:15:37 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Sat, 29 Oct 2016 18:15:37 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: +1 on disabling THP Charlie > On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich wrote: > > David, > > Ask them to turn off THP - it's a known source of large latency due to the kernel doing page defragmentation; your app takes a page fault, and boom - the kernel may start doing defragmentation to make a huge page available. You can search online for THP issues. The symptoms are similar to yours - very high sys time. > > If they turn it off and still get same lengthy parnew pauses, then it's clearly something else but at least we'll eliminate THP as the culprit. > >> On Saturday, October 29, 2016, David Ely wrote: >> Thank you for the response. Yes. meminfo (see full output below) shows ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full output below). Looking back through previous information that we have from this customer, transparent huge pages have been turned on for years. We've asked them for anything else that might have changed in this environment. >> >> Are there any other JVM options that we could enable that would shed light on what's going on within the ParNew? Would -XX:+PrintTLAB -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? 
>> >> David >> >> >> MemTotal: 264396572 kB >> MemFree: 2401576 kB >> Buffers: 381564 kB >> Cached: 172673120 kB >> SwapCached: 0 kB >> Active: 163439836 kB >> Inactive: 90737452 kB >> Active(anon): 76910848 kB >> Inactive(anon): 4212580 kB >> Active(file): 86528988 kB >> Inactive(file): 86524872 kB >> Unevictable: 0 kB >> Mlocked: 0 kB >> SwapTotal: 16236540 kB >> SwapFree: 16236540 kB >> Dirty: 14552 kB >> Writeback: 0 kB >> AnonPages: 81111768 kB >> Mapped: 31312 kB >> Shmem: 212 kB >> Slab: 6078732 kB >> SReclaimable: 5956052 kB >> SUnreclaim: 122680 kB >> KernelStack: 41296 kB >> PageTables: 171324 kB >> NFS_Unstable: 0 kB >> Bounce: 0 kB >> WritebackTmp: 0 kB >> CommitLimit: 148434824 kB >> Committed_AS: 93124984 kB >> VmallocTotal: 34359738367 kB >> VmallocUsed: 686780 kB >> VmallocChunk: 34225639420 kB >> HardwareCorrupted: 0 kB >> AnonHugePages: 80519168 kB >> HugePages_Total: 0 >> HugePages_Free: 0 >> HugePages_Rsvd: 0 >> HugePages_Surp: 0 >> Hugepagesize: 2048 kB >> DirectMap4k: 5132 kB >> DirectMap2M: 1957888 kB >> DirectMap1G: 266338304 kB >> >> >>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich wrote: >>> >>> >>>> On Friday, October 28, 2016, David Ely wrote: >>>> While typical ParNew GC times are 50ms, our application is occasionally hitting ParNew times that are over 15 seconds for one of our customers, and we have no idea why. Looking at the full GC log file: >>>> >>>> 382250 ParNew GCs are < 1 second >>>> 9303 are 100ms to 1 second >>>> 1267 are 1 second to 2 seconds >>>> 99 are 2 seconds to 10 seconds >>>> 24 are > 10 seconds, 48 seconds being the max >>>> >>>> The long ones are somewhat bursty as you can see from looking at the line numbers in the GC log: >>>> >>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>> >>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>> 79504:2016-10-22T06:09:36.983+0000: 
125054.672: [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: [GC2016-10-27T04:24:11.722+0000: 
550729.411: [ParNew: 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> >>>> Context around a single instance is fairly normal: >>>> >>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>> >>>> Since the user times are high as well, I don't think this could be swapping. 
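To confirm that swapping really is off the table, the swap counters can be watched directly while one of the long pauses is reproduced; for example, on a reasonably recent Linux:

  vmstat 5                                  # si/so columns should stay at 0
  grep -E 'pswpin|pswpout' /proc/vmstat     # cumulative swap-in/swap-out page counts

With swapping ruled out, high sys time inside a ParNew usually points at the kernel doing work on the collector's behalf - page faults, page migration, or THP defragmentation - which is what the THP question below is getting at.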
>>> >>> Can you ask the customer if they're using transparent hugepages (THP)? >>>> >>>> Here are the hard-earned set of JVM arguments that we're using: >>>> >>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseBiasedLocking \ >>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ >>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ >>>> -XX:+UseGCLogFileRotation \ >>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>> >>>> This is on Linux with Java 1.7.0_72. >>>> >>>> Does this look familiar to anyone? Alternatively, are there some more JVM options that we could include to get more information? >>>> >>>> One of the first things that we'll try is to move to a later JVM, but it will be easier to get the customer to do that if we can point to a specific issue that has been addressed. >>>> >>>> Thanks for your help. >>>> >>>> David >>> >>> >>> -- >>> Sent from my phone >> > > > -- > Sent from my phone > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Sun Oct 30 14:14:38 2016 From: david.ely at unboundid.com (David Ely) Date: Sun, 30 Oct 2016 09:14:38 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: Thank you Vitaly and Charlie. We will have them disable THP, move to a later version of the JVM, and add in some additional GC logging JVM options. Looking more at the GC log, it appears that the long ParNew pauses only occur when the old generation usage is at least half of the distance between the live size and when CMS is triggered via CMSInitiatingOccupancyFraction. After a CMS collection, the long pauses stop. However, there are plenty of CMS cycles where we don't see any long pauses, and there are plenty of places where we promote the same amount of data associated with a long pause but don't experience a long pause. Is this behavior consistent with the THP diagnosis? David On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt wrote: > +1 on disabling THP > > Charlie > > On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich wrote: > > David, > > Ask them to turn off THP - it's a known source of large latency due to the > kernel doing page defragmentation; your app takes a page fault, and boom - > the kernel may start doing defragmentation to make a huge page available. > You can search online for THP issues. The symptoms are similar to yours - > very high sys time. > > If they turn it off and still get same lengthy parnew pauses, then it's > clearly something else but at least we'll eliminate THP as the culprit. > > On Saturday, October 29, 2016, David Ely wrote: > >> Thank you for the response. Yes. meminfo (see full output below) shows >> ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full >> output below). 
Looking back through previous information that we have from >> this customer, transparent huge pages have been turned on for years. >> We've asked them for anything else that might have changed in this >> environment. >> >> Are there any other JVM options that we could enable that would shed >> light on what's going on within the ParNew? Would -XX:+PrintTLAB >> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? >> >> David >> >> >> MemTotal: 264396572 kB >> MemFree: 2401576 kB >> Buffers: 381564 kB >> Cached: 172673120 kB >> SwapCached: 0 kB >> Active: 163439836 kB >> Inactive: 90737452 kB >> Active(anon): 76910848 kB >> Inactive(anon): 4212580 kB >> Active(file): 86528988 kB >> Inactive(file): 86524872 kB >> Unevictable: 0 kB >> Mlocked: 0 kB >> SwapTotal: 16236540 kB >> SwapFree: 16236540 kB >> Dirty: 14552 kB >> Writeback: 0 kB >> AnonPages: 81111768 kB >> Mapped: 31312 kB >> Shmem: 212 kB >> Slab: 6078732 kB >> SReclaimable: 5956052 kB >> SUnreclaim: 122680 kB >> KernelStack: 41296 kB >> PageTables: 171324 kB >> NFS_Unstable: 0 kB >> Bounce: 0 kB >> WritebackTmp: 0 kB >> CommitLimit: 148434824 kB >> Committed_AS: 93124984 kB >> VmallocTotal: 34359738367 kB >> VmallocUsed: 686780 kB >> VmallocChunk: 34225639420 kB >> HardwareCorrupted: 0 kB >> *AnonHugePages: 80519168 kB* >> HugePages_Total: 0 >> HugePages_Free: 0 >> HugePages_Rsvd: 0 >> HugePages_Surp: 0 >> Hugepagesize: 2048 kB >> DirectMap4k: 5132 kB >> DirectMap2M: 1957888 kB >> DirectMap1G: 266338304 kB >> >> >> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich >> wrote: >> >>> >>> >>> On Friday, October 28, 2016, David Ely wrote: >>> >>>> While typical ParNew GC times are 50ms, our application is occasionally >>>> hitting ParNew times that are over 15 seconds for one of our customers, and >>>> we have no idea why. 
Looking at the full GC log file: >>>> >>>> 382250 ParNew GCs are < 1 second >>>> 9303 are 100ms to 1 second >>>> 1267 are 1 second to 2 seconds >>>> 99 are 2 seconds to 10 seconds >>>> 24 are > 10 seconds, 48 seconds being the max >>>> >>>> The long ones are somewhat bursty as you can see from looking at the >>>> line numbers in the GC log: >>>> >>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>> >>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >>>> 17.3433490 secs] [Times: user=554.39 
sys=15.81, real=17.34 secs] >>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> >>>> Context around a single instance is fairly normal: >>>> >>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>> 
345774-2016-10-27T04:31:28.635+0000: 551166.324: >>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>> >>>> Since the user times are high as well, I don't think this could be >>>> swapping. >>>> >>> Can you ask the customer if they're using transparent hugepages (THP)? 
>>> >>>> >>>> Here are the hard-earned set of JVM arguments that we're using: >>>> >>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>>> -XX:+UseBiasedLocking \ >>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages >>>> \ >>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags >>>> \ >>>> -XX:+UseGCLogFileRotation \ >>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>> >>>> This is on Linux with Java 1.7.0_72. >>>> >>>> Does this look familiar to anyone? Alternatively, are there some more >>>> JVM options that we could include to get more information? >>>> >>>> One of the first things that we'll try is to move to a later JVM, but >>>> it will be easier to get the customer to do that if we can point to a >>>> specific issue that has been addressed. >>>> >>>> Thanks for your help. >>>> >>>> David >>>> >>> >>> >>> -- >>> Sent from my phone >>> >> >> > > -- > Sent from my phone > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Sun Oct 30 18:56:24 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Sun, 30 Oct 2016 14:56:24 -0400 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: On Sunday, October 30, 2016, David Ely wrote: > Thank you Vitaly and Charlie. We will have them disable THP, move to a > later version of the JVM, and add in some additional GC logging JVM options. > > Looking more at the GC log, it appears that the long ParNew pauses only > occur when the old generation usage is at least half of the distance > between the live size and when CMS is triggered via > CMSInitiatingOccupancyFraction. After a CMS collection, the long pauses > stop. However, there are plenty of CMS cycles where we don't see any long > pauses, and there are plenty of places where we promote the same amount of > data associated with a long pause but don't experience a long pause. > > Is this behavior consistent with the THP diagnosis? > The very high sys time is unusual for a parnew collection. THP defrag is one possible known cause of that. It's certainly possible there's something else going on, but turning off THP is a good start in troubleshooting; even if it's not the cause here, it may bite your customer later. A few more questions in the meantime: 1) are these parnew tails reproducible? 2) is this running on bare metal or VM? 3) what's the hardware spec? If you can have the customer disable THP without bumping the JVM version, it would help pinpoint the issue. But, I understand if you just want to fix the issue asap. 
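For that troubleshooting step, a minimal sketch of how THP can be checked and turned off at runtime on most distributions; the sysfs path is /sys/kernel/mm/redhat_transparent_hugepage/... on older RHEL 6 kernels, and <jvm-pid> below is a placeholder for the process id:

  # current THP allocation and defrag policy
  cat /sys/kernel/mm/transparent_hugepage/enabled
  cat /sys/kernel/mm/transparent_hugepage/defrag

  # how much of this JVM is currently backed by transparent huge pages
  # (on kernels that report AnonHugePages in smaps)
  grep AnonHugePages /proc/<jvm-pid>/smaps | awk '{ sum += $2 } END { print sum " kB" }'

  # disable as root; transparent_hugepage=never on the kernel command line
  # makes the setting persistent across reboots
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/defrag

Disabling THP on a live system may not split huge pages that are already mapped, so restarting the JVM afterwards is the safest way to be sure the change has fully taken effect.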
> > David > > On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt > wrote: > >> +1 on disabling THP >> >> Charlie >> >> On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich > > wrote: >> >> David, >> >> Ask them to turn off THP - it's a known source of large latency due to >> the kernel doing page defragmentation; your app takes a page fault, and >> boom - the kernel may start doing defragmentation to make a huge page >> available. You can search online for THP issues. The symptoms are similar >> to yours - very high sys time. >> >> If they turn it off and still get same lengthy parnew pauses, then it's >> clearly something else but at least we'll eliminate THP as the culprit. >> >> On Saturday, October 29, 2016, David Ely > > wrote: >> >>> Thank you for the response. Yes. meminfo (see full output below) shows >>> ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full >>> output below). Looking back through previous information that we have from >>> this customer, transparent huge pages have been turned on for years. >>> We've asked them for anything else that might have changed in this >>> environment. >>> >>> Are there any other JVM options that we could enable that would shed >>> light on what's going on within the ParNew? Would -XX:+PrintTLAB >>> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? >>> >>> David >>> >>> >>> MemTotal: 264396572 kB >>> MemFree: 2401576 kB >>> Buffers: 381564 kB >>> Cached: 172673120 kB >>> SwapCached: 0 kB >>> Active: 163439836 kB >>> Inactive: 90737452 kB >>> Active(anon): 76910848 kB >>> Inactive(anon): 4212580 kB >>> Active(file): 86528988 kB >>> Inactive(file): 86524872 kB >>> Unevictable: 0 kB >>> Mlocked: 0 kB >>> SwapTotal: 16236540 kB >>> SwapFree: 16236540 kB >>> Dirty: 14552 kB >>> Writeback: 0 kB >>> AnonPages: 81111768 kB >>> Mapped: 31312 kB >>> Shmem: 212 kB >>> Slab: 6078732 kB >>> SReclaimable: 5956052 kB >>> SUnreclaim: 122680 kB >>> KernelStack: 41296 kB >>> PageTables: 171324 kB >>> NFS_Unstable: 0 kB >>> Bounce: 0 kB >>> WritebackTmp: 0 kB >>> CommitLimit: 148434824 kB >>> Committed_AS: 93124984 kB >>> VmallocTotal: 34359738367 kB >>> VmallocUsed: 686780 kB >>> VmallocChunk: 34225639420 kB >>> HardwareCorrupted: 0 kB >>> *AnonHugePages: 80519168 kB* >>> HugePages_Total: 0 >>> HugePages_Free: 0 >>> HugePages_Rsvd: 0 >>> HugePages_Surp: 0 >>> Hugepagesize: 2048 kB >>> DirectMap4k: 5132 kB >>> DirectMap2M: 1957888 kB >>> DirectMap1G: 266338304 kB >>> >>> >>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich >>> wrote: >>> >>>> >>>> >>>> On Friday, October 28, 2016, David Ely wrote: >>>> >>>>> While typical ParNew GC times are 50ms, our application is >>>>> occasionally hitting ParNew times that are over 15 seconds for one of our >>>>> customers, and we have no idea why. 
Looking at the full GC log file: >>>>> >>>>> 382250 ParNew GCs are < 1 second >>>>> 9303 are 100ms to 1 second >>>>> 1267 are 1 second to 2 seconds >>>>> 99 are 2 seconds to 10 seconds >>>>> 24 are > 10 seconds, 48 seconds being the max >>>>> >>>>> The long ones are somewhat bursty as you can see from looking at the >>>>> line numbers in the GC log: >>>>> >>>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>>> >>>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >>>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>>>> 1684626K->7078K(1887488K), 17.3424650 secs] 
50361539K->48947648K(84724992K), >>>>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >>>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>> >>>>> Context around a single instance is fairly normal: >>>>> >>>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>>>> 
1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >>>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>>>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >>>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>>> >>>>> Since the user times are high as well, I don't think this could be >>>>> swapping. >>>>> >>>> Can you ask the customer if they're using transparent hugepages (THP)? 
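A quick way to answer the THP question on the affected hosts is to look at the sysfs knobs and at how much anonymous memory is currently backed by huge pages. This is only a sketch: the writes need root, and on older RHEL 6 kernels the directory is /sys/kernel/mm/redhat_transparent_hugepage instead.

# is THP on, and is synchronous defrag active? the bracketed value is the current setting
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
# how much anonymous memory is huge-page backed right now
grep AnonHugePages /proc/meminfo
# THP and compaction activity counters, where the kernel exposes them
egrep 'thp|compact' /proc/vmstat
# disabling THP for a test run, which is what the thread converges on
# (run as root; add transparent_hugepage=never to the kernel boot line to make it permanent)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag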
>>>> >>>>> >>>>> Here are the hard-earned set of JVM arguments that we're using: >>>>> >>>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled >>>>> \ >>>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>>>> -XX:+UseBiasedLocking \ >>>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar >>>>> -XX:+UseLargePages \ >>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>>> -XX:+PrintCommandLineFlags \ >>>>> -XX:+UseGCLogFileRotation \ >>>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>>> >>>>> This is on Linux with Java 1.7.0_72. >>>>> >>>>> Does this look familiar to anyone? Alternatively, are there some more >>>>> JVM options that we could include to get more information? >>>>> >>>>> One of the first things that we'll try is to move to a later JVM, but >>>>> it will be easier to get the customer to do that if we can point to a >>>>> specific issue that has been addressed. >>>>> >>>>> Thanks for your help. >>>>> >>>>> David >>>>> >>>> >>>> >>>> -- >>>> Sent from my phone >>>> >>> >>> >> >> -- >> Sent from my phone >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Mon Oct 31 00:09:38 2016 From: david.ely at unboundid.com (David Ely) Date: Sun, 30 Oct 2016 19:09:38 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: Thanks again Vitaly. Responses inline. On Sun, Oct 30, 2016 at 1:56 PM, Vitaly Davidovich wrote: > > > On Sunday, October 30, 2016, David Ely wrote: > >> Thank you Vitaly and Charlie. We will have them disable THP, move to a >> later version of the JVM, and add in some additional GC logging JVM options. >> >> Looking more at the GC log, it appears that the long ParNew pauses only >> occur when the old generation usage is at least half of the distance >> between the live size and when CMS is triggered via >> CMSInitiatingOccupancyFraction. After a CMS collection, the long pauses >> stop. However, there are plenty of CMS cycles where we don't see any long >> pauses, and there are plenty of places where we promote the same amount of >> data associated with a long pause but don't experience a long pause. >> >> Is this behavior consistent with the THP diagnosis? >> > The very high sys time is unusual for a parnew collection. THP defrag is > one possible known cause of that. It's certainly possible there's > something else going on, but turning off THP is a good start in > troubleshooting; even if it's not the cause here, it may bite your customer > later. > The sys times are high, but they are not especially high relative to the user times. The ratio across all of the ParNew collections is about the same. > > A few more questions in the meantime: > > 1) are these parnew tails reproducible? > I believe so. They are seeing it on multiple systems. 
It seems to have gotten worse on the newer systems, which have 256GB of RAM compared to 96GB. > 2) is this running on bare metal or VM? > Bare metal. > 3) what's the hardware spec? > These specific pauses on hardware they acquired recently. Java sees 48 CPUs, and it has 256GB of RAM. > > If you can have the customer disable THP without bumping the JVM version, > it would help pinpoint the issue. But, I understand if you just want to > fix the issue asap. > Since they are seeing this on multiple systems, they should be able to have at least one where they only disable THP. They'll have to put these changes through their testing environment, so it might be a little while before I'll have an update. > > >> >> On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt >> wrote: >> >>> +1 on disabling THP >>> >>> Charlie >>> >>> On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich >>> wrote: >>> >>> David, >>> >>> Ask them to turn off THP - it's a known source of large latency due to >>> the kernel doing page defragmentation; your app takes a page fault, and >>> boom - the kernel may start doing defragmentation to make a huge page >>> available. You can search online for THP issues. The symptoms are similar >>> to yours - very high sys time. >>> >>> If they turn it off and still get same lengthy parnew pauses, then it's >>> clearly something else but at least we'll eliminate THP as the culprit. >>> >>> On Saturday, October 29, 2016, David Ely >>> wrote: >>> >>>> Thank you for the response. Yes. meminfo (see full output below) shows >>>> ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full >>>> output below). Looking back through previous information that we have from >>>> this customer, transparent huge pages have been turned on for years. >>>> We've asked them for anything else that might have changed in this >>>> environment. >>>> >>>> Are there any other JVM options that we could enable that would shed >>>> light on what's going on within the ParNew? Would -XX:+PrintTLAB >>>> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? 
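Beyond the TLAB/PLAB/free-list flags asked about here, the additions that usually help with pause-time outliers on a JDK 7 HotSpot are the stopped-time and safepoint statistics, which show whether the time goes into the collection itself or into reaching and leaving the safepoint. A sketch of what could be appended to the existing -XX:+PrintGCDetails set (flag spellings are the standard JDK 7 ones, not verified specifically against 7u72):

# extra pause diagnostics (the safepoint statistics go to stdout, not the -Xloggc file)
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
-XX:+PrintTLAB -XX:PrintFLSStatistics=1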
>>>> >>>> David >>>> >>>> >>>> MemTotal: 264396572 kB >>>> MemFree: 2401576 kB >>>> Buffers: 381564 kB >>>> Cached: 172673120 kB >>>> SwapCached: 0 kB >>>> Active: 163439836 kB >>>> Inactive: 90737452 kB >>>> Active(anon): 76910848 kB >>>> Inactive(anon): 4212580 kB >>>> Active(file): 86528988 kB >>>> Inactive(file): 86524872 kB >>>> Unevictable: 0 kB >>>> Mlocked: 0 kB >>>> SwapTotal: 16236540 kB >>>> SwapFree: 16236540 kB >>>> Dirty: 14552 kB >>>> Writeback: 0 kB >>>> AnonPages: 81111768 kB >>>> Mapped: 31312 kB >>>> Shmem: 212 kB >>>> Slab: 6078732 kB >>>> SReclaimable: 5956052 kB >>>> SUnreclaim: 122680 kB >>>> KernelStack: 41296 kB >>>> PageTables: 171324 kB >>>> NFS_Unstable: 0 kB >>>> Bounce: 0 kB >>>> WritebackTmp: 0 kB >>>> CommitLimit: 148434824 kB >>>> Committed_AS: 93124984 kB >>>> VmallocTotal: 34359738367 kB >>>> VmallocUsed: 686780 kB >>>> VmallocChunk: 34225639420 kB >>>> HardwareCorrupted: 0 kB >>>> *AnonHugePages: 80519168 kB* >>>> HugePages_Total: 0 >>>> HugePages_Free: 0 >>>> HugePages_Rsvd: 0 >>>> HugePages_Surp: 0 >>>> Hugepagesize: 2048 kB >>>> DirectMap4k: 5132 kB >>>> DirectMap2M: 1957888 kB >>>> DirectMap1G: 266338304 kB >>>> >>>> >>>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich >>>> wrote: >>>> >>>>> >>>>> >>>>> On Friday, October 28, 2016, David Ely >>>>> wrote: >>>>> >>>>>> While typical ParNew GC times are 50ms, our application is >>>>>> occasionally hitting ParNew times that are over 15 seconds for one of our >>>>>> customers, and we have no idea why. Looking at the full GC log file: >>>>>> >>>>>> 382250 ParNew GCs are < 1 second >>>>>> 9303 are 100ms to 1 second >>>>>> 1267 are 1 second to 2 seconds >>>>>> 99 are 2 seconds to 10 seconds >>>>>> 24 are > 10 seconds, 48 seconds being the max >>>>>> >>>>>> The long ones are somewhat bursty as you can see from looking at the >>>>>> line numbers in the GC log: >>>>>> >>>>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>>>> >>>>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>>>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>>>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>>>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>>>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>>>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>>>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>>>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>>>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>>>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>>>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>>>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>>>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>>>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>>>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>>>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>>>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>>>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>>>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, 
real=10.99 secs] >>>>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>>>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>>>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>>>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >>>>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>>>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>>>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>>>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>>>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>>>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>>>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>>>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>>>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>>>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>>>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>>>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>>>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>>>>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >>>>>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>>>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>>>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>>>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>>>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>>>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>>>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>>>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>>>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>>>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>>>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>>>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>>>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>>>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>>>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>>>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>>>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>>>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>>>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>>>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>>>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>>>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>>>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, 
real=21.41 secs] >>>>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>>>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>>>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>>>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>>>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>>>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >>>>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>>>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>>>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>>>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>>>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>>>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>>>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>>> >>>>>> Context around a single instance is fairly normal: >>>>>> >>>>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>>>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>>>>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>>>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >>>>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>>>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>>>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>>>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>>>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>>>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>>>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>>>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>>>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>>>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>>>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>>>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>>>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>>>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>>>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>>>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>>>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>>>>> 
0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>>>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>>>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>>>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>>>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>>>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>>>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >>>>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>>>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>>>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>>>> >>>>>> Since the user times are high as well, I don't think this could be >>>>>> swapping. >>>>>> >>>>> Can you ask the customer if they're using transparent hugepages (THP)? >>>>> >>>>>> >>>>>> Here are the hard-earned set of JVM arguments that we're using: >>>>>> >>>>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled >>>>>> \ >>>>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>>>>> -XX:+UseBiasedLocking \ >>>>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M >>>>>> \ >>>>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar >>>>>> -XX:+UseLargePages \ >>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>>>> -XX:+PrintCommandLineFlags \ >>>>>> -XX:+UseGCLogFileRotation \ >>>>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>>>> >>>>>> This is on Linux with Java 1.7.0_72. >>>>>> >>>>>> Does this look familiar to anyone? Alternatively, are there some more >>>>>> JVM options that we could include to get more information? >>>>>> >>>>>> One of the first things that we'll try is to move to a later JVM, but >>>>>> it will be easier to get the customer to do that if we can point to a >>>>>> specific issue that has been addressed. >>>>>> >>>>>> Thanks for your help. >>>>>> >>>>>> David >>>>>> >>>>> >>>>> >>>>> -- >>>>> Sent from my phone >>>>> >>>> >>>> >>> >>> -- >>> Sent from my phone >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> > > -- > Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From graphhopper at gmx.de Mon Oct 31 18:07:38 2016 From: graphhopper at gmx.de (Peter) Date: Mon, 31 Oct 2016 19:07:38 +0100 Subject: Big speed difference for G1 vs. parallel GC Message-ID: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> Hi, I've stumbled today* over a big speed difference for code execution with G1 GC vs. parallel GC also in the latest JDK8 (1.8.0_111-b14). Maybe you have interests to investigate this. 
You should be able to reproduce this via: # setup git clone https://github.com/graphhopper/graphhopper wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf cd graphhopper # run measurement export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" # the graphhopper.sh script just makes the installation of maven and bundling the jar a bit simpler # you can also execute the tests in the class Measurement.java ./graphhopper.sh clean ./graphhopper.sh measurement berlin-latest.osm.pbf # now a measurement-.properties is created: grep routing.mean measurement-XY.properties Now this should print a line where the value is in ms. E.g. I get ~450ms for the parallel GC and ~780ms for G1GC (on an old laptop). When I increase the Xmx for the G1 run to 1400m the results do NOT get closer to parallel GC! Let me know if you need more information! Regards Peter * https://github.com/graphhopper/graphhopper/issues/854 -- GraphHopper.com - fast and flexible route planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecki at zusammenkunft.net Mon Oct 31 19:52:06 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Mon, 31 Oct 2016 19:52:06 +0000 (UTC) Subject: Big speed difference for G1 vs. parallel GC In-Reply-To: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> References: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> Message-ID: <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> Hello, Since this is measuring a short workload after vom startup it might not be the best benchmark, but then again throughput GC is expected to be faster than G1. In the particular case however I guess you could tune G1 a bit to that workload. Did you check the verbose GC logs, and how many CPUs does Java see/use? Gruss Bernd -- http://bernd.eckenfels.net On Mon, Oct 31, 2016 at 8:38 PM +0100, "Peter" wrote: Hi, I've stumbled today* over a big speed difference for code execution with G1 GC vs. parallel GC also in the latest JDK8 (1.8.0_111-b14). Maybe you have interests to investigate this. You should be able to reproduce this via: # setup git clone https://github.com/graphhopper/graphhopper wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf cd graphhopper # run measurement export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" # the graphhopper.sh script just makes the installation of maven and bundling the jar a bit simpler # you can also execute the tests in the class Measurement.java ./graphhopper.sh clean ./graphhopper.sh measurement berlin-latest.osm.pbf # now a measurement-.properties is created: grep routing.mean measurement-XY.properties Now this should print a line where the value is in ms. E.g. I get ~450ms for the parallel GC and ~780ms for G1GC (on an old laptop). When I increase the Xmx for the G1 run to 1400m the results do NOT get closer to parallel GC! Let me know if you need more information! Regards Peter * https://github.com/graphhopper/graphhopper/issues/854 -- GraphHopper.com - fast and flexible route planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From graphhopper at gmx.de Mon Oct 31 20:47:35 2016 From: graphhopper at gmx.de (Peter) Date: Mon, 31 Oct 2016 21:47:35 +0100 Subject: Big speed difference for G1 vs. 
parallel GC In-Reply-To: <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> References: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> Message-ID: <0968f2eb-606c-2a62-a3d5-afde77bac47a@gmx.de> Hi Bernd, why do you think it is measuring a short workload? 'short' in which terms? The overall test suite takes roughly 3 minutes but can be increased easily via increasing the number of road routing queries. BTW: with routing.mean we measure the latency of every road routing query, at least I think so ;) > Did you check the verbose GC logs, and how many CPUs does Java see/use? Nothing suspicious in the GC logs IMO, except that G1 produces much more output. Still this reminded me of another mistake I made recently (not disabling swapping) and so I went to my dev server (instead of laptop) where this is already done and have more RAM there (32g), still using just 1000m and the results are a bit better: 320ms vs. only 235ms, so G1 is only ~25% slower. What differences are expected here ... let's say 'maximum'? BTW: CPU usage on the server is roughly 200-240% for G1 and 100-120% for the parallel GC, so the speedup might be also related to the CPUs as the laptop only has 2 cores without hyperthreading. Regards Peter On 31.10.2016 20:52, Bernd Eckenfels wrote: > Hello, > > Since this is measuring a short workload after vom startup it might > not be the best benchmark, but then again throughput GC is expected to > be faster than G1. > > In the particular case however I guess you could tune G1 a bit to that > workload. Did you check the verbose GC logs, and how many CPUs does > Java see/use? > > Gruss > Bernd > -- > http://bernd.eckenfels.net > > > > > On Mon, Oct 31, 2016 at 8:38 PM +0100, "Peter" > wrote: > > Hi, > > I've stumbled today* over a big speed difference for code > execution with G1 GC vs. parallel GC also in the latest JDK8 > (1.8.0_111-b14). Maybe you have interests to investigate this. You > should be able to reproduce this via: > > # setup > git clone https://github.com/graphhopper/graphhopper > wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf > cd graphhopper > > # run measurement > export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" > # the graphhopper.sh script just makes the installation of maven > and bundling the jar a bit simpler > # you can also execute the tests in the class Measurement.java > > ./graphhopper.sh clean > ./graphhopper.sh measurement berlin-latest.osm.pbf > # now a measurement-.properties is created: > grep routing.mean measurement-XY.properties > > Now this should print a line where the value is in ms. E.g. I get > ~450ms for the parallel GC and ~780ms for G1GC (on an old laptop). > When I increase the Xmx for the G1 run to 1400m the results do NOT > get closer to parallel GC! > > Let me know if you need more information! > > Regards > Peter > > * > https://github.com/graphhopper/graphhopper/issues/854 > > -- > GraphHopper.com - fast and flexible route planning > -- GraphHopper.com - fast and flexible route planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Mon Oct 31 23:04:16 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 31 Oct 2016 19:04:16 -0400 Subject: Big speed difference for G1 vs. 
parallel GC In-Reply-To: <0968f2eb-606c-2a62-a3d5-afde77bac47a@gmx.de> References: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> <0968f2eb-606c-2a62-a3d5-afde77bac47a@gmx.de> Message-ID: G1 has more expensive GC write barriers - if you have a reference heavy heap with lots of mutation, it can add up. 20% more overhead for each barrier is a number I've heard before. On Monday, October 31, 2016, Peter wrote: > Hi Bernd, > > why do you think it is measuring a short workload? 'short' in which terms? > The overall test suite takes roughly 3 minutes but can be increased easily > via increasing the number of road routing queries. BTW: with routing.mean > we measure the latency of every road routing query, at least I think so ;) > > > Did you check the verbose GC logs, and how many CPUs does Java see/use? > > Nothing suspicious in the GC logs > IMO, > except that G1 produces much more output. Still this reminded me of another > mistake I made recently (not > disabling swapping) and so I went to my dev server (instead of laptop) > where this is already done and have more RAM there (32g), still using just > 1000m and the results are a bit better: 320ms vs. only 235ms, so G1 is only > ~25% slower. What differences are expected here ... let's say 'maximum'? > > BTW: CPU usage on the server is roughly 200-240% for G1 and 100-120% for > the parallel GC, so the speedup might be also related to the CPUs as the > laptop only has 2 cores without hyperthreading. > > Regards > Peter > > On 31.10.2016 20:52, Bernd Eckenfels wrote: > > Hello, > > Since this is measuring a short workload after vom startup it might not be > the best benchmark, but then again throughput GC is expected to be faster > than G1. > > In the particular case however I guess you could tune G1 a bit to that > workload. Did you check the verbose GC logs, and how many CPUs does Java > see/use? > > Gruss > Bernd > -- > http://bernd.eckenfels.net > > > > > On Mon, Oct 31, 2016 at 8:38 PM +0100, "Peter" > wrote: > > Hi, >> >> I've stumbled today* over a big speed difference for code execution with >> G1 GC vs. parallel GC also in the latest JDK8 (1.8.0_111-b14). Maybe you >> have interests to investigate this. You should be able to reproduce this >> via: >> >> # setup >> git clone https://github.com/graphhopper/graphhopper >> wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf >> cd graphhopper >> >> # run measurement >> export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" >> # the graphhopper.sh script just makes the installation of maven and >> bundling the jar a bit simpler >> # you can also execute the tests in the class Measurement.java >> >> ./graphhopper.sh clean >> ./graphhopper.sh measurement berlin-latest.osm.pbf >> # now a measurement-.properties is created: >> grep routing.mean measurement-XY.properties >> >> Now this should print a line where the value is in ms. E.g. I get ~450ms >> for the parallel GC and ~780ms for G1GC (on an old laptop). When I increase >> the Xmx for the G1 run to 1400m the results do NOT get closer to parallel >> GC! >> >> Let me know if you need more information! >> >> Regards >> Peter >> >> * >> https://github.com/graphhopper/graphhopper/issues/854 >> >> -- >> GraphHopper.com - fast and flexible route planning >> >> > > -- > GraphHopper.com - fast and flexible route planning > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL:
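One way to put a number on the write-barrier point above for the GraphHopper case is an A/B run of the same measurement with GC logging enabled, so GC pause time can be separated from mutator slowdown. This is only a sketch: it assumes graphhopper.sh passes JAVA_OPTS through unchanged, as the commands earlier in the thread suggest; the log file names are arbitrary; and the .osm.pbf argument should be whichever extract was actually downloaded (the thread fetches bayern-latest.osm.pbf but measures berlin-latest.osm.pbf).

# throughput collector baseline
export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc-parallel.log"
./graphhopper.sh clean
./graphhopper.sh measurement berlin-latest.osm.pbf

# G1 run with the same heap
export JAVA_OPTS="-XX:+UseG1GC -Xmx1000m -Xms1000m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc-g1.log"
./graphhopper.sh clean
./graphhopper.sh measurement berlin-latest.osm.pbf

# compare mean query latency; if the pause totals in the two logs are similar,
# the remaining gap is mutator-side (barrier cost, concurrent GC threads competing for cores)
grep routing.mean measurement-*.properties

Keeping -Xms equal to -Xmx in both runs, as Peter already does, keeps heap resizing out of the comparison.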