From david.ely at unboundid.com  Wed Nov  2 04:06:57 2016
From: david.ely at unboundid.com (David Ely)
Date: Tue, 1 Nov 2016 23:06:57 -0500
Subject: occasional ParNew times of 15+ seconds
In-Reply-To: <CAC1_7KPf5H+mzUk3hB_nvk5cwnJDgfD_GSFEtdwmy8DpqSJ=Yg@mail.gmail.com>
References: <CAC1_7KNM+c5o_yMH6m3ueO9c+qOEAxW4_LQAq3iTn0BCBDJxog@mail.gmail.com>
	<CAHjP37EsiXBwW5p6AwY+jtw71bQ-+YpBb_NwZoJkCvDgS8HwZQ@mail.gmail.com>
	<CAC1_7KPyfeV43zEGNKPMrXJObNWXNMV-CzTFs6bYyfgkjWeDiw@mail.gmail.com>
	<CAHjP37G2ibgco7YLg0gXk=HKcE-fkhyKhutvHmhAWSzZLyVEuQ@mail.gmail.com>
	<B432689D-C653-49EC-90CB-56294C329C82@oracle.com>
	<CAC1_7KMH6yHkNY9n8B5+293-Vswr1Ztp=CRH8aY0VFeJnZhw1Q@mail.gmail.com>
	<CAHjP37Fi3+=FDEGNP3f5QM9RMHU+5r41nzXXiOEtqFKkDyR=-Q@mail.gmail.com>
	<CAC1_7KPf5H+mzUk3hB_nvk5cwnJDgfD_GSFEtdwmy8DpqSJ=Yg@mail.gmail.com>
Message-ID: <CAC1_7KOgWkJYr2-333BNPHFuwZ5rG4M=csUiuYBZYaJFGWWwJw@mail.gmail.com>

First, a question on isolating this issue. If outside of the process we
detect that the JVM is paused, would we learn anything if we took pstacks
while it was paused?

The customer has disabled THP in their testing environment and hopefully
will be moving that to production, where they are seeing these issues, soon.

They have moved to 1.7u121 and added in more GC settings. Below are a
couple of the long ParNew cycles. I don't understand all of what's logged
here. Does this shed light on anything?


2016-11-01T20:47:31.163+0000: 8256.232: [GCTLAB: gc thread:
0x00007fdeac006800 [id: 37683] desired_size: 2KB slow allocs: 10  refill
waste: 32B alloc: 0.00001       19KB refills: 16 waste  5.5% gc: 1176B
slow: 632B fast: 0B
TLAB: gc thread: 0x00007ff44a588800 [id: 46128] desired_size: 2KB slow
allocs: 0  refill waste: 32B alloc: 0.00000        2KB refills: 1 waste
100.0% gc: 2072B slow: 0B fast: 0B
...
TLAB totals: thrds: 464  refills: 5231 max: 165 slow allocs: 1691 max 64
waste:  1.0% gc: 13793264B max: 1872736B slow: 305880B max: 38776B fast: 0B
max: 0B
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1543881846
Max   Chunk Size: 1543881846
Number of Blocks: 1
Av.  Block  Size: 1543881846
Tree      Height: 1
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1137422
Max   Chunk Size: 1135616
Number of Blocks: 6
Av.  Block  Size: 189570
Tree      Height: 4
2016-11-01T20:47:31.165+0000: 8256.233:
[ParNew2016-11-01T20:47:44.302+0000: 8269.370: [SoftReference, 0 refs,
0.0001060 secs]2016-11-01T20:47:44.302+0000: 8269.370: [WeakReference, 3
refs, 0.0000220 secs]2016-11-01T20:47:44.302+0000: 8269.370:
[FinalReference, 53 refs, 0.0001970 secs]2016-11-01T20:47:44.302+0000:
8269.370: [PhantomReference, 0 refs, 0 refs, 0.0000320
secs]2016-11-01T20:47:44.302+0000: 8269.370: [JNI Weak Reference, 0.0000090
secs]
Desired survivor size 107347968 bytes, new threshold 6 (max 6)
- age   1:    3631864 bytes,    3631864 total
- age   2:    1532688 bytes,    5164552 total
- age   3:   12273088 bytes,   17437640 total
- age   4:    2159144 bytes,   19596784 total
- age   5:    1102496 bytes,   20699280 total
- age   6:    1924216 bytes,   22623496 total
 (plab_sz = 78250  desired_plab_sz = 80167) : 1715472K->32796K(1887488K),
13.1373660 secs] 38927713K->37508799K(84724992K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1510165982
Max   Chunk Size: 1510165982
Number of Blocks: 1
Av.  Block  Size: 1510165982
Tree      Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1137422
Max   Chunk Size: 1135616
Number of Blocks: 6
Av.  Block  Size: 189570
Tree      Height: 4
, 13.1395130 secs] [Times: user=419.77 sys=11.27, real=13.14 secs]


2016-11-01T20:54:32.448+0000: 8677.516: [GCTLAB: gc thread:
0x00007fdeac006800 [id: 37683] desired_size: 2KB slow allocs: 0  refill
waste: 32B alloc: 0.00002       33KB refills: 1 waste 100.0% gc: 2072B
slow: 0B fast: 0B
TLAB: gc thread: 0x00007fde9800b800 [id: 46243] desired_size: 26KB slow
allocs: 16  refill waste: 672B alloc: 0.00078     1307KB refills: 33 waste
 0.5% gc: 2080B slow: 2696B fast: 0B
...
TLAB totals: thrds: 432  refills: 3498 max: 162 slow allocs: 876 max 69
waste:  1.2% gc: 14146232B max: 3248064B slow: 169880B max: 10712B fast: 0B
max: 0B
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1400742863
Max   Chunk Size: 1400742863
Number of Blocks: 1
Av.  Block  Size: 1400742863
Tree      Height: 1
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1104654
Max   Chunk Size: 1102848
Number of Blocks: 6
Av.  Block  Size: 184109
Tree      Height: 4
2016-11-01T20:54:32.449+0000: 8677.518:
[ParNew2016-11-01T20:54:49.766+0000: 8694.834: [SoftReference, 0 refs,
0.0001490 secs]2016-11-01T20:54:49.766+0000: 8694.834: [WeakReference, 5
refs, 0.0000400 secs]2016-11-01T20:54:49.766+0000: 8694.834:
[FinalReference, 24 refs, 0.0003910 secs]2016-11-01T20:54:49.766+0000:
8694.835: [PhantomReference, 0 refs, 0 refs, 0.0000520
secs]2016-11-01T20:54:49.766+0000: 8694.835: [JNI Weak Reference, 0.0000200
secs]
Desired survivor size 107347968 bytes, new threshold 6 (max 6)
- age   1:    3054792 bytes,    3054792 total
- age   2:    8902352 bytes,   11957144 total
- age   3:    1016784 bytes,   12973928 total
- age   4:     781808 bytes,   13755736 total
- age   5:     525088 bytes,   14280824 total
- age   6:    3151000 bytes,   17431824 total
 (plab_sz = 58269  desired_plab_sz = 55860) : 1697115K->26319K(1887488K),
17.3176320 secs] 40031748K->38885564K(84724992K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1333594291
Max   Chunk Size: 1333594291
Number of Blocks: 1
Av.  Block  Size: 1333594291
Tree      Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1104654
Max   Chunk Size: 1102848
Number of Blocks: 6
Av.  Block  Size: 184109
Tree      Height: 4
, 17.3196490 secs] [Times: user=554.79 sys=13.91, real=17.32 secs]

Thanks.

David


On Sun, Oct 30, 2016 at 7:09 PM, David Ely <david.ely at unboundid.com> wrote:

> Thanks again Vitaly. Responses inline.
>
> On Sun, Oct 30, 2016 at 1:56 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>>
>>
>> On Sunday, October 30, 2016, David Ely <david.ely at unboundid.com> wrote:
>>
>>> Thank you Vitaly and Charlie. We will have them disable THP, move to a
>>> later version of the JVM, and add in some additional GC logging JVM options.
>>>
>>> Looking more at the GC log, it appears that the long ParNew pauses only
>>> occur when the old generation usage is at least half of the distance
>>> between the live size and when CMS is triggered via
>>> CMSInitiatingOccupancyFraction. After a CMS collection, the long pauses
>>> stop. However, there are plenty of CMS cycles where we don't see any long
>>> pauses, and there are plenty of places where we promote the same amount of
>>> data associated with a long pause but don't experience a long pause.
>>>
>>> Is this behavior consistent with the THP diagnosis?
>>>
>> The very high sys time is unusual for a parnew collection.  THP defrag is
>> one possible known cause of that.  It's certainly possible there's
>> something else going on, but turning off THP is a good start in
>> troubleshooting; even if it's not the cause here, it may bite your customer
>> later.
>>
>
> The sys times are high, but they are not especially high relative to the
> user times. The ratio across all of the ParNew collections is about the
> same.
>
>
>>
>> A few more questions in the meantime:
>>
>> 1) are these parnew tails reproducible?
>>
>
> I believe so. They are seeing it on multiple systems. It seems to have
> gotten worse on the newer systems, which have 256GB of RAM compared to 96GB.
>
>
>> 2) is this running on bare metal or VM?
>>
>
> Bare metal.
>
>
>> 3) what's the hardware spec?
>>
>
> These specific pauses on hardware they acquired recently. Java sees 48
> CPUs, and it has 256GB of RAM.
>
>
>>
>> If you can have the customer disable THP without bumping the JVM version,
>> it would help pinpoint the issue.  But, I understand if you just want to
>> fix the issue asap.
>>
>
> Since they are seeing this on multiple systems, they should be able to
> have at least one where they only disable THP.
>
> They'll have to put these changes through their testing environment, so it
> might be a little while before I'll have an update.
>
>
>>
>>
>>>
>>> On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt <charlie.hunt at oracle.com>
>>> wrote:
>>>
>>>> +1 on disabling THP
>>>>
>>>> Charlie
>>>>
>>>> On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich <vitalyd at gmail.com>
>>>> wrote:
>>>>
>>>> David,
>>>>
>>>> Ask them to turn off THP - it's a known source of large latency due to
>>>> the kernel doing page defragmentation; your app takes a page fault, and
>>>> boom - the kernel may start doing defragmentation to make a huge page
>>>> available.  You can search online for THP issues.  The symptoms are similar
>>>> to yours - very high sys time.
>>>>
>>>> If they turn it off and still get same lengthy parnew pauses, then it's
>>>> clearly something else but at least we'll eliminate THP as the culprit.
>>>>
>>>> On Saturday, October 29, 2016, David Ely <david.ely at unboundid.com>
>>>> wrote:
>>>>
>>>>> Thank you for the response. Yes. meminfo (see full output below) shows
>>>>> ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full
>>>>> output below). Looking back through previous information that we have from
>>>>> this customer, transparent huge pages have been turned on for years.
>>>>> We've asked them for anything else that might have changed in this
>>>>> environment.
>>>>>
>>>>> Are there any other JVM options that we could enable that would shed
>>>>> light on what's going on within the ParNew? Would -XX:+PrintTLAB
>>>>> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful?
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> MemTotal:       264396572 kB
>>>>> MemFree:         2401576 kB
>>>>> Buffers:          381564 kB
>>>>> Cached:         172673120 kB
>>>>> SwapCached:            0 kB
>>>>> Active:         163439836 kB
>>>>> Inactive:       90737452 kB
>>>>> Active(anon):   76910848 kB
>>>>> Inactive(anon):  4212580 kB
>>>>> Active(file):   86528988 kB
>>>>> Inactive(file): 86524872 kB
>>>>> Unevictable:           0 kB
>>>>> Mlocked:               0 kB
>>>>> SwapTotal:      16236540 kB
>>>>> SwapFree:       16236540 kB
>>>>> Dirty:             14552 kB
>>>>> Writeback:             0 kB
>>>>> AnonPages:      81111768 kB
>>>>> Mapped:            31312 kB
>>>>> Shmem:               212 kB
>>>>> Slab:            6078732 kB
>>>>> SReclaimable:    5956052 kB
>>>>> SUnreclaim:       122680 kB
>>>>> KernelStack:       41296 kB
>>>>> PageTables:       171324 kB
>>>>> NFS_Unstable:          0 kB
>>>>> Bounce:                0 kB
>>>>> WritebackTmp:          0 kB
>>>>> CommitLimit:    148434824 kB
>>>>> Committed_AS:   93124984 kB
>>>>> VmallocTotal:   34359738367 kB
>>>>> VmallocUsed:      686780 kB
>>>>> VmallocChunk:   34225639420 kB
>>>>> HardwareCorrupted:     0 kB
>>>>> *AnonHugePages:  80519168 kB*
>>>>> HugePages_Total:       0
>>>>> HugePages_Free:        0
>>>>> HugePages_Rsvd:        0
>>>>> HugePages_Surp:        0
>>>>> Hugepagesize:       2048 kB
>>>>> DirectMap4k:        5132 kB
>>>>> DirectMap2M:     1957888 kB
>>>>> DirectMap1G:    266338304 kB
>>>>>
>>>>>
>>>>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich <vitalyd at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Friday, October 28, 2016, David Ely <david.ely at unboundid.com>
>>>>>> wrote:
>>>>>>
>>>>>>> While typical ParNew GC times are 50ms, our application is
>>>>>>> occasionally hitting ParNew times that are over 15 seconds for one of our
>>>>>>> customers, and we have no idea why. Looking at the full GC log file:
>>>>>>>
>>>>>>> 382250 ParNew GCs are < 1 second
>>>>>>> 9303 are 100ms to 1 second
>>>>>>> 1267 are 1 second to 2 seconds
>>>>>>> 99 are 2 seconds to 10 seconds
>>>>>>> 24 are > 10 seconds, 48 seconds being the max
>>>>>>>
>>>>>>> The long ones are somewhat bursty as you can see from looking at the
>>>>>>> line numbers in the GC log:
>>>>>>>
>>>>>>> $ egrep -n  '(ParNew.*real=[1-9][0-9]\)' gc.log.0
>>>>>>>
>>>>>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069:
>>>>>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew:
>>>>>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K),
>>>>>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs]
>>>>>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740:
>>>>>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew:
>>>>>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K),
>>>>>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs]
>>>>>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862:
>>>>>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew:
>>>>>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K),
>>>>>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs]
>>>>>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160:
>>>>>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew:
>>>>>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K),
>>>>>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs]
>>>>>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536:
>>>>>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew:
>>>>>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K),
>>>>>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs]
>>>>>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312:
>>>>>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew:
>>>>>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K),
>>>>>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs]
>>>>>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071:
>>>>>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew:
>>>>>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K),
>>>>>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs]
>>>>>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672:
>>>>>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew:
>>>>>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K),
>>>>>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs]
>>>>>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819:
>>>>>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew:
>>>>>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K),
>>>>>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs]
>>>>>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891:
>>>>>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew:
>>>>>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K),
>>>>>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs]
>>>>>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667:
>>>>>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew:
>>>>>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K),
>>>>>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs]
>>>>>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407:
>>>>>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew:
>>>>>>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K),
>>>>>>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs]
>>>>>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210:
>>>>>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew:
>>>>>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K),
>>>>>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs]
>>>>>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189:
>>>>>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew:
>>>>>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K),
>>>>>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs]
>>>>>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967:
>>>>>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew:
>>>>>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K),
>>>>>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs]
>>>>>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087:
>>>>>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew:
>>>>>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K),
>>>>>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs]
>>>>>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886:
>>>>>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew:
>>>>>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K),
>>>>>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs]
>>>>>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138:
>>>>>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew:
>>>>>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K),
>>>>>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs]
>>>>>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321:
>>>>>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew:
>>>>>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K),
>>>>>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs]
>>>>>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808:
>>>>>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew:
>>>>>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K),
>>>>>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs]
>>>>>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306:
>>>>>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew:
>>>>>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K),
>>>>>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs]
>>>>>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411:
>>>>>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew:
>>>>>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K),
>>>>>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs]
>>>>>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385:
>>>>>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew:
>>>>>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K),
>>>>>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs]
>>>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791:
>>>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew:
>>>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K),
>>>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs]
>>>>>>>
>>>>>>> Context around a single instance is fairly normal:
>>>>>>>
>>>>>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721:
>>>>>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew:
>>>>>>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K),
>>>>>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs]
>>>>>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324:
>>>>>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew:
>>>>>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K),
>>>>>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs]
>>>>>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894:
>>>>>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew:
>>>>>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K),
>>>>>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs]
>>>>>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487:
>>>>>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew:
>>>>>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K),
>>>>>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs]
>>>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791:
>>>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew:
>>>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K),
>>>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs]
>>>>>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139:
>>>>>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew:
>>>>>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K),
>>>>>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs]
>>>>>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779:
>>>>>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew:
>>>>>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K),
>>>>>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs]
>>>>>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491:
>>>>>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew:
>>>>>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K),
>>>>>>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs]
>>>>>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226:
>>>>>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew:
>>>>>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K),
>>>>>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs]
>>>>>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826:
>>>>>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew:
>>>>>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K),
>>>>>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs]
>>>>>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332:
>>>>>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew:
>>>>>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K),
>>>>>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs]
>>>>>>>
>>>>>>> Since the user times are high as well, I don't think this could be
>>>>>>> swapping.
>>>>>>>
>>>>>> Can you ask the customer if they're using transparent hugepages
>>>>>> (THP)?
>>>>>>
>>>>>>>
>>>>>>> Here are the hard-earned set of JVM arguments that we're using:
>>>>>>>
>>>>>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \
>>>>>>>   -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled
>>>>>>> \
>>>>>>>   -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled
>>>>>>> \
>>>>>>>   -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \
>>>>>>>   -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000
>>>>>>> \
>>>>>>>   -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC
>>>>>>> -XX:+UseBiasedLocking \
>>>>>>>   -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops
>>>>>>> -XX:PermSize=256M \
>>>>>>>   -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \
>>>>>>>   -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar
>>>>>>> -XX:+UseLargePages \
>>>>>>>   -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>>>>>>> -XX:+PrintCommandLineFlags \
>>>>>>>   -XX:+UseGCLogFileRotation \
>>>>>>>   -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \
>>>>>>>   -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log
>>>>>>>
>>>>>>> This is on Linux with Java 1.7.0_72.
>>>>>>>
>>>>>>> Does this look familiar to anyone? Alternatively, are there some
>>>>>>> more JVM options that we could include to get more information?
>>>>>>>
>>>>>>> One of the first things that we'll try is to move to a later JVM,
>>>>>>> but it will be easier to get the customer to do that if we can point to a
>>>>>>> specific issue that has been addressed.
>>>>>>>
>>>>>>> Thanks for your help.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent from my phone
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Sent from my phone
>>>>
>>>> _______________________________________________
>>>> hotspot-gc-use mailing list
>>>> hotspot-gc-use at openjdk.java.net
>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>
>>>>
>>>
>>
>> --
>> Sent from my phone
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161101/f1016879/attachment-0001.html>

From jun.zhuang at hobsons.com  Thu Nov  3 14:46:40 2016
From: jun.zhuang at hobsons.com (Jun Zhuang)
Date: Thu, 3 Nov 2016 14:46:40 +0000
Subject: Questions regarding Java string literal pool
Message-ID: <BN6PR02MB2769BED4E8D129483F4295F481A30@BN6PR02MB2769.namprd02.prod.outlook.com>

Hi,

I have a few questions related to the Java String pool, I wonder if I can get some clarification from the experts?


1.       Location of the String pool

Following are from some of the posts I read but with conflicting information:


?         http://java-performance.info/string-intern-in-java-6-7-8/

?In those good old days [before java 7] all interned strings were stored in the PermGen ? the fixed size part of heap mainly used for storing loaded classes and string pool.?  ? ?in Java 7 ? the string pool was relocated to the heap. ... All strings are now located in the heap, as most of other ordinary objects?

Above statement suggests that both the interned strings and the string pool are in the PermGen prior to java 7 but being relocated to the heap in 7.


?         https://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html

?Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.?

This post suggests that string literals are created on the heap as other objects but did not tie that to any java version.


?         http://www.javamadesoeasy.com/2015/05/string-pool-string-literal-pool-string.html

?From java 7 String pool is a storage area in java heap memory, where all the other objects are created. Prior to Java 7 String pool was created in permgen space of heap.?

So prior to java 7 the string pool was in the PermGen; beginning with 7 it?s in the heap. Same as the 1st post.

My questions are:

1.       Where is the string pool located prior and after java 7

2.       Are the string literals & interned strings objects created in the PermGen prior to java 7 then being created on the heap after?


2.       Can string literals be garbage collected?


The post @https://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html says ?Unlike most objects, String literals always have a reference to them from the String Literal Pool. That means that they always have a reference to them and are, therefore, not eligible for garbage collection.? But this one @http://java-performance.info/string-intern-in-java-6-7-8/ says ?Yes, all strings in the JVM string pool are eligible for garbage collection if there are no references to them from your program roots.? Are they both true under certain conditions?


Appreciate your help,
Jun

Jun Zhuang
Sr. Performance QA Engineer | Hobsons<https://www.hobsons.com/?utm_source=outlook&utm_medium=email&utm_campaign=banner_02.12.16_general>
T: +1 513 746 2288 | jun.zhuang at hobsons.com<mailto:jun.zhuang at hobsons.com>
50 E-Business Way, Suite 300 | Cincinnati, OH 45241 | USA


Upgraded by Hobsons - Subscribe Today
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161103/28b98c66/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image891000.png
Type: image/png
Size: 13602 bytes
Desc: image891000.png
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161103/28b98c66/image891000-0001.png>

From amit.mishra at redknee.com  Thu Nov 10 07:38:55 2016
From: amit.mishra at redknee.com (Amit Mishra)
Date: Thu, 10 Nov 2016 07:38:55 +0000
Subject: Troubleshoot memory leak without taking heap dump of Production
	application
Message-ID: <MWHPR05MB31187A57C3AA37E04CAE0C12F0B80@MWHPR05MB3118.namprd05.prod.outlook.com>

Hello Charlie/Poonam/team,

Need your help/suggestions on how to troubleshoot memory leak without taking any heap dump.

We are facing random Promotion failure followed by Continuous concurrent mode failures/Full GC events that impacts our Standalone application for long time until restart.

Application GC remain stable for more than a week with smooth saw tooth pattern and suddenly something happened within 1 hour or so that results in severe GC failure and ultimately application failure.

We have verified traffic pattern/application logs and other dependent application logs but there is no indication on why suddenly at one point of time heap usage kept on increasing which results in CMS failures.(Traffic pattern is fairly stable and there are no scheduled or cron jobs during time of issue)

We cannot take heap dump as this is standalone application having big heap size.(32G)

We have collected histogram during issue time and of non- issue time and found that instances of 2-3 classes have been suddenly increased from 200-300 MB to 5G+ but not sure how we can dig into code to find out what cause those classes instances to surge.

Please guide me how to troubleshoot this issue in terms of any light weight tool that would exactly pin point methods or calls that can lead to this memory leak as we can't take heap dump which is very heavy impacting tool.

One more question is why Full GC not able to clean generations even after multiple attempts and a continuous loop of GC failures being created which got resolved only after application restart, does it indicates that no new objects was creating & it was only GC algorithm which started failing and increased heap usage.

Many thanks in advance for your kind support and guidance.

This is GC graph and attached is GC file.

[cid:image002.jpg at 01D23948.747997C0]


Histogram snapshots:

java.util.HashMap$Entry was only 400 MB before issue and then 5.5G during issue same thing true for AcctSessionInfo and java.lang.String class instances.


Non issue time:

  num     #instances         #bytes  class name
----------------------------------------------
   1:      13613915     2219936904  [Ljava.lang.Object;
  2:      10065566     1569906056  [Ljava.util.HashMap$Entry;
   3:       2671564     1175488160  com.redknee.product.s5600.ipc.xgen.PdpContextID
   4:      17247420      903565648  [C
   5:      10055084      723966048  java.util.HashMap
   6:      17208464      688338560  java.lang.String
   7:       7843562      439239472  java.util.HashMap$Entry
   8:      10065566      402622640  java.util.HashMap$FrontCache


Issue time :Heap usage around 28G
  num     #instances         #bytes  class name
----------------------------------------------
   1:     118037170     6600874168  [C
   2:     103071116     5771982496  java.util.HashMap$Entry
   3:     101560457     5687385592  com.redknee.product.s5600.ipc.xgen.AcctSessionInfo
   4:     118042761     4721710440  java.lang.String
   5:       9942863     3020272632  [Ljava.lang.Object;
   6:       7537560     2737186632  [Ljava.util.HashMap$Entry;
   7:       1453865      639700600  com.redknee.product.s5600.ipc.xgen.PdpContextID
   8:       7537148      542674656  java.util.HashMap

Thanks,
Amit Mishra
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/ff77aa86/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 34533 bytes
Desc: image002.jpg
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/ff77aa86/image002-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcstats.log.rar
Type: application/octet-stream
Size: 559660 bytes
Desc: gcstats.log.rar
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/ff77aa86/gcstats.log-0001.rar>

From yu.zhang at oracle.com  Thu Nov 10 18:02:53 2016
From: yu.zhang at oracle.com (yu.zhang at oracle.com)
Date: Thu, 10 Nov 2016 10:02:53 -0800
Subject: Troubleshoot memory leak without taking heap dump of Production
	application
In-Reply-To: <MWHPR05MB31187A57C3AA37E04CAE0C12F0B80@MWHPR05MB3118.namprd05.prod.outlook.com>
References: <MWHPR05MB31187A57C3AA37E04CAE0C12F0B80@MWHPR05MB3118.namprd05.prod.outlook.com>
Message-ID: <a7785e24-052c-7595-5de4-b8ffc97040ec@oracle.com>

Can you try Flight Recorder with allocation profile on?

If Full GC can not clean those objects, the application probably is 
holding those. Might be a memory leak.

Thanks

Jenny


On 11/09/2016 11:38 PM, Amit Mishra wrote:
>
> Hello Charlie/Poonam/team,
>
> Need your help/suggestions on how to troubleshoot memory leak without 
> taking any heap dump.
>
> We are facing random Promotion failure followed by Continuous 
> concurrent mode failures/Full GC events that impacts our Standalone 
> application for long time until restart.
>
> Application GC remain stable for more than a week with smooth saw 
> tooth pattern and suddenly something happened within 1 hour or so that 
> results in severe GC failure and ultimately application failure.
>
> We have verified traffic pattern/application logs and other dependent 
> application logs but there is no indication on why suddenly at one 
> point of time heap usage kept on increasing which results in CMS 
> failures.(Traffic pattern is fairly stable and there are no scheduled 
> or cron jobs during time of issue)
>
> We cannot take heap dump as this is standalone application having big 
> heap size.(32G)
>
> We have collected histogram during issue time and of non- issue time 
> and found that instances of 2-3 classes have been suddenly increased 
> from 200-300 MB to 5G+ but not sure how we can dig into code to find 
> out what cause those classes instances to surge.
>
> Please guide me how to troubleshoot this issue in terms of any light 
> weight tool that would exactly pin point methods or calls that can 
> lead to this memory leak as we can?t take heap dump which is very 
> heavy impacting tool.
>
> One more question is why Full GC not able to clean generations even 
> after multiple attempts and a continuous loop of GC failures being 
> created which got resolved only after application restart, does it 
> indicates that no new objects was creating & it was only GC algorithm 
> which started failing and increased heap usage.
>
> Many thanks in advance for your kind support and guidance.
>
> This is GC graph and attached is GC file.
>
> cid:image002.jpg at 01D23948.747997C0
>
> Histogram snapshots:
>
> java.util.HashMap$Entry was only 400 MB before issue and then 5.5G 
> during issue same thing true for AcctSessionInfo and java.lang.String 
> class instances.
>
> Non issue time:
>
>   num     #instances         #bytes  class name
>
> ----------------------------------------------
>
>    1:      13613915     2219936904 [Ljava.lang.Object;
>
>   2:      10065566     1569906056 [Ljava.util.HashMap$Entry;
>
>    3:       2671564     1175488160 
> com.redknee.product.s5600.ipc.xgen.PdpContextID
>
>    4:      17247420      903565648  [C
>
>    5:      10055084      723966048 java.util.HashMap
>
>    6:      17208464      688338560 java.lang.String
>
>    7:       7843562      439239472 java.util.HashMap$Entry
>
>    8:      10065566      402622640 java.util.HashMap$FrontCache
>
> Issue time :Heap usage around 28G
>
>   num     #instances         #bytes  class name
>
> ----------------------------------------------
>
>    1:     118037170     6600874168  [C
>
>    2:     103071116     5771982496 java.util.HashMap$Entry
>
>    3:     101560457     5687385592 
> com.redknee.product.s5600.ipc.xgen.AcctSessionInfo
>
>    4:     118042761     4721710440 java.lang.String
>
>    5:       9942863     3020272632 [Ljava.lang.Object;
>
>    6:       7537560     2737186632 [Ljava.util.HashMap$Entry;
>
>    7:       1453865      639700600 
> com.redknee.product.s5600.ipc.xgen.PdpContextID
>
>    8:       7537148      542674656 java.util.HashMap
>
> Thanks,
>
> Amit Mishra
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/ad737641/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 34533 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/ad737641/attachment-0001.jpe>

From poonam.bajaj at oracle.com  Thu Nov 10 18:15:57 2016
From: poonam.bajaj at oracle.com (Poonam Bajaj Parhar)
Date: Thu, 10 Nov 2016 10:15:57 -0800
Subject: Troubleshoot memory leak without taking heap dump of Production
	application
In-Reply-To: <MWHPR05MB31187A57C3AA37E04CAE0C12F0B80@MWHPR05MB3118.namprd05.prod.outlook.com>
References: <MWHPR05MB31187A57C3AA37E04CAE0C12F0B80@MWHPR05MB3118.namprd05.prod.outlook.com>
Message-ID: <f6e1a505-562d-2902-4163-d7a414ad8118@oracle.com>

Hello Amit,

Given the fact that the Full GCs are not able to reclaim space, this 
indicates that there is some strong root that is holding on to the 
growing objects in the Java Heap.

Issue time :Heap usage around 28G

   num     #instances         #bytes  class name

----------------------------------------------

    1:     118037170     6600874168  [C

    2:     103071116     5771982496 java.util.HashMap$Entry

    3:     101560457     5687385592 
com.redknee.product.s5600.ipc.xgen.AcctSessionInfo

    4:     118042761     4721710440 java.lang.String

    5:       9942863     3020272632 [Ljava.lang.Object;

    6:       7537560     2737186632 [Ljava.util.HashMap$Entry;

    7:       1453865      639700600 
com.redknee.product.s5600.ipc.xgen.PdpContextID

    8:       7537148      542674656  java.util.HashMap

I would focus my attention on 
'com.redknee.product.s5600.ipc.xgen.AcctSessionInfo' instances and try 
to determine what is holding them and preventing them from getting 
collected by the Full GCs.

Heap dumps are the best way to figure that out if you could collect one 
from your production system when the issue starts occurring. If that is 
not possible, then would it be possible to run JVMTI agent to collect 
the reference path information for these objects? Long time back, I had 
written this JVMTI agent that given a class name can print the reference 
path information for the instances of that class.
https://blogs.oracle.com/poonam/entry/jvmti_agent_to_print_reference

And if you have access to the code where instances of AcctSessionInfo 
are being created, and stored in a HashMap, I would suggest taking a 
look at the source code around that too and see if there is anything 
obviously happening wrong with the storage of these instances.

Thanks,
Poonam

On 11/9/2016 11:38 PM, Amit Mishra wrote:
>
> Hello Charlie/Poonam/team,
>
> Need your help/suggestions on how to troubleshoot memory leak without 
> taking any heap dump.
>
> We are facing random Promotion failure followed by Continuous 
> concurrent mode failures/Full GC events that impacts our Standalone 
> application for long time until restart.
>
> Application GC remain stable for more than a week with smooth saw 
> tooth pattern and suddenly something happened within 1 hour or so that 
> results in severe GC failure and ultimately application failure.
>
> We have verified traffic pattern/application logs and other dependent 
> application logs but there is no indication on why suddenly at one 
> point of time heap usage kept on increasing which results in CMS 
> failures.(Traffic pattern is fairly stable and there are no scheduled 
> or cron jobs during time of issue)
>
> We cannot take heap dump as this is standalone application having big 
> heap size.(32G)
>
> We have collected histogram during issue time and of non- issue time 
> and found that instances of 2-3 classes have been suddenly increased 
> from 200-300 MB to 5G+ but not sure how we can dig into code to find 
> out what cause those classes instances to surge.
>
> Please guide me how to troubleshoot this issue in terms of any light 
> weight tool that would exactly pin point methods or calls that can 
> lead to this memory leak as we can?t take heap dump which is very 
> heavy impacting tool.
>
> One more question is why Full GC not able to clean generations even 
> after multiple attempts and a continuous loop of GC failures being 
> created which got resolved only after application restart, does it 
> indicates that no new objects was creating & it was only GC algorithm 
> which started failing and increased heap usage.
>
> Many thanks in advance for your kind support and guidance.
>
> This is GC graph and attached is GC file.
>
> cid:image002.jpg at 01D23948.747997C0
>
> Histogram snapshots:
>
> java.util.HashMap$Entry was only 400 MB before issue and then 5.5G 
> during issue same thing true for AcctSessionInfo and java.lang.String 
> class instances.
>
> Non issue time:
>
>   num     #instances         #bytes  class name
>
> ----------------------------------------------
>
>    1:      13613915     2219936904 [Ljava.lang.Object;
>
>   2:      10065566     1569906056 [Ljava.util.HashMap$Entry;
>
>    3:       2671564     1175488160 
> com.redknee.product.s5600.ipc.xgen.PdpContextID
>
>    4:      17247420      903565648  [C
>
>    5:      10055084      723966048 java.util.HashMap
>
>    6:      17208464      688338560 java.lang.String
>
>    7:       7843562      439239472 java.util.HashMap$Entry
>
>    8:      10065566      402622640 java.util.HashMap$FrontCache
>
> Issue time :Heap usage around 28G
>
>   num     #instances         #bytes  class name
>
> ----------------------------------------------
>
>    1:     118037170     6600874168  [C
>
>    2:     103071116     5771982496 java.util.HashMap$Entry
>
>    3:     101560457     5687385592 
> com.redknee.product.s5600.ipc.xgen.AcctSessionInfo
>
>    4:     118042761     4721710440 java.lang.String
>
>    5:       9942863     3020272632 [Ljava.lang.Object;
>
>    6:       7537560     2737186632 [Ljava.util.HashMap$Entry;
>
>    7:       1453865      639700600 
> com.redknee.product.s5600.ipc.xgen.PdpContextID
>
>    8:       7537148      542674656 java.util.HashMap
>
> Thanks,
>
> Amit Mishra
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/6b6879a4/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 34533 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/6b6879a4/attachment-0001.jpe>

From david.ely at unboundid.com  Tue Nov 15 03:21:22 2016
From: david.ely at unboundid.com (David Ely)
Date: Mon, 14 Nov 2016 21:21:22 -0600
Subject: occasional ParNew times of 15+ seconds
In-Reply-To: <CAC1_7KOgWkJYr2-333BNPHFuwZ5rG4M=csUiuYBZYaJFGWWwJw@mail.gmail.com>
References: <CAC1_7KNM+c5o_yMH6m3ueO9c+qOEAxW4_LQAq3iTn0BCBDJxog@mail.gmail.com>
	<CAHjP37EsiXBwW5p6AwY+jtw71bQ-+YpBb_NwZoJkCvDgS8HwZQ@mail.gmail.com>
	<CAC1_7KPyfeV43zEGNKPMrXJObNWXNMV-CzTFs6bYyfgkjWeDiw@mail.gmail.com>
	<CAHjP37G2ibgco7YLg0gXk=HKcE-fkhyKhutvHmhAWSzZLyVEuQ@mail.gmail.com>
	<B432689D-C653-49EC-90CB-56294C329C82@oracle.com>
	<CAC1_7KMH6yHkNY9n8B5+293-Vswr1Ztp=CRH8aY0VFeJnZhw1Q@mail.gmail.com>
	<CAHjP37Fi3+=FDEGNP3f5QM9RMHU+5r41nzXXiOEtqFKkDyR=-Q@mail.gmail.com>
	<CAC1_7KPf5H+mzUk3hB_nvk5cwnJDgfD_GSFEtdwmy8DpqSJ=Yg@mail.gmail.com>
	<CAC1_7KOgWkJYr2-333BNPHFuwZ5rG4M=csUiuYBZYaJFGWWwJw@mail.gmail.com>
Message-ID: <CAC1_7KNQ5Q3_i1uG26jKGhVL7KWGSk2dgG0Zi5LVBKNSzBNtDg@mail.gmail.com>

After running for over a week without any long ParNew pauses, it appears
that turning off transparent huge pages has fixed this issue. It's
something that we'll know to look out for in the future. Thanks again for
your help.

David

On Tue, Nov 1, 2016 at 11:06 PM, David Ely <david.ely at unboundid.com> wrote:

> First, a question on isolating this issue. If outside of the process we
> detect that the JVM is paused, would we learn anything if we took pstacks
> while it was paused?
>
> The customer has disabled THP in their testing environment and hopefully
> will be moving that to production, where they are seeing these issues, soon.
>
> They have moved to 1.7u121 and added in more GC settings. Below are a
> couple of the long ParNew cycles. I don't understand all of what's logged
> here. Does this shed light on anything?
>
>
> 2016-11-01T20:47:31.163+0000: 8256.232: [GCTLAB: gc thread:
> 0x00007fdeac006800 [id: 37683] desired_size: 2KB slow allocs: 10  refill
> waste: 32B alloc: 0.00001       19KB refills: 16 waste  5.5% gc: 1176B
> slow: 632B fast: 0B
> TLAB: gc thread: 0x00007ff44a588800 [id: 46128] desired_size: 2KB slow
> allocs: 0  refill waste: 32B alloc: 0.00000        2KB refills: 1 waste
> 100.0% gc: 2072B slow: 0B fast: 0B
> ...
> TLAB totals: thrds: 464  refills: 5231 max: 165 slow allocs: 1691 max 64
> waste:  1.0% gc: 13793264B max: 1872736B slow: 305880B max: 38776B fast: 0B
> max: 0B
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1543881846
> Max   Chunk Size: 1543881846
> Number of Blocks: 1
> Av.  Block  Size: 1543881846
> Tree      Height: 1
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1137422
> Max   Chunk Size: 1135616
> Number of Blocks: 6
> Av.  Block  Size: 189570
> Tree      Height: 4
> 2016-11-01T20:47:31.165+0000: 8256.233: [ParNew2016-11-01T20:47:44.302+0000:
> 8269.370: [SoftReference, 0 refs, 0.0001060 secs]2016-11-01T20:47:44.302+0000:
> 8269.370: [WeakReference, 3 refs, 0.0000220 secs]2016-11-01T20:47:44.302+0000:
> 8269.370: [FinalReference, 53 refs, 0.0001970 secs]2016-11-01T20:47:44.302+0000:
> 8269.370: [PhantomReference, 0 refs, 0 refs, 0.0000320
> secs]2016-11-01T20:47:44.302+0000: 8269.370: [JNI Weak Reference,
> 0.0000090 secs]
> Desired survivor size 107347968 bytes, new threshold 6 (max 6)
> - age   1:    3631864 bytes,    3631864 total
> - age   2:    1532688 bytes,    5164552 total
> - age   3:   12273088 bytes,   17437640 total
> - age   4:    2159144 bytes,   19596784 total
> - age   5:    1102496 bytes,   20699280 total
> - age   6:    1924216 bytes,   22623496 total
>  (plab_sz = 78250  desired_plab_sz = 80167) : 1715472K->32796K(1887488K),
> 13.1373660 secs] 38927713K->37508799K(84724992K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1510165982
> Max   Chunk Size: 1510165982
> Number of Blocks: 1
> Av.  Block  Size: 1510165982
> Tree      Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1137422
> Max   Chunk Size: 1135616
> Number of Blocks: 6
> Av.  Block  Size: 189570
> Tree      Height: 4
> , 13.1395130 secs] [Times: user=419.77 sys=11.27, real=13.14 secs]
>
>
> 2016-11-01T20:54:32.448+0000: 8677.516: [GCTLAB: gc thread:
> 0x00007fdeac006800 [id: 37683] desired_size: 2KB slow allocs: 0  refill
> waste: 32B alloc: 0.00002       33KB refills: 1 waste 100.0% gc: 2072B
> slow: 0B fast: 0B
> TLAB: gc thread: 0x00007fde9800b800 [id: 46243] desired_size: 26KB slow
> allocs: 16  refill waste: 672B alloc: 0.00078     1307KB refills: 33 waste
>  0.5% gc: 2080B slow: 2696B fast: 0B
> ...
> TLAB totals: thrds: 432  refills: 3498 max: 162 slow allocs: 876 max 69
> waste:  1.2% gc: 14146232B max: 3248064B slow: 169880B max: 10712B fast: 0B
> max: 0B
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1400742863
> Max   Chunk Size: 1400742863
> Number of Blocks: 1
> Av.  Block  Size: 1400742863
> Tree      Height: 1
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1104654
> Max   Chunk Size: 1102848
> Number of Blocks: 6
> Av.  Block  Size: 184109
> Tree      Height: 4
> 2016-11-01T20:54:32.449+0000: 8677.518: [ParNew2016-11-01T20:54:49.766+0000:
> 8694.834: [SoftReference, 0 refs, 0.0001490 secs]2016-11-01T20:54:49.766+0000:
> 8694.834: [WeakReference, 5 refs, 0.0000400 secs]2016-11-01T20:54:49.766+0000:
> 8694.834: [FinalReference, 24 refs, 0.0003910 secs]2016-11-01T20:54:49.766+0000:
> 8694.835: [PhantomReference, 0 refs, 0 refs, 0.0000520
> secs]2016-11-01T20:54:49.766+0000: 8694.835: [JNI Weak Reference,
> 0.0000200 secs]
> Desired survivor size 107347968 bytes, new threshold 6 (max 6)
> - age   1:    3054792 bytes,    3054792 total
> - age   2:    8902352 bytes,   11957144 total
> - age   3:    1016784 bytes,   12973928 total
> - age   4:     781808 bytes,   13755736 total
> - age   5:     525088 bytes,   14280824 total
> - age   6:    3151000 bytes,   17431824 total
>  (plab_sz = 58269  desired_plab_sz = 55860) : 1697115K->26319K(1887488K),
> 17.3176320 secs] 40031748K->38885564K(84724992K)After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1333594291
> Max   Chunk Size: 1333594291
> Number of Blocks: 1
> Av.  Block  Size: 1333594291
> Tree      Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1104654
> Max   Chunk Size: 1102848
> Number of Blocks: 6
> Av.  Block  Size: 184109
> Tree      Height: 4
> , 17.3196490 secs] [Times: user=554.79 sys=13.91, real=17.32 secs]
>
> Thanks.
>
> David
>
>
> On Sun, Oct 30, 2016 at 7:09 PM, David Ely <david.ely at unboundid.com>
> wrote:
>
>> Thanks again Vitaly. Responses inline.
>>
>> On Sun, Oct 30, 2016 at 1:56 PM, Vitaly Davidovich <vitalyd at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sunday, October 30, 2016, David Ely <david.ely at unboundid.com> wrote:
>>>
>>>> Thank you Vitaly and Charlie. We will have them disable THP, move to a
>>>> later version of the JVM, and add in some additional GC logging JVM options.
>>>>
>>>> Looking more at the GC log, it appears that the long ParNew pauses only
>>>> occur when the old generation usage is at least half of the distance
>>>> between the live size and when CMS is triggered via
>>>> CMSInitiatingOccupancyFraction. After a CMS collection, the long
>>>> pauses stop. However, there are plenty of CMS cycles where we don't see any
>>>> long pauses, and there are plenty of places where we promote the same
>>>> amount of data associated with a long pause but don't experience a long
>>>> pause.
>>>>
>>>> Is this behavior consistent with the THP diagnosis?
>>>>
>>> The very high sys time is unusual for a parnew collection.  THP defrag
>>> is one possible known cause of that.  It's certainly possible there's
>>> something else going on, but turning off THP is a good start in
>>> troubleshooting; even if it's not the cause here, it may bite your customer
>>> later.
>>>
>>
>> The sys times are high, but they are not especially high relative to the
>> user times. The ratio across all of the ParNew collections is about the
>> same.
>>
>>
>>>
>>> A few more questions in the meantime:
>>>
>>> 1) are these parnew tails reproducible?
>>>
>>
>> I believe so. They are seeing it on multiple systems. It seems to have
>> gotten worse on the newer systems, which have 256GB of RAM compared to 96GB.
>>
>>
>>> 2) is this running on bare metal or VM?
>>>
>>
>> Bare metal.
>>
>>
>>> 3) what's the hardware spec?
>>>
>>
>> These specific pauses on hardware they acquired recently. Java sees 48
>> CPUs, and it has 256GB of RAM.
>>
>>
>>>
>>> If you can have the customer disable THP without bumping the JVM
>>> version, it would help pinpoint the issue.  But, I understand if you just
>>> want to fix the issue asap.
>>>
>>
>> Since they are seeing this on multiple systems, they should be able to
>> have at least one where they only disable THP.
>>
>> They'll have to put these changes through their testing environment, so
>> it might be a little while before I'll have an update.
>>
>>
>>>
>>>
>>>>
>>>> On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt <charlie.hunt at oracle.com>
>>>> wrote:
>>>>
>>>>> +1 on disabling THP
>>>>>
>>>>> Charlie
>>>>>
>>>>> On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich <vitalyd at gmail.com>
>>>>> wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> Ask them to turn off THP - it's a known source of large latency due to
>>>>> the kernel doing page defragmentation; your app takes a page fault, and
>>>>> boom - the kernel may start doing defragmentation to make a huge page
>>>>> available.  You can search online for THP issues.  The symptoms are similar
>>>>> to yours - very high sys time.
>>>>>
>>>>> If they turn it off and still get same lengthy parnew pauses, then
>>>>> it's clearly something else but at least we'll eliminate THP as the culprit.
>>>>>
>>>>> On Saturday, October 29, 2016, David Ely <david.ely at unboundid.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you for the response. Yes. meminfo (see full output below)
>>>>>> shows ~80GB of AnonHugePages, which is pretty close to the size of the JVM
>>>>>> (full output below). Looking back through previous information that we have
>>>>>> from this customer, transparent huge pages have been turned on for years.
>>>>>> We've asked them for anything else that might have changed in this
>>>>>> environment.
>>>>>>
>>>>>> Are there any other JVM options that we could enable that would shed
>>>>>> light on what's going on within the ParNew? Would -XX:+PrintTLAB
>>>>>> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful?
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> MemTotal:       264396572 kB
>>>>>> MemFree:         2401576 kB
>>>>>> Buffers:          381564 kB
>>>>>> Cached:         172673120 kB
>>>>>> SwapCached:            0 kB
>>>>>> Active:         163439836 kB
>>>>>> Inactive:       90737452 kB
>>>>>> Active(anon):   76910848 kB
>>>>>> Inactive(anon):  4212580 kB
>>>>>> Active(file):   86528988 kB
>>>>>> Inactive(file): 86524872 kB
>>>>>> Unevictable:           0 kB
>>>>>> Mlocked:               0 kB
>>>>>> SwapTotal:      16236540 kB
>>>>>> SwapFree:       16236540 kB
>>>>>> Dirty:             14552 kB
>>>>>> Writeback:             0 kB
>>>>>> AnonPages:      81111768 kB
>>>>>> Mapped:            31312 kB
>>>>>> Shmem:               212 kB
>>>>>> Slab:            6078732 kB
>>>>>> SReclaimable:    5956052 kB
>>>>>> SUnreclaim:       122680 kB
>>>>>> KernelStack:       41296 kB
>>>>>> PageTables:       171324 kB
>>>>>> NFS_Unstable:          0 kB
>>>>>> Bounce:                0 kB
>>>>>> WritebackTmp:          0 kB
>>>>>> CommitLimit:    148434824 kB
>>>>>> Committed_AS:   93124984 kB
>>>>>> VmallocTotal:   34359738367 kB
>>>>>> VmallocUsed:      686780 kB
>>>>>> VmallocChunk:   34225639420 kB
>>>>>> HardwareCorrupted:     0 kB
>>>>>> *AnonHugePages:  80519168 kB*
>>>>>> HugePages_Total:       0
>>>>>> HugePages_Free:        0
>>>>>> HugePages_Rsvd:        0
>>>>>> HugePages_Surp:        0
>>>>>> Hugepagesize:       2048 kB
>>>>>> DirectMap4k:        5132 kB
>>>>>> DirectMap2M:     1957888 kB
>>>>>> DirectMap1G:    266338304 kB
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich <vitalyd at gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Friday, October 28, 2016, David Ely <david.ely at unboundid.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> While typical ParNew GC times are 50ms, our application is
>>>>>>>> occasionally hitting ParNew times that are over 15 seconds for one of our
>>>>>>>> customers, and we have no idea why. Looking at the full GC log file:
>>>>>>>>
>>>>>>>> 382250 ParNew GCs are < 1 second
>>>>>>>> 9303 are 100ms to 1 second
>>>>>>>> 1267 are 1 second to 2 seconds
>>>>>>>> 99 are 2 seconds to 10 seconds
>>>>>>>> 24 are > 10 seconds, 48 seconds being the max
>>>>>>>>
>>>>>>>> The long ones are somewhat bursty as you can see from looking at
>>>>>>>> the line numbers in the GC log:
>>>>>>>>
>>>>>>>> $ egrep -n  '(ParNew.*real=[1-9][0-9]\)' gc.log.0
>>>>>>>>
>>>>>>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069:
>>>>>>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew:
>>>>>>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K),
>>>>>>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs]
>>>>>>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740:
>>>>>>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew:
>>>>>>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K),
>>>>>>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs]
>>>>>>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862:
>>>>>>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew:
>>>>>>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K),
>>>>>>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs]
>>>>>>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160:
>>>>>>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew:
>>>>>>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K),
>>>>>>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs]
>>>>>>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536:
>>>>>>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew:
>>>>>>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K),
>>>>>>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs]
>>>>>>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312:
>>>>>>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew:
>>>>>>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K),
>>>>>>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs]
>>>>>>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071:
>>>>>>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew:
>>>>>>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K),
>>>>>>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs]
>>>>>>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672:
>>>>>>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew:
>>>>>>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K),
>>>>>>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs]
>>>>>>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819:
>>>>>>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew:
>>>>>>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K),
>>>>>>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs]
>>>>>>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891:
>>>>>>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew:
>>>>>>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K),
>>>>>>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs]
>>>>>>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667:
>>>>>>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew:
>>>>>>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K),
>>>>>>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs]
>>>>>>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407:
>>>>>>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew:
>>>>>>>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K),
>>>>>>>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs]
>>>>>>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210:
>>>>>>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew:
>>>>>>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K),
>>>>>>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs]
>>>>>>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189:
>>>>>>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew:
>>>>>>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K),
>>>>>>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs]
>>>>>>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967:
>>>>>>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew:
>>>>>>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K),
>>>>>>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs]
>>>>>>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087:
>>>>>>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew:
>>>>>>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K),
>>>>>>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs]
>>>>>>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886:
>>>>>>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew:
>>>>>>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K),
>>>>>>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs]
>>>>>>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138:
>>>>>>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew:
>>>>>>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K),
>>>>>>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs]
>>>>>>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321:
>>>>>>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew:
>>>>>>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K),
>>>>>>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs]
>>>>>>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808:
>>>>>>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew:
>>>>>>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K),
>>>>>>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs]
>>>>>>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306:
>>>>>>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew:
>>>>>>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K),
>>>>>>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs]
>>>>>>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411:
>>>>>>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew:
>>>>>>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K),
>>>>>>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs]
>>>>>>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385:
>>>>>>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew:
>>>>>>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K),
>>>>>>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs]
>>>>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791:
>>>>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew:
>>>>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K),
>>>>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs]
>>>>>>>>
>>>>>>>> Context around a single instance is fairly normal:
>>>>>>>>
>>>>>>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721:
>>>>>>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew:
>>>>>>>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K),
>>>>>>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs]
>>>>>>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324:
>>>>>>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew:
>>>>>>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K),
>>>>>>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs]
>>>>>>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894:
>>>>>>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew:
>>>>>>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K),
>>>>>>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs]
>>>>>>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487:
>>>>>>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew:
>>>>>>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K),
>>>>>>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs]
>>>>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791:
>>>>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew:
>>>>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K),
>>>>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs]
>>>>>>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139:
>>>>>>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew:
>>>>>>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K),
>>>>>>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs]
>>>>>>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779:
>>>>>>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew:
>>>>>>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K),
>>>>>>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs]
>>>>>>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491:
>>>>>>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew:
>>>>>>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K),
>>>>>>>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs]
>>>>>>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226:
>>>>>>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew:
>>>>>>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K),
>>>>>>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs]
>>>>>>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826:
>>>>>>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew:
>>>>>>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K),
>>>>>>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs]
>>>>>>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332:
>>>>>>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew:
>>>>>>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K),
>>>>>>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs]
>>>>>>>>
>>>>>>>> Since the user times are high as well, I don't think this could be
>>>>>>>> swapping.
>>>>>>>>
>>>>>>> Can you ask the customer if they're using transparent hugepages
>>>>>>> (THP)?
>>>>>>>
>>>>>>>>
>>>>>>>> Here are the hard-earned set of JVM arguments that we're using:
>>>>>>>>
>>>>>>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \
>>>>>>>>   -XX:NewSize=2g -XX:+UseConcMarkSweepGC
>>>>>>>> -XX:+CMSConcurrentMTEnabled \
>>>>>>>>   -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled
>>>>>>>> \
>>>>>>>>   -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \
>>>>>>>>   -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000
>>>>>>>> \
>>>>>>>>   -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC
>>>>>>>> -XX:+UseBiasedLocking \
>>>>>>>>   -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops
>>>>>>>> -XX:PermSize=256M \
>>>>>>>>   -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \
>>>>>>>>   -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar
>>>>>>>> -XX:+UseLargePages \
>>>>>>>>   -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>>>>>>>> -XX:+PrintCommandLineFlags \
>>>>>>>>   -XX:+UseGCLogFileRotation \
>>>>>>>>   -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \
>>>>>>>>   -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log
>>>>>>>>
>>>>>>>> This is on Linux with Java 1.7.0_72.
>>>>>>>>
>>>>>>>> Does this look familiar to anyone? Alternatively, are there some
>>>>>>>> more JVM options that we could include to get more information?
>>>>>>>>
>>>>>>>> One of the first things that we'll try is to move to a later JVM,
>>>>>>>> but it will be easier to get the customer to do that if we can point to a
>>>>>>>> specific issue that has been addressed.
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent from my phone
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Sent from my phone
>>>>>
>>>>> _______________________________________________
>>>>> hotspot-gc-use mailing list
>>>>> hotspot-gc-use at openjdk.java.net
>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Sent from my phone
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161114/7b39723f/attachment-0001.html>

From org.openjdk at io7m.com  Tue Nov 15 16:19:32 2016
From: org.openjdk at io7m.com (org.openjdk at io7m.com)
Date: Tue, 15 Nov 2016 16:19:32 +0000
Subject: Shenandoah: How small is small?
Message-ID: <20161115161932.2b1e3bf0@copperhead.int.arc7.info>

Hello.

I've been watching the development of Shenandoah since it began. As a
developer of software with mildly soft-realtime requirements (games,
primarily), I'm always eager to see advances that can reduce GC pause
times. Although right now I don't have a GC problem (typically, my
pauses for minor GCs are well below 16ms and therefore are not
perceptible given the usual 30hz/60hz game loop), I still feel that I
have to be more conscious of allocation rates than feels natural in
order to avoid producing too much garbage. I sometimes find myself
avoiding better abstractions and immutable objects simply because I
want to avoid allocations. Escape analysis helps, but sometimes those
objects really do need to hang around. Value types will also help!

I've read in JEP 189 that Shenandoah is intended to try to reduce pause
times on +100gb heaps, and a rather outdated blog post online [0]
suggested that a 512mb heap is simply too small to run at all. The
software I write is written under the general assumption that 1GB of
memory will be a minimum requirement - this includes both the JVM heap 
and any allocated native memory.

Right now I'm still using ParNew, although I'll likely move to G1 if it
becomes the default in JDK9. Is Shenandoah likely to be an improvement
for my use case?

Regards,
Mark

[0] https://www.jclarity.com/2014/03/12/shenandoah-experiment-1-will-it-run-pcgen/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 821 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161115/92b62590/attachment.bin>

From martijnverburg at gmail.com  Thu Nov 17 15:02:15 2016
From: martijnverburg at gmail.com (Martijn Verburg)
Date: Thu, 17 Nov 2016 16:02:15 +0100
Subject: Shenandoah: How small is small?
In-Reply-To: <20161115161932.2b1e3bf0@copperhead.int.arc7.info>
References: <20161115161932.2b1e3bf0@copperhead.int.arc7.info>
Message-ID: <CAP7YuASzF_Dfn6mZLf02pO0oLD5tYqzjBCzGOqr6zTp-tDokPQ@mail.gmail.com>

There's a specific shenandoah-dev list - you may get a more accurate
response there as it's a moving target.

Cheers,
Martijn

On 15 November 2016 at 17:19, <org.openjdk at io7m.com> wrote:

> Hello.
>
> I've been watching the development of Shenandoah since it began. As a
> developer of software with mildly soft-realtime requirements (games,
> primarily), I'm always eager to see advances that can reduce GC pause
> times. Although right now I don't have a GC problem (typically, my
> pauses for minor GCs are well below 16ms and therefore are not
> perceptible given the usual 30hz/60hz game loop), I still feel that I
> have to be more conscious of allocation rates than feels natural in
> order to avoid producing too much garbage. I sometimes find myself
> avoiding better abstractions and immutable objects simply because I
> want to avoid allocations. Escape analysis helps, but sometimes those
> objects really do need to hang around. Value types will also help!
>
> I've read in JEP 189 that Shenandoah is intended to try to reduce pause
> times on +100gb heaps, and a rather outdated blog post online [0]
> suggested that a 512mb heap is simply too small to run at all. The
> software I write is written under the general assumption that 1GB of
> memory will be a minimum requirement - this includes both the JVM heap
> and any allocated native memory.
>
> Right now I'm still using ParNew, although I'll likely move to G1 if it
> becomes the default in JDK9. Is Shenandoah likely to be an improvement
> for my use case?
>
> Regards,
> Mark
>
> [0] https://www.jclarity.com/2014/03/12/shenandoah-experiment-1-
> will-it-run-pcgen/
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161117/7ac5a613/attachment.html>

From org.openjdk at io7m.com  Thu Nov 17 15:23:21 2016
From: org.openjdk at io7m.com (org.openjdk at io7m.com)
Date: Thu, 17 Nov 2016 15:23:21 +0000
Subject: Shenandoah: How small is small?
In-Reply-To: <CAP7YuASzF_Dfn6mZLf02pO0oLD5tYqzjBCzGOqr6zTp-tDokPQ@mail.gmail.com>
References: <20161115161932.2b1e3bf0@copperhead.int.arc7.info>
	<CAP7YuASzF_Dfn6mZLf02pO0oLD5tYqzjBCzGOqr6zTp-tDokPQ@mail.gmail.com>
Message-ID: <20161117152321.03cbf6cb@copperhead.int.arc7.info>

On 2016-11-17T16:02:15 +0100
Martijn Verburg <martijnverburg at gmail.com> wrote:

> There's a specific shenandoah-dev list - you may get a more accurate
> response there as it's a moving target.
> 
> Cheers,
> Martijn

Ah, ok, thanks! Wasn't sure about asking there as I'd assumed it was
off-topic (it not strictly being a question about developing
Shenandoah).

Will repost it there.

Regards,
Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 821 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161117/15b3a4bf/attachment.bin>