From thomas.schatzl at oracle.com  Wed Feb  1 13:10:28 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 01 Feb 2017 14:10:28 +0100
Subject: CMS large objects vs G1 humongous allocations
In-Reply-To: <CACoa2tVN0A1_ZfnCCVWhADBKJx4tpDpwtoZRz9wQk4J3eH_nfg@mail.gmail.com>
References: <CACoa2tXRYVW10jRcE43VL3D-JH7njbx3eaLWNOHZ+3MokBsrEw@mail.gmail.com>
	<CAHjP37H2xuM1srz5VJokidasQSdTX_69J7r2AjO-8EbWe+zQdA@mail.gmail.com>
	<CACoa2tUP+M2o1FRZZ8EuBu8WWGeDsj2VWhUnr4SZ7x38EPZuyQ@mail.gmail.com>
	<CAHjP37H=JVBP3kuC2bJ7BdTkfZ2yM3w+PoA_OA6-qLQX60OkdA@mail.gmail.com>
	<CACoa2tXDob+YS76P1GieJWcFKcZvC9i+zufJR+oM455HhtAo1Q@mail.gmail.com>
	<CAHjP37FgAKm1LB7BeRShDxHQFg2LsU8_N4axPOTvD5b98xA-Nw@mail.gmail.com>
	<CAHjP37Feq4KRumNCK7CYf3EtM8AREfBtkvV9nw96QhBxMLvqwQ@mail.gmail.com>
	<1485859866.3425.7.camel@oracle.com>
	<CAHjP37HgBgbtHN3bbhW4c7F1_1CuGFBXxjFZ3G5txbR+6oStVA@mail.gmail.com>
	<CACoa2tVN0A1_ZfnCCVWhADBKJx4tpDpwtoZRz9wQk4J3eH_nfg@mail.gmail.com>
Message-ID: <1485954628.3415.12.camel@oracle.com>

Hi Amit,

On Tue, 2017-01-31 at 22:04 +0530, Amit Balode wrote:
> File is a bit huge:) linked it here?https://raw.githubusercontent.com
> /amitbalode/uploads/master/amit.txt
> 

? some observations:

- some humongous objects; not sure if increasing heap region size helps
- as for the evacuation failures, I think they can be avoided by
capping the maximum young gen size. Every time this happens, the young
gen is really large, however it seems that according to heap size
calculations the surviving objects should actually have enough space.
It does not because the humongous objects may take up too much space,
and at least the printing does not take that into account.

I remember discussing this or similar issues in the past, not sure if
it has been fixed in one way or another in the meantime.

Anyway, capping young gen should be able to avoid this issue at least
some times. Try setting -XX:G1MaxNewSizePercent to something lower than
the default 60%.

Thanks,
? Thomas


From amit.balode at gmail.com  Wed Feb  1 13:48:07 2017
From: amit.balode at gmail.com (Amit Balode)
Date: Wed, 1 Feb 2017 19:18:07 +0530
Subject: CMS large objects vs G1 humongous allocations
In-Reply-To: <1485954628.3415.12.camel@oracle.com>
References: <CACoa2tXRYVW10jRcE43VL3D-JH7njbx3eaLWNOHZ+3MokBsrEw@mail.gmail.com>
	<CAHjP37H2xuM1srz5VJokidasQSdTX_69J7r2AjO-8EbWe+zQdA@mail.gmail.com>
	<CACoa2tUP+M2o1FRZZ8EuBu8WWGeDsj2VWhUnr4SZ7x38EPZuyQ@mail.gmail.com>
	<CAHjP37H=JVBP3kuC2bJ7BdTkfZ2yM3w+PoA_OA6-qLQX60OkdA@mail.gmail.com>
	<CACoa2tXDob+YS76P1GieJWcFKcZvC9i+zufJR+oM455HhtAo1Q@mail.gmail.com>
	<CAHjP37FgAKm1LB7BeRShDxHQFg2LsU8_N4axPOTvD5b98xA-Nw@mail.gmail.com>
	<CAHjP37Feq4KRumNCK7CYf3EtM8AREfBtkvV9nw96QhBxMLvqwQ@mail.gmail.com>
	<1485859866.3425.7.camel@oracle.com>
	<CAHjP37HgBgbtHN3bbhW4c7F1_1CuGFBXxjFZ3G5txbR+6oStVA@mail.gmail.com>
	<CACoa2tVN0A1_ZfnCCVWhADBKJx4tpDpwtoZRz9wQk4J3eH_nfg@mail.gmail.com>
	<1485954628.3415.12.camel@oracle.com>
Message-ID: <CACoa2tXF_GxnMabUXOizr=K-vK9aN0YCGNM1gJC5XUcKKSpkSw@mail.gmail.com>

Hi Thomas, thanks for input.

For "Every time this happens, the young gen is really large, however it
seems that according to heap size calculations the surviving objects should
actually have enough space." - Could you paste the snippet from log which
you referring to?

"I remember discussing this or similar issues in the past, not sure if it
has been fixed in one way or another in the meantime." It would really be
great if you could help dig whether it has been fixed and which release so
we could try upgrading to it.

good point regarding G1MaxNewSizePercent. In general,  I have been trying
to avoid too many customization with G1 and let heuristics decide for
itself but if no option, I will try to put this setting and experiment.

On Wed, Feb 1, 2017 at 6:40 PM, Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi Amit,
>
> On Tue, 2017-01-31 at 22:04 +0530, Amit Balode wrote:
> > File is a bit huge:) linked it here https://raw.githubusercontent.com
> > /amitbalode/uploads/master/amit.txt
> >
>
>   some observations:
>
> - some humongous objects; not sure if increasing heap region size helps
> - as for the evacuation failures, I think they can be avoided by
> capping the maximum young gen size. Every time this happens, the young
> gen is really large, however it seems that according to heap size
> calculations the surviving objects should actually have enough space.
> It does not because the humongous objects may take up too much space,
> and at least the printing does not take that into account.
>
> I remember discussing this or similar issues in the past, not sure if
> it has been fixed in one way or another in the meantime.
>
> Anyway, capping young gen should be able to avoid this issue at least
> some times. Try setting -XX:G1MaxNewSizePercent to something lower than
> the default 60%.
>
> Thanks,
>   Thomas
>
>


-- 
Thanks & Regards,
Amit.Balode
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170201/0d20d2e4/attachment.html>

From amit.balode at gmail.com  Wed Feb  1 13:54:50 2017
From: amit.balode at gmail.com (Amit Balode)
Date: Wed, 1 Feb 2017 19:24:50 +0530
Subject: Deciding between 2MB or 32MB region size in G1
Message-ID: <CACoa2tVabLBVfM9X=jAkXwcqRPcypoOTwMW2B=4AxmBxatZRcw@mail.gmail.com>

Hello, We have multiple applications running in production where predicting
size of the runtime object is kinda tough and random. It could vary from
1KB to 25MB for different applications. To not have too many lingering
configs for different applications, I am trying to come up with standard
set of configs which could be applicable to all applications. Some
applications do not exceed 10KB object size, so I could definitely keep 2MB
as region size for them. But I am wondering what would be disadvantage of
setting all applications to 32MB region size regardless of how small the
object is?

Is it that fragmentation issues will happen more if you have less regions?
If so, will the fragmentation issue happen only during humongous
allocations?
In term of performance, will selection of size change anything?

-- 
Thanks & Regards,
Amit.Balode
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170201/6820d84b/attachment.html>

From gustav.r.akesson at gmail.com  Wed Feb  1 14:44:08 2017
From: gustav.r.akesson at gmail.com (=?UTF-8?Q?Gustav_=C3=85kesson?=)
Date: Wed, 1 Feb 2017 15:44:08 +0100
Subject: Long Parnew pause
Message-ID: <CAKEw5+52eeaSSz+DSe38QOYQLceYE5shdaQcMZhrQLvXs_=K7A@mail.gmail.com>

Hi,

In our application I've observed an occasional and peculiar Parnew GC which
takes several seconds. From what I've been able to gather, it is associated
with an occasional 35mb allocation of data. Those objects are allocated and
tenured (see the bold aging in below logs) and once being promoted to old
generation, that Parnew GC takes around 7 seconds. This surprises me a bit
since it should not take so long time to move 35mb of data to another heap
region.

Looking at the logs, this issue is not related to I/O (zero systime) nor
TTSP (takes few millis to stop the application threads). GC threads seems
to simply spend their time working on the CPU chip. What in Parnew/CMS
could possibly make these 35mb take so long to promote? Some flags that can
shed som light, or any suspicion such as free-list balancing?

I appreciate any input on the matter.

JVM settings and platform information at the end of this mail.

{Heap before GC invocations=1723 (full 0):
 par new generation   total 1887488K, used 1768096K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
0x00007fcd57c80000)
  from space 209664K,  43% used [0x00007fcd64940000, 0x00007fcd6a1680e8,
0x00007fcd71600000)
  to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
0x00007fcd64940000)
 concurrent mark-sweep generation total 34973696K, used 7319028K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121299K, capacity 134343K, committed 134400K,
reserved 135168K
2017-01-26T12:50:11.476+0100: 14135.489: [GC (Allocation Failure)
2017-01-26T12:50:11.476+0100: 14135.489: [ParNew
Desired survivor size 107347968 bytes, new threshold 6 (max 6)
- age   1:   12439600 bytes,   12439600 total
- age   2:    5233256 bytes,   17672856 total
- age   3:    5083408 bytes,   22756264 total
*- age   4:   37639936 bytes,   60396200 total*
- age   5:    4869520 bytes,   65265720 total
- age   6:    4746784 bytes,   70012504 total
: 1768096K->91122K(1887488K), 0.1117876 secs]
9087124K->7412981K(36861184K), 0.1120711 secs] [Times: user=0.85 sys=0.00,
real=0.11 secs]
Heap after GC invocations=1724 (full 0):
 par new generation   total 1887488K, used 91122K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
0x00007fcd57c80000)
  from space 209664K,  43% used [0x00007fcd57c80000, 0x00007fcd5d57ca70,
0x00007fcd64940000)
  to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
0x00007fcd71600000)
 concurrent mark-sweep generation total 34973696K, used 7321858K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121299K, capacity 134343K, committed 134400K,
reserved 135168K
}
2017-01-26T12:50:11.589+0100: 14135.601: Total time for which application
threads were stopped: 0.1174674 seconds, Stopping threads took: 0.0042340
seconds
2017-01-26T12:50:12.168+0100: 14136.181: Application time: 0.5798363 seconds
{Heap before GC invocations=1724 (full 0):
 par new generation   total 1887488K, used 1768946K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
0x00007fcd57c80000)
  from space 209664K,  43% used [0x00007fcd57c80000, 0x00007fcd5d57ca70,
0x00007fcd64940000)
  to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
0x00007fcd71600000)
 concurrent mark-sweep generation total 34973696K, used 7321858K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121299K, capacity 134343K, committed 134400K,
reserved 135168K
2017-01-26T12:50:12.170+0100: 14136.182: [GC (Allocation Failure)
2017-01-26T12:50:12.170+0100: 14136.182: [ParNew
Desired survivor size 107347968 bytes, new threshold 6 (max 6)
- age   1:   10383048 bytes,   10383048 total
- age   2:    5102856 bytes,   15485904 total
- age   3:    5154816 bytes,   20640720 total
- age   4:    5080000 bytes,   25720720 total
*- age   5:   37637680 bytes,   63358400 total*
- age   6:    4658912 bytes,   68017312 total
: 1768946K->86544K(1887488K), 0.0929344 secs]
9090805K->7411133K(36861184K), 0.0932244 secs] [Times: user=0.70 sys=0.00,
real=0.09 secs]
Heap after GC invocations=1725 (full 0):
 par new generation   total 1887488K, used 86544K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
0x00007fcd57c80000)
  from space 209664K,  41% used [0x00007fcd64940000, 0x00007fcd69dc41d0,
0x00007fcd71600000)
  to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
0x00007fcd64940000)
 concurrent mark-sweep generation total 34973696K, used 7324589K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121299K, capacity 134343K, committed 134400K,
reserved 135168K
}
2017-01-26T12:50:12.263+0100: 14136.276: Total time for which application
threads were stopped: 0.0945634 seconds, Stopping threads took: 0.0001968
seconds
2017-01-26T12:50:12.960+0100: 14136.972: Application time: 0.6966358 seconds
{Heap before GC invocations=1725 (full 0):
 par new generation   total 1887488K, used 1764368K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
0x00007fcd57c80000)
  from space 209664K,  41% used [0x00007fcd64940000, 0x00007fcd69dc41d0,
0x00007fcd71600000)
  to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
0x00007fcd64940000)
 concurrent mark-sweep generation total 34973696K, used 7324589K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121324K, capacity 134471K, committed 134656K,
reserved 135168K
2017-01-26T12:50:12.961+0100: 14136.973: [GC (Allocation Failure)
2017-01-26T12:50:12.961+0100: 14136.973: [ParNew
Desired survivor size 107347968 bytes, new threshold 6 (max 6)
- age   1:    8033264 bytes,    8033264 total
- age   2:    5686168 bytes,   13719432 total
- age   3:    5019640 bytes,   18739072 total
- age   4:    5150920 bytes,   23889992 total
- age   5:    5076720 bytes,   28966712 total
*- age   6:   37481736 bytes,   66448448 total*
: 1764368K->79984K(1887488K), 0.0955902 secs]
9088957K->7407366K(36861184K), 0.0958643 secs] [Times: user=0.69 sys=0.00,
real=0.10 secs]
Heap after GC invocations=1726 (full 0):
 par new generation   total 1887488K, used 79984K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
0x00007fcd57c80000)
  from space 209664K,  38% used [0x00007fcd57c80000, 0x00007fcd5ca9c148,
0x00007fcd64940000)
  to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
0x00007fcd71600000)
 concurrent mark-sweep generation total 34973696K, used 7327382K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121324K, capacity 134471K, committed 134656K,
reserved 135168K
}
2017-01-26T12:50:13.057+0100: 14137.069: Total time for which application
threads were stopped: 0.0972200 seconds, Stopping threads took: 0.0001917
seconds
2017-01-26T12:50:13.683+0100: 14137.695: Application time: 0.6259722 seconds
{Heap before GC invocations=1726 (full 0):
 par new generation   total 1887488K, used 1757808K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
0x00007fcd57c80000)
  from space 209664K,  38% used [0x00007fcd57c80000, 0x00007fcd5ca9c148,
0x00007fcd64940000)
  to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
0x00007fcd71600000)
 concurrent mark-sweep generation total 34973696K, used 7327382K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121324K, capacity 134471K, committed 134656K,
reserved 135168K
*2017-01-26T12:50:13.684+0100: 14137.697: [GC (Allocation Failure)
2017-01-26T12:50:13.684+0100: 14137.697: [ParNew*
*Desired survivor size 107347968 bytes, new threshold 6 (max 6)*
*- age   1:   10784424 bytes,   10784424 total*
*- age   2:    5148032 bytes,   15932456 total*
*- age   3:    5607232 bytes,   21539688 total*
*- age   4:    5013024 bytes,   26552712 total*
*- age   5:    5148840 bytes,   31701552 total*
*- age   6:    4839808 bytes,   36541360 total*
*: 1757808K->58357K(1887488K), 7.4626505 secs]
9085190K->7420330K(36861184K), 7.4629090 secs] [Times: user=58.63 sys=0.00,
real=7.47 secs] *
Heap after GC invocations=1727 (full 0):
 par new generation   total 1887488K, used 58357K [0x00007fccf1600000,
0x00007fcd71600000, 0x00007fcd71600000)
  eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
0x00007fcd57c80000)
  from space 209664K,  27% used [0x00007fcd64940000, 0x00007fcd6823d650,
0x00007fcd71600000)
  to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
0x00007fcd64940000)
 concurrent mark-sweep generation total 34973696K, used 7361973K
[0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
 Metaspace       used 121324K, capacity 134471K, committed 134656K,
reserved 135168K
}
*2017-01-26T12:50:21.147+0100: 14145.160: Total time for which application
threads were stopped: 7.4642882 seconds, Stopping threads took: 0.0002572
seconds*


Java HotSpot(TM) 64-Bit Server VM (25.112-b15) for linux-amd64 JRE
(1.8.0_112-b15), built on Sep 22 2016 21:10:53 by "java_re" with gcc 4.3.0
20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 49427048k(42773024k free), swap 4194300k(4194300k
free)
-XX:+AlwaysPreTouch
-XX:+CMSEdenChunksRecordAlways
-XX:CMSInitiatingOccupancyFraction=80
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:CMSWaitDuration=60000
-XX:+DisableExplicitGC
-XX:GCLogFileSize=31457280
-XX:InitialHeapSize=37959499776
-XX:MaxHeapSize=37959499776
-XX:MaxMetaspaceSize=268435456
-XX:MaxNewSize=2147483648
-XX:MaxTenuringThreshold=6
-XX:MetaspaceSize=268435456
-XX:NewSize=2147483648
-XX:+UseBiasedLocking
-XX:+UseCMSInitiatingOccupancyOnly
-XX:-UseCompressedOops
-XX:+UseConcMarkSweepGC
-XX:+UseGCLogFileRotation
-XX:+UseLargePages
-XX:+UseParNewGC


Best Regards,
Gustav ?kesson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170201/88149404/attachment.html>

From thomas.schatzl at oracle.com  Thu Feb  2 11:07:33 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 02 Feb 2017 12:07:33 +0100
Subject: CMS large objects vs G1 humongous allocations
In-Reply-To: <CACoa2tXF_GxnMabUXOizr=K-vK9aN0YCGNM1gJC5XUcKKSpkSw@mail.gmail.com>
References: <CACoa2tXRYVW10jRcE43VL3D-JH7njbx3eaLWNOHZ+3MokBsrEw@mail.gmail.com>
	<CAHjP37H2xuM1srz5VJokidasQSdTX_69J7r2AjO-8EbWe+zQdA@mail.gmail.com>
	<CACoa2tUP+M2o1FRZZ8EuBu8WWGeDsj2VWhUnr4SZ7x38EPZuyQ@mail.gmail.com>
	<CAHjP37H=JVBP3kuC2bJ7BdTkfZ2yM3w+PoA_OA6-qLQX60OkdA@mail.gmail.com>
	<CACoa2tXDob+YS76P1GieJWcFKcZvC9i+zufJR+oM455HhtAo1Q@mail.gmail.com>
	<CAHjP37FgAKm1LB7BeRShDxHQFg2LsU8_N4axPOTvD5b98xA-Nw@mail.gmail.com>
	<CAHjP37Feq4KRumNCK7CYf3EtM8AREfBtkvV9nw96QhBxMLvqwQ@mail.gmail.com>
	<1485859866.3425.7.camel@oracle.com>
	<CAHjP37HgBgbtHN3bbhW4c7F1_1CuGFBXxjFZ3G5txbR+6oStVA@mail.gmail.com>
	<CACoa2tVN0A1_ZfnCCVWhADBKJx4tpDpwtoZRz9wQk4J3eH_nfg@mail.gmail.com>
	<1485954628.3415.12.camel@oracle.com>
	<CACoa2tXF_GxnMabUXOizr=K-vK9aN0YCGNM1gJC5XUcKKSpkSw@mail.gmail.com>
Message-ID: <1486033653.8016.23.camel@oracle.com>

Hi,

On Wed, 2017-02-01 at 19:18 +0530, Amit Balode wrote:
> Hi Thomas, thanks for input.
> 
> For "Every time this happens, the young?gen is really?large, however
> it seems that according to heap size?calculations the
> surviving?objects?should actually have enough space." - Could you
> paste the snippet from log which you referring to??

? ?[Eden: 8960.0M(8960.0M)->0.0B(288.0M) Survivors: 864.0M->512.0M
Heap: 13.6G(16.0G)->2112.0M(16.0G)]

? ?[Eden: 8832.0M(8960.0M)->0.0B(800.0M) Survivors: 864.0M->0.0B Heap:
13.9G(16.0G)->11.6G(16.0G)]

? ?[Eden: 8960.0M(8960.0M)->0.0B(8544.0M) Survivors: 320.0M->512.0M
Heap: 13.3G(16.0G)->2624.0M(16.0G)]

? ?[Eden: 8416.0M(9600.0M)->0.0B(9440.0M) Survivors: 224.0M->384.0M
Heap: 13.1G(16.0G)->2392.0M(16.0G)]

For the GCs that had evacuation failure.

According to these lines the heap occupancy for those is e.g. 13.1G,
i.e. quite a bit lower than 16G, which should in theory be enough to
cover the promotion (looking at previous gcs, it is at most in the few
100MBs).

(Caveat: there are a lot of assumptions in application behavior here)?

So the 13.1G (which means 2.9G free) may be somewhat misleading. It
shows free memory, but not memory that can be allocated into. I could
guess this is from humongous objects.

So we are probably closer to full heap than we think we are.

> "I remember discussing this or similar issues in the past, not sure
> if?it has been fixed in one way or another in the meantime." It would
> really be great if you could help dig whether it has been fixed and
> which release so we could try upgrading to it.

One of the issues I remember is that garbage collection itself wasted
quite a bit of heap with PLAB sizing (gc threads don't allocate object
by object, but get memory to copy to in largish chunks, the PLABs, for
various reasons); the existing young gen calculation mostly assumes
that there is mostly no memory overhead because of this (but there are
some "heuristics" in there of course).

In memory tight situations this may cause that problem.

This sometimes excessive java heap consumption during gc has been
improved a lot with jdk9; further evacuation failures are very fast
with that.

One other option for any older release is the mentioned
G1MaxNewSizePercent which basically limits the amount of data copied
during gc (so that the other heuristics are good). Others are fixing
PLAB size (potentially impacting gc performance), or increasing
G1ReservePercent (the "heuristics" mentioned above).

> good point regarding?G1MaxNewSizePercent. In general, ?I have been
> trying to avoid too many?customization with G1 and let heuristics
> decide for itself but if no option, I will try to put this setting
> and experiment.

We recommend to at least try without options with G1. Very very often
they are quite successful in achieving their goals.

Thanks,
? Thomas


From yu.zhang at oracle.com  Thu Feb  2 16:20:03 2017
From: yu.zhang at oracle.com (yu.zhang at oracle.com)
Date: Thu, 2 Feb 2017 08:20:03 -0800
Subject: Deciding between 2MB or 32MB region size in G1
In-Reply-To: <CACoa2tVabLBVfM9X=jAkXwcqRPcypoOTwMW2B=4AxmBxatZRcw@mail.gmail.com>
References: <CACoa2tVabLBVfM9X=jAkXwcqRPcypoOTwMW2B=4AxmBxatZRcw@mail.gmail.com>
Message-ID: <aa61023b-bdab-b202-9761-36d1b5e26e12@oracle.com>

Hi, Amit,

IMO, there is no one size fits all.

Some considerations about the bigger region size:

Reduce the humongous objects. The humongous objects are allocated in old 
gen. If they can not be collected during young gc, they can fill up the 
old gen quickly without marking or full gc.

Less remember set to keep track of.

Bigger TLAB. This could be good or bad. With bigger tlab, threads need 
less refill trip, but may waste more tlab space. It depends on the 
objects size.

Possible bigger waste due to humongous objects (depends on the size of 
the objects)

Possible end of region waste for allocation.

Maybe others have more comments.

Thanks

Jenny


On 02/01/2017 05:54 AM, Amit Balode wrote:
> Hello, We have multiple applications running in production where 
> predicting size of the runtime object is kinda tough and random. It 
> could vary from 1KB to 25MB for different applications. To not have 
> too many lingering configs for different applications, I am trying to 
> come up with standard set of configs which could be applicable to all 
> applications. Some applications do not exceed 10KB object size, so I 
> could definitely keep 2MB as region size for them. But I am wondering 
> what would be disadvantage of setting all applications to 32MB region 
> size regardless of how small the object is?
>
> Is it that fragmentation issues will happen more if you have less 
> regions? If so, will the fragmentation issue happen only during 
> humongous allocations?
> In term of performance, will selection of size change anything?
>
> -- 
> Thanks & Regards,
> Amit.Balode
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170202/ac2b913f/attachment-0001.html>

From thomas.schatzl at oracle.com  Fri Feb  3 13:01:53 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 03 Feb 2017 14:01:53 +0100
Subject: Need help on G1 GC young gen Update RS and Scan RS pause reduction
In-Reply-To: <MWHPR05MB31181BC64AB111F7D98E6B90F04F0@MWHPR05MB3118.namprd05.prod.outlook.com>
References: <MWHPR05MB3118FCF54B533C7E26EF191AF07E0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1484852604.6579.27.camel@oracle.com>
	<MWHPR05MB311884733FEF15CCA084BB22F0710@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485244966.2883.8.camel@oracle.com>
	<MWHPR05MB3118C22043AE3A69E096FA9FF0750@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485338858.3625.42.camel@oracle.com>
	<MWHPR05MB31181BC64AB111F7D98E6B90F04F0@MWHPR05MB3118.namprd05.prod.outlook.com>
Message-ID: <1486126913.2892.31.camel@oracle.com>

Hi Amit,

On Fri, 2017-02-03 at 11:09 +0000, Amit Mishra wrote:
> Hi Thomas/team,
> 
> 
> I have put all parameters as per your suggestion but somehow the
> minor gc pauses are still haunting.
> 
> Attaching GC logs.
> 
> 
> bash-3.2$ grep -i young gcstats.log.10636|cut -d, -f2|awk -F" "
> '{print $1}'|awk '$1 > 1'
> 1.1273134
> 1.1683221
> 3.5504848
> 5.2693987

? looking at these log entries, there seems to be something going on
that seems outside of VM control:

- from one gc to another, just for these four gcs, sys time is
relatively high.

- for the last two occurrences, at least one thread is hanging in "Ext
Root Scanning" for almost all of the gc time for no obvious reason.

- there do not seem to be an unusually large amount of changes in the
amount of work done in the particular phases that would raise immediate
concerns to me.

Please try to find out the source of the high sys time and maybe even
what causes it. I can't help a lot in that area, but dtrace seems a
good starting point as suggested earlier.

I think we went through most obvious tunings now, but maybe somebody else has more ideas. I don't at this time.

The jdk (7u45) you are using is also very old, so even if we find that
there is something wrong with g1 in particular, I kind of doubt there
are many more useful knobs to turn with that version (or even
appropriate logging to find out about the actual issue). Since 7u45
release, there have been hundreds of changes that in particular improve
G1 performance, so please consider upgrading to something more recent
(at least latest 8u, preferably to me some test runs with 9ea).
Upgrading alone might already help.

Thanks,
? Thomas


From Milan.Mimica at infobip.com  Fri Feb  3 16:22:56 2017
From: Milan.Mimica at infobip.com (Milan Mimica)
Date: Fri, 3 Feb 2017 16:22:56 +0000
Subject: G1 native memory consumption
In-Reply-To: <1485168079.2811.21.camel@oracle.com>
References: <1484943874550.90103@infobip.com>,
	<1485168079.2811.21.camel@oracle.com>
Message-ID: <1486138975652.57172@infobip.com>

Hi Thomas

Thanks for your input. I took me a while to have a stable system again to repeat measurements.

I have tried setting G1HeapRegionSize to 16M on one instance (8M is default) and I notice lower GC memory usage:
GC (reserved=1117MB -18MB, committed=1117MB -18MB)
vs
GC (reserved=1604MB +313MB, committed=1604MB +313MB)

It seems more stable too. However, "Internal" is still relatively high for a 25G heap, and there is no much difference between instances:
Internal (reserved=2132MB -7MB, committed=2132MB -7MB)


Milan Mimica, Senior Software Engineer / Division Lead
________________________________________
From: Thomas Schatzl <thomas.schatzl at oracle.com>
Sent: Monday, January 23, 2017 11:41
To: Milan Mimica; hotspot-gc-use at openjdk.java.net
Subject: Re: G1 native memory consumption

Hi Milan,

On Fri, 2017-01-20 at 20:24 +0000, Milan Mimica wrote:
> Hi
>
> I'm inspecting memory consumption issues of a service running on
> java-8u102, linux. The service is running for a few days now, and in
> a few days more it would consume all of 32GB physical memory
> available, and get killed by OOM Killer.
> Questions:
> - If the code is not allocating any significant off-heap memory,
> neither by Unsafe.allocateMemory or by external library, isn't 7GB
> native memory overhead supposed to be enough for a 25GB heap?
> - Why so much memory spent on "Internal" category, apparently from G1
> thread?

G1 remembered sets typically consume approximately 10% of java heap,
depending on application, heap and your remembered set configuration.

This remembered set contains information that is necessary to be able
to do incremental and partial old generation compaction.

So the 7GB should be sufficient.

You can decrease remembered set overhead by e.g. increasing heap region
size. Given the heap size you mentioned it seems it would be worth a
try to go to 16M regions.


> Find the attached jemalloc heap profile, showing "live"
> allocations that happened in about 30 hour timespan, and a
> NMT profile of approximately same period.
> Profiling was done after some warm-up time time, and with a
> manually triggered Full GC in between just to give the JVM a chance
> to clean up everything.

Remembered set memory consumption should level out after some time,
where "some" may be quite a bit of time. I.e. it may take much longer
than for other collectors. It also depends on the memory allocator.

Do you have any new measurements after this weekend? It should have
levelled out by this time.

JDK9 contains some improvements on memory usage in exactly this area.
There will likely be further improvements in this area going forward.

Thanks,
  Thomas

From amit.balode at gmail.com  Fri Feb  3 16:42:33 2017
From: amit.balode at gmail.com (Amit Balode)
Date: Fri, 3 Feb 2017 22:12:33 +0530
Subject: Deciding between 2MB or 32MB region size in G1
In-Reply-To: <aa61023b-bdab-b202-9761-36d1b5e26e12@oracle.com>
References: <CACoa2tVabLBVfM9X=jAkXwcqRPcypoOTwMW2B=4AxmBxatZRcw@mail.gmail.com>
	<aa61023b-bdab-b202-9761-36d1b5e26e12@oracle.com>
Message-ID: <CACoa2tWxnmyPYQG8Kff35r=b7GFGQOA1S4iZ1jCExZizLA9EWQ@mail.gmail.com>

Yeah, humongous allocation savings is a bigger advantage to have as
compared to some amount of fragmentation which will come with larger 32MB.

Would love to hear more comments.

On Thu, Feb 2, 2017 at 9:50 PM, yu.zhang at oracle.com <yu.zhang at oracle.com>
wrote:

> Hi, Amit,
>
> IMO, there is no one size fits all.
>
> Some considerations about the bigger region size:
>
> Reduce the humongous objects. The humongous objects are allocated in old
> gen. If they can not be collected during young gc, they can fill up the old
> gen quickly without marking or full gc.
>
> Less remember set to keep track of.
>
> Bigger TLAB. This could be good or bad. With bigger tlab, threads need
> less refill trip, but may waste more tlab space. It depends on the objects
> size.
>
> Possible bigger waste due to humongous objects (depends on the size of the
> objects)
>
> Possible end of region waste for allocation.
>
> Maybe others have more comments.
>
> Thanks
>
> Jenny
>
> On 02/01/2017 05:54 AM, Amit Balode wrote:
>
> Hello, We have multiple applications running in production where
> predicting size of the runtime object is kinda tough and random. It could
> vary from 1KB to 25MB for different applications. To not have too many
> lingering configs for different applications, I am trying to come up with
> standard set of configs which could be applicable to all applications. Some
> applications do not exceed 10KB object size, so I could definitely keep 2MB
> as region size for them. But I am wondering what would be disadvantage of
> setting all applications to 32MB region size regardless of how small the
> object is?
>
> Is it that fragmentation issues will happen more if you have less regions?
> If so, will the fragmentation issue happen only during humongous
> allocations?
> In term of performance, will selection of size change anything?
>
> --
> Thanks & Regards,
> Amit.Balode
>
>
>
> _______________________________________________
> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
>


-- 
Thanks & Regards,
Amit.Balode
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170203/b7206f69/attachment.html>

From amit.balode at gmail.com  Fri Feb  3 16:46:26 2017
From: amit.balode at gmail.com (Amit Balode)
Date: Fri, 3 Feb 2017 22:16:26 +0530
Subject: CMS large objects vs G1 humongous allocations
In-Reply-To: <1486033653.8016.23.camel@oracle.com>
References: <CACoa2tXRYVW10jRcE43VL3D-JH7njbx3eaLWNOHZ+3MokBsrEw@mail.gmail.com>
	<CAHjP37H2xuM1srz5VJokidasQSdTX_69J7r2AjO-8EbWe+zQdA@mail.gmail.com>
	<CACoa2tUP+M2o1FRZZ8EuBu8WWGeDsj2VWhUnr4SZ7x38EPZuyQ@mail.gmail.com>
	<CAHjP37H=JVBP3kuC2bJ7BdTkfZ2yM3w+PoA_OA6-qLQX60OkdA@mail.gmail.com>
	<CACoa2tXDob+YS76P1GieJWcFKcZvC9i+zufJR+oM455HhtAo1Q@mail.gmail.com>
	<CAHjP37FgAKm1LB7BeRShDxHQFg2LsU8_N4axPOTvD5b98xA-Nw@mail.gmail.com>
	<CAHjP37Feq4KRumNCK7CYf3EtM8AREfBtkvV9nw96QhBxMLvqwQ@mail.gmail.com>
	<1485859866.3425.7.camel@oracle.com>
	<CAHjP37HgBgbtHN3bbhW4c7F1_1CuGFBXxjFZ3G5txbR+6oStVA@mail.gmail.com>
	<CACoa2tVN0A1_ZfnCCVWhADBKJx4tpDpwtoZRz9wQk4J3eH_nfg@mail.gmail.com>
	<1485954628.3415.12.camel@oracle.com>
	<CACoa2tXF_GxnMabUXOizr=K-vK9aN0YCGNM1gJC5XUcKKSpkSw@mail.gmail.com>
	<1486033653.8016.23.camel@oracle.com>
Message-ID: <CACoa2tVrzDXcCtkP73im700Q+zCY7VQz4SquWeNZrdZOVtgKiw@mail.gmail.com>

Thomas, thanks a lot of inputs. I will try out those options as you and
Vitaly mentioned.

On Thu, Feb 2, 2017 at 4:37 PM, Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi,
>
> On Wed, 2017-02-01 at 19:18 +0530, Amit Balode wrote:
> > Hi Thomas, thanks for input.
> >
> > For "Every time this happens, the young gen is really large, however
> > it seems that according to heap size calculations the
> > surviving objects should actually have enough space." - Could you
> > paste the snippet from log which you referring to?
>
>    [Eden: 8960.0M(8960.0M)->0.0B(288.0M) Survivors: 864.0M->512.0M
> Heap: 13.6G(16.0G)->2112.0M(16.0G)]
>
>    [Eden: 8832.0M(8960.0M)->0.0B(800.0M) Survivors: 864.0M->0.0B Heap:
> 13.9G(16.0G)->11.6G(16.0G)]
>
>    [Eden: 8960.0M(8960.0M)->0.0B(8544.0M) Survivors: 320.0M->512.0M
> Heap: 13.3G(16.0G)->2624.0M(16.0G)]
>
>    [Eden: 8416.0M(9600.0M)->0.0B(9440.0M) Survivors: 224.0M->384.0M
> Heap: 13.1G(16.0G)->2392.0M(16.0G)]
>
> For the GCs that had evacuation failure.
>
> According to these lines the heap occupancy for those is e.g. 13.1G,
> i.e. quite a bit lower than 16G, which should in theory be enough to
> cover the promotion (looking at previous gcs, it is at most in the few
> 100MBs).
>
> (Caveat: there are a lot of assumptions in application behavior here)
>
> So the 13.1G (which means 2.9G free) may be somewhat misleading. It
> shows free memory, but not memory that can be allocated into. I could
> guess this is from humongous objects.
>
> So we are probably closer to full heap than we think we are.
>
> > "I remember discussing this or similar issues in the past, not sure
> > if it has been fixed in one way or another in the meantime." It would
> > really be great if you could help dig whether it has been fixed and
> > which release so we could try upgrading to it.
>
> One of the issues I remember is that garbage collection itself wasted
> quite a bit of heap with PLAB sizing (gc threads don't allocate object
> by object, but get memory to copy to in largish chunks, the PLABs, for
> various reasons); the existing young gen calculation mostly assumes
> that there is mostly no memory overhead because of this (but there are
> some "heuristics" in there of course).
>
> In memory tight situations this may cause that problem.
>
> This sometimes excessive java heap consumption during gc has been
> improved a lot with jdk9; further evacuation failures are very fast
> with that.
>
> One other option for any older release is the mentioned
> G1MaxNewSizePercent which basically limits the amount of data copied
> during gc (so that the other heuristics are good). Others are fixing
> PLAB size (potentially impacting gc performance), or increasing
> G1ReservePercent (the "heuristics" mentioned above).
>
> > good point regarding G1MaxNewSizePercent. In general,  I have been
> > trying to avoid too many customization with G1 and let heuristics
> > decide for itself but if no option, I will try to put this setting
> > and experiment.
>
> We recommend to at least try without options with G1. Very very often
> they are quite successful in achieving their goals.
>
> Thanks,
>   Thomas
>
>


-- 
Thanks & Regards,
Amit.Balode
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170203/bb6f050f/attachment.html>

From amit.mishra at redknee.com  Fri Feb  3 11:09:48 2017
From: amit.mishra at redknee.com (Amit Mishra)
Date: Fri, 3 Feb 2017 11:09:48 +0000
Subject: Need help on G1 GC young gen Update RS and Scan RS pause reduction
References: <MWHPR05MB3118FCF54B533C7E26EF191AF07E0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1484852604.6579.27.camel@oracle.com>
	<MWHPR05MB311884733FEF15CCA084BB22F0710@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485244966.2883.8.camel@oracle.com>
	<MWHPR05MB3118C22043AE3A69E096FA9FF0750@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485338858.3625.42.camel@oracle.com> 
Message-ID: <MWHPR05MB31181BC64AB111F7D98E6B90F04F0@MWHPR05MB3118.namprd05.prod.outlook.com>

Hi Thomas/team,


I have put all parameters as per your suggestion but somehow the minor gc pauses are still haunting.

Attaching GC logs.


bash-3.2$ grep -i young gcstats.log.10636|cut -d, -f2|awk -F" " '{print $1}'|awk '$1 > 1'
1.1273134
1.1683221
3.5504848
5.2693987

Kindly suggest me next action.

GC parameters

argv[0]: /usr/java1.7/bin/amd64/java
argv[11]: -Xmx48g
argv[12]: -Xms48g
argv[13]: -XX:-EliminateLocks
argv[14]: -Dorg.omg.CORBA.ORBSingletonClass=org.jacorb.orb.ORBSingleton
argv[15]: -Dorg.omg.CORBA.ORBClass=org.jacorb.orb.ORB
argv[18]: -XX:-ReduceInitialCardMarks
argv[19]: -server
argv[21]: -classpath
argv[25]: -Xss1m
argv[26]: -Xoss1m
argv[27]: -XX:NewSize=1024m
argv[28]: -XX:MaxNewSize=3072m
argv[29]: -XX:PermSize=512m
argv[30]: -XX:MaxPermSize=512m
argv[31]: -XX:ReservedCodeCacheSize=128m
argv[32]: -XX:+HeapDumpOnOutOfMemoryError
argv[33]: -XX:+AggressiveOpts
argv[34]: -Dnetworkaddress.cache.ttl=3600
argv[35]: -Dcom.sun.management.jmxremote.port=11883
argv[36]: -Dcom.sun.management.jmxremote.ssl=false
argv[37]: -Dcom.sun.management.jmxremote.authenticate=false
argv[38]: -XX:+UseG1GC
argv[39]: -XX:MaxGCPauseMillis=500
argv[40]: -XX:+PrintFlagsFinal
argv[41]: -XX:G1RSetUpdatingPauseTimePercent=5
argv[42]: -XX:+PrintGCTimeStamps
argv[43]: -XX:+PrintGCDetails
argv[46]: -XX:+UseLargePages
argv[47]: -XX:+MaxFDLimit
argv[51]: -XX:+ParallelRefProcEnabled
argv[52]: -XX:+DisableExplicitGC
argv[53]: -XX:+UnlockDiagnosticVMOptions
argv[54]: -XX:+G1SummarizeRSetStats
argv[55]: -XX:G1SummarizeRSetStatsPeriod=1
argv[56]: -XX:+PerfDisableSharedMem
argv[57]: -XX:+AlwaysPreTouch
argv[58]: -XX:G1HeapRegionSize=32M
argv[59]: -XX:G1RSetRegionEntries=2048
argv[60]: -XX:+UnlockDiagnosticVMOptions


Thanks,
Amit Mishra


-----Original Message-----
From: Amit Mishra 
Sent: Wednesday, January 25, 2017 15:47
To: 'Thomas Schatzl' <thomas.schatzl at oracle.com>; hotspot-gc-use at openjdk.java.net
Subject: RE: Need help on G1 GC young gen Update RS and Scan RS pause reduction

Thank you very much Thomas but based on our CMS experience we do also set NewGen Size same as Max New Gen to avoid shrinking and expansion of new gen in real time which sometimes results in unexplainable pauses, can we do the same thing here as well to set min/max size as 1 G and see if it improves overall situation , next thing I am going to do it to increase InitiatingHeapOccupancyPercent from 40 to 60% which we generally set for CMS.(CMSInitiatingOccupancyFraction)

I am doing these changes and will let you know once again.

Regards,
Amit Mishra

-----Original Message-----
From: Thomas Schatzl [mailto:thomas.schatzl at oracle.com]
Sent: Wednesday, January 25, 2017 15:38
To: Amit Mishra <amit.mishra at redknee.com>; hotspot-gc-use at openjdk.java.net
Subject: Re: Need help on G1 GC young gen Update RS and Scan RS pause reduction

Hi Amit,

On Tue, 2017-01-24 at 10:41 +0000, Amit Mishra wrote:
> Hello Thomas/team,
> 
> I have put parameters as per your suggestion and now update RS time is 
> manageable but Scan RS and Object copy time are high causing pause 
> time to go beyond 1 second while we do expect max pause time to not to 
> be greater than 500 ms.

From the log it seems that the application, at least during startup time, has quite a bit of variance in the amount of objects that are held live from one garbage collection to the other.

It is quite common for applications to behave differently in this area during startup actually.

Since G1 sizes the young gen based on previous measurements, if there is a long stretch of garbage collections that do not need lots of work, it will increase the size of the young gen. However, at some point the application's behavior changes (i.e. the number of objects that need to be preserved during collection), and then these long pauses occur.

If these long garbage collections are really not desired, the only way I see is to decrease the maximum young generation size, not the minimum one as I wrongly suggested.

From that log I can see that the issue only occurs at the beginning of the run; i.e. the second occurrence is at around 310s, all other ~4100s are fine, using a pretty large young gen (~25G).

To avoid these long pauses you would need to set maximum young gen at (I would guess) around 2G. Since this is a global setting for the entire run, this will decrease throughput a lot.

It's up to you to determine whether this is okay in your case.

To set minimum (and maximum) young generation size there are two sets of options:

Set
? -XX:G1NewSizePercent (back) to 1 and
? -XX:G1MaxNewSizePercent to something like 4-5, maybe 3.

As you might have noticed, there is not much wiggle room with percentages in your case any more, but you can also set absolute min/max young gen sizes via

? -XX:NewSize=X , probably something around or above 1G seems good.
? -XX:MaxNewSize=Y , in your case something around 2-3G should work.

[Please make sure that you set both NewSize/MaxNewSize, otherwise you might experience somewhat unexpected behavior]

I also looked through your current settings below.

> Value of pause are as below(attaching complete gc file for you) grep 
> -i young gcstats.log.3319|cut -d, -f2|awk -F" " '{print $1}'|awk
> '$1 > 1'
> 1.4668911
> 1.2109846
> 
> Note : I am observing pauses just after 4-5 minutes after Application 
> restart post new GC parameters implementation.
> 
> GC parameters are as:
> 
> 
> argv[11]: -Xmx48g
> argv[12]: -Xms48g
> argv[13]: -XX:-EliminateLocks
> argv[14]:
> -Dorg.omg.CORBA.ORBSingletonClass=org.jacorb.orb.ORBSingleton
> argv[15]: -Dorg.omg.CORBA.ORBClass=org.jacorb.orb.ORB
> argv[18]: -XX:-ReduceInitialCardMarks
> argv[19]: -server
> argv[21]: -classpath
> argv[24]: -Djava.io.tmpdir=/tmp
> argv[25]: -Xss1m
> argv[26]: -Xoss1m
> argv[27]: -XX:PermSize=512m
> argv[28]: -XX:MaxPermSize=512m
> argv[29]: -XX:ReservedCodeCacheSize=128m
> argv[30]: -XX:+HeapDumpOnOutOfMemoryError
> argv[31]: -XX:+AggressiveOpts
> argv[32]: -Dnetworkaddress.cache.ttl=3600
> argv[33]: -Dcom.sun.management.jmxremote.port=11883
> argv[34]: -Dcom.sun.management.jmxremote.ssl=false
> argv[35]: -Dcom.sun.management.jmxremote.authenticate=false
> argv[36]: -XX:+UseG1GC
> argv[37]: -XX:MaxGCPauseMillis=500
> argv[38]: -XX:+PrintFlagsFinal
> argv[39]: -XX:G1RSetUpdatingPauseTimePercent=5
> argv[40]: -XX:+PrintGCTimeStamps
> argv[41]: -XX:+PrintGCDetails
> argv[43]: -verbose:gc
> argv[44]: -XX:+UseLargePages
> argv[45]: -XX:+MaxFDLimit
> argv[49]: -XX:+UnlockExperimentalVMOptions
> argv[50]: -XX:G1NewSizePercent=2

^--- as suggested above, use either
G1NewSizePercent/G1MaxNewSizePercent or NewSize/MaxNewSize with the suggested values. NewSize/MaxNewSize don't need the UnlockExperimentalVMOptions btw.

> argv[51]: -XX:+ParallelRefProcEnabled
> argv[52]: -XX:+DisableExplicitGC
> argv[53]: -XX:ParallelGCThreads=70

^--- again, the recommendation is to not set ParallelGCThreads to anything above the number of virtual cpus you have. You could try removing this again.?

> argv[54]: -XX:InitiatingHeapOccupancyPercent=40

I think this is a bit too conservative, but I don't have a good value, and maybe it is required later in the run. Independent on whether you cap maximum new size or not, it might be useful for throughput to increase this a lot.

Some plug for JDK9: G1 automatically determines rather good values for that with that version. :)

> argv[55]: -XX:+UnlockDiagnosticVMOptions
> argv[56]: -XX:+G1SummarizeRSetStats
> argv[57]: -XX:G1SummarizeRSetStatsPeriod=1

^--- again, don't use these two production after your testing completes. If you remove them, I think you can also remove UnlockDiagnosticVMOptions.

> argv[58]: -XX:+PerfDisableSharedMem
> argv[59]: -XX:+AlwaysPreTouch
> argv[60]: -XX:G1HeapRegionSize=32M
> argv[61]: -XX:G1RSetRegionEntries=2048
> argv[62]: -XX:+UnlockDiagnosticVMOptions

^--- no need to repeat that at the end.

Hth,
? Thomas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcstats.log.10636_cps_3rdfeb.gz
Type: application/x-gzip
Size: 1877106 bytes
Desc: gcstats.log.10636_cps_3rdfeb.gz
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170203/76762dfc/gcstats.log.10636_cps_3rdfeb-0001.gz>

From charlie.hunt at oracle.com  Fri Feb  3 18:41:31 2017
From: charlie.hunt at oracle.com (charlie hunt)
Date: Fri, 3 Feb 2017 12:41:31 -0600
Subject: Need help on G1 GC young gen Update RS and Scan RS pause reduction
In-Reply-To: <1486126913.2892.31.camel@oracle.com>
References: <MWHPR05MB3118FCF54B533C7E26EF191AF07E0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1484852604.6579.27.camel@oracle.com>
	<MWHPR05MB311884733FEF15CCA084BB22F0710@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485244966.2883.8.camel@oracle.com>
	<MWHPR05MB3118C22043AE3A69E096FA9FF0750@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485338858.3625.42.camel@oracle.com>
	<MWHPR05MB31181BC64AB111F7D98E6B90F04F0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1486126913.2892.31.camel@oracle.com>
Message-ID: <4712011A-5C3D-471B-A238-DF1A08267DDC@oracle.com>

> ?- from one gc to another, just for these four gcs, sys time is relatively high.?

Assuming this is on Linux ? perhaps double check that THP (transparent huge pages) is disabled.

charlie

> On Feb 3, 2017, at 7:01 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> Hi Amit,
> 
> On Fri, 2017-02-03 at 11:09 +0000, Amit Mishra wrote:
>> Hi Thomas/team,
>> 
>> 
>> I have put all parameters as per your suggestion but somehow the
>> minor gc pauses are still haunting.
>> 
>> Attaching GC logs.
>> 
>> 
>> bash-3.2$ grep -i young gcstats.log.10636|cut -d, -f2|awk -F" "
>> '{print $1}'|awk '$1 > 1'
>> 1.1273134
>> 1.1683221
>> 3.5504848
>> 5.2693987
> 
>   looking at these log entries, there seems to be something going on
> that seems outside of VM control:
> 
> - from one gc to another, just for these four gcs, sys time is
> relatively high.
> 
> - for the last two occurrences, at least one thread is hanging in "Ext
> Root Scanning" for almost all of the gc time for no obvious reason.
> 
> - there do not seem to be an unusually large amount of changes in the
> amount of work done in the particular phases that would raise immediate
> concerns to me.
> 
> Please try to find out the source of the high sys time and maybe even
> what causes it. I can't help a lot in that area, but dtrace seems a
> good starting point as suggested earlier.
> 
> I think we went through most obvious tunings now, but maybe somebody else has more ideas. I don't at this time.
> 
> The jdk (7u45) you are using is also very old, so even if we find that
> there is something wrong with g1 in particular, I kind of doubt there
> are many more useful knobs to turn with that version (or even
> appropriate logging to find out about the actual issue). Since 7u45
> release, there have been hundreds of changes that in particular improve
> G1 performance, so please consider upgrading to something more recent
> (at least latest 8u, preferably to me some test runs with 9ea).
> Upgrading alone might already help.
> 
> Thanks,
>   Thomas
> 
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net <mailto:hotspot-gc-use at openjdk.java.net>
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use <http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170203/cda5044b/attachment.html>

From alexey.ragozin at gmail.com  Sun Feb  5 19:06:39 2017
From: alexey.ragozin at gmail.com (Alexey Ragozin)
Date: Sun, 5 Feb 2017 22:06:39 +0300
Subject: Long Parnew pause
In-Reply-To: <CAKEw5+52eeaSSz+DSe38QOYQLceYE5shdaQcMZhrQLvXs_=K7A@mail.gmail.com>
References: <CAKEw5+52eeaSSz+DSe38QOYQLceYE5shdaQcMZhrQLvXs_=K7A@mail.gmail.com>
Message-ID: <CAMgTVmLehjbFH4VkR=PkO53DgJnbwp33iR+4pUVw8cMvxTvZ8w@mail.gmail.com>

Hi,

Are you running on physical box or virtual one?

In virtualized environments hypervisor may kick guest OS from certain cores
for prolonged period (up to few dozen of seconds in my expirience). From
guest OS prospective, task holding a core is accounted for all that time
(so you can see unrealistic CPU usage).

I'm not aware of reliable means to monitor such codition, though usually
guest OS would have a spike of CPU job queue.

Regards,
Alexey Ragozin

On Wed, Feb 1, 2017 at 5:44 PM, Gustav ?kesson <gustav.r.akesson at gmail.com>
wrote:

> Hi,
>
> In our application I've observed an occasional and peculiar Parnew GC
> which takes several seconds. From what I've been able to gather, it is
> associated with an occasional 35mb allocation of data. Those objects are
> allocated and tenured (see the bold aging in below logs) and once being
> promoted to old generation, that Parnew GC takes around 7 seconds. This
> surprises me a bit since it should not take so long time to move 35mb of
> data to another heap region.
>
> Looking at the logs, this issue is not related to I/O (zero systime) nor
> TTSP (takes few millis to stop the application threads). GC threads seems
> to simply spend their time working on the CPU chip. What in Parnew/CMS
> could possibly make these 35mb take so long to promote? Some flags that can
> shed som light, or any suspicion such as free-list balancing?
>
> I appreciate any input on the matter.
>
> JVM settings and platform information at the end of this mail.
>
> {Heap before GC invocations=1723 (full 0):
>  par new generation   total 1887488K, used 1768096K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
> 0x00007fcd57c80000)
>   from space 209664K,  43% used [0x00007fcd64940000, 0x00007fcd6a1680e8,
> 0x00007fcd71600000)
>   to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
> 0x00007fcd64940000)
>  concurrent mark-sweep generation total 34973696K, used 7319028K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121299K, capacity 134343K, committed 134400K,
> reserved 135168K
> 2017-01-26T12:50:11.476+0100: 14135.489: [GC (Allocation Failure)
> 2017-01-26T12:50:11.476+0100: 14135.489: [ParNew
> Desired survivor size 107347968 bytes, new threshold 6 (max 6)
> - age   1:   12439600 bytes,   12439600 total
> - age   2:    5233256 bytes,   17672856 total
> - age   3:    5083408 bytes,   22756264 total
> *- age   4:   37639936 bytes,   60396200 total*
> - age   5:    4869520 bytes,   65265720 total
> - age   6:    4746784 bytes,   70012504 total
> : 1768096K->91122K(1887488K), 0.1117876 secs]
> 9087124K->7412981K(36861184K), 0.1120711 secs] [Times: user=0.85 sys=0.00,
> real=0.11 secs]
> Heap after GC invocations=1724 (full 0):
>  par new generation   total 1887488K, used 91122K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
> 0x00007fcd57c80000)
>   from space 209664K,  43% used [0x00007fcd57c80000, 0x00007fcd5d57ca70,
> 0x00007fcd64940000)
>   to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
> 0x00007fcd71600000)
>  concurrent mark-sweep generation total 34973696K, used 7321858K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121299K, capacity 134343K, committed 134400K,
> reserved 135168K
> }
> 2017-01-26T12:50:11.589+0100: 14135.601: Total time for which application
> threads were stopped: 0.1174674 seconds, Stopping threads took: 0.0042340
> seconds
> 2017-01-26T12:50:12.168+0100: 14136.181: Application time: 0.5798363
> seconds
> {Heap before GC invocations=1724 (full 0):
>  par new generation   total 1887488K, used 1768946K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
> 0x00007fcd57c80000)
>   from space 209664K,  43% used [0x00007fcd57c80000, 0x00007fcd5d57ca70,
> 0x00007fcd64940000)
>   to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
> 0x00007fcd71600000)
>  concurrent mark-sweep generation total 34973696K, used 7321858K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121299K, capacity 134343K, committed 134400K,
> reserved 135168K
> 2017-01-26T12:50:12.170+0100: 14136.182: [GC (Allocation Failure)
> 2017-01-26T12:50:12.170+0100: 14136.182: [ParNew
> Desired survivor size 107347968 bytes, new threshold 6 (max 6)
> - age   1:   10383048 bytes,   10383048 total
> - age   2:    5102856 bytes,   15485904 total
> - age   3:    5154816 bytes,   20640720 total
> - age   4:    5080000 bytes,   25720720 total
> *- age   5:   37637680 bytes,   63358400 total*
> - age   6:    4658912 bytes,   68017312 total
> : 1768946K->86544K(1887488K), 0.0929344 secs]
> 9090805K->7411133K(36861184K), 0.0932244 secs] [Times: user=0.70 sys=0.00,
> real=0.09 secs]
> Heap after GC invocations=1725 (full 0):
>  par new generation   total 1887488K, used 86544K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
> 0x00007fcd57c80000)
>   from space 209664K,  41% used [0x00007fcd64940000, 0x00007fcd69dc41d0,
> 0x00007fcd71600000)
>   to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
> 0x00007fcd64940000)
>  concurrent mark-sweep generation total 34973696K, used 7324589K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121299K, capacity 134343K, committed 134400K,
> reserved 135168K
> }
> 2017-01-26T12:50:12.263+0100: 14136.276: Total time for which application
> threads were stopped: 0.0945634 seconds, Stopping threads took: 0.0001968
> seconds
> 2017-01-26T12:50:12.960+0100: 14136.972: Application time: 0.6966358
> seconds
> {Heap before GC invocations=1725 (full 0):
>  par new generation   total 1887488K, used 1764368K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
> 0x00007fcd57c80000)
>   from space 209664K,  41% used [0x00007fcd64940000, 0x00007fcd69dc41d0,
> 0x00007fcd71600000)
>   to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
> 0x00007fcd64940000)
>  concurrent mark-sweep generation total 34973696K, used 7324589K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121324K, capacity 134471K, committed 134656K,
> reserved 135168K
> 2017-01-26T12:50:12.961+0100: 14136.973: [GC (Allocation Failure)
> 2017-01-26T12:50:12.961+0100: 14136.973: [ParNew
> Desired survivor size 107347968 bytes, new threshold 6 (max 6)
> - age   1:    8033264 bytes,    8033264 total
> - age   2:    5686168 bytes,   13719432 total
> - age   3:    5019640 bytes,   18739072 total
> - age   4:    5150920 bytes,   23889992 total
> - age   5:    5076720 bytes,   28966712 total
> *- age   6:   37481736 bytes,   66448448 total*
> : 1764368K->79984K(1887488K), 0.0955902 secs]
> 9088957K->7407366K(36861184K), 0.0958643 secs] [Times: user=0.69 sys=0.00,
> real=0.10 secs]
> Heap after GC invocations=1726 (full 0):
>  par new generation   total 1887488K, used 79984K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
> 0x00007fcd57c80000)
>   from space 209664K,  38% used [0x00007fcd57c80000, 0x00007fcd5ca9c148,
> 0x00007fcd64940000)
>   to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
> 0x00007fcd71600000)
>  concurrent mark-sweep generation total 34973696K, used 7327382K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121324K, capacity 134471K, committed 134656K,
> reserved 135168K
> }
> 2017-01-26T12:50:13.057+0100: 14137.069: Total time for which application
> threads were stopped: 0.0972200 seconds, Stopping threads took: 0.0001917
> seconds
> 2017-01-26T12:50:13.683+0100: 14137.695: Application time: 0.6259722
> seconds
> {Heap before GC invocations=1726 (full 0):
>  par new generation   total 1887488K, used 1757808K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K, 100% used [0x00007fccf1600000, 0x00007fcd57c80000,
> 0x00007fcd57c80000)
>   from space 209664K,  38% used [0x00007fcd57c80000, 0x00007fcd5ca9c148,
> 0x00007fcd64940000)
>   to   space 209664K,   0% used [0x00007fcd64940000, 0x00007fcd64940000,
> 0x00007fcd71600000)
>  concurrent mark-sweep generation total 34973696K, used 7327382K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121324K, capacity 134471K, committed 134656K,
> reserved 135168K
> *2017-01-26T12:50:13.684+0100: 14137.697: [GC (Allocation Failure)
> 2017-01-26T12:50:13.684+0100: 14137.697: [ParNew*
> *Desired survivor size 107347968 bytes, new threshold 6 (max 6)*
> *- age   1:   10784424 bytes,   10784424 total*
> *- age   2:    5148032 bytes,   15932456 total*
> *- age   3:    5607232 bytes,   21539688 total*
> *- age   4:    5013024 bytes,   26552712 total*
> *- age   5:    5148840 bytes,   31701552 total*
> *- age   6:    4839808 bytes,   36541360 total*
> *: 1757808K->58357K(1887488K), 7.4626505 secs]
> 9085190K->7420330K(36861184K), 7.4629090 secs] [Times: user=58.63 sys=0.00,
> real=7.47 secs] *
> Heap after GC invocations=1727 (full 0):
>  par new generation   total 1887488K, used 58357K [0x00007fccf1600000,
> 0x00007fcd71600000, 0x00007fcd71600000)
>   eden space 1677824K,   0% used [0x00007fccf1600000, 0x00007fccf1600000,
> 0x00007fcd57c80000)
>   from space 209664K,  27% used [0x00007fcd64940000, 0x00007fcd6823d650,
> 0x00007fcd71600000)
>   to   space 209664K,   0% used [0x00007fcd57c80000, 0x00007fcd57c80000,
> 0x00007fcd64940000)
>  concurrent mark-sweep generation total 34973696K, used 7361973K
> [0x00007fcd71600000, 0x00007fd5c8000000, 0x00007fd5c8000000)
>  Metaspace       used 121324K, capacity 134471K, committed 134656K,
> reserved 135168K
> }
> *2017-01-26T12:50:21.147+0100: 14145.160: Total time for which application
> threads were stopped: 7.4642882 seconds, Stopping threads took: 0.0002572
> seconds*
>
>
> Java HotSpot(TM) 64-Bit Server VM (25.112-b15) for linux-amd64 JRE
> (1.8.0_112-b15), built on Sep 22 2016 21:10:53 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 49427048k(42773024k free), swap
> 4194300k(4194300k free)
> -XX:+AlwaysPreTouch
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSInitiatingOccupancyFraction=80
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:CMSWaitDuration=60000
> -XX:+DisableExplicitGC
> -XX:GCLogFileSize=31457280
> -XX:InitialHeapSize=37959499776
> -XX:MaxHeapSize=37959499776
> -XX:MaxMetaspaceSize=268435456
> -XX:MaxNewSize=2147483648
> -XX:MaxTenuringThreshold=6
> -XX:MetaspaceSize=268435456
> -XX:NewSize=2147483648
> -XX:+UseBiasedLocking
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:-UseCompressedOops
> -XX:+UseConcMarkSweepGC
> -XX:+UseGCLogFileRotation
> -XX:+UseLargePages
> -XX:+UseParNewGC
>
>
>
> Best Regards,
> Gustav ?kesson
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170205/17bad044/attachment-0001.html>

From amit.mishra at redknee.com  Mon Feb  6 09:17:52 2017
From: amit.mishra at redknee.com (Amit Mishra)
Date: Mon, 6 Feb 2017 09:17:52 +0000
Subject: Need help on G1 GC young gen Update RS and Scan RS pause reduction
In-Reply-To: <4712011A-5C3D-471B-A238-DF1A08267DDC@oracle.com>
References: <MWHPR05MB3118FCF54B533C7E26EF191AF07E0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1484852604.6579.27.camel@oracle.com>
	<MWHPR05MB311884733FEF15CCA084BB22F0710@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485244966.2883.8.camel@oracle.com>
	<MWHPR05MB3118C22043AE3A69E096FA9FF0750@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485338858.3625.42.camel@oracle.com>
	<MWHPR05MB31181BC64AB111F7D98E6B90F04F0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1486126913.2892.31.camel@oracle.com>
	<4712011A-5C3D-471B-A238-DF1A08267DDC@oracle.com>
Message-ID: <MWHPR05MB31184C403F74B3375D4DD0E0F0400@MWHPR05MB3118.namprd05.prod.outlook.com>

Hi Charlie,

This is solaris OS and there is flag argv[46]: -XX:+UseLargePages  which is enabled, do you recommend to disable it.

Also per Thomas suggestion I am trying to figure out Java 1.8 compatible with our Apps and will use that in further testing.

Thanks,
Amit Mishra

From: charlie hunt [mailto:charlie.hunt at oracle.com]
Sent: Saturday, February 4, 2017 00:12
To: Thomas Schatzl <thomas.schatzl at oracle.com>
Cc: Amit Mishra <amit.mishra at redknee.com>; hotspot-gc-use at openjdk.java.net
Subject: Re: Need help on G1 GC young gen Update RS and Scan RS pause reduction

> ?- from one gc to another, just for these four gcs, sys time is relatively high.?

Assuming this is on Linux ? perhaps double check that THP (transparent huge pages) is disabled.

charlie

On Feb 3, 2017, at 7:01 AM, Thomas Schatzl <thomas.schatzl at oracle.com<mailto:thomas.schatzl at oracle.com>> wrote:

Hi Amit,

On Fri, 2017-02-03 at 11:09 +0000, Amit Mishra wrote:

Hi Thomas/team,


I have put all parameters as per your suggestion but somehow the
minor gc pauses are still haunting.

Attaching GC logs.


bash-3.2$ grep -i young gcstats.log.10636|cut -d, -f2|awk -F" "
'{print $1}'|awk '$1 > 1'
1.1273134
1.1683221
3.5504848
5.2693987

  looking at these log entries, there seems to be something going on
that seems outside of VM control:

- from one gc to another, just for these four gcs, sys time is
relatively high.

- for the last two occurrences, at least one thread is hanging in "Ext
Root Scanning" for almost all of the gc time for no obvious reason.

- there do not seem to be an unusually large amount of changes in the
amount of work done in the particular phases that would raise immediate
concerns to me.

Please try to find out the source of the high sys time and maybe even
what causes it. I can't help a lot in that area, but dtrace seems a
good starting point as suggested earlier.

I think we went through most obvious tunings now, but maybe somebody else has more ideas. I don't at this time.

The jdk (7u45) you are using is also very old, so even if we find that
there is something wrong with g1 in particular, I kind of doubt there
are many more useful knobs to turn with that version (or even
appropriate logging to find out about the actual issue). Since 7u45
release, there have been hundreds of changes that in particular improve
G1 performance, so please consider upgrading to something more recent
(at least latest 8u, preferably to me some test runs with 9ea).
Upgrading alone might already help.

Thanks,
  Thomas

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net<mailto:hotspot-gc-use at openjdk.java.net>
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170206/43bc0171/attachment.html>

From charlie.hunt at oracle.com  Tue Feb  7 01:51:01 2017
From: charlie.hunt at oracle.com (charlie hunt)
Date: Mon, 6 Feb 2017 19:51:01 -0600
Subject: Need help on G1 GC young gen Update RS and Scan RS pause reduction
In-Reply-To: <MWHPR05MB31184C403F74B3375D4DD0E0F0400@MWHPR05MB3118.namprd05.prod.outlook.com>
References: <MWHPR05MB3118FCF54B533C7E26EF191AF07E0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1484852604.6579.27.camel@oracle.com>
	<MWHPR05MB311884733FEF15CCA084BB22F0710@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485244966.2883.8.camel@oracle.com>
	<MWHPR05MB3118C22043AE3A69E096FA9FF0750@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1485338858.3625.42.camel@oracle.com>
	<MWHPR05MB31181BC64AB111F7D98E6B90F04F0@MWHPR05MB3118.namprd05.prod.outlook.com>
	<1486126913.2892.31.camel@oracle.com>
	<4712011A-5C3D-471B-A238-DF1A08267DDC@oracle.com>
	<MWHPR05MB31184C403F74B3375D4DD0E0F0400@MWHPR05MB3118.namprd05.prod.outlook.com>
Message-ID: <BE78EFEE-6ED6-428D-AB15-A1CDFA4C5828@oracle.com>

No, if this is Solaris, continue to use large pages. There are no known issues with using large pages on Solaris (SPARC or x86/x64).

Charlie 

> On Feb 6, 2017, at 3:17 AM, Amit Mishra <amit.mishra at redknee.com> wrote:
> 
> Hi Charlie,
>  
> This is solaris OS and there is flag argv[46]: -XX:+UseLargePages  which is enabled, do you recommend to disable it.
>  
> Also per Thomas suggestion I am trying to figure out Java 1.8 compatible with our Apps and will use that in further testing.
>  
> Thanks,
> Amit Mishra
>  
> From: charlie hunt [mailto:charlie.hunt at oracle.com] 
> Sent: Saturday, February 4, 2017 00:12
> To: Thomas Schatzl <thomas.schatzl at oracle.com>
> Cc: Amit Mishra <amit.mishra at redknee.com>; hotspot-gc-use at openjdk.java.net
> Subject: Re: Need help on G1 GC young gen Update RS and Scan RS pause reduction
>  
> > ?- from one gc to another, just for these four gcs, sys time is relatively high.?
>  
> Assuming this is on Linux ? perhaps double check that THP (transparent huge pages) is disabled.
>  
> charlie
>  
> On Feb 3, 2017, at 7:01 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>  
> Hi Amit,
> 
> On Fri, 2017-02-03 at 11:09 +0000, Amit Mishra wrote:
> 
> Hi Thomas/team,
> 
> 
> I have put all parameters as per your suggestion but somehow the
> minor gc pauses are still haunting.
> 
> Attaching GC logs.
> 
> 
> bash-3.2$ grep -i young gcstats.log.10636|cut -d, -f2|awk -F" "
> '{print $1}'|awk '$1 > 1'
> 1.1273134
> 1.1683221
> 3.5504848
> 5.2693987
> 
>   looking at these log entries, there seems to be something going on
> that seems outside of VM control:
> 
> - from one gc to another, just for these four gcs, sys time is
> relatively high.
> 
> - for the last two occurrences, at least one thread is hanging in "Ext
> Root Scanning" for almost all of the gc time for no obvious reason.
> 
> - there do not seem to be an unusually large amount of changes in the
> amount of work done in the particular phases that would raise immediate
> concerns to me.
> 
> Please try to find out the source of the high sys time and maybe even
> what causes it. I can't help a lot in that area, but dtrace seems a
> good starting point as suggested earlier.
> 
> I think we went through most obvious tunings now, but maybe somebody else has more ideas. I don't at this time.
> 
> The jdk (7u45) you are using is also very old, so even if we find that
> there is something wrong with g1 in particular, I kind of doubt there
> are many more useful knobs to turn with that version (or even
> appropriate logging to find out about the actual issue). Since 7u45
> release, there have been hundreds of changes that in particular improve
> G1 performance, so please consider upgrading to something more recent
> (at least latest 8u, preferably to me some test runs with 9ea).
> Upgrading alone might already help.
> 
> Thanks,
>   Thomas
> 
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170206/457a097d/attachment.html>

From amit.balode at gmail.com  Tue Feb  7 13:36:59 2017
From: amit.balode at gmail.com (Amit Balode)
Date: Tue, 7 Feb 2017 19:06:59 +0530
Subject: How to find fragmented space in G1 regions
Message-ID: <CACoa2tVFG2=cGam0Fbq+s0WT52HiogFJhsN8ghSu4hfiWbso2Q@mail.gmail.com>

Any thoughts on how we could find how much space is fragmented in G1
regions?

-- 
Thanks & Regards,
Amit.Balode
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170207/3d62fce1/attachment.html>

From prasanna.gopal at blackrock.com  Wed Feb  8 12:25:49 2017
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Wed, 8 Feb 2017 12:25:49 +0000
Subject: G1  Region size info
Message-ID: <46dc58914a7c40f991bcdbd3f023a85e@UKPMSEXD202N02.na.blkint.com>

Hi All

I am trying to understand the region size info provided in of our application?s GC log file.

We are running an application with the following configuration

-Xmx7G
 -Xms7G
 -XX:+UseG1GC
 -XX:+UnlockExperimentalVMOptions
 -XX:InitiatingHeapOccupancyPercent=60
 -XX:G1ReservePercent=20
 -XX:G1ReservePercent=20
-XX:G1HeapRegionSize=32M
 -XX:G1MixedGCLiveThresholdPercent=85
 -XX:MaxGCPauseMillis=500
 -XX:+ParallelRefProcEnabled
 -XX:+PrintAdaptiveSizePolicy
 -XX:+PrintHeapAtGC
 -XX:+PrintReferenceGC
 -XX:+PrintGCDateStamps
 -XX:+PrintTenuringDistribution
 -XX:+PrintGCApplicationConcurrentTime
 -XX:+PrintGCApplicationStoppedTime


JDK :  jdk_7u40_x64  ( Yes we need to move to latest JDK)


At the end of the region size info, we have following summary


### SUMMARY  capacity: 352.00 MB  used: 328.94 MB / 93.45 %  prev-live: 138.65 MB / 39.39 %  next-live: 0.00 MB / 0.00 %

4283M->4121M(6144M), 0.0066530 secs]
[Times: user=0.04 sys=0.00, real=0.01 secs]
2017-02-08T00:00:59.239-0500: 223238.232: Total time for which application threads were stopped: 0.0075650 seconds
2017-02-08T00:00:59.239-0500: 223238.232: [GC concurrent-cleanup-start]
2017-02-08T00:00:59.239-0500: 223238.232: [GC concurrent-cleanup-end, 0.0000950 secs]
2017-02-08T00:01:00.003-0500: 223238.996: Application time: 0.7639680 seconds
2017-02-08T00:01:00.005-0500: 223238.999: Total time for which application threads were stopped: 0.0025640 seconds
2017-02-08T00:01:05.024-0500: 223244.017: Application time: 5.0186280 seconds


Does this mean,  our heap occupancy is only 352 MB after Post-Sorting phase ?. It doesn?t co-relate with the information provided at thd end of GC clean up phase (4283M->4121M(6144M), 0.0066530 secs) , which say the heap size is 4121M.

And subsequent Young GC shows the following heap composition

   [Eden: 992.0M(992.0M)->0.0B(1088.0M) Survivors: 64.0M->32.0M Heap: 4538.0M(6144.0M)->3516.9M(6144.0M)]

Could you please help me in understanding the Summary information provided in Post sorting phase ? . Please let  me know ,if you need any more information.


Regards
Prasanna


Region size info
=============

  region size 32768K, 2 young (65536K), 2 survivors (65536K)
compacting perm gen  total 98304K, used 85713K [0x00000007c0000000, 0x00000007c6000000, 0x0000000800000000)
   the space 98304K,  87% used [0x00000007c0000000, 0x00000007c53b4400, 0x00000007c53b4400, 0x00000007c6000000)
No shared spaces configured.
}
[Times: user=0.09 sys=0.00, real=0.02 secs]
2017-02-08T00:00:55.746-0500: 223234.739: [GC concurrent-root-region-scan-start]
2017-02-08T00:00:55.746-0500: 223234.739: Total time for which application threads were stopped: 0.0145960 seconds
2017-02-08T00:00:55.748-0500: 223234.742: [GC concurrent-root-region-scan-end, 0.0025760 secs]
2017-02-08T00:00:55.748-0500: 223234.742: [GC concurrent-mark-start]
2017-02-08T00:00:59.211-0500: 223238.204: [GC concurrent-mark-end, 3.4626220 secs]
2017-02-08T00:00:59.211-0500: 223238.204: Application time: 3.4652490 seconds
2017-02-08T00:00:59.212-0500: 223238.205: [GC remark 2017-02-08T00:00:59.213-0500: 223238.206: [GC ref-proc2017-02-08T00:00:59.213-0500: 223238.206: [SoftReference, 986 refs, 0.0007480 secs]2017-02-08T00:00:59.213-0500: 223238.207: [WeakReference, 547 refs, 0.0004050 secs]2017-02-08T00:00:59.214-0500: 223238.207: [FinalReference, 1 refs, 0.0002350 secs]2017-02-08T00:00:59.214-0500: 223238.207: [PhantomReference, 6 refs, 0.0002730 secs]2017-02-08T00:00:59.214-0500: 223238.208: [JNI Weak Reference, 0.0000510 secs], 0.0018440 secs], 0.0189520 secs]
[Times: user=0.05 sys=0.00, real=0.02 secs]
2017-02-08T00:00:59.231-0500: 223238.225: Total time for which application threads were stopped: 0.0202670 seconds
2017-02-08T00:00:59.231-0500: 223238.225: Application time: 0.0001010 seconds
2017-02-08T00:00:59.232-0500: 223238.226: [GC cleanup
### PHASE Post-Marking @ 223238.226
### HEAP  committed: 0x0000000640000000-0x00000007c0000000  reserved: 0x0000000640000000-0x00000007c0000000  region-size: 33554432
###
###   type                         address-range       used  prev-live  next-live          gc-eff
###                                                 (bytes)    (bytes)    (bytes)      (bytes/ms)
###   OLD  0x0000000640000000-0x0000000642000000   32368376   32368376   32366752        223961.6
###   OLD  0x0000000642000000-0x0000000644000000   33554408   33554408   33554408         21493.5
###   OLD  0x0000000644000000-0x0000000646000000   33554424   33554424   33554424         29029.2
###   OLD  0x0000000646000000-0x0000000648000000   33554424   33554424   33554424         13470.1
###   OLD  0x0000000648000000-0x000000064a000000   33554432   33554432   33552752        293134.5
###   OLD  0x000000064a000000-0x000000064c000000   33357104   33357104   33357104        179951.0
###   OLD  0x000000064c000000-0x000000064e000000   33554408   33554408   33554408        525436.5
###   OLD  0x000000064e000000-0x0000000650000000   33554416   33554416   33554416        273501.1
###   OLD  0x0000000650000000-0x0000000652000000   33554432   33554432   33554432        407703.2
###   OLD  0x0000000652000000-0x0000000654000000   33554432   33554432   33554432         70645.8
###   OLD  0x0000000654000000-0x0000000656000000   33554432   33554432   33554432        222200.0
###   OLD  0x0000000656000000-0x0000000658000000   33554392   33554392   33554392      16195314.6
###   OLD  0x0000000658000000-0x000000065a000000   33303600   33303600   33303600        428407.8
###   OLD  0x000000065a000000-0x000000065c000000   33554424   33554424   33554424        153899.4
###   OLD  0x000000065c000000-0x000000065e000000   33554384   33554384   33554384        307229.6
###   OLD  0x000000065e000000-0x0000000660000000   33554416   33554416   33554416      10610494.9
###   OLD  0x0000000660000000-0x0000000662000000   33554416   33554416   33554416        469683.0
###   OLD  0x0000000662000000-0x0000000664000000   33554320   33554320   33554320       1241801.8
###   OLD  0x0000000664000000-0x0000000666000000   33554416   33554416   33554416        371576.7
###   OLD  0x0000000666000000-0x0000000668000000   33554432   33554408   33554408         18876.1
###   OLD  0x0000000668000000-0x000000066a000000   33553800   33553800   33553800        197254.3
###   OLD  0x000000066a000000-0x000000066c000000   32920592   32920592   32920592        261906.8
###   OLD  0x000000066c000000-0x000000066e000000   33554272   33554272   33554272        263415.0
###   OLD  0x000000066e000000-0x0000000670000000   33466704   33466704   33466704        217406.7
###   OLD  0x0000000670000000-0x0000000672000000   33554432   33554432   33554432        558273.8
###   OLD  0x0000000672000000-0x0000000674000000   33554424   33554424   33521640        471863.4
###   OLD  0x0000000674000000-0x0000000676000000   33554416   33554416   33554416         24962.1
###   OLD  0x0000000676000000-0x0000000678000000   32866480   32866480   32866480        469198.9
###   OLD  0x0000000678000000-0x000000067a000000   33554424   33554424   33554424       1911056.3
###   OLD  0x000000067a000000-0x000000067c000000   33420736   33420736   33420736        405214.5
###   OLD  0x000000067c000000-0x000000067e000000   33554432   33554432   33554432        413762.1
###   OLD  0x000000067e000000-0x0000000680000000   33554144   33554144   33554144        658142.1
###   OLD  0x0000000680000000-0x0000000682000000   33554432   33554432   33554432        561025.9
###   OLD  0x0000000682000000-0x0000000684000000   33554248   33554248   33543904        702743.4
###   OLD  0x0000000684000000-0x0000000686000000   33554416   33554416   33554416       4785040.9
###   OLD  0x0000000686000000-0x0000000688000000   33554360   33554360   33554360        307205.2
###   OLD  0x0000000688000000-0x000000068a000000   33554416   33554416   33554416        791487.3
###   OLD  0x000000068a000000-0x000000068c000000   33554376   33554376   33554376       6024807.6
??  Remove some for brevity

###   HUMS 0x000000079e000000-0x00000007a0000000   33554432   33554432   33554432     501904008.8
###   HUMC 0x00000007a0000000-0x00000007a2000000    2367536    2367536    2367536       2526909.9
###   FREE 0x00000007a2000000-0x00000007a4000000          0          0          0    1018023826.2
###   FREE 0x00000007a4000000-0x00000007a6000000          0          0          0       3621347.9
###   FREE 0x00000007a6000000-0x00000007a8000000          0          0          0       6975354.2
###   FREE 0x00000007a8000000-0x00000007aa000000          0          0          0       6228744.7
###   FREE 0x00000007aa000000-0x00000007ac000000          0          0          0        825510.3
###   FREE 0x00000007ac000000-0x00000007ae000000          0          0          0       3304768.5
###   FREE 0x00000007ae000000-0x00000007b0000000          0          0          0       2701625.1
###   FREE 0x00000007b0000000-0x00000007b2000000          0          0          0       2623840.5
###   FREE 0x00000007b2000000-0x00000007b4000000          0          0          0       6313095.1
###   FREE 0x00000007b4000000-0x00000007b6000000          0          0          0       4588159.0
###   FREE 0x00000007b6000000-0x00000007b8000000          0          0          0       2184971.5
###   FREE 0x00000007b8000000-0x00000007ba000000          0          0          0        592114.6
###   FREE 0x00000007ba000000-0x00000007bc000000          0          0          0      24307488.7
###   FREE 0x00000007bc000000-0x00000007be000000          0          0          0       2283212.1
###   FREE 0x00000007be000000-0x00000007c0000000          0          0          0        627031.8
###
### SUMMARY  capacity: 6144.00 MB  used: 4283.18 MB / 69.71 %  prev-live: 4057.87 MB / 66.05 %  next-live: 3896.51 MB / 63.42 %


### PHASE Post-Sorting @ 223238.232
### HEAP  committed: 0x0000000640000000-0x00000007c0000000  reserved: 0x0000000640000000-0x00000007c0000000  region-size: 33554432
###
###   type                         address-range       used  prev-live  next-live          gc-eff
###                                                 (bytes)    (bytes)    (bytes)      (bytes/ms)
###   OLD  0x0000000736000000-0x0000000738000000   33554432    2166312          0      87829788.1
###   OLD  0x000000073c000000-0x000000073e000000   33554432    6048552          0      26417453.7
###   OLD  0x000000073a000000-0x000000073c000000   33554432   13552368          0       8930752.6
###   OLD  0x00000006fe000000-0x0000000700000000   32982728   15474224          0       6200283.7
###   OLD  0x0000000738000000-0x000000073a000000   33554432   17669672          0       5731058.5
###   OLD  0x00000006f0000000-0x00000006f2000000   33232096    5230584          0       5595398.5
###   OLD  0x00000006f2000000-0x00000006f4000000   33554384    8938848          0       5331392.0
###   OLD  0x0000000706000000-0x0000000708000000   10273584   10273432          0       3376131.8
###   OLD  0x00000006ee000000-0x00000006f0000000   33554392   18481472          0       1231822.2
###   OLD  0x00000006f6000000-0x00000006f8000000   33554432   22116128          0        774360.6
###   OLD  0x0000000702000000-0x0000000704000000   33554424   25431376          0        520416.9
###
### SUMMARY  capacity: 352.00 MB  used: 328.94 MB / 93.45 %  prev-live: 138.65 MB / 39.39 %  next-live: 0.00 MB / 0.00 %

4283M->4121M(6144M), 0.0066530 secs]
[Times: user=0.04 sys=0.00, real=0.01 secs]
2017-02-08T00:00:59.239-0500: 223238.232: Total time for which application threads were stopped: 0.0075650 seconds
2017-02-08T00:00:59.239-0500: 223238.232: [GC concurrent-cleanup-start]
2017-02-08T00:00:59.239-0500: 223238.232: [GC concurrent-cleanup-end, 0.0000950 secs]
2017-02-08T00:01:00.003-0500: 223238.996: Application time: 0.7639680 seconds
2017-02-08T00:01:00.005-0500: 223238.999: Total time for which application threads were stopped: 0.0025640 seconds
2017-02-08T00:01:05.024-0500: 223244.017: Application time: 5.0186280 seconds


This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information.  Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy.
BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL.
For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations.

? 2017 BlackRock, Inc. All rights reserved.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170208/ee371215/attachment-0001.html>

From thomas.schatzl at oracle.com  Wed Feb  8 15:31:58 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 08 Feb 2017 16:31:58 +0100
Subject: How to find fragmented space in G1 regions
In-Reply-To: <CACoa2tVFG2=cGam0Fbq+s0WT52HiogFJhsN8ghSu4hfiWbso2Q@mail.gmail.com>
References: <CACoa2tVFG2=cGam0Fbq+s0WT52HiogFJhsN8ghSu4hfiWbso2Q@mail.gmail.com>
Message-ID: <1486567918.3510.30.camel@oracle.com>

Hi,

On Tue, 2017-02-07 at 19:06 +0530, Amit Balode wrote:
> Any thoughts on how we could find how much space is fragmented in G1
> regions?

? -XX:+G1PrintRegionLivenessInfo prints region information containing
its type after every marking (use gc+liveness=trace for jdk9) twice.
The first "post-marking" output is what you want. The "post-sorting"
one is only interesting for getting details about the collection set
(and this one does not contain free regions).

The first column contains information about the type of the region:

FREE -> free region
SURV -> survivor region
EDEN -> eden region
HUMS -> humongous (start)
HUMC -> humongous (continuation)
OLD -> old region
ARC -> archive regions (jdk9 only I think, not sure, maybe also jdk8)

>From that you can deduce the size and how many contiguous free regions
are available at the moment.

There is also -XX:+PrintHeapAtGC and -XX:+PrintHeapAtGCExtended which
print the region layout at every GC.

Thanks,
? Thomas


From amit.balode at gmail.com  Wed Feb  8 15:45:59 2017
From: amit.balode at gmail.com (Amit Balode)
Date: Wed, 8 Feb 2017 21:15:59 +0530
Subject: How to find fragmented space in G1 regions
In-Reply-To: <1486567918.3510.30.camel@oracle.com>
References: <CACoa2tVFG2=cGam0Fbq+s0WT52HiogFJhsN8ghSu4hfiWbso2Q@mail.gmail.com>
	<1486567918.3510.30.camel@oracle.com>
Message-ID: <CACoa2tUoun2seOfJbETNDk+J4DLhSxOK=D-StkqZdvSKnY4mww@mail.gmail.com>

Thanks Thomas, do you know if there is overhead of those flags? I think
trace would be expensive but what about others? Will they add anything to
pause time?

On Wed, Feb 8, 2017 at 9:01 PM, Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi,
>
> On Tue, 2017-02-07 at 19:06 +0530, Amit Balode wrote:
> > Any thoughts on how we could find how much space is fragmented in G1
> > regions?
>
>   -XX:+G1PrintRegionLivenessInfo prints region information containing
> its type after every marking (use gc+liveness=trace for jdk9) twice.
> The first "post-marking" output is what you want. The "post-sorting"
> one is only interesting for getting details about the collection set
> (and this one does not contain free regions).
>
> The first column contains information about the type of the region:
>
> FREE -> free region
> SURV -> survivor region
> EDEN -> eden region
> HUMS -> humongous (start)
> HUMC -> humongous (continuation)
> OLD -> old region
> ARC -> archive regions (jdk9 only I think, not sure, maybe also jdk8)
>
> From that you can deduce the size and how many contiguous free regions
> are available at the moment.
>
> There is also -XX:+PrintHeapAtGC and -XX:+PrintHeapAtGCExtended which
> print the region layout at every GC.
>
> Thanks,
>   Thomas
>
>


-- 
Thanks & Regards,
Amit.Balode
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170208/bb5840c3/attachment.html>

From thomas.schatzl at oracle.com  Wed Feb  8 15:53:32 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 08 Feb 2017 16:53:32 +0100
Subject: How to find fragmented space in G1 regions
In-Reply-To: <CACoa2tUoun2seOfJbETNDk+J4DLhSxOK=D-StkqZdvSKnY4mww@mail.gmail.com>
References: <CACoa2tVFG2=cGam0Fbq+s0WT52HiogFJhsN8ghSu4hfiWbso2Q@mail.gmail.com>
	<1486567918.3510.30.camel@oracle.com>
	<CACoa2tUoun2seOfJbETNDk+J4DLhSxOK=D-StkqZdvSKnY4mww@mail.gmail.com>
Message-ID: <1486569212.3510.43.camel@oracle.com>

Hi,

On Wed, 2017-02-08 at 21:15 +0530, Amit Balode wrote:
> Thanks Thomas, do you know if there is overhead of those flags? I
> think trace would be expensive but what about others? Will they add
> anything to pause time?

? the bottleneck is 99% writing out the data. The internal per-line
calculation overhead is negligible. However the amount of information
printed may cause I/O issues.

I think since at least?G1PrintRegionLivenessInfo is a diagnostic
option, you can turn it completely on and off at runtime.
(In JDK9, everything is completely based on the unified logging, you
can do that as well).

Thanks,
? Thomas


From thomas.schatzl at oracle.com  Wed Feb  8 16:06:48 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 08 Feb 2017 17:06:48 +0100
Subject: G1 native memory consumption
In-Reply-To: <1486138975652.57172@infobip.com>
References: <1484943874550.90103@infobip.com> ,
	<1485168079.2811.21.camel@oracle.com> <1486138975652.57172@infobip.com>
Message-ID: <1486570008.3510.50.camel@oracle.com>

Hi Milan,

On Fri, 2017-02-03 at 16:22 +0000, Milan Mimica wrote:
> Hi Thomas
> 
> Thanks for your input. I took me a while to have a stable system
> again to repeat measurements.
> 
> I have tried setting G1HeapRegionSize to 16M on one instance (8M is
> default) and I notice lower GC memory usage:
> GC (reserved=1117MB -18MB, committed=1117MB -18MB)
> vs
> GC (reserved=1604MB +313MB, committed=1604MB +313MB)
> 
> It seems more stable too. However, "Internal" is still relatively
> high for a 25G heap, and there is no much difference between
> instances:
> Internal (reserved=2132MB -7MB, committed=2132MB -7MB)

I am not sure why there is no difference, it would be nice to have a
breakdown on this like in the previous case to rule out other
components or not enough warmup.

Everything that is allocated via the OtherRegionsTable::add_reference()
-> BitMap::resize() path in the figure from the other email is
remembered sets, and they _should_ have gone down.

You can try to move memory from that path to the CHeapObj operator new
one. This results in g1 storing remembered sets in a much more dense
but potentially slower to access representation.

The switch to turn here is G1RSetSparseRegionEntries. It gives maximum
number of cards (small areas, 512 bytes) per region to store in that
representation. If it overflows, pretty large bitmaps that might be
really sparsely populated are used (that take lots of time). By default
it is somewhat like?

4 * (log2(region-size-in-MB + 1)

E.g. with 32M region only 24 cards are stored there max. I think you
can easily increase this to something like 64 or 128 or even larger. I
think (and I am unsure about this, in jdk9 we halved its memory usage)
memory usage should be around equal with the bitmaps with 2k entries on
32M regions, so I would stop at something in that area at most.

This size need not be a power of two btw. You can try increasing this
value significantly and see if it helps with memory consumption without
impacting performance too much.

Thanks,
? Thomas


From thomas.schatzl at oracle.com  Wed Feb  8 16:13:22 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 08 Feb 2017 17:13:22 +0100
Subject: G1  Region size info
In-Reply-To: <46dc58914a7c40f991bcdbd3f023a85e@UKPMSEXD202N02.na.blkint.com>
References: <46dc58914a7c40f991bcdbd3f023a85e@UKPMSEXD202N02.na.blkint.com>
Message-ID: <1486570402.3510.55.camel@oracle.com>

Hi,

On Wed, 2017-02-08 at 12:25 +0000, Gopal, Prasanna CWK wrote:
> Hi All
> ?
> I am trying to understand the region size info provided in of our
> application?s GC log file.
> ?
> We are running an application with the following configuration
> ?
> -Xmx7G

[...]

> ?
> Does this mean, ?our heap occupancy is only 352 MB after Post-Sorting 
> phase ?. It doesn?t co-relate with the information provided at thd
> end of GC clean up phase (4283M->4121M(6144M), 0.0066530 secs) ,
> which say the heap size is 4121M.

Post-sorting only considers regions that are collection set candidates
- i.e. regions that G1 will clean out in the next reclamation phase.
I.e. contain lots of garbage.

If you think that is too little (g1 not cleaning out enough in mixed
gcs), you might want to make evacuation more aggressive.

>From the post-marking snippet it seems though that there are not many
regions with lots of garbage there anyway though.

Post-marking statistics is what you want to look at and compare with.

Thanks,
? Thomas


From prasanna.gopal at blackrock.com  Wed Feb  8 16:28:41 2017
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Wed, 8 Feb 2017 16:28:41 +0000
Subject: G1  Region size info
In-Reply-To: <1486570402.3510.55.camel@oracle.com>
References: <46dc58914a7c40f991bcdbd3f023a85e@UKPMSEXD202N02.na.blkint.com>
	<1486570402.3510.55.camel@oracle.com>
Message-ID: <ab44084327244cc6bc90756b35ff248a@UKPMSEXD202N02.na.blkint.com>

?Hi Thomas

Thanks for your reply. We have following G1 configuration at the moment 


-Xmx7G 
 -Xms7G 
 -XX:+UseG1GC 
 -XX:+UnlockExperimentalVMOptions 
 -XX:InitiatingHeapOccupancyPercent=60 
 -XX:G1ReservePercent=20 
 -XX:G1ReservePercent=20 
-XX:G1HeapRegionSize=32M
 -XX:G1MixedGCLiveThresholdPercent=85   
 -XX:MaxGCPauseMillis=500 
 -XX:+ParallelRefProcEnabled  
 -XX:+PrintAdaptiveSizePolicy  
 -XX:+PrintHeapAtGC  
 -XX:+PrintReferenceGC   
 -XX:+PrintGCDateStamps 
 -XX:+PrintTenuringDistribution 
 -XX:+PrintGCApplicationConcurrentTime 
 -XX:+PrintGCApplicationStoppedTime 
 
Apart from these parameters, can we try another parameter to make the evacuation more aggressive? 


-XX:G1MixedGCLiveThresholdPercent=85   
-XX:InitiatingHeapOccupancyPercent=60 ==> we have experimented with less values , it is just making the concurrent cycle without claiming any significant. 
-XX:MaxGCPauseMillis=500

Our object allocation rate is very high, before increasing the memory can we try any other parameter which can make the evacuation more aggressive? Appreciate your help. Please do let me know, if you need any more information. 

Regards
Prasanna


-----Original Message-----
From: Thomas Schatzl [mailto:thomas.schatzl at oracle.com] 
Sent: 08 February 2017 16:13
To: Gopal, Prasanna CWK <prasanna.gopal at blackrock.com>; hotspot-gc-use at openjdk.java.net
Subject: Re: G1 Region size info

Hi,

On Wed, 2017-02-08 at 12:25 +0000, Gopal, Prasanna CWK wrote:
> Hi All
> ?
> I am trying to understand the region size info provided in of our 
> application?s GC log file.
> ?
> We are running an application with the following configuration
> ?
> -Xmx7G

[...]

> ?
> Does this mean, ?our heap occupancy is only 352 MB after Post-Sorting 
> phase ?. It doesn?t co-relate with the information provided at thd end 
> of GC clean up phase (4283M->4121M(6144M), 0.0066530 secs) , which say 
> the heap size is 4121M.

Post-sorting only considers regions that are collection set candidates
- i.e. regions that G1 will clean out in the next reclamation phase.
I.e. contain lots of garbage.

If you think that is too little (g1 not cleaning out enough in mixed gcs), you might want to make evacuation more aggressive.

From the post-marking snippet it seems though that there are not many regions with lots of garbage there anyway though.

Post-marking statistics is what you want to look at and compare with.

Thanks,
? Thomas


This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information.  Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy.
BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL.


For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations.

? 2017 BlackRock, Inc. All rights reserved.

From Milan.Mimica at infobip.com  Wed Feb  8 16:56:36 2017
From: Milan.Mimica at infobip.com (Milan Mimica)
Date: Wed, 8 Feb 2017 16:56:36 +0000
Subject: G1 native memory consumption
In-Reply-To: <1486570008.3510.50.camel@oracle.com>
References: <1484943874550.90103@infobip.com>
	,<1485168079.2811.21.camel@oracle.com>
	<1486138975652.57172@infobip.com>, <1486570008.3510.50.camel@oracle.com>
Message-ID: <1486572996601.68328@infobip.com>

Hi Thomas

> I am not sure why there is no difference, it would be nice to have a
> breakdown on this like in the previous case to rule out other
> components or not enough warmup.

At least native memory is stable now. That's what I was aiming for. Attached are two graphs. This is after 5 days uptime, high load. With G1HeapRegionSize 16M.
Note that the graph is showing alive memory allocations for which allocation happened in last:
izd2.svg -- 36 hours
izd3.svg -- 10 hours

As you can see, not much going on in last 10 hours. That's great! It's stable.

Still, native memory usage is relatively high, but that's not a big problem for me.

Java Heap (reserved=25600MB, committed=25600MB)
Internal (reserved=2144MB, committed=2144MB)
GC (reserved=1166MB, committed=1166MB)

I'll look at the rest you wrote some later day.


Milan Mimica, Senior Software Engineer / Division Lead
________________________________________
From: Thomas Schatzl <thomas.schatzl at oracle.com>
Sent: Wednesday, February 8, 2017 17:06
To: Milan Mimica; hotspot-gc-use at openjdk.java.net
Subject: Re: G1 native memory consumption

Hi Milan,

On Fri, 2017-02-03 at 16:22 +0000, Milan Mimica wrote:
> Hi Thomas
>
> Thanks for your input. I took me a while to have a stable system
> again to repeat measurements.
>
> I have tried setting G1HeapRegionSize to 16M on one instance (8M is
> default) and I notice lower GC memory usage:
> GC (reserved=1117MB -18MB, committed=1117MB -18MB)
> vs
> GC (reserved=1604MB +313MB, committed=1604MB +313MB)
>
> It seems more stable too. However, "Internal" is still relatively
> high for a 25G heap, and there is no much difference between
> instances:
> Internal (reserved=2132MB -7MB, committed=2132MB -7MB)

I am not sure why there is no difference, it would be nice to have a
breakdown on this like in the previous case to rule out other
components or not enough warmup.

Everything that is allocated via the OtherRegionsTable::add_reference()
-> BitMap::resize() path in the figure from the other email is
remembered sets, and they _should_ have gone down.

You can try to move memory from that path to the CHeapObj operator new
one. This results in g1 storing remembered sets in a much more dense
but potentially slower to access representation.

The switch to turn here is G1RSetSparseRegionEntries. It gives maximum
number of cards (small areas, 512 bytes) per region to store in that
representation. If it overflows, pretty large bitmaps that might be
really sparsely populated are used (that take lots of time). By default
it is somewhat like

4 * (log2(region-size-in-MB + 1)

E.g. with 32M region only 24 cards are stored there max. I think you
can easily increase this to something like 64 or 128 or even larger. I
think (and I am unsure about this, in jdk9 we halved its memory usage)
memory usage should be around equal with the bitmaps with 2k entries on
32M regions, so I would stop at something in that area at most.

This size need not be a power of two btw. You can try increasing this
value significantly and see if it helps with memory consumption without
impacting performance too much.

Thanks,
  Thomas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: izd2.svg
Type: image/svg+xml
Size: 47136 bytes
Desc: izd2.svg
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170208/92918488/izd2-0001.svg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: izd3.svg
Type: image/svg+xml
Size: 92803 bytes
Desc: izd3.svg
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170208/92918488/izd3-0001.svg>

From thomas.schatzl at oracle.com  Wed Feb  8 19:28:33 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 08 Feb 2017 20:28:33 +0100
Subject: G1  Region size info
In-Reply-To: <ab44084327244cc6bc90756b35ff248a@UKPMSEXD202N02.na.blkint.com>
References: <46dc58914a7c40f991bcdbd3f023a85e@UKPMSEXD202N02.na.blkint.com>
	<1486570402.3510.55.camel@oracle.com>
	<ab44084327244cc6bc90756b35ff248a@UKPMSEXD202N02.na.blkint.com>
Message-ID: <1486582113.3853.24.camel@oracle.com>

Hi Gopal,

On Wed, 2017-02-08 at 16:28 +0000, Gopal, Prasanna CWK wrote:
> ?Hi Thomas
> 
> Thanks for your reply. We have following G1 configuration at the
> moment?
> 
> 
[...]
> Apart from these parameters, can we try another parameter to make the
> evacuation more aggressive??
> 

It premature to try to make the collection more aggressive if e.g.
there is not anything worth to reclaim anyway.

> 
> -XX:G1MixedGCLiveThresholdPercent=85???

You could increase that. Look at your "post-marking" output and see if
there would be a significant additional amount of space to be
reclaimed.

Be aware that evacuating e.g. 90% full regions might be slow (and you
will only ever get 10% back).

Another option would be decreasing G1HeapWastePercent (not sure what
the default is, but it is pretty low already iirc), which would more
thoroughly clean out the collection set.

Also being more aggressively evacuating may not help e.g. for problems
with humongous objects/region fragmentation.

If there is a lot of unusable memory at the end of humongous objects
(check the "post-marking" output) actually decreasing region size might
help.

Eg.
###???HUMS 0x000000079e000000-
0x00000007a0000000???33554432???33554432???33554432?????501904008.8
###???HUMC 0x00000007a0000000-
0x00000007a2000000????2367536????2367536????2367536???????2526909.9

indicates that that humongous object basically wastes 31M out of 64M,
which is really bad if there are more of those hanging around.

I do not see any good solution with g1 on 7u other than increasing the
heap if that large a region size is necessary. If these humongous
objects are short-lived (and do not have j.l.O. elements), then
upgrading to 8u/9 may help (i.e. if eager reclaim can clean out large
objects regularly and asap).

Btw, the log also indicates?4121M out of 6144M of live data (around
3800M after hypothetically cleaning out all of old gen). This amount of
live data may already beyond the comfort zone of most collectors. Only
Jdk9 improves a bit in these situations, but not sure if the changes
apply here.

Not sure if decreasing heap region size will help a lot either as the
heap is already relatively full.

> -XX:InitiatingHeapOccupancyPercent=60 ==> we have experimented with
> less values , it is just making the concurrent cycle without claiming
> any significant.?

Actually even 60% seems to much. If your average live set size is at
61% already like in the log, G1 already runs marking all the time.

> -XX:MaxGCPauseMillis=500
> 
> Our object allocation rate is very high, before increasing the memory
> can we try any other parameter which can make the evacuation more
> aggressive? Appreciate your help. Please do let me know, if you need
> any more information.?

Thanks,
? Thomas


From willb at eero.com  Tue Feb 21 18:52:36 2017
From: willb at eero.com (Will Bertelsen)
Date: Tue, 21 Feb 2017 10:52:36 -0800
Subject: Native memory leak in StringTable::intern using G1
Message-ID: <CAH9wd+K1s09Wb0icOJ3wxJKt448WmmDowSc=VCxHqDv6sJ5uzg@mail.gmail.com>

Hi All,

I've been experimenting with G1 in production and have noticed a large
native memory leak that eventually exhausts all memory on the system. I ran
it overnight with NMT enabled and this was the biggest offender:

[0x00007f86c31cf205] Hashtable<oopDesc*, (MemoryType)9>::new_entry(unsigned
int, oopDesc*)+0x165
[0x00007f86c35dd263] StringTable::basic_add(int, Handle, unsigned short*,
int, unsigned int, Thread*)+0xd3
[0x00007f86c35dd452] StringTable::intern(Handle, unsigned short*, int,
Thread*)+0x182
[0x00007f86c35dd921] StringTable::intern(oopDesc*, Thread*)+0x131
                             (malloc=2628520KB +2601784KB #328565 +325223)

Has anyone seen this before?

Here is my java version and gc settings:

java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

-Xmx16384M
-Xms16384M
-XX:+AggressiveOpts
-XX:+UnlockExperimentalVMOptions

-XX:+UseG1GC
-XX:G1HeapRegionSize=8M
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=80

-XX:MaxGCPauseMillis=250
-XX:MaxMetaspaceSize=1G
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170221/9608d0e4/attachment.html>

From yu.zhang at oracle.com  Wed Feb 22 19:04:36 2017
From: yu.zhang at oracle.com (yu.zhang at oracle.com)
Date: Wed, 22 Feb 2017 11:04:36 -0800
Subject: Native memory leak in StringTable::intern using G1
In-Reply-To: <CAH9wd+K1s09Wb0icOJ3wxJKt448WmmDowSc=VCxHqDv6sJ5uzg@mail.gmail.com>
References: <CAH9wd+K1s09Wb0icOJ3wxJKt448WmmDowSc=VCxHqDv6sJ5uzg@mail.gmail.com>
Message-ID: <5a6265b5-9e96-1427-845b-2bf508d1edd2@oracle.com>

Will,

Does your application generate a lot of interned string?

Another way to confirm is with jmap -heap <pid> 'interned Stings' should 
be printed. Did full gc happen during the run?

Thanks

Jenny


On 02/21/2017 10:52 AM, Will Bertelsen wrote:
> Hi All,
>
> I've been experimenting with G1 in production and have noticed a large 
> native memory leak that eventually exhausts all memory on the system. 
> I ran it overnight with NMT enabled and this was the biggest offender:
>
> [0x00007f86c31cf205] Hashtable<oopDesc*, 
> (MemoryType)9>::new_entry(unsigned int, oopDesc*)+0x165
> [0x00007f86c35dd263] StringTable::basic_add(int, Handle, unsigned 
> short*, int, unsigned int, Thread*)+0xd3
> [0x00007f86c35dd452] StringTable::intern(Handle, unsigned short*, int, 
> Thread*)+0x182
> [0x00007f86c35dd921] StringTable::intern(oopDesc*, Thread*)+0x131
>                              (malloc=2628520KB +2601784KB #328565 +325223)
>
> Has anyone seen this before?
>
> Here is my java version and gc settings:
>
> java version "1.8.0_45"
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
> -Xmx16384M
> -Xms16384M
> -XX:+AggressiveOpts
> -XX:+UnlockExperimentalVMOptions
>
> -XX:+UseG1GC
> -XX:G1HeapRegionSize=8M
> -XX:G1NewSizePercent=20
> -XX:G1MaxNewSizePercent=80
>
> -XX:MaxGCPauseMillis=250
> -XX:MaxMetaspaceSize=1G
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170222/4ac71206/attachment.html>

From willb at eero.com  Wed Feb 22 21:35:01 2017
From: willb at eero.com (Will Bertelsen)
Date: Wed, 22 Feb 2017 13:35:01 -0800
Subject: Native memory leak in StringTable::intern using G1
In-Reply-To: <5a6265b5-9e96-1427-845b-2bf508d1edd2@oracle.com>
References: <CAH9wd+K1s09Wb0icOJ3wxJKt448WmmDowSc=VCxHqDv6sJ5uzg@mail.gmail.com>
 <5a6265b5-9e96-1427-845b-2bf508d1edd2@oracle.com>
Message-ID: <CAH9wd++5uHySSiQwux3OAgqbB8V2DuJaV0w5DoPogPopPFh2Tw@mail.gmail.com>

My application doesn't explicitly intern anything, though our libraries
might. However, when running jmap as you suggested no interned strings are
reported.
And no. Full GC never occurred in the 2 or so days we ran G1 before the OS
killed our proc due to system memory exhaustion.

Here is the output of jmap:

JVM version is 25.45-b02

using thread-local object allocation.

Garbage-First (G1) GC with 13 thread(s)

Heap Configuration:

   MinHeapFreeRatio         = 40

   MaxHeapFreeRatio         = 70

   MaxHeapSize              = 17179869184 (16384.0MB)

   NewSize                  = 1363144 (1.2999954223632812MB)

   MaxNewSize               = 13740539904 (13104.0MB)

   OldSize                  = 5452592 (5.1999969482421875MB)

   NewRatio                 = 2

   SurvivorRatio            = 8

   MetaspaceSize            = 21807104 (20.796875MB)

   CompressedClassSpaceSize = 1073741824 (1024.0MB)

   MaxMetaspaceSize         = 1073741824 (1024.0MB)

   G1HeapRegionSize         = 8388608 (8.0MB)


Heap Usage:

G1 Heap:

   regions  = 2048

   capacity = 17179869184 (16384.0MB)

   used     = 7899518456 (7533.5678634643555MB)

   free     = 9280350728 (8850.432136535645MB)

   45.98124916665256% used

G1 Young Generation:

Eden Space:

   regions  = 518

   capacity = 7700742144 (7344.0MB)

   used     = 4345298944 (4144.0MB)

   free     = 3355443200 (3200.0MB)

   56.42701525054466% used

Survivor Space:

   regions  = 58

   capacity = 486539264 (464.0MB)

   used     = 486539264 (464.0MB)

   free     = 0 (0.0MB)

   100.0% used

G1 Old Generation:

   regions  = 367

   capacity = 8992587776 (8576.0MB)

   used     = 3067680248 (2925.5678634643555MB)

   free     = 5924907528 (5650.4321365356445MB)

   34.11343124375414% used

On Wed, Feb 22, 2017 at 11:04 AM, yu.zhang at oracle.com <yu.zhang at oracle.com>
wrote:

> Will,
>
> Does your application generate a lot of interned string?
>
> Another way to confirm is with jmap -heap <pid> 'interned Stings' should
> be printed. Did full gc happen during the run?
>
> Thanks
>
> Jenny
>
> On 02/21/2017 10:52 AM, Will Bertelsen wrote:
>
> Hi All,
>
> I've been experimenting with G1 in production and have noticed a large
> native memory leak that eventually exhausts all memory on the system. I ran
> it overnight with NMT enabled and this was the biggest offender:
>
> [0x00007f86c31cf205] Hashtable<oopDesc*, (MemoryType)9>::new_entry(unsigned
> int, oopDesc*)+0x165
> [0x00007f86c35dd263] StringTable::basic_add(int, Handle, unsigned short*,
> int, unsigned int, Thread*)+0xd3
> [0x00007f86c35dd452] StringTable::intern(Handle, unsigned short*, int,
> Thread*)+0x182
> [0x00007f86c35dd921] StringTable::intern(oopDesc*, Thread*)+0x131
>                              (malloc=2628520KB +2601784KB #328565 +325223)
>
> Has anyone seen this before?
>
> Here is my java version and gc settings:
>
> java version "1.8.0_45"
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
> -Xmx16384M
> -Xms16384M
> -XX:+AggressiveOpts
> -XX:+UnlockExperimentalVMOptions
>
> -XX:+UseG1GC
> -XX:G1HeapRegionSize=8M
> -XX:G1NewSizePercent=20
> -XX:G1MaxNewSizePercent=80
>
> -XX:MaxGCPauseMillis=250
> -XX:MaxMetaspaceSize=1G
>
>
> _______________________________________________
> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170222/419cd1ae/attachment.html>

From thomas.schatzl at oracle.com  Thu Feb 23 11:53:33 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 23 Feb 2017 12:53:33 +0100
Subject: Native memory leak in StringTable::intern using G1
In-Reply-To: <CAH9wd+K1s09Wb0icOJ3wxJKt448WmmDowSc=VCxHqDv6sJ5uzg@mail.gmail.com>
References: <CAH9wd+K1s09Wb0icOJ3wxJKt448WmmDowSc=VCxHqDv6sJ5uzg@mail.gmail.com>
Message-ID: <1487850813.7074.31.camel@oracle.com>

Hi,

On Tue, 2017-02-21 at 10:52 -0800, Will Bertelsen wrote:
> Hi All,
> 
> I've been experimenting with G1 in production and have noticed a
> large native memory leak that eventually exhausts all memory on the
> system. I ran it overnight with NMT enabled and this was the biggest
> offender:
> 
> [0x00007f86c31cf205] Hashtable<oopDesc*,
> (MemoryType)9>::new_entry(unsigned int, oopDesc*)+0x165
> [0x00007f86c35dd263] StringTable::basic_add(int, Handle, unsigned
> short*, int, unsigned int, Thread*)+0xd3
> [0x00007f86c35dd452] StringTable::intern(Handle, unsigned short*,
> int, Thread*)+0x182
> [0x00007f86c35dd921] StringTable::intern(oopDesc*, Thread*)+0x131
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(malloc=2628520KB +2601784KB #328565
> +325223)
> 
> 
> Has anyone seen this before?

This is just a workaround, but if the application runs for so long that
it never does a full gc or do a marking cycle (did it?), you could
manually trigger string table cleanup by issuing a system.gc with jmap
now and then.
If you set -XX:+ExplicitGCInvokesConcurrent, it will not be a stop-the-
world gc.

There is no equivalent to CMSTriggerInterval in G1 which starts a
regular concurrent collection cycle every now and then (which is
basically the same band-aid).

> Here is my java version and gc settings:
> 
> java version "1.8.0_45"
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

You may want to update. While looking for something similar in the bug
tracker, I found e.g.?https://bugs.openjdk.java.net/browse/JDK-8133193?
fixed in 8u72.

Thanks,
? Thomas


From willb at eero.com  Fri Feb 24 23:44:26 2017
From: willb at eero.com (Will Bertelsen)
Date: Fri, 24 Feb 2017 15:44:26 -0800
Subject: Native memory leak in StringTable::intern using G1
In-Reply-To: <1487850813.7074.31.camel@oracle.com>
References: <CAH9wd+K1s09Wb0icOJ3wxJKt448WmmDowSc=VCxHqDv6sJ5uzg@mail.gmail.com>
 <1487850813.7074.31.camel@oracle.com>
Message-ID: <CAH9wd++qU-7pYPYXnh9qDVat-RvGZgF+o89i67_wNxhvLhe0Nw@mail.gmail.com>

No, in this configuration it only did young and mixed GCs before it was
killed by the system. I've fallen back to CMS for now, but when we upgrade
java I might try G1 again to see if this is resolved.

On Thu, Feb 23, 2017 at 3:53 AM, Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi,
>
> On Tue, 2017-02-21 at 10:52 -0800, Will Bertelsen wrote:
> > Hi All,
> >
> > I've been experimenting with G1 in production and have noticed a
> > large native memory leak that eventually exhausts all memory on the
> > system. I ran it overnight with NMT enabled and this was the biggest
> > offender:
> >
> > [0x00007f86c31cf205] Hashtable<oopDesc*,
> > (MemoryType)9>::new_entry(unsigned int, oopDesc*)+0x165
> > [0x00007f86c35dd263] StringTable::basic_add(int, Handle, unsigned
> > short*, int, unsigned int, Thread*)+0xd3
> > [0x00007f86c35dd452] StringTable::intern(Handle, unsigned short*,
> > int, Thread*)+0x182
> > [0x00007f86c35dd921] StringTable::intern(oopDesc*, Thread*)+0x131
> >                              (malloc=2628520KB +2601784KB #328565
> > +325223)
> >
> >
> > Has anyone seen this before?
>
> This is just a workaround, but if the application runs for so long that
> it never does a full gc or do a marking cycle (did it?), you could
> manually trigger string table cleanup by issuing a system.gc with jmap
> now and then.
> If you set -XX:+ExplicitGCInvokesConcurrent, it will not be a stop-the-
> world gc.
>
> There is no equivalent to CMSTriggerInterval in G1 which starts a
> regular concurrent collection cycle every now and then (which is
> basically the same band-aid).
>
> > Here is my java version and gc settings:
> >
> > java version "1.8.0_45"
> > Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
> You may want to update. While looking for something similar in the bug
> tracker, I found e.g. https://bugs.openjdk.java.net/browse/JDK-8133193
> fixed in 8u72.
>
> Thanks,
>   Thomas
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170224/d5e755eb/attachment.html>

From tg at freigmbh.de  Tue Feb 28 12:45:44 2017
From: tg at freigmbh.de (Thorsten Goetzke)
Date: Tue, 28 Feb 2017 13:45:44 +0100
Subject: Unreachable Memory not freed, Nashorn Demo
Message-ID: <9171bcd9-5212-edb6-59f6-aa17b60b50e2@freigmbh.de>

Hello,

Back in January i posted about Unreachable Objects not claimed by the 
gc, i am finally able to produce a micro, see below. When I run the 
class below using -Xmx4g and take a memory snaphsot (hprof or yourkit, 
doesnt matter), I will see 2 LeakImpl Objects. These Objects have no 
reported path to root, yet they won't be collected. If i lower the heap 
space to -Xmx2g the Application throws java.lang.OutOfMemoryError: Java 
heap space.
@Jenny Zhang should I create a new bugreport, or will you take care of this?

Best Regards,
Thorsten Goetzke

package de.frei.demo;

import jdk.nashorn.api.scripting.NashornScriptEngine;
import jdk.nashorn.api.scripting.NashornScriptEngineFactory;

import javax.script.CompiledScript;
import javax.script.ScriptException;
import javax.script.SimpleBindings;
import java.util.function.Function;


public final class LeakDemo {

     private static  NashornScriptEngine ENGINE = getNashornScriptEngine();
     private static CompiledScript SCRIPT;

     public static void main(String[] args) throws Exception {
         simulateLoad();
         simulateLoad();
         System.gc();
         Thread.sleep(1000000);

     }

     private static void simulateLoad() throws ScriptException {
         final CompiledScript compiledScript = getCompiledScript(ENGINE);
         compiledScript.eval(new SimplestBindings(new LeakImpl()));
     }

     private static NashornScriptEngine getNashornScriptEngine() {
         final NashornScriptEngineFactory factory = new 
NashornScriptEngineFactory();
         final NashornScriptEngine scriptEngine = (NashornScriptEngine) 
factory.getScriptEngine();
         return scriptEngine;
     }


     private static CompiledScript getCompiledScript(final 
NashornScriptEngine scriptEngine) throws ScriptException {
         if (SCRIPT == null) {
             SCRIPT = scriptEngine.compile("    var pivot = 
getItem(\"pivot\");");
         }
         return SCRIPT;
     }

     public  interface Leak {

         LiveItem getItem(String id);
     }


     public static final class LeakImpl implements Leak {
         private final byte[] payload = new byte[1024 * 1024 * 1024];


         @Override
         public LiveItem getItem(final String id) {
             return new LiveItem() {
             };
         }


     }

     public interface LiveItem {
     }

     public static final class SimplestBindings extends SimpleBindings {
         public SimplestBindings(Leak leak) {

             put("getItem",(Function< String, LiveItem>) leak::getItem);
         }
     }
}


From yu.zhang at oracle.com  Tue Feb 28 17:06:32 2017
From: yu.zhang at oracle.com (Jenny Zhang)
Date: Tue, 28 Feb 2017 09:06:32 -0800
Subject: Unreachable Memory not freed, Nashorn Demo
In-Reply-To: <9171bcd9-5212-edb6-59f6-aa17b60b50e2@freigmbh.de>
References: <9171bcd9-5212-edb6-59f6-aa17b60b50e2@freigmbh.de>
Message-ID: <b55bebc7-c702-4058-5efd-0122c0600052@oracle.com>

Thorsten

Thanks very much for the micro. I have update it to

https://bugs.openjdk.java.net/browse/JDK-8173594

Thanks
Jenny

On 2/28/2017 4:45 AM, Thorsten Goetzke wrote:
> Hello,
>
> Back in January i posted about Unreachable Objects not claimed by the 
> gc, i am finally able to produce a micro, see below. When I run the 
> class below using -Xmx4g and take a memory snaphsot (hprof or yourkit, 
> doesnt matter), I will see 2 LeakImpl Objects. These Objects have no 
> reported path to root, yet they won't be collected. If i lower the 
> heap space to -Xmx2g the Application throws 
> java.lang.OutOfMemoryError: Java heap space.
> @Jenny Zhang should I create a new bugreport, or will you take care of 
> this?
>
> Best Regards,
> Thorsten Goetzke
>
> package de.frei.demo;
>
> import jdk.nashorn.api.scripting.NashornScriptEngine;
> import jdk.nashorn.api.scripting.NashornScriptEngineFactory;
>
> import javax.script.CompiledScript;
> import javax.script.ScriptException;
> import javax.script.SimpleBindings;
> import java.util.function.Function;
>
>
> public final class LeakDemo {
>
>     private static  NashornScriptEngine ENGINE = 
> getNashornScriptEngine();
>     private static CompiledScript SCRIPT;
>
>     public static void main(String[] args) throws Exception {
>         simulateLoad();
>         simulateLoad();
>         System.gc();
>         Thread.sleep(1000000);
>
>     }
>
>     private static void simulateLoad() throws ScriptException {
>         final CompiledScript compiledScript = getCompiledScript(ENGINE);
>         compiledScript.eval(new SimplestBindings(new LeakImpl()));
>     }
>
>     private static NashornScriptEngine getNashornScriptEngine() {
>         final NashornScriptEngineFactory factory = new 
> NashornScriptEngineFactory();
>         final NashornScriptEngine scriptEngine = (NashornScriptEngine) 
> factory.getScriptEngine();
>         return scriptEngine;
>     }
>
>
>
>     private static CompiledScript getCompiledScript(final 
> NashornScriptEngine scriptEngine) throws ScriptException {
>         if (SCRIPT == null) {
>             SCRIPT = scriptEngine.compile("    var pivot = 
> getItem(\"pivot\");");
>         }
>         return SCRIPT;
>     }
>
>     public  interface Leak {
>
>         LiveItem getItem(String id);
>     }
>
>
>     public static final class LeakImpl implements Leak {
>         private final byte[] payload = new byte[1024 * 1024 * 1024];
>
>
>         @Override
>         public LiveItem getItem(final String id) {
>             return new LiveItem() {
>             };
>         }
>
>
>     }
>
>     public interface LiveItem {
>     }
>
>     public static final class SimplestBindings extends SimpleBindings {
>         public SimplestBindings(Leak leak) {
>
>             put("getItem",(Function< String, LiveItem>) leak::getItem);
>         }
>     }
> }
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


From poonam.bajaj at oracle.com  Tue Feb 28 17:56:42 2017
From: poonam.bajaj at oracle.com (Poonam Bajaj Parhar)
Date: Tue, 28 Feb 2017 09:56:42 -0800
Subject: Unreachable Memory not freed, Nashorn Demo
In-Reply-To: <b55bebc7-c702-4058-5efd-0122c0600052@oracle.com>
References: <9171bcd9-5212-edb6-59f6-aa17b60b50e2@freigmbh.de>
 <b55bebc7-c702-4058-5efd-0122c0600052@oracle.com>
Message-ID: <ff243ceb-0492-d470-029a-89640e792b37@oracle.com>

Hello Thorsten,

I ran this test program with jdk9-ea and created a Heap Dump after the 
first Full GC using -XX:+HeapDumpAfterFullGC. In that heap dump, I can 
see 2 instances of LeakImpl:

Class Name       | Objects | Shallow Heap | Retained Heap
----------------------------------------------------------
LeakDemo$LeakImpl|       2 |           32 |
----------------------------------------------------------

the first one is reachable as a local variable from the main thread 
which is fine:

Class Name                                    | Ref. Objects | Shallow 
Heap | Ref. Shallow Heap | Retained Heap
----------------------------------------------------------------------------------------------------------------
java.lang.Thread @ 0x84f211f8 Thread          |            1 |          
120 |                16 |           736
'- <Java Local> LeakDemo$LeakImpl @ 0x850d89f0|            1 |           
16 |                16 |            16
----------------------------------------------------------------------------------------------------------------


the other one is reachable through the referent 
"jdk.nashorn.internal.objects.Global" of a WeakReference:

Class Name              | Ref. Objects | Shallow Heap | Ref. Shallow 
Heap | Retained Heap
-----------------------------------------------------------------------------------------------------------
class jdk.internal.loader.ClassLoaders @ 0x84f268f8 System 
Class                                     |            1 |           16 
|                16 |            16
'- PLATFORM_LOADER jdk.internal.loader.ClassLoaders$PlatformClassLoader 
@ 0x84f2a610                 |            1 |           96 
|                16 |       199,624
    '- classes java.util.Vector @ 0x850b2b70 |            1 |           
32 |                16 |        68,104
       '- elementData java.lang.Object[640] @ 0x850b2b90 |            1 
|        2,576 |                16 |        68,072
          '- [196] class jdk.nashorn.internal.scripts.JD @ 
0x84f49960                                 |            1 |            8 
|                16 |         4,560
             '- map$ jdk.nashorn.internal.runtime.PropertyMap @ 
0x850d4a88                            |            1 | 64 
|                16 |         4,552
                '- protoHistory java.util.WeakHashMap @ 
0x850d5418                                    |            1 |           
48 |                16 |         2,208
                   '- table java.util.WeakHashMap$Entry[16] @ 
0x850d5448                              |            1 | 80 
|                16 |         2,112
*'- [10] java.util.WeakHashMap$Entry @ 
0x850d5498                                |            1 |           40 
|                16 |         2,032*
                         '- referent jdk.nashorn.internal.objects.Global 
@ 0x85137a18 |            1 |          544 |                16 |        
39,920
                            '- initscontext 
javax.script.SimpleScriptContext @ 0x8515c910 |            1 |           
32 |                16 |           280
                               '- engineScope LeakDemo$SimplestBindings 
@ 0x8515c930 |            1 |           16 |                16 
|           248
                                  '- map java.util.HashMap @ 
0x8515c940                               |            1 |           48 
|                16 |           232
                                     '- table java.util.HashMap$Node[16] 
@ 0x8515c970 |            1 |           80 |                16 
|           184
                                        '- [9] java.util.HashMap$Node @ 
0x8515c9f8 |            1 |           32 |                16 |            48
                                           '- value 
LeakDemo$SimplestBindings$$Lambda$118 @ 0x8515ca18|            1 
|           16 |                16 |            16
                                              '- arg$1 LeakDemo$LeakImpl 
@ 0x8515c600                 |            1 |           16 
|                16 | 1,073,741,856
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

 From the GC logs, the referent is present in an old region:

[2.044s][info   ][gc,metaspace   ] GC(6) Metaspace: 13522K->13518K(1062912K)
[2.047s][info   ][gc,start       ] GC(6) Heap Dump (after full gc)
Dumping heap to java_pid20428.hprof ...
Heap dump file created [1081050745 bytes in 25.084 secs]
[27.137s][info   ][gc             ] GC(6) Heap Dump (after full gc) 
25089.382ms
[27.137s][info   ][gc             ] GC(6) Pause Full (Allocation 
Failure) 1028M->1028M(1970M) 25171.038ms

Also:
[10.651s][trace][gc,region] GC(6) G1HR POST-COMPACTION(OLD) 
[0x0000000085100000, 0x0000000085161f20, 0x0000000085200000]

This Full GC didn't discover this WeakReference and didn't clear its 
referent. It needs to be investigated if it gets cleared and collected 
in the subsequent GCs.

Thanks,
Poonam

On 2/28/2017 9:06 AM, Jenny Zhang wrote:
> Thorsten
>
> Thanks very much for the micro. I have update it to
>
> https://bugs.openjdk.java.net/browse/JDK-8173594
>
> Thanks
> Jenny
>
> On 2/28/2017 4:45 AM, Thorsten Goetzke wrote:
>> Hello,
>>
>> Back in January i posted about Unreachable Objects not claimed by the 
>> gc, i am finally able to produce a micro, see below. When I run the 
>> class below using -Xmx4g and take a memory snaphsot (hprof or 
>> yourkit, doesnt matter), I will see 2 LeakImpl Objects. These Objects 
>> have no reported path to root, yet they won't be collected. If i 
>> lower the heap space to -Xmx2g the Application throws 
>> java.lang.OutOfMemoryError: Java heap space.
>> @Jenny Zhang should I create a new bugreport, or will you take care 
>> of this?
>>
>> Best Regards,
>> Thorsten Goetzke
>>
>> package de.frei.demo;
>>
>> import jdk.nashorn.api.scripting.NashornScriptEngine;
>> import jdk.nashorn.api.scripting.NashornScriptEngineFactory;
>>
>> import javax.script.CompiledScript;
>> import javax.script.ScriptException;
>> import javax.script.SimpleBindings;
>> import java.util.function.Function;
>>
>>
>> public final class LeakDemo {
>>
>>     private static  NashornScriptEngine ENGINE = 
>> getNashornScriptEngine();
>>     private static CompiledScript SCRIPT;
>>
>>     public static void main(String[] args) throws Exception {
>>         simulateLoad();
>>         simulateLoad();
>>         System.gc();
>>         Thread.sleep(1000000);
>>
>>     }
>>
>>     private static void simulateLoad() throws ScriptException {
>>         final CompiledScript compiledScript = getCompiledScript(ENGINE);
>>         compiledScript.eval(new SimplestBindings(new LeakImpl()));
>>     }
>>
>>     private static NashornScriptEngine getNashornScriptEngine() {
>>         final NashornScriptEngineFactory factory = new 
>> NashornScriptEngineFactory();
>>         final NashornScriptEngine scriptEngine = 
>> (NashornScriptEngine) factory.getScriptEngine();
>>         return scriptEngine;
>>     }
>>
>>
>>
>>     private static CompiledScript getCompiledScript(final 
>> NashornScriptEngine scriptEngine) throws ScriptException {
>>         if (SCRIPT == null) {
>>             SCRIPT = scriptEngine.compile("    var pivot = 
>> getItem(\"pivot\");");
>>         }
>>         return SCRIPT;
>>     }
>>
>>     public  interface Leak {
>>
>>         LiveItem getItem(String id);
>>     }
>>
>>
>>     public static final class LeakImpl implements Leak {
>>         private final byte[] payload = new byte[1024 * 1024 * 1024];
>>
>>
>>         @Override
>>         public LiveItem getItem(final String id) {
>>             return new LiveItem() {
>>             };
>>         }
>>
>>
>>     }
>>
>>     public interface LiveItem {
>>     }
>>
>>     public static final class SimplestBindings extends SimpleBindings {
>>         public SimplestBindings(Leak leak) {
>>
>>             put("getItem",(Function< String, LiveItem>) leak::getItem);
>>         }
>>     }
>> }
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20170228/1cb3e47c/attachment.html>