From thomas.schatzl at oracle.com  Mon Sep  1 15:08:37 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 01 Sep 2014 17:08:37 +0200
Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector
In-Reply-To:
References: <53FBE77B.6060806@oracle.com>
Message-ID: <1409584117.2755.88.camel@cirrus>

Hi all,

  having had some time to investigate this issue, I can confirm the
problem. Large reference arrays cause very long pauses.

I filed https://bugs.openjdk.java.net/browse/JDK-8057003 for this problem.

In addition to that, very large object arrays will trip other pathological
performance problems, e.g. an almost guaranteed mark stack overflow that
prevents completion of the marking, leading to full GCs.

On Fri, 2014-08-29 at 16:07 +0000, Krishnamurthy, Kannan wrote:
> Ramki,
>
> Thanks for the detailed explanation. Will continue to profile
> further and share the findings. Excuse my naivety, but doesn't the
> default value of 10 ms for G1ConcMarkStepDurationMillis still help
> in this case?
> Will G1RefProcDrainInterval be of any use?

No. The only workaround I can see is to make sure that there are no
such large objects at all at this time.

> ________________________________________
> From: Srinivas Ramakrishna [ysr1729 at gmail.com]
> Sent: Thursday, August 28, 2014 7:25 PM
> To: Krishnamurthy, Kannan
> Cc: Martin Makundi; Yu Zhang; hotspot-gc-use at openjdk.java.net;
> kndkannan at gmail.com; Zhou, Jerry
> Subject: Re: Unexplained long stop the world pauses during
> concurrent marking step in G1 Collector
>
> It's been a while since I looked at G1 code and I'm sure it's
> evolved a bunch since then...
>
> Hi Kannan --
>
> As you surmised, it's likely that the marking step isn't checking
> at a sufficiently fine granularity whether a safepoint has been
> requested. Or, equivalently, the marking step is doing too much
> work in one "step", thus preventing a safepoint while the marking
> step is in progress. If you have GC logs from the application, you
> could look at the allocation rates that you observe and compare
> the rates during the marking phase and outside of the marking
> phase. I am guessing that because of this, the marking phase must
> be slowing down allocation, and we can get a measure of that from
> your GC logs. It is clear from your stack traces that the mutators
> are all blocked for allocation, while a safepoint is waiting for
> the marking step to yield.
>
> It could be (from the stack trace) that we are scanning a gigantic
> object array and perhaps the marking step can yield only after the
> entire array has been scanned. In which case, the use of large
> object arrays (or hash tables) could be a performance anti-pattern
> for G1.
> Perhaps we should allow for partial scanning of arrays -- I can't
> recall if CMS does that for marking -- save the state of the
> partial scan and resume from that point after the yield (which
> occurs at a sufficiently fine granularity).

CMS does not split large objects or yield on parts of large objects
either, as far as I can see. Some parts of the scanning only scan
dirty cards within these objects, but I am not sure that this is
sufficient in all cases. I think it is unrelated.

Maybe there is somebody with more experience on the CMS code who can
verify this.
> This used to be an issue with CMS as well in the early days and we
> had to refine the granularity of the marking steps (or the
> so-called "concurrent work yield points" -- points at which the
> marking will stop to allow a scavenge to proceed). I am guessing
> we'll need to refine the granularity at which G1 does these yields
> to allow a young collection to proceed in a timely fashion.

Thanks,
  Thomas


From ysr1729 at gmail.com  Tue Sep  2 06:11:45 2014
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Mon, 1 Sep 2014 23:11:45 -0700
Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector
In-Reply-To: <1409584117.2755.88.camel@cirrus>
References: <53FBE77B.6060806@oracle.com> <1409584117.2755.88.camel@cirrus>
Message-ID:

Hi Thomas --

Thanks for the test. I think you are right that CMS doesn't yield after
a partial scan of an array either. The dirty card rescan is for the
incremental update scanning following the initial scan. So, yes, as you
state, CMS and G1 are probably equally susceptible to this issue.

Thanks for filing the bug. I am guessing we could have a way by which
marking state could be remembered in the form of resumption point(s)
for partially scanned object arrays. One technique I vaguely recall in
CMS was to re-dirty the unscanned part of an object array so that the
re-dirtied suffix would be picked up and scanned at a later time. But
perhaps it applied to a different part of the scanning... memory is a
bit foggy, and I'm just trying to get set up with the code following a
longish break.

-- ramki
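For readers following along, the resumption-point idea can be sketched
outside of HotSpot. The following is a minimal, hypothetical Java
illustration -- it is not HotSpot code, and all names (ChunkedMarkerSketch,
MarkTask, CHUNK, yieldRequested) are invented for the example: the marker
scans an object array in bounded chunks, pushes a continuation task for the
unscanned suffix back onto the mark stack, and checks a yield flag between
chunks, so a pending safepoint never has to wait for more than one chunk of
work.

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only: chunked marking of object arrays with
// explicit resumption points. Not HotSpot code.
final class ChunkedMarkerSketch {

    // A task is an object array plus the index at which scanning resumes.
    static final class MarkTask {
        final Object[] array;
        final int from;
        MarkTask(Object[] array, int from) { this.array = array; this.from = from; }
    }

    private static final int CHUNK = 1024;      // max references scanned per step (arbitrary)
    private final Deque<MarkTask> markStack = new ArrayDeque<>();
    volatile boolean yieldRequested;            // would be set while a safepoint is pending

    void scan(Object[] roots) {
        markStack.push(new MarkTask(roots, 0));
        drain();
    }

    void drain() {
        while (!markStack.isEmpty()) {
            if (yieldRequested) {
                return;                         // resume later; all state lives on the mark stack
            }
            MarkTask task = markStack.pop();
            int to = Math.min(task.from + CHUNK, task.array.length);
            for (int i = task.from; i < to; i++) {
                Object ref = task.array[i];
                if (ref instanceof Object[]) {
                    markStack.push(new MarkTask((Object[]) ref, 0));
                }
                // non-array references would be marked here
            }
            if (to < task.array.length) {
                // Resumption point: remember where to continue in this array.
                markStack.push(new MarkTask(task.array, to));
            }
        }
    }
}

In HotSpot the equivalent state would live on the concurrent mark stack or,
as Thomas suggests in the next message, in a dedicated continuation queue.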
From thomas.schatzl at oracle.com  Tue Sep  2 14:27:37 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 02 Sep 2014 16:27:37 +0200
Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector
In-Reply-To:
References: <53FBE77B.6060806@oracle.com> <1409584117.2755.88.camel@cirrus>
Message-ID: <1409668057.2665.75.camel@cirrus>

Hi Ramki,

On Mon, 2014-09-01 at 23:11 -0700, Srinivas Ramakrishna wrote:
> Thanks for filing the bug. I am guessing we could have a way by which
> marking state could be remembered in the form of resumption point(s)
> for partially scanned object arrays. One technique I vaguely recall
> in CMS was to re-dirty the unscanned part of an object array so that
> the re-dirtied suffix would be picked up and scanned at a later time.

Another solution for this issue could be to split the array objects,
like the full GC marking phases do, into the currently processed part
and a continuation, and store the continuation in an extra queue. This
would often solve the problem with mark stack overflow as well.
Not sure if it is easy to do :)

Thanks,
  Thomas


From jerry.zhou at cengage.com  Tue Sep  2 16:07:49 2014
From: jerry.zhou at cengage.com (Zhou, Jerry)
Date: Tue, 2 Sep 2014 16:07:49 +0000
Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector
In-Reply-To:
References: <53FBE77B.6060806@oracle.com> <1409584117.2755.88.camel@cirrus>
Message-ID:

Thanks a lot Thomas and Ramki!

I'm Kannan's colleague at Cengage Learning. He is out on a short
personal trip, so I will be following up on this for now. Our team has
a few more questions:

1. Could you please give a rough estimated time of arrival (ETA) for
   the fix? We may not need it immediately, but as the load goes up we
   will have to deal with this soon.

2. In JDK-8057003, Thomas marked the affected versions as 8u40 and 9.
   After it is fixed, will it be backported to Java 7?

3. Shall we capture some of the valuable information and discussion
   from this thread in JDK-8057003? We are not sure whether we will be
   able to do that, or whether Thomas and Ramki can help with that.

4. Is there any possible workaround for the time being other than
   chopping the arrays short? Our production system uses Apache Lucene
   underneath and has to deal with a lot of complex filters. A major
   refactoring would be required to use shorter arrays, which may go
   well beyond the ETA of a fix from your end.

Thanks again for the timely responses.

Best Regards,
Jerry Zhou
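On question 4 above: the only mitigation mentioned so far in this thread is
to avoid single huge reference arrays altogether. As a rough, generic
illustration of what "chopping the arrays short" can look like without
changing the surrounding logic -- this is not Lucene-specific, and the class
name and the 64Ki chunk size are arbitrary choices for the example -- a
two-level structure keeps every leaf array small:

// Sketch of a "chunked" replacement for one huge Object[]; the 64Ki leaf
// size is an arbitrary example, small enough that scanning or copying any
// single leaf is cheap.
final class ChunkedArray<T> {
    private static final int CHUNK_BITS = 16;              // 65,536 elements per leaf
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final Object[][] chunks;
    private final long length;

    ChunkedArray(long length) {
        this.length = length;
        int nChunks = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        this.chunks = new Object[nChunks][];
        for (int i = 0; i < nChunks; i++) {
            long remaining = length - ((long) i << CHUNK_BITS);
            chunks[i] = new Object[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    @SuppressWarnings("unchecked")
    T get(long index) {
        return (T) chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, T value) {
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    long length() { return length; }
}

Each leaf here holds at most 64Ki references, so the work the marker does on
any single leaf is bounded, at the cost of one extra indirection per access.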
From alexey.ragozin at gmail.com  Wed Sep 10 21:53:01 2014
From: alexey.ragozin at gmail.com (Alexey Ragozin)
Date: Thu, 11 Sep 2014 01:53:01 +0400
Subject: Max possible heap size for Concurrent Mark Sweep collector.
Message-ID:

Hi,

Some time ago we tried to set up a 200 GiB heap per JVM using the CMS GC.
The Oracle HotSpot JVM failed to start with the following error:

Java HotSpot(TM) 64-Bit Server VM warning: CMS bit map allocation failure
Java HotSpot(TM) 64-Bit Server VM warning: Failed to allocate CMS Bit Map
Error occurred during initialization of VM
Could not create CMS collector
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
The JVM command line was:

/usr/java/jdk1.7.0_60/bin/java -d64 -verbose:gc -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xmn512m -XX:MaxTenuringThreshold=1 -Xms200G -Xmx200G -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=90 -Duse-cached-field-op=false -XX:MaxPermSize=1024m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=13521 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:-UsePopCountInstruction APPLICATION_MAIN_CLASS ...

We have reduced -Xmx to 180 GiB, and that is currently our configuration.
Given the amount of memory in our standard hardware setup, using a JVM
with a 400 GiB heap would be desirable.

My question is: is 200 GiB a fundamental limitation of the CMS algorithm,
or is it something that could be remedied?

Any relevant pointers into the OpenJDK code base would be appreciated.

Regards,
Alexey


From ysr1729 at gmail.com  Wed Sep 10 22:55:10 2014
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Wed, 10 Sep 2014 15:55:10 -0700
Subject: Max possible heap size for Concurrent Mark Sweep collector.
In-Reply-To:
References:
Message-ID:

It can't be a fundamental limitation, from what I can tell and what I
recall. The failure is in allocating the marking bit map for the old gen.
Could it be that you don't have enough swap configured to back the
virtual address space required for this setup? You could strace the
launch and see why the mmap request might be failing.

-- Ramki
ysr1729
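For reference, the strace suggestion above, applied to the command line
already quoted in this thread, could look roughly like the following. This
is just one reasonable choice of options, not a prescribed incantation; the
output file name and the trailing application arguments are placeholders.

strace -f -e trace=mmap,munmap -o strace.out \
    /usr/java/jdk1.7.0_60/bin/java -d64 -Xms200G -Xmx200G -XX:+UseConcMarkSweepGC \
    APPLICATION_MAIN_CLASS ...

Searching the output for mmap calls that return an error shows the size of
the reservation that could not be satisfied, which narrows down whether it
is the heap itself or one of the auxiliary GC data structures that fails.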
From yiyeguhu at gmail.com  Wed Sep 10 22:57:13 2014
From: yiyeguhu at gmail.com (Tao Mao)
Date: Wed, 10 Sep 2014 15:57:13 -0700
Subject: Max possible heap size for Concurrent Mark Sweep collector.
In-Reply-To:
References:
Message-ID:

How large is the machine's memory? And how many VMs did you try to run
on one machine at the same time?


From alexey.ragozin at gmail.com  Sat Sep 13 00:52:07 2014
From: alexey.ragozin at gmail.com (Alexey Ragozin)
Date: Sat, 13 Sep 2014 04:52:07 +0400
Subject: Max possible heap size for Concurrent Mark Sweep collector.
In-Reply-To:
References:
Message-ID:

The box has 256 GiB of RAM, and no other memory-consuming processes were
running at that time. The kernel version is 3.0.80 (SUSE Enterprise
Linux 11).

I have had a look at the code base, and it seems that "CMS bit map
allocation failure" is raised if the JVM fails to reserve contiguous
address space for the bitmap. The actual memory commit is a few lines
below.

bool CMSBitMap::allocate(MemRegion mr) {
  _bmStartWord = mr.start();
  _bmWordSize  = mr.word_size();
  ReservedSpace brs(ReservedSpace::allocation_align_size_up(
                    (_bmWordSize >> (_shifter + LogBitsPerByte)) + 1));
  if (!brs.is_reserved()) {
    warning("CMS bit map allocation failure");
    return false;
  }
  // For now we'll just commit all of the bit map up fromt.
  // Later on we'll try to be more parsimonious with swap.
  if (!_virtual_space.initialize(brs, brs.size())) {
    warning("CMS bit map backing store failure");
    return false;
  }

How could that be possible for a 64-bit process?

Regards,
Alexey
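To put a number on the reservation in that excerpt: the CMS mark bit map
keeps roughly one bit per heap word, so, assuming _shifter is 0 for this
bit map and 8-byte heap words, the reserved block is about heap size
divided by 64. The snippet below is illustrative back-of-the-envelope
arithmetic only, not taken from the JDK sources, and the class name is
invented.

// Rough size of the CMS mark bit map reservation, assuming one bit per
// 8-byte heap word (_shifter == 0).
public class CmsBitMapSize {
    public static void main(String[] args) {
        long heapBytes   = 200L * 1024 * 1024 * 1024;   // -Xmx200G
        long heapWords   = heapBytes / 8;               // 8-byte heap words
        long bitmapBytes = heapWords / 8;               // one bit per word
        System.out.printf("bit map needs ~%.2f GiB of contiguous address space%n",
                          bitmapBytes / (1024.0 * 1024 * 1024));
        // prints roughly 3.13 GiB for a 200 GiB heap
    }
}

A reservation of a few GiB is nothing unusual for a 64-bit process, which
supports Ramki's point that this is unlikely to be a fundamental limit of
the algorithm and that the actual failing mmap (swap, overcommit, or
address-space limits) is the thing to investigate.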