Strange GC behaviour,

Dang Nhan Nguyen nhann at chalmers.se
Thu Dec 8 18:20:17 UTC 2011


Hi,

>Actually, prior to the fix for 7112034, there would be potentially fewer Reference objects identified for later >processing (a step known as discovery).
>Following 7112034, there would be potentially more, as some objects not discovered before the fix would be >discovered now.

Agree!

>This is the strange part, and to me does not make sense. How id you determine that (1) there were lots of >Reference objects (2) that a lot
>of time was spent in Reference processing during remark? If you run with -XX:+PrintGCDetails ->XX:+PrintReferenceGC for both the
>cases (with and withouyt -XX:+ParallelRefProcEnabled), we would know what the relative numbers/times looked >like and perhaps
>get closer to what the issue might be here.

Attached cms_lusearch.txt is output of PrintGCDetails which I skipped some ParNew GC part to reduce file size.  The other is my record of number of objects marked in CMS phases. Since I run CMS in stop-the-world fashion (I change code a bit to do so), no object is promoted.
cms_lusearch.log records live objects marked in that phase, I recorded it by increasing the counter everytime an object is marked successfully.

I ran with below command line (don't worry about -XX:CMSInitiatingOccupancyFraction=100 -XX:+UseCMSInitiatingOccupancyOnly. I used it as part of my trick to cause CMS running in foreground):

JAVA_HOME=../ build/linux-amd64/j2sdk-image LD_LIBRARY_PATH=../build/hotspot_product_prof/linux_amd64_compiler2/product ../ build/hotspot_product_prof/linux_amd64_compiler2/product/gamma -Xms100M -Xmx100M -XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:CMSInitiatingOccupancyFraction=100 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8 -XX:NewRatio=10 -jar dacapo-9.12-bach.jar -s large -n 10 -t 8 lusearch

I recorded high number of objects marked in remark phase. As you can see in the two files, the total number of references reported is around 2-3000 refs during remark. But the number of objects marked during remarked is much more than that: 600k objects (for lusearch application in Dacapo benchmark).
Does this scenario normal?

-Nhan

From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com]
Sent: Thursday, December 08, 2011 5:15 PM
To: Dang Nhan Nguyen
Cc: hotspot-gc-dev at openjdk.java.net
Subject: Re: Strange GC behaviour,


On Thu, Dec 8, 2011 at 2:41 AM, Dang Nhan Nguyen <nhann at chalmers.se<mailto:nhann at chalmers.se>> wrote:
Hi,

Thank you for your update.
I apply CR is 7112034 and also try with -XX:+ParallelRefProcEnabled:


-    While applying CR is 7112034, CMS still leaves about the same number of objects/ref marked for the remark phase comparing to before applying the CR. Which means the CR make no (or little) change to the number of references  processed during the concurrent marking phase. How can this be explained?

Actually, prior to the fix for 7112034, there would be potentially fewer Reference objects identified for later processing (a step known as discovery).
Following 7112034, there would be potentially more, as some objects not discovered before the fix would be discovered now.

-    When I use -XX:+ParallelRefProcEnabled, performance is worse than not using this option. So, when I enable this, how many threads were used for this reference processing (in a ratio of ParallelGCThreads and ConcGCThreads)?

This is the strange part, and to me does not make sense. How id you determine that (1) there were lots of Reference objects (2) that a lot
of time was spent in Reference processing during remark? If you run with -XX:+PrintGCDetails -XX:+PrintReferenceGC for both the
cases (with and withouyt -XX:+ParallelRefProcEnabled), we would know what the relative numbers/times looked like and perhaps
get closer to what the issue might be here.



>From further information, in one application, 100k-200k references were left to the remark phase, and in  another they are 1.5m references.

May be this is the natural behaviour of your application? One thing we have found to help is if we mak good use of
the survivor spaces, then sometimes these probles will go away because Reference objects get processed
as they become eligible for processing in the young generation (therefore during minor GC's) and thus do not
impose a large monolithic overhead during the less frequent major collections.

-- ramki

Best,
-Nhan

From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com<mailto:ysr1729 at gmail.com>]
Sent: Thursday, December 08, 2011 2:44 AM
To: Dang Nhan Nguyen
Cc: hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>
Subject: Re: Strange GC behaviour,

Hi Nhan --

Make sure you have the recent fix to Reference object processing when doing
MT/Parallel marking with CMS; i don't have the CR handy, but the fix was pushed by
Stefan not long ago. (The workaround was to disable parallel concurrent marking via
-XX:-CMSConcurrentMTEnabled -- but of course the performance impact of that,
depending on yr platform, might compromise yr benchmarking results.)

More inline below:-
On Wed, Dec 7, 2011 at 2:20 PM, Dang Nhan Nguyen <nhann at chalmers.se<mailto:nhann at chalmers.se>> wrote:
Hi guys,

I am benchmarking CMS collector and found it strange that there are many objects left in remark phase.

When I check the issue again by only running foreground CMS (I changed the source code a bit to force CMS running in foreground), so there is no promotion during collecting  (so no objects marked during Preclean and Abortable Preclean phase). Even in this case, I found that there are still many objects left in remark phase (in lusearch and tomcat in Dacapo benchmark, they are 10 times and 1/3 of number of objects marked in concmarking phase, while in suflow, they are just about 100 objects/refs). Most (maybe ALL) of these objects marked in remark phase were processed by refProcessingWork(). In turn, this refProcessingWork() run in single thread mode. The performance was reduced a lot because of this.

-XX:+ParallelRefProcEnabled uses multiple threads to do that phase.


My questions are:
1. What kinds of references were processed by refProcessingWork()? weak and soft ref? what are exactly they are?

all objects of type java.lang.ref.Reference (includes WeakReferencem=, SoftReference, FinalReference, PhantomReference).

2. While I only run CMS in stop-the-world, why are there still many objects left to be processed by remark, in particular here is refProcessingWork()? Why aren't these references/objects processed in concurrent marking phase?

I am not sure I understand the question. What do you mean by "run CMS in stop-the-world"? Note that Reference objects are processed
by all kinds of collectors, young gen (minor) or whole heap (major), stop-world or concurrent. The volume of those objects may
vary a bit between them based on the heap size, NewRatio etc.

hope that helped.
-- ramki

Best,
-Nhan Nguyen


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20111208/aa31cdac/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cms_lusearch.log
Type: application/octet-stream
Size: 1219 bytes
Desc: cms_lusearch.log
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20111208/aa31cdac/cms_lusearch.log>
-------------- next part --------------
Warning: name cms_lusearch_100M_8.1.er is in use; changed to cms_lusearch_100M_8.4.er
WARNING: Linux-64-bit, 8 CPUs, Ubuntu_11.10 system "dark" is not supported by the Performance tools.
WARNING: Data collection may fail: system is not properly configured or is unsupported.
Creating experiment database ./lusearch/cms_lusearch_100M_8.4.er ...
Using scaled threading model. 8 processors detected, 8 threads used to drive the workload, in a possible range of [1,128]
===== DaCapo 9.12 lusearch starting warmup 1 =====
[GC Using java runtime at: ../openjdkjvm/openjdk-mer/build/linux-amd64/j2sdk-image/jre
[ParNew[SoftReference, 0 refs, 0.0000170 secs][WeakReference, 16 refs, 0.0000090 secs][FinalReference, 203 refs, 0.0002140 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001660 secs]: 7486K->895K(8384K), 0.0093490 secs] 7486K->2330K(101504K), 0.0093900 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
<<<<<<<<<<<<<<<<<<<
Many similar ParNew GCs
>>>>>>>>>>>>>>>>>>>>
[GC [ParNew[SoftReference, 0 refs, 0.0000170 secs][WeakReference, 8 refs, 0.0000090 secs][FinalReference, 8 refs, 0.0000100 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001730 secs]: 7979K->504K(8384K), 0.0012370 secs] 92503K->85096K(101504K), 0.0012720 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000280 secs][WeakReference, 6 refs, 0.0000090 secs][FinalReference, 15 refs, 0.0000080 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001720 secs]: 7992K->512K(8384K), 0.0010670 secs] 92584K->85151K(101504K), 0.0011050 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000180 secs][WeakReference, 6 refs, 0.0000070 secs][FinalReference, 15 refs, 0.0000070 secs][PhantomReference, 0 refs, 0.0000070 secs][JNI Weak Reference, 0.0001730 secs]: 7995K->425K(8384K), 0.0023690 secs] 92634K->85066K(101504K), 0.0024060 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000160 secs][WeakReference, 2 refs, 0.0000070 secs][FinalReference, 5 refs, 0.0000070 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001750 secs]: 7910K->356K(8384K), 0.0023160 secs][CMSForeground collection takes control 2
 (concurrent mode failure)[YG occupancy: 356 K (8384 K)][weak refs processing[SoftReference, 0 refs, 0.0000070 secs][WeakReference, 227 refs, 0.0000250 secs][FinalReference, 3085 refs, 0.0305350 secs][PhantomReference, 0 refs, 0.0000040 secs][JNI Weak Reference, 0.0001280 secs], 0.0307560 secs][Foreground_GCs: 1]
: 84869K->28067K(93120K), 0.0762010 secs] 92551K->28424K(101504K), [CMS Perm : 5635K->5635K(21248K)], 0.0785770 secs] [Times: user=0.10 sys=0.00, real=0.08 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000210 secs][WeakReference, 0 refs, 0.0000070 secs][FinalReference, 0 refs, 0.0000070 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001730 secs]: 7844K->120K(8384K), 0.0014750 secs] 35912K->28298K(101504K), 0.0015100 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
<<<<<<<<<<<<<<<<<<<
Many similar ParNew GCs
>>>>>>>>>>>>>>>>>>>>
[GC [ParNew[SoftReference, 0 refs, 0.0000160 secs][WeakReference, 2 refs, 0.0000080 secs][FinalReference, 1 refs, 0.0000060 secs][PhantomReference, 0 refs, 0.0000070 secs][JNI Weak Reference, 0.0001710 secs]: 7694K->211K(8384K), 0.0010980 secs] 92418K->84946K(101504K), 0.0011350 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000130 secs][WeakReference, 0 refs, 0.0000100 secs][FinalReference, 0 refs, 0.0000110 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001720 secs]: 7698K->102K(8384K), 0.0010090 secs][CMSForeground collection takes control 2
 (concurrent mode failure)[YG occupancy: 102 K (8384 K)][weak refs processing[SoftReference, 0 refs, 0.0000060 secs][WeakReference, 58 refs, 0.0000120 secs][FinalReference, 2124 refs, 0.0291600 secs][PhantomReference, 0 refs, 0.0000030 secs][JNI Weak Reference, 0.0001290 secs], 0.0293700 secs][Foreground_GCs: 2]
: 84842K->20152K(93120K), 0.0992390 secs] 92433K->20254K(101504K), [CMS Perm : 5653K->5653K(21248K)], 0.1003030 secs] [Times: user=0.12 sys=0.00, real=0.10 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000080 secs][WeakReference, 0 refs, 0.0000030 secs][FinalReference, 0 refs, 0.0000030 secs][PhantomReference, 0 refs, 0.0000030 secs][JNI Weak Reference, 0.0000970 secs]: 7590K->64K(8384K), 0.0008310 secs] 27742K->20218K(101504K), 0.0008570 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000110 secs][WeakReference, 0 refs, 0.0000050 secs][FinalReference, 0 refs, 0.0000030 secs][PhantomReference, 0 refs, 0.0000030 secs][JNI Weak Reference, 0.0000950 secs]: 7552K->47K(8384K), 0.0008380 secs] 27706K->20203K(101504K), 0.0008630 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000100 secs][WeakReference, 0 refs, 0.0000050 secs][FinalReference, 0 refs, 0.0000030 secs][PhantomReference, 0 refs, 0.0000030 secs][JNI Weak Reference, 0.0000940 secs]: 7523K->58K(8384K), 0.0007570 secs] 27680K->20216K(101504K), 0.0007820 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000090 secs][WeakReference, 0 refs, 0.0000030 secs][FinalReference, 0 refs, 0.0000030 secs][PhantomReference, 0 refs, 0.0000030 secs][JNI Weak Reference, 0.0001880 secs]: 7546K->56K(8384K), 0.0007010 secs] 27704K->20217K(101504K), 0.0007250 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 

<<<<<<<<<<<<<<<<<<<
Many similar ParNew GCs
>>>>>>>>>>>>>>>>>>>>

[GC [ParNew[SoftReference, 0 refs, 0.0000160 secs][WeakReference, 0 refs, 0.0000070 secs][FinalReference, 0 refs, 0.0000060 secs][PhantomReference, 0 refs, 0.0000070 secs][JNI Weak Reference, 0.0001730 secs]: 7496K->36K(8384K), 0.0010970 secs] 58709K->51250K(101504K), 0.0011290 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000170 secs][WeakReference, 0 refs, 0.0000060 secs][FinalReference, 0 refs, 0.0000060 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001730 secs]: 7500K->36K(8384K), 0.0011540 secs] 58714K->51252K(101504K), 0.0011900 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000140 secs][WeakReference, 0 refs, 0.0000060 secs][FinalReference, 0 refs, 0.0000070 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001740 secs]: 7522K->26K(8384K), 0.0007900 secs] 58739K->51253K(101504K), 0.0008250 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000160 secs][WeakReference, 0 refs, 0.0000070 secs][FinalReference, 0 refs, 0.0000060 secs][PhantomReference, 0 refs, 0.0000060 secs][JNI Weak Reference, 0.0001740 secs]: 7500K->18K(8384K), 0.0009330 secs] 58727K->51253K(101504K), 0.0009660 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [ParNew[SoftReference, 0 refs, 0.0000170 secs][WeakReference, 0 refs, 0.0000090 secs][FinalReference, 0 refs, 0.0000080 secs][PhantomReference, 0 refs, 0.0000070 secs][JNI Weak Reference, 0.0001700 secs]: 7488K->24K(8384K), 0.0009420 secs] 58724K->51261K(101504K), 0.0009750 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
128 query batches completed
===== DaCapo 9.12 lusearch PASSED in 5272 msec =====
Heap
 par new generation   total 8384K, used 4463K [0x00000000f4a00000, 0x00000000f5310000, 0x00000000f5310000)
  eden space 7488K,  59% used [0x00000000f4a00000, 0x00000000f4e55d50, 0x00000000f5150000)
  from space 896K,   2% used [0x00000000f5150000, 0x00000000f5156010, 0x00000000f5230000)
  to   space 896K,   0% used [0x00000000f5230000, 0x00000000f5230000, 0x00000000f5310000)
 concurrent mark-sweep generation total 93120K, used 51237K [0x00000000f5310000, 0x00000000fae00000, 0x00000000fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 5668K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)


More information about the hotspot-gc-dev mailing list