Parallel GC thread priority

Vitaly Davidovich vitalyd at gmail.com
Tue Sep 11 11:29:09 UTC 2012


Hi Dmytro,

Thanks for experimenting with priorities.  So it's not surprising that
there would be scheduler interference when running lots of cpu hungry
processes/threads.  The time discrepancy in that case seems
reasonable/plausible.  But 1.2s of user + sys vs 30s of real seems,
intuitively, like too much to blame on scheduler interference alone.

I think your experiment does highlight that boosting gc thread priority
seems like a good thing (and it makes sense to me conceptually).  At the
very least I think there should be an XX option to control this behavior.

Thanks

Sent from my phone
On Sep 11, 2012 5:54 AM, "Dmytro Sheyko" <dmytro_sheyko at hotmail.com> wrote:

>  Hi,
>
> Here is a bit more details:
>
> {Heap before GC invocations=1468 (full 6):
> PSYoungGen      total 171904K, used 10352K [0x00000000eaab0000,
> 0x00000000f6940000, 0x0000000100000000)
>   eden space 161536K, 0% used
> [0x00000000eaab0000,0x00000000eaab0000,0x00000000f4870000)
>   from space 10368K, 99% used
> [0x00000000f5f20000,0x00000000f693c060,0x00000000f6940000)
>   to   space 16640K, 0% used
> [0x00000000f48c0000,0x00000000f48c0000,0x00000000f5900000)
> PSOldGen        total 408640K, used 403417K [0x00000000c0000000,
> 0x00000000d8f10000, 0x00000000eaab0000)
>   object space 408640K, 98% used
> [0x00000000c0000000,0x00000000d89f6478,0x00000000d8f10000)
> PSPermGen       total 83968K, used 76315K [0x00000000bae00000,
> 0x00000000c0000000, 0x00000000c0000000)
>   object space 83968K, 90% used
> [0x00000000bae00000,0x00000000bf886db0,0x00000000c0000000)
> 2012-03-05T00:03:06.936-0600: 130338.132: [Full GC [PSYoungGen:
> 10352K->0K(171904K)] [PSOldGen: 403417K->114637K(415872K)]
> 413769K->114637K(587776K) [PSPermGen: 76315K->76315K(83968K)], 30.2595731
> secs] [Times: user=0.77 sys=0.41, real=30.26 secs]
> Heap after GC invocations=1468 (full 6):
> PSYoungGen      total 171904K, used 0K [0x00000000eaab0000,
> 0x00000000f6940000, 0x0000000100000000)
>   eden space 161536K, 0% used
> [0x00000000eaab0000,0x00000000eaab0000,0x00000000f4870000)
>   from space 10368K, 0% used
> [0x00000000f5f20000,0x00000000f5f20000,0x00000000f6940000)
>   to   space 16640K, 0% used
> [0x00000000f48c0000,0x00000000f48c0000,0x00000000f5900000)
> PSOldGen        total 415872K, used 114637K [0x00000000c0000000,
> 0x00000000d9620000, 0x00000000eaab0000)
>   object space 415872K, 27% used
> [0x00000000c0000000,0x00000000c6ff3410,0x00000000d9620000)
> PSPermGen       total 83968K, used 76315K [0x00000000bae00000,
> 0x00000000c0000000, 0x00000000c0000000)
>   object space 83968K, 90% used
> [0x00000000bae00000,0x00000000bf886db0,0x00000000c0000000)
> }
>
> JVM is x64 Java 7u2.
> JVM options are (printed by -XX:+PrintCommandLineFlags):
> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=1073741824
> -XX:ParallelGCThreads=4 -XX:+PrintGC -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> -XX:+PrintJNIGCStalls -XX:+ReduceSignalUsage -XX:+UseCompressedOops
> -XX:+UseLargePagesIndividualAllocation -XX:+UseParallelGC
>
> This happened several months ago and many things have changed since then:
> we added 4G of RAM (8G -> 12G), tried other collectors (CMS and G1), tried 32
> bit, tried other Java 7 updates, etc. And currently it works pretty well. I
> doubt I can experiment with our CI server to find the real cause
> of the long GC pause that happened before; however, please let me know what
> options should be set that would help us prove that swapping is to blame
> for such a discrepancy between CPU time and wall time, if it happens again. I
> assume that the issue is not specific to PS and that it can be observed with CMS
> and G1 also.
>
> Well, I had almost forgotten about this issue, but I noticed recently (while
> analyzing jstack output) that Parallel GC threads run with NORM priority, while
> G1 and CMS use MAX priority. This seemed suspicious to me.
> If it is intentional, what is the reason for it?
>
> Recently I wrote a simple test (attached) to find out how other
> cpu-intensive applications impact GC pauses.
> The test does the following:
> 1. figures out the Parallel GC threads and their native ids (using jstack)
> 2. raises their native priority from NORMAL(0) to HIGH(2) (using JNA)
> (optional)
> 3. starts a java application (Disturber) that simply spins in an endless loop
> on several threads (100, configurable) just to consume cpu time.
> 4. generates garbage to provoke GC.
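The Disturber in step 3 is essentially N threads busy-spinning until a deadline. The attachment isn't reproduced in the archive, so the sketch below is a guess at its rough shape; the class name and system properties mirror the ones mentioned in the mail, but the code itself is illustrative, not the actual attachment.

```java
// Illustrative sketch of the "Disturber" from step 3: spin busy loops on
// N threads for a fixed duration just to consume CPU time. Written in
// Java 7 style (anonymous Runnable) to match the era of the thread.
public class Disturber {

    public static void burn(int threadCount, long durationMs) throws InterruptedException {
        final long deadline = System.currentTimeMillis() + durationMs;
        Thread[] spinners = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            spinners[i] = new Thread(new Runnable() {
                public void run() {
                    // Busy-wait until the deadline; the loop does no useful work.
                    while (System.currentTimeMillis() < deadline) { }
                }
            });
            spinners[i].start();
        }
        for (Thread t : spinners) {
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Read thread count and duration from the -D properties named in the mail.
        int threads = Integer.getInteger("disturber.threads", 100);
        long duration = Long.getLong("duration.ms", 100000L);
        burn(threads, duration);
    }
}
```

Running this alongside a garbage-generating workload reproduces the contention: the spinners keep all cores busy, so NORM-priority GC workers have to compete for CPU during a pause.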
>
> Used the following options:
> -Xmx1024m -Xms256m
> -server
> -showversion
> -XX:+PrintGCDetails
> -XX:+UseParallelGC
> -XX:+UseParallelOldGC
> -Ddisturber.java.priority=5
> -Dduration.ms=100000
> -Ddisturber.threads=100
> -Dgc.native.priority=0
>
>
> Got the following results:
>
> 1. The test is run without any disturber threads (-Ddisturber.threads=0).
> Priority of GC threads does not matter. Typical GC log looks like this:
> [GC [PSYoungGen: 312502K->0K(349440K)] 898885K->625445K(1048512K),
> 0.0858486 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 937947K->664508K(1048512K),
> 0.0855026 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
> [Full GC [PSYoungGen: 0K->0K(349440K)] [ParOldGen:
> 664508K->78570K(699072K)] 664508K->78570K(1048512K) [PSPermGen:
> 3065K->3065K(16384K)], 0.4356773 secs] [Times: user=0.80 sys=0.00,
> real=0.44 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 391072K->117633K(1048512K),
> 0.0829569 secs] [Times: user=0.13 sys=0.02, real=0.08 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 430135K->156695K(1048512K),
> 0.0831900 secs] [Times: user=0.19 sys=0.00, real=0.08 secs]
>
> 2. The test is run with 100 disturber threads (-Ddisturber.threads=100).
> Priority of GC threads remains NORM(0) (-Dgc.native.priority=0). Typical GC
> log looks like this:
> [GC [PSYoungGen: 312502K->0K(349440K)] 664510K->391070K(814976K),
> 1.9201232 secs] [Times: user=0.13 sys=0.02, real=1.92 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 703572K->430133K(814976K),
> 1.9581938 secs] [Times: user=0.16 sys=0.05, real=1.96 secs]
> [Full GC [PSYoungGen: 0K->0K(349440K)] [ParOldGen:
> 430133K->78570K(490368K)] 430133K->78570K(839808K) [PSPermGen:
> 3065K->3065K(16384K)], 8.0233189 secs] [Times: user=0.41 sys=0.00,
> real=8.02 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 391072K->117633K(839808K),
> 2.1210901 secs] [Times: user=0.13 sys=0.02, real=2.12 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 430135K->156695K(839808K),
> 2.1864928 secs] [Times: user=0.13 sys=0.02, real=2.19 secs]
>
> 3. The test is run with 100 disturber threads (-Ddisturber.threads=100).
> Priority of GC threads is raised to HIGH(2) (-Dgc.native.priority=2).
> Typical GC log looks like this:
> [GC [PSYoungGen: 312502K->0K(349440K)] 898885K->625445K(1048512K),
> 0.0824847 secs] [Times: user=0.16 sys=0.00, real=0.08 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 937947K->664508K(1048512K),
> 0.0823295 secs] [Times: user=0.17 sys=0.00, real=0.08 secs]
> [Full GC [PSYoungGen: 0K->0K(349440K)] [ParOldGen:
> 664508K->78570K(699072K)] 664508K->78570K(1048512K) [PSPermGen:
> 3065K->3065K(16384K)], 0.5145061 secs] [Times: user=0.65 sys=0.00,
> real=0.51 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 391072K->117633K(1048512K),
> 0.0793123 secs] [Times: user=0.16 sys=0.00, real=0.08 secs]
> [GC [PSYoungGen: 312502K->0K(349440K)] 430135K->156695K(1048512K),
> 0.0809080 secs] [Times: user=0.16 sys=0.00, real=0.08 secs]
>
> Here I can see that there is a noticeable difference between cpu and wall
> time of GC pauses in case #2 (when GC threads run with NORM priority and
> there are cpu-intensive applications).
>
> I believe that Parallel GC threads should run with HIGH priority (or at
> least with the highest priority of the app threads they block), because otherwise
> there is a form of priority inversion: GC threads block all application
> threads, even those that run with HIGH priority, yet they themselves run
> with NORM priority and can simply be preempted by threads of other apps
> that run with ABOVE_NORM or NORM priority.
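The inversion is one-sided because Java code can raise the priority of its own threads but has no API for touching the VM-internal GC workers. A minimal sketch of the application side (class name is illustrative):

```java
// Application threads can be boosted from Java code, but HotSpot's internal
// ParallelGC worker threads expose no such API from the application, so even
// a MAX_PRIORITY app thread still waits on NORM-priority GC workers during
// a stop-the-world pause.
public class PriorityDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                // pretend this is latency-sensitive work
            }
        });
        worker.setPriority(Thread.MAX_PRIORITY); // 10; GC workers stay at NORM
        worker.start();
        worker.join();
    }
}
```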
>
> Thanks,
> Dmytro
>
> ------------------------------
> Date: Mon, 10 Sep 2012 17:22:51 +0100
> Subject: Re: Parallel GC thread priority
> From: ysr1729 at gmail.com
> To: vitalyd at gmail.com
> CC: dmytro_sheyko at hotmail.com; hotspot-gc-dev at openjdk.java.net
>
> I agree that eliminating paging as a suspect is a good idea. Although the
> stats that Dmytro provided below seem to show plenty of free RAM, perhaps
> that info was collected at a different time than when the long pause was
> observed?
>
> Like Vitaly, I am suspicious of paging in such extreme cases.
>
> What's the version of JVM you are running? (In such cases running with
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps would have been great as it
> just might have provided a couple more clues -- or not.)
> (There were some older versions of the JVM where there were issues with
> thread suspension that would cause long pauses, but I believe the GC log
> symptom of that was somewhat different.)
>
> By the way, since this is a 4 core system, you should be running
> -XX:+UseParallelOldGC.
>
> Note, for that reason, that "parallel gc thread" priority is moot for this
> case; it's really "(serial) GC thread" priority, since there's a single GC
> thread running your full collection below. (Although your minor collections
> are indeed run multi-threaded.)
>
> -- ramki
>
> On Fri, Sep 7, 2012 at 5:53 PM, Vitaly Davidovich <vitalyd at gmail.com>wrote:
>
> I see you're on Windows, but on Linux I'm almost certain that time spent
> waiting for i/o does not count in sys time (sys is purely CPU time).  I
> suspect that Windows does similar accounting, but I'm not sure.
> Sent from my phone
> On Sep 7, 2012 12:31 PM, "Dmytro Sheyko" <dmytro_sheyko at hotmail.com>
> wrote:
>
>  Hi,
>
> I can provide following info only:
> ===
> System:
>  Microsoft Windows Server 2003 R2
>  Standard x64 Edition
>  Service Pack 2
> Computer:
>  Intel(R) Xeon(R) CPU E5-2640
>  2.50 GHz, 12.0 GB of RAM
> ===
> JConsole output:
>
> Operating System:       Windows 2003 5.2
> Architecture:           amd64
> Number of processors:   4
>
> Total physical memory:  12,582,100 kbytes
> Free physical memory:    6,184,012 kbytes
> Total swap space:       14,137,188 kbytes
> Free swap space:        11,189,520 kbytes
> ===
> But at the time this long GC pause happened, the machine had 8 GB of RAM. And I
> don't know how much memory was used at that time.
>
> Please note that it's a continuous integration server. Some build tasks are
> executed on the same machine. The pause happened on the web server, which
> serves the UI.
>
> BTW, isn't swapping counted as sys/kernel time?
>
> Thanks,
> Dmytro
>
>
> ------------------------------
> Date: Fri, 7 Sep 2012 11:45:34 -0400
> Subject: Re: Parallel GC thread priority
> From: vitalyd at gmail.com
> To: azeem.jiva at oracle.com
> CC: dmytro_sheyko at hotmail.com; hotspot-gc-dev at openjdk.java.net
>
> Whether it helps or not would depend on what priority the other threads
> are running at.  If the server had several jvms running at the same time
> and they all start a par new collection at about the same time, then yeah
> it won't help if priority is boosted - at that point, the machine is simply
> oversubscribed.
> In Dmytro's case, the wall time is so out of whack with CPU time that I
> wonder if it wasn't swapping that mostly contributed to this.  For such a
> relatively small heap, I can't imagine how much other stuff would have to
> run to inflate the time so much.
> Dmytro, what hardware spec was this on, out of curiosity?
> Sent from my phone
> On Sep 7, 2012 11:39 AM, "Azeem Jiva" <azeem.jiva at oracle.com> wrote:
>
>  I had not thought about other processes, which is a possibility.
> Although raising the priority of the GC threads, which I believe is what
> Dmytro was suggesting, won't help.
>
>
> On 09/07/2012 10:35 AM, Vitaly Davidovich wrote:
>
> You can have other threads from different processes in the system
> competing though.
> However, such a large wall time vs CPU time can also be caused by heavy
> swapping on a slow disk.  The heap there doesn't look all that big though
> ...
> Sent from my phone
> On Sep 7, 2012 11:21 AM, "Azeem Jiva" <azeem.jiva at oracle.com> wrote:
>
>  The Parallel collector is a stop-the-world collector.  The Java threads
> are suspended until the GC finishes.  I think your survivor spaces may be
> misconfigured, and that's why you're seeing such large GC times.
>
>
> On 09/07/2012 10:17 AM, Dmytro Sheyko wrote:
>
>  Hi,
>
> I can see that Parallel GC works on threads with NormPriority, while CMS
> and G1 threads run with NearMaxPriority. This is probably not an issue if the
> java application runs alone, but some time ago I observed a GC log like this
> (it was Jenkins CI on Windows):
>
> [Full GC [PSYoungGen: 10352K->0K(171904K)] [PSOldGen:
> 403417K->114637K(415872K)] 413769K->114637K(587776K) [PSPermGen:
> 76315K->76315K(83968K)], 30.2595731 secs] [Times: user=0.77 sys=0.41,
> real=30.26 secs]
>
> Although the cpu time for GC was just 1.18 sec (= 0.77 + 0.41), the real time
> was 30.26 sec! It seems to me that the system was busy at that time and the
> GC threads were starving.
>
> If we could raise the priority of Parallel GC threads, other applications
> would have less impact on GC duration.
>
> What do you think?
>
> Thanks,
> Dmytro
>
>
> --
> Azeem Jiva
> @javawithjiva
>
>
>
>
>


More information about the hotspot-gc-dev mailing list