Parallel GC thread priority

Dmytro Sheyko dmytro_sheyko at hotmail.com
Tue Sep 11 09:52:27 UTC 2012


Hi,

Here are a few more details:

{Heap before GC invocations=1468 (full 6):
PSYoungGen      total 171904K, used 10352K [0x00000000eaab0000, 0x00000000f6940000, 0x0000000100000000)
  eden space 161536K, 0% used [0x00000000eaab0000,0x00000000eaab0000,0x00000000f4870000)
  from space 10368K, 99% used [0x00000000f5f20000,0x00000000f693c060,0x00000000f6940000)
  to   space 16640K, 0% used [0x00000000f48c0000,0x00000000f48c0000,0x00000000f5900000)
PSOldGen        total 408640K, used 403417K [0x00000000c0000000, 0x00000000d8f10000, 0x00000000eaab0000)
  object space 408640K, 98% used [0x00000000c0000000,0x00000000d89f6478,0x00000000d8f10000)
PSPermGen       total 83968K, used 76315K [0x00000000bae00000, 0x00000000c0000000, 0x00000000c0000000)
  object space 83968K, 90% used [0x00000000bae00000,0x00000000bf886db0,0x00000000c0000000)
2012-03-05T00:03:06.936-0600: 130338.132: [Full GC [PSYoungGen: 10352K->0K(171904K)] [PSOldGen: 403417K->114637K(415872K)] 413769K->114637K(587776K) [PSPermGen: 76315K->76315K(83968K)], 30.2595731 secs] [Times: user=0.77 sys=0.41, real=30.26 secs]
Heap after GC invocations=1468 (full 6):
PSYoungGen      total 171904K, used 0K [0x00000000eaab0000, 0x00000000f6940000, 0x0000000100000000)
  eden space 161536K, 0% used [0x00000000eaab0000,0x00000000eaab0000,0x00000000f4870000)
  from space 10368K, 0% used [0x00000000f5f20000,0x00000000f5f20000,0x00000000f6940000)
  to   space 16640K, 0% used [0x00000000f48c0000,0x00000000f48c0000,0x00000000f5900000)
PSOldGen        total 415872K, used 114637K [0x00000000c0000000, 0x00000000d9620000, 0x00000000eaab0000)
  object space 415872K, 27% used [0x00000000c0000000,0x00000000c6ff3410,0x00000000d9620000)
PSPermGen       total 83968K, used 76315K [0x00000000bae00000, 0x00000000c0000000, 0x00000000c0000000)
  object space 83968K, 90% used [0x00000000bae00000,0x00000000bf886db0,0x00000000c0000000)
}

JVM is x64 Java 7u2.
JVM options are (printed by -XX:+PrintCommandLineFlags):
-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=1073741824 -XX:ParallelGCThreads=4 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintJNIGCStalls -XX:+ReduceSignalUsage -XX:+UseCompressedOops -XX:+UseLargePagesIndividualAllocation -XX:+UseParallelGC

This happened several months ago, and many things have changed since then: we added 4 GB of RAM (8 GB -> 12 GB), tried other collectors (CMS and G1), tried 32-bit, tried other Java 7 updates, etc. Currently it works pretty well. I doubt I can experiment with our CI server to find the real cause of the long GC pause that happened before, but please let me know which options should be set to help us prove that swapping is responsible for such a discrepancy between CPU time and wall time if it happens again. I assume the issue is not specific to Parallel Scavenge and can be observed with CMS and G1 as well.

Well, I had almost forgotten about this issue, but I noticed recently (while analyzing jstack output) that Parallel GC threads run at NORM priority, while G1 and CMS threads use MAX priority. This seemed suspicious to me.
If it is intentional, what is the reason for it?

Recently I wrote a simple test (attached) to find out how other CPU-intensive applications impact GC pauses.
The test does the following:
1. figures out the Parallel GC threads and their native ids (using jstack)
2. raises their native priority from NORMAL(0) to HIGH(2) (using JNA) (optional)
3. starts a Java application (Disturber) that simply spins an endless loop in several threads (100, configurable) just to consume CPU time
4. generates garbage to provoke GC
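Step 1 is implemented in the attached Main.java; at its core it is just a regular expression over jstack's thread dump lines. A minimal standalone sketch (the sample line below is hypothetical, but follows HotSpot's jstack output format):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JstackParseDemo {
    // GC worker thread lines look like: "GC task thread#0 (ParallelGC)" ... nid=0x1a2b ...
    static final Pattern GC_THREAD =
            Pattern.compile("\".*\\(ParallelGC\\)\".* nid=(\\w+).*");

    public static void main(String[] args) {
        String sample = "\"GC task thread#0 (ParallelGC)\" prio=5 tid=0x0052c800 nid=0x1a2b runnable";
        Matcher m = GC_THREAD.matcher(sample);
        if (m.matches()) {
            // nid is printed in hex; Integer.decode handles the 0x prefix
            System.out.println(Integer.decode(m.group(1))); // 0x1a2b -> 6699
        }
    }
}
```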

I used the following options:
-Xmx1024m -Xms256m
-server
-showversion
-XX:+PrintGCDetails
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-Ddisturber.java.priority=5
-Dduration.ms=100000
-Ddisturber.threads=100
-Dgc.native.priority=0


I got the following results:

1. The test is run without any disturber threads (-Ddisturber.threads=0). The priority of the GC threads does not matter in this case. A typical GC log looks like this:
[GC [PSYoungGen: 312502K->0K(349440K)] 898885K->625445K(1048512K), 0.0858486 secs] [Times: user=0.19 sys=0.00, real=0.09 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 937947K->664508K(1048512K), 0.0855026 secs] [Times: user=0.19 sys=0.00, real=0.09 secs] 
[Full GC [PSYoungGen: 0K->0K(349440K)] [ParOldGen: 664508K->78570K(699072K)] 664508K->78570K(1048512K) [PSPermGen: 3065K->3065K(16384K)], 0.4356773 secs] [Times: user=0.80 sys=0.00, real=0.44 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 391072K->117633K(1048512K), 0.0829569 secs] [Times: user=0.13 sys=0.02, real=0.08 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 430135K->156695K(1048512K), 0.0831900 secs] [Times: user=0.19 sys=0.00, real=0.08 secs] 

2. The test is run with 100 disturber threads (-Ddisturber.threads=100). The priority of the GC threads remains NORM(0) (-Dgc.native.priority=0). A typical GC log looks like this:
[GC [PSYoungGen: 312502K->0K(349440K)] 664510K->391070K(814976K), 1.9201232 secs] [Times: user=0.13 sys=0.02, real=1.92 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 703572K->430133K(814976K), 1.9581938 secs] [Times: user=0.16 sys=0.05, real=1.96 secs] 
[Full GC [PSYoungGen: 0K->0K(349440K)] [ParOldGen: 430133K->78570K(490368K)] 430133K->78570K(839808K) [PSPermGen: 3065K->3065K(16384K)], 8.0233189 secs] [Times: user=0.41 sys=0.00, real=8.02 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 391072K->117633K(839808K), 2.1210901 secs] [Times: user=0.13 sys=0.02, real=2.12 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 430135K->156695K(839808K), 2.1864928 secs] [Times: user=0.13 sys=0.02, real=2.19 secs] 

3. The test is run with 100 disturber threads (-Ddisturber.threads=100). The priority of the GC threads is raised to HIGH(2) (-Dgc.native.priority=2). A typical GC log looks like this:
[GC [PSYoungGen: 312502K->0K(349440K)] 898885K->625445K(1048512K), 0.0824847 secs] [Times: user=0.16 sys=0.00, real=0.08 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 937947K->664508K(1048512K), 0.0823295 secs] [Times: user=0.17 sys=0.00, real=0.08 secs] 
[Full GC [PSYoungGen: 0K->0K(349440K)] [ParOldGen: 664508K->78570K(699072K)] 664508K->78570K(1048512K) [PSPermGen: 3065K->3065K(16384K)], 0.5145061 secs] [Times: user=0.65 sys=0.00, real=0.51 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 391072K->117633K(1048512K), 0.0793123 secs] [Times: user=0.16 sys=0.00, real=0.08 secs] 
[GC [PSYoungGen: 312502K->0K(349440K)] 430135K->156695K(1048512K), 0.0809080 secs] [Times: user=0.16 sys=0.00, real=0.08 secs] 

Here I can see a noticeable difference between CPU time and wall time of GC pauses in case #2 (when the GC threads run at NORM priority and there are CPU-intensive applications).

I believe that Parallel GC threads should run at HIGH priority (or at least at the highest priority of the application threads they block), because otherwise there is a sort of priority inversion: GC threads block all application threads, even those running at HIGH priority, while they themselves run at NORM priority and can simply be preempted by threads of other applications running at ABOVE_NORM or NORM priority.
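To make the Java-side terminology concrete: Java thread priorities range from Thread.MIN_PRIORITY (1) through NORM_PRIORITY (5) to MAX_PRIORITY (10), and a new thread inherits the priority of its creator; the JVM then maps these onto OS priorities. A tiny illustration (not part of the attached test):

```java
public class PriorityDemo {
    public static void main(String[] args) {
        Thread t = new Thread(new Runnable() {
            public void run() { /* no-op */ }
        });
        // A new thread inherits the creating thread's priority (main runs at NORM = 5)
        System.out.println(t.getPriority()); // 5
        t.setPriority(Thread.MAX_PRIORITY);
        System.out.println(t.getPriority()); // 10
    }
}
```

Note that setPriority only adjusts the Java-level priority; the mapping to native priorities is up to the JVM, which is why the attached test sets the Windows native priority directly via JNA.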

Thanks,
Dmytro

Date: Mon, 10 Sep 2012 17:22:51 +0100
Subject: Re: Parallel GC thread priority
From: ysr1729 at gmail.com
To: vitalyd at gmail.com
CC: dmytro_sheyko at hotmail.com; hotspot-gc-dev at openjdk.java.net

I agree that eliminating paging as a suspect is a good idea. Although the stats that Dmytro provided below seem to show plenty of free RAM, perhaps that info was collected at a different time than when the long pause was observed?


Like Vitaly, I am suspicious of paging in such extreme cases.

What's the version of JVM you are running? (In such cases running with -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps would have been great as it just might have provided a couple more clues -- or not.)

(There were some older versions of the JVM where there were issues with thread suspension that would cause long pauses, but I believe the GC log symptom of that was somewhat different.)

By the way, since this is a 4-core system, you should be running -XX:+UseParallelOldGC.


Note, for that reason, that "parallel gc thread" priority is moot for this case: it's really "(serial) GC thread" priority, since there's a single GC thread running your full collection below. (Although your minor collections are indeed run multi-threaded.)


-- ramki

On Fri, Sep 7, 2012 at 5:53 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

I see you're on Windows, but on Linux I'm almost certain that time spent waiting for I/O does not count toward sys time (sys is purely CPU time). I suspect that Windows does similar accounting, but I'm not sure.


Sent from my phone
On Sep 7, 2012 12:31 PM, "Dmytro Sheyko" <dmytro_sheyko at hotmail.com> wrote:

Hi,

I can provide only the following info:
===
System:
 Microsoft Windows Server 2003 R2
 Standard x64 Edition
 Service Pack 2
Computer:
 Intel(R) Xeon(R) CPU E5-2640
 2.50 GHz, 12.0 GB of RAM
===
JConsole output:

Operating System:       Windows 2003 5.2 
Architecture:           amd64 
Number of processors:   4 

Total physical memory:  12,582,100 kbytes 
Free physical memory:    6,184,012 kbytes

Total swap space:       14,137,188 kbytes 
Free swap space:        11,189,520 kbytes 
===
But at the time this long GC pause happened, the machine had 8 GB of RAM, and I don't know how much memory was in use then.

Please note that it is a continuous integration server; some build tasks are executed on the same machine. The pause happened in the web server, which serves the UI.

BTW, isn't swapping counted as sys/kernel time?

Thanks,
Dmytro


Date: Fri, 7 Sep 2012 11:45:34 -0400
Subject: Re: Parallel GC thread priority
From: vitalyd at gmail.com
To: azeem.jiva at oracle.com
CC: dmytro_sheyko at hotmail.com; hotspot-gc-dev at openjdk.java.net

Whether it helps or not would depend on what priority the other threads are running at. If the server had several JVMs running at the same time and they all started a par-new collection at about the same time, then yeah, it won't help if priority is boosted; at that point, the machine is simply oversubscribed.

In Dmytro's case, the wall time is so out of whack with the CPU time that I wonder whether it wasn't swapping that mostly contributed to this. For such a relatively small heap, I can't imagine how much other stuff would have to run to inflate the time so much.

Dmytro, what hardware spec was this on, out of curiosity? 

Sent from my phone

On Sep 7, 2012 11:39 AM, "Azeem Jiva" <azeem.jiva at oracle.com> wrote:

I had not thought about other processes, which is a possibility. Although raising the priority of the GC threads won't help, which I believe is what Dmytro was suggesting.

On 09/07/2012 10:35 AM, Vitaly Davidovich wrote:

You can have other threads from different processes in the system competing though.

However, such a large wall time vs CPU time can also be caused by heavy swapping on a slow disk. The heap there doesn't look all that big though...

Sent from my phone

On Sep 7, 2012 11:21 AM, "Azeem Jiva" <azeem.jiva at oracle.com> wrote:

        
The Parallel collector is a stop-the-world collector. The Java threads are suspended until the GC finishes. I think your survivor spaces may be misconfigured, and that's why you're seeing such large GC times.

            On 09/07/2012 10:17 AM, Dmytro Sheyko wrote:

            
            
Hi,

I can see that Parallel GC works on threads with NormPriority, while CMS and G1 threads run with NearMaxPriority. This is probably not an issue if the Java application runs alone, but some time ago I observed a GC log like this (it was Jenkins CI on Windows):

                

[Full GC [PSYoungGen: 10352K->0K(171904K)] [PSOldGen: 403417K->114637K(415872K)] 413769K->114637K(587776K) [PSPermGen: 76315K->76315K(83968K)], 30.2595731 secs] [Times: user=0.77 sys=0.41, real=30.26 secs]

Despite the CPU time for GC being just 1.18 sec (= 0.77 + 0.41), the real time was 30.26 sec! It seems to me that the system was busy at that time and the GC threads were starving.

If we could raise the priority of the Parallel GC threads, other applications would have less impact on GC duration.

What do you think?

                Thanks,

Dmytro

-- 
Azeem Jiva
@javawithjiva

-------------- next part --------------
package test;

import java.io.IOException;

// Spawns N daemon threads that busy-spin at the given Java priority,
// purely to consume CPU time while the main test runs.
public class Disturber implements Runnable {
	public static void main(String[] args) throws IOException {
		int n = Integer.getInteger("threads", 100);
		int p = Integer.getInteger("priority", Thread.NORM_PRIORITY);
		Disturber d = new Disturber();
		for (int i = 0; i < n; ++i) {
			Thread t = new Thread(d, "T" + i);
			t.setDaemon(true);
			t.setPriority(p);
			t.start();
		}
		Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
		System.in.read();
	}
	public void run() {
		while (true);
	}
}
-------------- next part --------------
package test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.win32.StdCallLibrary;

public class Main {
	public interface Kernel32 extends StdCallLibrary {
		Kernel32 INSTANCE = (Kernel32) Native.loadLibrary("kernel32", Kernel32.class);

		int GetCurrentProcessId();
		int GetThreadPriority(Pointer hThread);
		int SetThreadPriority(Pointer hThread, int nPriority);
		Pointer OpenThread(int dwDesiredAccess, int bInheritHandle, int dwThreadId);
		int CloseHandle(Pointer hObject);
	}

	public static void main(String[] args) throws IOException, InterruptedException {
		List<Integer> ids = getGcThreadIds(new ArrayList<Integer>());
		System.out.println(ids);
		setNativeThreadPriority(ids, Integer.getInteger("gc.native.priority", 2));
		Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
		Process disturber = new ProcessBuilder(System.getProperty("java.home") + "/bin/java",
				"-Dthreads=" + System.getProperty("disturber.threads", "100"),
				"-Dpriority=" + System.getProperty("disturber.java.priority", "" + Thread.NORM_PRIORITY),
				"-classpath", System.getProperty("java.class.path"), "test.Disturber").start();
		generateGarbage(Long.getLong("duration.ms", 100_000));
		System.out.println("done");
		disturber.destroy();
	}

	static void generateGarbage(long duration) {
		long deadline = System.currentTimeMillis() + duration;
		SoftReference<Object[]> ref = new SoftReference<Object[]>(null);
		while (System.currentTimeMillis() < deadline) {
			Object[] obj = new Object[10000000];
			Object[] prv = ref.get();
			if (prv == null) {
				ref = new SoftReference<Object[]>(obj);
			} else {
				Arrays.fill(prv, obj);
			}
		}
	}

	static <C extends Collection<Integer>> C getGcThreadIds(C col) throws IOException {
		// Match jstack lines for (ParallelGC) worker threads, capturing the native thread id (nid)
		Pattern pattern = Pattern
				.compile("\\\".*\\(ParallelGC\\)\\\".* nid=(\\w+).*");
		String jstack = System.getProperty("java.home") + "/../bin/jstack";
		Process process = new ProcessBuilder(jstack, String.valueOf(Kernel32.INSTANCE.GetCurrentProcessId())).redirectErrorStream(true).start();
		BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
		try {
			String line;
			while((line = reader.readLine()) != null) {
				Matcher matcher = pattern.matcher(line);
				if (matcher.matches()) {
					System.out.println(line);
					col.add(Integer.decode(matcher.group(1)));
				}
			}
		} finally {
			reader.close();
		}
		return col;
	}

	static void setNativeThreadPriority(Collection<Integer> ids, int native_prio) {
		for (Integer id : ids) {
			// 96 = THREAD_QUERY_INFORMATION (0x40) | THREAD_SET_INFORMATION (0x20)
			Pointer hThread = Kernel32.INSTANCE.OpenThread(96, 0, id);
			if (hThread != null) {
				try {
					System.out.printf("%x: %d", id, Kernel32.INSTANCE.GetThreadPriority(hThread));
					if (Kernel32.INSTANCE.SetThreadPriority(hThread, native_prio) == 0) {
						throw new RuntimeException("SetThreadPriority(" + id + "," + native_prio + ")");
					}
					System.out.printf(" -> %d\n", Kernel32.INSTANCE.GetThreadPriority(hThread));
				} finally {
					Kernel32.INSTANCE.CloseHandle(hThread);
				}
			} else {
				throw new RuntimeException("OpenThread(" + id + ")");
			}
		}
	}
}

