Effect of setting CPU quota on Java performance

Ashutosh Mehra1 asmehra1 at in.ibm.com
Tue Feb 6 10:13:00 UTC 2018


> Just curious, what would be the use case for running on more cores with
> less CPU quota per core?

> Are you trying to find out “can I maintain the same level of performance
> with lower power usage by plugging in more cores?”

> Or put it another way “can I buy power with cores?”

> I think that would be great, as that would mean your investment in the
> extra core can eventually be paid back by power savings.

Well, my experiment is only meant to check whether CPU quota, which is used
quite extensively in Kubernetes and Docker environments, has any impact on
the performance of applications (especially JVM-based ones).
Suppose someone who has been running their app on a 4-CPU system decides to
deploy it in a Kubernetes/Docker environment that uses CPU quota to
provide the requested "effective" CPUs: what kind of performance
issues can arise?
The fact that the performance issue I encountered has its roots in the kernel
CPU frequency governor for power saving is just plain luck! (Or should I
say bad luck!)
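
To make the notion of "effective" CPUs concrete, here is a minimal Python
sketch of how a CPU count could be derived from a cpuset size and a CFS
quota/period pair. The function name effective_cpus and its parameters are
made up for illustration. The 100000 us default period, the rounding up, and
the reading of "8 CPUs at 50%" as a total quota of 400000 us per period are
my assumptions, not something taken from a particular runtime.

```python
import math


def effective_cpus(cpuset_size, quota_us, period_us=100_000):
    """Return the CPU count a process is effectively limited to.

    cpuset_size: number of CPUs the process is bound to.
    quota_us:    CFS quota in microseconds per period; negative means no limit.
    period_us:   CFS period in microseconds (assumed default: 100000).
    """
    if quota_us is None or quota_us < 0:
        # No quota configured: only the cpuset binding limits the process.
        return cpuset_size
    # Quota expressed in whole CPUs, rounded up (an assumption of this sketch).
    quota_cpus = math.ceil(quota_us / period_us)
    return max(1, min(cpuset_size, quota_cpus))


# Bound to 4 CPUs with no quota.
print(effective_cpus(4, -1))
# Bound to 8 CPUs with a quota worth 50% of each CPU.
print(effective_cpus(8, 400_000))
```

Both calls describe the same amount of CPU time per period, which is the
sense in which the two setups are treated as equivalent here.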

Another (long-term) aim of this exercise is to understand how the JVM
performs in a Docker container, and whether there is any scope for
improvement there.
As part of that, I am checking the impact of CPU quota on various aspects
of the JVM.
GC pause time and JIT compilation time are two such quantities that I am
currently measuring while running the AcmeAir benchmark.

Ashutosh



From:   Ioi Lam <ioi.lam at oracle.com>
To:     Ashutosh Mehra1 <asmehra1 at in.ibm.com>
Cc:     jdk-dev at openjdk.java.net, Dinakar Guniguntala 
<Dinakar.G at in.ibm.com>
Date:   02/06/2018 02:44 PM
Subject:        Re: Effect of setting CPU quota on Java performance



Just curious, what would be the use case for running on more cores with 
less CPU quota per core?

Are you trying to find out “can I maintain the same level of performance 
with lower power usage by plugging in more cores?”

Or put it another way “can I buy power with cores?”

I think that would be great, as that would mean your investment in the 
extra core can eventually be paid back by power savings.

Ioi


On Feb 6, 2018, at 2:43 PM, Ashutosh Mehra1 <asmehra1 at in.ibm.com> wrote:

>> As a control test, maybe you can run a simple multi-threaded C benchmark
>> with the same settings?
> 
> That's a good point. I have also been testing with a simple multi-threaded
> C program for the past few days, and I observed a similar difference in
> performance between 4CPU at 100 and 8CPU at 50.
> In fact, I asked a question based on that multi-threaded C program here
> https://unix.stackexchange.com/questions/417506/what-is-the-effect-of-setting-cpu-cpu-quota-us-in-cpu-cgroup
> but didn't get any response.
> 
> Yesterday, while testing with that C program, we noticed that the CPU
> frequency in the 4CPU at 100 case was close to the maximum, but in the
> 8CPU at 50 case it was way below that.
> On further examination we found the system was using "powersave" as the
> kernel governor for CPU frequency.
> For a quick overview of the different kernel governors for CPU frequency,
> see this:
> https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
>
> When we changed the kernel governor to "performance", we saw much better
> performance with 8CPU at 50, very close to that of 4CPU at 100.
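
For anyone who wants to check which governor a test machine is using before
benchmarking, a minimal Python sketch follows. It reads the per-CPU
scaling_governor files that the Linux cpufreq subsystem exposes under sysfs.
The function name read_governors and its cpufreq_root parameter are
illustrative, and the sketch assumes the machine exposes cpufreq at all
(virtual machines and some containers do not).

```python
from collections import Counter
from pathlib import Path


def read_governors(cpufreq_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its scaling governor."""
    root = Path(cpufreq_root)
    governors = {}
    if not root.is_dir():
        return governors
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        cpu_name = gov_file.parent.parent.name
        try:
            governors[cpu_name] = gov_file.read_text().strip()
        except OSError:
            # CPU may be offline or the file unreadable; skip it.
            continue
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("No cpufreq governor information found on this system")
    for governor, count in Counter(found.values()).items():
        print(f"{governor}: {count} CPU(s)")
```

Seeing "powersave" on the CPUs used for the benchmark would be a hint that
frequency scaling could be skewing a quota comparison like the one above.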
> 
> I am going to repeat my experiment with the AcmeAir benchmark using the
> "performance" kernel governor.
> I expect this to help bridge the throughput gap I noticed in my
> earlier experiment.
> 
> - Ashutosh
> 
> 
> 
> From:   Ioi Lam <ioi.lam at oracle.com>
> To:     Ashutosh Mehra1 <asmehra1 at in.ibm.com>
> Cc:     jdk-dev at openjdk.java.net, Dinakar Guniguntala 
> <Dinakar.G at in.ibm.com>
> Date:   02/05/2018 11:47 PM
> Subject:        Re: Effect of setting CPU quota on Java performance
> 
> 
> 
> As a control test, maybe you can run a simple multi-threaded C benchmark
> with the same settings?
> 
>> On Feb 5, 2018, at 1:48 PM, Ashutosh Mehra1 <asmehra1 at in.ibm.com> wrote:
>> 
>> I have been trying to understand whether setting a CPU quota limit on a
>> Docker container, provided the "effective" CPUs are the same, has any
>> impact on the application performance.
>> As an example, if my app is running on 4 CPUs @ 100% quota, would I get
>> the same performance if my app is running on 8 CPUs at 50% quota? Note
>> that the number of "effective" CPUs is 4 in both cases.
>> 
>> Since OpenJDK early access builds for Java 10 have improved support for
>> Docker containers (https://bugs.openjdk.java.net/browse/JDK-8146115), I
>> decided to do some measurements using such a build.
>> I used the OpenJDK build jdk-10-ea+40 (from http://jdk.java.net/10/).
>> This build has container support enabled by default.
>> The system I used has 32 CPUs, counting 2 hyperthreads per core. I turned
>> off hyperthreading for this experiment. That leaves me with 16 cores on 2
>> sockets: 0-7 on the 1st socket and 8-15 on the 2nd socket.
>> System details are:
>> 
>> # lsb_release -a
>> No LSB modules are available.
>> Distributor ID:          Ubuntu
>> Description:             Ubuntu 16.04.2 LTS
>> Release:                 16.04
>> Codename:                xenial
>> 
>> # uname -r
>> 4.4.0-103-generic
>> 
>> For measurements I used the AcmeAir benchmark at
>> https://github.com/sabkrish/acmeair/tree/microservice_changes
>> I ran the AcmeAir benchmark with the said build for the following cases:
>> 
>> 1) Ran AcmeAir with the JVM bound to 4 CPUs (8-11) and no limit on quota.
>> Let's call this 4CPU at 100.
>> This used the following JVM settings:
>> CICompilerCount = 3, ParallelGCThreads = 4, ConcGCThreads = 1
>>
>> 2) Ran AcmeAir with the JVM bound to 8 CPUs (8-15) and 50% quota.
>> Let's call this 8CPU at 50.
>> In this case container support was enabled by default and it used the
>> following JVM settings:
>> CICompilerCount = 3, ParallelGCThreads = 4, ConcGCThreads = 1
>>
>> 3) Ran AcmeAir with the JVM bound to 8 CPUs (8-15) and 50% quota, with the
>> -XX:-UseContainerSupport option to disable container support. Let's call
>> this 8CPU at 50NoCS.
>> This used the following JVM settings:
>> CICompilerCount = 4, ParallelGCThreads = 8, ConcGCThreads = 2
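
The JVM settings listed for the three cases are consistent with the JVM
sizing its thread pools from the CPU count it believes it has: 4 with
container support, 8 without. The Python sketch below reproduces those
numbers. The formulas are my recollection of HotSpot's default heuristics
around JDK 10 (G1 and tiered compilation) and should be treated as
assumptions to verify against the sources of the JDK in use; the function
names are made up for illustration.

```python
def log2_floor(n):
    """Integer floor of log2(n) for n >= 1."""
    return n.bit_length() - 1


def parallel_gc_threads(ncpus):
    # Assumed heuristic: one thread per CPU up to 8, then 5/8 of the rest.
    return ncpus if ncpus <= 8 else 8 + (ncpus - 8) * 5 // 8


def conc_gc_threads(parallel_threads):
    # Assumed heuristic: roughly a quarter of the parallel GC threads.
    return max((parallel_threads + 2) // 4, 1)


def ci_compiler_count(ncpus):
    # Assumed heuristic: grows with log2(ncpus) * log2(log2(ncpus)).
    log_cpu = log2_floor(ncpus)
    loglog_cpu = log2_floor(max(log_cpu, 1))
    return max(log_cpu * loglog_cpu * 3 // 2, 2)


for ncpus in (4, 8):
    par = parallel_gc_threads(ncpus)
    print(ncpus, ci_compiler_count(ncpus), par, conc_gc_threads(par))
```

Under these assumptions the first two cases end up with identical settings,
while the third case sizes its pools for 8 CPUs even though the quota only
allows 4 CPUs' worth of run time.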
>> 
>> Load on the server was applied using JMeter, which was running on the same
>> box but bound to CPUs 0-8. I applied the load for a few minutes to warm up
>> the JVM before starting the final "measure" run.
>> Throughput reported below is for the final "measure" run. All numbers
>> reported below are an average of 10 iterations.
>> 
>> Throughput result:
>> 4CPU at 100  |  8CPU at 50  |  8CPU at 50NoCS
>> 9621.5       |  6970.6      |  7252.1
>>
>> I also measured total compilation time (in seconds) and total GC pause
>> time (in seconds) over the whole run of the server (which includes the
>> warm-up phase):
>>
>> Compilation time (seconds):
>> 4CPU at 100  |  8CPU at 50  |  8CPU at 50NoCS
>> 79.8545      |  76.7041     |  100.085
>>
>> GC pause time (seconds):
>> 4CPU at 100  |  8CPU at 50  |  8CPU at 50NoCS
>> 1.829        |  1.886       |  1.927
>> 
>> I am quite surprised to see the drop in throughput between the
>> 4CPU at 100 and 8CPU at 50 cases.
>> Looking deeper into the results, I do notice that the numbers for the
>> 8CPU at 50 case were not very consistent, but in general I can say they
>> do not match the 4CPU at 100 case.
>> I will be doing additional runs for this setup (and on different OS/kernel
>> versions) and increasing the warm-up time for the JVM to see if that
>> improves the consistency.
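
To put the throughput drop in relative terms, the small Python sketch below
computes the percentage difference of each quota case against the
4CPU at 100 baseline, using the throughput figures reported in this thread.
The helper name relative_drop is made up for illustration.

```python
def relative_drop(baseline, value):
    """Percentage by which value falls short of baseline."""
    return (baseline - value) / baseline * 100.0


# Throughput figures as reported above (average of 10 iterations).
throughput = {
    "4CPU at 100": 9621.5,
    "8CPU at 50": 6970.6,
    "8CPU at 50NoCS": 7252.1,
}

baseline = throughput["4CPU at 100"]
for case, value in throughput.items():
    print(f"{case}: {relative_drop(baseline, value):.1f}% below baseline")
```

Both quota cases come out roughly a quarter below the baseline, which is far
more than the differences seen in compilation time or GC pause time.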
>> 
>> Meanwhile, a couple of questions I wanted to put forward:
>> 1) Has anyone else noticed this kind of difference in Java 
>> application/JVM performance when CPU quota is used?
>> 2) What other open-source benchmarks are available that I can use to 
>> verify the behavior I am observing?
>> 
>> Any comments/feedback are welcome.
>> 
>> Thanks,
>> Ashutosh Mehra
>> 