Effect of setting CPU quota on Java performance
David Holmes
david.holmes at oracle.com
Tue Feb 6 22:19:21 UTC 2018
Just to add, as I've stated in the past when our container support for
cpu quotas was being put in, this is a really difficult area. Quotas are
deceptively simple, but the devil is in the details: you need to know
_exactly_ how a system will implement quotas before you can
appropriately size your application in terms of number of threads.
For example, if there is a 12 processor system and you request a 50% quota
then what do you get? You know that over time you will get 50% of the
cpu resources, but that's not sufficient. You might get 6 processors at
100%, or 12 processors at 50%. To take advantage of 12 processors you need
12 threads, whereas if you have only 6 processors you only want 6 threads.
So what do you do? You need to know these details.
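
To make the arithmetic concrete, here is a minimal sketch in Python (not
JVM code). It assumes a CFS-style quota expressed as quota_us per
period_us, and it assumes the quota is turned into a processor count by
rounding quota/period up and capping it at the number of bound CPUs. The
function names and that rounding rule are illustrative assumptions, not a
description of what any particular JVM or kernel actually does.

```python
import math


def effective_cpus(bound_cpus: int, quota_us: int, period_us: int) -> int:
    """Whole CPUs' worth of time implied by a quota, capped at the bound CPUs.

    Assumption: a non-positive quota means "no limit".
    """
    if quota_us <= 0:
        return bound_cpus
    return min(bound_cpus, math.ceil(quota_us / period_us))


def average_share_per_thread(bound_cpus: int, quota_cpus: float, threads: int) -> float:
    """Average fraction of one CPU that each always-runnable thread can expect."""
    budget = min(quota_cpus, bound_cpus)  # CPU time available per unit of wall time
    return min(1.0, budget / threads)


if __name__ == "__main__":
    # 12-processor system with a 50% quota: six CPUs' worth of time per period.
    print(effective_cpus(12, 600_000, 100_000))
    # Six threads can each run flat out; twelve threads average half a CPU each.
    print(average_share_per_thread(12, 6.0, 6))
    print(average_share_per_thread(12, 6.0, 12))
```

Both thread counts consume the same quota, so the quota figure alone does
not tell you how many threads to create. What each thread experiences
within a period depends on how the quota is enforced.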
David
On 6/02/2018 8:13 PM, Ashutosh Mehra1 wrote:
>> Just curious, what would be the use case for running on more cores with
>> less CPU quota per core?
>>
>> Are you trying to find out “can I maintain the same level of performance
>> with lower power usage by plugging in more cores?”
>>
>> Or put it another way “can I buy power with cores?”
>>
>> I think that would be great, as that would mean your investment in the
>> extra core can eventually be paid back by power savings.
>
> Well, my experiment is only to see if CPU quota, which is used quite
> extensively in kubernetes and docker environments, has any impact on
> performance of applications (especially JVM based).
> So, if someone who was earlier running their app on a 4-CPU system decides to
> deploy it in a kubernetes-docker environment, which uses CPU quota to
> provide the requested "effective" CPUs, what kind of performance
> issues can arise?
> The fact that the performance issue I encountered has its roots in the kernel
> CPU frequency governor for power saving is just plain luck! (or should I
> say bad luck!)
>
> Another (long-term) aim of this exercise is to understand how the JVM
> performs in a docker container, and whether there is any scope for
> improvement there.
> As part of that, I am checking the impact of CPU quota on various aspects
> of the JVM.
> GC pause time and JIT compilation time are two such quantities that I am
> currently measuring when running the AcmeAir benchmark.
>
> Ashutosh
>
>
>
> From: Ioi Lam <ioi.lam at oracle.com>
> To: Ashutosh Mehra1 <asmehra1 at in.ibm.com>
> Cc: jdk-dev at openjdk.java.net, Dinakar Guniguntala
> <Dinakar.G at in.ibm.com>
> Date: 02/06/2018 02:44 PM
> Subject: Re: Effect of setting CPU quota on Java performance
>
>
>
> Just curious, what would be the use case for running on more cores with
> less CPU quota per core?
>
> Are you trying to find out “can I maintain the same level of performance
> with lower power usage by plugging in more cores?”
>
> Or put it another way “can I buy power with cores?”
>
> I think that would be great, as that would mean your investment in the
> extra core can eventually be paid back by power savings.
>
> Ioi
>
>
> On Feb 6, 2018, at 2:43 PM, Ashutosh Mehra1 <asmehra1 at in.ibm.com> wrote:
>
>>> As a control test, maybe you can run a simple multi-threaded C benchmark
>>> with the same settings?
>>
>> That's a good point. I have also been testing with a simple multi-threaded
>> C program for the past few days, and I observed a similar difference in
>> performance between 4CPU at 100 and 8CPU at 50.
>> In fact, I asked this question, based on that multi-threaded C program,
>> here:
>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__unix.stackexchange.com_questions_417506_what-2Dis-2Dthe-2Deffect-2Dof-2Dsetting-2Dcpu-2Dcpu-2Dquota-2Dus-2Din-2Dcpu-2Dcgroup&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=E-YV0z5Ta99mwh3Za06_I769mDNiOgT5HLTiH-9tcIY&m=EoWnPxdBJetX_MYOuYzlqw_1dBePJ6X_jq9FyiwhTYg&s=pGc13T8qxYnbuyDanEF0tT7DuuzVgGtiwvZ55oB-o2A&e=
>
>> but didn't get any response.
>>
>> Yesterday, while testing with that C program, we noticed that the CPU
>> frequency in the 4CPU at 100 case was close to maximum, but in the
>> 8CPU at 50 case it was well below that.
>> On further examination we found the system was using "powersave" as the
>> kernel governor for CPU frequency.
>> For a quick overview of the different kernel governors for CPU frequency,
>> see this:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_Documentation_cpu-2Dfreq_governors.txt&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=E-YV0z5Ta99mwh3Za06_I769mDNiOgT5HLTiH-9tcIY&m=EoWnPxdBJetX_MYOuYzlqw_1dBePJ6X_jq9FyiwhTYg&s=7j2vI3sWYhJtMXE3T_Twbsggvc84INFKKuPm2Axgq3w&e=
>
>> When we changed the kernel governor to "performance", we saw much better
>> performance with 8CPU at 50, very close to that of 4CPU at 100.
>>
>> I am going to repeat my experiment with the AcmeAir benchmark using the
>> "performance" kernel governor.
>> I expect this to help bridge the throughput gap I noticed in my earlier
>> experiment.
>>
>> - Ashutosh
>>
>>
>>
>> From: Ioi Lam <ioi.lam at oracle.com>
>> To: Ashutosh Mehra1 <asmehra1 at in.ibm.com>
>> Cc: jdk-dev at openjdk.java.net, Dinakar Guniguntala
>> <Dinakar.G at in.ibm.com>
>> Date: 02/05/2018 11:47 PM
>> Subject: Re: Effect of setting CPU quota on Java performance
>>
>>
>>
>> As a control test, maybe you can run a simple multi-threaded C benchmark
>> with the same settings?
>>
>>> On Feb 5, 2018, at 1:48 PM, Ashutosh Mehra1 <asmehra1 at in.ibm.com> wrote:
>>>
>>> I have been trying to understand whether setting a CPU quota limit on a
>>> docker container, provided the "effective" CPUs are the same, has any
>>> impact on the application performance.
>>> As an example, if my app is running on 4 CPUs @ 100% quota, would I get
>>> the same performance if my app is running on 8 CPUs at 50% quota? Note
>>> that "effective" CPUs is 4 in both cases.
>>>
>>> Since OpenJDK early access builds for Java 10 have improved support for
>>> docker containers (
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8146115&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=E-YV0z5Ta99mwh3Za06_I769mDNiOgT5HLTiH-9tcIY&m=AF2cyoYVI5rZB4oOzFm7tKN8hqOduv6oKv-ewl5sIbg&s=pYU9iHBTE76q-M9IZrQkWIq_LZnocNRTSH1_bJFCGLQ&e=
>
>>> ), I decided to do some measurements using that build.
>>> I got the OpenJDK build jdk-10-ea+40 from
> https://urldefense.proofpoint.com/v2/url?u=http-3A__jdk.java.net_10_&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=E-YV0z5Ta99mwh3Za06_I769mDNiOgT5HLTiH-9tcIY&m=AF2cyoYVI5rZB4oOzFm7tKN8hqOduv6oKv-ewl5sIbg&s=_bL50Git2eGTEuWiDNdtvjIXdgc7499XiPL7JFilAxA&e=
>
>> .
>>> This build has container support enabled by default.
>>> The system I used has 32 CPUs including 2 hyperthreads per core. I turned
>>> off hyperthreading for this experiment. That leaves me with 16 cores on 2
>>> sockets: 0-7 on the 1st socket and 8-15 on the 2nd socket.
>>> System details are:
>>>
>>> # lsb_release -a
>>> No LSB modules are available.
>>> Distributor ID: Ubuntu
>>> Description: Ubuntu 16.04.2 LTS
>>> Release: 16.04
>>> Codename: xenial
>>>
>>> # uname -r
>>> 4.4.0-103-generic
>>>
>>> For measurements I used AcmeAir benchmark at
>>>
>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sabkrish_acmeair_tree_microservice-5Fchanges&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=E-YV0z5Ta99mwh3Za06_I769mDNiOgT5HLTiH-9tcIY&m=AF2cyoYVI5rZB4oOzFm7tKN8hqOduv6oKv-ewl5sIbg&s=bkcFfs-e2YzGHlWabfwKyipPqlqDOWXrW6W5-ns1EP8&e=
>
>> .
>>> I ran the AcmeAir benchmark with the said build for following cases:
>>>
>>> 1) Ran AcmeAir with JVM bound to 4 cpus (8-11) and no limit on quota.
>>> Let's call this 4CPU at 100.
>>> This used the following JVM settings:
>>> CICompilerCount = 3, ParallelGCThreads = 4, ConcGCThreads = 1
>>>
>>> 2) Case 8Cpu50Quota: Ran AcmeAir with JVM bound to 8 cpus (8-15) and 50%
>>> quota. Let's call this 8CPU at 50.
>>> In this case container support was enabled by default and it used the
>>> following JVM settings:
>>> CICompilerCount = 3, ParallelGCThreads = 4, ConcGCThreads = 1
>>>
>>> 3) Ran AcmeAir with JVM bound to 8 cpus (8-15) and 50% quota with the
>>> -XX:-UseContainerSupport option to disable container support. Let's call
>>> this 8CPU at 50NoCS.
>>> This used the following JVM settings:
>>> CICompilerCount = 4, ParallelGCThreads = 8, ConcGCThreads = 2
>>>
>>> Load on the server was applied using JMeter, which was running on the same
>>> box but bound to 0-8 CPUs. I applied the load for a few minutes to warm up
>>> the JVM before starting the final "measure" run.
>>> Throughput reported below is for the final "measure" run. All numbers
>>> reported below are an average of 10 iterations.
>>>
>>> Throughput result:
>>> 4CPU at 100 | 8CPU at 50 | 8CPU at 50NoCS
>>> 9621.5      | 6970.6     | 7252.1
>>>
>>> I also measured total compilation time (in seconds) and total GC pause
>>> time (in seconds) for the duration of the server run (which includes the
>>> warm-up phase):
>>>
>>> Compilation time (seconds):
>>> 4CPU at 100 | 8CPU at 50 | 8CPU at 50NoCS
>>> 79.8545     | 76.7041    | 100.085
>>>
>>> GC pause time (seconds):
>>> 4CPU at 100 | 8CPU at 50 | 8CPU at 50NoCS
>>> 1.829       | 1.886      | 1.927
>>>
>>> I am quite surprised to see the drop in throughput between the
>>> 4Cpu100Quota and 8Cpu50Quota cases.
>>> Looking deeper into the results, I do notice that the numbers for the
>>> 8Cpu50Quota case were not very consistent, but in general I can say they
>>> do not match the 4Cpu100Quota case.
>>> I will be doing additional runs for this setup (and on a different
>>> OS/kernel version) and will increase the warm-up time for the JVM to see
>>> if that improves the consistency.
>>>
>>> Meanwhile, a couple of questions I wanted to put forward:
>>> 1) Has anyone else noticed this kind of difference in Java
>>> application/JVM performance when CPU quota is used?
>>> 2) What other open-source benchmarks are available that I can use to
>>> verify the behavior I am observing?
>>>
>>> Any comments/feedback are welcome.
>>>
>>> Thanks,
>>> Ashutosh Mehra