Effect of setting CPU quota on Java performance
Ioi Lam
ioi.lam at oracle.com
Tue Feb 6 09:14:24 UTC 2018
Just curious, what would be the use case for running on more cores with less CPU quota per core?
Are you trying to find out “can I maintain the same level of performance with lower power usage by plugging in more cores?”
Or, to put it another way, “can I buy power with cores?”
I think that would be great, as that would mean your investment in the extra core can eventually be paid back by power savings.
Ioi
On Feb 6, 2018, at 2:43 PM, Ashutosh Mehra1 <asmehra1 at in.ibm.com> wrote:
>> As a control test, maybe you can run a simple multi-threaded C benchmark
>> with the same settings?
>
> That's a good point. I have also been testing with a simple multi-threaded
> C program for the past few days, and I observed a similar difference in
> performance between 4CPU@100 and 8CPU@50.
> In fact, I asked this question, based on that multi-threaded C program, here:
> https://unix.stackexchange.com/questions/417506/what-is-the-effect-of-setting-cpu-cpu-quota-us-in-cpu-cgroup
> but didn't get any response.
>
> Yesterday, while testing with that C program, we noticed that the CPU
> frequency in the 4CPU@100 case was close to the maximum, but in the
> 8CPU@50 case it was well below that.
> On further examination we found the system was using "powersave" as the
> CPU frequency scaling governor.
> For a quick overview of the different governors see:
> https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
> When we changed the governor to "performance", the 8CPU@50 case performed
> much better and came very close to 4CPU@100.
>
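A minimal sketch of the governor check described above, assuming a Linux system that exposes the cpufreq sysfs files at /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor. The function name `read_governors` and its `root` parameter are illustrative and do not come from the thread.

```python
from pathlib import Path
import re


def read_governors(root="/sys/devices/system/cpu"):
    """Map CPU index -> scaling governor name, for CPUs that expose cpufreq."""
    governors = {}
    for gov_file in Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        # gov_file is <root>/cpuN/cpufreq/scaling_governor; extract N.
        match = re.fullmatch(r"cpu(\d+)", gov_file.parent.parent.name)
        if match:
            governors[int(match.group(1))] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, gov in sorted(read_governors().items()):
        print(f"cpu{cpu}: {gov}")
```

If every CPU in the test cpuset reports "powersave", the frequency difference described above is a plausible suspect. The governor can usually be changed by writing to the same file as root, or with a tool such as `cpupower frequency-set -g performance`.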
> I am going to repeat my experiment with the AcmeAir benchmark using the
> "performance" governor.
> I expect this to help bridge the throughput gap I noticed in my
> earlier experiment.
>
> - Ashutosh
>
>
>
> From: Ioi Lam <ioi.lam at oracle.com>
> To: Ashutosh Mehra1 <asmehra1 at in.ibm.com>
> Cc: jdk-dev at openjdk.java.net, Dinakar Guniguntala
> <Dinakar.G at in.ibm.com>
> Date: 02/05/2018 11:47 PM
> Subject: Re: Effect of setting CPU quota on Java performance
>
>
>
> As a control test, maybe you can run a simple multi-threaded C benchmark
> with the same settings?
>
>> On Feb 5, 2018, at 1:48 PM, Ashutosh Mehra1 <asmehra1 at in.ibm.com> wrote:
>>
>> I have been trying to understand whether setting a CPU quota limit on a
>> docker container has any impact on application performance, provided the
>> "effective" CPUs are the same.
>> As an example, if my app is running on 4 CPUs @ 100% quota, would I get
>> the same performance if it is running on 8 CPUs @ 50% quota? Note that
>> "effective" CPUs is 4 in both cases.
>>
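The "effective CPUs" arithmetic can be sketched as follows, assuming cgroup v1 CFS bandwidth semantics (`cpu.cfs_quota_us` divided by `cpu.cfs_period_us`) and the common default period of 100000 us. The thread does not state the period or the exact quota values used, so those numbers and both function names are assumptions for illustration.

```python
def effective_cpus(cpuset_size, quota_us, period_us=100_000):
    """CPUs' worth of run time a cgroup may use per period."""
    if quota_us < 0:  # -1 means "no quota": only the cpuset limits the group
        return float(cpuset_size)
    return min(float(cpuset_size), quota_us / period_us)


def quota_for(cpuset_size, percent_per_cpu, period_us=100_000):
    """cfs_quota_us granting `percent_per_cpu` percent of each CPU in the cpuset."""
    return int(cpuset_size * percent_per_cpu / 100 * period_us)


if __name__ == "__main__":
    # 4 CPUs with no quota versus 8 CPUs at 50% each.
    print(effective_cpus(4, -1))
    print(effective_cpus(8, quota_for(8, 50)))
```

Under these assumptions both configurations give 4 effective CPUs, which is the premise of the question. The quota caps only total run time per period, not how many of the 8 CPUs run threads at the same instant.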
>> Since OpenJDK early access builds for Java 10 have improved support for
>> docker containers (https://bugs.openjdk.java.net/browse/JDK-8146115), I
>> decided to do some measurements using such a build.
>> I got the OpenJDK build jdk-10-ea+40 from http://jdk.java.net/10/
>> This build has container support enabled by default.
>> The system I used has 32 CPUs including 2 hyperthreads per core. I turned
>> off hyperthreading for this experiment. That leaves me with 16 cores on 2
>> sockets: 0-7 on the 1st socket and 8-15 on the 2nd socket.
>> System details are:
>>
>> # lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description: Ubuntu 16.04.2 LTS
>> Release: 16.04
>> Codename: xenial
>>
>> # uname -r
>> 4.4.0-103-generic
>>
>> For measurements I used the AcmeAir benchmark from:
>> https://github.com/sabkrish/acmeair/tree/microservice_changes
>> I ran the AcmeAir benchmark with the said build for the following cases:
>>
>> 1) Ran AcmeAir with JVM bound to 4 CPUs (8-11) and no limit on quota.
>> Let's call this 4CPU@100.
>> This used the following JVM settings:
>> CICompilerCount = 3, ParallelGCThreads = 4, ConcGCThreads = 1
>>
>> 2) Ran AcmeAir with JVM bound to 8 CPUs (8-15) and 50% quota. Let's call
>> this 8CPU@50.
>> In this case container support was enabled by default and it used the
>> following JVM settings:
>> CICompilerCount = 3, ParallelGCThreads = 4, ConcGCThreads = 1
>>
>> 3) Ran AcmeAir with JVM bound to 8 CPUs (8-15) and 50% quota with the
>> -XX:-UseContainerSupport option to disable container support. Let's call
>> this 8CPU@50NoCS.
>> This used the following JVM settings:
>> CICompilerCount = 4, ParallelGCThreads = 8, ConcGCThreads = 2
>>
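The thread counts listed for the three cases are consistent with the JVM sizing its compiler and GC thread pools from the CPU count it detects: 4 when the quota is honoured, 8 when container support is disabled. The sketch below reproduces the reported values. The formulas are recalled from HotSpot's default ergonomics of roughly the JDK 9/10 era (tiered compilation, G1) and are not taken from this thread, so treat them as an assumption and not a specification. All function names are illustrative.

```python
def floor_log2(n):
    """Integer floor(log2(n)) for n >= 1."""
    return n.bit_length() - 1


def ci_compiler_count(ncpus):
    # Assumed heuristic: max(log2(n) * log2(log2(n)) * 3 / 2, 2), integer math.
    log_cpu = floor_log2(ncpus)
    loglog_cpu = floor_log2(max(log_cpu, 1))
    return max(log_cpu * loglog_cpu * 3 // 2, 2)


def parallel_gc_threads(ncpus):
    # Assumed heuristic: one thread per CPU up to 8, then 5/8 per extra CPU.
    return ncpus if ncpus <= 8 else 8 + (ncpus - 8) * 5 // 8


def conc_gc_threads(parallel_threads):
    # Assumed heuristic for G1: roughly a quarter of the parallel GC threads.
    return max((parallel_threads + 2) // 4, 1)


if __name__ == "__main__":
    for ncpus in (4, 8):
        par = parallel_gc_threads(ncpus)
        print(ncpus, ci_compiler_count(ncpus), par, conc_gc_threads(par))
```

Under these assumptions, 4 detected CPUs give (3, 4, 1) and 8 detected CPUs give (4, 8, 2), matching the settings listed for the cases above.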
>> Load on the server was applied using JMeter, which was running on the same
>> box but bound to 0-8 CPUs. I applied the load for a few minutes to warm up
>> the JVM before starting the final "measure" run.
>> Throughput reported below is for the final "measure" run. All numbers
>> reported below are an average of 10 iterations.
>>
>> Throughput result:
>> 4CPU@100 | 8CPU@50 | 8CPU@50NoCS
>> 9621.5   | 6970.6  | 7252.1
>>
>> I also measured total compilation time (in seconds) and total GC pause
>> time (in seconds) for the duration of the server (which includes the
>> warm-up phase):
>>
>> Compilation time (seconds):
>> 4CPU@100 | 8CPU@50 | 8CPU@50NoCS
>> 79.8545  | 76.7041 | 100.085
>>
>> GC pause time (seconds):
>> 4CPU@100 | 8CPU@50 | 8CPU@50NoCS
>> 1.829    | 1.886   | 1.927
>>
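To put the throughput gap in relative terms, the short sketch below computes the percentage drop of each quota case against the no-quota baseline, using the three throughput figures reported above. The function name is illustrative.

```python
def percent_drop(baseline, value):
    """Relative drop of `value` below `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


if __name__ == "__main__":
    baseline = 9621.5  # no-quota case on 4 CPUs
    for label, throughput in (("8 CPUs, 50% quota", 6970.6),
                              ("8 CPUs, 50% quota, no container support", 7252.1)):
        print(f"{label}: {percent_drop(baseline, throughput):.1f}% lower")
```

This works out to roughly 27.6% and 24.6% lower throughput respectively, even though all three cases have the same 4 effective CPUs.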
>> I am quite surprised to see the drop in throughput between the 4CPU@100
>> and 8CPU@50 cases.
>> Looking deeper into the results, I do notice that the numbers for the
>> 8CPU@50 case were not very consistent, but in general they do not match
>> the 4CPU@100 case.
>> I will be doing additional runs for this setup (and on a different
>> OS/kernel version) and will increase the warm-up time for the JVM to see
>> whether that improves the consistency.
>>
>> Meanwhile, a couple of questions I wanted to put forward:
>> 1) Has anyone else noticed this kind of difference in Java
>> application/JVM performance when a CPU quota is used?
>> 2) What other open-source benchmarks are available that I can use to
>> verify the behavior I am observing?
>>
>> Any comments/feedback are welcome.
>>
>> Thanks,
>> Ashutosh Mehra
>>
>
>
>
>
>