RFR: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette bob.vandette at oracle.com
Wed Oct 4 18:51:04 UTC 2017


> On Oct 4, 2017, at 2:30 PM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> 
> Thanks Bob for looking into this.
> 
> On 10/04/2017 08:14 PM, Bob Vandette wrote:
>> Robbin,
>> I’ve looked into this issue and you are correct.  I do have to examine both the
>> sched_getaffinity results as well as the cgroup cpu subsystem configuration
>> files in order to provide a reasonable value for active_processors.  If I was only
>> interested in cpusets, I could simply rely on the getaffinity call but I also want to
>> factor in shares and quotas as well.
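
The affinity-mask half of that calculation is essentially a sched_getaffinity call; a minimal
sketch (an illustration only, not the actual changeset) looks like this:

// Illustration only: count the CPUs the scheduler will actually let this
// process run on.  numactl/taskset change this mask, but they do not touch
// the cgroup cpu configuration files, so both sources have to be consulted.
#define _GNU_SOURCE
#include <sched.h>

static int affinity_cpu_count() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0 /* this process */, sizeof(mask), &mask) != 0) {
    return -1;  // fall back to other mechanisms if the call fails
  }
  return CPU_COUNT(&mask);
}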
> 
> We had a quick discussion at the office, we actually do think that you could skip reading the shares and quotas.
> It really depends on what the user expects: if he gives us 4 CPUs at 50% or 2 full CPUs, what does he expect the difference to be?
> One could argue that he 'knows' he will only use at most 50%, and thus we can act as if he had given us 4 full CPUs.
> But I'll leave that up to you, just a thought we had.

It’s my opinion that we should do something if someone makes the effort to configure their
containers to use quotas or shares.  There are many different opinions on what that
right “something” is.

Many developers who are trying to deploy apps in containers say they don’t like
cpusets.  Cpusets are too limiting for them, especially when server configurations vary
within their organization.

From everything I’ve read, including source code, there seems to be a consensus that
shares and quotas are used as a way to specify a fraction of a system (a number of CPUs).

Docker added --cpus, which is implemented using quotas and periods.  They adjust these
two parameters to provide a way of calculating the number of CPUs that will be available
to a process (quota/period).  Amazon also documents that cpu shares are defined to be a multiple of 1024,
where 1024 represents a single CPU and a share value of N*1024 represents N CPUs.
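
To make the arithmetic concrete, a rough sketch of those two conventions (reading the
cgroup v1 values from cpu.cfs_quota_us, cpu.cfs_period_us and cpu.shares; error handling
omitted, and this is not the proposed HotSpot code) could look like:

// A quota of -1 means no quota has been set.
static int cpus_from_quota(long quota_us, long period_us) {
  if (quota_us <= 0 || period_us <= 0) return -1;        // no quota configured
  return (int)((quota_us + period_us - 1) / period_us);  // round up: --cpus=1.5 -> 2
}

// By the 1024-per-CPU convention, N*1024 shares is treated as N CPUs.
static int cpus_from_shares(long shares) {
  if (shares <= 0) return -1;                            // no shares configured
  return (int)((shares + 1023) / 1024);
}

The value the VM actually uses would then be something like the smallest of the
affinity-mask count and whichever of these two limits is configured.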

Of course, these are just conventions.  That is why I provided a way of specifying the
number of CPUs, so folks deploying Java services can be certain they get what they want.
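
For example, assuming the option ends up spelled -XX:ActiveProcessorCount (the name is
used here for illustration), a deployment could pin the value explicitly:

java -XX:ActiveProcessorCount=4 -Xlog:os=debug -cp . ForEver | grep proc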

Bob.

> 
>> I had assumed that when sched_setaffinity was called (in your case by numactl), the
>> cgroup cpu config files would be updated to reflect the current processor affinity for the
>> running process.  This is not correct.  I have updated my changeset and have successfully
>> run with your examples below.  I’ll post a new webrev soon.
> 
> I see, thanks again!
> 
> /Robbin
> 
>> Thanks,
>> Bob.
>>> 
>>>> I still want to include the flag for at least one Java release in the event that the new behavior causes some regression.
>>>> I’m trying to make the detection robust so that it will fall back to the current behavior in the event
>>>> that cgroups is not configured as expected, but I’d like to have a way of forcing the issue.  JDK 10 is not
>>>> supposed to be a long-term support release, which makes it a good target for this new behavior.
>>>> I agree with David that once we commit to cgroups, we should extract all VM configuration data from that
>>>> source.  There’s more information available for cpusets than just processor affinity that we might want to
>>>> consider when calculating the number of processors to assume for the VM.  There’s exclusivity and
>>>> effective cpu data available in addition to the cpuset string.
>>> 
>>> cgroup only contains the configured limits, not the real hard limits.
>>> You must also consider the affinity mask.  Those of us with NUMA nodes do:
>>> 
>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -cp . ForEver | grep proc
>>> [0.001s][debug][os] Initial active processor count set to 16
>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc
>>> [0.001s][debug][os] Initial active processor count set to 32
>>> 
>>> We run like that all the time when benchmarking, and the count must be 16 there, otherwise the flag is really bad for us.
>>> So the flag actually breaks the little NUMA support we have now.
>>> 
>>> Thanks, Robbin


