RFR: 8146115 - Improve docker container detection and resource configuration usage
Bob Vandette
bob.vandette at oracle.com
Tue Oct 3 14:41:38 UTC 2017
> On Oct 3, 2017, at 8:39 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>
> On 10/03/2017 02:25 PM, Bob Vandette wrote:
>> After talking to a number of folks and getting feedback, my current thinking is to enable the support by default.
>
> Great.
>
>> I still want to include the flag for at least one Java release in case the new behavior causes a regression.
>> I’m trying to make the detection robust so that it falls back to the current behavior if cgroups are not
>> configured as expected, but I’d like to have a way of forcing the issue. JDK 10 is not
>> supposed to be a long-term support release, which makes it a good target for this new behavior.
>> I agree with David that once we commit to cgroups, we should extract all VM configuration data from that
>> source. There’s more information available for cpusets than just processor affinity that we might want to
>> consider when calculating the number of processors to assume for the VM. There’s exclusivity and
>> effective cpu data available in addition to the cpuset string.
>
> cgroups only contain the configured limits, not the effective hard limits.
> You must also consider the affinity mask. Those of us with NUMA nodes do:
>
> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -cp . ForEver | grep proc
> [0.001s][debug][os] Initial active processor count set to 16
> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc
> [0.001s][debug][os] Initial active processor count set to 32
>
> We benchmark like this all the time, and that count must be 16; otherwise the flag is really bad for us.
> So the flag actually breaks the little NUMA support we have now.
Thanks for sharing those results. I’ll look into this.
I suspect this is because I am not yet examining the memory node files in the cgroup file
system.
Bob.
>
> Thanks, Robbin
>
>> Bob.
>>> On Oct 3, 2017, at 4:00 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>>
>>> Hi David,
>>>
>>> On 10/03/2017 12:46 AM, David Holmes wrote:
>>>> Hi Robbin,
>>>> I have some views on this :)
>>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>>>> Hi Bob,
>>>>>
>>>>> As I said in your presentation for RT.
>>>>> If the kernel is configured with cgroups, they should always be read (otherwise we get wrong values).
>>>>> E.g. Fedora has had cgroups on by default for several years (I believe most distros have them on).
>>>>>
>>>>> - No option is needed at all: right now we have wrong values and your fix will provide the right ones, so why would you ever want to turn that off?
>>>> It's not that you would (necessarily) want to turn it off, but just because the cgroups capability exists doesn't mean it has actually been enabled and configured - in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in, as was the experimental cgroup support we added. Once it becomes clearer how this needs to be used, we can adjust the defaults. For now this is enabling technology only.
>>>
>>> If cgroups are mounted they are on, and the only way to know the configuration (such as no limits) is to actually read the cgroup filesystem.
>>> Therefore the flag makes no sense.
>>>
>>>>> - A container log target would make little sense, since almost all Linux distributions run with cgroups on.
>>>> Again the capability is present but may not be enabled/configured.
>>>
>>> The capability is on if cgroups are mounted, and the only way to know the configuration is to read the cgroup filesystem.
>>>
>>>>> - For cpusets, the process's affinity mask already reflects the cgroup settings, so you don't need to look into cgroups for that.
>>>>> If you do, you will miss any process-specific affinity mask. So _cpu_count() should already be returning the right number of CPUs.
>>>> While the process affinity mask reflects cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of CPUs to be hardwired via a flag. So it's better IMHO to read everything from the cgroups if configured to use cgroups.
>>>
>>> I'm not talking about shares and quotas, they should be read of course, but the cpuset should be checked via the affinity mask, as in _cpu_count.
>>>
>>> Here is the bug:
>>>
>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc
>>> [0.002s][debug][os] Initial active processor count set to 4
>>> ^C
>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
>>> [0.003s][debug][os] Initial active processor count set to 32
>>> ^C
>>>
>>> _cpu_count already does the right thing.
>>>
>>> Thanks, Robbin
>>>
>>>
>>>> Cheers,
>>>> David
>>>>>
>>>>> Thanks for trying to fix this!
>>>>>
>>>>> /Robbin
>>>>>
>>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>>>> Please review these changes, which improve docker container detection and the
>>>>>> automatic configuration of the number of active CPUs and of total and free memory
>>>>>> based on the container's resource limitation settings and metric data files.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>>>>
>>>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>>>
>>>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>>>
>>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>>>> may not satisfy every user's needs, I’ve added an additional flag that allows the
>>>>>> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx.
>>>>>>
>>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>
>>>>>>