RFR: 8146115 - Improve docker container detection and resource configuration usage

Tue Oct 3 11:00:46 UTC 2017

Hi Robbin,

On 3/10/2017 8:45 PM, Robbin Ehn wrote:
> Hi David, I think we are seen the issue from complete opposite. (this 
> RFE could be pushed as a bug from my POV)

Yes we see this completely opposite. I see this is a poorly integrated 
add-on API that we have to try to account for instead of being able to 
read an "always correct" value from a standard OS API. They at least got 
the cpuset support correct by having sched_getaffinity correctly account 
for it. Alas the rest is ad-hoc.

> 
> On 10/03/2017 10:42 AM, David Holmes wrote:
>> On 3/10/2017 6:00 PM, Robbin Ehn wrote:
>>> Hi David,
>>>
>>> On 10/03/2017 12:46 AM, David Holmes wrote:
>>>> Hi Robbin,
>>>>
>>>> I have some views on this :)
>>>>
>>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>>>> Hi Bob,
>>>>>
>>>>> As I said in your presentation for RT.
>>>>> If kernel if configured with cgroup this should always be read 
>>>>> (otherwise we get wrong values).
>>>>> E.g. fedora have had cgroups default on several years (I believe 
>>>>> most distros have it on).
>>>>>
>>>>> - No option is needed at all: right now we have wrong values your 
>>>>> fix will provide right ones, why would you ever what to turn that off?
>>>>
>>>> It's not that you would want to turn that off (necessarily) but just 
>>>> because cgroups capability exists it doesn't mean they have actually 
>>>> been enabled and configured - in which case reading all the cgroup 
>>>> info is unnecessary startup overhead. So for now this is opt-in - as 
>>>> was the experimental cgroup support we added. Once it becomes 
>>>> clearer how this needs to be used we can adjust the defaults. For 
>>>> now this is enabling technology only.
>>>
>>> If cgroup are mounted they are on and the only way to know the 
>>> configuration (such as no limits) is to actual read the cgroup 
>>> filesystem.
>>> Therefore the flag make no sense.
>>
>> No that is exactly why it is opt-in! Why should we have to waste 
>> startup time reading a bunch of cgroup values just to determine that 
>> cgroups are not actually being used!
> 
> If you have a cgroup enabled kernel they _are_ being used, no escaping 
> that.

A cgroup set to unlimited is not being used from a practical perspective.

> cgroup is not a simple yes and no so for which resources depend on how 
> you configured your kernel.
> To find out for what resource and what limits are set is we need to read 
> them.
> 
> I rather waste startup time (0.103292989 vs 0.103577139 seconds) and get 
> values correct, so our heuristic works fine out-of-the-box. (and if you 
> must, it opt-out)

I'd rather people say "Hey I'm using this add-on resource management API 
so don't ask the OS but please query the add-on.". Yes that is a little 
harsh but the lack of integration at the OS level is a huge impediment 
in my opinion.

> Also I notice that we don't read the numa values so the phys mem method 
> does a poor job. Correct would be check at least cgroup and numa bindings.

NUMA is another minefield.

> We also have this option UseCGroupMemoryLimitForHeap which should be 
> removed.

Bob already addressed why he was not getting rid of that initially.

>>
>>>>
>>>>> - log target container would make little sense since almost all 
>>>>> linuxes run with croups on.
>>>>
>>>> Again the capability is present but may not be enabled/configured.
>>>
>>> The capability is on if cgroup are mount and the only way to know the 
>>> configuration is to read the cgroup filesystem.
>>>
>>>>
>>>>> - For cpuset, the processes affinity mask already reflect cgroup 
>>>>> setting so you don't need to look into cgroup for that
>>>>>    If you do, you would miss any processes specific affinity mask. 
>>>>> So _cpu_count() should already be returning the right number of CPU's.
>>>>
>>>> While the process affinity mask reflect cpusets (and we already use 
>>>> it for that reason), it doesn't reflect shares and quotas. And if 
>>>> shares/quotas are enforced and someone sets a custom affinity mask, 
>>>> what is it all supposed to mean? That's one of the main reasons to 
>>>> allow the number of cpu's to be hardwired via a flag. So it's better 
>>>> IMHO to read everything from the cgroups if configured to use cgroups.
>>>
>>> I'm not taking about shares and quotes, they should be read of 
>>> course, but cpuset should be checked such as in _cpu_count.
>>>
>>> Here is the bug:
>>>
>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp 
>>> . ForEver | grep proc
>>> [0.002s][debug][os] Initial active processor count set to 4
>>> ^C
>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java 
>>> -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
>>> [0.003s][debug][os] Initial active processor count set to 32
>>> ^C
>>>
>>> _cpu_count already does the right thing.
>>
>> But how do you then combine that information with the use of shares 
>> and/or quotas?
> 
> That I don't know, wild naive guess would be:
> active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :)

That would be one option but it may not be meaningful. That said I don't 
think the use of quota or shares to define the number of available CPUs 
makes sense anyway.

Personally I don't think mixing direct use of cpusets with cgroup 
defined limits makes much sense.

> I assume everything we need to know is in: 
> https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt

Nope that only addresses cpusets. The one part of this that at least 
makes sense in isolation.

Cheers,
David

> Thanks, Robbin
> 
>>
>> David
>> -----
>>
>>> Thanks, Robbin
>>>
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>>
>>>>> Thanks for trying to fixing this!
>>>>>
>>>>> /Robbin
>>>>>
>>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>>>> Please review these changes that improve on docker container 
>>>>>> detection and the
>>>>>> automatic configuration of the number of active CPUs and total and 
>>>>>> free memory
>>>>>> based on the containers resource limitation settings and metric 
>>>>>> data files.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ 
>>>>>> <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>>>>
>>>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>>>
>>>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>>>
>>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and 
>>>>>> shares
>>>>>> may not satisfy every users needs, I’ve added an additional flag 
>>>>>> to allow the
>>>>>> number of CPUs to be overridden.  This flag is named 
>>>>>> -XX:ActiveProcessorCount=xx.
>>>>>>
>>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>
>>>>>>