RFR: 8146115 - Improve docker container detection and resource configuration usage

Robbin Ehn robbin.ehn at oracle.com
Tue Oct 3 12:19:36 UTC 2017


Hi,

I'll leave that discussion for a while; another thing is:

In os::Linux::available_memory(), the limit returned by OSContainer::memory_limit_in_bytes() can be larger than the actual RAM.
So we also need to check sysinfo, e.g. return MIN(avail_mem, si.freeram * si.mem_unit).

So I think the check against "if (XXX == 9223372036854771712)" is not needed at all for any of those methods.
Just return what cgroup says; if that is larger than the actual value, pick the lower one.
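A minimal sketch of what I mean (hypothetical helper name, not actual HotSpot code; the sentinel is what cgroup v1 reports for an unconfigured limit, LONG_MAX rounded down to a 4K page boundary):

```cpp
#include <algorithm>
#include <cstdint>

// cgroup v1 reports this for an unconfigured memory limit:
// LONG_MAX rounded down to a 4K page boundary.
const int64_t kUnlimitedSentinel = 9223372036854771712LL;

// Hypothetical helper: with a plain MIN against the machine's real
// memory, the sentinel needs no special-casing, because any limit at
// or above physical RAM gets clamped anyway.
int64_t effective_limit(int64_t cgroup_limit, int64_t actual_mem) {
  return std::min(cgroup_limit, actual_mem);
}
```

So on an 8 GiB machine an unconfigured limit clamps to 8 GiB, while a real 512 MiB limit passes through untouched.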

/Robbin

On 10/03/2017 01:00 PM, David Holmes wrote:
> Hi Robbin,
> 
> On 3/10/2017 8:45 PM, Robbin Ehn wrote:
>> Hi David, I think we are seeing this issue from completely opposite sides. (this RFE could be pushed as a bug from my POV)
> 
> Yes we see this completely opposite. I see this as a poorly integrated add-on API that we have to try to account for, instead of being able to read an "always correct" value 
> from a standard OS API. They at least got the cpuset support correct by having sched_getaffinity correctly account for it. Alas the rest is ad-hoc.
> 
>>
>> On 10/03/2017 10:42 AM, David Holmes wrote:
>>> On 3/10/2017 6:00 PM, Robbin Ehn wrote:
>>>> Hi David,
>>>>
>>>> On 10/03/2017 12:46 AM, David Holmes wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> I have some views on this :)
>>>>>
>>>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>>>>> Hi Bob,
>>>>>>
>>>>>> As I said in your presentation for RT.
>>>>>> If the kernel is configured with cgroups, this should always be read (otherwise we get wrong values).
>>>>>> E.g. Fedora has had cgroups on by default for several years (I believe most distros have it on).
>>>>>>
>>>>>> - No option is needed at all: right now we have wrong values and your fix will provide right ones; why would you ever want to turn that off?
>>>>>
>>>>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - 
>>>>> in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes 
>>>>> clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only.
>>>>
>>>> If cgroups are mounted they are on, and the only way to know the configuration (such as no limits) is to actually read the cgroup filesystem.
>>>> Therefore the flag makes no sense.
>>>
>>> No that is exactly why it is opt-in! Why should we have to waste startup time reading a bunch of cgroup values just to determine that cgroups are not actually being used!
>>
>> If you have a cgroup enabled kernel they _are_ being used, no escaping that.
> 
> A cgroup set to unlimited is not being used from a practical perspective.
> 
>> cgroups are not a simple yes or no; which resources are controlled depends on how you configured your kernel.
>> To find out which resources and what limits are set, we need to read them.
>>
>> I'd rather waste startup time (0.103292989 vs 0.103577139 seconds) and get correct values, so our heuristics work fine out-of-the-box. (and if you must, make it opt-out)
> 
> I'd rather people say "Hey I'm using this add-on resource management API so don't ask the OS but please query the add-on.". Yes that is a little harsh but the lack of 
> integration at the OS level is a huge impediment in my opinion.
> 
>> Also I notice that we don't read the NUMA values, so the phys mem method does a poor job. Correct would be to check at least cgroup and NUMA bindings.
> 
> NUMA is another minefield.
> 
>> We also have this option UseCGroupMemoryLimitForHeap which should be removed.
> 
> Bob already addressed why he was not getting rid of that initially.
> 
>>>
>>>>>
>>>>>> - A "container" log target would make little sense since almost all Linuxes run with cgroups on.
>>>>>
>>>>> Again the capability is present but may not be enabled/configured.
>>>>
>>>> The capability is on if cgroups are mounted, and the only way to know the configuration is to read the cgroup filesystem.
>>>>
>>>>>
>>>>>> - For cpuset, the process's affinity mask already reflects the cgroup setting, so you don't need to look into cgroups for that.
>>>>>>    If you do, you would miss any process-specific affinity mask. So _cpu_count() should already be returning the right number of CPUs.
>>>>>
>>>>> While the process affinity mask reflects cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and 
>>>>> someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of CPUs to be hardwired via a flag. So it's 
>>>>> better IMHO to read everything from the cgroups if configured to use cgroups.
>>>>
>>>> I'm not talking about shares and quotas, they should be read of course, but cpuset should be checked as in _cpu_count.
>>>>
>>>> Here is the bug:
>>>>
>>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc
>>>> [0.002s][debug][os] Initial active processor count set to 4
>>>> ^C
>>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
>>>> [0.003s][debug][os] Initial active processor count set to 32
>>>> ^C
>>>>
>>>> _cpu_count already does the right thing.
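For reference, what _cpu_count boils down to is a sched_getaffinity call; a simplified standalone sketch (Linux-specific, not the actual HotSpot code) of why the taskset-restricted mask is already visible without touching the cgroup filesystem:

```cpp
#include <sched.h>

// Ask the kernel for this process's CPU affinity mask and count the
// bits. The mask already reflects both taskset and cgroup cpusets,
// which is why the first run above correctly reports 4.
int affinity_cpu_count() {
  cpu_set_t cpus;
  CPU_ZERO(&cpus);
  if (sched_getaffinity(0, sizeof(cpu_set_t), &cpus) != 0) {
    return -1;  // caller should fall back to another source on error
  }
  return CPU_COUNT(&cpus);
}
```

Run under `taskset --cpu-list 0-2,6`, this returns 4 on the machine above.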
>>>
>>> But how do you then combine that information with the use of shares and/or quotas?
>>
>> That I don't know, wild naive guess would be:
>> active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :)
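That naive guess could be sketched like this (hypothetical helper, purely illustrative of the MIN idea, not a proposal for the actual code):

```cpp
#include <algorithm>

// Combine a quota/shares-derived CPU count with the cpuset count by
// taking the smaller of the two. A negative quota_count stands for
// "no quota configured", in which case the cpuset alone decides.
int active_processor_count(int quota_count, int cpuset_count) {
  if (quota_count < 0) return cpuset_count;
  return std::min(quota_count, cpuset_count);
}
```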
> 
> That would be one option but it may not be meaningful. That said I don't think the use of quota or shares to define the number of available CPUs makes sense anyway.
> 
> Personally I don't think mixing direct use of cpusets with cgroup defined limits makes much sense.
> 
>> I assume everything we need to know is in: https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
> 
> Nope that only addresses cpusets. The one part of this that at least makes sense in isolation.
> 
> Cheers,
> David
> 
>> Thanks, Robbin
>>
>>>
>>> David
>>> -----
>>>
>>>> Thanks, Robbin
>>>>
>>>>
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>>>
>>>>>> Thanks for trying to fix this!
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>>>>> Please review these changes that improve docker container detection and the
>>>>>>> automatic configuration of the number of active CPUs and total and free memory
>>>>>>> based on the container's resource limitation settings and metric data files.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/
>>>>>>>
>>>>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>>>>
>>>>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>>>>
>>>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>>>>> may not satisfy every user's needs, I’ve added an additional flag to allow the
>>>>>>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>>>>>>>
>>>>>>>
>>>>>>> Bob.
>>>>>>>
>>>>>>>
>>>>>>>


More information about the hotspot-dev mailing list