RFR: 8146115 - Improve docker container detection and resource configuration usage

David Holmes david.holmes at oracle.com
Mon Oct 23 01:59:55 UTC 2017


Sorry, just spotted a minor issue when actually running the code. Many of 
your log statements include \n in the format string. This is unnecessary 
and results in lots of blank lines in the logging output, e.g.:

[0.002s][trace][os,container] OSContainer::init: Initializing Container Support
[0.003s][trace][os,container] Path to /memory.limit_in_bytes is /cgroup/memory//memory.limit_in_bytes

[0.003s][trace][os,container] Memory Limit is: 9223372036854775807

[0.004s][trace][os,container] Memory Limit is: Unlimited

[0.004s][trace][os          ] active_processor_count: using static path - configured processors: 4
[0.004s][trace][os          ] active_processor_count: sched_getaffinity processor count: 4
[0.004s][trace][os,container] Path to /cpu.shares is /cgroup/cpu//cpu.shares

[0.005s][trace][os,container] CPU Shares is: 1024

[0.005s][trace][os,container] Path to /cpu.cfs_quota_us is /cgroup/cpu//cpu.cfs_quota_us

[0.005s][debug][os,container] file not found /cgroup/cpu//cpu.cfs_quota_us

[0.005s][debug][os,container] Error reading /cpu.cfs_quota_us
[0.005s][trace][os,container] Path to /cpu.cfs_period_us is /cgroup/cpu//cpu.cfs_period_us

[0.006s][debug][os,container] file not found /cgroup/cpu//cpu.cfs_period_us

[0.006s][debug][os,container] Error reading /cpu.cfs_period_us
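
The fix is just to drop the trailing \n - Unified Logging terminates each 
record itself. A before/after sketch (this particular call site is made up 
for illustration, not taken from the webrev):

   // Before: the trailing \n produces an extra, empty log record
   log_trace(os, container)("Memory Limit is: " JLONG_FORMAT "\n", limit);

   // After: let the logging framework handle line termination
   log_trace(os, container)("Memory Limit is: " JLONG_FORMAT, limit);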

Thanks,
David

On 23/10/2017 7:52 AM, David Holmes wrote:
> Hi Bob,
> 
> Changes seem fine.
> 
> I'll take up the issue of whether this should be enabled by default in 
> the CSR.
> 
> Thanks,
> David
> 
> On 21/10/2017 4:44 AM, Bob Vandette wrote:
>> Here’s an updated webrev that hopefully takes care of all remaining 
>> comments.
>>
>> http://cr.openjdk.java.net/~bobv/8146115/webrev.02
>>
>> I added the deprecation of the UseCGroupMemoryLimitForHeap option this 
>> round since
>> this experimental option should no longer be necessary.
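>>
>> For reference, the deprecation amounts to an entry in the 
>> special_jvm_flags table in arguments.cpp, along these lines (the 
>> version numbers here are illustrative, not final):
>>
>>   { "UseCGroupMemoryLimitForHeap", JDK_Version::jdk(10), JDK_Version::undefined(), JDK_Version::jdk(11) },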
>>
>>
>> Bob.
>>
>>
>>> On Oct 13, 2017, at 9:34 AM, David Holmes <David.Holmes at oracle.com> 
>>> wrote:
>>>
>>> Reading back through my suggestion for os.hpp 
>>> initialize_container_support should just be init_container_support.
>>>
>>> Thanks,
>>> David
>>>
>>> On 13/10/2017 11:14 PM, Bob Vandette wrote:
>>>>> On Oct 12, 2017, at 11:08 PM, David Holmes 
>>>>> <david.holmes at oracle.com> wrote:
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> On 13/10/2017 1:43 AM, Bob Vandette wrote:
>>>>>>> On Oct 11, 2017, at 9:04 PM, David Holmes 
>>>>>>> <david.holmes at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Bob,
>>>>>>>
>>>>>>> On 12/10/2017 5:11 AM, Bob Vandette wrote:
>>>>>>>> Here’s an updated webrev for this RFE that contains changes and 
>>>>>>>> cleanups based on feedback I’ve received so far.
>>>>>>>> I’m still investigating the best approach for reacting to cpu 
>>>>>>>> shares and quotas.  I do not believe doing nothing is the answer.
>>>>>>>
>>>>>>> I do. :) Let me try this again. When you run outside of a 
>>>>>>> container you don't get 100% of the CPUs - you have to share with 
>>>>>>> whatever else is running on the system. You get a fraction of CPU 
>>>>>>> time based on the load. We don't try to communicate load 
>>>>>>> information to the VM/application so it can adapt. Within a 
>>>>>>> container setting shares/quotas is just a way of setting an 
>>>>>>> artificial load. So why should we be treating it any differently?
>>>>>> Because today we optimize for a lightly loaded system, but when 
>>>>>> running serverless applications in containers we should be 
>>>>>> optimizing for a fully loaded system.  If developers don’t want 
>>>>>> this, they shouldn’t use shares or quotas, and they’ll get exactly 
>>>>>> the behavior they have today.  I think we just have to document the 
>>>>>> new behavior (and how to turn it off) so people know what to expect.
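>>>>>>
>>>>>> To make the intent concrete, the kind of calculation I have in mind 
>>>>>> is roughly the following sketch (the exact policy is what I’m still 
>>>>>> investigating; quota, period and shares come from the cgroup files, 
>>>>>> available_cpus from sched_getaffinity, and PER_CPU_SHARES == 1024 is 
>>>>>> the docker convention of 1024 shares per requested cpu):
>>>>>>
>>>>>>   // Quota is usec of cpu time per period; -1 means no quota is set.
>>>>>>   int quota_count = (quota > -1 && period > 0) ?
>>>>>>                     (int)ceilf((float)quota / (float)period) : 0;
>>>>>>   int share_count = (shares > -1) ?
>>>>>>                     (int)ceilf((float)shares / (float)PER_CPU_SHARES) : 0;
>>>>>>   // Use the most restrictive limit that is actually set, capped by
>>>>>>   // the number of processors we are allowed to run on.
>>>>>>   int limit = (quota_count != 0 && share_count != 0) ?
>>>>>>               MIN2(quota_count, share_count) :
>>>>>>               MAX2(quota_count, share_count);
>>>>>>   int result = (limit > 0) ? MIN2(limit, available_cpus) : available_cpus;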
>>>>>
>>>>> The person deploying the app may not have control over how the app 
>>>>> is deployed in terms of shares/quotas. It all depends on how (and who) 
>>>>> manages the containers. A big part of my concern here is that I don't 
>>>>> know exactly how all this is organized, who knows what in advance, 
>>>>> and what they can control.
>>>>>
>>>>> But I'll let this drop, other than raising an additional concern. I 
>>>>> don't think just allowing the user to hardwire the number of 
>>>>> processors to use will necessarily solve the problem with what 
>>>>> available_processors() returns. I'm concerned the execution of the 
>>>>> VM may occur in a context where the number of processors is not 
>>>>> known in advance, and the user can not disable shares/quotas. In 
>>>>> that case we may need to have a flag that says to ignore 
>>>>> shares/quotas in the processor count calculation.
>>>> I’m not sure that’s a high-probability issue.  It’s my understanding 
>>>> that whoever is configuring the container management will be 
>>>> specifying the resources required to run these applications, which 
>>>> comes with a guarantee of those resources.  If this issue does come 
>>>> up, I do have the -XX:-UseContainerSupport big switch that turns all 
>>>> of this off.  It will, however, disable the memory support as well.
>>>>>
>>>>>> You seem to discount the added cost of 100s of VMs creating lots 
>>>>>> of unnecessary threads.  In the current JDK 10 code base, on a 
>>>>>> heavily loaded system with 88 processors, VmData grows from 60MB 
>>>>>> (1 cpu) to 376MB (88 cpus).  This is only mapped memory, and it 
>>>>>> depends heavily on how deep in the stack these threads go before it 
>>>>>> impacts VmRSS, but it shows the potential downside of having 100s 
>>>>>> of VMs each thinking it owns the entire machine.
>>>>>
>>>>> I agree that the default ergonomics does not scale well. Anyone 
>>>>> doing any serious Java deployment tunes the VM explicitly and does 
>>>>> not rely on the defaults. How will they do that in a container 
>>>>> environment? I don't know.
>>>>>
>>>>> I would love to see some actual deployment scenarios/experiences 
>>>>> for this to understand things better.
>>>> This is one of the reasons I want to get this support out in JDK 10, 
>>>> to get some feedback under real scenarios.
>>>>>
>>>>>> I haven’t even done any experiments to determine the added context 
>>>>>> switching cost if the VM decides to use excessive
>>>>>> pthreads.
>>>>>>>
>>>>>>> That's not to say an API to provide load/shares/quota information 
>>>>>>> may not be useful, but that is a separate issue to what the 
>>>>>>> "active processor count" should report.
>>>>>> I don’t have a problem with active processor count reporting the 
>>>>>> number of processors we have, but I do have a problem
>>>>>> with our current usage of this information within the VM and Core 
>>>>>> libraries.
>>>>>
>>>>> That is a somewhat separate issue. One worth pursuing separately.
>>>> We should look at this as part of the “Container aware Java” JEP.
>>>>>
>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01
>>>>>>>> Updates:
>>>>>>>> 1. I had to move the processing of AggressiveHeap since the 
>>>>>>>> container memory size needs to be known before this can be 
>>>>>>>> processed.
>>>>>>>
>>>>>>> I don't like the placement of this - we don't call os:: init 
>>>>>>> functions from inside Arguments - we manage the initialization 
>>>>>>> sequence from Threads::create_vm. Seems to me that container 
>>>>>>> initialization can/should happen in os::init_before_ergo, and the 
>>>>>>> AggressiveHeap processing can occur at the start of 
>>>>>>> Arguments::apply_ergo().
>>>>>>>
>>>>>>> That said we need to be sure nothing touched by 
>>>>>>> set_aggressive_heap_flags will be used before we now reach that 
>>>>>>> code - there are a lot of flags being set in there.
>>>>>> This is exactly the reason why I put the call where I did.  I put 
>>>>>> the call to set_aggressive_heap_flags in finalize_vm_init_args 
>>>>>> because that is exactly what this call is doing: finalizing flags 
>>>>>> used after the parsing.  The impacted flags are definitely being 
>>>>>> used shortly after parsing and before init_before_ergo is called.
>>>>>
>>>>> I see that now and it is very unfortunate because I really do not 
>>>>> like what you had to do here. As you can tell from the logic in 
>>>>> create_vm we have always refactored to ensure we can progressively 
>>>>> manage the interleaving of OS initialization with Arguments 
>>>>> processing. So having a deep part of Argument processing go off and 
>>>>> call some more OS initialization is not nice. That said I can't see 
>>>>> a way around it without very unreasonable refactoring.
>>>>>
>>>>> But I do have a couple of changes I'd like to request please:
>>>>>
>>>>> 1. Move the call to os::initialize_container_support() up a level 
>>>>> to before the call to finalize_vm_init_args(), with a more 
>>>>> elaborate comment:
>>>>>
>>>>> // We need to ensure processor and memory resources have been properly
>>>>> // configured - which may rely on arguments we just processed - before
>>>>> // doing the final argument processing. Any argument processing that
>>>>> // needs to know about processor and memory resources must occur after
>>>>> // this point.
>>>>>
>>>>> os::initialize_container_support();
>>>>>
>>>>> // Do final processing now that all arguments have been parsed
>>>>> result = finalize_vm_init_args(patch_mod_javabase);
>>>>>
>>>>> 2. Simplify and modify os.hpp as follows:
>>>>>
>>>>> +  LINUX_ONLY(static void pd_initialize_container_support();)
>>>>>
>>>>>    public:
>>>>>     static void init(void);                      // Called before command line parsing
>>>>>
>>>>> +   static void initialize_container_support() { // Called during command line parsing
>>>>> +     LINUX_ONLY(pd_initialize_container_support();)
>>>>> +   }
>>>>>
>>>>>     static void init_before_ergo(void);          // Called after command line parsing
>>>>>                                                  // before VM ergonomics
>>>>>
>>>>> 3. In thread.cpp add a comment here:
>>>>>
>>>>>    // Parse arguments
>>>>> +  // Note: this internally calls os::initialize_container_support()
>>>>>    jint parse_result = Arguments::parse(args);
>>>> All very reasonable changes.
>>>> Thanks,
>>>> Bob.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>>>
>>>>>>>> 2. I no longer use the cpuset.cpus contents since 
>>>>>>>> sched_getaffinity reports the correct results even if someone 
>>>>>>>> manually updates the cgroup data.  I originally didn’t think this 
>>>>>>>> was the case, since sched_setaffinity doesn’t automatically update 
>>>>>>>> the cpuset file contents, but the inverse does hold: manual updates 
>>>>>>>> to the cpuset file are reflected in what sched_getaffinity reports.
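>>>>>>>>
>>>>>>>> The check itself is simple - roughly this sketch (error handling 
>>>>>>>> elided; CPU_COUNT is the glibc macro):
>>>>>>>>
>>>>>>>>   cpu_set_t cpus;
>>>>>>>>   CPU_ZERO(&cpus);
>>>>>>>>   // Reports the affinity mask actually in effect, including any
>>>>>>>>   // cpuset restriction imposed by the cgroup.
>>>>>>>>   if (sched_getaffinity(0, sizeof(cpu_set_t), &cpus) == 0) {
>>>>>>>>     int cpu_count = CPU_COUNT(&cpus);
>>>>>>>>   }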
>>>>>>>
>>>>>>> Ok.
>>>>>>>
>>>>>>>> 3. I ifdef’d the container function support in 
>>>>>>>> src/hotspot/share/runtime/os.hpp to avoid putting stubs in all the 
>>>>>>>> other os platform directories.  I can add the stubs if it’s 
>>>>>>>> absolutely necessary.
>>>>>>>
>>>>>>> You should not need to do this if initialization moves as I 
>>>>>>> suggested above. os::init_before_ergo() in os_linux.cpp can call 
>>>>>>> OSContainer::init().
>>>>>>> No need for os::initialize_container_support() or 
>>>>>>> os::pd_initialize_container_support.
>>>>>> But os::init_before_ergo is in shared code.
>>>>>
>>>>> Yep my bad - point is moot now anyway.
>>>>>
>>>>> <snip>
>>>>>
>>>>>>> src/hotspot/os/linux/os_linux.cpp/.hpp
>>>>>>>
>>>>>>> 187         log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem);
>>>>>>> 188         return avail_mem;
>>>>>>> 189       } else {
>>>>>>> 190         log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage);
>>>>>>>
>>>>>>> Why "trace" (the third logging level) to show the information, 
>>>>>>> but "debug" (the second level) to show failed calls? You use 
>>>>>>> debug in other files for basic info. Overall I'm unclear on your 
>>>>>>> use of debug versus trace for the logging.
>>>>>> I use trace for noisy information that is not reporting errors, and 
>>>>>> debug for failures that are informational and not fatal.  In this 
>>>>>> case, the call could return -1 or -2: -1 is unlimited and -2 is an 
>>>>>> error.  In either case we fall back to the standard system call to 
>>>>>> get available memory.  I would have used warning, but these messages 
>>>>>> were occurring during test runs and causing test failures.
>>>>>
>>>>> Okay. Thanks for clarifying.
>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> src/hotspot/os/linux/osContainer_linux.cpp
>>>>>>>
>>>>>>> Dead code:
>>>>>>>
>>>>>>> 376 #if 0
>>>>>>> 377   os::Linux::print_container_info(tty);
>>>>>>> ...
>>>>>>> 390 #endif
>>>>>> I left it in for standalone testing.  Should I use some other #if?
>>>>>
>>>>> We don't generally leave in dead code in the runtime code. Do you 
>>>>> see this as useful after you've finalized the changes?
>>>>>
>>>>> Is this testing just for showing the logging? Is it worth making 
>>>>> this a logging controlled call? Is it suitable for a Gtest test?
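>>>>>
>>>>> Something along these lines perhaps (just a sketch - I'm assuming the 
>>>>> OSContainer entry points from your webrev):
>>>>>
>>>>>   #include "unittest.hpp"
>>>>>
>>>>>   TEST_VM(os_linux, container_init) {
>>>>>     OSContainer::init();
>>>>>     if (OSContainer::is_containerized()) {
>>>>>       // Whatever limits are set, we should see at least one processor.
>>>>>       ASSERT_GE(OSContainer::active_processor_count(), 1);
>>>>>     }
>>>>>   }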
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Bob.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Bob.
>>

