RFR: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette bob.vandette at oracle.com
Thu Oct 12 15:43:17 UTC 2017


> On Oct 11, 2017, at 9:04 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> Hi Bob,
> 
> On 12/10/2017 5:11 AM, Bob Vandette wrote:
>> Here’s an updated webrev for this RFE that contains changes and cleanups based on feedback I’ve received so far.
>> I’m still investigating the best approach for reacting to cpu shares and quotas.  I do not believe doing nothing is the answer.
> 
> I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently?
Because today we optimize for a lightly loaded system, but when running serverless applications in containers we should be
optimizing for a fully loaded system.  If developers don’t want this behavior, they can simply not set shares or quotas and they’ll
get exactly the behavior they have today.  I think we just have to document the new behavior (and how to turn it off) so people
know what to expect.
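
To make that concrete, the kind of quota/shares-to-CPU mapping being considered looks roughly like this (a sketch of the idea
only, not the webrev code; treating 1024 shares as one CPU’s worth is an assumption carried over from the cgroup documentation):

// Sketch only: derive a CPU count from cgroup v1 cpu.cfs_quota_us,
// cpu.cfs_period_us and cpu.shares, assuming the raw values have
// already been read from the cgroup filesystem.
#include <algorithm>
#include <cmath>

static int cpu_limit_from_cgroup(long quota, long period, long shares, int host_cpus) {
  int limit = host_cpus;
  if (quota > 0 && period > 0) {
    // e.g. a 50000us quota over a 100000us period is half a CPU, rounded up to 1
    limit = std::min(limit, (int) std::ceil((double) quota / (double) period));
  }
  if (shares > 0) {
    // 1024 shares is conventionally one CPU's worth
    limit = std::min(limit, (int) std::ceil((double) shares / 1024.0));
  }
  return std::max(limit, 1);
}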

You seem to discount the added cost of 100s of VMs creating lots of unnecessary threads.  In the current JDK 10 code base,
on a heavily loaded system with 88 processors, VmData grows from 60MB (1 cpu) to 376MB (88 cpus).  This is only mapped
memory, and how much of it ends up in VmRSS depends heavily on how deep in the stack these threads go, but it shows the
potential downside of having 100s of VMs each thinking they own the entire machine.

I haven’t even done any experiments to determine the added context-switching cost if the VM decides to create an excessive
number of pthreads.
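
To give a sense of how quickly thread counts scale with the processor count, the default GC worker sizing heuristic is roughly
the following (a sketch from memory, so treat the exact constants as approximate):

// Rough sketch of the default GC worker heuristic (approximately what
// Abstract_VM_Version computes today): all CPUs up to 8, then 5/8 of
// each CPU beyond 8.
static unsigned int default_parallel_gc_threads(unsigned int ncpus) {
  const unsigned int switch_pt = 8;
  return (ncpus <= switch_pt) ? ncpus
                              : switch_pt + ((ncpus - switch_pt) * 5) / 8;
}
// 88 CPUs -> 8 + (80 * 5) / 8 = 58 GC workers per VM, before counting
// compiler threads and other pools that also scale with the CPU count.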

> 
> That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report.
I don’t have a problem with active processor count reporting the number of processors we have, but I do have a problem
with our current usage of this information within the VM and Core libraries.

> 
>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01
>> Updates:
>> 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed.
> 
> I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo().
> 
> That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there.

This is exactly the reason why I put the call where I did.  I put the call to set_aggressive_heap_flags in finalize_vm_init_args
because that is exactly what this call is doing: finalizing flags after argument parsing.  The impacted flags are definitely
used shortly afterwards, before init_before_ergo is called.
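
Roughly, the placement looks like this (a sketch of the idea, not the exact webrev diff; the return-code handling shown here is
assumed):

jint Arguments::finalize_vm_init_args() {
  // ... existing finalization of the parsed arguments ...

  // The container memory limit must already be known at this point,
  // because the aggressive heap sizing is derived from "physical" memory.
  if (AggressiveHeap) {
    jint code = set_aggressive_heap_flags();
    if (code != JNI_OK) {
      return code;
    }
  }
  return JNI_OK;
}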

> 
>> 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results
>> even if someone manually updates the cgroup data.  I originally didn’t think this was the case since
>> sched_setaffinity didn’t automatically update the cpuset file contents but the inverse is true.
> 
> Ok.
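
For reference, the affinity-based count can be obtained along these lines (a plain sched_getaffinity/CPU_COUNT sketch, not the
webrev code):

#include <sched.h>

// Sketch: number of CPUs this process may currently run on.
// sched_getaffinity reflects the effective cpuset even when the cgroup
// files are edited after the process has started.
static int affinity_cpu_count() {
  cpu_set_t cpus;
  CPU_ZERO(&cpus);
  if (sched_getaffinity(0, sizeof(cpu_set_t), &cpus) == 0) {
    return CPU_COUNT(&cpus);
  }
  return -1;  // caller falls back to some other mechanism
}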
> 
>> 3. I ifdef’d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os
>> platform directories.  I can do this if it’s absolutely necessary.
> 
> You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init().

> No need for os::initialize_container_support() or os::pd_initialize_container_support.

But os::init_before_ergo is in shared code.
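
That’s why the webrev goes through the #ifdef’d hook in shared os.hpp; roughly (a sketch of the arrangement, not the exact
diff):

// src/hotspot/share/runtime/os.hpp -- shared code sees one hook,
// only the Linux build forwards it to the container code:
#ifdef LINUX
  static void initialize_container_support() { pd_initialize_container_support(); }
  static void pd_initialize_container_support();
#else
  static void initialize_container_support() { }   // no-op on other platforms
#endif

// src/hotspot/os/linux/os_linux.cpp:
void os::pd_initialize_container_support() {
  OSContainer::init();
}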

> 
> 
> Some further comments:
> 
> src/hotspot/share/runtime/globals.hpp
> 
> +           "Optimize heap optnios
> 
> Typo.

Thx.

> 
> +   product(intx, ActiveProcessorCount, -1,

Cut and paste issue, fixed.

> 
> Why intx? It can be int then the logging
> 
>     log_trace(os)("active_processor_count: "
>                   "active processor count set by user : %d",
>                   (int)ActiveProcessorCount);
> 
> can use %d without casts. Or you can init to 0 and make it uint (and use %u).
> 
> +   product(bool, UseContainerSupport, true,      \
> +           "(Linux Only)
> 
> Sorry don't recall if we already covered this, but this should be in ./os/linux/globals_linux.hpp

Fixed.
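
For reference, with those changes the declarations end up along these lines (sketched here, not the exact diff; the description
strings are paraphrased):

// os/linux/globals_linux.hpp (moved, per the comment above):
  product(bool, UseContainerSupport, true,                                  \
          "Enable detection and runtime container configuration support")   \

// share/runtime/globals.hpp, using int rather than intx so the logging
// can use %d without a cast:
  product(int, ActiveProcessorCount, -1,                                    \
          "Specify the CPU count the VM should use and report as active")   \

// and the log statement simply becomes:
  log_trace(os)("active_processor_count: "
                "active processor count set by user : %d",
                ActiveProcessorCount);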

> 
> ---
> 
> src/hotspot/os/linux/os_linux.cpp/.hpp
> 
> 187         log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem);
> 188         return avail_mem;
> 189       } else {
> 190         log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage);
> 
> Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging.

I use trace for noisy information that is not reporting errors, and debug for failures that are informational and not fatal.
In this case, the call could return -1 or -2: -1 means unlimited and -2 is an error.  In either case we fall back to the
standard system call to get available memory.  I would have used warning, but these messages were occurring during test
runs and causing test failures.
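
In other words, the pattern is (sketched to match the snippet quoted above; the host-side fallback helper name is made up):

julong os::Linux::available_memory() {
  if (OSContainer::is_containerized()) {
    jlong mem_limit = OSContainer::memory_limit_in_bytes();
    jlong mem_usage = OSContainer::memory_usage_in_bytes();
    if (mem_limit > 0 && mem_usage > 0) {
      julong avail_mem = mem_limit > mem_usage ? (julong)(mem_limit - mem_usage) : 0;
      log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem);
      return avail_mem;
    }
    // -1 (unlimited) or -2 (error): not fatal, just note it and fall through.
    log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage);
  }
  // Fall back to the standard system value (sysinfo on Linux).
  return host_available_memory();  // made-up name standing in for the existing non-container path
}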

> 
> ---
> 
> src/hotspot/os/linux/osContainer_linux.cpp
> 
> Dead code:
> 
> 376 #if 0
> 377   os::Linux::print_container_info(tty);
> ...
> 390 #endif

I left it in for standalone testing.  Should I use some other #if?

Bob.

> 
> Thanks,
> David
> 
>> Bob.


