RFR: 8146115 - Improve docker container detection and resource configuration usage

David Holmes david.holmes at oracle.com
Tue Sep 26 05:19:13 UTC 2017


Hi Bob,

I think there are some high-level decisions to be made regarding Linux 
only or seemingly shared support for containers - see below - but 
overall I don't have any major issues. I can't comment on the details of 
the actual low-level cgroup queries etc.

Comments below ...

Bob writes:
> Please review these changes that improve on docker container detection and the
> automatic configuration of the number of active CPUs and total and free memory
> based on the containers resource limitation settings and metric data files.
> 
> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
> 
> These changes are enabled with -XX:+UseContainerSupport.

If this is all confined to Linux only then this should be a linux-only 
flag and all the changes should be confined to linux code. No shared 
osContainer API is needed as it can be defined as a nested class of 
os::Linux, and will only be called from os_linux.cpp.

> 
> You can enable logging for this support via -Xlog:os+container=trace.

I'm not sure that is the right tagging for all situations. For example 
in os::Linux::available_memory you might log the actual amount for 
os+memory, regardless of whether containerized on not. You can of course 
also have container specific logging in the same function.

> Since the dynamic selection of CPUs based on cpusets, quotas and shares
> may not satisfy every users needs, I’ve added an additional flag to allow the
> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.

I would suggest that ActiveProcessorCount be constrained to being >1 - 
this is in line with our plans to get rid of AssumeMP/os::is_MP and 
always build in MP support. Otherwise a count of 1 today won't behave 
the same as a count of 1 in the future.

Also you have defined this globally but only accounted for it on Linux. 
I think it makes sense to support this flag on all platforms (a 
generalization of AssumeMP). Otherwise it needs to be defined as a 
Linux-only flag in the pd_globals.hpp file

---

src/os/linux/vm/os_linux.cpp

In os::Linux::available_memory you might want more container specific 
logging to track why you don't use container info even if containerized. 
It might help expose a misconfigured container.

In os::Linux::print_container_info I'd probably just print nothing if 
not containerized.

General style issue: we're not constrained to only declare local 
variables at the start of a block. You should be able to declare the 
variable at, or close to, the point of initialization.

Style issue:

2121     if (i < 0) st->print("OSContainer::active_processor_count() 
failed");
2122     else

and elsewhere. Please move the st->print to its own line. Others may 
argue for always using blocks ({}) in if/else.

2128       st->print("OSContainer::memory_limit_in_bytes: %ld", j);

And elsewhere: %ld should be JLONG_FORMAT when printing a jlong.

5024   // User has overridden the number of active processors
5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
5026     log_trace(os)("active_processor_count: "
5027                   "active processor count set by user : %d",
5028                   (int)ActiveProcessorCount);
5029     return ActiveProcessorCount;
5030   }

We don't normally check flags in runtime code like this - this will be 
executed on every call, and you will see that logging each time. This 
should be handled during initialization (os::Posix::init()? - if 
applying this flag globally) - with logging occurring once. The above 
should just reduce to:

if (ActiveProcessorCount > 0) {
   return ActiveProcessorCount; // explicit user control of number of cpus
}

Even then I do get concerned about having to always check for the least 
common cases before the most common one. :(

---

The osContainer_<os>.hpp files seem to be unnecessary as they are all empty.

---

osContainer.hpp:

The pd_* methods should be private.

You've exposed all of the potential cpu container functions as-if they 
might be used directly, rather than just internally when calculating 
available-processors. That's okay but you then need to document all the 
possible return values - in particular the distinction between -2, -1, 0 
and > 0.

---

osContainer.cpp

   41 bool OSContainer::is_containerized() {
   42   if (!_is_initialized) OSContainer::init();
   43   return _is_containerized;
   44 }

As OSContainer::init() is called at VM initialization the above should 
simply assert that _is_initialized is true - else it means your call to 
::init is in the wrong place and serves no purpose. That allows 
is_containerized to be trivially inlined and placed in the .hpp file.

---

osContainer_linux.cpp

As Paul commented you should check whether you should be using 
os::strdup, os::malloc, os::free etc to interact with NMT correctly - 
and also respond to OOM appropriately.

34 class CgroupSubsystem: CHeapObj<mtInternal> {

You defined this class as CHeapObj and added a destructor to free a few 
things, but I can't see where the instances of this class will 
themselves ever be freed.

62     void set_subsystem_path(char *cgroup_path) {

If this takes a "const char*" will it save you from casting string 
literals to "char*" elsewhere?

170 GEN_CONTAINER_GET_INFO(jlong, jlong, "%ld")

%ld should be JLONG_FORMAT

417   if (memlimit == 9223372036854771712)

Large constants should be defined using the CONST64 macro. And they 
should only be defined once as Paul suggested.

  485  * Algorythm:

Typo: -> Algorithm

509   cpus = OSContainer::cpu_cpuset_cpus();
516     log_error(os,container)("Error getting cpuset_cpucount");

That seems to be the wrong error message for the function called.

522     share_count = ceilf((float)share / 1024.0f);

The cast to float is not needed as you are dividing by a float.

  509   cpus = OSContainer::cpu_cpuset_cpus();
  510   if (cpus != (char *)CONTAINER_ERROR) {

This is invalid code - you can't cast -2 to char*. The only way to 
report an error from a char* function is an actual error message, or 
else NULL.

509   cpus = OSContainer::cpu_cpuset_cpus();

As you are inside a pd_* implementation, I don't think you want or need 
to call the public versions of the other OSContainer functions - you can 
just call the local pd_ variants directly. Nothing written at the 
OSContainer level should be allowed to affect what you calculate here 
inside your implementation.

555  * Return the number of miliseconds per period

Typo: -> milliseconds

525   else share_count = cpu_count;

Style (various locations): separate line and possibly {}

Thanks,
David
-----

> 
> Bob.



More information about the hotspot-dev mailing list