RFR: 8146115 - Improve docker container detection and resource configuration usage
David Holmes
david.holmes at oracle.com
Tue Sep 26 05:19:13 UTC 2017
Hi Bob,
I think there are some high-level decisions to be made regarding Linux
only or seemingly shared support for containers - see below - but
overall I don't have any major issues. I can't comment on the details of
the actual low-level cgroup queries etc.
Comments below ...
Bob writes:
> Please review these changes that improve on docker container detection and the
> automatic configuration of the number of active CPUs and total and free memory
> based on the containers resource limitation settings and metric data files.
>
> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>
> These changes are enabled with -XX:+UseContainerSupport.
If this is all confined to Linux only then this should be a linux-only
flag and all the changes should be confined to linux code. No shared
osContainer API is needed as it can be defined as a nested class of
os::Linux, and will only be called from os_linux.cpp.
>
> You can enable logging for this support via -Xlog:os+container=trace.
I'm not sure that is the right tagging for all situations. For example
in os::Linux::available_memory you might log the actual amount for
os+memory, regardless of whether containerized on not. You can of course
also have container specific logging in the same function.
> Since the dynamic selection of CPUs based on cpusets, quotas and shares
> may not satisfy every users needs, I’ve added an additional flag to allow the
> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx.
I would suggest that ActiveProcessorCount be constrained to being >1 -
this is in line with our plans to get rid of AssumeMP/os::is_MP and
always build in MP support. Otherwise a count of 1 today won't behave
the same as a count of 1 in the future.
Also you have defined this globally but only accounted for it on Linux.
I think it makes sense to support this flag on all platforms (a
generalization of AssumeMP). Otherwise it needs to be defined as a
Linux-only flag in the pd_globals.hpp file
---
src/os/linux/vm/os_linux.cpp
In os::Linux::available_memory you might want more container specific
logging to track why you don't use container info even if containerized.
It might help expose a misconfigured container.
In os::Linux::print_container_info I'd probably just print nothing if
not containerized.
General style issue: we're not constrained to only declare local
variables at the start of a block. You should be able to declare the
variable at, or close to, the point of initialization.
Style issue:
2121 if (i < 0) st->print("OSContainer::active_processor_count()
failed");
2122 else
and elsewhere. Please move the st->print to its own line. Others may
argue for always using blocks ({}) in if/else.
2128 st->print("OSContainer::memory_limit_in_bytes: %ld", j);
And elsewhere: %ld should be JLONG_FORMAT when printing a jlong.
5024 // User has overridden the number of active processors
5025 if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
5026 log_trace(os)("active_processor_count: "
5027 "active processor count set by user : %d",
5028 (int)ActiveProcessorCount);
5029 return ActiveProcessorCount;
5030 }
We don't normally check flags in runtime code like this - this will be
executed on every call, and you will see that logging each time. This
should be handled during initialization (os::Posix::init()? - if
applying this flag globally) - with logging occurring once. The above
should just reduce to:
if (ActiveProcessorCount > 0) {
return ActiveProcessorCount; // explicit user control of number of cpus
}
Even then I do get concerned about having to always check for the least
common cases before the most common one. :(
---
The osContainer_<os>.hpp files seem to be unnecessary as they are all empty.
---
osContainer.hpp:
The pd_* methods should be private.
You've exposed all of the potential cpu container functions as-if they
might be used directly, rather than just internally when calculating
available-processors. That's okay but you then need to document all the
possible return values - in particular the distinction between -2, -1, 0
and > 0.
---
osContainer.cpp
41 bool OSContainer::is_containerized() {
42 if (!_is_initialized) OSContainer::init();
43 return _is_containerized;
44 }
As OSContainer::init() is called at VM initialization the above should
simply assert that _is_initialized is true - else it means your call to
::init is in the wrong place and serves no purpose. That allows
is_containerized to be trivially inlined and placed in the .hpp file.
---
osContainer_linux.cpp
As Paul commented you should check whether you should be using
os::strdup, os::malloc, os::free etc to interact with NMT correctly -
and also respond to OOM appropriately.
34 class CgroupSubsystem: CHeapObj<mtInternal> {
You defined this class as CHeapObj and added a destructor to free a few
things, but I can't see where the instances of this class will
themselves ever be freed.
62 void set_subsystem_path(char *cgroup_path) {
If this takes a "const char*" will it save you from casting string
literals to "char*" elsewhere?
170 GEN_CONTAINER_GET_INFO(jlong, jlong, "%ld")
%ld should be JLONG_FORMAT
417 if (memlimit == 9223372036854771712)
Large constants should be defined using the CONST64 macro. And they
should only be defined once as Paul suggested.
485 * Algorythm:
Typo: -> Algorithm
509 cpus = OSContainer::cpu_cpuset_cpus();
516 log_error(os,container)("Error getting cpuset_cpucount");
That seems to be the wrong error message for the function called.
522 share_count = ceilf((float)share / 1024.0f);
The cast to float is not needed as you are dividing by a float.
509 cpus = OSContainer::cpu_cpuset_cpus();
510 if (cpus != (char *)CONTAINER_ERROR) {
This is invalid code - you can't cast -2 to char*. The only way to
report an error from a char* function is an actual error message, or
else NULL.
509 cpus = OSContainer::cpu_cpuset_cpus();
As you are inside a pd_* implementation, I don't think you want or need
to call the public versions of the other OSContainer functions - you can
just call the local pd_ variants directly. Nothing written at the
OSContainer level should be allowed to affect what you calculate here
inside your implementation.
555 * Return the number of miliseconds per period
Typo: -> milliseconds
525 else share_count = cpu_count;
Style (various locations): separate line and possibly {}
Thanks,
David
-----
>
> Bob.
More information about the hotspot-dev
mailing list