RFR: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette bob.vandette at oracle.com
Mon Oct 2 15:46:48 UTC 2017


> On Sep 27, 2017, at 9:20 PM, David Holmes <David.Holmes at oracle.com> wrote:
> 
> Hi Bob,
> 
> On 28/09/2017 1:45 AM, Bob Vandette wrote:
>> David,  Thank you for taking the time and providing a detailed review of these changes.
>> Where I haven’t responded, I’ll update the implementation based on your comments.
> 
> Okay. I've trimmed below to only leave things I have follow up on.
> 
>>> If this is all confined to Linux only then this should be a linux-only flag and all the changes should be confined to linux code. No shared osContainer API is needed as it can be defined as a nested class of os::Linux, and will only be called from os_linux.cpp.
>> I received feedback on my other Container work where I was asked to
>> make sure it was possible to support other container technologies.
>> The addition of the shared osContainer API is to prepare for this and
>> recognize that this will eventually be supported other platforms.
> 
> The problem is that the proposed osContainer API is totally cgroup centric. That API might not make sense for a different container technology. Even if Docker is used on different platforms, does it use cgroups on those other platforms? Until we have at least two examples we want to support we don't know how to formulate a generic API. So in my opinion we should initially keep this Linux specific as a proof-of-concept for future more general container support.

I was trying to prepare for the JEP implementation where M&M and JFR hooks will need a shared API to call.
I was expecting to return a not supported error code on platforms that didn’t have the os specific implementations.
I did take a look at a few other types of containers (VMWare’s SDK for example) and they all had similar types of functions
for retrieving the number of cpus and quotas along with the memory limits, swap and free space.  I assumed that we
could clean up the shared APIs once we did the second container support.

In any case that work can be done by the JEP integration so I’m ok with making this os/linux specific but I still would
like to keep this support in it’s own file (osContainer_linux.cpp and osContainer_linux.hpp) so all the cgroup processing
is kept separate and these files don’t have to move later.  This would make it easier to support  alternate types of containers.
I also wanted to avoid adding lots more size to os_linux.cpp.  It’s already too big.

Bob.

> 
>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>> may not satisfy every users needs, I’ve added an additional flag to allow the
>>>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>>> 
>>> I would suggest that ActiveProcessorCount be constrained to being >1 - this is in line with our plans to get rid of AssumeMP/os::is_MP and always build in MP support. Otherwise a count of 1 today won't behave the same as a count of 1 in the future.
>> What if I return true for is_MP anytime ActiveProcessorCount is set.   I’d like to provide the ability of specifying a single processor.
> 
> If I make the AssumeMP change for 18.3 as planned then this won't be an issue. I'd better get onto that :)
> 
>>> 
>>> Also you have defined this globally but only accounted for it on Linux. I think it makes sense to support this flag on all platforms (a generalization of AssumeMP). Otherwise it needs to be defined as a Linux-only flag in the pd_globals.hpp file
>> Good idea.
> 
> You could even factor this out as a separate issue/task independent of the container work.
> 
>>> Style issue:
>>> 
>>> 2121     if (i < 0) st->print("OSContainer::active_processor_count() failed");
>>> 2122     else
>>> 
>>> and elsewhere. Please move the st->print to its own line. Others may argue for always using blocks ({}) in if/else.
>> There doesn’t seem to be consistency on this issue.
> 
> No there's no consistency :( And this isn't in the hotspot style guide AFAICS. But I'm sure it's in some other coding guidelines ;-)
> 
>>> 5024   // User has overridden the number of active processors
>>> 5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
>>> 5026     log_trace(os)("active_processor_count: "
>>> 5027                   "active processor count set by user : %d",
>>> 5028                   (int)ActiveProcessorCount);
>>> 5029     return ActiveProcessorCount;
>>> 5030   }
>>> 
>>> We don't normally check flags in runtime code like this - this will be executed on every call, and you will see that logging each time. This should be handled during initialization (os::Posix::init()? - if applying this flag globally) - with logging occurring once. The above should just reduce to:
>>> 
>>> if (ActiveProcessorCount > 0) {
>>>  return ActiveProcessorCount; // explicit user control of number of cpus
>>> }
>>> 
>>> Even then I do get concerned about having to always check for the least common cases before the most common one. :(
>> This is not in a highly used function so it should be ok.
> 
> I really don't like seeing the FLAG_IS_DEFAULT in there - and you need to move the logging anyway.
> 
>>> 
>>> The osContainer_<os>.hpp files seem to be unnecessary as they are all empty.
>> I’ll remove them.  I wasn’t sure if there was a convention to move more of osContainer_linux.cpp -> osContainer_linux.hpp.
>> For example: classCgroupSubsystem
> 
> The header is only needed to expose an API for other code to use. Locally defined classes can be kept in the .cpp file.
> 
>>> 34 class CgroupSubsystem: CHeapObj<mtInternal> {
>>> 
>>> You defined this class as CHeapObj and added a destructor to free a few things, but I can't see where the instances of this class will themselves ever be freed
>> What’s the latest thinking on freeing CHeap Objects on termination?  Is it really worth wasting cpu cycles when our
>> process is about to terminate?  If not, I’ll just remove the destructors.
> 
> Philosophically I prefer new APIs to play nice with the invocation API, even if existing API's don't play nice. But that's just me.
> 
>>> 
>>> 62     void set_subsystem_path(char *cgroup_path) {
>>> 
>>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere?
>> I tried several different ways of declaring the container accessor functions and
>> always ended up with warnings due to scanf not being able to validate arguments
>> since the format string didn’t end up being a string literal.  I originally was using templates
>> and then ended up with the macros.  I tried several different casts but could resolve the problem.
> 
> Sounds like something Kim Barrett should take a look at :)
> 
> Thanks,
> David



More information about the hotspot-dev mailing list