RFR: 8197408: Bad pointer comparison and small cleanup in os_linux.cpp

Robbin Ehn robbin.ehn at oracle.com
Thu Feb 15 13:33:22 UTC 2018


Hi David,

On 2018-02-15 14:07, David Holmes wrote:
> On 15/02/2018 10:36 PM, Robbin Ehn wrote:
>> Hi David,
>>
>> On 2018-02-15 12:13, David Holmes wrote:
>>> Hi Robbin,
>>>
>>> On 15/02/2018 7:59 PM, Robbin Ehn wrote:
>>>> Hi,
>>>>
>>>> I notice there is something strange about container test.
>>>> They are not in any test group and mach5 don't run them when specified via path.
>>>
>>> That's because we don't have containerized test environments generally available and the tests have:
>>
>> In the actual tests we only care about cgroups as far as I can see?
> 
> Everything I see in runtime/containers/docker is utilising Docker not (just) cgroups.

The tests reads cgroups values, e.g.:

    private static void testTheSet(String setType) throws Exception {
         String cpuSetStr = CPUSetsReader.readFromProcStatus(setType);

Why do you think you need docker to read cgroup value? cgroups are a kernel feature of Linux.
Docker is a random application, who happens to use this feature.

> 
>> So (most/all?) of the tests should work perfectly fine outside docker with just cgroups.
> 
> Wouldn't you need root access to configure the cgroups outside of Docker??

We are already inside a container, e.g. the test CheckContainerized.java will say yes on mach5 as you see below.
 >> jib > [0.001s][trace][os,container] OSContainer::init: Initializing Container Support

If you need to change some value that will work for e.g. cpuset without root.
For other stuff supposedly you need root, but if that is the case docker also needs to be run as root.
I would just do LD_PRELOAD and give back whatever value I wanted to test for instead of launching docker just to set some cgroup values.

/Robbin

> 
> David
> ----
> 
>> And since cgroups are default ON (as in the scheduler will put all task into an unlimited group) ~all Linuxes are _default_ containerized.
>> Plus in mach5 we run in mesos which uses cgroups, so test would work for that reason also!
>>
>> mach5:
>> jib > [0.001s][trace][os,container] OSContainer::init: Initializing Container Support
>> jib > [0.001s][trace][os,container] Path to /memory.limit_in_bytes is /sys/fs/cgroup/memory/system.slice/mesos-slave.service/memory.limit_in_bytes
>> jib > [0.001s][trace][os,container] Memory Limit is: 9223372036854775807
>> jib > [0.001s][trace][os,container] Memory Limit is: Unlimited
>> jib > [0.001s][trace][os,container] Path to /cpu.shares is /sys/fs/cgroup/cpu,cpuacct/system.slice/mesos-slave.service/cpu.shares
>> jib > [0.001s][trace][os,container] CPU Shares is: 1024
>> ...
>>
>> We are default running in containerized environment.
>>
>> /Robbin
>>
>>
>>>
>>>   * @requires docker.support
>>>
>>> I've had the chance to look at this more closely and the refactoring seems okay to me.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>>
>>>> *sigh*
>>>>
>>>> So I have only manually verified this.
>>>>
>>>> Bob, can you have a look at changeset?
>>>>
>>>> /Robbin
>>>>
>>>> On 2018-02-15 10:37, Robbin Ehn wrote:
>>>>> Hi Thomas, thanks for having a look.
>>>>>
>>>>> On 2018-02-14 15:06, Thomas Stüfe wrote:
>>>>>> Hi Robin,
>>>>>>
>>>>>> Had a short look. Note that I cannot open the issue. Link is wrong, and the link in the source.patch is wrong too. So, I just looked at the webrev.
>>>>>>
>>>>>
>>>>> Sorry bug is:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8197408
>>>>>
>>>>> Inc here:
>>>>> http://cr.openjdk.java.net/~rehn/8197408/v2/inc/webrev/
>>>>> Full here:
>>>>> http://cr.openjdk.java.net/~rehn/8197408/v2/full/webrev/
>>>>>
>>>>> Comments on remarks below:
>>>>>
>>>>>> Remarks:
>>>>>>
>>>>>> -> This is a matter of taste but I prefer pointers to references for output variables. Makes the intent clearer at the calling site, and that way you could get rid of the /* output */ comment.
>>>>>
>>>>> We had a debate about this, my opinion was that avoiding the NULL check makes it worth using a ref, but I see I'm pretty lonely in this camp so fixed.
>>>>>
>>>>>>
>>>>>> -> in available_memory_container():
>>>>>> Could we also add "using host value" to the logging in error case as you do in physical_memory_container()?
>>>>>
>>>>> Fixed, and tried to improve the logs.
>>>>>
>>>>>>
>>>>>> -> (Not part of your patch) Can OSContainer::memory_limit_in_bytes() actually ever return "OSCONTAINER_ERROR"? I may be wrong here but:
>>>>>>
>>>>>> OSCONTAINER_ERROR = -2. /memory.limit_in_bytes gets returned as julong, so we have (julong)(-2). That gets compared with julong _unlimited_memory which is basically LONG_MAX, so signed long max, which should be smaller than (julong)(-2), or? So OSContainer::memory_limit_in_bytes() should always return -1 for both errors and the unlimited case.
>>>>>>
>>>>>
>>>>> Manually tested this, it seems to work. The -2 gets promoted to unsigned in comparison.
>>>>> We go from (jlong)-2 to (julong)ULONG_MAX-1 back to (jlong)-2.
>>>>> Not obviously that it will always work. I will not touch that in this changeset.
>>>>>
>>>>>> -> Can we:
>>>>>> -  st->print("container_type: %s\n", p != NULL ? p : "failed");
>>>>>> +  st->print_cr("container_type: %s", p != NULL ? p : "failed");
>>>>>> ?
>>>>>
>>>>> All of these use print + \n, you want me to just change this one or ?
>>>>> I'll rather leave them alone.
>>>>>
>>>>>>
>>>>>> -> char * OSContainer::container_type() - you free() that value but it is not strdup()ed.
>>>>>
>>>>> Thanks for seeing this, copy-paste :)
>>>>>
>>>>>>
>>>>>> (Btw I do not like that some functions return strduped values, some do not. If OSContainer::container_type() wants to return a static string, its return type should be at least const char*, not char*. They even added an explicit cast to nonconst char* .)
>>>>>
>>>>> Changed to const.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>>
>>>>>> Kind Regards, Thomas
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 14, 2018 at 2:18 PM, Robbin Ehn <robbin.ehn at oracle.com <mailto:robbin.ehn at oracle.com>> wrote:
>>>>>>
>>>>>>     Ping!
>>>>>>
>>>>>>     /Robbin
>>>>>>
>>>>>>
>>>>>>     On 2018-02-08 13:35, Robbin Ehn wrote:
>>>>>>
>>>>>>         Hi David,
>>>>>>
>>>>>>         On 02/08/2018 01:19 PM, David Holmes wrote:
>>>>>>
>>>>>>             On 8/02/2018 10:08 PM, Robbin Ehn wrote:
>>>>>>
>>>>>>                 Hi David,
>>>>>>
>>>>>>                 On 02/08/2018 12:43 PM, David Holmes wrote:
>>>>>>
>>>>>>                     Hi Robbin,
>>>>>>
>>>>>>                     On 8/02/2018 7:03 PM, Robbin Ehn wrote:
>>>>>>
>>>>>>                         Hi all,
>>>>>>
>>>>>>                         There is a bad pointer comparison in os_linux.cpp while looking at that
>>>>>>
>>>>>>
>>>>>>                     You seem to be missing the fact that OSContainer::cpu_cpuset_memory_nodes() can return a pointer or an error code.
>>>>>>
>>>>>>
>>>>>>                 This is not true for macro:
>>>>>>                 GET_CONTAINER_INFO_CPTR
>>>>>>
>>>>>>                 As far I can see?
>>>>>>
>>>>>>
>>>>>>             Sorry - you're right. I misread current code and misremembered what happened at the initial code review - where I'm sure this "pointer versus error code" issue was also flagged.
>>>>>>
>>>>>>                 It returns:
>>>>>>                 if (err != 0)
>>>>>>                     return (return_type) NULL;
>>>>>>                 or:
>>>>>>                 return os::strdup(mems);
>>>>>>
>>>>>>                 If you know a method that returns an integer in a char*, it's broken should be fixed.
>>>>>>
>>>>>>
>>>>>>                         I saw some if statement were missing bracket, a lot of extra scopes and complexity in the scoping.
>>>>>>
>>>>>>
>>>>>>                     You'd better check all this with Bob Vandette as its his container support code.
>>>>>>
>>>>>>
>>>>>>                 Not sure what you mean. It passes container tests.
>>>>>>
>>>>>>
>>>>>>             In what environment did you run the container tests? Most of the code you've been refactoring deals with various errors and misconfigurations that can occur.
>>>>>>
>>>>>>             I'm sure Bob will want a chance to check the refactoring still does as he intended.
>>>>>>
>>>>>>
>>>>>>         On mach5 all platforms with the container test (+ hotspot_tier1) and locally.
>>>>>>         I said before, regarding containers, all Linux have cgroups configured so this logic always thinks we are in a container and does this logging and calculation.
>>>>>>
>>>>>>         But no there is no tests for miss-configuration that I found.
>>>>>>
>>>>>>         Added Bob!
>>>>>>
>>>>>>         Thanks, Robbin
>>>>>>
>>>>>>
>>>>>>
>>>>>>             Cheers,
>>>>>>             David
>>>>>>             -----
>>>>>>
>>>>>>
>>>>>>                 Thanks, Robbin
>>>>>>
>>>>>>
>>>>>>                     Cheers,
>>>>>>                     David
>>>>>>
>>>>>>                         Webrev:
>>>>>> http://cr.openjdk.java.net/~rehn/8197408/webrev/ <http://cr.openjdk.java.net/~rehn/8197408/webrev/>
>>>>>>                         Bug:
>>>>>> http://cr.openjdk.java.net/~rehn/8197408/webrev/ <http://cr.openjdk.java.net/~rehn/8197408/webrev/>
>>>>>>
>>>>>>                         Thanks, Robbin
>>>>>>
>>>>>>


More information about the hotspot-runtime-dev mailing list