RFR: 8197408: Bad pointer comparison and small cleanup in os_linux.cpp

Tue Feb 27 10:05:57 UTC 2018

Hi Thomas,

You ok with this version also?
Full:
http://cr.openjdk.java.net/~rehn/8197408/v4/fulll/webrev/

Thanks, Robbin

On 02/26/2018 05:13 PM, Bob Vandette wrote:
> Sorry, didn’t realize you were waiting for another response.
> 
> The fix below looks good.
> 
> Bob.
> 
> 
>> On Feb 26, 2018, at 11:02 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>
>> Ping Bob,
>>
>> /Robbin
>>
>> On 02/20/2018 05:44 PM, Robbin Ehn wrote:
>>> Hi Bob, thanks for having another look.
>>> On 2018-02-20 16:13, Bob Vandette wrote:
>>>>
>>>>> On Feb 19, 2018, at 8:03 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> On 02/15/2018 04:50 PM, Bob Vandette wrote:
>>>>>> The changes look ok to me.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> I would have preferred that you kept the original logic embedded in the
>>>>>> higher level functions rather than adding additional static functions.  We’re hoping
>>>>>> to back port this capability to JDK 8 and it just makes the merges more error prone.
>>>>>
>>>>> Here I moved the logic back:
>>>>>
>>>>> http://cr.openjdk.java.net/~rehn/8197408/v3/inc/
>>>>> http://cr.openjdk.java.net/~rehn/8197408/v3/full/
>>>>>
>>>>> I prefer the v2 version of os_linux.cpp, stick with v3 or go back to v2?
>>>>
>>>> I prefer v3 although I still see changes that are unnecessary even in that changeset.
>>>>
>>>> My implementation of os::Linux::available_memory() is more efficient since I’m only testing
>>>> memory limit and usage once.  The compiler may optimize this but there is no need to check
>>>> usage if the limit cannot be read or is unlimited.  Also, memory_usage_in_bytes will never return unlimited.
>>>> Can’t you just update my log_debug lines to provide the additional information?
>>> Good catch on the memory_usage_in_bytes it snuck in v1->v2.
>>> diff -r 98ee8add2b00 src/hotspot/os/linux/os_linux.cpp
>>> --- a/src/hotspot/os/linux/os_linux.cpp    Mon Feb 19 10:24:37 2018 +0100
>>> +++ b/src/hotspot/os/linux/os_linux.cpp    Tue Feb 20 17:29:56 2018 +0100
>>> @@ -185,2 +185,1 @@
>>> -      log_debug(os, container)("container memory usage %s: " JLONG_FORMAT ", using host value",
>>> -                             mem_usage == OSCONTAINER_ERROR ? "failed" : "unlimited", mem_usage);
>>> +      log_debug(os, container)("container memory usage failed: " JLONG_FORMAT ", using host value", mem_usage);
>>> I don't either like multiple checking of limit, that is why I like v2 with a separate function for container and merging a new function should apply pretty cleanly.
>>> But my both versions is much simpler (as in easier to read), which was the cleanup.
>>> If you are worried about performance we should not use sscanf to read a file just containing a number, sscanf is slow.
>>> But I suggest we stick sscanf to avoid yet another macro.
>>>>
>>>> In print_container_info, if you are going to eliminate the multiple if/else entries for some of
>>>> the output, why didn’t you also do that for active_processor_count?
>>> active_processor_count only have 2 levels of nesting, while the code I cleanup had 4 levels.
>>> And this is not an attempt to clean the whole file.
>>> Also can you comment on test, do you see any cases that was missed or other issues?
>>> Thanks, Robbin
>>>>
>>>> Bob.
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> I also added a test which just verifies we can read the cgroup's values without failing when in a container. And if not in container verifies that we failed as expected.
>>>>>
>>>>> Both v2 and v3 passes this test via the in container case.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>> I’m also a bit uncomfortable with freeing potentially NULL pointers in print_container_info.
>>>>>> I’ve worked with too many different Linux clibs to trust that they all have the same
>>>>>> behavior but it is documented on both Linux and Bsd so I guess we’re ok.
>>>>>> Bob.
>>>>>>> On Feb 15, 2018, at 4:37 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Thomas, thanks for having a look.
>>>>>>>
>>>>>>> On 2018-02-14 15:06, Thomas Stüfe wrote:
>>>>>>>> Hi Robin,
>>>>>>>> Had a short look. Note that I cannot open the issue. Link is wrong, and the link in the source.patch is wrong too. So, I just looked at the webrev.
>>>>>>>
>>>>>>> Sorry bug is:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8197408
>>>>>>>
>>>>>>> Inc here:
>>>>>>> http://cr.openjdk.java.net/~rehn/8197408/v2/inc/webrev/
>>>>>>> Full here:
>>>>>>> http://cr.openjdk.java.net/~rehn/8197408/v2/full/webrev/
>>>>>>>
>>>>>>> Comments on remarks below:
>>>>>>>
>>>>>>>> Remarks:
>>>>>>>> -> This is a matter of taste but I prefer pointers to references for output variables. Makes the intent clearer at the calling site, and that way you could get rid of the /* output */ comment.
>>>>>>>
>>>>>>> We had a debate about this, my opinion was that avoiding the NULL check makes it worth using a ref, but I see I'm pretty lonely in this camp so fixed.
>>>>>>>
>>>>>>>> -> in available_memory_container():
>>>>>>>> Could we also add "using host value" to the logging in error case as you do in physical_memory_container()?
>>>>>>>
>>>>>>> Fixed, and tried to improve the logs.
>>>>>>>
>>>>>>>> -> (Not part of your patch) Can OSContainer::memory_limit_in_bytes() actually ever return "OSCONTAINER_ERROR"? I may be wrong here but:
>>>>>>>> OSCONTAINER_ERROR = -2. /memory.limit_in_bytes gets returned as julong, so we have (julong)(-2). That gets compared with julong _unlimited_memory which is basically LONG_MAX, so signed long max, which should be smaller than (julong)(-2), or? So OSContainer::memory_limit_in_bytes() should always return -1 for both errors and the unlimited case.
>>>>>>>
>>>>>>> Manually tested this, it seems to work. The -2 gets promoted to unsigned in comparison.
>>>>>>> We go from (jlong)-2 to (julong)ULONG_MAX-1 back to (jlong)-2.
>>>>>>> Not obviously that it will always work. I will not touch that in this changeset.
>>>>>>>
>>>>>>>> -> Can we:
>>>>>>>> -  st->print("container_type: %s\n", p != NULL ? p : "failed");
>>>>>>>> +  st->print_cr("container_type: %s", p != NULL ? p : "failed");
>>>>>>>> ?
>>>>>>>
>>>>>>> All of these use print + \n, you want me to just change this one or ?
>>>>>>> I'll rather leave them alone.
>>>>>>>
>>>>>>>> -> char * OSContainer::container_type() - you free() that value but it is not strdup()ed.
>>>>>>>
>>>>>>> Thanks for seeing this, copy-paste :)
>>>>>>>
>>>>>>>> (Btw I do not like that some functions return strduped values, some do not. If OSContainer::container_type() wants to return a static string, its return type should be at least const char*, not char*. They even added an explicit cast to nonconst char* .)
>>>>>>>
>>>>>>> Changed to const.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>>> Kind Regards, Thomas
>>>>>>>> On Wed, Feb 14, 2018 at 2:18 PM, Robbin Ehn <robbin.ehn at oracle.com <mailto:robbin.ehn at oracle.com>> wrote:
>>>>>>>>      Ping!
>>>>>>>>      /Robbin
>>>>>>>>      On 2018-02-08 13:35, Robbin Ehn wrote:
>>>>>>>>          Hi David,
>>>>>>>>          On 02/08/2018 01:19 PM, David Holmes wrote:
>>>>>>>>              On 8/02/2018 10:08 PM, Robbin Ehn wrote:
>>>>>>>>                  Hi David,
>>>>>>>>                  On 02/08/2018 12:43 PM, David Holmes wrote:
>>>>>>>>                      Hi Robbin,
>>>>>>>>                      On 8/02/2018 7:03 PM, Robbin Ehn wrote:
>>>>>>>>                          Hi all,
>>>>>>>>                          There is a bad pointer comparison in os_linux.cpp while looking at that
>>>>>>>>                      You seem to be missing the fact that OSContainer::cpu_cpuset_memory_nodes() can return a pointer or an error code.
>>>>>>>>                  This is not true for macro:
>>>>>>>>                  GET_CONTAINER_INFO_CPTR
>>>>>>>>                  As far I can see?
>>>>>>>>              Sorry - you're right. I misread current code and misremembered what happened at the initial code review - where I'm sure this "pointer versus error code" issue was also flagged.
>>>>>>>>                  It returns:
>>>>>>>>                  if (err != 0)
>>>>>>>>                      return (return_type) NULL;
>>>>>>>>                  or:
>>>>>>>>                  return os::strdup(mems);
>>>>>>>>                  If you know a method that returns an integer in a char*, it's broken should be fixed.
>>>>>>>>                          I saw some if statement were missing bracket, a lot of extra scopes and complexity in the scoping.
>>>>>>>>                      You'd better check all this with Bob Vandette as its his container support code.
>>>>>>>>                  Not sure what you mean. It passes container tests.
>>>>>>>>              In what environment did you run the container tests? Most of the code you've been refactoring deals with various errors and misconfigurations that can occur.
>>>>>>>>              I'm sure Bob will want a chance to check the refactoring still does as he intended.
>>>>>>>>          On mach5 all platforms with the container test (+ hotspot_tier1) and locally.
>>>>>>>>          I said before, regarding containers, all Linux have cgroups configured so this logic always thinks we are in a container and does this logging and calculation.
>>>>>>>>          But no there is no tests for miss-configuration that I found.
>>>>>>>>          Added Bob!
>>>>>>>>          Thanks, Robbin
>>>>>>>>              Cheers,
>>>>>>>>              David
>>>>>>>>              -----
>>>>>>>>                  Thanks, Robbin
>>>>>>>>                      Cheers,
>>>>>>>>                      David
>>>>>>>>                          Webrev:
>>>>>>>>                          http://cr.openjdk.java.net/~rehn/8197408/webrev/ <http://cr.openjdk.java.net/~rehn/8197408/webrev/>
>>>>>>>>                          Bug:
>>>>>>>>                          http://cr.openjdk.java.net/~rehn/8197408/webrev/ <http://cr.openjdk.java.net/~rehn/8197408/webrev/>
>>>>>>>>                          Thanks, Robbin
>>>>
>