RFR [XS]: 8244500: jtreg test error in test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
Bob Vandette
bob.vandette at oracle.com
Wed May 20 20:16:32 UTC 2020
What do you think about fixing getTotalSwapSpaceSze and getFreeSwapSpaceSize to return 0L if either of
the underlying Metrics used to calculate the result return -1. Of course we only do this if containerMetrics != null.
Failure to read these values, means we have no swap. If we have no swap we also have no free swap. 0 seems
like an appropriate value to represent this state.
It seems as though some kernels report 0 when kernel swap limit is not enabled while others don’t populate
the memory status files.
Bob.
> On May 20, 2020, at 3:29 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>
> Hi Matthias,
>
> On Wed, 2020-05-20 at 11:07 +0000, Baesken, Matthias wrote:
>> Hi Severin ,
>>
>>> FWIW, parts of this work-around got added by JDK-8236617. Looking at
>>> the bug it was observed that:
>>>
>>> containerMetrics.getMemoryAndSwapLimit() == 0 and
>>> containerMetrics.getMemoryLimit() returned a large value.
>>>
>>> So then with JDK-8231111 I suspect
>>> containerMetrics.getMemoryAndSwapLimit() now returns -1 and, thus,
>>> getTotalSwapSpaceSize0() is being used instead of the container values.
>>
>> I added some tracing output to OperatingSystemImpl.getTotalSwapSpaceSize() and get these values :
>>
>> containerMetrics.getMemoryAndSwapLimit():-1 containerMetrics.getMemoryLimit():-1
>> ****** getTotalSwapSpaceSize0() returned 16106061824
>
> Thanks for this. This seems to confirm my suspicion.
>
> So my working theory here is that the system where you are observing
> the issue now is similar to the one when JDK-8236617 was discovered.
> For some reason containerMetrics.getMemoryAndSwapLimit() returned 0 for
> the case when the necessary files were missing (which was wrong IMO).
> After JDK-8231111 there is a clear distinction for this case: -1 is
> being returned. That makes more sense to me, as there wouldn't be any
> container limit to take into account.
>
> Your trace output seems to confirm that:
>
> containerMetrics.getMemoryAndSwapLimit():-1
> containerMetrics.getMemoryLimit():-1
>
>>
>>> So far the theory, I'd like to gather more data before we come up with
>>> another patch, though. In JDK-8236617 it wasn't investigated why
>>> containerMetrics.getMemoryAndSwapLimit() returned 0 on such a system.
>>> It would be good to know why. Could it be that file
>>> 'memory.memsw.limit_in_bytes' doesn't exist on such a system?
>>
>> True, it does not exist :
>>
>> ls -alL /sys/fs/cgroup/memory/
>> total 0
>> drwxr-xr-x 3 root root 0 Mar 16 11:00 .
>> drwxr-xr-x 11 root root 260 Mar 16 11:00 ..
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 cgroup.clone_children
>> --w--w--w- 1 root root 0 Mar 16 11:00 cgroup.event_control
>> -rw-r--r-- 1 root root 0 May 20 13:00 cgroup.procs
>> -r--r--r-- 1 root root 0 Mar 16 11:00 cgroup.sane_behavior
>> drwxr-xr-x 2 root root 0 May 20 12:52 docker
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.failcnt
>> --w------- 1 root root 0 Mar 16 11:00 memory.force_empty
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.limit_in_bytes
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.low_limit_in_bytes
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.max_usage_in_bytes
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.move_charge_at_immigrate
>> -r--r--r-- 1 root root 0 Mar 16 11:00 memory.numa_stat
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.oom_control
>> ---------- 1 root root 0 Mar 16 11:00 memory.pressure_level
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.soft_limit_in_bytes
>> -r--r--r-- 1 root root 0 Mar 16 11:00 memory.stat
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 memory.swappiness
>> -r--r--r-- 1 root root 0 Mar 16 11:00 memory.usage_in_bytes
>> -rw-r--r-- 1 root root 0 May 20 13:00 memory.use_hierarchy
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 notify_on_release
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 release_agent
>> -rw-r--r-- 1 root root 0 Mar 16 11:00 tasks
>
> Hmm, is this inside a container on this system with --memory and --
> memory-swap being specified on the docker command line?
>
> Anyway, I think we should change the test code similar to this:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8244500/01/webrev/
>
> If this works for you, then that's what I'd suggest as a fix.
>
> Note that I haven't tested this patch. I need to recreate my set-up for
> proper docker testing here :-/
>
> Thanks,
> Severin
>
>> Best regards, Matthias
>>
>>
>>
>> -----Original Message-----
>> From: Severin Gehwolf <sgehwolf at redhat.com>
>> Sent: Montag, 18. Mai 2020 12:09
>> To: jiefu(傅杰) <jiefu at tencent.com>; David Holmes <david.holmes at oracle.com>; Baesken, Matthias <matthias.baesken at sap.com>; 'hotspot-dev at openjdk.java.net' <hotspot-dev at openjdk.java.net>
>> Cc: Bob Vandette <Bob.Vandette at oracle.com>
>> Subject: Re: RFR [XS]: 8244500: jtreg test error in test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
>>
>> Hi,
>>
>> On Mon, 2020-05-18 at 06:15 +0000, jiefu(傅杰) wrote:
>>> Hi David,
>>>
>>> This bug should be introduced by JDK-8242480.
>>> I'm sorry for that.
>>>
>>> And I think the logic in src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java should be fixed.
>>>
>>> In my fix for JDK-8242480, I had ignored the case for kernels which don't support "--memory-swap 150M".
>>> In that case, containerMetrics.getMemoryAndSwapLimit()[1] will return -1.
>>
>> It appears so that containerMetrics.getMemoryAndSwapLimit() returned 0
>> prior JDK-8231111 and -1 after. I'm not convinced JDK-8242480
>> introduced this.
>>
>>> I didn't notice this bug because all our kernels support --memory-swap.
>>> Fortunately, Matthias found this bug and will fix it.
>>
>> I'm still not sure what's happening in Matthias' case on that system.
>> It would be best to first try to understand the actual problem better.
>> The info I have is:
>>
>> - It's a cgroup v1 system
>> - The system now returns large values for
>> OperatingSystemMXBean.getTotalSwapSpaceSize() and
>> OperatingSystemMXBean.getFreeSwapSpaceSize()
>>
>> It would be good to know what the actual return values are for the
>> container metrics on this system with swap limit disabled at kernel
>> level. And also whether or not relevant cgroup files exist on such a
>> system (within docker).
>>
>> FWIW, parts of this work-around got added by JDK-8236617. Looking at
>> the bug it was observed that:
>>
>> containerMetrics.getMemoryAndSwapLimit() == 0 and
>> containerMetrics.getMemoryLimit() returned a large value.
>>
>> So then with JDK-8231111 I suspect
>> containerMetrics.getMemoryAndSwapLimit() now returns -1 and, thus,
>> getTotalSwapSpaceSize0() is being used instead of the container values.
>>
>> So far the theory, I'd like to gather more data before we come up with
>> another patch, though. In JDK-8236617 it wasn't investigated why
>> containerMetrics.getMemoryAndSwapLimit() returned 0 on such a system.
>> It would be good to know why. Could it be that file
>> 'memory.memsw.limit_in_bytes' doesn't exist on such a system?
>>
>> Unfortunately, I don't know what config one needs in order to reproduce
>> so we'll have to rely on Matthias to gather the info.
>>
>> Thanks,
>> Severin
>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/a95911422005/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java#l73
>>>
>>> On 2020/5/18, 1:48 PM, "David Holmes" <david.holmes at oracle.com> wrote:
>>>
>>> Hi Matthias,
>>>
>>> On 16/05/2020 12:59 am, Baesken, Matthias wrote:
>>>> Hello David , I get the warning always on this linux machine (I think it depends on machine setup and/or kernel version).
>>>>
>>>>> Surely it must be specified to return 0 or some such sentinel in such a case?
>>>>
>>>> The doc just says this :
>>>>
>>>> 62 /**
>>>> 63 * Returns the total amount of swap space in bytes.
>>>> 64 *
>>>> 65 * @return the total amount of swap space in bytes.
>>>> 66 */
>>>> 67 public long getTotalSwapSpaceSize();
>>>>
>>>> ( however it returned previously 0 before the latest changes in the container support code came in ).
>>>
>>> That sounds like a bug was introduced to me then.
>>>
>>>> Regarding the getFreeSwapSpaceSize issue , Jie provided a change to src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java
>>>> Because of this I could remove the try-construct .
>>>>
>>>> New webrev :
>>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8244500.3/
>>>
>>> I'm quite confused by your changes. Your first version simply adjusted
>>> the test to check for the warning. But now you are also changing the
>>> main logic. I can't assess the validity of what you are doing sorry.
>>>
>>> I've cc'd Severin and Bob as I think they need to evaluate what the real
>>> problem is here and when it was introduced.
>>>
>>> David
>>> ----
>>>
>>>> Best regards, Matthias
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Donnerstag, 14. Mai 2020 07:02
>>>> To: Baesken, Matthias <matthias.baesken at sap.com>; 'hotspot-dev at openjdk.java.net' <hotspot-dev at openjdk.java.net>
>>>> Cc: jiefu(傅杰) <jiefu at tencent.com>
>>>> Subject: Re: RFR [XS]: 8244500: jtreg test error in test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
>>>>
>>>> On 14/05/2020 12:28 am, Baesken, Matthias wrote:
>>>>> Hello, here is a new webrev removing the comment :
>>>>>
>>>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8244500.1/
>>>>>
>>>>> It removes the comment , and it also handles 0 for getFreeSwapSpaceSize
>>>>> ( this can occur after JDK-8242480 : Negative value may be returned by getFreeSwapSpaceSize() in the docker was pushed recently ).
>>>>
>>>> Sorry Matthias but I still see this partly as a bug in the underlying
>>>> container support code. If you get the warning then what does
>>>> getTotalSwapSpaceSize() return? Surely it must be specified to return 0
>>>> or some such sentinel in such a case?
>>>>
>>>> Other comments:
>>>>
>>>> ! try {
>>>> !
>>>> out.shouldMatch("OperatingSystemMXBean\\.getFreeSwapSpaceSize:
>>>> [1-9][0-9]+");
>>>> ! } catch(RuntimeException ex) { // allow 0, see 8242480
>>>> !
>>>> out.shouldContain("OperatingSystemMXBean.getFreeSwapSpaceSize: 0");
>>>> ! }
>>>>
>>>> can't you just change the original regex to match >= 0?
>>>>
>>>> Style nit: avoid historical comments like "see 8242480" and "since
>>>> 8231111 ...".
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Best regards, Matthias
>>>>>
>>>>> -----Original Message-----
>>>>> From: Baesken, Matthias
>>>>> Sent: Mittwoch, 13. Mai 2020 10:11
>>>>> To: 'David Holmes' <david.holmes at oracle.com>; 'hotspot-dev at openjdk.java.net' <hotspot-dev at openjdk.java.net>
>>>>> Subject: RE: RFR [XS]: 8244500: jtreg test error in test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
>>>>>
>>>>>
>>>>>> But according to the comment immediately before your changes:
>>>>>>
>>>>>> // in case of warnings like : "Your kernel does not support swap limit
>>>>>> capabilities or the cgroup is not mounted. Memory limited without swap."
>>>>>> // the getTotalSwapSpaceSize does not return the expected result, but 0
>>>>>>
>>>>>> we should be returning zero. So this seems to be a bug in the
>>>>>> implementation not in the test.
>>>>>
>>>>>
>>>>> Hi David , I think in case of "Your kernel does not support swap limit capabilities ... " we just do not get the values
>>>>> that are normally expected.
>>>>> Previously we indeed got "0" .
>>>>> Probably the comment should be deleted .
>>>>>
>>>>> Best regards, Matthias
>>>>>
>>>
>>>
>>>
>
More information about the hotspot-dev
mailing list