RFR [XS]: 8244500: jtreg test error in test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java

Severin Gehwolf sgehwolf at redhat.com
Wed Jun 3 13:46:46 UTC 2020


Hi Matthias,

On Fri, 2020-05-22 at 13:36 +0000, Baesken, Matthias wrote:
> > Anyway, I think we should change the test code similar to this:
> > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8244500/01/webrev/
> 
> Hi Severin, this gave me :
> 
> Checking OperatingSystemMXBean
> Exception in thread "main" java.lang.IllegalAccessError: class CheckOperatingSystemMXBean (in unnamed module @0x5acf9800) cannot access class jdk.internal.platform.Container (in module java.base) because module java.base does not export jdk.internal.platform to unnamed module @0x5acf9800
> 	at CheckOperatingSystemMXBean.main(CheckOperatingSystemMXBean.java:33)
> 
> (a bit unexpected because you added it to the test, I though this would be sufficient !?)

OK, reproduced. FWIW, 01 was an untested patch ;-)

Please try this updated webrev:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8244500/02/webrev/

It works fine here for cgroups v2 and cgroups v1.

> (plus we  still  need Jie's adjustment to fix getFreeSwapSpaceSize  )

Yes, but I'd rather fix this as a separate issue. I don't think it's
related.

Thanks,
Severin

> -----Original Message-----
> From: Severin Gehwolf <sgehwolf at redhat.com> 
> Sent: Mittwoch, 20. Mai 2020 21:30
> To: Baesken, Matthias <matthias.baesken at sap.com>; jiefu(傅杰) <
> jiefu at tencent.com>; David Holmes <david.holmes at oracle.com>; 
> 'hotspot-dev at openjdk.java.net' <hotspot-dev at openjdk.java.net>
> Cc: Bob Vandette <Bob.Vandette at oracle.com>
> Subject: Re: RFR [XS]: 8244500: jtreg test error in
> test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
> 
> Hi Matthias,
> 
> On Wed, 2020-05-20 at 11:07 +0000, Baesken, Matthias wrote:
> > Hi Severin , 
> > 
> > > FWIW, parts of this work-around got added by JDK-8236617. Looking
> > > at
> > > the bug it was observed that:
> > > 
> > > containerMetrics.getMemoryAndSwapLimit() == 0 and 
> > > containerMetrics.getMemoryLimit() returned a large value.
> > > 
> > > So then with JDK-8231111 I suspect
> > > containerMetrics.getMemoryAndSwapLimit() now returns -1 and,
> > > thus, 
> > > getTotalSwapSpaceSize0() is being used instead of the container
> > > values.
> > 
> > I added some tracing output
> > to  OperatingSystemImpl.getTotalSwapSpaceSize() and get these
> > values :
> > 
> > containerMetrics.getMemoryAndSwapLimit():-1
> > containerMetrics.getMemoryLimit():-1
> > ****** getTotalSwapSpaceSize0() returned 16106061824
> 
> Thanks for this. This seems to confirm my suspicion.
> 
> So my working theory here is that the system where you are observing
> the issue now is similar to the one when JDK-8236617 was discovered.
> For some reason containerMetrics.getMemoryAndSwapLimit() returned 0
> for
> the case when the necessary files were missing (which was wrong IMO).
> After JDK-8231111 there is a clear distinction for this case: -1 is
> being returned. That makes more sense to me, as there wouldn't be any
> container limit to take into account.
> 
> Your trace output seems to confirm that:
> 
> containerMetrics.getMemoryAndSwapLimit():-1
> containerMetrics.getMemoryLimit():-1
> 
> > > So far the theory, I'd like to gather more data before we come up
> > > with
> > > another patch, though. In JDK-8236617 it wasn't investigated why
> > > containerMetrics.getMemoryAndSwapLimit() returned 0 on such a
> > > system.
> > > It would be good to know why. Could it be that file
> > > 'memory.memsw.limit_in_bytes' doesn't exist on such a system?
> > 
> > True, it does not exist :
> > 
> > ls -alL  /sys/fs/cgroup/memory/
> > total 0
> > drwxr-xr-x  3 root root   0 Mar 16 11:00 .
> > drwxr-xr-x 11 root root 260 Mar 16 11:00 ..
> > -rw-r--r--  1 root root   0 Mar 16 11:00 cgroup.clone_children
> > --w--w--w-  1 root root   0 Mar 16 11:00 cgroup.event_control
> > -rw-r--r--  1 root root   0 May 20 13:00 cgroup.procs
> > -r--r--r--  1 root root   0 Mar 16 11:00 cgroup.sane_behavior
> > drwxr-xr-x  2 root root   0 May 20 12:52 docker
> > -rw-r--r--  1 root root   0 Mar 16 11:00 memory.failcnt
> > --w-------  1 root root   0 Mar 16 11:00 memory.force_empty
> > -rw-r--r--  1 root root   0 Mar 16 11:00 memory.limit_in_bytes
> > -rw-r--r--  1 root root   0 Mar 16 11:00 memory.low_limit_in_bytes
> > -rw-r--r--  1 root root   0 Mar 16 11:00 memory.max_usage_in_bytes
> > -rw-r--r--  1 root root   0 Mar 16 11:00
> > memory.move_charge_at_immigrate
> > -r--r--r--  1 root root   0 Mar 16 11:00 memory.numa_stat
> > -rw-r--r--  1 root root   0 Mar 16 11:00 memory.oom_control
> > ----------  1 root root   0 Mar 16 11:00 memory.pressure_level
> > -rw-r--r--  1 root root   0 Mar 16 11:00 memory.soft_limit_in_bytes
> > -r--r--r--  1 root root   0 Mar 16 11:00 memory.stat
> > -rw-r--r--  1 root root   0 Mar 16 11:00 memory.swappiness
> > -r--r--r--  1 root root   0 Mar 16 11:00 memory.usage_in_bytes
> > -rw-r--r--  1 root root   0 May 20 13:00 memory.use_hierarchy
> > -rw-r--r--  1 root root   0 Mar 16 11:00 notify_on_release
> > -rw-r--r--  1 root root   0 Mar 16 11:00 release_agent
> > -rw-r--r--  1 root root   0 Mar 16 11:00 tasks
> 
> Hmm, is this inside a container on this system with --memory and --
> memory-swap being specified on the docker command line?
> 
> Anyway, I think we should change the test code similar to this:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8244500/01/webrev/
> 
> If this works for you, then that's what I'd suggest as a fix.
> 
> Note that I haven't tested this patch. I need to recreate my set-up
> for
> proper docker testing here :-/
> 
> Thanks,
> Severin
> 
> > Best regards, Matthias
> > 
> > 
> > 
> > -----Original Message-----
> > From: Severin Gehwolf <sgehwolf at redhat.com> 
> > Sent: Montag, 18. Mai 2020 12:09
> > To: jiefu(傅杰) <jiefu at tencent.com>; David Holmes <
> > david.holmes at oracle.com>; Baesken, Matthias <
> > matthias.baesken at sap.com>; 'hotspot-dev at openjdk.java.net' <
> > hotspot-dev at openjdk.java.net>
> > Cc: Bob Vandette <Bob.Vandette at oracle.com>
> > Subject: Re: RFR [XS]: 8244500: jtreg test error in
> > test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
> > 
> > Hi,
> > 
> > On Mon, 2020-05-18 at 06:15 +0000, jiefu(傅杰) wrote:
> > > Hi David,
> > > 
> > > This bug should be introduced by JDK-8242480.
> > > I'm sorry for that.
> > > 
> > > And I think the logic in
> > > src/jdk.management/unix/classes/com/sun/management/internal/Opera
> > > tingSystemImpl.java should be fixed.
> > > 
> > > In my fix for JDK-8242480, I had ignored the case for kernels
> > > which don't support "--memory-swap 150M".
> > > In that case, containerMetrics.getMemoryAndSwapLimit()[1] will
> > > return -1.
> > 
> > It appears so that containerMetrics.getMemoryAndSwapLimit()
> > returned 0
> > prior JDK-8231111 and -1 after. I'm not convinced JDK-8242480
> > introduced this.
> > 
> > > I didn't notice this bug because all our kernels support --
> > > memory-swap.
> > > Fortunately, Matthias found this bug and will fix it.
> > 
> > I'm still not sure what's happening in Matthias' case on that
> > system.
> > It would be best to first try to understand the actual problem
> > better.
> > The info I have is:
> > 
> > - It's a cgroup v1 system
> > - The system now returns large values for
> >   OperatingSystemMXBean.getTotalSwapSpaceSize() and
> >   OperatingSystemMXBean.getFreeSwapSpaceSize()
> > 
> > It would be good to know what the actual return values are for the
> > container metrics on this system with swap limit disabled at kernel
> > level. And also whether or not relevant cgroup files exist on such
> > a
> > system (within docker).
> > 
> > FWIW, parts of this work-around got added by JDK-8236617. Looking
> > at
> > the bug it was observed that:
> > 
> > containerMetrics.getMemoryAndSwapLimit() == 0 and 
> > containerMetrics.getMemoryLimit() returned a large value.
> > 
> > So then with JDK-8231111 I suspect
> > containerMetrics.getMemoryAndSwapLimit() now returns -1 and, thus, 
> > getTotalSwapSpaceSize0() is being used instead of the container
> > values.
> > 
> > So far the theory, I'd like to gather more data before we come up
> > with
> > another patch, though. In JDK-8236617 it wasn't investigated why
> > containerMetrics.getMemoryAndSwapLimit() returned 0 on such a
> > system.
> > It would be good to know why. Could it be that file
> > 'memory.memsw.limit_in_bytes' doesn't exist on such a system?
> > 
> > Unfortunately, I don't know what config one needs in order to
> > reproduce
> > so we'll have to rely on Matthias to gather the info.
> > 
> > Thanks,
> > Severin
> > 
> > > Thanks a lot.
> > > Best regards,
> > > Jie
> > > 
> > > [1] 
> > > http://hg.openjdk.java.net/jdk/jdk/file/a95911422005/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java#l73
> > > 
> > > On 2020/5/18, 1:48 PM, "David Holmes" <david.holmes at oracle.com>
> > > wrote:
> > > 
> > >     Hi Matthias,
> > >     
> > >     On 16/05/2020 12:59 am, Baesken, Matthias wrote:
> > >     > Hello David ,  I get  the warning  always on this linux
> > > machine (I think it depends on machine setup and/or
> > > kernel   version).
> > >     > 
> > >     >> Surely it must be specified to return 0  or some such
> > > sentinel in such a case?
> > >     > 
> > >     > The doc just says this :
> > >     > 
> > >     > 62    /**
> > >     > 63     * Returns the total amount of swap space in bytes.
> > >     > 64     *
> > >     > 65     * @return the total amount of swap space in bytes.
> > >     > 66     */
> > >     > 67    public long getTotalSwapSpaceSize();
> > >     > 
> > >     > ( however  it returned previously 0 before the latest
> > > changes in the container support code came in ).
> > >     
> > >     That sounds like a bug was introduced to me then.
> > >     
> > >     >
> > > Regarding  the  getFreeSwapSpaceSize  issue  ,   Jie   provided a
> > > change
> > > to  src/jdk.management/unix/classes/com/sun/management/internal/O
> > > peratingSystemImpl.java
> > >     >   Because of this   I could remove  the try-construct .
> > >     > 
> > >     > New  webrev :
> > >     > http://cr.openjdk.java.net/~mbaesken/webrevs/8244500.3/
> > >     
> > >     I'm quite confused by your changes. Your first version simply
> > > adjusted 
> > >     the test to check for the warning. But now you are also
> > > changing the 
> > >     main logic. I can't assess the validity of what you are doing
> > > sorry.
> > >     
> > >     I've cc'd Severin and Bob as I think they need to evaluate
> > > what the real 
> > >     problem is here and when it was introduced.
> > >     
> > >     David
> > >     ----
> > >     
> > >     > Best regards, Matthias
> > >     > 
> > >     > 
> > >     > 
> > >     > -----Original Message-----
> > >     > From: David Holmes <david.holmes at oracle.com>
> > >     > Sent: Donnerstag, 14. Mai 2020 07:02
> > >     > To: Baesken, Matthias <matthias.baesken at sap.com>; 
> > > 'hotspot-dev at openjdk.java.net' <hotspot-dev at openjdk.java.net>
> > >     > Cc: jiefu(傅杰) <jiefu at tencent.com>
> > >     > Subject: Re: RFR [XS]: 8244500: jtreg test error in
> > > test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
> > >     > 
> > >     > On 14/05/2020 12:28 am, Baesken, Matthias wrote:
> > >     >> Hello, here is a new webrev removing the comment :
> > >     >>
> > >     >> http://cr.openjdk.java.net/~mbaesken/webrevs/8244500.1/
> > >     >>
> > >     >> It removes the comment , and it also  handles  0  for
> > > getFreeSwapSpaceSize
> > >     >> (  this can occur  after   JDK-8242480 : Negative value
> > > may be returned by getFreeSwapSpaceSize() in the docker   was
> > > pushed  recently ).
> > >     > 
> > >     > Sorry Matthias but I still see this partly as a bug in the
> > > underlying
> > >     > container support code. If you get the warning then what
> > > does
> > >     > getTotalSwapSpaceSize() return? Surely it must be specified
> > > to return 0
> > >     > or some such sentinel in such a case?
> > >     > 
> > >     > Other comments:
> > >     > 
> > >     > !         try {
> > >     > !
> > >     >
> > > out.shouldMatch("OperatingSystemMXBean\\.getFreeSwapSpaceSize:
> > >     > [1-9][0-9]+");
> > >     > !         } catch(RuntimeException ex) { // allow 0, see
> > > 8242480
> > >     > !
> > >     >
> > > out.shouldContain("OperatingSystemMXBean.getFreeSwapSpaceSize:
> > > 0");
> > >     > !         }
> > >     > 
> > >     > can't you just change the original regex to match >= 0?
> > >     > 
> > >     > Style nit: avoid historical comments like "see 8242480" and
> > > "since
> > >     > 8231111 ...".
> > >     > 
> > >     > Thanks,
> > >     > David
> > >     > 
> > >     >> Best regards, Matthias
> > >     >>
> > >     >> -----Original Message-----
> > >     >> From: Baesken, Matthias
> > >     >> Sent: Mittwoch, 13. Mai 2020 10:11
> > >     >> To: 'David Holmes' <david.holmes at oracle.com>; 
> > > 'hotspot-dev at openjdk.java.net' <hotspot-dev at openjdk.java.net>
> > >     >> Subject: RE: RFR [XS]: 8244500: jtreg test error in
> > > test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java
> > >     >>
> > >     >>
> > >     >>> But according to the comment immediately before your
> > > changes:
> > >     >>>
> > >     >>> // in case of warnings like : "Your kernel does not
> > > support swap limit
> > >     >>> capabilities or the cgroup is not mounted. Memory limited
> > > without swap."
> > >     >>> // the getTotalSwapSpaceSize does not return the expected
> > > result, but 0
> > >     >>>
> > >     >>> we should be returning zero. So this seems to be a bug in
> > > the
> > >     >>> implementation not in the test.
> > >     >>
> > >     >>    
> > >     >> Hi David , I think  in case of  "Your kernel does not
> > > support swap limit  capabilities   ... "    we just do not get
> > > the values
> > >     >>    that are normally expected.
> > >     >> Previously we indeed got "0" .
> > >     >> Probably the comment should be deleted .
> > >     >>
> > >     >> Best regards, Matthias
> > >     >>
> > >     
> > >     
> > > 



More information about the hotspot-dev mailing list