From daniil.x.titov at oracle.com Tue Dec 3 19:42:54 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 03 Dec 2019 11:42:54 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware Message-ID: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> Please review the change that makes OperatingSystemMXBean methods return container specific information rather than the host based data. The webrev [1] is based on the code Andrew and Severin initially provided with some additional changes and combined with the spec update David made [3]. The webrev corrects the implementation for the free/total swap methods as Bob noted to subtract the total and free memory from the returned values. It also corrects getCpuLoad() implementation, as Bob advised, to cover the case when CPU quotas are not active. The webrev also takes into account the case when java.security.AccessControlException exception is thrown during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to "/proc/self/mountinfo" file). CSR for the spec changes [3] is approved. Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 Thank you, -Daniil From bob.vandette at oracle.com Tue Dec 3 21:30:17 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 3 Dec 2019 16:30:17 -0500 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> Message-ID: <48599FD5-583F-4CB7-80CA-09E91AA24D4A@oracle.com> Daniil, Looks good to me. If there are any management jtreg tests, I?d run these since your changes to OperatingSystemMXBean will alter the behavior of these methods even for Linux hosts since cgroups is typically enabled causing the container detection to report containerized. It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to getSystemCpuLoad0. Bob. > On Dec 3, 2019, at 2:42 PM, Daniil Titov wrote: > > Please review the change that makes OperatingSystemMXBean methods return container specific information > rather than the host based data. > > The webrev [1] is based on the code Andrew and Severin initially provided with some additional changes and combined > with the spec update David made [3]. > > The webrev corrects the implementation for the free/total swap methods as Bob noted to subtract the total > and free memory from the returned values. > > It also corrects getCpuLoad() implementation, as Bob advised, to cover the case when CPU quotas are not active. > > The webrev also takes into account the case when java.security.AccessControlException exception is thrown > during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to > "/proc/self/mountinfo" file). > > CSR for the spec changes [3] is approved. > > Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ > [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 > > Thank you, > -Daniil > > From mandy.chung at oracle.com Wed Dec 4 00:10:13 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 3 Dec 2019 16:10:13 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> Message-ID: On 12/3/19 11:42 AM, Daniil Titov wrote: > Please review the change that makes OperatingSystemMXBean methods return container specific informationrather than the host based data. > > The webrev also takes into account the case when java.security.AccessControlException exception is thrown > during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to "/proc/self/mountinfo" file). Instead of failing to access /proc/self/mountinfo, I expect this to wrap the call with doPrivileged so that it can report the metrics independent of the security policy.? The jdk default security policy should grant proper permission to do so. > > CSR for the spec changes [3] is approved. > > Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ > [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 > > src/java.base/linux/classes/jdk/internal/platform/cgroupv1/Metrics.java ??? this should wrap the security-sensitive operations with doPrivileged.? jdk.management is trusted and it has all permissions. src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c ??? Formatting nit:? line 346-355: JDK native source uses 4-space identation convention.? A space is missing between "if" and "(". src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java 59 if (limit >= 0 && memLimit >= 0) { 60 return limit - memLimit; 61 } Under what circumstance that limit or memLimit is < 0??? It fallbacks to return the system's total swap space size - this is not really what it should report.?? Is it worth specifying this case?Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. getFreeSwapSpaceSize retry for a few times.? What special about this method but not others like getFreeMemorySize? src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java ???? There is no strong need to make the deprecated methods as default methods.? If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. CheckOperatingSystemMXBean.java ???? System.out.println(String.format(...)) can simply be replaced with System.out.format. Mandy -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniil.x.titov at oracle.com Wed Dec 4 02:00:28 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 03 Dec 2019 18:00:28 -0800 Subject: jmx-dev 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <48599FD5-583F-4CB7-80CA-09E91AA24D4A@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <48599FD5-583F-4CB7-80CA-09E91AA24D4A@oracle.com> Message-ID: Hi Bob, >> It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to >> getSystemCpuLoad0. I think we can detect that the cpuset is identical to the host's one by comparing the length of the array containerMetrics.getEffectiveCpuSetCpus() returns to the number of the CPUs configured on the host and returned by sysconf(_SC_NPROCESSORS_CONF) . The latter could be retrieved by adding new native method to OperatingSystemImpl getConfiguredCpuCount0. If they match then we just fallback to getSystemCpuLoad0(). I did some testing on Linux host and inside Docker container with different '--cpuset-cpus' settings and it seems to work as expected. JNIEXPORT jint JNICALL Java_com_sun_management_internal_OperatingSystemImpl_getConfiguredCpuCount0 (JNIEnv *env, jobject mbean) { if(perfInit() == 0) { return counters.nProcs; } else { return -1; } } If there is no objection I will include this change in the new webrev. Thank you, Daniil ?On 12/3/19, 1:30 PM, "Bob Vandette" wrote: Daniil, Looks good to me. If there are any management jtreg tests, I?d run these since your changes to OperatingSystemMXBean will alter the behavior of these methods even for Linux hosts since cgroups is typically enabled causing the container detection to report containerized. It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to getSystemCpuLoad0. Bob. > On Dec 3, 2019, at 2:42 PM, Daniil Titov wrote: > > Please review the change that makes OperatingSystemMXBean methods return container specific information > rather than the host based data. > > The webrev [1] is based on the code Andrew and Severin initially provided with some additional changes and combined > with the spec update David made [3]. > > The webrev corrects the implementation for the free/total swap methods as Bob noted to subtract the total > and free memory from the returned values. > > It also corrects getCpuLoad() implementation, as Bob advised, to cover the case when CPU quotas are not active. > > The webrev also takes into account the case when java.security.AccessControlException exception is thrown > during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to > "/proc/self/mountinfo" file). > > CSR for the spec changes [3] is approved. > > Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ > [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 > > Thank you, > -Daniil > > From daniil.x.titov at oracle.com Wed Dec 4 03:34:23 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 03 Dec 2019 19:34:23 -0800 Subject: jmx-dev 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> Message-ID: Hi Mandy, Thank you for your comments, please find my answers below. >> src/java.base/linux/classes/jdk/internal/platform/cgroupv1/Metrics.java >> this should wrap the security-sensitive operations with doPrivileged. jdk.management is trusted and it has all permissions. I will include this change in the next webrev, thank you. >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> Formatting nit: line 346-355: JDK native source uses 4-space identation convention. A space is missing between "if" and "(". I will correct this, thanks. >>Under what circumstance that limit or memLimit is < 0? The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) or if a docker container was started without specifying a memory limit ( without '--memory=' Docker option) . In latter there is no limit on how much memory the container can use and it can use as much memory as the host's OS allows. >> Is it worth specifying this case? I believe yes, since it covers the cases when JVM runs on a Linux host or a docker container was started without memory limitation. >> It fallbacks to return the system's total swap space size - this is not really what it should report. For the case when JVM runs on a Linux host it is exactly what we want. The only problematic case is if JVM runs inside a docker container without a memory limit set. However, I am not sure how we could differentiate these 2 cases. >> Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. For getTotalMemorySize I think we are good here. If limit is not set then all memory the host's OS have is available. For getFreeMemorySize the problematic case is if is the memory limit is set but the memory usage for some reason is not available (containerMetrics.getMemoryUsage() returns 0). Probably in this case we should just return -1 as currently getFreePhysicalMemorySize0() does if it cannot retrieve a valid result. For getCpuLoad() the problematic case if CPU quotas are active but CpuPeriod, CpuNumPeriods , or getCpuUsage are unavailable or if a valid CPU load for some CPU was not retrieved, or if all retrieved CPU load values happen to be zeros. Probably we should just return -1 in these cases rather then falling back to getSystemCpuLoad0() >>src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java >> There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. I could make these methods defaults if you feel it is a better approach here. >>CheckOperatingSystemMXBean.java >> System.out.println(String.format(...)) can simply be replaced with System.out.format. I will include this change in the next webrev, thank you! Best regards, Daniil From: Mandy Chung Date: Tuesday, December 3, 2019 at 4:10 PM To: Daniil Titov Cc: OpenJDK Serviceability , "jmx-dev at openjdk.java.net" , Bob Vandette Subject: Re: RFR: 8226575: OperatingSystemMXBean should be made container aware On 12/3/19 11:42 AM, Daniil Titov wrote: Please review the change that makes OperatingSystemMXBean methods return container specific informationrather than the host based data. The webrev also takes into account the case when java.security.AccessControlException exception is thrown during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to "/proc/self/mountinfo" file). Instead of failing to access /proc/self/mountinfo, I expect this to wrap the call with doPrivileged so that it can report the metrics independent of the security policy.? The jdk default security policy should grant proper permission to do so. CSR for the spec changes [3] is approved. Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 src/java.base/linux/classes/jdk/internal/platform/cgroupv1/Metrics.java ??? this should wrap the security-sensitive operations with doPrivileged.? jdk.management is trusted and it has all permissions. src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c ??? Formatting nit:? line 346-355: JDK native source uses 4-space identation convention.? A space is missing between "if" and "(". src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java 59 if (limit >= 0 && memLimit >= 0) { 60 return limit - memLimit; 61 } Under what circumstance that limit or memLimit is < 0??? It fallbacks to return the system's total swap space size - this is not really what it should report.?? Is it worth? specifying this case? Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad.?? getFreeSwapSpaceSize retry for a few times.? What special about this method but not others like getFreeMemorySize? src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java ???? There is no strong need to make the deprecated methods as default methods.? If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. CheckOperatingSystemMXBean.java ???? System.out.println(String.format(...)) can simply be replaced with System.out.format. Mandy From daniil.x.titov at oracle.com Wed Dec 4 05:37:10 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 03 Dec 2019 21:37:10 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <0A51834F-622B-42DD-A0DA-AFAD59B23D29@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <48599FD5-583F-4CB7-80CA-09E91AA24D4A@oracle.com> <0A51834F-622B-42DD-A0DA-AFAD59B23D29@oracle.com> Message-ID: <1EB171CE-2582-4A42-A7F6-3D37C33DFBDD@oracle.com> Resending with the corrected title, "RFR" was somehow stripped from it that breaks the sorting by subject... Hi Bob, >> It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to >> getSystemCpuLoad0. I think we can detect that the cpuset is identical to the host's one by comparing the length of the array containerMetrics.getEffectiveCpuSetCpus() returns to the number of the CPUs configured on the host and returned by sysconf(_SC_NPROCESSORS_CONF) . The latter could be retrieved by adding new native method to OperatingSystemImpl getConfiguredCpuCount0. If they match then we just fallback to getSystemCpuLoad0(). I did some testing on Linux host and inside Docker container with different '--cpuset-cpus' settings and it seems to work as expected. JNIEXPORT jint JNICALL Java_com_sun_management_internal_OperatingSystemImpl_getConfiguredCpuCount0 (JNIEnv *env, jobject mbean) { if(perfInit() == 0) { return counters.nProcs; } else { return -1; } } If there is no objection I will include this change in the new webrev. Thank you, Daniil ?On 12/3/19, 1:30 PM, "Bob Vandette" wrote: Daniil, Looks good to me. If there are any management jtreg tests, I?d run these since your changes to OperatingSystemMXBean will alter the behavior of these methods even for Linux hosts since cgroups is typically enabled causing the container detection to report containerized. It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to getSystemCpuLoad0. Bob. > On Dec 3, 2019, at 2:42 PM, Daniil Titov wrote: > > Please review the change that makes OperatingSystemMXBean methods return container specific information > rather than the host based data. > > The webrev [1] is based on the code Andrew and Severin initially provided with some additional changes and combined > with the spec update David made [3]. > > The webrev corrects the implementation for the free/total swap methods as Bob noted to subtract the total > and free memory from the returned values. > > It also corrects getCpuLoad() implementation, as Bob advised, to cover the case when CPU quotas are not active. > > The webrev also takes into account the case when java.security.AccessControlException exception is thrown > during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to > "/proc/self/mountinfo" file). > > CSR for the spec changes [3] is approved. > > Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ > [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 > > Thank you, > -Daniil > > From daniil.x.titov at oracle.com Wed Dec 4 05:40:32 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 03 Dec 2019 21:40:32 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> Message-ID: <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> Resending with the corrected subject. "RFR" was somehow stripped from it and that breaks the sorting by subject... ? Hi Mandy, Thank you for your comments, please find my answers below. >> src/java.base/linux/classes/jdk/internal/platform/cgroupv1/Metrics.java >> this should wrap the security-sensitive operations with doPrivileged. jdk.management is trusted and it has all permissions. I will include this change in the next webrev, thank you. >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> Formatting nit: line 346-355: JDK native source uses 4-space identation convention. A space is missing between "if" and "(". I will correct this, thanks. >>Under what circumstance that limit or memLimit is < 0? The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) or if a docker container was started without specifying a memory limit ( without '--memory=' Docker option) . In latter there is no limit on how much memory the container can use and it can use as much memory as the host's OS allows. >> Is it worth specifying this case? I believe yes, since it covers the cases when JVM runs on a Linux host or a docker container was started without memory limitation. >> It fallbacks to return the system's total swap space size - this is not really what it should report. For the case when JVM runs on a Linux host it is exactly what we want. The only problematic case is if JVM runs inside a docker container without a memory limit set. However, I am not sure how we could differentiate these 2 cases. >> Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. For getTotalMemorySize I think we are good here. If limit is not set then all memory the host's OS have is available. For getFreeMemorySize the problematic case is if is the memory limit is set but the memory usage for some reason is not available (containerMetrics.getMemoryUsage() returns 0). Probably in this case we should just return -1 as currently getFreePhysicalMemorySize0() does if it cannot retrieve a valid result. For getCpuLoad() the problematic case if CPU quotas are active but CpuPeriod, CpuNumPeriods , or getCpuUsage are unavailable or if a valid CPU load for some CPU was not retrieved, or if all retrieved CPU load values happen to be zeros. Probably we should just return -1 in these cases rather then falling back to getSystemCpuLoad0() >>src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java >> There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. I could make these methods defaults if you feel it is a better approach here. >>CheckOperatingSystemMXBean.java >> System.out.println(String.format(...)) can simply be replaced with System.out.format. I will include this change in the next webrev, thank you! Best regards, Daniil From: Mandy Chung Date: Tuesday, December 3, 2019 at 4:10 PM To: Daniil Titov Cc: OpenJDK Serviceability , "jmx-dev at openjdk.java.net" , Bob Vandette Subject: Re: RFR: 8226575: OperatingSystemMXBean should be made container aware On 12/3/19 11:42 AM, Daniil Titov wrote: Please review the change that makes OperatingSystemMXBean methods return container specific informationrather than the host based data. The webrev also takes into account the case when java.security.AccessControlException exception is thrown during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to "/proc/self/mountinfo" file). Instead of failing to access /proc/self/mountinfo, I expect this to wrap the call with doPrivileged so that it can report the metrics independent of the security policy. The jdk default security policy should grant proper permission to do so. CSR for the spec changes [3] is approved. Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 src/java.base/linux/classes/jdk/internal/platform/cgroupv1/Metrics.java this should wrap the security-sensitive operations with doPrivileged. jdk.management is trusted and it has all permissions. src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c Formatting nit: line 346-355: JDK native source uses 4-space identation convention. A space is missing between "if" and "(". src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java 59 if (limit >= 0 && memLimit >= 0) { 60 return limit - memLimit; 61 } Under what circumstance that limit or memLimit is < 0? It fallbacks to return the system's total swap space size - this is not really what it should report. Is it worth specifying this case? Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. getFreeSwapSpaceSize retry for a few times. What special about this method but not others like getFreeMemorySize? src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. CheckOperatingSystemMXBean.java System.out.println(String.format(...)) can simply be replaced with System.out.format. Mandy From daniil.x.titov at oracle.com Wed Dec 4 06:36:14 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 03 Dec 2019 22:36:14 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> Message-ID: Hi Mandy, I think in my previous reply I missed to answer one of the questions from your email. >> getFreeSwapSpaceSize retry for a few times. What special about this method but not others like getFreeMemorySize? The specific of method getFreeSwapSpaceSize is that MemoryAndSwapUsage and MemoryUsage metrics it reads are related ( MemoryAndSwapUsage includes MemoryUsage) and they are not constant. Since these metrics are not read atomically it could be that they change their values between these 2 reads. On the contrary, some other metrics, such as MemoryLimit, are constant. They are set when the container starts and are supposed to return the same value over the whole time the JVM runs. The other methods don't use more than one such nonconstant metric, so the only place where this potential issue with not atomic reads could happen is getFreeSwapSpaceSize method. Best regards, Daniil ?On 12/3/19, 7:34 PM, "Daniil Titov" wrote: Hi Mandy, Thank you for your comments, please find my answers below. >> src/java.base/linux/classes/jdk/internal/platform/cgroupv1/Metrics.java >> this should wrap the security-sensitive operations with doPrivileged. jdk.management is trusted and it has all permissions. I will include this change in the next webrev, thank you. >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> Formatting nit: line 346-355: JDK native source uses 4-space identation convention. A space is missing between "if" and "(". I will correct this, thanks. >>Under what circumstance that limit or memLimit is < 0? The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) or if a docker container was started without specifying a memory limit ( without '--memory=' Docker option) . In latter there is no limit on how much memory the container can use and it can use as much memory as the host's OS allows. >> Is it worth specifying this case? I believe yes, since it covers the cases when JVM runs on a Linux host or a docker container was started without memory limitation. >> It fallbacks to return the system's total swap space size - this is not really what it should report. For the case when JVM runs on a Linux host it is exactly what we want. The only problematic case is if JVM runs inside a docker container without a memory limit set. However, I am not sure how we could differentiate these 2 cases. >> Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. For getTotalMemorySize I think we are good here. If limit is not set then all memory the host's OS have is available. For getFreeMemorySize the problematic case is if is the memory limit is set but the memory usage for some reason is not available (containerMetrics.getMemoryUsage() returns 0). Probably in this case we should just return -1 as currently getFreePhysicalMemorySize0() does if it cannot retrieve a valid result. For getCpuLoad() the problematic case if CPU quotas are active but CpuPeriod, CpuNumPeriods , or getCpuUsage are unavailable or if a valid CPU load for some CPU was not retrieved, or if all retrieved CPU load values happen to be zeros. Probably we should just return -1 in these cases rather then falling back to getSystemCpuLoad0() >>src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java >> There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. I could make these methods defaults if you feel it is a better approach here. >>CheckOperatingSystemMXBean.java >> System.out.println(String.format(...)) can simply be replaced with System.out.format. I will include this change in the next webrev, thank you! Best regards, Daniil From: Mandy Chung Date: Tuesday, December 3, 2019 at 4:10 PM To: Daniil Titov Cc: OpenJDK Serviceability , "jmx-dev at openjdk.java.net" , Bob Vandette Subject: Re: RFR: 8226575: OperatingSystemMXBean should be made container aware On 12/3/19 11:42 AM, Daniil Titov wrote: Please review the change that makes OperatingSystemMXBean methods return container specific informationrather than the host based data. The webrev also takes into account the case when java.security.AccessControlException exception is thrown during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to "/proc/self/mountinfo" file). Instead of failing to access /proc/self/mountinfo, I expect this to wrap the call with doPrivileged so that it can report the metrics independent of the security policy. The jdk default security policy should grant proper permission to do so. CSR for the spec changes [3] is approved. Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 src/java.base/linux/classes/jdk/internal/platform/cgroupv1/Metrics.java this should wrap the security-sensitive operations with doPrivileged. jdk.management is trusted and it has all permissions. src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c Formatting nit: line 346-355: JDK native source uses 4-space identation convention. A space is missing between "if" and "(". src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java 59 if (limit >= 0 && memLimit >= 0) { 60 return limit - memLimit; 61 } Under what circumstance that limit or memLimit is < 0? It fallbacks to return the system's total swap space size - this is not really what it should report. Is it worth specifying this case? Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. getFreeSwapSpaceSize retry for a few times. What special about this method but not others like getFreeMemorySize? src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. CheckOperatingSystemMXBean.java System.out.println(String.format(...)) can simply be replaced with System.out.format. Mandy From bob.vandette at oracle.com Wed Dec 4 13:32:05 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 4 Dec 2019 08:32:05 -0500 Subject: jmx-dev 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <48599FD5-583F-4CB7-80CA-09E91AA24D4A@oracle.com> Message-ID: <133A9A62-7D4E-4810-A9E4-B51A6D222585@oracle.com> > On Dec 3, 2019, at 9:00 PM, Daniil Titov wrote: > > Hi Bob, > >>> It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to >>> getSystemCpuLoad0. > > I think we can detect that the cpuset is identical to the host's one by comparing the length of the array containerMetrics.getEffectiveCpuSetCpus() returns > to the number of the CPUs configured on the host and returned by sysconf(_SC_NPROCESSORS_CONF) . The latter could be retrieved by adding new native > method to OperatingSystemImpl getConfiguredCpuCount0. If they match then we just fallback to getSystemCpuLoad0(). I did some testing on Linux host and > inside Docker container with different '--cpuset-cpus' settings and it seems to work as expected. > > JNIEXPORT jint JNICALL > Java_com_sun_management_internal_OperatingSystemImpl_getConfiguredCpuCount0 > (JNIEnv *env, jobject mbean) > { > if(perfInit() == 0) { > return counters.nProcs; > } else { > return -1; > } > } > > > If there is no objection I will include this change in the new webrev. I don?t think this approach will work. Both the array returned and the sysconf(_SC_NPROCESSORS_CONF) report the containers cpuset value so they will be equal causing you to always fallback. You can try to use containerMetrics.getPerCpuUsage() instead of containerMetrics.getEffectiveCpuSetCpus(). The length of the array returned is the number of host cpus. Maybe Severin can confirm if this true in cgroupv2 as well. Bob. > > Thank you, > Daniil > > ?On 12/3/19, 1:30 PM, "Bob Vandette" wrote: > > Daniil, > > Looks good to me. > > If there are any management jtreg tests, I?d run these since your changes to OperatingSystemMXBean will > alter the behavior of these methods even for Linux hosts since cgroups is typically enabled causing the > container detection to report containerized. > > It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to > getSystemCpuLoad0. > > > Bob. > > >> On Dec 3, 2019, at 2:42 PM, Daniil Titov wrote: >> >> Please review the change that makes OperatingSystemMXBean methods return container specific information >> rather than the host based data. >> >> The webrev [1] is based on the code Andrew and Severin initially provided with some additional changes and combined >> with the spec update David made [3]. >> >> The webrev corrects the implementation for the free/total swap methods as Bob noted to subtract the total >> and free memory from the returned values. >> >> It also corrects getCpuLoad() implementation, as Bob advised, to cover the case when CPU quotas are not active. >> >> The webrev also takes into account the case when java.security.AccessControlException exception is thrown >> during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to >> "/proc/self/mountinfo" file). >> >> CSR for the spec changes [3] is approved. >> >> Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . >> >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ >> [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 >> [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 >> >> Thank you, >> -Daniil >> >> > > > > From bob.vandette at oracle.com Wed Dec 4 14:13:51 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 4 Dec 2019 09:13:51 -0500 Subject: jmx-dev 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <133A9A62-7D4E-4810-A9E4-B51A6D222585@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <48599FD5-583F-4CB7-80CA-09E91AA24D4A@oracle.com> <133A9A62-7D4E-4810-A9E4-B51A6D222585@oracle.com> Message-ID: > On Dec 4, 2019, at 8:32 AM, Bob Vandette wrote: > > >> On Dec 3, 2019, at 9:00 PM, Daniil Titov wrote: >> >> Hi Bob, >> >>>> It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to >>>> getSystemCpuLoad0. >> >> I think we can detect that the cpuset is identical to the host's one by comparing the length of the array containerMetrics.getEffectiveCpuSetCpus() returns >> to the number of the CPUs configured on the host and returned by sysconf(_SC_NPROCESSORS_CONF) . The latter could be retrieved by adding new native >> method to OperatingSystemImpl getConfiguredCpuCount0. If they match then we just fallback to getSystemCpuLoad0(). I did some testing on Linux host and >> inside Docker container with different '--cpuset-cpus' settings and it seems to work as expected. >> >> JNIEXPORT jint JNICALL >> Java_com_sun_management_internal_OperatingSystemImpl_getConfiguredCpuCount0 >> (JNIEnv *env, jobject mbean) >> { >> if(perfInit() == 0) { >> return counters.nProcs; >> } else { >> return -1; >> } >> } >> >> >> If there is no objection I will include this change in the new webrev. > > I don?t think this approach will work. Both the array returned and the sysconf(_SC_NPROCESSORS_CONF) > report the containers cpuset value so they will be equal causing you to always fallback. > > You can try to use containerMetrics.getPerCpuUsage() instead of containerMetrics.getEffectiveCpuSetCpus(). > The length of the array returned is the number of host cpus. Maybe Severin can confirm if this true in cgroupv2 as > well. I just checked the webrev for the cgroupv2 implementation and getPerCpuUsage is not supported. I still think it?s worth implementing this optimization but it won?t be used on cgroupv2 since the array length (0) won?t be equal to _SC_NPROCESSORS_CONF. Here?s the cgroupv2 implementation of this method. 64 @Override 65 public long[] getPerCpuUsage() { 66 // Not supported 67 return new long[0]; 68 } Bob. > > Bob. > > >> >> Thank you, >> Daniil >> >> ?On 12/3/19, 1:30 PM, "Bob Vandette" wrote: >> >> Daniil, >> >> Looks good to me. >> >> If there are any management jtreg tests, I?d run these since your changes to OperatingSystemMXBean will >> alter the behavior of these methods even for Linux hosts since cgroups is typically enabled causing the >> container detection to report containerized. >> >> It?s too bad getCpuLoad can?t detect that the cpuset is identical to the hosts in order to allow you to fallback to >> getSystemCpuLoad0. >> >> >> Bob. >> >> >>> On Dec 3, 2019, at 2:42 PM, Daniil Titov wrote: >>> >>> Please review the change that makes OperatingSystemMXBean methods return container specific information >>> rather than the host based data. >>> >>> The webrev [1] is based on the code Andrew and Severin initially provided with some additional changes and combined >>> with the spec update David made [3]. >>> >>> The webrev corrects the implementation for the free/total swap methods as Bob noted to subtract the total >>> and free memory from the returned values. >>> >>> It also corrects getCpuLoad() implementation, as Bob advised, to cover the case when CPU quotas are not active. >>> >>> The webrev also takes into account the case when java.security.AccessControlException exception is thrown >>> during the initialization of the container subsystem ( e.g. when java.policy doesn?t grant "read" access to >>> "/proc/self/mountinfo" file). >>> >>> CSR for the spec changes [3] is approved. >>> >>> Testing: Mach5 tiers1, tiers2, tiers3, tier4, tier5 (including open/test/hotspot/jtreg/containers/docker), and tier6 tests passed . >>> >>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.02/ >>> [2] Jira issue :https://bugs.openjdk.java.net/browse/JDK-8226575 >>> [3] CSR https://bugs.openjdk.java.net/browse/JDK-8228428 >>> >>> Thank you, >>> -Daniil >>> >>> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sgehwolf at redhat.com Wed Dec 4 18:22:30 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 04 Dec 2019 19:22:30 +0100 Subject: jmx-dev 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <133A9A62-7D4E-4810-A9E4-B51A6D222585@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <48599FD5-583F-4CB7-80CA-09E91AA24D4A@oracle.com> <133A9A62-7D4E-4810-A9E4-B51A6D222585@oracle.com> Message-ID: <2a47a93f7ee2f479a73b1e7a3dd829a244358095.camel@redhat.com> On Wed, 2019-12-04 at 08:32 -0500, Bob Vandette wrote: > You can try to use containerMetrics.getPerCpuUsage() instead of containerMetrics.getEffectiveCpuSetCpus(). > The length of the array returned is the number of host cpus. Maybe Severin can confirm if this true in cgroupv2 as > well. If I'm not mistaken getPerCpuUsage() is not supported in cgroupv2. Thanks, Severin From mandy.chung at oracle.com Thu Dec 5 00:09:45 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 4 Dec 2019 16:09:45 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> Message-ID: On 12/3/19 9:40 PM, Daniil Titov wrote: > >>> Under what circumstance that limit or memLimit is < 0? > The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) or if a docker container was started without > specifying a memory limit ( without '--memory=' Docker option) . In latter there is no limit on how much memory the container can use and > it can use as much memory as the host's OS allows. > OK.? Please add a comment to the code. It may worth considering adding Metrics::getSwapLimit and Metrics::getSwapUsage and move the computation to the implementation of Metrics.? Bob may have an opinion. Also it seems correct for the memory related methods to check if (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0).? BTW what does it mean if limit == 0? >>> Is it worth specifying this case? > I believe yes, since it covers the cases when JVM runs on a Linux host or a docker container was started without memory limitation. > I was wondering if the javadoc should specify that. >>> It fallbacks to return the system's total swap space size - this is not really what it should report. > For the case when JVM runs on a Linux host it is exactly what we want. The only problematic case is if JVM runs inside a docker container without a memory limit set. > However, I am not sure how we could differentiate these 2 cases. As this is the case when the limit is not set in the container, it returns the system metrics which sounds appropriate. > >>> Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. > For getTotalMemorySize I think we are good here. If limit is not set then all memory the host's OS have is available. > For getFreeMemorySize the problematic case is if is the memory limit is set but the memory usage for some reason is not available (containerMetrics.getMemoryUsage() returns 0). Will zero memory usage happen? > Probably in this case we should just return -1 as currently getFreePhysicalMemorySize0() does if it cannot retrieve a valid result. > > For getCpuLoad() the problematic case if CPU quotas are active but CpuPeriod, CpuNumPeriods , or getCpuUsage are unavailable or if a valid CPU load for some CPU was > not retrieved, or if all retrieved CPU load values happen to be zeros. Probably we should just return -1 in these cases rather then falling back to getSystemCpuLoad0() > returning -1 sounds right. >>> src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java >>> There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. > I could make these methods defaults if you feel it is a better approach here. > > It's not strictly needed but I can go either way. >>> CheckOperatingSystemMXBean.java >>> System.out.println(String.format(...)) can simply be replaced with System.out.format. > I will include this change in the next webrev, thank you! > > thanks Mandy From daniil.x.titov at oracle.com Thu Dec 5 19:31:02 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Thu, 05 Dec 2019 11:31:02 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> Message-ID: <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> Hi Mandy and Bob, Please review a new version of the webrev that addresses the most of these issues: 1) The interface and spec [3] were updated to use default methods. CSR [3] was re-approved. 2) Security-sensitive operations in j.i.p.cgroupv1.Metrics and in j.i.p.cgroupv1. SubSystem were wrapped with doPrivileged 3) getCpuLoad () method was optimized to fallback to getSystemCpuLoad0 if the cpuset is identical to the host's one. It uses sysconf(_SC_NPROCESSORS_CONF) to retrieve the number of CPUs configured on the host . Testing with different --cpuset-cpus settings inside a Docker container proved that it always returns the same number of hosts configured CPUs regardless of --cpuset-cpus settings while the same settings affect getEffectiveCpuSetCpus and getCpuSetCpus() metrics. In addition, getCpuLoad () method now returns -1 in the cases when quotas are active but cpu usage and cpu period metrics are not available and in the case when for some reason it fails to retrieve a valid CPU load for one of CPUs while iteration over them >> CheckOperatingSystemMXBean.java >> System.out.println(String.format(...)) can simply be replaced with System.out.format. I had to leave it unchanged since replacing it with System.out.format results in the tests instability as it makes the trace output occasionally Intervene here (the trace message sometimes is printed inside this message) and tests cannot find the expected pattern in the output. >> It may worth considering adding Metrics::getSwapLimit and >> Metrics::getSwapUsage and move the computation to the implementation of >> Metrics. Bob may have an opinion. There was no any new input regarding this so I decided to leave it unchanged. >>Also it seems correct for the memory related methods to check if >>(containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). >> BTW what does it mean if limit == 0? Per Docker docs the minimum allowed value for memory limit (--memory option) is 4 megabytes. And if memory limit is unset the return value is -1. Thus, in my understanding the value 0 is only possible if something went wrong while retrieving this metric. Testing: Mach5 tier1-tier6 tests (that include open/test/hotspot/jtreg/containers/docker and : jdk_management tests) passed. [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.03/ [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 Thank you, Daniil ?On 12/4/19, 4:09 PM, "Mandy Chung" wrote: On 12/3/19 9:40 PM, Daniil Titov wrote: > >>> Under what circumstance that limit or memLimit is < 0? > The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) or if a docker container was started without > specifying a memory limit ( without '--memory=' Docker option) . In latter there is no limit on how much memory the container can use and > it can use as much memory as the host's OS allows. > OK. Please add a comment to the code. It may worth considering adding Metrics::getSwapLimit and Metrics::getSwapUsage and move the computation to the implementation of Metrics. Bob may have an opinion. Also it seems correct for the memory related methods to check if (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). BTW what does it mean if limit == 0? >>> Is it worth specifying this case? > I believe yes, since it covers the cases when JVM runs on a Linux host or a docker container was started without memory limitation. > I was wondering if the javadoc should specify that. >>> It fallbacks to return the system's total swap space size - this is not really what it should report. > For the case when JVM runs on a Linux host it is exactly what we want. The only problematic case is if JVM runs inside a docker container without a memory limit set. > However, I am not sure how we could differentiate these 2 cases. As this is the case when the limit is not set in the container, it returns the system metrics which sounds appropriate. > >>> Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. > For getTotalMemorySize I think we are good here. If limit is not set then all memory the host's OS have is available. > For getFreeMemorySize the problematic case is if is the memory limit is set but the memory usage for some reason is not available (containerMetrics.getMemoryUsage() returns 0). Will zero memory usage happen? > Probably in this case we should just return -1 as currently getFreePhysicalMemorySize0() does if it cannot retrieve a valid result. > > For getCpuLoad() the problematic case if CPU quotas are active but CpuPeriod, CpuNumPeriods , or getCpuUsage are unavailable or if a valid CPU load for some CPU was > not retrieved, or if all retrieved CPU load values happen to be zeros. Probably we should just return -1 in these cases rather then falling back to getSystemCpuLoad0() > returning -1 sounds right. >>> src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java >>> There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. > I could make these methods defaults if you feel it is a better approach here. > > It's not strictly needed but I can go either way. >>> CheckOperatingSystemMXBean.java >>> System.out.println(String.format(...)) can simply be replaced with System.out.format. > I will include this change in the next webrev, thank you! > > thanks Mandy From bob.vandette at oracle.com Thu Dec 5 20:50:49 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Thu, 5 Dec 2019 15:50:49 -0500 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> Message-ID: In http://cr.openjdk.java.net/~dtitov/8226575/webrev.03/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java.sdiff.html Shouldn?t you keep the IOException catch clauses in case the file is not found? > On Dec 5, 2019, at 2:31 PM, Daniil Titov wrote: > > Hi Mandy and Bob, > > Please review a new version of the webrev that addresses the most of these issues: > > 1) The interface and spec [3] were updated to use default methods. CSR [3] was re-approved. > > 2) Security-sensitive operations in j.i.p.cgroupv1.Metrics and in j.i.p.cgroupv1. SubSystem > were wrapped with doPrivileged > > 3) getCpuLoad () method was optimized to fallback to getSystemCpuLoad0 if the cpuset is identical to the host's one. > It uses sysconf(_SC_NPROCESSORS_CONF) to retrieve the number of CPUs configured on the host . Testing with > different --cpuset-cpus settings inside a Docker container proved that it always returns the same number of hosts configured > CPUs regardless of --cpuset-cpus settings while the same settings affect getEffectiveCpuSetCpus and getCpuSetCpus() metrics. > > In addition, getCpuLoad () method now returns -1 in the cases when quotas are active but cpu usage and cpu period metrics are not available and > in the case when for some reason it fails to retrieve a valid CPU load for one of CPUs while iteration over them Shouldn't you do the same for getCpuLoad 149 int[] cpuSet = containerMetrics.getEffectiveCpuSetCpus(); 150 if (cpuSet != null && cpuSet.length > 0) { If cpuSet.length == 0? > >>> CheckOperatingSystemMXBean.java >>> System.out.println(String.format(...)) can simply be replaced with System.out.format. > I had to leave it unchanged since replacing it with System.out.format results in the tests instability as it makes the trace output > occasionally Intervene here (the trace message sometimes is printed inside this message) and tests cannot find the expected > pattern in the output. > >>> It may worth considering adding Metrics::getSwapLimit and >>> Metrics::getSwapUsage and move the computation to the implementation of >>> Metrics. Bob may have an opinion. > > There was no any new input regarding this so I decided to leave it unchanged. Sorry, I didn?t respond to this. Since the calculation required for getFreeSwapSpaceSize requires retries due to the access of multiple changing values, I think it?s best to leave things as they are so the caller of these methods understands the limitations of the API. Also, the fact that swap size metrics include memory sizes is fully documented in both the cgroup and docker online documentation so it?s probably best to be consistent. > >>> Also it seems correct for the memory related methods to check if >>> (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). >>> BTW what does it mean if limit == 0? > > Per Docker docs the minimum allowed value for memory limit (--memory option) is 4 megabytes. > And if memory limit is unset the return value is -1. Thus, in my understanding the value 0 is only possible > if something went wrong while retrieving this metric. That is true but shouldn?t you return -1 in that case? I originally thought it was ok to fall back to the host data for 0 values but I think its better to return unavailable (-1) I think you might want to change all >= 0 to > 0 and return -1 if any of the values are 0. This would be more consistent. You should only fall back to the original logic (host values) if container values are set to unlimited. Bob. > > Testing: Mach5 tier1-tier6 tests (that include open/test/hotspot/jtreg/containers/docker and : jdk_management tests) passed. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.03/ > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > Thank you, > Daniil > > ?On 12/4/19, 4:09 PM, "Mandy Chung" wrote: > > > > On 12/3/19 9:40 PM, Daniil Titov wrote: >> >>>> Under what circumstance that limit or memLimit is < 0? >> The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) or if a docker container was started without >> specifying a memory limit ( without '--memory=' Docker option) . In latter there is no limit on how much memory the container can use and >> it can use as much memory as the host's OS allows. >> > > OK. Please add a comment to the code. > > It may worth considering adding Metrics::getSwapLimit and > Metrics::getSwapUsage and move the computation to the implementation of > Metrics. Bob may have an opinion. > > Also it seems correct for the memory related methods to check if > (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). > BTW what does it mean if limit == 0? > >>>> Is it worth specifying this case? >> I believe yes, since it covers the cases when JVM runs on a Linux host or a docker container was started without memory limitation. >> > > I was wondering if the javadoc should specify that. >>>> It fallbacks to return the system's total swap space size - this is not really what it should report. >> For the case when JVM runs on a Linux host it is exactly what we want. The only problematic case is if JVM runs inside a docker container without a memory limit set. >> However, I am not sure how we could differentiate these 2 cases. > > As this is the case when the limit is not set in the container, it > returns the system metrics which sounds appropriate. > >> >>>> Similarly, getFreeMemorySize and getTotalMemorySize and getCpuLoad. >> For getTotalMemorySize I think we are good here. If limit is not set then all memory the host's OS have is available. >> For getFreeMemorySize the problematic case is if is the memory limit is set but the memory usage for some reason is not available (containerMetrics.getMemoryUsage() returns 0). > > Will zero memory usage happen? > >> Probably in this case we should just return -1 as currently getFreePhysicalMemorySize0() does if it cannot retrieve a valid result. >> > >> For getCpuLoad() the problematic case if CPU quotas are active but CpuPeriod, CpuNumPeriods , or getCpuUsage are unavailable or if a valid CPU load for some CPU was >> not retrieved, or if all retrieved CPU load values happen to be zeros. Probably we should just return -1 in these cases rather then falling back to getSystemCpuLoad0() >> > > returning -1 sounds right. >>>> src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java >>>> There is no strong need to make the deprecated methods as default methods. If they were default methods, they only need to be implemented once as opposed to in all OS-specific implementations. >> I could make these methods defaults if you feel it is a better approach here. >> >> > > It's not strictly needed but I can go either way. > > >>>> CheckOperatingSystemMXBean.java >>>> System.out.println(String.format(...)) can simply be replaced with System.out.format. >> I will include this change in the next webrev, thank you! >> >> > > thanks > Mandy > > > From mandy.chung at oracle.com Thu Dec 5 20:59:16 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Thu, 5 Dec 2019 12:59:16 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> Message-ID: On 12/5/19 12:50 PM, Bob Vandette wrote: > >>>> It may worth considering adding Metrics::getSwapLimit and >>>> Metrics::getSwapUsage and move the computation to the implementation of >>>> Metrics. Bob may have an opinion. >> There was no any new input regarding this so I decided to leave it unchanged. > Sorry, I didn?t respond to this. Since the calculation required for getFreeSwapSpaceSize requires retries > due to the access of multiple changing values, I think it?s best to leave things as they are so the caller of > these methods understands the limitations of the API. OK with me. > Also, the fact that swap size metrics include memory sizes is fully documented in both the cgroup and docker > online documentation so it?s probably best to be consistent. > >>>> Also it seems correct for the memory related methods to check if >>>> (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). >>>> BTW what does it mean if limit == 0? >> Per Docker docs the minimum allowed value for memory limit (--memory option) is 4 megabytes. >> And if memory limit is unset the return value is -1. Thus, in my understanding the value 0 is only possible >> if something went wrong while retrieving this metric. > That is true but shouldn?t you return -1 in that case? > > I originally thought it was ok to fall back to the host data for 0 values but I think its better to return unavailable (-1) > I think you might want to change all >= 0 to > 0 and return -1 if any of the values are 0. This would be more consistent. +1 The javadoc should be changed and returns -1 when it's unavailable and the CSR should also be updated to reflect this.??? I'm sure Joe can re-approve the CSR quickly when the fix is reviewed and approved. > You should only fall back to the original logic (host values) if container values are set to unlimited. > +1 Mandy From daniil.x.titov at oracle.com Fri Dec 6 01:03:21 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Thu, 05 Dec 2019 17:03:21 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> Message-ID: <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> Hi Mandy and Bob, Thank you for your comments. Please review a new version of the fix [1] that makes OperatingSystemImpl methods return -1 if one of the metric has value 0. As Mandy recommended I also updated the Javadoc for OperatingSystemMXBean indicating that methods could return -1 if the information is not available. There were no changes in CSR [3] yet, I plan to proceed with them after the fix is reviewed. > In http://cr.openjdk.java.net/~dtitov/8226575/webrev.03/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java.sdiff.html > Shouldn?t you keep the IOException catch clauses in case the file is not found? There is no need in keeping IOException catch in these 2 places where it used to be (getLongValueMatchingLine and getLongEntry methods). As I understand IOException catch was required only because File.lines() and File. readAllLines() can throw IOException. Now these calls are performed inside AccessController.doPrivileged(PrivilegedExceptionAction) that wraps all checked exceptions in PrivilegedActionException that we are catching now instead of IOException. Here is the sampe of the stacktrace: java.security.PrivilegedActionException: java.io.FileNotFoundException at java.base/java.security.AccessController.doPrivileged(AccessController.java:558) at java.base/jdk.internal.platform.cgroupv1.SubSystem.getLongValueMatchingLine(SubSystem.java:113) at java.base/jdk.internal.platform.cgroupv1.Metrics.getMemoryLimit(Metrics.java:390) at jdk.management/com.sun.management.internal.OperatingSystemImpl.getTotalMemorySize(OperatingSystemImpl.java:109) at CheckOperatingSystemMXBean.main(CheckOperatingSystemMXBean.java:36) Caused by: java.io.FileNotFoundException at java.base/jdk.internal.platform.cgroupv1.SubSystem.lambda$getLongValueMatchingLine$1(SubSystem.java:116) at java.base/java.security.AccessController.doPrivileged(AccessController.java:554) In getStringValue method the whole code block is now executed inside AccessController.doPrivileged() so we still need either catch IOException inside this code block or convert this block to PrivilegedExceptionAction and then put AccessController.doPrivileged call inside new try/catch Block to catch PrivilegedExceptionAction. The former approach looked more preferable. Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.04/ [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 Thanks, Daniil ?On 12/5/19, 12:59 PM, "Mandy Chung" wrote: On 12/5/19 12:50 PM, Bob Vandette wrote: > >>>> It may worth considering adding Metrics::getSwapLimit and >>>> Metrics::getSwapUsage and move the computation to the implementation of >>>> Metrics. Bob may have an opinion. >> There was no any new input regarding this so I decided to leave it unchanged. > Sorry, I didn?t respond to this. Since the calculation required for getFreeSwapSpaceSize requires retries > due to the access of multiple changing values, I think it?s best to leave things as they are so the caller of > these methods understands the limitations of the API. OK with me. > Also, the fact that swap size metrics include memory sizes is fully documented in both the cgroup and docker > online documentation so it?s probably best to be consistent. > >>>> Also it seems correct for the memory related methods to check if >>>> (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). >>>> BTW what does it mean if limit == 0? >> Per Docker docs the minimum allowed value for memory limit (--memory option) is 4 megabytes. >> And if memory limit is unset the return value is -1. Thus, in my understanding the value 0 is only possible >> if something went wrong while retrieving this metric. > That is true but shouldn?t you return -1 in that case? > > I originally thought it was ok to fall back to the host data for 0 values but I think its better to return unavailable (-1) > I think you might want to change all >= 0 to > 0 and return -1 if any of the values are 0. This would be more consistent. +1 The javadoc should be changed and returns -1 when it's unavailable and the CSR should also be updated to reflect this. I'm sure Joe can re-approve the CSR quickly when the fix is reviewed and approved. > You should only fall back to the original logic (host values) if container values are set to unlimited. > +1 Mandy From david.holmes at oracle.com Fri Dec 6 07:49:34 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 6 Dec 2019 17:49:34 +1000 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> Message-ID: Hi Daniil, I'm not familiar with all the details of the various API's involved here so just a few general comments in places. I do have one major issue flagged below. --- src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c ! static int initialized=1; Am I reading this right that the code currently fails to actually do the initialization because of this ??? Style nit: if(perfInit() space after "if" --- src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. Surely there must always be some information available from the operating environment? I see from the impl file: // the host data, value 0 indicates that something went wrong while the metric was read and // in this case we return "information unavailable" code -1. I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. --- src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java Can you please rename the legacy methods so that, for example, getTotalMemorySize() calls getTotalMemorySize0() rather than getTotalPhysicalMemorySize0(). That way we relegate the legacy names to the interface only. --- test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java System.out.println(String.format(...) Why not simply System.out.printf(..) ? --- Thanks, David ----- On 6/12/2019 11:03 am, Daniil Titov wrote: > Hi Mandy and Bob, > > Thank you for your comments. Please review a new version of the fix [1] that makes > OperatingSystemImpl methods return -1 if one of the metric has value 0. > > As Mandy recommended I also updated the Javadoc for OperatingSystemMXBean > indicating that methods could return -1 if the information is not available. > There were no changes in CSR [3] yet, I plan to proceed with them after the fix is > reviewed. > >> In http://cr.openjdk.java.net/~dtitov/8226575/webrev.03/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java.sdiff.html >> Shouldn?t you keep the IOException catch clauses in case the file is not found? > > There is no need in keeping IOException catch in these 2 places where it used to be (getLongValueMatchingLine and getLongEntry methods). > As I understand IOException catch was required only because File.lines() and File. readAllLines() can throw IOException. > Now these calls are performed inside AccessController.doPrivileged(PrivilegedExceptionAction) that wraps > all checked exceptions in PrivilegedActionException that we are catching now instead of IOException. > > Here is the sampe of the stacktrace: > java.security.PrivilegedActionException: java.io.FileNotFoundException > at java.base/java.security.AccessController.doPrivileged(AccessController.java:558) > at java.base/jdk.internal.platform.cgroupv1.SubSystem.getLongValueMatchingLine(SubSystem.java:113) > at java.base/jdk.internal.platform.cgroupv1.Metrics.getMemoryLimit(Metrics.java:390) > at jdk.management/com.sun.management.internal.OperatingSystemImpl.getTotalMemorySize(OperatingSystemImpl.java:109) > at CheckOperatingSystemMXBean.main(CheckOperatingSystemMXBean.java:36) > Caused by: java.io.FileNotFoundException > at java.base/jdk.internal.platform.cgroupv1.SubSystem.lambda$getLongValueMatchingLine$1(SubSystem.java:116) > at java.base/java.security.AccessController.doPrivileged(AccessController.java:554) > > > In getStringValue method the whole code block is now executed inside AccessController.doPrivileged() so we still need either catch > IOException inside this code block or convert this block to PrivilegedExceptionAction and then put AccessController.doPrivileged > call inside new try/catch Block to catch PrivilegedExceptionAction. The former approach looked more preferable. > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.04/ > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > Thanks, > Daniil > > ?On 12/5/19, 12:59 PM, "Mandy Chung" wrote: > > > > On 12/5/19 12:50 PM, Bob Vandette wrote: > > > >>>> It may worth considering adding Metrics::getSwapLimit and > >>>> Metrics::getSwapUsage and move the computation to the implementation of > >>>> Metrics. Bob may have an opinion. > >> There was no any new input regarding this so I decided to leave it unchanged. > > Sorry, I didn?t respond to this. Since the calculation required for getFreeSwapSpaceSize requires retries > > due to the access of multiple changing values, I think it?s best to leave things as they are so the caller of > > these methods understands the limitations of the API. > > OK with me. > > Also, the fact that swap size metrics include memory sizes is fully documented in both the cgroup and docker > > online documentation so it?s probably best to be consistent. > > > >>>> Also it seems correct for the memory related methods to check if > >>>> (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). > >>>> BTW what does it mean if limit == 0? > >> Per Docker docs the minimum allowed value for memory limit (--memory option) is 4 megabytes. > >> And if memory limit is unset the return value is -1. Thus, in my understanding the value 0 is only possible > >> if something went wrong while retrieving this metric. > > That is true but shouldn?t you return -1 in that case? > > > > I originally thought it was ok to fall back to the host data for 0 values but I think its better to return unavailable (-1) > > I think you might want to change all >= 0 to > 0 and return -1 if any of the values are 0. This would be more consistent. > > +1 > > The javadoc should be changed and returns -1 when it's unavailable and > the CSR should also be updated to reflect this. I'm sure Joe can > re-approve the CSR quickly when the fix is reviewed and approved. > > > You should only fall back to the original logic (host values) if container values are set to unlimited. > > > +1 > > Mandy > > > From bob.vandette at oracle.com Fri Dec 6 13:59:10 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Fri, 6 Dec 2019 08:59:10 -0500 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> Message-ID: > On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > > Hi Daniil, > > I'm not familiar with all the details of the various API's involved here so just a few general comments in places. I do have one major issue flagged below. > > --- > > src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > > ! static int initialized=1; > > Am I reading this right that the code currently fails to actually do the initialization because of this ??? > > Style nit: if(perfInit() > > space after "if" > > --- > > src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > > The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. Surely there must always be some information available from the operating environment? I see from the impl file: > > // the host data, value 0 indicates that something went wrong while the metric was read and > // in this case we return "information unavailable" code -1. > > I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no limits. Bob. > > --- > > src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java > src/jdk.management/windows/classes/com/sun/management/internal/OperatingSystemImpl.java > > Can you please rename the legacy methods so that, for example, getTotalMemorySize() calls getTotalMemorySize0() rather than getTotalPhysicalMemorySize0(). That way we relegate the legacy names to the interface only. > > --- > > test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > System.out.println(String.format(...) > > Why not simply > > System.out.printf(..) > > ? > > --- > > Thanks, > David > ----- > > On 6/12/2019 11:03 am, Daniil Titov wrote: >> Hi Mandy and Bob, >> Thank you for your comments. Please review a new version of the fix [1] that makes >> OperatingSystemImpl methods return -1 if one of the metric has value 0. >> As Mandy recommended I also updated the Javadoc for OperatingSystemMXBean >> indicating that methods could return -1 if the information is not available. >> There were no changes in CSR [3] yet, I plan to proceed with them after the fix is >> reviewed. >>> In http://cr.openjdk.java.net/~dtitov/8226575/webrev.03/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java.sdiff.html >>> Shouldn?t you keep the IOException catch clauses in case the file is not found? >> There is no need in keeping IOException catch in these 2 places where it used to be (getLongValueMatchingLine and getLongEntry methods). >> As I understand IOException catch was required only because File.lines() and File. readAllLines() can throw IOException. >> Now these calls are performed inside AccessController.doPrivileged(PrivilegedExceptionAction) that wraps >> all checked exceptions in PrivilegedActionException that we are catching now instead of IOException. >> Here is the sampe of the stacktrace: >> java.security.PrivilegedActionException: java.io.FileNotFoundException >> at java.base/java.security.AccessController.doPrivileged(AccessController.java:558) >> at java.base/jdk.internal.platform.cgroupv1.SubSystem.getLongValueMatchingLine(SubSystem.java:113) >> at java.base/jdk.internal.platform.cgroupv1.Metrics.getMemoryLimit(Metrics.java:390) >> at jdk.management/com.sun.management.internal.OperatingSystemImpl.getTotalMemorySize(OperatingSystemImpl.java:109) >> at CheckOperatingSystemMXBean.main(CheckOperatingSystemMXBean.java:36) >> Caused by: java.io.FileNotFoundException >> at java.base/jdk.internal.platform.cgroupv1.SubSystem.lambda$getLongValueMatchingLine$1(SubSystem.java:116) >> at java.base/java.security.AccessController.doPrivileged(AccessController.java:554) >> In getStringValue method the whole code block is now executed inside AccessController.doPrivileged() so we still need either catch >> IOException inside this code block or convert this block to PrivilegedExceptionAction and then put AccessController.doPrivileged >> call inside new try/catch Block to catch PrivilegedExceptionAction. The former approach looked more preferable. >> Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.04/ >> [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 >> Thanks, >> Daniil >> ?On 12/5/19, 12:59 PM, "Mandy Chung" wrote: >> On 12/5/19 12:50 PM, Bob Vandette wrote: >> > >> >>>> It may worth considering adding Metrics::getSwapLimit and >> >>>> Metrics::getSwapUsage and move the computation to the implementation of >> >>>> Metrics. Bob may have an opinion. >> >> There was no any new input regarding this so I decided to leave it unchanged. >> > Sorry, I didn?t respond to this. Since the calculation required for getFreeSwapSpaceSize requires retries >> > due to the access of multiple changing values, I think it?s best to leave things as they are so the caller of >> > these methods understands the limitations of the API. >> OK with me. >> > Also, the fact that swap size metrics include memory sizes is fully documented in both the cgroup and docker >> > online documentation so it?s probably best to be consistent. >> > >> >>>> Also it seems correct for the memory related methods to check if >> >>>> (containerMetrics != null && containerMetrics.getMemoryLimit() >= 0). >> >>>> BTW what does it mean if limit == 0? >> >> Per Docker docs the minimum allowed value for memory limit (--memory option) is 4 megabytes. >> >> And if memory limit is unset the return value is -1. Thus, in my understanding the value 0 is only possible >> >> if something went wrong while retrieving this metric. >> > That is true but shouldn?t you return -1 in that case? >> > >> > I originally thought it was ok to fall back to the host data for 0 values but I think its better to return unavailable (-1) >> > I think you might want to change all >= 0 to > 0 and return -1 if any of the values are 0. This would be more consistent. >> +1 >> The javadoc should be changed and returns -1 when it's unavailable and >> the CSR should also be updated to reflect this. I'm sure Joe can >> re-approve the CSR quickly when the fix is reviewed and approved. >> > You should only fall back to the original logic (host values) if container values are set to unlimited. >> > >> +1 >> Mandy >> From mandy.chung at oracle.com Fri Dec 6 21:38:22 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Fri, 6 Dec 2019 13:38:22 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> Message-ID: <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> On 12/6/19 5:59 AM, Bob Vandette wrote: >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >> >> >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >> >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. I thought that the error case we are referring to is limit == 0 which indicates something unexpected goes wrong.? So the compatibility concern should be low.? This is very specific to Metrics implementation for cgroup v1 and let me know if I'm wrong. >> Surely there must always be some information available from the operating environment? I see from the impl file: >> >> // the host data, value 0 indicates that something went wrong while the metric was read and >> // in this case we return "information unavailable" code -1. >> >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > limits. > It's important to consider carefully if the monitoring API indicates an error vs unavailable and an application should continue to run when the monitoring system fails to get the metrics. There are several choices to report "something goes wrong" scenarios (should unlikely happen???): 1. fall back to a random positive value? (e.g. host value) 2. return a negative value 3. throw an exception #3 is not an option as the application is not expecting this.? For #2, the application can filter bad values if desirable. I'm okay if you want to file a JBS issue to follow up and thoroughly look at the cases that the metrics are unavailable and the cases when fails to obtain. >> --- >> >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) >> >> ? or simply (as I commented [1]) ??? System.out.format Mandy [1] https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html From daniil.x.titov at oracle.com Sat Dec 7 01:41:13 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Fri, 06 Dec 2019 17:41:13 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> Message-ID: Hi David, Mandy, and Bob, Thank you for reviewing this fix. Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, but I agree that the changes proposed in the previous version of the webrev increase such probability. I filed the follow-up issue [4] as Mandy suggested. 3. The legacy methods were renamed as David suggested. > src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > ! static int initialized=1; > > Am I reading this right that the code currently fails to actually do the > initialization because of this ??? Yes, currently the code fails to do the initialization but it was unnoticed since method get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" was always -1. > test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > System.out.println(String.format(...) > > Why not simply > > System.out.printf(..) As I tried explain it earlier it would make the tests unstable. System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. Instead it parses the format string into a list of FormatString objects and then iterates over the list. As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find in the output. For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" and "1030762496". [0.304s][trace][os,container] Memory Usage is: 42983424 OperatingSystemMXBean.getFreeMemorySize: 1030758400 [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes [0.305s][trace][os,container] Memory Usage is: 42979328 [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 1030762496 OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) at TestMemoryAwareness.main(TestMemoryAwareness.java:73) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) at java.base/java.lang.Thread.run(Thread.java:832) Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 [4] https://bugs.openjdk.java.net/browse/JDK-8235522 Thank you, Daniil ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: On 12/6/19 5:59 AM, Bob Vandette wrote: >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >> >> >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >> >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. I thought that the error case we are referring to is limit == 0 which indicates something unexpected goes wrong. So the compatibility concern should be low. This is very specific to Metrics implementation for cgroup v1 and let me know if I'm wrong. >> Surely there must always be some information available from the operating environment? I see from the impl file: >> >> // the host data, value 0 indicates that something went wrong while the metric was read and >> // in this case we return "information unavailable" code -1. >> >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > limits. > It's important to consider carefully if the monitoring API indicates an error vs unavailable and an application should continue to run when the monitoring system fails to get the metrics. There are several choices to report "something goes wrong" scenarios (should unlikely happen???): 1. fall back to a random positive value (e.g. host value) 2. return a negative value 3. throw an exception #3 is not an option as the application is not expecting this. For #2, the application can filter bad values if desirable. I'm okay if you want to file a JBS issue to follow up and thoroughly look at the cases that the metrics are unavailable and the cases when fails to obtain. >> --- >> >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) >> >> ? or simply (as I commented [1]) System.out.format Mandy [1] https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html From david.holmes at oracle.com Mon Dec 9 04:49:11 2019 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Dec 2019 14:49:11 +1000 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> Message-ID: <0a412f63-1583-6789-9afa-cebbc968e7c4@oracle.com> Hi Daniil, On 7/12/2019 11:41 am, Daniil Titov wrote: > Hi David, Mandy, and Bob, > > Thank you for reviewing this fix. > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. Okay. > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. Okay. > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > but I agree that the changes proposed in the previous version of the webrev increase such probability. > I filed the follow-up issue [4] as Mandy suggested. I added a comment to the bug. This is potentially a difficult problem to resolve - it all depends on the likelihood of any errors and what they really indicate. > 3. The legacy methods were renamed as David suggested. Thanks! > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> ! static int initialized=1; >> >> Am I reading this right that the code currently fails to actually do the >> initialization because of this ??? > > Yes, currently the code fails to do the initialization but it was unnoticed since method > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > was always -1. So we never try to access the uninitialized counters.cpus array which is good but we still return garbage for counters.jvmTicks and counters.cpuTicks - surely that should have been noticeable? >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) > > As I tried explain it earlier it would make the tests unstable. > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > in the output. Sorry I missed the earlier explanation. I find it somewhat surprising that format() works that way, but without unlimited buffering there will always be a need to flush the outputstream at some point. Thanks, David ----- > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > and "1030762496". > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > [0.305s][trace][os,container] Memory Usage is: 42979328 > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > 1030762496 > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:832) > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > >> > >> > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > >> > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > I thought that the error case we are referring to is limit == 0 which > indicates something unexpected goes wrong. So the compatibility concern > should be low. This is very specific to Metrics implementation for > cgroup v1 and let me know if I'm wrong. > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > >> > >> // the host data, value 0 indicates that something went wrong while the metric was read and > >> // in this case we return "information unavailable" code -1. > >> > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > limits. > > > > It's important to consider carefully if the monitoring API indicates an > error vs unavailable and an application should continue to run when the > monitoring system fails to get the metrics. > > There are several choices to report "something goes wrong" scenarios > (should unlikely happen???): > 1. fall back to a random positive value (e.g. host value) > 2. return a negative value > 3. throw an exception > > #3 is not an option as the application is not expecting this. For #2, > the application can filter bad values if desirable. > > I'm okay if you want to file a JBS issue to follow up and thoroughly > look at the cases that the metrics are unavailable and the cases when > fails to obtain. > > >> --- > >> > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > >> > >> ? > > or simply (as I commented [1]) > System.out.format > > Mandy > [1] > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > From bob.vandette at oracle.com Mon Dec 9 15:17:33 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Mon, 9 Dec 2019 10:17:33 -0500 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> Message-ID: Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException so it?s consistent with the other get functions? Bob. > On Dec 6, 2019, at 8:41 PM, Daniil Titov wrote: > > Hi David, Mandy, and Bob, > > Thank you for reviewing this fix. > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > but I agree that the changes proposed in the previous version of the webrev increase such probability. > I filed the follow-up issue [4] as Mandy suggested. > 3. The legacy methods were renamed as David suggested. > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> ! static int initialized=1; >> >> Am I reading this right that the code currently fails to actually do the >> initialization because of this ??? > > Yes, currently the code fails to do the initialization but it was unnoticed since method > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > was always -1. > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) > > As I tried explain it earlier it would make the tests unstable. > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > in the output. > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > and "1030762496". > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > [0.305s][trace][os,container] Memory Usage is: 42979328 > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > 1030762496 > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:832) > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: >>> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >>> >>> >>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >>> >>> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > I thought that the error case we are referring to is limit == 0 which > indicates something unexpected goes wrong. So the compatibility concern > should be low. This is very specific to Metrics implementation for > cgroup v1 and let me know if I'm wrong. > >>> Surely there must always be some information available from the operating environment? I see from the impl file: >>> >>> // the host data, value 0 indicates that something went wrong while the metric was read and >>> // in this case we return "information unavailable" code -1. >>> >>> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. >> I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. >> Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others >> are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no >> limits. >> > > It's important to consider carefully if the monitoring API indicates an > error vs unavailable and an application should continue to run when the > monitoring system fails to get the metrics. > > There are several choices to report "something goes wrong" scenarios > (should unlikely happen???): > 1. fall back to a random positive value (e.g. host value) > 2. return a negative value > 3. throw an exception > > #3 is not an option as the application is not expecting this. For #2, > the application can filter bad values if desirable. > > I'm okay if you want to file a JBS issue to follow up and thoroughly > look at the cases that the metrics are unavailable and the cases when > fails to obtain. > >>> --- >>> >>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>> >>> System.out.println(String.format(...) >>> >>> Why not simply >>> >>> System.out.printf(..) >>> >>> ? > > or simply (as I commented [1]) > System.out.format > > Mandy > [1] > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > From mandy.chung at oracle.com Mon Dec 9 17:48:51 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 9 Dec 2019 09:48:51 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> Message-ID: <95ab70c9-8be4-13cf-ef90-f36b2e993450@oracle.com> Files:lines requires FilePermission check.? So it needs to be wrapped with doPrivileged.? The readFilePrivileged can unwrap and throw the cause instead like this: ??? static Stream readFilePrivileged(Path path) throws IOException { ???????? try { ???????????? return AccessController.doPrivileged((PrivilegedExceptionAction>) () -> Files.lines(path)); ???????? } catch (PrivilegedActionException e) { ???????????? Throwable x = e.getCause(); ???????????? if (x instanceof IOException) ????????????????? throw (IOException)x; ???????????? if (x instanceof RuntimeException) ????????????????? throw (RuntimeException)x; ???????????? if (x instanceof Error) ????????????????? throw (Error)x; ???????????? throw new InternalError(x); ???????? } ??? } On 12/9/19 7:17 AM, Bob Vandette wrote: > Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException > so it?s consistent with the other get functions? > > Bob. > > >> On Dec 6, 2019, at 8:41 PM, Daniil Titov wrote: >> >> Hi David, Mandy, and Bob, >> >> Thank you for reviewing this fix. >> >> Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) >> 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. >> 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize >> was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. >> I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, >> but I agree that the changes proposed in the previous version of the webrev increase such probability. >> I filed the follow-up issue [4] as Mandy suggested. >> 3. The legacy methods were renamed as David suggested. >> >> >>> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >>> ! static int initialized=1; >>> >>> Am I reading this right that the code currently fails to actually do the >>> initialization because of this ??? >> Yes, currently the code fails to do the initialization but it was unnoticed since method >> get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" >> was always -1. >> >>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>> >>> System.out.println(String.format(...) >>> >>> Why not simply >>> >>> System.out.printf(..) >> As I tried explain it earlier it would make the tests unstable. >> System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. >> Instead it parses the format string into a list of FormatString objects and then iterates over the list. >> As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find >> in the output. >> >> For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" >> and "1030762496". >> >> >> [0.304s][trace][os,container] Memory Usage is: 42983424 >> OperatingSystemMXBean.getFreeMemorySize: 1030758400 >> [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> [0.305s][trace][os,container] Memory Usage is: 42979328 >> [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 >> 1030762496 >> OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 >> >> >> java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr >> >> at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) >> at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) >> at TestMemoryAwareness.main(TestMemoryAwareness.java:73) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.base/java.lang.reflect.Method.invoke(Method.java:564) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) >> at java.base/java.lang.Thread.run(Thread.java:832) >> >> Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. >> >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 >> [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 >> [4] https://bugs.openjdk.java.net/browse/JDK-8235522 >> >> Thank you, >> Daniil >> >> ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: >> >> >> >> On 12/6/19 5:59 AM, Bob Vandette wrote: >>>> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >>>> >>>> >>>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >>>> >>>> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. >> I thought that the error case we are referring to is limit == 0 which >> indicates something unexpected goes wrong. So the compatibility concern >> should be low. This is very specific to Metrics implementation for >> cgroup v1 and let me know if I'm wrong. >> >>>> Surely there must always be some information available from the operating environment? I see from the impl file: >>>> >>>> // the host data, value 0 indicates that something went wrong while the metric was read and >>>> // in this case we return "information unavailable" code -1. >>>> >>>> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. >>> I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. >>> Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others >>> are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no >>> limits. >>> >> It's important to consider carefully if the monitoring API indicates an >> error vs unavailable and an application should continue to run when the >> monitoring system fails to get the metrics. >> >> There are several choices to report "something goes wrong" scenarios >> (should unlikely happen???): >> 1. fall back to a random positive value (e.g. host value) >> 2. return a negative value >> 3. throw an exception >> >> #3 is not an option as the application is not expecting this. For #2, >> the application can filter bad values if desirable. >> >> I'm okay if you want to file a JBS issue to follow up and thoroughly >> look at the cases that the metrics are unavailable and the cases when >> fails to obtain. >> >>>> --- >>>> >>>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>>> >>>> System.out.println(String.format(...) >>>> >>>> Why not simply >>>> >>>> System.out.printf(..) >>>> >>>> ? >> or simply (as I commented [1]) >> System.out.format >> >> Mandy >> [1] >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html >> >> >> >> From daniil.x.titov at oracle.com Mon Dec 9 18:51:15 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 09 Dec 2019 10:51:15 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <95ab70c9-8be4-13cf-ef90-f36b2e993450@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <95ab70c9-8be4-13cf-ef90-f36b2e993450@oracle.com> Message-ID: Hi Mandy and Bob, > Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException > so it?s consistent with the other get functions? In this method both Files.newBufferedReader and return bufferedReader.readLine could throw IOException so for simplicity I just put the whole code block in doPrivileged. On the other side I don't believe that BufferedReader.readline() requires FilePermission checks ( and tests proved that) so we could change this implementation to the following: public static String getStringValue(SubSystem subsystem, String parm) { if (subsystem == null) return null; try (BufferedReader bufferedReader = AccessController.doPrivileged((PrivilegedExceptionAction) () -> { return Files.newBufferedReader(Paths.get(subsystem.path(), parm)); })) { return bufferedReader.readLine(); } catch (PrivilegedActionException | IOException e) { return null; } } Could you please advise are you OK with it or you would like to proceed with the approach Mandy suggested to unwrap PrivilegedActionException exception and throw the cause instead? Thank you, Daniil ?On 12/9/19, 9:48 AM, "Mandy Chung" wrote: Files:lines requires FilePermission check. So it needs to be wrapped with doPrivileged. The readFilePrivileged can unwrap and throw the cause instead like this: static Stream readFilePrivileged(Path path) throws IOException { try { return AccessController.doPrivileged((PrivilegedExceptionAction>) () -> Files.lines(path)); } catch (PrivilegedActionException e) { Throwable x = e.getCause(); if (x instanceof IOException) throw (IOException)x; if (x instanceof RuntimeException) throw (RuntimeException)x; if (x instanceof Error) throw (Error)x; throw new InternalError(x); } } On 12/9/19 7:17 AM, Bob Vandette wrote: > Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException > so it?s consistent with the other get functions? > > Bob. > > >> On Dec 6, 2019, at 8:41 PM, Daniil Titov wrote: >> >> Hi David, Mandy, and Bob, >> >> Thank you for reviewing this fix. >> >> Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) >> 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. >> 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize >> was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. >> I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, >> but I agree that the changes proposed in the previous version of the webrev increase such probability. >> I filed the follow-up issue [4] as Mandy suggested. >> 3. The legacy methods were renamed as David suggested. >> >> >>> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >>> ! static int initialized=1; >>> >>> Am I reading this right that the code currently fails to actually do the >>> initialization because of this ??? >> Yes, currently the code fails to do the initialization but it was unnoticed since method >> get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" >> was always -1. >> >>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>> >>> System.out.println(String.format(...) >>> >>> Why not simply >>> >>> System.out.printf(..) >> As I tried explain it earlier it would make the tests unstable. >> System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. >> Instead it parses the format string into a list of FormatString objects and then iterates over the list. >> As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find >> in the output. >> >> For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" >> and "1030762496". >> >> >> [0.304s][trace][os,container] Memory Usage is: 42983424 >> OperatingSystemMXBean.getFreeMemorySize: 1030758400 >> [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> [0.305s][trace][os,container] Memory Usage is: 42979328 >> [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 >> 1030762496 >> OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 >> >> >> java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr >> >> at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) >> at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) >> at TestMemoryAwareness.main(TestMemoryAwareness.java:73) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.base/java.lang.reflect.Method.invoke(Method.java:564) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) >> at java.base/java.lang.Thread.run(Thread.java:832) >> >> Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. >> >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 >> [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 >> [4] https://bugs.openjdk.java.net/browse/JDK-8235522 >> >> Thank you, >> Daniil >> >> ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: >> >> >> >> On 12/6/19 5:59 AM, Bob Vandette wrote: >>>> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >>>> >>>> >>>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >>>> >>>> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. >> I thought that the error case we are referring to is limit == 0 which >> indicates something unexpected goes wrong. So the compatibility concern >> should be low. This is very specific to Metrics implementation for >> cgroup v1 and let me know if I'm wrong. >> >>>> Surely there must always be some information available from the operating environment? I see from the impl file: >>>> >>>> // the host data, value 0 indicates that something went wrong while the metric was read and >>>> // in this case we return "information unavailable" code -1. >>>> >>>> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. >>> I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. >>> Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others >>> are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no >>> limits. >>> >> It's important to consider carefully if the monitoring API indicates an >> error vs unavailable and an application should continue to run when the >> monitoring system fails to get the metrics. >> >> There are several choices to report "something goes wrong" scenarios >> (should unlikely happen???): >> 1. fall back to a random positive value (e.g. host value) >> 2. return a negative value >> 3. throw an exception >> >> #3 is not an option as the application is not expecting this. For #2, >> the application can filter bad values if desirable. >> >> I'm okay if you want to file a JBS issue to follow up and thoroughly >> look at the cases that the metrics are unavailable and the cases when >> fails to obtain. >> >>>> --- >>>> >>>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>>> >>>> System.out.println(String.format(...) >>>> >>>> Why not simply >>>> >>>> System.out.printf(..) >>>> >>>> ? >> or simply (as I commented [1]) >> System.out.format >> >> Mandy >> [1] >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html >> >> >> >> From daniil.x.titov at oracle.com Mon Dec 9 18:59:21 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 09 Dec 2019 10:59:21 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <89E9C74A-7962-4408-93C5-1AA947FD973D@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <95ab70c9-8be4-13cf-ef90-f36b2e993450@oracle.com> <89E9C74A-7962-4408-93C5-1AA947FD973D@oracle.com> Message-ID: A correction... We could even further simplify it as the following: public static String getStringValue(SubSystem subsystem, String parm) { if (subsystem == null) return null; try (BufferedReader bufferedReader = AccessController.doPrivileged((PrivilegedExceptionAction) () -> Files.newBufferedReader(Paths.get(subsystem.path(), parm)))) { return bufferedReader.readLine(); } catch (PrivilegedActionException | IOException e) { return null; } } Best regards, Daniil ?On 12/9/19, 10:51 AM, "Daniil Titov" wrote: Hi Mandy and Bob, > Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException > so it?s consistent with the other get functions? In this method both Files.newBufferedReader and return bufferedReader.readLine could throw IOException so for simplicity I just put the whole code block in doPrivileged. On the other side I don't believe that BufferedReader.readline() requires FilePermission checks ( and tests proved that) so we could change this implementation to the following: public static String getStringValue(SubSystem subsystem, String parm) { if (subsystem == null) return null; try (BufferedReader bufferedReader = AccessController.doPrivileged((PrivilegedExceptionAction) () -> { return Files.newBufferedReader(Paths.get(subsystem.path(), parm)); })) { return bufferedReader.readLine(); } catch (PrivilegedActionException | IOException e) { return null; } } Could you please advise are you OK with it or you would like to proceed with the approach Mandy suggested to unwrap PrivilegedActionException exception and throw the cause instead? Thank you, Daniil ?On 12/9/19, 9:48 AM, "Mandy Chung" wrote: Files:lines requires FilePermission check. So it needs to be wrapped with doPrivileged. The readFilePrivileged can unwrap and throw the cause instead like this: static Stream readFilePrivileged(Path path) throws IOException { try { return AccessController.doPrivileged((PrivilegedExceptionAction>) () -> Files.lines(path)); } catch (PrivilegedActionException e) { Throwable x = e.getCause(); if (x instanceof IOException) throw (IOException)x; if (x instanceof RuntimeException) throw (RuntimeException)x; if (x instanceof Error) throw (Error)x; throw new InternalError(x); } } On 12/9/19 7:17 AM, Bob Vandette wrote: > Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException > so it?s consistent with the other get functions? > > Bob. > > >> On Dec 6, 2019, at 8:41 PM, Daniil Titov wrote: >> >> Hi David, Mandy, and Bob, >> >> Thank you for reviewing this fix. >> >> Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) >> 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. >> 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize >> was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. >> I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, >> but I agree that the changes proposed in the previous version of the webrev increase such probability. >> I filed the follow-up issue [4] as Mandy suggested. >> 3. The legacy methods were renamed as David suggested. >> >> >>> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >>> ! static int initialized=1; >>> >>> Am I reading this right that the code currently fails to actually do the >>> initialization because of this ??? >> Yes, currently the code fails to do the initialization but it was unnoticed since method >> get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" >> was always -1. >> >>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>> >>> System.out.println(String.format(...) >>> >>> Why not simply >>> >>> System.out.printf(..) >> As I tried explain it earlier it would make the tests unstable. >> System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. >> Instead it parses the format string into a list of FormatString objects and then iterates over the list. >> As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find >> in the output. >> >> For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" >> and "1030762496". >> >> >> [0.304s][trace][os,container] Memory Usage is: 42983424 >> OperatingSystemMXBean.getFreeMemorySize: 1030758400 >> [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> [0.305s][trace][os,container] Memory Usage is: 42979328 >> [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 >> 1030762496 >> OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 >> >> >> java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr >> >> at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) >> at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) >> at TestMemoryAwareness.main(TestMemoryAwareness.java:73) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.base/java.lang.reflect.Method.invoke(Method.java:564) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) >> at java.base/java.lang.Thread.run(Thread.java:832) >> >> Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. >> >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 >> [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 >> [4] https://bugs.openjdk.java.net/browse/JDK-8235522 >> >> Thank you, >> Daniil >> >> ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: >> >> >> >> On 12/6/19 5:59 AM, Bob Vandette wrote: >>>> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >>>> >>>> >>>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >>>> >>>> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. >> I thought that the error case we are referring to is limit == 0 which >> indicates something unexpected goes wrong. So the compatibility concern >> should be low. This is very specific to Metrics implementation for >> cgroup v1 and let me know if I'm wrong. >> >>>> Surely there must always be some information available from the operating environment? I see from the impl file: >>>> >>>> // the host data, value 0 indicates that something went wrong while the metric was read and >>>> // in this case we return "information unavailable" code -1. >>>> >>>> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. >>> I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. >>> Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others >>> are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no >>> limits. >>> >> It's important to consider carefully if the monitoring API indicates an >> error vs unavailable and an application should continue to run when the >> monitoring system fails to get the metrics. >> >> There are several choices to report "something goes wrong" scenarios >> (should unlikely happen???): >> 1. fall back to a random positive value (e.g. host value) >> 2. return a negative value >> 3. throw an exception >> >> #3 is not an option as the application is not expecting this. For #2, >> the application can filter bad values if desirable. >> >> I'm okay if you want to file a JBS issue to follow up and thoroughly >> look at the cases that the metrics are unavailable and the cases when >> fails to obtain. >> >>>> --- >>>> >>>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>>> >>>> System.out.println(String.format(...) >>>> >>>> Why not simply >>>> >>>> System.out.printf(..) >>>> >>>> ? >> or simply (as I commented [1]) >> System.out.format >> >> Mandy >> [1] >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html >> >> >> >> From daniil.x.titov at oracle.com Mon Dec 9 19:31:20 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 09 Dec 2019 11:31:20 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <0a412f63-1583-6789-9afa-cebbc968e7c4@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <0a412f63-1583-6789-9afa-cebbc968e7c4@oracle.com> Message-ID: <447D72BA-1046-4131-9B13-C1F7244D87EA@oracle.com> Hi David, > So we never try to access the uninitialized counters.cpus array which is > good but we still return garbage for counters.jvmTicks and > counters.cpuTicks - surely that should have been noticeable? It only affected the first time the CPU load was requested. Function get_cpuload_internal(...) calls get_totalticks() and get_jvmticks() functions that update these counters. But on the first call, yes, it compares the newly received counters with the garbage. It has the code that seems to be written to somehow mitigate it and in worse case just return 0 or 1.0. But it also could be that there is some other problem this code tries to solve so I'm not sure we should remove these workarounds as a part of the current fix. 274 // seems like we sometimes end up with less kernel ticks when 275 // reading /proc/self/stat a second time, timing issue between cpus? 276 if (pticks->usedKernel < tmp.usedKernel) { 277 kdiff = 0; 278 } else { 279 kdiff = pticks->usedKernel - tmp.usedKernel; 280 } 281 tdiff = pticks->total - tmp.total; 282 udiff = pticks->used - tmp.used; 283 284 if (tdiff == 0) { 285 user_load = 0; 286 } else { 287 if (tdiff < (udiff + kdiff)) { 288 tdiff = udiff + kdiff; 289 } 290 *pkernelLoad = (kdiff / (double)tdiff); 291 // BUG9044876, normalize return values to sane values 292 *pkernelLoad = MAX(*pkernelLoad, 0.0); 293 *pkernelLoad = MIN(*pkernelLoad, 1.0); 294 295 user_load = (udiff / (double)tdiff); 296 user_load = MAX(user_load, 0.0); 297 user_load = MIN(user_load, 1.0); 298 } 299 } Best regards, Daniil ?On 12/8/19, 8:49 PM, "David Holmes" wrote: Hi Daniil, On 7/12/2019 11:41 am, Daniil Titov wrote: > Hi David, Mandy, and Bob, > > Thank you for reviewing this fix. > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. Okay. > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. Okay. > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > but I agree that the changes proposed in the previous version of the webrev increase such probability. > I filed the follow-up issue [4] as Mandy suggested. I added a comment to the bug. This is potentially a difficult problem to resolve - it all depends on the likelihood of any errors and what they really indicate. > 3. The legacy methods were renamed as David suggested. Thanks! > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> ! static int initialized=1; >> >> Am I reading this right that the code currently fails to actually do the >> initialization because of this ??? > > Yes, currently the code fails to do the initialization but it was unnoticed since method > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > was always -1. So we never try to access the uninitialized counters.cpus array which is good but we still return garbage for counters.jvmTicks and counters.cpuTicks - surely that should have been noticeable? >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) > > As I tried explain it earlier it would make the tests unstable. > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > in the output. Sorry I missed the earlier explanation. I find it somewhat surprising that format() works that way, but without unlimited buffering there will always be a need to flush the outputstream at some point. Thanks, David ----- > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > and "1030762496". > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > [0.305s][trace][os,container] Memory Usage is: 42979328 > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > 1030762496 > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:832) > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > >> > >> > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > >> > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > I thought that the error case we are referring to is limit == 0 which > indicates something unexpected goes wrong. So the compatibility concern > should be low. This is very specific to Metrics implementation for > cgroup v1 and let me know if I'm wrong. > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > >> > >> // the host data, value 0 indicates that something went wrong while the metric was read and > >> // in this case we return "information unavailable" code -1. > >> > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > limits. > > > > It's important to consider carefully if the monitoring API indicates an > error vs unavailable and an application should continue to run when the > monitoring system fails to get the metrics. > > There are several choices to report "something goes wrong" scenarios > (should unlikely happen???): > 1. fall back to a random positive value (e.g. host value) > 2. return a negative value > 3. throw an exception > > #3 is not an option as the application is not expecting this. For #2, > the application can filter bad values if desirable. > > I'm okay if you want to file a JBS issue to follow up and thoroughly > look at the cases that the metrics are unavailable and the cases when > fails to obtain. > > >> --- > >> > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > >> > >> ? > > or simply (as I commented [1]) > System.out.format > > Mandy > [1] > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > From mandy.chung at oracle.com Mon Dec 9 21:54:58 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 9 Dec 2019 13:54:58 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <95ab70c9-8be4-13cf-ef90-f36b2e993450@oracle.com> Message-ID: <2b476e9f-0266-eba7-3629-114296a55f29@oracle.com> On 12/9/19 10:51 AM, Daniil Titov wrote: > Hi Mandy and Bob, > >> Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException >> so it?s consistent with the other get functions? > In this method both Files.newBufferedReader and return bufferedReader.readLine could throw IOException so for simplicity I just put > the whole code block in doPrivileged. On the other side I don't believe that BufferedReader.readline() requires FilePermission checks ( and tests proved that) > so we could change this implementation to the following: > > public static String getStringValue(SubSystem subsystem, String parm) { > if (subsystem == null) return null; > > try (BufferedReader bufferedReader = > AccessController.doPrivileged((PrivilegedExceptionAction) () -> { > return Files.newBufferedReader(Paths.get(subsystem.path(), parm)); > })) { > return bufferedReader.readLine(); > } catch (PrivilegedActionException | IOException e) { > return null; > } > } > > Could you please advise are you OK with it or you would like to proceed with the approach Mandy suggested to unwrap > PrivilegedActionException exception and throw the cause instead? > I think it's simpler to read and understand if the doPrivileged call is moved out as a separate method that will throw IOException as the expected functionality as suggested above. For SubSystem::getStringValue, one suggestion would be: diff --git a/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java b/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java --- a/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java +++ b/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java @@ -29,7 +29,11 @@ ?import java.io.IOException; ?import java.math.BigInteger; ?import java.nio.file.Files; +import java.nio.file.Path; ?import java.nio.file.Paths; +import java.security.AccessController; +import java.security.PrivilegedActionException; +import java.security.PrivilegedExceptionAction; ?import java.util.ArrayList; ?import java.util.List; ?import java.util.Optional; @@ -90,9 +94,8 @@ ???? public static String getStringValue(SubSystem subsystem, String parm) { ???????? if (subsystem == null) return null; -??????? try(BufferedReader bufferedReader = Files.newBufferedReader(Paths.get(subsystem.path(), parm))) { -??????????? String line = bufferedReader.readLine(); -??????????? return line; +??????? try { +??????????? return subsystem.readStringValue(parm); ???????? } ???????? catch (IOException e) { ???????????? return null; @@ -100,6 +103,24 @@ ???? } +??? private String readStringValue(String param) throws IOException { +??????? PrivilegedExceptionAction pea = () -> Files.newBufferedReader(Paths.get(path(), param)); +??????? try (BufferedReader bufferedReader = AccessController.doPrivileged(pea)) { +??????????? String line = bufferedReader.readLine(); +??????????? return line; +??????? } catch (PrivilegedActionException e) { +??????????? Throwable x = e.getCause(); +??????????? if (x instanceof IOException) +??????????????? throw (IOException)x; +??????????? if (x instanceof RuntimeException) +??????????????? throw (RuntimeException)x; +??????????? if (x instanceof Error) +??????????????? throw (Error)x; + +??????????? throw new InternalError(x); +??????? } +??? } + From daniil.x.titov at oracle.com Mon Dec 9 23:47:02 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 09 Dec 2019 15:47:02 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <2b476e9f-0266-eba7-3629-114296a55f29@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <95ab70c9-8be4-13cf-ef90-f36b2e993450@oracle.com> <2b476e9f-0266-eba7-3629-114296a55f29@oracle.com> Message-ID: <48F75B58-0529-4B43-A355-A12EB0D09598@oracle.com> Hi Mandy and Bob, Please review a new version of the webrev [1] that moves doPrivileged calls in jdk.internal.platform.cgroupv1.SubSystem to separate methods that throw IOException, as Mandy suggested. Mach5 tests are still running. [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.06/ [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 Thank you, Daniil ?On 12/9/19, 1:55 PM, "Mandy Chung" wrote: On 12/9/19 10:51 AM, Daniil Titov wrote: > Hi Mandy and Bob, > >> Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException >> so it?s consistent with the other get functions? > In this method both Files.newBufferedReader and return bufferedReader.readLine could throw IOException so for simplicity I just put > the whole code block in doPrivileged. On the other side I don't believe that BufferedReader.readline() requires FilePermission checks ( and tests proved that) > so we could change this implementation to the following: > > public static String getStringValue(SubSystem subsystem, String parm) { > if (subsystem == null) return null; > > try (BufferedReader bufferedReader = > AccessController.doPrivileged((PrivilegedExceptionAction) () -> { > return Files.newBufferedReader(Paths.get(subsystem.path(), parm)); > })) { > return bufferedReader.readLine(); > } catch (PrivilegedActionException | IOException e) { > return null; > } > } > > Could you please advise are you OK with it or you would like to proceed with the approach Mandy suggested to unwrap > PrivilegedActionException exception and throw the cause instead? > I think it's simpler to read and understand if the doPrivileged call is moved out as a separate method that will throw IOException as the expected functionality as suggested above. For SubSystem::getStringValue, one suggestion would be: diff --git a/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java b/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java --- a/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java +++ b/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/SubSystem.java @@ -29,7 +29,11 @@ import java.io.IOException; import java.math.BigInteger; import java.nio.file.Files; +import java.nio.file.Path; import java.nio.file.Paths; +import java.security.AccessController; +import java.security.PrivilegedActionException; +import java.security.PrivilegedExceptionAction; import java.util.ArrayList; import java.util.List; import java.util.Optional; @@ -90,9 +94,8 @@ public static String getStringValue(SubSystem subsystem, String parm) { if (subsystem == null) return null; - try(BufferedReader bufferedReader = Files.newBufferedReader(Paths.get(subsystem.path(), parm))) { - String line = bufferedReader.readLine(); - return line; + try { + return subsystem.readStringValue(parm); } catch (IOException e) { return null; @@ -100,6 +103,24 @@ } + private String readStringValue(String param) throws IOException { + PrivilegedExceptionAction pea = () -> Files.newBufferedReader(Paths.get(path(), param)); + try (BufferedReader bufferedReader = AccessController.doPrivileged(pea)) { + String line = bufferedReader.readLine(); + return line; + } catch (PrivilegedActionException e) { + Throwable x = e.getCause(); + if (x instanceof IOException) + throw (IOException)x; + if (x instanceof RuntimeException) + throw (RuntimeException)x; + if (x instanceof Error) + throw (Error)x; + + throw new InternalError(x); + } + } + From serguei.spitsyn at oracle.com Tue Dec 10 02:02:21 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 9 Dec 2019 18:02:21 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> Message-ID: <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> Hi Daniil, It is not a full review, just some minor comments. In fact, I do not see real problems yet. http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html ? 55???? public long getTotalSwapSpaceSize() { ? 56???????? if (containerMetrics != null) { ? 57???????????? long limit = containerMetrics.getMemoryAndSwapLimit(); ? 58???????????? // The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) ? 59???????????? // or if a docker container was started without specifying a memory limit ( without '--memory=' ? 60???????????? // Docker option). In latter case there is no limit on how much memory the container can use and ? 61???????????? // it can use as much memory as the host's OS allows. ? 62???????????? long memLimit = containerMetrics.getMemoryLimit(); ? 63???????????? if (limit >= 0 && memLimit >= 0) { ? 64???????????????? return limit - memLimit; ? 65???????????? } ? 66???????? } ? 67???????? return getTotalSwapSpaceSize0(); ? 68???? } ? Unneeded space after brackets '('. ? Do we need to check if the (limit - memLimit) value is negative? ? The same question is for getFreeSwapSpaceSize(): ??? memSwapLimit - memLimit - (memSwapUsage - memUsage) ? and getFreeMemorySize(): ??? 101 return limit - usage; ? 81???????????????????????? // If this happens just retry the loop for a few iterations ? Dot is missed at the end of comment. http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html ? 34 System.out.println(String.format("Runtime.availableProcessors: %d", Runtime.getRuntime().availableProcessors())); ? 35 System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: %d", osBean.getAvailableProcessors())); ? 36 System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", osBean.getTotalMemorySize())); ? 37 System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: %d", osBean.getTotalPhysicalMemorySize())); ? 38 System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", osBean.getFreeMemorySize())); ? 39 System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", osBean.getFreePhysicalMemorySize())); ? 40 System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: %d", osBean.getTotalSwapSpaceSize())); ? 41 System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: %d", osBean.getFreeSwapSpaceSize())); ? 42 System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", osBean.getCpuLoad())); ? 43 System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", osBean.getSystemCpuLoad())); ? To make the above lines a little bit shorter I'd suggest to define a log() method like this: ???? private static void log(String msg) ( System.out.println(msg(; } ? 34???????? log(String.format("Runtime.availableProcessors: %d", Runtime.getRuntime().availableProcessors())); ? 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: %d", osBean.getAvailableProcessors())); ? 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", osBean.getTotalMemorySize())); ? 37 log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: %d", osBean.getTotalPhysicalMemorySize())); ? 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", osBean.getFreeMemorySize())); ? 39 log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", osBean.getFreePhysicalMemorySize())); ? 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: %d", osBean.getTotalSwapSpaceSize())); ? 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: %d", osBean.getFreeSwapSpaceSize())); ? 42???????? log(String.format("OperatingSystemMXBean.getCpuLoad: %f", osBean.getCpuLoad())); ? 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", osBean.getSystemCpuLoad())); Thanks, Serguei On 12/6/19 17:41, Daniil Titov wrote: > Hi David, Mandy, and Bob, > > Thank you for reviewing this fix. > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > but I agree that the changes proposed in the previous version of the webrev increase such probability. > I filed the follow-up issue [4] as Mandy suggested. > 3. The legacy methods were renamed as David suggested. > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> ! static int initialized=1; >> >> Am I reading this right that the code currently fails to actually do the >> initialization because of this ??? > Yes, currently the code fails to do the initialization but it was unnoticed since method > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > was always -1. > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) > As I tried explain it earlier it would make the tests unstable. > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > in the output. > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > and "1030762496". > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > [0.305s][trace][os,container] Memory Usage is: 42979328 > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > 1030762496 > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:832) > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > >> > >> > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > >> > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > I thought that the error case we are referring to is limit == 0 which > indicates something unexpected goes wrong. So the compatibility concern > should be low. This is very specific to Metrics implementation for > cgroup v1 and let me know if I'm wrong. > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > >> > >> // the host data, value 0 indicates that something went wrong while the metric was read and > >> // in this case we return "information unavailable" code -1. > >> > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > limits. > > > > It's important to consider carefully if the monitoring API indicates an > error vs unavailable and an application should continue to run when the > monitoring system fails to get the metrics. > > There are several choices to report "something goes wrong" scenarios > (should unlikely happen???): > 1. fall back to a random positive value (e.g. host value) > 2. return a negative value > 3. throw an exception > > #3 is not an option as the application is not expecting this. For #2, > the application can filter bad values if desirable. > > I'm okay if you want to file a JBS issue to follow up and thoroughly > look at the cases that the metrics are unavailable and the cases when > fails to obtain. > > >> --- > >> > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > >> > >> ? > > or simply (as I commented [1]) > System.out.format > > Mandy > [1] > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > From mandy.chung at oracle.com Tue Dec 10 06:11:59 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 9 Dec 2019 22:11:59 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <48F75B58-0529-4B43-A355-A12EB0D09598@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <95ab70c9-8be4-13cf-ef90-f36b2e993450@oracle.com> <2b476e9f-0266-eba7-3629-114296a55f29@oracle.com> <48F75B58-0529-4B43-A355-A12EB0D09598@oracle.com> Message-ID: <1dc773d4-14e6-9209-2a4e-190699d88a47@oracle.com> On 12/9/19 3:47 PM, Daniil Titov wrote: > Hi Mandy and Bob, > > Please review a new version of the webrev [1] that moves doPrivileged calls in > jdk.internal.platform.cgroupv1.SubSystem to separate methods that throw > IOException, as Mandy suggested. > > Mach5 tests are still running. > > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.06/ I reviewed Metrics and Subsystem in this version. I? think it's simpler to have unwrapIOExceptionAndRethrow handling the InternalError case. + List lines = subsystem.readMatchingLines(param); + for (String line : lines) { if (line.startsWith(match)) { retval = conversion.apply(line); break; } }This can simply call Metrics::readFilePrivileged and process on the Stream. return Metrics::readFilePrivileged(Paths.get(subsystem.path(), param)) .filter(line -> line.startsWith(match)) .map(conversion::apply) .findFirst().orElseGet(() ->retval); I don't need to see a new webrev. Mandy -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Tue Dec 10 10:11:23 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Dec 2019 20:11:23 +1000 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <447D72BA-1046-4131-9B13-C1F7244D87EA@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <0a412f63-1583-6789-9afa-cebbc968e7c4@oracle.com> <447D72BA-1046-4131-9B13-C1F7244D87EA@oracle.com> Message-ID: <1996939d-54af-c34c-c796-cab3ec92b445@oracle.com> On 10/12/2019 5:31 am, Daniil Titov wrote: > Hi David, > >> So we never try to access the uninitialized counters.cpus array which is >> good but we still return garbage for counters.jvmTicks and >> counters.cpuTicks - surely that should have been noticeable? > > It only affected the first time the CPU load was requested. Function get_cpuload_internal(...) > calls get_totalticks() and get_jvmticks() functions that update these counters. > But on the first call, yes, it compares the newly received counters with the garbage. > > > It has the code that seems to be written to somehow mitigate it and in worse case just return > 0 or 1.0. But it also could be that there is some other problem this code tries to solve so I'm not sure > we should remove these workarounds as a part of the current fix. Please file a follow up RFE to look into this. Thanks, David > 274 // seems like we sometimes end up with less kernel ticks when > 275 // reading /proc/self/stat a second time, timing issue between cpus? > 276 if (pticks->usedKernel < tmp.usedKernel) { > 277 kdiff = 0; > 278 } else { > 279 kdiff = pticks->usedKernel - tmp.usedKernel; > 280 } > 281 tdiff = pticks->total - tmp.total; > 282 udiff = pticks->used - tmp.used; > 283 > 284 if (tdiff == 0) { > 285 user_load = 0; > 286 } else { > 287 if (tdiff < (udiff + kdiff)) { > 288 tdiff = udiff + kdiff; > 289 } > 290 *pkernelLoad = (kdiff / (double)tdiff); > 291 // BUG9044876, normalize return values to sane values > 292 *pkernelLoad = MAX(*pkernelLoad, 0.0); > 293 *pkernelLoad = MIN(*pkernelLoad, 1.0); > 294 > 295 user_load = (udiff / (double)tdiff); > 296 user_load = MAX(user_load, 0.0); > 297 user_load = MIN(user_load, 1.0); > 298 } > 299 } > > Best regards, > Daniil > > ?On 12/8/19, 8:49 PM, "David Holmes" wrote: > > Hi Daniil, > > On 7/12/2019 11:41 am, Daniil Titov wrote: > > Hi David, Mandy, and Bob, > > > > Thank you for reviewing this fix. > > > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > > Okay. > > > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > > Okay. > > > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > > but I agree that the changes proposed in the previous version of the webrev increase such probability. > > I filed the follow-up issue [4] as Mandy suggested. > > I added a comment to the bug. This is potentially a difficult problem to > resolve - it all depends on the likelihood of any errors and what they > really indicate. > > > 3. The legacy methods were renamed as David suggested. > > Thanks! > > > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > >> ! static int initialized=1; > >> > >> Am I reading this right that the code currently fails to actually do the > >> initialization because of this ??? > > > > Yes, currently the code fails to do the initialization but it was unnoticed since method > > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > > was always -1. > > So we never try to access the uninitialized counters.cpus array which is > good but we still return garbage for counters.jvmTicks and > counters.cpuTicks - surely that should have been noticeable? > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > > > > As I tried explain it earlier it would make the tests unstable. > > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > > in the output. > > Sorry I missed the earlier explanation. I find it somewhat surprising > that format() works that way, but without unlimited buffering there will > always be a need to flush the outputstream at some point. > > Thanks, > David > ----- > > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > > and "1030762496". > > > > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > [0.305s][trace][os,container] Memory Usage is: 42979328 > > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > > 1030762496 > > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > > at java.base/java.lang.Thread.run(Thread.java:832) > > > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > > > Thank you, > > Daniil > > > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > > >> > > >> > > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > > >> > > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > > > I thought that the error case we are referring to is limit == 0 which > > indicates something unexpected goes wrong. So the compatibility concern > > should be low. This is very specific to Metrics implementation for > > cgroup v1 and let me know if I'm wrong. > > > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > > >> > > >> // the host data, value 0 indicates that something went wrong while the metric was read and > > >> // in this case we return "information unavailable" code -1. > > >> > > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > > limits. > > > > > > > It's important to consider carefully if the monitoring API indicates an > > error vs unavailable and an application should continue to run when the > > monitoring system fails to get the metrics. > > > > There are several choices to report "something goes wrong" scenarios > > (should unlikely happen???): > > 1. fall back to a random positive value (e.g. host value) > > 2. return a negative value > > 3. throw an exception > > > > #3 is not an option as the application is not expecting this. For #2, > > the application can filter bad values if desirable. > > > > I'm okay if you want to file a JBS issue to follow up and thoroughly > > look at the cases that the metrics are unavailable and the cases when > > fails to obtain. > > > > >> --- > > >> > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > >> > > >> System.out.println(String.format(...) > > >> > > >> Why not simply > > >> > > >> System.out.printf(..) > > >> > > >> ? > > > > or simply (as I commented [1]) > > System.out.format > > > > Mandy > > [1] > > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > > > > > > > > From daniil.x.titov at oracle.com Tue Dec 10 17:49:51 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 10 Dec 2019 09:49:51 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <1996939d-54af-c34c-c796-cab3ec92b445@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <0a412f63-1583-6789-9afa-cebbc968e7c4@oracle.com> <447D72BA-1046-4131-9B13-C1F7244D87EA@oracle.com> <1996939d-54af-c34c-c796-cab3ec92b445@oracle.com> Message-ID: <92CAC22E-53FE-422B-B5E0-EA6C959352FC@oracle.com> Hi David, > Please file a follow up RFE to look into this. I created an issue to follow this up [1] [1] https://bugs.openjdk.java.net/browse/JDK-8235681 Thank you, Daniil ?On 12/10/19, 2:11 AM, "David Holmes" wrote: On 10/12/2019 5:31 am, Daniil Titov wrote: > Hi David, > >> So we never try to access the uninitialized counters.cpus array which is >> good but we still return garbage for counters.jvmTicks and >> counters.cpuTicks - surely that should have been noticeable? > > It only affected the first time the CPU load was requested. Function get_cpuload_internal(...) > calls get_totalticks() and get_jvmticks() functions that update these counters. > But on the first call, yes, it compares the newly received counters with the garbage. > > > It has the code that seems to be written to somehow mitigate it and in worse case just return > 0 or 1.0. But it also could be that there is some other problem this code tries to solve so I'm not sure > we should remove these workarounds as a part of the current fix. Please file a follow up RFE to look into this. Thanks, David > 274 // seems like we sometimes end up with less kernel ticks when > 275 // reading /proc/self/stat a second time, timing issue between cpus? > 276 if (pticks->usedKernel < tmp.usedKernel) { > 277 kdiff = 0; > 278 } else { > 279 kdiff = pticks->usedKernel - tmp.usedKernel; > 280 } > 281 tdiff = pticks->total - tmp.total; > 282 udiff = pticks->used - tmp.used; > 283 > 284 if (tdiff == 0) { > 285 user_load = 0; > 286 } else { > 287 if (tdiff < (udiff + kdiff)) { > 288 tdiff = udiff + kdiff; > 289 } > 290 *pkernelLoad = (kdiff / (double)tdiff); > 291 // BUG9044876, normalize return values to sane values > 292 *pkernelLoad = MAX(*pkernelLoad, 0.0); > 293 *pkernelLoad = MIN(*pkernelLoad, 1.0); > 294 > 295 user_load = (udiff / (double)tdiff); > 296 user_load = MAX(user_load, 0.0); > 297 user_load = MIN(user_load, 1.0); > 298 } > 299 } > > Best regards, > Daniil > > ?On 12/8/19, 8:49 PM, "David Holmes" wrote: > > Hi Daniil, > > On 7/12/2019 11:41 am, Daniil Titov wrote: > > Hi David, Mandy, and Bob, > > > > Thank you for reviewing this fix. > > > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > > Okay. > > > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > > Okay. > > > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > > but I agree that the changes proposed in the previous version of the webrev increase such probability. > > I filed the follow-up issue [4] as Mandy suggested. > > I added a comment to the bug. This is potentially a difficult problem to > resolve - it all depends on the likelihood of any errors and what they > really indicate. > > > 3. The legacy methods were renamed as David suggested. > > Thanks! > > > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > >> ! static int initialized=1; > >> > >> Am I reading this right that the code currently fails to actually do the > >> initialization because of this ??? > > > > Yes, currently the code fails to do the initialization but it was unnoticed since method > > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > > was always -1. > > So we never try to access the uninitialized counters.cpus array which is > good but we still return garbage for counters.jvmTicks and > counters.cpuTicks - surely that should have been noticeable? > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > > > > As I tried explain it earlier it would make the tests unstable. > > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > > in the output. > > Sorry I missed the earlier explanation. I find it somewhat surprising > that format() works that way, but without unlimited buffering there will > always be a need to flush the outputstream at some point. > > Thanks, > David > ----- > > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > > and "1030762496". > > > > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > [0.305s][trace][os,container] Memory Usage is: 42979328 > > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > > 1030762496 > > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > > at java.base/java.lang.Thread.run(Thread.java:832) > > > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > > > Thank you, > > Daniil > > > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > > >> > > >> > > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > > >> > > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > > > I thought that the error case we are referring to is limit == 0 which > > indicates something unexpected goes wrong. So the compatibility concern > > should be low. This is very specific to Metrics implementation for > > cgroup v1 and let me know if I'm wrong. > > > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > > >> > > >> // the host data, value 0 indicates that something went wrong while the metric was read and > > >> // in this case we return "information unavailable" code -1. > > >> > > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > > limits. > > > > > > > It's important to consider carefully if the monitoring API indicates an > > error vs unavailable and an application should continue to run when the > > monitoring system fails to get the metrics. > > > > There are several choices to report "something goes wrong" scenarios > > (should unlikely happen???): > > 1. fall back to a random positive value (e.g. host value) > > 2. return a negative value > > 3. throw an exception > > > > #3 is not an option as the application is not expecting this. For #2, > > the application can filter bad values if desirable. > > > > I'm okay if you want to file a JBS issue to follow up and thoroughly > > look at the cases that the metrics are unavailable and the cases when > > fails to obtain. > > > > >> --- > > >> > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > >> > > >> System.out.println(String.format(...) > > >> > > >> Why not simply > > >> > > >> System.out.printf(..) > > >> > > >> ? > > > > or simply (as I commented [1]) > > System.out.format > > > > Mandy > > [1] > > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > > > > > > > > From daniil.x.titov at oracle.com Tue Dec 10 18:29:56 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Tue, 10 Dec 2019 10:29:56 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> Message-ID: <7720D36B-0505-49E3-8424-76ACEACAF0AB@oracle.com> Hi Serguei, > Do we need to check if the (limit - memLimit) value is negative? > The same question is for getFreeSwapSpaceSize(): > memSwapLimit - memLimit - (memSwapUsage - memUsage) > > and getFreeMemorySize(): > 101 return limit - usage; I don't think we need such check here. If it happens in fact it means the serious system malfunction and a negative value this method returns would indicate this (currently the native methods already returns -1 if something went wrong). But we could revise it in the follow up issue I created for that [1]. [1] https://bugs.openjdk.java.net/browse/JDK-8235522 Thank you, Daniil ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: Hi Daniil, It is not a full review, just some minor comments. In fact, I do not see real problems yet. http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html 55 public long getTotalSwapSpaceSize() { 56 if (containerMetrics != null) { 57 long limit = containerMetrics.getMemoryAndSwapLimit(); 58 // The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) 59 // or if a docker container was started without specifying a memory limit ( without '--memory=' 60 // Docker option). In latter case there is no limit on how much memory the container can use and 61 // it can use as much memory as the host's OS allows. 62 long memLimit = containerMetrics.getMemoryLimit(); 63 if (limit >= 0 && memLimit >= 0) { 64 return limit - memLimit; 65 } 66 } 67 return getTotalSwapSpaceSize0(); 68 } Unneeded space after brackets '('. Do we need to check if the (limit - memLimit) value is negative? The same question is for getFreeSwapSpaceSize(): memSwapLimit - memLimit - (memSwapUsage - memUsage) and getFreeMemorySize(): 101 return limit - usage; 81 // If this happens just retry the loop for a few iterations Dot is missed at the end of comment. http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html 34 System.out.println(String.format("Runtime.availableProcessors: %d", Runtime.getRuntime().availableProcessors())); 35 System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: %d", osBean.getAvailableProcessors())); 36 System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", osBean.getTotalMemorySize())); 37 System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: %d", osBean.getTotalPhysicalMemorySize())); 38 System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", osBean.getFreeMemorySize())); 39 System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", osBean.getFreePhysicalMemorySize())); 40 System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: %d", osBean.getTotalSwapSpaceSize())); 41 System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: %d", osBean.getFreeSwapSpaceSize())); 42 System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", osBean.getCpuLoad())); 43 System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", osBean.getSystemCpuLoad())); To make the above lines a little bit shorter I'd suggest to define a log() method like this: private static void log(String msg) ( System.out.println(msg(; } 34 log(String.format("Runtime.availableProcessors: %d", Runtime.getRuntime().availableProcessors())); 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: %d", osBean.getAvailableProcessors())); 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", osBean.getTotalMemorySize())); 37 log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: %d", osBean.getTotalPhysicalMemorySize())); 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", osBean.getFreeMemorySize())); 39 log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", osBean.getFreePhysicalMemorySize())); 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: %d", osBean.getTotalSwapSpaceSize())); 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: %d", osBean.getFreeSwapSpaceSize())); 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", osBean.getCpuLoad())); 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", osBean.getSystemCpuLoad())); Thanks, Serguei On 12/6/19 17:41, Daniil Titov wrote: > Hi David, Mandy, and Bob, > > Thank you for reviewing this fix. > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > but I agree that the changes proposed in the previous version of the webrev increase such probability. > I filed the follow-up issue [4] as Mandy suggested. > 3. The legacy methods were renamed as David suggested. > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> ! static int initialized=1; >> >> Am I reading this right that the code currently fails to actually do the >> initialization because of this ??? > Yes, currently the code fails to do the initialization but it was unnoticed since method > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > was always -1. > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) > As I tried explain it earlier it would make the tests unstable. > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > in the output. > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > and "1030762496". > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > [0.305s][trace][os,container] Memory Usage is: 42979328 > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > 1030762496 > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:832) > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > >> > >> > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > >> > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > I thought that the error case we are referring to is limit == 0 which > indicates something unexpected goes wrong. So the compatibility concern > should be low. This is very specific to Metrics implementation for > cgroup v1 and let me know if I'm wrong. > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > >> > >> // the host data, value 0 indicates that something went wrong while the metric was read and > >> // in this case we return "information unavailable" code -1. > >> > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > limits. > > > > It's important to consider carefully if the monitoring API indicates an > error vs unavailable and an application should continue to run when the > monitoring system fails to get the metrics. > > There are several choices to report "something goes wrong" scenarios > (should unlikely happen???): > 1. fall back to a random positive value (e.g. host value) > 2. return a negative value > 3. throw an exception > > #3 is not an option as the application is not expecting this. For #2, > the application can filter bad values if desirable. > > I'm okay if you want to file a JBS issue to follow up and thoroughly > look at the cases that the metrics are unavailable and the cases when > fails to obtain. > > >> --- > >> > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > >> > >> ? > > or simply (as I commented [1]) > System.out.format > > Mandy > [1] > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > From daniil.x.titov at oracle.com Wed Dec 11 17:12:40 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 11 Dec 2019 09:12:40 -0800 (PST) Subject: jmx-dev 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> Message-ID: <2A286F91-6CB8-4A32-B103-2C28D52E5A1A@oracle.com> Hi Serguei, Thank you for your comments. I will correct this nits before pushing the changes. Hi Bob and David, > [Mandy Chung] >> I reviewed Metrics and Subsystem in this version. >> I don't need to see a new webrev. As I understood Mandy finished reviewing this fix. Just wanted to confirm with you if you are okey with that version of the fix (webrev.06) ? Mach5 testing: tier1-tier6 and open/test/hotspot/jtreg/containers/docker tests passed. Thank you, Daniil ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: Hi Daniil, It is not a full review, just some minor comments. In fact, I do not see real problems yet. http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html 55 public long getTotalSwapSpaceSize() { 56 if (containerMetrics != null) { 57 long limit = containerMetrics.getMemoryAndSwapLimit(); 58 // The memory limit metrics is not available if JVM runs on Linux host ( not in a docker container) 59 // or if a docker container was started without specifying a memory limit ( without '--memory=' 60 // Docker option). In latter case there is no limit on how much memory the container can use and 61 // it can use as much memory as the host's OS allows. 62 long memLimit = containerMetrics.getMemoryLimit(); 63 if (limit >= 0 && memLimit >= 0) { 64 return limit - memLimit; 65 } 66 } 67 return getTotalSwapSpaceSize0(); 68 } Unneeded space after brackets '('. Do we need to check if the (limit - memLimit) value is negative? The same question is for getFreeSwapSpaceSize(): memSwapLimit - memLimit - (memSwapUsage - memUsage) and getFreeMemorySize(): 101 return limit - usage; 81 // If this happens just retry the loop for a few iterations Dot is missed at the end of comment. http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html 34 System.out.println(String.format("Runtime.availableProcessors: %d", Runtime.getRuntime().availableProcessors())); 35 System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: %d", osBean.getAvailableProcessors())); 36 System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", osBean.getTotalMemorySize())); 37 System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: %d", osBean.getTotalPhysicalMemorySize())); 38 System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", osBean.getFreeMemorySize())); 39 System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", osBean.getFreePhysicalMemorySize())); 40 System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: %d", osBean.getTotalSwapSpaceSize())); 41 System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: %d", osBean.getFreeSwapSpaceSize())); 42 System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", osBean.getCpuLoad())); 43 System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", osBean.getSystemCpuLoad())); To make the above lines a little bit shorter I'd suggest to define a log() method like this: private static void log(String msg) ( System.out.println(msg(; } 34 log(String.format("Runtime.availableProcessors: %d", Runtime.getRuntime().availableProcessors())); 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: %d", osBean.getAvailableProcessors())); 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", osBean.getTotalMemorySize())); 37 log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: %d", osBean.getTotalPhysicalMemorySize())); 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", osBean.getFreeMemorySize())); 39 log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", osBean.getFreePhysicalMemorySize())); 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: %d", osBean.getTotalSwapSpaceSize())); 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: %d", osBean.getFreeSwapSpaceSize())); 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", osBean.getCpuLoad())); 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", osBean.getSystemCpuLoad())); Thanks, Serguei On 12/6/19 17:41, Daniil Titov wrote: > Hi David, Mandy, and Bob, > > Thank you for reviewing this fix. > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > but I agree that the changes proposed in the previous version of the webrev increase such probability. > I filed the follow-up issue [4] as Mandy suggested. > 3. The legacy methods were renamed as David suggested. > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> ! static int initialized=1; >> >> Am I reading this right that the code currently fails to actually do the >> initialization because of this ??? > Yes, currently the code fails to do the initialization but it was unnoticed since method > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > was always -1. > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) > As I tried explain it earlier it would make the tests unstable. > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > in the output. > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > and "1030762496". > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > [0.305s][trace][os,container] Memory Usage is: 42979328 > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > 1030762496 > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:832) > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > >> > >> > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > >> > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > I thought that the error case we are referring to is limit == 0 which > indicates something unexpected goes wrong. So the compatibility concern > should be low. This is very specific to Metrics implementation for > cgroup v1 and let me know if I'm wrong. > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > >> > >> // the host data, value 0 indicates that something went wrong while the metric was read and > >> // in this case we return "information unavailable" code -1. > >> > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > limits. > > > > It's important to consider carefully if the monitoring API indicates an > error vs unavailable and an application should continue to run when the > monitoring system fails to get the metrics. > > There are several choices to report "something goes wrong" scenarios > (should unlikely happen???): > 1. fall back to a random positive value (e.g. host value) > 2. return a negative value > 3. throw an exception > > #3 is not an option as the application is not expecting this. For #2, > the application can filter bad values if desirable. > > I'm okay if you want to file a JBS issue to follow up and thoroughly > look at the cases that the metrics are unavailable and the cases when > fails to obtain. > > >> --- > >> > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > >> > >> ? > > or simply (as I commented [1]) > System.out.format > > Mandy > [1] > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > From bob.vandette at oracle.com Wed Dec 11 17:21:11 2019 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 11 Dec 2019 12:21:11 -0500 Subject: jmx-dev 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <2A286F91-6CB8-4A32-B103-2C28D52E5A1A@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> <2A286F91-6CB8-4A32-B103-2C28D52E5A1A@oracle.com> Message-ID: Yes, I defer to Mandy on the best way to express the various Java exceptions. I?m ok with the changes. Thanks for getting this done for JDK14! Bob. > On Dec 11, 2019, at 12:12 PM, Daniil Titov wrote: > > Hi Serguei, > > Thank you for your comments. I will correct this nits before pushing the changes. > > Hi Bob and David, > >> [Mandy Chung] >>> I reviewed Metrics and Subsystem in this version. >>> I don't need to see a new webrev. > > As I understood Mandy finished reviewing this fix. Just wanted to confirm with you if you are okey with that version of the fix (webrev.06) ? > > Mach5 testing: tier1-tier6 and open/test/hotspot/jtreg/containers/docker tests passed. > > Thank you, > Daniil > > > > ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: > > Hi Daniil, > > It is not a full review, just some minor comments. > In fact, I do not see real problems yet. > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html > > 55 public long getTotalSwapSpaceSize() { > 56 if (containerMetrics != null) { > 57 long limit = containerMetrics.getMemoryAndSwapLimit(); > 58 // The memory limit metrics is not available if JVM > runs on Linux host ( not in a docker container) > 59 // or if a docker container was started without > specifying a memory limit ( without '--memory=' > 60 // Docker option). In latter case there is no limit on > how much memory the container can use and > 61 // it can use as much memory as the host's OS allows. > 62 long memLimit = containerMetrics.getMemoryLimit(); > 63 if (limit >= 0 && memLimit >= 0) { > 64 return limit - memLimit; > 65 } > 66 } > 67 return getTotalSwapSpaceSize0(); > 68 } > > Unneeded space after brackets '('. > Do we need to check if the (limit - memLimit) value is negative? > The same question is for getFreeSwapSpaceSize(): > memSwapLimit - memLimit - (memSwapUsage - memUsage) > > and getFreeMemorySize(): > 101 return limit - usage; > > 81 // If this happens just retry the loop for > a few iterations > > Dot is missed at the end of comment. > > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html > > 34 System.out.println(String.format("Runtime.availableProcessors: > %d", Runtime.getRuntime().availableProcessors())); > 35 > System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 > System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: > %d", osBean.getTotalMemorySize())); > 37 > System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 > System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: > %d", osBean.getFreeMemorySize())); > 39 > System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: > %d", osBean.getFreePhysicalMemorySize())); > 40 > System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 > System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 > System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 > System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: > %f", osBean.getSystemCpuLoad())); > > > To make the above lines a little bit shorter I'd suggest to define a > log() method like this: > private static void log(String msg) ( System.out.println(msg(; } > > 34 log(String.format("Runtime.availableProcessors: %d", > Runtime.getRuntime().availableProcessors())); > 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", > osBean.getTotalMemorySize())); > 37 > log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", > osBean.getFreeMemorySize())); > 39 > log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", > osBean.getFreePhysicalMemorySize())); > 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", > osBean.getSystemCpuLoad())); > > > Thanks, > Serguei > > > > On 12/6/19 17:41, Daniil Titov wrote: >> Hi David, Mandy, and Bob, >> >> Thank you for reviewing this fix. >> >> Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) >> 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. >> 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize >> was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. >> I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, >> but I agree that the changes proposed in the previous version of the webrev increase such probability. >> I filed the follow-up issue [4] as Mandy suggested. >> 3. The legacy methods were renamed as David suggested. >> >> >>> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >>> ! static int initialized=1; >>> >>> Am I reading this right that the code currently fails to actually do the >>> initialization because of this ??? >> Yes, currently the code fails to do the initialization but it was unnoticed since method >> get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" >> was always -1. >> >>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>> >>> System.out.println(String.format(...) >>> >>> Why not simply >>> >>> System.out.printf(..) >> As I tried explain it earlier it would make the tests unstable. >> System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. >> Instead it parses the format string into a list of FormatString objects and then iterates over the list. >> As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find >> in the output. >> >> For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" >> and "1030762496". >> >> >> [0.304s][trace][os,container] Memory Usage is: 42983424 >> OperatingSystemMXBean.getFreeMemorySize: 1030758400 >> [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> [0.305s][trace][os,container] Memory Usage is: 42979328 >> [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 >> 1030762496 >> OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 >> >> >> java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr >> >> at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) >> at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) >> at TestMemoryAwareness.main(TestMemoryAwareness.java:73) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.base/java.lang.reflect.Method.invoke(Method.java:564) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) >> at java.base/java.lang.Thread.run(Thread.java:832) >> >> Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. >> >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 >> [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 >> [4] https://bugs.openjdk.java.net/browse/JDK-8235522 >> >> Thank you, >> Daniil >> >> ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: >> >> >> >> On 12/6/19 5:59 AM, Bob Vandette wrote: >>>> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >>>> >>>> >>>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >>>> >>>> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. >> >> I thought that the error case we are referring to is limit == 0 which >> indicates something unexpected goes wrong. So the compatibility concern >> should be low. This is very specific to Metrics implementation for >> cgroup v1 and let me know if I'm wrong. >> >>>> Surely there must always be some information available from the operating environment? I see from the impl file: >>>> >>>> // the host data, value 0 indicates that something went wrong while the metric was read and >>>> // in this case we return "information unavailable" code -1. >>>> >>>> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. >>> I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. >>> Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others >>> are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no >>> limits. >>> >> >> It's important to consider carefully if the monitoring API indicates an >> error vs unavailable and an application should continue to run when the >> monitoring system fails to get the metrics. >> >> There are several choices to report "something goes wrong" scenarios >> (should unlikely happen???): >> 1. fall back to a random positive value (e.g. host value) >> 2. return a negative value >> 3. throw an exception >> >> #3 is not an option as the application is not expecting this. For #2, >> the application can filter bad values if desirable. >> >> I'm okay if you want to file a JBS issue to follow up and thoroughly >> look at the cases that the metrics are unavailable and the cases when >> fails to obtain. >> >>>> --- >>>> >>>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>>> >>>> System.out.println(String.format(...) >>>> >>>> Why not simply >>>> >>>> System.out.printf(..) >>>> >>>> ? >> >> or simply (as I commented [1]) >> System.out.format >> >> Mandy >> [1] >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html >> >> >> >> > > > > From serguei.spitsyn at oracle.com Wed Dec 11 23:13:34 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 11 Dec 2019 15:13:34 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <7720D36B-0505-49E3-8424-76ACEACAF0AB@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> <7720D36B-0505-49E3-8424-76ACEACAF0AB@oracle.com> Message-ID: <3172f6e2-5330-91eb-35b4-03bec407ee5b@oracle.com> Hi Daniil, One my concerns was a non-atomic read of multiple metrics before comparison. It creates a potential to get a mismatch in result. However, the probability to get a negative value is pretty low, I think. The other concern (if incorrect metrics are returned) is covered by JDK-8235522. Revising all concerns in JDK-8235522 sounds good to me. Thanks, Serguei On 12/10/19 10:29, Daniil Titov wrote: > Hi Serguei, > >> Do we need to check if the (limit - memLimit) value is negative? >> The same question is for getFreeSwapSpaceSize(): >> memSwapLimit - memLimit - (memSwapUsage - memUsage) >> >> and getFreeMemorySize(): >> 101 return limit - usage; > I don't think we need such check here. If it happens in fact it means the serious system malfunction and a negative value this method > returns would indicate this (currently the native methods already returns -1 if something went wrong). But we could revise it in the follow > up issue I created for that [1]. > > [1] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: > > Hi Daniil, > > It is not a full review, just some minor comments. > In fact, I do not see real problems yet. > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html > > 55 public long getTotalSwapSpaceSize() { > 56 if (containerMetrics != null) { > 57 long limit = containerMetrics.getMemoryAndSwapLimit(); > 58 // The memory limit metrics is not available if JVM > runs on Linux host ( not in a docker container) > 59 // or if a docker container was started without > specifying a memory limit ( without '--memory=' > 60 // Docker option). In latter case there is no limit on > how much memory the container can use and > 61 // it can use as much memory as the host's OS allows. > 62 long memLimit = containerMetrics.getMemoryLimit(); > 63 if (limit >= 0 && memLimit >= 0) { > 64 return limit - memLimit; > 65 } > 66 } > 67 return getTotalSwapSpaceSize0(); > 68 } > > Unneeded space after brackets '('. > Do we need to check if the (limit - memLimit) value is negative? > The same question is for getFreeSwapSpaceSize(): > memSwapLimit - memLimit - (memSwapUsage - memUsage) > > and getFreeMemorySize(): > 101 return limit - usage; > > 81 // If this happens just retry the loop for > a few iterations > > Dot is missed at the end of comment. > > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html > > 34 System.out.println(String.format("Runtime.availableProcessors: > %d", Runtime.getRuntime().availableProcessors())); > 35 > System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 > System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: > %d", osBean.getTotalMemorySize())); > 37 > System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 > System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: > %d", osBean.getFreeMemorySize())); > 39 > System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: > %d", osBean.getFreePhysicalMemorySize())); > 40 > System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 > System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 > System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 > System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: > %f", osBean.getSystemCpuLoad())); > > > To make the above lines a little bit shorter I'd suggest to define a > log() method like this: > private static void log(String msg) ( System.out.println(msg(; } > > 34 log(String.format("Runtime.availableProcessors: %d", > Runtime.getRuntime().availableProcessors())); > 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", > osBean.getTotalMemorySize())); > 37 > log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", > osBean.getFreeMemorySize())); > 39 > log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", > osBean.getFreePhysicalMemorySize())); > 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", > osBean.getSystemCpuLoad())); > > > Thanks, > Serguei > > > > On 12/6/19 17:41, Daniil Titov wrote: > > Hi David, Mandy, and Bob, > > > > Thank you for reviewing this fix. > > > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > > but I agree that the changes proposed in the previous version of the webrev increase such probability. > > I filed the follow-up issue [4] as Mandy suggested. > > 3. The legacy methods were renamed as David suggested. > > > > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > >> ! static int initialized=1; > >> > >> Am I reading this right that the code currently fails to actually do the > >> initialization because of this ??? > > Yes, currently the code fails to do the initialization but it was unnoticed since method > > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > > was always -1. > > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > > As I tried explain it earlier it would make the tests unstable. > > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > > in the output. > > > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > > and "1030762496". > > > > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > [0.305s][trace][os,container] Memory Usage is: 42979328 > > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > > 1030762496 > > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > > at java.base/java.lang.Thread.run(Thread.java:832) > > > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > > > Thank you, > > Daniil > > > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > > >> > > >> > > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > > >> > > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > > > I thought that the error case we are referring to is limit == 0 which > > indicates something unexpected goes wrong. So the compatibility concern > > should be low. This is very specific to Metrics implementation for > > cgroup v1 and let me know if I'm wrong. > > > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > > >> > > >> // the host data, value 0 indicates that something went wrong while the metric was read and > > >> // in this case we return "information unavailable" code -1. > > >> > > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > > limits. > > > > > > > It's important to consider carefully if the monitoring API indicates an > > error vs unavailable and an application should continue to run when the > > monitoring system fails to get the metrics. > > > > There are several choices to report "something goes wrong" scenarios > > (should unlikely happen???): > > 1. fall back to a random positive value (e.g. host value) > > 2. return a negative value > > 3. throw an exception > > > > #3 is not an option as the application is not expecting this. For #2, > > the application can filter bad values if desirable. > > > > I'm okay if you want to file a JBS issue to follow up and thoroughly > > look at the cases that the metrics are unavailable and the cases when > > fails to obtain. > > > > >> --- > > >> > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > >> > > >> System.out.println(String.format(...) > > >> > > >> Why not simply > > >> > > >> System.out.printf(..) > > >> > > >> ? > > > > or simply (as I commented [1]) > > System.out.format > > > > Mandy > > [1] > > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > > > > > > > > > From daniil.x.titov at oracle.com Wed Dec 11 23:33:05 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 11 Dec 2019 15:33:05 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: <3172f6e2-5330-91eb-35b4-03bec407ee5b@oracle.com> References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> <7720D36B-0505-49E3-8424-76ACEACAF0AB@oracle.com> <3172f6e2-5330-91eb-35b4-03bec407ee5b@oracle.com> Message-ID: Hi Serguei, Thank you for reviewing this change. Just wanted to add that the only "volatile" metrics are "usage" ones ( memoryUsage and memoryAndSwapLimit). The "limit" metrics (memoryLimit and memoryAndSwapLimit) are set when the container starts and are not subjects to change. The only method that reads more than one "volatile" metric is getFreeSwapSpaceSize() and it has a code that retries if the calculated swapUsage is negative as a result of non-atomic reads. Thank you, Daniil ?On 12/11/19, 3:13 PM, "serguei.spitsyn at oracle.com" wrote: Hi Daniil, One my concerns was a non-atomic read of multiple metrics before comparison. It creates a potential to get a mismatch in result. However, the probability to get a negative value is pretty low, I think. The other concern (if incorrect metrics are returned) is covered by JDK-8235522. Revising all concerns in JDK-8235522 sounds good to me. Thanks, Serguei On 12/10/19 10:29, Daniil Titov wrote: > Hi Serguei, > >> Do we need to check if the (limit - memLimit) value is negative? >> The same question is for getFreeSwapSpaceSize(): >> memSwapLimit - memLimit - (memSwapUsage - memUsage) >> >> and getFreeMemorySize(): >> 101 return limit - usage; > I don't think we need such check here. If it happens in fact it means the serious system malfunction and a negative value this method > returns would indicate this (currently the native methods already returns -1 if something went wrong). But we could revise it in the follow > up issue I created for that [1]. > > [1] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: > > Hi Daniil, > > It is not a full review, just some minor comments. > In fact, I do not see real problems yet. > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html > > 55 public long getTotalSwapSpaceSize() { > 56 if (containerMetrics != null) { > 57 long limit = containerMetrics.getMemoryAndSwapLimit(); > 58 // The memory limit metrics is not available if JVM > runs on Linux host ( not in a docker container) > 59 // or if a docker container was started without > specifying a memory limit ( without '--memory=' > 60 // Docker option). In latter case there is no limit on > how much memory the container can use and > 61 // it can use as much memory as the host's OS allows. > 62 long memLimit = containerMetrics.getMemoryLimit(); > 63 if (limit >= 0 && memLimit >= 0) { > 64 return limit - memLimit; > 65 } > 66 } > 67 return getTotalSwapSpaceSize0(); > 68 } > > Unneeded space after brackets '('. > Do we need to check if the (limit - memLimit) value is negative? > The same question is for getFreeSwapSpaceSize(): > memSwapLimit - memLimit - (memSwapUsage - memUsage) > > and getFreeMemorySize(): > 101 return limit - usage; > > 81 // If this happens just retry the loop for > a few iterations > > Dot is missed at the end of comment. > > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html > > 34 System.out.println(String.format("Runtime.availableProcessors: > %d", Runtime.getRuntime().availableProcessors())); > 35 > System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 > System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: > %d", osBean.getTotalMemorySize())); > 37 > System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 > System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: > %d", osBean.getFreeMemorySize())); > 39 > System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: > %d", osBean.getFreePhysicalMemorySize())); > 40 > System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 > System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 > System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 > System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: > %f", osBean.getSystemCpuLoad())); > > > To make the above lines a little bit shorter I'd suggest to define a > log() method like this: > private static void log(String msg) ( System.out.println(msg(; } > > 34 log(String.format("Runtime.availableProcessors: %d", > Runtime.getRuntime().availableProcessors())); > 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", > osBean.getTotalMemorySize())); > 37 > log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", > osBean.getFreeMemorySize())); > 39 > log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", > osBean.getFreePhysicalMemorySize())); > 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", > osBean.getSystemCpuLoad())); > > > Thanks, > Serguei > > > > On 12/6/19 17:41, Daniil Titov wrote: > > Hi David, Mandy, and Bob, > > > > Thank you for reviewing this fix. > > > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > > but I agree that the changes proposed in the previous version of the webrev increase such probability. > > I filed the follow-up issue [4] as Mandy suggested. > > 3. The legacy methods were renamed as David suggested. > > > > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > >> ! static int initialized=1; > >> > >> Am I reading this right that the code currently fails to actually do the > >> initialization because of this ??? > > Yes, currently the code fails to do the initialization but it was unnoticed since method > > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > > was always -1. > > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > > As I tried explain it earlier it would make the tests unstable. > > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > > in the output. > > > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > > and "1030762496". > > > > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > [0.305s][trace][os,container] Memory Usage is: 42979328 > > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > > 1030762496 > > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > > at java.base/java.lang.Thread.run(Thread.java:832) > > > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > > > Thank you, > > Daniil > > > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > > >> > > >> > > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > > >> > > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > > > I thought that the error case we are referring to is limit == 0 which > > indicates something unexpected goes wrong. So the compatibility concern > > should be low. This is very specific to Metrics implementation for > > cgroup v1 and let me know if I'm wrong. > > > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > > >> > > >> // the host data, value 0 indicates that something went wrong while the metric was read and > > >> // in this case we return "information unavailable" code -1. > > >> > > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > > limits. > > > > > > > It's important to consider carefully if the monitoring API indicates an > > error vs unavailable and an application should continue to run when the > > monitoring system fails to get the metrics. > > > > There are several choices to report "something goes wrong" scenarios > > (should unlikely happen???): > > 1. fall back to a random positive value (e.g. host value) > > 2. return a negative value > > 3. throw an exception > > > > #3 is not an option as the application is not expecting this. For #2, > > the application can filter bad values if desirable. > > > > I'm okay if you want to file a JBS issue to follow up and thoroughly > > look at the cases that the metrics are unavailable and the cases when > > fails to obtain. > > > > >> --- > > >> > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > >> > > >> System.out.println(String.format(...) > > >> > > >> Why not simply > > >> > > >> System.out.printf(..) > > >> > > >> ? > > > > or simply (as I commented [1]) > > System.out.format > > > > Mandy > > [1] > > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > > > > > > > > > From daniil.x.titov at oracle.com Wed Dec 11 23:35:10 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 11 Dec 2019 15:35:10 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> <7720D36B-0505-49E3-8424-76ACEACAF0AB@oracle.com> <3172f6e2-5330-91eb-35b4-03bec407ee5b@oracle.com> Message-ID: <8E4D2767-A223-46AC-A541-D51CC5D3D7AF@oracle.com> Typo fixed... .. that the only "volatile" metrics are "usage" ones ( memoryUsage and *memoryAndSwapUsage*). Best regards, Daniil ?On 12/11/19, 3:33 PM, "Daniil Titov" wrote: Hi Serguei, Thank you for reviewing this change. Just wanted to add that the only "volatile" metrics are "usage" ones ( memoryUsage and memoryAndSwapLimit). The "limit" metrics (memoryLimit and memoryAndSwapLimit) are set when the container starts and are not subjects to change. The only method that reads more than one "volatile" metric is getFreeSwapSpaceSize() and it has a code that retries if the calculated swapUsage is negative as a result of non-atomic reads. Thank you, Daniil ?On 12/11/19, 3:13 PM, "serguei.spitsyn at oracle.com" wrote: Hi Daniil, One my concerns was a non-atomic read of multiple metrics before comparison. It creates a potential to get a mismatch in result. However, the probability to get a negative value is pretty low, I think. The other concern (if incorrect metrics are returned) is covered by JDK-8235522. Revising all concerns in JDK-8235522 sounds good to me. Thanks, Serguei On 12/10/19 10:29, Daniil Titov wrote: > Hi Serguei, > >> Do we need to check if the (limit - memLimit) value is negative? >> The same question is for getFreeSwapSpaceSize(): >> memSwapLimit - memLimit - (memSwapUsage - memUsage) >> >> and getFreeMemorySize(): >> 101 return limit - usage; > I don't think we need such check here. If it happens in fact it means the serious system malfunction and a negative value this method > returns would indicate this (currently the native methods already returns -1 if something went wrong). But we could revise it in the follow > up issue I created for that [1]. > > [1] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: > > Hi Daniil, > > It is not a full review, just some minor comments. > In fact, I do not see real problems yet. > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html > > 55 public long getTotalSwapSpaceSize() { > 56 if (containerMetrics != null) { > 57 long limit = containerMetrics.getMemoryAndSwapLimit(); > 58 // The memory limit metrics is not available if JVM > runs on Linux host ( not in a docker container) > 59 // or if a docker container was started without > specifying a memory limit ( without '--memory=' > 60 // Docker option). In latter case there is no limit on > how much memory the container can use and > 61 // it can use as much memory as the host's OS allows. > 62 long memLimit = containerMetrics.getMemoryLimit(); > 63 if (limit >= 0 && memLimit >= 0) { > 64 return limit - memLimit; > 65 } > 66 } > 67 return getTotalSwapSpaceSize0(); > 68 } > > Unneeded space after brackets '('. > Do we need to check if the (limit - memLimit) value is negative? > The same question is for getFreeSwapSpaceSize(): > memSwapLimit - memLimit - (memSwapUsage - memUsage) > > and getFreeMemorySize(): > 101 return limit - usage; > > 81 // If this happens just retry the loop for > a few iterations > > Dot is missed at the end of comment. > > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html > > 34 System.out.println(String.format("Runtime.availableProcessors: > %d", Runtime.getRuntime().availableProcessors())); > 35 > System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 > System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: > %d", osBean.getTotalMemorySize())); > 37 > System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 > System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: > %d", osBean.getFreeMemorySize())); > 39 > System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: > %d", osBean.getFreePhysicalMemorySize())); > 40 > System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 > System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 > System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 > System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: > %f", osBean.getSystemCpuLoad())); > > > To make the above lines a little bit shorter I'd suggest to define a > log() method like this: > private static void log(String msg) ( System.out.println(msg(; } > > 34 log(String.format("Runtime.availableProcessors: %d", > Runtime.getRuntime().availableProcessors())); > 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", > osBean.getTotalMemorySize())); > 37 > log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", > osBean.getFreeMemorySize())); > 39 > log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", > osBean.getFreePhysicalMemorySize())); > 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", > osBean.getSystemCpuLoad())); > > > Thanks, > Serguei > > > > On 12/6/19 17:41, Daniil Titov wrote: > > Hi David, Mandy, and Bob, > > > > Thank you for reviewing this fix. > > > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > > but I agree that the changes proposed in the previous version of the webrev increase such probability. > > I filed the follow-up issue [4] as Mandy suggested. > > 3. The legacy methods were renamed as David suggested. > > > > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > >> ! static int initialized=1; > >> > >> Am I reading this right that the code currently fails to actually do the > >> initialization because of this ??? > > Yes, currently the code fails to do the initialization but it was unnoticed since method > > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > > was always -1. > > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > >> > >> System.out.println(String.format(...) > >> > >> Why not simply > >> > >> System.out.printf(..) > > As I tried explain it earlier it would make the tests unstable. > > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > > in the output. > > > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > > and "1030762496". > > > > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > [0.305s][trace][os,container] Memory Usage is: 42979328 > > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > > 1030762496 > > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > > at java.base/java.lang.Thread.run(Thread.java:832) > > > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > > > Thank you, > > Daniil > > > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > > >> > > >> > > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > > >> > > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > > > I thought that the error case we are referring to is limit == 0 which > > indicates something unexpected goes wrong. So the compatibility concern > > should be low. This is very specific to Metrics implementation for > > cgroup v1 and let me know if I'm wrong. > > > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > > >> > > >> // the host data, value 0 indicates that something went wrong while the metric was read and > > >> // in this case we return "information unavailable" code -1. > > >> > > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > > limits. > > > > > > > It's important to consider carefully if the monitoring API indicates an > > error vs unavailable and an application should continue to run when the > > monitoring system fails to get the metrics. > > > > There are several choices to report "something goes wrong" scenarios > > (should unlikely happen???): > > 1. fall back to a random positive value (e.g. host value) > > 2. return a negative value > > 3. throw an exception > > > > #3 is not an option as the application is not expecting this. For #2, > > the application can filter bad values if desirable. > > > > I'm okay if you want to file a JBS issue to follow up and thoroughly > > look at the cases that the metrics are unavailable and the cases when > > fails to obtain. > > > > >> --- > > >> > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > >> > > >> System.out.println(String.format(...) > > >> > > >> Why not simply > > >> > > >> System.out.printf(..) > > >> > > >> ? > > > > or simply (as I commented [1]) > > System.out.format > > > > Mandy > > [1] > > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > > > > > > > > > From serguei.spitsyn at oracle.com Wed Dec 11 23:51:12 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 11 Dec 2019 15:51:12 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> <7720D36B-0505-49E3-8424-76ACEACAF0AB@oracle.com> <3172f6e2-5330-91eb-35b4-03bec407ee5b@oracle.com> Message-ID: Hi Daniil, Got it, thanks! Serguei On 12/11/19 15:33, Daniil Titov wrote: > Hi Serguei, > > Thank you for reviewing this change. > > Just wanted to add that the only "volatile" metrics are "usage" ones ( memoryUsage and > memoryAndSwapLimit). The "limit" metrics (memoryLimit and memoryAndSwapLimit) are set > when the container starts and are not subjects to change. The only method that reads more than one > "volatile" metric is getFreeSwapSpaceSize() and it has a code that retries if the calculated swapUsage > is negative as a result of non-atomic reads. > > > Thank you, > Daniil > > ?On 12/11/19, 3:13 PM, "serguei.spitsyn at oracle.com" wrote: > > Hi Daniil, > > One my concerns was a non-atomic read of multiple metrics before comparison. > It creates a potential to get a mismatch in result. > However, the probability to get a negative value is pretty low, I think. > The other concern (if incorrect metrics are returned) is covered by > JDK-8235522. > Revising all concerns in JDK-8235522 sounds good to me. > > Thanks, > Serguei > > On 12/10/19 10:29, Daniil Titov wrote: > > Hi Serguei, > > > >> Do we need to check if the (limit - memLimit) value is negative? > >> The same question is for getFreeSwapSpaceSize(): > >> memSwapLimit - memLimit - (memSwapUsage - memUsage) > >> > >> and getFreeMemorySize(): > >> 101 return limit - usage; > > I don't think we need such check here. If it happens in fact it means the serious system malfunction and a negative value this method > > returns would indicate this (currently the native methods already returns -1 if something went wrong). But we could revise it in the follow > > up issue I created for that [1]. > > > > [1] https://bugs.openjdk.java.net/browse/JDK-8235522 > > > > Thank you, > > Daniil > > > > ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: > > > > Hi Daniil, > > > > It is not a full review, just some minor comments. > > In fact, I do not see real problems yet. > > > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html > > > > 55 public long getTotalSwapSpaceSize() { > > 56 if (containerMetrics != null) { > > 57 long limit = containerMetrics.getMemoryAndSwapLimit(); > > 58 // The memory limit metrics is not available if JVM > > runs on Linux host ( not in a docker container) > > 59 // or if a docker container was started without > > specifying a memory limit ( without '--memory=' > > 60 // Docker option). In latter case there is no limit on > > how much memory the container can use and > > 61 // it can use as much memory as the host's OS allows. > > 62 long memLimit = containerMetrics.getMemoryLimit(); > > 63 if (limit >= 0 && memLimit >= 0) { > > 64 return limit - memLimit; > > 65 } > > 66 } > > 67 return getTotalSwapSpaceSize0(); > > 68 } > > > > Unneeded space after brackets '('. > > Do we need to check if the (limit - memLimit) value is negative? > > The same question is for getFreeSwapSpaceSize(): > > memSwapLimit - memLimit - (memSwapUsage - memUsage) > > > > and getFreeMemorySize(): > > 101 return limit - usage; > > > > 81 // If this happens just retry the loop for > > a few iterations > > > > Dot is missed at the end of comment. > > > > > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html > > > > 34 System.out.println(String.format("Runtime.availableProcessors: > > %d", Runtime.getRuntime().availableProcessors())); > > 35 > > System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: > > %d", osBean.getAvailableProcessors())); > > 36 > > System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: > > %d", osBean.getTotalMemorySize())); > > 37 > > System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > > %d", osBean.getTotalPhysicalMemorySize())); > > 38 > > System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: > > %d", osBean.getFreeMemorySize())); > > 39 > > System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: > > %d", osBean.getFreePhysicalMemorySize())); > > 40 > > System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > > %d", osBean.getTotalSwapSpaceSize())); > > 41 > > System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > > %d", osBean.getFreeSwapSpaceSize())); > > 42 > > System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", > > osBean.getCpuLoad())); > > 43 > > System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: > > %f", osBean.getSystemCpuLoad())); > > > > > > To make the above lines a little bit shorter I'd suggest to define a > > log() method like this: > > private static void log(String msg) ( System.out.println(msg(; } > > > > 34 log(String.format("Runtime.availableProcessors: %d", > > Runtime.getRuntime().availableProcessors())); > > 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: > > %d", osBean.getAvailableProcessors())); > > 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", > > osBean.getTotalMemorySize())); > > 37 > > log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > > %d", osBean.getTotalPhysicalMemorySize())); > > 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", > > osBean.getFreeMemorySize())); > > 39 > > log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", > > osBean.getFreePhysicalMemorySize())); > > 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > > %d", osBean.getTotalSwapSpaceSize())); > > 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > > %d", osBean.getFreeSwapSpaceSize())); > > 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", > > osBean.getCpuLoad())); > > 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", > > osBean.getSystemCpuLoad())); > > > > > > Thanks, > > Serguei > > > > > > > > On 12/6/19 17:41, Daniil Titov wrote: > > > Hi David, Mandy, and Bob, > > > > > > Thank you for reviewing this fix. > > > > > > Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) > > > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. > > > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize > > > was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. > > > I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, > > > but I agree that the changes proposed in the previous version of the webrev increase such probability. > > > I filed the follow-up issue [4] as Mandy suggested. > > > 3. The legacy methods were renamed as David suggested. > > > > > > > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c > > >> ! static int initialized=1; > > >> > > >> Am I reading this right that the code currently fails to actually do the > > >> initialization because of this ??? > > > Yes, currently the code fails to do the initialization but it was unnoticed since method > > > get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" > > > was always -1. > > > > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > >> > > >> System.out.println(String.format(...) > > >> > > >> Why not simply > > >> > > >> System.out.printf(..) > > > As I tried explain it earlier it would make the tests unstable. > > > System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. > > > Instead it parses the format string into a list of FormatString objects and then iterates over the list. > > > As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find > > > in the output. > > > > > > For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > > > and "1030762496". > > > > > > > > > [0.304s][trace][os,container] Memory Usage is: 42983424 > > > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > > > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > > [0.305s][trace][os,container] Memory Usage is: 42979328 > > > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes > > > OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 > > > 1030762496 > > > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > > > > > > > > java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr > > > > > > at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > > > at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > > > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > > > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > > > at java.base/java.lang.Thread.run(Thread.java:832) > > > > > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. > > > > > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > > > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > > > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > > > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > > > > > Thank you, > > > Daniil > > > > > > ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: > > > > > > > > > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: > > > >> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: > > > >> > > > >> > > > >> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java > > > >> > > > >> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. > > > > > > I thought that the error case we are referring to is limit == 0 which > > > indicates something unexpected goes wrong. So the compatibility concern > > > should be low. This is very specific to Metrics implementation for > > > cgroup v1 and let me know if I'm wrong. > > > > > > >> Surely there must always be some information available from the operating environment? I see from the impl file: > > > >> > > > >> // the host data, value 0 indicates that something went wrong while the metric was read and > > > >> // in this case we return "information unavailable" code -1. > > > >> > > > >> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. > > > > I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. > > > > Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others > > > > are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no > > > > limits. > > > > > > > > > > It's important to consider carefully if the monitoring API indicates an > > > error vs unavailable and an application should continue to run when the > > > monitoring system fails to get the metrics. > > > > > > There are several choices to report "something goes wrong" scenarios > > > (should unlikely happen???): > > > 1. fall back to a random positive value (e.g. host value) > > > 2. return a negative value > > > 3. throw an exception > > > > > > #3 is not an option as the application is not expecting this. For #2, > > > the application can filter bad values if desirable. > > > > > > I'm okay if you want to file a JBS issue to follow up and thoroughly > > > look at the cases that the metrics are unavailable and the cases when > > > fails to obtain. > > > > > > >> --- > > > >> > > > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > > > >> > > > >> System.out.println(String.format(...) > > > >> > > > >> Why not simply > > > >> > > > >> System.out.printf(..) > > > >> > > > >> ? > > > > > > or simply (as I commented [1]) > > > System.out.format > > > > > > Mandy > > > [1] > > > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > > > > > > > > > > > > > > > > > > > > > > From daniil.x.titov at oracle.com Thu Dec 12 03:25:49 2019 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 11 Dec 2019 19:25:49 -0800 Subject: jmx-dev RFR: 8226575: OperatingSystemMXBean should be made container aware In-Reply-To: References: <99DD810A-E931-47D9-9968-065385C91752@oracle.com> <6BDD7672-2B08-4988-997D-F5942FF3C9E6@oracle.com> <427C2F30-9D91-4DFF-B3C2-F81DF697B881@oracle.com> <71B27770-C551-415B-8A26-1C6C38E7819B@oracle.com> <6CEB6280-69C7-4524-BF24-17CDB8364706@oracle.com> <4e4f0bef-c222-22c9-adef-a58a0d43f116@oracle.com> <072d6861-1374-8190-135d-e30ece2ee380@oracle.com> <2A286F91-6CB8-4A32-B103-2C28D52E5A1A@oracle.com> Message-ID: <0569FA2A-AEB3-41E1-8306-11290159B030@oracle.com> Hi Bob, David, Mandy, and Serguei, Thank you for reviewing this change! Best regards, Daniil ?On 12/11/19, 9:21 AM, "Bob Vandette" wrote: Yes, I defer to Mandy on the best way to express the various Java exceptions. I?m ok with the changes. Thanks for getting this done for JDK14! Bob. > On Dec 11, 2019, at 12:12 PM, Daniil Titov wrote: > > Hi Serguei, > > Thank you for your comments. I will correct this nits before pushing the changes. > > Hi Bob and David, > >> [Mandy Chung] >>> I reviewed Metrics and Subsystem in this version. >>> I don't need to see a new webrev. > > As I understood Mandy finished reviewing this fix. Just wanted to confirm with you if you are okey with that version of the fix (webrev.06) ? > > Mach5 testing: tier1-tier6 and open/test/hotspot/jtreg/containers/docker tests passed. > > Thank you, > Daniil > > > > ?On 12/9/19, 6:02 PM, "serguei.spitsyn at oracle.com" wrote: > > Hi Daniil, > > It is not a full review, just some minor comments. > In fact, I do not see real problems yet. > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java.frames.html > > 55 public long getTotalSwapSpaceSize() { > 56 if (containerMetrics != null) { > 57 long limit = containerMetrics.getMemoryAndSwapLimit(); > 58 // The memory limit metrics is not available if JVM > runs on Linux host ( not in a docker container) > 59 // or if a docker container was started without > specifying a memory limit ( without '--memory=' > 60 // Docker option). In latter case there is no limit on > how much memory the container can use and > 61 // it can use as much memory as the host's OS allows. > 62 long memLimit = containerMetrics.getMemoryLimit(); > 63 if (limit >= 0 && memLimit >= 0) { > 64 return limit - memLimit; > 65 } > 66 } > 67 return getTotalSwapSpaceSize0(); > 68 } > > Unneeded space after brackets '('. > Do we need to check if the (limit - memLimit) value is negative? > The same question is for getFreeSwapSpaceSize(): > memSwapLimit - memLimit - (memSwapUsage - memUsage) > > and getFreeMemorySize(): > 101 return limit - usage; > > 81 // If this happens just retry the loop for > a few iterations > > Dot is missed at the end of comment. > > > http://cr.openjdk.java.net/~dtitov/8226575/webrev.05/test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java.html > > 34 System.out.println(String.format("Runtime.availableProcessors: > %d", Runtime.getRuntime().availableProcessors())); > 35 > System.out.println(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 > System.out.println(String.format("OperatingSystemMXBean.getTotalMemorySize: > %d", osBean.getTotalMemorySize())); > 37 > System.out.println(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 > System.out.println(String.format("OperatingSystemMXBean.getFreeMemorySize: > %d", osBean.getFreeMemorySize())); > 39 > System.out.println(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: > %d", osBean.getFreePhysicalMemorySize())); > 40 > System.out.println(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 > System.out.println(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 > System.out.println(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 > System.out.println(String.format("OperatingSystemMXBean.getSystemCpuLoad: > %f", osBean.getSystemCpuLoad())); > > > To make the above lines a little bit shorter I'd suggest to define a > log() method like this: > private static void log(String msg) ( System.out.println(msg(; } > > 34 log(String.format("Runtime.availableProcessors: %d", > Runtime.getRuntime().availableProcessors())); > 35 log(String.format("OperatingSystemMXBean.getAvailableProcessors: > %d", osBean.getAvailableProcessors())); > 36 log(String.format("OperatingSystemMXBean.getTotalMemorySize: %d", > osBean.getTotalMemorySize())); > 37 > log(String.format("OperatingSystemMXBean.getTotalPhysicalMemorySize: > %d", osBean.getTotalPhysicalMemorySize())); > 38 log(String.format("OperatingSystemMXBean.getFreeMemorySize: %d", > osBean.getFreeMemorySize())); > 39 > log(String.format("OperatingSystemMXBean.getFreePhysicalMemorySize: %d", > osBean.getFreePhysicalMemorySize())); > 40 log(String.format("OperatingSystemMXBean.getTotalSwapSpaceSize: > %d", osBean.getTotalSwapSpaceSize())); > 41 log(String.format("OperatingSystemMXBean.getFreeSwapSpaceSize: > %d", osBean.getFreeSwapSpaceSize())); > 42 log(String.format("OperatingSystemMXBean.getCpuLoad: %f", > osBean.getCpuLoad())); > 43 log(String.format("OperatingSystemMXBean.getSystemCpuLoad: %f", > osBean.getSystemCpuLoad())); > > > Thanks, > Serguei > > > > On 12/6/19 17:41, Daniil Titov wrote: >> Hi David, Mandy, and Bob, >> >> Thank you for reviewing this fix. >> >> Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04) >> 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded. >> 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize >> was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read. >> I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances, >> but I agree that the changes proposed in the previous version of the webrev increase such probability. >> I filed the follow-up issue [4] as Mandy suggested. >> 3. The legacy methods were renamed as David suggested. >> >> >>> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >>> ! static int initialized=1; >>> >>> Am I reading this right that the code currently fails to actually do the >>> initialization because of this ??? >> Yes, currently the code fails to do the initialization but it was unnoticed since method >> get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which" >> was always -1. >> >>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>> >>> System.out.println(String.format(...) >>> >>> Why not simply >>> >>> System.out.printf(..) >> As I tried explain it earlier it would make the tests unstable. >> System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically. >> Instead it parses the format string into a list of FormatString objects and then iterates over the list. >> As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find >> in the output. >> >> For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" >> and "1030762496". >> >> >> [0.304s][trace][os,container] Memory Usage is: 42983424 >> OperatingSystemMXBean.getFreeMemorySize: 1030758400 >> [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> [0.305s][trace][os,container] Memory Usage is: 42979328 >> [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes >> OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232 >> 1030762496 >> OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 >> >> >> java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr >> >> at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) >> at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) >> at TestMemoryAwareness.main(TestMemoryAwareness.java:73) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.base/java.lang.reflect.Method.invoke(Method.java:564) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) >> at java.base/java.lang.Thread.run(Thread.java:832) >> >> Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running. >> >> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 >> [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 >> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 >> [4] https://bugs.openjdk.java.net/browse/JDK-8235522 >> >> Thank you, >> Daniil >> >> ?On 12/6/19, 1:38 PM, "Mandy Chung" wrote: >> >> >> >> On 12/6/19 5:59 AM, Bob Vandette wrote: >>>> On Dec 6, 2019, at 2:49 AM, David Holmes wrote: >>>> >>>> >>>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >>>> >>>> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue. >> >> I thought that the error case we are referring to is limit == 0 which >> indicates something unexpected goes wrong. So the compatibility concern >> should be low. This is very specific to Metrics implementation for >> cgroup v1 and let me know if I'm wrong. >> >>>> Surely there must always be some information available from the operating environment? I see from the impl file: >>>> >>>> // the host data, value 0 indicates that something went wrong while the metric was read and >>>> // in this case we return "information unavailable" code -1. >>>> >>>> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO. >>> I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods. >>> Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others >>> are, I?d suggest we keep Daniil?s original logic to fall back to the host value since a disabled subsystem is equivalent to no >>> limits. >>> >> >> It's important to consider carefully if the monitoring API indicates an >> error vs unavailable and an application should continue to run when the >> monitoring system fails to get the metrics. >> >> There are several choices to report "something goes wrong" scenarios >> (should unlikely happen???): >> 1. fall back to a random positive value (e.g. host value) >> 2. return a negative value >> 3. throw an exception >> >> #3 is not an option as the application is not expecting this. For #2, >> the application can filter bad values if desirable. >> >> I'm okay if you want to file a JBS issue to follow up and thoroughly >> look at the cases that the metrics are unavailable and the cases when >> fails to obtain. >> >>>> --- >>>> >>>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>>> >>>> System.out.println(String.format(...) >>>> >>>> Why not simply >>>> >>>> System.out.printf(..) >>>> >>>> ? >> >> or simply (as I commented [1]) >> System.out.format >> >> Mandy >> [1] >> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html >> >> >> >> > > > >