RFR: 8226575: OperatingSystemMXBean should be made container aware
Daniil Titov
daniil.x.titov at oracle.com
Mon Dec 9 18:59:21 UTC 2019
A correction...
We could even further simplify it as the following:
public static String getStringValue(SubSystem subsystem, String parm) {
if (subsystem == null) return null;
try (BufferedReader bufferedReader = AccessController.doPrivileged((PrivilegedExceptionAction<BufferedReader>)
() -> Files.newBufferedReader(Paths.get(subsystem.path(), parm)))) {
return bufferedReader.readLine();
} catch (PrivilegedActionException | IOException e) {
return null;
}
}
Best regards,
Daniil
On 12/9/19, 10:51 AM, "Daniil Titov" <daniil.x.titov at oracle.com> wrote:
Hi Mandy and Bob,
> Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException
> so it’s consistent with the other get functions?
In this method both Files.newBufferedReader and return bufferedReader.readLine could throw IOException so for simplicity I just put
the whole code block in doPrivileged. On the other side I don't believe that BufferedReader.readline() requires FilePermission checks ( and tests proved that)
so we could change this implementation to the following:
public static String getStringValue(SubSystem subsystem, String parm) {
if (subsystem == null) return null;
try (BufferedReader bufferedReader =
AccessController.doPrivileged((PrivilegedExceptionAction<BufferedReader>) () -> {
return Files.newBufferedReader(Paths.get(subsystem.path(), parm));
})) {
return bufferedReader.readLine();
} catch (PrivilegedActionException | IOException e) {
return null;
}
}
Could you please advise are you OK with it or you would like to proceed with the approach Mandy suggested to unwrap
PrivilegedActionException exception and throw the cause instead?
Thank you,
Daniil
On 12/9/19, 9:48 AM, "Mandy Chung" <mandy.chung at oracle.com> wrote:
Files:lines requires FilePermission check. So it needs to be wrapped
with doPrivileged. The readFilePrivileged can unwrap and throw the
cause instead like this:
static Stream<String> readFilePrivileged(Path path) throws
IOException {
try {
return
AccessController.doPrivileged((PrivilegedExceptionAction<Stream<String>>)
() -> Files.lines(path));
} catch (PrivilegedActionException e) {
Throwable x = e.getCause();
if (x instanceof IOException)
throw (IOException)x;
if (x instanceof RuntimeException)
throw (RuntimeException)x;
if (x instanceof Error)
throw (Error)x;
throw new InternalError(x);
}
}
On 12/9/19 7:17 AM, Bob Vandette wrote:
> Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException
> so it’s consistent with the other get functions?
>
> Bob.
>
>
>> On Dec 6, 2019, at 8:41 PM, Daniil Titov <daniil.x.titov at oracle.com> wrote:
>>
>> Hi David, Mandy, and Bob,
>>
>> Thank you for reviewing this fix.
>>
>> Please review a new version of the fix [1] that includes the following changes comparing to the previous version of the webrev ( webrev.04)
>> 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to CSR [3] were discarded.
>> 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, getFreeSwapSpaceSize and getTotalSwapSpaceSize
>> was also reverted to webrev.03 version that return host's values if the metrics are unavailable or cannot be properly read.
>> I would like to mention that currently the native implementation of these methods de-facto may return -1 at some circumstances,
>> but I agree that the changes proposed in the previous version of the webrev increase such probability.
>> I filed the follow-up issue [4] as Mandy suggested.
>> 3. The legacy methods were renamed as David suggested.
>>
>>
>>> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c
>>> ! static int initialized=1;
>>>
>>> Am I reading this right that the code currently fails to actually do the
>>> initialization because of this ???
>> Yes, currently the code fails to do the initialization but it was unnoticed since method
>> get_cpuload_internal(...) was never called for a specific CPU, the first parameter "which"
>> was always -1.
>>
>>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
>>>
>>> System.out.println(String.format(...)
>>>
>>> Why not simply
>>>
>>> System.out.printf(..)
>> As I tried explain it earlier it would make the tests unstable.
>> System.out.printf(...) just delegates the call to System.out.format(...) that doesn't emit the string atomically.
>> Instead it parses the format string into a list of FormatString objects and then iterates over the list.
>> As a result, the other traces occasionally got printed between these iterations and break the pattern the test is expected to find
>> in the output.
>>
>> For example, here is the sample of a such output that has the trace message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:"
>> and "1030762496".
>>
>> <skipped>
>> [0.304s][trace][os,container] Memory Usage is: 42983424
>> OperatingSystemMXBean.getFreeMemorySize: 1030758400
>> [0.305s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes
>> [0.305s][trace][os,container] Memory Usage is: 42979328
>> [0.306s][trace][os,container] Path to /memory.usage_in_bytes is /sys/fs/cgroup/memory/memory.usage_in_bytes
>> OperatingSystemMXBean.getFreePhysicalMemorySize: [0.306s][trace][os,container] Memory Usage is: 42975232
>> 1030762496
>> OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176
>>
>> <skipped>
>> java.lang.RuntimeException: 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from stdout/stderr
>>
>> at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306)
>> at TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151)
>> at TestMemoryAwareness.main(TestMemoryAwareness.java:73)
>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
>> at java.base/java.lang.Thread.run(Thread.java:832)
>>
>> Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker tests passed. Tier4-tier6 tests are still running.
>>
>> [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05
>> [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575
>> [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428
>> [4] https://bugs.openjdk.java.net/browse/JDK-8235522
>>
>> Thank you,
>> Daniil
>>
>> On 12/6/19, 1:38 PM, "Mandy Chung" <mandy.chung at oracle.com> wrote:
>>
>>
>>
>> On 12/6/19 5:59 AM, Bob Vandette wrote:
>>>> On Dec 6, 2019, at 2:49 AM, David Holmes<David.Holmes at oracle.com> wrote:
>>>>
>>>>
>>>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java
>>>>
>>>> The changes to allow for a return of -1 are somewhat more extensive than we have previously discussed. These methods previously were (per the spec) guaranteed to return some (assumably) meaningful value but now they are effectively allowed to fail by returning -1. No existing code is expecting to have to handle a return of -1 so I see this as a significant compatibility issue.
>> I thought that the error case we are referring to is limit == 0 which
>> indicates something unexpected goes wrong. So the compatibility concern
>> should be low. This is very specific to Metrics implementation for
>> cgroup v1 and let me know if I'm wrong.
>>
>>>> Surely there must always be some information available from the operating environment? I see from the impl file:
>>>>
>>>> // the host data, value 0 indicates that something went wrong while the metric was read and
>>>> // in this case we return "information unavailable" code -1.
>>>>
>>>> I don't agree with this. If the container metrics are messed up somehow we should either fallback to the host value or else abort with some kind of exception. Returning -1 is not an option here IMO.
>>> I agree with David on the compatibility concern. I originally thought that -1 was already a specified return for all of these methods.
>>> Since the 0 return failure from the Metrics API should only occur if one of the cgroup subsystems is not enabled while others
>>> are, I’d suggest we keep Daniil’s original logic to fall back to the host value since a disabled subsystem is equivalent to no
>>> limits.
>>>
>> It's important to consider carefully if the monitoring API indicates an
>> error vs unavailable and an application should continue to run when the
>> monitoring system fails to get the metrics.
>>
>> There are several choices to report "something goes wrong" scenarios
>> (should unlikely happen???):
>> 1. fall back to a random positive value (e.g. host value)
>> 2. return a negative value
>> 3. throw an exception
>>
>> #3 is not an option as the application is not expecting this. For #2,
>> the application can filter bad values if desirable.
>>
>> I'm okay if you want to file a JBS issue to follow up and thoroughly
>> look at the cases that the metrics are unavailable and the cases when
>> fails to obtain.
>>
>>>> ---
>>>>
>>>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
>>>>
>>>> System.out.println(String.format(...)
>>>>
>>>> Why not simply
>>>>
>>>> System.out.printf(..)
>>>>
>>>> ?
>> or simply (as I commented [1])
>> System.out.format
>>
>> Mandy
>> [1]
>> https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html
>>
>>
>>
>>
More information about the serviceability-dev
mailing list