RFR: 8296125: Add a command line option to set a refresh rate of the OS cached metrics in Linux [v2]
senecaspurling
duke at openjdk.org
Fri Dec 16 01:28:04 UTC 2022
On Wed, 2 Nov 2022 00:19:35 GMT, Olga Mikhaltsova <omikhaltcova at openjdk.org> wrote:
>> I would like to add a new command line product option:
>> -XX:OsCachedMetricsRefreshRate=value,
>> where value is the number of re-readings per second, in the range [1; 1000000000].
>>
>> It replaces the hardcoded 20 ms timeout between re-readings of the OS cached metrics, introduced in [JDK-8232207](https://bugs.openjdk.org/browse/JDK-8232207), and allows the user to set this timeout (as a refresh rate) at launch time.
>>
>> This option will be available only on Linux.
>>
>> It can be used as follows:
>> java -XX:OsCachedMetricsRefreshRate=100 MyApp
>
> Olga Mikhaltsova has updated the pull request incrementally with two additional commits since the last revision:
>
> - Made the option (OsCachedMetricsRefreshRate) DIAGNOSTIC
> - Moved oscontainer_cache_timeout() from os::Linux to OSContainer
I see some comments about "polling being slow", so I just want to clarify that the issue is exactly the opposite: the user wants to make the polling *slower*. With 50 containers running JVM processes on a host, the host (and, to a lesser extent, the containers) experiences performance problems because of lock contention for cgroups, caused by the JVMs in the containers each querying cgroups 50 times per second. They don't want those queries to happen more frequently; they want them to happen LESS frequently. With -XX:OsCachedMetricsRefreshRate=1, the lowest possible setting, the timeout grows to 1000 ms (1 second). They do see some improvement with this, but it's not sufficient.
The JDK feature that dynamically creates compiler threads queries the OS for available memory to decide whether a new thread should be created: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L995
In a container, that query is delegated to cgroups and competes for the lock. With 50 containerized processes doing this 50 times per second, the contention for cgroups causes performance issues. Running with -XX:-UseDynamicNumberOfCompilerThreads, or disabling container support with -XX:-UseContainerSupport, resolves the issue entirely.
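As a rough worked bound, assume (as a simplification) that each of N JVMs on the host re-reads cgroups at most once per timeout window, i.e. at most r times per second. The host-wide re-read rate is then bounded as follows:

$$
R_{\text{host}} \le N \cdot r, \qquad 50 \times 50 = 2500\ \text{reads/s (20 ms timeout)}, \qquad 50 \times 1 = 50\ \text{reads/s (1000 ms timeout)}
$$

This is an upper bound under that assumption, not a measurement; it only shows that the lowest setting reduces, but does not eliminate, periodic cgroup reads across the host.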
-------------
PR: https://git.openjdk.org/jdk/pull/10918