RFR: 8296125: Add a command line option to set a refresh rate of the OS cached metrics in Linux [v2]

Thomas Stuefe stuefe at openjdk.org
Wed Nov 2 19:22:44 UTC 2022


On Wed, 2 Nov 2022 17:48:01 GMT, Olga Mikhaltsova <omikhaltcova at openjdk.org> wrote:

> 
> But the user cannot provide a reproducer, and there is no guarantee that this 20 ms timeout is the culprit. So maybe it is better to make this a diagnostic option in the first iteration, wait for a second user request before making it a product option, and only then propose a CSR and open a PR to convert it to a product option.

Sorry, it seems odd to add a flag to OpenJDK proper just to analyze a customer scenario. We should have a better understanding of this problem before adding switches. Can't you do it downstream, or hand the customer a custom-built VM?

> > Do you have any benchmark that shows the benefit of this switch? We don't want to add a switch, even a diagnostic one, just on a suspicion.

+1

> 
> Also, we need to measure the actual cost of reading the OS metrics. As far as I know, it's just parsing a very simple text string. It's hard to imagine that it would cause any difference if you do it 50 times per second vs 10 times per second.
> 
> (The original fix, [JDK-8232207](https://bugs.openjdk.org/browse/JDK-8232207), reduces the refresh rate from something like 1000 times per second to 50 times per second, which made a real difference).
> 
> However, if you delay updating the OS metrics, you will be running with the old metrics for a longer time. For example, if the memory limit has been reduced, the correct behavior might be to shrink the size of some application caches which would result in lower performance. By changing the refresh rate to 100ms, you effectively would use the larger cache for a longer time, so the measured performance would be higher.

Maybe it is my limited understanding, but I cannot see an application that needs even sub-second reaction time to a changed container memory limit. @jerboaa what do you think?
 
Also, as I wrote before, if our polling code is still slow, we should improve it. But let's measure first.
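
As a rough illustration of what "measure first" could look like, here is a minimal sketch in Python. It is not the HotSpot code: `CachedMetric`, `read_limit` and `time_reads` are hypothetical names, and the temporary file is only a stand-in for a cgroup v2 limit file such as `memory.max`, which holds either an integer or the word `max`. Reads from a real cgroup pseudo-file go through the kernel and may cost more or less than a read from a regular file, so the numbers are only indicative.

```python
import os
import tempfile
import time


class CachedMetric:
    """Caches the result of `reader` and re-reads it at most once per `timeout_s`."""

    def __init__(self, reader, timeout_s, clock=time.monotonic):
        self._reader = reader
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the expiry logic can be tested
        self._value = None
        self._expires_at = None
        self.reads = 0               # number of real (uncached) reads

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._reader()
            self._expires_at = now + self._timeout_s
            self.reads += 1
        return self._value


def read_limit(path):
    """Parse a cgroup-v2-style limit file: an integer, or 'max' for no limit."""
    with open(path) as f:
        text = f.read().strip()
    return None if text == "max" else int(text)


def time_reads(path, iterations=10_000):
    """Return the average wall-clock cost of one uncached read, in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        read_limit(path)
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    # Stand-in for the real limit file (assumption: a regular temp file).
    with tempfile.NamedTemporaryFile("w", delete=False) as f:
        f.write("536870912\n")
        path = f.name
    try:
        per_read = time_reads(path)
        for timeout_ms in (20, 100):
            refreshes_per_s = 1000 / timeout_ms
            cost_us = refreshes_per_s * per_read * 1e6
            print(f"{timeout_ms} ms timeout: about {cost_us:.1f} us spent reading per second")
    finally:
        os.unlink(path)
```

If the per-second cost at a 20 ms timeout turns out to be negligible, a longer timeout buys little, and it also keeps a changed limit invisible for longer.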

-------------

PR: https://git.openjdk.org/jdk/pull/10918

