Proposed Ergonomics Profiles
Stephanie Crater
scrater at microsoft.com
Wed Jun 14 18:02:12 UTC 2023
Hi Severin,
Thanks for your comments – hopefully, this can provide some clarity:
> This boils down to having different profiles for physical deployments (status quo) vs. deployment in containers (on k8s), right?
Yes, the default ergonomics for physical deployments (`shared` profile) would remain unchanged. The JVM would activate the `dedicated` profile by default when it detects a dedicated environment (e.g., the JVM sees it is running inside a cgroups namespace or a container). A JVM flag could also override the ergonomics profile: -XX:ErgonomicsProfile=<shared|dedicated>. This flag would be especially useful for Cloud PaaS services, where the environment is sometimes a pure VM (not cgroups) but still dedicated to that JVM process.
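To make the precedence concrete, here is a minimal Java sketch of the selection rule described above: an explicit flag value wins, otherwise container detection decides. The class, enum, and method names are hypothetical and exist only for illustration; this is not HotSpot code, and the real detection logic would live in the JVM itself.

```java
import java.util.Locale;

final class ProfileSelection {
    enum Profile { SHARED, DEDICATED }

    // Hypothetical helper: flagValue models -XX:ErgonomicsProfile=<shared|dedicated>
    // (null when the flag is absent); containerized models the JVM's own
    // container/cgroups detection result.
    static Profile select(String flagValue, boolean containerized) {
        if (flagValue != null) {
            // An explicit flag always overrides detection.
            return Profile.valueOf(flagValue.toUpperCase(Locale.ROOT));
        }
        return containerized ? Profile.DEDICATED : Profile.SHARED;
    }

    public static void main(String[] args) {
        System.out.println(select(null, true));        // container, no flag
        System.out.println(select(null, false));       // bare metal / plain VM, no flag
        System.out.println(select("dedicated", false)); // plain VM dedicated to one JVM
    }
}
```

The third call corresponds to the Cloud PaaS case: detection alone would pick `shared`, so the flag is what selects `dedicated`.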
> Of course, such a feature would have a non-free cost item on the continuous test column. Any thoughts how you'd plan to ensure that both profiles behave as they're supposed to behave?
We could piggyback off existing container tests here to ensure that the dedicated profile is set correctly when the JVM runs in environments where we want the dedicated profile to be on by default (e.g., cgroups). However, we will need to resolve a few known issues before this proposal lands, most notably that OSContainer::is_containerized() returns true when run outside a container [1].
> Perhaps it would make sense from a high-level understanding perspective to sketch out what you envision such a 'dedicated' profile would amount to? Do you have some concrete ideas?
We believe the dedicated profile should set a larger default max heap size, likely 75% of the available memory. This 75% guideline aligns with what we already see in the .NET Runtime [2]. We are also looking into changing the default GC in the `dedicated` profile based on available memory, especially considering the remaining 25% left for non-heap memory.
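As a rough illustration of the difference, the sketch below compares the current default sizing as described earlier in this thread (50% up to 256MB, a fixed ~127MB between 256MB and 512MB, 25% from 512MB up) with a 75% rule. All names are hypothetical, the `shared` rule is a simplification of that description rather than HotSpot's actual code, and 75% is only the value currently under consideration.

```java
final class HeapSizing {
    static final long MB = 1024L * 1024L;

    // Simplified model of today's defaults, as described in this thread.
    static long sharedMaxHeap(long availableBytes) {
        if (availableBytes <= 256 * MB) {
            return availableBytes / 2;   // 50% on very small systems
        }
        if (availableBytes < 512 * MB) {
            return 127 * MB;             // fixed amount in between (approximate)
        }
        return availableBytes / 4;       // 25% otherwise
    }

    // Assumed dedicated rule: 75% of available memory.
    static long dedicatedMaxHeap(long availableBytes) {
        return availableBytes / 4 * 3;   // divide first to avoid overflow
    }

    public static void main(String[] args) {
        long fourGb = 4096 * MB;
        System.out.println("shared:    " + sharedMaxHeap(fourGb) / MB + " MB");    // 1024 MB
        System.out.println("dedicated: " + dedicatedMaxHeap(fourGb) / MB + " MB"); // 3072 MB
    }
}
```

For the 4GB example from the original proposal, this leaves 1GB for non-heap memory under `dedicated`, versus 3GB unused by the heap under `shared`.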
One possibility is to use ParallelGC by default within `dedicated` with 2+ processors and up to a certain amount of available memory. We are initially considering 2GB or 4GB as the threshold before switching to G1GC. Then, after a certain amount of memory, we could set ZGC, with 32GB+ of memory available.
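The possibility above could be sketched roughly as follows. This is only an illustration with hypothetical names: the ParallelGC threshold is passed in as a parameter because 2GB versus 4GB is still undecided, and the single-processor fallback to SerialGC is our assumption for the sketch, since the paragraph above only covers 2+ processors.

```java
final class GcSelection {
    enum Gc { SERIAL, PARALLEL, G1, Z }

    static final long GB = 1024L * 1024L * 1024L;

    // Hypothetical selection rule for the `dedicated` profile.
    static Gc select(int processors, long availableBytes, long parallelThresholdBytes) {
        if (processors < 2) {
            return Gc.SERIAL;            // assumption: not specified in the proposal
        }
        if (availableBytes >= 32 * GB) {
            return Gc.Z;                 // 32GB+ of memory available
        }
        if (availableBytes <= parallelThresholdBytes) {
            return Gc.PARALLEL;          // small heaps with 2+ processors
        }
        return Gc.G1;                    // everything in between
    }

    public static void main(String[] args) {
        long threshold = 4 * GB;         // candidate value; 2GB is the other option
        System.out.println(select(4, 1 * GB, threshold));
        System.out.println(select(4, 8 * GB, threshold));
        System.out.println(select(4, 64 * GB, threshold));
    }
}
```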
Other OpenJDK distributions could, of course, change these profiles and have different settings, with even different GCs (e.g., Shenandoah instead of ZGC in the Red Hat Build of OpenJDK; or Azul C4 in Azul Platform Prime; and so on).
We still need to do some tests, and we will report back here, but we would love to hear your thoughts on the concept.
> Any thoughts why active processor counting would need adjustment for such a profile? Why would the current way how the container detection code abstracts that metric be insufficient?
We observed a few tests where bumping the active processor count to 2 when the JVM has a 1000-millicore CPU limit yielded better results. After giving this more thought and reviewing the executed tests, we are no longer ready to proceed with this adjustment. Moreover, several articles and advocates strongly suggest that application workloads on Kubernetes should have no CPU limit at all (only a CPU request). If users follow this guidance, there should be no need to tweak active processor counting, as the JVM would consider all available processors.
> Greater than 2 profiles seem concerning. Why do you think more than two would be necessary?
At this stage, we only have plans for the `shared` and `dedicated` profiles. However, while it is not a goal of the initial proposal, this feature could provide a framework that lets OpenJDK providers bundle, in the future, sets of ergonomics for a broader range of workloads and hardware. For example, on Microsoft Azure, we have VMs that are “General purpose”, “Compute optimized” (higher CPU-to-memory ratio), “Memory optimized” (higher Memory-to-CPU ratio), and “Disk optimized” (high disk throughput and IO; for databases and data warehouses) [3].
Other JVM flags, such as UseNUMA, could also be enabled by default.
As for how to make the JVM observe these special VMs, the VM image could have a default JAVA_TOOL_OPTIONS=-XX:ErgonomicsProfile=`<profile>` to be picked up automatically by HotSpot while still maintaining the `shared` as the default in the case of traditional VMs and bare metal (non-container/non-cgroups).
In summary, we believe ergonomics profiles are a step forward in JVM tuning without complexity, allowing developers to continue relying on JVM defaults.
[1] JDK-8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container - Java Bug System (openjdk.org) <https://bugs.openjdk.org/browse/JDK-8261242>
[2] Garbage collector config settings - .NET | Microsoft Learn <https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector#heap-limit>
[3] VM sizes - Azure Virtual Machines | Microsoft Learn <https://learn.microsoft.com/en-us/azure/virtual-machines/sizes>
Thanks,
Stephanie and Bruno
From: Severin Gehwolf <sgehwolf at redhat.com>
Date: Friday, May 19, 2023 at 2:23 AM
To: Stephanie Crater <scrater at microsoft.com>, hotspot-dev at openjdk.org <hotspot-dev at openjdk.org>
Subject: [EXTERNAL] Re: Proposed Ergonomics Profiles
Hi Stephanie,
In principle it would be useful to have, so I'd be on board with such a
proposal. It would free us from rolling our own tuning in downstream
images. This boils down to having different profiles for physical
deployments (status quo) vs. deployment in containers (on k8s), right?
Of course, such a feature would have a non-free cost item on the
continuous test column. Any thoughts how you'd plan to ensure that both
profiles behave as they're supposed to behave?
On Tue, 2023-05-16 at 20:18 +0000, Stephanie Crater wrote:
> Hi,
>
> The Java Engineering Group at Microsoft is currently working on a JEP
> to introduce Ergonomics Profiles as a new JVM feature, with a
> `shared` profile for the existing JVM ergonomics and a `dedicated`
> option for when the JVM is running on systems with dedicated
> resources for the one JVM process.
>
> The current default JVM ergonomics were designed with the
> understanding that the JVM must share resources with other processes.
> However, a recent study done by an APM vendor (New Relic) identified
> that more than 70% of monitored JVMs [1] in production are running in
> dedicated environments (e.g., containers) as opposed to being shared.
> Many of these JVMs are running without explicit JVM tuning flags,
> once more confirming that JVM tuning is a challenging exercise many
> developers have no experience with. Introducing updated ergonomics
> for when the JVM is running in specific environments would allow the
> JVM to consume available resources more effectively instead of
> running with default ergonomics aimed at shared environments.
>
> For example, our customer data from Azure Spring Apps shows that 83%
> of monitored JVMs do not use JVM flags to set the heap size. Using
> the current JVM ergonomics, the default maximum heap size of the JVM
> varies from 50% to 25%, depending on how much memory is available in
> the environment: up to 256MB, or 512MB or more, respectively, with a
> fixed amount of ~127MB for systems with anywhere between 256MB and
> 512MB of memory. These amounts do not adequately map the intended
> resource plan of dedicated environments. The user may have already
> considered allocating, e.g., 4GB of memory to the JVM and expect
> it to use more than only 1GB of the heap (25%).
>
> The `dedicated` ergonomics profile will contain different heuristics
> to increase resource consumption in the environment, compared to
> `shared`.
Perhaps it would make sense from a high-level understanding perspective to
sketch out what you envision such a 'dedicated' profile would actually
amount to? Do you have some concrete ideas?
> The ergonomics we target include heuristics for maximum heap size, GC
> selection, active processor counting, and thread pool sizes internal
> to the JVM. If it would help, we have started writing this proposal
> in a JEP format.
Any thoughts why active processor counting would need adjustment for
such a profile? Why would the current way how the container detection
code abstracts that metric be insufficient?
> We would love to hear what the community thinks about this proposed
> enhancement and any suggestions you may have for the dedicated
> ergonomics profile. For example, this profile will likely increase
> heap size allocation to 60%-70% by default, but GC selection and
> active processor counting are much more complex. This JEP would also
> provide a framework for OpenJDK to include more ergonomics profiles
> for specific machines, environments, or workloads.
Greater than 2 profiles seem concerning. Why do you think more than two
would be necessary?
Thanks,
Severin
> Thank you for the feedback!
>
> [1]: https://newrelic.com/resources/report/2023-state-of-the-java-ecosystem