RFR: 8367319: Add os interfaces to get machine and container values separately [v2]

Erik Österlund eosterlund at openjdk.org
Wed Oct 22 06:46:03 UTC 2025


On Wed, 22 Oct 2025 05:12:18 GMT, David Holmes <dholmes at openjdk.org> wrote:

> > In other words - are you against the idea of having an implicit API that gives you either the container or the "machine"?
>
> @fisk I am "against" trying to shoe-horn a poorly-defined legacy API into a dichotomy of "container value" versus "machine value". The concepts are not at all clean or well-defined for both variants.

Okay. Let's start with the problem domain then.

> > CPU quotas when running in a container.
>
> Then you need an explicit API for that. No contortion of "available processors" is going to tell you what your quota is.

While that is true, I was hoping for an API that is not simply the Linux implementation exposed verbatim, screaming of Linux for no good reason. What I mean by that is that exposing the available processors as a double seems general enough that other OS containers could potentially use it too, whether or not they implement CPU throttling the same way. Exposing an explicit quota fraction instead does not help me much as a user.

> > The "machine" numbers when running in a container (which get overridden by container numbers).
>
> Sounds simple but is it well-defined? Can you even ask the question (again I come back to available processors where sched_getaffinity will already account for the presence of the container if tasksets are used - what answer would you want for the "machine"?).

What I specifically care about is how many cores the JVM is allowed to be scheduled to run on. Those cores are shared across containers, and utilizing them has a shared latency impact.

The container layer might throttle how often you are allowed to run on said processors, and for how long. But no cgroup CPU limit constrains how many processors the JVM is allowed to run on; it might run on all of them for a very short period of time.

> Aside: even if you can ask the question what use are these values if you are running within the container? Is there some means to bypass the container constraints?

So the reason this matters to me is that GC latency with a concurrent GC is all about CPU utilization: as CPU utilization grows, response times grow across all percentiles.

So when in a position to choose how to pay for GC, in terms of memory vs. CPU, I want to increasingly bias the cost towards memory as the shared cores the container is allowed to run on become saturated, even if the individual container itself is using less CPU.

In other words, there are trade-offs between latency, CPU and memory, all of which are shared across containers. Keeping within the container limits is a good start, but since the container resources are shared, making decisions that better share resources and latency pain across containers is even better. The same story goes for memory, which is likewise a shared resource across containers.

I don't expect a lot of code to reason about these things, but it is important for writing good GC heuristics. I could grab the values by piercing through the OS abstractions with some horrible #if LINUX CGroups::foo type of code, but I was hoping we could admit that both the resource view from within the container and the one from outside it are important enough to deserve a proper API.

I have seen other sprinklings of code piercing through the container abstraction in the same way, like the JFR CPU load metrics. It felt like it would be nice if we had a proper API for this.

So that's the problem domain.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27646#issuecomment-3430714497

