RFR [14]: 8232207: Linux os::available_memory re-reads cgroup configuration on every invocation
Severin Gehwolf
sgehwolf at redhat.com
Tue Oct 15 16:18:03 UTC 2019
Hi Claes,
On Tue, 2019-10-15 at 16:59 +0200, Claes Redestad wrote:
> Hi Severin,
>
> On 2019-10-15 16:33, Severin Gehwolf wrote:
> > Hi Claes,
> >
> > On Tue, 2019-10-15 at 11:12 +0200, Claes Redestad wrote:
> > > Hi,
> > >
> > > on a Linux system with container support, os::available_memory will read
> > > cgroup configuration files from /proc to determine max memory limits.
> > > This leads to measurable overheads in some places, e.g., JIT
> > > compiler threads will poll os::available_memory between compilations to
> > > determine if we need to reduce the number of compiler threads.
> > >
> > > Overhead from polling these /proc files can take up to 5% of total CPU
> > > resource usage during startup and warmup.
> >
> > Would there be a way to reproduce this myself? What did you use to
> > measure this? It would come in handy for the cgroups v2 work I'm doing.
>
> It should be reproducible on most Linuxes - within or outside docker -
> when the kernel is built with CONFIG_SCHED_AUTOGROUP=y (which is the
> case on all my machines, apparently..)
>
> I run various startup tests wrapped by perf stat, for example:
>
> perf stat -r 50 java Hello World
Thanks.
> > How does this compare to -XX:-UseContainerSupport runs?
>
> -XX:-UseContainerSupport recuperates a few more cycles, e.g.:
>
>      120,105,173   instructions    #    0.83  insns per cycle        ( +- 0.08% )
>       23,843,560   branches        #  433.858  M/sec                 ( +- 0.08% )
>          822,607   branch-misses   #    3.45%  of all branches       ( +- 0.22% )
>
>      0.037818060 seconds time elapsed                                ( +- 0.38% )
>
> So there's a very small cost you can avoid by turning off container
> support.
Thanks. OK, that's expected.
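For the common case where container support stays enabled, the win then
comes from the caching itself: as I understand the change, the
cgroup-derived value is cached and the /proc files are only re-read once
a short grace period has expired, rather than on every
os::available_memory call. A standalone sketch of that pattern follows
(the function names, the 20 ms interval and the thread-safety shortcut
are my own illustration, not necessarily what the webrev does):

#include <chrono>
#include <cstdint>
#include <cstdio>

// Placeholder for the existing, expensive /proc + cgroup parsing.
static std::int64_t read_memory_limit_from_cgroup() {
  return 2LL * 1024 * 1024 * 1024;  // pretend the limit is 2 GB
}

static std::int64_t cached_memory_limit() {
  using namespace std::chrono;
  // Thread-safety is elided for brevity; a racy redundant re-read of a
  // slowly changing limit would be harmless here anyway.
  static std::int64_t cached = -1;
  static steady_clock::time_point valid_until;
  const auto now = steady_clock::now();
  if (cached < 0 || now >= valid_until) {
    cached = read_memory_limit_from_cgroup();
    valid_until = now + milliseconds(20);  // the grace period
  }
  return cached;
}

int main() {
  // Only the first call should (conceptually) hit the cgroup files.
  std::printf("%lld\n", (long long) cached_memory_limit());
  std::printf("%lld\n", (long long) cached_memory_limit());
  return 0;
}

With something like this, a burst of calls from many compiler threads
collapses into roughly one actual read per grace period.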
> > Aside: It seems we'd need similar work for
> > OSContainer::active_processor_count(). Or rather, cpu_quota(),
> > cpu_period() and cpu_shares() functions. See:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8227006
> >
> > I'll give using a similar approach for active_processor_count() a shot.
>
> Could be worthwhile, but the reason we see an improvement here is that
> we avoid a situation where many (compiler) threads are concurrently
> querying for the current setting during a short time frame. Each call
> (outside of the grace time) is still just as expensive. So it'd still
> be nice to have a faster API.
Right. I've just tried it[1] and this helps the tight-loop case as
originally reported in JDK-8227006. It's no longer being used in
CompletableFuture in latest OpenJDK 8u (JDK-8227018), but it's still a
data point.
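To make the experiment a bit more concrete, what I tried for the CPU
side has roughly the following shape. Again only a sketch: the function
names stand in for the existing parsing, the 20 ms grace period is
arbitrary, and the quota/period calculation is simplified compared to
what os::active_processor_count() really does.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Placeholders for the existing per-file cgroup reads.
static std::int64_t read_cpu_quota_from_cgroup()  { return 200000; }  // cpu.cfs_quota_us
static std::int64_t read_cpu_period_from_cgroup() { return 100000; }  // cpu.cfs_period_us
static std::int64_t read_cpu_shares_from_cgroup() { return -1; }      // cpu.shares not set

struct CpuLimits {
  std::int64_t quota;
  std::int64_t period;
  std::int64_t shares;
};

// Refresh all three values together, at most once per grace period
// (thread-safety elided, as in the sketch above).
static const CpuLimits& cached_cpu_limits() {
  using namespace std::chrono;
  static CpuLimits limits = { -1, -1, -1 };
  static steady_clock::time_point valid_until;
  const auto now = steady_clock::now();
  if (now >= valid_until) {
    limits.quota  = read_cpu_quota_from_cgroup();
    limits.period = read_cpu_period_from_cgroup();
    limits.shares = read_cpu_shares_from_cgroup();
    valid_until = now + milliseconds(20);  // the grace period
  }
  return limits;
}

// Simplified quota-based part of the calculation: with a quota set, the
// container gets roughly quota/period CPUs, capped by the host count.
// (shares is refreshed above but ignored in this simplified version.)
static int container_processor_count(int host_cpus) {
  const CpuLimits& l = cached_cpu_limits();
  if (l.quota > 0 && l.period > 0) {
    std::int64_t quota_cpus = std::max<std::int64_t>(1, l.quota / l.period);
    return (int) std::min<std::int64_t>(host_cpus, quota_cpus);
  }
  return host_cpus;
}

int main() {
  std::printf("%d\n", container_processor_count(8));  // 2 with the stub values above
  return 0;
}

Refreshing quota, period and shares together also avoids reading the
three files at slightly different points in time.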
Thanks,
Severin
[1] https://bugs.openjdk.java.net/browse/JDK-8227006?focusedCommentId=14294155&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14294155