RFR [14]: 8232207: Linux os::available_memory re-reads cgroup configuration on every invocation
Severin Gehwolf
sgehwolf at redhat.com
Tue Oct 15 16:18:03 UTC 2019
Hi Claes,
On Tue, 2019-10-15 at 16:59 +0200, Claes Redestad wrote:
> Hi Severin,
>
> On 2019-10-15 16:33, Severin Gehwolf wrote:
> > Hi Claes,
> >
> > On Tue, 2019-10-15 at 11:12 +0200, Claes Redestad wrote:
> > > Hi,
> > >
> > > on a Linux system with container support, os::available_memory will read
> > > cgroup configuration files from /proc to determine max memory limits.
> > > This leads to measurable overheads in some places, e.g., JIT
> > > compiler threads will poll os::available_memory between compilations to
> > > determine if we need to reduce the number of compiler threads.
> > >
> > > Overhead from polling these /proc files can take up to 5% of total CPU
> > > resource usage during startup and warmup.
> >
> > Would there be a way to reproduce this myself? What did you use to
> > measure this? It would come in handy for the cgroups v2 work I'm doing.
>
> It should be reproducible on most Linuxes - within or outside docker -
> when the kernel is built with CONFIG_SCHED_AUTOGROUP=y (which is the
> case on all my machines, apparently..)
>
> I run various startup tests wrapped by perf stat, for example:
>
> perf stat -r 50 java Hello World
Thanks.
> > How does this compare to -XX:-UseContainerSupport runs?
>
> -XX:-UseContainerSupport recuperates a few more cycles, e.g.:
>
>      120,105,173   instructions    #    0.83  insns per cycle        ( +- 0.08% )
>       23,843,560   branches        #  433.858  M/sec                 ( +- 0.08% )
>          822,607   branch-misses   #    3.45%  of all branches       ( +- 0.22% )
>
>      0.037818060 seconds time elapsed                                ( +- 0.38% )
>
> So there's a very small cost you can avoid by turning off container
> support.
Thanks. OK, that's expected.
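For the common case where container support stays enabled, the win then
comes from the caching itself: as I understand the change, the
cgroup-derived value is cached and the /proc files are only re-read once
a short grace period has expired, rather than on every
os::available_memory call. A standalone sketch of that pattern follows
(the function names, the 20 ms interval and the thread-safety shortcut
are my own illustration, not necessarily what the webrev does):

#include <chrono>
#include <cstdint>
#include <cstdio>

// Placeholder for the existing, expensive /proc + cgroup parsing.
static std::int64_t read_memory_limit_from_cgroup() {
  return 2LL * 1024 * 1024 * 1024;  // pretend the limit is 2 GB
}

static std::int64_t cached_memory_limit() {
  using namespace std::chrono;
  // Thread-safety is elided for brevity; a racy redundant re-read of a
  // slowly changing limit would be harmless here anyway.
  static std::int64_t cached = -1;
  static steady_clock::time_point valid_until;
  const auto now = steady_clock::now();
  if (cached < 0 || now >= valid_until) {
    cached = read_memory_limit_from_cgroup();
    valid_until = now + milliseconds(20);  // the grace period
  }
  return cached;
}

int main() {
  // Only the first call should (conceptually) hit the cgroup files.
  std::printf("%lld\n", (long long) cached_memory_limit());
  std::printf("%lld\n", (long long) cached_memory_limit());
  return 0;
}

With something like this, a burst of calls from many compiler threads
collapses into roughly one actual read per grace period.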
> > Aside: It seems we'd need similar work for
> > OSContainer::active_processor_count(). Or rather, cpu_quota(),
> > cpu_period() and cpu_shares() functions. See:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8227006
> >
> > I'll give using a similar approach for active_processor_count() a shot.
>
> Could be worthwhile, but the reason we see an improvement here is that
> we avoid a situation where many (compiler) threads are concurrently
> querying for the current setting during a short time frame. Each call
> (outside of the grace time) is still just as expensive. So it'd still
> be nice to have a faster API.
Right. I've just tried it[1] and this helps the tight-loop case as
originally reported in JDK-8227006. It's no longer being used in
CompletableFuture in latest OpenJDK 8u (JDK-8227018), but it's still a
data point.
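To make the experiment a bit more concrete, what I tried for the CPU
side has roughly the following shape. Again only a sketch: the function
names stand in for the existing parsing, the 20 ms grace period is
arbitrary, and the quota/period calculation is simplified compared to
what os::active_processor_count() really does.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Placeholders for the existing per-file cgroup reads.
static std::int64_t read_cpu_quota_from_cgroup()  { return 200000; }  // cpu.cfs_quota_us
static std::int64_t read_cpu_period_from_cgroup() { return 100000; }  // cpu.cfs_period_us
static std::int64_t read_cpu_shares_from_cgroup() { return -1; }      // cpu.shares not set

struct CpuLimits {
  std::int64_t quota;
  std::int64_t period;
  std::int64_t shares;
};

// Refresh all three values together, at most once per grace period
// (thread-safety elided, as in the sketch above).
static const CpuLimits& cached_cpu_limits() {
  using namespace std::chrono;
  static CpuLimits limits = { -1, -1, -1 };
  static steady_clock::time_point valid_until;
  const auto now = steady_clock::now();
  if (now >= valid_until) {
    limits.quota  = read_cpu_quota_from_cgroup();
    limits.period = read_cpu_period_from_cgroup();
    limits.shares = read_cpu_shares_from_cgroup();
    valid_until = now + milliseconds(20);  // the grace period
  }
  return limits;
}

// Simplified quota-based part of the calculation: with a quota set, the
// container gets roughly quota/period CPUs, capped by the host count.
// (shares is refreshed above but ignored in this simplified version.)
static int container_processor_count(int host_cpus) {
  const CpuLimits& l = cached_cpu_limits();
  if (l.quota > 0 && l.period > 0) {
    std::int64_t quota_cpus = std::max<std::int64_t>(1, l.quota / l.period);
    return (int) std::min<std::int64_t>(host_cpus, quota_cpus);
  }
  return host_cpus;
}

int main() {
  std::printf("%d\n", container_processor_count(8));  // 2 with the stub values above
  return 0;
}

Refreshing quota, period and shares together also avoids reading the
three files at slightly different points in time.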
Thanks,
Severin
[1] https://bugs.openjdk.java.net/browse/JDK-8227006?focusedCommentId=14294155&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14294155