RFR(S): 8198756: Limit number of compiler threads for small code cache

Fri Mar 2 03:24:30 UTC 2018

Hi Igor, Martin,

Just to throw out some other user experience:

I’m typically running on machines with 98 to 224 CPUs. It’s not the case that *every* Java app needs to use all the CPUs for compiler threads. The JVM may not be the only JVM running on the system (Hadoop, microservices, etc), let alone the only important app on the system.

Historically the GC threads have been the worst offenders in this regard. The GC threads’ “scaling factor” is much higher than the compiler threads’. But with options like UseDynamicNumberOfGCThreads, the GC tries to adjust the number of GC threads to the work to be done. I think it’s important that the JVM figure out how to scale the number of compiler threads as well.

I won’t claim that Martin’s scheme is the best approach, or that it should be on by default, but unless a better solution is going into JDK 11, I’d support this scheme as an experimental flag. FWIW.

- Derek White, Cavium (Purveyor of fine 224 cpu systems for the discerning developer).

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Igor Veresov
Sent: Thursday, March 01, 2018 7:46 PM
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache

Doerr,

I think the optimal number of compiler threads is the one that keeps the compile queues as short as possible. During startup the optimal number of compiler threads is typically equal to the number of CPUs, maybe even more than that, considering that threads are either C1 or C2 and compiles typically happen in waves, using one and then the other. The fact that some users see the code cache filling more slowly with fewer threads is just an indication of how huge their compile queues are, and this is certainly not good for startup. The problem of resource holding is real, since after startup we don’t need that many threads (unless you’re running something that does dynamic code generation). Perhaps the solution to all of this is a dynamic pool of compiler threads that could expand or shrink depending on the load (the length of the compile queues).
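A minimal sketch of the dynamic-pool idea, written in Python for readability. The sizing rule (roughly one thread per `tasks_per_thread` queued compile tasks, clamped between a minimum and a maximum), the function name and all numbers are hypothetical and not HotSpot code; a real pool would presumably apply such a rule to the C1 and C2 queues separately.

```python
def desired_compiler_threads(queue_length: int, min_threads: int,
                             max_threads: int, tasks_per_thread: int = 4) -> int:
    """Hypothetical sizing rule for a dynamic compiler-thread pool.

    Aim for about one thread per `tasks_per_thread` queued compile tasks,
    never dropping below `min_threads` or exceeding `max_threads`.
    """
    wanted = -(-queue_length // tasks_per_thread)  # integer ceiling division
    return max(min_threads, min(max_threads, wanted))


# Simulated C2 queue lengths over time: idle, a startup wave, then steady state.
queue_samples = [0, 40, 120, 60, 8, 0]
pool_sizes = [desired_compiler_threads(q, min_threads=2, max_threads=21)
              for q in queue_samples]
print(pool_sizes)  # [2, 10, 21, 15, 2, 2]
```

Shrinking is where idle threads would give back their arenas and code buffers. In practice some hysteresis (for example, retiring a thread only after it has been idle for a while) would likely be needed so that threads are not created and destroyed on every queue sample.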

igor

On Mar 1, 2018, at 12:31 AM, Doerr, Martin <martin.doerr at sap.com> wrote:

Hi Igor,

we observed that the compiler threads fill up the code cache faster than the sweeper can clean it up when using a small code cache.
This doesn't seem beneficial at all.

Some customers try to save memory by using a very small code cache. It's very annoying that so much memory gets wasted by such a large number of idle compiler threads, which hold on to their arenas, etc.

Maybe the current formula was optimized for a special scenario with many slow cores? Maybe SPARC Niagara?
Shouldn't such scenarios use a large code cache? Maybe much more than 240MB?

Best regards,
Martin

From: Igor Veresov [mailto:igor.veresov at oracle.com]
Sent: Thursday, March 1, 2018 08:05
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: Doerr, Martin <martin.doerr at sap.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8198756: Limit number of compiler threads for small code cache

I’m curious about the rationale for tying the number of threads to the size of the code cache. Is it because you don’t want them to keep holding the space for their code buffers when they are idle?

igor

On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

Hi Doerr,

The problem with your proposal is that we would no longer scale the number of compiler threads when we have a lot of CPUs (>1000 on big "slow" machines).
By default, with tiered compilation, we have 240MB for the CodeCache. With your formula we will always have 7 threads (2 C1 and 5 C2), which could be fine if the machine has fewer than 32 procs/threads in total. But on big machines it may be a bottleneck for JIT-compilation-intensive applications (and for startup, when most JIT compilations happen).
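The arithmetic behind the "7 threads (2 C1 and 5 C2)" figure can be worked through in a short sketch of the proposed cap of one thread per 32MB. The one-third C1 / two-thirds C2 split is an assumption, chosen because it reproduces 2 + 5 for 7 threads; the floor of 2 threads and the function names are also hypothetical.

```python
from typing import Tuple

MB = 1024 * 1024


def cache_limited_count(reserved_code_cache_bytes: int,
                        bytes_per_thread: int = 32 * MB) -> int:
    """Proposed cap: at most one compiler thread per 32MB of code cache.

    The floor of 2 (one C1 plus one C2 thread) is an assumption for tiered mode.
    """
    return max(reserved_code_cache_bytes // bytes_per_thread, 2)


def split_c1_c2(count: int) -> Tuple[int, int]:
    """Assumed split: one third C1 (at least 1), the rest C2 (at least 1)."""
    c1 = max(count // 3, 1)
    c2 = max(count - c1, 1)
    return c1, c2


for size_mb in (48, 128, 240):
    n = cache_limited_count(size_mb * MB)
    print(f"{size_mb}MB -> {n} threads, (C1, C2) = {split_c1_c2(n)}")
```

With the default 240MB this gives 240 // 32 = 7 threads regardless of how many CPUs the machine has, which is the bottleneck concern for large machines.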

The main motivation of the current approach was to reach peak performance (C2 compilations) as fast as possible. What we usually observed before was a large compilation queue for C2 because of C2's slow throughput. It was especially visible with tiered compilation, where compilation thresholds are reached faster because profiling runs in code already compiled by the first tier.

And I agree that we may have a problem with the number of compiler threads at the beginning of the graph (< 32 CPU threads), where the number grows too fast:

[Graph of y = 3*log2(x)*log2(log2(x))/2, x = number of CPU threads; cursor readout: x ≈ 32.07, y ≈ 17.43]
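The plotted curve can be compared with an integer version of the same formula in a short sketch. The integer version assumes each log2 is rounded down and the result is clamped to at least 2; that is an assumption about how the formula is evaluated, not a copy of the HotSpot source.

```python
import math


def floor_log2(n: int) -> int:
    """Integer log2 rounded down; n must be >= 1."""
    return n.bit_length() - 1


def graph_formula(x: float) -> float:
    """Real-valued curve from the graph: 3*log2(x)*log2(log2(x))/2, defined for x > 1."""
    return 3 * math.log2(x) * math.log2(math.log2(x)) / 2


def cpu_scaled_count(cpus: int) -> int:
    """Integer version, assuming rounded-down logs and a minimum of 2 threads."""
    log_cpu = floor_log2(cpus)
    loglog_cpu = floor_log2(max(log_cpu, 1))
    return max(log_cpu * loglog_cpu * 3 // 2, 2)


for cpus in (4, 8, 16, 32, 64, 224, 256, 1024):
    print(cpus, round(graph_formula(cpus), 2), cpu_scaled_count(cpus))
```

With rounded-down logs the count is a step function: it jumps from 4 threads at 8 CPUs to 12 at 16 CPUs (the steep early growth mentioned above), and from 21 at 224 CPUs to 36 at 256 CPUs.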

Maybe we should have a formula that takes into account both the code cache size and the number of CPU threads.
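One possible shape for such a formula, as a hypothetical sketch: take the smaller of a CPU-scaled count (3*log2(x)*log2(log2(x))/2 with rounded-down logs, at least 2) and a code-cache cap of one thread per 32MB. The min() combination, the floor of 2 and the names are assumptions for illustration only.

```python
MB = 1024 * 1024


def floor_log2(n: int) -> int:
    """Integer log2 rounded down; n must be >= 1."""
    return n.bit_length() - 1


def cpu_scaled_count(cpus: int) -> int:
    # CPU term: 3*log2(cpus)*log2(log2(cpus))/2 with rounded-down logs, min 2.
    log_cpu = floor_log2(cpus)
    loglog_cpu = floor_log2(max(log_cpu, 1))
    return max(log_cpu * loglog_cpu * 3 // 2, 2)


def combined_count(cpus: int, reserved_code_cache_bytes: int,
                   bytes_per_thread: int = 32 * MB) -> int:
    # Code cache term: one thread per 32MB, with an assumed floor of 2.
    cache_cap = max(reserved_code_cache_bytes // bytes_per_thread, 2)
    # Use whichever limit is tighter.
    return min(cpu_scaled_count(cpus), cache_cap)


for cpus, cache_mb in ((8, 240), (224, 240), (224, 1024), (1024, 48)):
    print(cpus, cache_mb, combined_count(cpus, cache_mb * MB))
```

Note that this still yields 7 threads on a 1024-CPU machine with the default 240MB code cache, so on its own it does not address the concern about big machines; the cache term would probably need a different per-thread size there.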

Igor Veresov was the original developer of the current formula. It would be nice to hear his opinion.

Thanks,
Vladimir

On 2/27/18 8:10 AM, Doerr, Martin wrote:
Hi,

the VM currently starts a large number of compiler threads on systems with many CPUs, regardless of the code cache size.
This doesn't make sense for very small code cache sizes.

The dynamically determined number of compiler threads can be observed by:
jdk/bin/java -XX:ReservedCodeCacheSize=128m -XX:+PrintFlagsFinal -version|grep CICompiler

I suggest using no more than 1 compiler thread per 32MB of code cache:
http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/

This seems to be conservative.
Please review and let me know if you have a different limitation proposal.

Best regards,
Martin
