RFR(S): 8198756: Limit number of compiler threads for small code cache

Wed Mar 21 22:36:11 UTC 2018

I hijacked this RFE and changed its Subject to implement dynamic 
allocation of compiler threads.

I wrote a proposal in the RFE's comments. Please take a look.

It is still assigned to Martin ;) but we can take ownership once we 
finalize the design.

Thanks,
Vladimir

On 3/2/18 2:28 AM, Doerr, Martin wrote:
> Hi Derek, Igor and Vladimir,
> 
> thanks for all replies.
> 
> I agree that it would be good to have something like 
> UseDynamicNumberOfGCThreads.
> 
> Btw. I have recently requested to activate that one by default 
> (JDK-8198547).
> 
> If we can’t get it for jdk11, I’d like at least to make it easier for 
> customers to save memory without explicitly setting CICompilerCount.
> 
> Best regards,
> 
> Martin
> 
> *From:*White, Derek [mailto:Derek.White at cavium.com]
> *Sent:* Friday, March 2, 2018 04:25
> *To:* Igor Veresov <igor.veresov at oracle.com>; Doerr, Martin 
> <martin.doerr at sap.com>
> *Cc:* Vladimir Kozlov <vladimir.kozlov at oracle.com>; 
> hotspot-compiler-dev at openjdk.java.net
> *Subject:* RE: RFR(S): 8198756: Limit number of compiler threads for 
> small code cache
> 
> Hi Igor, Martin,
> 
> Just to throw out some other user experience:
> 
> I’m typically running on machines with 98 to 224 CPUs. It’s not the case 
> that **every** Java app needs to use all the CPUs for compiler threads. 
> The JVM may not be the only JVM running on the system (Hadoop, 
> microservices, etc), let alone the only important app on the system.
> 
> Historically the GC threads have been the worst offenders in this 
> regard. The GC thread’s “scaling factor” is much higher than the 
> compiler thread’s scaling factor. But with options like 
> UseDynamicNumberOfGCThreads, the GC tries to adjust the number of GC 
> threads to the work to be done. I think it’s important that the JVM 
> figure out how to scale the number of compiler threads as well.
> 
> I won’t claim that Martin’s scheme is the best approach, or that it 
> should be on by default, but unless a better solution is going into JDK 
> 11, I’d support this scheme as an experimental flag. FWIW.
> 
>   * Derek White, Cavium (Purveyor of fine 224 cpu systems for the
>     discerning developer).
> 
> *From:*hotspot-compiler-dev 
> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of 
> *Igor Veresov
> *Sent:* Thursday, March 01, 2018 7:46 PM
> *To:* Doerr, Martin <martin.doerr at sap.com>
> *Cc:* Vladimir Kozlov <vladimir.kozlov at oracle.com>; 
> hotspot-compiler-dev at openjdk.java.net
> *Subject:* Re: RFR(S): 8198756: Limit number of compiler threads for 
> small code cache
> 
> Doerr,
> 
> I think the optimal number of compiler threads is the one that keeps the 
> compile queues as short as possible. During startup the optimal number 
> of compiler threads is typically equal to the number of CPUs, 
> maybe even more, considering that threads are either C1 or C2 and 
> compiles typically happen in waves using one and then the other. The 
> fact that some users see code cache filling slower with fewer threads is 
> just an indication of how huge their compile queues are, and this is 
> certainly not good for startup. The problem of resource holding is real, 
> since after startup we don’t need that many threads (unless you’re 
> running something that does dynamic code generation). Perhaps the 
> solution to all of this is having a dynamic pool of compiler threads 
> that could expand/shrink depending on the load (the length of the 
> compile queues).
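
A minimal sketch of the dynamic-pool idea, in Python for brevity. It is not taken from the RFE or from HotSpot: the function name, the bounds and the `tasks_per_thread` ratio are all hypothetical. It only illustrates sizing the pool from the compile queue length, so the pool expands during a startup wave and shrinks once the queue drains.

```python
import math


def target_compiler_threads(queue_length, min_threads=1, max_threads=8,
                            tasks_per_thread=2):
    """Return how many compiler threads to keep alive for a given queue length.

    All parameters are illustrative assumptions, not HotSpot flags.
    """
    # One thread per `tasks_per_thread` queued compile tasks, rounded up.
    wanted = math.ceil(queue_length / tasks_per_thread)
    # Never drop below the minimum or grow past the maximum.
    return max(min_threads, min(max_threads, wanted))


# A startup wave followed by a quiet phase (queue lengths are made up).
queue_lengths = [0, 20, 12, 6, 2, 0]
pool_sizes = [target_compiler_threads(n) for n in queue_lengths]
print(pool_sizes)
```

With these made-up numbers the pool grows to its maximum while the queue is long and falls back to a single thread when the queue is empty, which is the behaviour that would release the idle threads' resources after startup.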
> 
> igor
> 
>     On Mar 1, 2018, at 12:31 AM, Doerr, Martin <martin.doerr at sap.com>
>     wrote:
> 
>     Hi Igor,
> 
>     we observed that the compiler threads fill up the code cache faster
>     than the sweeper can clean when using a small code cache.
> 
>     This doesn't seem beneficial at all.
> 
>     Some customers try to save memory by using a very small code cache.
>     It's very annoying that so much memory gets wasted for such a large
>     number of idle compiler threads which hold their arenas etc.
> 
>     Maybe the current formula was optimized for a special scenario with
>     many slow cores? Maybe SPARC Niagara?
> 
>     Shouldn't such scenarios use a large code cache? Maybe much more
>     than 240MB?
> 
>     Best regards,
> 
>     Martin
> 
>     *From:*Igor Veresov [mailto:igor.veresov at oracle.com]
>     *Sent:*Thursday, March 1, 2018 08:05
>     *To:*Vladimir Kozlov <vladimir.kozlov at oracle.com>
>     *Cc:*Doerr, Martin <martin.doerr at sap.com>;
>     hotspot-compiler-dev at openjdk.java.net
>     *Subject:*Re: RFR(S): 8198756: Limit number of compiler threads for
>     small code cache
> 
>     I’m curious about the rationale for tying the number of threads to
>     the size of the code cache. Is it because you don’t want them to
>     keep holding the space for their code buffers when they are idle?
> 
>     igor
> 
> 
> 
>         On Feb 27, 2018, at 10:19 AM, Vladimir Kozlov
>         <vladimir.kozlov at oracle.com> wrote:
> 
>         Hi Doerr,
> 
>         The problem with your proposal is that it does not scale the
>         number of compiler threads when we have a lot of CPUs (>1000 on
>         big "slow" machines).
>         By default, tiered compilation gets 240MB for the CodeCache.
>         With your formula we will always have 7 threads (2 C1 and 5 C2),
>         which could be fine if the machine has fewer than 32
>         procs/threads in total. But on big machines it may be a
>         bottleneck for JIT-compilation-intensive applications (and
>         during startup, when most JIT compilations happen).
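
To make the "7 threads (2 C1 and 5 C2)" figure concrete, here is a small Python sketch. The thread states the result but not the rule, so the split used here (about one third of the threads for C1, the rest for C2, at least one each) is an assumption chosen because it reproduces the quoted numbers. The function name is hypothetical.

```python
def split_c1_c2(total_threads):
    """Split a total compiler thread count into (C1, C2) counts.

    Assumption: roughly one third of the threads go to C1 and the
    remainder to C2, with at least one thread for each compiler.
    """
    c1 = max(total_threads // 3, 1)
    c2 = max(total_threads - c1, 1)
    return c1, c2


# 240MB of code cache at one thread per 32MB gives 7 threads.
total = 240 // 32
print(total, split_c1_c2(total))
```

Note that under this assumption the split never returns fewer than two threads in total, because each compiler keeps at least one.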
> 
>         The main motivation for the current approach was to reach peak
>         performance (C2 compilations) as fast as possible. What we
>         usually observed before was a long C2 compilation queue
>         caused by the low throughput of C2. It was especially
>         visible with tiered compilation, where compilation thresholds
>         are reached faster with first-tier compiled profiling code.
> 
>         And I agree that we may have a problem with the number of
>         compiler threads at the beginning of the graph (< 32 CPU
>         threads), where the number grows too fast:
> 
>         Graph for 3*log2(x)*log2(log2(x))/2
> 
>         [plot omitted; marked point: x = 32.0711217, y = 17.4325495]
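
The graphed expression can be evaluated directly. The Python sketch below computes 3*log2(x)*log2(log2(x))/2 as a real-valued function; the function name is made up, and any integer rounding or lower bound that the VM applies on top of the formula is not shown in this thread and is left out.

```python
import math


def compiler_threads_curve(cpus):
    """Evaluate 3*log2(x)*log2(log2(x))/2 for x = number of CPU threads."""
    if cpus <= 1:
        # log2(log2(x)) is undefined for x <= 1.
        raise ValueError("the expression needs more than 1 CPU")
    log_cpu = math.log2(cpus)
    return 3 * log_cpu * math.log2(log_cpu) / 2


for cpus in (4, 16, 32, 256):
    print(cpus, round(compiler_threads_curve(cpus), 2))
```

The value at the marked point on the graph (x = 32.0711217) comes out at about 17.43, matching the y value quoted with the plot. At 16 CPU threads the expression already gives 12, which illustrates the steep growth on small machines.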
> 
> 
> 
>         Maybe we should have a formula that takes into account both
>         the code cache size and the number of CPU threads.
> 
>         Igor Veresov was the original developer of the current formula.
>         It would be nice to hear his opinion.
> 
>         Thanks,
>         Vladimir
> 
>         On 2/27/18 8:10 AM, Doerr, Martin wrote:
> 
>             Hi,
> 
>             the VM currently starts a large number of compiler threads
>             on systems with many CPUs regardless of the code cache size.
> 
>             This doesn't make sense for very small code cache sizes.
> 
>             The dynamically determined number of compiler threads can be
>             observed by:
> 
>             jdk/bin/java -XX:ReservedCodeCacheSize=128m
>             -XX:+PrintFlagsFinal -version|grep CICompiler
> 
>             I suggest using no more than 1 compiler thread per 32MB of
>             code cache:
> 
>             http://cr.openjdk.java.net/~mdoerr/8198756_CompilerCount/webrev.00/
> 
>             This seems to be conservative.
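
A rough Python sketch of the suggested limit, not the code from the webrev: it caps a CPU-derived thread count at one thread per 32MB of reserved code cache. The function name, the rounding down, and the floor of one thread are assumptions made for illustration.

```python
def cap_by_code_cache(cpu_based_count, reserved_code_cache_mb,
                      mb_per_thread=32, min_threads=1):
    """Limit a compiler thread count to one thread per `mb_per_thread` MB.

    `cpu_based_count` is whatever the existing CPU-based formula yields.
    Rounding down and the `min_threads` floor are assumptions.
    """
    cache_limit = reserved_code_cache_mb // mb_per_thread
    return max(min_threads, min(cpu_based_count, cache_limit))


# Example: a machine whose CPU-based count would be 17 threads.
print(cap_by_code_cache(17, 240))  # capped by a 240MB code cache
print(cap_by_code_cache(17, 128))  # capped by a 128MB code cache
print(cap_by_code_cache(3, 240))   # small machine: the cap does not apply
```

With a 128MB cache the cap is 4 threads, and with 240MB it is 7, so only machines whose CPU-based count exceeds those values are affected.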
> 
>             Please review and let me know if you have a different
>             limitation proposal.
> 
>             Best regards,
> 
>             Martin
>