Any way to avoid JIT overhead for small programs when using AOT?
Claes Redestad
claes.redestad at oracle.com
Tue Sep 11 10:39:57 UTC 2018
Hi,
On 2018-09-11 08:22, jayaprabhakar k wrote:
>
> > I understand that at present AOT and -Xint are not compatible. I see the
> > code explicitly disables AOT when -Xint is set
> > <http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>.
> >
> > For extremely short programs, typically used by beginners learning Java, I
> > see that CDS, AOT and Xint all help reduce the startup time. While CDS
> > works with both AOT and Xint, multiplying the benefits, AOT and Xint do
> > not.
> >
> > Is there a way to keep both AOT + Xint? For classes/methods that are
> > precompiled, use AOT code, and for others just interpret? If not now, would
> > it be possible in the future?
>
> Does it significantly help? If you precompile the Java library and your
> programs are extremely short, you'll see very little compilation activity.
>
> Thanks Andrew.
> I don't see any compilation (the default -XX:CompileThreshold is quite
> large), but the overhead still seems to be large. I ran a small test
> on AWS T2 instances.
> The test class just has an empty main method. But I could reproduce the
> exact same behavior when run with the *--dry-run* command line option.
>
> So most of the delay happens on startup.
>
> -- Default --
> $ perf stat -e cpu-clock -r50 java -XX:+UseG1GC EmptyMainMethod
>
> Performance counter stats for 'java -XX:+UseG1GC EmptyMainMethod' (50 runs):
>
> 104.039398 cpu-clock (msec) ( +- 0.39% )
>
> 0.093801870 seconds time elapsed ( +- 2.66% )
>
> -- Xint --
> $ perf stat -e cpu-clock -r50 java -XX:+UseG1GC -Xint EmptyMainMethod
>
> Performance counter stats for 'java -XX:+UseG1GC -Xint EmptyMainMethod' (50 runs):
>
> 76.203249 cpu-clock (msec) ( +- 0.33% )
>
> 0.083464038 seconds time elapsed ( +- 2.03% )
>
> -- AOT --
> $ perf stat -e cpu-clock -r50 java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod
>
> Performance counter stats for 'java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod' (50 runs):
>
> 102.416037 cpu-clock (msec) ( +- 0.22% )
>
> 0.083394143 seconds time elapsed ( +- 0.92% )
> --
There might always be some things executed by the interpreter, some of
which might get hot enough to trigger compilations. And if you've
compiled your AOT library with support for tiered compilation you might
also see C2 jobs fired off early.
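One way to check how much JIT work a short program actually triggers is to ask the JVM itself. The following is only a minimal sketch using the standard java.lang.management API; the class name JitActivityReport is made up for illustration, and loading the management classes adds startup cost of its own, so it is for diagnosis rather than for timing runs.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class JitActivityReport {
    // Returns a one-line summary of the JIT activity seen so far in this JVM.
    static String summarize() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // The platform returns null when the JVM has no compilation system.
            return "no JIT compiler available (for example when running with -Xint)";
        }
        if (!jit.isCompilationTimeMonitoringSupported()) {
            return jit.getName() + ": compilation time monitoring not supported";
        }
        // Approximate accumulated time spent in compilation, in milliseconds.
        return jit.getName() + ": " + jit.getTotalCompilationTime() + " ms spent compiling";
    }

    public static void main(String[] args) {
        // Flags passed to the JVM, such as -Xint or -XX:AOTLibrary=...
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM flags: " + flags);
        System.out.println(summarize());
    }
}
```

Running it once with the default settings and once with the flags under test should give a rough idea of whether any compilation time is being reported at all.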
You can indirectly avoid some of this by stopping the JIT from trying to
go beyond C1 level optimization:
-XX:TieredStopAtLevel=1
In your constrained environment you might also want to limit the number
of compiler threads the system could be spinning up to a minimum:
-XX:CICompilerCount=1
With this I see a significant reduction in cpu-clock time on my local
machine (recent build from jdk/jdk):
AOT:
81.064838 cpu-clock (msec) ( +- 1.13% )
0.073530160 seconds time elapsed ( +- 1.05% )
AOT -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1:
54.584255 cpu-clock (msec) ( +- 1.16% )
0.054806668 seconds time elapsed ( +- 1.35% )
There's some I/O and extra linking overhead of starting up with an AOT
archive, so -Xint might still outperform on a hello world:
52.138182 cpu-clock (msec) ( +- 1.60% )
0.053423763 seconds time elapsed ( +- 1.67% )
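To put the three cpu-clock figures above side by side, the relative savings can be worked out with a few lines of arithmetic. This is just a sketch over the numbers reported in this message; the class and method names are invented for illustration.

```java
public class StartupSavings {
    // Percentage by which `after` is lower than `before`.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // cpu-clock values in msec, copied from the measurements above.
        double aot = 81.064838;      // AOT only
        double aotTuned = 54.584255; // AOT -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1
        double xint = 52.138182;     // -Xint

        System.out.printf("AOT -> AOT with flags: %.1f%% less cpu-clock%n",
                reductionPercent(aot, aotTuned));
        System.out.printf("AOT -> -Xint: %.1f%% less cpu-clock%n",
                reductionPercent(aot, xint));
    }
}
```

On these figures the two flags cut cpu-clock time by roughly a third, which leaves the tuned AOT run close to the -Xint run.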
Generally the static startup overhead of AOT should be amortized rather
quickly, say, once you have something that runs for more than a couple
of hundred milliseconds.
HTH
/Claes
>
> --
> The source code for the test is
>
> public class EmptyMainMethod {
> public static void main(String[] args) {
>
> }
> }
>
>
> --
> This delay seems consistent with most programs created by school
> students learning Java.
>
> Context for the request: I am the developer of the Codiva.io online Java
> IDE <https://www.codiva.io>. Many teachers recommend it for their
> students to learn Java. To support spiky load, I run the programs on
> the server in a container with reduced resource limits for each run.
> At a 10% CPU limit, the difference grows to around 200 ms.
>
More information about the hotspot-compiler-dev
mailing list