premain: performance sniff tests
ioi.lam at oracle.com
Tue Sep 5 23:32:09 UTC 2023
We see an improvement from about 300 ms to about 90 ms for
"javac HelloWorld.java". You can see the numbers at the end of:
https://github.com/openjdk/leyden/blob/premain/test/hotspot/jtreg/premain/javac_helloworld/run.sh
Wall clock time - geomean over 10 runs of
'perf stat -r 16 javac HelloWorld.java'

Mainline JDK (CDS disabled)      302.86 ms
Mainline JDK (CDS enabled)       161.34 ms
Premain Prototype (CDS only)     131.71 ms
Premain Prototype (CDS + AOT)     92.84 ms
... with the typical disclaimer that the code is at a very early stage, so
your mileage will vary ...
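
For anyone who wants to compare the configurations against each other, here
is a minimal Python sketch that derives speedup and time-reduction figures
from the four numbers above. The helper names (speedup, reduction_pct) are
just illustrative; only the timings come from the table.

```python
# Wall-clock times in ms, copied from the table above.
timings_ms = {
    "Mainline JDK (CDS disabled)": 302.86,
    "Mainline JDK (CDS enabled)": 161.34,
    "Premain Prototype (CDS only)": 131.71,
    "Premain Prototype (CDS + AOT)": 92.84,
}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def reduction_pct(baseline_ms, candidate_ms):
    """Percentage of the baseline wall-clock time that was eliminated."""
    return 100.0 * (baseline_ms - candidate_ms) / baseline_ms


# Compare every configuration against mainline with CDS disabled.
baseline = timings_ms["Mainline JDK (CDS disabled)"]
for name, ms in timings_ms.items():
    print(f"{name:32s} {ms:8.2f} ms  "
          f"{speedup(baseline, ms):5.2f}x  "
          f"{reduction_pct(baseline, ms):5.1f}% less")
```

Swapping the baseline for the "CDS enabled" entry gives the comparison
against what a default mainline JDK does today.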
Thanks
- Ioi
On 9/5/23 4:23 PM, Vladimir Ivanov wrote:
> Hi Ashutosh,
>
> Thanks for giving it a try!
>
> There were some experiments with PetClinic on our side before and it
> was noticed that the application relies on custom loaders which aren't
> fully supported yet. It was the main limiting factor for new
> optimizations.
> Until proper support for custom loaders is there, I suggest modifying
> the benchmark so that it relies only on the existing system loaders.
>
> Speaking of peak performance, some loss of performance is expected.
> Cached code is compiled conservatively (e.g., no constant folding for
> static final fields) so it can be reused in deployment runs. For now,
> the intended solution is to eventually recompile cached code online
> with all the optimizations enabled (this has to be explicitly enabled
> with -XX:+UseRecompilation). It's a work in progress and our experience
> using it was mixed: recompilation doesn't always fully restore peak
> performance.
>
> But assuming that both the CDS and cached code archives are underutilized
> (due to the aforementioned reliance on custom loaders), 10% sounds like
> way too big a difference. I suggest experimenting with different flag
> combinations (e.g., turning ReplayTraining and LoadCachedCode on and
> off independently).
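
A small sketch of how the four on/off combinations of those two flags could
be enumerated for a test script. The flag names are the ones mentioned in
the paragraph above; APP_ARGS is a placeholder assumption and needs to be
replaced with the real classpath and archive options for the application.

```python
from itertools import product

# Flag names as mentioned in the thread; they exist only in the premain
# branch. HotSpot boolean flags are toggled as -XX:+Name / -XX:-Name.
FLAGS = ["ReplayTraining", "LoadCachedCode"]


def flag_combinations(flags):
    """Yield every on/off combination as a list of -XX options."""
    for states in product([True, False], repeat=len(flags)):
        yield [f"-XX:{'+' if on else '-'}{name}"
               for name, on in zip(flags, states)]


# Placeholder (assumption): substitute the real application arguments.
APP_ARGS = ["-jar", "app.jar"]

# Print one candidate command line per combination.
for opts in flag_combinations(FLAGS):
    print(" ".join(["java", *opts, *APP_ARGS]))
```

Running each printed command under the same load and comparing the results
should show which of the two features accounts for the difference.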
>
> There's additional diagnostic output the JVM produces which may help to
> observe effects from new optimizations during both training and
> deployment runs:
>
> * -XX:+PrintCompilation: compilations satisfied from the cached code
> archive are marked with "R";
>
> * -XX:+CITime: prints information about cached code archive usage;
>
> * -Xlog:init=info: produces additional information about some startup
> activities;
>
> * -XX:+PrintSharedArchiveAndExit: additionally dumps training data and
> cached code archive info;
>
> * -Xlog:scc*=info and -Xlog:cds*=info: print lots of additional
> information during both training and deployment.
>
> Hope it helps.
>
> Best regards,
> Vladimir Ivanov
>
> On 9/5/23 13:52, Ashutosh Mehra wrote:
>> Hi,
>>
>> We have been interested in persisting the profiling data in the CDS
>> archive with the intention of improving the application's warmup time.
>> Now that the premain branch is here and saves profile data along with
>> AOT, we have started playing with it to understand its impact on
>> performance.
>>
>> Our setup uses the Spring Boot PetClinic [0] application, and the CDS
>> and shared code archives are generated in a manner similar to this
>> script [1].
>> Our training run only covers the application startup phase. That
>> means at each step we start the application and shut it down without
>> putting any load on it.
>>
>> Using the archives thus generated, I have done a few experiments on my
>> local system. In these experiments the application is bound to two CPUs.
>> The baseline for comparing the results is the case where the CDS
>> archive does not have any profiling data and there is no shared code
>> archive.
>> The "premain" configuration refers to using a shared code archive and
>> a CDS archive with training data.
>>
>> Here are some initial results:
>>
>> 1. Startup: It is heartening to see start-up time improve by almost 11%.
>>
>> baseline 10.2s
>> premain 9.1s
>>
>> 2. Warmup:
>> This test measures the warmup time by applying load using 1 JMeter
>> thread to get an idea of the ramp-up time to reach the peak throughput.
>> The load is applied for a duration of 300 seconds. The graph [2]
>> for the aot+profiling configuration shows interesting behavior.
>> In the initial period premain is ramping up faster than the baseline.
>> Then the slope of the curve for premain reduces significantly and a
>> couple of dips are also seen. Finally the throughput stabilizes.
>> It shows a drastic difference in the warmup time of the application
>> when running with the "premain" config.
>>
>> 3. Peak throughput: The last experiment measures peak throughput. It
>> starts with a warm-up phase of 180 seconds using 1 JMeter thread.
>> After the warmup phase the load is applied with 10 JMeter threads for
>> a duration of 5 minutes.
>> The last two minutes of throughput are considered for measurement. The
>> graph [3] for this test shows almost a 10% drop in the throughput
>> compared to the baseline.
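
The measurement described in point 3 could be reduced to a single number
along these lines. This is only a sketch: it assumes the load generator's
output has already been converted into (elapsed_seconds, requests_per_second)
samples, which is not necessarily the format the scripts in [4] produce, and
the function names are hypothetical.

```python
def peak_throughput(samples, window_s=120):
    """Mean throughput over the final window_s seconds of the load phase.

    samples: list of (elapsed_seconds, requests_per_second) tuples.
    """
    if not samples:
        raise ValueError("no samples")
    end = max(t for t, _ in samples)
    # Keep only the samples that fall inside the trailing window.
    tail = [rps for t, rps in samples if t > end - window_s]
    return sum(tail) / len(tail)


def drop_pct(baseline, candidate):
    """Throughput loss of the candidate relative to the baseline, in %."""
    return 100.0 * (baseline - candidate) / baseline
```

Calling peak_throughput on the baseline and premain sample lists and passing
both results to drop_pct gives the relative drop discussed above.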
>>
>>
>> I am sure others would have done similar testing. My questions are:
>>
>> 1. Are these results along the expected lines?
>> 2. Are these tests using the CDS and the shared code (or cached code)
>> archives in the expected manner?
>> 3. Warmup time with the premain branch looks pretty bad, which is
>> surprising. Is there any trick I missed in my tests? Is there
>> anything else that needs to be done to get better warmup time?
>> 4. What is the point of creating a new static archive? Shouldn't the
>> applications just create the dynamic archive?
>> 5. I am also wondering if there is any design doc that can be shared
>> that explains the AOT compilation strategy adopted in the premain
>> branch?
>>
>> I have placed my scripts here [4] in case anyone wants to use them to
>> run these tests (you need to build the Petclinic app before using
>> these scripts).
>>
>> Please feel free to share your thoughts.
>>
>> [0] https://github.com/spring-projects/spring-petclinic
>> [1]
>> https://github.com/openjdk/leyden/blob/d960fb15258cc99a1bf7f0b1e94bd8be06605aad/test/hotspot/jtreg/premain/lib/premain-run.sh#L70-L101
>>
>> [2]
>> https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t1.svg
>>
>> [3]
>> https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t10.svg
>>
>> [4] https://github.com/ashu-mehra/leyden-perf
>>
>> Thanks,
>> - Ashutosh Mehra