premain: performance sniff tests

ioi.lam at oracle.com
Tue Sep 5 23:32:09 UTC 2023


We see an improvement from about 300 ms to about 90 ms for "javac HelloWorld.java".

You can see the numbers at the end of:

https://github.com/openjdk/leyden/blob/premain/test/hotspot/jtreg/premain/javac_helloworld/run.sh

Wall clock time - geomean over 10 runs of 'perf stat -r 16 javac HelloWorld.java'
Mainline JDK (CDS disabled)     302.86 ms
Mainline JDK (CDS enabled)      161.34 ms
Premain Prototype (CDS only)    131.71 ms
Premain Prototype (CDS + AOT)    92.84 ms

... with the typical disclaimer that the code is at a very early stage, so 
your mileage will vary ...
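
For a quick read of the table, the ratios relative to the CDS-disabled mainline run can be worked out with a few lines of shell. This is only a sketch: the times are copied from the table above, and the function name speedups is made up for illustration.

```shell
# Speedup of each configuration relative to the first row
# (mainline JDK with CDS disabled). Times in ms, copied from the table.
speedups() {
  awk -F'|' '
    NR == 1 { base = $2 }
    { printf "%-32s %7.2f ms  %.2fx\n", $1, $2, base / $2 }
  ' <<'EOF'
Mainline JDK (CDS disabled)|302.86
Mainline JDK (CDS enabled)|161.34
Premain Prototype (CDS only)|131.71
Premain Prototype (CDS + AOT)|92.84
EOF
}
speedups
```

The last row comes out at roughly 3.3x, which matches the "300 ms to 90 ms" summary.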

Thanks

- Ioi



On 9/5/23 4:23 PM, Vladimir Ivanov wrote:
> Hi Ashutosh,
>
> Thanks for giving it a try!
>
> There were some experiments with PetClinic on our side before and it 
> was noticed that the application relies on custom loaders which aren't 
> fully supported yet. It was the main limiting factor for new 
> optimizations.
> Until proper support for custom loaders is there, I suggest modifying 
> the benchmark so it relies only on existing system loaders.
>
> Speaking of peak performance, some loss of performance is expected. 
> Cached code is compiled conservatively (e.g., no constant folding for 
> static final fields) so it can be reused in deployment runs. For now, 
> the intended solution is to eventually recompile cached code online 
> with all the optimizations enabled (this has to be enabled explicitly 
> with -XX:+UseRecompilation). It's a work in progress and our experience 
> using it was mixed: recompilation doesn't always fully restore peak 
> performance.
>
> But assuming that both the CDS and cached code archives are underutilized 
> (due to the aforementioned reliance on custom loaders), 10% sounds like 
> way too big a difference. I suggest experimenting with different flag 
> combinations (e.g., turning ReplayTraining and LoadCachedCode on and 
> off independently).
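
One way to enumerate those combinations is a small loop that prints the four command lines. This is a sketch under assumptions: the -XX:+/- spelling is inferred from "on and off" above, APP_CMD and app.jar are placeholders, and the options that select the CDS and cached code archives are left out.

```shell
# Print one java command line per on/off combination of the two flags.
# APP_CMD is a placeholder for the application arguments.
APP_CMD="${APP_CMD:--jar app.jar}"
flag_matrix() {
  for replay in + -; do
    for load in + -; do
      echo "java -XX:${replay}ReplayTraining -XX:${load}LoadCachedCode ${APP_CMD}"
    done
  done
}
flag_matrix
```

Running each printed command against the same load and comparing throughput should show which of the two features accounts for the difference.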
>
> The JVM also produces additional diagnostic output, which may help to 
> observe the effects of the new optimizations during both training and 
> deployment runs:
>
>  * -XX:+PrintCompilation: compilations satisfied from the cached code 
> archive are marked with "R";
>
>  * -XX:+CITime: prints information about cached code archive usage;
>
>  * -Xlog:init=info: produces additional information about some startup 
> activities;
>
>  * -XX:+PrintSharedArchiveAndExit: additionally dumps training data and 
> cached code archive info;
>
>  * -Xlog:scc*=info and -Xlog:cds*=info: print lots of additional 
> information during both training and deployment.
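
As a rough illustration, the options listed above could be combined on one command line as follows. The helper name diag_cmd and app.jar are placeholders, and -XX:+PrintSharedArchiveAndExit is left out because, as its name says, it exits after printing.

```shell
# Print a java command line carrying the diagnostic options from the list.
# The '*' selectors stay in single quotes so that the shell passes them to
# the JVM unchanged instead of glob-expanding them.
diag_cmd() {
  echo "java -XX:+PrintCompilation -XX:+CITime -Xlog:init=info" \
       "'-Xlog:scc*=info' '-Xlog:cds*=info' $*"
}
diag_cmd -jar app.jar   # app.jar is a placeholder
```

The output is verbose, so redirecting it to a file per run makes the training and deployment logs easier to compare.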
>
> Hope it helps.
>
> Best regards,
> Vladimir Ivanov
>
> On 9/5/23 13:52, Ashutosh Mehra wrote:
>> Hi,
>>
>> We have been interested in persisting the profiling data in the CDS 
>> archive with the intention of improving the application's warmup time.
>> And now that the premain branch is here that does save profile data 
>> along with AOT, we started playing with the premain branch to 
>> understand its impact on the performance.
>>
>> Our setup uses the Spring Boot PetClinic [0] application, and the CDS 
>> and shared code archives are generated in a manner similar to this 
>> script [1].
>> Our training run only covers the application startup phase. That 
>> means at each step we start the application and shut it down without 
>> putting any load on it.
>>
>> Using the archives thus generated I have done a few experiments on my 
>> local system. In these experiments the application is bound to two CPUs.
>> The baseline for comparing the results is the case where the CDS 
>> archive does not have any profiling data and there is no shared code 
>> archive.
>> The "premain" configuration refers to using a shared code archive and 
>> a CDS archive with training data.
>>
>> Here are some initial results:
>>
>> 1. Startup: It is heartening to see start-up time improve by almost 11%.
>>
>> baseline       10.2s
>> premain         9.1s
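
The "almost 11%" figure follows directly from the two startup times. A minimal sketch of the arithmetic, with improvement_pct as a made-up helper name:

```shell
# Relative improvement in percent: (baseline - new) / baseline * 100.
improvement_pct() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (b - n) / b * 100 }'
}
improvement_pct 10.2 9.1   # startup times in seconds from the table above
```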
>>
>> 2. Warmup:
>> This test measures the warmup time by applying load using 1 jmeter 
>> thread to get an idea of the ramp-up time to reach the peak throughput.
>> The load is applied for a duration of 300 seconds. The graph [2] 
>> for the AOT+profiling configuration shows interesting behavior.
>> In the initial period premain is ramping up faster than the baseline. 
>> Then the slope of the curve for premain reduces significantly and a 
>> couple of dips are also seen. Finally the throughput stabilizes.
>> It shows a drastic difference in the warmup time of the application 
>> when running with the "premain" config.
>>
>> 3. Peak throughput: The last experiment measures peak throughput. It 
>> starts with a warm-up phase of 180 seconds using 1 jmeter thread. 
>> After the warm-up phase the load is applied with 10 jmeter threads for 
>> a duration of 5 minutes.
>> Only the last two minutes of throughput are considered for the 
>> measurement. The graph [3] for this test shows almost a 10% drop in 
>> throughput compared to the baseline.
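
The "last two minutes" measurement described above amounts to averaging the samples that fall within a trailing window. The sketch below assumes a hypothetical input format of one "<elapsed_seconds> <requests_per_second>" pair per line; the helper name tail_mean and the sample numbers are invented for illustration and are not taken from the linked scripts or graphs.

```shell
# Mean of the second column over the trailing window (in seconds),
# measured back from the timestamp on the last input line.
tail_mean() {
  awk -v window="$1" '
    { t[NR] = $1; v[NR] = $2 }
    END {
      if (NR == 0) exit
      cutoff = t[NR] - window
      for (i = 1; i <= NR; i++)
        if (t[i] > cutoff) { sum += v[i]; n++ }
      if (n > 0) printf "%.1f\n", sum / n
    }
  '
}
# Made-up samples: only the rows at 240 s and 300 s fall in the last 120 s.
printf '%s\n' '60 100' '180 200' '240 300' '300 500' | tail_mean 120
```

Applying the same window to both the baseline and the premain run keeps the comparison consistent.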
>>
>>
>> I am sure others would have done similar testing.  My questions are:
>>
>> 1. Are these results along the expected lines?
>> 2. Are these tests using the CDS and the shared code (or cached code) 
>> archives in the expected manner?
>> 3. Warmup time with the premain branch looks pretty bad which is 
>> surprising. Is there any trick I missed in my tests? Is there 
>> anything else that needs to be done to get better warmup time?
>> 4. What is the point of creating a new static archive? Shouldn't the 
>> applications just create the dynamic archive?
>> 5. I am also wondering if there is any design doc that can be shared 
>> that explains the AOT compilation strategy adopted in the premain 
>> branch?
>>
>> I have placed my scripts here [4] in case anyone wants to use them to 
>> run these tests (you need to build the Petclinic app before using 
>> these scripts).
>>
>> Please feel free to share your thoughts.
>>
>> [0] https://github.com/spring-projects/spring-petclinic
>> [1] https://github.com/openjdk/leyden/blob/d960fb15258cc99a1bf7f0b1e94bd8be06605aad/test/hotspot/jtreg/premain/lib/premain-run.sh#L70-L101
>> [2] https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t1.svg
>> [3] https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t10.svg
>> [4] https://github.com/ashu-mehra/leyden-perf
>>
>> Thanks,
>> - Ashutosh Mehra


More information about the leyden-dev mailing list