premain: performance sniff tests

Ashutosh Mehra asmehra at redhat.com
Tue Sep 5 20:52:56 UTC 2023


Hi,

We have been interested in persisting profiling data in the CDS archive
with the aim of improving application warmup time.
Now that the premain branch is here and saves profile data along with
AOT-compiled code, we have started experimenting with it to understand its
impact on performance.

Our setup uses the Spring Boot PetClinic application [0], and the CDS and
shared code archives are generated in a manner similar to this script [1].
Our training run covers only the application's startup phase. That means at
each step of archive generation we start the application and shut it down
without putting any load on it.
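For concreteness, the generation step looks roughly like the sketch below.
The flag spellings are from my reading of premain-run.sh [1] at that
commit; treat this as illustrative rather than authoritative:

  # Training run: produces the CDS archive (with training data) and the
  # shared/cached code archive; the app is started and shut down, no load
  java -XX:CacheDataStore=petclinic.cds \
       -XX:+StoreCachedCode -XX:CachedCodeFile=petclinic.code \
       -XX:CachedCodeMaxSize=512M \
       -jar spring-petclinic.jar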

Using the archives generated this way, I have run a few experiments on my
local system, with the application bound to two CPUs.
The baseline for comparing the results is a CDS archive without any
profiling data and no shared code archive.
The "premain" configuration uses a shared code archive plus a CDS archive
containing training data.
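In other words, the two configurations are launched roughly as follows
(flag names again as I understand them from the premain scripts; the exact
invocations are in my scripts [4]):

  # baseline: plain CDS archive, no training data, no shared code archive
  java -XX:SharedArchiveFile=petclinic-base.jsa -jar spring-petclinic.jar

  # premain: CDS archive with training data plus the shared code archive
  java -XX:CacheDataStore=petclinic.cds \
       -XX:+LoadCachedCode -XX:CachedCodeFile=petclinic.code \
       -jar spring-petclinic.jar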

Here are some initial results:

1. Startup: It is heartening to see startup time improve by almost 11%.

baseline       10.2s
premain         9.1s
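For anyone reproducing this, a convenient proxy for startup time is the
figure Spring Boot itself logs, assuming the stock PetClinic log format:

  # Spring Boot prints its own startup time on the "Started ..." line
  grep -o 'Started PetClinicApplication in [0-9.]* seconds' petclinic.log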

2. Warmup:
This test measures warmup time by applying load with 1 JMeter thread to get
an idea of the ramp-up time to reach peak throughput. The load is applied
for a duration of 300 seconds. The graph [2] for the aot+profiling
configuration shows interesting behavior: in the initial period premain
ramps up faster than the baseline, but then the slope of its curve drops
significantly and a couple of dips appear before the throughput finally
stabilizes. The net result is a drastically longer warmup time when running
with the "premain" config.
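For reference, the load generation is along these lines (non-GUI JMeter;
the test plan name and -J property names are placeholders for what is
actually in my scripts [4]):

  # 1 JMeter thread for 300 s against the PetClinic endpoints
  jmeter -n -t petclinic.jmx -l warmup-t1.jtl \
         -Jthreads=1 -Jduration=300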

3. Peak throughput: The last experiment measures peak throughput. It
starts with a warmup phase of 180 seconds using 1 JMeter thread. After the
warmup phase, load is applied with 10 JMeter threads for a duration of 5
minutes. The throughput over the last two minutes is used for the
measurement. The graph [3] for this test shows almost a 10% drop in
throughput compared to the baseline.
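One way to compute the last-two-minutes figure from the JMeter results
file, assuming a standard CSV .jtl whose first column is an
epoch-millisecond timestamp, is something like:

  # count samples in the final 120 s of the run, divide by 120 s
  awk -F, 'NR>1 { ts[NR]=$1; last=$1 }
           END { for (i in ts) if (ts[i] >= last-120000) n++
                 printf "%.1f req/s\n", n/120 }' tput-t10.jtl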


I am sure others have done similar testing. My questions are:

1. Are these results along expected lines?
2. Are these tests using the CDS and the shared code (or cached code)
archives in the expected manner?
3. Warmup time with the premain branch looks surprisingly bad. Is there any
trick I missed in my tests? Is there anything else that needs to be done to
get better warmup time?
4. What is the point of creating a new static archive? Shouldn't
applications just create the dynamic archive?
5. Is there a design doc that can be shared which explains the AOT
compilation strategy adopted in the premain branch?

I have placed my scripts here [4] in case anyone wants to use them to run
these tests (you will need to build the PetClinic app before using these
scripts).

Please feel free to share your thoughts.

[0] https://github.com/spring-projects/spring-petclinic
[1] https://github.com/openjdk/leyden/blob/d960fb15258cc99a1bf7f0b1e94bd8be06605aad/test/hotspot/jtreg/premain/lib/premain-run.sh#L70-L101
[2] https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t1.svg
[3] https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t10.svg
[4] https://github.com/ashu-mehra/leyden-perf

Thanks,
- Ashutosh Mehra