premain: performance sniff tests
Ashutosh Mehra
asmehra at redhat.com
Wed Sep 6 15:07:56 UTC 2023
Hi Vladimir,
Thanks for providing the explanation on peak performance and diagnostic
options.
> There were some experiments with PetClinic on our side before and it was
> noticed that the application relies on custom loaders which aren't fully
> supported yet.
Could you please elaborate on the support required for handling custom
classloaders?
Do they have an impact on the AOT code quality or the training data?
> Until proper support for custom loaders is in place, I suggest modifying
> the benchmark so that it relies only on the existing system loaders.
Is there ongoing work to improve the support for custom loaders?
Another thing I want to check is the portability of the AOT code.
Do we do anything to ensure that the AOT code is portable across
microarchitectures, that is, not tied to the CPU features of the system
where the code is generated?
If we bundle the cached code archive into container images, which I expect
will be one of the common ways to deploy these archives, then portability
becomes an important consideration.
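To make the concern concrete, the deployment flow I have in mind is roughly
the following (the paths, archive names and image layout here are purely
illustrative, and I am omitting the premain-specific options):

  # image build host, possibly with newer CPU features (e.g. AVX-512);
  # run the training step here so the CDS archive (and the cached code
  # archive) are produced at image build time
  java -XX:ArchiveClassesAtExit=petclinic.jsa -jar petclinic.jar
  # bundle the generated archives into the container image, e.g.
  #   COPY petclinic.jsa /app/
  # deployment host, potentially an older microarchitecture:
  java -XX:SharedArchiveFile=/app/petclinic.jsa -jar /app/petclinic.jar

If the cached code is compiled using the CPU features of the build host, a
run like the last one may not be able to use it, which is why I am asking.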
Thanks,
- Ashutosh Mehra
On Tue, Sep 5, 2023 at 8:41 PM Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
wrote:
> Hi Ashutosh,
>
> Thanks for giving it a try!
>
> There were some experiments with PetClinic on our side before and it was
> noticed that the application relies on custom loaders which aren't fully
> supported yet. It was the main limiting factor for new optimizations.
> Until proper support for custom loaders is in place, I suggest modifying
> the benchmark so that it relies only on the existing system loaders.
>
> Speaking of peak performance, some loss of performance is expected.
> Cached code is compiled conservatively (e.g., no constant folding for
> static final fields) so it can be reused in deployment runs. For now,
> the intended solution is to eventually recompile cached code online with
> all the optimizations enabled (this has to be explicitly enabled with
> -XX:+UseRecompilation). It's a work in progress, and our experience using
> it was mixed: recompilation doesn't always fully restore peak
> performance.
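Just to make sure I understand: a deployment run with that enabled would
look roughly like the line below? (I am taking the flag name from your
mail; "..." stands for the archive-related options, whose exact spelling
in the premain branch I am not sure about.)

  # deployment run with online recompilation of cached code enabled
  java -XX:+UseRecompilation ... -jar petclinic.jar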
>
> But assuming that both the CDS and cached code archives are underutilized
> (due to the aforementioned reliance on custom loaders), 10% sounds like
> too big a difference. I suggest experimenting with different flag
> combinations (e.g., turning ReplayTraining and LoadCachedCode on and off
> independently).
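For the flag experiments, I am planning to run roughly the following
matrix (assuming both options are plain boolean -XX flags; "..." again
stands for the archive-related options from the run script):

  # both on: the current premain configuration
  java -XX:+ReplayTraining -XX:+LoadCachedCode ... -jar petclinic.jar
  # training-data replay only
  java -XX:+ReplayTraining -XX:-LoadCachedCode ... -jar petclinic.jar
  # cached code only
  java -XX:-ReplayTraining -XX:+LoadCachedCode ... -jar petclinic.jar
  # both off: should behave close to the plain CDS baseline
  java -XX:-ReplayTraining -XX:-LoadCachedCode ... -jar petclinic.jar

That should tell us whether the regression comes from the cached code, the
replayed profile, or both.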
>
> There's additional diagnostic output the JVM produces which may help to
> observe the effects of the new optimizations during both training and
> deployment runs:
>
> * -XX:+PrintCompilation: compilations satisfied from the cached code
> archive are marked with "R";
>
> * -XX:+CITime: prints information about cached code archive usage;
>
> * -Xlog:init=info: produces additional information about some startup
> activities;
>
> * -XX:+PrintSharedArchiveAndExit: additionally dumps training data and
> cached code archive info;
>
> * -Xlog:scc*=info and -Xlog:cds*=info: print lots of additional
> information during both training and deployment.
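I will capture that output on our side as well, roughly along these lines
(log file names are placeholders, and I am assuming the archive is still
passed with -XX:SharedArchiveFile; the premain scripts may use a different
option):

  # deployment run with the diagnostic output captured to a file
  java -XX:+PrintCompilation -XX:+CITime \
       -Xlog:init=info '-Xlog:scc*=info' '-Xlog:cds*=info' \
       ... -jar petclinic.jar > deploy-diag.log 2>&1
  # dump the training data and cached code archive info without a full run
  java -XX:+PrintSharedArchiveAndExit \
       -XX:SharedArchiveFile=petclinic.jsa -jar petclinic.jar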
>
> Hope it helps.
>
> Best regards,
> Vladimir Ivanov
>
> On 9/5/23 13:52, Ashutosh Mehra wrote:
> > Hi,
> >
> > We have been interested in persisting the profiling data in the CDS
> > archive with the intention of improving the application's warmup time.
> > Now that the premain branch, which saves profile data along with AOT
> > code, is available, we have started playing with it to understand
> > its impact on performance.
> >
> > Our setup uses the Spring Boot PetClinic application [0], and the CDS
> > and shared code archives are generated in a manner similar to this
> > script [1].
> > Our training run only covers the application startup phase. That means
> > at each step we start the application and shut it down without putting
> > any load on it.
> >
> > Using the archives thus generated, I have done a few experiments on my
> > local system. In these experiments the application is bound to two CPUs.
> > The baseline for comparing the results is the case where the CDS archive
> > does not have any profiling data and there is no shared code archive.
> > The "premain" configuration refers to using a shared code archive and a
> > CDS archive with training data.
> >
> > Here are some initial results:
> >
> > 1. Startup: It is heartening to see start-up time improve by almost 11%.
> >
> > baseline 10.2s
> > premain 9.1s
> >
> > 2. Warmup:
> > This test measures the warmup time by applying load with 1 JMeter
> > thread to get an idea of the ramp-up time to reach peak throughput.
> > The load is applied for a duration of 300 seconds. The graph [2] for
> > the aot+profiling configuration shows interesting behavior.
> > In the initial period premain ramps up faster than the baseline.
> > Then the slope of the premain curve decreases significantly and a
> > couple of dips are also visible. Finally the throughput stabilizes.
> > Overall there is a drastic difference in the application's warmup time
> > when running with the "premain" config.
> >
> > 3. Peak throughput: The last experiment measures peak throughput. It
> > starts with a warm-up phase of 180 seconds using 1 JMeter thread. After
> > the warmup phase, load is applied with 10 JMeter threads for a
> > duration of 5 minutes.
> > The last two minutes of throughput are considered for the measurement.
> > The graph [3] for this test shows almost a 10% drop in throughput
> > compared to the baseline.
> >
> >
> > I am sure others have done similar testing. My questions are:
> >
> > 1. Are these results in line with expectations?
> > 2. Are these tests using the CDS and the shared code (or cached code)
> > archives in the expected manner?
> > 3. Warmup time with the premain branch looks pretty bad, which is
> > surprising. Is there any trick I missed in my tests? Is there anything
> > else that needs to be done to get better warmup time?
> > 4. What is the point of creating a new static archive? Shouldn't the
> > applications just create the dynamic archive?
> > 5. I am also wondering whether there is any design doc that can be
> > shared that explains the AOT compilation strategy adopted in the
> > premain branch.
> >
> > I have placed my scripts here [4] in case anyone wants to use them to
> > run these tests (you need to build the Petclinic app before using these
> > scripts).
> >
> > Please feel free to share your thoughts.
> >
> > [0] https://github.com/spring-projects/spring-petclinic
> > [1] https://github.com/openjdk/leyden/blob/d960fb15258cc99a1bf7f0b1e94bd8be06605aad/test/hotspot/jtreg/premain/lib/premain-run.sh#L70-L101
> > [2] https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t1.svg
> > [3] https://github.com/ashu-mehra/leyden-perf/blob/main/spring/fd82682/tput-t10.svg
> > [4] https://github.com/ashu-mehra/leyden-perf
> >
> > Thanks,
> > - Ashutosh Mehra
>
>