JEP: https://bugs.openjdk.java.net/browse/JDK-8203832

Fri Jun 1 03:01:11 UTC 2018

Hi, Tobias

    Thanks for your review and questions. First, let me give some
background on the scenario in which JWarmup is used and on how we implement
the interaction between the application and the scheduling (dispatch)
system, DS.

    The load of each application is controlled by DS. The profiling data is
collected against real input data, so it closely matches how the application
runs in production environments, which reduces the chance of
deoptimization. When running with profiling data, the application gets a
notification from DS when compilation should start. The application then
calls an API to notify the JVM that the hot methods recorded in the file can
be compiled. After the compilations finish, a message is sent to DS so that
DS will dispatch load to this application.
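
The handshake described above can be sketched roughly as follows. This is
only an illustration, not the JWarmup API: the interfaces WarmupCompiler and
DispatchSystem, the class WarmupHandshake, and all method names are
hypothetical, and the sketch assumes the compile call blocks until the
recorded methods have been compiled.

```java
// Hypothetical stand-in for the API the application calls to tell the JVM
// that the recorded hot methods can be compiled (assumed to block until done).
interface WarmupCompiler {
    void compileRecordedMethods();
}

// Hypothetical channel from the application back to the dispatch system (DS).
interface DispatchSystem {
    void reportReady(String appId);
}

final class WarmupHandshake {
    enum State { WAITING_FOR_DS, COMPILING, READY }

    private final String appId;
    private final WarmupCompiler compiler;
    private final DispatchSystem ds;
    private State state = State.WAITING_FOR_DS;

    WarmupHandshake(String appId, WarmupCompiler compiler, DispatchSystem ds) {
        this.appId = appId;
        this.compiler = compiler;
        this.ds = ds;
    }

    // Called when DS notifies the application that compilation should start.
    void onCompileSignal() {
        if (state != State.WAITING_FOR_DS) {
            return; // ignore duplicate notifications
        }
        state = State.COMPILING;
        compiler.compileRecordedMethods();
        state = State.READY;
        // Only now does DS learn that it may dispatch load to this instance.
        ds.reportReady(appId);
    }

    State state() {
        return state;
    }
}
```

The point of the ordering is that the ready message is sent only after the
compile call returns, so DS does not route load to an instance that is still
compiling.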

     Now to answer your questions:

      Here are some more detailed questions:

- How is it implemented? Is it based on the replay compilation framework?

A: No, it is not based on the replay compilation framework. The data
structure for recording profiles is newly designed.

- How do you handle dynamically generated code (for example, lambda forms)?

A: Dynamically generated methods are not handled, since their names are
sometimes generated differently in different runs.
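
As an illustration of why such methods are hard to match across runs, a
recorder could skip classes whose names look generated. The sketch below is
an assumption, not how JWarmup does it: the class GeneratedCodeFilter is
hypothetical, and the name patterns are a heuristic based on the names
HotSpot typically gives to lambda proxy classes and lambda forms.

```java
final class GeneratedCodeFilter {
    private GeneratedCodeFilter() {}

    // Heuristic only (assumption): HotSpot lambda proxy classes usually
    // contain "$$Lambda" in their name, and lambda forms live in classes
    // named java.lang.invoke.LambdaForm$... with a run-specific suffix.
    static boolean looksDynamicallyGenerated(String className) {
        return className.contains("$$Lambda")
            || className.startsWith("java.lang.invoke.LambdaForm$");
    }
}
```

A recorder using this filter would drop
"com.example.Shop$$Lambda$14/0x0000000800066840" but keep
"com.example.Shop" (both names are made up for the example).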

- What information is stored/re-used and which profile is cached?

A: Classes, methods, init order, bci method data, etc.

- Does it work for C1 and C2?

A: C2 only for now; it could be made to work with C1.

- How is the tiered compilation policy affected?

A: Currently disabled.

- When do we compile (if method is first used or are there still
thresholds)?

A: See the introduction above. Thresholds are not checked when compiling.

- How do you avoid overloading the compile queue?

A: In real application runs, we have not seen the compile queue overloaded.
This can also be controlled, since we know how many compiler threads are
configured and how many methods are recorded.
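
As a rough illustration of that reasoning, the per-thread backlog can be
estimated from the two known quantities. This is only back-of-the-envelope
arithmetic under the assumption that the recorded methods are spread evenly
over the compiler threads; the class and method names are hypothetical and
not part of JWarmup.

```java
final class CompileQueueEstimate {
    private CompileQueueEstimate() {}

    // Upper bound on methods each compiler thread must handle if the recorded
    // methods are spread evenly: ceil(recordedMethods / compilerThreads).
    static long methodsPerThread(int recordedMethods, int compilerThreads) {
        if (recordedMethods < 0) {
            throw new IllegalArgumentException("recordedMethods must be >= 0");
        }
        if (compilerThreads <= 0) {
            throw new IllegalArgumentException("compilerThreads must be > 0");
        }
        // Widen to long so the rounding-up addition cannot overflow.
        return ((long) recordedMethods + compilerThreads - 1) / compilerThreads;
    }
}
```

For example, 1000 recorded methods on 8 compiler threads gives 125 methods
per thread, which can be compared against how long the application can wait
before it reports ready to DS.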

- Is re-profiling/re-compilation supported?

A: No. See also the answer to the next question.

- What if a method is deoptimized? Is the cached profile updated and re-used?

A: During runs with pre-compiled methods, we have seen deoptimization only
from null-check elimination, so deoptimization is not completely eliminated.
The profile data is neither updated nor re-used. That is, after a method is
deoptimized, it starts again in the interpreter, as if freshly loaded.

Thanks

Yumin

On Wed, May 30, 2018 at 11:38 PM, Tobias Hartmann <
tobias.hartmann at oracle.com> wrote:

> Hi Yumin,
>
> This reminds me of a project we did for a student's bachelor thesis in
> 2015:
> https://github.com/mohlerm/hotspot/blob/master/report/profile_caching_mohlerm.pdf
>
> We also published a paper on that topic:
> https://dl.acm.org/citation.cfm?id=3132210
>
> Thanks for submitting the JEP, very interesting! Here are the things we've
> learned from the "cached profiles" project, maybe you can correct this
> from your experience with JWarmup:
> - Startup: We were seeing great improvements for some benchmarks but also
> large regressions for others. Problems like the overhead of reading the
> profiles, overloading the compile queue and increased compile time due to
> more optimizations affect the startup time.
> - Peak performance: Using profile information from a previous run might
> cause significant performance regressions in early stages of the
> execution. This is because a "late" profile is usually also the one with
> the fewest optimistic assumptions. For example, the latest profile from a
> previous run might have been updated right when the application was about
> to shut down. If this triggered class loading or has other side effects,
> we might not be able to inline some methods or perform other optimistic
> optimizations. Using this profile right from the beginning in a subsequent
> run limits peak performance significantly.
>
> Here are some more detailed questions:
> - How is it implemented? Is it based on the replay compilation framework?
> - How do you handle dynamically generated code (for example, lambda forms)?
> - What information is stored/re-used and which profile is cached?
> - Does it work for C1 and C2?
> - How is the tiered compilation policy affected?
> - When do we compile (if method is first used or are there still
> thresholds)?
> - How do you avoid overloading the compile queue?
> - Is re-profiling/re-compilation supported?
> - What if a method is deoptimized? Is the cached profile updated and
> re-used?
>
> Best regards,
> Tobias
>
> On 29.05.2018 06:09, yumin qi wrote:
> > Hi， Experts
> >
> >   This is a newly filed JEP (JWarmup) for working on resolving java
> > performance issue caused by both application load peaking up and JIT
> > threads compiling java hot methods happen at same time.
> >
> >   https://bugs.openjdk.java.net/browse/JDK-8203832
> >
> >    For a large java application, the load comes in short period of time,
> > like the 'Single Day' sale on Alibaba's e-commerce application, this
> > massive load comes in and makes many java methods ready for JIT
> > compilation to convert them into native methods. The compiler threads
> > will kick in to do the compilation work and take system resource from
> > mutator java threads which are busy on processing requests thus lead to
> > peak time performance degradation.
> >
> >    The JWarmup technique was proposed to avoid such issue by precompiling
> > the hot methods at application startup and it has been successfully
> > applied to Alibaba's e-commerce applications. We would like to contribute
> > it to OpenJDK and wish it can help java developers overcome the same
> > issue.
> >
> >    Please review and give your feedback.
> >
> >   Thanks
> >   Yumin
> >
> >    (Alibaba Group Inc)
> >
>