The Great Startup Problem

Fri Aug 22 20:08:36 UTC 2014

Marcus coaxed me into making a post about our indy issues. Our indy
issues mostly surround startup and warmup time, so I'm making this a
general post about startup and warmup.

When I started working on JRuby 7 years ago, I hoped we'd have a good
answer for poor startup time and long warmup times. Today, the answers
are no better -- and in many cases much worse -- than when I started.

Here's a summary of our experience over the years...

* client versus server

Early on, we made JRuby's launcher use client mode by default. This
was by far the best way to get good startup performance, but it led to
us perpetuating the old question "which mode are you running in" when
people reported poor steady-state performance.
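
For what it's worth, the "which mode are you running in" question can
usually be answered from inside the process. Here is a minimal sketch
that uses only standard system properties and the runtime MXBean. The
class name JvmModeReport is made up for illustration, and
java.vm.info is a HotSpot convention that may be absent on other VMs.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmModeReport {
    // Builds a one-line description of the running JVM from standard
    // system properties and the flags the JVM was launched with.
    static String describe() {
        // On HotSpot the VM name typically mentions "Client" or "Server".
        String vmName = System.getProperty("java.vm.name");
        // HotSpot-specific; may be null elsewhere (concatenation copes with null).
        String vmInfo = System.getProperty("java.vm.info");
        // JVM options such as -XX:... flags, not application arguments.
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return "vm=" + vmName + " info=" + vmInfo + " flags=" + flags;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Asking users to paste this one line into a bug report saves a round
trip when steady-state numbers look off.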

* Tiered compiler

The promise of the tiered compiler was great: client-fast startup with
server-fast steady state. In practice, tiered has failed to meet
expectations for us. The situation is aggravated by the loss of
-client and -server flags.

On the startup side, we have found that the tiered compiler never even
comes close to the startup time of -client. For a nontrivial app
startup, like a Rails app, we see a 50% reduction in startup time by
forcing tier 1 (which is C1, the old -client mode) rather than letting
the tiered compiler work normally.

Obviously limiting ourselves to tier 1 means performance is reduced,
but these days our #1 user complaint is startup time. So, we have AGAIN
taken the step of putting startup-improving flags into our launchers:
jruby --dev forces tier 1 + client mode.
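
For anyone wanting to try the same thing outside JRuby, capping the
tiered compiler at tier 1 on HotSpot comes down to two standard -XX
flags. The sketch below only assembles the command line. The class
name Tier1Launch and the main class org.example.Main are placeholders,
and this is not a dump of what the jruby launcher actually passes.

```java
import java.util.ArrayList;
import java.util.List;

public class Tier1Launch {
    // Builds a java command line that caps the tiered compiler at tier 1 (C1).
    // mainClass and appArgs are placeholders supplied by the caller.
    static List<String> command(String mainClass, String... appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-XX:+TieredCompilation");   // tiered must be on for the cap to apply
        cmd.add("-XX:TieredStopAtLevel=1");  // never go past tier 1 (C1)
        cmd.add(mainClass);
        for (String arg : appArgs) {
            cmd.add(arg);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Prints the command instead of running it; hand the list to
        // ProcessBuilder to actually launch the child JVM.
        System.out.println(String.join(" ", command("org.example.Main", "--version")));
    }
}
```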

On the steady-state side, the tiered compiler is rather unpredictable.
Some cases will be faster (presumably from better profiling in earlier
tiers), while others will be much slower. And it can vary from run to
run...tiered steady-state performance is even harder to predict than
C2 (-server). We have done no investigation here.
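
If someone does want to look into the run-to-run variation, a crude
probe like the following is a reasonable starting point. It times
equal-sized batches of the same work so the drop from interpreted to
compiled speed shows up in the numbers. The workload and the class
name WarmupProbe are invented for illustration; this is not a
substitute for a proper benchmark harness.

```java
public class WarmupProbe {
    // Arbitrary CPU-bound work; stands in for whatever the application does.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (i * 31L) ^ (sum >>> 3);
        }
        return sum;
    }

    // Runs the same work in equal batches and records wall time per batch.
    // Early batches include interpretation and compilation overhead.
    static long[] timeBatches(int batches, int iterationsPerBatch) {
        long[] nanos = new long[batches];
        long sink = 0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int i = 0; i < iterationsPerBatch; i++) {
                sink += work(1_000);
            }
            nanos[b] = System.nanoTime() - start;
        }
        // Use the result so the JIT cannot discard the work as dead code.
        if (sink == 42) {
            System.out.println(sink);
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(20, 10_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.println("batch " + b + ": " + nanos[b] / 1_000 + " us");
        }
    }
}
```

Running it several times under default tiered, tier 1 only, and
-XX:-TieredCompilation gives a rough picture of both warmup length and
how much the steady state moves between runs.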

* Invokedynamic

We love indy. We love it more than just about anyone. But we have
again had to turn indy support OFF by default in JRuby 1.7.14 and may
have to do the same for JRuby 9000.

Originally, we had indy off because of the NoClassDefFoundError
(NCDFE) bugs in the old implementation. LambdaForms have fixed all
that, and with JIT improvements in the past year they generally
(eventually) reach the same steady-state performance as the old
implementation.

Unfortunately, LambdaForms have an enormous startup-time cost. I
believe there are two reasons for this:

1. Method handle chains can now result in dozens of lambda forms,
making the initial bootstrapping cost much higher. Multiply this by
thousands of call sites, all getting hit for the first time. Multiply
that by polymorphic inline cache (PIC) depth. And then remember that
many boot-time operations
will blow out those caches, so you'll start over repeatedly. Some of
this can be mitigated in JRuby, but much of it cannot.

2. Lambda forms are too slow to execute and take too long to optimize
down to native code. Lambda forms work sorta like the tiered compiler.
They'll be interpreted for a while, then they'll become JVM bytecode
for a while, which interprets for a while, then the tiered compiler's
first phase will pick it up.... There's no way to "commit" a lambda
form you know you're going to be hitting hard, so it takes FOREVER to
get from a newly-bootstrapped call site to the 5 assembly instructions
that *actually* need to run.
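
To make the shape of the problem concrete for anyone who hasn't built
one of these, here is a minimal sketch of an inline-caching call site
on top of java.lang.invoke. It is NOT JRuby's implementation: the
class InlineCacheSketch, the CacheSite type and the missesFor helper
are all invented for illustration, there is no depth limit, and
dispatch is just a no-arg virtual method looked up by name. Every miss
goes through the fallback, does a fresh lookup, and stacks another
guardWithTest on the chain, which is exactly the work that gets
repeated per site, per type, at boot.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class InlineCacheSketch {
    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    static final MethodHandle CLASS_IS;
    static final MethodHandle FALLBACK;

    // A call site of type (Object)Object that remembers the method name
    // it dispatches and counts how often it had to relink.
    static final class CacheSite extends MutableCallSite {
        final String name;
        int misses;

        CacheSite(String name) {
            super(MethodType.methodType(Object.class, Object.class));
            this.name = name;
        }
    }

    static {
        try {
            CLASS_IS = LOOKUP.findStatic(InlineCacheSketch.class, "classIs",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            FALLBACK = LOOKUP.findStatic(InlineCacheSketch.class, "fallback",
                    MethodType.methodType(Object.class, CacheSite.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Guard: does the receiver have exactly the cached class?
    static boolean classIs(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // Slow path: look up the method for this receiver's class, push a new
    // guard in front of the existing chain, then perform the call.
    static Object fallback(CacheSite site, Object receiver) throws Throwable {
        site.misses++;
        Class<?> cls = receiver.getClass();
        MethodHandle target = LOOKUP
                .findVirtual(cls, site.name, MethodType.methodType(String.class))
                .asType(site.type());
        MethodHandle test = CLASS_IS.bindTo(cls);
        // The old target (earlier guards, ending in this fallback) becomes the else-branch.
        site.setTarget(MethodHandles.guardWithTest(test, target, site.getTarget()));
        return target.invoke(receiver);
    }

    static CacheSite newSite(String name) {
        CacheSite site = new CacheSite(name);
        site.setTarget(FALLBACK.bindTo(site));
        return site;
    }

    // Calls toString through one fresh site for each receiver and reports
    // how many of those calls missed the cache.
    static int missesFor(Object... receivers) {
        CacheSite site = newSite("toString");
        MethodHandle invoker = site.dynamicInvoker();
        try {
            for (Object receiver : receivers) {
                Object ignored = invoker.invoke(receiver);
            }
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return site.misses;
    }

    public static void main(String[] args) {
        System.out.println("misses: " + missesFor("a", "b", 1, "c", 2));
    }
}
```

Each receiver type seen at the site costs one miss plus a new handle
chain. A real runtime would cap the chain depth and invalidate on
class changes, and each of those events sends the site back through
the slow path.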

I do want to emphasize that for us, LambdaForms usually do get to the
same peak performance we saw with the old implementation. It's just
taking way, way too long to get there.

Because of these issues, JRuby's new --dev flag turns invokedynamic
off, and JRuby 1.7.14 will once again turn indy off by default on all
JVM versions.

* Other ways of mitigating startup time

We have recommended Nailgun in the past. Nailgun keeps a JVM running
in the background, and you toss it commands to run. It works well as
long as the commands are actually self-contained, self-cleaning units
of work; spin up one thread or leave resources open, and the Nailgun
server eventually becomes unusable.
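
The "spin up one thread" failure mode is easy to demonstrate with
plain JDK calls. The sketch below has nothing to do with Nailgun's own
code; LeakCheck and demoCommand are hypothetical names. It snapshots
the live threads before and after running a command in-process and
reports whatever the command left behind, which is the residue that
accumulates in any long-lived JVM reused across commands.

```java
import java.util.HashSet;
import java.util.Set;

public class LeakCheck {
    // Runs a command in-process and returns the threads that are alive
    // afterwards but were not alive before it ran.
    static Set<Thread> leakedThreads(Runnable command) {
        Set<Thread> before = new HashSet<>(Thread.getAllStackTraces().keySet());
        command.run();
        Set<Thread> after = new HashSet<>(Thread.getAllStackTraces().keySet());
        after.removeAll(before);
        return after;
    }

    // A badly behaved command: starts a thread and returns without stopping it.
    static Runnable demoCommand() {
        return () -> {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException ignored) {
                    // exit quietly when interrupted
                }
            }, "left-behind");
            t.setDaemon(true); // daemon so this demo JVM can still exit
            t.start();
        };
    }

    public static void main(String[] args) {
        for (Thread t : leakedThreads(demoCommand())) {
            System.out.println("leaked: " + t.getName());
        }
    }
}
```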

We now recommend Drip as a similar solution. For each command you run,
Drip attempts to start additional larval JVMs in the background in
preparation for future commands. You can configure those instances to
pre-boot libraries or application resources, to reduce the work done
at startup for the next command (e.g. preboot your Rails application,
and then the next command just has to utilize it). Drip is cleaner
than Nailgun, but never quite achieves the same startup time without a
lot of configuration. It is also a bit of a hack...you can easily
preboot something in the "next JVM" that is out of date by the time
you use it.

CONCLUSION...

We obviously still love working with OpenJDK, and it remains the best
platform for building JRuby (and other languages). However, our
failure as a community to address these startup/warmup issues is
eventually going to kill us. Startup time remains the #1 complaint
about JRuby, and warmup time may be a close second.

What are the rest of you doing to deal with these issues?

- Charlie