premain: negative lookup cache for class loaders
Ashutosh Mehra
asmehra at redhat.com
Fri Jan 12 20:32:28 UTC 2024
>
> Ashutosh, you and your team have mentioned that there are tens of
> milliseconds (several percentage points of time) consumed during startup of
> some workloads by *failed* lookups
While working on Vladimir's suggestion to profile JVM_FindClassFromCaller,
I realized I had made a mistake in my earlier attempt to profile the
Class.forName method.
Sadly, once I fixed that bug, the time spent in failed lookups is no longer
that significant.
This is the patch
<https://github.com/ashu-mehra/leyden/commit/0bd59831b387358b60b9f38080ff09081512679a>
I have for profiling Class.forName at the Java level. It shows the time spent
in Class.forName on negative lookups by the app class loader.
For the Quarkus app that I am using, the patch reports 11 ms, which is 1.4% of
the startup time (of 750 ms).
For the Springboot-petclinic app, the patch reports 36 ms, which is 1.1% of the
startup time (of 3250 ms).
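(To give a rough idea of the measurement, without claiming to match the
patch itself: a minimal Java-level sketch, with made-up names, of timing
Class.forName calls that end in failure:)

    import java.util.concurrent.atomic.AtomicLong;

    // Illustration only (not the linked patch): accumulate the time that
    // Class.forName spends on lookups ending in ClassNotFoundException,
    // and report the total at shutdown.
    final class NegativeLookupTimer {
        private static final AtomicLong FAILED_NANOS = new AtomicLong();

        static {
            Runtime.getRuntime().addShutdownHook(new Thread(() ->
                    System.err.println("time in failed lookups: "
                            + FAILED_NANOS.get() / 1_000_000 + " ms")));
        }

        static Class<?> forName(String name, ClassLoader loader)
                throws ClassNotFoundException {
            long start = System.nanoTime();
            try {
                return Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                FAILED_NANOS.addAndGet(System.nanoTime() - start);
                throw e;
            }
        }
    }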
The other patch
<https://github.com/ashu-mehra/leyden/commit/3923adbb2a3e3291965dd5b85cb7a918db555117>
I have is for profiling JVM_FindClassFromCaller when it throws an exception
for the app class loader.
For the Quarkus app, the patch reports 5 ms.
For the Springboot-petclinic app, it reports 25 ms.
Given these numbers, @JohnR do you think it is still worth spending time on
the negative cache for the class loaders?
And sorry for reporting incorrect numbers earlier.
Thanks,
- Ashutosh Mehra
On Thu, Jan 11, 2024 at 7:39 PM Vladimir Ivanov <
vladimir.x.ivanov at oracle.com> wrote:
>
> > We know that /successful/ lookups go fast the second time because the VM
> > caches the result in a central system dictionary. And, CDS technology
> > makes successful lookups go fast the /first time/, if the lookup was
> > performed in a training run and the resulting state stored in a CDS
> > archive. (Those who watch our premain branch will see that there is lots
> > of low-hanging fruit in CDS, that we are only beginning to enjoy.)
>
> Even though repeated successful lookups are already fast, it is still
> beneficial to optimize them. For example, class pre-loading and CP entry
> pre-resolution are implemented in premain and do give noticeable startup
> improvements.
>
> And repeated successful lookups are common when it comes to
> Class.forName(). For example, a PetClinic deployment run makes 10k
> calls into JVM_FindClassFromCaller, which cost ~20 ms (measured on an M1 Pro).
>
> So, while a negative lookup cache looks like the lowest-hanging fruit,
> it's worth considering the positive lookup caching scenario as well.
>
> Best regards,
> Vladimir Ivanov
>
> > But, a /failed/ lookup is not recorded anywhere. So every distinct
> > lookup must start again from first principles and fail all over again.
> > For some workloads this costs a small but measurable percentage of
> > startup time.
> >
> > The story is different for the local |CONSTANT_Class| entries in any
> > given classfile: The JVMS mandates that both successful and failed
> > lookups are recorded on the first attempt (per CP entry per se, not
> > globally and not per class). Global usage includes both use of
> > |Class.forName| and the “back end” logic for CP entry resolution. CP
> > resolution is performed at most once per CP entry, and (win or lose) is
> > made sticky on the CP itself, locally.
> >
> > To summarize, we can say that, for class lookup, both success and
> > failure are “sticky” locally, and success is “sticky” globally, but
> > failure is “not sticky” globally.
> >
> > The global behavior can be thought of either as specific to a class loader
> > (i.e., coded in JDK code) or as something in the VM or JNI code that
> > works with the JDK code. In reality it is an emergent property of a
> > number of small details in both.
> >
> > A /negative lookup cache/ is a collection of class names (for a given
> > loader) which have already failed to load. “Sticky failure” could be
> > implemented with a negative lookup cache, either on a class loader (my
> > preferred solution, I think) or else somewhere in the VM internals that
> > participate in class loading paths.
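> >
> > (To make that concrete, a minimal sketch of such a cache on a loader;
> > the names and structure here are illustrative only, not proposed JDK
> > code:)
> >
> >     import java.util.Set;
> >     import java.util.concurrent.ConcurrentHashMap;
> >
> >     // Sketch only: a per-loader negative cache consulted before any real
> >     // lookup work, making failure "sticky" for this loader.
> >     class CachingLoader extends ClassLoader {
> >         private final Set<String> negativeCache = ConcurrentHashMap.newKeySet();
> >
> >         @Override
> >         protected Class<?> findClass(String name) throws ClassNotFoundException {
> >             if (negativeCache.contains(name)) {
> >                 throw new ClassNotFoundException(name);  // sticky failure
> >             }
> >             try {
> >                 return reallyFindClass(name);            // probe class path, define class
> >             } catch (ClassNotFoundException e) {
> >                 negativeCache.add(name);                 // remember the failure
> >                 throw e;
> >             }
> >         }
> >
> >         // Must be called if the loader's inputs ever change (see below).
> >         void invalidateNegativeCache() { negativeCache.clear(); }
> >
> >         private Class<?> reallyFindClass(String name) throws ClassNotFoundException {
> >             throw new ClassNotFoundException(name);      // placeholder for the real lookup
> >         }
> >     }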
> >
> > The benefits are obvious: Startup could be shorter by tens of
> > milliseconds. The eliminated operations include re-creating exceptions,
> > and throwing and catching them, and (maybe) uselessly re-probing the
> > file system.
> >
> > The risks include at least two cases. First, a user might somehow
> > contrive to extend the class path after a failure has been made sticky,
> > and then the user could be disappointed when a class appears on the new
> > class path components that satisfies the load. Second, a user might
> > somehow contrive to mutate an existing class path component (by writing
> > a file into a directory, say), and have the same disappointment of not
> > seeing the classfile get picked up on the next request.
> >
> > But it seems to me that a negative lookup cache is a legitimate
> > optimization /for well behaved class loaders/. (Please check my work
> > here!) The preconditions are that the well behaved class loader reads only
> > from inputs that cannot be updated after the VM has started running. Or,
> > if and when those inputs are updated somehow, the negative cache must be
> > invalidated, at least for classes that could possibly be loaded from the
> > updated parts. You can sometimes reason from the package prefix and from
> > the class path updates that some name cannot be read from some class
> > path element, just because of a missing directory.
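> >
> > (As a rough illustration of that kind of reasoning, purely hypothetical:)
> >
> >     import java.nio.file.Files;
> >     import java.nio.file.Path;
> >
> >     final class ClassPathReasoning {
> >         // A cached failure for "com.foo.Bar" can survive the addition of
> >         // a new class path directory if that directory has no com/foo
> >         // subdirectory at all.
> >         static boolean newEntryCouldSatisfy(Path newClassPathDir, String className) {
> >             int lastDot = className.lastIndexOf('.');
> >             String pkg = (lastDot < 0) ? ""
> >                     : className.substring(0, lastDot).replace('.', '/');
> >             return Files.isDirectory(newClassPathDir.resolve(pkg));
> >         }
> >     }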
> >
> > A CDS archive records its class path, and can detect whether that class
> > path reads only from an immutable backing store. (This is a sweet spot
> > for Leyden.) If that is the case, then the CDS archive could also store
> > a negative lookup cache (for each eligible class loader). I think this
> > should be done in Java code and the relevant field and its data
> > special-cased to be retained via CDS.
> >
> > (I mean “special-cased” the way we already special-case some other
> > selected data, like the module graph and integer box cache. As with
> > framework-defined class loaders, we may have a conversation in the
> > future about letting user code into this little game as well. But it has
> > to be done in a way that does not violate any specification, which makes
> > it challenging. One step at a time.)
> >
> > For immediate prototyping and testing of the concept, we don’t need to
> > bring CDS into the picture. We can just have a global flag that says “it
> > is safe to use a negative lookup cache”. But to roll out this
> > optimization in a product, the flag needs to be automatically set to a
> > safe value, probably by CDS at startup, based on an inspection of the
> > class path settings in both training and deployment runs. And of course
> > (as a separate step) we can pre-populate the caches at CDS dump time
> > (that is, after a training run), so that the deployed application can
> > immediately benefit from the cache, and spend zero time exploring the
> > class path for classes that are known to be missing.
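> >
> > (For prototyping, that flag could be as crude as a system property read
> > once at startup; the property name below is made up and is not an
> > existing JVM or JDK flag:)
> >
> >     // Prototype-only sketch with a hypothetical property name.
> >     final class NegativeCachePolicy {
> >         static final boolean SAFE =
> >                 Boolean.getBoolean("premain.negativeLookupCache");
> >     }
> >
> > The loader would consult its negative cache only when
> > NegativeCachePolicy.SAFE is true; a real rollout would have CDS set the
> > equivalent switch instead.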
> >
> > BTW, I think it is just fine to throw a pre-constructed exception when
> > the negative lookup cache hits, even though some users will complain
> > that such exceptions are lacking meaningful messages and backtraces.
> > It’s within spec. HotSpot does this for certain “hot throws” of built-in
> > exceptions; see |GraphKit::builtin_throw|, and see also the tricky logic
> > that makes failures sticky in CP entries (which edits down the exception
> > information). As a compromise, the negative lookup cache could store an
> > exception object whose message is the class name (but with no backtrace).
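> >
> > (Roughly like this, as a sketch; the class name is invented:)
> >
> >     // Sketch only: a ClassNotFoundException whose message is the class
> >     // name and which never captures a backtrace, so one instance per name
> >     // can be pre-constructed and stored in the negative cache.
> >     final class CachedClassNotFoundException extends ClassNotFoundException {
> >         CachedClassNotFoundException(String className) {
> >             super(className);
> >         }
> >
> >         @Override
> >         public synchronized Throwable fillInStackTrace() {
> >             return this;   // skip backtrace capture, by design
> >         }
> >     }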
> >
> > There’s another way to approach this issue, which is to index the
> > class path in such a way that class loaders can respond to arbitrary
> > load requests but do little or no work on failing requests. A Bloom
> > filter is sometimes used in such cases to avoid many (not all) of the
> > searches. But I think that’s overkill for the use cases we actually
> > observe, which is a large number of failed lookups on a small number of
> > class names. A per-loader table mapping a name to an exception seems to
> > be a good tradeoff. And as I noted, CDS can pre-populate these things
> > eventually.
> >
> > Ashutosh, maybe you are interested in working on some of this? :-)
> >
> > — John
> >
> > P.S. If the negative lookup cache has the right “stability” properties,
> > we can even ask the JIT to think about optimizing failing
> > |Class.forName| calls, by consulting the cache at compile time. In the
> > Leyden setting, some |Class.forName| calls (not all) can be
> > constant-folded. Perhaps the argument is semi-constant and can be
> > profiled and speculated. Maybe some of that pays off, or maybe not;
> > probably not since the |forName| call is probably buried in a stack of
> > middleware. These are ideas for the JIT team to put on their very long
> > list.
> >
> > P.P.S. Regarding the two side issues mentioned above…
> >
> > We are not at all forgetting about framework-defined class loaders. But
> > for the next few months it is enough to assume that we will optimize
> > only class loaders which are defined by the VM+JDK substrate. In the
> > future we will want to investigate how to make framework-defined loaders
> > compatible with whatever optimizations we create for the well behaved
> > JDK class loaders. It is not yet time to discuss that in detail; it is
> > time to learn the elements of our craft by working with the well behaved
> > class loaders only.
> >
> > The same comment applies to the observation that we might try to
> > “auto-train” applications. That is, get rid of the CDS archive,
> > generated by a separate training run, and just automagically run the
> > same application faster the second time, by capturing CDS-like states
> > from the first run, treating it “secretly” as a training run. We know
> > this can work well on some Java workloads. But we also like the
> > predictability and simplicity of CDS. For HotSpot, it is not yet time to
> > work on applying our learnings with CDS to the problem of auto-training.
> > I hope that time will come after we have mined out more of the basic
> > potential of CDS. For now we are working on the “one-step workflow”,
> > where there is an explicit training phase that generates CDS. The
> > “zero-step workflow” will come in time.
> >
>
>