premain: negative lookup cache for class loaders

John Rose john.r.rose at oracle.com
Thu Jan 11 22:15:06 UTC 2024


(I’m putting into email something we discussed on a Zoom call.)

On the premain project, we are learning to perform special optimizations 
that depend on “well behaved” class loaders, because they are simply 
front-ends to declarative information like the class path.  These 
optimizations shift work into a training run, storing resulting states 
into a CDS archive, and then adopting those states into a deployed 
application, quickly, as it starts up.

(“But what about user-defined loaders?  But why do we have to use 
CDS?” — See below for comments on these two side issues.)

Ashutosh, you and your team have mentioned that there are tens of 
milliseconds (several percentage points of time) consumed during startup 
of some workloads by *failed* lookups. A logging framework may be 
querying for code resources and falling back somehow if they fail to 
load.  The code probably has a try/catch that processes 
`ClassNotFoundException` or the like.
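To make that pattern concrete, here is a rough sketch of such a probe; the class name and the fallback are invented for illustration, not taken from any real framework:

```java
import java.util.function.Supplier;

// Hypothetical framework probe; the backend class name and the fallback
// are invented to illustrate the failing-lookup pattern described above.
final class BackendProbe {
    static Supplier<String> chooseBackend() {
        try {
            // Every failing call pays for a full class path search,
            // because the failure is not remembered anywhere.
            Class<?> c = Class.forName("com.example.optional.FancyBackend");
            Object backend = c.getDeclaredConstructor().newInstance();
            return backend::toString;
        } catch (ReflectiveOperationException e) {
            // ClassNotFoundException lands here, on every single call.
            return () -> "default backend";
        }
    }
}
```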

We know that *successful* lookups go fast the second time because the VM 
caches the result in a central system dictionary.  And, CDS technology 
makes successful lookups go fast the *first time*, if the lookup was 
performed in a training run and the resulting state stored in a CDS 
archive.  (Those who watch our premain branch will see that there is 
lots of low-hanging fruit in CDS, which we are only beginning to enjoy.)

But, a *failed* lookup is not recorded anywhere.  So every distinct 
lookup must start again from first principles and fail all over again.  
For some workloads this costs a small but measurable percentage of 
startup time.

The story is different for the local `CONSTANT_Class` entries in any 
given classfile:  The JVMS mandates that both successful and failed 
lookups are recorded on the first attempt (per CP entry per se, not 
globally and not per class).  The global lookup path is exercised both by 
`Class.forName` and by the “back end” logic behind CP entry resolution.  
CP resolution is performed at most once per CP entry, and (win or lose) 
is made sticky on the CP itself, locally.

To summarize, we can say that, for class lookup, both success and 
failure are “sticky” locally, and success is “sticky” globally, 
but failure is “not sticky” globally.
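Here is a small, hedged illustration of the globally non-sticky half of that summary; the class name is invented, and the timing is there only to make the repeated cost visible:

```java
// Each Class.forName failure repeats the full class path search and
// constructs a fresh exception; nothing global remembers the failure.
// (By contrast, a CONSTANT_Class entry records its first failure in the
// constant pool and rethrows it on later resolution attempts.)
final class RepeatedFailureDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            long start = System.nanoTime();
            try {
                Class.forName("com.example.DoesNotExist");  // invented name
            } catch (ClassNotFoundException e) {
                // Every iteration builds a new exception and backtrace.
            }
            System.out.printf("attempt %d: %d us%n",
                    i, (System.nanoTime() - start) / 1_000);
        }
    }
}
```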

The global behavior can be thought of either as specific to a class loader 
(i.e., coded in JDK code) or as something in the VM or JNI code that 
works with the JDK code.  In reality it is an emergent property of a 
number of small details in both.

A *negative lookup cache* is a collection of class names (for a given 
loader) which have already failed to load.  “Sticky failure” could 
be implemented with a negative lookup cache, either on a class loader 
(my preferred solution, I think) or else somewhere in the VM internals 
that participate in class loading paths.
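As a minimal sketch of the class-loader flavor of this idea (the class and field names are mine, not a proposed JDK API), a well behaved loader could remember names that have already failed:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a per-loader negative lookup cache; illustrative only.
class WellBehavedLoader extends ClassLoader {
    // Names that have already failed to load through this loader.
    private final Map<String, ClassNotFoundException> negativeCache =
            new ConcurrentHashMap<>();

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        ClassNotFoundException cached = negativeCache.get(name);
        if (cached != null) {
            throw cached;  // sticky failure: skip the search entirely
        }
        try {
            return super.loadClass(name, resolve);
        } catch (ClassNotFoundException e) {
            // Legitimate only if this loader's inputs cannot change
            // after the VM has started (see the preconditions below).
            negativeCache.put(name, e);
            throw e;
        }
    }
}
```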

The benefits are obvious: Startup could be shorter by tens of 
milliseconds.  The eliminated operations include re-creating exceptions, 
and throwing and catching them, and (maybe) uselessly re-probing the 
file system.

The risks include at least two cases.  First, a user might somehow 
contrive to extend the class path after a failure has been made sticky, 
and then be disappointed when a class that would satisfy the load appears 
on the new class path components but is never picked up, because the 
recorded failure stays sticky.  Second, a user might 
somehow contrive to mutate an existing class path component (by writing 
a file into a directory, say), and have the same disappointment of not 
seeing the classfile get picked up on the next request.
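As one concrete (and hedged) example of the first risk, a java agent can legitimately grow the system loader’s search path at run time, which is exactly the kind of event that would have to invalidate any negative cache; the jar name here is made up:

```java
import java.lang.instrument.Instrumentation;
import java.util.jar.JarFile;

// After appendToSystemClassLoaderSearch, classes in extra.jar become
// loadable by the system class loader; any sticky "not found" answers
// recorded earlier for those names would now be stale.
public final class AppendingAgent {
    public static void agentmain(String args, Instrumentation inst)
            throws Exception {
        inst.appendToSystemClassLoaderSearch(new JarFile("extra.jar"));
    }
}
```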

But it seems to me that a negative lookup cache is a legitimate 
optimization *for well behaved class loaders*.  (Please check my work 
here!)  The precondition is that the well behaved class loader reads only 
from inputs that cannot be updated after the VM has started running.  
Or, if and when those inputs are updated somehow, the negative 
cache must be invalidated, at least for classes that could possibly be 
loaded from the updated parts.  You can sometimes reason from the 
package prefix and from the class path updates that some name cannot be 
read from some class path element, just because of a missing directory.
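Here is a hedged sketch of that package-prefix reasoning, for the special case where the update appends a directory to the class path; the helper is hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: a newly appended class path *directory* cannot
// satisfy a name whose package directory it does not contain, so such
// negative-cache entries may be kept; only the rest must be invalidated.
final class NegativeCacheInvalidation {
    static boolean mustInvalidate(Path newClassPathRoot, String missingName) {
        int lastDot = missingName.lastIndexOf('.');
        String pkgAsPath = (lastDot < 0)
                ? ""
                : missingName.substring(0, lastDot).replace('.', '/');
        // The class could be loaded from the new root only if its
        // package directory exists there.
        return Files.isDirectory(newClassPathRoot.resolve(pkgAsPath));
    }
}
```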

A CDS archive records its class path, and can detect whether that class 
path reads only from an immutable backing store.  (This is a sweet spot 
for Leyden.)  If that is the case, then the CDS archive could also store 
a negative lookup cache (for each eligible class loader).  I think this 
should be done in Java code and the relevant field and its data 
special-cased to be retained via CDS.

(I mean “special-cased” the way we already special-case some other 
selected data, like the module graph and integer box cache.  As with 
framework-defined class loaders, we may have a conversation in the 
future about letting user code into this little game as well.  But it 
has to be done in a way that does not violate any specification, which 
makes it challenging.  One step at a time.)

For immediate prototyping and testing of the concept, we don’t need to 
bring CDS into the picture.  We can just have a global flag that says 
“it is safe to use a negative lookup cache”.  But to roll out this 
optimization in a product, the flag needs to be automatically set to a 
safe value, probably by CDS at startup, based on an inspection of the 
class path settings in both training and deployment runs.  And of course 
(as a separate step) we can pre-populate the caches at CDS dump time 
(that is, after a training run), so that the deployed application can 
immediately benefit from the cache, and spend zero time exploring the 
class path for classes that are known to be missing.
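For the prototyping stage, the gate could be as simple as a flag read once at startup; the property name below is invented, and in a product the value would instead be derived by CDS from the training-run and deployment-run class path settings:

```java
// Prototype-level switch; the property name is invented for illustration.
final class NegativeCacheConfig {
    static final boolean ENABLED =
            Boolean.getBoolean("jdk.test.negativeLookupCache");

    static boolean useNegativeCache() {
        return ENABLED;
    }
}
```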

BTW, I think it is just fine to throw a pre-constructed exception when 
the negative lookup cache hits, even though some users will complain 
that such exceptions are lacking meaningful messages and backtraces.  
It’s within spec.  HotSpot does this for certain “hot throws” of 
built-in exceptions; see `GraphKit::builtin_throw`, and see also the 
tricky logic that makes failures sticky in CP entries (which edits down 
the exception information).  As a compromise, the negative lookup cache 
could store an exception object whose message is the class name (but 
with no backtrace).
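That compromise might look like the following sketch: a cached exception whose message is the class name but which never fills in a backtrace (the subclass is illustrative, not a proposed API):

```java
// Illustrative: a ClassNotFoundException that carries the class name as
// its message but suppresses stack trace capture, so it is cheap to
// construct once and rethrow from the negative lookup cache.
final class CachedClassNotFoundException extends ClassNotFoundException {
    CachedClassNotFoundException(String className) {
        super(className);
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;  // no backtrace: within spec, and much cheaper
    }
}
```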

There’s another way to approach this issue, which is to index the 
class path in such a way that class loaders can respond to arbitrary 
load requests but do little or no work on failing requests.  A Bloom 
filter is sometimes used in such cases to avoid many (not all) of the 
searches.  But I think that’s overkill for the use cases we actually 
observe, which is a large number of failed lookups on a small number of 
class names.  A per-loader table mapping a name to an exception seems to 
be a good tradeoff.  And as I noted, CDS can pre-populate these things 
eventually.

Ashutosh, maybe you are interested in working on some of this? :-)

— John

P.S. If the negative lookup cache has the right “stability” 
properties, we can even ask the JIT to think about optimizing failing 
`Class.forName` calls, by consulting the cache at compile time.  In the 
Leyden setting, some `Class.forName` calls (not all) can be 
constant-folded.  Perhaps the argument is semi-constant and can be 
profiled and speculated.  Maybe some of that pays off, or maybe not; 
probably not since the `forName` call is probably buried in a stack of 
middleware.  These are ideas for the JIT team to put on their very long 
list.

P.P.S.  Regarding the two side issues mentioned above…

We are not at all forgetting about framework-defined class loaders.  But 
for the next few months it is enough to assume that we will optimize 
only class loaders which are defined by the VM+JDK substrate.  In the 
future we will want to investigate how to make framework-defined loaders 
compatible with whatever optimizations we create for the well behaved 
JDK class loaders.  It is not yet time to discuss that in detail; it is 
time to learn the elements of our craft by working with the well behaved 
class loaders only.

The same comment applies to the observation that we might try to 
“auto-train” applications.  That is, get rid of the CDS archive, 
generated by a separate training run, and just automagically run the 
same application faster the second time, by capturing CDS-like states 
from the first run, treating it “secretly” as a training run.  We 
know this can work well on some Java workloads.  But we also like the 
predictability and simplicity of CDS.  For HotSpot, it is not yet time 
to work on applying our learnings with CDS to the problem of 
auto-training.  I hope that time will come after we have mined out more 
of the basic potential of CDS.  For now we are working on the 
“one-step workflow”, where there is an explicit training phase that 
generates CDS.  The “zero-step workflow” will come in time.