<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>I think it's worth experimenting to see how much time can be
saved.</p>
<p>Although the current estimate may be small (11 ms out of 750 ms),
it may be a code path that's difficult to optimize down the road.</p>
<p>I think we can significantly improve on the current performance
by shifting more Java computation to build time (e.g., making a
heap snapshot of computed constants, running &lt;clinit&gt; at
build time, etc.). We should understand where the frameworks are
doing such negative class lookups, and see if they can be
shifted to build time (or otherwise avoided).</p>
<p>If the answer is no, then I think it's worthwhile to implement a
*runtime* negative lookup cache. As the overall start-up time goes
down, the relative cost of negative class lookup will increase. For
example, if it becomes 9 ms out of 250 ms (3.6%), then it will be
more significant.</p>
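<p>To make the idea concrete, here is a minimal sketch of what a
runtime negative lookup cache might look like, along the lines of the
per-loader table mapping a name to an exception that John describes
below. This is not code from the premain branch: the names
NegativeCachingLoader and CachedMiss are made up for illustration, and
the sketch assumes the loader's inputs do not change unless
invalidate() is called.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not JDK or Leyden code.
final class NegativeCachingLoader extends ClassLoader {

    // Exception that skips stack-trace capture; its message is the class name.
    static final class CachedMiss extends ClassNotFoundException {
        CachedMiss(String className) {
            super(className);
        }

        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // no backtrace is recorded
        }
    }

    // Names that have already failed to load through this loader.
    private final Map<String, CachedMiss> misses = new ConcurrentHashMap<>();
    // Number of lookups answered from the negative cache.
    private final AtomicLong negativeHits = new AtomicLong();

    NegativeCachingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        CachedMiss cached = misses.get(name);
        if (cached != null) {
            negativeHits.incrementAndGet();
            throw cached; // fail fast, without searching again
        }
        try {
            return super.loadClass(name, resolve);
        } catch (ClassNotFoundException e) {
            // Remember the failure; the first caller still sees the full exception.
            misses.put(name, new CachedMiss(name));
            throw e;
        }
    }

    long negativeHits() {
        return negativeHits.get();
    }

    // Must be called if the class path (or its contents) could have changed.
    void invalidate() {
        misses.clear();
    }
}
```

<p>With this sketch, the first failed lookup of a name takes the usual
slow path, and later lookups of the same name rethrow the cached
exception. Whether that is safe depends entirely on the loader being
"well behaved" in the sense discussed further down the thread.</p>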
<p>Also, are you testing with the "AOT" mode for Spring Petclinic?
It's a special packaging mode where a lot of the symbolic
information is resolved at build time, so perhaps it will make
much less use of negative class lookups.<br>
</p>
<p><a class="moz-txt-link-freetext" href="https://github.com/openjdk/leyden/tree/premain/test/hotspot/jtreg/premain/spring-petclinic">https://github.com/openjdk/leyden/tree/premain/test/hotspot/jtreg/premain/spring-petclinic</a><br>
</p>
<p>Thanks</p>
<p>- Ioi<br>
</p>
<p><br>
</p>
On 1/12/24 12:32 PM, Ashutosh Mehra wrote:<br>
<blockquote type="cite" cite="mid:CAKt0pyRreF2JkWvRbe75OKg_jbk+5YU_4tzAPHcWN3mXkz5-xg@mail.gmail.com">
<div dir="ltr">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-family:sans-serif">Ashutosh, you and your team
have mentioned that there are tens of milliseconds (several
percentage points of time) consumed during startup of some
workloads by </span><em style="font-family:sans-serif">failed</em><span style="font-family:sans-serif"> lookups</span> </blockquote>
<div><br>
</div>
<div>While working on Vladimir's suggestion to
profile JVM_FindClassFromCaller, I realized I had made a
mistake in my earlier attempt to profile the Class.forName
method.</div>
<div>Sadly, once I fixed that bug, the time spent in failed
lookups turned out to be not that significant.</div>
<div><br>
</div>
<div>This is the <a href="https://github.com/ashu-mehra/leyden/commit/0bd59831b387358b60b9f38080ff09081512679a" moz-do-not-send="true">patch</a> I have for profiling
Class.forName at the Java level. It shows the time spent in
Class.forName for negative lookups for the app class loader. </div>
<div>For the Quarkus app that I am using, the patch reports 11 ms,
which is 1.4% of the startup time (750 ms).</div>
<div>For the Springboot-petclinic app, the patch reports 36 ms, which
is 1.1% of the startup time (3250 ms).</div>
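<div>As a rough illustration of this kind of measurement (not the
patch itself, which instruments Class.forName inside the JDK), here is
a sketch of an application-level wrapper that accumulates the time
spent in lookups that fail. The names FailedLookupTimer and
sharePercent are hypothetical.</div>

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: times only the Class.forName calls that fail.
final class FailedLookupTimer {
    private final LongAdder failedNanos = new LongAdder();
    private final LongAdder failedCount = new LongAdder();

    // Delegates to Class.forName and records the elapsed time on failure.
    Class<?> forName(String name, ClassLoader loader) throws ClassNotFoundException {
        long start = System.nanoTime();
        try {
            return Class.forName(name, false, loader);
        } catch (ClassNotFoundException e) {
            failedNanos.add(System.nanoTime() - start);
            failedCount.increment();
            throw e;
        }
    }

    long failedCount() {
        return failedCount.sum();
    }

    double failedMillis() {
        return failedNanos.sum() / 1_000_000.0;
    }

    // Share of total startup time, as a percentage.
    static double sharePercent(double partMillis, double totalMillis) {
        return 100.0 * partMillis / totalMillis;
    }
}
```

<div>For the numbers above, sharePercent(36, 3250) comes to about
1.1, which matches the Springboot-petclinic figure.</div>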
<div><br>
</div>
<div>The other <a href="https://github.com/ashu-mehra/leyden/commit/3923adbb2a3e3291965dd5b85cb7a918db555117" moz-do-not-send="true">patch</a> I have is for profiling
JVM_FindClassFromCaller when it throws an exception for the
app classloader.</div>
<div>
<div>For the Quarkus app, the patch reports 5 ms.</div>
<div>For the Springboot-petclinic app, the patch reports 25 ms.</div>
</div>
<div><br>
</div>
<div>Given these numbers, @JohnR do you think it is still worth
spending time on the negative cache for the class loaders?<br>
</div>
<div>And sorry for reporting incorrect numbers earlier.</div>
<div><br>
</div>
<div>Thanks,</div>
<div>
<div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">- Ashutosh Mehra</div>
</div>
</div>
<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jan 11, 2024 at
7:39 PM Vladimir Ivanov &lt;<a href="mailto:vladimir.x.ivanov@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">vladimir.x.ivanov@oracle.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
> We know that /successful/ lookups go fast the second time
because the VM <br>
> caches the result in a central system dictionary. And,
CDS technology <br>
> makes successful lookups go fast the /first time/, if the
lookup was <br>
> performed in a training run and the resulting state
stored in a CDS <br>
> archive. (Those who watch our premain branch will see
that there is lots <br>
> of low-hanging fruit in CDS, that we are only beginning
to enjoy.)<br>
<br>
Even though repeated successful lookups are already fast, it is
still <br>
beneficial to optimize them. For example, class pre-loading
and CP entry <br>
pre-resolution are implemented in premain and do give
noticeable startup <br>
improvements.<br>
<br>
And repeated successful lookups are common when it comes to <br>
Class.forName(). For example, a PetClinic deployment run
makes 10k <br>
calls into JVM_FindClassFromCaller, which cost ~20 ms (measured
on M1 Pro).<br>
<br>
So, while a negative lookup cache looks like the lowest hanging
fruit, <br>
it's worth considering the positive lookup caching scenario as
well.<br>
<br>
Best regards,<br>
Vladimir Ivanov<br>
<br>
> But, a /failed/ lookup is not recorded anywhere. So every
distinct <br>
> lookup must start again from first principles and fail
all over again. <br>
> For some workloads this costs a small but measurable
percentage of <br>
> startup time.<br>
> <br>
> The story is different for the local |CONSTANT_Class|
entries in any <br>
> given classfile: The JVMS mandates that both successful
and failed <br>
> lookups are recorded on the first attempt (per CP entry
per se, not <br>
> globally and not per class). Global usage includes both
use of <br>
> |Class.forName| and the “back end” logic for CP entry
resolution. CP <br>
> resolution is performed at most once per CP entry, and
(win or lose) is <br>
> made sticky on the CP itself, locally.<br>
> <br>
> To summarize, we can say that, for class lookup, both
success and <br>
> failure are “sticky” locally, and success is “sticky”
globally, but <br>
> failure is “not sticky” globally.<br>
> <br>
> The global behavior can be thought of either specific to
a class loader <br>
> (i.e., coded in JDK code) or as something in the VM or
JNI code that <br>
> works with the JDK code. In reality it is an emergent
property of a <br>
> number of small details in both.<br>
> <br>
> A /negative lookup cache/ is a collection of class names
(for a given <br>
> loader) which have already failed to load. “Sticky
failure” could be <br>
> implemented with a negative lookup cache, either on a
class loader (my <br>
> preferred solution, I think) or else somewhere in the VM
internals that <br>
> participate in class loading paths.<br>
> <br>
> The benefits are obvious: Startup could be shorter by
tens of <br>
> milliseconds. The eliminated operations include
re-creating exceptions, <br>
> and throwing and catching them, and (maybe) uselessly
re-probing the <br>
> file system.<br>
> <br>
> The risks include at least two cases. First, a user might
somehow <br>
> contrive to extend the class path after a failure has
been made sticky, <br>
> and then the user could be disappointed when a class
appears on the new <br>
> class path components that satisfies the load. Second, a
user might <br>
> somehow contrive to mutate an existing class path
component (by writing <br>
> a file into a directory, say), and have the same
disappointment of not <br>
> seeing the classfile get picked up on the next request.<br>
> <br>
> But it seems to me that a negative lookup cache is a
legitimate <br>
> optimization /for well behaved class loaders/. (Please
check my work <br>
> here!) The preconditions are that the well behaved class
loader takes its input <br>
> from sources that cannot be updated after the VM has
started running. Or, <br>
> if and when those inputs are updated somehow, the
negative cache must be <br>
> invalidated, at least for classes that could possibly be
loaded from the <br>
> updated parts. You can sometimes reason from the package
prefix and from <br>
> the class path updates that some name cannot be read from
some class <br>
> path element, just because of a missing directory.<br>
> <br>
> A CDS archive records its class path, and can detect
whether that class <br>
> path reads only from an immutable backing store. (This is
a sweet spot <br>
> for Leyden.) If that is the case, then the CDS archive
could also store <br>
> a negative lookup cache (for each eligible class loader).
I think this <br>
> should be done in Java code and the relevant field and
its data <br>
> special-cased to be retained via CDS.<br>
> <br>
> (I mean “special-cased” the way we already special-case
some other <br>
> selected data, like the module graph and integer box
cache. As with <br>
> framework-defined class loaders, we may have a
conversation in the <br>
> future about letting user code into this little game as
well. But it has <br>
> to be done in a way that does not violate any
specification, which makes <br>
> it challenging. One step at a time.)<br>
> <br>
> For immediate prototyping and testing of the concept, we
don’t need to <br>
> bring CDS into the picture. We can just have a global
flag that says “it <br>
> is safe to use a negative lookup cache”. But to roll out
this <br>
> optimization in a product, the flag needs to be
automatically set to a <br>
> safe value, probably by CDS at startup, based on
inspection of the <br>
> class path settings in both training and deployment runs.
And of course <br>
> (as a separate step) we can pre-populate the caches at
CDS dump time <br>
> (that is, after a training run), so that the deployed
application can <br>
> immediately benefit from the cache, and spend zero time
exploring the <br>
> class path for classes that are known to be missing.<br>
> <br>
> BTW, I think it is just fine to throw a pre-constructed
exception when <br>
> the negative lookup cache hits, even though some users
will complain <br>
> that such exceptions are lacking meaningful messages and
backtraces. <br>
> It’s within spec. HotSpot does this for certain “hot
throws” of built-in <br>
> exceptions; see |GraphKit::builtin_throw|, and see also
the tricky logic <br>
> that makes failures sticky in CP entries (which edits
down the exception <br>
> information). As a compromise, the negative lookup cache
could store an <br>
> exception object whose message is the class name (but
with no backtrace).<br>
> <br>
> There’s another way to approach this issue, which is to
index the <br>
> class path in such a way that class loaders can respond
to arbitrary <br>
> load requests but do little or no work on failing
requests. A Bloom <br>
> filter is sometimes used in such cases to avoid many (not
all) of the <br>
> searches. But I think that’s overkill for the use cases
we actually <br>
> observe, which is a large number of failed lookups on a
small number of <br>
> class names. A per-loader table mapping a name to an
exception seems to <br>
> be a good tradeoff. And as I noted, CDS can pre-populate
these things <br>
> eventually.<br>
> <br>
> Ashutosh, maybe you are interested in working on some of
this? :-)<br>
> <br>
> — John<br>
> <br>
> P.S. If the negative lookup cache has the right
“stability” properties, <br>
> we can even ask the JIT to think about optimizing failing
<br>
> |Class.forName| calls, by consulting the cache at compile
time. In the <br>
> Leyden setting, some |Class.forName| calls (not all) can
be <br>
> constant-folded. Perhaps the argument is semi-constant
and can be <br>
> profiled and speculated. Maybe some of that pays off, or
maybe not; <br>
> probably not since the |forName| call is probably buried
in a stack of <br>
> middleware. These are ideas for the JIT team to put on
their very long list.<br>
> <br>
> P.P.S. Regarding the two side issues mentioned above…<br>
> <br>
> We are not at all forgetting about framework-defined
class loaders. But <br>
> for the next few months it is enough to assume that we
will optimize <br>
> only class loaders which are defined by the VM+JDK
substrate. In the <br>
> future we will want to investigate how to make
framework-defined loaders <br>
> compatible with whatever optimizations we create for the
well behaved <br>
> JDK class loaders. It is not yet time to discuss that in
detail; it is <br>
> time to learn the elements of our craft by working with
the well behaved <br>
> class loaders only.<br>
> <br>
> The same comment applies to the observation that we might
try to <br>
> “auto-train” applications. That is, get rid of the CDS
archive, <br>
> generated by a separate training run, and just
automagically run the <br>
> same application faster the second time, by capturing
CDS-like states <br>
> from the first run, treating it “secretly” as a training
run. We know <br>
> this can work well on some Java workloads. But we also
like the <br>
> predictability and simplicity of CDS. For HotSpot, it is
not yet time to <br>
> work on applying our learnings with CDS to the problem of
auto-training. <br>
> I hope that time will come after we have mined out more
of the basic <br>
> potential of CDS. For now we are working on the “one-step
workflow”, <br>
> where there is an explicit training phase that generates
CDS. The <br>
> “zero-step workflow” will come in time.<br>
> <br>
<br>
</blockquote>
</div>
</blockquote>
</body>
</html>