<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div dir="ltr">
<div dir="ltr" style="font-family: Aptos, Arial, Helvetica, sans-serif; font-size: 12pt;">
Thanks David</div>
<div><br>
</div>
<div dir="ltr"><font size="3">I should have been clearer in my response: “stopping” pertains to the recording of the training data, not necessarily to the application itself. And “immediate” would be better phrased as “asap”. Both terms are used in contrast to</font><span style="font-size: medium;"> the
other option: “stop recording training data when some event happens in the future”.</span></div>
<div dir="ltr"><br>
</div>
<div id="ms-outlook-mobile-signature" dir="ltr"></div>
<div id="mail-editor-reference-message-container">
<div class="ms-outlook-mobile-reference-message skipProofing"><span style="mso-bookmark:_MailOriginalBody">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
<div style="font-family: Aptos; font-size: 12pt; text-align: left; border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) currentcolor currentcolor; padding: 3pt 0in 0in; color: black;">
<span style="font-weight:bold">From: </span>leyden-dev <leyden-dev-retn@openjdk.org> on behalf of David Holmes <david.holmes@oracle.com><br>
<span style="font-weight:bold">Date: </span>Thursday, August 22, 2024 at 5:18 PM<br>
<span style="font-weight:bold">To: </span>leyden-dev@openjdk.org <leyden-dev@openjdk.org><br>
<span style="font-weight:bold">Subject: </span>Re: EA feedback<br>
<br>
</div>
<font size="2"><span style="font-size:11pt;">
<div class="PlainText">I'm normally just a lurker here but wanted to clarify one thing ...<br>
<br>
On 23/08/2024 2:03 am, Mat Carter wrote:<br>
> Having considered the JBS issue [1], we'd like to propose/discuss some <br>
> additional options, the thought being that while implementation can be <br>
> staggered it might be helpful to consider them now to help <br>
> prioritize/rethink/reject.<br>
> <br>
> In summary we currently have the "stop" action, stopping is immediate <br>
> (now) and operationalized via the following mechanisms:<br>
> • Ctrl-Z/D<br>
> • System.exit()<br>
> • Run to completion (normal termination)<br>
> • Unhandled exception :)<br>
<br>
Note that in actuality none of those mechanisms are immediate; there are <br>
lots of things continuing to happen whilst we are in the process of <br>
"stopping", some of which themselves must run to completion (shutdown <br>
hooks) whilst others are truly abruptly terminated once the process is <br>
blown away.<br>
<br>
I think Leyden is just looking for a way to signal the end of training <br>
data collection, not necessarily the end of the program it is being <br>
collected from.<br>
<br>
Cheers,<br>
David<br>
-----<br>
<br>
> And this new ask in JBS [1] is to<br>
> • Add a mechanism using 'jcmd' and/or a new Leyden API to action <br>
> the existing "stop" (now) functionality<br>
> • Add a variant of the "stop" action where training stops when <br>
> execution enters a java method (optionally after N invocations); specify <br>
> method to stop on via Command Line (-XX), JCmd and/or Leyden API<br>
> <br>
> We’d like to propose two additional variants to aid with stopping <br>
> training runs:<br>
> • Stop training after some time has elapsed<br>
> • Stop training when some threshold is met<br>
> <br>
> The second point comes from considering the definition of 'startup <br>
> complete' and 'warmup complete'. Considering the developer is already <br>
> engaged in AOT training, we should allow the developer to aid the <br>
> training by:<br>
> • Indicating when the application has completed startup (ready to <br>
> work)<br>
> • Indicate when the application has completed warmup, either by:<br>
> ○ Calling an API to indicate warmup is done<br>
> ○ Calling an API when a chunk of work has completed, passing <br>
> in the 'duration'; this 'duration' is compared to some specified <br>
> 'threshold' to generate the 'warmup is complete' event<br>
> <br>
> Adding a threshold means that training runs can be of optimal length and <br>
> can handle changes in the environment and/or code. Coupled with maybe <br>
> an "abort training" action we now have a training run that targets a <br>
> threshold and aborts if not reached in time N.<br>
> <br>
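A minimal sketch of the threshold idea above, assuming a hypothetical WarmupTracker helper (no such class exists in the JDK or in Leyden today). The application reports the duration of each chunk of work; the first chunk at or under the threshold marks warmup as complete, and the run is marked aborted if that has not happened by the deadline (the "time N" above):<br>
<pre>
```java
import java.time.Duration;

/** Hypothetical helper for illustration only; not a JDK or Leyden API. */
class WarmupTracker {
    public enum State { TRAINING, WARMUP_COMPLETE, ABORTED }

    private final Duration threshold;  // a chunk at or under this duration counts as "warm"
    private final Duration deadline;   // give up if warmup is not reached within this time
    private State state = State.TRAINING;

    WarmupTracker(Duration threshold, Duration deadline) {
        this.threshold = threshold;
        this.deadline = deadline;
    }

    /**
     * Report one completed chunk of work.
     * chunkDuration is how long the chunk took; elapsed is the time
     * since the training run started.
     */
    public synchronized State chunkCompleted(Duration chunkDuration, Duration elapsed) {
        if (state != State.TRAINING) {
            return state;  // already decided; later reports are ignored
        }
        if (chunkDuration.compareTo(threshold) <= 0) {
            state = State.WARMUP_COMPLETE;  // a real integration would stop recording here
        } else if (elapsed.compareTo(deadline) >= 0) {
            state = State.ABORTED;          // threshold not reached in time
        }
        return state;
    }

    public static void main(String[] args) {
        WarmupTracker tracker =
            new WarmupTracker(Duration.ofMillis(50), Duration.ofSeconds(60));
        // A slow early chunk keeps the run in training.
        System.out.println(tracker.chunkCompleted(Duration.ofMillis(400), Duration.ofSeconds(5)));
        // A chunk under the threshold ends the warmup phase.
        System.out.println(tracker.chunkCompleted(Duration.ofMillis(40), Duration.ofSeconds(20)));
    }
}
```
</pre>
In this sketch the WARMUP_COMPLETE transition only changes state; it marks the point where a stop-recording action would be triggered if such an API existed.<br>
<br>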
> Currently we have the command line, jcmd and a new Leyden API to support <br>
> training runs. Lastly, we'd like to propose a fourth method: an <br>
> MXBean (using the new Leyden API); this would allow the developer to <br>
> provide the startup and warmup indicators internally or externally <br>
> (their choice), and would allow for runtime analysis using bespoke <br>
> production systems or JMC, and offline analysis via JFR.<br>
> <br>
> Cheers<br>
> Mat<br>
> <br>
> [1] <a href="https://bugs.openjdk.org/browse/JDK-8335358">https://bugs.openjdk.org/browse/JDK-8335358</a><br>
> <br>
> From: leyden-dev <leyden-dev-retn@openjdk.org> on behalf of John Rose <br>
> <john.r.rose@oracle.com><br>
> Date: Friday, August 16, 2024 at 4:51 PM<br>
> To: ioi.lam@oracle.com <ioi.lam@oracle.com><br>
> Cc: leyden-dev@openjdk.org <leyden-dev@openjdk.org><br>
> Subject: Re: EA feedback<br>
> <br>
> Here’s the way I would prefer to think about a “dump command”.<br>
> <br>
> The native way that the JVM represents sequential operations is<br>
> the method. Talking about methods is therefore a basic way to<br>
> specify a condition for injecting a JVM operation like training<br>
> dumps. I would like to figure out a good way to tie the training<br>
> dump to the invocation of a method, either a single well-known<br>
> method, or to a method specified (on the command line) by the<br>
> user.<br>
> <br>
> In fact, it feels like a breakpoint-like operation would be a<br>
> natural way to view the training dump. You don’t need JVMTI<br>
> to get it done; you just need a hack in the VM which parallels<br>
> the existing breakpoint mechanism, but special-cases it to<br>
> drive a training dump.<br>
> <br>
> Given such a foundation, jsig could then inject a call to a<br>
> method which is appropriately tied to the dump command.<br>
> <br>
> Sketch of implementation:<br>
> <br>
> When a method is first linked, a list is checked to see if<br>
> it has a dump event tied to it, and a bit is set on the method.<br>
> The method’s interpreter entry point might be modified, or<br>
> perhaps the interpreter just always checks the bit. On entry<br>
> to the method, before the first bytecode, an upcall tells<br>
> the VM that it’s time to finish the training run.<br>
> <br>
> The compilers also check this bit, of course.<br>
> <br>
> There is some method deep in the privates of java.base<br>
> that is always treated this way. That’s what jcmd reaches.<br>
> There is a command line option which lists more methods<br>
> to treat this way, something like the CompileOnly command.<br>
> <br>
> As a separate option, the upcall to end the training run<br>
> might return (allowing the VM to continue) or just exit.<br>
> <br>
> As a separate option, allow the user to specify a count N,<br>
> so that the training dump happens only after N “hits” on<br>
> any marked method(s).<br>
> <br>
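The "N hits on any marked method" option above can be modelled at the Java level as follows. This is only a sketch under stated assumptions: DumpTrigger and the method names are hypothetical, and in the design described above the check would live inside the VM (interpreter and compilers), not in application code. The Runnable stands in for the upcall that ends the training run:<br>
<pre>
```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical user-level model of "dump after N hits"; not a VM or Leyden API. */
class DumpTrigger {
    private final String[] markedMethods;  // stands in for the per-method bit set at link time
    private final int hitsBeforeDump;      // the count N
    private final Runnable endTraining;    // stands in for the upcall into the VM
    private final AtomicInteger hits = new AtomicInteger();

    DumpTrigger(int hitsBeforeDump, Runnable endTraining, String... markedMethods) {
        this.hitsBeforeDump = hitsBeforeDump;
        this.endTraining = endTraining;
        this.markedMethods = markedMethods.clone();
    }

    /** Models the check made on entry to a method. */
    void onMethodEntry(String methodName) {
        if (!Arrays.asList(markedMethods).contains(methodName)) {
            return;  // unmarked methods only pay for the check
        }
        // Fire exactly once, on the Nth hit across all marked methods.
        if (hits.incrementAndGet() == hitsBeforeDump) {
            endTraining.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger dumps = new AtomicInteger();
        DumpTrigger trigger =
            new DumpTrigger(3, dumps::incrementAndGet, "App.ready", "App.handle");
        String[] calls = {"App.ready", "App.other", "App.handle", "App.handle", "App.handle"};
        for (String call : calls) {
            trigger.onMethodEntry(call);
        }
        System.out.println("dumps triggered: " + dumps.get());  // prints "dumps triggered: 1"
    }
}
```
</pre>
Whether the upcall returns or exits the VM is left to the Runnable here, mirroring the separate option described above.<br>
<br>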
> I think all this is useful and flexible.<br>
> <br>
> On 13 Aug 2024, at 18:22, ioi.lam@oracle.com wrote:<br>
> <br>
> <br>
> On 8/13/24 12:42 PM, Ashutosh Mehra wrote:<br>
>><br>
>> Being able to trigger assembly/verification via jcmd without<br>
>> exiting, would make this far easier for us to support.<br>
>><br>
>> There is a proposed enhancement for doing exactly this (and<br>
>> exploring other ways to trigger end of training run); see<br>
>> <a href="https://bugs.openjdk.org/browse/JDK-8335358">https://bugs.openjdk.org/browse/JDK-8335358</a><br>
> <br>
> <br>
> I am working on a prototype for dumping with jcmd. It will be<br>
> similar to the existing "jcmd VM.cds static_dump" command, except<br>
> that it will also support the dumping of the AOT cache and profile data.<br>
> <br>
> <br>
> Thanks<br>
> <br>
> - Ioi<br>
> <br>
> <br>
>><br>
>> Thanks,<br>
>> - Ashutosh Mehra<br>
>><br>
>><br>
>> On Fri, Aug 9, 2024 at 4:38 PM Danny Thomas <dannyt@netflix.com><br>
>> wrote:<br>
>><br>
>> I tried 24-leydenpremain+2-8 on a few internal applications,<br>
>> some quick feedback below (good to see you folks at the JVM LS!).<br>
>><br>
>> If a jar has a Class-Path attribute and one or more of those<br>
>> libraries are explicitly on the classpath, it causes the<br>
>> actual and expected classpath to always differ. This is also<br>
>> the case currently with CDS of course, but this feature is<br>
>> sure to be deployed far more broadly than CDS is currently, so<br>
>> likely something you want to look at:<br>
>><br>
>> [0.057s][info][class,path] non-existent Class-Path entry<br>
>> lib/failureaccess-1.0.1.jar<br>
>> [0.057s][info][class,path] opened:<br>
>> lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar<br>
>> [0.057s][info][class,path] library =<br>
>> lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar<br>
>><br>
>> Startup time when training seems to be on par<br>
>> with ArchiveClassesAtExit in JDK 21, but it's about a 3.5x<br>
>> startup time penalty for one of our typical Spring Boot<br>
>> applications. From a back-to-back run on my machine (AMD EPYC<br>
>> 9R14, 32 cores, 123G, Ubuntu 22.04.4 LTS):<br>
>><br>
>> Started App in 7.698 seconds (process running for 8.229)<br>
>> Started App in 26.247 seconds (process running for 29.262) -<br>
>> w/ CacheDataStore, Training Run<br>
>> Started App in 4.341 seconds (process running for 4.917) - w/<br>
>> CacheDataStore, Production Run<br>
>><br>
>> I also got a crash on one attempt, I can't remember what I did<br>
>> to cause this unfortunately:<br>
>><br>
>> Stack: [0x00007f3949ab0000,0x00007f3949bb0000],<br>
>> sp=0x00007f3949bae628, free space=1017k<br>
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM<br>
>> code, C=native code)<br>
>> V [libjvm.so+0x42ca30]<br>
>> ArchiveBuilder::get_buffered_addr(unsigned char*) const+0x40<br>
>> V [libjvm.so+0xce4aa5] VM_PopulateDumpSharedSpace::doit()+0x395<br>
>> V [libjvm.so+0x100ae69] VM_Operation::evaluate()+0x109<br>
>> V [libjvm.so+0x100e348]<br>
>> VMThread::evaluate_operation(VM_Operation*)+0xe8<br>
>> V [libjvm.so+0x10142fb]<br>
>> VMThread::inner_execute(VM_Operation*)+0x35b<br>
>> V [libjvm.so+0x101460f] VMThread::run()+0x16f<br>
>> V [libjvm.so+0xf6e5cf] Thread::call_run()+0x9f<br>
>> V [libjvm.so+0xd74e13] thread_native_entry(Thread*)+0x183<br>
>> C [libc.so.6+0x98b07]<br>
>><br>
>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR),<br>
>> si_addr: 0x0000000000000030<br>
>><br>
>> Thinking ahead to operationalizing AOT, while a<br>
>> single-shot/on-exit workflow is great for iterating locally,<br>
>> requiring the VM to exit makes this more difficult to<br>
>> operationalize at scale:<br>
>><br>
>> 1. We'll perform training and assembly on test, production<br>
>> canary and production instances on behalf of application<br>
>> owners and handle distribution of the archives. Depending<br>
>> on when we're able to perform a training run, it'll have<br>
>> different benefits. i.e.:<br>
>> 1. Test environment will at least improve startup<br>
>> performance, with a mixed benefit for warm up<br>
>> depending on the kind of traffic they take in test<br>
>> 2. If an application uses canary deployments we'll have a<br>
>> full production profile prior to the full production<br>
>> deployment, and all instances will come up hot<br>
>> 3. If we reach production with only a test environment<br>
>> profile, we'll perform a training run in production,<br>
>> so instances that scale up following that run will<br>
>> come up hot (completely cold instances for an initial<br>
>> deployment is less of a concern, because we deploy<br>
>> immutably and get a natural warm-up period while we<br>
>> have 200% capacity online for a cluster)<br>
>> 2. It's currently not a problem if a VM doesn't exit<br>
>> completely due to a dangling non-daemon thread or hung<br>
>> shutdown hook<br>
>><br>
>> Being able to trigger assembly/verification via jcmd without<br>
>> exiting, would make this far easier for us to support. If the<br>
>> overhead of the instrumentation for CDS can be avoided, being<br>
>> able to take a snapshot at any time on any VM would be better<br>
>> still, but that wouldn't be an impediment for us: we'll know<br>
>> that the instance will be used for training at boot time.<br>
>><br>
>> We build nightlies of all the currently active OpenJDK<br>
>> projects, so if you land anything on premain between EA builds<br>
>> that you'd like us to try, let us know!<br>
>><br>
>> Cheers,<br>
>> Danny<br>
>><br>
</div>
</span></font></span></div>
</div>
</div>
</body>
</html>