EA feedback
ioi.lam at oracle.com
Tue Aug 13 22:22:20 UTC 2024
On 8/13/24 12:42 PM, Ashutosh Mehra wrote:
>
> Being able to trigger assembly/verification via jcmd without
> exiting, would make this far easier for us to support.
>
> There is a proposed enhancement for doing exactly this (and exploring
> other ways to trigger end of training run); see
> https://bugs.openjdk.org/browse/JDK-8335358
I am working on a prototype for dumping with jcmd. It will be similar to
the existing "jcmd VM.cds static_dump" command, except that it will also
support the dumping of the AOT cache and profile data.
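
For anyone who wants to script the existing command in the meantime, here
is a minimal sketch in Java. It only drives today's "VM.cds static_dump"
(a CDS archive, not the AOT cache or profile data), since the prototype's
syntax is not settled. The class name, method names, and the default
archive name "app.jsa" are made up for illustration, and it assumes jcmd
is on the PATH and that the target VM allows attach.

```java
import java.io.IOException;
import java.util.List;

public class JcmdStaticDump {

    // Builds the argument list for: jcmd <pid> VM.cds static_dump <file>
    // (hypothetical helper; the file name argument is optional in jcmd itself)
    static List<String> staticDumpCommand(long pid, String archivePath) {
        return List.of("jcmd", Long.toString(pid), "VM.cds", "static_dump", archivePath);
    }

    // Runs the command, forwarding jcmd's output to this process,
    // and returns jcmd's exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: JcmdStaticDump <pid> [archive-file]");
            System.exit(2);
        }
        long pid = Long.parseLong(args[0]);
        // "app.jsa" is an arbitrary default chosen for this sketch.
        String archivePath = args.length > 1 ? args[1] : "app.jsa";
        System.exit(run(staticDumpCommand(pid, archivePath)));
    }
}
```

The target VM keeps running after the dump, which is the property being
asked for in this thread.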
Thanks
- Ioi
>
> Thanks,
> - Ashutosh Mehra
>
>
> On Fri, Aug 9, 2024 at 4:38 PM Danny Thomas <dannyt at netflix.com> wrote:
>
> I tried 24-leydenpremain+2-8 on a few internal applications, some
> quick feedback below (good to see you folks at the JVM LS!).
>
> If a jar has a Class-Path attribute and one or more of those
> libraries are explicitly on the classpath, it causes the actual
> and expected classpath to always differ. This is also the case
> currently with CDS of course, but this feature is sure to be
> deployed far more broadly than CDS is currently, so likely
> something you want to look at:
>
> [0.057s][info][class,path] non-existent Class-Path entry
> lib/failureaccess-1.0.1.jar
> [0.057s][info][class,path] opened:
> lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
> [0.057s][info][class,path] library =
> lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
>
> Startup time when training seems to be on par
> with ArchiveClassesAtExit in JDK 21, but it's about a 3.5x startup
> time penalty for one of our typical Spring Boot applications. From
> a back-to-back run on my machine (AMD EPYC 9R14, 32 cores, 123G,
> Ubuntu 22.04.4 LTS):
>
> Started App in 7.698 seconds (process running for 8.229)
> Started App in 26.247 seconds (process running for 29.262) - w/
> CacheDataStore, Training Run
> Started App in 4.341 seconds (process running for 4.917) - w/
> CacheDataStore, Production Run
>
> I also got a crash on one attempt, I can't remember what I did to
> cause this unfortunately:
>
> Stack: [0x00007f3949ab0000,0x00007f3949bb0000],
> sp=0x00007f3949bae628, free space=1017k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
> C=native code)
> V [libjvm.so+0x42ca30]
> ArchiveBuilder::get_buffered_addr(unsigned char*) const+0x40
> V [libjvm.so+0xce4aa5] VM_PopulateDumpSharedSpace::doit()+0x395
> V [libjvm.so+0x100ae69] VM_Operation::evaluate()+0x109
> V [libjvm.so+0x100e348]
> VMThread::evaluate_operation(VM_Operation*)+0xe8
> V [libjvm.so+0x10142fb] VMThread::inner_execute(VM_Operation*)+0x35b
> V [libjvm.so+0x101460f] VMThread::run()+0x16f
> V [libjvm.so+0xf6e5cf] Thread::call_run()+0x9f
> V [libjvm.so+0xd74e13] thread_native_entry(Thread*)+0x183
> C [libc.so.6+0x98b07]
>
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR),
> si_addr: 0x0000000000000030
>
> Thinking ahead to operationalizing AOT, while a
> single-shot/on-exit workflow is great for iterating locally,
> requiring the VM to exit makes this more difficult to
> operationalize at scale:
>
> 1. We'll perform training and assembly on test, production canary
> and production instances on behalf of application owners and
> handle distribution of the archives. Depending on when we're
> able to perform a training run, it'll have different benefits.
> i.e.:
> 1. Test environment will at least improve startup
> performance, with a mixed benefit for warm up depending on
> the kind of traffic they take in test
> 2. If an application uses canary deployments we'll have a
> full production profile prior to the full production
> deployment, and all instances will come up hot
> 3. If we reach production with only a test environment
> profile, we'll perform a training run in production, so
> instances that scale up following that run will come up
> hot (completely cold instances for an initial deployment
> is less of a concern, because we deploy immutably and get
> a natural warm-up period while we have 200% capacity
> online for a cluster)
> 2. It's currently not a problem if a VM doesn't exit completely
> due to a dangling non-daemon thread or hung shutdown hook
>
> Being able to trigger assembly/verification via jcmd without
> exiting, would make this far easier for us to support. If the
> overhead of the instrumentation for CDS can be avoided, being able
> to take a snapshot at any time on any VM would be better still,
> but that wouldn't be an impediment for us: we'll know that the
> instance will be used for training at boot time.
>
> We build nightlies of all the currently active OpenJDK projects,
> so if you land anything on premain between EA builds that you'd
> like us to try, let us know!
>
> Cheers,
> Danny
>