EA feedback

Tue Aug 13 22:22:20 UTC 2024

On 8/13/24 12:42 PM, Ashutosh Mehra wrote:
>
>     Being able to trigger assembly/verification via jcmd without
>     exiting, would make this far easier for us to support.
>
> There is a proposed enhancement for doing exactly this (and exploring 
> other ways to trigger end of training run); see 
> https://bugs.openjdk.org/browse/JDK-8335358

I am working on a prototype for dumping with jcmd. It will be similar to 
the existing "jcmd VM.cds static_dump" command, except that it will also 
support the dumping of the AOT cache and profile data.
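
For reference, the existing command can be invoked roughly as follows. This 
is a minimal sketch, assuming a target VM running JDK 17 or later; the 
helper name, PID and archive path are placeholders for illustration. The 
AOT cache and profile dumping is not shown, since the prototype's syntax 
has not been settled.

```shell
# Hypothetical helper: ask a running JVM to write a static CDS archive
# without exiting. "$1" is the target PID, "$2" the output archive path.
dump_static_archive() {
    pid="$1"
    archive="$2"
    # VM.cds static_dump is the existing diagnostic command; the file
    # name argument is optional and a default name is used without it.
    jcmd "$pid" VM.cds static_dump "$archive"
}

# Example (placeholder values):
#   dump_static_archive 12345 /tmp/app-static.jsa
```

Running "jcmd <pid> help VM.cds" on the target VM lists the subcommands and 
options that particular build actually supports.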

Thanks

- Ioi

>
> Thanks,
> - Ashutosh Mehra
>
>
> On Fri, Aug 9, 2024 at 4:38 PM Danny Thomas <dannyt at netflix.com> wrote:
>
>     I tried 24-leydenpremain+2-8 on a few internal applications, some
>     quick feedback below (good to see you folks at the JVM LS!).
>
>     If a jar has a Class-Path attribute and one or more of those
>     libraries are explicitly on the classpath, it causes the actual
>     and expected classpath to always differ. This is also the case
>     currently with CDS of course, but this feature is sure to be
>     deployed far more broadly than CDS is currently, so likely
>     something you want to look at:
>
>     [0.057s][info][class,path] non-existent Class-Path entry
>     lib/failureaccess-1.0.1.jar
>     [0.057s][info][class,path] opened:
>     lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
>     [0.057s][info][class,path] library =
>     lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
>
>     Startup time when training seems to be on par
>     with ArchiveClassesAtExit in JDK 21, but it's about a 3.5x startup
>     time penalty for one of our typical Spring Boot applications. From
>     a back-to-back run on my machine (AMD EPYC 9R14, 32 cores, 123G,
>     Ubuntu 22.04.4 LTS):
>
>     Started App in 7.698 seconds (process running for 8.229)
>     Started App in 26.247 seconds (process running for 29.262) - w/
>     CacheDataStore, Training Run
>     Started App in 4.341 seconds (process running for 4.917)  - w/
>     CacheDataStore, Production Run
>
>     I also got a crash on one attempt, I can't remember what I did to
>     cause this unfortunately:
>
>     Stack: [0x00007f3949ab0000,0x00007f3949bb0000],
>      sp=0x00007f3949bae628,  free space=1017k
>     Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>     C=native code)
>     V  [libjvm.so+0x42ca30]
>      ArchiveBuilder::get_buffered_addr(unsigned char*) const+0x40
>     V  [libjvm.so+0xce4aa5]  VM_PopulateDumpSharedSpace::doit()+0x395
>     V  [libjvm.so+0x100ae69]  VM_Operation::evaluate()+0x109
>     V  [libjvm.so+0x100e348]
>      VMThread::evaluate_operation(VM_Operation*)+0xe8
>     V  [libjvm.so+0x10142fb]  VMThread::inner_execute(VM_Operation*)+0x35b
>     V  [libjvm.so+0x101460f]  VMThread::run()+0x16f
>     V  [libjvm.so+0xf6e5cf]  Thread::call_run()+0x9f
>     V  [libjvm.so+0xd74e13]  thread_native_entry(Thread*)+0x183
>     C  [libc.so.6+0x98b07]
>
>     siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR),
>     si_addr: 0x0000000000000030
>
>     Thinking ahead to operationalizing AOT, while a
>     single-shot/on-exit workflow is great for iterating locally,
>     requiring the VM to exit makes this more difficult to
>     operationalize at scale:
>
>      1. We'll perform training and assembly on test, production canary
>         and production instances on behalf of application owners and
>         handle distribution of the archives. Depending on when we're
>         able to perform a training run, it'll have different benefits.
>         i.e.:
>          1. Test environment will at least improve startup
>             performance, with a mixed benefit for warm up depending on
>             the kind of traffic they take in test
>          2. If an application uses canary deployments we'll have a
>             full production profile prior to the full production
>             deployment, and all instances will come up hot
>          3. If we reach production with only a test environment
>             profile, we'll perform a training run in production, so
>             instances that scale up following that run will come up
>             hot (completely cold instances for an initial deployment
>             is less of a concern, because we deploy immutably and get
>             a natural warm-up period while we have 200% capacity
>             online for a cluster)
>      2. It's currently not a problem if a VM doesn't exit completely
>         due to a dangling non-daemon thread or hung shutdown hook
>
>     Being able to trigger assembly/verification via jcmd without
>     exiting, would make this far easier for us to support. If the
>     overhead of the instrumentation for CDS can be avoided, being able
>     to take a snapshot at any time on any VM would be better still,
>     but that wouldn't be an impediment for us: we'll know that the
>     instance will be used for training at boot time.
>
>     We build nightlies of all the currently active OpenJDK projects,
>     so if you land anything on premain between EA builds that you'd
>     like us to try, let us know!
>
>     Cheers,
>     Danny
>