EA feedback

Danny Thomas dannyt at netflix.com
Tue Aug 20 05:26:29 UTC 2024


I created a reduced version of the Class-Path attribute issue we're seeing:

https://gist.github.com/DanielThomas/83eefaad41af33a071d9a9ee17ca8fe1
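
For readers who can't open the gist, the scenario can be sketched roughly as below. This is only an illustration of the condition described in the quoted report (a jar's Class-Path manifest attribute naming a library that is also passed explicitly on the classpath), not the contents of the gist. The class name ClassPathOverlap, the helper methods, and the app.jar entry are hypothetical, and the plain string comparison is a simplification: the JVM resolves Class-Path entries relative to the location of the jar that declares them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.Manifest;
import java.util.regex.Pattern;

// Hypothetical helper: reports Class-Path manifest entries that are also
// listed explicitly on the classpath.
public class ClassPathOverlap {

    // Builds an in-memory manifest carrying the given Class-Path value.
    static Manifest manifestWithClassPath(String classPath) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.CLASS_PATH, classPath);
        return manifest;
    }

    // Returns the manifest Class-Path entries that also appear, as literal
    // strings, in the explicit classpath (simplification: no path resolution).
    static List<String> overlap(Manifest manifest, String explicitClassPath) {
        String attr = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (attr == null || attr.isBlank()) {
            return List.of();
        }
        Set<String> explicit = new LinkedHashSet<>(Arrays.asList(
                explicitClassPath.split(Pattern.quote(File.pathSeparator))));
        List<String> duplicated = new ArrayList<>();
        // Class-Path entries are separated by whitespace in the manifest.
        for (String entry : attr.trim().split("\\s+")) {
            if (explicit.contains(entry)) {
                duplicated.add(entry);
            }
        }
        return duplicated;
    }

    public static void main(String[] args) {
        // Jar names taken from the quoted log; which one is passed
        // explicitly is an assumption made for this illustration.
        String listenable =
                "lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar";
        Manifest manifest = manifestWithClassPath(
                "lib/failureaccess-1.0.1.jar " + listenable);
        String explicit = "app.jar" + File.pathSeparator + listenable;
        System.out.println(overlap(manifest, explicit));
    }
}
```

A non-empty result marks the kind of setup where, per the report, the actual and expected classpath always differ.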


On Tue, Aug 13, 2024 at 1:54 PM Calvin Cheung <calvin.cheung at oracle.com>
wrote:

> Hi Danny,
> On 8/9/24 1:38 PM, Danny Thomas wrote:
>
> I tried 24-leydenpremain+2-8 on a few internal applications, some quick
> feedback below (good to see you folks at the JVM LS!).
>
> Thanks for trying the EA build.
>
>
> If a jar has a Class-Path attribute and one or more of those libraries are
> explicitly on the classpath, it causes the actual and expected classpath to
> always differ. This is also the case currently with CDS of course, but this
> feature is sure to be deployed far more broadly than CDS is currently, so
> likely something you want to look at:
>
> [0.057s][info][class,path] non-existent Class-Path entry lib/failureaccess-1.0.1.jar
> [0.057s][info][class,path] opened: lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
> [0.057s][info][class,path] library = lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
>
> I couldn't reproduce the above Class-Path attribute issue with a simple
> test[1]. The simple test was extracted from an existing test case[2].
>
> Can you provide a test case?
>
> (I'll let others answer the other issues.)
> Thanks,
> Calvin
> [1] https://cr.openjdk.org/~ccheung/cp-attribute/
> [2] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/cds/appcds/ClassPathAttr.java
>
>
> Startup time when training seems to be on par with ArchiveClassesAtExit in
> JDK 21, but it's about a 3.5x startup time penalty for one of our typical
> Spring Boot applications. From a back-to-back run on my machine (AMD EPYC
> 9R14, 32 cores, 123G, Ubuntu 22.04.4 LTS):
>
> Started App in 7.698 seconds (process running for 8.229)
> Started App in 26.247 seconds (process running for 29.262) - w/ CacheDataStore, Training Run
> Started App in 4.341 seconds (process running for 4.917)  - w/ CacheDataStore, Production Run
>
> I also got a crash on one attempt; unfortunately, I can't remember what I
> did to cause it:
>
> Stack: [0x00007f3949ab0000,0x00007f3949bb0000],  sp=0x00007f3949bae628,  free space=1017k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x42ca30]  ArchiveBuilder::get_buffered_addr(unsigned char*) const+0x40
> V  [libjvm.so+0xce4aa5]  VM_PopulateDumpSharedSpace::doit()+0x395
> V  [libjvm.so+0x100ae69]  VM_Operation::evaluate()+0x109
> V  [libjvm.so+0x100e348]  VMThread::evaluate_operation(VM_Operation*)+0xe8
> V  [libjvm.so+0x10142fb]  VMThread::inner_execute(VM_Operation*)+0x35b
> V  [libjvm.so+0x101460f]  VMThread::run()+0x16f
> V  [libjvm.so+0xf6e5cf]  Thread::call_run()+0x9f
> V  [libjvm.so+0xd74e13]  thread_native_entry(Thread*)+0x183
> C  [libc.so.6+0x98b07]
>
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000030
>
> Thinking ahead to operationalizing AOT, while a single-shot/on-exit
> workflow is great for iterating locally, requiring the VM to exit makes
> this more difficult to operationalize at scale:
>
>    1. We'll perform training and assembly on test, production canary and
>    production instances on behalf of application owners and handle
>    distribution of the archives. Depending on when we're able to perform a
>    training run, it'll have different benefits. i.e.:
>       1. Test environment will at least improve startup performance, with
>       a mixed benefit for warm up depending on the kind of traffic they take in
>       test
>       2. If an application uses canary deployments we'll have a full
>       production profile prior to the full production deployment, and all
>       instances will come up hot
>       3. If we reach production with only a test environment profile,
>       we'll perform a training run in production, so instances that scale up
>       following that run will come up hot (completely cold instances for an
>       initial deployment is less of a concern, because we deploy immutably and
>       get a natural warm-up period while we have 200% capacity online for a
>       cluster)
>    2. It's currently not a problem if a VM doesn't exit completely due to
>    a dangling non-daemon thread or hung shutdown hook
>
> Being able to trigger assembly/verification via jcmd without
> exiting would make this far easier for us to support. If the overhead of
> the instrumentation for CDS can be avoided, being able to take a snapshot
> at any time on any VM would be better still, but that wouldn't be an
> impediment for us: we'll know that the instance will be used for training
> at boot time.
>
> We build nightlies of all the currently active OpenJDK projects, so if you
> land anything on premain between EA builds that you'd like us to try, let
> us know!
>
> Cheers,
> Danny
>
>