I tried 24-leydenpremain+2-8 on a few internal applications; some quick feedback is below (good to see you folks at the JVM LS!).

If a jar has a Class-Path attribute and one or more of the libraries it lists are also explicitly on the classpath, the actual and expected classpaths always differ. This is currently the case with CDS as well, of course, but this feature is sure to be deployed far more broadly than CDS is today, so it's likely something you'll want to look at:

[0.057s][info][class,path] non-existent Class-Path entry lib/failureaccess-1.0.1.jar
[0.057s][info][class,path] opened: lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[0.057s][info][class,path] library = lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar

Startup time when training seems to be on par with ArchiveClassesAtExit in JDK 21, but it's about a 3.5x startup time penalty for one of our typical Spring Boot applications. From a back-to-back run on my machine (AMD EPYC 9R14, 32 cores, 123G, Ubuntu 22.04.4 LTS):

Started App in 7.698 seconds (process running for 8.229)
Started App in 26.247 seconds (process running for 29.262) - w/ CacheDataStore, Training Run
Started App in 4.341 seconds (process running for 4.917) - w/ CacheDataStore, Production Run

I also got a crash on one attempt; unfortunately, I can't remember what I did to cause it:

Stack: [0x00007f3949ab0000,0x00007f3949bb0000], sp=0x00007f3949bae628, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x42ca30] ArchiveBuilder::get_buffered_addr(unsigned char*) const+0x40
V [libjvm.so+0xce4aa5] VM_PopulateDumpSharedSpace::doit()+0x395
V [libjvm.so+0x100ae69] VM_Operation::evaluate()+0x109
V [libjvm.so+0x100e348] VMThread::evaluate_operation(VM_Operation*)+0xe8
V [libjvm.so+0x10142fb] VMThread::inner_execute(VM_Operation*)+0x35b
V [libjvm.so+0x101460f] VMThread::run()+0x16f
V [libjvm.so+0xf6e5cf] Thread::call_run()+0x9f
V [libjvm.so+0xd74e13] thread_native_entry(Thread*)+0x183
C [libc.so.6+0x98b07]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000030

Thinking ahead to operationalizing AOT: while a single-shot/on-exit workflow is great for iterating locally, requiring the VM to exit makes it more difficult to operationalize at scale:

1. We'll perform training and assembly on test, production canary and production instances on behalf of application owners, and handle distribution of the archives. Depending on when we're able to perform a training run, it'll have different benefits, i.e.:
   1. A test environment run will at least improve startup performance, with a mixed benefit for warmup depending on the kind of traffic the application takes in test.
   2. If an application uses canary deployments, we'll have a full production profile prior to the full production deployment, and all instances will come up hot.
   3. If we reach production with only a test environment profile, we'll perform a training run in production, so instances that scale up following that run will come up hot. (Completely cold instances for an initial deployment are less of a concern, because we deploy immutably and get a natural warm-up period while we have 200% capacity online for a cluster.)
2. It's currently not a problem for us if a VM doesn't exit completely due to a dangling non-daemon thread or a hung shutdown hook, but with an on-exit workflow such a VM would never get as far as assembling its archive.

Being able to trigger assembly/verification via jcmd without exiting would make this far easier for us to support. If the overhead of the instrumentation for CDS can be avoided, being able to take a snapshot at any time on any VM would be better still, but that wouldn't be an impediment for us: we'll know at boot time that the instance will be used for training.

We build nightlies of all the currently active OpenJDK projects, so if you land anything on premain between EA builds that you'd like us to try, let us know!

Cheers,
Danny
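As a quick check on the "about 3.5x" figure, the ratios can be recomputed from the three timing lines above. This minimal sketch assumes the first, unannotated line is the baseline run without CacheDataStore (the message does not say so explicitly); the class name is illustrative.

```java
// Recomputes the ratios behind the "about 3.5x" figure from the three "Started App" lines.
// Assumption: the first, unannotated line is the baseline run without CacheDataStore.
final class StartupRatios {

    // Ratio of a measured time to the corresponding baseline time.
    static double ratio(double measured, double baseline) {
        return measured / baseline;
    }

    public static void main(String[] args) {
        // Each array holds {"Started App" seconds, "process running for" seconds}.
        double[] baseline = {7.698, 8.229};
        double[] training = {26.247, 29.262};   // w/ CacheDataStore, Training Run
        double[] production = {4.341, 4.917};   // w/ CacheDataStore, Production Run

        String[] labels = {"Started App", "process running for"};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-20s training/baseline = %.2fx, production/baseline = %.2fx%n",
                    labels[i], ratio(training[i], baseline[i]), ratio(production[i], baseline[i]));
        }
    }
}
```

Run as a single-file program, it prints 3.41x and 0.56x for "Started App" and 3.56x and 0.60x for "process running for", so the training run costs roughly 3.4x to 3.6x depending on which timer is compared, while the production run starts in a little under 60% of the baseline time.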
Hi Danny,

On 8/9/24 1:38 PM, Danny Thomas wrote:

I tried 24-leydenpremain+2-8 on a few internal applications, some quick feedback below (good to see you folks at the JVM LS!).

Thanks for trying the EA build.
If a jar has a Class-Path attribute and one or more of those libraries are explicitly on the classpath, it causes the actual and expected classpath to always differ. This is also the case currently with CDS of course, but this feature is sure to be deployed far more broadly than CDS is currently, so likely something you want to look at:
[0.057s][info][class,path] non-existent Class-Path entry lib/failureaccess-1.0.1.jar
[0.057s][info][class,path] opened: lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[0.057s][info][class,path] library = lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
I couldn't reproduce the above Class-Path attribute issue with a simple test[1]. The simple test was extracted from an existing test case[2].

Can you provide a test case?

(I'll let others answer the other issues.)

Thanks,
Calvin

[1] https://cr.openjdk.org/~ccheung/cp-attribute/
[2] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/cds/ap...
I created a reduced version of what we're seeing here: https://gist.github.com/DanielThomas/83eefaad41af33a071d9a9ee17ca8fe1

On Tue, Aug 13, 2024 at 1:54 PM Calvin Cheung <calvin.cheung@oracle.com> wrote:
I couldn't reproduce the above Class-Path attribute issue with a simple test[1]. The simple test was extracted from an existing test case[2].
Can you provide a test case?
(I'll let others answer the other issues.)

Thanks,
Calvin

[1] https://cr.openjdk.org/~ccheung/cp-attribute/
[2] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/cds/ap...
On 8/19/24 10:26 PM, Danny Thomas wrote:
I created a reduced version of what we're seeing here:
https://gist.github.com/DanielThomas/83eefaad41af33a071d9a9ee17ca8fe1
I was able to reproduce the problem and filed the following bug: https://bugs.openjdk.org/browse/JDK-8338686

A workaround is to leave the jars that are listed in the Class-Path attribute out of -cp. The test passed with the following CP setting:

CP="lib/guice-all-5.1.1-jakartaee.jar:lib/main.jar"

Thanks,
Calvin
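Calvin's workaround can be applied mechanically by finding which -cp entries are already pulled in through a Class-Path manifest attribute of another entry. The following is a minimal sketch using only java.util.jar; the class is hypothetical, it resolves Class-Path entries relative to the referencing jar's directory as the JAR file specification describes, and, as a simplification, it does not URL-decode entries.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper (not part of any JDK tool): reports -cp entries that are also referenced
// through the Class-Path attribute of a jar on the same -cp, i.e. the entries to leave out.
final class ClassPathOverlap {

    // Paths named by a jar's Class-Path attribute, resolved against the jar's own directory.
    // Simplification: entries are treated as plain relative paths and are not URL-decoded.
    static List<Path> manifestClassPath(Path jar) throws IOException {
        List<Path> result = new ArrayList<>();
        try (JarFile jf = new JarFile(jar.toFile())) {
            Manifest mf = jf.getManifest();
            String cp = (mf == null) ? null : mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
            if (cp == null) {
                return result;
            }
            Path dir = jar.toAbsolutePath().getParent();
            for (String entry : cp.trim().split("\\s+")) { // Class-Path entries are space-separated
                if (!entry.isEmpty()) {
                    result.add(dir.resolve(entry).normalize());
                }
            }
        }
        return result;
    }

    // The -cp entries (as written) that some jar on the same -cp already references via Class-Path.
    static Set<String> redundantEntries(List<String> cpEntries) throws IOException {
        Set<Path> referenced = new HashSet<>();
        for (String e : cpEntries) {
            if (e.endsWith(".jar")) {
                referenced.addAll(manifestClassPath(Path.of(e)));
            }
        }
        Set<String> redundant = new LinkedHashSet<>();
        for (String e : cpEntries) {
            if (referenced.contains(Path.of(e).toAbsolutePath().normalize())) {
                redundant.add(e);
            }
        }
        return redundant;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassPathOverlap.java <classpath>");
            return;
        }
        List<String> entries = List.of(args[0].split(File.pathSeparator));
        System.out.println("Also referenced via Class-Path: " + redundantEntries(entries));
    }
}
```

Dropping the reported entries from -cp corresponds to the workaround above; JDK-8338686 tracks the underlying fix.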
Being able to trigger assembly/verification via jcmd without exiting, would make this far easier for us to support.
There is a proposed enhancement for doing exactly this (and exploring other ways to trigger end of training run); see https://bugs.openjdk.org/browse/JDK-8335358

Thanks,
- Ashutosh Mehra
On 8/13/24 12:42 PM, Ashutosh Mehra wrote:
Being able to trigger assembly/verification via jcmd without exiting, would make this far easier for us to support.
There is a proposed enhancement for doing exactly this (and exploring other ways to trigger end of training run); see https://bugs.openjdk.org/browse/JDK-8335358
I am working on a prototype for dumping with jcmd. It will be similar to the existing "jcmd VM.cds static_dump" command, except that it will also support dumping the AOT cache and profile data.

Thanks
- Ioi
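For context, the existing command Ioi refers to can already be driven by deployment tooling of the kind Danny describes. The following is a minimal sketch, assuming a full JDK whose jcmd (under java.home/bin) supports VM.cds and a target VM that accepts it; the class and method names are hypothetical, and the AOT-cache/profile variant is not modeled because its command syntax was still being prototyped.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator-side helper that runs "jcmd <pid> VM.cds static_dump [filename]".
// It only wraps the existing command; it does not cover the AOT-cache dump discussed above.
final class CdsDumpTrigger {

    // Builds the argument vector for: jcmd <pid> VM.cds static_dump [archiveFile]
    static List<String> staticDumpCommand(Path jcmd, long pid, String archiveFile) {
        List<String> cmd = new ArrayList<>(List.of(jcmd.toString(), Long.toString(pid), "VM.cds", "static_dump"));
        if (archiveFile != null) {
            cmd.add(archiveFile); // optional archive file name
        }
        return cmd;
    }

    // Runs the command with jcmd's output forwarded to this process and returns jcmd's exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: java CdsDumpTrigger.java <pid> [archive.jsa]");
            return;
        }
        // Assumes a full JDK layout, where jcmd lives under ${java.home}/bin.
        Path jcmd = Path.of(System.getProperty("java.home"), "bin", "jcmd");
        long pid = Long.parseLong(args[0]);
        int exit = run(staticDumpCommand(jcmd, pid, args.length > 1 ? args[1] : null));
        System.out.println("jcmd exited with status " + exit);
    }
}
```

Keeping the argument construction separate from process execution makes it easy to swap in a different subcommand once the prototype's syntax is known.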
Here’s the way I would prefer to think about a “dump command”.

The native way that the JVM represents sequential operations is the method. Talking about methods is therefore a basic way to specify a condition for injecting a JVM operation like training dumps. I would like to figure out a good way to tie the training dump to the invocation of a method, either a single well-known method, or to a method specified (on the command line) by the user.

In fact, it feels like a breakpoint-like operation would be a natural way to view the training dump. You don’t need JVMTI to get it done; you just need a hack in the VM which parallels the existing breakpoint mechanism, but special-cases it to drive a training dump. Given such a foundation, jsig could then inject a call to a method which is appropriately tied to the dump command.

Sketch of implementation:
• When a method is first linked, a list is checked to see if it has a dump event tied to it, and a bit is set on the method.
• The method’s interpreter entry point might be modified, or perhaps the interpreter just always checks the bit.
• On entry to the method, before the first bytecode, an upcall tells the VM that it’s time to finish the training run.
• The compilers also check this bit, of course.
• There is some method deep in the privates of java.base that is always treated this way. That’s what jcmd reaches.
• There is a command line option which lists more methods to treat this way, something like the CompileOnly command.
• As a separate option, the upcall to end the training run might return (allowing the VM to continue) or just exit.
• As a separate option, allow the user to specify a count N, so that the training dump happens only after N “hits” on any marked method(s).

I think all this is useful and flexible.

On 13 Aug 2024, at 18:22, ioi.lam@oracle.com wrote:
On 8/13/24 12:42 PM, Ashutosh Mehra wrote:
Being able to trigger assembly/verification via jcmd without exiting, would make this far easier for us to support.
There is a proposed enhancement for doing exactly this (and exploring other ways to trigger end of training run); see https://bugs.openjdk.org/browse/JDK-8335358
I am working on a prototype for dumping with jcmd. It will be similar to the existing "jcmd VM.cds static_dump" command, except that it will also support dumping the AOT cache and profile data.
Thanks
- Ioi
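The control flow in John's sketch (a set of marked methods, a hit count N, and an upcall that either returns or exits) can be modeled in plain Java to make the options concrete. This is only a user-level simulation under stated assumptions: in the proposal the per-method bit and the entry check live inside the VM, and every name below, including the stand-in for the well-known java.base method, is hypothetical.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// User-level model of the proposed mechanism. In the proposal this logic lives inside the VM
// (a bit set on the method at link time, checked on entry); every name here is hypothetical.
final class TrainingDumpTrigger {

    enum AfterDump { CONTINUE, EXIT }

    // Hypothetical stand-in for "some method deep in the privates of java.base" that jcmd reaches.
    static final String WELL_KNOWN_METHOD = "java.base/hypothetical.Internal::endTrainingRun";

    private final Set<String> markedMethods;
    private final long hitsBeforeDump;          // the count N
    private final AfterDump afterDump;
    private final Runnable dumpUpcall;          // stands in for the upcall that finishes the training run
    private final AtomicLong hits = new AtomicLong();
    private final AtomicBoolean dumped = new AtomicBoolean();

    TrainingDumpTrigger(Set<String> userMethods, long hitsBeforeDump, AfterDump afterDump, Runnable dumpUpcall) {
        if (hitsBeforeDump < 1) {
            throw new IllegalArgumentException("N must be at least 1");
        }
        Set<String> all = new HashSet<>(userMethods); // e.g. from a CompileOnly-style option
        all.add(WELL_KNOWN_METHOD);                   // always marked
        this.markedMethods = Set.copyOf(all);
        this.hitsBeforeDump = hitsBeforeDump;
        this.afterDump = Objects.requireNonNull(afterDump);
        this.dumpUpcall = Objects.requireNonNull(dumpUpcall);
    }

    // Analogue of the link-time check that decides whether a method's "dump" bit is set.
    boolean isMarked(String method) {
        return markedMethods.contains(method);
    }

    // Analogue of the entry check before the first bytecode; hits on any marked method count toward N.
    void onEntry(String method) {
        if (dumped.get() || !isMarked(method)) {
            return;
        }
        if (hits.incrementAndGet() >= hitsBeforeDump && dumped.compareAndSet(false, true)) {
            dumpUpcall.run();
            if (afterDump == AfterDump.EXIT) {
                System.exit(0);
            }
        }
    }

    boolean hasDumped() {
        return dumped.get();
    }
}
```

Counting hits with an AtomicLong and guarding the upcall with compareAndSet means the dump fires exactly once after the Nth hit, even if several threads enter marked methods concurrently.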
Having considered the JBS issue [1], we'd like to propose/discuss some additional options. While implementation can be staggered, it might be helpful to consider them now to help prioritize, rethink or reject them.

In summary, we currently have the "stop" action; stopping is immediate (now) and is operationalized via the following mechanisms:
• Ctrl-Z/D
• System.exit()
• Run to completion (normal termination)
• Unhandled exception :)

And the new ask in JBS [1] is to:
• Add a mechanism using 'jcmd' and/or a new Leyden API to action the existing "stop" (now) functionality
• Add a variant of the "stop" action where training stops when execution enters a Java method (optionally after N invocations); the method to stop on is specified via the command line (-XX), jcmd and/or the Leyden API

We'd like to propose two additional variants to aid with stopping training runs:
• Stop training after some time has elapsed
• Stop training when some threshold is met

The second point comes from considering the definitions of 'startup complete' and 'warmup complete'. Since the developer is already engaged in AOT training, we should allow the developer to aid the training by:
• Indicating when the application has completed startup (ready to work)
• Indicating when the application has completed warmup, either by:
  ○ Calling an API to indicate warmup is done
  ○ Calling an API when a chunk of work has completed, passing in the 'duration'; this 'duration' is compared to some specified 'threshold' to generate the 'warmup is complete' event

Adding a threshold means that training runs can be of optimal length and can adapt to changes in the environment and/or code. Coupled with an "abort training" action, we would then have a training run that targets a threshold and aborts if the threshold is not reached within time N.

Currently we have the command line, jcmd and a new Leyden API to support training runs.
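The threshold, time-limit and abort variants above can be made concrete with a small state holder. This is a minimal sketch, assuming the signals arrive as plain method calls; the class, enum and method names are hypothetical and not a Leyden API, and ignoring the threshold until startup has been signalled is an assumption of this sketch.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.LongSupplier;

// Hypothetical sketch of the proposed signals: "startup complete", "warmup complete" (explicit or via a
// duration threshold), an elapsed-time limit, and an optional abort. Not a Leyden API.
final class WarmupTracker {

    enum Outcome { RUNNING, STOP_AND_DUMP, ABORT }

    enum OnTimeLimit { STOP_AND_DUMP, ABORT }

    private final LongSupplier nanoClock;   // e.g. System::nanoTime; injectable for testing
    private final long startNanos;
    private final Duration threshold;       // null means no threshold: only the time limit applies
    private final Duration timeLimit;
    private final OnTimeLimit onTimeLimit;
    private boolean startupComplete;
    private Outcome outcome = Outcome.RUNNING;

    WarmupTracker(LongSupplier nanoClock, Duration threshold, Duration timeLimit, OnTimeLimit onTimeLimit) {
        this.nanoClock = Objects.requireNonNull(nanoClock);
        this.threshold = threshold;
        this.timeLimit = Objects.requireNonNull(timeLimit);
        this.onTimeLimit = Objects.requireNonNull(onTimeLimit);
        this.startNanos = nanoClock.getAsLong();
    }

    // The application reports that startup is complete (ready to work).
    synchronized void startupComplete() {
        startupComplete = true;
    }

    // The application explicitly reports that warmup is done.
    synchronized Outcome warmupComplete() {
        applyTimeLimit();
        if (outcome == Outcome.RUNNING) {
            outcome = Outcome.STOP_AND_DUMP;
        }
        return outcome;
    }

    // A chunk of work took `took`; once startup is complete, meeting the threshold ends warmup.
    synchronized Outcome workCompleted(Duration took) {
        applyTimeLimit();
        if (outcome == Outcome.RUNNING && startupComplete && threshold != null
                && took.compareTo(threshold) <= 0) {
            outcome = Outcome.STOP_AND_DUMP;
        }
        return outcome;
    }

    // Polled periodically (for example from a scheduled task) so the time limit fires without traffic.
    synchronized Outcome poll() {
        applyTimeLimit();
        return outcome;
    }

    private void applyTimeLimit() {
        if (outcome == Outcome.RUNNING && nanoClock.getAsLong() - startNanos >= timeLimit.toNanos()) {
            outcome = (onTimeLimit == OnTimeLimit.ABORT) ? Outcome.ABORT : Outcome.STOP_AND_DUMP;
        }
    }
}
```

In a real run the clock would be System::nanoTime and poll() would be driven by a timer; a threshold of null with OnTimeLimit.STOP_AND_DUMP gives the plain "stop after some time has elapsed" variant.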
Lastly, we'd like to propose a fourth method, an MXBean (built on the new Leyden API). This would allow the developer to provide the startup and warmup indicators internally or externally (their choice), and would allow for runtime analysis using bespoke production systems or JMC, and offline analysis via JFR.
Cheers
Mat
[1] https://bugs.openjdk.org/browse/JDK-8335358
From: leyden-dev <leyden-dev-retn@openjdk.org> on behalf of John Rose <john.r.rose@oracle.com>
Date: Friday, August 16, 2024 at 4:51 PM
To: ioi.lam@oracle.com <ioi.lam@oracle.com>
Cc: leyden-dev@openjdk.org <leyden-dev@openjdk.org>
Subject: Re: EA feedback
Here’s the way I would prefer to think about a “dump command”.
The native way that the JVM represents sequential operations is the method. Talking about methods is therefore a basic way to specify a condition for injecting a JVM operation like training dumps. I would like to figure out a good way to tie the training dump to the invocation of a method, either a single well-known method, or to a method specified (on the command line) by the user.
In fact, it feels like a breakpoint-like operation would be a natural way to view the training dump. You don’t need JVMTI to get it done; you just need a hack in the VM which parallels the existing breakpoint mechanism, but special-cases it to drive a training dump.
Given such a foundation, jsig could then inject a call to a method which is appropriately tied to the dump command.
Sketch of implementation:
When a method is first linked, a list is checked to see if it has a dump event tied to it, and a bit is set on the method. The method’s interpreter entry point might be modified, or perhaps the interpreter just always checks the bit. On entry to the method, before the first bytecode, an upcall tells the VM that it’s time to finish the training run.
The compilers also check this bit, of course.
There is some method deep in the privates of java.base that is always treated this way. That’s what jcmd reaches. There is a command line option which lists more methods to treat this way, something like the CompileOnly command.
As a separate option, the upcall to end the training run might return (allowing the VM to continue) or just exit.
As a separate option, allow the user to specify a count N, so that the training dump happens only after N “hits” on any marked method(s).
I think all this is useful and flexible.
On 13 Aug 2024, at 18:22, ioi.lam@oracle.com wrote:
On 8/13/24 12:42 PM, Ashutosh Mehra wrote:
Being able to trigger assembly/verification via jcmd without exiting, would make this far easier for us to support.
There is a proposed enhancement for doing exactly this (and exploring other ways to trigger end of training run); see https://bugs.openjdk.org/browse/JDK-8335358
I am working on a prototype for dumping with jcmd. It will be similar to the existing "jcmd VM.cds static_dump" command, except that it will also support the dumping of the AOT cache and profile data.
Thanks
- Ioi
Thanks,
- Ashutosh Mehra
On Fri, Aug 9, 2024 at 4:38 PM Danny Thomas <dannyt@netflix.com> wrote:
I tried 24-leydenpremain+2-8 on a few internal applications, some quick feedback below (good to see you folks at the JVM LS!).
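John's sketch (a per-method bit computed at link time, checked on method entry, an optional hit count N, and a choice between returning and exiting) can be modelled in plain Java to make the control flow concrete. This is only a user-level model under stated assumptions: in the proposal the bit lives in the VM and is checked by the interpreter and compilers, while `TrainingDumpTrigger`, `linkBit`, `onEntry` and the well-known method name below are hypothetical. The `compareAndSet` guard assumes the dump should fire exactly once, even if several threads hit marked methods concurrently.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java-level model of the VM-internal mechanism; none of these names exist in HotSpot.
public class TrainingDumpTrigger {

    // Whether the end-of-training upcall returns (VM continues) or exits.
    public enum AfterDump { CONTINUE, EXIT }

    // Hypothetical stand-in for the always-marked method "deep in the privates of java.base".
    public static final String WELL_KNOWN = "hypothetical.internal.TrainingHook.endTraining";

    private final Set<String> markedMethods = new HashSet<>();
    private final int hitsBeforeDump;   // the count N
    private final AfterDump afterDump;
    private final AtomicInteger hits = new AtomicInteger();
    private final AtomicBoolean dumped = new AtomicBoolean();

    public TrainingDumpTrigger(Set<String> userMethods, int hitsBeforeDump, AfterDump afterDump) {
        markedMethods.add(WELL_KNOWN);
        markedMethods.addAll(userMethods); // e.g. parsed from a CompileOnly-style option
        this.hitsBeforeDump = hitsBeforeDump;
        this.afterDump = afterDump;
    }

    // "When a method is first linked": decide once whether its dump bit is set.
    public boolean linkBit(String method) {
        return markedMethods.contains(method);
    }

    // "On entry to the method, before the first bytecode": returns true only for the
    // single entry that should trigger the end-of-training upcall.
    public boolean onEntry(boolean dumpBit) {
        if (!dumpBit) {
            return false; // unmarked methods pay only the bit check
        }
        int n = hits.incrementAndGet(); // hits are counted across all marked methods
        return n >= hitsBeforeDump && dumped.compareAndSet(false, true);
    }

    public AfterDump afterDump() {
        return afterDump;
    }

    public static void main(String[] args) {
        TrainingDumpTrigger trigger =
                new TrainingDumpTrigger(Set.of("com.example.App.handleRequest"), 3, AfterDump.CONTINUE);
        boolean bit = trigger.linkBit("com.example.App.handleRequest");
        for (int call = 1; call <= 5; call++) {
            if (trigger.onEntry(bit)) {
                // prints "end training on call 3, then CONTINUE"
                System.out.println("end training on call " + call + ", then " + trigger.afterDump());
            }
        }
    }
}
```

Keeping the link-time decision separate from the entry-time check mirrors the idea that unmarked methods should cost no more than a bit test; the exit path is only reported here, since actually calling `System.exit` would end the model.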
I'm normally just a lurker here but wanted to clarify one thing ...
On 23/08/2024 2:03 am, Mat Carter wrote:
Having considered the JBS issue [1], we'd like to propose/discuss some additional options, the thought being that while implementation can be staggered it might be helpful to consider them now to help prioritize/rethink/reject.
In summary we currently have the "stop" action, stopping is immediate (now) and operationalized via the following mechanisms:
• Ctrl-Z/D
• System.exit()
• Run to completion (normal termination)
• Unhandled exception :)
Note that in actuality none of those mechanisms are immediate: there are lots of things continuing to happen whilst we are in the process of "stopping", some of which must themselves run to completion (shutdown hooks), whilst others are truly abruptly terminated once the process is blown away. I think Leyden is just looking for a way to signal the end of training data collection, not necessarily the end of the program it is being collected from.
Cheers,
David
Thanks David, I should have been clearer in my response: “stopping” pertains to the recording of the training data, not necessarily the application itself. And “immediate” could be better phrased as “asap”; both are used in contrast to the other option: “stop recording training data when some event happens in the future”.
From: leyden-dev <leyden-dev-retn@openjdk.org> on behalf of David Holmes <david.holmes@oracle.com>
Date: Thursday, August 22, 2024 at 5:18 PM
To: leyden-dev@openjdk.org <leyden-dev@openjdk.org>
Subject: Re: EA feedback
I'm normally just a lurker here but wanted to clarify one thing ...
On 23/08/2024 2:03 am, Mat Carter wrote:
For some background on my thinking here: there are two extremes I’d like to avoid. The first is “clean exit is the only way to stop training”, and the second is the “the more the merrier” configuration extreme. I think we’re already moving away from the first (good!), but I don’t want us to move so far away that we run into the second.
I have a lot of experience with OpenJ9, which had two great tools for hooking into system events: the dump and trace APIs. Unfortunately, they both grew all manner of bells and whistles that required webapps to generate the required set of options. Great tool, terrible UX. All that to say I’m somewhat hesitant about being too configurable.
My preference is to build the minimum hooks into the runtime that allow *external* processes to trigger the end of training. John’s proposal to use a breakpoint-like mechanism spells out the runtime internals of one such mechanism to support an external process (config file, command line, API or jcmd) triggering the “end-training” signal.
Can we spin the two new requests as being part of such an external process? Or are they more naturally VM primitives?
--Dan
From: leyden-dev <leyden-dev-retn@openjdk.org> on behalf of Mat Carter <Matthew.Carter@microsoft.com>
Date: Thursday, August 22, 2024 at 12:04 PM
To: John Rose <john.r.rose@oracle.com>, Ioi Lam <ioi.lam@oracle.com>
Cc: leyden-dev@openjdk.org <leyden-dev@openjdk.org>
Subject: Re: EA feedback
Having considered the JBS issue [1], we'd like to propose/discuss some additional options, the thought being that while implementation can be staggered it might be helpful to consider them now to help prioritize/rethink/reject.
Thanks Dan; I agree.
In the Leyden premain engineering group, it is very common for folks to confront a hard problem (like training run specification) by making long lists of possible solutions. I like our habit of acknowledging those long lists, noting them down somewhere (perhaps in a filed RFE), and then selecting the top one or two items to do, ruthlessly discarding the rest. That’s the way we have made progress, and have avoided “more is merrier”, or “design by committee”, which for us would be a trap.
How do we select the top one or two? By picking the one that will be the simplest to implement and will serve 80-90% of the use cases. For bonus points, we can try to identify the most effective primitive that underlies the myriad of proposed surface features, and steer our engineering in that direction. That’s what I was trying to do in pointing out a “breakpoint-like” mechanism that might be the JVM primitive to build: the eventual primitive, if not the first thing we build.
I will also say that any “training run API” to be invoked by the training run is probably very low down on the list of priorities, even if it is everybody’s first thought of how to run things. (System::trainingIsDone, amiright?) First, it requires modifications to application code, which only a minority of users will welcome (so it’s not an 80% solution). Second, the design of such an API is very subtle, and we don’t have enough experience with the primitives or the workflows to propose a good API. Therefore, if such an API arises, it will be years in the future, after we have covered many other use cases and after we understand how best to shape it. As I noted elsewhere, if we have a flexible “breakpoint-like” primitive for triggering training run dumps, any user can build their own trainingIsDone method on top, if they really want it.
On 23 Aug 2024, at 6:53, Dan Heidinga wrote:
For some background on my thinking here: there are two extremes I’d like to avoid. The first is “clean exit is the only way to stop training”, and the second is the “more the merrier” configuration extreme. I think we’re already moving away from the first (good!), but I don’t want us to move so far away that we run into the second. I have a lot of experience with OpenJ9, which had two great tools for hooking into system events: the dump and trace APIs. Unfortunately, they had both grown all manner of bells and whistles that required webapps to generate the required set of options. Great tool – terrible UX. All that to say I’m somewhat hesitant about making this too configurable.
My preference is to build the minimum hooks into the runtime that allow *external* processes to trigger the end of training. John’s proposal to use a breakpoint-like mechanism spells out the runtime internals of one such mechanism to support an external process (config file, command line, api or jcmd) triggering the “end-training” signal.
Can we spin the two new requests as being part of such an external process? Are they more naturally VM primitives?
--Dan
From: leyden-dev <leyden-dev-retn@openjdk.org> on behalf of Mat Carter <Matthew.Carter@microsoft.com> Date: Thursday, August 22, 2024 at 12:04 PM To: John Rose <john.r.rose@oracle.com>, Ioi Lam <ioi.lam@oracle.com> Cc: leyden-dev@openjdk.org <leyden-dev@openjdk.org> Subject: Re: EA feedback
Having considered the JBS issue [1], we'd like to propose and discuss some additional options. While implementation can be staggered, it might be helpful to consider them now to help prioritize, rethink or reject them.
In summary, we currently have the "stop" action; stopping is immediate (now) and is triggered by the following mechanisms:
• Ctrl-Z/D
• System.exit()
• Run to completion (normal termination)
• Unhandled exception :)
The new request in JBS [1] is to:
• Add a mechanism using 'jcmd' and/or a new Leyden API to trigger the existing "stop" (now) functionality
• Add a variant of the "stop" action where training stops when execution enters a Java method (optionally after N invocations); the method to stop on is specified via the command line (-XX), jcmd and/or a Leyden API
We’d like to propose two additional variants to aid with stopping training runs:
• Stop training after some time has elapsed
• Stop training when some threshold is met
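The first of these needs nothing new from the VM under today's on-exit workflow: since System.exit() already ends a training run, a timer can simply call it. The sketch below assumes exactly that; TimedTrainingStop is a hypothetical helper, and a future jcmd or API trigger could be passed in as the stop action instead.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: runs a "stop training" action once, after a fixed delay. */
final class TimedTrainingStop implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "training-stop-timer");
                t.setDaemon(true); // the timer alone never keeps the VM alive
                return t;
            });
    private final AtomicBoolean fired = new AtomicBoolean();
    private final ScheduledFuture<?> task;

    TimedTrainingStop(Duration delay, Runnable stopAction) {
        this.task = scheduler.schedule(() -> {
            // Record that we fired before running the action, so observers see the flag.
            if (fired.compareAndSet(false, true)) {
                stopAction.run();
            }
        }, delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    boolean hasFired() {
        return fired.get();
    }

    /** Cancels the timer if it has not fired yet and releases the thread. */
    @Override
    public void close() {
        task.cancel(false);
        scheduler.shutdownNow();
    }
}
```

For example, new TimedTrainingStop(Duration.ofMinutes(10), () -> System.exit(0)) would end training ten minutes after construction. Making the timer thread a daemon avoids adding to the dangling non-daemon thread problem that can already keep a VM from exiting.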
The second point comes from considering the definitions of 'startup complete' and 'warmup complete'. Given that the developer is already engaged in AOT training, we should allow the developer to aid the training by:
• Indicating when the application has completed startup (ready to work)
• Indicating when the application has completed warmup, either by:
○ Calling an API to indicate warmup is done
○ Calling an API when a chunk of work has completed, passing in the 'duration'; this 'duration' is compared to some specified 'threshold' to generate the 'warmup is complete' event
Adding a threshold means that training runs can be of optimal length and can adapt to changes in the environment and/or code. Coupled with an optional "abort training" action, this gives a training run that targets a threshold and aborts if the threshold is not reached within time N.
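The threshold-plus-abort idea can be sketched as a small state machine. Everything here is an assumption for illustration: WarmupMonitor is a hypothetical name, and requiring several consecutive chunks under the threshold (rather than a single one) is a choice made here to damp noise, not part of the proposal. Time is passed in explicitly (normally from System.nanoTime()) so the logic is easy to test.

```java
import java.time.Duration;

/** Hypothetical warmup monitor: completes on a run of fast chunks, aborts after a deadline. */
final class WarmupMonitor {
    enum Status { TRAINING, WARMUP_COMPLETE, ABORTED }

    private final Duration threshold;
    private final int requiredConsecutive;
    private final long deadlineNanos;
    private int consecutiveFast;
    private Status status = Status.TRAINING;

    WarmupMonitor(Duration threshold, int requiredConsecutive,
                  long startNanos, Duration maxTrainingTime) {
        if (requiredConsecutive < 1) {
            throw new IllegalArgumentException("requiredConsecutive must be >= 1");
        }
        this.threshold = threshold;
        this.requiredConsecutive = requiredConsecutive;
        this.deadlineNanos = startNanos + maxTrainingTime.toNanos();
    }

    /** Reports one completed chunk of work and returns the resulting status. */
    synchronized Status report(long nowNanos, Duration chunkDuration) {
        if (status != Status.TRAINING) {
            return status; // terminal states are sticky
        }
        // Subtract before comparing, as System.nanoTime() values require.
        if (nowNanos - deadlineNanos >= 0) {
            status = Status.ABORTED;
            return status;
        }
        if (chunkDuration.compareTo(threshold) <= 0) {
            consecutiveFast++;
            if (consecutiveFast >= requiredConsecutive) {
                status = Status.WARMUP_COMPLETE;
            }
        } else {
            consecutiveFast = 0; // a slow chunk resets the run
        }
        return status;
    }
}
```

Note that the abort is only observed when a chunk is reported; pairing it with a timer would cover an application that stops reporting altogether.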
Currently we have the command line, jcmd and a new Leyden API to support training runs. Lastly, we'd like to propose a fourth mechanism, an MXBean (built on the new Leyden API); this would allow the developer to provide the startup and warmup indicators internally or externally (their choice), and would allow runtime analysis using bespoke production systems or JMC, and offline analysis via JFR.
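A minimal sketch of such an MXBean, assuming nothing beyond the standard java.lang.management and javax.management APIs. The interface, the object name com.example.training:type=TrainingControl and the two flags are hypothetical; a real version would forward the signals to whatever Leyden API ends up existing.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class TrainingControlDemo {
    /** Hypothetical management interface; JMX requires MXBean interfaces to be public. */
    public interface TrainingControlMXBean {
        boolean isStartupComplete(); // read-only attribute "StartupComplete"
        boolean isWarmupComplete();  // read-only attribute "WarmupComplete"
        void markStartupComplete();  // operation
        void markWarmupComplete();   // operation
    }

    /** Only records the signals; it does not talk to the VM. */
    static final class TrainingControl implements TrainingControlMXBean {
        private final AtomicBoolean startup = new AtomicBoolean();
        private final AtomicBoolean warmup = new AtomicBoolean();

        @Override public boolean isStartupComplete() { return startup.get(); }
        @Override public boolean isWarmupComplete() { return warmup.get(); }
        @Override public void markStartupComplete() { startup.set(true); }
        @Override public void markWarmupComplete() { warmup.set(true); }
    }

    /** Registers the bean with the platform MBean server under the given name. */
    static ObjectName register(TrainingControlMXBean bean, String objectName) throws Exception {
        ObjectName name = new ObjectName(objectName);
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(bean, name);
        return name;
    }
}
```

Once registered, the operations can be invoked from inside the application or from JConsole, JMC or any remote JMX client, which matches the internal-or-external choice described above.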
Cheers
Mat
[1] https://bugs.openjdk.org/browse/JDK-8335358
From: leyden-dev <leyden-dev-retn@openjdk.org> on behalf of John Rose <john.r.rose@oracle.com> Date: Friday, August 16, 2024 at 4:51 PM To: ioi.lam@oracle.com <ioi.lam@oracle.com> Cc: leyden-dev@openjdk.org <leyden-dev@openjdk.org> Subject: Re: EA feedback
Here’s the way I would prefer to think about a “dump command”.
The native way that the JVM represents sequential operations is the method. Talking about methods is therefore a basic way to specify a condition for injecting a JVM operation like training dumps. I would like to figure out a good way to tie the training dump to the invocation of a method, either a single well-known method, or to a method specified (on the command line) by the user.
In fact, it feels like a breakpoint-like operation would be a natural way to view the training dump. You don’t need JVMTI to get it done; you just need a hack in the VM which parallels the existing breakpoint mechanism, but special-cases it to drive a training dump.
Given such a foundation, jsig could then inject a call to a method which is appropriately tied to the dump command.
Sketch of implementation:
When a method is first linked, a list is checked to see if it has a dump event tied to it, and a bit is set on the method. The method’s interpreter entry point might be modified, or perhaps the interpreter just always checks the bit. On entry to the method, before the first bytecode, an upcall tells the VM that it’s time to finish the training run.
The compilers also check this bit, of course.
There is some method deep in the privates of java.base that is always treated this way. That’s what jcmd reaches. There is a command line option which lists more methods to treat this way, something like the CompileOnly command.
As a separate option, the upcall to end the training run might return (allowing the VM to continue) or just exit.
As a separate option, allow the user to specify a count N, so that the training dump happens only after N “hits” on any marked method(s).
I think all this is useful and flexible.
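The trigger logic in this sketch can be modelled in a few lines of Java, purely to make the semantics concrete. In the VM it would be a bit on the method checked by the interpreter and compilers, not Java code; DumpTriggerModel, the Class::method naming and the AfterDump enum are all hypothetical. The counter is shared across all marked methods, matching the "N hits on any marked method(s)" option.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical Java model of a method-entry training-dump trigger. */
final class DumpTriggerModel {
    /** Whether the upcall returns (VM continues) or exits after the dump. */
    enum AfterDump { CONTINUE, EXIT }

    private final Set<String> markedMethods; // e.g. a java.base method plus a user-supplied list
    private final long hitsBeforeDump;       // the optional count N
    private final AfterDump afterDump;
    private final AtomicLong hits = new AtomicLong();

    DumpTriggerModel(Set<String> markedMethods, long hitsBeforeDump, AfterDump afterDump) {
        if (hitsBeforeDump < 1) {
            throw new IllegalArgumentException("N must be >= 1");
        }
        this.markedMethods = Set.copyOf(markedMethods);
        this.hitsBeforeDump = hitsBeforeDump;
        this.afterDump = afterDump;
    }

    /** Models the check made when a method is first linked: does it carry the dump bit? */
    boolean isMarked(String qualifiedMethodName) {
        return markedMethods.contains(qualifiedMethodName);
    }

    /**
     * Models entry to a marked method, before its first bytecode.
     * Returns true exactly once, on the Nth hit across all marked methods.
     */
    boolean onMarkedEntry() {
        return hits.incrementAndGet() == hitsBeforeDump;
    }

    AfterDump afterDump() {
        return afterDump;
    }
}
```

Because incrementAndGet() returns each value to exactly one caller, only the Nth hit triggers the dump even when several threads enter marked methods concurrently.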
On 13 Aug 2024, at 18:22, ioi.lam@oracle.com wrote:
On 8/13/24 12:42 PM, Ashutosh Mehra wrote:
Being able to trigger assembly/verification via jcmd without exiting would make this far easier for us to support.
There is a proposed enhancement for doing exactly this (and exploring other ways to trigger end of training run); see https://bugs.openjdk.org/browse/JDK-8335358
I am working on a prototype for dumping with jcmd. It will be similar to the existing "jcmd VM.cds static_dump" command, except that it will also support dumping the AOT cache and profile data.
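For comparison, an external process can already drive today's CDS dumps through jcmd. The sketch below assumes only the existing VM.cds static_dump and dynamic_dump diagnostic commands of recent JDKs (dynamic_dump also needs the target JVM to have been started with -XX:+RecordDynamicDumpInfo). The command name of the prototype described above is not given, so CdsDumpTrigger is a hypothetical wrapper, not its interface.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper that asks another JVM to write a CDS archive via jcmd. */
final class CdsDumpTrigger {
    /** Builds e.g. [jcmd, 1234, VM.cds, dynamic_dump, app.jsa]; archiveFile may be null. */
    static List<String> command(long pid, String subcommand, String archiveFile) {
        if (!subcommand.equals("static_dump") && !subcommand.equals("dynamic_dump")) {
            throw new IllegalArgumentException("unsupported VM.cds subcommand: " + subcommand);
        }
        List<String> cmd = new ArrayList<>(List.of("jcmd", Long.toString(pid), "VM.cds", subcommand));
        if (archiveFile != null) {
            cmd.add(archiveFile);
        }
        return cmd;
    }

    /** Runs jcmd, sharing this process's console, and returns its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }
}
```

For example, CdsDumpTrigger.run(CdsDumpTrigger.command(pid, "dynamic_dump", "app.jsa")) asks the JVM with that pid to write a dynamic archive without requiring it to exit.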
Thanks
- Ioi
Thanks,
- Ashutosh Mehra
On Fri, Aug 9, 2024 at 4:38 PM Danny Thomas <dannyt@netflix.com> wrote:
I tried 24-leydenpremain+2-8 on a few internal applications, some quick feedback below (good to see you folks at the JVM LS!).
If a jar has a Class-Path attribute and one or more of those libraries are explicitly on the classpath, it causes the actual and expected classpath to always differ. This is also the case currently with CDS of course, but this feature is sure to be deployed far more broadly than CDS is currently, so likely something you want to look at:
[0.057s][info][class,path] non-existent Class-Path entry lib/failureaccess-1.0.1.jar
[0.057s][info][class,path] opened: lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[0.057s][info][class,path] library = lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
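One way to spot this situation before a training run is to compare each jar's manifest Class-Path entries with the explicit classpath. The sketch below is a hypothetical diagnostic (ClassPathOverlap and the lib/guava.jar path used with it are invented here, not part of any JDK tool); it resolves entries as plain relative paths against the jar's directory and ignores URL escaping, so it only approximates how the JVM expands Class-Path.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical check for manifest Class-Path entries that are also listed explicitly. */
final class ClassPathOverlap {
    static List<Path> duplicatedEntries(Path jar, Manifest manifest, List<Path> explicitClassPath) {
        List<Path> overlaps = new ArrayList<>();
        String classPath = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (classPath == null || classPath.isBlank()) {
            return overlaps; // no Class-Path attribute, nothing can be duplicated
        }
        Set<Path> explicit = new HashSet<>();
        for (Path p : explicitClassPath) {
            explicit.add(p.toAbsolutePath().normalize());
        }
        // Class-Path entries are space-separated and relative to the jar's own directory.
        Path baseDir = jar.toAbsolutePath().normalize().getParent();
        for (String entry : classPath.trim().split("\\s+")) {
            Path resolved = baseDir.resolve(entry).normalize();
            if (explicit.contains(resolved)) {
                overlaps.add(resolved);
            }
        }
        return overlaps;
    }
}
```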
Startup time when training seems to be on par with ArchiveClassesAtExit in JDK 21, but it's about a 3.5x startup time penalty for one of our typical Spring Boot applications. From a back-to-back run on my machine (AMD EPYC 9R14, 32 cores, 123G, Ubuntu 22.04.4 LTS):
Started App in 7.698 seconds (process running for 8.229)
Started App in 26.247 seconds (process running for 29.262) - w/ CacheDataStore, Training Run
Started App in 4.341 seconds (process running for 4.917) - w/ CacheDataStore, Production Run
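A quick check of the arithmetic behind the "about 3.5x" figure, assuming the first, unlabelled run is the baseline without CacheDataStore:

```java
/** Ratios between the three quoted runs: baseline, training, production. */
final class StartupRatios {
    static double ratio(double numeratorSeconds, double denominatorSeconds) {
        return numeratorSeconds / denominatorSeconds;
    }

    public static void main(String[] args) {
        // "Started App in" and "process running for" figures, in seconds.
        double[] started = {7.698, 26.247, 4.341};
        double[] process = {8.229, 29.262, 4.917};
        System.out.printf("training/baseline:   %.2fx started, %.2fx process%n",
                ratio(started[1], started[0]), ratio(process[1], process[0]));
        System.out.printf("baseline/production: %.2fx started, %.2fx process%n",
                ratio(started[0], started[2]), ratio(process[0], process[2]));
    }
}
```

It prints roughly 3.41x (started) and 3.56x (process) for training over baseline, and 1.77x and 1.67x for baseline over the production run.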
I also got a crash on one attempt, I can't remember what I did to cause this unfortunately:
Stack: [0x00007f3949ab0000,0x00007f3949bb0000], sp=0x00007f3949bae628, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x42ca30] ArchiveBuilder::get_buffered_addr(unsigned char*) const+0x40
V [libjvm.so+0xce4aa5] VM_PopulateDumpSharedSpace::doit()+0x395
V [libjvm.so+0x100ae69] VM_Operation::evaluate()+0x109
V [libjvm.so+0x100e348] VMThread::evaluate_operation(VM_Operation*)+0xe8
V [libjvm.so+0x10142fb] VMThread::inner_execute(VM_Operation*)+0x35b
V [libjvm.so+0x101460f] VMThread::run()+0x16f
V [libjvm.so+0xf6e5cf] Thread::call_run()+0x9f
V [libjvm.so+0xd74e13] thread_native_entry(Thread*)+0x183
C [libc.so.6+0x98b07]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000030
Thinking ahead to operationalizing AOT, while a single-shot/on-exit workflow is great for iterating locally, requiring the VM to exit makes this more difficult to operationalize at scale:
1. We'll perform training and assembly on test, production canary and production instances on behalf of application owners and handle distribution of the archives. Depending on when we're able to perform a training run, it'll have different benefits, i.e.:
   1. Test environment will at least improve startup performance, with a mixed benefit for warm up depending on the kind of traffic they take in test
   2. If an application uses canary deployments we'll have a full production profile prior to the full production deployment, and all instances will come up hot
   3. If we reach production with only a test environment profile, we'll perform a training run in production, so instances that scale up following that run will come up hot (completely cold instances for an initial deployment are less of a concern, because we deploy immutably and get a natural warm-up period while we have 200% capacity online for a cluster)
2. It's currently not a problem if a VM doesn't exit completely due to a dangling non-daemon thread or hung shutdown hook
Being able to trigger assembly/verification via jcmd without exiting would make this far easier for us to support. If the overhead of the instrumentation for CDS can be avoided, being able to take a snapshot at any time on any VM would be better still, but that wouldn't be an impediment for us: we'll know that the instance will be used for training at boot time.
We build nightlies of all the currently active OpenJDK projects, so if you land anything on premain between EA builds that you'd like us to try, let us know!
Cheers, Danny
participants (8)
- Ashutosh Mehra
- Calvin Cheung
- Dan Heidinga
- Danny Thomas
- David Holmes
- ioi.lam@oracle.com
- John Rose
- Mat Carter