RFR: 8368465: [leyden] Improve precompiler method selection code
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.

Notable improvements:

1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and allows implementing policies for compiling on any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This avoids a fairly awkward path to get compile IDs, the removal of which _I suspect_ is the cause of the performance improvement.

With the new code, we compile a tad more A2 code. I have not dug into why the current code accepts fewer methods for compilation. The new code improves performance everywhere, so I suggest we just accept that and move on.

Additional testing:

- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/leyden/pull/99/files
Webrev: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8368465
Stats: 116 lines in 4 files changed: 59 ins; 27 del; 30 mod
Patch: https://git.openjdk.org/leyden/pull/99.diff
Fetch: git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
PR: https://git.openjdk.org/leyden/pull/99
On Tue, 23 Sep 2025 12:33:23 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
javac test (1000 iterations trained, 50 iterations production)

```
# --- Before
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):   350.4 ms ±   4.9 ms  [User: 683.2 ms, System: 104.5 ms]
  Range (min … max): 343.8 ms … 359.8 ms  10 runs

Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:+UnlockExperimentalVMOptions -XX:+PreloadOnly -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):   481.2 ms ±   3.4 ms  [User: 475.0 ms, System: 55.8 ms]
  Range (min … max): 477.0 ms … 487.9 ms  10 runs

# --- After
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):   344.0 ms ±   2.0 ms  [User: 445.7 ms, System: 82.1 ms]
  Range (min … max): 342.0 ms … 348.2 ms  10 runs

Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:+UnlockExperimentalVMOptions -XX:+PreloadOnly -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):   489.9 ms ±   1.7 ms  [User: 481.2 ms, System: 58.6 ms]
  Range (min … max): 487.2 ms … 492.8 ms  10 runs
```

`user` time improves significantly; I believe this is due to more `A2` code being used from the archive rather than being compiled on the fly. Larger benchmarks all improve in 1-core tests.
quarkus-getting-started:

```
Run,Old CDS + AOT,New CDS + AOT
1,301,284
2,309,278
3,299,290
4,307,280
5,300,280
6,304,293
7,298,283
8,308,301
9,319,284
10,307,288
Geomean,305.14,286.02 (1.07x improvement)
Stdev,5.96,6.69
```

helidon-quickstart-se:

```
Run,Old CDS + AOT,New CDS + AOT
1,200,197
2,228,199
3,211,198
4,218,200
5,214,201
6,221,200
7,214,207
8,220,200
9,211,197
10,222,199
Geomean,215.77,199.78 (1.08x improvement)
Stdev,7.34,2.71
```

micronaut-first-app:

```
Run,Old CDS + AOT,New CDS + AOT
1,256,224
2,250,239
3,259,232
4,264,240
5,252,225
6,250,236
7,248,234
8,265,244
9,246,231
10,265,234
Geomean,255.41,233.82 (1.09x improvement)
Stdev,6.96,5.99
```

spring-boot-getting-started:

```
Run,Old CDS + AOT,New CDS + AOT
1,567,567
2,581,557
3,581,564
4,575,560
5,571,540
6,571,548
7,575,557
8,575,553
9,568,543
10,571,552
Geomean,573.48,554.04 (1.04x improvement)
Stdev,4.59,8.25
```

spring-petclinic:

```
Run,Old CDS + AOT,New CDS + AOT
1,3440,3384
2,3375,3391
3,3367,3379
4,3371,3375
5,3444,3378
6,3391,3399
7,3403,3371
8,3391,3378
9,3454,3391
10,3409,3374
Geomean,3404.37,3381.99 (1.01x improvement)
Stdev,30.09,8.54
```

-------------

PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3323828350
PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3323832493
Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision:
 - Touchup
 - Touchups

-------------

Changes:
 - all: https://git.openjdk.org/leyden/pull/99/files
 - new: https://git.openjdk.org/leyden/pull/99/files/d63bde8e..77b8b3ef

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=01
 - incr: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=00-01

Stats: 6 lines in 1 file changed: 3 ins; 2 del; 1 mod
Patch: https://git.openjdk.org/leyden/pull/99.diff
Fetch: git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
PR: https://git.openjdk.org/leyden/pull/99
Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
 - Merge branch 'premain' into JDK-8368465-precompiler-method-select
 - Merge branch 'premain' into JDK-8368465-precompiler-method-select
 - Touchup
 - Touchups
 - Fix

-------------

Changes:
 - all: https://git.openjdk.org/leyden/pull/99/files
 - new: https://git.openjdk.org/leyden/pull/99/files/77b8b3ef..c1ceda27

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=02
 - incr: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=01-02

Stats: 171272 lines in 2019 files changed: 136228 ins; 22676 del; 12368 mod
Patch: https://git.openjdk.org/leyden/pull/99.diff
Fetch: git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
PR: https://git.openjdk.org/leyden/pull/99
On Fri, 17 Oct 2025 07:04:26 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.

Notable improvements:

1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and allows implementing policies for compiling on any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.

Additional testing:

- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Touchup
- Touchups
- Fix
Ready for review, folks. There are clear performance benefits to doing this.

-------------

PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3414157076
On Fri, 17 Oct 2025 07:04:26 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
A few questions.

src/hotspot/share/compiler/precompiler.cpp line 76:
```
74: case CompLevel_full_profile:
75:   // We do not include C1 full profiled methods at this time.
76:   // TODO: See if it is profitable to do so. This requires MDO support in AOTCache.
```
We already support references to MDO from C1-compiled code, because we cache tier 2 code which does profiling. But we may miss a few places in tier 3 code.

src/hotspot/share/compiler/precompiler.cpp line 124:
```
122: MethodData* md = mtd->final_profile();
123: if (md != nullptr) {
124:   count += md->backedge_count();
```
Hmm, this will put methods with hot loops up front.

src/hotspot/share/compiler/precompiler.cpp line 143:
```
141: if (c1 < c2) return +1;
142:
143: // Otherwise, break the tie by code size: largest methods go first.
```
What is the reason for larger methods to go first? Can we use the compile ID here instead?

-------------

PR Review: https://git.openjdk.org/leyden/pull/99#pullrequestreview-3351199210
PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2440650794
PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2440657543
PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2440660187
On Fri, 17 Oct 2025 17:14:54 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
src/hotspot/share/compiler/precompiler.cpp line 76:
```
74: case CompLevel_full_profile:
75:   // We do not include C1 full profiled methods at this time.
76:   // TODO: See if it is profitable to do so. This requires MDO support in AOTCache.
```
We already support references to MDO from C1-compiled code, because we cache tier 2 code which does profiling. But we may miss a few places in tier 3 code.
I thought we only record MCS, and reinstantiate MDOs. But my recollection is vague. I dropped this to avoid confusion.
src/hotspot/share/compiler/precompiler.cpp line 124:
```
122: MethodData* md = mtd->final_profile();
123: if (md != nullptr) {
124:   count += md->backedge_count();
```
Hmm, this will put methods with hot loop up front.
Yes, this is intentional: this effectively puts the methods that are profitable to (pre)load first, so they: a) do not linger in the interpreter too much; b) do not trigger JIT compilation before the AOT code is able to (pre)load. The methods with hot back-branches are those methods :)
src/hotspot/share/compiler/precompiler.cpp line 143:
```
141: if (c1 < c2) return +1;
142:
143: // Otherwise, break the tie by code size: largest methods go first.
```
What is the reason for larger methods to go first? Can we use the compile ID here instead?
My logic here was similar to the hot-methods case. If we have lost the game of "preload the AOT code before the JIT got triggered", and the JIT got triggered, we want to prioritize larger methods, as they are more likely to take more time to JIT compile. In other words, I think if you lost to the JIT timing-wise, you want to preempt the fattest JIT compiles first. But it is only a bet. If we ever record compilation time in nmethods/profiles, we could use that to break the tie.

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2441293292
PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2441300763
PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2441308216
On Fri, 17 Oct 2025 21:31:26 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
I am not sure I understand how a different pre/AOT-compilation order can affect the performance of the production run. We bulk-load all "Preload" AOT code, so ordering does not matter for it, even if we load in a selected order. It is one thread which does the loading, and it is blocking (I am actually playing with spreading this preload over all compiler threads; I did not see much effect on startup).

The only explanation is that preload happens only when C2 compiler threads are initialized (Preload AOT code is C2-compiled code), and it happens simultaneously with C1 thread initialization; C1 threads could become available for C1 compilation sooner than we finish preloading, especially on machines with a small number of cores. I did observe that we start loading A1 and A2 code first (even normal C1 compilations) before we start preloading AP4. Is that what you are trying to solve here?

The invocation counters should be roughly the same for methods without loops (10000 to trigger C2 compilation). They could be different if code was deoptimized and ran in the interpreter. The only difference is the backedge counter. So in this sense you push methods with hot loops up front, as we talked about in the other comment. This may affect performance, but it would depend on the application.

I agree with ordering by size (or time spent in compilation), but only for methods which did not have A1 or A2 code. Which should not be the case: if we have AP4, we will have A1 and A2 for it. I am still not convinced by this.

Maybe we should try to move `AOTCodeCache::preload_code()` to just after `SystemDictionary::compute_java_loaders()`, because it does not depend on training data. So we can have AP4 sooner. Mmapping directly into the CodeCache will also speed up preloading; it is on our to-do list.

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2441456207
On Fri, 17 Oct 2025 23:50:02 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
We bulk-load all "Preload" AOT code, so ordering does not matter for it, even if we load in a selected order. It is one thread which does the loading, and it is blocking (I am actually playing with spreading this preload over all compiler threads; I did not see much effect on startup).
Um, I don't think that's the case? Preloading is asynchronous. See `CompileBroker::compile_method`:

```
bool is_blocking = ReplayCompiles ||
                   !directive->BackgroundCompilationOption ||
                   (PreloadBlocking && (compile_reason == CompileTask::Reason_Preload));
compile_method_base(method, osr_bci, comp_level, hot_count, compile_reason,
                    requires_online_compilation, is_blocking, THREAD);
```

We have the option to _make_ preload blocking (`PreloadBlocking`), but it is turned off by default. We know enabling `+PreloadBlocking` is counter-productive, because it could easily take hundreds of milliseconds.

So while the compilers are working through preloading the code, the application runs and can trigger compilations. Sorting the preloaded methods allows loading the hottest code before that code transitions to normal compilation. This is the problem I am trying to mitigate.

Maybe the compilation policy should actually participate in preloading: i.e. if there is hot code that transitions from T0 to any other level, attempt the preload first, in case normal preloading is lagging behind. That would be more intrusive, though, so as the conservative approach I would like to prioritize the more profitable preload code first.

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2580796305
On Tue, 2 Dec 2025 11:35:43 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
So while the compilers are working through preloading the code, the application runs and can trigger compilations. Sorting the preloaded methods allows loading the hottest code before that code transitions to normal compilation. This is the problem I am trying to mitigate.
Okay, I think I see when the compile ID may screw us up. If a T4 compilation happens several times for the same method during the training run due to deoptimization, we will cache only the last corresponding AP4:

```
if (entry->for_preload()) {
  if (entry->not_entrant()) {
    // Skip not entrant preload code:
```

Such an entry will have a high compile ID. We could keep the early ID and use it for the cached AP4 to avoid this.

Which leads to another issue. In the initial AOT code implementation, I kept a deoptimization counter for A4 and used it when searching for A4 code to load in the production run. We removed that counter but kept all versions of A4, and `find_entry()` will return the first A4 it finds, which may have a lot more uncommon traps than the latest A4 version. Maybe we should filter A4 the same way we do for AP4.

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2582832469
On Tue, 2 Dec 2025 21:28:53 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
By "blocking" I mean that we have only one AOT compiler thread to load AP4.

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2582836251
On Tue, 2 Dec 2025 21:30:18 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
Maybe your change reduced the number of AOT-compiled nmethods in the cache, which allows faster processing. Please run the production run with `-Xlog:aot+codecache+init=debug -XX:+CITime` to see how many AOT nmethods are in the AOT cache and how many were loaded/used.

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2582859309
On Tue, 2 Dec 2025 21:38:57 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
Actually... Now I see we generate, and thus use, substantially more A2 code! This also aligns with the performance data: we have way fewer C1 compilations with this patch.

```
# ==== Baseline

Create:
[4.593s][info][precompile] Precompilation for level 1 finished (94 successful out of 94 total)
[4.604s][info][precompile] Precompilation for level 2 finished (131 successful out of 131 total)
[4.814s][info][precompile] Precompilation for level 2 finished (1852 successful out of 1852 total)
[6.035s][info][precompile] Precompilation for level 4 finished (1660 successful out of 1660 total)
[4.589s][info][precompile] Precompilation for level 5 finished (1660 successful out of 1660 total)

Use:
Tier1 {speed: 42838.159 bytes/s; standard: 0.014 s, 582 bytes, 135 methods; ...}
Tier2 {speed: 210303.802 bytes/s; standard: 0.311 s, 63857 bytes, 817 methods; ...}
Tier3 {speed: 134013.414 bytes/s; standard: 0.035 s, 4685 bytes, 245 methods; ...}
Tier4 {speed: 69205.374 bytes/s; standard: 0.051 s, 3225 bytes, 13 methods; ...}
AOT Code T1 {speed: 297580.645 bytes/s; standard: 0.001 s, 369 bytes, 94 methods; ...}
AOT Code T2 {speed: 5654043.587 bytes/s; standard: 0.042 s, 237861 bytes, 1969 methods; ...}
AOT Code T4 {speed: 25219362.296 bytes/s; standard: 0.029 s, 737408 bytes, 927 methods; ...}
AOT Code T5 {speed: 30793594.418 bytes/s; standard: 0.048 s, 1474270 bytes, 1658 methods; ...}

# ==== Patched

Create:
[3.984s][info][precompile] Precompilation for level 1 finished (311 successful out of 311 total)
[4.382s][info][precompile] Precompilation for level 2 finished (2752 successful out of 2752 total)
[4.383s][info][precompile] Precompilation for level 3 finished (0 successful out of 0 total)
[5.392s][info][precompile] Precompilation for level 4 finished (1641 successful out of 1641 total)
[3.972s][info][precompile] Precompilation for level 5 finished (1641 successful out of 1641 total)

Use:
Tier1 {speed: 0.000 bytes/s; standard: 0.000 s, 0 bytes, 0 methods; ...
Tier2 {speed: 579987.470 bytes/s; standard: 0.026 s, 15526 bytes, 44 methods; ...
Tier3 {speed: 181499.273 bytes/s; standard: 0.026 s, 4761 bytes, 254 methods; ...
Tier4 {speed: 77265.133 bytes/s; standard: 0.027 s, 2087 bytes, 12 methods; ...
AOT Code T1 {speed: 432360.583 bytes/s; standard: 0.002 s, 942 bytes, 228 methods; ...
AOT Code T2 {speed: 6664604.248 bytes/s; standard: 0.042 s, 281287 bytes, 2735 methods; ...
AOT Code T4 {speed: 26296881.658 bytes/s; standard: 0.026 s, 682331 bytes, 924 methods; ...
AOT Code T5 {speed: 33814172.284 bytes/s; standard: 0.042 s, 1430045 bytes, 1632 methods; ...
```

So I have changed something in the selection code that takes on more A2 compiles, profitably. I have not yet confirmed if the preload order has any effect on top of that. Investigating...

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2584978192
On Wed, 3 Dec 2025 12:46:31 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
LOL, I think I found the performance bug in the original code that I fixed by accident. `MTD::highest_level()` includes the cases where the method is inlined. For T3 code, it would return `4` if we ended up inlining that method into T4 code. That would fail the inclusion check for A2 compilation, even though we did have a legit top-level T3 compile. I would say the fact we have inlined T3 into _some_ T4 should _not_ disqualify A2 compilation.

This change alone gives the same kind of performance boost as my patch:

```
diff --git a/src/hotspot/share/compiler/precompiler.cpp b/src/hotspot/share/compiler/precompiler.cpp
index 04f95857a63..8a5da803b04 100644
--- a/src/hotspot/share/compiler/precompiler.cpp
+++ b/src/hotspot/share/compiler/precompiler.cpp
@@ -84,7 +84,7 @@ class PrecompileIterator : StackObj {
   static int compile_id(Method* m, int level) {
     MethodTrainingData* mtd = m->method_holder()->is_loaded() ? MethodTrainingData::find(methodHandle(Thread::current(), m)) : nullptr;
-    if (mtd != nullptr && mtd->highest_level() == level) {
+    if (mtd != nullptr && mtd->highest_top_level() == level) {
       CompileTrainingData* ctd = mtd->last_toplevel_compile(level);
       if (ctd != nullptr) {
         return ctd->compile_id();
```

This thing is confusing, and yet another reason why this PR looks more understandable to me: it very explicitly checks `MTD::highest_top_level()` when deciding whether to accept the method. With this bug, the old code does not do this, and also does not implicitly participate in filtering by `compile_id() < INT_MAX`.

-------------

PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2585116344
On Wed, 3 Dec 2025 13:27:39 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Actually... Now I see we generate, and thus use, substantially more A2 code! This also aligns with the performance data: we have way fewer C1 compilations with this patch.
# ==== Baseline
Create:
[4.593s][info][precompile] Precompilation for level 1 finished (94 successful out of 94 total)
[4.604s][info][precompile] Precompilation for level 2 finished (131 successful out of 131 total)
[4.814s][info][precompile] Precompilation for level 2 finished (1852 successful out of 1852 total)
[6.035s][info][precompile] Precompilation for level 4 finished (1660 successful out of 1660 total)
[4.589s][info][precompile] Precompilation for level 5 finished (1660 successful out of 1660 total)
Use:
Tier1 {speed: 42838.159 bytes/s; standard: 0.014 s, 582 bytes, 135 methods; ...}
Tier2 {speed: 210303.802 bytes/s; standard: 0.311 s, 63857 bytes, 817 methods; ...}
Tier3 {speed: 134013.414 bytes/s; standard: 0.035 s, 4685 bytes, 245 methods; ...}
Tier4 {speed: 69205.374 bytes/s; standard: 0.051 s, 3225 bytes, 13 methods; ...}
AOT Code T1 {speed: 297580.645 bytes/s; standard: 0.001 s, 369 bytes, 94 methods; ...}
AOT Code T2 {speed: 5654043.587 bytes/s; standard: 0.042 s, 237861 bytes, 1969 methods; ...}
AOT Code T4 {speed: 25219362.296 bytes/s; standard: 0.029 s, 737408 bytes, 927 methods; ...}
AOT Code T5 {speed: 30793594.418 bytes/s; standard: 0.048 s, 1474270 bytes, 1658 methods; ...}
# ==== Patched
Create:
[3.984s][info][precompile] Precompilation for level 1 finished (311 successful out of 311 total)
[4.382s][info][precompile] Precompilation for level 2 finished (2752 successful out of 2752 total)
[4.383s][info][precompile] Precompilation for level 3 finished (0 successful out of 0 total)
[5.392s][info][precompile] Precompilation for level 4 finished (1641 successful out of 1641 total)
[3.972s][info][precompile] Precompilation for level 5 finished (1641 successful out of 1641 total)
Use:
Tier1 {speed: 0.000 bytes/s; standard: 0.000 s, 0 bytes, 0 methods; ...
Tier2 {speed: 579987.470 bytes/s; standard: 0.026 s, 15526 bytes, 44 methods; ...
Tier3 {speed: 181499.273 bytes/s; standard: 0.026 s, 4761 bytes, 254 methods; ...
Tier4 {speed: 77265.133 bytes/s; standard: 0.027 s, 2087 bytes, 12 methods; ...
AOT Code T1 {speed: 432360.583 bytes/s; standard: 0.002 s, 942 bytes, 228 methods; ...
AOT Code T2 {speed: 6664604.248 bytes/s; standard: 0.042...
LOL, I think I found the performance bug in the original code that I fixed by accident. `MTD::highest_level()` includes the cases when the method is inlined. For T2/T3 code, it would return `4` if we ended up inlining that method into T4 code, which would fail the inclusion check for A2 compilation, even though we did have a legitimate top-level T2/T3 compile. I would say the fact that we have inlined T2/T3 in _some_ T4 should _not_ disqualify A2 compilation.
This change alone gives the same kind of performance boost as my patch:
```
diff --git a/src/hotspot/share/compiler/precompiler.cpp b/src/hotspot/share/compiler/precompiler.cpp
index 04f95857a63..8a5da803b04 100644
--- a/src/hotspot/share/compiler/precompiler.cpp
+++ b/src/hotspot/share/compiler/precompiler.cpp
@@ -84,7 +84,7 @@ class PrecompileIterator : StackObj {
   static int compile_id(Method* m, int level) {
     MethodTrainingData* mtd = m->method_holder()->is_loaded() ? MethodTrainingData::find(methodHandle(Thread::current(), m)) : nullptr;
-    if (mtd != nullptr && mtd->highest_level() == level) {
+    if (mtd != nullptr && mtd->highest_top_level() == level) {
       CompileTrainingData* ctd = mtd->last_toplevel_compile(level);
       if (ctd != nullptr) {
         return ctd->compile_id();
```
This thing is confusing, and yet another reason why I think this PR looks more understandable: it very explicitly checks `MTD::highest_top_level()` when deciding whether to accept the method. It does not do this with this bug, and also does not implicitly participate in filtering by `compile_id() < INT_MAX`.
Reverted the sorting back to compile IDs instead of counters/size, as it does not affect performance, really. ------------- PR Review Comment: https://git.openjdk.org/leyden/pull/99#discussion_r2585220837
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
Drop the mention of MDO
-------------
Changes:
- all: https://git.openjdk.org/leyden/pull/99/files
- new: https://git.openjdk.org/leyden/pull/99/files/c1ceda27..49f95523
Webrevs:
- full: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=03
- incr: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=02-03
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/leyden/pull/99.diff
Fetch: git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
PR: https://git.openjdk.org/leyden/pull/99
On Fri, 17 Oct 2025 21:35:04 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
Drop the mention of MDO
I think we have a small performance "issue" in how we replace existing JITed code with new code, and AOT code loading could be more sensitive to it. We deoptimize the old code before the new code is set under the `NMethodState_lock`: https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/ci/ciEnv.cp... If the lock is held by another thread, we may deoptimize the previous code and go into the interpreter before the new code is set for use. This is present in mainline too, but with normal JIT compilation replacement it may not be noticeable. ------------- PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3417547444
On Fri, 17 Oct 2025 21:35:04 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
Drop the mention of MDO
Another suggestion for this concurrent preloading would be to split the A4 preload code. One set is the current one, which needs to wait for `compute_java_loaders()`. A new (much smaller) one would cover simple methods of classes which are loaded first (`String`, for example), which we can preload much sooner. ------------- PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3417553909
On Sat, 18 Oct 2025 00:16:00 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
Drop the mention of MDO
Another suggestion for this concurrent preloading would be to split the A4 preload code. One set is the current one, which needs to wait for `compute_java_loaders()`. A new (much smaller) one would cover simple methods of classes which are loaded first (`String`, for example), which we can preload much sooner.
Any news on testing, @vnkozlov? ------------- PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3615382305
On Sat, 18 Oct 2025 00:16:00 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
Drop the mention of MDO
Another suggestion for this concurrent preloading would be to split the A4 preload code. One set is the current one, which needs to wait for `compute_java_loaders()`. A new (much smaller) one would cover simple methods of classes which are loaded first (`String`, for example), which we can preload much sooner.
Any news on testing, @vnkozlov?
It is still running. There was a big backlog of testing jobs. There are several failures; I need to run control testing without these changes to see if the failures are new. ------------- PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3615466321
On Fri, 17 Oct 2025 21:35:04 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
Drop the mention of MDO
Getting back to this... ------------- PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3601598908
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Drop the mention of MDO
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Touchup
- Touchups
- Fix
-------------
Changes:
- all: https://git.openjdk.org/leyden/pull/99/files
- new: https://git.openjdk.org/leyden/pull/99/files/49f95523..3d298056
Webrevs:
- full: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=04
- incr: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=03-04
Stats: 158648 lines in 2753 files changed: 91357 ins; 50434 del; 16857 mod
Patch: https://git.openjdk.org/leyden/pull/99.diff
Fetch: git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
PR: https://git.openjdk.org/leyden/pull/99
On Tue, 2 Dec 2025 11:40:23 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Drop the mention of MDO
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Merge branch 'premain' into JDK-8368465-precompiler-method-select
- Touchup
- Touchups
- Fix
I re-merged with current `premain`, re-measured some light benchmarks, and the performance improvements are still there. I still believe this is a useful thing to do for infrastructural reasons (it gives me access to more advanced selection policies), and the performance boost comes as a nice bonus. There are other possibilities for optimizing the interaction with preload code, and those can and should be done separately, IMO.

Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:AOTCache=app.aot -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50

### 2 cores

# Baseline
Time (mean ± σ): 425.7 ms ± 17.6 ms [User: 667.3 ms, System: 96.3 ms]; Range (min … max): 404.8 ms … 458.1 ms; 10 runs
Time (mean ± σ): 427.5 ms ± 18.3 ms [User: 668.7 ms, System: 99.6 ms]; Range (min … max): 399.2 ms … 451.0 ms; 10 runs
Time (mean ± σ): 418.6 ms ± 11.6 ms [User: 657.2 ms, System: 96.1 ms]; Range (min … max): 402.5 ms … 436.7 ms; 10 runs

# Patched
Time (mean ± σ): 373.4 ms ± 11.7 ms [User: 547.1 ms, System: 89.7 ms]; Range (min … max): 359.3 ms … 397.5 ms; 10 runs
Time (mean ± σ): 363.4 ms ± 8.5 ms [User: 511.6 ms, System: 92.6 ms]; Range (min … max): 346.2 ms … 373.7 ms; 10 runs
Time (mean ± σ): 370.4 ms ± 11.9 ms [User: 520.3 ms, System: 93.4 ms]; Range (min … max): 353.4 ms … 384.3 ms; 10 runs

-------------
PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3602972698
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision:
- More cosmetics
- Improve compile ID sorting
- Revert sorting by method count
-------------
Changes:
- all: https://git.openjdk.org/leyden/pull/99/files
- new: https://git.openjdk.org/leyden/pull/99/files/3d298056..fc30a139
Webrevs:
- full: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=05
- incr: https://webrevs.openjdk.org/?repo=leyden&pr=99&range=04-05
Stats: 60 lines in 3 files changed: 14 ins; 35 del; 11 mod
Patch: https://git.openjdk.org/leyden/pull/99.diff
Fetch: git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
PR: https://git.openjdk.org/leyden/pull/99
On Wed, 3 Dec 2025 13:58:45 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision:
- More cosmetics
- Improve compile ID sorting
- Revert sorting by method count
Good. Let me test it. ------------- PR Review: https://git.openjdk.org/leyden/pull/99#pullrequestreview-3536795120
On Wed, 3 Dec 2025 20:25:38 GMT, Vladimir Kozlov <kvn@openjdk.org> wrote:
Good. Let me test it.
Thanks! I hope we can integrate it this year :) ------------- PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3613417063
On Wed, 3 Dec 2025 13:58:45 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision:
- More cosmetics - Improve compile ID sorting - Revert sorting by method count
Testing results are a mess :( for both these changes and the control. But I don't see anything alarming. I approve the changes. ------------- Marked as reviewed by kvn (Committer). PR Review: https://git.openjdk.org/leyden/pull/99#pullrequestreview-3545836697
On Wed, 3 Dec 2025 13:58:45 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision:
- More cosmetics - Improve compile ID sorting - Revert sorting by method count
Thanks! Here goes. ------------- PR Comment: https://git.openjdk.org/leyden/pull/99#issuecomment-3618053352
On Tue, 23 Sep 2025 12:33:23 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:
Forked from [JDK-8366681](https://bugs.openjdk.org/browse/JDK-8366681): there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.
Notable improvements:
1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and makes it possible to implement policies for compiling at any A* level based on observing top-compiled T* levels.
2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
Additional testing:
- [x] Performance tests (see comments)
- [x] Linux x86_64 server fastdebug, `runtime/cds`
This pull request has now been integrated. Changeset: 9c83531e Author: Aleksey Shipilev <shade@openjdk.org> URL: https://git.openjdk.org/leyden/commit/9c83531e88020f5762116020bcf472511058cf... Stats: 99 lines in 2 files changed: 39 ins; 27 del; 33 mod 8368465: [leyden] Improve precompiler method selection code Reviewed-by: kvn ------------- PR: https://git.openjdk.org/leyden/pull/99
participants (2):
- Aleksey Shipilev
- Vladimir Kozlov