RFR: Save/load nmethod without going through CodeBuffer [v6]

Ashutosh Mehra asmehra at openjdk.org
Mon Mar 31 08:40:50 UTC 2025


On Fri, 28 Mar 2025 11:51:12 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:

>> This is the prototype for storing and loading nmethods without going through the CodeBuffer.
>> The new implementation is protected by the flag -XX:+UseNewCode2.
>> 
>> Some numbers using this implementation:
>> spring-boot-getting-started [0] shows a startup improvement of ~7.5% and quarkus-getting-started [1] shows an improvement of ~3.5%.
>> 
>> Numbers for Springboot
>> 
>> Old build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode2
>> New build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode2
>> Run,Old CDS + AOT,New CDS + AOT
>> 1,544,512
>> 2,554,525
>> 3,550,508
>> 4,571,515
>> 5,550,506
>> 6,552,515
>> 7,568,510
>> 8,554,517
>> 9,551,506
>> 10,556,508
>> Geomean,554.94,512.17
>> Stdev,7.90,5.65
>> 
>> Numbers for Quarkus:
>> 
>> Old build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode2
>> New build = /home/asmehra/data/ashu-mehra/leyden/build/nmethod-single-copy-load-release/images/jdk with options -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode2
>> Run,Old CDS + AOT,New CDS + AOT
>> 1,359,346
>> 2,360,353
>> 3,373,357
>> 4,376,356
>> 5,366,353
>> 6,360,356
>> 7,361,361
>> 8,347,349
>> 9,355,336
>> 10,374,342
>> Geomean,363.00,350.82
>> Stdev,8.70,7.27
>> 
>> 
>> -Xlog:init logs the load time from AOT code cache at JVM exit.
>> For spring-boot-getting-started without `UseNewCode2`:
>> 
>> [3.459s][info][init]     SC Load Time:           0.202 s
>> [3.459s][info][init]       nmethod register:       0.135 s
>> [3.459s][info][init]       find cached code:       0.007 s
>> 
>> 
>> For spring-boot-getting-started with `UseNewCode2`:
>> 
>> [3.192s][info][init] 
>> [3.192s][info][init]     SC Load Time:           0.138 s
>> [3.192s][info][init]       nmethod register:       0.111 s
>> [3.192s][info][init]       find cached code:       0.006 s
>> 
>> 
>> For quarkus-getting-started without `UseNewCode2`
>> 
>> [0.392s][info][init]     SC Load Time:           0.060 s
>> [0.392s][info][init]       nmethod register:       0.039 s
>> [0.392s][info][init]       find cached code:       0.002 s
>> 
>> 
>> For quarkus-getting-started with `UseNewCode2`
>> 
>> [0.386s][info][init]     SC Load Time:           0.033 s
>> [0.386s][info][init]       nmethod register:       0.027 s
>> [0.386s][info][init]       find cached code:      ...
>
> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
> 
>  - Merge branch 'premain' into nmethod-single-copy-load
>  - Rename write_nmethod_extra_relocations to write_nmethod_loadtime_relocations
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Flush code block when loading the nmethod from archive
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Some more cleanup
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Set compile id
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Minor cleanup
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Handle mutable nmethod data properly when storing/loading from AOT code
>    cache
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Fix win compile failures
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Remove trailing whitespaces
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Include os.hpp to fix compile failures
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - ... and 7 more: https://git.openjdk.org/leyden/compare/d5429885...45b086ad

I got some numbers for aarch64 as well, and they are in a similar range to those for x86-64.

On a 128-CPU server:

Old build = /tmp/ashu/leyden/build/premain-release/images/jdk with options
New build = /tmp/ashu/leyden/build/nmethod-single-copy-load-release/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,463,444
2,457,443
3,465,453
4,452,447
5,460,446
6,458,443
7,465,445
8,464,454
9,461,449
10,455,449
Geomean,459.98,447.28
Stdev,4.22,3.72
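
The Geomean and Stdev rows above can be re-derived from the ten runs. The following is a minimal sketch in Python, not the script that produced the table. It assumes the Stdev row is a population standard deviation (that assumption matches the reported values), and the names `old_runs`, `new_runs`, and `summarize` are illustrative only.

```python
import math
import statistics

# Run values copied from the table above (aarch64, 128-CPU server).
old_runs = [463, 457, 465, 452, 460, 458, 465, 464, 461, 455]
new_runs = [444, 443, 453, 447, 446, 443, 445, 454, 449, 449]


def summarize(runs):
    """Return (geometric mean, population standard deviation) of the runs."""
    geomean = math.exp(sum(math.log(r) for r in runs) / len(runs))
    # Assumption: population (not sample) standard deviation.
    stdev = statistics.pstdev(runs)
    return geomean, stdev


old_geomean, old_stdev = summarize(old_runs)
new_geomean, new_stdev = summarize(new_runs)

# Relative improvement of the new build over the old one, in percent.
improvement = (old_geomean - new_geomean) / old_geomean * 100

print(f"Old: geomean={old_geomean:.2f} stdev={old_stdev:.2f}")
print(f"New: geomean={new_geomean:.2f} stdev={new_stdev:.2f}")
print(f"Improvement: {improvement:.2f}%")
```

With these inputs the relative improvement of the geomean comes out a little under 3%.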


On the same 128-CPU server, but bound to CPUs 0-7:

Old build = /tmp/ashu/leyden/build/premain-release/images/jdk with options
New build = /tmp/ashu/leyden/build/nmethod-single-copy-load-release/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,441,424
2,449,420
3,435,427
4,429,427
5,429,430
6,437,424
7,441,425
8,447,426
9,429,432
10,438,421
Geomean,437.45,425.59
Stdev,6.86,3.50


Load time as reported by `-Xlog:init` for premain:

[0.680s][info][init]     SC Load Time:           0.210 s
[0.680s][info][init]       nmethod register:       0.148 s
[0.680s][info][init]       find cached code:       0.004 s

Load time as reported by `-Xlog:init` for this PR:

[0.675s][info][init]     SC Load Time:           0.143 s
[0.675s][info][init]       nmethod register:       0.128 s
[0.675s][info][init]       find cached code:       0.004 s
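
To compare the two log excerpts above line by line, something like the following Python sketch could be used. It is only an illustration: the regular expression is written against the exact line shape shown here, and `parse_timings`, `premain_log`, and `pr_log` are hypothetical names, not part of any JDK tooling.

```python
import re

# Log excerpts copied from the init log output quoted above.
premain_log = """\
[0.680s][info][init]     SC Load Time:           0.210 s
[0.680s][info][init]       nmethod register:       0.148 s
[0.680s][info][init]       find cached code:       0.004 s
"""
pr_log = """\
[0.675s][info][init]     SC Load Time:           0.143 s
[0.675s][info][init]       nmethod register:       0.128 s
[0.675s][info][init]       find cached code:       0.004 s
"""

# Matches "[init] <label>: <seconds> s" after the log decorations.
LINE_RE = re.compile(r"\[init\]\s+(?P<label>[^:]+):\s+(?P<secs>\d+\.\d+) s")


def parse_timings(log_text):
    """Map each timing label to its value in seconds."""
    timings = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            timings[m.group("label").strip()] = float(m.group("secs"))
    return timings


before = parse_timings(premain_log)
after = parse_timings(pr_log)

for label in before:
    saved = before[label] - after[label]
    print(f"{label}: {before[label]:.3f} s -> {after[label]:.3f} s "
          f"(saved {saved:.3f} s)")
```

For these excerpts, SC Load Time drops by 0.067 s, of which 0.020 s is in nmethod register.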

-------------

PR Comment: https://git.openjdk.org/leyden/pull/27#issuecomment-2765510815

