RFR: 8341334: CDS: Parallel pretouch and relocation [v7]

Ioi Lam iklam at openjdk.org
Wed Nov 6 06:19:30 UTC 2024


On Tue, 5 Nov 2024 20:12:06 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> In Leyden performance investigations, we found that `ArchiveRelocationMode=0` saves 5..7 ms on HelloWorld startup. Mainline defaults to `ARM=1`, _losing_ as much. `ARM=0` was switched to `ARM=1` with [JDK-8294323](https://github.com/openjdk/jdk/commit/be70bc1c58eaec876aa1ab36eacba90b901ac9b8), which was delivered to JDK 17+ in Apr 2023.
>> 
>> Profiling shows we spend the time mem-faulting the memory while loading the RO/RW regions, about 15 MB total. 15 MB in 5 ms amounts to about 3 GB/sec, close to the single-threaded limits. I suspect the impact is larger if we relocate a larger Metaspace, e.g. after dumping a CDS archive from a large application. There is little we can do to make the actual relocation part faster: the overwhelming majority of samples are on the kernel side.
>> 
>> This PR implements the infrastructure for fast archive workers, and leverages it to relocate the core regions in parallel. The RW/RO regions this code traverses are large, and we can squeeze out more performance by parallelizing the traversal. Without pretouch from (1), this step serves as one for RW/RO regions.
>> 
>> (I'll put some performance data in the comments to show how these interact)
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `runtime/cds`
>>  - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision:
> 
>  - More perf touchups
>  - Cap the max number of workers
>  - Revert pre-touch parts: there are startup regressions on smaller thread counts
>  - Handle single-threaded modes better
>  - Merge branch 'master' into JDK-8341334-cds-parallel-relocation
>  - Merge branch 'master' into JDK-8341334-cds-parallel-relocation
>  - Make sure we gracefully shutdown whatever happens, refix shutdown race
>  - Simpler bitmap distribution
>  - Capitalize constants
>  - Do not create worker threads too early: Mac/Windows are not yet ready to use Semaphores
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/747632ea...d4d739b1
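
For readers following the quoted description, here is a minimal sketch of the general idea of parallel relocation, not the code in this PR. It assumes a simplified model: the region is a flat array of word-sized slots, a bitmap marks which slots hold pointers, and relocation adds a fixed delta to each marked slot. The name `relocate_parallel` and the even chunking across workers are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper, not HotSpot code: add `delta` to every slot whose bit
// is set in `ptrmap`, splitting the slot range evenly across `num_workers`.
void relocate_parallel(std::vector<std::uintptr_t>& slots,
                       const std::vector<bool>& ptrmap,
                       std::ptrdiff_t delta,
                       unsigned num_workers) {
  assert(slots.size() == ptrmap.size());  // one bitmap bit per slot
  if (num_workers == 0) num_workers = 1;

  const std::size_t n = slots.size();
  const std::size_t chunk = (n + num_workers - 1) / num_workers;

  // Each worker patches a disjoint [begin, end) range, so no locking is needed.
  // Touching the slots also faults in the pages of that range.
  auto work = [&](std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; i++) {
      if (ptrmap[i]) slots[i] += static_cast<std::uintptr_t>(delta);
    }
  };

  std::vector<std::thread> workers;
  for (unsigned w = 1; w < num_workers; w++) {
    const std::size_t begin = std::min(n, w * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    if (begin < end) workers.emplace_back(work, begin, end);
  }
  work(0, std::min(n, chunk));  // the calling thread handles the first chunk
  for (auto& t : workers) t.join();
}
```

With `num_workers` set to 1 this degrades to a plain single-threaded loop, which loosely mirrors the "handle single-threaded modes better" item in the commit list above.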

@fisk I think you did something similar to this PR. Would you take a look?

src/hotspot/share/cds/cds_globals.hpp line 99:

> 97:                                                                             \
> 98:   product(bool, ArchiveParallelRelocation, true, DIAGNOSTIC,                \
> 99:           "Use parallel relocation code to speed up startup.")              \

Maybe wait until after JEP 483 is integrated, then rename this flag to `AOTCacheParallelRelocation` to be consistent with the new naming scheme?

-------------

PR Review: https://git.openjdk.org/jdk/pull/21302#pullrequestreview-2417323246
PR Review Comment: https://git.openjdk.org/jdk/pull/21302#discussion_r1830436920


More information about the hotspot-runtime-dev mailing list