RFR: 8344831: [REDO] CDS: Parallel relocation

Aleksey Shipilev shade at openjdk.org
Tue Nov 26 09:48:21 UTC 2024


On Mon, 25 Nov 2024 17:58:35 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> This re-does [JDK-8341334](https://bugs.openjdk.org/browse/JDK-8341334), being mindful of failures reported in [JDK-8344583](https://bugs.openjdk.org/browse/JDK-8344583). See the original motivation in [JDK-8341334](https://bugs.openjdk.org/browse/JDK-8341334).
> 
> To simplify reviews, this PR contains the original commit (https://github.com/openjdk/jdk/commit/4218b882cd0a75f3159495ec2a57305ecfc8bf69), and amendments are stacked on top of it.
> 
> The common theme in all the failures we have seen is the VM exiting while `ArchiveWorkers` (`AW`) are still actively waiting on the related `Semaphores`. Since `AW` was static, the `Semaphores` destructors would run as part of the `AW` destruction sequence, which would make the still-active workers fail.
> 
> We resolve this trouble and strengthen the code with the following amendments:
>   1. `AW` is now a structured stack object. `AW` instances are not exposed statically, so even abnormal VM termination would not break the semaphores. The normal scoping rules make sure `AW` is shut down on all normal paths.
>   2. We move the `AW` use to the only place where it is currently needed. This allows us to further insulate ourselves from `AW` problems if we flip the `AOTCacheParallelRelocation` flag back to `false` in the field.
>   3. `AW` is now a single-use object. Making it reusable proves to be fairly hard, especially when workers are lagging behind during startup or wakeups. Quite a bit of that can be mitigated by a smart shutdown protocol, but that requires more coordination and extra pool work. A single-use `AW` is significantly easier to reason about. We can reinstate reusability if/when we need it.
> 
> I will put the performance data in a separate comment.
> 
> Additional testing:
>  - [x] GHA
>  - [x] linux-x86_64-server-fastdebug, `runtime/cds`
>  - [x] linux-x86_64-server-fastdebug, `all`
>  - [x] macos-aarch64-server-fastdebug, `compiler/compilercontrol compiler/ciReplay runtime/CommandLine/` (100x, no failures)
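
A minimal sketch of the lifecycle described in amendments 1 and 3, not the actual HotSpot code. `ScopedWorkers`, `run_once`, and `sum_of_chunk_ids` are hypothetical names, and `std::thread` with a plain join stands in for HotSpot threads and `Semaphores`. It illustrates two points: a pool that lives on the stack joins its workers in its destructor, so no worker can outlive the state it synchronizes on, and a single-use pool needs no restart protocol.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for ArchiveWorkers: stack-scoped and single-use.
class ScopedWorkers {
public:
  explicit ScopedWorkers(unsigned num_workers) : num_workers_(num_workers) {}

  // Not copyable: the pool is tied to the scope that created it.
  ScopedWorkers(const ScopedWorkers&) = delete;
  ScopedWorkers& operator=(const ScopedWorkers&) = delete;

  // Join every worker before any member is destroyed, so no thread can
  // touch pool state that no longer exists.
  ~ScopedWorkers() { join_all(); }

  // Runs task(chunk) for every chunk in [0, num_chunks).
  // Single-use: calling this twice is a programming error.
  // Assumption of this sketch: tasks do not throw.
  void run_once(unsigned num_chunks, const std::function<void(unsigned)>& task) {
    assert(!used_ && "ScopedWorkers is single-use");
    used_ = true;
    num_chunks_ = num_chunks;
    task_ = task;
    for (unsigned i = 0; i < num_workers_; i++) {
      threads_.emplace_back([this] { work(); });
    }
    work();      // the calling thread helps with the chunks
    join_all();  // all chunks are done once every thread has exited
  }

private:
  // Claim chunks one by one until none are left.
  void work() {
    for (unsigned c = next_.fetch_add(1); c < num_chunks_; c = next_.fetch_add(1)) {
      task_(c);
    }
  }

  void join_all() {
    for (std::thread& t : threads_) {
      if (t.joinable()) t.join();
    }
    threads_.clear();
  }

  const unsigned num_workers_;
  bool used_ = false;
  unsigned num_chunks_ = 0;
  std::atomic<unsigned> next_{0};
  std::function<void(unsigned)> task_;
  std::vector<std::thread> threads_;
};

// Hypothetical caller: the pool exists only inside the inner scope.
unsigned long sum_of_chunk_ids(unsigned num_workers, unsigned num_chunks) {
  std::atomic<unsigned long> sum{0};
  {
    ScopedWorkers workers(num_workers);
    workers.run_once(num_chunks, [&sum](unsigned chunk) { sum.fetch_add(chunk); });
  }  // the destructor has joined all workers by this point
  return sum.load();
}
```

For example, `sum_of_chunk_ids(4, 100)` returns 4950, the sum of 0..99, whichever thread happens to claim each chunk. Because the pool is a local object, there is no static instance whose destructor could run while workers are still waiting.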

This still yields the same startup improvements as the original change (compare with https://github.com/openjdk/jdk/pull/21302#issuecomment-2388349450):


$ hyperfine -w 30 -r 1000 "build/linux-x86_64-server-release/images/jdk/bin/java ... Hello"

# -XX:ArchiveRelocationMode=0
  Time (mean ± σ):      23.3 ms ±   0.2 ms    [User: 10.7 ms, System: 11.2 ms]  # 1 CPU
  Time (mean ± σ):      20.3 ms ±   0.5 ms    [User: 10.7 ms, System: 11.6 ms]  # 2 CPUs
  Time (mean ± σ):      19.7 ms ±   0.2 ms    [User: 11.1 ms, System: 11.5 ms]  # 4 CPUs
  Time (mean ± σ):      19.8 ms ±   0.2 ms    [User: 11.0 ms, System: 11.9 ms]  # 8 CPUs
  Time (mean ± σ):      20.5 ms ±   0.3 ms    [User: 11.0 ms, System: 12.8 ms]  # 16 CPUs
  Time (mean ± σ):      20.7 ms ±   0.3 ms    [User: 11.0 ms, System: 13.0 ms]  # 32 CPUs

# -XX:-AOTCacheParallelRelocation (effectively current mainline)
  Time (mean ± σ):      29.0 ms ±   0.3 ms    [User: 12.6 ms, System: 15.0 ms]  # 1 CPU
  Time (mean ± σ):      26.8 ms ±   0.8 ms    [User: 13.1 ms, System: 15.3 ms]  # 2 CPUs
  Time (mean ± σ):      25.5 ms ±   0.3 ms    [User: 12.9 ms, System: 15.4 ms]  # 4 CPUs
  Time (mean ± σ):      25.7 ms ±   0.3 ms    [User: 13.1 ms, System: 15.6 ms]  # 8 CPUs
  Time (mean ± σ):      26.6 ms ±   0.3 ms    [User: 12.9 ms, System: 16.8 ms]  # 16 CPUs
  Time (mean ± σ):      26.9 ms ±   0.3 ms    [User: 12.8 ms, System: 17.2 ms]  # 32 CPUs

# -XX:+AOTCacheParallelRelocation (new default)
  Time (mean ± σ):      29.1 ms ±   0.4 ms    [User: 12.5 ms, System: 15.1 ms]  # 1 CPU
  Time (mean ± σ):      22.9 ms ±   0.6 ms    [User: 12.8 ms, System: 15.0 ms]  # 2 CPUs
  Time (mean ± σ):      21.1 ms ±   0.2 ms    [User: 12.6 ms, System: 15.1 ms]  # 4 CPUs
  Time (mean ± σ):      20.8 ms ±   0.4 ms    [User: 12.2 ms, System: 15.6 ms]  # 8 CPUs
  Time (mean ± σ):      21.0 ms ±   0.2 ms    [User: 12.1 ms, System: 16.5 ms]  # 16 CPUs
  Time (mean ± σ):      21.1 ms ±   0.3 ms    [User: 11.9 ms, System: 17.1 ms]  # 32 CPUs
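
To put the means above in relative terms, here is a small sketch that computes the percentage of wall-clock time saved by `-XX:+AOTCacheParallelRelocation` over `-XX:-AOTCacheParallelRelocation`. The numbers are copied from the output above; the helper names (`percent_saved`, `print_savings`) are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Mean wall-clock times in ms, copied from the hyperfine output above.
const int    kCpus[]     = {1, 2, 4, 8, 16, 32};
const double kSerial[]   = {29.0, 26.8, 25.5, 25.7, 26.6, 26.9};  // -XX:-AOTCacheParallelRelocation
const double kParallel[] = {29.1, 22.9, 21.1, 20.8, 21.0, 21.1};  // -XX:+AOTCacheParallelRelocation

// Percent reduction in mean time relative to the baseline.
// A negative result means the candidate is slower.
double percent_saved(double baseline_ms, double candidate_ms) {
  return 100.0 * (baseline_ms - candidate_ms) / baseline_ms;
}

// Prints one line per CPU count.
void print_savings() {
  for (std::size_t i = 0; i < sizeof(kCpus) / sizeof(kCpus[0]); i++) {
    std::printf("%2d CPUs: %5.1f%% saved\n",
                kCpus[i], percent_saved(kSerial[i], kParallel[i]));
  }
}
```

At 4 CPUs, for instance, the mean drops from 25.5 ms to 21.1 ms, which is roughly a 17% reduction. At 1 CPU the two modes are within noise of each other.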

@dholmes-ora -- this is the re-do we were talking about last week. I simplified the lifecycle quite significantly to insulate us from more surprises, at the cost of pool reusability. Shutting down when workers can suddenly circle back to semaphores is not an easy task, and it requires extra steps that eat into startup costs. The single-use version still performs well on startup tests. I am going to throw more testing at it overnight, but feel free to poke holes in this in the meantime.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22369#issuecomment-2498850555
PR Comment: https://git.openjdk.org/jdk/pull/22369#issuecomment-2498901579


More information about the hotspot-runtime-dev mailing list