RFR: 8344831: [REDO] CDS: Parallel relocation [v4]

Fri Nov 29 09:28:43 UTC 2024

On Wed, 27 Nov 2024 15:29:58 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This re-does [JDK-8341334](https://bugs.openjdk.org/browse/JDK-8341334), being mindful of failures reported in [JDK-8344583](https://bugs.openjdk.org/browse/JDK-8344583). See the original motivation in [JDK-8341334](https://bugs.openjdk.org/browse/JDK-8341334).
>> 
>> To simplify reviews, this PR contains the original commit (https://github.com/openjdk/jdk/commit/4218b882cd0a75f3159495ec2a57305ecfc8bf69), and amendments are stacked on top of it.
>> 
>> The common theme in all failures that we have seen is VM exiting when `ArchiveWorkers` (`AW`) are still actively waiting on related `Semaphores`. Since the `AW` were static, the destructors for `Semaphores` would run as part of `AW` destruction sequence, and what would make the active workers fail. 
>> 
>> We resolve this trouble and strengthen the code with the following amendments:
>>   1. `AW` is now a structured stack object. `AW` instances are not exposed statically, so even the abnormal VM termination would not break semaphores. The normal scoping rules would make sure `AW` is shutdown on all normal paths.
>>   2. We move `AW` use mark to the only place it is currently used. This allows us to further insulate from `AW` problems if we flip the `AOTCacheParallelRelocation` flag back to `false` in the field.
>>   3. `AW` is now a single-use object. Making it reusable proves to be fairly hard, especially when workers are lagging behind during the startup or wakeups. Quite a bit of that can be mitigated by smart shutdown protocol, but it requires more coordination, and extra pool work. Single use `AW` is significantly easier to reason about. We can re-instate reusability in/when we need it.
>> 
>> I will put the performance data in a separate commit.
>> 
>> Additional testing:
>>  - [x] GHA
>>  - [x] linux-x86_64-server-fastdebug, `runtime/cds`
>>  - [x] linux-x86_64-server-fastdebug, `all`
>>  - [x] macos-aarch64-server-fastdebug, `compiler/compilercontrol compiler/ciReplay runtime/CommandLine/` (100x, no failures)
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Atomicity improvements

src/hotspot/share/cds/archiveUtils.cpp line 434:

> 432:     }
> 433:     if (Atomic::cmpxchg(&_started_workers, cur, cur + 1, memory_order_relaxed) == cur) {
> 434:       new ArchiveWorkerThread(this);

Okay, we now just leak the thread objects. That is intentional?

I guess it has to be; otherwise thread invoking the ArchiveWorkers destructor would race the last worker threads finishing their runs?

test/hotspot/gtest/cds/test_archiveWorkers.cpp line 43:

> 41: };
> 42: 
> 43: // Test a repeated cycle of pool start works.

This does not do what the comment says. ArchiveWorkers ctor is a noop.

test/hotspot/gtest/cds/test_archiveWorkers.cpp line 55:

> 53:     TestArchiveWorkerTask task;
> 54:     ArchiveWorkers workers;
> 55:     workers.run_task(&task);

Here, we leak the thread objects (can be observed with PrintNMTStatistics). See comment above.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22369#discussion_r1863216840
PR Review Comment: https://git.openjdk.org/jdk/pull/22369#discussion_r1863213431
PR Review Comment: https://git.openjdk.org/jdk/pull/22369#discussion_r1863214177