RFR: 8344583: Make ArchiveWorkers lifecycle robust [v2]
David Holmes
dholmes at openjdk.org
Fri Nov 22 05:17:13 UTC 2024
On Thu, 21 Nov 2024 07:40:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> It is obvious from the bug report that the `ArchiveWorkers` lifecycle is not robust enough. It looks like archive workers escape their normal shutdown sequence when an unusual VM exit path is taken. We have tried to capture this in `before_exit`, but that is obviously not enough.
>>
>> This PR reworks the lifecycle a bit: we now use a scoped object to manage `ArchiveWorkers` state. The previous version carried `ArchiveWorkers` for the entire lifecycle of the CDS archive load path, but that runs into problems when the VM exits in the middle of it. This PR shortens the lifecycle of `ArchiveWorkers` to the only place where they are used. To avoid the loss of usefulness, we now allow multiple pool restarts, so if any other code needs these workers, it can restart the pool again. A new gtest checks that this works.
>>
>> Additional testing:
>> - [x] macos-aarch64-server-fastdebug: reported failing tests are not failing anymore
>> - [x] macos-aarch64-server-fastdebug, `tier{1,2}`
>> - [x] linux-x86_64-server-fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Move the use of worker pool directly where we need it
I don't think the issue is the normal operation of this pool. Let your pool be active and the workers working away quite merrily. Then, unrelated to the pool, an error is encountered: the VM crashes and the process starts to terminate. We run the static dtors while your pool threads are still active and trying to use the Semaphores that have now been deleted.
This seems to be causing a lot of failures in our higher-tier testing now.
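
To make the ordering concern concrete, here is a minimal standalone sketch of a scoped pool whose semaphores cannot be destroyed before its workers have been joined. It is not the HotSpot code: `SimpleSemaphore`, `ScopedWorkers` and `run_pool_twice` are hypothetical names invented for illustration, and the sketch uses the standard library rather than HotSpot's own `Semaphore` and thread classes. It only shows the ownership pattern (join in the destructor, semaphores as members); it does not claim to cover an exit initiated by an unrelated thread while the pool is still running.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a semaphore; not the HotSpot Semaphore class.
class SimpleSemaphore {
  std::mutex _lock;
  std::condition_variable _cv;
  int _count = 0;
public:
  void signal(int n = 1) {
    {
      std::lock_guard<std::mutex> g(_lock);
      _count += n;
    }
    _cv.notify_all();
  }
  void wait() {
    std::unique_lock<std::mutex> g(_lock);
    _cv.wait(g, [this] { return _count > 0; });
    _count--;
  }
};

// Hypothetical scoped pool. The semaphores are members, and the destructor
// body joins every worker before any member is destroyed, so no worker can
// touch a semaphore that has already been deleted.
class ScopedWorkers {
  SimpleSemaphore _start;            // workers wait here for work or shutdown
  SimpleSemaphore _done;             // workers signal here after each unit
  std::atomic<bool> _shutdown{false};
  std::atomic<int> _processed{0};
  std::vector<std::thread> _threads;

  void worker_loop() {
    for (;;) {
      _start.wait();
      if (_shutdown.load()) return;
      _processed.fetch_add(1);       // placeholder for real work
      _done.signal();
    }
  }
public:
  explicit ScopedWorkers(int num_workers) {
    for (int i = 0; i < num_workers; i++) {
      _threads.emplace_back([this] { worker_loop(); });
    }
  }
  // Hand out `units` pieces of work and block until all are finished.
  void run(int units) {
    _start.signal(units);
    for (int i = 0; i < units; i++) _done.wait();
  }
  int processed() const { return _processed.load(); }
  ~ScopedWorkers() {
    _shutdown.store(true);
    _start.signal(static_cast<int>(_threads.size()));  // wake each worker once
    for (std::thread& t : _threads) t.join();
  }
};

// Starts and tears down the pool twice, mimicking a pool restart.
int run_pool_twice(int num_workers, int units) {
  int total = 0;
  for (int round = 0; round < 2; round++) {
    ScopedWorkers pool(num_workers);
    pool.run(units);
    total += pool.processed();
  }
  return total;
}
```

In this sketch the pool lives on the stack of the code that uses it, so its lifetime is bounded by that scope rather than by static destruction order. Whether that is sufficient for the crash path described above is exactly the open question in this thread.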
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22276#issuecomment-2492891751
More information about the hotspot-runtime-dev mailing list