[jdk21u] RFR: 8351500: G1: NUMA migrations cause crashes in region allocation
Thomas Stuefe
stuefe at openjdk.org
Thu Mar 27 07:54:23 UTC 2025
On Mon, 24 Mar 2025 07:38:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> This pull request contains a backport of commit [37ec7962](https://github.com/openjdk/jdk/commit/37ec796255ae857588a5c7e0d572407dd81cbec9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Thomas Stuefe on 13 Mar 2025 and was reviewed by Roman Kennke, Stefan Johansson and Thomas Schatzl.
>
> ---
>
> Please consider this patch for backporting. It fixes a G1 crash we see at a customer with a very large NUMA installation.
>
> The patch applied cleanly, but for JDK 21 it was not sufficient to fix the bug. In JDK 21, we must also fix `G1Allocator::attempt_allocation_force`.
>
> Reviewer notes: Unfortunately, at the request of the G1 maintainers, the upstream patch also contains a couple of cosmetic code changes (changes to parameter order). These changes are harmless, but they obscure the actual fix.
>
> The heart of this patch is in `G1CollectedHeap::attempt_allocation` and everything that happens below that frame, in particular `G1CollectedHeap::attempt_allocation_slow`. Previously, we retrieved the NUMA node number for the CPU we were running on over and over again, which exposed us to bugs whenever that node number changed mid-function. Now we determine the node number once, up in `G1CollectedHeap::attempt_allocation`, and use it throughout the allocation process.
>
> The original patch is commit 0aebb171b7e0d9aecb04a5f9832620898047674f, whereas the JDK-21 specific additions are in follow-up commit f57033d6ce60fcca3ee4e9f3cfa0dc3d8d365cc0.
>
> -----
>
> Testing:
>
> - I tested the fix with an additional change that simulates a large number of NUMA node migrations. I verified that without the fix, we get the crashes/asserts our customer sees; with the patch, the crashes/asserts are gone.
> - SAP kindly ran their entire JDK 21 CI test suite: all green
> - I am running tier1_gc on Linux x64
> - GHAs green
After discussions with my colleagues, we decided not to do this as a critical patch, but as a normal patch for the JDK 21 July update. Therefore, I am closing this PR in favor of [this one](https://github.com/openjdk/jdk21u-dev/pull/1488).
Sorry for all this confusion.
-------------
PR Comment: https://git.openjdk.org/jdk21u/pull/461#issuecomment-2757046206
More information about the jdk-updates-dev mailing list