RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2]

Stefan Johansson sjohanss at openjdk.org
Thu Mar 13 08:52:55 UTC 2025


On Wed, 12 Mar 2025 13:27:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> For details, please see JBS issue.
>> 
>> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._
>> 
>> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport.
>> 
>> ----
>> 
>> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA.
>> 
>> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node.
>> 
>> ---
>> 
>> Testing:
>> 
>> Testing is difficult. See remark in JBS issue. 
>> 
>> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   node_index parameter should precede output parameters

I agree with keeping this patch small and then at some point doing a larger refactor in this area. I also agree that we want the out-parameter last, but when reading the parameters now you get sizing parameters split by the node index parameter, i.e: 

HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, node_index, actual_word_size);


I would prefer the node index first here, like:

HeapWord* result = _allocator->attempt_allocation(node_index, min_word_size, desired_word_size, actual_word_size);

For this one: 

  HeapWord* par_allocate_during_gc(G1HeapRegionAttr dest,
                                   size_t min_word_size,
                                   size_t desired_word_size,
                                   uint node_index,
                                   size_t* actual_word_size);


I would prefer to put the node_index after dest but before the sizes.

What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2720447995


More information about the hotspot-gc-dev mailing list