RFR: 8205051: UseNUMA memory interleaving vs cpunodebind & localalloc [v2]
Stefan Johansson
sjohanss at openjdk.org
Mon Dec 9 10:21:44 UTC 2024
On Tue, 3 Dec 2024 06:26:15 GMT, Swati Sharma <duke at openjdk.org> wrote:
>> Hi All,
>>
>> This PR addresses performance issues related to the UseNUMA flag. We disable UseNUMA when the process is invoked with mismatched CPU and memory node bindings.
>> We compare the cpunodebind and membind (or interleave, for the interleave policy) bitmasks and disable UseNUMA when they are not equal.
>> For example on a 4 NUMA node system:
>> 0123 Node Number
>> 1100 cpunodebind bitmask
>> 1111 membind bitmask
>> Disable UseNUMA as the CPU and memory bitmasks are not equal.
>>
>> 0123 Node Number
>> 1100 cpunodebind bitmask
>> 1100 membind bitmask
>> Enable UseNUMA as the CPU and memory bitmasks are equal.
>>
>> This covers all cases for all policies and was tested with the command below:
>> numactl --cpunodebind=0,1 --localalloc java -Xlog:gc*=info -XX:+UseParallelGC -XX:+UseNUMA -version
>>
>> For the localalloc and preferred policies, the membind bitmask has every node set, so UseNUMA is disabled unless cpunodebind also covers all nodes.
>>
>> This PR disables the UseNUMA flag in these cases for all GCs. We observed an improvement of ~25% on G1GC, ~20% on ZGC and ~7-8% on PGC in both throughput and latency on SPECjbb2015 on a 2 NUMA node SRF-SP system with 6Group configuration.
>>
>> Please review and provide your valuable comments.
>>
>> Thanks,
>> Swati Sharma
>> Intel
>
> Swati Sharma has updated the pull request incrementally with one additional commit since the last revision:
>
> 8205051: Resolved review comments.
Thanks for addressing my previous comments.
A minor comment on the new warning messages and some thoughts on how to structure the disabling of NUMA.
src/hotspot/os/linux/os_linux.cpp line 4495:
> 4493: (UseNUMAInterleaving && FLAG_IS_CMDLINE(UseNUMAInterleaving))) {
> 4494: // Only issue a warning if the user explicitly asked for NUMA support
> 4495: log_warning(os)("NUMA support is disabled as libnuma not initialized");
Suggestion:
log_warning(os)("NUMA support is disabled as libnuma failed to initialize");
src/hotspot/os/linux/os_linux.cpp line 4516:
> 4514: FLAG_SET_ERGO(UseNUMA, false);
> 4515: FLAG_SET_ERGO(UseNUMAInterleaving, false);
> 4516:
I thought a bit more about this, and I think it's better if the warning states explicitly why NUMA was disabled, especially now that we have three different reasons. What do you think about:
Suggestion:
if (Linux::numa_max_node() < 1) {
  disable_numa("Only a single NUMA node is available");
} else if (Linux::is_bound_to_single_mem_node()) {
  disable_numa("The process is bound to a single NUMA node");
} else if (Linux::mem_and_cpu_node_mismatch()) {
  disable_numa("The process memory and cpu node configuration does not match");
And:
static void disable_numa(const char* reason) {
  if ((UseNUMA && FLAG_IS_CMDLINE(UseNUMA)) ||
      (UseNUMAInterleaving && FLAG_IS_CMDLINE(UseNUMAInterleaving))) {
    // Only issue a warning if the user explicitly asked for NUMA support
    log_warning(os)("NUMA support disabled: %s", reason);
  }
  FLAG_SET_ERGO(UseNUMA, false);
  FLAG_SET_ERGO(UseNUMAInterleaving, false);
}
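
To make the ordering of the three checks easy to try outside HotSpot, here is a minimal standalone sketch. It is not the patch itself: `NumaConfig`, `nodes()` and `numa_disable_reason()` are hypothetical names, the bitmasks are plain `std::bitset` values rather than libnuma bitmasks, and "bound to a single memory node" is assumed to mean that exactly one bit is set in the memory mask.

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical model of the node state; HotSpot obtains this from libnuma.
constexpr std::size_t kMaxNodes = 4;
using NodeMask = std::bitset<kMaxNodes>;

struct NumaConfig {
  int max_node;        // highest NUMA node index (0 on a single-node system)
  NodeMask cpu_nodes;  // nodes the process may run on (cpunodebind)
  NodeMask mem_nodes;  // nodes the process may allocate from (membind/interleave)
};

// Builds a mask from a list of node ids, e.g. nodes({0, 1}).
NodeMask nodes(std::initializer_list<std::size_t> ids) {
  NodeMask mask;
  for (std::size_t id : ids) {
    mask.set(id);
  }
  return mask;
}

// Returns the reason NUMA support should be disabled, or nullopt to keep it.
// The checks run in the same order as in the suggestion above.
std::optional<std::string> numa_disable_reason(const NumaConfig& config) {
  if (config.max_node < 1) {
    return "Only a single NUMA node is available";
  }
  if (config.mem_nodes.count() == 1) {  // assumed meaning of "single mem node"
    return "The process is bound to a single NUMA node";
  }
  if (config.cpu_nodes != config.mem_nodes) {
    return "The process memory and cpu node configuration does not match";
  }
  return std::nullopt;
}
```

With the 4-node example from the PR description, `cpu_nodes = nodes({0, 1})` and `mem_nodes = nodes({0, 1, 2, 3})` yields the mismatch reason, while setting both masks to `nodes({0, 1})` yields no reason and NUMA support stays enabled.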
-------------
Changes requested by sjohanss (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22395#pullrequestreview-2488145299
PR Review Comment: https://git.openjdk.org/jdk/pull/22395#discussion_r1875596995
PR Review Comment: https://git.openjdk.org/jdk/pull/22395#discussion_r1875722823