[10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used

Gustavo Romero gromero at linux.vnet.ibm.com
Wed Apr 12 22:51:39 UTC 2017


Hi,

Any update on it?

Thank you.

Regards,
Gustavo

On 09-03-2017 16:33, Gustavo Romero wrote:
> Hi,
> 
> Could the following webrev be reviewed please?
> 
> It improves the numa node detection when non-consecutive or memory-less nodes
> exist in the system.
> 
> webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
> bug   : https://bugs.openjdk.java.net/browse/JDK-8175813
> 
> Currently, although no problem exists when the JVM detects numa nodes that are
> consecutive and have memory, for example in a numa topology like:
> 
> available: 2 nodes (0-1)
> node 0 cpus: 0 8 16 24 32
> node 0 size: 65258 MB
> node 0 free: 34 MB
> node 1 cpus: 40 48 56 64 72
> node 1 size: 65320 MB
> node 1 free: 150 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10,
> 
> it fails to detect the numa nodes to be used by the Parallel GC in a numa
> topology like:
> 
> available: 4 nodes (0-1,16-17)
> node 0 cpus: 0 8 16 24 32
> node 0 size: 130706 MB
> node 0 free: 7729 MB
> node 1 cpus: 40 48 56 64 72
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 16 cpus: 80 88 96 104 112
> node 16 size: 130630 MB
> node 16 free: 5282 MB
> node 17 cpus: 120 128 136 144 152
> node 17 size: 0 MB
> node 17 free: 0 MB
> node distances:
> node   0   1  16  17
>   0:  10  20  40  40
>   1:  20  10  40  40
>  16:  40  40  10  20
>  17:  40  40  20  10,
> 
> where node 16 does not immediately follow node 1 in the numbering, and nodes 1
> and 17 have no memory.
> 
> If a topology like that exists, os::numa_make_local() will receive as a hint a
> local group id that is not available in the system to be bound to (it will
> receive all node ids from 0 to 17), causing a proliferation of
> "mbind: Invalid argument" messages:
> 
> http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
> 
> This change improves the detection by making the JVM numa API aware of numa
> nodes that are not numbered consecutively from 0 to the highest node number,
> and of nodes that might be memory-less, i.e. that might not be, in libnuma
> terms, configured nodes. Hence only the configured nodes will be available:
> 
> http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
> 
> The change has no effect on numa topologies where the problem does not occur,
> i.e. no change in the number of nodes and no change in the cpu to node map. On
> numa topologies where memory-less nodes exist (like in the last example above),
> cpus from a memory-less node won't be able to bind locally, so they are mapped
> to the closest node. Otherwise they would not be associated with any node and
> MutableNUMASpace::cas_allocate() would pick a node randomly, compromising
> performance.
> 
> I found no regressions on x64 for the following numa topology:
> 
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 8 9 10 11
> node 0 size: 24102 MB
> node 0 free: 19806 MB
> node 1 cpus: 4 5 6 7 12 13 14 15
> node 1 size: 24190 MB
> node 1 free: 21951 MB
> node distances:
> node   0   1
>   0:  10  21
>   1:  21  10
> 
> I understand that fixing the current numa detection is a prerequisite for
> enabling UseNUMA by default [1] and for extending the numa-aware allocation to
> the G1 GC [2].
> 
> Thank you.
> 
> 
> Best regards,
> Gustavo
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
> [2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)
> 
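
To make the idea in the quoted message concrete, here is a minimal standalone
sketch (not the code from the webrev) of the two steps it describes: keeping
only the nodes that have memory, and mapping every node, including the
memory-less ones, to the closest node that does. The Node and Topology types
and the function names are hypothetical and exist only for this illustration;
the sizes and distances are copied from the 4-node topology above. In a real
implementation this information would presumably come from libnuma (for
example numa_distance()) rather than from a hard-coded table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One node as reported by `numactl -H`: its id and memory size in MB.
// (Hypothetical type, for illustration only.)
struct Node {
  int  id;
  long size_mb;
};

// Hypothetical container: distance[i][j] is the distance between
// nodes[i] and nodes[j], in the same order as the nodes vector.
struct Topology {
  std::vector<Node> nodes;
  std::vector<std::vector<int>> distance;
};

// The 4-node topology quoted in the message (nodes 0, 1, 16, 17).
inline Topology example_topology() {
  return Topology{
      {{0, 130706}, {1, 0}, {16, 130630}, {17, 0}},
      {{10, 20, 40, 40},
       {20, 10, 40, 40},
       {40, 40, 10, 20},
       {40, 40, 20, 10}}};
}

// Step 1: only nodes that exist and have memory can be bound to.
inline std::vector<int> usable_nodes(const Topology& t) {
  std::vector<int> ids;
  for (const Node& n : t.nodes) {
    if (n.size_mb > 0) ids.push_back(n.id);
  }
  return ids;
}

// Step 2: map each node id to the closest node that has memory.
// A node with memory maps to itself; ties go to the first node found.
// The result is -1 for every node if no node has memory.
inline std::map<int, int> nearest_usable_node(const Topology& t) {
  assert(t.distance.size() == t.nodes.size());
  std::map<int, int> result;
  for (std::size_t i = 0; i < t.nodes.size(); i++) {
    int best = -1;
    int best_dist = 0;
    for (std::size_t j = 0; j < t.nodes.size(); j++) {
      if (t.nodes[j].size_mb <= 0) continue;  // skip memory-less nodes
      int d = t.distance[i][j];
      if (best < 0 || d < best_dist) {
        best = t.nodes[j].id;
        best_dist = d;
      }
    }
    result[t.nodes[i].id] = best;
  }
  return result;
}
```

With the table above, usable_nodes() yields nodes 0 and 16, and
nearest_usable_node() sends the cpus of node 1 to node 0 and the cpus of node
17 to node 16, so no cpu is left without an associated node.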



More information about the hotspot-dev mailing list