[PATCH] JDK-8205051 (UseNUMA memory interleaving vs cpunodebind & localalloc)
Thomas Schatzl
thomas.schatzl at oracle.com
Thu Sep 27 19:01:54 UTC 2018
Hi Roshan,
On Tue, 2018-09-25 at 12:18 +0530, roshan mangal wrote:
> Hi All,
>
> This Patch is for https://bugs.openjdk.java.net/browse/JDK-8205051
>
> Issue:
>
> If the JVM isn't allowed to run on all of the nodes (by numactl,
> cgroups, docker, etc), then a significant fraction of the Java heap
> will be unusable, causing early GC.
>
> Every thread captures its locality group (lgrp) and allocates memory
> from that lgrp.
>
> The lgrp id is the same as the NUMA node id.
>
> A thread running on a CPU that belongs to NUMA node 0 will capture
> Thread->lgrp as lgrp0 and will allocate memory from NUMA node 0. Once
> NUMA node 0 is full, it will trigger a GC even if other NUMA nodes
> still have free memory.
>
> Solution proposed:
>
> Create a list of NUMA nodes ordered by distance, and allocate memory
> from the next-nearest NUMA node when the closer node(s) are full.
>
> The system below has eight NUMA nodes, with the following distance
> table.
>
> node distances:
>
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
>
> The corresponding list for each lgrp will be like this.
>
> Thread's lgrp   Order of Allocation in NUMA node
>
> lgrp0 [ numaNode0->numaNode1->numaNode2->numaNode3->
>         numaNode4->numaNode5->numaNode6->numaNode7 ]
> lgrp1 [ numaNode1->numaNode0->numaNode2->numaNode3->
>         numaNode4->numaNode5->numaNode6->numaNode7 ]
> lgrp2 [ numaNode2->numaNode0->numaNode1->numaNode3->
>         numaNode4->numaNode5->numaNode6->numaNode7 ]
> lgrp3 [ numaNode3->numaNode0->numaNode1->numaNode2->
>         numaNode4->numaNode5->numaNode6->numaNode7 ]
> lgrp4 [ numaNode4->numaNode5->numaNode6->numaNode7->
>         numaNode0->numaNode1->numaNode2->numaNode3 ]
> lgrp5 [ numaNode5->numaNode4->numaNode6->numaNode7->
>         numaNode0->numaNode1->numaNode2->numaNode3 ]
> lgrp6 [ numaNode6->numaNode4->numaNode5->numaNode7->
>         numaNode0->numaNode1->numaNode2->numaNode3 ]
> lgrp7 [ numaNode7->numaNode4->numaNode5->numaNode6->
>         numaNode0->numaNode1->numaNode2->numaNode3 ]
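For illustration, the ordering in the quoted lists can be reproduced by sorting node ids by their distance from the thread's lgrp. This is only a sketch and not the code from the patch: the names `DistanceMatrix`, `example_distances` and `allocation_order` are hypothetical, and a stable sort is assumed so that nodes at equal distance stay in ascending id order, as they do in the lists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical representation: dist[i][j] is the distance from node i to node j.
using DistanceMatrix = std::vector<std::vector<int>>;

// Rebuilds the eight-node distance table quoted in the mail:
// 10 to itself, 16 within the same group of four nodes, 32 otherwise.
DistanceMatrix example_distances() {
  DistanceMatrix d(8, std::vector<int>(8));
  for (int i = 0; i < 8; ++i) {
    for (int j = 0; j < 8; ++j) {
      d[i][j] = (i == j) ? 10 : (i / 4 == j / 4 ? 16 : 32);
    }
  }
  return d;
}

// Returns all node ids ordered by increasing distance from `lgrp`.
// std::stable_sort keeps equal-distance nodes in ascending id order.
std::vector<int> allocation_order(const DistanceMatrix& dist, std::size_t lgrp) {
  assert(lgrp < dist.size());
  std::vector<int> order(dist.size());
  std::iota(order.begin(), order.end(), 0);
  const std::vector<int>& row = dist[lgrp];
  std::stable_sort(order.begin(), order.end(),
                   [&row](int a, int b) { return row[a] < row[b]; });
  return order;
}
```

Calling `allocation_order(example_distances(), 2)` yields the node sequence listed for lgrp2.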
I have a question about this: lgrps often have the same distance from
each other, and this order-of-allocation list seems to be
deterministic. So in this case nodes with a lower lgrp id (but the same
distance) are preferred over ones with a higher lgrp id.

Do you expect some imbalance because of that? If so, couldn't it be
useful to randomize lgrps with the same distance in this list, and
regularly change them?

Long ago I implemented some NUMA support for G1 and had that issue (in
that case the topology was a nice lattice, with every node connected to
every other within two hops), and used the above-mentioned solution to
that "problem".

Do you think something like this would make sense (not necessarily as
part of this change)?
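A rough sketch of the randomization idea, under the assumption that it means shuffling only among nodes that share the same distance while keeping nearer nodes ahead of farther ones. The function name `randomized_order` and the use of `std::mt19937` are illustrative choices, not anything taken from the patch or from G1.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// `row` holds the distances from one lgrp to every node (one table row).
// Nodes are ordered by distance; each run of equal distance is shuffled,
// so no node is preferred merely because its id is lower.
std::vector<int> randomized_order(const std::vector<int>& row, std::mt19937& rng) {
  std::vector<int> order(row.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&row](int a, int b) { return row[a] < row[b]; });

  auto first = order.begin();
  while (first != order.end()) {
    const int d = row[*first];
    // Find the end of the run of nodes at distance d.
    auto last = std::find_if(first, order.end(),
                             [&row, d](int n) { return row[n] != d; });
    std::shuffle(first, last, rng);
    first = last;
  }
  assert(order.size() == row.size());
  return order;
}
```

Regenerating the list from time to time with the same generator would change which equal-distance node is tried first, which is the "regularly change them" part.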
> Allocating on a NUMA node that is far from the CPU can lead to
> performance issues. Sometimes triggering a GC is a better option than
> allocating from a NUMA node at a large distance, i.e. with high memory
> latency.
>
> For this, I have added the option "NumaAllocationDistanceLimit", which
> will restrict memory allocation from the far nodes.
>
> In the above system, suppose we set
> -XX:NumaAllocationDistanceLimit=16.
That makes sense imho, although it is a bit sad that this number is
specific to the machine.
>
> The corresponding list for each lgrp will be like this.
>
> Thread's lgrp Order of Allocation in NUMA node
> lgrp0 [ numaNode0->numaNode1->numaNode2->numaNode3 ]
> lgrp1 [ numaNode1->numaNode0->numaNode2->numaNode3 ]
> lgrp2 [ numaNode2->numaNode0->numaNode1->numaNode3 ]
> lgrp3 [ numaNode3->numaNode0->numaNode1->numaNode2 ]
> lgrp4 [ numaNode4->numaNode5->numaNode6->numaNode7 ]
> lgrp5 [ numaNode5->numaNode4->numaNode6->numaNode7 ]
> lgrp6 [ numaNode6->numaNode4->numaNode5->numaNode7 ]
> lgrp7 [ numaNode7->numaNode4->numaNode5->numaNode6 ]
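The effect of the limit can be sketched as a filter applied before ordering. This assumes the limit is inclusive, since nodes at distance 16 remain in the quoted lists when the limit is 16. The function name `limited_order` is hypothetical and the code is not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// `row` holds the distances from one lgrp to every node (one table row).
// Keeps only nodes whose distance is <= distance_limit, nearest first;
// equal-distance nodes stay in ascending id order.
std::vector<int> limited_order(const std::vector<int>& row, int distance_limit) {
  assert(distance_limit >= 0);
  std::vector<int> order;
  for (std::size_t n = 0; n < row.size(); ++n) {
    if (row[n] <= distance_limit) {
      order.push_back(static_cast<int>(n));
    }
  }
  std::stable_sort(order.begin(), order.end(),
                   [&row](int a, int b) { return row[a] < row[b]; });
  return order;
}
```

With the row for lgrp6 from the distance table and a limit of 16, this returns nodes 6, 4, 5, 7, matching the quoted list. A limit of 32 brings back all eight nodes.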
>
> ######################## PATCH ########################
Could you send me the patch as a webrev so I can put it on
cr.openjdk.java.net? (Or maybe sending the patch as an attachment would
help too.) It got mangled by your email program, which added many line
breaks.
Thanks,
Thomas