Unexpected results when enabling +UseNUMA for G1GC
Tal Goldstein
tgoldstein at outbrain.com
Sat Jan 9 20:15:54 UTC 2021
Hi Guys,
We're exploring the -XX:+UseNUMA flag and its effect on G1 GC in JDK 14.
For that, we've set up a test consisting of two k8s deployments of the same
service, where deployment A has the UseNUMA flag enabled and deployment B
doesn't.
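For reference, the JVMs are started roughly like this (service.jar and any
other options are placeholders; the only intended difference between the two
deployments is UseNUMA, and since G1 is the default collector in JDK 14 we
don't pass -XX:+UseG1GC explicitly):

    # deployment A
    java -Xmx8g -XX:+UseNUMA -jar service.jar
    # deployment B
    java -Xmx8g -jar service.jar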
For NUMA to actually work inside the docker container we also needed to add
the numactl library to the image (apk add numactl), and to measure local vs.
remote memory accesses we used pcm-numa (https://github.com/opcm/pcm). The
image is based on Alpine Linux v3.11.
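The measurement setup looks roughly like this (exact pcm-numa options may
differ between PCM versions, so treat this as a sketch rather than an exact
recipe):

    # inside the container (privileged, so PCM can access the performance counters)
    apk add numactl   # provides libnuma, needed for -XX:+UseNUMA to take effect
    pcm-numa 1        # report local vs. remote DRAM accesses, 1s sampling interval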
Each deployment handles around 150 requests per second, and all of a
deployment's pods run on the same Kubernetes node.
When running the test, we expected the (local memory accesses) / (total
memory accesses) ratio of the UseNUMA deployment to be much higher than that
of the non-NUMA deployment, and as a result that it would also sustain a
higher request throughput.
Surprisingly, that isn't the case:
On the node running deployment A (UseNUMA enabled) we measured 20M / 13M /
33M (local / remote / total) memory accesses, while on the node running
deployment B (no UseNUMA) we measured 23M / 10M / 33M over the same period -
i.e. a local-access ratio of roughly 61% with NUMA vs. roughly 70% without.
Can you help us understand whether we're doing something wrong, or whether
our expectations are simply off?
The two deployments are identical (except for the UseNUMA flag):
Each deployment contains 2 pods running on k8s.
Each pod has 10GB of memory, an 8GB heap, and requests 2 CPUs (but is not
limited to 2).
Each deployment runs on a separate but identical Kubernetes node with this
spec:
Hardware............: Supermicro SYS-2027TR-HTRF+
CPU.................: Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
CPUs................: 2
CPU Cores...........: 12
Memory..............: 63627 MB
We've also written all NUMA-related logs to a file (using
-Xlog:os*,gc*=trace:file=/outbrain/heapdumps/fulllog.log:hostname,time,level,tags)
- the log file can be found here:
https://drive.google.com/file/d/1eZqYDtBDWKXaEakh_DoYv0P6V9bcLs6Z/view?usp=sharing
From the log we can see that NUMA is indeed enabled and working, but again,
it doesn't produce the results we expected.
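If you don't want to download the whole file, the NUMA-related entries can be
pulled out with something like:

    grep -i numa /outbrain/heapdumps/fulllog.log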
Any ideas why?
Is it a matter of workload? Are there particular workloads you'd expect to
benefit from G1's NUMA awareness? Do you happen to have a link to code that
runs such a workload?
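To make the question concrete, here is a rough sketch (hypothetical - not our
actual service; class name and sizes are arbitrary) of the kind of
allocation-heavy, thread-local workload we imagined should benefit from G1's
NUMA-aware allocation, since each thread keeps allocating and touching its
own buffers:

    import java.util.concurrent.ThreadLocalRandom;

    // Hypothetical micro-workload: every worker thread allocates and touches
    // its own short-lived buffers, so with -XX:+UseNUMA the young-gen memory
    // it uses should mostly live on the NUMA node of the allocating thread.
    public class NumaAllocTest {
        static final int THREADS = Runtime.getRuntime().availableProcessors();
        static final int BUF_SIZE = 1 << 20;      // 1 MiB per allocation
        static final int ITERATIONS = 10_000;

        public static void main(String[] args) throws InterruptedException {
            Thread[] workers = new Thread[THREADS];
            for (int t = 0; t < THREADS; t++) {
                workers[t] = new Thread(() -> {
                    long sum = 0;
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    for (int i = 0; i < ITERATIONS; i++) {
                        byte[] buf = new byte[BUF_SIZE];       // fresh allocation
                        for (int j = 0; j < buf.length; j += 64) {
                            buf[j] = (byte) rnd.nextInt();     // touch every cache line
                        }
                        for (int j = 0; j < buf.length; j += 64) {
                            sum += buf[j];                     // read it back
                        }
                    }
                    System.out.println(Thread.currentThread().getName() + ": " + sum);
                });
                workers[t].start();
            }
            for (Thread w : workers) {
                w.join();
            }
        }
    }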
Thanks,
Tal