RFR: 8298482: Implement ParallelGC NUMAStats for Linux
Nick Gasson
ngasson at openjdk.org
Mon Dec 12 15:28:49 UTC 2022
ParallelGC has a seemingly useful option -XX:+NUMAStats that prints detailed information in GC.heap_info about which NUMA node pages in the eden space are bound to. However as far as I can tell this only ever worked on Solaris and is not implemented on any of the systems we currently support. This patch implements it on Linux using the move_pages system call.
The function os::get_page_info() and accompanying struct page_info was just a thin wrapper around the Solaris meminfo(2) syscall and was never ported to other systems so I've just removed it rather than try to emulate its interface.
There's also a method MutableNUMASpace::LGRPSpace::scan_pages() which attempts to find pages on the wrong NUMA node and frees them so that they have another chance to be allocated on the correct node by the first-touching thread, but I think this has always been a no-op on non-Solaris so perhaps should also be removed. On Linux it shouldn't be necessary as you can bind pages to the desired node directly.
I don't know what the performance of this option was like on Solaris but on Linux the move_pages call can be quite slow: I measured about 25ms/GB on my system. At the moment we call LGRPSpace::accumulate_statistics() twice per GC cycle: I removed the second call as it's likely to see a lot of uncommitted pages if the spaces were just resized. MutableNUMASpace::print_on() also calls accumulate_statistics() directly and since that's the only place this data is used, maybe we can drop the call from MutableNUMASpace::accumulate_statistics() as well?
Example output:
PSYoungGen total 4290560K, used 835628K [0x00000006aac00000, 0x0000000800000000, 0x0000000800000000)
eden space 3096576K, 1% used [0x00000006aac00000,0x00000007176a9f48,0x0000000767c00000)
lgrp 0 space 1761280K, 2% used [0x00000006aac00000,0x00000006acfc4980,0x0000000716400000)
local/remote/unbiased/uncommitted: 1671168K/0K/0K/90112K, large/small pages: 0/440320
lgrp 1 space 1335296K, 46% used [0x0000000716400000,0x000000073c2abb18,0x0000000767c00000)
local/remote/unbiased/uncommitted: 1335296K/0K/0K/0K, large/small pages: 0/333824
from space 1193984K, 65% used [0x00000007b7200000,0x00000007e6b9c778,0x0000000800000000)
to space 1247232K, 0% used [0x0000000767c00000,0x0000000767c00000,0x00000007b3e00000)
After testing this with SPECjbb for a while I noticed some pages always end up bound to the wrong node. I think this is a regression caused by JDK-8283935 but I'll raise a separate ticket for that.
-------------
Commit messages:
- 8298482: Implement ParallelGC NUMAStats for Linux
Changes: https://git.openjdk.org/jdk/pull/11635/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11635&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8298482
Stats: 42 lines in 6 files changed: 7 ins; 14 del; 21 mod
Patch: https://git.openjdk.org/jdk/pull/11635.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11635/head:pull/11635
PR: https://git.openjdk.org/jdk/pull/11635
More information about the hotspot-dev
mailing list