RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16]

Thomas Stuefe stuefe at openjdk.java.net
Wed Feb 24 08:25:48 UTC 2021


On Tue, 16 Feb 2021 16:32:56 GMT, Marcus G K Williams <github.com+168222+mgkwill at openjdk.org> wrote:

>> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using
>> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k).
>> 
>> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size().
>> 
>> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved.
>
> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits:
> 
>  - Merge branch 'master' into pull/1153
>  - kstefanj update
>    
>    Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>  - Merge branch 'master' into update_hlp
>  - Merge branch 'master' into update_hlp
>  - Remove extraneous ' from warning
>    
>    Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>  - Merge branch 'master' into update_hlp
>  - Merge branch 'master' into update_hlp
>  - Merge branch 'master' into update_hlp
>  - Fix os::large_page_size() in last update
>    
>    Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>  - Ivan W. Requested Changes
>    
>    Removed os::Linux::select_large_page_size and
>    use os::page_size_for_region instead
>    
>    Removed Linux::find_large_page_size and use
>    register_large_page_sizes. Streamlined
>    Linux::setup_large_page_size
>    
>    Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>  - ... and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7

Hi Markus,

Many apologies for letting this cook too long, the last months have been hectic. I looked closer at the code today, at least the initialization parts, and have some suggestions and remarks. Will look at the runtime side later. A lot of my remarks will be referring to pre-existing code without me pointing it out each time, just know that I am aware that a lot of that stuff has nothing to do with your work.

I propose some simplifications and streamlining with initialization. Main point would be to clearly separate getting information from the OS from post-processing (consistency checks and decisions), in addition to a bit clearer naming.

---

We have `find_default_large_page_size()` and `register_large_page_sizes()`. The names could be a bit clearer, and I do not think they should be known outside of this file, so I would propose to redefine them to be local convenience functions which just scan the proc fs and do not change outside state, just return values, like this:

- `static size_t scan_default_large_page_size();`
- `static os::PageSizes scan_multiple_page_support();`

(naming is lent from vm/hugetlbpage.txt)

---

Today, in `find_default_large_page_size()`, if we have no default huge page configured, currently we return a hard coded default:

https://github.com/openjdk/jdk/blob/f2e44ac726bad2e7db1ec9f5e77703a99ccfb683/src/hotspot/os/linux/os_linux.cpp#L3627-L3636

I am not sure this makes sense. The kernel documentation states that if this entry does not exist, we cannot use huge pages. I would consider removing this and just return 0 in that case.

The point is that these low level convenience functions should read OS information and not make up stuff. Making up stuff should be done, if at all, in the caller.

---

When consistency checking and post-processing what we got from the OS, note that there are slight inconsistencies (preexisting) how we handle things:

- we gracefully handle the non-existence of /sys/kernel/mm/hugepages - or if it exists, the fact that the default page size may be missing from it - by transparently adding the default huge page size to os::_page_sizes.
- But if the user specifies UseLargePageSize in bytes, overwriting the default large page size, we now require the multi page size to be present in os::_page_sizes. So in that case, /sys/kernel/mm/hugepages had to be present.

I mean, either we trust /sys/kernel/mm/hugepages, or we don't. We happily make up page sizes in find_default_large_page_size(), but here we check rather strictly. 

It makes sense to check the user input for validity, but then, could we not always just require /sys/kernel/mm/hugepages to be present and consistent with /proc/meminfo?

---

I am not sure of the usefulness of `os::Linux::setup_large_page_size()`. Its just a thin wrapper. I would remove it and merge it directly into `os::large_page_init()`, which would be easier to understand.

So, `os::large_page_init()` could look like this:

void os::large_page_init() {
  // 1) Handle the case where we do not want to use huge pages and hence
  //    there is no need to scan the OS for related info
  if (!UseLargePages &&
      !UseTransparentHugePages &&
      !UseHugeTLBFS &&
      !UseSHM) {
    // Not using large pages.
    return;
  }

  if (!FLAG_IS_DEFAULT(UseLargePages) && !UseLargePages) {
    // The user explicitly turned off large pages.
    // Ignore the rest of the large pages flags.
    UseTransparentHugePages = false;
    UseHugeTLBFS = false;
    UseSHM = false;
    return;
  }

  // 2) Scan OS info
  size_t default_large_page_size = scan_default_large_page_size();
  if (default_large_page_size == 0) {
    // We are done, no large pages configured.
    UseTransparentHugePages = false;
    UseHugeTLBFS = false;
    UseSHM = false;
    return;
  }
  os::PageSizes all_pages = scan_multiple_page_support();

  // 3) Consistency check and post-processing

  // It is unclear if /sys/kernel/mm/hugepages/ and /proc/meminfo could disagree. Manually
  // re-add the default page size to the list of page sizes to be sure.
  all_pages.add(default_large_page_size);

  // Handle LargePageSizeInBytes
  if (!FLAG_IS_DEFAULT(LargePageSizeInBytes) && LargePageSizeInBytes != _default_large_page_size) {
    ... blabla
    default_large_page_size = LargePageSizeInBytes
    log_info(os)("Overriding default huge page size..");
    ...
  }

  // Now determine the type of large pages to use:
  os::Linux::setup_large_page_type()

  set_coredump_filter(LARGEPAGES_BIT);

  // Any final logging:
  logloglog

}

What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it.

Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. But thats just a small issue.

src/hotspot/os/linux/os_linux.cpp line 3679:

> 3677:         sscanf(entry->d_name, "hugepages-%zukB", &page_size) == 1) {
> 3678:       // The kernel is using kB, hotspot uses bytes
> 3679:       if (page_size * K > (size_t)Linux::page_size()) {

I do not think excluding the base page size is needed here. The directory only contains entries for huge pages. If for any weird reason this is the same as the base page size (which I have never seen) I would include it, since its a huge page too. But I do not think this can happen.

src/hotspot/os/linux/os_linux.cpp line 3670:

> 3668:     // If we can't open /sys/kernel/mm/hugepages
> 3669:     // Add _default_large_page_size to _page_sizes
> 3670:     _page_sizes.add(_default_large_page_size);

missing return here.

src/hotspot/os/linux/os_linux.cpp line 3692:

> 3690:     ls.print("Available page sizes: ");
> 3691:     _page_sizes.print_on(&ls);
> 3692:   }

Does this work and show something? I know UL is initialization time sensitive (which is annoying btw).

-------------

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1153



More information about the hotspot-gc-dev mailing list