RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap

Marcus G K Williams github.com+168222+mgkwill at openjdk.java.net
Wed Nov 18 19:18:09 UTC 2020


On Thu, 12 Nov 2020 15:36:13 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Use 2m pages for executable large
>> pages and in large page requests less
>> than 1g on linux.
>> 
>> - Add os::exec_large_page_size() that
>> returns 2m as size
>> - Add os::select_large_page_size() to return
>> correct large page size for size_t bytes
>> - Add 2m size to _page_sizes array
>> - Update reserve_memory_special methods
>> to set/use large_page_size based on exec
>> size
>> - Update large page not reserved warnings
>> to include large_page_size attempted
>> - Update TestLargePageUseForAuxMemory.java
>> to expect 2m large pages in some instances
>> 
>> Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>
> Hi,
> 
> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
> 
> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
> 
> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
> 
> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)
> 
> Why is this proposal hard coded to 2M pages?
> 
> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". 
> 
> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
> 
> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
> 
> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
> 
> For SHM, I think you need to make sure that alignment matches SHMLBA?
> 
> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
> 
> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
> 
> The linux-2m-page-specific code in the platform-generic G1 test seems wrong.
> 
> Cheers, Thomas

Hi Thomas,

Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some.

**Responses below inline:**

> Hi,
> 
> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
> 
> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
> 

I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance). 

To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.

> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
> 
> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)

I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted.

> 
> Why is this proposal hard coded to 2M pages?
> 

To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. Also it was an implementation suggestion from some Oracle engineers on the topic, though my implementation of those suggestions could be suspect. Perhaps we should detect the smallest large page size in _page_sizes array that will fit the requested memory amount. However populating _page_sizes array is complicated by the fact that the current jdk code relies heavily on the default large page size, almost as if that is the only size that can be used. 2M large page sizes are the default default_large_page_size in Linux and should be available even if one configures the default_large_page_size to 1G.

> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
> 

Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 
This is where 2m pages are added.

However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077
we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G

So in
https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229
we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested.

> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
> 
> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
> 

My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back. 

> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
> 
> For SHM, I think you need to make sure that alignment matches SHMLBA?

Looking into this.

> 
> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
> 

I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations.

> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>

Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those.

> The linux-2m-page-specific code in the platform-generic G1 test seems wrong.

Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled?

> 
> Cheers, Thomas

Thanks again for the review.

> src/hotspot/os/linux/os_linux.cpp line 3970:
> 
>> 3968:                                             char* req_addr, bool exec) {
>> 3969:   size_t large_page_size;
>> 3970:   large_page_size = os::select_large_page_size(bytes, exec);
> 
> The "os" is shared and platform generic. Please don't add anything there unless you write (and test as much as possible) the different platforms. I do not see why this API should even be exported from this unit.

Understood. Will move from src/hotspot/share/runtime/os.hpp to src/hotspot/os/linux/os_linux.hpp .

> src/hotspot/os/linux/os_linux.cpp line 4002:
> 
>> 4000:     char msg[128];
>> 4001:     jio_snprintf(msg, sizeof(msg), "Failed to reserve shared memory with large_page_size: " SIZE_FORMAT ".", large_page_size);
>> 4002:     shm_warning_format_with_errno("%s", msg);
> 
> Why the double printf here? But you can just use Univeral Logging ` log_info(os)("..") `. See e.g. thread creation in this file for examples.

Looking into this.

> test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 80:
> 
>> 78:     }
>> 79: 
>> 80:     static void testVM(String what, long heapsize, boolean cardsShouldUseLargePages, boolean bitmapShouldUseLargePages, boolean largePages2m) throws Exception {
> 
> Having this linux-specific stuff in a generic G1 test :(

Understood. Not sure how others would handle this as assumptions would change if my code is merged and one gets 2M large pages on smaller mem reservations but only on linux. Any advice?

> test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 150:
> 
>> 148:         if (Platform.isLinux() && largePageSize != largePageExecSize) {
>> 149:             try {
>> 150:                 Scanner scan_hugepages = new Scanner(new File("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"));
> 
> 2M hard coded.

Understood. Not sure how others would handle this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153



More information about the hotspot-gc-dev mailing list