RFR: JDK-8304954: SegmentedCodeCache fails when using large pages
Damon Fenacci
dfenacci at openjdk.org
Fri Jul 21 09:02:41 UTC 2023
On Thu, 20 Jul 2023 14:03:39 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> # Issue
>>
>> When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`).
>> This behaviour is incorrect: the VM should instead fall back to a smaller page size (and print a warning).
>>
>> # Solution
>>
>> Right before reserving heap space for the code segments, we introduce a check (when large pages are in use) that computes the total size of the aligned code segments and selects progressively smaller page sizes until the total fits within the reserved code cache size.
>>
>> # Test
>>
>> The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks whether the two flags have been "taken". If so, another process is started whose output is checked for a specific message; otherwise the test passes (i.e. the current system doesn't support 1GB large pages).
>
> Hi @dafedafe,
>
> good catch! May I propose a much simpler fix, though?
>
> The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`.
>
> `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do.
>
> If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away.
>
> Arguably, one should calculate the page size just once and then always use that calculated value, instead of recalculating it differently each time.
>
>
> diff --git a/src/hotspot/share/code/codeCache.cpp b/src/hotspot/share/code/codeCache.cpp
> index 2ea72a1fcbd..7a30bfb1783 100644
> --- a/src/hotspot/share/code/codeCache.cpp
> +++ b/src/hotspot/share/code/codeCache.cpp
> @@ -356,7 +356,7 @@ size_t CodeCache::page_size(bool aligned, size_t min_pages) {
>
> ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) {
> // Align and reserve space for code cache
> - const size_t rs_ps = page_size();
> + const size_t rs_ps = page_size(false, 8);
> const size_t rs_align = MAX2(rs_ps, os::vm_allocation_granularity());
> const size_t rs_size = align_up(size, rs_align);
> ReservedCodeSpace rs(rs_size, rs_align, rs_ps);
>
>
> On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache:
>
>
> thomas at starfish$ ./images/jdk/bin/java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version
> [0.001s][info][pagesize] Static hugepage support:
> [0.001s][info][pagesize] hugepage size: 2M, nr_hugepages: 2000, nr_overcommit_hugepages: 0
> [0.001s][info][pagesize] hugepage size: 1G, nr_hugepages: 5, nr_overcommit_hugepages: 0
> [0.001s][info][pagesize] default hugepage size: 2M
> [0.001s][info][pagesize] Transparent hugepage (THP) support:
> [0.001s][info][pagesize] THP mode: always
> [0.001s][info][pagesize] THP pagesize: 2M
> [0.001s][info][pagesize] Overriding default large page size (2M) using LargePageSizeInBytes: 1G
> [0.001s][info][pagesize] UseLargePages=1, UseTransparentHugePages=0, UseHugeTLBFS=1, UseSHM=0
> [0.001s][info][pagesize] Large page support enabled. Usable page sizes: 4k, 2M, 1G. Default large page size: 1G.
> ...
> [0.002s][info ][pag...
Thanks a lot for your review @tstuefe
> The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`.
>
> `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do.
>
> If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away.
You're right, it is much simpler!
I noticed the 8 passed earlier, but I came to this solution because I was wondering whether we really want to require a minimum of 8 pages, especially when using large pages (on the other hand, I didn't want to change the earlier code, since it is used for non-large pages as well). But I'm not sure this makes sense, as I might not have the full picture.
> On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache:
You're right: with that machine (which has the same large page configuration as the one I used to test) the result is the same, but I was wondering whether this is always the case (e.g. with a large page size of 1/2G).
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14903#issuecomment-1645247576