RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v3]

Damon Fenacci dfenacci at openjdk.org
Mon Jul 24 11:29:42 UTC 2023


On Fri, 21 Jul 2023 09:28:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Hi @dafedafe,
>> 
>> good catch! May I propose a much simpler fix, though?
>> 
>> The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`.
>> 
>> `CodeCache::page_size()` calls one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, both of which already try to fit the given memory region by iterating over page sizes, exactly as you do.
>> 
>> If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. 
>> 
>> Arguably, one should calculate the page size just once and then always use that calculated value, instead of recalculating it differently each time.
>> 
>> 
>> diff --git a/src/hotspot/share/code/codeCache.cpp b/src/hotspot/share/code/codeCache.cpp
>> index 2ea72a1fcbd..7a30bfb1783 100644
>> --- a/src/hotspot/share/code/codeCache.cpp
>> +++ b/src/hotspot/share/code/codeCache.cpp
>> @@ -356,7 +356,7 @@ size_t CodeCache::page_size(bool aligned, size_t min_pages) {
>>  
>>  ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) {
>>    // Align and reserve space for code cache
>> -  const size_t rs_ps = page_size();
>> +  const size_t rs_ps = page_size(false, 8);
>>    const size_t rs_align = MAX2(rs_ps, os::vm_allocation_granularity());
>>    const size_t rs_size = align_up(size, rs_align);
>>    ReservedCodeSpace rs(rs_size, rs_align, rs_ps);
>> 
>> 
>> On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache:
>> 
>> 
>> thomas at starfish$ ./images/jdk/bin/java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version
>> [0.001s][info][pagesize] Static hugepage support:
>> [0.001s][info][pagesize]   hugepage size: 2M, nr_hugepages: 2000, nr_overcommit_hugepages: 0
>> [0.001s][info][pagesize]   hugepage size: 1G, nr_hugepages: 5, nr_overcommit_hugepages: 0
>> [0.001s][info][pagesize]   default hugepage size: 2M
>> [0.001s][info][pagesize] Transparent hugepage (THP) support:
>> [0.001s][info][pagesize]   THP mode: always
>> [0.001s][info][pagesize]   THP pagesize: 2M
>> [0.001s][info][pagesize] Overriding default large page size (2M) using LargePageSizeInBytes: 1G
>> [0.001s][info][pagesize] UseLargePages=1, UseTransparentHugePages=0, UseHugeTLBFS=1, UseSHM=0
>> [0.001s][info][pagesize] Large page support enabled....
>
>> Thanks a lot for your review @tstuefe
>> 
>> > The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`.
>> > `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do.
>> > If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away.
>> 
>> You're right, it is much simpler! I noticed the 8 passed earlier, but I came to this solution because I wondered whether we really want a minimum of 8 pages, especially when using large pages (on the other hand, I didn't want to change the earlier code, as it is also used for non-large pages). But I'm not sure this makes sense, as I might not have the full picture.
> 
> I think we don't need 8, but at least 3 if the code cache size is exactly 3*pagesize; otherwise we need at least 6:
> 
> https://github.com/openjdk/jdk/blob/55aa122462c34d8f4cafa58f4d1f2d900449c83e/src/hotspot/share/code/codeCache.cpp#L315-L318
> 
> for all these align operations to end up with non-zero results for every segment.
> 
> But I would keep it at 8: change one thing at a time (and I may be overlooking another reason for that minimum page number). That solves the immediate problem.
> 
> Thinking further, I believe there is a case for keeping all segments in one 1G page:
> - if we don't plan on uncommitting the heap, there is no need to align the segments to page boundaries. So we could run with a single static 1 GB hugepage.
> - if we plan to uncommit the heap, each segment should be larger than 1 page, since we probably never will be able to uncommit a segment fully.
> 
> But that's for a future RFE. The solution I proposed has the advantage that it's easy to downport, since it's minimally invasive.
> 
>> 
>> > On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache:
>> 
>> You're right, with that machine (which has the same large page configuration as the one I used for testing) the result is the same, but I was wondering whether this is always the case (e.g. with a large page of 1/2G).
> 
> I believe that as long as the number of pages is > 6, this should always work.

Hi @tstuefe 

> But I would keep it at 8. Just change one thing at a time (and I may overlook another reason for that minimum page number). That solves the immediate problem.

Fair enough: I've changed the code to use a minimum of 8 pages (and to calculate the page size only once). I haven't made it the default value for `min_pages`, since `CodeCache::page_size` is also used here to enable `SegmentedCodeCache`:
https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/compiler/compilerDefinitions.cpp#L318-L323 
I've also left the warning for when we use large pages and we cannot reserve pages of that size.

Thanks a lot!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14903#issuecomment-1647728413

