RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size

Thomas Stuefe stuefe at openjdk.java.net
Wed Nov 25 18:40:01 UTC 2020


On Wed, 25 Nov 2020 17:01:13 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Yes, but I think it makes sense to allow the os layer to mix page sizes for large paged areas, for performance reasons. The fact that this coding exists, and that Intel wants to further complicate it and add 2M pages, means we have a need here. 
>> 
>> Trying to avoid this just means that people add patches sideways to satisfy specific needs, which hurts maintainability. Also, I don't like to discourage first time contributors with lots of concerns, therefore I'd like a cleaner, more flexible os layer.
>> 
>> But no-one forces you to accept multi-page-sized-areas. If you really want just one page size, you can query the largest page size available beforehand and align the reservation size accordingly, and with my proposed change you could now assert the result and log it correctly.
>> 
>> But if one just generally wants large pages without caring for the precise layout, one could let os::reserve_memory_special() do its best, and would now get the information about what reserve_memory_special() did. 
>> 
>> For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can.
>
> Hi,
> 
> On 25.11.20 16:52, Thomas Stuefe wrote:
>> Yes, but I think it makes sense to allow the os layer to mix page sizes 
>> for large paged areas, for performance reasons. The fact that this 
>> coding exists, 
> 
> That code may be an artifact of 32 bit machines from 10+ years ago where 
> address space fragmentation has been a real concern (typically Windows). 
> Or RAM sizes were measured in hundreds of MB instead of GBs where (on 
> Linux) pre-reserving hundreds of MB for huge pages is/has been an issue.
> 
>> and that Intel wants to further complicate it and add 2M 
>> pages, means we have a need here.
> 
> The JDK-8256155 (Intel) CR does not state that requirement. Imho it only 
> says that the author wants to use (any and all) configured pages for 
> different reserved spaces.
> 
> E.g. the machine has (on x64, to simplify the case) configured:
> 
> 10 1G pages
> 1000 2M pages
> 
> so the heap should use the 1G pages (assuming it's less than 10G), other 
> reservations like code heap should first use the 2M pages before 
> falling back to other page sizes to better use available TLB cache entries.
> 
> I would prefer if we do not overcomplicate the requirements :) Also 
> probably this should be asked and followed up in the correct review thread.
> 
>> Trying to avoid this just means that people add patches sideways to 
>> satisfy specific needs, which hurts maintainability. Also, I don't like 
>> to discourage first time contributors with lots of concerns, therefore 
>> I'd like a cleaner, more flexible os layer.
> 
> I'd like a simpler, maybe less flexible, but understandable-by-mere-mortals 
> OS layer :) i.e. a lower layer that does not make upper layers too 
> complicated.
> 
>> But no-one forces you to accept multi-page-sized-areas. If you really 
>> want just one page size, you can query the largest page size available 
>> beforehand and align the reservation size accordingly, and with my 
> 
> Which is no issue with 64 bit machines at all, but probably has been 
> with the prevalence of 32 bit address spaces.
> 
>> proposed change you could now assert the result and log it correctly.
>> 
>> But if one just generally wants large pages without caring for the 
>> precise layout, one could let os::reserve_memory_special() do its best, 
>> and would now get the information about what reserve_memory_special() did.
> 
> This is a kind of one-sided argument, taking only commit into account. 
> Since actually giving back memory is expected nowadays, taking care of 
> different random page sizes is complicated.
> 
> E.g. when implementing G1's region memory management (in 8u40) the 
> decision to only support a single page size for every one of its GC data 
> structures has been a conscious one - because the complexity overhead 
> did not justify the gains.
> 
> Nobody complained yet.
> 
>> For example, were I to re-introduce large pages for Metaspace, I would 
>> like to have the luxury of just calling os::reserve_memory_special() 
>> without having to think about specific geometry - if the space is large 
>> enough for large pages, it should stitch the area together as best as it 
>> can.
> 
> That's true, but that Metaspace layer then needs to be aware of multiple 
> page sizes when uncommitting, and (presumably) tracking liveness on the 
> lowest granularity anyway. Which does not make the code easier.
> 
> Thanks,
>    Thomas

Hi Thomas,

first off, the last thing I want is for matters to become more complicated. God no.

Today the virtual memory layer is inconsistent and undocumented and has subtle bugs. Among other things this means that PRs can take forever to discuss, which I guess is frustrating for newcomers. Or, if they are pushed anyway, they can cause a lot of cleanup work afterwards.

Just attempting to write a consistent API description of the virtual memory layer - parameters, expected behavior, etc. - is very difficult. I know, since I tried. And that is a bad sign. Tests are missing too, since what you cannot describe you cannot test. For an interesting example, see JDK-8255978. That bug had been in there since at least JDK 8.

One problem I see recurring with these APIs is that meta-information about a reservation is needed later but has been lost. Some examples:
- the exact flags with which a mapping was established (exec or not, MAP_JIT, etc)
- which API has been used (AIX)
- what the page sizes were at reservation time
- whether the area had been stitched together with multiple mappings or a single one
- ... etc.

The code then often tries to guess this information from "circumstantial evidence", which introduces lots of strange dependencies and makes the code fragile. ReservedSpace::actual_page_size() in this patch is one example.

A better way would be to keep this information somewhere. Ideally, the OS would do this and I could ask it for the properties of a memory mapping. In reality, the OS is not so forthcoming. So we should keep this information ourselves.
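
To make this concrete, here is a minimal sketch of what keeping that information at reservation time could look like. The names (MappingInfo, ReservationInfo, page_size_at) are hypothetical and are not actual HotSpot classes; this only illustrates recording the metadata instead of guessing it later:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping, for illustration only: record the properties
// of each sub-mapping when the reservation is made, instead of trying to
// guess them later from "circumstantial evidence".
struct MappingInfo {
  char*  base;        // start of this sub-mapping
  size_t size;        // length in bytes
  size_t page_size;   // page size the OS actually used here
  bool   executable;  // flags the mapping was created with
};

struct ReservationInfo {
  std::vector<MappingInfo> mappings;  // one entry per stitched sub-mapping

  // With the metadata recorded, a query like "which page size backs this
  // address?" becomes a simple lookup instead of a guess.
  size_t page_size_at(const char* addr) const {
    for (const MappingInfo& m : mappings) {
      if (addr >= m.base && addr < m.base + m.size) {
        return m.page_size;
      }
    }
    return 0;  // address not part of this reservation
  }
};
```

An os::reserve_memory_special() that filled in such a record on return would let ReservedSpace answer page-size queries correctly even for areas stitched together from multiple page sizes.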

One such piece of information is the set of page sizes of a region, and it is in that light that my proposal should be seen. I did not propose to change a single atom about how os::reserve_memory_special() works. I just proposed that it be honest and return information about what it did, so that this information can be kept inside ReservedSpace and reused instead of guessed. And since the reality today is that os::reserve_memory_special() can reserve with multiple page sizes, a correct implementation has to reflect that or be wrong.

You are right, reporting multiple page sizes is more difficult than reporting just one. But even more complex is what we have today, where a mapping can have multiple page sizes but this is not acknowledged by the upper layers. In this case we have ReservedSpace::actual_page_size(), which plainly cannot be implemented correctly.

I am fine with the idea of restricting ReservedSpace to just one page size. But then let's do that and get rid of mixed hugetlbfs reservation (and whatever Windows does; they may be doing something similar). We should also specify the page size as an explicit parameter and not conflate it with the alignment.
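
To sketch that distinction (hypothetical helper, not a proposed HotSpot signature): alignment constrains where a reservation may start, while the page size says what must back it, and a well-formed request keeps the two consistent:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper only. If the page size were an explicit parameter,
// a well-formed request would need: both values are powers of two, the
// alignment is at least the page size (so the start is page-aligned),
// and the size is a whole number of pages.
static bool is_power_of_2(size_t v) { return v != 0 && (v & (v - 1)) == 0; }

bool valid_reservation_request(size_t bytes, size_t alignment, size_t page_size) {
  return is_power_of_2(alignment) && is_power_of_2(page_size) &&
         alignment >= page_size &&   // start address will be page-aligned
         bytes % page_size == 0;     // size is a whole number of pages
}
```

Today the alignment often doubles as an implicit page-size hint; separating the two makes the contract explicit and checkable.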

> That code may be an artifact of 32 bit machines from 10+ years ago where address space fragmentation has been a real concern (typically Windows). Or RAM sizes were measured in hundreds of MB instead of GBs where (on Linux) pre-reserving hundreds of MB for huge pages is/has been an issue.

I am not sure what coding you refer to.

> and that Intel wants to further complicate it and add 2M pages, means we have a need here.

> The JDK-8256155 (Intel) CR does not state that requirement. Imho it only says that the author wants to use (any and all) configured pages for different reserved spaces. E.g. the machine has (on x64, to simplify the case) configured 10 1G pages and 1000 2M pages; then the heap should use the 1G pages (assuming it's less than 10G), and other reservations like the code heap should first use the 2M pages before falling back to other page sizes, to better use the available TLB cache entries. I would prefer if we do not overcomplicate the requirements :) 

Flexible stitching would have one advantage though, in that a huge page pool could be better used. In your example, if someone starts with a 12G heap, it would fail since no pool individually is large enough, but combining different page sizes would work.
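
A back-of-the-envelope sketch of such greedy stitching (hypothetical names, not HotSpot code). One detail worth noting: in the example above, 10 x 1G plus 1000 x 2M covers only about 11.95G, so a base-page fallback is still needed for the last ~48M:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical greedy stitcher, illustration only: given page pools
// ordered largest-first, how many bytes of a request can be covered by
// taking whole pages from each pool in turn?
struct PagePool {
  size_t page_size;   // bytes per page
  size_t pages_free;  // pages available in this pool
};

size_t coverable_bytes(size_t request, const std::vector<PagePool>& pools) {
  size_t covered = 0;
  for (const PagePool& p : pools) {
    size_t remaining = request - covered;
    size_t wanted = remaining / p.page_size;              // whole pages only
    size_t used = wanted < p.pages_free ? wanted : p.pages_free;
    covered += used * p.page_size;
    if (covered == request) break;
  }
  return covered;
}
```

With pools of 10 1G pages, 1000 2M pages, and 4K base pages, a 12G request is fully covered (10G + 2000M + 48M of base pages), whereas the 1G pool alone caps out at 10G.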

I won't insist on this. As I wrote, I am fine with one page size per mapping if we can arrive there.

> Also probably this should be asked and followed up in the correct review thread.

Since it was relevant to this PR, I did mention it here.

> I'd like a simpler, maybe less flexible but understandable by mere mortals OS layer :) lower layer that does not make upper layers too complicated.

I do too.

>> <snip>
> This is a kind of one-sided argument, taking only commit into account. Since actually giving back memory is expected nowadays, taking care of different random page sizes is complicated. E.g. when implementing G1's region memory management (in 8u40) the decision to only support a single page size for every one of its GC data structures has been a conscious one - because the complexity overhead did not justify the gains. Nobody complained yet.

I may be missing something here, but is huge-paged space not pinned? How can it be uncommitted? Okay, yes: if it can be uncommitted, multiple page sizes should be avoided.

>> For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can.

> That's true, but that Metaspace layer then needs to be aware of multiple page sizes when uncommitting, and (presumably) tracking liveness on the lowest granularity anyway. Which does not make the code easier.

See above. I was under the assumption that uncommit goes out the window the moment you choose explicit large pages.

> Thanks, Thomas

Thanks, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161

