RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size

Thomas Stuefe stuefe at openjdk.java.net
Thu Dec 3 20:26:59 UTC 2020


On Thu, 26 Nov 2020 08:48:43 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> Hi Thomas,
>> 
>> first off, the last thing I want is for matters to become more complicated. God no.
>> 
>> Today the virtual memory layer is inconsistent and undocumented and has subtle bugs. Among other things this means that PRs can take forever to discuss, which I guess is frustrating for newcomers. Or if they are pushed they can cause a lot of cleanup work afterwards.
>> 
>> Just attempting to write a consistent API description of the virtual memory layer, parameters expected behavior etc, is very difficult. I know since I tried. And that is a bad sign. Tests are missing too, since what you cannot describe you cannot test. For an interesting example, see JDK-8255978. That bug had been in there since at least JDK8.
>> 
>> One problem I see recurring with these APIs is that meta information about a reservation is needed later but is lost. Some examples:
>> - the exact flags with which a mapping was established (exec or not, MAP_JIT, etc)
>> - which API has been used (AIX)
>> - what the page sizes were at reservation time
>> - whether the area had been stitched together with multiple mappings or a single one
>> .. etc.
>> 
>> The code then often tries to guess this information from "circumstantial evidence", which introduces lots of strange dependencies and makes the code fragile. ReservedSpace::actual_page_size() in this patch is one example.
>> 
>> A better way would be to have this information kept somewhere. Ideally, the OS would do this and I could ask it for properties of a memory mapping. In reality, the OS is not so forthcoming. So we should keep this information ourselves.
>> 
>> One such piece of information is the page sizes of a region. In that light you can see my proposal. I did not propose to change a single atom about how os::reserve_memory_special() works. I just proposed that it be honest and return information about what it did, so that this information could be kept inside ReservedSpace and reused instead of guessed. And since the reality today is that os::reserve_memory_special() can reserve multiple page sizes, a correct implementation would have to reflect that or be wrong.
>> 
>> You are right, reporting multiple page sizes is more difficult than just one. But even more complex is what we have today, where a mapping can have multiple page sizes but this is not acknowledged by upper layers. In this case, we have ReservedSpace::actual_page_size() which plain cannot be implemented correctly.
>> 
>> I am fine with the idea of restricting RS to just one page size. But then let's do that and get rid of mixed TLBFS reservation (and whatever Windows does, they may be doing something similar). We also should specify the page size as an explicit parameter and not confuse it with the alignment.
>> 
>>> That code may be an artifact of 32-bit machines from 10+ years ago, where address space fragmentation was a real concern (typically on Windows), or where RAM sizes were measured in hundreds of MB instead of GBs and (on Linux) pre-reserving hundreds of MB for huge pages has been an issue.
>> 
>> I am not sure what coding you refer to.
>> 
>>> and that Intel wants to further complicate it and add 2M pages, means we have a need here.
>> 
>>> The JDK-8256155 (Intel) CR does not state that requirement. Imho it only says that the author wants to use (any and all) configured pages for different reserved spaces. E.g. the machine has configured (on x64, to simplify the case): 10 1G pages and 1000 2M pages. The heap should use the 1G pages (assuming it's less than 10G); other reservations like the code heap should first use the 2M pages before falling back to other page sizes, to better use available TLB cache entries. I would prefer if we do not overcomplicate the requirements :)
>> 
>> Flexible stitching would have one advantage though, in that a huge page pool could be better used. In your example, if someone starts with a 12G heap, it would fail since no pool individually is large enough, but combining different page sizes would work.
>> 
>> I won't insist on this. As I wrote, I am fine with one-pagesize-per-mapping if we can arrive there.
>> 
>>> Also probably this should be asked and followed up in the correct review thread.
>> 
>> Since it was relevant to this PR, I did mention it here.
>> 
>>> I'd like a simpler, maybe less flexible, lower layer that is understandable by mere mortals :) and does not make upper layers too complicated.
>> 
>> I do too.
>> 
>>>> <snip>
>>> This is a kind of one-sided argument, taking only commit into account. Since actually giving back memory is expected nowadays, taking care of different random page sizes is complicated. E.g. when implementing G1's region memory management (in 8u40), the decision to only support a single page size for every one of its GC data structures was a conscious one - because the complexity overhead did not justify the gains. Nobody has complained yet.
>> 
>> I may be missing something here, but is huge-paged space not pinned? How can it be uncommitted? Okay yes, if it can be uncommitted, multiple page sizes should be avoided.
>> 
>>>> For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can.
>> 
>>> That's true, but that Metaspace layer then needs to be aware of multiple page sizes when uncommitting, and (presumably) tracking liveness on the lowest granularity anyway. Which does not make the code easier.
>> 
>> See above. I was under the assumption that uncommit goes out of the window the moment you choose explicit large pages.
>> 
>>> Thanks, Thomas
>> 
>> Thanks, Thomas
>
> You are correct that explicit huge pages are pinned and of course there might be situations where it would be beneficial to do the stitching, but I think the maintenance cost will be significantly higher. It sounds like you would prefer a simpler model as well and it would certainly be interesting to see if there are any problems connected to getting rid of the mixed mappings we have today.
> 
> I also agree that we should do better bookkeeping since it is so hard to get the information later on.

Hi guys,

gentle ping. I know we are all busy with the impending code freeze :)

I am fine with removing the ability to reserve space with multiple page sizes. But that is something I cannot drive - probably someone at Oracle should, since it has repercussions for GC and for backward compatibility. Might also take a bit longer.
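
To illustrate what a one-page-size-per-reservation policy could look like, here is a minimal sketch (not HotSpot code; PagePool and choose_single_page_size are made-up names): walk the configured huge page pools from largest to smallest and pick the first page size that can cover the whole reservation by itself, instead of stitching mixed mappings.

```cpp
#include <cstddef>

// Hypothetical sketch, not the real os::reserve_memory_special() logic.
// Each pool describes one explicit huge page size and its free capacity,
// e.g. 10 x 1G pages and 1000 x 2M pages as in the example above.
struct PagePool {
  size_t page_size;   // bytes per page, e.g. 1G or 2M
  size_t free_pages;  // pages currently available in this pool
};

// Returns the single page size chosen for the whole reservation, or 0 if
// no pool can cover it alone and we must fall back to base pages.
// 'pools' must be sorted by page_size, largest first.
size_t choose_single_page_size(const PagePool* pools, int num_pools,
                               size_t bytes) {
  for (int i = 0; i < num_pools; i++) {
    const PagePool& p = pools[i];
    if (bytes % p.page_size == 0 &&
        bytes / p.page_size <= p.free_pages) {
      return p.page_size;  // entire reservation fits in one page size
    }
  }
  return 0;  // no single pool suffices: fall back to the base page size
}
```

Note how this reproduces the trade-off discussed above: a 12G request fails against both pools individually (more than 10 1G pages, more than 1000 2M pages) and falls back to base pages, whereas stitching could have satisfied it from the combined pools.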

Meanwhile, we still have the new ReservedSpace::actual_page_size(), which is subtly broken and will be more so once JDK-8234930 is implemented. IIUC you did not like the proposal of making os::reserve_memory_special() return information about the reserved page sizes. That's totally fine with me, but have we decided on how to deal with this instead?
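
The bookkeeping idea can be sketched in a few lines (illustrative names only, not the real ReservedSpace API): record the page size when the mapping is established, so that a later actual_page_size() query is a stored fact rather than a guess from alignment or flags. This only works if the reservation has exactly one page size, which is precisely why mixed mappings make it unimplementable.

```cpp
#include <cstddef>

// Hypothetical sketch: a reservation that remembers its own page size.
class RecordedReservation {
  char*  _base;
  size_t _size;
  size_t _page_size;  // recorded by whoever created the mapping
public:
  RecordedReservation(char* base, size_t size, size_t page_size)
    : _base(base), _size(size), _page_size(page_size) {}
  // No guessing from circumstantial evidence: the answer was stored
  // at reservation time.
  size_t actual_page_size() const { return _page_size; }
  size_t size() const { return _size; }
};
```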

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161


More information about the hotspot-dev mailing list