RFR: JDK-8310111: Shenandoah wastes memory when running with very large page sizes [v2]

Thomas Stuefe stuefe at openjdk.org
Fri Jun 30 12:25:53 UTC 2023


On Fri, 30 Jun 2023 09:38:33 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I do wonder if we want to cobble together all Shenandoah bulk memory structures in a single `ReservedSpace` unconditionally, which could then use the single (large) page size, if needed. It feels redundant to try and allocate some data-structures separately and then solve the overdraft factor, splitting and piggy-backing? I think that logic would be even more complicated when GenShen comes in with additional RemSet data structure.
> 
> I.e. do a ReservedSpace, which _always_ gets the largest page size, and always contains:
> 
>     1. Collection set bitmap
> 
>     2. Regions storage
> 
>     3. Remembered set (from GenShen)
> 
>     4. Mark bitmap
> 
>     5. Aux bitmap
> 
> 
> The RS base selection would be driven by (1), which wants a specific address. (1)..(4) are committed at init. (5) is committed on demand, if page sizes allow. The "waste" on large pages for smaller (1) and (2) would be subsumed by (4) and (5) that are in the same RS.
> 
> So if we are lucky, we would be able to fit everything in a 1G page, like one of our examples shows, but in a simpler way.
> 

I like it; it's simpler. But there are some drawbacks you need to okay beforehand:

- The cset is normally tiny, so the chance of mapping it into lower address regions is very good. The combined-everything RS would be a lot larger, therefore diminishing the chance of mapping into lower regions. It would also reduce the chance of mapping the Java heap into lower regions, especially if we ever try to attach the heap for zero-based *and* zero-shift oops mode, which at the moment we don't even try.

- Allocating everything together is an all-or-nothing deal: we need to have enough large pages for everyone, otherwise nobody gets large pages.

Both points may be acceptable drawbacks, especially the second one, since it basically comes down to the sysadmin allocating enough hugepages. Not getting them could be seen as a config error.


> The problem I would see is that mark bitmap uncommit code would need to be adjusted a bit, if mark bitmap bases would not be aligned to page sizes anymore, but since they are the last in RS, we can do a bit of internal alignment to fit bulk uncommits better.

I wonder if this is even an issue.

For static hugepages, we are "special" and never uncommit. For THP, would slice-wise uncommit and recommit really make sense? We'd just keep khugepaged busy. Alternatively, we could reserve huge-page-aligned and madvise in THP mode, but then never uncommit. That way, things coalesce and then stay that way.

Btw, on my machines the max. hugepage size for THP seems to be 2M. I know there was a kernel patch some time ago that tried to add 1G support, but I don't know whether that ever got integrated.

My proposal would be: as you propose, use one RS. Then:
- static huge pages - no uncommit, no problem
- THP: never uncommit, all stays committed, no problem
- Non-LP: we use tiny system pages. Align the mark bitmap start to system page size, then all works as it did before.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14559#issuecomment-1614577295

