[jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size

Thomas Schatzl tschatzl at openjdk.java.net
Mon Jan 11 14:59:00 UTC 2021


On Sat, 9 Jan 2021 11:36:31 GMT, Patrick Zhang <qpzhang at openjdk.org> wrote:

>> Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later.
> 
> This cannot solve the problem completely, e.g., [HugeTLB Pages](https://github.com/torvalds/linux/blob/a09b1d78505eb9fe27597a5174c61a7c66253fe8/Documentation/admin-guide/mm/hugetlbpage.rst): "_x86 CPUs normally support 4K and 2M (1G if architecturally supported)_". Should there be an x64 system configured with 1GB large pages, the current 4MB chunk size would show the same regression, I believe.
> This was probably the reason why `-XX:PreTouchParallelChunkSize` defaults to 1GB, which covers all kinds of large pages in modern kernels/architectures.
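
(For readers following along, here is a minimal sketch of the capping idea in the PR title, with hypothetical names that do not match the actual HotSpot code. The point is that a pretouch chunk smaller than the page size lets several workers fault into the same huge page and serialize on the kernel's page allocation/zeroing.)

    #include <algorithm>
    #include <cstddef>

    // Hypothetical sketch: cap the parallel pretouch chunk size with the
    // actual page size in use (names do not match HotSpot's internals).
    static size_t pretouch_chunk_size(size_t requested, size_t page_size) {
      // Never go below one page: a sub-page chunk would make several
      // workers fault the same (huge) page and serialize on the kernel.
      size_t chunk = std::max(requested, page_size);
      // Round down to a page multiple so each page belongs to one worker.
      return chunk - (chunk % page_size);
    }

    // Each worker touches one byte per page within its assigned chunk,
    // assuming freshly committed (zeroed) memory.
    static void pretouch_range(char* start, char* end, size_t page_size) {
      for (volatile char* p = start; p < end; p += page_size) {
        *p = 0;  // the write faults the page in
      }
    }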

As for the expected regression with 1g pages when using a 4m chunk size instead of a 1g chunk size: interestingly, on Linux, without THP, the 4m chunk size is faster for a simple "Hello World" app. I already noticed that yesterday, and re-verified it today on different machines and with heap sizes up to 2TB.

However, this seems to be an artifact of the test: when comparing the log message times (the `Running G1 PreTouch with X workers for ...` lines shown with gc+heap=debug), they are the same.
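
For reference, those per-pretouch log lines can be observed with an invocation along these lines (flag values are just examples; heap size and chunk size are the knobs being compared):

    java -XX:+UseLargePages -XX:+AlwaysPreTouch \
         -XX:PreTouchParallelChunkSize=4m \
         -Xlog:gc+heap=debug -Xmx64g HelloWorld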

-------------

PR: https://git.openjdk.java.net/jdk16/pull/97


