RFR: 8324776: runtime/os/TestTransparentHugePageUsage.java fails with The usage of THP is not enough

Liming Liu duke at openjdk.org
Thu Apr 18 03:40:08 UTC 2024


On Wed, 17 Apr 2024 15:28:10 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> This PR removes the testcase introduced in JDK-8315923, as we could not find a reliable way to evaluate the usage of THP. We have tried the following methods:
>> 
>> 1. traverse /proc/self/smaps rather than looking up the first map covered by the heap, as we found there can be multiple sections in /proc/self/smaps for the heap; (https://github.com/limingliu-ampere/jdk/commit/c5b0c4cdf9fa42988faa9fee6ee004ebb599d40a)
>> 2. take the defrag mode and whether khugepaged is enabled into account rather than just the THP mode, as THP may not be available immediately when the defrag mode is neither "always" nor "madvise", or when khugepaged does not collapse pages; (https://github.com/limingliu-ampere/jdk/commit/9c70e9384325b44e074a9e8973846343b27fd2cc)
>> 3. call madvise with MADV_HUGEPAGE unconditionally rather than calling it only when THP mode is not "always", and adjust the sizes of young and old generations to ensure the parameters are aligned with THP; (https://github.com/limingliu-ampere/jdk/commit/de9607ff64cc526bca9968b72a7065888c2f944d)
>> 4. check the changes of system-wide counters like thp_* in /proc/vmstat before and after pretouch via gtest. (https://github.com/limingliu-ampere/jdk/commit/bc83e19a682156ee7d09bf939c2b18f3d8c79e22)
>> 
>> But none of them helps. The amount of THP stays at zero on Oracle CI, although the THP mode is "always", the defrag mode is "madvise" and khugepaged is enabled. Furthermore, none of the thp counters changed around pretouch. However, we tried the same kernel (5.15-UEK) as Oracle CI on our machine, and found that these methods do help there. Thus, we decided to remove this testcase.
>
> JDK-8315923 also added `pretouch_thp_and_use_concurrent` in
> test/hotspot/gtest/runtime/test_os_linux.cpp. What is still executing
> that test code?

Hi @dcubed-ojdk, JDK-8315923 introduced two testcases: `pretouch_thp_and_use_concurrent` and `TestTransparentHugePageUsage`. The first testcase was added for JDK-8272807, to check whether the ability introduced by JDK-8272807 gets broken by JDK-8315923 or by any further changes to pretouch. That testcase still executes. For example, the file `GTestWrapper.jtr` you posted contains:

[ RUN      ] os_linux.pretouch_thp_and_use_concurrent_vm
[       OK ] os_linux.pretouch_thp_and_use_concurrent_vm (413 ms)

The second testcase was meant to check whether the benefits brought by JDK-8315923 get broken by future changes. However, as described above, this turned out to be hard to evaluate, especially with the thp counters in /proc/vmstat. According to the kernel documentation [1], when defrag is "madvise", the kernel should stall and collapse THP on demand, so I expected these counters to change after pretouch. Even if the collapse of THP failed, counters like thp_fault_fallback should grow. But that does not seem to be the case. So I think it would be better to remove the testcase. Any advice on how to check THP usage is welcome.
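
For illustration only, the before/after comparison of the thp_* counters in /proc/vmstat can be sketched in Java as below. This is a minimal sketch, not the code from the linked commit (which did the check via gtest); the class name `ThpVmstatDelta`, its method names, and the array-touching workload in `main` are hypothetical stand-ins. The counters are system-wide, so other processes can also change them between the two snapshots.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the JDK test sources.
class ThpVmstatDelta {
    // Parse "name value" lines and keep only the thp_* counters.
    static Map<String, Long> parseThpCounters(List<String> lines) {
        Map<String, Long> counters = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[0].startsWith("thp_")) {
                counters.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return counters;
    }

    // Return only the counters whose value changed, mapped to (after - before).
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> changed = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long d = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (d != 0) {
                changed.put(e.getKey(), d);
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        Path vmstat = Path.of("/proc/vmstat");
        if (!Files.isReadable(vmstat)) {
            System.out.println("/proc/vmstat is not readable on this system");
            return;
        }
        Map<String, Long> before = parseThpCounters(Files.readAllLines(vmstat));

        // Stand-in workload (an assumption): touch one byte per 4 KiB of a 64 MiB array.
        // A real check would take the snapshots around the JVM's pretouch instead.
        byte[] buffer = new byte[64 * 1024 * 1024];
        for (int i = 0; i < buffer.length; i += 4096) {
            buffer[i] = 1;
        }

        Map<String, Long> after = parseThpCounters(Files.readAllLines(vmstat));
        System.out.println("changed thp counters: " + delta(before, after));
    }
}
```

An empty result from `delta` corresponds to the situation described above, where none of the thp counters moved around pretouch.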

[1] https://docs.kernel.org/admin-guide/mm/transhuge.html
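
For reference, the first method in the quoted description (traversing /proc/self/smaps instead of looking only at the first mapping covered by the heap) can be sketched as follows. This is a minimal sketch under assumptions: the class `SmapsThpSum` and its method are hypothetical, the address range is passed in by the caller, and the code is not taken from the linked commit. It sums the `AnonHugePages` field of every mapping that overlaps the half-open range [start, end).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK test sources.
class SmapsThpSum {
    // Mapping header, e.g. "7f0000000000-7f0000200000 rw-p 00000000 00:00 0".
    private static final Pattern HEADER =
            Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s.*");
    // Per-mapping field, e.g. "AnonHugePages:      2048 kB".
    private static final Pattern ANON_HUGE =
            Pattern.compile("^AnonHugePages:\\s+(\\d+)\\s+kB\\s*$");

    // Sum AnonHugePages (in kB) over all mappings overlapping [start, end).
    // Addresses are treated as unsigned 64-bit values.
    static long sumAnonHugePagesKb(List<String> lines, long start, long end) {
        long totalKb = 0;
        boolean inRange = false;
        for (String line : lines) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                long mapStart = Long.parseUnsignedLong(header.group(1), 16);
                long mapEnd = Long.parseUnsignedLong(header.group(2), 16);
                inRange = Long.compareUnsigned(mapStart, end) < 0
                        && Long.compareUnsigned(mapEnd, start) > 0;
                continue;
            }
            Matcher field = ANON_HUGE.matcher(line);
            if (inRange && field.matches()) {
                totalKb += Long.parseLong(field.group(1));
            }
        }
        return totalKb;
    }

    public static void main(String[] args) throws IOException {
        Path smaps = Path.of("/proc/self/smaps");
        if (!Files.isReadable(smaps)) {
            System.out.println("/proc/self/smaps is not readable on this system");
            return;
        }
        // The whole address space is used here; a real check would pass the heap range.
        long kb = sumAnonHugePagesKb(Files.readAllLines(smaps), 0L, -1L);
        System.out.println("AnonHugePages total: " + kb + " kB");
    }
}
```

Summing over all overlapping mappings matters because, as noted in the description, the heap can appear as multiple sections in /proc/self/smaps.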

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18792#issuecomment-2062931378

