RFR: 8267155: runtime/os/TestTracePageSizes times out [v4]

Thomas Stuefe stuefe at openjdk.java.net
Tue May 18 12:18:44 UTC 2021


On Tue, 18 May 2021 06:53:19 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This Linux-specific test parses `/proc/self/smaps` using a dotall regular expression. If part of the expression doesn't match, it explodes in complexity, leading to timeouts.
>> 
>> In our case, the `VmFlags` tag, which was introduced with kernel 3.8, was missing from smaps. I was not actually able to determine how slow the tests were; on one machine they ran for two hours before getting killed.
>> 
>> I tried to fiddle with the regular expression and gave up, instead opting to rewrite the parser to get a simple one-pass scan. This is way faster than before - on our old-kernel machines the tests complete in 2 minutes. On new kernels the test is a bit faster too.
>> 
>> In addition to rewriting the parser, I added code which copies the smaps file into the test directory before parsing it. I do this to minimize problems should the underlying proc fs content change while parsing, and to have a way to retain the parsed smaps files.
>> 
>> I also added a way to feed an external smaps file into the test. Of course the test would fail, but it was a way to test the parser.
>> 
>> Unfortunately, this does not make the test succeed. The timeouts are gone, but we still have no way to know if THPs are enabled or not. That is a separate issue though.
>> 
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Update test/hotspot/jtreg/runtime/os/TestTracePageSizes.java
>    
>    Co-authored-by: Aleksey Shipilëv <shade at redhat.com>
>  - Update test/hotspot/jtreg/runtime/os/TestTracePageSizes.java
>    
>    oops
>    
>    Co-authored-by: Aleksey Shipilëv <shade at redhat.com>
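
For illustration, the one-pass scan described in the quoted text could look roughly like the sketch below. This is not the code from the PR; the class, field, and pattern names (`SmapsSketch`, `Range`, `HEADER`, `KPS`, `FLAGS`) are made up for this example. Each line is matched on its own with a short anchored pattern, so a missing `VmFlags` line simply leaves that field empty instead of forcing a multi-line expression to backtrack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SmapsSketch {
    /** One mapping from smaps; the field names are illustrative only. */
    static final class Range {
        final String start, end;
        long kernelPageSizeKb = -1;  // -1 until a KernelPageSize line is seen
        String vmFlags = "";         // stays empty if the kernel prints no VmFlags line
        Range(String start, String end) { this.start = start; this.end = end; }
    }

    // Mapping header, e.g. "7f00a000-7f00b000 rw-p 00000000 00:00 0"
    private static final Pattern HEADER =
        Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)\\s+[-rwxsp]{4}\\s.*");
    private static final Pattern KPS =
        Pattern.compile("^KernelPageSize:\\s+(\\d+)\\s+kB\\s*$");
    private static final Pattern FLAGS =
        Pattern.compile("^VmFlags:\\s*(.*)$");

    /** Scans the lines once; every line is tested against single-line patterns only. */
    static List<Range> parse(List<String> lines) {
        List<Range> ranges = new ArrayList<>();
        Range current = null;
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                // A header line starts a new mapping.
                current = new Range(m.group(1), m.group(2));
                ranges.add(current);
                continue;
            }
            if (current == null) {
                continue;  // ignore anything before the first header
            }
            m = KPS.matcher(line);
            if (m.matches()) {
                current.kernelPageSizeKb = Long.parseLong(m.group(1));
                continue;
            }
            m = FLAGS.matcher(line);
            if (m.matches()) {
                current.vmFlags = m.group(1).trim();
            }
        }
        return ranges;
    }
}
```

Feeding it the lines of a copied smaps file (for example via `java.nio.file.Files.readAllLines`) would give one `Range` per mapping; on a pre-3.8 kernel every `vmFlags` would just be empty.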

The more I think about this the more confused I get.

In theory, THP can be enabled on a machine in a way which needs no cooperation from the VM; so whether or not we run with +UseTransparentHugePages makes no difference, and pages used by the VM may be folded transparently into huge pages without our knowledge. Would the section show up as "VmFlags: hg" on those machines too? Or is this only the explicit madvise flag? Would the KernelPageSize reflect the page size set by the THP folding process (probably not since it can be multiple page sizes)?

And would this "proactive folding" interfere with the recognition of hugetlb pages? In other words, could we trace out "4k" and then have a "2048k" region show up in smaps?

The more I think about this, the muddier it gets. ATM I think we should just skip this test altogether for older kernels, and/or not test THP at all... not sure what to think here.
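
If we went the "skip on older kernels" route, the check might be sketched as below. This is only an assumption about how it could be done, not something in the PR: `KernelVersionSketch` and its methods are hypothetical names, and it assumes that `os.version` on Linux carries the kernel release string (something like "3.10.0-1160.el7.x86_64"). The 3.8 threshold is the kernel version that introduced `VmFlags`, as mentioned in the PR description.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KernelVersionSketch {
    // Leading "major.minor" of a release string such as "3.10.0-1160.el7.x86_64"
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    /** True if the release string is at least major.minor; unparsable input counts as too old. */
    static boolean isAtLeast(String osVersion, int major, int minor) {
        Matcher m = MAJOR_MINOR.matcher(osVersion);
        if (!m.find()) {
            return false;
        }
        int maj = Integer.parseInt(m.group(1));
        int min = Integer.parseInt(m.group(2));
        return maj > major || (maj == major && min >= minor);
    }

    /** Assumes os.version reports the running kernel release (an assumption, see above). */
    static boolean currentKernelShouldHaveVmFlags() {
        return isAtLeast(System.getProperty("os.version", ""), 3, 8);
    }
}
```

The test could then bail out early when `currentKernelShouldHaveVmFlags()` returns false. Note that this only tells us whether `VmFlags` should be present; it says nothing about whether THP is enabled.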

I also found this piece in man proc:

    The "KernelPageSize" line (available since Linux 2.6.29) is the
    page size used by the kernel to back the virtual memory area.
    .....
    However, one counter-example occurs on PPC64 kernels whereby a
    kernel using 64 kB as a base page size may still use 4 kB pages
    for the MMU on older processors.

I propose to just skip the test on that platform. Personally I have never used large pages on a ppc machine.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4064

