RFR: 8267155: runtime/os/TestTracePageSizes times out [v4]

Thomas Stuefe stuefe at openjdk.java.net
Tue May 18 10:57:50 UTC 2021


On Tue, 18 May 2021 06:53:19 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This Linux-specific test parses `/proc/self/smaps` using a dotall regular expression. If part of the expression don't match it explodes in complexity, leading to timeouts.
>> 
>> In our case, `VmFlags` tag was missing from smaps, which was introduced with kernel 3.8. I am actually not able to determine how slow they were; on one machine they ran for two hours before getting killed.
>> 
>> I tried to fiddle with the regular expression and gave up, instead opting to rewrite the parser to get a simple one-pass scan. This is way faster than before - on our old-kernel machines the tests complete in 2 minutes. On new kernels the test is a bit faster too.
>> 
>> In addition to rewriting the parser, I added code which copies the smaps file into the test directory before parsing it. I do this to minimize problems should the underlying proc fs content change while parsing, and to have a way to retain the parsed smaps files.
>> 
>> I also added a way to feed an external smaps file into the test. Of course the test would fail, but it was a way to test the parser.
>> 
>> Unfortunately, this does not make the test succeed. The timeouts are gone, but we have still have no way to know if TPHs are enabled or not. That is a separate issue though.
>> 
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Update test/hotspot/jtreg/runtime/os/TestTracePageSizes.java
>    
>    Co-authored-by: Aleksey Shipilëv <shade at redhat.com>
>  - Update test/hotspot/jtreg/runtime/os/TestTracePageSizes.java
>    
>    oops
>    
>    Co-authored-by: Aleksey Shipilëv <shade at redhat.com>

Hi Stefan,

> > > Cursory review follows.
> > > The alternative would be to disable this on kernels < 3.8? See the commits from my last PR:
> > > https://github.com/openjdk/jdk/pull/3415/files/c7a6136cedb827260f43ecc850a8910492cfefcc..1df37de73ed6615db565aeafe76edf30dcee108d
> > 
> > 
> > Hmm, I like faster tests even on new kernels, therefore I would like to use this parser instead of the regex. Also think it makes the test more readable and robust, but that may be just me. E.g. you don't rely on order of entry in smaps.
> > But we could do both, of course. Lets ask @kstefanj , its his test.
> 
> I'm of course biased since I wrote the test. That said, I don't consider this _my_ test, so I won't object to changing it.
> 

Sorry, I meant no offence. I think this test is really useful, since I want to be able to rely on the page sizes in the traces and not have to countercheck the smaps file. That's why I put work in it instead of just disabling it.

What I meant was do you consider testing correct tracing of *explicit* huge pages important enough on semi-old kernels before 3.8? If not, we can modify the test to skip older kernels completely, otherwise just the detect-THP-part?

> Personally I prefer the regex-parsing is clearer since it "parses" one segment at a time in the smaps file.

Unfortunately its not so easy since its smaps sections don't have a clear ending. I played around yesterday and gave up after a while. I also find regex code beyond a certain complexity really work intensive and difficult to debug.

The original regexp was also too loose, I think, since `(\\w+)-(\\w+)` - unanchored to the start of a line - matches the permissions if those contain dashes. And theoretically, you could accidentally match across secions, were one to miss a flag.

> I think if we go with this new parser, it would be nice to add some comments clarifying the steps.

Okay, I could do that.

> 
> An alternative would be to have different regular expressions depending on the kernel version or first check if vmFlags are present and choose the correct expression. This would of course not help if this kind of parsing is still very slow on some systems.

I think its just more vulnerable against unknown or future smaps variants. E.g. relying on a specific order of entries.

> 
> I like that you do a copy of smaps before parsing, that crossed my mind as well, to help debugging. Regarding smaller heap sizes. I don't see a problem if some of the test VMs fail getting large pages, since it tests that we still log the correct thing.

I prefer the "explicit-large-pages" test to really run a VM with explicit large pages. Otherwise we test with small pages, which would be the same as with the other tests.

> 
> Just wanted to share my thoughts before doing a proper review. I'm happy to review this patch if we decide to go with this new parser. If we do that I also think we should look at enabling parsing of `AnonHugePages` for the case where vmFlags are not present (similar to @shipilev approach in the last PR).

This test has been a bit of whack-a-mole; I am interested in a stable test which covers the important cases. E.g. I tried to find a clear definition of AnonHugePages but beyond that it requires the kernel to be compiled with THP I did not find any indication that this is synonymous with THP, or under which circumstances it is set, or whats its relationship to KernelPageSize is. I also suspect it is kernel-version-dependent like the other properties.

Frankly am interested in this test for testing trace of explicit huge pages correctly, not so much THP. Since this is what most of our customers use. THP have a bad rep here because they used to come with problems. And since hugetlbfs has gotten easier to use (eg with vm.nr_overcommit_hugepages) I usually use that instead of TPH.

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/4064


More information about the hotspot-runtime-dev mailing list