RFR: 8265332: gtest/LargePageGtests.java OOMEs on -XX:+UseSHM cases [v2]
Aleksey Shipilev
shade at openjdk.java.net
Thu Apr 22 08:34:28 UTC 2021
On Wed, 21 Apr 2021 10:44:12 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> It looks like some `+UseSHM` test cases added by [JDK-8213269](https://bugs.openjdk.java.net/browse/JDK-8213269) reliably blow up the VM log reader with OOME. If you increase the test heap size, the log shows lots of `OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory.` messages. AFAIU, many of those messages are expected from the new test cases.
>>
>> I believe this test ultimately produces a virtually unbounded number of warning messages, which would eventually exhaust the Java heap in the test infrastructure's log parsers. This is a reliable tier1 failure on my TR 3970X, probably because it has enough cores to run 30 threads concurrently for 15 seconds, all spewing warning messages.
>>
>> #### Try 1
>>
>> The first attempt recognizes that `ConcurrentTestRunner` runs a time-bound number of iterations: the faster the machine, the more warning messages get printed. The way out, then, is to make `ConcurrentTestRunner` cap the number of iterations, so that the VM output length is more predictable.
>>
>> Test times before:
>>
>>
>> # default
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15003 ms)
>>
>> # -XX:+UseLargePages
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (16121 ms)
>>
>> # -XX:+UseLargePages -XX:LargePageSizeInBytes=1G
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15006 ms)
>>
>> # -XX:+UseLargePages -XX:+UseSHM
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15030 ms)
>>
>>
>> Test times after:
>>
>>
>> # default
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15003 ms)
>>
>> # -XX:+UseLargePages
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (16071 ms)
>>
>> # -XX:+UseLargePages -XX:LargePageSizeInBytes=1G
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15006 ms)
>>
>> # -XX:+UseLargePages -XX:+UseSHM
>> [ OK ] os_linux.reserve_memory_special_concurrent_vm (1190 ms)
>>
>>
>> The major difference is that the last mode now gets capped by `maxIteration`. This fixes the test failure, as the `-XX:+UseSHM` case would otherwise produce lots of warnings on my machine.
>>
>> #### Try 2
>>
>> The second attempt runs the tests with `-XX:-PrintWarnings` to avoid overloading the log with warnings.
>>
>> Additional testing:
>> - [x] `os_linux` gtest
>> - [x] `gtest/LargePageGtests.java` used to fail, now passes
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Just run with -XX:-PrintWarnings
> - Merge branch 'master' into JDK-8265332-largepages-oome
> - 8265332: gtest/LargePageGtests.java OOMEs on -XX:+UseSHM cases
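The capping idea from Try 1 can be sketched as follows, in C++ since HotSpot gtests are written in C++. `run_bounded` is a hypothetical helper, not the actual `ConcurrentTestRunner` interface; it only illustrates a worker loop that stops at whichever comes first, the deadline or the per-thread iteration cap, so the amount of work (and log output) no longer grows with machine speed.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, NOT HotSpot's ConcurrentTestRunner API.
// Runs `body` on `threads` threads; each thread stops when either
// `time_limit` has elapsed or it has completed `max_iterations` iterations,
// whichever comes first. Returns the total iterations across all threads.
std::size_t run_bounded(unsigned threads,
                        std::chrono::milliseconds time_limit,
                        std::size_t max_iterations,
                        const std::function<void()>& body) {
  using Clock = std::chrono::steady_clock;
  const Clock::time_point deadline = Clock::now() + time_limit;
  std::atomic<std::size_t> total{0};
  std::vector<std::thread> workers;
  workers.reserve(threads);
  for (unsigned t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      std::size_t done = 0;
      // A purely time-bound loop would drop the `done < max_iterations`
      // check, so a faster machine would run (and log) more iterations.
      while (done < max_iterations && Clock::now() < deadline) {
        body();
        ++done;
      }
      total += done;
    });
  }
  for (std::thread& w : workers) {
    w.join();  // all captured locals stay alive until every worker finishes
  }
  return total.load();
}
```

With a generous time limit the cap dominates: 4 threads with a cap of 100 execute exactly 400 iterations. With a zero time limit the deadline dominates and no iterations run.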
Thanks!
-------------
PR: https://git.openjdk.java.net/jdk/pull/3542
More information about the hotspot-dev mailing list