RFR: 8265332: gtest/LargePageGtests.java OOMEs on -XX:+UseSHM cases
Thomas Stuefe
stuefe at openjdk.java.net
Fri Apr 16 17:01:45 UTC 2021
On Fri, 16 Apr 2021 10:06:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> It looks like some `+UseSHM` test cases added by [JDK-8213269](https://bugs.openjdk.java.net/browse/JDK-8213269) reliably blow up the VM log reader with OOME. There are lots of `OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory.` in the log, if you increase the test heap size. AFAIU, many of those messages are expected from the new test cases.
>
> I believe ultimately this test produces a virtually unbounded number of warning messages, which would eventually blow out the Java heap in test infra parsers. `ConcurrentTestRunner` runs a time-bound number of iterations, which means the faster the machine is, the more warning messages are printed. I believe the way out is to make `ConcurrentTestRunner` cap the number of iterations, so that the VM output length is more predictable.
>
> This is a reliable tier1 failure on my TR 3970X, probably because it has enough cores to run 30 threads concurrently for 15 seconds all spewing warning messages.
>
> Test times before:
>
> ```
> # default
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15003 ms)
>
> # -XX:+UseLargePages
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (16121 ms)
>
> # -XX:+UseLargePages -XX:LargePageSizeInBytes=1G
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15006 ms)
>
> # -XX:+UseLargePages -XX:+UseSHM
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15030 ms)
> ```
>
> Test times after:
>
> ```
> # default
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15003 ms)
>
> # -XX:+UseLargePages
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (16071 ms)
>
> # -XX:+UseLargePages -XX:LargePageSizeInBytes=1G
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (15006 ms)
>
> # -XX:+UseLargePages -XX:+UseSHM
> [ OK ] os_linux.reserve_memory_special_concurrent_vm (1190 ms)
> ```
>
> The major difference is that the last mode gets capped by `maxIteration`. This fixes the test failure, as the `-XX:+UseSHM` case would otherwise produce lots of warnings on my machine.
>
> Additional testing:
> - [x] `os_linux` gtest
> - [x] `gtest/LargePageGtests.java` used to fail, now passes
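For readers following along, the capping idea in the quoted description can be illustrated roughly as below. This is only a Java sketch of the concept: the real `ConcurrentTestRunner` is part of the native gtest code, and `CappedRunner`, `runCapped`, and the numbers used here are hypothetical names and values chosen for illustration, not the patch itself.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CappedRunner {
    /**
     * Runs the task repeatedly until either the time budget is used up
     * or the iteration cap is reached, whichever comes first.
     * Returns the number of iterations actually executed.
     * (Hypothetical helper, not the HotSpot runner.)
     */
    static long runCapped(Runnable task, long durationMillis, long maxIterations) {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        long iterations = 0;
        // Comparing the difference against zero stays correct even if
        // nanoTime values wrap around.
        while (iterations < maxIterations && System.nanoTime() - deadline < 0) {
            task.run();
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        AtomicLong calls = new AtomicLong();
        // Assumed values: a 15 second budget and a cap of 1000 iterations.
        long done = runCapped(calls::incrementAndGet, 15_000, 1_000);
        System.out.println("iterations: " + done);
    }
}
```

With a cap in place, a fast machine stops at the cap instead of filling the whole time budget, so the amount of output per thread has a fixed upper bound.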
I think a better and simpler way would be to make those messages conditional. IMHO only fatal error information should be printed unconditionally. This is not fatal: if we get no large pages, the VM just happily switches to small pages. I have wanted to quieten these messages for a long time, but never dared because of backward compatibility. But maybe we should just do it.
Another useful fix, maybe for a different RFE, would be to limit the size of output from chatty subprocesses in the test framework to something sensible.
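To make the second suggestion concrete, here is a minimal sketch of what bounding subprocess output could look like. It assumes nothing about the actual test library: `BoundedOutput`, `readCapped`, and the truncation marker are hypothetical. The reader keeps only the first `maxBytes` bytes but still drains the rest of the stream, so a chatty child never blocks on a full pipe.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class BoundedOutput {
    /**
     * Reads the stream to its end but keeps at most maxBytes of it.
     * Everything beyond the cap is consumed and counted, not stored.
     * (Hypothetical helper, not an existing test-library API.)
     */
    static String readCapped(InputStream in, int maxBytes) throws IOException {
        byte[] kept = new byte[maxBytes];
        int keptLen = 0;
        long dropped = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            int take = Math.min(n, maxBytes - keptLen);
            System.arraycopy(buf, 0, kept, keptLen, take);
            keptLen += take;
            dropped += n - take;
        }
        // Note: cutting at a byte boundary may split a multi-byte character.
        String text = new String(kept, 0, keptLen, StandardCharsets.UTF_8);
        return dropped == 0 ? text : text + "\n[... " + dropped + " bytes dropped ...]";
    }

    public static void main(String[] args) throws IOException {
        // Simulated chatty output; a real caller would pass Process.getInputStream().
        byte[] noisy = "warning\n".repeat(1000).getBytes(StandardCharsets.UTF_8);
        System.out.println(readCapped(new ByteArrayInputStream(noisy), 64));
    }
}
```

Memory use is then bounded by the cap rather than by how much the child prints, which is the property the log parser is missing today.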
-------------
PR: https://git.openjdk.java.net/jdk/pull/3542