RFR: 8267404: vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java failed with OutOfMemoryError
Thomas Stuefe
stuefe at openjdk.java.net
Fri May 21 10:03:29 UTC 2021
On Fri, 21 May 2021 05:11:51 GMT, Jie Fu <jiefu at openjdk.org> wrote:
> Hi all,
>
> vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java OOMEs on Oracle's aarch64 platforms.
> The reason is that both -Xmx and -XX:MetaspaceSize are not enough.
>
> From the original JBS description of JDK-8267404, the VM OOMEs before the expected OOME in metaspace happened, showing that -Xmx256m is not enough.
>
> Then, @dcubed-ojdk helped me test with -Xmx512, which still OOMEs.
> However, the expected OOME in metaspace was caught this time.
> But a second uncaught OOME in metaspace happened soon, which means -XX:MetaspaceSize=8m is not enough.
>
> So both -Xmx and -XX:MetaspaceSize should be increased.
> The fix just:
> - Revert changes about metaspace size setting
> - Increase -Xmx from 256m to 1g
>
> -Xmx512m may be OK on Oracle's aarch64 machines, but to make it safer, -Xmx1g is preferred.
>
> Thanks.
> Best regards,
> Jie
I don't think this patch does what you think it does, sorry.
First off, as @stefank said, please forget about `MetaspaceSize` (terribly confusing name). Either omit it or set it to the same value as `MaxMetaspaceSize`. The latter may cause the test to be slightly faster, but it does not matter much either way.
The test allocates classes endlessly. The point of this test is to fill up metaspace with data until it is full, then to observe a correctly thrown OOM from Metaspace. For that to work, metaspace has to be **small**, not large.
In theory you could make it as small as 512k or so. Only, we need the VM itself to come up and start the test. So the VM needs to be able to load all JDK classes needed for booting, then the test classes. Then the test starts, loads all these generated classes, and then metaspace should run out. In my local tests I was able to run the test correctly with MaxMetaspaceSize=4m.
Each generated j.l.Class object needs heap space too, so this test is a race over which is exhausted first, metaspace or heap. If you see an OOM from heap, not metaspace, it means heap space ran out first, and the test will fail.
You want metaspace to be as small as possible, and heap space to be large enough that metaspace is exhausted first. In my local test on Linux x64, I was able to run with MaxMetaspaceSize=4m and 128m heap space.
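To make the "race" concrete, here is a minimal sketch of how a test could tell which side ran out first by looking at the message of the caught `OutOfMemoryError`. The class `OomeKind` and its `classify` method are hypothetical names for illustration, not part of the existing test. The sketch assumes the usual HotSpot messages ("Metaspace", "Compressed class space", "Java heap space") and treats compressed class space exhaustion as the metaspace side, which is a simplification.

```java
public class OomeKind {
    // Which resource ran out, as far as the error message tells us.
    enum Kind { METASPACE, HEAP, OTHER }

    // Hypothetical helper: classify an OOME by its message text.
    // Assumes HotSpot-style messages; other VMs may word them differently.
    static Kind classify(OutOfMemoryError e) {
        String msg = String.valueOf(e.getMessage());
        if (msg.contains("Metaspace") || msg.contains("Compressed class space")) {
            return Kind.METASPACE; // the outcome the test wants to see
        }
        if (msg.contains("Java heap space")) {
            return Kind.HEAP;      // heap lost the race: test setup is wrong
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify(new OutOfMemoryError("Metaspace")));
        System.out.println(classify(new OutOfMemoryError("Java heap space")));
    }
}
```

A heap result here would mean the heap limit, not the metaspace limit, needs adjusting.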
If you increase MaxMetaspaceSize to 20m, you will clearly need more and more heap space to still be able to exhaust that higher limit. Leave MaxMetaspaceSize at 8m, or preferably less.
What I am concerned about is that there should not be *that* much variance in how much heap space we use. Sure, you can set it to 1g or 2g, but something is clearly off. I get by on Linux x64 with 128m, on 32bit x86 with 256m (since we don't have CompressedClassPointers we need somewhat more heap). But needing 1g on aarch64 seems weird.
Cheers, Thomas
p.s. One source of variance may be whether CDS is on or off. If CDS is on (default), the JVM will use significantly less Metaspace at boot. With CDS off, we use more metaspace. It may make sense to set `-Xshare:off` to get rid of this variance.
p.p.s. There are several ways to improve this test and make it more robust:
1) increase the class size (see AnonkTestee01 - make a sister class with more fields and more constants and use that one)
2) Rewrite the whole test: let it start a jvm with ProcessBuilder with a very low metaspace - too low to load the JDK itself. Something like this: `java -Xshare:off -XX:MaxMetaspaceSize=1m -version`. The VM should not come up; instead we should see an `OutOfMemoryError: Metaspace` at stdout.
(1) makes the test run through faster and requires less heap to fill up metaspace
(2) would completely remove the guessing game of "how much metaspace do I need to let the VM come up and start the test".
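A rough sketch of what (2) could look like, in plain Java so it is self-contained. The class name `TinyMetaspaceLaunch` and its helpers are made up for illustration; a real JDK test would more likely use the test library's process helpers. Whether the child VM really fails to start with 1m is an assumption to be verified per platform, so the sketch only reports what it saw and does not assert a specific exit code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.List;

public class TinyMetaspaceLaunch {
    // Build the child command line: CDS off, metaspace too small to boot.
    static List<String> command(String javaHome) {
        String javaBin = Paths.get(javaHome, "bin", "java").toString();
        return List.of(javaBin, "-Xshare:off", "-XX:MaxMetaspaceSize=1m", "-version");
    }

    // True if the child's output mentions a metaspace OOME.
    static boolean reportsMetaspaceOome(String output) {
        return output.contains("OutOfMemoryError: Metaspace");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(System.getProperty("java.home")));
        pb.redirectErrorStream(true); // merge stderr into stdout, read one stream
        Process p = pb.start();
        String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = p.waitFor();
        System.out.println("exit code: " + exit);
        System.out.println(reportsMetaspaceOome(output)
                ? "saw Metaspace OOME"
                : "no Metaspace OOME in output");
    }
}
```

Merging stderr into stdout avoids having to guess which stream the VM writes the initialization error to.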
-------------
PR: https://git.openjdk.java.net/jdk/pull/4140
More information about the hotspot-runtime-dev mailing list