RFR: 8205924: ZGC: Premature OOME due to failure to expand backing file
Erik Helin
erik.helin at oracle.com
Wed Jul 4 09:44:44 UTC 2018
On 07/02/2018 05:05 PM, Per Liden wrote:
> ZGC currently assumes that there will be enough space available on the
> backing file system to hold the max heap size (-Xmx). However, this
> might not be true. For example, the backing filesystem might have been
> misconfigured or space on that filesystem might be used by some other
> process. In this situation, ZGC will try (and fail) to map more memory
> every time a new page needs to be allocated (assuming the request can't
> be satisfied by the page cache). As a result, we fail to flush the page
> cache, which in turn means we throw a premature OOME, and we continuously
> take the performance hit of making unnecessary fallocate() syscalls that
> will never succeed. We should instead detect this situation, flush the
> page cache, and avoid making further fallocate() calls.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8205924
> Webrev: http://cr.openjdk.java.net/~pliden/8205924/webrev.0
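To summarize the approach for the list (a rough sketch only; the names
and signatures below are made up for illustration, not taken from the
patch): the gist is to remember that expanding the backing file has
failed, so that later page allocations flush the page cache instead of
retrying fallocate().

size_t ZBackingFile::try_expand(size_t offset, size_t length) {
  if (_expand_failed) {
    // A previous expansion already failed; further fallocate() calls
    // will not succeed either, so give up early and let the caller
    // flush the page cache instead.
    return 0;
  }

  if (!expand_file(offset, length)) {  // hypothetical fallocate() wrapper
    _expand_failed = true;
    return 0;
  }

  return length;
}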
This looks good to me. I would have added an assert in
ZPhysicalMemoryBacking::try_expand, such as:

+size_t ZPhysicalMemoryBacking::try_expand(size_t old_capacity, size_t new_capacity) {
+  assert(new_capacity > old_capacity, "invariant");
+  const size_t capacity = _file.try_expand(old_capacity, new_capacity - old_capacity, _granule_size);
Not because I spotted anything wrong with this patch, but because if
someone introduces such a bug one day, it will be hell to debug without
an assert like the one above :)
In ZPageAllocator::try_ensure_unused_for_pre_mapped I would maybe have
designed ZPhysicalMemoryManager::try_ensure_unused_capacity so that it
is always valid to call (the method would just return early if _backing
isn't initialized).
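Just to illustrate what I mean, something along these lines (a rough
sketch only; the names and the is_initialized() check are placeholders,
not the actual ZGC code):

void ZPhysicalMemoryManager::try_ensure_unused_capacity(size_t size) {
  if (!_backing.is_initialized()) {
    // The backing isn't set up yet, so there is nothing to flush or
    // expand. Returning here makes the method safe to call
    // unconditionally from the page allocator.
    return;
  }

  _backing.try_ensure_unused_capacity(size);
}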
I don't need to see another webrev if you just add the assert, but
please send out a new version if you rework ZPhysicalMemoryManager.
Thanks,
Erik