RFR: 8205924: ZGC: Premature OOME due to failure to expand backing file
Per Liden
per.liden at oracle.com
Wed Jul 4 09:57:14 UTC 2018
On 07/04/2018 11:44 AM, Erik Helin wrote:
> On 07/02/2018 05:05 PM, Per Liden wrote:
>> ZGC currently assumes that there will be enough space available on the
>> backing file system to hold the max heap size (-Xmx). However, this
>> might not be true. For example, the backing filesystem might have been
>> misconfigured or space on that filesystem might be used by some other
>> process. In this situation, ZGC will try (and fail) to map more memory
>> every time a new page needs to be allocated (assuming that request
>> can't be satisfied by the page case). As a result, we fail to flush
>> the page cache, which in turn means we throw a premature OOME and we
>> continuously take the performance hit by making unnecessary
>> fallocate() syscalls that will never succeed. We should instead detect
>> this situation, flush the page cache and avoid making further
>> fallocate() calls.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205924
>> Webrev: http://cr.openjdk.java.net/~pliden/8205924/webrev.0
>
> This looks good to me, I would have added an assert in
> size_t ZPhysicalMemoryBacking::try_expand such as
>
> +size_t ZPhysicalMemoryBacking::try_expand(size_t old_capacity, size_t
> new_capacity) {
> + assert(new_capacity > old_capacity, "invariant");
> + const size_t capacity = _file.try_expand(old_capacity, new_capacity -
> old_capacity, _granule_size);
>
> Not because I spotted anything wrong with this patch, more because if
> someone one day introduces such a bug, then it will be hell to debug
> without an assert like the above one :)
Sounds good, will add that.
>
> In ZPageAllocator::try_ensure_unused_for_pre_mapped I would maybe have
> designed ZPhysicalMemoryManager::try_ensure_unused_capacity so that it
> is always valid to call (the method would just return in case _backing
> isn't initialized).
I would prefer to keep that check in
ZPageAllocator::try_ensure_unused_for_pre_mapped(), since that function
is a special case since it's called during construction. The underlying
ZPhysicalMemoryManager::try_ensure_unused_capacity() should never be
called if the ZPhysicalMemoryManager isn't initialized and I'd rather
crash hard than silently return if someone does that mistake.
>
> I don't need to see another webrev if you just add the assert, but
> please send out a new version if you rework ZPhysicalMemoryManager.
Thanks for reviewing, Erik!
cheers,
Per
More information about the hotspot-gc-dev
mailing list