RFR: 8205924: ZGC: Premature OOME due to failure to expand backing file

Per Liden per.liden at oracle.com
Wed Jul 4 09:57:14 UTC 2018


On 07/04/2018 11:44 AM, Erik Helin wrote:
> On 07/02/2018 05:05 PM, Per Liden wrote:
>> ZGC currently assumes that there will be enough space available on the 
>> backing file system to hold the max heap size (-Xmx). However, this 
>> might not be true. For example, the backing filesystem might have been 
>> misconfigured or space on that filesystem might be used by some other 
>> process. In this situation, ZGC will try (and fail) to map more memory 
>> every time a new page needs to be allocated (assuming that request 
>> can't be satisfied by the page case). As a result, we fail to flush 
>> the page cache, which in turn means we throw a premature OOME and we 
>> continuously take the performance hit by making unnecessary 
>> fallocate() syscalls that will never succeed. We should instead detect 
>> this situation, flush the page cache and avoid making further 
>> fallocate() calls.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205924
>> Webrev: http://cr.openjdk.java.net/~pliden/8205924/webrev.0
> 
> This looks good to me, I would have added an assert in
> size_t ZPhysicalMemoryBacking::try_expand such as
> 
> +size_t ZPhysicalMemoryBacking::try_expand(size_t old_capacity, size_t 
> new_capacity) {
> +  assert(new_capacity > old_capacity, "invariant");
> +  const size_t capacity = _file.try_expand(old_capacity, new_capacity - 
> old_capacity, _granule_size);
> 
> Not because I spotted anything wrong with this patch, more because if 
> someone one day introduces such a bug, then it will be hell to debug 
> without an assert like the above one :)

Sounds good, will add that.

> 
> In ZPageAllocator::try_ensure_unused_for_pre_mapped I would maybe have 
> designed ZPhysicalMemoryManager::try_ensure_unused_capacity so that it 
> is always valid to call (the method would just return in case _backing 
> isn't initialized).

I would prefer to keep that check in 
ZPageAllocator::try_ensure_unused_for_pre_mapped(), since that function 
is a special case since it's called during construction. The underlying 
ZPhysicalMemoryManager::try_ensure_unused_capacity() should never be 
called if the ZPhysicalMemoryManager isn't initialized and I'd rather 
crash hard than silently return if someone does that mistake.

> 
> I don't need to see another webrev if you just add the assert, but 
> please send out a new version if you rework ZPhysicalMemoryManager.

Thanks for reviewing, Erik!

cheers,
Per



More information about the hotspot-gc-dev mailing list