Review request for 5049299

Thu May 28 11:19:20 UTC 2009

Florian Weimer wrote:
> * Andrew Haley:
> 
>> But those running Linux won't benefit from such a change because
>> on Linux there is no transient doubling of process size: all that happens
>> is that the page table entries in the new process are mapped copy on write.
>> The extra pages count towards the overcommit limit, but that's wholly
>> artificial.
> 
> On Linux in vm.overcommit_memory=2 mode, the whole heap (not just the
> committed part) counts against the system's total memory allocation
> limit.  After a fork, the heap counts twice.  Copy-on-write is just an
> optimization in this mode, it does not change the physical memory
> requirements of the workload (in vm.overcommit_memory=1 mode, it
> does).

Well, yes, all that vm.overcommit_memory=2 mode does is disable
overcommit, and overcommit is what you need to make this work
properly.  But even in vm.overcommit_memory=2 mode the pages still
aren't copied until written.  All that mode 2 does is prevent the
transient allocation of the pages in the copy of the forked process,
even though the system has all the memory it needs to fulfil the
request.  AIUI...

> A better way seems to be to allocate the heap with PROT_NONE, and
> later use mprotect with PROT_READ|PROT_WRITE (and perhaps PROT_EXEC)
> to allocate chunks from the kernel.  This will fail deterministically
> in the garbage collector if no physical memory is available.  The
> PROT_NONE mapping is only there to reserve a contiguous chunk of
> address space (so that calls to malloc or dlopen do not create
> mappings in the middle of the Java heap).  When I tested this some
> time ago, a PROT_NONE mapping did not count towards the system's
> memory allocation limit, hence the potential failure in the mprotect
> call.  The main problem with this approach is that this is not a
> documented way of using the kernel API; it might work accidentally now
> and change behavior in the future.

I can see the sense in this.  In modes 0 and 1 Java won't behave any
differently from the way it does today, except that when it does run
out of real memory there will be a decent traceback rather than a segfault.

However, I think you don't want PROT_NONE when a system has been configured
with mode 2: in that case, a user has a reasonable expectation that
they can use all of the memory they allocated at VM startup.  It makes
more sense to allocate all the -Xms size immediately.
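
One way to make that choice at startup, again only as a sketch: read
the mode from /proc/sys/vm/overcommit_memory and pick the protection
for the initial mapping accordingly.  The function names are invented
for illustration, and treating an unreadable file like modes 0 and 1
is my assumption.

```c
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: returns the value in
 * /proc/sys/vm/overcommit_memory (0, 1 or 2 on Linux), or -1 if the
 * file cannot be read or parsed. */
int read_overcommit_mode(void)
{
    int mode = -1;
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Hypothetical policy: in mode 2 map the initial heap read/write so
 * it is charged against the commit limit immediately; otherwise
 * reserve it with PROT_NONE and commit chunks later. */
int initial_heap_prot(int mode)
{
    return mode == 2 ? (PROT_READ | PROT_WRITE) : PROT_NONE;
}
```

The result of initial_heap_prot would be passed as the prot argument
of the mmap call that creates the -Xms part of the heap.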

Andrew.