HSX-24 Backport Code Review for memory commit failure fix (8013057)

Fri Jun 7 15:21:04 PDT 2013

Greetings,

I have an HSX-24 backport of the proposed fix for the following bug:

     8013057 assert(_needs_gc || SafepointSynchronize::is_at_safepoint())
             failed: only read at safepoint

Here are the (round 0) webrev URLs for the HSX-24 backport:

OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/0-hsx24/
Internal: http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/0-hsx24/

Here are the (round 3) webrev URLs for the HSX-25 version for reference:

OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/3-hsx25/
Internal: http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/3-hsx25/

The easiest way to review the backport is to download and compare
the two patch files with your favorite file comparison tool:

OpenJDK patch links:
http://cr.openjdk.java.net/~dcubed/8013057-webrev/0-hsx24/8013057_for_hsx24_exp.patch
http://cr.openjdk.java.net/~dcubed/8013057-webrev/3-hsx25/8013057_exp.patch

Internal patch links:
http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/0-hsx24/8013057_for_hsx24_exp.patch
http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/3-hsx25/8013057_exp.patch

For 'diff' oriented people (like me), here's my quick sanity of the two
patch files:
     - diff the patches
     - strip down to line additions and deletions
     - strip out the copyright changes

$ diff 8013057-webrev/3-hsx25/8013057_exp.patch \
     8013057-webrev/0-hsx24/8013057_for_hsx24_exp.patch \
     | grep '^[<>] [+-] ' | grep -v Copyright
< +  return false;
 > +  return false;
< +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, mesg);
 > +    vm_exit_out_of_memory(size, mesg);
< +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved 
memory.");
 > +    vm_exit_out_of_memory(size, "committing reserved memory.");
< +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, mesg);
 > +    vm_exit_out_of_memory(size, mesg);
< +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, mesg);
 > +    vm_exit_out_of_memory(size, mesg);
< +    vm_exit_out_of_memory(bytes, OOM_MMAP_ERROR, "committing reserved 
memory.");
 > +    vm_exit_out_of_memory(bytes, "committing reserved memory.");
< +    vm_exit_out_of_memory(bytes, OOM_MMAP_ERROR, mesg);
 > +    vm_exit_out_of_memory(bytes, mesg);
< +    vm_exit_out_of_memory(bytes, OOM_MMAP_ERROR, mesg);
 > +    vm_exit_out_of_memory(bytes, mesg);
< +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, mesg);
 > +    vm_exit_out_of_memory(size, mesg);
< -        vm_exit_out_of_memory(new_committed.byte_size(), OOM_MMAP_ERROR,
 > -        vm_exit_out_of_memory(new_committed.byte_size(),
< -    vm_exit_out_of_memory(_size, OOM_MMAP_ERROR, "Allocator (commit)");
 > -    vm_exit_out_of_memory(_size, "Allocator (commit)");
< -    vm_exit_out_of_memory(_page_size, OOM_MMAP_ERROR, "card table 
last card");
 > -    vm_exit_out_of_memory(_page_size, "card table last card");
< -        vm_exit_out_of_memory(new_committed.byte_size(), OOM_MMAP_ERROR,
 > -        vm_exit_out_of_memory(new_committed.byte_size(),

The majority of the diffs are due to new OOM failure type param that
was added in HSX-25 only. The "return false" shows up because HSX-24
has some left-over MacOS X port cruft that someone removed in HSX-25
so the "return false" looks like it was added in different places.

The remainder is the HSX-25 round 3 code review invite with the
code blocks updated for HSX-24.

Testing:
- Aurora Adhoc vm.quick batch for all OSes in the following configs:
   {Client VM, Server VM} x {fastdebug} x {-Xmixed}
- I've created a standalone Java stress test with a shell script
   wrapper that reproduces the failing code paths on my Solaris X86
   server and on Ron's DevOps Linux machine. This test will not be
   integrated since running the machine out of swap space is very
   disruptive (crashes the window system, causes various services to
   exit, etc.)

There are three parts to this fix:

1) Detect commit memory failures on Linux and Solaris where the
    previous reservation can be lost and call vm_exit_out_of_memory()
    to report the resource exhaustion. Add os::commit_memory_or_exit()
    API to provide more consistent handling of vm_exit_out_of_memory()
    calls.
2) Change existing os::commit_memory() calls to make the executable
    status of memory more clear; this makes security analysis easier.
3) Clean up some platform dependent layer calls that were resulting
    in extra NMT accounting. Clean up some mmap() return value checks.

Gory details are below. As always, comments, questions and
suggestions are welome.

Dan

Gory Details:

The VirtualSpace data structure is built on top of the ReservedSpace
data structure. VirtualSpace presumes that failed os::commit_memory()
calls do not affect the underlying ReservedSpace memory mappings.
That assumption is true on MacOS X and Windows, but it is not true
on Linux or Solaris. The mmap() system call on Linux or Solaris can
lose previous mappings in the event of certain errors. On MacOS X,
the mmap() system call clearly states that previous mappings are
replaced only on success. On Windows, a different set of APIs are
used and they do not document any loss of previous mappings.

The solution is to implement the proper failure checks in the
os::commit_memory() implementations on Linux and Solaris. On MacOS X
and Windows, no additional checks are needed.

During code review round 1, there was a request from the GC team to
provide an os::commit_memory_or_exit() entry point in order to preserve
the existing error messages on all platforms. This entry point allows
code like this:

src/share/vm/gc_implementation/parallelScavenge/cardTableExtension.cpp:

  652       if (!os::commit_memory((char*)new_committed.start(),
  653                              new_committed.byte_size())) {
  654         vm_exit_out_of_memory(new_committed.byte_size(),
  655                               "card table expansion");

to be replaced with code like this:

  652       os::commit_memory_or_exit((char*)new_committed.start(),
  653                                 new_committed.byte_size(), !ExecMem,
  654                                 "card table expansion");

All uses of os::commit_memory() have been visited and those locations
that previously exited on error have been updated to use the new entry
point. This new entry point cleans up the original call sites and the
vm_exit_out_of_memory() calls are now consistent on all platforms.

As a secondary change, while visiting all os::commit_memory() calls, I
also updated them to use the new ExecMem enum in order to make the
executable status of the memory more clear. Since executable memory can
be an attack vector, it is prudent to make the executable status of
memory crystal clear. This also allowed me to remove the default
executable flag value of 'false'. Now all new uses of commit_memory()
must be clear about the executable status of the memory.

There are also tertiary changes where some of the pd_commit_memory()
calls were calling os::commit_memory() instead of calling their sibling
os::pd_commit_memory(). This resulted in double NMT tracking so this
has also been fixed. There were also some incorrect mmap)() return
value checks which have been fixed.

Just to be clear: This fix simply properly detects the "out of swap
space" condition on Linux and Solaris and causes the VM to fail in a
more orderly fashion with a message that looks like this:

The Java process' stderr will show:

INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) failed; errno=11
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 4096 bytes for 
committing reserved memory.
# An error report file with more information is saved as:
# /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log

The hs_err_pid file will have the more verbose info:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 4096 bytes for 
committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error 
(/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/solaris/vm/os_solaris.cpp:2791), 
pid=9111, tid=21
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build 
1.8.0-ea-b89)
# Java VM: Java HotSpot(TM) 64-Bit Server VM 
(25.0-b33-bh_hsx_rt_exp_8013057_dcubed-product-fastdebug mixed mode 
solaris-amd64 compressed oops)
# Core dump written. Default location: 
/work/shared/bugs/8013057/looper.03/core or core.9111
#

You might be wondering why we are assuming that the failed mmap()
commit operation has lost the 'reserved memory' mapping.

     We have no good way to determine if the 'reserved memory' mapping
     is lost. Since all the other threads are not idle, it is possible
     for another thread to have 'reserved' the same memory space for a
     different data structure. Our thread could observe that the memory
     is still 'reserved' but we have no way to know that the reservation
     isn't ours.

You might be wondering why we can't recover from this transient
resource availability issue.

     We could retry the failed mmap() commit operation, but we would
     again run into the issue that we no longer know which data
     structure 'owns' the 'reserved' memory mapping. In particular, the
     memory could be reserved by native code calling mmap() directly so
     the VM really has no way to recover from this failure.

You might be wondering why part of this work is deferred:

2749     if (!recoverable_mmap_error(err)) {
2750       // However, it is not clear that this loss of our reserved 
mapping
2751       // happens with large pages on Linux or that we cannot recover
2752       // from the loss. For now, we just issue a warning and we don't
2753       // call vm_exit_out_of_memory(). This issue is being tracked by
2754       // JBS-8007074.
2755       warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
2756 //    vm_exit_out_of_memory(size, "committing reserved memory.");
2757     }
2758     // Fall through and try to use small pages

     When line 2756 is enabled and UseHugeTLBFS is specified, then the
     VM will exit because no more huge/large pages are available. It is
     not yet clear that this transition from large to small pages is
     actually unsafe, but we don't yet have proof that it is safe
     either. More research will be done via JBS-8007074.

If you've made it this far without falling asleep or drooling on your
keyboard, I applaud you!