code review (round 2) for memory commit failure fix (8013057)

Tue Jun 18 05:59:22 PDT 2013

Dmitry,

Thanks for closing the loop on this review!

Dan

On 6/18/13 3:46 AM, Dmitry Samersoff wrote:
> Dan,
>
> Revised fix looks OK for me.
>
> -Dmitry
>
>
> On 2013-06-06 04:17, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Since we're in a holding pattern with RT_Baseline, I've taken
>> the opportunity to make the code style changes that Stefan
>> requested.
>>
>> New webrev:
>>
>>      http://cr.openjdk.java.net/~dcubed/8013057-webrev/3-hsx25/
>>
>> Stefan has already re-reviewed and approves. JPRT-hotspotwest job
>> is in progress.
>>
>> Dan
>>
>>
>>
>> On 6/5/13 7:01 AM, Daniel D. Daugherty wrote:
>>> Stefan,
>>>
>>> Thanks for the re-review! And for several off-line reviews
>>> as we were trying to work out this 'round 2' version.
>>>
>>>
>>> On 6/5/13 2:02 AM, Stefan Karlsson wrote:
>>>> On 06/04/2013 07:32 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have another revised version of the proposed fix for the following
>>>>> bug:
>>>>>
>>>>>      8013057 assert(_needs_gc ||
>>>>> SafepointSynchronize::is_at_safepoint())
>>>>>              failed: only read at safepoint
>>>>>
>>>>> Here are the (round 2) webrev URLs:
>>>>>
>>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/
>>>>> Internal:
>>>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/2-hsx25/
>>>> Looks good. Thanks for fixing this and making sure that this new code
>>>> work well with the error handling in the GC code.
>>> Thanks! The new commit_memory_or_exit() cleaned up call sites in
>>> several places so I appreciate your feedback.
>>>
>>>
>>>> I have two code style comments that are not necessary to fix, but it
>>>> would make me happy if they were.
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/share/vm/runtime/os.hpp.udiff.html
>>>>
>>>>
>>>> +// Executable parameter flag for os::commit_memory() and
>>>> +// os::commit_memory_or_exit().
>>>> +enum ExecMemFlag {
>>>> +  ExecMem = true
>>>> +};
>>>>
>>>>
>>>> Why is this an Enum and not a const bool when it's used as a bool?
>>> Hmmm... It's an Enum because that seemed to be style used in other places
>>> in the VM when we didn't want to pass "true /* some comment */" as a
>>> parameter. Do you happen to have a "const bool" example I could look at?
>>> I'll poke around, but if you could point me at an example, I would
>>> appreciate it...
>>>
>>>
>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/os/linux/vm/os_linux.cpp.udiff.html
>>>>
>>>>
>>>> +  // see if the error is one we can let the caller handle
>>>> +  switch (err) {
>>>> +  case EBADF:
>>>> +  case EINVAL:
>>>> +  case ENOTSUP:
>>>> +    // let the caller deal with these errors
>>>> +    break;
>>>> +
>>>> +  default:
>>>> +    // Any remaining errors on this OS can cause our reserved mapping
>>>> +    // to be lost. That can cause confusion where different data
>>>> +    // structures think they have the same memory mapped. The worst
>>>> +    // scenario is if both the VM and a library think they have the
>>>> +    // same memory mapped.
>>>> +    warn_fail_commit_memory(addr, size, exec, err);
>>>> +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>>>> memory.");
>>>> +    break;
>>>> +  }
>>>>
>>>> and
>>>>
>>>> +    switch (err) {
>>>> +    case EBADF:
>>>> +    case EINVAL:
>>>> +    case ENOTSUP:
>>>> +      // these errors are OK for retrying with small pages
>>>> +      break;
>>>> +
>>>> +    default:
>>>> +      // Any remaining errors on this OS can cause our reserved mapping
>>>> +      // to be lost. That can cause confusion where different data
>>>> +      // structures think they have the same memory mapped. The worst
>>>> +      // scenario is if both the VM and a library think they have the
>>>> +      // same memory mapped.
>>>> +      //
>>>> +      // However, it is not clear that this loss of our reserved
>>>> mapping
>>>> +      // happens with large pages on Linux or that we cannot recover
>>>> +      // from the loss. For now, we just issue a warning and we don't
>>>> +      // call vm_exit_out_of_memory(). This issue is being tracked by
>>>> +      // JBS-8007074.
>>>> +      warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>>> +#if 0
>>>> +      vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>> +                            "committing reserved memory.");
>>>> +#endif
>>>> +      break;
>>>> +    }
>>>>
>>>> These construct are rather long and duplicated. I'd prefer if this
>>>> could be extracted out into a helper function so that the logic of
>>>> commit_memory_impl is a bit clearer.
>>> I agree that this could be improved. You had recommended this:
>>>
>>> if (!recoverable_error(err)) {
>>>    warn_fail_commit_memory(addr, size, exec, err);
>>>    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>>> memory.");
>>> }
>>>
>>> and this:
>>>
>>> if (!recoverable_error(err)) {
>>>    warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>> #if 0
>>>    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>>> memory.");
>>> #endif
>>> }
>>>
>>> I concur that the identical parts can be refactored into
>>> recoverable_error(), but the "However, it is not clear..."
>>> paragraph needs to stay with disabled vm_exit_out_of_memory()
>>> call. BTW, Coleen requested (off-thread) that the call be
>>> commented out since of #if'ed out.
>>>
>>> I'll definitely make these changes if I have to re-roll the
>>> patch for non-code style changes. Depending on how this
>>> changeset lines up with RT_Baseline's schedule this week,
>>> I may even make the changes anyway.
>>>
>>> Again, thanks for the many review cycles.
>>>
>>> Dan
>>>
>>>
>>>> thanks,
>>>> StefanK
>>>>
>>>>> Testing:
>>>>> - Aurora Adhoc vm.quick batch for all OSes in the following configs:
>>>>>    {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>>> - I've created a standalone Java stress test with a shell script
>>>>>    wrapper that reproduces the failing code paths on my Solaris X86
>>>>>    server and on Ron's DevOps Linux machine. This test will not be
>>>>>    integrated since running the machine out of swap space is very
>>>>>    disruptive (crashes the window system, causes various services to
>>>>>    exit, etc.)
>>>>>
>>>>> There are three parts to this fix:
>>>>>
>>>>> 1) Detect commit memory failures on Linux and Solaris where the
>>>>>     previous reservation can be lost and call vm_exit_out_of_memory()
>>>>>     to report the resource exhaustion. Add os::commit_memory_or_exit()
>>>>>     API to provide more consistent handling of vm_exit_out_of_memory()
>>>>>     calls.
>>>>> 2) Change existing os::commit_memory() calls to make the executable
>>>>>     status of memory more clear; this makes security analysis easier.
>>>>> 3) Clean up some platform dependent layer calls that were resulting
>>>>>     in extra NMT accounting. Clean up some mmap() return value checks.
>>>>>
>>>>> Gory details are below. As always, comments, questions and
>>>>> suggestions are welome.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> Gory Details:
>>>>>
>>>>> The VirtualSpace data structure is built on top of the ReservedSpace
>>>>> data structure. VirtualSpace presumes that failed os::commit_memory()
>>>>> calls do not affect the underlying ReservedSpace memory mappings.
>>>>> That assumption is true on MacOS X and Windows, but it is not true
>>>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>>>> lose previous mappings in the event of certain errors. On MacOS X,
>>>>> the mmap() system call clearly states that previous mappings are
>>>>> replaced only on success. On Windows, a different set of APIs are
>>>>> used and they do not document any loss of previous mappings.
>>>>>
>>>>> The solution is to implement the proper failure checks in the
>>>>> os::commit_memory() implementations on Linux and Solaris. On MacOS X
>>>>> and Windows, no additional checks are needed.
>>>>>
>>>>> During code review round 1, there was a request from the GC team to
>>>>> provide an os::commit_memory_or_exit() entry point in order to preserve
>>>>> the existing error messages on all platforms. This entry point allows
>>>>> code like this:
>>>>>
>>>>> src/share/vm/gc_implementation/parallelScavenge/cardTableExtension.cpp:
>>>>>
>>>>>   568       if (!os::commit_memory((char*)new_committed.start(),
>>>>>   569                              new_committed.byte_size())) {
>>>>>   570         vm_exit_out_of_memory(new_committed.byte_size(),
>>>>> OOM_MMAP_ERROR,
>>>>>   571                               "card table expansion");
>>>>>
>>>>> to be replaced with code like this:
>>>>>
>>>>>   568 os::commit_memory_or_exit((char*)new_committed.start(),
>>>>>   569 new_committed.byte_size(), !ExecMem,
>>>>>   570                                 "card table expansion");
>>>>>
>>>>> All uses of os::commit_memory() have been visited and those locations
>>>>> that previously exited on error have been updated to use the new entry
>>>>> point. This new entry point cleans up the original call sites and the
>>>>> vm_exit_out_of_memory() calls are now consistent on all platforms.
>>>>>
>>>>> As a secondary change, while visiting all os::commit_memory() calls, I
>>>>> also updated them to use the new ExecMem enum in order to make the
>>>>> executable status of the memory more clear. Since executable memory can
>>>>> be an attack vector, it is prudent to make the executable status of
>>>>> memory crystal clear. This also allowed me to remove the default
>>>>> executable flag value of 'false'. Now all new uses of commit_memory()
>>>>> must be clear about the executable status of the memory.
>>>>>
>>>>> There are also tertiary changes where some of the pd_commit_memory()
>>>>> calls were calling os::commit_memory() instead of calling their sibling
>>>>> os::pd_commit_memory(). This resulted in double NMT tracking so this
>>>>> has also been fixed. There were also some incorrect mmap)() return
>>>>> value checks which have been fixed.
>>>>>
>>>>> Just to be clear: This fix simply properly detects the "out of swap
>>>>> space" condition on Linux and Solaris and causes the VM to fail in a
>>>>> more orderly fashion with a message that looks like this:
>>>>>
>>>>> The Java process' stderr will show:
>>>>>
>>>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) failed;
>>>>> errno=11
>>>>> #
>>>>> # There is insufficient memory for the Java Runtime Environment to
>>>>> continue.
>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>>> committing reserved memory.
>>>>> # An error report file with more information is saved as:
>>>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>>>
>>>>> The hs_err_pid file will have the more verbose info:
>>>>>
>>>>> #
>>>>> # There is insufficient memory for the Java Runtime Environment to
>>>>> continue.
>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>>> committing reserved memory.
>>>>> # Possible reasons:
>>>>> #   The system is out of physical RAM or swap space
>>>>> #   In 32 bit mode, the process size limit was hit
>>>>> # Possible solutions:
>>>>> #   Reduce memory load on the system
>>>>> #   Increase physical memory or swap space
>>>>> #   Check if swap backing store is full
>>>>> #   Use 64 bit Java on a 64 bit OS
>>>>> #   Decrease Java heap size (-Xmx/-Xms)
>>>>> #   Decrease number of Java threads
>>>>> #   Decrease Java thread stack sizes (-Xss)
>>>>> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>>>>> # This output file may be truncated or incomplete.
>>>>> #
>>>>> #  Out of Memory Error
>>>>> (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/s
>>>>> olaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>>>> #
>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build
>>>>> 1.8.0-ea-b89)
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>> (25.0-b33-bh_hsx_rt_exp_8013057_dcu
>>>>> bed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>>>> # Core dump written. Default location:
>>>>> /work/shared/bugs/8013057/looper.03/core
>>>>> or core.9111
>>>>> #
>>>>>
>>>>> You might be wondering why we are assuming that the failed mmap()
>>>>> commit operation has lost the 'reserved memory' mapping.
>>>>>
>>>>>      We have no good way to determine if the 'reserved memory' mapping
>>>>>      is lost. Since all the other threads are not idle, it is possible
>>>>>      for another thread to have 'reserved' the same memory space for a
>>>>>      different data structure. Our thread could observe that the memory
>>>>>      is still 'reserved' but we have no way to know that the reservation
>>>>>      isn't ours.
>>>>>
>>>>> You might be wondering why we can't recover from this transient
>>>>> resource availability issue.
>>>>>
>>>>>      We could retry the failed mmap() commit operation, but we would
>>>>>      again run into the issue that we no longer know which data
>>>>>      structure 'owns' the 'reserved' memory mapping. In particular, the
>>>>>      memory could be reserved by native code calling mmap() directly so
>>>>>      the VM really has no way to recover from this failure.
>>>>>
>>>>> You might be wondering why part of his work is deferred:
>>>>>
>>>>> 2654     default:
>>>>> 2655       // Any remaining errors on this OS can cause our reserved
>>>>> mapping
>>>>> 2656       // to be lost. That can cause confusion where different data
>>>>> 2657       // structures think they have the same memory mapped. The
>>>>> worst
>>>>> 2658       // scenario is if both the VM and a library think they
>>>>> have the
>>>>> 2659       // same memory mapped.
>>>>> 2660       //
>>>>> 2661       // However, it is not clear that this loss of our
>>>>> reserved mapping
>>>>> 2662       // happens with large pages on Linux or that we cannot
>>>>> recover
>>>>> 2663       // from the loss. For now, we just issue a warning and we
>>>>> don't
>>>>> 2664       // call vm_exit_out_of_memory(). This issue is being
>>>>> tracked by
>>>>> 2665       // JBS-8007074.
>>>>> 2666       warn_fail_commit_memory(addr, size, alignment_hint, exec,
>>>>> err);
>>>>> 2667 #if 0
>>>>> 2668       vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>> 2669                             "committing reserved memory.");
>>>>> 2670 #endif
>>>>> 2671       break;
>>>>>
>>>>>      When lines 2668-2669 are enabled and UseHugeTLBFS is specified,
>>>>>      then the VM will exit because no more huge/large pages are
>>>>>      available. It is not yet clear that this transition from large to
>>>>>      small pages is actually unsafe, but we don't yet have proof that
>>>>>      it is safe either. More research will be done via JBS-8007074.
>>>>>
>>>
>>>
>