code review (round 2) for memory commit failure fix (8013057)

Wed Jun 5 17:17:00 PDT 2013

Greetings,

Since we're in a holding pattern with RT_Baseline, I've taken
the opportunity to make the code style changes that Stefan
requested.

New webrev:

     http://cr.openjdk.java.net/~dcubed/8013057-webrev/3-hsx25/

Stefan has already re-reviewed and approves. JPRT-hotspotwest job
is in progress.

Dan

On 6/5/13 7:01 AM, Daniel D. Daugherty wrote:
> Stefan,
>
> Thanks for the re-review! And for several off-line reviews
> as we were trying to work out this 'round 2' version.
>
>
> On 6/5/13 2:02 AM, Stefan Karlsson wrote:
>> On 06/04/2013 07:32 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have another revised version of the proposed fix for the following 
>>> bug:
>>>
>>>     8013057 assert(_needs_gc || 
>>> SafepointSynchronize::is_at_safepoint())
>>>             failed: only read at safepoint
>>>
>>> Here are the (round 2) webrev URLs:
>>>
>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/
>>> Internal: 
>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/2-hsx25/
>>
>> Looks good. Thanks for fixing this and making sure that this new code 
>> work well with the error handling in the GC code.
>
> Thanks! The new commit_memory_or_exit() cleaned up call sites in
> several places so I appreciate your feedback.
>
>
>> I have two code style comments that are not necessary to fix, but it 
>> would make me happy if they were.
>>
>>
>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/share/vm/runtime/os.hpp.udiff.html 
>>
>>
>> +// Executable parameter flag for os::commit_memory() and
>> +// os::commit_memory_or_exit().
>> +enum ExecMemFlag {
>> +  ExecMem = true
>> +};
>>
>>
>> Why is this an Enum and not a const bool when it's used as a bool?
>
> Hmmm... It's an Enum because that seemed to be style used in other places
> in the VM when we didn't want to pass "true /* some comment */" as a
> parameter. Do you happen to have a "const bool" example I could look at?
> I'll poke around, but if you could point me at an example, I would
> appreciate it...
>
>
>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/os/linux/vm/os_linux.cpp.udiff.html 
>>
>>
>> +  // see if the error is one we can let the caller handle
>> +  switch (err) {
>> +  case EBADF:
>> +  case EINVAL:
>> +  case ENOTSUP:
>> +    // let the caller deal with these errors
>> +    break;
>> +
>> +  default:
>> +    // Any remaining errors on this OS can cause our reserved mapping
>> +    // to be lost. That can cause confusion where different data
>> +    // structures think they have the same memory mapped. The worst
>> +    // scenario is if both the VM and a library think they have the
>> +    // same memory mapped.
>> +    warn_fail_commit_memory(addr, size, exec, err);
>> +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved 
>> memory.");
>> +    break;
>> +  }
>>
>> and
>>
>> +    switch (err) {
>> +    case EBADF:
>> +    case EINVAL:
>> +    case ENOTSUP:
>> +      // these errors are OK for retrying with small pages
>> +      break;
>> +
>> +    default:
>> +      // Any remaining errors on this OS can cause our reserved mapping
>> +      // to be lost. That can cause confusion where different data
>> +      // structures think they have the same memory mapped. The worst
>> +      // scenario is if both the VM and a library think they have the
>> +      // same memory mapped.
>> +      //
>> +      // However, it is not clear that this loss of our reserved 
>> mapping
>> +      // happens with large pages on Linux or that we cannot recover
>> +      // from the loss. For now, we just issue a warning and we don't
>> +      // call vm_exit_out_of_memory(). This issue is being tracked by
>> +      // JBS-8007074.
>> +      warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>> +#if 0
>> +      vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>> +                            "committing reserved memory.");
>> +#endif
>> +      break;
>> +    }
>>
>> These construct are rather long and duplicated. I'd prefer if this 
>> could be extracted out into a helper function so that the logic of 
>> commit_memory_impl is a bit clearer.
>
> I agree that this could be improved. You had recommended this:
>
> if (!recoverable_error(err)) {
>   warn_fail_commit_memory(addr, size, exec, err);
>   vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved 
> memory.");
> }
>
> and this:
>
> if (!recoverable_error(err)) {
>   warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
> #if 0
>   vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved 
> memory.");
> #endif
> }
>
> I concur that the identical parts can be refactored into
> recoverable_error(), but the "However, it is not clear..."
> paragraph needs to stay with disabled vm_exit_out_of_memory()
> call. BTW, Coleen requested (off-thread) that the call be
> commented out since of #if'ed out.
>
> I'll definitely make these changes if I have to re-roll the
> patch for non-code style changes. Depending on how this
> changeset lines up with RT_Baseline's schedule this week,
> I may even make the changes anyway.
>
> Again, thanks for the many review cycles.
>
> Dan
>
>
>>
>> thanks,
>> StefanK
>>
>>>
>>> Testing:
>>> - Aurora Adhoc vm.quick batch for all OSes in the following configs:
>>>   {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>> - I've created a standalone Java stress test with a shell script
>>>   wrapper that reproduces the failing code paths on my Solaris X86
>>>   server and on Ron's DevOps Linux machine. This test will not be
>>>   integrated since running the machine out of swap space is very
>>>   disruptive (crashes the window system, causes various services to
>>>   exit, etc.)
>>>
>>> There are three parts to this fix:
>>>
>>> 1) Detect commit memory failures on Linux and Solaris where the
>>>    previous reservation can be lost and call vm_exit_out_of_memory()
>>>    to report the resource exhaustion. Add os::commit_memory_or_exit()
>>>    API to provide more consistent handling of vm_exit_out_of_memory()
>>>    calls.
>>> 2) Change existing os::commit_memory() calls to make the executable
>>>    status of memory more clear; this makes security analysis easier.
>>> 3) Clean up some platform dependent layer calls that were resulting
>>>    in extra NMT accounting. Clean up some mmap() return value checks.
>>>
>>> Gory details are below. As always, comments, questions and
>>> suggestions are welome.
>>>
>>> Dan
>>>
>>>
>>> Gory Details:
>>>
>>> The VirtualSpace data structure is built on top of the ReservedSpace
>>> data structure. VirtualSpace presumes that failed os::commit_memory()
>>> calls do not affect the underlying ReservedSpace memory mappings.
>>> That assumption is true on MacOS X and Windows, but it is not true
>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>> lose previous mappings in the event of certain errors. On MacOS X,
>>> the mmap() system call clearly states that previous mappings are
>>> replaced only on success. On Windows, a different set of APIs are
>>> used and they do not document any loss of previous mappings.
>>>
>>> The solution is to implement the proper failure checks in the
>>> os::commit_memory() implementations on Linux and Solaris. On MacOS X
>>> and Windows, no additional checks are needed.
>>>
>>> During code review round 1, there was a request from the GC team to
>>> provide an os::commit_memory_or_exit() entry point in order to preserve
>>> the existing error messages on all platforms. This entry point allows
>>> code like this:
>>>
>>> src/share/vm/gc_implementation/parallelScavenge/cardTableExtension.cpp:
>>>
>>>  568       if (!os::commit_memory((char*)new_committed.start(),
>>>  569                              new_committed.byte_size())) {
>>>  570         vm_exit_out_of_memory(new_committed.byte_size(), 
>>> OOM_MMAP_ERROR,
>>>  571                               "card table expansion");
>>>
>>> to be replaced with code like this:
>>>
>>>  568 os::commit_memory_or_exit((char*)new_committed.start(),
>>>  569 new_committed.byte_size(), !ExecMem,
>>>  570                                 "card table expansion");
>>>
>>> All uses of os::commit_memory() have been visited and those locations
>>> that previously exited on error have been updated to use the new entry
>>> point. This new entry point cleans up the original call sites and the
>>> vm_exit_out_of_memory() calls are now consistent on all platforms.
>>>
>>> As a secondary change, while visiting all os::commit_memory() calls, I
>>> also updated them to use the new ExecMem enum in order to make the
>>> executable status of the memory more clear. Since executable memory can
>>> be an attack vector, it is prudent to make the executable status of
>>> memory crystal clear. This also allowed me to remove the default
>>> executable flag value of 'false'. Now all new uses of commit_memory()
>>> must be clear about the executable status of the memory.
>>>
>>> There are also tertiary changes where some of the pd_commit_memory()
>>> calls were calling os::commit_memory() instead of calling their sibling
>>> os::pd_commit_memory(). This resulted in double NMT tracking so this
>>> has also been fixed. There were also some incorrect mmap)() return
>>> value checks which have been fixed.
>>>
>>> Just to be clear: This fix simply properly detects the "out of swap
>>> space" condition on Linux and Solaris and causes the VM to fail in a
>>> more orderly fashion with a message that looks like this:
>>>
>>> The Java process' stderr will show:
>>>
>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) failed; 
>>> errno=11
>>> #
>>> # There is insufficient memory for the Java Runtime Environment to 
>>> continue.
>>> # Native memory allocation (mmap) failed to map 4096 bytes for 
>>> committing reserved memory.
>>> # An error report file with more information is saved as:
>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>
>>> The hs_err_pid file will have the more verbose info:
>>>
>>> #
>>> # There is insufficient memory for the Java Runtime Environment to 
>>> continue.
>>> # Native memory allocation (mmap) failed to map 4096 bytes for 
>>> committing reserved memory.
>>> # Possible reasons:
>>> #   The system is out of physical RAM or swap space
>>> #   In 32 bit mode, the process size limit was hit
>>> # Possible solutions:
>>> #   Reduce memory load on the system
>>> #   Increase physical memory or swap space
>>> #   Check if swap backing store is full
>>> #   Use 64 bit Java on a 64 bit OS
>>> #   Decrease Java heap size (-Xmx/-Xms)
>>> #   Decrease number of Java threads
>>> #   Decrease Java thread stack sizes (-Xss)
>>> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>>> # This output file may be truncated or incomplete.
>>> #
>>> #  Out of Memory Error 
>>> (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/s
>>> olaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>> #
>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build 
>>> 1.8.0-ea-b89)
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM 
>>> (25.0-b33-bh_hsx_rt_exp_8013057_dcu
>>> bed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>> # Core dump written. Default location: 
>>> /work/shared/bugs/8013057/looper.03/core
>>> or core.9111
>>> #
>>>
>>> You might be wondering why we are assuming that the failed mmap()
>>> commit operation has lost the 'reserved memory' mapping.
>>>
>>>     We have no good way to determine if the 'reserved memory' mapping
>>>     is lost. Since all the other threads are not idle, it is possible
>>>     for another thread to have 'reserved' the same memory space for a
>>>     different data structure. Our thread could observe that the memory
>>>     is still 'reserved' but we have no way to know that the reservation
>>>     isn't ours.
>>>
>>> You might be wondering why we can't recover from this transient
>>> resource availability issue.
>>>
>>>     We could retry the failed mmap() commit operation, but we would
>>>     again run into the issue that we no longer know which data
>>>     structure 'owns' the 'reserved' memory mapping. In particular, the
>>>     memory could be reserved by native code calling mmap() directly so
>>>     the VM really has no way to recover from this failure.
>>>
>>> You might be wondering why part of his work is deferred:
>>>
>>> 2654     default:
>>> 2655       // Any remaining errors on this OS can cause our reserved 
>>> mapping
>>> 2656       // to be lost. That can cause confusion where different data
>>> 2657       // structures think they have the same memory mapped. The 
>>> worst
>>> 2658       // scenario is if both the VM and a library think they 
>>> have the
>>> 2659       // same memory mapped.
>>> 2660       //
>>> 2661       // However, it is not clear that this loss of our 
>>> reserved mapping
>>> 2662       // happens with large pages on Linux or that we cannot 
>>> recover
>>> 2663       // from the loss. For now, we just issue a warning and we 
>>> don't
>>> 2664       // call vm_exit_out_of_memory(). This issue is being 
>>> tracked by
>>> 2665       // JBS-8007074.
>>> 2666       warn_fail_commit_memory(addr, size, alignment_hint, exec, 
>>> err);
>>> 2667 #if 0
>>> 2668       vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>> 2669                             "committing reserved memory.");
>>> 2670 #endif
>>> 2671       break;
>>>
>>>     When lines 2668-2669 are enabled and UseHugeTLBFS is specified,
>>>     then the VM will exit because no more huge/large pages are
>>>     available. It is not yet clear that this transition from large to
>>>     small pages is actually unsafe, but we don't yet have proof that
>>>     it is safe either. More research will be done via JBS-8007074.
>>>
>>
>
>
>