code review (round 2) for memory commit failure fix (8013057)

David Holmes david.holmes at oracle.com
Fri Jun 7 00:47:35 PDT 2013


Thanks Dan. Understood re error handling.

I don't think we need the new _impl method to be in os::XXX but I won't 
force you make any further changes. (Need to keep you on side for a FDS 
issue ;-) )

Cheers,
David

On 7/06/2013 12:49 PM, Daniel D. Daugherty wrote:
> David,
>
> Thanks for the re-review. Replies are embedded below.
>
>
> On 6/6/13 8:05 PM, David Holmes wrote:
>> Hi Dan,
>>
>> On 6/06/2013 10:17 AM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Since we're in a holding pattern with RT_Baseline, I've taken
>>> the opportunity to make the code style changes that Stefan
>>> requested.
>>>
>>> New webrev:
>>>
>>>      http://cr.openjdk.java.net/~dcubed/8013057-webrev/3-hsx25/
>>>
>>> Stefan has already re-reviewed and approves. JPRT-hotspotwest job
>>> is in progress.
>>
>> For the record I find it somewhat confusing to have these different
>> failure semantics depending on the platform. It might simplify both
>> the pd code and the shared code if any mmap failure were simply a
>> fatal error. But we're not taking that path so ...
>>
>> I don't really understand the "recoverable mmap error" logic:
>>
>> 2557   case EBADF:
>> 2558   case EINVAL:
>> 2559   case ENOTSUP:
>>
>> these are, we hope, errors caused by failed pre-conditions and so we
>> expect that at the point these issues are detected the mmap has been a
>> no-op and so no reservation has been lost. Fair enough.
>
> Prior this this fix, we called os::commit_memory() and it came back
> with true or false and the caller "did something" with that failure
> mode. The idea is that the upper layers have some idea about the
> overall semantics of what they are trying to do and are in the best
> position to decide whether to fail or retry later.
>
> Along comes our realization that the underlying mmap() API doesn't
> quite behave the way we expect on all OSes. If all the OSes were
> equally screwed up, we would fix this a different way. However, it
> is just Linux and Solaris that screw this up so we limit the new
> vm_exit_out_of_memory() calls for this mmap() failure to those two
> OSes.
>
> Any failures that make it back from os::commit_memory() to the upper
> layers will continue to be handled at those upper layers the way they
> were handled before. I don't think we're ready to completely rewrite
> the whole reserve_memory/commit_memory model. That would be a bit
> much since this has to be backported.
>
>> But to me these indicate programming errors in our code so why not
>> fail-fast anyway?
>
> Maybe they are programming errors and maybe they are crazy commit_memory()
> probes in an attempt to adapt to some page size thing. I can't say that
> I understand all the reasons that the above three errnos are returned
> nor do I need to for this bug fix. What I do know is that these three
> errnos were permitted to trickle back before and it appears to be safe
> to continue to let them do so since they won't mess up existing mappings.
>
>
>> If these occur we are going to lose the detailed information by
>> returning false and this may not even trigger an upper layer failure
>> allowing a programming error to go undetected.
>
> Sorry, this is no different than we behaved before so I'm not planning
> to worry about that for this fix. As I said above, we're not rewriting
> the whole reserve_memory/commit_memory model.
>
>
>> Is there a reason that:
>>
>> int os::Linux::commit_memory_impl
>>
>> is not simply simply
>>
>> static int commit_memory_impl? It seems to be purely internal and so
>> doesn't need to be part of the os::Linux exported API. Ditto for the
>> Solaris version.
>
> Seemed like the place to put it since there are other similar
> peers at the os::Linux level. I used the 'static' function for
> the things that were strictly helper functions. Also seemed
> like the thing to do.
>
>
>
>> ---
>>
>> Otherwise ok. This is messy issue, thanks for taking the time to try
>> and make it cleaner.
>
> Thanks! I'm not planning to make any changes based on your review.
> Please let me know if you're OK with that.
>
> Credit for making it cleaner goes to Stefan K. He kept after me
> readability of this code and not overloading too many things into
> one construct.
>
> Dan
>
>
>>
>> Cheers,
>> David
>> -----
>>
>>> Dan
>>>
>>>
>>>
>>> On 6/5/13 7:01 AM, Daniel D. Daugherty wrote:
>>>> Stefan,
>>>>
>>>> Thanks for the re-review! And for several off-line reviews
>>>> as we were trying to work out this 'round 2' version.
>>>>
>>>>
>>>> On 6/5/13 2:02 AM, Stefan Karlsson wrote:
>>>>> On 06/04/2013 07:32 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have another revised version of the proposed fix for the following
>>>>>> bug:
>>>>>>
>>>>>>     8013057 assert(_needs_gc ||
>>>>>> SafepointSynchronize::is_at_safepoint())
>>>>>>             failed: only read at safepoint
>>>>>>
>>>>>> Here are the (round 2) webrev URLs:
>>>>>>
>>>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/
>>>>>> Internal:
>>>>>> http://javaweb.us.oracle.com/~ddaugher/8013057-webrev/2-hsx25/
>>>>>
>>>>> Looks good. Thanks for fixing this and making sure that this new code
>>>>> work well with the error handling in the GC code.
>>>>
>>>> Thanks! The new commit_memory_or_exit() cleaned up call sites in
>>>> several places so I appreciate your feedback.
>>>>
>>>>
>>>>> I have two code style comments that are not necessary to fix, but it
>>>>> would make me happy if they were.
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/share/vm/runtime/os.hpp.udiff.html
>>>>>
>>>>>
>>>>>
>>>>> +// Executable parameter flag for os::commit_memory() and
>>>>> +// os::commit_memory_or_exit().
>>>>> +enum ExecMemFlag {
>>>>> +  ExecMem = true
>>>>> +};
>>>>>
>>>>>
>>>>> Why is this an Enum and not a const bool when it's used as a bool?
>>>>
>>>> Hmmm... It's an Enum because that seemed to be style used in other
>>>> places
>>>> in the VM when we didn't want to pass "true /* some comment */" as a
>>>> parameter. Do you happen to have a "const bool" example I could look
>>>> at?
>>>> I'll poke around, but if you could point me at an example, I would
>>>> appreciate it...
>>>>
>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8013057-webrev/2-hsx25/src/os/linux/vm/os_linux.cpp.udiff.html
>>>>>
>>>>>
>>>>>
>>>>> +  // see if the error is one we can let the caller handle
>>>>> +  switch (err) {
>>>>> +  case EBADF:
>>>>> +  case EINVAL:
>>>>> +  case ENOTSUP:
>>>>> +    // let the caller deal with these errors
>>>>> +    break;
>>>>> +
>>>>> +  default:
>>>>> +    // Any remaining errors on this OS can cause our reserved mapping
>>>>> +    // to be lost. That can cause confusion where different data
>>>>> +    // structures think they have the same memory mapped. The worst
>>>>> +    // scenario is if both the VM and a library think they have the
>>>>> +    // same memory mapped.
>>>>> +    warn_fail_commit_memory(addr, size, exec, err);
>>>>> +    vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>>>>> memory.");
>>>>> +    break;
>>>>> +  }
>>>>>
>>>>> and
>>>>>
>>>>> +    switch (err) {
>>>>> +    case EBADF:
>>>>> +    case EINVAL:
>>>>> +    case ENOTSUP:
>>>>> +      // these errors are OK for retrying with small pages
>>>>> +      break;
>>>>> +
>>>>> +    default:
>>>>> +      // Any remaining errors on this OS can cause our reserved
>>>>> mapping
>>>>> +      // to be lost. That can cause confusion where different data
>>>>> +      // structures think they have the same memory mapped. The worst
>>>>> +      // scenario is if both the VM and a library think they have the
>>>>> +      // same memory mapped.
>>>>> +      //
>>>>> +      // However, it is not clear that this loss of our reserved
>>>>> mapping
>>>>> +      // happens with large pages on Linux or that we cannot recover
>>>>> +      // from the loss. For now, we just issue a warning and we don't
>>>>> +      // call vm_exit_out_of_memory(). This issue is being tracked by
>>>>> +      // JBS-8007074.
>>>>> +      warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>>>> +#if 0
>>>>> +      vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>> +                            "committing reserved memory.");
>>>>> +#endif
>>>>> +      break;
>>>>> +    }
>>>>>
>>>>> These construct are rather long and duplicated. I'd prefer if this
>>>>> could be extracted out into a helper function so that the logic of
>>>>> commit_memory_impl is a bit clearer.
>>>>
>>>> I agree that this could be improved. You had recommended this:
>>>>
>>>> if (!recoverable_error(err)) {
>>>>   warn_fail_commit_memory(addr, size, exec, err);
>>>>   vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>>>> memory.");
>>>> }
>>>>
>>>> and this:
>>>>
>>>> if (!recoverable_error(err)) {
>>>>   warn_fail_commit_memory(addr, size, alignment_hint, exec, err);
>>>> #if 0
>>>>   vm_exit_out_of_memory(size, OOM_MMAP_ERROR, "committing reserved
>>>> memory.");
>>>> #endif
>>>> }
>>>>
>>>> I concur that the identical parts can be refactored into
>>>> recoverable_error(), but the "However, it is not clear..."
>>>> paragraph needs to stay with disabled vm_exit_out_of_memory()
>>>> call. BTW, Coleen requested (off-thread) that the call be
>>>> commented out since of #if'ed out.
>>>>
>>>> I'll definitely make these changes if I have to re-roll the
>>>> patch for non-code style changes. Depending on how this
>>>> changeset lines up with RT_Baseline's schedule this week,
>>>> I may even make the changes anyway.
>>>>
>>>> Again, thanks for the many review cycles.
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> thanks,
>>>>> StefanK
>>>>>
>>>>>>
>>>>>> Testing:
>>>>>> - Aurora Adhoc vm.quick batch for all OSes in the following configs:
>>>>>>   {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>>>> - I've created a standalone Java stress test with a shell script
>>>>>>   wrapper that reproduces the failing code paths on my Solaris X86
>>>>>>   server and on Ron's DevOps Linux machine. This test will not be
>>>>>>   integrated since running the machine out of swap space is very
>>>>>>   disruptive (crashes the window system, causes various services to
>>>>>>   exit, etc.)
>>>>>>
>>>>>> There are three parts to this fix:
>>>>>>
>>>>>> 1) Detect commit memory failures on Linux and Solaris where the
>>>>>>    previous reservation can be lost and call vm_exit_out_of_memory()
>>>>>>    to report the resource exhaustion. Add os::commit_memory_or_exit()
>>>>>>    API to provide more consistent handling of vm_exit_out_of_memory()
>>>>>>    calls.
>>>>>> 2) Change existing os::commit_memory() calls to make the executable
>>>>>>    status of memory more clear; this makes security analysis easier.
>>>>>> 3) Clean up some platform dependent layer calls that were resulting
>>>>>>    in extra NMT accounting. Clean up some mmap() return value checks.
>>>>>>
>>>>>> Gory details are below. As always, comments, questions and
>>>>>> suggestions are welome.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> Gory Details:
>>>>>>
>>>>>> The VirtualSpace data structure is built on top of the ReservedSpace
>>>>>> data structure. VirtualSpace presumes that failed os::commit_memory()
>>>>>> calls do not affect the underlying ReservedSpace memory mappings.
>>>>>> That assumption is true on MacOS X and Windows, but it is not true
>>>>>> on Linux or Solaris. The mmap() system call on Linux or Solaris can
>>>>>> lose previous mappings in the event of certain errors. On MacOS X,
>>>>>> the mmap() system call clearly states that previous mappings are
>>>>>> replaced only on success. On Windows, a different set of APIs are
>>>>>> used and they do not document any loss of previous mappings.
>>>>>>
>>>>>> The solution is to implement the proper failure checks in the
>>>>>> os::commit_memory() implementations on Linux and Solaris. On MacOS X
>>>>>> and Windows, no additional checks are needed.
>>>>>>
>>>>>> During code review round 1, there was a request from the GC team to
>>>>>> provide an os::commit_memory_or_exit() entry point in order to
>>>>>> preserve
>>>>>> the existing error messages on all platforms. This entry point allows
>>>>>> code like this:
>>>>>>
>>>>>> src/share/vm/gc_implementation/parallelScavenge/cardTableExtension.cpp:
>>>>>>
>>>>>>
>>>>>>  568       if (!os::commit_memory((char*)new_committed.start(),
>>>>>>  569 new_committed.byte_size())) {
>>>>>>  570 vm_exit_out_of_memory(new_committed.byte_size(),
>>>>>> OOM_MMAP_ERROR,
>>>>>>  571                               "card table expansion");
>>>>>>
>>>>>> to be replaced with code like this:
>>>>>>
>>>>>>  568 os::commit_memory_or_exit((char*)new_committed.start(),
>>>>>>  569 new_committed.byte_size(), !ExecMem,
>>>>>>  570                                 "card table expansion");
>>>>>>
>>>>>> All uses of os::commit_memory() have been visited and those locations
>>>>>> that previously exited on error have been updated to use the new
>>>>>> entry
>>>>>> point. This new entry point cleans up the original call sites and the
>>>>>> vm_exit_out_of_memory() calls are now consistent on all platforms.
>>>>>>
>>>>>> As a secondary change, while visiting all os::commit_memory()
>>>>>> calls, I
>>>>>> also updated them to use the new ExecMem enum in order to make the
>>>>>> executable status of the memory more clear. Since executable
>>>>>> memory can
>>>>>> be an attack vector, it is prudent to make the executable status of
>>>>>> memory crystal clear. This also allowed me to remove the default
>>>>>> executable flag value of 'false'. Now all new uses of commit_memory()
>>>>>> must be clear about the executable status of the memory.
>>>>>>
>>>>>> There are also tertiary changes where some of the pd_commit_memory()
>>>>>> calls were calling os::commit_memory() instead of calling their
>>>>>> sibling
>>>>>> os::pd_commit_memory(). This resulted in double NMT tracking so this
>>>>>> has also been fixed. There were also some incorrect mmap)() return
>>>>>> value checks which have been fixed.
>>>>>>
>>>>>> Just to be clear: This fix simply properly detects the "out of swap
>>>>>> space" condition on Linux and Solaris and causes the VM to fail in a
>>>>>> more orderly fashion with a message that looks like this:
>>>>>>
>>>>>> The Java process' stderr will show:
>>>>>>
>>>>>> INFO: os::commit_memory(0xfffffd7fb2522000, 4096, 4096, 0) failed;
>>>>>> errno=11
>>>>>> #
>>>>>> # There is insufficient memory for the Java Runtime Environment to
>>>>>> continue.
>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>>>> committing reserved memory.
>>>>>> # An error report file with more information is saved as:
>>>>>> # /work/shared/bugs/8013057/looper.03/hs_err_pid9111.log
>>>>>>
>>>>>> The hs_err_pid file will have the more verbose info:
>>>>>>
>>>>>> #
>>>>>> # There is insufficient memory for the Java Runtime Environment to
>>>>>> continue.
>>>>>> # Native memory allocation (mmap) failed to map 4096 bytes for
>>>>>> committing reserved memory.
>>>>>> # Possible reasons:
>>>>>> #   The system is out of physical RAM or swap space
>>>>>> #   In 32 bit mode, the process size limit was hit
>>>>>> # Possible solutions:
>>>>>> #   Reduce memory load on the system
>>>>>> #   Increase physical memory or swap space
>>>>>> #   Check if swap backing store is full
>>>>>> #   Use 64 bit Java on a 64 bit OS
>>>>>> #   Decrease Java heap size (-Xmx/-Xms)
>>>>>> #   Decrease number of Java threads
>>>>>> #   Decrease Java thread stack sizes (-Xss)
>>>>>> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>>>>>> # This output file may be truncated or incomplete.
>>>>>> #
>>>>>> #  Out of Memory Error
>>>>>> (/work/shared/bug_hunt/hsx_rt_latest/exp_8013057/src/os/s
>>>>>> olaris/vm/os_solaris.cpp:2791), pid=9111, tid=21
>>>>>> #
>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b89) (build
>>>>>> 1.8.0-ea-b89)
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>> (25.0-b33-bh_hsx_rt_exp_8013057_dcu
>>>>>> bed-product-fastdebug mixed mode solaris-amd64 compressed oops)
>>>>>> # Core dump written. Default location:
>>>>>> /work/shared/bugs/8013057/looper.03/core
>>>>>> or core.9111
>>>>>> #
>>>>>>
>>>>>> You might be wondering why we are assuming that the failed mmap()
>>>>>> commit operation has lost the 'reserved memory' mapping.
>>>>>>
>>>>>>     We have no good way to determine if the 'reserved memory' mapping
>>>>>>     is lost. Since all the other threads are not idle, it is possible
>>>>>>     for another thread to have 'reserved' the same memory space for a
>>>>>>     different data structure. Our thread could observe that the
>>>>>> memory
>>>>>>     is still 'reserved' but we have no way to know that the
>>>>>> reservation
>>>>>>     isn't ours.
>>>>>>
>>>>>> You might be wondering why we can't recover from this transient
>>>>>> resource availability issue.
>>>>>>
>>>>>>     We could retry the failed mmap() commit operation, but we would
>>>>>>     again run into the issue that we no longer know which data
>>>>>>     structure 'owns' the 'reserved' memory mapping. In particular,
>>>>>> the
>>>>>>     memory could be reserved by native code calling mmap()
>>>>>> directly so
>>>>>>     the VM really has no way to recover from this failure.
>>>>>>
>>>>>> You might be wondering why part of his work is deferred:
>>>>>>
>>>>>> 2654     default:
>>>>>> 2655       // Any remaining errors on this OS can cause our reserved
>>>>>> mapping
>>>>>> 2656       // to be lost. That can cause confusion where different
>>>>>> data
>>>>>> 2657       // structures think they have the same memory mapped. The
>>>>>> worst
>>>>>> 2658       // scenario is if both the VM and a library think they
>>>>>> have the
>>>>>> 2659       // same memory mapped.
>>>>>> 2660       //
>>>>>> 2661       // However, it is not clear that this loss of our
>>>>>> reserved mapping
>>>>>> 2662       // happens with large pages on Linux or that we cannot
>>>>>> recover
>>>>>> 2663       // from the loss. For now, we just issue a warning and we
>>>>>> don't
>>>>>> 2664       // call vm_exit_out_of_memory(). This issue is being
>>>>>> tracked by
>>>>>> 2665       // JBS-8007074.
>>>>>> 2666       warn_fail_commit_memory(addr, size, alignment_hint, exec,
>>>>>> err);
>>>>>> 2667 #if 0
>>>>>> 2668       vm_exit_out_of_memory(size, OOM_MMAP_ERROR,
>>>>>> 2669                             "committing reserved memory.");
>>>>>> 2670 #endif
>>>>>> 2671       break;
>>>>>>
>>>>>>     When lines 2668-2669 are enabled and UseHugeTLBFS is specified,
>>>>>>     then the VM will exit because no more huge/large pages are
>>>>>>     available. It is not yet clear that this transition from large to
>>>>>>     small pages is actually unsafe, but we don't yet have proof that
>>>>>>     it is safe either. More research will be done via JBS-8007074.
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list